Ron02
Posts: 1
Joined: Tue Oct 30, 2018 10:33 am

Re: Yet Another Bare Metal Tutorial for the RPi3

Tue Oct 30, 2018 10:39 am

Dear all,

I reviewed the posts in this forum, but I didn't find anything about debugging the Pi from an IDE. The Arduino has an EDBG interface integrated, so remote debugging from the host PC is feasible. Is there a way to do this with the Raspberry Pi as well, or do I have to use the SD card even for bare metal programming?

Thanks in advance

Ron

bzt
Posts: 301
Joined: Sat Oct 14, 2017 9:57 pm

Re: Yet Another Bare Metal Tutorial for the RPi3

Wed Oct 31, 2018 11:16 pm

Hi Ron,

As far as I can see, you have several options:
1. you can use raspbootin (I've rewritten it for 64 bit) to avoid SD card usage and boot your kernel over the serial line
2. for debugging, you can compile in my mini-debugger and use that over the serial line with any terminal emulator (even with raspbootcom)
3. most complicated, but most promising for fully featured IDE integration, is to compile a gdb remote stub into your kernel. In theory gdbserver has a patch for AArch64, but honestly I haven't tried that. There's also a (not very helpful) description on ARM info center on how to use JTAG in virtual ethernet/tty mode with gdb.
4. if you are fine with a virtual environment, qemu has a built-in disassembler (-d int,in_asm) and a built-in gdb server (-S -s), which you can use out of the box without modifying your kernel.
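
For option 3, the core of any gdb remote stub is the packet framing of GDB's Remote Serial Protocol: each packet goes over the wire as `$`, the payload, `#`, and a two-digit hex checksum that is the sum of the payload bytes modulo 256. Below is a minimal sketch of just that framing. The names `rsp_checksum` and `rsp_frame` are hypothetical and not taken from any existing stub; a usable stub would also need acknowledgements (`+`/`-`), payload escaping and the actual command handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sum of all payload bytes modulo 256, as the GDB remote protocol requires. */
unsigned char rsp_checksum(const char *payload)
{
    unsigned char sum = 0;
    while (*payload)
        sum = (unsigned char)(sum + (unsigned char)*payload++);
    return sum;
}

/*
 * Write "$<payload>#<hh>" into out (NUL-terminated).
 * Returns the packet length, or 0 if out is too small.
 * Hypothetical helper for illustration; payload escaping is not handled.
 */
size_t rsp_frame(const char *payload, char *out, size_t cap)
{
    static const char hex[] = "0123456789abcdef";
    size_t len;
    unsigned char sum;

    assert(payload != NULL && out != NULL);
    len = strlen(payload);
    if (cap < len + 5)            /* '$' + payload + '#' + 2 hex digits + NUL */
        return 0;
    sum = rsp_checksum(payload);
    out[0] = '$';
    memcpy(out + 1, payload, len);
    out[len + 1] = '#';
    out[len + 2] = hex[sum >> 4];
    out[len + 3] = hex[sum & 15];
    out[len + 4] = '\0';
    return len + 4;
}
```

On the Pi, each byte of the framed packet would then be pushed out through the serial routine the kernel already has.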

Cheers,
bzt

pxlnpx
Posts: 5
Joined: Sat Feb 02, 2019 6:14 pm

Re: Yet Another Bare Metal Tutorial for the RPi3

Sat Feb 02, 2019 7:07 pm

bzt wrote:
Wed Oct 31, 2018 11:16 pm
3. most complicated, but most promising for fully featured IDE integration, is to compile a gdb remote stub into your kernel. In theory gdbserver has a patch for AArch64, but honestly I haven't tried that. There's also a (not very helpful) description on ARM info center on how to use JTAG in virtual ethernet/tty mode with gdb.
Being new to binutils and related tools, a quick bare-metal debugging-related question: how to modify the ‘link.ld’ script in bzt’s tutorials in order to contiguously include all debugging info (i.e. dwarf .debug_* sections) and how would e.g. a running program later discover its own debug info memory starting address and its size?

bzt
Posts: 301
Joined: Sat Oct 14, 2017 9:57 pm

Re: Yet Another Bare Metal Tutorial for the RPi3

Mon Feb 04, 2019 12:40 am

pxlnpx wrote:
Sat Feb 02, 2019 7:07 pm
Being new to binutils and related tools, a quick bare-metal debugging-related question: how to modify the ‘link.ld’ script in bzt’s tutorials in order to contiguously include all debugging info (i.e. dwarf .debug_* sections) and how would e.g. a running program later discover its own debug info memory starting address and its size?
Add "-g" to CFLAGS in Makefile. Then load "kernel8.elf" into gdb while running "kernel8.img" as usual. I think that's all (the elf executable already contains the starting address and size for each segment, and "-g" adds dwarf sections and symbol translations to it). You can add a new segment in link.ld for debug info, but shouldn't be needed. Normally they are just appended to the text segment. If memory serves gdb may need some gnu specific sections though (like gnu hash I think), so you should remove (.gnu*) from the DISCARD rule just to be on the safe side.

Cheers,
bzt

pxlnpx
Posts: 5
Joined: Sat Feb 02, 2019 6:14 pm

Re: Yet Another Bare Metal Tutorial for the RPi3

Tue Feb 05, 2019 8:39 pm

bzt wrote:
Mon Feb 04, 2019 12:40 am
... You can add a new segment in link.ld for debug info, but shouldn't be needed. Normally they are just appended to the text segment. ...
Thanks for this hint; I just added

Code:

    .debug_info 0 : {
        __debug_info_start = .;
        *(.debug_info)
        __debug_info_end = .;
    }
__debug_info_size = SIZEOF(.debug_info);
for all .debug_* sections and 'objdump' shows the linker now does what one expects. Unfortunately

Code:

    objcopy -O binary kernel8.elf kernel8.img
ruins this success by arbitrarily removing all debugging info (one does not have to specify '-R' or '-g' for this to happen, even passing '-debugging' to 'objcopy' in the hope the debug sections simply get flushed into the binary doesn't change anything here.)

Is there any alternative tool to turn ELF files into binary without losing any sections?
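
On the earlier question of how a running program could find its own debug info: symbols defined in a linker script, such as `__debug_info_start` and `__debug_info_end`, are visible from C as externs whose address is the value of interest. Below is a minimal sketch, assuming the section really ends up in the loaded image at a real address. With an output address of 0, as in the fragment in this post, the start symbol evaluates to 0, so this only becomes useful once the section is placed somewhere loadable. `struct region` and `make_region` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* A contiguous memory range, e.g. one .debug_* section. */
struct region {
    const unsigned char *start;
    size_t size;
};

/* Build a region from a pair of start/end addresses. */
struct region make_region(const unsigned char *start, const unsigned char *end)
{
    struct region r;
    assert(end >= start);
    r.start = start;
    r.size = (size_t)(end - start);
    return r;
}

/*
 * On the target, with symbols provided by the linker script
 * (the symbols hold no data; only their addresses are meaningful):
 *
 *   extern const unsigned char __debug_info_start[];
 *   extern const unsigned char __debug_info_end[];
 *   extern const unsigned char __debug_info_size[];
 *
 *   struct region dbg = make_region(__debug_info_start, __debug_info_end);
 *   size_t size_from_symbol = (size_t)__debug_info_size;
 */
```

Declaring the symbols as arrays avoids accidentally reading the memory they point at when only the address is wanted.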

bzt
Posts: 301
Joined: Sat Oct 14, 2017 9:57 pm

Re: Yet Another Bare Metal Tutorial for the RPi3

Wed Feb 06, 2019 9:57 am

pxlnpx wrote:
Tue Feb 05, 2019 8:39 pm
...ruins this success by arbitrarily removing all debugging info
This shouldn't be a problem. You run the .img, true, but you load the .elf (with all the sections) into gdb. Therefore gdb will be able to read the debug information and the symbols even though the running .img doesn't have them.
But if you want to keep the debug info in the .img regardless, simply put them in the text section after the rodata.

Cheers,
bzt

pxlnpx
Posts: 5
Joined: Sat Feb 02, 2019 6:14 pm

Re: Yet Another Bare Metal Tutorial for the RPi3

Sun Feb 10, 2019 7:54 pm

bzt wrote:
Wed Feb 06, 2019 9:57 am
...But if you want to keep the debug info in the .img regardless, simply put them in the text section after the rodata.
Thanks for this hint, it looks promising.
Though, by this simple copy

Code:

.text : {
    *(.debug_info)
}
the LMA and VMA addresses look different, and the dwarf info unfortunately gets "corrupted" too (it is different when embedded into a .text section). Most probably the corrupted debug info from the .text section gets dumped into the bare-metal image by "objcopy -O binary ..." too (I haven't tested this, though).

pxlnpx
Posts: 5
Joined: Sat Feb 02, 2019 6:14 pm

Re: Yet Another Bare Metal Tutorial for the RPi3

Sun Feb 10, 2019 9:14 pm

In the process of studying the arm processor by following bzt's tutorials an interesting question emerged: can raspberrypi be deployed in a big-endian mode? Just as a toy exercise here is bzt's tutorial "03_uart1" slightly modified to support big-endian. The "start.S" file:

Code:

.section ".text.boot"

.global _start

_start:
    // read cpu id, stop slave cores
    mrs     x1, mpidr_el1
    and     x1, x1, #3
    // cpu id > 0, stop
    cbnz    x1, 1f

    // set stack before our code
    adr     x1, _start
    msr     sp_el1, x1

    // enable AArch64 in EL1
    mov     x2, #(1 << 31)      // AArch64
    orr     x2, x2, #(1 << 1)   // SWIO hardwired on Pi3
    msr     hcr_el2, x2
    mrs     x2, hcr_el2

    // Setup SCTLR access
    mov     x2, #0x0800
#if (defined __BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    movk    x2, #0x33d0, lsl #16
#else
    movk    x2, #0x30d0, lsl #16
#endif
    msr     sctlr_el1, x2

    // change execution level to EL1
    mov     x2, #0x3c4  //  PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT | PSR_MODE_EL1h
    msr     spsr_el2, x2
    adr     x2, 5f
    msr     elr_el2, x2
    eret

5:  mov     sp, x1

    // clear bss
    ldr     x1, =__bss_start
    ldr     w2, =__bss_size
3:  cbz     w2, 4f
    str     xzr, [x1], #8
    sub     w2, w2, #1
    cbnz    w2, 3b

    // jump to C code, should not return
4:  bl      main
    // for failsafe, halt this core too

1:  wfe
    b       1b
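
The two SCTLR_EL1 constants in the listing above differ only in two bits. As far as I can tell from the ARMv8 register description, these are EE (bit 25, endianness of EL1 data accesses and translation table walks) and E0E (bit 24, endianness of EL0 data accesses), on top of the bits that read as one on ARMv8.0. A small sketch that rebuilds both values from named bits follows; the macro and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_EL1_RES1  0x30d00800u   /* bits 29, 28, 23, 22, 20, 11 */
#define SCTLR_EL1_E0E   (1u << 24)    /* EL0 data accesses are big-endian */
#define SCTLR_EL1_EE    (1u << 25)    /* EL1 data accesses and table walks are big-endian */

/* Value to write to SCTLR_EL1 for little-endian (0) or big-endian (1) operation. */
uint32_t sctlr_el1_value(int big_endian)
{
    assert(big_endian == 0 || big_endian == 1);
    return SCTLR_EL1_RES1 | (big_endian ? (SCTLR_EL1_EE | SCTLR_EL1_E0E) : 0u);
}
```

With `big_endian` set, the result matches the `0x33d0` upper half used with `movk` in the big-endian branch; otherwise it matches `0x30d0`.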
(From the "_start:" label until the "eret" instruction we possibly execute big-endian code on a little-endian cpu.) The mailbox-interface apparently is only little-endian by design, so "uart.c" had to be refactored a little:

Code:

#include "gpio.h"

/* Auxiliary mini UART registers */
#define AUX_ENABLE      ((volatile unsigned int*)(MMIO_BASE+0x00215004))
#define AUX_MU_IO       ((volatile unsigned int*)(MMIO_BASE+0x00215040))
#define AUX_MU_IER      ((volatile unsigned int*)(MMIO_BASE+0x00215044))
#define AUX_MU_IIR      ((volatile unsigned int*)(MMIO_BASE+0x00215048))
#define AUX_MU_LCR      ((volatile unsigned int*)(MMIO_BASE+0x0021504C))
#define AUX_MU_MCR      ((volatile unsigned int*)(MMIO_BASE+0x00215050))
#define AUX_MU_LSR      ((volatile unsigned int*)(MMIO_BASE+0x00215054))
#define AUX_MU_MSR      ((volatile unsigned int*)(MMIO_BASE+0x00215058))
#define AUX_MU_SCRATCH  ((volatile unsigned int*)(MMIO_BASE+0x0021505C))
#define AUX_MU_CNTL     ((volatile unsigned int*)(MMIO_BASE+0x00215060))
#define AUX_MU_STAT     ((volatile unsigned int*)(MMIO_BASE+0x00215064))
#define AUX_MU_BAUD     ((volatile unsigned int*)(MMIO_BASE+0x00215068))

unsigned int get32le (volatile unsigned int *p)
{
#if (defined __BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    return __builtin_bswap32 (*p);
#else
    return (*p);
#endif
}

void put32le (volatile unsigned int *p, unsigned int i)
{
#if (defined __BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    (*p) = __builtin_bswap32 (i);
#else
    (*p) = i;
#endif
}

/**
 * Set baud rate and characteristics (115200 8N1) and map to GPIO
 */
void uart_init (void)
{
    register unsigned int r;

    /* initialize UART */
    put32le (AUX_ENABLE, get32le (AUX_ENABLE) | 1);  // enable UART1, AUX mini uart
    put32le (AUX_MU_CNTL, 0);
    put32le (AUX_MU_LCR, 3);  // 8 bits
    put32le (AUX_MU_MCR, 0);
    put32le (AUX_MU_IER, 0);
    put32le (AUX_MU_IIR, 0xC6);  // disable interrupts
    put32le (AUX_MU_BAUD, 270);  // 115200 baud

    /* map UART1 to GPIO pins */
    r  = get32le (GPFSEL1);
    r &= ~((7<<12) | (7<<15));  // gpio14, gpio15
    r |=  ((2<<12) | (2<<15));  // alt5
    put32le (GPFSEL1, r);

    put32le (GPPUD, 0);  // enable pins 14 and 15
    r = 150; while (r--) { asm volatile ("nop"); }
    put32le (GPPUDCLK0, (1<<14) | (1<<15));
    r = 150; while (r--) { asm volatile ("nop"); }
    put32le (GPPUDCLK0, 0);  // flush GPIO setup
    put32le (AUX_MU_CNTL, 3);  // enable Tx, Rx
}

/**
 * Send a character
 */
void uart_send (unsigned int c)
{
    /* wait until we can send */
    do { asm volatile ("nop"); } while (!(get32le (AUX_MU_LSR) & 0x20));
    /* write the character to the buffer */
    put32le (AUX_MU_IO, c);
}

/**
 * Receive a character
 */
char uart_getc (void)
{
    char r;
    /* wait until something is in the buffer */
    do { asm volatile ("nop"); } while (!(get32le (AUX_MU_LSR) & 0x01));
    /* read it and return */
    r = (char)(get32le (AUX_MU_IO));
    /* convert carriage return to newline */
    return r == '\r' ? '\n' : r;
}

/**
 * Display a string
 */
void uart_puts (char *s)
{
    while (*s) {
        /* convert newline to carriage return + newline */
        if (*s == '\n')
            uart_send ('\r');
        uart_send (*s++);
    }
}
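
The value 270 written to AUX_MU_BAUD in the listing above follows from the mini UART formula in the BCM2835 peripherals documentation: baudrate = system_clock / (8 * (baud_reg + 1)). Here is a small sketch of that arithmetic, assuming a 250 MHz core clock (the figure usually quoted for the Pi 3; the real clock depends on the firmware configuration). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Divisor register value for the requested baud rate (rounded down). */
uint32_t mini_uart_baud_reg(uint32_t core_clock_hz, uint32_t baud)
{
    assert(baud != 0 && core_clock_hz >= 8u * baud);
    return core_clock_hz / (8u * baud) - 1u;
}

/* Baud rate the hardware actually produces for a given register value. */
uint32_t mini_uart_actual_baud(uint32_t core_clock_hz, uint32_t reg)
{
    return core_clock_hz / (8u * (reg + 1u));
}
```

With 250 MHz and 115200 baud this yields 270, and the resulting real rate is about 115313 baud, which is close enough for 8N1 serial links.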
Compiled using

Code:

    gcc -ffreestanding -nostdinc -nostdlib -nostartfiles -mbig-endian -o kernel8.elf -T link.ld start.S main.c uart.c
    objcopy -O binary kernel8.elf kernel8.img
There are still a few open questions:
  • could start.S be reduced in the sense of fewer assembler instructions?
  • what is the purpose of "msr sp_el1, x1", or why do we need "mov sp, x1"?
  • is it correct that cores 1, 2, 3 stay in EL2?
  • how power-consuming is "1: b 1b"?
  • the mailbox-interface is certainly not thread-safe? (Just for the case we later invent main0(), ..., main3() for cores 0, ..., 3 to enter, and just want to access mailbox from any of them.)
Sorry for this big post. :-)

bzt
Posts: 301
Joined: Sat Oct 14, 2017 9:57 pm

Re: Yet Another Bare Metal Tutorial for the RPi3

Fri Feb 15, 2019 2:48 pm

pxlnpx wrote: the dwarf info unfortunately gets "corrupted"
I'm sorry, I can't give you clear-cut instructions here. The best advice I can give you is to try different configurations until the addresses in the dwarf info become correct. Having debug info in the elf only and not in the img shouldn't be a problem, so including the debug sections in the text segment is just optional.
pxlnpx wrote: In the process of studying the arm processor by following bzt's tutorials an interesting question emerged: can raspberrypi be deployed in a big-endian mode?
Yes, and you have already done that :-) It's worth mentioning that if you implement virtual memory, you must set big-endian enable bits in the paging tables too.
pxlnpx wrote: There are still few open questions:
  • could start.S be reduced in the sense of fewer assembler instructions?
Probably. My goal was to create easily distinguishable blocks for education, not optimization. Besides, _start is quite small and only executed once during boot, so optimizing it isn't worth it imho.
pxlnpx wrote:
  • what is the purpose of "msr sp_el1, x1", or why do we need "mov sp, x1"?
The first one sets the stack for exception handlers (while running in EL2), the second one sets the current sp (running in EL1). There's a good chance you never start your kernel at EL1, therefore the eret is always executed, and sp is always loaded from sp_el1.
pxlnpx wrote:
  • is it correct that cores 1, 2, 3 stay in EL2?
Yes. Simplicity was my goal. For a full-blown implementation, take a look at my bootloader's boot.S. It sets up all cores (EL1, virtual mappings, etc.), loads an ELF from an initrd, maps it in the higher half (-2M) and starts executing it on all cores. Except for the stack, all cores are initialized equally.
pxlnpx wrote:
  • how power-consuming is "1: b 1b"?
Very much. It's generating 100% CPU usage. :-) Use "1: wfe; b 1b" instead. The Wait For Event instruction puts the cpu core in a low energy consumption mode until it receives an interrupt (or some other similar event).
pxlnpx wrote:
  • the mailbox-interface is certainly not thread-safe? (Just for the case we later invent main0(), ..., main3() for cores 0, ..., 3 to enter, and just want to access mailbox from any of them.)
No, it isn't thread-safe; you should implement an exclusive access mechanism for it. The simplest is a spinlock, so that only one CPU can write the mailbox MMIO address at any given time. By the way, the same goes for all MMIO addresses. It's not healthy either if several CPUs try to write the same UART registers concurrently, for example.
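
A spinlock of the kind mentioned above can be sketched with C11 atomics. This is only an illustration with hypothetical names (`spinlock_t`, `spin_lock`, and so on), not code from the tutorials. On bare-metal AArch64 the exclusive load/store instructions that such atomics compile to generally want the MMU and caches configured first, so treat that as an assumption to verify on real hardware. The `assert` calls are there for host-side testing only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    atomic_flag locked;
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try to take the lock once; returns 1 on success, 0 if it is already held. */
int spin_trylock(spinlock_t *l)
{
    assert(l != NULL);
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Busy-wait until the lock is ours. */
void spin_lock(spinlock_t *l)
{
    while (!spin_trylock(l)) {
        /* spin */
    }
}

/* Release the lock so that another core can take it. */
void spin_unlock(spinlock_t *l)
{
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Each core would call `spin_lock` on one shared lock before touching the mailbox (or UART) registers and `spin_unlock` afterwards, so that only one core is inside that region at a time.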
pxlnpx wrote:Sorry for this big post. :-)
Don't you worry :-)

Cheers,
bzt

bzt
Posts: 301
Joined: Sat Oct 14, 2017 9:57 pm

Re: Yet Another Bare Metal Tutorial for the RPi3

Fri Feb 15, 2019 8:01 pm

pxlnpx wrote:
Sun Feb 10, 2019 7:54 pm
the LMA and VMA addresses look different, and the dwarf info unfortunately gets "corrupted" too (is different if embedded into a .text section). Most probably the corrupted debug-info from the .text section gets dumped into the baremetal image by "objcopy -O binary ..." too (haven't tested though).
Hi,

I've tested this for you with my bootloader. These are the steps I've done:
1. I've added "-g" to aarch64-elf-gcc in the Makefile (to generate debug info)
2. I haven't changed anything in the linker script
3. The bootboot.elf's size increased significantly, but bootboot.img remained the same (text segment unchanged)

Now in one terminal, I've started qemu like this (-S stops guest execution at startup, -s starts the built-in gdb-server on TCP port 1234):

Code:

$ qemu-system-aarch64 -s -S -M raspi3 -kernel bootboot.img

In another terminal, I've started the cross-platform gdb that my distro ships:

Code:

$ aarch64-linux-gnu-gdb
Then in the gdb prompt, I've typed

Code:

(gdb) set architecture aarch64
The target architecture is assumed to be aarch64
To set AArch64 architecture.

Code:

(gdb) target remote localhost:1234
Remote debugging using localhost:1234
warning: No executable has been specified and target does not support
determining executable automatically. Try using the "file" command.
0x0000000000000000 in ??()
To connect the gdb to the gdb-server in qemu.

Code:

(gdb) symbol-file bootboot.elf
Reading symbols from bootboot.elf...done.
To load the symbols and debugging information from the elf file.

Code:

(gdb) display/i $pc
1: x/i $pc
=> 0x0: ldr     x0, 0x18
I've used this because I like to see what the machine code is doing :-)

Code:

(gdb) break bootboot_main
Breakpoint 1 at 0x840c0: file bootboot.c, line 1110.
As a test, I've set up a breakpoint at one of the C functions. As you can see, the address was read from the elf symbols correctly, and the debug info provided the source file and line also correctly.

Code:

(gdb) c
Continuing.

Thread 1 hit Breakpoint 1, bootboot_main (hcl=2147483650) at bootboot.c:1110
1110    {
1: x/i $pc
=> 0x840c0 <bootboot_main>:     sub      sp, sp, #0xa90
Finally with "c" (stands for continue), I have started the virtual machine. The execution then stopped at my function as expected, showing the source line (that's just a "{" block opening in this case) and the first instruction.

For a better view, type "layout split". That will show you several lines from the source file with the disassembled instructions in parallel above the gdb prompt, like this (note this is just an example I found on the internet, it's actually x86):
[Image: screenshot of gdb's "layout split" view]

Hope this helps.

Cheers,
bzt
