banspri
Posts: 24
Joined: Tue Jun 27, 2017 11:02 pm

Run all 4 cores Raspberry Pi 3

Wed Jul 19, 2017 7:21 pm

I'm working on what is essentially a bare metal operating system, and I'm trying to make use of all 4 cores on the Raspberry Pi 3. Currently only core 0 is running. Doing a little bit of Googling, I found that it is possible that cores 1-3 aren't starting at _start (which is the section I've declared to be the start of the kernel). Why is this and how can I fix this?

LdB
Posts: 1207
Joined: Wed Dec 07, 2016 2:29 pm

Re: Run all 4 cores Raspberry Pi 3

Wed Jul 19, 2017 7:45 pm

They do start; they are just parked. See the thread linked below. You need to give them something to do.
https://www.raspberrypi.org/forums/view ... 2&t=188720

banspri
Posts: 24
Joined: Tue Jun 27, 2017 11:02 pm

Re: Run all 4 cores Raspberry Pi 3

Thu Jul 20, 2017 4:00 pm

I've checked out the other forum and I want to clarify a few things.

If I add kernel_old=1 to my config.txt file, does the code that the bootloader uses to park the cores not run? I tried adding that to my config file and my code decided to not run at all (probably because kernel_old does more than just bypass the parking of cores...) I tried specifying my code to start at 0x0, since I believe you said that's where the code should start if I add kernel_old=1, but my code still didn't run, so I'll probably give up on using kernel_old.

If I don't add kernel_old=1, does that mean that the bootloader is waiting for a specific number, namely an address, to be put at a specific location (0x4000009C for core 1)? I tried adding these two lines of code:

Code: Select all

        long *core1 = 1073741980;
        *core1 = somerand;
as well as this method:

Code: Select all

void somerand(void)
{
        while(1)
                kprintf("dude");
}
and the rest of the code ran fine, but this method didn't run. Did I write those pieces of code wrong? Does it matter whether I use decimal or hex? Should the pointer be something other than a long pointer?
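On the decimal-versus-hex question: both spellings name the same number, so that part alone should not change what gets written. A minimal sketch that checks this on any host compiler; `CORE1_MAILBOX_ADDR` and `same_address` are names made up for illustration.

```c
#include <stdint.h>

/* Hypothetical name; the address is the one quoted in the post above. */
#define CORE1_MAILBOX_ADDR 0x4000009CU

/* Returns 1 when the decimal literal from the post names the same address. */
int same_address(void)
{
    return (uint32_t)1073741980U == (uint32_t)CORE1_MAILBOX_ADDR;
}
```

Hex is still the friendlier choice here, because register addresses in the documentation are given in hex.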

banspri
Posts: 24
Joined: Tue Jun 27, 2017 11:02 pm

Re: Run all 4 cores Raspberry Pi 3

Thu Jul 20, 2017 4:10 pm

I also have this in my start.S file:

Code: Select all

 
        mrc p15, 0, r0, c0, c0, 5
        mov r1, #3
        and r1, r0, r1
        cmp r1, #0
        beq nulluser
        cmp r1, #1
        beq _startcore1
        cmp r1, #2
        beq _startcore2
        cmp r1, #3
        beq _startcore3
_startcore1:
        b       testcore1

_startcore2:
        b       testcore2

_startcore3:
        b       testcore3
just in case the cores aren't parked. The testcore methods above are simply as follows:

Code: Select all

void testcore1(void)
{
        kprintf("I am core 1\r\n");
}

void testcore2(void)
{
        while(1)
                kprintf("I am core 2\r\n");
}

void testcore3(void)
{
        while(1)
                kprintf("I am core 3\r\n");
}

void testcore4(void)
{
        while(1)
                kprintf("I am core 4\r\n");
}
I'm a little worried that, if cores 1-3 run before core 0, the testcore methods won't print simply because nulluser wasn't called, and nulluser initializes the serial port. However, I tried branching all the cores to nulluser and it seems the function ran only once.
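For reference, the core-number test in the start.S fragment can be sketched in C as below. This assumes, as the assembly does, that the low two bits of MPIDR identify the core; `core_id_from_mpidr` is a hypothetical helper name.

```c
#include <stdint.h>

/* Low two bits of MPIDR give the core number on this quad-core part,
   mirroring the "and r1, r0, #3" step in the assembly above. */
unsigned core_id_from_mpidr(uint32_t mpidr)
{
    return (unsigned)(mpidr & 3U);
}
```

On the Pi the MPIDR value itself would come from the `mrc p15, 0, r0, c0, c0, 5` instruction; the helper only does the masking.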

LdB
Posts: 1207
Joined: Wed Dec 07, 2016 2:29 pm

Re: Run all 4 cores Raspberry Pi 3

Thu Jul 20, 2017 6:27 pm

banspri wrote:If I add kernel_old=1 to my config.txt file, does the code that the bootloader uses to park the cores not run?
Correct, it does not; you will see all 4 cores.
banspri wrote: I tried adding that to my config file and my code decided to not run at all (probably because kernel_old does more than just bypass the parking of cores...) I tried specifying my code to start at 0x0, since I believe you said that's where the code should start if I add kernel_old=1, but my code still didn't run, so I'll probably give up on using kernel_old.
All four cores will come to you at 0x0000, and they arrive in EL3 mode, not EL2 as they do with the bootloader. I suspect your code later expects the processor in EL2, hence the crash.
banspri wrote:If I don't add kernel_old=1, does that mean that the bootloader is waiting for a specific number, namely an address, to be put at a specific location (0x4000009C for core 1)? I tried adding these two lines of code:
Correct.

Code: Select all

 long *core1 = 1073741980;
        *core1 = somerand;
Right idea bad C code :-)

Let me make it easier for you

Code: Select all

#include <stdint.h>
#define CORE1_MAILBOX ((volatile __attribute__((aligned(4))) uint32_t*) (0x4000009C))

void somerand(void) __attribute__((naked));
void somerand(void)
{
        while(1)
                kprintf("dude");
}

/* use */
*CORE1_MAILBOX = (intptr_t)&somerand;
However I suspect there is a problem and I need David and his debugger to look at the loaded stub in a Pi3 in 32bit.
https://github.com/raspberrypi/tools/bl ... armstub7.S

Code: Select all

ldr	r5, mbox		@ mbox      // Mbox is 0x4000008C
	mov	r3, #0			@ magic

	add	r5, #(0x400000CC-0x4000008C)	@ mbox // This adds 0x40  so R5 = 0x400000CC
1:
	ldr	r4, [r5, r0, lsl #4]           // R0 is core ID ... I believe translation is load 0x400000CC + 0x10 * ID
That, I think, says the stub polls 0x400000DC, 0x400000EC and 0x400000FC for cores 1 to 3.

As the arm site says
LDR R0, [R1, R2, LSL #4]
translates as r0 = r1[(r2 << 4)];

UPDATE .. found the info .. and that is right:
https://www.raspberrypi.org/documentati ... rev3.4.pdf

So the stub reads and then clears mailbox 3 of each core through that core's read/clear address. You need to write to mailbox 3 through its write-set address, which for core 1 is 0x4000009C, so that address was correct.
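The address arithmetic above can be written out as a small sketch. The base, offsets and stride are taken from the stub excerpt and the mailbox layout discussed in this post; the macro and function names are made up for illustration.

```c
#include <stdint.h>

#define LOCAL_PERIPH_BASE  0x40000000U /* ARM-local register block */
#define MBOX3_SET_OFFSET   0x8CU       /* core 0, mailbox 3, write-set */
#define MBOX3_RDCLR_OFFSET 0xCCU       /* core 0, mailbox 3, read/clear */
#define PER_CORE_STRIDE    0x10U       /* same as "r0, lsl #4" in the stub */

/* Address the waking core writes the entry point to. */
uint32_t mbox3_set_addr(unsigned core)
{
    return LOCAL_PERIPH_BASE + MBOX3_SET_OFFSET + PER_CORE_STRIDE * core;
}

/* Address the parked core polls, then writes back to clear. */
uint32_t mbox3_rdclr_addr(unsigned core)
{
    return LOCAL_PERIPH_BASE + MBOX3_RDCLR_OFFSET + PER_CORE_STRIDE * core;
}
```

For core 1 this gives 0x4000009C on the writing side and 0x400000DC on the polling side, which is why the stub's addresses and the address you write to look different.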

I gave it a quick try and it didn't work so I think we need to ask David and his debugger to look at the loaded stub in a Pi3 in 32bit.

dwelch67
Posts: 955
Joined: Sat May 26, 2012 5:32 pm

Re: Run all 4 cores Raspberry Pi 3

Thu Jul 20, 2017 8:57 pm

LdB what did you do to get out of the WFE?

banspri
Posts: 24
Joined: Tue Jun 27, 2017 11:02 pm

Re: Run all 4 cores Raspberry Pi 3

Thu Jul 20, 2017 10:10 pm

LdB wrote:Right idea bad C code :-)

Let me make it easier for you

Code: Select all

#include <stdint.h>
#define CORE1_MAILBOX ((volatile __attribute__((aligned(4))) uint32_t*) (0x4000009C))

void somerand(void) __attribute__((naked));
void somerand(void)
{
        while(1)
                kprintf("dude");
}

/* use */
*CORE1_MAILBOX = (intptr_t)&somerand;

I know you said this didn't quite work, but I wanted to try it anyway. I couldn't find intptr_t in my stdint.h. What's it typedef'd to?

banspri
Posts: 24
Joined: Tue Jun 27, 2017 11:02 pm

Re: Run all 4 cores Raspberry Pi 3

Thu Jul 20, 2017 10:36 pm

I was having this problem with my old code and it looks like I'm still having the same problem now. Even though I have the code

Code: Select all

*CORE1_MAILBOX = (intptr_t)&somerand;
, when I print out (intptr_t)&somerand I get 40912 (the same value as when I simply print somerand), yet when I print *CORE1_MAILBOX I get 181010432.

My apologies for using decimal instead of hex. I'm just more used to typing %d than typing %x.
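For anyone following along, the two decimal values quoted above are easier to read in hex. A minimal sketch of the conversion; `format_hex32` is a hypothetical helper, and on the Pi a `%x` conversion in kprintf would do the same job if it supports one.

```c
#include <stdio.h>
#include <stdint.h>

/* Format a 32-bit value as "0x" plus eight hex digits.
   The buffer needs 11 bytes: 10 characters and the terminator. */
void format_hex32(uint32_t value, char out[11])
{
    snprintf(out, 11, "0x%08X", (unsigned)value);
}
```

With this, 40912 comes out as 0x00009FD0 and 181010432 as 0x0ACA0000.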

banspri
Posts: 24
Joined: Tue Jun 27, 2017 11:02 pm

Re: Run all 4 cores Raspberry Pi 3

Thu Jul 20, 2017 10:43 pm

181010432 is also the value I get for *CORE1_MAILBOX before I assign (intptr_t)&somerand to it. I had the same issue earlier when I was just assigning the int "8" to the pointer I had called core1 above. I believe the value in the pointer was similar also, if not the same. When I assigned a different value to the pointer, like the integer "9", the value in *core1 was 0.

banspri
Posts: 24
Joined: Tue Jun 27, 2017 11:02 pm

Re: Run all 4 cores Raspberry Pi 3

Thu Jul 20, 2017 11:00 pm

What's even more interesting is if I do the same thing to a pointer with a random value, *(name of pointer) is what I expect it to be.

banspri
Posts: 24
Joined: Tue Jun 27, 2017 11:02 pm

Re: Run all 4 cores Raspberry Pi 3

Thu Jul 20, 2017 11:44 pm

Realllllly don't know what changed, but I'm using the same code I was using before, with

Code: Select all

        long *core1 = 1073741980;
        *core1 = somerand;
and now I seem to be getting somerand in *core1. Core 1 still doesn't seem to be running, though, and the code you provided continues to make it seem that the value the pointer points to doesn't change. And this non-changingness still seems to be happening only to the pointer to the mailbox... If I change the value of the pointer to something random I still get the expected result from *pointer

LdB
Posts: 1207
Joined: Wed Dec 07, 2016 2:29 pm

Re: Run all 4 cores Raspberry Pi 3

Fri Jul 21, 2017 2:34 am

intptr_t is defined under C99 as a signed integer type capable of holding a pointer. So I am telling the compiler to convert the address of somerand to an integer (wide enough for any pointer) so I can safely assign it through the mailbox pointer.
So your compiler must be running C89, in which case replace it with uint32_t, which is the type CORE1_MAILBOX points to:
*CORE1_MAILBOX = (uint32_t)&somerand;
The alternative is to change the language version with a compiler flag if available, e.g. -std=c99 or -std=c11 on GCC.

It looks wrong to me on a 64-bit compiler because it creates a 32-bit address on a 64-bit address system, but it's right given that the hardware can only take a 32-bit address and the code won't work on a Pi3 in AArch64 anyway. So let's settle on that change.
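One way to make the 32-bit assumption explicit rather than silent is to check the address before narrowing it. This is only a sketch; `addr_to_mailbox_word` is a hypothetical helper, and it assumes stdint.h provides uintptr_t (it is optional in the standard, though common).

```c
#include <stdint.h>

/* Stores addr as a 32-bit mailbox word and returns 1, or returns 0
   without touching *word if the address does not fit in 32 bits. */
int addr_to_mailbox_word(uintptr_t addr, uint32_t *word)
{
    if ((uint64_t)addr > UINT32_MAX)
        return 0;
    *word = (uint32_t)addr;
    return 1;
}
```

A caller would pass `(uintptr_t)&somerand`; converting a function pointer to an integer is implementation-defined, but it behaves as expected on the ARM toolchains discussed here.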

Now this is rather long but I want to run thru your code.
long *core1 = 1073741980;
*core1 = somerand;

Line one says set the long pointer called core1 to value 1073741980.
Inherent in that are multiple problems and lots ride on the specific compiler
The second line won't compile on any C compiler but some C++ compilers will let it thru

So lets deal with why:

First, the size of long is compiler specific: it could be 32 bits or 64 bits, and that varies from compiler to compiler.
The value 1073741980 is a literal, and what that means is again compiler dependent: it could be 16, 32 or 64 bits, but on a Pi probably one of the latter two.

You could at least have made the literal slightly safer by placing a lower-case L after it so the compiler knows it's a long literal.
long *core1 = 1073741980l;

The other line will not compile on a C compiler: an address is always prefixed by the & symbol, and without it you have a type violation. core1 would have to be a function pointer whose prototype matched somerand for that to be valid, and that isn't what you wanted anyhow. A C++ compiler might be smart enough to work out that you mean the address after first realizing that the alternative is an error. I am on a C compiler, so what you have written is ILLEGAL and won't compile. So what it tells me is that you are on a C++ compiler and that isn't C code.

That is easy enough to prove: turn on all the warnings with -Wall (and -Werror to make them fatal) on any GCC compiler and I guarantee you it will refuse to compile, for very good reasons.

Now even if your code compiles you are praying a long is 32bits and the literal is 32bits.
I use 6 different compilers on the Pi for various reasons the easiest being I write both 32 and 64 bit code and on 3 of those compilers one or both of those assumptions are wrong.

So I have absolutely no way of knowing what your code sends to the hardware it is specific to your C++ compiler and basically is useless to me or anyone else unless they are using the exact compiler as you. The hardware register is fixed it is 32bits wide and must be accessed as such so you need to be far more careful when writing code to target hardware.
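The width concern is easy to check on whichever compiler is in use. A minimal sketch with hypothetical helper names:

```c
#include <limits.h>
#include <stdint.h>

/* Width of long in bits for the compiler building this file.
   It is typically 32 on 32-bit ARM toolchains and 64 on LP64 ones. */
unsigned long_width_bits(void)
{
    return (unsigned)(sizeof(long) * CHAR_BIT);
}

/* uint32_t is exactly 32 bits wherever the type exists. */
unsigned u32_width_bits(void)
{
    return (unsigned)(sizeof(uint32_t) * CHAR_BIT);
}
```

Printing both once at boot shows whether a `long *` access matches the 32-bit register width on a given toolchain.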

The fact my code produces a different result to yours tells you that your code is wrong because my code is completely portable (aside from the attributes) and produces the same result on any compiler. To show you what my code produces here it is, comments are mine

Code: Select all

  86 003c 0121A0E3 		mov	r2, #1073741824     //  r2 = 0x4000 0000
  87 0044 003000E3 		movw	r3, #:lower16:somerand
  88 0048 003040E3 		movt	r3, #:upper16:somerand  //  r3 = address of somerand
  89 0050 9C3082E5 		str	r3, [r2, #156]                  // write r3 to address r2+156 (0x9c)
That is it does exactly what I expected it to do :-)

I am not going thru this to be smart but explain I am a commercial programmer and I can't write the loose and free code you did because I have no way of knowing what it does and it varies between all my compilers.

So things aren't failing because of the C code but because something else is wrong: either the stub is different, or the bus at 0x40000000 isn't set up, or the like. I am pretty sure (read: certain) Ultibo has the cores on his Pi3 system working, as he is pretty damn good at this stuff, so I will do a ferret in his code and see what's wrong. If the post gods are on our side he may drift through, see it and tell us.

dwelch67
Posts: 955
Joined: Sat May 26, 2012 5:32 pm

Re: Run all 4 cores Raspberry Pi 3

Fri Jul 21, 2017 3:02 pm

Wow I would hate to hear what you say about my code, dont know if that is a grin or a frown. I am well aware of my portability as well as lack of issues though and where to look when the code fails...but some folks like my style and some folks really hate my style, so I get it...

Trying to get motivated to do stuff this week, no desire to play on boards (come home from work and have just enough energy to veg and watch tv).

Not quite sure where you folks are stuck. The pi3 in aarch64 mode, no config.txt uses those mailboxes we think/assume, but there is a wfe in the loop so you have to kick it with an event or research what you already looked into but take it further on the register that affects wfe/wfi.

that or the easy way is boot with a config.txt and sort the cores yourself into whatever address you want with no wfe/wfi.

or are you running 32 bit mode without a config.txt that should just work with the addresses yes? did on the pi2 and the pi3 in 32 bit mode I thought was supposed to be somewhat compatible, but I have not dumped that code or maybe have and forgot...(to see the real answer in the disassembly).

LdB
Posts: 1207
Joined: Wed Dec 07, 2016 2:29 pm

Re: Run all 4 cores Raspberry Pi 3

Fri Jul 21, 2017 3:16 pm

I wasn't trying to be hard. I have so many compilers on the go that I am having to start invoking work-like discipline. Otherwise every time I move code to another compiler I spend an hour straightening it up .. so I am a bit touchy about it atm :-).

banspri is working 32 bit on the Pi3 no config.txt and I just got around to that and looking at what Ultibo did because he has the cores switching. I can't see anything with the core or register setup but he does a pile of work on the cache setup which I am trawling thru. I would love a debug of the Pi3 32 bit stub if you get a chance. I can read the stub but it's just hex I haven't found a way to disassemble the hex.

I found the WFE/WFI are real so definitely implemented. WFE will only come out with a SEV and I haven't worked out how to route interrupts for WFI because I haven't worked out where the core mailbox is :-)

I have a new play tool for you to look at which I will get up on GITHUB which does the 64/32 bit app thing. Anyhow just starting out for the night so see what I can get done.

banspri
Posts: 24
Joined: Tue Jun 27, 2017 11:02 pm

Re: Run all 4 cores Raspberry Pi 3

Fri Jul 21, 2017 4:06 pm

Thanks for all the info, LdB. I am a student, so it's helpful to know these things. I'm assuming that the reason my code "worked" (it assigned a value where I wanted it to) was because it was wrong. It seems to me that values simply can't be assigned to the mailboxes, though they can be assigned to random locations in memory. I'll play around some more and see if I'm having the same issue with the mailboxes for cores 2 and 3.

banspri
Posts: 24
Joined: Tue Jun 27, 2017 11:02 pm

Re: Run all 4 cores Raspberry Pi 3

Fri Jul 21, 2017 4:35 pm

So confused... So here's the information I have so far:

I can write to 0x400009c (4 zeros instead of 5) and 0x40000cc (again, 4 zeros instead of 5). I can't write to 0x4000006c, 0x4000007c, 0x4000008c, 0x4000009c, 0x400000ac, 0x400000bc, or 0x400000cc. I can read from all these locations, and the values that are in the bigger numbers, both before and after I try writing to it, are as follows:

0x4000006c ~ 0
0x4000007c ~ 0
0x4000008c ~ 181010432
0x4000009c ~ 181010432
0x400000ac ~ 181010432
0x400000bc ~ 181010432
0x400000cc ~ 0

My preliminary conclusions are that there is some range of memory that I cannot write to but can read from. 0x4000008c, 0x4000009c, 0x400000ac, and 0x400000bc seem to be special, so I'm assuming they are all mailboxes for specific cores (not sure what 0x4000008c is for, but I'm assuming it's for core 0).

Any idea why I can write to some locations and not others? Do you seem to be facing the same problems on your end? Perhaps the issue has something to do with my compiler. Not sure why it would be, but if I'm facing this issue and you're not, that's the only thing I can think of, since we're using the same code.
Last edited by banspri on Fri Jul 21, 2017 4:41 pm, edited 1 time in total.

LdB
Posts: 1207
Joined: Wed Dec 07, 2016 2:29 pm

Re: Run all 4 cores Raspberry Pi 3

Fri Jul 21, 2017 4:37 pm

Yes it confused me as well what you need to do is read about the core mailbox on here
https://www.raspberrypi.org/documentati ... rev3.4.pdf

dwelch67
Posts: 955
Joined: Sat May 26, 2012 5:32 pm

Re: Run all 4 cores Raspberry Pi 3

Fri Jul 21, 2017 4:56 pm

This works for 32-bit ARM gcc but not 64-bit. There are generic ways to do it for anything gcc supports, but for 32-bit ARM...

I took my bootloader, but basically any UART example that can print 32-bit hex numbers will do. Build that and have it dump the first 0x100 bytes (atags typically start at 0x100).

Code: Select all

int notmain ( void )
{
    unsigned int ra,rb;
    
    uart_init();
    hexstring(0x12345678);
    hexstring(GETPC());

    for(ra=0;ra<0x100;ra+=4)
    {
        rb=GET32(ra);
        //hexstrings(ra);
        hexstring(rb);
    }

    return(0);
}
That gives something like

Code: Select all

12345678 
000082D8 
EA000008 
0124F800 
E3001131 
EE011F11 
E30001DA 
E16FF000 
E1B0F00E 
00063FFF 
00000C42 
4000008C 
EE110F10 
E3800004 
...
then using text editor magic turn this to

Code: Select all


.word 0xEA000008 
.word 0x0124F800 
.word 0xE3001131 
.word 0xEE011F11 
.word 0xE30001DA 
.word 0xE16FF000 
.word 0xE1B0F00E 
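The "text editor magic" step can also be done by a few lines of code. A sketch, assuming the dumped words are already in hand as 32-bit values; `format_word_line` is a made-up helper name.

```c
#include <stdio.h>
#include <stdint.h>

/* Writes one assembler line of the form ".word 0xXXXXXXXX" plus newline
   into buf and returns the number of characters snprintf produced. */
int format_word_line(uint32_t word, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, ".word 0x%08X\n", (unsigned)word);
}
```

The resulting file can then go through the assembler and disassembler of a 32-bit ARM toolchain (for example `arm-none-eabi-as` followed by `arm-none-eabi-objdump -D`, assuming that toolchain prefix).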

assemble to an object then disassemble

Code: Select all

00000000 <.text>:
   0:	ea000008 	b	28 <.text+0x28>
   4:	0124f800 	msreq	CPSR_s, r0, lsl #16
   8:	e3001131 	movw	r1, #305	; 0x131
   c:	ee011f11 	mcr	15, 0, r1, cr1, cr1, {0}
  10:	e30001da 	movw	r0, #474	; 0x1da
  14:	e16ff000 	msr	SPSR_fsxc, r0
  18:	e1b0f00e 	movs	pc, lr
  1c:	00063fff 	strdeq	r3, [r6], -pc	; <UNPREDICTABLE>
  20:	00000c42 	andeq	r0, r0, r2, asr #24
  24:	4000008c 	andmi	r0, r0, r12, lsl #1
  28:	ee110f10 	mrc	15, 0, r0, cr1, cr0, {0}
  2c:	e3800004 	orr	r0, r0, #4
  30:	e3800a01 	orr	r0, r0, #4096	; 0x1000
  34:	ee010f10 	mcr	15, 0, r0, cr1, cr0, {0}
  38:	ec510f1f 	mrrc	15, 1, r0, r1, cr15
  3c:	e3800040 	orr	r0, r0, #64	; 0x40
  40:	ec410f1f 	mcrr	15, 1, r0, r1, cr15
  44:	e3a00001 	mov	r0, #1
  48:	ee0e0f33 	mcr	15, 0, r0, cr14, cr3, {1}
  4c:	e51f1038 	ldr	r1, [pc, #-56]	; 1c <.text+0x1c>
  50:	ee011f51 	mcr	15, 0, r1, cr1, cr1, {2}
  54:	e51f1058 	ldr	r1, [pc, #-88]	; 4 <.text+0x4>
  58:	ee0e1f10 	mcr	15, 0, r1, cr14, cr0, {0}
  5c:	e24f1064 	sub	r1, pc, #100	; 0x64
  60:	ee0c1f30 	mcr	15, 0, r1, cr12, cr0, {1}
  64:	ee1ccf10 	mrc	15, 0, r12, cr12, cr0, {0}
  68:	f57ff06f 	isb	sy
  6c:	e1600070 	smc	0
  70:	ee0ccf10 	mcr	15, 0, r12, cr12, cr0, {0}
  74:	e59f4080 	ldr	r4, [pc, #128]	; fc <.text+0xfc>
  78:	ee100fb0 	mrc	15, 0, r0, cr0, cr0, {5}
  7c:	e7e10050 	ubfx	r0, r0, #0, #2
  80:	e3500000 	cmp	r0, #0
  84:	0a00000a 	beq	b4 <.text+0xb4>
  88:	e3a05001 	mov	r5, #1
  8c:	e1a05015 	lsl	r5, r5, r0
  90:	e31500ff 	tst	r5, #255	; 0xff
  94:	0a00000a 	beq	c4 <.text+0xc4>
  98:	e51f507c 	ldr	r5, [pc, #-124]	; 24 <.text+0x24>
  9c:	e3a03000 	mov	r3, #0
  a0:	e2855040 	add	r5, r5, #64	; 0x40
  a4:	e7954200 	ldr	r4, [r5, r0, lsl #4]
  a8:	e1540003 	cmp	r4, r3
  ac:	0afffffc 	beq	a4 <.text+0xa4>
  b0:	e7854200 	str	r4, [r5, r0, lsl #4]
  b4:	e3a00000 	mov	r0, #0
  b8:	e51f10a0 	ldr	r1, [pc, #-160]	; 20 <.text+0x20>
  bc:	e59f2034 	ldr	r2, [pc, #52]	; f8 <.text+0xf8>
  c0:	e12fff14 	bx	r4
  c4:	e320f003 	wfi
  c8:	eafffffd 	b	c4 <.text+0xc4>
	...
  fc:	00008000 	andeq	r8, r0, r0
and now I need to go read this and look up all the coprocessor accesses...

super quick glance this could be the code surrounding the loop waiting for non-zero.

Code: Select all

  9c:	e3a03000 	mov	r3, #0
  a0:	e2855040 	add	r5, r5, #64	; 0x40
  a4:	e7954200 	ldr	r4, [r5, r0, lsl #4]
  a8:	e1540003 	cmp	r4, r3
  ac:	0afffffc 	beq	a4 <.text+0xa4>
  b0:	e7854200 	str	r4, [r5, r0, lsl #4]
  b4:	e3a00000 	mov	r0, #0
  b8:	e51f10a0 	ldr	r1, [pc, #-160]	; 20 <.text+0x20>
  bc:	e59f2034 	ldr	r2, [pc, #52]	; f8 <.text+0xf8>
  c0:	e12fff14 	bx	r4
this being the loop, but it is not quite something someone would hand code, so not sure, will look at all this later.

Code: Select all

  a4:	e7954200 	ldr	r4, [r5, r0, lsl #4]
  a8:	e1540003 	cmp	r4, r3
  ac:	0afffffc 	beq	a4 <.text+0xa4>

banspri
Posts: 24
Joined: Tue Jun 27, 2017 11:02 pm

Re: Run all 4 cores Raspberry Pi 3

Fri Jul 21, 2017 6:01 pm

I read the documentation and I still don't quite get why I can't write to the mailboxes. Or is it that they're write-only so I can't read the information I'm writing?

Also, is the 64-bit timer mentioned in that document the same as the system timer? I haven't set any timers besides the system timer.
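On the write-only question: the linked document describes each mailbox as having a write-set address and a separate read/clear address. A host-side model of that behaviour, offered only as a sketch of the semantics and not as real hardware access; all names are hypothetical, and what a read of the write-set address returns is deliberately left out of the model.

```c
#include <stdint.h>

/* Host-side model of one core mailbox. */
typedef struct {
    uint32_t bits;
} mailbox_model;

/* Writing the write-set address ORs the written bits into the mailbox. */
void mbox_write_set(mailbox_model *m, uint32_t v)
{
    m->bits |= v;
}

/* Reading the read/clear address returns the current contents. */
uint32_t mbox_read(const mailbox_model *m)
{
    return m->bits;
}

/* Writing the read/clear address clears every bit written as 1. */
void mbox_write_clear(mailbox_model *m, uint32_t v)
{
    m->bits &= ~v;
}
```

Under this model a value written to the set address shows up at the read/clear address, and writing the value just read back to the read/clear address empties the mailbox again.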

dwelch67
Posts: 955
Joined: Sat May 26, 2012 5:32 pm

Re: Run all 4 cores Raspberry Pi 3

Fri Jul 21, 2017 6:05 pm

Code: Select all

Disassembly of section .text:

00000000 <.text>:
   0:	ea000008 	b	28 <.text+0x28>
   4:	0124f800 	
   8:	e3001131 	movw	r1, #305	; 0x131
   c:	ee011f11 	mcr	15, 0, r1, cr1, cr1, {0}
  10:	e30001da 	movw	r0, #474	; 0x1da
  14:	e16ff000 	msr	SPSR_fsxc, r0
  18:	e1b0f00e 	movs	pc, lr
  1c:	00063fff 	
  20:	00000c42 	
  24:	4000008c 	


MRC p15, 0, <Rt>, c1, c0, 0

SCTLR
enable caches  
  28:	ee110f10 	mrc	15, 0, r0, cr1, cr0, {0}
  2c:	e3800004 	orr	r0, r0, #4
  30:	e3800a01 	orr	r0, r0, #4096	; 0x1000
  34:	ee010f10 	mcr	15, 0, r0, cr1, cr0, {0}

MRC p15, 0, <Rt>, c1, c0, 0
MRRC p15, 1, <Rt>, <Rt2>, c15

I dont know what this is yet
  
  38:	ec510f1f 	mrrc	15, 1, r0, r1, cr15
  3c:	e3800040 	orr	r0, r0, #64	; 0x40
  40:	ec410f1f 	mcrr	15, 1, r0, r1, cr15

CNTV_CTL  enable
  
  44:	e3a00001 	mov	r0, #1
  48:	ee0e0f33 	mcr	15, 0, r0, cr14, cr3, {1}

NSACR
00063fff
enable coprocessor access

  4c:	e51f1038 	ldr	r1, [pc, #-56]	; 1c <.text+0x1c>
  50:	ee011f51 	mcr	15, 0, r1, cr1, cr1, {2}

CNTFRQ =   0124f800
  54:	e51f1058 	ldr	r1, [pc, #-88]	; 4 <.text+0x4>
  58:	ee0e1f10 	mcr	15, 0, r1, cr14, cr0, {0}


MVBAR
  
  5c:	e24f1064 	sub	r1, pc, #100	; 0x64
  60:	ee0c1f30 	mcr	15, 0, r1, cr12, cr0, {1}

VBAR
  
  64:	ee1ccf10 	mrc	15, 0, r12, cr12, cr0, {0}
  
  68:	f57ff06f 	isb	sy
  6c:	e1600070 	smc	0

VBAR

  
  70:	ee0ccf10 	mcr	15, 0, r12, cr12, cr0, {0}
  74:	e59f4080 	ldr	r4, [pc, #128]	; fc <.text+0xfc>

MPIDR
  
  78:	ee100fb0 	mrc	15, 0, r0, cr0, cr0, {5}
  7c:	e7e10050 	ubfx	r0, r0, #0, #2

if cpu0 branch to 0xb4

  80:	e3500000 	cmp	r0, #0
  84:	0a00000a 	beq	b4 <.text+0xb4>
  
  88:	e3a05001 	mov	r5, #1
  8c:	e1a05015 	lsl	r5, r5, r0
  90:	e31500ff 	tst	r5, #255	; 0xff
  94:	0a00000a 	beq	c4 <.text+0xc4>

4000008c
  
  98:	e51f507c 	ldr	r5, [pc, #-124]	; 24 <.text+0x24>
  9c:	e3a03000 	mov	r3, #0
  a0:	e2855040 	add	r5, r5, #64	; 0x40

400000cc hmm this is the right math yes?  8c + 40 is CC but that is wrong the answer is really something else...
  
  a4:	e7954200 	ldr	r4, [r5, r0, lsl #4]
  a8:	e1540003 	cmp	r4, r3
  ac:	0afffffc 	beq	a4 <.text+0xa4>

store it back for some reason.
  
  b0:	e7854200 	str	r4, [r5, r0, lsl #4]

cpu 0 branches here or skips

  
  b4:	e3a00000 	mov	r0, #0
  b8:	e51f10a0 	ldr	r1, [pc, #-160]	; 20 <.text+0x20>
  bc:	e59f2034 	ldr	r2, [pc, #52]	; f8 <.text+0xf8>
  c0:	e12fff14 	bx	r4


some error case?

  c4:	e320f003 	wfi
  c8:	eafffffd 	b	c4 <.text+0xc4>
	...
  fc:	00008000 	andeq	r8, r0, r0

quick review, this is again kernel7.img no config.txt so aarch32 on a pi3. start.elf is probably weeks old...

It is polling 0x400000cc plus an id register based offset so probably CC, DC, EC.

Code: Select all

  a4:	e7954200 	ldr	r4, [r5, r0, lsl #4]
  a8:	e1540003 	cmp	r4, r3
  ac:	0afffffc 	beq	a4 <.text+0xa4>
it didnt init to zero so perhaps the gpu did.

then it stores it back and then falls through to branch the address it found, there is no wfe/wfi. So in aarch32 you should be able to simply write your address of the code you want to enter (to 0x400000CC for example). Remember this is another core so you need to bootstrap it set the stack pointer at a minimum (that does not conflict with the other cores) plus anything else your bootstrap needs.
Last edited by dwelch67 on Sat Jul 22, 2017 12:03 am, edited 1 time in total.

pmcg521
Posts: 34
Joined: Thu Jun 08, 2017 2:41 pm

Re: Run all 4 cores Raspberry Pi 3

Fri Jul 21, 2017 7:11 pm

Dwelch67, I have seen your nice example implementation in https://github.com/dwelch67/raspberrypi ... r/multi00/ and it does work correctly, randomly starting the cores that fight for address 0x40. I wanted to start each one at a different address and then access them like you did by waiting until the value at the address 0x40 changes. To do this, I instead tried using address 0x40 for core 1, 0x50 core2, and 0x60 core 3 in start.S [based on how you used all 0x40 within start_cpux]. Then I access them as such, following your method (for example, this is for accessing core 2):

Code: Select all

uint strt2, b_strt2;
PUT32(0x50, 0);
strt2 = GET32(0x50);
start2();
b_strt2 = GET32(0x50);
if(strt2 != b_strt2){
    strt2 = b_strt2;
    printf("strt2 = 0x%X\r\n", strt2);
}
But doing it this way does not reach the if statement. Could it be something with the addresses I used? Why did you use 0x40? Also noteworthy: before this code, I am parsing atags at 0x100.

LdB
Posts: 1207
Joined: Wed Dec 07, 2016 2:29 pm

Re: Run all 4 cores Raspberry Pi 3

Fri Jul 21, 2017 7:27 pm

dwelch67 wrote: Remember this is another core so you need to bootstrap it set the stack pointer at a minimum (that does not conflict with the other cores) plus anything else your bootstrap needs.
Dammit sometimes I hate you :-)

I have no stack pointer and the dam thing is in HYP mode still isn't it ... arg I am doubly dorked !!!!!!!!!

Let me guess the Pi3 extra cores are in EL3 mode not even HYP need to go and check all that. I think you solved the mystery.

banspri
Posts: 24
Joined: Tue Jun 27, 2017 11:02 pm

Re: Run all 4 cores Raspberry Pi 3

Fri Jul 21, 2017 8:07 pm

I just realized that too, LdB lol. I'm stumped on how to get my core1 to start where I want it to, though. Do you think there'd be any problems if I start it at _start (which is the very first thing that runs for core0)? I can't seem to figure out how to put the address for _start into the mailbox, since _start is an ARM function.

And, wait, the cores are in EL3 mode? Ay caramba! I'm going to go take a much-needed break. Let me know if switching modes and setting up the stack does it for you.

LdB
Posts: 1207
Joined: Wed Dec 07, 2016 2:29 pm

Re: Run all 4 cores Raspberry Pi 3

Fri Jul 21, 2017 8:13 pm

here is a minimum .. I picked 0x4000 for the stack for it :-)

Code: Select all

.globl ExtraCoreSetup
ExtraCoreSetup:	
	mov sp, #0x4000							;@ Set the stack pointer for that mode
	bx  lr									;@ Return
You can then see it in C because of the .globl

extern void ExtraCoreSetup (void) __attribute__((naked));

The linker will know how to make the association ... try feeding that to the core1 before you do anything else :-)
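If more than one extra core is started, each needs its own stack so they do not overwrite each other. A sketch of one way to lay that out; the region top, the slice size and the function name are all hypothetical choices for illustration, not values from this thread.

```c
#include <stdint.h>

#define STACK_REGION_TOP 0x8000U /* hypothetical: just below a kernel loaded at 0x8000 */
#define STACK_SIZE       0x1000U /* hypothetical: 4 KiB per core */

/* Initial stack pointer for a core. Stacks grow downwards, so each core
   gets its own STACK_SIZE slice below STACK_REGION_TOP. */
uint32_t stack_top_for_core(unsigned core)
{
    return STACK_REGION_TOP - STACK_SIZE * core;
}
```

With these numbers core 0 starts at 0x8000 and core 3 at 0x5000, so the lowest slice ends at 0x4000 and stays clear of the stub and atags near address 0.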

dwelch67
Posts: 955
Joined: Sat May 26, 2012 5:32 pm

Re: Run all 4 cores Raspberry Pi 3

Fri Jul 21, 2017 9:01 pm

very sorry must have slipped a bit in my math

Code: Select all

0x4000009C for core 1
0x400000AC for core 2
0x400000BC for core 3
are correct

Code: Select all

int notmain ( void )
{
   
    uart_init();
    hexstring(0x12345678);
    hexstring(GETPC());
    hexstring(GETCPSR());
    hexstring(GETSCTLR());
    hexstring(GETMPIDR());
    
    hopnstop(0x4000009C,0x8000,0x200000);
    return(0);
}

Code: Select all

.globl hopnstop
hopnstop:
    subs r2,r2,#1
    bne hopnstop
    str r1,[r0]
    b .

Code: Select all

12345678 
00008300 
200001DA   this is hyp mode yes?
00C50838   caches are off.
80000000   cpu0
12345678 
00008300 
200001DA 
00C50838 
80000001   cpu1

Code: Select all

int notmain ( void )
{
    unsigned int ra;

   
    uart_init();
    hexstring(0x12345678);
    hexstring(GETPC());
    hexstring(GETCPSR());
    hexstring(GETSCTLR());
    ra=GETMPIDR();
    hexstring(ra);


    switch(ra&3)
    {
        case 0: hopnstop(0x4000009C,0x8000,0x200000);
        case 1: hopnstop(0x400000AC,0x8000,0x200000);
        case 2: hopnstop(0x400000BC,0x8000,0x200000);
        case 3: hopnstop(0x400000BC,0x8000,0x200000);
    }
    
    return(0);
}

Code: Select all

12345678 
00008300 
200001DA 
00C50838 
80000000  cpu0
12345678 
00008300 
200001DA 
00C50838 
80000001  cpu1
12345678 
00008300 
200001DA 
00C50838 
80000002 cpu2
12345678 
00008300 
200001DA 
00C50838 
80000003 cpu3

I cheated and put the delay in before writing to the mailbox, so the uart would finish. then that cpu would go into this infinite loop so it didnt care about the stack anymore. and then the next cpu would run through the same program and end up with the same fate.
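The annotations on the dumps above can be checked with a little bit-masking. A sketch using the standard ARMv7 field positions (CPSR mode in bits 4:0, SCTLR data-cache enable in bit 2, instruction-cache enable in bit 12); the macro and function names are made up here.

```c
#include <stdint.h>

#define CPSR_MODE_MASK 0x1FU
#define CPSR_MODE_HYP  0x1AU
#define SCTLR_C_BIT    (1U << 2)  /* data cache enable */
#define SCTLR_I_BIT    (1U << 12) /* instruction cache enable */

/* Extract the processor mode field from a CPSR value. */
unsigned cpsr_mode(uint32_t cpsr)
{
    return (unsigned)(cpsr & CPSR_MODE_MASK);
}

/* Returns 1 if either cache enable bit is set in an SCTLR value. */
int sctlr_any_cache_enabled(uint32_t sctlr)
{
    return (sctlr & (SCTLR_C_BIT | SCTLR_I_BIT)) != 0;
}
```

Applied to the printed values, 0x200001DA decodes to mode 0x1A (HYP) and 0x00C50838 has both cache bits clear, which agrees with the notes next to the dump.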
