jtan
Posts: 2
Joined: Fri Oct 28, 2016 8:52 pm

Creating a bootloader that writes to CPUACTLR?

Fri Oct 28, 2016 9:12 pm

I have a Raspberry Pi 3B running Raspbian Jessie Lite, and I'm trying to write to the CPU Auxiliary Control Register (CPUACTLR).
Based on what I've read and researched, I need a bootloader that executes in EL3, writes to CPUACTLR on all four cores, and then boots Raspbian.
Did I understand it correctly, and is this possible? If so, what's the simplest way to implement a bootloader that can do this task?

AlfredJingle
Posts: 69
Joined: Thu Mar 03, 2016 10:43 pm

Re: Creating a bootloader that writes to CPUACTLR?

Sat Oct 29, 2016 3:01 pm

Interesting register, I had not yet come across it. I played around with it and both in HYP mode as well as in secure supervisor mode I can read and write to it. Especially the prefetch control bits seem worth playing with. A quick preliminary test showed that copying blocks of memory to screen-memory (i.e. for a windowing system) is faster with the data-prefetch throttle disabled (bit 21). I will certainly experiment around with this one!
Now for your question: To me it seems easiest to insert the code you want in the early boot routine of Debian. I think I remember that Debian starts with leaving HYP-mode and I would try adding your code around there. I have no clue whether the rest of Debian has any problems with it.
going from a 6502 on an Oric-1 to an ARMv8 is quite a big step...

jtan
Posts: 2
Joined: Fri Oct 28, 2016 8:52 pm

Re: Creating a bootloader that writes to CPUACTLR?

Mon Oct 31, 2016 6:46 pm

Thanks for the reply.
You're saying that you can set CPUACTLR in EL2 without having to set any other registers first? That's interesting, and it does open up the option of setting CPUACTLR while the kernel is in HYP mode.

Unfortunately, it seems like the kernel leaves HYP mode really early, in particular before the secondary CPUs start up. I'm looking into getting the kernel to jump into HYP mode, writing CPUACTLR, and then back into SVC mode, but it doesn't seem easy.

AlfredJingle
Posts: 69
Joined: Thu Mar 03, 2016 10:43 pm

Re: Creating a bootloader that writes to CPUACTLR?

Tue Nov 01, 2016 9:10 pm

ARM specifically states that CPUACTLR should only be changed before anything else has happened. I interpret this to mean that the only chance you have to use this register is either to write your own bootloader, which in turn starts Debian, or to change the early code of Debian by adding your own code and then rebuilding Debian.
I can access the register without changing anything else, but this is on my own bare metal Forth OS before or just after leaving HYP-mode, before the MMU is activated etc.
I now as a standard step during start-up disable the pre-fetch throttle, mostly to see whether anything untoward happens. Till now everything is ok, and cpu-temperature is between 43 and 45 degrees.

Out of curiosity: Is there any specific reason why you want to write to the register? It seems irrelevant for most users.

koparasy
Posts: 8
Joined: Wed Jun 14, 2017 8:21 am

Re: Creating a bootloader that writes to CPUACTLR?

Wed Jun 14, 2017 8:28 am

Have you managed to write to CPUACTLR_EL1? I am using https://github.com/rsta2/circle, a bare metal library, and added the instructions which modify the register in HYP mode. However, when I read the register back, the value is unchanged. Could you please tell me how you managed to disable prefetch?

Thank you.

vineetg76
Posts: 6
Joined: Wed Mar 13, 2019 9:51 pm

Re: Creating a bootloader that writes to CPUACTLR?

Wed Mar 13, 2019 11:17 pm

I've been trying to write to the CPUACTLR register from Linux kernel to try and disable the prefetcher as well. This is for RPI3 ARM32 build (using buildroot raspberrypi3_defconfig). My initial hack was in setup_processor() as follows, but apparently this is too late and any write (even of the same value) hoses it.
__asm__ __volatile__("mrrc p15, 0, %0, %1, c15" : "=r"(a0), "=r"(a1));
b1 = a1;
b0 = a0 & ~0xE000; /* L1PCTL: clear: prefetch disable */
__asm__ __volatile__("mcrr p15, 0, %0, %1, c15" :: "r"(b0), "r"(b1));
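
The mask arithmetic in the snippet above can be checked on a host machine before it goes anywhere near a coprocessor write. The following is a minimal sketch, not the poster's code: the struct and function names are hypothetical, and the only facts taken from the post are that MRRC/MCRR move the register as two 32-bit halves and that L1PCTL is the 0xE000 field (bits 15:13) of the low half.

```c
#include <assert.h>
#include <stdint.h>

/* L1PCTL is bits [15:13] of the low word, i.e. the 0xE000 mask from the post. */
#define CPUACTLR_L1PCTL_SHIFT 13u
#define CPUACTLR_L1PCTL_MASK  (UINT32_C(0x7) << CPUACTLR_L1PCTL_SHIFT)

/* Hypothetical helper type: the 64-bit register as the two halves MRRC returns. */
struct cpuactlr_halves {
    uint32_t lo; /* first MRRC operand, bits [31:0]   */
    uint32_t hi; /* second MRRC operand, bits [63:32] */
};

struct cpuactlr_halves cpuactlr_split(uint64_t value)
{
    struct cpuactlr_halves h = { (uint32_t)value, (uint32_t)(value >> 32) };
    return h;
}

uint64_t cpuactlr_join(struct cpuactlr_halves h)
{
    return ((uint64_t)h.hi << 32) | h.lo;
}

/* Return value with the L1PCTL field replaced; the high word is left untouched. */
uint64_t cpuactlr_set_l1pctl(uint64_t value, uint32_t field)
{
    assert(field <= 7u); /* the field is three bits wide */
    struct cpuactlr_halves h = cpuactlr_split(value);
    h.lo = (h.lo & ~CPUACTLR_L1PCTL_MASK) | (field << CPUACTLR_L1PCTL_SHIFT);
    return cpuactlr_join(h);
}
```

Calling cpuactlr_set_l1pctl(value, 0) corresponds to the "a0 & ~0xE000" line, with a1 passed through unchanged.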

Next I tried moving this early into __hyp_stub_install_secondary (for all cpus). This write doesn't crash, but it doesn't seem to clear L1PCTL (its readout later seems the same)
mrrc 15, 0, r4, r5, cr15
bic r4, r4, #0xe000 @ clear L1PCTL
mcrr 15, 0, r4, r5, cr15

I'm not too well versed in the boot supervisor levels just yet, but was wondering if someone sees an obvious flaw here.

Thx,
-Vineet

LdB
Posts: 1178
Joined: Wed Dec 07, 2016 2:29 pm

Re: Creating a bootloader that writes to CPUACTLR?

Thu Mar 14, 2019 12:54 am

This is similar to the L1 cache register we were discussing 3 posts ago :-)

Look at bit 0 of this register and note its default
http://infocenter.arm.com/help/index.js ... GHIBG.html

I suspect it is blocked at EL2 as well, as most registers of this type can be blocked at each level

You need to set access to it for lower EL's

The bootstub code is provided in userland as armstub7.s for ARM8-32 bit and they don't allow access before they give you the proc in HYP mode
https://github.com/raspberrypi/tools/tr ... r/armstubs

The process to get around it is complicated and only doable for advanced programmers.
It means using kernel_old=1 in config.txt to pick the ARM up raw at 0x0 doing all the stuff in armstub7.s with your new code and then booting into what would have been the original kernel8-32.img. In practice that means binding the original kernel8-32.img into your kernel img file at its normal position of 0x8000 (your code is at 0x0). Once your new code has run, simply jump to 0x8000 like a normal entry, and the original code takes over, oblivious to the new bootstub that got executed. The only other choice is to write a FAT reader that loads the original kernel file to 0x8000 and jumps into it.

The easiest way to bind a binary file into your bootloader assembler block is usually ".incbin" which is supported by most assemblers and GCC.
At the very bottom of your assembler file you set the origin and section so something like

.org 0x8000
.section .text
.incbin "orig_kernel8-32.img"
If all worked your new img file should be exactly 0x8000 bytes bigger than the original kernel img :-)
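
The layout described above (stub at 0x0, original kernel at 0x8000, result 0x8000 bytes larger than the kernel) can also be produced or checked outside the assembler. This is a minimal host-side sketch under that assumption only; bind_images and KERNEL_OFFSET are hypothetical names, and nothing here says how the firmware actually loads the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define KERNEL_OFFSET 0x8000u /* where the original kernel expects to start */

/*
 * Build a combined image: stub at offset 0, zero padding up to
 * KERNEL_OFFSET, then the unmodified kernel. Returns a malloc'd buffer
 * (the caller frees it) or NULL; *out_len receives the total size.
 */
uint8_t *bind_images(const uint8_t *stub, size_t stub_len,
                     const uint8_t *kernel, size_t kernel_len,
                     size_t *out_len)
{
    assert(stub != NULL && kernel != NULL && out_len != NULL);

    if (stub_len > KERNEL_OFFSET)
        return NULL; /* the stub would overlap the kernel */

    size_t total = (size_t)KERNEL_OFFSET + kernel_len;
    uint8_t *img = calloc(total, 1); /* zero-filled, so the gap is padding */
    if (img == NULL)
        return NULL;

    memcpy(img, stub, stub_len);
    memcpy(img + KERNEL_OFFSET, kernel, kernel_len);
    *out_len = total;
    return img;
}
```

The returned length is always the kernel length plus 0x8000, which is the same size check suggested above for the .incbin approach.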

Ultibo
Posts: 158
Joined: Wed Sep 30, 2015 10:29 am
Location: Australia

Re: Creating a bootloader that writes to CPUACTLR?

Thu Mar 14, 2019 9:34 am

LdB wrote:
Thu Mar 14, 2019 12:54 am
It means using kernel_old=1 in config.txt to pick the ARM up raw at 0x0 doing all the stuff in armstub7.s with your new code and then booting into what would have been the original kernel8-32.img.
That won't help because kernel_old=1 will disable the device tree loading by the firmware and the Linux kernel simply will not boot without device tree.
vineetg76 wrote:
Wed Mar 13, 2019 11:17 pm
I've been trying to write to the CPUACTLR register from Linux kernel to try and disable the prefetcher as well. This is for RPI3 ARM32 build (using buildroot raspberrypi3_defconfig).
There is a way to do what you want if you are ok with modifying the kernel source and recompiling (which it looks like you are already doing).

We have code in the Ultibo project that switches back to secure mode after switching out of hypervisor mode, have a look at the StartupSecure function which we use to return to secure supervisor mode where you can set almost any register you choose.

In the Linux kernel the macro that switches out of hypervisor mode is in safe_svcmode_maskall, you would need to find where that is called during startup and add something similar to our function to allow you to get back to secure supervisor mode and change the registers you want.

Not easy but it should be possible, even in Linux.
Ultibo.org | Make something amazing
https://ultibo.org

Threads, multi-core, OpenGL, Camera, FAT, NTFS, TCP/IP, USB and more in 3MB with 2 second boot!

LdB
Posts: 1178
Joined: Wed Dec 07, 2016 2:29 pm

Re: Creating a bootloader that writes to CPUACTLR?

Thu Mar 14, 2019 10:20 am

I thought the DT was dynamic now ... any mods around?

Update: Found the announcement back in 2015
viewtopic.php?f=29&t=93015&start=350
PhilE wrote:
Tue May 12, 2015 4:46 pm
Good news - the next firmware release will contain support for DT when kernel_old is set. It's not something I can test easily, but it goes through the right code paths and should work.

An rpi-update release should be available in the next few days.
The next comment said it worked, so I believe it's not an issue unless anyone knows differently.

I guess do whichever you find easier.

vineetg76
Posts: 6
Joined: Wed Mar 13, 2019 9:51 pm

Re: Creating a bootloader that writes to CPUACTLR?

Mon Jun 10, 2019 9:57 pm

OK, I finally got a chance to try this. Thx @ultibo for the code references. Here's what I tried, and it doesn't work.
The idea was to implement the secure monitor call vector (in the hypervisor vector table).

Linux boot code sets up hypervisor vectors already, I just needed to define a new entry for the secure monitor call

ENTRY(stext)
bl __hyp_stub_install @ in turn calls __hyp_stub_install_secondary, which sets up the hyp vector base etc.
smc 0

ENTRY(__hyp_stub_vectors)
__hyp_stub_reset: W(b) .
__hyp_stub_und: W(b) .
+ __hyp_stub_svc: W(b) __hyp_stub_do_smc
...

The vector did my foo, i.e. disable hardware prefetch

ENTRY(__hyp_stub_do_smc)
mrrc 15, 0, r4, r5, cr15 @ CPUACTLR : L1PCTL
bic r4, r4, #0xe000
mcrr 15, 0, r4, r5, cr15
__ERET
ENDPROC(__hyp_stub_do_smc)

And I forced a call into secure mode with an SMC 0 right after the hyp setup is done, but apparently the secure call doesn't happen, likely because "traps" are explicitly disabled in __hyp_stub_install_secondary. I'm not sure which traps to enable.

ENTRY(__hyp_stub_install_secondary)
...
@ Disable all traps, so we don't get any nasty surprise
mov r7, #0
mcr p15, 4, r7, c1, c1, 0 @ HCR
mcr p15, 4, r7, c1, c1, 2 @ HCPTR
mcr p15, 4, r7, c1, c1, 3 @ HSTR

vineetg76
Posts: 6
Joined: Wed Mar 13, 2019 9:51 pm

Re: Creating a bootloader that writes to CPUACTLR?

Tue Jun 18, 2019 5:40 pm

OK, using some ideas from ultibo, I was able to hack https://github.com/rsta2/circle baremetal library to successfully write to CPUACTLR (disable dual issue and hardware prefetcher).

.macro safe_svcmode_maskall reg:req

+ mrc p15, #0, r0, cr12, cr0, #0
+ ldr r1, .LSecureVectorTable
+
+ ldmia r1!, {r2-r9}
+ stmia r0!, {r2-r9}
+ ldmia r1!, {r2-r9}
+ stmia r0!, {r2-r9}
+
+ //Clean Data Cache MVA
+ mov r12, #0
+ mcr p15, #0, r12, cr7, cr10, #1
+
+ dsb
+
+ //Invalidate Instruction Cache
+ mov r12, #0
+ mcr p15, #0, r12, cr7, cr5, #0
+
+ //Flush Branch Target Cache
+ mov r12, #0
+ mcr p15, #0, r12, cr7, cr5, #6
+
+ dsb
+ isb
+
+ smc #0
...
.endm

+.LSecureVectorTable:
+ .word SecureVectorTable
+
+ .globl SecureVectorTable
+SecureVectorTable:
+ ldr pc, .Lhdlr1
+ ldr pc, .Lhdlr1
+ ldr pc, .Lhdlr2
+ ldr pc, .Lhdlr1
+ ldr pc, .Lhdlr1
+ ldr pc, .Lhdlr1
+ ldr pc, .Lhdlr1
+ ldr pc, .Lhdlr1
+
+.Lhdlr1:
+ .word nop_hdlr
+.Lhdlr2:
+ .word smc_hdlr
+
+ .globl smc_hdlr
+smc_hdlr:
+ mrrc 15, 0, r4, r5, cr15
+ bic r4, r4, #0xe000
+ orr r4, r4, #0x20000000
+ mcrr 15, 0, r4, r5, cr15
+ movs pc, lr
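
Since the experiments in this thread are judged by reading CPUACTLR back afterwards, it helps to state what a successful readback looks like. The sketch below is a host-side model of the bic/orr pair in smc_hdlr, using only the two masks from that handler; the enum and function names are hypothetical.

```c
#include <assert.h> /* used by the example check in the note below */
#include <stdint.h>

#define L1PCTL_MASK UINT32_C(0x0000e000) /* cleared by "bic r4, r4, #0xe000"     */
#define BIT29_MASK  UINT32_C(0x20000000) /* set by "orr r4, r4, #0x20000000"     */

enum write_result {
    WRITE_APPLIED, /* readback equals what the handler should have produced */
    WRITE_IGNORED, /* readback is identical to the value before the write   */
    WRITE_PARTIAL  /* something changed, but not to the expected value      */
};

/* What smc_hdlr computes for the low word; the high word (r5) is unchanged. */
uint32_t expected_low_word(uint32_t before)
{
    return (before & ~L1PCTL_MASK) | BIT29_MASK;
}

/* Compare the low word read before the SMC with the one read afterwards. */
enum write_result classify_readback(uint32_t before, uint32_t after)
{
    if (after == expected_low_word(before))
        return WRITE_APPLIED;
    if (after == before)
        return WRITE_IGNORED;
    return WRITE_PARTIAL;
}
```

With the values reported later in this thread, assert(classify_readback(0x090ca000, 0x290c0000) == WRITE_APPLIED) holds, while an unchanged readback such as the one seen under Linux classifies as WRITE_IGNORED.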

However, similar code doesn't seem to work for the Linux kernel proper. It doesn't seem to like the vector copying scheme: smc #0 causes some cpu badness. I tried an alternate scheme of just pointing to a pre-cooked table (with the smc stub added). That boots, but CPUACTLR remains unchanged. I'm not sure if the secure mode call is happening at all

W(adr) r4, __hyp_stub_vectors
mcr p15, 0, r4, c12, c0, 0

I obviously don't have a debugger, so relying on code assisted debugging like this (printing @debug_flag). This scheme also worked in circle baremetal library, but not in linux, likely because mmu is turned on and mapping is missing for this data.

ldr r4, .Ldebug_flag
ldr r6, [r4]
add r6, r6, r5
str r6, [r4]

.Ldebug_flag:
.word debug_flag

.globl debug_flag
debug_flag:
.long 0xdead0000

Ultibo
Posts: 158
Joined: Wed Sep 30, 2015 10:29 am
Location: Australia

Re: Creating a bootloader that writes to CPUACTLR?

Tue Jun 25, 2019 11:50 pm

vineetg76 wrote:
Tue Jun 18, 2019 5:40 pm
This scheme also worked in circle baremetal library, but not in linux, likely because mmu is turned on and mapping is missing for this data.
Are you trying to do this for the primary CPU, one of the secondary CPUs or for all of them?

If the point in the kernel where you are trying to insert code already has the MMU enabled then it is way too late, the switch out of hypervisor mode and then back to secure supervisor mode needs to happen almost immediately after the kernel starts executing or it simply cannot work.

vineetg76
Posts: 6
Joined: Wed Mar 13, 2019 9:51 pm

Re: Creating a bootloader that writes to CPUACTLR?

Wed Jun 26, 2019 11:04 pm

Ultibo wrote:
Are you trying to do this for the primary CPU, one of the secondary CPUs or for all of them?
For all cpus, in __hyp_stub_install_secondary
Ultibo wrote:
If the point in the kernel where you are trying to insert code already has the MMU enabled then it is way too late, the switch out of hypervisor mode and then back to secure supervisor mode needs to happen almost immediately after the kernel starts executing or it simply cannot work.
Right, so this is super early and I don't think the MMU/caches are the issue. My issue is the secure mode vector table not getting copied properly (in a position-independent manner, I suppose). To debug the early asm code in the circle baremetal library (where the smc handler is called), I was setting a variable from asm in early boot code and reading it back in "C" later, e.g.

ldr r2, .Lmy_var
ldr r3, [r2]
add r3, r3, #0x10
str r3, [r2]

.Lmy_var:
.word my_var

.globl my_var
my_var:
.word 0xdead0000

For the Linux kernel to be able to do this I had to change it slightly.

adr r4, .Lmy_var
ldr r5, [r4]
ldr r6, [r4, r5]
add r6, r6, #0x10
str r6, [r4, r5]

.Lmy_var:
.long my_var - .

So this is using a PC relative data access.
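
Why the ".long my_var - ." form survives relocation can be modelled on a host: the slot stores the distance from itself to the variable, so the lookup only needs the slot's run-time address. This is a minimal sketch with hypothetical names (image_init, var_from_slot, load_via_slot, add_via_slot) operating on a byte buffer that stands in for the kernel image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Lay out a slot holding (var - slot) and the variable itself in image. */
void image_init(uint8_t *image, size_t image_len,
                size_t slot_off, size_t var_off, uint32_t initial)
{
    assert(slot_off + 4 <= image_len && var_off + 4 <= image_len);
    int32_t delta = (int32_t)((ptrdiff_t)var_off - (ptrdiff_t)slot_off);
    memcpy(image + slot_off, &delta, sizeof delta);  /* .long my_var - . */
    memcpy(image + var_off, &initial, sizeof initial);
}

/* slot is what "adr r4, .Lmy_var" yields: the slot's run-time address. */
uint8_t *var_from_slot(uint8_t *slot)
{
    int32_t delta;
    memcpy(&delta, slot, sizeof delta); /* ldr r5, [r4]      */
    return slot + delta;                /* address r4 + r5   */
}

uint32_t load_via_slot(uint8_t *slot)
{
    uint32_t value;
    memcpy(&value, var_from_slot(slot), sizeof value); /* ldr r6, [r4, r5] */
    return value;
}

void add_via_slot(uint8_t *slot, uint32_t increment)
{
    uint32_t value = load_via_slot(slot) + increment;  /* add r6, r6, #imm */
    memcpy(var_from_slot(slot), &value, sizeof value); /* str r6, [r4, r5] */
}
```

Copying the whole buffer to a different address and calling add_via_slot on the copy still updates the copy's variable, which is the property the absolute ".word my_var" form lacks.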

Another difference is that the Linux kernel is linked at 0x8000_8000 while bare metal libs link at 0x8000. I'm not sure how the top bit is ignored, even when the MMU is not turned on.
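
One way to read the 0x8000_8000 versus 0x8000 difference is as a fixed offset between link-time (virtual) and load-time (physical) addresses, rather than a bit being ignored: while the MMU is off the code runs at physical addresses, so any absolute address baked in by the linker points at the virtual location. The sketch below illustrates that arithmetic only. Both constants are assumptions inferred from the addresses in this post, not values read from the poster's kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: start of the kernel's virtual mapping, inferred from 0x8000_8000. */
#define ASSUMED_PAGE_OFFSET UINT32_C(0x80000000)
/* Assumed: RAM starts at physical address 0, so the image sits at 0x8000. */
#define ASSUMED_PHYS_OFFSET UINT32_C(0x00000000)

/* Translate a link-time (virtual) kernel address to its load-time address. */
uint32_t virt_to_phys(uint32_t virt)
{
    assert(virt >= ASSUMED_PAGE_OFFSET); /* only kernel-mapped addresses apply */
    return virt - ASSUMED_PAGE_OFFSET + ASSUMED_PHYS_OFFSET;
}

/* Inverse translation, valid once the kernel mapping is active. */
uint32_t phys_to_virt(uint32_t phys)
{
    return phys - ASSUMED_PHYS_OFFSET + ASSUMED_PAGE_OFFSET;
}
```

Under these assumptions virt_to_phys(0x80008000) gives 0x8000, which is where the bare metal libraries link directly; PC-relative accesses sidestep the translation because they never use the link-time address.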

Ultibo
Posts: 158
Joined: Wed Sep 30, 2015 10:29 am
Location: Australia

Re: Creating a bootloader that writes to CPUACTLR?

Fri Jun 28, 2019 12:14 am

vineetg76 wrote:
Wed Jun 26, 2019 11:04 pm
For all cpus, in __hyp_stub_install_secondary
I think that's the wrong place to put this, at the time of calling __hyp_stub_install_secondary you are still in hypervisor mode.

You need to insert this after the call to safe_svcmode_maskall which switches back to supervisor mode instead, immediately after each call to safe_svcmode_maskall from head.S would seem to be the correct place.

vineetg76
Posts: 6
Joined: Wed Mar 13, 2019 9:51 pm

Re: Creating a bootloader that writes to CPUACTLR?

Mon Jul 01, 2019 6:49 pm

In the baremetal code example at least, we can invoke SMC even in hypervisor mode. Please take a look at my fork of https://github.com/rsta2/circle, a very well written baremetal library for RPI.

It is all clearly a hack, but as you can see at https://github.com/vineetgarc/circle/bl ... /startup.S, I invoke the dofoobar macro while in the default firmware mode (HYP), and when I print the contents of the CPUACTLR register I see

logger: Circle 39.1 started on Raspberry Pi 3 Model B
----: --> CPUACTLR 290c0000:0  ACTLR 0 flag dead0011
----: --> CPSR (CPU mode) [stock] 600001da [in smc hdlr] 600001d6
The default value of CPUACTLR is 090ca000 and the one that I modify it to is 290c0000, which has disabled dual issue as well as hardware prefetch.
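
The two CPSR values in the log above can be decoded to confirm which mode each read happened in. The mode field is the low five bits of the CPSR, and the encodings below are the standard ARM ones; the function names are hypothetical. A minimal sketch:

```c
#include <assert.h> /* used by the example checks in the note below */
#include <stdint.h>

#define CPSR_MODE_MASK UINT32_C(0x1f) /* M[4:0], the processor mode field */

/* Extract the mode field from a CPSR value. */
uint32_t cpsr_mode(uint32_t cpsr)
{
    return cpsr & CPSR_MODE_MASK;
}

/* Map the mode field to its conventional short name. */
const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr_mode(cpsr)) {
    case 0x10: return "usr";
    case 0x11: return "fiq";
    case 0x12: return "irq";
    case 0x13: return "svc";
    case 0x16: return "mon";
    case 0x17: return "abt";
    case 0x1a: return "hyp";
    case 0x1b: return "und";
    case 0x1f: return "sys";
    default:   return "unknown";
    }
}
```

For the logged values, assert(cpsr_mode(0x600001da) == 0x1a) and assert(cpsr_mode(0x600001d6) == 0x16) both hold: the stock read was taken in HYP mode and the one inside the handler in monitor mode, so the SMC did reach the secure side in the bare metal build.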

So the issue is not being able to code up the secure mode vector table correctly. This somehow has to do with the high address of 0x8000_8000 and with keeping the code position independent. I tried building the circle library with LOADADDR 0x8000_8000 and don't see any output, which could also be my issue.
