bzt
Posts: 374
Joined: Sat Oct 14, 2017 9:57 pm

CNTP interrupt not firing

Sun Feb 24, 2019 10:08 pm

Hi,

I'd like to implement a per-CPU-core one-shot timer. I'm usually good at interpreting docs, but this time I'm really stuck. I'm trying to use the ARM Generic Timer.

First, I can confirm that exceptions are set up correctly (I'm at EL1, VBAR, SPSel etc. selects the handler correctly). According to the ARM-Cortex-A53-Manual section 10.2, Non-Secure EL1 physical timer is implemented. The DDI0487D document on page D10-2653 says this about the timer:
The CompareValue View of a timer operates as a 64-bit upcounter. The timer condition is met when the appropriate counter reaches the value programmed into its CompareValue register. When the timer condition is met, an interrupt is generated if the interrupt is not masked in the corresponding register, CNTP_CTL_EL0, CNTHP_CTL_EL2, CNTPS_CTL_EL1, CNTV_CTL_EL0, the asserted interrupt is the same as the interrupt asserted by the Non-Secure instance of AArch32 register CNTP_CTL.
And then the description of CNTP_CTL_EL0 says:
ISTATUS, bit [2] The status of the timer. This bit indicates whether the timer condition is met: 0 Timer condition is not met. 1 Timer condition is met. When the value of ENABLE bit is 1, ISTATUS indicates whether the timer condition is met. ISTATUS takes no account of the value of the IMASK bit. If the value of ISTATUS is 1 and the value of IMASK is 0 then the timer interrupt is asserted.
This sounds great, exactly what I want. I initialize it as:

Code: Select all

CNTHCTL_EL2 |= 3
CNTKCTL_EL1 = 0
CNTHP_CTL_EL2 = 1
CNTP_CTL_EL0 = 3      // IMASK=1, ENABLE=1
DAIFCLR = 1
Then when I want to fire the interrupt a second from now, I do

Code: Select all

CNTP_CVAL_EL0 = CNTPCT_EL0 + CNTFRQ_EL0
CNTP_CTL_EL0 = 1    // IMASK=0, ENABLE=1
This works great, and polling CNTP_CTL_EL0 shows that after a second it's changed to 5 (ISTATUS=1, IMASK=0, ENABLE=1). The problem is, no matter what I do, the interrupt handler is not called, and more interestingly the ISR_EL1 register remains zero. CNTHP_CTL_EL2 changes to 5 too, but I can't access CNTPS_CTL_EL1 (because I boot at EL2 and I haven't implemented a Trusted OS), which shouldn't be a problem as I'm aiming for the Non-Secure EL1 physical timer. I also tried to enable IRQ64 (ARM timer) in the interrupt controller's ENABLE_BASIC_IRQS register bit 0, no luck. What am I missing? Why don't I get the IRQ when ISTATUS changes to 1? Is there some other system register that the documentation does not mention? Do I need to acknowledge this timer somehow? If yes, how?
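For reference, the rule quoted above from the CNTP_CTL_EL0 description can be written down as a tiny decoder. This is only a sketch: the macro and function names are made up for illustration, and it only tells you whether the timer drives its output line, not whether anything downstream routes that line to the core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CNTP_CTL_EL0 bit layout, as in the DDI0487 text quoted above. */
#define CNT_CTL_ENABLE  (1u << 0)
#define CNT_CTL_IMASK   (1u << 1)
#define CNT_CTL_ISTATUS (1u << 2)

/* True when the timer asserts its interrupt output:
 * enabled, condition met, and not masked.
 * (hypothetical helper name, not from any library) */
bool cnt_ctl_irq_asserted(uint32_t ctl)
{
    return (ctl & CNT_CTL_ENABLE) &&
           (ctl & CNT_CTL_ISTATUS) &&
           !(ctl & CNT_CTL_IMASK);
}

/* The two values mentioned in the post: 3 = armed but masked,
 * 5 = condition met and unmasked. */
void cnt_ctl_example(void)
{
    assert(!cnt_ctl_irq_asserted(3));
    assert(cnt_ctl_irq_asserted(5));
}
```

So a value of 5 means the timer side is doing its job; if ISR_EL1 still reads zero, the signal is presumably being lost somewhere between the timer output and the core's IRQ input.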

Another loosely related question, why are there so many timer devices?

- BCM2837 page 172 System Timer: this is clear, ARM independent, but therefore unfortunately not per CPU core configurable.
- BCM2837 page 196 Timer (ARM side): base address 0x3F00B400, is this the same as the ARM Timer in the IC?
- BCM2836 QA7 specifies that ARM control at 0x40000000 has two ARM timers: page 9 Core timer register and page 17 Local timer. What are those? Are they the same as the ARM Generic Timer or totally separate? How do they relate to the Timer (ARM side) at B400 and the ARM Timer bit in the IC?

Thanks,
bzt

bzt
Posts: 374
Joined: Sat Oct 14, 2017 9:57 pm

Re: CNTP interrupt not firing

Mon Feb 25, 2019 12:31 am

Ok, I did a little digging, and now I'm more confused than ever.

In the Linux kernel source, I've found two ARM clocksources: arm_arch_timer and arm_global_timer. I have a hard time figuring out which one corresponds to which timer in the documentation.

For example, arm_arch_timer seems to use a memory mapped version of the CNTP registers, but I can't find those in the BCM docs. ARM doc only mentions that CNTP can be implemented in system registers as well as MMIO, but without a base address. I'm sure that BCM2836 QA7's ARM Timer at 0x40000000 is something different, because registers don't match.

The arm_global_timer is another mystery: it has COUNTER0, COUNTER1 and CONTROL registers, in that order, which does not fit anything that any of the docs mention. Timer (ARM side) has LOAD, VALUE, CONTROL registers, which could even fit if it weren't for the totally different control bits. ARM Timer (as in BCM2836 QA7) starts with a CONTROL register at (base+0), so it can't be that either. What is ARM Global Timer then?

Although there's no arm_generic in the current Linux source tree, there used to be one. Or at least I've found a patch from 2012 which did exactly what I'm trying to do. I don't know where that file has gone since; I assume it was superseded by arm_arch_timer. Examining the source revealed that I did everything right, except that the old patch called the generic (not ARM CPU specific) "enable_percpu_irq(clk->irq, 0);" function, where clk->irq comes from arch_timer_ppi, which in turn is parsed from the dts. Unless anybody has a better idea, I'll follow that path, figure out the IRQ number which corresponds to the CNTP registers and which IRQ controller register is used to enable it. I hope that's not ARM Timer in the IC, because if so, then I don't know why enabling that didn't work in the first place.

Cheers,
bzt

LdB
Posts: 1171
Joined: Wed Dec 07, 2016 2:29 pm

Re: CNTP interrupt not firing

Mon Feb 25, 2019 3:25 am

There are 2 timer blocks on single cores, 3 timer blocks on multicore Pi's

So the first two exist on any Pi

1.) Free running timer system described in BCM2835 Manual Section 12 page 172
It has 4 compare registers which each feed into the interrupt block controls

2.) Peripheral timer described in BCM2835 Manual Section 14 page 196
Single scalable timer that feeds into the interrupt block control

The next exists only on multicore BCM2836, BCM2837

3.) Core local timer as described in QA7 which runs off either the 19.2 MHz crystal clock or the APB clock
Its default setup is to use the 19.2 MHz clock but it counts +2 (see 4.2 Control register) so it looks like 38.4 MHz
It feeds an Irq/fiq into the interrupt control block

**** On QA7 the diagram at 3.2 clearly shows the irq routing to the cross point switch. ****

So on your linux sources

Timer 1 above is represented by this Linux source; they refer to it as the global timer
https://github.com/torvalds/linux/blob/ ... al_timer.c
It is using 2 of the 4 compare registers to trigger interrupts, it calls them GT_COMP0, GT_COMP1
There is a strange part to the linux code it is actually COMPARE 1 & COMPARE 2
Note the offsets in the linux code

Code: Select all

#define GT_COMP0	0x10
#define GT_COMP1	0x14
Note Section 12.1 page 172 on BCM2835 Manual that is C1 & C2

Timer 2 above is somewhat represented by
https://github.com/torvalds/linux/blob/ ... ch_timer.c
However look at page 196 section 14 it tells you the problem ... only one timer !!!!!!
The ARM Timer is based on a ARM AP804, but it has a number of differences with the standard SP804:
• There is only one timer.
I suspect this is the correct code for the Pi for that clock
https://github.com/torvalds/linux/blob/ ... 35_timer.c

If it helps I have a sample running QA7 timer to core 3 interrupts ... main.c has details
https://github.com/LdB-ECM/Raspberry-Pi ... 3Interrupt
Last edited by LdB on Mon Feb 25, 2019 7:34 am, edited 1 time in total.

rst
Posts: 386
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: CNTP interrupt not firing

Mon Feb 25, 2019 7:22 am

bzt wrote:
Sun Feb 24, 2019 10:08 pm
This works great, and polling CNTP_CTL_EL0 shows that after a second it's changed to 5 (ISTATUS=1, IMASK=0, ENABLE=1). The problem is, no matter what I do, the interrupt handler is not called, and more interestingly the ISR_EL1 register remains zero.
Have you set the nCNTPNSIRQ bit in the "Core N Timers interrupt control" register at 0x40000040 + 4*N? Have a look on pg. 13 of the QA7 document.
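A small sketch of the address arithmetic and bit positions involved, as I read the QA7 "Core timers interrupt control" table (bits 0 to 3 select which timer lines raise an IRQ; the FIQ variants in bits 4 to 7 are left out). The macro and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define QA7_BASE               0x40000000u
#define QA7_CORE_TIMER_IRQCNTL 0x40u      /* + 4 * core */

/* IRQ-enable bits of the Core N Timers interrupt control register. */
#define nCNTPSIRQ_IRQ   (1u << 0)   /* secure physical timer      */
#define nCNTPNSIRQ_IRQ  (1u << 1)   /* non-secure physical timer  */
#define nCNTHPIRQ_IRQ   (1u << 2)   /* hypervisor physical timer  */
#define nCNTVIRQ_IRQ    (1u << 3)   /* virtual timer              */

/* Address of the per-core register; the BCM2836/7 has four cores. */
uint32_t core_timer_irqcntl_addr(unsigned core)
{
    assert(core < 4);
    return QA7_BASE + QA7_CORE_TIMER_IRQCNTL + 4u * core;
}
```

On the hardware you would write nCNTPNSIRQ_IRQ to that address through a volatile 32-bit pointer (after mapping it as device memory if the MMU is on) to let the CNTP interrupt of core N through.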
Another loosely related question, why are there so many timer devices?
I think, this has historical reasons. But it's better to have more timers than less, isn't it?

bzt
Posts: 374
Joined: Sat Oct 14, 2017 9:57 pm

Re: CNTP interrupt not firing

Mon Feb 25, 2019 12:52 pm

Thank you for your answers!
LdB wrote:
Mon Feb 25, 2019 3:25 am
Timer 1 above is represented by this linux source they refer to it as the global time
https://github.com/torvalds/linux/blob/ ... al_timer.c
It is using 2 of the 4 compare registers to trigger interrupts, it calls them GT_COMP0, GT_COMP1
There is a strange part to the linux code it is actually COMPARE 1 & COMPARE 2
Note the offsets in the linux code

Code: Select all

#define GT_COMP0	0x10
#define GT_COMP1	0x14
Note Section 12.1 page 172 on BCM2835 Manual that is C1 & C2
I don't think so, I'm pretty sure BCM2835 Section 12.1 page 172 corresponds to bcm2835_timer.c; all the registers and the control bits match nicely (actually that's the only timer I'm sure about :-) ). It can't be the Section 14 page 196 timer either, because the control register's bits don't match. Finally, there's no match for the ARM QA7 Section 4.11 page 17 timer either, as that ARM Timer has a combined counter/control register.
Timer 2 above is somewhat represented by
https://github.com/torvalds/linux/blob/ ... ch_timer.c
However look at page 196 section 14 it tells you the problem ... only one timer !!!!!!
Exactly, that's why this is a different clock. It looks to me that arm_arch_timer is using an MMIO version of the CNTP registers, otherwise it's for the ARM Generic Timer, the one I'm trying to use. At least the similarity between the DDI0487 Generic Timer spec and the arm_generic.c driver is uncanny.
The ARM Timer is based on a ARM AP804, but it has a number of differences with the standard SP804:
• There is only one timer.
I suspect this is the correct code for the Pi for that clock
https://github.com/torvalds/linux/blob/ ... 35_timer.c
Nope, as I said before, I'm pretty sure bcm2835_timer is the driver for the BCM System Timer (page 172).
If it helps I have a sample running QA7 timer to core 3 interrupts ... main.c has details
https://github.com/LdB-ECM/Raspberry-Pi ... 3Interrupt
Thank you very much, I'll look into that!
rst wrote:Have you set the nCNTPNSIRQ bit in the "Core N Timers interrupt control" register at 0x40000040 + 4*N? Have a look on pg. 13 of the QA7 document.
Yes, I've enabled ARM Timer in the IC 0xB218 just to be sure, and I have enabled CNTPNSIRQ and CNTVIRQ in all core's timer interrupt control registers (0x40000040 + 4*N |= (1<<1) | (1<<3)) and of course I have unmasked DAIF. Still no interrupts, ISR_EL1 is zero when ISTATUS goes up.
rst wrote:But it's better to have more timers than less, isn't it?
Normally yes. My only problem is that they don't have a consistent name throughout the docs and the drivers, therefore it's difficult to figure out which one is which. At some point they are all called the ARM Timer, which is very confusing.

To sum it up:
0x3F003000: BCM2835 Section 12, let's call this System Timer. This is one timer with four comparators, IRQ pending in 0x3F00B204 bits 0-3, and the first and third comparators are used by the GPU. Enabled by setting bits in 0x3F00B210 bits 0-3.
0x3F00B400: BCM2835 Section 14, SP804, let's call this TimerARM. This is one timer with one comparator, IRQ unknown (is this enabled by 0x3F00B218 bit 0? It should be I think).
0x40000034: QA7 Section 4.11, let's call this Local Timer. This is one timer with one comparator, IRQ in 0x40000060 + 4*X bit 11 (the Core X IRQ source register; 0x40000070 + 4*X is the FIQ counterpart). Enabling should be connecting it with a Core in 0x40000024 (specifying X), and then writing the reload value and enable bits in 0x40000034.
0x40000040: QA7 Section 5.6, let's call them Core Timers. Four timers, comparator unknown (is this in the CNTP_CVAL register? If so, what about the next timer?), IRQ in 0x40000060 + 4*N bit 1. It should be enabled by bits in 0x40000040. The most confusing part is that this timer does not have Generic Timer registers, but it does have IRQ bits for the Generic Timer.
unknown: DDI0487 page D10-2653, let's call this Generic Timer. This has CNT system registers, which could be mapped in memory at an unknown address (not any of the above addresses, as none of those has a CNTFRQ register). Enabled by CNTP_CTL_EL0, comparator in CNTP_CVAL_EL0 as per D10-2653. IRQ unknown.
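To make the summary a bit more concrete, here is a rough sketch of how an IRQ handler could tell these sources apart from the per-core IRQ source register. It assumes that register sits at 0x40000060 + 4*N in the QA7 register list (0x40000070 + 4*N being the FIQ counterpart) and that the bit positions are as in the QA7 "Core interrupt sources" table; the enum and function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Bits of the Core N IRQ source register (assumed from QA7). */
#define SRC_CNTPSIRQ    (1u << 0)
#define SRC_CNTPNSIRQ   (1u << 1)   /* Generic Timer, non-secure physical */
#define SRC_CNTHPIRQ    (1u << 2)
#define SRC_CNTVIRQ     (1u << 3)
#define SRC_GPU         (1u << 8)   /* BCM2835 IC: System Timer, TimerARM, ... */
#define SRC_LOCAL_TIMER (1u << 11)

enum irq_origin {
    IRQ_NONE, IRQ_CORE_TIMER, IRQ_LOCAL_TIMER, IRQ_GPU_PERIPHERAL, IRQ_OTHER
};

/* Address of the IRQ source register of a given core (0..3). */
uint32_t core_irq_source_addr(unsigned core)
{
    assert(core < 4);
    return 0x40000060u + 4u * core;
}

/* Classify a value read from that register, core timers first. */
enum irq_origin classify_irq_source(uint32_t src)
{
    if (src == 0)
        return IRQ_NONE;
    if (src & (SRC_CNTPSIRQ | SRC_CNTPNSIRQ | SRC_CNTHPIRQ | SRC_CNTVIRQ))
        return IRQ_CORE_TIMER;
    if (src & SRC_LOCAL_TIMER)
        return IRQ_LOCAL_TIMER;
    if (src & SRC_GPU)
        return IRQ_GPU_PERIPHERAL;
    return IRQ_OTHER;   /* mailboxes, PMU, AXI */
}
```

Only the IRQ_GPU_PERIPHERAL case needs a second lookup in the BCM2835 interrupt controller at 0x3F00B200; the core and local timers never pass through it.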

One thing I've noticed, that the Linux kernel is using the GIC interrupt controller and not BCM2835 Section 7 as I do. Could that be the problem?

Cheers,
bzt

LdB
Posts: 1171
Joined: Wed Dec 07, 2016 2:29 pm

Re: CNTP interrupt not firing

Mon Feb 25, 2019 2:14 pm

You lost me QA7 Section 5.6????
https://www.raspberrypi.org/documentati ... rev3.4.pdf

Can you link what datasheet you are using?
I don't see any comparator registers on the local arm clock?

As per my sample I never found it hard to get the Interrupts working on each core especially what you call the local timer.
The only thing is that the prescaler etc. applies to all 4 cores as per 3.1, so I don't think you can use it to generate different delays on each core;
you need either of the other two timers with comparators for that.
3.1 64-bit Timer
The A7-core requires a 64-bit timing input signal which is used to implement the four timers internally to each processor core. There is only one 64-bit timer signal going to all four cores. This means that any changes to the timer will affect all cores.
.
Finally there is no GIC on the raspberry Pi they ripped it out and put the Broadcom one in for compatibility. Don't even look at GIC stuff it doesn't exist.

rst
Posts: 386
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: CNTP interrupt not firing

Mon Feb 25, 2019 2:44 pm

bzt wrote:
Mon Feb 25, 2019 12:52 pm
rst wrote:But it's better to have more timers than less, isn't it?
Normally yes. My only problem is that they don't have a consistent name throughout the docs and the drivers, therefore it's difficult to figure out which one is which. At some point they are all called the ARM Timer, which is very confusing.
Yes, it is a bit. But because the timers are from different IP vendors (ARM, Broadcom) you cannot expect consistency here. From my point of view it's good that the RPis are upwards compatible. You can still use the system timer at 0x3F003000 on the RPi 3, which was introduced on the RPi 1.
0x40000034: QA7 Section 4.11, let's call this Local Timer. This is one timer with one comparator, IRQ in 0x40000060 + 4*X bit 11. Enabling should be connecting it with a Core in 0x40000024 (specifying X), and then writing the reload value and enable bits in 0x40000034.
If I remember well, this timer has been added especially for the USB timing. On the RPi 1 the driver needs 8000 interrupts per second to synchronise with the SOF (start-of-frame) event and they used the FIQ for it. With this special timer one can program, after how many frames he wants to get control and can access the USB.
unknown: DDI0487 page D10-2653, let's call this Generic Timer. This has CNT system registers, which could be mapped in memory at an unknown address (not any of the above addresses, as none of those has a CNTFRQ register). Enabled by CNTP_CTL_EL0, comparator in CNTP_CVAL_EL0 as per D10-2653. IRQ unknown.
I think, it is not memory mapped in the BCM2836/7. You can access it via system control registers only.

LdB
Posts: 1171
Joined: Wed Dec 07, 2016 2:29 pm

Re: CNTP interrupt not firing

Mon Feb 25, 2019 2:54 pm

rst wrote:
Mon Feb 25, 2019 2:44 pm
If I remember well, this timer has been added especially for the USB timing. On the RPi 1 the driver needs 8000 interrupts per second to synchronise with the SOF (start-of-frame) event and they used the FIQ for it. With this special timer one can program, after how many frames he wants to get control and can access the USB
You are spot on and on the figure at 3.2 Interrupt routing (page 4) it is still called the USB timer
https://www.raspberrypi.org/documentati ... rev3.4.pdf

I don't get the bit about memory mapping 3.2 figure makes it quite clear just the IRQ/FIQ signal lines are crosspointed.
Last edited by LdB on Mon Feb 25, 2019 3:13 pm, edited 1 time in total.

rst
Posts: 386
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: CNTP interrupt not firing

Mon Feb 25, 2019 3:10 pm

LdB wrote:
Mon Feb 25, 2019 2:54 pm
You are spot on and on the figure at 3.2 Interrupt routing (page 4) it is still called the USB timer
https://www.raspberrypi.org/documentati ... rev3.4.pdf
Ah, yes. Haven't seen that. So it seems to be right.

rst
Posts: 386
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: CNTP interrupt not firing

Mon Feb 25, 2019 4:42 pm

I did a little test and it works here this way:

Code: Select all

write32 (0x40000040, 1 << 1);

u64 nCNTPCT;
asm volatile ("mrs %0, CNTPCT_EL0" : "=r" (nCNTPCT));
asm volatile ("msr CNTP_CVAL_EL0 , %0" :: "r" (nCNTPCT + 19200000));
asm volatile ("msr CNTP_CTL_EL0 , %0" :: "r" (1));
Then you get an IRQ after one second. You have to reload CNTP_CVAL_EL0, to clear the interrupt.
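Since every reload needs a fresh compare value, the arithmetic is worth putting in a helper. The following is a sketch under a few assumptions: the function names are invented, CNTFRQ_EL0 is assumed to have been set correctly by the firmware (19.2 MHz in the test above), and the register-access part is hidden behind a hypothetical BARE_METAL_EL1 build flag so the arithmetic can also be compiled and checked on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Counter ticks for a delay in microseconds. Whole seconds and the
 * sub-second remainder are converted separately so the multiplication
 * stays far away from 64-bit overflow for realistic frequencies. */
uint64_t ticks_from_us(uint64_t freq_hz, uint64_t delay_us)
{
    assert(freq_hz > 0);
    uint64_t whole = delay_us / 1000000u;
    uint64_t frac  = delay_us % 1000000u;
    return whole * freq_hz + (frac * freq_hz) / 1000000u;
}

/* Compare value that makes the timer condition true delay_us from now. */
uint64_t next_cval(uint64_t now, uint64_t freq_hz, uint64_t delay_us)
{
    return now + ticks_from_us(freq_hz, delay_us);
}

#if defined(__aarch64__) && defined(BARE_METAL_EL1)  /* hypothetical flag */
/* Re-arm the one-shot: writing a CVAL in the future drops ISTATUS,
 * which is what clears the pending interrupt. */
void rearm_oneshot(uint64_t delay_us)
{
    uint64_t now, freq;
    __asm__ volatile ("mrs %0, CNTPCT_EL0" : "=r" (now));
    __asm__ volatile ("mrs %0, CNTFRQ_EL0" : "=r" (freq));
    __asm__ volatile ("msr CNTP_CVAL_EL0, %0" :: "r" (next_cval(now, freq, delay_us)));
    __asm__ volatile ("msr CNTP_CTL_EL0, %0" :: "r" ((uint64_t)1)); /* ENABLE=1, IMASK=0 */
}
#endif
```

Calling rearm_oneshot() from the IRQ handler both acknowledges the current interrupt and schedules the next one; writing 3 to CNTP_CTL_EL0 instead would leave the timer running but masked.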

Info about the USB timer is here (last paragraph).

bzt
Posts: 374
Joined: Sat Oct 14, 2017 9:57 pm

Re: CNTP interrupt not firing

Mon Feb 25, 2019 5:05 pm

LdB wrote:
Mon Feb 25, 2019 2:14 pm
You lost me QA7 Section 5.6????
Sorry, my mistake. That was a typo, I wanted to write 4.6.
https://www.raspberrypi.org/documentati ... rev3.4.pdf
Can you link what datasheet you are using?
I don't see any comparator registers on the local arm clock?
I'm using that one. Well, strictly speaking it's not called a comparator on page 17, because that timer is a count down timer, so it's called a re-load value instead, but that's just wording:
The local timer counts down and re-loads when it gets to zero. At the same time an interrupt-flag is set.
Which is technically the same as if the timer would count up from zero and comparing current value to the re-load value. The point is, what I meant is, that's the one where you set the time for the interrupt.
As per my sample I never found it hard to get the Interrupts working on each core especially what you call the local timer.
The register "local interrupt routing" 0x40000024 on page 18 made me believe that the Local Timer IRQ is raised on one of the cores only. I can't see an option "all" or "any". And because there's only one Local Timer re-load value, you can't use it for all cores individually anyway.
3.1 64-bit Timer
The A7-core requires a 64-bit timing input signal which is used to implement the four timers internally to each processor core. There is only one 64-bit timer signal going to all four cores. This means that any changes to the timer will affect all cores.
Yeah, but which timer do they refer to by "64-bit Timer", Core Timers or the Local Timer? All 5 timers are 64 bit... :-)
Finally there is no GIC on the raspberry Pi they ripped it out and put the Broadcom one in for compatibility. Don't even look at GIC stuff it doesn't exist.
That's new to me. Are you serious? The raspberrypi/linux repo has gic driver in it. Raspberrypi Linux device tree has gic. As a matter of fact, all configurations in that directory have gic. Btw the arch/arm64/boot/dts/arm64/broadcom/bcm2837-rpi-3-b-plus.dts file has only one #include "arm/bcm2837-rpi-3-b-plus.dts" line, but there's no arm/bcm2837-rpi-3-b-plus.dts anywhere... And raspberrypi/firmware only has binary dtb files. So seriously, WTF?

So back to square one: how can I implement a per CPU core one-shot timer interrupt?

Cheers,
bzt

bzt
Posts: 374
Joined: Sat Oct 14, 2017 9:57 pm

Re: CNTP interrupt not firing

Mon Feb 25, 2019 5:29 pm

rst wrote:
Mon Feb 25, 2019 2:44 pm
I think, it is not memory mapped in the BCM2836/7. You can access it via system control registers only.
I've thought that too. But in DDI0487 page D10-2646 says:
A Generic Timer implementation must also include a memory-mapped component. This component:
- Must provide the System Counter shown in Figure D10-1
- Optionally, can provide timer components for use at system level.
Chapter 12 System Level Implementation of the Generic Timer describes this memory-mapped component.
and page I2-6726 describes those. There's no doubt that arm_arch_timer.c is using that mapped memory instead of the system registers. But if it's not the arm_arch_timer.c, then what per CPU timer driver is used in Linux on RaspberryPi? System Timer (bcm2835_timer.c) surely can't be used as a per CPU timer.
rst wrote:Then you get an IRQ after one second.
Yeah, that's funny, I do exactly the same but I don't. ISR_EL1 remains 0. Just one thing, I use virtual memory, so I've mapped 0x40000000 with OSH, NX and with mair 4 (device, nGnRE). Is that correct? It works for other MMIO (0x3F000000 - 0x40000000), but those are not ARM core specific.

Btw, have you tried this on a real hardware or in qemu -M raspi3?

I've checked qemu's source, the 0x40000000 area is included, and in hw/arm/bcm2836.c the nCNTPNSIRQ is connected to it. It should work. Frankly I don't try my code on real hw until I see it running perfectly in a vm, but maybe this time it's not my code and I've found a qemu issue? Listing the device tree tells me that System Timer and TimerARM are not implemented at all, but ARM Control is. I've debugged qemu and I can confirm that when ISTATUS changes to one, "qemu_set_irq(cpu->gt_timer_outputs[GTIMER_PHYS], 1);" gets called for sure. I hadn't considered it because it's unlikely, but this doesn't mean there's no bug in the qemu irq code somewhere... I wrote to the qemu mailing list, let's see what they say.

Thank you all for your help,
bzt

bzt
Posts: 374
Joined: Sat Oct 14, 2017 9:57 pm

Re: CNTP interrupt not firing

Mon Feb 25, 2019 5:44 pm

Hi guys,

Look what I've found:
https://github.com/NienfengYao/armv8-bare-metal
(Fixed, 2018/07/19) Timer IRQ doesn't work(commit b72c6a8cc7033a4fed89b57f75826d201466179f)

We can see CNTV_CTL_EL0[2]:ISTATUS changes, but irq_handler doesn't be called.
And we also didn't see any changes in ISR_EL1.
Solved in commit 2aaa0bff7516e84e01acd10d8de64189839d9d51.
Sounds familiar? :-)

Unfortunately this code uses GIC, so if LdB is right, there's no use on RPi :-(

Cheers,
bzt

LdB
Posts: 1171
Joined: Wed Dec 07, 2016 2:29 pm

Re: CNTP interrupt not firing

Mon Feb 25, 2019 5:48 pm

You have an issue you haven't thought about: you need an interrupt table to splay out the 64 normal IRQs anyhow, unless you were going to stay with a trivial example, because any timer will fire the IRQ/FIQ. Done this way it's easy: just take the timer interrupts on one core and use the mailbox IRQ to pass them to the other cores, which is closer to how you have to set up a multicore system for IPC anyhow (I am sure Linux does that, and it puts a trivial load on the core).

I can think of an easy but probably not optimal way to do what you want with two timers, each using an IRQ and FIQ which you can independently crosspoint to a different core. I can't think of a way to do it off a single timer without using the mailbox, I just don't think it is designed for that.

rst
Posts: 386
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: CNTP interrupt not firing

Mon Feb 25, 2019 7:24 pm

bzt wrote:
Mon Feb 25, 2019 5:29 pm
rst wrote:
Mon Feb 25, 2019 2:44 pm
I think, it is not memory mapped in the BCM2836/7. You can access it via system control registers only.
I've thought that too. But in DDI0487 page D10-2646 says:
A Generic Timer implementation must also include a memory-mapped component. This component:
- Must provide the System Counter shown in Figure D10-1
- Optionally, can provide timer components for use at system level.
Chapter 12 System Level Implementation of the Generic Timer describes this memory-mapped component.
and page I2-6726 describes those. There's no doubt that arm_arch_timer.c is using that mapped memory instead of the system registers. But if it's not the arm_arch_timer.c, then what per CPU timer driver is used in Linux on RaspberryPi? System Timer (bcm2835_timer.c) surely can't be used as a per CPU timer.
That's interesting. In fact there is a memory-mapped component of the generic timer in the RPi 2 and 3: the one which is described in the QA7 document. :) It is not compatible with the layout in the I2 section, but maybe they decided to define their own memory layout? I don't know. But the question is, at which other address should these memory frames (CNTControlBase, CNTReadBase etc.) be? I don't know enough about timers in the current Linux kernel.
Yeah, that's funny, I do exactly the same but I don't. ISR_EL1 remains 0. Just one thing, I use virtual memory, so I've mapped 0x40000000 with OSH, NX and with mair 4 (device, nGnRE). Is that correct? It works for other MMIO (0x3F000000 - 0x40000000), but those are not ARM core specific.
I'm using the same memory attributes. I also see no difference between the two MMIO regions.
Btw, have you tried this on a real hardware or in qemu -M raspi3?
On a real RPi 3B+.
I've checked qemu's source, 0x40000000 area is included, and in hw/arm/bcm2836.c the nCNTPNSIRQ is connected to it. It should work. Frankly I don't try my code on real hw until I see it running perfectly in a vm, but this time maybe it's not my code but I've found a qemu issue? Listing device tree tells me that System Timer, TimerARM is not implemented at all, but ARM Control is. I've debugged qemu and I can confirm when ISTATUS changes to one, "qemu_set_irq(cpu->gt_timer_outputs[GTIMER_PHYS], 1);" gets called for sure. Although I've not thought because it's unlikely, but this doesn't mean there's no bug in the qemu irq code somewhere... I wrote to qemu mailinglist, let's see what they say.
Let's wait, what they're answering. I'm currently not working with QEMU for AArch64, because I need the system timer at 0x3F003000 and it is not implemented, as you know. :(
Thank you all your help,
bzt
You are welcome,
rst

bzt
Posts: 374
Joined: Sat Oct 14, 2017 9:57 pm

Re: CNTP interrupt not firing

Mon Feb 25, 2019 8:56 pm

Hi,

Thank you for your help!

@LdB: not a bad idea! But I'm definitely looking for 4 independent timers, so that I can preempt each core individually. With a single timer and 4 mailbox calls I can interrupt the cores at given intervals, but I want different time for each. I'll leave that to plan B :-)

@rst: yeah, interesting I give you that. I haven't found any base address for them yet, and I'm not desperate enough to put debug messages in the Linux kernel and see :-) I don't think they have changed the layout, it's more like they haven't documented the address, the one used by arm_arch_timer.c as "base". I wouldn't be surprised if it turns out to be 0x40000100 or something, right after the documented ones.

Btw, I've added debug messages to qemu:

Code: Select all

diff --git a/hw/intc/bcm2836_control.c b/hw/intc/bcm2836_control.c
index cfa5bc7365..a90a7728c5 100644
--- a/hw/intc/bcm2836_control.c
+++ b/hw/intc/bcm2836_control.c
@@ -55,6 +55,8 @@ static void deliver_local(BCM2836ControlState *s, uint8_t core, uint8_t irq,
     } else {
         /* the interrupt is masked */
     }
+    if(s->irqsrc[core])
+        qemu_log_mask(CPU_LOG_INT, "deliver_local core %d irqsrc %d\n", core, s->irqsrc[core]);
 }
 
 /* Update interrupts.  */
@@ -113,6 +115,8 @@ static void bcm2836_control_set_local_irq(void *opaque, int core, int local_irq,
 
     assert(core >= 0 && core < BCM2836_NCORES);
     assert(local_irq >= 0 && local_irq <= IRQ_CNTVIRQ);
+        qemu_log_mask(CPU_LOG_INT, "set_local_irq core %d local_irq %d level %d timerirqs %x\n",
+                      core, local_irq, level, s->timerirqs[core]);
 
     s->timerirqs[core] = deposit32(s->timerirqs[core], local_irq, 1, !!level);
 
@@ -192,6 +196,8 @@ static void bcm2836_control_write(void *opaque, hwaddr offset,
 {
     BCM2836ControlState *s = opaque;
 
+        qemu_log_mask(CPU_LOG_INT, "%s: write %"HWADDR_PRIx" %lx\n",
+                      __func__, offset, val);
     if (offset == REG_GPU_ROUTE) {
         s->route_gpu_irq = val & 0x3;
         s->route_gpu_fiq = (val >> 2) & 0x3;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index fbaa801cea..aad26d133c 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -2413,6 +2413,8 @@ static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
         gt->ctl = deposit32(gt->ctl, 2, 1, istatus);
 
         irqstate = (istatus && !(gt->ctl & 2));
+        if(irqstate)
+            qemu_log_mask(CPU_LOG_INT,"CNTPIRQ timeridx=%d\n",timeridx);
         qemu_set_irq(cpu->gt_timer_outputs[timeridx], irqstate);
 
         if (istatus) {
@@ -9442,6 +9444,9 @@ static void arm_cpu_do_interrupt_aarch32(CPUState *cs)
     take_aarch32_exception(env, new_mode, mask, offset, addr);
 }
This revealed a few interesting things:
- first, MMIO mapping is correct as I could see the writes to 0x40000024 and 0x40000040+4N.
- second, when irqstate changed in gt_recalc_timer(), the call went through bcm2836_control_set_local_irq(), but deliver_local() thought the IRQ was masked.
- third, in my bootloader, when I cleared cnthvoff_el2, it immediately triggered a virtual CNTVIRQ, but since interrupts were masked in DAIF, I did not notice. I also hadn't acknowledged anything, so I assume that's why the second (this time wanted) CNTPNSIRQ was dropped.

I've removed the el2 part from the bootloader; I still don't get my handler called, but at least now I can see bits set in ISR_EL1 :-) From now on, this is a totally different issue, but it's progress, so I'm getting somewhere :-)

Thank you guys for your help! Very much!
bzt

bzt
Posts: 374
Joined: Sat Oct 14, 2017 9:57 pm

Re: CNTP interrupt not firing

Mon Feb 25, 2019 10:37 pm

Now that I don't need it any more, of course I've found everything :-) It seems Linus' repo has all the files the raspberrypi/linux repo lacks. I'll post the links here in case someone needs them later.

The RaspberryPi 3B device tree source file is here
https://github.com/torvalds/linux/blob/ ... pi-3-b.dts
https://github.com/torvalds/linux/blob/ ... m2837.dtsi

The ARM Core Timers driver is not in the clocksource directory, but in irqchip:
https://github.com/torvalds/linux/blob/ ... 836.c#L258
Just a sidenote: despite what the name bcm2835_init_local_timer_frequency() suggests, it's not the Local Timer but the Core Timers that the driver sets up. Anyway, it clears bits 8 and 9 in the control register at 0x40000000 (so it sets the clock source to the crystal clock and the increment to 1), and sets the prescaler at 0x40000008 to 0x80000000. Then the enabler (unmasking) function is in line 95 (that writes to 0x40000040+4N). That's all, nothing else needed.
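Put together, the three steps look roughly like the sketch below. It is not the Linux code, just the same sequence written against a base pointer so it can be tried on a plain array as well as on the real 0x40000000 block. The names are invented, and it assumes the clock-source and increment bits are bits 8 and 9 of the control register, which is how I read QA7 section 4.2.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offsets inside the QA7 ARM control block. */
#define QA7_CONTROL        0x00u
#define QA7_PRESCALER      0x08u
#define QA7_TIMER_IRQCNTL  0x40u      /* + 4 * core */

#define CTL_APB_CLOCK   (1u << 8)     /* 0 = crystal clock (assumed bit) */
#define CTL_INCREMENT_2 (1u << 9)     /* 0 = count by 1 (assumed bit)    */
#define IRQCNTL_CNTPNS  (1u << 1)     /* nCNTPNSIRQ IRQ enable           */

/* base: start of the control block, mapped as device memory. */
void core_timer_setup(volatile uint32_t *base, unsigned core)
{
    assert(core < 4);
    /* crystal clock, increment by 1 */
    base[QA7_CONTROL / 4] &= ~(CTL_APB_CLOCK | CTL_INCREMENT_2);
    /* prescaler 2^31, i.e. a 1:1 ratio */
    base[QA7_PRESCALER / 4] = 0x80000000u;
    /* let the non-secure physical timer IRQ of this core through */
    base[QA7_TIMER_IRQCNTL / 4 + core] |= IRQCNTL_CNTPNS;
}
```

On the Pi this would be called once per core with base pointing at wherever 0x40000000 is mapped; the control and prescaler writes are shared by all cores, so repeating them is harmless.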

Cheers,
bzt

LdB
Posts: 1171
Joined: Wed Dec 07, 2016 2:29 pm

Re: CNTP interrupt not firing

Mon Feb 25, 2019 11:22 pm

bzt wrote:
Mon Feb 25, 2019 8:56 pm
@LdB: not a bad idea! But I'm definitely looking for 4 independent timers, so that I can preempt each core individually. With a single timer and 4 mailbox calls I can interrupt the cores at given intervals, but I want different time for each. I'll leave that to plan B :-)
I don't know what you are doing, but in general you don't do that because then you have to synchronize the core communication. Generally you want all the cores preempting (doing a context switch) together because it means they are largely self-synchronized. For example, if you are trying to set up a multicore scheduler with each core running its own local task list, you have just made it much more sensitive to synchronization, and you have made launching a gang schedule with a couple of cores much more complex than it needs to be.

As a general rule pre-emptive switchers only run between 100-1000 switches per second, and given the narrow range there is little to be gained from having a homogeneous core cluster switching at different speeds. Remember there are other sorts of pre-emptive schedulers that don't use a timer tick, and they may be more appropriate; you seem hell-bent on needing 4 independent timers, which to me is setting off alarm bells.

Perhaps google Tickless RTOS or Tickless scheduler and convince yourself that is not more what you want.

I have played around with this heavily of late and that is just my thoughts on the matter, not a definitive answer.

bzt
Posts: 374
Joined: Sat Oct 14, 2017 9:57 pm

Re: CNTP interrupt not firing

Tue Feb 26, 2019 2:32 am

Hi

@LdB: actually what I'm doing is using one one-shot timer per core. When a task is scheduled, I know which core it's on and for how long it can run, and I set a timer to that. Because with Core Timers I have one timer for each core, I can easily do that. The scheduler runs on each core individually, called as needed, so no synchronization is required at all, yet there'll be no deadlocks.
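As a rough illustration of arming such a per-core one-shot, here is a minimal C sketch. It follows the register sequence from the first post (compare value = counter + interval, then CNTP_CTL_EL0 = 1); the helper names and the microsecond-based interface are assumptions made for this example, not bzt's actual code.

```c
#include <stdint.h>

/* Convert a time slice in microseconds to counter ticks.
 * Note: freq_hz * us can overflow for extremely long slices. */
static uint64_t us_to_ticks(uint64_t freq_hz, uint64_t us)
{
    return freq_hz * us / 1000000u;
}

/* Compare value that makes the timer condition true slice_us from now. */
static uint64_t one_shot_deadline(uint64_t now, uint64_t freq_hz,
                                  uint64_t slice_us)
{
    return now + us_to_ticks(freq_hz, slice_us);
}

#if defined(__aarch64__)
/* Hypothetical helper: must run at EL1 (or higher) on the target core. */
static inline void arm_one_shot(uint64_t slice_us)
{
    uint64_t now, freq;
    __asm__ volatile("mrs %0, cntpct_el0" : "=r"(now));
    __asm__ volatile("mrs %0, cntfrq_el0" : "=r"(freq));
    __asm__ volatile("msr cntp_cval_el0, %0"
                     :: "r"(one_shot_deadline(now, freq, slice_us)));
    /* ENABLE=1, IMASK=0 */
    __asm__ volatile("msr cntp_ctl_el0, %0" :: "r"((uint64_t)1));
}
#endif
```

Because the Generic Timer registers are banked per core, each core calls this for its own task with no shared state involved.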

About Plan B: the idea was to use a single timer, whose handler would use the mailboxes to notify all the cores. That mailbox notifier would then run on each core and decrease a core-local counter. If the counter reached zero, it would call the scheduler on that core, otherwise it would exit as soon as possible. But since I got the Generic Timer running, I don't have to worry about this :-)
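For comparison, the Plan B countdown could look something like the following C sketch. The names (`ticks_left`, `set_slice`, `mailbox_tick`) are hypothetical, and treating a counter of zero as "nothing armed" is an assumption of this example; the mailbox delivery itself is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_CORES 4

/* One countdown per core; each entry is only touched by its own core,
 * so no locking is needed in this sketch. */
static uint32_t ticks_left[NUM_CORES];

/* Called by the scheduler on `core` when it picks a task. */
static void set_slice(unsigned core, uint32_t ticks)
{
    ticks_left[core] = ticks;
}

/* Runs on `core` when the mailbox notification arrives.
 * Returns true when the scheduler should be called on this core. */
static bool mailbox_tick(unsigned core)
{
    if (ticks_left[core] == 0)
        return false;               /* nothing armed: exit right away */
    return --ticks_left[core] == 0; /* slice used up on this tick? */
}
```

The cost compared with the per-core hardware timer is that every core takes every tick, even when most of them only decrement and return.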

There's one catch though, right now qemu does not support the Local Timer, needed for a periodic IRQ. So good news, I've implemented that. It works with my test code, but the support is very basic, and more tests are welcome.
https://github.com/bztsrc/qemu-local-timer

Cheers,
bzt

LdB

Re: CNTP interrupt not firing

Tue Feb 26, 2019 3:20 am

Okay, so generally, if I were doing what you are, I would not use a timer at all. I would simply set up the SVC call to trigger the context switch.

I have actually done this in my version of FreeRTOS, which supports tickless operation and early-finish tasks:
https://github.com/LdB-ECM/Raspberry-Pi ... tStart64.S

Code: Select all

swi_handler_stub:
	portSAVE_CONTEXT
	MOV X1, SP
	AND X1, X1, #0xF			// Ensure 16-byte stack alignment
	SUB SP, SP, X1				// Adjust stack as necessary
	STP X1, XZR, [SP, #-16]!		// Store adjustment
	BL vTaskSwitchContext
	LDP X1, XZR, [SP], #16			// Reload adjustment
	ADD SP, SP, X1				// Un-adjust stack
	/* restore context which includes a return from interrupt */
	portRESTORE_CONTEXT
	/* code should never reach this deadloop */
	B .
So when your task finishes, just execute an SVC 0 instruction and the context switches (actually it isn't specifically zero; I was lazy, so it will work on any SVC call).

In FreeRTOS it is called portYIELD, or historically taskYIELD

Code: Select all

#define portYIELD() __asm volatile ( "SVC 0" ::: "memory" )
So you can do exactly what you are trying to do without needing a timer at all. You learn a lot playing around.

So my conversion of FreeRTOS supports both a fixed timer tick preemptive switch and an early-finish forced task switch, which is sort of optimal and clean. In short, it supports timer tick and tickless context switch operation, or even both at the same time; it doesn't care.

Perhaps it isn't what you want, but what I am getting at is that there are many ways to skin the scheduler cat, so perhaps call this plan C. It is dead easy to set up on each core.

bzt

Re: CNTP interrupt not firing

Tue Feb 26, 2019 11:05 am

Hi,

Nice! But that's not what I want. What you have implemented with SVC is called cooperative multitasking. What I'm doing here is called preemptive multitasking but without regular interval interrupts (aka tickless).
wikipedia wrote:The term preemptive multitasking is used to distinguish a multitasking operating system, which permits preemption of tasks, from a cooperative multitasking system wherein processes or tasks must be explicitly programmed to yield when they do not need system resources.
wikipedia wrote:A tickless kernel is an operating system kernel in which timer interrupts do not occur at regular intervals, but are only delivered as required.
Cheers,
bzt

LdB

Re: CNTP interrupt not firing

Tue Feb 26, 2019 11:40 am

For the record I am not suggesting a co-operative scheduler :-)

It isn't co-operative, because you set up a scheduler algorithm that runs against a system clock and "forcibly takes the core" by executing an SVC 0, which is why it is on an interrupt and not just a code block like you might have with a co-operative scheduler. It will basically do exactly what your one shot does, but the scheduler quanta build up off the clock until they exceed a task's quanta, at which point it kicks an SVC 0 to the core running that task and forces it to switch. Operationally it would be indistinguishable from your one shot; it is just a software version driven from a single clock.
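A minimal C sketch of that bookkeeping, under stated assumptions: the struct, the function names and the bitmask return value are invented for this example, and how the "kick" actually reaches the other core (mailbox, SVC, and on which core the clock handler itself runs) is deliberately left open.

```c
#include <stdint.h>

#define NUM_CORES 4

struct core_slot {
    uint32_t used;     /* clock ticks consumed by the running task */
    uint32_t quantum;  /* ticks the task may run; 0 = nothing armed */
};

static struct core_slot slots[NUM_CORES];

/* Called when a task is dispatched on `core`. */
static void start_task(unsigned core, uint32_t quantum)
{
    slots[core].used = 0;
    slots[core].quantum = quantum;
}

/* Called on every tick of the single system clock.
 * Returns a bitmask of cores that must be forced to switch. */
static uint32_t clock_tick(void)
{
    uint32_t kick = 0;
    for (unsigned c = 0; c < NUM_CORES; c++) {
        if (slots[c].quantum == 0)
            continue;
        if (++slots[c].used >= slots[c].quantum) {
            kick |= 1u << c;
            slots[c].quantum = 0;  /* behaves like a one-shot */
        }
    }
    return kick;
}
```

The resolution of such a software one-shot is one clock tick, so the tick rate bounds how precisely a quantum can be enforced.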

So there are multiple ways to make the thing you are calling a one shot besides a physical hardware clock. I am going to suggest it would be rare for a multicore to have the hardware you are after, but in software it's easy. I will leave that to you.

For reference:
Software timer creation in FreeRTOS, which allows a one shot by setting uxAutoReload to false:
https://www.freertos.org/FreeRTOS-timer ... reate.html
You might want a faster clock by organizing a sub-tick rather than a tick timer, but nonetheless you could implement what you are doing in software on the FreeRTOS code. It is trivial.

Anyhow, I just offered some suggestions and you know what you're doing, so I'll leave you to it.

bzt

Re: CNTP interrupt not firing

Wed Feb 27, 2019 12:42 am

LdB wrote:
Tue Feb 26, 2019 11:40 am
It isn't co-operative because you set up a scheduler algorithm running and referencing to a system clock and it will "forcibly take the core" by executing a SVC 0 which is why it is on an interrupt not just a code block like you might do with a co-operative scheduler.
Oh, I missed that, sorry. My interrupt handlers are already switching tasks, so there's no need to call SVC, and I never do re-entrant ISRs, that's what confused me :-) Now I see what you meant. Makes a lot more sense this way actually :-)
Operationally it would be indistinguishable from your one shot, it is just a software version from a single clock.
I still don't see how it could replace 4 independent per core timers, because on which core will that single clock irq handler run? Or on all of them at once?
So there are multiple ways to make the thing you are calling a one shot besides a physical hardware clock.
This is a different thing, maybe I wasn't clear. A one-shot timer is a timer which waits for a given time, then triggers an IRQ. That's it. It's the opposite of a periodic timer, which triggers IRQs at the given interval until it's told to stop. This has nothing to do with the timer being a per-core local timer or a CPU-independent hardware clock, or with whether the timer IRQ can be routed to one of the cores, all of the cores etc. It just means how many IRQs will be fired after the timer is enabled (one = one shot, many = periodic).
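The distinction can be shown with a small simulated timer in C. Nothing here touches real hardware; the struct and function names are made up for the example, and time is just an integer that the caller advances.

```c
#include <stdbool.h>
#include <stdint.h>

struct sim_timer {
    uint64_t next_fire;  /* simulated time of the next IRQ */
    uint64_t period;     /* 0 = one-shot, otherwise reload interval */
    bool     enabled;
};

static void timer_start(struct sim_timer *t, uint64_t now,
                        uint64_t delay, bool periodic)
{
    t->next_fire = now + delay;
    t->period = periodic ? delay : 0;
    t->enabled = true;
}

/* Advance simulated time to `now`; returns how many IRQs fired. */
static unsigned timer_advance(struct sim_timer *t, uint64_t now)
{
    unsigned irqs = 0;
    while (t->enabled && t->next_fire <= now) {
        irqs++;
        if (t->period == 0)
            t->enabled = false;         /* one-shot: fire once, then stop */
        else
            t->next_fire += t->period;  /* periodic: re-arm itself */
    }
    return irqs;
}
```

Started with the same delay of 10 and advanced to time 35, the one-shot timer reports one IRQ while the periodic one reports three (at 10, 20 and 30).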
I am going to suggest it would be rare for a multicore to have the hardware you are after but in software its easy, but I will leave that to you.
Not really. All the multicore architectures I have worked with had CPU-local interrupts. And if memory serves, all of them had per-CPU timers (not sure about the VAX9000, it was a long, long time ago), or at least several comparators in one timer, each routable to a different core.
Anyway, right now I'm concerned with x86_64 and AArch64. The first has the LAPIC Timer, the latter has the ARM Generic Timer (or Core Timers in QA7 parlance), so I'm good :-) But thanks for your advice, I appreciate it very much!

Cheers,
bzt

LdB

Re: CNTP interrupt not firing

Wed Feb 27, 2019 3:26 am

bzt wrote:
Wed Feb 27, 2019 12:42 am
I still don't see how it could replace 4 independent per core timers, because on which core will that single clock irq handler run? Or on all of them at once?
The software timer effectively runs on the core, because each core has its own Control Block (CB), so when the cores enter the common scheduler code they adjust their own local CB.

It might be easier to look at this in code. What you are doing is called SCHED_RR in the Linux 2.6 real-time scheduler:
https://github.com/shichao-an/linux/blo ... sched_rt.c
You can look at the scheduler on version 4 kernel but it is significantly more complex to understand

So the code has two real-time scheduling policies, it also has a normal idle/slow app policy called SCHED_NORMAL but we are interested in
SCHED_FIFO
SCHED_RR

SCHED_FIFO: a first-in, first-out scheduling algorithm without timeslices that obeys these rules:
(i) A runnable SCHED_FIFO task is always scheduled over any SCHED_NORMAL tasks.
(ii) When a SCHED_FIFO task becomes runnable, it continues to run until it blocks or explicitly yields the processor; it has no timeslice and can run indefinitely.
(iii) Only a higher priority SCHED_FIFO or SCHED_RR task can preempt a SCHED_FIFO task.
(iv) Two or more SCHED_FIFO tasks at the same priority run round-robin, but only yield the processor when they explicitly choose to do so.
(v) If a SCHED_FIFO task is runnable, all other tasks at a lower priority cannot run until the SCHED_FIFO task becomes unrunnable.

SCHED_RR (the one you are interested in): the same as SCHED_FIFO, except that each process can run only until it exhausts a predetermined timeslice (that is the one shot you keep referring to). In other words, SCHED_RR is SCHED_FIFO with timeslices. It is a real-time, round-robin scheduling algorithm.
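The difference between the two policies on a scheduler tick can be sketched in a few lines of C. This is a simplified illustration, not the kernel's code: the type names, `rt_task_tick` and the `RR_TIMESLICE` value are hypothetical.

```c
#include <stdbool.h>

enum policy { POLICY_FIFO, POLICY_RR };

#define RR_TIMESLICE 10  /* hypothetical slice length, in ticks */

struct task {
    enum policy policy;
    int time_slice;      /* remaining ticks; only meaningful for RR */
};

/* Called on every scheduler tick for the running real-time task.
 * Returns true when the task has used up its slice and should be moved
 * to the tail of its priority queue. */
static bool rt_task_tick(struct task *t)
{
    if (t->policy != POLICY_RR)
        return false;           /* FIFO: no timeslice, runs until it yields */
    if (--t->time_slice > 0)
        return false;           /* RR: slice not exhausted yet */
    t->time_slice = RR_TIMESLICE;  /* refill for the task's next turn */
    return true;
}
```

A FIFO task never triggers a requeue from the tick, while an RR task does so once per slice.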

So SCHED_RR does exactly what you want and if you look at the SMP section you will find how the cores have a simple way to borrow time from each other

Code: Select all

/*
 * We ran out of runtime, see if we can borrow some from our neighbours.
 */
static int do_balance_runtime(struct rt_rq *rt_rq) 

So if you cut down the Linux 2.6 RT scheduler to only SCHED_RR, it does exactly the same as what you are trying to do with hardware, and it is portable. I have never seen a hardware-dependent scheduler, so you have got it over me on that, but I am used to seeing portable software ones like this. Probably not much more to say; if you are interested in the solution, you now have a code reference to look at.

bzt

Re: CNTP interrupt not firing

Wed Feb 27, 2019 10:49 am

LdB wrote:
Wed Feb 27, 2019 3:26 am
The software timer effectively runs on the core
Okay, but which one? :-) Do you route the timer IRQ to one core only, or to all at once? I understand that the scheduler gets called individually on all cores at the end, I'm just curious what's happening in between.
LdB wrote:You can look at the scheduler on version 4 kernel but it is significantly more complex to understand
Hey, pretty nice scheduler you have there! Besides the schedulers in Linux, I've also spent a lot of time studying the Mainframe OS/390 scheduler and Solaris' scheduler, among others, so I have quite a good grasp :-) Btw Solaris' scheduler was the first I studied that could combine real-time and time-shared priority levels.

I have a much simpler scheme: I use 8 priority levels, and within each level processes are chosen in a round-robin fashion (I reorder the items in the queues to minimize TLB flushes, but that's a different story). Every higher priority level has more timeslices than the one beneath it, and the top 3 levels are uninterruptible, meaning that when the IRQ handler is executed, it won't call the scheduler. So top 3 priority level tasks can be interrupted for a very short period of time, just like any other process in the lower priority levels, but they are given control back ASAP and they are never task switched away. I use a micro-kernel architecture, so my IRQ handlers do nothing more than send a message to one of the device driver tasks. If the handler interrupts a lower priority task, then the scheduler switches to that particular driver task, otherwise the message is just queued. Not a perfect solution (only soft real time), but pretty simple and it works remarkably well.
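To make the rules of this scheme concrete, here is a small C sketch. Several details are assumptions for illustration only, since the post does not give them: level 0 is taken to be the highest priority, the timeslice count grows linearly with priority, and "lower priority task" is read as "any task outside the top 3 levels". All names are hypothetical.

```c
#include <stdbool.h>

#define NUM_LEVELS       8
#define UNINTERRUPTIBLE  3  /* levels 0..2 are never switched away by an IRQ */
#define BASE_SLICES      1  /* hypothetical slice count of the lowest level */

/* Assumption: level 0 is the highest priority and gets the most timeslices. */
static unsigned timeslices_for(unsigned level)
{
    return BASE_SLICES * (NUM_LEVELS - level);
}

/* May an IRQ handler call the scheduler while this level is running? */
static bool irq_may_reschedule(unsigned running_level)
{
    return running_level >= UNINTERRUPTIBLE;
}

enum irq_action { QUEUE_ONLY, SWITCH_TO_DRIVER };

/* The IRQ handler always queues a message for the driver task; it only
 * switches to the driver when the interrupted task is interruptible. */
static enum irq_action on_irq(unsigned running_level)
{
    return irq_may_reschedule(running_level) ? SWITCH_TO_DRIVER : QUEUE_ONLY;
}
```

With these assumptions a level 2 task that is hit by an IRQ gets control back right after the message is queued, while a level 3 task is switched out in favour of the driver.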

Cheers,
bzt
