gertk
Posts: 52
Joined: Mon Aug 29, 2011 9:08 am

Interrupt latency

Fri Jun 07, 2013 10:00 am

I have been searching through the forums but could not find the answer.

In bare-metal assembler, how much time would pass between a GPIO edge-triggered interrupt (FIR?) and the interrupt handler starting?

What I want to do is emulate a peripheral and/or memory for a 1 MHz 6502.
I have done a similar thing with the LPC1768 on a slower bus system (8048) with success: the read or write signal generates the interrupt and the LPC presented data on the bus in (real)time.

The problem with the Pi is that I have to multiplex the reading of the 6502 address and data bus because of the few IO pins available, and I also need level conversion. Still, with about 70 ns per single GPIO read (a figure I found in another thread), it might just be doable.

I am aiming at a sub-500 ns response time (the low part of the phase 2 clock).
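
To put rough numbers on that budget, here is a minimal sketch in C. It assumes the 70 ns per GPIO access quoted above and treats the interrupt entry latency as an unknown parameter. The function name and the 100 ns entry figure in the note below are placeholders for illustration, not measured values.

```c
#include <assert.h>
#include <stdint.h>

/* How many GPIO accesses fit in the response window once the
 * (assumed) interrupt entry latency has been spent.
 * All times are in nanoseconds. */
uint32_t gpio_accesses_in_budget(uint32_t budget_ns,
                                 uint32_t entry_latency_ns,
                                 uint32_t access_ns)
{
    assert(access_ns > 0);
    if (entry_latency_ns >= budget_ns)
        return 0;               /* window already missed */
    return (budget_ns - entry_latency_ns) / access_ns;
}
```

With a 500 ns window, a hypothetical 100 ns entry latency and 70 ns per access, gpio_accesses_in_budget(500, 100, 70) gives 5 accesses. Those would have to cover the multiplexed address reads, any multiplexer select writes and the final data write.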

JacobL
Posts: 76
Joined: Sun Apr 15, 2012 2:23 pm

Re: Interrupt latency

Fri Jun 07, 2013 5:05 pm

I assume that you will of course use the FIQ, and limit your code to use R8-R12 + SP + LR? I'm pretty sure the full IRQ resolver will take longer than you have.

The interrupt itself would normally occur from one clock cycle to the next (though you have to add an unknown delay due to all interrupts being routed through the GPU). But that is just the act of loading 0x1C in pc. After that, the typical interrupt vector table would have both an instruction fetch and a data fetch, though a hardcoded branch could limit that to just the instruction.

It might be doable, but you won't have much room for fancy stuff. I would avoid using the stack or function calls (freeing up SP + LR as general purpose registers), and generally try to limit memory access. You should also keep your application footprint small enough to be able to guarantee that the cache will always be hot; one cache miss at this level could break it.

gertk
Posts: 52
Joined: Mon Aug 29, 2011 9:08 am

Re: Interrupt latency

Fri Jun 07, 2013 8:53 pm

JacobL wrote:I assume that you will of course use the FIQ, and limit your code to use R8-R12 + SP + LR? I'm pretty sure the full IRQ resolver will take longer than you have.
Yes, I meant FIQ not FIR :oops:

JacobL wrote:The interrupt itself would normally occur from one clock cycle to the next (though you have to add an unknown delay due to all interrupts being routed through the GPU). But that is just the act of loading 0x1C in pc. After that, the typical interrupt vector table would have both an instruction fetch and a data fetch, though a hardcoded branch could limit that to just the instruction.
With the LPC the handler was entered in about 120 nsec, and the total bus cycle of that 8048 was 2.5 usec, so I had some more time to play. The width of the RD or WR pulse was about 1 usec and in that time I could read the address bus and prepare the data to be emitted. After that I just busy-waited for the RD or WR line to change and removed the data from the bus again.
The better part of the 1.5 usec left was still available for the LPC to do (lots of) other things. Also, the RD/WR interrupt was the only one on the system. All other semi-critical timing was done by polling the system timer from the main loop :)

JacobL wrote:It might be doable, but you won't have much room for fancy stuff. I would avoid using the stack or function calls (freeing up SP + LR as general purpose registers), and generally try to limit memory access. You should also keep your application footprint small enough to be able to guarantee that the cache will always be hot; one cache miss at this level could break it.
Yes, the cache might play tricks here. The size of the routine itself will be quite small: some GPIO reading, and a single memory fetch from RAM. The layout of the GPIO pins might be troublesome; (at least) 8 bits in a row would be ideal...
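
To illustrate why 8 bits in a row matter, here is a small sketch in C comparing the two cases. It assumes the pin levels arrive as one 32-bit word (on the BCM2835 that would be a read of GPLEV0). The function names and the pin numbers in the note below are hypothetical, chosen only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Eight bus bits on consecutive GPIOs: one shift and one mask. */
uint8_t bus_byte_contiguous(uint32_t levels, unsigned first_pin)
{
    assert(first_pin <= 24);    /* pins first_pin..first_pin+7 must be < 32 */
    return (uint8_t)((levels >> first_pin) & 0xFFu);
}

/* Eight bus bits on arbitrary GPIOs: pins[i] carries bus bit i. */
uint8_t bus_byte_scattered(uint32_t levels, const uint8_t pins[8])
{
    uint8_t value = 0;
    for (size_t i = 0; i < 8; i++) {
        assert(pins[i] < 32);
        value |= (uint8_t)(((levels >> pins[i]) & 1u) << i);
    }
    return value;
}
```

The contiguous version costs a shift and a mask. The scattered version, shown here with a loop for clarity, needs per-bit work (or a lookup table) inside a handler that has only a few hundred nanoseconds to spare. For example, with a hypothetical mapping of bus bits 0..7 to GPIOs 4, 17, 18, 22, 23, 24, 25, 27, a level word with GPIOs 4, 18 and 27 high decodes to 0x85.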

tufty
Posts: 1456
Joined: Sun Sep 11, 2011 2:32 pm

Re: Interrupt latency

Sat Jun 08, 2013 6:31 am

Use the "don't jump from the vector table" trick for your FIQ, i.e. start your FIQ routine at 0x1c without a jump. You have 32K of space to play with as long as you either don't care about the atags stuff or have relocated it. Gains you a few cycles.

JacobL
Posts: 76
Joined: Sun Apr 15, 2012 2:23 pm

Re: Interrupt latency

Sat Jun 08, 2013 6:49 am

gertk wrote:The size of the routine itself will be quite small
How about the rest of the code running outside the FIQ? Could it cause your FIQ handler to be evicted from cache? You could consider disabling the cache and using the SRAM for your code, but that needs some careful consideration of the memory accesses you need.

tufty wrote:Use the "don't jump from the vector table" trick for your FIQ, i.e. start your FIQ routine at 0x1c without a jump. You have 32K of space to play with as long as you either don't care about the atags stuff or have relocated it. Gains you a few cycles.
Good advice. But remember that atags get written after loading kernel.img, so if you want to use the area 0x100-0x4000 then you need to copy code at runtime. An FIQ handler where you avoid stack and function calls would normally be position independent, so it should be fairly simple to set up.

tufty
Posts: 1456
Joined: Sun Sep 11, 2011 2:32 pm

Re: Interrupt latency

Sat Jun 08, 2013 10:21 am

JacobL wrote:But remember that atags get written after loading kernel.img, so if you want to use the area 0x100-0x4000 then you need to copy code at runtime. An FIQ handler where you avoid stack and function calls would normally be position independent, so it should be fairly simple to set up.
Unless you're using kernel_old, everything loads starting at 0x8000, so the need to copy at least a vector table is implicit. For maximum speed, I'd probably avoid using the stack anyway and store stuff in hard-coded locations somewhere in the first 32k.

For the cache issue, you can lock a cache way, which is (IIRC) 16K. ISTR there's a TCM mapping thing that could be useful, but I forget the specifics. Check the "Level One Memory System" part of the 1176jzf-s TRM.
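
The runtime copy discussed in this thread could look roughly like the following C sketch. It is a host-side illustration only: `low_mem` stands in for address 0, and the function and macro names are hypothetical. It assumes the handler is position independent, as suggested earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FIQ_VECTOR_OFFSET 0x1Cu     /* FIQ entry in the ARM vector table */
#define LOW_REGION_SIZE   0x8000u   /* kernel.img loads at 0x8000 */

/* Copy a position-independent handler so that its first instruction
 * sits at the FIQ vector. `low_mem` stands in for address 0; on real
 * hardware it would be the vector base.
 * Returns 0 on success, -1 if the handler would overrun the region. */
int install_fiq_handler(uint8_t *low_mem, const void *code, size_t len)
{
    assert(low_mem != NULL && code != NULL);
    if (len > LOW_REGION_SIZE - FIQ_VECTOR_OFFSET)
        return -1;
    memcpy(low_mem + FIQ_VECTOR_OFFSET, code, len);
    return 0;
}
```

A handler longer than 0xE4 bytes would run into the atags area at 0x100, which is only acceptable if the atags are not needed or have been relocated. On real hardware with caches enabled, the copied code would most likely also need cache maintenance before the first FIQ is taken; that step is not shown here.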

gertk
Posts: 52
Joined: Mon Aug 29, 2011 9:08 am

Re: Interrupt latency

Sat Jun 08, 2013 12:06 pm

Thanks for all the useful tips.
I will try to set something up to try them out.

First I need to sort out the (bidirectional) level conversion.
I will start with a simple 8-bit bus and see if I can manage to emulate a single IO port or something like 256 bytes of RAM.
