tjrob
Posts: 33
Joined: Tue Feb 19, 2013 5:31 pm

Solution: Dedicating one core to a real-time process

Tue Dec 11, 2018 2:45 am

Linux is most definitely not a real-time operating system: with the default configuration you will occasionally see latencies as large as 50 milliseconds, which is pretty hopeless for anything that needs real-time response. You can certainly install a real-time OS or write a kernel module, but those require real expertise and are a major effort. If the approach described here can meet your requirements, it will be much easier. Note that the results of using an RTOS, a kernel module, or "bare metal" code may be disappointing, as the RPi hardware is not capable of disabling interrupts on a single core (tests show that the Linux kernel routine local_irq_disable() disables interrupts on all cores).

I describe how to achieve latencies less than 3 microseconds 99+% of the time, with a worst-case latency of 41 microseconds. This is on a Raspberry Pi 3B+ running 2018-06-27-raspbian-stretch-lite, without any kernel drivers or other exotica. Note that on single-core models the results will be much worse. While this is not "hard" real time, it is sufficient for many applications. My applications are for a scientific instrument and involve sampling hardware at 1,000 Hz and writing each result to a socket.

By "latency" I mean the difference between the time that "something" happens and the time the program knows that it happened. Here "something" could be an edge on a GPIO, the end of a time delay, a specific time of day, etc. As best I can tell, large latencies are due to an interrupt causing the kernel to schedule some other task. Because of that, worst-case latencies do not add: during one loop the code can call several routines that each have latency, but it essentially never happens that more than one of them suffers a large latency. The RPi has a hardware counter that increments at 1 MHz, so measuring latency with 1 microsecond (us) resolution is easy.

The idea is to dedicate one core on a RPi 3B+ to the real-time process, and then write it as a simple loop doing whatever it needs to do:


initialize
for(;;) {
    wait for connection to the socket
    for(;;) {
        wait for GPIO edge indicating ADC data are ready
        read ADC channels
        fprintf(socket, ...)
    }
    close socket
}
Yes, there is time to use fprintf(). But using stdout to an ssh connection is barely possible at 1 kHz, while using a bare TCP connection leaves a lot more headroom.

How to do it
  1. Add this argument to /boot/cmdline.txt:
    isolcpus=3
    This prevents Linux from scheduling processes on core 3. Interrupts still happen on it, though, and there is an essential kernel task running on it. It is a good start, but not at all sufficient.
  2. Set the process's CPU affinity to 3. See attached code.
  3. Disable turbo mode. Otherwise the OS will change the CPU clock frequency, and that screws up the SPI clock and the overall timing. See attached code.
  4. Set the process to real-time FIFO scheduling, with maximum priority. See attached code.
  5. Permit real-time processes to use 100% of the CPU. Without this you'll get 50 millisecond delays once per second. See attached code.
Attached are Realtime.h and Realtime.cc that implement the code needed in your process for #2-5. Just call Realtime::setup() during initialization.
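
For readers without the attachment, steps 2, 4 and 5 might look roughly like the following. This is a minimal sketch using standard Linux calls, not the contents of Realtime.cc; the function names are hypothetical. Step 3 (turbo mode) is left out because the post does not say how the attached code handles it. Steps 4 and 5 need root (or equivalent capabilities).

```cpp
#include <sched.h>   // sched_setaffinity, sched_setscheduler (Linux/glibc)
#include <cstdio>    // std::fopen, std::fputs, std::fclose

// Step 2 (hypothetical helper): restrict the calling thread to one core,
// e.g. the core named in isolcpus=. Returns true on success.
bool pin_to_cpu(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;  // 0 = calling thread
}

// Step 4 (hypothetical helper): real-time FIFO scheduling at maximum priority.
bool set_fifo_max_priority() {
    sched_param param{};
    param.sched_priority = sched_get_priority_max(SCHED_FIFO);
    return sched_setscheduler(0, SCHED_FIFO, &param) == 0;
}

// Step 5 (hypothetical helper): lift the kernel's real-time throttling by
// writing -1 to sched_rt_runtime_us. The default of 950000 us out of every
// 1000000 us leaves 50 ms per second in which real-time tasks are paused.
bool allow_full_rt_cpu(const char* path = "/proc/sys/kernel/sched_rt_runtime_us") {
    std::FILE* f = std::fopen(path, "w");
    if (!f) return false;
    const bool wrote = std::fputs("-1\n", f) >= 0;
    const bool closed = std::fclose(f) == 0;
    return wrote && closed;
}
```

With isolcpus=3 as above, the process would call pin_to_cpu(3), set_fifo_max_priority() and allow_full_rt_cpu() once during initialization and check each return value.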

Also attached is latency.cc to measure the latencies. Note that to measure the GPIO latency it needs GPIO22 connected to GPIO23. Here is a screenshot of an 18-hour run on a Raspberry Pi 3B+:
The line
0us: 0 76961522 31486427 9058 ...
means that no sample had a 0-microsecond latency, 76961522 samples had a 1-microsecond latency, 31486427 samples had a 2-microsecond latency, etc. See the comments in the code for an explanation of what is measured.
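
A histogram in that format can be kept in a small array of counters. The sketch below is an illustration, not the code in latency.cc. It assumes ten bins per row, labeled by the latency of the first bin (a guess based on the single row quoted above), and it uses std::chrono::steady_clock as a stand-in for the RPi's 1 MHz hardware counter. All names are hypothetical.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical histogram: bin i counts samples whose latency was i microseconds.
// Latencies at or beyond the last bin are clamped into the last bin.
struct LatencyHistogram {
    static constexpr std::size_t kBins = 100;
    std::array<std::uint64_t, kBins> bins{};

    void record(std::uint64_t latency_us) {
        const std::size_t i =
            latency_us < kBins ? static_cast<std::size_t>(latency_us) : kBins - 1;
        ++bins[i];
    }

    // Format one row of up to ten bins, starting at first_bin.
    std::string row(std::size_t first_bin) const {
        std::string out = std::to_string(first_bin) + "us:";
        for (std::size_t i = first_bin; i < first_bin + 10 && i < kBins; ++i)
            out += " " + std::to_string(bins[i]);
        return out;
    }
};

// Microseconds from a monotonic clock (stand-in for the 1 MHz hardware counter).
inline std::uint64_t micros_now() {
    using namespace std::chrono;
    return static_cast<std::uint64_t>(
        duration_cast<microseconds>(steady_clock::now().time_since_epoch()).count());
}
```

A measurement loop would take micros_now() when the event was due and again when the program notices it, then pass the difference to record().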
Attachments
Screen Shot 2018-12-10 at 5.39.52 PM.png

tjrob
Posts: 33
Joined: Tue Feb 19, 2013 5:31 pm

Re: Solution: Dedicating one core to a real-time process

Tue Dec 11, 2018 2:48 am

The code attachment got lost. Here it is.

I also forgot to show how to build the latency program:


g++ -o latency latency.cc Realtime.cc -lpigpio -lpthread
Attachments
code.tgz
(6.01 KiB)

Joel_Mckay
Posts: 289
Joined: Mon Nov 12, 2012 10:22 pm

Re: Solution: Dedicating one core to a real-time process

Wed Dec 12, 2018 7:13 am

Guaranteed-latency OSes like LinuxRT essentially install a program as a kernel module.
;-)
Our club's rc1 variant of Stretch with this kernel will be out Dec. 28, 2018 (includes many updates):
https://sourceforge.net/projects/microm ... pberry-pi/

Cheers,
J

tjrob
Posts: 33
Joined: Tue Feb 19, 2013 5:31 pm

Re: Solution: Dedicating one core to a real-time process

Sun Dec 23, 2018 1:02 am

That latency program doesn't actually do anything in the real-time process. In particular, it does not do the fprintf(socket, ...). Once I actually used it, I learned what should have been obvious: network flow control can back up and block the fprintf(), causing the loop to miss samples.

Fortunately there is a simple fix: split the program into two threads, the real-time thread that takes samples, and an IO thread that writes to the socket. Connect them via a thread-safe queue. Have the real-time thread set its CPU affinity to CPU 3, and the IO thread to CPUs 0,1,2; both need real-time priority.
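
As an illustration of the queue between the two threads, here is a minimal sketch of a fixed-capacity single-producer/single-consumer ring buffer. This is one possible design, not the author's implementation, which is not shown in the thread. The Sample fields and all names are hypothetical. A ring buffer is used here so that the real-time thread never blocks or allocates memory when it pushes.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for one measurement.
struct Sample {
    std::uint32_t timestamp_us;
    std::int32_t value;
};

// Ring buffer that is safe for exactly one pushing thread and one popping thread.
class SampleQueue {
public:
    // One slot is kept empty to tell "full" from "empty".
    explicit SampleQueue(std::size_t capacity) : buf_(capacity + 1) {}

    // Real-time thread only. Never blocks; returns false if the queue is full.
    bool try_push(const Sample& s) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % buf_.size();
        if (next == tail_.load(std::memory_order_acquire)) return false;
        buf_[head] = s;
        head_.store(next, std::memory_order_release);  // publish the sample
        return true;
    }

    // I/O thread only. Returns false if the queue is empty.
    bool try_pop(Sample& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;
        out = buf_[tail];
        tail_.store((tail + 1) % buf_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<Sample> buf_;
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

The capacity should be sized for the worst I/O stall expected. The real-time thread would treat a false return from try_push() as a missed sample, and the I/O thread would drain the queue with try_pop() and sleep briefly whenever it is empty.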

With the improved version, an 18-hour run using 1 kHz sampling missed no samples. For 99.999% of the samples the queue had fewer than 10 entries, but there were two episodes where it exceeded 1,000 entries (and then rapidly decreased to 0) -- that's an I/O delay of more than a second! The measured time between samples never differed from 1,000 us by more than 50 us. So this meets my requirements.

My code is specific to the hardware being sampled, but if anyone wants me to post it, just ask. My first program implements a 200 MHz frequency counter in a CPLD, sampled at 1 kHz; I will be doing an ADS1256 4-channel ADC sampled at 1 kHz, and also an SR04 ultrasonic ranger at 20 Hz (with accuracy of 2 mm RMS -- much better than most because it times the echo using the SPI clock).

nixiebunny
Posts: 5
Joined: Thu Apr 04, 2019 4:32 pm

Re: Solution: Dedicating one core to a real-time process

Thu Apr 04, 2019 4:47 pm

Hi. I'm an engineer at the University of Arizona who works on radio telescopes. I'm working on building a real-time controller for a telescope secondary mirror, to replace an aging FPGA system. This thing moves really fast: 20 milliseconds from start to finish.
https://public.nrao.edu/wp-content/uplo ... 85x300.jpg

I am interested in your method of locking one core to do the real-time task. In my case, it will be a 200 microsecond PID loop, receiving position feedback over a serial port running at 2 Mbit/sec.

I expect to use shared memory to implement the communication between the host interface thread and the PID loop thread.

Would you be willing to post your code for your loop? I'd like to see how you solved it, to save effort on my end.

Thanks for posting.

Joel_Mckay
Posts: 289
Joined: Mon Nov 12, 2012 10:22 pm

Re: Solution: Dedicating one core to a real-time process

Thu Apr 04, 2019 10:18 pm

nixiebunny wrote: Would you be willing to post your code for your loop? I'd like to see how you solved it, to save effort on my end.
FPGAs solve a different set of problems, and FIR/CIC filters are used in DSP given the uniformity of timing latency.

I am not familiar with the equipment in the photo, but would recommend looking at a NIOS soft-cpu like Altera offers.
This runs a specialized Linux kernel that allows mapping shared memory into the dedicated FPGA sections.
Or put another way, the real-time is handled by the FPGA hardware, and the DMA buffers are polled by the soft-cpu multitasking OS.

There is a high probability things have changed over the years, but repeatability is usually very important to the credibility of a scientist's work.
;-)

nixiebunny
Posts: 5
Joined: Thu Apr 04, 2019 4:32 pm

Re: Solution: Dedicating one core to a real-time process

Tue Apr 23, 2019 4:04 am

Joel_Mckay wrote:
Thu Apr 04, 2019 10:18 pm
I am not familiar with the equipment in the photo, but would recommend looking at a NIOS soft-cpu like Altera offers.
This runs a specialized Linux kernel that allows mapping shared memory into the dedicated FPGA sections.
Or put another way, the real-time is handled by the FPGA hardware, and the DMA buffers are polled by the soft-cpu multitasking OS.
;-)
Joel,

Thanks for your concern. I know about doing DSP in FPGAs - I programmed one to be a 10 GSPS FFT spectrometer.

This application needs to calculate two motor drive numbers from ten parameters every 200 microseconds. It's easily doable by an Arduino, except for the I/O. The previous designers used an FPGA because they weren't being clever.

I just need to ensure that the maximum latency is under a hundred microseconds.

Joel_Mckay
Posts: 289
Joined: Mon Nov 12, 2012 10:22 pm

Re: Solution: Dedicating one core to a real-time process

Tue Apr 23, 2019 5:02 am

nixiebunny wrote:
Tue Apr 23, 2019 4:04 am
I just need to ensure that the maximum latency is under a hundred microseconds.
This may ultimately depend on how tolerant your device is to timing jitter, and whether your 10 variables are concurrently sampled.
https://en.wikipedia.org/wiki/Segal%27s_law

One could compile linuxRT to get 5kHz task resolution, lock the CPU/GPU clock, and inhibit the USB IRQ subsystems with a busy-loop (costs about 30% of standby CPU load on the Pi3B+). I have not personally tried higher RT task resolutions on the pi.... YMMV with your custom kernel setup.

Your group may want to have a look at this publication at some point too....
"Optimal signals of Golomb ruler class for spectral measurements at EKB SuperDARN radar: Theory and experiment" ( O. I. Berngardt, A.L.Voronov, and K. V. Grkovich )

Indeed, one may get better results with a bare-metal atmega system, as Arduino SDK has a timer0 IRQ service you would need to explicitly cripple to get repeatable timing. ;-)

Best of luck,
~J~

Paleloshow
Posts: 13
Joined: Mon Feb 11, 2019 9:23 pm

Re: Solution: Dedicating one core to a real-time process

Wed May 01, 2019 12:09 am

Thanks for this real-time solution. I was struggling to make Xenomai work on my Pi. I managed to run a neural network using your Realtime.h library and I got an execution time of 14 ms. I measured this using the empty() section of the latency.cc code (I call the neural network between the begin and end Realtime::micros calls). Although that's a good time, I would like to lower it even more. Do you know any way to do that? The neural network I am using is a basic one: Iris data classification.

PurpleMark
Posts: 1
Joined: Wed Jul 03, 2019 2:03 pm

Re: Solution: Dedicating one core to a real-time process

Thu Jul 04, 2019 1:00 pm

I had a go at benchmarking this approach on the Raspberry Pi 4.

Firstly, note the new peripheral base address on the Pi 4: 0xFE000000.

I ran the 'short delay' and 'long delay' tests only, at 600MHz, 1GHz and 1.5GHz for a duration of 4 hours.
(arm_freq=600/1000/1500 & force_turbo=1 in /boot/config.txt)
Note: I get the impression from reading other posts that fixing the clock at 1.5GHz risks overheating under a heavy load, so I kept cores 1/2/3 near idle (resulting in a fairly stable 63 degC in a 23 degC room). The incoming firmware update (which shows a >3 degC improvement) and a CPU cooler would be beneficial here.
Latency.png
As expected, the latency tail improves at higher clock speeds (bear in mind it's a logarithmic probability scale, so the effect is very small), but the maximum latency doesn't change much. It would be interesting to find out what the hold-up is.
