ACouCam
Posts: 17
Joined: Mon Feb 24, 2014 7:16 pm

Re: SPI driver latency and a possible solution

Tue Dec 16, 2014 7:55 am

Hi Notro!

Just a question about including this driver in the kernel. Will you try again some time in the future (perhaps when the issue is fixed) and why was the case closed last time?
I would be very happy to see it included.

Regards,
Jonas

notro
Posts: 695
Joined: Tue Oct 16, 2012 6:21 pm
Location: Drammen, Norway

Re: SPI driver latency and a possible solution

Tue Dec 16, 2014 10:59 pm

The main reason it wasn't included was lack of interest and that the DMA version didn't have 9-bit support.
Now that we know about the broken SPI multi transfer, it certainly won't get included.
I have other priorities at the moment, so I haven't got the time to look into this.

You can use rpi-source to build this driver for the official kernel:
https://github.com/notro/rpi-source/wiki
https://github.com/notro/rpi-source/wik ... pi-bcm2708

ACouCam
Posts: 17
Joined: Mon Feb 24, 2014 7:16 pm

Re: SPI driver latency and a possible solution

Wed Dec 17, 2014 6:41 am

OK, thanks a lot, I didn't know about rpi-source.

Jonas

AlexKordic
Posts: 9
Joined: Sun Jun 02, 2013 12:46 am

Re: SPI driver latency and a possible solution

Fri Feb 13, 2015 1:32 pm

Is there an example of using the new driver? Especially for the following use cases:
msperl wrote: any GPIO can get used as ChipSelect - we are no longer limited to 2 SPI Devices on the bus!
msperl wrote: The driver includes copying a 64Bit timestamp (based on a 1MHz clock) of when the SPI transfer terminates (and it would be quite easy to add one to the start of the transfer as well), but that is not exposed further up the linux SPI-stack.

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Sat Feb 28, 2015 12:38 pm

Something new:
I have restarted somewhat from scratch with the goal of solving the "low hanging fruit" and getting them included.
Some of those have been fixed.

I have now also re-measured the whole situation and came up with a few changes that give real improvements for my use case (a CAN controller).

So much so that I believe that for low transfer counts it may not be necessary to resort to DMA.
Note that there is still room to improve the situation by going for full-blown DMA-scheduled SPI, but the question becomes: is all this code necessary?

Please look at the wiki and the "patch the upstream" branch, which also includes all the measurements.

I now really need to start working on getting these upstream...


Martin

jesperkn
Posts: 6
Joined: Tue Mar 17, 2015 5:01 am

Re: SPI driver latency and a possible solution

Tue Mar 17, 2015 3:16 pm

Hi Martin

First of all, great work and a huge achievement! I will be testing some ASIC / FPGA in the coming week(s). I believe I need 3-4 Mbit/s over SPI DMA. Any concerns?

Best Jesper

notro
Posts: 695
Joined: Tue Oct 16, 2012 6:21 pm
Location: Drammen, Norway

Re: SPI driver latency and a possible solution

Tue Mar 17, 2015 7:17 pm

10 MB/s should be possible at 128MHz, in bursts at least: https://github.com/notro/fbtft/wiki/Per ... 581-pi-ext (using https://github.com/notro/spi-bcm2708)
As long as you don't do other DMA stuff like heavy USB.

This prebuilt kernel has that SPI DMA driver if you want to test: https://github.com/notro/fbtft/wiki#install

jesperkn
Posts: 6
Joined: Tue Mar 17, 2015 5:01 am

Re: SPI driver latency and a possible solution

Tue Mar 17, 2015 7:54 pm

Hey notro,

This is perfect, and thanks for the great work too ... you and Martin have done a huge job on this. I should not be doing much USB work at the same time, so this might be perfect. Which is better, the driver update or the kernel update? Or are they the same?

Can I use the Mike McCauley package, or is there a better alternative for coding this in C?
http://www.raspberry-projects.com/pi/pr ... e-mccauley

Best Jesper

notro
Posts: 695
Joined: Tue Oct 16, 2012 6:21 pm
Location: Drammen, Norway

Re: SPI driver latency and a possible solution

Tue Mar 17, 2015 9:20 pm

I failed to mention one crucial bit of information: you can (currently) only do SPI DMA from kernel space.
So you need to write a kernel module to get that speed, together with the out-of-tree spi-bcm2708 driver.

When/if the spi-bcm2835 SPI driver gets can_dma support, then it will be possible to do DMA from userspace using spidev, because the SPI subsystem will do the DMA mapping.

I have done 2MB/s at 32MHz without DMA from kernel space: https://github.com/notro/fbtft/wiki/Per ... itdb28_spi
Not sure what you can achieve through spidev though. You have to try.
SPI loopback code example: https://github.com/raspberrypi/linux/bl ... dev_test.c
https://www.kernel.org/doc/Documentation/spi/spidev
The default spidev buffer size is 4k. You can change this in /boot/cmdline.txt: spidev.bufsiz=NN
http://lxr.free-electrons.com/source/dr ... i/spidev.c

The bcm2835 library uses polling, so I don't think it will give you the speed you need.
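
For anyone starting from userspace, here is a minimal sketch of a single full-duplex transfer through spidev, in the spirit of the spidev_test.c example linked above. The helper name spi_xfer is made up for illustration; the struct spi_ioc_transfer fields and the SPI_IOC_MESSAGE ioctl come from linux/spi/spidev.h. It assumes the device has already been opened and that 8 bits per word is what your chip expects.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

/* Hypothetical helper: one full-duplex transfer of len bytes on an
 * already opened spidev file descriptor.
 * Returns 0 on success, -1 on failure (errno is set by ioctl). */
int spi_xfer(int fd, const uint8_t *tx, uint8_t *rx, size_t len,
             uint32_t speed_hz)
{
    struct spi_ioc_transfer tr;

    /* At least one direction must have a buffer. */
    assert(tx != NULL || rx != NULL);

    memset(&tr, 0, sizeof(tr));
    tr.tx_buf = (uintptr_t)tx;      /* bytes clocked out on MOSI */
    tr.rx_buf = (uintptr_t)rx;      /* bytes clocked in on MISO */
    tr.len = (uint32_t)len;         /* must fit in spidev.bufsiz */
    tr.speed_hz = speed_hz;
    tr.bits_per_word = 8;           /* assumption: plain 8-bit words */

    /* SPI_IOC_MESSAGE(1) submits exactly one transfer. */
    return ioctl(fd, SPI_IOC_MESSAGE(1), &tr) < 0 ? -1 : 0;
}
```

Typical use would be to open /dev/spidev0.0 with O_RDWR and call spi_xfer(fd, tx, rx, 4, 4000000). Whether this gets anywhere near the kernel-space numbers above is exactly the part you have to measure.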

jesperkn
Posts: 6
Joined: Tue Mar 17, 2015 5:01 am

Re: SPI driver latency and a possible solution

Wed Mar 18, 2015 4:18 am

Thx notro, crucial for sure, but likely not impossible :) I will start by writing the code under userspace and get it stable then I will have to find a way of getting it into a kernel module. Should get the ASIC/FPGA module on monday but I already have sufficient data for 1 week of coding :)

jesperkn
Posts: 6
Joined: Tue Mar 17, 2015 5:01 am

Re: SPI driver latency and a possible solution

Mon Mar 30, 2015 2:01 pm

Update ... :)

I successfully got the ASIC/FPGA working and here are the results. I can drive the FPGA in 2 modes: slave or master.

1. In slave mode the RPI can keep up, and the only weird behavior I see is in the clock (I set it to 4 MHz but in reality I see 3.4 MHz). Any thoughts on this? I seem to have read about it, but I cannot remember where or what the reason was.

2. In Master mode the RPI has to become the slave ... this I have no good solution for besides writing a slave kernel module for implementation. I was thinking about just dedicating some GPIO pins for CS, MOSI, MISO, CLK ... any thoughts?

Kind
J

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Mon Mar 30, 2015 3:03 pm

RPI as an SPI slave seems very unlikely to ever work efficiently.

There is a chance for it to work using the other (BSC) slave-SPI engine, but that is quite an effort for this new driver and would need someone who is interested in taking up that task.

Also the specs (BCM2835 ARM Peripherals, page 160) indicate that it would need to be a polling driver due to missing interrupt or DMA support, which means high CPU utilization and slow clock rates due to interrupt latencies.

You might be better served triggering a pull via an interrupt line and writing the corresponding driver to pull the data instead.

Martin

jesperkn
Posts: 6
Joined: Tue Mar 17, 2015 5:01 am

Re: SPI driver latency and a possible solution

Mon Mar 30, 2015 5:46 pm

Hey Martin, that was my idea: interrupt on CS -> LOW from the FPGA/ASIC and poll out the data. Yes, MCU utilisation will be high during this event, but it is also critical to get all the data as there is no possibility of a resend. The second option would be to implement a hardware buffer (SPI storage) or an SPI slave-to-master buffer. I prefer the first option as it is the simplest, and I expect to get 160 bytes of data every 300 µs (and the speed is something like 10 Mbit/s).

I expect this to be doable but any thoughts / concerns?

Kind
J

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Mon Mar 30, 2015 6:08 pm

jesperkn wrote: Hey Martin, that was my idea: interrupt on CS -> LOW from the FPGA/ASIC and poll out the data. Yes, MCU utilisation will be high during this event, but it is also critical to get all the data as there is no possibility of a resend. The second option would be to implement a hardware buffer (SPI storage) or an SPI slave-to-master buffer.

I expect this to be doable but any thoughts / concerns?
As said: triggering an interrupt and then having a driver run the "pull" is right now the best approach.
SPI slave on the RPI seems very costly from a CPU perspective.

If you have some sort of hard-realtime requirements, then there is one other approach that might become sensible if you are thinking of using the RPI2: limit normal processes to only 3 CPUs (kernel boot parameter isolcpus) and have one CPU run only your own process/thread with a set affinity.

Have that thread handle the transfers directly polling/managing the GPIOs directly - maybe even 8 bit wide to speed things up further... (this obviously means that the process needs to run as root, and you can kill the RPI if done wrong!)
Then latencies would be minimal and the thread would not get interrupted (as long as it is the only thread on the Core). - OK, there may be a few interrupts in the order of 10 us, but you would have to accept that.

For possible details of what would be required look here:
http://stackoverflow.com/questions/1358 ... le-process
which shows the fine details...
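
The affinity half of that idea can be sketched in a few lines of C. This is only an illustration: the helper name pin_to_cpu is made up, it assumes glibc (sched_setaffinity and the CPU_* macros need _GNU_SOURCE), and it does nothing about isolcpus itself, which still has to be set on the kernel command line.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: pin the calling thread to a single CPU.
 * Returns 0 on success, -1 on error (errno is set). */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* pid 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

With isolcpus=3 on an RPI2 you would call pin_to_cpu(3) at the start of the polling thread and then check with sched_getcpu() that it really landed there.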

Martin

jesperkn
Posts: 6
Joined: Tue Mar 17, 2015 5:01 am

Re: SPI driver latency and a possible solution

Mon Mar 30, 2015 7:02 pm

Perfect Martin, and thanks for the info. I have already ordered a couple of RPI2s, so I should get them next week (due to Easter). I will see how far I can get on the RPI1, as my primary limitation will be my skills with kernel drivers ... this is still new territory for me, but the complexity of the drivers seems manageable.

Any good instructions on sharing data from kernelspace to userspace?

Kind J

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Mon Mar 30, 2015 7:14 pm

As said: if you want to make it quick for your solution - as explained - use one core for the communication from the FPGA into the rpi. Then you can avoid the kernel barrier and keep everything in user-space and do not have to focus on those details...

But if you wanted to really do some communication from kernel to userspace, then you should look into mapping memory between Kernel and userspace. But even then it totally depends on what kind of data you have to process - if it is an audio-stream then the solution would be different than if it is an image.

So you have to tailor it to what you really need, but there might be some framework already which you can leverage.

If you want to look into kernel details, get the book "Linux Device Drivers" to give you some ideas about the basics...

Karmek
Posts: 3
Joined: Mon Mar 30, 2015 3:22 pm

Re: SPI driver latency and a possible solution

Tue Mar 31, 2015 8:03 am

Hi Everybody,

I tried to catch up with this topic but there is a lot of information to process.
Could you help me understand what is going on here? I am about to build a data acquisition system with a rPi B+ which is supposed to sample 4 channels at a 1 kHz sampling rate each. Since I could not find any simultaneous-sampling ADC that suits my needs, I will probably go for a 24-bit one with an integrated multiplexer and channel sequencer.
This means ~4000 interrupts/s, each initiating a 3-byte read from the ADC.

As far as I understand, the crucial part here is latency, since context switches or scheduler interrupts may delay the SPI communication.

I have some questions left where I hope you could help me:
- Is the default SPI driver behind spidev (spi_bcm2708) included in the 3.12 kernel (I have an RT kernel with that version) good enough for this?
- I read about a BCM2835 driver which does not seem to be included in my kernel. Does it improve latency perceptibly? Is it stable?
- msperl's driver seems to improve performance a lot, but he mentioned that it brings little benefit for small transfers. Do you think it could improve performance in my scenario? Is it compatible with my kernel version? I think it was mentioned that it uses a new dev structure? What will change here (does it use device tree)?
- Are iio drivers a much better approach? I've never worked with them and do not know exactly what they change or whether they use DMA or interrupts.

Hope you guys can give me some input.

Regards,

Dennis

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Tue Mar 31, 2015 1:03 pm

Karmek wrote: As far as I understand, the crucial part here is latency, since context switches or scheduler interrupts may delay the SPI communication.

I have some questions left where I hope you could help me:
- Is the default SPI driver behind spidev (spi_bcm2708) included in the 3.12 kernel (I have an RT kernel with that version) good enough for this?
- I read about a BCM2835 driver which does not seem to be included in my kernel. Does it improve latency perceptibly? Is it stable?
- msperl's driver seems to improve performance a lot, but he mentioned that it brings little benefit for small transfers. Do you think it could improve performance in my scenario? Is it compatible with my kernel version? I think it was mentioned that it uses a new dev structure? What will change here (does it use device tree)?
- Are iio drivers a much better approach? I've never worked with them and do not know exactly what they change or whether they use DMA or interrupts.
In summary:
  • the spi framework has improved a lot since starting this thread
  • the spi-driver itself - at least the spi-bcm2835 driver is currently going thru lots of changes that may help your use-case - this might be ready for 4.1 kernels and the foundation may want to backport the driver to 3.18 or later.
  • iio drivers still use the spi framework and the same issues you would see.
DMA is not really a requirement for your use case - you run 4 bytes/sample (1 byte to select the channel and 3 bytes for data) and 4000 samples/second. DMA does not help you there - it helps mostly for longer "predictable" transfers.
The optimizations to the spi-bcm2835 (which I talk mostly about) help you more as they drive down interrupt latencies and are also polling for transfers that are <30us.

The spidev driver might be the best choice if you really have "predictable" transfers of 4 channel-samples.
Then the easiest is to use spidev and create multiple transfers in one SPI message, thus chaining all 4 measurements into one message. Then if you run your "sampling" process on a dedicated CPU (isolcpus=3 as kernel parameter) and pin that process to cpu=3, then you have your own "ownership" and you should be able to sample the messages at the right time with minimal jitter.
The way spi_sync is implemented now, these calls are made synchronously without any extra scheduling if there are no other devices on the SPI bus.

Also the new patches to spi-bcm2835 move to polling mode for short transfers which keeps the overhead minimal as well.

So I believe spidev may be your best option if you also want to avoid jitter and the assumptions above are valid...

Hope this helps...

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Tue Mar 31, 2015 1:36 pm

I had a look at spidev, but it seems as if the spidev as of now does not make use of any of those "advances" for spi_sync but relies instead on spi_async, which requires context-switches,...

You may want to patch spidev.c and there change spidev_sync so that it uses spi_sync instead of spi_async.
The code is from 2008, so it may need a review and might really get mapped directly to spi_sync.

Karmek
Posts: 3
Joined: Mon Mar 30, 2015 3:22 pm

Re: SPI driver latency and a possible solution

Tue Mar 31, 2015 2:23 pm

Hi msperl,

Thanks for your detailed answer!
For now I think I am stuck with kernel 3.12, since I read that no RT kernel patch is available for 3.18 yet. Also I am working on a rPi B+ (just one core). Using spi_sync would mean that the core is blocked by the SPI thread during transfers, right?
Anyway, your suggestions sound very good to me. Locking one core just for SPI transfers could avoid a lot of trouble! So if there is no other way, I might change to rPi2 eventually.

If I understand you correctly, DMA is not suitable for byte-wise transfers (very small data chunks) or unpredictable transfers. In case I can find a good 4-channel simultaneously sampling ADC, it would give me a single data stream containing all channels - thus causing a single transfer of ~4*3=12 bytes at 1 kSPS. May I ask for your guess here? Any improvement with DMA possible? Or still not worth the effort?

What will be polled with the spi-bcm2835 driver? Incoming bits on the MISO line?

I planned to use an ADC with a DataReady indicator, such that it generates an interrupt on my rPi when ADC data is ready for retrieval. Within the interrupt handler I wanted to initiate the SPI transfer using wiringPi. I already did some experiments to identify the delay from the occurrence of an interrupt to a change in my SPI chip select line. It varies a lot, on average between 120 and 190 us (which would be almost acceptable). Applying a FIFO scheduler and high priority (chrt -f -a -p 90 PID) stabilizes the delay a bit, but occasionally a delay of ~1.1 ms appears, which is even longer than the sampling interval.
Do you have any idea what might cause this delay? It confuses me, since the FIFO scheduler may not be interrupted until the transfer is complete (!?).

Thanks for your help,

Dennis

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Tue Mar 31, 2015 3:19 pm

Karmek wrote: Hi msperl,
If I understand you correctly, DMA is not suitable for byte-wise transfers (very small data chunks) or unpredictable transfers. In case I can find a good 4-channel simultaneously sampling ADC, it would give me a single data stream containing all channels - thus causing a single transfer of ~4*3=12 bytes at 1 kSPS. May I ask for your guess here? Any improvement with DMA possible? Or still not worth the effort?
Well - if you create a spi_message that does the following:
  • spi_transfer: send 4 bytes (first byte selects channel 0 and starts the ADC, the others are 0), read 4 bytes (bytes 2, 3, 4 contain the 24-bit sample), with cs_change=1
  • spi_transfer: send 4 bytes (first byte selects channel 1 and starts the ADC, the others are 0), read 4 bytes (bytes 2, 3, 4 contain the 24-bit sample), with cs_change=1
  • spi_transfer: send 4 bytes (first byte selects channel 2 and starts the ADC, the others are 0), read 4 bytes (bytes 2, 3, 4 contain the 24-bit sample), with cs_change=1
  • spi_transfer: send 4 bytes (first byte selects channel 3 and starts the ADC, the others are 0), read 4 bytes (bytes 2, 3, 4 contain the 24-bit sample), with cs_change=1
Then you schedule this "complex" message with spi_sync and you will get all 4 samples in one go without extra overhead. The polling driver will make sure that the latencies between the transfers are minimal.
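
From userspace the same four-transfer message could be built through spidev roughly like this. It is a sketch under assumptions: the function names are made up, the command byte layout (channel number in the first byte) and the MSB-first sample order are hypothetical and depend entirely on the ADC, and spidev rather than an in-kernel spi_sync call is used so that it can be tried without writing a module.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

#define NUM_CHANNELS   4
#define BYTES_PER_XFER 4   /* 1 command byte + 3 data bytes */

/* Fill one spi_ioc_transfer per channel, all to be sent as one message. */
void build_adc_message(struct spi_ioc_transfer xfer[NUM_CHANNELS],
                       uint8_t tx[NUM_CHANNELS][BYTES_PER_XFER],
                       uint8_t rx[NUM_CHANNELS][BYTES_PER_XFER],
                       uint32_t speed_hz)
{
    assert(speed_hz > 0);

    memset(xfer, 0, NUM_CHANNELS * sizeof(xfer[0]));
    memset(tx, 0, NUM_CHANNELS * BYTES_PER_XFER);

    for (int ch = 0; ch < NUM_CHANNELS; ch++) {
        tx[ch][0] = (uint8_t)ch;          /* hypothetical "select + start" command */
        xfer[ch].tx_buf = (uintptr_t)tx[ch];
        xfer[ch].rx_buf = (uintptr_t)rx[ch];
        xfer[ch].len = BYTES_PER_XFER;
        xfer[ch].speed_hz = speed_hz;
        xfer[ch].bits_per_word = 8;
        /* Toggle chip select after each transfer, as described above.
         * On the last transfer the kernel treats this as a hint that CS
         * may stay asserted, so you may prefer to clear it there. */
        xfer[ch].cs_change = 1;
    }
}

/* Submit all transfers as a single message. Returns 0 or -1. */
int read_all_channels(int fd, struct spi_ioc_transfer xfer[NUM_CHANNELS])
{
    return ioctl(fd, SPI_IOC_MESSAGE(NUM_CHANNELS), xfer) < 0 ? -1 : 0;
}

/* Bytes 2..4 of a reply hold the 24-bit sample (assumed MSB first). */
uint32_t sample_from_reply(const uint8_t reply[BYTES_PER_XFER])
{
    return ((uint32_t)reply[1] << 16) | ((uint32_t)reply[2] << 8) | reply[3];
}
```

One ioctl then returns all four samples, which is the point of chaining: a single trip into the kernel instead of four.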
Karmek wrote: What will be polled with spi-bcm2835 driver? Incoming bits on the MISO line?
Polling means running without an interrupt in a tight loop consuming CPU.
This only makes sense for a certain amount of time - the reason is that interrupts take 10-15us to start handling your code and then some more before releasing the CPU again for "normal" work. So this sums up to about 30us of CPU utilization either way.
So for short transfers (say 4 bytes in 3.5 us) it takes about 6.5 us with polling while it takes 30 us with interrupts.
That is why polling is an advantage for short transfers. It also makes transfers very predictable in length, with only minor variations...
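
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. The 30 us interrupt cost and the 3 us polling overhead (6.5 us minus 3.5 us) are taken from the figures above; the function names are made up, and the wire time is simply bits divided by clock, ignoring any gaps the controller inserts between bytes.

```c
#include <assert.h>
#include <stdint.h>

#define IRQ_PATH_COST_US 30.0   /* approximate CPU cost of the interrupt path */
#define POLL_OVERHEAD_US  3.0   /* 6.5 us polled total minus 3.5 us on the wire */

/* Time the bits spend on the wire, in microseconds. */
double wire_time_us(uint32_t len_bytes, uint32_t speed_hz)
{
    assert(speed_hz > 0);
    return (double)len_bytes * 8.0 * 1e6 / (double)speed_hz;
}

/* Estimated total duration of a polled transfer. */
double polled_cost_us(uint32_t len_bytes, uint32_t speed_hz)
{
    return wire_time_us(len_bytes, speed_hz) + POLL_OVERHEAD_US;
}

/* Poll when the transfer finishes before an interrupt would pay off. */
int should_poll(uint32_t len_bytes, uint32_t speed_hz)
{
    return wire_time_us(len_bytes, speed_hz) < IRQ_PATH_COST_US;
}
```

For example, 4 bytes at 10 MHz is 3.2 us on the wire, so polling wins easily, while a 4096-byte transfer at 1 MHz is far beyond the threshold and is better left to interrupts or DMA.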
Karmek wrote: I planned to use an ADC with a DataReady indicator, such that it generates an interrupt on my rPi when ADC data is ready for retrieval. Within the interrupt handler I wanted to initiate the SPI transfer using wiringPi. I already did some experiments to identify the delay from the occurrence of an interrupt to a change in my SPI chip select line. It varies a lot, on average between 120 and 190 us (which would be almost acceptable). Applying a FIFO scheduler and high priority (chrt -f -a -p 90 PID) stabilizes the delay a bit, but occasionally a delay of ~1.1 ms appears, which is even longer than the sampling interval.
Do you have any idea what might cause this delay? It confuses me, since the FIFO scheduler may not be interrupted until the transfer is complete (!?).
If I were you and wanted minimal jitter, I would not go the route of an ADC that sends an interrupt (see above), as you would need 4 interrupts for 4 transfers. Instead, look for an ADC that has "predictable" sample/hold times - say a prefix of X bits transmitted/read as 0, followed by the data. Also I would run everything essential in a tight C loop, not Python, to minimize latencies further. I guess you could get as close as 30 us for the full transfer (maybe slower).

The other approach could be an ADC which triggers the capture via a pin and which buffers the result - then any kind of jitter becomes minimal...

You could also use a micro-controller for the hard-realtime capturing from the ADC(s) and buffering the result for the pi to pull when it is ready via an interrupt...

But that said: a lot of it depends really what your application is and what the data requirements are with regards to latencies, ...

joan
Posts: 14355
Joined: Thu Jul 05, 2012 5:09 pm
Location: UK

Re: SPI driver latency and a possible solution

Tue Mar 31, 2015 3:25 pm

Given that 1kps is being talked about, the most accurate (in timing) results would probably be gained by bit banging, as in "An experiment in bit-banging SPI".

Karmek
Posts: 3
Joined: Mon Mar 30, 2015 3:22 pm

Re: SPI driver latency and a possible solution

Wed Apr 01, 2015 2:06 pm

Thanks very much for the good summary and information!
I need to discuss the idea of non-interrupt-based data acquisition, since I am not sure whether, for example, FFT analysis of the signals will be important later. If so, I would need very accurate timing for the sampling intervals. I still need to read joan's link about bit-banging though. If you want, I can keep you updated :)

Regards,

Dennis

adgriff2
Posts: 3
Joined: Wed Jun 05, 2013 1:16 am

Re: SPI driver latency and a possible solution

Thu Apr 02, 2015 6:58 pm

I've read the majority of this thread and understood maybe 75% of it.

I need to pull SPI data off an IC at a theoretical minimum of 9446400 bits/s. I assume this would work out to a theoretical minimum SPI speed of ~9.45 MHz. The IC has an SPI ceiling of 20 MHz. I've written a C++ program that attempts to open /dev/spidev0.0 with a speed of 20 MHz. (I read that this only gets opened at 15.6 MHz.) After some testing, transferring ~3.75 MB, I'm only achieving an effective rate of ~8.27 MHz.

I read somewhere that the power of 2 cdiv requirement was artificial and delved into the scary (for me) world of recompiling kernel modules. After about 3 days of floundering, I managed to compile a working version of spi-bcm2708.c for kernel version 3.18.10-v7+ with the line

//cdiv = roundup_pow_of_two(cdiv);
commented out!

This change did increase my effective rate to ~9.10 MHz. I could try to challenge the datasheet on its 20 MHz limit, but reducing the dead time seems like a better choice. I *think* I'm using spidev, but changes to spi-bcm2708 obviously helped.
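
For reference, the effect of that one line on the clock can be worked out with a small sketch. It assumes a 250 MHz SPI core clock, which is what makes a 20 MHz request land on the 15.6 MHz mentioned above; the function names and the minimum divider of 2 are illustrative, not copied from the driver, and it does not settle whether the hardware honours an odd divider exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two that is >= v (for v >= 1). */
static uint32_t roundup_pow2(uint32_t v)
{
    uint32_t p = 1;
    while (p < v)
        p <<= 1;
    return p;
}

/* Divider for a requested clock; rounds up so the result never
 * exceeds want_hz. pow2_only mimics the roundup_pow_of_two() line. */
uint32_t spi_cdiv(uint32_t bus_hz, uint32_t want_hz, int pow2_only)
{
    assert(bus_hz > 0 && want_hz > 0);

    uint32_t cdiv = (bus_hz + want_hz - 1) / want_hz;
    if (cdiv < 2)
        cdiv = 2;               /* assumed minimum divider */
    if (pow2_only)
        cdiv = roundup_pow2(cdiv);
    return cdiv;
}

/* Clock that actually results from the chosen divider. */
uint32_t spi_effective_hz(uint32_t bus_hz, uint32_t want_hz, int pow2_only)
{
    return bus_hz / spi_cdiv(bus_hz, want_hz, pow2_only);
}
```

With a 250 MHz bus and a 20 MHz request, the power-of-two version picks a divider of 16 (15.625 MHz), while the plain version picks 13 (about 19.2 MHz), which fits the direction of the improvement I measured.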

Finally 2 questions: Would the changes to spi-bcm2708 discussed here help me in "userspace"? If so, what source should I be compiling? (I see notro has a master branch: https://github.com/notro/spi-bcm2708/bl ... -bcm2708.c And msperl has many different branches: https://github.com/msperl/linux-rpi/blo ... -bcm2708.c)

Thanks for any help!

mbmahan39
Posts: 5
Joined: Tue Apr 14, 2015 12:18 am

Re: SPI driver latency and a possible solution

Tue Apr 14, 2015 12:28 am

Hi,
I'm trying to interface the Raspberry Pi Compute Module with an RFM69HW radio module for transferring high-speed data, and I realized I have the same problem explained in this post. My kernel version is 3.18.5+.
My questions are:
1- What is the maximum speed at which we can run the original SPI driver without losing data?
2- How much is installing this driver going to improve it?
3- Has anyone compiled the driver for this kernel (3.18.5)?
4- Is there a tutorial that explains how to install this driver (from compiling the kernel to the end)?
5- I spent time reading all of the posts in this thread, but I couldn't find a solid tutorial, as I'm new to Linux.
Thanks
