msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Wed Jan 29, 2014 2:57 pm

Note that one thing I came to realize (over the last few days of thinking about this project) is that with the DMA implementation it should also be feasible to have more than just 2 chip-select lines - in principle ANY GPIO can be used as chip-select, and it does not make much of a difference...
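
Just to illustrate the mechanism (a sketch only - the names are made up, the transfer-information flags are omitted, and the register addresses are taken from the BCM2835 ARM Peripherals datasheet): a DMA control block that copies a one-word mask to the GPIO set/clear registers can assert or release any pin as chip-select when chained in front of and behind the actual SPI data blocks.

Code: Select all

/* Sketch only: a control block that asserts an arbitrary CS pin via DMA. */
#include <linux/types.h>

/* GPIO register bus addresses as seen by the DMA engine */
#define GPIO_BUS_BASE	0x7e200000UL
#define GPSET0		(GPIO_BUS_BASE + 0x1c)	/* drive pins high */
#define GPCLR0		(GPIO_BUS_BASE + 0x28)	/* drive pins low  */

/* control-block layout as documented for the BCM2835 DMA engine */
struct dma_cb {
	u32 ti;		/* transfer information (flags omitted here) */
	u32 src;	/* source bus address */
	u32 dst;	/* destination bus address */
	u32 len;	/* number of bytes to copy */
	u32 stride;
	u32 next;	/* bus address of the next control block */
	u32 pad[2];
};

/*
 * Hypothetical helper: when the DMA engine executes @cb it copies the
 * 32-bit mask stored at @mask_bus (containing BIT(gpio)) to GPCLR0 and
 * thereby pulls an arbitrary GPIO low as an active-low chip-select -
 * without any CPU involvement.
 */
static void cb_assert_cs(struct dma_cb *cb, dma_addr_t mask_bus)
{
	cb->ti   = 0;		/* single 32-bit copy, no DREQ pacing */
	cb->src  = mask_bus;
	cb->dst  = GPCLR0;
	cb->len  = 4;
	cb->next = 0;		/* chained to the SPI data blocks in reality */
}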

Martin

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Tue Feb 04, 2014 11:07 am

While thinking about the finer details of the implementation (again),
I also came up with the idea that it should - in principle - be possible to run a separate (single/dual/quad) SPI bus on the remaining GPIO pins, driven only via DMA with minimal CPU involvement.

A quad-SPI option would be GPIO 17, 18, 22, 23 for data and one of 24, 25, 26 for SCK/CS - if you do not use I2C or the serial port, then GPIO 2, 3, 4, 14, 15 could also be used for SCK, CS, interrupt...
The alt connector (GPIO 28, 29, 30, 31) could also be used for this.

I have some ideas how to implement that, but I do not know how fast we could really run this - only experimentation will show.

Obviously my first task is to finish the "normal" DMA driver, and then we can discuss the "other" ideas...
But I wonder if there would be interest in something like this.
My guess is that it would mostly be of interest for LCD displays (maybe Notro can comment on that) and for attached flash devices, for which there is currently quite a big effort happening in the upstream kernel...

On the downside: to make it work with "minimal" CPU overhead, a few megabytes of data tables would be needed to handle most of the byte-to-bit translation in DMA alone (see the concept sketch below). Also I guess that the clock would not be 100% symmetrical and would show some jitter...
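
To give an idea of what such a translation table would look like, here is a concept sketch for a single data line only (pin numbers are just examples) - the tables only grow into the megabyte range once you expand wider chunks per lookup and drive several data lines:

Code: Select all

#include <linux/types.h>
#include <linux/bitops.h>

#define MOSI_GPIO	17	/* example pins only */
#define SCK_GPIO	24

struct gpio_step {
	u32 set;	/* mask to write to GPSET0 */
	u32 clr;	/* mask to write to GPCLR0 */
};

/* two GPIO writes per bit: "data + clock low", then "clock high" */
static struct gpio_step byte_table[256][8 * 2];

static void build_byte_table(void)
{
	int val, bit;

	for (val = 0; val < 256; val++) {
		for (bit = 0; bit < 8; bit++) {
			bool b = val & BIT(7 - bit);	/* MSB first */
			struct gpio_step *lo = &byte_table[val][2 * bit];
			struct gpio_step *hi = lo + 1;

			/* step 1: put the data bit on MOSI, clock low */
			lo->set = b ? BIT(MOSI_GPIO) : 0;
			lo->clr = (b ? 0 : BIT(MOSI_GPIO)) | BIT(SCK_GPIO);
			/* step 2: raise the clock, data unchanged */
			hi->set = BIT(SCK_GPIO);
			hi->clr = 0;
		}
	}
}

/* the DMA then only "plays back" byte_table[byte] for every data byte */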

So please let me know if there would be interest for this...

Thanks, Martin

notro
Posts: 695
Joined: Tue Oct 16, 2012 6:21 pm
Location: Drammen, Norway

Re: SPI driver latency and a possible solution

Tue Feb 04, 2014 5:08 pm

Have you tried the spi-gpio driver: http://lxr.free-electrons.com/source/dr ... spi-gpio.c
Comment in the source code: "Software overhead means we usually have trouble reaching even one Mbit/sec (except when we can inline bitops)"

SPI-connected displays with a touch panel use both chipselects. Having more chipselects on the native bus might be useful to many.

A dedicated DMA-driven GPIO "bitbanging" SPI master driver might not get that much use. The danger is that if no one "really" needs this, the code will quickly become stale and out of date.

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Tue Feb 04, 2014 6:42 pm

I know that it might not be needed - that is why I was asking if there is any need for something like this.
I know that flash devices can support dual and quad SPI, increasing throughput dramatically. But I do not know if this would also apply to LCD display drivers, so I was wondering if there are any devices that would benefit from this...

Anyway: my first priority is to get the DMA driver working before looking into the GPIO solution...

notro
Posts: 695
Joined: Tue Oct 16, 2012 6:21 pm
Location: Drammen, Norway

Re: SPI driver latency and a possible solution

Tue Feb 04, 2014 7:23 pm

msperl wrote: But I do not know if this would also apply to LCD display drivers, so I was wondering if there are any devices that would benefit from this...
None that I am aware of.
msperl wrote: I know that flash devices can support dual and quad SPI, increasing throughput dramatically.
Do you have an example of this with a Linux driver?

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Tue Feb 04, 2014 10:34 pm

notro wrote:
I know that flash devices can support dual and quad SPI, increasing throughput dramatically.
Do you have an example of this with a Linux driver?
Any device that is supported by the new mtd/spi-NOR driver framework currently under development, which might already be in 3.13 or 3.14.

An example would be the m25p80, but I have seen several others as well...
I have seen at least one datasheet of an IC that can even do a transfer on each edge of the clock.

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Sat Feb 22, 2014 6:28 pm

Hi Notro!

One quick question regarding your framebuffer drivers, as these might be the ones that also profit a lot from this:
how do you schedule the block transfer of your frame?

I am asking specifically to know:
  • if you are using spi_message->is_dma_mapped
  • what is your maximum frame-size you transfer at a time (xfer->len) - do you chunk at 4096 bytes?
  • if you reuse the spi_message - and if so, whether you modify it between calls and what fields you modify
  • is it mostly write, or do you also read from the device?
I am asking because I currently have some limitations that I may or may not want/need to address - especially with regard to xfer->len, where the BCM2835 SPI device seems to limit transfers to 65535 bytes in one go (at least the documentation says that the SPI DLEN register is only 16 bits wide).
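
If that limit stays, a client would have to split a large frame into several transfers within one message - a hypothetical helper (not part of any existing driver) could look like this:

Code: Select all

#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/spi/spi.h>

#define CHUNK_MAX	65532u	/* below the 16-bit DLEN limit, word aligned */

static int send_frame(struct spi_device *spi, const void *buf, size_t len)
{
	unsigned int nchunks = DIV_ROUND_UP(len, CHUNK_MAX);
	struct spi_transfer *xfers;
	struct spi_message msg;
	unsigned int i;
	int ret;

	xfers = kcalloc(nchunks, sizeof(*xfers), GFP_KERNEL);
	if (!xfers)
		return -ENOMEM;

	spi_message_init(&msg);
	for (i = 0; i < nchunks; i++) {
		xfers[i].tx_buf = buf + i * CHUNK_MAX;
		xfers[i].len    = min_t(size_t, len - i * CHUNK_MAX, CHUNK_MAX);
		spi_message_add_tail(&xfers[i], &msg);
	}

	ret = spi_sync(spi, &msg);	/* one message, several <64k transfers */
	kfree(xfers);
	return ret;
}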

Also this would impact the proposed "spi_message_optimize" call, which essentially translates the spi_message into a prepared DMA format that can then be executed directly by the spi_async call (meaning that the DMA is scheduled - and possibly running - by the time spi_async returns).

So what are your requirements?

Ciao, martin

P.s: it seems as if there are some displays (like the ones used on the C-BERRY display, which seems to be 8/16 bits wide) that could benefit from an 8-bit-wide SPI interface, but I am no expert in that area. But if you can transfer 8 bits per clock at 8MHz (effectively 64Mbit/s), then it would still be faster than transferring the same frame across only one data line at 25MHz.

notro
Posts: 695
Joined: Tue Oct 16, 2012 6:21 pm
Location: Drammen, Norway

Re: SPI driver latency and a possible solution

Sat Feb 22, 2014 7:21 pm

if you are using spi_message->is_dma_mapped
Yes
what is your maximum frame-size you transfer at a time (xfer->len) - do you chunk at 4096 bytes?
4k is the default, but it can be changed by the user.
if you reuse the spi_message - and if so, whether you modify it between calls and what fields you modify
No I don't, it is allocated on the stack for each synchronous transfer.
is it mostly write, or do you also read from the device?
Reading is only done during initialization on a couple of drivers.

Here's my write function: https://github.com/notro/fbtft/blob/mas ... t-io.c#L10
I am asking because I currently have some limitations that I may or may not want/need to address - especially with regard to xfer->len, where the BCM2835 SPI device seems to limit transfers to 65535 bytes in one go (at least the documentation says that the SPI DLEN register is only 16 bits wide).
I haven't done any testing with DMA and increased buffer size, but I don't think my drivers will benefit from >64k buffers.
Also this would impact the proposed "spi_message_optimize" call, which essentially translates the spi_message into a prepared DMA format that can then be executed directly by the spi_async call (meaning that the DMA is scheduled - and possibly running - by the time spi_async returns).
What is a prepared DMA format?
Can your driver convert kmalloc'ed buffers to DMA buffers like spi-omap2-mcspi.c? This way everyone can benefit from DMA transfers.
http://lxr.free-electrons.com/source/dr ... =arm#L1208
P.s: it seems as if there are some displays (like the ones used on the C-BERRY display, which seems to be 8/16 bits wide) that could benefit from an 8-bit-wide SPI interface, but I am no expert in that area. But if you can transfer 8 bits per clock at 8MHz (effectively 64Mbit/s), then it would still be faster than transferring the same frame across only one data line at 25MHz.
I'm guessing that 8/16 bit in this context means a parallel bus, either driven by a SPI adapter or directly through GPIOs. Do you have a link?

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Sun Feb 23, 2014 1:02 am

Thanks for the info:
For the most part your driver should immediately benefit from the DMA - actually not the driver itself, but the system should require less CPU overall, as there are no interrupts involved.

But from what I just read, what I would recommend is (a rough sketch of A and B follows below):
A) use a fixed data structure that you do not have to prepare each time (maybe two, so that you can alternate frames)
B) possibly allocate the frame buffers from memory that is contiguous both physically and virtually
C) when/if the optimize interface makes it into the official kernel, then - after allocating the messages as in A) - use the message-optimize call to create an optimized version, which further reduces CPU overhead.
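
Roughly like this - names are made up and error handling is trimmed; the idea is to build the message once over DMA-able buffers and then only refresh the pixel data before each send:

Code: Select all

#include <linux/dma-mapping.h>
#include <linux/spi/spi.h>

struct fb_spi {
	struct spi_device  *spi;
	void               *vmem;	/* CPU view of the frame buffer */
	dma_addr_t          vmem_dma;	/* device/DMA view of the same memory */
	struct spi_transfer xfer;
	struct spi_message  msg;
};

/* done once at probe time */
static int fb_spi_setup(struct fb_spi *p, size_t framelen)
{
	/* physically contiguous and permanently mapped for DMA (point B) */
	p->vmem = dma_alloc_coherent(&p->spi->dev, framelen,
				     &p->vmem_dma, GFP_KERNEL);
	if (!p->vmem)
		return -ENOMEM;

	/* fixed message/transfer that never gets rebuilt (point A) */
	spi_message_init(&p->msg);
	p->msg.is_dma_mapped = 1;	/* buffers are already mapped */
	p->xfer.tx_buf = p->vmem;
	p->xfer.tx_dma = p->vmem_dma;
	p->xfer.len    = framelen;
	spi_message_add_tail(&p->xfer, &p->msg);
	return 0;
}

/* per frame: update the pixels in p->vmem, then just send */
static int fb_spi_send_frame(struct fb_spi *p)
{
	return spi_sync(p->spi, &p->msg);
}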

Yes - actually I have to rely on dma_map/dma_unmap to map a region if you do not set the message->is_dma_mapped bit. Otherwise you have to fill in rx_dma/tx_dma yourself.
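
In other words, if is_dma_mapped is not set, the mapping that the master driver has to do for every transfer looks roughly like this (simplified - error unwinding and the matching dma_unmap_single calls after completion are omitted), and this is exactly the kind of per-message overhead I would like to avoid:

Code: Select all

#include <linux/dma-mapping.h>
#include <linux/spi/spi.h>

/* map one transfer for DMA; dma_unmap_single() is needed after completion */
static int map_xfer(struct device *dev, struct spi_transfer *xfer)
{
	if (xfer->tx_buf) {
		xfer->tx_dma = dma_map_single(dev, (void *)xfer->tx_buf,
					      xfer->len, DMA_TO_DEVICE);
		if (dma_mapping_error(dev, xfer->tx_dma))
			return -ENOMEM;
	}
	if (xfer->rx_buf) {
		xfer->rx_dma = dma_map_single(dev, xfer->rx_buf,
					      xfer->len, DMA_FROM_DEVICE);
		if (dma_mapping_error(dev, xfer->rx_dma))
			return -ENOMEM;
	}
	return 0;
}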

As for scatter/gather: in principle it should be possible, but the problem is the CPU overhead for mapping/unmapping the memory regions... That was one of the reasons why I was asking about your use cases...

Thanks, martin

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Sun Feb 23, 2014 8:58 am

The reason for the optimize call is to pay the CPU overhead needed for transforming/verifying SPI messages only once and not on every call.

Even the simple "verify" spi-message function that runs on most spi_async calls takes around a millisecond on the RPi - depending on the message complexity.

And then having to transform the spi message into DMA form requires additional CPU resources.

The optimize call reduces this to a (mostly) one-time operation (for spi messages allocated on the heap) that gets executed once and is bypassed on subsequent calls - a net reduction in CPU cycles... (obviously you have to release the spi message when you discard it)
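
To make the intent clearer, here is roughly what a client driver would do with the proposed interface - the function names (spi_message_optimize / spi_message_unoptimize) and signatures are from my proposal and may still change:

Code: Select all

#include <linux/slab.h>
#include <linux/spi/spi.h>

static struct spi_message *msg;	/* built once, lives on the heap */

static int mcp_setup(struct spi_device *spi)
{
	msg = kzalloc(sizeof(*msg), GFP_KERNEL);
	if (!msg)
		return -ENOMEM;
	spi_message_init(msg);
	/* ... add the spi_transfers for the register read here ... */

	/* verify + translate into the prepared DMA form exactly once */
	return spi_message_optimize(spi, msg);
}

static void mcp_handle_irq(struct spi_device *spi)
{
	/* the per-call verification/translation is now bypassed:
	 * spi_async() only schedules the already prepared DMA chain */
	spi_async(spi, msg);
}

static void mcp_teardown(struct spi_device *spi)
{
	spi_message_unoptimize(msg);	/* release the prepared form */
	kfree(msg);
}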

Martin

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Tue Mar 18, 2014 5:46 pm

Just an update

The DMA driver is starting to work - there are still issues with the "spi_message_optimize" code.

But some details on the driver:
  • everything (speed, delays, ...) is programmed via DMA - no CPU interaction.
  • any GPIO can be used as chip-select - we are no longer limited to 2 SPI devices on the bus!
  • this obviously includes some "overhead" for computing those DMA chains
  • this "overhead" depends on the number of spi_transfers in each message, but as a rule of thumb each transfer takes about 10us
  • to avoid this there is a patch that allows you to "optimize" SPI messages once; after that they are set in stone and can only be executed - when doing this all the "overhead" goes away and things become very fast. This obviously needs to go into the upstream Linux kernel as well...
  • it does not use many interrupts/s or require context switches - just one interrupt per spi_message
So here is a visualization of reading a message from the mcp2515 CAN bus controller with a saturated bus at 250kHz:
  • Old programmed-IO driver, which takes 293us from interrupt start to end of transfer and uses about 40% system CPU:
    [attachment: OldDriver-293us.png - old programmed IO driver]
  • New driver without optimizations, which takes 172us from interrupt start to end of transfer and uses about 15% system CPU (as it runs in interrupt context this is not really visible via vmstat)
    [attachment: NewDriverNoOptimize-172us.png - new DMA driver, optimize disabled]
  • New Driver with "basic" optimization, which takes 170us from interrupt start to end of transfer and uses about 13% System CPU (as it is in interrupt it is not really visible via vmstat)
  • New Driver with "full" optimization (still buggy), which takes 82us from interrupt start to end of transfer and uses about 3% System CPU (as it is in interrupt it is not really visible via vmstat)
    [attachment: NewDriverFullOptimize-81us.png - new DMA driver, full optimize]
The code is still not fully ready (especially the optimize part still contains bugs),
but it should give you an idea of what to expect in the future...

Still - to make optimize "really" work, you will need a driver that makes use of this interface!

Ciao,
Martin

mikronauts
Posts: 2722
Joined: Sat Jan 05, 2013 7:28 pm

Re: SPI driver latency and a possible solution

Tue Mar 18, 2014 6:11 pm

VERY IMPRESSIVE!

I am looking forward to the release version, as I could sure use fast SPI for reading ADCs...
http://Mikronauts.com - home of EZasPi, RoboPi, Pi Rtc Dio and Pi Jumper @Mikronauts on Twitter
Advanced Robotics, I/O expansion and prototyping boards for the Raspberry Pi

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Tue Mar 18, 2014 6:55 pm

As I said: you still need a driver that is adapted to make use of those features (especially optimize)...

But with ADCs the challenge is typically scheduling the transfers with minimal jitter, and that becomes a problem.

The driver already copies a 64-bit timestamp (based on a 1MHz clock) of when the SPI transfer terminates (and it would be quite easy to add one for the start of the transfer as well), but that is not exposed further up the Linux SPI stack.
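
For reference, the timestamp comes from the free-running 1MHz system timer; read from the CPU side it would look roughly like the sketch below (addresses per the BCM2835 datasheet - in the driver a DMA control block copies CLO/CHI into the result instead of the CPU doing it):

Code: Select all

#include <linux/io.h>
#include <linux/types.h>

#define ST_PHYS_BASE	0x20003000	/* BCM2835 system timer */
#define ST_CLO		0x04		/* counter bits 31:0  */
#define ST_CHI		0x08		/* counter bits 63:32 */

/* @st = ioremap(ST_PHYS_BASE, 0x1000), done once at probe time */
static u64 read_1mhz_counter(void __iomem *st)
{
	u32 hi, lo;

	do {			/* re-read if CLO wrapped between the reads */
		hi = readl(st + ST_CHI);
		lo = readl(st + ST_CLO);
	} while (readl(st + ST_CHI) != hi);

	return ((u64)hi << 32) | lo;
}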

Martin

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Thu Mar 20, 2014 1:37 pm

Update:
I believe I found the reason for those "stalls" - it was a race condition between DMA and interrupts.
Right now I have a solution that "seems" to work, but worst case I will need to implement DMA interrupts via a separate DMA channel to solve the issue 100%.

Still - the issue has not shown up after receiving 5405195 CAN frames at 250kHz bus speed with only 2 drops and a rate of about 1500 messages/s, which is a good sign... I will keep it running for a lot longer to see if the issue shows up.

Here is the "vmstat 10" output while the system is running this load:

Code: Select all

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 386816  23300  58696    0    0     0     0 8295   35  0  0 100  0
 0  0      0 386816  23300  58696    0    0     0     0 8266   34  0  0 100  0
 0  0      0 386816  23300  58696    0    0     0     0 8373   34  0  0 100  0
 0  0      0 387072  23300  58696    0    0     0     0 8352   42  0  1 99  0
 0  0      0 387072  23300  58696    0    0     0     0 8346   35  0  0 100  0
 0  0      0 387104  23300  58696    0    0     0     0 8350   35  0  0 100  0
 0  0      0 387104  23300  58696    0    0     0     0 8326   34  0  0 100  0
 0  0      0 387104  23300  58696    0    0     0     0 8362   35  0  0 100  0
 0  0      0 387104  23300  58696    0    0     0     0 8346   35  0  0 100  0
 0  0      0 387136  23300  58696    0    0     0     0 8303   35  0  0 100  0
 0  0      0 387104  23300  58696    0    0     0     0 8332   35  0  0 100  0
 0  0      0 387104  23300  58696    0    0     0     0 8343   35  0  0 100  0
 0  0      0 387104  23300  58696    0    0     0     0 8324   35  0  0 100  0
 0  0      0 387104  23300  58696    0    0     0     0 8266   35  0  0 100  0
 0  0      0 387104  23300  58696    0    0     0     0 8326   35  0  0 100  0
 0  0      0 387104  23300  58696    0    0     0     0 8278   35  0  0 100  0
 0  0      0 387104  23300  58696    0    0     0     0 8309   35  0  0 100  0
 0  0      0 387072  23300  58696    0    0     0     0 8342   35  0  0 100  0
 0  0      0 386944  23300  58696    0    0     0     0 8315   35  0  1 99  0
 0  0      0 387432  23300  58696    0    0     0     0 8320   55  1  1 99  0
 0  0      0 387552  23300  58696    0    0     0     0 8330   42  0  1 99  0
 0  0      0 387500  23428  58976    0    0    41     0 8370   51  0  1 99  0
 0  0      0 387340  23428  58976    0    0     0     0 8337   44  0  1 99  0
 0  0      0 387308  23428  58976    0    0     0     0 8247   36  0  0 100  0
 0  0      0 387276  23428  58976    0    0     0     0 8330   34  0  1 99  0
 0  0      0 387212  23428  58976    0    0     0     0 8318   35  0  0 100  0
 0  0      0 387148  23428  58976    0    0     0     0 8392   35  0  0 100  0
So the system is mostly idle - unlike with the "normal" drivers, where you would see 30-40% system CPU.
But this hides the time spent in interrupt handlers, as that does not get accounted for...

Now I need to do some code-cleanup...

Martin

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Sun Mar 23, 2014 9:00 pm

After some hardware issues, which have been resolved (bad soldering skills from a year ago coming back to bite me now),
I found the time to do some more tests and found another "race condition" between DMA and an interrupt handler that results in stalled DMA.

So here are some measurements of compiling the "out of tree" module while running a 100% saturated 250kHz CAN bus (via "listen-only on"):

Code: Select all


Type                  | Compile Time |     packetloss | Interrupts/s
Idle                  |          50s |       N/A      |          N/A
PIO driver            |         120s |   1000 /  400k |        45000
PIO driver + optimize |         115s |   1000 /  400k |        45000
DMA driver            |          76s |      8 /  800k |         6700
DMA driver + optimize |          60s |      1 / 8000k |         8300
So you can see a big improvement in available CPU as well as lower packet loss.

The thing that is a bit surprising (at first) is that the DMA driver without optimization runs with fewer interrupts than the optimized version.

The reason for this is that the mcp2515 device used has 2 receive buffers and may overflow into the second buffer. The way the driver is written, it makes use of that and reads the second buffer with the same number of interrupts as the "single buffer full" case. So at higher "overflow" rates the effective number of interrupts/s decreases for a constant message rate.

There are still things to do (probably on both drivers) but I still wanted to share the current status with you.

Martin

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Mon Mar 31, 2014 7:56 am

I am still fighting some race conditions and I am unsure how to resolve them:

It seems as if sometimes updating a memory location on the ARM via:

Code: Select all

writel(data,address);
dsb();
does not propagate to the L2 cache or RAM itself within 20us.

This means that a (freshly) linked DMA control block is not recognized and the DMA transfer stops.

I know that the documentation says you should "pause" the DMA before updating those "links" (roughly the sequence sketched below), but so far I have only had bad experiences with this...
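
For completeness, this is roughly the pause-before-relinking sequence as I read the datasheet (the bit positions are from the BCM2835 ARM Peripherals document and should be double-checked - this is the approach that has not worked reliably for me so far):

Code: Select all

#include <linux/bitops.h>
#include <linux/io.h>
#include <asm/barrier.h>

#define DMA_CS		0x00		/* channel control/status register */
#define DMA_CS_ACTIVE	BIT(0)
#define DMA_CS_PAUSED	BIT(4)

/* @chan: the DMA channel registers, @cb_next: link field of the control
 * block (in uncached memory), @new_cb_bus: bus address of the new block */
static void relink_cb(void __iomem *chan, void __iomem *cb_next, u32 new_cb_bus)
{
	/* 1. pause the channel and wait until it has really stopped */
	writel(readl(chan + DMA_CS) & ~DMA_CS_ACTIVE, chan + DMA_CS);
	while (!(readl(chan + DMA_CS) & DMA_CS_PAUSED))
		; /* busy-wait - real code would need a timeout here */

	/* 2. update the link in the (old) control block */
	writel(new_cb_bus, cb_next);
	dsb();

	/* 3. let the channel continue with the newly linked chain */
	writel(readl(chan + DMA_CS) | DMA_CS_ACTIVE, chan + DMA_CS);
}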

Anyone got an idea how to solve this?
Martin

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Tue Apr 01, 2014 7:25 am

OK - solved the mystery!

It was a race condition between the GPIO and DMA interrupt, where GPIO was triggered first.

Now the system has been up and running for 54 minutes, handling 10M CAN messages (@500kHz) in this time without any hiccups (except for 69 CAN requests that were not handled - overflow).

I will push the code to GitHub shortly...

Martin

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Wed Apr 02, 2014 5:17 am

I have left it running over night, and this morning the RPi was still receiving CAN messages.

Stats are: 150M messages, out of which 949 have "overflowed" due to CPU activity...
So that looks good.

I have pushed the latest code to GitHub...

Martin

rafal-raf
Posts: 6
Joined: Fri Apr 04, 2014 12:21 pm

Re: SPI driver latency and a possible solution

Fri Apr 04, 2014 12:30 pm

I have been testing CAN communication on the RPi for some time.
Unfortunately, I have a problem with the SPI speed - the RPi loses frames.

Reading your posts I see that msperl has done a great job and made it possible to use CAN on the RPi.

I downloaded the kernel and modules from your git.
The kernel compilation ran properly, but I have a problem compiling the SPI modules.
Can you help me with it?

Code: Select all

In file included from /root/ras-new/spi-bcm2835/include/linux/dma/bcm2835-dma.h:27:0,
                 from /root/ras-new/spi-bcm2835/drivers/dma/bcm2835-dma-debug.c:21:
/root/ras-new/spi-bcm2835/include/linux/dma-fragment.h: In function 'dma_fragment_set_default_links':
/root/ras-new/spi-bcm2835/include/linux/dma-fragment.h:378:3: error: implicit declaration of function 'list_last_entry' [-Werror=implicit-function-declaration]
/root/ras-new/spi-bcm2835/include/linux/dma-fragment.h:380:4: error: expected expression before 'struct'
cc1: some warnings being treated as errors

make[2]: *** [/root/ras-new/spi-bcm2835/drivers/dma/bcm2835-dma-debug.o] Error 1
make[1]: *** [_module_/root/ras-new/spi-bcm2835] Error 2

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Sat Apr 05, 2014 5:38 am

Seems as if I have not pushed an update lately...
It may also be compiler dependent...

Will check...

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Sat Apr 05, 2014 5:53 am

I think I know what it is now: you are running a 3.14 kernel and there is a file-name conflict with the dma-engine support in the upstream kernel. I will rename mine and push an update soon...

rafal-raf
Posts: 6
Joined: Fri Apr 04, 2014 12:21 pm

Re: SPI driver latency and a possible solution

Sat Apr 05, 2014 6:52 am

Thanks for the help

I look forward to updates

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Sat Apr 05, 2014 7:51 am

Pushed the changes to GitHub - it should (hopefully) compile now...

Also note that there is still a race condition that shows up with highly optimized drivers making use of the proposed spi_optimize interface (a patch file against 3.13 is in the repository); it triggers very rarely - in my case only after more than 24 hours...

Note also that I have not yet tested the driver with other devices - that is on my todo list...

Martin

rafal-raf
Posts: 6
Joined: Fri Apr 04, 2014 12:21 pm

Re: SPI driver latency and a possible solution

Mon Apr 07, 2014 12:46 pm

Thanks for the update

But I still have a problem compiling the SPI modules.
I downloaded the new sources from git.

Below is the output of my compilation:

Code: Select all

root@homeiq-R540-R580-R780-SA41-E452:~/ras-new/spi-bcm2835# make
make -C ../linux M=/root/ras-new/spi-bcm2835 modules
make[1]: enter to `/root/ras-new/linux'

  WARNING: Symbol version dump /root/ras-new/linux/Module.symvers
           is missing; modules will have no dependencies and modversions.

/root/ras-new/spi-bcm2835/drivers/spi/spi-bcm2835dma_drv.c: In function 'bcm2835dma_schedule_dma_fragment':
/root/ras-new/spi-bcm2835/drivers/spi/spi-bcm2835dma_drv.c:239:3: error: implicit declaration of function 'list_last_entry' [-Werror=implicit-function-declaration]
/root/ras-new/spi-bcm2835/drivers/spi/spi-bcm2835dma_drv.c:240:20: error: expected expression before 'struct'
/root/ras-new/spi-bcm2835/drivers/spi/spi-bcm2835dma_drv.c: In function 'bcm2835dma_spi_message_to_dma_fragment':
/root/ras-new/spi-bcm2835/drivers/spi/spi-bcm2835dma_drv.c:502:34: error: 'struct spi_transfer' has no member named 'tx_nbits'
/root/ras-new/spi-bcm2835/drivers/spi/spi-bcm2835dma_drv.c:503:12: error: 'struct spi_transfer' has no member named 'tx_nbits'
/root/ras-new/spi-bcm2835/drivers/spi/spi-bcm2835dma_drv.c:505:34: error: 'struct spi_transfer' has no member named 'rx_nbits'
/root/ras-new/spi-bcm2835/drivers/spi/spi-bcm2835dma_drv.c:506:12: error: 'struct spi_transfer' has no member named 'rx_nbits'
/root/ras-new/spi-bcm2835/drivers/spi/spi-bcm2835dma_drv.c: At top level:
/root/ras-new/spi-bcm2835/drivers/spi/spi-bcm2835dma_drv.c:113:13: warning: 'bcm2835dma_spi_message_unoptimize' used but never defined [enabled by default]
cc1: some warnings being treated as errors

msperl
Posts: 344
Joined: Thu Sep 20, 2012 3:40 pm

Re: SPI driver latency and a possible solution

Mon Apr 07, 2014 12:50 pm

Against which kernel do you compile?
3.10, 3.13 or 3.14?
