ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

A Pi Pie Chart

Sun Nov 18, 2018 5:16 am

I've been working on making a Pi pie chart and came up with this:

Image

The chart is generated by a self-contained C program--based on the algorithms in this thread--that can be run on almost any system for further comparisons. No throttling was observed during any of the runs. After I'm done polishing the code I'll post that as well. I had expected a greater performance difference between the 2B (which is the original model) and the 3B, so maybe more tuning is in order.

Update: The Pi Chart has now been updated after some tuning.

Update: Source code is available as a gzipped tar archive here.

Update: Source code has been updated to version 23 which automatically detects whether Linux high-precision monotonic timers are available and uses gettimeofday otherwise.
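For anyone curious what a timer fallback of this kind can look like, here is a minimal sketch. It is not the code from the pichart archive: the helper name wall_seconds is made up for illustration, and testing for the CLOCK_MONOTONIC macro at compile time and then checking the return value at run time is only one assumed way of doing the detection.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* request clock_gettime under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper (not from pichart): wall-clock time in seconds.
   Prefers the monotonic clock and falls back to gettimeofday. */
static double wall_seconds(void)
{
    struct timeval tv;
    int rc;
#ifdef CLOCK_MONOTONIC
    struct timespec ts;
    /* Nanosecond resolution, unaffected by system clock adjustments. */
    if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0)
        return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
#endif
    /* Fallback: microsecond resolution, may jump if the clock is reset. */
    rc = gettimeofday(&tv, NULL);
    assert(rc == 0);
    (void)rc;
    return (double)tv.tv_sec + 1e-6 * (double)tv.tv_usec;
}
```

A benchmark would call wall_seconds() before and after the timed region and report the difference. Only differences are meaningful, since the monotonic clock counts from an arbitrary starting point.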

Update: Detailed instructions on how to download, compile and run the pichart programs are available here.

Update: Source code has been updated to version 30 which provides an additional metric indicating relative speed compared to the original 700MHz ARMv6-based Raspberry Pi. The single-threaded version now graphs the single-threaded results for the reference machines for better comparison. The code has also been updated to make it compatible with a wider variety of compilers.
Last edited by ejolson on Tue May 14, 2019 8:40 am, edited 8 times in total.

W. H. Heydt
Posts: 10317
Joined: Fri Mar 09, 2012 7:36 pm
Location: Vallejo, CA (US)

Re: A Pi Pie Chart

Sun Nov 18, 2018 6:18 am

Two things...
1. I'm surprised there is a difference between the B+ and the Pi0 as they use the same SoC.
2. Which version of the Pi2B are you using? The v1.1 and v1.2 have different ARM cores even though the default clock speed is the same. The A53 should be faster than the A7 on an IPC basis.

jahboater
Posts: 4439
Joined: Wed Feb 04, 2015 6:38 pm

Re: A Pi Pie Chart

Sun Nov 18, 2018 7:54 am

W. H. Heydt wrote:
Sun Nov 18, 2018 6:18 am
1. I'm surprised there is a difference between the B+ and the Pi0 as they use the same SoC.
The B+ ran at 700MHz, the Zero runs at 1000MHz by default.
(A large factory overclock with over voltage 6!)
W. H. Heydt wrote:
Sun Nov 18, 2018 6:18 am
2. Which version of the Pi2B are you using?
@ejolson said it is the original model. So it must be the Cortex-A7
It is quad core.

I hope this really nice chart will scale for the 4B :)

bensimmo
Posts: 4065
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: A Pi Pie Chart

Sun Nov 18, 2018 8:46 am

Yes, the 700/1000MHz difference, though every B+ and A+ I have ever used runs at 1000MHz as expected and I always run them at that (with the official overclock).
Perhaps add a shaded extension for official overclocks, the 2B also has an official overclock.
We'll need to know the 2B core version.
... as above...

Are they all single-core benchmarks?
Are any memory-limited (as in, would an A+ be useful? I have the 256MB versions)

For benchmarks, for me and given this is the reason for the Pi, pyBench would be more useful.


I'll ignore the use of the pie chart for such performance illustration, I get the reason for it.
The bar graphs are nice though :-)

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sun Nov 18, 2018 11:54 am

bensimmo wrote:
Sun Nov 18, 2018 8:46 am
Are they all single-core benchmarks?
Are any memory-limited (as in, would an A+ be useful? I have the 256MB versions)
They are all parallel codes as described in this previous thread. In each case all cores are used to perform a single large calculation. You are right that it would be nice if the codes would run in the 256MB models. There is also more contention for the memory bus than desired in some of the codes.

At the moment the phase space of the Lorenz 96 dynamical system is sized larger than reasonable and I need to adjust the cache blocking size on the prime number sieve. After these changes there will be less contention for the memory bus in those two calculations with more interesting results as the different benchmarks explore a greater variety of hardware characteristics. I'll update the graph and post the code after I've made these changes and others. Thanks for the feedback.
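To illustrate what cache blocking means for a prime sieve, here is a minimal serial sketch of a segmented sieve. It is an assumption about the general technique, not the sieve from the pichart source: the function name count_primes and the block parameter are invented for this example, and it uses one byte per number rather than packed bits to keep the code short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: count the primes <= n with a segmented sieve.
   `block` is the segment length; choosing it so that one segment fits in
   cache is the "cache blocking size" being tuned. */
static unsigned long count_primes(unsigned long n, unsigned long block)
{
    assert(block > 0);
    if (n < 2)
        return 0;

    /* Step 1: plain sieve to collect the base primes up to sqrt(n). */
    unsigned long root = 1;
    while ((root + 1) * (root + 1) <= n)
        root++;
    unsigned char *small = calloc(root + 1, 1);          /* 1 = composite */
    unsigned long *base = malloc((root + 1) * sizeof *base);
    unsigned char *seg = malloc(block);
    assert(small != NULL && base != NULL && seg != NULL);
    unsigned long nbase = 0;
    for (unsigned long i = 2; i <= root; i++) {
        if (small[i])
            continue;
        base[nbase++] = i;
        for (unsigned long j = i * i; j <= root; j += i)
            small[j] = 1;
    }

    /* Step 2: sieve (root, n] one segment at a time, so the working set
       stays at `block` bytes no matter how large n is. */
    unsigned long count = nbase;
    for (unsigned long lo = root + 1; lo <= n; lo += block) {
        unsigned long hi = lo + block - 1;
        if (hi > n || hi < lo)
            hi = n;
        memset(seg, 0, hi - lo + 1);
        for (unsigned long k = 0; k < nbase; k++) {
            unsigned long p = base[k];
            unsigned long first = ((lo + p - 1) / p) * p;  /* first multiple >= lo */
            for (unsigned long j = first; j <= hi; j += p)
                seg[j - lo] = 1;
        }
        for (unsigned long j = lo; j <= hi; j++)
            count += !seg[j - lo];
    }

    free(small);
    free(base);
    free(seg);
    return count;
}
```

The result does not depend on block, only the memory access pattern does, which is why the block size can be tuned freely. The segments are also independent of each other once the base primes are known, so they could be handed to different cores.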

DavidS
Posts: 4213
Joined: Thu Dec 15, 2011 6:39 am
Location: USA

Re: A Pi Pie Chart

Sun Nov 18, 2018 12:05 pm

Nice Pie chart of Pi performance. Though do you think it is fair to include the single core boards in benchmarks designed for testing multi-processor performance? Seems to me that that is unfair as it does not show the improvement in performance for the CPU directly (will have an extreme increase when jumping to 4 cores, that is much greater than the improvement in performance of the CPU (one core)).

You will find that there is still a large improvement in performance going from ARMv6 to ARMv7, and again from ARMv7 to ARMv7, even if running the same code at the same clock for every component on the board (SDRAM, ARM CPUs, VideoCore IV, etc). That I think is a better comparison between these systems.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sun Nov 18, 2018 12:32 pm

DavidS wrote:
Sun Nov 18, 2018 12:05 pm
Nice Pie chart of Pi performance. Though do you think it is fair to include the single core boards in benchmarks designed for testing multi-processor performance? Seems to me that that is unfair as it does not show the improvement in performance for the CPU directly (will have an extreme increase when jumping to 4 cores, that is much greater than the improvement in performance of the CPU (one core)).

You will find that there is still a large improvement in performance going from ARMv6 to ARMv7, and again from ARMv7 to ARMv7, even if running the same code at the same clock for every component on the board (SDRAM, ARM CPUs, VideoCore IV, etc). That I think is a better comparison between these systems.
You might be right. It depends on what you are interested in. For example, RISC OS only runs on one core and Python doesn't allow threaded SMP multiprocessing very easily either. It was suggested to create a shaded overlay showing standard overclock performance, but showing multiple-core versus single-core performance would also be interesting. Thanks for the suggestion!

W. H. Heydt
Posts: 10317
Joined: Fri Mar 09, 2012 7:36 pm
Location: Vallejo, CA (US)

Re: A Pi Pie Chart

Sun Nov 18, 2018 4:40 pm

DavidS wrote:
Sun Nov 18, 2018 12:05 pm
You will find that there is still a large improvement in performance going from ARMv6 to ARMv7, and again from ARMv7 to ARMv7, even if running the same code at the same clock for every component on the board (SDRAM, ARM CPUs, VideoCore IV, etc). That I think is a better comparison between these systems.
Minor point, and one everyone probably understood, but that second transition should be ARMv7 to ARMv8, I think. And that makes it even more of interest to compare the Pi2Bv1.1 to the Pi2Bv1.2 as that *is* that transition with the default clock speed kept the same so it would be a direct "Apples to Apples" comparison of ARM version difference.

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Mon Nov 19, 2018 7:25 am

ejolson wrote:
Sun Nov 18, 2018 11:54 am
At the moment the phase space of the Lorenz 96 dynamical system is sized larger than reasonable and I need to adjust the cache blocking size on the prime number sieve.
I've made the above changes and updated the pie chart.

I don't have a 2B with the 64-bit processor. I could underclock the 3B+ to 900MHz to simulate it--in fact that is usually what I do with the 3B for stability. At the same time, there isn't much room for additional graphs--need to save space for the Pi 4--so maybe I'll leave it alone for now.

I'm still working on polishing the code enough to post it. Currently the code is written using the Cilk parallel programming extensions to the C programming language and compiled with gcc version 6.4. As versions of gcc which support Cilk on ARM architectures are rare, I'm making an OpenMP version as well. Hopefully the performance will not be too different.
Last edited by ejolson on Mon Nov 19, 2018 5:10 pm, edited 1 time in total.

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Mon Nov 19, 2018 4:36 pm

bensimmo wrote:
Sun Nov 18, 2018 8:46 am
For benchmarks, for me and given this is the reason for the Pi, pyBench would be more useful.
I've just looked at pyBench. It appears to consist of 52 micro benchmarks used to tune Python implementations and prevent interpreter and compiler regressions. It should be possible to create a Py Pi Pie Chart based on an averaged pyBench score for each type of Pi computer. Alternatively, one could translate the prime sieve, merge sort, Fourier transform and Lorenz 96 dynamical system simulations used for C into Python and compute solutions to well-defined problems. I found more than 15 different and incompatible parallel processing libraries for Python on this site. Does anyone have a recommendation which one to use?

In my experience people who need performance do not write the computational part of their code in Python, but rather C, C++ or Fortran. However, just like knowing the speed of the BASIC interpreter on the different 8-bit microcomputers of the past was interesting, so would knowing how Python runs today.

DavidS
Posts: 4213
Joined: Thu Dec 15, 2011 6:39 am
Location: USA

Re: A Pi Pie Chart

Mon Nov 19, 2018 6:52 pm

ejolson wrote:
Mon Nov 19, 2018 4:36 pm
bensimmo wrote:
Sun Nov 18, 2018 8:46 am
For benchmarks, for me and given this is the reason for the Pi, pyBench would be more useful.
I've just looked at pyBench. It appears to consist of 52 micro benchmarks used to tune Python implementations and prevent interpreter and compiler regressions. It should be possible to create a Py Pi Pie Chart based on an averaged pyBench score for each type of Pi computer. Alternatively, one could translate the prime sieve, merge sort, Fourier transform and Lorenz 96 dynamical system simulations used for C into Python and compute solutions to well-defined problems. I found more than 15 different and incompatible parallel processing libraries for Python on this site. Does anyone have a recommendation which one to use?

In my experience people who need performance do not write the computational part of their code in Python, but rather C, C++ or Fortran. However, just like knowing the speed of the BASIC interpreter on the different 8-bit microcomputers of the past was interesting, so would knowing how Python runs today.
I can not help with the python stuff, as I do not use python at all.

As for looking at interpreted and compiled:
I agree that knowing the speed of the default interpreter (seems to be python on Raspbian) is important. Definitely would be interesting.

Would also be interesting to see how it is on ARM BASIC (BBC BASIC V) for those that use RISC OS. Though that will have to wait until the still-infant SMP support for RISC OS comes along a bit, and ARM BASIC is updated to support multithreading (so we may be waiting a while).

bensimmo
Posts: 4065
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: A Pi Pie Chart

Mon Nov 19, 2018 7:42 pm

I would just use pyBench for a start, so that the basic calls of a Python implementation are benchmarked.
(I think one of the Linux websites uses it.)

And then some other real-world use, such as a browser benchmark. RoboHornet, or is that an old one now?

(GPIO switching benchmark, is that actually a real world use?)

I benchmarked an old laptop using RPIDesktopX86 so can compare directly as it uses the same Python version. It's also a low-mid range for the era (10yrs ago).
(Intel T5800 2GHz core2duo : 3GB)

IDLE3 & Python3-terminal box completed in 6.41/6.25 seconds.
Thonny3 is about a second slower at 7.13 seconds.

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Tue Nov 20, 2018 5:01 am

bensimmo wrote:
Mon Nov 19, 2018 7:42 pm
IDLE3 & Python3-terminal box completed in 6.41/6.25 seconds.
Thonny3 is about a second slower at 7.13 seconds.
I'm not sure what IDLE3, Python3-terminal box and Thonny3 have to do with Python performance. What is being completed in 6.41, 6.25 and 7.13 seconds?

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Tue Nov 20, 2018 5:46 am

ejolson wrote:
Mon Nov 19, 2018 7:25 am
As versions of gcc which support Cilk on ARM architectures are rare, I'm making an OpenMP version as well. Hopefully the performance will not be too different.
The OpenMP version is finished.

I couldn't figure out how to convince OpenMP to schedule parallel loops using an omp single pool of shared worker threads with work stealing to avoid deadlocks. I did, however, find slides for a talk by Vivek Kale from SC17 which seem to indicate a complicated way of doing this using schedule(user) and an invitation for comment. Instead, I manually converted all parallel cilk_for loops to sequences of cilk_spawn calls followed by a cilk_sync, because the semantics of these constructions naturally map to sequences of omp task calls followed by an omp taskwait. Doing so forced me to explicitly specify the grain size of each parallel loop, which further resulted in slight performance gains both for the Cilk parallel and OpenMP versions of the code.
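As a rough illustration of that mapping, here is a minimal sketch of a parallel loop rewritten as a sequence of tasks with an explicit grain size. It is not taken from the pichart source: the function names and the squaring example are invented, and the comments show what the corresponding Cilk keywords would presumably be.

```c
#include <assert.h>
#include <stddef.h>

/* Serial work on one chunk: square a[lo..hi) in place. */
static void square_chunk(double *a, size_t lo, size_t hi)
{
    for (size_t i = lo; i < hi; i++)
        a[i] *= a[i];
}

/* What could have been "cilk_for (i = 0; i < n; i++) a[i] *= a[i];"
   written as one task per chunk of `grain` elements, then a wait. */
static void square_tasks(double *a, size_t n, size_t grain)
{
    assert(grain > 0);
    for (size_t lo = 0; lo < n; lo += grain) {
        size_t hi = (n - lo < grain) ? n : lo + grain;
        /* Cilk equivalent: cilk_spawn square_chunk(a, lo, hi); */
        #pragma omp task firstprivate(a, lo, hi)
        square_chunk(a, lo, hi);
    }
    /* Cilk equivalent: cilk_sync; */
    #pragma omp taskwait
}

/* Entry point: one thread creates the tasks, the whole team runs them. */
static void square_all(double *a, size_t n, size_t grain)
{
    #pragma omp parallel
    #pragma omp single
    square_tasks(a, n, grain);
}
```

Built without OpenMP support, the pragmas are ignored and the code simply runs serially, which makes it easy to check the results. The grain argument is the knob mentioned above: too small and task overhead dominates, too large and there are not enough tasks to keep every core busy.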

At this point the code can be compiled using either Cilk or OpenMP simply by changing a couple of defines in a header file. Moreover, the resulting executables run with about the same level of performance either way.

bensimmo
Posts: 4065
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: A Pi Pie Chart

Tue Nov 20, 2018 7:25 am

ejolson wrote:
Tue Nov 20, 2018 5:01 am
bensimmo wrote:
Mon Nov 19, 2018 7:42 pm
IDLE3 & Python3-terminal box completed in 6.41/6.25 seconds.
Thonny3 is about a second slower at 7.13 seconds.
I'm not sure what IDLE3, Python3-terminal box and Thonny3 have to do with Python performance. What is being completed in 6.41, 6.25 and 7.13 seconds?
pyBench default benchmark.
Each is a different way of using Python3.
Some use the command line with no GUI, many use IDLE3, and RPF now use Thonny V3 as their Python program. (From the point of view of a general user)

I thought I would test all three (for when I test on the pi)
But that's for an alternative 'user' benchmark, rather than the specific task of your benchmarks.
(for another thread I guess).

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Wed Nov 21, 2018 12:27 am

bensimmo wrote:
Tue Nov 20, 2018 7:25 am
I thought I would test all three (for when I test on the pi)
But that's for an alternative 'user' benchmark, rather than the specific task of your benchmarks.
(for another thread I guess).
I read that pybench is not considered reliable because many of the benchmarks can be optimized by compilers such as PyPy in ways that make them trivial. What do you think about the performance object?

bensimmo
Posts: 4065
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: A Pi Pie Chart

Wed Nov 21, 2018 12:00 pm

I'll have a read but you know more about them than me.
I was just going with the default Raspbian Python(s), so it would only change if the version were bumped, and IDLE is as simple as it gets (CPython underneath when it runs the benchmark).
It's just a general feel of using Python.

What it did show was Thonny was generally a bit quicker, apart from module loads which are a lot slower.
No idea why, as I thought it used the OS's Python in Raspbian, even on x86.

So what else could be used?
Python benchmark
Chrome (browser benchmark)

That has to be the two main 'user' uses.

Two more?

DavidS
Posts: 4213
Joined: Thu Dec 15, 2011 6:39 am
Location: USA

Re: A Pi Pie Chart

Wed Nov 21, 2018 2:32 pm

That has always been a trouble with well-known benchmarks and compilers. Compiler writers create optimizations to improve performance on the benchmarks, as it looks good when talking about the compiler, but these optimizations do not always help real-world code as much as they help the benchmarks.

There have been cases of compilers recognizing certain benchmarks and simply cheating: compiling around the benchmark and inserting code to provide the expected results without running the actual benchmark (though that was with commercial compilers; I doubt that an open source one would go that far).

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sun Nov 25, 2018 1:39 am

The code is finished and I have made a gzipped tar file containing the complete source that should be possible to compile and run on any Linux computer. I would be interested to know what modifications are needed for other operating systems. If you are interested in comparing the speed of your computer to the Raspberry Pi, please download the archive using this link.

By default the serial version and OpenMP versions are built using the system compiler. If you have a Cilk compiler, feel free to include pichart-cilk as one of the build targets. After tuning there was not much difference in the performance between the OpenMP and Cilk versions for the multiprocessor Raspberry Pi computers. For Intel compatible machines with higher core counts there were greater differences. Details may be found in the comments of the pichart.c file.

Note that for the Pi 2B, 3B or 3B+ running Raspbian it is important to explicitly include compiler flags to build ARMv7 executables. Otherwise, noticeably slower backward-compatible ARMv6 binaries will be built. When I created the reference data I tried to choose reasonable compiler settings. If I missed something and different options lead to faster runtimes, I would appreciate a reply with details on how to obtain the improved results.
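One way to check which architecture level a build actually targeted is to look at the predefined __ARM_ARCH macro that GCC and Clang set on ARM targets. The sketch below is only an illustration and is not part of pichart; the helper names are invented. The exact flags are not given in this post, but on a Pi 2B something like -mcpu=cortex-a7 -mfpu=neon-vfpv4 would be an assumed starting point.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the ARM architecture level the compiler targeted,
   or 0 when not compiling for ARM (or when the macro is unavailable). */
static int target_arm_arch(void)
{
#if defined(__ARM_ARCH)
    return __ARM_ARCH;
#else
    return 0;
#endif
}

/* Hypothetical helper: write a short human-readable description to buf. */
static void describe_target(char *buf, size_t len)
{
    int arch = target_arm_arch();
    assert(buf != NULL && len > 0);
    if (arch > 0)
        snprintf(buf, len, "ARMv%d", arch);
    else
        snprintf(buf, len, "not ARM (or __ARM_ARCH undefined)");
}
```

If a default Raspbian build reports ARMv6 and a build with the extra flags reports ARMv7, the flags took effect.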

Upon running the pichart program it will perform four tests: Prime Sieve, Merge Sort, Fourier Transform and Lorenz 96. After the tests have completed a scalable vector graphic pichart.svg will be written to the current directory. This can be viewed using the geeqie image viewer among others. While SVG is a web standard, I used the GNU Image Manipulation Program GIMP to export a portable network graphics file pichart.png to make the image for the first post in this thread. It would be great to see any interesting Pi charts people make.

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sun Nov 25, 2018 4:54 am

Here is a Pi pie chart that compares the Raspberry Pi 3B+ to some much more expensive computers. All the performance numbers were computed using the OpenMP version of the benchmark.

Image

Data for the above graph is included in the source archive linked to in the previous post. A short description of each of the computers follows:

2xE5-2620 -- a 12-core 2.4GHz dual-socket Xeon server.
R7Pro1700 -- an 8-core 3GHz AMD Ryzen 7 desktop.
JetsonTX2 -- a 4-core 2GHz Cortex-A57 SBC.
S5P6818 -- an 8-core 1.4GHz Cortex-A53 SBC.

Update: The pie chart has been updated with new scores for the Xeon server reflecting better compiler optimization settings. The Ryzen 7 is still faster for the prime sieve, but not by so much.
Last edited by ejolson on Tue Nov 27, 2018 7:18 pm, edited 4 times in total.

bensimmo
Posts: 4065
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: A Pi Pie Chart

Sun Nov 25, 2018 2:47 pm

TX2 is 6-core, 2xDenver2-ARM and 4x-A57 (all at 2GHz).
Can it run on the CUDA cores, would that improve things?
(Given the TX2 is more powerful than the Switch Console)

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Sun Nov 25, 2018 5:57 pm

bensimmo wrote:
Sun Nov 25, 2018 2:47 pm
TX2 is 6-core, 2xDenver2-ARM and 4x-A57 (all at 2GHz).
Can it run on the CUDA cores, would that improve things?
(Given the TX2 is more powerful than the Switch Console)
Update: By default Jetson boots with the Denver cores disabled for power savings. The tests above only involved the A57 cores. The Denver cores are about 60% faster on the Pi Chart benchmark. A new set of runs that include the Denver cores appears here.

From what I can tell, the Ubuntu Linux running on the Jetson TX2 boots with cores 0, 3, 4 and 5 active. I haven't spent much time to know which kind of cores those are and whether they dynamically switch based on the load. In particular, the above tests were performed with only 4 cores active.

I actually translated the parallel merge sort into CUDA last year to use in an algorithm for finding differences in sets of point cloud data gathered from a LIDAR. While a work-span analysis of general parallel sort algorithms places merge sort near the top, to efficiently use hundreds of CUDA cores people sometimes employ a radix sort customized to the data. In the case of the point-cloud differencing algorithm, the real-number radix sort from the CUB sorting library performed surprisingly worse than my general merge sort running on a P100 GPU. When running the same code on the Pascal GPU of the Jetson TX2 the radix sort performed better.

The prime sieve is embarrassingly parallel with the added benefit of being cache friendly--for this reason the parallel algorithm also runs faster on single core computers. The algorithm employs a lot of bitwise operations so it would be interesting to see what happens on CUDA. I haven't tried.

In my opinion, the fact that Nvidia offered good FFT and BLAS libraries for CUDA greatly increased its popularity in the beginning. The Fourier transform coded in C for the pie-chart benchmark uses a cache-agnostic recursive parallel implementation. New versions of CUDA support this type of dynamic parallelism and it would be interesting to see how the vendor library compares. When I converted the merge sort to CUDA, it was necessary to explicitly code the nonparallel part of the recursion using arrays and goto statements. This was needed to prevent too many stack frames from overwhelming the simple GPU memory manager. I expect a similar technique would be needed when translating the C code used in the Fourier transform benchmark.
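To show the general idea of replacing recursion with explicit bookkeeping, here is a minimal sketch of a merge sort in plain C that keeps its pending work in a small array instead of on the call stack. This is not the CUDA code described above: it runs on the CPU, uses a loop with a stage counter rather than goto statements, and all names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One pending subproblem: sort a[lo..hi). stage counts finished children. */
struct frame { size_t lo, hi; int stage; };

/* Merge the sorted runs a[lo..mid) and a[mid..hi) using tmp as scratch. */
static void merge_runs(int *a, int *tmp, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);
}

/* Merge sort without function recursion. The stack holds one frame per
   level of the would-be recursion, so 128 entries is far more than any
   array that fits in memory can need. */
static void merge_sort_explicit(int *a, int *tmp, size_t n)
{
    struct frame stack[128];
    size_t top = 0;
    assert(n == 0 || (a != NULL && tmp != NULL));
    stack[top++] = (struct frame){ 0, n, 0 };
    while (top > 0) {
        struct frame *f = &stack[top - 1];
        size_t mid = f->lo + (f->hi - f->lo) / 2;
        if (f->hi - f->lo < 2) {            /* base case: nothing to sort */
            top--;
        } else if (f->stage == 0) {         /* first "call": left half */
            f->stage = 1;
            stack[top++] = (struct frame){ f->lo, mid, 0 };
        } else if (f->stage == 1) {         /* second "call": right half */
            f->stage = 2;
            stack[top++] = (struct frame){ mid, f->hi, 0 };
        } else {                            /* both halves done: merge */
            merge_runs(a, tmp, f->lo, mid, f->hi);
            top--;
        }
    }
}
```

The caller supplies a scratch array tmp of the same length as a. Because the frame array has a fixed, known size, memory use is predictable, which is the property that matters when the runtime's own stack handling is limited.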

The Lorenz 96 simulation might be the easiest to translate to CUDA because of its computational simplicity and the fine-grained parallelism available. The CPU version uses an overlapping boundary approach based on domains of dependence to obtain coarser-grained computational blocks that contain enough work for parallel speedup on 64-core systems. GPUs typically have many more processors. It is also worth noting that the phase space consists of 32K double-precision words and entirely fits into the cache. This is small for a parallel problem but quite large compared to typical Lorenz 96 simulations.
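For reference, the standard Lorenz 96 model is dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F with cyclic indices, and each component depends only on three neighbours, which is where the fine-grained parallelism comes from. The sketch below evaluates that right-hand side in plain serial C. It is not the pichart implementation: the function name is invented, the blocked overlapping-boundary scheme is not shown, and the forcing value and time integrator used by the benchmark are not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: right-hand side of the Lorenz 96 system,
   dxdt[i] = (x[i+1] - x[i-2]) * x[i-1] - x[i] + F, indices taken mod n. */
static void lorenz96_rhs(const double *x, double *dxdt, size_t n, double F)
{
    assert(n >= 4);
    for (size_t i = 0; i < n; i++) {
        double xp1 = x[(i + 1) % n];        /* right neighbour      */
        double xm1 = x[(i + n - 1) % n];    /* left neighbour       */
        double xm2 = x[(i + n - 2) % n];    /* second left neighbour */
        dxdt[i] = (xp1 - xm2) * xm1 - x[i] + F;
    }
}
```

With n = 32768 the state is 32768 x 8 bytes = 256 KiB, which matches the 32K double-precision words mentioned above. A quick sanity check is that the uniform state x_i = F is an equilibrium, so the function should return all zeros for it.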

Since CUDA is not supported on any current Raspberry Pi models, I don't see a pressing need to translate the existing OpenMP and Cilk parallel C codes to CUDA. At the same time, the performance characteristics of GPUs are quite different than CPUs so the comparison would still be interesting. If you decide to write a GPU version yourself, I'll contribute my CUDA merge sort code as needed.
Last edited by ejolson on Wed May 15, 2019 12:36 am, edited 2 times in total.

bensimmo
Posts: 4065
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: A Pi Pie Chart

Sun Nov 25, 2018 8:25 pm

:oops: Not my field of expertise. I can follow (mostly) what you are saying, but I have to rely on people like you for this (hence, probably, my user benchmark alternatives).

I can say for this old HP G70 laptop T5800 (2GHz Dual) it tends to beat the Pi3B+ in the benchmarks, but it has a hard time at Merge Sort in both.

It was already hot from video playing and did throttle at points (Lorenz really made it ramp up). So I will run again.

This is running 'Raspbian x86' ;-) updated so that it has a lot of CPU 'bug'* fixes now.


---
Quick run now. Note: while these may be slower in places than a Pi3B+, in actual perceptible desktop use with Raspberry Pi Desktop the 3B+ is way behind.

Linux version 4.9.0-8-amd64 (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian 4.9.130-2 (2018-10-27)


pi@PiLaptop:~/Downloads/pichart-current $ ./pichart-openmp 
pichart -- Raspberry Pi Performance OPENMP version 22

Prime Sieve          P=14630843 Threads=4 Sec=1.07687 Mops=867.633
Merge Sort           N=16777216 Threads=4 Sec=1.74059 Mops=231.331
Fourier Transform    N=4194304 Threads=2 Sec=1.16637 Mflops=395.563
Lorenz 96            N=32768 K=16384 Threads=2 Sec=0.879889 Mflops=3660.94

Making pie charts...done.
pi@PiLaptop:~/Downloads/pichart-current $ ls
fourier.c  Makefile  pichart.c  pichart-openmp  pichart.svg  util.c
lorenz.c   merge.c   pichart.h  pichart-serial  sieve.c
pi@PiLaptop:~/Downloads/pichart-current $ ./pichart-serial
pichart -- Raspberry Pi Performance Serial version 22

Prime Sieve          P=14630843 Threads=1 Sec=2.11042 Mops=442.722
Merge Sort           N=16777216 Threads=1 Sec=3.42745 Mops=117.479
Fourier Transform    N=4194304 Threads=1 Sec=1.89937 Mflops=242.908
Lorenz 96            N=32768 K=16384 Threads=2 Sec=1.74144 Mflops=1849.74

Making pie charts...done.

*bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Mon Nov 26, 2018 1:36 am

bensimmo wrote:
Sun Nov 25, 2018 8:25 pm
I can say for this old HP G70 laptop T5800 (2GHz Dual) it tends to beat the Pi3B+ in the benchmarks, but it has a hard time at Merge Sort in both.
Those numbers look reasonable for a 2GHz laptop processor of that vintage. As you've mentioned, due to inadequate cooling, laptops are not very well suited to any kind of processor-intensive computation. Still, it is reassuring to see that the parallel version runs about twice as fast on two cores as the serial version does on one. My guess is that little if any throttling occurred. If you rerun the tests using a cooling pad or something, it would be nice to note any performance differences. Were you able to view the automatically generated pie charts?
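To make the "about twice as fast" concrete, the speedup is just the serial time divided by the parallel time. The sketch below does that arithmetic on the Sec= values from the version 22 output posted above; the struct and function names are invented for this example. The ratios come out at roughly 1.96 for the prime sieve, 1.97 for merge sort, 1.63 for the Fourier transform and 1.98 for Lorenz 96.

```c
#include <assert.h>

/* Parallel speedup: serial run time divided by parallel run time. */
static double speedup(double serial_sec, double parallel_sec)
{
    assert(serial_sec > 0.0 && parallel_sec > 0.0);
    return serial_sec / parallel_sec;
}

/* Sec= values copied from the laptop runs quoted in this thread. */
struct timing {
    const char *test;
    double serial_sec;
    double parallel_sec;
};

static const struct timing laptop_runs[] = {
    { "Prime Sieve",       2.11042, 1.07687  },
    { "Merge Sort",        3.42745, 1.74059  },
    { "Fourier Transform", 1.89937, 1.16637  },
    { "Lorenz 96",         1.74144, 0.879889 },
};
```

On a dual-core processor the ideal value is 2, so three of the four tests are close to perfect scaling and the Fourier transform is the one that falls a little short.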

Do you think the better apparent responsiveness of your laptop computer compared to a Pi is related to 3GB RAM versus 1GB or because of the slowness of the SD card in the Pi? Even though the four slower cores on a Pi 3B+ may perform a parallel sort faster than the two cores in the laptop, apparent responsiveness in desktop applications may be more a reflection of single core speed.
Last edited by ejolson on Mon Nov 26, 2018 2:16 am, edited 2 times in total.

ejolson
Posts: 3052
Joined: Tue Mar 18, 2014 11:47 am

Re: A Pi Pie Chart

Mon Nov 26, 2018 2:02 am

DavidS wrote:
Sun Nov 18, 2018 12:05 pm
Nice Pie chart of Pi performance. Though do you think it is fair to include the single core boards in benchmarks designed for testing multi-processor performance? Seems to me that that is unfair as it does not show the improvement in performance for the CPU directly (will have an extreme increase when jumping to 4 cores, that is much greater than the improvement in performance of the CPU (one core)).

You will find that there is still a large improvement in performance going from ARMv6 to ARMv7, and again from ARMv7 to ARMv7, even if running the same code at the same clock for every component on the board (SDRAM, ARM CPUs, VideoCore IV, etc). That I think is a better comparison between these systems.
Here is a Pi pie chart comparing the single-core speeds of the different Pi computers using the serial versions of the benchmark codes.

Image

Note that the gap in performance between the Zero and the Pi 2B has narrowed considerably with the Zero even outperforming the 2B in the prime sieve benchmark due to the faster clock speed. This is reminiscent of how the 8-core 3GHz Ryzen 7 outperformed the 12-core 2.4GHz Xeon for the prime sieve in the pie chart posted earlier. In that case, the clock speed of the Ryzen as well as architectural differences made up for having fewer cores.
