The parallel benchmark tries different numbers of software threads in sequence, starting with twice the number of hardware threads and ending with the serial version of the code. Tests for each threading configuration are run a minimum of three times or for 5 seconds, whichever takes longer, and the best timing is kept as the final result. Therefore, one expects intervals where fewer cores are busy while the parallel code is running. This allows automatic tuning for hyperthreading and for cases where there are more or fewer floating point units than integer units per core. It also allows the system to cool off a bit before performing the next benchmark.
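The timing rule described above can be sketched roughly as follows. This is not the pichart source, only a minimal illustration: the names `kernel_fn`, `best_time`, `sweep` and `demo_kernel` are made up for the example, and stepping the thread count down by one each time is an assumption, since the post only gives the start and end points.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical kernel signature: run the workload once using `threads` threads. */
typedef void (*kernel_fn)(int threads);

/* Wall-clock seconds from a monotonic clock. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Repeat the kernel until it has run at least min_reps times AND at least
   min_sec seconds have passed; return the fastest single run in seconds. */
double best_time(kernel_fn kernel, int threads, int min_reps, double min_sec)
{
    assert(min_reps > 0);
    double best = -1.0;
    double start = now_sec();
    for (int reps = 0; reps < min_reps || now_sec() - start < min_sec; reps++) {
        double t0 = now_sec();
        kernel(threads);
        double dt = now_sec() - t0;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}

/* Sweep from 2*hw_threads down to 1 (serial) and keep the overall best.
   Stepping down by one each time is an assumption of this sketch. */
double sweep(kernel_fn kernel, int hw_threads, int min_reps, double min_sec,
             int *best_threads)
{
    assert(hw_threads > 0 && best_threads != 0);
    double best = -1.0;
    for (int t = 2 * hw_threads; t >= 1; t--) {
        double sec = best_time(kernel, t, min_reps, min_sec);
        if (best < 0.0 || sec < best) {
            best = sec;
            *best_threads = t;
        }
    }
    return best;
}

/* Stand-in kernel for trying the sketch out; it ignores `threads`. */
int demo_calls = 0;
void demo_kernel(int threads)
{
    (void)threads;
    volatile double x = 0.0;
    for (int i = 1; i <= 10000; i++)
        x += 1.0 / i;
    demo_calls++;
}
```

Calling `sweep(demo_kernel, 4, 3, 5.0, &n)` would mirror the three-run, 5-second rule on a machine with four hardware threads. A real kernel would actually use the thread count, for example by passing it to an OpenMP `num_threads` clause.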
The pichart-serial benchmark runs in about 22 minutes on the original Pi B+ because it takes the minimum time (maximum speed) of eight measurements in each of the four categories and spends from 20 seconds to a minute on each measurement.
$ ./pichart-cilk -t "dual PIII 650MHz"
pichart -- Raspberry Pi Performance CILKPLUS version 23
Prime Sieve P=14630843 Threads=2 Sec=4.76001 Mops=196.287
Merge Sort N=16777216 Threads=2 Sec=10.6594 Mops=37.7746
Fourier Transform N=4194304 Threads=2 Sec=9.29035 Mflops=49.6616
Lorenz 96 N=32768 K=16384 Threads=2 Sec=28.9737 Mflops=111.178
Making pie charts...done.
$ cp pichart.svg p3cilk.svg
$ ./pichart-openmp -t "dual PIII 650MHz"
pichart -- Raspberry Pi Performance OPENMP version 23
Prime Sieve P=14630843 Threads=2 Sec=4.70759 Mops=198.473
Merge Sort N=16777216 Threads=4 Sec=10.9022 Mops=36.9333
Fourier Transform N=4194304 Threads=2 Sec=9.49918 Mflops=48.5698
Lorenz 96 N=32768 K=16384 Threads=2 Sec=27.9837 Mflops=115.111
Making pie charts...done.
$ cp pichart.svg p3openmp.svg
$ ./pichart-serial -t "P4 1500MHz"
pichart -- Raspberry Pi Performance Serial version 23
Prime Sieve P=14630843 Threads=1 Sec=4.74821 Mops=196.775
Merge Sort N=16777216 Threads=1 Sec=6.5265 Mops=61.6952
Fourier Transform N=4194304 Threads=2 Sec=5.90099 Mflops=78.1857
Lorenz 96 N=32768 K=16384 Threads=2 Sec=6.05222 Mflops=532.238
Making pie charts...done.
$ ./pichart-openmp -t "P4D 2.8GHz"
pichart -- Raspberry Pi Performance OPENMP version 23
Prime Sieve P=14630843 Threads=2 Sec=1.4254 Mops=655.484
Merge Sort N=16777216 Threads=4 Sec=2.09171 Mops=192.499
Fourier Transform N=4194304 Threads=2 Sec=2.05578 Mflops=224.427
Lorenz 96 N=32768 K=16384 Threads=2 Sec=0.993851 Mflops=3241.16
Making pie charts...done.
$ ./pichart-serial -t "P4D 2.8Ghz"
pichart -- Raspberry Pi Performance Serial version 23
Prime Sieve P=14630843 Threads=1 Sec=2.83992 Mops=328.998
Merge Sort N=16777216 Threads=1 Sec=4.14171 Mops=97.2192
Fourier Transform N=4194304 Threads=2 Sec=2.88016 Mflops=160.19
Lorenz 96 N=32768 K=16384 Threads=1 Sec=1.80798 Mflops=1781.67
Making pie charts...done.
./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 23
Prime Sieve P=14630843 Threads=2 Sec=3.83142 Mops=243.86
Merge Sort N=16777216 Threads=4 Sec=4.34331 Mops=92.7064
Fourier Transform N=4194304 Threads=1 Sec=6.78003 Mflops=68.0488
Lorenz 96 N=32768 K=16384 Threads=1 Sec=2.22188 Mflops=1449.78
Making pie charts...done.
bigmac:pichart-current scruss$ uname -a
Darwin bigmac.local 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:57:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_PPC Power Macintosh
bigmac:pichart-current scruss$ gcc-4.2 --version
powerpc-apple-darwin9-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5577)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The gcc implementation of OpenMP has improved dramatically in recent years to include efficient support for dynamic parallelism. Since all four performance tests were originally written as Cilk parallel programs, they rely quite significantly on that feature. Maybe it's time to install gcc 6.3 or better.
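For anyone wondering what dynamic parallelism looks like in OpenMP, here is a minimal sketch. I am assuming the term refers to OpenMP tasks, which arrived with OpenMP 3.0; the actual pichart code is not shown here, and `sum_range`, `parallel_sum` and the cutoff of 1024 are invented for the example. The recursion spawns one half as a task and computes the other half itself, much like a Cilk spawn followed by a sync.

```c
#include <assert.h>
#include <stddef.h>

/* Recursively sum a[lo..hi): spawn the left half as a task, do the right
   half in the current task, then wait for the child before combining. */
static long sum_range(const long *a, size_t lo, size_t hi)
{
    if (hi - lo <= 1024) {              /* small ranges: plain serial loop */
        long s = 0;
        for (size_t i = lo; i < hi; i++)
            s += a[i];
        return s;
    }
    size_t mid = lo + (hi - lo) / 2;
    long left, right;
    #pragma omp task shared(left)
    left = sum_range(a, lo, mid);
    right = sum_range(a, mid, hi);
    #pragma omp taskwait
    return left + right;
}

/* Start a thread team, let a single thread seed the task tree, and let
   the remaining threads pick up tasks as they are created. */
long parallel_sum(const long *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    #pragma omp parallel
    #pragma omp single
    total = sum_range(a, 0, n);
    return total;
}
```

Built without `-fopenmp` the pragmas are ignored and the code simply runs serially, which makes it easy to compare the two builds.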
Those PowerPC Macintosh computers--especially the big silver coloured towers--were quite impressive. I think what you have measured so far is the single core performance. Unfortunately, the only Macintosh computers I have for testing are modern iMacs without toaster-sized heatsinks that seem engineered to toast their hard disks instead.
$ ./pichart-serial -t "P4 3.4GHz"
pichart -- Raspberry Pi Performance Serial version 23
Prime Sieve P=14630843 Threads=2 Sec=2.55177 Mops=366.15
Merge Sort N=16777216 Threads=2 Sec=3.43348 Mops=117.273
Fourier Transform N=4194304 Threads=1 Sec=2.60827 Mflops=176.888
Lorenz 96 N=32768 K=16384 Threads=1 Sec=1.24317 Mflops=2591.14
Making pie charts...done.
$ ./pichart-openmp -t "P4 3.4GHz"
pichart -- Raspberry Pi Performance OPENMP version 23
Prime Sieve P=14630843 Threads=4 Sec=2.18189 Mops=428.219
Merge Sort N=16777216 Threads=4 Sec=2.18205 Mops=184.53
Fourier Transform N=4194304 Threads=2 Sec=2.12401 Mflops=217.218
Lorenz 96 N=32768 K=16384 Threads=2 Sec=1.18418 Mflops=2720.23
Making pie charts...done.
$ tar zxf pichart-current.tgz
$ cd pichart-current
$ make
gcc -std=gnu99 -O3 -ffast-math -Wall -lm -o pichart-serial pichart.c util.c sieve.c merge.c fourier.c lorenz.c
gcc -std=gnu99 -O3 -ffast-math -Wall -lm -fopenmp -o pichart-openmp pichart.c util.c sieve.c merge.c fourier.c lorenz.c
$ ./pichart-openmp
pichart -- Raspberry Pi Performance OPENMP version 23
Prime Sieve P=14630843 Threads=4 Sec=0.805925 Mops=1159.32
Merge Sort N=16777216 Threads=8 Sec=0.766017 Mops=525.645
Fourier Transform N=4194304 Threads=4 Sec=1.24434 Mflops=370.779
Lorenz 96 N=32768 K=16384 Threads=4 Sec=0.764573 Mflops=4213.1
Making pie charts...done.
(gdb) run -t 'R51 Debug'
Starting program: /home/user/Downloads/pichart-current/pichart-openmp -t 'R51 Debug'
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
pichart -- Raspberry Pi Performance OPENMP version 23
Prime Sieve [New Thread 0xb5dbeb40 (LWP 4220)]
Thread 1 "pichart-openmp" received signal SIGSEGV, Segmentation fault.
0x0804930d in setrange (jmax=268435456, jmin=268304400, imax=<optimized out>)
at sieve.c:57
57 if(!getbit(notPrimebits,i)){
$ time ./pichart-serial -t 'MacBook5,1 Core 2 Duo 2.4 GHz'
pichart -- Raspberry Pi Performance Serial version 23
Prime Sieve P=14630843 Threads=2 Sec=2.039 Mops=458.23
Merge Sort N=16777216 Threads=2 Sec=2.99945 Mops=134.243
Fourier Transform N=4194304 Threads=2 Sec=1.62804 Mflops=283.393
Lorenz 96 Error 2.96128e-11 in parallel solver at index 0!
real 1m29.162s
user 1m27.686s
sys 0m0.502s
scruss wrote: Sat Dec 15, 2018 3:55 pm
Having time to kill while a large laser etch job was running, I tried my somewhat old MacBook. macOS doesn't ship with GCC, but uses a compatible front-end to LLVM. It had problems, as the output above shows.

I think you need to remove the fast-math option from LLVM to run the Lorenz 96 timing. It might be closer to unsafe-math on gcc.
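One plausible reason fast-math could trip a consistency check, offered as an illustration rather than a diagnosis: value-changing optimisations let the compiler regroup floating-point sums, and regrouping can change the result. The sketch below forces the two groupings by hand. The function names are invented for the example, and how pichart actually compares its serial and parallel Lorenz 96 results is not shown in this thread.

```c
#include <assert.h>

/* Sum as (a + b) + c. The volatile temporary stops the compiler from
   regrouping the additions itself. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* Sum as a + (b + c). */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}

/* Self-check: with these inputs the two groupings disagree. */
void check_grouping(void)
{
    /* (1e20 + -1e20) + 1.0 is 1.0, because the big terms cancel first. */
    /* 1e20 + (-1e20 + 1.0) is 0.0, because the 1.0 is lost in rounding. */
    assert(sum_left(1e20, -1e20, 1.0) != sum_right(1e20, -1e20, 1.0));
}
```

The inputs here are deliberately extreme. In a long numerical integration the differences per step are tiny, but they can accumulate to something like the 2.96128e-11 reported in the error message.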
pichart-serial: $(SOURCE) pichart.h
	$(CC) -o pichart-serial $(SOURCE) $(CFLAGS)
scruss wrote: Sat Dec 15, 2018 9:54 pm
The biggest thing I've noticed is that the code won't build on many systems unless you move CFLAGS to the end of the line, as in the rule above.

It may be important for the optimiser flags to occur early on the command line and the linker flags at the end. I guess lumping everything into CFLAGS, while it works for gcc, may not have been such a good idea in general. Until I update the Makefile to correct this, it should be possible to sort things out by hand if necessary.
Agreed. The -lm should go at the end. On the other hand, the -O3 along with the -march and -mtune settings should be at the beginning. I'll put up a new version soon.
pichart -- Raspberry Pi Performance Serial version 23
Prime Sieve P=14630843 Threads=1 Sec=3.32331 Mops=281.144
Merge Sort N=16777216 Threads=2 Sec=3.91997 Mops=102.718
Fourier Transform N=4194304 Threads=1 Sec=5.9601 Mflops=77.4103
Lorenz 96 N=32768 K=16384 Threads=2 Sec=2.41353 Mflops=1334.65
pichart -- Raspberry Pi Performance OPENMP version 23
Prime Sieve P=14630843 Threads=2 Sec=1.8036 Mops=518.035
Merge Sort N=16777216 Threads=2 Sec=2.10441 Mops=191.338
Fourier Transform N=4194304 Threads=2 Sec=3.43897 Mflops=134.161
Lorenz 96 N=32768 K=16384 Threads=2 Sec=1.459 Mflops=2207.84
cairosvg pichart.svg -o pichart.png
It's nice to see both cores working together.