## MFLOP questions

docteur.blanchard
Posts: 23
Joined: Wed Sep 19, 2012 4:52 pm

### MFLOP questions

Dear all,

I have the Model B with 512 MB of RAM.

I would like to know how to calculate the CPU and GPU FLOPS for the different overclocking settings offered by raspi-config.

Can you help me work out how to calculate this, or does anyone already have this information?

teh_orph
Posts: 346
Joined: Mon Jan 30, 2012 2:09 pm
Location: London

### Re: MFLOP questions

Well, the theoretical way of doing this is to take the clock speed (nominally 700 MHz) and multiply it by the number of FP operations you can do per clock cycle. The RPi has a superscalar FPU and can sustain one instruction per clock for many common ops. Often the operation which "does the most FP ops" is chosen, and in our case it's a scalar multiply-add (FMAC). Multiply-add is two operations.

So issuing one multiply-add instruction per clock cycle gives us 2 * 700000000 = 1400 MFLOPS.
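
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula (clock rate times FP operations per cycle); the function name `peak_mflops` is made up for this example, and the figure of 2 operations per cycle is the assumption stated in this post, not a measured value.

```python
def peak_mflops(clock_hz: float, flops_per_cycle: float) -> float:
    """Theoretical peak in MFLOPS: clock rate (Hz) * FP operations per cycle / 1e6."""
    return clock_hz * flops_per_cycle / 1e6


# Assumption from the post: one multiply-add per cycle, counted as 2 FP operations.
print(peak_mflops(700e6, 2))  # 1400.0
```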

Since the number of FLOPS you get out from a system is normally only useful for theoretical situations (it entirely ignores the rest of the system) you need to be careful when making comparisons with other MFLOP values from different systems.

James, I'm sure, will tell you about the GPU.

EDIT: tough day at work, so please double-check my thinking here!

docteur.blanchard
Posts: 23
Joined: Wed Sep 19, 2012 4:52 pm

### Re: MFLOP questions

Hello Teh_Orph,

If I follow the logic, the more I overclock, the more FLOPS I will get. Am I right?

I thought that your calculation was for MIPS and that MFLOPS was different.
Thanks for your explanation, I'll check.

Any idea about a benchmark tool?
Best regards,
Marc

teh_orph
Posts: 346
Joined: Mon Jan 30, 2012 2:09 pm
Location: London

### Re: MFLOP questions

docteur.blanchard wrote: If I follow the logic, the more I overclock, the more FLOPS I will get. Am I right?
Fo reals dawg. There's a linear relationship between the theoretical amount of FLOPS and the clock speed. However, of course, in the real world it depends on other things.
docteur.blanchard wrote: I thought that your calculation was for MIPS and that MFLOPS was different.
MIPS I suppose would detail the issue rate (how many instructions per second can be run). The RPi's CPU is single-issue, and can therefore only issue one instruction per clock cycle. So 1 * 700 = 700 MIPS.
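
To make the linear scaling concrete, here is a small sketch that tabulates theoretical MIPS and MFLOPS for a set of ARM clock speeds. The preset names and frequencies are an assumption (recalled from the raspi-config overclock menu for the original Pi), so check them against your own board. The per-cycle figures (1 instruction, 2 FP operations) are the ones used in this post, and a later reply in this thread argues the FP figure is too optimistic.

```python
# Assumed raspi-config ARM clock presets in MHz; verify these on your own Pi.
PRESETS_MHZ = {"None": 700, "Modest": 800, "Medium": 900, "High": 950, "Turbo": 1000}


def peak_rates(clock_mhz, instr_per_cycle=1, flops_per_cycle=2):
    """Return (theoretical MIPS, theoretical MFLOPS) for a clock given in MHz."""
    return clock_mhz * instr_per_cycle, clock_mhz * flops_per_cycle


for name, mhz in PRESETS_MHZ.items():
    mips, mflops = peak_rates(mhz)
    print(f"{name:7s} {mhz:5d} MHz  {mips:5d} MIPS  {mflops:5d} MFLOPS (theoretical)")
```

These are upper bounds only; as noted above, real-world results depend on the rest of the system.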

summers
Posts: 63
Joined: Mon Jan 30, 2012 4:27 pm

### Re: MFLOP questions

You could always just check the BogoMIPS (http://en.wikipedia.org/wiki/BogoMips) reported by the kernel ...

``dmesg | grep BogoMIPS``

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 24617
Joined: Sat Jul 30, 2011 7:41 pm

### Re: MFLOP questions

The GPU is difficult to quantify: it has many different processors on it. You could just add up all the values, but since not all the processors can run entirely in parallel, that's not fair (although that seems to be the way some people add them up; for example, some A10-based boards are 2x700 MHz cores, so they are advertised at 1400 MHz...).

But the number 24 GFLOPs rings a bell for the GPU. Which is a big number.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.

tk321
Posts: 31
Joined: Sat Jun 02, 2012 6:09 pm
Location: UK

### Re: MFLOP questions

I believe the theoretical peak performance of the ARM CPU is only 350 MFLOPS double precision at 700 MHz. I'm not sure, but I would guess fused multiply-add is not available on the Raspberry Pi, because its math unit is only VFPv2, and from the ARM documentation:
"The fused multiply-add instructions are only available on NEON or VFP systems that implement the fused multiply-add extension. The VFP system that implements the fused multiply-add extension is VFPv4."
I timed faddd and fmuld (double-precision add and multiply) a while ago and I think it was something like:
• faddd: 8 cycles latency, 2 cycles throughput
• fmuld: 9 cycles latency, 2 cycles throughput
So in the best case it still takes 2 cycles for one operation, which gives 700 MHz / 2 = 350 MFLOPS. In the worst case, where the result of the current operation is required for the next operation (i.e. pipelining can't be used), it takes 8 cycles for one operation and we end up with 700 MHz / 8 = 87.5 MFLOPS.
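
The two bounds can be reproduced with a one-line formula: clock rate divided by cycles per operation. The sketch below is only that arithmetic; the helper name `mflops_from_cycles` is made up for the example, and the cycle counts are the rough timings quoted above for faddd, not authoritative figures.

```python
def mflops_from_cycles(clock_mhz: float, cycles_per_op: float) -> float:
    """MFLOPS when each floating-point operation costs `cycles_per_op` cycles."""
    return clock_mhz / cycles_per_op


# Throughput-bound: independent operations, one result every 2 cycles.
print(mflops_from_cycles(700, 2))  # 350.0
# Latency-bound: each faddd waits 8 cycles for the previous result.
print(mflops_from_cycles(700, 8))  # 87.5
```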

The GPU is impressively fast, but I'd guess the 24 GFLOPS are single precision.

Heater
Posts: 14262
Joined: Tue Jul 17, 2012 3:02 pm

### Re: MFLOP questions

How about you take your favorite benchmark program, run it on the Pi, and see what you get?
Or write your own simple FLOPS test in C?
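
As a rough idea of what such a home-made test looks like, here is a minimal sketch. It is written in Python rather than C, so interpreter overhead dominates and the number it prints will be far below what the hardware can do; an optimized C loop with the same structure would be much closer. The function name `measure_mflops` and the loop constants are arbitrary choices for this illustration.

```python
import time


def measure_mflops(n=1_000_000):
    """Time n dependent multiply-adds; return (achieved MFLOPS, final value)."""
    x = 0.5
    start = time.perf_counter()
    for _ in range(n):
        x = x * 0.999999 + 0.000001  # one multiply and one add = 2 FP operations
    elapsed = time.perf_counter() - start
    # Returning x as well keeps the computed result in use.
    return 2 * n / elapsed / 1e6, x


mflops, _ = measure_mflops()
print(f"{mflops:.2f} MFLOPS (interpreter overhead included)")
```

Because each iteration depends on the previous result, this measures a dependent chain, so even a compiled version would sit nearer the latency-bound figure than the throughput-bound one.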

If you want to speak of GPU FLOPs that's another issue. GPU FLOPs are not available to "normal" programs. If you know how to take advantage of them you also know how to measure the results.

Overclocking does not come into this. As someone above said, there is a linear relation between FLOPS and clock speed. So it's easy to extrapolate, assuming your Pi is reliable when overclocked.

kalyani.barve
Posts: 1
Joined: Sun Sep 10, 2017 5:49 am

### Re: MFLOP questions

I am going to use 4 Raspberry Pis connected in parallel, but how do I measure the speed of the resulting supercomputer?