scologic
Posts: 61
Joined: Mon Sep 12, 2011 8:05 am
Contact: Website

Re: Floating point performance?

Mon Sep 26, 2011 10:31 pm

Sorry for asking something that may seem a bit dumb, but now that floating point is working, have you run a desktop on the board? Does it perform a lot better, and if so, does the floating point help with browsing, Java, or video playback?
Having worked with the Sheeva, which has no FP, we always thought that Epiphany or Iceweasel would work better if the processor had FP. If that's the case, are we looking at better video playback in the browser? (We had SWF videos playing fine on the Sheeva.)

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23924
Joined: Sat Jul 30, 2011 7:41 pm

Re: Floating point performance?

Tue Sep 27, 2011 8:22 am

The problem is that I would need to recompile all the desktop source and libraries with the -mfloat-abi=softfp flag enabled, which is quite a big job that I don't have the time for. If someone with a Windows emulator going fancies rebuilding it all, I can certainly try it out.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed. Here's an example...
“I think it’s wrong that only one company makes the game Monopoly.” – Steven Wright

vladhed
Posts: 35
Joined: Fri Aug 05, 2011 3:04 pm

Re: Floating point performance?

Tue Sep 27, 2011 7:04 pm

Quote from jamesh on September 25, 2011, 18:41
C Converted Double Precision Whetstones: 41.7 MIPS


For comparison, my AMD K6/166 Linux box that I bought in 1997 ran the same test and returned 69.444 MIPS.

rmike
Posts: 41
Joined: Mon Aug 22, 2011 10:50 am

Re: Floating point performance?

Tue Sep 27, 2011 8:02 pm

Hi,
Quote from vladhed
my AMD K6/166 Linux box that I bought in 1997 ran that same test and returned with 69.444 MIPS

I hope there is still something wrong with the setup for the HW FPU usage, because according to this site: http://www.roylongbottom.org.u.....tstone.htm even a Pentium @ 75 MHz shows more performance.

Michael

Svartalf
Posts: 596
Joined: Fri Jul 29, 2011 6:50 pm

Re: Floating point performance?

Tue Sep 27, 2011 9:30 pm

It should be noted that ARM's hardware FP capabilities, while better than software FP, aren't anything to write home about until you talk about NEON on the A9/A15. They're pretty weak compared to Intel and AMD, who have put quite a bit more effort into FP operations than ARM had until recently.

Michael
Posts: 340
Joined: Sat Jul 30, 2011 6:05 pm

Re: Floating point performance?

Tue Sep 27, 2011 9:44 pm

Quote from rmike on September 27, 2011, 21:02
i hope there is still something wrong with the setup for the HW FPU usage

Well this is probably correct for 'softfp'. What would be interesting is seeing the same benchmarks when the stack is compiled 'hardfp'.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23924
Joined: Sat Jul 30, 2011 7:41 pm

Re: Floating point performance?

Wed Sep 28, 2011 8:03 am

The question is: can I just build the app with hardfp, or do I need to recompile all the C libraries as well? I'm thinking the latter, which is a PITA, and I don't have time to do it.

Also, can any ARM compiler-flag experts see any issues with the build I am doing? Can it be improved?

scologic
Posts: 61
Joined: Mon Sep 12, 2011 8:05 am
Contact: Website

Re: Floating point performance?

Wed Sep 28, 2011 10:46 am

I'm going to give the usual response to all this: I'd love to help, and get John, my Linux guru, to work on this too, but we don't have a board to play with, so we have no chance to optimise.

Michael
Posts: 340
Joined: Sat Jul 30, 2011 6:05 pm

Re: Floating point performance?

Wed Sep 28, 2011 7:19 pm

Quote from jamesh on September 28, 2011, 09:03
The question is - can I just build the app with hardfp, or do I need to recompile all the C libraries as well? I'm thinking the latter. Which is a PITA and I don't have time to do it.


I believe *everything* has to be compiled hardfp, including the kernel. The ABI for hardfp and softfp is different.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23924
Joined: Sat Jul 30, 2011 7:41 pm

Re: Floating point performance?

Wed Sep 28, 2011 7:32 pm

Quote from Michael on September 28, 2011, 20:19
I believe *everything* has to be compiled hardfp, including the kernel. The ABI for hardfp and softfp is different.


I'm sure the C libs would all need recompiling hardfp, but I am not sure about the kernel: does it use any floating point at all, or need to provide services to hardfp programs?

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5341
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Floating point performance?

Wed Sep 28, 2011 8:19 pm

The key option is -mfpu=vfp, but you get that as standard if you use the cross-compiler (bcm2708-gcc).
That improves the results to:
./whetstone
Loops: 1000, Iterations: 100, Duration: 106 sec.
C Converted Double Precision Whetstones: 94.3 MIPS

but most of the time in this test is spent in libm (sin/cos/atan/log/exp/sqrt), which is built without -mfpu=vfp. Use the libm from the toolchain's sys-root:
LD_LIBRARY_PATH=/home/dc4/sys-root ./whetstone
Loops: 1000, Iterations: 100, Duration: 15 sec.
C Converted Double Precision Whetstones: 666.7 MIPS

better...

vladhed
Posts: 35
Joined: Fri Aug 05, 2011 3:04 pm

Re: Floating point performance?

Thu Sep 29, 2011 12:17 am

Very good - now we are into the range of a P3 running at a similar clock speed. More than enough horsepower to efficiently run bloat-free tools.

kme
Posts: 448
Joined: Sun Sep 04, 2011 9:37 am

Re: Floating point performance?

Thu Sep 29, 2011 1:50 am

This makes me think: is anyone working on recompiling Debian specifically for the Raspberry Pi? Debian certainly runs on ARM, but ARM is a mixed bag, and I would not be surprised if Debian tries to cover as many variants as possible and is optimized for none.

This floating point story shows that optimization can be important. Debian as an OS won't benefit much from optimized FP, but applications may. Similarly, the Raspberry Pi has some graphical capabilities that are non-standard, and there may be something to gain from recompiling X for this particular platform.

A Raspberry Pi edition of Debian won't cost anything except brain power and computer power. And if it gives a general 10% performance boost, I think it is worth the effort.

Svartalf
Posts: 596
Joined: Fri Jul 29, 2011 6:50 pm

Re: Floating point performance?

Thu Sep 29, 2011 2:39 am

Quote from jamesh on September 28, 2011, 20:32
I'm sure the C libs would all need recompiling hardfp, but I am not sure about the kernel - does it use any floating point at all? Or need to provide services to programs in hardfp?

The kernel generally avoids doing FP operations. glibc, libstdc++, etc. would have to be recompiled, but the kernel should be "safe" for the purposes of what we're trying to get at.

As for "optimized" distributions, it's not overly hard; you've just got to get past board bring-up, which is where they are now. During this phase it's easier to pull packages from the Debian ARM repositories, which are probably built for ARMv5 since they're armel ABI binaries. We'll just have to see if someone from the OE camp or similar steps up to the plate and pushes out some different, optimized binaries.

Svartalf
Posts: 596
Joined: Fri Jul 29, 2011 6:50 pm

Re: Floating point performance?

Thu Sep 29, 2011 3:09 am

Quote from jamesh on September 28, 2011, 09:03
Also, can any Arm compiler flags experts see any issue with the build I am doing? Can it be improved?


Similar machines are using the following tuning settings for OE...

-march=armv6j -mtune=arm1176jzf-s -mfpu=vfp -mfloat-abi=softfp

That, combined with -O2 or -O3, should produce close to peak results.

rmike
Posts: 41
Joined: Mon Aug 22, 2011 10:50 am

Re: Floating point performance?

Thu Sep 29, 2011 6:45 am

Hi dom,
LD_LIBRARY_PATH=/home/dc4/sys-root ./whetstone
Loops: 1000, Iterations: 100, Duration: 15 sec.
C Converted Double Precision Whetstones: 666.7 MIPS
Excellent! Where did you get a Raspi board from? :D
With these results, really fancy science apps become possible, and not only in the field of astronomy...

Michael

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23924
Joined: Sat Jul 30, 2011 7:41 pm

Re: Floating point performance?

Thu Sep 29, 2011 7:47 am

Quote from rmike on September 29, 2011, 07:45
excellent! Where did you get a raspi board from? :D

Dom works for Broadcom... and knows everything (lots more than me!)

Thanks for the results, Dom. I knew my results were a bit on the low side, but not why.

david13lt
Posts: 20
Joined: Thu Sep 08, 2011 4:55 am

Re: Floating point performance?

Thu Sep 29, 2011 7:56 am

Quote from Svartalf on September 29, 2011, 04:09
Similar machines are using the following tuning settings for OE...

-march=armv6j -mtune=arm1176jzf-s -mfpu=vfp -mfloat-abi=softfp

That, combined with -O2 or -O3, should produce close to peak results.

We can tweak that in the OE configuration files, I think. That should not be a problem. Also there are TARGET_FPU[_*] variables, which allow different packages to decide how they should be compiled. And for TARGET_CC_ARCH tuning we need to write a proper machine configuration or CPU tuning configuration.

We could write to the Angstrom dev mailing list; maybe they would be interested in bringing support for this board and could contact the R-P people for an alpha board.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23924
Joined: Sat Jul 30, 2011 7:41 pm

Re: Floating point performance?

Thu Sep 29, 2011 8:02 am

Quote from david13lt on September 29, 2011, 08:56
We can tweak that in the OE configuration files, I think. That should not be a problem. Also there are TARGET_FPU[_*] variables, which allow different packages to decide how they should be compiled. And for TARGET_CC_ARCH tuning we need to write a proper machine configuration or CPU tuning configuration.

We could write to the Angstrom dev mailing list; maybe they would be interested in bringing support for this board and could contact the R-P people for an alpha board.

Optimising kernels....

I'm building kernels here; I just don't know where to change the settings or add new config files for particular boards. I did have a peek around the config files, but got a bit lost and ran out of time. It looks like Dom may already have done a lot of this, though. I'll investigate (i.e. ask Dom) and see what the plan is.

lingon
Posts: 119
Joined: Fri Aug 26, 2011 7:31 am

Re: Floating point performance?

Thu Sep 29, 2011 8:06 am

Quote from dom on September 28, 2011, 21:19
The key option is -mfpu=vfp, but you get that as standard if you use the cross-compiler (bcm2708-gcc).

What is the difference between -mfpu=vfp and -mfpu=vfp3?
According to the gcc man page there are quite a few different variants available like vfpv3-fp16, vfpv3-d16, vfpv3-d16-fp16, vfpv3xd, vfpv3xd-fp16. What is actually the optimum value of all these for the Raspberry Pi?

It would be nice to see the LINPACK result with the same compiler flags and libraries as for the optimized Whetstones result.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23924
Joined: Sat Jul 30, 2011 7:41 pm

Re: Floating point performance?

Thu Sep 29, 2011 8:16 am

Quote from lingon on September 29, 2011, 09:06
What is the difference between -mfpu=vfp and -mfpu=vfp3?
According to the gcc man page there are quite a few different variants available like vfpv3-fp16, vfpv3-d16, vfpv3-d16-fp16, vfpv3xd, vfpv3xd-fp16. What is actually the optimum value of all these for the Raspberry Pi?

It would be nice to see the LINPACK result with the same compiler flags and libraries as for the optimized Whetstones result.

I was just about to post that I will redo the LINPACK tests with the correct flags and libraries (I need to get the libraries first; they're on the server somewhere, I guess).

For the differences between the VFP settings (worth a read):

http://www.arm.com/products/pr.....-point.php

also here

http://infocenter.arm.com/help.....HDJJE.html

HenryG
Posts: 28
Joined: Sat Sep 10, 2011 3:15 pm

Re: Floating point performance?

Thu Sep 29, 2011 1:47 pm

Quote from dom on September 28, 2011, 21:19
but most of the time in this test is spent in libm (sin/cos/atan/log/exp/sqrt), which is built without -mfpu=vfp. Use the libm from the toolchain's sys-root:
LD_LIBRARY_PATH=/home/dc4/sys-root ./whetstone
Loops: 1000, Iterations: 100, Duration: 15 sec.
C Converted Double Precision Whetstones: 666.7 MIPS


Maybe the Quake 3 port needs to be recompiled with these options in order to show the Raspberry Pi's true potential?

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23924
Joined: Sat Jul 30, 2011 7:41 pm

Re: Floating point performance?

Thu Sep 29, 2011 1:55 pm

Quote from HenryG on September 29, 2011, 14:47
Maybe the Quake 3 port needs to be recompiled with these options in order to show the Raspberry Pi's true potential?

It probably was, but I can try and check. Most of the hard work is done by the accelerated OpenGL back end on the GPU, but I believe there is some FP in the main code as well.

kme
Posts: 448
Joined: Sun Sep 04, 2011 9:37 am

Re: Floating point performance?

Thu Sep 29, 2011 1:58 pm

@HenryG
Quake III is very light on FP, so that won't matter much. But of course every little bit is welcome.

obarthelemy
Posts: 1399
Joined: Tue Aug 09, 2011 10:53 pm

Re: Floating point performance?

Thu Sep 29, 2011 2:31 pm

The way I understand it, ARM apps can also use the FPU to pass arguments between functions/processes even when no FP is actually involved, to improve performance a bit?
And most ARMv6 apps assume no FP because it's optional, but the Pi generously provides it (quick! use it before "they" cost-optimize it away ^^)
