NigelJK
Posts: 65
Joined: Wed Sep 05, 2012 1:44 pm

Re: Trying to get an algorithm to run faster on RISC than wh

Fri Apr 12, 2013 7:23 am

Mmm, interesting Windows results. Always figured .NET was a lash-up.
Have you tried the same using VB6? That compiles down to machine code (not pseudo-code, as per anything .NET-based - hence the interesting results, as both VB.NET and C#.NET should compile to the same pseudo-code).

dr_d_gee
Posts: 84
Joined: Fri Jan 04, 2013 1:30 pm

Re: Trying to get an algorithm to run faster on RISC than wh

Fri Apr 12, 2013 1:46 pm

My figures weren't with optimisation turned on. If I ask VC++ to optimise for speed, the time comes down to 0.36 cs! A bigger difference than you would expect optimisation to make. There's a lesson there, I'm sure...

FWIW, the i5 is clocked at 2.67GHz. It is surprising, though, what a difference assembler can make in the case of the RPi.

PocketSized
Posts: 13
Joined: Sat Sep 15, 2012 10:36 am
Location: Norway

Re: Trying to get an algorithm to run faster on RISC than wh

Fri Apr 12, 2013 3:13 pm

Thank you so much for all your help and input! Especially you guys that left me code examples. I'll have a look at them as soon as we get the cluster working properly. :)

We have a "side competition" now with two classmates who are working with a Zedboard. With 30 pieces they get ~27s and we are down to ~34s with max overclocking on our Pi, both running on the Linux kernel. There are no rules, so I really hope I can beat them with BASIC or assembly. But for now we have to get the cluster working properly before we can play around... ;)

This is the best toy ever!
It's here!

AMcS
Posts: 184
Joined: Sun Jan 06, 2013 11:23 am
Location: Dublin, Ireland

Re: Trying to get an algorithm to run faster on RISC than wh

Fri Apr 12, 2013 8:14 pm

dr_d_gee wrote:My figures weren't with optimisation turned on. If I ask VC++ to optimise for speed, the time comes down to 0.36 cs! A bigger difference than you would expect optimisation to make. There's a lesson there, I'm sure...

FWIW, the I5 is clocked at 2.67GHz. It is surprising though what a difference assembler can make in the case of the RPi.
In Visual C++, going from 93.5 centiseconds down to 0.36 just by turning optimisation on is stunning - that's a difference of 259 times... probably bigger than the difference you'd see between the same algorithm written in an interpreted language and in a compiled one.

I can only surmise (yes a guess...) that some sort of code re-writing by the compiler is going on - so that what you've compiled may not actually be what the source suggests.... but still gives the same results only much faster.

Regarding the ARM code sample by Gavin W - the ARM code is a faithful implementation even down to the recursive calling and gets speeds as low as 1 centisecond - which is remarkable. It suggests that hand crafted ARM code on RISC OS on a Pi can (in some instances) perform at a level close to that of a much faster x64 PC running compiled code.

I know it's NOT comparing like with like (so pinch of salt is in order here...) - but your PC is clocked 3.8 times faster than Pi - yet the optimised VC++ on the PC was only 2.7 times faster than the ARM code. Or put another way in this *limited* example clock for clock the ARM implementation was 40% faster than the VC++ one.

It's been a long time since I've been able to write something like that :)

AMcS
Posts: 184
Joined: Sun Jan 06, 2013 11:23 am
Location: Dublin, Ireland

Re: Trying to get an algorithm to run faster on RISC than wh

Fri Apr 12, 2013 9:40 pm

NigelJK wrote:mmm interesting windows results. Always figured .Net was a lash up.
Have you tried the same using VB6 as this compiles down to machine code (not pseudo-code as per anything .Net based, hence the interesting results as both vb.net and c#.net should compile to the same pseudo code).
I would have originally been sceptical about .NET too, but in practice you get reasonable performance and considerable convenience from it. I'd also, with the usual admonitions of not comparing like-with-like being duly acknowledged, point out that the VB.NET (in the previous benchmarks) was 69 times faster than the compiled BBC BASIC for Windows on the same machine.

And, as seems to be the practice here, I've translated one of the earlier Hanoi programs into C#; run on an i7 920 (@2.66GHz, 24GB, Windows 8 Pro x64) it averages 12.4 centiseconds per iteration - which is pretty fast (still slower than ARM assembly code on a Pi - weird or what :D )

Burngate
Posts: 6100
Joined: Thu Sep 29, 2011 4:34 pm
Location: Berkshire UK Tralfamadore
Contact: Website

Re: Trying to get an algorithm to run faster on RISC than wh

Sat Apr 13, 2013 9:14 am

So I'm probably wrong in what I'm about to say but ....
I took the RISC OS BASIC version from above, and added a couple of lines to count and print the number of times the function was called.

Code:

.
FOR I% = 1 TO Loops%
  startTime% = TIME
  E%=0
  PROCHanoi (Pieces%, &A, &B, &C)
.
.
PRINT "Average ", Total%/Loops%, " centiseconds", "count ",E%
END

DEF PROCHanoi(n%,a%,b%,c%)
  E%+=1
.
The answer was 33554431 - no surprises here

I did the same with the BBC BASIC assembler version - that took a bit more fudging, to get the count returned, but I don't think it's changed anything material

Code:

.
A% = 0: Loops% = 1: Total% = 0
B% = 1:C% = 2:D% = 3:E% = 25
PRINT "Starting ..."
FOR I% = 1 TO Loops%
  startTime% = TIME
REM  CALL code%
  count%=USR(code%)
.
.
  .entry%
  ADC R0,R0,#1                      ;increment loop count
  STMFD R13!, {R1-R4,R14}    ;save to stack
  CMP R4,#1                          ;check pieces
  LDMLEFD R13!, {R1-R4,PC} ;if 1 piece or less return
  SUB R4,R4,#1                     ;dec. pieces
  BL entry%                          ;go back in
  SUB R4,R4,#1                     ;dec. pieces
  BL entry%                          ;go back in
  LDMFD R13!, {R1-R4,PC}    ;return
What's odd is that it comes up with only 439202

I haven't looked at the other versions yet, but something odd is going on.

AMcS
Posts: 184
Joined: Sun Jan 06, 2013 11:23 am
Location: Dublin, Ireland

Re: Trying to get an algorithm to run faster on RISC than wh

Sat Apr 13, 2013 10:24 am

Burngate wrote:So I'm probably wrong in what I'm about to say but ....
I took the RISC OS BASIC version from above, and added a couple of lines to count and print the number of times the function was called.

<snip>

The answer was 33554431 - no surprises here
Agreed - the count should be 2^n - 1 (where n is the number of pieces), which for 25 pieces gives 33554431, as you've said

Burngate wrote:I did the same with the BBC BASIC assembler version - that took a bit more fudging, to get the count returned, but I don't think it's changed anything material
What's odd is that it comes up with only 439202
<snip>
I haven't looked at the other versions yet, but something odd is going on.
Converting that to the number of pieces used I make that the equivalent of 18.744... (*)

Weird - or what ! (I'll say that again even - Weird or what !).

As you've made some mods to the code I'll look at Gavin W's original and stick something in to output the iteration count and see what I find...



(*) that's log(439202+1)/log(2) - converts it to the number of pieces

AMcS
Posts: 184
Joined: Sun Jan 06, 2013 11:23 am
Location: Dublin, Ireland

Re: Trying to get an algorithm to run faster on RISC than wh

Sat Apr 13, 2013 1:58 pm

Burngate wrote:I did the same with the BBC BASIC assembler version - that took a bit more fudging, to get the count returned, but I don't think it's changed anything material

Code:

.
A% = 0: Loops% = 1: Total% = 0
B% = 1:C% = 2:D% = 3:E% = 25
PRINT "Starting ..."
FOR I% = 1 TO Loops%
  startTime% = TIME
REM  CALL code%
  count%=USR(code%)
.
.
  .entry%
  ADC R0,R0,#1                      ;increment loop count
  STMFD R13!, {R1-R4,R14}    ;save to stack
  CMP R4,#1                          ;check pieces
  LDMLEFD R13!, {R1-R4,PC} ;if 1 piece or less return
  SUB R4,R4,#1                     ;dec. pieces
  BL entry%                          ;go back in
  SUB R4,R4,#1                     ;dec. pieces
  BL entry%                          ;go back in
  LDMFD R13!, {R1-R4,PC}    ;return
What's odd is that it comes up with only 439202

I haven't looked at the other versions yet, but something odd is going on.
With suitable qualifications in place (i.e., I might be wrong in stating this - so if you know better please let me know!) - it appears that a slightly altered version of the original program doesn't complete in 2^n - 1 calls.

The modified code is as follows. I've added a bit to Gavin's that outputs the actual iteration count (held in R4); this is incremented on each call (or recursive call) of the original routine. I've noted the lines in the program that differ from the original.

Code:

REM Hanoi in Basic and Assembler
REM GCW 11/04/2013
ON ERROR PRINT REPORT$;" at line ";ERL:END

PROCDebug : REM [AMcS] New Line - Assemble Debugging code

code% = FNasm
A% = 25: Loops% = 1: Total% = 0
B% = 1:C% = 2:D% = 3

E%=0   : REM [AMcS] New Line - To initialise register 4 to zero
PRINT "Starting ..."
FOR I% = 1 TO Loops%
  startTime% = TIME
  CALL code%
  endTime% = TIME
  Total% += endTime% - startTime%
NEXT
PRINT "Average ", Total%/Loops%, " centiseconds"
END



DEF FNasm
  DIM P% 60
  [ OPT 2
  .entry%
  STMFD R13!, {R0-R3,R14}
  ADD R4,R4,#1     \\New Line - Increment number of times Hanoi ASM entry is called
     BL debugEntry  \\New Line - Call new Debug entry point to output R4 (Count of calls) 
  CMP R0,#1
  LDMLEFD R13!, {R0-R3,PC}
  SUB R0,R0,#1
  EOR R2,R2,R3
  EOR R3,R3,R2
  EOR R2,R2,R3
  BL entry%
  SUB R0,R0,#1
  EOR R1,R1,R2
  EOR R2,R2,R1
  EOR R1,R1,R2
  BL entry%
  LDMFD R13!, {R0-R3,PC}
  ]
  = entry%



REM Debug ASM code added by AMcS

DEF PROCDebug
DIM DebugCode% 256
P%=DebugCode%
[OPT 2
 .buffer
  EQUS STRING$(16," ")
  ALIGN


 .debugEntry
  STMFD R13!,{R0-R4,R14}
    ADR R1,buffer
    MOV R2,#16
    MOV R0,R4
    SWI "OS_ConvertInteger4"
    SWI "OS_Write0"
    SWI "OS_NewLine"
  LDMFD R13!,{R0-R4,R14}
    MOV PC,R14
]
ENDPROC
When I call this (with 25 disks - that is, A%=25) the program consistently finishes at 242,785 iterations (@Burngate - this differs from the result you obtained, but I can't immediately tell whether that's because of an error introduced in the code I've altered or because of the changes you made).

Varying the number of disks gives different results - as far as I can tell only 2 disks is correct (takes 3 iterations) - everything after that appears off.

By changing A% (which I think sets the number of disks):

With A% = 4 I get 9 iterations (not 15, i.e. 2^4 - 1)
With A% = 8 I get 67 iterations (not 255, i.e. 2^8 - 1)
With A% = 16 I get 3193 iterations (not 65535, i.e. 2^16 - 1)

Burngate
Posts: 6100
Joined: Thu Sep 29, 2011 4:34 pm
Location: Berkshire UK Tralfamadore
Contact: Website

Re: Trying to get an algorithm to run faster on RISC than wh

Sat Apr 13, 2013 3:29 pm

I'm thinking it's something to do with the decrementing of the number of discs.
Removing the second SUB R4,R4,#1 gives me 2^26-2, which is only out by a factor of 2!

GavinW
Posts: 90
Joined: Tue Nov 01, 2011 8:11 pm
Location: UK
Contact: Website

Re: Trying to get an algorithm to run faster on RISC than wh

Sat Apr 13, 2013 3:49 pm

Sorry folks. It is indeed that second SUB R0,R0,#1 which is the bug. With it removed I now get 93 centiseconds. My iteration-reporting version gives the right number of iterations. I changed debugEntry to show R0's value, too.

Code:

 .debugEntry
STMFD R13!,{R0-R5,R14}
MOV R5,R0
ADR R1,buffer
MOV R2,#16
MOV R0,R4
SWI "OS_ConvertInteger4"
SWI "OS_Write0"
MOV R0,#32
SWI "OS_WriteC"
MOV R0,R5
SWI "OS_ConvertInteger4"
SWI "OS_Write0"
SWI "OS_NewLine"
LDMFD R13!,{R0-R5,PC}
otium negare negotium vanum

pygmy_giant
Posts: 1562
Joined: Sun Mar 04, 2012 12:49 am

Re: Trying to get an algorithm to run faster on RISC than wh

Sat Apr 13, 2013 4:21 pm

Still damn quick - perhaps wrapping it up in C rather than BASIC would make it quicker ... or you could use DexBasic and run it on the bare metal(!) I doubt you would get it any quicker than that.

Burngate
Posts: 6100
Joined: Thu Sep 29, 2011 4:34 pm
Location: Berkshire UK Tralfamadore
Contact: Website

Re: Trying to get an algorithm to run faster on RISC than wh

Sat Apr 13, 2013 4:22 pm

Great. So I wasn't going too bananas. Thanks

AMcS
Posts: 184
Joined: Sun Jan 06, 2013 11:23 am
Location: Dublin, Ireland

Re: Trying to get an algorithm to run faster on RISC than wh

Sat Apr 13, 2013 5:59 pm

pygmy_giant wrote:Still damn quick - perhaps wrapping it up in C rather than BASIC would make it quicker ... or you could use DexBasic and run it on the bare metal(!) I doubt you would get it any quicker than that.
You're right - it's probably as fast as it can get. The BASIC lines are only used to initialise the registers, reserve space and assemble the code (all pretty quick); the FOR...NEXT loop runs only once, calls the machine code once, and prints the timing results afterwards. The bulk of the 33-million-plus iterations is done by ARM code, and that is what determines the speed.

Short of doing something "magical" with that ARM code (I doubt it's possible), 93 centiseconds is the speed that'll stand as fastest - pretty good, isn't it!

Thanks very much to GavinW for the code and to Burngate for being on the ball, eh!

GavinW
Posts: 90
Joined: Tue Nov 01, 2011 8:11 pm
Location: UK
Contact: Website

Re: Trying to get an algorithm to run faster on RISC than wh

Sat Apr 13, 2013 8:40 pm

The ARM instruction set is seductive. In the days of the Archimedes, applications tended to be written in assembler; C came later. It can also be a trap. Even short bits of assembly code can be hard to debug (vide supra);). For larger projects debugging and maintenance can be a nightmare. How many enthusiastic programmers have spent their youth and energy on assembler, only to find that in the fullness of time their projects die of neglect because nobody but themselves could maintain them properly! I have seen this so often. In the early days memory was expensive and there were not so many high level programming languages available for RISC OS, so the temptation to use speedy assembler was greater than today. Nevertheless, these warnings may still be needed for those who fall in love with the ARM.
otium negare negotium vanum

pygmy_giant
Posts: 1562
Joined: Sun Mar 04, 2012 12:49 am

Re: Trying to get an algorithm to run faster on RISC than wh

Sat Apr 13, 2013 10:03 pm

93cs for ARM/BASIC
160cs for Norcroft C

AMcS
Posts: 184
Joined: Sun Jan 06, 2013 11:23 am
Location: Dublin, Ireland

Re: Trying to get an algorithm to run faster on RISC than wh

Sat Apr 13, 2013 10:25 pm

GavinW wrote:The ARM instruction set is seductive. In the days of the Archimedes, applications tended to be written in assembler; C came later. It can also be a trap. Even short bits of assembly code can be hard to debug (vide supra);). For larger projects debugging and maintenance can be a nightmare. How many enthusiastic programmers have spent their youth and energy on assembler, only to find that in the fullness of time their projects die of neglect because nobody but themselves could maintain them properly! I have seen this so often. In the early days memory was expensive and there were not so many high level programming languages available for RISC OS, so the temptation to use speedy assembler was greater than today. Nevertheless, these warnings may still be needed for those who fall in love with the ARM.
I'd largely accept that, but would, with due caution, point out that only one language on the Raspberry Pi allowed it to better the performance of a fairly modern PC (i5) clocked nearly four times faster than the Pi - and that was ARM assembler. The lure is speed; the reason RISC OS is competitive, and actually looks good and performs well on the Pi, is that it is largely assembler and fairly frugal.

Would we even be having this discussion (or would RISC OS even be on Pi) if that were not the case?

The issues of debugging and maintenance could, I believe, be addressed with appropriate tools and methods. For example, the problem with the ARM Hanoi program could probably have been found more easily if there were a convenient way to step through the code, examine/change register values and trace execution. If we leveraged the computer to do the work, fixing ARM code (or indeed ANY language) becomes more practical. [Yes, I know we could insert breakpoints with *BREAKSET - but in a program of any length that would get tedious... there has to be an easier way.]

The issue of ongoing maintenance exists in all software (I will admit it is a greater problem in assembler), but again I feel it comes down to a disciplined approach and the right tools.

One weakness you highlighted - projects being abandoned - happens all over the place (not just with ARM code). With a greater degree of open sourcing and community code development there would probably be fewer examples of an application being the sole responsibility of one person; there would be more co-ordination, documentation and support.

I know these are things to aim for rather than something that's likely to be achieved - but it would be nice if we could, wouldn't it (all IMHO of course)?

AMcS
Posts: 184
Joined: Sun Jan 06, 2013 11:23 am
Location: Dublin, Ireland

Re: Trying to get an algorithm to run faster on RISC than wh

Sat Apr 13, 2013 10:43 pm

pygmy_giant wrote:93cs for ARM/BASIC
160cs for Norcroft C
Or un-optimised Visual C++ on an i5 clocked 3.8 times faster than a Pi, with faster RAM and a larger cache, still managing only 93.5 centiseconds (slightly slower than the ARM code).

Gavin is right to point out the difficulties, but who says that we can't, with better tools and programming discipline, continue to develop quality ARM code (at least for the speed-critical bits) so that RISC OS remains competitive on ARM?

As I said in reply to Gavin - would we be having this conversation, would RISC OS be on the Pi at all, if it were coded fully in C? Would it hold its appeal if it were a not-particularly-fast "also-ran"? The Linux OSes are good and beat RISC OS in a great many areas - without speed and responsiveness on our side, RISC OS would probably have been consigned to the "dustbin of history" (hey, I am allowed limited hyperbole from time to time... :D ).

When the ARM first came out, code on RISC OS was at least twice (and as much as 15 times) faster than the competition. Newer ARM processors are upping performance (the Samsung Exynos 5, for example). If a humble ARM11 clocked nearly four times slower than a PC can, in some circumstances, be competitive, what could we do with a 1.7GHz processor and megabytes of cache? That's the prize, and that's why it's important not to lose ARM coding skills or be too quick to abandon them as a means of getting an "edge".

The weaknesses Gavin has identified are still there - but we need to manage them.

GavinW
Posts: 90
Joined: Tue Nov 01, 2011 8:11 pm
Location: UK
Contact: Website

Re: Trying to get an algorithm to run faster on RISC than wh

Sun Apr 14, 2013 9:19 am

I agree with you. But there is another aspect to the divide between low and high level programming languages that needs to be taken into account. Programming languages are not just for humans to talk to computers. They are important for humans to talk to other humans and, more particularly, to themselves; that is to say, for thinking and for conceiving algorithms. Of course, fundamentally, computers only understand machine code. Assembler mnemonics, variables, functions, types, scope, environments, objects, inheritance, interfaces, functors, ... these are all tools, which have taken decades to evolve, that let us translate abstract ideas into machine code. If we do not also learn higher level programming languages (but stick with assembler) we never get the chance to encounter this evolving heritage of ideas. But I may well be preaching to the converted :).
otium negare negotium vanum

dr_d_gee
Posts: 84
Joined: Fri Jan 04, 2013 1:30 pm

Re: Trying to get an algorithm to run faster on RISC than wh

Sun Apr 14, 2013 11:14 am

Yes, I've been wondering about optimisation in this particular case. I think there may be an issue in that the function/procedure does nothing other than call itself recursively. A "clever" optimiser might spot this and optimise all the calls away. I'll try adding code to count the number of recursive calls made and see what difference it makes.

On the PC, the time taken is now 12 cs. Now to try the Norcroft version in these circumstances.

dr_d_gee
Posts: 84
Joined: Fri Jan 04, 2013 1:30 pm

Re: Trying to get an algorithm to run faster on RISC than wh

Sun Apr 14, 2013 2:30 pm

When adding the count feature in on the other platforms open to me, the results are:

Norcroft 193cs (was 160 without count)

On Raspbian:
gcc - unoptimised - 253cs
gcc - optimised (-O2) - 100cs

I don't know where to find (or if there is) an optimisation setting for Norcroft, but these figures convince me that the VC++ optimiser was probably *not* carrying out the code as intended. The Raspbian figure is using a terminal window with the LXDE desktop running, but no other apps (with Midori running, the time goes up about 50%). That said, on RISC OS you can't do anything else while the code is running...

AMcS
Posts: 184
Joined: Sun Jan 06, 2013 11:23 am
Location: Dublin, Ireland

Re: Trying to get an algorithm to run faster on RISC than wh

Mon Apr 15, 2013 12:59 am

dr_d_gee wrote:When adding the count feature in on the other platforms open to me, the results are:

Norcroft 193cs (was 160 without count)

On Raspbian:
gcc - unoptimised - 253cs
gcc - optimised (-O2) - 100cs

I don't know where to find (or if there is) an optimisation setting for Norcroft, but these figures convince me that the VC++ optimiser was probably *not* carrying out the code as intended.
Agreed - I'd be very surprised at VC++ achieving a 250x+ speed improvement without some radical re-arrangement of the output (compared to the source).

Overall very interesting results - are the gcc ones with/without count ? (in effect - do I compare them against the Norcroft 160 - with "No Count" or the 193 Centisecond "With Count" ?).

I did take a very quick look (on Google) for Acorn C/Norcroft compiler switches/options, but couldn't find anything obvious.

AMcS
Posts: 184
Joined: Sun Jan 06, 2013 11:23 am
Location: Dublin, Ireland

Re: Trying to get an algorithm to run faster on RISC than wh

Mon Apr 15, 2013 1:28 am

GavinW wrote:I agree with you. But there is another aspect to the divide between low and high level programming languages that needs to be taken into account. Programming languages are not just for humans to talk to computers. They are important for humans to talk to other humans and, more particularly, to themselves; that is to say, for thinking and for conceiving algorithms.
That's actually a very good point - but such communication doesn't necessarily have to be conducted in a "computing" language, high- or low-level, at all. Structured English, pseudocode or some other agreed formal notation would achieve the same aim. The programming language (high or low level) comes later.
GavinW wrote:Of course, fundamentally, computers only understand machine code. Assembler mnemonics, variables, functions, types, scope, environments, objects, inheritance, interfaces, functors, ... these are all tools, which have taken decades to evolve, that let us translate abstract ideas into machine code. If we do not also learn higher level programming languages (but stick with assembler) we never get the chance to encounter this evolving heritage of ideas. But I may well be preaching to the converted :).
Yes, as it happens you are :)

NigelJK
Posts: 65
Joined: Wed Sep 05, 2012 1:44 pm

Re: Trying to get an algorithm to run faster on RISC than wh

Mon Apr 15, 2013 7:52 am

Just to add some spice to the mix: the ARM uses about 2 watts of power, while my current AMD 64-bit CPU is rated at a (low) 60 watts! i7s (even the eco versions) start at around 45 watts and go up to 130 watts.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 24149
Joined: Sat Jul 30, 2011 7:41 pm

Re: Trying to get an algorithm to run faster on RISC than wh

Mon Apr 15, 2013 9:10 am

AMcS wrote:
dr_d_gee wrote:When adding the count feature in on the other platforms open to me, the results are:

Norcroft 193cs (was 160 without count)

On Raspbian:
gcc - unoptimised - 253cs
gcc - optimised (-O2) - 100cs

I don't know where to find (or if there is) an optimisation setting for Norcroft, but these figures convince me that the VC++ optimiser was probably *not* carrying out the code as intended.
Agreed - I'd be very surprised at VC++ having an x250+ speed improvement without some radical re-arrangement of the output (compared to the source) being done.

Overall very interesting results - are the gcc ones with/without count ? (in effect - do I compare them against the Norcroft 160 - with "No Count" or the 193 Centisecond "With Count" ?).

I did take a very quick look (on Google) for Acorn C/Norcroft compiler switches/options, but couldn't find anything obvious.
That level of speed increase is more likely due to improved memory caching than to a simple change in the code. Keeping stuff in L1 or L2 cache vs fetching from main memory can make a colossal difference, especially when the code is very memory-intensive. This also makes direct code comparisons very difficult, as it's complicated to predict what will and won't be in cache, especially in a multithreaded system.

As to use of Assembler vs high level languages - we've had many arguments over this in other threads. I'm of the opinion that compilers nowadays are so good that only a very limited subset of engineers can actually write more efficient code than a compiler, and when they do, that code is almost incomprehensible to anyone else. Take a look at optimised output from a compiler - it's really difficult to see what is going on. So assembler should only be used where absolutely necessary, and those circumstances are very rare indeed.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed. Here's an example...
“I think it’s wrong that only one company makes the game Monopoly.” – Steven Wright

dr_d_gee
Posts: 84
Joined: Fri Jan 04, 2013 1:30 pm

Re: Trying to get an algorithm to run faster on RISC than wh

Mon Apr 15, 2013 1:46 pm

Just to clarify, the gcc results are *with* the count enabled. So the unoptimised code is slower than Norcroft/RISC OS, while the optimised code is faster (almost double the speed).
