DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Why Avoid BASIC on RPi?

Sun Dec 02, 2018 2:00 am

I agree that we owe a lot to Berkeley RISC (and it is debatable whether it or MIPS was first; they both claim to be first). There was a great philosophy, one that just needed a slightly more reasonable look at implementation, and that is what ARM, MIPS 4000, etc. brought on board: the commercially successful implementations all had things that made optimizing easier (remember MIPS was also a research project, though it was and still is commercially successful).

With the concepts of both Berkeley RISC and MIPS we gained ARM (I think it owes more to MIPS, looking at the ISAs), a nearly 100% RISC implementation (excepting the LDM/STM pair, which is still done in a RISC-like implementation). Though with ARM it was realized that including a barrel shifter in the pipeline does not increase the propagation delay, and that the delay from looping part of the pipeline for multiplication is acceptable for the time it saves in overall execution (and now ARM multiply is truly single cycle in many implementations). LDM/STM are the only debatable addition that ARM made to a purely RISC implementation, and now these additions are speeding up our code more than ever, as we can use them to take advantage of the wider bus to cache on newer ARM CPUs (128 bits gives 4 registers at once, and still tree decoded, so not CISC [NO microcode]).

I will continue looking at RISC V and where it is going, though I would be surprised if it is the next RISC (though I say the same for AARCH64, which shares almost nothing with the traditional ARM ISA). If something displaces the traditional 32-bit ARM ISA I do not think that it will be either RISC V or AARCH64.

Though this is just my view on the issue.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Why Avoid BASIC on RPi?

Sun Dec 02, 2018 2:07 am

Heater wrote:
Sun Dec 02, 2018 1:12 am
Nothing special about the carry flag. It's the same number range overflow flag for unsigned integers as the "overflow" flag is for signed integers.

Most code never cares about that so why build it into the ISA specification?

It's not just that these flags might be useful, its that if you have them you then have to have other instructions that test them or use them. For example you might then need an ADD that does not care about carry and an ADC that does.

Nope, too complex, skip it. On the rare occasion people want a carry it can be synthesized with other instructions.

Those flags are only cherished by old skool assembler programmers.
I take it that you have never looked at the ARM instruction set. We do not need special instructions to work with flags; instead the tree decoder is divided into two parts, one for the condition code and one for the op (speeding up the decode). You can write an entire ARM program without using either of the special instructions (which are themselves conditional) that discard their results, because conditions are universal in ARM code, and conditions being universal simplifies the decoding of instructions.

Yes I am aware that there are a few newer instructions that use the NV condition and thus are not conditional (like BX). Though they are still done in a way that makes them easily decoded.

Those flags are cherished by compiler writers a lot more than they are by assembly coders. Those flags simplify the life of a compiler writer in optimizing code by so much that they will be missed on RISC V.

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Why Avoid BASIC on RPi?

Sun Dec 02, 2018 2:36 am

@Heater:
To help you a little: ADDCSS is just the ADD opcode, decoded in parallel with the condition Carry Set (CS), with the S suffix enabling the ALU to set the flags.

The upper four bits of the 32-bit instruction word are the condition. The upper four bits of an ARM instruction are always the condition code, so there is no need to special case some instructions. This makes decoding much simpler, keeping the decode tree of transistors to a maximum depth of 4. The next four bits are the opcode, again keeping the decode tree to a depth of 4. A transistor decode tree has propagation delay; the deeper it is, the more delay it has. Now we have added some ops that combine the opcode with the NV condition code to expand the instruction set (I think only BX off the top of my head), though these do not matter.
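As a rough illustration of the fixed field positions described above, here is a minimal Python sketch that splits a classic 32-bit ARM data-processing word into its fields. It assumes the documented data-processing layout (condition in bits 31..28, opcode in bits 24..21, S in bit 20, Rn in bits 19..16, Rd in bits 15..12), which is slightly more detailed than the post's "next four bits" summary. The function name `decode_data_processing` is made up for this example, and the sketch does not check that the word really is a data-processing instruction.

```python
# Mnemonics indexed by the 4-bit condition field (bits 31..28).
COND_NAMES = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"]

# Mnemonics indexed by the 4-bit data-processing opcode (bits 24..21).
OPCODE_NAMES = ["AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
                "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"]


def decode_data_processing(word):
    """Split a 32-bit ARM data-processing word into fixed-position fields.

    Hypothetical helper: it assumes the caller already knows the word is a
    data-processing instruction and does not validate bits 27..26.
    """
    return {
        "cond": COND_NAMES[(word >> 28) & 0xF],
        "opcode": OPCODE_NAMES[(word >> 21) & 0xF],
        "sets_flags": bool((word >> 20) & 1),
        "rn": (word >> 16) & 0xF,
        "rd": (word >> 12) & 0xF,
    }


# 0x20910002 encodes ADDCSS r0, r1, r2 (ADD, condition CS, S bit set).
fields = decode_data_processing(0x20910002)
print(fields)
```

Each field is pulled out with a shift and a mask that do not depend on the other fields, which is the property the post credits with keeping the decode shallow.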

The barrel shift is decoded in yet another separate tree, thus also keeping the complexity down. The register placement is constant across all ops, thus keeping the decode complexity down. The 32-bit ARM ISA is very well designed in that it avoids special case ops; in a way it is more purely RISC than those it evolved out of (excepting the LDM/STM ops).

Then there is the fact that a good optimizing compiler has more opportunities to better optimize the code when all instructions are conditional.
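To make the "all instructions are conditional" point concrete, here is a small Python sketch of how a condition field can be evaluated against the N, Z, C and V flags, using the standard ARM condition table. The helper names are made up for illustration, and the branch-free absolute value at the end only models the two-instruction sequence `CMP r0, #0` / `RSBLT r0, r0, #0`; it is not taken from any code in this thread.

```python
def condition_passed(cond, n, z, c, v):
    """Return True if a 4-bit ARM condition code passes for the given flags."""
    if cond == 0xF:
        # NV: never executes in the original ISA, reused for other encodings later.
        raise ValueError("NV (0xF) is not an ordinary condition")
    checks = {
        0x0: z,                    # EQ
        0x1: not z,                # NE
        0x2: c,                    # CS
        0x3: not c,                # CC
        0x4: n,                    # MI
        0x5: not n,                # PL
        0x6: v,                    # VS
        0x7: not v,                # VC
        0x8: c and not z,          # HI
        0x9: (not c) or z,         # LS
        0xA: n == v,               # GE
        0xB: n != v,               # LT
        0xC: (not z) and n == v,   # GT
        0xD: z or n != v,          # LE
        0xE: True,                 # AL
    }
    return bool(checks[cond])


def abs_with_conditional_rsb(x):
    """Model CMP r0, #0 followed by RSBLT r0, r0, #0 (no branch needed)."""
    # CMP r0, #0: subtracting zero never borrows (C set) and never overflows.
    n, z, c, v = x < 0, x == 0, True, False
    # RSBLT r0, r0, #0: the reverse subtract only takes effect when LT holds.
    if condition_passed(0xB, n, z, c, v):
        x = 0 - x
    return x


print(abs_with_conditional_rsb(-5), abs_with_conditional_rsb(7))
```

A short conditional sequence like this is the kind of opportunity the post has in mind: the compiler can replace a compare-and-branch with a compare and one predicated instruction.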

A note on RAM access speed:
This is in relation to our example. SDRAM is a burst read/write RAM, thus consecutive accesses are quite fast. OK, we are not quite keeping it consecutive yet, though we could by interleaving the two source arrays, with each 128-bit element in RAM belonging to the next array, and that would also decrease the number of registers needed. Though it would mean that the output implementation would also need to know about the interleave. The point being that if you keep the two source arrays then the reads are consecutive, and the writes are to a consecutive array as well. This will let the way the cache controller reads it in and writes it out nearly keep up with the maximum possible speed, thus we should be able to do a 128-bit read or write every 4 instructions without lag (assuming 400MHz RAM and 1200MHz ARM). Though because some ops will overlap in the pipeline it is not that good, not to mention the time lost to VideoCore IV and refresh, so it is more like 8 to 12 instructions for each 128-bit read or write in most situations. Well optimized interleaved instructions can fit even more instructions in the same time, making it as high as 32 instructions (though the example is not going to do much multiple issue, mostly serial operation).
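The interleaved layout mentioned above can be sketched in a few lines of Python. This is only a model of the data layout, assuming each 128-bit element is four 32-bit words; the function name `interleave` and the `words_per_element` parameter are made up for the example and say nothing about actual memory timing.

```python
def interleave(a, b, words_per_element=4):
    """Alternate fixed-size elements of two equal-length word arrays.

    With words_per_element=4 and 32-bit words, each element is 128 bits, so
    the result is laid out as A0 B0 A1 B1 ... in consecutive memory order.
    """
    if len(a) != len(b) or len(a) % words_per_element:
        raise ValueError("arrays must be equal length and a whole number of elements")
    out = []
    for i in range(0, len(a), words_per_element):
        out.extend(a[i:i + words_per_element])  # one 128-bit element from A
        out.extend(b[i:i + words_per_element])  # the matching element from B
    return out


src_a = [1, 2, 3, 4, 5, 6, 7, 8]
src_b = [11, 12, 13, 14, 15, 16, 17, 18]
print(interleave(src_a, src_b))
```

With this layout a single ascending pointer walks both sources, which is why the post notes that fewer registers would be needed, at the cost of the consumer having to know about the interleave.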

jahboater
Posts: 4846
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Sun Dec 02, 2018 10:05 am

In the video, the compressed mode (C extension) was mentioned a lot and compared to the other ISA's.
But they never mentioned the ARM compressed mode thumb2 that appears to be almost the same, and has been available for years.

Still reading the RISCV spec in detail.
Further to the missing flags "feature". It seems the compare instructions just write a boolean value to a destination register, which must subsequently be tested. Hmm.
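For readers wondering what "synthesizing" a carry looks like without a flags register, here is a minimal Python sketch. It models 32-bit registers with a mask and derives the carry from an unsigned compare of the sum against one operand, which is the role an instruction like `sltu` plays. The function names are made up for this example.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers


def add_with_carry_out(a, b):
    """Add two 32-bit values; recover the carry with an unsigned compare."""
    total = (a + b) & MASK32        # add: result wraps at 32 bits
    carry = 1 if total < a else 0   # sltu-style compare: wrapped iff total < a
    return total, carry


def add64(a_lo, a_hi, b_lo, b_hi):
    """64-bit addition built from 32-bit halves, with no flags register."""
    lo, carry = add_with_carry_out(a_lo, b_lo)
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi


print(add_with_carry_out(0xFFFFFFFF, 1))
print(add64(0xFFFFFFFF, 0, 1, 0))
```

The cost is one extra compare and one extra add per word compared with an add/add-with-carry pair, which is the trade-off being argued about in this thread.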

There are some nifty things in the floating-point extensions - the FCLASS instruction is cool (better than the x87 FXAM insn), and the sign injection stuff.
The rounding modes support traditional rounding (as NEON does); the mode is a 3-bit field. But there are no floating-point to floating-point rounding instructions, which is odd. I don't yet understand the point of "NaN-boxing".

The integer ISA seems limited to say the least.

They make a comment that "NaN payloads are optional in the IEEE standard, and therefore are not portable, and therefore cannot be used".
True.
The trouble is that most of the RISC V ISA is optional, and by the same reasoning presumably is non-portable and cannot be used.
I know there are good reasons for the optional stuff (easier to make cheap low power devices). But ARM successfully and completely dominate that market, and have moved away from optional features.
Even integer multiply is optional in RV.

As node sizes get reduced in time (ARM Cortex-A76 is 7nm only!!!!!!!) there is more space available. ARM for example had decent floating point as an option in the past, but now, for the last 7 years, it is guaranteed present which is a very good thing IMO.

Unaligned access seems to be done by multiple aligned accesses and is very slow and discouraged.
Other CPUs have moved away from requiring aligned access. ARMv8, both 32 and 64 bit, happily accepts unaligned access. Intel hardware always has, and importantly, for the last few years there has been no performance penalty at all, not even for unaligned SIMD access.

The five-bit register fields allow 32 registers: 31 general-purpose ones plus x0. Like zr on aarch64, register x0 always contains zero - extremely useful!
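A toy model shows why a hardwired zero register is so handy. The sketch below assumes a 32-bit register width and uses made-up Python helpers (`RegisterFile`, `addi`, `sub`). The pseudo-instruction expansions in the comments (`li`, `mv`, `neg`) are the standard RISC-V ones, with `li` shown only for a small immediate.

```python
class RegisterFile:
    """32 integer registers; x0 is hardwired to zero."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, index):
        return self._regs[index]

    def write(self, index, value):
        if index != 0:  # writes to x0 are silently discarded
            self._regs[index] = value & 0xFFFFFFFF


def addi(regs, rd, rs1, imm):
    """rd = rs1 + imm"""
    regs.write(rd, regs.read(rs1) + imm)


def sub(regs, rd, rs1, rs2):
    """rd = rs1 - rs2"""
    regs.write(rd, regs.read(rs1) - regs.read(rs2))


regs = RegisterFile()
addi(regs, 5, 0, 42)  # li  x5, 42  is  addi x5, x0, 42
addi(regs, 6, 5, 0)   # mv  x6, x5  is  addi x6, x5, 0
sub(regs, 7, 0, 5)    # neg x7, x5  is  sub  x7, x0, x5
addi(regs, 0, 5, 1)   # a write to x0 has no effect
print(regs.read(5), regs.read(6), hex(regs.read(7)), regs.read(0))
```

Load-immediate, move and negate all fall out of two real instructions, so the ISA does not need separate encodings for them.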

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Why Avoid BASIC on RPi?

Sun Dec 02, 2018 3:32 pm

Yes the RISC V ISA is very interesting, quite a bit different from others in a number of ways. The way it does conditional execution is still a little off putting, though I do think that it will have its place, not as a replacement for ARM, though as a useful open source ISA.

I wonder how it would do with an in-order multiple-issue implementation? We already know that experimental implementations of the ARM ISA have shown that in-order multiple issue can outperform out-of-order multiple issue, so long as the code is well optimized (which is actually fairly simple for a compiler). So what about the RISC V ISA? It looks like it could have some potential there.

I am most definitely interested in the potential of the RISC V ISA, and the implementations thereof. I think it is going to be used for many utility type applications, and it will likely stick around for that purpose. It would be nice if the next revision used proper flags, and thus reduced the complexity of the implementation, going even more RISC than they already are.

@Heater:
I am looking forward to the rest of the C code; this is going to be a fun challenge to compare BASIC and C in a very direct manner, with a fairly complex implementation problem.

Also sorry about the code issues of yesterday, I had a bad day (turns out I had had a medical event that I did not know about until late last night).

Heater
Posts: 13924
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sun Dec 02, 2018 7:57 pm

You guys have interesting takes on RISC V and that video. I look at things differently and there are some misconceptions to clear up, so some comments:

Clearly Chris Celio is being a bit playful when he makes the claim that ARM is "CISCy". But it's not without merit: the ARM instruction sets (there are many) are huge compared to RISC V. If you want to see how CISCy they are just print out the instruction set references and weigh them!

Clearly those Berkeley researchers know the various instruction sets well. They have been studying them longer than most people have been alive.

I watched the video again, his comments re: micro-ops are sound. They were mostly to do with Intel's string operations. That ARM LDMIAEQ example is also sound, the number of clocks it takes to execute is "1+Nb (+Pa if PC loaded)" according to the manual, clearly there are "micro-ops" going on there.

The use of -O3 is quite fine and to be expected. Same compiler, same version, same options, across all 6 architectures he looks at. If anything that gives RISC V a disadvantage as GCC support is fairly new and there may be further optimizations that have not been realized yet.

If you want other RISC V presentations just type "risc v" into YouTube.

Berkeley RISC and MIPS are in the same bucket in my mind. Both research projects looking at RISC ideas at about the same time. Both inspired by earlier work in reduced instruction sets at IBM. Their respective project leaders, Hennessy and Patterson, were close collaborators and wrote the book on it together:
https://www.amazon.com/Computer-Organiz ... +Patterson

I think it's very likely that RISC V will displace a lot of ARM in the future. Why would it not? If it's small and low power and equally performant why would anyone choose to pay SoftBank to use ARM? Similar arguments apply all the way up to cloud server and super computer applications, and there are people working on it around the world.

When it comes to status flags, conditional execution etc, I'm going to bet that the likes of Hennessy and Patterson and all their grad students have been looking into such things for decades. Especially with regard to how they play with available compiler technology. Seems they have found none of that is of any benefit. They would not toss out an idea lightly.

DavidS has a lot of interesting comments about instruction decoding, barrel shifters, multiple-dispatch, in order vs out of order, execution, memory access, etc. All of that is in the implementation, build it how you like. RISC V is only an instruction set specification.

I suspect the reason thumb mode was not mentioned is that it does not exist on 64 bit ARM. Also GCC will use thumb instructions if advantageous in 32 bit ARM. If I understand correctly.

Yes, there are a lot of optional parts to the RISC V spec. That does not make things non-portable, just recompile and go. Or make use of undefined instruction traps and do the op in software. Also there is the concept of the "G" variant, which includes all the options one will need for a reasonable general purpose machine running Linux or whatever.

Options are good. It means I can write my own RISC V. It means we can fit them into small spaces. It means companies can add extensions in a clean way for their "secret sauce" hardware.

Yes ARM dominates in some markets. It has not displaced all low end micro-controllers. Now that MCUs are available down to 5 cents a piece, the guys that make them will not want to be paying royalties to SoftBank. Already we are seeing innovative new MCU products coming out that are RISC V based.

Finally, an important point: There will never be a "next version" of RISC V. It's a published standard. It is intended to be fixed in concrete. There will not be a RISC V with redundant features like "proper flags" or conditional execution. Contrary to David's assertion such things have little or no performance benefit and only make implementation more complex.

You are of course free to add extensions to RISC V that have such things. Which would be a good exercise: implement it in Verilog, get it working on an FPGA and report back any performance gains you have made.

Phew... that was a long ramble...
Memory in C++ is a leaky abstraction .

jahboater
Posts: 4846
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Sun Dec 02, 2018 8:37 pm

Heater wrote:
Sun Dec 02, 2018 7:57 pm
That ARM LDMIAEQ example is also sound, the number of clocks it takes to execute is "1+Nb (+Pa if PC loaded)" according to the manual, clearly there are "micro-ops" going on there.
Yes - and that's why it's been removed from aarch64. It's far too slow and doesn't handle interrupts well.
Heater wrote:
Sun Dec 02, 2018 7:57 pm
I think it's very likely that RISC V will displace a lot of ARM in the future. Why would it not? If it's small and low power and equally performant why would anyone choose to pay SoftBank to use ARM? Similar arguments apply all the way up to cloud server and super computer applications, and there are people working on it around the world.
These little ARM controllers must be very cheap. I recently bought a "unicorn hat" (led matrix) and it has a small ARM processor on it just to handle the I2C!! They must be cheap to make that worthwhile.
Heater wrote:
Sun Dec 02, 2018 7:57 pm
When it comes to status flags, conditional execution etc, I'm going to bet that the likes of Hennessy and Patterson and all their grad students have been looking into such things for decades. Especially with regard how they play with available compiler technology. Seems they have found none of that is of any benefit. They would not toss out an idea lightly.
Well I bet it is to reduce the chip complexity, save power and costs. I still don't like it - sorry. It feels like going back to the dark ages. [rant]Similar to GCC inline asm. Before about GCC 6, to make a flag available to the surrounding C code, you had to save it in a register with setcc or worse on ARM. Then, in the surrounding C you had to test the variable and take any action. That's what RISC V has gone back to. Now GCC properly exports flags and they are usable in the C directly, with no intermediate registers! Using the flags directly in C makes everything smaller and faster and generally cleaner[/rant]
Heater wrote:
Sun Dec 02, 2018 7:57 pm
I suspect the reason thumb mode was not mentioned is that it does not exist on 64 bit ARM.
That's what I thought too.
Heater wrote:
Sun Dec 02, 2018 7:57 pm
Also GCC will use thumb instructions if advantageous in 32 bit ARM. If I understand correctly.
Only if you ask it to with "-mthumb".
Heater wrote:
Sun Dec 02, 2018 7:57 pm
Yes, there are a lot of optional parts to the RISC V spec. That does not make things non-portable, just recompile and go.
Yes OK. Recompile it and the compiler will replace all the missing features with slow library functions. I thought we had got past that sort of thing with the x87 :(
Heater wrote:
Sun Dec 02, 2018 7:57 pm
Also there is concept of the "G" variant which includes all the options one will need for a reasonable general purpose machine running Linux or whatever.
Now I like that! RV64G seems the minimum spec to look out for when I buy a RISCV SBC.

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Why Avoid BASIC on RPi?

Sun Dec 02, 2018 9:16 pm

Heater wrote:
Sun Dec 02, 2018 7:57 pm
You guys have interesting takes on RISC V and that video. I look at things differently and there are some misconceptions to clear up, so some comments:

Clearly Chris Celio is being a bit playful when he make the claim that ARM is "CISCy". But it's not without merit, the ARM instruction sets (There are many) are huge compared to RISC V. If you want to see how CISCy they are just print out the instruction set references and weigh them!
LOL, I like your view there.
ARM less than 64 instructions.

Now when you add the standard coprocessors like NEON/VFP and MMU, or the predecoder that is Thumb, then yes it starts getting out of hand. Though just ARM is less than 64 very versatile RISC instructions.

What I have been reading shows RISC V to have nearly as much once you add its coprocessors. The big difference being that RISC V has been smart enough to not attempt to require any of them as standard.
Clearly those Berkeley researcher know the various instruction sets well. They have been studying them longer than most people have been alive.

I watched the video again, his comments re: micro-ops are sound. They were mostly to do with Intel's string operations. That ARM LDMIAEQ example is also sound, the number of clocks it takes to execute is "1+Nb (+Pa if PC loaded)" according to the manual, clearly there are "micro-ops" going on there.

No micro-ops on the ARM, not even for multi-cycle operations like LDMFDEQ (my preferred way of writing the same mnemonic). I laughed at the thought though. The instruction is LDM, so trying to break LDM into a bunch of instructions based on things that are not the instruction I also found funny.

The LDM multi-clock ops are done with a natural state machine (I believe it is implemented as a shift circuit or similar), not as a series of micro-ops (which would be what CISC does).
I do thank you, you definitely have some valid points I agree with. I feel that RISC V is likely to do well in the high end embedded market due to its nature. I seriously doubt that it will ever displace our beloved desktop CPU, the ARM. It is not the kind of arch that is likely to do as well in the desktop market as the ARM has these 30+ years. Now it may indeed displace the low end ARMs that are being abused as embedded or mobile processors; ARM is a desktop CPU so no loss there.

Heater
Posts: 13924
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sun Dec 02, 2018 10:10 pm

DavidS,
LOL, I like your view there.
Why thank you.
ARM less than 64 instructions.
What kind of ARM are you using? The ARMv8 Instruction Set Overview document has about a thousand instructions listed. I gave up counting after 200 or so and realized I was still near the top of the document! https://www.element14.com/community/ser ... Manual.pdf

As far as I understand 64 bit ARM has pruned a lot of old junk and what is left is mandatory.
No microops on the ARM
I guess we now have to debate what you mean by "micro-ops". The meaning from the video was clear.

On Intel the string instructions can work on hundreds, thousands of bytes/words etc. The time it takes to execute depends on the amount of data in question. Clearly there is some kind of loop going on internally, iterating over the data. We can call each piece of work done in an iteration a "micro-op".

Similarly, on ARM the instruction in question takes a time to execute proportional to the data size involved, the number of registers in this case. Clearly there is some kind of loop going on internally, iterating over the data. We can call each piece of work done in an iteration a "micro-op".

The point being made in the video is that you cannot tell how much useful work is being done by simply counting instructions. You have to count the "micro-ops" going on in each instruction.
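A toy count makes the distinction concrete. The sketch below is not a cycle model of any real core: it simply represents two equivalent load sequences as lists of (mnemonic, registers) pairs, a made-up representation for this example, and compares the number of instructions with the number of register transfers.

```python
def summarize(program):
    """Return (instruction count, register transfers) for a toy program.

    Each entry is a (mnemonic, [registers]) pair. One register loaded counts
    as one unit of work, regardless of how many instructions carry it.
    """
    instructions = len(program)
    transfers = sum(len(regs) for _, regs in program)
    return instructions, transfers


# One load-multiple versus four single-register loads of the same registers.
ldm_version = [("LDMIA", ["r0", "r1", "r2", "r3"])]
ldr_version = [("LDR", ["r0"]), ("LDR", ["r1"]), ("LDR", ["r2"]), ("LDR", ["r3"])]

print(summarize(ldm_version))
print(summarize(ldr_version))
```

Both sequences move four registers, so the instruction count alone (1 versus 4) says nothing about the work done. Whatever name is given to the internal steps, the comparison has to be made at that level.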

Perhaps you can explain what you mean by "natural state-machine" vs "series of micro-ops (which would be what CISC does)." ?
It is not the kind of arch that is likely to do as well in the desktop market as the ARM has these 30+ years.
What?!

ARM has no presence on the desktop.

The only desktop presence ARM has had was the Archie. That was not a big thing and soon fizzled out.

ARM has not been a desktop CPU since.

Now, as it happens one of the biggest backers of RISC V is the desktop giant, Intel. Who knows what interest they have in it and what they might have in mind?

Finally, assuming performance parity is reached, why would I care if my desktop machine is running Linux on Intel, ARM or RISC V ? (Issues of openness, company preferences, etc, aside)

Heater
Posts: 13924
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sun Dec 02, 2018 10:54 pm

jahboater,
These little ARM controllers must be very cheap....
I'm sure they are. But I'm talking CHEAP! Like the "3 CENT Micro": https://www.youtube.com/watch?v=Rixo78hv_lw

More generally, if one were a small start up, or department of a bigger company that wanted to make a chip and needed a CPU core design, what is more attractive: 1) Pay up for an ARM licence and wait around for months for the lawyers to sort out the contract. 2) Drop a RISC V in there and get on with it.

The likes of Western Digital, Nvidia and such made that decision. They went RISC V.

These Chinese guys saw the advantage as well:
https://item.taobao.com/item.htm?id=578484113485
https://hackaday.io/project/162174-kend ... windows-10
I bet it is to reduce the chip complexity, save power and costs.
I'm inclined to doubt that is the primary motivation.

There are many examples throughout computing history of the hardware guys coming up with designs that had huge performance benefits on paper but failed because compilers could not make use of what they provided, and nothing like theoretical peak performance was achieved.

Examples: The Intel i432. The Intel i860. The Intel Itanium. Various VLIW designs. Or just look at how ARM has jettisoned piles of junk in moving to their 64 bit architecture.

This can be seen as a kind of "impedance mismatch" between the hardware and software at the instruction set interface.

My take on it is that all these decades of RISC projects at Berkeley and elsewhere are all about minimizing that impedance mismatch. Understanding what hardware and compiler technology can do and mating them together smoothly.

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Why Avoid BASIC on RPi?

Sun Dec 02, 2018 11:05 pm

Heater wrote:
Sun Dec 02, 2018 10:10 pm
DavidS,
LOL, I like your view there.
Why thank you.
ARM less than 64 instructions.
What kind of ARM are you using? The ARMv8 Instruction Set Overview doccument has about a thousand instructions listed. I gave up counting after 200 or so and realized I was still so near the top of the document! https://www.element14.com/community/ser ... Manual.pdf
That is AARCH64, not ARM in the normal sense. It has about as much in common with traditional ARM as traditional ARM has in common with x86.

Though yes the ARM (AARCH32, without coprocessors) instruction set is less than 64 instructions: depending on how you count, either 37 instructions total or 61 instructions.
As far as I understand 64 bit ARM has prunned a lot of old junk and what is left is mandatory.
No microops on the ARM
I guess we now have to debate what you mean by "micro-ops". The meaning from the video was clear.

On Intel the string instructions can work on hundreds, thousands of bytes/words etc. The time it takes to execute depends on the amount of data in question. Clearly there is some kind of loop going on internally, iterating over the data. We can call each piece of work done in an iteration a "micro-op".

Similary On ARM the instruction in question takes a time to execute proportional to the data size involved, the number of registers in this case. Clearly there is some kind of loop going on internally, iterating over the data. We can call each piece of work done in an iteration a "micro-op".
Not similarly, on the x86 it is a microcoded program, on the ARM it is not. Completely different implementations.

A micro-op is a piece of microcode; that has been the definition forever. I do not call a hardware state a micro-op; if I did, each stage of a RISC pipeline would be a micro-op.

And you do know that you can do stack ops on AARCH32 without using the LDM/STM instruction (yes, both are actually the same op/instruction)? The address increment/decrement post/pre forms are available in the normal single register STR/LDR instruction (again, yes, a single instruction).

And the LDM/STM pair is the one and only argument that people can use to make it look like a coded set. All other ARM instructions are purely RISC (ARM instructions, not VFP/NEON or other coprocessor).
The point being made in the video is that you cannot tell how much useful work is being done by simply counting instructions. You have to count the "micro-ops" going on in each instruction.

Perhaps you can explain what you mean by "natural state-machine" vs "series of micro-ops (which would be what CISC does)." ?
I agree that we cannot count the instructions to determine performance; even if all instructions executed in the same number of states on all the test machines, there are still differences in the work done per instruction. For example the use of a barrel shifter in line with the ALU can provide up to 3 times the work of what can be done on another RISC architecture, using instructions that take only one cycle each. So even knowing the total number of stages of execution (what you keep calling micro-ops) would not tell us the useful work; we need the average work per state of operation of the ISA.
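The barrel shifter point can be illustrated with a small Python model of `ADD Rd, Rn, Rm, LSL #shift`, where the second operand is shifted on its way into the ALU. The sketch assumes 32-bit registers and uses made-up function names. It only shows that one instruction yields the same result as a separate shift followed by an add, and makes no claim about the "3 times" figure.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers


def add_shifted(rn, rm, shift):
    """Model ADD Rd, Rn, Rm, LSL #shift: shift and add in one instruction."""
    return (rn + ((rm << shift) & MASK32)) & MASK32


def shift_then_add(rn, rm, shift):
    """The same result as two separate instructions on an ISA without it."""
    tmp = (rm << shift) & MASK32   # shift into a temporary register
    return (rn + tmp) & MASK32     # then add


# Address of element 3 in an array of 4-byte words starting at 0x1000.
base, index = 0x1000, 3
print(hex(add_shifted(base, index, 2)), hex(shift_then_add(base, index, 2)))
```

Array indexing like this is the common case where the combined form saves an instruction and a temporary register.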

A natural state machine just uses HW states that are not advanced by the instruction stream. A micro op is the execution of a micro code instruction on the CPU.
It is not the kind of arch that is likely to do as well in the desktop market as the ARM has these 30+ years.
What?!

ARM has no presence on the desktop.

The only desktop presence ARM has had was the Archie. That was not a big thing and soon fizzeled out.

ARM has not been a desktop CPU since.
Really, what universe do you live? The Archie was first, followed by the A series, followed by the RISC PC, then things started to spread, and we began seeing third party ARM based desktop computers to attempt to compete with the extrememly popular RISC PC (largely used in television in the 1990's even in America, and popular to Americans because of what it could do in the TV studios showing off its power). Many of these third party systems began to fad for a while, then things became even more interesting, as we got the IYONIX (one I never had) as a high end desktop, then came along the smallest desktops with even more power, these being desktop computers built around ARM dev boards like the Beagle Board, then it got extreme when a charity came out with a little lower end ARMv6 (ARM1176) based SBC for an extremely low price in 2012, quickly many of the ARM users around the world rebuilt there existing systems, or built new desktop systems around this SBC.

Since then a great number of ARM-based SBCs have popped up, many of which are being built into desktop computers, some with extraordinary specs. Though currently the best one that is affordable, with specs open enough all around to be useful, is a much more powerful and up-to-date version of that first super-low-cost one from 2012, from the same people.
Now, as it happens one of the biggest backers of RISC V is the desktop giant, Intel. Who knows what interest they have in it and what they might have in mind?
While it is true that Intel did OK in the desktop market back in the 90's and early 2000's, calling them a giant? Especially with the ISA of the x86. I would just say they are a marketing push, not a giant of what is realistic to use.
Finally, assuming performance parity is reached, why would I care if my desktop machine is running Linux on Intel, ARM or RISC V ? (Issues of openness, company preferences, etc, aside)
That question should answer itself. ISA, ISA, ISA, ISA.

ejolson
Posts: 3831
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sun Dec 02, 2018 11:55 pm

Heater wrote:
Sun Dec 02, 2018 10:10 pm
ARM has no presence on the desktop.
A significant number of Chromebooks also use ARM and a surprising number of people use Chromebooks.

jahboater
Posts: 4846
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Mon Dec 03, 2018 10:03 am

Heater wrote:
Sun Dec 02, 2018 10:54 pm
I bet it is to reduce the chip complexity, save power and costs.
I'm inclined to doubt that is the primary motivation.
The only other reason I can think of is that the flags are an extra register that instructions may depend on.
Worse - if an instruction doesn't update all the flags in the register, then we have a "partial register stall". This makes things really complicated as the current instruction has to wait for the previous instructions that set the other flags to complete.

Intel had this problem with the INC instruction. It did not set the carry flag. And for years people recommended using "add reg,1" instead because it set all the flags. Of course they have overcome that now, probably by throwing lots of hardware at it.

User avatar
DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Why Avoid BASIC on RPi?

Mon Dec 03, 2018 1:46 pm

jahboater wrote:
Mon Dec 03, 2018 10:03 am
Heater wrote:
Sun Dec 02, 2018 10:54 pm
I bet it is to reduce the chip complexity, save power and costs.
I'm inclined to doubt that is the primary motivation.
The only other reason I can think of is that the flags are an extra register that instructions may depend on.
Worse - if an instruction doesn't update all the flags in the register, then we have a "partial register stall". This makes things really complicated as the current instruction has to wait for the previous instructions that set the other flags to complete.

Intel had this problem with the INC instruction. It did not set the carry flag. And for years people recommended using "add reg,1" instead because it set all the flags. Of course they have overcome that now, probably by throwing lots of hardware at it.
Yes they have.

Though in most RISC designs the issue of Intel's INC cannot come into play. The adder is going to generate the extra bit for add and sub instructions, and overflow is a simple change; running these to a latch is all that a status/condition register is, and it reduces complexity by not having to put them in a more general register. Now things like multiplication and similar in HW could cause a pipeline stall, though that is going to be the same with or without a carry flag.

Though also remember it is a good idea nowadays to keep instructions that rely on the results of previous instructions (including flags) separated by a few instructions where possible (which is why it is great that the ARM has the option for instructions to not set the flags [actually the option is to set the flags; the default is not to]).

jahboater
Posts: 4846
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Mon Dec 03, 2018 3:11 pm

DavidS wrote:
Mon Dec 03, 2018 1:46 pm
(and why it is great that the ARM has the option for instructions to not set the flags [actually the option is to set the flags, the default is not to]).
I know, I actually like that feature.
Intel sets the flags for almost every instruction, and people end up using "lea" for arithmetic to avoid doing so.
I see they are now adding instructions that explicitly don't set the flags see MULX here:
https://www.felixcloutier.com/x86/MULX.html

The trouble with the scheduling you mention (inserting instructions in between things) is that a) modern "out-of-order" processors do that anyway, and b) again, modern processors are "fusing" instructions together so they enter the pipeline as one micro-op. In such cases it is madness to separate the instructions.
Intel for example has long fused "cmp/conditional jump" pairs, and now it is fusing things like "add/conditional jump".
ARM CPUs fuse things like "movw/movt" on 32-bit and "mov/movk sequences" on 64-bit (so you can load a 64-bit immediate in one effective instruction).
It's really hard to work it all out. Fortunately the compiler can, and will automatically adjust the code for each CPU model.
That is an important reason for using an up-to-date compiler: it will know how to schedule for all the ARM versions.

User avatar
DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Why Avoid BASIC on RPi?

Mon Dec 03, 2018 3:29 pm

jahboater wrote:
Mon Dec 03, 2018 3:11 pm
DavidS wrote:
Mon Dec 03, 2018 1:46 pm
(and why it is great that the ARM has the option for instructions to not set the flags [actually the option is to set the flags, the default is not to]).
I know, I actually like that feature.
Intel sets the flags for almost every instruction, and people end up using "lea" for arithmetic to avoid doing so.
I see they are now adding instructions that explicitly don't set the flags see MULX here:
https://www.felixcloutier.com/x86/MULX.html

The trouble with the scheduling you mention (inserting instructions in between things) is that a) modern "out-of-order" processors do that anyway, and b) again, modern processors are "fusing" instructions together so they enter the pipeline as one micro-op. In such cases it is madness to separate the instructions.
Intel for example has long fused "cmp/conditional jump" pairs, and now it is fusing things like "add/conditional jump".
ARM CPUs fuse things like "movw/movt" on 32-bit and "mov/movk sequences" on 64-bit (so you can load a 64-bit immediate in one effective instruction).
It's really hard to work it all out. Fortunately the compiler can, and will automatically adjust the code for each CPU model.
That is an important reason for using an up-to-date compiler: it will know how to schedule for all the ARM versions.
True, except for the issue of out-of-order versus in-order multiple issue, that is to say the two methods of having multiple instructions running in parallel on the same core. When I was at university we were playing with the differences between the two models, and which could produce the best results with very carefully optimized code for each (we were using a licensed ARM ISA with custom cores). It was interesting that with well-optimized code the widest in-order multi-issue models were able to average 3.7 instructions per clock consistently with normal kinds of code, while the out-of-order models were able to make just 3.1 on average in most test cases.

And we also did some experiments on which is easier to implement compiler optimization for. At the time we found that it was easiest to implement optimization for in-order multiple issue, because the hinting for out-of-order gets painful when writing a compiler.

All of the ARM-based tests were done with a pipeline that could take 4 instructions per issue (that is, every stage could have 4 unrelated instructions at any time) and was a very simple implementation.

A little off topic, though another experiment that was also done at that time was to see how far parallel multiple issue on a single processor can go. As such, a couple of custom designs for testing were made, with 64 registers each (to allow for a lot of different instruction combinations and still as wide an issue as possible) and a 12-instruction-wide pipeline. Using real-world algorithms it was found that 4 instructions per issue were just about the limit for either model per core. They actually were only able to manage an average of 3.1 instructions per issue with extreme care in optimization; I think their chosen ISA got in the way a bit more than the ARM ISA did. Though I was only involved in the projects using the ARM ISA, so it is what it is.

ejolson
Posts: 3831
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Mon Dec 03, 2018 9:19 pm

I found my C implementation of Karatsuba multiplication. It was written a number of years ago for the Sphere Online Judge programming problems Fast Multiplication and Square Root. The speed of the code is not particularly fast compared to top entries for those problems, but it worked well enough to get correct answers without going over the maximum time limits.

I added a recursive procedure to compute the nth Fibonacci number using the suggested doubling formula. Computation of F(4784969) and printing the million-digit answer in decimal takes 13MB of RAM and 111 seconds using a Pi Zero.

The program consists of a single self-contained C-code file which will be posted shortly. The known inefficiencies include using 64-bit integers on the 32-bit ARMv6 architecture, wasting half the bits available to make multiply easier, wasting even more memory on temporary variables and performing 2log(n) unnecessary bignumber copy operations. There are also easy opportunities for parallelism on multicore processors which haven't been exploited, however this isn't relevant for the Pi Zero.

Heater
Posts: 13924
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Tue Dec 04, 2018 12:45 am

That is cool. Look forward to seeing it.

I'm still tinkering with my "fast" fibo(4784969) as and when I have a moment. Currently, using my big integer addition and the slow schoolboy fibo algorithm it takes about 15 minutes on my PC!

Nice to see I have run into the same issues you did: redundant copying and inefficient multiply.
Memory in C++ is a leaky abstraction .

User avatar
DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Why Avoid BASIC on RPi?

Tue Dec 04, 2018 1:25 am

I am also looking forward to seeing it.

@Heater
How is the C code coming for the big Fibonacci?

Heater
Posts: 13924
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Tue Dec 04, 2018 3:48 am

My last post above has the status of my big integer fibo.
Memory in C++ is a leaky abstraction .

ejolson
Posts: 3831
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Tue Dec 04, 2018 4:00 am

DavidS wrote:
Tue Dec 04, 2018 1:25 am
I am also looking forward to seeing it.

@Heater
How is the C code coming for the big Fibonacci?
I had forgotten to turn on the gcc optimizer for the previous run. With -O3 -ffast-math there is a 43 percent improvement. Execution time to compute F(4784969) is now 63 seconds on a Pi Zero. The code is

Code: Select all

/*  fibonacci.c -- Compute the nth Fibonacci Number
    Written December 1, 2018 by Eric Olson

    This program demonstrates the expressiveness of C as measured
    by explicitly coding the Karatsuba multiplication algorithm for
    big-number arithmetic and then using the doubling formulas

        F(2k) = F(k)[2F(k+1)-F(k)]
      F(2k+1) = F(k+1)^2+F(k)^2
      F(2k+2) = F(k+1)[2F(k)+F(k+1)]

    to compute the nth Fibonacci number.  Note that n is specified
    in the first line of the main routine.

    Version 2:  Minor changes to fix compiler warnings and zero the
    unused memory in the copybn routine.
*/

#include <stdio.h>
#include <math.h>
#include <string.h>
#include <stdlib.h>

typedef unsigned long long ull;
static const ull base=1000000000LL;
static const int bexp=9;
 
static unsigned char *bnbuf,*bp;

static inline void *fmalloc(unsigned int p){
    unsigned char *r=bp; bp+=p; return r;
}
static inline void ffree(void *p){
    bp=p;
}
 
typedef struct {
    int n;
    ull *d;
} bignum;
 
static ull atolln(char *p,int n){
    int i;
    ull e,s=0;
    for(i=n-1,e=1;i>=0;i--,e*=10){
        s+=(p[i]-'0')*e;
    }
    return s;
}

static void pbn(bignum x){
    int i;
    char fmt[8];
    /* %llu is the standard conversion for unsigned long long */
    sprintf(fmt,"%%0%dllu",bexp);
    if(!x.n){
        printf("0\n");
        return;
    }
    for(i=x.n-1;i>=0;i--){
        if(i==x.n-1) printf("%llu",x.d[i]);
        else printf(fmt,x.d[i]);
    }
    printf("\n");
}
 
static inline bignum trmbn(bignum x){
    int i;
    if(!x.n) return x;
    for(i=x.n-1;i>=0;i--) if(x.d[i]) break;
    x.n=i+1;
    return x;
}
 
static inline bignum newbn(int digits){
    bignum x;
    x.d=fmalloc(digits*sizeof(ull));
    memset(x.d,0,digits*sizeof(ull));
    x.n=0;
    return x;
}
 
static bignum atobn(char *p,int digits){
    int i;
    bignum x=newbn(digits);
    int n=strlen(p);
    if(!n) return x;
    for(i=n-bexp;i>=0;i-=bexp){
        x.d[x.n++]=atolln(&p[i],bexp);
    }
    if(i+bexp>0) {
        x.d[x.n++]=atolln(&p[0],i+bexp);
    }
    return x;
}
 
static inline void delbn(bignum a){
    ffree(a.d);
}

static bignum addbn(bignum a,bignum b){
    ull c;
    bignum x;
    int i,n;
    n=a.n>b.n?a.n:b.n;
    x=newbn(n+1);
    for(i=0,c=0;i<n;i++){
        x.d[i]=c;
        if(i<a.n) c+=a.d[i];
        if(i<b.n) c+=b.d[i];
        if(c>=base) {
            x.d[i]=c-base;
            c=1;
        } else {
            x.d[i]=c;
            c=0;
        }
    }
    if(c) x.d[n++]=c;
    x.n=n;
    return x;
}
 
static void subbn2(bignum a,bignum b){
    int i;
    char *c;
    if(!b.n) return;
    c=fmalloc(a.n+1);
    for(i=0,c[0]=0;i<b.n;i++){
        if(a.d[i]<b.d[i]+c[i]) {
            a.d[i]+=base;
            c[i+1]=1;
        } else {
            c[i+1]=0;
        }
    }
    for(i=0;i<b.n;i++){
        a.d[i]-=b.d[i]+c[i];
    }
    a.d[b.n]-=c[b.n];
    ffree(c);
    return;
}
 
static bignum mulbn(bignum a,bignum b){
    bignum x;
    int i,k,j;
    ull c;
    if(!a.n||!b.n) {
        x=newbn(1);
        return x;
    }
    x=newbn(a.n+b.n+1);
    x.n=a.n+b.n-1;
    for(i=0;i<a.n;i++) {
        for(j=0;j<b.n;j++){
            x.d[i+j]+=a.d[i]*b.d[j];
        }
        if((a.n-i)%50==1){
            for(k=0;k<=x.n;k++){
                if(x.d[k]>=base){
                    c=x.d[k]/base;
                    x.d[k]%=base;
                    x.d[k+1]+=c;
                }
            }
        }
    }
    if(x.d[x.n]) x.n++;
    return x;
}
 
static bignum mulbnk(bignum a, bignum b){
    bignum x;
    int M,m,i,n;
    ull c;
    bignum a0,a1,b0,b1;
    bignum z2,z1a,z1b,z1,z0;
    if(!a.n||!b.n) {
        x=newbn(1);
        return x;
    }
    x=newbn(a.n+b.n+1);
    if(a.n>b.n) { M=a.n; m=b.n; }
    else { M=b.n; m=a.n; }
    if(m<49) return mulbn(a,b);
    n=M/2;
    a0.d=&a.d[0]; a0.n=n; if(a.n<a0.n) a0.n=a.n;
    b0.d=&b.d[0]; b0.n=n; if(b.n<b0.n) b0.n=b.n;
    a1.d=&a.d[n]; a1.n=a.n-n; if(a1.n<0) a1.n=0;
    b1.d=&b.d[n]; b1.n=b.n-n; if(b1.n<0) b1.n=0;
    z0=mulbnk(a0,b0);
    z2=mulbnk(a1,b1);
    z1a=addbn(a1,a0);
    z1b=addbn(b1,b0);
    z1=mulbnk(z1a,z1b);
    subbn2(z1,z0);
    subbn2(z1,z2);
    z1=trmbn(z1);
    memcpy(x.d,z0.d,(x.n=z0.n)*sizeof(ull));
    if(z1.n){
        int k;
        for(i=0,c=0;i<z1.n;i++){
            x.d[k=n+i]+=z1.d[i]+c;
            if(x.d[k]>=base) {
                x.d[k]-=base;
                c=1;
            } else c=0;
        }
        k=n+i;
        while(c){
            x.d[k]+=c;
            if(x.d[k]>=base) {
                x.d[k]-=base;
                c=1;
            } else c=0;
            k++;
        }
        if(x.n<k) x.n=k;
    }
    if(z2.n){
        int n2=2*n,k;
        for(i=0,c=0;i<z2.n;i++){
            x.d[k=n2+i]+=z2.d[i]+c;
            if(x.d[k]>=base) {
                x.d[k]-=base;
                c=1;
            } else c=0;
        }
        k=n2+i;
        while(c){
            x.d[k]+=c;
            if(x.d[k]>=base) {
                x.d[k]-=base;
                c=1;
            } else c=0;
            k++;
        }
        if(x.n<k) x.n=k;
    }
    delbn(z1); delbn(z1b); delbn(z1a); delbn(z2); delbn(z0);
    return x;
}
 
static void copybn(bignum *a,bignum b){
    b=trmbn(b);
    if(a->n>b.n) memset(a->d+b.n,0,sizeof(ull)*(a->n-b.n));
    memcpy(a->d,b.d,sizeof(ull)*(a->n=b.n));
}

static int digits;    
static int fibo(int n,bignum *a,bignum *b){
    if(!n) {
        *a=atobn("0",digits);
        *b=atobn("1",digits);
        return 0;
    }
    int m=2*fibo(n>>1,a,b)+n%2;
    bignum ta=*a, tb=*b;
    bignum taa=mulbnk(ta,ta);
    bignum tbb=mulbnk(tb,tb);
    bignum taapbb=addbn(taa,tbb);
    if(n%2){
        // [a,b]=[a*a+b*b,b*(2*a+b)]
        bignum t2a=addbn(ta,ta);
        bignum t2apb=addbn(t2a,tb);
        bignum tbL2apbR=mulbnk(tb,t2apb);
        copybn(a,taapbb); copybn(b,tbL2apbR);
        delbn(tbL2apbR); delbn(t2apb); delbn(t2a);
    } else {
        // [a,b]=[a*(b*2-a),a*a+b*b]
        bignum t2bma=addbn(tb,tb); subbn2(t2bma,ta);
        bignum taL2bmaR=mulbnk(ta,t2bma);
        copybn(a,taL2bmaR); copybn(b,taapbb);
        delbn(taL2bmaR); delbn(t2bma);
    }
    delbn(taapbb); delbn(tbb); delbn(taa);
    return m;
}

#define PHI ((1+sqrt(5.0))/2)
#define MFACT 12

int main(){
    int n=4784969;
    digits=n*log10(PHI)/bexp+4;
    bnbuf=malloc(digits*MFACT*sizeof(ull));
    if(!bnbuf){
        fprintf(stderr,"Out of memory!\n");
        return 1;
    }
    bp=bnbuf;
    bignum a,b;
    if(n<2) printf("%d\n",n);
    else {
        fibo(n-1,&a,&b);
        pbn(b);
    }
    return 0;
}
A transcript of a sample run of the program on the Raspberry Pi Zero follows:

Code: Select all

$ gcc -O3 -ffast-math -o fibonacci fibonacci.c -lm
$ time ./fibonacci | head -c 32
10727395641800477229364813596225
real    1m2.800s
user    1m2.620s
sys     0m0.120s
$ time ./fibonacci | tail -c 32
4856539211500699706378405156269

real    1m3.034s
user    1m2.840s
sys     0m0.120s
It would be interesting to see a different C code and whether it is possible to express a similar algorithm efficiently in BASIC.
Last edited by ejolson on Tue Dec 04, 2018 11:49 pm, edited 1 time in total.

jahboater
Posts: 4846
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Tue Dec 04, 2018 4:29 am

On a Pi3B+ (gcc 8.2) it is:-

Code: Select all

$ gcc -O3 fibo.c -o fibo
$ time ./fibo | head -c 32
10727395641800477229364813596225
real	0m15.430s
user	0m15.416s
sys	0m0.016s
$ time ./fibo | tail -c 32
4856539211500699706378405156269

real	0m15.500s
user	0m15.496s
sys	0m0.007s

ejolson
Posts: 3831
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Tue Dec 04, 2018 4:38 am

jahboater wrote:
Tue Dec 04, 2018 4:29 am
On a Pi3B+ (gcc 8.2) it is:-

Code: Select all

$ gcc -O3 fibo.c -o fibo
$ time ./fibo | head -c 32
10727395641800477229364813596225
real	0m15.430s
user	0m15.416s
sys	0m0.016s
$ time ./fibo | tail -c 32
4856539211500699706378405156269

real	0m15.500s
user	0m15.496s
sys	0m0.007s
That's four times faster but still just one core! Is your Pi 3B+ running in 32-bit mode or 64-bit mode? Thanks for the additional timing.

User avatar
Paeryn
Posts: 2749
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Why Avoid BASIC on RPi?

Tue Dec 04, 2018 5:23 am

ejolson wrote:
Tue Dec 04, 2018 4:38 am
jahboater wrote:
Tue Dec 04, 2018 4:29 am
On a Pi3B+ (gcc 8.2) it is:-

Code: Select all

$ gcc -O3 fibo.c -o fibo
$ time ./fibo | head -c 32
10727395641800477229364813596225
real	0m15.430s
user	0m15.416s
sys	0m0.016s
$ time ./fibo | tail -c 32
4856539211500699706378405156269

real	0m15.500s
user	0m15.496s
sys	0m0.007s
That's four times faster but still just one core! Is your Pi 3B+ running in 32-bit mode or 64-bit mode? Thanks for the additional timing.
I just tried it on my stock 3B under Raspbian, compiling with gcc-6.3 gave me 39s, gcc-8.1 gave me 38s. It didn't throttle the speed back whilst running either.

Code: Select all

pi@rpi3:~/Programming/asm/fibo $ time ./fibo8.1 | head -c 32
10727395641800477229364813596225
real    0m38.549s
user    0m38.503s
sys     0m0.047s
She who travels light — forgot something.

jahboater
Posts: 4846
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Tue Dec 04, 2018 6:15 am

ejolson wrote:
Tue Dec 04, 2018 4:38 am
That's four times faster but still just one core! Is your Pi 3B+ running in 32-bit mode or 64-bit mode? Thanks for the additional timing.
It is running in 32-bit mode. Normal Raspbian Lite. No throttling. Stock settings - except for gpu_mem=16

Code: Select all

Time      Temp    CPU        Throttle       Vcore
06:18:46 36.5'C  600MHz 00000000000000000000 1.2V
06:18:51 35.9'C  600MHz 00000000000000000000 1.2V
06:18:56 36.5'C 1400MHz 00000000000000000000 1.3250V
06:19:01 37.0'C 1400MHz 00000000000000000000 1.3250V
06:19:06 37.0'C 1400MHz 00000000000000000000 1.3250V
06:19:11 36.5'C  600MHz 00000000000000000000 1.2V
06:19:17 36.5'C  600MHz 00000000000000000000 1.2V
06:19:22 36.5'C  600MHz 00000000000000000000 1.2V
It is 2.5 times faster than Paeryn's stock 3B which is very pleasing but unexplained!

Here is all the system info

Code: Select all

pi@pi3:~ $ 
pi@pi3:~ $ cat /proc/device-tree/model; echo
Raspberry Pi 3 Model B Plus Rev 1.3
pi@pi3:~ $ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)"
NAME="Raspbian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
pi@pi3:~ $ vcgencmd get_config int
aphy_params_current=819
arm_freq=1400
audio_pwm_mode=514
config_hdmi_boost=5
core_freq=400
desired_osc_freq=0x331df0
desired_osc_freq_boost=0x3c45b0
disable_commandline_tags=2
disable_l2cache=1
display_hdmi_rotate=-1
display_lcd_rotate=-1
dphy_params_current=547
enable_uart=1
force_eeprom_read=1
force_pwm_open=1
framebuffer_depth=16
framebuffer_ignore_alpha=1
framebuffer_swap=1
gpu_freq=300
hdmi_force_cec_address=65535
init_uart_clock=0x2dc6c00
lcd_framerate=60
over_voltage_avs=31250
over_voltage_avs_boost=0x1e848
overscan_bottom=32
overscan_left=32
overscan_right=32
overscan_top=32
pause_burst_frames=1
program_serial_random=1
sdram_freq=450
temp_soft_limit=70
pi@pi3:~ $ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/arm-linux-gnueabihf/8.2.0/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../configure --enable-languages=c,c++ --with-cpu=cortex-a53 --with-fpu=neon-fp-armv8 --with-float=hard --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf --enable-checking=no
Thread model: posix
gcc version 8.2.0 (GCC) 
pi@pi3:~ $ 
