sakaki
Posts: 417
Joined: Sun Jul 16, 2017 1:11 pm

Re: 64-bit operating system

Mon Dec 10, 2018 1:59 pm

ejolson wrote:
Wed Dec 05, 2018 6:00 am
jdonald wrote:
Wed Dec 05, 2018 5:31 am
Tried the image on my Pi 3B+.
In a different thread appears a short self-contained C program which computes the first Fibonacci number with a million digits. This program implements big-number arithmetic using 64-bit integers as the underlying type. The Pi 3B+ running in 32-bit compatibility mode completes the computation in 15.43 seconds. Based on rescaling the clock speeds of a different ARM-based single-board computer, it was estimated that the Pi 3B+ running in 64-bit mode should complete this same computation in only 7.49 seconds. If true, that would be a two-fold increase in speed for a particular application just by switching operating systems.

It would be nice if someone who is running a 64-bit operating system on real 3B+ hardware could confirm that this estimate is correct. The program is available in this post. The above mentioned performance results are discussed in subsequent posts of the same thread.
Not sure if anyone has posted results for this as requested, but here's a run on an RPi3B+, gcc 8.2.0, gentoo-on-rpi3-64bit image, with and without -ffast-math (as expected, on arm64 this flag makes essentially no difference), FLIRC case, on-demand governor:


demouser@pi64 ~ $ gcc -O3 -ffast-math -o fibonacci fibonacci.c -lm

demouser@pi64 ~ $ time ./fibonacci | head -c 32
10727395641800477229364813596225
real	0m7.746s
user	0m7.713s
sys	0m0.032s

demouser@pi64 ~ $ time ./fibonacci | tail -c 32
4856539211500699706378405156269

real	0m7.818s
user	0m7.764s
sys	0m0.033s

demouser@pi64 ~ $ gcc -O3 -o fibonacci fibonacci.c -lm

demouser@pi64 ~ $ time ./fibonacci | head -c 32
10727395641800477229364813596225
real	0m7.740s
user	0m7.713s
sys	0m0.024s

demouser@pi64 ~ $ time ./fibonacci | tail -c 32
4856539211500699706378405156269

real	0m7.813s
user	0m7.795s
sys	0m0.017s
hth, sakaki

ejolson
Posts: 3825
Joined: Tue Mar 18, 2014 11:47 am

Re: 64-bit operating system

Mon Dec 10, 2018 6:29 pm

Thanks for running the code on the Pi 3B+ in 64-bit mode. Compared to the timing of 15.47 seconds in 32-bit mode from this post, we have

15.47 / 7.740 = 1.999

which is nearly a 2-fold increase in performance. This confirms the similar result posted here.

From my point of view, the fibonacci.c program performs a real computation using an asymptotically reasonable algorithm. In particular, it uses Karatsuba multiplication along with the doubling formulas for the Fibonacci sequence to find the nth term. While some care has been taken with the code, it is definitely not hand-coded assembler tuned to a particular architecture. For these reasons this is not a synthetic benchmark, in my opinion, but rather a program which represents application-level performance that results from writing suitable code to solve a real problem in a high-level language.
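To make the doubling formulas concrete, here is a minimal sketch in C. It is not the fibonacci.c program from the other thread: it uses plain uint64_t instead of big-number arithmetic and Karatsuba multiplication, so it is only exact up to F(92), and the function name fib_pair is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Computes F(n) and F(n+1) with the doubling formulas
 *   F(2k)   = F(k) * (2*F(k+1) - F(k))
 *   F(2k+1) = F(k)^2 + F(k+1)^2
 * Assumption: plain uint64_t arithmetic, so n must not exceed 92
 * (F(93) still fits in 64 bits, F(94) does not). */
static void fib_pair(unsigned n, uint64_t *fn, uint64_t *fn1)
{
    assert(n <= 92);
    if (n == 0) {
        *fn = 0;
        *fn1 = 1;
        return;
    }
    uint64_t a, b;                  /* a = F(k), b = F(k+1), with k = n/2 */
    fib_pair(n / 2, &a, &b);
    uint64_t c = a * (2 * b - a);   /* F(2k)   */
    uint64_t d = a * a + b * b;     /* F(2k+1) */
    if (n % 2 == 0) {
        *fn = c;
        *fn1 = d;
    } else {
        *fn = d;
        *fn1 = c + d;               /* F(2k+2) = F(2k) + F(2k+1) */
    }
}
```

The recursion depth is about log2(n), which is why the real program spends nearly all of its time in the big-number multiplications rather than in stepping through the sequence.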

It would be interesting to see an example of a reasonably written program which solves a real problem that runs 2-times slower on 64-bit compared to 32-bit. Are there any examples that can be quantitatively compared?

jahboater
Posts: 4843
Joined: Wed Feb 04, 2015 6:38 pm

Re: 64-bit operating system

Mon Dec 10, 2018 8:06 pm

ejolson wrote:
Mon Dec 10, 2018 6:29 pm
It would be interesting to see an example of a reasonably written program which solves a real problem that runs 2-times slower on 64-bit compared to 32-bit. Are there any examples that can be quantitatively compared?
Interesting challenge!

I suspect the only thing that's slower might be a program reading/writing vast numbers of pointers to and from memory.

Pointers (and the related size_t and ptrdiff_t) are the only types that change size gratuitously. You could argue about long, but a reasonably written program should be using stdint.h. Perhaps off_t, but that can be set to 64 bits in 32-bit mode.

The 31 general purpose registers, the removal of the slow instructions, the regular opcode layout, the 32 floating-point registers, and so on, means 64-bit mode is usually going to be a bit faster, like it or not.
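A small sketch of the size change, assuming the usual Linux data models (ILP32 on 32-bit ARM, LP64 on AArch64). The struct list_node and the helper names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Fixed-width types keep their size whatever the pointer width is. */
static_assert(sizeof(int32_t) == 4, "stdint.h types are fixed width");

/* Hypothetical list node: two links plus a fixed-width payload. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
    int32_t value;
};

/* Bytes taken by the two links; this is the part that grows with pointers. */
static size_t link_bytes(void)
{
    return 2 * sizeof(struct list_node *);
}

/* Prints the sizes that depend on the data model next to one that does not. */
static void print_type_sizes(void)
{
    printf("void *           %zu\n", sizeof(void *));
    printf("size_t           %zu\n", sizeof(size_t));
    printf("ptrdiff_t        %zu\n", sizeof(ptrdiff_t));
    printf("long             %zu\n", sizeof(long));
    printf("int32_t          %zu\n", sizeof(int32_t));
    printf("struct list_node %zu\n", sizeof(struct list_node));
}
```

Under those assumptions the node should come out at 12 bytes on a 32-bit build and 24 bytes (including padding) on a 64-bit build, even though the payload is the same 4 bytes in both.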

ejolson
Posts: 3825
Joined: Tue Mar 18, 2014 11:47 am

Re: 64-bit operating system

Mon Dec 10, 2018 8:31 pm

jahboater wrote:
Mon Dec 10, 2018 8:06 pm
ejolson wrote:
Mon Dec 10, 2018 6:29 pm
It would be interesting to see an example of a reasonably written program which solves a real problem that runs 2-times slower on 64-bit compared to 32-bit. Are there any examples that can be quantitatively compared?
I suspect the only thing that's slower might be a program reading/writing vast numbers of pointers to and from memory.

Pointers (and the related size_t and ptrdiff_t) are the only types that change size gratuitously. You could argue about long, but a reasonably written program should be using stdint.h.

The 31 general purpose registers, the removal of the slow instructions, the regular opcode layout, the 32 floating-point registers, and so on, means 64-bit mode is usually going to be a bit faster, like it or not.
I'm pretty sure it is possible to create a synthetic benchmark that runs 2 times slower by leveraging memory bandwidth constraints when reading 64-bit pointers.

Since development of most mainstream desktop applications now targets 64-bit platforms, I suspect most code that showed performance regressions on 64-bit platforms has already been rewritten. For example, one could use 32-bit integer offsets to a 64-bit base pointer in code where the excessive use of 64-bit pointers resulted in slowdowns. While this sounds like a lot of trouble, someone else has already done the tuning. Therefore, finding real-world examples where the 32-bit version runs faster than the 64-bit version may be rather difficult.
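The offset idea can be sketched as a linked list whose links are 32-bit indices into one array rather than pointers. This is only an illustration under assumed names (struct compact_node, NIL, link_in_order, sum_list); it is not taken from any particular application.

```c
#include <assert.h>
#include <stdint.h>

#define NIL UINT32_MAX              /* sentinel index meaning "no next node" */

/* 8 bytes per node on both 32-bit and 64-bit builds. */
struct compact_node {
    uint32_t next;                  /* index into the pool, not a pointer */
    int32_t  value;
};

/* Links pool[0..count-1] in array order and returns the head index. */
static uint32_t link_in_order(struct compact_node *pool, uint32_t count)
{
    assert(pool != 0 || count == 0);
    if (count == 0)
        return NIL;
    for (uint32_t i = 0; i < count; i++)
        pool[i].next = (i + 1 < count) ? i + 1 : NIL;
    return 0;
}

/* Walks the list; every hop is "base pointer + 32-bit index". */
static int64_t sum_list(const struct compact_node *pool, uint32_t head)
{
    int64_t total = 0;
    for (uint32_t i = head; i != NIL; i = pool[i].next)
        total += pool[i].value;
    return total;
}
```

The trade-off is that all nodes must live in one pool of at most 2^32 - 1 entries, in exchange for a node that is the same size in both modes.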

jahboater
Posts: 4843
Joined: Wed Feb 04, 2015 6:38 pm

Re: 64-bit operating system

Mon Dec 10, 2018 8:47 pm

From what I have seen, modern hardware is optimized for reading 16 bytes (or more) at a time, probably for SIMD.
It will not have the slightest problem reading 8-byte pointers.

Some time ago I benchmarked a crude memcpy that used "ldp q0,q0; stp q0,q0" - 32 bytes at a time (on suitable data) - which was extremely fast, 8 times faster than the library memcpy.

In 64-bit mode, the stack, returns from malloc, large static objects, etc. are all 16-byte aligned. In 32-bit mode it is 8 bytes.
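The alignment claim is easy to check on a given system. A minimal sketch, with hypothetical helper names address_alignment and malloc_alignment; a single allocation can happen to be more aligned than the allocator guarantees, so several sizes should be tried before drawing conclusions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Largest power of two (capped at 4096) that divides the address. */
static size_t address_alignment(uintptr_t addr)
{
    size_t align = 1;
    while (align < 4096 && (addr & align) == 0)
        align <<= 1;
    return align;
}

/* Observed alignment of one fresh malloc block of the given size. */
static size_t malloc_alignment(size_t bytes)
{
    void *block = malloc(bytes);
    assert(block != NULL);
    size_t align = address_alignment((uintptr_t)block);
    free(block);
    return align;
}
```

Calling malloc_alignment for a handful of sizes on a 32-bit and a 64-bit build and taking the minimum gives a rough view of the 8-byte versus 16-byte difference described above.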

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 24175
Joined: Sat Jul 30, 2011 7:41 pm

Re: 64-bit operating system

Tue Dec 11, 2018 10:44 am

Has anyone checked the memory used 32 vs 64? Both in program size and memory used during the run?
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed. Here's an example...
“I think it’s wrong that only one company makes the game Monopoly.” – Steven Wright

jahboater
Posts: 4843
Joined: Wed Feb 04, 2015 6:38 pm

Re: 64-bit operating system

Tue Dec 11, 2018 10:58 am

jamesh wrote:
Tue Dec 11, 2018 10:44 am
Has anyone checked the memory used 32 vs 64? Both in program size and memory used during the run?
The 64-bit version is larger in both cases.
From "top" :-
64-bit virtual 12232, resident 7916, shared 760
32-bit virtual 12108, resident 7148, shared 788

The executable is 19k (64-bit) and 14k (32-bit)

It seems to be quite variable, I have a text editor compiled on both: 65k (64-bit) and 70k (32-bit)
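As an alternative to reading the figures off "top", a program can report its own peak resident size. A minimal sketch using the POSIX getrusage call; the helper name peak_rss_kb is made up, and the kilobyte unit is what Linux documents for ru_maxrss (other systems may differ).

```c
#include <assert.h>
#include <sys/resource.h>

/* Peak resident set size of the calling process.
 * Assumption: Linux, where ru_maxrss is reported in kilobytes.
 * Returns -1 if getrusage fails. */
static long peak_rss_kb(void)
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    assert(usage.ru_maxrss >= 0);
    return usage.ru_maxrss;
}
```

Printing this value at the end of the 32-bit and 64-bit builds of the same program would give a number comparable to the "resident" column quoted above.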

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 24175
Joined: Sat Jul 30, 2011 7:41 pm

Re: 64-bit operating system

Tue Dec 11, 2018 12:21 pm

jahboater wrote:
Tue Dec 11, 2018 10:58 am
jamesh wrote:
Tue Dec 11, 2018 10:44 am
Has anyone checked the memory used 32 vs 64? Both in program size and memory used during the run?
The 64-bit version is larger in both cases.
From "top" :-
64-bit virtual 12232, resident 7916, shared 760
32-bit virtual 12108, resident 7148, shared 788

The executable is 19k (64-bit) and 14k (32-bit)

It seems to be quite variable, I have a text editor compiled on both: 65k (64-bit) and 70k (32-bit)
I guess that is about what I would expect, with the greater size of pointer variables affecting both run time and static memory requirements.

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA

Re: 64-bit operating system

Tue Dec 11, 2018 12:58 pm

ejolson wrote: I'm pretty sure it is possible to create a synthetic benchmark that runs 2 times slower by leveraging memory bandwidth constraints when reading 64-bit pointers.
I think not likely. The bus on the RPi is 128-bits wide, hence why we can read 4 32-bit registers at a time in 32-bit real ARM without stalling the pipeline.
Since development of most mainstream desktop applications now targets 64-bit platforms, I suspect most code that showed performance regressions on 64-bit platforms has already been rewritten. For example, one could use 32-bit integer offsets to a 64-bit base pointer in code where the excessive use of 64-bit pointers resulted in slowdowns. While this sounds like a lot of trouble, someone else has already done the tuning. Therefore, finding real-world examples where the 32-bit version runs faster than the 64-bit version may be rather difficult.
I think you need to take a look at real-world applications. Yes, the extreme Fibonacci example will perform better on a 64-bit system; most applications will not, due to the limits of the architecture.

We still do not have a single cycle 32-bit divide, and it takes longer in 64-bit, there are many more examples where 32-bit is faster than 64-bit. Also as we can move 128-bits at a time to or from RAM if not in cache on either 32-bit or 64-bit there is no advantage for that either.

With the 32-bit ARM with its MMU we have the ability to address a space way bigger than is available on any system by more than 8000 times over. So we do not need 64-bit for memory access.

There are a few examples that we all know of where 64-bit is faster, these are the exceptions not the rule. People using exceptions to make something sound faster and better does not really work out in the end.

There is a reason that 32-bit systems still persist on any platform for which 64-bit is available. Those that use the 64-bit versions do it more for the bragging value, or they do not know the truth of performance. There is a reason that many who do know still flock to 32-bit x86 Linux even when their CPU supports AMD64 Long Mode. There is a reason that there is a huge demand for 32-bit ReactOS though not really anything to push the 64-bit version along.

So I must disagree on this issue. 32-bit rules and will until every advantage of the 32-bit ARM is matched on the 64-bit ARM, including the timing for execution of any given instruction.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

jahboater
Posts: 4843
Joined: Wed Feb 04, 2015 6:38 pm

Re: 64-bit operating system

Tue Dec 11, 2018 1:23 pm

DavidS wrote:
Tue Dec 11, 2018 12:58 pm
We still do not have a single cycle 32-bit divide, and it takes longer in 64-bit, there are many more examples where 32-bit is faster than 64-bit.
Dividing large numbers is slower than dividing small numbers. If the numbers are the same size, then a 32-bit divide takes similar time to a 64-bit divide. I mean 42/12 will take the same time on both platforms. Obviously a 64-bit divide can deal with much larger numbers and so may potentially take longer - which is obviously not relevant.
Divide will never take one cycle on any platform, even Intel.
DavidS wrote:
Tue Dec 11, 2018 12:58 pm
So I must disagree on this issue. 32-bit rules and will until every advantage of the 32-bit ARM is matched on the 64-bit ARM, including the timing for execution of any given instruction.
You should look at the conditional instructions: the 64-bit ones have one less dependency than the 32-bit ones, and work better with modern CPUs (CSET/CSEL/CINC/CNEG/CINV etc). LDP/STP is much, much faster than LDM/STM.

Simple things like ADD take the same time even though the 64-bit version can handle much larger numbers.
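Whether divide time really depends on the operand values on a given core can be measured rather than argued. A rough micro-benchmark sketch in C; the function name time_divides is hypothetical, and the result will vary with the CPU, the compiler flags and the clock resolution, so treat it as a starting point only.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Performs `iterations` divisions of numerator by divisor and returns the
 * CPU seconds spent. The operands are volatile and re-read on every pass,
 * so the compiler cannot hoist the division out of the loop. The sum of
 * the quotients is stored in *sum so the caller can check the work. */
static double time_divides(uint64_t numerator, uint64_t divisor,
                           uint32_t iterations, uint64_t *sum)
{
    assert(divisor != 0);
    volatile uint64_t n = numerator;
    volatile uint64_t d = divisor;
    uint64_t acc = 0;
    clock_t start = clock();
    for (uint32_t i = 0; i < iterations; i++)
        acc += n / d;
    clock_t stop = clock();
    *sum = acc;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Comparing time_divides(42, 12, N, &s) against time_divides(UINT64_MAX, 3, N, &s) for a large N on the same build shows whether small and large operands cost the same on that particular machine.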

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA

Re: 64-bit operating system

Tue Dec 11, 2018 4:02 pm

jahboater wrote:
Tue Dec 11, 2018 1:23 pm
DavidS wrote:
Tue Dec 11, 2018 12:58 pm
We still do not have a single cycle 32-bit divide, and it takes longer in 64-bit, there are many more examples where 32-bit is faster than 64-bit.
Dividing large numbers is slower than dividing small numbers. If the numbers are the same size, then a 32-bit divide takes similar time to a 64-bit divide. I mean 42/12 will take the same time on both platforms. Obviously a 64-bit divide can deal with much larger numbers and so may potentially take longer - which is obviously not relevant.
Divide will never take one cycle on any platform, even Intel.
Not long ago we said the same thing for Multiply: everyone believed that a single-cycle multiply was not possible without increasing propagation delay to an unacceptable level. That has been proven wrong, so I can see a time when the same is true of Divide. As it stands, implementing a single-cycle divide introduces too much propagation delay, and that is the same issue we had with multiply. The other solution, breaking a divide across multiple pipeline stages, is not acceptable because it would make the pipeline way too deep to manage performance in a sane way (optimization would be beyond even compilers of the highest caliber).

Though just because it is not done does not mean it cannot be done. And Intel is a poor example of anything, except for lackluster design.
DavidS wrote:
Tue Dec 11, 2018 12:58 pm
So I must disagree on this issue. 32-bit rules and will until every advantage of the 32-bit ARM is matched on the 64-bit ARM, including the timing for execution of any given instruction.
You should look at the conditional instructions: the 64-bit ones have one less dependency than the 32-bit ones, and work better with modern CPUs (CSET/CSEL/CINC/CNEG/CINV etc).
So you are saying that it is lower latency to not be able to have every instruction conditional?
I would argue that, big time. That is the one thing missing from AARCH64 that will forever kill potential performance.

There are a bunch of cases where there is a huge advantage to have every instruction conditional (I know that a few of the newer instructions are not), and have the ability to specify which instructions set flags or not.
LDP/STP is much much faster than LDM/STM.
That is true. Though there are other ways around that issue, using NEON (ok it is a coprocessor, still it is standard now), and equally fast on both :) .

So not really an advantage in most situations, with very few exceptions.

Also that is not an issue of the ISA but rather of the implementation; it would be fairly easy to make LDM/STM single cycle for any load up to 4 registers (128 bits), without adding much to the implementation, and without increasing any propagation delay in any stage of the pipeline.
Simple things like ADD take the same time even though the 64-bit version can handle much larger numbers.
That is a given, the propagation delay through the gates for the carry look ahead is minimally different between the two lengths when done correctly.

So I stand on my argument.

Heater
Posts: 13919
Joined: Tue Jul 17, 2012 3:02 pm

Re: 64-bit operating system

Tue Dec 11, 2018 4:35 pm

DavidS,
So I stand on my argument.
Meanwhile in the real world:

1) Division takes longer for bigger numbers. Even in the ARM world. See for example:
https://lemire.me/blog/2017/11/17/fast- ... m-edition/

2) There is no "huge advantage to have every instruction conditional".
As evidenced by the fact that the RISC V does not do that. If it were advantageous the RISC V designers would have used it. They have been studying and experimenting with these things for decades, they know. Besides, actual RISC V devices demonstrate it is not required.

3) There is nothing "lackluster" about what Intel has achieved. One can argue the x86 is a mess, but Intel, bless 'em, has invested billions in efforts to get off it onto something else (i432, i860, Itanium) over the decades. It is their customers that continually demand more of the same, so they have obliged.

4) Real world applications have demanded 64 bit computing. The likes of Google would not buy all that 64 bit hardware if it was less efficient.

Is this of any relevance to the Pi? Mostly not.
Memory in C++ is a leaky abstraction .

jahboater
Posts: 4843
Joined: Wed Feb 04, 2015 6:38 pm

Re: 64-bit operating system

Tue Dec 11, 2018 4:58 pm

DavidS wrote:
Tue Dec 11, 2018 4:02 pm
So you are saying that it is lower latency to not be able to have every instruction conditional?
I was just saying the new conditional instructions in A64 have one less dependency than the A32 ones.
They work in a different way. They are always executed and therefore the destination register is not dependent on its previous value.

I suspect the new conditionals were chosen as being the most useful ones.
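For illustration, these are the kinds of C expressions the A64 conditionals are aimed at. Which instructions actually come out is up to the compiler and its flags, so the CSEL and CINC mentions in the comments are expectations to verify with "gcc -O2 -S", not guarantees; the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Select between two values. The destination is always written, so it does
 * not depend on its own previous contents (a candidate for CMP + CSEL). */
static int64_t select_min(int64_t a, int64_t b)
{
    return (a < b) ? a : b;
}

/* Counts negative entries by adding the result of a comparison, a
 * conditional-increment pattern (a candidate for CINC or CSET + ADD). */
static uint64_t count_negative(const int64_t *values, uint64_t count)
{
    assert(values != 0 || count == 0);
    uint64_t negatives = 0;
    for (uint64_t i = 0; i < count; i++)
        negatives += (values[i] < 0);
    return negatives;
}
```

In A32 the same source could instead be built from predicated instructions such as MOVLT, where a skipped instruction leaves the old register value in place, which is the extra dependency mentioned above.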
DavidS wrote:
Tue Dec 11, 2018 4:02 pm
That is the one thing missing from AARCH64 that will forever kill potential performance.
The exact opposite: it was to enable high performance on future ARM architectures. Pretty obviously, any out-of-order CPU will benefit. And it frees up four bits in the opcode, enabling 32 registers instead of 16 - a huge benefit.
It sounds like you think the ARM CPU designers are wrong - which I very much doubt :)

LDP/STP is much much faster than LDM/STM. That is true.
Here is a cool thing!
I like LDP/STP because you can give the same register twice, which you can't with LDM/STM.
For example, I have a C structure that is 16 bytes in size and I want to zero it all.
The compiler changes "memset( &mystruct, 0, 16 )" into, say, "STP XZR, XZR, [X25]" (using register 31, the zero register).
You can't do that in one instruction with STM.
Edit: You can do it in two instructions with NEON - as you say!
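The C side of that example could look like the sketch below. The struct pair16 and the function clear_pair are stand-ins for "mystruct" in the post; whether the memset really collapses into a single STP XZR, XZR depends on the compiler and optimization level, so inspect the assembly before relying on it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte structure, standing in for "mystruct". */
struct pair16 {
    uint64_t first;
    uint64_t second;
};

/* Zeroes all 16 bytes. With optimization enabled on AArch64 the compiler
 * is free to replace this memset call with one store-pair of the zero
 * register instead of calling the library routine. */
static void clear_pair(struct pair16 *p)
{
    assert(p != NULL);
    memset(p, 0, sizeof *p);
}
```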

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA

Re: 64-bit operating system

Tue Dec 11, 2018 5:26 pm

jahboater wrote:
Tue Dec 11, 2018 4:58 pm
DavidS wrote:
Tue Dec 11, 2018 4:02 pm
So you are saying that it is lower latency to not be able to have every instruction conditional?
I was just saying the new conditional instructions in A64 have one less dependency than the A32 ones.
They work in a different way. They are always executed and therefore the output register is not dependent on the previous value. That can break a dependency chain.

I suspect the new conditionals were chosen as being the most useful ones.
DavidS wrote:
Tue Dec 11, 2018 4:02 pm
That is the one thing missing from AARCH64 that will forever kill potential performance.
The exact opposite, it was to enable high performance on future ARM architectures. Pretty obviously, any out of order CPU will benefit. You sound like you think the ARM CPU designers are wrong and/or stupid - which I very much doubt :)
Not by a long shot. I more think that the advantages one way or the other are unbalanced. The AARCH64 feels like an experimental ISA. As for the dependency chain, that is on the coder.

On a personal note I still feel (because of the research we did while I was in university) that it is a better choice to use in-order multiple issue architectures than it is to use out of order multiple issue architectures. Either way you are unlikely to execute more than 4 instructions per cycle in a single stream (the limits of dataflow, regardless of number of registers), and either way you have about equal chance of issuing more instructions in parallel in a single stream. Though in-order multiple issue has the advantage of being simpler to implement, and reducing potential propagation issues by being able to issue instructions without any extra pipeline delays (unlike most out of order implementations). Uses less components positive, simplifies the pipeline positive, at least equals potential performance positive. In either case there will need to be well optimized code.
LDP/STP is much much faster than LDM/STM. That is true.
Here is a cool thing!
I like LDP/STP because you can give the same register twice, which you can't with LDM/STM.
For example, I have a C structure that is 16 bytes in size and I want to zero it all.
The compiler changes "memset( &mystruct, 0, 16 )" into, say, "STP XZR, XZR, [X25]" (using register 31, the zero register).
You can't do that in one instruction with STM.
Edit: You can do it in two instructions with NEON - as you say!
Yes there are definite advantages to the LDP/STP instructions. Now if we can get our conditionals back, have a way to execute normal ARM code without having to go through 3 state changes each way. Either that or have the licencing on 32-bit ARM cores go way down in cost so more companies are compelled to use the 32-bit, if ARM really wants to push the AARCH64 on the world in place of ARM ISA.

Heater
Posts: 13919
Joined: Tue Jul 17, 2012 3:02 pm

Re: 64-bit operating system

Tue Dec 11, 2018 5:50 pm

DavidS,
Uses less components positive, simplifies the pipeline positive, at least equals potential performance positive.
Sounds reasonable to me.

I guess the RISC V guys are on the right track then. They check all those boxes.

code_exec
Posts: 273
Joined: Sun Sep 30, 2018 12:25 pm

Re: 64-bit operating system

Tue Dec 11, 2018 5:59 pm

64-bit on the Pi is possible, and very stable. I'm writing this from a Pi 3B running 64-bit Debian MATE on Chromium with two other tabs open, and it's running smoothly for me.
Ubuntu 18.04 LTS desktop images for the Raspberry Pi 3.

https://github.com/CodeExecution/Ubuntu-ARM64-RPi

jdonald
Posts: 417
Joined: Fri Nov 03, 2017 4:36 pm

Re: 64-bit operating system

Tue Dec 11, 2018 11:52 pm

The issue with 32-bit Docker got me thinking: might it be any different with other types of containers? So I tried LXC with a 64-bit kernel on Raspbian:


sudo apt install lxc

# enable bridge networking
echo 'USE_LXC_BRIDGE="true"' | sudo tee /etc/default/lxc-net

# replace default lxc.network.type = empty
cat <<EOF | sudo tee /etc/lxc/default.conf
lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:xx:xx:xx
EOF

sudo systemctl restart lxc-net

sudo lxc-create -t download --name pi64 -- -d debian -r stretch -a arm64
# lxc.seccomp fails on Debian ARM; see lxc#1490
echo 'lxc.seccomp =' | sudo tee -a /usr/share/lxc/config/debian.common.conf
sudo lxc-start -n pi64 -d
sudo lxc-attach -n pi64

# root@pi64:/# apt install gcc
# root@pi64:/# ...
So now you can run 64-bit software on 32-bit Raspbian without resorting to multiarch. LXC is not as user-friendly as Docker but gets the job done. With this proof-of-concept I don't see any fundamental reason that this shouldn't be possible with docker-ce:armhf, so I'll file a ticket with them.
Last edited by jdonald on Fri Dec 21, 2018 7:00 pm, edited 1 time in total.

Gavinmc42
Posts: 4057
Joined: Wed Aug 28, 2013 3:31 am

Re: 64-bit operating system

Wed Dec 12, 2018 12:28 am

64-bit on the Pi is possible, and very stable. I'm writing this from a Pi 3B running 64-bit Debian MATE on Chromium with two other tabs open, and it's running smoothly for me.
Try Gentoo64 with Firefox, I got up to 30+ tabs and gave up adding and counting more.
It is also a bit more bleeding edge and has newer stuff than the normal Debian.
I'm dancing on Rainbows.
Raspberries are not Apples or Oranges

Heater
Posts: 13919
Joined: Tue Jul 17, 2012 3:02 pm

Re: 64-bit operating system

Wed Dec 12, 2018 12:46 am

Gavinmc42,

Wow, good old Gentoo. I have not used that since 2000 or so. Not thought about it much since. Glad to see it's still going strong.

Any sign of webgl working? For example http://webglsamples.org/blob/blob.html

Daniel Gessel
Posts: 26
Joined: Sun Dec 03, 2017 1:47 am

Re: 64-bit operating system

Thu Dec 13, 2018 2:45 pm

My 2 cents is that while I’d love to see a 64-bit “official” Raspbian, I’d prioritize 64-bit Raspberry Pi Desktop because I think it would benefit the educational mission of the foundation to get teachers to switch lab computers and their own laptops/desktops to the environment their students are using (without e.g. losing the benefit of > 4GB of memory, especially useful in the edu environment for stuff like media creation).

Dan

Gavinmc42
Posts: 4057
Joined: Wed Aug 28, 2013 3:31 am

Re: 64-bit operating system

Fri Dec 14, 2018 12:49 am

My 2 cents is that while I’d love to see a 64-bit “official” Raspbian
If kids NEED 64bit OS's then does the OS distribution NEED to be Raspbian?
I'm assuming serious stuff needs 64bit so anyone doing that sort of stuff could learn any OS?
In 10 years time kids will be saying "What's Linux, I use Raspbian"?
Raspbian then PiCore then Gentoo64 and on my PC's Mint.
That's 4 Linux Distributions I use to write OS less code for Pi's with Ultibo.

Blender for Artists works better on my Gentoo64 Pi box :D
Neddy Seagoon and Sakaki have made Gentoo64 work on Pi's.
It is almost at the stage where I can move Pi development 100% to Gentoo64.
Any sign of webgl working? For example http://webglsamples.org/blob/blob.html
Heater, don't have internet here for my Gentoo64 box, will get back to you on the answer.
WebGL uses OpenGLES, but this is something I want to do and even extend it to GLTF.
That's why I have moved to 64 bit OS on Pi's to find answers to stuff like this.

ejolson
Posts: 3825
Joined: Tue Mar 18, 2014 11:47 am

Re: 64-bit operating system

Fri Dec 14, 2018 1:41 am

Gavinmc42 wrote:
Fri Dec 14, 2018 12:49 am
My 2 cents is that while I’d love to see a 64-bit “official” Raspbian
If kids NEED 64bit OS's then does the OS distribution NEED to be Raspbian?
From what I understand, Raspbian is for the benefit of teachers and parents who just want the computer to be configured by default in a way suitable for children to learn computer science. Raspberry Pi Desktop on x86 compatible hardware provides a familiar programming environment for those already comfortable with Raspbian.

Given that common desktop computers have been 64-bit capable for more than a decade, I found it surprising that Raspberry Pi Desktop was 32-bit compatible. While very few people are running Pentium III, original Athlon processors and earlier, there are some nested virtualization solutions (e.g. a virtual machine running inside another virtual machine) for which 32-bit is required. Maybe that is why Raspberry Pi Desktop is 32-bit.

W. H. Heydt
Posts: 11102
Joined: Fri Mar 09, 2012 7:36 pm
Location: Vallejo, CA (US)

Re: 64-bit operating system

Fri Dec 14, 2018 4:38 am

ejolson wrote:
Fri Dec 14, 2018 1:41 am
While very few people are running Pentium III, original Athlon processors and earlier,....
Hmmm... I think my Win98SE box has a P-III. I know that my SuSE system has dual Opteron-240 CPUs, though. I built that one in 2003 and picked SuSE because it was the only commercially available 64-bit Linux system at the time.

Daniel Gessel
Posts: 26
Joined: Sun Dec 03, 2017 1:47 am

Re: 64-bit operating system

Fri Dec 14, 2018 2:14 pm

I see value in a 64-bit Raspbian due to the performance implications of having 64-bit instructions for arbitrary precision arithmetic and the potential to improve the speed of certain applications, e.g. Mathematica. I’d switch to Raspberry Pi Desktop if it were 64-bit. What I’m doing isn’t serious enough to need multiple OS’s, so I’d be happy to stick with one.
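A minimal sketch of why limb width matters for arbitrary precision arithmetic: a multi-word addition with 64-bit limbs. The name bignum_add and the little-endian limb layout are assumptions for illustration, not the representation used by Mathematica or any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds two big numbers stored as `limbs` 64-bit words each, least
 * significant word first: sum = a + b. Returns the carry out of the top
 * limb (0 or 1). sum may alias a or b. */
static unsigned bignum_add(uint64_t *sum, const uint64_t *a,
                           const uint64_t *b, size_t limbs)
{
    assert(sum != NULL && a != NULL && b != NULL);
    unsigned carry = 0;
    for (size_t i = 0; i < limbs; i++) {
        uint64_t partial = a[i] + carry;       /* wraps to 0 on overflow */
        unsigned carry_in = partial < a[i];
        uint64_t total = partial + b[i];
        unsigned carry_out = total < partial;
        sum[i] = total;
        carry = carry_in | carry_out;          /* at most one can be set */
    }
    return carry;
}
```

A number of a given size needs half as many 64-bit limbs as 32-bit limbs, so the loop runs half as often on a 64-bit build, and a 32-bit build additionally has to assemble each uint64_t addition from two 32-bit instructions.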

Gavinmc42
Posts: 4057
Joined: Wed Aug 28, 2013 3:31 am

Re: 64-bit operating system

Sat Dec 15, 2018 5:10 am

Any sign of webgl working? For example http://webglsamples.org/blob/blob.html

Heater, don't have internet here for my Gentoo64 box, will get back to you on the answer.
Heater - 19fps on my Celeron Core Duo Mint box and 5fps on my Gentoo64 Pi3B+.
Aquarium crashed Firefox on Gentoo64 and runs 10fps on Mint.
Gentoo64 Firefox is the Nightly 63.0.3 Developers? version.
The saga Neddy Seagoon went through to compile it is on the Gentoo forums.

If I remember right, Firefox is using the Servo engine, not sure if that has VC4 hardware acceleration?
Yep this WebGL stuff is pushing the Firefox browser in Gentoo64 over the edge ;)
Firefox might crash but Gentoo64 is still running :D

Mind you the point of me getting a 64bit OS was to learn how to code GL without an OS.
So a browser that is perfectly fine with many tabs of text/GL tutorials is ok for my stuff, at the moment.
Who knows, now that Aarch64 Pi's are doable engines like Servo can be improved by someone from the Pi community.

For gaming a Steam type baremetal OS running GLTF models should be possible.
Probably not for commercial use but certainly for research and Computer Science coding.
This is not something that needs to be handicapped by only doing it 32bits.
All four of those 64bit 1.4GHz Arm, 128 bit NEON cores I suspect will be needed?
