jdonald
Posts: 413
Joined: Fri Nov 03, 2017 4:36 pm

Re: Why moving to 64bit?

Tue Aug 13, 2019 1:15 pm

Running some quick experiments, I see now. Part of the reason I was getting incorrect output without -mcpu prior is that I instinctively used:

Code:

    -march=armv7-a -mtune=cortex-a72

Without knowing the semantics behind the args, this seemed appropriate at first. After all I am trying to generate ARMv7 instructions that are compatible with a Cortex-A72. I guess there's a backstory for why at a minimum it requires -march=armv8-a (verified sufficient for a basic 32-bit division program), or for something simpler: what jahboater said.

Adding or omitting -mfloat-abi=hard makes no difference in the output binary as far as I've seen, as hard float is the default.
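
One way to see what a given flag combination actually resolves to is GCC's `-Q --help=target` report. The sketch below is only an illustration, not the commands used in this post: `show_target_opts` is a hypothetical helper name, the flag set is just the one discussed here, and the function is meant to be called on the Pi itself (or with an ARM cross compiler passed as the first argument).

```shell
#!/bin/sh
# Flags from the post above for 32-bit code on a Cortex-A72.
PI4_CFLAGS="-march=armv8-a -mtune=cortex-a72"

# Hypothetical helper: print how the compiler resolves the main ARM target
# options for the given flags. $1 is the compiler, the rest are its flags.
show_target_opts() {
    cc=$1
    shift
    "$cc" "$@" -Q --help=target |
        grep -E -e '-m(arch|cpu|tune|fpu|float-abi)='
}

# Example calls (not executed here, they need an ARM-targeting GCC):
#   show_target_opts gcc $PI4_CFLAGS
#   show_target_opts gcc $PI4_CFLAGS -mfloat-abi=hard
# Identical -mfloat-abi lines in both reports would be consistent with
# hard float already being the default on that toolchain.
```

Comparing the two reports is a cheaper check than diffing the output binaries, though it only shows what the compiler driver selected, not what code was generated.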

jahboater
Posts: 4678
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why moving to 64bit?

Tue Aug 13, 2019 2:19 pm

You may want to add:

-mfpu=neon-fp-armv8

or it will use the old VFP instead (even with -march=native)

Brad Q
Posts: 26
Joined: Mon Aug 12, 2019 12:10 am

Re: Why moving to 64bit?

Tue Aug 13, 2019 6:13 pm

Having been around when Fedora Core (FC4) (and yes, it used to be Fedora Core and not just Fedora) was making its jump to a mainstream 64-bit version, most of this thread is just replaying old music. Which is pretty amusing.

One of the "problems" I see is that the nature of Pi machines seems to be in transition, going from a task-specific type of machine to a low-energy-consumption desktop PC. This seems to disturb some. I suspect that the OS will eventually evolve into two main versions: a 64-bit version for the desktop crowd and a more streamlined (sans even current desktop considerations) 32-bit version for the more traditional Pi use.

Keep in mind this is my first Pi (4-4gb) and that my intended use is email and browsing level usage.

Heater
Posts: 13299
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why moving to 64bit?

Tue Aug 13, 2019 7:00 pm

Well, transition indeed.

The ARM has been growing up. From 32 bits to 64, from single core to many, from megs of RAM to gigs. If it were not for the pressure from smartphones, smart TVs and such it might have stayed a 32-bit machine used as a microcontroller forever. Not to mention the move now to the server space.

Meanwhile the Pi, starting as a small and cheap machine targeted at kids' education, was rapidly taken up by hackers and makers and then found itself in demand for creating products. Then came a whole new crowd that wants nothing but a cheap way to watch TV and the like.

It's a very wide spectrum of use cases.

jdonald
Posts: 413
Joined: Fri Nov 03, 2017 4:36 pm

Re: Why moving to 64bit?

Thu Aug 15, 2019 11:04 pm

I've run more benchmarks on the controlled setup. After sysbench I figured I'd start with elliptic curve crypto in order to put to rest the rumors from a year ago.

Below you can find the raw numbers for ECDH 64-bit followed by 32-bit ECDH, then 64-bit ECDSA followed by 32-bit ECDSA.

Code:

(pi64)pi@raspberrypi:~ $ openssl speed ecdh
...
OpenSSL 1.1.1c  28 May 2019
built on: Thu May 30 15:27:48 2019 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr) 
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-vZWY2W/openssl-1.1.1c=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
                              op      op/s
 160 bits ecdh (secp160r1)   0.0008s   1211.3
 192 bits ecdh (nistp192)   0.0010s   1003.2
 224 bits ecdh (nistp224)   0.0014s    707.6
 256 bits ecdh (nistp256)   0.0003s   3813.7
 384 bits ecdh (nistp384)   0.0043s    231.6
 521 bits ecdh (nistp521)   0.0113s     88.7
 163 bits ecdh (nistk163)   0.0012s    853.9
 233 bits ecdh (nistk233)   0.0017s    580.3
 283 bits ecdh (nistk283)   0.0036s    279.0
 409 bits ecdh (nistk409)   0.0075s    132.7
 571 bits ecdh (nistk571)   0.0157s     63.7
 163 bits ecdh (nistb163)   0.0012s    812.8
 233 bits ecdh (nistb233)   0.0018s    552.7
 283 bits ecdh (nistb283)   0.0039s    257.3
 409 bits ecdh (nistb409)   0.0084s    119.7
 571 bits ecdh (nistb571)   0.0175s     57.1
 256 bits ecdh (brainpoolP256r1)   0.0016s    632.2
 256 bits ecdh (brainpoolP256t1)   0.0016s    629.7
 384 bits ecdh (brainpoolP384r1)   0.0043s    231.3
 384 bits ecdh (brainpoolP384t1)   0.0043s    233.3
 512 bits ecdh (brainpoolP512r1)   0.0085s    117.5
 512 bits ecdh (brainpoolP512t1)   0.0085s    118.3
 253 bits ecdh (X25519)   0.0003s   3524.4
 448 bits ecdh (X448)   0.0018s    566.0

(pi32)pi@raspberrypi:~/openssl-1.1.1c/build_shared/apps $ LD_LIBRARY_PATH=.. ./openssl speed ecdh
...
OpenSSL 1.1.1c  28 May 2019
built on: Thu May 30 15:27:48 2019 UTC # jdonald NB: "built on" misleading because it uses SOURCE_DATE_EPOCH
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr) 
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -march=armv8-a+crc+simd -mtune=cortex-a72 -mfpu=neon-fp-armv8 -g -O2 -fdebug-prefix-map=/home/pi/openssl-1.1.1c=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
                              op      op/s
 160 bits ecdh (secp160r1)   0.0010s    978.6
 192 bits ecdh (nistp192)   0.0015s    675.3
 224 bits ecdh (nistp224)   0.0021s    479.1
 256 bits ecdh (nistp256)   0.0004s   2348.4
 384 bits ecdh (nistp384)   0.0078s    128.9
 521 bits ecdh (nistp521)   0.0193s     51.9
 163 bits ecdh (nistk163)   0.0011s    870.2
 233 bits ecdh (nistk233)   0.0019s    522.3
 283 bits ecdh (nistk283)   0.0034s    291.3
 409 bits ecdh (nistk409)   0.0072s    139.6
 571 bits ecdh (nistk571)   0.0167s     59.8
 163 bits ecdh (nistb163)   0.0012s    819.2
 233 bits ecdh (nistb233)   0.0021s    479.0
 283 bits ecdh (nistb283)   0.0038s    266.2
 409 bits ecdh (nistb409)   0.0081s    123.1
 571 bits ecdh (nistb571)   0.0189s     52.9
 256 bits ecdh (brainpoolP256r1)   0.0027s    371.9
 256 bits ecdh (brainpoolP256t1)   0.0027s    372.4
 384 bits ecdh (brainpoolP384r1)   0.0078s    128.6
 384 bits ecdh (brainpoolP384t1)   0.0077s    129.3
 512 bits ecdh (brainpoolP512r1)   0.0110s     91.3
 512 bits ecdh (brainpoolP512t1)   0.0109s     91.7
 253 bits ecdh (X25519)   0.0005s   1839.8
 448 bits ecdh (X448)   0.0026s    383.9
 
(pi64)pi@raspberrypi:~ $ openssl speed ecdsa
...
OpenSSL 1.1.1c  28 May 2019
built on: Thu May 30 15:27:48 2019 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr) 
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-vZWY2W/openssl-1.1.1c=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
                              sign    verify    sign/s verify/s
 160 bits ecdsa (secp160r1)   0.0009s   0.0007s   1135.5   1389.2
 192 bits ecdsa (nistp192)   0.0011s   0.0009s    944.2   1152.4
 224 bits ecdsa (nistp224)   0.0015s   0.0012s    668.7    848.8
 256 bits ecdsa (nistp256)   0.0001s   0.0003s   8275.4   2871.0
 384 bits ecdsa (nistp384)   0.0045s   0.0033s    220.2    302.9
 521 bits ecdsa (nistp521)   0.0119s   0.0082s     84.3    121.4
 163 bits ecdsa (nistk163)   0.0013s   0.0025s    798.6    404.2
 233 bits ecdsa (nistk233)   0.0018s   0.0036s    545.0    274.5
 283 bits ecdsa (nistk283)   0.0038s   0.0075s    263.3    132.6
 409 bits ecdsa (nistk409)   0.0079s   0.0156s    127.1     64.1
 571 bits ecdsa (nistk571)   0.0163s   0.0323s     61.2     30.9
 163 bits ecdsa (nistb163)   0.0013s   0.0026s    761.6    385.7
 233 bits ecdsa (nistb233)   0.0019s   0.0038s    518.8    260.3
 283 bits ecdsa (nistb283)   0.0041s   0.0080s    246.9    124.4
 409 bits ecdsa (nistb409)   0.0087s   0.0172s    115.3     58.1
 571 bits ecdsa (nistb571)   0.0182s   0.0360s     54.9     27.8
 256 bits ecdsa (brainpoolP256r1)   0.0017s   0.0014s    598.8    695.8
 256 bits ecdsa (brainpoolP256t1)   0.0017s   0.0013s    599.9    755.6
 384 bits ecdsa (brainpoolP384r1)   0.0046s   0.0035s    219.5    282.9
 384 bits ecdsa (brainpoolP384t1)   0.0045s   0.0033s    221.7    305.6
 512 bits ecdsa (brainpoolP512r1)   0.0089s   0.0064s    112.3    155.4
 512 bits ecdsa (brainpoolP512t1)   0.0088s   0.0059s    113.1    168.8
 
(pi32)pi@raspberrypi:~/openssl-1.1.1c/build_shared/apps $ LD_LIBRARY_PATH=.. ./openssl speed ecdsa
...
OpenSSL 1.1.1c  28 May 2019
built on: Thu May 30 15:27:48 2019 UTC # jdonald NB: "built on" misleading because it uses SOURCE_DATE_EPOCH
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr) 
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -march=armv8-a+crc+simd -mtune=cortex-a72 -mfpu=neon-fp-armv8 -g -O2 -fdebug-prefix-map=/home/pi/openssl-1.1.1c=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
                              sign    verify    sign/s verify/s
 160 bits ecdsa (secp160r1)   0.0011s   0.0009s    910.1   1128.1
 192 bits ecdsa (nistp192)   0.0016s   0.0012s    640.0    809.1
 224 bits ecdsa (nistp224)   0.0022s   0.0017s    457.0    596.6
 256 bits ecdsa (nistp256)   0.0002s   0.0006s   4250.7   1612.2
 384 bits ecdsa (nistp384)   0.0081s   0.0057s    123.0    175.9
 521 bits ecdsa (nistp521)   0.0202s   0.0136s     49.4     73.7
 163 bits ecdsa (nistk163)   0.0012s   0.0024s    810.5    411.6
 233 bits ecdsa (nistk233)   0.0020s   0.0039s    497.1    254.4
 283 bits ecdsa (nistk283)   0.0036s   0.0072s    276.2    139.8
 409 bits ecdsa (nistk409)   0.0077s   0.0149s    130.6     67.0
 571 bits ecdsa (nistk571)   0.0178s   0.0348s     56.2     28.7
 163 bits ecdsa (nistb163)   0.0013s   0.0025s    770.4    394.1
 233 bits ecdsa (nistb233)   0.0022s   0.0042s    463.7    236.4
 283 bits ecdsa (nistb283)   0.0040s   0.0078s    253.0    128.8
 409 bits ecdsa (nistb409)   0.0085s   0.0166s    117.9     60.1
 571 bits ecdsa (nistb571)   0.0199s   0.0391s     50.2     25.6
 256 bits ecdsa (brainpoolP256r1)   0.0028s   0.0023s    354.0    441.2
 256 bits ecdsa (brainpoolP256t1)   0.0028s   0.0021s    354.6    470.6
 384 bits ecdsa (brainpoolP384r1)   0.0081s   0.0061s    123.1    162.8
 384 bits ecdsa (brainpoolP384t1)   0.0081s   0.0057s    123.8    176.5
 512 bits ecdsa (brainpoolP512r1)   0.0115s   0.0085s     86.7    117.2
 512 bits ecdsa (brainpoolP512t1)   0.0114s   0.0079s     87.8    126.7
 
In order to ensure a fair baseline for 32-bit I added -march=armv8-a+crc+simd -mtune=cortex-a72 -mfpu=neon-fp-armv8 and built 32-bit openssl from source in a Debian armhf container. It was a bit tricky as providing CFLAGS unexpectedly overrides the default flags (including -O3) leading to erroneous results. One needs to make sure the overridden CFLAGS and CXXFLAGS are the combined set before running dpkg-buildpackage. This build from source ultimately didn't change ARMv7 libcrypto performance much (certainly not the 10x difference seen in sysbench), but was done for diligence.
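
One way to avoid the override pitfall described here is to append to Debian's default flags instead of replacing them. The sketch below assumes the package build takes its flags from `dpkg-buildflags`, which honours `DEB_CFLAGS_APPEND` and `DEB_CXXFLAGS_APPEND`. Whether a particular package's rules file respects them is something to verify, and this is not necessarily how the build in this post was done.

```shell
#!/bin/sh
# Extra tuning flags from this post, added on top of the distro defaults.
EXTRA_FLAGS="-march=armv8-a+crc+simd -mtune=cortex-a72 -mfpu=neon-fp-armv8"

# dpkg-buildflags appends these to its default CFLAGS/CXXFLAGS
# (optimisation level, hardening options, ...) rather than replacing them.
export DEB_CFLAGS_APPEND="$EXTRA_FLAGS"
export DEB_CXXFLAGS_APPEND="$EXTRA_FLAGS"

# Show the combined set before building, if the tool is available.
if command -v dpkg-buildflags >/dev/null 2>&1; then
    dpkg-buildflags --get CFLAGS
    dpkg-buildflags --get CXXFLAGS
fi

# Then, inside the unpacked source tree of the package:
#   dpkg-buildpackage -b -us -uc
```

Printing the combined flags first makes it easy to confirm that the optimisation level and hardening options are still present before spending time on a full build.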

The median speedup here is +28.9% for ECDH and +27.8% for ECDSA. Not 3x, but higher than I expected.
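
The median figure can be reproduced from the op/s columns of the two ECDH runs. The sketch below is one possible way to do it, not necessarily how the numbers in this post were derived. `median` and `median_speedup_pct` are hypothetical helper names, and each input line holds the 64-bit op/s followed by the 32-bit op/s for one curve.

```shell
#!/bin/sh
# Print the median of the numbers read from stdin (one per line).
median() {
    LC_ALL=C sort -n | LC_ALL=C awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            if (NR % 2) print v[(NR + 1) / 2]
            else print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# Read "ops64 ops32" pairs from stdin and print the median gain in percent.
median_speedup_pct() {
    LC_ALL=C awk '{ print $1 / $2 }' | median |
        LC_ALL=C awk '{ printf "%.1f\n", ($1 - 1) * 100 }'
}

# ECDH op/s from the two runs: 64-bit first, 32-bit second.
median_speedup_pct <<'EOF'
1211.3 978.6
1003.2 675.3
707.6 479.1
3813.7 2348.4
231.6 128.9
88.7 51.9
853.9 870.2
580.3 522.3
279.0 291.3
132.7 139.6
63.7 59.8
812.8 819.2
552.7 479.0
257.3 266.2
119.7 123.1
57.1 52.9
632.2 371.9
629.7 372.4
231.3 128.6
233.3 129.3
117.5 91.3
118.3 91.7
3524.4 1839.8
566.0 383.9
EOF
# prints 28.9
```

The median hides a wide spread: the prime-field curves gain the most, while the binary-field (nistk/nistb) curves are roughly level between the two builds.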

I did some rough tests for RSA, DSA, and HMAC and so far those results point towards a wash for 64-bit vs 32-bit.

Which brings us to the cipher, the part of HTTPS that executes across every byte of content you download. Running AES (specifically focused on openssl speed -evp aes-256-gcm) showed a major anomaly, with 64-bit achieving half the throughput. I investigated, and if you look closely at the compiler options above, the likely reason is there. The Debian aarch64 builds don't include -DAES_ASM, because it turns out OpenSSL doesn't yet have an aarch64 AES assembly implementation (despite having VPAES). I hope this gets fixed in the not too distant future.
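
Whether a given OpenSSL build was compiled with a particular assembly macro can be checked from the `compiler:` line that `openssl version -f` prints, the same line that appears in the `openssl speed` output. A minimal sketch, with `has_define` and `report` as hypothetical helper names. It matches whole `-D` tokens so that `-DVPAES_ASM` is not mistaken for `-DAES_ASM`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if stdin contains the exact token -D<NAME>
# (optionally followed by =value), e.g. "has_define AES_ASM".
has_define() {
    grep -q -E -e "(^|[[:space:]])-D$1(=|[[:space:]]|\$)"
}

# Typical use on the machine under test (not executed here):
#   openssl version -f | has_define AES_ASM && echo "AES_ASM is defined"

# Abbreviated compiler lines from the 64-bit and 32-bit builds in this post.
line64="compiler: gcc -fPIC -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM"
line32="compiler: gcc -fPIC -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM"

# Print whether AES_ASM is present in the given compiler line.
report() {
    if printf '%s\n' "$2" | has_define AES_ASM; then
        echo "$1: AES_ASM defined"
    else
        echo "$1: AES_ASM not defined"
    fi
}

report "64-bit" "$line64"
report "32-bit" "$line32"
```

The macro only says how the library was configured. It does not by itself show which AES code path is selected at run time.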

As far as I can tell, OpenSSH does not use libcrypto's implementation of AES. I'm curious as to whether it has the same issue with its 64-bit implementation. Unfortunately it does not appear to have a convenient benchmarking command-line option like OpenSSL.

jerrm
Posts: 194
Joined: Wed May 02, 2018 7:35 pm

Re: Why moving to 64bit?

Fri Aug 16, 2019 3:07 am

I saw similar differences; OpenSSH was another off-the-curve performance improvement. 64 bit was a good 30% faster for OpenSSH transfers.

I saw similar AES256 performance when testing OpenVPN. 32 bit gave 112 Mbps before pegging the Pi CPU. 64 bit gave around 99 Mbps. It would be nice if the OpenSSL assembly code gets ported to aarch64, but with hardware AES being so prevalent elsewhere I'm afraid it may not get much attention.

It's a shame RPF couldn't get hardware encryption extensions in the SoC. The Pi is now CPU bound instead of IO bound.

Below are the OpenSSH tests I ran. The remote device is the Pi4/2GB in these tests. The 64 bit target is running Sakaki's 7/28 image with a Deb64 chroot OpenSSH. 64/32 bit is Sakaki with 32bit Raspbian userland. 32 bit is pure Raspbian. The local device is a 7-year-old i3 notebook PC running Buster. All tests were using the repo versions of OpenSSH. The method I used for benchmarking may be crude, but the results carry over to real life rsync jobs.

Code:

=============================================================================================
64 BIT
=============================================================================================
time dd if=/dev/zero bs=10240 count=409600 | ssh -p 2222 root@192.168.22.58 'cat > /dev/zero'
---------------------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 60.6595 s, 69.1 MB/s

real    1m0.692s
user    0m35.075s
sys     0m11.436s

409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 57.0103 s, 73.6 MB/s

real    0m57.046s
user    0m34.734s
sys     0m9.792s

---------------------------------------------------------------------------------------
time ssh -p 2222 root@192.168.22.58 'dd if=/dev/zero bs=10240 count=409600' > /dev/null
---------------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 68.5412 s, 61.2 MB/s

real    1m8.837s
user    0m38.308s
sys     0m11.058s

409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 69.4922 s, 60.4 MB/s

real    1m9.784s
user    0m38.732s
sys     0m10.975s


=====================================================================================
64/32 BIT
=====================================================================================
time dd if=/dev/zero bs=10240 count=409600 | ssh root@192.168.22.58 'cat > /dev/zero'
-------------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 80.2948 s, 52.2 MB/s

real    1m20.338s
user    0m40.381s
sys     0m11.419s

409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 80.8875 s, 51.9 MB/s

real    1m20.931s
user    0m41.776s
sys     0m11.443s

-------------------------------------------------------------------------------
time ssh root@192.168.22.58 'dd if=/dev/zero bs=10240 count=409600' > /dev/null
-------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 92.5082 s, 45.3 MB/s

real    1m32.887s
user    0m44.963s
sys     0m15.377s

409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 92.8123 s, 45.2 MB/s

real    1m33.179s
user    0m44.541s
sys     0m15.030s


=====================================================================================
32 BIT
=====================================================================================
time dd if=/dev/zero bs=10240 count=409600 | ssh root@192.168.22.58 'cat > /dev/zero'
-------------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 91.1898 s, 46.0 MB/s

real    1m31.246s
user    0m43.491s
sys     0m12.446s

409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 90.1818 s, 46.5 MB/s

real    1m30.231s
user    0m44.946s
sys     0m10.945s

-------------------------------------------------------------------------------
time ssh root@192.168.22.58 'dd if=/dev/zero bs=10240 count=409600' > /dev/null
-------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 96.0945 s, 43.6 MB/s

real    1m36.693s
user    0m48.734s
sys     0m13.826s

409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 94.0497 s, 44.6 MB/s

real    1m34.669s
user    0m46.585s
sys     0m14.196s
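
The MB/s figure that `dd` reports is simply bytes divided by elapsed seconds, so runs can be compared directly from the summary lines. A small sketch for doing so, assuming GNU `dd`'s summary format as it appears in the transcripts. `dd_rate` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read dd summary lines on stdin and print MB/s
# (decimal megabytes, as GNU dd reports) recomputed from bytes and seconds.
dd_rate() {
    LC_ALL=C awk '$2 == "bytes" {
        for (i = 3; i < NF; i++)
            if ($i == "copied,") {
                printf "%.1f\n", $1 / $(i + 1) / 1000000
                break
            }
    }'
}

# First upload run of the 64-bit and of the 32-bit tests.
dd_rate <<'EOF'
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 60.6595 s, 69.1 MB/s
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 91.1898 s, 46.0 MB/s
EOF
# prints 69.1 and 46.0, matching dd's own figures
```

Feeding the `dd` lines of several runs through the same helper gives a column of rates that is easy to average or compare.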

ejolson
Posts: 3553
Joined: Tue Mar 18, 2014 11:47 am

Re: Why moving to 64bit?

Fri Aug 16, 2019 4:45 am

jdonald wrote:
Thu Aug 15, 2019 11:04 pm
Running AES (specifically focused on openssl speed -evp aes-256-gcm) showed a major anomaly with 64-bit achieving half the throughput. I investigated and if you look closely at the compiler options above the likely reason is there.
My understanding is that many of the 64-bit ARM processors have AES cryptographic extensions in hardware as described here. If that's the case, it could explain why nobody has hand coded AES directly in 64-bit ARM assembler.

From what I understand, the SOC for the Pi 3B+ omitted the hardware AES instructions. Does anyone know whether the 4B has them?
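
On Linux this can usually be answered from the `Features` line of `/proc/cpuinfo`: cores with the ARMv8 cryptography extension advertise `aes` (along with `pmull`, `sha1` and `sha2`). A minimal sketch, with `cpu_has_feature` as a hypothetical helper name. The optional second argument exists only so the check can be pointed at a saved copy of the file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FEATURE appears as a whole word on a
# "Features" line of the given cpuinfo file (default: /proc/cpuinfo).
cpu_has_feature() {
    feature=$1
    file=${2:-/proc/cpuinfo}
    grep -i '^Features' "$file" | tr -s ' \t:' '\n' | grep -q -x -e "$feature"
}

# On the board in question (not executed here):
#   if cpu_has_feature aes; then
#       echo "AES instructions reported"
#   else
#       echo "no AES instructions reported"
#   fi
```

Matching whole words matters here, since a substring search for `sha` or `aes` could hit unrelated feature names.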

jdonald
Posts: 413
Joined: Fri Nov 03, 2017 4:36 pm

Re: Why moving to 64bit?

Fri Aug 16, 2019 5:59 am

Looks like I misspoke when I said OpenSSL doesn't have an AES assembly implementation for aarch64. aesv8-armx.pl generates such assembly and even has a Cortex-A72 performance estimate in the comments. The problem seems to be more subtle in that bsaes-armv7.pl exists but there's no bsaes-armv8.pl, causing the configuration logic not to define the AES_ASM macro in the arm64 case. Then distros like Debian or Ubuntu don't care if this is broken because they prioritize compatibility with Cortex-A53 chips (including the Pi 3B+).

This requires more investigation, and I'd appreciate it if someone can point out the proper incantation to get this to build with AES accelerator instructions for arm64. Merely forcing -DAES_ASM=1 ultimately causes linker errors.

jerrm, thanks for running more tests, particularly OpenSSH.

However, in light of what pica200 and others pointed out isn't the biggest concern with your methodology that you're comparing 64-bit programs against ARMv6 ones? Furthermore, even if you used ARMv7 binaries it would still fail to account for the performance difference of using first-gen ARMv7 vs higher-end ARMv7. While you might hope this part to be negligible (as it sometimes is), in the case of sysbench there's a 10x performance difference.

I think the only way to properly account for that is to compile your 32-bit baseline test programs with -march=armv8-a+crc+simd -mtune=cortex-a72 -mfpu=neon-fp-armv8. Or at the very least, you could run your baseline tests in a Debian armhf chroot instead of Raspbian's userland to avoid comparing against ARMv6.

dp11
Raspberry Pi Engineer & Forum Moderator
Posts: 14
Joined: Thu Dec 29, 2011 5:46 pm

Re: Why moving to 64bit?

Fri Aug 16, 2019 6:29 am

My understanding is that if the crypto instructions exist they also exist in 32bit world. See pdf near the end for instruction timings http://infocenter.arm.com/help/index.js ... index.html

jerrm
Posts: 194
Joined: Wed May 02, 2018 7:35 pm

Re: Why moving to 64bit?

Fri Aug 16, 2019 12:38 pm

jdonald wrote:
Fri Aug 16, 2019 5:59 am
jerrm thanks for running more tests particularly OpenSSH.

However, in light of what pica200 and others pointed out isn't the biggest concern with your methodology that you're comparing 64-bit programs against ARMv6 ones? Furthermore, even if you used ARMv7 binaries it would still fail to account for the performance difference of using first-gen ARMv7 vs higher-end ARMv7. While you might hope this part to be negligible (as it sometimes is), in the case of sysbench there's a 10x performance difference.
I'm only interested in "out of the box" performance. I'm perfectly capable of custom builds, but anything that breaks "apt-get upgrade" keeping security fixes current is a non-starter for our general use cases.

There have been times when I've had to have custom components and kernels, but it's not a path to go down when there are other options (like boards with aes extensions).

pica200
Posts: 138
Joined: Tue Aug 06, 2019 10:27 am

Re: Why moving to 64bit?

Fri Aug 16, 2019 12:48 pm

There was a different thread where someone complained about poor SSH speed and i pointed out how the crypto extensions would have made a difference but apparently the userbase was not worth the few cents more per SoC. I would have happily paid 1€ more for hardware AES and SHA1/2. The A72 can do software crypto at nearly 1 Gbit/s but it comes at the high cost of slowing down everything else massively. Every other sane ARMv8 based SoC i know does have the crypto extensions.

And yeah, you can see that especially crypto benefits from 64 bit as i have predicted ;)

rpdom
Posts: 15184
Joined: Sun May 06, 2012 5:17 am
Location: Chelmsford, Essex, UK

Re: Why moving to 64bit?

Fri Aug 16, 2019 12:59 pm

pica200 wrote:
Fri Aug 16, 2019 12:48 pm
There was a different thread where someone complained about poor SSH speed and i pointed out how the crypto extensions would have made a difference but apparently the userbase was not worth the few cents more per SoC. I would have happily paid 1€ more for hardware AES and SHA1/2.
The Pi is built to a fixed price point for the standard model. $35. Not $36. Not $35.50. Not $35.05. $35. That is set in stone. The margins are tight. Some stuff has to be left out to get it down to that price. Charging extra for something that most people won't even notice is pointless.

jerrm
Posts: 194
Joined: Wed May 02, 2018 7:35 pm

Re: Why moving to 64bit?

Fri Aug 16, 2019 1:05 pm

pica200 wrote:
Fri Aug 16, 2019 12:48 pm
There was a different thread where someone complained about poor SSH speed and i pointed out how the crypto extensions would have made a difference but apparently the userbase was not worth the few cents more per SoC.
Yeah, i'd like to know what the real tradeoffs were. There was an early response to the question where jamesh didn't seem to really know the answer("Not as far as I can ascertain."). I would have considered leaving out the extensions a major compromise that would have led to a lot of hand wringing. I hope it wasn't just an oversight.

pica200
Posts: 138
Joined: Tue Aug 06, 2019 10:27 am

Re: Why moving to 64bit?

Fri Aug 16, 2019 1:16 pm

rpdom wrote:
Fri Aug 16, 2019 12:59 pm
Charging extra for something that most people won't even notice is pointless.
The numbers tell a different story. I think quite a few people will notice how it doesn't reach the advertised 1 Gbit/s because it is now CPU bound resulting from saving money at the wrong end. This also results in much higher power usage than necessary.

Aydan
Posts: 692
Joined: Fri Apr 13, 2012 11:48 am
Location: Germany, near Lake Constance

Re: Why moving to 64bit?

Fri Aug 16, 2019 1:29 pm

jerrm wrote:
Fri Aug 16, 2019 1:05 pm
pica200 wrote:
Fri Aug 16, 2019 12:48 pm
There was a different thread where someone complained about poor SSH speed and i pointed out how the crypto extensions would have made a difference but apparently the userbase was not worth the few cents more per SoC.
Yeah, i'd like to know what the real tradeoffs were. There was an early response to the question where jamesh didn't seem to really know the answer("Not as far as I can ascertain."). I would have considered leaving out the extensions a major compromise that would have led to a lot of hand wringing. I hope it wasn't just an oversight.
It may have something to do with export restrictions for devices which have encryption hardware.

Heater
Posts: 13299
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why moving to 64bit?

Fri Aug 16, 2019 1:34 pm

I'm wondering who are all these people that need 100 megabytes per second in or out of their Pi, and why do they need it?

I can't collect data anywhere near that fast from any devices connected to my Pi.

If I could I cannot get it over my internet connection or mobile connection.

pica200
Posts: 138
Joined: Tue Aug 06, 2019 10:27 am

Re: Why moving to 64bit?

Fri Aug 16, 2019 1:57 pm

Aydan wrote:
Fri Aug 16, 2019 1:29 pm
It may have something to do with export restrictions for devices which have encryption hardware.
But it can encrypt even without hardware acceleration, so what is the point of restricting it any more than a device without?

Heater wrote:
Fri Aug 16, 2019 1:34 pm
I'm wondering who are all these people that need 100 megabytes per second in or out of their Pi, and why do they need it?
It's a matter of principle. When a product says it can do X but can't in the real world, it may not be fraud outright (to make it clear: that's not what I'm saying), but it's disappointing at the least. It can never reach the full 1 Gbit/s due to protocol overhead and such, which everyone knows, but what's not visible to potential customers (until they dig deeper, which few will do) is that it's now limited elsewhere.

ejolson
Posts: 3553
Joined: Tue Mar 18, 2014 11:47 am

Re: Why moving to 64bit?

Fri Aug 16, 2019 2:41 pm

Heater wrote:
Fri Aug 16, 2019 1:34 pm
I'm wondering who are all these people that need 100 megabytes per second in or out of their Pi, and why do they need it?

I can't collect data anywhere near that fast from any devices connected to my Pi.

If I could I cannot get it over my internet connection or mobile connection.
Encrypted gigabit Ethernet is useful when setting up a VPN inside a local network. It is good for any kind of secure network filesystem or sharing, again within a local network. Encryption is also used for remote desktop and login.

While security may be mandatory in corporate and university settings, in the home there are often devices--smart light bulbs, web cameras, televisions, mobile phones and video players--behind the router firewall which are insecure and make encryption a good idea on a home network.

Eggshell security, where everything behind the firewall is soft and squishy, is generally not considered a best practice. The better policy of additional security behind the firewall immediately generates a need for encrypted gigabit Ethernet.

I suspect people with only a single Pi behind their firewall router are in the minority.

pica200
Posts: 138
Joined: Tue Aug 06, 2019 10:27 am

Re: Why moving to 64bit?

Fri Aug 16, 2019 3:05 pm

I dunno. It's of course good practice but personally i'm not securing my local network much (apart from the strongest wireless encryption standard my router supports disallowing any outdated/insecure ones duh). But i care a lot about secure connections to everything outside my home and the trend goes in the same direction (end-to-end encryption, https/TLS everywhere).

I can think about another scenario though where acceleration matters. Encrypted archives. I think many have dealt with them in the past.

jerrm
Posts: 194
Joined: Wed May 02, 2018 7:35 pm

Re: Why moving to 64bit?

Fri Aug 16, 2019 3:51 pm

Heater wrote:
Fri Aug 16, 2019 1:34 pm
I'm wondering who are all these people that need 100 megabytes per second in or out of their Pi, and why do they need it?

I can't collect data anywhere near that fast from any devices connected to my Pi.

If I could I cannot get it over my internet connection or mobile connection.
I'm the first to admit our Pi uses are not standard, but we'll be purchasing 100+ units (of something) vs 1 or 2. Minuscule overall for RPF, but I'm sure there are others like us.

Even for the home user 1 Gbps internet is available and affordable here, 100 Mbps+ even more so.

ejolson
Posts: 3553
Joined: Tue Mar 18, 2014 11:47 am

Re: Why moving to 64bit?

Fri Aug 16, 2019 4:17 pm

jerrm wrote:
Fri Aug 16, 2019 3:51 pm
Heater wrote:
Fri Aug 16, 2019 1:34 pm
I'm wondering who are all these people that need 100 megabytes per second in or out of their Pi, and why do they need it?

I can't collect data anywhere near that fast from any devices connected to my Pi.

If I could I cannot get it over my internet connection or mobile connection.
I'm the first to admit our Pi uses are not standard, but we'll be purchasing 100+ units (of something) vs 1 or 2. Minuscule overall for RPF, but I'm sure there are others like us.

Even for the home user 1 Gbps internet is available and affordable here, 100 Mbps+ even more so.
Here, just outside the coverage area of the microwave towers, I'm lucky to get more than 1 Mbps. Even for local network use, most network protocols (ssh, samba, rdp, iscsi and realvnc) are encrypted by default. It actually takes quite a bit of effort to install and use unencrypted protocols such as telnet, ftp and nfs.

Heater
Posts: 13299
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why moving to 64bit?

Fri Aug 16, 2019 5:07 pm

I pretty much agree with everyone's comments about security.

It's just that when you start to expect such high performance from the very cheap and humble Pi I wonder if that was ever its intended use case.

Clearly I'm behind the times on expectations here.

jerrm
Posts: 194
Joined: Wed May 02, 2018 7:35 pm

Re: Why moving to 64bit?

Fri Aug 16, 2019 6:36 pm

Heater wrote:
Fri Aug 16, 2019 5:07 pm
It's just that when you start to expect such high performance from the very cheap and humble Pi I wonder if that was ever its intended use case.
More of a wish than an expectation.

I wish we could use the Pi primarily because of the long term production commitment (with an organization that will more than likely still be here three years from now).

Unfortunately it's looking like the Pi4 won't be a good fit. That's OK, but doesn't mean I have to be happy about it.

jdonald
Posts: 413
Joined: Fri Nov 03, 2017 4:36 pm

Re: Why moving to 64bit?

Fri Aug 16, 2019 6:40 pm

jerrm wrote:
Fri Aug 16, 2019 12:38 pm
I'm only interested in "out of the box" performance. I'm perfectly capable of custom builds, but anything that breaks "apt-get upgrade" keeping security fixes current is a non-starter for our general use cases.
This could be an arguable reason not to use Cortex-A72-compiled 32-bit benchmarks. Restating: your 64-bit test system does not receive the benefit of recompiled binaries either; its binaries are whatever the system provides out of the box, likely tuned for Cortex-A15. Thus, it may be an appropriate comparison to use lower-end ARMv7 binaries as your 32-bit baseline.

However, this does not appear to justify using ARMv6 binaries in your baseline. Debian and Ubuntu are two systems compiled for ARMv7 with upstream packages that can handle "apt-get upgrade" just fine. Your 64-bit measurement is already done inside a Debian arm64 chroot so it would make sense for the baseline to run inside a Debian armhf chroot. This would also better control for factors such as side effects from running inside a chroot vs metal, or subtle configuration differences between Raspbian and Debian.

As alluded to by jahboater early on in this thread, for various programs the performance delta going from ARMv6->ARMv7 exceeds that of going from 32-bit->64-bit. If we only compare ARMv6 vs 64-bit that leaves a big unknown on any test result.

jerrm
Posts: 194
Joined: Wed May 02, 2018 7:35 pm

Re: Why moving to 64bit?

Sat Aug 17, 2019 2:17 am

jdonald wrote:
Fri Aug 16, 2019 6:40 pm
However, this does not appear to justify using ARMv6 binaries in your baseline. Debian and Ubuntu are two systems compiled for ARMv7 with upstream packages that can handle "apt-get upgrade" just fine. Your 64-bit measurement is already done inside a Debian arm64 chroot so it would make sense for the baseline to run inside a Debian armhf chroot. This would also better control for factors such as side effects from running inside a chroot vs metal, or subtle configuration differences between Raspbian and Debian.

As alluded to by jahboater early on in this thread, for various programs the performance delta going from ARMv6->ARMv7 exceeds that of going from 32-bit->64-bit. If we only compare ARMv6 vs 64-bit that leaves a big unknown on any test result.
Simplicity and maintainability count. I have no real desire to maintain Raspbian with a Debian chroot. But since this is just a curiosity project, I set up the Debian chroot anyway.

No benefit for Debian armhf on this OpenSSH test. The numbers actually came in a little worse than the Raspbian binaries, but too close to declare a winner or loser.

Code: Select all

=============================================================================================
64/deb32 BIT
=============================================================================================
time dd if=/dev/zero bs=10240 count=409600 | ssh -p 2222 root@192.168.22.58 'cat > /dev/null'
---------------------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 86.6481 s, 48.4 MB/s

real    1m26.695s
user    0m44.092s
sys     0m12.941s

409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 89.0552 s, 47.1 MB/s

real    1m29.112s
user    0m44.384s
sys     0m12.446s

---------------------------------------------------------------------------------------
time ssh -p 2222 root@192.168.22.58 'dd if=/dev/zero bs=10240 count=409600' > /dev/null
---------------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 99.3105 s, 42.2 MB/s

real    1m39.604s
user    0m50.835s
sys     0m13.415s

409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 101.382 s, 41.4 MB/s

real    1m41.674s
user    0m56.046s
sys     0m9.099s

=============================================================================================
32/deb32 BIT
=============================================================================================
time dd if=/dev/zero bs=10240 count=409600 | ssh -p 2222 root@192.168.22.58 'cat > /dev/null'
---------------------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 91.5883 s, 45.8 MB/s

real    1m31.639s
user    0m45.516s
sys     0m13.374s

409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 94.2352 s, 44.5 MB/s

real    1m34.286s
user    0m45.983s
sys     0m11.374s

---------------------------------------------------------------------------------------
time ssh -p 2222 root@192.168.22.58 'dd if=/dev/zero bs=10240 count=409600' > /dev/null
---------------------------------------------------------------------------------------
409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 95.1678 s, 44.1 MB/s

real    1m35.496s
user    0m47.133s
sys     0m15.391s

409600+0 records in
409600+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 98.5608 s, 42.6 MB/s

real    1m38.886s
user    0m48.183s
sys     0m15.235s
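
To make "too close to declare a winner" a bit more concrete, here is a rough sketch that recomputes throughput from the elapsed times dd reported above and averages the two runs per case. The dictionary keys simply reuse the log headers and commands; the function names are made up for illustration, and two runs per case is far too few for any statistical claim.

```python
# Elapsed seconds as reported by dd in the log above (two runs per case).
# Labels are copied from the log headers; nothing here is newly measured.
TOTAL_BYTES = 10240 * 409600  # bs * count, i.e. 4194304000 bytes per run

runs = {
    ("64/deb32", "dd | ssh"): [86.6481, 89.0552],
    ("64/deb32", "ssh dd"): [99.3105, 101.382],
    ("32/deb32", "dd | ssh"): [91.5883, 94.2352],
    ("32/deb32", "ssh dd"): [95.1678, 98.5608],
}


def throughput_mb_s(seconds):
    """Throughput in MB/s (1 MB = 10**6 bytes, the unit dd prints)."""
    return TOTAL_BYTES / seconds / 1e6


def summarize(times):
    """Return (mean throughput, max - min spread) for a list of run times."""
    rates = [throughput_mb_s(t) for t in times]
    mean = sum(rates) / len(rates)
    spread = max(rates) - min(rates)
    return mean, spread


summary = {key: summarize(times) for key, times in runs.items()}

for (label, direction), (mean, spread) in summary.items():
    print(f"{label:9s} {direction:9s} mean {mean:5.2f} MB/s, "
          f"run-to-run spread {spread:4.2f} MB/s")
```

If the gap between two cases is about the same size as the run-to-run spread within a case, the comparison is inconclusive and more repetitions would be needed.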
Last edited by jerrm on Sat Aug 17, 2019 6:05 am, edited 1 time in total.
