Heater
Posts: 13271
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Thu Jan 03, 2019 10:09 am

That is a nice history essay ejolson, I may comment on that after my morning coffee...

The classic.bas is a thing to behold. Insane but strangely beautiful:) I'll get down to adding that to the repository.

If turning on GCC optimization is causing problems I suspect that FreeBASIC may be generating code that uses some undefined behaviors of C. When the optimizer mashes things around it may trigger a different undefined behavior that is not the one we want.

Time to break out GCC's undefined behavior sanitzer https://developers.redhat.com/blog/2014 ... zer-ubsan/

Perhaps have a go with address sanitizer as well.

Heater
Posts: 13271
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Thu Jan 03, 2019 10:35 am

Line numbered classic.bas works fine here:

Code: Select all

$ fbc -R -O 3 -lang qb -fpu sse -fpmode fast classic.bas
$ time ./classic | head -c 32 ; time ./classic | tail -c 32
10727395641800477229364813596225
real    0m24.847s
user    0m6.016s
sys     0m18.828s
4856539211500699706378405156269

real    0m26.008s
user    0m6.422s
sys     0m20.328s
Last edited by Heater on Thu Jan 03, 2019 11:17 am, edited 1 time in total.

ejolson
Posts: 3547
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Thu Jan 03, 2019 10:45 am

Heater wrote:
Thu Jan 03, 2019 10:35 am
Line numbered classic.bas works fine here:

Code: Select all

$ time ./classic | head -c 32 ; time ./classic | tail -c 32
10727395641800477229364813596225
real    0m29.424s
user    0m11.016s
sys     0m18.375s
4856539211500699706378405156269

real    0m30.437s
user    0m11.031s
sys     0m19.906s
Good news. Is that on a Pi or x86? Did you compile with -O 3 or -O 2 optimisation?

Heater
Posts: 13271
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Thu Jan 03, 2019 11:18 am

On my PC. I update the post above with my build command and new timings.

jahboater
Posts: 4675
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Thu Jan 03, 2019 12:49 pm

I saw this fbc option:-

-vec n
Automatic vectorization level (default: 0)

Would it be faster without the -R ?

ejolson
Posts: 3547
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Thu Jan 03, 2019 1:51 pm

jahboater wrote:
Thu Jan 03, 2019 12:49 pm
I saw this fbc option:-

-vec n
Automatic vectorization level (default: 0)

Would it be faster without the -R ?
I don't think -R affects optimisation levels, but only preserves the intermediate files generated during compilation.

I think the vectorisation option only applies to the built-in x86 code generator. With the gcc back end, vectorisation should be done automatically by gcc. Note that the O(n^2) multiplication in the present code may not easily vectorise. Therefore, you may want to replace the current subroutine starting on line 4050 by something similar to what's in the C program. I've actually tried this but don't remember it helping much--maybe I didn't set all the switches correctly.

Heater
Posts: 13271
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Thu Jan 03, 2019 8:32 pm

classic.bas is up on the repo.

Let me know if there are any notes you would like to add to the README.

ejolson
Posts: 3547
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Fri Jan 04, 2019 5:08 am

Heater wrote:
Thu Jan 03, 2019 8:32 pm
classic.bas is up on the repo.

Let me know if there are any notes you would like to add to the README.
The README looks good. Thanks for keeping GitHub up to date as this thread is getting too long to find all the interesting code posted here.

ejolson
Posts: 3547
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Fri Jan 04, 2019 6:14 am

ejolson wrote:
Thu Jan 03, 2019 7:18 am
If some C enthusiast wants to use the integer version of the line-numbered BASIC Fibonacci code to figure out what's going on with the gcc optimizer, let me know and I'll include that in a different post.
The integer version of the line-numbered Fibonacci code is

Code: Select all

100 rem integer.bas -- Compute the nth Fibonacci Number
110 rem Written December 25, 2018 by Eric Olson
120 rem
130 rem This program demonstrates the expressiveness of the original
140 rem versions of Microsoft BASIC as measured by explicitly coding
150 rem Karatsuba multiplication for big-number arithmetic and then
160 rem using the doubling formula
170 rem
180 rem     F(2k) = F(k)[2F(k+1)-F(k)]
190 rem   F(2k+1) = F(k+1)[2F(k+1)-F(k)]+(-1)^(k+1)
200 rem   F(2k+1) = F(k)[2F(k)+F(k+1)]+(-1)^(k)
210 rem   F(2k+2) = F(k+1)[2F(k)+F(k+1)]
220 rem
221 rem to compute the nth Fibonacci number.
222 rem
223 rem Version 2:  Minor changes to optimize the O(n^2) multiply and
224 rem prevent overflow when running on MITS BASIC.
225 rem
226 rem Version 3:  Conversion to integer data types and removal of
227 rem subroutine that determines maximum floating-point integer.
228 rem
229 rem To run this program on early versions of Microsoft BASIC please
230 rem set the default data type to 16-bit integers using defint a-z in
231 rem line 241 and edit line 260 to read b8=100:b9=2:b7=32767-2*b8*b8.
232 rem Then change n so the resulting Fibonacci number fits in memory.
240 rem
241 deflng a-z
242 rem defint a-z
250 n=4784969
251 rem n=7499
260 b8=10000:b9=4:b7=2147483647-2*b8*b8
261 rem b8=100:b9=2:b7=32767-2*b8*b8
265 b6=b7\b8:b7=b8*b6
270 d9=int(n*log((1+sqr(5))/2)/log(b8)+7):m9=14
280 dim m(d9*m9):m0=1
300 a=m0:m0=m0+1+d9:b=m0:m0=m0+1+d9
310 t1=m0:m0=m0+1+d9:t2=m0:m0=m0+1+d9
400 r0=n:gosub 1000
420 r7=b:gosub 7000
430 stop
1000 rem Compute nth Fibonacci number
1010 rem    inputs: r0 the value of n
1020 rem   outputs: b the value of F(n)
1040 if r0<2 then m(b)=1:m(b+1)=r0:return
1060 n1=r0:r0=(n1-1)\2:gosub 1500
1070 p1=n1 mod 4
1080 if p1=1 or p1=3 goto 1200
1090 r1=t1:r2=a:r3=a:gosub 2000
1110 r1=t2:r2=t1:r3=b:gosub 2000
1120 r1=t1:r2=b:r3=t2:gosub 4000
1170 r1=b:r2=t1:gosub 5000:return
1200 r1=t1:r2=b:r3=b:gosub 2000
1210 r1=t2:r2=t1:r3=a:gosub 3000
1220 r1=t1:r2=b:r3=t2:gosub 4000
1230 if p1=3 then 1250
1240 r1=t1:gosub 6000:goto 1260
1250 r1=t1:gosub 6500
1260 r1=b:r2=t1:gosub 5000:return
1500 rem Recursive work for nth Fibonacci number
1510 rem    inputs: r0 the value of n
1520 rem   outputs: a the value of F(n)
1530 rem   outputs: b the value of F(n+1)
1540 if r0=0 then m(a)=1:m(a+1)=0:m(b)=1:m(b+1)=1:return
1600 m(m0)=r0:m0=m0+1:r0=r0\2:gosub 1500
1610 m0=m0-1:r0=m(m0)
1620 p1=r0 mod 4
1630 if p1=1 or p1=3 then 1720
1640 r1=t1:r2=b:r3=b:gosub 2000
1650 r1=t2:r2=t1:r3=a:gosub 3000
1660 r1=t1:r2=a:r3=t2:gosub 4000
1670 r1=a:r2=t1:gosub 5000
1680 r1=t1:r2=b:r3=t2:gosub 4000
1690 if p1=2 then 1710
1700 r1=t1:gosub 6000:goto 1711
1710 r1=t1:gosub 6500
1711 r1=b:r2=t1:gosub 5000:return
1720 r1=t1:r2=a:r3=a:gosub 2000
1730 r1=t2:r2=t1:r3=b:gosub 2000
1740 r1=t1:r2=b:r3=t2:gosub 4000
1750 r1=b:r2=t1:gosub 5000
1760 r1=t1:r2=a:r3=t2:gosub 4000
1770 if p1=3 then 1790
1780 r1=t1:gosub 6500:goto 1800
1790 r1=t1:gosub 6000
1800 r1=a:r2=t1:gosub 5000:return
2000 rem Big-number addition
2010 rem  outputs: r1 the value of a+b
2020 rem   inputs: r2 the value of a
2030 rem   inputs: r3 the value of b
2050 if m(r3)>m(r2) then i9=m(r3) else i9=m(r2)
2060 for i=1 to i9+1:m(r1+i)=0:next i
2070 for i=1 to i9:c=0:t=m(r1+i)
2080 if i<=m(r2) then t=t+m(r2+i)
2090 if i<=m(r3) then t=t+m(r3+i)
2110 if t>=b8 then c=1:t=t-b8
2120 m(r1+i)=t:m(r1+i+1)=m(r1+i+1)+c:next i
2130 m(r1)=i9+1
2140 r4=r1:gosub 7500
2150 return
3000 rem Big-number subtraction
3010 rem  outputs: r1 the value of a-b
3020 rem   inputs: r2 the value of a
3030 rem   inputs: r3 the value of b
3050 for i=1 to m(r2):m(r1+i)=0:next i
3060 for i=1 to m(r3):t=m(r1+i)+m(r2+i)-m(r3+i)
3070 if t<0 then t=t+b8:m(r1+i+1)=m(r1+i+1)-1
3080 m(r1+i)=t:next i
3090 for i=m(r3)+1 to m(r2):t=m(r1+i)+m(r2+i)
3100 if t<0 then t=t+b8:m(r1+i+1)=m(r1+i+1)-1
3110 m(r1+i)=t:next i
3120 m(r1)=m(r2)
3130 r4=r1:gosub 7500
3150 return
4000 rem Big-number multiplication
4010 rem  outputs: r1 the value of a*b
4020 rem   inputs: r2 the value of a
4030 rem   inputs: r3 the value of b
4040 if m(r2)>80 and m(r3)>80 then 4300
4050 i9=m(r2)+m(r3):for i=1 to i9:m(r1+i)=0:next i
4070 for i=1 to m(r2):for j=1 to m(r3)
4080 t=m(r1+i+j-1)+m(r2+i)*m(r3+j)
4090 if t<b7 then 4120
4100 m(r1+i+j-1)=t-b7
4110 m(r1+i+j)=m(r1+i+j)+b6:goto 4130
4120 m(r1+i+j-1)=t
4130 next j:next i
4140 c=0:for i=1 to i9:t=m(r1+i)+c
4150 if t<b8 then c=0:goto 4170
4160 c=t\b8:t=t mod b8
4170 m(r1+i)=t:next i
4180 m(r1)=i9
4190 r4=r1:gosub 7500
4230 return
4300 rem Big-number Karatsuba algorithm
4310 if m(r2)<m(r3) then i8=m(r3) else i8=m(r2)
4320 i8=i8\2
4330 z0=m0:m0=m0+1+2*i8+1
4332 z2=m0:m0=m0+1+2*i8+3
4334 z1=m0:m0=m0+1+2*i8+5
4340 z3=m0:m0=m0+1+i8+2
4350 z4=m0:m0=m0+1+i8+2
4360 r5=z4:r6=r3:gosub 4500
4370 r5=z3:r6=r2:gosub 4500
4380 gosub 4600:r1=z1:r2=z3:r3=z4:gosub 4000:gosub 4700
4400 q1=m(r2):if i8<q1 then m(r2)=i8
4405 q2=m(r3):if i8<q2 then m(r3)=i8
4410 gosub 4600:r1=z0:gosub 4000:gosub 4700
4420 m(r2)=q1:m(r3)=q2
4430 q3=q1-i8:q4=q2-i8:if q3<0 or q4<0 then m(z2)=0:goto 8000
4440 q1=m(r2+i8):m(r2+i8)=q3:q2=m(r3+i8):m(r3+i8)=q4
4450 gosub 4600:r1=z2:r2=r2+i8:r3=r3+i8:gosub 4000:gosub 4700
4460 m(r2+i8)=q1:m(r3+i8)=q2
4470 goto 8000
4500 rem Add high to low
4510 rem  outputs: r5 the sum of high(a)+low(a)
4520 rem   inputs: r6 the value of a
4530 rem   inputs: i8 the split point
4540 c=0:for i=1 to i8+1:t=c
4545 if i<=i8 and i<=m(r6) then t=t+m(r6+i)
4550 if i+i8<=m(r6) then t=t+m(r6+i+i8)
4560 if t>=b8 then c=1:t=t-b8 else c=0
4570 m(r5+i)=t:next i:m(r5+i8+2)=c
4590 m(r5)=i8+2:r4=r5:gosub 7500
4595 return
4600 rem Save frame
4610 m(m0)=z1:m0=m0+1:m(m0)=z2:m0=m0+1:m(m0)=z0:m0=m0+1
4620 m(m0)=i8:m0=m0+1:m(m0)=q1:m0=m0+1:m(m0)=q2:m0=m0+1
4630 rem Save parameters
4640 m(m0)=r1:m0=m0+1:m(m0)=r2:m0=m0+1:m(m0)=r3:m0=m0+1
4650 return
4700 rem Restore frame
4710 gosub 4750
4720 m0=m0-1:q2=m(m0):m0=m0-1:q1=m(m0):m0=m0-1:i8=m(m0)
4730 m0=m0-1:z0=m(m0):m0=m0-1:z2=m(m0):m0=m0-1:z1=m(m0)
4740 return
4750 rem Restore parameters
4760 m0=m0-1:r3=m(m0):m0=m0-1:r2=m(m0):m0=m0-1:r1=m(m0)
4770 return
5000 rem Big-number copy
5010 rem  outputs: r1 the value of a
5020 rem   inputs: r2 the value of a
5030 r4=r2:gosub 7500
5040 for i=1 to m(r2):m(r1+i)=m(r2+i):next i
5050 for i=m(r2)+1 to m(r1):m(r1+i)=0:next i
5060 m(r1)=m(r2)
5070 return
6000 rem Big-number decrement
6010 rem   inputs: r1 the value of a
6020 rem  outputs: r1 the value of a-1
6040 i=1:c=1
6050 if c=0 then 6080
6060 if m(r1+i)<1 then m(r1+i)=b8-1 else m(r1+i)=m(r1+i)-1:c=0
6070 i=i+1:goto 6050
6080 r4=r1:gosub 7500
6100 return
6500 rem Big-number increment
6510 rem   inputs: r1 the value of a
6520 rem  outputs: r1 the value of a+1
6540 m(r1)=m(r1)+1:m(r1+m(r1))=0:i=1:c=1
6550 if c=0 then 6590
6560 t=m(r1+i)+1
6570 if t>=b8 then m(r1+i)=t-b8 else m(r1+i)=t:c=0
6580 i=i+1:goto 6550
6590 r4=r1:gosub 7500
6600 return
7000 rem Big-number print
7010 rem   inputs: r7 the value to print
7020 if m(r7)=0 then print "0":return
7030 for i=m(r7) to 1 step -1
7040 s$=str$(m(r7+i))
7045 if mid$(s$,1,1)=" " then s$=mid$(s$,2,len(s$)-1)
7050 if i=m(r7) or b9<=len(s$) then 7070
7060 s$=mid$("0000000000000",1,b9-len(s$))+s$
7070 print s$;:next i:print:return
7500 rem Big-number trim
7510 rem    inputs: r4 the value of a
7520 rem   outputs: r4 the trimmed value of a
7530 if m(r4+m(r4))=0 and m(r4)>1 then m(r4)=m(r4)-1:goto 7530
7540 return
8000 rem Tail of Karatsuba
8003 t4=m0-1-2*i8-5
8005 gosub 4630:r1=t4:r2=z1:r3=z0:gosub 3000
8010 r1=z1:r2=t4:r3=z2:gosub 3000:gosub 4750
8020 r4=z1:gosub 7500
8030 i9=m(r2)+m(r3):i7=2*i8:c=0:for i=1 to i9:t=c
8040 if i<=m(z0) then t=t+m(z0+i)
8050 if i>i8 and i-i8<=m(z1) then t=t+m(z1+i-i8)
8060 if i>i7 and i-i7<=m(z2) then t=t+m(z2+i-i7)
8070 if t<b8 then c=0:goto 8090
8080 c=1:t=t-b8
8090 m(r1+i)=t:next i
8092 m(r1)=i9:r4=r1:gosub 7500
8100 m0=z0
8120 return
9999 end
When compiled using the built-in x86 code generator we obtain

Code: Select all

$ make
/usr/local/fbc-519/bin/fbc -O 3 -lang qb integer.bas
$ time ./integer >integer.txt

real    0m22.092s
user    0m21.552s
sys     0m0.528s
$ md5sum integer.txt 
c926caa99ed1e30fe0d3c966da901738  integer.txt
$ md5sum fibonacci.txt 
c926caa99ed1e30fe0d3c966da901738  fibonacci.txt
which indicates that the code is correct and slightly slower than the double-precision floating point version. On Raspberry Pi Zero with optimization turned off we obtain

Code: Select all

$ make
/usr/local/fbc-300/bin/fbc  -lang qb integer.bas
$ time ./integer >integer.txt

real    3m25.624s
user    3m17.260s
sys 0m8.150s
$ md5sum integer.txt 
c926caa99ed1e30fe0d3c966da901738  integer.txt
which indicates that the code is correct and also slightly faster than the double-precision floating point version. However, with optimization turned on a segmentation fault results as seen by

Code: Select all

$ make
/usr/local/fbc-300/bin/fbc -O 3 -lang qb integer.bas
$ time ./integer >integer.txt
Segmentation fault

real    0m0.847s
user    0m0.700s
sys 0m0.120s
Here integer.txt turns out to be an empty file rather than the desired Fibonacci number.

Since there is no built-in ARM code generator on Raspberry Pi then gcc is used as the default back end. If the -gen gcc option is specified on x86 machines with optimization turned on, then similar segmentation faults or wrong answers also result depending on the version of gcc used. The purpose of this post is to provide enough details so that someone may be able to determine what is going wrong with optimization when the gcc back end is used.

To preserve the integer.c intermediate file and display the appropriate gcc, as and ld command lines needed to compile it, add the options -R -v to obtain

Code: Select all

$ /usr/local/fbc-300/bin/fbc -O 3 -R -v -lang qb integer.bas
FreeBASIC Compiler - Version 1.06.0 (11-22-2018), built for linux-arm (32bit)
Copyright (C) 2004-2016 The FreeBASIC development team.
target:       linux-arm, armv6, 32bit
compiling:    integer.bas -o integer.c (main module)
compiling C:  gcc -march=armv6 -S -nostdlib -nostdinc -Wall -Wno-unused-label -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-main -Werror-implicit-function-declaration -O3 -fno-strict-aliasing -frounding-math -fno-math-errno -fwrapv -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables "integer.c" -o "integer.asm"
assembling:   as --strip-local-absolute "integer.asm" -o "integer.o"
linking:      ld -m armelf_linux_eabi -o "integer" -dynamic-linker /lib/ld-linux-armhf.so.3 "/usr/local/fbc-300/bin/../lib/freebasic/linux-arm/fbextra.x" -s -L "/usr/local/fbc-300/bin/../lib/freebasic/linux-arm" -L "." -L "/usr/lib/gcc/arm-linux-gnueabihf/6" "/usr/lib/gcc/arm-linux-gnueabihf/6/../../../arm-linux-gnueabihf/crt1.o" "/usr/lib/gcc/arm-linux-gnueabihf/6/../../../arm-linux-gnueabihf/crti.o" "/usr/lib/gcc/arm-linux-gnueabihf/6/crtbegin.o" "/usr/local/fbc-300/bin/../lib/freebasic/linux-arm/fbrt0.o" "integer.o" "-(" -lfb -ltinfo -lm -ldl -lpthread -lgcc -lgcc_eh -lc "-)" "/usr/lib/gcc/arm-linux-gnueabihf/6/crtend.o" "/usr/lib/gcc/arm-linux-gnueabihf/6/../../../arm-linux-gnueabihf/crtn.o" 
On x86 you will need to add the -gen gcc option as well. Note that the intermediate file integer.c will be slightly different depending on whether or not the -O 3 option is specified; however, when -O 3 is not specified the only difference in the gcc, as and ld commands shown above is the substitution of -O0 for -O3.

To demonstrate that it is the optimizer in gcc rather than the one in FreeBASIC which causes the problem, first compile the integer.c file generated above using the same commands as above, except with -O0 substituted for -O3. The results

Code: Select all

$ make
gcc -march=armv6 -S -nostdlib -nostdinc -Wall -Wno-unused-label -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-main -Werror-implicit-function-declaration -O0 -fno-strict-aliasing -frounding-math -fno-math-errno -fwrapv -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables "integer.c" -o "integer.asm"
as --strip-local-absolute "integer.asm" -o "integer.o"
ld -m armelf_linux_eabi -o "integer" -dynamic-linker /lib/ld-linux-armhf.so.3 "/usr/local/fbc-300/bin/../lib/freebasic/linux-arm/fbextra.x" -s -L "/usr/local/fbc-300/bin/../lib/freebasic/linux-arm" -L "." -L "/usr/lib/gcc/arm-linux-gnueabihf/6" "/usr/lib/gcc/arm-linux-gnueabihf/6/../../../arm-linux-gnueabihf/crt1.o" "/usr/lib/gcc/arm-linux-gnueabihf/6/../../../arm-linux-gnueabihf/crti.o" "/usr/lib/gcc/arm-linux-gnueabihf/6/crtbegin.o" "/usr/local/fbc-300/bin/../lib/freebasic/linux-arm/fbrt0.o" "integer.o" "-(" -lfb -ltinfo -lm -ldl -lpthread -lgcc -lgcc_eh -lc "-)" "/usr/lib/gcc/arm-linux-gnueabihf/6/crtend.o" "/usr/lib/gcc/arm-linux-gnueabihf/6/../../../arm-linux-gnueabihf/crtn.o" 
$ time ./integer >integer.txt

real    3m27.472s
user    3m18.860s
sys     0m8.400s
$ md5sum integer.txt 
c926caa99ed1e30fe0d3c966da901738  integer.txt
indicate that the correct answer is now produced. It should be emphasized that the only difference between a working and non-working version of the executable is whether or not -O3 or -O0 is included on the gcc command line used to compile the integer.c intermediate file.

To further confirm this, start over without the -O 3 option on the FreeBASIC command line as

Code: Select all

$ /usr/local/fbc-300/bin/fbc -R -v -lang qb integer.bas
FreeBASIC Compiler - Version 1.06.0 (11-22-2018), built for linux-arm (32bit)
Copyright (C) 2004-2016 The FreeBASIC development team.
target:       linux-arm, armv6, 32bit
compiling:    integer.bas -o integer.c (main module)
compiling C:  gcc -march=armv6 -S -nostdlib -nostdinc -Wall -Wno-unused-label -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-main -Werror-implicit-function-declaration -O0 -fno-strict-aliasing -frounding-math -fno-math-errno -fwrapv -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables "integer.c" -o "integer.asm"
assembling:   as --strip-local-absolute "integer.asm" -o "integer.o"
linking:      ld -m armelf_linux_eabi -o "integer" -dynamic-linker /lib/ld-linux-armhf.so.3 "/usr/local/fbc-300/bin/../lib/freebasic/linux-arm/fbextra.x" -s -L "/usr/local/fbc-300/bin/../lib/freebasic/linux-arm" -L "." -L "/usr/lib/gcc/arm-linux-gnueabihf/6" "/usr/lib/gcc/arm-linux-gnueabihf/6/../../../arm-linux-gnueabihf/crt1.o" "/usr/lib/gcc/arm-linux-gnueabihf/6/../../../arm-linux-gnueabihf/crti.o" "/usr/lib/gcc/arm-linux-gnueabihf/6/crtbegin.o" "/usr/local/fbc-300/bin/../lib/freebasic/linux-arm/fbrt0.o" "integer.o" "-(" -lfb -ltinfo -lm -ldl -lpthread -lgcc -lgcc_eh -lc "-)" "/usr/lib/gcc/arm-linux-gnueabihf/6/crtend.o" "/usr/lib/gcc/arm-linux-gnueabihf/6/../../../arm-linux-gnueabihf/crtn.o" 
As stated before, the generated integer.c file is different because the optimizer in FreeBASIC has been turned off as well as the optimizer in gcc. Now compile integer.c using gcc with -O3 optimization to obtain

Code: Select all

$ make
gcc -march=armv6 -S -nostdlib -nostdinc -Wall -Wno-unused-label -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-main -Werror-implicit-function-declaration -O3 -fno-strict-aliasing -frounding-math -fno-math-errno -fwrapv -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables "integer.c" -o "integer.asm"
as --strip-local-absolute "integer.asm" -o "integer.o"
ld -m armelf_linux_eabi -o "integer" -dynamic-linker /lib/ld-linux-armhf.so.3 "/usr/local/fbc-300/bin/../lib/freebasic/linux-arm/fbextra.x" -s -L "/usr/local/fbc-300/bin/../lib/freebasic/linux-arm" -L "." -L "/usr/lib/gcc/arm-linux-gnueabihf/6" "/usr/lib/gcc/arm-linux-gnueabihf/6/../../../arm-linux-gnueabihf/crt1.o" "/usr/lib/gcc/arm-linux-gnueabihf/6/../../../arm-linux-gnueabihf/crti.o" "/usr/lib/gcc/arm-linux-gnueabihf/6/crtbegin.o" "/usr/local/fbc-300/bin/../lib/freebasic/linux-arm/fbrt0.o" "integer.o" "-(" -lfb -ltinfo -lm -ldl -lpthread -lgcc -lgcc_eh -lc "-)" "/usr/lib/gcc/arm-linux-gnueabihf/6/crtend.o" "/usr/lib/gcc/arm-linux-gnueabihf/6/../../../arm-linux-gnueabihf/crtn.o" 
$ time ./integer >integer.txt
Segmentation fault

real    0m0.877s
user    0m0.700s
sys     0m0.150s
Therefore, the integer.c generated by the FreeBASIC front end with optimization turned off also leads to a segmentation fault when compiled with the gcc optimizer turned on.

I have not included either of the integer.c intermediate files in this post. They are rather large and easily regenerated from the integer.bas source. The purpose of this post was so that the interested parties could check what's happening with the gcc optimizer. In my opinion there is no need to start a separate why-avoid-C-on-the-Raspberry-Pi thread to further discuss this problem. Anything uncovered about gcc and undefined behavior using the FreeBASIC-generated intermediate code integer.c seems off topic enough to post here.
Last edited by ejolson on Sat Jan 05, 2019 12:20 pm, edited 1 time in total.

jahboater
Posts: 4675
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Fri Jan 04, 2019 7:44 am

Its doing this:

Code: Select all

==18894== Invalid write of size 4
==18894==    at 0x402A4B: main (integer.c:989)
==18894==  Address 0x6caf44c is 14,001,164 bytes inside a block of size 14,004,128 in arena "client"
==18894== 
==18894== Invalid write of size 4
==18894==    at 0x402ACC: main (integer.c:997)
==18894==  Address 0x6caf544 is 14,001,412 bytes inside a block of size 14,004,128 in arena "client"
==18894== 
==18894== Invalid write of size 4
==18894==    at 0x402AD9: main (integer.c:999)
==18894==  Address 0x6caf448 is 14,001,160 bytes inside a block of size 14,004,128 in arena "client"
==18894== 
.................................
hundreds of them.

presumably somewhere in here is an invalid address:-

*(int32*)(((int64)(R5$0 + I$0) << (2ll & 63ll)) + *(int64*)&Mi$$0) = T$0;

gdb says

Code: Select all

Program received signal SIGSEGV, Segmentation fault.
0x0000000000402a4b in main (__FB_ARGC__$0=<optimized out>, __FB_ARGV__$0=<optimized out>) at integer.c:989
989	    *(int32*)(((int64)(R5$0 + I$0) << (2ll & 63ll)) + *(int64*)&Mi$$0) = T$0;
(gdb) 
No surprises stuff like this doesn't survive optimization :)

ejolson
Posts: 3547
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Fri Jan 04, 2019 8:35 am

jahboater wrote:
Fri Jan 04, 2019 7:44 am
Its doing this:

Code: Select all

==18894== Invalid write of size 4
==18894==    at 0x402A4B: main (integer.c:989)
==18894==  Address 0x6caf44c is 14,001,164 bytes inside a block of size 14,004,128 in arena "client"
==18894== 
==18894== Invalid write of size 4
==18894==    at 0x402ACC: main (integer.c:997)
==18894==  Address 0x6caf544 is 14,001,412 bytes inside a block of size 14,004,128 in arena "client"
==18894== 
==18894== Invalid write of size 4
==18894==    at 0x402AD9: main (integer.c:999)
==18894==  Address 0x6caf448 is 14,001,160 bytes inside a block of size 14,004,128 in arena "client"
==18894== 
.................................
hundreds of them.

presumably somewhere in here is an invalid address:-

*(int32*)(((int64)(R5$0 + I$0) << (2ll & 63ll)) + *(int64*)&Mi$$0) = T$0;

gdb says

Code: Select all

Program received signal SIGSEGV, Segmentation fault.
0x0000000000402a4b in main (__FB_ARGC__$0=<optimized out>, __FB_ARGV__$0=<optimized out>) at integer.c:989
989	    *(int32*)(((int64)(R5$0 + I$0) << (2ll & 63ll)) + *(int64*)&Mi$$0) = T$0;
(gdb) 
No surprises stuff like this doesn't survive optimization :)
I also wondered about those casts of signed integers to pointers. Do you think it would work better if they were unsigned integers or alternatively pointers to unsigned characters?

The message

Address 0x6caf448 is 14,001,160 bytes inside a block of size 14,004,128 in arena "client"

says the write will occur inside the block. Isn't that what's supposed to happen? Why would that be invalid?

jahboater
Posts: 4675
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Fri Jan 04, 2019 8:55 am

ejolson wrote:
Fri Jan 04, 2019 8:35 am
I also wondered about those casts of signed integers to pointers. Do you think it would work better if they were unsigned integers or alternatively pointers to unsigned characters?
I think its just trying to do integer array indexing.
Yes, unsigned, or preferably the correct "uintptr_t" is better for this sort of thing, but int64 is so large its unlikely to overflow (one would hope). "intptr_t" is probably "int64" anyway.
Maybe its just a case of the optimizer removing code that is incorrect.
ejolson wrote:
Fri Jan 04, 2019 8:35 am
The message

Address 0x6caf448 is 14,001,160 bytes inside a block of size 14,004,128 in arena "client"

says the write will occur inside the block. Isn't that what's supposed to happen? Why would that be invalid?
I don't know, I wondered the same thing.
Where in the gibberish does it actually allocate the 14MB of memory?

Maybe the address sanitizer will come up with something more helpful (Heater?)

Heater
Posts: 13271
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Fri Jan 04, 2019 9:27 am

I get the same segfault at -O2 and -O3 here on my PC.

So far I cannot figure out how to build the C file with address santitizer or the undefined behavior sanitizer. They have an over the top complicated way to compile and link a single C file !

I'll try and get back to it later.

Perhaps just throw it over the wall as a bug report to the FreeBASIC developers.

jahboater
Posts: 4675
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Fri Jan 04, 2019 9:35 am

Although the valgrind message appears not to make sense,
GDB independently reports the segfault at the same line number as valgrind reports the invalid write.
Last edited by jahboater on Fri Jan 04, 2019 9:42 am, edited 1 time in total.

jahboater
Posts: 4675
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Fri Jan 04, 2019 9:40 am

Heater wrote:
Fri Jan 04, 2019 9:27 am
Perhaps just throw it over the wall as a bug report to the FreeBASIC developers.
Its failing without GCC on my x64 PC:

Code: Select all

$ fbc -O 3 -lang qb integer.bas
$ ./integer 
Segmentation fault

Heater
Posts: 13271
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Fri Jan 04, 2019 10:15 am

Same here.

ejolson
Posts: 3547
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Fri Jan 04, 2019 11:04 am

Heater wrote:
Fri Jan 04, 2019 10:15 am
Same here.
That's strange. Maybe the built-in code generator only produces 32-bit x86 code. I forgot to mention the Intel machine I've been testing with is running in 32-bit mode. Are you sure it is not defaulting to the gcc back end for 64-bit? Try -v to see what's happening.
Last edited by ejolson on Fri Jan 04, 2019 11:09 am, edited 2 times in total.

User avatar
Paeryn
Posts: 2657
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Why Avoid BASIC on RPi?

Fri Jan 04, 2019 11:08 am

It's a big problem with the generated code. It uses setjmp()/longjmp() to handle GOSUB/RETURN but this fails spectacularly when longjmp() happens as variables that were held in registers at the time of the setjmp() end up with incorrect values when it returns.

If you add -Wextra to gcc's compile line it will tell you a whole load of variables which can be (and indeed are) clobbered. Editing the C code to make all those volatile (so it always saves/loads those variables to memory rather than relying on registers having the values) and the code runs.

Code: Select all

pi@rpi3:~/Programming/asm/challenge_fibo/fibo_4784969/BASIC $ time ./integer |head -c32
10727395641800477229364813596225
real    0m41.414s
user    0m39.215s
sys     0m2.179s
She who travels light — forgot something.

jahboater
Posts: 4675
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Fri Jan 04, 2019 11:17 am

Well spotted!

I saw the clobbered warnings, but it commonly reports false positives for that, so I focused on the obvious valgrind warning which is in fact just a symptom.

ejolson
Posts: 3547
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Fri Jan 04, 2019 11:20 am

Paeryn wrote:
Fri Jan 04, 2019 11:08 am
It's a big problem with the generated code. It uses setjmp()/longjmp() to handle GOSUB/RETURN but this fails spectacularly when longjmp() happens as variables that were held in registers at the time of the setjmp() end up with incorrect values when it returns.

If you add -Wextra to gcc's compile line it will tell you a whole load of variables which can be (and indeed are) clobbered. Editing the C code to make all those volatile (so it always saves/loads those variables to memory rather than relying on registers having the values) and the code runs.

Code: Select all

pi@rpi3:~/Programming/asm/challenge_fibo/fibo_4784969/BASIC $ time ./integer |head -c32
10727395641800477229364813596225
real    0m41.414s
user    0m39.215s
sys     0m2.179s
I suspect this volatile fix should be incorporated into the FreeBASIC gcc back end, even though it seems a bit of a severe work around as there must be many cases where there is no longjump and the values in the registers are still good.

I find it odd, however, that the gcc optimiser assumes registers are untouched by setjump and longjump. I would think the usual case would be that all registers need to be reloaded after such an operation. Isn't some sort of rule about sequence points in the C standard supposed to stop such things from happening?

Can anyone get the LLVM back end to work?

Is the optimised code after the volatile keyword has been added faster or slower than the original code with the optimiser turned off?

User avatar
Paeryn
Posts: 2657
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Why Avoid BASIC on RPi?

Fri Jan 04, 2019 11:59 am

ejolson wrote:
Fri Jan 04, 2019 11:20 am
I suspect this volatile fix should be incorporated into the FreeBASIC gcc back end, even though it seems a bit of a severe work around as there must be many cases where there is no longjump and the values in the registers are still good.

I find it odd, however, that the gcc optimiser assumes registers are untouched by setjump and longjump. I would think the usual case would be that all registers need to be reloaded after such an operation. Isn't some sort of rule about sequence points in the C standard supposed to stop such things from happening?

Can anyone get the LLVM back end to work?

Is the optimised code after the volatile keyword has been added faster or slower than the original code with the optimiser turned off?
Local non-volatile automatic variables are not preserved (or rather the spec states that there values are indeterminate, the compiler doesn't have to try to save them. Using setjmp()/longjmp() should be done with care.
<Addendum>
Why should the compiler assume setjmp()/longjmp() affect the registers that the calling convention says they can't touch? If R4 held a particular value just before setjmp() then it should have the exact same value when setjmp() returns. The problem is that at the point when longjmp() is called R4 may be being used to hold a totally different value, and longjmp() performs magic to pretend that it is the corresponding setjmp() returning again. R4 will now have a different value but the code just after setjmp() will assume it still holds the original.

If a local variable's value changes between a setjmp() and a longjmp() then you have to declare it volatile to tell the compiler that it's value can't be cached in a register.
C11-spec wrote: All accessible objects have values, and all other components of the abstract machine have state, as of the time the longjmp function was called, except that the values of objects of automatic storage duration that are local to the function containing the invocation of the corresponding setjmp macro that do not have volatile-qualified type and have been changed between the setjmp invocation and longjmp call are indeterminate.
Haven't timed the non-optimised code and I've got to go out now so if nobody else gives it I will sometime tonight.
She who travels light — forgot something.

Steve Drain
Posts: 105
Joined: Tue Oct 30, 2012 2:08 pm
Location: Exeter UK

Re: Why Avoid BASIC on RPi?

Fri Jan 04, 2019 1:22 pm

ejolson wrote:
Thu Jan 03, 2019 7:18 am
While attempts to create a Fibonacci code using BBC BASIC by members posting in this thread have so far met with failure
I have been lurking here from the start and I have learned a great deal, but I cannot let that go without some comment. ;-)

I wrote a fairly naive doubling routine for RISC OS BBC BASIC using a module which has SWIs for the big number multiplication. This ran in about 5 minutes without any further optimisation. I have not posted it because it fails to meet the conditions. Conversion of the routine to assembler would be quite straightforward, but again it would be in the BASIC assembler and not suitable.

Writing routines in plain BASIC is fun, but the results could never be used for this problem in a realistic time. I do agree that BBC BASIC is not very expressive for this.

ejolson
Posts: 3547
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Fri Jan 04, 2019 2:09 pm

Steve Drain wrote:
Fri Jan 04, 2019 1:22 pm
ejolson wrote:
Thu Jan 03, 2019 7:18 am
While attempts to create a Fibonacci code using BBC BASIC by members posting in this thread have so far met with failure
I have been lurking here from the start and I have learned a great deal, but I cannot let that go without some comment. ;-)

I wrote a fairly naive doubling routine for RISC OS BBC BASIC using a module which has SWIs for the big number multiplication. This ran in about 5 minutes without any further optimisation. I have not posted it because it fails to meet the conditions. Conversion of the routine to assembler would be quite straightforward, but again it would be in the BASIC assembler and not suitable.

Writing routines in plain BASIC is fun, but the results could never be used for this problem in a realistic time. I do agree that BBC BASIC is not very expressive for this.
Your code sounds very nice. There have already been Python, JavaScript, C and Haskell codes posted which leverage big-number subroutines written in other languages, so I wouldn't worry that much about the rules. Even though such results don't relate directly to the expressiveness of any particular high-level language, they do tell an interesting story about the productivity of a particular programming environment.

Would you mind posting what you have so far?

A version with inline assembler instead of SWIs also sounds nice. Do you know whether the assembler implements an O(n^2) algorithm for the multiplication or one with greater asymptotic efficiency such as Karatsuba?

Heater
Posts: 13271
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Fri Jan 04, 2019 4:22 pm

ejolson,
There have already been Python, JavaScript, C and Haskell codes posted which leverage big-number subroutines written in other languages, so I wouldn't worry that much about the rules.
Yes, At this stage of the game I don't think anyone is going to worry about the challenge rules. Especially since one of the rule makers has been AWOL for weeks. If there is an interesting solution to the problem, let's see it, no matter how it's done.

But I have a little issue with the above...

Languages like Python, Javascript, Haskell, Scala, etc have the notion of big integers built into the very syntax and semantics of the language definition.

Languages like C, BASIC, Pascal etc do not.

How these things are implemented under the hood is of no consequence, as long as they perform well enough. Perhaps the language run time makes use of libs written in other languages, perhaps it does not. We are talking about the language here not any particular implementation.

That may be of no consequence to you but it's a big deal to my mind.

I'm sure you will agree that being able to write "a = b + c" is a lot more concisely expressive than something like "mul_bigInt(a, b, c)"

Then there is the matter of types and type checking and so on.

Actually I was wondering what you thought about C++ in this respect. On the one hand, with it's operator overloads and such, C++ can express the fibo algorithms we have been using very nicely. On the other hand, without using a non-standard library one has to do a ton of work to make those operator overloads mean something.

I'm kind of happy with it. I could swap out my home made big int class for a GMP big int class and with no change the to level expression of fibo() it would still work. Only faster!

Steve Drain
Posts: 105
Joined: Tue Oct 30, 2012 2:08 pm
Location: Exeter UK

Re: Why Avoid BASIC on RPi?

Fri Jan 04, 2019 5:36 pm

ejolson wrote:
Fri Jan 04, 2019 2:09 pm
Would you mind posting what you have so far?
I may do, but it only applies to BBC BASIC running on a RISC OS machine with the Numbers module loaded.

To clarify for anyone unfamiliar with RISC OS, modules are operation system extensions. They can be written in assembler, C or just about in compiled BASIC. They provide software interrupts (SWIs) which can be called by any language. In this case the Numbers module was written by a third party quite while ago and provides a fairly comprehensive suite of big number operations.
A version with inline assembler instead of SWIs also sounds nice. Do you know whether the assembler implements an O(n^2) algorithm for the multiplication or one with greater asymptotic efficiency such as Karatsuba?
You mistake my meaning. An assembler version would call the SWIs in the same way, but the overheads would be much reduced.

I do not know how multiplication is done. I have the source, but have not examined it. I do not think it is Karatsuba, because it is a development of algorithms originally written in the 1980s.

Return to “Off topic discussion”