ejolson
Posts: 3061
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sat Feb 02, 2019 7:31 pm

Gavinmc42 wrote:
Fri Feb 01, 2019 10:20 pm
Next step is an AI as our decision makers and leaders?
Currently I see that as a benefit not a harm.
There are many villages and townships where Mayor is a part-time position. The thinking appears to be that it's better not to have too much of a good thing when it comes to politics. Along these lines, compared to a neural-network politician that works 24 hours a day 7 days a week without sleep named Marvin, I suspect many would prefer a trash-sorting robot named Wall-E.

I've been working on a parallel MPI Fibonacci code for use on distributed-memory clusters. Rather than dividing the computation into thousands of pieces as done with OpenMP, it appears necessary to keep things more coarse grained. As the divide-and-conquer recursion in Karatsuba multiplication naturally parallelizes into three parts at a time, such coarse-grained parallelization should work best when the cluster has 3^n nodes.

The super-cheap cluster inconveniently has 5 computational nodes. However, over-provisioning 4 of the 5 nodes by using 9 ranks for the computation could plausibly result in an efficient distribution of work. In this case, it is important that the MPI library has been configured to avoid busy waiting, which fortunately is already true for the super-cheap cluster.
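For reference, one way to get blocking rather than busy-waiting behaviour from MPICH is to build it with the sockets channel; the install prefix here is made up.

```shell
# Build MPICH with the ch3:sock device, which sleeps in poll() while
# waiting for a message instead of spinning on the CPU.  With busy
# waiting, over-provisioned ranks would steal cycles from the rank
# that actually has work to do.
./configure --with-device=ch3:sock --prefix=$HOME/mpich-sock
make && make install
```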

I don't know whether the final MPI parallel computation will actually run faster on the cluster or not. The networking fabric is made out of bridged USB Ethernet gadgets while the nodes are Pi Zero computers. Thus, even though communications between nodes have high latency and low bandwidth, the CPU speed of each node is so slow that increasing performance by parallel processing may still be possible.

Unless you want convenient use of MPI for distributed-memory parallel processing, why avoid Basic?

ejolson
Posts: 3061
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Mon Feb 04, 2019 7:22 pm

ejolson wrote:
Sat Feb 02, 2019 7:31 pm
Unless you want convenient use of MPI for distributed-memory parallel processing, why avoid Basic?
The distributed-memory MPI parallel code is turning out as troublesome as line-numbered Basic. After 50 or so big numbers successfully passed back and forth between the ranks, I'm receiving "Fatal Error in MPI_Recv: Unknown error class."

What does that mean? Could it be caused by a stray pointer? Maybe I need a deep-learning dog detector for debugging. When C works like this, why avoid Basic?

ejolson
Posts: 3061
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Tue Feb 05, 2019 2:19 am

The MPI program is now working. Rather than merely a stray pointer, there was a race in gathering the results between lower and higher levels of the recursion. The unexpected order of the messages resulted in memory being freed before the return value arrived. This caused the program to scribble on the stack of the MPI_Recv function. Surprisingly, the problem was fixed by a thirteen-character change in the sync_bigmul3 routine to explicitly specify the ranks from which to receive messages. As the saying goes, if contractors built houses like programmers built programs, then not one of the three little pigs would have been saved.

The parallelism is relatively coarse grained. The last return in the recursion uses 3^n ranks while fibowork runs with 2*3^n ranks. Both ways of dividing up the computation must be efficiently distributed among the nodes of a cluster for good performance. Theoretically, this can be achieved by over-provisioning 3^n nodes with 2*3^n ranks. Alternatively, TUNE3 can be set to an impossibly large number so the linear recurrence in fibowork is not parallelized. In that case only the Karatsuba multiplication in bigmul3 will be parallelized, and 3^n ranks are all that's needed.

Sometimes over-provisioning is not practical because the MPI library employs busy waiting (almost all do by default) or takes a relatively long time to start each individual rank. In such situations setting TUNE3 large and using exactly 3^n nodes can result in better performance. When the number of nodes available is not a power of three, however, over-provisioning will almost always increase performance.
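With MPICH's Hydra launcher, over-provisioning can be arranged through the machine file. As a hypothetical layout with made-up hostnames, nine ranks on five single-processor nodes could be described as:

```
node1:2
node2:2
node3:2
node4:2
node5:1
```

and launched with something like `mpiexec -f machinefile -n 9 ./fibompi`, so four nodes take two ranks each and one node takes the remaining rank.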

Since the super-cheap cluster has 5 single-processor nodes, one option would be to run the MPI program using 18 over-provisioned ranks. Alternatively, one could set TUNE3 very large and run the program using 9 ranks. In the case of the super-cheap cluster, the latter results in slightly better performance. In particular, we have the following results in seconds:

Code:

         Nodes   Ranks      TUNE3     Time
serial     1      n/a         n/a   24.195
fibompi    1        1         n/a   26.672
fibompi    5       18     1048576    9.084
fibompi    5        9   134217728    8.901
The distributed-memory MPI parallel code runs 2.718 times faster than the simple serial code on the super-cheap cluster. Further comparing the parallel code to the serial code suggests that at least 2.5 seconds of the 9-second runtime is used for initialization. If these initialization costs are neglected, then 24 divided by 6.5 estimates the actual parallel speedup to be about 3.7. Following this line of reasoning, it is reasonable to conjecture that a 4-fold speedup might be achieved when calculating larger Fibonacci numbers.

The MPI code is about 55 lines longer than the original parallel.c OpenMP code. For reference fibompi.c follows:

Code:

/*  fibompi.c -- Compute the nth Fibonacci Number
    Written February 4, 2019 by Eric Olson

    This program demonstrates the expressiveness of C with MPICH
    as measured by explicitly coding a distributed-memory parallel
    version of the Karatsuba multiplication algorithm and then 
    using the doubling formulas

        F(2k) = F(k)[2F(k+1)-F(k)]
      F(2k+1) = F(k+1)[2F(k+1)-F(k)]+(-1)^(k+1)
      F(2k+1) = F(k)[2F(k)+F(k+1)]+(-1)^(k)
      F(2k+2) = F(k+1)[2F(k)+F(k+1)]

    to compute the nth Fibonacci number.  Note that n is specified
    in the first line of the main routine.

    For best performance use 3^n nodes over provisioned with 2*3^n
    ranks or set TUNE3 to be large and then use only 3^n ranks. */

#include <stdio.h>
#include <math.h>
#include <string.h>
#include <strings.h>
#include <stdlib.h>
#include <alloca.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <mpi.h>
int mpi_rank, mpi_size;

#define TUNE1 44
#define TUNE2 1024
//#define TUNE3 1048576
#define TUNE3 134217728

typedef unsigned long long llu;
typedef int limb;
typedef unsigned long long limb2;
typedef struct {
    int n; limb *d;
} bignum;

static int rexp,cmult,digits;
static limb rbase,rcarry;
static limb2 cadd,cmax;

static void bigprint(bignum x){
    char fmt[10];
    sprintf(fmt,"%%0%dllu",rexp);
    if(!x.n){
        printf("0\n");
        return;
    }
    for(int i=x.n-1;i>=0;i--){
        if(i==x.n-1) printf("%llu",(llu)x.d[i]);
        else printf(fmt,(llu)x.d[i]);
    }
    printf("\n");
}

static bignum bigtrim(bignum x){
    for(int k=x.n-1;k>=0;k--){
        if(x.d[k]!=0){
            x.n=k+1;
            return x;
        }
    }
    x.n=0;
    return x;
}

static bignum bigcarry(bignum x){
    int imax=x.n-1;
    for(int i=0;i<imax;i++){
        if(x.d[i]>=rbase){
            do {
                x.d[i]-=rbase; x.d[i+1]++;
            } while(x.d[i]>=rbase);
        } else if(x.d[i]<0){
            do {
                x.d[i]+=rbase; x.d[i+1]--;
            } while(x.d[i]<0);
        }
    }
    if(x.d[imax]<rbase) return bigtrim(x);
    x.n++;
    x.d[imax]-=rbase; x.d[imax+1]=1; 
    while(x.d[imax]>=rbase){
        x.d[imax]-=rbase; x.d[imax+1]++;
    }
    return x;
}

static bignum bigcopy(bignum z,bignum x){
    memcpy(z.d,x.d,sizeof(limb)*x.n);
    z.n=x.n;
    return z;
}

static bignum bigadd3(limb zd[],bignum x,bignum y){ 
    bignum z; z.d=zd;
    if(x.n<y.n){
        for(int i=0;i<x.n;i++) z.d[i]=x.d[i]+y.d[i];
        for(int i=x.n;i<y.n;i++) z.d[i]=y.d[i];
        z.n=y.n;
    } else {
        for(int i=0;i<y.n;i++) z.d[i]=x.d[i]+y.d[i];
        for(int i=y.n;i<x.n;i++) z.d[i]=x.d[i];
        z.n=x.n;
    }
    return z;
}

#ifdef NOTUSED
static bignum bigsub3(limb zd[],bignum x,bignum y){ 
    bignum z; z.d=zd;
    for(int i=0;i<y.n;i++) z.d[i]=x.d[i]-y.d[i];
    for(int i=y.n;i<x.n;i++) z.d[i]=x.d[i];
    z.n=x.n;
    return z;
}
#endif

static bignum bigsub2(bignum x,bignum y){ 
    for(int i=0;i<y.n;i++) x.d[i]-=y.d[i];
    return x;
}

static bignum bigadd2(bignum x,bignum y){
    if(x.n<y.n){
        if(x.n>0) {
            x.d[0]--;
            for(int i=0;i<x.n;i++) x.d[i]+=y.d[i]-rcarry;
            x.d[x.n]=y.d[x.n]+1;
            for(int i=x.n+1;i<y.n;i++) x.d[i]=y.d[i];
            x.n=y.n;
        } else {
            x=bigcopy(x,y);
        }
    } else {
        x.d[0]--;
        for(int i=0;i<y.n;i++) x.d[i]+=y.d[i]-rcarry;
        if(y.n==x.n) {
            x.n++;
            x.d[y.n]=1;
        } else {
            x.d[y.n]++;
        }
    }
    return x;
}

static bignum bigdec1(bignum x){ 
    int imax=x.n-1;
    x.d[0]--;
    for(int i=0;i<imax;i++){
        if(x.d[i]>=0) return x;
        x.d[i]+=rbase; x.d[i+1]--;
    }
    return x;
}

static bignum biginc1(bignum x){
    int imax=x.n-1;
    if(imax>=0) {
        x.d[0]++;
        for(int i=0;i<imax;i++){
            if(x.d[i]<rbase) return x;
            x.d[i]-=rbase; x.d[i+1]++;
        }
        if(x.d[imax]<rbase) return x;
        x.d[imax]-=rbase;
    }
    x.n++;
    x.d[imax+1]=1;
    return x;
}

static bignum bigmul3w(limb zd[],bignum x,bignum y){
    bignum z; z.d=zd;
    int imax=x.n+y.n;
    limb2 s[imax];
    bzero(s,sizeof(limb2)*imax);
    imax--;
    for(int i=0;i<x.n;i++){
        for(int j=0;j<y.n;j++){
            s[i+j]+=(limb2)x.d[i]*y.d[j];
        }
        if((i+1)%cmult==0){
            for(int k=0;k<imax;k++){
                if(s[k]<cmax) continue;
                s[k]-=cmax; s[k+1]+=cadd;
            }
        }
    }
    for(int k=0;k<imax;k++){
        s[k+1]+=s[k]/rbase; z.d[k]=s[k]%rbase;
    }
    z.d[imax]=s[imax];
    z.n=imax+1;
    return bigtrim(z);
}

static bignum biglow(bignum x,int n){
    if(x.n>n) x.n=n;
    return x;
}

static bignum bighigh(bignum x,int n){
    if(x.n<n) x.n=0;
    else {
        x.n-=n;
        x.d=&x.d[n];
    }
    return x;
}

static bignum bigmul3s(limb zd[],bignum x,bignum y){
    if(x.n<TUNE1||y.n<TUNE1) return bigmul3w(zd,x,y);
    int M=x.n<y.n?y.n:x.n; int n=M/2;
    bignum z; z.d=zd;
    bignum x0=biglow(x,n),x1=bighigh(x,n);
    bignum y0=biglow(y,n),y1=bighigh(y,n);
    limb z0d[M+1],z1d[M+3],z1yd[n+1],z2d[M+3];
    bignum z0,z2,z1x,z1y,z1;
    z0=bigmul3s(z0d,x0,y0);
    z2=bigmul3s(z2d,x1,y1);
    z1x=bigcarry(bigadd3(z1d,x0,x1));
    z1y=bigcarry(bigadd3(z1yd,y0,y1));
    z1=bigmul3s(z1x.d,z1x,z1y);
    z1=bigcarry(bigsub2(bigsub2(z1,z0),z2));
    z=bigcopy(z,z0);
    z=bigadd3(&z.d[n],(bignum){z.n-n,&z.d[n]},z1);
    z=bigadd2((bignum){z.n-n,&z.d[n]},z2);
    z.n=z.n+2*n; z.d=zd;
    return bigcarry(z);
}

static bignum bigmul3(int l,limb zd[],bignum x,bignum y);
typedef struct { bignum *zp; int l,nx,ny; limb d[]; } bigmul3args;
static void spawn_bigmul3(int c,bignum *zp,int l,limb zd[],bignum x,bignum y){
    unsigned int msgsize=sizeof(bigmul3args)+(x.n+y.n)*sizeof(limb);
    bigmul3args *msg=alloca(msgsize);
    msg->zp=zp; msg->l=l; msg->nx=x.n; msg->ny=y.n;
    memcpy(&msg->d[0],x.d,x.n*sizeof(limb));
    memcpy(&msg->d[x.n],y.d,y.n*sizeof(limb));
    zp->d=zd;
    MPI_Send(msg,msgsize,MPI_BYTE,2*mpi_rank+c,0,MPI_COMM_WORLD);
    return;
}
typedef struct { bignum *zp; int n; limb d[]; } bigmul3ret;
static int workn(){
    unsigned int insize=sizeof(bigmul3args)+digits*sizeof(limb);
    bigmul3args *in=alloca(insize);
    for(;;){
        MPI_Status err;
        MPI_Recv(in,insize,MPI_BYTE,
            MPI_ANY_SOURCE,0,MPI_COMM_WORLD,&err);
        int rankret=err.MPI_SOURCE;
        if(in->zp==0) return 0;
        bignum x,y,z;
        x.n=in->nx; x.d=&in->d[0];
        y.n=in->ny; y.d=&in->d[x.n];
        bigmul3ret *out=(bigmul3ret *)in;
        z=bigmul3(in->l,out->d,x,y);
        out->n=z.n;
        unsigned int outsize=sizeof(bigmul3ret)+z.n*sizeof(limb);
        MPI_Send(out,outsize,MPI_BYTE,rankret,0,MPI_COMM_WORLD);
    }
}
static void sync_bigmul3(int l,int p){
    unsigned int msgsize=sizeof(bigmul3ret)+digits*sizeof(limb);
    bigmul3ret *msg=alloca(msgsize);
    for(int i=0;i<p;i++){
        MPI_Status err;
        MPI_Recv(msg,msgsize,MPI_BYTE,
            2*mpi_rank+l+i,0,MPI_COMM_WORLD,&err);
        msg->zp->n=msg->n;
        memcpy(msg->zp->d,msg->d,msg->n*sizeof(limb));
    }
    return;
}

static bignum bigmul3(int l,limb zd[],bignum x,bignum y){
    int l3=l*3;
    if(x.n<TUNE1||y.n<TUNE1) return bigmul3w(zd,x,y);
    int M=x.n<y.n?y.n:x.n; int n=M/2;
    if(M<TUNE2||l3>mpi_size) return bigmul3s(zd,x,y);
    bignum z; z.d=zd;
    bignum x0=biglow(x,n),x1=bighigh(x,n);
    bignum y0=biglow(y,n),y1=bighigh(y,n);
    limb z0d[M+1],z1d[M+3],z1yd[n+1],z2d[M+3];
    bignum z0,z2,z1x,z1y,z1;
    spawn_bigmul3(l,&z0,l3,z0d,x0,y0);
    spawn_bigmul3(l+1,&z2,l3,z2d,x1,y1);
    z1x=bigcarry(bigadd3(z1d,x0,x1));
    z1y=bigcarry(bigadd3(z1yd,y0,y1));
    z1=bigmul3(l3,z1x.d,z1x,z1y);
    sync_bigmul3(l,2);
    z1=bigcarry(bigsub2(bigsub2(z1,z0),z2));
    z=bigcopy(z,z0);
    z=bigadd3(&z.d[n],(bignum){z.n-n,&z.d[n]},z1);
    z=bigadd2((bignum){z.n-n,&z.d[n]},z2);
    z.n=z.n+2*n; z.d=zd;
    return bigcarry(z);
}

static bignum t1,a,b;
static void fiboworks(int n){
    if(n==0){
        a.n=0; b.n=1; b.d[0]=1;
        return;
    }
    fiboworks(n/2);
    if(n%2==0){
        // [a,b]=[a*(2*b-a),b*(2*b-a)-(-1)^k]
        t1=bigcarry(bigsub2(bigadd3(t1.d,b,b),a));
        a=bigmul3(1,a.d,a,t1);
        b=bigmul3(1,b.d,b,t1);
        if(n%4==0) b=bigdec1(b); else b=biginc1(b);
    } else {
        // [a,b]=[a*(2*a+b)+(-1)^k,b*(2*a+b)]
        t1=bigcarry(bigadd2(bigadd3(t1.d,a,a),b));
        b=bigmul3(1,b.d,b,t1); 
        a=bigmul3(1,a.d,a,t1); 
        if(n%4==1) a=biginc1(a); else a=bigdec1(a);
    }
    return;
}

static void fibowork(int n){
    if(n==0){
        a.n=0; b.n=1; b.d[0]=1;
        return;
    }
    if(n<TUNE3||2>mpi_size) {
        fiboworks(n);
        return;
    }
    fibowork(n/2);
    if(n%2==0){
        // [a,b]=[a*(2*b-a),b*(2*b-a)-(-1)^k]
        t1=bigcarry(bigsub2(bigadd3(t1.d,b,b),a));
        spawn_bigmul3(1,&a,2,a.d,a,t1);
        b=bigmul3(2,b.d,b,t1);
        sync_bigmul3(1,1);
        if(n%4==0) b=bigdec1(b); else b=biginc1(b);
    } else {
        // [a,b]=[a*(2*a+b)+(-1)^k,b*(2*a+b)]
        t1=bigcarry(bigadd2(bigadd3(t1.d,a,a),b));
        spawn_bigmul3(1,&b,2,b.d,b,t1); 
        a=bigmul3(2,a.d,a,t1); 
        sync_bigmul3(1,1);
        if(n%4==1) a=biginc1(a); else a=bigdec1(a);
    }
    return;
}

static bignum fibo(int n, limb xd[]){
    b.d=xd;
    if(n<2){
        b.n=1; b.d[0]=n;
        return b;
    }
    limb ad[digits]; a.d=ad;
    fibowork((n-1)/2);
    if(n%2==0){
        // b=b*(2*a+b)
        t1=bigcarry(bigadd2(bigadd3(t1.d,a,a),b));
        b=bigmul3(1,b.d,b,t1);
    } else {
        // b=b*(2*b-a)-(-1)^k
        t1=bigcarry(bigsub2(bigadd3(t1.d,b,b),a));
        b=bigmul3(1,b.d,b,t1);
        if(n%4==1) b=bigdec1(b); else b=biginc1(b);
    }
    return b;
}

static int work0(int n){
    limb t1d[digits]; t1.d=t1d;
    limb xd[digits];
    bignum x=fibo(n,xd);
    bigprint(x);
    return 0;
}

int main(int argc, char* argv[]){
    int n=4784969;
//    int n=7499;
    setrlimit(RLIMIT_STACK,
        &(const struct rlimit)
        {RLIM_INFINITY,RLIM_INFINITY});
    if(sizeof(limb)*2!=sizeof(limb2)){
        fprintf(stderr,
        "sizeof(limb)=%llu must be half of sizeof(limb2)=%llu!\n",
        (llu)sizeof(limb),(llu)sizeof(limb2));
        return 1;
    }
    limb limbmax=((limb2)1<<(8*sizeof(limb)-1))-1;
    limb2 limb2max=((limb2)1<<(8*sizeof(limb)))-1;
    limb2max=(limb2max<<(8*sizeof(limb)))+limb2max;
    rexp=log(limbmax)/log(10);
    rbase=pow(10,rexp)+0.5; rcarry=rbase-1;
    cmult=(limb2max-2*rbase)/rbase/rbase/2;
    cadd=(limb2)rbase*cmult; cmax=cadd*rbase;
    digits=n*log10(0.5*(1+sqrt(5.0)))/rexp+4;
    MPI_Init(&argc,&argv);
    MPI_Comm_rank(MPI_COMM_WORLD,&mpi_rank);
    MPI_Comm_size(MPI_COMM_WORLD,&mpi_size);
    if(mpi_rank==0){
        work0(n);
        bignum *zp=0;
        for(int i=1;i<mpi_size;i++){
            MPI_Send(&zp,sizeof(bignum *),MPI_BYTE,i,0,MPI_COMM_WORLD);
        }
    } else {
        workn();
    }
    MPI_Finalize();
    return 0;
}
Last edited by ejolson on Mon Feb 11, 2019 6:13 am, edited 1 time in total.

ejolson
Posts: 3061
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sun Feb 10, 2019 12:25 am

The second age of personal computing started with the introduction of a single-board computer called the Raspberry Pi that was cheap enough even a child could own one. This resulted in clusters of real computers housed in Lego structures along with more sensible clusters put together by adults. In this way was born the personal computing cluster, a cluster which took no more electricity than a light bulb and less space than a toaster.

The golden age of personal computing started with the introduction of 8-bit single-board computers such as the Commodore PET. Although it was theoretically possible to assemble these computers into clusters using the built-in IEEE 488 interface, such was not done, possibly because it took too much Lego. People did, of course, still create personal clusters. A representative system is the 16-processor Z80 cluster assembled by Duane Elscott

Image

used for computing fractal art. In general, however, the personal computing cluster only recently became practical.

Even so, it would appear that computing clusters are still much less popular than multi-core processors. Cost is not a factor: The super-cheap cluster in its entirety costs less than US $100, while the cost of a single x86 multi-core CPU is significantly more. Space is not a factor: The super-cheap cluster takes less space than most desktop or notebook computers. What is the difference? Why are multi-core processors everywhere and multi-node clusters not?

It is interesting to note that the 32-core Ryzen Threadripper, which is geared towards mainstream users, internally consists of four active dies arranged in a cluster: Two dies in that arrangement have directly attached memory controllers and two do not. The high-speed interconnect between the CPUs inside the Threadripper package enjoys a 25 Gbit/sec bandwidth with a 250 ns latency. Even though the Threadripper is often treated as a symmetric multiprocessor, internally it is actually a cluster of processors with non-uniform but fast memory access.

For comparison, the USB Ethernet gadgets which form the networking fabric of the Pi-Zero cluster operate with a 0.1 Gbit/sec bandwidth and an 800000 ns latency. The bandwidth is 250 times less and the latency 3200 times more. Latency is often the biggest issue in high-performance computing, and is why the Cray-1 was arranged in its distinctive C shape. In particular, the high latency of the networking fabric in traditional clusters, compared to NUMA, means that for efficient parallel operation the memory in a cluster must be viewed as distributed between processors rather than shared.

Distributed-memory clusters and shared-memory multiprocessors both require parallel-processing techniques to expressively convey an algorithm to the computing hardware for execution. Although not written in line-numbered Basic, the MPI parallel program for computing Fibonacci numbers on a distributed-memory cluster was at least 55 lines of code more difficult to write than the OpenMP shared-memory version. Could this additional programming complexity be the reason why multi-core notebooks, desktops and even phones are commodity items while computing clusters are not?

From a computer literacy point of view, it is important to know whether distributed-memory parallel processing, though difficult, is necessary to maintain the second age of personal computing and avoid the apocalypse that comes after digital feudalism. As anyone who has compared fat-free milk to one-percent knows, the one percent can make an important difference. Indeed, a continuation of the liberation provided by current technology is not at all certain. Consider, for example, the following graph of data obtained from the Wayback Machine showing the number of new posts to the Raspberry Pi forums per year over the last five years:

Image

Heater
Posts: 12660
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sun Feb 10, 2019 11:34 am

So there I was tenaciously avoiding BASIC again, when I accidentally wrote a fibo(4784969) in the D language: https://dlang.org/

Code:

import std.stdio;
import std.bigint;

BigInt zero = "0";
BigInt one = "1";
BigInt two = "2";

bool isEven(int n)
{
    return (n & 1) == 0;
}

// This Fibonacci version derived from Paeryn's Haskell example.
BigInt fibo(int n)
{
    if (n == 0)
    {
        return zero;
    }
    if (n < 3)
    {
        return one;
    }
    int k = (n / 2);
    const BigInt a = fibo(k);
    const BigInt b = fibo(k - 1);
    if (isEven(n))
    {
        return  a * (two * b + a);
    }
    BigInt twoa = two * a;
    if ((n % 4) == 1)
    {
        return  (twoa + b) * (twoa - b) + two;
    }
    return  (twoa + b) * (twoa - b) - two;
}

void main()
{
    BigInt f = fibo(4784969);
    writeln (f);
}
Results:

Code:

$ dmd fibo.d
$ time ./fibo | head -c 32;  time ./fibo | tail -c 32;
107273956418004772293648135
real    1m15.649s
user    1m15.516s
sys     0m0.047s
4856539211500699706378405156269

real    1m14.942s
user    1m14.891s
sys     0m0.016s
That looks terribly slow for a C like language that compiles to native code. BUT once again we have a case where actually calculating fibo(4784969) takes 2 or 3 seconds, then printing the result takes a minute and ten seconds! In fact, as far as I can tell most of that time is spent freeing up all the memory it used.

Perhaps there are better ways to write this in D but I know nothing about it. There is an option to use GMP from D but I'm not sure if we can count that as a standard library for D.

jahboater
Posts: 4448
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Sun Feb 10, 2019 5:51 pm

Interesting.
You could perhaps try:

GC.disable()

at the start of main, which might stop the automatic garbage collection.

FYI, this book on D is the best I have seen on D
https://www.amazon.co.uk/Programming-Tu ... g+language

ejolson
Posts: 3061
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sun Feb 10, 2019 6:07 pm

Heater wrote:
Sun Feb 10, 2019 11:34 am
So there I was tenaciously avoiding BASIC again, when I accidentally wrote a fibo(4784969) in the D language
Something seems strange about the output. Usually head -c32 prints 32 digits while tail -c32 prints 31 digits and a newline.

I suspect the built-in library is computing in base 2 using a sensible algorithm but the base 2 to base 10 conversion for printing is unfortunately an O(n^2) algorithm. This may be the case with Python as well.

Like with any programming language, one is constrained by the vision of other developers when using standard subroutine libraries and built-in features. Maybe the multiply routine was micro-optimised while the printing routine was not. Could GMP be tricked into printing the results of the std.bigint calculation? If so, then it would be possible to have the algebraic notation of one library and the efficient printing of the other. If not, then one can really feel the all-or-nothing pain that results from using someone else's code. Alternatively, maybe the problem really is garbage collection.

The possibility of collecting garbage makes me want to go test the Go library that was linked earlier in this thread. From reading the code, that library appears to have an asymptotically reasonable base 2 to base 10 conversion used for printing, performed by a divide-and-conquer application of division. However, it is impossible to know with certainty how expressively any code conveys an algorithm to the CPU without actually running it on the Raspberry Pi Zero.

Heater
Posts: 12660
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sun Feb 10, 2019 8:25 pm

jahboater,

So I added:

Code:

import core.memory;
...
void main(in string[] args)
{
    GC.disable;
    ...
Made no difference to the run time when printing the output.

fibo.d consumes 204MB RAM whilst running which seems rather high.

Thanks for the book tip. Actually I started to read Ali Cehreli's book online this afternoon here:
http://ddili.org/ders/d.en/writeln.html

I don't know, I promised myself I'd never be learning another programming language in my life, I've been through so many. Most of which are obsolete now. But they just keep coming...

I got attracted to D because of the involvement of one of my software nerd heroes, Andrei Alexandrescu, plus a hint that work was going on to get it running on tiny micro-controllers, plus a hint that there was an effort to use it as a hardware description language. Sadly the latter two of those seem to have fizzled out.

Heater
Posts: 12660
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sun Feb 10, 2019 8:41 pm

ejolson,
Something seems strange about the output. Usually head -c32 prints 32 digits while tail -c32 prints 31 digits and a newline.
Well spotted. It did actually print a line announcing "Done." prior to printing the output so that I could get a feel for how long the computation took. I snipped that out when I posted the output here so it looks a bit odd.

From what I read the dlang BigInt is using karatsuba and is optimized for use only around 1000 digits or so. Bigger than that and they suggest using GMP.

ejolson
Posts: 3061
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Mon Feb 11, 2019 4:32 am

Heater wrote:
Sun Feb 10, 2019 8:25 pm
jahboater,

So I added:

Code:

import core.memory;
...
void main(in string[] args)
{
    GC.disable;
    ...
Made no difference to the run time when printing the output.
Would avoiding garbage collection work better with the empty parenthesis after it as originally suggested?

Heater
Posts: 12660
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Mon Feb 11, 2019 6:23 am

Adding parens did not help noticeably.

I also tried some compiler options "-release -inline -O". No change.

There are GCC and LLVM based D compilers available. I guess they would not make a huge difference.

ejolson
Posts: 3061
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Mon Feb 11, 2019 5:19 pm

Heater wrote:
Mon Feb 11, 2019 6:23 am
There are GCC and LLVM based D compilers available. I guess they would not make a huge difference.
Unless you are avoiding D on the Raspberry Pi, you will need one of the GCC or LLVM based D compilers to generate the ARM executable.

Heater
Posts: 12660
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Mon Feb 11, 2019 5:56 pm

Ah, right. Good point. That would be a huge difference then.

Actually I was thinking of avoiding D until there was a compiler targeting RISC V for it.

jahboater
Posts: 4448
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Mon Feb 11, 2019 6:51 pm

GCC version 9 will include D so it should be available on everything including RISC-V.

Current status:
https://gcc.gnu.org/ml/gcc/2019-02/msg00028.html

Heater
Posts: 12660
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Mon Feb 11, 2019 7:01 pm

Most excellent!

Gavinmc42
Posts: 3168
Joined: Wed Aug 28, 2013 3:31 am

Re: Why Avoid BASIC on RPi?

Mon Feb 11, 2019 11:59 pm

Hey D looks easier to read than C++, even got bit manipulation, so good for those little embedded micros?
I really did not want to learn another language but this looks worthwhile.

I do think it is real cool that we can use these languages on Pi's.
Sure, they may not be the fastest compiler boxes around, but for tinkering with languages they work well enough.

Was there a D based Fibo code?
I'm dancing on Rainbows.
Raspberries are not Apples or Oranges

Heater
Posts: 12660
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Tue Feb 12, 2019 1:38 am

Gavinmc42,
Was there a D based Fibo code?
Yes, I posted it here yesterday, try to keep up: https://www.raspberrypi.org/forums/view ... 0#p1428014

Coincidentally I just found this presentation on YouTube telling me that the D language is now working for RISC V, using GCC, via Fedora Linux: "Fedora on RISC-V 64-bit Introduction, Brief Overview and Latest Developments": https://www.youtube.com/watch?v=yxdT9gsBF_M

But still some road blocks:

1) 64 bit only.

2) Requires Linux to run.

3) No use for "bare metal" 32 bit RISC V. Still stuck with C there.

I don't see D making it down to micro-controllers anytime soon. It can generate the code, like C, but somehow needs a huge run time. Even if you are not using garbage collection or threads etc.

ejolson
Posts: 3061
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Tue Feb 12, 2019 6:02 am

Heater wrote:
Tue Feb 12, 2019 1:38 am
Gavinmc42,
Was there a D based Fibo code?
Yes, I posted it here yesterday
While APL led to A and then the A+ programming language, BCPL led to B, C and then the D programming language. As a teacher I asked myself, what in the grading scale comes after D?

In order to avoid a Fortran Fibonacci code, or at least delay it, the second page of the comic begun in this post is provided instead.

Image

Note that the B in the above comic is unrelated to the B programming language. No identification with actual programming languages (living or deceased), places, buildings, and products is intended or should be inferred. Basically, all characters are fictitious but have become increasingly difficult to avoid, at least in this thread.
Last edited by ejolson on Tue Feb 12, 2019 1:32 pm, edited 1 time in total.

jahboater
Posts: 4448
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Tue Feb 12, 2019 8:30 am

ejolson wrote:
Tue Feb 12, 2019 6:02 am
While APL led to A and then the A+ programming language, BCPL led to B, C and then the D programming language. As a teacher I asked myself, what in the grading scale comes after D?
CPL (Combined Programming Language) led to BCPL (Basic CPL).

They called the first successor to C, C++ to avoid the choice between D and P (P being the third letter of BCPL).

ejolson
Posts: 3061
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Tue Feb 12, 2019 3:57 pm

jahboater wrote:
Tue Feb 12, 2019 8:30 am
ejolson wrote:
Tue Feb 12, 2019 6:02 am
While APL led to A and then the A+ programming language, BCPL led to B, C and then the D programming language. As a teacher I asked myself, what in the grading scale comes after D?
CPL (Combined Programming Language) led to BCPL (Basic CPL).

They called the first successor to C, C++ to avoid the choice between D and P (P being the third letter of BCPL).
Interesting, so the B in BCPL stood for Basic. Maybe that explains why C and subsequently D both have goto statements. Basic is sure getting difficult to avoid!

Does D have a gosub-like setjmp similar to C?

Heater
Posts: 12660
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Tue Feb 12, 2019 5:23 pm

Amazingly one can still get recently maintained BCPL from Martin Richards. Which is usable on the Raspberry Pi apparently:

"Martin Richards maintains a modern version of BCPL on his website, last updated in 2018. This can be set up to run on various systems including Linux, FreeBSD, Mac OS X and Raspberry Pi. The latest distribution includes Graphics and Sound libraries and there is a comprehensive manual in PDF format. He continues to program in it, including for his research on musical automated score following."

https://www.cl.cam.ac.uk/~mr10/BCPL.html

fibo(4784969) in BCPL anybody?

jahboater
Posts: 4448
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Tue Feb 12, 2019 5:42 pm

ejolson wrote:
Tue Feb 12, 2019 3:57 pm
Does D have a gosub-like setjmp similar to C?
I believe it is possible but frowned upon!

I thought gosub was a normal function call in Basic?

jahboater
Posts: 4448
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Tue Feb 12, 2019 6:29 pm

Heater wrote:
Tue Feb 12, 2019 5:23 pm
Amazingly one can still get recently maintained BCPL from Martin Richards. Which is usable on the Raspberry Pi apparently:
I installed BCPL on an Amdahl mainframe over 35 years ago and at the time I was impressed that it "just worked" - because it was a binary executable and IBM hardware had remained backwards compatible.

The only thing I can recall from then was that BCPL could return a value from a statement expression (like an Algol 68 serial clause).
You can do this in C now with the GCC extension ({ .... })

B and BCPL were type-less which now I would regard as an absolute nightmare!

ejolson
Posts: 3061
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Fri Feb 15, 2019 8:08 pm

Heater wrote:
Tue Feb 12, 2019 1:38 am
try to keep up
It seems I will need to do some studying to make an object-oriented operator-overloaded parallel coarray Fortran 2018 version of the Fibonacci code. I wonder how expressive that will be or whether it would have been better not to avoid BASIC. If one of the 50,000 views of this thread was made by Fortran Man, some help getting started would be appreciated.

In the meantime, here are some meaningless performance results comparing parallel.c using two different parallelization technologies on the same Xeon Gold 6126 server as before.

Code:

TYPE  CORES HT RANKS TIME  RATIO
serial  1    1  n/a  0.652   1
MPICH   9   18   18  0.202  3.2
MPICH   9    9    9  0.184  3.5
OpenMP  9    9  n/a  0.123  5.3
OpenMP  9   18  n/a  0.103  6.3
Note that MPICH was compiled to use the ch3:sock device, which is likely slower than using shared memory to pass messages. It would be interesting to know how much of a performance penalty, if any, this incurs.

Has there been any progress adding the Visual Basic and BBC Basic programs to Github?

jahboater
Posts: 4448
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Mon Feb 18, 2019 9:32 am

I saw this on code review, which someone may find interesting:-

https://codereview.stackexchange.com/qu ... iplication
