Heater
Posts: 12976
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Mon Jul 22, 2019 6:17 pm

ejolson,
The JavaScript timings look promising. I wonder what needs to be done so that code runs on the Raspberry Pi. The incompatibility reminds me of Basic.
Nothing. It runs just fine. See below.

What incompatibility? It's the same node.js engine running the language to the same ECMA standard.

Nothing like BASIC at all.

OK. On a Pi 3 (not +) with out of the box Buster and node.js install I get this:

Code: Select all

pi@pi3-buster-1:~ $ uname -a
Linux pi3-buster-1 4.19.50-v7+ #896 SMP Thu Jun 20 16:11:44 BST 2019 armv7l GNU/Linux
pi@pi3-buster-1:~ $ node -v
v10.15.2

pi@pi3-buster-1:~ $ time node selfgrams.js > selfgrams_js.txt
rss 147.39 MB
heapTotal 114.88 MB
heapUsed 100.21 MB
external 0.01 MB

real    0m16.848s
user    0m18.010s
sys     0m0.401s

pi@pi3-buster-1:~ $ time ./selfgrams.pl > selfgrams_pl.txt

real    0m11.214s
user    0m11.042s
sys     0m0.171s

pi@pi3-buster-1:~ $ md5sum selfgrams_pl.txt selfgrams_js.txt
addeda36a4631afc983f3bff13742e36  selfgrams_pl.txt
addeda36a4631afc983f3bff13742e36  selfgrams_js.txt
Note the heap use report I added there.

What was running on what exactly to cause that FATAL ERROR?

Heater
Posts: 12976
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Mon Jul 22, 2019 6:52 pm

node.js version 12.6.0 is doing much better. Speed is up. Memory consumption is down:

Code: Select all

pi@pi3-buster-1:~ $ node -v
v12.6.0

pi@pi3-buster-1:~ $ time ./selfgrams.pl > selfgrams_pl.txt

real    0m11.297s
user    0m11.161s
sys     0m0.111s

pi@pi3-buster-1:~ $ time node selfgrams.js > selfgrams_js.txt
rss 119.24 MB
heapTotal 83.82 MB
heapUsed 77.32 MB
external 0.64 MB

real    0m11.817s
user    0m13.635s
sys     0m0.292s

pi@pi3-buster-1:~ $ md5sum selfgrams_js.txt selfgrams_pl.txt
addeda36a4631afc983f3bff13742e36  selfgrams_js.txt
addeda36a4631afc983f3bff13742e36  selfgrams_pl.txt
pi@pi3-buster-1:~ $

User avatar
davidcoton
Posts: 3949
Joined: Mon Sep 01, 2014 2:37 pm
Location: Cambridge, UK

Re: Project Digital Apocalypse Not Now

Mon Jul 22, 2019 9:12 pm

Heater wrote:
Mon Jul 22, 2019 5:13 pm
Give me two ticks...
But where in all Unicode do we find ticks? :lol:
Signature retired

Heater
Posts: 12976
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Mon Jul 22, 2019 9:40 pm

✓✓

:)

User avatar
davidcoton
Posts: 3949
Joined: Mon Sep 01, 2014 2:37 pm
Location: Cambridge, UK

Re: Project Digital Apocalypse Not Now

Mon Jul 22, 2019 10:08 pm

OK I'll bookmark your post, so I can mine it for ticks next time I need them. :+1:
Signature retired

Heater
Posts: 12976
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Tue Jul 23, 2019 3:39 pm

jamesh will be glad to hear that he is correct about 32 bit vs 64 bit OS when it comes to this anagrams problem.

On a Pi 3 running pi64:
pi@pi64-aalto:~$ time node insane-british-anagram.js > insane-british-anagram.txt
rss 231.5 MB
heapTotal 194.57 MB
heapUsed 169.08 MB
external 0.01 MB

real 0m15.261s
user 0m15.748s
sys 0m0.572s
pi@pi64-aalto:~$ time ./selfgrams.pl > selfgrams.txt

real 0m11.489s
user 0m11.155s
sys 0m0.332s
Execution time is much the same. Memory usage is twice as much!

User avatar
John Spikowski
Posts: 41
Joined: Sat Jul 20, 2019 5:34 pm
Location: Anacortes, WA USA
Contact: Website

Re: Project Digital Apocalypse Not Now

Tue Jul 23, 2019 4:31 pm

Heater,

And you gave up crossword puzzles for this?

User avatar
scruss
Posts: 2360
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: Project Digital Apocalypse Not Now

Tue Jul 23, 2019 5:20 pm

Heater wrote:
Mon Jul 22, 2019 6:17 pm
What was running on what exactly to cause that FATAL ERROR?
Your code, on a Raspberry Pi Zero, running whatever the stock version of node is (dunno, but old) on Stretch. I can't reach that particular machine right now to check.

I just ran it on a 3A+ running Stretch (node 8.11.1) and it completed in a rather tidy 19.7 s. ISTR node having problems on the single-core Raspberry Pis.
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.

ejolson
Posts: 3252
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Tue Jul 23, 2019 5:44 pm

scruss wrote:
Tue Jul 23, 2019 5:20 pm
Heater wrote:
Mon Jul 22, 2019 6:17 pm
What was running on what exactly to cause that FATAL ERROR?
Your code, on a Raspberry Pi Zero, running whatever the stock version of node is (dunno, but old) on Stretch. I can't reach that particular machine right now to check.

I just ran it on a 3A+ running Stretch (node 8.11.1) and it completed in a rather tidy 19.7 s. ISTR node having problems on the single-core Raspberry Pis.
Do you need a different binary for Node.js compiled as ARMv6 to run on the Pi Zero?

You might end up with wrong binary if switching cards between different flavours of Raspberry Pi, but I had problems with the default openjdk-9 binaries in Raspbian on the Zero earlier. While I know Java and JavaScript are different, maybe something similar is happening in the present case.
Last edited by ejolson on Tue Jul 23, 2019 5:50 pm, edited 5 times in total.

Heater
Posts: 12976
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Tue Jul 23, 2019 5:46 pm

scruss,

Thanks for looking into this. Something odd going on here.

I may still be wrong but as far as I can see there should be problem running this in 512MB RAM. I can only see it using a shade less than 90MB. Presuming there is not something else using a lot of RAM.

I don't have a Stretch here anymore but it does run under my Buster which has node 10.15.2 out of the box.

I have been running node.js on Pi since the first models, never heard of any problems running on a single core machine. Out of curiosity I disabled all but one core on a Pi 3. Everything runs fine in 14 seconds. A bit slower than with more cores.

Heater
Posts: 12976
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Tue Jul 23, 2019 5:48 pm

ejolson,
Do you need a different binary for Node.js compiled as ARMv6 to run on the Pi Zero?
In theory the node that comes with Raspbian runs on all models of Pi. Like the rest of Raspbian.

Sadly I don't have a Zero here to experiment with.

User avatar
scruss
Posts: 2360
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: Project Digital Apocalypse Not Now

Tue Jul 23, 2019 11:00 pm

Ah, it's all good now.
For some reason the Zero only had ~32 MB free memory and the GPU memory set very high. I fixed that. It works now. Sorry about that.
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.

Heater
Posts: 12976
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Wed Jul 24, 2019 2:11 am

Phew. No worries. Mystery solved. I have done similar in the past.

Given that the British insane dictionary is 6.6MB it could be argued that my using nearly 100MB here is insane!

The way that JS works it:

Reads the entire file to RAM -- ~7MB,
builds the anagrams list -- another 7MB
then builds a formatted output string --- another 7MB.

That's 21MB. What is JS doing with the rest of it?

There has to be be about a million pointers used in there, so that's another 8MB.

ejolson
Posts: 3252
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Wed Jul 24, 2019 5:48 am

ejolson wrote:
Sun Jul 21, 2019 7:33 pm
I'm presently working on a version in classic line-numbered Basic that organizes all the anagrams into a heap data structure.
At FidoBasic headquarters I entered what looked like a lift. It turned out to be a time machine. Suddenly I was standing in the midst of the golden age of personal computing, still holding my Raspberry Pi. In true canine fashion Fido's head turned to one side as I explained the insane British anagram challenge to the wondering crowd of teenage student Basic programmers. By the time I was finished explaining how with a challenge anyone who writes a working program is a winner, one of the children motioned to me.

With the hunt-and-peck quickness of a ninja and stealth, a working program had already been written without my notice. On the screen of a nearby 8-bit microcomputer was the following:

Code: Select all

1000 REM banagram.bas -- Find anagrams
1010 REM Written July 16, 1978 by Eric Olson
1020 REM
1030 REM This program demonstrates using letter counts and linear
1040 REM lists to find anagrams in the insane British dictionary.
1050 REM
2000 REM F$="/usr/share/dict/british-english-insane"
2001 F$="tail.dat"
2010 M0=0:OPEN F$ FOR INPUT AS #1
2015 IF EOF(1)<>0 THEN 2030
2020 INPUT #1,A$:M0=M0+1:GOTO 2015
2030 DIM L$(M0),K$(M0),V0(26)
2040 CLOSE #1:OPEN F$ FOR INPUT AS #1
2050 A1=ASC("a")-1:A2=ASC("0"):N0=0
2060 FOR I=1 TO M0:INPUT #1,A$
2063 FOR J=1 TO 26:V0(J)=0:NEXT J:J=0
2065 IF J>=LEN(A$) THEN 2075
2068 J=J+1:C=ASC(MID$(A$,J,1))-A1
2070 IF (C<1) OR (C>26) THEN 2130
2071 V0(C)=V0(C)+1:GOTO 2065
2075 V$="":FOR J=1 TO 26
2080 V$=V$+CHR$(V0(J)+A2):NEXT J:J=0
2090 IF J>=N0 THEN 2120
2100 J=J+1:IF K$(J)<>V$ THEN 2090
2110 L$(J)=L$(J)+" "+A$+",":GOTO 2130
2120 N0=N0+1:L$(N0)=A$+":":K$(N0)=V$
2130 NEXT I:CLOSE #1
2400 FOR I=1 TO N0:A$=L$(I)
2410 IF RIGHT$(A$,1)=":" THEN 2430
2420 PRINT LEFT$(A$,LEN(A$)-1)
2430 NEXT I
9000 END
Two things caught my attention: The first was the fact that the comments listed me as the author; the second that the program would surely run with non-ninja turtle-like slowness because it used an O(n^2) algorithm based on linear lists to match and store the anagrams.

I turned to Fido and whispered under my breath, for a program written by someone who has not formally studied algorithms at the university this code shows remarkable skill and insight. The canine sneezed and and looked at me as if I had just congratulated myself. Then with a bark the dog developer announced, let's see if it really works.

Since the 4K memory of the PET 2001 was not sufficient to hold the insane British dictionary and the algorithm was too slow anyway, I connected the Pi 3B to a nearby television and typed

Code: Select all

$ tail -n 10000 /usr/share/dict/british-english-insane >tail.dat
$ wc tail.dat
10000 10000 93607 tail.dat
$ time ./bwbasic banagram.bas >banagram-bw.tail

real    18m30.145s
user    18m29.419s
sys 0m0.120s
$ time ./sbrandy banagram-mb.bas >banagram-mb.tail

real    0m23.888s
user    0m23.797s
sys 0m0.071s
$ time ./gplbasic banagram.bas >banagram-gpl.tail

real    0m9.476s
user    0m9.460s
sys 0m0.011s
$ fbc -lang qb banagram.bas
$ time ./banagram >banagram-fbc.tail

real    0m5.330s
user    0m5.313s
sys 0m0.011s
$ time ./anagram.perl >perl.tail

real    0m0.226s
user    0m0.205s
sys 0m0.021s
$ md5sum *.tail
2f0427e2de42059e20bb335e25c5c1ca  banagram-bw.tail
2f0427e2de42059e20bb335e25c5c1ca  banagram-fbc.tail
2f0427e2de42059e20bb335e25c5c1ca  banagram-gpl.tail
2f0427e2de42059e20bb335e25c5c1ca  banagram-mb.tail
2f0427e2de42059e20bb335e25c5c1ca  perl.tail
The code for Matrix Brandy Basic was slightly different and read as

Code: Select all

1000 REM banagram-mb.bas -- Find anagrams
1010 REM Written July 17, 1978 by Eric Olson
1020 REM
1030 REM This program demonstrates using letter counts and linear
1040 REM lists to find anagrams in the insane British dictionary.
1050 REM
2000 REM F$="/usr/share/dict/british-english-insane"
2001 F$="tail.dat"
2010 M0=0:F1=OPENIN(F$)
2015 IF EOF#(F1)<>0 THEN 2030
2020 A$=GET$#(F1):M0=M0+1:GOTO 2015
2030 DIM L$(M0),K$(M0),V0(26)
2040 CLOSE# F1:F1=OPENIN(F$)
2050 A1=ASC("a")-1:A2=ASC("0"):N0=0
2060 FOR I=1 TO M0:A$=GET$#(F1)
2063 FOR J=1 TO 26:V0(J)=0:NEXT J:J=0
2065 IF J>=LEN(A$) THEN 2075
2068 J=J+1:C=ASC(MID$(A$,J,1))-A1
2070 IF (C<1) OR (C>26) THEN 2130
2071 V0(C)=V0(C)+1:GOTO 2065
2075 V$="":FOR J=1 TO 26
2080 V$=V$+CHR$(V0(J)+A2):NEXT J:J=0
2090 IF J>=N0 THEN 2120
2100 J=J+1:IF K$(J)<>V$ THEN 2090
2110 L$(J)=L$(J)+" "+A$+",":GOTO 2130
2120 N0=N0+1:L$(N0)=A$+":":K$(N0)=V$
2130 NEXT I:CLOSE# F1
2400 FOR I=1 TO N0:A$=L$(I)
2410 IF RIGHT$(A$,1)=":" THEN 2430
2420 PRINT LEFT$(A$,LEN(A$)-1);CHR$(10);
2430 NEXT I
9000 END
Fido's paw motioned that is was time to go, so we headed to the time machine. Once inside I suggested, let's go back to the future so I can get a Pi 4B. Fido became barking mad and growled, the 4B has already been released. I replied, but there won't be such a short supply in the future so shipping times will be quicker.

Instead we returned to the present and Fido created a bar chart depicting the performance of the different versions of Basic.

Image

Fido's tail happily wagged back and forth while explaining, I've used dogarithmic coordinates because the timings involved so many different orders of magnitude. I think the canine coder meant logarithmic.
Last edited by ejolson on Wed Jul 24, 2019 3:45 pm, edited 6 times in total.

Heater
Posts: 12976
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Wed Jul 24, 2019 7:23 am

Barking mad indeed.

I thought dogs calculated by chewing on bones, Napier's bones.

It's just as well you made that trip back in time. Else the code written my your young self there in the audience of teenage student Basic programmers would have been lost forever, neglected and forgotten as it never completed on any machine at the time.

All we ever got done on a PET was a lunar lander game.

ejolson
Posts: 3252
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Wed Jul 24, 2019 4:13 pm

Heater wrote:
Wed Jul 24, 2019 7:23 am
I thought dogs calculated by chewing on bones, Napier's bones.
I questioned the lead developer of FidoBasic about Napier's bones. With a bark Fido proudly replied, Napier's bones are based on dogarithms. You mean logarithms, I suggested. The canine coder looked confused and whined, have you ever seen a log chew on a bone?

Before this thread goes completely to the dogs, a plea hereby goes out to the non-canine members of this forum to attempt the insane British anagram challenge and thereby help to avoid a dogital apocalypse.

Speaking of the apocalypse, I just received another letter from the zombies which read
zombies wrote:We have found a way to use dogarithms and prime numbers to speed up our anagram finding programs. Let us in now.
I turned to Fido, have any dogs been helping the zombies? The reply: Anyone writing code deserves and will receive help around here.

User avatar
Paeryn
Posts: 2614
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Project Digital Apocalypse Not Now

Wed Jul 24, 2019 5:08 pm

My feline coder is halfway through it, though he was annoyed that the insane list ruined his word hashing algorithm (taking it over a 64 bit value). He's gone for a sulk and a lie down.
She who travels light — forgot something.

gkreidl
Posts: 5999
Joined: Thu Jan 26, 2012 1:07 pm
Location: Germany

Re: Project Digital Apocalypse Not Now

Wed Jul 24, 2019 5:11 pm

Here is the Python 3 version for the anagram challenge.

Code: Select all

#!/usr/bin/python3

import sys,re
if sys.version[0:3] < "3.6":
    from collections import OrderedDict
    anagrams = OrderedDict()
else:
    anagrams = {}

f = open('/usr/share/dict/british-english-insane','rt')
dictionary = f.read().split('\n')
f.close()

check = re.compile('^[a-z]+$')

for word in dictionary:
    if check.match(word):
        keyl = list(word)
        keyl.sort()
        key = ''.join(keyl)
        if key in anagrams:
            anagrams[key].append(word)
        else:
            anagrams[key] = [word]
output = ''

for v in anagrams.values():
    if len(v) > 1:
        op = ', '.join(v) + '\n'
        output += op.replace(',',':',1)

print(output,end='')

Running it on a RPi 3+ using Stretch and comparing it to Perl I get:

Code: Select all

pi@raspberrypi4:~ $ time ./selfgrams.py > selfgrams_py.txt

real	0m14,493s
user	0m14,221s
sys	0m0,180s
pi@raspberrypi4:~ $ time ./selfgrams.pl > selfgrams_pl.txt

real	0m11,729s
user	0m11,417s
sys	0m0,120s
pi@raspberrypi4:~ $ md5sum selfgrams_pl.txt
bec74aa3b31577edbb291aeb7269a4d5  selfgrams_pl.txt
pi@raspberrypi4:~ $ md5sum selfgrams_py.txt
bec74aa3b31577edbb291aeb7269a4d5  selfgrams_py.txt
On Buster it will run more than a second faster, because we do not have to use the slower subclass OrderedDict.

BTW, the British Insane Dictionary includes words containing one or more of the following characters "èóüäçéáåöñëíâôûøêîïàùú". Why should we exclude them from the anagrams (with the exception of Brexit fans)? It also includes words containing apostrophs, but it seems natural (to me) to exclude them.

To include the special characters, the Perl code has to be modified like this:

Code: Select all

#!/usr/bin/perl -l
# selfgrams4.pl - find words that have valid anagrams
# scruss - 2019-07
my ( %grams, $key, $fh, @out, $k );

open( $fh, '<', '/usr/share/dict/british-english-insane' ) or die "$!\n";
foreach ( grep { /^[a-zèóüäçéáåöñëíâôûøêîïàùú]+$/ && chomp; } <$fh> ) {
    $key = join( '', sort( split( '', $_ ) ) );
    if ( exists( $grams{$key} ) ) {
        $out[ $grams{$key} ] .= ', ' . $_;
    }
    else {
        $grams{$key} = $k;
        $out[$k] = $_;
        $k++;
    }
}
close($fh);

print join( "\n", grep { index( $_, ',' ) >= 0 && s/,/:/; } @out );
exit;
The Python code can be simplified because we don't have to use regular expressions any more:

Code: Select all

#!/usr/bin/python3

import sys
if sys.version[0:3] < "3.6":
    from collections import OrderedDict
    anagrams = OrderedDict()
else:
    anagrams = {}

f = open('/usr/share/dict/british-english-insane','rt')
dictionary = f.read().split('\n')
f.close()

for word in dictionary:
    if word.islower() and "'" not in word:
        keyl = list(word)
        keyl.sort()
        key = ''.join(keyl)
        if key in anagrams:
            anagrams[key].append(word)
        else:
            anagrams[key] = [word]
output = ''

for v in anagrams.values():
    if len(v) > 1:
        op = ', '.join(v) + '\n'
        output += op.replace(',',':',1)

print(output,end='')
Timing on Stretch:

Code: Select all

pi@raspberrypi4:~ $ time ./selfgramsall.py > selfgrams_py.txt

real	0m12,821s
user	0m12,599s
sys	0m0,160s
pi@raspberrypi4:~ $ time ./selfgramsall.pl > selfgrams_pl.txt

real	0m11,504s
user	0m11,320s
sys	0m0,150s
pi@raspberrypi4:~ $ md5sum selfgrams_pl.txt
a6073146aceaa463645c7685ddd6a9a0  selfgrams_pl.txt
pi@raspberrypi4:~ $ md5sum selfgrams_py.txt
a6073146aceaa463645c7685ddd6a9a0  selfgrams_py.txt
On Buster both language versions should need about the same time.
Minimal Kiosk Browser (kweb)
Slim, fast webkit browser with support for audio+video+playlists+youtube+pdf+download
Optional fullscreen kiosk mode and command interface for embedded applications
Includes omxplayerGUI, an X front end for omxplayer

User avatar
John Spikowski
Posts: 41
Joined: Sat Jul 20, 2019 5:34 pm
Location: Anacortes, WA USA
Contact: Website

Re: Project Digital Apocalypse Not Now

Wed Jul 24, 2019 5:27 pm

This challenge seems to favor languages with built in functions like sort , islower, ...

The same bias was shown with fibo and languages with BIGINT support.

Let's rename this thread "Obscure challenges with the goal to define Infinity".

ejolson
Posts: 3252
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Wed Jul 24, 2019 6:54 pm

John Spikowski wrote:
Wed Jul 24, 2019 5:27 pm
This challenge seems to favor languages with built in functions like sort , islower, ...
The goal is to avoid the digital apocalypse by writing some code that finds anagrams. As stated in the challenge description, you are free to use any programming language you prefer, whether suitable or not.

By the design of the ASCII as well as UTF-8 character encodings, one can determine whether a string contains only lowercase characters by checking that the byte values of the string fall between the byte values for the letters a and z inclusive. In a modern version of the Basic programming language example code to do this is

Code: Select all

module wordtest

function isword(s as string) as boolean
    dim i,c as integer
    for i=1 to len(s)
        c=asc(mid(s,i,1))
        if (c<asc("a")) or (c>asc("z")) then
            return false
        end if
    next i
    return true
end function

function main() as integer
    if isword("test") then 
        console.writeline("test is lowercase")
    else
        console.writeline("test is not lowercase")
    end if
    if isword("tEst") then
        console.writeline("tEst is lowercase")
    else
        console.writeline("tEst is not lowercase")
    end if
    return 0
end function

end module
To run the above program using the mono Visual Basic compiler type

Code: Select all

$ vbnc wordtest.bas 
Visual Basic.Net Compiler version 0.0.0.5943 (Mono 4.0.1 - tarball)
Copyright (C) 2004-2010 Rolf Bjarne Kvinge. All rights reserved.

Compilation successful
Compilation took 00:00:18.4694050
$ mono wordtest.exe
test is lowercase
tEst is not lowercase
If I understood correctly, ScriptBasic has built-in associative arrays. Since one can assume the dictionary is already sorted, associative arrays are actually all one needs to finish the challenge. I think this is the approach taken in the recently posted JavaScript and Python codes.

ejolson
Posts: 3252
Joined: Tue Mar 18, 2014 11:47 am

Re: Project Digital Apocalypse Not Now

Wed Jul 24, 2019 7:10 pm

gkreidl wrote:
Wed Jul 24, 2019 5:11 pm
Here is the Python 3 version for the anagram challenge.
It's interesting that the Python code to find all anagrams with those extra letters in them seems to run faster than the original. Is that due to timing variations or is it a real effect? What happens if you rerun the original program a couple times?
Last edited by ejolson on Wed Jul 24, 2019 7:58 pm, edited 1 time in total.

User avatar
scruss
Posts: 2360
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: Project Digital Apocalypse Not Now

Wed Jul 24, 2019 7:44 pm

ejolson wrote:
Wed Jul 24, 2019 7:10 pm
BTW, the British Insane Dictionary includes words containing one or more of the following characters "èóüäçéáåöñëíâôûøêîïàùú". Why should we exclude them from the anagrams (with the exception of Brexit fans)?
No brexitry here, I assure you. They're excluded because the English version of Bananagrams (developed in Pawtucket, RI, USA) does not have them. A truly British set would have tiles for 'ff', 'll', 'ng', 'ch' and 'gh' at the very least, since they are compound letters used in languages of the British Isles. Collating on compound letters is loads of laffs, let me tell you.
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.

gkreidl
Posts: 5999
Joined: Thu Jan 26, 2012 1:07 pm
Location: Germany

Re: Project Digital Apocalypse Not Now

Wed Jul 24, 2019 8:14 pm

ejolson wrote:
Wed Jul 24, 2019 7:10 pm
gkreidl wrote:
Wed Jul 24, 2019 5:11 pm
Here is the Python 3 version for the anagram challenge.
It's interesting that the Python code to find all anagrams with those extra letters in them seems to run faster than the original. Is that due to timing variations or is it a real effect? What happens if you rerun the original program a couple times?
The speed difference is in using str.lower() instead of using a regular expression, which is a bit slower in Python than in Perl. If I want to exclude all special chracters I have to use the regular expression method.
Minimal Kiosk Browser (kweb)
Slim, fast webkit browser with support for audio+video+playlists+youtube+pdf+download
Optional fullscreen kiosk mode and command interface for embedded applications
Includes omxplayerGUI, an X front end for omxplayer

User avatar
Paeryn
Posts: 2614
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Project Digital Apocalypse Not Now

Wed Jul 24, 2019 8:19 pm

Kira the Koding Kitty (as he insists he be called) says he's only including words containing the 26 English lower-case letters, though he can be purr-suaded to allow hyphens which will be ignored and possibly upper-case which will be treated as lower-case, but only if I give him some turkey. And it will have to wait until he's been fishing!
She who travels light — forgot something.

Heater
Posts: 12976
Joined: Tue Jul 17, 2012 3:02 pm

Re: Project Digital Apocalypse Not Now

Wed Jul 24, 2019 10:02 pm

No cats or dogs around here I'm afraid. I have to do everything myself.

After a days graft parsing an obscure proprietary binary protocol I found myself making this insane British anagram solution in C/C++:

Code: Select all

//
// insane-british-anagram.cpp - Find words that have valid anagrams
//                              Words sourced from Debian's british-english-insane dictionary
//
// heater - 2019-07-25
// 
#include <fstream>
#include <string>
#include <iostream>
#include <stdlib.h>
#include <string.h>

#include <unordered_map> 

std::string getFileContents(const char *filename) {
    std::ifstream in(filename, std::ios::in | std::ios::binary);
    if (in) {
        std::string contents;
        in.seekg(0, std::ios::end);
        contents.resize(in.tellg());
        in.seekg(0, std::ios::beg);
        in.read(&contents[0], contents.size());
        in.close();
        return(contents);
    }
    throw(std::runtime_error("Shit happened!"));
}

char* sortString (const char* s) {
    char* string = (char*)malloc(strlen(s) + 1); 
    strcpy (string, s); 
    char temp;
    int i, j;
    int n = strlen(string);

    for (i = 0; i < n-1; i++) {
        for (j = i+1; j < n; j++) {
            if (string[i] > string[j]) {
            temp = string[i];
            string[i] = string[j];
            string[j] = temp;
            }
        }
    }
    return string;
}

int main (int argc, char* argv[]) {
    // Map container for sets of anagrams 
    // An anagram set is simply a string of format like: "whiter: wither, wrihte, writhe"
    std::unordered_map<std::string, std::string> anagramMap;
    std::unordered_map<std::string, std::string>::iterator it;

    auto dictionary = getFileContents("/usr/share/dict/british-english-insane");
    char* dictionaryPtr = (char*)dictionary.c_str();
    char* wordPtr = dictionaryPtr;
    char* charPtr = wordPtr;
    bool reject = false;
    while (1) {
        if (islower(*charPtr)) {
            // We are scanning a valid word
            charPtr++;
        } else if ((*charPtr) == '\n') {
            // We have hit the end of a word, use the word if it's valid
            *charPtr = 0;
            if (!reject) {
                // Do we have a word with this key (potential anagram)?
                char* key = sortString(wordPtr);
                it = anagramMap.find(key);
                if (it == anagramMap.end()) {
                    // No: Add it to the map as start of new anagram set.
                    anagramMap[key] = std::string(wordPtr);
                } else {
                    // Yes: Append it to the existing anagram set.
                    it->second.append(", ").append(wordPtr); 
                }
            }
            charPtr++;
            wordPtr = charPtr;
            reject = false;
        } else if ((*charPtr) != 0) {
            // Invalid character
            reject = true;
            charPtr++;
        } else {
            // End of dictionary
            break;
        }
    }

    // Output all the sets that have more than one word.
    for( auto const& [key, val] : anagramMap )
    {
        // Sets with a "," contain more than one word, the anagrams!
        size_t found = val.find(',');
        if (found != std::string::npos) {
            std::string fart = val;
            fart.replace(found, 1, ":");
            std::cout << fart << std::endl ;
        }
    }
    return (0);
}
Which runs nicely on a Pi 3 like so (with comparison to JS):

Code: Select all

pi@pi3-buster-1:~ $ g++ -std=c++17 -Wall -O3 -o insane-british-anagram insane-british-anagram.cpp
pi@pi3-buster-1:~ $ time ./insane-british-anagram > insane-british-anagram-cpp.txt

real    0m1.931s
user    0m1.580s
sys     0m0.350s
pi@pi3-buster-1:~ $ time node insane-british-anagram.js > insane-british-anagram-js.txt
rss 119.62 MB
heapTotal 83.82 MB
heapUsed 77.3 MB
external 0.64 MB

real    0m11.967s
user    0m13.957s
sys     0m0.231s
A useful 6 times faster than JS and PL.

Only one problem....

It finds all the right anagrams but the lines are printed in random order!
$ tail insane-british-anagram-cpp.txt
noup: upon
nourisher: renourish
nibbles: snibble
nouses: onuses
nov: von
koso: skoo, soko, sook
novale: novela
anorectous: courantoes
pagnes: paseng
novial: violan
Well of course, I used an unordered map (hash table) to keep track of the anagrams.

Maybe I'll be inspired to fix it after a cat nap...

Return to “General programming discussion”