Archive for the 'Erlang' Category

04/12 Border

Let u be a non-empty word. A border of u is a word different from u which is both prefix and suffix of u. For example, the word u = abacaba has the three borders ε, a and aba. Let us note BORDER( x ) the longest border of a non-empty word x.

This implementation of Erlang of BORDER(x) is a bit convoluted, infriges on Christian’s fatwah on mixing printing and computations and function overloading [can’t win them all in one day], but it seems to work. As a bonus it calculates the length, β, and the border itself all in one go.

-module(rinder).
-compile(export_all).
border(A) ->
 AA=list_to_tuple(A),
 SA=size(AA),
 Len=trunc(SA/2),
 Len2=Len * 2,
 case SA of
  Len2 -> Len3 = Len-1;
  _ -> Len3 = Len
 end,
 io:format("Max length of border: ~p~n",[Len3]),
 io:format("Min value of u: ~p~n",[lists:sublist(A,Len3+1,SA-Len3-Len3)]),
 H=element(1,AA),
 T=element(SA,AA),
 case H of
  T -> io:format(" => Border of size 1, ~p~n",[[T]]), border(AA,Len3,1,1,2);
  _ -> io:format(" => No border size of 1~n",[]), border(AA,Len3,0,1,2)
 end.

border(A,Len3,Temp,Cnt,Current) when Current > Len3 ->
 case Temp of
  0 -> {0,[]};
  N -> {N,lists:sublist(tuple_to_list(A),1,N)}
 end;
border(A,Len3,Temp,Cnt,Current) when Cnt > Current ->
 io:format(" => Border of size ~p, ~p~n",[Current,lists:sublist(tuple_to_list(A),1,Current)]),
 border(A,Len3,Current,1,Current+1);
border(A,Len3,Temp,Cnt,Current) ->
 Cond = element(Cnt,A),
 Cond2 = element(size(A)-Current+Cnt,A),
 case Cond of
  Cond2 ->
   border(A,Len3, Temp, Cnt+1, Current);
  _ ->
   io:format(" => No border of size ~p~n",[Current]),
   border(A,Len3, Temp, 1, Current+1)
 end.

1> rinder:border("abacXababa").
Max length of border: 4
Min value of u: "Xa"
 => Border of size 1, "a"
 => No border of size 2
 => Border of size 3, "aba"
 => No border of size 4
{3,"aba"}

2> rinder:border("ababbb").
Max length of border: 2
Min value of u: "ab"
 => No border size of 1
 => No border of size 2
{0,[]}

3> rinder:border("babbaa").
Max length of border: 2
Min value of u: "bb"
 => No border size of 1
 => No border of size 2
{0,[]}

4> rinder:border("abaaab").
Max length of border: 2
Min value of u: "aa"
 => No border size of 1
 => Border of size 2, "ab"
{2,"ab"}

5> rinder:border("abbaab").
Max length of border: 2
Min value of u: "ba"
 => No border size of 1
 => Border of size 2, "ab"
{2,"ab"}

I’ve played with a few more examples – examples 2 to 5 are from Christian’s coursework – and it seems to work.
The API entry takes a string/list as argument, converts it to a tuple, and computes the maximum hypothetical length of the border: since the word surrounded by the border can’t be null, you need to leave at least 1 character. So, if |x| is even, you need to take (|x|/2)-1, otherwise int(|x|/2). And since Erlang seems to miss a modulo function, I have to go through the Len=int(|x|), Len2=Len*2 hoop. Grrr…

The code tests the first possible border, of size one, by comparing the first and last element of the tuple. If they match the Temp variable [which records the highest length so far] is set to 1, otherwise to 0, while calling border/4. Then the code basically goes from 2 to Len3, recorded in Current, calling border/5, which itself goes from 1 to Current, testing the Cnt first/last elements of the tuple. Crude but works. When all is done, the largest border and its length are returned.

Erlang

04/12 Naive search

I am going through Christian’s course, Information Retrieval, trying to keep it relevant to my untrained mind – I was never good at maths… – so as soon as pseudocode popped up, I grabbed Erlang and started coding. This is an implementation in Erlang of so-called naive search.

-module(rinder).
-compile(export_all).
naive(X,T) ->
 XX=list_to_tuple(X),
 TT=list_to_tuple(T),
 M=size(XX),
 N=size(TT),
 io:format("Going with ~p/~p ~p/~p~n",[XX,M,TT,N]),
 io:format("Maximum iterations: ~p~n",[N*M-M*M+M]),
 naive(XX,TT,M,N,1,1).
naive(X,T,M,N,I,J) when I>M ->
 % finished
 io:format("The end~n",[]),
 case (I>M) of
  true -> {true, (J-M)};
  _ -> {false}
 end;
naive(X,T,M,N,I,J) when J>N -> {false};
naive(X,T,M,N,I,J) ->
 Cond=element(I,X),
 Cond2=element(J,T),
 case Cond2 of
  Cond ->
   io:format(" Cond/Cond2 is ~p/~p [match], next is ~p::~p~n",[[Cond],[Cond2],I+1,J+1]),
   naive(X,T,M,N,I+1,J+1);
  _ ->
   io:format(" Cond/Cond2 is ~p/~p [false], next is ~p::~p~n",[[Cond],[Cond2],1,J-I+2]),
   naive(X,T,M,N,1,J-I+2)
 end.

The code converts the string/list to a tuple [accesses with element() are supposed to be faster, or so say the big boys at Ericsson], and then, well, applies said naive algorithm:

NAIVE ( x , t )
i ← 1 ; j ← 1
while i ≤ m and j ≤ n
  if t[j] == x[i] then
    i ← i+1 ; j ← j+1
  else
    i ← 1 ; j ← j-i+2

if i > m Then x appears at position t[j-m]
else No Love

1> rinder:naive("foldez","folder").
Going with {102,111,108,100,101,122}/6 {102,111,108,100,101,114}/6
Maximum iterations: 6
 Cond/Cond2 is "f"/"f" [match], next is 2::2
 Cond/Cond2 is "o"/"o" [match], next is 3::3
 Cond/Cond2 is "l"/"l" [match], next is 4::4
 Cond/Cond2 is "d"/"d" [match], next is 5::5
 Cond/Cond2 is "e"/"e" [match], next is 6::6
 Cond/Cond2 is "z"/"r" [false], next is 1::2
 Cond/Cond2 is "f"/"o" [false], next is 1::3
 Cond/Cond2 is "f"/"l" [false], next is 1::4
 Cond/Cond2 is "f"/"d" [false], next is 1::5
 Cond/Cond2 is "f"/"e" [false], next is 1::6
 Cond/Cond2 is "f"/"r" [false], next is 1::7
{false}
2> rinder:naive("old","folder").
Going with {111,108,100}/3 {102,111,108,100,101,114}/6
Maximum iterations: 12
 Cond/Cond2 is "o"/"f" [false], next is 1::2
 Cond/Cond2 is "o"/"o" [match], next is 2::3
 Cond/Cond2 is "l"/"l" [match], next is 3::4
 Cond/Cond2 is "d"/"d" [match], next is 4::5
The end
{true,2}

Erlang

03/26 Stanford CS Education Library

Not everyone is lucky enough to enjoy a posh education at Stanford, but it shouldn’t prevent you to go to their Stanford CS Education Library and go through the material there. Pretty basic but good stuff. Currently dealing with Binary Trees (with a focus on C), and I am using this as Erlang tutorials, the DIY variant… There’s probably a good implementation of a binary tree [in the Chapter about Tuples of the Erlang doc], but I didn’t cheat and did it all on my own :D


Binary Search Tree

I have now a semi-functioning implementation of a binary search tree [a last minute “upgrade” of the code borked the lookup function, sigh]. It can, so far, create and add nodes to a parent, left or right; lookup a node well, when I fix the code that is…; calculate the maximum depth, minimum and maximum values, and the overall count of nodes; and mirror the tree – for some reason it took me more than 5 minutes, must be tired…

Note to self: unit tests are fine, as long as they don’t have bugs… Trying to debug a perfectly functioning piece of code because the unit test is bugged is a perfectly rotten way of spending time in front of the ‘puter…

I will probably open soon a code repository for all my Erlang efforts – learning CVS on the way, it cannot hurt. If and when, I’ll post the url.

I wonder how fast my bst would be if I dumped 22,000+ XML nodes, parsed to strings… Not that it would make sense, but it would sure be fun to try!

Erlang

03/24 Collateral damage

Mar 23 10:51:10 <malware> dda: I know nothing about Erlang. Can you give me a capsule description? What kind of stuff it is commonly used for or optimal for?
Mar 23 10:54:36 <dda_> malware: concurrency and distributed computing
Mar 23 10:54:53 <dda_> that capsule enough? ;-)
Mar 23 10:54:59 <MNK2_> dda: I thought it was to drive French hackers mad with wacky UTF-8 and string semantics :)
Mar 23 10:55:07 <dda_> that too
Mar 23 10:55:14 <dda_> but that’s collateral damage

Erlang

03/23 Success!

I have brought all the pieces together today in yet another IronCoding session. I discovered two things:

  1. Distribution doesn’t mean raw speed improvement. At least not on the scale of my test files. Running my parser over a couple of machines is slower – so far, but I am waiting to see what’ll happen with the monster files –than on just one.
    Which probably means I’d better get a dual core, very very fast machine. nnnyessss… /me looks at wallet… nnnnnoooooo….
  2. Erlang *is* powerful. 110 lines of code are enough to set everything [concurrency, distribution, tokenization of strings, centralized sql output to a file] up.

On my TiBook, it is a tad slow – besides, it is a busy machine – but on the dedicated server we have with a friend at OVH, which is pretty much underused, a puny Celeron @2.6GHz with half a gig of RAM, it parsed 1,010 complex records [30,000 lines] in a little over 1 second. I tried there too to run it distributed over two local nodes, no dice. 10 seconds or so. sigh…

Now, I need to dump that into an sqlite database. I do. Don’t ask. Any ideas? Google and #erlang weren’t helpful…

Erlang