Archive for April, 2006

04/16 Reminds me something…

Foreword / Politeness

Please, when in the classroom

  • do not wear a cap, a hat or anything on your head,
  • do not eat or drink,
  • do not chew gum,
  • switch your mobile phone off.

When I taught in a Korean University, I enforced the same basic rules. But I wasn’t as nice… It is telling that [young] adults have to be told to follow basic politeness rules.

04/13 The Morris-Pratt algorithm

I put all the bits together, and it would be too much pretified Erlang code than necessary [and while highlight is nice, it produces monster amounts of HTML/CSS cruft]. Instead, get the code here. That thing seems to work, from preliminary tests – not that I don’t trust either the algorithm or Christian, but rather my Erlang coding skillz… – and I might rewrite that monster in a more user-friendly language, like RB.

Now I should really try and understand how it works. I think I have a vague understanding, but it’d be better if the legs I’m standing on weren’t that shaky…

Update
The code has been updated to reflect changes in the coursework. Apparently I flat-footed on some errors in the algorithms. Happy to be of some help :-)

Erlang

04/12 Beta

Two new Erlang functions here, beta/2 which calculates the β function at offset n, and allBetas/1, which computes all β and returns them in a list [could be a tuple if you want :-) ]

-module(rinder).
-compile(export_all).
beta(X,0) -> -1;
beta(X,1) -> 0;
beta(X,J) ->
 I= 1+beta(X,J-1),
 Cond=element(I,X),
 Cond2=element(J,X),
 case Cond of
  Cond2 -> I;
  _ -> beta(X,I)
 end.

allBetas(X) ->
 XX=list_to_tuple(X),
 allBetas(XX,[-1,0],2,size(XX)).
allBetas(X,Acc,J,M) when J>M->
 Acc;
allBetas(X,Acc,J,M) ->
 I=1+beta(X,J-1),
 Cond=element(I,X),
 Cond2=element(J,X),
 case Cond of
  Cond2 ->
   allBetas(X,lists:flatten([Acc,I]),J+1,M);
  _ ->
   allBetas(X,lists:flatten([Acc,element(I+1,list_to_tuple(Acc))]),J+1,M)
 end.

log_to_screen(A,B) -> io:format(A,B).

 rinder:beta(list_to_tuple("abacabac"),3).
1

Yes I know too lazy to provide a conversion from list to tuple. Next time…

 rinder:allBetas("abacabac").
[-1,0,0,1,0,1,2,3,4]

which thankfully matches what Christian calculated…

Since Erlang’s tuples and lists start at 1, the line
allBetas(X,lists:flatten([Acc,element(I+1,list_to_tuple(Acc))]),J+1,M)
has I + 1 instead of i in the algorithm (else β[j] < -- β[i]).
Erlang

04/12 Border

Let u be a non-empty word. A border of u is a word different from u which is both prefix and suffix of u. For example, the word u = abacaba has the three borders ε, a and aba. Let us note BORDER( x ) the longest border of a non-empty word x.

This implementation of Erlang of BORDER(x) is a bit convoluted, infriges on Christian’s fatwah on mixing printing and computations and function overloading [can’t win them all in one day], but it seems to work. As a bonus it calculates the length, β, and the border itself all in one go.

-module(rinder).
-compile(export_all).
border(A) ->
 AA=list_to_tuple(A),
 SA=size(AA),
 Len=trunc(SA/2),
 Len2=Len * 2,
 case SA of
  Len2 -> Len3 = Len-1;
  _ -> Len3 = Len
 end,
 io:format("Max length of border: ~p~n",[Len3]),
 io:format("Min value of u: ~p~n",[lists:sublist(A,Len3+1,SA-Len3-Len3)]),
 H=element(1,AA),
 T=element(SA,AA),
 case H of
  T -> io:format(" => Border of size 1, ~p~n",[[T]]), border(AA,Len3,1,1,2);
  _ -> io:format(" => No border size of 1~n",[]), border(AA,Len3,0,1,2)
 end.

border(A,Len3,Temp,Cnt,Current) when Current > Len3 ->
 case Temp of
  0 -> {0,[]};
  N -> {N,lists:sublist(tuple_to_list(A),1,N)}
 end;
border(A,Len3,Temp,Cnt,Current) when Cnt > Current ->
 io:format(" => Border of size ~p, ~p~n",[Current,lists:sublist(tuple_to_list(A),1,Current)]),
 border(A,Len3,Current,1,Current+1);
border(A,Len3,Temp,Cnt,Current) ->
 Cond = element(Cnt,A),
 Cond2 = element(size(A)-Current+Cnt,A),
 case Cond of
  Cond2 ->
   border(A,Len3, Temp, Cnt+1, Current);
  _ ->
   io:format(" => No border of size ~p~n",[Current]),
   border(A,Len3, Temp, 1, Current+1)
 end.

1> rinder:border("abacXababa").
Max length of border: 4
Min value of u: "Xa"
 => Border of size 1, "a"
 => No border of size 2
 => Border of size 3, "aba"
 => No border of size 4
{3,"aba"}

2> rinder:border("ababbb").
Max length of border: 2
Min value of u: "ab"
 => No border size of 1
 => No border of size 2
{0,[]}

3> rinder:border("babbaa").
Max length of border: 2
Min value of u: "bb"
 => No border size of 1
 => No border of size 2
{0,[]}

4> rinder:border("abaaab").
Max length of border: 2
Min value of u: "aa"
 => No border size of 1
 => Border of size 2, "ab"
{2,"ab"}

5> rinder:border("abbaab").
Max length of border: 2
Min value of u: "ba"
 => No border size of 1
 => Border of size 2, "ab"
{2,"ab"}

I’ve played with a few more examples – examples 2 to 5 are from Christian’s coursework – and it seems to work.
The API entry takes a string/list as argument, converts it to a tuple, and computes the maximum hypothetical length of the border: since the word surrounded by the border can’t be null, you need to leave at least 1 character. So, if |x| is even, you need to take (|x|/2)-1, otherwise int(|x|/2). And since Erlang seems to miss a modulo function, I have to go through the Len=int(|x|), Len2=Len*2 hoop. Grrr…

The code tests the first possible border, of size one, by comparing the first and last element of the tuple. If they match the Temp variable [which records the highest length so far] is set to 1, otherwise to 0, while calling border/4. Then the code basically goes from 2 to Len3, recorded in Current, calling border/5, which itself goes from 1 to Current, testing the Cnt first/last elements of the tuple. Crude but works. When all is done, the largest border and its length are returned.

Erlang

04/12 Naive search

I am going through Christian’s course, Information Retrieval, trying to keep it relevant to my untrained mind – I was never good at maths… – so as soon as pseudocode popped up, I grabbed Erlang and started coding. This is an implementation in Erlang of so-called naive search.

-module(rinder).
-compile(export_all).
naive(X,T) ->
 XX=list_to_tuple(X),
 TT=list_to_tuple(T),
 M=size(XX),
 N=size(TT),
 io:format("Going with ~p/~p ~p/~p~n",[XX,M,TT,N]),
 io:format("Maximum iterations: ~p~n",[N*M-M*M+M]),
 naive(XX,TT,M,N,1,1).
naive(X,T,M,N,I,J) when I>M ->
 % finished
 io:format("The end~n",[]),
 case (I>M) of
  true -> {true, (J-M)};
  _ -> {false}
 end;
naive(X,T,M,N,I,J) when J>N -> {false};
naive(X,T,M,N,I,J) ->
 Cond=element(I,X),
 Cond2=element(J,T),
 case Cond2 of
  Cond ->
   io:format(" Cond/Cond2 is ~p/~p [match], next is ~p::~p~n",[[Cond],[Cond2],I+1,J+1]),
   naive(X,T,M,N,I+1,J+1);
  _ ->
   io:format(" Cond/Cond2 is ~p/~p [false], next is ~p::~p~n",[[Cond],[Cond2],1,J-I+2]),
   naive(X,T,M,N,1,J-I+2)
 end.

The code converts the string/list to a tuple [accesses with element() are supposed to be faster, or so say the big boys at Ericsson], and then, well, applies said naive algorithm:

NAIVE ( x , t )
i ← 1 ; j ← 1
while i ≤ m and j ≤ n
  if t[j] == x[i] then
    i ← i+1 ; j ← j+1
  else
    i ← 1 ; j ← j-i+2

if i > m Then x appears at position t[j-m]
else No Love

1> rinder:naive("foldez","folder").
Going with {102,111,108,100,101,122}/6 {102,111,108,100,101,114}/6
Maximum iterations: 6
 Cond/Cond2 is "f"/"f" [match], next is 2::2
 Cond/Cond2 is "o"/"o" [match], next is 3::3
 Cond/Cond2 is "l"/"l" [match], next is 4::4
 Cond/Cond2 is "d"/"d" [match], next is 5::5
 Cond/Cond2 is "e"/"e" [match], next is 6::6
 Cond/Cond2 is "z"/"r" [false], next is 1::2
 Cond/Cond2 is "f"/"o" [false], next is 1::3
 Cond/Cond2 is "f"/"l" [false], next is 1::4
 Cond/Cond2 is "f"/"d" [false], next is 1::5
 Cond/Cond2 is "f"/"e" [false], next is 1::6
 Cond/Cond2 is "f"/"r" [false], next is 1::7
{false}
2> rinder:naive("old","folder").
Going with {111,108,100}/3 {102,111,108,100,101,114}/6
Maximum iterations: 12
 Cond/Cond2 is "o"/"f" [false], next is 1::2
 Cond/Cond2 is "o"/"o" [match], next is 2::3
 Cond/Cond2 is "l"/"l" [match], next is 3::4
 Cond/Cond2 is "d"/"d" [match], next is 4::5
The end
{true,2}

Erlang