Naive search

I am going through Christian’s course, Information Retrieval, trying to keep it relevant to my untrained mind – I was never good at maths… – so as soon as pseudocode popped up, I grabbed Erlang and started coding. This is an implementation in Erlang of so-called naive search.

-module(rinder).
-compile(export_all).
naive(X,T) ->
 XX=list_to_tuple(X),
 TT=list_to_tuple(T),
 M=size(XX),
 N=size(TT),
 io:format("Going with ~p/~p ~p/~p~n",[XX,M,TT,N]),
 io:format("Maximum iterations: ~p~n",[N*M-M*M+M]),
 naive(XX,TT,M,N,1,1).
naive(X,T,M,N,I,J) when I>M ->
 % finished
 io:format("The end~n",[]),
 case (I>M) of
  true -> {true, (J-M)};
  _ -> {false}
 end;
naive(X,T,M,N,I,J) when J>N -> {false};
naive(X,T,M,N,I,J) ->
 Cond=element(I,X),
 Cond2=element(J,T),
 case Cond2 of
  Cond ->
   io:format(" Cond/Cond2 is ~p/~p [match], next is ~p::~p~n",[[Cond],[Cond2],I+1,J+1]),
   naive(X,T,M,N,I+1,J+1);
  _ ->
   io:format(" Cond/Cond2 is ~p/~p [false], next is ~p::~p~n",[[Cond],[Cond2],1,J-I+2]),
   naive(X,T,M,N,1,J-I+2)
 end.

The code converts the string/list to a tuple [accesses with element() are supposed to be faster, or so say the big boys at Ericsson], and then, well, applies said naive algorithm:

NAIVE ( x , t )
i ← 1 ; j ← 1
while i ≤ m and j ≤ n
  if t[j] == x[i] then
    i ← i+1 ; j ← j+1
  else
    i ← 1 ; j ← j-i+2

if i > m Then x appears at position t[j-m]
else No Love

1> rinder:naive("foldez","folder").
Going with {102,111,108,100,101,122}/6 {102,111,108,100,101,114}/6
Maximum iterations: 6
 Cond/Cond2 is "f"/"f" [match], next is 2::2
 Cond/Cond2 is "o"/"o" [match], next is 3::3
 Cond/Cond2 is "l"/"l" [match], next is 4::4
 Cond/Cond2 is "d"/"d" [match], next is 5::5
 Cond/Cond2 is "e"/"e" [match], next is 6::6
 Cond/Cond2 is "z"/"r" [false], next is 1::2
 Cond/Cond2 is "f"/"o" [false], next is 1::3
 Cond/Cond2 is "f"/"l" [false], next is 1::4
 Cond/Cond2 is "f"/"d" [false], next is 1::5
 Cond/Cond2 is "f"/"e" [false], next is 1::6
 Cond/Cond2 is "f"/"r" [false], next is 1::7
{false}
2> rinder:naive("old","folder").
Going with {111,108,100}/3 {102,111,108,100,101,114}/6
Maximum iterations: 12
 Cond/Cond2 is "o"/"f" [false], next is 1::2
 Cond/Cond2 is "o"/"o" [match], next is 2::3
 Cond/Cond2 is "l"/"l" [match], next is 3::4
 Cond/Cond2 is "d"/"d" [match], next is 4::5
The end
{true,2}

Erlang

6 Responses to “Naive search”

  1. Christian Says:

    For a second, I feared I made a mistake in my course… It’s an interesting mistake of yours, by the way. In the pseudo-code one has to be careful of the side-effects, since the pseudo-language is not functional (or simply single-assignment). Sometimes, it does not matter, as in the line i <- i + 1; j <- j + 1 which is equivalent to j <- j + 1; i <- i + 1, but sometimes it matters, as in the line i <- 1; j <- j - i + 2 which is NOT equivalent to j <- j - i + 2; i <- 1 (the latter is the correct one)…

    Your error is interesting because it shows that when you copied the pseudo-code, you were probably still thinking in Erlang, where this kind of mistake is impossible. More interestingly, I can say so _even if I don’t know Erlang_. It is like using an argument in linguistic to state something about a particular language, I guess? The usefulness of some theory…

    So, I think that your Erlang code is correct, even if I am not sure.

    I would like make some remarks about Erlang itself and programming style. First, Erlang obviously borrowed _a lot_ of its syntax and vocabulary (atoms, tuples, lists) to Prolog… Second, I understand that Erlang allows to overload functions of same arity (as in Prolog): do not use this feature, except for coding a “default argument” trick, and find another name for the other functions. In your case, you have two “naive” functions, right? Third, it seems that Erlang has no built-in support for arrays… Can’t you code this algorithm with strings? If not, Erlang is definitely not suitable for string processing. Even for writing a mail agent (due to header processing). Argh.

    Also, be careful with the argument names. For example you write

    naive(XX,TT,M,N,1,1) and naive(X,T,M,N,I,J)

    Since there is no static typing, the compiler does not help you to catch a lot of interesting errors. Is it an error, by the way? (Same hell in Prolog or Scheme.)

    About “my” algorithm. If I had to publish a textbook, I would not assume, as I did in this course, that the first cell of arrays has index 1, because some programming languages use 0 (C), 1 (Pascal) or even let the user decide (Ada). Abstracting the first index by means of a function is a good idea when writing the algorithm (of course, when writing the corresponding code, you replace these function calls with the value defined in your language, for the sake of efficiency). Is it pedantry? No: if you use an explicit constant for the first index, it will certainly be involved in some arithmetic expression to compute another index, and this constant may be cancelled, as in 1 - (3 - 2) = 0, and there is no way, just by reading the expression, to translate automatically this piece of source code to a language which has a different semantics for the first index…

    Now, about the code itself. First, it is important to separate functions which are pure, i.e. contain no side-effects at all, and functions which are impure. This helps tremendously for the design and also when debugging: you know where to look for side-effects. So, in your case, do not mix printing and computations. Never. If you want to debug, try some feature like assert or raise exceptions (I don’t know Erlang, sorry). If you want your code to scale, trust my advice.

    Also, you could get rid of booleans for keeping track of a match or a failure: use a dummy index, i.e. an integer which cannot be a valid index for any array (tuples, here). Since Erlang uses 1 for the first component of a tuple, -1 would be a possible choice. If you can abstract (hide) the choice of this value, it is much better (but I don’t know the module system of Erlang).

    Another thing: you don’t need to check both i > m and j > n because there has been a match if and only if i > m (make a picture to convince yourself). So I can see one case which is useless for sure in your code.

    finally, if I rewrite my imperative pseudo-code into a functional pseudo-code, it would look like

    NAIVE(x,t)
    m <- size(t)
    n <- size(x)
    if n <= m
    then (i,j) <- NAIVE_AUX(x,t,n,m,1,1)

      if i > m

      then j - m

      else dummy_index
    else dummy_index

    NAIVE_AUX(x,t,n,m,i,j)
    if i <= m and j <= n
    then if t[j] = x[i]

      then NAIVE_AUX(x,t,n,m,i+1,j+1)

      else NAIVE_AUX(x,t,n,m,1,j-i+2)
    else (i,j)

    And some wrapper to test it:

    TEST(x,t)
    index=NAIVE(x,t)
    if index = dummy_index
    then print(”No match!”)
    else print(”Found it at index”, index)

    Note that a normal compiler would compile NAIVE_AUX exactly as a loop, i.e. without using the stack, because it would recognise a tail recursive function, so this functional code would be exactly as efficient as the imperative one.

    Maybe the syntax of Erlang allows you to factorise the “dummy_index” cases in NAIVE, too…

    Have fun!

  2. Christian Says:

    Arghh! What happened to my long comment???

  3. Christian Says:

    Could it be a nasty side-effect due to me using etc. in the pseudo-code? Please tell me if you can recover my comment, otherwise I will try again.

  4. dda Says:

    Christian, thank you very much for your comments. I am learning a lot here. And by the by, when I visit Seoul next time, I have lots of questions…

    Anyway, about method overloading. As far as I can tell overloading methods of same arity *is the norm* in Erlang. At least, that’s what I gather from the code I have read, and the manual. Thus naive/2 is the API entry, and the three naive/5 are the real code.

    As for arrays, well, lists are arrays are strings. It’s very irritating to have to share a common type for different things, but there you go. Tuples are actually nice to work with [as long as you only need a finite, static string, which, in doing searches, is fine], since they are a bit faster to manipulate than lists. Strings are – at least to an Erlang rookie lilke me – a nightmare. So be it. My goal is not to do string manips – I can do that in RB and Python very well thank you – but to learn Erlang. The more hoops I go through the better. Pain makes you stronger and all that… ;-)

    naive(XX,TT,M,N,1,1) and naive(X,T,M,N,I,J)
    naive(XX,TT,M,N,1,1) is the function call, with XX and TT being tuples, converted from X and T, strings/lists. After that, in the main function, I can go back to calling them X and T [or Toto and Tata if I care to], as these variable names are local anyway. Variables being bound, I can’t do X=list_to_tuple(X) for instance, as X already exists. Retarded, I know, but that’s what we have to deal with in Erlang. And I am sure there is a good reason for this, but I haven’t seen it yet.

    Debugging: yessir. I usually have a debug module – at least in other languages – but couldn’t be arsed for this little effort. I guess I’ll have to write a small module for logging and use it :-)

    Dummy index: I see what you mean. Again, I was following what I assume is established practice, returning a tuple with an atom indicating success or status, and a result. Note that true and false and really just atoms, ie dummies, I could’ve used schtroumpf and tagada, for what matters. Sure enough, true and false and returned as pseudo-booleans when doing comparisons, but in the end, they’re just atoms. Dunno if it is the best way to do it, but when in Rome…

    Thanks a lot for all this. I will rewrite the code with your new pseudo-code – and post a first draft of border/1 [and border/4 and border/5], which calculates the longest border – and its length of a word. Not sure it is right, since I am not sure I understood everything right…

  5. Christian Says:

    Dear dda,

    I made a mistake, of course: I meant “overloading of functions of _different_ arities.” Like naive/2 and naive/5 (respectively of arity 2 and 5). The only reason to use that feature is to provide some default arguments to a function, like

    f(x,y) \rightarrow f(x,y,5)
    f(x,y,z) \rightarrow …

    Then f(x,y) is the same as f(x,y,5). If you use this feature for other purposes, you are going to shoot yourself in the foot one day (”One concept, one name” is the safest rule of thumb).

    Using a boolean as first component in a tuple to give information about the next components is a standard practice indeed, but in this case it is an overkill.

    I think that the problem is that you are using Erlang for doing things that should not be done in Erlang and, as a consequence, you may miss the more interesting features of Erlang, as well as your examples may not be really workable. Without speaking about the subconscious karmic imprints:-) But go ahead, see for yourself, it’s the only way to knowledge.

    Actually, the best topics to start learning a functional language are found in my course Artificial Intelligence (don’t be mislead by the title, it’s not what you think it is inside): stacks, queues and trees. Then learn how to use higher-order functions, i.e. functions as arguments and return values (see my other course on Objective Caml). In fact, functional languages are top-notch for symbolic computations. It seems that Erlang does not support currying (partial evaluation of functions on their first arguments), what a pity.

    Tell me when you come to Seoul (I live in Ilsan). I will ask you the most important question among all: where the Hell can I by a decent Bourgogne in this city???

  6. dda Says:

    About the arity, I thought you’d made the mistake too – but you never know :-)

    I agree with you that I am misusing Erlang – or your coursework, who knows… – but it is fun so far, and I am way too old now to take things tooo seriously. I know I’ll never make a real living out of programming – I’ll keep trying though – so I’d better get at least some fun, some Aha! moments, out of it. When a light bulb goes in the old attic upstairs, it is comforting :-)

    Of course, being able to implement the algorithms and understanding them are two different things, and I can’t say I understand how MP works yet, which means really that a big chunk of what you write flies waaaaaaaay over my head. That’s punishment for not paying attention in math classes… Oh well. I’ll keep trying, again. And you can fill in the gaps when we meet!

    Burgundy wine can be found. But. It’s. Expensive. I’ll show you anyway… my clients will be happy to have you as a customer!

Leave a Reply

You must be logged in to post a comment.