Archive for the 'Erlang' Category

07/08 Erlang multi-byte module, continued

There are still many dusty corners, and some major refactoring to do, but it is going well, very well indeed.


07/06 Erlang mode in emacs

(setq load-path (cons "/sw/lib/erlang/lib/tools-2.4/emacs/" load-path))
(setq erlang-root-dir "/sw/lib/erlang/")
(setq exec-path (cons "/sw/lib/erlang/bin/" exec-path))
(require 'erlang-start)
(global-font-lock-mode t)

Copy/paste is always easier than typing it from the Erlang book I bought recently.

Not that I am planning to use it much, as preliminary tests show that the Erlang mode doesn’t bring much more than I already have with SEE [I wrote an Erlang mode for SEE], a local copy of the Erlang docs open in a series of Firefox tabs [to which I added a basic search function in PHP], and four Terminal sessions. Since I have been using Desktop Manager for a few years now, and have seven active desktops – did I hear someone mumble Adult ADD here? – navigating between the different parts of my own toolchain is actually pleasant: ⌘-Tab to navigate between apps, and ⌥⌘-[1..7] or ⌥⌘-[⇠ ⇢] to navigate between desktops, help tremendously. Like many people my age and/or with my kind of mileage on computers, I am happy when my hand doesn’t reach for the mouse for a long stretch.

[Screenshot: 4 terminal sessions]

The tall window on the right is the main debugging session, where I compile the module and test the functions in the Erlang shell. The 3 sessions on the left are: a root session to copy the files into the Erlang main libraries folder – so that the mb module is available from everywhere – another Erlang shell, opened from a different directory, to test that the module *is* indeed available and up to date, and a bash shell to do maintenance stuff. Behind these windows are the four .erl/.hrl files that make up mb, opened in SEE. That’s desktop #6. The docs are usually in desktop #2, in Firefox. Here again, navigation is done with the keyboard, with ⌘-1 to ⌘-9. If I need more tabs than that, I tend to open new windows in order to keep all tabs accessible with ⌘-# shortcuts – in which case I’ll switch between windows with ⌘-`. That’s one keyboard shortcut Apple got right. An OS-level shortcut can be a pain, but this one is very cool.


07/06 Erlang Multibyte Module

900 lines of code later, I got this:

Eshell V5.4.3  (abort with ^G)
1> c("src/mb.erl",[{outdir,"ebin/"},nowarn_unused_function, nowarn_unused_vars]).
{ok,mb}
2> mb:
bocu/1              charToInt/1         convert10/1
convert16/1         convert2/1          convertEncoding/2
filter/2            filterB/2           format/1
getNextChar/1       getNextCharAsInt/1  hasProcess/1
hasTable/1          inStr/2             init/0
isASCII/1           join/2              kangxi/1
left/2              leftB/2             len/1
lenB/1              longSplit/2         lowercase/1
mid/2               mid/3               midB/2
midB/3              module_info/0       module_info/1
new/0               new/1               new/2
new/3               oneByte_to_utf8/1   print/2
replace/3           replaceAll/3        reset/0
reverse/1           reverseB/1          right/2
split/2             surrogate/1         uppercase/1
utf16_to_utf8/1     utf8_to_oneByte/2   utf8_to_utf16/1

There’s actually a bit more, but not yet activated/complete. Been doing quite a bit of bug fixing too… The next big chunk will be CJK encodings ⇔ UTF. I foresee lots of fun here… But at least, thanks to the mass of info in the Unihan database, I have my work all lined up: extract the BigFive and CCCII data into a file, have the init() function parse it, and store that in a dets database. Like I did for case folding – except that the case folding data comes from the aptly named CaseFolding.txt file. Very handy nonetheless.

More later.


07/03 Multibyte strings in Erlang

Aka, science-fiction. In a language where strings are lists of integers, what can you expect in terms of multibyte strings?
|
|
|___> zilch

That’s okay, gives me something to do for my idle weekends. I have thus started this project, mb [the shortest while meaningful name I could come up with], aimed at providing Unicode support to Erlang – and possibly some sort of support for other encodings. One-byte encodings are actually easy to support; without too much pain I managed to add support for latin-1 to latin-10, MacRoman, Codepage 1252 and Codepage 437 [that’s Windows and DOS, respectively]. You can create MBStrings [it’s a tuple, really, but don’t tell anyone] and convert to and from these encodings + utf-8. This module also digs utf-16, albeit partially [I did get the surrogates part right, I think].
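
To illustrate why one-byte encodings are the easy case, here is a minimal sketch for latin-1 only. It is not the actual mb code: the module name latin1_sketch and its function names are made up for this example. Latin-1 byte values are the same as the first 256 Unicode code points, so no lookup table is needed; MacRoman or the code pages would presumably need a table on top of this.

```erlang
-module(latin1_sketch).
-export([to_utf8/1, from_utf8/1]).

%% Hypothetical helper module, not part of mb.
%% Input and output are plain lists of byte values.

%% Latin-1 -> UTF-8: each input byte becomes one or two output bytes.
to_utf8(Bytes) when is_list(Bytes) ->
    lists:flatmap(fun encode/1, Bytes).

%% 0..127 is plain ASCII and is copied as-is.
encode(C) when C >= 0, C < 16#80 -> [C];
%% 128..255 needs the two-byte form 110xxxxx 10xxxxxx.
encode(C) when C >= 16#80, C =< 16#FF ->
    [16#C0 bor (C bsr 6), 16#80 bor (C band 16#3F)].

%% UTF-8 -> Latin-1: only lead bytes C2 and C3 map back into 128..255.
from_utf8([]) -> [];
from_utf8([C | Rest]) when C >= 0, C < 16#80 ->
    [C | from_utf8(Rest)];
from_utf8([B1, B2 | Rest])
  when B1 >= 16#C2, B1 =< 16#C3, (B2 band 16#C0) =:= 16#80 ->
    [((B1 band 16#1F) bsl 6) bor (B2 band 16#3F) | from_utf8(Rest)];
%% Anything else is a character Latin-1 cannot represent (or bad UTF-8).
from_utf8(_) -> erlang:error(not_latin1).
```

For instance, the latin-1 byte 233 (é) becomes the two bytes 195,169, and from_utf8/1 turns them back into 233.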

The -export() attribute is already three lines long; the main features are:

  • new
  • convertEncoding / oneByte_to_utf8 / utf8_to_oneByte
  • split / splitB
  • left / leftB ; right / rightB ; mid / midB
  • reverse / reverseB
  • lowercase / uppercase
  • isASCII
  • getNextChar

Note the ~B commands that work on the byte-level – they return “strings” [ie lists] and not MBString objects. Yes, this is an influence from RB, the only language I know that gives you ZERO pain in handling encodings. And I *do* mean zero. When you work at the byte level [there may be a few good reasons to do that, including speed, if you know you are manipulating an ASCII (7-bit) or one-byte (8-bit) encoded string], whatever comes out can’t be multibyte safe. Not 100%. So I reject the output as not safe and hand over a list of integers. You’re then free to try and convert that – again – to an MBString. Plug and pray :D
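
The difference between the two levels can be sketched as follows, assuming the string is a plain list of UTF-8 bytes. This is only an illustration: the module bytes_vs_chars and its functions are hypothetical stand-ins, not the real len/lenB/left/leftB from mb, which work on MBString tuples.

```erlang
-module(bytes_vs_chars).
-export([len_b/1, len/1, left_b/2, left/2]).

%% Hypothetical illustration, not the mb implementation.

%% Byte length: simply the length of the list.
len_b(Bytes) -> length(Bytes).

%% Character length: count every byte that is NOT a UTF-8
%% continuation byte (continuation bytes look like 10xxxxxx).
len(Bytes) -> length([B || B <- Bytes, (B band 16#C0) =/= 16#80]).

%% Byte-level left: fast, but may cut a multibyte character in half.
left_b(Bytes, N) -> lists:sublist(Bytes, N).

%% Character-level left: returns the bytes of the first N whole characters.
left(Bytes, N) -> left(Bytes, N, []).

left([], _N, Acc) ->
    lists:reverse(Acc);
%% Enough characters collected and a new character starts here: stop.
left([B | _], 0, Acc) when (B band 16#C0) =/= 16#80 ->
    lists:reverse(Acc);
%% Continuation byte: belongs to the character already counted.
left([B | Rest], N, Acc) when (B band 16#C0) =:= 16#80 ->
    left(Rest, N, [B | Acc]);
%% Lead byte (or ASCII): start of a new character, count it.
left([B | Rest], N, Acc) ->
    left(Rest, N - 1, [B | Acc]).
```

With the four bytes 208,159,208,158 (two Cyrillic letters), left_b/2 with N = 1 hands back the lone byte 208, which is not valid UTF-8 on its own, while left/2 returns the complete first character 208,159. That is the reason byte-level output can’t be trusted as multibyte safe.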

My interest in mb strings is of course more CJKV than koi-whatever (Russian) or Arabic or anything else. So I am first adding the functionality that interests me [what’s the radical of ? how many strokes are there in ? and possibly encoding conversions between the big standards – that is, if I find enough info on them.] Some of the functionality is – of course – cross-language, inasmuch as the concept applies, like lowercase and uppercase. CJK languages have a ‘full-width’ alphabet that is *not* in the ASCII range. Thus, the ordinary and crude algorithm of my youth, back when ASCII rocked, will not work…
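
A small sketch of why the crude ASCII trick falls short, working on Unicode code points rather than bytes. The module case_sketch is made up for this example and is not how mb does it. The full-width capitals sit at U+FF21 to U+FF3A, and their lowercase forms are 32 code points higher, just like in ASCII, but an ASCII-only range check never sees them.

```erlang
-module(case_sketch).
-export([ascii_lower/1, wide_lower/1]).

%% Hypothetical illustration, operating on single code points.

%% The classic trick: add 32 to anything in A..Z.
ascii_lower(C) when C >= $A, C =< $Z -> C + 32;
ascii_lower(C) -> C.

%% Same trick extended to the full-width Latin capitals
%% (U+FF21..U+FF3A); everything else falls back to ascii_lower/1.
wide_lower(C) when C >= 16#FF21, C =< 16#FF3A -> C + 32;
wide_lower(C) -> ascii_lower(C).
```

Patching in one extra range per alphabet clearly doesn’t scale to Cyrillic, Greek, circled letters and the rest, which is the argument for a data-driven lookup.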

Fortunately, the Unicode project has a lot of info, and the UniHan file has it all – almost. What I did is extract the relevant case folding data and build a dets database with it. Whenever I need to convert between lower and upper case, I ask the database. Easy as pie. Maybe not as fast as I’d like, but slow is better than zilch.
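
The database approach might look roughly like the sketch below. It assumes the input is Unicode’s CaseFolding.txt, whose data lines have the form "code; status; mapping; # name", and that it runs on a reasonably recent OTP (string:lexemes/2 did not exist in 2005). The module name casefold_sketch, the table name and the function names are all invented for this example, and it only handles the one-to-one mappings (status C and S), in the folding direction.

```erlang
-module(casefold_sketch).
-export([parse_line/1, load/2, fold/2]).

%% Hypothetical sketch, not the mb init() code.

%% Parse one line such as "0041; C; 0061; # LATIN CAPITAL LETTER A".
%% Returns {Code, Folded} for simple mappings, or skip for comments,
%% blank lines, and full (F) or Turkic (T) mappings.
parse_line([$# | _]) -> skip;
parse_line(Line) ->
    case string:lexemes(Line, "; ") of
        [Code, Status, Mapping | _] when Status =:= "C"; Status =:= "S" ->
            {list_to_integer(Code, 16), list_to_integer(Mapping, 16)};
        _ ->
            skip
    end.

%% Read the text file and store every simple mapping in a dets table.
load(TxtPath, DetsPath) ->
    {ok, Tab} = dets:open_file(casefold_sketch, [{file, DetsPath}]),
    {ok, Bin} = file:read_file(TxtPath),
    lists:foreach(
      fun(Line) ->
              case parse_line(Line) of
                  skip -> ok;
                  Pair -> ok = dets:insert(Tab, Pair)
              end
      end,
      string:lexemes(binary_to_list(Bin), "\n")),
    Tab.

%% Look a code point up; anything without an entry folds to itself.
fold(Tab, Code) ->
    case dets:lookup(Tab, Code) of
        [{Code, Folded}] -> Folded;
        [] -> Code
    end.
```

Every lookup goes to disk through dets, which matches the “maybe not as fast as I’d like” caveat; caching the table in ets would be one possible way around that.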

Let’s take this string:
(ascii)ABCDEFG (russian)ПО-РУСКИЙ (greek)ΕΛΛΑΣ (circles)ⒸⒾⓇⒸⓁⒺⓈ
Which in Erlang “translates” into:
U=mb:new("(ascii)ABCDEFG (russian)\320\237\320\236-\320\240\320\243\320\241\320\232\320\230\320\231 (greek)\316\225\316\233\316\233\316\221\316\243 (circles)\342\222\270\342\222\276\342\223\207\342\222\270\342\223\201\342\222\272\342\223\210").

Ugly, I know, but it’s precisely because Erlang is not too good at multibyte strings that I am working on this…
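
For readers wondering what those octal escapes stand for: each \NNN is one UTF-8 byte, so \320\237 is the byte pair D0 9F, which encodes U+041F (П). A quick way to check this, assuming a modern OTP release with the standard unicode module (which was not available when this was written), is the hypothetical helper below.

```erlang
-module(escapes_sketch).
-export([code_points/1]).

%% Hypothetical helper: turn a list of UTF-8 bytes (what the
%% octal-escaped string literal really is) into Unicode code points.
code_points(Bytes) ->
    unicode:characters_to_list(list_to_binary(Bytes), utf8).
```

Calling escapes_sketch:code_points("\320\237\320\236") gives the two code points 1055 and 1054, i.e. П and О.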

Now,

UL=mb:lowercase(U).
will give back – trust me – the following:

(ascii)abcdefg (russian)\320\277\320\276-\321\200\321\203\321\201\320\272\320\270\320\271 (greek)\316\265\316\273\316\273\316\261\317\203 (circles)\342\223\222\342\223\230\342\223\241\342\223\222\342\223\233\342\223\224\342\223\242

Translated into ‘real’ utf-8:

(ascii)abcdefg (russian)по-руский (greek)ελλασ (circles)ⓒⓘⓡⓒⓛⓔⓢ

i.e. a properly lowercased string.

More later…


06/21 Wow…

I had heard about Erlang’s hot code swapping, but had never used it so far. Well, I just tried it today, and I am still picking up the bits of my jaw, which dropped while running the following code:

Eshell V5.4.3 (abort with ^G)
1> c(m).
{ok,m}
2> Pid=m:init().
rebooting...
<0.36.0>
3> Pid!{self(),status}.
M/status: tagadax
{<0.29.0>,status}

Here I change the code to return “tagada” [no -x] next time the code is run.

4> c(m).
{ok,m}
5> Pid!{self(),code_switch}.
Requesting reboot...
{<0.29.0>,code_switch}
6> Pid!{self(),status}.
M/status: tagada
{<0.29.0>,status}
7>

And, hang on to your socks, here’s the full code:

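-module(m).
-export([init/0,loop/1,message/0]).

%% The value the process reports; edit it and recompile to see the swap.
message() -> tagada.

%% Spawn the server loop with the current message as its state.
init() ->
  io:format("rebooting...~n",[]),
  spawn(m,loop,[message()]).

loop(M) ->
  receive
    {_, code_switch} ->
      io:format("Requesting reboot...~n",[]),
      %% Fully qualified call: carry on in the newest loaded version of m.
      m:loop(m:message());
    {Nugu, status} ->
      Nugu ! M,
      io:format("M/status: ~w~n",[M]),
      %% Local call: stay in the version of the code currently running.
      loop(message())
  end.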

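m:loop(m:message()); is the full code doing the hot swap. Nice, eh? A call to a fully qualified function, MODULE:FUNCTION, always runs the most recently loaded version of the module, so once c(m) has loaded the new code, that one call moves the running process over to it. That’s it… Amazing!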
