Archive for the 'Erlang' Category
07/06 Erlang mode in Emacs
(setq load-path (cons "/sw/lib/erlang/lib/tools-2.4/emacs/" load-path))
(setq erlang-root-dir "/sw/lib/erlang/")
(setq exec-path (cons "/sw/lib/erlang/bin/" exec-path))
(require 'erlang-start)
(global-font-lock-mode t)
Copy/paste is always easier than typing it from the Erlang book I bought recently.
Not that I am planning to use it much, as preliminary tests show that the Erlang mode doesn’t bring much more than what I already have: SEE [I wrote an Erlang mode for SEE], a local copy of the Erlang docs open in a series of Firefox tabs [to which I added a basic search function in PHP], and four Terminal sessions. Since I have been using Desktop Manager for a few years now, and have seven active desktops – did I hear someone mumble Adult ADD here? – navigating between the different parts of my own toolchain is actually pleasant: ⌘-Tab to switch between apps, and ⌥⌘-[1..7] or ⌥⌘-[⇠ ⇢] to switch between desktops, help tremendously. Like many people of my age and/or with my kind of mileage on computers, I am happy when my hand doesn’t reach for the mouse for a long stretch.

The tall window on the right is the main debugging session, where I compile the module and test the functions in the Erlang shell. The three sessions on the left are: a root session to copy the files into the Erlang main libraries folder, so that the mb module is available from everywhere; another Erlang shell, opened from a different directory, to test that the module *is* indeed available and up to date; and a bash shell to do maintenance stuff. Behind these windows are the four .erl/.hrl files that make up mb, opened in SEE. That’s desktop #6. The docs are usually in desktop #2, in Firefox. Here again, navigation is done with the keyboard, with ⌘-1 to ⌘-9. If I need more tabs than that, I tend to open new windows in order to keep all tabs accessible with ⌘-# shortcuts – in which case I’ll switch between windows with ⌘-`. That’s one keyboard shortcut Apple got right. OS-level shortcuts can be a pain, but this one is very cool.
07/06 Erlang Multibyte Module
900 lines of code later, I got this:
Eshell V5.4.3 (abort with ^G)
1> c("src/mb.erl",[{outdir,"ebin/"},nowarn_unused_function, nowarn_unused_vars]).
{ok,mb}
2> mb:
bocu/1 charToInt/1 convert10/1
convert16/1 convert2/1 convertEncoding/2
filter/2 filterB/2 format/1
getNextChar/1 getNextCharAsInt/1 hasProcess/1
hasTable/1 inStr/2 init/0
isASCII/1 join/2 kangxi/1
left/2 leftB/2 len/1
lenB/1 longSplit/2 lowercase/1
mid/2 mid/3 midB/2
midB/3 module_info/0 module_info/1
new/0 new/1 new/2
new/3 oneByte_to_utf8/1 print/2
replace/3 replaceAll/3 reset/0
reverse/1 reverseB/1 right/2
split/2 surrogate/1 uppercase/1
utf16_to_utf8/1 utf8_to_oneByte/2 utf8_to_utf16/1
There’s actually a bit more, but not yet activated/complete. Been doing quite a bit of bug fixing too… The next big chunk will be CJK encodings ⇔ UTF. I foresee lots of fun here… But at least, thanks to the mass of info in the Unihan database, I have my work all lined up: extract the BigFive and CCCII data into a file, have the init() script parse it and store it in a dets database. Just like I did for case folding – except that the case folding data comes from the aptly named CaseFolding.txt file. Very handy nonetheless.
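The load step (parse a Unicode text file in init(), store the result in dets) can be sketched for the case folding side. This is a hypothetical module, not mb's actual code: the name casefold_load, the dets table name casefold and the choice to keep only the C (common) and S (simple) lines of CaseFolding.txt are my assumptions. Data lines look like "0041; C; 0061; # LATIN CAPITAL LETTER A".

```erlang
-module(casefold_load).
-export([parse_line/1, load/2]).

%% Hypothetical helper: turn one CaseFolding.txt line into a
%% {CodePoint, FoldedCodePoint} pair, or the atom skip.
parse_line([$# | _]) ->
    skip;                                   % comment line
parse_line(Line) ->
    case [trim(F) || F <- string:tokens(Line, ";")] of
        [Code, Status, Mapping | _] when Status =:= "C"; Status =:= "S" ->
            %% C (common) and S (simple) lines are one-to-one mappings.
            {list_to_integer(Code, 16), list_to_integer(Mapping, 16)};
        _ ->
            skip                            % F and T lines, malformed lines
    end.

%% Strip leading and trailing spaces/tabs from a field.
trim(S) -> strip_left(lists:reverse(strip_left(lists:reverse(S)))).

strip_left([C | T]) when C =:= $\s; C =:= $\t -> strip_left(T);
strip_left(S) -> S.

%% Read the whole text file and store every usable pair in a dets table.
%% Returns the number of pairs stored.
load(TxtFile, DetsFile) ->
    {ok, Bin} = file:read_file(TxtFile),
    Lines = string:tokens(binary_to_list(Bin), "\n"),
    Pairs = [P || P <- [parse_line(L) || L <- Lines], P =/= skip],
    {ok, Tab} = dets:open_file(casefold, [{file, DetsFile}, {type, set}]),
    ok = dets:insert(Tab, Pairs),
    ok = dets:close(Tab),
    length(Pairs).
```

Usage, assuming a local copy of CaseFolding.txt: casefold_load:load("CaseFolding.txt", "casefold.dets"). The F lines, which map one character to several (ß to ss), are skipped because a one-to-one table cannot hold them.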
More later.
07/03 Multibyte strings in Erlang
AKA science fiction. In a language where strings are lists of integers, what can you expect in terms of multibyte strings?
|
|
|___> zilch
That’s okay, it gives me something to do on my idle weekends. I have thus started this project, mb [the shortest yet still meaningful name I could come up with], aimed at providing Unicode support to Erlang – and possibly some sort of support for other encodings. One-byte encodings are actually easy to support; without too much pain I managed to add support for latin-1 to latin-10, MacRoman, Codepage 1252 and Codepage 437 [that’s Windows]. You can create MBStrings [it’s a tuple, really, but don’t tell anyone] and convert them to and from these encodings and UTF-8. The module also digs UTF-16, albeit partially [I did get the surrogates part right, I think].
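For latin-1, the conversion to UTF-8 is the easy case, since each byte value is also its Unicode code point; the other one-byte encodings would first need a 256-entry lookup table. The sketch below is hypothetical (the module mb_utf_sketch and its function names are mine, not mb's) and also shows the UTF-16 surrogate arithmetic: subtract 16#10000, then split the remaining 20 bits into two 10-bit halves.

```erlang
-module(mb_utf_sketch).
-export([latin1_to_utf8/1, encode_utf8/1, to_surrogates/1, from_surrogates/2]).

%% Latin-1 only: byte value = code point, so just encode each byte.
latin1_to_utf8(Bytes) ->
    lists:append([encode_utf8(B) || B <- Bytes]).

%% Encode one code point as a list of UTF-8 bytes.
encode_utf8(C) when C < 16#80 ->
    [C];
encode_utf8(C) when C < 16#800 ->
    [16#C0 bor (C bsr 6),
     16#80 bor (C band 16#3F)];
encode_utf8(C) when C < 16#10000 ->
    [16#E0 bor (C bsr 12),
     16#80 bor ((C bsr 6) band 16#3F),
     16#80 bor (C band 16#3F)];
encode_utf8(C) when C =< 16#10FFFF ->
    [16#F0 bor (C bsr 18),
     16#80 bor ((C bsr 12) band 16#3F),
     16#80 bor ((C bsr 6) band 16#3F),
     16#80 bor (C band 16#3F)].

%% UTF-16: a code point above U+FFFF becomes a high/low surrogate pair.
to_surrogates(C) when C >= 16#10000, C =< 16#10FFFF ->
    V = C - 16#10000,
    {16#D800 bor (V bsr 10), 16#DC00 bor (V band 16#3FF)}.

%% And back again.
from_surrogates(Hi, Lo) when Hi >= 16#D800, Hi =< 16#DBFF,
                             Lo >= 16#DC00, Lo =< 16#DFFF ->
    ((Hi - 16#D800) bsl 10) + (Lo - 16#DC00) + 16#10000.
```

For instance, mb_utf_sketch:to_surrogates(16#1F600) gives {16#D83D, 16#DE00}, and latin1_to_utf8([16#E9]) (é) gives [16#C3, 16#A9].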
The -export() attribute is already three lines long; the main features are:
- new
- convertEncoding / oneByte_to_utf8 / utf8_to_oneByte
- split / splitB
- left / leftB ; right / rightB ; mid / midB
- reverse / reverseB
- lowercase / uppercase
- isASCII
- getNextChar
Note the B-suffixed functions, which work at the byte level – they return “strings” [i.e. lists] and not MBString objects. Yes, this is an influence from RB, the only language I know that gives you ZERO pain in handling encodings. And I *do* mean zero. When you work at the byte level [there may be a few good reasons to do that, including speed, if you know you are manipulating an ASCII (7-bit) or one-byte (8-bit) encoded string], whatever comes out can’t be guaranteed multibyte safe. Not 100%. So I treat the output as unsafe and hand over a list of integers. You’re then free to try and convert that – again – to an MBString. Plug and pray.
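To make the byte-level/character-level difference concrete, here is a sketch on plain UTF-8 byte lists rather than MBString tuples; the module mb_left_sketch and its functions are hypothetical, not mb's implementation. It relies on the utf8 bit-syntax type, which needs a much newer Erlang/OTP than the V5.4.3 shell shown elsewhere on this page.

```erlang
-module(mb_left_sketch).
-export([leftB/2, left/2, decode_utf8/1]).

%% Byte level: the first N list elements, whatever they are. Fast, but it
%% can cut a multibyte sequence in half, hence a plain list as the result.
leftB(Bytes, N) ->
    lists:sublist(Bytes, N).

%% Character level: decode to code points, keep N of them, re-encode.
left(Utf8Bytes, N) ->
    Kept = lists:sublist(decode_utf8(Utf8Bytes), N),
    binary_to_list(<< <<C/utf8>> || C <- Kept >>).

%% UTF-8 bytes -> code points, via the utf8 bit-syntax type.
%% Malformed input crashes with function_clause.
decode_utf8(Bytes) ->
    decode(list_to_binary(Bytes)).

decode(<<C/utf8, Rest/binary>>) -> [C | decode(Rest)];
decode(<<>>) -> [].
```

With "ПО" as UTF-8 bytes, [16#D0, 16#9F, 16#D0, 16#9E], leftB(Bytes, 1) returns [16#D0], half of П, while left(Bytes, 1) returns [16#D0, 16#9F], the whole character.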
My interest in mb strings is of course more CJKV than koi-whatever (Russian) or Arabic or anything else. So I am first adding the features that interest me [what’s the radical of 寒? how many strokes are there in 龍? and possibly encoding conversions between the big standards – that is, if I find enough info on them]. Some of the features are – of course – cross-language, inasmuch as the concept applies, like lowercase and uppercase. CJK character sets include a ‘full-width’ Latin alphabet that is *not* in the ASCII range. Thus the crude algorithm of my youth, back when ASCII rocked, will not work…
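The crude algorithm in question, as a sketch (naive_case is a hypothetical name): it only touches $A..$Z. Full-width Ａ is U+FF21 and its lowercase ａ is U+FF41, so Ａ falls outside the range check and comes back unchanged, as does Cyrillic П (U+041F).

```erlang
-module(naive_case).
-export([naive_lower/1]).

%% The ASCII-era trick: only $A..$Z are changed, by adding 32.
%% Every other code point, full-width or not, passes through untouched.
naive_lower(CodePoints) ->
    [if C >= $A, C =< $Z -> C + 32; true -> C end || C <- CodePoints].
```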
Fortunately, the Unicode project has a lot of info, and its data files have it all – almost. What I did was extract the relevant case folding data and build a dets database with it. Whenever I need to convert between lowercase and uppercase, I ask the database. Easy as pie. Maybe not as fast as I’d like, but slow is better than zilch.
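The lookup side can be sketched as a plain map over code points (casefold_lookup is a hypothetical module, not mb's lowercase/1). To keep it testable, table access is passed in as a fun; with a dets table named casefold, that would be fun(C) -> dets:lookup(casefold, C) end, and an ets table works the same way. Anything not found is returned as-is.

```erlang
-module(casefold_lookup).
-export([lowercase/2]).

%% Lookup is a fun(CodePoint) -> [{CodePoint, Lower}] | [].
%% Code points missing from the table (digits, CJK, letters that are
%% already lowercase) are returned unchanged.
lowercase(CodePoints, Lookup) ->
    [fold(C, Lookup) || C <- CodePoints].

fold(C, Lookup) ->
    case Lookup(C) of
        [{C, Lower}] -> Lower;
        [] -> C
    end.
```

A dets error such as {error, Reason} would crash with case_clause, which is acceptable for a sketch but not for a library.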
Let’s take this string:
(ascii)ABCDEFG (russian)ПО-РУСКИЙ (greek)ΕΛΛΑΣ (circles)ⒸⒾⓇⒸⓁⒺⓈ
Which in Erlang “translates” into:
U=mb:new("(ascii)ABCDEFG (russian)\320\237\320\236-\320\240\320\243\320\241\320\232\320\230\320\231 (greek)\316\225\316\233\316\233\316\221\316\243 (circles)\342\222\270\342\222\276\342\223\207\342\222\270\342\223\201\342\222\272\342\223\210").
Ugly, I know, but it’s precisely because Erlang is not too good at multibyte strings that I am working on it…
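The escapes are octal byte values: "\320\237" is the two bytes 16#D0 16#9F, the UTF-8 encoding of П (U+041F). As a small worked example (octal_peek is a hypothetical name), here is a two-byte sequence decoded by hand with the bit syntax: the first byte is 110xxxxx, the second 10yyyyyy, and the code point is xxxxx followed by yyyyyy. For П, xxxxx = 2#10000 = 16 and yyyyyy = 2#011111 = 31, so the code point is 16 × 64 + 31 = 1055 = 16#41F.

```erlang
-module(octal_peek).
-export([decode2/2]).

%% Decode one two-byte UTF-8 sequence by hand.
%% The literal prefixes 110 and 10 must match, or this crashes (badmatch).
decode2(B1, B2) ->
    <<2#110:3, Hi:5, 2#10:2, Lo:6>> = <<B1, B2>>,
    (Hi bsl 6) bor Lo.
```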
Now,
UL=mb:lowercase(U).
will give back – trust me – the following:
(ascii)abcdefg (russian)\320\277\320\276-\321\200\321\203\321\201\320\272\320\270\320\271 (greek)\316\265\316\273\316\273\316\261\317\203 (circles)\342\223\222\342\223\230\342\223\241\342\223\222\342\223\233\342\223\224\342\223\242
Translated into ‘real’ UTF-8:
(ascii)abcdefg (russian)по-руский (greek)ελλασ (circles)ⓒⓘⓡⓒⓛⓔⓢ
i.e. a properly lowercased string.
More later…
06/21 Wow…
I had heard about Erlang’s hot code swapping, but had never used it so far. Well, I just tried it today, and I am still picking up the bits of my jaw, which dropped while running the following code:
Eshell V5.4.3 (abort with ^G)
1> c(m).
{ok,m}
2> Pid=m:init().
rebooting...
<0.36.0>
3> Pid!{self(),status}.
M/status: tagadax
{<0.29.0>,status}
Here I edit message() in the source so that it returns “tagada” [no -x] the next time the code is run.
4> c(m).
{ok,m}
5> Pid!{self(),code_switch}.
Requesting reboot...
{<0.29.0>,code_switch}
6> Pid!{self(),status}.
M/status: tagada
{<0.29.0>,status}
7>
And, hang on to your socks, here’s the full code:
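-module(m).
-export([init/0,loop/1,message/0]).

%% The value the server reports; edit it, then recompile with c(m).
message() -> tagada.

%% Spawn the server process running loop/1.
init() ->
    io:format("rebooting...~n",[]),
    spawn(m,loop,[message()]).

loop(M) ->
    receive
        {_, code_switch} ->
            io:format("Requesting reboot...~n",[]),
            %% Fully qualified calls: jump into the newest loaded version of m.
            m:loop(m:message());
        {Nugu, status} ->
            Nugu ! M,
            io:format("M/status: ~w~n",[M]),
            %% Local calls: stay in the version this process is already running.
            loop(message())
    end.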
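m:loop(m:message()); is all the code it takes to do the hot swap. Nice, eh? A fully qualified call, MODULE:FUNCTION(...), always goes to the newest loaded version of the module, while a local call like loop(message()) stays in the version the process is already running. So the process keeps running the old code until it gets the code_switch message, and then jumps to the new one. That’s it… Amazing!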
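A variant of the same trick, as a sketch rather than a recommended pattern: swap_sketch and start/1 are hypothetical names. Unlike m, it carries its state across the switch instead of re-reading message(), uses ?MODULE instead of hard-coding the module name, and replies to code_switch so the caller knows the jump happened. Worth keeping in mind: the runtime holds at most two versions of a module (current and old), and as far as I know the shell’s c/1 purges the old version before loading a new one, which kills any process still running it. So a long-lived loop should make its fully qualified call before the module is reloaded a second time.

```erlang
-module(swap_sketch).
-export([start/1, loop/1]).   % loop/1 must be exported for ?MODULE:loop/1

start(State) ->
    spawn(?MODULE, loop, [State]).

loop(State) ->
    receive
        {From, code_switch} ->
            From ! {self(), switched},
            %% Fully qualified call: continue in the newest loaded version,
            %% keeping the current state.
            ?MODULE:loop(State);
        {From, status} ->
            From ! {self(), State},
            %% Local call: stay in the version currently running.
            loop(State)
    end.
```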
