Archive for the 'My Kit' Category

07/06 Erlang Multibyte Module

900 lines of code later, I got this:

Eshell V5.4.3  (abort with ^G)
1> c("src/mb.erl",[{outdir,"ebin/"},nowarn_unused_function, nowarn_unused_vars]).
{ok,mb}
2> mb:
bocu/1              charToInt/1         convert10/1
convert16/1         convert2/1          convertEncoding/2
filter/2            filterB/2           format/1
getNextChar/1       getNextCharAsInt/1  hasProcess/1
hasTable/1          inStr/2             init/0
isASCII/1           join/2              kangxi/1
left/2              leftB/2             len/1
lenB/1              longSplit/2         lowercase/1
mid/2               mid/3               midB/2
midB/3              module_info/0       module_info/1
new/0               new/1               new/2
new/3               oneByte_to_utf8/1   print/2
replace/3           replaceAll/3        reset/0
reverse/1           reverseB/1          right/2
split/2             surrogate/1         uppercase/1
utf16_to_utf8/1     utf8_to_oneByte/2   utf8_to_utf16/1

There’s actually a bit more, but not yet activated/complete. Been doing quite a bit of bug fixing too… The next big chunk will be CJK encodings ⇔ UTF. I foresee lots of fun here… But at least, thanks to the mass of info in the Unihan database, I’ve got my work all lined up: extract the BigFive and CCCII data into a file, have the init() function parse it, and store that in a dets database. Like I did for case folding, except that the case folding data comes from the aptly named CaseFolding.txt file. Very handy nonetheless.

More later.

Erlang

07/03 Multibyte strings in Erlang

Aka, science-fiction. In a language where strings are lists of integers, what can you expect in terms of multibyte strings?
|
|
|___> zilch

That’s okay, gives me something to do on my idle weekends. I have thus started this project, mb [the shortest yet meaningful name I could come up with], aimed at providing Unicode support to Erlang – and possibly some sort of support for other encodings. One-byte encodings are actually easy to support; without too much pain I managed to add support for latin-1 to latin-10, MacRoman, Codepage 1252 [that’s Windows] and Codepage 437 [that’s DOS]. You can create MBStrings [it’s a tuple, really, but don’t tell anyone] and convert to and from these encodings + utf-8. This module also digs utf-16, albeit partially [I did get the surrogates part right, I think].
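To show why one-byte encodings are the easy case, here is a minimal sketch for latin-1 only, where each byte value is already the Unicode code point. The module and function names are made up for illustration and are not the ones in mb; the other one-byte encodings would need a lookup table from byte to code point first.

```erlang
-module(latin1_sketch).
-export([latin1_to_utf8/1]).

%% Hypothetical helper, not part of mb: convert a list of latin-1 bytes
%% into a list of UTF-8 bytes.
latin1_to_utf8(Bytes) ->
    lists:flatmap(fun encode/1, Bytes).

%% 0..127 is plain ASCII and stays a single byte.
encode(C) when C >= 0, C < 16#80 ->
    [C];
%% 128..255 becomes a two-byte sequence: 110xxxxx 10xxxxxx.
encode(C) when C =< 16#FF ->
    [16#C0 bor (C bsr 6), 16#80 bor (C band 16#3F)].
```

For instance, the latin-1 byte 16#E9 (é) comes out as the two bytes 16#C3, 16#A9.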

The -export() attribute is already three lines long. The main features are:

  • new
  • convertEncoding / oneByte_to_utf8 / utf8_to_oneByte
  • split / splitB
  • left / leftB ; right / rightB ; mid / midB
  • reverse / reverseB
  • lowercase / uppercase
  • isASCII
  • getNextChar

Note the ~B functions that work at the byte level: they return “strings” [i.e. lists] and not MBString objects. Yes, this is an influence from RB, the only language I know that gives you ZERO pain in handling encodings. And I *do* mean zero. When you work at the byte level [there may be a few good reasons to do that, including speed, if you know you are manipulating an ASCII (7-bit) or one-byte (8-bit) encoded string], whatever comes out can’t be guaranteed multibyte safe. Not 100%. So I treat the output as unsafe and hand over a plain list of integers. You’re then free to try and convert that – again – to an MBString. Plug and pray :D
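A rough sketch of the difference, working directly on lists of UTF-8 bytes. The function names here are invented and this is not how mb does it internally; it only illustrates why a byte-level cut can land in the middle of a character while a character-level cut cannot.

```erlang
-module(mb_left_sketch).
-export([left_b/2, left/2]).

%% Byte level: keep the first N bytes. Fast, but may split a character.
left_b(Bytes, N) ->
    lists:sublist(Bytes, N).

%% Character level: keep the first N UTF-8 characters.
left(Bytes, N) ->
    left(Bytes, N, []).

left(_, 0, Acc) -> lists:append(lists:reverse(Acc));
left([], _, Acc) -> lists:append(lists:reverse(Acc));
left([B | _] = Bytes, N, Acc) ->
    {Char, Rest} = take(seq_len(B), Bytes, []),
    left(Rest, N - 1, [Char | Acc]).

%% Length of a UTF-8 sequence, judged from its lead byte.
seq_len(B) when B < 16#80 -> 1;
seq_len(B) when B >= 16#C0, B < 16#E0 -> 2;
seq_len(B) when B >= 16#E0, B < 16#F0 -> 3;
seq_len(B) when B >= 16#F0 -> 4;
seq_len(_) -> 1.  % stray continuation byte: treat it as one unit

%% Take up to N elements from a list, tolerating truncated input.
take(0, Rest, Acc) -> {lists:reverse(Acc), Rest};
take(_, [], Acc) -> {lists:reverse(Acc), []};
take(N, [H | T], Acc) -> take(N - 1, T, [H | Acc]).
```

With "\320\237\320\236-" [that’s ПО-], left/2 with N = 2 returns the four bytes of ПО, whereas left_b/2 with N = 3 returns П plus half of О.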

My interest in mb strings is of course more CJKV than koi-whatever (Russian) or Arabic or anything else. So I am first adding the functionalities that interest me [what’s the radical of ? how many strokes are there in ? and possibly encoding conversions between the big standards – that is, if I find enough info on them.] Some of the functionalities are – of course – cross-language, inasmuch as the concept applies, like lowercase and uppercase. CJK languages have a ‘full-width’ alphabet that is *not* in the ASCII range. Thus, the ordinary and crude algorithm of my youth, back when ASCII rocked, will not work…

Fortunately, the Unicode project has a lot of info, and the UniHan file has it all – almost. What I did is extract the relevant case folding data, and I built a dets database with it. Whenever I need to convert between lower and upper case, I ask the database. Easy as pie. Maybe not as fast as I’d like, but slow is better than zilch.
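For the curious, the idea looks roughly like this. It is a sketch only: the module, function and table names are invented, it assumes the standard CaseFolding.txt line format ("0041; C; 0061; # LATIN CAPITAL LETTER A"), it keeps only the simple one-to-one mappings (status C and S), and it uses string:lexemes/2 and string:trim/1, which need a much more recent Erlang/OTP than the shell shown on this page.

```erlang
-module(casefold_sketch).
-export([parse_line/1, load/2, fold/2]).

%% Parse one line of CaseFolding.txt into {Code, Folded},
%% or skip for comments, blank lines and F/T entries.
parse_line([$# | _]) ->
    skip;
parse_line(Line) ->
    case string:lexemes(Line, ";") of
        [Code, Status, Mapping | _] ->
            case string:trim(Status) of
                S when S =:= "C"; S =:= "S" -> {hex(Code), hex(Mapping)};
                _ -> skip
            end;
        _ ->
            skip
    end.

hex(Str) ->
    list_to_integer(string:trim(Str), 16).

%% Read the text file once and store every mapping in a dets table.
load(TxtFile, DetsFile) ->
    {ok, Tab} = dets:open_file(casefold_sketch_tab, [{file, DetsFile}]),
    {ok, Fd} = file:open(TxtFile, [read]),
    ok = load_lines(Fd, Tab),
    ok = file:close(Fd),
    {ok, Tab}.

load_lines(Fd, Tab) ->
    case io:get_line(Fd, "") of
        eof -> ok;
        {error, _} = Err -> Err;
        Line ->
            case parse_line(Line) of
                skip -> ok;
                Pair -> ok = dets:insert(Tab, Pair)
            end,
            load_lines(Fd, Tab)
    end.

%% Ask the database; code points without an entry fold to themselves.
fold(Tab, Code) ->
    case dets:lookup(Tab, Code) of
        [{Code, Folded}] -> Folded;
        [] -> Code
    end.
```

Every lookup is a disk-backed dets:lookup/2, which fits the “not as fast as I’d like” remark; caching the table in ets would be one way around that.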

Let’s take this string:
(ascii)ABCDEFG (russian)ПО-РУСКИЙ (greek)ΕΛΛΑΣ (circles)ⒸⒾⓇⒸⓁⒺⓈ
Which in Erlang “translates” into:
U=mb:new("(ascii)ABCDEFG (russian)\320\237\320\236-\320\240\320\243\320\241\320\232\320\230\320\231 (greek)\316\225\316\233\316\233\316\221\316\243 (circles)\342\222\270\342\222\276\342\223\207\342\222\270\342\223\201\342\222\272\342\223\210").

Ugly, I know, but it’s precisely because Erlang is not too good at multibyte strings that I am working on this…

Now,

UL=mb:lowercase(U).
will give back – trust me – the following:

(ascii)abcdefg (russian)\320\277\320\276-\321\200\321\203\321\201\320\272\320\270\320\271 (greek)\316\265\316\273\316\273\316\261\317\203 (circles)\342\223\222\342\223\230\342\223\241\342\223\222\342\223\233\342\223\224\342\223\242

Translated into ‘real’ UTF-8:

(ascii)abcdefg (russian)по-руский (greek)ελλασ (circles)ⓒⓘⓡⓒⓛⓔⓢ

i.e. a properly lowercased string.

More later…

Erlang

06/11 Erlang/gs Event loop

I now have a cool-ish event loop in place that declares a buttonpress/click handler for each known GUI element. An event loop is created with all elements referenced, and basic messages are implemented [clicks/buttonpresses]. If a pushbutton has “quit” as its caption, app termination is provided for. Ditto when closing the window. [Mouse] Button presses provide the number of the mouse button, and the position. What else do ya need? ;-)
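Roughly, the generated loop behaves like the sketch below. This is not the generated code itself: the module and function names are made up, the argument shapes ([Text|_] for a button click, [ButtonNo, X, Y|_] for a buttonpress) are written from memory of the gs documentation, and the exact-match test on the “quit” caption is an assumption. It never calls gs, it only matches the messages gs would send.

```erlang
-module(gs_events_sketch).
-export([loop/0, handle/1]).

%% Classify one gs message. Shapes assumed: {gs, Id, Event, Data, Args}.
handle({gs, _Id, destroy, _Data, _Args}) ->
    stop;                                   % window closed
handle({gs, _Id, click, _Data, ["quit" | _]}) ->
    stop;                                   % pushbutton captioned "quit"
handle({gs, Id, click, _Data, _Args}) ->
    {clicked, Id};
handle({gs, Id, buttonpress, _Data, [Button, X, Y | _]}) ->
    {pressed, Id, Button, X, Y};            % mouse button number + position
handle(Other) ->
    {ignored, Other}.

%% The event loop proper: print every event until told to stop.
loop() ->
    receive
        Msg ->
            case handle(Msg) of
                stop ->
                    bye;
                Event ->
                    io:format("~p~n", [Event]),
                    loop()
            end
    end.
```

Keeping handle/1 separate from the receive makes it easy to try out by hand in the shell, without a window on screen.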

Erlang

06/11 Erlang/gs progress

I solved most of my problems with listboxes and grids. I should probably add a couple of other elements, like the editor [aka EditField]. But now is the time to turn to linking the existing elements to events inside an event loop, which looks like this:

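loop() ->
    receive
        %% Window closed: leave the loop.
        {gs,_Win,destroy,_Data,_Args} -> bye;
        %% Click on a grid line: Args starts with column, row and cell text.
        {gs,_Gridline,click,_Data,[Col,Row,Text|_]} ->
            io:format("Click at col:~p row:~p text:~p~n",[Col,Row,Text]),
            loop();
        %% Anything else: print it and keep going.
        Msg ->
            io:format("Got ~p~n",[Msg]),
            loop()
    end.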

You set up a receiver and match messages following this pattern: {gs, IdOrName, EventType, Data, Args}

The original in RB’s IDE

The end result in X11

Erlang

06/10 Messing with Erlang/gs

Building GUI apps in Erlang is pretty primitive. And it’s text-only work. As in, you have to hard-code GUI elements in a text editor.

W_Window1 = gs:create(window, gs:start(), [{width,789}, {height,544}, {x,100}, {y,100}, {title,"First Try"}]),
CV_Canvas1 = gs:create(canvas, W_Window1, [{y,14}, {x,20}, {width,483}, {height,483}]),
CVP_Canvas1 = gs:create(image, CV_Canvas1, [{'load_gif', "tiger.gif"}]),

Blech…

So I am working on a GUI editor for Erlang, but with a twist I had used before for Python: I am using RB’s own GUI editor to build a nice window, and then, in the window’s Open() event, I call a library I wrote that outputs Erlang/gs code for said window. The app runs, produces the code, and quits. It’s far from complete, and I am having difficulties with gs grids, aka multiple-column listboxes. But it is already enough to easily produce some skeleton code, and I hope I can then move on to linking GUI elements to code managing events. Like when a pushbutton is clicked, provide basic code to handle this.
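The real generator is written in RB, but the principle fits in a few lines. Here is a sketch of the same idea in Erlang, with an invented element format ({VarName, Type, ParentExpr, Options}) and invented module and function names: walk the list of controls and print one gs:create/3 line per control.

```erlang
-module(gs_codegen_sketch).
-export([emit/1]).

%% Turn a list of {VarName, Type, ParentExpr, Options} descriptions
%% into Erlang/gs source text, one gs:create/3 call per element.
emit(Elements) ->
    lists:flatten([emit_one(E) || E <- Elements]).

%% VarName and ParentExpr are strings pasted in as-is;
%% Type and Options are printed as Erlang terms.
emit_one({Var, Type, Parent, Opts}) ->
    io_lib:format("~s = gs:create(~p, ~s, ~p),~n", [Var, Type, Parent, Opts]).
```

Calling emit([{"W_Window1", window, "gs:start()", [{width,789}]}]) gives back a line in the same style as the hand-written ones above, ready to paste into a module.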

More later… But see this already:

The original in RB

The end result in gs/X11

Erlang