Erlang Multibyte Module

900 lines of code later, I got this:

Eshell V5.4.3  (abort with ^G)
1> c("src/mb.erl",[{outdir,"ebin/"},nowarn_unused_function, nowarn_unused_vars]).
{ok,mb}
2> mb:
bocu/1              charToInt/1         convert10/1
convert16/1         convert2/1          convertEncoding/2
filter/2            filterB/2           format/1
getNextChar/1       getNextCharAsInt/1  hasProcess/1
hasTable/1          inStr/2             init/0
isASCII/1           join/2              kangxi/1
left/2              leftB/2             len/1
lenB/1              longSplit/2         lowercase/1
mid/2               mid/3               midB/2
midB/3              module_info/0       module_info/1
new/0               new/1               new/2
new/3               oneByte_to_utf8/1   print/2
replace/3           replaceAll/3        reset/0
reverse/1           reverseB/1          right/2
split/2             surrogate/1         uppercase/1
utf16_to_utf8/1     utf8_to_oneByte/2   utf8_to_utf16/1

There’s actually a bit more, but it isn’t activated or complete yet. I’ve been doing quite a bit of bug fixing too… The next big chunk will be CJK encodings ⇔ UTF conversion. I foresee lots of fun here… But thanks to the mass of info in the Unihan database, at least my work is all lined up: extract the BigFive and CCCII data into a file, then have init() parse it and store it in a dets database. That’s what I did for case folding, except that the case folding data comes from the aptly named CaseFolding.txt file. Very handy nonetheless.
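
To give an idea of the parse-and-store step, here is a minimal sketch of loading CaseFolding.txt into dets. It is not the code from mb.erl: the module name casefold_sketch and the functions parse_line/1, load/2 and fold/2 are made up for illustration. It assumes an OTP release that has list_to_integer/2, and it only keeps the one-to-one mappings (status C and S), which is one possible choice among several.

```erlang
-module(casefold_sketch).
-export([parse_line/1, load/2, fold/2]).

%% Parse one line of CaseFolding.txt, for example:
%%   "0041; C; 0061; # LATIN CAPITAL LETTER A"
%% Returns {ok, Code, Status, Mapping}, or skip for comments and blank lines.
parse_line(Line) ->
    %% Drop the trailing comment, i.e. everything from the first '#'.
    Data = lists:takewhile(fun(C) -> C =/= $# end, Line),
    %% Split on ';' and whitespace: code, status, then one or more targets.
    case string:tokens(Data, "; \t\r\n") of
        [Code, Status | Mapping] when Mapping =/= [] ->
            {ok, list_to_integer(Code, 16), Status,
             [list_to_integer(M, 16) || M <- Mapping]};
        _ ->
            skip
    end.

%% Read TxtFile line by line and store {Code, Folded} pairs in a dets
%% table backed by DetsFile. Only C and S entries are kept (assumption:
%% simple one-to-one folding is all we need here).
load(TxtFile, DetsFile) ->
    {ok, Tab} = dets:open_file(casefold_sketch, [{file, DetsFile}]),
    {ok, Fd} = file:open(TxtFile, [read]),
    ok = load_lines(Fd, Tab),
    ok = file:close(Fd),
    {ok, Tab}.

load_lines(Fd, Tab) ->
    case io:get_line(Fd, "") of
        eof ->
            ok;
        {error, Reason} ->
            {error, Reason};
        Line ->
            case parse_line(Line) of
                {ok, Code, Status, [Folded]}
                  when Status =:= "C"; Status =:= "S" ->
                    ok = dets:insert(Tab, {Code, Folded});
                _ ->
                    ok
            end,
            load_lines(Fd, Tab)
    end.

%% Look up the folded form of a code point; unmapped code points fold
%% to themselves.
fold(Tab, Code) ->
    case dets:lookup(Tab, Code) of
        [{Code, Folded}] -> Folded;
        [] -> Code
    end.
```

With something like this, casefold_sketch:load("CaseFolding.txt", "casefold.dets") would be called once from an init-style function, and fold/2 used afterwards for lookups. The same pattern should carry over to a file of BigFive or CCCII mappings extracted from Unihan, with a different line parser.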

More later.
