Erlang Multibyte Module
900 lines of code later, I got this:
Eshell V5.4.3 (abort with ^G)
1> c("src/mb.erl",[{outdir,"ebin/"},nowarn_unused_function, nowarn_unused_vars]).
{ok,mb}
2> mb:
bocu/1              charToInt/1           convert10/1
convert16/1         convert2/1            convertEncoding/2
filter/2            filterB/2             format/1
getNextChar/1       getNextCharAsInt/1    hasProcess/1
hasTable/1          inStr/2               init/0
isASCII/1           join/2                kangxi/1
left/2              leftB/2               len/1
lenB/1              longSplit/2           lowercase/1
mid/2               mid/3                 midB/2
midB/3              module_info/0         module_info/1
new/0               new/1                 new/2
new/3               oneByte_to_utf8/1     print/2
replace/3           replaceAll/3          reset/0
reverse/1           reverseB/1            right/2
split/2             surrogate/1           uppercase/1
utf16_to_utf8/1     utf8_to_oneByte/2     utf8_to_utf16/1
There’s actually a bit more, but it’s not yet activated or complete. I’ve been doing quite a bit of bug fixing too… The next big chunk will be CJK encodings ⇔ UTF conversion. I foresee lots of fun here… But thanks to the mass of info in the Unihan database, at least I have my work all lined up: extract the BigFive and CCCII data into a file, then have init() parse it and store the result in a dets database. That’s what I did for case folding, except that the case folding data comes from the aptly named CaseFolding.txt file. Very handy nonetheless.
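For the curious, here is roughly what the "parse a data file and store it in dets" step can look like for case folding. This is only a minimal sketch, not the actual code in mb: the module name casefold_sketch, the functions load/2, fold/2 and parse_line/1, and the file names are all made up for illustration. It also assumes a recent OTP (string:split/3, string:trim/1 and string:lexemes/2 do not exist in the old shell shown above), and it keeps only the C and F entries, which together make up full case folding.

```erlang
-module(casefold_sketch).
-export([load/2, fold/2, parse_line/1]).

%% Hypothetical helper (not part of mb): read CaseFolding.txt and store
%% {CodePoint, [FoldedCodePoint]} tuples in a dets table.
load(TxtPath, DetsPath) ->
    {ok, Tab} = dets:open_file(casefold_sketch,
                               [{file, DetsPath}, {type, set}]),
    {ok, Fd} = file:open(TxtPath, [read]),
    ok = load_lines(Fd, Tab),
    ok = file:close(Fd),
    {ok, Tab}.

load_lines(Fd, Tab) ->
    case file:read_line(Fd) of
        {ok, Line} ->
            case parse_line(Line) of
                {ok, Code, Mapping} -> ok = dets:insert(Tab, {Code, Mapping});
                skip -> ok
            end,
            load_lines(Fd, Tab);
        eof ->
            ok
    end.

%% A data line looks like "0041; C; 0061; # LATIN CAPITAL LETTER A".
%% Comment lines, blank lines and the S/T statuses are skipped.
parse_line(Line) ->
    [Data | _] = string:split(Line, "#"),
    case [string:trim(F) || F <- string:split(Data, ";", all)] of
        [Code, Status, Mapping | _] when Status =:= "C"; Status =:= "F" ->
            {ok, list_to_integer(Code, 16),
             [list_to_integer(H, 16) || H <- string:lexemes(Mapping, " ")]};
        _ ->
            skip
    end.

%% Fold a list of code points; anything without an entry maps to itself.
fold(Tab, CodePoints) ->
    lists:flatmap(fun(C) ->
                          case dets:lookup(Tab, C) of
                              [{C, Mapping}] -> Mapping;
                              [] -> [C]
                          end
                  end, CodePoints).
```

With a copy of CaseFolding.txt in the current directory, calling casefold_sketch:load("CaseFolding.txt", "casefold.dets") should build the table, and casefold_sketch:fold/2 can then be applied to a list of code points. The same shape (one parser per data file, one dets table per mapping) would presumably carry over to the BigFive and CCCII data once it is extracted from Unihan.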
More later.
