Tomabaem online light

On the sidebar you’ll find a small form that accepts one Chinese character, aka sinogram, and returns, in the box below it, the readings [Cantonese, Mandarin, Japanese, Korean and Vietnamese] and meanings for this sinogram. This stuff was pulled from the UniHan database, as explained at the Tomabaem main page. Tomabaem Online is a derivative of the desktop app, and is built on indexes of the database that are made offline. Then, a little Python [thanks to 2.4’s unicode codecs, conversion from UTF-16 to UTF-8 is seamless] and grep later, and we have a winner! Which reminds me, I haven’t updated the indexes for a while, so I shall do that now. I wrote an RB app that runs 11 threads [must. resist. temptation. to. rewrite. it. in. Erlang.]; the main routine reads the UniHan db and sends each line of text to a thread depending on its content. Here’s a sample:

U+3400 kCantonese jau1
U+3400 kDefinition (same as U+4E18 丘) hillock or mound
U+3400 kHanYu 10015.030
U+3400 kIRGHanyuDaZidian 10015.030
U+3400 kIRGKangXi 0078.010
U+3400 kIRG_GSource KX
U+3400 kIRG_JSource A-2121
U+3400 kIRG_TSource 6-222C
U+3400 kMandarin QIU1
U+3400 kRSUnicode 1.4
U+3400 kSemanticVariant U+4E18
U+3400 kTotalStrokes 5

U+3400 gives me the sinogram’s codepoint, and depending on the kXXXXX tag, I pass each line to a thread that stores the data in a separate file. Tags that have no thread of their own [kHanYu, for instance] are simply skipped. The whole process takes a bit over two and a half minutes, and I guess I could make it faster. But 160 seconds for 25MB of data is fast enough. So now the indexes are up to date :-) I still need to update Tomabaem’s own db, will do that later.
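For the curious, the dispatch step can be sketched in a few lines of Python. This is not the RB app, just a minimal single-threaded illustration of the same idea: split each line into codepoint, tag and value, and file it under its tag. The names `parse_line`, `dispatch` and `codepoint_to_char` are made up for this example, and the tag list is simply copied from the thread names in the log below.

```python
from collections import defaultdict

# Tags to index. Assumption: one per thread name shown in the indexer's log.
TAGS = {
    "kSimplifiedVariant", "kTraditionalVariant", "kCantonese",
    "kJapaneseOn", "kJapaneseKun", "kKorean", "kMandarin",
    "kVietnamese", "kRSKangXi", "kTotalStrokes", "kDefinition",
}


def parse_line(line):
    """Split a UniHan line into (codepoint, tag, value).

    Returns None for blank lines, comment lines and malformed lines.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Split on any whitespace, at most twice, so the value keeps its spaces.
    parts = line.split(None, 2)
    if len(parts) < 3:
        return None
    return tuple(parts)


def codepoint_to_char(codepoint):
    """Turn a 'U+XXXX' string into the character it names."""
    return chr(int(codepoint[2:], 16))


def dispatch(lines, tags=TAGS):
    """Group (codepoint, value) pairs by tag, ignoring tags we don't index."""
    buckets = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        codepoint, tag, value = parsed
        if tag in tags:
            buckets[tag].append((codepoint, value))
    return buckets


if __name__ == "__main__":
    sample = [
        "U+3400 kCantonese jau1",
        "U+3400 kDefinition (same as U+4E18 丘) hillock or mound",
        "U+3400 kHanYu 10015.030",
        "U+3400 kMandarin QIU1",
    ]
    for tag, entries in sorted(dispatch(sample).items()):
        print(tag, entries)
```

In the real thing each bucket would be a thread writing its own .idx file; here they are just lists in memory, which is enough to show how the tag decides where a line goes.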

dda> time ./threadIndexer/threadIndexer
Thread kSimplifiedVariant starting...
Thread kTraditionalVariant starting...
Thread kCantonese starting...
Thread kJapaneseOn starting...
Thread kJapaneseKun starting...
Thread kKorean starting...
Thread kMandarin starting...
Thread kVietnamese starting...
Thread kRSKangXi starting...
Thread kTotalStrokes starting...
Thread kDefinition starting...
Done dispatching in 150,204,884µs
Saving kSimplifiedVariant.idx
Saving [...]
Thread kSimplifiedVariant finished...
Thread [...] finished...

real    2m39.733s
user    1m54.760s
sys     0m6.600s

So. Okay, most of the time [about 150 of the 160 seconds] is spent dispatching, I could improve that…
