Archive for the 'Erlang' Category

08/25 The rats, they be leavin’ teh ships?

At one point I thought I hated programming because I was just so sick it… It turns out I don’t hate programming, I just hate programming in Java.

Russell Beattie

ㅋㅋㅋ.

The reason people are looking at Erlang is not because its beautiful syntax, great documentation, or up-to-date libraries. Trust me. It’s because the Erlang VM can run for long periods of time, scaling linearly across cores or processors filling the same niche that Java does right now on the server.

This is less ㅋㅋㅋish. First off, why should I trust Russ on this? Is he Joe Armstrong? Kenneth Lundin? Robert Virding? Or the hordes of contributors on the Erlang mailing list? Nope. He is welcome to his opinion about Erlang’s syntax, but as a recovering Java programmer, what can you expect? The documentation is great, and the community vibrant and knowledgeable. Erlang’s faults are more with the lack of things that’d make it a great all-purpose language, like string manipulations, graphic libraries, and good GUI tools.

06/19 700% speedup

I have been cooperating on and off with the good people from Dot.Tunes, based in Australia. I have coded bits of the application, notably the iTunes Library XML file import engine. Back when they integrated my engine, D-T saw huge improvements while importing the XML database, especially with large files. If memory serves, the previous import engine stalled on some files. With the first two or three versions of my code, large files took some time to import, but at least D-T didn’t fail. When I say large, I mean real large. Like 50,000 record-large. 35MB+ of XML to digest and convert to sqlite. Gulp.

50K records took, give or take, 20 minutes. Note the use of past tense. Took. Now, it takes 3 minutes. Yessir. On the same machine. But to achieve that I had to break out the big guns. The real big ‘uns. I had been playing for a while with the idea of writing some iTunes/mp3/etc related code in Erlang and possibly Yaws. So one of the things I started writing is a new engine for the iTunes Library XML format, using message passing, concurrency and gazillions of lightweight threads.

The result is astounding. 2,000 records take 2 seconds on my MacBook, and/or a late-model PowerMac G5. 50,000 records take three minutes on that same PowerMac G5 – I don’t yet have the file at hand to run the same test on the MacBook, and I am quite curious to see how it will fare. To be fair, there’s then some extra overhead to load the resulting .sql file into an sqlite database, but it’s minimal. The only problem is that we’ll need to ship CEAN, a stand-alone version of Erlang. That adds +/-12MB to the package, but I think it’s well worth it.
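For the curious, here is a rough sketch of the "one lightweight process per record" idea mentioned above. It is not the D-T engine: the module name pmap_sketch, the convert/1 function and the tracks table are all made up for illustration, and the SQL string is not escaped.

```erlang
-module(pmap_sketch).
-export([pmap/2, convert/1]).

%% Hypothetical stand-in for the real per-record conversion:
%% turns a {Name, Artist} tuple into a naive SQL statement (no escaping).
convert({Name, Artist}) ->
    lists:flatten(io_lib:format("INSERT INTO tracks VALUES ('~s','~s');",
                                [Name, Artist])).

%% Spawn one lightweight process per item. Each process sends its
%% result back to the parent, tagged with a unique reference.
pmap(F, Items) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, F(Item)} end),
                Ref
            end || Item <- Items],
    %% Collect results in the original order by waiting on each
    %% reference in turn.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

Calling pmap_sketch:pmap(fun pmap_sketch:convert/1, Records) returns one statement per record, in input order, while the conversions themselves run concurrently.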

We’re still in the testing stages, and Jeff over at D-T will certainly make announcements when we’re ready, but I am already quite excited by the progress we’ve made….

01/14 99 Lisp Prolog Erlang problems

Since I have been lapsing in my Erlang practice, I am playing with L-99: Ninety-Nine Lisp Problems, based on a Prolog problem list – which is fitting, since the first Erlang interpreter was written in Prolog…

I’ll post links to my solutions when I have enough – I have done P01 to P20, with the exception of P13. Feels good to get back in the groove…


Update
I posted some sample code in the comments, but here is a permanent link to the source code.

08/21 mb update

I have decided that the only way to make mb faster was to change the underlying structure from {Encoding::atom(), String::list()} to {Encoding::atom(), String::binary()}. Which implied a thorough overhaul of the code. I am almost done, although still fighting with some issues. Still, preliminary tests tend to show that the change was worthwhile: mb:reset() – the creation of the encodings-related text files into dets tables – is easily faster by half. Only one test in the test suite passes so far, and it too executes quite a bit faster than the original.

As a side note, I have discovered something puzzling. Say you have a variable Code1 which contains the integer value 0x2121. I was expecting that doing <<Code1>> would yield <<33,33>>. Nopesky. It yields <<"!">> (the default segment size is 8 bits, so only the low byte survives). You have to do <<Code1:16>> – and hope the integer is not greater than 65535: if your integer was, say, 0x012345, <<Code1:16>> would yield <<35,69>>. Ah well…
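A small sketch of the behaviour described above, written as pattern matches so that demo/0 only returns ok if every claim holds. The module name bits_sketch is just for illustration. The last line uses binary:encode_unsigned/1, which is one way to avoid guessing the size up front, assuming your OTP release ships the binary module.

```erlang
-module(bits_sketch).
-export([demo/0]).

demo() ->
    Code1 = 16#2121,
    %% Default segment is an 8-bit integer: only the low byte survives.
    <<33>> = <<Code1>>,              % the shell prints this as <<"!">>
    %% Asking for 16 bits keeps both bytes.
    <<33,33>> = <<Code1:16>>,
    %% A value wider than 16 bits is silently truncated to its low 16 bits.
    Code2 = 16#012345,
    <<35,69>> = <<Code2:16>>,
    %% encode_unsigned/1 uses as many bytes as the integer needs.
    <<1,35,69>> = binary:encode_unsigned(Code2),
    ok.
```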

I am maintaining the source code with git. It is quite pleasant to use, although I have yet to manage to push my repository correctly to the public one. So nothing is published yet, except partial docs. MNK, who gave me a FreeBSD jail to play with, and is hosting the whole thing, is playing with a git to mercurial bridge. One day we’ll probably have a <your_scm> to mb repository bridge. Maybe.


07/10 X-Encodings in Erlang/mb

Been hard at it. But I hit a snag when encountering a plain, innocuous-looking sinogram: 內. Pretty harmless, right? D’uh. Big time. Because in Japanese it’s 内, not 內. Friggin’ variants. 0x5167 vs 0x5185 [If you don’t know what I am talking about, it’s okay. In this case, ignorance is bliss!]. Can I slap someone – preferably not me? Say hello to hasVariant(X) who just joined us. Nyessss…. Anyway, all fixed now – or rather, for now!
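To give an idea of what a variant check can look like, here is a minimal sketch. It is not the actual hasVariant(X) from mb: the module name, the function names and the one-entry table are assumptions made for the example, and a real table would hold many more pairs (and probably live in dets).

```erlang
-module(variant_sketch).
-export([has_variant/1, variants/1]).

%% Tiny illustrative table of variant pairs, as code points.
variant_table() ->
    [{16#5167, 16#5185}].   % U+5167 (traditional) <-> U+5185 (Japanese form)

%% All known variants of a code point, looking the pair up in both directions.
variants(CodePoint) ->
    [B || {A, B} <- variant_table(), A =:= CodePoint] ++
    [A || {A, B} <- variant_table(), B =:= CodePoint].

%% True if the code point has at least one known variant.
has_variant(CodePoint) ->
    variants(CodePoint) =/= [].
```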

A couple of screenshots is worth 1,000 words!

Here’s a screenshot of the source of KanjiTest.html, in SEE. It’s a cross-table of a few sinograms for a slew of encodings, showing the respective code points for each character in all the encodings. The page itself is in utf-8.
KanjiTest.html screenshot

Here’s a screenshot of the source of KanjiTestUTF8.html. An extract of the former table, if you will.
KanjiTestUTF8.html screenshot

Look, Mom, with only one hand: a UTF-16 encoded page showing a table of sinograms, in UTF-16 of course.
KanjiTestUTF16.html screenshot

Ooooh, lookit, no hands! Same sinograms, in Shift-JIS. Damn, I rock!
KanjiTestSJ.html screenshot

It’s not exactly rosy. The code’s a deluxe candidate for refactoring – read, it’s a mess – but is written in a way that can easily handle as many new encodings as you can throw at me, provided you give me a UTF-8/16 to said encoding cross-ref file. The whole yahzoo [case folding data, Big5, CCCII, Shift-JIS, EUC-KR] is stored in dets tables – UTF-16⇔UTF-8 is an algorithm, thus a tad faster. Because this thing, it’s nice to have, but not exactly a Ferrari. The test that produces these tables, it runs in, ahem, er…, 14 seconds? Don’t reach for your gun right now though, because:
A. I am going to work on speed when functionalities are all in and tested
B. Try to do that right now in Erlang :D – good luck!

So I guess slow is better than nothing. But we’ll work on speed.
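For what “UTF-16⇔UTF-8 is an algorithm” means in practice, here is a hand-rolled sketch that encodes a single code point both ways with plain bit syntax, no lookup table needed. It is not the mb code: the module and function names are made up, the UTF-16 output is big-endian without a BOM, and the surrogate range is not rejected.

```erlang
-module(utf_sketch).
-export([utf8/1, utf16/1]).

%% Encode one Unicode code point as UTF-8 (1 to 4 bytes).
utf8(C) when C < 16#80 ->
    <<C>>;
utf8(C) when C < 16#800 ->
    <<2#110:3, (C bsr 6):5,
      2#10:2, (C band 16#3F):6>>;
utf8(C) when C < 16#10000 ->
    <<2#1110:4, (C bsr 12):4,
      2#10:2, ((C bsr 6) band 16#3F):6,
      2#10:2, (C band 16#3F):6>>;
utf8(C) when C < 16#110000 ->
    <<2#11110:5, (C bsr 18):3,
      2#10:2, ((C bsr 12) band 16#3F):6,
      2#10:2, ((C bsr 6) band 16#3F):6,
      2#10:2, (C band 16#3F):6>>.

%% Encode one code point as big-endian UTF-16: a single 16-bit unit,
%% or a surrogate pair for code points above U+FFFF.
utf16(C) when C < 16#10000 ->
    <<C:16>>;
utf16(C) when C < 16#110000 ->
    V = C - 16#10000,
    <<(16#D800 + (V bsr 10)):16, (16#DC00 + (V band 16#3FF)):16>>.
```

With the sinogram from above, utf_sketch:utf8(16#5167) gives the three bytes E5 85 A7, and utf_sketch:utf16(16#5167) gives the two bytes 51 67.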
