I have implemented a partial framework for browsing and querying eprints web sites. From an IRC channel…
This is part of my ongoing IRC bot project, Yaspyb (Yet Another PYthon Bot). I stumbled upon Cogprints and quickly managed to build a minimal framework that lets me browse top- and second-level categories, search title lists for keywords, and even send the full text (when available) as a text file (via pdftotext, sometimes first via epstopdf, since some articles are in .ps).
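To give an idea of the full-text step, here is a minimal sketch of how the conversion chain could be wired up. It is not Yaspyb's actual code: the function names `conversion_commands` and `to_text` are made up for illustration, and it assumes the `pdftotext` and `epstopdf` command-line tools are installed and on the PATH.

```python
import subprocess
from pathlib import Path


def conversion_commands(source):
    """Return the external commands needed to turn `source` into a .txt file.

    Hypothetical helper: PostScript files go through epstopdf first,
    then everything goes through pdftotext.
    """
    path = Path(source)
    suffix = path.suffix.lower()
    commands = []
    if suffix in (".ps", ".eps"):
        pdf = path.with_suffix(".pdf")
        # epstopdf writes its result to the file named by --outfile
        commands.append(["epstopdf", f"--outfile={pdf}", str(path)])
        path = pdf
    elif suffix != ".pdf":
        raise ValueError(f"don't know how to convert {source!r}")
    # pdftotext takes the PDF file, then the text file to write
    commands.append(["pdftotext", str(path), str(path.with_suffix(".txt"))])
    return commands


def to_text(source):
    """Run the conversion chain and return the name of the text file."""
    for command in conversion_commands(source):
        subprocess.run(command, check=True)  # raises if a tool fails
    return str(Path(source).with_suffix(".txt"))
```

Keeping the command building separate from the actual `subprocess.run` call makes it easy to check what would be executed without having the tools installed.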
Then I realized that, since eprints is an open-source project, there had to be more users of it. Indeed, there’s a bunch out there. I picked a couple of them and, lo and behold, after some tweaking I can search a bunch of web sites for academic publications from IRC. Yaspyb builds a couple of small indexes at startup, so that it knows where to find the top-level and secondary subjects. Most of the data gathered from user queries is cached, so the impact on the web sites is low – unlike a certain experiment of mine involving Vietnamese and Chinese characters, ahem…
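The caching idea is simple enough to sketch in a few lines. This is only an illustration under my own assumptions, not the code the bot runs: the `PageCache` class and its `misses` counter are hypothetical names, and the cache lives in memory with no expiry.

```python
import urllib.request


class PageCache:
    """Remember every page already fetched, keyed by URL (hypothetical sketch)."""

    def __init__(self, fetch=None):
        # `fetch` can be swapped out, e.g. for testing without a network
        self._fetch = fetch or self._download
        self._pages = {}
        self.misses = 0  # number of times the remote site was really hit

    @staticmethod
    def _download(url):
        with urllib.request.urlopen(url, timeout=30) as response:
            return response.read().decode("utf-8", errors="replace")

    def get(self, url):
        """Return the page for `url`, contacting the site only the first time."""
        if url not in self._pages:
            self.misses += 1
            self._pages[url] = self._fetch(url)
        return self._pages[url]
```

With something like this, a second user asking for the same subject listing is served from memory, and the web site sees only one request.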
Of course, browsing for journals in an IRC channel is not the most convenient way, but my bot is a very convenient platform to integrate web-scraping/indexing/encoding-conversion code, and I suppose I could rewrite many of its functions into a desktop application – and a faster one too! I love Python, but fast it isn’t…