LiPyrary - Python for books: April 2009

Thursday, 16 April 2009

Ubuntu 9.04 (Jaunty) and Python

Last week I started testing Ubuntu 9.04. The update process ran smoothly as usual, but I had to switch from fglrx to radeonhd because the proprietary ATI driver seems to lack support for the graphics card of my Lenovo T60p. Well done, ATI ...

Good news

Ubuntu 9.04 ships with four versions of Python:

2.4.6 (Zope2 still requires Python 2.4)
2.5.4
2.6.2 (default)
3.0.1

You can't ask for more Python versions!

Bad news

Ubuntu's Python team hasn't included my multiprocessing backport for Python 2.4 and 2.5 in Jaunty although Sandro Tosi has created a Debian package. You still have to install it manually from PyPI.

Warning

Did you install Python 2.6 yourself before? Make sure you remove it ASAP!

At first I couldn't figure out why lots of Python-based applications were broken. It was a major issue for me because Ubuntu uses Python in lots of places. Then it occurred to me that it might be related to my previous installation of Python 2.6. Ubuntu 8.04 didn't use Python 2.6, but 9.04 uses 2.6 as the default Python for its apps, and /usr/local takes precedence over /usr. Once my own installation was gone, everything worked again.
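One way to see whether a self-built interpreter shadows the system one is to list every match on the search path in lookup order. This is a minimal sketch, assuming the conflict shows up as two `python2.6` executables on `PATH`; the helper name `find_all_on_path` is made up for this illustration, and it is a check rather than a diagnosis of what actually broke.

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable called `name` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        # Only count regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    # The first hit is the one the shell runs; later hits are shadowed.
    for index, hit in enumerate(find_all_on_path("python2.6")):
        marker = "used" if index == 0 else "shadowed"
        print("%-8s %s" % (marker, hit))
```

If /usr/local/bin comes before /usr/bin in `PATH`, a leftover /usr/local/bin/python2.6 would be reported as "used" and the distribution's interpreter as "shadowed".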

In order to wipe your own installation of Python from your hard disk you have to remove

/usr/local/bin/py*2.6
/usr/local/lib/libpython2.6.so*
/usr/local/lib/python2.6 (except for an empty site-packages directory)
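Before deleting anything it may help to see what the three patterns above actually match. The following is a sketch of a dry run: it only lists paths and removes nothing. The function name `find_leftovers` and its parameters are assumptions made for this example.

```python
import glob
import os


def find_leftovers(prefix="/usr/local", version="2.6"):
    """List paths matching the three patterns from the post, without deleting."""
    patterns = [
        os.path.join(prefix, "bin", "py*" + version),
        os.path.join(prefix, "lib", "libpython" + version + ".so*"),
        os.path.join(prefix, "lib", "python" + version),
    ]
    found = []
    for pattern in patterns:
        # Sort each group so the output is stable between runs.
        found.extend(sorted(glob.glob(pattern)))
    return found


if __name__ == "__main__":
    for path in find_leftovers():
        print(path)
```

Review the printed list by hand, and remember the exception noted above for an empty site-packages directory.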

Have fun!

Tuesday, 7 April 2009

work, booksnakes and a side dish of cherries

For more than a year I have been employed as a Python developer by a company called Smantics Kommunikationsmanagement GmbH. The company offers multiple services related to communication and its management in our modern world. Most of the time I'm working on the server part of a software stack called Visual Library.

I'm allowed to spend the rest of my time on open source projects -- up to a quarter of my work hours! Over the past weeks I've spent that open source time on several Python packages I've developed for my employer so far. In the next couple of weeks I will release several projects as open source on http://pypi.python.org/.

Before I start blogging about my work I'd like to give you an impression of what the work is all about. I hope it doesn't sound too much like an advertisement for the software and for my employer.

Abstract

The Visual Library software stack is used for the digitization of bibliographic entities. "Bibliographic entity" is a technical term that covers a variety of things, including but not limited to books, maps, magazines, newspapers, photographs, letters, records and charters. The software helps libraries model the entire process: importing catalogs and metadata, assembling work batches, assigning books to scanners, importing images, quality assurance, text recognition ... The process is much, much more than simply uploading a bunch of images. Really!

Metadata and open interfaces are very important in the world of libraries. Therefore the VLS provides various standardized interfaces and data exchange formats such as METS, MODS, SRU, OAI, Epicur, MarcXML, Dublin Core and URN (just to name a few). We also rely heavily on XML, open standards and open file formats to guarantee that the data can still be read fifty or more years from now.
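To give a feel for what such an exchange format looks like, here is a minimal sketch that builds and re-reads a tiny Dublin Core record wrapped in the `oai_dc` container used by OAI. It is not the VLS's actual output. It uses the standard library's ElementTree to stay dependency-free, the helper names are invented for this example, and the title and year are borrowed from the first sample item listed under Examples.

```python
import xml.etree.ElementTree as ET

# Published namespace URIs for Dublin Core elements and the OAI container.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def build_dc_record(title, date):
    """Build an <oai_dc:dc> element holding a dc:title and a dc:date."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    root = ET.Element("{%s}dc" % OAI_DC_NS)
    for tag, value in (("title", title), ("date", date)):
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, tag))
        child.text = value
    return root


def read_titles(root):
    """Return the text of every dc:title below `root`."""
    return [element.text for element in root.iter("{%s}title" % DC_NS)]


record = build_dc_record(
    "Churfürstl. Sächsisches Schreiben an dero Abgesandten in Nürenberg",
    "1649",
)
xml_bytes = ET.tostring(record, encoding="utf-8")
```

Because the record is plain namespaced XML, any XML parser can read it back, which is the point of relying on open formats.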

Fifty years doesn't sound like much when one deals with 500-year-old books. But can you still open the images you created on your C64 and stored on a 5 1/4 inch floppy disk twenty years ago? What about your ATARI's datasette tapes? Even NASA has trouble reading its old tapes because hardware is missing or the file format is undocumented ...

Our software is used in multiple installations across Germany and other German-speaking countries. The largest installation hosts about 9 TB of raw image data for more than half a million pages of more than ten thousand bibliographic entities from the 17th century. The material ranges from the 16th to the 21st century with a focus on old entities. Most of it is written in German, but there is also material in Latin, Greek, Hebrew, French and other languages. Two important projects are about Judaica (I wasn't able to find a correct translation; it roughly translates to Jewish material). The bibliographic entities originate from public libraries, usually from a university environment.

Visual Library Server

The heart of the Visual Library software stack is a Python-driven web application server. It's built on top of the CherryPy framework and backed by a Firebird database. The server uses a cornucopia of open source third-party packages as well as commercial and proprietary software. The most notable Python packages are lxml for XML and XSL(T), reportlab for PDF creation, Cython for optimization and library bindings, and PyLucene for full text search.

The software is yet another example of the power of Python. We wouldn't have been able to build such a large and complex system without Python. I'd like to thank the community for all the hard work and the feature-rich extensions, too.

Examples

Are you interested in more? Have a look ...

Churfürstl. Sächsisches Schreiben an dero Abgesandten in Nürenberg, 1649

Die Herzogthumer Iulich, Cleve, und Berg samt der Grafschafft Marck, und angrænzenten Herrschafften, about 1720

[zoom view of this map]

Thursday, 2 April 2009

autoconf'ing multiprocessing

A few weeks ago Deepak Rokade reported an issue with the processing package on Solaris. The posting caught my interest since processing is the ancestor of the multiprocessing package, and I'm the current maintainer of the backport to Python 2.4 and 2.5. I started a discussion about the problem on the python-dev mailing list. No immediate solution was found, but we decided to move from hard-coded configuration to an autoconf approach.

"I guess multiprocessing doesn't use autoconf tests for historical reasons. Its ancestor -- the pyprocessing package -- was using hard coded values, too." [quoting myself]

I did some experiments with autoconf but had no luck getting it right on BSDish platforms. I ran out of free time, too. Jesse was going to work on the matter at PyCon anyway, so I stopped pursuing the failing tests. It's no fun debugging this kind of problem through the build bots. I'm looking forward to getting access to the Snakebite network. It will make our work much easier.

Anyway, it turned out that FreeBSD and Darwin have known bugs such as a broken sem_getvalue() function. Jesse and Martin von Löwis have finished their combined work. I'll release a new version of the multiprocessing backport when the tests have passed on all relevant build bots.
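The difference between a hard-coded platform table and a feature test can be illustrated at the Python level. This is only an analogy, not the autoconf check that went into the build. The table and both function names are hypothetical. The probe relies on one known behaviour: on platforms where sem_getvalue() is flagged as broken (macOS, for instance), `get_value()` on a multiprocessing semaphore raises NotImplementedError.

```python
import sys

# Hard-coded approach: a hypothetical table of platforms assumed to be affected.
KNOWN_BROKEN_PLATFORMS = ("darwin", "freebsd")


def broken_by_table(platform=sys.platform):
    """Guess from the platform name alone; goes stale when platforms change."""
    return platform.startswith(KNOWN_BROKEN_PLATFORMS)


def broken_by_probe(sem):
    """Feature test: call get_value() and see whether it is supported."""
    try:
        sem.get_value()
    except NotImplementedError:
        return True
    return False
```

Calling `broken_by_probe(multiprocessing.Semaphore(1))` asks the running system directly, which is the same idea an autoconf test applies at build time.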

Good work, Jesse and Martin!
