Tuesday, April 7, 2009

work, booksnakes and a side dish of cherries

For more than a year I've been employed as a Python developer by a company called Smantics Kommunikationsmanagement GmbH. The company offers multiple services related to communication and its management in our modern world. Most of the time I work on the server part of a software stack called Visual Library.

The rest of the time I'm allowed to spend on Open Source projects -- up to a quarter of my work hours! Over the past weeks I've spent my open source time on several Python packages I've developed for my employer so far. In the next couple of weeks I will release several of them as Open Source on http://pypi.python.org/.

Before I start blogging about my work I'd like to give you an impression of what the work is all about. I hope it doesn't sound too much like an advertisement for the software and for my employer.


The Visual Library software stack is used for the digitization of bibliographic entities. "Bibliographic entity" is a technical term that covers a variety of things, including but not limited to books, maps, magazines, newspapers, photographs, letters, records, charters and many more. The software aids libraries in modeling the entire process. It starts with importing catalogs and metadata, assembling work batches, assigning books to scanners, importing images, quality assurance, text recognition ... The process is much, much more than simply uploading a bunch of images. Really!

Metadata and open interfaces are very important in the world of libraries. Therefore the VLS provides various standardized interfaces and data exchange formats like METS, MODS, SRU, OAI, Epicur, MarcXML, Dublin Core and URN (just to name a few). We also rely heavily on XML, open standards and open file formats to guarantee that the data can still be read fifty or more years from now.
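To give an idea of what such an open exchange format looks like, here is a minimal sketch of a Dublin Core record in the oai_dc wrapper used by OAI. It is not code from the VLS: it uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra packages, the helper names build_dc_record and read_dc_field are made up for this illustration, and the field values are invented sample data.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the Dublin Core element set and the OAI wrapper element.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Use the conventional prefixes when serializing.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def build_dc_record(fields):
    """Return an oai_dc:dc element with one dc:* child per (name, value) pair.

    Hypothetical helper for this example, not part of any library.
    """
    root = ET.Element("{%s}dc" % OAI_DC_NS)
    for name, value in fields:
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
        child.text = value
    return root


def read_dc_field(xml_text, name):
    """Parse a serialized record and return all values of one Dublin Core field."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("{%s}%s" % (DC_NS, name))]


# Invented sample data; Dublin Core fields may be repeated.
record = build_dc_record([
    ("title", "Example title"),
    ("language", "lat"),
    ("language", "ger"),
    ("date", "1650"),
])

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
print(read_dc_field(xml_text, "language"))
```

The point of the round trip is that the record is plain, namespaced XML: any XML parser can read it back without knowing anything about the software that wrote it.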

Fifty years doesn't sound like much when one deals with 500-year-old books. But can you still open the images you created on your C64 and stored on a 5 1/4 inch floppy disk twenty years ago? What about your ATARI's datasette tapes? Even NASA has issues reading their old tapes because hardware is missing or the file format is undocumented ...

Our software is used in multiple installations across Germany and German-speaking countries. The largest installation hosts about 9 TB of raw image data for more than half a million pages of more than ten thousand bibliographic entities from the 17th century. The material ranges from the 16th to the 21st century with a focus on old entities. Most of it is written in German, but there is also material in Latin, Greek, Hebrew, French and other languages. Two important projects are about Judaica (I wasn't able to find a correct translation; it roughly translates to Jewish material). The bibliographic entities originate from public libraries, usually from a university environment.

Visual Library Server

The heart of the Visual Library software stack is a Python-driven web application server. It's built on top of the CherryPy framework and backed by a Firebird database. The server utilizes a cornucopia of open source third-party packages as well as commercial and proprietary software. The most notable Python packages are lxml for XML and XSL(T), reportlab for PDF creation, Cython for optimization and library bindings, and PyLucene for full text search.

The software is yet another example of the power of Python. We wouldn't have been able to build such a large and complex system without Python. I'd also like to thank the community for all the hard work and the feature-rich extensions.


Are you interested in more? Have a look ...
