Thursday, 20 August 2009

Microsoft gives MSDN Premium subscription to PSF members

This almost went through unnoticed. Steve Holden (Python Software Foundation) has worked out a fantastic deal with Sam Ramji and Tom Hanrahan (both leading Open Source guys at Microsoft). Microsoft has given fourteen MSDN Premium subscriptions to Python core developers and PSF members. I'm one of the lucky few. [1], [2]

The premium subscription includes licenses and downloads for almost every Microsoft product from MS-DOS 6.22 to Windows 7, all versions of Visual Studio and much more. This is very useful for Python core developers. Every developer with a subscription can finally set up multiple virtual boxes with 32-bit and 64-bit versions of XP, Vista and Windows 7 to test and debug issues. 64-bit versions of Windows were hard and costly to come by.

I'll keep my Ubuntu boxes for daily work and I'll remain skeptical about Microsoft's open source policy. However, I'm glad that their attitude towards Open Source is moving in the right direction. Python (more precisely IronPython) is going to become more important to Microsoft. I'll put my subscription to good use.

Thanks to Sam, Tom and Steve!


Thursday, 13 August 2009

How to add a new module search path

Once in a while Python users ask how to add some directories to sys.path permanently. Usually a solution like the PYTHONPATH environment variable is suggested to the original poster. Other solutions require root privileges or modify the search path for all users. PEP 370 adds another way that is cleaner and easier to use. It doesn't require root privileges and it doesn't suffer from the other issues. PYTHONPATH causes trouble with multiple Python versions because it applies to all of them: C extensions only work with the version of Python they were compiled for, and most Python modules won't work on both Python 2 and 3.

My preferred way adds additional search paths just for one version of Python and just for me. It uses a .pth file as explained in the site module documentation. .pth files only work in site-packages directories, either the global or the user-specific one.

The Python way

$ python2.6
>>> import os
>>> import site
>>> site.USER_SITE


Create the directory if it doesn't exist yet

>>> if not os.path.isdir(site.USER_SITE):
...     os.makedirs(site.USER_SITE)

mypath.pth is going to contain my list of additional search paths

>>> mypth = os.path.join(site.USER_SITE, "mypath.pth")

>>> path_to_add = ["/home/heimes/modules", "/home/heimes/other_modules"]

Add the search paths line by line and make sure the file ends with a newline

>>> with open(mypth, "a") as f:
...     f.write("\n".join(path_to_add))
...     f.write("\n")
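
The interactive steps above can also be wrapped in a small helper. This is only a sketch: the function name add_search_paths and its parameters are made up for illustration, and the duplicate check is an addition that the session above does not perform.

```python
import os
import site


def add_search_paths(paths, pth_name="mypath.pth", site_dir=None):
    """Append each path to a .pth file and return the file's location.

    site_dir defaults to the per-user site-packages directory (PEP 370).
    Paths that are already listed in the file are not written again.
    """
    if site_dir is None:
        site_dir = site.USER_SITE
    if not os.path.isdir(site_dir):
        os.makedirs(site_dir)
    pth_file = os.path.join(site_dir, pth_name)

    # Read what is already there so repeated calls stay idempotent.
    content = ""
    if os.path.exists(pth_file):
        with open(pth_file) as f:
            content = f.read()
    existing = set(line.strip() for line in content.splitlines())

    with open(pth_file, "a") as f:
        # Guard against a previous last line without a trailing newline.
        if content and not content.endswith("\n"):
            f.write("\n")
        for path in paths:
            if path not in existing:
                f.write(path + "\n")
                existing.add(path)
    return pth_file
```

Calling add_search_paths(["/home/heimes/modules"]) would then do the same as the session above for a single directory.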

The bash way

$ python2.6 -m site --user-site
$ mkdir -p $(python2.6 -m site --user-site)
$ echo "/home/heimes/more_modules" >> $(python2.6 -m site --user-site)/mypath.pth

Let's check if it works

Check the .pth file

$ cat $(python2.6 -m site --user-site)/mypath.pth


Let's see if the modules are in the new search path ... they aren't because the directories don't exist yet.

$ python2.6 -m site
sys.path = [
USER_BASE: '/home/heimes/.local' (exists)
USER_SITE: '/home/heimes/.local/lib/python2.6/site-packages' (exists)

Create one example directory

$ mkdir /home/heimes/modules

$ python2.6 -m site
sys.path = [
USER_BASE: '/home/heimes/.local' (exists)
USER_SITE: '/home/heimes/.local/lib/python2.6/site-packages' (exists)
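
The same behaviour can be reproduced without touching the real user site directory. The sketch below uses throwaway temporary directories and site.addsitedir(), which processes .pth files the same way the interpreter does at startup. The helper name paths_added_by_pth and the file name demo.pth are made up for illustration.

```python
import os
import site
import sys
import tempfile


def paths_added_by_pth(entries):
    """Write entries to a throwaway .pth file and report what site adds."""
    site_dir = tempfile.mkdtemp()
    with open(os.path.join(site_dir, "demo.pth"), "w") as f:
        f.write("\n".join(entries) + "\n")

    before = list(sys.path)
    try:
        site.addsitedir(site_dir)
        # addsitedir also appends site_dir itself, so filter that out.
        return [p for p in sys.path
                if p not in before
                and os.path.abspath(p) != os.path.abspath(site_dir)]
    finally:
        sys.path[:] = before  # leave the interpreter's path untouched


existing = tempfile.mkdtemp()
missing = os.path.join(existing, "does-not-exist")
added = paths_added_by_pth([existing, missing])
print(added)  # only the directory that exists should be listed
```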

Easy, isn't it?

Wednesday, 12 August 2009

libxml2 crash on 64bit Ubuntu

I've spent the last couple of hours debugging a really strange segfault. Our application stack had a reproducible crash in libxml2 -- but only with self-compiled versions of libxml2. Ubuntu's 2.6.32 worked like a charm, my self-compiled 2.6.32 didn't. The very same version works on several other Debian, Redhat and SuSE boxes, 32 and 64bit, too. WTF!?

The crash always occurred in xmlIO.c:__xmlParserInputBufferCreateFilename() with xmlGzfileOpen() as open handler. After several gdb debugging sessions and several recompiles I noticed a suspicious message in the make output:

xmlIO.c: In function 'xmlGzfileOpen_real':
xmlIO.c:1132: warning: implicit declaration of function 'gzopen64'
xmlIO.c:1132: warning: nested extern declaration of 'gzopen64'
xmlIO.c:1132: warning: assignment makes pointer from integer without a cast
xmlIO.c: In function 'xmlGzfileOpenW':
xmlIO.c:1200: warning: assignment makes pointer from integer without a cast

The message only occurred during my own compiles but not during "apt-get source -b libxml2". Apparently Ubuntu has patched the sources to fix the issue. The changelog contains yet another hint:

* libxml.h: define _LARGEFILE64_SOURCE to properly get gzopen64 defines in zlib.h. Closes: #439843. Thanks Dann Frazier.

That's the solution to my problem! Run CFLAGS="-D_LARGEFILE64_SOURCE" ./configure and both the compiler warning and the crash are gone.
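
Why would a missing declaration crash only the 64-bit build? The post doesn't spell it out, so treat the following as an assumption: without a prototype, a C89 compiler assumes that gzopen64() returns int, which matches the "makes pointer from integer" warning. On a 64-bit system the returned pointer is then cut down to 32 bits. The sketch below models that truncation with plain integer arithmetic. The addresses are hypothetical and only chosen to show the effect.

```python
def call_via_implicit_int(pointer):
    """Model a 64-bit pointer returned through an undeclared C function.

    Only the low 32 bits survive the implicit int return type; the value
    is then sign-extended when it is assigned to a 64-bit pointer.
    """
    low = pointer & 0xFFFFFFFF
    if low & 0x80000000:             # top bit set: negative as a 32-bit int
        low -= 1 << 32
    return low & 0xFFFFFFFFFFFFFFFF  # widen back to a 64-bit pointer


# Hypothetical addresses: one below 4 GB, two typical 64-bit heap addresses.
for address in (0x0804A010, 0x00007F3A12345678, 0x00007F3A9ABCDEF0):
    mangled = call_via_implicit_int(address)
    status = "ok" if mangled == address else "corrupted"
    print("%#018x -> %#018x %s" % (address, mangled, status))
```

An address that fits into 32 bits passes through unchanged, which would explain why the 32-bit builds never showed the problem.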

Thursday, 30 July 2009

multiprocessing released

A new version of the multiprocessing backport to Python 2.4 and 2.5 has been released. It contains all fixes from the Python 2.6 branch. As usual, the release is available on PyPI as a tar.gz and as Windows installers for Python 2.4 and 2.5.

Friday, 3 July 2009

Python 3.0 is dead, long live Python 3.0

Now it's official. The developer team has decided against a Python 3.0.2 bugfix release [1]. Python 3.0 will not see another release and everybody should move to Python 3.1 as soon as possible. The 3.1 release is so much better than 3.0 and the list of incompatibilities is small. Have fun!

[1] Barry Warsaw, Python 3.0 (pinin' for the fjords)

Thursday, 16 April 2009

Ubuntu 9.04 (Jaunty) and Python

Last week I started testing Ubuntu 9.04. The upgrade ran smoothly as usual, but I had to switch from fglrx to radeonhd because the proprietary ATI driver seems to lack support for the graphics card of my Lenovo T60p. Well done, ATI ...

Good news

Ubuntu 9.04 ships with four versions of Python:
  • 2.4.6 (Zope2 still requires Python 2.4)
  • 2.5.4
  • 2.6.2 (default)
  • 3.0.1
You can't ask for more Python versions!

Bad news

Ubuntu's Python team hasn't included my multiprocessing backport for Python 2.4 and 2.5 in Jaunty although Sandro Tosi has created a Debian package. You still have to install it manually from PyPI.


Did you install Python 2.6 yourself before? Make sure you remove it ASAP!

At first I couldn't figure out why lots of Python-based applications were broken. It was a major issue for me because Ubuntu uses Python in lots of places. Then it occurred to me that it might be related to my previous installation of Python 2.6. Ubuntu 8.04 didn't use Python 2.6 but 9.04 uses 2.6 as the default Python for its apps. And /usr/local has precedence over /usr. Once my own installation was gone, everything worked again.

In order to wipe your own installation of Python from your hard disk you have to remove
  • /usr/local/bin/py*2.6
  • /usr/local/lib/*
  • /usr/local/lib/python2.6 (except for an empty site-packages directory)
Have fun!
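
To spot such a leftover before it bites, you can list every copy of an executable in lookup order. This is a minimal sketch that only looks at PATH, so it won't catch stale libraries under /usr/local/lib. The helper name find_on_path is made up for illustration.

```python
import os


def find_on_path(name, search_path=None):
    """Return every executable called name on the search path, in lookup order."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    matches = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


hits = find_on_path("python2.6")
if len(hits) > 1:
    # The first hit wins; everything after it is shadowed.
    print("%s shadows %s" % (hits[0], ", ".join(hits[1:])))
```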

Tuesday, 7 April 2009

work, booksnakes and a side dish of cherries

For more than a year I've been employed by a company called Smantics Kommunikationsmanagement GmbH as a Python developer. The company offers multiple services related to communication and its management in our modern world. Most of the time I'm working on the server part of a software stack called Visual Library.

The rest of the time I'm allowed to spend on Open Source projects -- up to a quarter of my work hours! In the past weeks I've spent my open source time allowance on several Python packages I've developed for my employer so far. In the next couple of weeks I will release several projects as Open Source on

Before I start blogging about my work I'd like to give you an impression of what the work is all about. I hope it doesn't sound too much like an advertisement for the software and for my employer.


The Visual Library software stack is used for the digitization process of bibliographic entities. Bibliographic entity is a technical term that covers a variety of things, including but not limited to books, maps, magazines, newspapers, photographs, letters, records, charters and many more. The software aids libraries in modeling the entire process. It starts with importing catalogs and metadata, assembling work batches, assigning books to scanners, importing images, quality assurance, text recognition ... The process is much, much more than simply uploading a bunch of images. Really!

Metadata and open interfaces are very important in the world of libraries. Therefore the VLS provides various standardized interfaces and data exchange formats like METS, MODS, SRU, OAI, Epicur, MarcXML, Dublin Core and URN (just to name a few). We also rely heavily upon XML, open standards and open file formats to guarantee that the data can still be read fifty years or more from now.

Fifty years doesn't sound like much when one deals with 500-year-old books. But can you still open the images you created on your C64 and stored on a 5 1/4 inch floppy disk twenty years ago? What about your ATARI's datasette tapes? Even NASA has issues reading their old tapes because hardware is missing or the file format is undocumented ...

Our software is used in multiple installations across Germany and German-speaking countries. The largest installation hosts about 9 TB of raw image data for more than half a million pages of more than ten thousand bibliographic entities from the 17th century. The material is from the 16th to 21st century with a focus on old entities. We have mostly German material written in German but also Latin, Greek, Hebrew, French and other languages. Two important projects are about Judaica (I wasn't able to find a correct translation, it roughly translates to Jewish material). The bibliographic entities originate from public libraries, usually from a university environment.

Visual Library Server

The heart of the Visual Library software stack is a Python-driven web application server. It's built on top of the CherryPy framework and backed by a Firebird database. The server utilizes a cornucopia of open source third-party packages as well as commercial and proprietary software. The most noticeable Python packages are lxml for XML and XSL(T), reportlab for PDF creation, Cython for optimization / library bindings and PyLucene for full text search.

The software is yet another example of the power of Python. We wouldn't have been able to build such a large and complex system without Python. I'd like to thank the community for all the hard work and feature-rich extensions, too.


Are you interested in more? Have a look ...

Thursday, 2 April 2009

autoconf'ing multiprocessing

A few weeks ago Deepak Rokade reported an issue with the processing package on Solaris. The posting caught my interest since processing is the ancestor of the multiprocessing package. I'm the current maintainer of the backport to Python 2.4 and 2.5. I started a discussion about the problem on the python-dev mailing list. No immediate solution was found but we decided to move from hard-coded configuration to an autoconf approach.
"I guess multiprocessing doesn't use autoconf tests for historical reasons. It's ancestor -- the pyprocessing package -- was using hard coded values, too." [quoting myself]
I did some experiments with autoconf but I had no luck in getting it right on BSDish platforms. I ran out of free time, too. Jesse was going to work on the matter at PyCon anyway so I stopped pursuing the failing tests. It's no fun debugging this kind of problem through the build bots. I'm looking forward to getting access to the snakebit network. It will make our work much easier.

Anyway, it turned out that FreeBSD and Darwin have known bugs such as a broken sem_getvalue() function. Jesse and Martin von Löwis have finished their combined work. I'll release a new version of the multiprocessing backport when the tests have passed on all relevant build bots.
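
The idea behind the move from hard-coded values to autoconf tests can be illustrated in a few lines of Python: instead of looking up the platform name in a table, run a small test for each feature and record the outcome. This is only an illustration of the principle, not how the real configure checks are written. All names in it are made up and the two check functions are stand-ins that simulate a platform.

```python
def detect_features(checks):
    """Run each named feature test and record whether it passed.

    A test passes when it returns a true value. It fails when it returns
    a false value or raises NotImplementedError or OSError.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except (NotImplementedError, OSError):
            results[name] = False
    return results


def sem_getvalue_works():
    # Stand-in for a real probe: simulates a platform without the call.
    raise NotImplementedError("sem_getvalue")


def sem_open_works():
    # Stand-in for a probe that succeeds.
    return True


features = detect_features({
    "sem_open": sem_open_works,
    "sem_getvalue": sem_getvalue_works,
})
print(features)
```

The advantage over a platform table is that a new or fixed platform is handled correctly without anybody having to update the table.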

Good work, Jesse and Martin!

Monday, 30 March 2009

I blog therefore I am (online)

Yesterday I decided to start my own blog. Why? Well, for starters blogging is no longer a hype but a well-established way to tell people about interesting stuff. I'm not the kind of person who follows hypes.

I've been working on lots of cool stuff related to Python and books over the past year. I'm planning to use this blog as a channel to tell you about various things related to my work at s<e>mantics (my employer) as well as my doings in and for the Python community.

My blog will focus on Python and my work on library software. Hence the pun on Py in the title of my blog.

And now for something completely different ...