Monday, April 16, 2007

If you ever make my mistakes!

Recently, I created a quick and dirty webapplication using Turbogears and SQLObject. During the initial phase, I had some problems with unicode strings, but figured out that I could use UnicodeCol in SQLObject, and append '?sqlobject_encoding=latin1' to the dburi setting for the turbogears project.

Later on, a new requirement for the project came up. Someone had to be able to download a file with infomation in an excel file. From before I had created excel files from mysql data in PHP and Perl using the SpreadSheet::Writer module (and the PHP equivalent). I just assumed this to be a walk in the park, recreating the same functionallity in my turbogears site. As you might expect (or even know), I was wrong. To this date, I can not find the same or a similar well-working module for Python. As an old (not that old really) Perl hacker, I have at least some amounts of the right lazyness. I can of course just reuse my old PHP solution, using some suitable rewrite rules in the apache2 config for the turbogears site. And this is where my problems begin. Getting things up and running is easy enough, we are only talking reading mysql data here, and figuring out the tables in mysql that the SQLObject based model is using is simple. But, and this was the hard part, I figured out that I had done something stupied. The mysql-tables is actually latin1, as I have told SQLObject in the dburi setting. But the texts inside these latin1 tables is encoded by Python in some unicode form, using latin1 characters. Which mean that when I read mysql data in PHP and feed into an excel file, I don't get the right thing. Using default values, I get garbage. Trying to set either latin1, iso-8859-1, or utf-8 as character set for the excel file also yields garbage.

Somehow, I have to translate the encoded unicode strings (by Python) into something that at least works in PHP, which in the next step can be put into an excel file and still be readable.

And then, by accident, I stumbled upon the solution. There is a function in PHP called utf8_decode. It's purpose is to convert a iso-8859-1 encoded unicode-string into single-byte iso-8859-1. Which happens to be exactly what I was looking for.

It took me a couple of hours of googling and experimenting before I ended up there. Now that I know the solution the whole thing is of course obvious. But if you make the same mistakes as me, you may like to read this, and save a few hours.

The right solution, of course, is to use utf-8 all the way. If the unicode data have been stored in unicode mysql tables as unicode, the whole thing should have worked right out of the box. So doing that may save you from the whole problem. Good luck!

2 comments:

Anonymous said...

If you need an excel module in python, you might be interested in:
http://sourceforge.net/projects/pyexcelerator/

Åsmund Ødegård said...

Thanks, I'll look into that!