Friday, March 16, 2007

Python, flickr, and Unicode

I found a script the other day for generating a wallpaper based on images from flickr. In its original form, the script expected one or more tags as arguments and fetched a number of random pictures from flickr that have these tags. I wanted to make it a bit more interesting, so I looked at the flickr API to see what is available, and discovered the getHotList method. I added this method to the Python wrapper for the flickr API that I am using (one of those listed under Python at the site), and changed the logic slightly to fetch the 3 most popular tags from the last day if no tags are given as arguments.
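The changed logic boils down to something like the sketch below. This is not the script's actual code: choose_tags, fetch_hot_tags and fake_hot_list are hypothetical names, and the hot list is a hard-coded stand-in with placeholder tags instead of a real call to flickr.

```python
def choose_tags(cli_tags, fetch_hot_tags, count=3):
    """Return the tags given as arguments, or the most popular ones if none were given."""
    if cli_tags:
        return list(cli_tags)
    # No tags given: fall back to the first `count` entries of the hot list.
    return list(fetch_hot_tags())[:count]


def fake_hot_list():
    # Hypothetical stand-in for the wrapper's getHotList call; placeholder data.
    return ["sunset", "cat", "beach", "snow"]


print(choose_tags([], fake_hot_list))
print(choose_tags(["paris"], fake_hot_list))
```

Passing the hot-list lookup in as a function keeps the fallback easy to try out without touching the network.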

And now comes the interesting part. getHotList gives you tags as unicode strings, looking like u'mus\xe9edelelys\xe9e' (one of the popular tags today). But when this tag is fed into flickr.photos.search to retrieve URLs for some images, an exception is thrown, as the method cannot use the tag in this form. More or less obviously, it needs to be URL-encoded, as we are communicating over HTTP. Reading documentation here and there, I figured out that unicode was expected, but in an ASCII-only form: the tag encoded as UTF-8, with every non-ASCII byte written as a percent escape. The tag mentioned here should look like 'mus%C3%A9edelelys%C3%A9e'. After quite a few minutes with Google, I figured out a possible solution:
tag = urllib.quote(tag.encode('utf-8'))

Looks kind of funny to me, but it works.
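To see why it works, the one-liner can be split into its two steps. The sketch below assumes Python 3, where urllib.quote lives at urllib.parse.quote and plain strings are already unicode; encode_tag is just a name picked for this example.

```python
from urllib.parse import quote  # Python 3 location of urllib.quote


def encode_tag(tag):
    """Turn a unicode tag into the ASCII-only, percent-encoded form."""
    # Step 1: unicode text -> UTF-8 bytes (the e-acute becomes the two bytes C3 A9).
    utf8_bytes = tag.encode("utf-8")
    # Step 2: every byte that is not URL-safe is written as %XX.
    return quote(utf8_bytes)


tag = "mus\xe9edelelys\xe9e"
print(encode_tag(tag))  # mus%C3%A9edelelys%C3%A9e
```

So the "funny" part is only that the unicode string has to become bytes before it can be escaped.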

Lesson learned: Unicode is not unicode is not unicode.
