Followup on "Python doctest and Unicode"

Submitted by Christoph on 21 July, 2009 - 12:05

I complained about Python doctest and Unicode some time ago. This was an itch I finally wanted to scratch, so I followed the popular saying: "Luke, read the source".

It turns out the error in question is fixed pretty easily: Python needs to properly encode the output, so a conversion to the output stream's encoding did the trick. But then a new issue came up.
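To illustrate the kind of conversion meant here, a minimal sketch follows. It is not the actual change made to doctest; the helper name encode_for_stream, the UTF-8 fallback and the "backslashreplace" error handler are assumptions chosen for the example.

```python
import io


def encode_for_stream(text, stream, fallback="utf-8"):
    """Encode text into the encoding the given output stream declares."""
    # Streams such as sys.stdout usually carry an `encoding` attribute.
    # It can be missing or None (e.g. when output is piped), hence the
    # assumed fallback.
    encoding = getattr(stream, "encoding", None) or fallback
    # "backslashreplace" keeps unencodable characters visible as escapes
    # instead of raising UnicodeEncodeError.
    return text.encode(encoding, "backslashreplace")


# Two in-memory streams standing in for terminals with different encodings.
latin1_out = io.TextIOWrapper(io.BytesIO(), encoding="latin-1")
ascii_out = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

print(repr(encode_for_stream(u"\xfc", latin1_out)))  # one byte for "ü"
print(repr(encode_for_stream(u"\xfc", ascii_out)))   # escaped, no exception
```

The point is only that the text is encoded for the stream it is written to, rather than with whatever default codec happens to apply.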

Python 2.x has two string classes, Unicode strings and byte strings. To tell them apart, Unicode strings are prefixed with a single u, e.g. u"ü". As doctest does a simple comparison of the textual output, this special treatment breaks a test if the expected output says "u" but the code generates u"u". While both are the same string, their textual representations differ. This is where my Python fu runs out on me.
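One conceivable workaround, sketched here under assumptions and not something doctest does by itself, is a custom output checker that drops the u prefix on both sides before comparing. doctest.OutputChecker and its check_output method are part of the standard library; the names strip_u_prefix and UnicodePrefixChecker are made up for this example.

```python
import doctest
import re

# A u/U directly in front of a quote, unless it is the tail of an
# identifier such as "menu". This is a heuristic: a "u" that stands
# right before a quote inside a string's content is stripped as well.
_U_PREFIX = re.compile(r"(?<![A-Za-z0-9_])[uU](?=['\"])")


def strip_u_prefix(text):
    """Remove Python 2 style Unicode prefixes from printed output."""
    return _U_PREFIX.sub("", text)


class UnicodePrefixChecker(doctest.OutputChecker):
    """Compare expected and actual output while ignoring u prefixes."""

    def check_output(self, want, got, optionflags):
        return doctest.OutputChecker.check_output(
            self, strip_u_prefix(want), strip_u_prefix(got), optionflags)


checker = UnicodePrefixChecker()
print(checker.check_output("u'u'\n", "'u'\n", 0))  # prefix ignored: True
print(checker.check_output("'a'\n", "'b'\n", 0))   # real mismatch: False
```

Such a checker can be handed to doctest.DocTestRunner through its checker argument. The price is that a test can no longer tell whether the code returned a Unicode string or a byte string.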

To add another Unicode issue to the list: titlecase doesn't follow the Unicode guidelines, see http://bugs.python.org/issue6412.
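A well-known instance of this kind of behaviour is that str.title() treats an apostrophe as a word boundary. Whether this is the exact case reported in the issue is an assumption on my part. The regex helper below is adapted from the workaround in the Python documentation for str.title(); it only handles ASCII letters, so it is no fix for the general Unicode case.

```python
import re


def titlecase(s):
    """Capitalise words, keeping an apostrophe suffix like "'re" lowercase."""
    return re.sub(
        r"[A-Za-z]+('[A-Za-z]+)?",
        lambda mo: mo.group(0)[0].upper() + mo.group(0)[1:].lower(),
        s,
    )


print("they're".title())     # the letter after the apostrophe is capitalised
print(titlecase("they're"))  # the apostrophe stays inside the word
```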

Stay tuned for more Unicode woes :(
