Followup on "Python doctest and Unicode"
I complained about Python doctest and Unicode some time ago. This was an itch I finally wanted to scratch, so I followed the popular saying: "Luke, read the source".
It turns out the error in question is easy to fix. The output has to be encoded properly, so converting it to the output stream's encoding did the trick. However, a new issue has come up.
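The fix comes down to encoding Unicode text with whatever encoding the target stream declares. Below is a minimal sketch of that idea, written for Python 3. It is not the actual doctest patch. The helper name encode_for_stream, the ASCII fallback for streams that declare no encoding, and the backslashreplace error handler are all assumptions made for illustration.

```python
import io


def encode_for_stream(text, stream, errors="backslashreplace"):
    """Encode text using the encoding declared by stream (hypothetical helper).

    Streams that declare no encoding (e.g. io.StringIO, whose .encoding is
    None) fall back to ASCII. Characters the target encoding cannot represent
    are written as backslash escapes instead of raising UnicodeEncodeError.
    """
    encoding = getattr(stream, "encoding", None) or "ascii"
    return text.encode(encoding, errors)


latin1_stream = io.TextIOWrapper(io.BytesIO(), encoding="latin-1")
print(encode_for_stream("ü", latin1_stream))  # b'\xfc'
print(encode_for_stream("ü", io.StringIO()))  # b'\\xfc'
```

Using backslashreplace means an unencodable character shows up as an escape sequence in the test output rather than aborting the whole doctest run.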
Python 2.x has two string types: Unicode strings and byte strings. To tell them apart, Unicode string literals are prefixed with a single u, e.g. u"ü". Because doctest does a plain textual comparison of the output, this prefix breaks a test if the expected output says "u" but the code produces u"u". Although both denote the same string, their textual representations differ. This is where my Python-fu runs out on me.
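The mismatch can be reproduced with doctest's own OutputChecker, which compares expected and actual output as text. The sketch runs on Python 3, which still accepts the u prefix in literals, so the Python 2 style output u'u' can be fed in as a plain string. LiteralEqualChecker is a hypothetical workaround, not part of doctest. When the textual comparison fails, it parses both sides with ast.literal_eval and compares the resulting values.

```python
import ast
import doctest


class LiteralEqualChecker(doctest.OutputChecker):
    """Hypothetical checker: also accept output whose literal value matches."""

    def check_output(self, want, got, optionflags):
        # First try doctest's normal, purely textual comparison.
        if super().check_output(want, got, optionflags):
            return True
        # Fall back to comparing values when both sides are Python literals,
        # so 'u' and u'u' count as equal despite the different text.
        try:
            return ast.literal_eval(want) == ast.literal_eval(got)
        except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError):
            # At least one side is not a literal (e.g. plain printed text).
            return False


want = "'u'\n"  # expected output written in the docstring
got = "u'u'\n"  # what Python 2 prints for the Unicode string u"u"

print(doctest.OutputChecker().check_output(want, got, 0))  # False
print(LiteralEqualChecker().check_output(want, got, 0))  # True
```

The value comparison is deliberately loose: it would also accept 1.0 where 1 was expected, so it trades some strictness for tolerance of the u prefix. To use it, pass an instance as the checker argument of doctest.DocTestRunner.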
To add another Unicode issue to the list: titlecasing doesn't follow the Unicode guidelines (see http://bugs.python.org/issue6412).
Stay tuned for more Unicode woes :(