Python Unicode rant

If you are at the beginning of a new project, are still deciding which programming language to use, and know you will definitely need Unicode support, then please consider a language other than Python.

Don't use Python. Really, only use it if you have no other choice.

Everything Python does internally needs some kind of encoding. Even if you read from a command line set to UTF-8 and your whole system is UTF-8, you still need to decode your input from UTF-8 yourself (unless it's pure ASCII).
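A minimal sketch of that decoding step. The bytes literal is only a stand-in for what would arrive from a UTF-8 command line, and decode_input is a hypothetical helper name, not something from the standard library.

```python
def decode_input(raw, encoding="utf-8"):
    """Turn an encoded byte string into a Unicode string."""
    return raw.decode(encoding)

# "café" as it arrives from a UTF-8 terminal: the é takes two bytes.
raw = b"caf\xc3\xa9"
text = decode_input(raw)

raw_length = len(raw)    # counts bytes
text_length = len(text)  # counts characters
```

Until the input is decoded, len() counts bytes rather than characters, which is exactly the kind of surprise that shows up as soon as the input is not ASCII.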

It doesn't wrap everything in Unicode (unlike Java): the basic string type is str, and Unicode strings are separate objects of type unicode. So str("hello") and unicode("hello") are objects of two different types.
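To make the two string types concrete, here is a small sketch. It uses the b"..." and u"..." literal prefixes instead of calling str() and unicode(), on the assumption that this makes the difference easier to see; the variable names are made up for illustration.

```python
byte_string = b"hello"   # a byte string, which is what Python 2 calls str
text_string = u"hello"   # a Unicode string, which is what Python 2 calls unicode

# The two objects have different types, even though they look alike.
different_types = type(byte_string) is not type(text_string)

# Only after an explicit decode do you get a real Unicode string.
decoded = byte_string.decode("ascii")
same_after_decode = decoded == text_string
```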

Furthermore, you will stumble upon a lot of other pitfalls: e.g. htmlentitydefs has a table for mapping "entity definitions to their replacement text in ISO Latin-1". Oops, there it is again: Latin-1. You can't just use it; you have to decode it first.
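A rough sketch of the decoding that such a Latin-1 table entry needs. The byte string below is a hand-written stand-in for the entry of the "eacute" entity, assuming it is stored as the single Latin-1 byte 0xE9; the variable names are made up.

```python
# Stand-in for the table entry of "eacute": one byte in ISO Latin-1.
latin1_value = b"\xe9"

# Decode it to get a real Unicode string ...
text = latin1_value.decode("latin-1")

# ... and encode again if the rest of the system speaks UTF-8.
utf8_value = text.encode("utf-8")
```

The same character is one byte in Latin-1 and two bytes in UTF-8, so passing the table entry along undecoded would produce broken output on a UTF-8 system.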

Or take the csv module: it will tell you that you can't read Unicode strings that easily and that you need to build an extra wrapper first.
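Such a wrapper could look roughly like the sketch below. It assumes the input is a sequence of UTF-8 encoded lines; unicode_csv_reader is a hypothetical helper and PY2 a made-up flag, neither is part of the csv module. The second branch is only there so the sketch also runs on interpreters whose csv module accepts text directly.

```python
import csv
import sys

PY2 = sys.version_info[0] == 2

def unicode_csv_reader(byte_lines, encoding="utf-8", **kwargs):
    """Hypothetical wrapper: yield rows of Unicode strings from encoded lines."""
    if PY2:
        # Here csv only handles byte strings: parse first, then decode each cell.
        for row in csv.reader(byte_lines, **kwargs):
            yield [cell.decode(encoding) for cell in row]
    else:
        # Here csv parses text: decode each line before handing it over.
        decoded_lines = (line.decode(encoding) for line in byte_lines)
        for row in csv.reader(decoded_lines, **kwargs):
            yield row

# Two UTF-8 encoded lines standing in for the contents of a file.
sample = [b"name,city\r\n", b"Ren\xc3\xa9,Z\xc3\xbcrich\r\n"]
rows = list(unicode_csv_reader(sample))
```

Parsing the raw UTF-8 bytes works because the comma and the line endings are plain ASCII, so the multi-byte characters inside the cells pass through untouched until they are decoded.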

The dream of using Unicode all over the world is still far away.

At least for Python...

Second thought

After reading http://blog.ianbicking.org/why-python-unicode-sucks.html it seems to me that a) I'm not the only one unhappy with the difficulties regarding Unicode in Python and b) rival Ruby seems to be worse. Maybe I'm spoilt by Java.