Unicode finally takes the lead...

Submitted by Christoph on 31 March, 2008 - 19:12

...at least for Python 3.0.

I still hope that in 10 years we will only be able to wonder at sentences like the following:
Some languages use special characters (Chinese, Japanese, Arabic, Klingon, etc.) that are difficult to handle with traditional software.
Although there are standards for using and displaying them, these standards are not widely used, and make life a lot more complicated than necessary. Since virtually all software (and even hardware) is made to be used with the Roman alphabet (possibly with minor language-dependent modifications) [...] (quoted from [1])

For Python this seems to become reality soon:

What is quite an old fact (from 2007) is new to me: Python 3.0, which is currently available as an alpha (?), will merge the str and unicode objects, and the stupid coexistence of two string classes will finally be overcome.

Quoting [2]:

There is only one string type; its name is str but its behavior and implementation are like unicode in 2.x.
Yay!
PEP 3137: There is a new type, bytes, to represent binary data (and encoded text, which is treated as binary data until you decide to decode it). The str and bytes types cannot be mixed; you must always explicitly convert between them, using the str.encode() (str -> bytes) or bytes.decode() (bytes -> str) methods.
Sounds like Java!
PEP 3120: UTF-8 default source encoding.
No stupid warnings anymore.
PEP 3131: Non-ASCII identifiers. [...]
Nice.
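
To make the quoted points a bit more concrete, here is a small sketch of how I understand them, assuming a Python 3 interpreter and a source file saved as UTF-8. The sample string and the variable names are my own picks for illustration, not taken from the release notes.

```python
# Assumes Python 3 and a UTF-8 encoded source file (the PEP 3120 default).

text = "日本語"                      # str: a sequence of Unicode code points
data = text.encode("utf-8")         # str -> bytes, always explicit
round_trip = data.decode("utf-8")   # bytes -> str, always explicit

print(type(text).__name__, len(text))   # str, counted in characters
print(type(data).__name__, len(data))   # bytes, counted in bytes

# str and bytes cannot be mixed: concatenating them raises TypeError
# instead of silently guessing an encoding.
try:
    text + data
    mixed = True
except TypeError:
    mixed = False

# PEP 3131: non-ASCII identifiers are allowed.
# The name below is a hypothetical example ("character count").
文字数 = len(text)
```

The three characters become nine bytes here, since each of them takes three bytes in UTF-8, and the failed concatenation is exactly the kind of explicitness the quote above promises.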

Feels like Christmas (replace that with your preferred holiday).

P.S.: Still hoping that the bug report I filed yesterday will be accepted as such. It's not a feature to me!

Update: To understand my happiness, consider reading my Python Unicode rant.
