August 2009

Locale magic (literally)

Another programming-centric post, and a follow-up to Thursday's post about locale issues with Turkish.

So, I showed some problems with locale-dependent mappings using the case of Turkish, which maps the small Latin letter i to the uppercase İ, which also has a dot on top. Now join me for some more magic. On Unix you need to have the proper locale generated, which on Debian is done with dpkg-reconfigure locales.
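To make the Turkish mapping concrete, here is a minimal sketch in Python 3. It assumes Python 3, where str.upper() does not consult the locale at all, so the Turkish rules have to be applied by hand. The names TURKISH_UPPER and turkish_upper are made up for this illustration and are not part of any library.

```python
# Hypothetical helper for illustration only: Python 3's str.upper() is
# locale-independent, so the two Turkish-specific mappings are applied first.
TURKISH_UPPER = str.maketrans({
    "i": "\u0130",   # dotted small i    -> İ (capital I with dot above)
    "\u0131": "I",   # dotless small ı   -> plain capital I
})


def turkish_upper(text):
    """Uppercase text following the Turkish dotted/dotless i rules."""
    return text.translate(TURKISH_UPPER).upper()


print("istanbul".upper())         # default mapping, loses the dot
print(turkish_upper("istanbul"))  # Turkish mapping, keeps the dot
```

The sketch sidesteps the system locale entirely, so it behaves the same whether or not a Turkish locale has been generated on the machine.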

Python, Unicode and the digital divide

One could say that Unicode is the reflection of globalization in computing. As a computer scientist, I find that this huge project very much holds my attention and fascinates me on a daily basis. And Unicode is not just a feature, it is a foundation that bridges languages and cultures in the digital world.

Wrong Diacritics with Pinyin

Dear lazyweb,

I am looking for examples of bad Pinyin where invalid diacritics are being used.

I have two already for the third tone, but I am curious if other tones also see similar errors.

  1. Xiàndài Hànyû Dàcídiân (Circumflex)
  2. Wŏ huì shuō yìdiănr (Breve)
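Errors like the two above can be spotted mechanically. The following is a rough sketch, not a full Pinyin validator: it decomposes the text with NFD and reports every combining mark outside the set I assume to be expected in Pinyin (macron, acute, caron and grave for the four tones, plus the diaeresis of ü). The names EXPECTED_MARKS and unexpected_marks are invented for this example. Note that the rare Pinyin letter ê would be flagged by this sketch too.

```python
import unicodedata

# Assumption for this sketch: the combining marks expected in Hanyu Pinyin are
# macron, acute, caron, grave (tones 1-4) and the diaeresis used in ü.
EXPECTED_MARKS = {"\u0304", "\u0301", "\u030C", "\u0300", "\u0308"}


def unexpected_marks(text):
    """Return the sorted Unicode names of combining marks not expected in Pinyin."""
    decomposed = unicodedata.normalize("NFD", text)
    return sorted({
        unicodedata.name(ch)
        for ch in decomposed
        if unicodedata.combining(ch) and ch not in EXPECTED_MARKS
    })


print(unexpected_marks("Xiàndài Hànyû Dàcídiân"))
print(unexpected_marks("Wŏ huì shuō yìdiănr"))
```

Run over a larger body of text, a check like this would also show whether tones other than the third suffer from similar mix-ups.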

Colloquial designations of Kangxi radicals

Sorting and indexing English words, or those of other languages written in the Roman alphabet, is pretty easy, as letters are ordered from A to Z. Chinese characters are much more difficult to handle, as setting up a distinct order for each and every character fails due to the sheer number of characters; there is not even a distinguishable upper limit.

Radical 53

http://commons.wikimedia.org/wiki/Image:Radical053.png by Immanuel Giel, under a Creative Commons license