Unicode related

Locale magic (literally)

Another programming-centric post and follow-up on Thursday's post about locale issues with Turkish.

So, I showed some Problems with locale-dependant mappings using the case of Turkish, that mapps small Latin character i to uppercase İ, which also has a dot on top. Now join me on some more magic. On Unix you need to have the proper locale generated, which under debian works with dpkg-reconfigure locales.

Wrong Diacritics with Pinyin

Dear lazyweb,

I am looking for examples of bad Pinyin where invalid diacritics are being used.

I have two already for the third tone, but I am curious if other tones also see similar errors.

  1. Xiàndài Hànyû Dàcídiân (Circumflex)
  2. Wŏ huì shuō yìdiănr (Breve)

Followup on "Python doctest and Unicode"

I complained about Python doctest and Unicode some time ago. This was an itch I finally wanted to scratch, so I followed the popular saying: "Luke, read the source".

Turns out the error in question is fixed pretty easily. Python needs to properly encode the output, so a conversion to the output stream's encoding did the trick. Now a new issue came up.

Write upside down using Unicode IPA characters

¡ǝpoɔıun oʇ sʞuɐɥʇ uʍop ǝpısdn ǝʇıɹʍ uɐɔ noʎ 'ʇou ɹo ʇı ǝʌǝılǝq.

This is nothing new (see [1] or [2]), but I wrote a small Python program for this to pipe my instant messaging through it. Needless to say that with KDE's Kopete I found yet another application that seems not to work with non-ASCII characters.

Syndicate content