Language Log on Uyghur

It's been a few days, but Language Log has a nice short article on Uyghur proper nouns together with their Mandarin forms: A Little Primer of Xinjiang Proper Nouns.

Bo, po and duo, tuo

Pinyin doesn't always specify finals in a straightforward way. For example, it is difficult to see that wei and dui have the same final -uei, the former writing the semi-vowel w in place of u, the latter omitting the e. In Views on initials and finals of Mandarin in Pinyin I've tried to show those peculiarities by grouping forms under their actual final, not just their spelling. What I didn't do was merge the columns -o (bo, po, mo, fo) and -uo (duo, tuo, ...).

But as Y.R. Chao indicates, "there is a special form for labial initials, namely, 波 bo, 坡 po, 摸 mho, 脖 bor, 婆 por, 摩 mo, 佛 for, etc. This is only graphical, and the actual pronunciation is still, buo, puo, and so on." (A Grammar of Spoken Chinese, 1968, p. 30). He adds: "After the labials: b, p, m, f, this final is written o, but there is still a trace of u before and also an unrounding at the end." (Mandarin Primer, 1948, p. 24). Though said about Gwoyeu Romatzyh, this translates directly to Pinyin.

This is interesting, as Hànyǔ Pǔtōnghuà Yǔyīn Biànzhèng (汉语普通话语音辨正, 2003, ISBN 7-5619-0622-6) states: "o这个字母只代表一个元音:[o]" and "[o]后、半高、圆唇元音。例如“波、摸、佛、磨”等字的韵母。" In English: "The letter o represents only one vowel: [o]" and "[o]: back, half-high, rounded vowel. For example the final of the characters 'bo, mo, fo, mo' and others." And then, in contrast: "uo[uo]例如“我、活、多、说”等字的韵母。" In English: "uo [uo]: for example the final of the characters wo, huo, duo, shuo and others." So it gives different IPA values for the two finals.

In addition, Xiàndài Hànyǔ Cídiǎn (现代汉语词典) has an entry lo, with the lone character 咯 (also spelled luo), next to the entry luo, thereby separating spellings which for Chao would be the same.
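The spelling rules touched on in this post can be sketched in a few lines of Python. This is only a rough illustration for toneless syllables, not a complete Pinyin parser: the names `split_syllable`, `underlying_final` and the `merge_labial_o` switch are made up for this example, and the switch simply encodes Chao's reading that bo, po, mo, fo carry the final -uo.

```python
# Toneless Pinyin syllables only; a simplified sketch, not a full parser.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")
# Finals that are written in contracted form after an initial.
CONTRACTED = {"iu": "iou", "ui": "uei", "un": "uen"}
LABIALS = {"b", "p", "m", "f"}


def split_syllable(syllable):
    """Return (initial, underlying final) for a toneless Pinyin syllable."""
    if syllable.startswith("w"):
        # w stands in for u (wei -> uei); wu is just u.
        rest = syllable[1:]
        return "", rest if rest.startswith("u") else "u" + rest
    if syllable.startswith("y"):
        rest = syllable[1:]
        if rest.startswith("u"):
            # yu, yue, yuan, yun spell the ü-finals.
            return "", "ü" + rest[1:]
        # y stands in for i (you -> iou); yi, yin, ying keep their i.
        return "", rest if rest.startswith("i") else "i" + rest
    for initial in INITIALS:
        if syllable.startswith(initial):
            final = syllable[len(initial):]
            if initial in ("j", "q", "x") and final.startswith("u"):
                # After j, q, x the letter u is a spelling of ü.
                return initial, "ü" + final[1:]
            return initial, CONTRACTED.get(final, final)
    return "", syllable


def underlying_final(syllable, merge_labial_o=False):
    """Return the underlying final; optionally treat labial + o as uo."""
    initial, final = split_syllable(syllable)
    if merge_labial_o and initial in LABIALS and final == "o":
        return "uo"
    return final


for s in ("wei", "dui", "bo", "duo"):
    print(s, underlying_final(s), underlying_final(s, merge_labial_o=True))
```

With the switch off, bo and duo end up in different columns, as in the textbook quoted above; with it on, they share one, as Chao would have it.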

Followup on "Python doctest and Unicode"

I complained about Python doctest and Unicode some time ago. This was an itch I finally wanted to scratch, so I followed the popular saying: "Luke, read the source".

It turns out the error in question is fixed pretty easily: Python needs to properly encode the output, so a conversion to the output stream's encoding did the trick. But then a new issue came up.

Python 2.x has two string classes, Unicode strings and byte strings. To tell them apart, Unicode string literals are prefixed with a single u, e.g. u"ü". As doctest does a simple comparison of the textual output, the test breaks if the expected output says "u" but the code generates u"u". While both are the same string, their textual representations differ. This is where my Python fu runs out on me.
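Both points can be illustrated with a small sketch. It is written for Python 3, where the u prefix no longer appears in the repr, so the two representations are passed in as plain text. The helper names `encode_for_stream` and `outputs_match` are made up for this example; the first only shows what a conversion to a stream's encoding might look like and is not the actual change to doctest, while the second uses doctest's own `OutputChecker` to show that the comparison is purely textual.

```python
import doctest
import io


def encode_for_stream(text, stream):
    """Encode text with the stream's encoding (assumed fallback: UTF-8)."""
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # Characters the encoding cannot express become backslash escapes.
    return text.encode(encoding, errors="backslashreplace")


def outputs_match(want, got):
    """Compare expected and actual output the way doctest does by default."""
    return doctest.OutputChecker().check_output(want, got, 0)


latin1_stream = io.TextIOWrapper(io.BytesIO(), encoding="latin-1")
print(encode_for_stream("ü", latin1_stream))

# Identical text matches; the same string with a different repr does not.
print(outputs_match("u'u'\n", "u'u'\n"))
print(outputs_match("'u'\n", "u'u'\n"))
```

The last call returns False although both lines describe the same string, which is exactly the kind of mismatch described above.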

To add another Unicode issue to the list: titlecase doesn't follow the Unicode guidelines (http://bugs.python.org/issue6412).

Stay tuned for more Unicode woes :(

A survey on German learners of Chinese

We did a short survey among German beginners of Mandarin in which we asked 30 people what problems they face, what they use for learning, and what they think is missing. Most of the 30 are students and none of them has a family background in China. On average they had already studied Chinese for 7 months, learning on 1.8 days a week.

It is interesting to see that for most of them gaining a special qualification for a later job is of some importance, and at least 19 have interest in Chinese history, culture or philosophy.

Though only 12 can actually write Chinese on their computers, 20 frequently use an online dictionary. The sites named are, in order of frequency, LEO Chinesisch-Deutsches Wörterbuch, Chinesisch-Deutsches Wörterbuch HanDeDict and MDBG Chinese-English dictionary.

Of those asked, 7 have regular contact with people from China, but nobody has a language partner. Only 3 are planning to do an internship or language course soon, but 8 had already been to China before they started their language course.

The evaluation is still under way.

Stepchildren of Pinyin

Pinyin, or in full Hanyu Pinyin, is the standard Romanisation for Mandarin. It is widely used, and only in a few cases do older systems like Wade-Giles or Bopomofo prevail. So as Pinyin is ubiquitous you think you've seen it all? Do you know the characters ê, ẑ, ĉ, ŝ and ŋ? I didn't when I came across them a while ago; I believed them to be some kind of ad-hoc invention or an imprecise form used by an uninformed author. Far from it. ISO 7098, the ISO form of Pinyin, lists ê (only as a final, in its single form still rendered e, but nevertheless) and others do, too. It's actually an interjection, and some sources use it for 欸.

ẑ, ĉ, ŝ and ŋ have a different use case, though. They seem to be a relic of the Pinyin draft scheme from 1956. There the initials zh, ch, sh and the final element ng are each expressed using only one letter, the latter as ŋ (there it is) and the other three as ᶎ (U+1D8E, LATIN SMALL LETTER Z WITH PALATAL HOOK; or this: ȥ U+0225, LATIN SMALL LETTER Z WITH HOOK), ɕ (U+0255, LATIN SMALL LETTER C WITH CURL) and ᶊ (U+1D8A, LATIN SMALL LETTER S WITH PALATAL HOOK). Those characters, hard to find on any keyboard, were probably substituted with their -h counterparts later, still allowing a short form using the circumflex: Nǐ ẑīdao mā?

Thinking about that, you could believe that the ingenious inventors only anticipated the Taiwanese accent, making it easy to omit any retroflex pronunciation.
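For anyone who wants to play with these letters, here is a small Python sketch. `describe` looks up the official Unicode name of a character, which is handy for checking the code points quoted above, and `to_short_form` rewrites zh, ch, sh into the circumflex short form. Both function names are invented for this example. The final ng is deliberately left alone, on the assumption that a plain search and replace cannot tell a real final -ng from an n followed by a g in the next syllable (as in dàngāo).

```python
import re
import unicodedata

# Code points named in the text, plus eng (U+014B) for the final "ng".
DRAFT_LETTERS = ["\u1d8e", "\u0225", "\u0255", "\u1d8a", "\u014b"]

# Circumflex short forms for zh, ch, sh (lower case and capitalised).
SHORT_FORMS = {
    "zh": "\u1e91", "ch": "\u0109", "sh": "\u015d",
    "Zh": "\u1e90", "Ch": "\u0108", "Sh": "\u015c",
}


def describe(char):
    """Return the code point and official Unicode name of one character."""
    return "U+%04X %s" % (ord(char), unicodedata.name(char))


def to_short_form(text):
    """Replace the digraphs zh, ch, sh with z, c, s plus circumflex."""
    return re.sub(r"[ZzCcSs]h", lambda m: SHORT_FORMS[m.group(0)], text)


for letter in DRAFT_LETTERS:
    print(letter, describe(letter))

print(to_short_form("Nǐ zhīdao mā?"))
```

Replacing the digraphs blindly is safe here because no Mandarin syllable ends in z, c or s, so an h following one of them always belongs to the same initial.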
