Cantonese sound changes

Similar to the post on non-standard Mandarin (Taipei), here is a list of initial/final pairs showing changes in the pronunciation of Cantonese (using the Yale romanisation). It is taken from:
Cantonese: A Comprehensive Grammar (Stephen Matthews, Virginia Yip, Routledge, 1994, ISBN 0-415-08945-X.)

As the book mentions, the following changes vary between social groups and age groups, among other factors. Character examples are taken from CantoDict, as the book only gives Cantonese Yale transcriptions together with the English translation. The greater-than sign used here doesn't necessarily indicate the direction of the change, as hypercorrection can reverse the original change.

Initials:

  • n > l

    This is a very advanced sound change (first noted in 1941), as for example 諗 nám becoming lám.

  • gwo > go, kwo > ko

    Examples are 國 gwok changing to gok and 狂 kwòhng becoming kòhng.

  • k > h

    For the pronoun 佢 kéuih the initial k is changed, resulting in héuih for younger speakers.

  • ng > Ø

    Younger speakers drop the initial ng, resulting in e.g. óh for 我 ngóh. As a further result, forms with and without the initial ng are used interchangeably.

Finals:

  • k > t

    Example: 百 baak changing to baat.

  • ng > n

    Final ng changes to n especially after the long vowel aa, and less often after short a and other vowels, e.g. 生 sāang becoming sāan.

  • Ø + ng > Ø + m

    For example 五 ńgh changes to ḿh. As the authors state, this might happen mostly for people not using the initial ng (see ng > Ø above).
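
The sound changes listed above can be read as rewrite rules on Yale syllables. The following is only a rough sketch of that idea, not something taken from the book: the rule names, the RULES table and the apply_rule function are made up for illustration, the patterns only cover the examples given here, and the k > h change is left out because it is only described for the single pronoun 佢.

```python
import re
import unicodedata

# Optional combining tone mark (grave, acute, macron) in decomposed Yale.
TONE = "[\u0300\u0301\u0304]?"

# Hypothetical rule table: name -> (regex on the NFD-decomposed syllable,
# replacement). The patterns are assumptions covering the examples above.
RULES = {
    "n>l":      (r"^n(?=[aeiou])", "l"),           # initial n > l
    "gwo>go":   (r"^([gk])wo", r"\1o"),            # gwo > go, kwo > ko
    "ng>0":     (r"^ng(?=[aeiou])", ""),           # initial ng dropped
    "k>t":      (r"k$", "t"),                      # final k > t
    "ng>n":     (r"(?<=.)ng$", "n"),               # final ng > n
    "syl-ng>m": (rf"^n({TONE})g(h?)$", r"m\1\2"),  # syllabic ng > m
}


def apply_rule(name, syllable):
    """Apply one named sound change to a single Yale syllable."""
    pattern, replacement = RULES[name]
    # Decompose so that tone marks become separate combining characters.
    decomposed = unicodedata.normalize("NFD", syllable)
    changed = re.sub(pattern, replacement, decomposed)
    return unicodedata.normalize("NFC", changed)


examples = [("n>l", "nám"), ("gwo>go", "gwok"), ("gwo>go", "kwòhng"),
            ("ng>0", "ngóh"), ("k>t", "baak"), ("ng>n", "sāang"),
            ("syl-ng>m", "ńgh")]
for name, syllable in examples:
    print(name, syllable, "->", apply_rule(name, syllable))
```

Each rule is applied on its own, since the changes are independent of each other and, as noted above, may also run in the opposite direction through hypercorrection.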

It is interesting how people deal with these sound changes. While the book itself already adopts some of these changes throughout, the authors state that there are teachers who would still correct them as unacceptable.

Chinese on Linux sucks

This is a rant but at the same time can serve as a future reference on this topic:

Using Chinese input methods (IME) for writing Chinese on Linux sucks. For years now, keystrokes have suddenly been swallowed so that no input was possible in any application, requiring a restart of the window session. This is very embarrassing and not reasonable in 2008.

This problem is well known. As the Debian package states:

"However, in XIM mode, this setting causes problems for some GTK+ programs,
most notable nautilus. Sometimes, the keyboard input in the program will
completely freeze, and nothing you type will show up. As a temporary
workaround, you can choose a different input method other than "X Input
Method" in the "Input Methods" list from the right-click menu ("Default"
should usually work)."

Needless to say, there is a bunch of programs that provide input frameworks, input methods, input bridges between each other and input switches, let alone the documentation one needs to read before starting off with SCIM, which spans more than 20 screens.

I met an open source contributor this week, and he, being Chinese himself, told me he doesn't even install a character input method on Linux but uses his Apple for all his actual work.

On a personal note, I reinstalled the whole load of packages for SCIM again in the hope that this problem vanishes. Of course there is no one to blame for this problem, as everybody is contributing voluntarily. One should just not wonder why Linux doesn't do that well in this area, as it lacks basic infrastructure.

Write upside down using Unicode IPA characters

¡ǝpoɔıun oʇ sʞuɐɥʇ uʍop ǝpısdn ǝʇıɹʍ uɐɔ noʎ 'ʇou ɹo ʇı ǝʌǝılǝq.

This is nothing new (see [1] or [2]), but I wrote a small Python program for this to pipe my instant messaging through it. Needless to say that with KDE's Kopete I found yet another application that seems not to work with non-ASCII characters.

Anyway, here's the code:

#!/usr/bin/python
# -*- coding: utf8 -*-

"""
Simple program that flips the input on stdin for latin characters.

Support for diacritics through combining diacritical marks. Depends on proper
rendering though.

2008 Christoph Burgmer (uyhc@stud.uni-karlsruhe.de)

      ˙ǝɹɐʍʇɟos ǝɥʇ uı sƃuılɐǝp ɹǝɥʇo ɹo ǝsn ǝɥʇ ɹo ǝɹɐʍʇɟos ǝɥʇ ɥʇıʍ uoıʇɔǝuuoɔ
         uı ɹo ɟo ʇno 'ɯoɹɟ ƃuısıɹɐ 'ǝsıʍɹǝɥʇo ɹo ʇɹoʇ 'ʇɔɐɹʇuoɔ ɟo uoıʇɔɐ uɐ uı
  ɹǝɥʇǝɥʍ 'ʎʇılıqɐıl ɹǝɥʇo ɹo sǝƃɐɯɐp 'ɯıɐlɔ ʎuɐ ɹoɟ ǝlqɐıl ǝq sɹǝploɥ ʇɥƃıɹʎdoɔ
  ɹo sɹoɥʇnɐ ǝɥʇ llɐɥs ʇuǝʌǝ ou uı ˙ʇuǝɯǝƃuıɹɟuıuou puɐ ǝsodɹnd ɹɐlnɔıʇɹɐd ɐ ɹoɟ
ssǝuʇıɟ 'ʎʇılıqɐʇuɐɥɔɹǝɯ ɟo sǝıʇuɐɹɹɐʍ ǝɥʇ oʇ pǝʇıɯıl ʇou ʇnq ƃuıpnlɔuı 'pǝıldɯı
      ɹo ssǝɹdxǝ 'puıʞ ʎuɐ ɟo ʎʇuɐɹɹɐʍ ʇnoɥʇıʍ '"sı sɐ" pǝpıʌoɹd sı ǝɹɐʍʇɟos ǝɥʇ
                                 ˙ǝɹɐʍʇɟos ǝɥʇ ɟo suoıʇɹod lɐıʇuɐʇsqns ɹo sǝıdoɔ
  llɐ uı pǝpnlɔuı ǝq llɐɥs ǝɔıʇou uoıssıɯɹǝd sıɥʇ puɐ ǝɔıʇou ʇɥƃıɹʎdoɔ ǝʌoqɐ ǝɥʇ
                                            :suoıʇıpuoɔ ƃuıʍolloɟ ǝɥʇ oʇ ʇɔǝɾqns
 'os op oʇ pǝɥsıuɹnɟ sı ǝɹɐʍʇɟos ǝɥʇ ɯoɥʍ oʇ suosɹǝd ʇıɯɹǝd oʇ puɐ 'ǝɹɐʍʇɟos ǝɥʇ
ɟo sǝıdoɔ llǝs ɹo/puɐ 'ǝsuǝɔılqns 'ǝʇnqıɹʇsıp 'ɥsılqnd 'ǝƃɹǝɯ 'ʎɟıpoɯ 'ʎdoɔ 'ǝsn
    oʇ sʇɥƃıɹ ǝɥʇ uoıʇɐʇıɯıl ʇnoɥʇıʍ ƃuıpnlɔuı 'uoıʇɔıɹʇsǝɹ ʇnoɥʇıʍ ǝɹɐʍʇɟos ǝɥʇ
   uı lɐǝp oʇ '("ǝɹɐʍʇɟos" ǝɥʇ) sǝlıɟ uoıʇɐʇuǝɯnɔop pǝʇɐıɔossɐ puɐ ǝɹɐʍʇɟos sıɥʇ
 ɟo ʎdoɔ ɐ ƃuıuıɐʇqo uosɹǝd ʎuɐ oʇ 'ǝƃɹɐɥɔ ɟo ǝǝɹɟ 'pǝʇuɐɹƃ ʎqǝɹǝɥ sı uoıssıɯɹǝd

                                                            ǝsuǝɔıl ʇıɯ :ǝsuǝɔıl
"""

import sys
import locale
import unicodedata

BASE_LATIN_FLIP = u"ɐqɔpǝɟƃɥıɾʞlɯuodbɹsʇnʌʍxʎz"

OTHERS_FLIP = {'!': u'¡', '?': u'¿', '(': ')', '{': '}', ';': u'؛', '>': '<',
    '\'': ',','.': u'˙'}

# See:
# http://de.wikipedia.org/wiki/Unicode-Block_Kombinierende_diakritische_Zeichen
UNICODE_COMBINING_DIACRITICS = {u'̈': u'̤', u'̊': u'̥', u'́': u'̗', u'̀': u'̖',
    u'̇': u'̣', u'̃': u'̰', u'̄': u'̱', u'̂': u'̬', u'̆': u'̯', u'̌': u'̭',
    u'̑': u'̮', u'̍': u'̩'}

TRANSLITERATIONS = {u'ß': 'ss'}

# character lookup: maps each character to its flipped counterpart and back
charLookup = dict(zip("abcdefghijklmnopqrstuvwxyz", BASE_LATIN_FLIP))
charLookup.update(OTHERS_FLIP)
for char in list(charLookup):
    charLookup[charLookup[char]] = char

# lookup for diacritical marks: swaps marks above with marks below and back
diacriticsLookup = dict((below, above)
    for above, below in UNICODE_COMBINING_DIACRITICS.items())
diacriticsLookup.update(UNICODE_COMBINING_DIACRITICS)

# sys.stdin and print() already decode and encode with the locale's encoding
for line in sys.stdin:
    line = line.rstrip("\n").lower()
    for char in TRANSLITERATIONS:
        line = line.replace(char, TRANSLITERATIONS[char])

    output = []
    # walk the line back to front so the flipped text reads correctly
    for char in reversed(line):
        if char in charLookup:
            output.append(charLookup[char])
        else:
            # decompose into base character plus combining marks and flip
            # each part exactly once
            flipped = [charLookup.get(c, diacriticsLookup.get(c, c))
                for c in unicodedata.normalize("NFD", char)]
            output.append(unicodedata.normalize("NFC", "".join(flipped)))

    print("".join(output))

Update: It seems that fonts currently only support a fixed set of common character-diacritic combinations. There is currently a discussion on the Unicode mailing list about a more general algorithm for placing diacritical marks, which would allow for a wider range of support.
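
To see why rendering matters here, the following small sketch flips a single ä by hand. The two one-entry tables and the name flip_char are made up for this illustration and only cover this one character. The flipped result stays a sequence of two code points, a base letter plus a combining mark, because Unicode has no precomposed character for this combination, so the font has to position the mark itself.

```python
import unicodedata

# Illustrative one-entry tables, just enough for this example.
BASE_FLIP = {"a": "\u0250"}             # a -> turned a
ABOVE_TO_BELOW = {"\u0308": "\u0324"}   # diaeresis above -> diaeresis below


def flip_char(char):
    """Flip one character by flipping its base letter and its marks."""
    parts = unicodedata.normalize("NFD", char)
    flipped = "".join(BASE_FLIP.get(c, ABOVE_TO_BELOW.get(c, c))
                      for c in parts)
    # NFC recombines where a precomposed character exists.
    return unicodedata.normalize("NFC", flipped)


result = flip_char("\u00e4")  # ä as a single precomposed character
for c in result:
    print("U+%04X %s" % (ord(c), unicodedata.name(c)))
```

The input is one code point, while the output is two, so whether the result looks right depends entirely on how the font places the mark below the turned letter.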

Tones in Chinese songs

Are tones used in songs of Mandarin and Cantonese?

Both are tonal languages which means that the meaning of a single syllable/word depends on the pitch it is spoken with.

The answer is that even linguists aren't sure. As San Duanmu states in "The Phonology of Standard Chinese" (ISBN 978-0-19-921578-2), there are two possible theories: (a) tones and musical notes are articulated independently, or (b) both are produced by the same mechanism. He continues: "I believe that [...] tone and musical notes are made with the same articulatory mechanism and therefore they do interfere with each other. Unfortunately, I am not aware of any experimental study that demonstrates such interference (or the lack of it)" (Chapter 10, p. 252).

The fact is that Chinese speakers don't have problems understanding the actual lyrics. Context helps where tonal information is missing. Furthermore, it has been shown that songwriters for Cantonese choose notes depending on the tones (Wong and Diehl).

Non-standard Mandarin (Taipei)

Standard Mandarin (Putonghua) is spoken by many people and is in fact the language with the most native speakers. Many more people acquire it as a second language, increasing the number of speakers further. In a lot of places people furthermore speak a local variant of Mandarin (e.g. in Sichuan). Both facts, Mandarin being learnt as a second language and being spoken in variant forms, are surely reasons why "accented Mandarin" is an issue.

There is an easy way of expressing differences in pronunciation: the Pinyin system. Though this will probably not be very precise from a linguistic point of view, it should work in many cases. I thus want to give a list of possible variations in Mandarin, expressed in terms of Pinyin initials and finals. The source used is:

Cornelius C. Kubler, George T.C. Ho: Varieties of Spoken Standard Chinese, Volume II: A Speaker from Taipei. Foris Publications, Dordrecht, 1984, ISBN 90-6765-040-4.

Initials:

  • zh, ch, sh > z, c, s

    The retroflex initials will be pronounced as the alveolar equivalents, so 知道 zhīdao becomes zīdao.

  • r > *z

    The retroflex r will sometimes be pronounced, as the book states, similar to an English "z".

  • f > h, hu

    According to the book both directions of change will occur: 政府 zhèngfǔ will become zènghǔ, 分化 fēnhuà becomes fēnfà.

  • l, r > *r

    Both l and r are sometimes pronounced as a flapped r, according to the above source.

  • n > l

    Confusion of n and l can happen, usually before nasal finals as in 南 nán which becomes lán.

  • b, p, m, f, w +eng > b, p, m, f, w +ong

    Syllables with a labial initial and the final eng will change their final to ong, so 碰 pèng becomes *pòng.

Finals:

  • ing, eng > in, en

    The velar nasal [ŋ] will be lost in ing and eng, so 高兴 gāoxìng changes to gāoxìn.

  • en > *En

    For the final en, the vowel originally pronounced in a mid central position will change to a mid front one, which has no equivalent in the original Pinyin set.

  • ü+ Ø, e, an, n > i+ Ø, e, an, n

    Vowel [y] (as in yu, nü, quan) will change to resemble the vowel denoted by i in Pinyin, as for 下雨 xià yǔ becoming xià yǐ.

  • u+ o, ei > o, ei

    Diphthong uo and triphthong ui will be reduced, as in 美国 měiguó becoming *měigó or 对 duì becoming dèi.

  • zh, ch, sh, r, z, c, s +i > zh, ch, sh, r, z, c, s +u

    The vowel -i, as in [z̩], [ʐ̩], will change to the vowel u, as in 知道 zhīdao becoming zūdao.

  • e > o

    Final -e changes to -o, e.g. 可能 kěnéng to *kǒ'nén.

  • r > Ø

    Erhua sound as in 画儿 huàr will mostly not be used.
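
Because these variations are expressed in Pinyin, a few of them can be tried out on whole Pinyin words with simple text substitutions. The following is only a sketch under my own assumptions, not something from the book: the names SHIFTS and apply_shift are made up, and only three of the changes from the list are covered.

```python
import re
import unicodedata

# Optional combining tone mark (grave, acute, macron, caron) in decomposed Pinyin.
TONE = "[\u0300\u0301\u0304\u030c]?"

# Hypothetical table: name -> (regex on the NFD-decomposed word, replacement).
SHIFTS = {
    # zh, ch, sh > z, c, s
    "retroflex": (r"([zcs])h", r"\1"),
    # ing, eng > in, en (skipped when a vowel follows, where the g
    # probably starts the next syllable)
    "velar-nasal": (rf"([ie]{TONE})ng(?![aeiou])", r"\1n"),
    # b, p, m, f, w + eng > b, p, m, f, w + ong
    "labial-eng": (rf"([bpmfw])e({TONE})ng", r"\1o\2ng"),
}


def apply_shift(name, word):
    """Apply one named variation to a Pinyin word with tone marks."""
    pattern, replacement = SHIFTS[name]
    # Decompose so that tone marks become separate combining characters.
    decomposed = unicodedata.normalize("NFD", word)
    changed = re.sub(pattern, replacement, decomposed)
    return unicodedata.normalize("NFC", changed)


for name, word in [("retroflex", "zhīdao"), ("velar-nasal", "gāoxìng"),
                   ("labial-eng", "pèng")]:
    print(name, word, "->", apply_shift(name, word))
```

Working on the decomposed form keeps the tone mark attached to the vowel slot, so pèng keeps its fourth tone when the vowel changes to o.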

Furthermore there are tonal changes, and hypercorrection is an issue, but for now I will finish.

Update: The greater-than sign doesn't necessarily indicate the direction of the change, as hypercorrection can reverse the original change. The book also doesn't always clearly state the major direction, so in at least one case the change pair was swapped here to accommodate the given example.
