cjklib.reading.operator.WadeGilesOperator is an implementation of Wade-Giles, a romanisation of Mandarin Chinese that was in common use before being replaced by Pinyin.
Features:
The Wade-Giles romanisation system is itself a modification by H. A. Giles of an earlier system, and further alterations exist, so parsing transliterated text requires an adaptable solution.
In standard Wade-Giles the non-retroflex zero final syllables tzŭ, tz’ŭ and ssŭ carry a breve over the u. The breve is often left out, which creates no ambiguity. In the same way, the finals -ê, -ên and -êng and the syllable êrh carry a circumflex over the e that is often not written; again no ambiguity arises, as no equivalent forms with a plain e exist. These forms can be handled by setting the options 'zeroFinal' to 'u' and 'diacriticE' to 'e'.
By contrast, leaving out the umlaut on the u in the finals -ü, -üan, -üeh and -ün creates forms that in some cases cannot be converted back, because an equivalent form with a plain u exists. Unambiguous forms are those with initial hs- or y- (except yu) and/or finals -üeh or -üo, the latter being dialect forms not in use today. So while, for example, hsu can be unambiguously converted back to its correct form hsü, it is not clear whether ch’uan is the intended form or stems from ch’üan with its diacritics mangled. checkPlainEntity() reports which of these cases applies to a given form. The omission of the umlaut can be controlled by setting 'umlautU' to 'u'.
Of the non-retroflex zero final forms tzŭ, tz’ŭ and ssŭ, the last is sometimes written szŭ. The operator can be configured for this spelling with the Boolean option 'useInitialSz'.
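Restoring the breve is safe because tzu, tz’u and ssu have no counterparts with a plain u. The following is a minimal sketch of that idea in plain Python 3, independent of cjklib. The function name `restore_zero_final` and its `use_initial_sz` parameter are hypothetical and only mirror the options described above; they are not the library's API.

```python
# Illustrative only: not cjklib's implementation.
# Maps zero final syllables written without the breve to the standard form.
ZERO_FINAL_FORMS = {
    "tzu": "tzŭ",
    "tz’u": "tz’ŭ",
    "ssu": "ssŭ",
    "szu": "ssŭ",  # alternative spelling with initial sz-
}


def restore_zero_final(plain_syllable, use_initial_sz=False):
    """Return the syllable with the breve restored on a zero final u.

    Syllables that are not zero final forms are returned unchanged.
    Matched syllables are returned in lowercase.
    """
    restored = ZERO_FINAL_FORMS.get(plain_syllable.lower(), plain_syllable)
    if use_initial_sz and restored == "ssŭ":
        restored = "szŭ"
    return restored


print(restore_zero_final("tzu"))
print(restore_zero_final("ssu", use_initial_sz=True))
```

A lookup table is enough here because only three syllables are affected; the circumflex on e and the umlaut on u need a full syllable table instead.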
By default the neutral tone is not marked. Because the digits zero or five are sometimes used for it, the mark can be set with the option 'neutralToneMark'.
The apostrophe marking aspiration can be set by 'wadeGilesApostrophe'.
Tones are by default marked with superscript characters. This can be controlled by option 'toneMarkType'.
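The two tone mark styles differ only in the characters used, so text can be moved between them with a character mapping. The sketch below uses plain Python 3 and does not call cjklib; the function names are hypothetical.

```python
# Illustrative only: converts between superscript and plain tone digits.
SUPERSCRIPT_TO_DIGIT = str.maketrans("⁰¹²³⁴⁵", "012345")
DIGIT_TO_SUPERSCRIPT = str.maketrans("012345", "⁰¹²³⁴⁵")


def to_number_tones(text):
    """Replace superscript tone marks with plain digits."""
    return text.translate(SUPERSCRIPT_TO_DIGIT)


def to_superscript_tones(text):
    """Replace plain digits 0-5 with superscript tone marks.

    Every digit 0-5 is converted, so the input should contain
    no digits other than tone marks.
    """
    return text.translate(DIGIT_TO_SUPERSCRIPT)


print(to_number_tones("k’ai¹-mên²-chien⁴-shan¹"))
print(to_superscript_tones("hsüeh2"))
```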
Omitted apostrophes for aspiration cannot be recovered, as every case is ambiguous. No means are provided to warn about possibly missing apostrophes. In case of uncertainty, check for the initials p-, t-, k-, ch-, ts- and tz-.
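Since the operator offers no such warning, a manual check can be scripted on the caller's side. The following is a rough sketch in plain Python 3 that lists syllables starting with one of these initials and no apostrophe. The helper names and the set of accepted apostrophe characters are assumptions of this sketch, and a flagged syllable may well be correct as written.

```python
import re

# Initials that have an aspirated counterpart; digraphs are tried first.
INITIAL = re.compile(r"(ch|ts|tz|p|t|k)", re.IGNORECASE)
# Characters accepted as the aspiration mark (an assumption of this sketch).
APOSTROPHES = frozenset(["'", "’", "‘", "ʻ"])


def may_lack_apostrophe(syllable):
    """Return True if the syllable has an unaspirated p-, t-, k-, ch-, ts- or tz-."""
    match = INITIAL.match(syllable)
    if match is None:
        return False
    following = syllable[match.end():match.end() + 1]
    return following not in APOSTROPHES


def flag_candidates(text):
    """Return the syllables of a hyphenated string that deserve a manual check."""
    return [s for s in re.split(r"[-\s]+", text) if may_lack_apostrophe(s)]


print(flag_candidates("k’ai1-men2-chien4-shan1"))
```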
The WadeGilesDialectConverter allows conversion between said forms.
>>> from cjklib.reading import ReadingFactory
>>> f = ReadingFactory()
>>> f.convert(u"K’ung³-tzu³", 'WadeGiles', 'WadeGiles',
... sourceOptions={'zeroFinal': 'u'})
u'K\u2019ung\xb3-tz\u016d\xb3'
>>> f.convert(u"k’ai¹-men²-chien⁴-shan¹", 'WadeGiles', 'WadeGiles',
... sourceOptions={'diacriticE': 'e'})
u'k\u2019ai\xb9-m\xean\xb2-chien\u2074-shan\xb9'
>>> f.convert(u"hsueh²", 'WadeGiles', 'WadeGiles',
... sourceOptions={'umlautU': 'u'})
u'hs\xfceh\xb2'
>>> f.convert(u"hsu⁴-ch’u³", 'WadeGiles', 'WadeGiles',
... sourceOptions={'umlautU': 'u'})
Traceback (most recent call last):
...
cjklib.exception.AmbiguousConversionError: conversion for entity 'ch’u³' is ambiguous: ch’u³, ch’ü³
>>> from cjklib.reading import operator
>>> operator.WadeGilesOperator.guessReadingDialect(
... u"k'ai1-men2-chien4-shan1")
{'zeroFinal': u'\u016d', 'diacriticE': u'e', 'umlautU': u'\xfc', 'toneMarkType': 'numbers', 'useInitialSz': False, 'neutralToneMark': 'none', 'wadeGilesApostrophe': "'"}
Bases: cjklib.reading.operator.TonalRomanisationOperator
Provides an operator for the Mandarin Wade-Giles romanisation.
Parameters: |
|
---|
Checks if the given plain entity is a form with lost diacritics or an ambiguous case.
Examples: While the form *erh can be clearly traced to êrh, the form kuei has no counterpart with diacritics. The former is a case of a 'lost' vowel, the latter a 'strict' form. The syllable ch’u, though, is an 'ambiguous' case, as both ch’u and ch’ü are valid.
Parameters: |
|
---|---|
Return type: | str |
Returns: | 'strict' if the given form is a strict Wade-Giles form with vowel u, 'lost' if the given form is a mangled vowel form, 'ambiguous' if both forms exist, one with each vowel (i.e. u and ü) |
Raises ValueError: | if the plain entity doesn’t include the ambiguous vowel in question |
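The three return values can be illustrated with a small lookup. The sketch below is plain Python 3 and covers only the vowel u; the two syllable sets are tiny hypothetical samples, not the library's tables, and the function name `classify_plain_u` is made up for this example.

```python
# Illustrative only: tiny sample tables, not the full Wade-Giles syllabary.
PLAIN_U_SYLLABLES = {"kuei", "ch’u", "ch’uan", "yu", "lu"}
UMLAUT_SYLLABLES = {"hsü", "hsüeh", "ch’ü", "ch’üan", "yü", "lü"}


def classify_plain_u(plain_entity):
    """Classify a syllable written with a plain u.

    Returns 'strict', 'lost' or 'ambiguous'. Raises ValueError if the
    syllable has no u, or if it is missing from the sample tables.
    """
    entity = plain_entity.lower()
    if "u" not in entity:
        raise ValueError("entity %r does not include the vowel u" % plain_entity)
    has_plain = entity in PLAIN_U_SYLLABLES
    has_umlaut = entity.replace("u", "ü") in UMLAUT_SYLLABLES
    if has_plain and has_umlaut:
        return "ambiguous"
    if has_umlaut:
        return "lost"
    if has_plain:
        return "strict"
    raise ValueError("entity %r is not covered by the sample tables" % plain_entity)


for syllable in ("kuei", "hsu", "ch’u"):
    print(syllable, classify_plain_u(syllable))
```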
Composes the given list of basic entities to a string by applying a hyphen between syllables.
Parameter: | readingEntities (list of str) – list of basic syllables or other content |
---|---|
Return type: | str |
Returns: | composed entities |
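As a rough illustration of this rule, the sketch below joins a list of entities and places a hyphen only where two syllables are adjacent. It is plain Python 3 and not the library's code; `looks_like_syllable` is a hypothetical stand-in for a real syllable check against the Wade-Giles syllable table.

```python
# Illustrative only: a crude syllable test replaces the real syllable table.
TONE_MARKS = set("012345⁰¹²³⁴⁵")
APOSTROPHES = set("'’")


def looks_like_syllable(entity):
    """Return True for letters, with optional apostrophe and trailing tone mark."""
    body = entity[:-1] if entity[-1:] in TONE_MARKS else entity
    letters = "".join(ch for ch in body if ch not in APOSTROPHES)
    return letters.isalpha()


def compose(reading_entities):
    """Join entities, adding a hyphen between two neighbouring syllables."""
    parts = []
    previous_was_syllable = False
    for entity in reading_entities:
        is_syllable = looks_like_syllable(entity)
        if is_syllable and previous_was_syllable:
            parts.append("-")
        parts.append(entity)
        previous_was_syllable = is_syllable
    return "".join(parts)


print(compose(["K’ung³", "tzŭ³", ", ", "lao³", "tzŭ³"]))
```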
Converts the alternative syllable representation from the current dialect to the given target, or by default to the standard representation.
Use the WadeGilesDialectConverter for conversions in general.
Parameters: |
|
---|---|
Return type: | str |
Returns: | converted entity |
Raises AmbiguousConversionError: | if conversion is ambiguous. |
Splits the given plain syllable into onset (initial) and rhyme (final).
Semivowels w- and y- will be treated specially and an empty initial will be returned, while the final will be extended with vowel i or u.
Old forms are not supported and will raise an UnsupportedError. For the dialect with missing diacritics on the ü an UnsupportedError is also raised, as it is unclear which syllable is meant.
Returned strings will be lowercase.
Parameter: | plainSyllable (str) – syllable without tone marks |
---|---|
Return type: | tuple of str |
Returns: | tuple of entity onset and rhyme |
Raises InvalidEntityError: | if the entity is invalid. |
Raises UnsupportedError: | if the given entity is not supported |
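The split can be sketched by matching the longest known initial and handling the semivowels separately. The code below is plain Python 3 and only an approximation: the list of initials and the rule for which vowel is prepended after w- and y- are assumptions of this sketch, no syllable table is consulted, and a plain ValueError stands in for the library's exceptions.

```python
# Illustrative only: longest initials are tried first.
INITIALS = sorted(
    ["ch’", "ts’", "tz’", "p’", "t’", "k’", "ch", "hs", "sh", "ss", "sz",
     "ts", "tz", "p", "m", "f", "t", "n", "l", "k", "h", "j", "s"],
    key=len, reverse=True)


def split_onset_rhyme(plain_syllable):
    """Split a syllable without tone mark into (onset, rhyme), lowercased."""
    syllable = plain_syllable.lower()
    if not syllable:
        raise ValueError("empty syllable")
    if syllable.startswith("w"):
        # Semivowel w-: empty initial, final extended with u unless present.
        rest = syllable[1:]
        rhyme = rest if rest.startswith("u") else "u" + rest
        return "", rhyme
    if syllable.startswith("y"):
        # Semivowel y-: empty initial, final extended with i unless the
        # final already starts with i or ü.
        rest = syllable[1:]
        rhyme = rest if rest[:1] in ("i", "ü") else "i" + rest
        return "", rhyme
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    return "", syllable


print(split_onset_rhyme("ch’üan"))
print(split_onset_rhyme("wang"))
```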
Gets the list of plain entities supported by this reading. Unlike those returned by getReadingEntities(), the entities carry no tone mark.
Syllables use the user-specified apostrophe to mark aspiration.
Return type: | set of str |
---|---|
Returns: | set of supported syllables |
Takes a string written in Wade-Giles and guesses the reading dialect.
The following options are tested:
Parameter: | readingString (str) – Wade-Giles string |
---|---|
Return type: | dict |
Returns: | dictionary of basic keyword settings |
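Two of the settings can be guessed from surface characters alone, which the following plain Python 3 sketch demonstrates. It is not the library's algorithm. The keys follow the option names used in this documentation, the value 'numbers' appears in the example output above, and the values 'superscript' and 'none' are labels chosen for this sketch only.

```python
import re

# Apostrophe-like characters considered by this sketch (an assumption).
APOSTROPHE_CANDIDATES = "’'‘ʻ"


def guess_surface_options(reading_string):
    """Guess the tone mark style and the apostrophe from a Wade-Giles string."""
    if any(ch in "⁰¹²³⁴⁵" for ch in reading_string):
        tone_mark_type = "superscript"
    elif re.search(r"[a-zêŭü][0-5]", reading_string, re.IGNORECASE):
        tone_mark_type = "numbers"
    else:
        tone_mark_type = "none"

    # Pick the most frequent apostrophe-like character, if any occurs.
    counts = {ch: reading_string.count(ch) for ch in APOSTROPHE_CANDIDATES}
    apostrophe = max(counts, key=counts.get)
    if counts[apostrophe] == 0:
        apostrophe = None

    return {"toneMarkType": tone_mark_type, "wadeGilesApostrophe": apostrophe}


print(guess_surface_options("k'ai1-men2-chien4-shan1"))
```

Options such as 'diacriticE' or 'umlautU' cannot be guessed this way, since they depend on which syllables are valid.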
Removes hyphens between two syllables for a given decomposition.
Parameter: | readingEntities (list of str) – list of basic syllables or other content |
---|---|
Return type: | list of str |
Returns: | the given entity list without separating hyphens |
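A minimal sketch of this filtering in plain Python 3 is given below; it is not the library's code. A hyphen is dropped only when both neighbours look like syllables, and `looks_like_syllable` is a hypothetical stand-in for a real check against the Wade-Giles syllable table.

```python
# Illustrative only: a crude syllable test replaces the real syllable table.
TONE_MARKS = set("012345⁰¹²³⁴⁵")
APOSTROPHES = set("'’")


def looks_like_syllable(entity):
    """Return True for letters, with optional apostrophe and trailing tone mark."""
    body = entity[:-1] if entity[-1:] in TONE_MARKS else entity
    letters = "".join(ch for ch in body if ch not in APOSTROPHES)
    return letters.isalpha()


def remove_hyphens(reading_entities):
    """Return the entity list without hyphens that separate two syllables."""
    result = []
    last = len(reading_entities) - 1
    for i, entity in enumerate(reading_entities):
        between_syllables = (
            entity == "-"
            and 0 < i < last
            and looks_like_syllable(reading_entities[i - 1])
            and looks_like_syllable(reading_entities[i + 1])
        )
        if not between_syllables:
            result.append(entity)
    return result


print(remove_hyphens(["k’ai¹", "-", "mên²", " ", "-", " "]))
```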
Regex to split a string into several syllables in a crude way. It consists of: