The following examples show how to convert between different representations of Pinyin.
Create the converter and convert standard Pinyin to Pinyin with tones written as numbers:
>>> from cjklib.reading import *
>>> targetOp = operator.PinyinOperator(toneMarkType='numbers')
>>> pinyinConv = converter.PinyinDialectConverter(
...     targetOperators=[targetOp])
>>> pinyinConv.convert(u'hànzì', 'Pinyin', 'Pinyin')
u'han4zi4'
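To illustrate what this conversion does to a single syllable, here is a minimal standalone sketch written for Python 3 that does not use cjklib. It assumes the input is already split into syllables, and the helper name `syllable_to_numbered` is hypothetical. The library itself also handles segmentation and further options, so this is not its implementation.

```python
import unicodedata

# Combining diacritics used as Pinyin tone marks, mapped to tone numbers.
TONE_MARKS = {
    '\u0304': 1,  # macron
    '\u0301': 2,  # acute
    '\u030c': 3,  # caron
    '\u0300': 4,  # grave
}


def syllable_to_numbered(syllable):
    """Return one Pinyin syllable with its tone mark replaced by a digit.

    Hypothetical helper for illustration only. A syllable without a tone
    mark is treated as fifth (neutral) tone.
    """
    # NFD separates base letters from their combining diacritics.
    decomposed = unicodedata.normalize('NFD', syllable)
    tone = 5
    plain = []
    for char in decomposed:
        if char in TONE_MARKS:
            tone = TONE_MARKS[char]
        else:
            # Keeps the diaeresis of ü, which is not a tone mark.
            plain.append(char)
    return unicodedata.normalize('NFC', ''.join(plain)) + str(tone)


print([syllable_to_numbered(s) for s in ('h\u00e0n', 'z\u00ec')])
```

Decomposing to NFD first means the tone mark can be removed without disturbing the diaeresis on ü, which survives recomposition.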
Convert Pinyin written with tone numbers, with ü (u with umlaut) replaced by the character v and the fifth tone left unmarked, to standard Pinyin:
>>> sourceOp = operator.PinyinOperator(toneMarkType='numbers',
...     yVowel='v', missingToneMark='fifth')
>>> pinyinConv = converter.PinyinDialectConverter(
...     sourceOperators=[sourceOp])
>>> pinyinConv.convert('nv3hai2zi', 'Pinyin', 'Pinyin')
u'nǚháizi'
Or more elegantly:
>>> f = ReadingFactory()
>>> f.convert('nv3hai2zi', 'Pinyin', 'Pinyin',
...     sourceOptions={'toneMarkType': 'numbers', 'yVowel': 'v',
...     'missingToneMark': 'fifth'})
u'nǚháizi'
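The reverse direction shown above (tone numbers to tone marks, with a substitute character for ü and an unmarked fifth tone) can be sketched independently of cjklib as follows. This is a simplified Python 3 illustration under stated assumptions: syllables are lowercase and already segmented, the function name `numbered_to_marked` and its `y_vowel` parameter are hypothetical, and the placement rule is the common one (mark a or e if present, the o in ou, otherwise the last vowel).

```python
import unicodedata

# Combining diacritics for tones 1-4; the fifth tone carries no mark.
COMBINING_TONE_MARKS = {1: '\u0304', 2: '\u0301', 3: '\u030c', 4: '\u0300'}
VOWELS = 'aeiou\u00fc'


def numbered_to_marked(syllable, y_vowel='v'):
    """Convert one numbered, lowercase Pinyin syllable to tone-mark form.

    Hypothetical helper for illustration only. A missing tone digit is
    read as the fifth tone, and y_vowel is the stand-in for ü.
    """
    if syllable[-1:] in ('1', '2', '3', '4', '5'):
        plain, tone = syllable[:-1], int(syllable[-1])
    else:
        plain, tone = syllable, 5
    plain = plain.replace(y_vowel, '\u00fc')

    vowel_positions = [i for i, char in enumerate(plain) if char in VOWELS]
    if tone == 5 or not vowel_positions:
        return plain

    # Placement rule: a or e wins, then the o of "ou", else the last vowel.
    if 'a' in plain:
        index = plain.index('a')
    elif 'e' in plain:
        index = plain.index('e')
    elif 'ou' in plain:
        index = plain.index('o')
    else:
        index = vowel_positions[-1]

    marked = plain[:index + 1] + COMBINING_TONE_MARKS[tone] + plain[index + 1:]
    # NFC merges the base vowel and the combining mark into one character.
    return unicodedata.normalize('NFC', marked)


print(''.join(numbered_to_marked(s) for s in ('nv3', 'hai2', 'zi')))
```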
Decompose the reading of a CEDICT dictionary entry into syllables, then convert the ü vowel and the form of the Erhua sound:
>>> pinyinFrom = operator.PinyinOperator(toneMarkType='numbers',
...     yVowel='u:', Erhua='oneSyllable')
>>> syllables = pinyinFrom.decompose('sun1nu:r3')
>>> print syllables
['sun1', 'nu:r3']
>>> pinyinTo = operator.PinyinOperator(toneMarkType='numbers',
...     Erhua='twoSyllables')
>>> pinyinConv = converter.PinyinDialectConverter(
...     sourceOperators=[pinyinFrom], targetOperators=[pinyinTo])
>>> pinyinConv.convertEntities(syllables, 'Pinyin', 'Pinyin')
[u'sun1', u'nü3', u'r5']
Or more elegantly with entities already decomposed:
>>> f.convertEntities(['sun1', 'nu:r3'], 'Pinyin', 'Pinyin',
...     sourceOptions={'toneMarkType': 'numbers', 'yVowel': 'u:',
...     'Erhua': 'oneSyllable'},
...     targetOptions={'toneMarkType': 'numbers',
...     'Erhua': 'twoSyllables'})
[u'sun1', u'nü3', u'r5']
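The decomposition step in the CEDICT example can be approximated for numbered Pinyin with a plain regular expression, because each tone digit closes a syllable. The sketch below is Python 3 and independent of cjklib. The function name `split_numbered` is hypothetical, and it assumes input without whitespace or punctuation. A run of unmarked syllables cannot be split this way and is returned as a single chunk, which is one reason the library's operator is the better tool.

```python
import re


def split_numbered(text):
    """Split a numbered Pinyin string after each tone digit.

    Hypothetical helper for illustration only: it relies solely on the
    digits 1-5 and knows nothing about valid Pinyin syllables.
    """
    # A chunk is a run of non-digit characters plus an optional tone digit.
    return re.findall(r'[^1-5]+[1-5]?', text)


print(split_numbered('sun1nu:r3'))
```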
Fix cosmetic errors in Pinyin input (note the misplaced tone mark and the missing apostrophe):
>>> f.convert(u"Wǒ peí nǐ qù Xīān.", 'Pinyin', 'Pinyin')
u"Wǒ péi nǐ qù Xī'ān."
Fix more errors in Pinyin input (note the wrong diacritics: a breve is used where the third tone requires a caron):
>>> string = u"Wŏ peí nĭ qù Xīān."
>>> dialect = operator.PinyinOperator.guessReadingDialect(string)
>>> f.convert(string, 'Pinyin', 'Pinyin', sourceOptions=dialect)
u"Wǒ péi nǐ qù Xī'ān."
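The diacritic part of this repair can be illustrated in isolation. The following Python 3 sketch, which does not use cjklib, only swaps the breve for the caron. The function name `replace_breve_with_caron` is hypothetical, and the tone mark placement and the apostrophe are deliberately left untouched.

```python
import unicodedata


def replace_breve_with_caron(text):
    """Replace every combining breve with a combining caron.

    Hypothetical helper for illustration only. It assumes a breve never
    occurs legitimately in the input, which holds for standard Pinyin.
    """
    # NFD exposes the diacritic as a separate combining character.
    decomposed = unicodedata.normalize('NFD', text)
    swapped = decomposed.replace('\u0306', '\u030c')
    return unicodedata.normalize('NFC', swapped)


print(replace_breve_with_caron('W\u014f pe\u00ed n\u012d q\u00f9 X\u012b\u0101n.'))
```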
Bases: cjklib.reading.converter.ReadingConverter
Provides a converter for different representations of the Chinese romanisation Hanyu Pinyin.
Parameters: |
|
---|
Converts a list of entities in the source reading to the given target reading.
Parameters: |
|
---|---|
Return type: | list of str |
Returns: | list of entities written in target reading |
Raises AmbiguousConversionError: | if conversion for a specific entity of the source reading is ambiguous. |
Raises ConversionError: | on other operations specific to the conversion between the two readings (e.g. error on converting entities). |
Raises UnsupportedError: | if source or target reading is not supported for conversion. |
Raises InvalidEntityError: | if an invalid entity is given. |
Converts the various Erhua forms in a list of reading entities to a representation with one syllable, e.g. ['tou2', 'r5'] to ['tour2'].
Parameter: | entityTuples (list of tuple/str) – list of tuples with plain syllable and tone |
---|---|
Return type: | list of tuple/str |
Returns: | list of tuples with plain syllable and tone |
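As a rough picture of the merge described here, the following Python 3 sketch folds a standalone `('r', 5)` entity into the preceding syllable tuple. It is not the library's code: the function name `merge_erhua` is hypothetical, and it assumes that syllables are `(plain, tone)` tuples while any other entity, such as whitespace, is a plain string that is passed through unchanged.

```python
def merge_erhua(entity_tuples):
    """Merge two-syllable Erhua forms into one syllable.

    Hypothetical helper for illustration only. The tone of the merged
    syllable is taken from the syllable preceding the 'r'.
    """
    merged = []
    for entity in entity_tuples:
        is_erhua_r = isinstance(entity, tuple) and entity[0] == 'r'
        if is_erhua_r and merged and isinstance(merged[-1], tuple):
            plain, tone = merged.pop()
            merged.append((plain + 'r', tone))
        else:
            # Non-syllable strings and ordinary syllables are kept as they are.
            merged.append(entity)
    return merged


print(merge_erhua([('tou', 2), ('r', 5)]))
```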
Converts the various Erhua forms in a list of reading entities to a representation with two syllables, e.g. ['tour2'] to ['tou2', 'r5'].
Parameter: | entityTuples (list of tuple/str) – list of tuples with plain syllable and tone |
---|---|
Return type: | list of tuple/str |
Returns: | list of tuples with plain syllable and tone |
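The opposite split can be sketched the same way. This Python 3 example is again independent of cjklib. The function name `split_erhua` is hypothetical, and it assumes `(plain, tone)` tuples for syllables, pass-through strings for everything else, and that the syllable `er` is the only syllable ending in r that is not an Erhua form.

```python
def split_erhua(entity_tuples):
    """Split one-syllable Erhua forms into a syllable plus ('r', 5).

    Hypothetical helper for illustration only. The original tone stays on
    the base syllable and the separated 'r' gets the fifth tone.
    """
    result = []
    for entity in entity_tuples:
        if isinstance(entity, tuple):
            plain, tone = entity
            # 'er' is a syllable of its own and 'r' is already split off.
            if plain.endswith('r') and plain not in ('r', 'er'):
                result.append((plain[:-1], tone))
                result.append(('r', 5))
                continue
        result.append(entity)
    return result


print(split_erhua([('tour', 2)]))
```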