cjklib.reading.operator.GROperator is a mature implementation of the Chinese Gwoyeu Romatzyh romanisation (國語羅馬字, often abbreviated GR). Gwoyeu Romatzyh differs from most other romanisation methods in that it encodes Chinese tones with alphabetic characters instead of diacritics or digits.
Features:
Tones are transcribed rigorously, as syllables in the neutral tone additionally carry the original (etymological) tone information. Y.R. Chao also annotates the optional neutral tone (e.g. buh jy˳daw), which can be pronounced with either the neutral tone or the etymological one. Compared to other reading operators for Mandarin, special care has to be taken to cope with these requirements.
Gwoyeu Romatzyh renders rhotacised syllables (Erlhuah) by trying to give the actual pronunciation. Because r-colouring loses information about the underlying etymological syllable, an r-coloured form cannot be converted back to its underlying form unambiguously. Furthermore, the finals i, iu, in and iun contrast in the first and second tone but not in the third and fourth tone, so conversion between different tones (including the base form) cannot be done in a general manner: 小鸡儿 sheau-jiel differs from 小街儿 sheau-jie’l, but 几儿 jieel (from jǐ) equals 姐儿 jieel (from jiě); see Chao.
Thus this ReadingOperator lacks the general handling of syllable renderings and many methods narrow the range of syllables allowed. While original forms can carry any tone (even though Mandarin doesn’t make use of some combinations), r-coloured forms for Erlhuah will currently be limited to those given in the source by Y.R. Chao. Those not mentioned there will raise an UnsupportedError.
Yuen Ren Chao includes several abbreviated forms in his books (see the references below). For example, 個/个, which would be fully transcribed as .geh or ˳geh, is abbreviated as g. These forms can be accessed by getAbbreviatedForms() and getAbbreviatedFormData(), and their usage can be controlled by the option 'abbreviations'. Use the GRDialectConverter to convert these abbreviations into their full forms:
>>> from cjklib.reading import ReadingFactory
>>> f = ReadingFactory()
>>> f.convert('Hairtz', 'GR', 'GR', breakUpAbbreviated='on')
u'Hair.tzy'
Special abbreviated forms are given in the form of repetition markers. These take the form x and v, or the combination vx, for repetition of the last syllable, the second last syllable, or both, e.g. shie.x for shie.shie, deengiv for deengideeng and duey .le vx for duey .le duey .le. Both forms can be preceded by a neutral tone mark, e.g. .x or ˳v.
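The expansion of repetition markers described above can be sketched in plain Python. This is a minimal illustration that does not use cjklib: the helper names `split_mark` and `expand_repetitions`, the pre-split list of syllables (without whitespace entities), and the rule that a tone mark on the marker replaces any mark on the repeated syllable are all assumptions made for this example.

```python
# Hypothetical helpers, not part of cjklib: expand GR repetition markers
# in a list of syllables that has already been decomposed.
NEUTRAL_MARKS = ('.', u'\u02f3')  # '.' neutral tone, U+02F3 optional neutral tone

# Number of preceding syllables each marker needs.
REQUIRED = {'x': 1, 'v': 2, 'vx': 2}


def split_mark(entity):
    """Split a leading neutral tone mark off an entity."""
    if entity[:1] in NEUTRAL_MARKS:
        return entity[0], entity[1:]
    return '', entity


def expand_repetitions(syllables):
    """Replace the markers x, v and vx by the syllables they repeat."""
    result = []
    for entity in syllables:
        mark, body = split_mark(entity)
        if body not in REQUIRED:
            result.append(entity)
            continue
        if len(result) < REQUIRED[body]:
            raise ValueError('no syllable to repeat for %r' % entity)
        if body == 'x':
            targets = result[-1:]      # last syllable
        elif body == 'v':
            targets = result[-2:-1]    # second last syllable
        else:
            targets = result[-2:]      # both, in their original order
        if mark:
            # Assumption: the marker's tone mark overrides the original one.
            targets = [mark + split_mark(t)[1] for t in targets]
        result.extend(targets)
    return result


print(''.join(expand_repetitions(['shie', '.x'])))
print(''.join(expand_repetitions(['deeng', 'i', 'v'])))
print(' '.join(expand_repetitions(['duey', '.le', 'vx'])))
```

The three calls reproduce the examples from the text: shie.shie, deengideeng and duey .le duey .le.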
See also
Bases: cjklib.reading.operator.TonalRomanisationOperator
Provides an operator for the Mandarin Gwoyeu Romatzyh romanisation.
Parameters: |
|
---|
Composes the given list of basic entities to a string. Applies an apostrophe between syllables if the second syllable has a zero-initial.
Parameter: | readingEntities (list of str) – list of basic syllables or other content |
---|---|
Return type: | str |
Returns: | composed entities |
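The apostrophe rule can be illustrated with a small stand-alone sketch. It is not cjklib's implementation: the function below is hypothetical, it assumes that a zero-initial syllable is one starting with a, e, i, o or u, and it treats only purely alphabetic strings as syllables, so entities carrying a tone mark such as .le count as other content here.

```python
# Hypothetical sketch, not cjklib's implementation.
VOWELS = 'aeiou'  # assumption: a zero-initial syllable starts with one of these


def compose(reading_entities, apostrophe="'"):
    """Join entities, separating a zero-initial syllable from the one before."""
    parts = []
    previous = None
    for entity in reading_entities:
        if (previous is not None
                and previous.isalpha()
                and entity.isalpha()
                and entity[0].lower() in VOWELS):
            parts.append(apostrophe)
        parts.append(entity)
        previous = entity
    return ''.join(parts)


print(compose(['shi', 'an']))
print(compose(['bei', 'jing']))
```

Without the apostrophe, the first example would be indistinguishable from the single syllable shian.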
Gets a list of abbreviated GR entities. This returns single entities from getAbbreviatedForms() and only returns those that don’t also exist as full forms. Includes repetition markers x and v.
Returned entities are in lowercase.
Return type: | list |
---|---|
Returns: | list of abbreviated GR forms |
Gets a table of abbreviated entities including the traditional Chinese characters, original spelling and specialised information.
Some abbreviated syllables come with additional information:
>>> from cjklib.reading import operator
>>> gr = operator.GROperator()
>>> gr.getAbbreviatedEntityData(['yi'])
[(u'一', [u'i'], set([u'S', u'T']))]
Parameter: | entities (list of str) – abbreviated entities for which information is returned |
---|---|
Return type: | list |
Returns: | list of full spellings, Chinese character string and specialised information |
Gets a list of abbreviated forms used in GR.
The returned list consists of tuples of one or more possibly abbreviated reading entities in lowercase. See getAbbreviatedFormData() on how to get more information on these forms.
Return type: | list |
---|---|
Returns: | a list of abbreviated forms |
Gets a list of base entities as plain entity/tone pair for a given r-coloured entity (Erlhuah form).
This is the counterpart of getRhotacisedTonalEntity(). As different syllables can share the same rhotacised form, that mapping is not injective and the back transformation can return more than one base entity.
Parameter: | tonalEntity (str) – r-coloured entity |
---|---|
Return type: | set of tuple |
Returns: | set of plain entities with tone |
Raises InvalidEntityError: | if the entity is invalid. |
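Why the back transformation yields a set can be shown with a toy lookup table. The sketch below does not use cjklib: it only contains the four Erlhuah forms quoted in the class description, the plain entities 'ji' and 'jie' and the integer tone numbers are assumed representations, and ValueError stands in for InvalidEntityError.

```python
from collections import defaultdict

# Toy forward table (plain entity, tone number) -> Erlhuah form, limited to
# the four forms quoted in the class description; purely illustrative.
RHOTACISED = {
    ('ji', 1): u'jiel',
    ('jie', 1): u'jie\u2019l',  # written with U+2019 as in the text
    ('ji', 3): u'jieel',
    ('jie', 3): u'jieel',
}

# Invert the table; several base entities may share one Erlhuah form.
BASE_ENTITIES = defaultdict(set)
for (plain, tone), erlhuah in RHOTACISED.items():
    BASE_ENTITIES[erlhuah].add((plain, tone))


def base_entities_for_rhotacised(tonal_entity):
    """Return every (plain entity, tone) pair that yields tonal_entity."""
    if tonal_entity not in BASE_ENTITIES:
        # Stand-in for cjklib's InvalidEntityError.
        raise ValueError('unknown r-coloured entity: %r' % tonal_entity)
    return set(BASE_ENTITIES[tonal_entity])


print(sorted(base_entities_for_rhotacised(u'jieel')))
print(sorted(base_entities_for_rhotacised(u'jiel')))
```

In the first and second tone the two finals stay distinct, so jiel maps back to a single base entity, while jieel maps back to two.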
Gets the tone number of the tone or the etymological tone if it is a neutral or optional neutral tone.
Parameter: | tone (str) – tone |
---|---|
Return type: | int |
Returns: | base tone number |
Raises InvalidEntityError: | if an invalid tone is passed. |
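The idea of reducing neutral and optional neutral tones to their etymological tone number can be sketched as a simple lookup. The tone labels used below are assumed for illustration and may not match the names the operator actually uses; `base_tone` is a hypothetical function and ValueError stands in for InvalidEntityError.

```python
# Assumed tone labels, for illustration only; the operator's real tone
# names may differ.
ORDINALS = {1: '1st', 2: '2nd', 3: '3rd', 4: '4th'}

BASE_TONES = {}
for number, ordinal in ORDINALS.items():
    BASE_TONES[ordinal + 'Tone'] = number                 # plain tone
    BASE_TONES['5thToneEtymological' + ordinal] = number  # neutral tone
    BASE_TONES[ordinal + 'ToneOptional5th'] = number      # optional neutral tone


def base_tone(tone):
    """Return the tone number, or the etymological one for neutral tones."""
    try:
        return BASE_TONES[tone]
    except KeyError:
        # Stand-in for cjklib's InvalidEntityError.
        raise ValueError('invalid tone: %r' % tone)


print(base_tone('3rdTone'))
print(base_tone('5thToneEtymological2nd'))
print(base_tone('4thToneOptional5th'))
```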
Gets a set of full entities supported by the reading excluding abbreviated forms.
Return type: | set of str |
---|---|
Returns: | set of supported syllables |
Gets the set of plain entities supported by this reading, without r-coloured forms (Erlhuah forms). Unlike the entities returned by getReadingEntities(), these carry no tone mark.
Return type: | set of str |
---|---|
Returns: | set of supported syllables |
Gets the r-coloured entity (Erlhuah form) with tone mark for the given plain entity and tone. Not all entity-tone combinations are supported.
Parameters: |
|
---|---|
Return type: | str |
Returns: | entity with appropriate tone |
Raises InvalidEntityError: | if the entity is invalid. |
Raises UnsupportedError: | if the given entity is an Erlhuah form or the syllable is not supported in the given tone. |
Gets the entity with tone mark for the given plain entity and tone. This method only works for plain syllables that are not r-coloured (Erlhuah forms): because of the way GR depicts Erlhuah, the information about the base syllable is lost and the pronunciation partly varies between different syllables. Use getRhotacisedTonalEntity() to get the tonal entity for a given etymological (base) syllable.
Parameters: |
|
---|---|
Return type: | str |
Returns: | entity with appropriate tone |
Raises InvalidEntityError: | if the entity is invalid. |
Raises UnsupportedError: | if the given entity is an Erlhuah form. |
Takes a string written in GR and guesses the reading dialect.
The options 'grRhotacisedFinalApostrophe' and 'grSyllableSeparatorApostrophe' are guessed. Both will be set to the same value which derives from a list of different apostrophes and similar characters.
Parameter: | readingString (str) – GR string |
---|---|
Return type: | dict |
Returns: | dictionary of basic keyword settings |
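A rough sketch of such a guess, independent of cjklib, is given below. The two option names come from the description above; the list of candidate characters, the fallback to the ASCII apostrophe and the function itself are assumptions made for this example.

```python
# Assumed candidates: ASCII apostrophe, right/left single quotation mark,
# acute accent, grave accent and modifier letter apostrophe. The characters
# that cjklib actually checks may differ.
APOSTROPHES = u"'\u2019\u2018\u00b4`\u02bc"


def guess_reading_dialect(reading_string, default=u"'"):
    """Guess the apostrophe options from the first candidate found."""
    found = next((c for c in reading_string if c in APOSTROPHES), default)
    return {
        'grRhotacisedFinalApostrophe': found,
        'grSyllableSeparatorApostrophe': found,
    }


options = guess_reading_dialect(u'sheau-jie\u2019l')
print(options['grRhotacisedFinalApostrophe'] == u'\u2019')
```

Both options receive the same value, mirroring the behaviour described above.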
Returns true if the given entity is an abbreviated spelling.
Case of characters will be handled depending on the setting for option 'case'.
Parameter: | entity (str) – entity to check |
---|---|
Return type: | bool |
Returns: | True if entity is an abbreviated form. |
Checks if the given entity is an r-coloured entity (Erlhuah form).
Parameter: | entity (str) – reading entity |
---|---|
Return type: | bool |
Returns: | True if the given entity is an r-coloured entity, False otherwise. |
Removes apostrophes between two syllables for a given decomposition.
Parameter: | readingEntities (list of str) – list of basic syllables or other content |
---|---|
Return type: | list of str |
Returns: | the given entity list without separating apostrophes |
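The removal step might look like the following sketch, which again does not rely on cjklib. It assumes that the separator is the ASCII apostrophe, that it appears as its own entity in the decomposition, and that syllables are the purely alphabetic entities; the function is hypothetical.

```python
APOSTROPHE = "'"  # assumption: the configured separator is the ASCII apostrophe


def remove_apostrophes(reading_entities):
    """Drop apostrophes that stand directly between two syllables."""
    result = []
    last = len(reading_entities) - 1
    for index, entity in enumerate(reading_entities):
        between_syllables = (
            entity == APOSTROPHE
            and 0 < index < last
            and reading_entities[index - 1].isalpha()
            and reading_entities[index + 1].isalpha())
        if not between_syllables:
            result.append(entity)
    return result


print(remove_apostrophes(['shi', "'", 'an']))
```

Apostrophes at the start or end of the list, or next to non-syllable content, are kept unchanged in this sketch.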
Splits the given plain syllable into consonants-vowels-consonants.
Parameter: | plainSyllable (str) – entity without tonal information |
---|---|
Return type: | tuple of str |
Returns: | syllable CVC triple |
Raises InvalidEntityError: | if the entity is invalid. |
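A consonants-vowels-consonants split can be approximated with a regular expression. The sketch below is not taken from cjklib: it assumes that a, e, i, o, u and y act as vowel letters in plain GR syllables (y covering forms such as shy), and it raises ValueError where the operator would raise InvalidEntityError.

```python
import re

# Assumption: y counts as a vowel letter, all other Latin letters as consonants.
CONSONANTS = '[bcdfghjklmnpqrstvwxz]*'
CVC_PATTERN = re.compile('^(%s)([aeiouy]+)(%s)$' % (CONSONANTS, CONSONANTS))


def split_plain_syllable_cvc(plain_syllable):
    """Split a plain syllable into a (consonants, vowels, consonants) triple."""
    match = CVC_PATTERN.match(plain_syllable.lower())
    if match is None:
        # Stand-in for cjklib's InvalidEntityError.
        raise ValueError('invalid plain syllable: %r' % plain_syllable)
    return match.groups()


print(split_plain_syllable_cvc('jiang'))
print(split_plain_syllable_cvc('shy'))
print(split_plain_syllable_cvc('el'))
```

The pattern only checks the letter structure; it does not verify that the result is a syllable that GR actually uses.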