GROperator — Gwoyeu Romatzyh

cjklib.reading.operator.GROperator is a mature implementation of the Chinese Gwoyeu Romatzyh romanisation (國語羅馬字, often abbreviated GR). Gwoyeu Romatzyh differs from most other romanisation methods in that it encodes Chinese tones using alphabetic characters instead of diacritics or digits.

Features:

  • support of abbreviated forms (zh, j, g, sherm, ...),
  • conversion of abbreviated forms to full forms,
  • placement of apostrophes before 0-initial syllables,
  • support for different apostrophe characters,
  • support for r-coloured syllables (Erlhuah),
  • syllable repetition markers (x, v, vx) and
  • guessing of input form (reading dialect).

Specifics

Tones

Tones are transcribed rigorously, as syllables in the neutral tone additionally carry the original (etymological) tone information. Y.R. Chao also annotates the optional neutral tone (e.g. buh jy˳daw), which can be pronounced with either the neutral tone or the etymological one. Compared to other reading operators for Mandarin, special care has to be taken to cope with these requirements.
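The three marking states can be illustrated with a minimal sketch. It does not use cjklib: the function name classify_tone_mark and the returned labels are made up for illustration. Only the marker characters are taken from this page ('.' for the neutral tone, and the characters listed under OPTIONAL_NEUTRAL_TONE_MARKERS for the optional neutral tone).

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only, not the cjklib implementation.

# '.' marks the neutral tone (e.g. '.geh').
NEUTRAL_MARKER = u'.'
# Optional neutral tone markers, as listed for OPTIONAL_NEUTRAL_TONE_MARKERS.
OPTIONAL_NEUTRAL_MARKERS = (
    u'\u02f3', u'\uff61', u'\uffee', u'\u2080', u'\u2092')


def classify_tone_mark(syllable):
    """Split a leading neutral tone mark off a GR syllable.

    Returns a (label, remainder) pair. The labels are hypothetical names
    chosen for this sketch.
    """
    if syllable.startswith(NEUTRAL_MARKER):
        return ('neutral', syllable[1:])
    if syllable.startswith(OPTIONAL_NEUTRAL_MARKERS):
        return ('optional neutral', syllable[1:])
    return ('full', syllable)
```

In both marked cases the remainder (geh, daw) still spells out the etymological tone, which is the extra information the operator has to keep track of.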

R-colouring

Gwoyeu Romatzyh renders rhotacised syllables (Erlhuah) by trying to give the actual pronunciation. As r-colouring loses the information of the underlying etymological syllable, conversion from the r-coloured form back to the underlying form cannot be done unambiguously. Furthermore, as finals i, iu, in, iun contrast in the first and the second tone but not in the third and the fourth tone, conversion between different tones (including the base form) cannot be made in a general manner: 小鸡儿 sheau-jiel is different from 小街儿 sheau-jie’l, but 几儿 jieel (from jǐ) equals 姐儿 jieel (from jiě), see Chao.

Thus this ReadingOperator lacks the general handling of syllable renderings and many methods narrow the range of syllables allowed. While original forms can carry any tone (even though Mandarin doesn’t make use of some combinations), r-coloured forms for Erlhuah will currently be limited to those given in the source by Y.R. Chao. Those not mentioned there will raise an UnsupportedError.
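The ambiguity can be made concrete with a small sketch built only from the four examples quoted above. It is not the library's data or code: the table, the function names, the use of integer tone numbers and the ValueError (standing in for UnsupportedError) are all assumptions made for illustration.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only, not the cjklib implementation or its data.

# Forward table: (plain entity, tone number) -> r-coloured form, limited to
# the four examples quoted in the text. Integer tones are an assumption.
RHOTACISED_FORMS = {
    (u'ji', 1): u'jiel',          # as in sheau-jiel
    (u'jie', 1): u'jie\u2019l',   # as in sheau-jie'l
    (u'ji', 3): u'jieel',         # from the third tone of ji
    (u'jie', 3): u'jieel',        # from the third tone of jie
}


def rhotacise(plain_entity, tone):
    """Return the r-coloured form, or fail for combinations not in the table."""
    try:
        return RHOTACISED_FORMS[(plain_entity, tone)]
    except KeyError:
        raise ValueError('unsupported: %r in tone %r' % (plain_entity, tone))


def base_entities(rhotacised_form):
    """Return every (plain entity, tone) pair that yields the given form."""
    return set(key for key, form in RHOTACISED_FORMS.items()
               if form == rhotacised_form)
```

Here base_entities(u'jieel') yields two candidates, while the two first-tone forms stay distinct, which mirrors why the back transformation has to return a set rather than a single syllable.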

Abbreviations

Yuen Ren Chao includes several abbreviated forms in his books (see the sources below). For example, 個/个, which would be fully transcribed as .geh or ˳geh, is abbreviated as g. These forms can be accessed by getAbbreviatedForms() and getAbbreviatedFormData(), and their usage can be controlled by the option 'abbreviations'. Use the GRDialectConverter to convert these abbreviations into their full forms:

>>> from cjklib.reading import ReadingFactory
>>> f = ReadingFactory()
>>> f.convert('Hairtz', 'GR', 'GR', breakUpAbbreviated='on')
u'Hair.tzy'

Repetition markers

Special abbreviated forms are given in the form of repetition markers. These take the form x and v, or the combination vx, for repetition of the last syllable, the second last syllable, or both, e.g. shie.x for shie.shie, deengiv for deengideeng and duey .le vx for duey .le duey .le. Both forms can be preceded by a neutral tone mark, e.g. .x or ˳v.
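The expansion rule can be sketched as follows. This is a simplified illustration working on an already segmented list of syllables, not the GRDialectConverter code. The function name is hypothetical, and the handling of a tone mark on the marker (it replaces any mark on the repeated syllable) is an assumption.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only, not the cjklib implementation.

NEUTRAL_MARKS = (u'.', u'\u02f3')  # neutral and optional neutral tone marks
# Marker -> positions (counted from the end) of the syllables to repeat.
REPETITION_MARKERS = {u'x': (-1,), u'v': (-2,), u'vx': (-2, -1)}


def expand_repetition_markers(syllables):
    """Replace x, v and vx markers in a syllable list by full syllables."""
    expanded = []
    for syllable in syllables:
        mark = syllable[0] if syllable.startswith(NEUTRAL_MARKS) else u''
        body = syllable[len(mark):]
        if body not in REPETITION_MARKERS:
            expanded.append(syllable)
            continue
        positions = REPETITION_MARKERS[body]
        if len(expanded) < -min(positions):
            raise ValueError('nothing to repeat for marker %r' % syllable)
        # Look the sources up before appending, so 'vx' sees the original order.
        sources = [expanded[position] for position in positions]
        for source in sources:
            if mark:
                # Assumption: a tone mark on the marker replaces any mark
                # already present on the repeated syllable.
                source = mark + source.lstrip(u''.join(NEUTRAL_MARKS))
            expanded.append(source)
    return expanded
```

Applied to the examples above, ['shie', '.x'] becomes ['shie', '.shie'] and ['duey', '.le', 'vx'] becomes ['duey', '.le', 'duey', '.le'].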

Sources

  • Yuen Ren Chao: A Grammar of Spoken Chinese. University of California Press, Berkeley, 1968, ISBN 0-520-00219-9.
  • Yuen Ren Chao: Mandarin Primer: an intensive course in spoken Chinese. Harvard University Press, Cambridge, 1948.

See also

  • GR Junction, by Richard Warmington
  • A Guide to Gwoyeu Romatzyh Tonal Spelling of Chinese, overview article
  • Gwoyeu Romatzyh, article on the English Wikipedia

Class

class cjklib.reading.operator.GROperator(**options)

Bases: cjklib.reading.operator.TonalRomanisationOperator

Provides an operator for the Mandarin Gwoyeu Romatzyh romanisation.

Todo

  • Impl: Initial, medial, head, ending (ending1, ending2=l?)
  • Lang: Y.R. Chao uses particle and interjection ㄝ è. For more see ‘Mandarin Primer’, Vocabulary and Index, pp. 301.
  • Impl: Implement Erhua forms as stated in W. Simon: A Beginner’s Chinese-English Dictionary.
  • Impl: Implement a GRIPAConverter once IPA values are obtained for the PinyinIPAConverter. GRIPAConverter can work around missing Erhua conversion to Pinyin.
  • Lang: Special rule for non-Chinese names with initial r- to be transcribed with an r- cited by Ching-song Gene Hsiao: A Manual of Transcription Systems For Chinese, 中文拼音手册. Far Eastern Publications, Yale University, New Haven, Connecticut, 1985, ISBN 0-88710-141-0.
Parameters:
  • options – extra options
  • dbConnectInst – instance of a DatabaseConnector, if none is given, default settings will be assumed.
  • strictSegmentation – if True, segmentation (using segment()) and thus decomposition (using decompose()) will raise an exception if an alphabetic string is parsed which cannot be segmented into single reading entities. If False, the aforesaid string will be returned unsegmented.
  • case – if set to 'lower', only lower case will be supported, if set to 'both' a mix of upper and lower case will be supported.
  • abbreviations – if set to True abbreviated spellings will be supported.
  • grRhotacisedFinalApostrophe – an alternate apostrophe that is taken instead of the default one for marking a longer and back vowel in rhotacised finals.
  • grSyllableSeparatorApostrophe – an alternate apostrophe that is taken instead of the default one for separating 0-initial syllables from preceding ones.
  • optionalNeutralToneMarker – character to use for marking the optional neutral tone. Only values given in OPTIONAL_NEUTRAL_TONE_MARKERS are allowed.
APOSTROPHE_LIST
List of apostrophes used in guessing routine.
DB_RHOTACISED_FINAL_APOSTROPHE
Default apostrophe used by GR syllable data in database for marking the longer and back vowel in rhotacised finals.
DB_RHOTACISED_FINAL_MAPPING
Database fields for tonal Erlhuah syllables.
DB_RHOTACISED_FINAL_MAPPING_ZEROINITIAL
Database fields for tonal Erlhuah syllables with i, u and iu medials.
OPTIONAL_NEUTRAL_TONE_MARKERS
List of allowed optional neutral tone markers: ˳ (U+02F3), 。 (U+FF61), ○ (U+FFEE), ₀ (U+2080), ₒ (U+2092)
SYLLABLE_STRUCTURE
Regular expression describing the plain syllable structure in GR (C,V,C).
compose(readingEntities)

Composes the given list of basic entities to a string. Applies an apostrophe between syllables if the second syllable has a zero-initial.

Parameter:readingEntities (list of str) – list of basic syllables or other content
Return type:str
Returns:composed entities
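
The apostrophe rule can be sketched in a few lines. This is a simplified illustration, not the library's compose(): the function name is hypothetical, and it assumes that a zero-initial syllable can be recognised by a leading vowel letter and that the preceding entity counts as a syllable if it ends in a letter.

```python
# Illustrative sketch only, not the cjklib implementation.

# Assumption: a leading vowel letter signals a zero-initial syllable.
VOWELS = frozenset(u'aeiou')


def compose_with_apostrophes(entities, apostrophe=u"'"):
    """Join entities, separating zero-initial syllables with an apostrophe."""
    parts = []
    for entity in entities:
        # Only separate from a preceding syllable, not from spaces or marks.
        follows_syllable = bool(parts) and parts[-1][-1:].isalpha()
        zero_initial = entity[:1].lower() in VOWELS
        if follows_syllable and zero_initial:
            parts.append(apostrophe)
        parts.append(entity)
    return u''.join(parts)
```

The apostrophe argument plays the role that the option grSyllableSeparatorApostrophe has in the operator: passing a different character changes the separator without touching the rule itself.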
getAbbreviatedEntities(*args, **kwargs)

Gets a list of abbreviated GR entities. This returns single entities from getAbbreviatedForms() and only returns those that don’t also exist as full forms. Includes repetition markers x and v.

Returned entities are in lowercase.

Return type:list
Returns:list of abbreviated GR forms
getAbbreviatedFormData(entities)

Gets table of abbreviated entities including the traditional Chinese characters, original spelling and specialised information.

Some abbreviated syllables come with additional information:

  • 'T', the abbreviated form shortens the tonal information,
  • 'S', the abbreviated form shows a tone sandhi,
  • 'I', the full spelling is a non-standard pronunciation, or another mapping, that can be ignored,
  • 'F', the abbreviated entity or entities also exist(s) as a full form (as full forms).
Example:
>>> from cjklib.reading import operator
>>> gr = operator.GROperator()
>>> gr.getAbbreviatedFormData(['yi'])
[(u'一', [u'i'], set([u'S', u'T']))]
Parameter:entities (list of str) – abbreviated entities for which information is returned
Return type:list
Returns:list of full spellings, Chinese character string and specialised information

Todo

  • Lang: tz is currently mapped to .tzy. Character 子 though generally has 3rd tone, which then should be tzyy or .tzyy. See ‘A Grammar of Spoken Chinese’, p. 36 (“-.tzy (which we abbreviate as -tz)”) and p. 55 (“suffix -tz (<tzyy)”)
getAbbreviatedForms(*args, **kwargs)

Gets a list of abbreviated forms used in GR.

The returned list consists of tuples of one or more possibly abbreviated reading entities in lowercase. See getAbbreviatedFormData() on how to get more information on these forms.

Return type:list
Returns:a list of abbreviated forms
getBaseEntitiesForRhotacised(tonalEntity)

Gets a list of base entities as plain entity/tone pair for a given r-coloured entity (Erlhuah form).

This is the counterpart of getRhotacisedTonalEntity(). As different syllables can share the same rhotacised form, the forward mapping is not injective and the back transformation can yield several base entities.

Parameter:tonalEntity (str) – r-coloured entity
Return type:set of tuple
Returns:set of plain entities with tone
Raises InvalidEntityError:
 if the entity is invalid.
getBaseTone(tone)

Gets the tone number of the tone or the etymological tone if it is a neutral or optional neutral tone.

Parameter:tone (str) – tone
Return type:int
Returns:base tone number
Raises InvalidEntityError:
 if an invalid tone is passed.
classmethod getDefaultOptions()
getFormattingEntities(*args, **kwargs)
getFullReadingEntities(*args, **kwargs)

Gets a set of full entities supported by the reading excluding abbreviated forms.

Return type:set of str
Returns:set of supported syllables
getPlainReadingEntities(*args, **kwargs)

Gets the set of plain entities supported by this reading, without r-coloured forms (Erlhuah forms). Unlike getReadingEntities(), the entities carry no tone mark.

Return type:set of str
Returns:set of supported syllables
getReadingCharacters(*args, **kwargs)
getReadingEntities(*args, **kwargs)
getRhotacisedTonalEntity(plainEntity, tone)

Gets the r-coloured entity (Erlhuah form) with tone mark for the given plain entity and tone. Not all entity-tone combinations are supported.

Parameters:
  • plainEntity (str) – entity without tonal information
  • tone (str) – tone
Return type:str
Returns:entity with appropriate tone
Raises InvalidEntityError:
 if the entity is invalid.
Raises UnsupportedError:
 if the given entity is an Erlhuah form or the syllable is not supported in this given tone.

getTonalEntity(plainEntity, tone)

Gets the entity with tone mark for the given plain entity and tone. This method only works for plain syllables that are not r-coloured (Erlhuah forms): because of the way GR depicts Erlhuah, the information about the base syllable is lost and pronunciation partly varies between different syllables. Use getRhotacisedTonalEntity() to get the tonal entity for a given etymological (base) syllable.

Parameters:
  • plainEntity (str) – entity without tonal information
  • tone (str) – tone
Return type:str
Returns:entity with appropriate tone
Raises InvalidEntityError:
 if the entity is invalid.
Raises UnsupportedError:
 if the given entity is an Erlhuah form.

getTones(*args, **kwargs)
classmethod guessReadingDialect(readingString)

Takes a string written in GR and guesses the reading dialect.

The options 'grRhotacisedFinalApostrophe' and 'grSyllableSeparatorApostrophe' are guessed. Both will be set to the same value which derives from a list of different apostrophes and similar characters.

Parameter:readingString (str) – GR string
Return type:dict
Returns:dictionary of basic keyword settings
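
A guessing routine of this kind can be sketched as a simple frequency count. The sketch below is not the library's implementation: the function name and the candidate list are hypothetical (the actual APOSTROPHE_LIST may contain other characters), and only the two option names are taken from this page.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only, not the cjklib implementation.

# Hypothetical candidate list; the library's APOSTROPHE_LIST may differ.
CANDIDATE_APOSTROPHES = [u"'", u'\u2019', u'\u2018', u'\u00b4', u'`']


def guess_apostrophe_options(reading_string):
    """Guess both apostrophe options from the most frequent candidate."""
    # max() returns the first candidate on ties, so the plain "'" is the
    # fallback when the string contains none of the candidates.
    guessed = max(CANDIDATE_APOSTROPHES, key=reading_string.count)
    return {
        'grRhotacisedFinalApostrophe': guessed,
        'grSyllableSeparatorApostrophe': guessed,
    }
```

As in the description above, both options receive the same value. Telling them apart would need a look at the character following each apostrophe.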

Todo

  • Impl: Both options 'grRhotacisedFinalApostrophe' and 'grSyllableSeparatorApostrophe' can be set independently as the former one should only be found before an l and the latter mostly before vowels.
isAbbreviatedEntity(entity)

Returns true if the given entity is an abbreviated spelling.

Case of characters will be handled depending on the setting for option 'case'.

Parameter:entity (str) – entity to check
Return type:bool
Returns:True if entity is an abbreviated form.
isReadingEntity(entity)
isRhotacisedReadingEntity(entity)

Checks if the given entity is an r-coloured entity (Erlhuah form).

Parameter:entity (str) – reading entity
Return type:bool
Returns:True if the given entity is an r-coloured entity, False otherwise.
isStrictDecomposition(readingEntities)
removeApostrophes(readingEntities)

Removes apostrophes between two syllables for a given decomposition.

Parameter:readingEntities (list of str) – list of basic syllables or other content
Return type:list of str
Returns:the given entity list without separating apostrophes
splitEntityTone(entity)
splitPlainSyllableCVC(plainSyllable)

Splits the given plain syllable into consonants-vowels-consonants.

Parameter:plainSyllable (str) – entity without tonal information
Return type:tuple of str
Returns:syllable CVC triple
Raises InvalidEntityError:
 if the entity is invalid.
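
A consonants-vowels-consonants split of this kind can be sketched with a regular expression. The pattern below is a hypothetical simplification and not the library's SYLLABLE_STRUCTURE: it assumes plain lowercase-able syllables, treats y as a vowel (as in jy), and raises ValueError where the operator would raise InvalidEntityError.

```python
# Illustrative sketch only, not the cjklib implementation.
import re

# Hypothetical pattern: consonant letters, then vowel letters (y counted as
# a vowel, as in 'jy'), then consonant letters.
CVC_PATTERN = re.compile(
    r'^([b-df-hj-np-tv-xz]*)([aeiouy]+)([b-df-hj-np-tv-xz]*)\Z')


def split_plain_syllable_cvc(plain_syllable):
    """Split a plain syllable into a (consonants, vowels, consonants) triple."""
    match = CVC_PATTERN.match(plain_syllable.lower())
    if match is None:
        raise ValueError('not a plain syllable: %r' % plain_syllable)
    return match.groups()
```

For instance, jiang splits into j, ia and ng, while a zero-initial syllable such as an yields an empty first element.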
