cjklib.reading.operator.CantoneseYaleOperator is a mature implementation of the Yale transcription for Cantonese. It’s one of the major romanisations used for Cantonese and frequently found in education.
Features:
Yale distinguishes two tones often subsumed under one: the high level tone, with tone contour 55 in the commonly used pitch model by Yuen Ren Chao, and the high falling tone, given as pitch 53 (Chao), 52 or 51 (Bauer and Benedict, chapter 2.1.1, p. 115). Many sources state that these two tones are no longer distinguishable in modern Hong Kong Cantonese, which is why some romanisation systems for Cantonese subsume them under one tone.
The abbreviated form of the Yale romanisation, which uses numbers to represent tones, does not make this distinction. The user can choose whether tone number 1 maps to the high level or the high falling tone; this choice matters whenever a conversion involves the abbreviated form. By default the high level tone is used, as the given sources indicate this as the primary use.
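The effect of this mapping can be sketched as follows. This is a minimal illustration, not cjklib code: the function `resolve_tone_number`, the keyword `tone1_target` and the tone labels are hypothetical names chosen for the example.

```python
# Illustrative sketch only: these names are hypothetical, not cjklib's API.
HIGH_LEVEL = "high level"
HIGH_FALLING = "high falling"

# Tone numbers 2 to 6 each denote exactly one tone.
UNAMBIGUOUS_TONES = {
    2: "high rising",
    3: "mid level",
    4: "low falling",
    5: "low rising",
    6: "low level",
}


def resolve_tone_number(number, tone1_target=HIGH_LEVEL):
    """Return the tone label for a Yale tone number.

    Tone number 1 is ambiguous, so the caller decides which of the two
    high tones it stands for; the high level tone is the default.
    """
    if tone1_target not in (HIGH_LEVEL, HIGH_FALLING):
        raise ValueError("tone 1 must map to the high level or high falling tone")
    if number == 1:
        return tone1_target
    if number not in UNAMBIGUOUS_TONES:
        raise ValueError("invalid tone number: %r" % (number,))
    return UNAMBIGUOUS_TONES[number]
```

Calling `resolve_tone_number(1)` yields the high level tone, while `resolve_tone_number(1, tone1_target=HIGH_FALLING)` selects the high falling tone instead.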
Tone marks, when the standard form with diacritics is used, are placed according to Cantonese Yale rules (see getTonalEntity()). By default, however, the CantoneseYaleOperator tries to work around misplaced tone marks to ease handling of malformed input. In some cases this lenient behaviour leads to a different segmentation than the strict interpretation would, and no means are implemented to disambiguate between the two solutions. The general behaviour is controlled with the option 'strictDiacriticPlacement'.
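One way to picture the lenient handling of a single syllable is to lift the tone mark off whatever letter carries it and put it back in the expected position. The sketch below assumes that the mark belongs on the first vowel letter of the syllable; this rule, the function name `repair_tone_mark_placement` and the restriction to syllables with a vowel nucleus are assumptions made for illustration and do not describe how cjklib implements the option.

```python
import unicodedata

# Combining macron, grave accent and acute accent, as used for Yale tones.
TONE_MARKS = ("\u0304", "\u0300", "\u0301")
VOWELS = "aeiou"


def repair_tone_mark_placement(syllable):
    """Move a misplaced tone mark onto the first vowel of one syllable.

    Hypothetical helper: assumes the syllable has a vowel nucleus and
    carries at most one tone mark.
    """
    # Decompose so that every tone mark is a separate combining character.
    decomposed = unicodedata.normalize("NFD", syllable)
    marks = [char for char in decomposed if char in TONE_MARKS]
    if len(marks) > 1:
        raise ValueError("more than one tone mark in %r" % (syllable,))
    bare = "".join(char for char in decomposed if char not in TONE_MARKS)
    if not marks:
        return unicodedata.normalize("NFC", bare)
    for index, char in enumerate(bare):
        if char.lower() in VOWELS:
            # Re-attach the mark directly after the first vowel letter.
            repaired = bare[:index + 1] + marks[0] + bare[index + 1:]
            return unicodedata.normalize("NFC", repaired)
    raise ValueError("no vowel found in %r" % (syllable,))
```

A syllable that is already well formed passes through unchanged, so the helper can be applied to any single-syllable input.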
Bases: cjklib.reading.operator.TonalRomanisationOperator
Provides an operator for the Cantonese Yale romanisation. For conversion between different representations the CantoneseYaleDialectConverter can be used.
Parameters: |
|
---|
Mapping of tone name to representation per tone mark type. Each representation includes a diacritic mark and, optionally, the letter ‘h’ marking a low tone.
The 'internal' dialect is used for conversion between different forms of Cantonese Yale. Conversion to the other dialects can lose information (Diacritics: missing tone; Numbers: distinction between high level and high falling; None: no tones at all). Conversion to the 'internal' dialect retains all information, so it can be used as a standard target reading.
Splits the given plain syllable into onset (initial), nucleus and coda, the latter two forming the rhyme (final).
The syllabic nasals m, ng will be returned as coda. Syllables yu, yun, yut will fall into (y, yu, ), (y, yu, n) and (y, yu, t).
Returned strings will be lowercase.
Parameter: | plainSyllable (str) – syllable in the Yale romanisation system without tone marks |
---|---|
Return type: | tuple of str |
Returns: | tuple of syllable onset, nucleus and coda |
Raises InvalidEntityError: | if the entity is invalid (e.g. syllable nucleus or tone invalid). |
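The documented behaviour can be approximated with a regular expression. The sketch below is a simplified illustration, not the library's implementation: the onset, nucleus and coda inventories are assumptions, impossible combinations are not rejected, and a plain `ValueError` stands in for `InvalidEntityError`.

```python
import re

# Assumed, simplified inventories; longer alternatives are listed first so
# that e.g. "ng" is preferred over "n" and "aa" over "a".
ONSETS = "ng|gw|kw|ch|b|p|m|f|d|t|n|l|g|k|h|w|j|s|y"
NUCLEI = "aa|a|eu|e|i|o|yu|u"
CODAS = "ng|i|u|m|n|p|t|k"
SYLLABLE_RE = re.compile(r"^(%s)?(%s)(%s)?$" % (ONSETS, NUCLEI, CODAS))


def split_onset_nucleus_coda(plain_syllable):
    """Split a toneless Yale syllable into (onset, nucleus, coda)."""
    syllable = plain_syllable.lower()
    # Syllabic nasals have neither onset nor nucleus: return them as coda.
    if syllable in ("m", "ng"):
        return ("", "", syllable)
    match = SYLLABLE_RE.match(syllable)
    if match is None:
        raise ValueError("invalid syllable: %r" % (plain_syllable,))
    onset, nucleus, coda = (group or "" for group in match.groups())
    # Documented special case: yu, yun and yut are reported with onset "y"
    # and nucleus "yu".
    if onset == "y" and nucleus == "u" and coda in ("", "n", "t"):
        nucleus = "yu"
    return (onset, nucleus, coda)
```

Under these assumptions `split_onset_nucleus_coda("yun")` gives `("y", "yu", "n")` and `split_onset_nucleus_coda("ng")` gives `("", "", "ng")`, matching the cases described above.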
Splits the given plain syllable into onset (initial) and rhyme (final).
The syllabic nasals m, ng will be returned as final. Syllables yu, yun, yut will fall into (y, yu), (y, yun) and (y, yut).
Returned strings will be lowercase.
Parameter: | plainSyllable (str) – syllable without tone marks |
---|---|
Return type: | tuple of str |
Returns: | tuple of entity onset and rhyme |
Raises InvalidEntityError: | if the entity is invalid. |
Takes a string written in Cantonese Yale and guesses the reading dialect.
Currently only the option 'toneMarkType' is guessed. Unless 'includeToneless' is set to True, only the tone mark types 'diacritics' and 'numbers' are considered, as the latter can also represent the state of missing tones.
Parameters: |
|
---|---|
Return type: | dict |
Returns: | dictionary of basic keyword settings |
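A guess of this kind can be sketched by counting tone evidence in the string. The heuristic below is an assumption made for illustration and is not taken from cjklib: the function name `guess_tone_mark_type`, the keyword `include_toneless` and the value `'none'` for toneless text are hypothetical.

```python
import re
import unicodedata

# Combining macron, grave and acute: the diacritics used for Yale tones.
TONE_MARK_RE = re.compile("[\u0304\u0300\u0301]")
# A letter directly followed by a tone number from 1 to 6.
TONE_NUMBER_RE = re.compile(r"[a-z][1-6]", re.IGNORECASE)


def guess_tone_mark_type(text, include_toneless=False):
    """Guess how tones are written in a Cantonese Yale string."""
    # Decompose first so precomposed letters expose their combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    diacritic_hits = len(TONE_MARK_RE.findall(decomposed))
    number_hits = len(TONE_NUMBER_RE.findall(decomposed))
    if diacritic_hits > number_hits:
        guess = "diacritics"
    elif number_hits > 0 or not include_toneless:
        # Numbers can also represent missing tones, so this is the fallback
        # whenever toneless text is not considered separately.
        guess = "numbers"
    else:
        guess = "none"
    return {"toneMarkType": guess}
```

With this sketch a string without any tone information is reported as 'numbers' unless `include_toneless=True` is passed, which mirrors the rule stated above.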
Checks if the given plain syllable can occur with stop tones, which is the case for syllables with unreleased finals.
Parameter: | plainEntity (str) – entity without tonal information |
---|---|
Return type: | bool |
Returns: | True if given syllable can occur with stop tones, False otherwise |
Checks if the given plain entity and tone combination is valid.
Only syllables with unreleased finals occur with stop tones; all other syllables must not (see hasStopTone()).
Parameters: |
|
---|---|
Return type: | bool |
Returns: | True if given combination is valid, False otherwise |
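The rule can be illustrated with a small stand-alone check. The sketch assumes that the unreleased finals are the stops p, t and k, and it uses hypothetical tone labels rather than cjklib's own tone names; it encodes only the restriction stated above and does not constrain the remaining tones.

```python
# Hypothetical tone labels for illustration; cjklib's tone names may differ.
STOP_TONES = frozenset(["high stop", "mid stop", "low stop"])
OTHER_TONES = frozenset([
    "high level", "high falling", "high rising", "mid level",
    "low falling", "low rising", "low level",
])


def has_stop_tone(plain_entity):
    """Return True if the syllable ends in an unreleased final (p, t, k)."""
    return plain_entity.lower().endswith(("p", "t", "k"))


def is_tone_valid(plain_entity, tone):
    """Check a syllable and tone combination against the stop tone rule."""
    if tone in STOP_TONES:
        # Stop tones require an unreleased final.
        return has_stop_tone(plain_entity)
    if tone in OTHER_TONES:
        # The stated rule places no restriction on the remaining tones.
        return True
    raise ValueError("unknown tone: %r" % (tone,))
```

For example, `is_tone_valid("sik", "high stop")` is True, whereas `is_tone_valid("si", "high stop")` is False because "si" has no unreleased final.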
Splits the entity into an entity without tone mark and the entity’s tone index.
The plain entity returned will always be in Unicode’s Normalization Form C (NFC, see http://www.unicode.org/reports/tr15/).
Parameter: | entity (str) – entity with tonal information |
---|---|
Return type: | tuple |
Returns: | plain entity without tone mark and entity’s tone index (starting with 1) |
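For the form with diacritics, the split can be sketched with Python's `unicodedata` module: decompose the entity, lift off the combining tone mark and the low tone marker ‘h’, and recompose the remainder to NFC. The sketch below is an illustration under assumptions, not the library's code: it returns a descriptive tone label instead of a tone index, it assumes that an ‘h’ directly after the vowel letters (or after a syllabic nasal) marks a low tone, and all names are hypothetical.

```python
import re
import unicodedata

MACRON, GRAVE, ACUTE = "\u0304", "\u0300", "\u0301"
TONE_MARKS = (MACRON, GRAVE, ACUTE)

# (tone mark, low tone marker present) -> hypothetical tone label.
TONES = {
    (MACRON, False): "high level",
    (GRAVE, False): "high falling",
    (ACUTE, False): "high rising",
    (None, False): "mid level",
    (GRAVE, True): "low falling",
    (ACUTE, True): "low rising",
    (None, True): "low level",
}

# Letters up to the last vowel, optional low tone marker "h", final consonants.
VOWEL_SYLLABLE_RE = re.compile(r"^([^aeiou]*[aeiou]+)(h?)([^aeiou]*)$", re.IGNORECASE)
# Syllabic nasals carry the low tone marker at the end.
NASAL_SYLLABLE_RE = re.compile(r"^(m|ng)(h?)()$", re.IGNORECASE)


def split_entity_tone(entity):
    """Return (plain entity in NFC, tone label) for one Yale syllable."""
    decomposed = unicodedata.normalize("NFD", entity)
    marks = [char for char in decomposed if char in TONE_MARKS]
    if len(marks) > 1:
        raise ValueError("more than one tone mark in %r" % (entity,))
    bare = "".join(char for char in decomposed if char not in TONE_MARKS)
    match = VOWEL_SYLLABLE_RE.match(bare) or NASAL_SYLLABLE_RE.match(bare)
    if match is None:
        raise ValueError("cannot analyse %r" % (entity,))
    head, low_marker, coda = match.groups()
    key = (marks[0] if marks else None, bool(low_marker))
    if key not in TONES:
        raise ValueError("invalid tone marking in %r" % (entity,))
    # The plain entity is always returned in Normalization Form C.
    return unicodedata.normalize("NFC", head + coda), TONES[key]
```

Because the input is decomposed first, precomposed and decomposed spellings of the same syllable produce the same result.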
Regex to split a string in NFD into several syllables in a crude way. The regular expression works for both diacritical and number tone marks. It consists of: