To do

Todo

  • Lang: When the same radical occurs several times (possibly in different forms), which one should be chosen? Implement a way to turn down unwanted forms.

(The original entry is located in library/cjklib.build.builder.rst, line 55 and can be found here.)

Todo

  • Lang: Find and implement a good algorithm to turn down unwanted forms; don’t just choose a random one. See the following list:

    >>> from cjklib import characterlookup
    >>> cjk = characterlookup.CharacterLookup('T')
    >>> for char in cjk.db.selectSoleValue('CharacterRadicalResidualStrokeCount',
    ...     'ChineseCharacter', distinctValues=True):
    ...     try:
    ...         entries = cjk.getCharacterKangxiRadicalResidualStrokeCount(char, 'C')
    ...         lastEntry = entries[0]
    ...         for entry in entries[1:]:
    ...             # print if diff. radical forms and diff. residual stroke count
    ...             if lastEntry[0] != entry[0] and lastEntry[2] != entry[2]:
    ...                 print char
    ...                 break
    ...             lastEntry = entry
    ...     except:
    ...         pass
    ...
    渌
    犾
    玺
    珏
    缧
    >>> cjk.getCharacterKangxiRadicalResidualStrokeCount(u'缧')
    [(u'糸', 0, u'⿻', 0, 8), (u'纟', 0, u'⿰', 0, 11)]
    

(The original entry is located in library/cjklib.build.builder.rst, line 53 and can be found here.)

Todo

  • Fix: Optimize insert: use a transaction that disables autocommit, and consider passing all data at once, which requires proper handling of row indices.

(The original entry is located in library/cjklib.build.builder.rst, line 12 and can be found here.)
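
The insert optimisation noted in the entry above could look roughly like the following. This is a minimal sketch using Python's standard sqlite3 module in place of cjklib's own database layer; the table name Entries, the helper bulk_insert and the sample rows are hypothetical. Row indices are assigned explicitly because all rows are passed at once.

```python
import sqlite3


def bulk_insert(connection, rows):
    """Insert all rows in a single transaction with explicit row indices.

    Hypothetical helper: 'Entries' is an illustrative table, not a cjklib one.
    """
    # Assign row indices up front, since rows are no longer inserted one by one.
    indexed = [(index, headword, reading)
               for index, (headword, reading) in enumerate(rows, start=1)]
    # The connection context manager commits once on success and rolls
    # back on error, so no per-row commit takes place.
    with connection:
        connection.executemany(
            "INSERT INTO Entries (RowIndex, Headword, Reading) VALUES (?, ?, ?)",
            indexed)
    return len(indexed)


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Entries "
    "(RowIndex INTEGER PRIMARY KEY, Headword TEXT, Reading TEXT)")
inserted = bulk_insert(connection, [("你好", "ni3 hao3"), ("謝謝", "xie4 xie5")])
count = connection.execute("SELECT COUNT(*) FROM Entries").fetchone()[0]
```

Whether this pays off in practice depends on the database backend and would need measuring.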

Todo

  • Impl: Check if all glyphs in LocaleCharacterGlyph are included.

(The original entry is located in library/cjklib.build.builder.rst, line 9 and can be found here.)

Todo

  • Impl: For implementation as view, we need the concept of runtime dependency. All DEPENDS are actually BUILD_DEPENDS, while the DEPENDS here will be a runtime dependency.

(The original entry is located in library/cjklib.build.builder.rst, line 10 and can be found here.)

Todo

  • Fix: Word regex is specialised for HanDeDict.
  • Fix: Using a row_id for joining instead of Headword(Traditional) and Reading might speed up table joins. It needs a workaround to include multiple rows for one actual headword entry, though.

(The original entry is located in library/cjklib.build.builder.rst, line 13 and can be found here.)

Todo

  • bug: “Prefer” system does not work for additional builders

(The original entry is located in library/cjklib.build.cli.rst, line 6 and can be found here.)

Todo

  • Impl: Incorporate stroke lookup (bigram) techniques.

  • Impl: How to handle character forms (either decomposition or stroke order) that can only be found as a component in other characters? We already mark them by flagging them with an ‘S’.

  • Impl: Add option to component decomposition methods to stop on Kangxi radical forms without breaking further down beyond those.

  • Impl: Further character domains for Japanese, Cantonese, Korean, Vietnamese

  • Impl: There are more than 800 characters that have compatibility mappings whose targets have the same semantics. Those characters do not need their own data for stroke order and decomposition, but can share it with their targets:

    >>> import unicodedata
    >>> unicodedata.normalize('NFD', u'嗀')
    u'嗀'
    

(The original entry is located in library/cjklib.characterlookup.rst, line 9 and can be found here.)
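
To illustrate the compatibility mapping item in the entry above, the following Python 3 sketch uses canonical decomposition to find the character whose data a compatibility ideograph could share. The helper name shared_target is hypothetical, and treating a single-character NFD result as the sharing target is an assumption of this sketch, not an existing cjklib rule.

```python
import unicodedata


def shared_target(char):
    """Return the character whose data could be shared, or None.

    Hypothetical helper: a character qualifies if canonical decomposition
    (NFD) maps it to a single, different character.
    """
    decomposed = unicodedata.normalize("NFD", char)
    if decomposed != char and len(decomposed) == 1:
        return decomposed
    return None


# U+F900 is a CJK compatibility ideograph that maps to U+8C48.
target = shared_target("\uf900")
# An ordinary unified ideograph has no such mapping.
no_target = shared_target("\u4e2d")
```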

Todo

  • Lang: Clarify on characters classified under a given radical but without any proper radical glyph found as component.
  • Lang: Clarify on different radical glyphs for the same radical form. At best this method should return one and only one radical form (glyph).
  • Impl: Give the Unicode radical form, not the equivalent character form, in the relevant table so that the pure radical form is always returned (this also avoids duplicates). Then state: If the included component has an appropriate Unicode radical form or Unicode radical variant, then this form is returned. In either case the radical form can be an ordinary character.

(The original entry is located in library/cjklib.characterlookup.rst, line 309 and can be found here.)

Todo

  • Docu: Write about different kinds of variants
  • Impl: Give a source on variant information as information can contradict itself (http://www.unicode.org/reports/tr38/tr38-5.html#N10211). See 呆 (U+5446) which has one form each for semantic and specialised semantic, each derived from a different source. Change also in getAllCharacterVariants().
  • Lang: What is the difference between Z-variants and compatible variants? Some links between two characters are bidirectional, some are not. Is there any rule?

(The original entry is located in library/cjklib.characterlookup.rst, line 421 and can be found here.)

Todo

  • Impl: Table of same character glyphs, including special radical forms (e.g. 言 and 訁).
  • Data: Adopt locale-dependent glyph for parent characters (e.g. 鬼 in 隗 愧 嵬).
  • Data: Use radical forms and radical variant forms instead of equivalent characters in decomposition data. Mapping loses information.
  • Lang: By default we get the equivalent character for a radical form. In some cases these equivalent characters will be only abstractly related to the given radical form (e.g. being the main radical form), so that the result set will be too big and doesn’t reflect the original query. Set up a table including only strict visual relations between radical forms and equivalent characters. Alternatively restrict decomposition data to only include radical forms if appropriate, so there would be no need for conversion.
  • Fix: Radical equivalent forms should be included independent of the chosen locale. E.g. u’⻔’ for u’门’.

(The original entry is located in library/cjklib.characterlookup.rst, line 459 and can be found here.)

Todo

  • Docu: Write about how Unihan maps characters to a Kangxi radical. Especially Chinese simplified characters.
  • Lang: 6954 characters have no Kangxi radical. Provide integration for these (SELECT COUNT(*) FROM Unihan WHERE kRSUnicode IS NOT NULL AND kRSKangxi IS NULL;).

(The original entry is located in library/cjklib.characterlookup.rst, line 515 and can be found here.)

Todo

  • Lang: Check if radicals for which multiple radical forms exist include a simplified form or other variation (e.g. ⻆, ⻝, ⺐). There are radicals for which a Chinese simplified character equivalent exists and that is mapped to a different radical under Unicode.

(The original entry is located in library/cjklib.characterlookup.rst, line 668 and can be found here.)

Todo

  • Lang: Narrow locales, not all variant forms are valid under all locales.

(The original entry is located in library/cjklib.characterlookup.rst, line 727 and can be found here.)

Todo

  • Impl: Add option to return converted entities even if conversion fails for some entities. Represent those with None.

(The original entry is located in library/cjklib.characterlookup.rst, line 801 and can be found here.)
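
The option described in the entry above could behave like the following minimal sketch. Both the wrapper convert_entities_lenient and the toy converter are hypothetical, and the converter is assumed to raise ValueError for entities it cannot handle; the exception a real conversion raises may differ.

```python
def convert_entities_lenient(entities, convert):
    """Convert each entity, representing failed conversions with None.

    Both this helper and the `convert` callable are hypothetical; the
    callable is assumed to raise ValueError for entities it cannot convert.
    """
    converted = []
    for entity in entities:
        try:
            converted.append(convert(entity))
        except ValueError:
            converted.append(None)
    return converted


def toy_convert(entity):
    """Toy converter that only knows two syllables (illustration only)."""
    table = {"ni3": "nǐ", "hao3": "hǎo"}
    if entity not in table:
        raise ValueError("cannot convert %r" % entity)
    return table[entity]


result = convert_entities_lenient(["ni3", "xyz", "hao3"], toy_convert)
```

Keeping one output slot per input entity lets the caller match failures back to their position in the input.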

Todo

  • Lang: Add the stroke order source to the stroke order data so that different and contradicting stroke order information can be given. The user could then name several preferred sources, which would be queried in the order given.

(The original entry is located in library/cjklib.characterlookup.rst, line 958 and can be found here.)
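
Querying several stroke order sources in order of preference, as proposed in the entry above, might be sketched as follows. The helper lookup_stroke_order, the source names and the sample entries are made up for illustration and do not reflect cjklib's actual tables.

```python
def lookup_stroke_order(char, data, preferred_sources):
    """Return (source, stroke_order) from the first preferred source
    that has an entry for char, or None if no source has one.

    Hypothetical helper; `data` maps source name -> {char: stroke order}.
    """
    for source in preferred_sources:
        stroke_order = data.get(source, {}).get(char)
        if stroke_order is not None:
            return source, stroke_order
    return None


# Made-up source names and contradicting entries, for illustration only.
data = {
    "sourceA": {"王": "H-H-S-H"},
    "sourceB": {"王": "H-S-H-H", "十": "H-S"},
}
first = lookup_stroke_order("王", data, ["sourceB", "sourceA"])
fallback = lookup_stroke_order("十", data, ["sourceA", "sourceB"])
```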

Todo

  • Impl: Implement means to check if the component is really not found, or if our data is just insufficient.

(The original entry is located in library/cjklib.characterlookup.rst, line 1047 and can be found here.)

Todo

  • Fix: Conversion without tones will mostly break, as the target reading doesn’t support missing tone information. Preferring the ‘diacritic’ version (Pinyin/CantoneseYale) over ‘numbers’ as tone marks in the absence of any marks would solve this issue (forcing fifth tone), but would mean preferring possibly false information over the less specific reading of the given entities as lacking tonal information.

(The original entry is located in library/cjklib.cjknife.rst, line 83 and can be found here.)

Todo

  • Impl: Once a mapping of similar radical forms exists (e.g. 言 and 訁), include it here.

(The original entry is located in library/cjklib.cjknife.rst, line 150 and can be found here.)

Todo

  • Impl: Once a mapping of similar radical forms exists (e.g. 言 and 訁), include it here.

(The original entry is located in library/cjklib.cjknife.rst, line 203 and can be found here.)

Todo

  • Lang: Implementation is too simple to cover all aspects.

(The original entry is located in library/cjklib.cjknife.rst, line 260 and can be found here.)

Todo

  • bug: Specifying a limit might yield fewer results than possible.

(The original entry is located in library/cjklib.dictionary.rst, line 62 and can be found here.)

Todo

  • bug: Specifying a limit might yield fewer results than possible.

(The original entry is located in library/cjklib.dictionary.rst, line 78 and can be found here.)

Todo

  • bug: Specifying a limit might yield fewer results than possible.

(The original entry is located in library/cjklib.dictionary.rst, line 96 and can be found here.)

Todo

  • bug: Specifying a limit might yield fewer results than possible.

(The original entry is located in library/cjklib.dictionary.rst, line 112 and can be found here.)

Todo

(The original entry is located in library/cjklib.dictionary.install.rst, line 25 and can be found here.)

Todo

  • Impl: Allow simple FTS3 searching as build support is already provided.

(The original entry is located in library/cjklib.dictionary.search.rst, line 6 and can be found here.)

Todo

  • Fix: How to handle non-reading entities?

(The original entry is located in library/cjklib.dictionary.search.rst, line 10 and can be found here.)

Todo

  • Impl: Support readings with toneless base forms but without support for missing tone

(The original entry is located in library/cjklib.dictionary.search.rst, line 17 and can be found here.)

Todo

  • Impl: What about hiding of inner classes? The _checkSpecialOperators() method is called for internal converters and for external ones delivered by createReadingConverter(). The latter method doesn’t return internal cached copies though, but creates new instances. ReadingOperator also gets copies from ReadingFactory objects for internal instances. Sharing saves memory, but changing one object will affect all other objects using this instance.
  • Impl: General reading options given for a converter with **options need to be used when creating an operator. How to raise errors to keep the user from specifying an operator twice, once via options and once as a concrete instance (similar to sourceOptions and targetOptions)?

(The original entry is located in library/cjklib.reading.rst, line 16 and can be found here.)

Todo

  • Impl: Make parameters fromReading, toReading optional if only one conversion direction is given. Same for convertEntities().

(The original entry is located in library/cjklib.reading.converter.rst, line 55 and can be found here.)

Todo

  • Impl: Strict mode for tone abbreviating spellings. Raise AmbiguousConversionError, e.g. raise on a which could be .a or a.
  • Impl: Add option to remove hyphens, “A Grammar of Spoken Chinese, p. xxii”, Conversion to Pinyin can use that.

(The original entry is located in library/cjklib.reading.converter.GRDialectConverter.rst, line 33 and can be found here.)

Todo

  • Impl: Two different methods for tone sandhi and coarticulation effects?
  • Lang: Support for Erhua in mapping.

(The original entry is located in library/cjklib.reading.converter.PinyinIPAConverter.rst, line 13 and can be found here.)

Todo

  • Lang: What to do with several consecutive neutral tones?

(The original entry is located in library/cjklib.reading.converter.PinyinIPAConverter.rst, line 100 and can be found here.)

Todo

  • Impl: Optimise decompose() so as to incorporate segment() and prune the tree while it is created. Would this yield a significant improvement, though? It would at least be O(n).

(The original entry is located in library/cjklib.reading.operator.rst, line 11 and can be found here.)

Todo

  • Lang: Shed more light on representations of tones in IPA.
  • Impl: Get all diacritics used in IPA as tones for TONE_MARK_REGEX.
  • Fix: What about CompositionError? All romanisations raise it, but they have a distinct set of characters that belong to the reading.

(The original entry is located in library/cjklib.reading.operator.rst, line 26 and can be found here.)

Todo

  • Impl: Place diacritics on main vowel, derive from IPA representation.

(The original entry is located in library/cjklib.reading.operator.rst, line 125 and can be found here.)

Todo

  • Lang: Shed more light on tone sandhi in Cantonese language.
  • Impl: Implement diacritics for Cantonese tones. On which part of the syllable should they be placed? Document.
  • Lang: Binyām 變音
  • Impl: What are the semantics of non-level tones given for unreleased stop finals? Take high rising Binyam into account.

(The original entry is located in library/cjklib.reading.operator.CantoneseIPAOperator.rst, line 10 and can be found here.)

Todo

  • Impl: Finals ing, ik, ung, uk, eun, eut, a differ from other finals with same vowels. What semantics/view do we want to provide on the syllable parts?

(The original entry is located in library/cjklib.reading.operator.CantoneseYaleOperator.rst, line 99 and can be found here.)

Todo

  • Lang: Place the tone mark on the first character of the nucleus?

(The original entry is located in library/cjklib.reading.operator.CantoneseYaleOperator.rst, line 133 and can be found here.)
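
One possible reading of the question in the entry above, placing the tone mark on the first character of the nucleus, can be sketched in Python 3 as follows. The helper place_tone_mark is hypothetical, and taking the first vowel letter as the start of the nucleus is a simplifying assumption; syllables without a vowel letter are left unchanged here.

```python
import unicodedata

VOWELS = "aeiou"


def place_tone_mark(syllable, combining_mark):
    """Attach a combining diacritic to the first vowel of the syllable.

    Hypothetical helper; treats the first vowel letter as the first
    character of the nucleus, which is an assumption of this sketch.
    """
    for index, letter in enumerate(syllable):
        if letter in VOWELS:
            marked = (syllable[:index + 1] + combining_mark
                      + syllable[index + 1:])
            # Compose base letter and mark into one code point if possible.
            return unicodedata.normalize("NFC", marked)
    # No vowel letter found: leave the syllable unchanged in this sketch.
    return syllable


# U+0300 is the combining grave accent.
marked = place_tone_mark("gwong", "\u0300")
```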

Todo

  • Impl: Initial, medial, head, ending (ending1, ending2=l?)
  • Lang: Y.R. Chao uses particle and interjection ㄝ è. For more see ‘Mandarin Primer’, Vocabulary and Index, pp. 301.
  • Impl: Implement Erhua forms as stated in W. Simon: A Beginner’s Chinese-English Dictionary.
  • Impl: Implement a GRIPAConverter once IPA values are obtained for the PinyinIPAConverter. GRIPAConverter can work around missing Erhua conversion to Pinyin.
  • Lang: Special rule for non-Chinese names with initial r- to be transcribed with an r- cited by Ching-song Gene Hsiao: A Manual of Transcription Systems For Chinese, 中文拼音手册. Far Eastern Publications, Yale University, New Haven, Connecticut, 1985, ISBN 0-88710-141-0.

(The original entry is located in library/cjklib.reading.operator.GROperator.rst, line 9 and can be found here.)

Todo

  • Lang: tz is currently mapped to .tzy. Character 子 though generally has 3rd tone, which then should be tzyy or .tzyy. See ‘A Grammar of Spoken Chinese’, p. 36 (“-.tzy (which we abbreviate as -tz)”) and p. 55 (“suffix -tz (<tzyy)”)

(The original entry is located in library/cjklib.reading.operator.GROperator.rst, line 143 and can be found here.)

Todo

  • Impl: Both options 'grRhotacisedFinalApostrophe' and 'grSyllableSeparatorApostrophe' can be set independently as the former one should only be found before an l and the latter mostly before vowels.

(The original entry is located in library/cjklib.reading.operator.GROperator.rst, line 289 and can be found here.)

Todo

  • Impl: Finals ing, ik, ung, uk differ from other finals with same vowels. What semantics/view do we want to provide on the syllable parts?

(The original entry is located in library/cjklib.reading.operator.JyutpingOperator.rst, line 57 and can be found here.)

Todo

  • Impl: Punctuation marks in isFormattingEntity() and getFormattingEntities(). Then change PinyinBrailleConverter.convertEntitySequence() to use these methods.

(The original entry is located in library/cjklib.reading.operator.MandarinBrailleOperator.rst, line 10 and can be found here.)

Todo

  • Impl: ISO 7098 asks for conversion of 。、·「」 to .,-«». What about ,?《》:-? Implement a method for conversion to be optionally used.
  • Impl: Special marker for neutral tone: ‘mȧ’ (u’m\u0227’, reported by Ching-song Gene Hsiao: A Manual of Transcription Systems For Chinese, 中文拼音手册. Far Eastern Publications, Yale University, New Haven, Connecticut, 1985, ISBN 0-88710-141-0. Seems like left over from Pinjin, 1956), and ‘·ma’ (u’\xb7ma’, check!: 现代汉语词典(第5版)[Xiàndài Hànyǔ Cídiǎn 5. Edition]. 商务印书馆 [Shāngwù Yìnshūguǎn], Beijing, 2005, ISBN 7-100-04385-9.)
  • Impl: Consider handling \*nue and \*lue.

(The original entry is located in library/cjklib.reading.operator.PinyinOperator.rst, line 12 and can be found here.)
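
The optional punctuation conversion mentioned in the entry above (。、·「」 to .,-«») could be sketched in Python 3 as a simple translation table. The function name convert_punctuation is hypothetical, and only the five pairs listed in the entry are mapped; the marks the entry asks about (,?《》:-) are left untouched.

```python
# Mapping of 。、·「」 to .,-«» as listed in the entry above.
PUNCTUATION_TABLE = str.maketrans("。、·「」", ".,-«»")


def convert_punctuation(text):
    """Replace the five listed punctuation marks; leave the rest as is.

    Hypothetical helper name, not an existing cjklib method.
    """
    return text.translate(PUNCTUATION_TABLE)


converted = convert_punctuation("「nǐ hǎo」。")
```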

Todo

  • Fix: Don’t raise a ValueError here (delayed); raise an exception directly in the constructor. See also WadeGilesOperator.

(The original entry is located in library/cjklib.reading.operator.PinyinOperator.rst, line 223 and can be found here.)

Todo

  • Lang: Asterisk (*) marking the entering tone (入聲): e.g. chio²* and chüeh²* for 覺 used by Giles (A Chinese-English Dictionary, second edition, 1912).

(The original entry is located in library/cjklib.reading.operator.WadeGilesOperator.rst, line 9 and can be found here.)

Todo

  • Impl: Raise ValueError on invalid values for diacriticE, zeroFinal, umlautU.

(The original entry is located in library/cjklib.reading.operator.WadeGilesOperator.rst, line 56 and can be found here.)

Todo

  • Impl: include script table from Unicode 5.2.0 to get character ranges for Hangul and Kana

(The original entry is located in library/cjklib.test.characterlookup.rst, line 10 and can be found here.)
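
Until a script table is included, Hangul and Kana characters could be approximated from Unicode character names, as in this Python 3 sketch. The helper script_by_name is hypothetical, and a name-prefix check is only a stand-in for the character ranges a real script table would provide.

```python
import unicodedata


def script_by_name(char):
    """Guess 'Hangul' or 'Kana' from the Unicode character name.

    Hypothetical helper; a name-prefix check only approximates the
    ranges a real script table would give.
    """
    name = unicodedata.name(char, "")
    if name.startswith("HANGUL"):
        return "Hangul"
    if name.startswith(("HIRAGANA", "KATAKANA")):
        return "Kana"
    return None
```

For example, script_by_name can be applied to each character of a test string to separate Hangul and Kana from Chinese characters.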

Todo

  • Impl: Add second dimension to consistency check for converting between dialect forms for all entities. Use cartesian product option_list x dialects

(The original entry is located in library/cjklib.test.readingconverter.rst, line 6 and can be found here.)
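
The cartesian product of option_list and dialects mentioned in the entry above can be built with itertools.product. The option dictionaries and dialect names below are placeholders, not values taken from the actual test cases.

```python
import itertools

# Placeholder values; real option sets and dialect names would come
# from the test case itself.
option_list = [{"option": 1}, {"option": 2}]
dialects = ["dialectA", "dialectB"]

# Every (options, dialect) pair is visited exactly once.
combinations = list(itertools.product(option_list, dialects))
```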

Todo

  • Impl: While this function is only needed as long as Python doesn’t ship with a proper title casing algorithm as defined by Unicode, we need proper handling for Wade-Giles, as Pinyin Erhua forms will convert to two entities separated by a hyphen, which does not fall under the Unicode title casing algorithm’s definition of a case-ignorable character.

(The original entry is located in library/cjklib.util.rst, line 18 and can be found here.)
