cjklib.cjknife — Command line interface

Command line interface (CLI) to the library’s functionality.

Check what this script offers on the command line with cjknife -h.

The script’s output depends on the following:

  • the dictionary setting in cjklib’s config file
  • the user’s locale settings, which are checked to guess appropriate values for the character locale and the default input and output readings

See also

cjknife — Command Line Interface
Documentation on the CLI

Functions

cjklib.cjknife.getDecompositionForEntry(decomposition)

Gets a fixed width string representation of the given decomposition.

Parameter:decomposition (list) – character decomposition tree
Return type:list of str
Returns:string representation of decomposition
cjklib.cjknife.getDecompositionForList(decompositionList)

Gets a fixed width string representation of the given decompositions.

Parameter:decompositionList (list) – a list of character decompositions
Return type:list of str
Returns:string representation of the decompositions
cjklib.cjknife.getPrintableList(stringList, joinString='')

Gets a printable representation for the given list.

Parameters:
  • stringList (list of list of str) – strings that need to be concatenated for output
  • joinString (str) – string that concatenates the different values
Return type:

str

Returns:

printable representation for the given list
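
The following is a minimal sketch of how such a joining step could work, written in plain Python 3 without importing cjklib. The function name printable_list and the bracket convention for inner lists with several entries are assumptions made for illustration; this page does not state how multiple values are rendered.

```python
def printable_list(string_list, join_string=""):
    """Join a list of lists of strings into one printable string.

    Assumption (not confirmed by this page): an inner list with more than
    one entry is treated as a set of alternatives and wrapped in brackets.
    """
    parts = []
    for alternatives in string_list:
        if len(alternatives) == 1:
            # A single value is emitted as-is.
            parts.append(alternatives[0])
        else:
            # Several values are grouped so they stay visually together.
            parts.append("[" + join_string.join(alternatives) + "]")
    return join_string.join(parts)


result = printable_list([["nǐ"], ["hǎo", "hào"]], " ")
print(result)
```

Here joinString plays two roles, separating both the outer entries and the alternatives inside a bracket; whether cjknife does the same is not stated here.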

Classes

class cjklib.cjknife.CharacterInfo(charLocale=None, characterDomain='Unicode', readingN=None, dictionaryN=None, dictionaryDatabaseUrl=None)

Provides lookup method services.

Initialises the CharacterInfo object.

Parameters:
  • charLocale (str) – character locale (one out of TCJKV)
  • characterDomain (str) – character domain (see characterlookup.CharacterLookup.getAvailableCharacterDomains())
  • readingN (str) – name of reading
  • dictionaryN (str) – name of dictionary
  • dictionaryDatabaseUrl (str) – database connection setting in the format driver://user:pass@host/database.
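
To make the driver://user:pass@host/database format concrete, the sketch below splits such a URL into its parts using only the Python 3 standard library. The helper name split_database_url and the example URL with its credentials are hypothetical; how cjklib itself parses the setting is not described on this page.

```python
from urllib.parse import urlsplit


def split_database_url(url):
    """Split a driver://user:pass@host/database URL into named parts."""
    parts = urlsplit(url)
    return {
        "driver": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        # The path carries a leading slash that is not part of the name.
        "database": parts.path.lstrip("/"),
    }


# Hypothetical connection setting, for illustration only.
settings = split_database_url("mysql://reader:secret@localhost/cjklib")
print(settings)
```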
CHAR_LOCALE_DEFAULT_READING
Character locale’s default character reading.
CHAR_LOCALE_NAME
Character locale names.
DICTIONARY_CHAR_LOCALE
Dictionary default locale.
LANGUAGE_CHAR_LOCALE_MAPPING
Mapping table for locale to default character locale.
READING_DEFAULT_DICTIONARY
Dictionary to use by default for a given reading.
VARIANT_TYPE_NAMES
List of character variants and their names.
convertReading(readingString, fromReading, toReading=None)

Converts a string in the source reading to the given target reading.

Parameters:
  • readingString (str) – string written in the source reading
  • fromReading (str) – name of the source reading
  • toReading (str) – name of the target reading
Return type:

str

Returns:

the input string converted to the toReading

Raises DecompositionError:
 if the string cannot be decomposed into basic entities with regard to the source reading, or if the given information is insufficient.
Raises CompositionError:
 if the target reading’s entities cannot be composed.
Raises ConversionError:
 on operations specific to the conversion between the two readings (e.g. error on converting entities).
Raises UnsupportedError:
 if the source or target reading is not supported for conversion.

Todo

  • Fix: Conversion without tones will mostly break, as the target reading doesn’t support missing tone information. Preferring the ‘diacritic’ version (Pinyin/CantoneseYale) over ‘numbers’ as tone marks in the absence of any marks would solve this issue (forcing the fifth tone), but it would mean preferring possibly false information over the less specific interpretation of the given entities as lacking tonal information.
getAvailableDictionaries()

Gets a list of the available dictionaries.

Return type:list of str
Returns:names of available dictionaries
getCharacterInformation(char)

Get the basic information for the given character.

The following data is collected and returned in a dict:
  • char
  • locale
  • locale name
  • character domain
  • code point hex
  • code point dec
  • type
  • equivalent form (if type is 'radical')
  • radical index
  • radical form (if available)
  • radical variants (if available)
  • stroke count (if available)
  • readings (if type is 'character')
  • variants (if type is 'character')
  • default glyph
  • glyphs
Parameter:char (str) – Chinese character
Return type:dict
Returns:character information as keyword value pairs
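
As an illustration of the two code point entries in the list above, the sketch below derives them for a single character in plain Python 3. The dictionary keys mirror the list items, but the exact key names and the hex formatting used by getCharacterInformation() are assumptions; the helper code_point_fields is hypothetical.

```python
def code_point_fields(char):
    """Return the code point of a character in hex and decimal form."""
    code_point = ord(char)
    return {
        "char": char,
        # Four-digit upper-case hex is an assumed format.
        "code point hex": "%04X" % code_point,
        "code point dec": code_point,
    }


fields = code_point_fields("肉")
print(fields)
```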
getCharactersForComponents(componentList, includeEquivalentRadicalForms=True)

Gets all characters that contain the given components.

If option includeEquivalentRadicalForms is set, all equivalent forms will be searched for when a Kangxi radical is given.

Parameters:
  • componentList (list of str) – list of character components
  • includeEquivalentRadicalForms (bool) – if True then characters in the given component list are interpreted as representatives for their radical and all radical forms are included in the search. E.g. 肉 will include ⺼ as a possible component.
Return type:

list of tuple

Returns:

list of pairs of matching characters and their glyphs

Raises ValueError:
 if an invalid character locale is specified

Todo

  • Impl: Once a mapping of similar radical forms exists (e.g. 言 and 訁), include it here.
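
The effect of includeEquivalentRadicalForms can be sketched with a small in-memory table, in plain Python 3 and without cjklib’s database. The two tables below are toy data invented for this example, and the function is a simplified stand-in: it returns only the matching characters, not the pairs of characters and glyphs that the real method returns.

```python
# Toy data, for illustration only: radical -> its equivalent forms.
EQUIVALENT_FORMS = {"肉": ["肉", "⺼"]}

# Toy data, for illustration only: character -> set of components.
COMPONENTS = {
    "肝": {"⺼", "干"},
    "腐": {"广", "付", "肉"},
    "汗": {"氵", "干"},
}


def characters_for_components(component_list,
                              include_equivalent_radical_forms=True):
    """Return characters that contain every component in component_list."""
    matches = []
    for char, components in sorted(COMPONENTS.items()):
        for component in component_list:
            forms = [component]
            if include_equivalent_radical_forms:
                # Accept any equivalent form of a radical.
                forms = EQUIVALENT_FORMS.get(component, [component])
            if not any(form in components for form in forms):
                break
        else:
            # No component was missing, so the character matches.
            matches.append(char)
    return matches


with_forms = characters_for_components(["肉"])
without_forms = characters_for_components(["肉"], False)
print(with_forms, without_forms)
```

With the option enabled, a search for 肉 also finds characters that contain ⺼; with it disabled, only the literal component is matched.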
getCharactersForKangxiRadicalIndex(radicalIndex)

Gets all characters for the given Kangxi radical index grouped by their residual stroke count.

Parameter:radicalIndex (int) – Kangxi radical index
Return type:list of str
Returns:list of matching Chinese characters
getCharactersForReading(readingString, readingN=None)

Gets all known characters for the given reading.

Parameters:
  • readingString (str) – reading entity for lookup
  • readingN (str) – name of reading
Return type:

list of str

Returns:

list of characters for the given reading

Raises UnsupportedError:
 if no mapping between characters and target reading exists.
Raises ConversionError:
 if conversion from the internal source reading to the given target reading fails.

getEquivalentCharTable(componentList, includeEquivalentRadicalForms=True)

Gets a list structure of equivalent chars for the given list of characters.

If option includeEquivalentRadicalForms is set, all equivalent forms will be searched for when a Kangxi radical is given.

Parameters:
  • componentList (list of str) – list of character components
  • includeEquivalentRadicalForms (bool) – if True then characters in the given component list are interpreted as representatives for their radical and all radical forms are included in the search. E.g. 肉 will include ⺼ as a possible component.
Return type:

list of list of str

Returns:

list structure of equivalent characters

Todo

  • Impl: Once a mapping of similar radical forms exists (e.g. 言 and 訁), include it here.
getReadingForCharacters(charList)

Gets a list of readings for a given character string.

Parameter:charList (list) – list of Chinese characters
Return type:list of list of str
Returns:a list of readings per character
Raises exception.UnsupportedError:
 raised when a translation from character to reading is not supported by the given target reading
Raises exception.ConversionError:
 if conversion for the string is not supported
getReadingOptions(string, readingN)

Guesses the reading options using the given string to support reading dialects.

Parameters:
  • string (str) – reading string
  • readingN (str) – reading name
Return type:

dict

Returns:

reading options

getSimplified(charList)

Gets the simplified Chinese character representation for the given character string.

Parameter:charList (list) – list of Chinese characters
Return type:list of list of str
Returns:list of simplified Chinese characters
getTraditional(charList)

Gets the traditional character representation for the given character string.

Parameter:charList (list) – list of Chinese characters
Return type:list of list of str
Returns:list of traditional Chinese characters

Todo

  • Lang: Implementation is too simple to cover all aspects.
guessCharacterLocale()

Guesses the best suited character locale using the user’s locale settings.

Return type:str
Returns:locale
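
A rough sketch of how a character locale might be guessed from the user’s locale settings is given below, in plain Python 3. The mapping table is illustrative and is not claimed to match LANGUAGE_CHAR_LOCALE_MAPPING; the fallback value "T" and the helper name guess_char_locale are likewise assumptions.

```python
import locale

# Illustrative mapping from language code to character locale (one of TCJKV).
LANGUAGE_TO_CHAR_LOCALE = {
    "zh": "C", "zh_CN": "C", "zh_SG": "C",
    "zh_TW": "T", "zh_HK": "T", "zh_MO": "T",
    "ja": "J", "ko": "K", "vi": "V",
}


def guess_char_locale(language_code, default="T"):
    """Map a language code such as 'zh_TW' to a character locale."""
    if not language_code:
        return default
    # Drop an encoding suffix such as '.UTF-8' if one is present.
    code = language_code.split(".")[0]
    if code in LANGUAGE_TO_CHAR_LOCALE:
        return LANGUAGE_TO_CHAR_LOCALE[code]
    # Fall back to the bare language, e.g. 'ja' for 'ja_JP'.
    return LANGUAGE_TO_CHAR_LOCALE.get(code.split("_")[0], default)


# locale.getlocale() returns (language code, encoding); both may be None.
language_code, _encoding = locale.getlocale()
print(guess_char_locale(language_code))
```

The full code is tried first so that regional differences such as zh_CN versus zh_TW are respected before falling back to the bare language.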
guessReading()

Guesses the best suited reading using the user’s locale settings.

Return type:str
Returns:reading name
hasDictionary()
isSemanticVariant(char, variants)

Checks if the character is a semantic variant form of the given characters.

Parameters:
  • char (str) – Chinese character
  • variants (list of str) – Chinese characters
Return type:

bool

Returns:

True if the character is a semantic variant form of the given characters, False otherwise.

searchDictionary(searchString, readingN=None, limit=None)

Searches the dictionary for matches of the given string.

Parameters:
  • searchString (str) – search string
  • readingN (str) – reading name
  • limit (int) – maximum number of entries
setCharacterDomain(characterDomain)
