Announcing cjklib

Announcing cjklib, a library for higher-level support of Chinese characters.

May 19th, 2009

(Hong Kong) We would like to announce the availability of cjklib, a new Python-based programming library providing higher-level support of Chinese characters, also called Han characters.

Chinese characters, in comparison to other scripts, have several distinctive features: more than 40,000 characters exist, they have a complex visual appearance, they to some extent carry meaning in their structure (ideographic characters), and they give almost no indication of their pronunciation. Chinese characters are used to write Chinese, Japanese, Korean, and formerly Vietnamese, abbreviated as CJK or CJKV.

Cjklib tries to fill a current void in supporting Chinese characters by focusing on visual appearance and reading-based data. While many lexical sources already exist, there is no layer that provides the data in an accessible and consistent way, which forces developers to reinvent many basic functions. This project aims to channel different efforts in order to provide the developer with a consistent view independent of the chosen language. The library directly targets developers and experienced users, with the overall goal of improving the coverage of applications for the end user.

Some features of cjklib:

  • Glyph-based functions
    • Radical index, residual stroke count
    • 'Breaking down' a character into a tree of its components
    • Stroke order
    • Locale-based glyph layout
  • Reading-based functions
    • Character to reading mapping
    • Conversion between readings (Mandarin Chinese: Pinyin, Gwoyeu Romatzyh, Wade-Giles, IPA; Cantonese: Jyutping, Cantonese Yale; Japanese: Kana; Korean: Hangul)
    • Translation between realizations of a reading, e.g. numbers to diacritics
  • Database back-end with a powerful build system providing access to Unihan, KANJIDIC, EDICT, CEDICT, HanDeDict
  • Command line tool to access the library's functions
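
To make "translation between realizations of a reading" more concrete, here is a minimal standalone sketch of turning Pinyin written with tone numbers into Pinyin written with diacritics. It does not use cjklib and is not cjklib's implementation; the names numbered_to_diacritic and convert_text are hypothetical and exist only for this illustration. It assumes lower-case "v" stands in for "ü", and it applies the common placement rule for the tone mark ("a" or "e" first, then the "o" of "ou", otherwise the last vowel).

```python
import unicodedata

# Combining diacritics for Mandarin tones 1-4; tone 5 (neutral) carries no mark.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
VOWELS = "aeiou\u00fc"  # \u00fc is "ü"


def numbered_to_diacritic(syllable):
    """Convert one Pinyin syllable with a trailing tone number (hypothetical helper)."""
    if not syllable or syllable[-1] not in "12345":
        return syllable  # no tone number present, leave unchanged
    tone = int(syllable[-1])
    # Assumption: lower-case "v" is used as an ASCII stand-in for "ü".
    base = syllable[:-1].replace("v", "\u00fc")
    if tone == 5:
        return base
    lower = base.lower()
    # Placement rule: "a" or "e" takes the mark, then the "o" of "ou",
    # otherwise the last vowel of the syllable.
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("ou")
    else:
        positions = [i for i, ch in enumerate(lower) if ch in VOWELS]
        if not positions:
            return base  # no vowel found, nothing to mark
        idx = positions[-1]
    # Compose the vowel and the combining mark into a single character.
    marked = unicodedata.normalize("NFC", base[idx] + TONE_MARKS[tone])
    return base[:idx] + marked + base[idx + 1:]


def convert_text(text):
    """Convert a whitespace-separated sequence of numbered syllables."""
    return " ".join(numbered_to_diacritic(s) for s in text.split())


if __name__ == "__main__":
    print(convert_text("Bei3 jing1"))
```

A real converter also has to deal with segmenting running text into syllables, capitalisation conventions, apostrophes and the erhua suffix, none of which this sketch attempts.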

The project was released recently and is still under heavy development. Although API changes might occur in the near future, the library is usable and already being employed in other software. Cjklib is released under the LGPL.

To learn more about cjklib, its website [1] is a good starting point. Much documentation already exists and more is being added. For a quick overview of some of the functions offered, see [2].
Downloads are available at [3].

The cjklib developers
cjklib-devel@googlegroups.com

[1] http://code.google.com/p/cjklib
[2] http://code.google.com/p/cjklib/wiki/Screenshots
[3] http://code.google.com/p/cjklib/downloads/list