cjklib.dictionary.search — Search strategies for dictionaries

New in version 0.3.

Search strategies for dictionaries.

Todo

  • Impl: Allow simple FTS3 searching as build support is already provided.

Functions

cjklib.dictionary.search.setDefaultWildcards(singleCharacter='_', multipleCharacters='%')
Convenience function to change the default wildcard characters globally.

Classes

class cjklib.dictionary.search.CEDICTTranslation(caseInsensitive=True, **options)

Bases: cjklib.dictionary.search.SingleEntryTranslation

CEDICT translation based search strategy. Takes into account additions put in parentheses and appended information separated by a comma.

getMatchFunction(searchStr)
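
The matching rule described above can be sketched roughly as follows. This is not cjklib's implementation: the helper names `entry_variants` and `make_translation_matcher` are hypothetical, and the sketch assumes the translation cell uses the slash-separated CEDICT layout (for example `/to know (sth.)/to be aware of/`).

```python
import re

def entry_variants(entry):
    """Return the forms of one translation entry that should count as a hit.

    Hypothetical helper: the full entry, the entry without parenthesised
    additions, and both of these cut off at the first comma.
    """
    no_parens = re.sub(r"\s*\([^)]*\)", "", entry)
    forms = {entry, no_parens,
             entry.split(",", 1)[0], no_parens.split(",", 1)[0]}
    # Collapse runs of whitespace left behind by the removals.
    return {" ".join(form.split()) for form in forms}

def make_translation_matcher(search_str, case_insensitive=True):
    """Build a predicate over a slash-separated translation cell."""
    def fold(text):
        return text.lower() if case_insensitive else text

    needle = fold(" ".join(search_str.split()))

    def matches(cell):
        entries = [e for e in cell.split("/") if e.strip()]
        return any(needle == fold(variant)
                   for entry in entries
                   for variant in entry_variants(entry))
    return matches
```

With this sketch, a search for `to know` matches the cell `/to know (sth.)/to be aware of/`, while a search for `know` does not, because only whole entries are compared.
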
class cjklib.dictionary.search.CEDICTWildcardTranslation(*args, **options)

Bases: cjklib.dictionary.search.SingleEntryTranslation, cjklib.dictionary.search._SimpleTranslationWildcardBase

CEDICT translation based search strategy with support for wildcards. Takes into account additions put in parentheses and appended information separated by a comma.

getMatchFunction(searchStr)
getWhereClause(column, searchStr)
class cjklib.dictionary.search.Exact(fullwidthCharacters=False, **options)

Bases: cjklib.dictionary.search._CaseInsensitiveBase

Simple search strategy class.

Parameters:
  • caseInsensitive (bool) – if True, Latin characters match their upper/lower case equivalents; if False, matches are case sensitive (default)
  • sqlCollation (str) – optional collation to use on columns in SQL queries
  • fullwidthCharacters (bool) – if True, alphabetic halfwidth characters are converted to fullwidth
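
To illustrate what a halfwidth to fullwidth conversion of alphabetic characters might look like, here is a minimal sketch. It assumes the conversion maps ASCII letters onto the Unicode "Halfwidth and Fullwidth Forms" block (a fixed offset of 0xFEE0); whether cjklib uses exactly this mapping is an assumption, and `to_fullwidth` is a hypothetical name.

```python
FULLWIDTH_OFFSET = 0xFEE0  # distance from 'A' (U+0041) to 'Ａ' (U+FF21)

def to_fullwidth(text):
    """Convert ASCII letters to their fullwidth forms, leave the rest as is."""
    converted = []
    for ch in text:
        if "a" <= ch <= "z" or "A" <= ch <= "Z":
            converted.append(chr(ord(ch) + FULLWIDTH_OFFSET))
        else:
            converted.append(ch)
    return "".join(converted)
```

Digits, punctuation and CJK characters pass through unchanged in this sketch, since the parameter description only mentions alphabetic characters.
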
getMatchFunction(searchStr)

Returns a function that returns True if the entry’s cell content matches the search string.

The returned function is the sufficient condition for a match. Note that the rows it is applied to may include results selected by other SQL clauses, which do not necessarily fulfill the condition of getWhereClause().

Parameter: searchStr (str) – search string
Return type: function
Returns: function that returns True if the entry is a match
getWhereClause(column, searchStr)

Returns a SQLAlchemy clause that is the necessary condition for a possible match. This clause is used in the database query. Results may then be further narrowed by getMatchFunction().

Parameters:
  • column (SQLAlchemy column instance) – column to check against
  • searchStr (str) – search string
Returns: SQLAlchemy clause

class cjklib.dictionary.search.HanDeDictTranslation(caseInsensitive=True, **options)

Bases: cjklib.dictionary.search.SingleEntryTranslation

HanDeDict translation based search strategy. Takes into account additions put in parentheses and allows for multiple entries in one record separated by punctuation marks.

getMatchFunction(searchStr)
class cjklib.dictionary.search.HanDeDictWildcardTranslation(*args, **options)

Bases: cjklib.dictionary.search.SingleEntryTranslation, cjklib.dictionary.search._SimpleTranslationWildcardBase

HanDeDict translation based search strategy with support for wildcards. Takes into account additions put in parentheses and appended information separated by a comma.

getMatchFunction(searchStr)
getWhereClause(column, searchStr)
class cjklib.dictionary.search.MixedTonelessWildcardReading(supportWildcards=True, headwordFullwidthCharacters=False, **options)

Bases: cjklib.dictionary.search.SimpleReading, cjklib.dictionary.search._MixedTonelessReadingWildcardBase

Reading search strategy that supplements TonelessWildcardReading to allow intermixing of readings missing tonal information with single characters from the headword. By default wildcard searches are supported.

This strategy complements the basic search strategy. It is not built to return results for plain reading or plain headword strings.

Parameters:
  • caseInsensitive (bool) – if True, Latin characters match their upper/lower case equivalents; if False, matches are case sensitive (default)
  • sqlCollation (str) – optional collation to use on columns in SQL queries
  • supportWildcards (bool) – if True, wildcard characters are interpreted (default)
  • headwordFullwidthCharacters (bool) – if True, halfwidth characters found in the headword are converted to fullwidth
  • escape (str) – character used to escape command characters
  • singleCharacter (str) – wildcard character matching a single arbitrary character
  • multipleCharacters (str) – wildcard character matching zero, one or many arbitrary characters
getMatchFunction(searchStr, **options)
getWhereClause(headwordColumn, readingColumn, searchStr, **options)

Returns a SQLAlchemy clause that is the necessary condition for a possible match. This clause is used in the database query. Results may then be further narrowed by getMatchFunction().

Parameters:
  • headwordColumn (SQLAlchemy column instance) – headword column to check against
  • readingColumn (SQLAlchemy column instance) – reading column to check against
  • searchStr (str) – search string
Returns: SQLAlchemy clause
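
The idea of intermixing toneless readings with headword characters can be illustrated with a simplified sketch. It is not how cjklib implements the strategy: it assumes the search string has already been split into tokens, that the reading is Pinyin with tone diacritics, and that each headword character corresponds to exactly one space-separated syllable. `strip_tones` and `mixed_match` are hypothetical names.

```python
import unicodedata

# Combining macron, acute, caron and grave: the four Pinyin tone marks.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}

def strip_tones(syllable):
    """Remove Pinyin tone diacritics but keep other marks such as the umlaut."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)

def mixed_match(tokens, headword, reading):
    """Each token must equal either the headword character or the
    toneless form of the syllable at the same position."""
    syllables = reading.split()
    if not (len(tokens) == len(headword) == len(syllables)):
        return False
    for token, char, syllable in zip(tokens, headword, syllables):
        if token != char and token.lower() != strip_tones(syllable).lower():
            return False
    return True
```

For example, the token lists `["知", "dao"]` and `["zhi", "道"]` both match the entry with headword 知道 and reading `zhī dào` in this sketch.
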

class cjklib.dictionary.search.MixedWildcardReading(supportWildcards=True, headwordFullwidthCharacters=False, **options)

Bases: cjklib.dictionary.search.SimpleReading, cjklib.dictionary.search._MixedReadingWildcardBase

Reading search strategy that supplements SimpleWildcardReading to allow intermixing of readings with single characters from the headword. By default wildcard searches are supported.

This strategy complements the basic search strategy. It is not built to return results for plain reading or plain headword strings.

Parameters:
  • caseInsensitive (bool) – if True, Latin characters match their upper/lower case equivalents; if False, matches are case sensitive (default)
  • sqlCollation (str) – optional collation to use on columns in SQL queries
  • supportWildcards (bool) – if True, wildcard characters are interpreted (default)
  • headwordFullwidthCharacters (bool) – if True, halfwidth characters found in the headword are converted to fullwidth
  • escape (str) – character used to escape command characters
  • singleCharacter (str) – wildcard character matching a single arbitrary character
  • multipleCharacters (str) – wildcard character matching zero, one or many arbitrary characters
getMatchFunction(searchStr, **options)
getWhereClause(headwordColumn, readingColumn, searchStr, **options)

Returns a SQLAlchemy clause that is the necessary condition for a possible match. This clause is used in the database query. Results may then be further narrowed by getMatchFunction().

Parameters:
  • headwordColumn (SQLAlchemy column instance) – headword column to check against
  • readingColumn (SQLAlchemy column instance) – reading column to check against
  • searchStr (str) – search string
Returns: SQLAlchemy clause

class cjklib.dictionary.search.SimpleReading(caseInsensitive=True, **options)

Bases: cjklib.dictionary.search.Exact

Simple reading search strategy. Converts the search string to the dictionary's reading and separates entities with spaces.

Todo

  • Fix: How to handle non-reading entities?
getMatchFunction(searchStr, **options)
getWhereClause(column, searchStr, **options)
setDictionaryInstance(dictInstance)
class cjklib.dictionary.search.SimpleTranslation(caseInsensitive=True, **options)

Bases: cjklib.dictionary.search.SingleEntryTranslation

Simple translation search strategy. Takes into account additions put in parentheses.

getMatchFunction(searchStr)
class cjklib.dictionary.search.SimpleWildcardReading(**options)

Bases: cjklib.dictionary.search.SimpleReading, cjklib.dictionary.search._SimpleReadingWildcardBase

Simple reading search strategy with support for wildcards. Converts the search string to the dictionary's reading and separates entities with spaces.

getMatchFunction(searchStr, **options)
getWhereClause(column, searchStr, **options)
class cjklib.dictionary.search.SimpleWildcardTranslation(*args, **options)

Bases: cjklib.dictionary.search.SingleEntryTranslation, cjklib.dictionary.search._SimpleTranslationWildcardBase

Simple translation search strategy with support for wildcards. Takes into account additions put in parentheses.

getMatchFunction(searchStr)
getWhereClause(column, searchStr)
class cjklib.dictionary.search.SingleEntryTranslation(caseInsensitive=True, **options)

Bases: cjklib.dictionary.search.Exact

Basic translation search strategy.

getMatchFunction(searchStr)
getWhereClause(column, searchStr)
class cjklib.dictionary.search.TonelessWildcardReading(**options)

Bases: cjklib.dictionary.search.SimpleReading, cjklib.dictionary.search._TonelessReadingWildcardBase

Reading based search strategy with support for missing tonal information and wildcards.

Example:

>>> from cjklib.dictionary import *
>>> d = CEDICT(readingSearchStrategy=search.TonelessWildcardReading())
>>> [r.Reading for r in d.getForReading('zhidao', toneMarkType='numbers')]
[u'zhì dǎo', u'zhí dǎo', u'zhǐ dǎo', u'zhí dào', u'zhí dǎo', u'zhī dao']
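
Why a toneless search string such as `zhidao` can hit all of the readings above may be seen from the following sketch, which compares readings after removing tone marks and spaces. It is only an illustration under the assumption that readings are Pinyin with tone diacritics; `strip_tones` and `toneless_match` are hypothetical names and not part of cjklib.

```python
import unicodedata

# Combining macron, acute, caron and grave: the four Pinyin tone marks.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}

def strip_tones(reading):
    """Remove tone diacritics; other marks (e.g. the umlaut in 'ü') survive."""
    decomposed = unicodedata.normalize("NFD", reading)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)

def toneless_match(search_str, reading):
    """Compare search string and reading with tones and spaces removed."""
    def squash(text):
        return strip_tones(text).replace(" ", "").lower()
    return squash(search_str) == squash(reading)
```

Unlike the real strategy, this sketch does not segment the search string into syllables, so it cannot tell apart readings that only differ in where the syllable boundary falls.
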

Todo

  • Impl: Support readings with toneless base forms but without support for missing tone
getMatchFunction(searchStr, **options)
getWhereClause(column, searchStr, **options)
setDictionaryInstance(dictInstance)
class cjklib.dictionary.search.Wildcard(fullwidthCharacters=False, **options)

Bases: cjklib.dictionary.search.Exact, cjklib.dictionary.search._WildcardBase

Basic headword search strategy with support for wildcards.

Parameters:
  • caseInsensitive (bool) – if True, Latin characters match their upper/lower case equivalents; if False, matches are case sensitive (default)
  • sqlCollation (str) – optional collation to use on columns in SQL queries
  • fullwidthCharacters (bool) – if True, alphabetic halfwidth characters are converted to fullwidth
  • escape (str) – character used to escape command characters
  • singleCharacter (str) – wildcard character matching a single arbitrary character
  • multipleCharacters (str) – wildcard character matching zero, one or many arbitrary characters
getMatchFunction(searchStr, **options)
getWhereClause(column, searchStr, **options)
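
The roles of the escape, singleCharacter and multipleCharacters parameters can be illustrated by translating a wildcard search string into a regular expression. This is a sketch, not cjklib's code: `wildcard_to_regex` is a hypothetical helper, the defaults `_` and `%` follow the signature of setDefaultWildcards() above, and the backslash as default escape character is an assumption.

```python
import re

def wildcard_to_regex(search_str, single="_", multiple="%", escape="\\"):
    """Compile a wildcard search string into a regular expression."""
    parts = []
    chars = iter(search_str)
    for ch in chars:
        if ch == escape:
            # The character after the escape is taken literally; a trailing
            # escape character stands for itself.
            following = next(chars, "")
            parts.append(re.escape(following or ch))
        elif ch == single:
            parts.append(".")       # exactly one arbitrary character
        elif ch == multiple:
            parts.append(".*")      # zero, one or many arbitrary characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)
```

Used with `fullmatch`, the pattern for `北_` accepts 北京 but not 北京市, while the pattern for `北%` accepts both as well as 北 on its own.
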
class cjklib.dictionary.search.WildcardTranslation(*args, **options)

Bases: cjklib.dictionary.search.SingleEntryTranslation, cjklib.dictionary.search._WildcardBase

Basic translation search strategy with support for wildcards.

getMatchFunction(searchStr)
getWhereClause(column, searchStr)
