cjklib.dictionary.search — Search strategies for dictionaries
New in version 0.3.
Search strategies for dictionaries.
Todo
- Impl: Allow simple FTS3 searching as build support is already provided.
Functions
-
cjklib.dictionary.search.setDefaultWildcards(singleCharacter='_', multipleCharacters='%')
- Convenience method to change the default wildcard characters globally.
Classes
-
class cjklib.dictionary.search.CEDICTTranslation(caseInsensitive=True, **options)
Bases: cjklib.dictionary.search.SingleEntryTranslation
CEDICT translation based search strategy. Takes into account additions put
in parentheses and appended information separated by a comma.
-
getMatchFunction(searchStr)
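The matching rule described above can be illustrated with a small stand-alone sketch. It is not the cjklib implementation: the function name make_match_function, the regular expression, and the assumption that entries in a cell are separated by "/" are chosen here for illustration only.

```python
import re

def make_match_function(search_str, case_insensitive=True):
    """Return a predicate that tests a CEDICT-style translation cell.

    Hypothetical helper, not part of cjklib.
    """
    flags = re.IGNORECASE if case_insensitive else 0
    # Zero or more additions in parentheses, e.g. "(of a place) ".
    addition = r'(?:\s*\([^)]*\)\s*)*'
    # The search string may be surrounded by additions and may be followed
    # by appended information introduced by a comma.
    pattern = re.compile(
        '^' + addition + re.escape(search_str) + addition + r'(?:,.*)?$',
        flags)

    def match(cell):
        # Assumed cell layout: entries separated by '/', e.g. '/to know/to see/'.
        return any(pattern.match(entry.strip())
                   for entry in cell.strip('/').split('/'))

    return match
```

With this sketch, a search for "lonely" accepts the cell "/(of a place) lonely/", while a search for "know" does not accept "/to know/", because only whole entries are compared.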
-
class cjklib.dictionary.search.CEDICTWildcardTranslation(*args, **options)
Bases: cjklib.dictionary.search.SingleEntryTranslation, cjklib.dictionary.search._SimpleTranslationWildcardBase
CEDICT translation based search strategy with support for wildcards. Takes
into account additions put in parentheses and appended information separated
by a comma.
-
getMatchFunction(searchStr)
-
getWhereClause(column, searchStr)
-
class cjklib.dictionary.search.Exact(fullwidthCharacters=False, **options)
Bases: cjklib.dictionary.search._CaseInsensitiveBase
Simple search strategy class.
Parameters:
- caseInsensitive (bool) – if True, Latin characters match their
upper/lower case equivalent; if False, case sensitive matches
will be made (default)
- sqlCollation (str) – optional collation to use on columns in SQL queries
- fullwidthCharacters (bool) – if True, alphabetic halfwidth
characters are converted to fullwidth
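As an illustration of what the fullwidthCharacters option refers to, the following sketch maps ASCII letters to their fullwidth counterparts in the Unicode block starting at U+FF01. The helper to_fullwidth is hypothetical and only approximates the conversion; how cjklib performs it internally is not shown in this reference.

```python
def to_fullwidth(text):
    """Convert halfwidth ASCII letters to fullwidth forms.

    Hypothetical helper, not part of cjklib. Digits, punctuation and
    non-ASCII characters are left untouched.
    """
    # Fullwidth forms sit at a fixed distance from their ASCII counterparts:
    # 'A' (U+0041) maps to U+FF21, 'a' (U+0061) maps to U+FF41.
    offset = 0xFEE0
    return ''.join(
        chr(ord(ch) + offset) if ('A' <= ch <= 'Z' or 'a' <= ch <= 'z') else ch
        for ch in text)
```

Such a conversion is useful for headwords that store Latin letters in their fullwidth form, so that a search typed with ordinary ASCII letters can still be compared against them.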
-
getMatchFunction(searchStr)
Gets a function that returns True if the entry’s cell content
matches the search string.
This method provides the sufficient condition for a match. Note that the
function may also be applied to rows selected by other SQL clauses, which
need not fulfill the condition of getWhereClause().
Parameter: searchStr (str) – search string
Return type: function
Returns: function that returns True if the entry is a match
-
getWhereClause(column, searchStr)
Returns a SQLAlchemy clause that is the necessary condition for a
possible match. This clause is used in the database query. Results may
then be further narrowed by getMatchFunction().
Parameters:
- column (SQLAlchemy column instance) – column to check against
- searchStr (str) – search string
Returns: SQLAlchemy clause
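The interplay of the two methods, a broad necessary condition evaluated by the database followed by a precise sufficient condition evaluated in Python, can be sketched as follows. The example uses the standard library sqlite3 module with a plain SQL string instead of SQLAlchemy, and the table, column and function names are assumptions made for illustration.

```python
import sqlite3

def get_where_clause(column, search_str):
    # Necessary condition: a cheap SQL filter that may return too many rows.
    return 'lower({}) = lower(?)'.format(column), (search_str,)

def get_match_function(search_str):
    # Sufficient condition: exact, case sensitive comparison in Python.
    return lambda cell: cell == search_str

# Hypothetical in-memory table standing in for a dictionary.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE entries (headword TEXT)')
conn.executemany('INSERT INTO entries VALUES (?)',
                 [('OK',), ('ok',), ('okay',)])

def search(search_str):
    clause, params = get_where_clause('headword', search_str)
    rows = conn.execute(
        'SELECT headword FROM entries WHERE ' + clause, params)
    is_match = get_match_function(search_str)
    # The SQL clause pre-selects candidates, the match function narrows them.
    return [headword for (headword,) in rows if is_match(headword)]
```

Here the SQL clause selects both "OK" and "ok" for the search string "OK", and the match function then discards the row that differs in case.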
-
class cjklib.dictionary.search.HanDeDictTranslation(caseInsensitive=True, **options)
Bases: cjklib.dictionary.search.SingleEntryTranslation
HanDeDict translation based search strategy. Takes into account additions
put in parentheses and allows for multiple entries in one record separated
by punctuation marks.
-
getMatchFunction(searchStr)
-
class cjklib.dictionary.search.HanDeDictWildcardTranslation(*args, **options)
Bases: cjklib.dictionary.search.SingleEntryTranslation, cjklib.dictionary.search._SimpleTranslationWildcardBase
HanDeDict translation based search strategy with support for wildcards.
Takes into account additions put in parentheses and appended information
separated by a comma.
-
getMatchFunction(searchStr)
-
getWhereClause(column, searchStr)
-
class cjklib.dictionary.search.MixedTonelessWildcardReading(supportWildcards=True, headwordFullwidthCharacters=False, **options)
Bases: cjklib.dictionary.search.SimpleReading, cjklib.dictionary.search._MixedTonelessReadingWildcardBase
Reading search strategy that supplements TonelessWildcardReading to allow
intermixing of readings missing tonal information with single characters
from the headword. By default wildcard searches are supported.
This strategy complements the basic search strategy. It is not built to
return results for plain reading or plain headword strings.
Parameters:
- caseInsensitive (bool) – if True, Latin characters match their
upper/lower case equivalent; if False, case sensitive matches
will be made (default)
- sqlCollation (str) – optional collation to use on columns in SQL queries
- supportWildcards (bool) – if True, wildcard characters are
interpreted (default)
- headwordFullwidthCharacters (bool) – if True, halfwidth characters
are converted to fullwidth if found in the headword
- escape (str) – character used to escape command characters
- singleCharacter (str) – wildcard character matching a single arbitrary
character
- multipleCharacters (str) – wildcard character matching zero, one or many
arbitrary characters
-
getMatchFunction(searchStr, **options)
-
getWhereClause(headwordColumn, readingColumn, searchStr, **options)
Returns a SQLAlchemy clause that is the necessary condition for a
possible match. This clause is used in the database query. Results may
then be further narrowed by getMatchFunction().
Parameters:
- headwordColumn (SQLAlchemy column instance) – headword column to check against
- readingColumn (SQLAlchemy column instance) – reading column to check against
- searchStr (str) – search string
Returns: SQLAlchemy clause
-
class cjklib.dictionary.search.MixedWildcardReading(supportWildcards=True, headwordFullwidthCharacters=False, **options)
Bases: cjklib.dictionary.search.SimpleReading, cjklib.dictionary.search._MixedReadingWildcardBase
Reading search strategy that supplements SimpleWildcardReading to allow
intermixing of readings with single characters from the headword.
By default wildcard searches are supported.
This strategy complements the basic search strategy. It is not built to
return results for plain reading or plain headword strings.
Parameters:
- caseInsensitive (bool) – if True, Latin characters match their
upper/lower case equivalent; if False, case sensitive matches
will be made (default)
- sqlCollation (str) – optional collation to use on columns in SQL queries
- supportWildcards (bool) – if True, wildcard characters are
interpreted (default)
- headwordFullwidthCharacters (bool) – if True, halfwidth characters
are converted to fullwidth if found in the headword
- escape (str) – character used to escape command characters
- singleCharacter (str) – wildcard character matching a single arbitrary
character
- multipleCharacters (str) – wildcard character matching zero, one or many
arbitrary characters
-
getMatchFunction(searchStr, **options)
-
getWhereClause(headwordColumn, readingColumn, searchStr, **options)
Returns a SQLAlchemy clause that is the necessary condition for a
possible match. This clause is used in the database query. Results may
then be further narrowed by getMatchFunction().
Parameters:
- headwordColumn (SQLAlchemy column instance) – headword column to check against
- readingColumn (SQLAlchemy column instance) – reading column to check against
- searchStr (str) – search string
Returns: SQLAlchemy clause
-
class cjklib.dictionary.search.SimpleReading(caseInsensitive=True, **options)
Bases: cjklib.dictionary.search.Exact
Simple reading search strategy. Converts search string to the dictionary
reading and separates entities by space.
Todo
- Fix: How to handle non-reading entities?
-
getMatchFunction(searchStr, **options)
-
getWhereClause(column, searchStr, **options)
-
setDictionaryInstance(dictInstance)
-
class cjklib.dictionary.search.SimpleTranslation(caseInsensitive=True, **options)
Bases: cjklib.dictionary.search.SingleEntryTranslation
Simple translation search strategy. Takes into account additions put in
parentheses.
-
getMatchFunction(searchStr)
-
class cjklib.dictionary.search.SimpleWildcardReading(**options)
Bases: cjklib.dictionary.search.SimpleReading, cjklib.dictionary.search._SimpleReadingWildcardBase
Simple reading search strategy with support for wildcards. Converts search
string to the dictionary reading and separates entities by space.
-
getMatchFunction(searchStr, **options)
-
getWhereClause(column, searchStr, **options)
-
class cjklib.dictionary.search.SimpleWildcardTranslation(*args, **options)
Bases: cjklib.dictionary.search.SingleEntryTranslation, cjklib.dictionary.search._SimpleTranslationWildcardBase
Simple translation search strategy with support for wildcards. Takes into
account additions put in parentheses.
-
getMatchFunction(searchStr)
-
getWhereClause(column, searchStr)
-
class cjklib.dictionary.search.SingleEntryTranslation(caseInsensitive=True, **options)
Bases: cjklib.dictionary.search.Exact
Basic translation search strategy.
-
getMatchFunction(searchStr)
-
getWhereClause(column, searchStr)
-
class cjklib.dictionary.search.TonelessWildcardReading(**options)
Bases: cjklib.dictionary.search.SimpleReading, cjklib.dictionary.search._TonelessReadingWildcardBase
Reading based search strategy with support for missing tonal information and
wildcards.
Example:
>>> from cjklib.dictionary import *
>>> d = CEDICT(readingSearchStrategy=search.TonelessWildcardReading())
>>> [r.Reading for r in d.getForReading('zhidao', toneMarkType='numbers')]
[u'zhì dǎo', u'zhí dǎo', u'zhǐ dǎo', u'zhí dào', u'zhí dǎo', u'zhī dao']
Todo
- Impl: Support readings with toneless base forms but without support
for missing tone
-
getMatchFunction(searchStr, **options)
-
getWhereClause(column, searchStr, **options)
-
setDictionaryInstance(dictInstance)
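How a search with missing tonal information can be matched is sketched below for Pinyin readings written with tone numbers. This is an illustration under stated assumptions rather than the library's algorithm: the name make_toneless_match is hypothetical, wildcards are omitted, and syllable boundaries in the search string are not analysed.

```python
import re

def make_toneless_match(search_str):
    """Return a predicate accepting readings that differ only in omitted tones.

    Hypothetical helper, not part of cjklib.
    """
    parts = []
    for ch in search_str.lower():
        if ch.isspace():
            continue
        if ch.isdigit():
            # A tone given in the search string has to be present in the entry.
            parts.append(re.escape(ch) + r'\s*')
        else:
            # After any letter the entry may carry a tone number and a space.
            parts.append(re.escape(ch) + r'[1-5]?\s*')
    pattern = re.compile(''.join(parts))
    return lambda reading: pattern.fullmatch(reading.lower()) is not None
```

A search for "zhidao" then accepts "zhi1 dao4" as well as "zhi3 dao3", while "zhi1dao" only accepts readings whose first syllable carries the first tone. Because syllable boundaries are ignored, the sketch can over-match, for example "xian" would also accept "xi1 an1".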
-
class cjklib.dictionary.search.Wildcard(fullwidthCharacters=False, **options)
Bases: cjklib.dictionary.search.Exact, cjklib.dictionary.search._WildcardBase
Basic headword search strategy with support for wildcards.
Parameters:
- caseInsensitive (bool) – if True, Latin characters match their
upper/lower case equivalent; if False, case sensitive matches
will be made (default)
- sqlCollation (str) – optional collation to use on columns in SQL queries
- fullwidthCharacters (bool) – if True, alphabetic halfwidth
characters are converted to fullwidth
- escape (str) – character used to escape command characters
- singleCharacter (str) – wildcard character matching a single arbitrary
character
- multipleCharacters (str) – wildcard character matching zero, one or many
arbitrary characters
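The meaning of the three wildcard parameters can be illustrated by translating a search string into a regular expression. The sketch below is not taken from cjklib. The defaults "_" and "%" follow the signature of setDefaultWildcards() documented above, while the backslash as default escape character and the function name wildcard_to_regex are assumptions.

```python
import re

def wildcard_to_regex(search_str, single='_', multiple='%', escape='\\'):
    """Compile a wildcard search string into a regular expression.

    Hypothetical helper, not part of cjklib.
    """
    parts = []
    chars = iter(search_str)
    for ch in chars:
        if ch == escape:
            # The character after the escape is taken literally; a trailing
            # escape character stands for itself.
            parts.append(re.escape(next(chars, escape)))
        elif ch == single:
            parts.append('.')    # exactly one arbitrary character
        elif ch == multiple:
            parts.append('.*')   # zero, one or many arbitrary characters
        else:
            parts.append(re.escape(ch))
    return re.compile(''.join(parts), re.DOTALL)
```

For instance, wildcard_to_regex('%国').fullmatch('中国') succeeds, and an escaped wildcard such as r'100\%' only matches the literal string "100%".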
-
getMatchFunction(searchStr, **options)
-
getWhereClause(column, searchStr, **options)
-
class cjklib.dictionary.search.WildcardTranslation(*args, **options)
Bases: cjklib.dictionary.search.SingleEntryTranslation, cjklib.dictionary.search._WildcardBase
Basic translation search strategy with support for wildcards.
-
getMatchFunction(searchStr)
-
getWhereClause(column, searchStr)