Spell checking module

The spell checking module consists of the following separate parts:

  • Spell checking: checks words against language-specific dictionaries, using third-party language-specific spell checkers. There are two kinds of dictionaries:

    • Language dictionaries (language checking)

    • Vertical dictionaries (professional dictionaries for a given set of languages)

  • User-written checking: checks words with user-written callback functions.

    Note: User dictionaries and user-written callbacks are deprecated.

You can combine any of these checks during the recognition process to determine whether a word is acceptable. Each check can still be enabled or disabled separately at zone level.
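
The way checks combine per zone can be pictured with a small sketch. This is not CSDK code: `ZoneSpellSettings`, `is_acceptable`, and the tiny word sets are hypothetical stand-ins, and the rule that a word is acceptable when any enabled check accepts it is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-ins for dictionary files; real dictionaries hold far more entries.
LANGUAGE_WORDS = {"the", "patient", "was", "discharged"}
VERTICAL_WORDS = {"tachycardia", "subpoena"}


@dataclass
class ZoneSpellSettings:
    """Per-zone switches (hypothetical names), one for each kind of check."""
    use_language_dictionary: bool = True
    use_vertical_dictionary: bool = False
    # Optional user-written check: receives the word and the zone index.
    user_check: Optional[Callable[[str, int], bool]] = None


def is_acceptable(word: str, zone_index: int, settings: ZoneSpellSettings) -> bool:
    """Return True if any check enabled for this zone accepts the word (assumed rule)."""
    candidate = word.lower()
    if settings.use_language_dictionary and candidate in LANGUAGE_WORDS:
        return True
    if settings.use_vertical_dictionary and candidate in VERTICAL_WORDS:
        return True
    if settings.user_check is not None and settings.user_check(word, zone_index):
        return True
    return False


# Zone 0 uses only the language dictionary; zone 1 also enables the vertical one.
zones = [
    ZoneSpellSettings(),
    ZoneSpellSettings(use_vertical_dictionary=True),
]
```

With these settings, `is_acceptable("tachycardia", 0, zones[0])` is rejected because zone 0 has no vertical dictionary enabled, while the same word is accepted in zone 1.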

The CSDK is delivered with more than 20 language dictionary files. These are generic language dictionaries that typically contain between 100,000 and 200,000 entries.

Vertical dictionaries are based on specific professions. They can be treated as extensions to the language dictionaries, but they can also be used when no language dictionary is specified. The CSDK is delivered with the following vertical dictionaries:

  • Dutch Legal Professional Dictionary

  • Dutch Medical Professional Dictionary

  • English Financial Professional Dictionary

  • English Legal Professional Dictionary

  • English Medical Professional Dictionary

  • French Legal Professional Dictionary

  • French Medical Professional Dictionary

  • German Legal Professional Dictionary

  • German Medical Professional Dictionary

A User Dictionary is a list of words and regular expressions, called UDitems. User Dictionaries can be created or modified manually by the user or through KernelAPI calls.

User-written checking callback functions are part of the integrating application. They receive the string to be checked and the index of the zone the string comes from. The application must evaluate the string, decide whether the recognized string is acceptable, and return that result to the Engine.

The Recognition Module uses feedback from the Spell Checker module, along with other data, to assess recognition confidence.
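
To illustrate the two user-driven mechanisms described above, here is a minimal sketch; it does not use the KernelAPI. `UDItem`, `user_dictionary_accepts`, and `check_invoice_number` are hypothetical names. Python's `re` syntax stands in for the CSDK's own regular-expression syntax, which may differ, and returning a plain boolean is an assumption about how acceptability would be expressed.

```python
import re
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class UDItem:
    """One user dictionary entry: a literal word or a regular expression."""
    text: str
    is_regex: bool = False

    def matches(self, candidate: str) -> bool:
        if self.is_regex:
            # The whole candidate must match the pattern, not just a prefix.
            return re.fullmatch(self.text, candidate) is not None
        return self.text == candidate


def user_dictionary_accepts(items: Iterable[UDItem], candidate: str) -> bool:
    """Accept the candidate if any entry in the user dictionary matches it."""
    return any(item.matches(candidate) for item in items)


def check_invoice_number(candidate: str, zone_index: int) -> bool:
    """Hypothetical user-written callback: gets the string and its zone index.

    In this sketch, zone 2 is assumed to hold an 8-digit invoice number;
    strings from every other zone are accepted without inspection.
    """
    if zone_index == 2:
        return len(candidate) == 8 and candidate.isdigit()
    return True


# A user dictionary mixing a literal word with a pattern (both made up).
user_dictionary = [
    UDItem("widgetron"),
    UDItem(r"INV-\d{4}", is_regex=True),
]
```

Here `user_dictionary_accepts(user_dictionary, "INV-2024")` matches the pattern entry, and `check_invoice_number("1234567B", 2)` rejects a string that contains a non-digit character in the invoice zone.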

Note: The confidence reporting system works best when the 3-way voting engine runs. If other machine print recognition modules are used, confidence information is still available, but the system's ability to report on confidence is reduced, which leads to more false negatives and false positives when flagging suspicious recognition results.