Word separation characters

The "Field delimiter" option, available for fuzzy databases and some page recognition profiles, lets you specify which characters split compound words into separate words. When one of these separation characters is encountered, the current word ends and a new word starts with the character that follows.

For example, suppose a document contains the compound word Diagon-Alley and you want the search to treat it as two separate words. If you specify a hyphen (-) as a word separation character, diagon and alley are searched and evaluated separately.
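The splitting behavior described above can be illustrated with a minimal Python sketch. It is not the product's implementation: the function name `split_words` and its `separators` parameter are hypothetical, and the sketch assumes that whitespace always ends a word in addition to the configured separation characters.

```python
import re


def split_words(text, separators="-"):
    """Split text into words on whitespace and on each separator character.

    Hypothetical helper for illustration only; it does not reproduce
    the product's actual matching logic.
    """
    # re.escape makes characters such as '-' or '(' literal inside the class.
    pattern = "[" + re.escape(separators) + r"\s]+"
    # Drop empty strings produced by leading or trailing separators.
    return [word for word in re.split(pattern, text) if word]


# With a hyphen configured, the compound word becomes two words.
print(split_words("Diagon-Alley", "-"))
# With no separation characters, the compound word stays intact.
print(split_words("Diagon-Alley", ""))
```

In this sketch, the first call yields two words and the second call yields the compound word unchanged, which mirrors the difference between configuring and not configuring a hyphen.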

Fuzzy databases use a hyphen (-) and a comma (,) as their default word separation characters. Page recognition profiles, however, have different default values. The following table lists the default word separation characters for each page recognition profile:

Table 1. Default Page Recognition Profile Word Separation Characters
Recognition Profile                      Default Word Separation Characters

FineReader 12.4 page recognition         /:()-#

OmniPage 21.0 page recognition           /:()-#

Cursive page recognition *               N/A
  • Check Reader 11.1
  • Field Reader 8.1
  • Document Reader 8.1

Mixed Print page recognition **          /:()-#

* These recognition engines are not installed by default and require additional licensing.

** Applies if no profile is chosen as the input profile for machine print.
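As a rough illustration of how the defaults above differ in effect, the following Python sketch stores them in a lookup table and applies them to sample text. The names `DEFAULT_SEPARATORS` and `tokenize` are hypothetical, and treating the N/A entry as "split on whitespace only" is an assumption made for this example, not documented behavior.

```python
# Default word separation characters, taken from the text and table above.
# None stands for the "N/A" entry (assumed here to mean no separators).
DEFAULT_SEPARATORS = {
    "fuzzy database": "-,",
    "FineReader 12.4 page recognition": "/:()-#",
    "OmniPage 21.0 page recognition": "/:()-#",
    "Cursive page recognition": None,
    "Mixed Print page recognition": "/:()-#",
}


def tokenize(text, source):
    """Split text into words using the defaults for the given source.

    Hypothetical helper for illustration only.
    """
    separators = DEFAULT_SEPARATORS[source]
    if not separators:
        # Assumption: with no separators, only whitespace ends a word.
        return text.split()
    # Replace every separator character with a space, then split.
    table = str.maketrans({ch: " " for ch in separators})
    return text.translate(table).split()


print(tokenize("Diagon-Alley, London", "fuzzy database"))
print(tokenize("Invoice#12/2024 (draft)", "OmniPage 21.0 page recognition"))
```

Under these assumptions, the same input can produce different word lists depending on whether it is evaluated with the fuzzy database defaults or with a page recognition profile's defaults.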

Important: The FineReader recognition engine will be deprecated in the next release of Kofax Transformation Modules. Kofax therefore recommends that you use the OmniPage recognition profile for both page and zone recognition in all new projects. If existing projects use one or more FineReader profiles, Kofax also recommends modifying them to use a comparable OmniPage profile.