RecoStar Zone Recognition Profile Settings window

Countries

Select one or more countries from the list to determine which country-specific and language-specific characters this recognition profile supports. The selected countries also correspond to the internal trigram mode and any existing dictionaries.

The default country option is determined by the Kofax Transformation Modules user interface language when the project is first created. For example, if the interface is set to German when the project is created, the default country option is Germany. Changing the application language after the project is created does not affect the default country.

Languages

Only use additional recognition options in a multi-lingual setting when the text in your documents corresponds to the Machine print type and the Alphanumeric content type.

Note The (optional) number in brackets helps you monitor the number of selected countries and/or languages. This is especially important if your selection is hidden and you need to scroll to see it.

Image PreProcessing

This group enables the selection of one of the following predefined image processing definitions:

  • <No Image PreProcessing>.

  • Remove Dots, Lines.

  • Remove Dots. This is the default value for this option.

  • Remove Lines, Punch holes.

  • Remove Lines.

  • Remove Shading, Dots, Lines, Punch holes.

  • Remove Shading, Lines, Punch holes.

Note These predefined image processing definitions are applied before recognition and are not related to the image cleanup methods defined on the Project Settings - Image Cleanup tab.

Print Type

This group helps improve recognition results by letting you select the expected print type. Choose one of the following print types:

  • Unknown. This is the default value for this option.

  • Fixed.

  • Hand print.

  • Machine print.

  • OCR-A.

  • OCR-B.

  • Farrington 7B.

  • CMC-7.

  • E-13B.

Content

This group has the following options:

Content type

Select one of the following content types:

  • Alphanumeric. This is the default value for this option.

  • Numeric.

  • Amount.

Character set

Use this option to specify a mask for the recognition text. This option is empty by default.

Pattern

Use this option to narrow down the segmentation and recognition alternatives that are already limited by the selected character set. This option is empty by default.

Lines and pitch

This group has the following options:

Number of lines

Use this to specify the number of expected lines in a zone or to choose the type of segmentation to perform.

  • Unknown. Select this if you do not know how many lines are in a zone. This is the default value for this option.

  • Document Layout. Select this if you want to extract text information from areas with a complex layout while ignoring all non-text information.

  • Manual. Select this if you know the number of expected lines. Use the field to the right to set the number of expected lines.

Machine print pitch

Use this to specify the pitch (horizontal spacing) of the characters that are machine generated. Pitch is measured from the start of one character (leftmost pixel) to the start of the next character:

  • Unknown. Select this if you do not know if the pitch is fixed or variable. In this case, the recognition engine tries to determine the type (fixed or variable) and the value of the pitch automatically. This is the default value for this option.

  • Fixed. Select this if you know that the pitch is constant, but you do not know the exact value.

  • Variable. Select this if you know that the pitch is inconsistent.

  • Manual. Select this if the pitch is fixed and you know the exact value. You can provide a value in units of 1/100 mm. The default value is 500 (5 mm), and you can specify any integer value between 170 and 10000.

Hand print pitch

Use this to specify the pitch (horizontal spacing) of the hand printed characters. Pitch is measured from the start of one character (leftmost pixel) to the start of the next character:

  • Unknown. Select this if you do not know if the pitch is fixed or variable. In this case, the recognition engine tries to determine the type (fixed or variable) and the value of the pitch automatically. This is the default value for this option.

  • Fixed. Select this if you know that the pitch is constant, but you do not know the exact value.

  • Variable. Select this option if you know that the pitch is inconsistent.

  • Manual. Select this if the pitch is fixed and you want to specify an exact value. You can provide a value in units of 1/100 mm. The default value is 500 (5 mm), and you can specify any number between 360 and 630.
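
Because the Manual pitch value is entered in units of 1/100 mm, it can help to convert a measured pitch before typing it in. The following is a minimal sketch, not part of Kofax Transformation Modules: the function names are hypothetical, the ranges are the ones stated for the Manual options above, and the characters-per-inch conversion is an added convenience that assumes 1 inch = 25.4 mm.

```python
# Allowed Manual pitch ranges in units of 1/100 mm, as stated above.
MACHINE_PRINT_RANGE = (170, 10000)
HAND_PRINT_RANGE = (360, 630)
MM_PER_INCH = 25.4


def mm_to_pitch_units(pitch_mm: float) -> int:
    """Convert a pitch in millimetres to units of 1/100 mm (nearest integer)."""
    return round(pitch_mm * 100)


def cpi_to_pitch_units(chars_per_inch: float) -> int:
    """Convert characters per inch to a pitch in units of 1/100 mm."""
    return round(MM_PER_INCH / chars_per_inch * 100)


def is_valid_pitch(value: int, print_type: str) -> bool:
    """Check a Manual pitch value against the range for 'machine' or 'hand'."""
    if print_type == "machine":
        low, high = MACHINE_PRINT_RANGE
    elif print_type == "hand":
        low, high = HAND_PRINT_RANGE
    else:
        raise ValueError("print_type must be 'machine' or 'hand'")
    return low <= value <= high


if __name__ == "__main__":
    pitch = cpi_to_pitch_units(10)  # a 10 characters-per-inch typewriter font
    print(pitch, is_valid_pitch(pitch, "machine"), is_valid_pitch(pitch, "hand"))
```

For example, a pitch of 5 mm corresponds to the default value of 500, which falls inside both ranges. A 10 characters-per-inch font works out to 254, which is accepted for machine print but is below the minimum for hand print.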

Trigram Mode

Select one of the following trigram modes:

  • Off. This is the default value for this option.

  • Check.

  • Repair.

Logical context

Select this option if you want to enable the trigram feature to resolve uncertain characters on the basis of their logical context. This option is selected by default.

Dictionary

This group enables you to select a dictionary to help with recognition. Sample dictionaries are provided by Kofax Transformation Modules. Additionally, you can create your own dictionaries. Click the button to the right to browse to the directory where the dictionaries are stored. Click <None> to remove the dictionary from your recognition profile. The value for this option is set to <None> by default.

Important To be able to use extended sample dictionaries modified in a project created in an earlier Kofax Transformation Modules version, you need to copy these files from <Install Directory>\Program Files\Common Files\Kofax\RecoStar40b\Dict to the ..\RecoStar78\Dict directory.
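
If there are many dictionary files, the copy step described above can be scripted. The following is a minimal sketch, assuming you pass in the full source and target directories yourself; the function name and the choice to skip files that already exist in the target are assumptions made here for safety, not documented product behavior.

```python
import shutil
from pathlib import Path


def copy_dictionaries(source_dir, target_dir, overwrite=False):
    """Copy all files from source_dir to target_dir and return the copied names.

    Files that already exist in target_dir are skipped unless overwrite is True
    (an assumption of this sketch, so newer dictionaries are not replaced).
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for item in sorted(source.iterdir()):
        if not item.is_file():
            continue  # only top-level files are copied in this sketch
        destination = target / item.name
        if destination.exists() and not overwrite:
            continue
        shutil.copy2(item, destination)  # copy2 also preserves timestamps
        copied.append(item.name)
    return copied
```

Call the function with the old Dict directory as the first argument and the new Dict directory as the second, then check the returned list to confirm which files were copied.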

Complete

Select this if you want to return a result from the dictionary when the recognition result has low confidence. This option is cleared by default.

Definitions for the buttons at the bottom of this window can be found in Common Project Builder Buttons.