OmniPage Page Recognition Profile Settings window

Use this window to set the properties for OmniPage full page recognition. The following settings are available.

Languages

This group enables you to select one or more languages from a comprehensive list of available languages. You can combine any languages. Only English is selected by default.

As you select additional languages, the language count in the header increases. Because not all languages are visible at the same time, this count ensures that you are aware of any selected languages that are lower down in the list.

If you select English plus one additional language, that other language is given priority during recognition. For example, a selection of English and Greek assumes that the majority of the documents are in Greek with only a few words in English.

If a batch contains many documents, some mostly English and others mostly Greek, pre-sort the documents and use a separate project for each language in production. Alternatively, use a single project, classify the documents by language before recognition is executed, and then use a re-read profile for the dominant language.
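The pre-sorting step described above can be sketched outside the product. The following Python example is only an illustration, not part of OmniPage or the designer: it assumes that some text is already available for each document (for example, from an earlier recognition pass), and the function name `dominant_script` is hypothetical. It counts Greek and Latin letters to guess which language dominates.

```python
import unicodedata


def dominant_script(text: str) -> str:
    """Return 'greek', 'latin', or 'unknown' based on letter counts.

    Hypothetical helper for pre-sorting; not an OmniPage API.
    """
    counts = {"greek": 0, "latin": 0}
    for ch in text:
        if not ch.isalpha():
            continue  # ignore digits, punctuation, and whitespace
        name = unicodedata.name(ch, "")
        if name.startswith("GREEK"):
            counts["greek"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
    if counts["greek"] == counts["latin"]:
        return "unknown"  # tie, or no Greek/Latin letters at all
    return max(counts, key=counts.get)


print(dominant_script("Καλημέρα κόσμε, hello"))   # greek
print(dominant_script("Invoice number 42 για"))  # latin
```

Documents that return 'unknown' would still need a manual decision or a different classification approach.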

General Settings

This group has the following settings:

Word separation characters

Use this field to define what characters may separate words.

The value for this setting is set to /:()-# by default.

For Chinese and Japanese text, a character is recognized as a word only if it is followed by a space or by one of the separation characters defined above. Sequential characters with no spaces or separation characters between them are not recognized as individual words.
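The effect of the separation characters can be illustrated with a short Python sketch. It is an approximation of the behavior described above, not the recognition engine's actual tokenizer. The function name `split_words` is an assumption, and whitespace is treated as a separator in addition to the default set /:()-#.

```python
import re

# Default separator set taken from the setting described above.
DEFAULT_SEPARATORS = "/:()-#"


def split_words(text: str, separators: str = DEFAULT_SEPARATORS) -> list[str]:
    """Split text on whitespace and on any of the separator characters."""
    # Escape the separators so that characters such as '(' and '-' are
    # treated literally inside the character class.
    pattern = "[" + re.escape(separators) + r"\s]+"
    return [word for word in re.split(pattern, text) if word]


print(split_words("INV-2024/001 (copy)"))  # ['INV', '2024', '001', 'copy']
print(split_words("发票号码"))               # ['发票号码'] (one word, no separators)
print(split_words("发票 号码"))              # ['发票', '号码']
```

The last two calls show why sequential Chinese characters without a space or separation character are kept together as a single word.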

Recognition mode

Select the Recognition mode from the following values.

  • Fast.

    Choose this value to prioritize speed of recognition over the recognition accuracy. If you choose this value, the recognition process takes less time, but the overall recognition accuracy may decrease.

  • Balanced.

    Choose this value to balance recognition accuracy against recognition speed.

    This is the default value for this setting.

  • Accurate.

    Choose this value if you want to prioritize the recognition accuracy over the speed of recognition. If you choose this value, the recognition process takes longer.

Image transfer mode

Choose how images are transferred for this recognition method. You can choose between the following values.

  • Memory.

    Choose this value to use the binary image that is already stored in memory. This is the default value for this setting.

  • File.

    Choose this value to pass the image file path to the recognition engine so that the engine can access the file directly. If this value is chosen, any image pre-processing performed on the image is ignored during recognition.

    When the File value is chosen for the Image transfer mode, you cannot use the OmniPage page profile in a Mixed Print page profile. If the OmniPage page profile is already used by a Mixed Print page profile, selecting the File transfer mode removes the OmniPage page profile from that Mixed Print page profile, and an error message is displayed.

    A Mixed Print page profile is not able to process images when their image transfer mode is file-based. This is because in mixed recognition, two images are created in memory, one for each recognition print type configured in the Mixed Print page profile.
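The interaction between the File transfer mode and Mixed Print page profiles can be modeled as a small rule. The Python sketch below is purely illustrative: the class names, field names, and the `set_transfer_mode` function are assumptions made for this example and do not correspond to any product API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ImageTransferMode(Enum):
    MEMORY = "Memory"  # default value according to the text above
    FILE = "File"


@dataclass
class OmniPageProfile:  # hypothetical model of an OmniPage page profile
    name: str
    transfer_mode: ImageTransferMode = ImageTransferMode.MEMORY


@dataclass
class MixedPrintProfile:  # hypothetical model of a Mixed Print page profile
    name: str
    omnipage_profile: Optional[OmniPageProfile] = None


def set_transfer_mode(
    profile: OmniPageProfile,
    mode: ImageTransferMode,
    mixed_profiles: List[MixedPrintProfile],
) -> List[str]:
    """Apply the mode and return the names of the Mixed Print profiles
    from which the OmniPage profile had to be removed."""
    profile.transfer_mode = mode
    removed = []
    if mode is ImageTransferMode.FILE:
        for mixed in mixed_profiles:
            if mixed.omnipage_profile is profile:
                mixed.omnipage_profile = None  # file-based mode is not allowed here
                removed.append(mixed.name)
    return removed


op = OmniPageProfile("FullPage")
mixed = MixedPrintProfile("HandAndMachine", omnipage_profile=op)
print(set_transfer_mode(op, ImageTransferMode.FILE, [mixed]))  # ['HandAndMachine']
```

A non-empty return value corresponds to the situation in which the window displays an error message.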

Dictionaries

This group has the following settings:

Language dictionary

Select this setting to run spell checking during recognition for all of the following languages.

  • Brazilian
  • Catalan
  • Czech
  • Danish
  • Dutch
  • English
  • Esperanto
  • Finnish
  • French
  • German
  • Greek
  • Hungarian
  • Italian
  • Norwegian
  • Polish
  • Portuguese
  • Russian
  • Slovenian
  • Spanish
  • Swedish
  • Turkish

User dictionary

Select this setting to specify your own dictionary to aid spell checking during recognition. Note that a misspelled word in the recognition results is not always replaced with a corresponding entry from a user dictionary.

Once this setting is selected, browse to the dictionary that you want to use.

User dictionaries are not supported for the following languages.

  • Arabic

  • Chinese (Simplified and Traditional)

  • Hebrew

  • Japanese

  • Korean

  • Thai

  • Vietnamese

The value for this setting is set to None by default.
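Before assigning a user dictionary, it can help to check whether any of the selected recognition languages fall into the unsupported list above. The following Python sketch shows one way to do that. The function name and the exact language labels (such as "Chinese (Simplified)") are assumptions, because the labels used in the Languages list may differ.

```python
# Base language names for which user dictionaries are not supported,
# taken from the list above.
NO_USER_DICTIONARY = frozenset(
    {"Arabic", "Chinese", "Hebrew", "Japanese", "Korean", "Thai", "Vietnamese"}
)


def languages_without_user_dictionary(selected: list[str]) -> list[str]:
    """Return the selected languages that cannot use a user dictionary."""
    unsupported = []
    for language in selected:
        # Strip a variant suffix such as " (Simplified)" before the lookup.
        base_name = language.split(" (")[0]
        if base_name in NO_USER_DICTIONARY:
            unsupported.append(language)
    return unsupported


print(languages_without_user_dictionary(["English", "Chinese (Simplified)", "Thai"]))
# ['Chinese (Simplified)', 'Thai']
```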

Business terms dictionary

Select this setting, and then select a business terms dictionary from the list. The contents of the selected dictionary can improve spell checking during recognition. The following dictionaries containing business terms are available. If none of them is suitable, use the User dictionary setting above and specify your own list of terms.

  • English Financial Dictionary

  • Dutch Medical Dictionary

  • English Medical Dictionary

  • French Medical Dictionary

  • German Medical Dictionary

  • Dutch Legal Dictionary

  • English Legal Dictionary

  • French Legal Dictionary

  • German Legal Dictionary

The value for this setting is set to None by default.

OmniPage Raw Results

Use this group to configure how the raw results from the OmniPage recognition engine are saved.

Provide page elements

Select this setting to provide raw page results by specifying a format and scope. This setting is cleared by default.

Scope

Select a value for the scope that is included in the raw results.

  • All elements. When this value is selected, all information from all pages of a document is provided.
  • Tables only. When this value is selected, only table information is provided.

This setting is available only when the Provide page elements setting is selected.

Data format

Select the format in which the raw data is recorded:

  • XML

  • JSON

This setting is available only when the Provide page elements setting is selected.
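The dependency between these three settings can be summarized in a short Python sketch. It is a hypothetical model for illustration only: the class name, field names, and validation messages are assumptions. No default is assumed for Scope or Data format, because this page does not state one.

```python
from dataclasses import dataclass
from typing import Optional

SCOPES = ("All elements", "Tables only")
DATA_FORMATS = ("XML", "JSON")


@dataclass
class RawResultsSettings:  # hypothetical model of the OmniPage Raw Results group
    provide_page_elements: bool = False  # cleared by default
    scope: Optional[str] = None
    data_format: Optional[str] = None

    def validate(self) -> None:
        """Raise ValueError if the combination of settings is not allowed."""
        if not self.provide_page_elements:
            # Scope and Data format are available only when
            # Provide page elements is selected.
            if self.scope is not None or self.data_format is not None:
                raise ValueError(
                    "Scope and Data format require Provide page elements"
                )
            return
        if self.scope not in SCOPES:
            raise ValueError(f"Scope must be one of {SCOPES}")
        if self.data_format not in DATA_FORMATS:
            raise ValueError(f"Data format must be one of {DATA_FORMATS}")


RawResultsSettings().validate()  # default: nothing provided, passes
RawResultsSettings(True, "Tables only", "JSON").validate()  # passes
```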

Definitions for the buttons at the bottom of this window can be found in Common Transformation Designer Buttons.