Kofax Clarity Page Recognition Profile Settings window

This recognition engine differs slightly from the other engines. It is specialized to recognize text in any image, including a picture of an ID badge worn by a person or a photo of a sign. Also, OCR is not performed on the Kofax Transformation Modules server; it is performed on a remote server via an internet connection.

Important You require a separate license to use this recognition engine.

Communication between the Kofax Clarity recognition profile and its server uses port 443. This port must be open in your firewall settings.
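
Before you configure the profile, it can help to confirm that outbound TCP connections on port 443 are permitted from the machine that runs recognition. The following Python sketch shows one way to check this. The function name can_reach and the host name in the example call are placeholders, because the address of the Kofax Clarity server is not given here. A successful connection only shows that the firewall allows the traffic; it does not show that the Kofax Clarity service or license is working.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example call; "clarity.example.invalid" is a placeholder, not a real endpoint.
# print(can_reach("clarity.example.invalid"))
```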

If you perform recognition using Kofax Clarity during testing in Project Builder, runtime licenses are consumed. If you want to avoid consuming too many runtime licenses during project configuration and testing, Kofax recommends that you test Kofax Clarity on selected images only, and that you perform OCR on other test images or training images using another recognition engine.

During configuration, the best practice is to run this recognition engine a few times without a fallback recognition engine configured. This confirms that everything is working as expected and that the proper internet access is available.

You can use this window to set the Kofax Clarity full-text OCR profile.

Languages

This group enables you to select one or more specific languages, or allow the recognition engine to determine the language itself.

This group has the following options:

Automatic language detection

Select this to allow the Kofax Clarity recognition engine to determine the language of a document. This option is selected by default.

Selected languages

Select this option if you want to explicitly specify which languages are used in your documents.

Once selected, the list of languages is enabled. Select one or more languages.

The list of available languages depends on which Recognition mode is selected.

Document Mode does not support Chinese, Greek, Hebrew, Japanese, Korean, or Thai. Text mode supports these languages, but your documents may not be suitable for that mode. For the best results with documents in these languages, thoroughly test both modes, using the selected language as well as the Automatic language detection option, to see which combination performs best. Alternatively, select a different recognition engine.
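
When you plan tests for a project, a small helper can flag which of the languages you intend to select fall under the Document Mode restriction described above. The following Python sketch is illustrative only. The language names are plain strings taken from the list above, and the function name is hypothetical; the identifiers used by the product may differ.

```python
# Languages the text above lists as not supported by Document Mode.
TEXT_MODE_ONLY = {"Chinese", "Greek", "Hebrew", "Japanese", "Korean", "Thai"}


def unsupported_in_document_mode(selected_languages):
    """Return the selected languages that Document Mode does not support, sorted."""
    return sorted(set(selected_languages) & TEXT_MODE_ONLY)


print(unsupported_in_document_mode(["English", "Japanese", "Thai"]))
# prints ['Japanese', 'Thai']
```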

Important If you are not sure which languages are used in your documents, use the Automatic language detection option because it provides better OCR results than selecting the wrong languages.

General Settings

This group enables you to specify how a document is recognized with the Kofax Clarity engine.

This group has the following options:

Recognition mode

Select one of the following modes of recognition.

  • Document Mode.

    Select this mode if your documents are classic paper documents, forms, or densely packed text images, for example, an invoice or a bank letter. This option is selected by default.

  • Text mode.

    Select this mode to detect and extract text from images that contain a small amount of text, for example, a photo ID card.

Word separation characters

Use this field to define which characters may separate words.

The value for this option is set to /:()-# (forward slash, colon, open and close parentheses, hyphen, pound) by default.
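
To illustrate the effect of separator characters, the following Python sketch splits a string on whitespace and on the default characters listed above. It is an assumption about the general idea only. How the recognition engine applies the separators internally, for example whether the separator characters themselves are kept as words, is not described here, and the function name split_words is hypothetical.

```python
import re

DEFAULT_SEPARATORS = "/:()-#"  # default value stated above


def split_words(text: str, separators: str = DEFAULT_SEPARATORS) -> list[str]:
    """Split text on whitespace and on any of the separator characters."""
    # re.escape keeps characters such as "-" and ")" literal inside the class.
    pattern = "[\\s" + re.escape(separators) + "]+"
    return [w for w in re.split(pattern, text) if w]


print(split_words("INV/2023:0042 (net) #7-A"))
# prints ['INV', '2023', '0042', 'net', '7', 'A']
```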

Fallback Profile

If the Kofax Clarity recognition profile is temporarily unavailable, you can configure which recognition engine is used instead. This ensures that a broken network connection does not hold up processing with failed OCR results.

Recognition profile to be used as fallback

Select a page recognition profile that performs OCR if the Kofax Clarity recognition profile is not available. The value for this option is set to <None> by default.
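
The fallback behavior can be pictured with the following Python sketch. It is not the product's implementation. The recognizer functions are hypothetical stand-ins, and ConnectionError is used here as an assumed signal for "profile not available"; the exact conditions that trigger the fallback in the product are not stated here.

```python
from typing import Callable, Optional

Recognizer = Callable[[bytes], str]  # hypothetical: image bytes in, text out


def recognize_with_fallback(image: bytes, primary: Recognizer,
                            fallback: Optional[Recognizer] = None) -> str:
    """Try the primary recognizer; use the fallback only if the primary is unavailable."""
    try:
        return primary(image)
    except ConnectionError:
        if fallback is None:
            # Mirrors the default <None>: the failure is reported, not masked.
            raise
        return fallback(image)


def remote_ocr(image: bytes) -> str:
    # Stand-in for a remote engine whose server cannot be reached.
    raise ConnectionError("remote server unreachable")


def local_ocr(image: bytes) -> str:
    # Stand-in for a locally available page recognition profile.
    return "text from local engine"


print(recognize_with_fallback(b"image-bytes", remote_ocr, local_ocr))
# prints text from local engine
```

With no fallback passed, the same call raises ConnectionError, which is why running without a fallback during configuration makes connection problems visible.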

Important You cannot call page recognition from script using the Kofax Clarity recognition engine.