Kofax Clarity Page Recognition Profile Settings window

This recognition engine differs slightly from the other engines. It is specialized to recognize text in any image, including a picture of an ID badge worn by a person, or a photo of a sign. Also, instead of performing OCR on the Kofax TotalAgility server, OCR is performed on a remote server via an internet connection.

You require a separate license to use this recognition engine.

Communication between the Kofax Clarity recognition profile and its server uses port 443. This port must be open in your firewall settings.
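
To confirm that the firewall allows outbound traffic on port 443, you can run a simple connectivity check from the machine that performs recognition. The following Python sketch is illustrative only and is not part of the product. The function names and the example URL are hypothetical; substitute the endpoint configured in your profile.

```python
import socket
from urllib.parse import urlparse


def endpoint_host_port(endpoint_url, default_port=443):
    """Return (host, port) for an endpoint URL, defaulting to port 443."""
    parsed = urlparse(endpoint_url)
    if not parsed.hostname:
        raise ValueError(f"No host found in endpoint URL: {endpoint_url!r}")
    return parsed.hostname, parsed.port or default_port


def can_reach(endpoint_url, timeout=5.0):
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = endpoint_host_port(endpoint_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example usage (hypothetical URL, not the real Kofax Clarity endpoint):
#   can_reach("https://clarity.example.com/ocr")
```

A successful TCP connection shows only that the port is open. It does not verify the license or the availability of the OCR service itself.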

If you perform recognition using Kofax Clarity during testing in Transformation Designer, runtime licenses are consumed. If you want to avoid consuming too many runtime licenses during project configuration and testing, Kofax recommends that you test Kofax Clarity on selected images only, and that you perform OCR on other test images or training images using another recognition engine.

During configuration, the best practice is to run this recognition engine a few times without a fallback recognition engine configured. This ensures that everything is working as expected and that the proper internet access is available.

You can use this window to set the Kofax Clarity full-text OCR profile.

Languages

This group enables you to select one or more specific languages, or allow the recognition engine to determine the language itself.

This group has the following settings:

Automatic language detection

Select this to allow the Kofax Clarity recognition engine to determine the language of a document. This setting is selected by default.

Selected languages

Select this setting if you want to explicitly specify what languages are used in your documents.

Once selected, the list of languages is enabled. Select one or more languages.

The list of available languages depends on which Recognition mode is selected.

Chinese, Greek, Hebrew, Japanese, Korean, and Thai are not supported by Document Mode. Text mode supports these languages, but your documents may not be suitable for that mode. For the best results with documents in these languages, thoroughly test both modes, using the selected language as well as the Automatic language detection setting, to see which combination performs best. Alternatively, select a different recognition engine.

If you are not sure what languages are used in your documents, use the "Automatic language detection" setting because it provides better OCR results than if the wrong languages are selected.
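
The dependency between the recognition mode and the language list can be expressed as a small check. The following Python sketch is illustrative only and is not a product API. The function name is hypothetical, and the language set contains only the six languages named above, not the full list offered by the engine.

```python
# Languages that, per the description above, Text mode supports
# but Document Mode does not.
TEXT_MODE_ONLY = {"Chinese", "Greek", "Hebrew", "Japanese", "Korean", "Thai"}


def unsupported_languages(mode, languages):
    """Return the selected languages that the given mode does not support."""
    if mode != "Document Mode":
        return []
    return sorted(lang for lang in languages if lang in TEXT_MODE_ONLY)


# Example: Greek and Thai cannot be combined with Document Mode.
print(unsupported_languages("Document Mode", ["English", "Thai", "Greek"]))
```

An empty result means that the selection does not conflict with the mode. It does not mean that the mode is the best choice for your documents.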

General Settings

This group enables you to specify how a document is recognized with the Kofax Clarity engine.

This group has the following settings:

Recognition mode

Select one of the following modes of recognition.

  • Document Mode.

    Select this mode if your documents are classic paper documents, forms, or densely packed text images. For example, an invoice or a bank letter. This setting is selected by default.

  • Text mode.

    Select this mode to detect and extract text from images with a small amount of text. For example, a photo ID card.

Word separation characters

Use this field to define what characters may separate words.

The value for this setting is set to /:()-# (forward slash, colon, open and close parentheses, hyphen, pound) by default.
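
To see the effect of separation characters on a line of text, consider the following Python sketch. It is illustrative only and assumes that whitespace and each configured character act as word boundaries. The actual tokenization performed by the recognition engine is not described here and may differ. The function name is hypothetical.

```python
import re

# Default value of the "Word separation characters" setting.
DEFAULT_SEPARATORS = "/:()-#"


def split_words(text, separators=DEFAULT_SEPARATORS):
    """Split text on whitespace and on any of the separator characters."""
    # re.escape makes characters such as "(" and "-" safe inside the class.
    pattern = "[" + re.escape(separators) + r"\s]+"
    return [word for word in re.split(pattern, text) if word]


print(split_words("Invoice#12345 (Date:2023/01/15)"))
# ['Invoice', '12345', 'Date', '2023', '01', '15']

print(split_words("Invoice#12345 (Date:2023/01/15)", separators=""))
# ['Invoice#12345', '(Date:2023/01/15)']
```

In this sketch, removing a character from the list keeps values that contain it together. For example, removing the forward slash keeps a date such as 2023/01/15 as a single word.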

Endpoint for OCR processing

Enter the URL for the Kofax Clarity recognition engine. By default, the global endpoint is provided.

Since Kofax Clarity may run in multiple locations, specify the location that best suits your needs.

Fallback Profile

If the Kofax Clarity recognition profile is temporarily unavailable, you can configure which recognition engine is used instead. This ensures that a broken network connection does not hold up processing with failed OCR results.

Recognition profile to be used as fallback

Select a page recognition profile that performs OCR if the Kofax Clarity recognition profile is not available. The value for this setting is set to <None> by default.
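
Conceptually, the fallback behaves like the following Python sketch. It is illustrative only and does not show how the product is implemented or how it can be scripted. All names are hypothetical, and the two engines are stand-in functions.

```python
class EngineUnavailable(Exception):
    """Raised by a stand-in engine when its server cannot be reached."""


def recognize_with_fallback(image, primary, fallback=None):
    """Run the primary engine; use the fallback only if it is unavailable."""
    try:
        return primary(image)
    except EngineUnavailable:
        if fallback is None:
            # Corresponds to <None>: the failure is reported, not hidden.
            raise
        return fallback(image)


def remote_engine(image):
    # Stand-in for the remote engine while the connection is down.
    raise EngineUnavailable("remote OCR server not reachable")


def local_engine(image):
    # Stand-in for a locally executed page recognition profile.
    return f"text recognized locally from {image}"


print(recognize_with_fallback("page1.tif", remote_engine, local_engine))
```

With no fallback configured, the error surfaces immediately, which is why running without a fallback during configuration makes connection problems visible.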

You cannot call page recognition from script using the Kofax Clarity recognition engine.