Classification tab - Project Settings window

You can use the Classification tab to configure the classification for your project.

Classification Settings

This group has the following options:

Default classification result

Select the desired default classification result that is used if no other classification result can be determined. The value for this option is set to <unclassified> by default.

Classification evaluation

Select the type of classification evaluation you want to use. The following choices are available:

  • Automatic evaluation.

    The specified values for confidence and distance are used to evaluate the result. For multi-page documents the classification is performed page by page and stops when a page can be classified. All pages following the classified page are not processed. This is the default value for this option.

  • Script implemented evaluation.

    The same page-by-page classification loop is executed, but your custom script is responsible for evaluating the classification results, breaking the classification loop, and determining the final classification result for the document. You cannot use class-level classification together with script implemented evaluation. If you select the script implemented evaluation option, any previously made settings are ignored.
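
As an illustration only, the page-by-page loop described for automatic evaluation can be sketched in Python. The names classify_document and classify_page are hypothetical and not part of the product. The sketch assumes a per-page classifier that returns a class name, or None when the page cannot be classified.

```python
from typing import Callable, Optional, Sequence, Tuple

# Default result named in this topic; used when no page can be classified.
UNCLASSIFIED = "<unclassified>"


def classify_document(
    pages: Sequence[str],
    classify_page: Callable[[str], Optional[str]],
    default_result: str = UNCLASSIFIED,
) -> Tuple[str, int]:
    """Return (classification result, number of pages processed).

    Hypothetical sketch: the loop stops at the first page that can be
    classified, so the pages after it are never processed.
    """
    processed = 0
    for page in pages:
        processed += 1
        result = classify_page(page)
        if result is not None:
            return result, processed  # remaining pages are skipped
    return default_result, processed


if __name__ == "__main__":
    # Toy classifier (assumption): a page is an invoice if it says so.
    def toy(text: str) -> Optional[str]:
        return "Invoice" if "invoice" in text else None

    print(classify_document(["cover letter", "invoice total", "terms"], toy))
```

With script implemented evaluation, the decision made here by classify_page and the early return would instead be the responsibility of your custom script.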

Layout Classification

This group has the following options:

Classification behavior

Choose how a document is classified using layout classification from one of the following choices:

  • Classify first page only.

    Use this option if you want the classification loop to process only the first page of a document for the specified classifier type. If the page cannot be classified, the document remains unclassified. This is the default value for this option.

  • Classify each page.

    Use this option if you want the classification loop to process each page of a multi-page document for the specified classifier type until a page can be classified or the complete document is processed and remains unclassified.

  • Do not use layout classification.

    Select this option if you do not want to use the Layout Classifier.

Note If you are using trainable document separation, the selection for this option does not prevent the page classifier from classifying each page for separation. The selection made here determines how a document is classified using layout classification after it is separated.

After selecting the Layout Classifier method, you can specify confidence with the following options:

Minimum confidence

Use this option to set the minimum confidence. This is the smallest value required for automatic evaluation to assign a classification result. The value for this option is set to 70 by default.

Minimum distance

Use this option to set the minimum distance between the best and the second best classification result. The value for this option is set to 10 by default.
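
The interaction between the two thresholds can be sketched as follows. This is a minimal, hypothetical illustration: the function name accept_result is not part of the product, and the sketch assumes that confidences are percentages, that distance is the plain difference between the best and second-best confidence, and that both comparisons are inclusive.

```python
from typing import Dict, Optional


def accept_result(
    scores: Dict[str, float],
    min_confidence: float = 70.0,  # layout classification default from this topic
    min_distance: float = 10.0,    # layout classification default from this topic
) -> Optional[str]:
    """Return the best class if it passes both thresholds, else None.

    Assumptions (not confirmed by the product documentation): scores are
    percentages, and a missing second-best result counts as 0.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return None
    best_class, best_score = ranked[0]
    second_score = ranked[1][1] if len(ranked) > 1 else 0.0
    confident_enough = best_score >= min_confidence
    distinct_enough = (best_score - second_score) >= min_distance
    return best_class if confident_enough and distinct_enough else None


if __name__ == "__main__":
    print(accept_result({"Invoice": 82, "Order": 65}))  # passes both checks
    print(accept_result({"Invoice": 82, "Order": 75}))  # fails the distance check
```

Under these assumptions, a best result of 82 with a runner-up of 75 is rejected even though it exceeds the minimum confidence, because the two results are only 7 apart.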

Layout Classifier

If layout classification is enabled for one or more classes in your project, this classifier is run during production. The following buttons are available for managing your Layout Classifier.

Properties

Click this button to display the Layout Classifier Properties window.

Reset

Click this button to remove the Layout Classifier from your project. If layout classification is still enabled for any class after the reset, layout classification is skipped and content classification is performed instead.

Content Classification

This group has the following options:

Training behavior

Note You can ignore one or more pages of a document if you want to exclude them from training. This is done via the Document Viewer Thumbnails view.

Choose how a document is trained from one of the following choices.

  • Train first page only.

    Select this option if you want to train a document for content classification based on the first page that is not ignored. This is the default value for this option.

    Important When this option is enabled, recognition for content classification is performed for the first page of a document only. If you later switch to "Train all pages", the project needs to be trained again. This ensures that recognition is performed for all pages so that information is available for content classification and that text editing is possible.

  • Train all pages.

    Select this option if you want to include the complete document for content classification, excluding any ignored pages.

Note The selected training behavior is also used for Classification Online Learning.

Classification behavior

Choose how a document is classified using content classification from one of the following choices.

  • Classify first page only.

    Select this option if you want the classification loop to process only the first page of a document for the specified classifier type. If the page cannot be classified, the document remains unclassified. This is the default value for this option.

  • Classify each page.

    Select this option if you want the classification loop to process each page of a multi-page document for the specified classifier type until a page can be classified or the complete document is processed and remains unclassified.

  • Classify all pages at once.

    Select this option to merge the text of all pages so it can be processed by the content classifier at once.

  • Do not use content classification.

    Select this option if you do not want to use the content classifier. This option can be used to avoid OCR on demand during classification if only the Layout Classifier is used.

Note Because Instruction classifiers are processed during content classification, your selection for this option can affect any instructions configured in the Project Tree. If you select the Classify first page only option, the instructions are performed on the first page only. If, however, you select the Do not use content classification option, the instructions are not processed at all. If your project uses instructions to improve classification results, ensure that you select the appropriate value for this option.

Note If you are using trainable document separation, the selection for this option does not prevent the page classifier from classifying each page for separation. The selection made here determines how a document is classified using content classification after it is separated.
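
To make the difference between the content classification behaviors concrete, the following hypothetical Python sketch shows which text units each choice would hand to a classifier. The function name and the behavior labels are invented for this illustration, and joining pages with a newline is an assumption about how page text is merged.

```python
from typing import List, Sequence


def units_to_classify(page_texts: Sequence[str], behavior: str) -> List[str]:
    """Return the text units a classifier would see for a given behavior.

    The behavior labels are hypothetical stand-ins for the choices in
    this topic; they are not product identifiers.
    """
    if behavior == "first_page_only":
        return list(page_texts[:1])
    if behavior == "each_page":
        # Pages are offered one by one; the loop stops at the first match.
        return list(page_texts)
    if behavior == "all_pages_at_once":
        # The text of all pages is merged and processed in a single pass.
        return ["\n".join(page_texts)] if page_texts else []
    if behavior == "none":
        # Content classification (and with it any instructions) is skipped.
        return []
    raise ValueError(f"unknown behavior: {behavior!r}")


if __name__ == "__main__":
    pages = ["cover letter", "invoice total", "terms"]
    for choice in ("first_page_only", "each_page", "all_pages_at_once", "none"):
        print(choice, units_to_classify(pages, choice))
```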

After selecting the content classifier method, you can specify confidence with the following content classifier options:

Minimum confidence

Use this option to set the minimum confidence. This is the smallest value required for automatic evaluation to assign a classification result. The value for this option is set to 50 by default.

Minimum distance

Use this option to set the minimum distance between the best and the second best classification result. The value for this option is set to 10 by default.

Content Classifier

If content classification is enabled for one or more classes in your project, this classifier is run during production. You can select between the following content classifier types:

  • Adaptive Feature Classifier.

    This is a learn-by-example classifier that can be trained to improve classification results over time. This is the default value for this option. The following buttons are available for this classifier:

    Properties

    Click this button to display the Adaptive Feature Classifier Properties window for the main content classifier.

    Reset

    Click this button to remove the main content classifier from your project. If content classification is enabled for any class, the classification result relies on other methods of classification.

  • Text Classifier.

    This classifier is deprecated and should be used only for existing projects that already use it.

Definitions for the buttons at the bottom of this window can be found in Common Project Builder Buttons.