General tab - Classification Locator Properties window

The General tab enables you to select a classification project, configure the classification mode, and specify a minimum confidence. The following options are available:

Reference project file

Browse to the location where the reference project is located and select the desired project.

Automatic update from project file

Select this option to ensure that the local copy of the reference project stays up-to-date with the original reference project file. For the best results, select this option only if the referenced project is updated regularly. This option is cleared by default.

Classification Mode

This group enables you to select one of the following classification modes:

  • Complete document (text only). This mode means that the entire document and its text are used for classification.

    This mode does not consider any hierarchical classification rules, such as subtree classification or default classification results. This is the default value for this option.

    Note The text used here for classification can be restricted to specific regions or pages.
  • Line by line (text only). This mode means that each text line is classified individually and returned as an alternative if its confidence meets the Min. confidence value. The results are then sorted by confidence.

    The coordinates of the line are included with the returned alternatives and each alternative is highlighted on the document. This enables the calling project to access these coordinates as needed, for example, to find the highest line on a page that was classified as a specific value.

  • Complete document (hierarchical). This mode means that both layout and text classification can be used in the referenced classification project. For the actual classification process, the various settings in the classification project are used.

    The regions definition is used to determine how many pages need OCR.

    A final classification result can have a very low confidence if certain classification rules were applied. Its confidence can also be lower than that of other classes that are not the final classification result. Use the Set classification result to 100% option in that case.

    Note If you select this option, you cannot define a default result for the locator, so the Default result option in the Result Mode group is disabled. If no result is found for this locator, the default classification result that is defined in the referenced project file is assigned as the locator result.

Note None of the classification modes above executes scripts, not even classification scripts.

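
The Line by line (text only) use case described above, finding the highest line on a page that was classified as a specific value, can be sketched as follows. This is a minimal illustration only, not the product's API: the LineAlternative type, its fields, the topmost_line helper, and the assumption that a smaller top coordinate means higher on the page are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LineAlternative:
    """Hypothetical stand-in for one line-by-line alternative."""
    text: str          # recognized text of the line
    class_name: str    # class assigned to the line
    confidence: float  # confidence in percent (0-100)
    page: int          # zero-based page index (assumption)
    top: int           # y coordinate of the line's top edge;
                       # smaller means higher on the page (assumption)


def topmost_line(alternatives, class_name, page):
    """Return the highest line on `page` classified as `class_name`, or None."""
    matches = [a for a in alternatives
               if a.class_name == class_name and a.page == page]
    return min(matches, key=lambda a: a.top, default=None)


alternatives = [
    LineAlternative("Invoice No. 4711", "InvoiceNumber", 92.0, 0, 310),
    LineAlternative("Total due", "Total", 88.0, 0, 900),
    LineAlternative("Invoice 4711 (copy)", "InvoiceNumber", 75.0, 0, 120),
]

# Alternatives are sorted by confidence, best first.
ranked = sorted(alternatives, key=lambda a: a.confidence, reverse=True)

# The topmost match is chosen by position, not by confidence.
best = topmost_line(ranked, "InvoiceNumber", page=0)
print(best.text)
```

In this sketch the selected line is the one at the top of the page even though another line of the same class has a higher confidence, which is why the coordinates are useful to the calling project.
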
Classification Settings

This group has the following options:

Min. confidence

Only classification results with a confidence higher than or equal to this value are returned as alternatives. This option is set to 70 by default.
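
The threshold rule can be sketched as a simple filter. The function name and the (class name, confidence) tuples are hypothetical and only illustrate the "higher than or equal to" comparison with the default of 70.

```python
MIN_CONFIDENCE = 70  # default value described above


def filter_by_confidence(results, min_confidence=MIN_CONFIDENCE):
    """Keep results whose confidence is >= min_confidence.

    `results` is a list of (class_name, confidence) tuples (hypothetical shape).
    """
    return [(name, conf) for name, conf in results if conf >= min_confidence]


results = [("Invoice", 70.0), ("Order", 69.9), ("Letter", 85.5)]
kept = filter_by_confidence(results)
print(kept)
```

A result at exactly 70 is kept, while 69.9 is dropped.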

Set classification result to 100%

This option is only enabled when the Complete document (hierarchical) value is selected for the Classification Mode option. When this option is selected, the confidence of the alternative that is the final classification result is always 100%. This is important because the confidence of the final classification result might be very low as a result of subtree classification or using the default classification result. If that were to happen, it would not be possible to distinguish between the final classification result and other possible alternatives. This option is selected by default.
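
To illustrate why this matters, the following sketch shows a final result that would otherwise rank below another alternative. The function, the tuple layout, and the sample values are hypothetical and are not taken from the product.

```python
def apply_final_confidence(alternatives, final_class, set_to_100=True):
    """Return alternatives with the final class raised to 100% if requested.

    `alternatives` is a list of (class_name, confidence) tuples (hypothetical).
    """
    if not set_to_100:
        return list(alternatives)
    return [(name, 100.0 if name == final_class else conf)
            for name, conf in alternatives]


# Suppose a hierarchical rule selected "Contract" as the final result
# even though its raw confidence is lower than that of "Invoice".
alternatives = [("Invoice", 64.0), ("Contract", 12.0)]

adjusted = apply_final_confidence(alternatives, "Contract")
top_class = max(adjusted, key=lambda item: item[1])[0]
print(adjusted, top_class)
```

Without the adjustment, picking the alternative with the highest confidence would return "Invoice" instead of the actual final result.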

Result Mode

This group enables you to select one of the following result modes as well as configure the following options:

  • Single topic. If this option is selected, only one class from the referenced project is used as a result in the alternative. This is the default value for this option.

  • Multi topic. If this option is selected, a semicolon-delimited list of the best class results is used as alternative values.

Max. number of results (0 = all)

This limits the number of returned classification results to the specified number. A value of 0 means that all alternatives that meet the confidence requirements are returned. This option is set to 5 by default.
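
One possible reading of how the result mode and the result limit combine is sketched below. The function name and parameters are hypothetical, and applying the limit to the entries of the semicolon-delimited list in Multi topic mode is an assumption, not documented behavior.

```python
def build_alternatives(ranked_classes, multi_topic=False, max_results=5):
    """Build alternative values from class names sorted best first.

    `ranked_classes` is assumed to be already filtered by confidence.
    A `max_results` of 0 means no limit.
    """
    limited = ranked_classes if max_results == 0 else ranked_classes[:max_results]
    if multi_topic:
        # Assumption: one alternative holding a semicolon-delimited list.
        return [";".join(limited)] if limited else []
    # Single topic: each alternative holds exactly one class.
    return list(limited)


classes = ["Invoice", "Order", "Letter"]
print(build_alternatives(classes, max_results=2))
print(build_alternatives(classes, multi_topic=True, max_results=0))
```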

Default result

This option is disabled unless the Complete document (text only) value is selected for the Classification Mode option. If no classification result is found, this default result is assigned as the final result. If no default result is defined, the locator returns no value. This option has a value of <none> by default.

The default result can be a text string such as Nothing, Unclassified or something similar.
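
The fallback behavior can be sketched as follows. The function and the use of None to represent both <none> and "no value" are hypothetical conventions chosen for this illustration.

```python
def locator_result(results, default_result=None):
    """Return the best result, the default result, or None.

    `results` is a list of class names sorted best first (hypothetical shape).
    `default_result=None` stands in for the <none> setting.
    """
    if results:
        return results[0]
    return default_result


print(locator_result(["Invoice"], default_result="Unclassified"))
print(locator_result([], default_result="Unclassified"))
print(locator_result([]))
```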

Definitions for the buttons at the bottom of this window can be found in Common Transformation Designer Buttons.