Layout Classifier Properties window

Use this window to configure the properties for a layout classifier.

Optimize Classification for
  • Invoices. This is the default value for this option.

    If this option is selected, the classifier analyzes only the upper and lower parts of the document and ignores the remainder. This is especially useful for invoices, because they often have a preprinted header and footer area. It may also apply to other types of business documents that have a similar structure.

  • Forms.

    If this option is selected, the classifier uses the entire region of the image. This can be used for forms and other types of documents that have a fixed layout over the entire region of the image.
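The difference between the two settings can be illustrated with a minimal sketch. It is not the product's implementation: the function name, the mode strings, and the band_fraction parameter are hypothetical, and the actual size of the upper and lower parts that the classifier analyzes is not documented here.

```python
def rows_used_for_classification(image_height, mode, band_fraction=0.25):
    """Return the (start, end) row ranges a classifier might analyze.

    Illustrative only. band_fraction is a hypothetical parameter: the
    share of the page height treated as the "upper" and "lower" part.
    Each range is half-open: start is included, end is excluded.
    """
    if mode == "forms":
        # Forms: the entire region of the image is used.
        return [(0, image_height)]
    if mode == "invoices":
        # Invoices: only a top band and a bottom band are used.
        band = int(image_height * band_fraction)
        return [(0, band), (image_height - band, image_height)]
    raise ValueError(f"unknown mode: {mode!r}")


print(rows_used_for_classification(1000, "invoices"))
print(rows_used_for_classification(1000, "forms"))
```

With the assumed band fraction of 0.25, a 1000-row image yields the top and bottom 250 rows for Invoices and all rows for Forms.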

To modify the advanced properties, click Advanced.

Image Preparation

This group has the following option:

Enable skew tolerance

This option is not needed if the processed documents have already been deskewed by another application. For example, when using VRS during scanning, there is no need to select this option because VRS adjusts skewed images automatically. This option is selected by default.

Training

This group has the following options:

Max samples per class

The Layout Classifier supports an unlimited number of samples per class. If the sample images are very different, the Layout Classifier internally learns different patterns for each sample. For performance reasons, you may want to limit the number of sample documents that are used for feature extraction. A value of 0 means no limitation. The value for this option is set to 0 by default.
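The meaning of the value 0 can be shown with a short sketch. The function name is hypothetical, and taking the first samples in order is an assumption made for illustration; how the Layout Classifier chooses which samples to keep is not described here.

```python
def select_training_samples(samples, max_samples_per_class=0):
    """Return the samples of one class that would be used for training.

    Illustrative only. A limit of 0 means "no limitation". Keeping the
    first N samples is an assumption, not documented product behavior.
    """
    if max_samples_per_class < 0:
        raise ValueError("max_samples_per_class must be 0 or greater")
    if max_samples_per_class == 0:
        return list(samples)
    return list(samples[:max_samples_per_class])


invoice_samples = ["inv_001.tif", "inv_002.tif", "inv_003.tif", "inv_004.tif"]
print(len(select_training_samples(invoice_samples)))      # default 0: all samples
print(len(select_training_samples(invoice_samples, 2)))   # limited to 2 samples
```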

Class homogeneity

This feature controls how sensitive the classifier is to variations in the layout of the images in the training set. If the sample images are very different, the Layout Classifier automatically creates internal patterns for each new type. These types are not visible to the user.

The more types, the better the classification accuracy, but each additional type slows down classification. The value set by this control is a threshold that determines when new internal types are created. The value for this option is set to 80.0 by default. In most cases the default value works best.
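The role of a threshold in creating internal types can be sketched as follows. This is a toy model under stated assumptions, not the Layout Classifier's algorithm: the similarity measure, the function names, and the direction of the threshold (a sample that is less similar than the threshold to every existing type starts a new type) are all assumptions made for illustration.

```python
def similarity_percent(a, b):
    """Toy similarity: percentage of positions where two feature tuples agree."""
    if len(a) != len(b):
        raise ValueError("feature tuples must have the same length")
    if not a:
        return 100.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def group_into_internal_types(samples, homogeneity=80.0):
    """Group samples of one class into internal types.

    Illustrative only. Each type is represented by the first sample
    assigned to it. A sample joins an existing type if its similarity
    to that type's representative reaches the threshold (assumed
    direction); otherwise it becomes the representative of a new type.
    """
    representatives = []
    for sample in samples:
        fits_existing = any(
            similarity_percent(sample, rep) >= homogeneity
            for rep in representatives
        )
        if not fits_existing:
            representatives.append(sample)
    return representatives


samples = [(1, 1, 1, 1, 1), (1, 1, 1, 1, 0), (0, 0, 0, 0, 0)]
print(len(group_into_internal_types(samples, homogeneity=80.0)))
print(len(group_into_internal_types(samples, homogeneity=90.0)))
```

In this toy model, raising the threshold produces more internal types for the same samples, which mirrors the trade-off described above: more types to compare against means slower classification.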

Noise Filter

This group has the following options:

This feature controls how to match regions with low contrast, such as images that have a fine background pattern. If you set a value closer to the max. precision side, images with low contrast are not classified. This means that even documents from the training set do not have 100% confidence. The probability of getting misclassified documents is then much smaller, resulting in higher accuracy but more rejects. If you set a value closer to the max. recall side, higher confidence values are returned for documents with low contrast. However, high confidence values may then also be determined for other classes with low contrast in the same region of the document, which may lead to a higher error rate. The value for this option is set to 15.0 by default. In most cases the default value works best.

Definitions for the buttons at the bottom of this window can be found in Common Project Builder Buttons.