Configure tab - Clustering window

Use the Configure tab to define the clustering settings for the selected document subset, which contains the documents you want to group into clusters.

General

This group has the following options:

Use project class names

Select this option to use the project class names as the labels for the clusters. This option is selected by default.

Cluster documents with no assigned class only

Select this option to cluster only those documents that do not have an assigned class. Any documents with an assigned class are ignored. This option is selected by default.

Are most of the documents unstructured?

Select one of the following clustering methods:

  • Yes (cluster based on content)

    Note: Documents require recognition results for content classification. If no recognition results are available, an error message is displayed recommending the use of the Recognition on-demand option.
  • No/Unknown (cluster based on layout)

    Note: Documents used for layout clustering must include an image component. If no image is available, an error message is displayed.

If the selected document subset contains mostly unstructured documents, such as letters, emails, contracts, or general correspondence, select Yes to use content clustering. If, however, most of the documents are structured, such as forms or invoices, select No to use layout clustering. If you do not know the type of documents in the document subset, select No.
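
The selection rule above can be summarized in a short Python sketch. This is only an illustration of the decision described in the text; the names ClusteringMethod and choose_method are hypothetical and are not part of the product.

```python
from enum import Enum
from typing import Optional


class ClusteringMethod(Enum):
    CONTENT = "content"  # corresponds to "Yes (cluster based on content)"
    LAYOUT = "layout"    # corresponds to "No/Unknown (cluster based on layout)"


def choose_method(mostly_unstructured: Optional[bool]) -> ClusteringMethod:
    """Map the answer to "Are most of the documents unstructured?" to a method.

    mostly_unstructured is True for Yes, False for No, and None when the
    document type is unknown (hypothetical encoding used for this sketch).
    """
    # Only an explicit Yes selects content clustering. No and unknown
    # both select layout clustering, as the text recommends.
    if mostly_unstructured is True:
        return ClusteringMethod.CONTENT
    return ClusteringMethod.LAYOUT
```

For example, a subset of letters and contracts would be passed as True, while a subset of invoices, or a subset of unknown type, would be passed as False or None.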

Content Clustering

This group has the following option:

Recognition on-demand

Select this option if you want to perform content clustering and you are not sure whether all of your documents have already been processed by the recognition engine. Before the clustering process starts, recognition is performed on any documents whose full-text results are missing.

Note: Documents without recognition content cannot be clustered. Therefore, the best practice is to select this option unless you have already performed recognition in a previous step.
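
The behavior of this option can be pictured with the following Python sketch. It is an assumption-based illustration, not the product's implementation: the Document class, the recognize callback, and the returned lists are all hypothetical. The product displays an error message when recognition results are missing, whereas this sketch simply collects the affected documents.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Document:
    name: str
    full_text: Optional[str] = None  # None means no recognition results yet


def prepare_for_content_clustering(
    documents: List[Document],
    recognition_on_demand: bool,
    recognize: Callable[[Document], Optional[str]],
) -> Tuple[List[Document], List[Document]]:
    """Return (clusterable, not_clusterable) for content clustering.

    recognize is a hypothetical stand-in for the recognition engine; it
    returns the full text of a document, or None if recognition fails.
    """
    clusterable: List[Document] = []
    not_clusterable: List[Document] = []
    for doc in documents:
        # Run recognition only when results are missing and the option is on.
        if doc.full_text is None and recognition_on_demand:
            doc.full_text = recognize(doc)
        # Documents that still have no recognition content cannot be clustered.
        if doc.full_text is None:
            not_clusterable.append(doc)
        else:
            clusterable.append(doc)
    return clusterable, not_clusterable
```

With the option cleared, every document that lacks full text ends up in the second list, which is why selecting the option is the safer default when the recognition state is unknown.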

Layout Clustering

This group has the following options:

Minimum cluster size

Specify the minimum number of documents that a cluster must contain to be accepted as a valid cluster. The default value is 10.

Minimum confidence

Select the minimum confidence that the classification of a document must reach before the document is assigned to a cluster.
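
One way to picture how these two thresholds interact is the Python sketch below. It rests on several assumptions that the text does not confirm: confidence is expressed on a 0.0 to 1.0 scale, the confidence check is applied before the size check, and the function name and input format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def filter_clusters(
    assignments: Iterable[Tuple[str, int, float]],
    min_confidence: float,
    min_cluster_size: int = 10,  # default value stated in the text
) -> Dict[int, List[str]]:
    """Keep only confident assignments, then only sufficiently large clusters.

    assignments holds (document_name, cluster_id, confidence) tuples, a
    hypothetical format chosen for this sketch.
    """
    clusters: Dict[int, List[str]] = defaultdict(list)
    for doc_name, cluster_id, confidence in assignments:
        # Assumed step 1: drop assignments below the minimum confidence.
        if confidence >= min_confidence:
            clusters[cluster_id].append(doc_name)
    # Assumed step 2: drop clusters with fewer documents than the minimum size.
    return {
        cluster_id: docs
        for cluster_id, docs in clusters.items()
        if len(docs) >= min_cluster_size
    }


# Usage with a small minimum size so the effect is visible on four documents.
example = [("a", 1, 0.9), ("b", 1, 0.8), ("c", 1, 0.4), ("d", 2, 0.95)]
result = filter_clusters(example, min_confidence=0.5, min_cluster_size=2)
# result == {1: ["a", "b"]}
```

In the usage example, document c is dropped for low confidence and cluster 2 is dropped because it holds only one document.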

In addition to the common Project Builder buttons, the following buttons are available:

Reset

Click this button to restore the default configuration settings.

Start Clustering

Click this button to start the clustering process on the selected document subset using the specified settings. To proceed, you must select at least one of the clustering methods.
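
The settings on this tab, their documented defaults, and the check performed before clustering starts can be modeled roughly as follows. This Python sketch is illustrative only: the class, field, and function names are hypothetical, and defaults that the text does not state are marked as assumptions in the comments.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClusteringSettings:
    use_project_class_names: bool = True    # selected by default (per the text)
    unclassified_only: bool = True          # selected by default (per the text)
    method: Optional[str] = None            # "content" or "layout"; None = not selected
    recognition_on_demand: bool = False     # assumption: default not stated in the text
    min_cluster_size: int = 10              # default of 10 (per the text)
    min_confidence: Optional[float] = None  # assumption: default not stated in the text


def validate_before_start(settings: ClusteringSettings) -> List[str]:
    """Return a list of problems; an empty list means clustering may start."""
    problems: List[str] = []
    # A clustering method must be selected in order to proceed.
    if settings.method not in ("content", "layout"):
        problems.append("Select a clustering method.")
    return problems


def reset() -> ClusteringSettings:
    """Mimic the Reset button by returning a fresh object with default values."""
    return ClusteringSettings()
```

In this model, a freshly reset configuration fails validation until a clustering method is chosen.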