Content classification

Content classification is used for any class that has Content classification enabled on the class details. It uses the text recognition results of a document to determine the classification result and is made up of several smaller steps.

Depending on your project configuration, content classification can include the following:

  • A project-wide Content Classifier that is configured on the Project Settings - Classification tab.

  • A Subtree Classifier that is configured on a class-by-class basis for classes that have child classes.

  • A TDS Page Classifier that is configured on the Project Settings - Document Separation tab if TDS is configured.

  • Instructions that can be added to each class and are processed by the Instruction Classifier.

    Note Disabling content classification means that configured Instructions are not processed. Therefore, if Instructions are configured, you must enable content classification for them to be processed, even if you use no other means of content classification.

These content classifiers are all processed in a single step to determine the final classification result.
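
As an illustration only, the list above can be read as a set of conditions that decide which classifiers take part for a given class. The following Python sketch models that reading. The names ProjectConfig, ClassConfig, and active_content_classifiers, and all of their fields, are hypothetical and are not part of the product. The sketch also assumes that each classifier is controlled by a simple on/off setting, which simplifies the real configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectConfig:
    # Hypothetical flags, named for illustration only.
    project_classifier_configured: bool = False
    tds_configured: bool = False


@dataclass
class ClassConfig:
    # Hypothetical per-class settings, named for illustration only.
    content_classification_enabled: bool = False
    has_child_classes: bool = False
    subtree_classifier_configured: bool = False
    has_instructions: bool = False


def active_content_classifiers(project: ProjectConfig, cls: ClassConfig) -> List[str]:
    """Return the content classifiers that would take part for one class."""
    if not cls.content_classification_enabled:
        # With content classification disabled, nothing runs,
        # including configured Instructions.
        return []
    active = []
    if project.project_classifier_configured:
        active.append("Project-wide Content Classifier")
    # A Subtree Classifier only applies to classes that have child classes.
    if cls.has_child_classes and cls.subtree_classifier_configured:
        active.append("Subtree Classifier")
    if project.tds_configured:
        active.append("TDS Page Classifier")
    if cls.has_instructions:
        active.append("Instruction Classifier")
    return active
```

In this sketch, a class that has Instructions but has content classification disabled returns an empty list, which mirrors the note above.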

Use content classification if the document layout is inconsistent. Content classification allows documents in a specific class to have varying first pages and inconsistent page counts.

If a class is configured for both layout classification and content classification, content classification is run only if layout classification did not return a confident result. If a class has an inconsistent layout, however, the best results are usually returned by using the content classifier only.
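
The order of evaluation described above can be sketched as a simple fallback. This is a minimal illustration, not the product's implementation. The function classify, the callable signatures, and the confident_at threshold of 0.8 are all assumptions made for the example. How the product decides that a result is "confident" is not specified here.

```python
from typing import Callable, Optional, Tuple

# A classifier returns (class_name, confidence). Both are hypothetical stand-ins.
Result = Tuple[Optional[str], float]
Classifier = Callable[[object], Result]


def classify(
    document: object,
    layout_classifier: Optional[Classifier],
    content_classifier: Optional[Classifier],
    confident_at: float = 0.8,  # assumed threshold, not a product default
) -> Result:
    """Try layout classification first, then fall back to content classification."""
    if layout_classifier is not None:
        result = layout_classifier(document)
        if result[0] is not None and result[1] >= confident_at:
            # Layout classification was confident, so content classification is skipped.
            return result
    if content_classifier is not None:
        return content_classifier(document)
    # Neither classifier produced a usable result.
    return (None, 0.0)
```

Passing None as layout_classifier models a class with an inconsistent layout that uses the content classifier only.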

Content classification also uses the Classification Set. This type of classification may require more training documents than layout classification. For the best results, add training documents a few at a time, and train and test after each addition. Once you are satisfied with your test results, configure your extraction settings.

Note If a document has been classified into a class that does not require text recognition and then manually reclassified into a class that uses the content classifier, the system task does not train this document into the dynamic classifiers. The document is collected in the New Samples document set for later import.