Classification benchmarks and optimization with the Result Matrix

To provide a complete and detailed analysis of classification quality, Project Builder enables you to test the classification against a benchmark document set or the Classification Training document set. A benchmark provides more information than a preliminary classification test.

A benchmark document set is a set of documents that differs from both the training documents and a test document set. Each document in a benchmark set has an assigned class. When you generate a benchmark, the classifier compares the assigned class to the classification result, which is based on the project and class classification settings.

You can generate a classification benchmark for all classification settings in a project. This includes all classification settings on the Project Settings - Classification tab, class settings, all classifiers, any classification instructions, and any classification script events.

You can also generate a classification benchmark for a specific classifier configured in your project. This means that if you have a Layout Classifier, a top-level content classifier (Adaptive Feature Classifier), and subtree classifiers, you can generate a benchmark for each. This enables you to compare classifier results and modify classification settings as needed.

You cannot run a benchmark against a normal custom test set because there is no "Assigned class" value in a test set, and this value is needed for comparison against the classification result. If necessary, convert a custom test set into a benchmark set to generate a benchmark.

Important: Before you generate a classification or separation benchmark, store the assigned class structure on disk using the Sort Documents on Disk by Class option from the Benchmark Sets shortcut menu.

Running a benchmark generates the Result Matrix, which displays a grid of the correctly and incorrectly classified documents by class. The Result Matrix also provides statistics about unclassified documents, recall, and precision.
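The following Python sketch illustrates the kind of comparison that a benchmark performs. It is not Project Builder code. The function names and sample data are hypothetical, and it assumes the standard definitions of recall (correct results divided by all documents assigned to the class) and precision (correct results divided by all documents classified as the class). The exact calculation in the Result Matrix may differ, for example in how unclassified documents are counted.

```python
from collections import Counter


def result_matrix(assigned, predicted):
    """Count each (assigned class, classification result) pair."""
    return Counter(zip(assigned, predicted))


def recall_precision(matrix, cls):
    """Return (recall, precision) for one class, using standard definitions."""
    correct = matrix[(cls, cls)]
    assigned_total = sum(n for (a, _), n in matrix.items() if a == cls)
    predicted_total = sum(n for (_, p), n in matrix.items() if p == cls)
    recall = correct / assigned_total if assigned_total else 0.0
    precision = correct / predicted_total if predicted_total else 0.0
    return recall, precision


# Hypothetical benchmark set: the assigned class of each document and the
# classifier result. None stands for an unclassified document (assumption).
assigned = ["Invoice", "Invoice", "Invoice", "Order", "Order", "Letter"]
predicted = ["Invoice", "Invoice", "Order", "Order", None, "Letter"]

matrix = result_matrix(assigned, predicted)
unclassified = sum(n for (_, p), n in matrix.items() if p is None)

for cls in sorted(set(assigned)):
    recall, precision = recall_precision(matrix, cls)
    print(f"{cls}: recall={recall:.2f} precision={precision:.2f}")
print(f"Unclassified documents: {unclassified}")
```

In this sample, one Invoice document is classified as Order, which lowers the recall for Invoice and the precision for Order. This is the kind of pattern the grid makes visible for each class.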

Tip: For the best results for a project that includes child classes or a complex class hierarchy, select "All Settings" when generating a classification benchmark for a benchmark set.

Any results that are overridden by exceptions appear in a darker color than the rest of the results. For example, if an added exception sets a group of unclassified documents to valid, the color becomes dark green.

To optimize classification quality with the Result Matrix, you can do the following: