Classification benchmarks and optimization with the Result Matrix
To provide a complete and detailed analysis of classification quality, Project Builder enables you to test the classification against a benchmark document set or the Classification Training document set. A benchmark provides more information than a preliminary classification test.
A benchmark document set is a set of documents that is separate from both the training documents and any test document set. Each document in a benchmark set has an assigned class. When you generate a benchmark, the classifier compares each document's assigned class to the classification result produced by the project and class classification settings.
You can generate a classification benchmark for all classification settings in a project. This includes all classification settings on the Project Settings - Classification tab, class settings, all classifiers, any classification instructions, and any classification script events.
You can also generate a classification benchmark for a specific classifier configured in your project. This means that if you have a Layout Classifier, a top-level content classifier (Adaptive Feature Classifier), and subtree classifiers, you can generate a benchmark for each. This enables you to compare classifier results and modify classification settings as needed.
You cannot run a benchmark against a normal custom test set because there is no "Assigned class" value in a test set, and this value is needed for comparison against the classification result. If necessary, convert a custom test set into a benchmark set to generate a benchmark.
Running a benchmark generates the Result Matrix and displays a grid of the correctly and incorrectly classified documents by class. The Result Matrix provides statistics about unclassified documents and about recall and precision.
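The following Python sketch illustrates the idea behind such a matrix. It is not Project Builder code. The sample data, the function names, and the use of None for an unclassified document are all assumptions made for illustration, and the statistics use the standard definitions of precision and recall, which may differ in detail from how the product counts unclassified documents.

```python
from collections import Counter

# Hypothetical benchmark data: (assigned class, classification result).
# None stands for a document the classifier left unclassified (assumption).
benchmark = [
    ("Invoice", "Invoice"),
    ("Invoice", "Invoice"),
    ("Invoice", "Order"),
    ("Order", "Order"),
    ("Order", None),
    ("Letter", "Letter"),
]


def result_matrix(pairs):
    """Count documents per (assigned class, classification result) cell."""
    return Counter(pairs)


def precision_recall(matrix, cls):
    """Return (precision, recall) for one class; None where undefined."""
    correct = matrix[(cls, cls)]
    # Column total: every document the classifier labeled as cls.
    predicted = sum(n for (_, result), n in matrix.items() if result == cls)
    # Row total: every document whose assigned class is cls.
    assigned = sum(n for (assigned_cls, _), n in matrix.items() if assigned_cls == cls)
    precision = correct / predicted if predicted else None
    recall = correct / assigned if assigned else None
    return precision, recall


matrix = result_matrix(benchmark)

# Documents that received no classification result at all.
unclassified = sum(n for (_, result), n in matrix.items() if result is None)

for cls in sorted({assigned_cls for assigned_cls, _ in benchmark}):
    precision, recall = precision_recall(matrix, cls)
    print(cls, "precision:", precision, "recall:", recall)
print("unclassified:", unclassified)
```

In this sample, one Invoice document is classified as Order. That lowers the recall for Invoice and the precision for Order, while the single unclassified Order document lowers only the recall for Order.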
Any results that are overridden by exceptions appear in a darker color than the rest of the results. For example, if an added exception sets a group of unclassified documents to valid, the color of those results becomes dark green.
To optimize classification quality with the Result Matrix, you can do the following: