Create a classification and separation benchmark set

Because classification and separation are closely related, you can use the same golden files for both classification and separation benchmark testing.

The physical structure of your documents is not as important as it was in earlier versions of Kofax Transformation Modules, because the new document set structure can assign a class to a document rather than rely on the document's physical location. If, however, your test documents are already separated into folders that match your project class hierarchy, you can use them as they are.

Tip Consider renaming your documents so that each name includes the class name. This makes assigning a class significantly easier because you do not need to open and view each document in the Document Viewer.
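If your test documents already sit in folders named after your project classes, the renaming suggested in the tip can be scripted outside the product. The following is a minimal sketch, not a Kofax tool: the function name `prefix_with_class_name` and the `<ClassName>_<original name>` naming pattern are assumptions made for illustration, and the script only touches files on disk. Run it on a copy of your documents before you add them to a document set.

```python
from pathlib import Path


def prefix_with_class_name(root: Path, dry_run: bool = True) -> list:
    """Prefix each file name with the name of its parent folder.

    Assumption: every folder below `root` is named after a project class.
    Returns the (old path, new path) pairs that were planned or renamed.
    """
    planned = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.parent == root:
            continue  # skip folders and files that sit directly in the root
        class_name = path.parent.name
        if path.name.startswith(class_name + "_"):
            continue  # name already carries the class prefix
        target = path.with_name(f"{class_name}_{path.name}")
        if target.exists():
            continue  # never overwrite an existing file
        planned.append((path, target))

    if not dry_run:
        for source, target in planned:
            source.rename(target)
    return planned
```

Calling the function with the default `dry_run=True` only returns the planned renames so that you can review them; pass `dry_run=False` to apply them.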

You can create a classification and separation golden files document set by following these steps:

  1. Open the Documents window if it is not already open.
  2. Add a new document set that contains the documents you want to include in your golden files.

    Enter a descriptive name for the document subset such as Classification/Separation Golden Files.

    A new document set is added to the Documents window and automatically expands so you can see its document subsets.

  3. Convert your new document set to a benchmark set.

    The document set type changes so that you can assign a classification for each document in the document set.

  4. If a different view is in use, switch to the List View.
  5. Assign a class to all documents in your benchmark set.

    Each document is assigned a class for comparison after classification.

  6. Click your newly added document subset, and then click Select All on the toolbar or press Ctrl + A.

    All documents in the selected document subset are selected in the List view.

  7. Optionally, if your documents already have recognition data, right-click the selected documents and then click Clear Classification & Extraction Data.

    This removes any existing classification or extraction data and provides you with a clean state so you can build your golden files.

  8. Optionally, if your documents do not have recognition data, press F4 to perform recognition for the selected documents.

    If you want to select a different recognition engine, right-click the selected Classification/Separation Golden Files documents, click Recognize on the shortcut menu, and then select the desired recognition engine from the submenu.

    A progress bar is displayed showing the recognition progress.

  9. On the document shortcut menu, click Classify.

    Each document in the benchmark set is assigned a classification result based on your project settings and training documents if available.

  10. Once again, click Select All for your Classification/Separation Golden Files document subset, and then click Save All.

    All changes to the documents are saved.

  11. Compare the classification result for each document against its assigned class. If any do not match, confirm that the assigned class is correct. If the assigned class is correct, add the mismatched documents to the Classification Set and repeat steps 6 through 10 until the classification result matches the assigned class for all documents in your benchmark set.
  12. If you added any documents to your Classification Set, train your project. On the Ribbon Process tab, in the Train group, select Classification.

    If Separation is enabled, the option is called Classification & Separation.

    A progress bar is displayed showing the training progress.

  13. If your project is configured for separation, you can now test your separation settings. On the Ribbon Process tab, in the Test group, click Separate.
    Important For separation benchmarks, your document set cannot include any subfolders under the Root Folder. You can see the hierarchy of a document set by switching to the Hierarchy View.

    The Document Separation Results window is displayed showing the separation results.

  14. If the separation results do not match your document structure, review your document separation settings on the Project Settings - Document Separation tab.
    Tip If you are using trainable document separation, ensure that the Project Settings - Classification tab is configured to "Classify each page" for each classifier used in your project.
  15. Repeat steps 13 and 14 until your separation results are as desired.
  16. On the Ribbon Project tab, in the File group, click Save Project.
    Tip Create a backup copy of your processed classification and separation golden files. This ensures that you can always access the backup copy if something happens to the working copy.

    Your project is saved and your golden files are now ready for performing Classification and Separation benchmarks.
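The backup recommended in step 16 can be made with any file copy tool. As one possible approach, the following sketch copies a golden files folder to a time-stamped backup folder. It is not part of Kofax Transformation Modules: the function name `back_up_golden_files` and the folder naming pattern are assumptions made for illustration, and you need to supply the actual location of your document set on disk.

```python
import shutil
from datetime import datetime
from pathlib import Path


def back_up_golden_files(source: Path, backup_root: Path) -> Path:
    """Copy the folder `source` to a new time-stamped folder in `backup_root`.

    Assumption: `source` is the folder on disk that holds your golden files.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{source.name}_{stamp}"
    # copytree creates missing parent folders and raises an error if the
    # target already exists, so an earlier backup is never overwritten.
    shutil.copytree(source, target)
    return target
```

Because each backup goes into its own time-stamped folder, you can keep several generations of golden files and return to an earlier one if a later change turns out to be wrong.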