Create an all-purpose golden files benchmark set

An all-purpose set of golden files can be used to test different types of benchmarks. Because the same set serves every benchmark type, include only ideal documents that are suitable for extraction, classification, and separation.

For example, you should not include a document that is easily classified based on layout, but returns poor recognition results because it was poorly scanned.

Note: You cannot include PDF documents in this document set if you plan to use this benchmark set to test separation.
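If the source files sit in an ordinary folder before you add them to the document set, a short script can flag any PDFs up front. The following Python sketch is only an illustration: the function name `find_pdfs` and the folder layout are assumptions, and it checks file extensions only, not file contents.

```python
from pathlib import Path


def find_pdfs(folder):
    """Return the PDF files found anywhere under folder, sorted by path.

    Hypothetical helper: it matches on the .pdf extension only
    (case-insensitive) and does not inspect file contents.
    """
    root = Path(folder)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Any file this returns is a candidate to leave out of a set intended for separation testing.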
Tip: Consider renaming your documents so that each file name includes the name of the class to which the document belongs. This makes assigning a class significantly easier because you do not need to open and view each document in the Document Viewer.
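One way to apply this renaming in bulk, assuming the source files for a single class are gathered in one folder before they are added to the document set, is sketched below in Python. The helper name `add_class_prefix`, the underscore separator, and the dry-run default are all assumptions made for illustration, not part of the product.

```python
from pathlib import Path


def add_class_prefix(folder, class_name, dry_run=True):
    """Prefix every file directly inside folder with class_name.

    Hypothetical helper. Files whose names already start with the prefix
    are skipped. Returns a list of (old_path, new_path) pairs. With
    dry_run=True nothing is renamed, so the plan can be reviewed first.
    """
    prefix = f"{class_name}_"
    planned = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.name.startswith(prefix):
            continue
        target = path.with_name(prefix + path.name)
        if target.exists():
            # Refuse to overwrite an existing file with the same target name.
            raise FileExistsError(f"{target} already exists")
        planned.append((path, target))
    if not dry_run:
        for old, new in planned:
            old.rename(new)
    return planned
```

Running it once with the default `dry_run=True` shows the planned names; calling it again with `dry_run=False` performs the renames.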

You can create an all-purpose golden files document set by following these steps:

  1. Open the Documents window if it is not already open.
  2. Add a new document set that includes the documents you want to include in your golden files, ensuring that you provide a descriptive name for the document subset such as "All-Purpose Golden Files."

    A new document set is added to the Documents window and automatically expands so you can see its document subsets.

  3. Convert your new document set to a benchmark set.

    The document set type changes so that you can assign a classification for each document in the document set.

  4. If a different view is in use, switch to the List view.
  5. Assign a class for all documents in your benchmark set.

    Each document is assigned a class for comparison after classification.

  6. Click in the "All-Purpose Golden Files" document subset, and then click Select All on the toolbar or press Ctrl + A.

    All documents in the selected document subset are selected in the List view.

  7. Optionally, if your documents already have recognition data, right-click the selected documents, and then click Clear Classification & Extraction Data.

    If you are not sure whether your documents have recognition data, perform this step anyway.

    This removes any existing classification or extraction data and provides you with a clean state so you can build your golden files.

  8. Press F4 to perform recognition for the selected documents using the default recognition engine.

    If you want to select a different recognition engine, right-click the selected "All-Purpose Golden Files" documents, click Recognize on the shortcut menu, and then select the desired recognition engine from the submenu.

    A progress bar is displayed showing the recognition progress.

  9. On the document shortcut menu, click Classify.

    Each document in the benchmark set is assigned a classification result based on your project settings and, if available, your training documents.

  10. Once again, click Select All for your "All-Purpose Golden Files" document subset, and then click Save All.
    Tip: To ensure you do not lose any changes, save your documents regularly.

    All changes to the documents are saved.

  11. Compare the classification result for each document against its assigned class. If any do not match, confirm that the assigned class is correct. If the assigned class is correct, add the mismatched documents to the Classification Set and repeat steps 6-10 until the classification result matches the assigned class for all documents in your benchmark set.
  12. If you added any documents to your Classification Set, train your project. From the Ribbon Process tab, in the Train group, select Separation & Classification.

    A progress bar is displayed showing the training progress.

  13. If your project is configured for separation, you can now test your separation settings. From the Ribbon Process tab, in the Test group, click Separate.

    The Document Separation Results window is displayed showing the separation results.

  14. If the separation results do not match your document structure, review your document separation settings on the Project Settings - Document Separation tab.
    Tip: If you are using trainable document separation, ensure that the Project Settings - Classification tab is configured to "Classify each page" for each classifier used in your project.
  15. Repeat steps 13 and 14 until your separation results are as desired.
  16. Now that your classification results and separation results are correct, you can work on your extraction results. Once again, click Select All for your "All-Purpose Golden Files" document subset.

    This selects all documents in your subset.

  17. Right-click the selected documents, and then click Extract on the shortcut menu.

    Alternatively, you can skip this step and manually classify and extract the documents in Test Validation.

    A progress bar is displayed showing the extraction progress.

  18. You can view your extraction results by opening the Extraction Results window.

    If you have five or fewer fields on your documents, you can also add field columns to the List view in the Documents window. However, this does not give you any information about confidence or alternatives.

    The confidence of each extraction result is displayed in the Extraction Results window so that you can see what needs improvement.

  19. If the extraction results for any document are incorrect, add that document to the Extraction Set.
  20. Because your Extraction Set has been modified, retrain your project. From the Ribbon Process tab, in the Train group, select Extraction.

    A progress bar is displayed showing the training progress.

  21. Repeat steps 13-17 until you are happy with your extraction results and you are ready to validate your documents.
    Note: Even if a document is in the Extraction Set, some fields can still be missed. This is not a problem because you can fix them in Test Validation.
  22. On the Ribbon Process tab, in the Test group, select Validate.

    If your project has more than one validation step configured, you can select the desired step from the submenu.

    The Test Validation window opens and is similar to the Kofax Transformation - Validation module.

  23. Process each document by validating each field that was not extracted correctly.
  24. On the Batch menu in Test Validation, select Exit to return to the main Project Builder window.
  25. Once again, click Select All for your "All-Purpose Golden Files" document subset, and then click Save All.
  26. On the Ribbon Project tab, in the File group, click Save Project.
    Tip: Create a backup copy of your processed golden files. This ensures that you can always access the backup copy if something happens to the working copy.

    Your project is saved and your golden files are now ready for performing Separation, Classification or Extraction benchmarks.
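
The backup recommended in the tip for step 26 can be made by hand, or with a small script. The Python sketch below assumes the golden files are stored in a regular folder on disk; the function name `backup_golden_files` and the timestamped naming scheme are hypothetical choices for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_golden_files(source_dir, backup_root):
    """Copy source_dir into a new timestamped folder under backup_root.

    Hypothetical helper: returns the path of the backup that was created.
    """
    source = Path(source_dir)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{source.name}-{stamp}"
    # copytree creates missing parent folders and fails if target exists,
    # so an earlier backup is never overwritten.
    shutil.copytree(source, target)
    return target
```

Because each backup lands in its own timestamped folder, earlier copies remain available if a later working copy turns out to be damaged.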