Create an extraction golden files benchmark set

If you want to create a set of golden files for extraction benchmarks, you can do this in one of the following ways:

  • Validate a set of documents using Project Builder and Test Validation.

  • Process a set of documents through Kofax Capture, Kofax Transformation - Server, Validation, and Knowledge Base Learning Server and save documents for online learning.

Tip Consider renaming your documents so that they include the class name where they belong. This makes assigning a class significantly easier because you do not need to open and view each document in the Document Viewer.
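
The renaming suggested in the tip above can be scripted outside Project Builder. The following is a minimal sketch, not part of the product. It assumes a hypothetical folder layout in which your documents are already sorted into one subfolder per class, and the function name add_class_prefix, the separator, and the dry_run switch are illustrative choices only.

```python
from pathlib import Path


def add_class_prefix(root: Path, separator: str = "_", dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Prefix every file in root/<ClassName>/ with its class name.

    Assumption: each direct subfolder of root is named after a class
    (hypothetical layout, not something Project Builder requires).
    Returns the (old, new) path pairs that were, or would be, renamed.
    """
    renames = []
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        prefix = class_dir.name + separator
        for doc in sorted(p for p in class_dir.iterdir() if p.is_file()):
            if doc.name.startswith(prefix):
                continue  # the file name already carries the class name
            target = doc.with_name(prefix + doc.name)
            if target.exists():
                raise FileExistsError(f"Refusing to overwrite {target}")
            renames.append((doc, target))
            if not dry_run:
                doc.rename(target)
    return renames


# Example call (hypothetical path): preview first, then rename.
# add_class_prefix(Path("C:/GoldenFiles/Unsorted"))                 # dry run
# add_class_prefix(Path("C:/GoldenFiles/Unsorted"), dry_run=False)  # rename
```

Running the function with the default dry_run=True first lets you review the planned names before any file is changed.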

The following procedure guides you through creating your golden files using Project Builder and Test Validation.

Note PDF documents can be included in these golden files if the golden files are not used for separation.

You can create an extraction golden files document set by following these steps:

  1. Open the Documents window if it is not already open.
  2. Add a new document set that contains the documents for your golden files, and give the document subset a descriptive name, such as Extraction Golden Files.

    A new document set is added to the Documents window and expands automatically so you can see its document subsets.

  3. Convert your new document set to a benchmark set.

    The document set type changes so that you can assign a classification for each document in the document set.

  4. If a different view is in use, switch to the List View.
  5. Assign a class for all documents in your benchmark set.

    Each document is assigned a class for comparison after classification.

  6. Click in the Extraction Golden Files document subset you created earlier, and then click Select All on the toolbar or press Ctrl + A.

    All documents in the selected document subset are selected in the List view.

  7. Optionally, if your documents already have recognition data, right-click the selected documents, and then click Clear Classification & Extraction Data.

    If you are not sure whether your documents have recognition data or not, perform this step anyway.

    This removes any existing classification or extraction data and provides you with a clean state so you can build your golden files.

  8. Press F4 to perform recognition for the selected documents using the default recognition engine.

    If you want to select a different recognition engine, right-click the selected Extraction Golden Files documents, click Recognize on the shortcut menu, and then select the desired recognition engine from the submenu.

    A progress bar is displayed showing the recognition progress.

  9. On the document shortcut menu, click Classify.

    Each document in the benchmark set is assigned a classification result based on your project settings and, if available, your training documents.

  10. Once again, click Select All for your Extraction Golden Files document subset, and then click Save All.
    Tip To ensure you do not lose any changes, save your documents regularly.

    All changes to the documents are saved.

  11. Compare the classification result for each document against its assigned class. If any do not match, confirm that the assigned class is correct. If the assigned class is correct, add the mismatched documents to the Classification Set and repeat steps 6 - 10 until the classification result matches the assigned class for all documents in your benchmark set.
  12. If you added any documents to your Classification Set, train your project. From the Ribbon Process tab, in the Train group, select Separation & Classification.

    A progress bar is displayed showing the training progress.

  13. Now that your classification results are correct, you can work on your extraction results. Once again, click Select All for your Extraction Golden Files document subset.

    This selects all documents in your subset.

  14. Right-click the selected documents, and then click Extract on the shortcut menu.

    Alternatively, you can skip this step and manually classify and extract the documents in Test Validation.

    A progress bar is displayed showing the extraction progress.

  15. You can view your extraction results by opening the Extraction Results window.

    If you have five or fewer fields on your documents, you can also add field columns to the List View in the Documents window. However, this does not give you any information about the confidence or alternatives.

    The confidence of each extraction result is displayed in the Extraction Results window so that you can see what needs improvement.

  16. If the extraction results for any document are incorrect, add that document to the Extraction Set.
  17. Because your Extraction Set has changed, retrain your project. From the Ribbon Process tab, in the Train group, select Extraction.

    A progress bar is displayed showing the training progress.

  18. Repeat steps 13 - 17 until you are happy with your extraction results and you are ready to validate your documents.
    Note Even if a document is in the Extraction Set, some fields can still be missed. This is not a problem because you can fix them in Test Validation.
  19. On the Ribbon Process tab, in the Test group, select Validate.

    If your project has more than one validation step configured, you can select the desired step from the submenu.

    The Test Validation window opens. It is similar to the Kofax Transformation - Validation module.

  20. Process each document by validating each field that was not extracted correctly.
  21. On the Batch menu in Test Validation, select Exit to return to the main Project Builder window.
  22. Once again, click Select All for your Extraction Golden Files document subset, and then click Save All.
  23. On the Ribbon Project tab, in the File group, click Save Project.
    Tip Create a backup copy of your processed golden files. This ensures that you can always access the backup copy if something happens to the working copy.

    Your project is saved, and your golden files are now ready to be used for extraction benchmarks.
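
The backup recommended in the tip for the final step can be automated outside Project Builder. The following is a minimal sketch, assuming your processed golden files are stored in a single folder on disk. The function name backup_golden_files, the timestamp format, and the example paths are hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_golden_files(source: Path, backup_root: Path) -> Path:
    """Copy the golden files folder to a new timestamped folder and return its path."""
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a folder")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{source.name}-{stamp}"
    # copytree creates missing parent folders and raises an error if the
    # target already exists, so an earlier backup is never overwritten.
    shutil.copytree(source, target)
    return target


# Example call (hypothetical paths):
# backup_golden_files(Path("C:/GoldenFiles/Extraction Golden Files"),
#                     Path("D:/Backups"))
```

Keeping each backup in its own timestamped folder means that you can return to the state of the golden files after any given validation pass.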