Extraction Benchmark window - Benchmark tab

This window displays the results of an extraction benchmark. Depending on the type of extraction benchmark selected, one of the following is performed:

  • An extraction benchmark for the currently-selected class

  • An extraction benchmark for the currently-selected class and all of its child classes

  • An extraction benchmark for all classes in the project

Regardless of the type of extraction benchmark performed, the options displayed on this window are the same.

The progress bar at the bottom of the screen shows how many documents in the selected document set have been processed. When processing is complete, the total number of documents and any skipped documents are displayed.

Legend

This group shows the four possible results. Each result is given a unique label and colour so that it is readily recognizable in the Summary and Details groups.

  • Correct valid fields

  • Correct invalid fields

  • Incorrect invalid fields

  • Incorrect valid fields
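The four results combine two independent checks: whether the extracted value matches the desired value, and whether the field is marked valid. The following Python sketch shows that combination. The names `Result` and `classify` are hypothetical, and the sketch assumes a simple exact string comparison; the product's actual comparison rules may differ.

```python
from enum import Enum


class Result(Enum):
    # Hypothetical names for the four results shown in the Legend.
    CORRECT_VALID = "Correct valid field"
    CORRECT_INVALID = "Correct invalid field"
    INCORRECT_INVALID = "Incorrect invalid field"
    INCORRECT_VALID = "Incorrect valid field"


def classify(extracted: str, desired: str, is_valid: bool) -> Result:
    """Combine correctness (exact match, an assumption) with the field's validity flag."""
    if extracted == desired:
        return Result.CORRECT_VALID if is_valid else Result.CORRECT_INVALID
    return Result.INCORRECT_VALID if is_valid else Result.INCORRECT_INVALID
```

For example, `classify("2021-03-01", "2021-03-01", False)` returns `Result.CORRECT_INVALID`, which corresponds to a correct date that was marked invalid for some other reason, such as a low confidence value.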

Summary

This group provides overall results for the extraction benchmark. It looks at all of the extracted fields and provides details about the following:

  • The total percentage of fields from all documents for each of the four results. This is the "All Fields" column.

  • The percentage of each of the four results for each extracted field. Each extracted field has its own column in this table.

The sum of each column in this table should equal 100%. The goal of your extraction settings is to achieve the highest possible percentage in the first row, which represents correct valid results: extracted values that match the desired values.
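The arithmetic behind the Summary table can be sketched in Python as follows. The sketch treats the result labels A to D as opaque strings, pools every field of every document for the "All Fields" column, and assumes (hypothetically) that each document's results are given as a dictionary of field name to label. It illustrates why each column sums to 100%; it is not the product's implementation.

```python
from collections import Counter

LABELS = "ABCD"  # result labels as used in the table headers


def to_percentages(counts: Counter) -> dict:
    """Convert result counts to percentages that sum to 100 (all zeros if empty)."""
    total = sum(counts.values())
    return {label: (100.0 * counts[label] / total if total else 0.0) for label in LABELS}


def summary_table(documents: list) -> dict:
    """documents: one dict per document mapping field name -> result label.

    Fields that are not in a document's class are assumed to be absent from its
    dict, so they do not contribute to any column.
    """
    per_field = {}
    overall = Counter()
    for results in documents:
        for field, label in results.items():
            per_field.setdefault(field, Counter())[label] += 1
            overall[label] += 1
    table = {field: to_percentages(counts) for field, counts in per_field.items()}
    table["All Fields"] = to_percentages(overall)
    return table
```

With two documents, `[{"InvoiceNumber": "A", "InvoiceDate": "A"}, {"InvoiceNumber": "D"}]`, the InvoiceNumber column is 50% A and 50% D, the InvoiceDate column is 100% A, and the "All Fields" column pools all three fields.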

You can sort the individual rows in ascending or descending order by clicking the relevant result letter (A, B, C, or D). To revert to the default sort order, click Revert to Default Sort Order.

If you click a cell in this table, the Selection document set displays only the documents that contain the corresponding field and extraction result. For example, suppose you want to see only the documents in which the Invoice Number field was incorrectly extracted as valid. Because this is an undesired result, you can focus on that subset of documents and update the extraction settings for that field. When you are finished with the Invoice Number field, you can move on to other fields that have undesired extraction results.
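Conceptually, clicking a cell filters the document set by a field and result pair. A minimal Python sketch of that filter, using a hypothetical in-memory mapping rather than any product API:

```python
def select_documents(documents: dict, field: str, label: str) -> list:
    """Return the names of documents whose `field` has the given result label.

    documents: hypothetical mapping of document name -> {field name: result label}.
    Documents whose class lacks the field are skipped because .get() returns None.
    """
    return [name for name, results in documents.items() if results.get(field) == label]
```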

Details

This group provides more detailed information about each document and field. Each row is a single document. The name, class, percentages of A, B, C, and D, and the results for each extracted field are displayed.

Important: The Class column is not shown when the extraction benchmark is run for the selected class only.

The first five or six columns are fixed, so if there is a long list of fields, you can scroll through the fields without losing sight of the file name, the class (if applicable), and the cumulative extraction statistics.
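The fixed part of each Details row (name, optional class, and the A to D percentages) explains why there are either five or six fixed columns. The following Python sketch assembles those cells; the function name, the input layout, and the whole-number percentage formatting are assumptions for illustration.

```python
def fixed_columns(name: str, doc_class: str, results: dict, show_class: bool) -> list:
    """Build the fixed leading cells of a Details row.

    results: hypothetical mapping of field name -> result label ("A" to "D").
    show_class: False when the benchmark runs for the selected class only.
    """
    counts = {label: 0 for label in "ABCD"}
    for label in results.values():
        counts[label] += 1
    total = len(results)
    cells = [name]
    if show_class:
        cells.append(doc_class)
    for label in "ABCD":
        percent = 100.0 * counts[label] / total if total else 0.0
        cells.append(f"{percent:.0f}%")
    return cells
```

Without the Class column the row starts with five fixed cells; with it, six.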

Each individual field cell contains the results of the extraction benchmark. The six possible results are:

Correct valid field (green with single value)

The extracted value for this field matches the desired value.

The following example shows a date field that has the correct value:

Correct valid field

Correct invalid field (blue with single value)

The extracted value for this field matches the desired value, but the field is marked invalid for another reason.

The following example shows a date field that has the correct value, but was marked invalid because it did not meet the minimum confidence threshold.

Correct invalid field

Incorrect invalid field (yellow with conflicting values)

The extracted value for this field does not match the desired result and is marked invalid.

The following example shows an invoice number field with an incorrect value: the extracted value does not match the desired value, and the field is marked invalid. The desired value is shown in brackets; the extracted value is empty because nothing was extracted.

Incorrect invalid field

Incorrect valid field (red with two values)

The extracted value does not match the desired value, yet the field is marked valid. Because the field is valid, it is not shown during Validation, so the incorrect value is not reviewed. It is therefore important to eliminate these results by modifying the extraction settings before the project is used in production.

The following example shows a subtotal field that is marked valid even though the extracted value and the desired value do not match.

Incorrect valid field

Field not in class (no highlighting or values)

A blank cell in the details table means that a field is not present in the class.

The following example shows an empty cell in the InvoiceDate column. The field is empty because the document in the corresponding row is classified as a different class that does not contain the InvoiceDate field.

Field not in class

Field not found on document (no highlighting and missing values)

A cell with no highlighting that contains -/- means that the field is present in the corresponding class but cannot be found on the document.

The following example shows a cell with no highlighting that contains a blank value (-/-). The value is blank because the document was classified as an Invoice, yet there is no invoice number on the document.
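The display conventions for the Details cells (a single value, an extracted value followed by the desired value in brackets, a blank cell, and -/-) can be summarized in a small Python sketch. The function and its inputs are hypothetical, colour is not modelled, and the exact text layout in the product may differ.

```python
from typing import Optional


def cell_text(in_class: bool, extracted: Optional[str], desired: Optional[str]) -> str:
    """Return the text a Details cell might show under the conventions above."""
    if not in_class:
        return ""  # Field not in class: blank cell
    if not extracted and not desired:
        return "-/-"  # Field in class but not found on the document
    if extracted == desired:
        return extracted  # Correct result (valid or invalid): a single value
    # Incorrect result: extracted value, then the desired value in brackets
    return f"{extracted or ''} [{desired or ''}]".strip()
```

An empty extraction against a desired invoice number of INV-100 yields `[INV-100]`, matching the incorrect invalid example.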

If you want to view the file associated with an individual cell in this table, you can click a cell to select the document in the Documents window. You can also double-click the cell to open the Document Viewer for the corresponding document. All extracted field data is highlighted on the image.

By default, the documents shown in the Details table are listed in the order they are processed. If you want to rearrange the sort order, you can choose between the following options:

  • Click one of the summary headers (A, B, C, or D) to sort the results in ascending or descending order.

  • Click a field column header to sort the field results in ascending or descending alphabetical order.

  • If you have sorted a field column, you can click Switch Sort Mode to switch to Sort by status mode. This sorts the column by extraction status, displaying all green cells at the top, followed by yellow, blue, and red cells, and finally any empty cells.
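Sort by status mode amounts to ordering rows by a fixed rank per colour. The following Python sketch uses the order described in the list above; the row layout and names are hypothetical. Because Python's `sorted()` is stable, rows with the same status keep their original processing order in this sketch, which may or may not match the product's behaviour.

```python
# Status order as described above: green, yellow, blue, red; empty cells last.
STATUS_ORDER = {"green": 0, "yellow": 1, "blue": 2, "red": 3}


def sort_by_status(rows: list, field: str) -> list:
    """Sort (document name, {field: colour}) rows by the colour of one field.

    Rows without a colour for `field` (empty cells) get the highest rank and go last.
    """
    return sorted(rows, key=lambda row: STATUS_ORDER.get(row[1].get(field), len(STATUS_ORDER)))
```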

In addition to the common Project Builder buttons, the following buttons are provided:

Start

Begins processing the extraction benchmark calculations.

Stop

Cancels the benchmark calculations. Once the extraction benchmark results are complete, this button is no longer available.

Save

Saves your current extraction benchmark results to an XML file. The Save Benchmark window is displayed so you can specify a filename and provide a comment for your benchmark results. These benchmark files can be compared to other recent benchmarks.

Export

Saves your current extraction benchmark results in .CSV format.
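The column layout of the exported CSV file is not described on this page. Purely as an illustration, the following Python sketch writes one row per document with Python's standard `csv` module; the column names and layout are assumptions, not the product's file format.

```python
import csv
import io


def export_csv(rows: list, fields: list, out) -> None:
    """Write a header, then one line per document: its name and one cell per field.

    rows: hypothetical list of (document name, {field name: cell text}) pairs.
    Fields missing from a document (not in its class) are written as empty cells.
    """
    writer = csv.writer(out)
    writer.writerow(["Document", *fields])
    for name, cells in rows:
        writer.writerow([name, *(cells.get(field, "") for field in fields)])


buffer = io.StringIO()
export_csv([("inv1.tif", {"InvoiceNumber": "INV-100"})], ["InvoiceNumber", "InvoiceDate"], buffer)
```

The standard `csv` writer ends each line with `\r\n` by default and quotes cells only when needed, for example when a value contains a comma.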