Table Extraction Set

Use this document set to train table detection. Documents that you add to this set and then train improve the recognition of advanced tables when the project is in production.

If a project changes over time or new tables are not recognized, the only solution is manual training, because there is no online learning for table detection. Add and edit new table training documents, retrain your project, and then release the project. Documents processed after that release should pick up the previously missed tables that meet the new training criteria. Repeat this process if your recognition results decline or new tables are introduced.

This document set contains the <All Documents> document subset only. Documents are displayed or hidden based on the Filter selection, and the Table Training window works with the <selected class> filter. When you select a class in the Project Tree, only the table training documents for that class are displayed.

If you add a document to the Table Extraction Set that is not classified, it is visible only when you select the <no filter> filter, and it is flagged with the Exclude from Training icon. For this reason, Kofax recommends that a document is already classified and has recognition results before you add it to the Table Extraction Set.
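The filter behavior described above can be pictured with a short conceptual sketch. This is not a Kofax API; the function `visible_documents` and the tuple layout are hypothetical, and the sketch models only the two filters named in this topic.

```python
from typing import List, Optional, Tuple

NO_FILTER = "<no filter>"
SELECTED_CLASS = "<selected class>"

# Hypothetical document record: (name, assigned class); None means unclassified.
Document = Tuple[str, Optional[str]]


def visible_documents(docs: List[Document], filter_name: str,
                      selected_class: Optional[str] = None) -> List[str]:
    """Return the names of the documents shown for the given filter (assumed model)."""
    if filter_name == NO_FILTER:
        # Every document is shown, including unclassified ones.
        return [name for name, _ in docs]
    if filter_name == SELECTED_CLASS:
        # Only documents assigned to the class selected in the Project Tree.
        return [name for name, cls in docs
                if cls is not None and cls == selected_class]
    raise ValueError(f"filter not modeled in this sketch: {filter_name}")


docs = [("inv_001", "Invoice"), ("po_001", "PurchaseOrder"), ("scan_007", None)]
print(visible_documents(docs, SELECTED_CLASS, "Invoice"))
print(visible_documents(docs, NO_FILTER))
```

In this model the unclassified document `scan_007` appears only under <no filter>, which is why classifying documents before adding them keeps them visible during table training.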

When already-classified documents are first added to the Table Extraction Set for the selected class, they are initially excluded from training and the Exclude from Training icon is displayed. Once a table label is added to a document, the document is automatically included in training and the Include for Training icon is displayed. In other words, a document stays excluded until it has at least one table label. You can manually include a document with no table labels, but this may cause issues with training.
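The include and exclude rule above can be summarized as a small state model. This is a conceptual sketch only, not product code: the class `TrainingDocument` and its fields are hypothetical, and the assumption that a manual choice always takes precedence over the automatic rule is an illustration, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TrainingDocument:
    """Hypothetical stand-in for one document in the Table Extraction Set."""
    name: str
    table_labels: List[str] = field(default_factory=list)
    # None means the user has made no manual include/exclude choice.
    manual_override: Optional[bool] = None

    def add_table_label(self, label: str) -> None:
        self.table_labels.append(label)

    @property
    def included_for_training(self) -> bool:
        # Assumption: a manual choice wins over the automatic rule.
        if self.manual_override is not None:
            return self.manual_override
        # Automatic rule from the text: included once a table label exists.
        return len(self.table_labels) > 0


doc = TrainingDocument("invoice_001")
print(doc.included_for_training)  # False: newly added, no table labels yet
doc.add_table_label("LineItems")
print(doc.included_for_training)  # True: at least one table label exists
```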

You can manually exclude a document from table training without removing it from the document subset. To do this, click the icon in the Use column. The Include for Training icon shows that a document is used for training. The Exclude from Training icon shows that a document is not included in training. You may need to set the Filter to <no filter> to see unassigned documents.

If a document is excluded from training, it still appears in the Table Training window with an indicator that shows it is excluded, and its predicted labels are still displayed. In other words, even though a document is excluded from training, its table detection information remains visible in the Table Training window. However, this information is not part of the training data.

Table training uses the assigned class to collect documents for class-level training. When training is performed, all confirmed and protected documents for the selected class are used for training. Any excluded documents remain classified but are not included in the training data.

This document set should have no more than 2000 training documents. Any more than this can negatively affect project performance.
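As a rough illustration of how documents are gathered for class-level training and how the size recommendation applies, consider the sketch below. It is a conceptual model under stated assumptions, not a Kofax API: the dictionary keys `class` and `included` and both function names are hypothetical, and the confirmed or protected status of a document is not modeled.

```python
from typing import Dict, List

MAX_TRAINING_DOCUMENTS = 2000  # limit stated in this topic


def documents_for_training(docs: List[Dict], selected_class: str) -> List[Dict]:
    """Keep documents of the selected class that are not excluded from training."""
    return [d for d in docs
            if d["class"] == selected_class and d["included"]]


def exceeds_recommended_size(docs: List[Dict]) -> bool:
    """True if the set holds more documents than the recommended maximum."""
    return len(docs) > MAX_TRAINING_DOCUMENTS


document_set = [
    {"name": "inv_001", "class": "Invoice", "included": True},
    {"name": "inv_002", "class": "Invoice", "included": False},  # excluded
    {"name": "po_001", "class": "PurchaseOrder", "included": True},
]

training = documents_for_training(document_set, "Invoice")
print([d["name"] for d in training])
print(exceeds_recommended_size(document_set))
```

A check of this kind, run before retraining, makes it easier to notice when a set has grown past the recommended size.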

You can manage your Table Extraction Set in the following ways: