Best practices for extraction training

To reduce the time needed to retrain a project, you can export the training data of certain trainable group locator methods to knowledge bases.

You can export documents from the Extraction Training Set for these locators. Once the documents are part of a knowledge base, you can remove them from the training set completely. If you do not want to remove the training documents completely, you can move them to another training set that is excluded from training. Exporting the training documents to a knowledge base decreases the time needed to train your project without losing the training data: if the online learning sequence finds no match for a document in the Extraction Training Set, it looks in the knowledge bases.
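
The lookup order described above can be pictured with a small conceptual sketch. This is not the product's API: the names Sample, SampleStore, lookup, export_to_knowledge_base, and layout_key are all hypothetical, and matching is reduced to a simple key comparison purely for illustration. The sketch only shows that exported samples leave the training set but remain reachable as a fallback.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sample:
    layout_key: str  # hypothetical stand-in for whatever makes two documents "match"
    fields: dict


@dataclass
class SampleStore:
    """Hypothetical container used for both a training set and a knowledge base."""
    name: str
    samples: list = field(default_factory=list)

    def find(self, layout_key: str) -> Optional[Sample]:
        for sample in self.samples:
            if sample.layout_key == layout_key:
                return sample
        return None


def lookup(layout_key, training_set, knowledge_bases):
    """Check the training set first; fall back to the knowledge bases on a miss."""
    hit = training_set.find(layout_key)
    if hit is not None:
        return training_set.name, hit
    for kb in knowledge_bases:
        hit = kb.find(layout_key)
        if hit is not None:
            return kb.name, hit
    return None


def export_to_knowledge_base(training_set, kb, layout_key):
    """Move matching samples out of the training set and into a knowledge base."""
    moved = [s for s in training_set.samples if s.layout_key == layout_key]
    kb.samples.extend(moved)
    training_set.samples = [
        s for s in training_set.samples if s.layout_key != layout_key
    ]
    return len(moved)
```

In this sketch, a sample that has been exported is no longer part of the (smaller) training set, yet a later lookup for the same key still succeeds because the knowledge base is consulted after the training set misses.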

The other trainable locator methods collect training documents as well, but these documents cannot be exported to a knowledge base. As a result, the training documents accumulate until you import them and retrain the project.

To improve overall training for your project:

  • Export training data from the trainable group locators to knowledge bases.

  • Import New Samples in Project Builder often. Failure to import the training documents increases the likelihood of training conflicts and errors in extraction.

  • Make use of the "Automatic training after Validation" option. This option ensures that the only documents collected are those that have been modified during production.

  • Use the "Show in training dialog" option in the field properties and the "Use For Training" option in the Edit Document window. These options ensure that fields used for training are monitored and trained correctly.