Extraction online learning

Use extraction online learning to improve extraction for documents whose recognition results are unsatisfactory.

Extraction online learning is available only for trainable locators.

Marking a document for extraction online learning improves the recognition of documents with the same layout during production.

To use this feature, you need geometric information for the document fields. This type of extraction online learning does not require a project to be manually trained and is ideal for projects that process invoices.

This type of learning is designed to optimize the field recognition rate during production. As a result, initial project setup is faster and the project continues to improve while it is in use.

Extraction online learning is based mainly on the specific training algorithm of the trainable locators.

When extraction online learning is performed for a document, the following process occurs during production:

  1. A document is scanned into the system and stored in the documents database.

  2. The Transformation Server retrieves the document from the database in order to perform extraction. After extraction, the document and its extraction results are returned to the database.

  3. The validation operator takes the validation activity (Thin Client - Document Services), which opens the updated document from the documents database. If any of the extraction results are incorrect, the user provides the correct information.

    When the user completes the activity, the validated document is once again returned to the documents database along with the correct training information. The Thin Client also copies the modified documents to a special holding area in the documents database called the online learning folder.

  4. At scheduled intervals, the Transformation Server takes all of the documents stored in the online learning folder and trains the project dynamically. Any information that the validation operator provided is processed and stored in the Dynamic Knowledge Base.

    These documents are also copied to the New Samples document set, where a project administrator can import them into Transformation Designer to help the project perform better.

  5. The next time that the Transformation Server performs extraction, the information in the Dynamic Knowledge Base is used to aid in extraction.
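
The five steps above can be sketched as a small simulation. This is only an illustrative model, not the product's API: the class OnlineLearningSketch, its method names, the layout key, and the (x, y) positions standing in for geometric field information are all hypothetical. The actual training algorithm of the trainable locators is far more involved than the simple position lookup shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    layout: str    # hypothetical layout key, e.g. a vendor name
    text_at: dict  # hypothetical geometry: (x, y) position -> text found there
    fields: dict = field(default_factory=dict)  # field name -> extracted value


class OnlineLearningSketch:
    """Toy model of the production loop; every name here is an assumption."""

    def __init__(self):
        self.documents_db = {}
        self.online_learning_folder = []
        self.new_samples = []
        self.dynamic_knowledge_base = {}  # layout -> {field name -> (x, y)}

    def scan(self, doc):
        # Step 1: store the scanned document in the documents database.
        self.documents_db[doc.doc_id] = doc

    def extract(self, doc_id):
        # Steps 2 and 5: extract using whatever has been learned for this layout.
        doc = self.documents_db[doc_id]
        learned = self.dynamic_knowledge_base.get(doc.layout, {})
        doc.fields = {
            name: doc.text_at[pos]
            for name, pos in learned.items()
            if pos in doc.text_at
        }
        return dict(doc.fields)  # copy, so later corrections do not alter it

    def validate(self, doc_id, corrections):
        # Step 3: the operator points at the correct position for each field.
        doc = self.documents_db[doc_id]
        if not corrections:
            return
        for name, pos in corrections.items():
            doc.fields[name] = doc.text_at[pos]
        # Only modified documents are copied to the online learning folder.
        self.online_learning_folder.append((doc, dict(corrections)))

    def train(self):
        # Step 4: run at scheduled intervals; learn from the folder, then empty it.
        trained = 0
        for doc, corrections in self.online_learning_folder:
            self.dynamic_knowledge_base.setdefault(doc.layout, {}).update(corrections)
            self.new_samples.append(doc)
            trained += 1
        self.online_learning_folder.clear()
        return trained


pipeline = OnlineLearningSketch()
pipeline.scan(Document("doc-1", "acme", {(10, 20): "INV-001"}))
first = pipeline.extract("doc-1")  # nothing learned yet for this layout
pipeline.validate("doc-1", {"InvoiceNumber": (10, 20)})
pipeline.train()
pipeline.scan(Document("doc-2", "acme", {(10, 20): "INV-002"}))
second = pipeline.extract("doc-2")  # same layout, so the learned position applies
print(first, second)
```

In this sketch, the first document yields no fields because nothing has been learned yet. After the operator's correction has been trained, the second document with the same layout is extracted without any manual input.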