Extraction online learning

Use extraction online learning to improve results for documents whose fields were not recognized satisfactorily. Extraction online learning is available only for trainable locators.

Marking a document for extraction online learning improves recognition of documents with the same layout during production. To use this feature, you need geometric information for the document fields. This type of extraction online learning does not require the project to be trained manually, which makes it well suited to projects that process invoices.
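The kind of training record this implies can be sketched in Python. This is a minimal illustration under stated assumptions, not the product's data model: the `ValidatedField` and `BoundingBox` types, the pixel coordinates, and the rule that a value without a page location cannot be learned from are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical geometry of a field value on a page, in pixels."""
    page: int
    left: int
    top: int
    width: int
    height: int


@dataclass(frozen=True)
class ValidatedField:
    """Hypothetical record of a field value confirmed during validation."""
    name: str
    value: str
    geometry: Optional[BoundingBox]  # None when the value was typed without a location


def learnable_fields(fields: list[ValidatedField]) -> list[ValidatedField]:
    """Keep only fields that carry geometric information.

    Assumption: a value with no location on the page cannot teach a
    trainable locator where to look on the next document with this layout.
    """
    return [f for f in fields if f.geometry is not None]


fields = [
    ValidatedField("InvoiceNumber", "INV-1001", BoundingBox(1, 1400, 220, 300, 40)),
    ValidatedField("Currency", "EUR", None),
]
print([f.name for f in learnable_fields(fields)])  # ['InvoiceNumber']
```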

This type of learning is designed to optimize the field recognition rate during production. This allows a faster initial project setup, followed by continuous optimization while the project is in production. It relies mainly on the training algorithms of the trainable locators.

When extraction online learning is performed for a document, the following process occurs during production.

  1. A document is scanned into the system and stored in the documents database.

  2. The Transformation Server retrieves the document from the database to perform extraction. After extraction, the document and its extraction results are returned to the database.

  3. The validation operator takes the validation activity (Thin Client - Document Services), which opens the updated document from the documents database. If any field was not extracted correctly, the user provides the correct information.

    When the user completes the activity, the validated document is once again returned to the documents database along with the correct training information. The Thin Client also copies the modified documents to a special holding area in the documents database called the online learning folder.

  4. At scheduled intervals, the Transformation Server takes all of the documents stored in the online learning folder and trains the project dynamically. Any information that the validation operator provided is processed and stored in the Dynamic Knowledge Base.

    These documents are also copied to the New Samples document set, where a project administrator can import them into Transformation Designer to help the project perform better.

  5. The next time that the Transformation Server performs extraction, the information in the Dynamic Knowledge Base is used to aid in extraction.
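The five steps can be modeled as a toy Python simulation. Everything in it is hypothetical: `DocumentsDatabase`, the `(x, y)` word positions, and a Dynamic Knowledge Base reduced to a dictionary that maps a layout and field name to the position the user pointed at. The real Transformation Server and Dynamic Knowledge Base are far more sophisticated; the sketch only shows how a correction on one document can help the next document with the same layout.

```python
from dataclasses import dataclass, field

Position = tuple[int, int]  # hypothetical (x, y) of a word on the page


@dataclass
class Document:
    doc_id: str
    layout: str                      # hypothetical layout key, e.g. the vendor's template
    words: dict[Position, str]       # toy OCR result: position -> word
    results: dict[str, str] = field(default_factory=dict)
    training: dict[str, Position] = field(default_factory=dict)


@dataclass
class DocumentsDatabase:
    documents: dict[str, Document] = field(default_factory=dict)
    online_learning_folder: list[Document] = field(default_factory=list)
    new_samples: list[Document] = field(default_factory=list)


# Toy Dynamic Knowledge Base: (layout, field name) -> learned position.
DynamicKB = dict[tuple[str, str], Position]


def scan(db: DocumentsDatabase, doc: Document) -> None:
    """Step 1: store the scanned document."""
    db.documents[doc.doc_id] = doc


def extract(db: DocumentsDatabase, doc_id: str, kb: DynamicKB) -> None:
    """Steps 2 and 5: read the words at positions learned for this layout."""
    doc = db.documents[doc_id]
    for (layout, name), pos in kb.items():
        if layout == doc.layout and pos in doc.words:
            doc.results[name] = doc.words[pos]


def validate(db: DocumentsDatabase, doc_id: str, corrections: dict[str, Position]) -> None:
    """Step 3: the user points at the correct words; the document joins the online learning folder."""
    doc = db.documents[doc_id]
    for name, pos in corrections.items():
        doc.results[name] = doc.words[pos]
        doc.training[name] = pos
    db.online_learning_folder.append(doc)


def scheduled_training(db: DocumentsDatabase, kb: DynamicKB) -> None:
    """Step 4: train from the online learning folder, then move documents to New Samples."""
    for doc in db.online_learning_folder:
        for name, pos in doc.training.items():
            kb[(doc.layout, name)] = pos
        db.new_samples.append(doc)
    db.online_learning_folder.clear()


db, kb = DocumentsDatabase(), {}
first = Document("d1", "acme", {(10, 5): "INV-1", (40, 90): "99.00"})
scan(db, first)
extract(db, "d1", kb)                          # nothing learned yet, so no results
validate(db, "d1", {"InvoiceNumber": (10, 5)})
scheduled_training(db, kb)

second = Document("d2", "acme", {(10, 5): "INV-2", (40, 90): "12.50"})
scan(db, second)
extract(db, "d2", kb)
print(second.results)  # {'InvoiceNumber': 'INV-2'}
```

In this sketch the first document is extracted with an empty knowledge base and yields nothing; only after the scheduled training run does the second `acme` document get its invoice number automatically.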

When extraction online learning is performed for a document, the following sequence is used to locate field data:


Extraction Online Learning Process

  1. First, the Extraction Training Set is used to help with extraction results. If a confident result is found, it is used. If not, processing moves on to the second step.

  2. Second, any knowledge bases added directly to a trainable locator (*.kbi, *.kba, *.kbo, *.kbtgl, and *.kbtbl) are processed. If a confident result is found, it is used. If not, processing moves on to the third step.

  3. Third, the Dynamic Specific Knowledge Base is used. If a confident result is found, it is used. Because this is the last step, if no confident result is found, the document is presented to a user.
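The lookup order works like a fallback chain: each source is asked in turn, and the first confident answer wins. A minimal Python sketch follows, assuming that every source returns a value with a confidence between 0 and 1 and that "confident" means meeting a fixed threshold. The `Candidate` type, the `CONFIDENT` value of 0.8, and the stub sources are hypothetical and do not reflect how TotalAgility actually scores results.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Candidate:
    value: str
    confidence: float  # assumed range 0.0 .. 1.0


# A source takes a field name and returns its best candidate, or None.
Source = Callable[[str], Optional[Candidate]]

CONFIDENT = 0.8  # hypothetical threshold for a "confident" result


def locate(field_name: str, sources: list[tuple[str, Source]]) -> tuple[Optional[str], str]:
    """Try each source in order; the first confident result wins.

    Returns (value, source name). If no source is confident, returns
    (None, "validation") to signal that the document goes to a user.
    """
    for name, source in sources:
        candidate = source(field_name)
        if candidate is not None and candidate.confidence >= CONFIDENT:
            return candidate.value, name
    return None, "validation"


# Stub sources standing in for the three steps above.
sources: list[tuple[str, Source]] = [
    ("Extraction Training Set", lambda f: Candidate("INV-7", 0.55)),
    ("locator knowledge bases", lambda f: None),
    ("Dynamic Specific Knowledge Base", lambda f: Candidate("INV-7", 0.93)),
]
print(locate("InvoiceNumber", sources))  # ('INV-7', 'Dynamic Specific Knowledge Base')
print(locate("Total", sources[:2]))      # (None, 'validation')
```

Returning the name of the source that answered makes it easy to see which step resolved each field, which is useful when deciding whether the Extraction Training Set needs retraining.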

Because the Extraction Training Set is consulted first, it is important to keep it up to date. This means training your project when it is first put into production and then regularly processing your New Samples.

Note: If the structure of a document is modified outside of Kofax TotalAgility during processing, the document cannot be used for online learning, and any online learning files generated for it are deleted. For example, if a batch is routed to Kofax Capture - Quality Control and one or more documents are modified there, those documents are not suitable for online learning.
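One way to picture this rule is a structure fingerprint taken when extraction runs and compared before learning. The Python sketch below is purely illustrative: the `Doc` type, hashing the ordered page IDs, and the learning file name are assumptions, not how Kofax TotalAgility detects modified documents.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    page_ids: list[str]                # hypothetical: ordered pages that make up the document
    structure_hash: str = ""           # fingerprint recorded when extraction ran
    learning_files: list[str] = field(default_factory=list)


def fingerprint(doc: Doc) -> str:
    """Hash the ordered page IDs; removing, adding, or reordering pages changes it."""
    return hashlib.sha256("|".join(doc.page_ids).encode("utf-8")).hexdigest()


def eligible_for_online_learning(doc: Doc) -> bool:
    """Discard the document's online learning files if its structure changed after extraction."""
    if fingerprint(doc) != doc.structure_hash:
        doc.learning_files.clear()
        return False
    return True


doc = Doc("d1", ["p1", "p2"])
doc.structure_hash = fingerprint(doc)
doc.learning_files.append("d1-learning.dat")  # hypothetical file name
doc.page_ids.pop()                            # e.g. a page removed outside TotalAgility
print(eligible_for_online_learning(doc), doc.learning_files)  # False []
```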