Specific training of trainable locators

All trainable locators provide an algorithm called specific training. Specific training recognizes the layout of a trained document and correlates all possible features that can be used for extraction with that layout.

During extraction, these locators internally perform a layout classification. If the document's layout matches the layout of a training document, the document is extracted using the information from that training document. The disadvantage of this approach is that you must train at least one document for each layout. Combining the specific training algorithm with the generic knowledge bases significantly reduces the number of training documents required.

Only documents that cannot be recognized by the generic knowledge bases need to be trained for Extraction Online Learning.

The tight integration between the specific training algorithm and the Online Learning workflow greatly reduces the effort required to train additional documents. The specific training algorithm is also robust against unintended training errors, because a training document only affects other documents with the same layout. Even an invoice with an unusual combination of keywords, which causes problems for the Generic Knowledge Base algorithm, can be trained and marked for Extraction Online Learning.
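The interplay described above can be illustrated with a small Python sketch. It shows three behaviors: a document with an unknown layout falls back to a generic rule, a document whose layout matches a training document is extracted from that training, and a training entry has no effect on documents of a different layout. The product's actual layout classifier, features, and knowledge base format are not documented here. Everything in the sketch is therefore a hypothetical stand-in: the `Word` type, the grid-based `layout_signature`, the Jaccard similarity threshold, nearest-position extraction in `SpecificTraining`, and the single keyword rule in `GENERIC_KEYWORDS` that plays the role of the generic knowledge base.

```python
from dataclasses import dataclass, field


# Hypothetical representation of an OCR word with page coordinates.
@dataclass(frozen=True)
class Word:
    text: str
    x: int
    y: int


def layout_signature(words, grid=50):
    """Hypothetical layout fingerprint: each word snapped to a coarse grid cell."""
    return frozenset((w.text.lower(), w.x // grid, w.y // grid) for w in words)


def jaccard(a, b):
    """Set similarity in [0, 1]; used here as a stand-in layout classifier."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical stand-in for a generic knowledge base: a field's value is the
# word that follows its keyword in reading order (top-to-bottom, left-to-right).
GENERIC_KEYWORDS = {"invoice_number": "invoice"}


def generic_extract(words, field_name):
    keyword = GENERIC_KEYWORDS.get(field_name)
    ordered = sorted(words, key=lambda w: (w.y, w.x))
    for current, following in zip(ordered, ordered[1:]):
        if current.text.lower() == keyword:
            return following.text
    return None


@dataclass
class SpecificTraining:
    threshold: float = 0.6  # minimum similarity for a layout to count as "known"
    trained: list = field(default_factory=list)  # (signature, {field: (x, y)})

    def train(self, words, field_positions):
        """Store a training document's layout together with its field positions."""
        self.trained.append((layout_signature(words), dict(field_positions)))

    def classify(self, words):
        """Return the field positions of the most similar trained layout, or None."""
        signature = layout_signature(words)
        best, best_score = None, 0.0
        for trained_signature, positions in self.trained:
            score = jaccard(signature, trained_signature)
            if score > best_score:
                best, best_score = positions, score
        return best if best_score >= self.threshold else None

    def extract(self, words, field_name):
        """Use the matching trained layout if there is one; otherwise fall back."""
        positions = self.classify(words)
        if positions and field_name in positions:
            tx, ty = positions[field_name]
            # Pick the word closest to where the value sat on the training document.
            nearest = min(words, key=lambda w: abs(w.x - tx) + abs(w.y - ty))
            return nearest.text, "specific"
        return generic_extract(words, field_name), "generic"


# Two documents with the same layout (only the invoice number differs)
# and one document with a different layout.
training_doc = [Word("ACME", 10, 10), Word("Ref", 300, 10), Word("A-1001", 400, 10),
                Word("Invoice", 10, 100), Word("Copy", 120, 100), Word("Total", 10, 400)]
new_doc_same_layout = [Word("ACME", 10, 10), Word("Ref", 300, 10), Word("A-2002", 402, 12),
                       Word("Invoice", 10, 100), Word("Copy", 120, 100), Word("Total", 10, 400)]
other_layout_doc = [Word("Invoice", 10, 10), Word("B-77", 100, 10),
                    Word("Globex", 10, 300), Word("Total", 10, 400)]

store = SpecificTraining()
print(store.extract(new_doc_same_layout, "invoice_number"))  # ('Copy', 'generic')
store.train(training_doc, {"invoice_number": (400, 10)})
print(store.extract(new_doc_same_layout, "invoice_number"))  # ('A-2002', 'specific')
print(store.extract(other_layout_doc, "invoice_number"))     # ('B-77', 'generic')
```

Before training, the generic rule picks the wrong word on the unusual layout. After one training document, every document of that layout is extracted correctly. In this sketch a trained entry is only consulted when a document's layout matches it. A wrongly trained field position on `training_doc` would therefore change results only for documents of that layout, and `other_layout_doc` would still be handled by the generic rule.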