Machine learning and online learning in TotalAgility

This topic provides an overview of all the tools in TotalAgility that support Online Learning or Learn by Example.

Classification

TotalAgility supports the following types of classification:

Content-Based: Learns from examples of documents that have text or OCR results.
Layout-Based: Learns to distinguish document types based on their graphical layout. It does not require OCR and is usually faster.

Both classifiers support pre-training and Online Learning.

The two classifiers are often combined. Run the layout-based classification first, because it is fast and does not need OCR. If its result does not reach the required confidence, perform OCR and run the content-based classification.
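
The combined approach can be sketched as a simple cascade. This is a minimal illustration, not TotalAgility's API: `ClassificationResult`, `classify_cascade`, the three callables, and the `0.8` threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ClassificationResult:
    doc_type: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def classify_cascade(
    image: Any,
    layout_classify: Callable[[Any], ClassificationResult],
    run_ocr: Callable[[Any], str],
    content_classify: Callable[[str], ClassificationResult],
    threshold: float = 0.8,  # hypothetical confidence threshold
) -> ClassificationResult:
    """Try the fast layout classifier first; fall back to OCR plus content."""
    layout_result = layout_classify(image)
    if layout_result.confidence >= threshold:
        # Confident enough: OCR is skipped entirely.
        return layout_result
    # Not confident: pay the OCR cost and classify on the recognized text.
    text = run_ocr(image)
    return content_classify(text)
```

With this structure, OCR runs only for documents that the layout classifier cannot identify with enough confidence, which is where the speed benefit comes from.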

Separation

Starting with TotalAgility 7.9.0, separation supports both pre-training and Online Learning. In earlier versions, Online Learning was not available for separation.

Document separation learns from multi-page sample documents. It learns to distinguish the first, middle and last pages of every document type. When executed, it first classifies every page and then creates the most likely separation of the page stream into individual documents based on the page classification results. Document separation can use both content and layout classifiers for the pages.
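
The idea of turning per-page classification results into the most likely separation can be sketched with a small dynamic program. This is an illustrative sketch only and does not describe the product's actual algorithm: the function names, the input format (one dictionary of scores per page), and the rule that a "first" page may directly follow another "first" page (a one-page document) are assumptions made for the example.

```python
import math


def allowed(prev, cur):
    """Return True if page state `cur` may directly follow page state `prev`."""
    (prev_type, prev_pos), (cur_type, cur_pos) = prev, cur
    if cur_pos == "first":
        # A new document starts after a last page or after a one-page document.
        return prev_pos in ("last", "first")
    # Middle and last pages must continue an open document of the same type.
    return cur_type == prev_type and prev_pos in ("first", "middle")


def separate(page_scores):
    """Split a page stream into documents.

    page_scores: one dict per page mapping (doc_type, position) to a score in
    0..1, where position is "first", "middle", or "last".
    Returns a list of (doc_type, [page indices]) tuples.
    """
    if not page_scores:
        return []
    states = sorted({s for scores in page_scores for s in scores})

    def logp(i, state):
        # Missing or zero scores are floored to avoid log(0).
        return math.log(max(page_scores[i].get(state, 0.0), 1e-12))

    # best[state] = (log score, labels so far); the stream must begin a document.
    best = {s: (logp(0, s), [s]) for s in states if s[1] == "first"}
    for i in range(1, len(page_scores)):
        nxt = {}
        for cur in states:
            cands = [
                (score + logp(i, cur), path + [cur])
                for prev, (score, path) in best.items()
                if allowed(prev, cur)
            ]
            if cands:
                nxt[cur] = max(cands, key=lambda c: c[0])
        best = nxt

    # The stream must end on a complete document.
    finals = [v for s, v in best.items() if s[1] in ("first", "last")]
    if not finals:
        raise ValueError("no valid separation found")
    _, path = max(finals, key=lambda c: c[0])

    documents = []
    for index, (doc_type, position) in enumerate(path):
        if position == "first":
            documents.append((doc_type, [index]))
        else:
            documents[-1][1].append(index)
    return documents
```

For example, three pages scored as a likely invoice first page, a likely invoice last page, and a likely letter first page would be split into a two-page invoice followed by a one-page letter.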

Extraction

TotalAgility supports the following locators and evaluators:

Locator or evaluator	Description	Supports pre-training?	Supports Online Learning?	Supports Specific learning?	Supports Generic learning?
Invoice Specific Locator	Used for amount and header fields.	Yes	Yes	Yes	Yes
Trainable Group Locator	Used for custom fields that can be trained (values and keywords).	Yes	Yes	Yes	Yes
Table Locator	Used for line items.	Yes	Yes	Yes	No
Text Content Locator	Used for data that is embedded in natural language text, such as phrases or objects from a sentence.	Yes	No	No	Yes
Trainable Evaluator	Similar to the Trainable Group Locator. The only difference is that the Trainable Group Locator considers all the words on a document as potential candidates, while the Trainable Evaluator considers only the words provided by another locator. The Trainable Evaluator therefore gives more control, but also requires more setup.	Yes	No	Yes	No