Machine learning and online learning in TotalAgility

This topic provides an overview of all the tools in TotalAgility that support Online Learning or Learn by Example.

Classification

TotalAgility supports the following types of classification:

  • Content-Based: Learns from example documents that contain text, either embedded text or OCR results.

  • Layout-Based: Learns to distinguish document types based on their graphical layout. It does not require OCR and is usually faster.

Both classifiers support pre-training and Online Learning.

The two classifiers are often combined: you run the layout-based classification first, because it is fast and does not need OCR. If its result is not confident enough, you then perform OCR and run the content-based classification.
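The layout-first, content-second flow can be sketched as a small cascade. This is a minimal illustration under stated assumptions, not a TotalAgility API: the functions `layout_classify`, `run_ocr` and `content_classify`, the `ClassResult` type and the 0.8 confidence threshold are all hypothetical placeholders for whatever classifiers and cut-off a project actually configures.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClassResult:
    doc_class: str
    confidence: float  # assumed to be normalized to [0, 1]


def classify_cascade(
    image: bytes,
    layout_classify: Callable[[bytes], ClassResult],  # hypothetical layout classifier
    run_ocr: Callable[[bytes], str],  # hypothetical OCR step
    content_classify: Callable[[str], ClassResult],  # hypothetical content classifier
    threshold: float = 0.8,  # hypothetical confidence cut-off
) -> ClassResult:
    """Layout-based classification first; OCR plus content-based only if needed."""
    layout_result = layout_classify(image)  # works on the image, no OCR required
    if layout_result.confidence >= threshold:
        return layout_result  # confident enough: skip OCR entirely
    text = run_ocr(image)  # the expensive step, run only for unclear documents
    return content_classify(text)


# Usage with stub classifiers; ocr_calls records which images were sent to OCR.
ocr_calls: List[bytes] = []


def fake_ocr(image: bytes) -> str:
    ocr_calls.append(image)
    return "Invoice number 4711, total due"


confident = classify_cascade(
    b"scan-1",
    layout_classify=lambda img: ClassResult("Invoice", 0.95),
    run_ocr=fake_ocr,
    content_classify=lambda text: ClassResult("Invoice", 0.90),
)
unsure = classify_cascade(
    b"scan-2",
    layout_classify=lambda img: ClassResult("Letter", 0.40),
    run_ocr=fake_ocr,
    content_classify=lambda text: ClassResult("Invoice", 0.85),
)
```

The point of the cascade is that OCR only runs for documents the layout classifier could not classify confidently; in the usage above, only `b"scan-2"` reaches the OCR stub.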

Separation

Starting with TotalAgility 7.9.0, both pre-training and Online Learning are supported. In earlier versions, Online Learning was not available.

Document separation learns from multi-page sample documents. It learns to distinguish the first, middle and last pages of every document type. When executed, it first classifies every page and then creates the most likely separation of the page stream into individual documents based on the page classification results. Document separation can use both content and layout classifiers for the pages.
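The idea of classifying every page and then choosing the most likely split can be illustrated with dynamic programming over document boundaries. This is a minimal sketch of the general technique, not TotalAgility's internal algorithm. It assumes the page classifier returns a probability for each (document type, position) label; the extra "single" position for one-page documents, the function names and the example probabilities are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical page classifier output: probability per (doc_type, position) label.
PageProbs = Dict[Tuple[str, str], float]


def _log(p: float) -> float:
    # Clamp so labels the classifier ruled out cost a lot but never raise an error.
    return math.log(max(p, 1e-9))


def segment_score(pages: List[PageProbs], start: int, end: int, doc_type: str) -> float:
    """Log-score of pages[start:end] forming one document of doc_type."""
    if end - start == 1:
        # Assumed extra "single" position for one-page documents.
        return _log(pages[start].get((doc_type, "single"), 0.0))
    score = _log(pages[start].get((doc_type, "first"), 0.0))
    for k in range(start + 1, end - 1):
        score += _log(pages[k].get((doc_type, "middle"), 0.0))
    return score + _log(pages[end - 1].get((doc_type, "last"), 0.0))


def separate(pages: List[PageProbs], doc_types: List[str]) -> List[Tuple[str, List[int]]]:
    """Most likely split of the page stream into (doc_type, page indices) documents."""
    n = len(pages)
    best = [0.0] + [-math.inf] * n  # best[j]: best log-score for separating pages[:j]
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    for end in range(1, n + 1):
        for start in range(end):
            for doc_type in doc_types:
                score = best[start] + segment_score(pages, start, end, doc_type)
                if score > best[end]:
                    best[end] = score
                    back[end] = (start, doc_type)
    # Follow back-pointers from the last page to recover the documents.
    documents = []
    end = n
    while end > 0:
        start, doc_type = back[end]
        documents.append((doc_type, list(range(start, end))))
        end = start
    return documents[::-1]


# Usage: four scanned pages, two hypothetical document types.
pages = [
    {("Invoice", "first"): 0.9, ("Letter", "first"): 0.1},
    {("Invoice", "middle"): 0.7, ("Invoice", "last"): 0.2, ("Letter", "single"): 0.1},
    {("Invoice", "last"): 0.8, ("Invoice", "middle"): 0.1, ("Letter", "single"): 0.1},
    {("Letter", "single"): 0.85, ("Invoice", "first"): 0.1, ("Invoice", "last"): 0.05},
]
result = separate(pages, ["Invoice", "Letter"])
```

Here the most likely separation is a three-page invoice followed by a one-page letter. Scoring whole segments rather than individual pages guarantees that every document starts with a "first" page and ends with a "last" page.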

Extraction

TotalAgility supports the following locators and evaluators:

| Locator or evaluator | Description | Supports pre-training? | Supports Online Learning? | Supports Specific learning? | Supports Generic learning? |
|---|---|---|---|---|---|
| Invoice Specific Locator | Used for amount and header fields. | Yes | Yes | Yes | Yes |
| Trainable Group Locator | Used for custom fields that can be trained (values and keywords). | Yes | Yes | Yes | Yes |
| Table Locator | Used for line items. | Yes | Yes | Yes | No |
| Text Content Locator | Used for data that is embedded in natural language text, such as phrases or objects from a sentence. | Yes | No | No | Yes |
| Trainable Evaluator | Similar to the Trainable Group Locator. The only difference is that the Trainable Group Locator considers all the words on a document as potential candidates, while the Trainable Evaluator only considers words provided by another locator. As a result, the Trainable Evaluator gives more control but also requires more setup. | Yes | No | Yes | No |
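The difference between the Trainable Group Locator and the Trainable Evaluator is mainly about where the candidates come from. The sketch is a toy illustration of that idea only: `pick_best`, `amount_locator` and the digit-counting scorer are hypothetical stand-ins, not how TotalAgility scores candidates internally.

```python
import re
from typing import Callable, List, Optional

# Hypothetical stand-in for a trained model: maps a candidate word to a score.
Scorer = Callable[[str], float]


def pick_best(candidates: List[str], scorer: Scorer) -> Optional[str]:
    """Return the highest-scoring candidate, or None when there are none."""
    return max(candidates, key=scorer, default=None)


def group_locator_style(words: List[str], scorer: Scorer) -> Optional[str]:
    # Trainable Group Locator idea: every word on the document is a candidate.
    return pick_best(words, scorer)


def evaluator_style(
    words: List[str],
    upstream_locator: Callable[[List[str]], List[str]],
    scorer: Scorer,
) -> Optional[str]:
    # Trainable Evaluator idea: only words proposed by another locator are scored.
    return pick_best(upstream_locator(words), scorer)


def amount_locator(words: List[str]) -> List[str]:
    # Hypothetical upstream locator: keeps words shaped like "123.45".
    return [w for w in words if re.fullmatch(r"\d+\.\d{2}", w)]


def digit_count(word: str) -> float:
    # Toy scorer that simply prefers words with many digits.
    return sum(ch.isdigit() for ch in word)


words = ["Invoice", "4711", "Date", "2023-05-01", "Total", "99.50"]
from_all_words = group_locator_style(words, digit_count)  # picks "2023-05-01"
from_upstream = evaluator_style(words, amount_locator, digit_count)  # picks "99.50"
```

Restricting the candidates to what an upstream locator proposes steers the result toward the intended field, which is the extra control the Trainable Evaluator offers; the cost is that the upstream locator has to be configured first.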