Separation and classification online learning

Use document separation and classification online learning to ensure that documents with unsatisfactory document separation or classification results are collected and used to improve future batches.

Document separation and classification online learning is configurable only if your project has a Classification Group and document separation is enabled.

Documents that are modified during production are collected automatically. How the collected documents are used, however, depends on how your project is configured.

The documents are used in one of the following ways:

  1. Dynamic online learning for separation and classification. This is the default type when classification online learning is enabled. Its dynamic classifiers collect training documents whenever an issue with classification arises. At regular intervals, the Transformation Server takes the accumulated documents in the training folder, imports them, and trains your project. Subsequent documents benefit from these added training documents, and classification improves over time.

    This is the recommended type of classification online learning to get the best results for your project.

  2. Basic document separation and classification online learning also collects training documents automatically. These documents are copied into the New Samples document set. The project administrator can import the training documents, but they are applied only when your project is manually trained and then republished. For the best results, train your project at regular intervals.
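The difference between the two types can be pictured with a small conceptual model. This is only a sketch and not a product API: the `Project` class, its fields, and its method names are hypothetical, chosen to mirror the description above. Both types collect documents automatically, but only the dynamic type applies them without manual training and republishing.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical stand-in for a project; not part of any product API."""

    mode: str  # "dynamic" or "basic"
    training_folder: list = field(default_factory=list)  # dynamic: accumulated documents
    new_samples: list = field(default_factory=list)      # basic: New Samples document set
    trained_on: list = field(default_factory=list)       # documents already used for training

    def collect(self, doc: str) -> None:
        # Both types collect automatically; only the destination differs.
        if self.mode == "dynamic":
            self.training_folder.append(doc)
        else:
            self.new_samples.append(doc)

    def scheduled_training(self) -> None:
        # Dynamic type only: accumulated documents are imported and
        # the project is trained at regular intervals.
        if self.mode == "dynamic":
            self.trained_on.extend(self.training_folder)
            self.training_folder.clear()

    def manual_train_and_republish(self) -> None:
        # Basic type: collected documents take effect only after an
        # administrator imports them, trains, and republishes.
        self.trained_on.extend(self.new_samples)
        self.new_samples.clear()
```

In this model, a document collected by a basic project stays in `new_samples` until `manual_train_and_republish()` is called, whereas a dynamic project picks it up at the next `scheduled_training()`.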

When an unknown document is encountered during production and a user separates it manually or assigns it to a class, that document is collected as a training document. The next time a document of that class is processed, it is recognized more reliably. Only a few documents are collected for each class, and collection stops once the separation and classification results are confident.
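One reading of this collection rule can be sketched as follows. The sketch is illustrative only: the function name, the `MAX_SAMPLES_PER_CLASS` value, and the `class_is_confident` flag are assumptions made for the example, since the actual limits and confidence criteria are internal to the product.

```python
from collections import defaultdict

# Assumed value for "a few"; the real limit is not stated here.
MAX_SAMPLES_PER_CLASS = 3

# Hypothetical store of collected training documents, keyed by class name.
collected = defaultdict(list)


def process(doc_id, doc_class, corrected_by_user, class_is_confident):
    """Collect doc_id as a training document if the rule applies.

    Returns True if the document was collected, False otherwise.
    """
    if not corrected_by_user:
        # Only manually separated or reclassified documents are collected.
        return False
    if class_is_confident:
        # Results for this class are already confident; stop collecting.
        return False
    if len(collected[doc_class]) >= MAX_SAMPLES_PER_CLASS:
        # Enough samples have been collected for this class.
        return False
    collected[doc_class].append(doc_id)
    return True
```

Under these assumptions, five corrected documents of the same class result in only three collected samples, and documents that a user did not correct are never collected.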

Although it is best practice to add at least one training document for each class during configuration, this is no longer required for successful classification results, because you can rely on separation and classification online learning. For example, you can put a brand-new project with several classes and no classification training documents into production. Each time a document is not recognized and is separated manually, reclassified, or both, a training document is collected. Even though the project started with no training data, its separation and classification results improve over time as online learning collects documents.

Alternatively, if you have a project that has been in production for some time and you have added a new class, republish the project and rely on separation and classification online learning to collect documents that improve separation and classification for the new class.