Classification online learning

Use classification online learning to ensure that documents with unsatisfactory classification results are collected and used to improve future batches.

Important: Classification online learning is configurable only if your project has a Classification Group.

Documents that are modified during production are collected automatically. How the collected documents are used, however, depends on how your project is configured.

The documents are used in one of the following ways:

  1. Dynamic classification online learning. This is the default type when classification online learning is enabled. It uses dynamic classifiers that collect training documents whenever an issue with classification arises. At regular intervals, the Transformation Server takes the accumulated documents in the training folder, imports them, and trains your project. Subsequent documents benefit from these added training documents, and classification improves over time.

    This is the recommended type of classification online learning to get the best results for your project.

  2. Basic classification online learning. This type also collects training documents automatically, but it copies them into the New Samples document set. The project administrator can import the training documents, but they take effect only when your project is manually trained and then republished. For the best results, train your project at regular intervals.
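The practical difference between the two types is when collected documents start to influence classification. The following Python sketch is a conceptual model only; it is not the product's API. The class name `ToyProject`, its methods, and the file name are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProject:
    """Conceptual stand-in for a project; not a product API."""
    mode: str  # "dynamic" or "basic" (labels chosen for this sketch)
    collected: list = field(default_factory=list)     # documents waiting to be used
    training_set: list = field(default_factory=list)  # documents already trained on

    def collect(self, document: str) -> None:
        # Both types collect corrected documents automatically.
        self.collected.append(document)

    def on_server_interval(self) -> None:
        # Dynamic: the server imports and trains at regular intervals.
        # Basic: nothing happens until an administrator acts.
        if self.mode == "dynamic":
            self._import_and_train()

    def manual_train_and_republish(self) -> None:
        # Basic: the administrator imports, trains, and republishes.
        self._import_and_train()

    def _import_and_train(self) -> None:
        self.training_set.extend(self.collected)
        self.collected.clear()


dynamic = ToyProject(mode="dynamic")
basic = ToyProject(mode="basic")
for project in (dynamic, basic):
    project.collect("invoice_0001.tif")  # hypothetical file name
    project.on_server_interval()

print(len(dynamic.training_set), len(basic.training_set))  # 1 0
basic.manual_train_and_republish()
print(len(basic.training_set))  # 1
```

In this model, the dynamic project has learned from the collected document after one server interval, while the basic project has not learned from it until the manual train-and-republish step.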

When an unknown document is encountered during production and a user assigns it to a class, that document is collected as a training document. The next time a document of that class is processed, it is recognized more reliably. For each class, documents are collected until a few have been gathered or until the classification results are confident.
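One possible reading of this collection rule can be sketched in Python. The cap of three documents per class, the 0.80 confidence threshold, and the function name `should_collect` are all assumptions made for this illustration; the actual limits and internal logic are not stated here.

```python
from collections import Counter

MAX_SAMPLES_PER_CLASS = 3    # assumed value for "a few"
CONFIDENCE_THRESHOLD = 0.80  # assumed value for "confident"


def should_collect(class_name: str, confidence: float, collected_counts: Counter) -> bool:
    """Return True if the document would be kept as a training sample (sketch only)."""
    if confidence >= CONFIDENCE_THRESHOLD:
        # Classified confidently, so no user correction is needed.
        return False
    # Keep collecting until the per-class cap is reached.
    return collected_counts[class_name] < MAX_SAMPLES_PER_CLASS


counts = Counter()
# (class assigned by the user, classifier confidence before correction)
stream = [
    ("Invoice", 0.10), ("Invoice", 0.35), ("Invoice", 0.55),
    ("Invoice", 0.60), ("Invoice", 0.92), ("Contract", 0.20),
]
for class_name, confidence in stream:
    if should_collect(class_name, confidence, counts):
        counts[class_name] += 1

print(dict(counts))  # {'Invoice': 3, 'Contract': 1}
```

The fourth invoice is skipped because the assumed cap is already reached, and the fifth is skipped because it was classified confidently.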

Although it is best practice to add at least one training document for each class during configuration, this is no longer mandatory for successful classification. With classification online learning, you can put a brand-new project with several classes and no classification training documents into production. Each time a document is not recognized and needs to be reclassified, it is collected as a training document. Even though the project started out with no training data, its classification results improve over time as documents are collected.

Alternatively, if you have a project that has been in production for some time and you have added a new class, republish the project and rely on classification online learning to collect documents and improve classification for the new class.

Important: If you are using document separation, you cannot use dynamic classifiers. In that case, classification online learning only collects documents automatically; it does not train the project. To improve classification, import the collected training documents, and then retrain and publish your project.
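The constraints described in this topic can be summarized as a small decision function. This is an illustrative Python sketch, not a product setting or API: the function name, parameter names, and returned labels are hypothetical. It returns the default type where one exists.

```python
def available_learning_type(has_classification_group: bool,
                            online_learning_enabled: bool,
                            uses_document_separation: bool) -> str:
    """Return which type of classification online learning applies (sketch only)."""
    if not has_classification_group or not online_learning_enabled:
        # Online learning is configurable only with a Classification Group.
        return "none"
    if uses_document_separation:
        # Dynamic classifiers cannot be used with document separation,
        # so documents are collected but training stays manual.
        return "basic"
    # Dynamic is the default type when online learning is enabled.
    return "dynamic"


print(available_learning_type(True, True, False))   # dynamic
print(available_learning_type(True, True, True))    # basic
print(available_learning_type(False, True, False))  # none
```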