Trainable Document Separation

This type of separation is disabled by default for all new projects. When it is disabled, Transformation Server separates documents based on how each individual page is classified. Configure separation settings on the Document Separation tab of the Project Settings window.

Use this type of document separation when the content and layout of the first pages of your documents vary within a class. This separation type is also useful when the length of a document is not consistent because you can train first pages, middle pages and last pages for each classification result.

If you do not want to use document separation online learning, you can still train your project for separation. Use a large training set that contains typical examples of each classification type, add these documents to the Classification Set, and confirm them before training your project.

If you use document separation online learning, each time a user manually separates or reclassifies a document, that document is collected and used to improve the separation of subsequent documents with the same layout. For the best document separation results, use a combination of manually trained documents and online learning. Adding training documents for known document types helps ensure that they are successfully separated and classified, while online learning trains new document types as they are encountered in production.

For example, suppose you configure a project to recognize correspondence. Most correspondence varies in length, content, and layout. Trainable document separation works well for this type of document because Transformation Server can recognize the typical first page, middle page, and last page content found in correspondence.
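
The idea of turning first page, middle page, and last page results into document boundaries can be illustrated with a short sketch. This is a conceptual illustration only, not Transformation Server's implementation or API: the function name split_documents, the (class name, position) tuple format, and the boundary rules are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical per-page result: (class_name, position),
# where position is "first", "middle", or "last".
Page = Tuple[str, str]


def split_documents(pages: List[Page]) -> List[List[Page]]:
    """Group a stream of classified pages into documents.

    Assumed rules for this sketch: a new document starts on a "first"
    page, after a "last" page, or when the class name changes.
    """
    documents: List[List[Page]] = []
    for page in pages:
        class_name, position = page
        starts_new = (
            not documents                            # very first page in the batch
            or position == "first"                   # page looks like a first page
            or documents[-1][-1][1] == "last"        # previous page closed its document
            or documents[-1][-1][0] != class_name    # class changed between pages
        )
        if starts_new:
            documents.append([page])
        else:
            documents[-1].append(page)
    return documents


# A batch with a three-page letter, a two-page letter, and a one-page invoice.
pages = [
    ("Letter", "first"), ("Letter", "middle"), ("Letter", "last"),
    ("Letter", "first"), ("Letter", "last"),
    ("Invoice", "first"),
]
lengths = [len(doc) for doc in split_documents(pages)]
```

In this sketch the two letters are kept apart even though they share a class, because the position result marks where one letter ends and the next begins. Page-level classification alone could not make that distinction.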

After changing the properties of your classifiers, or after adding or deleting documents from your training set, you must retrain your project.