Train the content classifiers

To ensure that your content classification settings work in production, train them against a set of representative documents whose content is similar to the documents you process. These training documents are used to train all content classifiers, including the top-level content classifier and subtree classifiers.

Similarly, if you change your content classification settings, you must train your project again. For example, if the training behavior of your project is configured to "Train the first page only," only the first page of each document is trained when you train your project. If you later change this behavior to "Train all pages," you must train your project again so that the classifiers have the recognition details for all pages.

Important: After changing the properties of your classifiers, or after adding documents to or deleting documents from your training set, you must retrain your project.
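The retraining rule above can be pictured as a staleness check: training is out of date whenever the classifier settings or the membership of the training set differ from what they were at the last training run. The following is a minimal conceptual sketch in Python, not part of Transformation Designer or any product API. The function names, the settings dictionary, and the file names are all hypothetical, and the check compares document names only, not document content.

```python
import hashlib
import json
from typing import Iterable


def training_fingerprint(settings: dict, training_docs: Iterable[str]) -> str:
    """Hash the classifier settings together with the training set membership.

    Hypothetical helper: document order is ignored, and only names are compared.
    """
    payload = json.dumps(
        {"settings": settings, "docs": sorted(training_docs)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_retraining(last_trained: str, settings: dict, training_docs: Iterable[str]) -> bool:
    """Return True if settings or training documents changed since the last run."""
    return training_fingerprint(settings, training_docs) != last_trained


# Hypothetical project state at the time of the last training run.
settings = {"training_behavior": "Train the first page only"}
docs = ["invoice_001.tif", "invoice_002.tif"]
trained = training_fingerprint(settings, docs)

print(needs_retraining(trained, settings, docs))  # nothing changed

# Changing a classifier property makes the earlier training stale.
settings["training_behavior"] = "Train all pages"
print(needs_retraining(trained, settings, docs))
```

In this sketch, changing the training behavior, adding a document, or deleting a document each produce a different fingerprint, which corresponds to the cases in which the project must be retrained.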

You can train your project for content classification by following these steps:

  1. In the main Transformation Designer window, hide or close any windows that block the Ribbon.
  2. Open the Project Tree window if it is not already open.
  3. Expand the Project Tree and select the class.
  4. Optionally, view the class contents if they are not already displayed.

    The hidden class contents are displayed.

  5. Ensure that content classification training is enabled for the selected class.

    If this option is cleared, this class is not trained for content classification, even if you add training documents.

  6. Open the Documents window if it is not already open.
  7. Optionally, add training documents to one of the document subsets in the Classification Set.
  8. On the Process tab, in the Train group, click Separation & Classification.

    The classifiers are trained on the documents in your Classification Set. You can now test your classification settings.

  9. Save the changes to your project.