Dealing with large training sets

A full copy of the project, including the entire classification/extraction group or shared project, is stored in the TotalAgility database every time you save or release the project. If the project contains a large training set, the project can become very large. To keep the training set as small as possible, consider the following options:

  1. Use bitonal images. Full-color images are typically 10 to 100 times larger than bitonal images. If you train the project in Transformation Designer, use bitonal images from the start. If you process documents at runtime, use an image processing activity together with the rendition feature in TotalAgility Designer. This lets you convert documents to bitonal images for faster processing (which also keeps the training set small for Online Learning) while still exporting the original color images to the system of record, because the originals are kept in the rendition.

  2. Delete documents that are not needed. After you train the project in Transformation Designer, each document in the training set is flagged as Needed or Not Needed. You can delete the documents flagged as Not Needed without affecting extraction accuracy.

  3. If the training set is still very large after you use bitonal images and delete documents that are not needed, you can store it separately from the project, outside the TotalAgility database. Such sets are called “External Training Sets.” By default, a training set in Transformation Designer is part of the project: while the project is open in Transformation Designer, the training set is stored in the same temp folder as the rest of the project, and when Transformation Designer is closed, all content of the temp folder is stored in the TotalAgility database. To externalize the training set, do the following:

    1. Select a folder on a local disk or network share where you want to store the training set.

    2. Open that folder in Transformation Designer as a Test Set.

    3. Right-click the Test Set and select Use as Extraction Training Set (or Use as Classification Training Set).

      The extraction training set (or classification training set) now points to your selected folder instead of the project's temp folder.

      When you save the project and close Transformation Designer, the contents of this folder are not stored in the TotalAgility database.

      • Back up the external training set folder regularly, because it is not stored in TotalAgility with the rest of the project.

      • Open the project in Transformation Designer only when you have access to the external training set, whether it is on your local computer or in a shared network folder.
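The size difference behind option 1 comes largely from bits per pixel: before compression, a 24-bit color image stores 24 times as many bits per pixel as a 1-bit bitonal image. The following Python sketch illustrates that arithmetic and a simple fixed-threshold binarization. It only illustrates the general technique and is not the algorithm used by the TotalAgility image processing activity; the function names, the threshold of 128, and the "1 = black" bit convention are assumptions.

```python
def raw_image_size(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes, with each row padded to a whole byte."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height


def to_bitonal(gray: list[list[int]], threshold: int = 128) -> list[bytes]:
    """Binarize 8-bit grayscale rows with a fixed threshold (illustrative only).

    Assumed convention: pixels darker than `threshold` become 1 (black),
    all others 0 (white). Each output row packs 8 pixels per byte,
    most significant bit first, padded with 0 bits at the end.
    """
    packed_rows = []
    for row in gray:
        packed = bytearray((len(row) + 7) // 8)
        for x, value in enumerate(row):
            if value < threshold:
                packed[x // 8] |= 0x80 >> (x % 8)
        packed_rows.append(bytes(packed))
    return packed_rows
```

For example, an A4 page scanned at 300 dpi (2480 × 3508 pixels) takes `raw_image_size(2480, 3508, 24)` = 26,099,520 bytes uncompressed in 24-bit color, but `raw_image_size(2480, 3508, 1)` = 1,087,480 bytes as a bitonal image. Actual file sizes also depend on the compression used, so real-world ratios vary.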
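Because an external training set is not stored in TotalAgility, you are responsible for backing it up. As one possible approach, the following Python sketch copies the training set folder to a timestamped folder under a backup location. The function name and any paths you pass to it are hypothetical; adapt them to your environment, or use your organization's existing backup tooling instead.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_training_set(training_set_dir: str, backup_root: str) -> Path:
    """Copy an external training set folder to a timestamped backup folder.

    Hypothetical helper: both arguments are example paths that you
    replace with your own training set folder and backup location.
    """
    source = Path(training_set_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"Training set folder not found: {source}")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{source.name}-{stamp}"
    # copytree creates missing parent folders and raises FileExistsError
    # if the target already exists, so an existing backup is never overwritten.
    shutil.copytree(source, target)
    return target
```

For example, `backup_training_set(r"\\fileserver\training\ExtractionSet", r"D:\Backups")` would create a folder such as `D:\Backups\ExtractionSet-20240101-120000` (both paths are illustrative).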
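Before you open the project in Transformation Designer, you can confirm that the external training set folder is reachable. The following Python sketch is a hypothetical pre-flight check, not part of TotalAgility: it returns `True` only if the folder exists and its listing can be read, and it does not validate what the folder contains.

```python
import os


def training_set_accessible(folder: str) -> bool:
    """Return True if the folder exists and its directory listing can be read.

    Hypothetical check for an external training set on a local disk or
    network share; missing paths, non-folders, and permission errors
    all raise OSError subclasses and yield False.
    """
    try:
        with os.scandir(folder) as entries:
            next(entries, None)  # force at least one read of the directory
        return True
    except OSError:
        return False
```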