Extraction Set

One way to improve your extraction results is to provide training documents for extraction.

The Extraction Set is the document set that stores all of the documents used for extraction training. Like the Classification Set, the Extraction Set is created automatically when a project is created, ready for you to add documents as necessary. This document set is available in List View and Thumbnail View only.

The Extraction Set contains documents that are ideal examples with good extraction results for the classes defined in your project.

For those projects created in an earlier version of the Project Builder that does not use document sets, the existing Extraction Set is converted to the new document set format. This means that you do not lose any of your old training documents, and you can add additional documents to your new Extraction Set at any time.

The Extraction Set that is created automatically cannot be deleted from your project unless you first designate another document set as the Extraction Set. This means that if you have an existing set of documents organized in a meaningful directory, you can add it as a new document set and set it to be the Extraction Set. The original Extraction Set can then be deleted if desired.

Note: For the best results, ensure that the documents used in any document set are located on an NTFS file system. A FAT32 file system restricts the number of files allowed, and exceeding this number could result in the loss of documents and data. For more information, see http://technet.microsoft.com/en-us/library/cc776720%28v=WS.10%29.aspx.
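
If you are unsure whether an existing document directory is at risk, a rough check along the following lines may help. This is only a sketch and is not part of Project Builder: the function names are hypothetical, and the default limit of 65,534 entries per folder is an assumption based on commonly published FAT32 limits (long file names reduce the usable number further), so adjust it to match your environment.

```python
import os

# Assumption: FAT32 allows at most 65,534 files and subfolders per folder.
# Long file names use extra entries, so the practical limit can be lower.
ASSUMED_FAT32_MAX_ENTRIES = 65_534


def count_entries_per_directory(root):
    """Return {directory path: number of files and subfolders directly in it}."""
    counts = {}
    for dirpath, dirnames, filenames in os.walk(root):
        counts[dirpath] = len(dirnames) + len(filenames)
    return counts


def directories_near_limit(root, limit=ASSUMED_FAT32_MAX_ENTRIES, margin=0.8):
    """List directories whose entry count is at least margin * limit."""
    threshold = int(limit * margin)
    return sorted(
        path
        for path, entries in count_entries_per_directory(root).items()
        if entries >= threshold
    )
```

Calling directories_near_limit on the top-level folder of a document set returns the folders that hold at least 80 percent of the assumed limit. An empty list suggests that no folder is close to it, although storing the documents on NTFS remains the safer option.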

You can manage your Extraction Set in the following ways: