Configure the Text Content Locator

The Text Content Locator enables you to extract data from unstructured documents. Because extraction does not rely on the document layout, training documents must show where the expected data is located. Depending on your configuration, you can continuously add new training documents to improve your overall results for this locator.

You can configure this locator by following these steps:

  1. Open the locator properties.
  2. Add one or more subfields.

    A list of subfields appears in the Define Subfields table.

  3. Optionally, rename one or more subfields.
  4. For each subfield you add to this locator, add a field to your class.
  5. Map each class field to its corresponding subfield in your Text Content Locator.
  6. In the Text Content Locator Properties window, click the Advanced tab.
  7. Select or enter a value for Maximum number of training documents.

    For the best results, do not exceed 500 training documents. Using more than 500 training documents may yield little or no improvement in extraction, but it does increase the time it takes to train your project. The additional training time outweighs any small gain in extraction results.

  8. Optionally, clear the Collect online documents for training option.

    If you clear this option, documents collected during runtime are not included in online learning. To incorporate these documents into your project and improve extraction results, train your project manually. It is best to clear this option if you are close to the maximum number of training documents. When the option is selected, training picks random documents from the training set, up to your maximum number, and training 500 documents can take around 30 minutes.

  9. Click Close to close the locator properties window.
  10. Open the Documents window if it is not already open.
  11. Train and test your settings.
  12. Save the changes to your project.
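
The cap from step 7 and the random selection described under step 8 can be pictured with a short Python sketch. This is only an illustration of the idea, not the product's implementation: the function name `select_training_documents`, the `seed` parameter, and the document names are hypothetical, and the product may select documents differently.

```python
import random


def select_training_documents(documents, max_documents=500, seed=None):
    """Return at most max_documents documents, chosen at random.

    Hypothetical helper: it only illustrates capping a training set by
    random selection and does not call any product API.
    """
    if max_documents < 1:
        raise ValueError("max_documents must be at least 1")
    documents = list(documents)
    # A set at or below the cap is used in full.
    if len(documents) <= max_documents:
        return documents
    # Otherwise draw max_documents distinct documents at random.
    rng = random.Random(seed)
    return rng.sample(documents, max_documents)


# Assumed example: 650 collected documents and a cap of 500.
training_set = [f"doc_{i:04d}" for i in range(1, 651)]
selected = select_training_documents(training_set, max_documents=500, seed=42)
print(len(selected))
```

In this sketch, 150 of the 650 documents are left out of any given training run, which is why adding documents beyond the maximum gives little benefit.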