Text Content Locator

This locator method finds data in unstructured documents that have no consistent layout. This enables you to extract data from contracts, correspondence, or even essays and manuscripts. This locator works best for semi-structured documents, and is designed for documents that contain unstructured text made up of sentences. You can extract data from unstructured documents with moderate success, and the results should improve as you add more training documents.

You can use this locator to extract data from machine-printed forms with some success. However, other locator methods such as the Advanced Zone Locator or the Format Locator are more successful in extracting data from forms.

This locator needs training documents to learn how to find the necessary data. Unlike other locators, you configure this locator by adding documents to your Extraction Set, lassoing the needed content, and training your project.

In general, the more training documents you have, the better the results. However, as you increase the number of training documents, the time to train your project also increases. Test your extraction results regularly after training your project to confirm that the documents you add actually improve the results.
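The trade-off above can be tracked with a simple log kept outside the product. The following is a minimal sketch, not a product feature: it assumes you record by hand, after each test run, how many fields were extracted correctly. The names TrainingRound and rounds_without_gain, the 1% threshold, and the sample figures are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TrainingRound:
    """One train-and-test cycle, recorded by hand after each test run."""
    training_docs: int    # documents in the Extraction Set for this round
    correct_fields: int   # fields extracted correctly in the test run
    total_fields: int     # fields expected across the test documents

    @property
    def accuracy(self) -> float:
        return self.correct_fields / self.total_fields


def rounds_without_gain(rounds, min_gain=0.01):
    """Return the rounds whose accuracy rose by less than min_gain
    compared with the previous round (min_gain is an assumed threshold)."""
    flagged = []
    for previous, current in zip(rounds, rounds[1:]):
        if current.accuracy - previous.accuracy < min_gain:
            flagged.append(current)
    return flagged


# Illustrative figures only, not measured results.
history = [
    TrainingRound(training_docs=20, correct_fields=60, total_fields=100),
    TrainingRound(training_docs=40, correct_fields=75, total_fields=100),
    TrainingRound(training_docs=60, correct_fields=75, total_fields=100),
]

for r in rounds_without_gain(history):
    print(f"{r.training_docs} training documents: "
          f"accuracy {r.accuracy:.0%}, no meaningful gain")
```

A round that appears in this list added training time without improving extraction, which suggests reviewing the documents added in that round before adding more.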

Manage the Text Content Locator as follows:

  • Add Text Content Locator subfields

  • Map Text Content Locator subfields

  • Rename Text Content Locator subfields

  • Delete Text Content Locator subfields

  • Configure the Text Content Locator

  • Train the Text Content Locator

Important: This locator method supports content that spans multiple lines. However, the Edit Document window does not support lassoing content that spans multiple lines. To train a document with content on multiple lines, you can add the data for such a field using Test Validation prior to adding the document to the Extraction Set.

Open the classified test document in Test Validation. Hold the Ctrl key and select, one after the other, the values in the Document Viewer that you want to assign to the currently selected field. Note that the complete area is highlighted, but only the selected words are added to the field. After you close Test Validation, save the document and add it to the Extraction Set.

The Text Content Locator Properties window contains the following tabs: