General tab - Properties of Format Locator window

Use this tab to determine how the format locator is used in your project.

Advanced

This group has the following settings:

Use results from locator

Enable this setting to use the results from another locator. (Default: Cleared)

To use this setting, your project must contain two or more locators, and the input locator used by this Format Locator must appear above it in the project hierarchy. Locators are processed sequentially, so the results from the input locator must be available before this locator is processed. Only the following subset of simple locator and evaluator methods is supported as input for a Format Locator:

  • Bar Code Locator
  • Classification Locator
  • Format Locator
  • Named Entity Locator
  • Relation Evaluator
  • Script Locator
  • Sentiment Locator
  • Standard Evaluator
  • Summary Locator

If this setting is enabled, the Format Definitions tab is disabled.

Settings for Regular Expressions

This group has the following settings:

Use OCR substitution

Enable this setting if you want to correct commonly misread letters or numbers. (Default: Cleared)

This setting does not work when a dictionary is inserted in the same format locator.

Define OCR Substitution

Click this button to open the OCR Substitution window and define OCR substitutions.

Settings for Non-Regular Expressions

This group has the following settings:

Maximum search tolerance

This setting lets a search tolerate extra characters before or after the search string in the text being compared. (Default: 30%)

You can use the following equation to calculate the desired search tolerance:

Search Tolerance = 1.00 - Number of Characters (Search String) / Number of Characters (Comparison String)

A value of 0% requires an exact match to the search string and a value of 100% will match everything.

If you are searching a document for a word such as "name", you can adjust the search tolerance so that words containing "name" are also returned. To find words such as "names" or "named", the tolerance must allow one extra character, which requires a Maximum search tolerance of 20%. To locate words such as "surname", "jobname", and "unnamed", a value of 43% is required. A value of 50% is required to find words such as "forename" and "username".
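The percentages in this example follow directly from the equation above. The following Python sketch reproduces them, assuming that string length is a plain character count. The function name required_tolerance is hypothetical and is not part of the product.

```python
def required_tolerance(search: str, comparison: str) -> float:
    """Return the tolerance needed for `comparison` to match `search`.

    Implements: 1.00 - len(search string) / len(comparison string).
    """
    if not comparison:
        raise ValueError("comparison string must not be empty")
    return 1.0 - len(search) / len(comparison)


# Print the tolerance needed to find each word when searching for "name".
for word in ("name", "names", "surname", "forename"):
    print(f"{word}: {required_tolerance('name', word):.0%}")
# name: 0%
# names: 20%
# surname: 43%
# forename: 50%
```

Set Maximum search tolerance to at least the largest value you calculate for the words you want to find.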

Maximum word count

This setting allows you to extract strings that consist of multiple words. (Default: 6)

For example, a phone number whose parts are separated by hyphens (-), such as 123-456-7890, consists of three words. To ensure that these types of matches are found, the value for this setting must be three or greater.
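To estimate the value a given string needs, you can count its words. The following Python sketch assumes that whitespace and hyphens separate words, which is consistent with the phone number example but may not cover every separator the product recognizes. The function name count_words is hypothetical.

```python
import re


def count_words(text: str) -> int:
    """Count words, assuming whitespace and hyphens act as separators."""
    # Assumption: the product may split on other characters as well.
    return len([part for part in re.split(r"[\s-]+", text) if part])


words_needed = count_words("123-456-7890")
print(words_needed)  # 3
```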

Maximum gap between words

This specifies the maximum distance in mm that permits word concatenation during a search. Words that fall outside this measurement are treated as separate words and not included in the current search. (Default: 5 mm)
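The effect of this setting can be pictured as grouping neighboring words by the horizontal gap between them. The following Python sketch is an illustration only: the (text, left, right) word layout, the function name concatenate_words, and the treatment of a gap exactly equal to the limit are assumptions, not the product's actual implementation.

```python
from typing import List, Tuple

# Hypothetical word record: (text, left edge in mm, right edge in mm).
Word = Tuple[str, float, float]


def concatenate_words(words: List[Word], max_gap_mm: float = 5.0) -> List[str]:
    """Join consecutive words whose gap does not exceed max_gap_mm."""
    groups: List[str] = []
    current: List[str] = []
    prev_right = None
    for text, left, right in words:
        # A gap larger than the limit closes the current group.
        if prev_right is not None and left - prev_right > max_gap_mm:
            groups.append(" ".join(current))
            current = []
        current.append(text)
        prev_right = right
    if current:
        groups.append(" ".join(current))
    return groups


# Gaps on this line are 2 mm and 12 mm, so only the first two words join.
line = [("123", 10.0, 18.0), ("456", 20.0, 28.0), ("7890", 40.0, 50.0)]
result = concatenate_words(line)
print(result)  # ['123 456', '7890']
```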

Maximum alternative length

This specifies the maximum length allowed for an alternative. An alternative that exceeds this length is rejected. (Default: 100 mm)

Definitions for the buttons at the bottom of this window can be found in Common Transformation Designer Buttons.