General tab - Database Locator Properties window

Use this tab to select the databases for a Database Locator and to set its maximum number of alternatives, its minimum confidence threshold, and the penalty for empty fields in database records.

Database

This group has the following options:

Select existing database for locator

This option is mandatory for a Database Locator to function. Select a database that is compared against the recognition data during extraction. If an extracted value matches a database entry, then that value is given a higher confidence. This minimizes the effect of recognition errors in extracted data. The value for this option is set to <none> by default.

Select a database with records that should not be found by this locator (e.g. own addresses)

If necessary, select a database whose records must not be returned by this locator. If a value from one of these records is found on the document, it is removed from the list of alternatives and excluded from the final extraction result for this locator. This type of database is commonly referred to as an exclusion database.

The value for this option is set to <none> by default.

Note The "Locator Algorithm Properties" on this window do not apply to this exclusion database. Instead, there is a pre-defined 80% minimum confidence threshold that cannot be modified. All alternatives in an exclusion database search that have at least an 80% confidence are excluded from the main database search. You can adjust how the confidence is calculated for this database on its individual database properties window accessible from the Project Settings - Databases tab.

Database Settings

Click this button to open the Project Settings - Databases tab.

Locator Algorithm Properties

This group has the following options:

Max. alternatives

Type a number to limit how many alternatives this locator returns. If the distance option is used, you get the best results when at least two alternatives are available for comparison. The value for this option is set to 10 by default.

Note When using an Associative Search Database, it may be beneficial to increase the value of this option. For example, if your database contains more than 100,000 records, duplicate records, or records with similar content, increasing the Max. alternatives value to a higher number, such as 50, can improve overall extraction results.

However, increasing the maximum number of alternatives can have negative side effects. Extraction may become slower, which increases the overall time needed to process a document. Run tests and benchmarks to determine the value for this option that gives the best overall extraction results.

Min. confidence

Type a number or use the slider to specify the minimum confidence required for a match to be used as an alternative. Only matches with a confidence greater than this threshold are returned. The value for this option is set to 40 by default.
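The interaction between Min. confidence and Max. alternatives can be illustrated with a minimal Python sketch. It is not the product's implementation: the function name select_alternatives, the (value, confidence) tuples, the 0 to 100 confidence scale, and the order of the two steps (filter first, then truncate) are all assumptions made for illustration.

```python
def select_alternatives(matches, min_confidence=40, max_alternatives=10):
    """Illustrative only: filter and limit candidate matches.

    matches is a list of (value, confidence) tuples, with confidence
    assumed to be on a 0-100 scale.
    """
    # Keep only matches whose confidence is strictly greater than the threshold.
    kept = [m for m in matches if m[1] > min_confidence]
    # Best matches first.
    kept.sort(key=lambda m: m[1], reverse=True)
    # Return at most max_alternatives entries.
    return kept[:max_alternatives]


matches = [("A", 95), ("B", 40), ("C", 41), ("D", 12)]
best = select_alternatives(matches)
single = select_alternatives(matches, max_alternatives=1)
```

With the default settings, "B" is dropped in this sketch because its confidence equals the threshold rather than exceeding it.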

Penalty for empty fields

Type a value or use the slider to specify the penalty for empty fields in the database. The maximum penalty is the percentage of empty fields in a record. The Penalty for empty fields value determines how much of the maximum penalty is applied. For example, a database contains records with ten fields. If one record contains two empty fields, the maximum penalty is 20%. If the Penalty for empty fields value is 100, the actual penalty applied is 20%. If, however, the Penalty for empty fields value is 50, the actual penalty applied is 10%. The value for this option is set to 50 by default.
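The arithmetic in this example can be checked with a short Python sketch. The function name empty_field_penalty and the representation of a record as a list of strings are hypothetical. The sketch computes only the penalty percentage described above; how the product applies that penalty to a confidence value is not covered here.

```python
def empty_field_penalty(record, penalty_setting=50):
    """Illustrative only: return the penalty, in percent, for one record.

    record is a list of field values; None or blank strings count as empty.
    penalty_setting is the Penalty for empty fields value (0-100).
    """
    if not record:
        return 0.0
    empty = sum(1 for value in record if value is None or str(value).strip() == "")
    # Maximum penalty: percentage of empty fields in the record.
    max_penalty = 100.0 * empty / len(record)
    # The setting scales how much of the maximum penalty is applied.
    return max_penalty * penalty_setting / 100.0


# Ten fields, two of them empty, as in the example above.
record = ["Acme", "Main St", "12", "", "Springfield", "12345", "US", None, "x", "y"]
full = empty_field_penalty(record, penalty_setting=100)
half = empty_field_penalty(record, penalty_setting=50)
```

Here full evaluates to 20.0 and half to 10.0, matching the percentages in the example.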

Optimization

Choose one of the following optimization values:

  • Improve speed. This is the default value for this option.

    Select this value to accelerate the database matching operation. Combined with the Min. confidence setting, this selection can affect the overall accuracy of the results.

  • Improve accuracy.

    Select this value to focus on precision when matching the document with the database records.

Note This option is not available if you are using a local fuzzy database that is configured for basic database processing.

Maximum distance to concatenate numbers (mm)

Specify the maximum distance, in millimeters, at which neighboring numbers are still treated as parts of the same number. Numbers that are farther apart than this distance are treated as separate numbers. For example, adjust this value so that the spaces in a phone number do not split it into several separate numbers. The value for this option is set to 5 mm by default.
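The effect of this setting can be sketched in Python under several assumptions: the function name concatenate_numbers is hypothetical, each number is given as (text, left_mm, right_mm) on a single text line sorted from left to right, the distance is measured from the right edge of one number to the left edge of the next, and joined parts are concatenated without a separator. The product's actual measurement and output format may differ.

```python
def concatenate_numbers(tokens, max_distance_mm=5.0):
    """Illustrative only: merge neighboring numbers that are close together.

    tokens is a list of (text, left_mm, right_mm) tuples on one text line,
    sorted from left to right.
    """
    groups = []
    for text, left, right in tokens:
        # Gap between the previous group's right edge and this token's left edge.
        if groups and left - groups[-1][2] <= max_distance_mm:
            prev_text, prev_left, _ = groups[-1]
            # Within the distance: extend the previous group.
            groups[-1] = (prev_text + text, prev_left, right)
        else:
            # Too far away (or first token): start a new number.
            groups.append((text, left, right))
    return [group[0] for group in groups]


tokens = [("555", 10.0, 18.0), ("0100", 20.0, 30.0), ("42", 50.0, 55.0)]
merged = concatenate_numbers(tokens)
strict = concatenate_numbers(tokens, max_distance_mm=1.0)
```

In this sketch the first two parts are 2 mm apart and are joined at the default of 5 mm, while the third part is 20 mm away and stays separate. With a distance of 1 mm, all three remain separate numbers.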

Definitions for the buttons at the bottom of this window can be found in Common Transformation Designer Buttons.