Local Fuzzy Database Properties window

Use this window to select a locally stored import file for the database using the following settings:

Referenced import file (text or CSV file)

Select one of the following reference file locations:

File system.

Browse to the location of the local fuzzy database file. The import starts automatically when the window is closed, and a message box displays the number of imported database lines. Importing one million lines with three fields takes about one minute.

Web.

Type the URL for your local fuzzy database file.

Click Test to ensure that the connection to the specified URL is available.

Provide a User Name and Password if authentication is required.

Column Configuration

This table has the following columns. You can rename an entry by clicking the row and then clicking the cell in that row.

Column name

The name of the database column.

Search

If selected for a field, that field is included in database searches.

Select this setting for each field whose value might be present on a document.

Filter

You can filter the data before searching to reduce the number of candidate records. If this setting is selected for a field, that field is included in the filtering index. Filtering is available only through scripting.

This setting is available only when both of the following conditions are met in the Optimization group:

Load database in memory.
The value for Database processing is set to Advanced.

Import Options

This group has the following settings:

Ignore Case

Select this setting to convert all search and lookup strings to lower case, effectively ignoring case. This setting is selected by default.

Filtering is case sensitive

Select this setting to make filtering case sensitive: the filter text entered in the script must match the case of a database entry exactly. If the case does not match exactly, no records are returned. This setting is selected by default.

This setting is available only when at least one field has the Filter setting selected in the Column Configuration group and both of the following conditions are met in the Optimization group:

Load database in memory.
The value for Database processing is set to Advanced.

First line contains caption

Select this setting if the first record of the input file contains the column headers. This setting is selected by default.

Field delimiter

Type the characters that separate the import file content into individual fields. The default value is ; (semicolon).

Tab: Select this checkbox to use a Tab as a delimiter in addition to the characters specified in the Field delimiter setting.
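The following Python sketch illustrates how a delimiter set like this one could split each line of the import file into fields, and how the First line contains caption setting separates the column headers from the records. It is a hypothetical illustration, not the product's implementation: the names `split_fields` and `parse_import_lines` are invented for this example, and the sketch does not handle quoted values.

```python
def split_fields(line: str, delimiters: str = ";", use_tab: bool = False) -> list[str]:
    """Split one import-file line into fields (hypothetical illustration).

    Every character in `delimiters` ends a field; Tab is added when
    `use_tab` is True, mirroring the Tab checkbox.
    """
    separators = set(delimiters)
    if use_tab:
        separators.add("\t")
    fields: list[str] = []
    current: list[str] = []
    for ch in line:
        if ch in separators:
            fields.append("".join(current))  # close the current field
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))  # the last field has no trailing delimiter
    return fields


def parse_import_lines(lines: list[str], delimiters: str = ";", use_tab: bool = False,
                       first_line_is_caption: bool = True):
    """Return (header, records); header is None when the first line holds data."""
    rows = [split_fields(line.rstrip("\r\n"), delimiters, use_tab) for line in lines]
    if first_line_is_caption and rows:
        return rows[0], rows[1:]
    return None, rows


header, records = parse_import_lines(["Name;City", "Smith;Boston", "Jones\tDenver"], use_tab=True)
print(header)   # ['Name', 'City']
print(records)  # [['Smith', 'Boston'], ['Jones', 'Denver']]
```

Because each delimiter character ends a field on its own, two adjacent delimiters produce an empty field rather than being merged.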

Word separation characters

If fields in the database contain compound words, you can specify the characters that join the parts of a compound word so that each part is searched and evaluated separately.

By default, this setting contains a space, a hyphen (-), and a comma (,).

For example, with the default settings, the compound word "Diagon-Alley," is treated as two words, "diagon" and "alley", which are searched and evaluated separately. The words are lowercase because Ignore Case is selected by default.

The separation characters must correspond to the delimiter characters that are defined for OCR.

Tab: Select this checkbox if you want to use a Tab as a word separation character in addition to the characters specified in the Word separation characters setting.
Space: Select this checkbox if you want to use a Space as a word separation character in addition to the characters specified in the Word separation characters setting.
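To make the "Diagon-Alley," example concrete, the Python sketch below splits a field value at the default separation characters (space, hyphen, comma) and lowercases the parts, as the Ignore Case setting does. It is a sketch under assumptions, not the product's tokenizer; `split_words` and its parameters are hypothetical.

```python
def split_words(value: str, separators: str = " -,", use_tab: bool = False,
                use_space: bool = False, ignore_case: bool = True) -> list[str]:
    """Split a field value into words that are searched separately (hypothetical).

    Each character in `separators` breaks a word; the Tab and Space checkboxes
    are modeled by `use_tab` and `use_space`. Empty pieces, such as the one after
    a trailing comma, are dropped. Words are lowercased when `ignore_case` is True.
    """
    seps = set(separators)
    if use_tab:
        seps.add("\t")
    if use_space:
        seps.add(" ")
    words: list[str] = []
    current: list[str] = []
    for ch in value:
        if ch in seps:
            if current:  # skip empty pieces between consecutive separators
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return [w.lower() for w in words] if ignore_case else words


print(split_words("Diagon-Alley,"))  # ['diagon', 'alley']
```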

Characters to ignore

Type the characters to filter out of each input record. If you use a field delimiter that can also occur within the input, such as a comma (,), you must enclose the input strings in quotation marks ("). However, you probably do not want to keep those quotation marks in the final results.

If you define the quotation marks as characters to ignore, they are removed. To define a tab or space as a character to ignore, select the corresponding check box. By default, this setting contains ."'! (period, quotation mark, single quotation mark, and exclamation point).

Space: Select this checkbox if you want to ignore a Space character in addition to the characters specified in the Characters to ignore setting.
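The Python sketch below shows the effect of this setting: every listed character, such as the quotation marks around a value that contains the comma delimiter, is deleted from the record. It is a hypothetical illustration of the described behavior, not the product's code; `remove_ignored` is an invented name.

```python
def remove_ignored(record: str, ignore: str = ".\"'!", ignore_space: bool = False) -> str:
    """Delete every character listed in `ignore` from an input record (hypothetical).

    The default mirrors the documented default: period, quotation mark,
    single quotation mark, and exclamation point. `ignore_space` models
    the Space checkbox.
    """
    if ignore_space:
        ignore += " "
    # str.maketrans with a third argument builds a table that deletes those characters.
    return record.translate(str.maketrans("", "", ignore))


print(remove_ignored('"Smith, John"'))  # Smith, John
print(remove_ignored("St. Mary's!"))    # St Marys
```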

Optimization

This group has the following settings:

Automatic update from import file

Select this setting to update the Local Fuzzy Database automatically when the source file is updated. This setting is cleared by default.

Load database in memory

Select this setting to load the database into memory. This setting is selected by default.

If you clear this setting, the Filter setting is disabled in the Column Configuration group.

Database processing

Select a value to determine the level of processing used for your database searches. Choose from:

Basic.

Choose this value if the computer where Kofax Capture and Kofax TotalAgility are installed has limited memory or processing power. With this value, the accuracy of your results can be lower than expected, but the results may be generated significantly faster than with the Advanced value.

If you choose the Basic value for this setting, the Filter setting is disabled in the Column Configuration group.

This value replicates the search behavior of Kofax TotalAgility 5.0, which favors speed over accuracy.

Advanced. This is the default value for this setting.

Choose this value if you want the most accurate search results. With this value, your results are more accurate than with the Basic value. However, the time it takes to generate them depends on the size and complexity of your database, the available memory, and the number of processing cores on your server. If your server does not have ample memory and multiple processing cores, or your database is very large and complex, the Basic value may be more suitable.

If you choose this value, you can further optimize a database locator that uses this database for speed or accuracy.

If you are unsure which value is best for your project, the best practice is to finish configuring the database locator that uses this fuzzy database and then run several extraction benchmarks to compare the results between the two Database processing values.

String Substitution

This group contains the string substitution table, which substitutes Search Text with Replacement Text in the document and in the database. It is used to normalize the results of the text search.
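To illustrate why substitutions are applied on both sides, the Python sketch below normalizes a document value and a database value with the same table so that they compare equal. It assumes that rows are applied in table order as exact, case-sensitive substring replacements; this section does not state how the product applies them, and `apply_substitutions` is a hypothetical name.

```python
def apply_substitutions(text: str, table: list[tuple[str, str]]) -> str:
    """Replace each Search Text with its Replacement Text (hypothetical sketch).

    Rows are applied in table order as plain substring replacements, so a later
    row sees the output of earlier rows.
    """
    for search, replacement in table:
        text = text.replace(search, replacement)
    return text


table = [("Street", "St"), ("Avenue", "Ave")]
document_text = apply_substitutions("12 Main Street", table)
database_text = apply_substitutions("12 Main St", table)
print(document_text == database_text)  # True: both normalize to "12 Main St"
```

Applying the same table to both sides is what makes the comparison work: the database value is left unchanged here, and the document value is rewritten to match it.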

If you use the same dictionary in more than one project, you can create a list of string substitutions and export it for use in the other projects.

You can use the following buttons to manage string substitutions:

Table Records Preview

This table provides a preview of the selected database.

Definitions for the buttons at the bottom of this window can be found in Common Transformation Designer Buttons.