Local Fuzzy Database Properties window

Use this window to select a locally stored import file for the database using the following options:

Referenced import file (text or csv file)

Select one of the following reference file locations:

  • Filesystem.

    Browse to the desired location of a local fuzzy database. The import process starts automatically when the window is closed, and a message box reports the number of imported database lines. One million lines with three fields take about 1 minute to import.

  • Web.

    Type the URL for your local fuzzy database file.

    Click Test to ensure that the connection to the specified URL is available.

    Provide a User Name and Password if authentication is required.

Import Options

This group has the following options:

Ignore Case

Select this option to convert all search and lookup strings to lowercase, effectively ignoring case. This option is selected by default.

First line contains caption

Select this option if the first record of the input file contains the column headers. This option is selected by default.

Field delimiter

Type values into this field to specify what characters separate the import file content into individual fields. The value for this option is set to ; (semicolon) by default.

Tab

Select this checkbox to use a Tab as a delimiter in addition to the characters specified in the Field delimiter setting.
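The combined effect of the Field delimiter and Tab settings can be pictured with a minimal Python sketch. It is an illustration only, not the product's import code: the function name split_fields and its parameters are hypothetical, and it assumes that every character typed into Field delimiter acts as a separate single-character delimiter.

```python
def split_fields(line, delimiters=";", use_tab=False):
    """Split one import line into fields (illustrative sketch only).

    delimiters: characters assumed to each act as a field delimiter,
                mirroring the Field delimiter setting (default ';').
    use_tab:    mirrors the Tab checkbox; adds Tab to the delimiter set.
    """
    chars = set(delimiters)
    if use_tab:
        chars.add("\t")

    fields, current = [], []
    for ch in line:
        if ch in chars:
            # A delimiter closes the current field, even if it is empty.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


print(split_fields("1001;Acme Corp;Berlin"))
# ['1001', 'Acme Corp', 'Berlin']
print(split_fields("1001\tAcme Corp;Berlin", use_tab=True))
# ['1001', 'Acme Corp', 'Berlin']
```

With the Tab checkbox cleared, a Tab in the input stays part of the field value in this sketch.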

Word separation characters

If fields in the database contain compound words, common characters can be specified so that each part of the compound word is searched and evaluated separately. The value for this option is set to " -," (space, hyphen, comma) by default.

For example, using the default settings, the compound word Diagon-Alley is treated as two words, diagon and alley, that are searched and evaluated separately.

Note The separation characters must correspond to the delimiter characters that are defined for OCR.

Tab

Select this checkbox if you want to use a Tab as a word separation character in addition to the characters specified in the Word separation characters setting.

Space

Select this checkbox if you want to use a Space as a word separation character in addition to the characters specified in the Word separation characters setting.
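As a rough illustration of how the word separation settings interact with Ignore Case, the following Python sketch splits a field value into separately evaluated words. The function name split_words and its parameters are hypothetical, and the sketch assumes that consecutive separation characters do not produce empty words. It does not describe the product's internal search logic.

```python
def split_words(value, separators=" -,", use_tab=False, use_space=False,
                ignore_case=True):
    """Break a field value into words (illustrative sketch only).

    separators:  mirrors Word separation characters (default ' -,').
    use_tab:     mirrors the Tab checkbox.
    use_space:   mirrors the Space checkbox.
    ignore_case: mirrors Ignore Case; lowercases the value first.
    """
    chars = set(separators)
    if use_tab:
        chars.add("\t")
    if use_space:
        chars.add(" ")
    if ignore_case:
        value = value.lower()

    words, current = [], []
    for ch in value:
        if ch in chars:
            # Assumption: runs of separators never yield empty words.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


print(split_words("Diagon-Alley"))
# ['diagon', 'alley']
```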

Characters to ignore

Type a list of characters into this field to filter unwanted characters from the input record. When you want to use a field delimiter that may also be a character in the input, such as a comma (,), then you have to use quotes (") to identify the input strings. However, you probably do not want to retain those quotation marks as part of the final results.

If you define the quotes as characters to ignore, they are removed. To define a tab or space as a character to ignore, select the corresponding checkbox. The value for this option is set to ."'! (period, quotation mark, single quotation mark, and exclamation point) by default.

Space

Select this checkbox if you want to ignore a Space character in addition to the characters specified in the Characters to ignore setting.
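The filtering described for Characters to ignore can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function name strip_ignored and its parameters are hypothetical, and the sketch covers only the removal of ignored characters from a single field value, not how the import handles delimiters inside quoted strings.

```python
def strip_ignored(value, ignore_chars=".\"'!", ignore_space=False):
    """Remove ignored characters from a field value (illustrative sketch).

    ignore_chars: mirrors Characters to ignore (default . " ' !).
    ignore_space: mirrors the Space checkbox.
    """
    chars = set(ignore_chars)
    if ignore_space:
        chars.add(" ")
    return "".join(ch for ch in value if ch not in chars)


# Quotation marks used to wrap the input string are dropped.
print(strip_ignored('"Smith, John"'))
# Smith, John
print(strip_ignored("O'Neil Inc.!", ignore_space=True))
# ONeilInc
```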

Optimization

This group has the following options:

Automatic update from import file

Select this option to update the Local Fuzzy Database automatically when the source file is updated. This option is cleared by default.

Load database in memory

Select this option to load the database into memory. This option is selected by default.

Database processing

Select one of the options to determine the level of processing required for your database searches. Choose from:

  • Basic.

    Select this option if the computer where Kofax Capture or Kofax TotalAgility is installed does not have a lot of memory or processing power. With this value, the accuracy of your results can be lower than expected, but the results may be generated significantly faster than with the "Advanced" value for this option.

    This option replicates the search behavior from Kofax TotalAgility 5.0 that enables users to focus on speed instead of accuracy.

  • Advanced. This is the default value for this option.

    Select this option if you want the most accurate search results. With this option, the accuracy of your results is better than with the "Basic" value for this option. However, the time it takes to generate these results depends on the size and complexity of your database, the available memory, and the number of processing cores available on your server. If you do not have a lot of memory and multiple processing cores, or your database is very large and complex, the "Basic" value may be more suitable.

    If you select this value, you can further optimize a database locator that uses this database for speed or accuracy.

Tip If you are unsure which value is best for your project, the best practice is to finish configuring the database locator that uses this fuzzy database and then run several extraction benchmarks to compare the results between the two Database processing values.

String Substitution

This group has the following options:

The string substitution table substitutes Search Text with Replacement Text in the document and in the database. It is used to normalize the results of the text search.

If you use the same dictionary in more than one project, you can create a list of string substitutions and export them to use in the other project.
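To show why applying the same substitutions to both the document text and the database text normalizes the search, here is a minimal Python sketch. The function name apply_substitutions and the sample table entries are hypothetical, and the sketch assumes plain substring replacement applied in table order, which may differ from how the product matches Search Text.

```python
def apply_substitutions(text, table):
    """Replace each Search Text with its Replacement Text, in table order.

    Assumption: simple substring replacement; the real matching rules
    of the string substitution table may differ.
    """
    for search, replacement in table:
        text = text.replace(search, replacement)
    return text


# Hypothetical substitution table: (Search Text, Replacement Text).
table = [
    ("straße", "str"),
    ("strasse", "str"),
    ("str.", "str"),
]

document_value = apply_substitutions("hauptstr. 5", table)
database_value = apply_substitutions("hauptstraße 5", table)

# Both sides now use the same normalized spelling.
print(document_value == database_value)
# True
```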

You can use the following buttons to manage string substitutions:

Field Names

This group provides a preview of the first 20 lines of the referenced file according to the settings. You can modify a database column by right-clicking its column heading. The following shortcut menu options are available:

Rename Field

Use this option to manually change the column title.

Field present on document

Select this option if the corresponding column has a field value that is displayed on a document. This option is selected by default.

Not present on document

Select this option if the corresponding column does not have a value that is displayed on a document. This option is cleared by default.

Definitions for the buttons at the bottom of this window can be found in Common Transformation Designer Buttons.