Extraction Profiles

These settings define field extraction profiles for the configurable Custom 1 to Custom 5 fields. Once defined, an extraction profile is assigned to a custom field in the Solution Configuration Manager > Profile Settings > Field Settings, using the Extraction Profile ID setting.

Multiple entries can be added for these settings, so the settings are displayed in a table.

The following column settings are available.

Extraction Profile ID

The unique ID for this extraction profile.

Description

The profile description.

Analysis Profile ID

The analysis profile ID that is used to generate alternatives for the field.

Entries here correspond to the Analysis Profile ID setting in the Solution Configuration Manager > Global Settings > Custom Extraction Profiles > Analysis Profiles.

Evaluation Profile ID

The evaluation profile ID that is used to evaluate alternatives for the field.

Entries here correspond to the Evaluation Profile ID setting in the Solution Configuration Manager > Global Settings > Custom Extraction Profiles > Evaluation Profiles.

Evaluation Distance

This represents the fuzzy factor that the system uses when searching for keywords or phrases in the evaluation profile.

This value ranges between zero and one, where zero requires an exact match, and one accepts values that do not match at all.
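
The following is a minimal sketch of how a fuzzy factor between zero and one could behave. It assumes a normalized Levenshtein distance, which is an illustrative choice and not necessarily the measure the product uses. The function names levenshtein, normalized_distance, and matches are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def normalized_distance(candidate: str, keyword: str) -> float:
    """Edit distance scaled to 0.0 (identical) through 1.0 (nothing in common)."""
    longest = max(len(candidate), len(keyword))
    if longest == 0:
        return 0.0
    return levenshtein(candidate, keyword) / longest


def matches(candidate: str, keyword: str, evaluation_distance: float) -> bool:
    """Assumed acceptance rule: accept when the distance is within the setting."""
    return normalized_distance(candidate, keyword) <= evaluation_distance
```

Under these assumptions, a setting of 0 accepts only identical strings, a setting of 0.2 tolerates a single OCR misread in a seven-character keyword, and a setting of 1 accepts any string.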

Base Weighting

The base weight given to all alternatives generated for the field, expressed as a percentage.

Use this setting when only a few alternatives are generated for a field, and these generated alternatives are considered valid extraction results.

Overwrite With Search String

When selected, the field result is overwritten with the string compare or Levenshtein search string that is used to generate the alternative.

Remove No Number Alternatives

When selected, any alternatives that do not contain at least one numeric character are removed from the list of available alternatives.
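
As an illustration, the filter described above can be sketched as follows. The function name is hypothetical, and the sketch assumes that "numeric character" means one of the ASCII digits 0 to 9.

```python
def remove_no_number_alternatives(alternatives: list[str]) -> list[str]:
    """Keep only alternatives that contain at least one ASCII digit (assumption)."""
    return [
        text for text in alternatives
        if any("0" <= ch <= "9" for ch in text)
    ]
```

For example, given the alternatives "PO-12345", "Total", and "A1", only "PO-12345" and "A1" would remain.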

Distance

The fuzzy factor that the system uses when generating alternatives.

This value ranges between zero and one, where zero requires an exact match, and one accepts values that do not match at all.

Max Word Count

This specifies the maximum number of OCR words that are permitted to form an alternative for the field.

Max Word Gap

This specifies the maximum gap in millimeters that is allowed to exist between OCR words, so that they are included as part of a generated alternative.

Max Word Len

This value expresses the maximum length of an alternative in millimeters.

Any alternatives that exceed this length are ignored.
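
The interaction of Max Word Count, Max Word Gap, and Max Word Len can be sketched as below. This is an illustration only: the OcrWord type, the candidate_runs function, and the assumption that the words lie on one text line sorted from left to right are all hypothetical, and the product may combine words differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrWord:
    text: str
    left_mm: float   # position of the word's left edge
    right_mm: float  # position of the word's right edge


def candidate_runs(
    words: list[OcrWord],
    max_word_count: int,
    max_word_gap_mm: float,
    max_word_len_mm: float,
) -> list[list[OcrWord]]:
    """Return every run of neighboring words that respects all three limits."""
    runs = []
    for start in range(len(words)):
        run: list[OcrWord] = []
        for word in words[start:]:
            # Stop extending when the gap to the previous word is too wide.
            if run and word.left_mm - run[-1].right_mm > max_word_gap_mm:
                break
            # Stop extending when the run would contain too many words.
            if len(run) + 1 > max_word_count:
                break
            # Stop extending when the run would be longer than allowed.
            if word.right_mm - words[start].left_mm > max_word_len_mm:
                break
            run.append(word)
            runs.append(list(run))
    return runs
```

With a word count limit of 2 and a gap limit of 3 mm, the words "INV", "2024", and "001" spaced 2 mm apart would yield single words and adjacent pairs, but never all three words as one alternative.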

Case Sensitive

If selected, the format strings entered in the field analysis profile are treated as case sensitive when the system generates alternatives.

Keep Spaces

If selected, any spaces between OCR words are preserved in the generated alternative text.
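
A small sketch of the effect of Keep Spaces on the generated text follows. The function name is hypothetical, and the sketch assumes that a preserved gap is represented by a single space character.

```python
def alternative_text(ocr_words: list[str], keep_spaces: bool) -> str:
    """Join OCR words into one alternative, with or without spaces between them."""
    separator = " " if keep_spaces else ""
    return separator.join(ocr_words)
```

For the OCR words "INV", "2024", and "001", the result would be "INV 2024 001" when Keep Spaces is selected and "INV2024001" when it is not.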

Use Regions

If selected, alternative generation is restricted to specific regions on a document.

The regions are configured with the Use First Page, Use Subsequent Page, and Use Last Page settings and their associated Top, Bottom, Left, and Right values.

Use First Page

If selected, the system generates alternatives on the first page of a document only.

First Top

Expressed as a percentage, this setting defines the top-most area on the first page of a document where alternatives may be generated.

A value of zero would start at the top of the page. A value of 20 would start 20% of the way down the length of the page. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.

In order for this setting to work, Use First Page must also be selected.

First Bottom

Expressed as a percentage, this setting defines the bottom-most area on the first page of a document where alternatives may be generated.

Values are measured from the top of the page. A value of 80 would stop 80% of the way down the length of the page, and a value of 100 would stop at the bottom of the page. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.

In order for this setting to work, Use First Page must also be selected.

First Left

Expressed as a percentage, this setting defines the left-most area on the first page of a document where alternatives may be generated.

A value of zero would start at the left of the page. A value of 20 would start 20% of the way across the page, measured from the left. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.

In order for this setting to work, Use First Page must also be selected.

First Right

Expressed as a percentage, this setting defines the right-most area on the first page of a document where alternatives may be generated.

A value of zero would place the boundary at the left of the page. A value of 80 would stop 80% of the way across the page, measured from the left. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.

In order for this setting to work, Use First Page must also be selected.
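
The four percentage settings together describe a rectangle on the page. The sketch below shows one way to turn them into page coordinates, including the automatic clamping to the range 0 to 100. It assumes that all four values are measured from the top-left corner of the page, and the Region type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    top_mm: float
    bottom_mm: float
    left_mm: float
    right_mm: float


def clamp_percent(value: float) -> float:
    """Values above 100 become 100 and values below zero become zero."""
    return max(0.0, min(100.0, value))


def page_region(
    top_pct: float,
    bottom_pct: float,
    left_pct: float,
    right_pct: float,
    page_width_mm: float,
    page_height_mm: float,
) -> Region:
    """Convert the percentage settings to millimeters from the top-left corner."""
    return Region(
        top_mm=clamp_percent(top_pct) * page_height_mm / 100.0,
        bottom_mm=clamp_percent(bottom_pct) * page_height_mm / 100.0,
        left_mm=clamp_percent(left_pct) * page_width_mm / 100.0,
        right_mm=clamp_percent(right_pct) * page_width_mm / 100.0,
    )


def contains(region: Region, x_mm: float, y_mm: float) -> bool:
    """True when the point lies inside the region, edges included."""
    return (region.left_mm <= x_mm <= region.right_mm
            and region.top_mm <= y_mm <= region.bottom_mm)
```

Under these assumptions, Top 20 and Bottom 80 with Left 0 and Right 100 would select the middle band of the page, excluding the top and bottom fifths.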

Use Subsequent Page

If selected, the system generates alternatives for all pages between the first and last page of the document.

In order for this setting to work, Use Regions must also be selected.

Subsequent Top

Expressed as a percentage, this setting defines the top-most area on subsequent pages of a document where alternatives may be generated.

A value of zero would start at the top of the page. A value of 20 would start 20% of the way down the length of the page. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.

Subsequent Bottom

Expressed as a percentage, this setting defines the bottom-most area on subsequent pages of a document where alternatives may be generated.

Values are measured from the top of the page. A value of 80 would stop 80% of the way down the length of the page, and a value of 100 would stop at the bottom of the page. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.

Subsequent Left

Expressed as a percentage, this setting defines the left-most area on subsequent pages of a document where alternatives may be generated.

A value of zero would start at the left of the page. A value of 20 would start 20% of the way across the page, measured from the left. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.

Subsequent Right

Expressed as a percentage, this setting defines the right-most area on subsequent pages of a document where alternatives may be generated.

A value of zero would place the boundary at the left of the page. A value of 80 would stop 80% of the way across the page, measured from the left. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.

Use Last Page

If selected, the system generates alternatives on the last page of a document.

In order for this setting to work, Use Regions must also be selected.

Last Top

Expressed as a percentage, this setting defines the top-most area on the last page of a document where alternatives may be generated.

A value of zero would start at the top of the page. A value of 20 would start 20% of the way down the length of the page. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.

Last Bottom

Expressed as a percentage, this setting defines the bottom-most area on the last page of a document where alternatives may be generated.

Values are measured from the top of the page. A value of 80 would stop 80% of the way down the length of the page, and a value of 100 would stop at the bottom of the page. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.

Last Left

Expressed as a percentage, this setting defines the left-most area on the last page of a document where alternatives may be generated.

A value of zero would start at the left of the page. A value of 20 would start 20% of the way across the page, measured from the left. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.

Last Right

Expressed as a percentage, this setting defines the right-most area on the last page of a document where alternatives may be generated.

A value of zero would place the boundary at the left of the page. A value of 80 would stop 80% of the way across the page, measured from the left. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.
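
The way the Use Regions, Use First Page, Use Subsequent Page, and Use Last Page settings select pages can be sketched as below. The function names are hypothetical, and two behaviors are assumptions that the descriptions above do not confirm: a single-page document is treated as both the first and the last page, and when Use Regions is not selected every page is searched.

```python
def page_kinds(page_index: int, page_count: int) -> set[str]:
    """Classify a zero-based page index as first, last, and/or subsequent."""
    if not 0 <= page_index < page_count:
        raise ValueError("page_index is outside the document")
    kinds = set()
    if page_index == 0:
        kinds.add("first")
    if page_index == page_count - 1:
        kinds.add("last")
    if not kinds:
        # Pages strictly between the first and last page.
        kinds.add("subsequent")
    return kinds


def is_page_searched(
    page_index: int,
    page_count: int,
    *,
    use_regions: bool,
    use_first_page: bool,
    use_subsequent_page: bool,
    use_last_page: bool,
) -> bool:
    """Decide whether alternatives may be generated on the given page."""
    if not use_regions:
        return True  # assumption: no region restriction applies
    enabled = {
        "first": use_first_page,
        "subsequent": use_subsequent_page,
        "last": use_last_page,
    }
    return any(enabled[kind] for kind in page_kinds(page_index, page_count))
```

For a four-page document with Use Regions, Use First Page, and Use Last Page selected, this sketch would search pages one and four and skip the two pages in between.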