Extraction Profiles
These settings define the field extraction profiles used by the configurable Custom 1 to Custom 5 fields. Once defined, an extraction profile is assigned to a custom field in the Field Settings, using the Extraction Profile ID setting.
Multiple entries can be added for these settings, so the settings are displayed in a table.
The following column settings are available.
- Extraction Profile ID
-
The unique ID for this extraction profile.
- Description
-
The profile description.
- Analysis Profile ID
-
The analysis profile ID that is used to generate alternatives for the field.
Entries here correspond to the Analysis Profile ID setting in the Analysis Profiles.
- Evaluation Profile ID
-
The evaluation profile ID that is used to evaluate alternatives for the field.
Entries here correspond to the Evaluation Profile ID setting in the Evaluation Profiles.
- Evaluation Distance
-
The fuzzy factor that the system uses when searching for keywords or phrases in the evaluation profile.
This value ranges between zero and one, where zero requires an exact match, and one accepts values that do not match at all.
- Base Weighting
-
The base weight given to all alternatives generated for the field, expressed as a percentage.
Use this setting when only a few alternatives are generated for a field, and these generated alternatives are considered valid extraction results.
- Overwrite With Search String
-
When selected, the field result is overwritten with the string compare or Levenshtein search string that is used to generate the alternative.
- Remove No Number Alternatives
-
When selected, any alternatives that do not contain at least one numeric character are removed from the list of available alternatives.
- Distance
-
The fuzzy factor that the system uses when generating alternatives.
This value ranges between zero and one, where zero requires an exact match, and one accepts values that do not match at all.
- Max Word Count
-
This specifies the maximum number of OCR words that are permitted to form an alternative for the field.
- Max Word Gap
-
This specifies the maximum gap in millimeters that is allowed to exist between OCR words, so that they are included as part of a generated alternative.
- Max Word Len
-
This value expresses the maximum length of an alternative in millimeters.
Any alternatives that exceed this length are ignored.
- Case Sensitive
-
If selected, the format strings entered in the field analysis profile are treated as case sensitive when the system generates alternatives.
- Keep Spaces
-
If selected, any spaces between OCR words are preserved in the generated alternative text.
- Use Regions
-
If selected, alternative generation is restricted to specific regions on a document.
In order for this setting to work, at least one of Use First Page, Use Subsequent Page, or Use Last Page must also be selected.
- Use First Page
-
If selected, the system generates alternatives on the first page of a document only.
- First Top
-
Expressed as a percentage, this setting defines the top-most area on the first page of a document where alternatives may be generated.
A value of zero would start at the top of the page. A value of 20 would start 20% of the way down the length of the page. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.
In order for this setting to work, Use First Page must also be selected.
- First Bottom
-
Expressed as a percentage, this setting defines the bottom-most area on the first page of a document where alternatives may be generated.
The value is measured from the top of the page. A value of 80 would stop 80% of the way down the length of the page, and a value of 100 would stop at the bottom of the page. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.
In order for this setting to work, Use First Page must also be selected.
- First Left
-
Expressed as a percentage, this setting defines the left-most area on the first page of a document where alternatives may be generated.
A value of zero would start at the left edge of the page. A value of 20 would start 20% of the way across the page from the left edge. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.
In order for this setting to work, Use First Page must also be selected.
- First Right
-
Expressed as a percentage, this setting defines the right-most area on the first page of a document where alternatives may be generated.
A value of zero corresponds to the left edge of the page. A value of 80 would stop 80% of the way across the page from the left edge. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.
In order for this setting to work, Use First Page must also be selected.
- Use Subsequent Page
-
If selected, the system generates alternatives for all pages between the first and last page of the document.
In order for this setting to work, Use Regions must also be selected.
- Subsequent Top
-
Expressed as a percentage, this setting defines the top-most area on subsequent pages of a document where alternatives may be generated.
A value of zero would start at the top of the page. A value of 20 would start 20% of the way down the length of the page. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.
- Subsequent Bottom
-
Expressed as a percentage, this setting defines the bottom-most area on subsequent pages of a document where alternatives may be generated.
The value is measured from the top of the page. A value of 80 would stop 80% of the way down the length of the page, and a value of 100 would stop at the bottom of the page. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.
- Subsequent Left
-
Expressed as a percentage, this setting defines the left-most area on subsequent pages of a document where alternatives may be generated.
A value of zero would start at the left edge of the page. A value of 20 would start 20% of the way across the page from the left edge. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.
- Subsequent Right
-
Expressed as a percentage, this setting defines the right-most area on subsequent pages of a document where alternatives may be generated.
A value of zero corresponds to the left edge of the page. A value of 80 would stop 80% of the way across the page from the left edge. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.
- Use Last Page
-
If selected, the system generates alternatives on the last page of a document.
In order for this setting to work, Use Regions must also be selected.
- Last Top
-
Expressed as a percentage, this setting defines the top-most area on the last page of a document where alternatives may be generated.
A value of zero would start at the top of the page. A value of 20 would start 20% of the way down the length of the page. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.
- Last Bottom
-
Expressed as a percentage, this setting defines the bottom-most area on the last page of a document where alternatives may be generated.
The value is measured from the top of the page. A value of 80 would stop 80% of the way down the length of the page, and a value of 100 would stop at the bottom of the page. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.
- Last Left
-
Expressed as a percentage, this setting defines the left-most area on the last page of a document where alternatives may be generated.
A value of zero would start at the left edge of the page. A value of 20 would start 20% of the way across the page from the left edge. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.
- Last Right
-
Expressed as a percentage, this setting defines the right-most area on the last page of a document where alternatives may be generated.
A value of zero corresponds to the left edge of the page. A value of 80 would stop 80% of the way across the page from the left edge. A value that exceeds 100 is automatically set to 100. A value of less than zero is automatically set to zero.
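
The Distance and Evaluation Distance settings above describe a fuzzy factor between zero and one. The exact matching formula used by the system is not stated here. The following Python sketch assumes one common interpretation, a Levenshtein edit distance divided by the length of the longer string, to illustrate why zero admits only exact matches and one admits any value. All function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Assumed normalization: edit distance divided by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest


def is_match(candidate: str, keyword: str, max_distance: float) -> bool:
    """True if the candidate is within the configured fuzzy factor."""
    return normalized_distance(candidate, keyword) <= max_distance
```

Under this assumption, a one-character OCR error in a seven-character keyword gives a distance of about 0.14, so it would be rejected at a setting of zero and accepted at a setting of 0.2.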
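
The Max Word Count, Max Word Gap, Max Word Len, Keep Spaces, and Remove No Number Alternatives settings above constrain how OCR words are combined into alternatives. The Python sketch below is one possible reading of those constraints, not the system's actual algorithm. It assumes that words arrive in reading order on a single line with horizontal positions in millimeters; the `OcrWord` type and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OcrWord:
    text: str
    left_mm: float   # assumed: left edge of the word, in millimeters
    right_mm: float  # assumed: right edge of the word, in millimeters


def generate_alternatives(words: List[OcrWord], max_word_count: int,
                          max_word_gap_mm: float, max_word_len_mm: float,
                          keep_spaces: bool = True) -> List[str]:
    """Build alternatives from runs of adjacent words within the limits."""
    separator = " " if keep_spaces else ""
    alternatives = []
    for start in range(len(words)):
        last = min(start + max_word_count, len(words))
        for end in range(start, last):
            if end > start:
                gap = words[end].left_mm - words[end - 1].right_mm
                if gap > max_word_gap_mm:
                    break  # gap too wide: stop extending this run
            length = words[end].right_mm - words[start].left_mm
            if length > max_word_len_mm:
                break  # alternative too long: ignore it and longer ones
            run = words[start:end + 1]
            alternatives.append(separator.join(w.text for w in run))
    return alternatives


def remove_no_number_alternatives(alternatives: List[str]) -> List[str]:
    """Keep only alternatives containing at least one numeric character."""
    return [alt for alt in alternatives if any(c.isdigit() for c in alt)]
```

For example, with the words `INV` (10 to 20 mm), `2024` (22 to 34 mm), and `Total` (60 to 75 mm), a maximum word count of 2, a maximum gap of 5 mm, and a maximum length of 40 mm, the sketch produces `INV`, `INV 2024`, `2024`, and `Total`. The 26 mm gap prevents `2024` and `Total` from being joined.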
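
The Top, Bottom, Left, and Right settings above are percentages that are limited to the range zero to 100. As a minimal Python sketch of that behavior, the code below clamps each value and converts a region into a bounding box for a page of a given size. It assumes that all four values are measured from the top-left corner of the page, which is an interpretation rather than a documented fact. The `Region` type and its methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


def clamp_percent(value: float) -> float:
    """Values above 100 are set to 100; values below zero are set to zero."""
    return max(0.0, min(100.0, value))


@dataclass(frozen=True)
class Region:
    top: float
    bottom: float
    left: float
    right: float

    def to_box(self, page_width: float,
               page_height: float) -> Tuple[float, float, float, float]:
        """Return (left, top, right, bottom) in the page's own units."""
        # Assumption: every percentage is measured from the top-left corner.
        left = clamp_percent(self.left) * page_width / 100
        top = clamp_percent(self.top) * page_height / 100
        right = clamp_percent(self.right) * page_width / 100
        bottom = clamp_percent(self.bottom) * page_height / 100
        return (left, top, right, bottom)

    def contains(self, x: float, y: float,
                 page_width: float, page_height: float) -> bool:
        """True if the point (x, y) lies inside the region on this page."""
        left, top, right, bottom = self.to_box(page_width, page_height)
        return left <= x <= right and top <= y <= bottom
```

With these assumptions, a region of top -5, bottom 120, left 20, and right 80 on a page 200 units wide and 300 units high covers the full page height and the horizontal band from 40 to 160 units.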