RecoStar character set and pattern

In addition to selecting a content type, you can further restrict the allowed characters when using the RecoStar recognition engine.

The "Character set" option specifies a mask of allowable characters and formatting for the field. A basic character set is implicitly determined by the selected countries and content type. Select from the following masks:

  • 0-9 to restrict the allowed characters to digits only

  • A-Z to restrict the allowed characters to uppercase letters only

  • A-Za-z0-9 to restrict the allowed characters to letters and digits only

The characters found in a zone that do not meet the requirements of the selected character set are displayed as substitutes or rejects.
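The effect of a mask can be pictured with a small Python sketch. This is not RecoStar code: the function name apply_character_set and the "?" reject marker are hypothetical, and the sketch only assumes that each mask can be read as the body of a regular expression character class.

```python
import re

def apply_character_set(text: str, mask: str, reject: str = "?") -> str:
    """Replace every character that falls outside the mask with a reject marker.

    Hypothetical helper for illustration only; the engine's real handling of
    substitutes and rejects is internal and may differ.
    """
    allowed = re.compile(f"[{mask}]")  # e.g. "0-9" becomes the class [0-9]
    return "".join(ch if allowed.fullmatch(ch) else reject for ch in text)

print(apply_character_set("AB12cd", "0-9"))        # ??12??
print(apply_character_set("AB12cd", "A-Z"))        # AB????
print(apply_character_set("A1-b", "A-Za-z0-9"))    # A1?b
```

In this sketch the mask never changes the characters it accepts; it only decides which positions are flagged for review.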

The "Pattern" option enables project administrators to narrow the results even further by using a regular expression. This is useful in situations where the standard logical context is misleading or too weak, for example, when the standard logical context is controlled by algorithms working with common experience, dictionaries, and the Trigram mode.
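As a rough illustration of how a pattern can narrow results, the following Python sketch picks the first of several candidate readings that fully matches a regular expression. The function first_matching, the sample readings, and the pattern are all assumptions made for this example; they do not describe how the engine ranks or stores its candidates.

```python
import re
from typing import Iterable, Optional

def first_matching(candidates: Iterable[str], pattern: str) -> Optional[str]:
    """Return the first candidate reading that matches the whole pattern.

    Hypothetical helper for illustration only.
    """
    compiled = re.compile(pattern)
    for candidate in candidates:
        if compiled.fullmatch(candidate):
            return candidate
    return None  # no reading satisfies the pattern

# Three plausible readings of the same zone; only one is four digits,
# a hyphen, and two digits.
readings = ["O123-45", "0123-45", "0l23-45"]
print(first_matching(readings, r"\d{4}-\d{2}"))  # 0123-45
```

A dictionary or trigram model might prefer a letter such as "O" in an ambiguous position; a pattern that demands digits removes that ambiguity.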

Important: If a regular expression pattern is specified in combination with a possibly incompatible character set, the character set determines which symbols are valid. If a pattern is specified without a character set, the character set is restricted, or even extended, to the characters supported by the pattern.
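The precedence described in this note can be sketched in Python as follows. The function is_valid is hypothetical and only models the stated rule under a simplifying assumption: when a character set is given, it is checked first, and when none is given, the pattern alone decides which characters are acceptable.

```python
import re
from typing import Optional

def is_valid(text: str, pattern: str, charset: Optional[str] = None) -> bool:
    """Check a result against an optional character set and a pattern.

    Hypothetical helper for illustration only; not part of any RecoStar API.
    """
    if charset is not None:
        # The character set has the final say on which symbols are valid.
        if not re.fullmatch(f"[{charset}]*", text):
            return False
    # The pattern then constrains the overall format.
    return re.fullmatch(pattern, text) is not None

# The pattern expects two uppercase letters followed by four digits.
print(is_valid("AB1234", r"[A-Z]{2}\d{4}"))          # True
# A digits-only character set is incompatible with that pattern.
print(is_valid("AB1234", r"[A-Z]{2}\d{4}", "0-9"))   # False
```

With an incompatible combination such as the second call, no result can satisfy both settings, so it is worth checking that the chosen character set covers every character the pattern requires.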