Fuzzy database CSV and text file formats

A fuzzy database must meet the following requirements before it can be added to and successfully processed by Kofax Transformation Modules.

  • The database file contains exactly one record per line.

  • Line breaks can be either CR+LF (Carriage Return/Linefeed) or LF (Linefeed).

  • The database encoding is either ANSI or UTF-8 with a BOM (byte order mark). A UTF-8 file without a BOM is interpreted as an ANSI file instead.

  • Values are separated by a field delimiter that is configurable in the database properties.

  • Various characters can serve as the field delimiter that marks where one field ends and the next begins, but multi-character delimiters such as :: are not supported.

  • Values can be enclosed in double quotes. A quoted value can contain the field delimiter character, so that its content is kept together as a single field. For example, Chris;Booker;Wetzingerstr 19;79100;Freiburg.

  • The first line of the database file can contain the column headers, but this is not mandatory.

  • Every record has the same number of columns. If a record does not have a value for a specific field, it still needs to be represented with an empty value between two field delimiter characters. If a record contains a different number of columns, it is ignored when the database is imported and a message is displayed to the user.
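
The requirements above can be checked before a file is imported. The following Python sketch is only an illustration and is not part of Kofax Transformation Modules: the function names detect_encoding and parse_fuzzy_db are hypothetical, ANSI is assumed to mean Windows-1252, the semicolon is assumed as the default delimiter, and the quoted street value in the sample data is made up to show a delimiter inside double quotes.

```python
import csv

UTF8_BOM = b"\xef\xbb\xbf"


def detect_encoding(raw: bytes) -> str:
    """Treat the file as UTF-8 only if it starts with a BOM, else as ANSI.

    Assumption: "ANSI" is taken to mean Windows-1252 (cp1252).
    """
    return "utf-8-sig" if raw.startswith(UTF8_BOM) else "cp1252"


def parse_fuzzy_db(raw: bytes, delimiter: str = ";", has_header: bool = False):
    """Return (valid_records, rejected_line_numbers) for the file content."""
    if len(delimiter) != 1:
        # Multi-character delimiters such as "::" are not supported.
        raise ValueError("field delimiter must be a single character")

    text = raw.decode(detect_encoding(raw), errors="replace")
    # Accept both CR+LF and LF line breaks; one record per line.
    lines = text.replace("\r\n", "\n").split("\n")

    records = []
    for number, line in enumerate(lines, start=1):
        if line == "":
            continue  # e.g. the empty string after a trailing line break
        # Double quotes may enclose a value that contains the delimiter.
        row = next(csv.reader([line], delimiter=delimiter, quotechar='"'), [])
        records.append((number, row))

    if not records:
        return [], []

    # The first line (header or record) defines the expected column count.
    expected = len(records[0][1])
    data = records[1:] if has_header else records
    valid = [row for _, row in data if len(row) == expected]
    rejected = [number for number, row in data if len(row) != expected]
    return valid, rejected


# Illustrative sample: UTF-8 with BOM, CR+LF line breaks, a header line,
# a quoted value containing the delimiter, an empty field, and a short record.
sample = (
    UTF8_BOM
    + b"First;Last;Street;Zip;City\r\n"
    + b'Chris;Booker;"Wetzingerstr; 19";79100;Freiburg\r\n'
    + b"Anna;;Hauptstr 1;79098;Freiburg\r\n"
    + b"broken;line\r\n"
)
valid, rejected = parse_fuzzy_db(sample, delimiter=";", has_header=True)
print(valid)
print(rejected)
```

In this sketch, records with the wrong number of columns are collected by line number instead of being silently dropped, which mirrors the import behavior of ignoring such records and reporting them to the user.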