File parsing

Use the file parsing capability to obtain data from files. It supports plain-text (such as CSV), XML, and Excel files. You can create rules and templates within Studio to parse files without external tools or additional modules. First, define a template that describes how to extract data from a specific file type. Next, define a batch that applies a set of templates to a specified set of files. These data-loading batches are then run from within an execution plan. When the file parser runs, it extracts data from the files and loads it into a Staging database.

Note To use Excel files as data sources, the Microsoft database engine must be installed on the server. To view Excel files and parse them with Insight, Microsoft Excel must also be installed on the server. To process Excel 2016 files, follow the procedure described in the Kofax Insight Installation Guide.

To work with external files, you must:

  • Review the parsing configuration settings and change them if needed.

  • Create a set of file templates.

    A template applies only to files that share the same format and internal structure. Files that differ in column order, data types, or any other structural detail require a separate template.

  • Define loading batches that use the templates.

  • Add the loading batch to an execution plan.

  • Run the execution plan.
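The workflow above can be pictured as a small pipeline: templates recognize file structures, a batch applies templates to files, and the execution plan runs batches and loads the results into staging. The sketch below is a conceptual model only, not Insight's implementation or API. The names `Template`, `Batch`, `run_execution_plan`, and the `staging_rows` table are hypothetical, and an in-memory SQLite database stands in for the Staging database. To mirror the rule that each file structure needs its own template, a template here accepts only CSV files whose header row matches its columns exactly.

```python
import csv
import io
import json
import sqlite3
from dataclasses import dataclass


@dataclass
class Template:
    """Hypothetical template: accepts CSV files whose header row matches exactly."""
    name: str
    columns: list[str]

    def matches(self, header: list[str]) -> bool:
        # Same columns in the same order means the same internal structure.
        return header == self.columns


@dataclass
class Batch:
    """Hypothetical loading batch: a set of templates applied to a set of files."""
    templates: list[Template]
    files: dict[str, str]  # file name -> file contents


def run_execution_plan(batches: list[Batch], conn: sqlite3.Connection) -> list[str]:
    """Run every batch, load parsed rows into a stand-in staging table,
    and return the names of files that no template matched."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_rows "
        "(file_name TEXT, template TEXT, record TEXT)"
    )
    unmatched = []
    for batch in batches:
        for file_name, text in batch.files.items():
            reader = csv.reader(io.StringIO(text))
            header = next(reader, [])
            template = next((t for t in batch.templates if t.matches(header)), None)
            if template is None:
                unmatched.append(file_name)  # this structure needs its own template
                continue
            for row in reader:
                conn.execute(
                    "INSERT INTO staging_rows VALUES (?, ?, ?)",
                    (file_name, template.name, json.dumps(dict(zip(header, row)))),
                )
    conn.commit()
    return unmatched


# Usage: feb.csv has a different column order, so the "sales" template skips it.
conn = sqlite3.connect(":memory:")
sales = Template("sales", ["id", "amount"])
batch = Batch([sales], {"jan.csv": "id,amount\n1,10\n2,20\n",
                        "feb.csv": "amount,id\n5,3\n"})
unmatched = run_execution_plan([batch], conn)
loaded = conn.execute("SELECT COUNT(*) FROM staging_rows").fetchone()[0]
```

Matching on the exact header row is a deliberately strict stand-in for "same format and internal structure"; a real template can describe structure in far more detail.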

Document Fields, Records, Header, Footer

To use a template to parse data from a file, the file parsing engine requires a Document Fields definition and at least one Record.

Document Fields are fields that have one value for the whole parsed file. By default, they include the File name, Create date, Modify date, and File size fields. You can add fields whose values are parsed from the file name or from special file regions, such as the Header or Footer. For each parsed file, the engine adds one row to the Document Fields table.
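To illustrate the one-row-per-file idea, the following minimal Python sketch builds a Document Fields row from file-system metadata, assuming the default fields map to the standard name, timestamps, and size. The function `document_fields`, its `name_pattern` parameter, and the `period` field are hypothetical; the optional regular expression shows how an extra field could be parsed from the file name. Note that `os.stat().st_ctime` is the creation time on Windows but the last metadata change time on most Unix systems.

```python
import os
import re
import tempfile
from datetime import datetime, timezone
from typing import Optional


def document_fields(path: str, name_pattern: Optional[str] = None) -> dict:
    """Build one Document Fields row for a single parsed file (hypothetical)."""
    st = os.stat(path)
    row = {
        "File name": os.path.basename(path),
        # st_ctime: creation time on Windows, metadata-change time on most Unix systems.
        "Create date": datetime.fromtimestamp(st.st_ctime, tz=timezone.utc),
        "Modify date": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        "File size": st.st_size,
    }
    if name_pattern:
        # Additional fields come from named groups matched in the file name.
        match = re.search(name_pattern, row["File name"])
        if match:
            row.update(match.groupdict())
    return row


# Usage: parse a reporting period out of the file name.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sales_2023-01.csv")
    with open(path, "wb") as f:
        f.write(b"id,amount\n")
    row = document_fields(path, r"sales_(?P<period>\d{4}-\d{2})")
```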

Note You can parse files with multiple data sections.

The parsing engine recognizes three types of data parts:

Header: Static part at the beginning of a file that contains information shared by all data in the file.

Footer: Static part at the end of a file that contains overall, aggregate, or total information for all data in the file.

Record(s): Repeating part of the file that contains grouped data with a common structure (unique within the file).

Note Fields specified within the header and footer are stored in the Document Fields table.
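As a rough illustration of how the three data parts relate, the Python sketch below splits a plain-text file into header, records, and footer. It assumes a hypothetical layout in which the header and footer have a known number of lines written as `key: value` pairs, and the records are CSV lines under a column row. The function `parse_sections` and its `header_lines`/`footer_lines` parameters are invented for this example; in Insight, these regions are defined in a template in Studio. Header and footer values land in a single document-level dictionary, matching the note above, while records form a list.

```python
import csv
import io


def parse_sections(text: str, header_lines: int, footer_lines: int) -> tuple[dict, list]:
    """Split a file into header, records, and footer (hypothetical layout)."""
    lines = text.splitlines()
    header = lines[:header_lines]
    footer = lines[len(lines) - footer_lines:] if footer_lines else []
    body = lines[header_lines:len(lines) - footer_lines]

    # Header and footer hold one value per field for the whole file,
    # so they become a single document-level row.
    doc_fields = {}
    for line in header + footer:
        key, _, value = line.partition(":")
        doc_fields[key.strip()] = value.strip()

    # Records repeat: the first body line names the columns, the rest follow it.
    records = list(csv.DictReader(io.StringIO("\n".join(body))))
    return doc_fields, records


sample = """Report: Sales
Region: North
id,amount
1,10
2,20
Total: 30"""
doc, recs = parse_sections(sample, header_lines=2, footer_lines=1)
```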

Template Groups, Files, Templates

Use template groups to organize templates logically, for example, templates from one source or for one file format. You can define one or more files within a template group, and each file (template folder) can contain one or more templates.
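The three-level hierarchy can be modeled as nested containers. The Python dataclasses below are a hypothetical illustration of that structure, not Insight's object model; the class names, the `template_paths` helper, and the sample group, file, and template names are all invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    name: str


@dataclass
class TemplateFile:
    """A file (template folder) holding one or more templates."""
    name: str
    templates: list[Template] = field(default_factory=list)


@dataclass
class TemplateGroup:
    """Logical grouping, for example all templates from one source or file format."""
    name: str
    files: list[TemplateFile] = field(default_factory=list)

    def template_paths(self) -> list[str]:
        # Flatten the hierarchy into "group/file/template" paths.
        return [
            f"{self.name}/{f.name}/{t.name}"
            for f in self.files
            for t in f.templates
        ]


group = TemplateGroup("ERP exports", [
    TemplateFile("Invoices", [Template("Invoice CSV v1"), Template("Invoice CSV v2")]),
    TemplateFile("Orders", [Template("Orders XML")]),
])
paths = group.template_paths()
```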