Direct TXT Output Converter Module

This module converts recognized text into output files. It uses the output of the recognition module as is, without reading-order or paragraph detection. DirectTXT outputs are therefore simpler and faster to produce than the Layout Retention Output conversions available in RecAPIPlus, because they skip these slow detection steps. The following DirectTXT output types are available:

  • DirectTXT Text output: a simple text file.

  • DirectTXT CSV output: a comma-separated values file, a simple format for representing tables. Microsoft Excel can read this format.

  • DirectTXT Formatted Text output: plain text that attempts to keep the page layout detected in the original image. The converter creates a text file that simulates columns and boxes with tab characters.

  • DirectTXT PDF output: contains the full image of the original page, with the recognized text placed behind the image on a separate layer (image-on-text PDF). Because these files contain both the original image and the recognized text, they are well suited to page archiving.

  • DirectTXT XML output: typically used for further processing of recognized data. The output XML file can easily be parsed, for example with MSXML, or transformed with XSLT. Its format is specified by the same schema as the Layout Retention XML Output; see http://www.scansoft.com/omnipage/xml/ssdoc-schema3.xsd.

  • DirectTXT Binary output: creates files directly from the recognition data, without any character conversion or formatting. It is the most suitable output format for barcodes that carry binary data, for example Code128 or PDF417 barcodes containing encrypted data.
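In a CSV file each table row is one line of comma-separated fields. As an illustration of the format only, the C sketch below formats one row of cell strings, quoting any field that contains a comma, a double quote or a line break and doubling embedded quotes, following the common RFC 4180 conventions. The function names are hypothetical and are not part of the SDK; whether the DirectTXT CSV converter applies exactly these quoting rules is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an SDK function): a field needs quoting when it
   contains a comma, a double quote, or a line break. */
int csv_needs_quotes(const char *field)
{
    return strpbrk(field, ",\"\r\n") != NULL;
}

/* Appends one field to the NUL-terminated string dst (capacity cap),
   wrapping it in quotes if needed and doubling embedded quotes.
   Returns 0 on success, or -1 (leaving dst unchanged) if dst is too small. */
int csv_append_field(char *dst, size_t cap, const char *field)
{
    assert(dst != NULL && field != NULL);
    size_t start = strlen(dst);
    size_t len = start;
    int quote = csv_needs_quotes(field);

    if (quote) {
        if (len + 1 >= cap) goto fail;
        dst[len++] = '"';
    }
    for (const char *p = field; *p != '\0'; ++p) {
        size_t need = (*p == '"') ? 2 : 1;   /* " is written as "" */
        if (len + need >= cap) goto fail;    /* keep room for the NUL */
        if (*p == '"') dst[len++] = '"';
        dst[len++] = *p;
    }
    if (quote) {
        if (len + 1 >= cap) goto fail;
        dst[len++] = '"';
    }
    dst[len] = '\0';
    return 0;

fail:
    dst[start] = '\0';   /* undo the partial write */
    return -1;
}

/* Hypothetical helper: formats one table row as a CSV record ending in CRLF.
   Returns 0 on success, -1 if dst is too small. */
int csv_format_row(const char *const *cells, size_t n, char *dst, size_t cap)
{
    if (cap == 0) return -1;
    dst[0] = '\0';
    for (size_t i = 0; i < n; ++i) {
        if (i > 0) {                        /* comma between fields */
            size_t len = strlen(dst);
            if (len + 1 >= cap) return -1;
            dst[len] = ',';
            dst[len + 1] = '\0';
        }
        if (csv_append_field(dst, cap, cells[i]) != 0) return -1;
    }
    size_t len = strlen(dst);
    if (len + 2 >= cap) return -1;
    memcpy(dst + len, "\r\n", 3);           /* CR, LF and the NUL */
    return 0;
}
```

For example, the cells `Name`, `Price, EUR` and `12" pipe` become the record `Name,"Price, EUR","12"" pipe"` followed by CRLF, which a spreadsheet application such as Excel reads back as three cells.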

If you specify the name of an existing file, TXT-type outputs are appended to it.
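In practice, this means a second conversion to the same file name adds to the end of the existing file instead of replacing it, so delete the old file first if you need a fresh one. The standard C sketch below illustrates that create-or-append behavior with the "ab" mode of fopen. The helper append_text_output is hypothetical and not an SDK function, and it is an assumption that the converter's append behaves like this at the byte level.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an SDK function): writes text to path, appending
   when the file already exists and creating it otherwise. Binary mode ("ab")
   avoids newline translation on Windows. Returns 0 on success, -1 on error. */
int append_text_output(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);
    FILE *f = fopen(path, "ab");
    if (f == NULL) return -1;

    size_t len = strlen(text);
    size_t written = fwrite(text, 1, len, f);
    /* fclose flushes buffered data; report failure if either step failed. */
    int close_failed = (fclose(f) != 0);
    return (written == len && !close_failed) ? 0 : -1;
}
```

Calling append_text_output twice with "page 1\n" and then "page 2\n" leaves a file containing both lines, in that order.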

The DTXT module is especially useful for applications that do not require formatting but where speed is important, for example indexing, archiving, or some form-processing applications. When programming with KernelAPI, this is your only output choice. Distribution licenses that exclude formatted output can also be purchased; see Prepare distribution file set.