Enhanced OCR Output Format Window - HTML

Use this window to control the output format for the HTML file generated by the Kofax Enhanced OCR Full Text recognition engine. The following rules apply:

  • The original page layout is retained where possible.
  • The text attributes (Bold, Italic, Underline) are always retained when the recognized data is saved to the output file.
  • The text color is always retained.
  • Pictures are detected where possible and embedded in the output file. The resolution of the pictures cannot be changed.

Output format

Changing the output format may make different options available. The settings of options that become unavailable are retained, so if you return to the earlier format, its most recent settings are used again.

You can select an output format from this list:

  • Plain Text (*.txt)

  • Rich Text Format (*.rtf)

  • HTML (*.mht)

  • Microsoft Word (*.doc)

  • Comma-Separated Values (*.csv)

  • Microsoft Excel (*.xls)

  • Microsoft Word 2007 and later (*.docx)

  • Microsoft Excel 2007 and later (*.xlsx)
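
The way settings are retained when you switch formats can be pictured as a separate settings store for each format. The following Python sketch is illustrative only. The class FormatSettings, its methods, and the option name "index_page" are hypothetical and are not part of the product or any Kofax API.

```python
class FormatSettings:
    """Hypothetical store that keeps the latest option values per output format."""

    def __init__(self):
        self._by_format = {}  # format name -> dict of option values
        self.current = None

    def select(self, fmt):
        # Switching formats never discards settings stored for other formats.
        self.current = fmt
        self._by_format.setdefault(fmt, {})

    def set_option(self, name, value):
        self._by_format[self.current][name] = value

    def get_option(self, name, default=None):
        return self._by_format[self.current].get(name, default)


settings = FormatSettings()
settings.select("HTML")
settings.set_option("index_page", "Frame")
settings.select("Plain Text")  # HTML options become unavailable, not lost
settings.select("HTML")        # returning to HTML restores its last settings
print(settings.get_option("index_page"))
```

In this sketch, returning to the HTML format finds the index page option still set to the value chosen earlier.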

Index page

Configure the index page of the exported HTML files. You can select from the following:

  • None: No index page is present.

  • Simple: The index page is stored as a separate file. This is the default setting.

  • Frame: The index page is a frame.

Suppress line breaks

Select this check box to discard line breaks from the original document when the recognized data is saved. If the check box is cleared, the line breaks are retained.
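
The effect of this option can be illustrated with a short Python sketch. It is a simplified model, not the engine's implementation. The function save_text is hypothetical, and the sketch assumes that recognized text arrives as a list of lines and that a suppressed line break is replaced by a single space.

```python
def save_text(lines, suppress_line_breaks):
    """Join recognized lines (hypothetical helper, one string per recognized line)."""
    # Assumption: a suppressed line break becomes a single space.
    separator = " " if suppress_line_breaks else "\n"
    return separator.join(lines)


recognized = ["The quick brown fox", "jumps over the lazy dog."]
suppressed = save_text(recognized, suppress_line_breaks=True)
retained = save_text(recognized, suppress_line_breaks=False)
print(suppressed)
print(retained)
```

With the option on, the two recognized lines are saved as one continuous line. With it off, they stay on separate lines.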

Use page break as page separator

Select this check box to use page breaks from the original document as page separators when the recognized data is saved. If the check box is cleared, the page breaks are ignored.
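
As a rough illustration of this option, the Python sketch below splits text into pages at page breaks. It is a simplified model under stated assumptions. The function split_pages is hypothetical, and representing a page break as a form-feed character is an assumption, not documented engine behavior.

```python
FORM_FEED = "\f"  # assumption: page breaks appear as form-feed characters


def split_pages(text, use_page_breaks):
    """Return a list of pages (hypothetical helper)."""
    if use_page_breaks:
        # Each page break starts a new page in the output.
        return text.split(FORM_FEED)
    # Page breaks are ignored: all text stays on a single page.
    return [text.replace(FORM_FEED, "")]


document = "Page one text\n\fPage two text\n"
separated = split_pages(document, use_page_breaks=True)
merged = split_pages(document, use_page_breaks=False)
print(len(separated), len(merged))
```

With the option on, the sample text yields two pages. With it off, the same text is saved as a single page.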