HTML tab

Use this tab to set the parameters for saving the recognized text as an HTML file.

Field name Description
Format
  • Simple (compatible with all browsers) — If you set this option, the HTML 3 format is used. The document layout is retained approximately: first-line indent and indents in tables are not retained. This HTML format is supported by all browsers (Netscape Navigator, Internet Explorer 3.0 and later).
  • Full (CSS) — If you set this option, the HTML 4 format is used. It supports every level of document layout retention. A built-in style sheet is used.
Synthesis mode This option allows you to specify whether the page layout should be retained partly or completely. Select one of the following options:
  • Retain paragraphs and fonts — Retains the structure of tables, arrangement into paragraphs, font, and font size.
  • Retain full document logical structure — Retains the layout in full: arrangement into paragraphs, font and font size, columns, text direction, text color, the structure of tables.
  • Retain only paragraphs — Retains only the structure of tables and arrangement into paragraphs.
    This option appears when you select Simple from the Format drop-down list box.
Keep line breaks Set this option if you want the original arrangement of lines to be retained. Otherwise, the text is formatted as a single line in the HTML file.
Keep text color Set this option if you want the original character color to be retained. If Retain only paragraphs is selected in the Synthesis mode combo box, this check box is disabled.
Use solid line as page breaks The original arrangement of pages is retained, and pages are separated by a solid line.
This option is available only when the Format option is set to Simple.
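The dependencies between Format, Synthesis mode, Keep text color, and Use solid line as page breaks can be summarized in a short sketch. The class and function names below are hypothetical and exist only to illustrate the rules described above; they are not part of the product's API.

```python
from dataclasses import dataclass


# Hypothetical model of the dialog's settings; not a real product API.
@dataclass
class HtmlExportOptions:
    format: str = "Full"  # "Simple" or "Full"
    synthesis_mode: str = "Retain paragraphs and fonts"
    keep_text_color: bool = False
    solid_line_page_breaks: bool = False


def check_options(opts: HtmlExportOptions) -> list[str]:
    """Return a list of conflicts between the selected options."""
    problems = []
    # "Retain only paragraphs" appears only when Format is Simple.
    if opts.synthesis_mode == "Retain only paragraphs" and opts.format != "Simple":
        problems.append("Retain only paragraphs requires Format = Simple")
    # "Keep text color" is disabled when only paragraphs are retained.
    if opts.keep_text_color and opts.synthesis_mode == "Retain only paragraphs":
        problems.append("Keep text color is disabled with Retain only paragraphs")
    # "Use solid line as page breaks" is available only for Simple.
    if opts.solid_line_page_breaks and opts.format != "Simple":
        problems.append("Use solid line as page breaks requires Format = Simple")
    return problems
```

An empty list means the combination is one the dialog would allow.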
Keep pictures Select this check box to save the pictures together with the recognized text.
Picture formats Select the color format of the pictures embedded in the exported HTML file:
  • Auto — Format is defined automatically.
  • Color — Exports images in color.
  • Gray — Exports images in grayscale.
  • B&W — Exports images in black and white.
Reduce picture resolution to Sometimes you may want to reduce the image resolution. For example, HTML files are usually viewed in a browser, so there is no point in saving high-resolution pictures in such files. You can reduce the image resolution, and thereby the HTML file size, without visibly losing image quality: enter the necessary resolution value in this field.
  • If you enter a higher resolution value than the source one in the Reduce picture resolution to field, this value will be ignored; the pictures will be saved using the source resolution.

  • Usually, the value of 150 dpi is suitable.

  • If a PDF file has been added to the batch, the resolution of its pictures will not exceed 300 dpi when it is exported to PDF, PDF/A, HTML, RTF, or PPT format.
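The rules in the notes above can be expressed as a small calculation. This is only a sketch: the function and constant names are hypothetical, and applying the 300 dpi limit as a simple upper bound on top of the other rule is an assumption about how the two notes combine.

```python
# Upper limit for pictures that come from a PDF source file (from the note above).
PDF_SOURCE_CAP_DPI = 300


def effective_resolution(source_dpi: int, requested_dpi: int, from_pdf: bool = False) -> int:
    """Resolution at which a picture would be saved (hypothetical helper)."""
    # A requested value higher than the source resolution is ignored.
    dpi = min(source_dpi, requested_dpi)
    # Assumption: the PDF limit is applied as an additional upper bound.
    if from_pdf:
        dpi = min(dpi, PDF_SOURCE_CAP_DPI)
    return dpi
```

For example, a 600 dpi picture with 150 entered in the field is saved at 150 dpi, while a 100 dpi picture stays at 100 dpi.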

JPEG quality

This property specifies the JPEG quality, in percent, for color pictures saved in HTML format. The image is compressed with a lossy algorithm: the technique averages groups of pixels, so that a whole region is stored as a single value rather than as a separate value for each pixel. The higher the value you specify in this field, the higher the quality of the saved image. The default value for this property is 75%.

This value is ignored for PNG pictures.
HTML fields Click to open the HTML Fields dialog box and set property values.
When HTML is used as the output file format, the result is an HTML file plus the image files that it references. Renaming the images breaks these internal links, so do not use the rename schema when exporting to HTML format.
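To illustrate why renaming breaks the output, the following sketch lists the picture references in an HTML file that no longer match any file name. It assumes that pictures are referenced through the src attribute of img tags; the class name, function name, and file names are hypothetical.

```python
from html.parser import HTMLParser


class ImageRefCollector(HTMLParser):
    """Collects the src values of all img tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # The parser passes tag and attribute names in lower case.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def missing_images(html_text, existing_files):
    """Return the referenced pictures that are absent from existing_files."""
    collector = ImageRefCollector()
    collector.feed(html_text)
    collector.close()
    return [src for src in collector.sources if src not in existing_files]
```

If an exported picture is renamed after the HTML file has been written, its original name appears in the returned list, because the HTML file still refers to the old name.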