PDF Image + Text Output Format Window for Text Under Image

When the recognition profile is set to Kofax PDF Text Under Image with the PDF Image + Text engine, use this window to configure PDF output preferences.

Note For new Kofax Capture 10.1 or 10.2 installations, Kofax PDF Text Under Image appears on the list of recognition profiles by default, and Kofax PDF Image + Text is no longer listed. However, you can continue to use any Kofax PDF Image + Text recognition profile created in an earlier version of Kofax Capture. If you import a batch class that uses the Kofax PDF Image + Text profile, it is added to the list of available recognition profiles.

Output format

Select the output format that is generated by the PDF Image + Text recognition engine and saved to an external file:

  • Kofax PDF: File format that lets you view a document on any computer system while preserving the layout.

  • Kofax PDF/A: File format based on PDF that supports the long-term preservation of digital documents. PDF/A files are often larger than standard PDF files. This option also adds tags to the PDF document.

Depending on the output format you select, some other settings in this window may not be available.

Page Content

Select the structure of the PDF pages generated by the PDF Image + Text recognition engine:

  • Text Over Image: The recognized text is saved over the entire page image. The entire image is saved as a bitmap; however, text areas are saved as text (with full text search capability) over the bitmap. With this option, you can select and copy any of the text.

  • Text Under Image: The recognized text is saved under the entire page image. This is the default selection. The entire image is saved as a bitmap, and the recognized text is placed beneath it. This option is useful if you export your text to document archives, because the full page layout is retained and full text search is available. You can select and copy the underlying text. Of the three choices, this option tends to produce the largest output file.

  • Text and Image: The recognized text is saved as text (with full text search capability) and images are saved as bitmaps. The original document design (font, background, and layout markings) is not retained. Of the three choices, this option tends to produce the smallest output file.

Text Settings

Select the text attributes you want to be retained when the recognized data is saved to the output file. For example, if you want to retain characters that are bold in the original document, select the Bold option.

Replace uncertain words with images

Use this option to replace words that the engine cannot recognize with small image snippets of those words, clipped from the original image file.

Note Depending on the image, in some cases the PDF Image + Text recognition engine may not be able to determine bold or italic text attributes, even when those options are selected in the window.

Note that the text attribute settings behave differently depending on the Page Content setting.

  • With Text Over Image, the text attributes are output as selected, and the output text retains its original color.

  • With Text Under Image, text attribute selections are ignored, and the output text is always black text on a white background.

  • With Text and Image, the text attributes are output as selected, and the output text is always black text on a white background.
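The interaction between the Page Content setting and the text attribute settings can be summarized as a lookup. The following is a minimal Python sketch, not Kofax code: the function name `text_rendering_rules` and its returned keys are hypothetical, and it encodes only the behavior described in the list above.

```python
# Hypothetical sketch: how each Page Content option treats the text
# attribute settings and text color, as described in this window's help.
# Not part of any Kofax API.

def text_rendering_rules(page_content: str) -> dict:
    """Return how text attributes and text color are handled for a Page Content option."""
    rules = {
        "Text Over Image": {"attributes_applied": True, "text_color": "original"},
        "Text Under Image": {"attributes_applied": False, "text_color": "black on white"},
        "Text and Image": {"attributes_applied": True, "text_color": "black on white"},
    }
    if page_content not in rules:
        raise ValueError(f"Unknown Page Content option: {page_content!r}")
    return rules[page_content]


for option in ("Text Over Image", "Text Under Image", "Text and Image"):
    print(option, text_rendering_rules(option))
```

With Text Under Image, for example, the sketch reports that attribute selections are not applied, matching the documented behavior.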

Retain text color

Select this option to retain the color of the text in the original document when the recognized data is saved. If this option is cleared, the original color is ignored.

Embed recognized text fonts

The Embed recognized text fonts option is not available.

Resolution

Set the resolution of the images for the PDF document being saved as an output file. You can select from the following output resolutions in dots per inch:

  • 72

  • 96

  • 120

  • 200

  • 240

  • 300

  • 360

  • 400

  • 600

Note If necessary, you can improve image quality by increasing the resolution from the default of 72 dpi. The output resolution cannot exceed the resolution of the original image. For example, if the original image resolution of the scanned page is 200 dpi and the resolution is set to 300 dpi, the image resolution of the output file is 200 dpi rather than 300 dpi.
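The resolution rule in the note above amounts to taking the smaller of the selected value and the original image resolution. The following Python sketch illustrates that rule under stated assumptions: the function name `effective_output_dpi` is hypothetical and not part of Kofax Capture, and only the dpi values listed in this window are accepted.

```python
# Hypothetical sketch of the documented resolution rule: the output
# resolution is the selected value, capped at the original image resolution.
# The function name and signature are illustrative only.

SUPPORTED_DPI = (72, 96, 120, 200, 240, 300, 360, 400, 600)
DEFAULT_DPI = 72


def effective_output_dpi(selected_dpi: int, original_dpi: int) -> int:
    """Return the resolution actually written to the output file."""
    if selected_dpi not in SUPPORTED_DPI:
        raise ValueError(f"{selected_dpi} dpi is not an available setting")
    # The output can never exceed the resolution of the scanned page.
    return min(selected_dpi, original_dpi)


print(effective_output_dpi(300, 200))          # 200, as in the example above
print(effective_output_dpi(DEFAULT_DPI, 300))  # 72
```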

Compression format

Use the list to select a compression format (CCITT4, JPEG, or JPEG 2000) for PDF output.

JPEG is selected by default and is supported for color and grayscale images. If you process bitonal images while JPEG is selected, CCITT4 is used instead.

Note If you select PDF/A, the JPEG 2000 format is not supported.
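The JPEG fallback for bitonal images can be expressed as a small decision function. This Python sketch models only the rule stated above (JPEG for color and grayscale, CCITT4 substituted for bitonal images); the function name `effective_compression` and the color-mode labels are hypothetical, and other combinations are passed through unchanged because this page does not describe them.

```python
# Hypothetical sketch of the documented JPEG fallback. Only the stated rule
# is modeled: when JPEG is selected, bitonal images use CCITT4 instead.
# Function name and color-mode labels are illustrative only.

FORMATS = ("CCITT4", "JPEG", "JPEG 2000")
COLOR_MODES = ("color", "grayscale", "bitonal")


def effective_compression(selected: str, color_mode: str) -> str:
    """Return the compression format actually used for one image."""
    if selected not in FORMATS:
        raise ValueError(f"Unknown compression format: {selected!r}")
    if color_mode not in COLOR_MODES:
        raise ValueError(f"Unknown color mode: {color_mode!r}")
    if selected == "JPEG" and color_mode == "bitonal":
        return "CCITT4"
    return selected


print(effective_compression("JPEG", "color"))    # JPEG
print(effective_compression("JPEG", "bitonal"))  # CCITT4
```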

JPEG Quality

Use this setting to specify the JPEG quality for color pictures saved in the output file. Quality ranges from 1% to 100%, where 100% is the best quality.

PDF Version

Select one of the following PDF versions to use for the output:

  • Auto

  • 1.3

  • 1.4

  • 1.5

  • 1.6

  • 1.7

When you select Auto, the application automatically determines the PDF version number.

PDF/A Compliance

If Kofax PDF/A is selected as the output format, select the level of PDF/A compliance:

  • PDF/A-1a (default setting): Supports long-term storage of digital documents and fully satisfies the requirements in the ISO 19005-1 specification. Readable by any PDF reader that conforms to PDF 1.4 or later.

  • PDF/A-1b: Satisfies only the minimal requirements in the ISO 19005-1 specification, and therefore offers a lower level of compliance than PDF/A-1a.

  • PDF/A-2a: Offers the same level of compliance as PDF/A-1a, and adds support for JPEG 2000 compression to reduce file sizes. Satisfies the requirements in the ISO 19005-2 specification. Readable by any PDF reader that conforms to PDF 1.7.

  • PDF/A-2u: Offers the same level of compliance as PDF/A-2a, and adds the ability to extract text in Unicode.

  • PDF/A-3a: Permits embedding another PDF/A file, or a file in another format (such as XML or Microsoft Office), within the PDF/A file. Satisfies the requirements in the ISO 19005-3 specification. Otherwise, the same as PDF/A-2a.

    Note With PDF/A-3a compliance, only the PDF/A file, rather than any embedded file, should be considered for archiving purposes.

  • PDF/A-3u: Offers the same level of compliance as PDF/A-3a, and adds the ability to extract text in Unicode.
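Each compliance level corresponds to one part of the ISO 19005 standard, identified by the digit in the level name. The following Python sketch shows that mapping for the levels offered in this window; the helper `iso_part` and the constant names are hypothetical illustrations, not a Kofax API.

```python
# Hypothetical sketch: maps each PDF/A compliance level offered in this
# window to the part of ISO 19005 it satisfies. Level names come from the
# window; the helper itself is illustrative only.

PDFA_LEVELS = ("PDF/A-1a", "PDF/A-1b", "PDF/A-2a", "PDF/A-2u", "PDF/A-3a", "PDF/A-3u")
DEFAULT_PDFA_LEVEL = "PDF/A-1a"


def iso_part(level: str) -> str:
    """Return the ISO 19005 part for a compliance level, e.g. 'ISO 19005-2'."""
    if level not in PDFA_LEVELS:
        raise ValueError(f"Unsupported PDF/A level: {level!r}")
    # The digit right after 'PDF/A-' is the ISO 19005 part number.
    part = level[len("PDF/A-")]
    return f"ISO 19005-{part}"


for level in PDFA_LEVELS:
    print(level, "->", iso_part(level))
```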

Add tags to document

Adds PDF tags to the PDF document to specify document structure and allow the extraction of page content. Tags are useful for reflowing text and graphics, conversion to HTML and XML file formats, and interpretation by assistive software for the visually impaired.

Image Compression

Select an image compression profile from the list.

Edit button

Modify an existing image compression profile or create a new one. The Image Compression Profiles window appears so you can specify the type of image compression to use.