PDFTextInput

From INVOICES 5-9

Description

This flag lets the OmniPage engine extract the data (text) layer of a PDF instead of relying only on the OCR that the interpretation engine normally performs on the image. When the flag is turned on, the OmniPage engine extracts the PDF's data layer, if one exists.

The following items need to be considered when using this functionality:

  • Interpreting the PDF takes longer than when only OCR is performed, because both processes are run and their results are compared. This is due to an OmniPage security feature that checks that the data layer is consistent with the OCR of the image.
  • Because the data layer can contain any information, there is a risk if the data in the PDF has been manipulated. Although the OmniPage security feature should detect inconsistencies between the data layer and the OCR results, manipulated data could still be passed on to Verify.
  • As a result, when the data layer is used, values visible in the image may not match the values extracted from the data layer.
  • If a PDF does not contain a data layer and this functionality is turned on, normal OCR is performed on the image.
  • Neither the OmniPage engine nor the PDF standard provides a way to detect whether the data layer was used or OCR was performed on the image. As a result, nothing can be written to the process log to tell the user whether values were extracted from the data layer or OCRed from the image (as happens when the PDF has no data layer).
  • If ABBYY is used as the OCR engine, this flag has no effect.
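The considerations above can be summarized as a small decision model. This is a minimal Python sketch, not the product's implementation: the `PdfInput` type, the `interpretation_passes` function, and the engine names passed as strings are hypothetical. The returned list only illustrates which passes this model expects to run; the real engine does not report this, which is why the process log cannot show it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PdfInput:
    # Hypothetical stand-in for a source PDF; text_layer is None when the
    # PDF has no data (text) layer.
    text_layer: Optional[str]


def interpretation_passes(pdf_text_input: int, ocr_engine: str, pdf: PdfInput) -> List[str]:
    """Return the passes this model expects for one PDF.

    Hypothetical model only: the real engine does not expose which source
    was used, so this information cannot be read from the process log.
    """
    if pdf_text_input not in (0, 1):
        raise ValueError("PDFTextInput must be 0 or 1")
    # The flag only takes effect with OmniPage; with ABBYY it has no effect,
    # so behave as with the default value (0): OCR on the bitmap image.
    if ocr_engine != "OmniPage" or pdf_text_input == 0:
        return ["ocr"]
    # Flag on, but the PDF has no data layer: normal OCR on the image.
    if pdf.text_layer is None:
        return ["ocr"]
    # Flag on and a data layer exists: the layer is extracted and OCR still
    # runs so the two can be compared, which is why interpretation is slower.
    return ["text_layer", "ocr"]


print(interpretation_passes(1, "OmniPage", PdfInput(text_layer="INV-1001")))
# ['text_layer', 'ocr']
```

Modelling the result as a list of passes, rather than a single source, reflects that with the flag on and a data layer present, OCR still runs alongside the extraction.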

Valid values

  • 0 - the bitmap image is used to interpret invoices (default)
  • 1 - the text and image layers of the source PDF file are used to interpret invoices
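Because only 0 and 1 are valid, a tool that reads this setting might validate it and fall back to the default when it is unset. A minimal Python sketch, assuming the value arrives as a string; where and how INVOICES stores the flag is not covered here, and `parse_pdf_text_input` is a hypothetical helper.

```python
from typing import Optional

# Documented values of PDFTextInput.
PDF_TEXT_INPUT_VALUES = {
    0: "bitmap image used to interpret invoices",
    1: "text and image layers of the source PDF used to interpret invoices",
}


def parse_pdf_text_input(raw: Optional[str]) -> int:
    """Parse a raw setting string into 0 or 1, defaulting to 0 when unset."""
    if raw is None or not raw.strip():
        return 0  # documented default
    value = int(raw.strip())  # raises ValueError for non-numeric input
    if value not in PDF_TEXT_INPUT_VALUES:
        raise ValueError(f"PDFTextInput must be 0 or 1, got {value}")
    return value


for raw in (None, "1", " 0 "):
    value = parse_pdf_text_input(raw)
    print(repr(raw), "->", value, PDF_TEXT_INPUT_VALUES[value])
```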