Troubleshooting tips

The OPOCR component provides detailed logging that can be used for troubleshooting. "Jobxml" is the data sent to the OCR engine along with each file to be read. It contains the configuration settings that tell the engine how to process the file. This information is useful for reproducing and troubleshooting issues, and for obtaining fixes. The Jobxml information is now written to the logs and can be saved to disk.

OCR_Last and OCR_Failed output folders
The OPOCR component creates two folders in the AutoStore program files folder. They are OCR_Failed and OCR_Last. Each is disabled by default, and can be enabled by renaming the subfolder contained within each one from "Disabled" to "Enabled". Note that this feature may be removed or changed in a later update.
  • OCR_Last captures the last job run through OCR, regardless of whether it failed or succeeded. It will only ever contain the last job. It will contain the Jobxml and the actual file that was processed.
  • OCR_Failed captures every failed job run through OCR. It can become cluttered with thousands of files, so enable it only when necessary. It will contain the Jobxml and the actual file that was processed for every failed job. Each failed job is saved to a unique subfolder.
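
The rename described above can also be scripted. The following is a minimal Python sketch, not part of the product: the function name `set_capture_enabled` is hypothetical, and the location of the AutoStore program files folder is something you must supply yourself.

```python
from pathlib import Path


def set_capture_enabled(autostore_dir, folder_name, enabled):
    """Switch OCR_Last or OCR_Failed capture on or off by renaming its subfolder.

    autostore_dir is the AutoStore program files folder (supplied by the caller).
    Returns the path of the subfolder after the change.
    """
    if folder_name not in ("OCR_Last", "OCR_Failed"):
        raise ValueError("folder_name must be 'OCR_Last' or 'OCR_Failed'")

    parent = Path(autostore_dir) / folder_name
    current = parent / ("Disabled" if enabled else "Enabled")
    target = parent / ("Enabled" if enabled else "Disabled")

    if target.is_dir():
        # Already in the requested state, nothing to rename.
        return target
    if not current.is_dir():
        raise FileNotFoundError(f"Expected subfolder not found: {current}")

    current.rename(target)
    return target
```

For example, `set_capture_enabled(Path(r"C:\path\to\AutoStore"), "OCR_Failed", True)` would enable capture of failed jobs, where the path is a placeholder for your own installation folder. Calling it again with `False` restores the default state.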

Known issues

Each entry below gives the problem description, followed by the solution.
AutoStore jobs fail or hang

The RESULTS WAIT TIMEOUT option for the OPOCR component in the configuration (.cfg) file of an AutoStore process controls how long the AutoStore workflow service waits before restarting the OCR engine when it is not responding. By default, this value is set to 60 minutes. You can edit the configuration file and change this interval to a different value. If the interval is too short, a lengthy OPOCR task may be terminated prematurely and result in a failed job.

You may want to set the RESULTS WAIT TIMEOUT option to a shorter value if most of your documents are short, long documents are rare, and you do not want to wait the full interval before task processing continues. For example, if nearly all of your documents require less than 15 minutes, you can set this value to RESULTS WAIT TIMEOUT = 15, and then manually handle any document that takes longer than 15 minutes after OPOCR times out and fails the job. You can activate the OCR_Failed folder to collect OCR jobs that time out.
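
If you have measured how long your documents take to process, the trade-off above can be made concrete. The following Python sketch is only an illustration: the function `suggest_timeout_minutes` and the nearest-rank rule it uses are assumptions, not something the product provides. It returns the smallest whole number of minutes that covers a chosen fraction of the observed processing times.

```python
import math


def suggest_timeout_minutes(durations_min, coverage=0.99):
    """Suggest a RESULTS WAIT TIMEOUT value, in whole minutes.

    durations_min: observed OCR processing times in minutes (your own measurements).
    coverage: fraction of documents that should finish within the timeout.
    """
    if not durations_min:
        raise ValueError("at least one measured duration is required")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in the range (0, 1]")

    ordered = sorted(durations_min)
    # Nearest-rank percentile: the smallest observed duration that covers
    # the requested fraction of documents.
    rank = math.ceil(coverage * len(ordered))
    # Round up to whole minutes and never suggest less than 1 minute.
    return max(1, math.ceil(ordered[rank - 1]))
```

Documents slower than the suggested value would time out and fail, so they still need to be handled manually as described above.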

Poor-quality OCR results

Inaccuracies in the OCR process can have many causes. Before setting up OCR processes, analyze the types of paper, scanners, and resolution levels you use to optimize your OCR results.

The following are some common tips for increasing OCR accuracy.

  1. File format: Color documents do not capture image details accurately. When the process input is a color image, the OCR results are of lower quality. Review your color document requirements and consider higher-resolution scanning to increase accuracy.
  2. Document quality: Low-quality paper documents are another major cause of lower OCR accuracy, because they generally increase the OCR error rate. When working with such documents, consider the following factors to increase your OCR accuracy:
    • Try to obtain higher-quality paper documents.
    • Consider a scanner with a different bulb color, which might work better with the paper color of your document.
    • Test a higher scan resolution.
    • Consider using image processing before OCR to clean up the image.

When you export a file to HTML format, the images are not displayed in the output file.

This problem may appear when you use a rename schema. When HTML is the output file format, the output consists of an HTML file and the image files that the HTML file references. If you rename the images, the internal links are broken. Therefore, do not use the rename schema when exporting to HTML format.
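
To confirm that renamed images are the cause, you can check whether the image files referenced by the exported HTML file still exist next to it. The following Python sketch uses only the standard library. The function name `missing_images` is hypothetical, and the sketch assumes the HTML file references its images through relative `src` attributes on `img` tags.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class _ImgCollector(HTMLParser):
    """Collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def missing_images(html_path):
    """Return the relative image references in html_path that do not resolve to a file."""
    html_path = Path(html_path)
    parser = _ImgCollector()
    parser.feed(html_path.read_text(encoding="utf-8", errors="replace"))

    missing = []
    for src in parser.sources:
        parsed = urlparse(src)
        if parsed.scheme or parsed.netloc:
            # Absolute URLs and data: URIs are not local files, so skip them.
            continue
        if not (html_path.parent / unquote(parsed.path)).is_file():
            missing.append(src)
    return missing
```

A non-empty result means that the HTML file points at image names that no longer exist in the output folder, which is the symptom a rename schema produces.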

A setting of the output document has a value different from the one specified.

Make sure that the setting was specified correctly. If a setting is defined incorrectly or uses an RRT that is replaced with an incorrect value, the component replaces the incorrect value with the default value at run time, if a default value exists.

When using the Zoned OCR Matches wildcard validation setting on a 5-digit zip code, the validation might fail.

Use [#][#][#][#][#] to validate the zip code.

When using the Zoned OCR Matches regular expression validation setting on a 5-digit zip code, the validation fails if you use multipliers {...}.

Use the following to validate the zip code:

(0|1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)
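
The expression above is five single-digit alternations written out in full, which avoids the multiplier. The following Python sketch builds the same pattern and checks candidate values before you enter the expression in the validation setting. It uses Python's `re` module, whose dialect may differ from the one the component uses, and it assumes that the whole field value must match. The names `ZIP_NO_MULTIPLIER` and `is_five_digit_zip` are hypothetical.

```python
import re

# One digit, written as an explicit alternation instead of a character class.
DIGIT = "(0|1|2|3|4|5|6|7|8|9)"

# Five digits in a row, without using a {5} multiplier.
ZIP_NO_MULTIPLIER = DIGIT * 5


def is_five_digit_zip(text):
    """Return True if text consists of exactly five digits 0-9."""
    return re.fullmatch(ZIP_NO_MULTIPLIER, text) is not None
```

In Python, this pattern accepts the same strings as `[0-9]{5}`, so values such as 02134 pass while 1234 and 12a45 do not.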

The OCR engine may return error 0x8004C60A if the document has empty or bad pages.

On the Advanced tab, set At page compression error to Replace corrupted page with blank so that the rest of the document is processed.