Background processing

Background processing is a ShareScan feature designed to speed up the workflow at the MFP and improve the user experience. Users no longer have to wait at the device for their scan jobs to complete.

Documents are created in the background using a lower-priority queue. This keeps devices responsive and keeps users from being held up, even during peak periods. Because of the lower priority, document creation may take longer. How much longer depends on factors such as job size, available system resources, scan resolution, and whether the job is color or BW.

Document services can run in background processing if they are used with express profiles (no user interaction). Before 5.2 SP2, document services always processed the scanned images in real time. The user had to wait for this processing to complete before any further screen appeared. In v5.4, however, the user can leave the MFP much earlier if a document service is configured with an express profile.

Kofax recommends using background processing in all scenarios. If the user tries to save a connector profile in the Administration Console with background processing disabled, a warning message pops up requiring confirmation.

The following ShareScan functionalities can be particularly useful with background processing:

  • Notification service lets you configure notifications for your document workflows. Notifications are especially useful with background-processed jobs. In these cases the user gets no direct feedback on the MFP UI if there is a problem with document creation or delivery.
  • Activity Tracking Report lets you generate and view reports on activity levels and actions on a configurable basis (for example, per device or per connector).
  • Document Tracking lets you monitor individual documents as they pass through the ShareScan workflow, including background processing.

Background processing settings

You can set background processing for your connector profiles via the ShareScan Administration Console on the Settings pane of the Connectors tab in the Workflow group.

Background processing is enabled by default for all connector profiles if ShareScan is installed on a clean system. In an upgrade scenario, older profiles are migrated with their existing settings retained. Background processing is disabled for them, however, and must be enabled manually.

The following settings are available:

  • Background Processing: Select the Enabled check box next to this setting to allow the connector profile to use background processing.
  • Bypass redirect screen: If enabled, users bypass the Redirect screen and return to the Main form immediately.
  • Logoff automatically: This option can only be enabled if the Bypass redirect screen option is turned on. If enabled, the user is logged off when the job is put into the processing queue.

The maximum number of OutputCreator.exe processes that perform document building and OCR is determined by the MaxNumberOfOutputCreators advanced setting.

If it is set to 0, the system default is used, which is as follows:

  • 2 Output Creators if there are 2 CPU cores
  • 6 Output Creators if there are 4 CPU cores
  • 12 Output Creators if there are 8 CPU cores

Increasing this number above the defaults is not recommended. However, if available memory is low, you can set the maximum to a value lower than the default.
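The sizing rule above can be modelled as a simple lookup. The following Python sketch is illustrative only. `resolve_max_output_creators` is a hypothetical helper, not part of ShareScan. It covers only the core counts documented above (2, 4, and 8). The default for other core counts is not stated here, so the sketch raises an error rather than guess.

```python
# Hypothetical helper modelling the documented MaxNumberOfOutputCreators
# behaviour; ShareScan does not expose this function.

# System defaults documented for a setting value of 0, keyed by CPU core count.
DOCUMENTED_DEFAULTS = {2: 2, 4: 6, 8: 12}


def resolve_max_output_creators(setting: int, cpu_cores: int) -> int:
    """Return the effective maximum number of OutputCreator.exe processes."""
    if setting < 0:
        raise ValueError("MaxNumberOfOutputCreators cannot be negative")
    if setting > 0:
        # An explicit value overrides the system default.
        return setting
    # 0 means "use the system default"; only documented core counts are known.
    if cpu_cores not in DOCUMENTED_DEFAULTS:
        raise ValueError(f"no documented default for {cpu_cores} CPU cores")
    return DOCUMENTED_DEFAULTS[cpu_cores]


if __name__ == "__main__":
    print(resolve_max_output_creators(0, 4))  # 6: system default for 4 cores
    print(resolve_max_output_creators(3, 4))  # 3: lowered for a low-memory server
```

A setting of 0 maps to the documented default. On a low-memory server, an explicit lower value takes precedence.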

Advanced settings

Advanced background processing settings can be reached via the Administration Console under Settings / Advanced ShareScan Settings. These advanced settings are stored in the database.

  • By default, all non-completed jobs are moved to background processing after 4 minutes. To change this, edit the OnlineProcessingTimeout advanced setting in the Administration Console under Settings / Advanced ShareScan Settings.
  • By default, a job can remain in the background processing queue for a maximum of 4000 seconds. After that, the document is considered a failed job and is marked as such. We advise raising this timeout when a large number of documents are processed regularly, or when processing time-consuming jobs. To do this, set the OutputCreatorRetryCreationWithBackgroundAndDT advanced setting in the Administration Console under Settings / Advanced ShareScan Settings. The value is in seconds (default: 1000) and is multiplied by 4, because there are always four retries. The default therefore yields the 4000-second limit.
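The queue timeout arithmetic can be checked with a short Python sketch. The function names are hypothetical. Only the setting name, the 1000-second default, and the four retries come from the text above. The sketch does not read or write the ShareScan database.

```python
# Hypothetical helpers illustrating the documented arithmetic for
# OutputCreatorRetryCreationWithBackgroundAndDT; not a ShareScan API.

RETRY_COUNT = 4  # the text states there are always four retries
DEFAULT_RETRY_SECONDS = 1000  # documented default value of the setting


def background_queue_timeout(setting_seconds: int = DEFAULT_RETRY_SECONDS) -> int:
    """Total seconds a job may stay in the background queue before it fails."""
    return setting_seconds * RETRY_COUNT


def setting_for_target(total_seconds: int) -> int:
    """Smallest setting value giving a total timeout of at least total_seconds."""
    # Ceiling division, so the resulting timeout never falls short of the target.
    return -(-total_seconds // RETRY_COUNT)


if __name__ == "__main__":
    print(background_queue_timeout())       # 4000 (the documented default)
    print(setting_for_target(2 * 60 * 60))  # 1800, giving 7200 s (2 hours)
```

For example, to let jobs stay in the queue for up to two hours, the setting would be 1800 rather than 7200.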

Performing OCR

Scanning offers a highly efficient way to distribute and store business documents. However, the file format and scan resolution must be chosen carefully. Applying inappropriate settings may result in poor image quality or unacceptably large files.

Scanning converts a printed page into an image. The OCR process can convert these images into various document formats. Examples are PDF (especially Searchable PDF) and Microsoft Office formats such as DOC or DOCX. This is a resource-intensive process. The application creating the output file must intelligently recognize different regions as text or graphics and apply the appropriate compression technology to each region.

Creating high-quality output via OCR involves a large amount of raw data and system resources. Running OCR online is therefore a poor choice, as it degrades the user experience. Routing such jobs to background processing is an effective way to improve both the user experience and the distribution of system resources.

OCR-related tips

  • Use OCR only if the backend service requires it (for example, document storage that uses a full-text index or search feature). We advise leaving the OCR engine turned off by default and giving users the option to turn it on for specific jobs.
  • When scanning a large number of documents with OCR enabled, using 200 dpi grayscale or 300 dpi BW is recommended for optimal performance.
  • If accuracy is not a key factor, you can configure the OCR engine to favor speed. This applies only to jobs that use OCR and is controlled by the SetOCRSpeed registry setting. The following options are available:
    • 1 - 3-way voting, prefer accuracy
    • 2 - 2-way voting, prefer accuracy
    • 3 - 3-way voting, prefer speed
    • 4 - 2-way voting, prefer speed
    • 5 - fastest OCR (default)
  • Restrict the output document formats to those you actually need. Several output types (for example, DOC, DOCX, XLS, XLSX, and Searchable PDF) perform OCR and therefore consume more server resources.
  • For OCR-intensive environments, we advise using ShareScan Load Balancing to improve scalability, processing times, and network load distribution. For details, consult the eCopy ShareScan High Availability and Load Balancing Deployment Guide.
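The SetOCRSpeed options listed above can be captured as an enumeration. This is useful, for example, in an administrative script that validates a value before someone writes it to the registry. The following Python sketch is hypothetical. The text does not give the registry path or value type, so the sketch only validates values and never touches the registry.

```python
from enum import IntEnum


class OcrSpeed(IntEnum):
    """Documented values of the SetOCRSpeed registry setting."""

    THREE_WAY_ACCURACY = 1  # 3-way voting, prefer accuracy
    TWO_WAY_ACCURACY = 2    # 2-way voting, prefer accuracy
    THREE_WAY_SPEED = 3     # 3-way voting, prefer speed
    TWO_WAY_SPEED = 4       # 2-way voting, prefer speed
    FASTEST = 5             # fastest OCR (default)


def parse_ocr_speed(raw: str) -> OcrSpeed:
    """Validate a raw value; raises ValueError if it is not an integer from 1 to 5."""
    return OcrSpeed(int(raw.strip()))


if __name__ == "__main__":
    print(parse_ocr_speed("3").name)  # THREE_WAY_SPEED
```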

Caveats of using online profiles on Web-based devices

Processing a job online (rather than in the background) has the following disadvantages:

  • the user has to spend more time at the MFP device
  • the online job may time out on certain device vendors or families. More specifically, some web browser-based devices (where the ShareScan UI is displayed in a web browser) might close the browser, and with it the ShareScan application, if there is no UI activity for a specific amount of time. This time-out period can be configured on the device.

A job is processed online if background processing is turned off in the connector profile, that is, if the Enabled check box next to the Background Processing setting is cleared.

If a job has no more UI to display (that is, all the document service / extender forms and all the connector forms have already been displayed), the system may switch the online job to a background job automatically. This happens if document processing has not finished within the timeout set by the OnlineProcessingTimeout advanced setting.

Conversely, if a job still has forms to display (either document service / extender forms or connector forms), the automatic switch to background mode does not happen.

In such situations, on the devices mentioned above, the device UI may time out while online processing is still in progress. The job then cannot complete, because it still requires UI interaction but the UI is no longer present.

For such vendor and workflow combinations, redesign the workflow to eliminate the root cause of the problem by:

  • scanning fewer pages, so that each scanned job can be processed before the device times out and before the next UI interaction
  • eliminating user interaction (if possible) by using express document service / connector settings (profiles without forms)
  • creating so-called combined workflows (where applicable). In the primary phase, the user scans and enters data, which produces a document file and an index file. A folder watcher workflow then processes these files with express profiles, performing background document processing and document delivery. This is possible only if there are no document services / extenders with validation / verification screens, and only for connectors that support metadata usage via Data Publishing.

For details, see the corresponding white paper.

If background processing is enabled for the job, the system switches to background mode as soon as there are no more forms to display.
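The switching rules described above can be summarized as a single decision function. This Python sketch is a model of the described behavior, not ShareScan's implementation. `JobState` and its fields are hypothetical. The text does not state the unit of OnlineProcessingTimeout, so the sketch assumes the value has already been converted to seconds. Treating the exact moment of timeout as "expired" (`>=`) is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class JobState:
    """Hypothetical snapshot of a job; field names are illustrative only."""

    background_enabled: bool       # Enabled check box in the connector profile
    forms_remaining: int           # document service / extender + connector forms
    processing_finished: bool      # has document processing completed?
    elapsed_seconds: float         # time since processing started
    online_timeout_seconds: float  # OnlineProcessingTimeout, assumed converted to seconds


def should_switch_to_background(job: JobState) -> bool:
    """Model of when a job leaves online mode, per the rules described above."""
    if job.forms_remaining > 0:
        # Forms still to display: no automatic switch in either mode.
        return False
    if job.processing_finished:
        return False  # nothing left to hand over to the background queue
    if job.background_enabled:
        return True  # switches immediately once no forms remain
    # Online job: switch only once OnlineProcessingTimeout has elapsed.
    return job.elapsed_seconds >= job.online_timeout_seconds


if __name__ == "__main__":
    # Online job, no forms left, 300 s elapsed against a 4-minute (240 s) timeout.
    print(should_switch_to_background(JobState(False, 0, False, 300.0, 240.0)))  # True
```

The first branch reflects the case that causes trouble on browser-based devices. While forms remain, neither mode switches to background, so a device UI timeout can strand the job.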

Some workflows contain both document services / extenders that can run in the background (that is, express profiles that display no forms) and others configured with UI. If the processing order of the document services does not matter in such a workflow, place the express (no UI) profiles later in the list of document services. This lets the system switch to background mode as early as possible.
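The effect of this ordering can be illustrated with a small Python sketch. `ServiceProfile`, `express_last`, and `services_before_switch` are hypothetical names, and the service names in the example are placeholders. The sketch assumes the switch to background can happen only after the last service that shows a form has run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceProfile:
    """Hypothetical description of a document service / extender in a workflow."""

    name: str
    has_ui: bool  # False for express profiles (no forms)


def express_last(services: list[ServiceProfile]) -> list[ServiceProfile]:
    """Move express profiles after UI profiles; valid only if order does not matter."""
    # sorted() is stable, so services keep their relative order within each group.
    return sorted(services, key=lambda s: not s.has_ui)


def services_before_switch(services: list[ServiceProfile]) -> int:
    """Number of services that must run before background mode is possible."""
    last_ui = max((i for i, s in enumerate(services) if s.has_ui), default=-1)
    return last_ui + 1


if __name__ == "__main__":
    workflow = [
        ServiceProfile("express-A", has_ui=False),
        ServiceProfile("ui-B", has_ui=True),
        ServiceProfile("express-C", has_ui=False),
    ]
    print(services_before_switch(workflow))                # 2
    print(services_before_switch(express_last(workflow)))  # 1
```

With the express profile moved behind the UI profile, only one service runs online before the job can go to the background queue.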