Document Transformation

The Document Transformation step helps you extract and use information from images and text documents. The Kofax RPA Document Transformation Service can process .png, .jpeg, .jpg, .tif, .tiff, .pdf, and .txt files. You can submit multiple documents either as a .zip archive or as a path to a folder with files. If you use the Document Separation feature in Kofax Transformation, Kofax RPA receives several documents that you can navigate to in the Document Transformation browser.
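As an illustration of preparing a multi-document submission, the following Python sketch collects the files in a folder that have one of the extensions listed above and packs them into a .zip archive. The function names and the idea of pre-filtering files are assumptions made for this example; they are not part of Kofax RPA.

```python
import zipfile
from pathlib import Path

# Extensions listed in this topic as supported by the service.
SUPPORTED_EXTENSIONS = {".png", ".jpeg", ".jpg", ".tif", ".tiff", ".pdf", ".txt"}


def is_supported(path):
    """Return True if the file extension is one the service can process."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def zip_supported_documents(folder, archive_path):
    """Pack every supported file found directly in `folder` into a new .zip.

    Hypothetical helper: returns the list of file names that were added.
    """
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for item in sorted(Path(folder).iterdir()):
            if item.is_file() and is_supported(item):
                # Store only the file name so the archive has a flat layout.
                archive.write(item, arcname=item.name)
                added.append(item.name)
    return added
```

The resulting archive path could then be entered as the document to process. Files with other extensions are skipped rather than rejected, which is a design choice of this sketch.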

The Kofax RPA Document Transformation Service can also process Natural Language Processing (NLP) requests using the Sentiment project. This project helps you detect the mood of a text, such as positive or negative, and extract entities, such as company names and person names. For example, you can use the Sentiment project to process customer reviews and understand whether customers are satisfied with the service, or to find all mentions of your company in an article. The Sentiment project can be used with KTT version 6.3.1 or later. See the Sentiment project in Predefined projects for details.

Document Transformation workflow

The Document Transformation action processes your graphical or PDF documents using a selected project. A project is a module that processes and transforms your documents using OCR and other specified operations.

The processing result is returned to the Robot and opened in the Document Transformation Browser in the Recorder view. The service forms an element tree with all extracted information. Note that in a multi-page document, you can browse through the pages using the Previous and Next buttons on the Document Transformation browser toolbar. See Document Transformation browser for details.

Elements in the tree contain confidence levels for the OCR results and other extraction results defined by the project. The confidence attribute takes values from 0 to 1, where 1 indicates the highest confidence.
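To illustrate how confidence values might be used, the following Python sketch flags fields whose confidence falls below a threshold. The XML sample, the element name field, the name attribute, and the 0.8 threshold are hypothetical and chosen for this example only; the actual element tree returned by the service depends on the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample tree. Real trees come from the service and will differ.
SAMPLE = """
<document>
  <field name="InvoiceNumber" confidence="0.97">INV-1001</field>
  <field name="Total" confidence="0.42">118.00</field>
  <field name="Vendor">Example Ltd</field>
</document>
"""


def low_confidence_fields(xml_text, threshold=0.8):
    """Return (name, confidence) pairs for fields below the threshold."""
    root = ET.fromstring(xml_text)
    flagged = []
    for element in root.iter("field"):
        raw = element.get("confidence")
        if raw is None:
            continue  # No confidence recorded for this element.
        confidence = float(raw)
        if confidence < threshold:
            flagged.append((element.get("name"), confidence))
    return flagged
```

A check of this kind could help decide which documents to send for validation and which to use directly.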

[Figure: Element properties in a transformed document]

Derived attributes such as der_x help you find the element and can be used in finders.

Once the transformed document is in the editor, you can decide whether to validate the transformation results. If you are satisfied with the results without validation, you can extract and use the data in the document right away.

Validation is performed by the Document Transformation Thin Client. Click Configure proxy settings in the Document Transformation Browser to send the document to the specified Thin Client. A unique URL is generated and returned to the robot. The robot extracts the URL and uses it to send the document to a validation user, for example, by email. The validation user clicks the URL and enters credentials, and the document with the extracted data then opens. The validation user inspects the transformed document and, if needed, modifies the extracted information.

When validating documents, the user can enable the Online Learning feature to increase the rate of field recognition on similar documents. This feature is based on remembering the layout of a sample document, such as an invoice. By using automatic field completion or by manually typing or selecting the correct value in the document, the user contributes to the knowledge base, which improves extraction results the next time the user works on a similar document.

When validation is finished, the validation user marks the document as valid. When the document is marked valid, it is used as an argument for a robot specified in the Callback option in the Document Transformation action.

Note To include metadata for the Open Document Transformation step, execute the Migrate step, then close and reopen the robot, and click Update step.

Step properties

Action

Select an action to perform using the Kofax RPA Document Transformation Service.

Service URL

Specify the URL, and the port if necessary, of the computer running the Document Transformation Service. If the service is installed locally, use localhost as the host name. The URL must include the http:// or https:// prefix. If you use https, the server must present a certificate issued by a well-known certificate authority.
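The prefix requirement can be checked before a value is entered in this field. The following Python sketch is one possible check; the function name is hypothetical and the check is not performed by Kofax RPA in this form.

```python
from urllib.parse import urlsplit


def check_service_url(url):
    """Validate a Service URL and return (scheme, host, port).

    Hypothetical helper: raises ValueError if the http:// or https://
    prefix or the host name is missing. The port is None when omitted.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("URL must start with http:// or https://: %r" % url)
    if not parts.hostname:
        raise ValueError("URL must contain a host name: %r" % url)
    return parts.scheme, parts.hostname, parts.port
```

For instance, the value http://localhost:8082 passes this check, while a bare localhost is rejected because it lacks the prefix.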

Project Type

  • Default Project: This option provides a set of predefined projects. See Predefined projects.

  • Custom Project: When you select this option, use Custom Project Path to specify the path to the project that processes your documents.

Document Source

Select how the robot locates a document to process.

  • Local File: Enter the path to one or more documents to process in File name. Use a full path to an image file, a .zip archive, a folder with files, or another file in a supported format that is accessible from the computer running the robot.

  • Robot File System: Enter the path to the configured file system and the file name, such as myshare/doctotransform.pdf. The file system name must correspond to that specified in the Robot File System section in Management Console.

  • Binary Variable: Specify a binary variable that contains a document.

When a path to multiple documents is specified, you can navigate between the documents using the Document Transformation browser toolbar buttons.

Metadata

Select this option to pass additional data to the Document Transformation Service.

This data is added to the input document as XValues, so it can be used by the Document Transformation Service project. The projects usually use the data to tune or control the analysis. Common use cases for this option are language settings or customer identification. The XValues are available in the device tree after the processed documents are received back from the Document Transformation Service.

Consult the developer of your Document Transformation Service projects to identify which values are supported by a specific project.

Note You can add more than one Key/Value pair to the Metadata property.

If any Key appears more than once in the list, the Value of the last occurrence is used.
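The last-occurrence rule can be pictured with a short Python sketch. The keys Language and CustomerId are hypothetical examples; the keys a project actually supports depend on that project.

```python
def effective_metadata(pairs):
    """Collapse a list of (key, value) pairs so that the last value wins.

    Hypothetical illustration of the duplicate-key rule described above.
    """
    result = {}
    for key, value in pairs:
        result[key] = value  # A later occurrence overwrites an earlier one.
    return result


# Example list with a duplicate key (values are made up for this sketch).
pairs = [("Language", "en"), ("CustomerId", "42"), ("Language", "de")]
effective = effective_metadata(pairs)
```

In this example, effective holds the value de for Language, because that pair appears last in the list.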

Validation URL

Select this option to specify a URL for the Thin Client Service. This property is required to send processed documents for validation. The URL is specified in the ValidationService property of the Document Transformation Service. The URL may look similar to the following:

http://localhost:8082

Callback

Select this option to specify a robot that the Thin Client Service must call after a document is validated. When the validation is completed, the robot is queued for execution on the Management Console.

  • Robot project: Specify the project where the robot to call resides. For example, Default project.

  • Robot name and path: Specify the name of and path to the robot if it resides inside a folder in the project. For example, MyRobot.robot or folder/subfolder/MyRobot.robot.

After validation is completed, the documents sent back to the robot specified in the Callback option contain the name of the user who validated the batch of documents. This information is provided as an XValue named KDTS-ValidatingUser.

Note To find the Management Console to queue the callback robot on, a robot with the Document Transformation step uses the Management Console URL as configured for the RoboServer that the step runs on. When the robot runs in Design Studio, it uses the URL of the Management Console that is marked as 'primary' in Design Studio Settings. When the robot runs on an embedded Management Console, it uses the URL configured with the -mcUrl parameter.

These configured URLs must use the hostname or IP address of the computer running the Management Console. Do not use 'localhost': with that value, the Document Transformation Service cannot reach the Management Console, and the callback robot is not queued.
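A configured Management Console URL can be screened for this problem with a small check. The following Python sketch is an assumption-laden example: the function name is hypothetical, and treating the loopback addresses 127.0.0.1 and ::1 the same way as 'localhost' is an extension of the rule above, on the grounds that they also refer to the local computer.

```python
from urllib.parse import urlsplit

# 'localhost' comes from the note above. The two loopback addresses are an
# assumption of this sketch, since they also point at the local computer.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def uses_loopback_host(url):
    """Return True if the URL's host refers to the local computer."""
    host = urlsplit(url).hostname
    return host is not None and host.lower() in LOOPBACK_HOSTS
```

A URL for which this function returns True would need to be changed to the hostname or IP address of the Management Console computer before callbacks can be queued.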