Process documents

Document processing takes place mainly in two steps. First, the system attempts to classify the document. Then the settings of the resulting class are applied to the document to extract its data.

Processing of each document begins with the Document_BeforeProcessXDoc event in the project script, because at this point the incoming document is still completely unknown. This event can be used to control OCR, as shown in the example at the end of this topic.

Important: You cannot call page recognition from script when using the Kofax Clarity recognition engine.

If project fields are defined, their extraction events are executed first, followed by the classification events, which are also part of the project script. The results of the project field extraction can be used to classify the document. For example, a bar code result can be used to assign documents to a specific class.
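
The bar code case described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive implementation: the project field name "Barcode", the prefixes "INV" and "ORD", and the class names "Invoice" and "Order" are all hypothetical, and the choice of Document_AfterClassifyXDoc as the event, together with the Exists, ItemByName and Reclassify calls, should be checked against the scripting reference for your product version.

' Project Script (sketch; field, prefix and class names are hypothetical)
Private Sub Document_AfterClassifyXDoc(pXDoc As CASCADELib.CscXDocument)
    Dim BarcodeText As String

    ' Assumption: a project field named "Barcode" is filled by a bar code locator
    If Not pXDoc.Fields.Exists("Barcode") Then Exit Sub
    BarcodeText = pXDoc.Fields.ItemByName("Barcode").Text

    ' Map the bar code prefix to a class; leave the document untouched otherwise
    Select Case Left(BarcodeText, 3)
        Case "INV"
            pXDoc.Reclassify "Invoice"
        Case "ORD"
            pXDoc.Reclassify "Order"
    End Select
End Sub

Documents whose bar code matches none of the prefixes keep the classification result they already have.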

Once a document is classified, document extraction is performed. All locator methods are executed, and the document receives its field results through the assignment of locators to fields. Extraction is accompanied by extraction events that follow the defined class hierarchy and take field and locator inheritance into account. When the extracted document belongs to a child class, the extraction events for the inherited locators and inherited fields are also fired for all parent classes.

Important: If foldering is enabled and folder fields are defined, you must not change any folder fields in any of the document processing events, such as DocumentValidated. Multiple documents are processed in parallel, so such changes cannot be saved to the root Xfolder object.

This example shows how to skip OCR for all pages after the third page. Documents with three or fewer pages receive complete OCR. Documents with more than three pages receive OCR on the first three pages only.

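' Project Script
Private Sub Document_BeforeProcessXDoc(pXDoc As CASCADELib.CscXDocument)
    Dim i As Long
    Dim Count As Long

    Count = pXDoc.CDoc.Pages.Count

    ' Page indexes are zero-based, so index 3 is the fourth page.
    ' Suppress OCR for the fourth page and all following pages.
    ' The loop does not run for documents with three or fewer pages.
    For i = 3 To Count - 1
        pXDoc.CDoc.Pages(i).SuppressOCR = True
    Next i
End Sub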
