Standard Document Separation

In standard document separation, an allocated extraction process analyzes the page layout and splits the document into parts. The splitting algorithm fires three different events that can modify the separation result.

Document_BeforeSeparatePages( _
    ByVal pXDoc As CscXDocument, _
    ByRef bSkip As Boolean _
    )

This is the introductory event of the document separation process. Here, the separation of the given document pXDoc can be skipped by setting bSkip to TRUE.
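
As a minimal sketch of how this event might be used, the following handler skips separation for documents that have only one page. The single-page rule is purely illustrative, and the use of pXDoc.CDoc.Pages.Count to read the page count is an assumption that is not stated in the text above.

```vb
Private Sub Document_BeforeSeparatePages( _
    ByVal pXDoc As CASCADELib.CscXDocument, _
    ByRef bSkip As Boolean)

    ' Assumption: a document with a single page has nothing to split,
    ' and Pages.Count returns the number of pages.
    If pXDoc.CDoc.Pages.Count <= 1 Then
        bSkip = True
    End If
End Sub
```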

If the separation process proceeds, all pages are classified first. This classification can raise an additional event:

Document_XDocPageRotated( _
    ByVal RotationBy As CASCADELib.CscAutoRotation, _
    ByVal pXDoc As CASCADELib.CscXDocument, _
    ByVal PageNr As Long, _
    ByVal Rotation As CASCADELib.CscXDocRotationTypeEnum, _
    ByRef bCancel As Boolean _
    )

If a page cannot be classified, it is rotated stepwise by 90° clockwise and classification is re-executed after each step. If classification succeeds for a rotation step, the rotation event is fired. If the script cancels this event by setting bCancel to TRUE, the remaining rotation directions are applied and classification is executed again for the page. This continues for every rotation direction in which the page either cannot be classified or the rotation is canceled by script. For these events, the parameter RotationBy is set to CscAutoRotationByDocumentClassifier.

Note: If the Document_XDocPageRotated event is executed in Project Builder, you cannot access the CscXFolder object. To ensure that an implementation does not terminate the application abnormally, evaluate the script execution mode (Project.ScriptExecutionMode = CscScriptModeServerDesign) before accessing it.
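
The guard described in the note could look roughly like the sketch below. It assumes that the comparison is true whenever the script runs in Project Builder, as the note implies. The folder-dependent logic itself is left out because it is specific to each project.

```vb
Private Sub Document_XDocPageRotated( _
    ByVal RotationBy As CASCADELib.CscAutoRotation, _
    ByVal pXDoc As CASCADELib.CscXDocument, _
    ByVal PageNr As Long, _
    ByVal Rotation As CASCADELib.CscXDocRotationTypeEnum, _
    ByRef bCancel As Boolean)

    ' Assumption: this mode indicates execution in Project Builder,
    ' where the CscXFolder object is not accessible.
    If Project.ScriptExecutionMode = CscScriptModeServerDesign Then
        Exit Sub
    End If

    ' Code that needs the CscXFolder object would go here (not shown).
End Sub
```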

If a page still cannot be classified, the process checks whether content classification is required and whether OCR has to be executed. This may raise another XDocPageRotated event with the parameter RotationBy set to CscAutoRotationByOCR. This rotation is the one suggested by the OCR. If this event is canceled, the OCR is re-executed without rotation.
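
Because the same event is raised for both rotation sources, a handler can branch on RotationBy. The following alternative sketch of the handler accepts rotations found by the classifier and cancels rotations suggested by the OCR. This policy is only an example chosen to show both branches, not a recommendation.

```vb
Private Sub Document_XDocPageRotated( _
    ByVal RotationBy As CASCADELib.CscAutoRotation, _
    ByVal pXDoc As CASCADELib.CscXDocument, _
    ByVal PageNr As Long, _
    ByVal Rotation As CASCADELib.CscXDocRotationTypeEnum, _
    ByRef bCancel As Boolean)

    Select Case RotationBy
        Case CscAutoRotationByDocumentClassifier
            ' Accept the rotation: bCancel keeps its incoming value.
        Case CscAutoRotationByOCR
            ' Illustrative policy: reject the OCR suggestion, so that
            ' the OCR is re-executed without rotation.
            bCancel = True
    End Select
End Sub
```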

Document_SeparateCurrentPage( _
    ByVal pXDoc As CASCADELib.CscXDocument, _
    ByVal PageNr As Long, _
    ByRef bSplitPage As Boolean, _
    ByRef RemainingPages As Long _
    )

This event is fired for each page of the given document, identified by its PageNr. If a page is recognized as a first page, the parameter bSplitPage is set to TRUE. The script can also set it to TRUE to force a split at this position. If a specific document type always has a defined number of pages, set the parameter RemainingPages to the number of pages that belong to the current first page. The event is not executed for these pages.
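
A fixed-length document type might be handled as in the sketch below. The constant FOLLOWING_PAGES is hypothetical. The sketch also assumes that RemainingPages counts only the pages after the first page, which should be verified before relying on it. bSplitPage is declared ByRef here on the assumption that the script can write to it, as the description states.

```vb
' Hypothetical constant: number of pages that follow each first page.
Private Const FOLLOWING_PAGES As Long = 2

Private Sub Document_SeparateCurrentPage( _
    ByVal pXDoc As CASCADELib.CscXDocument, _
    ByVal PageNr As Long, _
    ByRef bSplitPage As Boolean, _
    ByRef RemainingPages As Long)

    ' When a first page is recognized, attach the following pages to it.
    ' The event is then not executed for those pages.
    If bSplitPage Then
        RemainingPages = FOLLOWING_PAGES
    End If
End Sub
```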

Document_AfterSeparatePages( _
    ByVal pXDoc As CscXDocument _
    )

This is the finalizing event of the standard document separation for the given document pXDoc. At this point, pXDoc is still the complete document as it was before the separation is applied. In theory, the pXDoc.CDoc.Pages(...).SplitPage property can be modified to flag pages before the document is cut.
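
The following sketch shows one way the split flags could be inspected and adjusted in this event. It assumes that the Pages collection is zero-based and exposes a Count property. The fallback rule that flags the first page is illustrative only.

```vb
Private Sub Document_AfterSeparatePages( _
    ByVal pXDoc As CASCADELib.CscXDocument)

    Dim i As Long
    Dim splitCount As Long

    ' Assumption: Pages is zero-based and has a Count property.
    For i = 0 To pXDoc.CDoc.Pages.Count - 1
        If pXDoc.CDoc.Pages(i).SplitPage Then
            splitCount = splitCount + 1
        End If
    Next i

    ' Illustrative fallback: if no page was flagged, flag the first one.
    If splitCount = 0 And pXDoc.CDoc.Pages.Count > 0 Then
        pXDoc.CDoc.Pages(0).SplitPage = True
    End If
End Sub
```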

The following image shows the sequence of the standard document separation events:


An image that shows the standard document separation events.