Classification by graphical lines

Some documents cannot be classified by layout or content because they have no typical layout or content. This script sample shows how to classify such a document by its graphical lines instead. This can be useful for scanned charts or printed diagrams that have a grid pattern in the background. The script counts the horizontal and vertical lines on the page images and compares these counts with fixed thresholds. Only lines at least 40 mm long are counted, and a page qualifies when it contains more than 8 horizontal and more than 8 vertical lines.
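To make the counting idea concrete outside of Kofax, here is a minimal Python sketch of counting long horizontal and vertical lines on a bitonal image. It is not the algorithm used by CscLinesDetection; it swaps in a simple run-length scan and assumes the image is a list of rows with 1 for black and 0 for white. The function names and the min_len parameter (in pixels) are hypothetical; only the "more than 8 lines in each direction" rule mirrors the sample script.

```python
from typing import List

# Hypothetical image representation (assumption): a list of rows, 1 = black, 0 = white.
Image = List[List[int]]


def has_long_run(pixels: List[int], min_len: int) -> bool:
    """Return True if the scan line contains at least min_len consecutive black pixels."""
    run = 0
    for pixel in pixels:
        run = run + 1 if pixel else 0
        if run >= min_len:
            return True
    return False


def count_lines(image: Image, min_len: int, horizontal: bool) -> int:
    """Count horizontal (row) or vertical (column) lines of at least min_len pixels.

    Adjacent scan lines that all contain a long run are merged into one line,
    so a line that is several pixels thick is counted only once.
    """
    scans = image if horizontal else [list(column) for column in zip(*image)]
    count = 0
    previous = False
    for scan in scans:
        current = has_long_run(scan, min_len)
        if current and not previous:
            count += 1
        previous = current
    return count


def looks_like_chart(image: Image, min_len: int, threshold: int = 8) -> bool:
    """Mirror the sample's decision: more than `threshold` lines in BOTH directions."""
    return (count_lines(image, min_len, horizontal=True) > threshold
            and count_lines(image, min_len, horizontal=False) > threshold)


def make_grid(size: int, spacing: int) -> Image:
    """Build a synthetic size x size grid with a black line every `spacing` pixels."""
    return [[1 if x % spacing == 0 or y % spacing == 0 else 0 for x in range(size)]
            for y in range(size)]


if __name__ == "__main__":
    dense = make_grid(100, 10)   # 10 lines in each direction
    sparse = make_grid(100, 20)  # 5 lines in each direction
    print(count_lines(dense, 50, horizontal=True), looks_like_chart(dense, 50))    # 10 True
    print(count_lines(sparse, 50, horizontal=True), looks_like_chart(sparse, 50))  # 5 False
```

Requiring lines in both directions is what separates a grid from, for example, a page with many horizontal rules but no vertical ones.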

The function DetectGraphicLines examines at most the first 3 pages of a document and returns True as soon as one of them qualifies. For each page it calls DetectGraphicLinesOnPage, which checks a single bitonal page image.

Call DetectGraphicLines from the Document_AfterClassifyXDoc event. The sample assumes that Charts is a valid class name in the current project and that the reclassification is certain, so the confidence is set to 1.0.

A reference to Kofax Cascade Forms Processing 2.0 must be added to the script sheet where these functions are implemented.

' Project Script
Private Sub Document_AfterClassifyXDoc(pXDoc As CASCADELib.CscXDocument)
   If DetectGraphicLines(pXDoc) Then
      pXDoc.Reclassify "Charts", 1.0
      Exit Sub
   End If
   ' ... further processing for documents that are not charts
End Sub


Private Function DetectGraphicLines(pXDoc As CASCADELib.CscXDocument) As Boolean
   Dim i As Long
   Dim count As Long
   Dim bResult As Boolean
   ' search for horizontal and vertical lines on the first 3 pages only
   count = pXDoc.CDoc.Pages.Count
   If count > 3 Then
      count = 3
   End If
   For i = 0 To count - 1
      ' if we detect enough graphic lines on any of the first 3 pages, return TRUE
      bResult = DetectGraphicLinesOnPage(pXDoc.CDoc.Pages(i).GetBitonalImage(Project.ColorConversion))
      If bResult Then
         DetectGraphicLines = True
         Exit Function
      End If
   Next i
   DetectGraphicLines = False
End Function

Private Function DetectGraphicLinesOnPage(pImage As CscImage) As Boolean
   ' counts horizontal and vertical lines on a single page
   ' this is used to detect the class "Charts"
   Dim pLinesDetection As CscLinesDetection
   Dim xLeft As Long
   Dim xWidth As Long
   Dim yTop As Long
   Dim yHeight As Long

   ' lines detection needs a bitonal image (1 bit per sample, 1 sample per pixel)
   If pImage.BitsPerSample <> 1 Or pImage.SamplesPerPixel <> 1 Then
      DetectGraphicLinesOnPage = False
      Exit Function
   End If

   Set pLinesDetection = New CscLinesDetection
   ' setup parameters for lines detection
   pLinesDetection.DetectHorCombs = False
   pLinesDetection.DetectHorDotLines = False
   pLinesDetection.DetectHorLines = True
   pLinesDetection.DetectVerLines = True
   pLinesDetection.MinHorLineLenMM = 40
   pLinesDetection.MinVerLineLenMM = 40
   ' start lines detection, skipping a 5% border on each side of the image
   xLeft = pImage.Width * 0.05
   xWidth = pImage.Width * 0.9
   yTop = pImage.Height * 0.05
   yHeight = pImage.Height * 0.9
   pLinesDetection.DetectLines pImage, xLeft, yTop, xWidth, yHeight

   ' require more than 8 horizontal AND more than 8 vertical lines to return TRUE
   If (pLinesDetection.HorLineCount > 8 And pLinesDetection.VerLineCount > 8) Then
      DetectGraphicLinesOnPage = True
   Else
      DetectGraphicLinesOnPage = False
   End If

End Function
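The sample measures line length in millimetres (MinHorLineLenMM and MinVerLineLenMM are 40) and restricts the search to the central 90% of the page image. The following Python sketch, with hypothetical function names, shows the corresponding arithmetic for an assumed A4 page scanned at 300 dpi: the 5% border and region size in pixels, rounded to whole pixels, and what 40 mm corresponds to in pixels (1 inch = 25.4 mm).

```python
def detection_region(width: int, height: int) -> tuple:
    """Return (left, top, region_width, region_height), skipping a 5% border on each side.

    Mirrors the sample: left/top at 5% of the image size, region size at 90%,
    rounded to whole pixels.
    """
    left = round(width * 0.05)
    top = round(height * 0.05)
    region_width = round(width * 0.9)
    region_height = round(height * 0.9)
    return left, top, region_width, region_height


def mm_to_pixels(mm: float, dpi: int) -> int:
    """Convert a length in millimetres to pixels at the given resolution (1 in = 25.4 mm)."""
    return round(mm / 25.4 * dpi)


if __name__ == "__main__":
    # Assumption: an A4 page scanned at 300 dpi is about 2480 x 3508 pixels.
    print(detection_region(2480, 3508))  # (124, 175, 2232, 3157)
    print(mm_to_pixels(40, 300))         # 472
```

At 300 dpi the 40 mm minimum is roughly 470 pixels, so ordinary text strokes are far too short to be counted as lines.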