Classification by blackness for single classes

Some documents cannot be classified by layout or content because they have no typical layout or content. This script sample shows how to classify a document by blackness. This can be useful for scanned photos, which usually appear very dark in the scanned image. The script calculates the blackness of the image region by region and also checks that the dark regions are well distributed over the page. This avoids false positives from logos or graphical elements, which also produce dark regions, but only in a single place on the image.

The function DetectBlackImage checks at most the first 3 pages of a document and returns True as soon as one of them is detected as dark. Internally it calls DetectBlackImageOnPage, which checks a single page.

It can be called in the AfterClassifyXDoc event. The sample assumes that 'Pictures' is a valid class name in the current project and that the reclassification is certain, so the confidence is set to 1.

' Project Script
Private Sub Document_AfterClassifyXDoc(pXDoc As CASCADELib.CscXDocument)
    If DetectBlackImage(pXDoc) Then
        pXDoc.Reclassify("Pictures", 1.0)
        Exit Sub
    End If
    '...
End Sub

Private Function DetectBlackImage(pXDoc As CASCADELib.CscXDocument) As Boolean
   Dim i As Long
   Dim count As Long
   Dim bResult As Boolean
   ' search for photos on the first 3 pages only
   count = pXDoc.CDoc.Pages.Count
   If count > 3 Then
      count = 3
   End If

   For i = 0 To count - 1
      bResult = DetectBlackImageOnPage(pXDoc.CDoc.Pages(i).GetBitonalImage(Project.ColorConversion))
      If bResult = True Then
         DetectBlackImage = True
         Exit Function
      End If
   Next i
   DetectBlackImage = False
End Function

Private Function DetectBlackImageOnPage(pImage As CscImage) As Boolean
   ' detects dark regions on a single page
   ' this is used to detect photos (class "Pictures" in the sample above)

   Dim TileHeight As Long
   Dim TileWidth As Long
   Dim XStart As Long
   Dim YStart As Long
   Dim x As Long
   Dim y As Long
   Dim dBlackness As Double
   Dim BlackTileCount As Long

   ' divide the image into 5 * 7 tiles and skip half a tile on each side as border,
   ' which leaves 4 * 6 tiles to check
   TileWidth = pImage.Width / 5

   TileHeight = pImage.Height / 7

   YStart = TileHeight / 2
   BlackTileCount = 0
   For y = 0 To 5
      XStart = TileWidth / 2
      For x = 0 To 3
         dBlackness = pImage.GetBlackness(XStart, YStart, TileWidth, TileHeight)
         ' a tile counts as dark if its blackness is above 0.4
         If dBlackness > 0.4 Then
            BlackTileCount = BlackTileCount + 1
         End If
         XStart = XStart + TileWidth
      Next x
      YStart = YStart + TileHeight
   Next y

   ' the page counts as dark if more than 3 of the 24 tiles are dark
   If BlackTileCount > 3 Then
      DetectBlackImageOnPage = True
   Else
      DetectBlackImageOnPage = False
   End If

End Function
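
To see which regions DetectBlackImageOnPage actually inspects, the tile arithmetic can be reproduced without an image. The following is only a sketch: PrintTileGrid is a hypothetical helper that is not part of the original sample, and it assumes that Debug.Print is available in the script editor. It repeats the same calculation and lists the 24 tile rectangles for a given image size in pixels.

```vb
' Hypothetical helper (not part of the original sample):
' lists the tile rectangles that DetectBlackImageOnPage would check
' for an image of the given size in pixels.
Private Sub PrintTileGrid(ImageWidth As Long, ImageHeight As Long)
   Dim TileWidth As Long
   Dim TileHeight As Long
   Dim XStart As Long
   Dim YStart As Long
   Dim x As Long
   Dim y As Long

   ' same arithmetic as in DetectBlackImageOnPage
   TileWidth = ImageWidth / 5
   TileHeight = ImageHeight / 7

   ' start half a tile away from the top and left border
   YStart = TileHeight / 2
   For y = 0 To 5
      XStart = TileWidth / 2
      For x = 0 To 3
         Debug.Print "row " & y & ", col " & x & ": left=" & XStart & _
                     ", top=" & YStart & ", width=" & TileWidth & _
                     ", height=" & TileHeight
         XStart = XStart + TileWidth
      Next x
      YStart = YStart + TileHeight
   Next y
End Sub
```

Calling PrintTileGrid 2500, 3500 gives tiles of 500 x 500 pixels, with the first tile at left=250, top=250 and the last one at left=1750, top=2750. The checked area therefore stays half a tile away from every edge of the image. Since the page is only reported as dark when more than 3 of these 24 tiles exceed the blackness threshold, a single logo that covers one or two tiles should not trigger the reclassification.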