Classification by blackness for single classes
Some documents cannot be classified by layout or content because they have no typical layout or content. This script sample shows how to classify a document by blackness instead. This can be useful for scanned photos, which usually appear very dark in the scanned image. The script measures the blackness of individual tiles of the page and checks that the dark regions extend across several tiles rather than sitting in one spot. This avoids conflicts with logos or graphical elements, which can also produce dark regions, but only in a single place on the image.
The function DetectBlackImage checks at most the first three pages of a document. For each page it calls DetectBlackImageOnPage, which works on the bitonal image of a single page.
DetectBlackImage can be called from the AfterClassifyXDoc event. The sample assumes that 'Pictures' is a valid class name in the current project and that the reclassification is reliable, so the confidence is set to 1.
' Project Script
Private Sub Document_AfterClassifyXDoc(pXDoc As CASCADELib.CscXDocument)
   ' reclassify dark documents as "Pictures" with full confidence
   If DetectBlackImage(pXDoc) Then
      pXDoc.Reclassify("Pictures", 1.0)
      Exit Sub
   End If
   '...
End Sub
Private Function DetectBlackImage(pXDoc As CASCADELib.CscXDocument) As Boolean
   Dim i As Long
   Dim count As Long
   Dim bResult As Boolean
   ' search for photos on the first 3 pages only
   count = pXDoc.CDoc.Pages.Count
   If count > 3 Then
      count = 3
   End If
   For i = 0 To count - 1
      ' test the bitonal image of page i
      bResult = DetectBlackImageOnPage(pXDoc.CDoc.Pages(i).GetBitonalImage(Project.ColorConversion))
      If bResult Then
         ' one dark page is enough
         DetectBlackImage = True
         Exit Function
      End If
   Next i
   DetectBlackImage = False
End Function
Private Function DetectBlackImageOnPage(pImage As CscImage) As Boolean
   ' detects dark regions on a page
   ' this is used to detect class "Pictures"
   Dim TileHeight As Long
   Dim TileWidth As Long
   Dim XStart As Long
   Dim YStart As Long
   Dim x As Long
   Dim y As Long
   Dim dBlackness As Double
   Dim BlackTileCount As Long
   ' divide the image into 5 * 7 tiles (ignoring 1/2 tile as border)
   ' so we have to check 4 * 6 tiles
   ' integer division keeps the tile grid inside the image
   TileWidth = pImage.Width \ 5
   TileHeight = pImage.Height \ 7
   YStart = TileHeight \ 2
   BlackTileCount = 0
   For y = 0 To 5
      XStart = TileWidth \ 2
      For x = 0 To 3
         ' a tile counts as dark if more than 40% of it is black
         dBlackness = pImage.GetBlackness(XStart, YStart, TileWidth, TileHeight)
         If dBlackness > 0.4 Then
            BlackTileCount = BlackTileCount + 1
         End If
         XStart = XStart + TileWidth
      Next x
      YStart = YStart + TileHeight
   Next y
   ' the page is a picture if more than 3 of the 24 tiles are dark
   If BlackTileCount > 3 Then
      DetectBlackImageOnPage = True
   Else
      DetectBlackImageOnPage = False
   End If
End Function
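To try out the tile arithmetic without a project, the grid check in DetectBlackImageOnPage can be reproduced as a small standalone simulation. The Python sketch below is only an illustration and is not part of the project script. It assumes an image is a nested list of 0/1 pixels, and the names blackness, count_dark_tiles, is_black_image and make_page are hypothetical helpers invented for this example. blackness stands in for GetBlackness and is assumed to return the fraction of black pixels in a rectangle.

```python
# Standalone simulation of the tile check; not part of the project script.
# An image is a list of rows, each row a list of 0 (white) / 1 (black) pixels.

def blackness(image, left, top, width, height):
    """Fraction of black pixels in a rectangle (assumed stand-in for GetBlackness)."""
    total = width * height
    if total == 0:
        return 0.0
    black = sum(
        image[y][x]
        for y in range(top, top + height)
        for x in range(left, left + width)
    )
    return black / total


def count_dark_tiles(image, threshold=0.4):
    """Count dark tiles in the inner 4 x 6 grid of a 5 x 7 division."""
    tile_w = len(image[0]) // 5
    tile_h = len(image) // 7
    dark = 0
    y_start = tile_h // 2              # skip half a tile as top border
    for _ in range(6):
        x_start = tile_w // 2          # skip half a tile as left border
        for _ in range(4):
            if blackness(image, x_start, y_start, tile_w, tile_h) > threshold:
                dark += 1
            x_start += tile_w
        y_start += tile_h
    return dark


def is_black_image(image, threshold=0.4, min_dark_tiles=4):
    """Same decision as the script: more than 3 dark tiles means a picture."""
    return count_dark_tiles(image, threshold) >= min_dark_tiles


def make_page(width, height, rects):
    """Build a white page and paint the given (left, top, w, h) rectangles black."""
    image = [[0] * width for _ in range(height)]
    for left, top, w, h in rects:
        for y in range(top, top + h):
            for x in range(left, left + w):
                image[y][x] = 1
    return image


photo = make_page(50, 70, [(0, 0, 50, 70)])   # dark everywhere
logo = make_page(50, 70, [(5, 5, 10, 10)])    # dark in a single tile only
blank = make_page(50, 70, [])

print(is_black_image(photo), is_black_image(logo), is_black_image(blank))
```

In this simulation the fully dark page fills all 24 tiles, while the logo page darkens only one tile and stays below the limit. Exposing threshold and min_dark_tiles as parameters makes it easy to experiment with values other than the 0.4 and 3 used in the script.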