RecoStar recognition trigram modes
Trigrams are combinations of three consecutive letters. Some trigrams occur very frequently in a given language; for example, common English trigrams include "ing" and "ion".
RecoStar can use trigrams to improve recognition accuracy. Trigram analysis checks, and can optionally repair, letter combinations that have both a low confidence rating and a low frequency of occurrence.
Consider these examples:
In the first case, the image of the word "Walking" suffers from drop-outs. In particular, the "n" is badly faded. The recognition engine cannot decide whether it is an "r" followed by an "i" or a single "n", so the character is marked as rejected in the initial results. Trigram analysis is then applied to the initial result, and the recognition engine decides that the most likely combination of three letters in this case is "ing".
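The rejected-character case can be illustrated with a small Python sketch. This is not RecoStar's algorithm or API. The function names (`build_counts`, `word_score`, `resolve_rejected`) and the toy word list are hypothetical stand-ins for a real trigram table.

Each candidate reading ("n" or "ri") is scored by the average add-one-smoothed log frequency of its trigrams. The most plausible word is kept.

```python
import math
from collections import Counter

# Hypothetical sketch: the toy word list stands in for a real trigram table,
# and none of these names are part of the RecoStar API.
VOCAB = 26 ** 3  # number of possible lowercase trigrams (assumes letters a-z only)

def build_counts(words):
    """Count overlapping three-letter windows in a list of words."""
    counts = Counter()
    for word in words:
        w = word.lower()
        counts.update(w[i:i + 3] for i in range(len(w) - 2))
    return counts

def word_score(word, counts):
    """Average add-one-smoothed log frequency of the word's trigrams."""
    w = word.lower()
    grams = [w[i:i + 3] for i in range(len(w) - 2)]
    if not grams:
        return float("-inf")
    total = sum(counts.values())
    return sum(math.log((counts[g] + 1) / (total + VOCAB)) for g in grams) / len(grams)

def resolve_rejected(prefix, candidates, suffix, counts):
    """Fill a rejected position with the candidate whose word scores highest."""
    return max((prefix + c + suffix for c in candidates),
               key=lambda word: word_score(word, counts))

counts = build_counts(["talking", "ring", "sing", "thing", "king", "walk"])
# The rejected character in "Walki?g" is either "n" or "ri".
print(resolve_rejected("Walki", ["n", "ri"], "g", counts))
```

Averaging over the word's trigrams keeps the longer "ri" reading from being penalized just for having one more trigram. A real engine might instead weigh only the trigrams around the rejected position and combine them with the character classifier's own confidences.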
In the second case, the image contains substantial noise. Because of this noise, the second "i" in "Dictionary" is interpreted as the letter "l". Trigram analysis shows that "ion" is more likely than "lon", and the word is corrected. Keep in mind that trigram analysis is a statistical process: it favors the more probable letter combination, which is usually, but not always, the correct one.
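The check-and-repair rule can be sketched in Python as well: repair a character only when its reading has low confidence and its trigrams are rare. This is a hedged illustration, not RecoStar's implementation. The function names, the toy word list, and the thresholds `min_confidence`, `max_support`, and `min_gain` are all hypothetical.

```python
from collections import Counter

# Hypothetical sketch: names, thresholds, and the toy word list are
# illustrative assumptions, not RecoStar settings or real language statistics.

def build_counts(words):
    """Count overlapping three-letter windows in a list of words."""
    counts = Counter()
    for word in words:
        w = word.lower()
        counts.update(w[i:i + 3] for i in range(len(w) - 2))
    return counts

def support(word, index, counts):
    """Sum the counts of every trigram that covers position `index`."""
    w = word.lower()
    start, stop = max(0, index - 2), min(index + 1, len(w) - 2)
    return sum(counts[w[i:i + 3]] for i in range(start, stop))

def check_and_repair(word, index, alternatives, confidence, counts,
                     min_confidence=0.6, max_support=2, min_gain=5.0):
    """Replace word[index] only if it is low-confidence, rare, and clearly beaten."""
    current = support(word, index, counts)
    if confidence >= min_confidence or current > max_support:
        return word  # the reading is confident, or its trigrams are common
    options = [word[:index] + alt + word[index + 1:] for alt in alternatives]
    best = max(options, key=lambda w: support(w, index, counts))
    if support(best, index, counts) >= min_gain * max(current, 1):
        return best
    return word

counts = build_counts(["nation", "action", "station", "motion", "fiction", "long"])
# Position 4 was read as "l" with low confidence; "i" is the runner-up.
print(check_and_repair("Dictlonary", 4, ["l", "i"], 0.3, counts))
```

Because "lon" does occur in the toy list (in "long"), the sketch does not treat it as impossible, only as much less likely than "ion". The thresholds control the trade-off. With a low enough confidence, a rare but genuinely correct reading can also be overwritten, which is the statistical risk described above.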
RecoStar ships with trigram tables for most supported languages. Each table lists the possible three-letter combinations and their frequency of occurrence in that language. Although there are thousands of such combinations, many of them are almost never used, so their frequency of occurrence is near zero.
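To show why most table entries are near zero, the Python sketch below derives relative trigram frequencies from a short text. It then counts how many of the 26 × 26 × 26 = 17,576 possible lowercase trigrams never appear. RecoStar's actual table format is not described here, so `trigram_table` and the sample sentence are hypothetical, and only the letters a-z are considered.

```python
from collections import Counter
from itertools import product
from string import ascii_lowercase

# Hypothetical sketch: RecoStar's table format is not documented here.
# This only shows how relative trigram frequencies could be derived from text.

def trigram_table(text):
    """Map each observed trigram (letters a-z only) to its relative frequency."""
    counts = Counter()
    for word in text.lower().split():
        letters = "".join(ch for ch in word if ch in ascii_lowercase)
        counts.update(letters[i:i + 3] for i in range(len(letters) - 2))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()} if total else {}

sample = "Walking through the dictionary, the station reported nothing new."
table = trigram_table(sample)

possible = 26 ** 3  # 17,576 possible lowercase trigrams
unseen = sum(1 for combo in product(ascii_lowercase, repeat=3)
             if "".join(combo) not in table)
print(f"{len(table)} trigrams observed; {unseen} of {possible} never seen")
for gram, freq in sorted(table.items(), key=lambda kv: -kv[1])[:3]:
    print(gram, round(freq, 3))
```

A real table would be built from a very large corpus, so far more combinations would be observed. Even so, many would still have frequencies close to zero.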
There may be rare occasions when your data contains many uncommon trigrams. For example, a list of Chicago radio stations might include call letters such as WGN, WLS, WNVR, WKTAF, WZRD, WBEZ, or WXRT. Trigram analysis may try to replace such valid but statistically unlikely letter combinations with more common ones. In such cases, if you notice recognition problems, consider disabling trigrams in your recognition profile.
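One way to judge whether a field is a poor fit for trigrams is to measure how many of its trigrams a general-language table has never seen. The Python sketch below is a hypothetical heuristic, not a RecoStar feature. The `use_trigrams` flag, the 0.5 cut-off, and the toy word list are assumptions; RecoStar's real profile settings are not shown.

```python
from collections import Counter

# Hypothetical heuristic: `use_trigrams` and the 0.5 cut-off are illustrative
# assumptions, not RecoStar profile settings.

def build_counts(text):
    """Count overlapping three-letter windows in whitespace-separated text."""
    counts = Counter()
    for word in text.lower().split():
        counts.update(word[i:i + 3] for i in range(len(word) - 2))
    return counts

def unseen_ratio(words, counts):
    """Fraction of the trigrams in `words` that never occur in `counts`."""
    grams = [w.lower()[i:i + 3] for w in words for i in range(len(w) - 2)]
    if not grams:
        return 0.0
    return sum(1 for g in grams if counts[g] == 0) / len(grams)

english = build_counts("walking talking dictionary nation station thing ring sing")
stations = ["WGN", "WLS", "WNVR", "WKTAF", "WZRD", "WBEZ", "WXRT"]

ratio = unseen_ratio(stations, english)
use_trigrams = ratio < 0.5  # hypothetical cut-off for keeping trigrams enabled
print(f"unseen ratio {ratio:.2f}; keep trigrams enabled: {use_trigrams}")
```

With this toy table, none of the call-sign trigrams appear, so the heuristic suggests turning trigrams off for this kind of field. Ordinary words score near zero and would leave trigrams enabled.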