Code samples for language detection

If each page contains only one language, and that language is known, set it with the kRecManageLanguages(sid, SET_LANG, language); command.

If pages can contain multiple languages from a known set of languages, set the possible languages with the kRecManageLanguages command before preprocessing the image.
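The two known-language cases above can be sketched as follows. This is a minimal illustration, not the SDK's own sample. The SDK header is not included here, so the types, the language constants (LANG_ENG, LANG_GER, LANG_FRE), the ADD_LANG action, and the body of kRecManageLanguages are all stand-ins assumed for this sketch. Only the call form kRecManageLanguages(sid, SET_LANG, language) comes from this page. The helpers use_single_language and use_language_set are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* --- Stand-ins for the SDK (assumptions, NOT the real header) --- */
typedef int RECERR;
enum { REC_OK = 0 };
typedef enum { SET_LANG, ADD_LANG } MANAGE_LANG;  /* ADD_LANG is assumed */
typedef enum { LANG_ENG, LANG_GER, LANG_FRE, LANG_COUNT } LANGUAGES; /* illustrative subset */

/* Mock state: which languages are currently enabled. */
static bool g_enabled[LANG_COUNT];

/* Mock of the SDK call: SET_LANG clears the selection first, then the
 * given language is enabled. Real code links against the SDK instead. */
static RECERR kRecManageLanguages(int sid, MANAGE_LANG action, LANGUAGES lang)
{
    (void)sid;
    if (action == SET_LANG) {
        for (int i = 0; i < LANG_COUNT; i++)
            g_enabled[i] = false;
    }
    g_enabled[lang] = true;
    return REC_OK;
}

/* Case 1: every page is in one known language. */
static RECERR use_single_language(int sid, LANGUAGES lang)
{
    return kRecManageLanguages(sid, SET_LANG, lang);
}

/* Case 2: pages may mix languages from a known set. Call this before
 * preprocessing the image. The first entry replaces the current
 * selection; the remaining entries are added to it. */
static RECERR use_language_set(int sid, const LANGUAGES *langs, size_t n)
{
    if (n == 0)
        return REC_OK;  /* nothing to set; selection left unchanged */

    RECERR rc = kRecManageLanguages(sid, SET_LANG, langs[0]);
    for (size_t i = 1; i < n && rc == REC_OK; i++)
        rc = kRecManageLanguages(sid, ADD_LANG, langs[i]);
    return rc;
}
```

In real code, the stand-in block would be replaced by the SDK header, and the action used to add further languages should be checked against the kRecManageLanguages reference.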

If each page contains only one language, and the language is unknown, use automatic single-language detection, described in Single Language Detection. The following limitations apply:

  • Greek, Russian (Cyrillic), Thai, and Arabic languages and scripts are not recommended.

  • Western (Latin) languages without dictionary are not supported.

  • Detection accuracy depends strongly on image quality and other conditions.

Note: If the detection process cannot determine the page language, the language of the previous page is set, and kRecGetPageLanguages returns LANGDET_INHERITED_WARN.
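One way to act on that warning is to tally how each page's language was obtained, so that pages with an inherited language can be reviewed later. The sketch below takes only the return code as input, because the exact signature of kRecGetPageLanguages is not given on this page. The RECERR typedef, REC_OK, the numeric value of LANGDET_INHERITED_WARN, the LangDetStats struct, and record_langdet_result are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* --- Stand-ins for the SDK (assumptions, NOT the real header) --- */
typedef int RECERR;
enum {
    REC_OK = 0,
    LANGDET_INHERITED_WARN = 1  /* placeholder value, not the SDK's */
};

/* Hypothetical per-document tally of detection outcomes. */
typedef struct {
    size_t detected;   /* language determined for the page itself  */
    size_t inherited;  /* language copied from the previous page   */
    size_t failed;     /* any other return code                    */
} LangDetStats;

/* Classify the code returned by kRecGetPageLanguages for one page.
 * Returns true if a page language is set (detected or inherited),
 * false if the call failed for another reason. */
static bool record_langdet_result(RECERR rc, LangDetStats *stats)
{
    if (rc == REC_OK) {
        stats->detected++;
        return true;
    }
    if (rc == LANGDET_INHERITED_WARN) {
        stats->inherited++;
        return true;
    }
    stats->failed++;
    return false;
}
```

A caller could pass in the return code after each page and, once the document is done, flag it for manual review if the inherited count is high.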