Knowledge bases

A knowledge base in Kofax Transformation Modules is a repository of binary files used to store extraction patterns. Knowledge bases are relatively compact: a knowledge base for 350 trained invoices is about 60 KB in size. Because the size increases linearly with the number of trained documents, a knowledge base for 5,000 trained invoices is roughly 1 MB.
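The linear relationship can be turned into a rough sizing estimate. The following Python sketch is only an illustration: it assumes the 350 invoices / 60 KB figure above as the single reference point, and the function name estimate_size_kb is hypothetical, not part of any Kofax interface. Actual sizes depend on your documents.

```python
# Reference point taken from the text: 350 trained invoices are about 60 KB.
REFERENCE_INVOICES = 350
REFERENCE_SIZE_KB = 60.0


def estimate_size_kb(trained_invoices: int) -> float:
    """Return a linear size estimate in KB (hypothetical helper, rough guide only)."""
    if trained_invoices < 0:
        raise ValueError("trained_invoices must not be negative")
    return trained_invoices * REFERENCE_SIZE_KB / REFERENCE_INVOICES


if __name__ == "__main__":
    for count in (350, 1000, 5000):
        print(f"{count} trained invoices: about {estimate_size_kb(count):.0f} KB")
```

Under this assumption, 5,000 trained invoices come out somewhat below 1 MB, which matches the approximate figure given above.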

When a user imports a knowledge base into a new project, the inherited knowledge enables the project to immediately extract data from a certain percentage of invoices. A single project may have multiple knowledge bases.

Documents that were not properly extracted can then be used to improve the extraction results for your project. This training is typically the responsibility of the system administrator, who processes sample documents that have been placed in a training set. The training session creates new extraction patterns that are stored with the project.

In addition, you can make these new extraction patterns portable by adding them to a knowledge base. If you do this, all projects using that knowledge base benefit from the training.

Note: Kofax Transformation Modules stores only the relevant extraction pattern information in the knowledge base. You cannot access or display the training document contents from the knowledge base.

The five available types of knowledge bases store the following information:

  • Amount Group (*.kba) - stores extraction patterns for amount fields such as subtotal and total.
  • Invoice Group (*.kbi) - stores patterns for invoice fields such as invoice number and invoice date.
  • Order Group (*.kbo) - stores extraction patterns for order fields such as order number and order date.
  • Trainable Group (*.kbtgl) - stores patterns for arbitrary values in fields not covered by the other group locators.
  • Trainable Table (*.kbtbl) - stores extraction patterns for various types of fields within a table.
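
Because each knowledge base type has its own file extension, a file can be classified by its name alone. The following Python sketch shows one way to do this outside the product. The extensions and type names come from the list above; the function name knowledge_base_type and the sample file names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# File extensions and type names as listed in this topic.
KNOWLEDGE_BASE_TYPES = {
    ".kba": "Amount Group",
    ".kbi": "Invoice Group",
    ".kbo": "Order Group",
    ".kbtgl": "Trainable Group",
    ".kbtbl": "Trainable Table",
}


def knowledge_base_type(filename: str) -> Optional[str]:
    """Return the knowledge base type for a file name, or None if unrecognized."""
    return KNOWLEDGE_BASE_TYPES.get(Path(filename).suffix.lower())


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    for name in ("suppliers.kbi", "totals.KBA", "lineitems.kbtbl", "readme.txt"):
        print(name, "->", knowledge_base_type(name))
```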

In addition to different types of knowledge bases, you can use one of two algorithms to create the extraction pattern: generic and specific.

Note: To provide both kinds of extraction patterns, create a separate knowledge base for each algorithm type. Because only the specific algorithm is available for the Trainable Table knowledge base, you can create up to nine different kinds of knowledge bases.
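
The count of nine follows from combining the five types with the two algorithms and leaving out the one unavailable pairing. The following Python sketch simply enumerates the combinations described in this topic; the names used are illustrative and do not correspond to any product setting.

```python
from itertools import product

KNOWLEDGE_BASE_TYPES = [
    "Amount Group",
    "Invoice Group",
    "Order Group",
    "Trainable Group",
    "Trainable Table",
]
ALGORITHMS = ["generic", "specific"]


def available_combinations():
    """Return every (type, algorithm) pair except generic Trainable Table."""
    return [
        (kb_type, algorithm)
        for kb_type, algorithm in product(KNOWLEDGE_BASE_TYPES, ALGORITHMS)
        if not (kb_type == "Trainable Table" and algorithm == "generic")
    ]


if __name__ == "__main__":
    combinations = available_combinations()
    for kb_type, algorithm in combinations:
        print(f"{kb_type} ({algorithm})")
    print("Total:", len(combinations))
```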

With extraction online learning, the feedback loop happens immediately inside the production system. No manual interaction is required to convert the knowledge of the training documents into an improved extraction rate.