Project structure

The project structure is displayed in the Project Explorer. You can create either a Contained or a Partial project. A Contained project has no Alien documents section, because every input document fits one of the defined document classes. A Partial project includes Alien documents sections for input documents that fit none of the defined classes.
Parts of contained and partial projects

Contained project structure:

  • Training set

  • Stopwords

  • Metawords

  • Phrases

  • Test set

  • Test results

Partial project structure:

  • Training set

    • Alien documents

  • Stopwords

  • Metawords

  • Phrases

  • Test set

    • Alien documents

  • Test results
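The two layouts above can also be written down as a small data structure. The following is only an illustrative Python sketch, not the application's project file format or API; the names `COMMON_SECTIONS` and `sections_for` are hypothetical and were chosen for this example.

```python
from __future__ import annotations

# Hypothetical model of the Project Explorer tree; all names are illustrative.
COMMON_SECTIONS = [
    "Training set",
    "Stopwords",
    "Metawords",
    "Phrases",
    "Test set",
    "Test results",
]


def sections_for(project_type: str) -> dict[str, list[str]]:
    """Map each top-level section to its subsections for the given project type."""
    if project_type not in ("Contained", "Partial"):
        raise ValueError(f"unknown project type: {project_type!r}")
    tree: dict[str, list[str]] = {name: [] for name in COMMON_SECTIONS}
    if project_type == "Partial":
        # Only Partial projects hold documents that fit no defined class.
        tree["Training set"].append("Alien documents")
        tree["Test set"].append("Alien documents")
    return tree
```

Calling `sections_for("Partial")` yields the same top-level sections as `sections_for("Contained")`, with an Alien documents entry added under Training set and Test set.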

If your Training set or Test set contains a large number of documents, you can quickly locate a specific document in the Main panel view of its class and see all relevant information about it. With the document selected in Project Explorer for viewing its contents, click the Locate in parent button in the Project Explorer header; the selected document is then highlighted in green in the Main panel view of its container class.


Selected document highlighted in project tree

Specify new project details in the New Project dialog:

  • Name: name of the new project

  • Location: save path of project

  • Project file: project file path

  • Language: OCR language of project; it cannot be changed after project creation.

  • Contained project: select this radio button to create a project with no alien documents (see above)

  • Partial project: select this radio button to create a project with alien documents (see above)


New project dialog box

The project details are shown in the Main panel. The Settings, Other and Statistics collapsible sections contain the following items:

Settings

  • Classifier method: type of document classifier; click in the Classifiers field and choose one of the following from the dropdown

    • Both text and layout based: classifies documents by both text and layout

    • Layout based only: classifies documents by layout only

    • Text based only: classifies documents by text only


    Classifier methods

  • Error weights: expand this section and adjust any of the three weights to change the default settings used for the automatic confidence threshold calculation

    • False negative

    • False positive

    • Misclassified


    Error types with weight

  • Project type: select Partial or Contained from the dropdown


    Project type options

  • Test training documents, too: if this checkbox is marked (True), training documents are also tested; if it is unmarked (False), they are not
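This manual does not state how the three Error weights enter the automatic confidence threshold calculation. One plausible reading is that each candidate threshold is scored by a weighted sum of the three error counts and the cheapest threshold is suggested. The Python sketch below illustrates that assumption only; it is not the application's algorithm, and `weighted_cost`, `suggest_threshold` and the sample layout are hypothetical.

```python
from typing import Iterable, Optional, Sequence, Tuple

# (true class, or None for an alien document; predicted class; confidence)
Sample = Tuple[Optional[str], str, float]


def weighted_cost(samples: Sequence[Sample], threshold: float,
                  w_fn: float = 1.0, w_fp: float = 1.0,
                  w_mis: float = 1.0) -> float:
    """Weighted error count at one threshold (assumed model, not the product's)."""
    cost = 0.0
    for true_cls, predicted_cls, confidence in samples:
        accepted = confidence >= threshold
        if true_cls is None:
            if accepted:
                cost += w_fp   # false positive: alien document accepted
        elif not accepted:
            cost += w_fn       # false negative: classifiable document rejected
        elif predicted_cls != true_cls:
            cost += w_mis      # misclassified: accepted into the wrong class
    return cost


def suggest_threshold(samples: Sequence[Sample],
                      candidates: Iterable[float],
                      **weights: float) -> float:
    """Return the candidate with the lowest weighted cost (first one on ties)."""
    return min(candidates, key=lambda t: weighted_cost(samples, t, **weights))
```

Under this model, raising the False negative weight makes rejecting a classifiable document more expensive, so the suggested threshold tends to move lower; raising the False positive weight tends to move it higher.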

Other

  • Created: project creation date

  • Description: provide some general comment for easier identification

  • Language: language set of the project used by OCR and word processing; cannot be changed after project creation

  • Project file path: local file path of project file

  • Selected confidence threshold: confidence threshold of the project initially set at project creation

  • Suggested confidence threshold based on the training set: optimal confidence threshold calculated on the training set documents

Statistics

  • Alien test documents: number of alien test documents

  • Classes: number of defined classes in the project

  • Hidden classes: number of hidden classes in the project; these do not participate in the training phase

  • Hidden training documents: number of hidden documents in the project; these do not participate in the training phase

  • Test documents: number of test documents in the project

  • Training documents: number of training documents in the project


Project properties window
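The effect of hidden classes and hidden training documents on training can be pictured as a simple filter. This is a hedged Python sketch of that rule as described above; `TrainingDoc` and `participating` are hypothetical names, and the application's internal representation is not documented here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TrainingDoc:
    """Illustrative stand-in for one training document."""
    name: str
    class_name: str
    hidden: bool = False


def participating(docs: Iterable[TrainingDoc],
                  hidden_classes: Iterable[str]) -> List[TrainingDoc]:
    """Keep documents that are neither hidden nor members of a hidden class."""
    excluded = set(hidden_classes)
    return [d for d in docs if not d.hidden and d.class_name not in excluded]
```

In this sketch, a document is left out of training if it is hidden itself or if its whole class is hidden.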