Adaptive Feature Classifier Properties window

This classifier is used throughout your project wherever content classification is enabled.

Use this window to configure the Adaptive Feature Classifier properties.

You need to retrain the project before any changes made to these settings can take effect.

Text Filtering

This group has the following settings:

Use digits

This setting controls whether the classifier uses digits as features or ignores them during text filtering. (Default: Cleared)

Min. word length

All words that are shorter than this value are ignored during text filtering. Regardless of word length, features with a very low or very high frequency are also ignored. (Default: 3)
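
The two Text Filtering settings can be pictured with a small Python sketch. This is only an illustration: the function name filter_tokens, its parameters, and the tokenization rule are assumptions made for this example, not the classifier's actual implementation.

```python
import re

def filter_tokens(text, use_digits=False, min_word_length=3):
    """Hypothetical helper: tokenize text and apply the Text Filtering settings."""
    # Assumption: when digits are not used, they are simply left out of tokens.
    pattern = r"[a-z0-9]+" if use_digits else r"[a-z]+"
    tokens = re.findall(pattern, text.lower())
    # Drop every token that is shorter than the minimum word length.
    return [t for t in tokens if len(t) >= min_word_length]
```

With the default values, filter_tokens("Invoice No 12345 of ACME") keeps only "invoice" and "acme"; with use_digits=True, "12345" is kept as well.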

Training

This group has the following settings:

Max. number of features

Sets the maximum number of internally generated features per class. (Default: 5000)

Min. feature length

Specifies the minimum number of characters that should be used for a feature. This value cannot be smaller than the Min. word length. (Default: 3)

Max. feature length

Specifies the maximum number of characters that are used for a feature. This value should not exceed 64 characters. (Default: 50)

Automatic selection of Min. feature frequency

Enables the Min. feature frequency to be set automatically. If this setting is selected, you cannot manually assign a Min. feature frequency value. (Default: Cleared)

Min. feature frequency

Specifies how many times a substring must occur in the training set of a class before it is used as a feature for content classification. (Default: 2)
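
As a rough illustration of this threshold, the following Python sketch counts candidate substrings across the sample documents of one class. The function name frequent_features and its parameters are hypothetical, and counting total (non-overlapping) occurrences rather than documents is an assumption made for this example.

```python
from collections import Counter

def frequent_features(class_docs, candidates, min_frequency=2):
    """Hypothetical helper: keep candidates that occur often enough in one class."""
    counts = Counter()
    for doc in class_docs:
        lowered = doc.lower()
        for feature in candidates:
            # Assumption: every non-overlapping occurrence counts once.
            counts[feature] += lowered.count(feature)
    return {f for f, n in counts.items() if n >= min_frequency}
```

For the sample documents "Invoice total due" and "Invoice number", the candidate "invoice" occurs twice and is kept, while "total" occurs once and is dropped at the default value of 2.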

Start features at beginning of words

Specifies that a feature substring needs to start at the beginning of a word. If this setting is cleared, the substring can start anywhere. (Default: Selected)

Max. words per feature (0-n)

Limits the number of words per feature. A value of zero means unlimited words, although the total number of characters of the words per feature cannot exceed the "Max. feature length" property. (Default: 2)
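
How the length, word-start, and word-count settings above interact can be sketched in Python. The function candidate_features and its parameter names are assumptions made for this example; the way the classifier actually generates features is not described here.

```python
def candidate_features(text, min_len=3, max_len=50, start_at_word=True, max_words=2):
    """Hypothetical helper: list substrings that satisfy the Training limits."""
    text = " ".join(text.lower().split())  # normalize whitespace
    starts = []
    for i, ch in enumerate(text):
        if ch == " ":
            continue
        at_word_start = i == 0 or text[i - 1] == " "
        if at_word_start or not start_at_word:
            starts.append(i)
    features = set()
    for s in starts:
        # The end index is bounded by the maximum feature length.
        for e in range(s + min_len, min(len(text), s + max_len) + 1):
            sub = text[s:e]
            if sub.endswith(" "):
                continue  # skip substrings that end in a space
            # A max_words value of 0 means the word count is unlimited.
            if max_words and len(sub.split()) > max_words:
                break
            features.add(sub)
    return features
```

For the text "due date" with min_len=3, max_len=5, and max_words=1, the sketch yields "due", "dat", and "date". Clearing start_at_word additionally yields "ate", and setting max_words=0 with a larger max_len allows the two-word feature "due date".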

Use fuzzy string match

Enables fuzzy string matching, at the cost of slower classification. (Default: Cleared)

Fuzzy length (5-10)

Configures the fuzzy string comparison. (Default: 5)

Automatic selection of Min. class entropy

Enables the Min. class entropy to be set automatically. If this setting is selected, you cannot manually assign a Min. class entropy value. (Default: Cleared)

Min. class entropy (0.0 - 1.0)

Controls the importance of a feature, depending on the number of classes in which it occurs. A value of 1.0 requires that a feature occurs only in the sample documents of a single class; otherwise, it is not used for classification. The lower the value, the more classes can contain the feature in the training set. (Default: 0.600)
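
One way to picture this threshold is a score that equals 1.0 when a feature occurs in a single class and falls toward 0.0 as it spreads evenly over all classes. The Python sketch below uses one minus the normalized Shannon entropy of the per-class counts. This formula and the name feature_class_score are assumptions chosen to match the behavior described above; the classifier's real calculation may differ.

```python
import math

def feature_class_score(counts_per_class):
    """Hypothetical score: 1.0 for a single-class feature, 0.0 for an even spread."""
    total = sum(counts_per_class)
    if total == 0:
        raise ValueError("the feature does not occur in any class")
    k = len(counts_per_class)
    if k < 2:
        return 1.0  # with one class, the feature is trivially class-specific
    probs = [c / total for c in counts_per_class if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    # Assumption: normalize by the maximum possible entropy, log(k).
    return 1.0 - entropy / math.log(k)

def passes_min_class_entropy(counts_per_class, min_class_entropy=0.6):
    """Keep the feature only if its score reaches the configured threshold."""
    return feature_class_score(counts_per_class) >= min_class_entropy
```

Under this assumed formula, a feature that occurs only in one of three classes scores 1.0 and always passes, while a feature that occurs equally often in all three classes scores close to 0.0 and is rejected at the default value.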

Definitions for the buttons at the bottom of this window can be found in Common Transformation Designer Buttons.