Information gain |
Each attribute in a dataset is assigned a score based on the additional information that an attribute provides regarding a class in terms of entropy reduction. |
Simple and fast. Good for prediction problems where the high dimensionality limits the application of more sophisticated methods. |
Does not account for redundancy and interactions among attributes. |
Relief [17] |
Randomly samples an instance from the data and then locates its nearest neighbor from the same and opposite class. The values of the attributes of the nearest neighbors are compared to the sampled instance and used to update relevance scores for each attribute. |
Same as information gain. |
Same as information gain. |
Correlation- based feature selection (CFS)[18] |
Merit of a given attribute is calculated taking into account the correlation of the attribute with the target class as well as the correlation of the attribute with other attributes in the dataset. Attributes with stronger correlation with the target class and weaker correlation with other attributes are ranked higher. |
Fast and independent of the target learning method. Accounts for redundancy. |
Does not account for potential interactions between attributes. |
Consistency- based [19] |
Identifies attribute sets whose values divide the data into subsets containing a strong single class majority. |
Independent of the target learning method. Accounts for redundancy and interactions among attributes. |
Slower than correlation- based feature selection.
|
Wrappers[20] |
Uses a target learning algorithm to estimate the worth of attribute subsets. A search algorithm is used to test as many combinations of attributes as possible and find an optimal solution. |
Accounts for redundancy and interactions among attributes. Generally give better results than other techniques because candidate solutions are evaluated using the target learning algorithm. |
Specific to the learning algorithm that is used to evaluate the worth of the subsets (has to be rerun for each learning algorithm). Slower than the other methods, precluding its application to datasets with high dimensionality and slow learning algorithms. |