Table 2.
Cross-validation accuracy (number of features)

Type of input features | Without feature selection | With feature selection: Info_Gain | With feature selection: Chi-square | With feature selection: Feat_Perm |
---|---|---|---|---|
(i) OTU | 0.762 (7048) | 0.779 (60) | 0.777 (50) | 0.798 (20) |
(ii) Clade | 0.738 (14402) | 0.802 (110) | 0.800 (170) | 0.802 (100) |
(iii) Function | 0.761 (6191) | 0.762 (120) | 0.754 (100) | 0.761 (60) |
(iv) Hybrid | 0.777 (1556/1518) | 0.804 (92/78) | 0.805 (68/62) | 0.805 (28/22) |
In each cell, the first number is the cross-validation accuracy and the number in parentheses is the total number of features used to train and test the classifier. The four types of input features were (i) OTUs only, (ii) OTUs and clades comprising related sets of OTUs, (iii) functional predictions made using PICRUSt, and (iv) a hybrid dataset comprising all generated features. The feature selection techniques were two filter methods, information gain (Info_Gain) and chi-square, and the feature permutation wrapper method (Feat_Perm).
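
As an illustration of how the two filter methods named in the caption rank features, the following is a minimal sketch and not the pipeline used to produce Table 2. It assumes discrete (here presence/absence) feature values; abundance data would first need to be binned. The function names, the feature names (`otu_a`, `otu_b`, `otu_c`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus the entropy left after splitting on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-by-class contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for value, value_count in feature_totals.items():
        for label, label_count in label_totals.items():
            expected = value_count * label_count / n
            stat += (observed[(value, label)] - expected) ** 2 / expected
    return stat


def top_k(features, labels, score, k):
    """Names of the k features with the highest score (ties keep input order)."""
    ranked = sorted(features, key=lambda name: score(features[name], labels),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy data: presence/absence of three features in six samples.
labels = ["case", "case", "case", "control", "control", "control"]
features = {
    "otu_a": [1, 1, 1, 0, 0, 0],  # separates the two classes perfectly
    "otu_b": [1, 0, 1, 0, 1, 0],  # only weakly associated with the class
    "otu_c": [1, 1, 1, 1, 1, 1],  # constant, carries no information
}

selected_by_info_gain = top_k(features, labels, information_gain, k=2)
selected_by_chi_square = top_k(features, labels, chi_square, k=2)
```

Both scores are computed for each feature independently of any classifier, which is what distinguishes these filter methods from a wrapper method such as feature permutation. The selected subset would then be passed to the classifier and evaluated by cross-validation.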