Table 5.
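Performance of random forest classifiers in predicting the presence of CBMa (accuracy, sensitivity, and specificity in %; MCC, the Matthews correlation coefficient, is unitless)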
| Performance metric | Validation: All 5933 features | Validation: Top 50 features | Validation: 44 features (no C-terminus) | Validation: Top 20 features | Testing: Top 20 features |
|---|---|---|---|---|---|
| Accuracy | 90.8 ± 2.1 | 90.9 ± 2.1 | 88.2 ± 2.5 | 89.3 ± 2.4 | 89.7 |
| Sensitivity | 93.7 ± 2.8 | 92.2 ± 2.9 | 89.6 ± 3.4 | 90.0 ± 3.2 | 95.7 |
| Specificity | 87.9 ± 3.5 | 89.7 ± 3.3 | 86.9 ± 3.7 | 88.5 ± 3.6 | 87.4 |
| MCC | 0.80 ± 0.05 | 0.81 ± 0.05 | 0.76 ± 0.05 | 0.78 ± 0.05 | 0.68 |
Validation and testing were performed on 90% and 10% of the dataset, respectively. Validation performance is reported as the mean ± 1 standard deviation over 100 repetitions of 5-fold cross-validation; testing performance is a single evaluation on the held-out 10%.
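For reference, the four metrics in the table follow from a binary confusion matrix (TP, FP, TN, FN, with "positive" meaning the CBM is present). The sketch below is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes the ± values are sample standard deviations taken over per-repetition metric values.

```python
import math
import statistics


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity (in %) and MCC (unitless) from a confusion matrix."""
    total = tp + fp + tn + fn
    accuracy = 100.0 * (tp + tn) / total
    sensitivity = 100.0 * tp / (tp + fn)  # true-positive rate
    specificity = 100.0 * tn / (tn + fp)  # true-negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC lies in [-1, 1]; by convention it is set to 0 when any marginal is empty
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


def summarize(runs):
    """Mean and sample standard deviation of each metric over repeated CV runs.

    Assumption: one metrics dict per repetition of 5-fold cross-validation.
    """
    summary = {}
    for key in runs[0]:
        values = [run[key] for run in runs]
        summary[key] = (statistics.mean(values), statistics.stdev(values))
    return summary


# Hypothetical counts, for illustration only (not data from this study)
example = classification_metrics(tp=45, fp=6, tn=44, fn=5)
print(example)  # accuracy 89.0, sensitivity 90.0, specificity 88.0, MCC about 0.78
```

Because MCC uses all four cells of the confusion matrix, it also reflects class imbalance, which accuracy, sensitivity, and specificity do not capture individually.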