Table 3: Performance of different machine learning classifiers on the training dataset S2′ via 5-fold cross-validation
| Classifier | Method | Features | Threshold dependent | | | | | Threshold independent | |
|---|---|---|---|---|---|---|---|---|---|
| | | | Score range | Thresholdᵃ | Sensitivity | Specificity | MCC | SN_496ᵇ | AUC |
| Basic | KNNᶜ | 518 | 0.100∼0.900 | 0.500∼0.550 | 0.593 | 0.621 | 0.214 | 0.607 ± 0.014 | 0.6305 |
| | RFᵈ | Random | 0.080∼0.900 | 0.380∼0.579 | 0.590 ± 0.168 | 0.617 ± 0.183 | 0.219 ± 0.019 | 0.600 ± 0.007 | 0.6413 ± 0.0082 |
| | SVM | 518 | 0.328∼0.743 | 0.542 | 0.567 | 0.681 | 0.250 | 0.615 | 0.6509 |
| Optimised | SVM + FFS | 78ᵉ | 0.170∼0.836 | 0.561 | 0.518 | 0.760 | 0.287 | 0.621 | 0.6768 |
| | SVM + ASI | 74ᵉ | 0.098∼0.918 | 0.549 | 0.623 | 0.750 | 0.376 | 0.681 | 0.7479 |
ᵃ The threshold is chosen to maximise the MCC.
ᵇ Sensitivity is measured among the tested genes with the top 496 prediction probabilities.
ᶜ The k-value is set to the square root of the number of training samples in 5-fold cross-validation (i.e., k = 20) [62].
ᵈ The random forest uses 50 randomly grown trees; the modelling and validation procedures are repeated 10 times.
ᵉ These features constitute the optimum feature set for the given machine learning method.
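
The threshold column is described as the cut-off that maximises MCC. A minimal sketch of one way such a cut-off could be selected is given below. It assumes binary labels (1 = positive), that a sample is called positive when its score is at or above the cut-off, and that every distinct prediction score is tried as a candidate; the source does not state how candidates were scanned, and the function names (`mcc`, `confusion`, `best_threshold_by_mcc`) are hypothetical.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def confusion(labels, scores, threshold):
    """Count TP, TN, FP, FN when score >= threshold is called positive (assumption)."""
    tp = tn = fp = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        elif predicted == 1 and y == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def best_threshold_by_mcc(labels, scores):
    """Try each distinct score as a cut-off and keep the first one with the highest MCC."""
    best_t, best_m = None, -2.0  # MCC lies in [-1, 1], so -2.0 is a safe sentinel
    for t in sorted(set(scores)):
        m = mcc(*confusion(labels, scores, t))
        if m > best_m:
            best_t, best_m = t, m
    return best_t, best_m


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    toy_labels = [0, 0, 1, 1, 1, 0]
    toy_scores = [0.2, 0.3, 0.6, 0.7, 0.9, 0.4]
    print(best_threshold_by_mcc(toy_labels, toy_scores))
```

Sensitivity and specificity at the selected cut-off follow from the same counts, as TP / (TP + FN) and TN / (TN + FP) respectively. In a cross-validation setting the scan would be run on the held-out predictions; ties are broken here in favour of the lowest cut-off, which is an arbitrary choice.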