2022 Nov 18;11:giac103. doi: 10.1093/gigascience/giac103

Table 3:

Performance of different machine learning classifiers on the training dataset S2′ via 5-fold cross-validation

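Threshold-dependent columns: Score range, Threshold, Sensitivity, Specificity, MCC. Threshold-independent columns: SN_496, AUC.

| Classifier | Method | Features | Score range | Threshold^a | Sensitivity | Specificity | MCC | SN_496^b | AUC |
|---|---|---|---|---|---|---|---|---|---|
| Basic | KNN^c | 518 | 0.100∼0.900 | 0.500∼0.550 | 0.593 | 0.621 | 0.214 | 0.607 ± 0.014 | 0.6305 |
| Basic | RF^d | Random | 0.080∼0.900 | 0.380∼0.579 | 0.590 ± 0.168 | 0.617 ± 0.183 | 0.219 ± 0.019 | 0.600 ± 0.007 | 0.6413 ± 0.0082 |
| Basic | SVM | 518 | 0.328∼0.743 | 0.542 | 0.567 | 0.681 | 0.250 | 0.615 | 0.6509 |
| Optimised | SVM + FFS | 78^e | 0.170∼0.836 | 0.561 | 0.518 | 0.760 | 0.287 | 0.621 | 0.6768 |
| Optimised | SVM + ASI | 74^e | 0.098∼0.918 | 0.549 | 0.623 | 0.750 | 0.376 | 0.681 | 0.7479 |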
^a This threshold is chosen by maximising the MCC.

^b This sensitivity is measured among the tested genes with the top 496 prediction probabilities.

^c The k-value is set to the square root of the number of training samples in 5-fold cross-validation (i.e., k = 20) [62].

^d This random forest algorithm uses 50 randomly grown trees, and the modelling and validation procedures are repeated 10 times.

^e These features constitute the optimal feature set for the corresponding machine learning method.
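
The footnotes describe three small computations: choosing the threshold that maximises MCC, measuring sensitivity among the top-ranked predictions (SN_496), and setting k for KNN from the training-set size. The following is a minimal Python sketch of these steps, not the authors' implementation. The function names are hypothetical, and several details are assumptions not stated in the table: a score at or above the threshold is called positive, candidate thresholds are the observed scores, the top-k sensitivity is divided by the total number of positives, and k is rounded to the nearest integer.

```python
import math


def confusion(labels, scores, threshold):
    """Count TP, FP, TN, FN, calling a sample positive when score >= threshold (assumed rule)."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_mcc_threshold(labels, scores):
    """Return the observed score that maximises MCC when used as the threshold."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: mcc(*confusion(labels, scores, t)))


def sensitivity_at_top_k(labels, scores, k):
    """Fraction of all positives found among the k highest-scoring samples (assumed definition)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(1 for _, y in ranked[:k] if y)
    total_positives = sum(1 for y in labels if y)
    return hits / total_positives if total_positives else 0.0


def knn_k(n_training_samples):
    """k as the square root of the training-set size, rounded to the nearest integer (assumed rounding)."""
    return round(math.sqrt(n_training_samples))
```

With the SN_496 column in mind, `sensitivity_at_top_k(labels, scores, 496)` would be the corresponding call. When several thresholds tie on MCC, this sketch returns the lowest one, which may explain why a table reports a threshold range rather than a single value for some methods.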