Table 2.
Comparison of the discriminative performance of 11 machine learning models in the testing set.
Characteristics | SVM | NN | MLP | GP | GBM | LR | NB | XGB | C5.0 | KNN | RF |
---|---|---|---|---|---|---|---|---|---|---|---|
Apparent prevalence | 0.21(0.08, 0.41) | 0.21(0.08, 0.41) | 0.25(0.11, 0.45) | 0.14(0.06, 0.27) | 0.78(0.71, 0.83) | 0.14(0.06, 0.27) | 0.43(0.24, 0.63) | 0.18(0.09, 0.31) | 0.18(0.06, 0.37) | 0.21(0.08, 0.41) | 0.82(0.76, 0.87) |
True prevalence | 0.32(0.16, 0.52) | 0.32(0.16, 0.52) | 0.32(0.16, 0.52) | 0.18(0.09, 0.31) | 0.79(0.73, 0.85) | 0.18(0.09, 0.31) | 0.32(0.16, 0.52) | 0.18(0.09, 0.31) | 0.32(0.16, 0.52) | 0.32(0.16, 0.52) | 0.79(0.73, 0.85) |
Sensitivity | 0.56(0.21, 0.86) | 0.56(0.21, 0.86) | 0.67(0.30, 0.93) | 0.56(0.21, 0.86) | 0.96(0.92, 0.98) | 0.56(0.21, 0.86) | 0.89(0.52, 1.00) | 0.78(0.40, 0.97) | 0.56(0.21, 0.86) | 0.56(0.21, 0.86) | 1.00(0.98, 1.00) |
Specificity | 0.95(0.74, 1.00) | 0.95(0.74, 1.00) | 0.95(0.74, 1.00) | 0.95(0.83, 0.99) | 0.93(0.81, 0.99) | 0.95(0.83, 0.99) | 0.79(0.54, 0.94) | 0.95(0.83, 0.99) | 1.00(0.82, 1.00) | 0.95(0.74, 1.00) | 0.88(0.75, 0.96) |
PPV | 0.83(0.36, 1.00) | 0.83(0.36, 1.00) | 0.86(0.42, 1.00) | 0.71(0.29, 0.96) | 0.98(0.95, 1.00) | 0.71(0.29, 0.96) | 0.67(0.35, 0.90) | 0.78(0.40, 0.97) | 1.00(0.48, 1.00) | 0.83(0.36, 1.00) | 0.97(0.93, 0.99) |
NPV | 0.82(0.60, 0.95) | 0.82(0.60, 0.95) | 0.86(0.64, 0.97) | 0.91(0.78, 0.97) | 0.85(0.72, 0.94) | 0.91(0.78, 0.97) | 0.94(0.70, 1.00) | 0.95(0.83, 0.99) | 0.83(0.61, 0.95) | 0.82(0.60, 0.95) | 1.00(0.91, 1.00) |
PLR | 10.56(1.44, 77.62) | 10.56(1.44, 77.62) | 12.67(1.78, 90.18) | 11.39(2.61, 49.66) | 13.73(4.61, 40.91) | 11.39(2.61, 49.66) | 4.22(1.72, 10.39) | 15.94(3.95, 64.40) | Inf(NaN, Inf) | 10.56(1.44, 77.62) | 8.60(3.77, 19.60) |
NLR | 0.47(0.22, 0.98) | 0.47(0.22, 0.98) | 0.35(0.14, 0.89) | 0.47(0.22, 0.97) | 0.05(0.02, 0.09) | 0.47(0.22, 0.97) | 0.14(0.02, 0.91) | 0.23(0.07, 0.79) | 0.44(0.21, 0.92) | 0.47(0.22, 0.98) | 0.00(0.00, NaN) |
All predictive models were developed without data augmentation techniques.
SVM support vector machine, NN neural network, MLP multi-layer perceptron, GP Gaussian process, GBM gradient boosting machine, LR logistic regression, NB naive Bayes, XGB XGBoost, C5.0 C5.0 decision tree, KNN k-nearest neighbor, RF random forest, PPV positive predictive value, NPV negative predictive value, PLR positive likelihood ratio, NLR negative likelihood ratio.
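
The point estimates above follow the standard 2 × 2 diagnostic-test definitions. The sketch below (Python, illustrative only; it is not the pipeline used to produce Table 2, and confidence intervals are omitted) shows how each row can be derived from a testing-set confusion matrix, and why PLR becomes infinite when specificity is 1.00 (C5.0) and NLR becomes 0 when sensitivity is 1.00 (RF).

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Point estimates of the Table 2 metrics from a 2x2 confusion matrix.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives in the testing set.
    (Confidence intervals are not computed in this sketch.)
    """
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)  # sensitivity: positives correctly identified
    spec = tn / (tn + fp)  # specificity: negatives correctly identified
    return {
        "apparent_prevalence": (tp + fp) / n,  # proportion predicted positive
        "true_prevalence": (tp + fn) / n,      # proportion actually positive
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
        "npv": tn / (tn + fn) if (tn + fn) else float("nan"),
        # positive likelihood ratio: infinite when specificity = 1 (no false positives)
        "plr": sens / (1 - spec) if spec < 1 else float("inf"),
        # negative likelihood ratio: 0 when sensitivity = 1 (no false negatives)
        "nlr": (1 - sens) / spec if spec > 0 else float("nan"),
    }
```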