J Thorac Dis. 2021 Feb;13(2):1215–1229. doi: 10.21037/jtd-20-2580

Table 2. Comparison of clinical model based on eight machine learning classifiers in predicting critical illness among patients with COVID-19.

| Classifier | AUC (95% CI) | Accuracy, % (95% CI) | F1 score (95% CI) | PPV, % (95% CI) | NPV, % (95% CI) | Specificity, % (95% CI) | Sensitivity, % (95% CI) |
|---|---|---|---|---|---|---|---|
| XGBoost | 0.960 (0.913–1.000) | 90.6 (81.1–98.1) | 82.8 (65.9–100.0) | 70.6 (54.5–100.0) | 100.0 (95.1–100.0) | 87.8 (75.6–100.0) | 100.0 (83.3–100.0) |
| AdaBoost | 0.929 (0.857–1.000) | 84.9 (71.7–98.1) | 75.0 (53.3–100.0) | 60.0 (44.4–100.0) | 100.0 (91.1–100.0) | 80.5 (63.4–100.0) | 100.0 (66.7–100.0) |
| RF | 0.959 (0.913–1.000) | 90.6 (81.1–98.1) | 82.8 (68.4–100.0) | 70.6 (54.5–100.0) | 100.0 (97.1–100.0) | 87.8 (75.6–100.0) | 100.0 (91.7–100.0) |
| LR | 0.937 (0.871–1.000) | 90.6 (81.1–98.1) | 82.8 (68.4–96.0) | 70.6 (54.5–92.3) | 100.0 (97.1–100.0) | 87.8 (75.6–97.6) | 100.0 (91.7–100.0) |
| KNN | 0.851 (0.718–0.983) | 90.6 (83.0–98.1) | 78.9 (55.6–100.0) | 83.3 (62.5–100.0) | 92.9 (86.7–100.0) | 95.1 (87.8–100.0) | 75.0 (50.0–100.0) |
| SVM | 0.917 (0.834–1.000) | 92.5 (73.6–98.1) | 86.5 (57.1–100.0) | 81.8 (46.2–100.0) | 97.4 (92.7–100.0) | 95.1 (65.9–100.0) | 91.7 (75.0–100.0) |
| NB | 0.856 (0.734–0.977) | 86.8 (77.4–94.3) | 74.1 (53.8–94.7) | 66.7 (50.0–90.0) | 94.9 (87.8–100.0) | 87.8 (75.6–97.6) | 83.3 (58.3–100.0) |
| BPNN | 0.821 (0.680–0.962) | 90.6 (83.0–96.2) | 76.6 (52.0–95.7) | 90.0 (69.2–100.0) | 90.9 (84.8–97.6) | 97.6 (92.7–100.0) | 66.7 (41.7–91.7) |

The confusion matrix in our study is a 2×2 contingency table reporting the numbers of true positives, false positives, false negatives, and true negatives. Sensitivity = true positives/(true positives + false negatives) × 100%. Specificity = true negatives/(true negatives + false positives) × 100%. Accuracy = (true positives + true negatives)/n × 100%. The F1 score is the harmonic mean of precision and recall, where the best value is 1.0 and the worst is 0.0: F1 = 2 × (precision × recall)/(precision + recall), with precision = true positives/(true positives + false positives) and recall = true positives/(true positives + false negatives). PPV is the probability that the disease is present when the test is positive (expressed as a percentage); NPV is the probability that the disease is absent when the test is negative (expressed as a percentage). The ROC curve was created by plotting the true positive rate (sensitivity) against the false positive rate (1 − specificity) across varying predicted-probability thresholds, and AUC values were computed from these curves. 95% CIs were calculated with the bootstrap method (100 iterations). AUC, area under the curve; CI, confidence interval; PPV, positive predictive value; NPV, negative predictive value; NB, Naive Bayes; LR, Logistic Regression; RF, Random Forest; XGBoost, Extreme Gradient Boosting; AdaBoost, Adaptive Boosting; KNN, K-Nearest Neighbor; SVM, kernel Support Vector Machine; BPNN, Back Propagation Neural Network.
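As a concrete illustration of the footnote's definitions, here is a minimal Python sketch that computes the table's metrics from raw confusion-matrix counts. The counts are hypothetical, chosen only for the example; they are not taken from the study.

```python
# Hypothetical confusion-matrix counts (not from the study).
tp, fp, fn, tn = 12, 5, 0, 36
n = tp + fp + fn + tn

sensitivity = tp / (tp + fn)   # recall / true positive rate
specificity = tn / (tn + fp)
accuracy = (tp + tn) / n
ppv = tp / (tp + fp)           # precision
npv = tn / (tn + fn)
f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of precision and recall

print(f"Sensitivity {sensitivity:.1%}, Specificity {specificity:.1%}, "
      f"Accuracy {accuracy:.1%}, PPV {ppv:.1%}, NPV {npv:.1%}, F1 {f1:.3f}")
```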
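The ROC construction described in the footnote can be sketched the same way: sweep the predicted-probability threshold from high to low and record one (false positive rate, true positive rate) pair per cut-off. The labels and scores below are hypothetical.

```python
import numpy as np

# Hypothetical labels and predicted probabilities (not from the study).
y_true = np.array([0, 0, 0, 1, 0, 1, 1, 0, 1, 1])
y_score = np.array([0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.75, 0.8, 0.9])

for thr in np.unique(y_score)[::-1]:                       # sweep thresholds high -> low
    pred = (y_score >= thr).astype(int)
    tpr = (pred & y_true).sum() / y_true.sum()             # sensitivity
    fpr = (pred & (1 - y_true)).sum() / (1 - y_true).sum() # 1 - specificity
    print(f"threshold {thr:.2f}: FPR {fpr:.2f}, TPR {tpr:.2f}")
```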
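For the 95% CIs, one plausible reading of "bootstrap method (100 iterations)" is case resampling with percentile intervals, sketched below for the AUC using scikit-learn's roc_auc_score. The resampling scheme and interval type are assumptions, since the footnote does not specify them, and the data are again hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
# Hypothetical labels and scores (not from the study).
y_true = rng.integers(0, 2, size=200)
y_score = np.clip(0.25 + 0.5 * y_true + rng.normal(0, 0.2, size=200), 0, 1)

boot_aucs = []
for _ in range(100):                                     # 100 bootstrap iterations
    idx = rng.integers(0, len(y_true), size=len(y_true)) # resample cases with replacement
    if y_true[idx].min() == y_true[idx].max():           # AUC needs both classes present
        continue
    boot_aucs.append(roc_auc_score(y_true[idx], y_score[idx]))

lo, hi = np.percentile(boot_aucs, [2.5, 97.5])           # percentile 95% CI
print(f"AUC {roc_auc_score(y_true, y_score):.3f} (95% CI {lo:.3f}-{hi:.3f})")
```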