2025 May 1;21(5):843–854. doi: 10.5664/jcsm.11560

Table 2.

Comparison of model performance predicting OSA risk in different machine learning techniques.

	LR	RF	SVM	XGBoost
AUROC (%)	97.2	96.0	91.6	96.7
(95% CI)	(93.9–99.5)	(91.7–99.2)	(85.7–96.8)	(92.4–99.5)
AUPRC (%)	97.0	95.3	91.9	95.7
(95% CI)	(93.7–99.5)	(91.0–99.1)	(85.1–96.8)	(92.4–99.5)
Sensitivity (%)	93.0	97.7	90.7	95.3
(95% CI)	(85.9–97.6)	(84.5–96.2)	(75.9–90.9)	(84.6–96.3)
Specificity (%)	90.7	83.7	76.7	86.0
(95% CI)	(85.1–97.6)	(84.9–96.3)	(75.7–91.6)	(84.2–96.4)
Accuracy (%)	91.9	90.7	83.7	90.7
(95% CI)	(86.0–97.6)	(84.8–96.3)	(75.4–90.7)	(84.5–96.4)
PPV (%)	90.9	85.7	79.6	87.2
(95% CI)	(85.3–97.1)	(84.6–96.3)	(75.6–90.8)	(84.4–96.5)
NPV (%)	92.9	97.3	89.2	94.9
(95% CI)	(85.3–96.8)	(84.4–96.4)	(76.1–91.1)	(84.5–96.5)
F1 score (%)	92.0	91.3	84.8	91.1
(95% CI)	(85.5–96.8)	(84.3–96.0)	(76.0–91.1)	(83.8–96.4)
Threshold	0.547	0.308	0.298	0.396

The F1 score is calculated as 2 × sensitivity × PPV / (sensitivity + PPV). AUROC = area under the receiver operating characteristic curve, AUPRC = area under the precision–recall curve, CI = confidence interval, LR = logistic regression, NPV = negative predictive value, PPV = positive predictive value, RF = random forest, RUS = random undersampling, SVM = support vector machine, XGBoost = extreme gradient boosting.
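
As a quick consistency check on the F1 formula in the footnote, the following minimal Python sketch recomputes F1 from the sensitivity and PPV values reported in Table 2 and compares the result with the reported F1 score. The function name `f1_from_sensitivity_ppv` and the dictionary `TABLE_2` are illustrative names, not taken from the article. Because the inputs are already rounded to one decimal place, the recomputed values can be expected to match the table only to within about 0.1 percentage points.

```python
def f1_from_sensitivity_ppv(sensitivity: float, ppv: float) -> float:
    """Harmonic mean of sensitivity (recall) and PPV (precision).

    Inputs and output share the same unit (here, percent).
    """
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# Values copied from Table 2: (sensitivity %, PPV %, reported F1 %).
TABLE_2 = {
    "LR": (93.0, 90.9, 92.0),
    "RF": (97.7, 85.7, 91.3),
    "SVM": (90.7, 79.6, 84.8),
    "XGBoost": (95.3, 87.2, 91.1),
}

for model, (sens, ppv, reported_f1) in TABLE_2.items():
    recomputed = f1_from_sensitivity_ppv(sens, ppv)
    # Differences come from rounding of the tabulated inputs.
    print(f"{model}: recomputed F1 = {recomputed:.2f}, "
          f"reported = {reported_f1}, diff = {recomputed - reported_f1:+.2f}")
```

This check only covers the point estimates. The confidence intervals cannot be recomputed from the table alone, since they depend on the underlying test-set predictions.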