Table 3. Main findings, validation, best models, and clinical relevance for each study.
AUROC, area under the receiver operating characteristic curve; MESS, mangled extremity severity score; ACS, American College of Surgeons; NSQIP, National Surgical Quality Improvement Program; VSGNE, Vascular Study Group of New England; CV, cross-validation; TEG, thromboelastography; AUC, area under the curve; BN, Bayesian network; XGBoost, extreme gradient boosting; RF, random forest; NB, Naïve Bayes; SVM, support vector machine; ANN, artificial neural network; LR, logistic regression.
| Study | Main findings | Calibration/Validation | Best model | Clinical utility/Comparison |
| Li et al. (2024) [1] | XGBoost AUROC=0.93 (0.92-0.94); LR AUROC=0.72 (0.70-0.74). Event rate ~9.0%. | Brier score 0.09 (good calibration). 10-fold CV with 70/30 train-test split. | XGBoost | Outperformed LR (AUROC 0.93 vs 0.72). Accurate prediction of 30-day outcomes; authors note the need for prospective validation. |
| Perkins et al. (2020) [2] | BN AUROC=0.95 (0.92-0.98) for predicting failed revascularization (vs MESS AUROC=0.74). Accuracy not reported explicitly. | Calibration slope 1.96 (development), 1.72 (validation); Brier score 0.05. Validation: 10-fold CV plus an external UK cohort. | Bayesian network | Outperformed MESS (AUROC 0.95 vs 0.74). Provides individualized limb-salvage risk estimates to inform decision-making in trauma. |
| Ghandour et al. (2025) [3] | LR (baseline + TEG data) AUC=0.76; accuracy=0.70; sensitivity=0.68; specificity=0.71. XGBoost and the tree-based model had similar AUCs (~0.72-0.76). | Five-fold CV with 70/30 train-test split. LR had the best combined discrimination and calibration. | Logistic regression (with TEG) | Combining patient factors with TEG improved prediction of 1-year post-revascularization thrombosis. May help identify high-risk patients for tailored anticoagulation. |
| Li et al. (2024) [4] | XGBoost AUROC=0.93 (0.92-0.94) (vs RF 0.92, NB 0.87, SVM 0.85, ANN 0.80, LR 0.63). Overall accuracy ~0.86. | Brier score 0.08 (good calibration). 10-fold CV with 70/30 train-test split. | XGBoost | Demonstrated strong discrimination in a setting where no clinical risk tool exists. Potential to improve risk stratification beyond traditional ACS-NSQIP/VSGNE scores. |
| Li et al. (2024) [5] | XGBoost AUROC=0.94 (0.93-0.95); accuracy=0.86; sensitivity=0.87; specificity=0.85. LR AUROC=0.67. | 10-fold CV with 70/30 train-test split. XGBoost performance remained high postoperatively (AUROC up to 0.98). | XGBoost | Significantly better than LR (AUROC 0.94 vs 0.67). High predictive accuracy could guide perioperative risk-mitigation strategies. |
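Table 3 compares studies on discrimination (AUROC), the Brier score, the calibration slope, and threshold-based accuracy, sensitivity, and specificity. As a reading aid, the sketch below shows one standard way to compute each quantity from a model's predicted probabilities and observed 0/1 outcomes. It is not the code used by any of the cited studies. The function names, the 0.5 decision threshold, and the toy data are illustrative assumptions. The calibration slope is the coefficient of logit(p) in a logistic recalibration model, fitted here with plain Newton-Raphson.

```python
import math


def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney formulation: the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity and specificity after dichotomizing predictions.
    The 0.5 cut-off is an assumption; studies may choose other thresholds."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def calibration_slope(y_true, y_prob, iterations=50):
    """Slope b of the recalibration model logit(P(y=1)) = a + b * logit(p),
    fitted by Newton-Raphson. b = 1 is ideal; b < 1 suggests predictions that
    are too extreme, b > 1 predictions that are too moderate. Needs 0 < p < 1
    and outcomes that are not perfectly separated by p."""
    x = [math.log(p / (1 - p)) for p in y_prob]
    a, b = 0.0, 1.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: (a, b) += inverse(information) @ score
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b


# Hypothetical toy data: observed outcomes and predicted probabilities.
y = [0, 0, 1, 1, 0, 1]
p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(auroc(y, p))              # 8 of 9 positive/negative pairs ranked correctly, about 0.889
print(brier_score(y, p))        # about 0.114
print(threshold_metrics(y, p))  # accuracy 5/6, sensitivity 2/3, specificity 1.0
print(calibration_slope(y, p))
```

In practice, library routines such as scikit-learn's `roc_auc_score` and `brier_score_loss` cover the first two metrics. The pairwise AUROC loop is quadratic in sample size and is written this way only to make the probabilistic interpretation explicit.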