Table 2. PROBAST-informed appraisal of primary prediction model studies contributing to Table 1.
Items are recorded as “Yes/No/NR/Unclear” based on study reporting; methodological details that are not reported (e.g., missing-data handling, class-imbalance handling, calibration/utility) are conservatively coded as “NR/Unclear.” The appraisal is intended to put AUC-based performance comparisons in context by highlighting validation design, reporting completeness, and clinical readiness; it does not assign formal risk-of-bias scores.
THA: total hip arthroplasty; TKA: total knee arthroplasty; SSI: surgical site infection; DCA: decision curve analysis; NR: not reported; AUC: area under the curve; EMR: electronic medical record; CV: cross-validation; NSQIP: National Surgical Quality Improvement Program; RF: Random Forest; kNN: k-nearest neighbors; SMOTE: Synthetic Minority Over-sampling Technique; PROBAST: Prediction model Risk Of Bias ASsessment Tool.
| Reference | Study (Author, Year) | Outcome | Data source | Validation | Missing data handling | Class imbalance | Calibration/Utility | PROBAST domains flagged | Notes |
|---|---|---|---|---|---|---|---|---|---|
| [13] | Huang et al., 2023 | Surgical site infection | Orthopedic patients (training + external validation cohort) | External validation (separate external set) | NR | NR | Calibration + DCA reported | Analysis | Classic regression nomogram with calibration + DCA |
| [40] | Huang et al., 2021 | Blood transfusion after THA/TKA | Multi-hospital EMR cohort | Random subsampling + 10-fold CV | Excluded missing/incorrect (0.73%) | NR | NR | Analysis | Reports AUC-focused comparison; calibration not highlighted |
| [47] | Zang et al., 2024 | Perioperative blood transfusion (hip surgery) | Retrospective hip surgery cohort | Temporal split (first 70% train / last 30% test) | Excluded missing data | NR | Calibration + Brier + DCA reported | Analysis | Time split is stronger than random split; still single-system retrospective |
| [59] | Jiang et al., 2024 | Pedicle screw loosening | Lumbar fixation cohort | Random split (8:2) + 10-fold CV | Imputed (<20% missing) with RF regression; otherwise excluded | NR | Calibration plots + Brier | Analysis | Strong reporting on calibration/Brier; still internal validation only |
| [60] | Xiong et al., 2023 | Spine surgery outcome (as specified in the paper) | Spine surgery imaging/clinical cohort | Train/validation split (0.75/0.25) | Excluded incomplete imaging | NR | Calibration + DCA reported | Participants / Analysis | Imaging exclusion can introduce selection bias; no external validation shown |
| [61] | Chen et al., 2023 | Surgical site infection prediction | Retrospective cohort | Train/test split (reported) | Excluded incomplete clinical data | NR | Calibration curves reported | Analysis | Calibration reported, but imbalance handling not clearly described |
| [62] | Zhang et al., 2024 | SSI following spine surgery | 986 patients (“complete data” cohort) | 5-fold CV (4 folds train / 1 validate) | Complete-case (“complete data” only) | NR | NR | Participants / Analysis | Internal CV only; AUC-heavy evaluation; missing data handled by exclusion |
| [71] | Gupta et al., 2025 | 30-day mortality (femoral shaft fracture surgery) | NSQIP dataset | Stratified 80/20 split + 10-fold CV | Dropped variables with >5% missing; kNN imputation for remaining | SMOTE + Tomek links | Calibration slope/intercept + Brier | Analysis | Most complete methods reporting among these studies (imbalance handling + calibration + imputation) |
| [72] | Han et al., 2025 | Postoperative infection (nosocomial infection paper) | Retrospective cohort (2011–2024) | Random 70/30 split + 10-fold CV | Imputation reported (details in supplement) | NR | Brier reported | Analysis | Strong model-development description; still internal validation only |
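
Two entries in the Validation and Calibration/Utility columns can be illustrated with a short sketch: a temporal split of the kind described for [47] (earlier cases train, later cases test) and the Brier score reported in [47], [59], [71], and [72]. The Python below is a minimal illustration only. The function names, the `year`/`event`/`risk` fields, and the toy cohort are hypothetical and are not taken from any of the appraised studies.

```python
from typing import Dict, List, Sequence, Tuple


def temporal_split(
    records: Sequence[Dict], time_key: str, train_fraction: float = 0.7
) -> Tuple[List[Dict], List[Dict]]:
    """Order records by time; the earliest fraction trains, the rest tests."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    ordered = sorted(records, key=lambda r: r[time_key])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def brier_score(outcomes: Sequence[int], probabilities: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed 0/1 outcome."""
    if len(outcomes) != len(probabilities) or len(outcomes) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for y, p in zip(outcomes, probabilities)]
    return sum(squared_errors) / len(outcomes)


# Hypothetical toy cohort: surgery year, observed outcome, predicted risk.
cohort = [
    {"year": 2021, "event": 0, "risk": 0.3},
    {"year": 2018, "event": 1, "risk": 0.9},
    {"year": 2020, "event": 1, "risk": 0.8},
    {"year": 2019, "event": 0, "risk": 0.1},
]

# Earlier half of the cohort for training, later half held out for testing.
train, test = temporal_split(cohort, "year", train_fraction=0.5)
test_brier = brier_score([r["event"] for r in test], [r["risk"] for r in test])
```

A lower Brier score indicates better probabilistic accuracy; a model that always predicts a risk of 0.5 scores 0.25 regardless of the outcomes. Unlike AUC, the score penalizes miscalibrated probabilities, which is why the table records it alongside discrimination.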