2025 Jan 14;9:2. doi: 10.1186/s41512-024-00182-4

Table 3.

Results of reviews reporting model development and validation

Review author (publication year)	DEV/VAL (no. studies)	Setting of included studies; data sources	Model development algorithms	Internal validation methods	Brief description of study quality	Summary of model performance results
Barghouthi [55] (2023)	DEV (23)	Setting of included studies NS, but the review’s inclusion criteria specified hospital settings. Retrospective n = 15; prospective n = 5; both retrospective and prospective n = 1; case–control study n = 1; experimental study design n = 1. EHRs n = 20; international or national database n = 3	LR n = 18; RF n = 13; DT n = 5; NN n = 5; SVM n = 5; Fine–Gray model n = 2; KNN n = 2; XGBoost n = 2; AdaBoost n = 1; BART n = 1; EBM n = 1; Gaussian naïve Bayes n = 1; GB n = 1; GBM n = 1; LDA n = 1; NB n = 1	Split sample n = 17; NS n = 6	RoB assessed using the JBI critical appraisal checklist for cohort studies; only summary results provided. Only one domain was at low RoB across all included studies: whether the participants were free from the outcome (PIs) at the start of the study. Domains with mostly high-risk (< 50%) or moderate-risk (51–81%) results related to statistical analysis methods, follow-up time, dealing with confounding factors, and measurement of the exposure	Only reported measures of discrimination: Accuracy ranged between 0.52 (ML Walther [82]) and 0.99 (ML Anderson [83]); Sensitivity ranged between 0.04 (ML Walther [82]) and 1 (ML Hu [84], ML Anderson [83]); Specificity ranged between 0.69 (ML Hyun [85], ML Nakagami [86]) and 1 (ML Cai [87], ML Walther [82]); PPV ranged between 0.01 (ML Nakagami [86]) and 1 (ML Cai [87]); NPV ranged between 0.08 (ML SPURS [88], ML Cramer [89]) and 1 (ML Hu [84], ML Anderson [83], ML Ladios-Martin [90]); AUC ranged between 0.50 (ML Cai [87]) and 1 (ML Hu [84], ML Cai [87])
Dweekat [36] (2023)	DEV (34); unclear (1)^a	HAPI/CAPI n = 32; SRPI n = 2; detection of PI (effect on length of stay) n = 1; nursing home residents n = 2. Data sources NS	LR n = 20; RF n = 18; DT n = 12; SVM n = 12; MLP n = 9; KNN n = 4; LDA n = 1; other n = 19	CV n = 10; split sample n = 10; split sample and CV n = 8; NS n = 7	No RoB assessment	Results not reported; review focused on methods only
Jiang [37] (2021)	DEV (9)	ICU n = 3; operating room n = 2; acute care hospital n = 1; oncology department n = 1; end-of-life care n = 1; mobility-related disabilities n = 1. EHRs used in all models	DT n = 5; LR n = 3; NN n = 2; SVM n = 2; BN n = 1; GB n = 1; MTS n = 1; RF n = 1	Split sample n = 4; NS n = 9	RoB assessed using PROBAST. Overall RoB high for all predictive models. All models at high RoB in the analysis domain	Only reported measures of discrimination: F-score ranged between 0.377 (ML Su MTS [91]) and 0.670 (ML Su LR [91]); G-means ranged between 0.628 (ML Kaewprag BN [92]) and 0.822 (ML Su MTS [91]); Sensitivity ranged between 0.478 (ML Kaewprag [92]) and 0.848 (ML Yang [93]); Specificity ranged between 0.703 (ML Deng [94]) and 0.988 (ML Su LR [91])
Pei [54] (2023)	DEV (17); DEV + VAL (1)	DEV: ICU n = 4; hospitalised patients n = 8; hospitalised patients awaiting surgery n = 3; cancer patients n = 1; end-of-life inpatients n = 1. Retrospective n = 14; prospective n = 3. EHRs n = 12; MIMIC-IV database n = 1; CONCERN database n = 1. DEV + VAL: ICU n = 1. Retrospective n = 1. EHRs n = 1	RF n = 12; LR n = 11; DT n = 9; SVM n = 8; NN n = 5; MTS n = 1; NB n = 3; KNN n = 2; MLP n = 1; XGBoost n = 2; BART n = 1; LASSO n = 1; BN n = 1; ANN n = 1; EN n = 1; GBM n = 1; Other^b n = 1	CV n = 1; split sample n = 5; split sample and CV n = 10; NS n = 2	RoB assessed using PROBAST. Overall, 16/18 (88.9%) papers were at high RoB, 1 (5.6%) was at unclear RoB and only 1 (5.6%) was at low RoB. 14 (77.8%) studies were at high RoB in the analysis domain. The most common factors contributing to the high risk of bias in the analysis domain included an inadequate number of events per candidate predictor, poor handling of missing data and failure to deal with overfitting	Only reported measures of discrimination: Summary AUC 0.9449; Summary sensitivity 0.79 (95% CI 0.78, 0.80), N_cases = 19,893; Summary specificity 0.87 (95% CI 0.87, 0.88), N_non-cases = 388,611; Summary likelihood ratios PLR 10.71 (95% CI 5.98, 19.19) and NLR 0.21 (95% CI 0.08, 0.50); Pooled odds ratio 52.39 (95% CI 24.83, 110.55)
Ribeiro [51] (2021)	DEV (3)	SRPI cardiovascular n = 2; SRPI critical care n = 1. EHRs used in n = 2 models	ANN n = 1; RF n = 1; XGBoost n = 1	Split sample n = 2; NS n = 1	No RoB assessment	Only reported measures of discrimination: Accuracy ranged between 0.79 (ML Alderden [95]) and 0.82 (ML Chen [96])
Shi [52] (2019)	DEV (21); VAL (7)	DEV: general acute care hospital n = 7; long-term care n = 5; specific acute care (e.g. ICU) n = 4; cardiovascular surgery n = 2; trauma and burn centres n = 1; rehabilitation units n = 1; unclear n = 1. Retrospective n = 11; prospective n = 10. VAL: long-term care n = 3; specific acute care (e.g. ICU) n = 2; general (acute care) hospital n = 2. Retrospective n = 4; prospective n = 3	LR n = 16; Cox regression n = 5; ANN n = 1; C4.5 ML (DT induction algorithm) n = 1; DA n = 1; DT n = 1; NS n = 1	CV n = 1; tree-pruning n = 1; split sample n = 1; re-sampling n = 2; NS n = 16	RoB assessed using PROBAST. DEV: overall RoB unclear for two models and high for the remaining 19 models; analysis and outcome domains were mostly at high RoB. VAL: overall RoB unclear for three validation studies and high for the remaining four; analysis and outcome domains were mostly at high RoB	C-statistics^c ranged between 0.61 (interRAI PURS [78]) and 0.90 (TNH-PUPP [75]); O/E ratios^c ranged between 0.91 (Berlowitz MDS [77]) and 1.0 (prePURSE study tool [81]). Pooled C-statistics^c: TNH-PUPP [75], 0.86 (95% CI 0.81–0.90), n = 2; Fragmment scale [97], 0.79 (95% CI 0.77–0.82), n = 1^d; Berlowitz 11-item model [98], 0.75 (95% CI 0.74–0.76), n = 2; Berlowitz MDS model [77], 0.73 (95% CI 0.72–0.74), n = 2; interRAI PURS [78], 0.65 (95% CI 0.60–0.69), n = 3; Compton [79], 0.81 (95% CI 0.78–0.84), n = 2. Pooled O/E ratios^c: Berlowitz 11-item model [98], 0.99 (95% CI 0.95–1.04), n = 2; Berlowitz MDS [77], 0.94 (95% CI 0.88–1.01), n = 2
Zhou [53] (2022)	DEV (22)	SRPI n = 3; ICU n = 11; hospitalised n = 6; rehabilitation centre n = 1; hospice n = 1. EHRs n = 18; MIMIC-III database n = 4	LR n = 15; RF n = 10; DT n = 9; SVM n = 9; ANN n = 8; BN n = 3; XGBoost n = 3; GB n = 2; AdaBoost n = 1; CANTRIP n = 1; LSTM n = 1; EN n = 1; KNN n = 1; MTS n = 1; NB n = 1	CV n = 12; NS n = 10	RoB assessed using PROBAST. Overall RoB unclear for five studies and high for 15 models. RoB not assessed in two studies due to use of unstructured data	Only reported measures of discrimination: F1 score ranged between 0.02 (ML Nakagami [86]) and 0.99 (ML Song [2] [99]); AUC ranged between 0.78 (ML Delparte [100]) and 0.99 (ML Song [2] [99]); Sensitivity ranged between 0.08 (ML Cai [87]) and 0.99 (ML Song [2] [99]); Specificity ranged between 0.63 (ML Delparte [100]) and 1 (ML Cai [87])

^aAppears to be a model validation study, but the review lists validation method as N/A

^bOther includes: averaged perceptron, Bayes point machine, boosted DT, boosted decision forest, decision jungle and locally deep SVM, all reported for one study [90]

^cValues from fixed-effects meta-analyses, pooling development and external validation study estimates together

^dOne data source but included two C-statistic values (one for model development and one for internal validation) that were subsequently pooled

AUC area under curve, ANN artificial neural network, BART Bayesian additive regression tree, BN Bayesian network, CAPI community-acquired pressure injury, CANTRIP reCurrent Additive Network for Temporal RIsk Prediction, CONCERN Communicating Narrative Concerns Entered, CV cross-validation, DEV development, DOR diagnostic odds ratio, DT decision tree, EBM explainable boosting machine, EHRs electronic health records, EN elastic net, GB(M) gradient boosting (machine), HAPI hospital-acquired pressure injury, ICU intensive care unit, JBI Joanna Briggs Institute, KNN k-nearest neighbours, LASSO least absolute shrinkage and selection operator, (L)DA (linear) discriminant analysis, LR logistic regression, LSTM long short-term memory, MIMIC Medical Information Mart for Intensive Care, ML machine learning, MLP multilayer perceptron, MTS Mahalanobis-Taguchi system, N/A not applicable, NB naïve Bayes, NLR negative likelihood ratio, NN neural network, NPV negative predictive value, NS not stated, O/E observed vs expected, PI pressure injury, PLR positive likelihood ratio, PPV positive predictive value, PROBAST Prediction model Risk of Bias ASsessment Tool, RF random forest, RoB risk of bias, SRPI surgery-related pressure injury, SVM support vector machine, VAL validation, XGBoost extreme gradient boosting
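The threshold-based discrimination measures reported across these reviews (sensitivity, specificity, PPV, NPV, accuracy, F1 score, G-mean, likelihood ratios and the diagnostic odds ratio) all derive from a single 2×2 confusion matrix at a chosen risk threshold. The minimal Python sketch below shows the standard definitions; the `ConfusionMatrix` class and `discrimination_measures` function are hypothetical helpers, and the counts in the usage line are invented for illustration, not taken from any included study. Summary estimates from a meta-analysis (such as those reported by Pei [54]) may be pooled separately for each measure, so they need not satisfy these identities exactly.

```python
import math
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts at one risk threshold (hypothetical helper, illustrative data only)."""
    tp: int  # predicted to develop a PI and did
    fp: int  # predicted to develop a PI but did not
    fn: int  # predicted not to develop a PI but did
    tn: int  # predicted not to develop a PI and did not


def discrimination_measures(cm: ConfusionMatrix) -> dict:
    """Standard threshold-based discrimination measures.

    Some measures are undefined (ZeroDivisionError) in degenerate tables,
    e.g. when sensitivity or specificity equals 1.
    """
    sensitivity = cm.tp / (cm.tp + cm.fn)
    specificity = cm.tn / (cm.tn + cm.fp)
    ppv = cm.tp / (cm.tp + cm.fp)
    npv = cm.tn / (cm.tn + cm.fn)
    accuracy = (cm.tp + cm.tn) / (cm.tp + cm.fp + cm.fn + cm.tn)
    plr = sensitivity / (1 - specificity)  # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity  # negative likelihood ratio
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        # harmonic mean of PPV (precision) and sensitivity (recall)
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
        # geometric mean of sensitivity and specificity
        "g_mean": math.sqrt(sensitivity * specificity),
        "plr": plr,
        "nlr": nlr,
        "dor": plr / nlr,  # algebraically equal to (tp * tn) / (fp * fn)
    }


# Hypothetical counts for illustration only
measures = discrimination_measures(ConfusionMatrix(tp=80, fp=20, fn=20, tn=180))
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Because PPV and NPV depend on outcome prevalence while sensitivity, specificity and likelihood ratios do not, ranges of PPV and NPV across studies with different PI incidence are especially hard to compare.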
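Footnote c refers to fixed-effects meta-analyses of C-statistics and O/E ratios. As a sketch of what inverse-variance fixed-effect pooling involves, the Python example below pools C-statistics on the logit scale, recovering each study's standard error from its reported 95% CI. The logit transformation, the CI-based standard errors and the function names are assumptions for illustration; the exact method used in the review is not stated in this table, and the inputs in the usage line are hypothetical.

```python
import math

# 97.5th percentile of the standard normal distribution (for 95% CIs)
Z_975 = 1.959963984540054


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def expit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def logit_se_from_ci(lower: float, upper: float) -> float:
    """Approximate SE of logit(C) from a 95% CI, assuming the CI is symmetric on the logit scale."""
    return (logit(upper) - logit(lower)) / (2 * Z_975)


def pool_c_statistics_fixed(studies):
    """Inverse-variance fixed-effect pooling of (c, lower, upper) tuples on the logit scale.

    Returns the pooled C-statistic and its 95% CI, back-transformed to the 0-1 scale.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for c, lower, upper in studies:
        se = logit_se_from_ci(lower, upper)
        weight = 1 / se ** 2  # inverse-variance weight
        total_weight += weight
        weighted_sum += weight * logit(c)
    pooled_logit = weighted_sum / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return (
        expit(pooled_logit),
        expit(pooled_logit - Z_975 * pooled_se),
        expit(pooled_logit + Z_975 * pooled_se),
    )


# Hypothetical inputs: two validations of the same model
pooled, lower, upper = pool_c_statistics_fixed([(0.84, 0.80, 0.88), (0.88, 0.83, 0.92)])
print(f"Pooled C-statistic {pooled:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

O/E ratios are usually pooled the same way on the log scale (substituting `math.log` and `math.exp` for `logit` and `expit`). A random-effects model would instead add an estimated between-study variance to each study's variance before weighting, which widens the pooled CI when validations are heterogeneous.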