Systematic evaluation of machine learning models for postoperative surgical site infection prediction

Anna M van Boekel; Siri L van der Meijden; Sesmu M Arbous; Rob G H H Nelissen; Karin E Veldkamp; Emma B Nieswaag; Kim F T Jochems; Jeroen Holtz; Annekee van IJlzinga Veenstra; Jeroen Reijman; Ype de Jong; Harry van Goor; Maryse A Wiewel; Jan W Schoones; Bart F Geerts; Mark G J de Boer

doi:10.1371/journal.pone.0312968

2024 Dec 12;19(12):e0312968. doi: 10.1371/journal.pone.0312968

Anna M van Boekel¹,*; Siri L van der Meijden²,³; Sesmu M Arbous²; Rob G H H Nelissen⁴; Karin E Veldkamp⁵; Emma B Nieswaag²,³; Kim F T Jochems²,³; Jeroen Holtz²,³; Annekee van IJlzinga Veenstra²,³; Jeroen Reijman²,³; Ype de Jong¹,⁶; Harry van Goor⁷; Maryse A Wiewel³; Jan W Schoones⁸; Bart F Geerts³; Mark G J de Boer⁶,⁹

Editor: Mohamad K Abou Chaar¹⁰

¹Department of Internal Medicine, Leiden University Medical Center, Leiden, The Netherlands

²Department of Intensive Care, Leiden University Medical Center, Leiden, The Netherlands

³Healthplus.ai R&D B.V., Amsterdam, The Netherlands

⁴Department of Orthopedic surgery, Leiden University Medical Center, Leiden, The Netherlands

⁵Department of Medical Microbiology and Infection Control, Leiden University Medical Center, Leiden, The Netherlands

⁶Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands

⁷Department of Surgery, Radboud UMC, Nijmegen, The Netherlands

⁸Waleus Medical Library, Leiden University Medical Center, Leiden, The Netherlands

⁹Department of Infectious disease, Leiden University Medical Center, Leiden, The Netherlands

¹⁰Mayo Clinic Rochester, UNITED STATES OF AMERICA

Competing Interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: B.F. Geerts declares to be a shareholder and owner of Healthplus.ai. S.L. van der Meijden, M. Wiewel, E.B. Nieswaag, K.F.T. Jochems, J. Holtz, A. van IJlzinga Veenstra, and J. Reijman declare to be employees of Healthplus.ai. This does not alter our adherence to PLOS ONE policies on sharing data and materials.

* E-mail: a.m.van_boekel@lumc.nl

Roles

Anna M van Boekel: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing

Siri L van der Meijden: Conceptualization, Data curation, Writing – review & editing

Sesmu M Arbous: Conceptualization, Data curation, Methodology, Supervision, Writing – review & editing

Rob G H H Nelissen: Writing – review & editing

Karin E Veldkamp: Writing – review & editing

Emma B Nieswaag: Data curation, Writing – review & editing

Kim F T Jochems: Data curation, Writing – review & editing

Jeroen Holtz: Data curation, Writing – review & editing

Annekee van IJlzinga Veenstra: Data curation, Writing – review & editing

Jeroen Reijman: Data curation, Writing – review & editing

Ype de Jong: Writing – review & editing

Harry van Goor: Writing – review & editing

Maryse A Wiewel: Data curation, Writing – review & editing

Jan W Schoones: Data curation, Writing – review & editing

Bart F Geerts: Conceptualization, Data curation, Software, Writing – review & editing

Mark G J de Boer: Conceptualization, Data curation, Methodology, Supervision, Writing – review & editing

Mohamad K Abou Chaar: Editor

PMCID: PMC11637340 PMID: 39666725

Abstract

Background

Surgical site infections (SSIs) lead to increased mortality and morbidity, as well as increased healthcare costs. Multiple models for the prediction of this serious surgical complication have been developed, with an increasing use of machine learning (ML) tools.

Objective

The aim of this systematic review was to assess the performance as well as the methodological quality of validated ML models for the prediction of SSIs.

Methods

A systematic search in PubMed, Embase and the Cochrane library was performed from inception until July 2023. Exclusion criteria were the absence of reported model validation, SSIs as part of a composite adverse outcome, and pediatric populations. ML performance measures were evaluated, and ML performances were compared to regression-based methods for studies that reported both methods. Risk of bias (ROB) of the studies was assessed using the Prediction model Risk of Bias Assessment Tool.

Results

Of the 4,377 studies screened, 24 were included in this review, describing 85 ML models. Most models were only internally validated (81%). The C-statistic was the most used performance measure (reported in 96% of the studies) and only two studies reported calibration metrics. A total of 116 different predictors were described, of which age, steroid use, sex, diabetes, and smoking were most frequently (100% to 75%) incorporated. Thirteen studies compared ML models to regression-based models and showed a similar performance of both modelling methods. For all included studies, the overall ROB was high or unclear.

Conclusions

A multitude of ML models for the prediction of SSIs are available, with large variability in performance. However, most models lacked external validation, reporting of performance was limited, and the risk of bias was high. In studies describing both ML models and regression-based models, one modelling method did not outperform the other.

Introduction

Surgical site infections (SSIs) are known complications following surgery and are among the most frequently occurring hospital-acquired infections. The incidence of SSIs ranges between 0.6% and 18% and depends on the type of surgical procedure and setting [1–4]. Surgical site infections lead to increased morbidity, mortality and hospital stay, resulting in a negative impact on the patient’s health-related quality of life [2]. Moreover, SSIs cause an increase in healthcare costs due to prolonged hospitalization, the need for extra diagnostic tests and interventions, and prolonged treatment. Recent meta-analyses showed an additional length of hospital stay of 2.1 to 54 days for patients with an SSI [2], with an estimated cost ranging from USD 10,443 to USD 25,546 per case [3]. Early detection and treatment are important for reducing these negative effects of SSIs.

Several risk factors for the development of SSIs have been identified, such as sex, BMI, comorbidity, American Society of Anesthesiologists (ASA) score, smoking, age and surgical approach [5, 6]. Several prognostic prediction models have been developed to identify which patients are at risk for developing an SSI. Besides traditional models, such as those using logistic regression [7], machine learning (ML) models are increasingly being developed and used for this purpose. ML comprises a wide spectrum of different algorithms that automatically learn from presented and new input data in a continuous iterative process, and variable selection for ML models is performed by these algorithms. This is in contrast to traditional models, where variable selection and internal model settings are dictated more by humans [8–10]. ML models benefit not only from this iterative learning process, but also from using more and different types of input variables. The complex algorithmic structure can find non-linear relations between variables, which contrasts with traditional regression-based models [11]. The disadvantage of ML models is that they produce “black-box” predictions, where the data used to produce the ML model output, the (relative) importance of these data and their possible mutual effects are less evident compared to regression-based models [12, 13].

To evaluate the statistical performance of prediction models, two properties are most often assessed: discriminative performance, expressed as the concordance statistic (C-statistic), also known as the area under the receiver operating characteristic curve (AUROC) or area under the curve (AUC), and calibration, expressed as calibration plots with slope and intercept [14]. Discriminative performance is the ability of the model to distinguish between patients with and without the outcome, whereas calibration is the agreement between the predicted probability and the proportion of patients with the actual outcome. Prediction models are first internally validated, using for example cross-validation or bootstrapping. Thereafter, external validation should be performed either on other hospital datasets, prospectively in time, or both, to ensure generalizability [15].
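As a minimal illustration of these two notions, the following Python sketch computes the C-statistic by pairwise comparison, together with the Brier score and a simple calibration-in-the-large check. The function names and the toy data are assumptions made for this example and are not taken from any of the reviewed studies. Calibration slope and intercept would additionally require fitting a logistic model, which is omitted here.

```python
from itertools import product


def c_statistic(y_true, y_prob):
    """Share of (case, non-case) pairs in which the case has the higher
    predicted risk; tied predictions count as half a concordant pair."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not controls:
        raise ValueError("need patients both with and without the outcome")
    concordant = 0.0
    for p_case, p_control in product(cases, controls):
        if p_case > p_control:
            concordant += 1.0
        elif p_case == p_control:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_in_the_large(y_true, y_prob):
    """Mean predicted risk minus observed event rate (0 means agreement on average)."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)


# Hypothetical toy data: 1 = SSI, 0 = no SSI
y = [0, 0, 0, 0, 1, 0, 1, 1]
p = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.70, 0.90]

print(c_statistic(y, p))               # 14 of 15 pairs are concordant
print(brier_score(y, p))
print(calibration_in_the_large(y, p))  # close to 0 for this toy data
```

In this toy example the model ranks patients well and is also well calibrated on average, but the two properties are separate: a model can rank patients correctly while systematically over- or underestimating their absolute risk.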

ML models are increasingly being developed for many different purposes in surgery [7]. Elfanagely et al. [16] described 45 ML models used for the prediction of surgical outcomes and another review [17] summarized the outcomes of 212 articles with ML models developed for predicting a broad spectrum of outcomes in vascular surgery. The ML models performed reasonably well, but there were concerns regarding the risk of bias. A recent systematic review and meta-analysis performed by Wu et al. showed that there are many different ML models for the prediction or detection of SSIs, but that the validation of these models is generally insufficient [18]. Wu et al. mainly focused on the methodological aspects of the models and made no distinction between the prediction of SSI and SSI detection for surveillance purposes. Moreover, a clear overview of the available models for different surgical specialties or SSI subtypes (superficial, deep or organ space SSI) is still missing. The number of models developed for SSI prediction is increasing, and since 2021 new models have been developed. The aim of this systematic review was therefore to describe the performance of all internally or externally validated ML models for the prediction of SSI, to describe the methodological quality of the studies on ML models for the prediction of SSI, and to give an overview of the available models per surgical specialty and SSI subtype.

Methods

A systematic review of the published literature on the prediction of postoperative infections was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (S1 Appendix). The protocol for this study was registered in PROSPERO (registration number 248953).

Search strategy

The literature search was performed in MEDLINE, EMBASE and the Cochrane Library from inception to July 1, 2023. The complete search strings are shown in the Supplementary material (S2 Appendix).

Inclusion and exclusion criteria

All original studies that developed and validated (internally or externally) ML models for the prediction of SSIs, and studies that externally validated previously developed ML models, were included. A model was considered an ML model if a non-regression-based approach was used for model development, such as random forests, support vector machines or neural networks. As outcome, prediction of any type of SSI within 30 days postoperatively was included. Models that only predicted SSIs as part of a composite adverse outcome were excluded. Other exclusion criteria were pediatric populations (age <18 years), no full-text article available, and articles not written in English.

Screening and data extraction

Study selection was performed using the Covidence® software program (www.covidence.org, Melbourne, Australia). After removal of duplicates, titles and abstracts were screened for full-text eligibility by two independent authors (AB, BG, or MW). Full-text analysis of the remaining articles was performed by the same authors. All conflicts were resolved by a third reviewer (MB or SA).

The following data were obtained from the included articles: type of SSI predicted (superficial, deep or organ space), surgical specialty, number of surgeries, patients or both, performance parameters of the model (sensitivity, specificity, accuracy, calibration and C-statistic), method of validation, variables used as predictors, and all types of developed and/or validated models (ML as well as regression-based models). A complete list of the extracted data is provided in S2 Table. Reviewers used a standardized data extraction form based on the CHARMS (CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies) [19]. Extracted data were double-checked for inconsistencies by AB and BG, and discrepancies were resolved by consensus.

Descriptive analyses

Results were summarized using descriptive statistics. We did not perform a meta-analysis due to the heterogeneity in reported outcome measures and definitions. Analyses were performed using R (version 2023.06.1+524, R Core Team, Vienna, Austria).

Risk of bias

The methodological quality of all included studies was assessed using the Prediction model study Risk of Bias Assessment Tool (PROBAST) [20, 21]. The PROBAST is designed to critically appraise prediction models and contains two main domains: the risk of bias, which consists of four subdomains (participant selection, predictor selection, outcome definition and analysis), and the applicability for the review. In total there are 20 signaling questions, which can be scored as ‘yes’, ‘probably yes’, ‘probably no’, ‘no’, or ‘no information’ and which combined lead to a low, unclear, or high risk of bias and a low, unclear, or high concern regarding applicability.
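The roll-up from signaling questions to a judgement can be sketched as follows. This is a simplified, mechanical reading of the tool, assuming that any ‘no’ or ‘probably no’ makes a subdomain high risk and that missing information without a negative answer makes it unclear; in practice, reviewers may deviate from such a rule with justification. All names and the example answers in the code are hypothetical.

```python
NEGATIVE = {"no", "probably no"}
POSITIVE = {"yes", "probably yes"}
MISSING = "no information"


def domain_risk(answers):
    """Simplified rule: any negative answer -> high; otherwise any
    'no information' -> unclear; otherwise low."""
    for answer in answers:
        if answer not in NEGATIVE | POSITIVE | {MISSING}:
            raise ValueError(f"unknown answer: {answer!r}")
    if any(answer in NEGATIVE for answer in answers):
        return "high"
    if MISSING in answers:
        return "unclear"
    return "low"


def overall_risk(domain_risks):
    """Overall judgement: high if any subdomain is high, unclear if any
    subdomain is unclear and none is high, otherwise low."""
    risks = list(domain_risks.values())
    if "high" in risks:
        return "high"
    if "unclear" in risks:
        return "unclear"
    return "low"


# Hypothetical answers for one study (not taken from the review)
study = {
    "participants": ["yes", "probably yes"],
    "predictors": ["yes", "no information", "yes"],
    "outcome": ["yes", "yes", "probably yes"],
    "analysis": ["yes", "no", "yes", "yes"],
}
per_domain = {name: domain_risk(answers) for name, answers in study.items()}
print(per_domain)
print(overall_risk(per_domain))
```

Under this rule, a subdomain with a single ‘no’ and a subdomain in which every question is answered ‘no’ receive the same ‘high’ label.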

Results

A flowchart of the search is summarized in Fig 1. Of the 4,377 publications identified, 24 studies were included for further analysis. See S1 Table for the exclusion reasons of the excluded full text articles.

Characteristics of included studies

The 24 included studies described a total of 85 different ML models. Sixty-nine models (81%) were internally validated and 16 (19%) were externally validated, including one model (the Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) Calculator) that was externally validated in five separate studies. The most frequently predicted outcome was SSI in general (i.e., a combination of superficial, deep and organ space SSI, or unspecified); 11 models predicted superficial SSI, nine models predicted deep SSI and 24 models predicted organ space SSI. Abdominal surgery was the surgical specialty for which most models were developed (47%), followed by general surgery (21%) and orthopedic surgery (8%). See Table 1 for an overview of all included studies.

Table 1. Overview of included studies.

Author & year	Number of ML models	Country	Specialty	Type of infection predicted	Sample size (n)	Type of validation
Bertsimas 2018 [22]	2	USA	Emergency surgery	Superficial SSI, deep SSI, organ space SSI	Development 382,960 Validation unknown	Model development, internal and external validation
Bonde 2021 [23]	4	USA	General surgery	Superficial SSI, deep SSI, organ space SSI	Training 4,694,488 Validation 173,622 Testing 13,771	Model development, internal and external validation
Chang 2020 [24]	1	USA	Vascular surgery	SSI	72,435 split into 80–20 training ‐ validation cohort 370 institution cohort.	Internal
El Hechi 2021 [25]	1	USA	Emergency surgery	Superficial SSI, deep SSI, organ space SSI	59,955	External
Gowd 2019 [26]	5	USA	Orthopedic surgery	SSI	17,119 split into 80–20 training-validation cohort	Internal
Grass 2021 [27]	2	USA	Abdominal surgery	SSI	ACS-NSQIP database: Model development 180,538 External validation 2,376 Mayo clinic database: 2,376 (10-fold cross validation)	ACS-NSQIP database: Model development, internal and external validation Mayo-clinic: internal
Ke 2017 [28]	1	The Netherlands	Abdominal surgery	SSI	860	Internal
Liu (W.C.) 2022 [29]	5	China	Neurosurgery	SSI	Training set 201 Test set 87	Internal
Liu (X.) 2022 [30]	4	China	Abdominal surgery	Organ space SSI	297 split into training (81%), validation (9%) and testing (10%) set.	Internal
Mamlook 2023 [31]	6	USA	General	SSI	2,882,526	Internal
Maurer 2020 [32]	1	USA	Emergency surgery	Superficial SSI, deep SSI, organ space SSI	29,366	External
Mazaki 2021 [33]	1	Japan	Abdominal surgery	Organ space SSI	256	Internal
Merath 2020 [34]	1	USA	Abdominal surgery	Superficial SSI, deep SSI, organ space SSI	15,657	Internal
Nudel 2020 [35]	2	USA	Abdominal surgery	Organ space SSI	436,807 split into training (218,403), validation (109,202) and testing (109,202)	Internal
Ohno 2022 [36]	1	Japan	Abdominal surgery	SSI	730 development 100 validation	Internal
Sanger 2016 [37]	4	The Netherlands	Abdominal Surgery	SSI	851	Internal
Taylor 2019 [38]	4	USA	Urological surgery	SSI	7,557 split into 80% training and 20% testing set.	Internal
Van Esbroeck 2014 [39]	5	USA	General surgery	SSI	Training 607,558 Validation 363,875 Evaluation (external validation) 363,431	Model development, internal and external validation
Van Kooten 2022 [40]	6	The Netherlands	Abdominal surgery	Organ space SSI	6,427 split into 75% training and 25% testing set	Internal
Velmahos 2023 [41]	2	USA	Abdominal surgery	Superficial SSI, organ space SSI	94,530 split into 75% training and 25% testing set	Internal
Walczak 2019 [42]	3	USA	General surgery	SSI	646	Internal
Weller 2018 [43]	16	USA	Abdominal surgery	SSI	9,598 split into 80% training and 20% testing set	Internal
Ying 2023 [44]	2	China	Orthopedic	SSI	351	Model development, internal and external validation
Zhang 2023 [45]	6	China	Cardiothoracic	SSI	1,223 split into training (858) and validation (365) set	Internal

ACS-NSQIP, American College of Surgeons National Surgical Quality Improvement Program; SSI, Surgical Site Infection; USA, United States of America.

Performance of ML models

The most commonly reported measure of model performance was the C-statistic, which was reported in 96% of the studies. Other reported performance parameters were sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV). Only two studies reported calibration metrics, of which one also reported the Brier score [39, 44]. For the internally validated models, the median C-statistic was 0.62 (range 0.44 to 0.99); for the externally validated models, the median C-statistic was 0.79 (range 0.55 to 0.87). Sensitivity, specificity, NPV and PPV were reported for one externally validated model, by Grass et al., and were 0.47, 0.80, 0.97 and 0.10, respectively. Of the internally validated models, sensitivity was reported for twenty (29%) models and varied between 0.24 and 0.90, specificity was reported for fifteen (22%) models and varied between 0.25 and 0.91, NPV was reported for four (6%) models and varied between 0.87 and 0.98, and PPV was reported for eleven (16%) models and varied between 0.06 and 0.90. Overall, the performance of the models varied widely and there was no clear difference between the different surgical specialties or types of SSI predicted (Tables 2–5).
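Because PPV and NPV depend on how common SSIs are in the studied population, a low PPV alongside a high NPV is what one would expect for an infrequent outcome. The sketch below applies Bayes' rule to the sensitivity and specificity reported for the externally validated model by Grass et al. The prevalence of 4.5% is an assumption chosen for illustration (it is not a figure reported in this review), and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test with the given sensitivity and
    specificity, applied in a population with the given outcome prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as reported above; prevalence is ASSUMED.
ppv, npv = predictive_values(sensitivity=0.47, specificity=0.80, prevalence=0.045)
print(round(ppv, 2), round(npv, 2))  # 0.1 0.97
```

With the same sensitivity and specificity, a higher assumed prevalence would raise the PPV and lower the NPV, which is one reason these values are hard to compare across studies with different SSI rates.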

Table 2. Performance of ML models predicting SSI in general per surgical specialism.

Author	Algorithm	Number of predictors	C-statistic	Sensitivity	Specificity	NPV	PPV	Validation
Cardiothoracic surgery
Zhang 2023 [45]	RF	-	0.83	-	-	-	-	Internal
	SVM	-	0.91	-	-	-	-	Internal
	XGB	-	0.99	-	-	-	-	Internal
	GBDT	-	0.99	-	-	-	-	Internal
	Adaboost	-	0.81	-	-	-	-	Internal
	NN	-	0.99	-	-	-	-	Internal
Abdominal surgery
Grass 2021 [27]	BPMI	-	0.74	0.47	0.8	0.97	0.10	External
Grass 2021 [27]	BPMI–Mayo clinic	-	0.78	0.56	0.8	0.98	0.11	Internal
Ke 2017 [28]	Bilinear model	≥28	-	-	-	-	-	Internal
Ohno 2022 [36]	SVM	25	0.73	-	-	-	-	Internal
Sanger 2016 [37]	NB ‐ Baseline features	28	0.63	-	-	-	-	Internal
	NB ‐ Serial features	9	0.74	0.42–0.80^*	0.64–0.91^*	0.87–0.92^*	0.33–0.53^*	Internal
	NB ‐ Serial features simplified	5	0.73	0.42–0.75^*	0.64–0.91^*	0.87–0.92^*	0.35–0.53^*	Internal
	NB ‐ Combined features	37	0.75	-	-	-	-	Internal
Weller 2018 [43]	RF	-	0.44^a, 0.47^b, 0.50^c, 0.55^d	-	-	-	-	Internal
	SVM	-	0.55^a, 0.51^b, 0.47^c, 0.49^d	-	-	-	-	Internal
	AdaBoost	-	0.44^a, 0.47^b, 0.51^c, 0.51^d	-	-	-	-	Internal
	NB	-	0.48^a, 0.45^b, 0.45^c, 0.78^d	-	-	-	-	Internal
General surgery
Van Esbroeck 2014 [39]	SVM–short description	-	0.79	-	-	-	-	External
	SVM–medium description	-	0.79	-	-	-	-	External
	SVM–large description	-	0.79	-	-	-	-	External
	SVM–CPT	-	0.74	-	-	-	-	External
	SVM–multivariate model	-	0.80	-	-	-	-	External
Mamlook 2023 [31]	Naïve bayes	-	0.71	0.76	0.71	-	-	Internal
	Random forest	-	0.83	0.84	0.84	-	-	Internal
	Decision tree	-	0.81	0.82	0.81	-	-	Internal
	SVM	-	0.82	0.82	0.82	-	-	Internal
	ANN	-	0.82	0.81	0.81	-	-	Internal
	DNN	16	0.85	0.85	0.85	-	-	Internal
Walczak 2019 [42]	ANN–all variables	11	-	0.69	0.60	-	-	Internal
	ANN without NSQIP compliance	10	-	0.90	0.50	-	-	Internal
	ANN without NSQIP compliance and sex	9	-	0.79	0.50	-	-	Internal
Orthopedic surgery
Gowd 2019 [26]	KNN	22	0.5	-	-	-	-	Internal
	RF	22	0.53	-	-	-	-	Internal
	NB	22	0.45	-	-	-	-	Internal
	Decision tree	22	0.5	-	-	-	-	Internal
	Gradient boosting trees	22	0.61	-	-	-	-	Internal
Ying 2023 [44]	Extra trees classifier	-	0.87	-	-	-	-	External
Ying 2023 [44]	Random forest	-	0.82	-	-	-	-	External
Neurosurgery
Liu 2022 [29]	Decision tree	-	0.78	-	-	-	0.79	Internal
	Multilayer perceptron	-	0.76	-	-	-	0.66	Internal
	Random forest	-	0.89	-	-	-	0.86	Internal
	Gradient boosting machine	-	0.91	-	-	-	0.88	Internal
	Extreme gradient boosting machine	-	0.92	-	-	-	0.9	Internal
Urological surgery
Taylor 2019 [38]	GAM	-	∼ 0.61^**	-	-	-	-	Internal
	LASSO logistic regression	-	0.62	-	-	-	-	Internal
	RF	-	∼ 0.61^**	-	-	-	-	Internal
	NNET	-	∼ 0.58^**	-	-	-	-	Internal
Vascular surgery
Chang 2020 [24]	DLPM	-	0.61	0.83	0.25	0.912	0.14	Internal

AdaBoost, Adaptive boosting; ANN, Artificial neural network; BPMI, Bayesian-probit regression model with multiple imputation; DLPM, Deep learning based risk model; GAM, Generalized additive models; GBDT, Gradient boosted decision trees; GLM, Generalized linear model; KNN, K-nearest neighbors; NB, Naïve Bayes; NNET, Feed-forward neural network with logistic activation function and no weight decay; OCT, Optimal classification trees; POD, Postoperative day; POTTER, Predictive OpTimal Trees in Emergency Surgery Risk; RF, Random forest; SVM, Support vector machine; XGB, Gradient Boosting machine.

*Depending on the cut-off value for sensitivity and specificity.

**No exact value given.

^a preoperative

^b POD0

^c POD1

^d POD2

Table 5. Performance of ML models predicting organ space SSI per surgical specialism.

Author	Algorithm	Number of predictors	C-statistic	Validation
Abdominal surgery
Liu 2022 [30]	GBDT	-	0.83	Internal
	KNN	-	0.71	Internal
	Random forest	-	0.87	Internal
	SVM	-	0.89	Internal
Mazaki 2021 [33]	ANN	18	0.77	Internal
Merath 2020 [34]	Decision tree	-	0.76	Internal
Nudel 2020 [35]	XGB	33	0.70	Internal
Nudel 2020 [35]	ANN	-	0.75	Internal
Van Kooten 2022 [40]	AdaBoost	-	0.61	Internal
	Adalearner	-	0.62	Internal
	KNN	-	0.57	Internal
	NN	-	0.61	Internal
	Random forest	-	0.59	Internal
	SVM	-	0.59	Internal
Velmahos 2023 [41]	Random forest	-	0.72	Internal
Velmahos 2023 [41]	XGB	-	0.72	Internal
General surgery
Bonde 2021 [23]	ANN model 1	21	0.85	External
	ANN model 2	34	0.85	External
	ANN model 3	56	0.87	External
Emergency surgery
Bertsimas 2018 [22]	OCT POTTER–without ASA	-	0.78	External
Bertsimas 2018 [22]	OCT POTTER with ASA	-	0.79	External
Bonde 2021 [23]	ANN model 1	21	0.78	External
	ANN model 2	21	0.81	External
	ANN model 3	21	0.86	External
	OCT POTTER	-	0.79	External
El Hechi 2021 [25]	OCT POTTER	-	General 0.69 Laparotomy 0.62	External
Maurer 2020 [32]	OCT POTTER	-	0.80^**	External

AdaBoost, Adaptive boosting; ANN, Artificial neural network; ASA, American Society of Anesthesiology; GBDT, Gradient boosted decision trees; KNN, K-nearest neighbors; OCT, optimal classification trees; POTTER, Predictive OpTimal Trees in Emergency Surgery Risk; SVM, Support vector machine; XGB, Gradient Boosting machine.

**No exact value given.

Table 3. Performance of ML models predicting superficial SSI per surgical specialism.

Author	Algorithm	Number of predictors	C-statistic	Validation
Abdominal surgery
Merath 2020 [34]	Decision tree	-	0.76	Internal
Velmahos 2023 [41]	Random forest	-	0.63	Internal
Velmahos 2023 [41]	XGB	-	0.64	Internal
General surgery
Bonde 2021 [23]	ANN model 1	21	0.82	External
	ANN model 2	34	0.82	External
	ANN model 3	56	0.83	External
Emergency surgery
Bertsimas 2018 [22]	OCT POTTER–without ASA	-	0.68	External
Bertsimas 2018 [22]	OCT POTTER with ASA	-	0.68	External
Bonde 2021 [23]	ANN model 1	21	0.75	External
	ANN model 2	21	0.75	External
	ANN model 3	21	0.76	External
	OCT POTTER	-	0.68	External
El Hechi 2021 [25]	OCT POTTER	-	General 0.64 Laparotomy 0.55	External
Maurer 2020 [32]	OCT POTTER	-	0.60–0.80^**	External

ANN, Artificial neural network; ASA, American Society of Anesthesiology; OCT, optimal classification trees; POTTER, Predictive OpTimal Trees in Emergency Surgery Risk; XGB, Gradient Boosting machine.

**No exact value given.

Table 4. Performance of ML models predicting deep SSI per surgical specialism.

Author	Algorithm	Number of predictors	C-statistic	Validation
Abdominal surgery
Merath 2020 [34]	Decision tree	-	0.93	Internal
General surgery
Bonde 2021 [23]	ANN model 1	21	0.78	External
	ANN model 2	34	0.78	External
	ANN model 3	56	0.80	External
Emergency surgery
Bertsimas 2018 [22]	OCT POTTER–without ASA	-	0.74	External
Bertsimas 2018 [22]	OCT POTTER with ASA	-	0.75	External
Bonde 2021 [23]	ANN model 1	21	0.87	External
	ANN model 2	21	0.85	External
	ANN model 3	21	0.82	External
	OCT POTTER	-	0.75	External
El Hechi 2021 [25]	OCT POTTER	-	General 0.70 Laparotomy 0.60	External
Maurer 2020 [32]	OCT POTTER	-	0.60–0.80^**	External

ANN, Artificial neural network; ASA, American Society of Anesthesiology; OCT, optimal classification trees; POTTER, Predictive OpTimal Trees in Emergency Surgery Risk.

**No exact value given.

Predictors used in ML models

Of the 85 included ML models, the number of predictors used in the model was reported for 20 models (24%), and feature importance (determined by SHAP values) was reported for 15 models (18%). In total, 116 different predictors were used in these 20 models. The median number of included predictors per model was 22 (range 5–56). The most commonly included predictors were age (100%), oral corticosteroid use (85%), sex (85%), smoking (80%), and diabetes (75%) (Fig 2).

Fig 2. All predictors used five times or more are included in the figure. ASA, American Society of Anesthesiologists classification; BMI, body mass index; COPD, chronic obstructive pulmonary disease; INR, international normalized ratio; PT, prothrombin time; WBC, white blood count.

Regression-based models

Of the 24 studies, thirteen (54%) also included regression-based models and compared their performance to that of the developed ML models (Fig 3 and S3 Table). The C-statistic for regression-based models varied from 0.41 to 0.95. For the prediction of SSIs, ML performed slightly better than regression-based models in four studies [27, 29, 35, 44], whereas regression-based models performed better in two studies [26, 41]. In the other studies reporting both regression-based and ML models, performances were similar [23, 30, 31, 39, 40, 43, 45]. See Fig 3 for an overview of the AUCs of the studies presenting both ML and regression-based models.

Fig 3. Green dots represent the AUC of the ML models, orange dots represent the AUC of the regression-based models. The green and orange lines represent the median.

Risk of bias

The ROB was assessed for all models described in the 24 studies. ROB was low in the participants domain. ROB was high or unclear in the predictors domain and outcome domain, as studies often poorly reported the predictors used and whether predictors were selected independently of the outcome status. In the analysis domain, all studies had a high or unclear ROB, mostly caused by statistical issues such as poor reporting of performance measures, not taking competing risks into account and inappropriate methods to handle missing data. There were no concerns regarding applicability for any of the studies. See Fig 4 for an overview of ROB, and S4 Table for the complete ROB.

Fig 4. Green: low risk of bias; yellow: unclear risk of bias due to lack of information; red: high risk of bias. ROB, risk of bias.

Discussion

This systematic review identified 85 different validated ML prediction models for SSIs. Most models were developed and tested in patient populations that underwent abdominal surgery. Most of these models (81%) were only internally validated. The most frequently reported parameter for performance was the C-statistic, which varied widely between the different models, and only two studies reported calibration metrics. This corresponds with previous studies on the use of ML in other fields, which found that calibration is rarely reported and that only a minority of models are externally validated [11, 46, 47]. However, for proper assessment of model performance, both discrimination and calibration are essential parameters for the interpretation of the predicted probabilities [14]. Without external validation of a prediction model, it is difficult to accurately estimate the actual performance of a model in different clinical practices. Furthermore, it is common that retraining or recalibration of an ML model is necessary to fit the unseen population [48]. Therefore, newly developed ML prediction models as well as already existing models need to be retrained, recalibrated, and validated again for new populations. Furthermore, their effect on patient care should then be evaluated and reported in impact studies.
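One common way to recalibrate an existing model for a new population is logistic recalibration, in which a new intercept and slope are fitted on the logit of the original predictions. The sketch below is only an illustration under that assumption: the reviewed studies do not specify a recalibration method, and the function names and toy data are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(z):
    return 1 / (1 + math.exp(-z))


def fit_recalibration(y_true, y_prob, n_iter=25):
    """Fit logit(P(y=1)) = a + b * logit(original risk) by Newton-Raphson.
    Returns (a, b): calibration intercept and slope in the new population."""
    x = [logit(p) for p in y_prob]
    a, b = 0.0, 1.0  # start from "no recalibration needed"
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y_true):
            mu = expit(a + b * xi)
            w = mu * (1 - mu)
            g_a += yi - mu          # gradient of the log-likelihood
            g_b += (yi - mu) * xi
            h_aa += w               # entries of the information matrix
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det  # Newton step (2x2 solve)
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def recalibrate(p, a, b):
    """Apply the fitted intercept and slope to an original predicted risk."""
    return expit(a + b * logit(p))


# Hypothetical new population: (original predicted risk, events among 10 patients).
# The original predictions are too extreme relative to what is observed.
groups = [(0.1, 3), (0.3, 4), (0.5, 5), (0.7, 6), (0.9, 7)]
y, p = [], []
for risk, events in groups:
    y += [1] * events + [0] * (10 - events)
    p += [risk] * 10

a, b = fit_recalibration(y, p)
p_new = [recalibrate(pi, a, b) for pi in p]
print(a, b)  # slope below 1: original predictions were too extreme
```

Recalibration of this kind changes the predicted probabilities but not the ranking of patients, so it leaves the C-statistic unchanged as long as the fitted slope is positive.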

Thirteen of the included studies described both regression-based and ML models and compared their performances in the same population. Both the regression-based models as well as the ML models showed large variability of performance, which is in accordance with previous literature on regression-based models for the prediction of SSIs [49–51]. When compared, the ML and regression-based models did not outperform each other. This is in accordance with previous studies that compared ML models with regression-based models, although some studies suggest that certain subtypes of ML (i.e. gradient boosting trees) perform better than regression-based models [52, 53]. ML models generally need larger datasets to use their full potential. It is possible that this condition was not met in all studies, as the median number of predictors was 22 and the sample size ranged from 256 to 5,881,881.

Model explainability is an important issue with ML prediction models. In general, ML models are considered to be more complex and less transparent than regression-based models with respect to which variables are selected for the prediction. Furthermore, in our study, transparency of ML models was further limited, as the predictors used were reported for only a minority of the ML models (24%). This contrasts with regression-based models, which are usually presented with regression coefficients representing the strength of the relation between individual predictors and the outcome [54]. Despite being less transparent, ML models are able to utilize large and heterogeneous datasets and data types, can take into account more complex relationships between predictors, can be adapted to the local setting if the model has been validated or recalibrated for this population, and can be incorporated in the electronic health care system, making them potentially more beneficial when implemented into clinical care [10].

The ROB was high or unclear for almost all studies, suggesting considerable methodological issues. ROB was scored using the PROBAST, which is the most commonly used tool to estimate the ROB of prediction studies. Although a high or unclear ROB for almost all studies is in agreement with previous reviews using the PROBAST [55–57], the PROBAST has been criticized because of poor inter-rater agreement [56, 57]. Moreover, it is not possible to distinguish domains with a high ROB based on a single signaling question answered with ‘no’ from domains with all signaling questions answered with ‘no’. Despite these limitations, the PROBAST remains a useful tool to assess methodological shortcomings in prediction studies. Therefore, caution in the interpretation of the findings from these ML models for SSI prediction is recommended. Recently, the new TRIPOD-AI guidelines have been published, and newly developed ML models should follow these guidelines in order to prevent bias [58].

Strengths and limitations

The major strength of this systematic review is that it included all presently available validated ML models for the prediction of SSIs without restrictions on surgical specialty or SSI subtype. In addition, we compared regression-based models with ML models where possible. Because both types of models were compared within the same population, bias was minimized.

Some limitations exist. Differences in the quality and the heterogeneity of the data prevented a sound meta-analysis comparing the different ML models. Furthermore, this review is limited to SSIs as the outcome, although other postoperative infections such as pneumonia and bloodstream infections are also clinically relevant.

Conclusions

This systematic review showed that many ML models for the prediction of SSIs exist, and that their performance is generally equal to that of regression-based models. Machine learning techniques are still developing and are seen as a promising tool to improve medical care. However, there are multiple methodological issues with the currently available models, and there is still a substantial gap between the existing models and their practical and safe implementation in clinical settings. The recently published TRIPOD-AI guidelines should be used to reduce methodological flaws. To create clinically relevant prediction models for future use, more collaboration between clinicians and data scientists, as well as post-implementation studies, is needed.

Supporting information

S1 Appendix. PRISMA checklist. (DOCX; pone.0312968.s001.docx, 36.2 KB)

S2 Appendix. Search strategy. (DOCX; pone.0312968.s002.docx, 20.3 KB)

S1 Table. Excluded full text articles. (DOCX; pone.0312968.s003.docx, 217.3 KB)

S2 Table. Extracted parameters from the data. (DOCX; pone.0312968.s004.docx, 12.7 KB)

S3 Table. Studies with both ML and regression-based models. (DOCX; pone.0312968.s005.docx, 46.2 KB)

S4 Table. Risk of bias assessment with the use of the PROBAST score. (DOCX; pone.0312968.s006.docx, 78.8 KB)

Acknowledgments

The authors would like to thank Rory Monahan for proofreading the pre-final manuscript.

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

The author(s) received no specific funding for this work.

References

1. Global patient outcomes after elective surgery: prospective cohort study in 27 low-, middle- and high-income countries. Br J Anaesth. 2016;117(5):601–9.
2. Badia JM, Casey AL, Petrosillo N, Hudson PM, Mitchell SA, Crosby C. Impact of surgical site infection on healthcare costs and patient outcomes: a systematic review in six European countries. J Hosp Infect. 2017;96(1):1–15. doi: 10.1016/j.jhin.2017.03.004
3. Gillespie BM, Harbeck E, Rattray M, Liang R, Walker R, Latimer S, et al. Worldwide incidence of surgical site infections in general surgical patients: A systematic review and meta-analysis of 488,594 patients. Int J Surg. 2021;95:106136. doi: 10.1016/j.ijsu.2021.106136
4. European Centre for Disease Prevention and Control. Healthcare-associated infections: surgical site infections. ECDC. Annual epidemiological report for 2018–2020. Stockholm; 2023.
5. Qu H, Liu Y, Bi DS. Clinical risk factors for anastomotic leakage after laparoscopic anterior resection for rectal cancer: a systematic review and meta-analysis. Surg Endosc. 2015;29(12):3608–17. doi: 10.1007/s00464-015-4117-x
6. Dietz N, Sharma M, Alhourani A, Ugiliweneza B, Wang D, Drazin D, et al. Evaluation of Predictive Models for Complications following Spinal Surgery. J Neurol Surg A Cent Eur Neurosurg. 2020;81(6):535–45. doi: 10.1055/s-0040-1709709
7. Guo Y, Hao Z, Zhao S, Gong J, Yang F. Artificial Intelligence in Health Care: Bibliometric Analysis. J Med Internet Res. 2020;22(7):e18228. doi: 10.2196/18228
8. Rajkomar A, Dean J, Kohane I. Machine Learning in Medicine. N Engl J Med. 2019;380(14):1347–58. doi: 10.1056/NEJMra1814259
9. Choi RY, Coyner AS, Kalpathy-Cramer J, Chiang MF, Campbell JP. Introduction to Machine Learning, Neural Networks, and Deep Learning. Transl Vis Sci Technol. 2020;9(2):14. doi: 10.1167/tvst.9.2.14
10. Miotto R, Wang F, Wang S, Jiang X, Dudley JT. Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform. 2018;19(6):1236–46. doi: 10.1093/bib/bbx044
11. Andaur Navarro CL, Damen JAA, van Smeden M, Takada T, Nijman SWJ, Dhiman P, et al. Systematic review identifies the design and methodological conduct of studies on machine learning-based prediction models. J Clin Epidemiol. 2023;154:8–22. doi: 10.1016/j.jclinepi.2022.11.015
12. Solomonides AE, Koski E, Atabaki SM, Weinberg S, McGreevey JD, Kannry JL, et al. Defining AMIA’s artificial intelligence principles. J Am Med Inform Assoc. 2022;29(4):585–91. doi: 10.1093/jamia/ocac006
13. Hunter DJ, Holmes C. Where Medical Statistics Meets Artificial Intelligence. N Engl J Med. 2023;389(13):1211–9. doi: 10.1056/NEJMra2212850
14. Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21(1):128–38. doi: 10.1097/EDE.0b013e3181c30fb2
15. Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J. 2014;35(29):1925–31. doi: 10.1093/eurheartj/ehu207
16. Elfanagely O, Toyoda Y, Othman S, Mellia JA, Basta M, Liu T, et al. Machine Learning and Surgical Outcomes Prediction: A Systematic Review. J Surg Res. 2021;264:346–61. doi: 10.1016/j.jss.2021.02.045
17. Li B, Feridooni T, Cuen-Ojeda C, Kishibe T, de Mestral C, Mamdani M, et al. Machine learning in vascular surgery: a systematic review and critical appraisal. NPJ Digit Med. 2022;5(1):7. doi: 10.1038/s41746-021-00552-y
18. Wu G, Khair S, Yang F, Cheligeer C, Southern D, Zhang Z, et al. Performance of machine learning algorithms for surgical site infection case detection and prediction: A systematic review and meta-analysis. Ann Med Surg (Lond). 2022;84:104956. doi: 10.1016/j.amsu.2022.104956
19. Moons KG, de Groot JA, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014;11(10):e1001744. doi: 10.1371/journal.pmed.1001744
20. Moons KGM, Wolff RF, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies: Explanation and Elaboration. Ann Intern Med. 2019;170(1):W1–W33. doi: 10.7326/M18-1377
21. de Jong Y, Ramspek CL, Zoccali C, Jager KJ, Dekker FW, van Diepen M. Appraising prediction research: a guide and meta-review on bias and applicability assessment using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). Nephrology (Carlton). 2021;26(12):939–47. doi: 10.1111/nep.13913
22. Bertsimas D, Dunn J, Velmahos GC, Kaafarani HMA. Surgical Risk Is Not Linear: Derivation and Validation of a Novel, User-friendly, and Machine-learning-based Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) Calculator. Ann Surg. 2018;268(4):574–83. doi: 10.1097/SLA.0000000000002956
23. Bonde A, Varadarajan KM, Bonde N, Troelsen A, Muratoglu OK, Malchau H, et al. Assessing the utility of deep neural networks in predicting postoperative surgical complications: a retrospective study. Lancet Digit Health. 2021;3(8):e471–e85. doi: 10.1016/S2589-7500(21)00084-4
24. Chang B, Sun Z, Peiris P, Huang ES, Benrashid E, Dillavou ED. Deep Learning-Based Risk Model for Best Management of Closed Groin Incisions After Vascular Surgery. J Surg Res. 2020;254:408–16. doi: 10.1016/j.jss.2020.02.012
25. El Hechi MW, Maurer LR, Levine J, Zhuo D, El Moheb M, Velmahos GC, et al. Validation of the Artificial Intelligence-Based Predictive Optimal Trees in Emergency Surgery Risk (POTTER) Calculator in Emergency General Surgery and Emergency Laparotomy Patients. J Am Coll Surg. 2021. doi: 10.1016/j.jamcollsurg.2021.02.009
26. Gowd AK, Agarwalla A, Amin NH, Romeo AA, Nicholson GP, Verma NN, et al. Construct validation of machine learning in the prediction of short-term postoperative complications following total shoulder arthroplasty. J Shoulder Elbow Surg. 2019;28(12):e410–e21. doi: 10.1016/j.jse.2019.05.017
27. Grass F, Storlie CB, Mathis KL, Bergquist JR, Asai S, Boughey JC, et al. Challenges of Modeling Outcomes for Surgical Infections: A Word of Caution. Surg Infect (Larchmt). 2020.
28. Ke C, Jin Y, Evans H, Lober B, Qian X, Liu J, et al. Prognostics of surgical site infections using dynamic health data. J Biomed Inform. 2017;65:22–33. doi: 10.1016/j.jbi.2016.10.021
29. Liu WC, Ying H, Liao WJ, Li MP, Zhang Y, Luo K, et al. Using Preoperative and Intraoperative Factors to Predict the Risk of Surgical Site Infections After Lumbar Spinal Surgery: A Machine Learning-Based Study. World Neurosurg. 2022;162:e553–e60. doi: 10.1016/j.wneu.2022.03.060
30. Liu X, Lei S, Wei Q, Wang Y, Liang H, Chen L. Machine Learning-based Correlation Study between Perioperative Immunonutritional Index and Postoperative Anastomotic Leakage in Patients with Gastric Cancer. Int J Med Sci. 2022;19(7):1173–83. doi: 10.7150/ijms.72195
31. Mamlook REA, Wells LJ, Sawyer R. Machine-learning models for predicting surgical site infections using patient pre-operative risk and surgical procedure factors. Am J Infect Control. 2023;51(5):544–50. doi: 10.1016/j.ajic.2022.08.013
32. Maurer LR, Chetlur P, Zhuo D, El Hechi M, Velmahos GC, Dunn J, et al. Validation of the AI-based Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) Calculator in Patients 65 Years and Older. Ann Surg. 2020;Publish Ahead of Print.
33. Mazaki J, Katsumata K, Ohno Y, Udo R, Tago T, Kasahara K, et al. A Novel Predictive Model for Anastomotic Leakage in Colorectal Cancer Using Auto-artificial Intelligence. Anticancer Res. 2021;41(11):5821–5. doi: 10.21873/anticanres.15400
34. Merath K, Hyer JM, Mehta R, Farooq A, Bagante F, Sahara K, et al. Use of Machine Learning for Prediction of Patient Risk of Postoperative Complications After Liver, Pancreatic, and Colorectal Surgery. J Gastrointest Surg. 2020;24(8):1843–51. doi: 10.1007/s11605-019-04338-2
35. Nudel J, Bishara AM, de Geus SWL, Patil P, Srinivasan J, Hess DT, et al. Development and validation of machine learning models to predict gastrointestinal leak and venous thromboembolism after weight loss surgery: an analysis of the MBSAQIP database. Surg Endosc. 2021;35(1):182–91. doi: 10.1007/s00464-020-07378-x
36. Ohno Y, Mazaki J, Udo R, Tago T, Kasahara K, Enomoto M, et al. Preliminary Evaluation of a Novel Artificial Intelligence-based Prediction Model for Surgical Site Infection in Colon Cancer. Cancer Diagn Progn. 2022;2(6):691–6. doi: 10.21873/cdp.10161
37. Sanger PC, van Ramshorst GH, Mercan E, Huang S, Hartzler AL, Armstrong CA, et al. A Prognostic Model of Surgical Site Infection Using Daily Clinical Wound Assessment. J Am Coll Surg. 2016;223(2):259–70.e2. doi: 10.1016/j.jamcollsurg.2016.04.046
38. Taylor J, Meng X, Renson A, Smith AB, Wysock JS, Taneja SS, et al. Different models for prediction of radical cystectomy postoperative complications and care pathways. Ther Adv Urol. 2019;11:1756287219875587. doi: 10.1177/1756287219875587
39. Van Esbroeck A, Rubinfeld I, Hall B, Syed Z. Quantifying surgical complexity with machine learning: looking beyond patient factors to improve surgical models. Surgery. 2014;156(5):1097–105. doi: 10.1016/j.surg.2014.04.034
40. van Kooten RT, Bahadoer RR, Ter Buurkes de Vries B, Wouters M, Tollenaar R, Hartgrink HH, et al. Conventional regression analysis and machine learning in prediction of anastomotic leakage and pulmonary complications after esophagogastric cancer surgery. J Surg Oncol. 2022;126(3):490–501. doi: 10.1002/jso.26910
41. Velmahos CS, Paschalidis A, Paranjape CN. The Not-So-Distant Future or Just Hype? Utilizing Machine Learning to Predict 30-Day Post-Operative Complications in Laparoscopic Colectomy Patients. Am Surg. 2023:31348231167397. doi: 10.1177/00031348231167397
42. Walczak S, Davila M, Velanovich V. Prophylactic antibiotic bundle compliance and surgical site infections: an artificial neural network analysis. Patient Saf Surg. 2019;13:41. doi: 10.1186/s13037-019-0222-4
43. Weller GB, Lovely J, Larson DW, Earnshaw BA, Huebner M. Leveraging electronic health records for predictive modeling of post-surgical complications. Stat Methods Med Res. 2018;27(11):3271–85. doi: 10.1177/0962280217696115
44. Ying H, Guo BW, Wu HJ, Zhu RP, Liu WC, Zhong HF. Using multiple indicators to predict the risk of surgical site infection after ORIF of tibia fractures: a machine learning based study. Front Cell Infect Microbiol. 2023;13:1206393. doi: 10.3389/fcimb.2023.1206393
45. Zhang N, Fan K, Ji H, Ma X, Wu J, Huang Y, et al. Identification of risk factors for infection after mitral valve surgery through machine learning approaches. Front Cardiovasc Med. 2023;10:1050698. doi: 10.3389/fcvm.2023.1050698
46. van der Endt VHW, Milders J, Penning de Vries BBL, Trines SA, Groenwold RHH, Dekkers OM, et al. Comprehensive comparison of stroke risk score performance: a systematic review and meta-analysis among 6 267 728 patients with atrial fibrillation. Europace. 2022;24(11):1739–53. doi: 10.1093/europace/euac096
47. de Jong Y, Ramspek CL, van der Endt VHW, Rookmaaker MB, Blankestijn PJ, Vernooij RWM, et al. A systematic review and external validation of stroke prediction models demonstrates poor performance in dialysis patients. J Clin Epidemiol. 2020;123:69–79. doi: 10.1016/j.jclinepi.2020.03.015
48. de Hond AAH, Kant IMJ, Fornasa M, Cinà G, Elbers PWG, Thoral PJ, et al. Predicting Readmission or Death After Discharge From the ICU: External Validation and Retraining of a Machine Learning Model. Crit Care Med. 2023;51(2):291–300. doi: 10.1097/CCM.0000000000005758
49. Kunutsor SK, Whitehouse MR, Blom AW, Beswick AD. Systematic review of risk prediction scores for surgical site infection or periprosthetic joint infection following joint arthroplasty. Epidemiol Infect. 2017;145(9):1738–49. doi: 10.1017/S0950268817000486
50. Gwilym BL, Ambler GK, Saratzis A, Bosanquet DC. Groin Wound Infection after Vascular Exposure (GIVE) Risk Prediction Models: Development, Internal Validation, and Comparison with Existing Risk Prediction Models Identified in a Systematic Literature Review. Eur J Vasc Endovasc Surg. 2021;62(2):258–66. doi: 10.1016/j.ejvs.2021.05.009
51. Lubelski D, Alentado V, Nowacki AS, Shriver M, Abdullah KG, Steinmetz MP, et al. Preoperative Nomograms Predict Patient-Specific Cervical Spine Surgery Clinical and Quality of Life Outcomes. Neurosurgery. 2018;83(1):104–13. doi: 10.1093/neuros/nyx343
52. Song X, Liu X, Liu F, Wang C. Comparison of machine learning and logistic regression models in predicting acute kidney injury: A systematic review and meta-analysis. Int J Med Inform. 2021;151:104484. doi: 10.1016/j.ijmedinf.2021.104484
53. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. 2019;110:12–22. doi: 10.1016/j.jclinepi.2019.02.004
54. van Smeden M, Heinze G, Van Calster B, Asselbergs FW, Vardas PE, Bruining N, et al. Critical appraisal of artificial intelligence-based prediction models for cardiovascular disease. Eur Heart J. 2022;43(31):2921–30. doi: 10.1093/eurheartj/ehac238
55. Venema E, Wessler BS, Paulus JK, Salah R, Raman G, Leung LY, et al. Large-scale validation of the prediction model risk of bias assessment Tool (PROBAST) using a short form: high risk of bias models show poorer discrimination. J Clin Epidemiol. 2021;138:32–9. doi: 10.1016/j.jclinepi.2021.06.017
56. Langenhuijsen LFS, Janse RJ, Venema E, Kent DM, van Diepen M, Dekker FW, et al. Systematic metareview of prediction studies demonstrates stable trends in bias and low PROBAST inter-rater agreement. J Clin Epidemiol. 2023;159:159–73. doi: 10.1016/j.jclinepi.2023.04.012
57. Kaiser I, Pfahlberg AB, Mathes S, Uter W, Diehl K, Steeb T, et al. Inter-Rater Agreement in Assessing Risk of Bias in Melanoma Prediction Studies Using the Prediction Model Risk of Bias Assessment Tool (PROBAST): Results from a Controlled Experiment on the Effect of Specific Rater Training. J Clin Med. 2023;12(5). doi: 10.3390/jcm12051976
58. Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;385:e078378. doi: 10.1136/bmj-2023-078378

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Appendix. Prisma checklist.

(DOCX)

pone.0312968.s001.docx^{(36.2KB, docx)}

S2 Appendix. Search strategy.

(DOCX)

pone.0312968.s002.docx^{(20.3KB, docx)}

S1 Table. Excluded full text articles.

(DOCX)

pone.0312968.s003.docx^{(217.3KB, docx)}

S2 Table. Extracted parameters from the data.

(DOCX)

pone.0312968.s004.docx^{(12.7KB, docx)}

S3 Table. Studies with both ML and regression-based models.

(DOCX)

pone.0312968.s005.docx^{(46.2KB, docx)}

S4 Table. Risk of bias assessment with the use of the PROBAST score.

(DOCX)

pone.0312968.s006.docx^{(78.8KB, docx)}

Data Availability Statement

All relevant data are within the manuscript and its Supporting Information files.

[pone.0312968.ref001] 1.Global patient outcomes after elective surgery: prospective cohort study in 27 low-, middle- and high-income countries. Br J Anaesth. 2016;117(5):601–9. [DOI] [PMC free article] [PubMed]

[pone.0312968.ref002] 2.Badia JM, Casey AL, Petrosillo N, Hudson PM, Mitchell SA, Crosby C. Impact of surgical site infection on healthcare costs and patient outcomes: a systematic review in six European countries. J Hosp Infect. 2017;96(1):1–15. doi: 10.1016/j.jhin.2017.03.004 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref003] 3.Gillespie BM, Harbeck E, Rattray M, Liang R, Walker R, Latimer S, et al. Worldwide incidence of surgical site infections in general surgical patients: A systematic review and meta-analysis of 488,594 patients. Int J Surg. 2021;95:106136. doi: 10.1016/j.ijsu.2021.106136 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref004] 4.European Centre for Disease Prevention and Control. Healthcare-associated infections: surgical site infections. ECDC. Annual epidemiological report for 2018–2020. Stockholm; 2023.

[pone.0312968.ref005] 5.Qu H, Liu Y, Bi DS. Clinical risk factors for anastomotic leakage after laparoscopic anterior resection for rectal cancer: a systematic review and meta-analysis. Surg Endosc. 2015;29(12):3608–17. doi: 10.1007/s00464-015-4117-x [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref006] 6.Dietz N, Sharma M, Alhourani A, Ugiliweneza B, Wang D, Drazin D, et al. Evaluation of Predictive Models for Complications following Spinal Surgery. J Neurol Surg A Cent Eur Neurosurg. 2020;81(6):535–45. doi: 10.1055/s-0040-1709709 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref007] 7.Guo Y, Hao Z, Zhao S, Gong J, Yang F. Artificial Intelligence in Health Care: Bibliometric Analysis. J Med Internet Res. 2020;22(7):e18228. doi: 10.2196/18228 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0312968.ref008] 8.Rajkomar A, Dean J, Kohane I. Machine Learning in Medicine. N Engl J Med. 2019;380(14):1347–58. doi: 10.1056/NEJMra1814259 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref009] 9.Choi RY, Coyner AS, Kalpathy-Cramer J, Chiang MF, Campbell JP. Introduction to Machine Learning, Neural Networks, and Deep Learning. Transl Vis Sci Technol. 2020;9(2):14. doi: 10.1167/tvst.9.2.14 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0312968.ref010] 10.Miotto R, Wang F, Wang S, Jiang X, Dudley JT. Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform. 2018;19(6):1236–46. doi: 10.1093/bib/bbx044 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0312968.ref011] 11.Andaur Navarro CL, Damen JAA, van Smeden M, Takada T, Nijman SWJ, Dhiman P, et al. Systematic review identifies the design and methodological conduct of studies on machine learning-based prediction models. J Clin Epidemiol. 2023;154:8–22. doi: 10.1016/j.jclinepi.2022.11.015 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref012] 12.Solomonides AE, Koski E, Atabaki SM, Weinberg S, McGreevey JD, Kannry JL, et al. Defining AMIA’s artificial intelligence principles. J Am Med Inform Assoc. 2022;29(4):585–91. doi: 10.1093/jamia/ocac006 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0312968.ref013] 13.Hunter DJ, Holmes C. Where Medical Statistics Meets Artificial Intelligence. N Engl J Med. 2023;389(13):1211–9. doi: 10.1056/NEJMra2212850 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref014] 14.Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21(1):128–38. doi: 10.1097/EDE.0b013e3181c30fb2 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0312968.ref015] 15.Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J. 2014;35(29):1925–31. doi: 10.1093/eurheartj/ehu207 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0312968.ref016] 16.Elfanagely O, Toyoda Y, Othman S, Mellia JA, Basta M, Liu T, et al. Machine Learning and Surgical Outcomes Prediction: A Systematic Review. J Surg Res. 2021;264:346–61. doi: 10.1016/j.jss.2021.02.045 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref017] 17.Li B, Feridooni T, Cuen-Ojeda C, Kishibe T, de Mestral C, Mamdani M, et al. Machine learning in vascular surgery: a systematic review and critical appraisal. NPJ Digit Med. 2022;5(1):7. doi: 10.1038/s41746-021-00552-y [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0312968.ref018] 18.Wu G, Khair S, Yang F, Cheligeer C, Southern D, Zhang Z, et al. Performance of machine learning algorithms for surgical site infection case detection and prediction: A systematic review and meta-analysis. Ann Med Surg (Lond). 2022;84:104956. doi: 10.1016/j.amsu.2022.104956 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0312968.ref019] 19.Moons KG, de Groot JA, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014;11(10):e1001744. doi: 10.1371/journal.pmed.1001744 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0312968.ref020] 20.Moons KGM, Wolff RF, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies: Explanation and Elaboration. Ann Intern Med. 2019;170(1):W1–w33. doi: 10.7326/M18-1377 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref021] 21.de Jong Y, Ramspek CL, Zoccali C, Jager KJ, Dekker FW, van Diepen M. Appraising prediction research: a guide and meta-review on bias and applicability assessment using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). Nephrology (Carlton). 2021;26(12):939–47. doi: 10.1111/nep.13913 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0312968.ref022] 22.Bertsimas D, Dunn J, Velmahos GC, Kaafarani HMA. Surgical Risk Is Not Linear: Derivation and Validation of a Novel, User-friendly, and Machine-learning-based Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) Calculator. Ann Surg. 2018;268(4):574–83. doi: 10.1097/SLA.0000000000002956 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref023] 23.Bonde A, Varadarajan KM, Bonde N, Troelsen A, Muratoglu OK, Malchau H, et al. Assessing the utility of deep neural networks in predicting postoperative surgical complications: a retrospective study. Lancet Digit Health. 2021;3(8):e471–e85. doi: 10.1016/S2589-7500(21)00084-4 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref024] 24.Chang B, Sun Z, Peiris P, Huang ES, Benrashid E, Dillavou ED. Deep Learning-Based Risk Model for Best Management of Closed Groin Incisions After Vascular Surgery. J Surg Res. 2020;254:408–16. doi: 10.1016/j.jss.2020.02.012 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref025] 25.El Hechi MW, Maurer LR, Levine J, Zhuo D, El Moheb M, Velmahos GC, et al. Validation of the Artificial Intelligence-Based Predictive Optimal Trees in Emergency Surgery Risk (POTTER) Calculator in Emergency General Surgery and Emergency Laparotomy Patients. J Am Coll Surg. 2021. doi: 10.1016/j.jamcollsurg.2021.02.009 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref026] 26.Gowd AK, Agarwalla A, Amin NH, Romeo AA, Nicholson GP, Verma NN, et al. Construct validation of machine learning in the prediction of short-term postoperative complications following total shoulder arthroplasty. J Shoulder Elbow Surg. 2019;28(12):e410–e21. doi: 10.1016/j.jse.2019.05.017 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref027] 27.Grass F, Storlie CB, Mathis KL, Bergquist JR, Asai S, Boughey JC, et al. Challenges of Modeling Outcomes for Surgical Infections: A Word of Caution. Surg Infect (Larchmt). 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0312968.ref028] 28.Ke C, Jin Y, Evans H, Lober B, Qian X, Liu J, et al. Prognostics of surgical site infections using dynamic health data. J Biomed Inform. 2017;65:22–33. doi: 10.1016/j.jbi.2016.10.021 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref029] 29.Liu WC, Ying H, Liao WJ, Li MP, Zhang Y, Luo K, et al. Using Preoperative and Intraoperative Factors to Predict the Risk of Surgical Site Infections After Lumbar Spinal Surgery: A Machine Learning-Based Study. World Neurosurg. 2022;162:e553–e60. doi: 10.1016/j.wneu.2022.03.060 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref030] 30.Liu X, Lei S, Wei Q, Wang Y, Liang H, Chen L. Machine Learning-based Correlation Study between Perioperative Immunonutritional Index and Postoperative Anastomotic Leakage in Patients with Gastric Cancer. Int J Med Sci. 2022;19(7):1173–83. doi: 10.7150/ijms.72195 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0312968.ref031] 31.Mamlook REA, Wells LJ, Sawyer R. Machine-learning models for predicting surgical site infections using patient pre-operative risk and surgical procedure factors. Am J Infect Control. 2023;51(5):544–50. doi: 10.1016/j.ajic.2022.08.013 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref032] 32.Maurer LR, Chetlur P, Zhuo D, El Hechi M, Velmahos GC, Dunn J, et al. Validation of the AI-based Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) Calculator in Patients 65 Years and Older. Ann Surg. 2020;Publish Ahead of Print. [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref033] 33.Mazaki J, Katsumata K, Ohno Y, Udo R, Tago T, Kasahara K, et al. A Novel Predictive Model for Anastomotic Leakage in Colorectal Cancer Using Auto-artificial Intelligence. Anticancer Res. 2021;41(11):5821–5. doi: 10.21873/anticanres.15400 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref034] 34.Merath K, Hyer JM, Mehta R, Farooq A, Bagante F, Sahara K, et al. Use of Machine Learning for Prediction of Patient Risk of Postoperative Complications After Liver, Pancreatic, and Colorectal Surgery. J Gastrointest Surg. 2020;24(8):1843–51. doi: 10.1007/s11605-019-04338-2 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref035] 35.Nudel J, Bishara AM, de Geus SWL, Patil P, Srinivasan J, Hess DT, et al. Development and validation of machine learning models to predict gastrointestinal leak and venous thromboembolism after weight loss surgery: an analysis of the MBSAQIP database. Surg Endosc. 2021;35(1):182–91. doi: 10.1007/s00464-020-07378-x [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0312968.ref036] 36.Ohno Y, Mazaki J, Udo R, Tago T, Kasahara K, Enomoto M, et al. Preliminary Evaluation of a Novel Artificial Intelligence-based Prediction Model for Surgical Site Infection in Colon Cancer. Cancer Diagn Progn. 2022;2(6):691–6. doi: 10.21873/cdp.10161 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0312968.ref037] 37.Sanger PC, van Ramshorst GH, Mercan E, Huang S, Hartzler AL, Armstrong CA, et al. A Prognostic Model of Surgical Site Infection Using Daily Clinical Wound Assessment. J Am Coll Surg. 2016;223(2):259–70.e2. doi: 10.1016/j.jamcollsurg.2016.04.046 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0312968.ref038] 38.Taylor J, Meng X, Renson A, Smith AB, Wysock JS, Taneja SS, et al. Different models for prediction of radical cystectomy postoperative complications and care pathways. Ther Adv Urol. 2019;11:1756287219875587. doi: 10.1177/1756287219875587 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0312968.ref039] 39.Van Esbroeck A, Rubinfeld I, Hall B, Syed Z. Quantifying surgical complexity with machine learning: looking beyond patient factors to improve surgical models. Surgery. 2014;156(5):1097–105. doi: 10.1016/j.surg.2014.04.034 [DOI] [PubMed] [Google Scholar]

[pone.0312968.ref040] 40.van Kooten RT, Bahadoer RR, Ter Buurkes de Vries B, Wouters M, Tollenaar R, Hartgrink HH, et al. Conventional regression analysis and machine learning in prediction of anastomotic leakage and pulmonary complications after esophagogastric cancer surgery. J Surg Oncol. 2022;126(3):490–501. doi: 10.1002/jso.26910 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0312968.ref041] 41.Velmahos CS, Paschalidis A, Paranjape CN. The Not-So-Distant Future or Just Hype? Utilizing Machine Learning to Predict 30-Day Post-Operative Complications in Laparoscopic Colectomy Patients. Am Surg. 2023:31348231167397. doi: 10.1177/00031348231167397 [DOI] [PubMed] [Google Scholar]

42. Walczak S, Davila M, Velanovich V. Prophylactic antibiotic bundle compliance and surgical site infections: an artificial neural network analysis. Patient Saf Surg. 2019;13:41. doi: 10.1186/s13037-019-0222-4

43. Weller GB, Lovely J, Larson DW, Earnshaw BA, Huebner M. Leveraging electronic health records for predictive modeling of post-surgical complications. Stat Methods Med Res. 2018;27(11):3271–85. doi: 10.1177/0962280217696115

44. Ying H, Guo BW, Wu HJ, Zhu RP, Liu WC, Zhong HF. Using multiple indicators to predict the risk of surgical site infection after ORIF of tibia fractures: a machine learning based study. Front Cell Infect Microbiol. 2023;13:1206393. doi: 10.3389/fcimb.2023.1206393

45. Zhang N, Fan K, Ji H, Ma X, Wu J, Huang Y, et al. Identification of risk factors for infection after mitral valve surgery through machine learning approaches. Front Cardiovasc Med. 2023;10:1050698. doi: 10.3389/fcvm.2023.1050698

46. van der Endt VHW, Milders J, Penning de Vries BBL, Trines SA, Groenwold RHH, Dekkers OM, et al. Comprehensive comparison of stroke risk score performance: a systematic review and meta-analysis among 6 267 728 patients with atrial fibrillation. Europace. 2022;24(11):1739–53. doi: 10.1093/europace/euac096

47. de Jong Y, Ramspek CL, van der Endt VHW, Rookmaaker MB, Blankestijn PJ, Vernooij RWM, et al. A systematic review and external validation of stroke prediction models demonstrates poor performance in dialysis patients. J Clin Epidemiol. 2020;123:69–79. doi: 10.1016/j.jclinepi.2020.03.015

48. de Hond AAH, Kant IMJ, Fornasa M, Cinà G, Elbers PWG, Thoral PJ, et al. Predicting Readmission or Death After Discharge From the ICU: External Validation and Retraining of a Machine Learning Model. Crit Care Med. 2023;51(2):291–300. doi: 10.1097/CCM.0000000000005758

49. Kunutsor SK, Whitehouse MR, Blom AW, Beswick AD. Systematic review of risk prediction scores for surgical site infection or periprosthetic joint infection following joint arthroplasty. Epidemiol Infect. 2017;145(9):1738–49. doi: 10.1017/S0950268817000486

50. Gwilym BL, Ambler GK, Saratzis A, Bosanquet DC. Groin Wound Infection after Vascular Exposure (GIVE) Risk Prediction Models: Development, Internal Validation, and Comparison with Existing Risk Prediction Models Identified in a Systematic Literature Review. Eur J Vasc Endovasc Surg. 2021;62(2):258–66. doi: 10.1016/j.ejvs.2021.05.009

51. Lubelski D, Alentado V, Nowacki AS, Shriver M, Abdullah KG, Steinmetz MP, et al. Preoperative Nomograms Predict Patient-Specific Cervical Spine Surgery Clinical and Quality of Life Outcomes. Neurosurgery. 2018;83(1):104–13. doi: 10.1093/neuros/nyx343

52. Song X, Liu X, Liu F, Wang C. Comparison of machine learning and logistic regression models in predicting acute kidney injury: A systematic review and meta-analysis. Int J Med Inform. 2021;151:104484. doi: 10.1016/j.ijmedinf.2021.104484

53. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. 2019;110:12–22. doi: 10.1016/j.jclinepi.2019.02.004

54. van Smeden M, Heinze G, Van Calster B, Asselbergs FW, Vardas PE, Bruining N, et al. Critical appraisal of artificial intelligence-based prediction models for cardiovascular disease. Eur Heart J. 2022;43(31):2921–30. doi: 10.1093/eurheartj/ehac238

55. Venema E, Wessler BS, Paulus JK, Salah R, Raman G, Leung LY, et al. Large-scale validation of the prediction model risk of bias assessment Tool (PROBAST) using a short form: high risk of bias models show poorer discrimination. J Clin Epidemiol. 2021;138:32–9. doi: 10.1016/j.jclinepi.2021.06.017

56. Langenhuijsen LFS, Janse RJ, Venema E, Kent DM, van Diepen M, Dekker FW, et al. Systematic metareview of prediction studies demonstrates stable trends in bias and low PROBAST inter-rater agreement. J Clin Epidemiol. 2023;159:159–73. doi: 10.1016/j.jclinepi.2023.04.012

57. Kaiser I, Pfahlberg AB, Mathes S, Uter W, Diehl K, Steeb T, et al. Inter-Rater Agreement in Assessing Risk of Bias in Melanoma Prediction Studies Using the Prediction Model Risk of Bias Assessment Tool (PROBAST): Results from a Controlled Experiment on the Effect of Specific Rater Training. J Clin Med. 2023;12(5). doi: 10.3390/jcm12051976

58. Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;385:e078378. doi: 10.1136/bmj-2023-078378


