Artificial intelligence-based prediction model for surgical site infection in metastatic spinal disease: a multicenter development and validation study

Yunpeng Cui; Xuedong Shi; Qiwei Wang; Yong Qin; Xiongwei Zhao; Xiaotong Che; Shengjie Wang; Yuanxing Pan; Bing Wang; Yuncen Cao; Yaosheng Liu; Mingxing Lei

doi:10.1097/JS9.0000000000002806

. 2025 Jun 27;111(10):6867–6884. doi: 10.1097/JS9.0000000000002806

Artificial intelligence-based prediction model for surgical site infection in metastatic spinal disease: a multicenter development and validation study

Yunpeng Cui ^a, Xuedong Shi ^a,^*, Qiwei Wang ^a, Yong Qin ^b, Xiongwei Zhao ^c, Xiaotong Che ^d, Shengjie Wang ^e, Yuanxing Pan ^a, Bing Wang ^a, Yuncen Cao ^f,^g, Yaosheng Liu ^f,^g,^✉, Mingxing Lei ^g,^h,^i,^j,^*

PMCID: PMC12527739 PMID: 40576176

Abstract

Background:

Treatment of metastatic spinal disease often involves surgical intervention; however, surgical site infections (SSIs) pose a great challenge for spine surgeons. At present, there is an absence of reliable clinical tools for predicting SSI, which can adversely affect treatment decisions and overall patient management. This study aims to construct and validate an application to stratify the patients at high risk of SSI among those with metastatic spinal disease using an artificial intelligence (AI) approach.

Methods:

A total of 667 patients diagnosed with metastatic spinal disease were enrolled in this study to train and validate models. Patients in the model-derivation cohort (n = 485) from two tertiary medical institutions were randomly divided into two groups at a ratio of 8:2, with the most patients belonging to the model-training group and the remaining patients classified into the model-validation group. External validation was conducted among patients (n = 182) from another tertiary medical institution. Logistic Regression and five machine learning algorithms, including support vector machine, gradient boosting machine (GBM), K-nearest neighbor (KNN), neural network (NN), and decision tree, were used to train and optimize models. The predictive performance of the models was assessed through both discrimination and calibration. The model demonstrating the best prediction accuracy was selected as the AI platform for assessing the risk of SSI in patients with metastatic spinal disease. To evaluate the clinical utility of our AI model, we conducted a comparative study involving 100 patients undergoing surgery for metastatic spinal disease at one tertiary medical institution.

Results:

The incidence of SSI in spinal metastasis surgeries was 6.4% in the model derivation cohort and 7.7% in the external validation cohort. Among all models, the GBM model had the highest area under the curve (AUC) value (0.986, 95% confidence interval [CI]: 0.972–1.000), followed by the KNN model (0.962, 95% CI: 0.933–0.991), and the NN model (0.944, 95% CI: 0.914–0.974). The GBM model also had the best prediction performance in terms of accuracy, precision, recall, F1 score, Brier score, and log loss. The calibration curve revealed the GBM model had favorable calibration ability, and decision curve analysis showed the GBM model had significant net clinical benefits in various risk thresholds. External validation generated an AUC value of the model of 0.848 (95% CI: 0.806–0.890). Surgery time, tumor type, and number of comorbidities were identified as the most three influential factors for postoperative SSI. The AI application achieved significantly higher accuracy than clinician assessments (AUC: 0.986 vs. 0.572–0.627, P < 0.001). Sensitivity analysis confirmed robustness across subgroups (e.g. diabetes, visceral metastases).

Conclusions:

This study develops and validates an AI tool with strong predictive performance in identifying patients at a high risk for SSI. By facilitating personalized treatment based on risk classification, this advancement has the potential to significantly enhance surgical care for patients with metastatic spinal disease. Future research should focus on integrating this predictive tool into clinical practice and exploring its applicability across diverse patient populations.

Keywords: artificial intelligence, cohort study, decompressive surgery, machine learning, metastatic spinal disease, surgical site infection

Introduction

Metastatic spinal disease is a common condition often requiring surgical intervention to alleviate pain, decompress neural structures, and restore spinal stability^[1]. However, surgical site infections (SSIs) pose a significant challenge in the surgical treatment of these patients^[2], as it can lead to prolonged hospital stay^[3–5], surgical failure^[6], increased economic burden^[3], and compromised survival outcomes^[4,7]. Hence, identifying patients at high risk for SSI is crucial to implement preventive measures and improve patient’s outcomes.

A number of studies have highlighted various risk factors for SSI in patients with metastatic spinal disease. These factors include patient-specific characteristics, surgical techniques, and aspects related to systemic treatments^[5,8–13]. A comprehensive understanding of these factors is important for enhancing clinical outcomes, as it allows for the implementation of personalized preventive strategies, optimization of surgical techniques, and improvement of systemic treatments. However, the predictive power of these traditional risk factors has been limited^[14,15], partly because metastatic spinal disease involves heterogeneous tumor biology, varying treatment histories (e.g. prior radiotherapy or chemotherapy), and complex interactions between comorbidities and immunosuppression – factors that traditional logistic regression or Cox models may struggle to capture due to their assumptions of linearity and independence among predictors^[16]. Furthermore, clinical data often present complex, multidimensional data sets, making it challenging for conventional statistical methods to elucidate the relationships between risk factors and outcomes^[16]. Additionally, small sample sizes in single-center studies and imbalanced datasets (where SSI events are relatively rare) further reduce the reliability of traditional methods. Currently, there is no universally accepted method or tool for predicting SSI in this specific patient population. Recently, two studies have aimed to develop predictive models for wound complications, including wound infection and reoperation, among patients undergoing surgery for metastatic spinal tumors^[17,18]. Regrettably, the studies in question did not utilize machine learning methods, resulting in models that achieved area under the curve (AUC) values of only 0.73 and 0.81, respectively. This indicates a clear need for further enhancement of predictive performance, and highlights the necessity for innovative approaches to predict SSI in patients with metastatic spinal disease.

Artificial intelligence (AI) has emerged as a promising tool in various medical fields, including the prediction and management of surgical complications for spine surgery^[19–21]. AI algorithms, particularly machine learning models, are designed to analyze large datasets, identifying complex patterns and relationships that may not be immediately apparent to human practitioners. These models can learn from historical patient data, integrating variables such as demographic information, medical history, and intraoperative factors to predict the risk of adverse outcomes. In fact, AI has been reported to be used to predict survival^[22,23], mental distress^[24], and intraoperative blood loss^[25], specifically for patients with metastatic spinal disease. Furthermore, several machine learning-based models have been proposed to predict SSI following different surgical procedures, such as spinal fusion surgery^[26], neurological procedures^[27], and lumbar internal fixation surgery^[28]. These models are valuable in clinical settings as they could improve predictive accuracy, inform clinical decisions, and potentially improve patient outcomes by allowing early interventions based on predicted risks. However, currently suitable models that predict SSI specifically for patients with metastatic spinal disease are lacking, indicating a critical gap in the application of AI in this area.

HIGHLIGHTS

This study successfully established an artificial intelligence (AI) platform designed to predict surgical site infection (SSI) in patients undergoing surgery for metastatic spinal disease.
The gradient-boosting machine model outperformed other models with an impressive area under the curve of 0.986, indicating exceptionally high reliability and validity in predicting SSI risks.
The platform allows for interactive engagement where clinicians can input patient-specific data and receive instant risk evaluations.
The AI application achieved significantly higher accuracy than clinician assessments.
The study also identified critical factors influencing the risk of SSI, such as surgery duration, tumor type, and smoking status.

Therefore, the purpose of this study is to construct and validate an AI platform that can stratify the risk of SSI among patients with metastatic spinal disease. This platform can be used to predict the risk of SSI, helping to identify patients at the high risk for SSI and providing support to clinical decision-making.

Patients and methods

Data sources and study design

A total of 667 patients were included in this study. Among them, 485 patients with metastatic spinal tumors were included from two tertiary medical institutions at the capital of our nation between January 2013 and December 2023. These patients constituted the model derivation group and were subsequently divided into a training group and an internal validation group in an 8:2 ratio. External validation was conducted with 182 patients from another tertiary medical institution in the south of our nation between January 2017 and November 2024. Thus, the above three different Grade-A tertiary hospitals were located in various regions of our nation, spanning both the northern and southern parts of the country. All patients underwent X-ray and MRI scans to determine the location of the metastatic lesions. The decision to proceed with surgery was based on critical indications, including severe, unmanageable pain resulting from spinal instability and myelopathy caused by spinal cord compression. A multidisciplinary team, including a neuroradiologist, a spinal tumor surgeon, and an oncologist, collaboratively determined the most suitable surgical approach. The surgical management of spinal metastases involved a complex procedure that included decompression and internal fixation using pedicle screw instrumentation. The selection criteria for the external validation cohort were consistent with those for the model derivation group. Inclusion criteria for the study in the entire cohort were radiographic evidence of metastatic spinal disease and the presence of one or more of the following symptoms: progressive local mechanical or radiating pain, impairment of sensory function, lower limb motor function, or sphincter function. Patients with primary spinal tumors, metastatic spinal disease due to leukemia, and intramedullary metastases of spinal metastases were excluded. A flowchart illustrating the patient selection process and the study design is presented in Supplemental Digital Content Figure 1, available at: http://links.lww.com/JS9/E507. The study protocol was approved by the ethics board of our hospitals, and all patients provided informed consent for the review of their medical records and images. To ensure patient privacy and data security, all personal identifiers were removed prior to analysis, and datasets were anonymized using unique study codes maintained in password-protected systems accessible only to authorized researchers. This study was conducted and reported in accordance with the STROCSS guidelines^[29,30] and the TRIPOD Checklist^[31].

Surgical process

Surgical intervention was indicated for patients experiencing intractable pain due to spinal instability or myelopathy resulting from spinal cord compression. A multidisciplinary team, including a neuro-radiologist, spinal tumor surgeon, and oncologist, determined the optimal surgical approach. The procedure aimed to achieve palliative decompression, partial vertebrectomy or en bloc vertebral resection, and spinal stabilization using pedicle screw instrumentation.

Under general anesthesia, the patient was carefully positioned in a prone position on the operating table. A midline incision was made at the appropriate spinal level, and the paraspinal muscles were gently separated to reveal the posterior spinal structures. To access the spinal cord and nerve roots, a posterior approach was employed, which included performing either a laminectomy or laminotomy, depending on the tumor’s location and extent. The tumor was then excised through a combination of vertebrectomy and tumor debulking. The degree of vertebrectomy was carefully determined based on the tumor’s involvement and the patient’s overall health and anatomy. After tumor removal, the vertebral defect was reconstructed using a combination of bone cement and artificial vertebral bodies to restore the spine’s structural integrity. To prevent potential deformity and ensure stability, the adjacent vertebrae were instructed using pedicle screws and rods. This instrumentation was placed to maintain proper spinal alignment during the healing process. Finally, the surgical site was closed in layers to promote optimal recovery and minimize the risk of complications.

Data collection

The selection of factors analyzed for their potential impact on SSI risk was based on a combination of prior literature review^[5,8–13] and expert recommendations^[32]. This study collected demographic information from each patient, including age, gender, smoking status, body mass index (BMI), and the number of comorbidities. Detailed tumor characteristics, such as tumor type, presence of extra-vertebral bone metastases, viscera metastases, and Eastern Cooperative Oncology Group (ECOG) score, were also recorded. In addition, data on oncological therapies, including preoperative chemotherapy, targeted therapy, and endocrinology, were documented. Targeted therapy is a precise treatment approach designed to inhibit specific molecular targets driving tumor growth. For example, in lung cancer, EGFR mutations can be treated with erlotinib or Osimertinib, ALK gene rearrangements with Crizotinib or Alectinib, and KRAS G12C mutations with Sotorasib. These therapies are more specific than traditional chemotherapy, offering fewer side effects and significantly improving outcomes for selected patients. Endocrinology treatment for malignant tumors involves using hormonal therapies to interfere with hormone production or action that drives the growth of certain cancers. This approach is most commonly applied to hormone-sensitive cancers, such as breast, prostate, and endometrial cancers. Information related to surgery, such as surgery duration, blood transfusion, and surgical segments, was also gathered. In addition, the data collection process incorporated various laboratory test results, such as albumin levels, hemoglobin concentrations, platelet counts, white blood cell counts, and glucose measurements.

Quality control

This study employed various measures to ensure data accuracy and reliability. To mitigate potential sources of bias in data collection across three distinct hospitals, we established a comprehensive data collection protocol to ensure accuracy and reliability. The research team received extensive training to minimize errors and maintain consistency with standardized guidelines. Additionally, a double-entry system was implemented for data entry, where two independent operators entered the data, followed by cross-verification to identify discrepancies. Validation checks were conducted to detect and correct inconsistencies or inaccuracies. Continuous monitoring, including periodic audits and reviews of data collection forms, allowed for early detection of issues. Constructive feedback was provided to the team to maintain high-quality standards. These stringent quality control measures ensured the validity and integrity of the research findings.

Definition of SSI

The definition of SSI (superficial or deep) followed the criteria outlined by Public Health England, which was largely based on the definitions provided by the Centers for Disease Control and Prevention and previous studies^[3,7,33]. An SSI was defined as an infection occurring in the part of the body where surgery took place, typically presenting within 30 days post-surgery. This infection could involve superficial skin layers (superficial SSI) or extend deeper into tissues or implanted materials (deep SSI). All cases met the following requirements: (1) clinical evidence of infection at the surgical site within 30 postoperative days, (2) documentation of characteristic symptoms (purulent drainage, pain/swelling/redness, or fever >38°C), and (3) either positive microbial culture (obtained in 82.0% of SSI cases) or unequivocal clinical diagnosis by the surgical team when cultures were unavailable (18.0%). For deep SSIs specifically, we required radiographic confirmation of infection (MRI/CT evidence of abscess or instrumentation involvement) in addition to clinical criteria. While metagenomic next-generation sequencing testing was not routinely performed, all culture-negative cases underwent secondary review by two independent surgeons to confirm diagnosis. Additionally, the follow-up period was determined based on the standard definition of SSI, which includes infections diagnosed within 30 days after surgery. This time frame aligns with widely accepted clinical guidelines and ensures the capture of all relevant cases for accurate analysis. All patients were monitored through post-discharge follow-up visits or telecommunication to record any signs or symptoms indicative of SSI.

Data engineering

All models utilized the same set of input features to ensure consistency and comparability. The dataset exhibited minimal missingness (<5.0% across all variables), with specific examples including surgery time (three missing values), ECOG performance status (three missing), and visceral metastases (four missing). For continuous variables like surgery time, we employed median imputation to preserve data distribution, while categorical variables (e.g. ECOG, visceral metastases) used mode imputation to retain the most frequent category. Given the relatively low incidence of SSI among patients with metastatic spinal disease, we selected SMOTETomek for its dual-phase approach that optimally addresses imbalanced data: the SMOTE component generates synthetic minority-class samples to prevent model bias toward non-SSI cases, while the Tomek Links component subsequently removes ambiguous boundary instances to improve class separation. This combined approach is particularly suited for clinical datasets in this study, as it preserves authentic data patterns while effectively mitigating overfitting – a critical advantage over simple oversampling methods when dealing with rare but clinically significant outcomes like SSI in metastatic spinal disease. In addition, when splitting the dataset into training and validation groups, the stratify = y_res parameter was applied. This method ensured that both subsets retained the same proportion of the outcome variable, thereby preserving the class distribution and improving the reliability of model evaluation. Normalization of the data was conducted using the preproc_pipeline.fit_transform method, which standardized the feature set to ensure all variables were treated equally in terms of scale.

Machine learning algorithm selection

To develop an accurate and robust predictive model, we conducted a comprehensive analysis using various advanced machine learning algorithms, including support vector machine (SVM), gradient boosting machine (GBM), K-nearest neighbor (KNN), neural network (NN), and decision tree (DT)^[34–36]. These algorithms were selected based on their complementary strengths in handling different aspects of dataset. For example, SVM was included due to its effectiveness in high-dimensional spaces and robustness to outliers, which is relevant given the multidimensional nature of clinical risk factors in metastatic spinal disease. KNN was chosen for its simplicity and ability to capture local patterns in data, while NN was employed to model complex, nonlinear relationships that may exist between predictors and SSI outcomes. DT served as an interpretable baseline model, allowing us to visualize decision pathways. Notably, we prioritized ensemble methods like GBM for several reasons^[37,38]: (1) GBM’s sequential learning approach (boosting) is particularly effective in addressing class imbalance – a key challenge in our study given the relatively low incidence of SSI compared to non-SSI cases. By iteratively focusing on misclassified samples, GBM improves predictive accuracy for rare events. (2) Our preliminary analysis revealed heterogeneous predictor effects across patient subgroups, and GBM’s ability to combine multiple weak learners makes it adept at capturing such heterogeneity. (3) The algorithm’s built-in feature importance metrics help identify dominant risk factors while maintaining robustness to irrelevant variables – a critical advantage given the numerous potential confounders in metastatic spinal disease. In addition, logistic regression (LR) model was used as a control to benchmark performance against conventional statistical approaches and validate that our machine learning models provided meaningful improvements over traditional methods. The principal and advantages of these machine learning-based algorithm are summarized in Supplemental Digital Content Table 1, available at: http://links.lww.com/JS9/E508.

Hyperparameter tuning and model optimization

Extensive grid and random hyperparameter searches were conducted to enhance the performance of each model, using the AUC as the main metric for optimization. While identifying the best hyperparameters for each model, we implemented 10-fold cross-validation on the training dataset, along with 100 iterations of train/test set shuffling to ensure robust validation results. To account for the variability in model performance, we defined wide ranges for the hyperparameters. Learning curves were utilized to identify potential overfitting and underfitting issues within the models. By varying the train/test ratios for each hyperparameter, we generated learning curves to assess the convergence of cross-validation scores, allowing for a more nuanced understanding of model behavior as training proceeds.

To optimize model hyperparameters, we utilized grid search and randomized grid search, configuring parameters such as n_iter = 100, cv = 5, and scoring = “roc_auc.” The n_iter = 100 setting enabled the evaluation of 100 hyperparameter combinations, supporting a comprehensive exploration of the search space. Broad parameter ranges were defined to ensure an exhaustive search. In detail, for the DT model, key hyperparameters were fine-tuned for optimal performance. The max_features parameter was set to “auto” or “log2,” allowing the selection of features based on dataset complexity. To control tree depth and prevent overfitting, max_depth was randomized between 1 and 50. Node splitting was managed using min_samples_split, randomized between 2 and 200, ensuring a balanced trade-off between underfitting and overfitting. Similarly, the min_samples_leaf parameter, randomized within the same range, defined the minimum samples per leaf, ensuring sufficient data for reliable predictions. These tuning efforts improved predictive accuracy and ensured robust generalization to unseen data. By systematically exploring a diverse range of hyperparameters, we aimed to strike an optimal balance between bias and variance, resulting in a model that is well-equipped to handle complex datasets effectively.

Model validation

This study utilized a range of evaluation metrics to assess the performance of the models, including AUC, accuracy, precision, recall, F1 score, Brier score^[39], log loss^[25], and precision-recall curve^[40]. To enhance robustness and reliability, the AUC was calculated using 100 bootstrap samples. Accuracy, precision, recall, and F1 score were derived from the confusion matrix, offering a comprehensive evaluation of the model’s classification capabilities^[36,41]. This multifaceted approach ensures a thorough understanding of the models’ effectiveness.

Accuracy: The proportion of correct predictions among all predictions made.

A c c u r a c y = \frac{T P + T N}{T P + F N + F P + T N}

(1)

where TP is true positives, TN is true negatives, FP is false positives, and FN is false negatives.

Precision: The ratio of true positive predictions to all predicted positives.

P r e c i s i o n = \frac{T P}{T P + F P}

(2)

Recall: The ratio of true positive predictions to all actual positives.

R e c a l l = \frac{T P}{T P + F N}

(3)

F1 Score: The harmonic mean of precision and recall, balancing both metrics.

F 1 S c o r e = \frac{2 (P r e c i s i o n \times R e c a l l)}{P r e c i s i o n + R e c a l l}

(4)

The Brier score was calculated using the formula where $N$ represents the total sample, $p_{i}$ represents the predicted risk, and $o_{i}$ represents the true probability. The Brier score measured the overall accuracy of the model’s predicted probability of experiencing SSI in this study.

B r i e r S c o r e = \frac{1}{N} \sum_{i = 1}^{n} {(p_{i} - o_{i})}^{2}

(5)

The log loss, calculated using the scikit-learn formula, evaluates the quality of classification model predictions. It considers the number of samples ( $N$ ), the number of classes ( $M$ ), the true labels ( $y_{i j}$ ), and the predicted probabilities ( $p_{i j}$ ).

L o g L o s s = - \frac{1}{N} \sum_{i = 1}^{N} \sum_{j = 1}^{M} y_{i j} \log (p_{i j})

(6)

To comprehensively evaluate the models, calibration curves were plotted based on 100 bootstrap iterations to assess their ability to align predicted probabilities with observed outcomes^[42]. Calibration curves measure the degree to which predicted risks match actual risks, indicating the reliability of a model’s predictions. A perfectly calibrated model produces a curve along the 45° diagonal. A density curve analysis was performed to examine the distribution of predicted probabilities across different risk groups, providing insights into the model’s discrimination ability and potential biases. Furthermore, decision curve analysis (DCA) was employed to quantify the clinical net benefits of each model across varying probability thresholds^[43]. DCA assesses the value of a model’s predictions in enhancing decision-making by weighing true positives against the potential harms of false positives, with a focus on patient-centered outcomes. This approach helps to determine the clinical utility of predictive models by evaluating their impact on patient care and decision-making processes. This analysis helps determine the practical utility of the model in real-world clinical settings, guiding threshold selection for optimal patient care.

Feature importance analysis

To enhance the interpretability of our models, we employed SHapley Additive exPlanations (SHAP) to quantify feature contributions^[44,45]. The trained GBM model was used to initialize a SHAP explainer. SHAP values were then computed for all samples using the optimized TreeSHAP algorithm, which efficiently calculates exact Shapley values for tree-based models. For visualization, we generated two complementary plots: (1) a beeswarm plot, which summarizes global feature importance by plotting the distribution of SHAP values for each variable across the entire test cohort, and (2) individual-level waterfall plots, which deconstruct the prediction for representative cases by sequentially adding each feature’s contribution to the base value. These visualizations were rendered using the SHAP Python library (v0.44.1) with consistent color coding (red/blue for positive/negative impact) and normalized scales to enable cross-plot comparisons.

Construction of the AI application

An AI-powered application was developed to evaluate the risk of SSI in patients undergoing decompressive surgery for metastatic spinal tumors. This tool, designed for user-friendliness, was integrated into the GitHub and deployed as an application^[46]. Users could customize input parameters, including detailed patient demographics and specific surgical variables. Once the parameters were set, the algorithm computed the probability of SSI, delivering results through an interface that not only revealed the risk of SSI but also elucidated the underlying methodology and the performance metrics of the AI system. To further refine its utility, patients were categorized into either high-risk or low-risk groups based on a predefined threshold according to the optimal model, with tailored intervention recommendations provided for each category to guide clinical decisions.

Human vs. AI predictive performance comparison

To evaluate the clinical utility of our AI model, we conducted a comparative study involving 100 patients undergoing surgery for metastatic spinal disease at one tertiary medical institution. Five board-certified spine surgeons with 10–15 years of experience independently reviewed de-identified patient profiles, including demographics, preoperative laboratory values, oncologic history, and planned surgical parameters. Clinicians were blinded to postoperative outcomes and asked to estimate each patient’s probability of developing SSI based on their clinical judgment. The same patient cohort was analyzed by our pre-trained GBM model, which automatically extracted model features and generated SSI risk scores (0–1 probability) without human intervention. Input variables matched those used in model development. ROC curves were constructed for both AI and clinician predictions, with SSI diagnosis as the ground truth. AUC values and 95% confidence intervals (CIs) were calculated. The Delong test was used to compare AUC differences between the AI and human predictors.

Statistical analysis

The summary of continuous variables utilized the mean with standard deviation for normally distributed data, and the median with interquartile range for non-normally distributed data. Categorical variables were presented as proportions. The distribution of categorical variables was compared using the Chi-square test, and continuous variables were compared using either the student’s t-test or Wilcoxon rank-sum test, depending on the data distribution. Univariate analysis was employed to identify potential model predictors based on previous studies^[46,47]. To assess the statistical robustness of our study, we employed the Power Analysis and Sample Size software (Version 11.0.10) to calculate the statistical power for each of the significant variables identified. This study undertook an in-depth examination of the model predictors through the use of a correlation matrix to investigate potential multicollinearity among the features. Pearson correlation coefficients were calculated to quantify these relationships. Furthermore, this study assessed the multicollinearity by computing the variance inflation factor (VIF) for each predictor. The regression analysis was conducted using LR to evaluate the predictive performance of the proposed models against a traditional statistical approach. The implementation and hyperparameter tuning of the machine learning algorithms were performed using Python (version 3.9.7) with scikit-learn (version 1.2.2). Statistical analyses were conducted using R (version 4.1.2), with a p-value of less than 0.05 considered statistically significant in two-sided testing.

Results

Patient characteristics overview

The median age of the model derivation cohort was 61.00 years, and the majority of patients were male (68.2%). Regarding tumor type, the most common cancer was lung cancer (25.4%), followed by renal cancer (17.9%) and prostate cancer (16.3%). Among all patients, 20.4% of patients were current smokers. A large number of patients had at least one comorbidity (48.2%), with 4.5% of patients having three or above comorbidities. The cancer burden was relatively heavy, since 43.1% patients had extravertebral bone metastasis, 23.7% patients had visceral metastases, and 50.1% of patients had an ECOG score of 3 or above. The median surgery time was 271.00 min, and the majority of patients had blood transfusion during surgery (81.0%). The incidence of SSI in surgeries for spinal metastases was 6.4% in the model derivation cohort. In detail, 35.5% of the infected patients experienced deep SSI, while 64.5% had superficial SSI. More detailed information on patient’s laboratory examination, including preoperative albumin, hemoglobin, platelet, white blood cell count, and glucose is summarized in Table 1. In this study, patients in the model derivation cohort were randomly assigned to either the training cohort (n = 388) or the validation cohort (n = 97). A comparison of clinical characteristics between the two groups revealed no significant differences (Supplemental Digital Content Table 2, available at: http://links.lww.com/JS9/E509), indicating that the two cohorts were comparable.

Table 1.

Patient’s clinical characteristics.

Characteristics	Overall
n	485
Age (years, median [IQR])	61.00 [54.00, 70.00]
Gender (male/female, %)	331/154 (68.2/31.8)
Tumor type (%)
Thyroid cancer	8 (1.6)
Prostate cancer	79 (16.3)
Breast cancer	43 (8.9)
Renal cancer	87 (17.9)
Lung cancer	123 (25.4)
Hepatocellular carcinoma	26 (5.4)
Gastrointestinal system cancer	25 (5.2)
Urogenital cancer	25 (5.2)
Others	69 (14.2)
Smoking (%)
Never	362 (74.6)
Previous	24 (4.9)
Current	99 (20.4)
BMI (kg/m², median [IQR])	23.62 [21.26, 25.88]
Number of comorbidity (%)
0	251 (51.8)
1	155 (32.0)
2	57 (11.8)
≥3	22 (4.5)
Coronary disease (no/yes, %)	456/29 (94.0/6.0)
Diabetes (no/yes, %)	411/74 (84.7/15.3)
Hypertension (no/yes, %)	330/155 (68.0/32.0)
Extravertebral bone metastasis (no/yes, %)	276/209 (56.9/43.1)
Visceral metastases (no/yes, %)	370/115 (76.3/23.7)
ECOG (%)
1	13 (2.7)
2	229 (47.2)
3	165 (34.0)
4	78 (16.1)
Preoperative chemotherapy (no/yes, %)	394/91 (81.2/18.8)
Preoperative targeted therapy (no/yes, %)	443/42 (91.3/8.7)
Preoperative endocrinological therapy (no/yes, %)	435/50 (89.7/10.3)
Surgery time (min, median [IQR])	271.00 [221.00, 320.00]
Blood transfusion (mL, %)
None	92 (19.0)
<1000	268 (55.3)
≥1000	125 (25.8)
Surgical segments (%)
1	237 (48.9)
2	152 (31.3)
≥3	96 (19.8)
Albumin (g/L, median [IQR])	40.20 [37.20, 42.80]
Hemoglobin (g/L, median [IQR])	132.00 [117.00, 143.00]
Platelet (×10⁹/L, median [IQR])	220.00 [173.00, 270.00]
White blood cell count (×10⁹/L, median [IQR])	6.40 [5.20, 8.10]
Glucose (mmol/L, median [IQR])	5.58 [5.06, 6.69]
Surgical site infection (no/yes, %)	454/31 (93.6%/6.4%)

Open in a new tab

IQR, interquartile range; ECOG, Eastern Cooperative Oncology Group.

Selection of model inputs

Before modeling, a comparison was conducted according to the presence of postoperative SSI, it demonstrated that tumor type (P < 0.001), smoking (P = 0.020), number of comorbidities (P < 0.001), diabetes (P < 0.001), visceral metastases (P = 0.025), surgery time (P = 0.005), and surgical segments (P = 0.004) were significant (Supplemental Digital Content Table 3, available at: http://links.lww.com/JS9/E510), all of which were used as model inputs. Among the seven significant variables, the analysis of statistical power revealed that six exhibited the statistical power values exceeding 0.80, which suggests a high probability of detecting true effects in the population. Furthermore, we computed the Pearson correlation coefficients among the model predictors (Supplemental Digital Content Figure 2, available at: http://links.lww.com/JS9/E507), which showed that all coefficients were below 0.80, indicating that there was no significant multicollinearity among the variables. To elaborate, the correlation coefficient between tumor type and surgery time was found to be 0.066 (P = 0.320), indicating no statistically significant correlation between the two variables. Similarly, no significant correlations were found between other variables, such as tumor type and smoking, smoking and comorbidities, or comorbidities and surgical segments. Additionally, the VIF values for each model predictor were all below the threshold of 10. These findings validate that multicollinearity was not an issue in our analysis. To ensure the stability of the input variables, we conducted an additional selection process using the LASSO method. The LASSO analysis identified the number of comorbidities, surgical time, surgical segments, tumor type, visceral metastases, and diabetes as the retained variables, with corresponding coefficients of 0.185, 0.001, 0.172, 0.113, 0.098, and 0.307, respectively. Given the strong association between smoking and infection risk^[5,12], we chose to retain smoking as a model input. This decision aligned with the clinical relevance of smoking in the context of surgical outcomes, thus resulting in the final selection of the above seven model inputs.

Model evaluation

Six different models were developed using the same model inputs, and their hyperparameters are summarized in Supplemental Digital Content Table 4, available at: http://links.lww.com/JS9/E511. In addition, we have uploaded the complete implementation code for machine learning development and validation to GitHub, along with a detailed README file explaining the code organization and usage. This study further compared six different models for predicting SSI among patients with metastatic spinal disease and revealed that the GBM model excelled across all evaluated metrics, making it the most optimal choice. The GBM model achieved the highest accuracy, precision, and recall, and maintained these scores in its F1 score as well (Fig. 1). Furthermore, it featured an exceptional AUC of 0.986 (95% CI: 0.972–1.000), indicating nearly perfect predictive capability (Fig. 2). Its Brier score of 0.033 and log loss of 0.137 were the lowest among the developed models, suggesting minimal error and uncertainty in predictions (Table 2). While other models like the NN and KNN also performed commendably, particularly in AUC and F1 score, they did not approach the comprehensive superiority of the GBM model. Probability density showed that the majority of models, particularly the GBM model and the KNN model, had favorable discrimination ability with small overlap area (Supplemental Digital Content Figure 3, available at: http://links.lww.com/JS9/E507). In addition, the precision-recall curves demonstrated that models, in particular the GBM model and the KNN model, had favorable discriminative ability (Fig. 3). In summary, the results underscored the GBM model’s efficacy and reliability as a valuable tool in clinical decision-making for this medical condition, clearly outperforming LR, SVM, NN, KNN, and DT models in every aspect considered, and thus the GBM model was considered as the optimal model. Moreover, the calibration curve showed that the GBM model had favorable calibration ability (Fig. 4), and the DCA demonstrated that the GBM model had favorable clinical net benefits across various risk thresholds (Fig. 5).

Figure 1. — Prediction performance for all machine learning models. (A) Accuracy; (B) precise; (C) recall; (D) F1 score; (E) brier score; (F) log loss. LR, logistic regression; SVM, support vector machine; NN, neural network; GBM, gradient boosting machine; KNN, k-nearest neighbor; DT, decision tree.

Figure 2. — The area under the curve analysis after applying 100 bootstraps for all machine learning models.

Table 2.

Prediction performance of the six models for predicting surgical site infection among patients with metastatic spinal disease.

Metrics	Models
Metrics	LR	SVM	NN	GBM	KNN	DT
Accuracy	0.657	0.862	0.873	0.967	0.928	0.735
Precision	0.659	0.874	0.876	0.967	0.914	0.710
Recall	0.644	0.844	0.867	0.967	0.944	0.789
F1 score	0.652	0.859	0.872	0.967	0.929	0.747
AUC (95% CI)	0.729 (0.654–0.803)	0.930 (0.896–0.965)	0.944 (0.914–0.974)	0.986 (0.972–1.000)	0.962 (0.933–0.991)	0.871 (0.822–0.919)
Brier score	0.215	0.108	0.106	0.033	0.050	0.144
Log loss	0.622	0.341	0.344	0.137	1.196	0.422

Open in a new tab

AUC, area under the curve; CI, confidence interval; LR, logistic regression; SVM, support vector machine; NN, neural network; GBM, gradient boosting machine; KNN, K-nearest neighbor; DT, decision tree.

Figure 3. — The precision-recall curve for all machine learning models.

Figure 4. — Calibration curve for all machine learning models. The calibration curve was plotted after applying 100 bootstraps.

Figure 5. — Decision curve analysis for all machine learning models. The dashed black line represents the optimal risk threshold identified by Youden’s index maximization.

External validation

External validation was performed among 182 patients, and the clinical characteristics of these patients were summarized in Supplemental Digital Content Table 5, available at: http://links.lww.com/JS9/E512. The incidence of SSI in surgeries for spinal metastases was 7.7% in the external validation cohort, showing no statistically significant difference compared to the derivation cohort (6.4%) (P = 0.672). In detail, in the external validation cohort, 42.9% of the infected patients had deep SSI, and 57.1% had superficial SSI. Subgroup analysis was also conducted according to the presence of SSI in the external validation cohort, and it also demonstrated that tumor type (P = 0.012), number of comorbidities (P = 0.001), visceral metastases (P = 0.012), surgery time (P = 0.011), and surgical segments (P = 0.019) were significant. However, smoking and diabetes were not found to be significant. In addition, this study further compared the baseline clinical characteristics between the model derivation cohort and the external validation cohort (Supplemental Digital Content Table 6, available at: http://links.lww.com/JS9/E513). The above results suggested both robustness and variability among different clinical cohorts, and validation of models in external cohort could further test the generalizability of models. In this study, external validation of the optimal model revealed an AUC value of 0.848 (95% CI: 0.806–0.890) (Supplemental Digital Content Figure 4, available at: http://links.lww.com/JS9/E507), with a precision of 0.907, the highest among all developed models (Supplemental Digital Content Table 7, available at: http://links.lww.com/JS9/E514). Other models, such as the SVM model and the NN model, also demonstrated favorable prediction performance in terms of AUC values. The precision-recall curve demonstrated that the GBM model also had favorable discriminative ability in the external validation cohort (Supplemental Digital Content Figure 5, available at: http://links.lww.com/JS9/E507). In addition, the calibration curve indicated that the model exhibited relatively favorable prediction performance in the external validation cohort (Supplemental Digital Content Figure 6, available at: http://links.lww.com/JS9/E507), suggesting favorable calibration ability. DCA confirmed the clinical net benefits of the model (Supplemental Digital Content Figure 7, available at: http://links.lww.com/JS9/E507), demonstrating its potential for improved clinical decision-making. Despite some strong performances from other models in the external validation, the GBM model remains the best overall choice due to its balanced performance across the internal and external validation. Notably, although this study observed an AUC reduction during external validation (0.986–0.848), the performance remained clinically excellent (AUC >0.8 generally indicates strong predictive utility), this variability reflects real-world differences between cohorts. These variations, compounded by institutional differences in surgical protocols or perioperative care, likely contributed to the AUC shift. The model maintained very strong prediction performance, despite cohort heterogeneity, underscoring its generalizability. This underscores its adaptability to heterogeneous populations, while highlighting the need for local calibration when implementing predictive tools.

To further validate the model’s effectiveness, this study combined the three datasets into a single cohort for comprehensive model evaluation and internal validation. The integrated analysis demonstrated excellent performance metrics: accuracy (0.976), precision (0.968), recall (0.984), F1 score (0.976), and AUC (0.994) (Supplemental Digital Content Figure 8, available at: http://links.lww.com/JS9/E507), along with strong reliability indicators including a Brier score of 0.026 and log loss of 0.100. Calibration curve analysis revealed excellent agreement between predicted probabilities and observed outcomes (Supplemental Digital Content Figure 9, available at: http://links.lww.com/JS9/E507). Clinical DCA demonstrated significant net benefit across a wide range of threshold probabilities, suggesting robust clinical utility (Supplemental Digital Content Figure 10, available at: http://links.lww.com/JS9/E507). Furthermore, precision-recall curve analysis confirmed the model’s high predictive performance in this imbalanced clinical scenario (Supplemental Digital Content Figure 11, available at: http://links.lww.com/JS9/E507). These results collectively indicate that our AI model achieves both high discriminative ability and clinical applicability for SSI risk prediction in metastatic spinal disease patients.

Sensitivity analysis

This study conducted a detailed subgroup analysis to evaluate the predictive performance of the optimal model across various patient demographics in the external validation cohort. The findings highlighted robustness in the model’s performance across diverse groups, with AUC values indicating high predictive accuracy. Specifically, the predictive accuracy remained high among patients with varying health conditions, such as diabetes, where the AUC values were 0.858 for non-diabetics and 0.816 for diabetics (Fig. 6A). Furthermore, the model performed commendably in differentiating between patients with and without visceral metastases, achieving AUC values of 0.869 and 0.870, respectively (Fig. 6B). The analysis also explored the impact of multiple comorbidities, with AUC values reported as 0.989 for patients with no comorbidities, decreasing to 0.827 as the number of comorbidities increased (Fig. 6C). Additionally, patients treated with different numbers of segments exhibited AUC values of 0.630 for one segment, 0.913 for two segments, and 0.956 for three or more segments (Fig. 6D). This study also performed sensitivity analyses to evaluate the model’s prediction performance specifically across the three hospitals, and the AUC values for the patients from these hospitals were 0.997, 0.933, and 0.848 (external validation cohort), respectively. Hence, this study confirmed that the model demonstrates excellent predictive performance across a range of patient subgroups, maintaining relatively high accuracy irrespective of treatment complexity, presence of visceral metastases, diabetic condition, or the number of comorbidities. These findings underscored the model’s adaptability and reliability in varied clinical settings, highlighting its potential utility in personalized patient care.

Figure 6. — Subgroup analysis of prediction performance. (A) Diabetes; (B) visceral metastases; (C) the number of comorbidities; (D) surgical segments.

Feature importance

Feature importance analysis revealed that the surgery time, tumor type, and the number of comorbidities were the most important three contributors to postoperative SSI among patients with metastatic spinal disease in both the training (Supplemental Digital Content Figure 12A, available at: http://links.lww.com/JS9/E507) and internal validation (Supplemental Digital Content Figure 12B, available at: http://links.lww.com/JS9/E507) cohorts. In addition, smoking, surgical segments, visceral metastases, and diabetes also played important role in affecting the postoperative SSI. In the entire model derivation cohort, the AUC value for predicting SSI was 0.652 for surgery time, 0.526 for tumor type, 0.582 for smoking, 0.673 for the number of comorbidities, 0.625 for diabetes, 0.645 for surgical segments, and 0.597 for visceral metastases (Supplemental Digital Content Figure 13, available at: http://links.lww.com/JS9/E507).

Deployment of the AI application

To enhance clinical utility, the optimal model has been deployed as an AI application (Supplemental Digital Content Figure 14, available at: http://links.lww.com/JS9/E507). The codes for developing the AI application are available at: Supplemental Digital Content Table 8, available at: http://links.lww.com/JS9/E515. At the data input portal, users could enter patient-specific data. After submitting the data, the AI application automatically calculated the probability of SSI risk and categorizes the risk levels. The risk threshold cutoff value when we determine a high versus a low risk of experiencing SSI was 40.1% based on the optimal model. The 40.1% cutoff was determined as the optimal threshold for binary classification (SSI vs. no SSI) based on the maximization of Youden’s index (sensitivity + specificity − 1) in our GBM model. This threshold was selected after rigorous evaluation of the precision-recall trade-off and clinical interpretability. A threshold near 40% balanced the need to identify high-risk patients (avoiding under-treatment) while minimizing false alarms (preventing over-intervention). In this study, the optimal risk threshold was labeled in the clinical decision curve (Fig. 5) to assess its net benefit across different risk stratifications and to guide clinical decision-making. In addition, based on these risk stratifications, it recommended different treatment strategies tailored to the patient. Additionally, the AI application generated a personalized risk report, which highlighted the risk and protective factors for individuals. This report provided insights into why a patient was classified as either the high or low risk for SSI.

Using the AI application to predict the risk of SSI

A case study is presented below: A patient diagnosed with metastatic spinal disease originating from renal cancer, who currently smokes, presented with two comorbidities and visceral metastases but did not suffer from diabetes. The patient underwent a surgical procedure involving two segments of the vertebrae, with a total duration of 241 minutes. After entering these characteristics into the system and submitting the data, the AI application calculated the patient’s risk of postoperative SSI (43.1%) and classified the patient as high risk of experiencing SSI (Supplemental Digital Content Figure 14, available at: http://links.lww.com/JS9/E507). In this case, an individualized risk report generated by the AI application provided insights into the contributions of various predictors within the model. The analysis indicated that the duration of the surgical operation served as a protective factor, while comorbidities and tumor type were identified as risk factors. This individualized risk report equips healthcare providers with essential information to develop targeted strategies aimed at reducing the incidence of SSI.

Comparative analysis of human vs. AI prediction of SSI risk

To evaluate clinical applicability, a comparative study was conducted on 100 patients with metastatic spinal disease. The AI application demonstrated superior predictive performance (AUC: 0.986, 95% CI: 0.968–1.000) compared to five independent clinicians (AUCs: 0.572 [95% CI: 0.458–0.686], 0.600 [0.488–0.712], 0.603 [0.491–0.715], 0.576 [0.454–0.698], and 0.627 [0.517–0.737]). The Delong test confirmed the AI’s significantly higher accuracy (P < 0.001, Fig. 7), underscoring its potential to outperform conventional clinical judgment in SSI risk stratification.

Figure 7. — Comparative performance of AI versus clinicians in predicting surgical site infection risk.

Discussion

Principal findings

Among the various machine learning models evaluated, the GBM model demonstrated the highest predictive performance, with the superior AUC value, accuracy, precision, recall, and F1 score. The GBM model also exhibited favorable calibration ability and high clinical net benefits across different risk thresholds. In addition, external validation demonstrated its favorable prediction performance. The AI application developed based on this model could potentially serve as a valuable tool for identifying patients at high risk of postoperative SSI, and it could provide personalized risk reports and recommend therapeutic strategies for patients with metastatic spinal disease.

Risk factors associated with SSI in metastatic spinal disease

The incidence of SSI in surgeries for spinal metastases has been reported to range from 5.1% to 14.7% in previous studies^{[3–6,8–11]}. This rate was generally higher compared to other types of spinal surgeries^[11]. In the present study, the incidence of SSI in surgeries for spinal metastases was 6.4% in the model derivation cohort and 7.7% in the external validation cohort, which was consistent with other studies^{[3–6,8–11]}. Compared to surgery for spinal metastases, other spinal surgeries carry a lower risk of SSI^[48]. This may be due to the combined effects of tumor-induced immunosuppression, frequent patient comorbidities (such as malnutrition, anemia, and prior radiation/chemotherapy)^[49], and the inherently complex nature of these procedures (including prolonged operative times, multilevel instrumentation, and extensive reconstructions). Additionally, various factors associated with SSI in patients with metastatic spinal disease have been identified. These factors could be categorized into patient-related factors, surgical factors, and systematic treatment-related factors^[5,8–13]. Patient-related factors that were associated with a higher risk of SSI include female sex^[12], smoking^[5,12], elevated BMI^[5,8], poor prognostic scores^[10], presence of diabetes^[11], poor postoperative Frankel scale scores^[10], postoperative performance status^[10], and higher ASA scores^[5]. Surgical factors have also been found to contribute to the risk of SSI. A retrospective analysis of 152 patients revealed that thoracic surgery and surgery involving a greater number of vertebral levels were significantly associated with SSI among patients undergoing surgery for spinal metastases^[9]. Another study found that cervicothoracic surgery, increased intraoperative blood loss, longer surgical time, and an increased number of fixed vertebrae were associated with an increased risk of SSI^[5]. In terms of systematic treatment-related factors, studies have shown that preoperative chemotherapy^[12], preoperative radiotherapy^[10,11], and corticosteroid use^[12] were associated with a higher incidence of wound-related complications. However, a recent study contradicted these findings and claimed that the administration of corticosteroids did not increase the risk of developing SSI in surgery for spinal metastasis^[13]. Regarding tumor stages, all patients in our study were diagnosed with metastatic spinal disease, classifying them uniformly at the M1 stage of malignancy. However, there were some variations in the sites of metastasis: 23.7% of patients had visceral metastases, while 43.1% presented with extravertebral bone metastases. This suggests a relatively heavy tumor burden within our cohort. Importantly, we observed a statistically significant difference in the incidence of SSI between patients with visceral metastases and those without. In contrast, no significant difference was found in the incidence of SSI concerning extravertebral bone metastases between the two groups. These results indicate that the presence of visceral metastases is likely associated with an increased risk of developing SSI, highlighting the need for heightened vigilance in managing patients with this specific condition. By comprehensively analyzing the above factors, healthcare professionals can more accurately assess the risk of SSI in patients with metastatic spinal disease, enabling the implementation of targeted preventive measures to reduce the incidence of such complications. In addition, much of the existing research focuses on postoperative changes in CRP, IL-6, and other related inflammatory indicators in relation to SSI among patients undergoing spine surgery^[50,51]. However, a study of 825 patients revealed that CRP levels were a risk factor for urinary tract infection but not for SSI ^[52]. Thus, the predictive role of these indicators for SSI remains to be further validated. Notably, our research aims to utilize preoperative and intraoperative indicators to predict SSI for early forecasting; therefore, postoperative indicators were not analyzed in our study.

In line with previous research^[5,11,12], our study also observed a correlation between higher medical comorbidities and an increased risk of SSI. Additionally, our study identified specific tumor types – particularly renal cancer, hepatocellular carcinoma, and gastrointestinal malignancies – as well as additional factors such as smoking, diabetes, prolonged surgical duration, visceral metastases, and a greater number of surgical segments were significantly more prevalent in patients who developed SSI. These findings highlight the critical need to address modifiable risk factors to prevent postoperative SSIs effectively. To optimize patient outcomes, it is essential to customize preoperative assessments and surgical strategies according to each patient’s distinct comorbidities and tumor characteristics, which were closely linked to the risk of postoperative SSI in our study. By adopting this tailored approach, surgeons can more accurately anticipate potential complications and implement proactive preventive measures. For instance, patients with higher comorbidity scores may require more thorough preoperative optimization and enhanced postoperative monitoring. Additionally, understanding how tumor type influences surgical strategy can guide the selection of techniques that minimize surgical duration, ultimately reducing the likelihood of SSI. By integrating these personalized strategies, we aim to improve patient outcomes and promote a more efficient recovery process.

Prediction of SSI for metastatic spinal disease

Through a comprehensive literature review, we identified multiple predictive models that have been developed for SSI across various surgical specialties. These encompass: oncological resection procedures^[53,54], neurological operations^[27], non-cardiac surgeries in type 2 diabetes patients^[55], pelvic organ prolapse surgery^[56], lower third molar surgery^[57], and elective spine surgery for lumbar disc herniation, lumbar spondylolisthesis, or lumbar spinal stenosis^{[26,28,58–62]} (Supplemental Digital Content Table 9, available at: http://links.lww.com/JS9/E516). Notably, several investigations employed machine learning algorithms for model development^{[26,28,54,57,58,62]}, with reported AUC values demonstrating considerable variation from 0.603 to 0.988 across studies. In addition, two studies utilized mobile thermal images and machine learning technology for early detection of SSI^[63,64]. A systematic review concluded that machine learning-based models showed optimal performance in detecting SSI^[65]. Notably, one study specifically focused on developing a model to predict the occurrence of wound complications in patients treated with surgery for spinal metastases^[17]. The above study included 330 patients and found that factors such as a higher Charlson comorbidity index, lower platelet count, revision surgery, and longer incision length were linked to wound infections. The model developed in the above study showed an AUC value of 0.81; however, machine-learning techniques were not utilized. To the best of our knowledge, our study is among the earliest machine learning-based models to predict the risk of SSI, particularly for patients with metastatic spinal disease. In the present study, we used five machine learning algorithms to develop models and found the GBM model had the best prediction performance. Additionally, we deployed the model as an AI application to promote clinical utility.

Despite these advancements, several barriers may hinder the clinical adoption of our AI application. To elaborate, clinician trust in AI-driven predictions remains a critical challenge. Many healthcare professionals may be hesitant to rely on machine learning models due to concerns about interpretability, transparency, and potential biases in the training data. In addition, integrating AI tools into existing clinical workflows poses logistical challenges, including compatibility with electronic health record systems, real-time data accessibility, and workflow disruptions. Seamless integration will require collaboration with healthcare IT specialists and iterative feedback from end-users to optimize usability. Conversely, facilitators such as demonstrated improvements in predictive accuracy, cost-effectiveness, and time savings could encourage adoption. Training programs and clinical validation studies may help build clinician confidence in the tool. Furthermore, legal and liability concerns surrounding AI-guided clinical decisions present a significant barrier. Ambiguities in accountability frameworks – particularly regarding whether responsibility resides with clinicians, developers, or institutions when AI recommendations lead to adverse outcomes – may deter healthcare systems from adoption. Therefore, this study recommends that the AI model should be restricted to research applications, serving solely as a reference for clinical interventions rather than constituting direct evidence for implementing clinical strategies.

This study chose to build models based on different cancer types primarily because this approach enhances the model’s generalizability, ensuring it can be applied to a broader range of cancer patients with metastasis treated with decompressive surgery. While renal cancer patients showed a higher incidence of SSI in our data, the impact of cancer type on SSI occurrence may vary across different tumor types. For example, distinct cancer types may have significant differences in tumor biology, therapeutic strategies, and postoperative recovery, all of which can influence SSI rates. Thus, by considering multiple cancer types when constructing the models, we aimed to improve the model’s applicability, enabling it to better predict the risk of SSI across various cancer patients under different treatment scenarios. Additionally, although renal cancer patients showed a higher rate of SSI, this may be linked to specific biological characteristics related to renal cancer. For instance, renal cancer is classified as a hypervascular tumor, which is particularly relevant in the context of renal spinal metastases^[66]. These metastases are highly vascularized, making surgical procedures more prone to significant bleeding during the operation. As previously demonstrated, renal cancer patients with spinal metastases tended to experience a higher rate of massive intraoperative blood loss compared to other cancer types^[25]. This increased blood loss can compromise wound healing immune function, potentially contributing to the higher incidence of SSI observed in these patients.

Individualized prediction and clinical relevance

The integration of AI in the medical field has opened new avenues for personalized patient care, particularly in predicting SSI among patients with metastatic spinal disease in our study. By employing machine learning algorithms, we analyzed extensive patient data to uncover complex patterns associated with SSI. This process could not only enhance the accuracy of risk stratification but also enable the development of individualized care plans tailored to each patient’s unique circumstances.

Based on the developed AI application in the study, patients were classified into the low-risk and high-risk groups. Patients with metastatic spinal disease who were at high risk for SSI require a comprehensive approach to management. This includes preoperative optimization of overall health and nutritional status, as well as the administration of appropriate antibiotic prophylaxis before surgery. Additionally, the surgical technique should aim to minimize invasiveness and ensure meticulous wound closure. Postoperatively, close monitoring of the surgical wound for any signs of infection is essential, and prompt intervention is necessary if SSI is suspected. High-risk patients may also benefit from additional measures such as advanced wound care and closer postoperative follow-up to mitigate the risk of SSI. For example, a case-control study compared three methods to reduce SSI in patients undergoing spine tumor surgery^[8], and it revealed that patients administrated with betadine irrigation and intrawound vancomycin powder had a significantly lower rate of infections (2.7%) as compared to the patients administered with intrawound vancomycin powder only (12.8%) and patients receiving neither (13.0%) groups^[8]. Perioperative glycemic control and antibiotics prophylaxis have shown to reduce the incidence of SSI in spine surgery^[67]. For patients with metastatic spinal disease at low risk for SSI, standard preoperative assessments and antibiotic prophylaxis per established guidelines are generally adequate. It is crucial to follow standard surgical protocols for wound preparation and closure and to conduct regular postoperative monitoring for any signs of SSI. While the risk of SSI may be lower in these patients, it is still crucial to provide appropriate care and attention to prevent any potential postoperative complications. Ultimately, both high and low-risk patients should receive individualized management measures based on their specific needs and risk factors, with the guidance of a multidisciplinary team of healthcare professionals.

In addition, the individualized risk report, in the AI application, provides valuable insights into the contributions of model predictors related to SSI for each patient, clearly distinguishing between protective and risk factors. This differentiation highlights the importance of recognizing both types of factors when formulating tailored strategies to prevent postoperative SSI. By identifying these individualized factors, healthcare providers can effectively customize preoperative and postoperative care plans; for instance, understanding that shorter surgical durations may lower the risk of infection can motivate efforts to streamline surgical procedures. Additionally, acknowledging the influence of specific comorbidities and tumor types can lead to enhanced monitoring and optimization strategies for high-risk patients. Ultimately, this personalized approach has the potential to significantly reduce the incidence of SSI, enhancing patient outcomes and facilitating a more efficient recovery process. However, the preventive recommendations provided by the AI application should be approached with caution, as these suggestions have not yet undergone rigorous validation. Thus, further clinical trials and extensive studies are necessary to validate the effectiveness of these recommendations in real-world settings. Additionally, reliance on AI-generated insights must be balanced with clinical expertise and judgment, ensuring that healthcare providers consider the unique circumstances of each patient. Continuous monitoring and evaluation of the AI application’s recommendations will be essential to refine its accuracy and reliability over time.

Notably, this study compared model derivation and external validation, and the variable distribution difference between the two groups may arise from variations between the two datasets, including sample size and potential heterogeneity in patient characteristics. However, it is important to emphasize that the majority of the model’s key features, such as tumor type, number of comorbidities, visceral metastases, surgery time, and surgical segments, remained statistically significant in the external validation cohort. Despite these differences, the model demonstrated robust discriminative ability in the external cohort, achieving an AUC of 0.848. This indicates that the model maintains strong predictive performance even when applied to a distinct population, reinforcing its generalizability and clinical utility. Rather than detracting from the model’s practical application, this variability in feature significance across cohorts highlights the model’s adaptability and validity under different clinical scenarios. The consistency in overall performance further supports its potential for broad implementation.

Limitations

Despite the promising results, there are limitations to this study. First, the study identified several influential factors for postoperative SSI among patients with metastatic spinal disease. However, these variables may not comprehensively cover all potential influencing factors. Additional confounding factors – including intraoperative blood loss and perioperative glucose management – may influence postoperative SSI risk but were not analyzed in this study. Future investigations should incorporate these variables through expanded data collection, dynamic modeling, or enhanced physiological monitoring to further improve the model’s predictive accuracy. In addition, we acknowledged that adding more intraoperative factors might further enhance the predictive performance of our model, it is also essential to consider that an increase in model parameters may complicate clinical data collection, ultimately affecting the model’s practicality in real-world settings. Second, the model is specifically designed for assessing the risk of postoperative SSI in patients with metastatic spinal disease. Its applicability to other types of surgeries or diseases, as well as different SSI risk assessments, may necessitate further validation and adjustments. Third, despite the AI platform’s ability to provide personalized postoperative infection risk reports and recommended treatment strategies, clinicians need to be aware that the model serves as an adjunctive decision-making tool and should not entirely replace the clinical judgment of healthcare professionals. Moreover, additional clinical trials and comprehensive studies are crucial to confirm the applicability of these recommendations in practical, real-world scenarios. Additionally, clinicians may need additional time and training to understand and trust the predictive results of the model. Fourth, variations in protocols, patient demographics, and surgical practices can significantly impact the model’s predictive performance. Therefore, it is essential to conduct further validation in diverse clinical settings to confirm the model’s generalizability across heterogeneous populations and varying surgical practices. Lastly, while our study adhered to the standard 30-day SSI surveillance window, we recognize that implant-associated infections may manifest later (e.g. up to 90 days postoperatively). This temporal constraint may underestimate the true SSI incidence in such cases. Future studies should incorporate extended follow-up durations and stratify analyses by infection depth (superficial, deep, or organ space) to refine risk assessment for specific surgical contexts. This will help ensure that the model remains robust and reliable in various healthcare environments. In addition, should this research expand internationally, investigators should proactively address the above-mentioned factors through rigorous multicenter validation studies that account for regional variations in clinical protocols, healthcare infrastructure, and patient demographics.

Conclusions

The AI application shows promise as an effective tool for identifying patients at increased risk of SSI, particularly among those with metastatic spinal disease. It enables personalized treatment plans based on risk assessments and detailed patient profiles. Utilizing this AI application can potentially enhance patient outcomes and propel advancements in surgical medicine for individuals with complex health conditions like metastatic spinal disease. Future research should focus on integrating this predictive tool into clinical practice and exploring its applicability across diverse patient populations.

Footnotes

Sponsorships or competing interests that may be relevant to content are disclosed at the end of this article.

Y.C. and X.S. contributed equally to this work.

Supplemental Digital Content is available for this article. Direct URL citations are provided in the HTML and PDF versions of this article on the journal's website, www.lww.com/international-journal-of-surgery.

Published online 27 June 2025

Contributor Information

Yunpeng Cui, Email: luful1987@163.com.

Xuedong Shi, Email: xuedongs@hotmail.com.

Qiwei Wang, Email: wqw19941017@163.com.

Yong Qin, Email: qinyong0125@126.com.

Xiongwei Zhao, Email: zxw1278531342@163.com.

Xiaotong Che, Email: nursexiaotong@163.com.

Shengjie Wang, Email: wsjcsusjtu@163.com.

Yuanxing Pan, Email: stcyhmpyx@163.com.

Bing Wang, Email: icerain2008@163.com.

Yuncen Cao, Email: 18515365972@139.com.

Yaosheng Liu, Email: liuyaosheng@301hospital.com.cn.

Mingxing Lei, Email: leimingxing2@sina.com.

Ethical approval

All analyses of human data conducted in this study were approved by our hospitals and in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Consent

All participants provided informed consent prior to their inclusion in the study. Participants consented to the use of their data for research and publication.

Sources of funding

This study was funded by the Beijing Natural Science Foundation (No.7254420), Beijing Natural Science Foundation-Haidian Original Innovation Joint Fund (No. L232064), and Hainan Clinical Medical Research Center.

Author contributions

All authors contributed to the study design, conducted the data collection and analyses, and drafted the paper. All authors have read and approved the manuscript.

Conflicts of interest disclosure

The authors declare that there are no conflicts of interest.

Research registration unique identifying number (UIN)

ChiCTR: Chinese Clinical Trial Registry, ChiCTR-POC-16008393 and ChiCTR2300075792, https://www.chictr.org.cn/showproj.html?proj=14138 and https://www.chictr.org.cn/showproj.html?proj=206957.

Guarantor

Xuedong Shi, Yaosheng Liu, and Mingxing Lei.

Provenance and peer review

Not commissioned, externally peer-reviewed.

Data availability statement

The datasets of the current study are available under reasonable request.

References

[1].Furlan JC, Wilson JR, Massicotte EM, Sahgal A, Fehlings MG. Recent advances and new discoveries in the pipeline of the treatment of primary spinal tumors and spinal metastases: a scoping review of registered clinical studies from 2000 to 2020. Neuro Oncol 2022;24:1–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
[2].Dhamija B, Batheja D, Balain BS. A systematic review of MIS and open decompression surgery for spinal metastases in the last two decades. J Clin Orthopaedics Trauma 2021;22:101596. [DOI] [PMC free article] [PubMed] [Google Scholar]
[3].Atkinson RA, Jones A, Ousey K, Stephenson J. Management and cost of surgical site infection in patients undergoing surgery for spinal metastasis. J Hosp Infect 2017;95:148–53. [DOI] [PubMed] [Google Scholar]
[4].Quraishi NA, Ahmed MS, Arealis G, Boszczyk BM, Edwards KL. Does surgical site infection influence neurological outcome and survival in patients undergoing surgery for metastatic spinal cord compression? Eur Spine J 2019;28:792–97. [DOI] [PubMed] [Google Scholar]
[5].Sebaaly A, Shedid D, Boubez G, et al. Surgical site infection in spinal metastasis: incidence and risk factors. Spine J 2018;18:1382–87. [DOI] [PubMed] [Google Scholar]
[6].Tarawneh AM, Pasku D, Quraishi NA. Surgical complications and re-operation rates in spinal metastases surgery: a systematic review. Eur Spine J 2021;30:2791–99. [DOI] [PubMed] [Google Scholar]
[7].Atkinson RA, Davies B, Jones A, van Popta D, Ousey K, Stephenson J. Survival of patients undergoing surgery for metastatic spinal tumours and the impact of surgical site infection. J Hosp Infect 2016;94:80–85. [DOI] [PubMed] [Google Scholar]
[8].Mesfin A, Baldwin A, Bernstein DN, et al. Reducing surgical site infections in spine tumor surgery: a comparison of three methods. Spine (Phila Pa 1976) 2019;44:E1428–E1435. [DOI] [PubMed] [Google Scholar]
[9].Atkinson RA, Stephenson J, Jones A, Ousey KJ. An assessment of key risk factors for surgical site infection in patients undergoing surgery for spinal metastases. J Wound Care 2016;25:S30–34. [DOI] [PubMed] [Google Scholar]
[10].Sugita S, Hozumi T, Yamakawa K, Goto T, Kondo T. Risk factors for surgical site infection after posterior fixation surgery and intraoperative radiotherapy for spinal metastases. Eur Spine J 2016;25:1034–38. [DOI] [PubMed] [Google Scholar]
[11].Demura S, Kawahara N, Murakami H, et al. Surgical site infection in spinal metastasis: risk factors and countermeasures. Spine (Phila Pa 1976) 2009;34:635–39. [DOI] [PubMed] [Google Scholar]
[12].Schilling AT, Ehresman J, Huq S, et al. Risk factors for wound-related complications after surgery for primary and metastatic spine tumors: a systematic review and meta-analysis. World Neurosurg 2020;141:467–478.e463. [DOI] [PubMed] [Google Scholar]
[13].Kumar S, van Popta D, Rodrigues-Pinto R, et al. Risk factors for wound infection in surgery for spinal metastasis. Eur Spine J 2015;24:528–32. [DOI] [PubMed] [Google Scholar]
[14].Chen L, Liu C, Ye Z, et al. Predicting surgical site infection risk after spinal tuberculosis surgery: development and validation of a nomogram. Surg Infect (Larchmt) 2022;23:564–75. [DOI] [PMC free article] [PubMed] [Google Scholar]
[15].Zong K, Peng D, Jiang P, et al. Derivation and validation of a novel preoperative risk prediction model for surgical site infection in pancreaticoduodenectomy and comparison of preoperative antibiotics with different risk stratifications in retrospective cohort. J Hosp Infect 2023;139:228–37. [DOI] [PubMed] [Google Scholar]
[16].Dwyer DB, Falkai P, Koutsouleris N. Machine learning approaches for clinical psychology and psychiatry. Annu Rev Clin Psychol 2018;14:91–118. [DOI] [PubMed] [Google Scholar]
[17].Hersh AM, Feghali J, Hung B, et al. A web-based calculator for predicting the occurrence of wound complications, wound infection, and unplanned reoperation for wound complications in patients undergoing surgery for spinal metastases. World Neurosurg 2021;155:e218–e228. [DOI] [PubMed] [Google Scholar]
[18].Ryvlin J, Kim SW, De la Garza Ramos R, et al. External validation of an online wound infection and wound reoperation risk calculator after metastatic spinal tumor surgery. World Neurosurg 2024;185:e351–e356. [DOI] [PubMed] [Google Scholar]
[19].Han SS, Azad TD, Suarez PA, Ratliff JK. A machine learning approach for predictive models of adverse events following spine surgery. Spine J 2019;19:1772–81. [DOI] [PubMed] [Google Scholar]
[20].Fatima N, Zheng H, Massaad E, Hadzipasic M, Shankar GM, Shin JH. Development and validation of machine learning algorithms for predicting adverse events after surgery for lumbar degenerative spondylolisthesis. World Neurosurg 2020;140:627–41. [DOI] [PubMed] [Google Scholar]
[21].Katiyar P, Chase H, Lenke LG, Weidenbaum M, Sardar ZM. Using machine learning (ML) models to predict risk of venous thromboembolism (VTE) following spine surgery. Clin Spine Surg 2023;36:E453–E456. [DOI] [PMC free article] [PubMed] [Google Scholar]
[22].Yang JJ, Chen CW, Fourman MS, et al. International external validation of the SORG machine learning algorithms for predicting 90-day and one-year survival of patients with spine metastases using a Taiwanese cohort. Spine J 2021;21:1670–78. [DOI] [PubMed] [Google Scholar]
[23].Karhade AV, Thio Q, Ogink PT, et al. Development of machine learning algorithms for prediction of 30-day mortality after surgery for spinal metastasis. Neurosurgery 2019;85:E83–E91. [DOI] [PubMed] [Google Scholar]
[24].Gao L, Cao Y, Cao X, et al. Machine learning-based algorithms to predict severe psychological distress among cancer patients with spinal metastatic disease. Spine J 2023;S1529-9430:00199–00197. [DOI] [PubMed] [Google Scholar]
[25].Shi X, Cui Y, Wang S, Pan Y, Wang B, Lei M. Development and validation of a web-based artificial intelligence prediction model to assess massive intraoperative blood loss for metastatic spinal disease using machine learning techniques. Spine J 2024;24:146–60. [DOI] [PubMed] [Google Scholar]
[26].Hopkins BS, Mazmudar A, Driscoll C, et al. Using artificial intelligence (AI) to predict postoperative surgical site infection: a retrospective cohort of 4046 posterior spinal fusions. Clin Neurol Neurosurg 2020;192:105718. [DOI] [PubMed] [Google Scholar]
[27].Tunthanathip T, Sae-Heng S, Oearsakul T, Sakarunchai I, Kaewborisutsakul A, Taweesomboonyat C. Machine learning applications for the prediction of surgical site infection in neurological operations. Neurosurg Focus 2019;47:E7. [DOI] [PubMed] [Google Scholar]
[28].Chen T, Liu C, Zhang Z, et al. Using machine learning to predict surgical site infection after lumbar spine surgery. Infect Drug Resist 2023;16:5197–207. [DOI] [PMC free article] [PubMed] [Google Scholar]
[29].Rashid R, Sohrabi C, Kerwan A, et al. The STROCSS 2024 guideline: strengthening the reporting of cohort, cross-sectional, and case-control studies in surgery. Int J Surg 2024;110:3151–65. [DOI] [PMC free article] [PubMed] [Google Scholar]
[30].Agha RA, Mathew G, Rashid R, et al. Revised Strengthening the Reporting of Cohort, Cross-Sectional and Case-Control Studies in Surgery (STROCSS) Guideline: an update for the age of Artificial Intelligence. Premier Journal of Science 2025;10:100081. [Google Scholar]
[31].Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent Reporting of a multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD). Ann Intern Med 2015;162:735–36. [DOI] [PubMed] [Google Scholar]
[32].Choi D, Crockard A, Bunger C, et al. Review of metastatic spine tumour classification and indications for surgery: the consensus statement of the global spine tumour study group. Eur Spine J 2010;19:215–22. [DOI] [PMC free article] [PubMed] [Google Scholar]
[33].Horan TC, Gaynes RP, Martone WJ, Jarvis WR, Emori TG. CDC definitions of nosocomial surgical site infections, 1992: a modification of CDC definitions of surgical wound infections. Am J Infect Control 1992;20:271–74. [DOI] [PubMed] [Google Scholar]
[34].Cui Y, Shi X, Wang S, et al. Machine learning approaches for prediction of early death among lung cancer patients with bone metastases using routine clinical characteristics: an analysis of 19,887 patients. Front Public Health 2022;10:1019168. [DOI] [PMC free article] [PubMed] [Google Scholar]
[35].Lei M, Wu B, Zhang Z, et al. A web-based calculator to predict early death among patients with bone metastasis using machine learning techniques: development and validation study. J Med Internet Res 2023;25:e47590. [DOI] [PMC free article] [PubMed] [Google Scholar]
[36].Xiong F, Cao X, Shi X, Long Z, Liu Y, Lei M. A machine learning-based model to predict early death among bone metastatic breast cancer patients: a large cohort of 16,189 patients. Front Cell Dev Biol 2022;10:1059597. [DOI] [PMC free article] [PubMed] [Google Scholar]
[37].Kigka VI, Sakellarios AI, Georga EI, et al. Site specific prediction of PCI stenting based on imaging and biomechanics data using gradient boosting tree ensembles. Annu Int Conf IEEE Eng Med Biol Soc 2020;2020:2812–15. [DOI] [PubMed] [Google Scholar]
[38].Ahn JM, Kim J, Kim K. Ensemble machine learning of gradient boosting (XGBoost, LightGBM, CatBoost) and Attention-Based CNN-LSTM for harmful algal blooms forecasting. Toxins (Basel) 2023;15:608. [DOI] [PMC free article] [PubMed] [Google Scholar]
[39].Nezic DG. Assessing the performance of risk prediction models. Eur J Cardiothorac Surg 2020;58:401. [DOI] [PubMed] [Google Scholar]
[40].Cabot JH, Ross EG. Evaluating prediction model performance. Surgery 2023;174:723–26. [DOI] [PMC free article] [PubMed] [Google Scholar]
[41].Cui Y, Wang Q, Shi X, Ye Q, Lei M, Wang B. Development of a web-based calculator to predict three-month mortality among patients with bone metastases from cancer of unknown primary: an internally and externally validated study using machine-learning techniques. Front Oncol 2022;12:1095059. [DOI] [PMC free article] [PubMed] [Google Scholar]
[42].Huang Y, Li W, Macheret F, Gabriel RA, Ohno-Machado L. A tutorial on calibration measurements and calibration models for clinical prediction models. J Am Med Inform Assoc 2020;27:621–33. [DOI] [PMC free article] [PubMed] [Google Scholar]
[43].Van Calster B, Wynants L, Verbeek JFM, et al. Reporting and interpreting decision curve analysis: a guide for investigators. Eur Urol 2018;74:796–804. [DOI] [PMC free article] [PubMed] [Google Scholar]
[44].Zhu C, Xu Z, Gu Y, et al. Prediction of post-stroke urinary tract infection risk in immobile patients using machine learning: an observational cohort study. J Hosp Infect 2022;122:96–107. [DOI] [PubMed] [Google Scholar]
[45].Yi M, Cao Y, Wang L, et al. Prediction of medical disputes between health care workers and patients in terms of hospital legal construction using machine learning techniques: externally validated cross-sectional study. J Med Internet Res 2023;25:e46854. [DOI] [PMC free article] [PubMed] [Google Scholar]
[46].Cui Y, Shi X, Qin Y, et al. Establishment and validation of an interactive artificial intelligence platform to predict postoperative ambulatory status for patients with metastatic spinal disease: a multicenter analysis. Int J Surg 2024;110:2738–56. [DOI] [PMC free article] [PubMed] [Google Scholar]
[47].Sun B, Lei M, Wang L, et al. Prediction of sepsis among patients with major trauma using artificial intelligence: a multicenter validated cohort study. Int J Surg 2025;111:467–80. [DOI] [PMC free article] [PubMed] [Google Scholar]
[48].Zhou J, Wang R, Huo X, Xiong W, Kang L, Xue Y. Incidence of surgical site infection after spine surgery: a systematic review and meta-analysis. Spine (Phila Pa 1976) 2020;45:208–16. [DOI] [PubMed] [Google Scholar]
[49].Tsantes AG, Papadopoulos DV, Lytras T, et al. Association of malnutrition with surgical site infection following spinal surgery: systematic review and meta-analysis. J Hosp Infect 2020;104:111–19. [DOI] [PubMed] [Google Scholar]
[50].Shen CJ, Miao T, Wang ZF, et al. Predictive value of post-operative neutrophil/lymphocyte count ratio for surgical site infection in patients following posterior lumbar spinal surgery. Int Immunopharmacol 2019;74:105705. [DOI] [PubMed] [Google Scholar]
[51].Lenski M, Tonn JC, Siller S. Interleukin-6 as inflammatory marker of surgical site infection following spinal surgery. Acta Neurochir (Wien) 2021;163:1583–92. [DOI] [PubMed] [Google Scholar]
[52].Tominaga H, Setoguchi T, Ishidou Y, Nagano S, Yamamoto T, Komiya S. Risk factors for surgical site infection and urinary tract infection after spine surgery. Eur Spine J 2016;25:3908–15. [DOI] [PubMed] [Google Scholar]
[53].Wang Y, Shi Y, Wang L, et al. Risk prediction model for surgical site infection in patients with gastrointestinal cancer: a systematic review and meta-analysis. World J Surg Oncol 2025;23:72. [DOI] [PMC free article] [PubMed] [Google Scholar]
[54].Ohno Y, Mazaki J, Udo R, et al. Preliminary evaluation of a novel Artificial Intelligence-based prediction model for surgical site infection in colon cancer. Cancer Diagnosis & Prognosis 2022;2:691–96. [DOI] [PMC free article] [PubMed] [Google Scholar]
[55].Koshizaka M, Ishibashi R, Maeda Y, et al. Predictive model and risk engine web application for surgical site infection risk in perioperative patients with type 2 diabetes. Diabetol Int 2022;13:657–64. [DOI] [PMC free article] [PubMed] [Google Scholar]
[56].Sheyn D, Gregory WT, Osazuwa-Peters O, Jelovsek JE. Development and validation of a model for predicting surgical site infection after pelvic organ prolapse surgery. Urogynecology (Phila) 2022;28:658–66. [DOI] [PMC free article] [PubMed] [Google Scholar]
[57].Yamagami A, Narumi K, Saito Y, et al. Development of a risk prediction model for surgical site infection after lower third molar surgery. Oral Dis 2024;30:3202–11. [DOI] [PubMed] [Google Scholar]
[58].Zhang Q, Chen G, Zhu Q, et al. Construct validation of machine learning for accurately predicting the risk of postoperative surgical site infection following spine surgery. J Hosp Infect 2024;146:232–41. [DOI] [PubMed] [Google Scholar]
[59].Stewart KE, Terada R, Windrix C, et al. Trends and prediction of surgical site infection after elective spine surgery: an analysis of the American College of Surgeons National Surgical Quality Improvement Project Database. Surg Infect (Larchmt) 2023;24:506–13. [DOI] [PubMed] [Google Scholar]
[60].Qiu HY, Liu DM, Sun FL, et al. Development and validation of a clinical nomogram prediction model for surgical site infection following lumbar disc herniation surgery. Sci Rep 2024;14:26910. [DOI] [PMC free article] [PubMed] [Google Scholar]
[61].Liu H, Zhang W, Zhang Y, Zhang S, Jin G, Li X. Establishment and validation of a nomogram model for postoperative surgical site infection after transforaminal lumbar interbody fusion: a retrospective observational study. Surgery 2023;174:1220–26. [DOI] [PubMed] [Google Scholar]
[62].Xiong C, Zhao R, Xu J, et al. Construct and validate a predictive model for surgical site infection after posterior lumbar interbody fusion based on machine learning algorithm. Comput Math Methods Med 2022;2022:2697841. [DOI] [PMC free article] [PubMed] [Google Scholar]
[63].Prey BJ, Colburn ZT, Williams JM, et al. The use of mobile thermal imaging and machine learning technology for the detection of early surgical site infections. Am J Surg 2024;231:60–4. [DOI] [PubMed] [Google Scholar]
[64].Fletcher RR, Olubeko O, Sonthalia H, et al. Application of machine learning to prediction of surgical site infection. Annu Int Conf IEEE Eng Med Biol Soc 2019;2019:2234–37. [DOI] [PubMed] [Google Scholar]
[65].Wu G, Khair S, Yang F, et al. Performance of machine learning algorithms for surgical site infection case detection and prediction: a systematic review and meta-analysis. Ann Med Surg Lond 2022;84:104956. [DOI] [PMC free article] [PubMed] [Google Scholar]
[66].Saha A, Peck KK, Lis E, Holodny AI, Yamada Y, Karimi S. Magnetic resonance perfusion characteristics of hypervascular renal and hypovascular prostate spinal metastases: clinical utilities and implications. Spine (Phila Pa 1976) 2014;39:E1433–E1440. [DOI] [PMC free article] [PubMed] [Google Scholar]
[67].Swann MC, Hoes KS, Aoun SG, McDonagh DL. Postoperative complications of spine surgery. Best Pract Res Clin Anaesthesiol 2016;30:103–20. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The datasets of the current study are available under reasonable request.

[R1] [1].Furlan JC, Wilson JR, Massicotte EM, Sahgal A, Fehlings MG. Recent advances and new discoveries in the pipeline of the treatment of primary spinal tumors and spinal metastases: a scoping review of registered clinical studies from 2000 to 2020. Neuro Oncol 2022;24:1–13. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] [2].Dhamija B, Batheja D, Balain BS. A systematic review of MIS and open decompression surgery for spinal metastases in the last two decades. J Clin Orthopaedics Trauma 2021;22:101596. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R3] [3].Atkinson RA, Jones A, Ousey K, Stephenson J. Management and cost of surgical site infection in patients undergoing surgery for spinal metastasis. J Hosp Infect 2017;95:148–53. [DOI] [PubMed] [Google Scholar]

[R4] [4].Quraishi NA, Ahmed MS, Arealis G, Boszczyk BM, Edwards KL. Does surgical site infection influence neurological outcome and survival in patients undergoing surgery for metastatic spinal cord compression? Eur Spine J 2019;28:792–97. [DOI] [PubMed] [Google Scholar]

[R5] [5].Sebaaly A, Shedid D, Boubez G, et al. Surgical site infection in spinal metastasis: incidence and risk factors. Spine J 2018;18:1382–87. [DOI] [PubMed] [Google Scholar]

[R6] [6].Tarawneh AM, Pasku D, Quraishi NA. Surgical complications and re-operation rates in spinal metastases surgery: a systematic review. Eur Spine J 2021;30:2791–99. [DOI] [PubMed] [Google Scholar]

[R7] [7].Atkinson RA, Davies B, Jones A, van Popta D, Ousey K, Stephenson J. Survival of patients undergoing surgery for metastatic spinal tumours and the impact of surgical site infection. J Hosp Infect 2016;94:80–85. [DOI] [PubMed] [Google Scholar]

[R8] [8].Mesfin A, Baldwin A, Bernstein DN, et al. Reducing surgical site infections in spine tumor surgery: a comparison of three methods. Spine (Phila Pa 1976) 2019;44:E1428–E1435. [DOI] [PubMed] [Google Scholar]

[R9] [9].Atkinson RA, Stephenson J, Jones A, Ousey KJ. An assessment of key risk factors for surgical site infection in patients undergoing surgery for spinal metastases. J Wound Care 2016;25:S30–34. [DOI] [PubMed] [Google Scholar]

[R10] [10].Sugita S, Hozumi T, Yamakawa K, Goto T, Kondo T. Risk factors for surgical site infection after posterior fixation surgery and intraoperative radiotherapy for spinal metastases. Eur Spine J 2016;25:1034–38. [DOI] [PubMed] [Google Scholar]

[R11] [11].Demura S, Kawahara N, Murakami H, et al. Surgical site infection in spinal metastasis: risk factors and countermeasures. Spine (Phila Pa 1976) 2009;34:635–39. [DOI] [PubMed] [Google Scholar]

[R12] [12].Schilling AT, Ehresman J, Huq S, et al. Risk factors for wound-related complications after surgery for primary and metastatic spine tumors: a systematic review and meta-analysis. World Neurosurg 2020;141:467–478.e463. [DOI] [PubMed] [Google Scholar]

[R13] [13].Kumar S, van Popta D, Rodrigues-Pinto R, et al. Risk factors for wound infection in surgery for spinal metastasis. Eur Spine J 2015;24:528–32. [DOI] [PubMed] [Google Scholar]

[R14] [14].Chen L, Liu C, Ye Z, et al. Predicting surgical site infection risk after spinal tuberculosis surgery: development and validation of a nomogram. Surg Infect (Larchmt) 2022;23:564–75. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R15] [15].Zong K, Peng D, Jiang P, et al. Derivation and validation of a novel preoperative risk prediction model for surgical site infection in pancreaticoduodenectomy and comparison of preoperative antibiotics with different risk stratifications in retrospective cohort. J Hosp Infect 2023;139:228–37. [DOI] [PubMed] [Google Scholar]

[R16] [16].Dwyer DB, Falkai P, Koutsouleris N. Machine learning approaches for clinical psychology and psychiatry. Annu Rev Clin Psychol 2018;14:91–118. [DOI] [PubMed] [Google Scholar]

[R17] [17].Hersh AM, Feghali J, Hung B, et al. A web-based calculator for predicting the occurrence of wound complications, wound infection, and unplanned reoperation for wound complications in patients undergoing surgery for spinal metastases. World Neurosurg 2021;155:e218–e228. [DOI] [PubMed] [Google Scholar]

[R18] [18].Ryvlin J, Kim SW, De la Garza Ramos R, et al. External validation of an online wound infection and wound reoperation risk calculator after metastatic spinal tumor surgery. World Neurosurg 2024;185:e351–e356. [DOI] [PubMed] [Google Scholar]

[R19] [19].Han SS, Azad TD, Suarez PA, Ratliff JK. A machine learning approach for predictive models of adverse events following spine surgery. Spine J 2019;19:1772–81. [DOI] [PubMed] [Google Scholar]

[R20] [20].Fatima N, Zheng H, Massaad E, Hadzipasic M, Shankar GM, Shin JH. Development and validation of machine learning algorithms for predicting adverse events after surgery for lumbar degenerative spondylolisthesis. World Neurosurg 2020;140:627–41. [DOI] [PubMed] [Google Scholar]

[R21] [21].Katiyar P, Chase H, Lenke LG, Weidenbaum M, Sardar ZM. Using machine learning (ML) models to predict risk of venous thromboembolism (VTE) following spine surgery. Clin Spine Surg 2023;36:E453–E456. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R22] [22].Yang JJ, Chen CW, Fourman MS, et al. International external validation of the SORG machine learning algorithms for predicting 90-day and one-year survival of patients with spine metastases using a Taiwanese cohort. Spine J 2021;21:1670–78. [DOI] [PubMed] [Google Scholar]

[R23] [23].Karhade AV, Thio Q, Ogink PT, et al. Development of machine learning algorithms for prediction of 30-day mortality after surgery for spinal metastasis. Neurosurgery 2019;85:E83–E91. [DOI] [PubMed] [Google Scholar]

[R24] [24].Gao L, Cao Y, Cao X, et al. Machine learning-based algorithms to predict severe psychological distress among cancer patients with spinal metastatic disease. Spine J 2023;S1529-9430:00199–00197. [DOI] [PubMed] [Google Scholar]

[R25] [25].Shi X, Cui Y, Wang S, Pan Y, Wang B, Lei M. Development and validation of a web-based artificial intelligence prediction model to assess massive intraoperative blood loss for metastatic spinal disease using machine learning techniques. Spine J 2024;24:146–60. [DOI] [PubMed] [Google Scholar]

[R26] [26].Hopkins BS, Mazmudar A, Driscoll C, et al. Using artificial intelligence (AI) to predict postoperative surgical site infection: a retrospective cohort of 4046 posterior spinal fusions. Clin Neurol Neurosurg 2020;192:105718. [DOI] [PubMed] [Google Scholar]

[R27] [27].Tunthanathip T, Sae-Heng S, Oearsakul T, Sakarunchai I, Kaewborisutsakul A, Taweesomboonyat C. Machine learning applications for the prediction of surgical site infection in neurological operations. Neurosurg Focus 2019;47:E7. [DOI] [PubMed] [Google Scholar]

[R28] [28].Chen T, Liu C, Zhang Z, et al. Using machine learning to predict surgical site infection after lumbar spine surgery. Infect Drug Resist 2023;16:5197–207. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R29] [29].Rashid R, Sohrabi C, Kerwan A, et al. The STROCSS 2024 guideline: strengthening the reporting of cohort, cross-sectional, and case-control studies in surgery. Int J Surg 2024;110:3151–65. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R30] [30].Agha RA, Mathew G, Rashid R, et al. Revised Strengthening the Reporting of Cohort, Cross-Sectional and Case-Control Studies in Surgery (STROCSS) Guideline: an update for the age of Artificial Intelligence. Premier Journal of Science 2025;10:100081. [Google Scholar]

[R31] [31].Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent Reporting of a multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD). Ann Intern Med 2015;162:735–36. [DOI] [PubMed] [Google Scholar]

[R32] [32].Choi D, Crockard A, Bunger C, et al. Review of metastatic spine tumour classification and indications for surgery: the consensus statement of the global spine tumour study group. Eur Spine J 2010;19:215–22. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R33] [33].Horan TC, Gaynes RP, Martone WJ, Jarvis WR, Emori TG. CDC definitions of nosocomial surgical site infections, 1992: a modification of CDC definitions of surgical wound infections. Am J Infect Control 1992;20:271–74. [DOI] [PubMed] [Google Scholar]

[R34] [34].Cui Y, Shi X, Wang S, et al. Machine learning approaches for prediction of early death among lung cancer patients with bone metastases using routine clinical characteristics: an analysis of 19,887 patients. Front Public Health 2022;10:1019168. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R35] [35].Lei M, Wu B, Zhang Z, et al. A web-based calculator to predict early death among patients with bone metastasis using machine learning techniques: development and validation study. J Med Internet Res 2023;25:e47590. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R36] [36].Xiong F, Cao X, Shi X, Long Z, Liu Y, Lei M. A machine learning-based model to predict early death among bone metastatic breast cancer patients: a large cohort of 16,189 patients. Front Cell Dev Biol 2022;10:1059597. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R37] [37].Kigka VI, Sakellarios AI, Georga EI, et al. Site specific prediction of PCI stenting based on imaging and biomechanics data using gradient boosting tree ensembles. Annu Int Conf IEEE Eng Med Biol Soc 2020;2020:2812–15. [DOI] [PubMed] [Google Scholar]

[R38] [38].Ahn JM, Kim J, Kim K. Ensemble machine learning of gradient boosting (XGBoost, LightGBM, CatBoost) and Attention-Based CNN-LSTM for harmful algal blooms forecasting. Toxins (Basel) 2023;15:608. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R39] [39].Nezic DG. Assessing the performance of risk prediction models. Eur J Cardiothorac Surg 2020;58:401. [DOI] [PubMed] [Google Scholar]

[R40] [40].Cabot JH, Ross EG. Evaluating prediction model performance. Surgery 2023;174:723–26. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R41] [41].Cui Y, Wang Q, Shi X, Ye Q, Lei M, Wang B. Development of a web-based calculator to predict three-month mortality among patients with bone metastases from cancer of unknown primary: an internally and externally validated study using machine-learning techniques. Front Oncol 2022;12:1095059. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R42] [42].Huang Y, Li W, Macheret F, Gabriel RA, Ohno-Machado L. A tutorial on calibration measurements and calibration models for clinical prediction models. J Am Med Inform Assoc 2020;27:621–33. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R43] [43].Van Calster B, Wynants L, Verbeek JFM, et al. Reporting and interpreting decision curve analysis: a guide for investigators. Eur Urol 2018;74:796–804. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R44] [44].Zhu C, Xu Z, Gu Y, et al. Prediction of post-stroke urinary tract infection risk in immobile patients using machine learning: an observational cohort study. J Hosp Infect 2022;122:96–107. [DOI] [PubMed] [Google Scholar]

[R45] [45].Yi M, Cao Y, Wang L, et al. Prediction of medical disputes between health care workers and patients in terms of hospital legal construction using machine learning techniques: externally validated cross-sectional study. J Med Internet Res 2023;25:e46854. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R46] [46].Cui Y, Shi X, Qin Y, et al. Establishment and validation of an interactive artificial intelligence platform to predict postoperative ambulatory status for patients with metastatic spinal disease: a multicenter analysis. Int J Surg 2024;110:2738–56. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R47] [47].Sun B, Lei M, Wang L, et al. Prediction of sepsis among patients with major trauma using artificial intelligence: a multicenter validated cohort study. Int J Surg 2025;111:467–80. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R48] [48].Zhou J, Wang R, Huo X, Xiong W, Kang L, Xue Y. Incidence of surgical site infection after spine surgery: a systematic review and meta-analysis. Spine (Phila Pa 1976) 2020;45:208–16. [DOI] [PubMed] [Google Scholar]

[R49] [49].Tsantes AG, Papadopoulos DV, Lytras T, et al. Association of malnutrition with surgical site infection following spinal surgery: systematic review and meta-analysis. J Hosp Infect 2020;104:111–19. [DOI] [PubMed] [Google Scholar]

[R50] [50].Shen CJ, Miao T, Wang ZF, et al. Predictive value of post-operative neutrophil/lymphocyte count ratio for surgical site infection in patients following posterior lumbar spinal surgery. Int Immunopharmacol 2019;74:105705. [DOI] [PubMed] [Google Scholar]

[R51] [51].Lenski M, Tonn JC, Siller S. Interleukin-6 as inflammatory marker of surgical site infection following spinal surgery. Acta Neurochir (Wien) 2021;163:1583–92. [DOI] [PubMed] [Google Scholar]

[R52] [52].Tominaga H, Setoguchi T, Ishidou Y, Nagano S, Yamamoto T, Komiya S. Risk factors for surgical site infection and urinary tract infection after spine surgery. Eur Spine J 2016;25:3908–15. [DOI] [PubMed] [Google Scholar]

[R53] [53].Wang Y, Shi Y, Wang L, et al. Risk prediction model for surgical site infection in patients with gastrointestinal cancer: a systematic review and meta-analysis. World J Surg Oncol 2025;23:72. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R54] [54].Ohno Y, Mazaki J, Udo R, et al. Preliminary evaluation of a novel Artificial Intelligence-based prediction model for surgical site infection in colon cancer. Cancer Diagnosis & Prognosis 2022;2:691–96. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R55] [55].Koshizaka M, Ishibashi R, Maeda Y, et al. Predictive model and risk engine web application for surgical site infection risk in perioperative patients with type 2 diabetes. Diabetol Int 2022;13:657–64. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R56] [56].Sheyn D, Gregory WT, Osazuwa-Peters O, Jelovsek JE. Development and validation of a model for predicting surgical site infection after pelvic organ prolapse surgery. Urogynecology (Phila) 2022;28:658–66. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R57] [57].Yamagami A, Narumi K, Saito Y, et al. Development of a risk prediction model for surgical site infection after lower third molar surgery. Oral Dis 2024;30:3202–11. [DOI] [PubMed] [Google Scholar]

[R58] [58].Zhang Q, Chen G, Zhu Q, et al. Construct validation of machine learning for accurately predicting the risk of postoperative surgical site infection following spine surgery. J Hosp Infect 2024;146:232–41. [DOI] [PubMed] [Google Scholar]

[R59] [59].Stewart KE, Terada R, Windrix C, et al. Trends and prediction of surgical site infection after elective spine surgery: an analysis of the American College of Surgeons National Surgical Quality Improvement Project Database. Surg Infect (Larchmt) 2023;24:506–13. [DOI] [PubMed] [Google Scholar]

[R60] [60].Qiu HY, Liu DM, Sun FL, et al. Development and validation of a clinical nomogram prediction model for surgical site infection following lumbar disc herniation surgery. Sci Rep 2024;14:26910. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R61] [61].Liu H, Zhang W, Zhang Y, Zhang S, Jin G, Li X. Establishment and validation of a nomogram model for postoperative surgical site infection after transforaminal lumbar interbody fusion: a retrospective observational study. Surgery 2023;174:1220–26. [DOI] [PubMed] [Google Scholar]

[R62] [62].Xiong C, Zhao R, Xu J, et al. Construct and validate a predictive model for surgical site infection after posterior lumbar interbody fusion based on machine learning algorithm. Comput Math Methods Med 2022;2022:2697841. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R63] [63].Prey BJ, Colburn ZT, Williams JM, et al. The use of mobile thermal imaging and machine learning technology for the detection of early surgical site infections. Am J Surg 2024;231:60–4. [DOI] [PubMed] [Google Scholar]

[R64] [64].Fletcher RR, Olubeko O, Sonthalia H, et al. Application of machine learning to prediction of surgical site infection. Annu Int Conf IEEE Eng Med Biol Soc 2019;2019:2234–37. [DOI] [PubMed] [Google Scholar]

[R65] [65].Wu G, Khair S, Yang F, et al. Performance of machine learning algorithms for surgical site infection case detection and prediction: a systematic review and meta-analysis. Ann Med Surg Lond 2022;84:104956. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R66] [66].Saha A, Peck KK, Lis E, Holodny AI, Yamada Y, Karimi S. Magnetic resonance perfusion characteristics of hypervascular renal and hypovascular prostate spinal metastases: clinical utilities and implications. Spine (Phila Pa 1976) 2014;39:E1433–E1440. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R67] [67].Swann MC, Hoes KS, Aoun SG, McDonagh DL. Postoperative complications of spine surgery. Best Pract Res Clin Anaesthesiol 2016;30:103–20. [DOI] [PubMed] [Google Scholar]

PERMALINK

Artificial intelligence-based prediction model for surgical site infection in metastatic spinal disease: a multicenter development and validation study

Yunpeng Cui, MD

Xuedong Shi, MD

Qiwei Wang, MD

Yong Qin, MD

Xiongwei Zhao, MD

Xiaotong Che, MD

Shengjie Wang, MD

Yuanxing Pan, MD

Bing Wang, MD

Yuncen Cao, MD

Yaosheng Liu, MD

Mingxing Lei, MD

Abstract

Background:

Methods:

Results:

Conclusions:

Introduction

HIGHLIGHTS

Patients and methods

Data sources and study design

Surgical process

Data collection

Quality control

Definition of SSI

Data engineering

Machine learning algorithm selection

Hyperparameter tuning and model optimization

Model validation

Feature importance analysis

Construction of the AI application

Human vs. AI predictive performance comparison

Statistical analysis

Results

Patient characteristics overview

Table 1.

Selection of model inputs

Model evaluation

Figure 1.

Figure 2.

Table 2.

Figure 3.

Figure 4.

Figure 5.

External validation

Sensitivity analysis

Figure 6.

Feature importance

Deployment of the AI application

Using the AI application to predict the risk of SSI

Comparative analysis of human vs. AI prediction of SSI risk

Figure 7.

Discussion

Principal findings

Risk factors associated with SSI in metastatic spinal disease

Prediction of SSI for metastatic spinal disease

Individualized prediction and clinical relevance

Limitations

Conclusions

Footnotes

Contributor Information

Ethical approval

Consent

Sources of funding

Author contributions

Conflicts of interest disclosure

Research registration unique identifying number (UIN)

Guarantor

Provenance and peer review

Data availability statement

References

Associated Data

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles