Establishment and validation of an interactive artificial intelligence platform to predict postoperative ambulatory status for patients with metastatic spinal disease: a multicenter analysis

Yunpeng Cui; Xuedong Shi; Yong Qin; Qiwei Wang; Xuyong Cao; Xiaotong Che; Yuanxing Pan; Bing Wang; Mingxing Lei; Yaosheng Liu

2024 Feb 19;110(5):2738–2756. doi: 10.1097/JS9.0000000000001169


PMCID: PMC11093492 PMID: 38376838

Abstract

Background:

Identifying patients at high risk of being unable to walk after surgery is important for surgeons planning therapeutic strategies for patients with metastatic spinal disease. However, clinical tools to assess postoperative ambulatory status in these patients are lacking. The emergence of artificial intelligence (AI) offers a promising opportunity to develop accurate prediction models.

Methods:

This study enrolled 455 patients with metastatic spinal disease who underwent posterior decompressive surgery at three tertiary medical institutions. Of these, 220 patients from one institution formed the model derivation cohort, while 89 and 146 patients from the two other institutions formed external validation cohorts 1 and 2, respectively. Patients in the model derivation cohort were used to develop and internally validate the models. To establish the interactive AI platform, prediction models were developed using logistic regression (LR), decision tree (DT), random forest (RF), extreme gradient boosting machine (eXGBM), support vector machine (SVM), and neural network (NN). Furthermore, to enhance robustness, an ensemble machine learning model was built by combining the results of these six algorithms with a soft-voting method. A scoring system incorporating 10 evaluation metrics was used to comprehensively assess prediction performance; total scores ranged from 0 to 60, with higher scores denoting better performance. The interactive AI platform was deployed via Streamlit. Prediction performance was then compared between medical experts and the AI platform in assessing the risk of postoperative inability to walk among patients with metastatic spinal disease.

Results:

Among all developed models, the ensemble model outperformed the six other models with the highest score of 57, followed by the eXGBM model (54), SVM model (50), and NN model (50). The ensemble model had the best performance in accuracy and calibration slope, and the second-best performance in precision, recall, specificity, area under the curve (AUC), Brier score, and log loss. The scores of the LR model, RF model, and DT model were 39, 46, and 26, respectively. External validation demonstrated that the ensemble model had an AUC value of 0.873 (95% CI: 0.809–0.936) in external validation cohort 1 and 0.924 (95% CI: 0.890–0.959) in external validation cohort 2. In a new ensemble machine learning model that excluded the number of comorbidities as a feature, the AUC value remained as high as 0.916 (95% CI: 0.863–0.969). In addition, the AUC values of the new model were 0.880 (95% CI: 0.819–0.940) in external validation cohort 1 and 0.922 (95% CI: 0.887–0.958) in external validation cohort 2, indicating favorable generalization of the model. The interactive AI platform was deployed online based on the final machine learning model and is available at https://postoperativeambulatory-izpdr6gsxxwhitr8fubutd.streamlit.app/. With the AI platform, users can obtain the individual predicted risk of postoperative inability to walk, gain insights into the key factors influencing the outcome, and find stratified therapeutic recommendations. The AUC value obtained from the AI platform was significantly higher than the average AUC value achieved by the medical experts (P<0.001), indicating that the AI platform clearly outperformed the individual medical experts.

Conclusions:

The study successfully develops and validates an interactive AI platform for evaluating the risk of postoperative loss of ambulatory ability in patients with metastatic spinal disease. This AI platform has the potential to serve as a valuable model for guiding healthcare professionals in implementing surgical plans and ultimately enhancing patient outcomes.

Keywords: artificial intelligence, feature importance, machine learning, metastatic spinal disease, postoperative ambulatory status

Introduction

Highlights

The interactive AI platform can accurately predict postoperative ambulatory status.
The ensemble model obtained the highest score in a comprehensive scoring system.
External validation confirms high predictive capability of the AI model.
The AI platform outperforms medical experts in predicting risk.

Metastatic spinal disease is a common complication of cancer, with a reported incidence of 70%^1,2. It poses significant challenges for patients and clinicians due to its detrimental effects on neurological function and quality of life^3,4. The disease is characterized by the spread of cancer cells to the vertebral column, leading to spinal instability, compression of neural elements, pain, and neurological deficits^4,5. The management of metastatic spinal disease often involves surgical intervention to relieve pain, decompress neural structures, and restore spinal stability^6–8. Notably, the ability to maintain or regain ambulatory status after surgery is a critical factor in determining the success of the procedure and the overall prognosis for patients⁹.

Despite the importance of postoperative ambulatory status, there is a lack of robust clinical tools to predict whether a patient with metastatic spinal disease will be able to walk after surgery. Currently, the literature on this topic is limited, with only a few studies reporting specific risk factors associated with postoperative ambulatory status^10–14, and some traditional scoring systems, originally developed for survival prediction, being used to stratify functional outcomes^8,15,16. As a result, surgeons lack reliable guidance on the expected postoperative functional outcome for patients with metastatic spinal disease¹⁷. Therefore, there is a pressing need for accurate prediction models to assess postoperative ambulatory status in these patients.

In recent years, the application of artificial intelligence (AI) and machine learning techniques in the field of spinal metastatic tumors has shown promising results^18–20. Machine learning algorithms can effectively analyze complex datasets and identify patterns and relationships that may not be readily apparent to human observers. These algorithms have the potential to improve the accuracy and precision of prediction models¹⁹, providing valuable insights into the prognosis and outcomes of patients with metastatic spinal disease^21,22.

Therefore, the objective of this study is to establish and validate an AI platform for predicting postoperative ambulatory status in patients with metastatic spinal disease. Furthermore, we will conduct a comparative analysis between the predictions made by spine surgeons and the AI platform to assess its performance in clinical practice. The development of this AI platform has the potential to assist clinicians in making informed decisions regarding surgical intervention and improving patient outcomes.

Methods

Patients and study design

This study enrolled 455 patients with metastatic spinal disease who underwent posterior decompressive surgery at three tertiary medical institutions from January 2015 to May 2023. Of these, 220 patients were prospectively collected from one medical institution to form the model derivation cohort, while 89 and 146 patients from two other medical institutions formed external validation cohorts 1 and 2, respectively. The three institutions, two located in the northern region of our country and one in the southern region, are all teaching hospitals classified as tertiary A-grade hospitals. Validating the model in different regions helps confirm that it performs consistently across diverse populations and healthcare settings. All patients underwent X-ray and MRI scans to confirm the location of the metastatic lesion. Patients were included if they had radiographic evidence of metastatic spinal disease and at least one of the following: progressive local mechanical or radiating pain, or impairment of sensory function, lower limb motor function, or sphincter function. Patients receiving conservative treatment, those with primary spinal tumors, those with metastatic spinal disease caused by leukemia, and those with intramedullary spinal metastases were excluded, as were patients who had previously undergone surgery at the site of spinal metastases. A flowchart illustrating the study design is presented in Figure 1. Patients in the model derivation cohort were randomly divided into a training cohort and an internal validation cohort at a 7:3 ratio. Patients from external validation cohorts 1 and 2 were used for external validation of the model.
Based on the specified inclusion and exclusion criteria and the objectives of the analysis, the study employed a per-protocol analysis method, and the study protocol is provided in Supplementary File 1 (Supplemental Digital Content 1, http://links.lww.com/JS9/B973). The study protocol was approved by the research ethics board of our institution, and all patients provided informed consent for the review of their medical records and images. This study was registered at a National Clinical Trial Registry Center. The study was conducted in accordance with the guidelines outlined in the Declaration of Helsinki, and the reporting of the study adhered to the strengthening the reporting of cohort, cross-sectional, and case–control studies in surgery (STROCSS) criteria²³ and the TRIPOD Checklist²⁴ (Supplemental Digital Content 2, http://links.lww.com/JS9/B974).

Figure 1. Schematic depiction of study design and machine learning process.

Surgical process

The decision to perform surgery was based on indications such as intractable pain due to spinal instability and myelopathy caused by spinal cord compression. The appropriate surgical approach was determined through multidisciplinary collaboration among a neuro-radiologist, spinal tumor surgeon, and oncologist. The surgical management of spinal metastases involved a complex procedure comprising palliative decompression, partial vertebrectomy/En bloc resection of vertebrae, and internal fixation using pedicle screw instrumentation. Under general anesthesia, the patient was positioned in the prone position on the operating table. A midline skin incision was made over the affected area of the spine, and the muscles were carefully dissected to expose the posterior elements of the spine. The surgical procedure involved a posterior approach with either laminectomy or laminotomy to gain access to the spinal cord and nerve roots. The tumor was then meticulously removed using a combination of vertebrectomy and tumor debulking techniques. The extent of the vertebrectomy depended on the tumor involvement, and the decision to perform either subtotal or total vertebrectomy was based on the surgeon’s assessment of the tumor and the patient’s overall health status. No intradural work was required in the present study. Following tumor excision, the resultant space within the vertebral body was filled with bone cement and artificial vertebral bodies to facilitate fusion and stabilization of the spine. Fusion was achieved by addressing the adjacent vertebrae above and below the corpectomy site, involving the insertion of screws into the pedicles of the affected vertebrae, which were interconnected using rods to ensure stability and mitigate the risk of further deformity. The pedicle screw instrumentation also facilitated the maintenance of the correction achieved during the vertebrectomy. The wound was closed in layers using sutures.

Evaluation of the primary outcome

The primary outcome of this study was the ambulatory status of the patients within one week after surgery. Ambulatory status was defined as the ability to independently take at least two steps with each foot (totaling four steps), even if the use of a cane or walker was necessary⁶. The ability to ambulate after surgery for metastatic spinal disease is of great importance as it directly impacts the patients’ overall quality of life and functional independence²⁵.

Quality control

Several measures were adopted to safeguard the accuracy and reliability of the data. First, the research team received extensive training on the data collection protocols to minimize errors and ensure that all members followed standardized guidelines during data collection and recording. Second, a double-entry verification method was implemented: two independent individuals entered the data, and the entries were cross-verified to detect any disparities. Additionally, data validation checks were performed to identify and rectify inconsistencies or inaccuracies within the collected data.

Moreover, a data cleaning procedure was carried out to detect and rectify errors, missing data points, and outliers by comparing the collected data with the source documents and resolving any disparities. Continuous data monitoring was conducted throughout the study to identify potential issues or trends that could affect data quality. This encompassed periodic audits, review of data collection forms, and feedback to the research team. These measures were intended to support the validity and integrity of the research findings.

Data preparation

A data preprocessing pipeline built with scikit-learn was used to ensure consistent and reproducible transformation of the data. The pipeline combined multiple preprocessing steps into a single object, covering data transformation, feature selection, data splitting, and standardization and normalization. Imbalanced data were addressed using a SMOTETomek resampling strategy, which combines the Synthetic Minority Oversampling Technique (SMOTE) with Tomek links undersampling²⁶. This strategy generates a new dataset with a larger sample size and a more balanced class distribution, with the aim of enhancing statistical power and generalizability of the findings. For example, the sample size of the model derivation cohort increased from 220 to 334 after SMOTETomek resampling, with a positive rate of the primary outcome of 50%. Similarly, the sample sizes of external validation cohorts 1 and 2 increased to 134 and 228, respectively. In addition, a stratified strategy was employed to maintain consistent proportions of ambulatory and nonambulatory patients.
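To make the resampling idea concrete, the following is a minimal pure-Python sketch of the two ingredients of SMOTETomek: SMOTE-style interpolation between minority-class neighbours, followed by removal of Tomek links (opposite-class pairs that are each other's nearest neighbour). It is not the study's implementation, which relied on a library pipeline; the function names `smote` and `tomek_links`, the toy two-feature data, the choice `k=3`, and the decision to drop both members of each link are all assumptions made for illustration.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + t * (m - b) for b, m in zip(base, mate)))
    return synthetic

def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbours
    and belong to opposite classes."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]

# Hypothetical toy data: class 0 is the majority, class 1 the minority.
majority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.4, 0.4), (0.9, 0.9)]
minority = [(1.0, 1.0), (1.2, 1.1), (1.1, 1.3)]

# Step 1: oversample the minority class up to the majority class size.
synthetic = smote(minority, n_new=len(majority) - len(minority))
X = majority + minority + synthetic
y = [0] * len(majority) + [1] * (len(minority) + len(synthetic))

# Step 2: drop both members of every Tomek link to clean the class boundary.
drop = {i for pair in tomek_links(X, y) for i in pair}
X_res = [x for i, x in enumerate(X) if i not in drop]
y_res = [label for i, label in enumerate(y) if i not in drop]
print(len(X), "samples before cleaning,", len(X_res), "after")
```

Because synthetic points are interpolated from existing minority samples, they carry no new patient information; this is one reason resampling is usually fitted on training data only.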

Modeling

A comprehensive analysis was conducted using a range of models, including logistic regression (LR) and five machine learning algorithms: extreme gradient boosting machine (eXGBM), support vector machine (SVM), random forest (RF), neural network (NN), and decision tree (DT). Furthermore, to enhance the robustness of the final model, an ensemble machine learning approach was employed using a soft-voting method^27,28. All models were provided with the same input features to ensure consistency, and the model features were identified using subgroup analysis. Grid and random hyperparameter searches were performed to identify the optimal hyperparameters for each model, with the area under the curve (AUC) used as the optimization metric. Each hyperparameter combination was trained and evaluated with 10-fold cross-validation on the training data, which gives a more reliable estimate of model performance than a single split. In addition, to accommodate the variability in model performance, wide ranges were established for the hyperparameters. For instance, the DT depth range was set from 2 to 100, allowing a broad exploration of tree depths. Similarly, the ‘min_child_weight’ hyperparameter, which in gradient boosting sets the minimum sum of instance weights required in a child node, was varied from 1 to 100. The ‘min_samples_split’ and ‘min_samples_leaf’ hyperparameters, which set the minimum number of samples needed to split an internal node or form a leaf node, respectively, were varied from 2 to 200. Learning curves were used to identify overfitting and underfitting.
In the learning curve, the model’s performance on both the training and validation datasets was plotted against the number of training instances or iterations. Machine learning algorithms were implemented using Python (version 3.9.7), and hyperparameter tuning was conducted using scikit-learn (version 1.2.2).
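As an illustration of soft voting, the sketch below averages the positive-class probabilities from the six base models to obtain the ensemble risk for each patient. It is a simplified stand-in, not the study's code: the function name `soft_vote`, the equal weights, the 0.5 decision threshold, and all probability values are hypothetical.

```python
def soft_vote(prob_by_model, weights=None):
    """Weighted average of positive-class probabilities across models.

    prob_by_model maps a model name to a list of predicted risks,
    one entry per patient, in the same patient order for every model.
    """
    names = list(prob_by_model)
    if weights is None:
        weights = {name: 1.0 for name in names}  # assumption: equal weights
    total = sum(weights[name] for name in names)
    n_patients = len(prob_by_model[names[0]])
    return [
        sum(weights[name] * prob_by_model[name][i] for name in names) / total
        for i in range(n_patients)
    ]

# Hypothetical predicted risks of postoperative inability to walk
# for three patients, one list per base model.
probs = {
    "LR":    [0.20, 0.70, 0.55],
    "DT":    [0.00, 1.00, 0.50],
    "RF":    [0.10, 0.80, 0.45],
    "eXGBM": [0.15, 0.90, 0.40],
    "SVM":   [0.25, 0.75, 0.60],
    "NN":    [0.20, 0.85, 0.62],
}

ensemble = soft_vote(probs)
labels = [int(p >= 0.5) for p in ensemble]  # assumed threshold of 0.5
print([round(p, 3) for p in ensemble], labels)
```

Averaging probabilities rather than hard class votes lets a confident model outweigh several uncertain ones, which is the usual motivation for soft over hard voting.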

Validation

The internal and external validation cohorts were used to validate the models, and multiple evaluation metrics were employed, including the AUC, accuracy, precision, recall, specificity, Brier score, log loss, discrimination slope, calibration slope, and intercept-in-large value. The AUC value was obtained using 100 bootstraps. The accuracy, precision, recall, and specificity were calculated using a confusion matrix. In the following equations, TP, TN, FP, and FN represent true positive, true negative, false positive, and false negative, respectively.

Accuracy = (TP + TN) / (TP + FN + FP + TN)

Precision = TP / (TP + FP)

Recall (Sensitivity) = TP / (TP + FN)

Specificity = 1 - FP / (TN + FP)
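The four formulas above can be checked with a short worked example. The confusion-matrix counts below are hypothetical, since the source does not report them; they were chosen because, when rounded to three decimals, they reproduce the accuracy, precision, recall, and specificity reported for the ensemble model (0.861, 0.833, 0.900, and 0.824). The function name `confusion_metrics` is likewise an assumption.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Compute the four confusion-matrix metrics defined above."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": 1 - fp / (tn + fp),
    }

# Hypothetical counts (not reported in the source).
metrics = confusion_metrics(tp=45, fn=5, fp=9, tn=42)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```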

The Brier score was calculated using the following formula, where $N$ represents the total number of samples, $p_{i}$ represents the predicted risk, and $o_{i}$ represents the observed outcome (0 or 1).

$$\text{Brier score} = \frac{1}{N} \sum_{i = 1}^{N} (p_{i} - o_{i})^{2}$$

The log loss, calculated using the scikit-learn formula, is a metric that evaluates the quality of classification model predictions. It takes into account the number of samples ($N$), the number of classes ($M$), the true labels ($y_{ij}$), and the predicted probabilities ($p_{ij}$).

$$\text{Log loss} = - \frac{1}{N} \sum_{i = 1}^{N} \sum_{j = 1}^{M} y_{ij} \log (p_{ij})$$
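The Brier score and log loss can be computed directly from predicted risks and observed outcomes. The sketch below covers the binary case ($M = 2$), where the double sum in the log loss reduces to two terms per patient. The function names, the clipping constant `eps`, and the four example predictions are assumptions for illustration.

```python
import math

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - o) ** 2 for o, p in zip(y_true, p_pred)) / len(y_true)

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for o, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += o * math.log(p) + (1 - o) * math.log(1 - p)
    return -total / len(y_true)

# Hypothetical outcomes (1 = unable to walk) and predicted risks.
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.4]

print(round(brier_score(y, p), 4))  # (0.01 + 0.04 + 0.16 + 0.16) / 4
print(round(log_loss(y, p), 4))
```

Both metrics are lower for better models; the log loss penalizes confident wrong predictions more heavily than the Brier score does.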

The discrimination slope was calculated as the mean difference between the predicted probabilities of patients with and without postoperative ambulatory status. The calibration slope and intercept-in-large value were obtained from the calibration curve. Additionally, a scoring system was used to comprehensively evaluate the prediction performance of the models^29,30, with each metric rated on a scale of 1 to 6. The scoring system ranged from 0 to 60. Finally, decision curve analysis (DCA) was employed to determine the clinical net benefits for each model.
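The discrimination slope and a bootstrapped AUC can be sketched in a few lines. The AUC is computed here as the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case, with ties counted as one half. The source states that 100 bootstraps were used but does not specify how the interval was formed, so the percentile interval below is an assumption, as are the function names and the toy data.

```python
import random

def discrimination_slope(y_true, p_pred):
    """Mean predicted risk in positives minus mean predicted risk in negatives."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    return sum(pos) / len(pos) - sum(neg) / len(neg)

def auc(y_true, p_pred):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(y_true, p_pred, n_boot=100, alpha=0.05, seed=0):
    """Percentile confidence interval for the AUC from n_boot resamples."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes for an AUC
            stats.append(auc(ys, [p_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical outcomes (1 = unable to walk) and predicted risks.
y = [1, 1, 1, 0, 0, 0, 0, 1]
p = [0.9, 0.8, 0.4, 0.3, 0.2, 0.5, 0.1, 0.7]

slope = discrimination_slope(y, p)
point = auc(y, p)
low, high = bootstrap_auc_ci(y, p)
print(round(slope, 3), round(point, 4), (round(low, 3), round(high, 3)))
```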

Feature importance

The Shapley additive explanation (SHAP) values were used to determine the importance of model features²⁶. The SHAP values were calculated using the following equation, where $g$ represents the interpretation model, $M$ represents the number of input features, $\phi_{0}$ represents a constant, $\phi_{j}$ represents the Shapley value of feature $j$, and $z'_{j}$ represents the $j$-th entry of the coalition vector $z'$.

$$g(z') = \phi_{0} + \sum_{j = 1}^{M} \phi_{j} z'_{j}$$

In the coalition vector, ‘1’ indicates that the feature takes the same value as in the case $x$ being explained, while ‘0’ indicates that the feature is missing. For the case $x$ itself, all simplified features equal 1 (denoted $x'$), so the SHAP expression simplifies to the following.

$$g(x') = \phi_{0} + \sum_{j = 1}^{M} \phi_{j}$$
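The additive property in the last equation can be verified on a toy model by computing exact Shapley values through enumeration of all feature coalitions. This sketch is not how SHAP libraries estimate values in practice: it assumes that a "missing" feature is replaced by a fixed baseline value, and the three-feature risk function `toy_risk`, its coefficients, and the function name `shapley_values` are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; missing features take baseline values.

    Returns (phi_0, [phi_1, ..., phi_M]), where phi_0 = f(baseline).
    """
    m = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(m)]
        return f(z)

    phi = []
    for j in range(m):
        others = [k for k in range(m) if k != j]
        total = 0.0
        for size in range(m):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for s in combinations(others, size):
                total += weight * (value(set(s) | {j}) - value(set(s)))
        phi.append(total)
    return value(set()), phi

def toy_risk(z):
    """Hypothetical risk score with one interaction between features 0 and 2."""
    return 0.1 + 0.3 * z[0] + 0.2 * z[1] + 0.25 * z[0] * z[2]

x = [1, 1, 1]         # case to be explained
baseline = [0, 0, 0]  # assumed reference values for missing features

phi0, phi = shapley_values(toy_risk, x, baseline)
print(round(phi0, 3), [round(v, 3) for v in phi])
print(round(phi0 + sum(phi), 3), round(toy_risk(x), 3))  # both sides of g(x')
```

The interaction term is split equally between features 0 and 2, and the constant plus the three Shapley values adds up to the model output for $x$, as the simplified expression requires.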

Establishment of the interactive AI platform

An AI platform was developed to estimate the risk of not gaining ambulatory status in patients undergoing decompressive surgery for metastatic spinal disease. The platform was built with Streamlit to be user-friendly and accessible. The code supporting the development of the AI platform is available at https://github.com/Starxueshu/postoperativeambulatory. The platform allows users to customize the input parameters, calculates the probability of not gaining ambulatory status based on the selected parameters, and provides an interface that explains the model’s methodology and performance. Patients were categorized into high-risk or low-risk groups based on a threshold, and intervention recommendations were provided for each risk group. In addition, a human-machine comparative experiment was conducted to compare the performance of six medical experts and the AI platform in predicting the outcome of not gaining ambulatory status among patients with spinal metastatic tumors. AUC values were calculated for each medical expert.

Statistical analysis

Continuous variables were summarized using the mean and SD for normally distributed data, while the median and interquartile range (IQR) were used for non-normally distributed data. Categorical variables were presented as proportions. Student’s t-test was used to compare normally distributed continuous variables, while the Wilcoxon rank sum test (Mann–Whitney U test) was used to compare non-normally distributed variables. The χ² test was used to compare the distribution of categorical variables. Statistical power analysis was conducted for significant variables. The DeLong test was used to compare the prediction performance between medical experts and the AI platform. Statistical analysis was performed using R (version 4.1.2), and a two-sided P-value less than 0.05 was considered statistically significant.

Results

Patient clinical characteristics

In the model derivation cohort, 220 patients were included in the analysis. The median age was 60.00 (IQR: 53.00–68.00) years, with 67.3% of patients being male and 20.9% being current smokers (Table 1). The most common primary tumor category was rapid growth (38.6%), followed by moderate growth (35.0%) and slow growth (26.4%). The burden of comorbidities was relatively heavy, since 47.3% of patients had at least one comorbidity. In detail, the most prevalent comorbidities were hypertension (30.91%) and diabetes (13.64%) (Supplementary Table 1, Supplemental Digital Content 3, http://links.lww.com/JS9/B975). Regarding systemic therapies, preoperative chemotherapy, targeted therapy, and endocrine therapy had been given to 17.3, 9.5, and 9.1% of patients, respectively. The tumor burden was relatively heavy: 43.6% of patients had extravertebral bone metastases, 26.8% had visceral metastases, and 19.1% had an Eastern Cooperative Oncology Group (ECOG) score of 4, indicating an inability to carry out any self-care. The majority of patients were treated with palliative decompression (89.5%) at a thoracic or thoracolumbar site (67.7%). In the entire cohort, 58.6% of patients had a Bilsky score of 3, indicating severe spinal cord compression, and 49.1% of patients had lost their ability to walk before surgery. During surgery, 80.5% of patients received intraoperative blood transfusion. More detailed information on preoperative laboratory examination, including albumin, cholesterol, hemoglobin, and prothrombin time (PT), is summarized in Table 1.

Table 1.

Patient’s clinical characteristics and a comparison of clinical characteristics between patients with and without postoperative walking ability.

		Postoperative ambulatory status
Characteristics	Overall	Yes	No	P
n	220	169	51
Age (years, median [IQR])	60.00 [53.00–68.00]	60.00 [53.00–67.00]	63.00 [56.00–74.00]	0.041
Sex (male/female, %)	148/72 (67.3/32.7)	110/59 (65.1/34.9)	38/13 (74.5/25.5)	0.277
Primary tumor (%)				0.557
Slow growth	58 (26.4)	46 (27.2)	12 (23.5)
Moderate growth	77 (35.0)	61 (36.1)	16 (31.4)
Rapid growth	85 (38.6)	62 (36.7)	23 (45.1)
Tumor type (%)				0.254
Thyroid cancer	5 (2.3)	5 (3.0)	0 (0.0)
Prostate cancer	31 (14.1)	23 (13.6)	8 (15.7)
Breast cancer	20 (9.1)	17 (10.1)	3 (5.9)
Renal cancer	43 (19.5)	35 (20.7)	8 (15.7)
Lung cancer	58 (26.4)	41 (24.3)	17 (33.3)
Hepatocellular carcinoma	13 (5.9)	9 (5.3)	4 (7.8)
Gastrointestinal system cancer	12 (5.5)	11 (6.5)	1 (2.0)
Urogenital cancer	8 (3.6)	8 (4.7)	0 (0.0)
Others	30 (13.6)	20 (11.8)	10 (19.6)
Smoking (%)				0.773
Never	162 (73.6)	125 (74.0)	37 (72.5)
Previous	12 (5.5)	10 (5.9)	2 (3.9)
Current	46 (20.9)	34 (20.1)	12 (23.5)
BMI (kg/m², median [IQR])	23.90 [21.34–26.12]	23.66 [21.23–26.12]	24.03 [22.04–26.20]	0.419
Number of comorbidities (%)				0.050
0	116 (52.7)	93 (55.0)	23 (45.1)
1	68 (30.9)	54 (32.0)	14 (27.5)
≥2	36 (16.4)	22 (13.0)	14 (27.5)
Coronary disease (no/yes, %)	211/9 (95.9/4.1)	164/5 (97.0/3.0)	47/4 (92.2/7.8)	0.254
Diabetes (no/yes, %)	191/29 (86.8/13.2)	151/18 (89.3/10.7)	40/11 (78.4/21.6)	0.074
Hypertension (no/yes, %)	150/70 (68.2/31.8)	121/48 (71.6/28.4)	29/22 (56.9/43.1)	0.071
Preoperative chemotherapy (no/yes, %)	182/38 (82.7/17.3)	140/29 (82.8/17.2)	42/9 (82.4/17.6)	1.000
Preoperative targeted therapy (no/yes, %)	199/21 (90.5/9.5)	153/16 (90.5/9.5)	46/5 (90.2/9.8)	1.000
Preoperative endocrinology (no/yes, %)	200/20 (90.9/9.1)	157/12 (92.9/7.1)	43/8 (84.3/15.7)	0.112
Extravertebral bone metastasis (no/yes, %)	124/96 (56.4/43.6)	100/69 (59.2/40.8)	24/27 (47.1/52.9)	0.171
Viscera metastases (no/yes, %)	161/59 (73.2/26.8)	119/50 (70.4/29.6)	42/9 (82.4/17.6)	0.132
ECOG (%)				<0.001
1	3 (1.4)	3 (1.8)	0 (0.0)
2	108 (49.1)	106 (62.7)	2 (3.9)
3	67 (30.5)	51 (30.2)	16 (31.4)
4	42 (19.1)	9 (5.3)	33 (64.7)
Surgical process (%)				0.069
Palliative decompression	197 (89.5)	147 (87.0)	50 (98.0)
Partial resection of vertebrae	12 (5.5)	11 (6.5)	1 (2.0)
En bloc resection of vertebrae	11 (5.0)	11 (6.5)	0 (0.0)
Surgical site (%)				0.011
Cervical and cervical thoracic	9 (4.1)	7 (4.1)	2 (3.9)
Thoracic and thoracolumbar	149 (67.7)	106 (62.7)	43 (84.3)
Lumbar and lumbosacral	62 (28.2)	56 (33.1)	6 (11.8)
Number of surgical segments (%)				0.237
1	108 (49.1)	88 (52.1)	20 (39.2)
2	68 (30.9)	48 (28.4)	20 (39.2)
≥3	44 (20.0)	33 (19.5)	11 (21.6)
Preoperative albumin (g/l, median [IQR])	40.10 [37.20–42.60]	40.60 [37.50–43.20]	38.90 [36.75–41.50]	0.034
Total cholesterol (mmol/l, median [IQR])	4.42 [3.71–5.10]	4.58 [3.83–5.19]	4.08 [3.50–4.78]	0.004
Preoperative hemoglobin (g/l, median [IQR])	133.00 [119.00–143.00]	133.00 [124.00–144.00]	131.00 [112.00–141.50]	0.188
PT (seconds, median [IQR])	11.29 [10.60–11.90]	11.20 [10.50–11.70]	11.80 [11.10–12.50]	0.001
Bilsky score (%)				<0.001
1	28 (12.7)	27 (16.0)	1 (2.0)
2	63 (28.6)	58 (34.3)	5 (9.8)
3	129 (58.6)	84 (49.7)	45 (88.2)
Preoperative ambulatory status (yes/no, %)	112/108 (50.9/49.1)	110/59 (65.1/34.9)	2/49 (3.9/96.1)	<0.001
Intraoperative blood transfusion (ml, %)				0.799
None	43 (19.5)	33 (19.5)	10 (19.6)
<1000	122 (55.5)	92 (54.4)	30 (58.8)
≥1000	55 (25.0)	44 (26.0)	11 (21.6)

Table 2.

	Models
Metrics	LR	eXGBM	SVM	RF	NN	DT	Ensemble
Accuracy	0.812	0.832	0.842	0.832	0.861	0.752	0.861
Precision	0.772	0.837	0.827	0.780	0.833	0.692	0.833
Recall	0.880	0.820	0.860	0.920	0.900	0.900	0.900
Specificity	0.745	0.843	0.824	0.745	0.824	0.608	0.824
AUC (95% CI)	0.889 (0.859–0.912)	0.911 (0.881–0.936)	0.904 (0.885–0.928)	0.926 (0.903–0.942)	0.894 (0.864–0.916)	0.814 (0.783–0.844)	0.911 (0.854–0.968)
Brier score	0.129	0.121	0.121	0.116	0.120	0.169	0.118
Log loss	0.431	0.382	0.397	0.362	0.403	0.523	0.375
Discrimination slope	0.522	0.584	0.538	0.501	0.539	0.399	0.513
Intercept-in-large value	0.009	−0.002	0.032	−0.304	0.098	0.079	−0.043
Calibration slope	0.771	0.732	0.859	1.337	0.854	0.706	1.086
Total score	39	54	50	46	50	26	57
