Prediction Tools in Spine Surgery: A Narrative Review

Martin Coia Jadresic; Joseph F Baker

2024 Oct 19;9(1):1–10. doi: 10.22603/ssrr.2024-0189

PMCID: PMC11808232 PMID: 39935977

Abstract

There have been increasing reports on prediction models in spine surgery. Prognostic tools and risk calculators are of interest because they can facilitate shared decision-making about treatment between patients and clinicians.

In recent years, there has been a steady increase in the number of models developed using varying methods. External validation is an essential component of prediction model testing to ensure the appropriate use of these models in populations outside of the developing center.

This narrative review aimed to provide an overview of the literature describing the development and validation of prediction models in the field of spine surgery.

Keywords: spine, surgery, risk, prognosis, complications, validation

Introduction

Prediction models in the field of surgery have two major applications: risk adjustment and prognostication. Risk adjustment is a measure of quality appraisal that enables individualization by casemix when comparing a clinician or a center's outcomes with a given standard¹⁾. For example, the United States uses the Merit-Based Incentive Payment System to adjust clinician reimbursement based in part on quality outcomes²⁾. Prognostication relates to an individual's predicted risk when undergoing a surgical procedure and is an essential part of the process of informed consent³⁾.

There has been a rapid expansion in the publication and dissemination of risk prediction models across numerous fields, including spinal surgery⁴⁾. A recent systematic review identified more than 30 prediction models for various outcomes following degenerative spine surgery alone⁵⁾. With the number and influence of these tools only likely to increase, critical appraisal is vital to assisting patients, clinicians, and policymakers in making informed choices about the use of these tools in clinical practice.

Prediction models exist for the risk of complications, reoperation, readmission, length of stay, and clinical prognosis following spine surgery⁵⁾. In this narrative review, we aimed to provide an overview of the development and validation of prediction models for the risk of adverse events (AEs) in spine surgery.

We selected models based on two recent systematic reviews and an unsystematic review of the literature using the terms “prediction model(s)” and “spine surgery”⁵,⁶⁾. We evaluated studies for completeness using the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guideline and for bias using the Prediction model Risk Of Bias ASsessment Tool (PROBAST), which are the preferred tools of the Cochrane Collaboration⁷⁻¹⁰⁾. We searched PubMed, Medline, OVID, EMBASE, Cochrane, and Google Scholar for relevant articles. The adequacy of the articles for inclusion was determined by the first author, with guidance from the senior author when in doubt.

Ethics

Ethical approval was not required for this literature review.

Study characteristics

We selected 9 model development and 7 associated external validation (EV) studies (Table 1, 2). Four models have been externally validated: SpineSage and the American College of Surgeons National Surgical Quality Improvement Program Risk Calculator (NSQIP) have both undergone independent external validation; the Risk Assessment Tool (RAT) and the Spine Adverse Event (SpineAE) Predictor have been validated by the developing or associated author groups. We identified the sample size and assessment of model performance as major sources of bias (Fig. 1, 2), among others.

Table 1.

Derivation Studies.

Author, Year	Model	Type	Sample Size	Validation	Age (mean)	Degenerative (%)	Complication rate (%)	Variables	EPV	Calibration measure	AUC	TRIPOD	Risk of bias (PROBAST)
Bilimoria, 2013	NSQIP	LR	1414006	Colon subset	58	Unknown	9	21	6044	Plot	0.72	25/30	Unclear
Lee, 2014	SpineSage	LR	1546	Cross-validation	49	67	23	23	23	None	0.76	25/30	High
McGirt, 2015	None	LR	1803	Cross-validation	56	91	6.6	46	3	χ>0.05	0.82	21/29	High
Ratliff, 2016	RAT	LR	279315	Training/validation	47	97	14	19	2049	None	0.70	18/31	High
Scheer, 2017	None	ML	557	Training/validation	58	100	27	45	3	None	0.89	19/29	High
Buchlak, 2017	Seattle Spine Score	LR	136	Training/validation	63	100	26	12	3	χ=0.89	0.71	22/30	High
Kim, 2018	None	ANN	22629	Training/validation	60	100	2.2*	12	41	None	0.59–0.71	17/30	High
Han, 2019	SpineAE	ML	1104233	Training/validation	62	96	25	274	995	Plot α=0, β=1	0.70	21/29	Unclear
Broda, 2020	Universal Spine Surgery Score	LR	177928	Training/validation	57	100	11	18	1101	None	0.75	21/30	High

EPV, events per variable; AUC, area under the receiver operating characteristic; TRIPOD, transparent reporting of a multivariable prediction model for individual prognosis or diagnosis; PROBAST, prediction model risk of bias assessment tool; LR, logistic regression; ML, machine learning; ANN, artificial neural network; α, Intercept; β, Calibration Slope; χ=Hosmer–Lemeshow Statistic; * Wound Complications Only.

Table 2.

Validation Studies.

Author, Year	Country	Model	Independent	Design	Sample size	Age (mean)	Degenerative (%)	Outcomes	Predicted (%)	Observed (%)	Calibration	AUC	TRIPOD	Risk of bias (PROBAST)
Veeravagu, 2017	US	RAT	N	Prospective	4 5	62	88	69	27.3	29	Plot	0.67	24/29	High
Veeravagu, 2017	US	NSQIP	Y	Prospective	257	62	88	69	7.2	29	Plot	0.67	24/29	High
Wang, 2017	China	NSQIP	Y	Retrospective	242	79	100	106	14	44	χ=0.16	0.71	20/29	High*
Kasparek, 2018	Austria	SpineSage	Y	Retrospective	273	61	83	44	15	16	None	0.71	18/29	High
Han, 2019	US	RAT	N	Retrospective	1104233	62	96	831487	?	25	α=−0.63 β=0.3	0.60	17/28	Unclear
McCarthy, 2020	US	NSQIP	Y	Retrospective	641	65	100	111	7	28	None	0.80	17/28	High
Fatemi, 2021	US	RAT/SpineAE	N	Prospective	276	64	89	76	Categorical	28	None	0.64	24/29	High

Figure 1. TRIPOD adherence, n=16. (D)=development studies only, (V)=validation studies only.

Four studies used single or double internal registries, three utilized the NSQIP registry, and two used administrative claims databases. The derivation cohorts varied widely in size (136 to 1,414,006). The mean sample size of the validation cohorts was 324. All derivation studies and four validation studies used US cohorts. All but one study cohort comprised primarily (>80%) degenerative cases.

Variable selection was largely a priori using a combination of previously published studies, expert opinion, and availability. Three studies modified their variable selection for the final model based on univariate analysis, a practice specifically identified as a source of bias in PROBAST. The variables or risk factors used in the model development are summarized in Table 3. The outcomes or complications analyzed are summarized in Table 4.

Table 3.

Risk Factors Including Patient, Comorbidity, and Surgery-specific Variables Considered in the Development of Each Model.

NSQIP	SpineSage	Risk Assessment Tool	Spine AE Predictor	McGirt	Scheer	Seattle Spine Score
Demographics Age group Sex Functional status Status: emergency or elective ASA class BMI class Laboratory markers Comorbidities: • Chronic steroid use • Ascites within 30 days preop. • Systemic sepsis within 48 h preop • Ventilator dependence • Disseminated cancer • Diabetes • Hypertension needing medical treatment • Previous cardiac event • CHF within 30 days preop • Dyspnoea • Current smoker within 1-year • COPD • Dialysis • Acute renal failure	Age Sex Smoking Alcohol abuse Drug use DM BMI Medical comorbidity: • MI • Non-MI cardiac disease • CHF • CVA • COPD • Asthma • HTN • RA • Preexisting neoplasm • Syncope or seizure • Anaemia • Bleeding disorder Previous spinal surgery Primary spinal diagnosis: • degenerative • trauma • neoplasm • other Level of surgery: • cervical • thoracic • lumbosacral Approach: • anterior • posterior • combined Surgical Invasiveness Index	Pulmonary disorders HTN Cardiac disorders other than HTN DM Neurologic disorders/deficit Hypercholesterolaemia Smoking Cancer/systemic malignancy Gastroesophageal disorder Alcohol/drug abuse Psychiatric disorder *all with multiple ICD-9-CM codes	Procedure details • Diagnosis • Instrumentation • Fusion • Fusion levels • BMP use Comorbidities: • Pulmonary • HTN • Cardiac disorders – not HTN • DM • Neurological disorder • Hypercholesterolaemia • Smoking • Cancer/systemic malignancy • Gastroesophageal disorder • Alcohol/drug abuse • Psychiatric disorder *all with multiple ICD-9-CM codes	Demographics: • Age • Race • Gender Employment Ambulatory status Smoking status Symptoms+duration ASA Patient comorbidities: • CAD • HTN • MI • A. fib • CHF • COPD • Arthritis • Diabetes • Osteoporosis Primary spinal diagnosis Motor deficits Surgery details: • Primary vs. revision • MIS vs. open • No. of levels involved Pre-op narcotic use No. of prior surgeries BMI	Demographics • Age • Sex BMI Number comorbidities CCI ASA Comorbidities: • Anaemia • Depression • Osteoporosis Surgical data: • Primary vs. revision • Single vs. staged • Rod diameter and material • UIV • LIV • Decompression details • Osteotomy details • Interbody fusion • Graft details: iliac crest, BMP • Number of levels fused	Sex Age BMI Comorbidities: • Smoking • HTN • Anxiety • Depression • Diabetes • Bipolar disorder • Parkinson’s disease • Cancer • Anaemia

AE, adverse event; A. fib, atrial fibrillation; ASA, American Society of Anaesthesiologists; BMI, body mass index; BMP, bone morphogenetic protein; CCI, Charlson co-morbidity index; CHF, congestive heart failure; CVA, cerebrovascular accident; COPD, chronic obstructive pulmonary disease; DM, diabetes mellitus; HTN, hypertension; LIV, lower instrumented vertebra; MI, myocardial infarction; MIS, minimally invasive surgery; RA, rheumatoid arthritis; UIV, upper instrumented vertebra

Table 4.

Adverse Events and Outcomes Recorded in Each Tool.

NSQIP	SpineSage	Risk assessment tool	Spine AE Predictor	McGirt	Scheer	Seattle Spine Score
30-day postoperative outcomes: • Mortality • Morbidity • Pneumonia • Cardiac • SSI • UTI • VTE • Renal failure	Cardiac Air embolism; arrest; arrhythmia; CHF; HTN; hypotension; MI; Inadequate fluid therapy; thermodysregulation; other Pulmonary ARDS; empyema; haemothorax; pleural effusion; hypoxia; pneumonia; pneumothorax; PE; respiratory arrest; other GI Ascites; colitis; GI bleeding; ileus; obstruction; pancreatitis; perforation; peritonitis; other Neurological CVA; cerebral hypoperfusion; delirium; diabetes insipidus; electrolyte change; meningitis; SAH/ICH; seizure; withdrawal – alcohol/narcotic; other Hematologic Coagulopathy; DVT; OR haemorrhage>3 L; transfusion; other Urologic -Foley catheter-related trauma; renal insufficiency; urinary retention; UTI; other	All adverse events, including medical, within 30 days of surgery: • Wound infection • Pneumonia • Renal failure • MI • Pulmonary • Neurological • CHF • PE • DVT • Wound hematoma • Other wound complication • UTI • Cardiac dysrhythmia • Delirium *all relevant subheadings under the given ICD-9-CM code included	• Pulmonary complications • CHF • Neurological complications • Pneumonia • Cardiac dysrhythmia • Renal failure • MI • Wound infection • PE • DVT • Wound hematoma • Other wound complications • UTI • Delirium • Infection • Delirium *all relevant subheadings under the given ICD-9-CM code included	Perioperative course including morbidity and need for rehab Mortality	Intra- and perioperative complications within 6 weeks Classed as major and minor according to Glassman et al. (2007) • Cardiopulmonary • Electrolyte • Gastrointestinal • Implant • Infection • MSK • Neurological • Operative • Radiographic • Renal • Vascular • Wound • Other	Complication events within 30 days of surgery: • Cardiac event including MI • Pneumothorax • Pneumonia • Wound infection • Wound dehiscence • UTI • PE • VTE • Unplanned return to OR • Death

ARDS, adult respiratory distress syndrome; CHF, congestive heart failure; CVA, cerebrovascular accident; DVT, deep vein thrombosis; GI, gastrointestinal; HTN, hypertension; ICH, intracerebral haemorrhage; MI, myocardial infarction; MSK, musculoskeletal; PE, pulmonary embolus; OR, operating room; SAH, subarachnoid haemorrhage; SSI, surgical site infection; UTI, urinary tract infection; VTE, venous thromboembolism

All nine development papers reported appropriate detail on their model-building procedures; however, reporting of model iterations was poor, and no study provided detailed information on discarded models. Six models were derived using logistic regression (LR), two using machine learning (ML), and one using an artificial neural network (ANN).

To avoid overfitting, LR classically requires at least 10 participants with the outcome of interest per candidate variable (events per variable [EPV]), although in simulation studies regression models were robust only at EPVs >20¹¹,¹²⁾. For example, the cohort reported by Scheer et al. had 148 complications and 45 initial variables, 20 of which were included in the final model (EPV=3), a lower than ideal number of complications per variable¹³⁾. They reported the highest discrimination of any paper (area under the curve [AUC]=0.89) and high accuracy (87.6%), which, given the low EPV, suggests significant overfitting of the model.

The inclusion of all “candidate” variables is critical: EPV is calculated using all the factors considered in the preliminary analysis, not just those included in the final model¹⁴⁾. Overfitting can be ameliorated to some degree using techniques such as bootstrapping and cross-validation, but no study with a low EPV specifically addressed this.
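The EPV arithmetic can be illustrated with a minimal Python sketch. The function name is hypothetical; the figures are those quoted above for the cohort of Scheer et al. (148 complications, 45 candidate variables, 20 variables in the final model).

```python
def events_per_variable(n_events: int, n_candidate_variables: int) -> float:
    """Events per variable, counting every candidate predictor that was considered."""
    if n_candidate_variables <= 0:
        raise ValueError("need at least one candidate variable")
    return n_events / n_candidate_variables


# Counting all 45 candidate variables gives the EPV of about 3 reported in Table 1.
epv_candidates = events_per_variable(148, 45)

# Counting only the 20 variables retained in the final model would overstate the EPV.
epv_final_only = events_per_variable(148, 20)

print(f"EPV (all candidates): {epv_candidates:.1f}")
print(f"EPV (final model only): {epv_final_only:.1f}")
```

Both values fall below the conventional threshold of 10, but the second calculation makes the model look more than twice as well supported as it is.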

Of the nine development studies, six internally validated their models by dividing the data into training and validation subsets. This random split-sample approach is common in predictive modeling but has been criticized as inefficient: it does not use all available data for model development, and because both subsets come from the same data source, it is unlikely to reveal meaningful differences between them⁴⁾. Temporal or geographical split-sampling, in which the validation subset is drawn from a different time period or region than the development subset, has been suggested to provide a more robust evaluation¹⁵,¹⁶⁾. Alternative approaches include cross-validation (validation between all possible combinations in a split sample) and bootstrapping, both of which can also be employed to ameliorate model optimism. Three studies used cross-validation, but only one reported the iterative performance of the model at each step.
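As an illustration of how cross-validation uses every patient for both development and testing, the following is a minimal sketch of standard k-fold index splitting in Python. It is not the procedure used by any of the reviewed studies; the function name, the cohort size of 10, and the choice of five folds are assumptions made for the example.

```python
import random


def k_fold_indices(n_patients: int, k: int, seed: int = 0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_patients:
        raise ValueError("k must be between 2 and the number of patients")
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)       # reproducible random ordering
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    for held_out in range(k):
        validation = folds[held_out]
        training = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield training, validation


# Hypothetical cohort of 10 patients split into 5 folds.
splits = list(k_fold_indices(n_patients=10, k=5))
validated = sorted(i for _, validation in splits for i in validation)
```

Each patient appears in exactly one validation fold, so a model would be refitted k times and performance reported per fold as well as on average.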

A complete EV should present an assessment of calibration, discrimination, and clinical utility¹⁷⁾. Briefly, calibration is a measure of agreement between predicted and observed outcome rates and is often visually reported, whereas discrimination reflects the likelihood of a model returning a higher risk estimate for participants who actually experience the outcome⁴⁾. Assessment of how useful a risk calculator is in aiding clinical decision-making can be inferred using decision curve analysis or relative utility⁴⁾. Of these measures, discrimination is the most frequently reported in the orthopedic literature, usually in the form of the area under the receiver operating characteristic curve (AUC)¹⁸⁾.
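The definition of discrimination given above corresponds directly to a pairwise calculation: the AUC equals the proportion of event/non-event pairs in which the patient with the event received the higher predicted risk, with ties counted as half. The sketch below assumes hypothetical toy data and a hypothetical function name.

```python
from itertools import product


def concordance_auc(predicted, observed):
    """AUC as the share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for event_risk, non_event_risk in product(events, non_events):
        if event_risk > non_event_risk:
            score += 1.0
        elif event_risk == non_event_risk:
            score += 0.5
    return score / (len(events) * len(non_events))


# Hypothetical predicted risks and outcomes (1 = complication) for six patients.
toy_predicted = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60]
toy_observed = [0, 0, 1, 0, 1, 1]
toy_auc = concordance_auc(toy_predicted, toy_observed)  # 8 of 9 pairs are concordant
print(f"AUC = {toy_auc:.3f}")
```

Because the AUC depends only on ranking, it is unchanged if every predicted risk is halved, which is why discrimination alone says nothing about calibration.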

All studies specified the overall AUC of their model, although only seven reported confidence intervals. Eight studies conducted an assessment of calibration. However, in three studies, calibration was assessed using the Hosmer-Lemeshow test statistic, which is sensitive to subgroup number and sample size and gives no information on the direction of any miscalibration¹⁴⁾. Two studies provided appropriate measures of discrimination and calibration. No study included decision curve analysis or other assessment of clinical utility; thus, extrapolation of model usefulness to the clinical setting was not possible.

Validation studies are recommended to include at least 100 participants with outcomes to avoid biased estimates of model performance¹⁴⁾. Of the six studies, three met this criterion for the primary outcome of any occurring complication; however, two of these three studies assessed secondary outcomes with significantly lower incidence, including one positive assessment of the NSQIP model's mortality prediction despite the very low incidence (n=2)¹⁹⁾. Adjustments can be made for the analysis of rare outcomes, such as substituting precision-recall (PR) metrics for the receiver operating characteristic (ROC) curve; however, no studies reported actual PR metrics despite one including a PR plot²⁰,²¹⁾.
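To show why PR metrics can be more informative than ROC-based metrics when events are rare, the sketch below computes precision, recall, and specificity at a single risk threshold. The ten-patient data set, the threshold of 0.30, and the function name are hypothetical and chosen only for illustration.

```python
def confusion_counts(predicted, observed, threshold):
    """Count true/false positives and negatives when risk >= threshold is flagged."""
    tp = fp = fn = tn = 0
    for risk, outcome in zip(predicted, observed):
        flagged = risk >= threshold
        if flagged and outcome == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif outcome == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical cohort: 2 events among 10 patients.
toy_predicted = [0.60, 0.20, 0.40, 0.35, 0.10, 0.10, 0.05, 0.05, 0.05, 0.05]
toy_observed = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(toy_predicted, toy_observed, threshold=0.30)
precision = tp / (tp + fp)    # share of flagged patients who had the event
recall = tp / (tp + fn)       # sensitivity
specificity = tn / (tn + fp)  # drives the ROC curve; inflated by many non-events
print(precision, recall, specificity)
```

In this toy example specificity looks acceptable (0.75) while only one in three flagged patients actually had the event, a gap that widens as the outcome becomes rarer.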

NSQIP

The NSQIP Risk Calculator is an LR model derived from a retrospective cohort of 1,414,006 surgical procedures across multiple specialties from 393 hospitals in the United States²²⁾. Internal validation was performed on a subcohort of colorectal surgical patients for whom a model had previously been derived. The AUC was 0.944 for mortality and 0.816 for morbidity. Calibration was visually reported using the Hosmer-Lemeshow-type calibration curves and calibration plots, and good performance was observed. The calculator is freely available online at https://riskcalculator.facs.org/RiskCalculator/

Wang et al. assessed the performance of the NSQIP calculator in a cohort of 242 patients aged >60 years undergoing exclusive single-level laminectomies at a single center¹⁹⁾. They found that NSQIP had poor discrimination, poor calibration-in-the-large (Table 2), and poor overall performance (Brier score=0.321) for predicting any complication. The authors concluded that these findings were likely because of the older cohort. Performance analysis of the NSQIP for complication subtype and death was also conducted despite the very low incidence of these outcomes.
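For context on the Brier score, it is the mean squared difference between predicted risk and the observed 0/1 outcome, so lower is better. A minimal sketch with hypothetical data follows; note that assigning a 50% risk to every patient always scores 0.25, which is a useful reference point when reading a value such as 0.321.

```python
def brier_score(predicted, observed):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("predicted and observed must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)


# Hypothetical predictions for four patients, two of whom had a complication.
toy_predicted = [0.1, 0.2, 0.8, 0.3]
toy_observed = [0, 0, 1, 1]

toy_brier = brier_score(toy_predicted, toy_observed)
coin_flip_brier = brier_score([0.5] * 4, toy_observed)  # uninformative reference
print(f"model: {toy_brier:.3f}, always-50%: {coin_flip_brier:.3f}")
```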

Veeravagu et al. found overall discrimination for the NSQIP calculator (AUC=0.67) similar to that reported by Wang et al., together with systematic risk underestimation (Table 2). However, their study included only 69 patients who had complications. Validation studies should ideally include at least 100 participants with outcomes to avoid biased estimates of model performance²³⁾.

McCarthy et al. retrospectively analyzed 641 patients who underwent either cervical or lumbar primary arthrodesis at a single center in the USA²⁴⁾. They reported excellent discrimination for NSQIP with an AUC of 0.801. However, the gross calibration was poor, with a predicted event rate of 6.9% and observed rate of 27.5%.

In all validation studies, the NSQIP calculator exhibited poor calibration for spine surgery. A systematic deviation of observed risk from predicted risk indicates the absence of an important predictor from the model²⁵⁾. The calculator only allows a single current procedural terminology code to be entered. Thus, procedure complexity and invasiveness may not be appropriately represented, particularly for multilevel or dual-approach surgeries.
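The size of this miscalibration can be summarized as an observed-to-expected (O:E) ratio, where 1.0 indicates agreement and values above 1.0 indicate underestimation of risk. The sketch below uses the predicted and observed complication rates for the NSQIP calculator as given in Table 2 and in the text; the function and variable names are hypothetical.

```python
def observed_to_expected(predicted_pct: float, observed_pct: float) -> float:
    """O:E ratio; values above 1 mean the model underestimates risk."""
    if predicted_pct <= 0:
        raise ValueError("predicted rate must be positive")
    return observed_pct / predicted_pct


# (predicted %, observed %) for the NSQIP calculator in each validation cohort.
nsqip_validation_rates = {
    "Veeravagu, 2017": (7.2, 29.0),
    "Wang, 2017": (14.0, 44.0),
    "McCarthy, 2020": (6.9, 27.5),
}

oe_ratios = {
    study: round(observed_to_expected(predicted, observed), 1)
    for study, (predicted, observed) in nsqip_validation_rates.items()
}
print(oe_ratios)
```

In all three cohorts, complications were observed roughly three to four times as often as the calculator predicted.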

Broda et al. addressed these deficiencies by deriving a model using only the subset of the NSQIP cohort that underwent spine surgery for degenerative diseases of the cervical or lumbar spine (n=177,928). Univariate analysis was employed to find significant predictors, which were then included in the multivariate LR model²⁶⁾. This approach, though common, ignores important confounding relationships between predictors in the final model and omits factors that may significantly influence outcome in the multivariate analysis²⁷⁾.

External validity was evaluated via comparison with the extant NSQIP calculator; however, EV requires testing against a cohort independent from that used for development¹⁵,²⁸⁾. The complication rate was 11.1%, consistent with the low overall morbidity rates in the NSQIP cohort. A final AUC of 0.75 was reported for any complication, with good agreement between the predicted and observed complication rates within the 12 risk groups analyzed.

Kim et al. derived yet another model from the NSQIP cohort of spine surgeries (n=22629) using ANNs²⁹⁾. The outcomes analyzed were cardiac and wound complications, venous thromboembolism (VTE), and mortality. The EPV was 2.8 for mortality and 41 for wound complications. The authors employed stepwise LR to select variables for the final model based on coefficient magnitude and significance in the initial regression model. This approach is counterintuitive as ANNs are designed to “learn” associations between many different variables without influence of a prespecified hypothesis³⁰⁾. After the split-sample training and validation, the models were tested and compared on a further split sample. LR exhibited a better AUC than ANN for VTE (0.588 vs. 0.567), wound complications (0.61 vs. 0.6), and mortality (0.7 vs. 0.68). The calibration of the model was not assessed.

SpineSage

The SpineSage calculator was derived from a cohort of 1476 adult patients who underwent spine surgery at one of two tertiary centers in the USA³¹⁾. Variables were selected based on previously published analyses and on whether they were known confounders or had a strong univariate association. Broad definitions of postoperative complications were used, and the definition of a major complication was more restrictive than that used in NSQIP (cardiac arrest, myocardial infarction, acute respiratory distress syndrome, postop hypoxia, pulmonary embolism, bowel perforation, and/or meningitis). The authors employed the 50:50 cross-validation method for internal validation. SpineSage is freely available online at https://depts.washington.edu/spinersk/

The AUC was 0.76 for any medical complication and 0.81 for major complications. Unfortunately, the number of patients with major complications was not reported; thus, the EPV for major complication is unknown. The authors described calibration using the Hosmer-Lemeshow test; however, no statistic was reported, and no calibration plot was included.

The final model was disseminated at the time of publication as an online calculator (SpineSage.com). The calculated risks were stratified using the surgical invasiveness index (SII), a validated tool that measures the invasiveness of spinal procedures based on the number of vertebrae decompressed, fused, or instrumented anterior and/or posterior to the pedicles³²⁾. In the development study, SII>25 was the greatest risk factor in the univariate analysis (odds ratio: 6.95, 95% confidence interval: 3.43-10.3, P<0.001), suggesting that granular assessment of surgical factors is critical in the development of an accurate model.

Kasparek et al. investigated the performance of the SpineSage tool for major and all medical complications at a single center in Vienna (n=273)³³⁾. The study was underpowered, with 44 occurrences of the main outcome (any medical complication)²⁵⁾. Nine patients had major complications. The model showed similar discrimination to the derivation study (Tables 1 and 2) with reasonable gross calibration. However, the authors assessed calibration on the basis of median risk in arbitrarily defined risk groups rather than reporting a calibration curve or intercept, an approach that has been criticized because it can lead to false assumptions about the risk profile within these arbitrary groups³⁴,³⁵⁾.

Risk assessment tool

The RAT was derived from a large cohort of patients (n=279,315) using an administrative claims database (MarketScan)³⁶⁾. The PROBAST guidelines suggest that development studies have a higher potential for bias when participant data are collected from existing sources, such as registries, as data are often collected for a purpose other than model development (e.g., administrative claims). Historically, there have been important issues with the use of claims data to judge clinical outcomes³⁷⁻³⁹⁾. The authors cited a study they previously published claiming improved sensitivity for capturing AEs using longitudinal data captured in MarketScan for 30 days; however, in that study, the 30-day outcomes were simply compared with the complication rates at an earlier time point, not with retrospectively/prospectively collected data based on electronic medical records²³⁾. The AUCs were presented for the cohort as a whole and for each surgical subgroup. No assessment of calibration was performed, but good calibration is expected in a cohort of this size.

Veeravagu et al. conducted a prospective validation of the RAT (n=246), which showed good calibration-in-the-large but poor discrimination (Table 2)⁴⁰⁾. A calibration curve was reported to show a good fit up to a predicted/actual risk of 40% with progressive overestimation of risk thereafter. The complication incidence was 73. Calibration-in-the-large was very poor for complication subtype; however, the event rate for these complications was extremely low (2-26). The AUC for the NSQIP calculator in this population was 0.67 with systematic risk underestimation in each of the groups analyzed.

For their analysis of the accuracy of risk prediction, Veeravagu et al. reclassified the risk predictions into three groups (low, medium, and high) defined by tertiles, so that each contained a similar number of patients. As previously discussed, such reclassification can result in loss of information and risks bias; when it is performed, the ideal method is to reclassify into at least 10 subgroups¹⁴⁾.

Spine AE predictor

The same research group that developed the RAT created another model (SpineAE Predictor) in 2019 using similar methods, with the addition of the Medicare and Medicaid databases⁴¹⁾. In this new cohort, the previous RAT model exhibited an AUC of 0.6 and poor calibration (intercept −0.63, slope 0.301). Spine AE Predictor is freely available at https://spineaepredictor.shinyapps.io/app-1/
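The calibration intercept and slope quoted above can be understood through logistic recalibration: the observed outcome is regressed on the logit of the predicted risk, and a well-calibrated model yields an intercept near 0 and a slope near 1. The following is a minimal sketch under stated assumptions, not the method used by the authors: it fits both parameters jointly by Newton-Raphson on hypothetical toy data, and it requires predicted risks strictly between 0 and 1.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def calibration_intercept_slope(predicted, observed, iterations=25, tol=1e-10):
    """Fit logit(P(y=1)) = a + b * logit(predicted risk) by Newton-Raphson."""
    x = [logit(p) for p in predicted]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(iterations):
        fitted = [expit(a + b * xi) for xi in x]
        # Score (gradient of the log-likelihood).
        g0 = sum(y - f for y, f in zip(observed, fitted))
        g1 = sum((y - f) * xi for y, f, xi in zip(observed, fitted, x))
        # Information matrix entries.
        w = [f * (1.0 - f) for f in fitted]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


# Hypothetical predictions that are too extreme relative to the outcomes.
toy_predicted = [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.5, 0.5, 0.7, 0.7, 0.9, 0.9]
toy_observed = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1]
intercept, slope = calibration_intercept_slope(toy_predicted, toy_observed)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

A slope below 1, as in this toy example and in the RAT result above, indicates predictions that are too extreme: high risks are overestimated and low risks underestimated.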

The SpineAE Predictor was prospectively validated by Fatemi et al. at a single institution (n=276). However, the RAT model was also used for some aspects of the analysis, with conflicting information in the study regarding whether the risk thresholds and absolute risk were calculated using RAT or SpineAE²⁰⁾. The authors reported an AUC of 0.64, with no assessment of calibration. The bulk of the analysis was based on the sensitivity/specificity of the tool according to the risk thresholds. The calculator had a sensitivity of 0.38 for high-risk patients (probability of AE>0.278). The accuracy of the calculator ranged from 0.5 to 0.69, depending on the subgroup, with no overall accuracy reported. The authors reported their results as a successful EV.

Notably, the same research group that developed the RAT and the SpineAE Predictor also contributed to and performed the subsequent validation, thereby increasing the risk of bias⁴²⁾. Ideally, EV should be conducted by independent actors¹⁸⁾.

McGirt

McGirt et al. used a single-center patient database (n=1803) to derive an LR model for complications, readmission, 12-month Oswestry Disability Index, and a composite outcome of any unplanned occurrence⁴³⁾. Varying sample sizes from 750 to 1200 were used in model development “depending on outcome of interest.” Patients with missing information were excluded from all analyses. Surgical procedure variables included the type of procedure: primary or revision, and minimally invasive or open. No explicit definition of a complication was given, but the complication rate was 6.6%. A total of 46 model covariates were listed, yielding an EPV of 2.6. For postoperative complications, model performance was assessed using AUC (0.82) and the Hosmer-Lemeshow test for calibration. No exact value was reported for the latter; it was stated only as >0.05. The model coefficients were reported, but there was no intercept. We found this development study to be at a high risk of bias in each of the four PROBAST domains, with the most relevant point being the degree of overfitting. The model has yet to undergo EV.

Scheer

Scheer et al. used an ML algorithm to derive a model for complications from a multicenter adult spinal deformity database (n=557)¹³⁾. The model had an extremely low EPV (3)⁴⁴⁾. It is unclear whether the model returns a binary categorical outcome or an absolute risk, but the former is inferred, considering the absence of information on predicted and observed risks. The AUC and overall “accuracy” were reported, without a supporting confusion matrix detailing metrics such as sensitivity and specificity. No calibration assessment was conducted. The authors acknowledged that a limitation of this type of model is the lack of transparency: the specific influence of the covariates is unknown, limiting the potential application for both risk adjustment and prognostication.

Seattle spine score

Buchlak et al. derived an LR model (Seattle Spine Score [SSS]) using a retrospective cohort of adult spinal deformity patients (n=136)⁴⁵⁾. The authors included variables that were deemed clinically relevant or achieved a univariate significance level of 0.2 or less. The low EPV in this study is compounded by evidence suggesting that ML models exhibit high optimism even when the EPV is >200⁴⁴⁾. The SSS can be accessed at https://safetyinspinesurgery.com/publications/seattle-spine-score/

Clinical utility was assessed by comparing predictive performance with and without access to the model based on 100 de-identified cases. Performance was marginally improved with access to the model (61% vs. 50% accuracy), although it appears the cases used in testing were pulled from the same cohort from which the model was developed.

Discussion

In 2014, a systematic review of EV studies reported that 54% of the studies did not mention missing data and that 67% did not evaluate calibration. The authors concluded that the vast majority of EV studies were poorly designed and suggested that this may explain the lack of uptake of predictive models in clinical practice⁴⁶⁾. A more recent systematic review of the general orthopedic literature obtained similar findings and recommended multicenter collaboration to increase the inadequate sample sizes in the current EV studies¹⁸⁾. Publication bias against negative EV studies is also a concern. Despite the existence of robust, explicit guidance in the areas of model development and evaluation, there is poor awareness of these guidelines; none of the studies we analyzed mentioned an accepted framework.

Furthermore, the focus on discrimination as the primary measure of model performance should be highlighted, particularly in the context of small EV studies where ROC may be particularly misleading²¹⁾. Decision curve analysis was conspicuously lacking from all the studies we analyzed, presenting a further barrier to face validity of the models. This may be due to poor awareness of the technique in the orthopedic field or to the perceived difficulty of reporting and interpreting decision curves; the analysis itself is prescriptive and easy to undertake, with freely disseminated step-by-step instructions for all major statistical packages published online⁴⁷,⁴⁸⁾.
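To indicate how little computation decision curve analysis requires, the sketch below evaluates the standard net benefit formula at a single threshold probability, net benefit = TP/n − (FP/n) × pt/(1 − pt), and compares the model with the default strategies of treating every patient as high risk or none. The data, the threshold of 0.30, and the function names are hypothetical; a full decision curve repeats this calculation across a range of thresholds.

```python
def net_benefit(predicted, observed, threshold):
    """Net benefit of flagging patients whose predicted risk is >= threshold."""
    n = len(observed)
    tp = sum(1 for p, y in zip(predicted, observed) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(predicted, observed) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(observed, threshold):
    """Net benefit when every patient is treated as high risk."""
    prevalence = sum(observed) / len(observed)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical cohort: 2 events among 10 patients.
toy_predicted = [0.60, 0.20, 0.40, 0.35, 0.10, 0.10, 0.05, 0.05, 0.05, 0.05]
toy_observed = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

nb_model = net_benefit(toy_predicted, toy_observed, threshold=0.30)
nb_treat_all = net_benefit_treat_all(toy_observed, threshold=0.30)
nb_treat_none = 0.0  # by definition
print(nb_model, nb_treat_all, nb_treat_none)
```

A model is only clinically useful at a given threshold if its net benefit exceeds both default strategies, which is the comparison a decision curve plots.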

In this review, only models that published a web-based tool have undergone EV. Although many of the LR development studies published model coefficients, none reported an intercept, whereas the ML/ANN models were completely opaque. Even if these metrics are reported, validation requires their incorporation into a statistical package and subsequent generation of risk scores, a step that is obviated by an online tool.

Significant heterogeneity was observed in the definition of a complication. Some models focused on a composite outcome of “any complication,” whereas others attempted to predict specific complications. The latter may be more suited to risk stratification: a patient may question why their specific risk of a urinary tract infection or deep vein thrombosis is relevant when more salient risks, such as paralysis and admission to the intensive care unit, are discussed in the clinic. Standardized, composite outcome measures may improve the generalizability and comparison of future models⁴⁹⁾. Furthermore, composite outcome measures of clinically relevant complications allow relatively small cohorts to support statistically valid conclusions. For example, the SSI rate after instrumented fusion ranges from 2.4% to 8.5%; an EV study examining this complication will require a minimum sample size of 1176 to 4166 to make robust conclusions regarding predictive ability⁵⁰⁾.
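The sample size range in this example follows from the recommendation of at least 100 participants with the outcome: the required cohort is 100 divided by the expected event rate. A minimal sketch, with a hypothetical function name, is shown below; it rounds up to guarantee 100 events, which gives values one higher than the truncated figures quoted in the text.

```python
import math


def minimum_validation_sample(event_rate: float, required_events: int = 100) -> int:
    """Smallest cohort expected to contain at least `required_events` outcomes."""
    if not 0.0 < event_rate <= 1.0:
        raise ValueError("event_rate must be in (0, 1]")
    return math.ceil(required_events / event_rate)


# SSI rates after instrumented fusion quoted in the text: 8.5% and 2.4%.
n_at_high_rate = minimum_validation_sample(0.085)
n_at_low_rate = minimum_validation_sample(0.024)

# A composite outcome occurring in 25% of patients needs a far smaller cohort.
n_composite = minimum_validation_sample(0.25)
print(n_at_high_rate, n_at_low_rate, n_composite)
```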

It is perhaps surprising that a database as large as the NSQIP has not yielded a reliable RAT. However, as pointed out by others, the database covers various surgical procedures rather than spinal procedures alone; thus, the influence of variables may be either enhanced or diminished by the inclusion of nonspinal procedures. The approach by Broda et al. in this regard may allow a more refined analysis, but as noted above, the methodology used needs improvement²⁶⁾. Furthermore, validation of the NSQIP may yield disappointing results when performed in centers outside the USA, where healthcare resources and funding models are often strikingly different.

Focusing only on spinal surgical procedures improves the accuracy of model development, as seen in SpineSage. In comparison with SpineSage, which uses the SII to scale risk according to the selected variables, NSQIP does not provide a nuanced assessment of the spinal procedure³²⁾. However, it is notable that this calculator does not include comorbid conditions, such as fibromyalgia, for which there is clear evidence of a high complication rate⁵¹⁾. Such omissions likely arise because clinicians already know from experience that these conditions predispose to complications: surgery is avoided, and the condition therefore fails to appear as a variable in any data analysis. This potential pitfall supports the need to add prospectively to datasets on a continuing basis, allowing interim reanalysis, lest previous findings be forgotten and omitted from the risk assessment.

Considering the shortcomings of current EV studies for complications following spine surgery, it is unsurprising that clinical impact studies of any of the described models are absent. Clinicians should exercise caution before adopting these tools for routine use in their daily practice and should consider conducting their own validation studies, because regional variation in patient demographics and healthcare resources may result in different performance. Future efforts in this field should adhere to established guidelines for the development and validation of these tools and encourage continued assessment of their accuracy across multiple systems.

Conflicts of Interest: JFB received consultation fees from Fisher & Paykel and a teaching honorarium from Medtronic

Sources of Funding: Nil relevant

Author Contributions: MCJ completed literature review and extraction of data, initial drafting; JFB developed concept and methodology, assisted in data interpretation, final drafting

Ethical Approval: Not applicable; this is a review article

Informed Consent: Not applicable; no patients were involved

References

1. Iezzoni LI. Risk adjustment for measuring health care outcomes. Chicago: Health Administration Press; 1997; 471-516.
2. MIPS Explore Measures - QPP. Accessed September 14, 2021. https://qpp.cms.gov/mips/explore-measures
3. Powell JM, Rai A, Foy M, et al. The “three-legged stool”: a system for spinal informed consent. Bone Joint J. 2016;98-B(11):1427-30.
4. Moons KG, Altman DG, Reitsma JB, et al. Transparent reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. 2015;162(1):W1-73.
5. Lubelski D, Hersh A, Azad TD, et al. Prediction models in degenerative spine surgery: a systematic review. Glob Spine J. 2021;11(1_suppl):79S-88S.
6. Romiyo P, Ding K, Dejam D, et al. Systematic review and evaluation of predictive modeling algorithms in spinal surgeries. J Neurol Sci. 2021;420:117184.
7. Collins GS, Reitsma JB, Altman DG, et al. Transparent reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD). Ann Intern Med. 2015;162(10):735-6.
8. Wolff RF, Moons KG, Riley RD, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019;170(1):51-8.
9. Abstracts of the 25th Cochrane Colloquium. In: Abstracts of the 25th Cochrane Colloquium. Wiley; 2018. doi: 10.1002/14651858.CD201801
10. Abstracts accepted for the 26th Cochrane Colloquium. Cochrane Database Syst Rev. Published online January 31, 2020. doi: 10.1002/14651858.CD201901
11. Courvoisier DS, Combescure C, Agoritsas T, et al. Performance of logistic regression modeling: beyond the number of events per variable, the role of data structure. J Clin Epidemiol. 2011;64(9):993-1000.
12. van Smeden M, de Groot JA, Moons KG, et al. No rationale for 1 variable per 10 events criterion for binary logistic regression analysis. BMC Med Res Methodol. 2016;16(1):163.
13. Scheer JK, Smith JS, Schwab F, et al. Development of a preoperative predictive model for major complications following adult spinal deformity surgery. J Neurosurg Spine. 2017;26(6):736-43.
14. Moons KG, Wolff RF, Riley RD, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med. 2019;170(1):W1-33.
15. Altman DG, Royston P. What do we mean by validating a prognostic model? Stat Med. 2000;19(4):453-73.
16. Altman DG, Vergouwe Y, Royston P, et al. Prognosis and prognostic research: validating a prognostic model. BMJ. 2009;338(7708):b605.
17. Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J. 2014;35(29):1925-31.
18. Groot OQ, Bindels BJ, Ogink PT, et al. Availability and reporting quality of external validations of machine-learning prediction models with orthopedic surgical outcomes: a systematic review. Acta Orthop. 2021;92(4):385-93.
19. Wang X, Hu Y, Zhao B, et al. Predictive validity of the ACS-NSQIP surgical risk calculator in geriatric patients undergoing lumbar surgery. Medicine. 2017;96(43):e8416.
20. Fatemi P, Zhang Y, Han SS, et al. External validation of a predictive model of adverse events following spine surgery. Spine J. 2022;22(1):104-12.
21. Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLOS ONE. 2015;10(3):e0118432.
22. Bilimoria KY, Liu Y, Paruch JL, et al. Development and evaluation of the universal ACS NSQIP surgical risk calculator: a decision aid and informed consent tool for patients and surgeons. J Am Coll Surg. 2013;217(5):833-42.
23. Veeravagu A, Cole TS, Azad TD, et al. Improved capture of adverse events after spinal surgery procedures with a longitudinal administrative database. J Neurosurg Spine. 2015;23(3):374-82.
24. McCarthy MH, Singh P, Nayak R, et al. Can the American College of Surgeons risk calculator predict 30-day complications after spine surgery? Spine. 2020;45(9):621-8.
25. Vergouwe Y, Steyerberg EW, Eijkemans MJ, et al. Substantial effective sample sizes were required for external validation studies of predictive logistic regression models. J Clin Epidemiol. 2005;58(5):475-83.
26. Broda A, Sanford Z, Turcotte J, et al. Development of a risk prediction model with improved clinical utility in elective cervical and lumbar spine surgery. Spine. 2020;45(9):E542-51.
27. Sun GW, Shook TL, Kay GL. Inappropriate use of bivariable analysis to screen risk factors for use in multivariable analysis. J Clin Epidemiol. 1996;49(8):907-16.
28. Moons KG, Kengne AP, Grobbee DE, et al. Risk prediction models: II. External validation, model updating, and impact assessment. Heart. 2012;98(9):691-8.
29. Kim JS, Merrill RK, Arvind V, et al. Examining the ability of artificial neural networks machine learning models to accurately predict complications following posterior lumbar spine fusion. Spine. 2018;43(12):853-60.
30. Forsström JJ, Dalton KJ. Artificial neural networks for decision support in clinical medicine. Ann Med. 1995;27(5):509-17.
31. Lee MJ, Konodi MA, Cizik AM, et al. Risk factors for medical complication after spine surgery: a multivariate analysis of 1,591 patients. Spine J. 2012;12(3):197-206.
32. Mirza SK, Deyo RA, Heagerty PJ, et al. Development of an index to characterize the “invasiveness” of spine surgery: validation by comparison to blood loss and operative time. Spine. 2008;33(24):2651-61.
33. Kasparek MF, Boettner F, Rienmueller A, et al. Predicting medical complications in spine surgery: evaluation of a novel online risk calculator. Eur Spine J. 2018;27(10):2449-56.
34. Bennette C, Vickers A. Against quantiles: categorization of continuous variables in epidemiologic research, and its discontents. BMC Med Res Methodol. 2012;12:21. doi: 10.1186/1471-2288-12-21
35. Shahsavari S, Naderi M, Abbasi M. Letter to the Editor concerning “Predicting medical complications in spine surgery: evaluation of a novel online risk calculator” by M. F. Kasparek et al. (Eur Spine J: DOI 10.1007/s00586-018-5707-9). Eur Spine J. 2018;27(11):2885-6.
36. Ratliff JK, Balise R, Veeravagu A, et al. Predicting occurrence of spine surgery complications using “big data” modeling of an administrative claims database. J Bone Joint Surg Am. 2016;98(10):824-34.
37. Fisher ES, Whaley FS, Krushat WM, et al. The accuracy of Medicare's hospital claims data: progress has been made, but problems remain. Am J Public Health. 1992;82(2):243-8.
38. Guimarães PO, Krishnamoorthy A, Kaltenbach LA, et al. Accuracy of medical claims for identifying cardiovascular and bleeding events after myocardial infarction: a secondary analysis of the TRANSLATE-ACS study. JAMA Cardiol. 2017;2(7):750-7.
39. Rudrapatna VA, Glicksberg BS, Avila P, et al. Accuracy of medical billing data against the electronic health record in the measurement of colorectal cancer screening rates. BMJ Open Qual. 2020;9(1):e000856.
40. Veeravagu A, Li A, Swinney C, et al. Predicting complication risk in spine surgery: a prospective analysis of a novel risk assessment tool. J Neurosurg Spine. 2017;27(1):81-91.
41. Han SS, Azad TD, Suarez PA, et al. A machine learning approach for predictive models of adverse events following spine surgery. Spine J. 2019;19(11):1772-81.
42. Siontis GC, Tzoulaki I, Castaldi PJ, et al. External validation of new risk prediction models is infrequent and reveals worse prognostic discrimination. J Clin Epidemiol. 2015;68(1):25-34.
43. McGirt MJ, Sivaganesan A, Asher AL, et al. Prediction model for outcome after low-back surgery: individualized likelihood of complication, hospital readmission, return to work, and 12-month improvement in functional disability. Neurosurg Focus. 2015;39(6):E13.
44. van der Ploeg T, Austin PC, Steyerberg EW. Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints. BMC Med Res Methodol. 2014;14:137.
45. Buchlak QD, Yanamadala V, Leveque JC, et al. The Seattle spine score: predicting 30-day complication risk in adult spinal deformity surgery. J Clin Neurosci. 2017;43:247-55.
46. Collins GS, de Groot JA, Dutton S, et al. External validation of multivariable prediction models: a systematic review of methodological conduct and reporting. BMC Med Res Methodol. 2014;14:40.
47. Vickers AJ, van Calster B, Steyerberg EW. A simple, step-by-step guide to interpreting decision curve analysis. Diagn Progn Res. 2019;3:18.
48. Vickers AJ, Holland F. Decision curve analysis to evaluate the clinical benefit of prediction models. Spine J. 2021;21(10):1643-8.
49. Clavien PA, Vetter D, Staiger RD, et al. The Comprehensive Complication Index (CCI®): added value and clinical perspectives 3 years “down the line”. Ann Surg. 2017;265(6):1045-50.
50. Schimmel JJ, Horsting PP, de Kleuver M, et al. Risk factors for deep surgical site infections after spinal fusion. Eur Spine J. 2010;19(10):1711-9.
51. Donnally CJ 3rd, Vakharia RM, Rush AJ 3rd, et al. Fibromyalgia as a predictor of increased postoperative complications, readmission rates, and hospital costs in patients undergoing posterior lumbar spine fusion. Spine. 2019;44(4):E233-8.

