Author manuscript; available in PMC: 2022 Sep 1.
Published in final edited form as: Spine (Phila Pa 1976). 2021 Sep 1;46(17):1181–1190. doi: 10.1097/BRS.0000000000004017

Administrative Data are Unreliable for Ranking Hospital Performance Based on Serious Complications after Spine Fusion

Jacob K Greenberg 1, Margaret A Olsen 2, John Poe 5, Christopher F Dibble 1, Ken Yamaguchi 3,6, Michael P Kelly 3, Bruce L Hall 4, Wilson Z Ray 1
PMCID: PMC8363514  NIHMSID: NIHMS1676590  PMID: 33826589

Abstract

Study Design:

Retrospective analysis of administrative billing data.

Objective:

To evaluate the extent to which a metric of serious complications determined from administrative data can reliably profile hospital performance in spine fusion surgery.

Summary of Background Data:

While payers are increasingly focused on implementing pay-for-performance measures, quality metrics must reliably reflect true differences in performance among the hospitals profiled.

Methods:

We used State Inpatient Databases from nine states to characterize serious complications after elective cervical and thoracolumbar fusion. Hierarchical logistic regression was used to risk-adjust differences in case mix, along with variability from low case volumes. The reliability of this risk-stratified complication rate (RSCR) was assessed as the variation between hospitals that was not due to chance alone, calculated separately by fusion type and year. Finally, we estimated the proportion of hospitals that had sufficient case volumes to obtain reliable (≥ 0.7) complication estimates.

Results:

From 2010-2017 we identified 154,078 cervical and 213,133 thoracolumbar fusion surgeries. In total, 4.2% of cervical fusion patients had a serious complication, and the median RSCR increased from 4.2% in 2010 to 5.5% in 2017. The reliability of the RSCR for cervical fusion was poor and varied substantially by year (range 0.04-0.28). Overall, 7.7% of thoracolumbar fusion patients experienced a serious complication, and the RSCR varied from 6.8%-8.0% during the study period. Although still modest, the RSCR reliability was higher for thoracolumbar fusion (range 0.16-0.43). Depending on the study year, 0-4.5% of hospitals had sufficient cervical fusion case volume to report reliable (≥ 0.7) estimates, whereas 15-36% of hospitals reached this threshold for thoracolumbar fusion.

Conclusion:

A metric of serious complications was unreliable for benchmarking cervical fusion outcomes and only modestly reliable for thoracolumbar fusion. When assessed using administrative datasets, these measures appear inappropriate for high-stakes applications, such as public reporting or pay-for-performance.

Keywords: Reliability, administrative billing data, quality measures, spine surgery

INTRODUCTION

Degenerative spine disease is one of the most costly conditions in the United States healthcare system.1 Spine fusion in particular is one of the most common inpatient operations and has become more common than less invasive decompression surgeries.2-4 Compared to decompression alone, spine fusion is associated with increased complication rates and cost.2, 5 As both the United States population and the average age of spine fusion patients have increased,6, 7 so too has the need to identify and support safe surgical practices.

Multiple studies have reported variation in spine fusion outcomes across geographic regions.3, 8, 9 Less attention has been focused on hospital-level variation in complication rates following fusion.10 However, understanding the influence of hospital-level variation is imperative given the growing emphasis on benchmarking hospital performance based on safety metrics. The Centers for Medicare and Medicaid Services (CMS) already applies a financial penalty for hospitals with elevated rates of hospital-acquired conditions,11 and also specifically benchmarks complication rates of hip and knee arthroplasty.12 While spine surgery complication rates are not currently subject to such national benchmarking, the development of multiple quality improvement registries reflects the growing focus on profiling hospital safety.13-15

Despite numerous studies investigating spine surgery complications,16-18 there is a dearth of evidence investigating the variability in outcomes across hospitals or the extent to which complication rates are a valid metric for benchmarking hospital quality. Therefore, the objectives of this study were: 1) to characterize the hospital-level variation in serious complications (determined from administrative data) following spine fusion; and 2) to evaluate the validity of using a composite metric of serious complications determined from administrative data to benchmark hospital performance.

MATERIALS AND METHODS

Study Dataset

Data for this analysis came from State Inpatient Databases (SID) provided by the Healthcare Cost and Utilization Project supported by the Agency for Healthcare Research and Quality.19 The SID are administrative billing databases that provide comprehensive, all-payor data on non-federal hospital admissions in participating states.20 We used the SID from Arkansas, Florida, Iowa, Massachusetts, Maryland, Nebraska, New York, Vermont, and Wisconsin, which included a patient-level identifier allowing individual patient visits to be tracked across admissions. We included patients treated from 2010-2017. The number of patients contributed by each state and the number of hospitals included in each year’s analysis are shown in the supplemental digital content (SDC, E-Tables 1-2).

Patient Population

We included patients 18 years old and older who underwent elective cervical or thoracolumbar spine fusion. The full list of International Classification of Diseases (ICD)-9 and ICD-10 codes used to identify these patients is included in the SDC (E-Table 3). Patients with cervicothoracic fusions were included in the cervical cohort. We excluded patients who were admitted from the emergency department or transferred from another hospital, and those who had a diagnosis of spine trauma, neoplasm, or cauda equina syndrome during the index admission, along with patients who underwent fracture repair. Patients were also excluded if they had infection listed as their primary diagnosis code or "present on arrival" during the index admission. Patients with infection listed in other diagnosis codes were treated as having had a postoperative complication. We only included hospitals treating more than 25 patients per category (cervical or thoracolumbar fusion) per year, to ensure a minimum sample size for evaluating hospital performance.21
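
As a concrete illustration of these cohort rules, the minimal sketch below applies them to a hypothetical SID extract in R; the data frame and all column names (e.g. sid_raw, trauma_dx) are assumptions for illustration, not the authors' code or the actual SID layout.

```r
# Illustrative only: apply the inclusion/exclusion rules described above to a
# hypothetical SID extract. Column names are assumed.
library(dplyr)

cohort <- sid_raw %>%                                    # hypothetical patient-level extract
  filter(age >= 18,
         elective_admission,                             # drop ED admissions and transfers
         !trauma_dx, !neoplasm_dx, !cauda_equina_dx,     # exclusion diagnoses at index admission
         !fracture_repair,
         !infection_primary_dx, !infection_present_on_arrival) %>%
  group_by(hospital_id, fusion_type, year) %>%
  filter(n() > 25) %>%                                   # keep hospitals with >25 cases per category per year
  ungroup()
```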

Confounder and Outcome Assessment

The outcome in this study was the occurrence of a serious postoperative complication within 30 days of surgery. Serious complications were defined as return to the operating room, stroke, pulmonary embolism, myocardial infarction, or death, or a total 30-day length of stay at or above the procedure-specific 90th percentile in conjunction with another complication (e.g. respiratory, neurological, thrombotic, wound-related). Billing procedure codes were evaluated for 30 days from surgery, and diagnosis codes were evaluated for any admission within 30 days of surgery. The full list of complications is shown in the SDC (E-Table 3). These complications are similar to those used by CMS to benchmark hospital performance following arthroplasty.22 The added length-of-stay condition has been applied in multiple studies to focus on complications that had a major impact on the patient's hospital course.23-25 While some studies used the 75th percentile as the length-of-stay cutoff, we used the 90th percentile to be consistent across procedures, given that some procedures (e.g. anterior cervical fusion) had a small difference between the median and 75th percentile (SDC, E-Table 4).
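
The composite outcome described above could be flagged roughly as in the sketch below, assuming hypothetical indicator columns derived from the ICD code lists in E-Table 3; the 90th-percentile length-of-stay cutoff is computed per procedure.

```r
# Sketch of the composite "serious complication" flag; all indicator columns are assumed.
cohort <- cohort %>%
  group_by(procedure_type) %>%
  mutate(los_90th = quantile(total_los_30d, 0.90, na.rm = TRUE)) %>%  # per-procedure LOS cutoff
  ungroup() %>%
  mutate(serious_complication =
           return_to_or | stroke | pulmonary_embolism | myocardial_infarction | died |
           (total_los_30d >= los_90th & other_complication))
```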

Potential confounding variables used for risk-adjustment included: comorbidities defined using the Elixhauser index that were diagnosed in at least 1% of the population during index admission or the year prior;26 patient demographics; insurance status; other clinical variables relevant to spine patients (e.g. myelopathy); and surgical characteristics.27 We also conducted several sensitivity analyses, described in the SDC, to evaluate whether variations in ICD coding across time impacted the final results (SDC, E-Figure 1AB, E-Table 5).
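
For example, the ≥1% prevalence rule for Elixhauser comorbidities could be applied as in this sketch, assuming the comorbidities have already been coded as hypothetical 0/1 indicator columns prefixed with elix_.

```r
# Keep only comorbidity indicators present in at least 1% of the cohort (assumed naming).
elix_cols  <- grep("^elix_", names(cohort), value = TRUE)
prevalence <- colMeans(cohort[elix_cols], na.rm = TRUE)
adjusters  <- names(prevalence)[prevalence >= 0.01]      # candidate risk-adjustment covariates
```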

Statistical Analysis

To evaluate hospital complication rates, adjusting for differences in patient characteristics and case mix, we used hierarchical logistic regression. These models evaluated how patient risk factors impacted performance at an average hospital ("fixed effects") as well as how individual hospitals performed compared to their peers ("random effect"). Each individual hospital's predicted performance was based on both its patient/case mix and its individual performance (i.e. whether it performed better or worse than other hospitals). To mitigate variability in performance predictions at hospitals with fewer patients, this approach applies a "shrinkage" penalty based on empirical Bayes techniques.28, 29 Therefore, each hospital's predicted complication rate is "shrunk" toward the population average, with the degree of shrinkage inversely related to hospital volume (i.e. smaller-volume hospitals experience greater shrinkage).30 By accounting for differences in both case mix and hospital volume, this modeling approach adjusts for differences in both patient risk and the reliability of the hospital-level estimates.
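
A minimal sketch of such a random-intercept (hierarchical) logistic model, using the lme4 package in R, is shown below; the covariates and variable names are placeholders rather than the study's exact specification.

```r
library(lme4)

# Random intercept for hospital; fixed effects capture patient and case-mix factors.
fit <- glmer(serious_complication ~ age + female + elixhauser_index + myelopathy +
               multilevel_fusion + insurance + (1 | hospital_id),
             data = cohort, family = binomial)

# Empirical-Bayes ("shrunken") hospital effects: estimates for low-volume hospitals
# are pulled toward zero, i.e. toward the population average.
hospital_effects <- ranef(fit)$hospital_id
```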

The final risk/reliability-adjusted complication rate (RSCR) for each hospital was calculated as the number of complications predicted for an individual hospital (based on its observed performance) divided by the number predicted for an average hospital with the same patient/case mix, multiplied by the population average complication rate.21, 22 Thus, the RSCR reflects both a hospital’s performance relative to its peers and also the overall population complication rate. Each statistical model was run separately by fusion type and year. The predictive performance of each model was calculated using the c-statistic, which reflects a model’s ability to discriminate which patients will develop complications.31 The c-statistic ranges from 0.5 (no better than chance) to 1.0 (perfect discrimination).
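
In code, the RSCR can be approximated as below: the numerator uses each hospital's own (shrunken) random effect, the denominator sets the random effect to zero so that the same patients are "treated" at an average hospital, and the ratio is scaled by the overall complication rate. This is a sketch consistent with the description above, not the authors' implementation.

```r
cohort$p_predicted <- predict(fit, type = "response")                # includes the hospital's random effect
cohort$p_expected  <- predict(fit, type = "response", re.form = NA)  # average hospital, same case mix

rscr <- cohort %>%
  group_by(hospital_id) %>%
  summarise(predicted = sum(p_predicted),
            expected  = sum(p_expected), .groups = "drop") %>%
  mutate(RSCR = predicted / expected * mean(cohort$serious_complication))
```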

While the RSCR is intended to mitigate “noise” from low-volume outliers, using this metric to profile hospital performance is inappropriate when true variation in outcome across hospitals is low relative to the amount of variation within hospitals. To evaluate the appropriateness of using the rate of serious complications to benchmark hospitals, we quantified the “rankability” of the serious complication metric for each procedure group and year using the following formula:32, 33

$$\text{Rankability} = \frac{\sigma^{2}_{\text{hospital-to-hospital}}}{\sigma^{2}_{\text{hospital-to-hospital}} + \operatorname{median}\left(\sigma^{2}_{\text{hospital-specific}}\right)}$$

This approach relates the hospital-to-hospital variability from the random effects model (σ²hospital-to-hospital) to the uncertainty of the hospital-specific effects from the fixed effects model (σ²hospital-specific). In other words, this measure, termed "reliability" by some authors, quantifies the ability to distinguish true differences across hospitals from random fluctuations in performance (a signal-to-noise ratio).34 High rankability does not inherently imply good or bad quality, nor does it necessarily reflect the predictive accuracy (e.g. c-statistic) of the random effects models. Rankability ranges from 0 to 1, where 1 implies that all differences across hospitals reflect true variation and 0 implies that all differences are due to chance. By analogy to agreement statistics, values of 0.41-0.60 may be considered moderate, 0.61-0.80 good, and 0.81-1.0 almost perfect.35 In the benchmarking literature specifically, a threshold of 0.7 is often considered acceptable.36
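
A rough numerical sketch of this formula is shown below. The paper obtains the hospital-specific variances from a fixed effects model; as a simplification, this sketch substitutes the conditional (posterior) variances of the random effects from the model fit above, which is an assumption rather than the authors' exact procedure.

```r
# Between-hospital variance (random-intercept variance) from the hierarchical model.
var_between <- as.numeric(VarCorr(fit)$hospital_id)

# Per-hospital uncertainty: conditional variances of the random effects, used here
# as a stand-in for the fixed-effects-model variances described in the paper.
re           <- ranef(fit, condVar = TRUE)$hospital_id
var_specific <- as.numeric(attr(re, "postVar"))

rankability <- var_between / (var_between + median(var_specific))
```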

To illustrate the impact of rankability graphically, we evaluated the year-to-year stability in hospital decile rank in RSCR using a heat map, given that low rankability will typically produce marked fluctuations in rank over time. Finally, using a previously described approach that bases the within-hospital variance on a binomial distribution estimate,37 we calculated the number of surgeries per year needed to achieve different reliability cutoffs for a hypothetical hospital with the population average complication rate. All analyses were performed using complete case analysis and were conducted with SAS version 9.4 (SAS Institute, Cary, NC) and R version 4.0.1.38
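
As a worked sketch of this calculation: with a binomial within-hospital variance of p(1−p)/n, reliability = σ²between / (σ²between + p(1−p)/n), which solves to n = reliability × p(1−p) / ((1−reliability) × σ²between). The inputs below are placeholder values for illustration, not estimates from the paper, and σ²between must be expressed on the same (probability) scale as p.

```r
# Surgeries per year needed for a hypothetical average hospital to reach a given
# reliability, assuming binomial within-hospital variance p*(1-p)/n.
n_for_reliability <- function(reliability, p, var_between) {
  ceiling(reliability * p * (1 - p) / ((1 - reliability) * var_between))
}

# Placeholder inputs (illustrative only): 4.2% complication rate, assumed
# between-hospital variance of 1e-4 on the probability scale.
n_for_reliability(reliability = 0.7, p = 0.042, var_between = 1e-4)
```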

RESULTS

After excluding 23,535 patients treated at hospitals performing 25 or fewer surgeries per year, we identified 154,078 cervical spine fusions performed at 324 hospitals and 213,133 thoracolumbar fusions performed at 364 hospitals. There were slightly more female than male patients for both cervical (52%) and thoracolumbar (54.9%) fusions, and most patients were white (77.5% and 78.6%, respectively). Among cervical fusion patients, most (86.8%) underwent anterior fusion, while most (81.6%) thoracolumbar patients had a posterior fusion. Overall, 5.6% of cervical fusion patients experienced any postoperative complication within 30 days and 4.2% had a serious complication. By comparison, 12.3% of thoracolumbar fusion patients experienced any complication and 7.7% had a serious complication. The full list of population demographic characteristics, procedure types, and postoperative complications is shown in Table 1.

Table 1:

Demographic characteristics, comorbid conditions, procedure characteristics, and postoperative complications for patients undergoing cervical and thoracolumbar fusion.

Characteristic, n (%) Cervical Fusion (n=154,078) Thoracolumbar Fusion (n=213,133)
Age in years, mean (std) 55.5 (12.0) 59.5 (13.9)
Gender
 Male 74,014 (48.0) 96,058 (45.1)
 Female 80,064 (52.0) 117,075 (54.9)
Insurance type
 Private 72,751 (47.2) 77,943 (36.6)
 Medicare 48,992 (31.8) 95,987 (45.0)
 Medicaid 11,295 (7.3) 12,106 (5.7)
 Other 21,040 (13.7) 27,097 (12.7)
Race/ethnicity
 White 119,377 (77.5) 167,612 (78.6)
 Black 14,140 (9.2) 15,302 (7.2)
 Hispanic 8,783 (5.7) 13,282 (6.2)
 Other 11,778 (7.6) 16,937 (8.0)
Elixhauser comorbidity index, mean (std) 5.9 (10.2) 7.1 (10.7)
Myelopathy 55,712 (36.2) 6,844 (3.2)
Scoliosis 1,158 (0.75) 21,424 (10.1)
Other deformity 5,881 (3.8) 13,381 (6.3)
Tobacco use 54,754 (35.5) 71,580 (33.6)
Anxiety 22,541 (14.6) 31,389 (14.7)
Hypercholesterolemia 48,173 (31.3) 83,105 (39.0)
Chronic pain 11,064 (7.2) 20,999 (9.9)
Surgical Approach
 Anterior fusion 133,714 (86.8) 20,071 (9.4)
 Posterior fusion 16,481 (10.7) 173,864 (81.6)
 Anterior/posterior fusion 3,064 (2.0) 19,198 (9.0)
 Not specified* 819 (0.53) N/A
Cervicothoracic Fusion 1,797 (1.2) N/A
Osteotomy/corpectomy 12,473 (8.1) 4,305 (2.0)
Interbody device 96,866 (62.9) 142,438 (66.8)
Multilevel fusion 45,946 (29.8) 36,233 (17.0)
Any complication 8,553 (5.6) 26,205 (12.3)
 Died 182 (0.12) 301 (0.14)
 Return to the operating room 2,041 (1.3) 6,010 (2.8)
 Neurological complication 1,568 (1.0) 7,898 (3.7)
 Pneumonia 1,109 (0.72) 2,281 (1.1)
 Postoperative hematoma 1,039 (0.67) 1,969 (0.92)
 Other respiratory complication 925 (0.60) 1,329 (0.62)
 Surgical site infection 807 (0.52) 3,078 (1.4)
 Device complication 749 (0.49) 1,946 (0.91)
 Urinary-renal complication 1,603 (1.04) 5,988 (2.8)
 Myocardial infarct/cardiac complication 614 (0.40) 1,669 (0.78)
 Sepsis 576 (0.37) 1,735 (0.81)
 Deep venous thrombosis 425 (0.28) 1,266 (0.59)
 Pulmonary embolism 411 (0.27) 1,244 (0.58)
 Wound dehiscence 343 (0.22) 1,277 (0.60)
 Stroke 187 (0.12) 383 (0.18)
 Osteomyelitis/discitis 85 (0.06) 174 (0.08)
 Meningitis NR 31 (0.01)
 Vascular injury NR 31 (0.01)

NR = not reported because the event count did not reach the minimum of 10 events required by HCUP for reporting individual outcomes.

N/A= not applicable to that patient cohort.

*

Anterior or posterior approach was not specified.

Cervical Fusion

The median RSCR and corresponding percentile values for cervical fusion are shown in Table 2 and E-Table 5. The RSCR by volume group and year is depicted graphically in Figure 1 and E-Figure 2a. As shown in the figures, complication rates were generally similar across volume groups, with median RSCR ranging from 4.1%-4.2% across the three groups. However, there was a statistically significant increase in the median cervical fusion RSCR from 2010 (4.2%) to 2017 (5.5%) (p < 0.01). As shown in E-Table 7, model discrimination and hospital volume remained similar over time and did not explain these trends.

Table 2:

Risk/reliability-adjusted complication rates (RSCR) for cervical spine fusion by year and percentile.

2010 2011 2012 2013 2014 2015 2016 2017
10th Percentile 3.77 3.46 3.12 3.16 3.68 3.61 3.99 4.14
20th Percentile 3.86 3.61 3.36 3.47 3.82 3.87 4.15 4.68
30th Percentile 3.96 3.70 3.55 3.63 3.90 4.02 4.22 4.94
40th Percentile 4.07 3.77 3.72 3.80 4.04 4.18 4.28 5.18
50th Percentile 4.18 3.83 3.92 3.99 4.17 4.37 4.39 5.51
60th Percentile 4.30 3.90 4.13 4.13 4.28 4.58 4.51 5.75
70th Percentile 4.43 4.01 4.36 4.38 4.44 4.75 4.64 6.24
80th Percentile 4.56 4.10 4.65 4.70 4.65 5.09 4.81 6.59
90th Percentile 4.84 4.36 5.16 5.07 4.89 5.60 5.15 7.46

Figure 1:

The risk-stratified complication rate (RSCR) and interquartile range for cervical (A) and thoracolumbar (B) fusion surgeries. Results are stratified by hospital mean annual case-volume, divided into three groups of approximately equal size. The x-axis labels show the mean annual volume cutoffs for each group in parentheses.

The rankability of the serious complication metric is shown in Figure 2. The rankability was low in all years, but there was a wide range from 0.04 in 2011 to 0.28 in 2017 (Figure 2). As demonstrated in E-Table 7, model discrimination and hospital volumes showed little change over time and did not explain differences in rankability.

Figure 2:

The rankability of risk-stratified complication rate (RSCR) by year for cervical and thoracolumbar fusion surgeries.

We graphically examined the stability of hospital performance relative to other institutions using a heatmap of hospital decile rank in RSCR over time (Figure 3a and 3b). As shown in the figure, there was substantial inconsistency in RSCR rank across years for both low and high volume hospitals, indicating that rank-based profiling efforts would lead to widely varying results over time. Finally, given the low overall rankability, we estimated the sample size needed at a hypothetical hospital with an average complication rate to achieve reliability thresholds ranging from 0.5 to 0.7. As shown in Table 4, to achieve reliable (i.e. ≥ 0.70) complication estimates, between 196 and 961 surgeries would be needed per hospital depending on the year, volumes typically achieved by less than 1% of hospitals.

Figure 3:

A heatmap showing the degree of consistency in hospital decile rank in RSCR by year for cervical (A-B) and thoracolumbar (C-D) fusion, separated into low (A & C) versus high volume (B & D) by tertiles of case-volume. Shading reflects hospital decile rank in each year, where a value of 1 indicates the lowest (best) decile in terms of RSCR, and 10 indicates the decile with the highest (worst) RSCR. Hospital vertical position was set based on RSCR rank in 2010 and held constant throughout the figure. Hospitals missing data for any year from 2010-2016 were excluded, explaining the smaller number of hospitals included for low volume compared to high volume heatmaps.

Table 4:

The number of surgeries per hospital needed to achieve different cutoffs of reliability for a hypothetical hospital with a complication rate at the population average. In parentheses is the percentage of hospitals with case volumes exceeding that threshold in each year.

Year Estimated n for reliability=0.5 (% of hospitals exceeding that volume) Estimated n for reliability=0.6 (% of hospitals exceeding that volume) Estimated n for reliability=0.7 (% of hospitals exceeding that volume)
Cervical
 2010 332 (1.1) 498 (0) 774 (0)
 2011 412 (0) 618 (0) 961 (0)
 2012 177 (12.7) 265 (4.2) 413 (0.5)
 2013 176 (13.0) 264 (4.3) 411 (0.5)
 2014 281 (2.3) 422 (0.8) 655 (0)
 2015 174 (11.8) 260 (5.3) 404 (0.4)
 2016 333 (0.5) 499 (0) 776 (0)
 2017 84 (23.6) 126 (9.0) 196 (4.5)
Thoracolumbar
 2010 53 (71.9) 80 (51.4) 124 (34.3)
 2011 50 (73.7) 75 (56.0) 117 (36.2)
 2012 55 (68.0) 82 (49.0) 128 (31.1)
 2013 60 (67.0) 90 (47.3) 139 (27.2)
 2014 51 (76.6) 77 (56.4) 119 (33.0)
 2015 49 (72.8) 74 (54.4) 114 (34.1)
 2016 44 (74.6) 66 (52.1) 102 (35.6)
 2017 65 (53.7) 97 (29.6) 150 (14.8)

Thoracolumbar Fusion

The RSCR and percentile values for thoracolumbar fusion are shown in Table 3 and E-Table 6. As shown in Figure 1b, the median RSCR varied little across volume groups (range 7.3-7.7%). The median RSCR also did not change substantially over time, consistently falling between 6.8-8.0% (E-Figure 2b). The rankability of the thoracolumbar fusion RSCR was substantially higher than that observed for cervical fusion, ranging from 0.16 in 2017 to 0.43 in 2016 (Figure 2). As was the case for cervical fusion, differences in rankability over time did not correspond to variations in model discrimination or hospital volume (E-Table 7).

Table 3:

Risk/reliability-adjusted complication rates (RSCR) for thoracolumbar spine fusion by year and percentile.

2010 2011 2012 2013 2014 2015 2016 2017
10th Percentile 5.51 5.63 5.30 5.25 5.11 4.77 4.99 4.83
20th Percentile 5.97 6.41 5.84 5.95 5.70 5.21 5.81 5.32
30th Percentile 6.79 7.05 6.63 6.59 6.26 5.90 6.36 5.97
40th Percentile 7.22 7.49 7.12 7.10 6.79 6.45 7.01 6.59
50th Percentile 7.92 8.01 7.73 7.59 7.38 7.10 7.62 6.84
60th Percentile 8.67 8.66 8.58 8.34 8.10 7.93 8.42 7.39
70th Percentile 9.59 9.27 9.37 8.86 8.85 8.61 9.26 7.74
80th Percentile 10.72 10.50 10.52 9.84 9.94 9.49 10.30 8.60
90th Percentile 12.59 12.46 11.63 11.44 11.89 11.03 12.64 10.11

The consistency of each hospital’s RSCR for thoracolumbar fusion relative to other institutions is depicted graphically in Figures 3c and 3d. While hospital ranks did fluctuate over time, the variations were less extreme than those seen for cervical fusion, and high volume hospitals showed the highest degree of consistency in rank. Corresponding to the improved rankability, a much larger proportion of hospitals had sufficient case volumes to achieve moderate (0.5) reliability. Nonetheless, only about one-third of hospitals had sufficient volume for highly reliable (0.7) estimates in most years (Table 4).

DISCUSSION

This investigation offers a population-level assessment of the rates of and variability in spine fusion complications, as determined from administrative data, over time, along with the reliability of such metrics for benchmarking hospital performance. We found that the rankability of cervical fusion data varied substantially by year but remained unacceptably low across all years. By comparison, thoracolumbar fusion showed substantially higher rankability, but still varied from just “slight” to “moderate” by year,35 and a minority of hospitals achieved high reliability. Together, these results provide important insights into the advisability of using administrative billing data to benchmark hospital quality in spine surgery.

As both payers and patients increasingly focus on quality metrics, evaluating the reliability of such measures is key. There are several factors that influence the reliability of a quality metric. Holding other factors constant, reliability increases with higher case volumes.39 Yet in our study, median hospital case volume did not correlate with rankability within surgery groups, consistent with previous studies showing that the reliability of surgical site infection models can vary from 0.005 to 0.63 across surgeries with similar case volumes and event rates.40 As demonstrated by the results of this study, metrics with inherently low reliability may require hundreds of patients per hospital to have acceptable performance (e.g. reliability ≥ 0.7),41 volumes typically achieved by few if any hospitals.

Aside from case volume, several factors influence the rankability of a performance metric. Most notable is the amount of true between-hospital variation (i.e. "signal") relative to random chance and within-hospital variability (i.e. "noise").33 Inconsistency in the between-hospital variation was particularly evident in the cervical fusion data, where the rankability varied substantially across time despite stable hospital volumes and model discrimination. Rankability is also affected by the prevalence of the outcome.33 While there is value in profiling a variety of outcomes not assessed in this study, the event rates (e.g. 0.52% for surgical site infection after cervical fusion) will influence the feasibility of such efforts. Indeed, the lower complication rate after cervical compared to thoracolumbar fusion may be one reason for the difference in rankability.

Rankability will also suffer from unadjusted differences in case mix across providers.42 The model discrimination observed in our study (range 0.75-0.81) was moderate to good by conventional standards, and also compared favorably to previously published risk models in spine surgery.16, 18 Likewise, our model discrimination was better than the models used by CMS to risk-adjust arthroplasty outcomes (c-statistic range 0.65-0.66).21 Nonetheless, given the heterogeneity in both spinal fusion patients and procedures, further investigations using more granular registry data are needed to determine the extent to which improved risk adjustment might improve rankability. More granular risk-adjustment may also increase acceptability among physician and hospital stakeholders who may resist profiling efforts they perceive to reflect differences in case mix across hospitals.43-45

This study has several limitations. The most important limitation is the study’s reliance on administrative billing data, which are known to be less accurate than registry data for surgical outcomes.46, 47 Therefore, although we corrected for documented measures of surgical complexity (e.g. multilevel surgery), these measures are poorly defined and of uncertain accuracy. In addition, although we did not identify substantial changes associated with the transition in ICD coding schemes, we cannot exclude the possibility that such coding variations could have impacted the consistency of our results over time. The durability of our findings should be evaluated in future years using alternative datasets (e.g. Medicare). Another limitation is that the SID only allowed us to evaluate the impact of hospital volume but not individual surgeon volume, which may have a more important influence on surgical complications.48 Future studies (e.g. using Medicare data) should evaluate reliability metrics associated with individual surgeon performance. Additionally, we only assessed inpatient surgeries and complications. Given the increasing shift to ambulatory surgery and pressure to avoid readmissions,49 future efforts using alternative datasets are needed to evaluate the impact of outpatient data on profiling efforts. Finally, as noted above, we evaluated the rankability of a single composite measure of 30-day morbidity. Future efforts are needed to examine the rankability of other metrics.

In conclusion, a composite measure of serious postoperative complications derived from administrative data showed slight-to-moderate rankability for thoracolumbar fusion, whereas the same measure appeared unreliable for cervical fusion. These results indicate that such metrics derived from administrative billing data should not be used in high-stakes applications, such as public reporting or pay-for-performance.

Supplementary Material

Supplemental Material

Acknowledgments:

The authors thank Ms. Joanna Reale for her assistance with database programming.

This work was supported by the Washington University Institute of Clinical and Translational Sciences which is, in part, supported by the NIH/National Center for Advancing Translational Sciences (NCATS), CTSA grant #UL1 TR002345. The Center for Administrative Data Research is supported in part by the Washington University Institute of Clinical and Translational Sciences grant UL1 TR002345 from the National Center for Advancing Translational Sciences (NCATS) of the National Institutes of Health (NIH) and Grant Number R24 HS19455 through the Agency for Healthcare Research and Quality (AHRQ). JKG was supported by funding from the Agency for Healthcare Research and Quality (1F32HS027075-01A1) and the Thrasher Research Fund (#15024).

References:

1. Healthcare Cost and Utilization Project (HCUP). Statistical Brief #204. Rockville, MD: Agency for Healthcare Research and Quality; 2016. Available from: www.hcup-us.ahrq.gov/reports/statbriefs/sb204-Most-Expensive-Hospital-Conditions.jsp.
2. Machado GC, Maher CG, Ferreira PH, Harris IA, Deyo RA, McKay D, et al. Trends, complications, and costs for hospital admission and surgery for lumbar spinal stenosis. Spine (Phila Pa 1976). 2017;42(22):1737–43.
3. Raad M, Donaldson CJ, Dafrawy MHE, Sciubba DM, Riley LH, Neuman BJ, et al. Trends in isolated lumbar spinal stenosis surgery among working US adults aged 40–64 years, 2010–2014. 2018;29(2):169.
4. Grotle M, Småstuen MC, Fjeld O, Grøvle L, Helgeland J, Storheim K, et al. Lumbar spine surgery across 15 years: trends, complications and reoperations in a longitudinal observational study from Norway. BMJ Open. 2019;9(8):e028743.
5. Yeramaneni S, Robinson C, Hostin R. Impact of spine surgery complications on costs associated with management of adult spinal deformity. Curr Rev Musculoskelet Med. 2016;9(3):327–32.
6. Vonck CE, Tanenbaum JE, Smith GA, Benzel EC, Mroz TE, Steinmetz MP. National trends in demographics and outcomes following cervical fusion for cervical spondylotic myelopathy. Global Spine Journal. 2018;8(3):244–53.
7. Martin BI, Mirza SK, Spina N, Spiker WR, Lawrence B, Brodke DS. Trends in Lumbar Fusion Procedure Rates and Associated Hospital Costs for Degenerative Spinal Diseases in the United States, 2004 to 2015. Spine (Phila Pa 1976). 2019;44(5):369–76.
8. Azad TD, Vail D, O'Connell C, Han SS, Veeravagu A, Ratliff JK. Geographic variation in the surgical management of lumbar spondylolisthesis: characterizing practice patterns and outcomes. The Spine Journal. 2018;18(12):2232–8.
9. Alosh H, Li D, Riley LHI, Skolasky RL. Health Care Burden of Anterior Cervical Spine Surgery: National Trends in Hospital Charges and Length of Stay, 2000–2009. Clinical Spine Surgery. 2015;28(1):5–11.
10. Martin BI, Mirza SK, Franklin GM, Lurie JD, MacKenzie TA, Deyo RA. Hospital and surgeon variation in complications and repeat surgery following incident lumbar fusion for common degenerative diagnoses. Health Serv Res. 2013;48(1):1–25.
11. Hospital-Acquired Condition Reduction Program (HACRP). Centers for Medicare & Medicaid Services; 2020. Available from: https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AcuteInpatientPPS/HAC-Reduction-Program.
12. Complication rate for hip/knee replacement patients. Centers for Medicare & Medicaid Services; 2020. Available from: https://www.medicare.gov/hospitalcompare/Data/Surgical-Complications-Hip-Knee.html.
13. Asher A, Speroff T, Dittus R, Parker S, Davies J, Selden N. The National Neurosurgery Quality and Outcomes Database (N2QOD): a collaborative North American outcomes registry to advance value-based spine care. Spine (Phila Pa 1976). 2014;39(22 Suppl 1):S106–S16.
14. Michigan Spine Surgery Improvement Collaborative. Available from: https://mssic.org.
15. Foundation for Healthcare Quality. Spine Care Outcomes Assessment Program (Spine COAP). 2020. Available from: https://www.qualityhealth.org/spinecoap/.
16. Veeravagu A, Li A, Swinney C, Tian L, Moraff A, Azad TD, et al. Predicting complication risk in spine surgery: a prospective analysis of a novel risk assessment tool. J Neurosurg Spine. 2017;27(1):81–91.
17. Yagi M, Fujita N, Okada E, Tsuji O, Nagoshi N, Tsuji T, et al. Impact of frailty and comorbidities on surgical outcomes and complications in adult spinal disorders. Spine (Phila Pa 1976). 2018;43(18):1259–67.
18. Han SS, Azad TD, Suarez PA, Ratliff JK. A machine learning approach for predictive models of adverse events following spine surgery. The Spine Journal. 2019;19(11):1772–81.
19. Agency for Healthcare Research and Quality. HCUP Databases. Healthcare Cost and Utilization Project (HCUP). Rockville, MD: Agency for Healthcare Research and Quality; 2018. Available from: http://www.hcup-us.ahrq.gov/sidoverview.jsp.
20. Agency for Healthcare Research and Quality. Introduction to the HCUP State Inpatient Databases (SID). Healthcare Cost and Utilization Project (HCUP); 2019.
21. Yale New Haven Health Services Corporation – Center for Outcomes Research and Evaluation (YNHHSC/CORE). 2020 Procedure-Specific Complication Measure Updates and Specifications Report; Elective Primary Total Hip Arthroplasty (THA) and/or Total Knee Arthroplasty (TKA) – Version 9.0. Centers for Medicare & Medicaid Services (CMS); 2020.
22. Bozic K, Yu H, Zywiel MG, Li L, Lin Z, Simoes JL, et al. Quality Measure Public Reporting Is Associated with Improved Outcomes Following Hip and Knee Replacement. JBJS. 2020.
23. Ibrahim AM, Ghaferi AA, Thumma JR, Dimick JB. Variation in outcomes at bariatric surgery centers of excellence. JAMA Surgery. 2017;152(7):629–36.
24. Osborne NH, Nicholas LH, Ryan AM, Thumma JR, Dimick JB. Association of hospital participation in a quality reporting program with surgical outcomes and expenditures for Medicare beneficiaries. JAMA. 2015;313(5):496–504.
25. Ibrahim AM, Hughes TG, Thumma JR, Dimick JB. Association of hospital critical access status with surgical outcomes and expenditures among Medicare beneficiaries. JAMA. 2016;315(19):2095–103.
26. Elixhauser A, Steiner C, Harris DR, Coffey RM. Comorbidity measures for use with administrative data. Med Care. 1998;36(1):8–27.
27. Ratliff JK, Balise R, Veeravagu A, Cole TS, Cheng I, Olshen RA, et al. Predicting Occurrence of Spine Surgery Complications Using "Big Data" Modeling of an Administrative Claims Database. JBJS. 2016;98(10).
28. Dimick JB, Staiger DO, Birkmeyer JD. Ranking hospitals on surgical mortality: the importance of reliability adjustment. Health Serv Res. 2010;45(6p1):1614–29.
29. Hofer TP, Hayward RA, Greenfield S, Wagner EH, Kaplan SH, Manning WG. The unreliability of individual physician report cards for assessing the costs and quality of care of a chronic disease. JAMA. 1999;281(22):2098–105.
30. Dimick JB, Ghaferi AA, Osborne NH, Ko CY, Hall BL. Reliability adjustment for reporting hospital outcomes with surgery. Ann Surg. 2012;255(4):703–7.
31. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36.
32. Lingsma HF, Eijkemans MJ, Steyerberg EW. Incorporating natural variation into IVF clinic league tables: The Expected Rank. BMC Med Res Methodol. 2009;9(1):53.
33. Austin PC, Ceyisakar IE, Steyerberg EW, Lingsma HF, Marang-van de Mheen PJ. Ranking hospital performance based on individual indicators: can we increase reliability by creating composite indicators? BMC Med Res Methodol. 2019;19(1):131.
34. Lingsma HF, Steyerberg EW, Eijkemans M, Dippel D, Scholte Op Reimer W, Van Houwelingen H, et al. Comparing and ranking hospitals based on outcome: results from The Netherlands Stroke Survey. QJM: An International Journal of Medicine. 2010;103(2):99–108.
35. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–74.
36. Hwang J, Adams JL, Paddock SM. Defining and estimating the reliability of physician quality measures in hierarchical logistic regression models. Health Services and Outcomes Research Methodology. 2020:1–20.
37. Saito JM, Chen LE, Hall BL, Kraemer K, Barnhart DC, Byrd C, et al. Risk-adjusted hospital outcomes for children's surgery. Pediatrics. 2013;132(3):e677–e88.
38. R: A Language and Environment for Statistical Computing. Version 4.0.1. Vienna, Austria: R Foundation for Statistical Computing; 2020.
39. Lawson EH, Ko CY, Adams JL, Chow WB, Hall BL. Reliability of evaluating hospital quality by colorectal surgical site infection type. Ann Surg. 2013;258(6):994–1000.
40. Huffman KM, Cohen ME, Ko CY, Hall BL. A comprehensive evaluation of statistical reliability in ACS NSQIP profiling models. Ann Surg. 2015;261(6):1108–13.
41. Hall BL, Huffman KM, Hamilton BH, Paruch JL, Zhou L, Richards KE, et al. Profiling individual surgeon performance using information from a high-quality clinical registry: opportunities and limitations. J Am Coll Surg. 2015;221(5):901–13.
42. Glance LG, Maddox KJ, Johnson K, Nerenz D, Cella D, Borah B, et al. National Quality Forum guidelines for evaluating the scientific acceptability of risk-adjusted clinical outcome measures: a report from the National Quality Forum Scientific Methods Panel. Ann Surg. 2020;271(6):1048–55.
43. Abecassis MM, Burke R, Klintmalm G, Matas AJ, Merion R, Millman D, et al. American Society of Transplant Surgeons transplant center outcomes requirements: a threat to innovation. Am J Transplant. 2009;9(6):1279–86.
44. Ross JS, Williams L, Damush TM, Matthias M. Physician and other healthcare personnel responses to hospital stroke quality of care performance feedback: a qualitative study. BMJ Quality & Safety. 2016;25(6):441–7.
45. Gude WT, Brown B, van der Veer SN, Colquhoun HL, Ivers NM, Brehaut JC, et al. Clinical performance comparators in audit and feedback: a review of theory and evidence. Implementation Science. 2019;14(1):39.
46. Sacks GD, Dawes AJ, Russell MM, Lin AY, Maggard-Gibbons M, Winograd D, et al. Evaluation of hospital readmissions in surgical patients: do administrative data tell the real story? JAMA Surgery. 2014;149(8):759–64.
47. Patterson JT, Sing D, Hansen EN, Tay B, Zhang AL. The James A. Rand Young Investigator's Award: administrative claims vs surgical registry: capturing outcomes in total joint arthroplasty. The Journal of Arthroplasty. 2017;32(9):S11–S7.
48. Farjoodi P, Skolasky RL, Riley LH. The Effects of Hospital and Surgeon Volume on Postoperative Complications After Lumbar Spine Surgery. Spine (Phila Pa 1976). 2011;36(24).
49. Baird EO, Egorova NN, McAnany SJ, Qureshi SA, Hecht AC, Cho SK. National Trends in Outpatient Surgical Treatment of Degenerative Cervical Spine Disease. Global Spine Journal. 2014;4(3):143–9.
