Author manuscript; available in PMC: 2022 Dec 1.
Published in final edited form as: Spine J. 2021 Jun 20;21(12):2026–2034. doi: 10.1016/j.spinee.2021.06.014

Comparison of Cost and Complication Rates for Profiling Hospital Performance in Lumbar Fusion for Spondylolisthesis

Jacob K Greenberg 1, Margaret A Olsen 2, Christopher F Dibble 1, Justin K Zhang 1, Brenton H Pennicooke 1, Ken Yamaguchi 3,5, Michael P Kelly 3, Bruce L Hall 4, Wilson Z Ray 1
PMCID: PMC8720504  NIHMSID: NIHMS1763947  PMID: 34161844

Abstract

Background Context:

There is growing interest among payers in profiling hospital value and quality-of-care, including both the cost and safety of common surgeries, such as lumbar fusion. Nonetheless, there is sparse evidence describing the statistical reliability of such measures when applied to lumbar fusion for spondylolisthesis.

Purpose:

To evaluate the reliability of 90-day inpatient hospital costs, overall complications, and rates of serious complications for profiling hospital performance in lumbar fusion surgery for spondylolisthesis.

Study Design/Setting:

Data for this analysis came from State Inpatient Databases from nine states made available through the Healthcare Cost and Utilization Project.

Patient Sample:

Patients undergoing elective lumbar spine fusion for spondylolisthesis from 2010–2017 in participating states.

Outcome Measures:

Statistical reliability, defined as the ability to distinguish true performance differences across hospitals relative to statistical noise. Reliability was assessed separately for 90-day inpatient costs (standardized across years to 2019 dollars), overall complications, and serious complication rates.

Methods:

Statistical reliability was measured as the amount of variation between hospitals relative to the total amount of variation for each measure. Total variation includes both between-hospital variation (“signal”) and within-hospital variation (“statistical noise”). Thus, reliability equals signal over (signal plus noise) and ranges from 0 to 1. To adjust for differences in patient-level risk and procedural characteristics, hierarchical linear and logistic regression models were created for the cost and complication outcomes. Random hospital intercepts were used to assess between-hospital variation. We evaluated the reliability of each measure by study year and examined the number of hospitals meeting different thresholds of reliability by year.

Results:

We included a total of 66,571 elective lumbar fusion surgeries for spondylolisthesis performed at 244 hospitals during the study period. The mean 90-day hospital cost was $30,827 (2019 dollars). Overall, 12.0% of patients experienced a complication within 90 days of surgery, including 7.8% who had a serious complication. The median reliability of 90-day cost ranged from 0.97 to 0.99 across study years, with a narrow distribution of reliability values. By comparison, the median reliability for the overall complication metric ranged from 0.22 to 0.47, and the median reliability for the serious complication measure ranged from 0.31 to 0.49 across the study years. At least 96% of hospitals had high (>0.7) reliability for cost in any year, whereas only 0–9% and 1–11% of hospitals reached this cutoff for the overall and serious complication rates, respectively, in any year. By comparison, 10–69% of hospitals per year achieved a more moderate threshold of 0.4 reliability for overall complications, compared to 21–80% of hospitals who achieved this threshold for serious complications.

Conclusions:

90-day inpatient costs are highly reliable for assessing variation across hospitals, whereas overall and serious complications are only moderately reliable for profiling performance. These results support the viability of emerging bundled payment programs that assume true differences in costs of care exist across hospitals.

Keywords: reliability, spine fusion, spondylolisthesis, healthcare costs, surgical quality, complications

Introduction:

Degenerative lumbar spine disease causing low back pain is one of the most costly problems in the United States healthcare system and is responsible for more global disability than any other cause of pain.[1–3] Lumbar spine fusion has become a mainstay of treatment for select types of severe lumbar disease, particularly spondylolisthesis.[4] Indeed, the number of lumbar fusion surgeries for spondylolisthesis increased by more than 100% from 2004 to 2015,[4] and spinal fusion was recently listed as one of the most common and the single most costly grouping of operations in U.S. hospitals ($12.8 billion in 2011).[5, 6]

While lumbar fusion is beneficial for many patients,[6] both the prevalence and costs of these surgeries have motivated efforts to profile hospital performance for these interventions. To date, several spine surgery registries have been developed to benchmark hospital quality, with the most recent being the American Spine Registry,[7–9] a joint effort between the American Association of Neurological Surgeons and the American Academy of Orthopaedic Surgeons.[10] These registries have largely focused on patient-reported outcomes and measures of surgical morbidity, a focus increasingly shared by payers.[11, 12] Related to this growing focus on quality and value, government and private payers have also increasingly sought to aggregate payments for a variety of services associated with surgical procedures into higher-risk, episode-based “bundled” payments. The aim is to drive attention simultaneously toward quality of outcomes and cost savings by sharing more risk. This evolution has included spine fusion.[13, 14]

Comparing hospitals based on performance metrics assumes that differences between hospitals can be identified, and that such metrics can be reliable measures for distinguishing true performance differences. Within the hospital quality literature, the definition of the term “reliability” is context dependent. As it relates to risk-adjusted outcome measures, the National Quality Forum defines reliability as “the extent to which a provider’s measured score approaches their true score.”[15] In this context, reliability is defined by the extent to which hospitals actually differ from each other (i.e. “signal”) compared to the amount of variability within each hospital (i.e. statistical “noise”). Importantly, the statistical reliability of a performance measure is distinguished from its validity, which reflects the extent to which a measure captures a clinically/economically meaningful outcome.[15]

Using this definition of reliability, recent work has indicated that a metric of serious complication rates was only modestly reliable for profiling hospital performance in a broad population of thoracolumbar fusion patients.[16] However, the reliability of postoperative complications has not been investigated specifically in patients undergoing lumbar fusion for spondylolisthesis. Moreover, it is unknown to what extent variations in hospital cost profiles represent true differences in performance across hospitals. While costs are not a direct measure of quality, such information should have important implications for bundled care programs that assume that the cost differences across hospitals generally reflect consistent variations rather than random fluctuations due to chance.

To address this evidence gap, the objective of this study was to use administrative data from a large, geographically diverse population to evaluate the reliability of two composite complication metrics and 90-day inpatient costs for profiling hospital performance.

Methods:

Study Population

This study used data from State Inpatient Databases (SID) provided by the Healthcare Cost and Utilization Project supported by the Agency for Healthcare Research and Quality.[17] The SID are all-payor databases that provide billing codes corresponding to inpatient diagnoses and procedures, along with patient demographic characteristics and inpatient charges. The SID include data from all non-federal hospital admissions in participating states. We used SID data from 2010–2017 from nine states that included a unique patient-level identifier. Unlike the National Inpatient Sample, this patient-level identifier allowed us to track individual patient visits across readmission episodes after index surgical discharge.[17] The states (and years) included in the study were: Arkansas (2010–2016), Florida (2010–2017), Iowa (2010–2012; 2014–2015), Massachusetts (2011–2015), Maryland (2014–2017), Nebraska (2010–2016), New York (2010–2016), Vermont (2012–2016), and Wisconsin (2014–2016).

This study included patients 18 years and older who underwent elective lumbar fusion for spondylolisthesis. In an attempt to restrict the study population to patients with similar procedural characteristics, we excluded patients who were coded for multilevel fusion surgeries, given our inability to determine the exact number of levels fused. To focus exclusively on elective procedures, we excluded patients who were admitted from the emergency department or transferred from another hospital, along with those who had a diagnosis of spine trauma, neoplasm, or cauda equina syndrome during the index surgical admission. We also excluded patients with procedure codes indicating fracture repair, along with patients who had infection listed as their primary diagnosis code or “present on admission” during the index surgical admission. To ensure a minimum sample size for measuring quality outcomes, we restricted the analysis to only include hospitals performing more than 25 eligible lumbar fusions for spondylolisthesis each year, similar to cutoffs used by CMS for profiling hip/knee arthroplasty.[18] The International Classification of Diseases (ICD)-9 and ICD-10 diagnosis and procedure codes used to define the study population are included in E-Table 1 in the Appendix.

Outcomes and Confounding Variables

This study evaluated the reliability of three distinct performance metrics. First, inpatient costs were defined as all hospital costs accrued during the index surgical admission and readmission(s) associated with a postoperative complication within 90 days of surgery. Professional fees are generally not included in the SID, and consequently were not considered in this analysis. Hospital charges were converted to costs using cost-to-charge ratio files.[19] These files provide hospital-specific ratios that relate what hospitals bill for services to the actual cost incurred with care delivery. All costs were adjusted to 2019 dollars using the medical care component of the consumer price index.[20] Second, we defined overall postoperative complications as return to the operating room, death, or an infectious, thrombotic, cardiac, respiratory, urinary-renal, neurological, vascular, or wound-related complication. Medical complication diagnoses (e.g. cardiac, thrombotic) were included for any readmission that began within 30 days of index surgery. Wound complications (infection and dehiscence), device complications, and return to the operating room were included for any event within 90 days of index surgery. These complication definitions are similar to those used by CMS when profiling hip arthroplasty surgeries and also similar to previous studies evaluating complications from major surgeries, including spine surgery.[18, 21, 22] Third, we distinguished serious postoperative complications as return to the operating room, pulmonary embolism, myocardial infarction, or death, or any of the previously noted complications in conjunction with a length of stay at or above the 90th percentile. This type of approach to distinguishing serious complications with a substantial impact on the patient has been applied in a variety of previous publications.[21, 23, 24] The full list of ICD-9/10 codes used to define these complications is shown in E-Table 1.
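
As a concrete illustration of the cost construction described above, the sketch below converts billed charges to estimated costs and standardizes them to 2019 dollars. The function name and all numeric inputs (charge, cost-to-charge ratio, CPI values) are hypothetical placeholders rather than figures from the study.

```python
# Convert billed hospital charges to estimated costs via a hospital-specific
# cost-to-charge ratio, then inflate to 2019 dollars using the medical-care
# component of the CPI. All numeric values below are hypothetical.

def charge_to_2019_cost(charge, cost_to_charge_ratio, cpi_year, cpi_2019):
    """Estimate cost from charges, then adjust to 2019 dollars."""
    cost = charge * cost_to_charge_ratio      # hospital-specific ratio
    return cost * (cpi_2019 / cpi_year)       # CPI standardization

# Example with made-up numbers: $100,000 billed, ratio 0.30, CPI 460 -> 500
print(round(charge_to_2019_cost(100_000, 0.30, 460.0, 500.0), 2))
```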

Statistical Analysis

For each study outcome, we created separate hierarchical regression models for each year of the study period. For the cost outcome, we used a hierarchical linear regression model, because we found that the cost data were approximately normally distributed. For the complication outcomes, we used hierarchical logistic regression models. All study models were risk-adjusted for patient-level comorbidities defined based on the Elixhauser index,[25] other comorbidities relevant in patients undergoing spine surgery (e.g. anxiety, tobacco use),[22] age, sex, race, insurance status, presence of scoliosis or other deformity, and surgical characteristics (anterior, posterior, or combined approach; addition of an osteotomy or corpectomy; use of an interbody device). Hospital cost was reported as the mean and standard deviation by study year. Complication rates were reported as risk-standardized complication rates (RSCR), which is the metric used by CMS to profile joint arthroplasties.[18] The RSCR is similar to a risk-adjusted complication rate that accounts for baseline patient differences (e.g. in age, comorbidities), except that it also incorporates the reliability of the complication rate estimate based on a hospital’s case volume. More specifically, the observed complication rate for low-volume hospitals is assumed to be a less accurate reflection of a hospital’s true performance due to low sample size. Therefore, the RSCR for low-volume hospitals is “shrunk” toward the overall population mean. By comparison, risk-adjusted complication rates from high-volume hospitals are considered to be a more accurate reflection of that hospital’s “true” performance and are subjected to less “shrinkage.” The calculation of this metric has been reported previously and is defined in detail in the Appendix.[16, 18, 26, 27]
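
The volume-dependent “shrinkage” described above can be sketched as a simple empirical-Bayes weighted average. This is a simplified illustration, not the CMS implementation; the function name and all variance and rate values are hypothetical.

```python
# Illustrative empirical-Bayes shrinkage behind an RSCR-style estimate.
# The weight w is the reliability: between-hospital variance ("signal")
# relative to signal plus within-hospital sampling noise, which falls
# as case volume rises. All numeric values below are hypothetical.

def shrunken_rate(hospital_rate, population_rate, signal_var, n_cases):
    # Binomial sampling noise for a proportion shrinks as 1/n
    noise_var = hospital_rate * (1 - hospital_rate) / n_cases
    w = signal_var / (signal_var + noise_var)   # reliability weight in [0, 1]
    return w * hospital_rate + (1 - w) * population_rate

# A low-volume hospital (n=26) is pulled strongly toward the population mean;
# a high-volume hospital (n=300) mostly keeps its observed rate.
low = shrunken_rate(0.20, 0.12, signal_var=0.001, n_cases=26)
high = shrunken_rate(0.20, 0.12, signal_var=0.001, n_cases=300)
print(round(low, 3), round(high, 3))
```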

In the context of hospital performance measures, statistical reliability relates to the ability to distinguish actual differences in performance across individual providers or hospitals from statistical “noise” resulting from variation within hospitals. Stated another way, reliability is defined as: [signal/(signal + noise)], where “signal” reflects between-hospital variation and “noise” reflects within-hospital variation.[28] This measure has been compared to a power calculation estimating the ability to detect “true” differences across hospitals when they exist.[28] Reliability is measured on a scale from 0 to 1, where 0 indicates all variation across hospitals is due to chance (“no signal”) and 1 indicates all variation reflects true differences in performance at the hospital level (“all signal”).[29] Higher reliability reflects the degree of confidence in comparing one hospital to another, but higher reliability does not necessarily indicate better outcomes. Although there is no absolute threshold, a reliability above 0.7 is often considered strong.[30]

The amount of variability across hospitals (i.e. “signal”) was obtained from the risk-adjusted hierarchical regression models, which included random intercepts to represent differences across hospitals.[15] These models directly quantified the amount of between-hospital variance (i.e. how much variation in complication rates or costs there was across hospitals). To calculate the reliability of the cost outcome, the “noise” was calculated as the within-hospital variance divided by the number of surgeries at each hospital, as previously described.[29, 30] To calculate the reliability of the complication outcomes, we used the binomial estimation, which defines within-hospital variance as “p_i*(1 – p_i)/n_i”, where “p_i” is the proportion of surgeries at hospital i with a complication and “n_i” is the number of surgeries at hospital i.[31] The calculation of this reliability measure has been described previously and is reported in further detail in the Appendix.[30, 31]
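
Putting the pieces together, hospital-level reliability for a complication outcome can be computed as below. In practice the between-hospital (“signal”) variance would come from the hierarchical model’s random-intercept estimate; the value used here is a hypothetical placeholder, as are the event rate and volumes.

```python
# Hospital-level reliability = signal / (signal + noise), using the binomial
# within-hospital variance described above. The signal variance here is a
# hypothetical stand-in for a random-intercept variance estimate.

def reliability(signal_var, p_i, n_i):
    """signal_var: between-hospital (random-intercept) variance;
    p_i: observed complication proportion at hospital i;
    n_i: number of eligible surgeries at hospital i."""
    noise_var = p_i * (1 - p_i) / n_i   # within-hospital sampling variance
    return signal_var / (signal_var + noise_var)

# Reliability rises with case volume even when the event rate is unchanged,
# mirroring the volume-reliability pattern reported in the Results.
for n in (26, 50, 100, 300):
    print(n, round(reliability(0.002, 0.12, n), 2))
```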

To evaluate the impact of hospital case volume, we first plotted costs and complication rates by tercile of hospital volume. We tested for significant differences across volume groups and complication groups using one-way analysis of variance and Tukey’s correction for multiple comparison testing. We also compared reliability vs. annual case volume for each of the three outcome measures. We further examined the median and range of reliability values for each study year using a violin plot to represent the distribution of the data. Additionally, we measured the proportion of hospitals meeting different cutoffs of reliability for each outcome for each study year.

Complete case analysis was used for all analyses. Calculations were performed using SAS version 9.4 (SAS Institute, Cary, NC) and R version 4.0.1.[32] The authors’ institutional review board (IRB) deemed that the study dataset did not constitute human subjects research. Therefore, neither IRB approval nor patient informed consent was obtained.

Results:

Between 2010 and 2017, we identified a total of 82,513 elective lumbar fusion surgeries for spondylolisthesis in participating states. After excluding 15,942 surgeries at hospitals not meeting the threshold of more than 25 cases per year, we included 66,571 surgeries performed at 244 hospitals. Due to the varying number of states providing data, the number of hospitals in the dataset per year ranged from 45 to 200, and the median hospital case volume was 50 surgeries per year (range 26–344). Low-volume hospitals were considered to be those performing 39 or fewer lumbar fusions for spondylolisthesis per year, medium volume as those performing 40–65 fusions per year, and high volume as those performing more than 65 fusions per year. The mean age of included patients was 62 years, and most patients were female (61%), White (81%), and had Medicare (50%) or private (37%) insurance. Population demographic and surgical characteristics are shown in Table 1.

Table 1:

Demographic characteristics, comorbid conditions, and procedure characteristics for patients undergoing lumbar fusion for spondylolisthesis.

Characteristic, n (%)
Age in years, mean (std) 62.4 (12.8)
Gender
 Male 25,970 (39.0)
 Female 40,601 (61.0)
Insurance type
 Private 24,771 (37.2)
 Medicare 33,298 (50.0)
 Medicaid 2,896 (4.4)
 Other 5,606 (8.4)
Race/ethnicity
 White 54,143 (81.3)
 Black 4,151 (6.2)
 Hispanic 3,017 (4.5)
 Other 5,260 (7.9)
Elixhauser comorbidity index, mean (std) 6.7 (10.2)
Scoliosis 3,664 (5.5)
Other deformity 2,693 (4.1)
History of tobacco use 21,803 (32.8)
Hypercholesterolemia 28,112 (42.2)
Anxiety 9,550 (14.4)
Chronic pain 5,387 (8.1)
Procedure type
 Anterior fusion 4,111 (6.2)
 Posterior fusion 57,254 (86.0)
 Anterior/posterior fusion 5,206 (7.8)
Osteotomy/corpectomy 687 (1.0)
Interbody device 45,527 (68.4)
Overall complications 6,391 (9.6)
 Died 70 (0.11)
 Return to the operating room 2,087 (3.1)
 Neurological complication 2,507 (3.8)
 Pneumonia 578 (0.87)
 Postoperative hematoma 460 (0.69)
 Other respiratory complication 288 (0.43)
 Surgical site infection 1,085 (1.6)
 Device complication 747 (1.1)
 Urinary-renal complication 1,701 (2.6)
 Myocardial infarct/cardiac complication 446 (0.67)
 Sepsis 388 (0.58)
 Deep venous thrombosis 319 (0.48)
 Pulmonary embolism 334 (0.50)
 Wound dehiscence 468 (0.70)
 Stroke 113 (0.17)
 Osteomyelitis/discitis 58 (0.09)
 Meningitis NR
 Vascular injury NR

NR=not reported due to not reaching the cutoff of 11 events required by HCUP for reporting individual outcomes.

The mean and standard deviation for cost, along with the RSCR for serious and overall complications, stratified by terciles of hospital procedure volume, are shown in Figure 1. The mean inpatient 90-day cost of a lumbar fusion surgery for spondylolisthesis was $30,827 in 2019 dollars. Mean costs were significantly higher for patients who experienced serious complications ($51,980) and other complications ($30,678) compared to those without these outcomes ($28,960; p<0.001 for both comparisons). The mean cost was slightly but significantly higher in low volume ($31,709) hospitals compared to both medium ($30,566) and high volume ($30,684) hospitals (p<0.001 for both comparisons). Overall, 12.0% of patients experienced a postoperative complication, and 7.8% experienced a serious complication. While the mean RSCR for overall and serious complications were slightly higher in high compared to low volume hospitals (12.6% vs. 12.2%; 8.3% vs. 8.1%), none of these differences were significant (p=0.10–0.99 for overall complications; p=0.49–0.94 for serious complications).

Figure 1:


Mean and standard deviation for each study outcome by volume group, divided into thirds of approximately equal size. a) 90-day hospital costs; b) 90-day risk-stratified rate of overall postoperative complications; c) 90-day risk-stratified rate of serious complications. * indicates significant differences (p<0.05) for 90-day costs. For the RSCR evaluation, no individual volume-group comparisons were statistically significant, and all p-values were ≥ 0.10.

The distribution of hospital-specific reliability values for each year and outcome is shown in Figure 2. As represented in the figure, the reliability of the cost measurement was substantially higher than those of the complication metrics. The median reliability of the cost measure ranged from 0.97 to 0.99 across study years, and there was a narrow distribution of reliability values (interquartile range [IQR] 0.02–0.05). By comparison, the median reliability for the overall complication metric ranged from 0.22 to 0.47 (IQR 0.13–0.24), and the median reliability for the serious complication measure ranged from 0.31 to 0.49 (IQR 0.15–0.28) across the study years.

Figure 2:


A violin plot showing the distribution of hospital reliability scores by year for the cost (a), overall complication (b), and serious complication (c) outcome metrics. The width of the violin indicates the density of hospital reliability scores at that level. Solid black points indicate the median reliability for each year.

The hospital-specific reliability values are shown for each outcome in relation to the annual hospital lumbar fusion case volume in Figure 3. As displayed in the figure, most hospitals had cost reliability values above 0.9, but at lower volumes, some hospitals had reliability values as low as 0.51. However, all hospitals with annual case volumes above 100 had reliability of 0.9 or higher. By comparison, the complication metrics showed a wide range of reliability values at low to moderate case volumes, while the highest volume hospitals generally showed high reliability (>0.7). Corresponding to these observations, at least 96% of hospitals achieved high (>0.7) and at least 84% achieved very high (>0.9) reliability for the cost metric in all study years. In contrast, only 0–9% of hospitals had high (>0.7) reliability for the overall complication measure depending on the study year, whereas 1–11% of hospitals achieved this threshold for the serious complication outcome. Nonetheless, up to 69% of hospitals per year reached 0.4 reliability for overall complications, as did up to 80% for serious complications, a reliability level often considered moderate. The proportion of hospitals achieving different reliability thresholds by study year is shown in Table 2.

Figure 3:


The relationship between annual hospital surgical volume and reliability for the cost (a), overall complication (b), and serious complication (c) outcome metrics. Data from all years are combined, and each point represents one hospital per year.

Table 2:

The number (%) of hospitals meeting different thresholds of reliability by outcome measure and year.

2010 2011 2012 2013 2014 2015 2016 2017
Cost

 0.4 105 (100%) 120 (100%) 136 (100%) 131 (100%) 200 (100%) 189 (100%) 108 (100%) 45 (100%)

 0.7 105 (100%) 120 (100%) 135 (99%) 131 (100%) 199 (99.5%) 187 (99%) 108 (100%) 43 (96%)

 0.9 100 (95%) 105 (88%) 129 (95%) 124 (95%) 187 (94%) 181 (96%) 99 (92%) 38 (84%)

Overall Complications

 0.4 10 (10%) 83 (69%) 57 (42%) 80 (61%) 114 (57%) 62 (33%) 18 (17%) 20 (44%)

 0.7 0 (0%) 11 (9%) 5 (4%) 10 (8%) 13 (7%) 3 (2%) 1 (1%) 0 (0%)

 0.9 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)

Serious Complications

 0.4 22 (21%) 67 (56%) 94 (69%) 85 (65%) 135 (68%) 69 (37%) 26 (24%) 36 (80%)

 0.7 1 (1%) 6 (5%) 4 (10%) 15 (11%) 17 (9%) 4 (2%) 1 (1%) 2 (4%)

 0.9 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)

Discussion:

In this analysis of all-payor data from nine states across an eight-year period, we found that 90-day inpatient cost was a highly reliable measure for profiling hospital performance. By comparison, composite metrics of any postoperative complication or serious postoperative complications were only moderately reliable. While the cost outcome showed the highest reliability at all hospital volume levels, the differences in reliability across the three measures were most pronounced for hospitals with lower to moderate case volumes. Although costs were slightly higher in low volume hospitals, the RSCR did not differ by hospital volume, consistent with past studies showing that surgeon volume, but not hospital volume, is associated with complications from spine fusion.[33–35]

In recent years, there has been a growing emphasis toward shifting the financial risk and burden of healthcare cost from payers to hospital providers.[14] The most prominent example of such efforts has been the Bundled Payments for Care Improvement (BPCI) initiative created by the Centers for Medicare and Medicaid Services (CMS), and its successors, the Comprehensive Care for Joint Replacement (CJR) and BPCI-Advanced programs.[36] Comprising several participation options, the BPCI program provided bundled payments for acute and post-acute care episodes associated with a variety of clinical conditions, including spine surgery. While participation in the BPCI program was voluntary, beginning in 2016 CMS required certain hospitals to participate in bundled payments for hip and knee arthroplasty surgeries through the CJR program.[37] Subsequent to CJR, CMS has created the BPCI-Advanced program and a growing number of other bundled or higher-risk programs. Some private payers have begun similar bundled payment efforts for spine fusion.[38] There are now also many examples of large employers seeking out bundled agreements.

The high reliability observed for hospital cost has important implications for such payment models. Our results indicate that the amount of variability observed for hospital costs is overwhelmingly related to true variability across hospitals, rather than random fluctuations due to chance.[30] Variations in total inpatient hospital cost may have a number of etiologies, including length of stay, patient characteristics, surgical approach, use of surgical implants, and postoperative complications, among others.[39–41] While we found that complications did increase cost, these relatively rare events were not a primary driver of overall hospital costs. Although the SID did not include data to separate line-item costs, our findings suggest that cost variations were more closely related to differences in operating room and inpatient billing practices across hospitals. Regardless of the individual drivers, these results support the viability of bundled care models that assume that true cost differences exist and can be distinguished across hospitals.

Expanding use of bundled payment programs may impact several aspects of hospital behavior, including mergers and acquisitions. Specifically, several recent studies suggest that such vertical acquisitions can reduce costs for the hospitals acquired due to cost efficiencies.[42, 43] As our findings suggest that cost variations are largely driven by hospital-level differences, expanding the use of bundled care programs may add financial pressure to smaller, standalone facilities to integrate with larger healthcare systems.

In contrast to the cost estimates, complication rates were only moderately reliable (range 0.31–0.49 for serious complications). This result is consistent with recent findings in a broader population of spine fusion patients, as well as studies in the general surgery literature highlighting the wide range of reliability values observed for some surgeries.[16, 28, 44, 45] While measures with higher event rates often have higher reliability due to less measurement noise,[45, 46] our results add to growing evidence that outcome frequency is only one influence on statistical reliability. Despite being less frequent, we found that serious complications had higher reliability than overall complications in most years. Similarly, past studies have shown that the statistical reliability of complication metrics from certain high-risk procedures (e.g. vascular surgery) can be similar to or lower than the reliability of metrics for less-morbid procedures (e.g. hip arthroplasty).[44] These results emphasize the importance of evaluating the reliability of specific performance measures in each population where they are being applied.

While actual variability in event rates is likely the primary source of the variability observed in both overall and serious complications, other factors should also be considered. For example, complications in administrative datasets are recorded by billing coders, who may differ in their overall experience and particular familiarity with spine surgery patients. Likewise, the quality and completeness of physician documentation will influence the accuracy of billing codes for complications. These multifaceted sources of variation likely contributed to the inferior reliability of complications compared to cost data, which are far less susceptible to both chance and human error.

Although our results provide novel insights regarding the statistical reliability of several performance measures, our analysis was not designed to evaluate their validity, or the clinical/economic importance of each measure. For instance, while decreasing healthcare costs is imperative, some unique hospital capabilities might justify higher payments. Likewise, the final decision regarding appropriate outcomes to include in complication profiles should reflect expert opinion regarding outcomes that are both clinically important and likely related to the quality of care delivered. Future efforts by payers and policymakers should therefore consider both validity and statistical reliability when establishing new programs.

Beyond these considerations not addressed in our analysis, this study has several limitations. First, we relied on ICD-9/10 billing codes to adjust for observed differences in patient characteristics and case mix, which may not have fully reflected important differences in factors that influence cost and risk of complications.[47, 48] Second, the SID only provides data on overall hospital-level charges for relevant encounters, adjusted with hospital cost-to-charge ratios. Therefore, we could not distinguish line-item costs that would reflect the individual components driving cost variations. Likewise, we could not separate actual reimbursement by payor, which might impact our findings. Third, the SID did not provide data on outpatient costs, which are included in some bundled care programs.[36] Therefore, our present findings are most relevant to bundled care programs focused on inpatient costs, and future studies using alternative datasets may extend these analyses to the outpatient setting. Finally, due to data availability, the states represented in this study were restricted to the Northeast and Central United States, and future studies should verify these results in Western states.

Conclusion:

90-day inpatient hospital costs are highly reliable for assessing variation across hospitals for patients undergoing lumbar fusion for spondylolisthesis. By comparison, overall complications and serious complication rates are only moderately reliable when measured using administrative data. The notable cost differences across centers suggest that higher-cost hospitals could use cost data to identify potential inefficiencies in care or to highlight unique capabilities that might justify some differences.[28] Hospitals’ ability to capitalize on such efforts may have important implications for how they are affected by the growing movement toward episode-bundled payment programs that shift financial risks onto providers, in hopes of facilitating patient care improvements.

Supplementary Material


Acknowledgments

The authors thank Ms. Joanna Reale for her assistance with database programming. The authors thank Drs. John Adams, Mark Cohen, and Yaoming Liu for their insightful comments related to the statistical methods used in this study.

Funding

This work was supported by the Washington University Institute of Clinical and Translational Sciences which is, in part, supported by the NIH/National Center for Advancing Translational Sciences (NCATS), CTSA grant #UL1 TR002345. The Center for Administrative Data Research is supported in part by the Washington University Institute of Clinical and Translational Sciences grant UL1 TR002345 from the National Center for Advancing Translational Sciences (NCATS) of the National Institutes of Health (NIH) and Grant Number R24 HS19455 through the Agency for Healthcare Research and Quality (AHRQ). JKG was supported by funding from the Agency for Healthcare Research and Quality (1F32HS027075-01A1) and the Thrasher Research Fund (#15024). The sponsors had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or the decision to submit the manuscript for publication.

Declaration of Competing Interest

The authors had no conflicts of interest related to this study. Dr. Olsen received research funding from Merck, Pfizer, and Sanofi Pasteur unrelated to this study. Dr. Olsen received consulting fees from Pfizer unrelated to this study. Dr. Yamaguchi received grant funding from the NIH and royalty payments from Zimmer Biomet and Wright Medical, unrelated to this study. Dr. Kelly received consulting fees from The Journal of Bone and Joint Surgery and research support from the ISSGF and SSSF. Dr. Hall is the consulting director of the American College of Surgeons National Surgical Quality Improvement Program. Dr. Ray received grant funding from the NIH, Department of Defense, and Depuy/Synthes, unrelated to this study. Dr. Ray also received consulting fees from Depuy/Synthes, Globus, Nuvasive, Corelink, and Medtronic, along with royalties from Depuy/Synthes, Nuvasive, Corelink, and Acera Surgical.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References:

1. Dieleman JL, Cao J, Chapin A, et al. US Health Care Spending by Payer and Health Condition, 1996–2016. JAMA. 2020;323(9):863–84.
2. Kroenke K, Cheville A. Management of Chronic Pain in the Aftermath of the Opioid Backlash. JAMA. 2017;317(23):2365–6.
3. Hartvigsen J, Hancock MJ, Kongsted A, et al. What low back pain is and why we need to pay attention. The Lancet. 2018;391(10137):2356–67.
4. Martin BI, Mirza SK, Spina N, Spiker WR, Lawrence B, Brodke DS. Trends in Lumbar Fusion Procedure Rates and Associated Hospital Costs for Degenerative Spinal Diseases in the United States, 2004 to 2015. Spine (Phila Pa 1976). 2019;44(5):369–76.
5. Weiss AJ, Elixhauser A, Andrews RM. Statistical Brief #180: Overview of Hospital Stays in the United States. 2014.
6. Ghogawala Z, Dziura J, Butler WE, et al. Laminectomy plus Fusion versus Laminectomy Alone for Lumbar Spondylolisthesis. N Engl J Med. 2016;374(15):1424–34.
7. Asher A, Speroff T, Dittus R, Parker S, Davies J, Selden N. The National Neurosurgery Quality and Outcomes Database (N2QOD): a collaborative North American outcomes registry to advance value-based spine care. Spine (Phila Pa 1976). 2014;39(22 Suppl 1):S106–S16.
8. Foundation for Healthcare Quality. Spine Care Outcomes Assessment Program (Spine COAP). 2020 [cited March 15, 2020]; Available from: https://www.qualityhealth.org/spinecoap/.
9. Michigan Spine Surgery Improvement Collaborative. [cited March 15, 2020]; Available from: https://mssic.org.
10. American Spine Registry. The National Quality Improvement Registry for Spine Care. Rosemont, IL: American Spine Registry; 2020 [cited November 22, 2020]; Available from: https://www.americanspineregistry.org.
11. Hospital-Acquired Condition Reduction Program (HACRP). Centers for Medicare & Medicaid Services; 2020 [cited September 3, 2020]; Available from: https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AcuteInpatientPPS/HAC-Reduction-Program.
12. Complication rate for hip/knee replacement patients. Centers for Medicare & Medicaid Services; 2020 [cited September 3, 2020]; Available from: https://www.medicare.gov/hospitalcompare/Data/Surgical-Complications-Hip-Knee.html.
13. Dummit L, Marrufo G, Marshall J, et al. CMS Bundled Payments for Care Improvement Initiative Models 2–4: Year 5 Evaluation & Monitoring Annual Report. CMS, 2018.
14. Agarwal R, Liao JM, Gupta A, Navathe AS. The Impact of Bundled Payment on Health Care Spending, Utilization, and Quality: A Systematic Review. Health Aff (Millwood). 2020;39(1):50–7.
15. Glance LG, Maddox KJ, Johnson K, et al. National Quality Forum guidelines for evaluating the scientific acceptability of risk-adjusted clinical outcome measures: a report from the National Quality Forum Scientific Methods Panel. Ann Surg. 2020;271(6):1048–55.
16. Greenberg JK, Olsen MA, Poe J, et al. Administrative Data are Unreliable for Ranking Hospital Performance Based on Serious Complications after Spine Fusion. Spine (Phila Pa 1976). 2021.
17. Agency for Healthcare Research and Quality. Introduction to the HCUP State Inpatient Databases (SID). Healthcare Cost and Utilization Project (HCUP). Agency for Healthcare Research and Quality; 2019.
18. Yale New Haven Health Services Corporation–Center for Outcomes Research and Evaluation (YNHHSC/CORE). 2020 Procedure-Specific Complication Measure Updates and Specifications Report: Elective Primary Total Hip Arthroplasty (THA) and/or Total Knee Arthroplasty (TKA) – Version 9.0. Centers for Medicare & Medicaid Services (CMS), 2020.
19. Cost-to-Charge Ratio for Inpatient Files. Agency for Healthcare Research and Quality; 2020 [cited November 22, 2020]; Available from: https://www.hcup-us.ahrq.gov/db/ccr/ip-ccr/ipccr.jsp.
20. U.S. Bureau of Labor Statistics. BLS Data Viewer. Washington, DC: U.S. Bureau of Labor Statistics; [cited 2020]; Available from: https://beta.bls.gov/dataViewer/view/timeseries/CUUR0000SAM.
21. Osborne NH, Nicholas LH, Ryan AM, Thumma JR, Dimick JB. Association of hospital participation in a quality reporting program with surgical outcomes and expenditures for Medicare beneficiaries. JAMA. 2015;313(5):496–504.
22. Ratliff JK, Balise R, Veeravagu A, et al. Predicting Occurrence of Spine Surgery Complications Using “Big Data” Modeling of an Administrative Claims Database. JBJS. 2016;98(10).
23. Ibrahim AM, Hughes TG, Thumma JR, Dimick JB. Association of hospital critical access status with surgical outcomes and expenditures among Medicare beneficiaries. JAMA. 2016;315(19):2095–103.
24. Ibrahim AM, Ghaferi AA, Thumma JR, Dimick JB. Variation in outcomes at bariatric surgery centers of excellence. JAMA Surgery. 2017;152(7):629–36.
25. Elixhauser A, Steiner C, Harris DR, Coffey RM. Comorbidity measures for use with administrative data. Med Care. 1998;36(1):8–27.
26. Bozic K, Yu H, Zywiel MG, et al. Quality Measure Public Reporting Is Associated with Improved Outcomes Following Hip and Knee Replacement. JBJS. 2020.
27. Cohen ME, Ko CY, Bilimoria KY, et al. Optimizing ACS NSQIP Modeling for Evaluation of Surgical Quality and Risk: Patient Risk Adjustment, Procedure Mix Adjustment, Shrinkage Adjustment, and Surgical Focus. J Am Coll Surg. 2013;217(2):336–46.e1.
28. Grenda TR, Krell RW, Dimick JB. Reliability of hospital cost profiles in inpatient surgery. Surgery. 2016;159(2):375–80.
29. Adams JL, Mehrotra A, Thomas JW, McGlynn EA. Physician Cost Profiling — Reliability and Risk of Misclassification. N Engl J Med. 2010;362(11):1014–21.
30. Hwang J, Adams JL, Paddock SM. Defining and estimating the reliability of physician quality measures in hierarchical logistic regression models. Health Services and Outcomes Research Methodology. 2020:1–20.
31. Lawson EH, Ko CY, Adams JL, Chow WB, Hall BL. Reliability of evaluating hospital quality by colorectal surgical site infection type. Ann Surg. 2013;258(6):994–1000.
32. R: A Language and Environment for Statistical Computing. Version 4.0.1. Vienna, Austria: R Foundation for Statistical Computing; 2020.
33. Blais MB, Rider SM, Sturgeon DJ, et al. Establishing objective volume-outcome measures for anterior and posterior cervical spine fusion. Clin Neurol Neurosurg. 2017;161:65–9.
34. Schoenfeld AJ, Sturgeon DJ, Burns CB, Hunt TJ, Bono CM. Establishing benchmarks for the volume-outcome relationship for common lumbar spine surgical procedures. The Spine Journal. 2018;18(1):22–8.
35. Dasenbrock HH, Clarke MJ, Witham TF, Sciubba DM, Gokaslan ZL, Bydon A. The Impact of Provider Volume on the Outcomes After Surgery for Lumbar Spinal Stenosis. Neurosurgery. 2012;70(6):1346–54.
36. CMS.gov. Bundled Payments for Care Improvement (BPCI) Initiative: General Information. Baltimore, MD: Centers for Medicare & Medicaid Services; 2020 [cited November 23, 2020]; Available from: https://innovation.cms.gov/innovation-models/bundled-payments.
37. CMS.gov. Comprehensive Care for Joint Replacement Model. Baltimore, MD: Centers for Medicare & Medicaid Services; 2020 [cited November 23, 2020]; Available from: https://innovation.cms.gov/innovation-models/cjr.
38. Humana Announces Two Milestones in Value-Based Orthopedic Specialty Care, Launching Bundled Payment Model for Spinal Fusion Surgeries, and Expanding Total Joint Replacement Program. Business Wire; 2019.
39. Kalakoti P, Gao Y, Hendrickson NR, Pugely AJ. Preparing for Bundled Payments in Cervical Spine Surgery: Do We Understand the Influence of Patient, Hospital, and Procedural Factors on the Cost and Length of Stay? Spine (Phila Pa 1976). 2019;44(5):334–45.
40. Lucio JC, Vanconia RB, Deluzio KJ, Lehmen JA, Rodgers JA, Rodgers W. Economics of less invasive spinal surgery: an analysis of hospital cost differences between open and minimally invasive instrumented spinal fusion procedures during the perioperative period. Risk Manag Healthc Policy. 2012;5:65–74.
41. Carr DA, Saigal R, Zhang F, Bransford RJ, Bellabarba C, Dagal A. Enhanced perioperative care and decreased cost and length of stay after elective major spinal surgery. Neurosurg Focus. 2019;46(4):E5.
42. Schmitt M. Do hospital mergers reduce costs? J Health Econ. 2017;52:74–94.
43. Craig S, Grennan M, Swanson A. Mergers and marginal costs: New evidence on hospital buyer power. National Bureau of Economic Research, 2018.
44. Huffman KM, Cohen ME, Ko CY, Hall BL. A comprehensive evaluation of statistical reliability in ACS NSQIP profiling models. Ann Surg. 2015;261(6):1108–13.
45. Krell RW, Finks JF, English WJ, Dimick JB. Profiling Hospitals on Bariatric Surgery Quality: Which Outcomes Are Most Reliable? J Am Coll Surg. 2014;219(4):725–34.e3.
46. Austin PC, Ceyisakar IE, Steyerberg EW, Lingsma HF, Marang-van de Mheen PJ. Ranking hospital performance based on individual indicators: can we increase reliability by creating composite indicators? BMC Med Res Methodol. 2019;19(1):131.
47. Patterson JT, Sing D, Hansen EN, Tay B, Zhang AL. The James A. Rand Young Investigator’s Award: administrative claims vs surgical registry: capturing outcomes in total joint arthroplasty. The Journal of Arthroplasty. 2017;32(9):S11–S7.
48. Sacks GD, Dawes AJ, Russell MM, et al. Evaluation of hospital readmissions in surgical patients: do administrative data tell the real story? JAMA Surgery. 2014;149(8):759–64.
