Author manuscript; available in PMC: 2021 May 1.
Published in final edited form as: Health Aff (Millwood). 2020 May;39(5):862–870. doi: 10.1377/hlthaff.2019.00778

Improving the Accuracy of Hospital Report Cards by Incorporating Information on the Volume-outcome Association

Laurent G Glance 1, Caroline P Thirukumaran 2, Yue Li 3, Shan Gao 4, Andrew W Dick 5
PMCID: PMC7423250  NIHMSID: NIHMS1579620  PMID: 32364861

Abstract

CMS uses hierarchical modeling to stabilize hospital rankings by shrinking the performance of low-volume hospitals toward the performance of average hospitals. A CMS technical expert panel recommended considering the use of shrinkage targets to avoid misclassifying poor-performing low-volume hospitals as average. We used Medicare data for patients undergoing aortic valve replacement to create two sets of hospital star ratings for mortality: one based on the standard approach, and a second that shrunk the performance of low-volume hospitals toward the performance of other low-volume hospitals. After grouping hospitals into star categories using a CMS clustering algorithm used to rate overall hospital quality, there was moderate to substantial agreement in hospital star ratings for all but the lowest-volume hospitals. When hospitals were instead classified as high, average, or low performance based on their statistical outlier status, hospital ratings changed very little, which was not unexpected since nearly 99% of hospitals were classified as average. The CMS risk-adjustment methodology does not mask the performance of hospitals as long as case volumes exceed the current case volume cutoff (25) that CMS uses for public reporting.

INTRODUCTION

Performance measurement is central to the Centers for Medicare and Medicaid Services’ (CMS) efforts to make health care safer and more affordable, and to allow patients to make informed choices.(1) Performance measures must be valid to avoid misleading regulators and the public.(2) One of the most important challenges to creating valid quality measures is that serious adverse outcomes are rare and many hospitals have few cases. Reporting the performance of low-volume hospitals using the well-known observed-to-expected (OE) ratio based on standard logistic regression can lead to “wild fluctuations” in hospital ratings from year to year.(3) Instead, CMS uses hierarchical regression modeling to minimize the large year-to-year fluctuations for low-volume hospitals that are due to chance alone.(4) This technique estimates a hospital’s performance as the weighted average of the hospital’s own outcomes and the performance of average hospitals.(5) The smaller a hospital’s volume, the more this weighted estimate is tilted toward the performance of average hospitals.
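The weighted-average idea can be sketched with a toy empirical-Bayes calculation. This is not the CMS implementation: the `prior_strength` constant and all rates below are hypothetical stand-ins for the variance components a fitted hierarchical model would estimate.

```python
# Illustrative sketch: shrinkage of a hospital's mortality rate toward
# the overall mean, with the weight on the hospital's own data growing
# with its case volume. Constants are made up for illustration.

def shrunken_rate(deaths, cases, overall_rate, prior_strength=50.0):
    """Weighted average of a hospital's own rate and the overall rate.

    `prior_strength` is a hypothetical tuning constant standing in for
    the within- vs. between-hospital variance ratio a hierarchical
    model would estimate.
    """
    weight = cases / (cases + prior_strength)   # approaches 1 as volume grows
    own_rate = deaths / cases
    return weight * own_rate + (1 - weight) * overall_rate

# A 10-case hospital with a 20% observed mortality rate is pulled most
# of the way toward the 3% national rate; a 500-case hospital with the
# same observed rate is barely moved.
small = shrunken_rate(deaths=2, cases=10, overall_rate=0.03)
large = shrunken_rate(deaths=100, cases=500, overall_rate=0.03)
```

The same 20% observed rate yields very different shrunken estimates, which is exactly the stabilizing behavior described above.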

It has long been understood that low-volume hospitals have higher mortality rates for surgical procedures and common medical conditions, and that the performance of low-volume hospitals is frequently below average.(6-9) Nearly 10 years ago, Silber showed that Medicare’s Hospital Compare model for acute myocardial infarction (AMI) strikingly underestimates the risk-adjusted mortality rates of low-volume hospitals,(3) and that adding hospital volume to the AMI model resulted in much more accurate mortality estimates for these hospitals. Citing Silber’s work, a recent white paper commissioned by CMS and the Committee of Presidents of Statistical Societies (COPSS) recommends that measure developers consider incorporating shrinkage targets in hierarchical modeling to address this problem.(10) With shrinkage targets based on hospital case volume, a hospital’s performance is calculated as the weighted average of the hospital’s own outcomes and the performance of other hospitals with similar case volumes. This approach shrinks the performance of low-volume hospitals toward the overall performance of other low-volume hospitals, instead of toward the performance of average hospitals.
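The difference between the two shrinkage targets can be made concrete with the same toy calculation (all rates and the `prior_strength` constant are hypothetical; this is a sketch of the idea, not the COPSS-CMS estimator):

```python
# Sketch: shrink a low-volume hospital toward the mean of *other
# low-volume hospitals* (shrinkage target) rather than the national
# mean (standard shrinkage). Volume strata and rates are made up.

def shrunken_rate(deaths, cases, target_rate, prior_strength=50.0):
    weight = cases / (cases + prior_strength)
    return weight * (deaths / cases) + (1 - weight) * target_rate

national_rate = 0.03
low_volume_rate = 0.055   # hypothetical mean among <25-case hospitals

# The same 10-case hospital with 1 death, under each shrinkage target:
standard = shrunken_rate(1, 10, national_rate)     # pulled toward 3%
targeted = shrunken_rate(1, 10, low_volume_rate)   # pulled toward 5.5%
```

Because the low-volume stratum's mean is higher than the national mean, the targeted estimate sits above the standard one for the identical hospital, which is the behavior the white paper recommends.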

The goal of this exploratory analysis was to examine the changes in hospital star rankings for aortic valve replacement (AVR) when shrinkage targets based on hospital case volume are used instead of the standard CMS approach. We used Medicare data to create two sets of hospital star ratings for 30-day mortality, one based on conventional hierarchical modeling and the other using hospital case volume as a shrinkage target. CMS uses a star-rating system (1 to 5 stars) to publicly report the overall performance of hospitals based on a composite outcome measure that includes mortality and readmissions.(11) Although AVR mortality is not publicly reported on Hospital Compare, we selected this surgery because it is one of the most common cardiac surgeries, is often performed in low-volume hospitals, and has a strong volume-outcome association.(8) For simplicity, we based our analysis on a single outcome, mortality, instead of a composite outcome. We hypothesized that shrinkage targets would shift the measured performance of many low-volume hospitals from “average” to “below average,” given the strong volume-outcome association for this surgery. Our study goes beyond Silber’s seminal report by illustrating the impact of using shrinkage targets on hospital quality ratings. Our findings may prove useful to CMS, the National Quality Forum, measure developers, and other stakeholders seeking to address the low case volume bias inherent in hierarchical modeling, which treats low-volume hospitals as if they were average when, for many conditions with a strong volume-outcome association, their performance may be well below average.

STUDY DATA AND METHODS

Data Source

This study was conducted using the 100% Medicare Provider Analysis and Review (MEDPAR) files and the Master Beneficiary Summary files (MBSF) between 2013 and 2015. These databases include beneficiary demographic information, International Classification of Diseases, Ninth Revision (ICD-9-CM) diagnosis and procedure codes, and date of death for all fee-for-service Medicare patients. The Institutional Review Board of the University of Rochester School of Medicine and Dentistry approved the study protocol.

Study Sample

We identified 134,144 patients who underwent aortic valve replacement between January 2013 and September 2015. Patients younger than 65 years (7,422), or who also underwent mitral valve replacement (3,779) or mitral valve repair (7,603) were excluded (Appendix Exhibit A1).(12) The analytic data set consisted of 115,084 observations in 1,166 hospitals.

Model development

We first estimated a baseline non-hierarchical multivariable logistic regression model (model 1). We adjusted for patient age, surgical urgency, concomitant coronary artery bypass grafting (CABG) surgery, history of previous cardiac surgery, and coexisting diseases using the Elixhauser Comorbidity algorithm.(13) We then examined the volume-outcome association (model 2) because the use of shrinkage targets based on hospital case volume would not be indicated in the absence of a clinically significant volume-outcome association (Appendix Exhibit A5).(12) We used robust variance estimators in models 1 and 2 to account for clustering of observations within hospitals.(14)

We estimated a hierarchical logistic regression model (without shrinkage targets) for 30-day mortality based on the baseline model (model 3). We specified hospitals as random effects. We also estimated another model identical to the above model, but which included hospital case volume as a shrinkage target (model 4). The optimal specification for the volume term was determined using fractional polynomials.(15, 16)

Hospital performance

We used model 3 to calculate the hospital predicted-to-expected (PE) ratio using the standard CMS approach.(17) The PE ratio is a measure of hospital performance analogous to the hospital observed-to-expected (OE) mortality ratio based on non-hierarchical modeling. The hospital predicted mortality rate (P) was calculated using patient-level risk factors and includes the hospital contribution to outcomes. The hospital expected mortality rate (E) was calculated using only patient-level risk factors and does not include the hospital effect.(12) We used bootstrapping to estimate 95% confidence intervals around the hospital PE ratios.(18) The risk-adjusted mortality rate (RAMR) was calculated by multiplying the hospital PE ratio by the overall 30-day mortality rate for all patients.
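The PE-ratio and RAMR arithmetic can be sketched as follows. The inputs are simulated rather than fitted model output, and the 1.4x hospital multiplier and 3% national rate are hypothetical:

```python
import numpy as np

# Sketch of the PE-ratio construction described above. `expected` uses
# patient risk factors only (E); `predicted` also carries a hypothetical
# hospital effect (P), here a 1.4x multiplier for a worse-than-average
# hospital. All numbers are simulated, not Medicare estimates.

rng = np.random.default_rng(0)
expected = rng.uniform(0.01, 0.10, size=200)    # E: patient factors only
predicted = expected * 1.4                      # P: includes hospital effect

pe_ratio = predicted.sum() / expected.sum()     # = 1.4 by construction
national_rate = 0.03                            # overall 30-day mortality
ramr = pe_ratio * national_rate                 # risk-adjusted mortality rate
```

A PE ratio above 1 means the hospital's patients fare worse than patients with the same risk profile would at an average hospital; multiplying by the national rate converts that ratio to an interpretable mortality rate.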

We calculated hospital PE ratios based on shrinkage targets, as described in the COPSS-CMS White Paper, using model 4.(10) The hospital predicted mortality P was a function of hospital case volume, in addition to patient risk factors and the hospital contribution to outcome. The hospital expected mortality rate (E) was calculated using only patient-level risk factors and a calibration factor.(12)

We applied k-means clustering to assign each hospital 1 to 5 stars.(11) This iterative procedure partitions hospitals into five categories by minimizing the distance between each hospital’s RAMR and the mean RAMR for each star category.(11, 19) We applied this CMS algorithm separately to the distribution of RAMRs based on (1) standard shrinkage and (2) shrinkage targets to create 2 sets of star ratings. We also classified hospitals as low-performance outliers if the lower limit of their 95% confidence interval was greater than the national mortality rate, and as high-performance outliers if the upper limit was lower than the national mortality rate.
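A one-dimensional k-means partition in the spirit of this clustering step can be sketched as below. The RAMRs are simulated, and this simple Lloyd's-algorithm loop is an illustration of the technique, not the CMS production algorithm:

```python
import numpy as np

# One-dimensional k-means over hospital RAMRs: partition hospitals so
# each RAMR is close to its star category's mean RAMR. Simulated data.

rng = np.random.default_rng(1)
ramrs = np.sort(rng.normal(0.030, 0.010, size=300).clip(min=0.001))

k = 5
centers = np.quantile(ramrs, np.linspace(0.1, 0.9, k))  # spread-out starts
for _ in range(100):
    # assign each hospital to the nearest category mean
    labels = np.argmin(np.abs(ramrs[:, None] - centers[None, :]), axis=1)
    # recompute each category mean (keep old center if a cluster empties)
    new = np.array([ramrs[labels == j].mean() if np.any(labels == j)
                    else centers[j] for j in range(k)])
    if np.allclose(new, centers):
        break
    centers = new

# Lower RAMR means better outcomes, so the lowest-mortality cluster
# receives 5 stars. Centers stay sorted here, so labels run 0..4 in
# order of increasing mortality.
stars = 5 - labels
```

Because the procedure minimizes within-category distance to the category mean, the resulting star cut-points adapt to the shape of the RAMR distribution rather than using fixed thresholds.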

We also calculated the overall OE ratio based on all of the patients in each volume quartile. We chose the OE ratio because it provides an estimate of the performance of all of the hospitals in a quartile taken together as a group. The observed mortality rate (O) is the actual mortality rate for the patients treated by all of the hospitals in a volume quartile, whereas the expected mortality rate (E) is the average of the predicted mortality rates for the same patients based on the baseline non-hierarchical model (model 1). While hospital-level OE ratios for low-volume hospitals are frequently unstable due to small sample sizes, the overall OE ratio for each volume quartile, including the lowest-volume quartile, is based on a large number of patients and is therefore expected to provide a reliable estimate of the overall performance of the hospitals in that quartile.
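The pooled OE calculation for one quartile can be sketched with simulated data. Here the quartile's true mortality is deliberately set to 1.9 times the model's expectation, so the pooled ratio recovers a value near 1.9 even though no single hospital's ratio would be stable:

```python
import numpy as np

# Pooled observed-to-expected (OE) ratio for one volume quartile:
# total deaths divided by the sum of model-expected risks for every
# patient treated in that quartile. Data are simulated.

rng = np.random.default_rng(2)
expected_risk = rng.uniform(0.01, 0.08, size=5000)   # patient-level E
died = rng.random(5000) < expected_risk * 1.9        # quartile truly worse

oe_ratio = died.sum() / expected_risk.sum()          # near 1.9 by design
```

With thousands of pooled patients the ratio's sampling noise is small, which is why the quartile-level OE ratio is a reliable benchmark even when its constituent hospitals are tiny.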

Comparison of model fit

We evaluated model calibration for the standard shrinkage model using calibration plots. We first ranked the observations according to predicted risk of 30-day mortality, and then divided the analytic sample into equal-sized deciles of risk. We then plotted the mean observed mortality rate alongside the mean predicted mortality rate for each decile as a function of the decile of risk. To further evaluate model calibration, we created separate calibration plots for very-low (<25 cases), low (25-49 cases), medium (50-124 cases), and high-volume (≥125 cases) hospitals, based approximately on hospital volume quartiles. In addition to standard calibration plots based on deciles of risk, we compared the observed and predicted mortality rates for each of the four volume quartiles using the two-sample t-test. We also evaluated model discrimination using the C statistic.(20) We evaluated the performance of the shrinkage targets model in a similar fashion.
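The decile-of-risk calibration check can be sketched as follows. The predictions here are simulated and perfectly calibrated by construction, so predicted and observed means track each other within sampling noise in every decile:

```python
import numpy as np

# Calibration by deciles of predicted risk: rank patients by predicted
# 30-day mortality, split into ten equal-sized groups, and compare the
# mean predicted rate to the mean observed rate in each group.

rng = np.random.default_rng(3)
predicted = rng.uniform(0.005, 0.15, size=10_000)           # simulated risks
observed = (rng.random(10_000) < predicted).astype(float)   # calibrated draws

order = np.argsort(predicted)                # rank by predicted risk
for idx in np.array_split(order, 10):        # deciles of risk
    print(f"pred {predicted[idx].mean():.3f}  obs {observed[idx].mean():.3f}")
```

A miscalibrated model would show the observed column drifting systematically away from the predicted column, typically in the extreme deciles.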

Analysis

Our analytic plan is outlined in Appendix Exhibit A3.(12) We first compared the distributions of hospital PE ratios based on (1) standard shrinkage and (2) shrinkage targets to the overall OE ratio for each hospital volume quartile using the sign test. We then compared hospital rankings based on standard shrinkage (model 3) and shrinkage targets (model 4) by assessing the agreement of (1) star ratings; (2) RAMRs; and (3) performance outlier status. We assessed the agreement for the star ratings and outlier status using kappa analysis,(21) and repeated this analysis for star ratings after stratifying hospitals into volume quartiles. We assessed agreement for RAMRs using the intraclass correlation coefficient, and then repeated this analysis after stratifying hospitals into volume quartiles. Agreement was evaluated using the Landis scale: values less than 0 suggest poor agreement; 0.00-0.20 slight agreement; 0.21-0.40 fair agreement; 0.41-0.60 moderate agreement; 0.61-0.80 substantial agreement; and 0.81-1.00 almost perfect agreement.(22) All statistical analyses were performed using Stata SE/MP (version 15.1; StataCorp, College Station, TX).
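The kappa calculation and its Landis interpretation can be sketched directly. The two rating vectors below are illustrative, not the study's star ratings:

```python
from collections import Counter

# Cohen's kappa for two sets of categorical ratings (e.g. star ratings
# under the two shrinkage approaches), with the Landis-scale labels
# used in the text. The example ratings are made up.

def cohens_kappa(a, b):
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n       # raw agreement
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb.get(k, 0) for k in ca) / n**2  # chance agreement
    return (observed - expected) / (1 - expected)

def landis_label(kappa):
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"

ratings_a = [3, 3, 4, 2, 5, 3, 1, 4, 3, 2]   # hypothetical star ratings
ratings_b = [3, 2, 4, 2, 5, 3, 2, 4, 3, 2]
k = cohens_kappa(ratings_a, ratings_b)
```

Kappa corrects the raw agreement rate for the agreement expected by chance given each rater's marginal distribution, which is why it is preferred to simple percent agreement for comparing the two rating schemes.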

STUDY RESULTS

Volume-outcome Association

Patient demographics are shown in Appendix Exhibit A4.(12) Patients treated in hospitals in the lowest-volume quartile (<25 cases) had 2.6-fold higher odds of mortality (adjusted odds ratio [AOR] 2.61; 95% confidence interval [CI]: 2.19, 3.11; P<0.001) compared to hospitals in the highest-volume quartile (≥125 cases) (Appendix Exhibit A5).(12) Patients in hospitals with case volumes between 25 and 49 cases (quartile 2) had 1.75-fold higher odds of mortality (AOR 1.75; 95% CI: 1.51, 2.04; P<0.001), whereas patients in hospitals with case volumes between 50 and 124 (quartile 3) had nearly 1.5-fold higher odds of mortality (AOR 1.48; 95% CI: 1.32, 1.65; P<0.001) compared to the highest-volume hospitals.

Model Performance

Both the standard shrinkage model (model 3) and the shrinkage targets model (model 4) demonstrated very good discrimination (C statistic 0.80). Visual inspection of the calibration plots suggests that both models are well calibrated (Appendix Exhibit A6).(12) Separate calibration graphs for each of the four volume quartiles, however, suggest that the shrinkage targets model is better calibrated than the model without shrinkage targets in quartiles 1 and 2. (Appendix Exhibit A7).(12)

Appendix Exhibit A8 examines model calibration in each volume quartile by comparing the overall observed mortality to the predicted mortality based on the models with and without shrinkage targets. It shows that risk-adjusted mortality rates based on the model without shrinkage targets substantially underestimate mortality in quartiles 1 and 2. By comparison, risk-adjusted mortality rates based on the model with shrinkage targets much more closely approximate the observed mortality rates for quartiles 1 and 2. Both models, with and without shrinkage targets, are well calibrated in quartiles 3 and 4.

Distribution of Hospital Predicted-to-Expected (PE) Mortality Ratios

Exhibit 1 compares the distribution of hospital PE ratios based on the model without shrinkage targets (model 3) to the overall observed-to-expected mortality (OE) ratio for each volume quartile. The medians of the hospital PE ratio for quartiles 1, 2, and 3 are substantially less than the overall OE ratios for each of the hospital volume quartiles (P < 0.001). While the OE ratios for quartiles 1, 2 and 3 are 1.90, 1.37, and 1.18, the median of the hospital PE ratios for quartiles 1, 2, and 3 are 1.02, 0.98, and 0.99 respectively. These findings suggest that hospital PE ratios based on the baseline model without shrinkage targets tend to underestimate the mortality of patients treated in low-volume hospitals, and that the degree of under-estimation is greatest in the lowest volume quartiles.

Exhibit 1. Hospital PE Ratios Based on No Shrinkage Targets.


SOURCE Authors’ analysis of 100% Medicare Provider Analysis and Review (MEDPAR) files and the Master Beneficiary Summary files (MBSF) between 2013 and 2015. NOTES Box-plot of distribution of Hospital PE ratios for each volume quartile based on no shrinkage targets versus overall Quartile observed-to-expected (OE) mortality ratio.

Exhibit 2 compares the distribution of hospital PE ratios based on the model with shrinkage targets (model 4) to the overall observed-to-expected mortality (OE) ratio for each volume quartile. In this case, the medians of the hospital PE ratios for quartiles 1, 2, 3 and 4 are not significantly different from the OE ratios for quartiles 1, 2, 3 and 4 (P > 0.05). Together these findings suggest that PE ratios based on shrinkage targets more accurately reflect hospital performance than PE ratios based on the baseline model without shrinkage targets.

Exhibit 2. Hospital PE Ratios Based on Shrinkage Targets.


SOURCE Authors’ analysis of 100% Medicare Provider Analysis and Review (MEDPAR) files and the Master Beneficiary Summary files (MBSF) between 2013 and 2015. NOTES Box-plot of distribution of Hospital PE ratios for each volume quartile based on shrinkage targets versus overall quartile observed-to-expected (OE) mortality ratio.

Distribution of Hospital Star Ratings

Appendix Exhibit A8 displays the range of risk-adjusted mortality rates for hospitals across the five star categories.(12) Exhibit 3 displays hospital star ratings when these are based on RAMRs estimated using shrinkage targets versus no shrinkage targets. Overall, kappa analysis revealed moderate agreement (kappa = 0.51) between star ratings based on no shrinkage targets versus shrinkage targets; star ratings disagreed 14.3% of the time. Exhibit 4 and Appendix Exhibit A10(12) also display hospital star ratings based either on shrinkage targets or no shrinkage targets, but stratified by hospital volume quartile. Hospital star ratings based on shrinkage targets exhibited only slight agreement with star ratings based on no shrinkage targets in quartile 1 (kappa = 0.089), and moderate to substantial agreement for quartile 2 (kappa = 0.64), quartile 3 (kappa = 0.61), and quartile 4 (kappa = 0.55). Depending on whether shrinkage targets were used to classify hospitals, star ratings were discordant 29.1% of the time in quartile 1, 8.5% in quartile 2, 10.4% in quartile 3, and 11.3% in quartile 4. Of the 93 very-low volume hospitals (<25 cases) classified as 3-stars using standard shrinkage, 13 were classified as 1-star and 43 as 2-stars using shrinkage targets.

Exhibit 3. Hospital Star Ratings Based on Shrinkage Targets vs No Shrinkage Targets (all hospitals).


SOURCE Authors’ analysis of 100% Medicare Provider Analysis and Review (MEDPAR) files and the Master Beneficiary Summary files (MBSF) between 2013 and 2015. NOTES Shift in hospital star ratings when they are based on shrinkage targets instead of no shrinkage targets for all hospitals. For example, 319 hospitals were classified as 3-stars using standard shrinkage. Of these, 56 were re-classified as 1 or 2-stars, and 131 as 4-stars using shrinkage targets.

Exhibit 4. Hospital Star Ratings Based on Shrinkage Targets vs No Shrinkage Targets (Volume Quartile 1).


SOURCE Authors’ analysis of 100% Medicare Provider Analysis and Review (MEDPAR) files and the Master Beneficiary Summary files (MBSF) between 2013 and 2015. NOTES Shift in hospital star ratings when star ratings are based on shrinkage targets instead of no shrinkage targets for very-low volume hospitals (<25 cases). For example, 93 hospitals were classified as 3-stars using standard shrinkage. Of these, 13 were classified as 1-star, 43 as 2-stars, and 37 as 3-stars using shrinkage targets.

Comparison of Hospital RAMR and Hospital Outlier Status

Overall, there was substantial agreement between hospital RAMRs based on standard shrinkage versus shrinkage targets (intraclass correlation coefficient [ICC] = 0.64) (Appendix Exhibit A11).(12) Agreement was only slight for quartile 1 (ICC = 0.085), but was substantial for quartile 2 (ICC = 0.64), and almost perfect for quartiles 3 (ICC = 0.91) and 4 (ICC = 0.99).

Very few hospitals were identified as performance outliers: 99.1% of hospitals were classified as average using standard shrinkage and 98.7% were classified as average using shrinkage targets. The level of agreement between these two approaches was substantial (kappa = 0.77).

Limitations

This study has several limitations. First, our decision to limit this analysis to a single surgery could lead to questions regarding the generalizability of our findings. However, the extent to which shrinkage estimators distort hospital profiling is a function of the strength of the volume-outcome association and not the procedure itself. Thus, we would expect to see similar findings for other procedures and conditions that exhibit a strong volume-outcome association, such as coronary artery bypass graft surgery, mitral valve replacement, lower extremity bypass, and acute myocardial infarction.(3, 8) Second, we elected to include all hospitals in our analysis as opposed to excluding hospitals with fewer than 25 cases per year. We included these very low-volume hospitals because excluding them would have removed one-fourth of the hospitals, which arguably would have created a large blind spot in a measurement system for AVRs. We believe that the performance of very low-volume hospitals should be reported to ensure transparency and accountability, since it is precisely these lowest-volume hospitals which, as a group, have the worst outcomes. Furthermore, our approach is consistent with the COPSS-CMS white paper, which recommended avoiding volume cutoffs in performance reporting.(10) As recommended by the expert panel, our study has explored the use of case volume as a shrinkage target(10) and provides empirical evidence that this approach reduces some of the distortion of hospital profiling caused by standard shrinkage estimators. However, the use of shrinkage targets is not expected to address the uncertainty around the point estimates for hospital PE ratios for very-low and low-volume hospitals better than the use of conventional shrinkage estimators.

Finally, our examination of the comparative accuracy of the standard approach versus shrinkage targets across volume quartiles compares the predictions of these models to the observed mortality rate. In doing so, we are assuming that the observed mortality rate is the true rate. Although we do not know the true stochastic process generating deaths, and thus the true mortality rate, the sample size in each quartile is large enough to provide a reasonable approximation of the true mortality rate. Other approaches to examining model goodness-of-fit, such as the Hosmer-Lemeshow statistic,(23) also assume that the observed mortality rate is the true mortality rate.

DISCUSSION

Because information on hospital performance is at the center of efforts to redesign the health care system, the accuracy of performance measurement is of paramount importance. One of the principal criticisms of the CMS approach is that it “masks performance of small hospitals” and may provide misleading information to patients, referring physicians, and third-party payers.(2, 10) CMS uses a statistical methodology that calculates a hospital’s performance as the weighted average of a hospital’s own outcomes and the performance of average hospitals. The weight assigned to a hospital’s actual outcomes in this calculation decreases as its case volume decreases. Although this approach is less likely to result in extreme values for hospital performance due to chance alone, it shrinks low-volume providers to the national mean, ignoring the fact that for many conditions and surgeries, hospitals with smaller case volumes have worse outcomes than higher-volume hospitals.

We created two parallel sets of performance measures: one assumed that low-volume hospitals are average (no shrinkage targets), while the other incorporated prior knowledge that low-volume hospitals have worse outcomes than high-volume hospitals (shrinkage targets). We included very-low volume hospitals in our analyses because the CMS expert panel recommended that low-volume hospitals not be excluded from performance reporting.(10) We found that hospital predicted-to-expected (PE) mortality ratios based on hierarchical modeling without shrinkage targets were clustered around 1 for very-low and low-volume hospitals because hierarchical modeling shrinks their performance to the performance of the average hospital. In theory, hospitals with PE ratios close to 1 should have mortality outcomes similar to an average hospital. However, the overall OE ratios for patients undergoing AVR surgery in very-low and low-volume hospitals were not close to 1; they were 1.90 and 1.37, respectively. By comparison, the hospital PE ratios based on shrinkage targets for very-low volume and low-volume hospitals were clustered around the overall OE ratio. In other words, the hospital PE ratios based on shrinkage targets were consistent with the overall outcomes of patients treated in very-low and low-volume hospitals, while hospital PE ratios based on the model without shrinkage targets provided an overly optimistic assessment of the performance of very-low and low-volume hospitals.

CMS assigns hospitals an overall quality star rating of 1 to 5 stars.(11, 24) This approach provides patients, physicians, and other stakeholders with information that allows them to differentiate among high-performing (4- or 5-star), average (3-star), and low-performing (1- or 2-star) hospitals. After grouping hospitals into star categories using the same clustering algorithm used by CMS,(11) we found that star ratings based on shrinkage targets exhibited moderate to substantial agreement with ratings based on the standard approach for all but the lowest-volume hospitals. Star ratings for hospitals with case volumes less than 25 were different nearly 30% of the time depending on whether shrinkage targets were used. For example, of the 93 very-low volume hospitals assigned 3 stars using the standard approach without shrinkage targets, 56 were classified as 1- or 2-star hospitals using shrinkage targets. Together, these findings suggest that the star ratings for all but the lowest-volume hospitals are not distorted when shrinkage targets are not used. Thus, the use of shrinkage targets may be important if CMS chooses to measure the performance of hospitals with case volumes less than 25.

In addition to star ratings for overall quality, CMS separately reports risk-adjusted outcome rates as “no different than the national rate,” “better than the national rate,” or “worse than the national rate.” This metric takes into account the statistical uncertainty around the estimate of the risk-adjusted outcome, in contrast to star ratings, which are based only on the point estimates for the risk-adjusted outcome rate. But because more than 99.5% of hospitals are identified as average for reported conditions such as acute myocardial infarction,(3) this classification system conveys little information to patients and is not used by CMS for value-based purchasing. In our analysis, nearly 99% of the hospitals were classified as average using either no shrinkage targets or shrinkage targets. Although these two approaches exhibited substantial agreement when hospital ratings were based on outlier status, this is not surprising since both methods classified nearly all hospitals as average.

Since CMS uses risk-standardized outcome rates based on PE ratios for public reporting and value-based purchasing in programs such as the Hospital Readmission Reduction Program(25) and the Comprehensive Joint Replacement Program,(26) we also investigated the impact of shrinkage targets on PE ratios. We found that PE ratios showed poor agreement for very-low volume hospitals, an intermediate level of agreement for low-volume hospitals, and excellent agreement for high and very-high volume hospitals. In particular, RAMRs based on standard shrinkage were consistently lower than RAMRs based on shrinkage targets for very-low and low-volume hospitals. These findings are consistent with our main finding that star rankings based on standard shrinkage for very-low volume hospitals provide an overly optimistic estimate of hospital performance.

The fact that the standard methodology used by CMS distorts the performance of low-volume hospitals has been previously reported by Silber and others, (3, 27, 28) but is not generally well understood by the medical or health care policy community. The use of shrinkage targets to address this limitation was identified as a top priority in a recent COPSS-CMS white paper commissioned by CMS.(10) However, to the best of our knowledge, measure developers have not submitted measures using shrinkage targets to NQF for endorsement. In our study, we operationalized shrinkage targets as described in the COPSS-CMS white paper and have shown that the standard methodology does not introduce significant distortions in hospital rankings for most hospitals, with the exception of very-low volume hospitals. Our study builds on prior work by Silber and others by showing that shrinkage targets lead to major shifts in quality rankings for very-low volume hospitals compared to not using shrinkage targets. To the best of our knowledge, ours is the first study to demonstrate the practical implications of using shrinkage targets compared to not using shrinkage targets.

In theory, one of the main advantages of hierarchical modeling is that the performance of very low-volume hospitals can be measured. The current CMS practice of excluding very low-volume hospitals creates a blind spot in performance profiling, and makes it more difficult for patients to make informed choices. Using the standard CMS methodology that does not incorporate shrinkage targets, however, distorts the performance of very low-volume hospitals and may provide patients with potentially misleading information. Although shrinkage targets result in more accurate performance measures for very-low volume hospitals, these hospitals may argue that incorporating hospital volume into performance profiling is unfair, since some small hospitals may deliver excellent care and shrinkage targets pull their mortality rates upwards. Since the true performance of individual very low-volume hospitals may be unknowable because of sample size issues, policy makers need to determine whether the benefit of providing patients with information on very low-volume hospitals outweighs the risk of misclassifying some of these hospitals as low quality. We believe that CMS has an obligation to the public to report the performance of very-low volume hospitals for procedures where the risk of poor outcomes is especially high. But we also appreciate that such an approach may unintentionally misclassify some very low-volume hospitals as low-performance.

Conclusion

Our findings demonstrate the feasibility of implementing the recommendation made by the COPSS-CMS Committee tasked with addressing the criticism that the CMS methodology masks the performance of low-volume hospitals.(10) Our findings suggest that the use of shrinkage targets does not have a significant impact on the classification of hospitals with case volumes > 25, which is the current volume cutoff used by CMS for public reporting. Our findings are particularly important in light of the ongoing controversy surrounding the use of hierarchical modeling in risk adjustment at the National Quality Forum, and the strong recommendation to CMS by its expert panel to consider the option of using shrinkage targets.(10)

Supplementary Material

Appendix

Contributor Information

Laurent G. Glance, Department of Anesthesiology and Perioperative Medicine, University of Rochester, NY.

Caroline P. Thirukumaran, Department of Orthopaedics, University of Rochester, NY.

Yue Li, Department of Public Health Sciences, University of Rochester, NY.

Shan Gao, Department of Biostatistics and Computational Biology, University of Rochester, NY.

Andrew W. Dick, RAND Health, RAND, Boston, MA.

NOTES

  • 1. Burwell SM. Setting value-based payment goals--HHS efforts to improve U.S. health care. N Engl J Med. 2015;372(10):897–9.
  • 2. Glance LG, Joynt Maddox K, Johnson K, Nerenz D, Cella D, Borah B, et al. National Quality Forum Guidelines for evaluating the scientific acceptability of risk-adjusted clinical outcome measures: A report from the National Quality Forum Scientific Methods Panel. Ann Surg. 2019. (ePub).
  • 3. Silber JH, Rosenbaum PR, Brachet TJ, Ross RN, Bressler LJ, Even-Shoshan O, et al. The Hospital Compare mortality model and the volume-outcome relationship. Health Serv Res. 2010;45(5 Pt 1):1148–67.
  • 4. Krumholz HM, Brindis RG, Brush JE, Cohen DJ, Epstein AJ, Furie K, et al. Standards for statistical models used for public reporting of health outcomes: An American Heart Association scientific statement from the Quality of Care and Outcomes Research Interdisciplinary Writing Group: cosponsored by the Council on Epidemiology and Prevention and the Stroke Council endorsed by the American College of Cardiology Foundation. Circulation. 2006;113(3):456–62.
  • 5. Snijders TAB, Bosker RJ. Multilevel Analysis. Newbury Park: Sage; 1999.
  • 6. Birkmeyer JD, Siewers AE, Finlayson EV, Stukel TA, Lucas FL, Batista I, et al. Hospital volume and surgical mortality in the United States. N Engl J Med. 2002;346(15):1128–37.
  • 7. Ross JS, Normand SL, Wang Y, Ko DT, Chen J, Drye EE, et al. Hospital volume and 30-day mortality for three common medical conditions. N Engl J Med. 2010;362(12):1110–8.
  • 8. Reames BN, Ghaferi AA, Birkmeyer JD, Dimick JB. Hospital volume and operative mortality in the modern era. Ann Surg. 2014;260(2):244–51.
  • 9. Chhabra KR, Dimick JB. Hospital networks and value-based payment: fertile ground for regionalizing high-risk surgery. JAMA. 2015;314(13):1335–6.
  • 10. Ash AS, Fienberg SE, Louis TA, Normand ST, Stukel TA, Utts J. Statistical issues in assessing hospital performance [Internet]. Baltimore: Centers for Medicare & Medicaid Services; 2012 Jan [cited 2020 Feb 13]. 70 p. Available from: https://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/HospitalQualityInits/Downloads/Statistical-Issues-in-Assessing-Hospital-Performance.pdf.
  • 11. Venkatesh AK, Bernheim SM, Qin L, Bao H, Simoes J, Wing M, et al. Overall hospital quality star rating on Hospital Compare methodology report (v3.0) [Internet]. New Haven: Yale New Haven Health Services Corporation/Center for Outcomes Research & Evaluation; 2017 Dec [cited 2020 Feb 13]. 47 p. Available from: https://www.qualitynet.org/files/5d0d3a1b764be766b0103ec1?filename=Star_Rtngs_CompMthdlgy_010518.pdf.
  • 12.To access the appendix, click on the Details tab of the article online.
  • 13. Quan H, Sundararajan V, Halfon P, Fong A, Burnand B, Luthi JC, et al. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med Care. 2005;43(11):1130–9.
  • 14. Hosmer DW, Lemeshow S. Applied Logistic Regression. 2nd ed. New York: Wiley-Interscience; 2000.
  • 15. Royston P, Ambler G, Sauerbrei W. The use of fractional polynomials to model continuous risk variables in epidemiology. Int J Epidemiol. 1999;28(5):964–74.
  • 16.Hospital case volume was specified using 2 terms: (1) log(hospital case volume/1000); (2) (hospital case volume/1000).
  • 17. Krumholz HM, Wang Y, Mattera JA, Wang Y, Han LF, Ingber MJ, et al. An administrative claims model suitable for profiling hospital performance based on 30-day mortality rates among patients with an acute myocardial infarction. Circulation. 2006;113(13):1683–92.
  • 18. Grady JN, Lin Z, Wang Y, Nwosu C, Keenan M, Bhat K, et al. 2013 measures updates and specifications: Acute myocardial infarction, heart failure, and pneumonia 30-day risk-standardized mortality measure (Version 7.0) [Internet]. New Haven: Yale New Haven Health Services Corporation/Center for Outcomes Research & Evaluation; 2013 Mar [cited 2020 Feb 13]. 55 p. Available from: https://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/HospitalQualityInits/Mortality_AMI-HF-PN_Measures_Updates_Report_FINAL_06-13-2013.pdf.
  • 19. Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY. An efficient k-means clustering algorithm: Analysis and implementation. IEEE Trans Pattern Anal Mach Intell. 2002;24(7):881–92.
  • 20. Pencina MJ, D’Agostino RB. Evaluating discrimination of risk prediction models: The C statistic. JAMA. 2015;314(10):1063–4.
  • 21. Viera AJ, Garrett JM. Understanding interobserver agreement: the kappa statistic. Fam Med. 2005;37(5):360–3.
  • 22. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–74.
  • 23. Hosmer DW, Lemeshow S, Sturdivant RX. Assessing the fit of the model. In: Applied Logistic Regression. 3rd ed. Hoboken, NJ: John Wiley & Sons; 2013.
  • 24. Chung JW, Dahlke AR, Barnard C, DeLancey JO, Merkow RP, Bilimoria KY. The Centers For Medicare And Medicaid Services hospital ratings: Pitfalls of grading on a single curve. Health Aff (Millwood). 2019;38(9):1523–9.
  • 25. McIlvennan CK, Eapen ZJ, Allen LA. Hospital readmissions reduction program. Circulation. 2015;131(20):1796–803.
  • 26. Centers for Medicare & Medicaid Services Innovation Center. Overview of CJR quality measures, composite quality score, and pay-for-performance methodology [Internet]. Washington, D.C.: Centers for Medicare & Medicaid Services; [cited 2020 Feb 13]. 15 p. Available from: https://innovation.cms.gov/Files/x/cjr-qualsup.pdf.
  • 27. Mukamel DB, Glance LG, Dick AW, Osler TM. Measuring quality for public reporting of health provider quality: making it meaningful to patients. Am J Public Health. 2010;100(2):264–9.
  • 28. Sosunov EA, Egorova NN, Lin HM, McCardle K, Sharma V, Gelijns AC, et al. The impact of hospital size on CMS hospital profiling. Med Care. 2016;54(4):373–9.
