Author manuscript; available in PMC: 2018 Mar 7.
Published in final edited form as: N Engl J Med. 2017 Jun 15;376(24):2358–2366. doi: 10.1056/NEJMsa1613412

Changes in Hospital Quality Associated with Hospital Value-Based Purchasing

Andrew M Ryan 1, Sam Krinsky 1, Kristin A Maurer 1, Justin B Dimick 1
PMCID: PMC5841552  NIHMSID: NIHMS889788  PMID: 28614675

Abstract

BACKGROUND

Starting in fiscal year 2013, the Hospital Value-Based Purchasing (HVBP) program introduced quality performance–based adjustments of up to 1% to Medicare reimbursements for acute care hospitals.

METHODS

We evaluated whether quality improved more in acute care hospitals that were exposed to HVBP than in control hospitals (Critical Access Hospitals, which were not exposed to HVBP). The measures of quality were composite measures of clinical process and patient experience (measured in units of standard deviations, with a value of 1 indicating performance that was 1 standard deviation [SD] above the hospital mean) and 30-day risk-standardized mortality among patients who were admitted to the hospital for acute myocardial infarction, heart failure, or pneumonia. The changes in quality measures after the introduction of HVBP were assessed for matched samples of acute care hospitals (the number of hospitals included in the analyses ranged from 1364 for mortality among patients admitted for acute myocardial infarction to 2615 for mortality among patients admitted for pneumonia) and control hospitals (number of hospitals ranged from 31 to 617). Matching was based on preintervention performance with regard to the quality measures. We evaluated performance over the first 4 years of HVBP.

RESULTS

Improvements in clinical-process and patient-experience measures were not significantly greater among hospitals exposed to HVBP than among control hospitals, with difference-in-differences estimates of 0.079 SD (95% confidence interval [CI], −0.140 to 0.299) for clinical process and −0.092 SD (95% CI, −0.307 to 0.122) for patient experience. HVBP was not associated with significant reductions in mortality among patients who were admitted for acute myocardial infarction (difference-in-differences estimate, −0.282 percentage points [95% CI, −1.715 to 1.152]) or heart failure (−0.212 percentage points [95% CI, −0.532 to 0.108]), but it was associated with a significant reduction in mortality among patients who were admitted for pneumonia (−0.431 percentage points [95% CI, −0.714 to −0.148]).

CONCLUSIONS

In our study, HVBP was not associated with improvements in measures of clinical process or patient experience and was not associated with significant reductions in two of three mortality measures. (Funded by the National Institute on Aging.)


Health care in the United States is extremely costly. There is compelling evidence that a large share of spending — particularly in Medicare — results in little or no patient benefit.1–5 Quality performance also varies widely across hospitals.6 Numerous public and private payer initiatives have attempted to address these problems through value-based purchasing programs.7,8 The Patient Protection and Affordable Care Act (ACA) established value-based purchasing programs throughout Medicare, including the Hospital Value-Based Purchasing (HVBP) program. Beginning in fiscal year (FY) 2013, the HVBP program made Medicare payments to acute care hospitals — hospitals paid under the inpatient prospective payment system — conditional on performance as assessed by a variety of metrics. Starting with clinical-process and patient-experience measures in FY 2013, the program expanded to include patient outcome measures in FY 2014 and spending measures in FY 2015. The size of the program incentives has also increased gradually from 1% of diagnosis-related group revenue in FY 2013 to 2% by FY 2017. Beginning in 2005, hospitals that were ultimately subject to HVBP also became subject to public quality reporting through the Hospital Compare website.9

Despite the well-intentioned goals of value-based purchasing programs, evidence that these programs have improved quality and spending outcomes is mixed and far from convincing.10–14 Previous research on the first 9 months of HVBP showed no evidence that the program improved performance as assessed by clinical-process or patient-experience measures.15 Recent research also showed that HVBP did not reduce mortality during the first 30 months of the program.16

Nonetheless, the longer-term effects of HVBP — particularly with regard to clinical process and patient experience — are unknown. It is possible that, despite the lack of early responsiveness to the program, the effects of HVBP may grow stronger over time, as the incentives increase and hospitals have time to respond to the program. It is also possible that the effects of HVBP are heterogeneous across hospitals. Of particular interest is whether hospital characteristics (e.g., teaching status, size, and Medicaid share) or engagement in health system delivery reforms (e.g., meaningful use of electronic health records, accountable care organization programs, and bundled payment) modify performance in the program.

METHODS

DATA, STUDY POPULATION, AND STUDY OUTCOMES

Our study population included all U.S. hospitals that reported data through Hospital Compare (ranging from 4546 hospitals in the first release used in the study to 4799 hospitals in the most recent release). We restricted our study to general short-term acute care hospitals, which were exposed to HVBP (exposed hospitals), and Critical Access Hospitals, which were exempt (control hospitals). In 2014, there were 1331 Critical Access Hospitals in the United States. Reporting of quality measures for Critical Access Hospitals through Hospital Compare varied considerably across study domains (Table S1 in the Supplementary Appendix, available with the full text of this article at NEJM.org). We excluded all children’s, psychiatric, or other specialty facilities; Veterans Affairs hospitals; and hospitals in Maryland. Although hospitals in Maryland were not exposed to HVBP, they were subject to a similar set of financial incentives.17 We excluded hospitals that did not report outcomes data in each study period, mainly because they did not meet minimum sample-size requirements as specified under HVBP rules. A total of 2842 hospitals were included in the clinical-process analysis, 3247 in the patient-experience analysis, and 2195, 3256, and 3525 hospitals in the analysis of mortality among patients who were admitted to the hospital for acute myocardial infarction, heart failure, and pneumonia, respectively. All clinical-process, patient-experience, and mortality data were downloaded from Hospital Compare.

For the clinical-process and patient-experience outcomes, data were available for the eight annual measurement periods (ending June 30) from 2008 through 2015. We constructed a clinical-process composite that was based on the performance of hospitals across the seven indicators that were used as the basis for incentives in each of the first 3 years of the program and were reported through Hospital Compare for the duration of the study period. A list of the indicators included in the composite is provided in Table S2 in the Supplementary Appendix. The measure “primary percutaneous coronary intervention received within 90 minutes after hospital arrival” for patients with myocardial infarction was included for exposed hospitals but not for control hospitals (because percutaneous coronary intervention is rarely performed at Critical Access Hospitals). For hospitals in the study sample that participated in HVBP, performance with regard to clinical-process indicators ranged from 99.7% compliance (appropriate venous thromboembolism prophylaxis for surgical patients) to 95.6% compliance (primary percutaneous coronary intervention for patients with acute myocardial infarction) by the end of the study period (Tables S3 through S6 in the Supplementary Appendix). To create the clinical-process composite, we standardized each indicator by subtracting the sample mean performance across the full sample (not matched) of exposed and control hospitals for the whole study period and dividing by the sample standard deviation. We then calculated the composite as the mean of these standardized indicator scores. A value of 1 indicates performance with regard to the composite measure that is 1 standard deviation (SD) above the mean among hospitals that met the inclusion criteria during the study period.
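
As a concrete illustration of this standardization step, the sketch below computes a composite from indicator columns. It is a minimal Python sketch (not the authors' code); the column names and values are hypothetical.

```python
import pandas as pd

def standardized_composite(scores, indicators):
    """Z-score each indicator against the full (unmatched) sample over the whole
    study period, then average the standardized scores to form the composite."""
    z = (scores[indicators] - scores[indicators].mean()) / scores[indicators].std()
    return z.mean(axis=1)  # composite: mean of the standardized indicator scores

# Hypothetical data: one row per hospital-period, indicator values in percent
scores = pd.DataFrame({
    "vte_prophylaxis": [98.0, 99.5, 97.2, 99.7],
    "discharge_instructions": [85.0, 90.0, 88.0, 87.0],
})
scores["clinical_process_composite"] = standardized_composite(
    scores, ["vte_prophylaxis", "discharge_instructions"]
)
```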

The patient-experience composite consisted of eight indicators that were used as a basis for incentives under the program (a list of the indicators is provided in Table S2 in the Supplementary Appendix). As specified under the program rules, for each indicator we assessed the mean percentage of patients who reported excellent performance, known as “top box” performance (e.g., communication with doctors was “always” good). Among hospitals in the study sample that participated in HVBP, by the end of the study period, performance with regard to the patient-experience indicators ranged from 87.0% of patients who reported being given discharge instructions to 64.6% of patients who reported that staff always explained medications before administering them (Tables S7 through S10 in the Supplementary Appendix). To create a composite measure for patient experience, as with clinical-process performance, we calculated the mean of the standardized indicator scores.

We evaluated hospital-level 30-day risk-standardized mortality among patients who were admitted for acute myocardial infarction, heart failure, or pneumonia as separate outcomes. Mortality was adjusted for patient age, sex, and clinical coexisting conditions (based on hierarchical condition categories). Mortality was also standardized to account for the variance in the estimates. Data on these measures were available for the seven overlapping 3-year periods (ending June 30) between 2008 and 2014.18 For example, the 2008 extract included data from discharges occurring between July 1, 2006, and June 30, 2008, and the 2009 extract included data from discharges occurring between July 1, 2007, and June 30, 2009.

We used data on teaching status, number of beds, and Medicaid share of inpatient days, all of which were obtained from Medicare cost reports between 2008 and 2015. To assess hospital participation in other payment reforms, we obtained publicly available data on dates of participation in the meaningful use of electronic health records (stage 1 or stage 2)19 and the Bundled Payment for Care Improvement initiative. We also obtained data on dates of participation in the Pioneer and Medicare Shared Savings accountable care organization programs from Leavitt Partners.20

STATISTICAL ANALYSIS

We performed a difference-in-differences analysis to test the association between HVBP and the outcomes rewarded in the program. This analysis tests whether there were greater improvements in our study outcomes among exposed hospitals than among the control hospitals (Critical Access Hospitals). Although payment adjustments began to occur in FY 2013, we considered the start date of the HVBP to be July 2011, the first period in which hospital performance on quality measures was subject to incentives (with 2013 payment adjustments reflecting 2011–2012 performance).

Critical Access Hospitals are small, rural hospitals that are much different from the general short-term acute care hospitals that are subject to HVBP. To facilitate the comparison of outcomes between acute care hospitals and Critical Access Hospitals, we created control groups of hospitals that had similar levels of and trends in the preintervention outcomes. To do this, we implemented a matching strategy using propensity scores, performing one-to-one matching with replacement and calipers (defining the maximum difference in the propensity score that was allowable for a match) of 0.01 of the propensity score. We restricted matches to exposed and control hospitals with overlapping ranges of propensity-score values, known as common support.21 Matching was performed separately for each outcome. Our matching procedure first stratified outcomes on the basis of preintervention trends and then matched hospitals within strata.22 For clinical process and patient experience, we stratified into deciles, and for the mortality outcomes we stratified into quintiles, because of the small number of Critical Access Hospitals that met the minimum case requirements. Multiple exposed hospitals could be matched to the same Critical Access Hospital. In the analysis, observations from Critical Access Hospitals were weighted according to the number of matches between a given Critical Access Hospital and acute care hospitals. The matching procedure was implemented with Stata software, version 12 (StataCorp), with the use of a user-written command.23 Recent research suggests that matching can result in more accurate estimates in difference-in-differences analysis, particularly for measures like clinical process and patient experience, for which changes are closely related to baseline levels.24 In the matched analysis, a relatively large share of acute care hospitals received suitable matches, ranging from 51% of hospitals for the patient-experience outcome to 93% for the pneumonia mortality outcome, and were therefore included in the analysis (Table 1).
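
The sketch below illustrates the matching logic described above (one-to-one nearest-neighbor propensity-score matching with replacement, a 0.01 caliper, and matching within strata), assuming the propensity score has already been estimated. It is a hypothetical Python illustration, not the authors' Stata (psmatch2) procedure, and it omits the common-support restriction.

```python
import pandas as pd

CALIPER = 0.01  # maximum allowable propensity-score difference for a match

def match_with_replacement(df):
    """One-to-one nearest-neighbor matching with replacement within strata.

    Expects one row per hospital with columns 'hospital_id', 'exposed' (1/0),
    'pscore' (estimated propensity score), and 'stratum' (preintervention-trend
    decile or quintile). Returns matched pairs; each control hospital's weight
    is the number of exposed hospitals matched to it.
    """
    pairs = []
    for _, grp in df.groupby("stratum"):
        exposed = grp[grp["exposed"] == 1]
        controls = grp[grp["exposed"] == 0]
        if controls.empty:
            continue
        for _, hosp in exposed.iterrows():
            dist = (controls["pscore"] - hosp["pscore"]).abs()
            best = dist.idxmin()
            if dist.loc[best] <= CALIPER:  # enforce the caliper
                pairs.append({"exposed_id": hosp["hospital_id"],
                              "control_id": controls.loc[best, "hospital_id"]})
    matched = pd.DataFrame(pairs)
    # Controls may be reused; weight each one by its number of matches
    matched["control_weight"] = matched.groupby("control_id")["control_id"].transform("size")
    return matched
```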

Table 1.

Characteristics of the Samples of Matched Exposed Hospitals and Control Hospitals Used in Analyses of the Five Study Outcomes.*

Each cell shows values for exposed hospitals / control hospitals.

Characteristic                     Standardized               Standardized               30-Day Risk-Standardized   30-Day Risk-Standardized   30-Day Risk-Standardized
                                   Clinical-Process           Patient-Experience         Mortality for Acute MI     Mortality for Heart        Mortality for Pneumonia
                                   Composite†                 Composite†                 Admissions                 Failure Admissions         Admissions

Hospitals — no.                    2164 / 153                 1507 / 237                 1364 / 31                  2383 / 419                 2615 / 617
Hospital-years                     17,312 / 1224              12,056 / 1896              9548 / 217                 16,681 / 2933              18,305 / 4319
Preintervention score              −0.32±0.77 / −0.33±0.79‡   −0.01±0.66 / 0.07±0.61‡    16.07±1.49 / 16.04±1.23§   11.49±1.38 / 11.53±1.28§   11.86±1.74 / 11.87±1.66§
Hospital characteristics — %
  Teaching                         37 / 1                     28 / 2                     39 / 0                     32 / 1                     33 / 1
  BPCI¶                            13 / 0                     10 / 0                     3 / 0                      2 / 0                      3 / 0
  Meaningful use¶                  96 / 82                    97 / 80                    96 / 87                    96 / 82                    96 / 83
  ACO¶                             19 / 15                    18 / 11                    15 / 10                    13 / 5                     14 / 6
Medicaid share — %‖                12±9 / 10±5                12±9 / 10±8                13±8 / 12±7                13±9 / 8±6                 13±9 / 8±7
Beds — no.                         223±187 / 24±2             177±170 / 24±2             242±182 / 25±0             203±183 / 24±3             208±187 / 24±4
*

Plus-minus values are means ± SD. The exposed hospitals were general short-term acute care hospitals, which were exposed to Hospital Value-Based Purchasing (HVBP); the control hospitals were Critical Access Hospitals, which were exempt. To facilitate the comparison of outcomes between exposed hospitals and control hospitals, we used a matching strategy with propensity scores to create control groups of hospitals that had similar levels of and trends in the preintervention outcomes. Matching was performed separately for each outcome. BPCI denotes Bundled Payments for Care Improvement, and MI myocardial infarction.

†

The standardized clinical-process composite is a composite of the clinical-process indicators that form the basis for incentives in HVBP. The composite is the mean of the individual indicators; each indicator in the composite has a mean of 0 for all hospitals over the study period and is expressed in units of its standard deviation (SD). The composite has negative values in the preintervention period because improvement occurred during the study period. A value of 1 indicates performance on the composite that is 1 SD above the mean among hospitals meeting inclusion criteria during the study period. The standardized patient-experience process composite is a composite of the patient-experience indicators that form the basis for incentives in HVBP and was constructed in the same way as the clinical-process composite. A list of the indicators included in both composite outcomes is provided in Table S2 in the Supplementary Appendix.

‡

This score is expressed as standard deviations relative to all hospitals meeting inclusion criteria during the study period.

§

This score is expressed as the risk-standardized percentage of patients who died from the given condition within 30 days after a hospital admission.

¶

Data are the percentages of unique hospitals that participated in a given reform program at any time during the study period. The hospitals designated as participating in the meaningful use of electronic health records were those that received incentives under the Electronic Health Records Initiative (stage 1 or stage 2); those designated as participating in an accountable care organization (ACO) participated in the Pioneer or Medicare Shared Savings ACO Program.

‖

Medicaid share is measured as the share of inpatient days paid for by Medicaid over the study period.

To test the effect of HVBP, we estimated a linear fixed-effects model at the hospital level. Models were estimated separately for our study outcomes. For the clinical-process and patient-experience models, we estimated the following model for hospital j at time t: Y_{jt} = a_0 + b·post_t + δ(post_t × exposed_j) + ρu_j + e_{jt}. In this equation, Y represents the study outcome, a_0 is the model intercept, b is the coefficient estimate for the “post” term, u is a vector of hospital fixed effects, ρ is a vector of coefficients for the hospital fixed effects, and e is the idiosyncratic error term. The term “post” is equal to 1 for observations occurring after the start of HVBP (and 0 otherwise), and the term “exposed” is equal to 1 for exposed hospitals (and 0 otherwise). The term “post × exposed” is equal to 1 in the post-HVBP period among acute care hospitals and is equal to 0 otherwise. The difference-in-differences estimate is the coefficient δ. The specification included hospital fixed effects (u). The term “exposed” is not included as a main effect because it does not vary over time and is therefore absorbed into the hospital fixed effect.
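
Read as code, this specification corresponds to a weighted least-squares regression with hospital dummies. The following is a minimal Python sketch (the paper used Stata) with hypothetical column names; inference here is only a placeholder, since the paper's confidence intervals come from permutation tests.

```python
import statsmodels.formula.api as smf

def did_estimate(panel):
    """Hospital fixed-effects difference-in-differences regression.

    `panel` has one row per hospital-period with columns 'outcome',
    'post' (1 after July 2011), 'exposed' (1 for HVBP hospitals),
    'hospital_id', and 'weight' (match weight; 1 for exposed hospitals).
    The main effect of 'exposed' is omitted because it is time-invariant
    and therefore absorbed by the hospital fixed effects C(hospital_id).
    """
    model = smf.wls("outcome ~ post + post:exposed + C(hospital_id)",
                    data=panel, weights=panel["weight"])
    fit = model.fit()
    return fit.params["post:exposed"]  # the difference-in-differences estimate (delta)
```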

Our models took a modified form for the mortality analysis. Because the available data extracts were for rolling 36-month periods, some observations included data that spanned the pre-HVBP and post-HVBP implementation period. To address this, our measure of hospital exposure to HVBP was specified as the proportion of months in a given data extract that followed the July 1, 2011, program start date. For instance, the 2012 data extract included discharges from July 1, 2009, through June 30, 2012, which we coded as 0.33. Similarly, we coded the 2013 extract as 0.66 and the 2014 extract as 1.
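
In code, this coding scheme is simply the share of an extract's 36 months that fall after July 1, 2011. The snippet below is a hypothetical illustration of that arithmetic (the paper rounds the 2013 value to 0.66).

```python
# Months of each rolling 36-month mortality extract that fall after the
# July 1, 2011 HVBP start (extracts labeled by the year their window ends;
# extracts ending in 2008 through 2011 have 0 post-HVBP months).
POST_MONTHS = {2012: 12, 2013: 24, 2014: 36}

def hvbp_exposure(extract_year, window_months=36):
    """Proportion of the extract's months occurring after the program start."""
    return POST_MONTHS.get(extract_year, 0) / window_months

print(hvbp_exposure(2012), hvbp_exposure(2013), hvbp_exposure(2014))
# -> 0.333..., 0.666..., 1.0 (coded in the paper as 0.33, 0.66, and 1)
```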

For each outcome, we also estimated a separate specification that allowed the effect of HVBP to vary across a vector of hospital characteristics (teaching status, number of beds, and Medicaid share) and across hospital statuses with regard to participation in the meaningful use of electronic health records, the Bundled Payment for Care Improvement initiative, and the Pioneer and Medicare Shared Savings accountable care organization programs (see the Additional Description of Methods section in the Supplementary Appendix).

Statistical tests and confidence intervals for the difference-in-differences estimates were based on nonparametric permutation tests with 2000 permutation resamples (see the Additional Description of Methods section in the Supplementary Appendix). These tests have been shown to have better properties than parametric methods for inference in the context of difference-in-differences analyses.25 All analyses were performed with Stata software, version 12.26
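
As a rough sketch of how such a permutation test can be set up (the paper's exact procedure is described in its Supplementary Appendix; this Python illustration is not the authors' implementation), the exposure label is reshuffled across hospitals and the difference-in-differences estimate is recomputed for each resample:

```python
import numpy as np
import pandas as pd

def permutation_interval(panel, estimator, n_resamples=2000, alpha=0.05, seed=0):
    """Permutation-based interval for a difference-in-differences estimate.

    `estimator` maps a panel DataFrame to a scalar DiD estimate (for example,
    the did_estimate sketch above). The 'exposed' label is permuted across
    hospitals, holding each hospital's label constant over time.
    """
    rng = np.random.default_rng(seed)
    observed = estimator(panel)

    hospitals = panel[["hospital_id", "exposed"]].drop_duplicates("hospital_id")
    draws = []
    for _ in range(n_resamples):
        shuffled = hospitals.assign(exposed=rng.permutation(hospitals["exposed"].values))
        permuted = panel.drop(columns="exposed").merge(shuffled, on="hospital_id")
        draws.append(estimator(permuted))

    # One common construction: center the permutation distribution and use its
    # alpha/2 and 1 - alpha/2 quantiles to bracket the observed estimate.
    draws = np.asarray(draws)
    lo, hi = np.quantile(draws - draws.mean(), [alpha / 2, 1 - alpha / 2])
    return observed, (observed + lo, observed + hi)
```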

RESULTS

CHARACTERISTICS OF THE HOSPITALS

Table 1 shows that, as compared with the control hospitals, the matched exposed hospitals were larger, were more likely to be teaching hospitals, had a higher share of Medicaid inpatient days, and were more likely to participate in the meaningful use of electronic health records for each of the study outcomes. However, the matched samples of exposed hospitals and control hospitals had very similar levels of preintervention performance with regard to each outcome, reflecting successful matching according to baseline performance.

CHANGES IN PERFORMANCE

Performance with regard to both clinical process and patient experience improved for both the exposed and the control hospitals before and after HVBP was implemented (Fig. 1). Exposed and control hospitals had similar improvement after HVBP was implemented. Thirty-day risk-standardized mortality decreased among patients who were admitted for acute myocardial infarction, remained relatively constant among those admitted for pneumonia, and increased slightly among those admitted for heart failure in both the exposed and the control hospitals during the study period (Fig. 2). For mortality among patients who were admitted for acute myocardial infarction or heart failure, these trajectories were similar for the exposed and control hospitals before and after HVBP was initiated. Thirty-day risk-standardized mortality among patients who were admitted for pneumonia increased slightly in the post-HVBP period among the control hospitals.

Figure 1. Standardized Clinical-Process and Patient-Experience Performance among Matched Exposed and Matched Control Hospitals, 2008–2015.


The standardized clinical-process composite is a composite of the clinical-process indicators that form the basis for incentives in Hospital Value-Based Purchasing (HVBP). The composite is the mean of the individual indicators; each indicator in the composite has a mean of 0 for all hospitals over the study period and is expressed in units of its standard deviation (SD). The composite has negative values in the preintervention period because improvement occurred during the study period. A value of 1 indicates performance on the composite that is 1 SD above the mean among hospitals meeting inclusion criteria during the study period. The standardized patient-experience process composite is a composite of the patient-experience indicators that form the basis for incentives in HVBP and was constructed in the same way as the clinical-process composite. A list of the indicators included in both composite outcomes is provided in Table S2 in the Supplementary Appendix.

Figure 2. 30-Day Risk-Standardized Mortality among Patients Admitted to the Hospital for Acute Myocardial Infarction (MI), Heart Failure, or Pneumonia among Matched Exposed and Matched Control Hospitals, 2008–2014.

Table 2 shows the estimates for each of the study outcomes. The between-group differences in preintervention trends were not significant for any study outcome. The difference-in-differences estimates — comparing the change in performance from the pre-HVBP period to the post-HVBP period between exposed and control hospitals — indicated that HVBP was associated with a nonsignificant increase in clinical-process performance of 0.079 SD (95% confidence interval [CI], −0.140 to 0.299). HVBP was associated with a nonsignificant reduction in patient-experience performance (−0.092 SD; 95% CI, −0.307 to 0.122). HVBP was not associated with significant reductions in 30-day risk-standardized mortality among patients who were admitted for acute myocardial infarction (−0.282 percentage points; 95% CI, −1.715 to 1.152) or heart failure (−0.212 percentage points; 95% CI, −0.532 to 0.108). HVBP was, however, associated with a significant reduction in 30-day risk-standardized mortality among patients who were admitted for pneumonia (−0.431 percentage points; 95% CI, −0.714 to −0.148).

Table 2.

Estimates of the Association between HVBP and Incentive-Associated Outcomes.

Outcome                                                            No. of       Preintervention Difference     Postintervention Change (95% CI)†‡                       Difference-in-Differences
                                                                   Hospitals    in Annual Trend (95% CI)*†     Exposed Hospitals           Control Hospitals            Estimate (95% CI)§

                                                                                standard deviations
Standardized clinical-process composite                            2317         0.001 (−0.047 to 0.049)        0.697 (0.678 to 0.716)      0.617 (0.526 to 0.708)       0.079 (−0.140 to 0.299)
Standardized patient-experience composite                          1744         0.004 (−0.043 to 0.050)        0.354 (0.335 to 0.373)      0.447 (0.337 to 0.557)       −0.092 (−0.307 to 0.122)

                                                                                percentage points
30-Day risk-standardized mortality for acute MI admissions         1395         0.028 (−0.220 to 0.277)        −1.756 (−1.841 to −1.670)   −1.474 (−2.282 to −0.667)    −0.282 (−1.715 to 1.152)
30-Day risk-standardized mortality for heart failure admissions    2802         0.016 (−0.080 to 0.113)        0.479 (0.419 to 0.538)      0.691 (0.500 to 0.882)       −0.212 (−0.532 to 0.108)
30-Day risk-standardized mortality for pneumonia admissions        3232         −0.012 (−0.083 to 0.060)       −0.184 (−0.255 to −0.114)   0.247 (0.076 to 0.419)       −0.431 (−0.714 to −0.148)
*

The preintervention difference in trend is the difference in the annual linear trend between hospitals exposed to HVBP and control hospitals in the preintervention period (measurement periods ending between 2008 and 2011). For example, for mortality among patients admitted to the hospital for acute MI, exposed hospitals were changing at an annual rate that was 0.028 percentage points greater (i.e., worsening) than the rate among control hospitals, a difference that was not significant.

†

The 95% confidence intervals (CIs) are based on a parametric t-test with clustered standard errors.

‡

The postintervention change is the difference between the mean postintervention outcome and the mean preintervention outcome.

§

The 95% CIs are based on permutation tests from 2000 resamples.

SENSITIVITY ANALYSIS

Sensitivity analysis of the effects of HVBP across the full sample of hospitals showed an inconsistent pattern of results across the study outcomes, indicating no clear benefit in association with HVBP (Table S12 and Figs. S1 through S3 in the Supplementary Appendix). We also found that HVBP was not associated with improvement in an alternative measure of patient experience — a single item indicating the percentage of patients who gave the hospital an overall rating of 9 or 10 out of 10 — or an alternative clinical-process composite that excluded the measure “primary percutaneous coronary intervention received within 90 minutes after hospital arrival” for patients with acute myocardial infarction. Sensitivity analysis also showed little evidence that the effect of HVBP was modified by teaching status, size, or Medicaid share, or by hospital participation in the meaningful use of electronic health records, the Bundled Payment for Care Improvement initiative, or accountable care organization programs. A related analysis showed that hospitals with a greater share of Medicare patients — and therefore stronger incentives to improve — did not improve more under HVBP than hospitals with a smaller share of Medicare patients. Results from models stratified according to baseline performance also showed no clear pattern of effect modification. Additional details regarding the results of the sensitivity analyses are provided in Tables S13 through S22 in the Supplementary Appendix.

DISCUSSION

Our estimates of the effect of HVBP on clinical process, patient experience, and mortality were small, not consistent with one another in the direction of the association, and generally non-significant. The significant reduction in 30-day risk-standardized mortality among patients who were admitted to the hospital for pneumonia was driven by an increase in mortality in the matched sample of Critical Access Hospitals. Because the incentives in the program did not appear to improve performance with regard to the clinical-process indicators related to pneumonia,15 it is unlikely that HVBP would have reduced mortality among the patients who were admitted for pneumonia, since such reductions are harder to achieve than improvements in clinical process. We also found no meaningful variation in the effectiveness of the program across hospital characteristics and across statuses with regard to engagement in voluntary value-based reforms. Our study provides evidence that HVBP did not result in meaningful improvements in clinical process or patient experience or in a significant reduction in mortality during its first 4 years.

Our results are consistent with those from studies that have shown that HVBP did not increase quality with regard to clinical process or patient experience in its first 9 months15 and more recent research indicating that HVBP did not reduce mortality over the first 30 months of the program.16 The seeming ineffectiveness of HVBP stands in contrast to the Medicare Hospital Readmissions Reduction Program (HRRP), which appears to have reduced rates of readmission for targeted conditions.27 This may have resulted from the fact that incentives in HVBP are much smaller than the incentives in the HRRP. HVBP incentives are also spread over numerous domains and performance measures, further diluting the effect of the program. In addition, whereas the incentives in HVBP involve both bonuses and penalties, the HRRP uses only penalties. These penalties may have triggered loss aversion among hospital administrators, enhancing its effect.28 In addition, HVBP uses a highly complex incentive design, rewarding hospitals for a combination of relative performance and improvement across numerous performance measures. The complex, wide-ranging, and evolving bonus-based incentive structure of the program may be a less effective design than the simpler, more narrowly targeted, penalty-based design of the HRRP.29

The Critical Access Hospitals that formed the control group differ from exposed hospitals across a number of distinct dimensions, including size, teaching status, and baseline quality performance. Although we used matching to attempt to address differences in preintervention quality, expectations for quality improvement may differ between the Critical Access Hospitals and hospitals exposed to HVBP. In addition, the control hospitals did not face financial penalties for not reporting quality of care through Hospital Compare, and therefore they reported data at lower rates than the exposed hospitals. However, a control group that was made up of more “motivated” Critical Access Hospitals that voluntarily reported data through Hospital Compare would probably bias the results away from, rather than toward, the null. Changes in outcomes throughout the study period were very similar between the matched acute care hospitals and Critical Access Hospitals (Figs. S4 and S5 in the Supplementary Appendix). This supports our use of Critical Access Hospitals as a control group. We view our difference-in-differences analysis strategy with an imperfect control group to be superior to an interrupted time-series design, because the nonlinear trajectories of many of the study outcomes can lead to biased inferences when interrupted time-series designs are used (Tables S23 through S25 in the Supplementary Appendix).30

The measures that form the basis for incentives under HVBP have been publicly reported for several years. Hospitals also had high performance on some of the measures — particularly the clinical-process indicators — at the start of the program. As a result, the opportunity for additional improvement under HVBP may have been limited, decreasing the apparent effectiveness of the program. In addition, although HVBP was created by the passage of the ACA in 2010, hospitals may have attempted to improve quality in anticipation of the program.15 Also, although the confidence intervals for our matching estimators were small for most of our outcomes, the estimates for mortality among patients who were admitted to the hospital for acute myocardial infarction had larger confidence intervals as a result of the lower number of hospitals that met caseload requirements. As a result, we had lower statistical power to determine the effect of HVBP on this measure of mortality. Finally, the incentives in the HVBP changed over time, and this may have modified hospital responsiveness to the program. Future research may evaluate whether the changing incentives in the program affected hospital performance.

Our evaluation suggests that HVBP, which introduced small quality performance–based adjustments in Medicare payments, has resulted in little tangible benefit over its first 4 years. It is possible that alternative incentive designs — including those with simpler criteria for performance and larger financial incentives — might have led to greater improvement among hospitals. It may be useful for the Centers for Medicare and Medicaid Services to continue to experiment with other value-based payment models, including the HRRP, accountable care organization programs, and bundled payment programs, in an effort to improve the value of hospital spending.

Supplementary Material

Supplement1

Acknowledgments

Supported by grants from the National Institute on Aging (R01-AG-047932 to Dr. Ryan, Mr. Krinsky, and Ms. Maurer and R01 AG039434-05 to Dr. Dimick).

We thank David Muhlestein for the use of the Leavitt Partners data on hospital participation in accountable care organization programs.

Footnotes

Disclosure forms provided by the authors are available with the full text of this article at NEJM.org.

References

1. Yasaitis L, Fisher ES, Skinner JS, Chandra A. Hospital quality and intensity of spending: is there an association? Health Aff (Millwood) 2009;28:w566–w572. doi: 10.1377/hlthaff.28.4.w566.
2. Jha AK, Orav EJ, Dobson A, Book RA, Epstein AM. Measuring efficiency: the association of hospital costs and quality of care. Health Aff (Millwood) 2009;28:897–906. doi: 10.1377/hlthaff.28.3.897.
3. Hussey PS, Wertheimer S, Mehrotra A. The association between health care quality and cost: a systematic review. Ann Intern Med. 2013;158:27–34. doi: 10.7326/0003-4819-158-1-201301010-00006.
4. Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. The implications of regional variations in Medicare spending. Part 1: the content, quality, and accessibility of care. Ann Intern Med. 2003;138:273–87. doi: 10.7326/0003-4819-138-4-200302180-00006.
5. Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. The implications of regional variations in Medicare spending. Part 2: health outcomes and satisfaction with care. Ann Intern Med. 2003;138:288–98. doi: 10.7326/0003-4819-138-4-200302180-00007.
6. Jha AK, Li Z, Orav EJ, Epstein AM. Care in U.S. hospitals — the Hospital Quality Alliance program. N Engl J Med. 2005;353:265–74. doi: 10.1056/NEJMsa051249.
7. Rosenthal MB, Fernandopulle R, Song HR, Landon B. Paying for quality: providers’ incentives for quality improvement. Health Aff (Millwood) 2004;23:127–41. doi: 10.1377/hlthaff.23.2.127.
8. Ryan AM, Damberg CL. What can the past of pay-for-performance tell us about the future of Value-Based Purchasing in Medicare? Healthc (Amst) 2013;1:42–9. doi: 10.1016/j.hjdsi.2013.04.006.
9. Ryan AM, Nallamothu BK, Dimick JB. Medicare’s public reporting initiative on hospital quality had modest or no impact on mortality from three key conditions. Health Aff (Millwood) 2012;31:585–92. doi: 10.1377/hlthaff.2011.0719.
10. Grossbart SR. What’s the return? Assessing the effect of “pay-for-performance” initiatives on the quality of care delivery. Med Care Res Rev. 2006;63(Suppl):29S–48S. doi: 10.1177/1077558705283643.
11. Lindenauer PK, Remus D, Roman S, et al. Public reporting and pay for performance in hospital quality improvement. N Engl J Med. 2007;356:486–96. doi: 10.1056/NEJMsa064964.
12. Ryan AM, Blustein J, Casalino LP. Medicare’s flagship test of pay-for-performance did not spur more rapid quality improvement among low-performing hospitals. Health Aff (Millwood) 2012;31:797–805. doi: 10.1377/hlthaff.2011.0626.
13. Ryan AM. Effects of the Premier Hospital Quality Incentive Demonstration on Medicare patient mortality and cost. Health Serv Res. 2009;44:821–42. doi: 10.1111/j.1475-6773.2009.00956.x.
14. Jha AK, Joynt KE, Orav EJ, Epstein AM. The long-term effect of premier pay for performance on patient outcomes. N Engl J Med. 2012;366:1606–15. doi: 10.1056/NEJMsa1112351.
15. Ryan AM, Burgess JF Jr, Pesko MF, Borden WB, Dimick JB. The early effects of Medicare’s mandatory hospital pay-for-performance program. Health Serv Res. 2015;50:81–97. doi: 10.1111/1475-6773.12206.
16. Figueroa JF, Tsugawa Y, Zheng J, Orav EJ, Jha AK. Association between the Value-Based Purchasing pay for performance program and patient mortality in US hospitals: observational study. BMJ. 2016;353:i2214. doi: 10.1136/bmj.i2214.
17. Calikoglu S, Murray R, Feeney D. Hospital pay-for-performance programs in Maryland produced strong results, including reduced hospital-acquired conditions. Health Aff (Millwood) 2012;31:2649–58. doi: 10.1377/hlthaff.2012.0357.
18. Krumholz HM, Wang Y, Mattera JA, et al. An administrative claims model suitable for profiling hospital performance based on 30-day mortality rates among patients with an acute myocardial infarction. Circulation. 2006;113:1683–92. doi: 10.1161/CIRCULATIONAHA.105.611186.
19. Data and program reports. Baltimore: Centers for Medicare and Medicaid Services, 2016 (https://www.cms.gov/regulations-and-guidance/legislation/ehrincentiveprograms/dataandreports.html).
20. Colla CH, Lewis VA, Tierney E, Muhlestein DB. Hospitals participating in ACOs tend to be large and urban, allowing access to capital and data. Health Aff (Millwood) 2016;35:431–9. doi: 10.1377/hlthaff.2015.0919.
21. Caliendo M, Kopeinig S. Some practical guidance for the implementation of propensity score matching. J Econ Surv. 2008;22:31–72.
22. Heckman JJ, Ichimura H, Todd PE. Matching as an econometric evaluation estimator: evidence from evaluating a job training programme. Rev Econ Stud. 1997;64:605–54.
23. Leuven E, Sianesi B. PSMATCH2: Stata module to perform full Mahalanobis and propensity score matching, common support graphing, and covariate imbalance testing. Chestnut Hill, MA: Boston College; 2003 (https://ideas.repec.org/c/boc/bocode/s432001.html).
24. Ryan AM, Burgess JF Jr, Dimick JB. Why we should not be indifferent to specification choices for difference-in-differences. Health Serv Res. 2015;50:1211–35. doi: 10.1111/1475-6773.12270.
25. Bertrand M, Duflo E, Mullainathan S. How much should we trust differences-in-differences estimates? Q J Econ. 2004;119:249–75.
26. Stata statistical software: release 14. College Station, TX: StataCorp; 2015.
27. Zuckerman RB, Sheingold SH, Orav EJ, Ruhter J, Epstein AM. Readmissions, observation, and the hospital readmissions reduction program. N Engl J Med. 2016;374:1543–51. doi: 10.1056/NEJMsa1513024.
28. Tversky A, Kahneman D. Loss aversion in riskless choice: a reference-dependent model. Q J Econ. 1991;106:1039–61.
29. Doran T, Maurer KA, Ryan AM. Impact of provider incentives on quality and value of health care. Annu Rev Public Health. 2017;38:449–65. doi: 10.1146/annurev-publhealth-032315-021457.
30. Kontopantelis E, Doran T, Springate DA, Buchan I, Reeves D. Regression based quasi-experimental approach when randomisation is not an option: interrupted time series analysis. BMJ. 2015;350:h2750. doi: 10.1136/bmj.h2750.
