ABSTRACT
Pay-for-performance (P4P) reimbursement models were launched in 2013 to incentivise the value of healthcare delivered by including quality outcomes, such as mortality, readmission, and patient satisfaction, in hospital reimbursement in the U.S. Although a decade has passed, the efficacy of these P4P programs remains unclear. This research evaluates their long-term performance implications along two critical dimensions – productivity and healthcare value. Drawing on a nationwide sample of U.S. hospitals collected from 2008 to 2019, we utilise data envelopment analysis to measure hospital performance and the Malmquist index to evaluate longitudinal trends. Although average hospital productivity and value have improved since the rollout of the P4P programs, we observe that a large proportion of laggard hospitals were unable to catch up with improvements to the performance frontier, raising concerns about disparities in the impact of future value-based programs. Our analyses also indicate that horizontal integration across hospitals is associated with greater productivity and value. While greater physician-hospital (vertical) integration is associated with higher hospital productivity, it does not have a positive impact on value. Our study provides new insights into the antecedents and performance consequences of implementing value-based healthcare initiatives, as well as their implications for hospital managers and policymakers.
KEYWORDS: Value-based healthcare, pay for performance, data envelopment analysis, Malmquist index, productivity, integration
1. Introduction
In 2010, the Patient Protection and Affordable Care Act (PPACA) was signed into law to reform the healthcare system in the U.S. The PPACA introduced a variety of value-based payment models with the goal of enhancing the value of care delivered by including quality outcomes in reimbursement to healthcare providers. In general, these value-based initiatives evaluate provider performance based on pre-determined measures, including clinical quality, health outcomes, patient experience, and cost efficiency (Chee et al., 2016). Providers whose performance scores exceed a certain threshold are likely to receive payment incentives (or bonuses) based on their ability to meet specific cost and quality objectives.
As of 2018, about 90% of Medicare payments have been linked to quality-based performance (Werner et al., 2021). The Centers for Medicare and Medicaid Services (CMS) estimates that all Medicare reimbursements will be based on value-based models by 2030 (LaPointe, 2022). Given the shift from fee-for-service (FFS) to value-based payment models, it is imperative to better understand the efficacy of these payment initiatives. In this study, we focus on a specific class of value-based models, termed pay-for-performance (P4P), which aim to improve healthcare services at acute care hospitals (Joynt Maddox et al., 2017). These include hospital value-based purchasing (HVBP), the hospital readmissions reduction program (HRRP), and the hospital-acquired condition reduction program (HACRP).
Both HVBP and HRRP went into effect in 2013, although they were designed to target different dimensions of healthcare delivery. HVBP includes a broad set of performance measures, such as mortality rate and patient satisfaction, to incentivise hospitals to undertake quality improvements. In contrast, HRRP focuses on reducing hospital readmissions associated with specific diseases, such as congestive heart failure and chronic obstructive pulmonary disease. In 2015, HACRP was launched to lower hospital-acquired infections and improve patient safety (cf. Joynt Maddox et al., 2017, and Renee Rutter & Park, 2020, for a summary of P4P programs). Accordingly, our research examines the overall impact of P4P programs on hospital performance, focusing specifically on the productivity and value of healthcare delivery.
A substantial body of literature has assessed the efficacy of these programs, but the results have been mixed (Izón & Pardini, 2018; Renee Rutter & Park, 2020). For instance, Ryan et al. (2015) compared acute care and critical access hospitals using a difference-in-differences analysis, but did not find evidence of significant improvements in clinical process measures or patient experience during the first nine months of HVBP. Subsequently, they analysed four years of HVBP data and still did not observe significant improvements in these measures (Ryan et al., 2017). Similar results have also been observed in other studies (Izón & Pardini, 2018).
The prevailing evidence on the impact of HRRP is more encouraging. Ayabakan et al. (2021) observed that HRRP not only reduced targeted readmission rates among Medicare patients, but also exhibited spillover effects on other insurance types and performance measures. Similarly, Chen and Grabowski (2019) reported a decline in hospital readmission rates after HRRP implementation. However, alternative explanations may undermine these positive findings. Himmelstein and Woolhandler (2015) reported that hospitals may artificially reduce readmission rates by treating returning patients in observation beds or emergency departments instead of readmitting them as inpatients. This strategy yields little real quality gain from HRRP, yet risks financial harm to Medicare patients, who become responsible for higher medical bills and may be ineligible for rehabilitation or skilled nursing care.
Overall, early evidence on the efficacy of P4P programs has been inconclusive. Several reasons may explain the modest findings. First, many studies were based on data through 2015 and thus focused on early performance changes. However, P4P programs require hospitals to develop new skills and processes, provide personnel training, build care coordination capabilities, acquire physical amenities, and implement IT systems. Some adjustments may take years to mature and may significantly disrupt patient care in the short run (Daub et al., 2020; Izón & Pardini, 2018). Second, the measurement systems were overwhelmingly complex and changed frequently during the early rollout period, as CMS revised several performance metrics and the weights attributed to these measures. These changes increased the administrative burden of program implementation and made it harder for hospitals to achieve their performance targets (Revere et al., 2021).
Furthermore, the literature has documented that the targeted measures may not be aligned with program goals and have unpredictable consequences, which underscores the importance of a holistic understanding of the overall impact of P4P programs (Amorim Lopes et al., 2016; Aragon et al., 2022). Lastly, there may exist systematic differences across hospitals that drive the heterogeneous effect of P4P programs (Ryan et al., 2017). For example, earlier studies have identified hospital attributes, such as number of beds, geographic location, teaching status, and for-profit ownership, as antecedents of performance in the P4P era (Renee Rutter & Park, 2020).
Our study addresses these research gaps by (a) evaluating the overall impact of P4P programs in the long run, and (b) developing a better understanding of the relationship between hospital operating characteristics and performance change. We draw on the concept of healthcare value to include a variety of required P4P measures (Porter, 2010). Using data envelopment analysis (DEA), we quantify and differentiate productivity from value. While the former is used as a measure of performance in FFS programs, the latter is emphasised under value-based care reimbursement (Ryan et al., 2017). We utilise the DEA-based Malmquist index approach to assess changes in hospital performance across the two dimensions over a ten-year period, which spans the time period before and after the introduction of P4P. We also utilise panel data regressions to study the effect of hospital characteristics on performance. Our results contribute to the literature by revealing the underlying mechanisms of long-term changes in hospital performance based on productivity and value, as well as their organisational antecedents.
2. Materials and methods
In this section, we first describe the data sources and sample selection. We then present the variables used in DEA and econometric analyses, followed by model estimation.
2.1. Data sources and sample
This is a retrospective study based on a nationwide sample of 2,031 acute care hospitals in the U.S., evaluated across ten years from 2008 to 2017. Hospital-level data were collected from multiple sources, following the extant literature (Bardhan et al., 2023; Kazley & Ozcan, 2009). We obtained data on healthcare expenses and capital investment from CMS Cost Reports. We also acquired hospital staffing, service levels, and other characteristics from the American Hospital Association (AHA) Annual Survey. Further, we extracted quality of care and patient outcomes from the CMS Hospital Compare database. Finally, we supplemented the hospital case mix index (CMI) and operational adjustment factors from the CMS Impact Files. Since hospital performance may also be driven by the external environment, we constructed several regional factors using data from the Dartmouth Atlas of Healthcare and American Community Survey and linked them to hospitals based on their geographic location.
When combining these data sources, we removed hospitals with more than three years of missing observations for the variables in the DEA models (Atasoy et al., 2018). We deployed linear interpolation and last-known-value extrapolation to impute missing values. These are necessary steps to form a balanced panel for Malmquist index analyses so that performance changes can be quantified for every hospital unit, while also ensuring the generalizability of our findings (Cantor & Poh, 2018). Additional details on our data preparation are provided in Appendix A.
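The imputation rules described above can be sketched as follows. This is an illustrative re-implementation with hypothetical values, not our production data-preparation code.

```python
# Illustrative sketch of the imputation rules (hypothetical values):
# linear interpolation for interior gaps, last-known-value extrapolation
# for trailing gaps; leading gaps are left untouched.
def impute(series):
    vals = list(series)
    for i, v in enumerate(vals):
        if v is None:
            lo = next((j for j in range(i - 1, -1, -1) if vals[j] is not None), None)
            hi = next((j for j in range(i + 1, len(vals)) if vals[j] is not None), None)
            if lo is not None and hi is not None:
                w = (i - lo) / (hi - lo)                  # linear interpolation
                vals[i] = vals[lo] + w * (vals[hi] - vals[lo])
            elif lo is not None:                          # carry last known value forward
                vals[i] = vals[lo]
    return vals

# Hypothetical five-year series with an interior and a trailing gap
print(impute([10.0, None, 14.0, None, None]))  # [10.0, 12.0, 14.0, 14.0, 14.0]
```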
Furthermore, we restrict our sample to acute care hospitals for two reasons. First, speciality and critical access hospitals report excessive missing data and are mostly dropped during data preparation. Second, such hospitals are not subject to P4P program requirements, so including them could bias our estimates. Hence, our study uses a pre-post analysis based on the Malmquist index to reveal performance changes attributed to P4P programs, contingent on hospital operational characteristics.
2.2. Variable construction
2.2.1. Input resources
We implement two separate DEA models, one for hospital productivity and the other for healthcare value. We consider the same set of input resources that are expended to produce clinical services and patient outcomes, for both productivity and value measures (Burgess, 2012). Consistent with prior hospital DEA studies, we utilise five commonly deployed inputs, including capital, labour, and materials (Kohl et al., 2019; Pai et al., 2019). Capital expenses represent expenditures on building, fixtures, and movable equipment (Bardhan et al., 2023). Labour inputs include the number of full-time equivalent (FTE) physicians, nurses, and other clinicians on the payroll. We do not include residents, interns, or trainees, as their roles and performance evaluations may differ. Hospital operating expenses measure the costs incurred during the process of care delivery, excluding expenses on payroll, capital, and benefits to avoid double-counting. We drop hospitals with costs in the top 1% or bottom 1% of observations to avoid outlier effects, and deflate operating and capital expenses so that costs are comparable over time in the Malmquist index analyses.
2.2.2. Service quantity
Healthcare productivity measures the extent to which hospitals utilise input resources to generate healthcare services (Amorim Lopes et al., 2016). While there exist various proxies for productivity, this definition has been widely adopted in the efficiency literature (Cantor & Poh, 2018; Novignon et al., 2023). In productivity-based DEA models, the outputs include quantities of three major services provided by hospitals – inpatient discharges, outpatient visits, and emergency department (ED) visits (Cantor & Poh, 2018). Inpatient cases are adjusted by CMI to account for extra resources needed for patients with severe conditions (Kazley & Ozcan, 2009).
2.2.3. Patient outcomes
Healthcare value is defined as patient outcomes achieved per dollar spent (Burgess, 2012; Porter, 2010). Consistent with this notion, we measure a variety of health outcomes as DEA outputs to quantify the value of care delivered (Pai et al., 2019). We calculate mortality as the average risk-adjusted mortality rate, weighted by the number of patients, for three major diseases (heart attack, heart failure, and pneumonia) that CMS uses to evaluate hospital performance in P4P programs. Similarly, hospital readmission rate is the risk-adjusted, weighted average of readmission rates across these conditions. Since hospitals intend to maximise patient outcomes, we utilise the inverse of mortality and readmission rates as output measures in our DEA models (Aragon et al., 2022). Furthermore, we also include the experiential quality of services (i.e., patient satisfaction) based on a survey of discharged patients with respect to the communication and responsiveness of providers, pain management, and care transition (Bardhan et al., 2023).
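As a concrete illustration of the outcome construction, the patient-weighted averaging and inversion described above can be sketched as follows; all rates and patient counts below are hypothetical, not CMS figures.

```python
# Hypothetical risk-adjusted 30-day readmission rates and patient counts
# for the three CMS conditions used in our value model
rates = {"heart_attack": 0.165, "heart_failure": 0.225, "pneumonia": 0.180}
patients = {"heart_attack": 120, "heart_failure": 310, "pneumonia": 240}

# Patient-weighted average rate across the three conditions
total = sum(patients.values())
weighted_rate = sum(rates[d] * patients[d] for d in rates) / total

# Inverted so that larger values indicate better outcomes (a DEA output)
dea_output = 1.0 / weighted_rate
```

The same construction applies to the mortality rate; patient satisfaction enters directly because higher values already indicate better performance.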
2.2.4. Data collection period and variable normalization
The reporting period (fiscal year) may vary by hospital, resulting in discrepancies in the reported hospital expenditures and service quantities. Such discrepancies also contribute to large variations in the DEA inputs, relative to the quality outputs that are mostly rate measures, which leads to DEA-based estimates of hospital value disproportionately favouring smaller hospitals. To address these concerns, we divide the operating expenses, capital expenses, and service quantities (inpatient, outpatient, and emergency visits) by the length of the reporting period (number of days), converting these variables to a per-day unit.
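For example, the per-day conversion can be sketched as follows; the fiscal-year dates and expense figure are hypothetical.

```python
from datetime import date

# Hypothetical cost-report entry for one hospital fiscal year
operating_expense = 140_000_000                 # dollars over the reporting period
fy_begin, fy_end = date(2016, 10, 1), date(2017, 9, 30)

days = (fy_end - fy_begin).days + 1             # length of the reporting period
per_day = operating_expense / days / 1_000      # thousand $ per day
print(days, round(per_day, 2))                  # 365 383.56
```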
It is worth noting that CMS reports mortality and readmission rates based on a three-year rolling window (i.e., data collection period), different from the other data sources. To address this unique reporting timeline, we consider the first year of the collection window as the observation year (Banerjee et al., 2019). For example, mortality and readmission rates reported for the period from July 2017 to December 2019 are treated as 2017 data; similarly, the rates from July 2016 to June 2018 are treated as 2016 data, and so on. This allows us to stagger the lead inputs and lagged outputs in the DEA models to account for temporal differences in measurements. Therefore, we label our sample as spanning 2008 to 2017, while the actual reporting period covers hospital performance collected from July 2008 to December 2019, six years after the launch of P4P programs and before the onset of COVID-19 in the U.S. Hence, our sample offers a sufficient length of observations to study the long-term effect of P4P programs.
2.2.5. Other variables
We focus on several key hospital characteristics to examine their association with changes in hospital performance. First, we capture two types of organisational integration that are common strategies to mitigate financial risk during the rollout of value-based care programs (Post et al., 2018). Vertical integration is defined based on contractual arrangements between a focal hospital and its integration with affiliated physician practices. Following the literature, we categorise hospital-physician practice contracts into four groups: none, low, medium, and high integration (Bardhan et al., 2023; Post et al., 2018). The extent of vertical integration varies from loose partnerships to complete ownership, with an increasing level of incentive alignment and clinical coordination. Horizontal integration refers to partnerships between hospitals with the goal of providing coordinated care (Baicker & Levy, 2013). Patients typically receive a majority of treatments from providers within the same hospital referral region (HRR), building stronger connections with these hospitals (Atasoy et al., 2018). Therefore, we define horizontal integration as a binary variable with a value of one if the focal hospital partners with at least one peer hospital (that belongs to the same health system) within the HRR. In other words, horizontal integration measures how well hospitals are integrated with their peers in the same geographic region.
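The horizontal integration indicator can be derived from system membership and HRR assignment roughly as follows; the hospital, system, and HRR identifiers below are hypothetical.

```python
from collections import Counter

# Hypothetical records: (hospital_id, health_system_id, hospital_referral_region)
records = [
    ("A", "S1", "HRR_10"),
    ("B", "S1", "HRR_10"),   # same system and HRR as A: both are integrated
    ("C", "S1", "HRR_20"),   # same system, different HRR: not integrated
    ("D", None, "HRR_10"),   # independent hospital: not integrated
]

# Count hospitals per (system, HRR) pair, then flag hospitals with at least one peer
counts = Counter((s, h) for _, s, h in records if s is not None)
horizontal = {hid: int(s is not None and counts[(s, h)] > 1)
              for hid, s, h in records}
print(horizontal)  # {'A': 1, 'B': 1, 'C': 0, 'D': 0}
```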
Furthermore, we consider three categories of hospital ownership: government, for-profit, and not-for-profit status. To account for geographic variations, we distinguish between urban and rural hospitals. In terms of teaching status, we distinguish between major teaching, minor teaching, and non-teaching hospitals. Major teaching hospitals are members of the Council of Teaching Hospitals. Hospital size is defined based on the number of staffed beds (<100, 100–250, >250). Since prior research has documented that past P4P performance is a significant predictor of current achievement, we calculate a ratio of DEA scores for hospital productivity and value from the previous year to measure the baseline effect (Revere et al., 2021).
We also include a broad spectrum of control variables that may affect hospital performance. CMI captures the severity of the patient population, the outlier factor reflects exceptionally costly cases, and the disproportionate share adjustment factor measures the propensity to treat uninsured patients. We further control for environmental factors at the HRR level (Gearhart, 2017). This includes the density of primary care physicians, specialists, and nurses per population, a Herfindahl-Hirschman index based on the average daily census, and socio-demographics of residents (i.e., population size, education level, median income, and age). Table 1 presents the definitions, summary statistics, and data sources of all variables used in DEA and econometric analyses.
Table 1.
Variable definitions and summary statistics.
| Variable | Definition | Mean | (St. Dev.) | Source |
|---|---|---|---|---|
| Hospital Input Resources for Productivity and Value | ||||
| Operating Expense | Operating expense per day, excluding payroll, capital expense, depreciation, interest, and benefits (in thousand $). | 415.83 | (369.44) | Cost Reports |
| Capital Expense | Capital-related expense per day for building, fixture, and movable equipment (in thousand $). | 34.39 | (31.87) | Cost Reports |
| Physicians | Number of FTE physicians and dentists. | 23.36 | (47.64) | AHA |
| Nurses | Number of FTE nurses, including registered nurses, licensed practical nurses, and nursing assistive personnel. | 481.11 | (408.71) | AHA |
| Other Clinicians | Number of other FTE clinicians, including radiology technicians, laboratory technicians, licensed pharmacists, pharmacy technicians, and respiratory therapists. | 114.24 | (95.02) | AHA |
| Hospital Output Services for Productivity | ||||
| Inpatient Visits | Number of CMI-adjusted hospital admissions per day, excluding newborns and nursing home visits. | 43.24 | (38.67) | AHA, Impact Files |
| Outpatient Visits | Number of non-emergent outpatient visits per day. | 408.72 | (468.11) | AHA |
| ED Visits | Number of visits to emergency department units per day. | 110.16 | (75.85) | AHA |
| Hospital Output Patient Outcomes for Value | ||||
| Patient Satisfaction | Average percentage of patients who reported the highest satisfaction level to the HCAHPS survey, addressing nurse communication, doctor communication, staff responsiveness, pain management, communication about medication, communication about discharge, and care transition. | 70.77 | (4.29) | Hospital Compare |
| Mortality Rate | Average 30-day mortality rate of hospital admissions for heart attack, heart failure, and pneumonia, weighted by the number of patients with each disease. | 13.38 | (1.76) | Hospital Compare |
| Readmission Rate | Average 30-day readmission rate for heart attack, heart failure, and pneumonia, weighted by the number of patients with each disease. | 19.21 | (1.63) | Hospital Compare |
| Hospital Characteristics Variables for Panel Data Regressions | ||||
| Vertical Integration | Categorical variable to indicate the integration level of contractual arrangements between hospitals and physicians. | AHA | ||
| No integration | 0.48 | (0.50) | ||
| Low integration = independent practice association, group practice without walls, and open physician-hospital organisation. | 0.11 | (0.31) | ||
| Medium integration = closed physician-hospital organisation and management service organisation. | 0.05 | (0.22) | ||
| High integration = integrated salary, equity, and foundation models. | 0.36 | (0.36) | ||
| Horizontal Integration | Binary indicator equals 1 if the focal hospital has at least one peer hospital in the same health system within the HRR, and 0 otherwise. | 0.54 | (0.50) | AHA, Dartmouth Atlas |
| Productivity/Value DEA Ratio | Ratio of DEA productivity score divided by value score from the previous year. | 6.64 | (6.52) | BCC DEA models |
| Ownership | Categorical variable to indicate hospital ownership. | AHA | ||
| Public = owned by federal and nonfederal governments. | 0.15 | (0.36) | ||
| For-profit = owned by for-profit investors. | 0.20 | (0.40) | ||
| Not-for-profit = owned by not-for-profit nongovernment organisations. | 0.65 | (0.48) | ||
| Rural | Binary indicator of geographic location. 1 = rural and 0 = urban. | 0.28 | (0.45) | Cost Reports |
| Teaching Status | Categorical variable of hospital teaching status. | AHA | ||
| Non-teaching = not involved in training residents. | 0.66 | (0.47) | ||
| Minor teaching = regular teaching hospitals. | 0.29 | (0.45) | ||
| Major teaching = member of Council of Teaching Hospitals. | 0.05 | (0.22) | ||
| Bed Size | Categorical variable of hospital size based on the number of staffed beds. | AHA | ||
| <100 | 0.29 | (0.45) | ||
| 100 – 250 | 0.42 | (0.49) | ||
| >250 | 0.29 | (0.45) | ||
| CMI | Case mix index. | 1.46 | (0.25) | Impact Files |
| Outlier Adjustment Factor | CMS operating outlier adjustment factor. | 0.04 | (0.04) | Impact Files |
| OPDSH Adjustment Factor | CMS operating disproportionate share hospital payment adjustment factor. | 0.08 | (0.10) | Impact Files |
| HHI | Herfindahl – Hirschman Index in the HRR, calculated using the average daily census. | 0.23 | (0.18) | AHA, Dartmouth Atlas |
| No. PCPs in HRR | Number of primary care physicians per 100,000 residents in the HRR. | 72.40 | (11.73) | Dartmouth Atlas |
| No. Specialists in HRR | Number of specialists per 100,000 residents in the HRR. | 127.38 | (21.38) | Dartmouth Atlas |
| No. RNs in HRR | Number of hospital-based registered nurses per 1,000 residents in the HRR. | 4.29 | (0.72) | Dartmouth Atlas |
| Population Size in HRR | Number of residents in the HRR (in millions). | 1.94 | (1.89) | ACS |
| College Education in HRR | Percentage of residents in the HRR who have a college degree or higher. | 26.59 | (6.66) | ACS |
| Household Income in HRR | Median household income in the HRR (in thousands $). | 54.16 | (11.40) | ACS |
| Age 65 in HRR | Percentage of residents in the HRR who are 65 years or older. | 13.33 | (2.70) | ACS |
Mortality and readmission rates are presented before inversion. Abbreviations: HCAHPS is Hospital Consumer Assessment of Healthcare Providers and Systems, OPDSH is Operating Disproportionate Share Hospital, HHI is Herfindahl – Hirschman Index, PCP is primary care physician, RN is registered nurse, ACS is American Community Survey.
2.3. Models
2.3.1. Data envelopment analysis
We study the long-term effect of P4P programs, which necessitates assessing hospital productivity and value over time, before and after the deployment of P4P programs. However, it is challenging to quantify hospital performance because it entails different measures of health expenditures, service utilisation, and patient outcomes, which may not be measured on a single dimension or converted into a dollar value. Furthermore, complex interdependencies may exist between these dimensions, and the underlying production function is often unobservable. These concerns make other approaches, such as weighted composite scores or quadrant diagrams, infeasible and motivate us to use DEA due to its methodological advantages.
First, DEA considers multiple inputs and outputs in the production function, thereby providing a better fit with the multi-dimensional nature of healthcare services and measurement of hospital performance. Second, DEA is a non-parametric estimation method that does not require a priori knowledge of the functional form of the underlying production function, unlike parametric methods such as stochastic frontier analysis. Third, DEA uses linear programming techniques to identify the optimal efficiency of converting a pre-defined set of inputs into outputs for each entity under evaluation, thereby aligning with hospital objectives to improve the efficiency and quality of outcomes under P4P programs.
We implement two separate DEA models to determine the optimal ratio at which input resources are expended to produce healthcare services and patient outcomes, i.e., the hospital productivity and value measures, respectively. Following the literature, we deployed the Banker-Charnes-Cooper (BCC) model, which accounts for variable returns to scale in the DEA production function (Banker et al., 1984; Kohl et al., 2019). Since hospitals typically have more control over the deployment of input resources than their ability to change the mix of services delivered or patient outcomes (since these measures are partially dependent on their patient population), we adopt an input-oriented DEA specification. The input-oriented BCC model generates the Pareto-optimal frontier comprised of the best performers, among which it identifies the peer reference set for benchmarking each off-frontier hospital and calculates a ratio score ranging from zero to one, with a higher number indicating better performance. We describe further methodological details in Appendix B.
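Although our analyses use the Benchmarking package in R (Section 2.3.2), the input-oriented BCC envelopment problem can be sketched in a few lines of linear programming. The sketch below uses SciPy and toy data; it is illustrative, not the code used in this study.

```python
import numpy as np
from scipy.optimize import linprog

def bcc_input_score(X, Y, j0):
    """Input-oriented BCC (VRS) efficiency of unit j0.
    X: (m, n) input matrix, Y: (s, n) output matrix for n units.
    Solves: min theta  s.t.  X @ lam <= theta * x0,  Y @ lam >= y0,  sum(lam) = 1."""
    m, n = X.shape
    s = Y.shape[0]
    c = np.zeros(n + 1)
    c[0] = 1.0                                   # minimise theta
    A_ub = np.zeros((m + s, n + 1))
    b_ub = np.zeros(m + s)
    A_ub[:m, 0] = -X[:, j0]                      # X @ lam - theta * x0 <= 0
    A_ub[:m, 1:] = X
    A_ub[m:, 1:] = -Y                            # -Y @ lam <= -y0, i.e. Y @ lam >= y0
    b_ub[m:] = -Y[:, j0]
    A_eq = np.ones((1, n + 1))
    A_eq[0, 0] = 0.0                             # sum(lam) = 1 (variable returns to scale)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] + [(0, None)] * n, method="highs")
    return res.x[0]

# Toy data: one input, one output, three hospitals
X = np.array([[2.0, 4.0, 4.0]])
Y = np.array([[1.0, 2.0, 1.5]])
print(bcc_input_score(X, Y, 2))  # 0.75: hospital 2 could serve its outputs with 75% of its input
```

The input orientation is reflected in the objective: theta scales only the inputs of the evaluated hospital, matching our assumption that hospitals have more control over resources than over their patient mix.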
2.3.2. DEA-based Malmquist index
While DEA provides a suitable tool to quantify hospital productivity and value within a single period, simply comparing the DEA results across two time periods does not accurately measure performance change because DEA scores are highly dependent on the reference group (i.e., Pareto frontier), which may also shift over time. Appendix B provides a numerical example to illustrate this concern. Hence, we need to consider not only the distance to the frontier but also the position of the frontier, to assess hospital performance changes driven by P4P programs. This consideration leads us to use the Malmquist index, a commonly used approach to analyse the impact of policy reforms in the DEA literature (Kohl et al., 2019).
Specifically, the Malmquist index extends DEA models to measure the overall performance change between two time periods, which can be further decomposed into two effects. The catch-up effect compares DEA scores between the two time periods to measure changes in the distance to the respective frontier; the result indicates whether the focal hospital is able to catch up with the movement of the performance frontier across time. The frontier-shift effect involves comparing the focal hospital in one period with the frontier in another period (using DEA) to measure how the frontier moves, i.e., whether the best performers improve or deteriorate.
We characterise hospital performance as having progressed, stagnated, or regressed, if its Malmquist index is greater than, equal to, or less than one, respectively. The same rubric also applies to the catch-up and frontier-shift effects. Therefore, the DEA-based Malmquist index allows us to evaluate changes in hospital performance over time and provides rich insights into the driving effects, which is particularly appealing in our research context to study the impact of P4P programs. Appendix C illustrates the details and intuition of the Malmquist index method.
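For intuition, the decomposition can be illustrated with hypothetical distance-function values for a single hospital, using the standard geometric-mean form of the Malmquist index:

```python
from math import sqrt

# Hypothetical DEA distance-function values for one hospital:
# d_a_b = period-b data measured against the period-a frontier
d_t_t   = 0.80   # period-t data vs period-t frontier
d_t1_t1 = 0.90   # period-(t+1) data vs period-(t+1) frontier
d_t_t1  = 1.05   # period-(t+1) data vs period-t frontier (can exceed 1)
d_t1_t  = 0.70   # period-t data vs period-(t+1) frontier

catch_up = d_t1_t1 / d_t_t                                     # 1.125 > 1: closer to the frontier
frontier_shift = sqrt((d_t_t1 / d_t1_t1) * (d_t_t / d_t1_t))   # > 1: the frontier itself improved
malmquist = catch_up * frontier_shift                          # > 1: overall progress
```

In this hypothetical case both effects exceed one, so the hospital progressed; a Malmquist index above one with a catch-up effect below one would instead indicate a hospital lagging behind an improving frontier.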
We carefully select three timestamps around the programs' commencement, i.e., 2009, 2013, and 2017, for three reasons. First, CMS collects mortality and readmission rates based on a three-year rolling window. Specifically, data for the first timestamp (2009) were collected from July 2009 to June 2012, data for the second timestamp (2013) from July 2013 to June 2016, and data for the third timestamp (2017) from July 2017 to December 2019. With a four-year gap, the selected timestamps do not overlap in their data collection periods, which allows us to better evaluate changes in hospital performance.
Second, the data collection period for the first timestamp ended just before P4P programs were launched in 2013, while the data for the second timestamp was collected immediately afterwards. Therefore, by comparing the first two timestamps, we can obtain an estimate of the performance changes in the short term. Furthermore, the third timestamp data was collected six years after P4P programs were launched, which is ample time for hospitals to adjust their practices and strategies in response to changes in reimbursement models. Hence, we measure long-term performance change by comparing hospital data across the three time periods.
Lastly, since our data collection period ended in December 2019, we preclude disruptions to the healthcare system caused by the COVID-19 pandemic. For instance, it has been reported that the mode of care delivery and service utilisation changed during the pandemic, with a substantial decrease in in-person visits as routine checkups and elective procedures were deferred or cancelled altogether (Werner & Glied, 2021). These behavioural changes may have significantly impacted health expenditures, professional employment, service utilisation, and quality outcomes, making data from 2020 onward incomparable to the pre-pandemic period. Hence, our approach of using hospital data across three timestamps, starting in July 2009 and ending in December 2019, provides estimates of the impact of P4P programs on hospital productivity and value that are not confounded by the pandemic.
We implemented the DEA models and calculated the Malmquist index using the Benchmarking package v0.30 in R statistical software v4.2.1.
2.3.3. Malmquist index and hospital characteristics
In order to attribute changes in hospital productivity and value performance to contemporaneous hospital characteristics, we implement a series of Malmquist index models to calculate performance change in every pair of adjacent years (e.g., 2008 – 2009, 2009 – 2010, etc.). To mitigate omitted variable concerns, we regress the Malmquist index on hospital operating characteristics while accounting for a wide spectrum of control variables. We perform several tests to determine the model specification. First, considering that there may exist systematic differences across hospitals, we use a Breusch-Pagan Lagrange Multiplier test to examine the significance of hospital-specific heterogeneity in explaining the Malmquist index. The test statistic of 23.42 (p < 0.01) rejects the null hypothesis that the unobserved hospital heterogeneity is zero, indicating a significant panel effect. Second, a Hausman specification test (p < 0.01) suggests that the unobserved hospital heterogeneity should be modelled as fixed effects (FE). Third, P4P programs have undergone substantial changes, and there may exist other economic factors that affect all hospitals, leading to common temporal trends. A test of the joint significance of time effects supports this conjecture with an F-statistic of 4.32 (p < 0.01), suggesting FE for years.
Therefore, we estimate two-way FE panel regressions to study the association between the Malmquist index and hospital characteristics, as specified in Equation (1).
ln(MI_it) = α + β·X_it + γ·W_it + δ·Z_it + μ_i + τ_t + ε_it    (1)
The subscript i indexes an individual hospital, and t indicates the observation year. The dependent variable MI is the Malmquist index, measured separately for hospital productivity and value performance; we use a log transformation to account for skewness. X is a vector of independent variables, including hospital vertical and horizontal integration, the ratio of DEA scores for productivity and value, ownership, geographic location, teaching status, and number of beds. W and Z control for the aforementioned hospital covariates (e.g., CMI, operation outlier factor, etc.) and HRR-level regional factors (e.g., population size, education, income, etc.), respectively. μ_i and τ_t represent hospital and year FEs to account for unobserved hospital heterogeneity and economic shocks (Gearhart, 2017). A Wald test (p < 0.01) and a Wooldridge test (p = 0.02) indicate that our model exhibits groupwise heteroscedasticity and serial correlation, respectively; hence, we cluster standard errors at the hospital level. All regressions are implemented using Stata® software, version 16.1 MP.
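As an illustration of this estimation strategy, the sketch below implements the core mechanics of Equation (1) in Python with NumPy: a two-way within transformation that absorbs hospital and year fixed effects, followed by OLS with standard errors clustered at the hospital level. This is a minimal, self-contained approximation with hypothetical variable names; the paper's actual estimates come from Stata 16.1 MP.

```python
import numpy as np

def twoway_fe_clustered(y, X, hosp, year):
    """Two-way fixed-effects OLS with hospital-clustered standard errors.

    y: outcome (e.g., log Malmquist index); X: (n, k) regressor matrix;
    hosp/year: integer group labels. Illustrative sketch only.
    """
    yd, Xd = y.astype(float), X.astype(float).copy()
    # Within-transform: absorb both fixed effects by iteratively
    # demeaning over the two group structures (alternating projections).
    for _ in range(100):
        for g in (hosp, year):
            for label in np.unique(g):
                m = g == label
                yd[m] -= yd[m].mean()
                Xd[m] -= Xd[m].mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xd, yd, rcond=None)
    # Cluster-robust covariance: (X'X)^-1 (sum_g X_g'u_g u_g'X_g) (X'X)^-1
    u = yd - Xd @ beta
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(hosp):
        s = Xd[hosp == g].T @ u[hosp == g]
        meat += np.outer(s, s)
    se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se
```

The within transformation removes μ_i and τ_t without estimating them, so the slope coefficients match a dummy-variable regression; clustering the "meat" of the sandwich by hospital accommodates the heteroscedasticity and serial correlation detected above.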
3. Results
This section presents the estimation results for DEA and econometric analyses. We first describe the Malmquist index for short- and long-term hospital performance change, followed by the effect of key hospital characteristics.
3.1. Short-term performance change
Table 2 presents the Malmquist index along with the decomposed catch-up and frontier-shift estimates for short-term performance change from 2009 to 2013. Panel A shows the productivity change. Catch-up has a mean slightly below one (0.97), indicating that low-productivity hospitals regressed marginally compared to their high-productivity counterparts. Although these hospitals moved slightly away from the productivity frontier, the frontier improved significantly, as indicated by the mean frontier-shift effect of 1.15. Due to the frontier shift, the overall productivity improved for both high- and low-productivity hospitals (mean Malmquist index = 1.11).
Table 2.
Malmquist index of hospital productivity and value between 2009 and 2013.
| (A) Productivity | Catch-up Effect | Frontier-shift Effect | Malmquist Index |
|---|---|---|---|
| Mean | 0.97 | 1.15 | 1.11 |
| St. Dev. | 0.22 | 0.25 | 0.37 |
| Median | 0.95 | 1.13 | 1.06 |
| (B) Value | | | |
| Mean | 1.20 | 0.97 | 1.03 |
| St. Dev. | 0.92 | 0.66 | 1.14 |
| Median | 1.01 | 0.83 | 0.85 |
| (C) Productivity vs. Value | | | |
| Paired t-Test | −0.23*** (−11.25) | 0.18*** (11.79) | 0.08*** (3.31) |
| Wilcoxon Signed-Rank Test | −0.06*** (−9.35) | 0.30*** (20.24) | 0.21*** (15.50) |
Paired t-tests are performed to compare the mean difference, while Wilcoxon signed-rank tests are performed to compare median differences between paired observations. Mean and median differences are reported with test statistics in parentheses. ***p < 0.01, **p < 0.05, *p < 0.1.
Panel B presents the change in value performance. The average catch-up effect is 1.20, indicating that low-value hospitals were able to catch up with their respective reference group. However, the majority of frontier hospitals experienced a decline in value performance, as the frontier-shift effect is less than one (with a median value of 0.83). Hence, the two effects offset each other, leading to a mixed overall result for value. In other words, while the mean Malmquist index of 1.03 is slightly larger than one, the median value of 0.85 is substantially below one. We also observe a large variation in value performance with a standard deviation of 1.14. Collectively, this indicates a highly skewed distribution of short-term value change, with a majority of hospitals having regressed over time, while only a few low-value hospitals improved their performance.
Next, we distinguish between performance changes in hospital productivity and value. We perform paired t-tests to examine the mean difference and Wilcoxon signed-rank tests to compare the median difference between paired observations. As shown in Panel C, these tests are significant and suggest that P4P programs have significantly different effects on hospital productivity and value performance in the short term. Specifically, we observe a lower catch-up effect and higher frontier-shift effect for hospital productivity compared to value, based on short-term change, before and after the introduction of P4P programs. The overall Malmquist index for hospital productivity is significantly higher compared to hospital value.
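The Panel C comparisons can be reproduced mechanically. The sketch below implements a paired t-test (mean difference) and a normal-approximation Wilcoxon signed-rank test (median difference) for two matched samples; it is illustrative only (tie handling is omitted for brevity) and uses no study data:

```python
import numpy as np

def paired_tests(a, b):
    """Paired t statistic and normal-approximation Wilcoxon signed-rank
    z statistic for matched samples a and b. Ties among |differences|
    are not averaged, so this is a simplified sketch."""
    d = np.asarray(a, float) - np.asarray(b, float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))  # paired t statistic
    order = np.abs(d).argsort()
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)           # ranks of |d|
    w = ranks[d > 0].sum()                        # sum of positive ranks
    mu = n * (n + 1) / 4                          # null mean of W
    sigma = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # null st. dev. of W
    return t, (w - mu) / sigma
```

Applied to the paired productivity and value Malmquist indices, the two statistics answer slightly different questions: the t-test compares means (sensitive to the long tails noted above), while the signed-rank test compares the central tendency of the matched differences.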
3.2. Long-term performance change
While hospital productivity and value differ markedly in the short term, their long-term trends converge in a more consistent manner, as shown in Table 3. In Panel A, hospital productivity follows a pattern similar to the short-term change. In particular, the reference hospitals consistently improved their productivity (frontier-shift effect of 1.11), while low-productivity hospitals were able to keep up with the frontier change (catch-up effect close to 1). As a result, most hospitals progressed with respect to long-term productivity change, as both the mean and median values of the Malmquist index are larger than 1.
Table 3.
Malmquist index of hospital productivity and value between 2013 and 2017.
| (A) Productivity | Catch-up Effect | Frontier-shift Effect | Malmquist Index |
|---|---|---|---|
| Mean | 1.00 | 1.11 | 1.10 |
| St. Dev. | 0.24 | 0.41 | 0.48 |
| Median | 0.98 | 1.07 | 1.04 |
| (B) Value | | | |
| Mean | 1.06 | 1.30 | 1.34 |
| St. Dev. | 0.93 | 0.37 | 1.07 |
| Median | 0.90 | 1.26 | 1.07 |
| (C) Productivity vs. Value | | | |
| Paired t-Test | −0.06*** (−2.85) | −0.19*** (−15.55) | −0.23*** (−9.75) |
| Wilcoxon Signed-Rank Test | 0.08*** (4.37) | −0.19*** (−24.44) | −0.03*** (−8.03) |
Paired t-tests are performed to compare the mean difference, while Wilcoxon signed-rank tests are performed to compare median differences between paired observations. Mean and median differences are reported with test statistics in parentheses. ***p < 0.01, **p < 0.05, *p < 0.1.
With respect to hospital value, we observe substantial progress in the frontier, with the mean and median frontier-shift effect equal to 1.30 and 1.26, respectively. On average, low-value hospitals caught up with their high-value reference hospitals, driving improvements in overall value (mean Malmquist index = 1.34). However, we also observe that a majority of low-value hospitals are worse off, as the median catch-up effect is 0.90. Taken together, some hospitals exhibited overall value progression because the magnitude of the frontier-shift effect is large, while a sizeable group deteriorated. Indeed, the standard deviation of the Malmquist value index is relatively large, suggesting significant variation in value change.
Again, we compare the long-term performance change between hospital productivity and value. The test results are presented in Panel C, which rejects the null hypothesis that the mean and median differences between these two dimensions of performance are zero. This result indicates that although hospital productivity and value exhibit similar long-term trends, the magnitude of performance change is significantly different as hospitals may strategically focus on healthcare value, stimulated by the requirements in P4P programs.
3.3. Overall performance change
Combining the short- and long-term performance trends, we explore the overall change in hospital performance from 2009 to 2017, as shown in Table 4. With respect to productivity, the performance trend is highly consistent across years, as shown in Panel A. On one hand, the catch-up effect is less than one, suggesting a slight regression in performance among low-productivity hospitals compared to their high-productivity peers. On the other hand, the reference hospitals on the Pareto-optimal frontier progressed considerably. Since the frontier-shift effect is large, the resulting Malmquist index is larger than one for most hospitals. In other words, a majority of hospitals were able to effectively utilise their clinical resources to produce more healthcare services, resulting in higher long-term productivity after the introduction of P4P programs.
Table 4.
Malmquist index of hospital productivity and value between 2009 and 2017.
| (A) Productivity | Catch-up Effect | Frontier-shift Effect | Malmquist Index |
|---|---|---|---|
| Mean | 0.95 | 1.24 | 1.18 |
| St. Dev. | 0.26 | 0.39 | 0.53 |
| Median | 0.92 | 1.19 | 1.09 |
| (B) Value | | | |
| Mean | 1.15 | 1.17 | 1.28 |
| St. Dev. | 1.24 | 0.66 | 1.98 |
| Median | 0.92 | 1.01 | 0.94 |
| (C) Productivity vs. Value | | | |
| Paired t-Test | −0.20*** (−7.41) | 0.07*** (4.15) | −0.10** (−2.38) |
| Wilcoxon Signed-Rank Test | −0.003*** (−4.01) | 0.18*** (11.48) | 0.15*** (9.07) |
Paired t-tests are performed to compare the mean difference, while Wilcoxon signed-rank tests are performed to compare median differences between paired observations. Mean and median differences are reported with test statistics in parentheses. ***p < 0.01, **p < 0.05, *p < 0.1.
Regarding hospital value, the overall trend is mixed (mean Malmquist index = 1.28, whereas median Malmquist index = 0.94) due to conflicting short- and long-term trends. Frontier hospitals initially experienced operational disruptions in the short term, leading to a performance decline. In the long run, they made the necessary adjustments and adapted to changes in the P4P policy, as indicated by the higher frontier-shift effect (with mean and median greater than one). In contrast, the catch-up effect suggests that low-value hospitals witnessed divergent trends. Although they were able to catch up with the frontier in the short term, systematic differences may exist between low- and high-value hospitals that create barriers to sustaining such improvements over a longer period. As a result, the majority of low-value hospitals regressed overall, as suggested by the median value of the catch-up effect and Malmquist index for hospital value being less than one.
Overall, productivity and value exhibit considerably distinct patterns, as indicated in Panel C, implying heterogeneous performance implications of P4P programs on hospitals. In Appendix D, we also construct a 2 × 2 quadrant table to link the overall performance changes in hospital productivity and value. Furthermore, we perform panel regression analyses to compare the short- and long-term performance changes, while considering repeated measures within hospitals. We observe significant differences between short- and long-term performance changes, which underscores the importance of our study in unveiling the long-term efficacy of P4P programs, compared to prior studies that do not explore such trends. We present details in Appendix E.
3.4. Hospital characteristics and Malmquist index
Table 5 presents the results of the panel data regressions (see Note 1). For brevity, we do not present the coefficients of control variables, although they are included in the estimation model. In Column (1), the dependent variable is the Malmquist productivity index. Compared to hospitals that do not integrate with physicians, hospitals with high vertical integration exhibit, on average, a 0.6% higher improvement in productivity. Similarly, horizontal integration with peer hospitals is associated with a 1.3% productivity improvement. The coefficient of the productivity/value ratio is negative and significant, supporting our argument that it is challenging for high-productivity hospitals to improve further.
Table 5.
Drivers of hospital productivity and value.
| DV (log-transformed) | (1) Productivity Malmquist Index | (2) Value Malmquist Index |
|---|---|---|
| Vertical Integration (Baseline: No integration) | | |
| Low integration | 0.005 (0.004) | 0.026*** (0.008) |
| Medium integration | 0.005 (0.005) | 0.028** (0.013) |
| High integration | 0.006** (0.003) | 0.008 (0.006) |
| Horizontal Integration | 0.013*** (0.004) | 0.022*** (0.008) |
| Productivity/Value DEA Ratio | −0.002*** (0.001) | 0.014*** (0.001) |
| Ownership (Baseline: Public) | | |
| For-profit | 0.006 (0.009) | −0.022 (0.015) |
| Not-for-profit | 0.002 (0.008) | 0.005 (0.012) |
| Rural | −0.002 (0.005) | 0.007 (0.010) |
| Teaching Status (Baseline: Non-teaching) | | |
| Minor teaching | 0.008* (0.005) | −0.001 (0.009) |
| Major teaching | 0.028** (0.011) | −0.009 (0.023) |
| Bed Size (Baseline: <100) | | |
| 100–250 | 0.006 (0.005) | 0.003 (0.010) |
| >250 | 0.005 (0.006) | −0.002 (0.017) |
| Constant | 0.545*** (0.053) | 0.368*** (0.113) |
| Observations | 17,838 | 17,838 |
| Number of Hospitals | 2,031 | 2,031 |
| R-squared | 0.017 | 0.150 |
| Hospital FE | Yes | Yes |
| Year FE | Yes | Yes |
Other hospital-specific covariates (e.g., CMI, operation outlier factor, etc.) and HRR-level regional factors are considered control variables and are included in the estimation but not reported for brevity. Standard errors (in parentheses) are clustered at the hospital level. ***p < 0.01, **p < 0.05, *p < 0.1.
Column (2) shows the regression results using the Malmquist value index as the dependent variable. Interestingly, higher vertical integration is not significantly associated with greater value. Instead, hospitals that deployed low and medium integration models, on average, improved their value scores by 2.6% and 2.8%, respectively. Horizontal integration remains a significant factor and is associated with a 2.2% higher value. The positive and significant coefficient of the productivity/value ratio again supports the baseline effect that low-value hospitals are more likely to improve performance, suggesting that they have greater scope for improvements.
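Because the dependent variable is log-transformed, each coefficient β in Table 5 is a log-point effect, and the implied percentage change is 100 · (exp(β) − 1), which is close to 100 · β for small coefficients. A quick check in Python confirms that the percentages quoted above follow from the reported estimates:

```python
import math

# Coefficients from Table 5 (log-point estimates on the log-transformed
# Malmquist index); percentage interpretation is 100 * (exp(beta) - 1).
estimates = {
    "High vertical integration -> productivity": 0.006,
    "Horizontal integration -> productivity": 0.013,
    "Low vertical integration -> value": 0.026,
    "Medium vertical integration -> value": 0.028,
    "Horizontal integration -> value": 0.022,
}
for name, beta in estimates.items():
    print(f"{name}: {100 * (math.exp(beta) - 1):.1f}%")
```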
Furthermore, we find that teaching hospitals are more likely to exhibit higher productivity, although this does not translate into greater value. Teaching hospitals typically receive patients with more severe conditions who require intensive treatments and services (which translates into higher productivity) but are likely to yield lower relative improvements in patient health outcomes (i.e., low value). We do not observe a systematic effect of hospital ownership, geographic location, or bed size on the Malmquist productivity and value indices.
4. Discussion
Considering the growing impetus by public and private health insurers to shift towards value-based reimbursement, it is imperative to fully understand the effectiveness and consequences of P4P payment models. Building on the prior literature, we propose a DEA-based Malmquist index to construct two performance metrics, namely hospital productivity and value (Amorim Lopes et al., 2016; Novignon et al., 2023). We evaluate the performance trends over a ten-year period, which spans the rollout of P4P programs, to assess the short- and long-term performance implications, and identify the drivers of performance among a nationwide sample of U.S. hospitals.
4.1. Research implications
Consistent with extant studies, we observe that hospitals exhibit relatively moderate changes in healthcare value from 2009 to 2013, indicating a modest effect in the short term. On the other hand, hospitals were able to significantly improve the value of care delivery between 2013 and 2017, a period during which value-based payment initiatives were rolled out and gained traction. Ours is one of the first studies to provide empirical evidence to support the positive effect of P4P programs on hospital productivity and value-based care performance. Our research provides a novel approach to attribute improvements in hospital performance to (a) the shift of performance frontier and (b) catching up with best performers.
We observe that long-term improvements in healthcare value are primarily driven by the frontier-shift effect among leading hospitals, masking the poor performance of 1,197 (59%) low-value hospitals that were not able to catch up. Indeed, more than 70% of laggard hospitals in our sample regressed in terms of their overall value-based performance after accounting for the frontier-shift effect. Therefore, our results imply that long-term hospital value may exhibit a distribution with a long tail skewed towards the lower end. Unlike hospital value, we observe that hospital productivity improved consistently in both the short and long term. Although this change was largely driven by the frontier-shift effect, laggard hospitals were able to catch up quickly. Hence, a majority of hospitals improved the efficiency of their resource utilisation during healthcare service delivery.
Overall, our research contributes to the healthcare literature by revealing the long-term performance trends of hospitals along two critical dimensions – productivity and value. In contrast to prior studies, we conducted DEA based on a variety of productivity and value-based performance measures to assess the overall impact of P4P programs. The Malmquist index approach allows us to decompose performance changes at leading and laggard hospitals, which extends our understanding beyond the average effects obtained from regression estimation.
Furthermore, we implemented panel data regressions to examine the drivers of hospital performance. We observe that hospitals with high vertical integration with affiliated physician practices are more likely to achieve productivity improvements, but do not exhibit greater value-based performance. On the other hand, horizontal integration with peer hospitals enhances both the productivity and value of care delivery, suggesting that greater care coordination through integration across hospital networks can improve the value of care delivered compared to simply integrating physician practices with hospitals. Hence, our results contribute to the literature on service integrations and their impact on healthcare productivity and value. Our analyses also highlight the role of baseline performance and teaching status in terms of their impact on changes in hospital performance.
4.2. Implications for practice
Our research offers several important implications for healthcare practitioners and policymakers. First, due to the multifaceted nature of healthcare, many value-based initiatives consider a wide list of quality outcomes with frequent changes in measurement criteria, inevitably increasing the compliance burden of tracking hospital performance and identifying opportunities for improvement. This may explain the marginal effect on hospital performance in the short term (Chee et al., 2016; Ryan et al., 2017). Our results indicate that hospitals were able to better understand program policies and adjust their care management practices in the long run, contributing to considerable improvements along both performance dimensions. This suggests that future value-based programs should follow a consistent path in their design and implementation guidelines and support hospitals in executing their performance improvement strategies.
Second, it has been observed that P4P programs may penalise hospitals that treat disadvantaged populations and exacerbate health disparities (Chee et al., 2016). Our results provide supporting evidence for this claim since a considerable group of low-value hospitals fell behind the leading performers, creating larger gaps in healthcare value. We also observe this trend in long-term analyses of hospital performance, suggesting that the pattern of disparities in healthcare delivery may persist in the future. P4P programs deploy a relative ranking method, which evaluates hospitals’ performance relative to their peers. As such, even if a laggard hospital improves significantly, it may still rank low compared to high performers. This method also creates an environment of competition, placing low-performing hospitals at a further disadvantage (Joynt Maddox et al., 2017). Our results suggest that program design should consider rewarding absolute (in addition to relative) performance improvements, which can encourage all hospitals to actively improve care delivery and seek greater care coordination.
Third, despite consistent progress in productivity, value-based performance varied substantially across hospitals. This nuanced difference highlights potential shortfalls in resource utilisation. On one hand, hospitals may strive to increase the variety and scope of healthcare delivery, in the expectation that intensive services will yield better health outcomes. However, the relationship between cost and quality is ambiguous and often nonlinear. Our results underscore the effect of inefficient resource utilisation in an era of healthcare reform, where care delivery does not translate into corresponding improvements in health outcomes and has low value to patients. On the other hand, P4P programs were developed based on the prevailing FFS payment structure, which places greater emphasis on the volume of services provided (Chee et al., 2016). This indicates that hospitals may choose to improve productivity while trading off value, such that increased FFS revenue may compensate for the financial penalties incurred due to non-compliance with pre-defined P4P benchmarks.
Our findings suggest that hospitals are not able to meet the original goals of P4P programs, highlighting the need to transition to more advanced alternative payment models, such as Accountable Care Organizations (ACOs). While P4P programs focus on individual hospitals within the FFS model, ACOs are collaborative networks of healthcare providers that work together to provide coordinated care and are accountable for the health outcomes of a defined patient population. ACO participants share a portion of cost savings (as a reward) if they can lower health expenditures without compromising quality outcomes, or incur penalties if they are unable to lower costs below a pre-defined cost benchmark. The dual objectives of cost efficiency and care quality bind both performance dimensions and motivate providers to effectively utilise resources for better productivity and value of care delivery.
Furthermore, in response to the expansion of value-based care reform, hospitals have increasingly merged with and acquired physician practices and smaller hospitals to better coordinate patient care (Daub et al., 2020). Kocher and Sahni (2011) reported that, on average, each additional physician employed increases average hospital expenses by $150,000 to $250,000 annually. To break even, physicians must therefore refer more patients, and hospitals must deliver healthcare services more efficiently. However, our results indicate that greater vertical integration is associated with higher productivity but may not have an impact on value. This dichotomy suggests that tighter hospital-physician integration is not a panacea. When hired as salaried employees (as in high vertical integration models), physicians may shift their financial risk for clinical outcomes to hospitals, leading to agency issues (Post et al., 2018). As a result, they are more likely to engage in opportunistic behaviour based on their own financial interests compared to their counterparts in low vertical integration models.
In contrast, hospitals that are affiliated with the same health system are more likely to benefit from greater incentive alignment and lower barriers to health information sharing (Atasoy et al., 2018). Hence, when hospitals are horizontally integrated with peer hospitals, they are able to better coordinate patient care, contributing to better productivity and higher value of care. This finding emphasises the critical role of care coordination in achieving the CMS Triple Aim goals. Collectively, our observed effects of vertical and horizontal integration are consistent with prior research and suggest that program incentives should be aligned with the goals of individual providers in order to achieve better patient outcomes (Van Herck et al., 2010). It also highlights the advantages of alternative models, such as ACOs and bundled payment models, in which healthcare providers share the overall risks and rewards.
Furthermore, we observe that hospital performance in the previous year is a significant predictor of current achievement. Specifically, hospitals with low productivity (value) in one year are likely to achieve greater productivity (value) improvement in the following year. This result is partially consistent with Revere et al. (2021), who reported that more than 90% of hospitals that were in the bottom performance category moved to a higher performance bracket in the following year, whereas it is difficult for top performers to remain in the top category. As such, our findings support the current HVBP payment model, which rewards hospital improvement or achievement, whichever is larger. Although this approach may reduce the size of potential P4P program penalties, our findings suggest that CMS could increase the cap on the maximum yearly loss to provide sufficient financial incentives to hospitals to implement process improvements.
Overall, our DEA and regression analyses provide deep insights to improve future payment programs and guide healthcare providers towards value-based performance models. For example, in 2019, MedPAC recommended that the U.S. Congress consolidate all hospital P4P programs into a single alternative program, the Hospital Value Incentive Program (MedPAC, 2019). While the details of their methodology are unclear, our analysis provides a forward-looking view of the potential concerns that may emerge under such a combined program.
5. Conclusions
In this study, we utilise the DEA-based Malmquist index method to examine longitudinal trends in hospital performance with respect to productivity and value. Our results indicate that, on average, hospitals exhibited improvements in productivity and value from 2013 to 2017, which provides supporting evidence for the long-term efficacy of P4P programs. However, a sizeable portion of laggard hospitals were not able to catch up with improvements to the performance frontier, raising concerns about the viability of future value-based healthcare programs. Specifically, our results suggest that horizontal integration between hospitals is associated with improvements in both hospital productivity and value, whereas physician-hospital (vertical) integration has a marginal impact on the value of care. This highlights the need to align organisational strategies with incentive programs to realise greater performance improvements.
Our study has several limitations. First, our sample primarily focuses on acute care hospitals because other types of hospitals have excessive missing values in our data sources, affecting the validity of the DEA estimation. Future studies could include other entities, such as post-acute care and rehabilitation facilities, and perform a difference-in-differences analysis to evaluate the causal impact. Second, we only consider the collective impact of P4P programs at the hospital level and encourage future research to utilise granular data sources (e.g., encounter-level data) to examine the effectiveness of individual programs. Furthermore, this work draws on hospitals in the U.S. and may not generalise to other settings. Nonetheless, as value-based care initiatives continue to flourish, our work opens an interesting avenue for future research on the application of the Malmquist index to examine the dynamics of health policy. Lastly, as discussed above, we excluded the COVID-19 period to avoid operational disruptions that may confound our results. Further studies could extend the analysis to more recent years for additional insights.
Supplementary Material
Acknowledgments
I.R. Bardhan thanks the Charles and Elizabeth Prothro Regents Chair in Healthcare Management and the Dean’s Research Excellence Grant at the McCombs School of Business at UT Austin for generous financial support. C. Bao thanks the Spears Fellowship at Oklahoma State University for financial support.
Note
1. We addressed endogeneity concerns for horizontal and vertical integration by using the average integration levels among regional peer hospitals (within the same HRR) as their respective instruments, which led to consistent findings.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data availability statement
Data that support this study cannot be made available due to our data use agreement with the American Hospital Association.
Supplemental data
Supplemental data for this article can be accessed online at https://doi.org/10.1080/20476965.2024.2421533
References
- Amorim Lopes, M., Soares, C., Almeida, Á., & Almada-Lobo, B. (2016). Comparing comparables: An approach to accurate cross-country comparisons of health systems for effective healthcare planning and policy guidance. Health Systems, 5(3), 192–212. 10.1057/hs.2015.21 [DOI] [Google Scholar]
- Aragon, L., Schieman, K., & Cure, L. (2022). Incorporating the six aims for quality in the analysis of trauma care. Health Systems, 11(2), 98–108. 10.1080/20476965.2021.1906763 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Atasoy, H., Chen, P., & Ganju, K. (2018). The spillover effects of health it investments on regional healthcare costs. Management Science, 64(6), 2515–2534. 10.1287/mnsc.2017.2750 [DOI] [Google Scholar]
- Ayabakan, S., Bardhan, I., & Zheng, Z. E. (2021). Triple aim and the hospital readmission reduction program. Health Services Research and Managerial Epidemiology, 8, 2333392821993704. 10.1177/2333392821993704 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Baicker, K., & Levy, H. (2013). Coordination versus competition in health care reform. New England Journal of Medicine, 369(9), 789–791. 10.1056/NEJMp1306268
- Banerjee, S., McCormick, D., Paasche-Orlow, M. K., Lin, M.-Y., & Hanchate, A. D. (2019). Association between degree of exposure to the hospital value based purchasing program and 30-day mortality: Experience from the first four years of Medicare's pay-for-performance program. BMC Health Services Research, 19(1), 921. 10.1186/s12913-019-4562-7
- Banker, R. D., Charnes, A., & Cooper, W. W. (1984). Some models for estimating technical and scale inefficiencies in data envelopment analysis. Management Science, 30(9), 1078–1092. 10.1287/mnsc.30.9.1078
- Bardhan, I. R., Bao, C., & Ayabakan, S. (2023). Value implications of sourcing electronic health records: The role of physician practice integration. Information Systems Research, 34(3), 1169–1190. 10.1287/isre.2022.1183
- Burgess, J. F., Jr. (2012). Innovation and efficiency in health care: Does anyone really know what they mean? Health Systems, 1(1), 7–12. 10.1057/hs.2012.6
- Cantor, V. J. M., & Poh, K. L. (2018). Integrated analysis of healthcare efficiency: A systematic review. Journal of Medical Systems, 42(1), 1–23. 10.1007/s10916-017-0848-7
- Chee, T. T., Ryan, A. M., Wasfy, J. H., & Borden, W. B. (2016). Current state of value-based purchasing programs. Circulation, 133(22), 2197–2205. 10.1161/CIRCULATIONAHA.115.010268
- Chen, M., & Grabowski, D. C. (2019). Hospital readmissions reduction program: Intended and unintended effects. Medical Care Research and Review, 76(5), 643–660. 10.1177/1077558717744611
- Daub, S., Rosenzweig, C., & Schilkie, M. C. (2020). Preparing for a value-driven future. Families, Systems, & Health, 38(1), 83–86. 10.1037/fsh0000476
- Gearhart, R. S. (2017). Non-parametric frontier estimation of health care efficiency among US states, 2002–2008. Health Systems, 6(1), 15–32. 10.1057/s41306-016-0015-2
- Himmelstein, D., & Woolhandler, S. (2015). Quality improvement: Become good at cheating and you never need to become good at anything else. Health Affairs Forefront. 10.1377/forefront.20150827.050132
- Izón, G. M., & Pardini, C. A. (2018). Association between Medicare's mandatory hospital value-based purchasing program and cost inefficiency. Applied Health Economics and Health Policy, 16(1), 79–90. 10.1007/s40258-017-0357-3
- Joynt Maddox, K. E., Sen, A. P., Samson, L. W., Zuckerman, R. B., DeLew, N., & Epstein, A. M. (2017). Elements of program design in Medicare's value-based and alternative payment models: A narrative review. Journal of General Internal Medicine, 32(11), 1249–1254. 10.1007/s11606-017-4125-8
- Kazley, A. S., & Ozcan, Y. A. (2009). Electronic medical record use and efficiency: A DEA and windows analysis of hospitals. Socio-Economic Planning Sciences, 43(3), 209–216. 10.1016/j.seps.2008.10.001
- Kocher, R., & Sahni, N. R. (2011). Hospitals' race to employ physicians—the logic behind a money-losing proposition. New England Journal of Medicine, 364(19), 1790–1793. 10.1056/NEJMp1101959
- Kohl, S., Schoenfelder, J., Fügener, A., & Brunner, J. O. (2019). The use of data envelopment analysis (DEA) in healthcare with a focus on hospitals. Health Care Management Science, 22(2), 245–286. 10.1007/s10729-018-9436-8
- LaPointe, J. (2022, March 2). What is value-based care, what it means for providers? RevCycleIntelligence. https://revcycleintelligence.com/features/what-is-value-based-care-what-it-means-for-providers
- MedPAC. (2019). The hospital value incentive program: Measuring and rewarding meaningful hospital quality. https://www.medpac.gov/the-hospital-value-incentive-program-measuring-and-rewarding-meaningful-hospital-quality/
- Novignon, J., Aryeetey, G., Nonvignon, J., Malm, K., Peprah, N. Y., Agyemang, S. A., Amon, S., & Aikins, M. (2023). Efficiency of malaria service delivery in selected district-level hospitals in Ghana. Health Systems, 12(2), 198–207. 10.1080/20476965.2021.2015251
- Pai, D. R., Hosseini, H., & Brown, R. S. (2019). Does efficiency and quality of care affect hospital closures? Health Systems, 8(1), 17–30. 10.1080/20476965.2017.1405874
- Porter, M. E. (2010). What is value in health care? New England Journal of Medicine, 363(26), 2477–2481. 10.1056/NEJMp1011024
- Post, B., Buchmueller, T., & Ryan, A. M. (2018). Vertical integration of hospitals and physicians: Economic theory and empirical evidence on spending and quality. Medical Care Research and Review, 75(4), 399–433. 10.1177/1077558717727834
- Rutter, S. R., & Park, S. H. (2020). Relationship between hospital characteristics and value-based program measure performance: A literature review. Western Journal of Nursing Research, 42(12), 1010–1021. 10.1177/0193945920920180
- Revere, L., Langland-Orban, B., Large, J., & Yang, Y. (2021). Evaluating the robustness of the CMS hospital value-based purchasing measurement system. Health Services Research, 56(3), 464–473. 10.1111/1475-6773.13608
- Ryan, A. M., Burgess, J. F., Pesko, M. F., Borden, W. B., & Dimick, J. B. (2015). The early effects of Medicare's mandatory hospital pay-for-performance program. Health Services Research, 50(1), 81–97. 10.1111/1475-6773.12206
- Ryan, A. M., Krinsky, S., Maurer, K. A., & Dimick, J. B. (2017). Changes in hospital quality associated with hospital value-based purchasing. New England Journal of Medicine, 376(24), 2358–2366. 10.1056/NEJMsa1613412
- Van Herck, P., De Smedt, D., Annemans, L., Remmen, R., Rosenthal, M. B., & Sermeus, W. (2010). Systematic review: Effects, design choices, and context of pay-for-performance in health care. BMC Health Services Research, 10(1), 247. 10.1186/1472-6963-10-247
- Werner, R. M., Emanuel, E. J., Pham, H. H., & Navathe, A. S. (2021, February 17). The future of value-based payment: A road map to 2030. https://ldi.upenn.edu/our-work/research-updates/the-future-of-value-based-payment-a-road-map-to-2030/
- Werner, R. M., & Glied, S. A. (2021). Covid-induced changes in health care delivery—can they last? New England Journal of Medicine, 385(10), 868–870. 10.1056/NEJMp2110679
Data Availability Statement
Data that support this study cannot be made available due to our data use agreement with the American Hospital Association.
