Improving the evaluation of an integrated healthcare system using entropy balancing: Population health improvements in Gesundes Kinzigtal

Nicolas Larrain; Oliver Groene

doi:10.1016/j.ssmph.2023.101371

2023 Feb 24;22:101371.

PMCID: PMC9996350 PMID: 36909929

Abstract

Background

Evidence of the effect of integrated healthcare networks on population health is scarce. Moreover, current designs for evaluating such networks have shortcomings that can lead to misleading conclusions. Our paper evaluates Gesundes Kinzigtal, a best-practice integrated healthcare network, using an innovative design that informs the discussion about the health gains produced by integrated healthcare.

Research question

What is the effect of Gesundes Kinzigtal on population health?

Methods

We evaluated the effect of the integrated healthcare initiative with a quasi-experimental design based on entropy balancing. Integrated network participants were compared with a control group and followed up for five years, using claims data from 2004 to 2018. Population health outcomes were survival (Cox hazard ratio, Kaplan-Meier curve), mortality ratio, mean age at the time of death, and years of life lost or gained. Design validity was evaluated by assessing group balance at baseline. Finally, we compared our results with those obtained using a previously published design for evaluating integrated networks.

Results

The treatment group comprised 9083 network participants. Compared with an equivalent control group, it showed a mortality ratio of 5.4% vs 7.5% (p < 0.05), a mean age at the time of death of 80.1 vs 80.3 years (p > 0.05) and a gain of 0.2 years of life per deceased person (p > 0.05). The Cox hazard ratio (0.72; p < 0.05) and mean survival time (1784 vs 1768 days; p < 0.05) showed better survival for treated participants. Results using the previously published design were more favorable for the treatment group; however, that design excluded participants with significantly greater healthcare needs.

Discussion

The integrated network had a favorable effect on participants' mortality and survival risk. Previous evaluations based on propensity score matching might overestimate the network's impact on population health by excluding participants with greater healthcare needs.

Keywords: Impact evaluation, Quasi-experiment, Integrated healthcare, Entropy balancing, Propensity score matching

Highlights

• Gesundes Kinzigtal had a positive impact on population health.
• After 5 years of exposure to the integrated network, the mortality ratio was 28% lower than in the control group.
• We present an improved, robust design to evaluate population-based healthcare interventions.
• The effect of care integration was smaller for users with greater healthcare needs.

1. Introduction

Gesundes Kinzigtal ('Healthy Kinzigtal') (GK) is one of Germany's leading population-based accountable care organizations (ACOs). The organization brings together a local network of physicians, a professional health management company (OptiMedis), and a wide range of other organizations, such as social health insurance companies, patients' associations, gyms and pharmacies. Together, they form an integrated healthcare network that aims to overcome health system fragmentation and to improve population health, patient and carer experience, and cost efficiency (the 'Quadruple Aim'). The network is managed by a "regional integrator" (Gesundes Kinzigtal ltd), an organization that harmonizes the roles of the network components and serves as the implementation partner for system interventions. The business model is built around a shared savings contract: savings arising from the difference between the expected and actual care costs of the patients in the region are shared between the ACO and the system's payers, the insurance companies. An extensive review of GK is documented elsewhere (Hildebrandt et al., 2015).

Even though ACOs have had documented success (Kaufman et al., 2019; Peiris et al., 2018), assessing the performance of these initiatives is not straightforward. The Institute for Healthcare Improvement and the Expert Group for Health System Performance Assessment propose a series of specific outcome indicators to measure the performance of integrated systems (European Commission, Directorate General for Health and Food Safety, 2017; Stiefel & Nolan, 2012). Nevertheless, these guidelines offer little guidance on evaluation designs that can establish a causal relationship between system integration and health improvements.

Acknowledging this challenge, Pimperl et al. (2017) evaluated GK with a quasi-experiment based on a combination of exact and propensity score matching using data from 2013. The authors later proposed this design as a robust method for evaluating the effect of integrated systems on population health. However, when the evaluation of GK was updated, the design led to selection bias in the matching process (Lunt, 2014), violating the critical assumption of common support (Rosenbaum & Rubin, 1983). Specifically, the treated participants excluded for lack of a matched pair had a higher prevalence of characteristics directly related to the outcome. GK's population approach entails accountability for the whole spectrum of care of all patients involved in the initiative. Consequently, interventions seeking health gains are introduced for all patients, while the relationship between how sick a patient is and the health gains obtained from care integration is unknown. Accordingly, the direction of the bias introduced by systematically excluding a subset of patients is also unknown.

Entropy balancing (Hainmueller, 2012) is a multivariate reweighting method that produces balanced samples in observational studies and can be used to address the challenges of the previous evaluation design. We use this methodology to pursue two main objectives: first, to update the evidence on the effectiveness of GK on population health; second, to describe an appropriate design for evaluating the effect of population-based care integration interventions on population health outcomes.

2. Methods and theory

2.1. Data

We used claims data from the two health insurance companies contracted with GK. The data contained no identifiable information and were therefore fully anonymous. They cover approximately 50% of all patients in the Kinzigtal region. The literature suggests that claims data are valuable for assessing the quality and safety of care (Romano et al., 2003), and they are readily available in electronic format. The data include claims for outpatient care, hospital care, work incapacity, prescriptions, prevention services, rehabilitation, and long-term care services, and were available from 2004 until the first half of 2018.

2.2. Study design

Given the nature of our data and the ex-post timing of our evaluation, the causal effect of the ACO on population health outcomes is most adequately evaluated with a quasi-experimental design. Quasi-experimental designs construct a non-treated control group, equivalent to the treatment group, that serves as a counterfactual. Comparability between groups is assessed by comparing the moments and distributions of observable covariates at baseline. Because unobservable variables cannot be compared, quasi-experimental designs are more vulnerable to threats to internal validity than randomized experiments (Morgan & Winship, 2015).

Physicians participating in the ACO ("partner physicians") receive health intelligence support and participate in several interventions to improve care integration. At the same time, enrollment in the ACO is available for all the patients insured by the two partner health insurance companies in the region, regardless of the participation status of the physicians they visit. Enrolled patients are managed by the regional integrator and have access to special care programs, case management and other interventions to improve care efficiency and promote healthy lifestyles.

Our impact evaluation measures the effect of patient enrollment in the ACO. Patients enrolled between 2006 and 2013 (N = 10 499) are compared with patients in the same region who were not affected by the actions of the ACO, and both groups are followed up for 5 years (4.5 years for participants enrolling in 2013). The control group is drawn from people insured with the partner insurance companies who live in the Kinzigtal region but are not enrolled in the ACO, and who in addition have fewer than 50% of their physician visits with partner physicians. These two conditions ensure that potential controls do not have access to special care programs, case management, enhanced integration, or other interventions. Between 2006 and 2013, there were 26 163 non-participants eligible as potential controls.

Physicians' participation in the ACO may be self-selected on characteristics that also influence patient outcomes. Because controls are restricted to having most of their physician visits with non-partner physicians, such selection could translate into better (or worse) outcomes for controls, biasing the estimated effect downward (or upward). The 50% threshold is varied in a sensitivity analysis to test whether physician participation biases our results.

The counterfactual is constructed from indicators of healthcare use (in- and outpatient visits, long-term care and rehabilitation use), health status (diagnoses and prescriptions) and demographic factors (age and sex) (Table 1). Four variables (age, sex, Charlson score and type of insurer) are set to be exactly the same in the treatment and control groups; the remaining variables are set to be as similar as possible between groups. Balancing these variables at baseline ensures that, before the intervention started, patients in the two groups had equivalent observed health and healthcare utilization patterns; hence, differences found after follow-up can be attributed to the intervention, provided that no relevant unobserved differences remain. Including the insurance company among the exactly balanced variables may seem unusual. Still, it improves comparability, because each insurer has a specific target population in terms of occupation and urban or rural residence, two characteristics that are not otherwise recorded in the database.

Table 1.

Variables for counterfactual creation process.

Variables set to be as close as possible between groups (balanced at the first moment in the EB design; used for the propensity score calculation in the PSM design):
• Insured days
• No. of physician visits
• No. of specialist visits
• No. of hospital admissions
• Days of stay in hospital care
• Days of in-patient rehabilitation
• Number of drug prescriptions
• Days of temporary incapacity for work
• Days of permanent incapacity for work
• Long-term care level by the German long-term care insurance (range: 0 = no care level, 1 = lowest care level to 3 = highest care level, 4 = special hardship cases)
• Presence (yes/no) of an outpatient or in-hospital diagnosis in an ICD-10-GM (International Statistical Classification of Diseases and Related Health Problems, 10th revision, German Modification) diagnosis group (all ICD diagnosis groups A00–Z99, except diagnosis groups with fewer than 100 persons with an event in the intervention or control group)

Variables set to be the same between groups (balanced at the first and second moments in the EB design; exact matching variables in the PSM design):
• Age (in 3-year age brackets)
• Sex
• Charlson score (based on the ICD-10-GM diagnoses from the 2 years preceding the year of enrollment in the ACO intervention; grouped in brackets of 2)
• Statutory health insurer

All variables are based on data from the two years preceding the year of enrollment in the ACO intervention. For the PSM design, a nearest-neighbor approach with a maximal difference of ±0.01 (caliper of 0.2 standard deviations) is used.

Additional data quality criteria:

• Control group subjects drawn in the matching process must have data for the enrollment year of their ACO enrollee counterpart.
• All insurees included in the study must have at least 90% of the possible insured person-days, with the corresponding data, in the 2 years preceding enrollment.
• Potential controls must have data for the whole follow-up period (no dropouts). This criterion is motivated by the very small number of dropouts (n = 14) in the treatment group.


Further, diseases with low prevalence were not considered, to avoid over-restricting the pool of potential controls. Nevertheless, including the Charlson comorbidity score among the exactly balanced variables accounts for the potentially higher morbidity related to the excluded diseases. Unlike in the design by Pimperl et al. (2017), prescription medications were not included in the counterfactual construction, to ensure that enough matches could be made. Instead, prescription information was used to evaluate how well the design achieved balanced comparison groups.

Entropy balancing (EB): Our design uses entropy balancing to create the counterfactual (Hainmueller, 2012). Entropy balancing is a preprocessing procedure that allows researchers to create balanced samples for estimating treatment effects. The method calculates control unit weights so that the reweighted control group satisfies a prespecified set of balance conditions imposed on the sample moments of the covariate distributions. The optimization searches for the set of weights that satisfies the balance constraints while remaining as close as possible to the original (base) weights, thereby minimizing the loss of information. In doing so, the method avoids the need to iterate over different matching methods to find balanced groups. Recalibrating the unit weights adjusts for systematic and random inequalities in representation and eliminates covariate imbalance more efficiently than matching methods (Parish et al., 2018). Moreover, entropy balancing has the advantage of not losing any treated subjects for lack of an appropriate counterfactual. Lastly, the method has been shown to achieve better estimation accuracy with a lower computational burden than other reweighting methods, such as inverse probability weighting and stabilized inverse probability weighting (Harvey et al., 2017).

As explained by Parish et al. (2018), the recalibration of unit weights minimizes the entropy distance between the initial (base) weight $n_{0}^{-1}$ and the solution weight $w_{i}$. In a sample with $n_{0}$ untreated subjects and $n_{1}$ treated subjects, the weights $w_{i}$ of the untreated subjects $i = 1, \dots, n_{0}$ are the solution of the optimization problem:

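$$\{w_{i}\}_{i=1}^{n_{0}} = \arg\min_{\{\omega_{i}\}} \sum_{i=1}^{n_{0}} \omega_{i} \log\left(\frac{\omega_{i}}{n_{0}^{-1}}\right)$$

subject to the constraints:

$$\sum_{i=1}^{n_{0}} \omega_{i}\, c_{ri}(X_{i}) = m_{r}, \quad r = 1, \dots, R \qquad \text{(1)}$$

$$\sum_{i=1}^{n_{0}} \omega_{i} = 1 \qquad \text{(2)}$$

$$\omega_{i} \geq 0 \quad \text{for all } i = 1, \dots, n_{0} \qquad \text{(3)}$$

where $c_{ri}(X_{i})$ is the function of the covariates $X_{i}$ that defines the $r$-th balance condition (for example, $X_{i}$ itself for a first-moment condition) and $m_{r}$ is the corresponding moment in the treatment group.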

Constraint (1) ensures that all covariates ($X_{i}$) meet a prespecified level of balance in terms of their first, second or higher moments ($m_{r}$). Constraint (2) specifies that the weights of the untreated subjects must sum to 1 or to a normalizing constant; in our design, this constant is the number of treated subjects. Finally, constraint (3) specifies that all weights must be non-negative. The weights resulting from entropy balancing can be used in the same way as survey sampling weights.
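To make the optimization concrete, the following is a minimal pure-Python sketch of entropy balancing. It solves the convex dual of the problem above with Newton's method and a backtracking line search, the route proposed by Hainmueller (2012); in this sketch the weights take the form $w_{i} \propto n_{0}^{-1} \exp(\lambda \cdot (c_{i} - m))$ (Hainmueller writes the exponent with the opposite sign, which is equivalent). It is not the software used in this study, which the text does not name; the function names and the small example data are hypothetical. The weights sum to 1, as in constraint (2); multiplying them by the number of treated subjects gives the normalization used in our design. In this simple version, redundant constraints (for example, first and second moments of a 0/1 variable, where $X^{2} = X$) make the Newton system singular and should be dropped.

```python
import math


def _solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def entropy_balance(controls, targets, base=None, tol=1e-7, max_iter=100):
    """Entropy-balancing weights for the control units (hypothetical helper).

    controls: one vector of moment functions c_r(X_i) per control unit
    targets:  the treated-group moments m_r for the same functions
    base:     base weights q_i (default: uniform 1 / n0, as in the objective)
    Returns weights w_i > 0 that sum to 1 with sum_i w_i c_r(X_i) = m_r.
    """
    n0, R = len(controls), len(targets)
    q = base if base is not None else [1.0 / n0] * n0
    # Centre the moment functions so each constraint reads sum_i w_i d_ir = 0.
    d = [[c[r] - targets[r] for r in range(R)] for c in controls]

    def exponents(lam):
        return [math.log(qi) + sum(lr * x for lr, x in zip(lam, di)) for qi, di in zip(q, d)]

    def dual(lam):
        # Convex dual objective log(sum_i q_i exp(lam . d_i)), computed stably.
        e = exponents(lam)
        top = max(e)
        return top + math.log(sum(math.exp(v - top) for v in e))

    def weights(lam):
        # Primal weights implied by lam: w_i proportional to q_i exp(lam . d_i).
        e = exponents(lam)
        top = max(e)
        raw = [math.exp(v - top) for v in e]
        total = sum(raw)
        return [v / total for v in raw]

    lam = [0.0] * R
    for _ in range(max_iter):
        w = weights(lam)
        # Dual gradient = remaining moment imbalance of the weighted controls.
        g = [sum(wi * di[r] for wi, di in zip(w, d)) for r in range(R)]
        if max(abs(v) for v in g) < tol:
            return w
        # Dual Hessian = weighted covariance matrix of the moment functions.
        H = [[sum(wi * di[r] * di[s] for wi, di in zip(w, d)) - g[r] * g[s]
              for s in range(R)] for r in range(R)]
        step = _solve(H, g)
        f0, t = dual(lam), 1.0
        slope = sum(gi * si for gi, si in zip(g, step))
        # Backtracking (Armijo) line search along the Newton direction -step.
        while True:
            cand = [lr - t * s for lr, s in zip(lam, step)]
            if dual(cand) <= f0 - 1e-4 * t * slope or t < 1e-10:
                break
            t /= 2
        lam = cand
    raise RuntimeError("no convergence: targets may lie outside the controls' convex hull")


# Hypothetical data: balance the mean and the second moment of age (in decades).
controls = [[a, a * a] for a in (3.0, 4.0, 5.0, 6.0, 7.0)]
treated_ages = (4.5, 5.5, 6.5)
targets = [sum(treated_ages) / 3, sum(a * a for a in treated_ages) / 3]
w = entropy_balance(controls, targets)
```

Here the five controls are reweighted so that their weighted mean and second moment of age reproduce those of the three treated subjects, with every weight positive, mirroring constraints (1) to (3). Working with the dual keeps the search over $R$ multipliers instead of $n_{0}$ weights, which is what makes the method cheap even for large control pools.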

We specified the balance conditions to match the first and second moments of the covariates set to be exactly balanced (age, sex, insurer and Charlson score) and the first moment of the covariates set to be as close as possible (see Table 1). To deal with the roll-in nature of the intervention, we created cohorts by year of enrollment, identified by a period variable, and then stacked all cohorts into one analytical sample. In other words, the same potential control subject appears in the potential control sample once for each assessment period in which the subject is eligible, which yields a much larger set of potential controls. This duplication is accounted for by also imposing first- and second-moment constraints on the period variable. Because control units are not matched to a single treated participant, the design cannot identify a specific follow-up start date for each control unit. For this reason, the observation period starts 182 days into the year of enrollment for both control and treatment units. We analyze group balance by comparing variables that were not included in the prespecified constraints but that can influence the outcome (see Table 1 in the Appendix). Because of the stacked roll-in sample, some controls are more likely to accumulate a large total weight across all cohorts. To ensure that no control participant is overrepresented, we stratified the control sample by total weight in intervals of one and examined the resulting distribution. Using absolute standardized residuals (Everitt & Skrondal, 2010), we examined whether the most heavily represented control units were outliers in any of the covariates used in the reweighting process.
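The roll-in stacking and the subsequent weight check can be illustrated with a small Python sketch. It assumes a hypothetical mapping from control identifiers to the enrollment years in which each control is eligible; the weights would come from an entropy-balancing step that also constrains the period variable, which is not repeated here. The helper names are illustrative only.

```python
import math
from collections import defaultdict


def stack_cohorts(eligible_years):
    """Roll-in stacking: one row per (control id, enrollment-year cohort).

    eligible_years: hypothetical mapping {control_id: [cohort years in which
    the control is eligible]}. The cohort year is the period variable that
    later receives its own first- and second-moment balance constraints.
    """
    return [(cid, year)
            for cid, years in sorted(eligible_years.items())
            for year in sorted(years)]


def total_weight_per_control(rows, weights):
    """Add up each control's weights across all the cohorts it appears in."""
    totals = defaultdict(float)
    for (cid, _year), w in zip(rows, weights):
        totals[cid] += w
    return dict(totals)


def weight_strata(totals):
    """Count controls per total-weight interval of width one: [0, 1), [1, 2), ..."""
    strata = defaultdict(int)
    for w in totals.values():
        strata[math.floor(w)] += 1
    return dict(sorted(strata.items()))


rows = stack_cohorts({"a": [2006, 2007], "b": [2007]})
totals = total_weight_per_control(rows, [0.6, 0.7, 0.2])  # hypothetical weights
strata = weight_strata(totals)
```

Control "a" appears twice in the stacked sample, and its two row weights add up to a total above one; the controls falling in the upper strata are the ones whose covariate values would then be screened for outliers.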

Propensity Score Matching (PSM): In a second stage, we compared our results with those obtained using the design created by Pimperl et al. (2017). Whereas the original study included enrollees from 2006–2009 (n = 6922) followed up for 4 years, our study includes a larger cohort of enrollees from 2006–2013 (n = 10 499) followed up for a longer period (4.5–5 years). The design combines propensity score matching with small-scale exact matching. Propensity score matching estimates each insuree's conditional probability of participating in the ACO initiative with logistic regression (Rosenbaum & Rubin, 1983), while exact matching limits the pool of possible controls for a treated patient to those with exactly the same values of crucial variables. The predictors for both the exact and the propensity score matching are listed in Table 1.

Matched pairs are constructed from the average values of the matching variables (Table 1) over the two years before enrollment, using optimal 1:1 nearest-neighbor matching. Given the limited sample, 1:1 matching was selected to balance the bias introduced by using more than the single best potential control (e.g., in 1:k matching) against the loss of information from using fewer controls (k:1 matching) (Stuart, 2010). The original design used greedy matching for ease of computation, supported by evidence of its non-inferiority to optimal matching (Gu & Rosenbaum, 1993). However, advances in statistical software now make an optimal approach at this scale straightforward (R package "Rollmatch"). Given the participants' different enrollment times, roll-in matching was used (Witman et al., 2019). Following the literature, a maximal difference of 0.01 is tolerated (caliper width equal to 0.2 standard deviations of the logit of the propensity score) (Austin, 2011). The balance between groups was evaluated by comparing arithmetic means, standard deviations, and standardized mean differences (Murray et al., 2003). These measures were also used to compare matched and unmatched participants in the treatment group in order to detect or rule out selection bias.
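The combination of exact and propensity score matching can be sketched as follows. This is greedy 1:1 nearest-neighbor matching without replacement, as in the original design, not the optimal roll-in matching performed here with "Rollmatch". The sketch assumes the propensity scores are already estimated (for example, by logistic regression), and it takes the caliper as 0.2 standard deviations of the logit of the propensity score pooled over both groups, which is an assumption about how that standard deviation is computed. Treated units left without a match within the caliper are returned, because this is exactly the group whose characteristics must be compared with the matched group.

```python
import math
import statistics


def logit(p):
    return math.log(p / (1 - p))


def greedy_caliper_match(treated, controls, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching on the logit propensity score.

    treated, controls: lists of (unit_id, exact_key, propensity_score); the
    exact_key bundles the exact-matching variables (hypothetically a tuple of
    age bracket, sex, Charlson bracket and insurer).
    Returns (pairs, unmatched_treated_ids).
    """
    scores = [logit(p) for _, _, p in treated + controls]
    caliper = caliper_sd * statistics.stdev(scores)
    pools = {}  # exact_key -> list of (control_id, logit score) still available
    for cid, key, p in controls:
        pools.setdefault(key, []).append((cid, logit(p)))
    pairs, unmatched = [], []
    for tid, key, p in treated:  # greedy: treated units are processed in input order
        pool, lt = pools.get(key, []), logit(p)
        best = min(pool, key=lambda c: abs(c[1] - lt), default=None)
        if best is not None and abs(best[1] - lt) <= caliper:
            pairs.append((tid, best[0]))
            pool.remove(best)  # matching without replacement
        else:
            unmatched.append(tid)
    return pairs, unmatched


# Hypothetical units: exact strata "A", "B", "C" and estimated propensity scores.
treated = [("t1", "A", 0.50), ("t2", "A", 0.90), ("t3", "B", 0.30)]
controls = [("c1", "A", 0.52), ("c2", "A", 0.20), ("c3", "C", 0.30)]
pairs, unmatched = greedy_caliper_match(treated, controls)
```

In this toy example, t2 has no control close enough within its exact stratum and t3 has no control in its stratum at all, so both would be dropped from a PSM analysis. If such units are systematically sicker, the estimate no longer describes the whole treated population.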

2.3. Outcome indicators

We used the population health outcome indicators recommended by the Institute for Healthcare Improvement (Stiefel & Nolan, 2012) for claims data, with the adaptations suggested by Pimperl et al. (2017):

1. Mortality ratio (observed number of deaths divided by the total number of subjects in the study population). The indicator is simple to understand and resistant to manipulation as a measure of quality and patient benefit (Stiefel & Nolan, 2012).
2. Age at the time of death. In evaluations where follow-up does not continue until all participants have died, this measure can be used as a proxy for life expectancy.
3. Years of potential life lost and gained (YPLLG) is an adapted version of the Years of Potential Life Lost indicator (YPLL) (Stiefel & Nolan, 2012). YPLL measures the potential life lost because of premature death relative to the subject's life expectancy: $\mathrm{YPLL} = \sum_{i=1}^{n} \max\left(0, \mathrm{LE}_{i} - A_{i}\right)$, where $i$ indexes the subjects who died, $\mathrm{LE}_{i}$ is their life expectancy and $A_{i}$ their age at the time of death, so that each term has a lower limit of 0. The YPLLG removes this lower limit, $\mathrm{YPLLG} = \sum_{i=1}^{n} \left(\mathrm{LE}_{i} - A_{i}\right)$, so that 'gained' years are also accounted for. Life expectancy is calculated for each individual using the cohort life tables of the German Federal Statistical Office (Generationensterbetafeln für Deutschland: Modellrechnungen für die Geburtsjahrgänge 1896–2009, 2011). The measure is presented as an average per deceased patient, so that the indicator is not affected by the different mortality rates in the comparison groups.
4. Survival time and hazard. Survival time is measured as the number of days between the start of observation and the end of the study period or death. It is summarized with the mean survival time and displayed in a Kaplan-Meier curve (Kaplan & Meier, 1958). Median survival time is not reported because neither group reaches 50% mortality at any point during follow-up. The survival hazard, i.e., a subject's instantaneous risk of death at a given time conditional on having survived until then, is modeled with the Cox proportional hazards model (Spruance et al., 2004).
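The mortality ratio and the YPLL and YPLLG formulas above translate directly into code. In this sketch each death is a hypothetical (life expectancy, age at death) pair; in the study, life expectancy would come from the cohort life tables mentioned above.

```python
def mortality_ratio(n_deaths, n_subjects):
    """Observed deaths divided by all subjects in the study population."""
    return n_deaths / n_subjects


def ypll(deaths):
    """Years of potential life lost: each term LE_i - A_i is floored at 0."""
    return sum(max(0.0, le - age) for le, age in deaths)


def ypllg(deaths):
    """Years of potential life lost and gained: no floor, so deaths after the
    expected age contribute negative values (years gained)."""
    return sum(le - age for le, age in deaths)


def ypllg_per_deceased(deaths):
    """YPLLG averaged over deceased subjects, the form reported in the tables."""
    return ypllg(deaths) / len(deaths)


# Hypothetical deaths: (life expectancy, age at death).
deaths = [(60.0, 80.0), (75.0, 70.0)]
```

For these two deaths, YPLL counts only the 5 years lost by the second subject, whereas YPLLG also subtracts the 20 years gained by the first, giving −15 in total and −7.5 per deceased subject. With the Table 4 counts, `mortality_ratio(486, 9083)` gives about 0.0535, the 5.35% reported for the treatment group.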

2.4. Additional data quality criteria

Only patients alive in the enrollment year of the ACO participants were considered potential controls, to avoid immortal time bias (Levesque et al., 2010). To be regarded as potential controls, non-treated subjects needed sufficient data in the baseline years. In addition, because almost all people in the treatment group had data for the entire assessment period (only 14 dropouts), potential controls were restricted to non-treated subjects with data available in the follow-up period. Finally, we excluded the first half year after the start of the observation period to account for the indirect immortal time bias generated by GK partner physicians restricting the enrollment of terminally ill patients in the ACO. Excluding the first six months also helps the entropy balancing design by ensuring that all participants have started the intervention by the start of the observation period.
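The eligibility rules spread across this section and the study design section can be collected into a single predicate. The record layout below is hypothetical (the claims data schema is not described here), and the partner-visit threshold is a parameter so that the thresholds of the sensitivity analysis can be plugged in.

```python
from dataclasses import dataclass


@dataclass
class Insuree:
    # Hypothetical record layout; the field names are illustrative only.
    enrolled_in_aco: bool
    alive_in_enrollment_year: bool   # alive in the enrollment year of the cohort
    insured_day_share: float         # share of possible insured days, 2 baseline years
    has_follow_up_data: bool         # no dropout during follow-up
    partner_visit_share: float       # share of physician visits with partner physicians


def is_potential_control(p: Insuree, partner_threshold: float = 0.5) -> bool:
    """Apply the control-eligibility criteria described in the text."""
    return (not p.enrolled_in_aco
            and p.alive_in_enrollment_year
            and p.insured_day_share >= 0.9
            and p.has_follow_up_data
            and p.partner_visit_share < partner_threshold)


candidate = Insuree(False, True, 0.95, True, 0.6)
```

With the default 50% threshold this candidate is excluded because most of its visits are with partner physicians, but it would qualify under the 80% threshold used in the sensitivity analysis.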

3. Results

3.1. Comparison groups

Of the initial 10 499 ACO enrollees, 1416 (13.5%) had insufficient data in the baseline years to be included in the assessment. In the PSM-based design, a further 1077 (10.2%) ACO enrollees had to be excluded from the analysis because no adequate matched pair could be found. The quality criteria listed in Table 1 reduced the number of unique potential controls to N = 23 963. The analytical sample for the PSM-based design consisted of 8006 insurees in each of the treatment and non-ACO control groups. The EB-based design consisted of 9083 insurees in the treatment group and 123 702 observations in the control group, with weights summing to 9083. Both designs achieved balanced groups for all variables used in the matching process (by construction in the entropy balancing case) and for other control variables not included in the matching process. The variables presented in Table 2 summarize the assessment of group balance. A lower standardized mean difference indicates better balance; 0.2 (Faraone, 2008) or 0.1 (Parish et al., 2018) is the maximum value considered appropriate for assuming a covariate is balanced. Table 1 in the Appendix provides the complete set of variables used to assess group balance.
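The standardized mean difference can be computed as in the sketch below. The pooled denominator $\sqrt{(s_t^2 + s_c^2)/2}$ is an assumption, as the exact variant is not stated in the text, but it roughly reproduces the published values: for physician visits in the initial sample, 13.7 (9.6) vs 11.3 (10.6) gives about 0.24, against 0.23 in Table 2, a gap within the rounding of the published means and SDs. For the reweighted control group, means and variances are taken with the entropy-balancing weights.

```python
import math


def weighted_mean_var(x, w):
    """Weighted mean and weighted (population-style) variance."""
    total = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / total
    return mean, var


def smd(mean_t, sd_t, mean_c, sd_c):
    """Standardized mean difference with the pooled SD sqrt((sd_t^2 + sd_c^2) / 2)."""
    return (mean_t - mean_c) / math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)


def weighted_smd(x_treated, x_control, w_control):
    """SMD between treated units (unit weights) and reweighted controls."""
    m_t, v_t = weighted_mean_var(x_treated, [1.0] * len(x_treated))
    m_c, v_c = weighted_mean_var(x_control, w_control)
    return smd(m_t, math.sqrt(v_t), m_c, math.sqrt(v_c))
```

Giving a control unit zero weight removes it from the comparison entirely, which is how reweighting can drive the difference to zero without discarding any treated subject.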

Table 2.

Comparison of metric variables to assess group balance before (initial) and after constructing the counterfactual. Calculated for the baseline years.

Indicator	Initial: Treatment: 9083/Control: 123 699			PSM: Treatment/Control = 8003			Entropy balancing: Treatment/Control = 9083
Indicator	Mean (SD) Treat	Mean (SD) Control	stdmdiff	Mean (SD) Treat	Mean (SD) Control	stdmdiff	Mean (SD) Treat	Mean (SD) Control	stdmdiff
Age	46.7 (24.8)	46.6 (23)	0.05	46.6 (24.4)	47 (23.7)	0.02	46.7 (24.8)	46.7 (24.8)	0
Sex (1 = Female)	0.6 (0.5)	0.5 (0.5)	0.05	0.6 (0.5)	0.6 (0.5)	0	0.6 (0.5)	0.6 (0.5)	0
Charlson Score	1.8 (2.3)	1.6 (2.2)	0.1	1.7 (2.1)	1.7 (2.1)	0.02	1.8 (2.3)	1.8 (2.2)	0.03
Insured days	729.3 (11.2)	727.8 (18.7)	0.08	729.6 (9.1)	728.4 (17.2)	0.08	729.3 (11.2)	728 (18.1)	0.09
N° physician visits	13.7 (9.6)	11.3 (10.6)	0.23	12.9 (9.3)	12.4 (9.4)	0.05	13.7 (9.6)	13.1 (10.2)	0.06
N° specialist visits	6.2 (5.8)	4.8 (5.6)	0.25	5.7 (5.6)	5.6 (5.6)	0.03	6.2 (5.8)	6 (6.1)	0.03
N° Hospital visits	0.4 (1)	0.4 (1.1)	0.04	0.4 (0.9)	0.4 (1.2)	0.03	0.4 (1)	0.4 (1.1)	0
Hosp LOS	4.0 (13.6)	3.6 (13.9)	0.03	3.5 (12.5)	3.9 (15.1)	0.03	4.0 (13.6)	4 (15.4)	0
Rehab LOS	0.9 (5.5)	0.7 (4.9)	0.05	0.8 (5.1)	0.8 (4.8)	0.01	0.9 (5.5)	0.8 (5.4)	0.02
N° prescriptions	7.4 (6.7)	5.7 (6.7)	0.25	7 (6.4)	6.6 (6.7)	0.06	7.4 (6.7)	6.9 (7)	0.07
Temp. Incapacity for work	15.9 (44.9)	12.5 (39.9)	0.08	14.8 (43.1)	15.1 (43.2)	0.01	15.9 (44.9)	14.7 (46.5)	0.03
Perm. Incapacity for work	6.2 (64.9)	5.8 (62.6)	0.01	7 (68.6)	6 (64)	0.02	6.2 (64.9)	6.1 (64)	0
Long term care level	0.1 (0.5)	0.1 (0.5)	0.03	0.1 (0.4)	0.1 (0.4)	0.03	0.1 (0.5)	0.1 (0.5)	0.01
N° Rehab cases	0 (0.3)	0 (0.2)	0.06	0 (0.2)	0 (0.2)	0.01	0 (0.3)	0 (0.2)	0.03
Age at the start of LTC	68.6 (18.6)	72.2 (17.6)	0.2	69.1 (18.8)	67.7 (19.6)	0.07	68.6 (18.6)	70.9 (20.6)	0.1
10-year survival rate	0.8 (0.3)	0.8 (0.3)	0.08	0.8 (0.3)	0.8 (0.3)	0.03	0.8 (0.3)	0.8 (0.3)	0.02
Risk adjusted Contribution	1570 (3242)	2089 (4046)	0.13	1447 (2778)	1475 (2893)	0.01	1570 (3242)	1916 (3688)	0.1
Total cost	3647 (8728)	3223 (9206)	0.05	3230 (7510)	3366 (8005)	0.02	3647 (8728)	3533 (8656)	0.01


LOS: Length of stay; LTC: Long-term care; stdmdiff: Standardized mean difference.

3.2. Selection bias

The comparison between matched and unmatched participants in the treatment group of the PSM-based design shows that the unmatched participants had significantly more physician, specialist, and hospital visits, together with a higher in-patient length of stay, rehabilitation length of stay, Charlson comorbidity score and number of prescriptions. Table 3 shows the arithmetic means, standard deviations, standardized mean differences and t-test significance of the differences between groups.
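As an illustrative sketch of the group comparisons in Table 3, a Welch-type two-sample statistic can be computed from summary statistics alone. With groups of several thousand (8006 matched and 1077 unmatched treated participants), the standard normal approximation used here for the two-sided p-value is adequate, but it is an approximation, and the exact t-test variant used in the study is not stated.

```python
import math


def welch_test_normal(m1, sd1, n1, m2, sd2, n2):
    """Welch statistic from summary data with a two-sided normal-approximation p-value."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    stat = (m1 - m2) / se
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|stat|))
    return stat, p_value


# Charlson score, matched vs unmatched treated participants (Table 3).
stat, p = welch_test_normal(1.2, 1.6, 8006, 2.0, 2.4, 1077)
```

For the Charlson score, the statistic is about −10.6, consistent with the p-value of 0.000 reported in Table 3.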

Table 3.

Comparison of matched vs unmatched treatment participants in PSM-based model.

	Matched Mean (SD)	Unmatched Mean (SD)	stdmdiff	t-test (p-value)
Age	46 (24.4)	47.5 (27.8)	−0.06	0.070
Sex (1 = Female)	0.5 (0.5)	0.6 (0.5)	−0.01	0.620
Charlson Score	1.2 (1.6)	2 (2.4)	−0.47	0.000
N° physician visits	13.3 (8.7)	17.5 (10.1)	−0.47	0.000
N° specialist visits	6 (5.5)	8.8 (6.5)	−0.5	0.000
N° Hosp visits	1.6 (1.3)	2 (1.9)	−0.29	0.000
Hosp LOS	1.6 (1.3)	2 (1.9)	−0.21	0.000
Rehab LOS	23.7 (12.8)	21.2 (11.6)	0.2	0.000
N° pres.	7.4 (6.1)	10.7 (7.3)	−0.52	0.000
Days of temp. Incapacity	33.8 (56.6)	46 (74.5)	−0.21	0.000
Days of perm. Incapacity	603 (224.8)	545.6 (304.6)	0.24	0.000


stdmdiff: Standardized mean difference.

3.3. Impact of the ACO in population health outcomes

Our EB-based design shows lower mortality in the treatment group in all follow-up years (significant at the 5% level). On the other hand, the age at the time of death is slightly higher in the control group (80.1 vs 80.3 years). However, the 95% confidence intervals overlap substantially, indicating that the difference is not statistically significant (Cumming, 2009). The accumulated years of life lost or gained are higher in the control group (13 578 years gained) than in the treatment group (9880 years gained), which is plausible given the higher number of deaths in the control group. The YPLLG per deceased participant favors the treatment group, with an average gain of 0.2 years more than the control group (20.3 vs 20.1). As with the previous indicator, the 95% confidence intervals overlap heavily, indicating that the difference is not statistically significant. The Cox proportional hazards regression model yields a significant hazard ratio of 0.72 (95% CI: 0.66–0.79, p < 0.05), indicating a protective effect of being in the treatment group. These results are summarized in Table 4. Consistent with this protective effect, the Kaplan-Meier curve in Fig. 1 shows a higher survival probability over time for the treatment group, with a mean survival time of 1784 (95% CI: 1780–1788) days in the treatment group versus 1768 (95% CI: 1767–1769) days in the control group. Varying the threshold used to define potential controls (fewer than 50% of physician visits with participating physicians) showed that the results remain stable, with statistically insignificant differences in the survival analysis. Results of the sensitivity analyses using thresholds of 10%, 30% and 80% of visits to participating physicians are provided in the online supplementary material.
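The Kaplan-Meier curve and the mean survival time can be sketched as below. The text does not state how the mean survival time was estimated; this sketch assumes it is the restricted mean, i.e., the area under the Kaplan-Meier curve up to the end of follow-up (1825 days). Optional case weights, such as entropy-balancing weights for the control group, enter both the numbers at risk and the deaths. The code does not compute confidence intervals, and the example data are hypothetical.

```python
def kaplan_meier(times, events, weights=None):
    """Kaplan-Meier estimate as a list of (time, survival) steps at death times.

    times:   follow-up time per subject (days)
    events:  1 if the subject died at that time, 0 if censored
    weights: optional case weights (default 1 per subject)
    """
    w = weights if weights is not None else [1.0] * len(times)
    data = sorted(zip(times, events, w))
    at_risk = float(sum(w))
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0.0, 0.0
        # Subjects censored at t are still counted as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1] * data[i][2]
            leaving += data[i][2]
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def restricted_mean_survival(curve, horizon):
    """Area under the Kaplan-Meier step function between 0 and `horizon`."""
    area, last_t, last_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= horizon:
            break
        area += last_s * (t - last_t)
        last_t, last_s = t, s
    return area + last_s * (horizon - last_t)


curve = kaplan_meier([100, 200, 300, 1825], [1, 0, 1, 0])
rmst = restricted_mean_survival(curve, 1825)
```

Because follow-up is capped at 1825 days, a mean computed this way cannot exceed the horizon, which fits values such as 1784 and 1768 days.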

Table 4.

Mortality Ratios, Age at Time of Death, Years of Potential Life Lost and Gained and Cox Hazard model: ACO Intervention Group Versus Non-ACO Control Group. Entropy balancing based design.

Time period after enrollment	ACO intervention group (n = 9083): deceased insurees			Non-ACO control group (n = 123 702; weighted n = 9083): deceased insurees			Unmatched: 0
	n	%	YPLLG (LE − age at the time of death)	n	%	YPLLG (LE − age at the time of death)	Pearson chi-square test for mortality / t-test; * p < 0.05
1/2 year	27	0.30%	−408	66	0.72%	−1345	*
Year 1*	44	0.48%	−891	73	0.80%	−1446	*
Year 2	85	0.94%	−1847	147	1.62%	−2958	*
Year 3	121	1.33%	−2369	148	1.63%	−2857	*
Year 4	107	1.18%	−2027	152	1.68%	−3103	*
Year 5**	129	1.42%	−2745	157	1.72%	−3208	*
Total (without 1/2 year)	486	5.35%	−9880	677	7.45%	−13578	*
Average YPLLG (SD)	−20.3	(14.957)	95%CI(-19;-21.7)	−20.1	(2.255)	95%CI(-19.9;-20.2)
Average age at time of death(SD)	80.1	(11.1)	95%CI(79.1; 81)	80.3	(6.4)	95%CI(79.8; 80.8)

Cox Hazard Ratio
	coef	exp (coef)	se (coef)	robust se	z	Pr (>|z|)
treatT	−0.331	0.718	0.0595	0.0478	−6.87	4.47e-12 ***
Concordance	0.541 (se = 0.006)
Likelihood ratio test	31.4 on 1 df, p = 2e-08
Wald test	47.91 on 1 df, p = 4e-12
Score (logrank) test	31.25 on 1 df, p = 2e-08, Robust = 60.92 p = 6e-15


*Without first ½ year; **Participants joining in the last period followed for 4.5 years.
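As a worked check on Table 4, the hazard ratio is the exponentiated Cox coefficient, and the two mortality ratios can be compared directly (using the weighted control group size of 9083):

$$\widehat{\mathrm{HR}} = \exp(\hat{\beta}) = \exp(-0.331) \approx 0.718$$

$$\frac{486 / 9083}{677 / 9083} = \frac{486}{677} \approx 0.718$$

Both quantities correspond to a reduction of roughly 28% relative to the control group, the figure given in the Highlights. They measure different things, however (the instantaneous hazard versus the cumulative share of deaths over follow-up), and are only nearly equal here.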

Fig. 1. Survival probability, intervention versus weighted control group; EB-based model (log scale, survival time in days, deceased within the first 182 days censored, maximum 1825 days).

The PSM-based model overestimates the effect of ACO enrollment on every indicator (Table 5). Notably, the age at the time of death differs significantly in favor of the treatment group (80.1 [95% CI: 79.0–81.2] vs 77.3 [95% CI: 76.2–78.3]), as do the life years gained per deceased participant (20.3 [95% CI: 18.8–21.8] vs 16.2 [95% CI: 14.8–17.7]) for treatment and control, respectively. The Kaplan-Meier curve for the PSM-based design is shown in Figure A.1 of the Appendix. The mean survival time was 1788 (95% CI: 1784–1792) days in the treatment group versus 1775 (95% CI: 1770–1780) days in the control group. Figure A.2 in the Appendix compares the Kaplan-Meier curves of the EB and PSM designs.

Table 5.

Mortality Ratios, Age at Time of Death, Years of Potential Life Lost and Gained and Cox Hazard model: ACO Intervention Group Versus Non-ACO Control Group. PSM.

Time period after enrollment	ACO intervention group (n = 8006): deceased insurees			Non-ACO control group (n = 8006): deceased insurees			Unmatched: 1077
	n	%	YPLLG (LE − age at the time of death)	n	%	YPLLG (LE − age at the time of death)	Pearson chi-square test for mortality / t-test; * p < 0.05
1/2 year	20	0.25%	−325	49	0.61%	−806	*
Year 1*	28	0.35%	−514	45	0.56%	−624	*
Year 2	71	0.89%	−1511	112	1.40%	−2029	*
Year 3	99	1.24%	−1936	116	1.45%	−1561
Year 4	89	1.11%	−1735	103	1.29%	−1781
Year 5**	103	1.29%	−2213	135	1.69%	−2302	*
Total (without 1/2 year)	390	4.87%	−7910	511	6.38%	−8296	*
Average YPLLG (SD)	−20.3	(14.823)	95%CI(-21.8;-18.8)	−16.2	(16.569)	95%CI(-17.7;-14.8)
Average age at time of death (SD)	80.1	(11.1)	95%CI(79; 81.2)	77.3	(12.4)	95%CI(76.2; 78.3)

Cox Hazard Ratio
	coef	exp (coef)	se (coef)	robust se	z	Pr (>|z|)
treatT	−0.296	0.744	0.0693	0.0693	−4.268	1.97e-05 ***
Concordance	0.537 (se = 0.008)
Likelihood ratio test	18.43 on 1 df, p = 2e-05
Wald test	18.22 on 1 df, p = 2e-05
Score (logrank) test	18.35 on 1 df, p = 2e-05, Robust = 18.38 p = 2e-05


*Without first ½ year.

**Participants joining in the last period followed for 4.5 years.
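The asterisks in Tables 4 and 5 flag a significant Pearson chi-square test for mortality. As a check, the sketch below applies an uncorrected 2×2 Pearson test to the PSM totals in Table 5: 390 of 8006 treated versus 511 of 8006 controls deceased. The source does not say whether a continuity correction was used; omitting it here is an assumption. The helper name `pearson_chi2_2x2` is hypothetical. Applying the test directly to the EB counts in Table 4 would need more care, because weighted control counts are not raw frequencies.

```python
import math


def pearson_chi2_2x2(deaths_a: int, n_a: int, deaths_b: int, n_b: int):
    """Pearson chi-square (no continuity correction, an assumption) for deaths vs survivors.

    Hypothetical helper; returns (statistic, p-value) with 1 degree of freedom.
    """
    observed = [[deaths_a, n_a - deaths_a], [deaths_b, n_b - deaths_b]]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [deaths_a + deaths_b, total - deaths_a - deaths_b]
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / total
            chi2 += (observed[r][c] - expected) ** 2 / expected
    # with 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Table 5 totals (without the first half year), PSM design
    print(pearson_chi2_2x2(390, 8006, 511, 8006))
```

With the Table 5 totals the statistic is roughly 17.2. That is well above 3.84, the 5% critical value for one degree of freedom, which is consistent with the asterisk in the table.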

4. Discussion

Our results indicate a positive effect of ACO enrollment on population health. The evaluation shows lower mortality and a protective effect of the integrated care initiative on the risk of death in the treatment group (significant at the 5% level). Moreover, the impact estimate is more accurate than the one obtained with the previously published design.

At the same time, our analysis shows that the PSM-based design overestimates the intervention effect. PSM is a robust method for estimating the treatment effect among treated participants who have a match. However, the validity of the estimate is compromised if the portion of the treatment group excluded from the analysis is related to the outcomes (Lim et al., 2014). In this study, the excluded treated participants differed significantly in indicators of higher healthcare needs, reflected in greater healthcare use and higher comorbidity. These variables are directly associated with a higher risk of death and with mortality outcomes (Fraccaro et al., 2016). In other words, the PSM-based analysis is valid only for the matched sample and cannot be used to determine the overall effect of the intervention.

The average age at the time of death is a short-term approximation of the life expectancy of each group (Samaras, 2017). Because it considers only deaths during follow-up, it is plausibly lower than the national life expectancy at birth (81.10 years in 2018). The difference between treatment and control approximates the gain in life expectancy due to the intervention. Meanwhile, the YPLLG indicator shows that both treatment and control participants gained life years relative to the life expectancy at birth of their cohort. That is, participants who died during follow-up lived longer than the average life length expected when they were born. Because age and sex are balanced between treatment and control, the difference in life years gained is an adjusted measure of the excess life lived beyond the life expectancy of each group. Both measures favor the treatment group significantly only in the PSM-based design; in the EB-based design, the differences between groups are not statistically significant. The Cox hazard ratio compares the hazard (instantaneous risk) of death in the treatment group with that in the control group over follow-up; in other words, it shows how protective the intervention is. In both designs, the intervention appears protective, more so in the EB-based design. This result seems contradictory, as the two previous indicators are more favorable for the treated in the PSM-based design. However, the two point estimates are very close and are not statistically different from each other. Indicators used to balance the sample at baseline (i.e. health status and utilization) were not measured at follow-up because they are competing outcomes with mortality.
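The claim that the two hazard ratios are not statistically different can be illustrated numerically. Each Cox coefficient and its robust standard error (Tables 4 and 5) can be converted into a hazard ratio, HR = exp(coef), with a Wald 95% confidence interval, exp(coef ± 1.96·se). This is a minimal sketch. Overlapping intervals are only a rough, conservative indication (Cumming, 2009). Both designs share the same treated participants, so the estimates are not independent, and a formal test would need a different approach. The helper names are hypothetical.

```python
import math


def hazard_ratio_ci(coef: float, se: float, z_crit: float = 1.959964):
    """Hazard ratio and Wald 95% CI (hypothetical helper) from a Cox coefficient and its se."""
    return math.exp(coef), math.exp(coef - z_crit * se), math.exp(coef + z_crit * se)


def intervals_overlap(a, b) -> bool:
    """True if two (hr, lower, upper) intervals share any values."""
    return a[1] <= b[2] and b[1] <= a[2]


eb = hazard_ratio_ci(-0.331, 0.0478)   # EB design, robust se (Table 4)
psm = hazard_ratio_ci(-0.296, 0.0693)  # PSM design, robust se (Table 5)
print("EB  HR %.3f (%.3f-%.3f)" % eb)
print("PSM HR %.3f (%.3f-%.3f)" % psm)
print("intervals overlap:", intervals_overlap(eb, psm))
```

With these inputs the EB interval runs from about 0.65 to 0.79 and the PSM interval from about 0.65 to 0.85, so the two overlap substantially.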

Gesundes Kinzigtal is considered a "comprehensive care reform", meaning that the initiative is accountable for the full spectrum of health and health services of its population (McClellan et al., 2013). Consequently, the initiative introduces interventions and health intelligence aimed at two goals: improving network efficiency and addressing specific health issues identified across the whole spectrum of patients. The first group of interventions includes, among others, aligning provider incentives, professional care coordination and partnerships with other stakeholders that influence population health. The second group includes, for example, special care management programs and healthy-behavior campaigns. Given the selection bias in the PSM-based design and its more favorable results compared with our EB-based design, we have evidence supporting the hypothesis that the effect of the ACO on health is lower for the population with higher healthcare needs. This may be explained by the range of interventions being directed at chronically ill patients in general, but not necessarily at those with the highest need and vulnerability. Nevertheless, more evidence is necessary to make a definitive statement.

Entropy balancing is considered more efficient than propensity score methods for creating balanced comparison groups (Parish et al., 2018; Matschinger et al., 2020). However, both designs assessed in this article achieved balanced comparison groups, confirming the effectiveness of the design presented by Pimperl et al. (2017), which in turn builds on Stuart (2010). The advantage of entropy balancing in our evaluation lies instead in not eliminating any part of the analytical sample. This feature ensures that our results represent the overall effect of the intervention. It also eliminates the possibility of biasing the results by systematically excluding treated participants because of variables related to the outcome. This advantage becomes especially relevant for comprehensive care reforms such as the one evaluated in this article. By including the whole spectrum of patients in the evaluation, we can account for all the health gains (or non-gains) generated by the ACO. Inverse probability of treatment weighting and stabilized weighting are other methods for constructing the counterfactual in quasi-experiments, with characteristics similar to entropy balancing. The main difference is that these methods often still require iteration and post-calibration to achieve covariate balance, whereas entropy balancing imposes the balance conditions directly. Moreover, EB has been shown to be more effective in mitigating observed bias and confounding (Harvey et al., 2017).
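The reweighting idea behind entropy balancing (Hainmueller, 2012) can be written as a small convex optimization. Control weights are chosen to stay as close as possible, in Kullback-Leibler divergence, to uniform base weights, while making the weighted control means equal the treated means exactly. The sketch below solves the dual problem with damped Newton steps. It assumes only first-moment (mean) constraints and uses plain Python. The function names are hypothetical, and this is not the implementation used in this study; dedicated tools such as the R package `ebal` exist for real analyses.

```python
import math
from typing import List, Optional, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _weights(q, d, lam):
    """Tilted weights w_i proportional to q_i * exp(lam . d_i), plus the dual objective."""
    z = [math.log(qi) + sum(l * v for l, v in zip(lam, di)) for qi, di in zip(q, d)]
    zmax = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - zmax) for v in z]
    total = sum(e)
    return [v / total for v in e], zmax + math.log(total)


def entropy_balance(controls: Sequence[Sequence[float]], targets: Sequence[float],
                    base: Optional[Sequence[float]] = None,
                    tol: float = 1e-10, max_iter: int = 200) -> List[float]:
    """Hypothetical sketch: control weights (summing to 1) whose weighted covariate
    means equal `targets`, closest in KL divergence to the base weights."""
    n, k = len(controls), len(targets)
    q = list(base) if base is not None else [1.0 / n] * n
    d = [[x[j] - targets[j] for j in range(k)] for x in controls]  # centred moments
    lam = [0.0] * k
    w, obj = _weights(q, d, lam)
    for _ in range(max_iter):
        g = [sum(w[i] * d[i][j] for i in range(n)) for j in range(k)]  # remaining imbalance
        if max(abs(v) for v in g) < tol:
            return w
        # Hessian of the convex dual = weighted covariance of the centred moments
        h = [[sum(w[i] * d[i][a] * d[i][b] for i in range(n)) - g[a] * g[b]
              for b in range(k)] for a in range(k)]
        step = _solve(h, g)
        t = 1.0
        while True:  # backtracking line search; small slack absorbs rounding noise
            cand = [l - t * s for l, s in zip(lam, step)]
            w_new, obj_new = _weights(q, d, cand)
            if obj_new <= obj + 1e-12 or t < 1e-8:
                break
            t /= 2.0
        lam, w, obj = cand, w_new, obj_new
    raise RuntimeError("entropy balancing did not converge; targets may be infeasible")
```

Because the balance conditions hold by construction, no treated participant is discarded and no post-hoc balance iteration is needed. The price, discussed under the limitations, is that some control units can receive large weights when the treated means lie far from the bulk of the control data.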

4.1. Limitations

Because it is based on claims data, our design is highly replicable and places a low burden on data collection. However, claims data are restricted in the scope of indicators that can be obtained. More complex indicators of population health, such as years lived in good health, could improve the impact evaluation. Moreover, entropy balancing may assign large weights to a small set of control observations if, for example, the treatment sample differs substantially from the control sample. Furthermore, outliers in the control sample may be overrepresented. When entropy balancing assigns large weights to a few control observations, slight alterations to the control sample could produce substantially different estimates (McMullin & Schonberger, 2020). Our assessment paid particular attention to this limitation and showed no signs of overrepresented control units. The most heavily weighted control units were 64 patients with weights between three and four. None of these control units was an outlier in any of the covariates used to create the counterfactual; all had absolute standardized residuals below three (Everitt & Skrondal, 2010).
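The weight checks described above can be sketched in four steps. First compute the Kish effective sample size. Then rescale the weights to a mean of one, list the heavily weighted controls, and flag any of them whose standardized deviation on a covariate reaches three. Several choices here are assumptions: the rescaling convention, the weight cutoff of three, and the use of simple z-scores as "standardized residuals". They are not necessarily those used by the authors, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def kish_ess(weights: Sequence[float]) -> float:
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


def weight_diagnostics(weights: Sequence[float], covariates: Dict[str, Sequence[float]],
                       weight_cutoff: float = 3.0, z_cutoff: float = 3.0) -> dict:
    """Hypothetical helper: flag heavily weighted controls that are also covariate outliers."""
    n = len(weights)
    mean_w = sum(weights) / n
    rel = [w / mean_w for w in weights]  # assumption: weights rescaled to mean 1
    heavy = [i for i, r in enumerate(rel) if r >= weight_cutoff]
    outliers: List[int] = []
    for values in covariates.values():
        mu = sum(values) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in values) / (n - 1))
        if sd == 0:
            continue  # a constant covariate cannot produce outliers
        # assumption: |z| >= z_cutoff stands in for "absolute standardized residual"
        outliers += [i for i in heavy if abs(values[i] - mu) / sd >= z_cutoff]
    return {
        "ess": kish_ess(weights),
        "max_relative_weight": max(rel),
        "heavy_units": heavy,
        "heavy_outliers": sorted(set(outliers)),
    }
```

A low effective sample size relative to the number of controls, or a non-empty `heavy_outliers` list, would signal the overrepresentation problem described by McMullin and Schonberger (2020).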

Although entropy balancing ensures covariate balance for all determinants included in the balancing algorithm, creating a valid counterfactual requires correctly specifying the set of underlying determinants. In the current study, we selected determinants based on a thorough assessment conducted in a previous evaluation of the same case. Future researchers applying entropy balancing should select determinants using a theoretical framework appropriate to their setting.

Finally, it is important to recognize the potential bias arising from the non-treated group being affected by large-scale interventions aimed at the general public, such as campaigns promoting healthy behaviors. In addition, some interventions are not restricted to ACO enrollees: even though they are promoted mainly among ACO members, a non-treated patient could also have taken part. However, we consider that this potential bias could only attenuate the effects found in our evaluation; hence the conclusions remain robust.

5. Conclusion

We present an evaluation design to effectively measure the effect of integrated healthcare on population health outcomes. Our EB-based design addresses the shortcomings of previous designs by including the entire treatment group in the assessment while achieving better-balanced samples at baseline. Our calculations show that people enrolled in the ACO have a more favorable mortality ratio, a lower Cox hazard ratio and a longer mean survival time than the control group.

Funding

This research project was funded by the Marie Sklodowska-Curie Innovative Training Network (HealthPros: Healthcare Performance Intelligence Professionals; https://www.healthpros-h2020.eu/) of the Horizon 2020 Marie Skłodowska-Curie Actions of the European Commission, under grant agreement no. 765141. The funder provided support through OptiMedis AG in the form of salaries for NL but did not have any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of the authors are further articulated in the ‘author contributions’ section. The views expressed in this manuscript are those of the authors and not necessarily those of the European Commission.

Author contributions

Nicolas Larrain: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing - original draft & review and editing. Oliver Groene: Conceptualization, Methodology, Writing - review and editing, Supervision.

Availability of data and material

Data are not available for public use. The analysis was conducted during 2019/2020 under a partnership contract between the data owners, the integrated health care network “Gesundes Kinzigtal”, and OptiMedis AG.

Ethics approval

Given that all data used were anonymized and results are presented in aggregate, ethics approval was not needed in accordance with the policies of the institutions.

Code availability

Upon request.

Conflicts of interest

NL was employed by and received a salary from OptiMedis AG (a private company) in the context of the HealthPros Innovative Training Network (described in the ‘funding’ section) from September 2018 until February 2022. OG is currently employed by and on the board of directors of OptiMedis AG.

Footnotes

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.ssmph.2023.101371.

Appendix

Table 1.

Comparison of metric variables to assess group balance after constructing the counterfactual. Calculation for baseline years. Full list of variables.

	Initial: Treatment: 9083/Control: 123 699					PSM: Treatment/Control = 8003					Entropy balancing: Treatment/Control = 9083
Indicator	Mean Treat	Mean Control	SD Treat	SD Control	stdmdiff	Mean Treat	Mean Control	SD Treat	SD Control	stdmdiff	Mean Treat	Mean Control	SD Treat	SD Control	stdmdiff
Age	46.7	46.6	24.8	23	0	46.6	47	24.4	23.7	0	46.7	46.7	24.8	24.8	0
Sex (1 = Female)	0.6	0.5	0.5	0.5	0.1	0.6	0.6	0.5	0.5	0	0.6	0.6	0.5	0.5	0
Charlson Score	1.8	1.6	2.3	2.2	0.1	1.7	1.7	2.1	2.1	0	1.8	1.8	2.3	2.2	0
Insured days	729.3	727.8	11.2	18.7	0.1	729.6	728.4	9.1	17.2	0.1	729.3	728	11.2	18.1	0.1
N° physician visits	13.7	11.3	9.6	10.6	0.2	12.9	12.4	9.3	9.4	0.1	13.7	13.1	9.6	10.2	0.1
N° specialist visits	6.2	4.8	5.8	5.6	0.2	5.7	5.6	5.6	5.6	0	6.2	6	5.8	6.1	0
N° Hospital visits	0.4	0.4	1	1.1	0	0.4	0.4	0.9	1.2	0	0.4	0.4	1	1.1	0
Hosp LOS	4.0	3.6	13.6	13.9	0	3.5	3.9	12.5	15.1	0	4.0	4	13.6	15.4	0
Rehab LOS	0.9	0.7	5.5	4.9	0	0.8	0.8	5.1	4.8	0	0.9	0.8	5.5	5.4	0
N° prescriptions	7.4	5.7	6.7	6.7	0.3	7	6.6	6.4	6.7	0.1	7.4	6.9	6.7	7	0.1
Temp. Incapacity for work	15.9	12.5	44.9	39.9	0.1	15.1	14.8	43.2	43.1	0	15.9	14.7	44.9	46.5	0
Perm. Incapacity for work	6.2	5.8	64.9	62.6	0	6	7	64	68.6	0	6.2	6.1	64.9	64	0
Long term care level	0.1	0.1	0.5	0.5	0	0.1	0.1	0.4	0.4	0	0.1	0.1	0.5	0.5	0
N° Rehab cases	0	0	0.2	0.2	0.1	0	0	0.2	0.2	0	0	0	0.2	0.2	0
Age at the start of LTC	68.6	72.2	18.6	17.6	0.2	69.1	67.7	18.8	19.6	0.1	68.6	70.9	18.6	20.6	0.1
10-year survival rate	0.8	0.8	0.3	0.3	0.1	0.8	0.8	0.3	0.3	0	0.8	0.8	0.3	0.3	0
Risk adjusted Contribution	1569.6	2089.4	3241.7	4045.7	0.1	1447.2	1475.4	2778.7	2893.4	0	1569.6	1916.5	3241.7	3688.1	0.1
Total cost	3647.2	3222.5	8727.5	9206.3	0	3230.3	3366.1	7510.4	8005.1	0	3647.2	3533.4	8727.5	8656.8	0
Med. A	2	2.1	1.3	1.4	0	2	2	1.3	1.3	0.1	2	2.1	1.3	1.4	0.1
Med. B	1.6	1.6	0.7	0.8	0.1	1.5	1.6	0.7	0.7	0	1.6	1.6	0.7	0.7	0
Med. C01	1.6	1.7	0.7	0.7	0.1	1.6	1.7	0.7	0.7	0.1	1.6	1.7	0.7	0.7	0.1
Med. C02	1.7	1.8	0.8	0.7	0.1	1.7	1.8	0.8	0.8	0.1	1.7	1.8	0.8	0.7	0
Med. C03	2	2.1	0.9	0.9	0.1	2	2	0.9	0.9	0	2	2	0.9	0.9	0
Med. C04	1.1	1.2	0.4	0.4	0.1	1.1	1.2	0.4	0.4	0.1	1.1	1.2	0.4	0.4	0.1
Med. C05	1.1	1.1	0.3	0.4	0.1	1.1	1.1	0.3	0.4	0.2	1.1	1.1	0.3	0.4	0.2
Med. C07	1.2	1.1	0.4	0.4	0.3	1.8	1.8	0.5	0.5	0	1.2	1.8	0.4	0.4	0.1
Med. C08	1.8	1.8	0.5	0.4	0	1.7	1.7	0.5	0.5	0.1	1.8	1.7	0.5	0.5	0.2
Med. C09	1.7	1.7	0.5	0.5	0.1	2	2	0.7	0.7	0.1	1.7	2	0.5	0.7	0.1
Med. C10	2	2	0.7	0.7	0.1	1.6	1.7	0.5	0.5	0.1	2	1.7	0.7	0.5	0.1
Med. D	1.6	1.7	0.5	0.5	0.2	1.5	1.5	0.9	0.9	0	1.6	1.6	0.5	1	0.1
Med. G	1.5	1.5	0.9	0.9	0	1.7	1.7	0.8	0.8	0	1.5	1.7	0.9	0.8	0.1
Med. H	1.7	1.6	0.8	0.8	0	1.6	1.7	0.7	0.7	0	1.7	1.7	0.8	0.7	0
Med. J	1.6	1.7	0.7	0.7	0	1.7	1.7	1	1	0	1.6	1.8	0.7	1	0
Med. L	1.7	1.7	1.1	1	0.1	1.6	1.5	0.8	0.7	0.2	1.7	1.5	1.1	0.7	0.2
Med. M	1.7	1.6	0.8	0.7	0.1	1.7	1.7	0.9	0.9	0.1	1.7	1.7	0.8	0.9	0.1
Med. N	1.7	1.7	0.9	0.9	0	2.1	2.2	1.6	1.7	0	1.7	2.2	0.9	1.7	0.1
Med. P	2.2	2.3	1.7	1.7	0.1	1.1	1.1	0.2	0.3	0.1	2.2	1.1	1.7	0.3	0.2
Med. R	1.1	1.1	0.2	0.3	0.1	2.1	2.2	1.5	1.7	0.1	1.1	2.3	0.2	1.8	0.1
Med. S	2.1	2	1.6	1.5	0.1	1.4	1.4	0.7	0.7	0	2.1	1.5	1.6	0.7	0.1
Med. V	1.4	1.5	0.7	0.8	0	1.5	1.6	0.6	0.5	0.2	1.4	1.6	0.7	0.5	0.2
Med. Z	1.6	1.6	0.6	0.5	0.1	1.1	1.1	0.3	0.3	0	1.6	1.1	0.6	0.3	0.1
Period 1	0.1	0.2	0.3	0.4	0.2						0.1	0.1	0.3	0.3	0
Period 2	0.1	0.2	0.3	0.4	0.1						0.1	0.1	0.3	0.3	0
Period 3	0.3	0.1	0.4	0.4	0.3						0.3	0.3	0.4	0.4	0
Period 4	0.2	0.1	0.4	0.4	0.1						0.2	0.2	0.4	0.4	0
Period 5	0.1	0.1	0.3	0.3	0.1						0.1	0.1	0.3	0.3	0
Period 6	0.1	0.1	0.3	0.3	0						0.1	0.1	0.3	0.3	0
Period 7	0.1	0.1	0.3	0.3	0.2						0.1	0.1	0.3	0.2	0
Health insurance (1)	0.9	0.9	0.2	0.2	0						0.9	1	0.2	0.2	0


LOS: length of stay; LTC: long-term care; stdmdiff: standardized mean difference; Med: medicines by ATC code.

Fig. A.1. Survival probability. Intervention versus control group; PSM-based model (log scaled, survival time in days, censoring the deceased within the first 182 days, max 1825 days).

Fig. A.2. Survival probability. Intervention versus control group; EB- and PSM-based models (log scaled, survival time in days, censoring the deceased within the first 182 days, max 1825 days).

Appendix A. Supplementary data

The following is the Supplementary data to this article:

Multimedia component 1

mmc1.xlsx (16.9 KB, xlsx)

Data availability

The authors do not have permission to share data.

References

Austin P.C. Optimal caliper widths for propensity-score matching when estimating differences in means and differences in proportions in observational studies. Pharmaceutical Statistics. 2011;10(2):150–161. doi: 10.1002/pst.433.
Cumming G. Inference by eye: Reading the overlap of independent confidence intervals. Statistics in Medicine. 2009;28(2):205–220. doi: 10.1002/sim.3471.
European Commission. Blocks: Tools and methodologies to assess integrated care in Europe. Report by the Expert Group on Health Systems Performance Assessment. Directorate-General for Health and Food Safety. Publications Office; 2017. https://data.europa.eu/doi/10.2875/017891
Everitt B.S., Skrondal A. The Cambridge dictionary of statistics. Cambridge University Press; 2010. http://www.SLQ.eblib.com.au/patron/FullRecord.aspx?p=554744
Faraone S.V. Interpreting estimates of treatment effects: Implications for managed care. P and T: A Peer-Reviewed Journal for Formulary Management. 2008;33(12):700–711.
Fraccaro P., Kontopantelis E., Sperrin M., Peek N., Mallen C., Urban P., Buchan I.E., Mamas M.A. Predicting mortality from change-over-time in the Charlson comorbidity index: A retrospective cohort study in a data-intensive UK health system. Medicine. 2016;95(43). doi: 10.1097/MD.0000000000004973.
Statistisches Bundesamt (Germany). Generationensterbetafeln für Deutschland: Modellrechnungen für die Geburtsjahrgänge 1896–2009. 2011. https://www.destatis.de/DE/ZahlenFakten/GesellschaftStaat/Bevoelkerung/Sterbefaelle/Tabellen/GenerationensterbetafelMethoden.pdf?__blob=publicationFile
Gu X.S., Rosenbaum P.R. Comparison of multivariate matching methods: Structures, distances, and algorithms. Journal of Computational & Graphical Statistics. 1993;2(4):405–420. doi: 10.1080/10618600.1993.10474623.
Hainmueller J. Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis. 2012;20(1):25–46. doi: 10.1093/pan/mpr025.
Harvey R.A., Hayden J.D., Kamble P.S., Bouchard J.R., Huang J.C. A comparison of entropy balance and probability weighting methods to generalize observational cohorts to a population: A simulation and empirical example. Pharmacoepidemiology and Drug Safety. 2017;26(4):368–377. doi: 10.1002/pds.4121.
Hildebrandt H., Pimperl A., Schulte T., Hermann C., Riedel H., Schubert I., Köster I., Siegel A., Wetzel M. Triple Aim – Evaluation in der Integrierten Versorgung Gesundes Kinzigtal – Gesundheitszustand, Versorgungserleben und Wirtschaftlichkeit. Bundesgesundheitsblatt - Gesundheitsforschung - Gesundheitsschutz. 2015;58(4–5):383–392. doi: 10.1007/s00103-015-2120-y.
Kaplan E.L., Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 1958;53(282):457–481. doi: 10.1080/01621459.1958.10501452.
Kaufman B.G., Spivack B.S., Stearns S.C., Song P.H., O'Brien E.C. Impact of accountable care organizations on utilization, care, and outcomes: A systematic review. Medical Care Research and Review. 2019;76(3):255–290. doi: 10.1177/1077558717745916.
Levesque L.E., Hanley J.A., Kezouh A., Suissa S. Problem of immortal time bias in cohort studies: Example using statins for preventing progression of diabetes. BMJ. 2010;340:b5087. doi: 10.1136/bmj.b5087.
Lim S., Marcus S.M., Singh T.P., Harris T.G., Levanon Seligson A. Bias due to sample selection in propensity score matching for a supportive housing program evaluation in New York City. PLoS One. 2014;9(10). doi: 10.1371/journal.pone.0109112.
Lunt M. Selecting an appropriate caliper can be essential for achieving good balance with propensity score matching. American Journal of Epidemiology. 2014;179(2):226–235. doi: 10.1093/aje/kwt212.
Matschinger H., Heider D., König H.-H. A comparison of matching and weighting methods for causal inference based on routine health insurance data, or: What to do if an RCT is impossible. Das Gesundheitswesen. 2020;82(S 02):S139–S150. doi: 10.1055/a-1009-6634.
McClellan M., Kent J., Stephen B., Macdonnell M., Thoumi A., Shuttleworth B., Cohen S. Accountable care: Focusing accountability on the outcomes. Report of the Accountable Care Working Group 2013. World Innovation Summit for Health (WISH), Qatar Foundation; 2013. https://www.wish.org.qa/wp-content/uploads/2018/01/27425_WISH_Accountable_care_Report_AW-Web.pdf
McMullin J.L., Schonberger B. Entropy-balanced accruals. Review of Accounting Studies. 2020;25(1):84–119. doi: 10.1007/s11142-019-09525-9.
Morgan S.L., Winship C. Counterfactuals and causal inference: Methods and principles for social research. 2nd ed. Cambridge University Press; 2015.
Murray P.K., Singer M., Dawson N.V., Thomas C.L., Cebul R.D. Outcomes of rehabilitation services for nursing home residents. Archives of Physical Medicine and Rehabilitation. 2003;84(8):1129–1136. doi: 10.1016/S0003-9993(03)00149-7.
Parish W.J., Keyes V., Beadles C., Kandilov A. Using entropy balancing to strengthen an observational cohort study design: Lessons learned from an evaluation of a complex multi-state federal demonstration. Health Services & Outcomes Research Methodology. 2018;18(1):17–46. doi: 10.1007/s10742-017-0174-z.
Peiris D., News M., Nallajah K. Evidence check: Accountable care organizations. Sax Institute for the NSW Agency for Clinical Innovation; 2018. https://aci.health.nsw.gov.au/__data/assets/pdf_file/0009/420939/ACO-Evidence-Check.pdf
Pimperl A., Schulte T., Mühlbacher A., Rosenmöller M., Busse R., Groene O., Rodriguez H.P., Hildebrandt H. Evaluating the impact of an accountable care organization on population health: The quasi-experimental design of the German Gesundes Kinzigtal. Population Health Management. 2017;20(3):239–248. doi: 10.1089/pop.2016.0036.
Romano P.S., Geppert J.J., Davies S., Miller M.R., Elixhauser A., McDonald K.M. A national profile of patient safety in US hospitals. Health Affairs. 2003;22(2):154–166. doi: 10.1377/hlthaff.22.2.154.
Rosenbaum P.R., Rubin D.B. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55. doi: 10.1093/biomet/70.1.41.
Samaras T.T. Longevity of specific populations. In: International encyclopedia of public health. Elsevier; 2017. pp. 464–468.
Spruance S.L., Reid J.E., Grace M., Samore M. Hazard ratio in clinical trials. Antimicrobial Agents and Chemotherapy. 2004;48(8):2787–2792. doi: 10.1128/AAC.48.8.2787-2792.2004.
Stiefel M., Nolan K. A guide to measuring the triple aim: Population health, experience of care, and per capita cost. IHI Innovation Series white paper. Institute for Healthcare Improvement; 2012.
Stuart E.A. Matching methods for causal inference: A review and a look forward. Statistical Science. 2010;25(1). doi: 10.1214/09-STS313.
Witman A., Beadles C., Liu Y., Larsen A., Kafali N., Gandhi S., Amico P., Hoerger T. Comparison group selection in the presence of rolling entry for health services research: Rolling entry matching. Health Services Research. 2019;54(2):492–501. doi: 10.1111/1475-6773.13086.
