Author manuscript; available in PMC: 2023 Jan 5.
Published in final edited form as: Stat Med. 2020 Jul 9;39(22):3003–3021. doi: 10.1002/sim.8581

Empirical use of causal inference methods to evaluate survival differences in a real-world registry versus those found in randomized clinical trials

Hui-Jie Lee 1, John B Wong 2, Beilin Jia 1, Xinyue Qi 1, Elizabeth R DeLong 1
PMCID: PMC9813951  NIHMSID: NIHMS1856392  PMID: 32643219

Abstract

With heightened interest in causal inference based on real-world evidence, this empirical study sought to understand differences between the results of observational analyses and long-term randomized clinical trials. We hypothesized that patients deemed “eligible” for clinical trials would follow a different survival trajectory from those deemed “ineligible” and that this factor could partially explain results. In a large observational registry dataset, we estimated separate survival trajectories for hypothetically trial-eligible versus ineligible patients under both coronary artery bypass surgery (CABG) and percutaneous coronary intervention (PCI). We also explored whether results would depend on the causal inference method (inverse probability of treatment weighting versus optimal full propensity matching) or the approach to combining propensity scores from multiple imputations (the “across” versus “within” approaches). We found that, in this registry population of PCI/CABG multi-vessel patients, 32.5% would have been eligible for contemporaneous RCTs, suggesting that RCTs enroll selected populations. Additionally, we found treatment selection bias with different distributions of propensity scores between PCI and CABG patients. The different methodological approaches did not result in different conclusions. Overall, trial-eligible patients appeared to demonstrate at least marginally better survival than ineligible patients. Treatment comparisons by eligibility depended on disease severity. Among trial-eligible three-vessel diseased and trial-ineligible two-vessel diseased patients, CABG appeared to have at least a slight advantage with no treatment difference otherwise. In conclusion, our analyses suggest that RCTs enroll highly selected populations, and our findings are generally consistent with RCTs but less pronounced than major registry findings.

Keywords: Observational studies, trial eligibility, propensity score, multiple imputation, optimal full matching, inverse probability of treatment weighting

1. Introduction

For decades, a tension has existed between the results of randomized controlled clinical trials and those of large-scale clinical data registries.1–3 Advocates of the former maintain that the only way to ensure unbiased results is through randomization, while proponents of the latter point to the wealth of causal inference methodology that can be brought to bear to address the selection bias problem. In addition, a major difference between randomized trial data and clinical data registries is the restrictive eligibility criteria for the trials, thereby limiting the generalizability of the trial results. We speculated that differences in results between clinical trials and observational data registries might be explained by the strict eligibility criteria imposed by clinical trials. This could occur if trial-eligible patients represent a distinct subpopulation within an observational data cohort.

In particular, we have chosen as an example the randomized controlled trials (RCTs) of revascularization with coronary artery bypass grafting (CABG) versus percutaneous coronary interventions (PCI) for stable angina. Most of these trials have not shown any survival advantage,4–6 but have also enrolled only 3–15% of patients undergoing cardiac catheterization for whom the results might apply.7–10 A patient-level meta-analysis of four trials with 3051 patients (ARTS, ERACI-II, MASS-II, and SoS) found no statistically significant 5-year patient survival differences for PCI versus CABG (8.5% versus 8.2%, P=0.69), or for two-vessel (7.6% versus 7.3%, P=0.87) or for three-vessel disease (10.2% versus 9.5%, P=0.71) subgroups.11 In contrast, observational studies have found statistically significant survival benefit of CABG over PCI for three-vessel disease and some or all subsets of two-vessel disease depending on the study.12–15 If trial-eligible patients do, in fact, represent a distinct subpopulation, their survival curves and treatment differences would a priori differ from those who would have been ineligible. To explore this explanatory hypothesis, we used a large registry database to estimate survival curves for hypothetically trial-eligible versus ineligible patients receiving CABG versus PCI. The ultimate goal was to determine whether eligibility could explain comparative treatment-attributable survival differences between observational analyses versus RCTs.

Interest in real-world evidence, such as disease registries, is one response to the challenging pace of healthcare. However, these observational databases invariably involve potential treatment selection bias and the presence of missing covariate data.16–18 To account for selection bias, various versions of propensity score methods are currently recommended.19–22 Propensity scores reduce a large covariate space by summarizing the entire collection of covariates into a single measure. Addressing the likely resulting covariate imbalance then involves either matching patients from different treatment groups according to their propensity for treatment or using inverse probability of treatment weighting (IPTW). Prior comparisons of these two approaches have found that (1) both pair matching on propensity scores and IPTW result in minimal bias in estimated marginal hazard ratios;23 (2) caliper propensity score matching performs equivalently to IPTW methods in estimating absolute effects of treatments on survival outcomes when treatment prevalence is low;24 and (3) full matching has comparable performance with inverse probability of treatment weighting even when the prevalence of treatment is high.25,26 However, these comparisons do not examine different approaches to the handling of missing values.

Commonly recommended to account for the uncertainty associated with missing covariate values, multiple imputation involves simulating several replicate datasets by imputing values of the missing data according to distributional assumptions and then performing the planned analysis. Methods for creating the replicate datasets necessary for multiple imputation have been well developed,27–30 but uncertainty persists about whether and how to incorporate the outcome into the process31–34 and which of two predominant approaches for combining the results over the multiple imputations leads to better bias and variance reduction.35–40 Both approaches to combining results initially construct a propensity score model within each imputation to estimate a single propensity score for each subject within that imputation. The “across” approach averages the individual propensity scores over imputations before analyzing the outcomes; the alternative “within” approach uses the propensity scores to analyze each individual imputation dataset separately and then combines results over imputations. A number of simulation studies have compared the two approaches but have been limited in terms of the number and type of covariates and the type of outcome variables examined with, in particular, little methodological guidance on which approach would be best for analyzing survival data with missing values.35–40
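The distinction between the two approaches can be illustrated with a toy calculation. The sketch below is only a schematic, assuming three imputations and three subjects with made-up propensity scores; `toy_analysis` is a hypothetical stand-in for the real outcome analysis (here, the mean inverse-probability weight) and is not the analysis used in this study.

```python
from statistics import mean

# Hypothetical propensity scores: ps[m][i] is the score for subject i
# estimated within imputed dataset m (values are illustrative only).
ps = [
    [0.20, 0.70, 0.55],
    [0.30, 0.60, 0.50],
    [0.25, 0.80, 0.45],
]


def toy_analysis(scores):
    """Placeholder analysis: mean inverse-probability weight 1/e."""
    return mean(1.0 / e for e in scores)


def across_estimate(ps_by_imputation, analyze):
    """'Across': average each subject's score over imputations, then analyze once."""
    n_subjects = len(ps_by_imputation[0])
    averaged = [mean(row[i] for row in ps_by_imputation) for i in range(n_subjects)]
    return analyze(averaged)


def within_estimate(ps_by_imputation, analyze):
    """'Within': analyze each imputation separately, then average the estimates."""
    return mean(analyze(row) for row in ps_by_imputation)


across_result = across_estimate(ps, toy_analysis)
within_result = within_estimate(ps, toy_analysis)
```

Because the placeholder analysis is nonlinear in the propensity score, the two orders of operation give different answers, which is the reason the choice between them can matter in principle.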

In the absence of strong recommendations, we assessed the performance of four analytic approaches: paired combinations of propensity matching versus IPTW, and averaging propensities before analyzing (across approach) versus analyzing before averaging (within approach). The point was to determine whether they would lead to different conclusions regarding treatment-attributable survival differences between RCT-eligible patients and those deemed not eligible in a large real-world registry. Results were similar across all four approaches (see supplementary material), except for the full matching across approach, so we present only the full matching within versus full matching across hereafter.

Comparisons of survival outcomes frequently use the hazard ratio estimated from the Cox proportional hazards model to represent the treatment effect. Violations of the proportional hazards assumption, such as with surgical mortality associated with CABG (leading to crossing survival curves), necessitate a treatment-stratified Cox model, but the hazard ratio for treatment can then no longer be estimated. In the present study, the outcomes of interest are therefore differences in average survival probabilities at pre-specified time points and differences in restricted mean survival time (RMST).41–43

2. Determining eligibility

For this empirical study, we extracted data from the Duke Databank for Cardiovascular Disease, a long-term follow-up registry of patients with cardiovascular disease.44–47 All patients receiving a cardiac catheterization within the Duke University Health System have been routinely followed for subsequent events since 1969. Treatment assignment was based on whether the patient underwent an initial procedure of CABG or PCI within 30 days of the catheterization; otherwise the treatment was designated as medical therapy (MED). As previously described,48 the initial treatment “strategy” determined the treatment assignment, and the survival for patients who subsequently crossed over to a different treatment was attributed to the initial “strategy” as would occur in an intention-to-treat analysis, consistent with an RCT. To create a comprehensive cohort of patients who could have been eligible for either a PCI or a CABG trial, we initially included all patients who were catheterized during the enrollment periods of six randomized clinical trials comparing CABG, PCI, and possibly MED: Arterial Revascularization Study (ARTS)49; Clinical Outcomes Utilizing Revascularization and Aggressive Drug Evaluation (COURAGE)50; Argentine Randomized Study: Coronary Angioplasty with Stenting vs. Coronary Bypass Surgery in Multivessel Disease (ERACI II)8; Medicine, Angioplasty, or Surgery Study (MASS II)51; second Randomised Intervention Treatment of Angina (RITA-2)10; and Stent or Surgery trial (SoS).52 Among the three treatments, this paper focuses on examining only CABG vs. PCI survival. Also, because of instability induced by the small numbers of trial-eligible single-vessel CABG patients, we limited our results to comparing survivals of patients with two- or three-vessel disease.

For the earliest and latest trial enrollment dates (7/92–1/04), we applied inclusion/exclusion criteria to each Duke database patient to characterize whether that patient would have been eligible for one or more of the six trials (Table 1) and included follow-up through 8/2014 for these patients. If patients met eligibility criteria for at least one of the trials during that trial’s enrollment period, they were deemed eligible; otherwise they were deemed ineligible. Patients assigned to MED were initially included because they would have been potentially eligible for one or more of the trials and thus could contribute data to the multiple imputations. Because the number of diseased vessels is a strong selection factor for determining treatment strategy, as evident in revascularization guidelines and appropriate use criteria,53–55 we subsequently grouped and analyzed subgroups of patients by the number of diseased vessels and by their eligibility status, with eligibility defined as meeting eligibility for at least one of the randomized clinical trials.
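The eligibility rule described above (criteria met for at least one trial during that trial's own enrollment period) can be sketched as follows. This is a minimal illustration, not the study's code: the enrollment windows come from Table 1, the exact days of the month are assumptions, and `meets_criteria` is a hypothetical mapping that stands in for the result of applying each trial's inclusion/exclusion criteria to a patient.

```python
from datetime import date

# Enrollment windows from Table 1 (month/year). The day of month is an
# assumption: first day of the start month, last day of the end month.
ENROLLMENT = {
    "ARTS": (date(1997, 4, 1), date(1998, 6, 30)),
    "COURAGE": (date(1999, 6, 1), date(2004, 1, 31)),
    "ERACI II": (date(1996, 10, 1), date(1998, 9, 30)),
    "MASS II": (date(1995, 5, 1), date(2000, 5, 31)),
    "RITA-2": (date(1992, 7, 1), date(1996, 5, 31)),
    "SoS": (date(1996, 11, 1), date(1999, 12, 31)),
}


def eligible_trials(cath_date, meets_criteria):
    """Trials whose window contains cath_date and whose criteria the patient meets.

    meets_criteria: hypothetical dict mapping trial name -> bool.
    """
    return [
        trial
        for trial, (start, end) in ENROLLMENT.items()
        if start <= cath_date <= end and meets_criteria.get(trial, False)
    ]


def is_eligible(cath_date, meets_criteria):
    """A patient is deemed eligible if at least one trial qualifies."""
    return len(eligible_trials(cath_date, meets_criteria)) > 0
```

For example, a patient catheterized in March 2001 who satisfies only the ARTS criteria would be classified as ineligible under this rule, because ARTS enrollment had already closed.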

Table 1.

Randomized PCI/CABG clinical trials from which eligibility criteria were used in determining hypothetical eligibility for Duke patients

| Trial | ARTS | COURAGE | ERACI II | MASS II | RITA-2 | SoS |
|---|---|---|---|---|---|---|
| Treatment groups | PCI vs. CABG | PCI vs. Medical | PCI vs. CABG | Medical vs. PCI vs. CABG | PCI vs. Medical | PCI vs. CABG |
| Dates | 4/97–6/98 | 6/99–1/04 | 10/96–9/98 | 5/95–5/00 | 7/92–5/96 | 11/96–12/99 |
| Follow-up time | 5 years | 2.5–7 years | 5 years | 10 years | 5–9 years | 5–8 years |
| Sample size | 1,205 | 2,287 | 450 | 611 | 1,018 | 988 |
| Two-vessel disease (%) | 67–68% | 39% | 38–40% | 41–42% | 33% [1] | 52–62% [2] |
| Three-vessel disease (%) | 30–33% | 30–31% | 55–58% | 58–59% | 7% [1] | 38–47% [2] |
| Results of all-cause mortality | No difference at 5 years [3] | No difference at 5 years [3] | No difference at 5 years [3] | No difference at 10 years | No difference at 7 years [3] | CABG better than PCI at 6 years |

[1] RITA-2 had 60% single-vessel patients.

[2] Number of diseased vessels was not balanced between PCI and CABG groups in SoS. Higher percentage of 3-vessel disease patients in CABG (47% vs. 38%), and higher 2-vessel disease patients in PCI group (62% vs. 52%).

[3] All-cause mortality was a secondary outcome. The primary outcome of ARTS was freedom from major adverse cardiac and cerebrovascular events (MACCE) at one year. It was death from any cause and non-fatal myocardial infarction during follow-up for COURAGE and RITA-2. It was freedom from major adverse cardiovascular events (MACE) up to 5 years for ERACI II.

3. Notation and Methods

3.1. The Potential Outcomes Framework

For notational convenience, we refer to PCI as the control and CABG as the active treatment, and we denote the survival outcome for patient $i$ as a vector $Y_i$, consisting of both the latest follow-up and the censoring indicator. The causal effect for subject $i$ compares the potential outcomes that would have been observed under the control treatment, $Y_i(0)$, versus the active treatment, $Y_i(1)$, respectively. Let $Z_i$ denote the treatment indicator, so that $Z_i = 0$ for PCI and $Z_i = 1$ for CABG. Because each subject can only receive one of the two possible treatments, only one outcome, $Y_i$, is observed for each subject. The observed survival outcome is equal to $Y_i = Z_i Y_i(1) + (1 - Z_i) Y_i(0)$.

The theory underlying causal analyses of observational data relies on a critical assumption, namely that of strongly ignorable treatment assignment.19 This assumption states that the treatment assignment ($Z$) is independent of the potential outcomes given a set of measured covariates ($X$) that can affect treatment selection, i.e., $Z \perp (Y(0), Y(1)) \mid X$. That is, after controlling for observed covariates, treatment assignment is essentially random. Additionally, $0 < \Pr(Z = 1 \mid X) < 1$, i.e., every subject has a non-zero chance of receiving either treatment.

3.2. Estimation of Propensity Scores with Missing Covariate Data

Propensity score methods have become a popular mechanism for addressing treatment selection bias that might result in imbalances in the baseline covariate distributions between treatment groups. The propensity score, $e(X)$, is the probability of receiving the active treatment conditional on observed baseline covariates: $e(X) = \Pr(Z = 1 \mid X)$. Assuming strong ignorability of the treatment assignment mechanism and the absence of unmeasured confounding,19 $e(X)$ is a balancing score, i.e.,

$$Z \perp X \mid e(X).$$

Treatment assignment is then independent of the potential outcomes, conditional on only $e(X)$:

$$Z \perp (Y(0), Y(1)) \mid e(X).$$

Further, when holding the propensity score constant, the expected distribution of the covariate vector X is the same for the two treatment groups. Hence, an estimate of the outcome difference between intervention and controls, given e(X), represents the intervention treatment effect at that value of the propensity. These propensity scores may be used in the analysis either through IPTW or propensity score matching. Although the average treatment effect among the treated (ATT) subpopulation can be estimated by calibrating different weights to the treated subpopulation, this study focuses on the average treatment effect (ATE) in the overall population being studied.
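As an illustration of how the choice of weights determines the estimand, the sketch below uses the standard inverse-probability weights: for the ATE, $1/e(X)$ for treated and $1/(1 - e(X))$ for control subjects; for the ATT, 1 for treated and $e(X)/(1 - e(X))$ for control subjects. It assumes $Z = 1$ for CABG and $Z = 0$ for PCI as in Section 3.1; the function name `iptw_weight` is hypothetical and this is not the Appendix code.

```python
def iptw_weight(z, e, estimand="ATE"):
    """Inverse probability of treatment weight for one subject.

    z: treatment indicator (1 = CABG, 0 = PCI, following Section 3.1).
    e: estimated propensity score Pr(Z = 1 | X), strictly between 0 and 1.
    """
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    if estimand == "ATE":
        # Each group is reweighted to resemble the whole population.
        return 1.0 / e if z == 1 else 1.0 / (1.0 - e)
    if estimand == "ATT":
        # Controls are reweighted to resemble the treated subpopulation.
        return 1.0 if z == 1 else e / (1.0 - e)
    raise ValueError("estimand must be 'ATE' or 'ATT'")
```

A CABG patient with a propensity score of 0.25 would thus receive an ATE weight of 4, reflecting that such patients are under-represented among those who received CABG.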

The above conditions assume complete data in the vector of covariates X that comprise the propensity score. To accommodate missing values in X, Frangakis and Rubin extended the notion of strong ignorability to “latent ignorability.”56 In this case, ignorability of the treatment assignment mechanism is achieved by conditioning on the complete set of both “latent” and observed values in X. Assuming that the missing value distribution among the covariates in X depends only on the observed values and not on the missing values, the “latent” values are operationalized through multiple imputation.

3.3. Multiple Imputation and Variable Selection

In survival analysis, because all observations have outcomes recorded as either an event or censored, the missing values are concentrated in the covariates. The Figure 1 flow chart summarizes our analysis pipeline. Using all patients, including those assigned to the medically treated group, we generated 10 imputed datasets through multivariate imputation. Assuming missing at random, we used the multiple imputation by chained equations (MICE) approach, which is also known as “fully conditional specification” (FCS) or “sequential regression multiple imputation” among other names.57 In the R programming environment, the predictive mean matching (PMM) method in the MICE package58,59 specifies an imputation model for each variable with missing values conditional on all other variables, with imputation done by sequentially predicting all variables with missing values for several cycles. Using PMM, the imputed value is chosen as the observed value of the complete observation with the closest predicted mean to that of the incomplete observation.60,61 Some simulation studies recommend incorporating the outcome into the multiple imputation.31,32,40 With a time-to-event analysis, incorporating the outcome is not straightforward, but incorporating the event indicator, the logarithm of the time-to-event variable, and all variables related to the outcome can serve as an approximation.31
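The core idea of predictive mean matching, as described above, can be sketched in a few lines. This is a deliberately simplified, single-predictor illustration with hypothetical function names; it takes the single closest donor deterministically, whereas production implementations of PMM typically draw at random from several close donors and also propagate uncertainty in the regression parameters, so it should not be read as the behavior of the MICE package.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(x, y):
    """Fill None entries of y with the observed y of the closest donor.

    x: fully observed predictor; y: outcome with None marking missing values.
    'Closest' is measured on the predicted mean from a regression of y on x
    fitted to the complete cases.
    """
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    intercept, slope = fit_line([o[0] for o in observed], [o[1] for o in observed])

    def predict(xi):
        return intercept + slope * xi

    completed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)
            continue
        target = predict(xi)
        donor = min(observed, key=lambda o: abs(predict(o[0]) - target))
        completed.append(donor[1])  # always an actually observed value
    return completed
```

Because the imputed value is borrowed from a real observation, imputations stay within the range of plausible observed values, which is one reason the method is attractive for skewed clinical covariates.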

Figure 1. Flowchart of analysis pipeline that summarizes the procedure for obtaining the ultimate trimmed population and the subsequent analyses.

For each imputation and separately for two- and three-vessel disease, we estimated the propensity score with a logistic regression model to predict the probability of receiving treatment Z as a function of covariates X using only two- and three-vessel disease PCI and CABG patients. To incorporate missing covariate data, we assumed latent ignorability of the treatment assignment mechanism, conditional on complete covariate data that incorporated imputed values. Recent studies have suggested that covariates associated with the outcome, regardless of whether they were associated with treatment assignment, should be included in the propensity model.20,62 Thus, we selected variables based on clinical insights and employed restricted cubic splines for continuous variables, which allowed flexible representation of the functional form of continuous variables. Because we were exploring the role of eligibility itself, that variable was not included in the propensity modeling.

3.4. Utilization of Propensity Scores

As described below, we first used the propensity scores to evaluate the common support in covariate distributions of the two treatment groups, separately for two- and three-vessel patients, and then to trim samples with limited overlap in covariate distributions. We analyzed using both IPTW and optimal full matching, implementing each across and within, but present only the full matching across and within results. Then, separately for eligible and for not eligible patients, we compared the resulting relative survival trajectories (representing the ATE) for the two treatments and summarized their restricted mean survival time (RMST) at 5, 10, and 15 years. The RMST is the expected survival time up to a specific time point and is estimated by the area under the survival trajectory up to that time point.

To assess the comparability of patients “selected” in real-world academic clinical practice for PCI versus CABG, we examined the distribution of the propensity scores in each imputed dataset. For example, similar to that seen in the nine other imputed datasets, Figure 2a and 2c display the limited overlap in the distribution of propensity scores for one imputed dataset. Adopting the approach of Crump et al.,63 we determined propensity score cut points for each multiply-imputed dataset and excluded the set of all patients outside the cut points for any of the 10 imputed datasets from all imputed datasets. Using this final sample, we conducted all analyses on this “ultimate trimmed population.” Figures 2b and 2d display an example from one of the multiply imputed datasets of the resulting propensity scores after trimming.
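The construction of the “ultimate trimmed population” amounts to keeping only patients whose propensity score falls inside the cut points in every imputed dataset. The sketch below illustrates that intersection rule only. The cut points are taken as given inputs (the data-driven rule of Crump et al. for choosing them is not implemented here), and the function name and numerical values are hypothetical.

```python
def ultimate_trimmed_ids(ps_by_imputation, cut_points):
    """Indices of subjects inside the cut points in all imputed datasets.

    ps_by_imputation: list over imputations of per-subject propensity scores.
    cut_points:       list over imputations of (lower, upper) bounds.
    A subject outside the bounds in any single imputation is excluded from all.
    """
    n_subjects = len(ps_by_imputation[0])
    keep = []
    for i in range(n_subjects):
        inside_all = all(
            lower <= scores[i] <= upper
            for scores, (lower, upper) in zip(ps_by_imputation, cut_points)
        )
        if inside_all:
            keep.append(i)
    return keep


# Two imputations, three subjects, illustrative cut points.
scores = [[0.05, 0.50, 0.85], [0.12, 0.55, 0.93]]
cuts = [(0.10, 0.90), (0.10, 0.90)]
kept = ultimate_trimmed_ids(scores, cuts)
```

Applying one common exclusion set to all imputations keeps the analysis population identical across imputed datasets, at the cost of discarding somewhat more patients than trimming each dataset separately.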

Figure 2. Propensity score distribution from one imputation.

* Different y-axis scales were used for display purposes.

3.6. Propensity Score Analysis with Missing Data

3.6.1. Optimal Full Matching

Propensity score matching involves forming matched sets of treated and control subjects with similar propensity scores. It is frequently implemented as one-to-n nearest neighbor matching, so that each treated subject is matched with a fixed or variable number of control subjects with the smallest distance between each set.64 Control subjects who do not match with treated subjects are discarded, and this method only estimates average treatment effect among the treated (ATT) rather than the population average treatment effect (ATE).

Alternatively, the full matching approach uses all available patients to enable estimation of ATE.65 Full matching resembles subclassification as it partitions a sample into a collection of strata, each consisting of either one treated subject and at least one control subject, or one control subject and at least one treated subject. Full matching is optimal if it minimizes the average distances in terms of propensity scores between each pair of treated and control subjects within each matched set. We adopted the optimal full matching algorithm implemented in the optmatch package in the R programming environment.66 As suggested by Rosenbaum and Rubin,67 we imposed a maximum permissible distance (or “caliper”) of 0.25 standard deviation of the logit of the propensity score when matching on the logit of the propensity score. This restriction can improve the match quality and avoid poor matches. Moreover, within subgroups designated by the number of diseased vessels and eligibility, PCI and CABG patients could only be matched if they had the same number of diseased vessels and the same hypothetical eligibility status. Because of these constraints, some patients could not be matched in all of the imputed data sets when using the within approach, thereby slightly reducing the sample size for that approach.

Depending on how subjects in matched sets are weighted, both ATT and ATE can be estimated in the full matching setting. In the case of matching within number of diseased vessels and eligibility, we weighted patients as follows to estimate the ATE for both PCI and CABG patients.26 Let $a_k$ and $b_k$ be the numbers of eligible (ineligible) PCI and eligible (ineligible) CABG patients, respectively, in a given matched set within the $k$-vessel disease subgroup, and let $q_k$ denote the marginal probability of receiving PCI in the eligible (ineligible) $k$-vessel disease subgroup. Then the weights for PCI and CABG eligible (ineligible) $k$-vessel disease patients in the given matched set are $q_k (a_k + b_k)/a_k$ and $(1 - q_k)(a_k + b_k)/b_k$, respectively. To create estimated survival trajectories for PCI and CABG using full matching within, we fit a weighted Cox regression model stratified on treatment accounting for matched sets using the ID statement in PROC PHREG in SAS. An alternative approach would be to stratify on matched sets, allowing different baseline hazard functions for each set. However, this approach has been shown to estimate the treatment effect conditional on the matched sets (conditional treatment effect) but not the treatment effect at the population level (marginal treatment effect).23 The weighted Cox regression was performed within each imputation, separately for each number of diseased vessels and eligibility subgroup. For the across approach, patients were matched using average propensity scores and one weighted Cox model was fit for each subgroup defined by number of diseased vessels and eligibility. For the within approach, the survival estimates for CABG and PCI for each individual imputation were combined using Rubin’s rule after complementary log-log transformation.68
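Two steps in this paragraph lend themselves to a short numerical sketch: the matched-set ATE weights and the pooling of survival estimates on the complementary log-log scale. The code below is an illustration under stated assumptions, not the Appendix code. It reads the weights as q(a + b)/a for PCI and (1 − q)(a + b)/b for CABG, where a and b are the numbers of PCI and CABG patients in a matched set and q is the marginal probability of PCI in the subgroup. For pooling, it assumes each imputation supplies a survival probability and its standard error, uses the delta method for the variance on the transformed scale, and applies the usual Rubin total-variance formula; the function names are hypothetical.

```python
import math
from statistics import mean, variance


def full_matching_ate_weights(a, b, q):
    """ATE weights for one matched set.

    a: number of PCI patients in the set; b: number of CABG patients;
    q: marginal probability of receiving PCI in the subgroup.
    Returns (weight per PCI patient, weight per CABG patient).
    """
    if a < 1 or b < 1:
        raise ValueError("a matched set needs at least one patient per group")
    n = a + b
    return q * n / a, (1.0 - q) * n / b


def pool_survival_cloglog(surv, se):
    """Pool per-imputation survival estimates with Rubin's rules on the
    complementary log-log scale g(S) = log(-log(S)).

    Returns (pooled survival probability, total variance on the g scale).
    """
    m = len(surv)
    g = [math.log(-math.log(s)) for s in surv]
    # Delta method: Var[g(S)] is approximately Var(S) / (S * log(S))^2.
    u = [(e / (s * math.log(s))) ** 2 for s, e in zip(surv, se)]
    within = mean(u)
    between = variance(g) if m > 1 else 0.0
    total = within + (1.0 + 1.0 / m) * between
    pooled = math.exp(-math.exp(mean(g)))  # back-transform to a probability
    return pooled, total


w_pci, w_cabg = full_matching_ate_weights(a=1, b=3, q=0.6)
pooled_s, total_var = pool_survival_cloglog([0.80, 0.82, 0.78], [0.02, 0.02, 0.02])
```

With these weights the total weight inside each matched set equals the set size, split between the two treatments according to the marginal treatment probabilities. Pooling on the transformed scale keeps the back-transformed estimate and any interval built from it inside (0, 1).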

3.6.3. Diagnostics of Full Matching

To assess the comparability of PCI and CABG patients after weighting, we compared the standardized differences of observed covariates in the weighted sample versus the unweighted sample.69 The full comparison involved assessing balance in each of the 10 imputed datasets. For brevity, we present the evaluation of one dataset using the matching weight. Let $w_i$ be the weight associated with subject $i$. For a continuous covariate $x$, let $\bar{x}^*_{CABG}$ and $\bar{x}^*_{PCI}$ be the weighted sample means of CABG and PCI patients, respectively. The weighted mean is defined as $\bar{x}^* = \sum_i w_i x_i / \sum_i w_i$ over all patients in that treatment group. Let $s^{2*}_{CABG}$ and $s^{2*}_{PCI}$ denote the weighted sample variances of CABG and PCI patients, respectively. The respective weighted sample variance is defined as $s^{2*} = \frac{\sum_i w_i}{(\sum_i w_i)^2 - \sum_i w_i^2} \sum_i w_i (x_i - \bar{x}^*)^2$. For binary covariates, we arbitrarily assigned a value of zero to one level of the covariate and a value of one to the other level, and calculated the weighted proportions, $\hat{p}_{CABG}$ and $\hat{p}_{PCI}$, for CABG and PCI patients according to the weighted mean formula for continuous variables. For binary covariates, $s^{2*}_{CABG}$ and $s^{2*}_{PCI}$ are $\hat{p}_{CABG}(1 - \hat{p}_{CABG})$ and $\hat{p}_{PCI}(1 - \hat{p}_{PCI})$, respectively. The standardized difference then becomes $d = 100 \times (\bar{x}^*_{CABG} - \bar{x}^*_{PCI}) / \sqrt{(s^{2*}_{CABG} + s^{2*}_{PCI})/2}$. A standardized difference below 0.1 (equivalently, $|d| < 10$ on the percentage scale defined above) has been recommended to indicate a negligible imbalance in covariates between treatment groups.70
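The balance diagnostic can be sketched as follows, reading the definitions above as a weighted mean, a weighted variance with the factor Σw / ((Σw)² − Σw²), and a standardized difference expressed on a percentage scale. The function names are hypothetical and the data in the usage lines are made up; this is an illustration rather than the code used for the reported diagnostics.

```python
import math


def weighted_mean(x, w):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def weighted_variance(x, w):
    """Weighted sample variance; reduces to the usual n - 1 variance
    when all weights are equal."""
    sum_w = sum(w)
    sum_w2 = sum(wi * wi for wi in w)
    xbar = weighted_mean(x, w)
    scaled = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sum_w / (sum_w ** 2 - sum_w2) * scaled


def standardized_difference(x_cabg, w_cabg, x_pci, w_pci, binary=False):
    """Standardized difference (CABG minus PCI) on the percentage scale."""
    m_cabg = weighted_mean(x_cabg, w_cabg)
    m_pci = weighted_mean(x_pci, w_pci)
    if binary:
        # For 0/1 covariates the variance is p * (1 - p).
        v_cabg = m_cabg * (1.0 - m_cabg)
        v_pci = m_pci * (1.0 - m_pci)
    else:
        v_cabg = weighted_variance(x_cabg, w_cabg)
        v_pci = weighted_variance(x_pci, w_pci)
    return 100.0 * (m_cabg - m_pci) / math.sqrt((v_cabg + v_pci) / 2.0)


# Unweighted toy example: means 2.5 vs 3.5 with equal variances.
d = standardized_difference([1, 2, 3, 4], [1, 1, 1, 1], [2, 3, 4, 5], [1, 1, 1, 1])
```

On this scale, an absolute value below 10 corresponds to the conventional 0.1 threshold for negligible imbalance.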

3.7. Programming code

In the Appendix, we provide statistical software code that implements the methods discussed in this section in the R and SAS programming languages. Specifically, we provide code for (1) IPTW within, (2) IPTW across, (3) full matching within, (4) full matching across approach, when missing data have already been multiply-imputed and when propensity scores have already been estimated. Readers can expand the code to obtain bootstrap-based inference.

4. Results

Of 35,145 patients who were seen at Duke University Health System for cardiac catheterization between July 1, 1992 and December 31, 2012, 23,247 entered during the overall enrollment periods of the six CABG or PCI trials. Although later excluded, 6473 patients assigned to the medical therapy strategy and 8029 single-vessel disease PCI and CABG patients were included in the multiple imputation process. Propensity modeling was restricted to the remaining 8745 CABG and PCI multi-vessel patients.

Table 2 lists the covariates included in the propensity models by assigned treatment strategy, along with summary statistics and percent missing for each covariate. Although most covariates had complete data recorded, a few covariates had a substantial number of missing values that occurred unevenly across treatment groups. Particularly notable, left ventricular ejection fraction and mitral insufficiency were missing much more frequently in the PCI group than in the CABG group, which was expected because patients transferred to Duke for PCI have frequently not had ejection fraction results from outside cardiac catheterizations recorded. These two covariates were among the most important for treatment selection and prognosis.71–77

Table 2.

Missing values for covariates included in propensity models, by assigned treatment

Columns give, for each group, the mean (SD) or N (%) followed by the number (%) missing. Group sizes: PCI trial eligible, N = 1,294; PCI trial ineligible, N = 3,257; CABG trial eligible, N = 1,547; CABG trial ineligible, N = 2,647.

| Description | PCI eligible: Mean (SD) or N (%) | PCI eligible: Missing (%) | PCI ineligible: Mean (SD) or N (%) | PCI ineligible: Missing (%) | CABG eligible: Mean (SD) or N (%) | CABG eligible: Missing (%) | CABG ineligible: Mean (SD) or N (%) | CABG ineligible: Missing (%) |
|---|---|---|---|---|---|---|---|---|
| Age in Years | 62.7 (11.2) | 0 (0.0%) | 62.0 (11.5) | 0 (0.0%) | 63.9 (10.7) | 0 (0.0%) | 63.9 (10.7) | 0 (0.0%) |
| Sex (Female) | 416 (32.1%) | 0 (0.0%) | 1008 (30.9%) | 0 (0.0%) | 449 (29.0%) | 0 (0.0%) | 736 (27.8%) | 0 (0.0%) |
| Race (Caucasian) | 1006 (79.5%) | 29 (2.2%) | 2528 (79.1%) | 63 (1.9%) | 1282 (84.5%) | 30 (1.9%) | 2126 (82.1%) | 57 (2.1%) |
| Body Mass Index | 29.1 (6.2) | 1 (0.1%) | 28.7 (6.2) | 11 (0.3%) | 28.4 (5.6) | 3 (0.2%) | 28.3 (5.9) | 4 (0.2%) |
| Heart Rate | 70.2 (14.0) | 3 (0.2%) | 71.8 (15.3) | 25 (0.8%) | 71.2 (14.1) | 13 (0.8%) | 73.0 (14.6) | 23 (0.9%) |
| Systolic Blood Pressure | 142.4 (27.9) | 31 (2.4%) | 137.3 (27.3) | 95 (2.9%) | 146.3 (26.8) | 57 (3.7%) | 138.3 (26.9) | 90 (3.4%) |
| Diastolic Blood Pressure | 76.5 (14.2) | 31 (2.4%) | 75.1 (14.0) | 101 (3.1%) | 78.5 (13.2) | 57 (3.7%) | 75.5 (13.8) | 89 (3.4%) |
| eGFR CKD – EPI (algorithm)* | 72.2 (22.5) | 241 (18.6%) | 73.9 (23.1) | 519 (15.9%) | 70.4 (23.1) | 130 (8.4%) | 71.0 (22.9) | 263 (9.9%) |
| Left Ventricular Ejection Fraction | 55.5 (11.8) | 388 (30.0%) | 52.9 (13.5) | 875 (26.9%) | 54.3 (13.8) | 8 (0.5%) | 51.3 (14.5) | 50 (1.9%) |
| History of Hypertension | 843 (65.1%) | 0 (0.0%) | 2132 (65.5%) | 0 (0.0%) | 1010 (65.3%) | 0 (0.0%) | 1691 (63.9%) | 0 (0.0%) |
| History of Diabetes | 395 (30.5%) | 0 (0.0%) | 961 (29.5%) | 0 (0.0%) | 513 (33.2%) | 0 (0.0%) | 778 (29.4%) | 0 (0.0%) |
| History of Peripheral Vascular Disease | 127 (9.8%) | 0 (0.0%) | 336 (10.3%) | 0 (0.0%) | 169 (10.9%) | 0 (0.0%) | 350 (13.2%) | 0 (0.0%) |
| History of Cerebrovascular Disease | 112 (8.7%) | 0 (0.0%) | 338 (10.4%) | 0 (0.0%) | 171 (11.1%) | 0 (0.0%) | 310 (11.7%) | 0 (0.0%) |
| History of Hyperlipidemia | 754 (58.3%) | 0 (0.0%) | 1904 (58.5%) | 0 (0.0%) | 926 (59.9%) | 0 (0.0%) | 1495 (56.5%) | 0 (0.0%) |
| History of Smoking | 780 (60.3%) | 0 (0.0%) | 1997 (61.3%) | 0 (0.0%) | 917 (59.3%) | 0 (0.0%) | 1630 (61.6%) | 0 (0.0%) |
| History of Myocardial Infarction | 547 (42.3%) | 0 (0.0%) | 2112 (64.8%) | 0 (0.0%) | 566 (36.6%) | 0 (0.0%) | 1560 (58.9%) | 0 (0.0%) |
| History of Congestive Heart Failure | 197 (15.5%) | 27 (2.1%) | 497 (15.5%) | 60 (1.8%) | 318 (20.9%) | 25 (1.6%) | 496 (19.2%) | 69 (2.6%) |
| Charlson Comorbidity Disease Index | 0.7 (0.9) | 0 (0.0%) | 0.7 (0.9) | 0 (0.0%) | 0.8 (1.0) | 0 (0.0%) | 0.8 (1.0) | 0 (0.0%) |
| Duke Coronary Artery Disease Severity Index | 53.5 (14.1) | 0 (0.0%) | 58.9 (16.8) | 0 (0.0%) | 71.6 (15.5) | 0 (0.0%) | 76.7 (16.7) | 0 (0.0%) |
| Prior CABG | 92 (7.1%) | 0 (0.0%) | 946 (29.0%) | 0 (0.0%) | 16 (1.0%) | 0 (0.0%) | 208 (7.9%) | 0 (0.0%) |
| Prior PCI | 31 (2.4%) | 0 (0.0%) | 574 (17.6%) | 0 (0.0%) | 37 (2.4%) | 0 (0.0%) | 276 (10.4%) | 0 (0.0%) |
| Carotid Bruits | 96 (7.5%) | 7 (0.5%) | 311 (9.6%) | 25 (0.8%) | 189 (12.3%) | 14 (0.9%) | 323 (12.3%) | 24 (0.9%) |
| Valvular Heart Disease | 8 (0.6%) | 0 (0.0%) | 23 (0.7%) | 0 (0.0%) | 32 (2.1%) | 0 (0.0%) | 49 (1.9%) | 0 (0.0%) |
| Acute Coronary Syndrome Category (STEMI, non-STEMI, unspecified MI) | 457 (35.3%) | 0 (0.0%) | 1556 (47.8%) | 0 (0.0%) | 427 (27.6%) | 0 (0.0%) | 1229 (46.4%) | 0 (0.0%) |
| Acute Coronary Syndrome Category (Acute Coronary Syndrome) | 161 (12.4%) | 0 (0.0%) | 1090 (33.5%) | 0 (0.0%) | 222 (14.4%) | 0 (0.0%) | 791 (29.9%) | 0 (0.0%) |
| Year of Cardiac Catheterization | 1997.0 (3.3) | 0 (0.0%) | 1997.9 (3.5) | 0 (0.0%) | 1997.0 (2.9) | 0 (0.0%) | 1997.6 (3.5) | 0 (0.0%) |
| Severity of Congestive Heart Failure (NYHA class) (No CHF) | 1094 (86.8%) | 34 (2.6%) | 2763 (88.2%) | 123 (3.8%) | 1225 (80.8%) | 31 (2.0%) | 2117 (82.8%) | 90 (3.4%) |
| Mitral insufficiency (Absent) | 659 (80.3%) | 473 (36.6%) | 1726 (80.1%) | 1102 (33.8%) | 1133 (77.8%) | 91 (5.9%) | 1842 (74.8%) | 184 (7.0%) |
| Number of Diseased Vessels (Two-vessel disease) | 1009 (78.0%) | 0 (0.0%) | 2094 (64.3%) | 0 (0.0%) | 467 (30.2%) | 0 (0.0%) | 790 (29.8%) | 0 (0.0%) |
| Ventricular Gallop | 18 (1.4%) | 1 (0.1%) | 68 (2.1%) | 2 (0.1%) | 31 (2.0%) | 1 (0.1%) | 111 (4.2%) | 0 (0.0%) |

* eGFR CKD – EPI: Estimated glomerular filtration rate using Chronic Kidney Disease-Epidemiology Collaboration equations; STEMI: ST-elevation myocardial infarction; MI: myocardial infarction; NYHA: New York Heart Association; CHF: congestive heart failure.

Figures 2a and 2c demonstrate the different distributions of propensity scores in the PCI and CABG groups for a randomly selected imputation dataset. These plots clearly indicate that certain types of patients were assigned to a particular revascularization strategy with high probability, suggesting influential clinical practice conventions. Selection bias is apparent and most noticeable for two-vessel disease patients, where over 70% were selected for PCI, versus three-vessel disease patients, where two-thirds underwent CABG. After applying cut points specified by Crump et al.,63 Figures 2b and 2d illustrate somewhat improved balance with, however, the sample size falling by 22.3% from 8745 to 6793 due to the trimming (see Supplementary material for baseline patient characteristics before and after trimming). Trimming resulted in excluding almost a quarter of the PCI two-vessel patients and almost 30% of CABG three-vessel patients.

Table 3 displays the sample sizes for each method, by treatment assignment and eligibility. All available data in the ultimate trimmed population were used in the IPTW and full matching across methods, but sample size was slightly reduced for the full matching within methods because some patients could not be matched in individual imputations within strata formed by number of diseased vessels and eligibility. Note that it is possible to have reduced sample size for full matching across methods, but this was not the case in this analysis. Only 22.3% of PCI three-vessel patients would have been deemed eligible for one of the six trials, compared with slightly less than 40% of the other subgroups.

Table 3:

Sample size by Treatment Assignment, Number of Diseased Vessels, and Eligibility.

Treatment Assigned | Sample | Two Vessel Disease: Trial Eligible | Two Vessel Disease: Trial Ineligible | Three Vessel Disease: Trial Eligible | Three Vessel Disease: Trial Ineligible | Total: Trial Eligible | Total: Trial Ineligible
CABG | Before trimming | 467 | 790 | 1080 | 1857 | 1547 | 2647
CABG | (Percent eligible) | (37.2%) | | (36.8%) | | (36.9%) |
CABG | Full matching Within Approach – Match within NUMDZV and ELIG | 419 | 671 | 808 | 1288 | 1227 | 1959
CABG | Full matching Across Approach – Match within NUMDZV and ELIG | 448 | 696 | 808 | 1288 | 1256 | 1984
CABG | IPTW methods | 448 | 696 | 808 | 1288 | 1256 | 1984
CABG | (Percent eligible) | (39.2%) | | (38.5%) | | (38.8%) |
PCI | Before trimming | 1009 | 2094 | 285 | 1163 | 1294 | 3257
PCI | (Percent eligible) | (32.5%) | | (19.7%) | | (28.4%) |
PCI | Full matching Within Approach – Match within NUMDZV and ELIG | 878 | 1483 | 245 | 926 | 1123 | 2409
PCI | Full matching Across Approach – Match within NUMDZV and ELIG | 878 | 1483 | 266 | 926 | 1144 | 2409
PCI | IPTW methods | 878 | 1483 | 266 | 926 | 1144 | 2409
PCI | (Percent eligible) | (37.2%) | | (22.3%) | | (32.2%) |
* Note that in some imputations, some patients could not be matched. ELIG: Eligibility; NUMDZV: Number of diseased vessels; CABG: Coronary artery bypass surgery; PCI: Percutaneous coronary intervention.
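The percentages quoted alongside Table 3 follow directly from its counts. As a minimal check, the Python sketch below recomputes the percent eligible and the share of each subgroup removed by trimming. The counts are transcribed from Table 3 (the "Before trimming" and "IPTW methods" rows); the dictionary layout and function names are illustrative assumptions.

```python
# Counts transcribed from Table 3 as (trial eligible, trial ineligible),
# keyed by (treatment, number of diseased vessels).
BEFORE = {
    ("CABG", 2): (467, 790),
    ("CABG", 3): (1080, 1857),
    ("PCI", 2): (1009, 2094),
    ("PCI", 3): (285, 1163),
}
# Trimmed sample, taken from the "IPTW methods" rows.
AFTER = {
    ("CABG", 2): (448, 696),
    ("CABG", 3): (808, 1288),
    ("PCI", 2): (878, 1483),
    ("PCI", 3): (266, 926),
}


def percent_eligible(counts):
    """Percent of a subgroup that would have been trial eligible."""
    eligible, ineligible = counts
    return 100.0 * eligible / (eligible + ineligible)


def percent_trimmed(key):
    """Percent of a subgroup removed by propensity-score trimming."""
    n_before = sum(BEFORE[key])
    n_after = sum(AFTER[key])
    return 100.0 * (n_before - n_after) / n_before


for key in sorted(BEFORE):
    print(key,
          "eligible before: %.1f%%" % percent_eligible(BEFORE[key]),
          "eligible after: %.1f%%" % percent_eligible(AFTER[key]),
          "trimmed: %.1f%%" % percent_trimmed(key))
```

The totals implied by these counts, 8745 before and 6793 after trimming, agree with the sample sizes quoted in the text.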

First, to compare the effect of trial eligibility on patient survival within a given treatment group (PCI or CABG) and number of diseased vessels, Figure 3 displays the estimated survival trajectories by number of diseased vessels and eligibility from the full matching within and across approaches. For PCI survival of two-vessel and CABG survival of three-vessel patients, the two approaches produce essentially identical curves and demonstrate higher survival for eligible patients, at least out to 10 years. For CABG survival of two-vessel and PCI survival of three-vessel patients, there is still a suggestion of slightly higher survival for patients who would have been deemed eligible, although there is more variability between the methods. Results for IPTW are included in the supplementary material and mimic those of the full matching within approach.

Figure 3.

Comparison of survival estimates by eligibility from full matching within and across approaches.

Next, we compared the treatment effect within different combinations of eligibility and number of diseased vessels. Figure 4 displays the potential differences in treatment conclusions that might be made based on the selected methodological approach. Interestingly, curves for hypothetically trial-ineligible two-vessel patients show an early survival advantage for CABG, which gives way at about 10 years to a slight PCI survival advantage. For the three-vessel disease eligible subgroup, a noticeable CABG survival advantage appears out to about ten years, and CABG maintains a slim advantage throughout. For the other two groups, the CABG and PCI curves follow each other closely until the very end, when PCI confers a slight advantage for hypothetically ineligible patients.

Figure 4.

Comparison of survival estimates by treatment from full matching within and across approaches.

To quantify the extent to which eligibility or different analysis methods produced different survival conclusions, we calculated the estimated RMSTs for the four subgroups (two- and three-vessel, eligible and not eligible; see Supplementary Table S2). For each of the four methods, both for patients who would have been trial-eligible and those ineligible, we calculated the estimated area under the two treatment survival curves at 5, 10, and 15 years. Each column of four dots in Figure 5 shows estimated RMST results for trial-eligible and trial-ineligible patients undergoing PCI or CABG for one method over 5-, 10-, and 15-year time horizons. With respect to eligibility status, among two-vessel patients, PCI trial-eligible patients appear to have a survival advantage of approximately two-tenths of a year over trial-ineligible patients at the 10-year horizon, while survival for CABG trial-eligible patients lags that of ineligible patients in the early years, showing a small advantage later. For three-vessel patients, trial-eligible CABG patients demonstrate an estimated additional half year of survival at 10 years over patients who would not have been trial-eligible, and trial-eligible PCI patients appear to have an advantage of about two-tenths of a year over those deemed ineligible. RMST differs little by analytic method, with the exception of the full matching across approach. With regard to treatment comparisons, among trial-ineligible patients with either two- or three-vessel disease, CABG appears to have at least a marginal survival advantage over PCI up to 10 years, although the advantage is very slight for three-vessel patients and reverses at 15 years for two-vessel patients. For trial-eligible patients, PCI appears to have a slight advantage over CABG in two-vessel patients; a more substantial CABG survival advantage is evident in three-vessel patients.

Figure 5.

Estimated restricted mean survival time (RMST) under 4 different methods at 5, 10, and 15 years.
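Because RMST is the area under a survival curve up to a chosen horizon, the calculation can be sketched in a few lines. The Python below is a minimal illustration, not the authors' implementation: it builds an optionally weighted Kaplan-Meier step function and integrates it from 0 to a horizon tau. The function names, the toy data, and the use of simple case weights (for example, IPTW or matching weights) are assumptions made for illustration.

```python
def kaplan_meier(times, events, weights=None):
    """Return [(t, S(t))] at each distinct event time.

    times   : follow-up time for each patient
    events  : 1 if the patient died at that time, 0 if censored
    weights : optional case weights (assumed, e.g. IPTW or matching weights)
    """
    if weights is None:
        weights = [1.0] * len(times)
    data = sorted(zip(times, events, weights))
    at_risk = sum(w for _, _, w in data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0.0
        leaving = 0.0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            _, event, w = data[i]
            leaving += w
            if event:
                deaths += w
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def rmst(curve, tau):
    """Area under the step survival curve between 0 and tau."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in curve:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


# Toy data (years of follow-up); the third patient is censored at 6 years.
toy_curve = kaplan_meier([2, 4, 6, 8], [1, 1, 0, 1])
print(rmst(toy_curve, 5), rmst(toy_curve, 10))
```

A difference in RMST between two treatment arms at the same horizon is expressed in units of time, so multiplying a difference in years by roughly 365 gives a difference in days of life over that horizon.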

5. Discussion

The tension between the narrow focus and lack of generalizability of traditional randomized controlled trials (RCTs) and the inherent biases of observational analyses has led to confusion when results appear to differ. One of the most cited examples is that of post-menopausal hormone therapy (HRT).78 Early observational analyses from the Nurses’ Health Study and others had suggested HRT would significantly lower the risk for heart disease,79,80 leading to thousands of women receiving this therapy. However, in the 1990s, the Heart and Estrogen/progestin Replacement Study (HERS) and the Women’s Health Initiative large randomized trials disputed the results of the initial observational analyses.81–83 Instead of being protective, HRT appeared to increase the risk for stroke, fractures, and mortality.81–84 The evolution of the reanalysis and interpretation of these studies is ongoing. In fact, when Hernan and colleagues emulated the design and intention-to-treat analysis of the Women’s Health Initiative randomized trial in the observational Nurses’ Health Study, they found that the discrepancy could be explained by differences in the distribution of time since menopause and length of follow-up between the two studies.85 Similarly, with respect to the comparison of CABG versus PCI survival among coronary artery disease patients, observational studies have reported a significant advantage of CABG over PCI, while trials have been mixed.

Commonly used approaches for causal inference include covariate adjustment modeling, propensity matching, and inverse probability weighting, with potential variations on each, especially for estimating survival with missing data. Although multiple imputation has become the standard approach for dealing with missing values when assuming a missing at random mechanism, how best to combine the data over iterations when applying these causal inference methods remains relatively unexplored. While our primary goal sought to assess whether eligibility status might have been responsible for the differences between trials and observational analyses, our empirical study also explored whether and how results from different approaches might differ when applied to estimating survival comparisons with actual clinical data.

In a simulation study, Austin and Stuart compared IPTW and full matching when estimating a marginal hazard ratio in the presence of misspecification of the propensity score.26 Their simulation suggested little bias with either method when the treatment selection process was weak to moderate. However, a relatively strong treatment selection process led to significant bias. They found they could mitigate this bias by using full matching with caliper restriction or a restricted sample with IPTW and correctly specifying the propensity model. In a subsequent comparison of propensity score matching, stratification on the propensity score and IPTW, Austin and Schuster showed that stratification on the propensity score had the greatest bias when examining absolute effects of treatment on survival in a Monte Carlo simulation, but IPTW methods performed better than matching methods when treatment prevalence was less extreme.24 Although the accuracy of the propensity model in real-world data can never actually be determined, our analyses used full matching with caliper restriction and a substantially trimmed sample.

Previous authors have used simulation in the context of propensity analyses to gain insight into whether individual imputations should be separately analyzed with results combined (the “within” approach), or propensities combined prior to analysis (the “across” approach). Naturally, simulation studies need to impose restrictions on the data generating mechanism and the number of covariates. Mitra and Reiter considered the case of a continuous outcome in a propensity matching analysis with two covariates and concluded that the “across” method potentially reduces bias, and the “within” method demonstrates slightly smaller variance.37 Additionally, bias reduction for the across approach was greatest when the treatment assignment depended only on the non-missing covariate. They also performed an empirical study on real data and found similar results between the across and within methods, possibly because, consistent with their simulations, the missing values were not strongly associated with treatment assignment. In our analyses, mitral insufficiency was a strong predictor of treatment and was missing about 35% of the time for PCI patients. Hence we might assume that the bias reduction for the “across” approach would not be substantial relative to the “within” approach.

More recently, when comparing the within versus across approach to propensity score analyses following multiple imputation on actual data, Granger et al. evaluated eight different scenarios involving both binary and continuous outcomes and provided more definitive results and a recommendation.39 Invoking the explanation by Leyrat et al.40 that the average propensity score across imputations is not truly a balancing score, they recommend against the across approach as being more biased. Nonetheless, to our knowledge, no studies to date have compared these approaches on survival data.
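The distinction between the two approaches can be made concrete with a small sketch. The Python below assumes that a propensity score has already been estimated for every patient in each imputed dataset, and it uses an IPTW difference in mean outcome as a simple stand-in for the survival analyses in this paper. All function names and toy values are hypothetical, and both treatment groups are assumed to be non-empty.

```python
def iptw_weights(treated, ps):
    """Inverse probability of treatment weights: 1/ps if treated, else 1/(1 - ps)."""
    return [1.0 / p if z else 1.0 / (1.0 - p) for z, p in zip(treated, ps)]


def weighted_mean_diff(treated, outcome, weights):
    """Weighted mean outcome in the treated minus that in the untreated."""
    num1 = sum(w * y for z, y, w in zip(treated, outcome, weights) if z)
    den1 = sum(w for z, w in zip(treated, weights) if z)
    num0 = sum(w * y for z, y, w in zip(treated, outcome, weights) if not z)
    den0 = sum(w for z, w in zip(treated, weights) if not z)
    return num1 / den1 - num0 / den0


def within_estimate(treated, outcome, ps_by_imputation):
    """'Within': analyze each imputed dataset separately, then average the results."""
    estimates = [
        weighted_mean_diff(treated, outcome, iptw_weights(treated, ps))
        for ps in ps_by_imputation
    ]
    return sum(estimates) / len(estimates)


def across_estimate(treated, outcome, ps_by_imputation):
    """'Across': average each patient's propensity score first, then analyze once."""
    m = len(ps_by_imputation)
    avg_ps = [sum(col) / m for col in zip(*ps_by_imputation)]
    return weighted_mean_diff(treated, outcome, iptw_weights(treated, avg_ps))


# Toy example: four patients, two imputed datasets (hypothetical values).
treated = [1, 1, 0, 0]
outcome = [3.0, 5.0, 2.0, 4.0]
ps_by_imputation = [
    [0.6, 0.7, 0.4, 0.2],
    [0.5, 0.8, 0.3, 0.3],
]
print(within_estimate(treated, outcome, ps_by_imputation))
print(across_estimate(treated, outcome, ps_by_imputation))
```

When the imputed datasets yield identical propensity scores the two functions agree exactly. They diverge only to the extent that imputed covariate values change a patient's estimated score, which is consistent with the observation that the choice matters most when the missing covariates are strongly related to treatment assignment.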

This study provides several conclusions. First, it makes clear the distinction between populations available for real-world observational analyses and those who qualify for randomized trials. In our study, fewer than 40% of two- and three-vessel patients seen at or referred to our academic medical center would have been eligible to participate in at least one of the contemporaneous trials. For any individual trial, the percentage is far lower. Second, conventional clinical treatment selection practices appear evident in our analysis of actual treatment received, i.e., there appear to be preferred treatments for certain subgroups, suggesting “inextricable confounding”86 with likely violation of ignorable treatment assignment. For example, over 70% of two-vessel patients were assigned to the PCI treatment strategy, whereas over 65% of three-vessel patients were assigned to CABG. Ultimately, trimming the dataset to potentially comparable patients reduced the sample size by 22%, from 8745 down to 6793. Given these limitations on both eligibility and comparability, the validity of results from any observational study should be taken in context; likewise, this raises further uncertainty about the generalizability of clinical trials to real-world applications. These results also point to the potential confounding by clinicians only enrolling patients they considered to be at equipoise based on prevailing clinical conventions.

In our analyses, hypothetically trial-eligible patients had a survival advantage over hypothetically trial-ineligible patients, especially for two-vessel PCI and three-vessel CABG patients. In our final trimmed sample, estimated survival differences between CABG and PCI were unremarkable, except for three-vessel eligible patients for whom CABG demonstrated better survival than PCI.

With respect to the different methodological approaches, we found very few differences in survival trajectories or RMST from these methods. The four approaches yielded essentially equivalent results, except possibly for the full matching across method, which has been noted previously to depend on an average propensity score that is not a true balancing score.40

A difficulty encountered by both our study and long-term protocol-driven RCTs is maintaining consistency with contemporary treatment strategies. Our data included treatment assignment as received over a decade in which technology evolved from balloon angioplasty to bare metal and then drug-eluting stents and in which guideline recommendations evolved based on the emerging evidence. Additionally, our assignment of the RCT-eligible flag was based on meeting eligibility criteria for any of the six RCTs comparing PCI or CABG. This aggregation over different trials could have also introduced confounding and additional heterogeneity into the eligible subgroup.

Our outcome variable was 15-year mortality. The four PCI vs. CABG RCTs that we examined all published survival outcomes to five, six, or ten years with no statistically significant difference between PCI and CABG patients, except for the SoS trial, in which CABG patients had significantly improved survival due in large part to higher cancer deaths in the PCI arm.87–90 An individual patient-level meta-analysis confirmed significant heterogeneity when comparing treatment mortality between SoS and the other trials but also corroborated a non-significant hazard ratio of 0.95 (95% CI: 0.73–1.23) for CABG vs. PCI found in three of the four RCTs.11 More inclusive meta-analyses of RCTs have also almost always found no survival advantage for CABG.46,91,92 For comparison, when substantially limiting our registry data to multi-vessel trial-eligible patients, our 5-year RMST analysis similarly suggests a 58- to 65-day RMST longevity benefit with CABG over PCI for patients with three-vessel disease and a shorter 25- to 32-day RMST benefit with CABG over PCI for patients with two-vessel disease.

Our analysis found that two-vessel trial-eligible patients appeared to have better short-term PCI survival than ineligible patients but was inconclusive for long-term or CABG survival. Similarly, our analysis found that three-vessel trial-eligible patients appeared to have better CABG survival than ineligible patients but was inconclusive for PCI survival. Negligible treatment survival differences were observed among two-vessel eligible patients and three-vessel ineligible patients. Among two-vessel ineligible patients, CABG appeared to be superior to PCI for up to 10 years, but then the trend reversed. Among three-vessel eligible patients, CABG appeared to demonstrate marginally better survival than PCI. These results are consistent with, but much less pronounced than, findings from major clinical registry studies that found CABG to PCI hazard ratios favoring CABG.92 In general, we found that the full matching across method produced results that differed slightly from those of the other three methods, which were essentially equivalent to each other.

To conclude, our findings of treatment selection bias with poorly overlapping propensity scores necessitating substantial trimming, along with only 35% eligibility for contemporaneous RCTs, suggest that RCTs enroll highly selected populations, and current methodological approaches appear to be unable to resolve the differences between observational and RCT results.

Supplementary Material

Supplemental figure 1

Supplementary Figure 1. Estimated survivals under 5 different methods for patients with two-vessel disease.

Supplemental figure 2

Supplementary Figure 2. Estimated survivals under 5 different methods for patients with three-vessel disease.

Supplemental document

Supplementary Table S1. Comparison of demographics before and after trimming for patients with 2- or 3-vessel disease.

Supplementary Table S2. Comparison of restricted mean survival times (RMST) estimated from full matching within and across, and IPTW within and across approaches.

Acknowledgements

The authors thank Frank E Harrell, PhD (Vanderbilt), Linda K Shaw, MS (Duke) and Aaron D Jones, MB (Duke) for their statistical advice and support, and Daniel B Mark, MD, MPH (Duke) and Eric D Peterson, MD, MPH (Duke) for their clinical perspectives.

Funding

This work was supported through a Patient-Centered Outcomes Research Institute Inaugural Methods Initiative, Improving Methods for Conducting Patient-Centered Outcomes Research Award (ME-1303–5894) Integrating Causal Inference, Evidence Synthesis, and Research Prioritization Methods, which supported the salaries of Drs. Lee, DeLong and Wong. This work was also made possible by Grant Number UL1TR001117 from the National Center for Advancing Translational Sciences (NCATS), a component of the National Institutes of Health (NIH), and NIH Roadmap for Medical Research. The Foundation for Informed Medical Decision Making funded a pilot award. Its contents are solely the responsibility of the authors and do not necessarily represent the official view of NCATS or NIH.

Footnotes

Disclosures

All statements in this report, including its findings and conclusions, are solely those of the authors and do not necessarily represent the views of the Patient-Centered Outcomes Research Institute, its Board of Governors, or its Methodology Committee.

Data Availability

The data that support the findings of this study are not shared. However, the de-identified version of the Duke Databank for Cardiovascular Disease (DDCD), the DukeCath analysis dataset, is available from Duke if the research proposal is approved by the Duke reviewer committee.

References

  • 1.Benson K, Hartz AJ. A Comparison of Observational Studies and Randomized, Controlled Trials. N Engl J Med 2000;342(25):1878–1886. [DOI] [PubMed] [Google Scholar]
  • 2.Concato J, Shah N, Horwitz RI. Randomized, Controlled Trials, Observational Studies, and the Hierarchy of Research Designs. N Engl J Med 2000;342(25):1887–1892. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Ioannidis JPA, Haidich A-B, Pappa M, et al. Comparison of Evidence of Treatment Effects in Randomized and Nonrandomized Studies. JAMA 2001;286(7):821–830. [DOI] [PubMed] [Google Scholar]
  • 4.Hlatky MA, Boothroyd DB, Bravata DM, et al. Coronary artery bypass surgery compared with percutaneous coronary interventions for multivessel disease: a collaborative analysis of individual patient data from ten randomised trials. Lancet 2009;373(9670):1190–1197. [DOI] [PubMed] [Google Scholar]
  • 5.Windecker S, Stortecky S, Stefanini GG, et al. Revascularisation versus medical treatment in patients with stable coronary artery disease: network meta-analysis. BMJ 2014;348:g3859. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Kim DD, Trikalinos TA, Wong JB. Leveraging Cumulative Network Meta-analysis and Value of Information Analysis to Understand the Evolving Value of Medical Research Med Decis Making 2019;39(2):119–129. [DOI] [PubMed] [Google Scholar]
  • 7.Boden WE, O’Rourke RA, Teo KK, et al. Optimal Medical Therapy with or without PCI for Stable Coronary Disease. N Engl J Med 2007;356(15):1503–1516. [DOI] [PubMed] [Google Scholar]
  • 8.Rodriguez A, Bernardi V, Navia J, et al. Argentine Randomized Study: Coronary Angioplasty with Stenting versus Coronary Bypass Surgery in patients with Multiple-Vessel Disease (ERACI II): 30-day and one-year follow-up results. ERACI II Investigators. J Am Coll Cardiol 2001;37(1):51–58. [DOI] [PubMed] [Google Scholar]
  • 9.Hueb W, Lopes NH, Gersh BJ, et al. Five-year follow-up of the Medicine, Angioplasty, or Surgery Study (MASS II): a randomized controlled clinical trial of 3 therapeutic strategies for multivessel coronary artery disease. Circulation 2007;115(9):1082–1089. [DOI] [PubMed] [Google Scholar]
  • 10.Anonymous. Coronary angioplasty versus medical therapy for angina: the second Randomised Intervention Treatment of Angina (RITA-2) trial. RITA-2 trial participants. Lancet 1997;350(9076):461–468. [PubMed] [Google Scholar]
  • 11.Daemen J, Boersma E, Flather M, et al. Long-term safety and efficacy of percutaneous coronary intervention with stenting and coronary artery bypass surgery for multivessel coronary artery disease: a meta-analysis with 5-year patient-level data from the ARTS, ERACI-II, MASS-II, and SoS trials. Circulation 2008;118(11):1146–1154. [DOI] [PubMed] [Google Scholar]
  • 12.Jones RH, Kesler K, Phillips HR 3rd, et al. Long-term survival benefits of coronary artery bypass grafting and percutaneous transluminal angioplasty in patients with coronary artery disease. J Thorac Cardiovasc Surg 1996;111(5):1013–1025. [DOI] [PubMed] [Google Scholar]
  • 13.Hannan EL, Racz MJ, Walford G, et al. Long-term outcomes of coronary-artery bypass grafting versus stent implantation. N Engl J Med 2005;352(21):2174–2183. [DOI] [PubMed] [Google Scholar]
  • 14.Malenka DJ, Leavitt BJ, Hearne MJ, et al. Comparing long-term survival of patients with multivessel coronary disease after CABG or PCI: analysis of BARI-like patients in northern New England. Circulation 2005;112(9 Suppl):I371–376. [DOI] [PubMed] [Google Scholar]
  • 15.Smith PK, Califf RM, Tuttle RH, et al. Selection of surgical or percutaneous coronary intervention provides differential longevity benefit. Ann Thorac Surg 2006;82(4):1420–1428; discussion 1428–1429. [DOI] [PubMed] [Google Scholar]
  • 16.Institute of Medicine. Observational Studies in a Learning Health System: Workshop Summary Washington, DC: The National Academies Press; 2013. [PubMed] [Google Scholar]
  • 17.Califf RM, Robb MA, Bindman AB, et al. Transforming Evidence Generation to Support Health and Health Care Decisions. N Engl J Med 2016;375(24):2395–2400. [DOI] [PubMed] [Google Scholar]
  • 18.Sherman RE, Anderson SA, Dal Pan GJ, et al. Real-World Evidence — What Is It and What Can It Tell Us? N Engl J Med 2016;375(23):2293–2297. [DOI] [PubMed] [Google Scholar]
  • 19.Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70(1):41–55. [Google Scholar]
  • 20.Austin PC, Grootendorst P, Anderson GM. A comparison of the ability of different propensity score models to balance measured variables between treated and untreated subjects: a Monte Carlo study. Stat Med 2007;26(4):734–753. [DOI] [PubMed] [Google Scholar]
  • 21.Austin PC. An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies. Multivariate Behav Res 2011;46(3):399–424. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Deb S, Austin PC, Tu JV, et al. A Review of Propensity-Score Methods and Their Use in Cardiovascular Research. Can J Cardiol 2016;32(2):259–265. [DOI] [PubMed] [Google Scholar]
  • 23.Austin PC. The performance of different propensity score methods for estimating marginal hazard ratios. Stat Med 2013;32(16):2837–2849. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Austin PC, Schuster T. The performance of different propensity score methods for estimating absolute effects of treatments on survival outcomes: A simulation study. Stat Methods Med Res 2016;25(5):2214–2237. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Austin PC, Stuart EA. Optimal full matching for survival outcomes: a method that merits more widespread use. Stat Med 2015;34(30):3949–3967. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Austin PC, Stuart EA. The performance of inverse probability of treatment weighting and full matching on the propensity score in the presence of model misspecification when estimating the effect of treatment on survival outcomes. Stat Methods Med Res 2015;26(4):1654–1670. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Rubin DB. Inference and missing data. Biometrika 1976;63(3):581–592. [Google Scholar]
  • 28.Rubin DB. Multiple Imputation For Nonresponse In Surveys New York: Wiley; 1989. [Google Scholar]
  • 29.Rubin DB. Multiple Imputation After 18+ Years. J Am Stat Assoc 1996;91(434):473–489. [Google Scholar]
  • 30.Schafer JL. Analysis of Incomplete Multivariate Data Taylor & Francis; 1997. [Google Scholar]
  • 31.van Buuren S, Boshuizen HC, Knook DL. Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med 1999;18(6):681–694. [DOI] [PubMed] [Google Scholar]
  • 32.Moons KGM, Donders RART, Stijnen T, Harrell FE Jr. Using the outcome for imputation of missing predictor values was preferred. J Clin Epidemiol 2006;59(10):1092–1101. [DOI] [PubMed] [Google Scholar]
  • 33.Sterne JAC, White IR, Carlin JB, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 2009;338:b2393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.White IR, Royston P. Imputing missing covariate values for the Cox model. Stat Med 2009;28(15):1982–1998. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Hill J Reducing bias in treatment effect estimation in observational studies suffering from missing data 2004.
  • 36.Eulenburg C, Suling A, Neuser P, et al. Propensity Scoring after Multiple Imputation in a Retrospective Study on Adjuvant Radiation Therapy in Lymph-Node Positive Vulvar Cancer. PLoS One 2016;11(11):e0165705. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Mitra R, Reiter JP. A comparison of two methods of estimating propensity scores after multiple imputation. Stat Methods Med Res 2016;25(1):188–204. [DOI] [PubMed] [Google Scholar]
  • 38.Choi J, Dekkers OM, le Cessie S. A comparison of different methods to handle missing data in the context of propensity score analysis. Eur J Epidemiol 2019;34(1):23–36. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Granger E, Sergeant JC, Lunt M. Avoiding pitfalls when combining multiple imputation and propensity scores. Stat Med 2019;38(26):5120–5132. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Leyrat C, Seaman SR, White IR, et al. Propensity score analysis with partially observed covariates: How should multiple imputation be used? Stat Methods Med Res 2019;28(1):3–19. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Zucker DM. Restricted Mean Life with Covariates: Modification and Extension of a Useful Survival Analysis Method. J Am Stat Assoc 1998;93(442):702–709. [Google Scholar]
  • 42.Royston P, Parmar MK. Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome. BMC Medical Res Methodol 2013;13(1):152. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Royston P, Parmar MK. The use of restricted mean survival time to estimate the treatment effect in randomized clinical trials when the proportional hazards assumption is in doubt. Stat Med 2011;30(19):2409–2421. [DOI] [PubMed] [Google Scholar]
  • 44.Rosati RA, McNeer J, Starmer C, et al. A new information system for medical practice. Arch Intern Med 1975;135(8):1017–1024. [PubMed] [Google Scholar]
  • 45.Califf RM, Harrell FE Jr, Lee KL, et al. The evolution of medical and surgical therapy for coronary artery disease. A 15-year perspective. JAMA. 1989;261(14):2077–2086. [PubMed] [Google Scholar]
  • 46.Smith LR, Harrell FE Jr, Rankin JS, et al. Determinants of early versus late cardiac death in patients undergoing coronary artery bypass graft surgery. Circulation 1991;84(5 Suppl):Iii245–253. [PubMed] [Google Scholar]
  • 47.Fortin DF, Califf RM, Pryor DB, Mark DB. The way of the future redux. Am J Cardiol 1995;76(16):1177–1182. [DOI] [PubMed] [Google Scholar]
  • 48.DeLong ER, Nelson CL, Wong JB, et al. Using observational data to estimate prognosis: an example using a coronary artery disease registry. Stat Med 2001;20(16):2505–2532. [DOI] [PubMed] [Google Scholar]
  • 49.Serruys PW, Unger F, van Hout BA, et al. The ARTS study (Arterial Revascularization Therapies Study). Semin Interventar Cardiol 1999;4(4):209–219. [DOI] [PubMed] [Google Scholar]
  • 50.Boden WE, O’Rourke RA, Teo KK, et al. Design and rationale of the Clinical Outcomes Utilizing Revascularization and Aggressive DruG Evaluation (COURAGE) trial Veterans Affairs Cooperative Studies Program no. 424. Am Heart J 2006;151(6):1173–1179. [DOI] [PubMed] [Google Scholar]
  • 51.Hueb W, Soares PR, Gersh BJ, et al. The medicine, angioplasty, or surgery study (MASS-II): a randomized, controlled clinical trial of three therapeutic strategies for multivessel coronary artery disease: one-year results. J Am Coll Cardiol 2004;43(10):1743–1751. [DOI] [PubMed] [Google Scholar]
  • 52.Stables RH. Design of the ‘Stent or Surgery’ trial (SoS): a randomized controlled trial to compare coronary artery bypass grafting with percutaneous transluminal coronary angioplasty and primary stent implantation in patients with multi-vessel coronary artery disease. Semin Intervent Cardiol 1999;4(4):201–207. [DOI] [PubMed] [Google Scholar]
  • 53.Fihn SD, Gardin JM, Abrams J, et al. 2012 ACCF/AHA/ACP/AATS/PCNA/SCAI/STS guideline for the diagnosis and management of patients with stable ischemic heart disease: executive summary: a report of the American College of Cardiology Foundation/American Heart Association task force on practice guidelines, and the American College of Physicians, American Association for Thoracic Surgery, Preventive Cardiovascular Nurses Association, Society for Cardiovascular Angiography and Interventions, and Society of Thoracic Surgeons. Circulation 2012;126(25):3097–3137. [DOI] [PubMed] [Google Scholar]
  • 54.Patel MR, Dehmer GJ, Hirshfeld JW, Smith PK, Spertus JA. ACCF/SCAI/STS/AATS/AHA/ASNC/HFSA/SCCT 2012 Appropriate Use Criteria for Coronary Revascularization Focused Update: A Report of the American College of Cardiology Foundation Appropriate Use Criteria Task Force, Society for Cardiovascular Angiography and Interventions, Society of Thoracic Surgeons, American Association for Thoracic Surgery, American Heart Association, American Society of Nuclear Cardiology, and the Society of Cardiovascular Computed Tomography. J Thorac Cardiovasc Surg 2012;59(9):857–881. [DOI] [PubMed] [Google Scholar]
  • 55.Patel MR, Calhoon JH, Dehmer GJ, et al. ACC/AATS/AHA/ASE/ASNC/SCAI/SCCT/STS 2017 Appropriate Use Criteria for Coronary Revascularization in Patients With Stable Ischemic Heart Disease: A Report of the American College of Cardiology Appropriate Use Criteria Task Force, American Association for Thoracic Surgery, American Heart Association, American Society of Echocardiography, American Society of Nuclear Cardiology, Society for Cardiovascular Angiography and Interventions, Society of Cardiovascular Computed Tomography, and Society of Thoracic Surgeons. J Am Coll Cardiol 2017. [DOI] [PubMed]
  • 56.Frangakis CE, Rubin DB. Addressing complications of intention-to-treat analysis in the combined presence of all-or-none treatment-noncompliance and subsequent missing outcomes. Biometrika 1999;86(2):365–379. [Google Scholar]
  • 57.van Buuren S Multiple imputation of discrete and continuous data by fully conditional specification. Stat Methods Med Res 2007;16(3):219–242. [DOI] [PubMed] [Google Scholar]
  • 58.van Buuren S, Groothuis-Oudshoorn K. mice: Multivariate Imputation by Chained Equations in R. J Stat Softw 2011;45(3):67. [Google Scholar]
  • 59.R: A language and environment for statistical computing. [computer program] Vienna, Austria: R Foundation for Statistical Computing; 2015. [Google Scholar]
  • 60.Rubin DB. Statistical Matching Using File Concatenation with Adjusted Weights and Multiple Imputations. J Bus Econ Stat 1986;4(1):87–94. [Google Scholar]
  • 61.Little RJA. Missing-Data Adjustments in Large Surveys. J Bus Econ Stat 1988;6(3):287–296. [Google Scholar]
  • 62.Brookhart MA, Schneeweiss S, Rothman KJ, Glynn RJ, Avorn J, Sturmer T. Variable selection for propensity score models. Am J Epidemiol 2006;163(12):1149–1156. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Crump RK, Hotz VJ, Imbens GW, Mitnik OA. Dealing with limited overlap in estimation of average treatment effects. Biometrika 2009;96(1):187–199. [Google Scholar]
  • 64.Rubin DB. Matching to Remove Bias in Observational Studies. Biometrics 1973;29(1):159–183. [Google Scholar]
  • 65.Rosenbaum PR. A Characterization of Optimal Designs for Observational Studies. J R Stat Soc Series B Stat Methodol 1991;53(3):597–610. [Google Scholar]
  • 66.Hansen BB, Klopfer SO. Optimal Full Matching and Related Designs via Network Flows. J Comput Graph Stat 2006;15(3):609–627. [Google Scholar]
  • 67.Rosenbaum PR, Rubin DB. Constructing a Control Group Using Multivariate Matched Sampling Methods That Incorporate the Propensity Score. Am Stat 1985;39(1):33–38. [Google Scholar]
  • 68.Marshall A, Altman DG, Holder RL, Royston P. Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines. BMC Med Res Methodol 2009;9(1):57. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Austin PC, Stuart EA. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Stat Med 2015;34(28):3661–3679. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Normand ST, Landrum MB, Guadagnoli E, et al. Validating recommendations for coronary angiography following acute myocardial infarction in the elderly: a matched analysis using propensity scores. J Clin Epidemiol 2001;54(4):387–398. [DOI] [PubMed] [Google Scholar]
  • 71.Trichon BH, Glower DD, Shaw LK, et al. Survival after coronary revascularization, with and without mitral valve surgery, in patients with ischemic mitral regurgitation. Circulation 2003;108 Suppl 1:II103–110. [DOI] [PubMed] [Google Scholar]
  • 72.Kang DH, Sun BJ, Kim DH, et al. Percutaneous versus surgical revascularization in patients with ischemic mitral regurgitation. Circulation 2011;124(11 Suppl):S156–162. [DOI] [PubMed] [Google Scholar]
  • 73.Fortuna D, Nicolini F, Guastaroba P, et al. Coronary artery bypass grafting vs percutaneous coronary intervention in a ‘real-world’ setting: a comparative effectiveness study based on propensity score-matched cohorts. Eur J Cardiothorac Surg 2013;44(1):e16–24. [DOI] [PubMed] [Google Scholar]
  • 74.Hlatky MA, Boothroyd DB, Baker L, et al. Comparative effectiveness of multivessel coronary bypass surgery and multivessel percutaneous coronary intervention: a cohort study. Ann Intern Med 2013;158(10):727–734. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Castleberry AW, Williams JB, Daneshmand MA, et al. Surgical revascularization is associated with maximal survival in patients with ischemic mitral regurgitation: a 20-year experience. Circulation 2014;129(24):2547–2556. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Marui A, Kimura T, Nishiwaki N, et al. Comparison of five-year outcomes of coronary artery bypass grafting versus percutaneous coronary intervention in patients with left ventricular ejection fractions ≤50% versus >50% (from the CREDO-Kyoto PCI/CABG Registry Cohort-2). Am J Cardiol 2014;114(7):988–996. [DOI] [PubMed] [Google Scholar]
  • 77.Samad Z, Shaw LK, Phelan M, et al. Management and outcomes in patients with moderate or severe functional mitral regurgitation and severe left ventricular dysfunction. Eur Heart J 2015;36(40):2733–2741. [DOI] [PubMed] [Google Scholar]
  • 78.Tannen RL, Weiner MG, Xie D, Barnhart K. Perspectives on hormone replacement therapy: the Women’s Health Initiative and new observational studies sampling the overall population. Fertil Steril 2008;90(2):258–264. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Grodstein F, Stampfer MJ, Manson JE, et al. Postmenopausal estrogen and progestin use and the risk of cardiovascular disease. N Engl J Med 1996;335(7):453–461. [DOI] [PubMed] [Google Scholar]
  • 80.Grodstein F, Manson JE, Colditz GA, Willett WC, Speizer FE, Stampfer MJ. A prospective, observational study of postmenopausal hormone therapy and primary prevention of cardiovascular disease. Ann Intern Med 2000;133(12):933–941. [DOI] [PubMed] [Google Scholar]
  • 81.Hulley S, Grady D, Bush T, et al. Randomized trial of estrogen plus progestin for secondary prevention of coronary heart disease in postmenopausal women. Heart and Estrogen/progestin Replacement Study (HERS) Research Group. JAMA 1998;280(7):605–613. [DOI] [PubMed] [Google Scholar]
  • 82.Rossouw JE, Anderson GL, Prentice RL, et al. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results From the Women’s Health Initiative randomized controlled trial. JAMA 2002;288(3):321–333. [DOI] [PubMed] [Google Scholar]
  • 83.Anderson GL, Limacher M, Assaf AR, et al. Effects of conjugated equine estrogen in postmenopausal women with hysterectomy: the Women’s Health Initiative randomized controlled trial. JAMA 2004;291(14):1701–1712. [DOI] [PubMed] [Google Scholar]
  • 84.Grodstein F, Clarkson TB, Manson JE. Understanding the divergent data on postmenopausal hormone therapy. N Engl J Med 2003;348(7):645–650. [DOI] [PubMed] [Google Scholar]
  • 85.Hernan MA, Alonso A, Logan R, et al. Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease. Epidemiology 2008;19(6):766–779. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Blackstone EH. Comparing apples and oranges. J Thorac Cardiovasc Surg 2002;123(1):8–15. [DOI] [PubMed] [Google Scholar]
  • 87.Rodriguez AE, Baldi J, Fernandez Pereira C, et al. Five-year follow-up of the Argentine randomized trial of coronary angioplasty with stenting versus coronary bypass surgery in patients with multiple vessel disease (ERACI II). J Am Coll Cardiol 2005;46(4):582–588. [DOI] [PubMed] [Google Scholar]
  • 88.Serruys PW, Ong AT, van Herwerden LA, et al. Five-year outcomes after coronary stenting versus bypass surgery for the treatment of multivessel disease: the final analysis of the Arterial Revascularization Therapies Study (ARTS) randomized trial. J Am Coll Cardiol 2005;46(4):575–581. [DOI] [PubMed] [Google Scholar]
  • 89.Booth J, Clayton T, Pepper J, et al. Randomized, controlled trial of coronary artery bypass surgery versus percutaneous coronary intervention in patients with multivessel coronary artery disease: six-year follow-up from the Stent or Surgery Trial (SoS). Circulation 2008;118(4):381–388. [DOI] [PubMed] [Google Scholar]
  • 90.Hueb W, Lopes N, Gersh BJ, et al. Ten-year follow-up survival of the Medicine, Angioplasty, or Surgery Study (MASS II): a randomized controlled clinical trial of 3 therapeutic strategies for multivessel coronary artery disease. Circulation 2010;122(10):949–957. [DOI] [PubMed] [Google Scholar]
  • 91.Hoffman SN, TenBrook JA, Wolf MP, Pauker SG, Salem DN, Wong JB. A meta-analysis of randomized controlled trials comparing coronary artery bypass graft with percutaneous transluminal coronary angioplasty: one- to eight-year outcomes. J Am Coll Cardiol 2003;41(8):1293–1304. [DOI] [PubMed] [Google Scholar]
  • 92.Bravata DM, Gienger AL, McDonald KM, et al. Systematic review: the comparative effectiveness of percutaneous coronary interventions and coronary artery bypass graft surgery. Ann Intern Med 2007;147(10):703–716. [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplemental figure 1

Supplementary Figure 1. Estimated survivals under 5 different methods for patients with two-vessel disease.

Supplemental figure 2

Supplementary Figure 2. Estimated survivals under 5 different methods for patients with three-vessel disease.

Supplemental document

Supplementary Table S1. Comparison of demographics before and after trimming for patients with 2- or 3-vessel disease.

Supplementary Table S2. Comparison of restricted mean survival times (RMST) estimated from full matching within and across, and IPTW within and across approaches.

Data Availability Statement

The data that support the findings of this study are not shared. However, a de-identified version of the Duke Databank for Cardiovascular Disease (DDCD), the DukeCath analysis dataset, is available from Duke if the research proposal is approved by the Duke review committee.
