Author manuscript; available in PMC: 2018 Jan 1.
Published in final edited form as: Epidemiology. 2017 Jan;28(1):20–27. doi: 10.1097/EDE.0000000000000565

Bias due to confounders for the exposure-competing risk relationship

Catherine R Lesko 1, Bryan Lau 1
PMCID: PMC5489237  NIHMSID: NIHMS866998  PMID: 27748680

Abstract

Background

Epidemiologic studies that aim to estimate a causal effect of an exposure on a particular event of interest may be complicated by the existence of competing events that preclude the occurrence of the primary event. Recently, many articles have been published in the epidemiologic literature demonstrating the need for appropriate models to accommodate competing risks when they are present. However, there has been little attention to variable selection for confounder control in competing risk analyses.

Methods

We employ simulation to demonstrate the bias in two variable selection strategies: include covariates that are associated with the exposure and 1) which change the cause-specific hazard of any of the outcomes; or 2) which change the cause-specific hazard of the specific event of interest.

Results

We demonstrated minimal to no bias in estimators adjusted for confounders of exposure and either the event of interest or the competing event, but bias of varying magnitude in almost all estimators adjusted only for confounders of exposure and the primary outcome.

Discussion

When estimating causal effects for which there are competing risks, the analysis should control for confounders of both the exposure–primary outcome effect and of the exposure–competing outcome effect.


In many epidemiologic studies, the event of interest may be precluded by another event, termed a competing risk. The majority of the epidemiologic literature on competing risks has focused on explaining why and how competing risks should be incorporated into epidemiologic analyses,1–5 prediction,6 and inference when the cause of failure is misclassified or incompletely recorded.7 However, there has been little attention given to estimation of causal effects in the presence of competing risks, and in particular to variable selection for confounder control.

Recent work on model specification in the presence of competing risks has focused on the development of stepwise variable selection procedures based on information criteria,8 score statistics,9 or various penalized likelihoods.10,11 However, the field of epidemiology has largely moved away from automated variable selection procedures,12–15 in favor of the use of background knowledge encoded in directed acyclic graphs (DAGs) to identify a set of variables sufficient for confounder control.15–17 Our objective was to show that estimation of the cumulative incidence function will be biased when a confounder of the effect of exposure on a competing event is ignored. Furthermore, estimands based upon the cumulative incidence function, including the subdistribution proportional hazard ratio, will also be biased.18 We illustrate the bias with a simulation.

METHODS

Motivating example

Imagine an observational study of a hypothetical drug that alleviates symptoms of chronic obstructive pulmonary disease (COPD) but also increases mortality. A standard analysis might censor individuals when they die or report the incidence of a composite outcome. However, censoring individuals who die will overestimate the probability of chronic obstructive pulmonary disease (COPD) remission when deaths are not rare. Furthermore, it does not make sense to combine remission (a desirable outcome) with death (an undesirable outcome). Even when both outcomes are desirable or undesirable, analyzing a composite outcome results in a loss of information.19 An analysis that explicitly incorporates competing risks (e.g., one that employs the Fine and Gray subdistribution hazard model18 or that estimates the cumulative incidence function non-parametrically1) accounts for both the “direct” effect of the drug on the probability of COPD remission (because treated individuals have a higher hazard of COPD remission), and the “indirect” effect (because individuals who die are no longer at risk for COPD remission).

In the presence of competing risks, there are at least as many causal estimands as there are competing events. For example, consider the difference in the cumulative incidence of remission due to drug, $P(T^{a=1} < t, J^{a=1} = 1) - P(T^{a=0} < t, J^{a=0} = 1)$, and the difference in the cumulative incidence of death due to drug, $P(T^{a=1} < t, J^{a=1} = 2) - P(T^{a=0} < t, J^{a=0} = 2)$. Here, $P(\cdot)$ denotes probability; T is the composite event time; A denotes treatment type; and J distinguishes event types. $T^a$ and $J^a$ denote the composite event time and event type that we would have seen under treatment a (that is, potential outcomes). Following convention, we denote random variables with capital letters and possible realizations of random variables with lower case letters. We borrow potential outcomes notation from Cole et al (2015).1 In eAppendix A, we present an extension of potential outcomes notation that combines an event indicator with an event type indicator at a particular point in time, which allows incorporation of Greenland’s causal response types to the competing risk setting.20

Brief introduction to competing risk

Complete introductions to competing risks have previously been published.1,2,21 Nevertheless we reiterate some concepts for completeness. We limit discussion to two competing events; however, methods are easily extended to settings with more than two competing events.

The cumulative incidence function is perhaps the most natural estimand in the presence of competing risks and is defined:

$$F_j^*(t) = P(T \le t, J = j) \tag{1}$$

where $F_j^*$ denotes the cumulative incidence function for the jth event type, j = 1,…,J; T and J are defined as above; and the asterisk distinguishes the cumulative incidence function from the conditional risk function estimated in standard survival analyses (e.g., complement of a Kaplan-Meier curve). Causal estimates can be generated by estimating the cumulative incidence function for each level of exposure, then taking a difference or ratio of those estimates, e.g., $F^*_{j,a=1} - F^*_{j,a=0}$. Contrast the cumulative incidence function with the conditional risk (where competing events are censored):

$$F_j(t) = P(T' \le t) \tag{2}$$

In (2), $T'$ denotes time to event j. We explicitly embrace the term conditional risk function22 to highlight the assumption embedded in the risk function when competing events are censored; the conditional risk function is the risk of the outcome in a world in which all competing risks have been eliminated (without changing the cause-specific hazard of the event of interest). Imagining an intervention that would result in such a world is typically difficult, if not impossible. This assumption is also inherent in our definition of $T'$. $T'$ does not exist for people who get the competing event; estimators of the conditional risk impute $T'$ for people with the competing risk when they are treated as censored, despite the fact that by definition experiencing a competing event precludes the occurrence of event j. Therefore the conditional risk is rarely of interest because of its lack of grounding in reality. Furthermore, in the presence of competing risks, $\sum_{j=1}^{J} F_j(t)$ may exceed 1, violating the rule of coherence.23
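As a minimal sketch of the contrast between Equations 1 and 2, the following Python computes both quantities in a small hypothetical cohort of eight individuals with no censoring. The event times and types are invented for illustration and the function names are ours. The cumulative incidence is taken as the observed proportion with each event type, and the conditional risk as the complement of a Kaplan-Meier estimator that treats the other event type as censored.

```python
# Hypothetical cohort: event times and event types (1 or 2), no censoring.
times = [1, 2, 3, 4, 5, 6, 7, 8]
types = [1, 2, 1, 2, 1, 2, 1, 2]

def cumulative_incidence(times, types, j, t):
    """Proportion who had event type j by time t (valid without censoring)."""
    return sum(1 for T, J in zip(times, types) if T <= t and J == j) / len(times)

def conditional_risk(times, types, j, t):
    """1 - Kaplan-Meier survival, treating other event types as censored."""
    surv = 1.0
    event_times = sorted({T for T, J in zip(times, types) if J == j and T <= t})
    for u in event_times:
        at_risk = sum(1 for T in times if T >= u)
        events = sum(1 for T, J in zip(times, types) if T == u and J == j)
        surv *= 1 - events / at_risk
    return 1 - surv

t_end = 8
cif_total = sum(cumulative_incidence(times, types, j, t_end) for j in (1, 2))
conditional_total = sum(conditional_risk(times, types, j, t_end) for j in (1, 2))
print(cif_total, conditional_total)
```

In this toy cohort the two cumulative incidences sum to 1, whereas the two conditional risks sum to more than 1.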

The cumulative incidence function is a function of the cause-specific hazards:24,25

$$F_j^*(t) = \int_0^t S(u^-)\, h_j(u)\, du = \int_0^t \exp\left(-\int_0^u \sum_{k=1}^{J} h_k(x)\, dx\right) h_j(u)\, du \tag{3}$$

where $S(u^-)$ is the survival function from all events (i.e., from the composite outcome) as it approaches u from the left and $h_j(t)$ denotes the cause-specific hazard for outcome j at time t:

$$h_j(t) = \lim_{\Delta t \to 0} \left\{ \frac{P(t < T \le t + \Delta t, J = j \mid T > t)}{\Delta t} \right\} \tag{4}$$

The cause-specific hazard includes individuals who have survived from all events to time t in the risk set. Informally, (3) shows that the cumulative incidence function for event j is obtained by partitioning the cumulative incidence function for the composite event according to the relative magnitude of the cause-specific hazards. Importantly, one can see that the cumulative incidence function relies on the survival function, which is itself a function of the sum of the cause-specific hazards for both the primary event of interest and the competing event(s).
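To make the partitioning in Equation 3 concrete, the sketch below evaluates the integral numerically under the simplifying assumption of constant cause-specific hazards and compares it with the closed form that assumption implies. The hazard values and function names are hypothetical and are not taken from the simulation reported here.

```python
import math

def cif_numeric(h_j, all_hazards, t, steps=20_000):
    """Midpoint-rule approximation of Equation 3 with constant hazards."""
    total = sum(all_hazards)
    du = t / steps
    area = 0.0
    for k in range(steps):
        u = (k + 0.5) * du
        # S(u) = exp(-total * u) is overall survival from all event types
        area += math.exp(-total * u) * h_j * du
    return area

def cif_closed_form(h_j, all_hazards, t):
    """Closed form of Equation 3 when every cause-specific hazard is constant."""
    total = sum(all_hazards)
    return h_j / total * (1 - math.exp(-total * t))

h1, h2 = 0.01, 0.02      # hypothetical cause-specific hazards
t_end = 200.0
numeric = cif_numeric(h1, [h1, h2], t_end)
closed = cif_closed_form(h1, [h1, h2], t_end)
print(numeric, closed)
```

Under these assumptions the cumulative incidence of each event is the composite cumulative incidence, 1 − S(t), multiplied by that event's share of the total hazard.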

Because there is not a one-to-one relationship between the cause-specific hazard ratios and the relative cumulative incidence functions, Fine and Gray introduced the subdistribution hazards model. The risk set for the subdistribution hazard includes individuals who have survived until time t and those who failed due to the competing event prior to t.18 The subdistribution hazard is defined:18

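$$\lambda_j(t) = \lim_{\Delta t \to 0} \left\{ \frac{P[t < T \le t + \Delta t, J = j \mid T > t \cup (T < t \cap J \ne j)]}{\Delta t} \right\} \tag{5}$$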

The cumulative incidence function is directly estimable from the subdistribution hazards:18

$$F_j^*(t) = 1 - \exp\left(-\int_0^t \lambda_j(u)\, du\right) \tag{6}$$
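The relationship in Equation 6 can be checked numerically. The sketch below assumes constant cause-specific hazards (hypothetical values, not those of our simulation), derives the subdistribution hazard as the derivative of the cumulative incidence function divided by its complement, and then recovers the cumulative incidence function from that hazard. It also compares the subdistribution hazard ratio early and late in follow-up.

```python
import math

def cif(h1, h2, t):
    """Cumulative incidence of event 1 with constant cause-specific hazards."""
    total = h1 + h2
    return h1 / total * (1 - math.exp(-total * t))

def subdist_hazard(h1, h2, t):
    """lambda_1(t) = dF_1(t)/dt divided by 1 - F_1(t)."""
    return h1 * math.exp(-(h1 + h2) * t) / (1 - cif(h1, h2, t))

def cif_from_subdist(h1, h2, t, steps=20_000):
    """Equation 6: 1 - exp(-cumulative subdistribution hazard), midpoint rule."""
    du = t / steps
    cum = sum(subdist_hazard(h1, h2, (k + 0.5) * du) for k in range(steps)) * du
    return 1 - math.exp(-cum)

# Hypothetical hazards: exposure halves h1 and doubles h2.
unexposed = (0.010, 0.005)
exposed = (0.005, 0.010)

ratio_early = subdist_hazard(*exposed, 1.0) / subdist_hazard(*unexposed, 1.0)
ratio_late = subdist_hazard(*exposed, 200.0) / subdist_hazard(*unexposed, 200.0)
print(ratio_early, ratio_late)
```

Although the cause-specific hazard ratios are constant by construction in this sketch, the subdistribution hazard ratio drifts over follow-up, so a single reported subdistribution hazard ratio is best read as a time-averaged quantity.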

In the presence of confounding, the cumulative incidence functions for each level of exposure can be estimated nonparametrically (or semiparametrically, depending on the formulation of the weights) by estimating cause-specific or subdistribution hazards and applying (3) or (6) above, weighting each observation by the inverse probability of exposure.26 Cumulative incidence functions could also be estimated parametrically.27–29 Interpretation of these functions should be complemented by examination of the cause-specific hazards for all events.3 If the exposure effects on the cause-specific hazard ratios are in the same direction for competing events, the exposure effects on the cumulative incidence functions are less predictable.30
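A minimal sketch of the weighted nonparametric approach, assuming no censoring and a single binary confounder, is given below. The eight records are hypothetical, and the weights are estimated from stratum counts rather than from a fitted exposure model, which is a simplification of what an applied analysis would typically do.

```python
from collections import Counter

# Hypothetical records: (exposure a, confounder z, event time t, event type j).
records = [
    (1, 0, 3, 1),
    (0, 0, 2, 1), (0, 0, 4, 2), (0, 0, 9, 1),
    (1, 1, 1, 1), (1, 1, 5, 2), (1, 1, 6, 1),
    (0, 1, 7, 2),
]

def exposure_weights(rows):
    """Inverse probability of the observed exposure given z, from stratum counts."""
    n_z = Counter(z for _, z, _, _ in rows)
    n_az = Counter((a, z) for a, z, _, _ in rows)
    # 1 / P(A = a | Z = z) = n_z / n_az
    return [n_z[z] / n_az[(a, z)] for a, z, _, _ in rows]

def weighted_cif(rows, weights, a, j, t):
    """Weighted proportion with event j by time t among those with exposure a."""
    num = sum(w for (ai, _, ti, ji), w in zip(rows, weights)
              if ai == a and ji == j and ti <= t)
    den = sum(w for (ai, _, _, _), w in zip(rows, weights) if ai == a)
    return num / den

weights = exposure_weights(records)
crude = weighted_cif(records, [1.0] * len(records), a=1, j=1, t=6)
weighted = weighted_cif(records, weights, a=1, j=1, t=6)
print(crude, weighted)
```

In this toy example the weighted estimate equals the cumulative incidence among the exposed standardized to the distribution of z in the full cohort, which differs from the crude proportion because exposure is more common when z = 1.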

The cause-specific hazards can be estimated from a Cox proportional hazards model, censoring individuals who fail with the competing event:

$$h_j(t \mid z) = h_{0j}(t) \exp(z^T \beta_j) \tag{7}$$

where $h_{0j}(t)$ is the unspecified baseline cause-specific hazard, z a vector of covariates, and $\beta_j$ the corresponding vector of regression coefficients such that $\exp(\beta_j)$ are cause-specific hazard ratios associated with z. In the subdistribution proportional hazards model individuals who experience the competing event remain in the risk set until the end of follow-up and censored individuals are partitioned across the cumulative incidence functions for all the event types:

$$\lambda_j(t \mid z) = \lambda_{0j}(t) \exp(z^T \phi_j) \tag{8}$$

where $\lambda_{0j}$ is the unspecified baseline subdistribution hazard and $\phi_j$ the corresponding vector of regression coefficients such that $\exp(\phi_j)$ are subdistribution hazard ratios associated with z. Confounders can be included in either model, resulting in a covariate conditional hazard ratio due to exposure, or confounding can be controlled with inverse probability exposure weights, resulting in a marginal hazard ratio due to exposure. The covariate conditional and marginal hazard ratios may not be equal because the hazard ratio is a non-collapsible estimator.31
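Non-collapsibility can be illustrated without any confounding. The sketch below assumes a single event type, exponential event times, and a binary covariate Z that is independent of exposure; all parameter values are hypothetical. The conditional hazard ratio is fixed at 0.5 within each level of Z, and the marginal hazard is computed exactly from the mixture of the two strata.

```python
import math

def marginal_hazard(t, hazards, probs):
    """Hazard at time t in a population that mixes exponential strata."""
    num = sum(p * h * math.exp(-h * t) for h, p in zip(hazards, probs))
    den = sum(p * math.exp(-h * t) for h, p in zip(hazards, probs))
    return num / den

# Hypothetical values: baseline hazard, conditional exposure HR, covariate HR.
base, hr_exposure, hr_z = 0.01, 0.5, 4.0
probs = [0.5, 0.5]   # P(Z = 0), P(Z = 1); identical in both exposure groups

def marginal_hr(t):
    """Marginal hazard ratio for exposure at time t."""
    unexposed = [base, base * hr_z]
    exposed = [base * hr_exposure, base * hr_exposure * hr_z]
    return marginal_hazard(t, exposed, probs) / marginal_hazard(t, unexposed, probs)

print(marginal_hr(0.0), marginal_hr(100.0))
```

At time zero the marginal and conditional hazard ratios coincide; later in follow-up the marginal hazard ratio moves toward the null because high-hazard individuals are depleted faster among the unexposed.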

Variable selection strategies

When etiologic or interventional parameters are of interest, the purpose of variable selection is to block non-causal pathways between the exposure and outcome under study,14–16,32 rather than to maximize a model’s predictive ability. Strategies put forward for variable selection for confounder control include identifying a minimally sufficient set of covariates for d-separation between exposure and outcome on a directed acyclic graph17 (considered in more detail in the Discussion) and identifying confounders based on a set of criteria. One set of criteria for identifying confounders includes: confounders must be 1) associated with exposure, 2) either a true cause or a surrogate of a true cause of the outcome, and 3) not affected by exposure.33,34 Another set of criteria states that if any set of covariates suffices to control confounding, selecting all pretreatment variables that cause exposure or cause the outcome will also control confounding.32 However, existing strategies only reference the relationship between covariates and one outcome; to our knowledge, there is no guidance on how to handle covariates associated with exposure and competing outcomes. One strategy for selecting covariates for confounder control would be to include confounders of only the outcome of interest. A second, more inclusive strategy would be to also include confounders of the competing event.

Simulation

We simulated 1,000 cohorts of 1,000 individuals each, in which we estimated the effect of a dichotomous exposure, A, on an outcome of interest, j = 1, in the presence of a competing event, j = 2, and two dichotomous confounders: Z1, a confounder of the cause-specific hazard ratio for A on j = 1, and Z2, a confounder of the cause-specific hazard ratio for A on j = 2 (henceforth, a confounder of A on j = 1 and a confounder of A on j = 2, respectively). Discussions of confounders are typically not specific as to the estimand of interest, but previous work has shown that presence of confounding can depend on the outcome parameter of interest.35 In this paper we generate confounding by simulating variables that change the odds of the exposure log-linearly and that change the cause-specific hazard of one of the two simulated events (but not the other) log-linearly. Details of the simulated data structure are provided in eAppendix B.
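The data structure described above can be sketched as follows. This is not the code in eAppendix B: the covariate prevalences, the exposure-model intercept, the baseline hazards, and the use of independent latent exponential times (one way to obtain constant cause-specific hazards) are all assumptions made for illustration. Only the base-case odds ratios and cause-specific hazard ratios (2.0, with a hazard ratio of 0.5 for A on j = 1) follow the values reported in Table 1.

```python
import math
import random

def simulate_cohort(n, seed, or_z=2.0, hr_z=2.0, hr_a1=0.5, hr_a2=2.0,
                    base1=0.005, base2=0.005):
    """Simulate one cohort with exposure A, confounders Z1 and Z2, and two events."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        z1 = int(rng.random() < 0.5)            # assumed prevalence of 0.5
        z2 = int(rng.random() < 0.5)
        # Z1 and Z2 each multiply the odds of exposure by or_z (assumed intercept).
        logit = -math.log(or_z) + math.log(or_z) * (z1 + z2)
        a = int(rng.random() < 1 / (1 + math.exp(-logit)))
        # Z1 changes only the hazard of event 1; Z2 only the hazard of event 2.
        h1 = base1 * hr_a1 ** a * hr_z ** z1
        h2 = base2 * hr_a2 ** a * hr_z ** z2
        t1 = rng.expovariate(h1)                # latent time to event 1
        t2 = rng.expovariate(h2)                # latent time to event 2
        t, j = (t1, 1) if t1 < t2 else (t2, 2)  # first event is the one observed
        cohort.append({"z1": z1, "z2": z2, "a": a, "t": t, "j": j})
    return cohort

cohort = simulate_cohort(n=1000, seed=1)
print(len(cohort), cohort[0])
```

Calling the function with a different seed for each of 1,000 replicates would give a set of cohorts comparable in structure, though not in detail, to those analyzed here.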

To establish values for the truth for each estimand, we generated deterministic potential outcomes for the 1,000,000 individuals across all simulations and cohorts, then calculated the value of each estimand using the 2,000,000 potential outcomes (one for each of the two treatments).36 Because the hazard ratio is non-collapsible,31,35 we calculated true marginal and covariate conditional hazard ratios to contrast with inverse probability exposure weighted and covariate conditional estimators, by including only exposure, or exposure and covariates, respectively, in models fitted on the simulated potential outcomes. To calculate cause-specific hazard ratios, we fit Cox proportional hazards models37 and censored individuals experiencing the competing event. To calculate subdistribution hazard ratios, we fit Cox proportional hazards models and set follow-up time to the end of follow-up for individuals experiencing the competing event; this is the subdistribution proportional hazards model in the absence of censoring.18 We assumed no censoring in our simulation, as it would have complicated calculations without changing any conclusions. We calculated the true conditional risk functions and cumulative incidence function nonparametrically using the simulated potential outcomes.
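The two data layouts described above differ only in how individuals with the competing event are recoded before a Cox model is fit. A minimal sketch, assuming no censoring and a known end of follow-up (the function names and the example rows are hypothetical), is:

```python
def cause_specific_rows(rows, event_of_interest):
    """Censor individuals with the competing event at their event time."""
    return [(t, 1 if j == event_of_interest else 0) for t, j in rows]

def subdistribution_rows(rows, event_of_interest, end_of_followup):
    """Keep individuals with the competing event at risk until end of follow-up."""
    recoded = []
    for t, j in rows:
        if j == event_of_interest:
            recoded.append((t, 1))
        else:
            # Competing event: follow-up time is reset to the end of follow-up.
            recoded.append((end_of_followup, 0))
    return recoded

# Hypothetical (time, event type) pairs.
rows = [(10.0, 1), (20.0, 2), (30.0, 1)]
cs = cause_specific_rows(rows, event_of_interest=1)
sd = subdistribution_rows(rows, event_of_interest=1, end_of_followup=200.0)
print(cs)
print(sd)
```

Either recoded data set can then be passed to standard Cox regression software as (time, event indicator) pairs. With censoring present, the subdistribution model would additionally require censoring weights, which this sketch does not attempt.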

In each simulated cohort, we estimated the cause-specific hazard ratio and subdistribution hazard ratio from a Cox model and Fine and Gray model, respectively. We controlled for different sets of confounders using covariate adjustment and inverse probability exposure weights.38 The conditional risk function and cumulative incidence function were estimated using inverse probability exposure weights,26 and incidence was read off of those curves at t = 200 to estimate risk differences. Calculated risk differences at other times yielded substantively similar results; we present risk differences at only one time point (the end of follow-up) to simplify results. Within-simulation standard error for the inverse probability exposure weighted estimator (for calculating 95% confidence interval coverage) of the cause-specific and subdistribution hazard ratios was estimated using the robust variance. Within-simulation standard error for risk differences was estimated using the standard deviation of estimates from 200 bootstrap samples, sampled with replacement within each simulated cohort. We report the bias and percent bias averaged across all 1000 simulations, the average within-simulation standard error of the 1000 estimates, the average mean squared error (MSE), which we calculated as $[\mathrm{Bias}(\hat{\theta}, \theta)]^2 + \mathrm{Var}(\hat{\theta})$ where $\mathrm{Var}(\hat{\theta})$ is the Monte Carlo variance, and the percent coverage averaged across all 1000 simulations.
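The performance measures listed above can be computed from the per-simulation estimates and standard errors as sketched below. The function name, the use of the sample variance for the Monte Carlo variance, and Wald-type intervals with a multiplier of 1.96 are our assumptions for illustration; the three estimates in the example are invented.

```python
import statistics

def summarize(estimates, std_errors, truth, z=1.96):
    """Bias, percent bias, average SE, MSE, and 95% CI coverage across simulations."""
    bias = statistics.fmean(estimates) - truth
    mc_variance = statistics.variance(estimates)   # Monte Carlo (sample) variance
    covered = [est - z * se <= truth <= est + z * se
               for est, se in zip(estimates, std_errors)]
    return {
        "bias": bias,
        "pct_bias": 100 * bias / truth,             # undefined when truth is 0
        "avg_se": statistics.fmean(std_errors),
        "mse": bias ** 2 + mc_variance,
        "coverage": 100 * sum(covered) / len(covered),
    }

# Hypothetical log hazard ratio estimates from three simulated cohorts.
result = summarize(estimates=[-0.7, -0.6, -0.8],
                   std_errors=[0.1, 0.1, 0.1],
                   truth=-0.7)
print(result)
```

Because percent bias divides by the truth, a positive bias on a negative log hazard ratio is reported as a negative percent bias, which is the sign convention in the tables.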

RESULTS

We present estimates from the simulation in tables 1–6. For all parameter values we investigated, there was minimal to no bias in any estimators when we adjusted for both Z1 and Z2. There were, however, varying degrees of bias in the covariate-adjusted subdistribution hazard ratios for J = 1 when we adjusted only for Z1. The percent bias was −14.8 in the base case (table 1; odds ratios for association between Zs and A, and cause-specific hazard ratio for association between Zs and T = 2.0). Percent bias increased to −19.2 when the odds ratios for association between Zs and A increased from 2.0 to 4.0 (table 2) and increased to −29.6 when the cause-specific hazard ratio for association between Zs and T increased from 2.0 to 4.0 (table 3). The bias remained when the cause-specific hazard ratio for A on T was set to 1.0 (null; table 4) and when the cause-specific hazard ratio for A on T was increased to 4.0 (table 5). Finally, in table 6, we present results from a simulation with strong associations between covariates Z1 and Z2 and A and the cause-specific hazard ratios for T, to show that (although the set-up may be more extreme) there is bias in almost all the estimators adjusting only for Z1 (i.e., adjusting only for confounders specific to the event under study and ignoring confounders of the competing event). In table 6, the percent bias in the subdistribution hazard ratio jumped to −44.3, while it was 11.0 and 9.5 for the inverse probability of exposure weighted estimator of the subdistribution hazard ratio and cumulative incidence difference, respectively.

Table 1.

Simulation results, estimands for event of type J = 1, true cause-specific $\mathrm{HR}_{J=2}$ = 2.0, true association between Z1 and A OR = 2.0, true association between Z2 and A OR = 2.0, true cause-specific HR = 2.0 for associations between Z1 and event J = 1 and between Z2 and event J = 2

Each cell gives the value for Z1 & Z2, then for Z1 only. Truth is conditional on the listed covariates^b; all other columns are for estimators controlling for the listed covariates. Rows marked ^c have a single marginal truth.

| Parameter | Truth^b | Avg. Bias | Avg. Pct. Bias | Avg. Std. Error | MSE^a | 95% Coverage |
|---|---|---|---|---|---|---|
| Cause-specific log(HR), covariate adjusted | −0.692 / −0.692 | −0.006 / −0.005 | 0.8 / 0.8 | 0.128 / 0.126 | 0.019 / 0.019 | 96.7 / 96.3 |
| Cause-specific log(HR), IPEW estimator^c | −0.679 | −0.004 / −0.004 | 0.6 / 0.6 | 0.128 / 0.126 | 0.022 / 0.051 | 97.1 / 97.1 |
| RD,^d conditional risk^c | −0.168 | −0.000 / −0.000 | 0.1 / 0.2 | 0.031 / 0.030 | 0.001 / 0.001 | 96.9 / 96.8 |
| Subdistribution log(HR),^e covariate adjusted | −0.746 / −0.733 | −0.007 / 0.109 | 1.0 / −14.8 | 0.128 / 0.126 | 0.015 / 0.027 | 96.1 / 87.5 |
| Subdistribution log(HR),^e IPEW estimator^c | −0.733 | −0.005 / −0.016 | 0.7 / 2.2 | 0.128 / 0.126 | 0.015 / 0.015 | 96.6 / 96.9 |
| RD,^d CIF^c | −0.170 | −0.001 / −0.003 | 0.3 / 1.8 | 0.029 / 0.020 | 0.001 / 0.000 | 96.4 / 96.6 |

Abbreviations: Avg., average; CIF, cumulative incidence function; HR, hazard ratio; IPEW, inverse probability of exposure weighted; MSE, mean squared error; OR, odds ratio; Pct., percent; RD, risk difference

^a Mean squared error. $\mathrm{MSE}(\hat{\theta}) = [\mathrm{Bias}(\hat{\theta}, \theta)]^2 + \mathrm{Var}(\hat{\theta})$, where $\mathrm{Var}(\hat{\theta})$ is the Monte Carlo variance.

^b Truth varies according to variables included in the model for the HR because it is a non-collapsible measure.

^c Truth is marginal (not conditional on any covariates).

^d Difference in cumulative incidence functions or conditional risk functions estimated at t = 200.

^e Because cause-specific hazards were simulated to be proportional, subdistribution hazards are not proportional. Truth here is a time-averaged subdistribution hazard ratio.

Table 6.

Simulation results, estimands for event of type J = 1, true cause-specific $\mathrm{HR}_{J=2}$ = 4.0, true association between Z1 and A OR = 4.0, true association between Z2 and A OR = 4.0, true cause-specific HR = 4.0 for associations between Z1 and event J = 1 and between Z2 and event J = 2

Each cell gives the value for Z1 & Z2, then for Z1 only. Truth is conditional on the listed covariates^b; all other columns are for estimators controlling for the listed covariates. Rows marked ^c have a single marginal truth.

| Parameter | Truth^b | Avg. Bias | Avg. Pct. Bias | Avg. Std. Error | MSE^a | 95% Coverage |
|---|---|---|---|---|---|---|
| Cause-specific log(HR), covariate adjusted | −0.695 / −0.695 | −0.001 / −0.002 | 0.2 / 0.3 | 0.122 / 0.119 | 0.016 / 0.015 | 94.1 / 93.8 |
| Cause-specific log(HR), IPEW estimator^c | −0.625 | 0.001 / 0.010 | −0.2 / −1.6 | 0.127 / 0.118 | 0.014 / 0.013 | 96.4 / 96.4 |
| RD,^d conditional risk^c | −0.192 | 0.001 / 0.003 | −0.7 / −1.7 | 0.038 / 0.035 | 0.001 / 0.001 | 95.9 / 96.1 |
| Subdistribution log(HR),^e covariate adjusted | −0.962 / −0.890 | −0.005 / 0.395 | 0.5 / −44.3 | 0.125 / 0.119 | 0.017 / 0.170 | 94.1 / 10.1 |
| Subdistribution log(HR),^e IPEW estimator^c | −0.891 | −0.001 / −0.098 | 0.1 / 11.0 | 0.126 / 0.117 | 0.014 / 0.022 | 96.7 / 88.9 |
| RD,^d CIF^c | −0.242 | 0.001 / −0.023 | −0.3 / 9.5 | 0.032 / 0.029 | 0.001 / 0.001 | 96.8 / 89.3 |

Abbreviations: Avg., average; CIF, cumulative incidence function; HR, hazard ratio; IPEW, inverse probability of exposure weighted; MSE, mean squared error; OR, odds ratio; Pct., percent; RD, risk difference

^a Mean squared error. $\mathrm{MSE}(\hat{\theta}) = [\mathrm{Bias}(\hat{\theta}, \theta)]^2 + \mathrm{Var}(\hat{\theta})$, where $\mathrm{Var}(\hat{\theta})$ is the Monte Carlo variance.

^b Truth varies according to variables included in the model for the HR because it is a non-collapsible measure.

^c Truth is marginal (not conditional on any covariates).

^d Difference in cumulative incidence functions or conditional risk functions estimated at t = 50.

^e Because cause-specific hazards were simulated to be proportional, subdistribution hazards are not proportional. Truth here is a time-averaged subdistribution hazard ratio.

Table 2.

Simulation results, estimands for event of type J = 1, true cause-specific $\mathrm{HR}_{J=2}$ = 2.0, true association between Z1 and A OR = 4.0, true association between Z2 and A OR = 4.0, true cause-specific HR = 2.0 for associations between Z1 and event J = 1 and between Z2 and event J = 2

Each cell gives the value for Z1 & Z2, then for Z1 only. Truth is conditional on the listed covariates^b; all other columns are for estimators controlling for the listed covariates. Rows marked ^c have a single marginal truth.

| Parameter | Truth^b | Avg. Bias | Avg. Pct. Bias | Avg. Std. Error | MSE^a | 95% Coverage |
|---|---|---|---|---|---|---|
| Cause-specific log(HR), covariate adjusted | −0.692 / −0.692 | −0.008 / −0.007 | 1.1 / 1.1 | 0.138 / 0.129 | 0.017 / 0.016 | 95.4 / 95.7 |
| Cause-specific log(HR), IPEW estimator^c | −0.679 | −0.008 / −0.006 | 1.2 / 0.9 | 0.139 / 0.129 | 0.018 / 0.015 | 96.0 / 96.2 |
| RD,^d conditional risk^c | −0.168 | −0.001 / −0.001 | 0.8 / 0.6 | 0.033 / 0.031 | 0.001 / 0.001 | 95.7 / 96.1 |
| Subdistribution log(HR),^e covariate adjusted | −0.746 / −0.733 | −0.010 / 0.214 | 1.3 / −19.2 | 0.136 / 0.129 | 0.018 / 0.062 | 95.6 / 61.1 |
| Subdistribution log(HR),^e IPEW estimator^c | −0.733 | −0.009 / −0.028 | 1.3 / 3.9 | 0.139 / 0.129 | 0.018 / 0.016 | 96.1 / 95.9 |
| RD,^d CIF^c | −0.171 | −0.001 / −0.006 | 0.9 / 3.5 | 0.031 / 0.029 | 0.001 / 0.001 | 95.7 / 95.7 |

Abbreviations: Avg., average; CIF, cumulative incidence function; HR, hazard ratio; IPEW, inverse probability of exposure weighted; MSE, mean squared error; OR, odds ratio; Pct., percent; RD, risk difference

^a Mean squared error. $\mathrm{MSE}(\hat{\theta}) = [\mathrm{Bias}(\hat{\theta}, \theta)]^2 + \mathrm{Var}(\hat{\theta})$, where $\mathrm{Var}(\hat{\theta})$ is the Monte Carlo variance.

^b Truth varies according to variables included in the model for the HR because it is a non-collapsible measure.

^c Truth is marginal (not conditional on any covariates).

^d Difference in cumulative incidence functions or conditional risk functions estimated at t = 200.

^e Because cause-specific hazards were simulated to be proportional, subdistribution hazards are not proportional. Truth here is a time-averaged subdistribution hazard ratio.

Table 3.

Simulation results, estimands for event of type J = 1, true cause-specific $\mathrm{HR}_{J=2}$ = 2.0, true association between Z1 and A OR = 2.0, true association between Z2 and A OR = 2.0, true cause-specific HR = 4.0 for associations between Z1 and event J = 1 and between Z2 and event J = 2

Each cell gives the value for Z1 & Z2, then for Z1 only. Truth is conditional on the listed covariates^b; all other columns are for estimators controlling for the listed covariates. Rows marked ^c have a single marginal truth.

| Parameter | Truth^b | Avg. Bias | Avg. Pct. Bias | Avg. Std. Error | MSE^a | 95% Coverage |
|---|---|---|---|---|---|---|
| Cause-specific log(HR), covariate adjusted | −0.692 / −0.692 | −0.002 / −0.002 | 0.3 / 0.3 | 0.110 / 0.109 | 0.013 / 0.013 | 95.0 / 95.1 |
| Cause-specific log(HR), IPEW estimator^c | −0.621 | −0.001 / 0.000 | 0.1 / 0.1 | 0.110 / 0.108 | 0.010 / 0.010 | 96.4 / 96.5 |
| RD,^d conditional risk^c | −0.191 | 0.000 / 0.000 | −0.2 / −0.2 | 0.030 / 0.030 | 0.001 / 0.001 | 96.2 / 96.6 |
| Subdistribution log(HR),^e covariate adjusted | −0.768 / −0.704 | −0.005 / 0.208 | 0.6 / −29.6 | 0.111 / 0.109 | 0.013 / 0.056 | 95.1 / 51.7 |
| Subdistribution log(HR),^e IPEW estimator^c | −0.704 | −0.002 / −0.035 | 0.3 / 5.0 | 0.110 / 0.108 | 0.011 / 0.012 | 96.2 / 94.9 |
| RD,^d CIF^c | −0.200 | 0.000 / −0.009 | 0.0 / 4.7 | 0.030 / 0.030 | 0.001 / 0.001 | 96.1 / 94.6 |

Abbreviations: Avg., average; CIF, cumulative incidence function; HR, hazard ratio; IPEW, inverse probability of exposure weighted; MSE, mean squared error; OR, odds ratio; Pct., percent; RD, risk difference

^a Mean squared error. $\mathrm{MSE}(\hat{\theta}) = [\mathrm{Bias}(\hat{\theta}, \theta)]^2 + \mathrm{Var}(\hat{\theta})$, where $\mathrm{Var}(\hat{\theta})$ is the Monte Carlo variance.

^b Truth varies according to variables included in the model for the HR because it is a non-collapsible measure.

^c Truth is marginal (not conditional on any covariates).

^d Difference in cumulative incidence functions or conditional risk functions estimated at t = 200.

^e Because cause-specific hazards were simulated to be proportional, subdistribution hazards are not proportional. Truth here is a time-averaged subdistribution hazard ratio.

Table 4.

Simulation results, estimands for event of type J = 1, true cause-specific $\mathrm{HR}_{J=2}$ = 1.0, true association between Z1 and A OR = 2.0, true association between Z2 and A OR = 2.0, true cause-specific HR = 2.0 for associations between Z1 and event J = 1 and between Z2 and event J = 2

Each cell gives the value for Z1 & Z2, then for Z1 only. Truth is conditional on the listed covariates^b; all other columns are for estimators controlling for the listed covariates. Rows marked ^c have a single marginal truth.

| Parameter | Truth^b | Avg. Bias | Avg. Pct. Bias | Avg. Std. Error | MSE^a | 95% Coverage |
|---|---|---|---|---|---|---|
| Cause-specific log(HR), covariate adjusted | −0.691 / −0.691 | −0.001 / 0.000 | 0.1 / −0.1 | 0.125 / 0.124 | 0.017 / 0.017 | 94.7 / 93.9 |
| Cause-specific log(HR), IPEW estimator^c | −0.677 | 0.001 / 0.002 | −0.2 / −0.3 | 0.125 / 0.123 | 0.016 / 0.016 | 94.6 / 93.9 |
| RD,^d conditional risk^c | −0.168 | 0.001 / 0.001 | −0.5 / −0.6 | 0.030 / 0.030 | 0.001 / 0.001 | 94.5 / 93.8 |
| Subdistribution log(HR),^e covariate adjusted | −0.683 / −0.670 | −0.002 / 0.114 | 0.3 / −17.0 | 0.125 / 0.123 | 0.017 / 0.030 | 94.2 / 83.3 |
| Subdistribution log(HR),^e IPEW estimator^c | −0.670 | 0.000 / −0.007 | 0.0 / 1.0 | 0.125 / 0.123 | 0.016 / 0.016 | 94.4 / 94.3 |
| RD,^d CIF^c | −0.159 | 0.000 / −0.001 | −0.3 / 0.8 | 0.030 / 0.028 | 0.001 / 0.001 | 93.6 / 93.6 |

Abbreviations: Avg., average; CIF, cumulative incidence function; HR, hazard ratio; IPEW, inverse probability of exposure weighted; MSE, mean squared error; OR, odds ratio; Pct., percent; RD, risk difference

^a Mean squared error. $\mathrm{MSE}(\hat{\theta}) = [\mathrm{Bias}(\hat{\theta}, \theta)]^2 + \mathrm{Var}(\hat{\theta})$, where $\mathrm{Var}(\hat{\theta})$ is the Monte Carlo variance.

^b Truth varies according to variables included in the model for the HR because it is a non-collapsible measure.

^c Truth is marginal (not conditional on any covariates).

^d Difference in cumulative incidence functions or conditional risk functions estimated at t = 50.

^e Because cause-specific hazards were simulated to be proportional, subdistribution hazards are not proportional. Truth here is a time-averaged subdistribution hazard ratio.

Table 5.

Simulation results, estimands for event of type J = 1, true cause-specific $\mathrm{HR}_{J=2}$ = 4.0, true association between Z1 and A OR = 2.0, true association between Z2 and A OR = 2.0, true cause-specific HR = 2.0 for associations between Z1 and event J = 1 and between Z2 and event J = 2

Each cell gives the value for Z1 & Z2, then for Z1 only. Truth is conditional on the listed covariates^b; all other columns are for estimators controlling for the listed covariates. Rows marked ^c have a single marginal truth.

| Parameter | Truth^b | Avg. Bias | Avg. Pct. Bias | Avg. Std. Error | MSE^a | 95% Coverage |
|---|---|---|---|---|---|---|
| Cause-specific log(HR), covariate adjusted | −0.696 / −0.696 | 0.003 / 0.004 | −0.4 / −0.5 | 0.132 / 0.131 | 0.018 / 0.017 | 95.4 / 95.5 |
| Cause-specific log(HR), IPEW estimator^c | −0.682 | 0.003 / 0.005 | −0.5 / −0.7 | 0.133 / 0.130 | 0.017 / 0.017 | 95.9 / 95.8 |
| RD,^d conditional risk^c | −0.169 | 0.002 / 0.002 | −0.9 / −1.0 | 0.024 / 0.024 | 0.001 / 0.001 | 95.0 / 94.7 |
| Subdistribution log(HR),^e covariate adjusted | −0.871 / −0.857 | −0.001 / 0.114 | 0.1 / −13.3 | 0.133 / 0.130 | 0.018 / 0.030 | 94.7 / 85.2 |
| Subdistribution log(HR),^e IPEW estimator^c | −0.857 | 0.002 / −0.015 | −0.2 / 1.7 | 0.133 / 0.131 | 0.017 / 0.017 | 95.8 / 95.9 |
| RD,^d CIF^c | −0.193 | 0.001 / −0.002 | −0.6 / 1.1 | 0.028 / 0.028 | 0.001 / 0.001 | 94.9 / 94.7 |

Abbreviations: Avg., average; CIF, cumulative incidence function; HR, hazard ratio; IPEW, inverse probability of exposure weighted; MSE, mean squared error; OR, odds ratio; Pct., percent; RD, risk difference

^a Mean squared error. $\mathrm{MSE}(\hat{\theta}) = [\mathrm{Bias}(\hat{\theta}, \theta)]^2 + \mathrm{Var}(\hat{\theta})$, where $\mathrm{Var}(\hat{\theta})$ is the Monte Carlo variance.

^b Truth varies according to variables included in the model for the HR because it is a non-collapsible measure.

^c Truth is marginal (not conditional on any covariates).

^d Difference in cumulative incidence functions or conditional risk functions estimated at t = 50.

^e Because cause-specific hazards were simulated to be proportional, subdistribution hazards are not proportional. Truth here is a time-averaged subdistribution hazard ratio.

All estimators related to the subdistribution hazards (the covariate-adjusted subdistribution hazard ratio, inverse probability exposure weighted subdistribution hazard ratio, and the cumulative incidence function) were biased when the confounder of the effect of exposure on the competing event was omitted, although by far the most clinically meaningful bias was in the covariate-adjusted subdistribution hazard ratio.

DISCUSSION

Others have suggested that all the pertinent statistical estimands in the presence of competing risks (cumulative incidence function, subdistribution hazards and subdistribution hazard ratio) are derived from the cause-specific hazards.30 Therefore, given that the cumulative incidence function (and therefore the subdistribution hazard) is a function of the cause-specific hazards of all events (Equation 3), an imbalance in a covariate related to the exposure and a competing event would distort the difference in the cumulative incidence between exposed and unexposed groups. It is fairly straightforward to show mathematically that an unbiased estimator for any of the statistical estimands for a competing risks analysis requires adjustment for confounders of the effect of exposure on the competing event, in addition to the confounders of the effect of exposure on the event of interest. The reader may be disturbed by our demonstration that the competing risk estimands are biased when the cause-specific hazard ratios are not, but previous work has shown that confounding can depend on the outcome parameter of interest.35 Furthermore, the cause-specific hazard ratios in our simulation were unbiased because the correct models were fit; in practice, correct model specification will never be assured. In a competing risk setting, the exposure may be associated with the probability of a particular event because (1) the exposure only directly causes or prevents the event of interest, (2) the exposure only causes or prevents the competing event, which subsequently indirectly permits or prevents the event of interest, or (3) the exposure both directly and indirectly causes or prevents the occurrence of the event of interest.
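The argument can be illustrated with an exact population calculation rather than a simulation. The sketch below assumes constant cause-specific hazards, independent confounders with prevalence 0.5, and an arbitrary exposure-model intercept and baseline hazard (all hypothetical); the odds ratios and hazard ratios follow the base case in Table 1. Adjustment for Z1 only is represented by standardizing to the distribution of Z1, which in the population corresponds to inverse probability weighting with a saturated exposure model for Z1.

```python
import math
from itertools import product

def expit(x):
    return 1 / (1 + math.exp(-x))

def cif1(t, a, z1, z2, base=0.005, hr_a1=0.5, hr_a2=2.0, hr_z=2.0):
    """Cumulative incidence of event 1 given exposure and both confounders."""
    h1 = base * hr_a1 ** a * hr_z ** z1     # Z1 affects only event 1
    h2 = base * hr_a2 ** a * hr_z ** z2     # Z2 affects only event 2
    return h1 / (h1 + h2) * (1 - math.exp(-(h1 + h2) * t))

def p_exposure(a, z1, z2, or_z=2.0):
    """P(A = a | Z1, Z2) under a logistic model (assumed intercept)."""
    p1 = expit(-math.log(or_z) + math.log(or_z) * (z1 + z2))
    return p1 if a == 1 else 1 - p1

def true_cif(t, a):
    """Marginal potential-outcome cumulative incidence; P(z1, z2) = 0.25 each."""
    return sum(0.25 * cif1(t, a, z1, z2) for z1, z2 in product((0, 1), repeat=2))

def cif_adjusted_for_z1_only(t, a):
    """Standardize over Z1 while leaving Z2 at its distribution among A = a."""
    total = 0.0
    for z1 in (0, 1):
        denom = sum(0.5 * p_exposure(a, z1, z2) for z2 in (0, 1))
        for z2 in (0, 1):
            p_z2_given_a_z1 = 0.5 * p_exposure(a, z1, z2) / denom
            total += 0.5 * p_z2_given_a_z1 * cif1(t, a, z1, z2)
    return total

t = 200.0
rd_true = true_cif(t, 1) - true_cif(t, 0)
rd_z1_only = cif_adjusted_for_z1_only(t, 1) - cif_adjusted_for_z1_only(t, 0)
print(rd_true, rd_z1_only)
```

Under these assumptions the risk difference adjusted only for Z1 is further from the null than the true risk difference, because the exposed group is enriched for Z2 = 1 and therefore has more competing events for reasons unrelated to exposure.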

Cause-specific hazards interplay with one another and may produce unexpected results in the cumulative incidence function. In the simulation presented in this paper, we simulated cause-specific hazard ratios that were in opposite directions. However, if cause-specific hazard ratios are in the same direction for both the event of interest and the competing event, it is possible that the effect of exposure on the cumulative incidence function may be in the opposite direction.30 It is impossible to accurately predict the effect of an exposure on the cumulative incidence function based on a single cause-specific hazard ratio alone. Both cause-specific hazard ratios and cumulative incidence functions should be investigated in the presence of a competing risk.3
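A small numeric sketch shows how this can happen. It assumes constant cause-specific hazards with hypothetical values, in which exposure increases the hazard of both events but increases the hazard of the competing event by more.

```python
import math

def cif1(t, h1, h2):
    """Cumulative incidence of event 1 with constant cause-specific hazards."""
    return h1 / (h1 + h2) * (1 - math.exp(-(h1 + h2) * t))

# Hypothetical hazards: cause-specific HR of 1.5 for event 1 and 4.0 for event 2.
unexposed = {"h1": 0.010, "h2": 0.010}
exposed = {"h1": 0.015, "h2": 0.040}

early_diff = cif1(1.0, **exposed) - cif1(1.0, **unexposed)
late_diff = cif1(200.0, **exposed) - cif1(200.0, **unexposed)
print(early_diff, late_diff)
```

In this example exposure raises the cumulative incidence of event 1 early in follow-up, in line with its cause-specific hazard ratio, but lowers it by the end of follow-up because exposed individuals are removed more quickly by the competing event.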

In presenting results from our simulation, we have chosen to present the conditional risk curves and conditional risk differences. We have done this because it is not uncommon for analyses that include competing risks to ignore them and treat individuals who get the competing risk as censored. However, as stated earlier, interpretation of a conditional risk function requires the assumption that somehow all the competing events could be prevented without affecting the risk of the event of interest; in nearly every scenario, this assumption is unrealistic. We reiterate that estimating conditional risks is not appropriate or meaningful when competing risks are present.

To our knowledge, this is the first paper to address variable selection for competing risk analyses when causal inference is the goal of the inquiry. We have demonstrated that in order to have unbiased estimators in a competing risk analysis, the analysis should control for confounders of both the effect of exposure on the primary event and of the effect of exposure on the competing event. However, we have not addressed how to identify confounders of either or both relationships. In the Methods section we recalled criteria for confounder selection and briefly mentioned directed acyclic graphs (DAGs) as an aid favored by epidemiologists for confounder identification. DAGs are a representation of the researcher’s hypothesis about the causal relationships between exposure, the outcome of interest, and all of their common causes. Using graphical criteria to analyze the DAG leads to identification of a minimally sufficient set of variables to block all non-causal paths between exposure and the primary outcome of interest.17 While the extension of this strategy to a competing risk situation may seem trivial, there are no established rules for representing competing risks on a DAG. To illustrate bias in the presence of competing risks, several authors have drawn DAGs with two outcome nodes, one for the event of interest and one for the competing event, with a box around the competing event indicating that a typical cause-specific analysis is restricted to those who do not experience the competing event.39,40 An analysis that censors individuals who experience the competing event and estimates conditional risks might be represented by drawing a box around the competing event. However, it is unclear how to alter the DAG when conducting a competing risk analysis. 
Indeed, including separate nodes for the event of interest and the competing event may cause researchers to forget that the two nodes cannot be separated because they are often two sides of the same coin.19,41 On such a DAG, one may be tempted to remove the box around the competing event, but then the DAG appears to indicate that Z2 ⫫ Y,J=1, which we demonstrated not to be the case in our simulation. Perhaps researchers would do better to include a node on the DAG that is the composite outcome, because that would generally lead to the correct conclusion about which covariates are confounders (all covariates that are confounders of any exposure-outcome cause-specific effect). A complete discourse on DAGs in the presence of competing risks is beyond the scope of this paper.

Both the cumulative incidence functions and the cause-specific hazard ratios provide insight into the mechanisms at work in the presence of competing risks, and both should be estimated. When the goal of an investigation is causal inference, and there is a competing event that may preclude occurrence of the event of interest, we have demonstrated that confounders of the effect of the exposure on both the primary event and the competing event must be controlled. The resultant bias from failing to account for confounders of the exposure effect on the competing event is likely to increase with increasing incidence of the competing event, although the direction of the bias is generally unpredictable.

Supplementary Material

eAppendices

Acknowledgments

Sources of funding: This work was supported by NIH grants U01 HL121812, U01 DA036935, and R56 AI102622.

Footnotes

Conflicts of Interest: The authors have no conflicts of interest to report.

Editor’s Note: A Commentary on this article appears on p. xxx.

References

  • 1. Cole SR, Lau B, Eron JJ, et al. Estimation of the standardized risk difference and ratio in a competing risks framework: application to injection drug use and progression to AIDS after initiation of antiretroviral therapy. Am J Epidemiol. 2015;181(4):238–245. doi: 10.1093/aje/kwu122.
  • 2. Lau B, Cole SR, Gange SJ. Competing risk regression models for epidemiologic data. Am J Epidemiol. 2009;170(2):244–256. doi: 10.1093/aje/kwp107.
  • 3. Latouche A, Allignol A, Beyersmann J, Labopin M, Fine JP. A competing risks analysis should report results on all cause-specific hazards and cumulative incidence functions. Journal of clinical epidemiology. 2013;66(6):648–653. doi: 10.1016/j.jclinepi.2012.09.017.
  • 4. Wolbers M, Koller MT, Stel VS, et al. Competing risks analyses: objectives and approaches. European heart journal. 2014;35(42):2936–2941. doi: 10.1093/eurheartj/ehu131.
  • 5. Andersen PK, Geskus RB, de Witte T, Putter H. Competing risks in epidemiology: possibilities and pitfalls. Int J Epidemiol. 2012;41(3):861–870. doi: 10.1093/ije/dyr213.
  • 6. Wolbers M, Koller MT, Witteman JC, Steyerberg EW. Prognostic models with competing risks: methods and application to coronary risk prediction. Epidemiology. 2009;20(4):555–561. doi: 10.1097/EDE.0b013e3181a39056.
  • 7. Van Rompaye B, Jaffar S, Goetghebeur E. Estimation with Cox models: cause-specific survival analysis with misclassified cause of failure. Epidemiology. 2012;23(2):194–202. doi: 10.1097/EDE.0b013e3182454cad.
  • 8. Kuk D, Varadhan R. Model selection in competing risks regression. Stat Med. 2013;32(18):3077–3088. doi: 10.1002/sim.5762.
  • 9. Schmidtmann I, Elsasser A, Weinmann A, Binder H. Coupled variable selection for regression modeling of complex treatment patterns in a clinical cancer registry. Stat Med. 2014;33(30):5358–5370. doi: 10.1002/sim.6340.
  • 10. Tapak L, Saidijam M, Sadeghifar M, Poorolajal J, Mahjub H. Competing Risks Data Analysis with High-dimensional Covariates: An application in Bladder Cancer. Genomics, proteomics & bioinformatics. 2015. doi: 10.1016/j.gpb.2015.04.001.
  • 11. Ha ID, Lee M, Oh S, Jeong JH, Sylvester R, Lee Y. Variable selection in subdistribution hazard frailty models with competing risks data. Stat Med. 2014;33(26):4590–4604. doi: 10.1002/sim.6257.
  • 12. Greenland S. Modeling and variable selection in epidemiologic analysis. Am J Public Health. 1989;79(3):340–349. doi: 10.2105/ajph.79.3.340.
  • 13. Brookhart MA, Schneeweiss S, Rothman KJ, Glynn RJ, Avorn J, Sturmer T. Variable selection for propensity score models. Am J Epidemiol. 2006;163(12):1149–1156. doi: 10.1093/aje/kwj149.
  • 14. Greenland S. Invited commentary: variable selection versus shrinkage in the control of multiple confounders. Am J Epidemiol. 2008;167(5):523–529. doi: 10.1093/aje/kwm355. discussion 530–521.
  • 15. Hernán MA, Hernández-Diaz S, Werler MM, Mitchell AA. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol. 2002;155(2):176–184. doi: 10.1093/aje/155.2.176.
  • 16. Hernán MA, Robins JM. Estimating causal effects from epidemiological data. Journal of epidemiology and community health. 2006;60(7):578–586. doi: 10.1136/jech.2004.029496.
  • 17. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology. 1999;10(1):37–48.
  • 18. Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association. 1999;94(446):496–509.
  • 19. Hernán MA, Schisterman EF, Hernandez-Diaz S. Invited commentary: composite outcomes as an attempt to escape from selection bias and related paradoxes. Am J Epidemiol. 2014;179(3):368–370. doi: 10.1093/aje/kwt283.
  • 20. Greenland S, Robins JM. Identifiability, exchangeability, and epidemiological confounding. Int J Epidemiol. 1986;15(3):413–419. doi: 10.1093/ije/15.3.413.
  • 21. Prentice RL, Kalbfleisch JD, Peterson AV, Jr, Flournoy N, Farewell VT, Breslow NE. The analysis of failure times in the presence of competing risks. Biometrics. 1978;34(4):541–554.
  • 22. Rothman K, Greenland S, Lash T. Modern epidemiology. Lippincott: Williams & Wilkins; 2008.
  • 23. Lindley DV. Understanding uncertainty. Revised ed. Hoboken, New Jersey: John Wiley & Sons, Inc; 2014.
  • 24. Beyersmann J, Dettenkofer M, Bertz H, Schumacher M. A competing risks analysis of bloodstream infection after stem-cell transplantation using subdistribution hazards and cause-specific hazards. Stat Med. 2007;26(30):5360–5369. doi: 10.1002/sim.3006.
  • 25. Beyersmann J, Schumacher M. Misspecified regression model for the subdistribution hazard of a competing risk. Stat Med. 2007;26(7):1649–1651. doi: 10.1002/sim.2727.
  • 26. Cole SR, Hernán MA. Adjusted survival curves with inverse probability weights. Computer methods and programs in biomedicine. 2004;75(1):45–49. doi: 10.1016/j.cmpb.2003.10.004.
  • 27. Hinchliffe SR, Lambert PC. Flexible parametric modelling of cause-specific hazards to estimate cumulative incidence functions. BMC Med Res Methodol. 2013;13:13. doi: 10.1186/1471-2288-13-13.
  • 28. Lau B, Cole SR, Gange SJ. Parametric mixture models to evaluate and summarize hazard ratios in the presence of competing risks with time-dependent hazards and delayed entry. Stat Med. 2011;30(6):654–665. doi: 10.1002/sim.4123.
  • 29. Nicolaie M, Houwelingen HV, Putter H. Vertical modeling: Analysis of competing risks data with missing causes of failure. Stat Methods Med Res. 2011. doi: 10.1177/0962280211432067.
  • 30. Allignol A, Schumacher M, Wanner C, Drechsler C, Beyersmann J. Understanding competing risks: a simulation point of view. BMC Med Res Methodol. 2011;11:86. doi: 10.1186/1471-2288-11-86.
  • 31. Hernán MA. The hazards of hazard ratios. Epidemiology. 2010;21(1):13–15. doi: 10.1097/EDE.0b013e3181c1ea43.
  • 32. VanderWeele TJ, Shpitser I. A new criterion for confounder selection. Biometrics. 2011;67(4):1406–1413. doi: 10.1111/j.1541-0420.2011.01619.x.
  • 33. Howards PP, Schisterman EF, Poole C, Kaufman JS, Weinberg CR. “Toward a clearer definition of confounding” revisited with directed acyclic graphs. Am J Epidemiol. 2012;176(6):506–511. doi: 10.1093/aje/kws127.
  • 34. Szklo M, Nieto FJ. Epidemiology: beyond the basics. 3rd ed. Burlington, Mass: Jones & Bartlett Learning; 2014.
  • 35. Greenland S, Robins JM, Pearl J. Confounding and collapsibility in causal inference. Statistical Science. 1999;14(1):29–46.
  • 36. Austin PC. The performance of different propensity score methods for estimating marginal hazard ratios. Stat Med. 2013;32(16):2837–2849. doi: 10.1002/sim.5705.
  • 37. Cox DR. Regression models and life tables. Journal of the Royal Statistical Society. Series B, statistical methodology. 1972;34:187–220.
  • 38. Robins J, Hernán MA, Brumback B. Marginal Structural Models and Causal Inference in Epidemiology. Epidemiology. 2000;11(5):550–560. doi: 10.1097/00001648-200009000-00011.
  • 39. Sarfati D, Blakely T, Pearce N. Measuring cancer survival in populations: relative survival vs cancer-specific survival. Int J Epidemiol. 2010;39(2):598–610. doi: 10.1093/ije/dyp392.
  • 40. Thompson CA, Zhang ZF, Arah OA. Competing risk bias to explain the inverse relationship between smoking and malignant melanoma. European journal of epidemiology. 2013;28(7):557–567. doi: 10.1007/s10654-013-9812-0.
  • 41. Kramer MS, Zhang X, Platt RW. Analyzing risks of adverse pregnancy outcomes. Am J Epidemiol. 2014;179(3):361–367. doi: 10.1093/aje/kwt285.
  • 42. Holland PW. Statistics and Causal Inference. J Am Stat Assoc. 1986;81(396):945–960.
  • 43. Cole SR, Hudgens MG, Brookhart MA, Westreich D. Risk. Am J Epidemiol. 2015;181(4):246–250. doi: 10.1093/aje/kwv001.
  • 44. Bender R, Augustin T, Blettner M. Generating survival times to simulate Cox proportional hazards models. Statistics in Medicine. 2005;24(11):1713–1723. doi: 10.1002/sim.2059.
