Biostatistics (Oxford, England). 2016 Jun 26;18(1):48–61. doi: 10.1093/biostatistics/kxw023

Nonparametric binary instrumental variable analysis of competing risks data

Amy Richardson 1,*, Michael G Hudgens 1, Jason P Fine 1, M Alan Brookhart 2
PMCID: PMC6497235  PMID: 27354709

Summary

In both observational studies and randomized trials with noncompliance, unmeasured confounding may exist which may bias treatment effect estimates. Instrumental variables (IV) are a popular technique for addressing such confounding, enabling consistent estimation of causal effects. This paper proposes nonparametric IV estimators for censored time to event data that may be subject to competing risks. A simple, plug-in estimator is introduced using nonparametric estimators of the cumulative incidence function, with confidence intervals derived using asymptotic theory. To provide an overall test of the treatment effect, an integrated weighted difference statistic is suggested, which is applicable to data with and without competing risks. Simulation studies demonstrate that the methods perform well with realistic sample sizes. The methods are applied to assess the effect of infant or maternal antiretroviral therapy on transmission of HIV from mother to child via breastfeeding using data from a large, recently completed randomized trial in Malawi where noncompliance with assigned treatment may confound treatment effect estimates.

Keywords: Competing risks, Compliance, Identifiability, Instrumental variables, Right censoring, Survival analysis

1. Introduction

In both randomized trials with noncompliance and nonrandomized studies, researchers may seek to assess the causal effect of a treatment on the time to different possible events of interest, such as the effect of infant or maternal antiretroviral (ARV) drugs on the time to death or HIV infection of a breastfeeding infant of an HIV+ woman (Chasela and others, 2010). However, such treatment effects are not identifiable without making empirically untestable assumptions about the relationship between the various event types, the time to the event, and the treatment selection mechanism. With the exception of randomized clinical trials with perfect compliance to treatment assignment, the treatment selection mechanism is typically unknown, which may confound standard “as treated” analyses. Such unmeasured confounding may bias treatment effect estimates in both observational studies and randomized trials with noncompliance to assigned treatment.

In some circumstances, there may exist variables that are not related to the outcome except through their effect on treatment selection. Such variables may be used to provide partial or point identification of treatment effects without knowledge of the treatment selection mechanism (Imbens and Angrist, 1994; Angrist and others, 1996; Abadie, 2003; Tan, 2006; Ogburn and others, 2015). These variables are referred to as instrumental variables (IV), and examples include treatment assignment in randomized clinical trials with noncompliance (Imbens and Angrist, 1994; Angrist and others, 1996), the calendar time for the approval of a new treatment by a regulatory agency (Martens and others, 2006; Cain and others, 2009), physician treatment prescribing preference (Brookhart and Schneeweiss, 2007), or randomized encouragement to take treatment (Martens and others, 2006). If the effect of the IV on treatment selection is monotonic, then the IV may be used to identify treatment effects within the subpopulation whose treatment is affected by the IV (Imbens and Angrist, 1994; Angrist and others, 1996; Hernán and Robins, 2006).

IV have been used to identify treatment effects describing differences in survival functions between treatment arms within the subpopulation described above. Inferential methods for censored data with IV have been developed using both nonparametric methods (Baker, 1998; Abbring and van den Berg, 2005; Nie and others, 2011; Li and others, 2015) as well as parametric and semiparametric modelling techniques (Robins and Tsiatis, 1991; Loeys and Goetghebeur, 2003; Cuzick and others, 2007; MacKenzie and others, 2014; Tchetgen and others, 2015). In this paper, we consider competing risks data with multiple failure types and decompose the overall causal effect of treatment on the survival probability at a fixed time point into the sum of causal effects on the various event type specific cumulative incidence (or subdistribution) functions. If an IV does not share common causes with either the time to the event or type of event experienced, it may be used to identify the event type specific causal effect based on the difference in the event type specific cumulative incidence function between treatment arms. Inferences may be obtained using nonparametric estimates of the cumulative incidence functions, analogously to an overall causal effect estimator based on nonparametric estimators of the survival functions.

Typically in survival analysis an intent-to-treat test for differences in treatment-specific survival functions utilizes the log-rank statistic. This nonparametric test provides a global assessment of differences in survival functions over time. To our knowledge, the existing literature on nonparametric analyses of censored data with IV provides inferential methods only at fixed time points (Baker, 1998; Abbring and van den Berg, 2005; Nie and others, 2011) and does not address testing for global treatment differences. In this paper, we develop test statistics for differences in overall survival which are integrated weighted differences of the estimated causal effects over time, where the weight function may be chosen to emphasize time points of greatest interest. The tests are easily implemented using a straightforward variance estimator and are theoretically justified. The proposed statistics are extended to the competing risks setting, where they are constructed from nonparametric estimators of the differences in the treatment-specific cumulative incidence functions.

As an example of a setting where such methods may be applicable, consider the Breastfeeding, Antiretrovirals, and Nutrition (BAN) randomized clinical trial undertaken in Lilongwe, Malawi, between 2004 and 2010 (Chasela and others, 2010). In this study, 2369 HIV-infected breastfeeding mothers and their uninfected newborn babies were randomly assigned to one of three treatment regimens: maternal ARV therapy (n=849), daily infant nevirapine (NVP) (n=852), or control (n=668). The aim was to assess the effect of maternal ARV or infant NVP on reducing mother to child transmission of HIV. Two challenges in the analysis of data from this trial are (i) not all participants complied with their randomized treatment regimen assignment and (ii) death (prior to HIV infection) was a competing risk for HIV infection. The randomized treatment assignment provides an IV that allows for estimation of treatment effects among those who would comply with whichever treatment they were assigned. In the BAN study treatment regimen adherence was measured via surveys administered to the mothers, allowing for estimation of such effects (assuming accurate self-report).

The organization of the remainder of this paper is as follows. In Section 2, notation, assumptions, causal estimands and estimators are given, both for the overall and event-type specific causal effects. Section 3 describes nonparametric inferences using the estimators for both the standard survival setup as well as in the presence of competing risks and gives details of nonparametric test statistics for a global assessment of the causal effects over time. Section 4 presents the results of a simulation study examining the finite sample performance of these estimators and tests. Section 5 applies the methods derived in Section 3 to the BAN study. Section 6 concludes with a discussion.

2. Preliminaries

2.1 Notation

An IV that is often used to estimate causal effects is randomized treatment assignment in a clinical trial. Let R be an IV given by randomized assignment where R=0 indicates assignment to control and R=1 indicates assignment to treatment (e.g., maternal ARV or infant NVP in the BAN study). Although treatment effects of R on the outcome of interest may be identifiable if R is randomly assigned, treatment effects due to the actual treatment taken (which may differ from R) are typically the target of inference. Let Z be the actual treatment taken, where Z=0 denotes treatment not taken, Z=1 denotes that treatment was taken. Define potential treatment outcomes Z(r) under randomized treatment assignment r=0,1; specifically let Z(r)=0 indicate that the subject would not take treatment under randomized assignment r, and Z(r)=1 indicates that the subject would take treatment under randomized assignment r. As in Imbens and Angrist (1994) and Angrist and others (1996), define principal strata based on the vector of the treatment potential outcomes ZP0=(Z(0),Z(1)) where ZP0=(0,1) are compliers (i.e., they only take treatment if assigned to do so), ZP0=(1,1) are the always treated, ZP0=(0,0) are the never treated, and ZP0=(1,0) are defiers (i.e., they would only take treatment when not assigned to do so).

Suppose we are interested in time-to-event outcomes that may be subject to competing risks. For randomization assignment $r$ and actual treatment received $z$, let $T(r,z)$ be the potential failure time and let $J(r,z)$ be the potential event type taking on values $1,\ldots,k$. Let $T$ be the failure time that would have been observed in the absence of censoring, $C$ be the censoring time (e.g., due to loss to follow up), and $X=\min(T,C)$. Let $J$ be the event type that would have been observed in the absence of censoring, and let $\Delta=J\,I[T\le C]$ be the observed event type, where $\Delta=0$ indicates that the subject was censored before the event was experienced. Suppose we observe $n$ i.i.d. copies of $\{R_i,Z_i,X_i,\Delta_i\}$.
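As a small illustration of the observed-data structure, the following sketch builds $(X,\Delta)$ from latent failure times, event types, and censoring times. The function name `observe` and the toy values are assumptions made for this example only; they do not come from the paper.

```python
import numpy as np

def observe(T, J, C):
    """Map latent (T, J, C) to observed (X, Delta).

    X = min(T, C); Delta = J if T <= C, else 0 (censored).
    Hypothetical helper, written only to illustrate the notation.
    """
    T, J, C = map(np.asarray, (T, J, C))
    X = np.minimum(T, C)
    Delta = np.where(T <= C, J, 0)
    return X, Delta

# Toy data: the second subject is censored at 1.5 before failing at 5.0.
X, Delta = observe(T=[2.0, 5.0, 3.0], J=[1, 2, 1], C=[4.0, 1.5, 3.0])
```

The third subject has $T=C$ and is counted as an observed event, matching the indicator $I[T\le C]$.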

2.2 Assumptions

Assumption 1 Stable unit treatment value assumption (Rubin, 1980, SUTVA): if R=r and Z=z then Z=Z(r) and T=T(r,z) and J=J(r,z) for r,z=0,1.

Assumption 2 Independent instrument: $R \perp \{T(r,z), Z(r) \text{ for } r,z=0,1\}$.

Assumption 3 Exclusion restriction: $T(0,z)=T(1,z)$ and $J(0,z)=J(1,z)$ for $z=0,1$.

Assumption 4 Nonzero causal effect of $R$ on $Z$: $E[Z(1)-Z(0)]\ne 0$.

Assumption 5 Monotonicity (Imbens and Angrist, 1994): $Z(1)\ge Z(0)$.

Assumption 6 Independent censoring: $\{T,J\}\perp C \mid R$.

Assumption 1 is a standard assumption made in order to estimate causal effects defined using potential outcomes. Assumptions 2–4 qualify R as an IV and are the same assumptions found in Imbens and Angrist (1994) and Angrist and others (1996). When the IV is randomly assigned, Assumption 2 will typically be considered plausible. Assumption 3 means that the potential outcomes only depend on z such that we may write T(z)=T(r,z) and J(z)=J(r,z). Assumption 5 implies that the defiers principal stratum ZP0=(1,0) is empty (i.e., there is no subject that would take treatment only when not assigned to). This assumption is commonly made when using an instrumental variable to make inferences about causal effects (Imbens and Angrist, 1994). Assumption 6 is made in order to draw inference about the overall survival function and cumulative incidence functions in the presence of right censoring. Relaxing Assumption 6 is discussed in Section 5 in the context of the BAN study.

2.3 Causal estimands and estimators

We are interested in causal effects describing differences between the survival curves of the treated versus the nontreated within the subpopulation defined by ZP0=(0,1). This is sometimes referred to as a local average treatment effect and is defined as

$$\delta(t)=\Pr[T(1)>t \mid ZP0=(0,1)]-\Pr[T(0)>t \mid ZP0=(0,1)]. \tag{2.1}$$

Under Assumptions 1–5, (2.1) is equivalent to

$$\delta(t)=\frac{\Pr[T>t \mid R=1]-\Pr[T>t \mid R=0]}{\Pr[Z=1 \mid R=1]-\Pr[Z=1 \mid R=0]}=\frac{S_1(t)-S_0(t)}{p_1-p_0}=\frac{dS(t)}{dp} \tag{2.2}$$

where $S_r(t)=\Pr[T>t \mid R=r]$ is the survival function given $R=r$, $p_r=\Pr[Z=1 \mid R=r]$, $dS(t)=S_1(t)-S_0(t)$, and $dp=p_1-p_0$. For fixed time point $t$, (2.2) is equivalent to the standard IV estimand of Imbens and Angrist (1994) and Angrist and others (1996). Under Assumption 6, a consistent estimator $\hat\delta(t)$ of (2.2) is found by plugging in the Kaplan Meier estimator of the survival functions $\hat S_r(t)=\widehat{\Pr}[T>t \mid R=r]$ at time $t$ conditional on $R=r$ and $\widehat{dp}=\hat p_1-\hat p_0$, where $\hat p_r=\sum_i Z_i I[R_i=r]/\sum_i I[R_i=r]$ for $r=0,1$ (Baker, 1998; Abbring and van den Berg, 2005; Nie and others, 2011). This will be called the IV estimator of the local average treatment effect $\delta(t)$.
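The plug-in construction can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: the function names `km_survival` and `iv_estimate` and the toy data are assumptions, and the Kaplan Meier estimator is written out by hand so the example is self-contained.

```python
import numpy as np

def km_survival(x, event, t):
    """Kaplan-Meier estimate of Pr[T > t]; event[i] is True if x[i] is a failure."""
    x = np.asarray(x, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv = 1.0
    # Loop over the distinct observed failure times up to t (np.unique sorts them).
    for u in np.unique(x[event & (x <= t)]):
        at_risk = np.sum(x >= u)
        failures = np.sum(event & (x == u))
        surv *= 1.0 - failures / at_risk
    return surv

def iv_estimate(R, Z, X, Delta, t):
    """Plug-in estimate of (2.2): {S_1(t) - S_0(t)} / (p_1 - p_0)."""
    R, Z, X, Delta = map(np.asarray, (R, Z, X, Delta))
    S = [km_survival(X[R == r], Delta[R == r] > 0, t) for r in (0, 1)]
    p = [Z[R == r].mean() for r in (0, 1)]
    return (S[1] - S[0]) / (p[1] - p[0])

# Toy data (hypothetical): no censoring, one noncomplier in the R = 1 arm.
R = [0, 0, 0, 0, 1, 1, 1, 1]
Z = [0, 0, 0, 0, 1, 1, 1, 0]
X = [1, 2, 3, 4, 2, 3, 5, 6]
Delta = [1, 1, 1, 1, 1, 1, 1, 1]
delta_hat = iv_estimate(R, Z, X, Delta, t=2.5)
```

In this toy example the arm-specific survival estimates at $t=2.5$ are 0.75 and 0.5 and the compliance proportions are 0.75 and 0, so the estimate is $0.25/0.75=1/3$.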

The local average treatment effect may be further partitioned into event-type specific local average treatment effects describing differences in the cumulative incidence functions for specific event type j when there are competing risks for the failure time T. Namely, a local average treatment effect for event type j can be defined as

$$\delta_j(t)=\Pr[T(0)\le t, J(0)=j \mid ZP0=(0,1)]-\Pr[T(1)\le t, J(1)=j \mid ZP0=(0,1)]. \tag{2.3}$$

It follows that $\delta(t)=\sum_{j=1}^{k}\delta_j(t)$, i.e., the local average treatment effect can be decomposed into the sum of event-type specific effects. Note that the local average treatment effect can be zero while some of the event-type specific effects are nonzero, e.g., if $\delta_j(t)\ne 0$ and $\sum_{j'=1,\,j'\ne j}^{k}\delta_{j'}(t)=-\delta_j(t)$. In the context of the BAN study, this could occur if infant NVP resulted in a reduced proportion of infants being infected with HIV but also increased the proportion of infants dying (perhaps due to drug side effects) such that the proportions dying or becoming infected are the same in the infant NVP and control arms. Similarly, Kalbfleisch and Prentice (2002, p. 249) describe a study comparing survival times in diabetic patients where there were no significant differences in overall survival between treatment groups, but one treatment was found to have an elevated risk of death due to cardiovascular complications.
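For completeness, the decomposition of the local average treatment effect into event-type specific effects can be checked in one step from (2.1) and (2.3). Writing $c$ as shorthand for the conditioning event $ZP0=(0,1)$ (the shorthand is introduced here for brevity only), and using the fact that the events $\{T(z)\le t, J(z)=j\}$, $j=1,\ldots,k$, partition $\{T(z)\le t\}$:

$$
\begin{aligned}
\delta(t) &= \Pr[T(1)>t \mid c]-\Pr[T(0)>t \mid c] \\
&= \Pr[T(0)\le t \mid c]-\Pr[T(1)\le t \mid c] \\
&= \sum_{j=1}^{k}\left\{\Pr[T(0)\le t, J(0)=j \mid c]-\Pr[T(1)\le t, J(1)=j \mid c]\right\}
= \sum_{j=1}^{k}\delta_j(t).
\end{aligned}
$$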

In order to arrive at an expression of (2.3) that is identifiable from observable data, Assumption 2 will be replaced by the stronger condition given below.

Assumption 7 Jointly independent instrument: $R \perp \{T(r,z), J(r,z), Z(r) \text{ for } r,z=0,1\}$.

Assumption 7 states that the potential failure time T(r,z), potential event type J(r,z), and potential treatment taken Z(r) are independent of the instrument. When the instrument is randomized treatment assignment, as in the BAN study, this assumption will hold. Under Assumptions 1–5 and 7, (2.3) is equivalent to the following

$$\delta_j(t)=\frac{\Pr[T\le t, J=j \mid R=0]-\Pr[T\le t, J=j \mid R=1]}{\Pr[Z=1 \mid R=1]-\Pr[Z=1 \mid R=0]}=\frac{F_{0j}(t)-F_{1j}(t)}{p_1-p_0}=\frac{dF_j(t)}{dp} \tag{2.4}$$

where $F_{rj}(t)=\Pr[T\le t, J=j \mid R=r]$ is the cumulative incidence function for event type $j$ given $R=r$ and $dF_j(t)=F_{0j}(t)-F_{1j}(t)$. Under Assumption 6, a consistent estimator $\hat\delta_j(t)$ of (2.4) is found by plugging in the Aalen and Johansen (1978) estimator of the cumulative incidence function $\hat F_{rj}(t)=\widehat{\Pr}[T\le t, J=j \mid R=r]$ for event type $j$ at time $t$ conditional on $R=r$ and consistent estimators of each $p_r$. This will be called the IV estimator of $\delta_j(t)$. As with the estimands in (2.2) and (2.4), the estimator of the local average treatment effect $\hat\delta(t)$ equals the sum of the estimators of the event-type specific local average treatment effects, $\sum_{j=1}^{k}\hat\delta_j(t)$.
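The event-type specific estimator can be sketched in the same way as the overall one. The code below is an illustrative sketch under stated assumptions: the names `cumulative_incidence` and `iv_estimate_j` and the toy data are hypothetical, and the Aalen Johansen estimator is written in its usual product-limit form (all-cause survival just before each failure time multiplied by the cause-$j$ hazard increment).

```python
import numpy as np

def cumulative_incidence(x, delta, j, t):
    """Aalen-Johansen estimate of Pr[T <= t, J = j]; delta == 0 marks censoring."""
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta)
    surv_before = 1.0  # all-cause Kaplan-Meier just before the current failure time
    cif = 0.0
    for u in np.unique(x[(delta > 0) & (x <= t)]):
        at_risk = np.sum(x >= u)
        d_j = np.sum((x == u) & (delta == j))
        d_all = np.sum((x == u) & (delta > 0))
        cif += surv_before * d_j / at_risk
        surv_before *= 1.0 - d_all / at_risk
    return cif

def iv_estimate_j(R, Z, X, Delta, j, t):
    """Plug-in estimate of (2.4): {F_0j(t) - F_1j(t)} / (p_1 - p_0)."""
    R, Z, X, Delta = map(np.asarray, (R, Z, X, Delta))
    F = [cumulative_incidence(X[R == r], Delta[R == r], j, t) for r in (0, 1)]
    p = [Z[R == r].mean() for r in (0, 1)]
    return (F[0] - F[1]) / (p[1] - p[0])

# Toy data (hypothetical): two event types, no censoring.
R = [0, 0, 0, 0, 1, 1, 1, 1]
Z = [0, 0, 0, 0, 1, 1, 1, 0]
X = [1, 2, 3, 4, 2, 3, 5, 6]
Delta = [1, 2, 1, 2, 1, 2, 2, 1]
d1 = iv_estimate_j(R, Z, X, Delta, j=1, t=2.5)
d2 = iv_estimate_j(R, Z, X, Delta, j=2, t=2.5)
```

Here the type 1 estimate is 0 and the type 2 estimate is $1/3$, so their sum equals the overall plug-in estimate for the same follow-up times, illustrating the additivity noted above.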

3. Inference

3.1 Pointwise confidence intervals

Here we give the large sample distribution for IV estimators of the local average treatment effect and the event-type specific local average treatment effect for some $t\in(0,\tau)$, where $\tau$ is the maximum follow-up time. These results yield asymptotic pointwise confidence intervals for the local effects.

Proposition 1.

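Let $n_r=\sum_{i=1}^{n}I[R_i=r]$. Assume that $n_r/n\to q_r>0$ as $n\to\infty$ for $r=0,1$, and let $y_r(t)=\Pr[X\ge t \mid R=r]$ and $y_{rz}(t)=\Pr[X\ge t, Z=z \mid R=r]$. Assume that $y_r(t), y_{rz}(t)>0$. Then $\sqrt{n}\{\hat\delta(t)-\delta(t)\}\to_d N(0,\sigma_\delta^2(t))$ and $\sqrt{n}\{\hat\delta_j(t)-\delta_j(t)\}\to_d N(0,\sigma_\delta^2(t,j))$ as $n\to\infty$.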

A proof of Proposition 1 and consistent estimators $\hat\sigma_\delta^2(t)$ and $\hat\sigma_\delta^2(t,j)$ of the asymptotic variances $\sigma_\delta^2(t)$ and $\sigma_\delta^2(t,j)$ are given in Appendix A of supplementary material available at Biostatistics online. It follows from Proposition 1 that an asymptotic $100(1-\alpha)\%$ confidence interval for $\delta(t)$ is given by $\hat\delta(t)\pm z_{\alpha/2}\hat\sigma_\delta(t)/\sqrt{n}$ and an asymptotic $100(1-\alpha)\%$ confidence interval for $\delta_j(t)$ is given by $\hat\delta_j(t)\pm z_{\alpha/2}\hat\sigma_\delta(t,j)/\sqrt{n}$, where $z_{\alpha/2}$ is the $1-\alpha/2$ quantile of a standard normal variate.
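Given a point estimate and a value of the estimated asymptotic standard deviation, the interval is a standard Wald interval. The sketch below assumes that $\hat\sigma_\delta(t)$ has already been computed from the Appendix A estimator, which is not reproduced here; the function name `wald_ci` and the numerical inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wald_ci(estimate, sigma_hat, n, alpha=0.05):
    """Interval estimate +/- z * sigma_hat / sqrt(n), with z the 1 - alpha/2 normal quantile.

    sigma_hat is assumed to be supplied by the caller (e.g., from the
    variance estimator in the paper's Appendix A).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma_hat / sqrt(n)
    return estimate - half_width, estimate + half_width

# Hypothetical inputs, for illustration only.
lower, upper = wald_ci(estimate=0.04, sigma_hat=0.5, n=1000)
```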

3.2 Hypothesis testing

To conduct tests for a difference between the two treatment groups in the survival curves and cumulative incidence curves for event type j, consider testing the following hypotheses

$$H_0:\ \delta_w(t_0)=\int_0^{t_0}w(u)\,\delta(u)\,du=0 \quad\text{and}\quad H_{0j}:\ \delta_{wj}(t_0)=\int_0^{t_0}w(u)\,\delta_j(u)\,du=0$$

for $t_0\in(0,\tau)$, where $w$ is a user-defined weight function that may be used to test for earlier or later differences. Let $\hat W$ be some estimator of $w$. For example, we might choose w(t)=S1(t)S0(t) and W^(t)=S^1(t)S^0(t).

Proposition 2.

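Suppose $\sup_{t\in[0,t_0]}|\hat W(t)-w(t)|\to_p 0$ as $n\to\infty$. Then $\sqrt{n}\{\hat\delta_w(t_0)-\delta_w(t_0)\}\to_d N(0,\sigma_w^2(t_0))$ under $H_0$ and $\sqrt{n}\{\hat\delta_{wj}(t_0)-\delta_{wj}(t_0)\}\to_d N(0,\sigma_w^2(t_0,j))$ under $H_{0j}$ as $n\to\infty$, where $\hat\delta_w(t_0)=\int_0^{t_0}\hat W(u)\hat\delta(u)\,du$ and $\hat\delta_{wj}(t_0)=\int_0^{t_0}\hat W(u)\hat\delta_j(u)\,du$.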

A proof of Proposition 2 and consistent estimators $\hat\sigma_w^2(t_0)$ and $\hat\sigma_w^2(t_0,j)$ of the asymptotic variances $\sigma_w^2(t_0)$ and $\sigma_w^2(t_0,j)$ are given in Appendix B of supplementary material available at Biostatistics online. It follows from Proposition 2 that weighted instrumental variable (WIV) tests with rejection regions defined by $Q=\{\hat\delta_w(t_0): |\sqrt{n}\,\hat\delta_w(t_0)/\hat\sigma_w(t_0)|>z_{1-\alpha/2}\}$ and $Q_j=\{\hat\delta_{wj}(t_0): |\sqrt{n}\,\hat\delta_{wj}(t_0)/\hat\sigma_w(t_0,j)|>z_{1-\alpha/2}\}$ provide asymptotically two-sided size $\alpha$ tests of $H_0:\delta_w(t_0)=0$ and $H_{0j}:\delta_{wj}(t_0)=0$.
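Because the plug-in estimators are step functions, the integrated statistic reduces to a finite sum. The following sketch illustrates this and the resulting two-sided test. It assumes that the weight estimate is also piecewise constant on the same knots and that $\hat\sigma_w(t_0)$ is supplied from the Appendix B estimator (not reproduced here); the function names and all numerical inputs are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def weighted_integral(knots, delta_vals, weight_vals, t0):
    """Integral over [0, t0] of W(u) * delta(u) du for right-continuous step functions.

    knots must be sorted with knots[0] == 0; both functions take the values
    delta_vals[i] and weight_vals[i] on [knots[i], knots[i+1]).
    """
    knots = np.asarray(knots, dtype=float)
    keep = knots < t0
    left = knots[keep]
    right = np.append(left[1:], t0)  # last interval is truncated at t0
    d = np.asarray(delta_vals, dtype=float)[keep]
    w = np.asarray(weight_vals, dtype=float)[keep]
    return float(np.sum(w * d * (right - left)))

def wiv_test(stat, sigma_hat, n, alpha=0.05):
    """Two-sided test based on sqrt(n) * stat / sigma_hat (sigma_hat assumed given)."""
    z = float(np.sqrt(n) * stat / sigma_hat)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject = abs(z) > NormalDist().inv_cdf(1 - alpha / 2)
    return z, p_value, reject

# Hypothetical step function with jumps at 2 and 5, unit weights, t0 = 7.
stat = weighted_integral([0, 2, 5], [0.0, 0.1, 0.3], [1, 1, 1], t0=7)
z, p_value, reject = wiv_test(stat, sigma_hat=9.0, n=400)
```

With unit weights the statistic is simply the area under the estimated effect curve up to $t_0$, here $0.1\times 3+0.3\times 2=0.9$.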

Rejection of $H_0$ indicates that the effect of treatment on the overall survival experience within the $ZP0=(0,1)$ principal stratum is nonzero. Similarly, rejection of $H_{0j}$ indicates that the effect of treatment on the cumulative incidence of events of type $j$ is nonzero within stratum $ZP0=(0,1)$. Similar to the estimands in (2.2) and (2.4), $\delta_w(t)=\sum_{j=1}^{k}\delta_{wj}(t)$. Again, the effect of treatment on one event type $j$ may cancel with the effect of treatment on another event type $j'$ such that the null hypothesis of no treatment effect on the overall survival $H_0$ is true, but the event type specific null hypotheses of no treatment effect $H_{0j}$ and $H_{0j'}$ are not true. Therefore, the availability of tests of $H_{0j}$ for $j=1,\ldots,k$ allows for testing of treatment effects on event-type specific cumulative incidence functions that might be missed if only a test of a treatment effect on the overall survival $H_0$ were conducted.

3.3 As treated and intent to treat analyses

An alternative approach to the IV-based methods developed in this paper might entail an “as treated” analysis of (2.1) and (2.3) based on the estimators $\tilde\delta(t)=\hat S_{11}(t)-\hat S_{00}(t)$ and $\tilde\delta_j(t)=\hat F_{00j}(t)-\hat F_{11j}(t)$, where $\hat S_{rz}(t)$ and $\hat F_{rzj}(t)$ are the Kaplan Meier estimator of the survival function and the Aalen Johansen estimator of the cumulative incidence function conditional on $R=r$ and $Z=z$; corresponding confidence intervals for (2.1) and (2.3) might be computed by appealing to asymptotic normality results (Andersen and others, 1995) for $\hat S_{rz}(t)$ and $\hat F_{rzj}(t)$. In such an “as treated” analysis, testing the hypotheses $H_0$ and $H_{0j}$ might be accomplished by replacing $\hat\delta(t)$ and $\hat\delta_j(t)$ with $\tilde\delta(t)$ and $\tilde\delta_j(t)$ in the WIV tests from Section 3.2 (and also modifying the variance estimators accordingly). For $H_0$, this yields the Pepe and Fleming (1989) weighted Kaplan Meier (WKM) test. Because of potential selection bias induced by conditioning on $R=r$ and $Z=z$, such a naive as-treated analysis is not expected to yield valid inferences about the local average treatment effects. Indeed this is demonstrated empirically in the next section.

Another competing method might entail an “intent to treat” (ITT) analysis of (2.1) and (2.3) based on the estimators $\hat S_1(t)-\hat S_0(t)$ and $\hat F_{0j}(t)-\hat F_{1j}(t)$, with large sample confidence intervals computed analogously to the as-treated intervals described above. Because the ITT analysis makes no adjustment for noncompliance, the ITT estimators are expected in general to be biased for the local average treatment effects; this is also shown empirically in the next section. On the other hand, ITT tests of the hypotheses $H_0$ and $H_{0j}$ based on $\int_0^{t_0}\hat W(u)\{\hat S_1(u)-\hat S_0(u)\}\,du$ and $\int_0^{t_0}\hat W(u)\{\hat F_{0j}(u)-\hat F_{1j}(u)\}\,du$ are equivalent to the WIV tests in Section 3.2. This occurs because under the null the difference in proportions $\widehat{dp}$ in the denominators of $\hat\delta(t)$ and $\hat\delta_j(t)$ cancels with the $\widehat{dp}^2$ term in the variance estimators when constructing the WIV tests (see Appendix B of supplementary material available at Biostatistics online). Similar results occur with uncensored data, where an asymptotic test based on the IV estimator is equivalent to a test based on ITT estimates of mean outcomes for the two treatment arms.

4. Simulation study

Simulation studies were conducted to assess the finite sample operating characteristics of the proposed IV estimators, corresponding confidence intervals, and WIV tests. Data sets were simulated under Assumptions 1–7. For each simulated data set, $n$ principal strata vectors $ZP0$ were simulated using a multinomial random number generator with parameter $\theta=(\theta_{00},\theta_{01},\theta_{10},\theta_{11})$, where $\theta_{ij}=\Pr[ZP0=(i,j)]$ ($\theta_{10}=0$ under Assumption 5). The parameter $\theta_{01}$ is the proportion of the population in the compliers principal stratum and provides a measure of the strength of the instrument in determining treatment selection. The randomized treatment assignment $R$ was simulated by randomly permuting a vector of size $n$ with $q_0 n$ elements equal to 0 and the remaining elements equal to 1. The random variable $Z$ was determined based on $R$ and $ZP0$. Censoring times $C$ were generated using a uniform random number generator on the interval $(c_R, c_R+d_R)$. The failure time $T(Z)=T$ was simulated by sampling from the distribution defined by an overall hazard of $\sum_{j=1}^{k}\lambda_{rzj}(t)$, where each $\lambda_{rzj}(t)$ is a Weibull hazard of the form $\kappa\gamma(\gamma t)^{\kappa-1}$ for four scenarios as detailed in Table 1. The event indicator $J$ was simulated by sampling from a multinomial random variable with $\Pr[J=j \mid T=u]=\lambda_{rzj}(u)/\sum_{j'=1}^{k}\lambda_{rzj'}(u)$ for $j=1,2$. If the subject was censored, i.e., $C<T$, $\Delta$ was set to 0; otherwise $\Delta=J$. All results are based on 5000 Monte Carlo simulations per scenario, $q_0=0.5$, $c_0=4$, $d_0=6$, $c_1=3$, $d_1=3$, $w(t)=1$ for all $t$, and $t_0=7$. Censoring rates were 25%, 18%, 25%, and 18% for scenarios 1–4 described below. Event 1 occurred at rates of 32%, 33%, 37%, and 41% and event 2 occurred at rates of 43%, 49%, 38%, and 41% for scenarios 1–4. For each data set in the simulation study, the IV, as-treated, and ITT analyses described in Section 3 were conducted.
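The data-generating mechanism described above can be sketched as follows. This is an illustrative reconstruction, not the authors' simulation code. One substitution is made and should be noted: instead of drawing $T$ from the overall hazard and then $J$ from the multinomial, the sketch draws independent latent Weibull times for each event type and takes the minimum, which yields the same overall hazard and the same conditional event-type probabilities. The function name `simulate`, the seed, and the argument names are assumptions; the parameter values shown are those of Scenario 1 with $\theta_{01}=0.6$ from Table 1.

```python
import numpy as np

# (gamma, kappa) for the never-treated and always-treated strata (Table 1 caption).
NEVER, ALWAYS = (0.16, 1.0), (0.10, 1.0)

def simulate(n, theta01, gk_z0, gk_z1, rng, q0=0.5, c=(4.0, 3.0), d=(6.0, 3.0)):
    """Simulate one data set (R, Z, X, Delta) with two event types.

    gk_z0 / gk_z1: ((gamma, kappa) for j=1, (gamma, kappa) for j=2) for
    compliers with z=0 / z=1. Hypothetical helper for illustration.
    """
    # Strata coded 0 = never treated (0,0), 1 = complier (0,1), 2 = always treated (1,1).
    probs = [(1 - theta01) / 2, theta01, (1 - theta01) / 2]
    stratum = rng.choice(3, size=n, p=probs)
    n0 = int(round(q0 * n))
    R = rng.permutation(np.repeat([0, 1], [n0, n - n0]))
    Z = np.where(stratum == 2, 1, np.where(stratum == 1, R, 0))
    T = np.empty(n)
    J = np.empty(n, dtype=int)
    for i in range(n):
        if stratum[i] == 0:
            gk = (NEVER, NEVER)
        elif stratum[i] == 2:
            gk = (ALWAYS, ALWAYS)
        else:
            gk = gk_z1 if Z[i] == 1 else gk_z0
        # rng.weibull(kappa) has survival exp(-x**kappa); dividing by gamma
        # gives the hazard kappa * gamma * (gamma * t)**(kappa - 1).
        latent = [rng.weibull(kappa) / gamma for gamma, kappa in gk]
        J[i] = int(np.argmin(latent)) + 1
        T[i] = min(latent)
    low = np.take(c, R)
    C = rng.uniform(low, low + np.take(d, R))  # uniform on (c_R, c_R + d_R)
    X = np.minimum(T, C)
    Delta = np.where(T <= C, J, 0)
    return R, Z, X, Delta

rng = np.random.default_rng(2016)  # arbitrary seed
R, Z, X, Delta = simulate(
    1000, 0.6,
    gk_z0=((0.12, 1.2), (0.24, 1.2)),
    gk_z1=((0.12, 1.2), (0.12, 1.2)),
    rng=rng,
)
```

Drawing the minimum of independent latent times is used here only because it is short to write; any sampler that reproduces the stated cause-specific hazards would serve equally well.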

Table 1.

Empirical type I error and power of the WIV test and the naive WKM test of $H_{0j}:\delta_{wj}(t_0)=0$ and $H_0:\delta_w(t_0)=0$ from the simulation study described in Section 4 for $n=300, 1000, 2000$. Results are based on $\theta=([1-\theta_{01}]/2,\ \theta_{01},\ 0,\ [1-\theta_{01}]/2)$ for various $\theta_{01}$. The hazard for each $j$ within each $ZP0$ has Weibull hazard of the form $\kappa\gamma(\gamma t)^{\kappa-1}$ for parameters $(\gamma,\kappa)$. For $ZP0=(1,1)$, $(\gamma,\kappa)=(0.10,1)$ for $j=1,2$ and for $ZP0=(0,0)$, $(\gamma,\kappa)=(0.16,1)$ for $j=1,2$.

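The $(\gamma,\kappa)$ columns refer to the $ZP0=(0,1)$ stratum; the remaining columns give the power (%) of the tests of $H_{0j}:\delta_{wj}(t_0)=0$.

| Scenario | θ01 | j | (γ,κ), z=0 | (γ,κ), z=1 | WIV, n=300 | WIV, n=1000 | WIV, n=2000 | Naive WKM, n=300 | Naive WKM, n=1000 | Naive WKM, n=2000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.6 | 1 | (0.12,1.2) | (0.12,1.2) | 5 | 8 | 14 | 83 | 98 | 99 |
| | | 2 | (0.24,1.2) | (0.12,1.2) | 63 | 95 | 99 | 90 | 100 | 100 |
| | | (all) | | | 56 | 97 | 100 | 89 | 100 | 100 |
| | 0.3 | 1 | (0.12,1.2) | (0.12,1.2) | 4 | 4 | 4 | 59 | 93 | 98 |
| | | 2 | (0.24,1.2) | (0.12,1.2) | 20 | 51 | 80 | 20 | 51 | 80 |
| | | (all) | | | 19 | 50 | 78 | 79 | 99 | 100 |
| 2 | 0.6 | 1 | (0.1,1.2) | (0.2,1.2) | 59 | 96 | 99 | 87 | 99 | 100 |
| | | 2 | (0.3,1.2) | (0.2,1.2) | 65 | 97 | 99 | 89 | 100 | 100 |
| | | (all) | | | 6 | 6 | 6 | 17 | 45 | 74 |
| | 0.3 | 1 | (0.1,1.2) | (0.2,1.2) | 18 | 45 | 74 | 18 | 45 | 74 |
| | | 2 | (0.3,1.2) | (0.2,1.2) | 20 | 50 | 76 | 20 | 50 | 76 |
| | | (all) | | | 6 | 5 | 5 | 34 | 83 | 99 |
| 3 | 0.6 | 1 | (0.19,1.2) | (0.12,1.2) | 14 | 35 | 61 | 30 | 69 | 90 |
| | | 2 | (0.19,1.2) | (0.12,1.2) | 15 | 35 | 61 | 15 | 35 | 61 |
| | | (all) | | | 62 | 98 | 100 | 91 | 100 | 100 |
| | 0.3 | 1 | (0.19,1.2) | (0.12,1.2) | 6 | 10 | 17 | 13 | 20 | 29 |
| | | 2 | (0.19,1.2) | (0.12,1.2) | 8 | 10 | 17 | 17 | 19 | 31 |
| | | (all) | | | 21 | 55 | 84 | 42 | 94 | 100 |
| 4 | 0.6 | 1 | (0.2,1.2) | (0.2,1.2) | 4 | 3 | 2 | 6 | 9 | 10 |
| | | 2 | (0.2,1.2) | (0.2,1.2) | 5 | 3 | 3 | 6 | 10 | 12 |
| | | (all) | | | 6 | 5 | 5 | 13 | 32 | 58 |
| | 0.3 | 1 | (0.2,1.2) | (0.2,1.2) | 4 | 3 | 3 | 5 | 7 | 9 |
| | | 2 | (0.2,1.2) | (0.2,1.2) | 5 | 3 | 2 | 5 | 7 | 9 |
| | | (all) | | | 5 | 5 | 5 | 5 | 17 | 31 |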

Table 1 gives the estimated power of the WIV and WKM tests for various sample sizes for each scenario. Scenario 1 describes a situation in which there is a treatment effect on event type $j=2$ in the complier principal stratum but not on event type 1. In this scenario, the power of the WIV test to reject $H_{02}$ is similar to that for $H_0$ and the power to reject $H_{01}$ is small (though note that this scenario is not null, i.e., $\delta_{w1}(t_0)\ne 0$). Scenario 2 describes a situation in which there is a treatment effect on both event types, but these effects cancel each other out such that $\delta_w(t_0)=0$ (as described in Section 2.3 and at the end of Section 3.2). The empirical type I error of the WIV test to reject $H_0$ in Scenario 2 was approximately 5%. The opposing causal effects for $j=1$ and 2 are roughly the same magnitude as $\delta_{w2}(t_0)$ in Scenario 1, and the power of the WIV test to reject both $H_{01}$ and $H_{02}$ in Scenario 2 is similar to the power to reject $H_{02}$ in Scenario 1. Scenario 3 describes a situation in which there is again a treatment effect for both event types, but these have the same sign and magnitude. As would be expected, the power of the WIV test to reject $H_0$ in this situation is higher than that of $H_{01}$ or $H_{02}$, which are roughly the same. Scenario 4 describes a situation in which there are no causal treatment effects in the complier principal stratum for event type 1 or 2 such that $\delta_w(t_0)=\delta_{w1}(t_0)=\delta_{w2}(t_0)=0$. Again, as expected the results here demonstrate that the tests of $H_0$, $H_{01}$, and $H_{02}$ control the type I error rate. In all scenarios, the strength of the instrument (as measured by $\theta_{01}$, the proportion of the population who are compliers) has large effects on the power of the test, with increasing instrument strength yielding increased power.

Power of the WKM tests tended to be greater than the WIV tests. However, the WKM tests failed to control the nominal type I error. In particular, for Scenarios 2 and 4 for j=(all), the empirical type I error tended to be greater than 5%, sometimes substantially so, and increasing with sample size. These results demonstrate that the WKM tests do not provide valid size α tests of no local average treatment effects.

Table 2 shows that the IV estimators $\hat\delta(t)$ and $\hat\delta_j(t)$ are approximately unbiased and that the variance estimators accurately estimate the true variance (as indicated by the ratio of the average estimated standard error to the empirical standard error). The coverage of the IV pointwise confidence intervals is approximately the nominal 0.95 in almost all scenarios. On the other hand, the naive “as treated” estimators $\tilde\delta(t)$ and $\tilde\delta_j(t)$ have higher bias, and the coverage for the corresponding confidence intervals is poor in several scenarios (e.g., see Scenario 2, $j=1$ or Scenario 4, for all $j$). The power to reject $H_{0j}(t):\delta_j(t)=0$ based on the IV pointwise confidence intervals gives similar results to Table 1, particularly for $t=5$. These IV based tests again yield type I error of approximately 5%. However, testing $H_{0j}(t)$ using a naive WKM analysis again results in an inflated type I error.

Table 2.

Simulation results: bias (× 100), empirical standard error (ESE) (× 100), the ratio of the average estimated standard error and the empirical standard error (ESE Ratio, %), coverage of pointwise 95% confidence intervals for δj(t) and the percent power to reject H0j(t):δj(t)=0 and H0:δw(t)=0 (%) based on (i) the IV estimators δ^j(t) and confidence intervals and (ii) the naive estimators δ˜j(t) and confidence intervals for simulation Scenarios 1–4 as described in Table 1 for θ01=0.6 and n=1000

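| Scenario | j | t | Bias, IV | Bias, naive | ESE, IV | ESE, naive | ESE Ratio, IV | ESE Ratio, naive | Coverage, IV | Coverage, naive | Power, IV | Power, naive |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | −0.2 | 2.9 | 4.4 | 3.0 | 104 | 99 | 96 | 84 | 11 | 6 |
| | | 5 | −0.2 | 4.1 | 5.2 | 3.4 | 105 | 101 | 96 | 79 | 31 | 21 |
| | 2 | 3 | 0.2 | −3.3 | 5.0 | 3.6 | 94 | 90 | 93 | 82 | 99 | 100 |
| | | 5 | 0.2 | −4.3 | 5.6 | 3.7 | 95 | 98 | 93 | 79 | 100 | 100 |
| | (all) | 3 | 0.0 | −0.4 | 5.3 | 3.5 | 100 | 101 | 95 | 95 | 93 | 100 |
| | | 5 | 0.0 | −0.3 | 5.2 | 3.4 | 100 | 101 | 95 | 95 | 92 | 100 |
| 2 | 1 | 3 | −0.1 | 7.0 | 4.6 | 3.1 | 110 | 97 | 97 | 36 | 98 | 97 |
| | | 5 | −0.1 | 8.5 | 5.3 | 3.5 | 112 | 99 | 97 | 30 | 99 | 100 |
| | 2 | 3 | 0.1 | −2.8 | 5.2 | 3.7 | 94 | 92 | 93 | 87 | 97 | 100 |
| | | 5 | 0.1 | −4.2 | 5.6 | 3.7 | 93 | 99 | 93 | 79 | 99 | 100 |
| | (all) | 3 | 0.0 | 4.1 | 5.2 | 3.4 | 100 | 102 | 95 | 78 | 6 | 22 |
| | | 5 | 0.0 | 4.2 | 4.8 | 3.0 | 100 | 101 | 95 | 72 | 5 | 28 |
| 3 | 1 | 3 | −0.3 | −0.5 | 4.8 | 3.3 | 97 | 95 | 94 | 94 | 54 | 84 |
| | | 5 | −0.1 | −0.3 | 5.5 | 3.7 | 97 | 98 | 94 | 94 | 41 | 70 |
| | 2 | 3 | 0.0 | −0.4 | 4.8 | 3.3 | 96 | 96 | 94 | 94 | 57 | 85 |
| | | 5 | 0.1 | −0.2 | 5.6 | 3.7 | 96 | 99 | 94 | 95 | 43 | 70 |
| | (all) | 3 | −0.2 | −0.9 | 5.1 | 3.4 | 103 | 103 | 96 | 95 | 96 | 100 |
| | | 5 | 0.0 | −0.6 | 5.1 | 3.4 | 102 | 101 | 96 | 95 | 95 | 100 |
| 4 | 1 | 3 | 0.1 | 2.3 | 5.0 | 3.5 | 98 | 93 | 95 | 88 | 5 | 12 |
| | | 5 | 0.0 | 2.3 | 5.5 | 3.7 | 100 | 99 | 95 | 90 | 5 | 10 |
| | 2 | 3 | −0.1 | 2.2 | 5.0 | 3.5 | 99 | 93 | 94 | 89 | 6 | 11 |
| | | 5 | −0.1 | 2.2 | 5.6 | 3.8 | 98 | 97 | 95 | 90 | 5 | 10 |
| | (all) | 3 | 0.1 | 4.4 | 5.3 | 3.5 | 98 | 98 | 94 | 74 | 6 | 26 |
| | | 5 | 0.0 | 4.4 | 4.9 | 3.1 | 98 | 99 | 95 | 69 | 5 | 31 |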

Simulation results comparing the IV and ITT estimators are given in Appendix C of supplementary material available at Biostatistics online. Table C.1 of supplementary material available at Biostatistics online gives the bias, empirical standard error, and coverage of the pointwise confidence intervals for the IV estimators $\hat\delta(t)$ and $\hat\delta_j(t)$ and the ITT estimators $\hat S_1(t)-\hat S_0(t)$ and $\hat F_{0j}(t)-\hat F_{1j}(t)$ for $j=1,2$. These results demonstrate that ITT estimators are also biased, and coverage of the corresponding confidence intervals is not nominal.

5. Application to the BAN study

In this section, the methods developed in Section 3 are employed to compare the cumulative incidence of HIV or death in the infant NVP arm and the maternal ARV arm to the control group in the BAN study. Here R denotes randomized treatment assignment, and Z denotes the actual treatment taken based on the treatment compliance surveys completed by mothers shortly after randomization. Mother-infant pairs assigned to infant NVP or maternal ARV were considered noncompliant (i.e., Z=0) if any pills were reported as missed on the first completed treatment compliance survey administered 1–2 weeks after randomization. In the maternal ARV arm, 12% of the pairs met this criteria, and in the infant NVP arm 5% met this criteria. Following Chasela and others (2010), mother-infant pairs were excluded from the analysis if the infant became infected or died by two weeks. Death times were observed exactly whereas HIV infection times were interval censored between an infant's last negative and first positive HIV tests. However, the testing intervals were narrow, and therefore the time of HIV infection was assumed to equal the time of first positive HIV test as in the analysis by Chasela and others (2010).

The nonparametric IV and naive “as treated” estimates along with corresponding 95% confidence intervals of $\delta_j(t)$ for HIV infection ($j=1$) or death ($j=2$) and of $\delta(t)=\sum_{j=1}^{2}\delta_j(t)$ are given in Table 3. Figure 1 depicts IV estimates of the overall cumulative distribution functions partitioned by cumulative incidence of HIV and death for each treatment arm as well as the results of the WIV tests for $H_0$ and $H_{0j}$ at $t_0=48$ weeks using weights $w(t)=1$ for all $t$. These results indicate that, compared to control, infant NVP and maternal ARV resulted in significant decreases in the risk of infant HIV infection in the complier stratum. These results also indicate that the interventions lowered the risk of the composite endpoint of HIV infection or death in the complier stratum. The estimated risk of death prior to infection in the complier stratum was lower in both intervention arms compared to the control arm; however, these effects were not statistically significant. Table 3 and Figure 1 also demonstrate that the IV pointwise confidence intervals and WIV tests give qualitatively similar results to a naive analysis adjusting for compliance (as described in Section 4) when comparing the infant NVP arm to control. This might be expected because the proportion that were compliant in the infant NVP arm was quite high.

Table 3.

Results for the BAN study: IV ($\hat{\delta}_j(t)$ and $\hat{\delta}(t)$) and naive ($\tilde{\delta}_j(t)$ and $\tilde{\delta}(t)$) estimates (×100) and corresponding 95% confidence intervals for (a) infant NVP versus control and (b) maternal ARV versus control for endpoints of infant HIV infection (j=1), death (j=2) and HIV infection or death (all j)

| Endpoint, t | (a) Infant NVP vs control: IV (95% CI) | (a) Infant NVP vs control: naive (95% CI) | (b) Maternal ARV vs control: IV (95% CI) | (b) Maternal ARV vs control: naive (95% CI) |
| --- | --- | --- | --- | --- |
| HIV infection (j=1) | | | | |
| 6 weeks | 1.60 (0.56, 2.64) | 1.53 (0.54, 2.52) | 0.69 (−0.68, 2.07) | 0.60 (−0.63, 1.84) |
| 18 weeks | 3.31 (1.76, 4.86) | 3.16 (1.67, 4.65) | 1.79 (−0.19, 3.76) | 1.59 (−0.19, 3.36) |
| 28 weeks | 3.41 (1.56, 5.25) | 3.36 (1.60, 5.12) | 2.20 (−0.01, 4.42) | 1.88 (−0.13, 3.88) |
| 48 weeks | 2.55 (0.19, 4.92) | 2.59 (0.32, 4.85) | 2.07 (−0.58, 4.72) | 1.71 (−0.70, 4.12) |
| Death (j=2) | | | | |
| 6 weeks | 0.40 (−0.26, 1.05) | 0.37 (−0.26, 1.01) | 0.28 (−0.49, 1.06) | 0.36 (−0.29, 1.01) |
| 18 weeks | 0.69 (−0.61, 1.98) | 0.61 (−0.66, 1.87) | 0.43 (−1.05, 1.91) | 0.80 (−0.44, 2.04) |
| 28 weeks | 1.12 (−0.55, 2.78) | 0.98 (−0.64, 2.60) | 1.05 (−0.80, 2.89) | 1.44 (−0.13, 3.00) |
| 48 weeks | 1.55 (−0.59, 3.68) | 1.34 (−0.74, 3.42) | 2.01 (−0.29, 4.30) | 2.35 (0.38, 4.32) |
| Death or HIV infection (all j) | | | | |
| 6 weeks | 2.00 (0.77, 3.22) | 1.90 (0.73, 3.08) | 0.98 (−0.59, 2.55) | 0.96 (−0.43, 2.35) |
| 18 weeks | 4.00 (2.00, 6.00) | 3.76 (1.83, 5.70) | 2.22 (−0.21, 4.65) | 2.39 (0.24, 4.53) |
| 28 weeks | 4.52 (2.07, 6.97) | 4.34 (1.98, 6.70) | 3.25 (0.43, 6.07) | 3.31 (0.80, 5.82) |
| 48 weeks | 4.10 (0.98, 7.22) | 3.93 (0.91, 6.94) | 4.08 (0.68, 7.47) | 4.06 (1.01, 7.11) |
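Because the composite effect is defined as the sum of the cause-specific effects, the IV point estimates in Table 3 can be checked for internal consistency. The short Python sketch below does this with the values transcribed from the table; the variable names are illustrative, and the tolerance is an assumption chosen to allow for each entry being rounded to two decimals.

```python
# IV point estimates (x100) transcribed from Table 3, at 6, 18, 28 and 48 weeks.
iv_estimates = {
    "infant_nvp": {
        "hiv": [1.60, 3.31, 3.41, 2.55],
        "death": [0.40, 0.69, 1.12, 1.55],
        "composite": [2.00, 4.00, 4.52, 4.10],
    },
    "maternal_arv": {
        "hiv": [0.69, 1.79, 2.20, 2.07],
        "death": [0.28, 0.43, 1.05, 2.01],
        "composite": [0.98, 2.22, 3.25, 4.08],
    },
}

# Three rounded numbers are involved, each off by at most 0.005.
ROUNDING_TOLERANCE = 0.0151


def max_discrepancy(arm: str) -> float:
    """Largest |hiv + death - composite| over the four time points for one comparison."""
    est = iv_estimates[arm]
    return max(
        abs(h + d - c)
        for h, d, c in zip(est["hiv"], est["death"], est["composite"])
    )


for arm in iv_estimates:
    # The cause-specific estimates should add up to the composite, up to rounding.
    assert max_discrepancy(arm) <= ROUNDING_TOLERANCE
```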

Fig. 1.

Application to the BAN study: cumulative incidence estimates partitioned by event type and results of the hypothesis tests of no treatment effect on cumulative incidence of HIV, $H_{01}:\delta_{w1}(t_0)=0$; no treatment effect on death, $H_{02}:\delta_{w2}(t_0)=0$; and no effect of treatment on death or cumulative incidence of HIV, $H_0:\delta_w(t_0)=0$, based on the WIV tests in Proposition 2 for (a) infant NVP versus control and (b) maternal ARV versus control for $t_0=48$ weeks. In each panel the lower step function equals $\hat{F}_{r1}(t)$, the estimated probability of HIV infection by time t, and the upper step function equals $\hat{F}_{r1}(t)+\hat{F}_{r2}(t)$, the estimated probability of HIV infection or death by time t.
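The stacked step functions in Figure 1 are built from nonparametric cumulative incidence estimates. As a rough illustration of that building block only, the sketch below computes an Aalen–Johansen-type estimate of the cumulative incidence functions for a single group in plain Python. It is not the IV estimator for the complier stratum, and the function name and the coding of causes (0 = censored, 1 = HIV infection, 2 = death) are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def cumulative_incidence(
    times: Sequence[float], causes: Sequence[int], n_causes: int = 2
) -> Tuple[List[float], List[Tuple[float, ...]]]:
    """Aalen-Johansen-type estimate of P(T <= t, J = j) at each distinct event time.

    causes[i] is 0 for a censored observation, otherwise the event type 1..n_causes.
    Returns (event_times, cif), where cif[k][j - 1] is the estimate for cause j at event_times[k].
    """
    data = sorted(zip(times, causes))
    n = len(data)
    at_risk = n
    surv = 1.0                 # overall event-free probability just before the current time
    cif = [0.0] * n_causes
    out_times: List[float] = []
    out_cif: List[Tuple[float, ...]] = []
    i = 0
    while i < n:
        t = data[i][0]
        events = [0] * n_causes
        k = i
        # Collect all observations tied at time t; events are treated as preceding censorings.
        while k < n and data[k][0] == t:
            if data[k][1] != 0:
                events[data[k][1] - 1] += 1
            k += 1
        if sum(events) > 0:
            for j in range(n_causes):
                cif[j] += surv * events[j] / at_risk
            surv *= 1.0 - sum(events) / at_risk
            out_times.append(t)
            out_cif.append(tuple(cif))
        at_risk -= k - i
        i = k
    return out_times, out_cif
```

In the notation of the figure, the lower curve corresponds to the first component at each time and the upper curve to the sum of both components.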

On the other hand, the IV-based and naive analyses reach different conclusions when comparing the maternal ARV arm to control in the incidence of HIV. For example, as seen in Table 3, IV-based estimates of the difference in cumulative incidence of HIV between maternal ARV and control are roughly 20–30% greater than the naive estimates. Moreover, a significant positive effect of maternal ARV versus control is found when using the WIV test (Z-score $|\sqrt{n}\,\hat{\delta}_{w1}(48)/\hat{\sigma}_w(48,1)|=1.96$, p-value 0.05), whereas the naive WKM test does not reject the null hypothesis $H_{01}$ of no treatment effect on cumulative incidence of HIV (Z-score 1.67, p-value 0.09). For the death and composite endpoints, the IV and naive inferences were similar, although naive estimates of the difference in cumulative incidence of death are somewhat higher, and at 18 weeks the naive confidence interval for δ(t) excludes 0, whereas the IV confidence interval does not.
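The reported p-values follow from the Z-scores under a standard normal reference distribution. A minimal Python sketch of that conversion, assuming two-sided tests, is:

```python
import math


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic z."""
    # P(|N(0,1)| > |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))


p_wiv = two_sided_p_value(1.96)    # WIV test, maternal ARV versus control, HIV endpoint
p_naive = two_sided_p_value(1.67)  # naive WKM test for the same comparison
```

Rounded to two decimals, these give 0.05 and 0.09, in agreement with the values quoted above.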

Results comparing the IV estimates to those obtained by an ITT analysis are contained in Table D.1 of supplementary material available at Biostatistics online. The differences between the IV and ITT estimates are more pronounced for the comparison of maternal ARV with control because compliance was higher in the infant NVP arm. In particular, the infant NVP IV estimates are 5% greater than the ITT estimates, whereas the maternal ARV IV estimates are 14% greater than the ITT estimates.
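These percentages are what one would expect if the IV estimate is the ITT estimate divided by the difference in the proportion treated between arms. The sketch below works through the arithmetic under two assumptions that are made here for illustration: the plug-in estimator has this Wald-type ratio form, and no control pair received an intervention, so the denominator is simply the compliance proportion in the intervention arm. The noncompliance proportions (5% and 12%) are those reported earlier in this section.

```python
def iv_inflation_percent(noncompliance: float, control_uptake: float = 0.0) -> float:
    """Percent by which a Wald-type IV estimate exceeds the ITT estimate.

    Assumes IV = ITT / (P(Z=1 | R=1) - P(Z=1 | R=0)); control_uptake = 0 is an assumption.
    """
    compliance_difference = (1.0 - noncompliance) - control_uptake
    return (1.0 / compliance_difference - 1.0) * 100.0


infant_nvp = iv_inflation_percent(0.05)    # 5% of pairs noncompliant
maternal_arv = iv_inflation_percent(0.12)  # 12% of pairs noncompliant
```

Rounded to whole percentages, the two values are 5 and 14, matching the differences between the IV and ITT estimates described above.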

The veracity of the IV analysis relies on Assumptions 1–7. Although interference between mother-infant pairs was not likely, Assumption 1 could have been violated by changes in the infant and maternal treatment regimens during the course of the study. Because R is randomized treatment assignment, Assumptions 2 and 7 should hold. Assumption 3 is plausible, although study participants were not blinded so that randomization assignment may have had an effect on T or J not mediated through treatment received. Not surprisingly, associations between R and Z in the observed data support Assumptions 4 and 5.

On the other hand, there is some indication that Assumption 6 may not hold. Specifically, Sellers and others (2015) found a significant association between censoring and certain covariates. Analyses relaxing Assumption 6 by only requiring that $\{T,J\} \perp C \mid \{R,V\}$ for some set of covariates V by utilizing inverse probability of censoring weights are detailed in Appendix D.2 of supplementary material available at Biostatistics online. Results using censoring weights were similar to those in Table 3; see Table D.2 of Appendix D.2 of supplementary material available at Biostatistics online.
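To give a sense of how inverse probability of censoring weights can enter a cumulative incidence estimate, the sketch below weights each observed event by the inverse of an estimated probability of remaining uncensored. It is a simplified illustration and not the procedure of Appendix D.2: it assumes a single discrete covariate V, estimates the censoring distribution by a Kaplan–Meier estimator within each level of V, and would be applied separately within each randomized arm. All function names are hypothetical.

```python
from typing import Callable, Dict, Hashable, Sequence


def censoring_survival_left(
    times: Sequence[float], causes: Sequence[int]
) -> Callable[[float], float]:
    """Kaplan-Meier estimate of G(t-) = P(C >= t), treating censorings (cause 0) as the events."""
    obs = list(zip(times, causes))
    steps = []
    g = 1.0
    for s in sorted({t for t, c in obs if c == 0}):
        n_cens = sum(1 for t, c in obs if t == s and c == 0)
        # Failures tied at s are treated as occurring before censorings at s.
        at_risk = sum(1 for t, c in obs if t > s or (t == s and c == 0))
        g *= 1.0 - n_cens / at_risk
        steps.append((s, g))

    def g_left(t: float) -> float:
        value = 1.0
        for s, gs in steps:
            if s < t:
                value = gs
        return value

    return g_left


def ipcw_cif(
    times: Sequence[float],
    causes: Sequence[int],
    strata: Sequence[Hashable],
    t: float,
    j: int,
) -> float:
    """IPCW estimate of P(T <= t, J = j), with censoring weights estimated within strata of V."""
    n = len(times)
    g_by_stratum: Dict[Hashable, Callable[[float], float]] = {}
    for v in set(strata):
        idx = [i for i in range(n) if strata[i] == v]
        g_by_stratum[v] = censoring_survival_left(
            [times[i] for i in idx], [causes[i] for i in idx]
        )
    total = 0.0
    for ti, ci, vi in zip(times, causes, strata):
        if ci == j and ti <= t:
            total += 1.0 / g_by_stratum[vi](ti)  # weight 1 / G(T_i- | V_i)
    return total / n
```

With no censoring every weight equals one and the estimate reduces to the empirical proportion of type-j events by time t.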

6. Discussion

There are several possible avenues of future research related to this work. The WIV tests here compare treatment and control within the complier principal stratum using a statistic based on the integral of weighted differences between cumulative incidence functions. A test based on differences in the sub-distribution hazard estimates, as in the Gray test (Gray, 1988), may also yield a valid test of the hypothesis given in Proposition 2. Methods relaxing Assumption 5 would be helpful in settings where monotonicity may not hold, such as in trials comparing two active arms. Relaxing Assumption 6 using inverse probability of censoring weights based on covariates predictive of failure and censoring is discussed in the supplementary material available at Biostatistics online. In observational studies, finding an IV satisfying Assumption 7 may be difficult. Relaxing Assumption 7 such that R is independent of the potential outcomes conditional on some set of covariates may increase the likelihood of obtaining an IV. Additionally, sensitivity analysis methods might be developed to assess the robustness of inferences to violations of Assumption 7. In this paper, compliance is simplified to an all-or-nothing binary measure; however, in many real-world applications compliance may be more complicated. For example, in a randomized trial some subjects may be partially compliant. Thus methods that allow for a more general form of either the IV R or the treatment received Z may also be useful.

Supplementary material

Supplementary material is available at http://biostatistics.oxfordjournals.org.

Acknowledgments

The content is solely the responsibility of the authors and does not necessarily represent the official views of CDC or NIH. The authors thank the Associate Editor and two reviewers for helpful comments and the BAN investigators for access to study data. Conflict of Interest: None declared.

Funding

US Centers for Disease Control and Prevention (CDC) (U48-DP001944), and US National Institutes of Health (NIH) (P30 AI50410) and (R01 AI085073).

References

  1. Aalen O.O., Johansen S. (1978). An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scandinavian Journal of Statistics 5, 141–150.
  2. Abadie A. (2003). Semiparametric instrumental variable estimation of treatment response models. Journal of Econometrics 113, 231–263.
  3. Abbring J., van den Berg G. (2005). Social experiments and instrumental variables with duration outcomes. IFS Working Papers W05/19, London: Institute for Fiscal Studies.
  4. Andersen P.K., Borgan O., Gill R.D., Keiding N. (1995). Statistical Models Based on Counting Processes (Springer Series in Statistics), 2nd edition. New York: Springer.
  5. Angrist J.D., Imbens G.W., Rubin D.B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91, 444–455.
  6. Baker S.G. (1998). Analysis of survival data from a randomized trial with all-or-none compliance: estimating the cost-effectiveness of a cancer screening program. Journal of the American Statistical Association 93, 929–934.
  7. Brookhart M.A., Schneeweiss S. (2007). Preference-based instrumental variable methods for the estimation of treatment effects: assessing validity and interpreting results. The International Journal of Biostatistics 3, 1–23.
  8. Cain L.E., Cole S.R., Greenland S., Brown T.T., Chmiel J.S., Kingsley L., Detels R. (2009). Effect of highly active antiretroviral therapy on incident AIDS using calendar period as an instrumental variable. American Journal of Epidemiology 169, 1124–1132.
  9. Chasela C.S., Hudgens M.G., Jamieson D.J., Kayira D., Hosseinipour M.C., Kourtis A.P., Martinson F., Tegha G., Knight R.J., Ahmed Y.I., et al. (2010). Maternal or infant antiretroviral drugs to reduce HIV-1 transmission. New England Journal of Medicine 362, 2271–2281.
  10. Cuzick J., Sasieni P., Myles J., Tyrer J. (2007). Estimating the effect of treatment in a proportional hazards model in the presence of non-compliance and contamination. Journal of the Royal Statistical Society Series B 69, 565–588.
  11. Gray R.J. (1988). A class of k-sample tests for comparing the cumulative incidence of a competing risk. The Annals of Statistics 16, 1141–1154.
  12. Hernán M.A., Robins J.M. (2006). Instruments for causal inference: an epidemiologist's dream? Epidemiology 17, 360–372.
  13. Imbens G.W., Angrist J.D. (1994). Identification and estimation of local average treatment effects. Econometrica 62, 467–475.
  14. Kalbfleisch J.D., Prentice R.L. (2002). The Statistical Analysis of Failure Time Data, 2nd edition, Wiley Series in Probability and Statistics. New Jersey: Wiley-Interscience.
  15. Li J., Fine J.P., Brookhart M.A. (2015). Instrumental variable additive hazards models. Biometrics 71, 122–130.
  16. Loeys T., Goetghebeur E. (2003). A causal proportional hazards estimator for the effect of treatment actually received in a randomized trial with all-or-nothing compliance. Biometrics 59, 100–105.
  17. MacKenzie T.A., Tosteson T.D., Morden N.E., Stukel T.A., O'Malley A.J. (2014). Using instrumental variables to estimate a Cox's proportional hazards regression subject to additive confounding. Health Services and Outcomes Research Methodology 14, 54–68.
  18. Martens E.P., Pestman W.R., de Boer A., Belitser S.V., Klungel O.H. (2006). Instrumental variables: application and limitations. Epidemiology 17, 260–267.
  19. Nie H., Cheng J., Small D.S. (2011). Inference for the effect of treatment on survival probability in randomized trials with noncompliance and administrative censoring. Biometrics 67, 1397–1405.
  20. Ogburn E.L., Rotnitzky A., Robins J.M. (2015). Doubly robust estimation of the local average treatment effect curve. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 77, 373–396.
  21. Pepe M.S., Fleming T.R. (1989). Weighted Kaplan-Meier statistics: a class of distance tests for censored survival data. Biometrics 45, 497–507.
  22. Robins J.M., Tsiatis A.A. (1991). Correcting for non-compliance in randomized trials using rank preserving structural failure time models. Communications in Statistics - Theory and Methods 20, 2609–2631.
  23. Rubin D.B. (1980). Discussion of “Randomization analysis of experimental data in the Fisher randomization test,” by D. Basu. Journal of the American Statistical Association 75, 591–593.
  24. Sellers C.J., Lee H., Chasela C., Kayira D., Soko A., Mofolo I., Ellington S., Hudgens M.G., Kourtis A.P., King C.C., et al. (2015). Reducing lost to follow-up in a large clinical trial of prevention of mother-to-child transmission of HIV: the breastfeeding, antiretrovirals and nutrition study experience. Clinical Trials 12, 156–165.
  25. Tan Z. (2006). Regression and weighting methods for causal inference using instrumental variables. Journal of the American Statistical Association 101, 1607–1618.
  26. Tchetgen Tchetgen E.J., Walter S., Vansteelandt S., Martinussen T., Glymour M. (2015). Instrumental variable estimation in a survival context. Epidemiology 26, 402–410.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press
