Summary
In both observational studies and randomized trials with noncompliance, unmeasured confounding may exist which may bias treatment effect estimates. Instrumental variables (IV) are a popular technique for addressing such confounding, enabling consistent estimation of causal effects. This paper proposes nonparametric IV estimators for censored time to event data that may be subject to competing risks. A simple, plug-in estimator is introduced using nonparametric estimators of the cumulative incidence function, with confidence intervals derived using asymptotic theory. To provide an overall test of the treatment effect, an integrated weighted difference statistic is suggested, which is applicable to data with and without competing risks. Simulation studies demonstrate that the methods perform well with realistic sample sizes. The methods are applied to assess the effect of infant or maternal antiretroviral therapy on transmission of HIV from mother to child via breastfeeding using data from a large, recently completed randomized trial in Malawi where noncompliance with assigned treatment may confound treatment effect estimates.
Keywords: Competing risks, Compliance, Identifiability, Instrumental variables, Right censoring, Survival analysis
1. Introduction
In both randomized trials with noncompliance and nonrandomized studies, researchers may seek to assess the causal effect of a treatment on the time to different possible events of interest, such as the effect of infant or maternal antiretroviral (ARV) drugs on the time to death or HIV infection of a breastfeeding infant of an HIV+ woman (Chasela and others, 2010). However, such treatment effects are not identifiable without making empirically untestable assumptions about the relationship between the various event types, the time to the event, and the treatment selection mechanism. With the exception of randomized clinical trials with perfect compliance to treatment assignment, the treatment selection mechanism is typically unknown, which may confound standard “as treated” analyses. Such unmeasured confounding may bias treatment effect estimates in both observational studies and randomized trials with noncompliance to assigned treatment.
In some circumstances, there may exist variables that are not related to the outcome except through their effect on treatment selection. Such variables may be used to provide partial or point identification of treatment effects without knowledge of the treatment selection mechanism (Imbens and Angrist, 1994; Angrist and others, 1996; Abadie, 2003; Tan, 2006; Ogburn and others, 2015). These variables are referred to as instrumental variables (IV), and examples include treatment assignment in randomized clinical trials with noncompliance (Imbens and Angrist, 1994; Angrist and others, 1996), the calendar time for the approval of a new treatment by a regulatory agency (Martens and others, 2006; Cain and others, 2009), physician treatment prescribing preference (Brookhart and Schneeweiss, 2007), or randomized encouragement to take treatment (Martens and others, 2006). If the effect of the IV on treatment selection is monotonic, then the IV may be used to identify treatment effects within the subpopulation whose treatment is affected by the IV (Imbens and Angrist, 1994; Angrist and others, 1996; Hernán and Robins, 2006).
IV have been used to identify treatment effects describing differences in survival functions between treatment arms within the subpopulation described above. Inferential methods for censored data with IV have been developed using both nonparametric methods (Baker, 1998; Abbring and van den Berg, 2005; Nie and others, 2011; Li and others, 2015) as well as parametric and semiparametric modelling techniques (Robins and Tsiatis, 1991; Loeys and Goetghebeur, 2003; Cuzick and others, 2007; MacKenzie and others, 2014; Tchetgen and others, 2015). In this paper, we consider competing risks data with multiple failure types and decompose the overall causal effect of treatment on the survival probability at a fixed time point into the sum of causal effects on the various event type specific cumulative incidence (or subdistribution) functions. If an IV does not share common causes with either the time to the event or type of event experienced, it may be used to identify the event type specific causal effect based on the difference in the event type specific cumulative incidence function between treatment arms. Inferences may be obtained using nonparametric estimates of the cumulative incidence functions, analogously to an overall causal effect estimator based on nonparametric estimators of the survival functions.
Typically in survival analysis an intent-to-treat test for differences in treatment-specific survival functions utilizes the log-rank statistic. This nonparametric test provides a global assessment of differences in survival functions over time. To our knowledge, the existing literature on nonparametric analyses of censored data with IV provides inferential methods only at fixed time points (Baker, 1998; Abbring and van den Berg, 2005; Nie and others, 2011) and does not address testing for global treatment differences. In this paper, we develop test statistics for differences in overall survival which are integrated weighted differences of the estimated causal effects over time, where the weight function may be chosen to emphasize time points of greatest interest. The tests are easily implemented using a straightforward variance estimator and are theoretically justified. The proposed statistics are extended to the competing risks setting, where they are constructed from nonparametric estimators of the differences in the treatment-specific cumulative incidence functions.
As an example of a setting where such methods may be applicable, consider the Breastfeeding, Antiretrovirals, and Nutrition (BAN) randomized clinical trial undertaken in Lilongwe, Malawi, between 2004 and 2010 (Chasela and others, 2010). In this study, 2369 HIV-infected breastfeeding mothers and their uninfected newborn babies were randomly assigned to one of three treatment regimens: maternal ARV therapy, daily infant nevirapine (NVP), or control. The aim was to assess the effect of maternal ARV or infant NVP on reducing mother to child transmission of HIV. Two challenges in the analysis of data from this trial are (i) not all participants complied with their randomized treatment regimen assignment and (ii) death (prior to HIV infection) was a competing risk for HIV infection. The randomized treatment assignment provides an IV that allows for estimation of treatment effects among those who would comply with whichever treatment they were assigned. In the BAN study treatment regimen adherence was measured via surveys administered to the mothers, allowing for estimation of such effects (assuming accurate self-report).
The organization of the remainder of this paper is as follows. In Section 2, notation, assumptions, causal estimands and estimators are given, both for the overall and event-type specific causal effects. Section 3 describes nonparametric inferences using the estimators for both the standard survival setup as well as in the presence of competing risks and gives details of nonparametric test statistics for a global assessment of the causal effects over time. Section 4 presents the results of a simulation study examining the finite sample performance of these estimators and tests. Section 5 applies the methods derived in Section 3 to the BAN study. Section 6 concludes with a discussion.
2. Preliminaries
2.1 Notation
An IV that is often used to estimate causal effects is randomized treatment assignment in a clinical trial. Let $Z$ be an IV given by randomized assignment where $Z = 0$ indicates assignment to control and $Z = 1$ indicates assignment to treatment (e.g., maternal ARV or infant NVP in the BAN study). Although treatment effects of $Z$ on the outcome of interest may be identifiable if $Z$ is randomly assigned, treatment effects due to the actual treatment taken (which may differ from $Z$) are typically the target of inference. Let $D$ be the actual treatment taken, where $D = 0$ denotes that treatment was not taken and $D = 1$ denotes that treatment was taken. Define potential treatment outcomes $D_z$ under randomized treatment assignment $z$; specifically, $D_z = 0$ indicates that the subject would not take treatment under randomized assignment $z$, and $D_z = 1$ indicates that the subject would take treatment under randomized assignment $z$. As in Imbens and Angrist (1994) and Angrist and others (1996), define principal strata based on the vector of treatment potential outcomes $(D_0, D_1)$, where $(D_0, D_1) = (0, 1)$ are compliers (i.e., they only take treatment if assigned to do so), $(1, 1)$ are the always treated, $(0, 0)$ are the never treated, and $(1, 0)$ are defiers (i.e., they would only take treatment when not assigned to do so).
Suppose we are interested in time-to-event outcomes that may be subject to competing risks. For randomization assignment $z$ and actual treatment received $d$, let $T_{zd}$ be the potential failure time and let $\epsilon_{zd}$ be the potential event type taking on values $1, \ldots, J$. Let $T$ be the failure time that would have been observed in absence of censoring, $C$ be the censoring time (e.g., due to loss to follow up), and $X = \min(T, C)$. Let $\epsilon$ be the event type that would have been observed in absence of censoring, and $\delta = \epsilon I(T \le C)$ be the observed event type where $\delta = 0$ indicates that the subject was censored before the event was experienced. Suppose we observe $n$ i.i.d. copies of $(X, \delta, D, Z)$.
2.2 Assumptions
Assumption 1 Stable unit treatment value assumption (Rubin, 1980, SUTVA): if $Z = z$ and $D = d$ then $D_z = D$ and $T_{zd} = T$ and $\epsilon_{zd} = \epsilon$ for $z, d \in \{0, 1\}$.
Assumption 2 Independent instrument: $Z \perp \{T_{zd}, D_z\}$ for $z, d \in \{0, 1\}$.
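Assumption 3 Exclusion restriction: $T_{zd} = T_{z'd}$ and $\epsilon_{zd} = \epsilon_{z'd}$ for $z, z', d \in \{0, 1\}$.
Assumption 4 Nonzero causal effect of $Z$ on $D$: $E(D_1 - D_0) \neq 0$.
Assumption 5 Monotonicity (Imbens and Angrist, 1994): $\Pr(D_1 \geq D_0) = 1$.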
Assumption 6 Independent censoring: $C \perp (T, \epsilon) \mid Z$.
Assumption 1 is a standard assumption made in order to estimate causal effects defined using potential outcomes. Assumptions 2–4 qualify the randomized assignment $Z$ as an IV and are the same assumptions found in Imbens and Angrist (1994) and Angrist and others (1996). When the IV is randomly assigned, Assumption 2 will typically be considered plausible. Assumption 3 means that the potential outcomes only depend on the treatment received $d$, such that we may write $T_d$ for the potential failure time and $\epsilon_d$ for the potential event type. Assumption 5 implies that the defiers principal stratum is empty (i.e., there is no subject that would take treatment only when not assigned to). This assumption is commonly made when using an instrumental variable to make inferences about causal effects (Imbens and Angrist, 1994). Assumption 6 is made in order to draw inference about the overall survival function and cumulative incidence functions in the presence of right censoring. Relaxing Assumption 6 is discussed in Section 5 in the context of the BAN study.
2.3 Causal estimands and estimators
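We are interested in causal effects describing differences between the survival curves of the treated versus the nontreated within the subpopulation of compliers, defined by $(D_0, D_1) = (0, 1)$, where $D_z$ is the treatment that would be taken under assignment $z$ and $T_d$ is the potential failure time under treatment $d$. This is sometimes referred to as a local average treatment effect and is defined as
$$\Delta(t) = \Pr(T_1 > t \mid D_0 = 0, D_1 = 1) - \Pr(T_0 > t \mid D_0 = 0, D_1 = 1). \tag{2.1}$$
Under Assumptions 1–5, (2.1) is equivalent to
$$\Delta(t) = \frac{S_1(t) - S_0(t)}{p_1 - p_0}, \tag{2.2}$$
where $S_z(t) = \Pr(T > t \mid Z = z)$ is the survival function given $Z = z$ and $p_z = \Pr(D = 1 \mid Z = z)$, $z = 0, 1$. For fixed time point $t$, (2.2) is equivalent to the standard IV estimand of Imbens and Angrist (1994) and Angrist and others (1996). Under Assumption 6, a consistent estimator of (2.2) is found by plugging in the Kaplan Meier estimator $\hat{S}_z(t)$ of the survival functions at time $t$ conditional on $Z = z$ and $\hat{p}_z$, where $\hat{p}_z = \sum_{i} I(D_i = 1, Z_i = z) / \sum_{i} I(Z_i = z)$ for $z = 0, 1$ (Baker, 1998; Abbring and van den Berg, 2005; Nie and others, 2011). This will be called the IV estimator $\hat{\Delta}(t)$ of the local average treatment effect $\Delta(t)$.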
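The plug-in construction can be illustrated with a minimal Python sketch. It assumes the estimand has the ratio form $\{S_1(t) - S_0(t)\}/(p_1 - p_0)$, with $S_z$ the survival function in randomized arm $z$ and $p_z$ the proportion taking treatment in arm $z$. The function names `kaplan_meier` and `iv_survival_effect`, the argument layout, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def kaplan_meier(time, event, t):
    """Kaplan-Meier estimate of Pr(T > t) from right-censored data.

    time  : observed times, min(failure time, censoring time)
    event : 1 if the failure was observed, 0 if censored
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    surv = 1.0
    # Product over distinct observed failure times s <= t of (1 - deaths / at risk).
    for s in np.unique(time[(event == 1) & (time <= t)]):
        at_risk = np.sum(time >= s)
        deaths = np.sum((time == s) & (event == 1))
        surv *= 1.0 - deaths / at_risk
    return surv

def iv_survival_effect(time, event, d, z, t):
    """Plug-in IV estimate (S_1(t) - S_0(t)) / (p_1 - p_0) (assumed form)."""
    time, event, d, z = (np.asarray(a) for a in (time, event, d, z))
    s1 = kaplan_meier(time[z == 1], event[z == 1], t)
    s0 = kaplan_meier(time[z == 0], event[z == 0], t)
    p1 = d[z == 1].mean()  # proportion taking treatment when assigned to it
    p0 = d[z == 0].mean()  # proportion taking treatment when assigned to control
    return (s1 - s0) / (p1 - p0)

# Toy data (hypothetical, no censoring): four subjects per randomized arm.
time = [1, 2, 3, 4, 1, 1, 2, 3]
event = [1, 1, 1, 1, 1, 1, 1, 1]
d = [1, 1, 1, 0, 0, 0, 0, 0]
z = [1, 1, 1, 1, 0, 0, 0, 0]
print(iv_survival_effect(time, event, d, z, t=2))
```

In the toy data the arm-specific survival estimates at $t = 2$ are 0.5 and 0.25 and the compliance difference is 0.75, so the ITT difference of 0.25 is scaled up to one third. The estimator is undefined when the two arms have the same proportion taking treatment, which is the situation excluded by Assumption 4.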
The local average treatment effect $\Delta(t)$ may be further partitioned into event-type specific local average treatment effects describing differences in the cumulative incidence functions for specific event type $j$ when there are competing risks for the failure time $T$. Namely, a local average treatment effect for event type $j$ can be defined as
$$\Delta_j(t) = \Pr(T_0 \le t, \epsilon_0 = j \mid D_0 = 0, D_1 = 1) - \Pr(T_1 \le t, \epsilon_1 = j \mid D_0 = 0, D_1 = 1). \tag{2.3}$$
It follows that $\Delta(t) = \sum_{j=1}^{J} \Delta_j(t)$, i.e., the local average treatment effect can be decomposed into the sum of event-type specific effects. Note that the local average treatment effect can be zero while some of the event-type specific effects are nonzero, e.g., if $J = 2$ and $\Delta_1(t) = -\Delta_2(t) \neq 0$. In the context of the BAN study, this could occur if infant NVP resulted in a reduced proportion of infants being infected with HIV but also increased the proportion of infants dying (perhaps due to drug side effects) such that the proportions dying or becoming infected are the same in the infant NVP and control arms. Similarly, Kalbfleisch and Prentice (2002, p. 249) describe a study comparing survival times in diabetic patients where there were no significant differences in overall survival between treatment groups, but one treatment was found to have an elevated risk of death due to cardiovascular complications.
In order to arrive at an expression of (2.3) that is identifiable from observable data, Assumption 2 will be replaced by the stronger condition given below.
Assumption 7 Jointly independent instrument: $Z \perp \{T_{zd}, \epsilon_{zd}, D_z\}$, for $z, d \in \{0, 1\}$.
Assumption 7 states that the potential failure time $T_{zd}$, potential event type $\epsilon_{zd}$, and potential treatment taken $D_z$ are independent of the instrument. When the instrument is randomized treatment assignment, as in the BAN study, this assumption will hold. Under Assumptions 1–5 and 7, (2.3) is equivalent to the following
$$\Delta_j(t) = \frac{F_{0j}(t) - F_{1j}(t)}{p_1 - p_0}, \tag{2.4}$$
where $F_{zj}(t) = \Pr(T \le t, \epsilon = j \mid Z = z)$ is the cumulative incidence function for event type $j$ given $Z = z$ and $p_z = \Pr(D = 1 \mid Z = z)$, $z = 0, 1$. Under Assumption 6, a consistent estimator of (2.4) is found by plugging in the Aalen and Johansen (1978) estimator of the cumulative incidence function for event type $j$ at time $t$ conditional on $Z = z$ and consistent estimators of each $p_z$. This will be called the IV estimator $\hat{\Delta}_j(t)$ of $\Delta_j(t)$. As with the estimands in (2.2) and (2.4), the estimator of the local average treatment effect $\hat{\Delta}(t)$ equals the sum of the estimators of the event-type specific local average treatment effects $\hat{\Delta}_j(t)$.
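The event-type specific estimator can be sketched in the same way. The Python example below assumes the estimand has the form $\{F_{0j}(t) - F_{1j}(t)\}/(p_1 - p_0)$, with $F_{zj}$ the cumulative incidence function for event type $j$ in randomized arm $z$, and uses a direct implementation of the Aalen–Johansen estimator for competing risks. The names `cumulative_incidence` and `iv_event_effect` and the toy data are hypothetical.

```python
import numpy as np

def cumulative_incidence(time, delta, j, t):
    """Aalen-Johansen estimate of Pr(T <= t, event type = j).

    time  : observed times, min(failure time, censoring time)
    delta : observed event type (1, 2, ...), with 0 meaning censored
    """
    time = np.asarray(time, dtype=float)
    delta = np.asarray(delta, dtype=int)
    surv_prev = 1.0  # overall Kaplan-Meier survival just before time s
    cif = 0.0
    for s in np.unique(time[(delta > 0) & (time <= t)]):
        at_risk = np.sum(time >= s)
        d_all = np.sum((time == s) & (delta > 0))
        d_j = np.sum((time == s) & (delta == j))
        cif += surv_prev * d_j / at_risk    # type-j increment at time s
        surv_prev *= 1.0 - d_all / at_risk  # update all-cause survival
    return cif

def iv_event_effect(time, delta, d, z, j, t):
    """Plug-in IV estimate (F_0j(t) - F_1j(t)) / (p_1 - p_0) (assumed form)."""
    time, delta, d, z = (np.asarray(a) for a in (time, delta, d, z))
    f1 = cumulative_incidence(time[z == 1], delta[z == 1], j, t)
    f0 = cumulative_incidence(time[z == 0], delta[z == 0], j, t)
    p1, p0 = d[z == 1].mean(), d[z == 0].mean()
    return (f0 - f1) / (p1 - p0)

# Toy data (hypothetical, no censoring): two event types, four subjects per arm.
time = [1, 2, 3, 4, 1, 1, 2, 3]
delta = [1, 2, 1, 2, 1, 2, 2, 1]
d = [1, 1, 1, 0, 0, 0, 0, 0]
z = [1, 1, 1, 1, 0, 0, 0, 0]
effects = [iv_event_effect(time, delta, d, z, j, t=2) for j in (1, 2)]
print(effects, sum(effects))
```

With these toy data the event type 1 estimate is zero and the event type 2 estimate is one third at $t = 2$, and their sum matches the overall survival-based estimate computed from the same times, illustrating the additive decomposition.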
3. Inference
3.1 Pointwise confidence intervals
Here we give the large sample distribution for IV estimators of the local average treatment effect $\hat{\Delta}(t)$ and the event-type specific local average treatment effect $\hat{\Delta}_j(t)$ for some $t \le \tau$ where $\tau$ is the maximum follow-up time. These results yield asymptotic pointwise confidence intervals for the local effects.
Proposition 1 —
Let . Assume that as for and let and , . Assume that , 0. Then .
A proof of Proposition 1 and consistent estimators and of the asymptotic variances and are given in Appendix A of supplementary material available at Biostatistics online. It follows from Proposition 1 that an asymptotic 100% confidence interval for is given by and an asymptotic 100% confidence interval for is given by where is the quantile of a standard normal variate.
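A normal-theory pointwise interval of this kind can be sketched in a few lines of Python. The sketch assumes that a standard error for the estimator is already available (for example from the variance estimators of Appendix A, which are not reproduced here); the function name `wald_interval` and the numerical inputs are hypothetical.

```python
from statistics import NormalDist

def wald_interval(estimate, std_error, alpha=0.05):
    """Two-sided interval: estimate +/- z_{1 - alpha/2} * std_error."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal quantile
    return estimate - z * std_error, estimate + z * std_error

# Hypothetical inputs: an estimate of 1.60 with a standard error of 0.53,
# both on the percentage-point scale.
lower, upper = wald_interval(1.60, 0.53)
print(round(lower, 2), round(upper, 2))
```

The standard error of 0.53 is an illustrative value chosen so that the resulting interval has roughly the width of the intervals reported for the BAN study; it is not a quantity reported in the paper.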
3.2 Hypothesis testing
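To conduct tests for a difference between the two treatment groups in the survival curves and cumulative incidence curves for event type $j$, consider testing the following hypotheses
$$H_0: \int_0^{\tau} w(t)\,\Delta(t)\,dt = 0 \quad \text{and} \quad H_{0j}: \int_0^{\tau} w(t)\,\Delta_j(t)\,dt = 0$$
for $j = 1, \ldots, J$ where $w(t)$ is a user-defined weight function that may be used to test for earlier or later differences. Let $\hat{w}(t)$ be some estimator of $w(t)$. For example, we might choose $w(t) = 1$ and $\hat{w}(t) = 1$.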
Proposition 2 —
Suppose . Then under and under as where and .
A proof of Proposition 2 and consistent estimators and of the asymptotic variances and are given in Appendix B of supplementary material available at Biostatistics online. It follows from Proposition 2 that weighted instrumental variable (WIV) tests with rejection regions defined by provide asymptotically two-sided size tests of and .
Rejection of the overall null hypothesis indicates that the effect of treatment on the overall survival experience within the complier principal stratum is nonzero. Similarly, rejection of the null hypothesis for event type $j$ indicates that the effect of treatment on the cumulative incidence of events of type $j$ is nonzero within the complier stratum. Similar to the estimands in (2.2) and (2.4), the integrated weighted effect on overall survival equals the sum over event types of the event-type specific integrated weighted effects. Again, the effect of treatment on one event type may cancel with the effect of treatment on another event type such that the null hypothesis of no treatment effect on overall survival is true, but the event-type specific null hypotheses of no treatment effect are not true. Therefore, the availability of tests for each event type $j$ allows for testing of treatment effects on event-type specific cumulative incidence functions that might be missed if only a test of a treatment effect on the overall survival were conducted.
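The numerator of an integrated weighted difference statistic can be sketched as follows. The Python example assumes that the estimated effect and the weight are right-continuous step functions that are constant between a known set of jump times, so that the integral over $[0, \tau]$ reduces to a finite sum. The function name `integrated_weighted_difference`, its arguments, and the toy effect curve are hypothetical, and the standardization by the variance estimators of Appendix B is not reproduced.

```python
import numpy as np

def integrated_weighted_difference(effect, knots, tau, weight=lambda t: 1.0):
    """Integrate weight(t) * effect(t) over [0, tau] for step functions.

    effect : callable giving the estimated effect at time t
    knots  : times at which effect or weight may jump
    weight : callable giving the weight at time t (default: unit weights)
    Both callables are treated as constant on each interval between knots.
    """
    knots = np.asarray(knots, dtype=float)
    inner = knots[(knots > 0) & (knots < tau)]
    pts = np.unique(np.concatenate(([0.0], inner, [tau])))
    left, widths = pts[:-1], np.diff(pts)
    # Evaluate at the left endpoint of each interval (right-continuity).
    vals = np.array([weight(s) * effect(s) for s in left])
    return float(np.sum(vals * widths))

def toy_effect(t):
    """Hypothetical estimated effect curve with jumps at t = 1 and t = 3."""
    return 0.0 if t < 1 else (0.2 if t < 3 else 0.5)

print(integrated_weighted_difference(toy_effect, knots=[1, 3], tau=4))
```

For the toy curve the integral is $0 \times 1 + 0.2 \times 2 + 0.5 \times 1 = 0.9$. Because the integral is linear in the effect curve, applying the same function to each event-type specific curve and summing gives the value for the overall curve whenever the overall curve is the sum of the event-type specific curves.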
3.3 As treated and intent to treat analyses
An alternative approach to the IV-based methods developed in this paper might entail “as treated” analysis about (2.1) and (2.3) based on the estimators and where and are the Kaplan Meier estimator of the survival function and the Aalen Johansen estimator of the cumulative incidence function conditional on and ; corresponding confidence intervals for (2.1) and (2.3) might be computed by appealing to asymptotic normality results (Andersen and others, 1995) for and . In such an “as treated” analysis, testing the hypotheses and might be accomplished by replacing and with and in the WIV tests from Section 3.2 (and also modifying the variance estimators accordingly). For , this yields the Pepe and Fleming (1989) weighted Kaplan Meier (WKM) test. Because of potential selection bias induced by conditioning and , such a naive as-treated analysis is not expected to yield valid inferences about the local average treatment effects. Indeed this is demonstrated empirically in the next section.
Another competing method might entail an “intent to treat” (ITT) analysis about (2.1) and (2.3) based on the differences between randomized arms in the Kaplan Meier and Aalen Johansen estimators (i.e., the numerators of the IV estimators), with large sample confidence intervals computed analogously to the as-treated intervals described above. Because the ITT analysis makes no adjustment for noncompliance, the ITT estimators are expected in general to be biased for the local average treatment effects; this is also shown empirically in the next section. On the other hand, ITT tests of the null hypotheses of Section 3.2 based on the ITT estimators are equivalent to the WIV tests in Section 3.2. This occurs because under the null the difference in proportions in the denominators of the IV estimators cancels with the corresponding term in the variance estimators when constructing the WIV tests (see Appendix B of supplementary material available at Biostatistics online). Similar results occur with uncensored data, where an asymptotic test based on the IV estimator is equivalent to a test based on ITT estimates of mean outcomes for the two treatment arms.
4. Simulation study
Simulation studies were conducted to assess the finite sample operating characteristics of the proposed IV estimators, corresponding confidence intervals, and WIV tests. Data sets were simulated under Assumptions 1 – 7. For each simulated data set principal strata vectors were simulated using a multinomial random number generator with parameter with = Pr; ( = 0 under Assumption 5). The parameter is the proportion of the population in the compliers principal stratum and provides a measure of the strength of the instrument in determining treatment selection. The randomized treatment assignment was simulated by randomly permuting a vector of size with elements equal to 0 and the remaining elements equal to 1. The random variable was determined based on and . Censoring times were generated using a uniform random number generator on the interval . The failure time was simulated by sampling from the distribution defined by an overall hazard of , where each has a Weibull hazard of the form for four scenarios as detailed in Table 1. The event indicator was simulated by sampling from a multinomial random variable with for . If the subject was censored, i.e., , was set to 0; otherwise . All results are based on 5000 Monte Carlo simulations per scenario, , = 4, = 6, = 3, = 3, = 1 for all and = 7. Censoring rates were 25%, 18%, 25%, and 18% for scenarios 1–4 described below. Event 1 occurred at rates of 32%, 33%, 37%, and 41% and event 2 occurred at rates of 43%, 49%, 38%, and 41% for scenarios 1–4. For each data set in the simulation study, the IV, as-treated, and ITT analyses described in Section 3 were conducted.
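The data-generating steps described above can be sketched in Python. This is a simplified illustration under several assumptions that are not taken from the paper: equal allocation to the two arms, cause-specific Weibull hazards of the form $\lambda_j \gamma t^{\gamma - 1}$ with a common shape $\gamma$, hazards that depend only on the treatment taken (not on the principal stratum), and illustrative parameter values. The function name `simulate_dataset` and all argument names are hypothetical.

```python
import numpy as np

def simulate_dataset(n, p_strata, lam, gamma, c_max, rng):
    """Simulate (x, delta, d, z) for a two-arm trial with noncompliance.

    p_strata : probabilities of (complier, always treated, never treated);
               defiers have probability 0 under monotonicity
    lam      : array of shape (2, J); lam[d][j - 1] is the assumed Weibull
               scale for event type j under treatment taken d
    gamma    : common Weibull shape (assumption)
    c_max    : censoring times are uniform on (0, c_max)
    """
    strata = rng.choice(3, size=n, p=p_strata)  # 0 complier, 1 always, 2 never
    # Randomized assignment: permute a vector of 0s and 1s (equal allocation).
    z = rng.permutation(np.repeat([0, 1], [n - n // 2, n // 2]))
    d = np.where(strata == 0, z, np.where(strata == 1, 1, 0))
    rates = np.asarray(lam, dtype=float)[d]     # shape (n, J)
    total = rates.sum(axis=1)
    # Overall cumulative hazard is total * t**gamma; invert a unit exponential.
    t = (rng.exponential(size=n) / total) ** (1.0 / gamma)
    # With a common shape, the event type is multinomial with
    # probabilities proportional to the cause-specific scales.
    cum = np.cumsum(rates / total[:, None], axis=1)
    eps = 1 + (rng.random(n)[:, None] > cum).sum(axis=1)
    eps = np.minimum(eps, rates.shape[1])       # guard against rounding
    c = rng.uniform(0.0, c_max, size=n)
    x = np.minimum(t, c)
    delta = np.where(t <= c, eps, 0)            # 0 marks a censored subject
    return x, delta, d, z

rng = np.random.default_rng(0)
x, delta, d, z = simulate_dataset(
    n=1000, p_strata=[0.6, 0.2, 0.2],
    lam=[[0.12, 0.24], [0.12, 0.12]], gamma=1.2, c_max=7.0, rng=rng)
print(np.bincount(delta, minlength=3) / len(delta))
```

The printed vector gives the simulated proportions censored and experiencing each event type. Passing an explicit generator keeps each simulated data set reproducible, and the simulated arrays have the same layout as the inputs used by estimators that take observed times, event types, treatment taken, and assignment.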
Table 1.
Empirical type I error and power (%) of the WIV test and the naive WKM test of the overall and event-type specific null hypotheses from the simulation study described in Section 4, for two complier proportions and three sample sizes $n$. The hazard for each event type has a Weibull form with the parameters shown.
| Scenario | Complier proportion | Event type | Weibull parameters | Weibull parameters | WIV, n = 300 | WIV, n = 1000 | WIV, n = 2000 | Naive WKM, n = 300 | Naive WKM, n = 1000 | Naive WKM, n = 2000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.6 | 1 | (0.12,1.2) | (0.12,1.2) | 5 | 8 | 14 | 83 | 98 | 99 |
| | | 2 | (0.24,1.2) | (0.12,1.2) | 63 | 95 | 99 | 90 | 100 | 100 |
| | | (all) | | | 56 | 97 | 100 | 89 | 100 | 100 |
| | 0.3 | 1 | (0.12,1.2) | (0.12,1.2) | 4 | 4 | 4 | 59 | 93 | 98 |
| | | 2 | (0.24,1.2) | (0.12,1.2) | 20 | 51 | 80 | 20 | 51 | 80 |
| | | (all) | | | 19 | 50 | 78 | 79 | 99 | 100 |
| 2 | 0.6 | 1 | (0.1,1.2) | (0.2,1.2) | 59 | 96 | 99 | 87 | 99 | 100 |
| | | 2 | (0.3,1.2) | (0.2,1.2) | 65 | 97 | 99 | 89 | 100 | 100 |
| | | (all) | | | 6 | 6 | 6 | 17 | 45 | 74 |
| | 0.3 | 1 | (0.1,1.2) | (0.2,1.2) | 18 | 45 | 74 | 18 | 45 | 74 |
| | | 2 | (0.3,1.2) | (0.2,1.2) | 20 | 50 | 76 | 20 | 50 | 76 |
| | | (all) | | | 6 | 5 | 5 | 34 | 83 | 99 |
| 3 | 0.6 | 1 | (0.19,1.2) | (0.12,1.2) | 14 | 35 | 61 | 30 | 69 | 90 |
| | | 2 | (0.19,1.2) | (0.12,1.2) | 15 | 35 | 61 | 15 | 35 | 61 |
| | | (all) | | | 62 | 98 | 100 | 91 | 100 | 100 |
| | 0.3 | 1 | (0.19,1.2) | (0.12,1.2) | 6 | 10 | 17 | 13 | 20 | 29 |
| | | 2 | (0.19,1.2) | (0.12,1.2) | 8 | 10 | 17 | 17 | 19 | 31 |
| | | (all) | | | 21 | 55 | 84 | 42 | 94 | 100 |
| 4 | 0.6 | 1 | (0.2,1.2) | (0.2,1.2) | 4 | 3 | 2 | 6 | 9 | 10 |
| | | 2 | (0.2,1.2) | (0.2,1.2) | 5 | 3 | 3 | 6 | 10 | 12 |
| | | (all) | | | 6 | 5 | 5 | 13 | 32 | 58 |
| | 0.3 | 1 | (0.2,1.2) | (0.2,1.2) | 4 | 3 | 3 | 5 | 7 | 9 |
| | | 2 | (0.2,1.2) | (0.2,1.2) | 5 | 3 | 2 | 5 | 7 | 9 |
| | | (all) | | | 5 | 5 | 5 | 5 | 17 | 31 |
Table 1 gives the estimated power of the WIV and WKM tests for various sample sizes for each scenario. Scenario 1 describes a situation in which there is a treatment effect on event type 2 in the complier principal stratum but not on event type 1. In this scenario, the power of the WIV test to reject the overall null hypothesis is similar to the power to reject the null hypothesis for event type 2, and the power to reject the null hypothesis for event type 1 is small (though note that this scenario is not null for event type 1, i.e., the event type 1 effect on the cumulative incidence is not exactly zero). Scenario 2 describes a situation in which there is a treatment effect on both event types, but these effects cancel each other out such that the overall local average treatment effect is zero (as described in Section 2.3 and at the end of Section 3.2). The empirical type I error of the WIV test of the overall null hypothesis in Scenario 2 was approximately 5%. The opposing causal effects for event types 1 and 2 are roughly the same magnitude as in Scenario 1, and the power of the WIV test to reject both event-type specific null hypotheses in Scenario 2 is similar to the power to reject the null hypothesis for event type 2 in Scenario 1. Scenario 3 describes a situation in which there is again a treatment effect for both event types, but these have the same sign and magnitude. As would be expected, the power of the WIV test to reject the overall null hypothesis in this situation is higher than the power to reject either event-type specific null hypothesis, and the latter two are roughly the same. Scenario 4 describes a situation in which there are no causal treatment effects in the complier principal stratum for event type 1 or 2, such that the overall effect is also zero. Again, as expected the results here demonstrate that the tests of the overall and event-type specific null hypotheses control the type I error rate. In all scenarios, the strength of the instrument (as measured by the proportion of the population who are compliers) has large effects on the power of the test, with increasing instrument strength yielding increased power.
Power of the WKM tests tended to be greater than that of the WIV tests. However, the WKM tests failed to control the nominal type I error. In particular, for the overall (all) tests in Scenarios 2 and 4, the empirical type I error tended to be greater than 5%, sometimes substantially so, and increased with sample size. These results demonstrate that the WKM tests do not provide valid tests of no local average treatment effects at the nominal level.
Table 2 shows that the IV estimators of the overall and event-type specific local average treatment effects are approximately unbiased and that the variance estimators accurately estimate the true variance (as indicated by the ratio of the average estimated standard error and the empirical standard error). The coverage of the IV pointwise confidence intervals is approximately the nominal 0.95 in almost all scenarios. On the other hand, the naive “as treated” estimators have higher bias, and the coverage for the corresponding confidence intervals is poor in several scenarios (e.g., see Scenario 2 or Scenario 4 for all event types). The power to reject the null hypotheses based on the IV pointwise confidence intervals gives similar results to Table 1. These IV based tests again yield type I error of approximately 5%. However, testing using a naive WKM analysis again results in an inflated type I error.
Table 2.
Simulation results: bias (×100), empirical standard error (ESE) (×100), the ratio of the average estimated standard error and the empirical standard error (ESE Ratio, %), coverage of pointwise 95% confidence intervals (%) and the percent power to reject the overall and event-type specific null hypotheses (%) based on (i) the IV estimators and confidence intervals and (ii) the naive estimators and confidence intervals for simulation Scenarios 1–4 as described in Table 1
| Scenario | Event type | Time | Bias, IV | Bias, naive | ESE, IV | ESE, naive | ESE Ratio, IV | ESE Ratio, naive | Coverage, IV | Coverage, naive | Power, IV | Power, naive |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | −0.2 | 2.9 | 4.4 | 3.0 | 104 | 99 | 96 | 84 | 11 | 6 |
| | | 5 | −0.2 | 4.1 | 5.2 | 3.4 | 105 | 101 | 96 | 79 | 31 | 21 |
| | 2 | 3 | 0.2 | −3.3 | 5.0 | 3.6 | 94 | 90 | 93 | 82 | 99 | 100 |
| | | 5 | 0.2 | −4.3 | 5.6 | 3.7 | 95 | 98 | 93 | 79 | 100 | 100 |
| | (all) | 3 | 0.0 | −0.4 | 5.3 | 3.5 | 100 | 101 | 95 | 95 | 93 | 100 |
| | | 5 | 0.0 | −0.3 | 5.2 | 3.4 | 100 | 101 | 95 | 95 | 92 | 100 |
| 2 | 1 | 3 | −0.1 | 7.0 | 4.6 | 3.1 | 110 | 97 | 97 | 36 | 98 | 97 |
| | | 5 | −0.1 | 8.5 | 5.3 | 3.5 | 112 | 99 | 97 | 30 | 99 | 100 |
| | 2 | 3 | 0.1 | −2.8 | 5.2 | 3.7 | 94 | 92 | 93 | 87 | 97 | 100 |
| | | 5 | 0.1 | −4.2 | 5.6 | 3.7 | 93 | 99 | 93 | 79 | 99 | 100 |
| | (all) | 3 | 0.0 | 4.1 | 5.2 | 3.4 | 100 | 102 | 95 | 78 | 6 | 22 |
| | | 5 | 0.0 | 4.2 | 4.8 | 3.0 | 100 | 101 | 95 | 72 | 5 | 28 |
| 3 | 1 | 3 | −0.3 | −0.5 | 4.8 | 3.3 | 97 | 95 | 94 | 94 | 54 | 84 |
| | | 5 | −0.1 | −0.3 | 5.5 | 3.7 | 97 | 98 | 94 | 94 | 41 | 70 |
| | 2 | 3 | 0.0 | −0.4 | 4.8 | 3.3 | 96 | 96 | 94 | 94 | 57 | 85 |
| | | 5 | 0.1 | −0.2 | 5.6 | 3.7 | 96 | 99 | 94 | 95 | 43 | 70 |
| | (all) | 3 | −0.2 | −0.9 | 5.1 | 3.4 | 103 | 103 | 96 | 95 | 96 | 100 |
| | | 5 | 0.0 | −0.6 | 5.1 | 3.4 | 102 | 101 | 96 | 95 | 95 | 100 |
| 4 | 1 | 3 | 0.1 | 2.3 | 5.0 | 3.5 | 98 | 93 | 95 | 88 | 5 | 12 |
| | | 5 | 0.0 | 2.3 | 5.5 | 3.7 | 100 | 99 | 95 | 90 | 5 | 10 |
| | 2 | 3 | −0.1 | 2.2 | 5.0 | 3.5 | 99 | 93 | 94 | 89 | 6 | 11 |
| | | 5 | −0.1 | 2.2 | 5.6 | 3.8 | 98 | 97 | 95 | 90 | 5 | 10 |
| | (all) | 3 | 0.1 | 4.4 | 5.3 | 3.5 | 98 | 98 | 94 | 74 | 6 | 26 |
| | | 5 | 0.0 | 4.4 | 4.9 | 3.1 | 98 | 99 | 95 | 69 | 5 | 31 |
Simulation results comparing the IV and ITT estimators are given in Appendix C of supplementary material available at Biostatistics online. Table C.1 of supplementary material available at Biostatistics online gives the bias, empirical standard error, and coverage of the pointwise confidence intervals for the IV estimators and the ITT estimators. These results demonstrate that ITT estimators are also biased, and coverage of the corresponding confidence intervals is not nominal.
5. Application to the BAN study
In this section, the methods developed in Section 3 are employed to compare the cumulative incidence of HIV or death in the infant NVP arm and the maternal ARV arm to the control group in the BAN study. Here $Z$ denotes randomized treatment assignment, and $D$ denotes the actual treatment taken based on the treatment compliance surveys completed by mothers shortly after randomization. Mother-infant pairs assigned to infant NVP or maternal ARV were considered noncompliant (i.e., $D = 0$) if any pills were reported as missed on the first completed treatment compliance survey administered 1–2 weeks after randomization. In the maternal ARV arm, 12% of the pairs met this criterion, and in the infant NVP arm 5% met this criterion. Following Chasela and others (2010), mother-infant pairs were excluded from the analysis if the infant became infected or died by two weeks. Death times were observed exactly whereas HIV infection times were interval censored between an infant's last negative and first positive HIV tests. However, the testing intervals were narrow, and therefore the time of HIV infection was assumed to equal the time of first positive HIV test as in the analysis by Chasela and others (2010).
The nonparametric IV and naive “as treated” estimates, along with corresponding 95% confidence intervals, of the event-type specific local average treatment effects for HIV infection or death and of the overall local average treatment effect are given in Table 3. Figure 1 depicts IV estimates of the overall cumulative distribution functions partitioned by cumulative incidence of HIV and death for each treatment arm as well as the results of the WIV tests of the overall and event-type specific null hypotheses with follow-up through 48 weeks using weights equal to 1 at all times. These results indicate that, compared to control, infant NVP and maternal ARV resulted in significant decreases in the risk of infant HIV infection in the complier stratum. These results also indicate that the interventions lowered the risk of the composite endpoint of HIV infection or death in the complier stratum. The estimated risk of death prior to infection in the complier stratum was lower in both intervention arms compared to the control arm; however, these effects were not statistically significant. Table 3 and Figure 1 also demonstrate that the IV pointwise confidence intervals and WIV tests give qualitatively similar results to a naive analysis adjusting for compliance (as described in Section 4) when comparing the infant NVP arm to control. This might be expected because the proportion that were compliant in the infant NVP arm was quite high.
Table 3.
Results for the BAN study: IV and naive estimates of the event-type specific and overall local average treatment effects and corresponding 95% confidence intervals for (a) infant NVP versus control and (b) maternal ARV versus control for endpoints of infant HIV infection, death, and HIV infection or death (all)
| Endpoint | (a) Infant NVP vs control: IV (95% CI) | (a) Infant NVP vs control: naive (95% CI) | (b) Maternal ARV vs control: IV (95% CI) | (b) Maternal ARV vs control: naive (95% CI) |
|---|---|---|---|---|
| HIV infection | | | | |
| 6 weeks | 1.60 (0.56, 2.64) | 1.53 (0.54, 2.52) | 0.69 (−0.68, 2.07) | 0.60 (−0.63, 1.84) |
| 18 weeks | 3.31 (1.76, 4.86) | 3.16 (1.67, 4.65) | 1.79 (−0.19, 3.76) | 1.59 (−0.19, 3.36) |
| 28 weeks | 3.41 (1.56, 5.25) | 3.36 (1.60, 5.12) | 2.20 (−0.01, 4.42) | 1.88 (−0.13, 3.88) |
| 48 weeks | 2.55 (0.19, 4.92) | 2.59 (0.32, 4.85) | 2.07 (−0.58, 4.72) | 1.71 (−0.70, 4.12) |
| Death | | | | |
| 6 weeks | 0.40 (−0.26, 1.05) | 0.37 (−0.26, 1.01) | 0.28 (−0.49, 1.06) | 0.36 (−0.29, 1.01) |
| 18 weeks | 0.69 (−0.61, 1.98) | 0.61 (−0.66, 1.87) | 0.43 (−1.05, 1.91) | 0.80 (−0.44, 2.04) |
| 28 weeks | 1.12 (−0.55, 2.78) | 0.98 (−0.64, 2.60) | 1.05 (−0.80, 2.89) | 1.44 (−0.13, 3.00) |
| 48 weeks | 1.55 (−0.59, 3.68) | 1.34 (−0.74, 3.42) | 2.01 (−0.29, 4.30) | 2.35 (0.38, 4.32) |
| Death or HIV infection (all) | | | | |
| 6 weeks | 2.00 (0.77, 3.22) | 1.90 (0.73, 3.08) | 0.98 (−0.59, 2.55) | 0.96 (−0.43, 2.35) |
| 18 weeks | 4.00 (2.00, 6.00) | 3.76 (1.83, 5.70) | 2.22 (−0.21, 4.65) | 2.39 (0.24, 4.53) |
| 28 weeks | 4.52 (2.07, 6.97) | 4.34 (1.98, 6.70) | 3.25 (0.43, 6.07) | 3.31 (0.80, 5.82) |
| 48 weeks | 4.10 (0.98, 7.22) | 3.93 (0.91, 6.94) | 4.08 (0.68, 7.47) | 4.06 (1.01, 7.11) |
Fig. 1.
Application to the BAN study: cumulative incidence estimates partitioned by event type and results of the hypothesis tests of no treatment effect on cumulative incidence of HIV, ; no treatment effect on death, ; and no effect of treatment on death or cumulative incidence of HIV, based on the WIV tests in Proposition 2 for (a) infant NVP versus control and (b) maternal ARV versus control for . In each panel the lower step function equals , the estimated probability of HIV infection by time , and the upper step function equals , the estimated probability of HIV infection or death by time .
On the other hand, the IV-based and naive analyses reach different conclusions when comparing the maternal ARV arm to control with respect to the incidence of HIV. For example, as seen in Table 3, IV-based estimates of the difference in cumulative incidence of HIV between maternal ARV and control are roughly 20–30% greater than the naive estimates. Moreover, a significant positive effect of maternal ARV versus control is found when using the WIV test (p-value 0.05), whereas the naive WKM test does not reject the null hypothesis of no treatment effect on cumulative incidence of HIV (Z-score 1.67, p-value 0.09). For the death and composite endpoints, the IV and naive inferences were similar, although naive estimates of the cumulative incidence of death are somewhat higher, and at 18 weeks the naive confidence interval for the composite endpoint excludes 0, whereas the IV confidence interval does not.
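The contrast at 18 weeks can be checked directly from the reported intervals. The following is a minimal sketch, assuming the intervals in Table 3 are symmetric Wald intervals of the form estimate ± 1.96 × SE; the function name `wald_summary` is hypothetical, and the inputs are the maternal ARV versus control entries for the composite endpoint at 18 weeks.

```python
Z_975 = 1.959964  # 97.5% quantile of the standard normal distribution

def wald_summary(estimate, lower, upper):
    """Back out the standard error and z statistic from a symmetric 95% Wald interval.

    Assumption: the interval was built as estimate +/- Z_975 * se.
    """
    se = (upper - lower) / (2 * Z_975)
    return {
        "se": se,
        "z": estimate / se,
        "excludes_zero": lower > 0 or upper < 0,
    }

# Death or HIV infection at 18 weeks, maternal ARV vs control (values from Table 3)
iv = wald_summary(2.22, -0.21, 4.65)
naive = wald_summary(2.39, 0.24, 4.53)

for label, res in (("IV", iv), ("naive", naive)):
    print(f"{label}: se={res['se']:.2f}, z={res['z']:.2f}, excludes 0: {res['excludes_zero']}")
```

Under this assumption the IV interval is wider than the naive one, which is why a slightly smaller point estimate fails to exclude 0.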
Results comparing the IV estimators to those obtained by an ITT analysis are contained in Table D.1 of supplementary material available at Biostatistics online. The differences between the IV and ITT estimates are more pronounced for the comparison of maternal ART with control because compliance was higher in the infant NVP arm. In particular, the infant NVP IV estimates are 5% greater than the ITT estimates, whereas the maternal ART IV estimates are 14% greater than the ITT estimates.
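These percentages can be related to compliance with a rough calculation. The sketch below assumes the IV estimator takes the usual ratio form, that is, the ITT difference divided by the between-arm difference in the proportion receiving treatment. The function name is hypothetical, and the inputs are the rounded percentages quoted above rather than study data.

```python
def implied_compliance_difference(pct_iv_above_itt):
    """Compliance difference implied by IV = ITT / compliance_difference.

    Assumption: a ratio-form (Wald-type) IV estimator. The argument is the
    percentage by which the IV estimate exceeds the ITT estimate.
    """
    return 1.0 / (1.0 + pct_iv_above_itt / 100.0)

infant = implied_compliance_difference(5)     # infant NVP vs control
maternal = implied_compliance_difference(14)  # maternal ART vs control
print(f"infant NVP: {infant:.3f}, maternal ART: {maternal:.3f}")
```

Under this assumption the implied compliance differences are about 0.95 and 0.88, consistent with higher compliance in the infant NVP arm.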
The validity of the IV analysis relies on Assumptions 1–7. Although interference between mother-infant pairs was not likely, Assumption 1 could have been violated by changes in the infant and maternal treatment regimens during the course of the study. Because the IV is randomized treatment assignment, Assumptions 2 and 7 should hold. Assumption 3 is plausible, although study participants were not blinded, so randomization assignment may have had an effect on the event time or event type that was not mediated through treatment received. Not surprisingly, associations between randomized assignment and treatment received in the observed data support Assumptions 4 and 5.
On the other hand, there is some indication that Assumption 6 may not hold. Specifically, Sellers and others (2015) found a significant association between censoring and certain covariates. Analyses that relax Assumption 6, requiring only that it hold conditional on some set of covariates and using inverse probability of censoring weights, are detailed in Appendix D.2 of supplementary material available at Biostatistics online. Results using censoring weights were similar to those in Table 3; see Table D.2 of Appendix D.2 of supplementary material available at Biostatistics online.
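To illustrate the general idea of inverse probability of censoring weighting (not the specific analysis in Appendix D.2), the following is a minimal sketch of a weighted cumulative incidence estimator. It assumes event types are coded 0 for censored and 1, 2, ... for observed causes, and it models censoring with a Kaplan-Meier estimate computed within levels of a single discrete covariate; both the coding and the stratified model are assumptions made here for illustration, and all function names are hypothetical.

```python
import numpy as np

def censoring_survival_left(times, event_types):
    """Kaplan-Meier estimate of G(t-) = P(C >= t) at each subject's own time.

    Events tied with censorings are treated as occurring first, so subjects
    with an event at time u are not at risk of censoring at u.
    """
    times = np.asarray(times, dtype=float)
    cens = np.asarray(event_types) == 0
    g_left = np.empty(len(times))
    for i, t in enumerate(times):
        g = 1.0
        for u in np.unique(times[cens & (times < t)]):
            at_risk = np.sum((times > u) | ((times == u) & cens))
            g *= 1.0 - np.sum(cens & (times == u)) / at_risk
        g_left[i] = g
    return g_left

def ipcw_cif(times, event_types, cause, t, strata=None):
    """IPCW estimate of the cumulative incidence of `cause` by time t.

    Each observed event of the given cause is weighted by 1 / G(T-), with G
    estimated separately within each stratum (one stratum if strata is None).
    """
    times = np.asarray(times, dtype=float)
    event_types = np.asarray(event_types)
    strata = np.zeros(len(times), dtype=int) if strata is None else np.asarray(strata)
    g_left = np.empty(len(times))
    for s in np.unique(strata):
        mask = strata == s
        g_left[mask] = censoring_survival_left(times[mask], event_types[mask])
    indicator = (event_types == cause) & (times <= t)
    return float(np.mean(indicator / g_left))

# Toy data (hypothetical): four subjects, one censored at time 2
toy_times = [1.0, 2.0, 3.0, 4.0]
toy_types = [1, 0, 1, 2]
print(ipcw_cif(toy_times, toy_types, cause=1, t=3.0))
```

With a single stratum this weighting reproduces the usual nonparametric cumulative incidence estimate; stratifying on covariates associated with censoring is one simple way to let the weights depend on those covariates.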
6. Discussion
There are several possible avenues of future research related to this work. The WIV tests here compare treatment and control within the complier principal stratum using a statistic based on the integral of weighted differences between cumulative incidence functions. A test based on differences in the sub-distribution hazard estimates, as in the Gray test (Gray, 1988), may also yield a valid test of the hypothesis given in Proposition 2. Methods relaxing Assumption 5 would be helpful in settings where monotonicity may not hold, such as in trials comparing two active arms. Relaxing Assumption 6 using inverse probability of censoring weights based on covariates predictive of failure and censoring is discussed in the supplementary material available at Biostatistics online. In observational studies, finding an IV satisfying Assumption 7 may be difficult. Relaxing Assumption 7 such that the IV is independent of the potential outcomes conditional on some set of covariates may increase the likelihood of obtaining an IV. Additionally, sensitivity analysis methods might be developed to assess the robustness of inferences to violations of Assumption 7. In this paper, compliance is simplified to an all-or-nothing binary measure. In many real-world applications, however, compliance may be more complicated; for example, some subjects in a randomized trial may be partially compliant. Thus methods that allow for a more general form of either the IV or the treatment received may also be useful.
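The building block of such a statistic, an integrated weighted difference between two step functions, can be sketched as follows. This is only an illustration of the integral, not the WIV statistic of Proposition 2: it omits the restriction to compliers and any variance estimate, and the function names, the weight function, and the toy inputs are hypothetical.

```python
import numpy as np

def step_eval(jump_times, values, t):
    """Evaluate a right-continuous step function that is 0 before its first jump."""
    idx = np.searchsorted(jump_times, t, side="right") - 1
    return np.where(idx >= 0, np.asarray(values)[np.clip(idx, 0, None)], 0.0)

def integrated_weighted_difference(times1, cif1, times0, cif0, tau,
                                   weight=lambda t: np.ones_like(t)):
    """Integrate weight(t) * (F1(t) - F0(t)) over [0, tau].

    F1 and F0 are step functions given by sorted jump times and the values
    taken from each jump onward. The weight is evaluated at the left endpoint
    of each interval, which is exact when it is constant between jumps.
    """
    grid = np.unique(np.concatenate(([0.0], times1, times0, [tau])))
    grid = grid[grid <= tau]
    left, right = grid[:-1], grid[1:]
    diff = step_eval(times1, cif1, left) - step_eval(times0, cif0, left)
    return float(np.sum(weight(left) * diff * (right - left)))

# Toy cumulative incidence curves (hypothetical values)
stat = integrated_weighted_difference([1.0, 3.0], [0.1, 0.3], [2.0], [0.05], tau=4.0)
print(stat)
```

With unit weights the result is the signed area between the two curves up to tau; a weight that shrinks where few subjects remain at risk would downweight the unstable right tail.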
Supplementary material
Supplementary material is available at http://biostatistics.oxfordjournals.org.
Acknowledgments
The content is solely the responsibility of the authors and does not necessarily represent the official views of CDC or NIH. The authors thank the Associate Editor and two reviewers for helpful comments and the BAN investigators for access to study data. Conflict of Interest: None declared.
Funding
US Centers for Disease Control and Prevention (CDC) (U48-DP001944), and US National Institutes of Health (NIH) (P30 AI50410) and (R01 AI085073).
References
- Aalen O.O., Johansen S. (1978). An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scandinavian Journal of Statistics 5, 141–150.
- Abadie A. (2003). Semiparametric instrumental variable estimation of treatment response models. Journal of Econometrics 113, 231–263.
- Abbring J., van den Berg G. (2005). Social experiments and instrumental variables with duration outcomes. IFS Working Papers W05/19, London: Institute for Fiscal Studies.
- Andersen P.K., Borgan O., Gill R.D., Keiding N. (1995). Statistical Models Based on Counting Processes (Springer Series in Statistics), 2nd edition. New York: Springer.
- Angrist J.D., Imbens G.W., Rubin D.B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91, 444–455.
- Baker S.G. (1998). Analysis of survival data from a randomized trial with all-or-none compliance: estimating the cost-effectiveness of a cancer screening program. Journal of the American Statistical Association 93, 929–934.
- Brookhart M.A., Schneeweiss S. (2007). Preference-based instrumental variable methods for the estimation of treatment effects: assessing validity and interpreting results. The International Journal of Biostatistics 3, 1–23.
- Cain L.E., Cole S.R., Greenland S., Brown T.T., Chmiel J.S., Kingsley L., Detels R. (2009). Effect of highly active antiretroviral therapy on incident AIDS using calendar period as an instrumental variable. American Journal of Epidemiology 169, 1124–1132.
- Chasela C.S., Hudgens M.G., Jamieson D.J., Kayira D., Hosseinipour M.C., Kourtis A.P., Martinson F., Tegha G., Knight R.J., Ahmed Y.I., et al. (2010). Maternal or infant antiretroviral drugs to reduce HIV-1 transmission. New England Journal of Medicine 362, 2271–2281.
- Cuzick J., Sasieni P., Myles J., Tyrer J. (2007). Estimating the effect of treatment in a proportional hazards model in the presence of non-compliance and contamination. Journal of the Royal Statistical Society Series B 69, 565–588.
- Gray R.J. (1988). A class of k-sample tests for comparing the cumulative incidence of a competing risk. The Annals of Statistics 16, 1141–1154.
- Hernán M.A., Robins J.M. (2006). Instruments for causal inference: an epidemiologist's dream? Epidemiology 17, 360–372.
- Imbens G.W., Angrist J.D. (1994). Identification and estimation of local average treatment effects. Econometrica 62, 467–475.
- Kalbfleisch J.D., Prentice R.L. (2002). The Statistical Analysis of Failure Time Data, 2nd edition, Wiley Series in Probability and Statistics. New Jersey: Wiley-Interscience.
- Li J., Fine J.P., Brookhart M.A. (2015). Instrumental variable additive hazards models. Biometrics 71, 122–130.
- Loeys T., Goetghebeur E. (2003). A causal proportional hazards estimator for the effect of treatment actually received in a randomized trial with all-or-nothing compliance. Biometrics 59, 100–105.
- MacKenzie T.A., Tosteson T.D., Morden N.E., Stukel T.A., O'Malley A.J. (2014). Using instrumental variables to estimate a Cox's proportional hazards regression subject to additive confounding. Health Services and Outcomes Research Methodology 14, 54–68.
- Martens E.P., Pestman W.R., de Boer A., Belitser S.V., Klungel O.H. (2006). Instrumental variables: application and limitations. Epidemiology 17, 260–267.
- Nie H., Cheng J., Small D.S. (2011). Inference for the effect of treatment on survival probability in randomized trials with noncompliance and administrative censoring. Biometrics 67, 1397–1405.
- Ogburn E.L., Rotnitzky A., Robins J.M. (2015). Doubly robust estimation of the local average treatment effect curve. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 77, 373–396.
- Pepe M.S., Fleming T.R. (1989). Weighted Kaplan-Meier statistics: a class of distance tests for censored survival data. Biometrics 45, 497–507.
- Robins J.M., Tsiatis A.A. (1991). Correcting for non-compliance in randomized trials using rank preserving structural failure time models. Communications in Statistics - Theory and Methods 20, 2609–2631.
- Rubin D.B. (1980). Discussion of “Randomization analysis of experimental data in the Fisher randomization test,” by D. Basu. Journal of the American Statistical Association 75, 591–593.
- Sellers C.J., Lee H., Chasela C., Kayira D., Soko A., Mofolo I., Ellington S., Hudgens M.G., Kourtis A.P., King C.C., et al. (2015). Reducing lost to follow-up in a large clinical trial of prevention of mother-to-child transmission of HIV: the breastfeeding, antiretrovirals and nutrition study experience. Clinical Trials 12, 156–165.
- Tan Z. (2006). Regression and weighting methods for causal inference using instrumental variables. Journal of the American Statistical Association 101, 1607–1618.
- Tchetgen Tchetgen E.J., Walter S., Vansteelandt S., Martinussen T., Glymour M. (2015). Instrumental variable estimation in a survival context. Epidemiology 26, 402–410.

