Summary
Instrumental variable methods allow unbiased estimation in the presence of unmeasured confounders when an appropriate instrumental variable is available. Two‐stage least‐squares and residual inclusion methods have recently been adapted to additive hazard models for censored survival data. The semi‐parametric additive hazard model, which can include time‐independent and time‐dependent covariate effects, is particularly suited for the two‐stage residual inclusion method, since it allows direct estimation of time‐independent covariate effects without restricting the effect of the residual on the hazard. In this article, we prove asymptotic normality of two‐stage residual inclusion estimators of the regression coefficients in a semi‐parametric additive hazard model with time‐independent and time‐dependent covariate effects. We consider the cases of continuous and binary exposure. Estimation of the conditional survival function given observed covariates is discussed, and a resampling scheme is proposed to obtain simultaneous confidence bands. The new methods are compared to existing ones in a simulation study and are applied to a real data set. The proposed methods perform favorably, especially in cases with exposure‐dependent censoring.
Keywords: Additive hazard, Confounding, Instrumental variable, Survival analysis
1. Introduction
Instrumental variables (IV) can be used in regression modeling to avoid bias from unmeasured confounding or dependent measurement error in covariates by providing a source of exogenous variation (Angrist et al., 1996). These methods are also popular in epidemiology for the analysis of observational studies. In randomized clinical trials with survival endpoints, unmeasured confounding may occur as a result of non‐compliance, for example, when patients switch to salvage treatment after a progression of the disease. Applying naive analysis methods in such circumstances may result in severe bias (Zeng et al., 2012).
Two‐stage IV methods for duration data in econometrics have been proposed by Bijwaard and Ridder (2005). Estimation of survival probabilities under treatment non‐compliance using IV methods was considered by Nie et al. (2011). Baker (1998) estimates life years saved using IV methods in the context of all‐or‐none compliance. Two‐stage IV methods for parametric Bayesian models have been developed by Li and Lu (2015), and non‐parametric binary IV methods for competing risks data by Richardson et al. (2017). The additive hazard model (Aalen, 1989) is particularly amenable to IV methods, since it resembles the linear regression model, while the popular Cox proportional hazards model is inappropriate for IV methods as shown by Tchetgen Tchetgen et al. (2015).
For additive hazard survival models with censored data several two‐stage methods employing IVs have been developed. In the two‐stage least squares (2SLS) method, the first stage consists of a linear model for the confounded exposure given the IV and other observed covariates. In the second stage, an additive hazard model is fitted with the observed exposure being replaced by the predicted exposure from the first stage regression. Alternatively, the two‐stage residual inclusion (2SRI) method (Terza et al., 2008) keeps the observed exposure in the model, but includes the estimated first stage residual as additional covariate in the model.
For the 2SRI method, the first stage does not need to be a linear model, but additional assumptions about the unobserved confounding are required (Tchetgen Tchetgen et al., 2015). Essentially, in the case of continuous exposure, it is required that the unobserved confounding is a linear function of the first stage residual plus an independent error term. In the case of binary exposure we must be able to write the unobserved confounder as the sum of the conditional expectation of the unobserved confounder given exposure, instrument and observed covariates and an independent error term. These assumptions will be detailed in Section 2.
A 2SLS method for a continuous instrument in the semi‐parametric additive hazard model of Lin and Ying (1994), where all covariate effects are assumed to be time‐independent, was developed by Li et al. (2015). A similar 2SLS method for continuous instruments was proposed by Tchetgen Tchetgen et al. (2015) for the non‐parametric additive hazard model of Aalen (1989), where all covariate effects are allowed to be time‐dependent. For the same model they also developed a 2SRI method for binary and continuous instruments. However, asymptotic results are only provided for the 2SLS method. Work on IV methods for the additive hazard model has thus focused on models in which either all covariate effects are time‐independent or all are time‐dependent. The semi‐parametric additive hazards model of McKeague and Sasieni (1994), which allows both time‐independent and time‐dependent effects, has received less attention. We argue that this model is more appropriate for the 2SRI method, since it does not require the effect of the residual included in the second‐stage model to be time‐independent. At the same time, the exposure effect can still be modeled as time‐independent, which may be more useful for summarizing treatment effects in a randomized trial.
While the 2SRI method requires more stringent assumptions about the influence of the unobserved confounder on the hazard, the assumptions about the censoring can be relaxed. It is sufficient that the censoring is independent of the survival time conditional on the exposure and observed covariates, since the exposure is still part of the model (Chan, 2016). While the 2SLS method with a linear first stage can be used in the case of a binary exposure, a non‐linear first stage model, such as a logistic regression model, might be more appropriate.
A different and very general approach is taken by Martinussen et al. (2017), who develop an IV method for a class of structural cumulative survival models. Their approach does not require any modeling of the relationship between the exposure and the instrument. However, it requires a parametric model for the conditional expectation of the instrument given the observed confounders, and the survival function cannot be readily estimated from this model. In recent work, Choi et al. (2017) proposed a two‐stage procedure for general structural equation models that can also be applied to censored survival data.
In Section 2, we extend the 2SRI methods for continuous and binary exposure to the semi‐parametric additive hazards model of McKeague and Sasieni (1994), which allows for time‐dependent and time‐independent covariate effects. Hence, the residual can be included in the model without restrictions (other than linearity), while the effects of the other covariates can be modeled as time‐independent. Asymptotic results are derived for the 2SRI approach with binary and continuous exposure and instrument. In Section 2.3, an iid decomposition of an estimator of the conditional survival function given the exposure, the instrument, and all observed confounders is proved. Based on this result, a resampling scheme for obtaining simultaneous confidence bands is proposed. In our simulation study in Section 3, we find the 2SRI method to be superior to the 2SLS method when the exposure is binary and/or the censoring of the survival times depends on the exposure. In Section 4, the methods are applied to a data set from the Illinois unemployment bonus experiment (Woodbury and Spiegelman, 1987), where participants receiving unemployment benefits were offered a cash bonus on re‐employment.
2. Two‐Stage Instrumental Variable Methods
Let T be a continuous survival time, C the censoring time, and $\tilde{T} = \min(T, C)$ the observed right‐censored survival time. We assume that the follow‐up period is a fixed finite interval $[0, \tau]$ and that the hazard of T follows an additive hazard model

$$\lambda(t \mid R, X, V, U) = \lambda_0(t) + \beta_R R + \beta_X^\top X + \gamma(t)^\top V + \alpha(t, U), \qquad (1)$$

where $\lambda_0(t)$ is the baseline hazard, R is the observed exposure/treatment indicator with a time‐independent effect $\beta_R$, X is a p‐vector of observed covariates with time‐independent effects $\beta_X$, V is a q‐vector of observed covariates with time‐dependent effects $\gamma(t)$, and $\alpha(t, U)$ is a term depending on a vector of unobserved confounders U. All covariates in the model are baseline covariates which cannot change over time. We call this model the “McKeague–Sasieni model” (McKeague and Sasieni, 1994). The additive hazard model of Lin and Ying (1994), where all covariate effects are time‐independent, will be called the “Lin–Ying model”. The original additive hazard model of Aalen (1989), where all covariate effects are unrestricted, will be called the “Aalen model”. Both the Lin–Ying and the Aalen model can be viewed as special cases of the McKeague–Sasieni model.
Our main focus is on estimating the causal effect $\beta_R$ of the exposure on the hazard. In general, IV methods can only identify the local average treatment effect (LATE), as shown in Angrist et al. (1996), that is, the average treatment effect of those whose exposure changes when the value of the IV changes. IV methods cannot say anything about subjects whose exposure is always the same regardless of the value of the IV (so‐called “always‐takers” and “never‐takers” in the context of binary treatment assignment and instrument). However, implicit in Model (1) is the assumption that the treatment effect is the same for all individuals for a given value of the covariates. This means that the LATE is equal to $\beta_R$ for all subjects and can therefore be interpreted as the average treatment effect (ATE) for the entire population. Hence, the IV estimate in this model is a consistent estimate of the population ATE.
Alternatively, one could start with the Aalen model and then use $\hat{B}_R(\tau^*)/\tau^*$ as an estimate of $\beta_R$, where $\tau^*$ is a fixed time horizon and $\hat{B}_R(\tau^*)$ is a consistent estimate of the cumulative exposure effect $B_R(\tau^*) = \int_0^{\tau^*} \beta_R(t)\,dt$ obtained by 2SLS or 2SRI in the Aalen model (Tchetgen Tchetgen et al., 2015). Outside of the two‐stage setting this approach was also taken by Martinussen et al. (2017). However, this estimate would have a larger standard error than the semi‐parametric estimate, and the time horizon $\tau^*$ may not be chosen in a data‐dependent way.
Let $L = (X^\top, V^\top)^\top$ denote the observed covariates. Formally, we assume the existence of an instrumental variable G such that the following assumptions hold:

- (A1) G is associated with R conditional on L.
- (A2) G is independent of T conditional on L, R, and U.
Assumption (A1) implies that there is a non‐zero average causal effect of the instrument G on the exposure R, and Assumption (A2) is the exclusion restriction of Angrist et al. (1996). We also assume that L and G are exogenous, that is,

- (A3) U is independent of L and G.
The 2SLS methods of Li et al. (2015) and Tchetgen Tchetgen et al. (2015) first predict the exposure from a linear regression model given the instrument and any observed covariates. Then an additive hazard model is fitted with the observed exposure replaced by the predicted exposure. In the 2SRI method of Tchetgen Tchetgen et al. (2015), the observed exposure is kept and instead the residual of the first stage regression is included as an additional regressor in the second stage model. For uncensored observations and linear first and second stage models both methods would coincide. However, in the case of a binary exposure a non‐linear first stage model, such as a logistic regression model, might be more appropriate.
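For concreteness, the following R sketch contrasts the two second stages for a continuous exposure. It assumes the aalen() function of the timereg package (where const() marks time‐independent effects) and hypothetical variables R, G, X, V, time, and status in a data frame d; it is an illustration of the two procedures under these assumptions, not the implementation used in this article.

```r
library(timereg)  # provides aalen() with const() for time-independent effects

## First stage: linear regression of the exposure on instrument and covariates
first  <- lm(R ~ G + X + V, data = d)
d$Rhat <- fitted(first)  # predicted exposure, used by 2SLS
d$Dhat <- resid(first)   # estimated residual, used by 2SRI

## 2SLS second stage: observed exposure replaced by its first-stage prediction
fit2sls <- aalen(Surv(time, status) ~ const(Rhat) + const(X) + V, data = d)

## 2SRI second stage: observed exposure kept, estimated residual added with an
## unrestricted time-dependent effect (no const() around Dhat)
fit2sri <- aalen(Surv(time, status) ~ const(R) + const(X) + V + Dhat, data = d)
```

Note that the standard errors reported by such off‐the‐shelf fits do not account for the first‐stage estimation of the residual; the adjusted standard errors are the subject of Theorem 1 below.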
When considering regression methods for censored survival data, it is usually necessary to assume independence of censoring and survival times conditional on all covariates included in the model. The 2SLS method requires the censoring time C and the survival time T to be independent conditional on the observed covariates L. The 2SLS method can therefore suffer from bias when censoring and survival both depend on the exposure R. The bias of the 2SLS method induced by exposure‐dependent censoring is explored in Scenario VI of Li et al. (2015) and in our own simulations in Section 3. Since the exposure R is still included in the second‐stage model of the 2SRI method, it is sufficient to require conditional independence of censoring and survival times given the observed covariates and the exposure (Chan, 2016):

- (A4) C is independent of T conditional on R and L.
The relationships encoded in Assumptions (A1)–(A4) can be represented by the directed acyclic graph (DAG) shown in Figure 1. The arrows represent dependencies between random variables. There is an arrow from G to R (Assumption (A1)), but no arrow from G to T (Assumption (A2)) and no arrows from U to L and G (Assumption (A3)). The censoring time C is allowed to depend on the instrument G for 2SRI, since removing the nodes R and L from the DAG separates T and C even when C depends on G. It is, however, important to note that C must be independent of the unobserved confounder U given R and L, that is, there is no arrow from U to C.
Figure 1. Visualization of IV assumptions (A1)–(A4) with instrument G, exposure R, survival time T, observed confounders L, unobserved confounders U, and censoring time C.
2.1. Binary Case
In the case of a binary exposure R, we use a logistic regression model in the first stage,

$$\operatorname{logit} P(R = 1 \mid G, L) = \alpha_0 + \alpha_G G + \alpha_L^\top L, \qquad (2)$$

where $\operatorname{logit}(p) = \log\{p/(1 - p)\}$ and $\alpha = (\alpha_0, \alpha_G, \alpha_L^\top)^\top$. Denote the maximum likelihood estimator of $\alpha$ by $\hat\alpha$. The predicted probability for a patient with instrument G and covariates L from this model is $\hat{p}(G, L) = \operatorname{expit}(\hat\alpha_0 + \hat\alpha_G G + \hat\alpha_L^\top L)$, where $\operatorname{expit}(x) = e^x/(1 + e^x)$.
The 2SRI method requires an additional linearity assumption about the unobserved heterogeneity (Tchetgen Tchetgen et al., 2015):

- (A5) $U = E(U \mid R, G, L) + \varepsilon$,

where $\varepsilon$ is an error independent of R, G, and L. This assumption holds, for example, when U has a normal distribution in which only the mean depends on R, G, and L.
Under assumptions (A1)–(A5), a reparametrization of the original model can be obtained from Result 3 of Tchetgen Tchetgen et al. (2015):

$$\lambda(t \mid R, G, L) = \tilde\lambda_0(t) + \beta_R R + \beta_X^\top X + \gamma(t)^\top V + \rho(t)\,\Delta, \qquad (3)$$

where $\Delta = R - p(G, L)$ is the first‐stage residual with $p(G, L) = P(R = 1 \mid G, L)$, $\tilde\lambda_0(t)$ is a modified baseline hazard, and $\rho(t)$ is an unrestricted function of time. Since the true residual is unknown, it is estimated by $\hat\Delta = R - \hat{p}(G, L)$.
We emphasize that the conditional independence assumption (A4) is sufficient in the binary exposure case as well; that is, censoring is allowed to depend on the binary exposure.
An interesting special case arises when the exposure is confounded only in the group with $G = 1$, which implies that U is independent of R given $G = 0$ and L. This is the case in our data example (Section 4), with full compliance in the control group. In this case the main effect of the residual vanishes and the conditional hazard becomes

$$\lambda(t \mid R, G, L) = \tilde\lambda_0(t) + \beta_R R + \beta_X^\top X + \gamma(t)^\top V + \rho(t)\,G\Delta. \qquad (4)$$

If instead U is independent of R given $G = 1$ and L, then the exposure is confounded only in the group with $G = 0$ and $G\Delta$ is replaced by $(1 - G)\Delta$ in equation (4). For example, such a situation occurred in the panitumumab colorectal cancer trial (Amado et al., 2008), where patients randomized to the standard‐of‐care group had the possibility of switching to the experimental treatment on disease progression. Fitting the model which only includes the residual–instrument interaction but not the main effect of the residual may avoid numerical stability issues, as in our data example (Section 4).
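As a concrete illustration, here is a minimal R sketch of the binary‐exposure 2SRI fit and of the interaction‐only variant corresponding to equation (4). It assumes the timereg package (whose aalen() function fits the McKeague–Sasieni model, with const() marking time‐independent effects) and hypothetical variable names in a data frame d; it is a sketch under these assumptions, not the implementation used in this article.

```r
library(timereg)  # aalen() fits additive hazard models; const() = constant effect

## First stage: logistic regression (2) for the binary exposure
first  <- glm(R ~ G + X + V, family = binomial, data = d)
d$Dhat <- d$R - fitted(first)       # estimated residual R - expit(alpha-hat' W)

## General second stage (3): unrestricted time-dependent residual effect
fit3 <- aalen(Surv(time, status) ~ const(R) + const(X) + V + Dhat, data = d)

## Special case (4): only the residual-instrument interaction enters,
## e.g. under full compliance in the control group (G = 0)
d$GDhat <- d$G * d$Dhat
fit4 <- aalen(Surv(time, status) ~ const(R) + const(X) + V + GDhat, data = d)
```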
We are interested in estimating the vector of time‐independent regression coefficients $\beta = (\beta_R, \beta_X^\top)^\top$ and the vector of cumulative time‐dependent effects

$$A(t) = \int_0^t a(s)\,ds, \qquad a(t) = \big(\tilde\lambda_0(t), \gamma(t)^\top, \rho(t)\big)^\top.$$

Let $Z(t)$ be the $n \times (1 + p)$ matrix with i‐th row given by $Y_i(t)(R_i, X_i^\top)$, where $Y_i(t)$ is the at‐risk indicator at time t of the i‐th subject. The design matrix $X(t)$ for the time‐dependent coefficient functions, including the baseline hazard function, is defined like $Z(t)$ with i‐th row equal to $Y_i(t)(1, V_i^\top, \Delta_i)$. Furthermore, we obtain the matrix $\hat{X}(t)$ by replacing in $X(t)$ the unknown residuals $\Delta_i$ with the estimated residuals $\hat\Delta_i$. We can then define the estimators of $\beta$ and A like those given by McKeague and Sasieni (1994), but using $\hat{X}(t)$ instead of $X(t)$:

$$\hat\beta = \Big[\int_0^\tau Z(t)^\top \{I - \hat{H}(t)\}\,Z(t)\,dt\Big]^{-1} \int_0^\tau Z(t)^\top \{I - \hat{H}(t)\}\,dN(t) \qquad (5)$$

and

$$\hat{A}(t) = \int_0^t \hat{X}^-(s)\,\{dN(s) - Z(s)\hat\beta\,ds\}, \qquad (6)$$

where $\hat{H}(t) = \hat{X}(t)\{\hat{X}(t)^\top \hat{X}(t)\}^{-1}\hat{X}(t)^\top$, $\hat{X}^-(t) = \{\hat{X}(t)^\top \hat{X}(t)\}^{-1}\hat{X}(t)^\top$, I is the identity matrix, and $N(t) = (N_1(t), \ldots, N_n(t))^\top$ is the vector of counting processes.
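To fix ideas, the following naive R sketch evaluates estimator (5) on the grid of ordered event times. It is a direct discretization under assumed inputs (matrix names and arguments are illustrative), not the implementation used for the simulations, which would typically rely on specialized software such as timereg; it also assumes enough subjects at risk so that the relevant matrices are invertible.

```r
## Zc: n x (1+p) matrix of covariates with constant effects (R, X)
## Xt: n x (2+q) matrix of covariates with time-varying effects (1, V, Dhat)
ms_beta <- function(time, status, Zc, Xt) {
  et <- sort(unique(time[status == 1]))    # ordered event times
  dt <- diff(c(0, et))                     # spacings for the dt-integral
  A  <- matrix(0, ncol(Zc), ncol(Zc))
  b  <- rep(0, ncol(Zc))
  for (k in seq_along(et)) {
    at <- time >= et[k]                    # at-risk indicators Y_i(t)
    Z  <- Zc[at, , drop = FALSE]
    X  <- Xt[at, , drop = FALSE]
    H  <- X %*% solve(crossprod(X), t(X))  # projection onto the columns of X
    IH <- diag(nrow(Z)) - H
    A  <- A + t(Z) %*% IH %*% Z * dt[k]    # approximates the dt-integral in (5)
    dN <- as.numeric(time[at] == et[k] & status[at] == 1)
    b  <- b + drop(t(Z) %*% IH %*% dN)     # approximates the dN-integral in (5)
  }
  drop(solve(A, b))                        # beta-hat
}
```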
The additional variation in the second stage introduced by the estimated residuals $\hat\Delta_i$ must be taken into account when calculating standard errors for the regression coefficients. The correct standard errors are given by Theorem 1 below. Its proof and the required regularity assumptions (B1)–(B6) are given in the Appendix.
Theorem 1
Under the IV assumptions (A1)–(A5) and the regularity assumptions (B1)–(B3) we have

$$\sqrt{n}(\hat\beta - \beta) = n^{-1/2}\sum_{i=1}^n \varepsilon_i^\beta + o_p(1), \qquad (7)$$

where the $\varepsilon_i^\beta$ are iid mean‐zero vectors defined in equation (A.5) in the Appendix. This implies that $\hat\beta$ is consistent and that $\sqrt{n}(\hat\beta - \beta)$ is asymptotically normal with mean zero and covariance matrix $E\{(\varepsilon_1^\beta)^{\otimes 2}\}$, where $a^{\otimes 2} = a a^\top$ for a vector a.

Under assumptions (A1)–(A5) and (B1)–(B6) we have

$$\sqrt{n}\{\hat{A}(t) - A(t)\} = n^{-1/2}\sum_{i=1}^n \varepsilon_i^A(t) + o_p(1), \qquad (8)$$

where the $\varepsilon_i^A$ are iid mean‐zero functions defined in equation (A.7) in the Appendix. This implies that $\hat{A}$ is uniformly consistent on $[0, \tau]$ and that $\sqrt{n}(\hat{A} - A)$ converges weakly to a vector of mean‐zero Gaussian processes with covariance function $E\{\varepsilon_1^A(s)\,\varepsilon_1^A(t)^\top\}$.
Theorem 1 can also be applied in the less restrictive Aalen model

$$\lambda(t \mid R, G, L) = \tilde\lambda_0(t) + \beta_R(t) R + \beta_X(t)^\top X + \gamma(t)^\top V + \rho(t)\,\Delta \qquad (9)$$

with only time‐dependent covariate effects by moving R and X into the time‐dependent part of the design and setting $Z(t) \equiv 0$, which implies $Z(t)\hat\beta = 0$ for all t.
2.2. Continuous Case
For a continuous exposure we assume a linear model in the first stage, that is,

$$R = \alpha_0 + \alpha_G G + \alpha_L^\top L + \Delta, \qquad E(\Delta \mid G, L) = 0.$$

Assumption (A5) needs to be modified to

- (A5c) $U = \kappa\Delta + \varepsilon$,

where $\varepsilon$ is an error term independent of $(R, G, L)$ (Tchetgen Tchetgen et al., 2015). According to Result 2 of Tchetgen Tchetgen et al. (2015) we have

$$\lambda(t \mid R, G, L) = \tilde\lambda_0(t) + \beta_R R + \beta_X^\top X + \gamma(t)^\top V + \rho(t)\,\Delta. \qquad (10)$$

When fitting this model, the true unknown residual $\Delta$ is again replaced with the residual from the first‐stage regression, $\hat\Delta = R - (\hat\alpha_0 + \hat\alpha_G G + \hat\alpha_L^\top L)$. The result on the asymptotic distribution in Theorem 1 still holds when the quantities relating to the logistic first stage in Assumptions (B2) and (B6) are replaced by their linear regression counterparts. As in the binary case, this also holds for the special case of only time‐dependent effects (equation (9)).
2.3. Estimation of the Conditional Survival Function
In the 2SLS approach, it is possible to estimate the survival function of T given R and L only, as shown by Li et al. (2015), whereas in the 2SRI approach this can only be achieved by further modeling of the conditional distribution of G given R and L and then taking the expectation of the conditional survival function $S(t \mid R, G, L)$ with respect to that distribution. This is because we can only estimate the covariate effects in the structural model (1); we cannot estimate its baseline hazard $\lambda_0(t)$. Therefore, the survival function can only be estimated from the model for the conditional hazard given R, G, and L (equations (3) and (10)), which explicitly depends on the first‐stage residual and therefore on the instrument G. Only in the case of binary instrument and exposure and no covariates is a simple non‐parametric estimator of $S(t \mid R, L)$ available.
Let $W = (1, V^\top, \Delta)^\top$, where $\Delta$ is the true first‐stage residual. Then

$$S(t \mid R, G, L) = \exp\big\{-(\beta_R R + \beta_X^\top X)\,t - A(t)^\top W\big\},$$

where $\Delta = R - (\alpha_0 + \alpha_G G + \alpha_L^\top L)$ in the continuous and $\Delta = R - \operatorname{expit}(\alpha_0 + \alpha_G G + \alpha_L^\top L)$ in the binary case. Uniform consistency and asymptotic normality of the obvious estimator

$$\hat{S}(t \mid R, G, L) = \exp\big\{-(\hat\beta_R R + \hat\beta_X^\top X)\,t - \hat{A}(t)^\top \hat{W}\big\}, \qquad (11)$$

with $\hat{W} = (1, V^\top, \hat\Delta)^\top$, follow from a Taylor expansion around the true parameters and the iid decompositions given in Theorem 2 below.
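For illustration, estimator (11) can be evaluated for a new subject as in the following R sketch, where all argument names are hypothetical and Acum is assumed to hold the cumulative effects $\hat{A}(t)$ from (6) on a time grid.

```r
## beta  : (beta_R, beta_X) from (5)
## Acum  : K x (2+q) matrix of cumulative effects (baseline, V, residual) on tgrid
## r,x,v : covariates of the new subject; p1: fitted P(R = 1 | G, L) (binary case)
surv11 <- function(tgrid, beta, Acum, r, x, v, p1) {
  Dhat <- r - p1                  # estimated first-stage residual
  W    <- c(1, v, Dhat)           # covariates with time-varying effects
  exp(-(beta[1] * r + sum(beta[-1] * x)) * tgrid - drop(Acum %*% W))
}
```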
In principle, an estimator of $S(t \mid R, L)$ could be obtained by

$$\hat{S}(t \mid R, L) = \int \hat{S}(t \mid R, g, L)\,\hat{f}(g \mid R, L)\,dg,$$

where $\hat{f}(g \mid R, L)$ is an estimator of the conditional probability density of G given R and L, such as a kernel density estimator; this is feasible when the dimension of the covariate vector L is small. However, deriving the asymptotic properties of $\hat{S}(t \mid R, L)$ is beyond the scope of this article.
Theorem 2
Write $S = S(t \mid R, G, L)$ and $\hat{S} = \hat{S}(t \mid R, G, L)$, and let $A_\rho(t)$ denote the last component of $A(t)$, that is, the cumulative effect of the residual. Under assumptions (A1)–(A5) and (B1)–(B6) we have

$$\sqrt{n}(\hat{S} - S) = n^{-1/2}\sum_{i=1}^n \varepsilon_i^S(t) + o_p(1),$$

where

$$\varepsilon_i^S(t) = -S\,\big[t\,(R, X^\top)\,\varepsilon_i^\beta + W^\top \varepsilon_i^A(t) - A_\rho(t)\,(1, G, L^\top)\,\omega_i\big]$$

in the continuous case and

$$\varepsilon_i^S(t) = -S\,\big[t\,(R, X^\top)\,\varepsilon_i^\beta + W^\top \varepsilon_i^A(t) - A_\rho(t)\,p(G, L)\{1 - p(G, L)\}\,(1, G, L^\top)\,\omega_i\big]$$

in the binary case, respectively, are iid random variables, with $\omega_i$ the first‐stage terms from equation (A.1). The iid decomposition implies weak convergence of $\sqrt{n}(\hat{S} - S)$ to a Gaussian process whose variance function $\sigma^2(t)$ can be consistently estimated by $\hat\sigma^2(t) = n^{-1}\sum_{i=1}^n \hat\varepsilon_i^S(t)^2$, where $\hat\varepsilon_i^S(t)$ is obtained by replacing all unknown quantities in the definition of $\varepsilon_i^S(t)$ with their consistent estimators.
Theorem 2 follows from a Taylor expansion of $\hat{S}$ around the true parameter values and the iid decompositions of $\hat\beta$, $\hat{A}$, and $\hat\alpha$ in Theorem 1 and equation (A.1).
Simultaneous confidence bands for $S(t \mid R, G, L)$ on an interval $[t_1, t_2]$ can be obtained as $\hat{S}(t \mid R, G, L) \pm n^{-1/2}\,c_\alpha\,\hat\sigma(t)$, where $c_\alpha$ is such that

$$P\Big(\sup_{t \in [t_1, t_2]} \sqrt{n}\,\big|\hat{S}(t \mid R, G, L) - S(t \mid R, G, L)\big|/\sigma(t) > c_\alpha\Big) = \alpha.$$

The distribution of this supremum can be approximated using a resampling approach based on the iid decomposition in Theorem 2. For independent standard normal random variables $Q_1, \ldots, Q_n$, given the observed data, the process

$$\Delta^S(t) = n^{-1/2}\sum_{i=1}^n \hat\varepsilon_i^S(t)\,Q_i$$

has the same asymptotic distribution as $\sqrt{n}(\hat{S} - S)$ (Theorem 5.4.1, Martinussen and Scheike, 2006). Therefore, the limiting distribution of the supremum can be approximated by the empirical distribution of $\sup_{t \in [t_1, t_2]} |\Delta^{S,(m)}(t)|/\hat\sigma(t)$ for a large number M of resampled processes $\Delta^{S,(1)}, \ldots, \Delta^{S,(M)}$. The quantile $c_\alpha$ is then obtained as the empirical $(1 - \alpha)$‐quantile of these M supremum statistics.
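A compact R sketch of this resampling step, assuming an n × K matrix eps of the estimated iid terms $\hat\varepsilon_i^S(t_k)$ evaluated on a grid $t_1 < \cdots < t_K$ inside $[t_1, t_2]$; the function name and inputs are illustrative assumptions.

```r
band_quantile <- function(eps, M = 1000, alpha = 0.05) {
  n   <- nrow(eps)
  sig <- sqrt(colMeans(eps^2))        # estimate of sigma(t) on the grid
  sup <- replicate(M, {
    Q <- rnorm(n)                     # standard normal multipliers
    D <- colSums(eps * Q) / sqrt(n)   # resampled process Delta^S(t)
    max(abs(D) / sig)
  })
  unname(quantile(sup, 1 - alpha))    # critical value c_alpha
}
## Band: Shat(t) +/- band_quantile(eps) * sig(t) / sqrt(n) on the grid
```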
3. Simulations
We compare the finite‐sample properties of the benchmark method (all confounders included in the model), the two‐stage residual inclusion (2SRI) method, the two‐stage least‐squares (2SLS) method, and the naive method (confounders ignored) in several simulation scenarios with continuous and binary exposure.
3.1. Scenarios
1. This scenario corresponds to Case I of Li et al. (2015). The instrument G, the unobserved confounder U, and the observed confounder L are all standard normal. The exposure R is continuous and is generated from a linear model in G, L, and U with a standard normal error term. The conditional hazard of the survival time is additive in R, L, and U, and the censoring time is exponential with a constant rate. (A data‐generating sketch in this spirit is given after the list.)
2. Same as Scenario 1, but with exposure‐dependent censoring; that is, the censoring time is now exponential with a rate that depends on the exposure R.
3. Same as Scenario 1, but with the linearity condition (A5c) for the confounder violated; that is, the unobserved confounder entering the first stage is a non‐linear function of the first‐stage residual plus noise.
4. A slight modification of Scenario 3 of Martinussen et al. (2017) with continuous instrument and unobserved confounder. The binary exposure is generated from a logistic regression model in the instrument and the confounder. The conditional hazard of the survival time is additive, and censoring is uniform on a fixed interval.
5. This scenario corresponds to Case VII of Li et al. (2015). The instrument is binary. The unobserved confounder U is standard normal. The exposure is set to 1 if a linear combination of G and U plus an independent mean‐zero normal error exceeds a threshold and to 0 otherwise, which corresponds to a probit model. The survival time has an additive hazard with a time‐independent exposure effect, and censoring is exponential with a constant rate.
6. Same as Scenario 5, but with exposure‐dependent censoring; that is, C given R has an exponential distribution with a rate that depends on R.
Our results include as a special case the Aalen model, where all effects are modeled as time‐dependent. We therefore also consider a scenario with a time‐dependent exposure effect on the hazard:

7. The same as Scenario 4, but now with a time‐dependent exposure effect $\beta_R(t)$.
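For illustration, the sketch referenced in Scenario 1 above is given below. All coefficient and rate values are invented placeholders, not the values used in the simulation study; under a constant additive hazard given the covariates, the survival time is exponential.

```r
gen_scen1 <- function(n) {
  G <- rnorm(n); U <- rnorm(n); L <- rnorm(n)
  R <- 0.5 * G + 0.5 * L + 0.5 * U + rnorm(n)            # linear first stage
  haz <- pmax(0.5 + 0.2 * R + 0.2 * L + 0.2 * U, 1e-6)   # additive hazard, kept positive
  T <- rexp(n, rate = haz)    # constant hazard given covariates => exponential
  C <- rexp(n, rate = 0.2)    # censoring independent of exposure (Scenario 1)
  data.frame(time = pmin(T, C), status = as.numeric(T <= C), R = R, G = G, L = L)
}
d <- gen_scen1(400)
```

For Scenario 2, the censoring rate in such a sketch would additionally depend on R.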
In the scenarios with binary exposure, estimates were only calculated up to times where at least 15 subjects (approximately 3–4 times the number of covariates) were still at risk, in order to avoid numerical instability from singular matrices in the calculation of the estimates.
3.2. Results
In this section, we consider the results for the estimated effect of the exposure. In all scenarios we also consider the coverage probability of the confidence intervals based on the unadjusted estimates of the standard errors, which do not account for the additional variation caused by including the estimated first‐stage residuals as covariates in the second stage. The results of the two continuous exposure Scenarios 1 and 2 are shown in Table 1. In Scenario 1, both two‐stage methods are unbiased and have close to nominal coverage probabilities. The naive method has a substantial bias for all sample sizes and a very small coverage probability that tends to 0 as the sample size increases. In Scenario 2 with exposure‐dependent censoring, the 2SLS method is now biased. In Scenario 3, where the linearity assumption for the confounder is violated, 2SRI has a substantial bias, but the coverage probabilities are still close to the nominal level.
Table 1.
Results of 50,000 simulations for Scenarios 1–3 (continuous exposure) of the benchmark (all confounders observed), two‐stage residual inclusion (2SRI), two‐stage least‐squares (2SLS), and naive (confounders ignored) analyses for varying sample sizes n. RMSE, root mean‐squared error; SD, standard deviation; ESE, estimated standard error; ESE_u, unadjusted estimated standard error; CP, coverage probability of the 95% confidence interval; CP_u, coverage probability of the unadjusted 95% confidence interval
| Scenario | n | Method | RMSE | Bias | SD | ESE | ESE_u | CP | CP_u | Power (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 400 | Benchmark | 1.031 | 0.005 | 1.031 | 1.029 | 1.029 | 0.949 | 0.949 | 7.1 |
| | | 2SRI | 1.124 | −0.030 | 1.123 | 1.106 | 1.114 | 0.948 | 0.948 | 7.4 |
| | | 2SLS | 1.118 | −0.015 | 1.118 | 1.122 | 1.121 | 0.952 | 0.951 | 6.6 |
| | | Naive | 1.275 | 1.177 | 0.489 | 0.485 | 0.485 | 0.310 | 0.310 | 93.5 |
| | 800 | Benchmark | 0.713 | 0.004 | 0.713 | 0.716 | 0.716 | 0.951 | 0.951 | 10.3 |
| | | 2SRI | 0.775 | −0.024 | 0.774 | 0.772 | 0.772 | 0.950 | 0.949 | 9.9 |
| | | 2SLS | 0.767 | −0.006 | 0.767 | 0.776 | 0.776 | 0.953 | 0.953 | 9.4 |
| | | Naive | 1.218 | 1.170 | 0.337 | 0.339 | 0.339 | 0.061 | 0.061 | 99.9 |
| 2 | 400 | Benchmark | 1.083 | 0.019 | 1.083 | 1.086 | 1.086 | 0.950 | 0.950 | 6.9 |
| | | 2SRI | 1.192 | 0.008 | 1.192 | 1.201 | 1.184 | 0.955 | 0.949 | 6.8 |
| | | 2SLS | 1.194 | −0.137 | 1.186 | 1.184 | 1.182 | 0.949 | 0.949 | 5.2 |
| | | Naive | 1.287 | 1.173 | 0.530 | 0.530 | 0.530 | 0.390 | 0.390 | 89.2 |
| | 800 | Benchmark | 0.753 | 0.006 | 0.753 | 0.757 | 0.757 | 0.951 | 0.951 | 9.8 |
| | | 2SRI | 0.823 | −0.003 | 0.823 | 0.829 | 0.819 | 0.953 | 0.949 | 9.2 |
| | | 2SLS | 0.826 | −0.136 | 0.815 | 0.818 | 0.818 | 0.949 | 0.949 | 6.6 |
| | | Naive | 1.225 | 1.168 | 0.369 | 0.370 | 0.370 | 0.109 | 0.109 | 99.5 |
| 3 | 400 | Benchmark | 1.054 | 0.008 | 1.054 | 1.054 | 1.054 | 0.951 | 0.951 | 6.9 |
| | | 2SRI | 1.117 | 0.103 | 1.113 | 1.129 | 1.127 | 0.953 | 0.953 | 7.3 |
| | | 2SLS | 1.127 | 0.010 | 1.127 | 1.128 | 1.128 | 0.951 | 0.951 | 6.5 |
| | | Naive | 1.047 | 0.007 | 1.047 | 1.048 | 1.048 | 0.951 | 0.951 | 6.9 |
| | 800 | Benchmark | 0.734 | 0.001 | 0.734 | 0.733 | 0.733 | 0.951 | 0.951 | 10.3 |
| | | 2SRI | 0.785 | 0.044 | 0.784 | 0.787 | 0.786 | 0.950 | 0.950 | 10.2 |
| | | 2SLS | 0.789 | −0.001 | 0.789 | 0.787 | 0.787 | 0.949 | 0.949 | 9.4 |
| | | Naive | 0.731 | −0.001 | 0.731 | 0.731 | 0.731 | 0.951 | 0.951 | 10.2 |
The results of the binary exposure scenarios are shown in Table 2. In Scenario 4, with a logistic regression model in the first stage, the 2SLS method is again substantially biased, while the 2SRI method is practically unbiased. However, both methods have a substantially larger root mean‐squared error than the benchmark method and even the massively biased naive method. The results for Scenario 4 also show clearly that the unadjusted estimator underestimates the standard errors, resulting in coverage probabilities below the nominal level. In Scenario 5, with a probit model in the first stage, 2SRI is unbiased even though the first‐stage model is misspecified, while 2SLS has a small bias. In Scenario 6, which is the same as Scenario 5 but with exposure‐dependent censoring, 2SRI remains unbiased, while the bias of 2SLS increases. There is a notable difference between the coverage probabilities of the adjusted and unadjusted confidence intervals for the exposure effects for the 2SRI method. In the binary scenarios, both IV methods substantially increase the variance of the estimates, leading to a large loss of power compared to the benchmark method. This is a general feature of two‐stage IV methods and not specific to our method.
Table 2.
Results of 50,000 simulations for Scenarios 4–6 (binary exposure) of the benchmark (all confounders observed), two‐stage residual inclusion (2SRI), two‐stage least‐squares (2SLS), and naive (confounders ignored) analyses for varying sample sizes n. RMSE, root mean‐squared error; SD, standard deviation; ESE, estimated standard error; ESE_u, unadjusted estimated standard error; CP, coverage probability of the 95% confidence interval; CP_u, coverage probability of the unadjusted 95% confidence interval
| Scenario | n | Method | RMSE | Bias | SD | ESE | ESE_u | CP | CP_u | Power (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 400 | Benchmark | 0.089 | −0.000 | 0.089 | 0.088 | 0.088 | 0.951 | 0.951 | 99.1 |
| | | 2SRI | 0.236 | −0.004 | 0.236 | 0.236 | 0.232 | 0.955 | 0.948 | 43.0 |
| | | 2SLS | 0.239 | 0.068 | 0.229 | 0.238 | 0.238 | 0.952 | 0.952 | 51.8 |
| | | Naive | 0.088 | −0.007 | 0.088 | 0.087 | 0.087 | 0.950 | 0.950 | 99.0 |
| | 800 | Benchmark | 0.062 | −0.001 | 0.062 | 0.062 | 0.062 | 0.950 | 0.950 | 100.0 |
| | | 2SRI | 0.162 | −0.001 | 0.162 | 0.161 | 0.160 | 0.952 | 0.949 | 70.8 |
| | | 2SLS | 0.173 | 0.067 | 0.160 | 0.166 | 0.166 | 0.943 | 0.943 | 82.1 |
| | | Naive | 0.062 | −0.007 | 0.061 | 0.061 | 0.061 | 0.949 | 0.949 | 100.0 |
| 5 | 400 | Benchmark | 2.005 | 0.028 | 2.005 | 1.994 | 1.994 | 0.949 | 0.949 | 24.8 |
| | | 2SRI | 4.622 | 0.001 | 4.622 | 4.583 | 4.413 | 0.955 | 0.939 | 8.5 |
| | | 2SLS | 4.766 | 0.117 | 4.765 | 4.775 | 4.767 | 0.955 | 0.954 | 8.2 |
| | | Naive | 2.668 | 2.211 | 1.493 | 1.480 | 1.480 | 0.674 | 0.674 | 88.8 |
| | 800 | Benchmark | 1.395 | −0.013 | 1.394 | 1.391 | 1.391 | 0.949 | 0.949 | 43.4 |
| | | 2SRI | 3.149 | 0.005 | 3.149 | 3.142 | 3.084 | 0.953 | 0.945 | 13.0 |
| | | 2SLS | 3.262 | 0.132 | 3.260 | 3.284 | 3.283 | 0.953 | 0.952 | 12.7 |
| | | Naive | 2.416 | 2.181 | 1.040 | 1.037 | 1.037 | 0.440 | 0.440 | 99.4 |
| 6 | 400 | Benchmark | 2.165 | −0.055 | 2.165 | 2.168 | 2.168 | 0.952 | 0.952 | 21.1 |
| | | 2SRI | 4.809 | −0.031 | 4.809 | 4.769 | 4.192 | 0.950 | 0.899 | 8.1 |
| | | 2SLS | 4.681 | −0.289 | 4.672 | 4.677 | 4.672 | 0.952 | 0.952 | 7.2 |
| | | Naive | 2.607 | 2.095 | 1.552 | 1.550 | 1.550 | 0.709 | 0.709 | 82.8 |
| | 800 | Benchmark | 1.518 | −0.021 | 1.518 | 1.518 | 1.518 | 0.951 | 0.951 | 37.8 |
| | | 2SRI | 3.272 | −0.036 | 3.272 | 3.264 | 3.024 | 0.950 | 0.925 | 11.9 |
| | | 2SLS | 3.231 | −0.266 | 3.220 | 3.238 | 3.237 | 0.951 | 0.951 | 10.4 |
| | | Naive | 2.379 | 2.116 | 1.087 | 1.088 | 1.088 | 0.495 | 0.495 | 98.2 |
For each of the seven scenarios, we also used the 2SRI method to estimate the conditional survival function (with covariate values fixed at their mean values in the continuous scenarios). From the estimate $\hat{S}$, the median is estimated as $\hat{m} = \inf\{t : \hat{S}(t \mid R, G, L) \le 0.5\}$. The confidence interval for the median is obtained by inverting the pointwise confidence interval for $\hat{S}$. Simultaneous confidence bands for $S$ on an interval $[t_1, t_2]$ were obtained using the resampling method from Theorem 2, where $t_2$ is chosen in each scenario such that on average a fixed proportion of the subjects were still at risk at time $t_2$.
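For instance, given the estimated curve on a grid, the median and its interval can be read off as in this small R sketch (hypothetical names; lo and hi denote the pointwise lower and upper confidence limits of $\hat{S}$):

```r
## m-hat = inf{ t : S-hat(t) <= 0.5 }; CI by inverting the pointwise CI for S
med_est <- function(tgrid, Shat) tgrid[which(Shat <= 0.5)[1]]
med_ci  <- function(tgrid, lo, hi) c(med_est(tgrid, lo), med_est(tgrid, hi))
```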
The results are shown in Table 3. In all scenarios, the median estimate has a very small bias and the coverage probabilities are close to the nominal level. Only in Scenario 6, where the linearity assumption for the confounder (Assumption (A5)) is violated, is the coverage probability of the simultaneous confidence band markedly below the nominal level.
Table 3.
Mean of the estimated median and confidence intervals of the conditional survival function for Scenarios 1–7 and sample sizes n = 400 and 800 in 10,000 simulations. Coverage probabilities of the confidence intervals for the true median m (CP_m) and of the simultaneous confidence bands for the survival curve on $[t_1, t_2]$ (CP_band). Simultaneous confidence bands are estimated from 1000 bootstrap replications

| Scenario | t₂ | m | n | Median (95% CI) | CP_m (%) | CP_band (%) |
|---|---|---|---|---|---|---|
| 1 | 0.19 | 0.069 | 400 | 0.070 (0.060, 0.081) | 94.7 | 94.6 |
| | | | 800 | 0.069 (0.062, 0.077) | 95.3 | 95.2 |
| 2 | 0.17 | 0.069 | 400 | 0.070 (0.059, 0.081) | 95.3 | 94.6 |
| | | | 800 | 0.069 (0.062, 0.077) | 95.2 | 94.8 |
| 3 | 0.18 | 0.070 | 400 | 0.070 (0.060, 0.081) | 94.9 | 94.1 |
| | | | 800 | 0.070 (0.062, 0.077) | 95.2 | 95.0 |
| 4 | 2.50 | 0.770 | 400 | 0.775 (0.658, 0.905) | 94.5 | 94.4 |
| | | | 800 | 0.772 (0.689, 0.862) | 95.0 | 95.0 |
| 5 | 0.15 | 0.050 | 400 | 0.050 (0.041, 0.060) | 94.8 | 94.5 |
| | | | 800 | 0.050 (0.043, 0.057) | 94.9 | 94.9 |
| 6 | 0.12 | 0.050 | 400 | 0.048 (0.040, 0.058) | 93.4 | 92.5 |
| | | | 800 | 0.049 (0.043, 0.056) | 93.1 | 92.7 |
| 7 | 0.17 | 0.050 | 400 | 0.050 (0.041, 0.060) | 95.0 | 94.8 |
| | | | 800 | 0.050 (0.044, 0.058) | 94.8 | 94.9 |
For Scenario 7, with the time‐dependent exposure effect, the mean of the estimated cumulative exposure effect is shown in Figure 2. Here, the naive method is substantially biased and fails to capture the true time‐dependency of the exposure effect. The 2SRI method is slightly biased at larger times, as the number at risk becomes small. Web Figure 1 in Web Appendix A shows the means of the estimated survival functions for a sample size of 1000, together with simultaneous confidence bands obtained from 1000 resampled processes.
Figure 2. Results of Scenario 7: mean of the estimated cumulative exposure effect over 10,000 simulations.
4. Application
We consider data from a social experiment conducted by the Illinois Department of Employment Security between mid‐1984 and mid‐1985 to test the effect of cash bonuses in reducing the duration of insured unemployment (W.E. Upjohn Institute, 1987; Woodbury and Spiegelman, 1987). A total of 12,101 new claimants for unemployment insurance were randomized into three groups: 3952 to the control group (no cash bonus offered), 3963 to the employer bonus group (cash bonus offered to the next employer), and 4186 to the claimant bonus group (cash bonus offered to the claimant). The cash bonus of $500 was only paid if the claimant found a new job within 11 weeks of claiming unemployment insurance. It is therefore plausible to assume that the effect of offering the bonus on the duration of unemployment is time‐dependent.
We only analyze the data from the claimant bonus experiment, consisting of the control group and the claimant bonus group. Subjects randomized to the control group were not informed about the experiment and were not asked whether they wanted to participate. In the claimant bonus group, 659 subjects (15.7%) refused to participate for unknown reasons, which suggests that there is unobserved confounding.
This data set has been previously analyzed using a two‐stage IV method based on a mixed proportional hazards model with the original randomization as the instrument (Bijwaard and Ridder, 2005). We analyze the data set using the 2SLS and 2SRI methods, both with the cash bonus offer effect modeled as time‐dependent and as time‐independent, and the naive method without any adjustment. The 2SRI method is implemented based on the model in equation (4), which does not include the main effect of the first‐stage residual, since including the main effect made the design matrix singular for all event times. Following Bijwaard and Ridder (2005), we include age, the logarithm of pre‐unemployment earnings, gender, ethnicity, and the logarithm of the weekly amount of unemployment insurance benefits plus dependents allowance as additional covariates in our first‐ and second‐stage models.
We use the formal goodness‐of‐fit test for the additive hazard model proposed by Gandy and Jensen (2005), whose test statistic can be interpreted as a scaled sum of martingale residuals. The test indicates that the additive hazard model fits the data well for the female subgroup, but neither for the male subgroup nor for the entire group. We therefore restrict our analysis to the 3619 female participants in the claimant bonus experiment.
The estimated cumulative effects are shown in Web Figure 2 in Web Appendix A. The non‐parametric two‐stage estimates are slightly larger than the non‐parametric naive estimate. The 2SRI method in the McKeague–Sasieni model and the 2SLS method in the Lin–Ying model give practically identical results for the time‐independent effect of the cash bonus offer, with estimates larger than the naive estimate. All estimates are positive; that is, offering the cash bonus increases the hazard of re‐employment, thereby shortening the duration of insured unemployment, as expected.
The estimated effect is statistically significant for the 2SRI method, but not for the naive method.
5. Discussion
We have provided asymptotic results for the two‐stage residual inclusion method in a semi‐parametric additive hazard model for binary and continuous exposure. These results include as a special case the general model where all effects are time‐dependent. The advantage of the semi‐parametric model in connection with the 2SRI method is that the effect of the included residual may be time‐dependent, while the effects of the other covariates can be modeled as constant over time.
Our simulations have shown that the 2SRI method avoids the bias of 2SLS when censoring depends on the exposure and when the first stage is a non‐linear model. Although the asymptotic results assume a logistic regression model in the first stage, an extension to other generalized linear models would be straightforward. The coverage probabilities of the confidence intervals are near the nominal level even for relatively small sample sizes, and the method is seen to be robust when the data are generated from a probit model in the first stage instead of the assumed logistic model. The naive method, which ignores any confounding, had in some cases a very large bias and coverage probabilities far below the nominal level.
A potential application of the 2SRI method is when drop‐out is suspected to depend on the level of the exposure and/or the instrument, since such censoring is adjusted for by the method.
The coverage probabilities of the confidence intervals based on the unadjusted standard errors can be substantially below the nominal level, even though the difference between the adjusted and unadjusted standard errors seemingly becomes smaller as the sample size increases.
6. Supplementary Materials
Figures referenced in Sections 3 and 4 and the R code (R Core Team, 2017) for fitting the two‐stage methods to the data set in Section 4 are available with this article at the Biometrics website on Wiley Online Library.
Acknowledgements
This work is independent research arising in part from Dr Jaki's Senior Research Fellowship (NIHR‐SRF‐2015‐08‐001) supported by the National Institute for Health Research. Funding for this work was also provided by the Medical Research Council (MR/M005755/1). The views expressed in this publication are those of the authors and not necessarily those of the NHS, the National Institute for Health Research or the Department of Health.
Appendix 1.
First Stage iid Decompositions
We state two well‐known asymptotic results for the maximum‐likelihood estimators in the logistic and linear regression models that we need for our proof of Theorem 1. We have

$$\sqrt{n}(\hat\alpha - \alpha) = n^{-1/2}\sum_{i=1}^n \omega_i + o_p(1), \qquad (A.1)$$

where the $\omega_i$ ($i = 1, \ldots, n$) are independent and identically distributed mean‐zero random $(2 + p + q)$‐vectors.

Logistic regression: $\omega_i = I(\alpha)^{-1}\{R_i - \operatorname{expit}(\alpha^\top W_i)\}\,W_i$, where $W_i = (1, G_i, L_i^\top)^\top$ and $I(\alpha)$ is the Fisher information matrix.

Linear regression: $\omega_i = E(W_1 W_1^\top)^{-1}\,W_i \Delta_i$, where $\Delta_i = R_i - \alpha^\top W_i$.
Regularity Assumptions
A number of regularity assumptions are needed for proving our asymptotic results:
-
B1
There exist positive definite matrices and such that and , where .
-
B2For exist positive definite matrices such that
where is the design matrix of the first stage regression, and and are vectors defined by and , respectively. Let . -
B3
The covariates , and L have bounded support.
In order to prove uniform consistency of A and convergence of to a mean‐zero Gaussian process we need the following additional assumptions:
-
B4There exists a positive definite matrix function such that
-
B5There exists positive definite matrices such that
where for a matrix . -
B6For and exist positive definite matrices such that
Let for .
Furthermore, we also assume the regularity conditions required for asymptotic normality of the maximum‐likelihood estimator in the logistic regression model. Specifically, we assume that
| (A.2) |
where are iid random variables defined in the Appendix with covariance matrix .
Proofs
Lemma 1
Under Assumptions (B1), (B3), and (B4), the following two convergence results hold.
We only prove the first equation; the proof of the second is almost identical. The statement follows from the delta method applied to the first‐stage estimator: since the covariates are bounded, the conclusion follows from Assumption (B3) and equation (A.2).
Proof of Theorem 1. Let $M(t) = (M_1(t), \ldots, M_n(t))^\top$ be the vector of counting‐process martingales, where

$$M_i(t) = N_i(t) - \int_0^t Y_i(s)\,\lambda(s \mid R_i, G_i, L_i)\,ds.$$

Substituting this representation into the definition of $\hat\beta$ in equation (5), we obtain a decomposition

(A.3)

into a martingale term and a term driven by the estimated first‐stage residuals. By Assumption (B1) and Lemma 1, the second term on the right‐hand side becomes

(A.4)

which is a sum of mean‐zero iid terms, so its asymptotic normality follows from the central limit theorem. Asymptotic normality of the first term on the right‐hand side of equation (A.3) follows from the martingale central limit theorem (Andersen et al., 1993). The asymptotic variance of $\hat\beta$ then follows, since the two terms on the right‐hand side of equation (A.3) are asymptotically independent.
For later reference, we note that $\sqrt{n}(\hat\beta - \beta)$ admits the iid decomposition $\sqrt{n}(\hat\beta - \beta) = n^{-1/2}\sum_{i=1}^n \varepsilon_i^\beta + o_p(1)$, where the $\varepsilon_i^\beta$ are given in

(A.5)

To show asymptotic normality of $\sqrt{n}\{\hat{A}(t) - A(t)\}$, we start by noting that it decomposes into a martingale term, a term involving $\hat\beta - \beta$, and a term involving $\hat\alpha - \alpha$.
The second term on the right‐hand side is asymptotically equivalent to a sum of iid terms by Lemma 1, and the last term is asymptotically equivalent to such a sum by Assumption (B2) and equation (A.4). Thus,

(A.6)

The martingale central limit theorem and Assumption (B3) imply convergence of the first term on the right‐hand side to a mean‐zero Gaussian process. The second and last terms converge to mean‐zero Gaussian processes driven by mean‐zero normal random vectors, the latter with covariance matrix K. All three processes are asymptotically independent, since the latter two are driven by time‐independent random vectors and the covariation between the two martingale processes is 0, by an argument similar to that in Appendix 1 of McKeague and Sasieni (1994).

In order to prove uniform consistency of $\hat{A}$ on $[0, \tau]$, we divide equation (A.6) by $\sqrt{n}$ and see that all terms converge to 0 uniformly in probability: the first two terms by Lenglart's inequality (Andersen et al., 1993) and the last term by the law of large numbers.
We again have an iid decomposition $\sqrt{n}\{\hat{A}(t) - A(t)\} = n^{-1/2}\sum_{i=1}^n \varepsilon_i^A(t) + o_p(1)$, where the $\varepsilon_i^A(t)$ are given in

(A.7)
References
- Aalen, O. O. (1989). A linear regression model for the analysis of life times. Statistics in Medicine 8, 907–925.
- Amado, R. G., Wolf, M., Peeters, M., Van Cutsem, E., Siena, S., Freeman, D. J., et al. (2008). Wild‐type KRAS is required for panitumumab efficacy in patients with metastatic colorectal cancer. Journal of Clinical Oncology 26, 1626–1634.
- Andersen, P. K., Gill, R. D., and Keiding, N. (1993). Statistical Models Based on Counting Processes. Springer Series in Statistics. Springer‐Verlag, New York.
- Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91, 444–455.
- Baker, S. G. (1998). Analysis of survival data from a randomized trial with all‐or‐none compliance: Estimating the cost‐effectiveness of a cancer screening program. Journal of the American Statistical Association 93, 929–934.
- Bijwaard, G. E. and Ridder, G. (2005). Correcting for selective compliance in a re‐employment bonus experiment. Journal of Econometrics 125, 77–111.
- Chan, K. C. G. (2016). Reader reaction: Instrumental variable additive hazards models with exposure‐dependent censoring. Biometrics 72, 1003–1005.
- Choi, B. Y., Fine, J. P., and Brookhart, M. A. (2017). On two‐stage estimation of structural instrumental variable models. Biometrika 104, 881–899.
- Gandy, A. and Jensen, U. (2005). On goodness‐of‐fit tests for Aalen's additive risk model. Scandinavian Journal of Statistics 32, 425–445.
- Li, G. and Lu, X. (2015). A Bayesian approach for instrumental variable analysis with censored time‐to‐event outcome. Statistics in Medicine 34, 664–684.
- Li, J., Fine, J., and Brookhart, A. (2015). Instrumental variable additive hazards models. Biometrics 71, 122–130.
- Lin, D. Y. and Ying, Z. (1994). Semiparametric analysis of the additive risk model. Biometrika 81, 61–71.
- Martinussen, T. and Scheike, T. H. (2006). Dynamic Regression Models for Survival Data. Statistics for Biology and Health. Springer‐Verlag, New York.
- Martinussen, T., Vansteelandt, S., Tchetgen Tchetgen, E. J., and Zucker, D. M. (2017). Instrumental variables estimation of exposure effects on a time‐to‐event endpoint using structural cumulative survival models. Biometrics 73, 1140–1149.
- McKeague, I. W. and Sasieni, P. D. (1994). A partly parametric additive risk model. Biometrika 81, 501–514.
- Nie, H., Cheng, J., and Small, D. S. (2011). Inference for the effect of treatment on survival probability in randomized trials with noncompliance and administrative censoring. Biometrics 67, 1397–1405.
- R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
- Richardson, A., Hudgens, M. G., Fine, J. P., and Brookhart, M. A. (2017). Nonparametric binary instrumental variable analysis of competing risks data. Biostatistics 18, 48–61.
- Tchetgen Tchetgen, E. J., Walter, S., Vansteelandt, S., Martinussen, T., and Glymour, M. (2015). Instrumental variable estimation in a survival context. Epidemiology 26, 402–410.
- Terza, J. V., Basu, A., and Rathouz, P. J. (2008). Two‐stage residual inclusion estimation: Addressing endogeneity in health econometric modeling. Journal of Health Economics 27, 531–543.
- W.E. Upjohn Institute (1987). The Illinois Unemployment Insurance Experiments public use data. https://upjohn.org/node/950. Accessed: 2017-06-29.
- Woodbury, S. A. and Spiegelman, R. G. (1987). Bonuses to workers and employers to reduce unemployment: Randomized trials in Illinois. The American Economic Review 77, 513–530.
- Zeng, D., Chen, Q., Chen, M.‐H., Ibrahim, J. G., and the Amgen Research Group (2012). Estimating treatment effects with treatment switching via semicompeting risks models: An application to a colorectal cancer study. Biometrika 99, 167–184.