Author manuscript; available in PMC: 2017 Sep 1.
Published in final edited form as: Biometrics. 2016 Jan 11;72(3):1003–1005. doi: 10.1111/biom.12471

Instrumental Variable Additive Hazards Models with Exposure-dependent Censoring

Kwun Chuen Gary Chan 1
PMCID: PMC4940314  NIHMSID: NIHMS751408  PMID: 26754156

Summary

Li, Fine and Brookhart (2015) presented an extension of the two-stage least squares (2SLS) method for additive hazards models which requires an assumption that the censoring distribution is unrelated to the endogenous exposure variable. We present another extension of 2SLS that can address this limitation.

Keywords: Two stage residual inclusion, unmeasured confounding

1. Introduction

Instrumental variable regression methods have been widely used for addressing the problem of unmeasured confounding when a valid instrument is available for analysis. Two-stage least squares (2SLS) has been used when the outcome of interest is uncensored. Li, Fine and Brookhart (2015) recently proposed a simple but powerful extension of 2SLS to additive hazards models for censored survival outcomes. As acknowledged in that paper, a limitation of their method is that the censoring distribution is assumed to be unrelated to the endogenous exposure variable given exogenous variables. The purpose of this paper is to discuss a different extension of 2SLS that can address this limitation.

2. Two-stage predictor substitution and residual inclusion

We employ the same notation as Li, Fine and Brookhart (2015). Let $T$ be the failure time of interest and $C$ be the censoring time. The observed survival data are $Y = \min(T, C)$ and $\Delta = I(T \le C)$. Let $X_e$ be an endogenous exposure variable of interest, $X_o$ be observed exogenous variables, $X_u$ be unobserved confounders and $X_I$ be an instrumental variable. The conditional hazard function of the survival outcome follows an additive hazards model $h_0(t) + \beta_e X_e + \beta_o^T X_o + \eta$, and the endogenous exposure follows the model $X_e = \alpha_c + \alpha_I X_I + \alpha_o^T X_o + \nu$, where $\eta$ and $\nu$ depend on $X_u$ but not on $(X_I, X_o)$.

Li, Fine and Brookhart (2015) first obtain the least-squares estimates $(\hat{\alpha}_c, \hat{\alpha}_I, \hat{\alpha}_o^T)$ and predict the exposure $\hat{X}_e = \hat{\alpha}_c + \hat{\alpha}_I X_I + \hat{\alpha}_o^T X_o$. Then, they fit the additive hazards model using $\hat{X}_e$ and $X_o$ as the covariates. This procedure is often called two-stage predictor substitution (2SPS) in the literature. An alternative two-stage residual inclusion (2SRI) method can be formulated as follows: using the same exposure model to obtain the least-squares estimates $(\hat{\alpha}_c, \hat{\alpha}_I, \hat{\alpha}_o^T)$, we compute the residuals $\hat{\nu} = X_e - \hat{\alpha}_c - \hat{\alpha}_I X_I - \hat{\alpha}_o^T X_o$. The second-stage model uses $(X_e, X_o^T, \hat{\nu})$ as predictors.
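The two procedures can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the second stage is fitted with the closed-form Lin and Ying (1994) estimator for an additive hazards model with time-fixed covariates and no tied observation times, which the text above does not specify. The function names `lin_ying_fit`, `first_stage` and `two_stage` are hypothetical.

```python
import numpy as np

def lin_ying_fit(y, delta, Z):
    """Closed-form additive hazards fit (assumed Lin-Ying form, no ties)."""
    y, delta, Z = np.asarray(y, float), np.asarray(delta, float), np.asarray(Z, float)
    order = np.argsort(y)
    y, delta, Z = y[order], delta[order], Z[order]
    n = len(y)
    n_risk = np.arange(n, 0, -1)                      # risk-set size at each sorted time
    s1 = np.cumsum(Z[::-1], axis=0)[::-1]             # sum of Z over each risk set
    outer = Z[:, :, None] * Z[:, None, :]
    s2 = np.cumsum(outer[::-1], axis=0)[::-1]         # sum of Z Z^T over each risk set
    zbar = s1 / n_risk[:, None]                       # risk-set mean of Z
    centered = s2 - n_risk[:, None, None] * zbar[:, :, None] * zbar[:, None, :]
    dt = np.diff(np.concatenate(([0.0], y)))          # lengths between sorted times
    A = np.einsum("k,kij->ij", dt, centered)          # integrated centered second moment
    B = (delta[:, None] * (Z - zbar)).sum(axis=0)     # centered covariates at event times
    return np.linalg.solve(A, B)

def first_stage(x_e, x_i, x_o):
    """Least-squares fit of the exposure model; returns fitted values and residuals."""
    design = np.column_stack([np.ones(len(x_e)), x_i, x_o])
    coef, *_ = np.linalg.lstsq(design, x_e, rcond=None)
    fitted = design @ coef
    return fitted, x_e - fitted

def two_stage(y, delta, x_e, x_i, x_o, method="2SRI"):
    """Return the estimated exposure coefficient under 2SPS or 2SRI."""
    fitted, resid = first_stage(x_e, x_i, x_o)
    if method == "2SPS":
        Z = np.column_stack([fitted, x_o])            # substitute the predicted exposure
    elif method == "2SRI":
        Z = np.column_stack([x_e, x_o, resid])        # keep exposure, add the residual
    else:
        raise ValueError("method must be '2SPS' or '2SRI'")
    return lin_ying_fit(y, delta, Z)[0]
```

Calling `two_stage(y, delta, x_e, x_i, x_o, method="2SPS")` and then again with `method="2SRI"` on the same data gives the two exposure estimates being compared; only the second-stage covariates differ.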

Intuitive rationales of the instrumental variable methods are given as follows: Note that the major complication of unmeasured confounding is that Xe and η are dependent but η is unobserved. In order to correct for the bias, one needs to redefine the predictors in such a way that they are independent of the unobserved effective error term. Two different decompositions are possible and form the bases for 2SPS and 2SRI. For 2SPS, the endogenous exposure is decomposed into the predicted exposure which is a function of (XI, Xo) and a remainder ν that is independent of (XI, Xo). The predicted exposure in the second stage is then independent of the effective error term, which is ν + η. For 2SRI, the error term η is decomposed into a part that depends on (Xe, XI, Xo) and a remainder term that is independent of (Xe, XI, Xo). By including the first term in the second stage as a predictor, the effective predictors are independent of the effective error term.

When the outcome $T$ is uncensored and follows a linear model, the coefficients for $(X_e, X_o^T)$ in 2SRI are identical to the coefficients of $(\hat{X}_e, X_o^T)$ in 2SPS. However, for censored outcomes they differ in general. In fact, they are valid under different censoring assumptions. For 2SPS, since the first-stage linear predictor $\alpha_c + \alpha_I X_I + \alpha_o^T X_o$ depends solely on $(X_I, X_o)$, risk-set arguments are valid under Assumption (C6) given below:

$C$ is conditionally independent of $T$ given $(X_I, X_o)$. (C6)

For 2SRI, the predictor of the second-stage model depends on (Xe, XI, Xo) and risk-set arguments are valid under the relaxed assumption

$C$ is conditionally independent of $T$ given $(X_e, X_I, X_o)$. (C6′a)
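The equivalence of the two methods for an uncensored linear outcome, noted before the censoring assumptions, can be checked numerically. The sketch below is only an illustration with ordinary least squares in both stages; the helper names `ols` and `compare_linear_2sps_2sri` are hypothetical.

```python
import numpy as np

def ols(design, outcome):
    """Least-squares coefficients for outcome regressed on design."""
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef

def compare_linear_2sps_2sri(t, x_e, x_i, x_o):
    """Return the (exposure, exogenous) coefficients from 2SPS and from 2SRI."""
    ones = np.ones(len(t))
    first = np.column_stack([ones, x_i, x_o])
    fitted = first @ ols(first, x_e)              # predicted exposure
    resid = x_e - fitted                          # first-stage residual
    sps = ols(np.column_stack([ones, fitted, x_o]), t)
    sri = ols(np.column_stack([ones, x_e, x_o, resid]), t)
    # Drop the intercept from both fits and the residual coefficient from 2SRI.
    return sps[1:], sri[1:-1]
```

The agreement is algebraic rather than approximate: the first-stage residual is orthogonal to the intercept, the predicted exposure and the exogenous variables, so adding it to the regression leaves their coefficients unchanged. With censoring and risk sets this orthogonality argument no longer applies.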

To see that (C6) is necessary for 2SPS, we consider the compensated process

$M^*(t) = I(Y \le t, \Delta = 1) - \int_0^t I(Y \ge s)\,[h_0(s) + \beta_e X_e + \beta_o^T X_o]\,ds$

where h0(t) is the marginal hazard function of a random variable T* for which the conditional hazard function of T* given η is h0(t) + η and T* is conditionally independent of (XI, Xo, Xu) given η. It follows that E(M*(t)|XI, Xo) = 0, but E(M*(t)|Xe, XI, Xo) ≠ 0 in general because η and Xe are dependent. Therefore, (C6) is needed for the consistency of 2SPS.

While the assumption (C6′a) is weaker than (C6) in general, the validity of 2SRI requires an additional assumption on the relationship between η and ν:

$\eta = \gamma_\nu \nu + \varepsilon'$ (C6′b)

where ε′ is independent of (XI, Xo, Xu). Note that Assumption (C6′b) is substantially more relaxed than a similar assumption in a prominent paper by Terza, Basu and Rathouz (2008), in which their equation (3) requires ε′ ≡ 0. When ε′ ≡ 0, the unobserved confounder can be fully identified by the observable exposures and instruments, which is unrealistic in practice. Assumption (C6′b) is substantially more general but is needed in 2SRI because one needs to decompose η into a part that is dependent on Xe and a part that is independent of Xe. It is not needed in 2SPS because X̂e depends solely on (XI, Xo), which are independent of η. However, including Xe as a predictor in 2SRI allows C to be dependent on Xe in (C6′a). The assumptions (C6′a) and (C6′b) imply unbiasedness of the following compensated process:

$M^*(t) = I(Y \le t, \Delta = 1) - \int_0^t I(Y \ge s)\,[h_0(s) + \beta_e X_e + \beta_o^T X_o + \gamma_\nu \nu]\,ds$

where h0(t) is the marginal hazard function of a random variable T for which the conditional hazard function of T given ε′ is h0(t) + ε′, and T is conditionally independent of (Xe, XI, Xo, Xu) given ε′. It follows that E(M*(t)|Xe, XI, Xo) = 0 because ε′ and Xe are independent.
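As a brief worked step, using only the symbols defined in Section 2, substituting (C6′b) into the additive hazards model shows why the residual enters the second stage as an extra covariate. The shorthand $h(t \mid \cdot)$ for the conditional hazard is introduced here for illustration only.

```latex
\begin{aligned}
h(t \mid X_e, X_I, X_o, X_u)
  &= h_0(t) + \beta_e X_e + \beta_o^T X_o + \eta \\
  &= h_0(t) + \beta_e X_e + \beta_o^T X_o + \gamma_\nu \nu + \varepsilon', \\
\nu &= X_e - \alpha_c - \alpha_I X_I - \alpha_o^T X_o .
\end{aligned}
```

Because $\nu$ is a function of $(X_e, X_I, X_o)$, it can be replaced in practice by the first-stage residual $\hat{\nu}$, which leaves $\varepsilon'$ as the only unobserved term in the hazard.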

3. Numerical studies

We follow simulation settings I–III in Li, Fine and Brookhart (2015) to simulate data without exposure-dependent censoring. In Scenario I, survival times are generated from an exponential distribution and βe = 1. In Scenario II, survival times are generated from a Weibull distribution and βe = 0.15. In Scenario III, survival times are generated from a gamma frailty model with a linear conditional hazard function and βe = 0.2. For each of the settings, we consider three scenarios corresponding to (a) independence between Xe and C, (b) weak dependence between Xe and C and (c) moderate dependence between Xe and C. Dependence in cases (b) and (c) follows a Clayton copula with θ = 1 and θ = 2, respectively, and the corresponding Kendall's rank correlations are approximately 0.33 and 0.5. Table 1 shows the simulation results. When censoring is not exposure-dependent, the performance of the two methods is almost identical. When censoring is exposure-dependent, the 2SPS of Li, Fine and Brookhart (2015) can have substantial biases relative to the true effect, while the two-stage residual inclusion method has negligible relative biases. The empirical standard errors of the two estimators differ negligibly.
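The link between the Clayton parameter and Kendall's rank correlation quoted above is τ = θ/(θ + 2). The sketch below illustrates it by sampling from a Clayton copula with conditional inversion. It is not the simulation code used for Table 1: the function names are hypothetical, and the marginal distributions that would turn the uniform pairs into an exposure and a censoring time are left out because they are not stated here.

```python
import numpy as np

def clayton_kendall_tau(theta):
    """Kendall's tau implied by a Clayton copula with parameter theta > 0."""
    return theta / (theta + 2.0)

def sample_clayton(n, theta, rng):
    """Draw n pairs (u, v) with uniform margins from a Clayton copula."""
    u = 1.0 - rng.random(n)                       # uniform on (0, 1]
    w = 1.0 - rng.random(n)                       # uniform on (0, 1]
    # Invert the conditional distribution of v given u at probability level w.
    v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v

def empirical_kendall_tau(u, v):
    """Sample Kendall's tau from all pairs (quadratic in n, fine for small n)."""
    du = np.sign(u[:, None] - u[None, :])
    dv = np.sign(v[:, None] - v[None, :])
    n = len(u)
    return (du * dv).sum() / (n * (n - 1))
```

To induce exposure-dependent censoring, one option would be to map `u` to the exposure scale and `v` to a censoring time through inverse distribution functions of one's choosing; the dependence strength is then controlled by `theta` alone.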

Table 1.

Simulation results comparing two-stage predictor substitution (2SPS) of Li, Fine and Brookhart (2015) and the proposed two-stage residual inclusion (2SRI); (a) independence between Xe and C, (b) weak dependence between Xe and C and (c) moderate dependence between Xe and C. Results are based on 5000 simulations. Truth represents true parameter values (× 100). AB represents absolute sampling bias (× 100). PB represents percentage bias (× 100%) and SE represents sampling standard deviation (× 100).

| Scenario | n | Truth | (a) 2SPS: AB PB (SE) | (a) 2SRI: AB PB (SE) | (b) 2SPS: AB PB (SE) | (b) 2SRI: AB PB (SE) | (c) 2SPS: AB PB (SE) | (c) 2SRI: AB PB (SE) |
|---|---|---|---|---|---|---|---|---|
| I | 100 | 100 | −6 −6 (262) | −5 −5 (269) | −32 −32 (273) | 6 6 (281) | −53 −53 (276) | 4 4 (285) |
| I | 200 | 100 | −6 −6 (167) | −4 −4 (170) | −43 −43 (175) | −2 −2 (179) | −59 −59 (178) | 1 1 (182) |
| I | 400 | 100 | −2 −2 (111) | −2 −2 (114) | −38 −38 (118) | 2 2 (120) | −54 −54 (121) | 4 4 (123) |
| I | 800 | 100 | −4 −4 (78) | −4 −4 (79) | −42 −42 (82) | −2 −2 (83) | −60 −60 (84) | −2 −2 (86) |
| I | 1200 | 100 | 2 2 (64) | −2 −2 (65) | −42 −42 (66) | −2 −2 (67) | −60 −60 (68) | −1 −1 (68) |
| II | 100 | 15 | −2 −13 (41) | −2 −11 (42) | −8 −54 (44) | −1 −7 (44) | −12 −79 (45) | −1 −9 (46) |
| II | 200 | 15 | −2 −10 (28) | −1 −9 (29) | −8 −51 (31) | −1 −4 (31) | −11 −75 (31) | −1 −5 (32) |
| II | 400 | 15 | −1 −4 (19) | <1 −3 (19) | −7 −49 (21) | <1 −2 (22) | −11 −74 (22) | −1 −4 (22) |
| II | 800 | 15 | <1 −2 (14) | <1 −1 (14) | −8 −52 (15) | −1 −5 (15) | −11 −76 (16) | −1 −6 (16) |
| II | 1200 | 15 | <1 −2 (11) | <1 −1 (11) | −8 −50 (12) | <1 −3 (12) | −11 −75 (13) | −1 −5 (13) |
| III | 100 | 20 | −2 −11 (40) | −2 −9 (41) | −10 −50 (44) | −1 −8 (45) | −14 −68 (46) | −2 −10 (47) |
| III | 200 | 20 | −2 −8 (27) | −1 −7 (28) | −10 −50 (30) | −1 −8 (31) | −14 −71 (31) | −1 −9 (32) |
| III | 400 | 20 | −1 −5 (19) | −1 −4 (19) | −10 −48 (21) | −1 −6 (21) | −14 −68 (22) | −1 −7 (22) |
| III | 800 | 20 | −1 −7 (13) | −1 −6 (13) | −10 −45 (15) | −1 −4 (15) | −13 −66 (15) | −1 −5 (15) |
| III | 1200 | 20 | −1 −6 (11) | −1 −4 (11) | −9 −47 (12) | −1 −5 (12) | −14 −68 (12) | −1 −6 (12) |

Acknowledgments

The author would like to thank Prof. Jeanine Houwing-Duistermaat, an associate editor and an anonymous referee for their helpful suggestions, which greatly improved the article. The author was partially supported by the National Institutes of Health grant R01 HL 122212.

References

  1. Li J, Fine J, Brookhart A. Instrumental variable additive hazards model. Biometrics. 2015;71(1):122–130. doi: 10.1111/biom.12244.
  2. Terza JV, Basu A, Rathouz PJ. Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling. Journal of Health Economics. 2008;27(3):531–543. doi: 10.1016/j.jhealeco.2007.09.009.
