Instrumental Variable Additive Hazards Models with Exposure-dependent Censoring

Kwun Chuen Gary Chan

doi:10.1111/biom.12471

Author manuscript; available in PMC: 2017 Sep 1.

Published in final edited form as: Biometrics. 2016 Jan 11;72(3):1003–1005. doi: 10.1111/biom.12471

PMCID: PMC4940314 NIHMSID: NIHMS751408 PMID: 26754156

Summary

Li, Fine and Brookhart (2015) presented an extension of the two-stage least squares (2SLS) method for additive hazards models which requires an assumption that the censoring distribution is unrelated to the endogenous exposure variable. We present another extension of 2SLS that can address this limitation.

Keywords: Two stage residual inclusion, unmeasured confounding

1. Introduction

Instrumental variable regression methods have been widely used for addressing the problem of unmeasured confounding when a valid instrument is available for analysis. Two-stage least squares (2SLS) has been used when the outcome of interest is uncensored. Li, Fine and Brookhart (2015) recently proposed a simple but powerful extension of 2SLS to additive hazards models for censored survival outcomes. As acknowledged in that paper, a limitation of their method is that the censoring distribution is assumed to be unrelated to the endogenous exposure variable given exogenous variables. The purpose of this paper is to discuss a different extension of 2SLS that can address this limitation.

2. Two-stage predictor substitution and residual inclusion

We employ the same notation as Li, Fine and Brookhart (2015). Let T be the failure time of interest and C be the censoring time. The observed survival data are Y = min(T, C) and Δ = I(T ≤ C). Let X_e be an endogenous exposure variable of interest, X_o be observed exogenous variables, X_u be unobserved confounders and X_I be an instrumental variable. The conditional hazard function of the survival outcome follows an additive hazards model $h_{0} (t) + β_{e} X_{e} + β_{o}^{T} X_{o} + η$ and the endogenous exposure follows the model $X_{e} = α_{c} + α_{I} X_{I} + α_{o}^{T} X_{o} + ν$, where η and ν depend on X_u but not on (X_I, X_o).

Li, Fine and Brookhart (2015) first obtain the least-squares estimates (α̂_c, α̂_I, ${\hat{α}}_{o}^{T}$) and predict the exposure ${\hat{X}}_{e} = {\hat{α}}_{c} + {\hat{α}}_{I} X_{I} + {\hat{α}}_{o}^{T} X_{o}$. Then, they fit the additive hazards model using X̂_e and X_o as the covariates. This procedure is often called two-stage predictor substitution (2SPS) in the literature. An alternative two-stage residual inclusion (2SRI) method can be formulated as follows: using the same exposure model to obtain the least-squares estimates (α̂_c, α̂_I, ${\hat{α}}_{o}^{T}$), we compute the residuals $\hat{ν} = X_{e} - {\hat{α}}_{c} - {\hat{α}}_{I} X_{I} - {\hat{α}}_{o}^{T} X_{o}$. The second-stage model then uses (X_e, $X_{o}^{T}$, ν̂) as predictors.
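The first stage shared by the two methods can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the function name `first_stage`, the synthetic data and all numerical coefficients are assumptions made only for this example. It shows how the fitted exposure X̂_e (used by 2SPS) and the residual ν̂ (used by 2SRI) come out of the same least-squares fit.

```python
import numpy as np

def first_stage(x_e, x_i, x_o):
    """Least-squares fit of X_e on (1, X_I, X_o).

    Returns the coefficient estimates, the predicted exposure and the residual.
    """
    n = len(x_e)
    design = np.column_stack([np.ones(n), x_i, x_o])
    alpha_hat, *_ = np.linalg.lstsq(design, x_e, rcond=None)
    x_e_hat = design @ alpha_hat   # predicted exposure, used by 2SPS
    nu_hat = x_e - x_e_hat         # first-stage residual, used by 2SRI
    return alpha_hat, x_e_hat, nu_hat

# Hypothetical synthetic data: the coefficients below are illustrative only.
rng = np.random.default_rng(0)
n = 500
x_i = rng.normal(size=n)                 # instrument
x_o = rng.normal(size=n)                 # observed exogenous covariate
x_u = rng.normal(size=n)                 # unobserved confounder
x_e = 0.5 + 1.0 * x_i + 0.3 * x_o + x_u  # exposure model with nu = x_u

alpha_hat, x_e_hat, nu_hat = first_stage(x_e, x_i, x_o)

# Second-stage covariates for the two methods.
covariates_2sps = np.column_stack([x_e_hat, x_o])
covariates_2sri = np.column_stack([x_e, x_o, nu_hat])
```

The two covariate matrices differ only in how the exposure enters: 2SPS replaces X_e by its prediction, whereas 2SRI keeps X_e and adds ν̂ as an extra column.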

The intuition behind the two instrumental variable methods is as follows. The major complication of unmeasured confounding is that X_e and η are dependent but η is unobserved. To correct for the resulting bias, one needs to redefine the predictors so that they are independent of the unobserved effective error term. Two different decompositions are possible, and they form the bases for 2SPS and 2SRI. For 2SPS, the endogenous exposure is decomposed into the predicted exposure, which is a function of (X_I, X_o), and a remainder ν that is independent of (X_I, X_o). The predicted exposure in the second stage is then independent of the effective error term, which is β_e ν + η. For 2SRI, the error term η is decomposed into a part that depends on (X_e, X_I, X_o) and a remainder term that is independent of (X_e, X_I, X_o). By including the first part in the second stage as a predictor, the effective predictors become independent of the effective error term.

When the outcome T is uncensored and follows a linear model, the coefficients for (X_e, $X_{o}^{T}$ ) in 2SRI are identical to coefficients of (X̂_e, $X_{o}^{T}$ ) in 2SPS. However, for censored outcomes they are different in general. In fact, they are valid under different censoring assumptions. For 2SPS, since ${\tilde{X}}_{e} = α_{c} + α_{I} X_{I} + α_{o}^{T} X_{o}$ depends solely on (X_I, X_o), risk-set arguments are valid under Assumption (C6) given below:

C is conditionally independent of T given (X_I, X_o).    (C6)

For 2SRI, the predictor of the second-stage model depends on (X_e, X_I, X_o) and risk-set arguments are valid under the relaxed assumption

C is conditionally independent of T given (X_e, X_I, X_o).    (C6′a)
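The statement that 2SPS and 2SRI give identical coefficients for an uncensored linear outcome can be checked numerically. The sketch below uses a hypothetical data-generating process (all coefficients, the seed and the helper `ols` are assumptions for illustration) and ordinary least squares in both stages. The coefficients agree because ν̂ is orthogonal to the first-stage design, so adding it as a regressor does not change the coefficients on the intercept, X̂_e and X_o.

```python
import numpy as np

def ols(design, y):
    """Ordinary least-squares coefficients."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical linear, uncensored setting with true beta_e = 2.
rng = np.random.default_rng(1)
n = 1000
x_i = rng.normal(size=n)
x_o = rng.normal(size=n)
x_u = rng.normal(size=n)                                  # unmeasured confounder
x_e = 0.5 + x_i + 0.3 * x_o + x_u + rng.normal(size=n)
t = 1.0 + 2.0 * x_e - 1.0 * x_o + 1.5 * x_u + rng.normal(size=n)

ones = np.ones(n)
stage1 = np.column_stack([ones, x_i, x_o])
x_e_hat = stage1 @ ols(stage1, x_e)
nu_hat = x_e - x_e_hat

beta_2sps = ols(np.column_stack([ones, x_e_hat, x_o]), t)        # (1, X_e_hat, X_o)
beta_2sri = ols(np.column_stack([ones, x_e, x_o, nu_hat]), t)    # (1, X_e, X_o, nu_hat)
beta_naive = ols(np.column_stack([ones, x_e, x_o]), t)           # ignores confounding

print("2SPS :", beta_2sps[1:3])
print("2SRI :", beta_2sri[1:3])
print("naive:", beta_naive[1:3])
```

With censored outcomes and an additive hazards second stage, this algebraic identity no longer holds, which is why the two methods can behave differently under the censoring assumptions above.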

To see that (C6) is necessary for 2SPS, we consider the compensated process

M^{*} (t) = I (Y \leq t, Δ = 1) - \int_{0}^{t} I (Y \geq s) [h_{0}^{*} (s) + β_{e} {\tilde{X}}_{e} + β_{o}^{T} X_{o}] d s

where $h_{0}^{*} (t)$ is the marginal hazard function of a random variable T* for which the conditional hazard function of T* given η is h₀(t) + η and T* is conditionally independent of (X_I, X_o, X_u) given η. It follows that E(M*(t) | X_I, X_o) = 0, but E(M*(t) | X_e, X_I, X_o) ≠ 0 in general because η and X_e are dependent. Therefore, (C6) is needed for the consistency of 2SPS.
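A mean-zero compensated process of this kind is what justifies fitting the second stage by an estimating equation. The text here does not spell out the second-stage estimator, so the sketch below assumes a closed-form estimator of the type introduced by Lin and Ying for additive hazards regression; the function name `additive_hazards_fit` and the simulated data are hypothetical. The covariate matrix `z` would hold (X̂_e, X_o) for 2SPS or (X_e, X_o, ν̂) for 2SRI.

```python
import numpy as np

def additive_hazards_fit(y, delta, z):
    """Closed-form estimate of beta in the hazard h0(t) + beta^T Z.

    Assumes no tied observation times. Solves A beta = b with
      A = sum_i int Y_i(t) (Z_i - Zbar(t))(Z_i - Zbar(t))^T dt
      b = sum_i delta_i (Z_i - Zbar(Y_i)),
    where Zbar(t) is the covariate mean over subjects still at risk at t.
    """
    order = np.argsort(y)
    y, delta, z = y[order], delta[order], z[order]
    n = z.shape[0]
    # After sorting, the risk set on (y[k-1], y[k]] is {k, ..., n-1}.
    at_risk = n - np.arange(n)
    s1 = np.cumsum(z[::-1], axis=0)[::-1]          # sum of Z over the risk set
    outer = z[:, :, None] * z[:, None, :]
    s2 = np.cumsum(outer[::-1], axis=0)[::-1]      # sum of Z Z^T over the risk set
    z_bar = s1 / at_risk[:, None]
    centered = s2 - s1[:, :, None] * s1[:, None, :] / at_risk[:, None, None]
    dt = np.diff(y, prepend=0.0)                   # interval lengths
    a = np.einsum("k,kij->ij", dt, centered)
    b = ((z - z_bar) * delta[:, None]).sum(axis=0)
    return np.linalg.solve(a, b)

# Hypothetical check without confounding: hazard 1 + 1.0 * z, independent censoring.
rng = np.random.default_rng(2)
n = 4000
z = rng.uniform(size=(n, 1))
t = rng.exponential(1.0 / (1.0 + 1.0 * z[:, 0]))   # constant hazard 1 + z
c = rng.exponential(2.0, size=n)
y = np.minimum(t, c)
delta = (t <= c).astype(float)

beta_hat = additive_hazards_fit(y, delta, z)
print(beta_hat)
```

Because the risk-set averages are taken over subjects with Y ≥ t, the validity of this step depends on which covariates censoring is allowed to depend on, which is the point of Assumptions (C6) and (C6′a).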

While the assumption (C6′a) is weaker than (C6) in general, the validity of 2SRI requires an additional assumption on the relationship between η and ν:

η = γ_{ν} ν + ε′    (C6′b)

where ε′ is independent of (X_I, X_o, X_u). Note that Assumption (C6′b) is substantially more relaxed than a similar assumption in a prominent paper by Terza, Basu and Rathouz (2008), in which their equation (3) requires ε′ ≡ 0. When ε′ ≡ 0, the unobserved confounder can be fully identified by the observable exposures and instruments, which is unrealistic in practice. Assumption (C6′b) is substantially more general but is needed in 2SRI because one needs to decompose η into a part that is dependent on X_e and a part that is independent of X_e. It is not needed in 2SPS because X̃_e solely depends on (X_I, X_o), which are independent of η. However, including X_e as a predictor of 2SRI allows C to be dependent on X_e in (C6′a). The assumptions (C6′a) and (C6′b) imply unbiasedness of the following compensated process:

M^{†} (t) = I (Y \leq t, Δ = 1) - \int_{0}^{t} I (Y \geq s) [h_{0}^{†} (s) + β_{e} X_{e} + β_{o}^{T} X_{o} + γ_{ν} ν] d s

where $h_{0}^{†} (t)$ is the marginal hazard function of a random variable T^† for which the conditional hazard function of T^† given ε′ is h₀(t) + ε′, and T^† is conditionally independent of (X_e, X_I, X_o, X_u) given ε′. It follows that E(M^†(t) | X_e, X_I, X_o) = 0 because ε′ and X_e are independent.
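The form of the baseline hazard in this compensated process can be made explicit by a short calculation. The following is a sketch under two stated assumptions that go slightly beyond what is written above: ε′ is independent of (X_e, X_I, X_o), and E[exp(−ε′t)] is finite. Here H₀(t) denotes the cumulative baseline hazard, and ν is a function of (X_e, X_I, X_o) through the exposure model.

```latex
\begin{aligned}
h(t \mid X_e, X_I, X_o, \varepsilon')
  &= h_0(t) + \beta_e X_e + \beta_o^{T} X_o + \gamma_\nu \nu + \varepsilon', \\
S(t \mid X_e, X_I, X_o)
  &= \exp\{-(\beta_e X_e + \beta_o^{T} X_o + \gamma_\nu \nu)\, t\}\,
     E\left[\exp\{-H_0(t) - \varepsilon' t\}\right], \\
h(t \mid X_e, X_I, X_o)
  &= -\frac{d}{dt} \log S(t \mid X_e, X_I, X_o)
   = h_0^{\dagger}(t) + \beta_e X_e + \beta_o^{T} X_o + \gamma_\nu \nu, \\
h_0^{\dagger}(t)
  &= h_0(t) - \frac{d}{dt} \log E\left[\exp(-\varepsilon' t)\right],
  \qquad H_0(t) = \int_0^t h_0(s)\, ds .
\end{aligned}
```

The first line substitutes (C6′b) into the conditional hazard, the second averages the conditional survival function over ε′, and the third shows that the additive structure in (X_e, X_o, ν) is preserved with only the baseline changed.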

3. Numerical studies

We follow simulation settings I–III in Li, Fine and Brookhart (2015) to simulate data without exposure-dependent censoring. In Scenario I, survival times are generated from an exponential distribution and β_e = 1. In Scenario II, survival times are generated from a Weibull distribution and β_e = 0.15. In Scenario III, survival times are generated from a gamma frailty model with a linear conditional hazard function and β_e = 0.2. For each of the settings, we consider three cases corresponding to (a) independence between X_e and C, (b) weak dependence between X_e and C and (c) moderate dependence between X_e and C. The dependence in cases (b) and (c) follows a Clayton copula with θ = 1 and θ = 2, for which the corresponding Kendall's rank correlations are approximately 0.3 and 0.5, respectively. Table 1 shows the simulation results. When censoring is not exposure-dependent, the performance of the two methods is almost identical. When censoring is exposure-dependent, the 2SPS of Li, Fine and Brookhart (2015) can have substantial bias relative to the true effect, while the two-stage residual inclusion method has negligible relative bias. The empirical standard errors of the two estimators differ negligibly.
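The link between the Clayton parameter and Kendall's rank correlation quoted above follows from the standard identity τ = θ / (θ + 2), which gives 1/3 for θ = 1 and 1/2 for θ = 2. The sketch below draws dependent uniform pairs from a Clayton copula by conditional inversion and compares the empirical and theoretical values. It is an illustration only: the function names are hypothetical, and the text does not specify how the uniform pairs are mapped to the marginal distributions of X_e and C, so that step is not shown.

```python
import numpy as np

def clayton_kendall_tau(theta):
    """Kendall's tau implied by a Clayton copula with parameter theta > 0."""
    return theta / (theta + 2.0)

def sample_clayton(n, theta, rng):
    """Draw n pairs (u, v) from a Clayton copula by conditional inversion."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v

def empirical_kendall_tau(a, b):
    """Kendall's tau by direct pairwise comparison (O(n^2), assumes no ties)."""
    da = np.sign(a[:, None] - a[None, :])
    db = np.sign(b[:, None] - b[None, :])
    n = len(a)
    return (da * db).sum() / (n * (n - 1))

rng = np.random.default_rng(3)
for theta in (1.0, 2.0):
    u, v = sample_clayton(2000, theta, rng)
    print(theta, clayton_kendall_tau(theta), empirical_kendall_tau(u, v))
```

The pairwise computation is quadratic in the sample size, which is acceptable for a check of this size but would be replaced by a library routine in a full simulation.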

Table 1.

Simulation results comparing two-stage predictor substitution (2SPS) of Li, Fine and Brookhart (2015) and the proposed two-stage residual inclusion (2SRI); (a) independence between X_e and C, (b) weak dependence between X_e and C and (c) moderate dependence between X_e and C. Results are based on 5000 simulations. Truth represents true parameter values (× 100). AB represents absolute sampling bias (× 100). PB represents percentage bias (× 100%) and SE represents sampling standard deviation (× 100).

Scenario	n	Truth	(a)						(b)						(c)
			2SPS			2SRI			2SPS			2SRI			2SPS			2SRI
			AB	PB	(SE)	AB	PB	(SE)	AB	PB	(SE)	AB	PB	(SE)	AB	PB	(SE)	AB	PB	(SE)
I	100	100	−6	−6	(262)	−5	−5	(269)	−32	−32	(273)	6	6	(281)	−53	−53	(276)	4	4	(285)
	200	100	−6	−6	(167)	−4	−4	(170)	−43	−43	(175)	−2	−2	(179)	−59	−59	(178)	1	1	(182)
	400	100	−2	−2	(111)	−2	−2	(114)	−38	−38	(118)	2	2	(120)	−54	−54	(121)	4	4	(123)
	800	100	−4	−4	(78)	−4	−4	(79)	−42	−42	(82)	−2	−2	(83)	−60	−60	(84)	−2	−2	(86)
	1200	100	2	2	(64)	−2	−2	(65)	−42	−42	(66)	−2	−2	(67)	−60	−60	(68)	−1	−1	(68)
II	100	15	−2	−13	(41)	−2	−11	(42)	−8	−54	(44)	−1	−7	(44)	−12	−79	(45)	−1	−9	(46)
	200	15	−2	−10	(28)	−1	−9	(29)	−8	−51	(31)	−1	−4	(31)	−11	−75	(31)	−1	−5	(32)
	400	15	−1	−4	(19)	<1	−3	(19)	−7	−49	(21)	<1	−2	(22)	−11	−74	(22)	−1	−4	(22)
	800	15	<1	−2	(14)	<1	−1	(14)	−8	−52	(15)	−1	−5	(15)	−11	−76	(16)	−1	−6	(16)
	1200	15	<1	−2	(11)	<1	−1	(11)	−8	−50	(12)	<1	−3	(12)	−11	−75	(13)	−1	−5	(13)
III	100	20	−2	−11	(40)	−2	−9	(41)	−10	−50	(44)	−1	−8	(45)	−14	−68	(46)	−2	−10	(47)
	200	20	−2	−8	(27)	−1	−7	(28)	−10	−50	(30)	−1	−8	(31)	−14	−71	(31)	−1	−9	(32)
	400	20	−1	−5	(19)	−1	−4	(19)	−10	−48	(21)	−1	−6	(21)	−14	−68	(22)	−1	−7	(22)
	800	20	−1	−7	(13)	−1	−6	(13)	−10	−45	(15)	−1	−4	(15)	−13	−66	(15)	−1	−5	(15)
	1200	20	−1	−6	(11)	−1	−4	(11)	−9	−47	(12)	−1	−5	(12)	−14	−68	(12)	−1	−6	(12)

Acknowledgments

The author would like to thank Prof. Jeanine Houwing-Duistermaat, an associate editor and an anonymous referee for their helpful suggestions, which greatly improved the article. The author was partially supported by the National Institutes of Health grant R01 HL 122212.

References

Li J, Fine J, Brookhart A. Instrumental variable additive hazards model. Biometrics. 2015;71(1):122–130. doi: 10.1111/biom.12244.
Terza JV, Basu A, Rathouz PJ. Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling. Journal of Health Economics. 2008;27(3):531–543. doi: 10.1016/j.jhealeco.2007.09.009.
