Complier stochastic direct effects: identification and robust estimation

Kara E Rudolph; Oleg Sofrygin; Mark J van der Laan

doi:10.1080/01621459.2019.1704292

Author manuscript; available in PMC: 2022 Jan 1.

Published in final edited form as: J Am Stat Assoc. 2020 Jan 23;116(535):1254–1264. doi: 10.1080/01621459.2019.1704292

PMCID: PMC8439556 NIHMSID: NIHMS1550244 PMID: 34531623

Abstract

Mediation analysis is critical to understanding the mechanisms underlying exposure-outcome relationships. In this paper, we identify the instrumental variable-direct effect of the exposure on the outcome not through the mediator, using randomization of the instrument. We call this estimand the complier stochastic direct effect (CSDE). To our knowledge, such an estimand has not previously been considered or estimated. We propose and evaluate several estimators for the CSDE: a ratio of inverse-probability of treatment-weighted estimators (IPTW), a ratio of estimating equation estimators (EE), a ratio of targeted minimum loss-based estimators (TMLE), and a TMLE that targets the CSDE directly. These estimators are applicable for a variety of study designs, including randomized encouragement trials, like the Moving to Opportunity housing voucher experiment we consider as an illustrative example, treatment discontinuities, and Mendelian randomization. We found the IPTW estimator to be the most sensitive to finite sample bias, resulting in bias of over 40% even when all models were correctly specified in a sample size of N=100. In contrast, the EE estimator and TMLE that targets the CSDE directly were far less sensitive. The EE and TML estimators also have advantages in terms of efficiency and reduced reliance on correct parametric model specification.

Keywords: Mediation, targeted minimum loss-based estimation, instrumental variables

1. Introduction

Mediation analysis is critical to understanding the mechanisms underlying exposure-outcome relationships. It can be used to decompose the total effect into its path-specific effects—usually categorized as direct effects, meaning the effect of the exposure on the outcome not operating through a mediator, and indirect effects, meaning the path from the exposure to the mediator to the outcome (Ogburn, 2012). For example, such decomposition has led to understanding of which locations in the brain are responsible for transmitting pain (Chen et al., 2015) and mechanisms underlying associations between early life body size and breast cancer (Rice et al., 2016). Such scenarios reflect observed data $O = (W, Z, M, Y)$ , where W are covariates, Z is exposure, M is mediator, and Y is outcome.

Less research has been devoted to estimating path-specific effects where there is an instrument for the exposure, reflecting observed data $O = (W, A, Z, M, Y)$, where A is an instrument for the overall effect of Z by only affecting M and Y through Z, satisfying the econometric criteria for an instrument (Joffe et al., 2008). In this paper, we consider such a data structure and are concerned with estimating the path-specific direct effect of Z on Y not through M using instrument A to address observed and unobserved confounding of the exposure-outcome relationship. Such an estimand would be an instrumental variable (IV)-direct effect or complier direct effect. To our knowledge, such an estimand has not previously been considered or estimated, though Frölich and Huber (2017) considered a similar complier direct effect with separate instruments for Z and M, A₁ and A₂, that themselves are conditionally independent, $A_{1} ⫫ A_{2} | W$. These authors demonstrated how one can identify IV mediation estimands for the direct effect of Z on Y not through M and for the indirect effect of Z on Y through M using these two distinct instruments. Relatedly, Joffe et al. (2008) considered observed data $O = (W, A, Z, M, Y)$ but where Z and M are sequential exposures, with A being an instrument for each. In this case, A affects Z and M but not Y. The authors were concerned with estimating the overall effect of Z, because it could no longer be identified using standard IV approaches.

Recent work considering the same observed data structure $O = (W, A, Z, M, Y)$ has identified and estimated stochastic direct and indirect effects of the instrument on the outcome not operating through a mediator in the direct effects case and operating through a mediator in the indirect effects case (Rudolph et al., 2017b), treating Z as a time-varying confounder (Didelez et al., 2006; VanderWeele and Tchetgen Tchetgen, 2017; Zheng and van der Laan, 2017). Other work has considered another instrumental variable observed data structure, $O = (W, Z, M, Y)$ , where Z and W interact together to form an instrument for M relaxing the sequential ignorability assumption (Ten Have et al., 2007; Dunn and Bentall, 2007; Albert, 2008; Small, 2011). However, again, to our knowledge, there has been no research on the identification or estimation of IV mediation estimands in the scenario where we have observed data $O = (W, A, Z, M, Y)$ , where A is an instrument for the total effect of exposure Z which in turn may affect mediator M and outcome Y and where A adheres to the exclusion restriction assumption of instruments, so does not directly affect either M or Y.

We address this research gap by identifying an IV causal quantity of the direct effect of the exposure on the outcome (the effect of Z on Y not through M), using randomization of the instrument, A. We call this estimand the complier stochastic direct effect (CSDE). We propose and evaluate several estimators for this estimand: 1) an inverse-probability-of-treatment weighted estimator (IPTW), 2) an estimating equation estimator (EE), and 3) several targeted minimum loss-based estimators (TMLE). Both the EE and TML estimators are robust to several combinations of model misspecifications, the details of which are described in a later section. In contrast, the IPTW estimator may not be consistent if the instrument or mediation models are incorrectly specified.

The paper is organized as follows. In Section 2, we introduce notation and the structural causal model representing our data structure. In Section 3, we define the causal quantity of interest, the CSDE, and establish its identification from the data distribution under specified assumptions. Section 4 details the IPTW, EE, and TML estimators. Section 5 presents the simulation study that demonstrates each estimator’s consistency, efficiency, and robustness properties in finite samples. In Section 6, we apply these estimators to real-world data, estimating the direct effect of using a Section 8 housing voucher to move out of public housing on subsequent adolescent substance use outcomes, not mediated by parental mental health, employment, and parent-child closeness, using randomization of housing voucher receipt as the instrument. Section 7 concludes.

2. Notation and Structural Causal Model

We observe data $O = (W, A, Z, M, Y)$ for each of n individuals, where we assume O₁,…,O_n are i.i.d. draws from the true, unknown data distribution, P₀, on O. The subscript 0 denotes values under this true, unknown distribution P₀. P is any probability distribution (including P₀) in the statistical model, $M$, which is the set of distributions for which our estimand is identifiable and is discussed further below and in Section 3. Values under a particular P are not given subscripts. The subscript n denotes estimates.

W is a vector of exogenous baseline covariates, $W = f (U_{W})$, where U_W is unobserved exogenous error on W (Pearl, 2009). We consider the statistical model, $M$, where A is a binary instrument (with the attendant assumptions of instrumental variables (Angrist et al., 1996)) of binary exposure Z, with $A = f (W, U_{A})$ and $Z = f (W, A, U_{Z})$, where again, U_A and U_Z are unobserved exogenous errors. M is a binary mediator, with $M = f (W, Z, U_{M})$, and Y is an outcome, $Y = f (W, Z, M, U_{Y})$, where U_M and U_Y represent exogenous errors. Adhering to the constraints of our statistical model in which A is an instrument, Y does not depend on A conditional on Z, and M does not depend on A conditional on Z. This is equivalent to the exclusion restriction assumption of instruments (Angrist et al., 1996). However, the estimand and estimation approaches we consider also work in the scenario where M may depend on A conditional on Z: $M = f (W, A, Z, U_{M})$. We describe differences in the estimator details for such a scenario in the Web appendix. In this alternative scenario, A is not an instrument for the total effect of Z on Y, and the estimation approach suggested by Joffe et al. (2008) would also be appropriate.

The density of the true distribution P₀ of O, $p_{0} (O)$ can be factorized as

p_{0} (O) = p_{0} (Y | W, Z, M) p_{0} (M | W, Z) p_{0} (Z | W, A) p_{0} (A | W) p_{0} (W) .

We note that the identification assumption of monotonicity of A on Z, detailed in the next section, places an additional constraint on the statistical model.

3. Complier Stochastic Direct Effect Estimand and Identification

Our causal quantity of interest is the CSDE, which we define as

ψ_{C S D E} = E (Y_{Z = 1, g_{M | 0, W}^{*}} - Y_{Z = 0, g_{M | 0, W}^{*}} | Z_{1} - Z_{0} = 1),

(1)

where for each $a \in \{0, 1\}$, Z_a indicates the potential exposure that would be observed if instrument A = a were assigned, and compliers are those for whom $Z_{1} - Z_{0} = 1$. $Y_{z, g_{M | 0, W}^{*}}$ indicates the potential outcome that would be observed if exposure Z = z were assigned and under a given stochastic intervention on the mediator $g_{M | 0, W}^{*}$, where the user sets M equal to m with probability $P (M = m | A = 0, W = w)$. We note that this stochastic intervention marginalizes over Z and also note that $g_{M | 0, W}^{*}$ can be set equal to the true distribution, $g_{M | 0, W, 0}$, or a data-dependent version estimated from the observed data, ${\hat{g}}_{M | 0, W}$.

The statistical parameter, $Ψ_{C S D E}$ , is a mapping $Ψ_{C S D E} : M \to R$ that maps a probability distribution P in our statistical model $M$ to a real number R.

$Ψ_{C S D E} (P) = \frac{Ψ_{S D E} (P)}{Ψ_{F S} (P)}$ , where $Ψ_{S D E}$ is the statistical parameter for the stochastic direct effect (SDE) of A on Y given by

Ψ_{S D E} \equiv E (E (E_{g_{M | 0, W}^{*}} {E (Y | W, Z, M) | W, Z} | W, A = 1)) - E (E (E_{g_{M | 0, W}^{*}} {E (Y | W, Z, M) | W, Z} | W, A = 0)),

(2)

and where $Ψ_{F S}$ is the statistical parameter for the first-stage (FS) effect of A on Z given by

Ψ_{F S} \equiv E (E (Z | W, A = 1)) - E (E (Z | W, A = 0)) .

(3)

The causal quantity $ψ_{C S D E}$ is identified by the statistical parameter $Ψ_{C S D E}$ ,

ψ_{C S D E} = E (Y_{Z = 1, g_{M | 0, W}^{*}} - Y_{Z = 0, g_{M | 0, W}^{*}} | Z_{1} - Z_{0} = 1) \equiv Ψ_{C S D E} = Ψ_{S D E} / Ψ_{F S},

(4)

under the assumptions enumerated below. The proof for the identification result is in the Web appendix.

The assumptions needed for identifiability are:

1. $Y_{a, Z_{a}, g_{M | 0, W}^{*}} = Y_{Z_{a}, g_{M | 0, W}^{*}}$, which is the exclusion restriction assumption, stating that the instrument A only affects the outcome Y through the exposure Z. Here Z_a is the potential exposure that would be observed if instrument A = a were assigned; $Y_{a, Z_{a}, g_{M | 0, W}^{*}}$ is the potential outcome that would be observed if A = a and Z = Z_a were assigned and under stochastic intervention $g_{M | 0, W}^{*}$, where M is set to m with probability $P (M = m | A = 0, W = w)$; and $Y_{Z_{a}, g_{M | 0, W}^{*}}$ is the potential outcome that would be observed if Z = Z_a were assigned and under stochastic intervention $g_{M | 0, W}^{*}$.
2. Sequential randomization: $A ⊥ Z_{a} | W$, $A ⊥ Y_{Z_{a}, g_{M | 0, W}^{*}} | W$, and $M ⊥ Y_{Z_{a}, g_{M | 0, W}^{*}} | W$.
3. Z₁ − Z₀ ≥ 0, which is the monotonicity assumption, meaning that the instrument A cannot decrease exposure.
4. Positivity assumptions: $P (A = a | W) > 0$ for all $a \in \{0, 1\}$, and $\frac{g_{M | 0, W}^{*} (m | W)}{P (M = m | Z, W)} < \infty$ a.e., which can also be written $P (M = m | Z, W) > 0$ for all m in the support of $g_{M | 0, W}^{*} (m | W)$, i.e., all m such that $g_{M | 0, W}^{*} (m | W) > 0$.
5. $E (Z_{1} - Z_{0} | W) \neq 0$, which means that the average effect of the instrument on the exposure does not equal 0.
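
To make the statistical parameter concrete, the nested expectations in Equations 2 and 3 can be evaluated exactly for a small discrete distribution. The following is a minimal sketch under stated assumptions: the binary covariate and the conditional probabilities `p_z`, `p_m`, and `q_y` are hypothetical toy choices, not taken from the paper or its Web appendix. It illustrates how $g_{M | 0, W}^{*}$ marginalizes over Z and how the ratio $Ψ_{S D E} / Ψ_{F S}$ is formed.

```python
from math import exp

def expit(x):
    return 1.0 / (1.0 + exp(-x))

# Hypothetical toy distribution (NOT from the paper): binary W, A, Z, M.
P_W = {0: 0.5, 1: 0.5}

def p_z(z, a, w):
    """P(Z = z | A = a, W = w); increasing in a, so monotonicity is plausible."""
    p1 = expit(-0.5 + 1.5 * a - 0.5 * w)
    return p1 if z == 1 else 1.0 - p1

def p_m(m, z, w):
    """P(M = m | Z = z, W = w); M depends on A only through Z."""
    p1 = expit(-1.0 + 2.0 * z - 0.3 * w)
    return p1 if m == 1 else 1.0 - p1

def q_y(w, z, m):
    """E(Y | W = w, Z = z, M = m)."""
    return expit(0.2 + z + m - 0.2 * w)

def g_star(m, w):
    """Stochastic intervention: sum_z P(M = m | z, w) P(Z = z | A = 0, w)."""
    return sum(p_m(m, z, w) * p_z(z, 0, w) for z in (0, 1))

def q_m(z, w):
    """Integrate M out of E(Y | W, Z, M) using g_star."""
    return sum(q_y(w, z, m) * g_star(m, w) for m in (0, 1))

def q_z(a, w):
    """Integrate Z out of q_m given W and A = a."""
    return sum(q_m(z, w) * p_z(z, a, w) for z in (0, 1))

# Equation 2 (numerator), Equation 3 (denominator), and their ratio.
psi_sde = sum(pw * (q_z(1, w) - q_z(0, w)) for w, pw in P_W.items())
psi_fs = sum(pw * (p_z(1, 1, w) - p_z(1, 0, w)) for w, pw in P_W.items())
psi_csde = psi_sde / psi_fs
print(psi_sde, psi_fs, psi_csde)
```

In this toy setting, because Z is binary and M depends on A only through Z, the numerator contribution at each w factors into the covariate-specific direct effect `q_m(1, w) - q_m(0, w)` times the covariate-specific first stage, so the ratio behaves like a first-stage-weighted average of covariate-specific direct effects.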

4. Estimators

We now describe several estimators of a data-dependent version of the CSDE parameter that assumes a known stochastic intervention on M estimated from the observed data, which we denote ${\hat{g}}_{M | 0, W}$. So, here $g_{M | 0, W}^{*} = {\hat{g}}_{M | 0, W}$. In the first subsection, we describe several estimators that estimate $Ψ_{C S D E}$ by estimating the numerator and denominator separately: a ratio of IPTW estimators, a ratio of EE estimators, and two ratios of TMLEs. In the second subsection, we describe a TMLE that targets the CSDE ratio itself, thus making a compatible plug-in estimator.

4.1. Estimators that estimate the numerator and denominator separately

4.1.1. Inverse Probability of Treatment Weighted Estimator

We first describe how to compute $Ψ_{C S D E}$ by using an IPTW estimator of the numerator, $Ψ_{S D E}$ , and denominator, $Ψ_{F S}$ , separately. The R code to program this estimator is included in the supplementary Web appendix.

The inverse probability of treatment weights for estimating $Ψ_{S D E}$ are

I P T W_{S D E} = \frac{(2 A - 1) {\hat{g}}_{M | 0, W}}{g_{A | W} g_{M | Z, W}} .

(5)

Let $g_{A, n}$ and $g_{M, n}$ be estimators of $g_{A | W} = P (A = a | W)$ and $g_{M | Z, W} = P (M = m | Z, W)$, respectively. $g_{A, n}$ can be estimated by predicted probabilities from a logistic regression model of A on W. One could use machine learning in model fitting but we will describe estimation in terms of parametric model fitting for simplicity. $g_{M, n}$ can be estimated by predicted probabilities from a logistic regression model of M on W, Z. ${\hat{g}}_{M | 0, W}$ is treated as known, estimated from the observed data, marginalizing out Z: $\sum_{z = 0}^{1} P (M = m | Z = z, W) P (Z = z | A = 0, W)$ (VanderWeele and Tchetgen Tchetgen, 2017). The IPTW estimate of $Ψ_{S D E}$ is the empirical mean of outcome, Y, weighted by an estimate of IPTW_SDE.

The inverse probability of treatment weights for estimating $Ψ_{F S}$ are $I P T W_{F S} = \frac{2 A - 1}{g_{A | W}}$, where $g_{A, n}$ is estimated as above. The IPTW estimate of $Ψ_{F S}$ is the empirical mean of Z, weighted by an estimate of $I P T W_{F S}$.

The ratio of these two IPTW estimates gives the IPTW estimate of parameter $Ψ_{C S D E}$ . The associated variance can be estimated as the sample variance of the estimator’s influence curve (IC), which is

D_{I P T W} (P) = \frac{D_{I P T W_{S D E}} (P)}{Ψ_{F S} (P)} - \frac{Ψ_{S D E} (P) D_{I P T W_{F S}} (P)}{Ψ_{F S}^{2} (P)},

(6)

and where

D_{I P T W_{S D E}} (P) = \frac{(2 A - 1) {\hat{g}}_{M | 0, W}}{g_{A | W} g_{M | Z, W}} Y - Ψ_{S D E}

(7)

and where

D_{I P T W_{F S}} (P) = \frac{2 A - 1}{g_{A | W}} Z - Ψ_{F S} .

(8)

We note that the above is the influence curve using true $g_{A | W}$ and true $g_{M | Z, W}$. If we use parametric models and maximum likelihood estimates of $g_{A | W}$ and $g_{M | Z, W}$, then the sample variance of the above influence curve will be conservative.
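
The IPTW ratio and its influence-curve-based standard error (Equations 5 to 8) can be sketched as follows. This is an illustrative Python sketch, not the R code from the Web appendix: the function name `iptw_csde` and its argument layout are assumptions, and the nuisance estimates are taken as already evaluated at each observation's own values of (A, Z, M, W).

```python
from math import sqrt

def iptw_csde(A, Z, Y, g_a, g_m, g_hat):
    """Ratio-of-IPTW estimate of the CSDE with an IC-based standard error.

    g_a[i]   : estimate of P(A = A_i | W_i)
    g_m[i]   : estimate of P(M = M_i | Z_i, W_i)
    g_hat[i] : stochastic intervention evaluated at (M_i | W_i)
    """
    n = len(Y)
    # Weights for the numerator (Equation 5) and the first stage.
    w_sde = [(2 * a - 1) * gh / (ga * gm)
             for a, ga, gm, gh in zip(A, g_a, g_m, g_hat)]
    w_fs = [(2 * a - 1) / ga for a, ga in zip(A, g_a)]
    psi_sde = sum(w * y for w, y in zip(w_sde, Y)) / n
    psi_fs = sum(w * z for w, z in zip(w_fs, Z)) / n
    psi = psi_sde / psi_fs
    # Influence curve of the ratio (Equations 6 to 8), one value per observation.
    ic = [(ws * y - psi_sde) / psi_fs
          - psi_sde * (wf * z - psi_fs) / psi_fs ** 2
          for ws, wf, y, z in zip(w_sde, w_fs, Y, Z)]
    mean_ic = sum(ic) / n
    var_ic = sum((d - mean_ic) ** 2 for d in ic) / (n - 1)
    return psi_sde, psi_fs, psi, sqrt(var_ic / n)

# Tiny made-up data set, purely to show the call signature.
A = [1, 1, 0, 0]
Z = [1, 1, 0, 1]
Y = [1, 0, 0, 1]
half = [0.5] * 4
psi_sde, psi_fs, psi, se = iptw_csde(A, Z, Y, half, half, half)
print(psi_sde, psi_fs, psi, se)
```

Because both the numerator and the denominator are unbounded weighted means, nothing in this construction keeps the ratio inside the parameter space, which is one reason to expect instability when the first-stage estimate is close to zero.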

4.1.2. Estimating Equation Estimator

We now describe how to estimate $Ψ_{C S D E}$ by using an EE estimator of the numerator, $Ψ_{S D E}$, and denominator, $Ψ_{F S}$, separately. The efficient influence curve we detail for $Ψ_{S D E}$ is novel in that it respects the constraints on our statistical model, namely the exclusion restriction and monotonicity assumptions necessary for identification. This EE estimator uses the same estimator of the conditional distribution of Z given A and W for both the numerator and denominator. The R code to program this estimator is included in the supplementary Web appendix.

The efficient influence curve (EIC) for $Ψ_{C S D E}$ , is given by

D_{C S D E} (P) = \frac{D_{S D E} (P)}{Ψ_{F S} (P)} - \frac{Ψ_{S D E} (P) D_{F S} (P)}{Ψ_{F S}^{2} (P)},

(9)

where P represents $(Q_{W}, g_{A}, g_{Z}, \bar{Q}),$ and where

D_{S D E} (P) = (\frac{g_{1 | W, Z}}{g_{1 | W}} - \frac{g_{0 | W, Z}}{g_{0 | W}}) \frac{{\hat{g}}_{M | A = 0, W}}{g_{M | Z, W}} (Y - {\bar{Q}}_{Y} (M, Z, W)) + \frac{2 A - 1}{g_{A | W}} ({\bar{Q}}_{M} (1, W) - {\bar{Q}}_{M} (0, W)) (Z - g_{Z} (1 | A, W)) + ({\bar{Q}}_{Z} (1, W) - {\bar{Q}}_{Z} (0, W)) - Ψ_{S D E}

(10)

and where

D_{F S} (P) = \frac{2 A - 1}{g_{A | W}} (Z - g_{Z} (1 | A, W)) + (g_{Z} (1, W) - g_{Z} (0, W)) - Ψ_{F S} .

(11)

The EE estimator can be calculated in the following steps:

1. We first solve D_SDE to obtain the EE estimate of $Ψ_{S D E}$. We calculate the first component of D_SDE as follows, noting that this first component is specifically formulated to respect the exclusion restriction. Let ${\bar{Q}}_{Y} = E (Y | W, Z, M)$ and let $g_{A 2} = P (A = a | W, Z)$. $g_{M} = P (M = m | Z, W)$ and $g_{A} = P (A = a | W)$ are defined, and their estimation is described, in Section 4.1.1. Recall that ${\hat{g}}_{M | 0, W}$ is treated as known, estimated from the observed data as described in Section 4.1.1. ${\bar{Q}}_{Y, n}$ can be estimated by predicted values of Y from a regression of Y on W, Z, and M. One could use machine learning in model fitting but we will describe estimation in terms of parametric model fitting for simplicity. $g_{A 2}$ can be written $\frac{P (A = a | W) P (Z = z | a, W)}{P (Z = z | W)} = \frac{g_{A | W} g_{Z | A, W}}{P (Z = z | W)}$, where $g_{A, n}$ can be estimated as described above, $g_{Z, n}$ can be estimated from a constrained logistic regression model of Z on A and W to respect the monotonicity assumption, $Z_{1} - Z_{0} \geq 0$, and where an estimate of $P (Z = z | W)$ is obtained by marginalizing out A: $\sum_{a = 0}^{1} P (Z = z | A = a, W) P (A = a | W)$.
2. We now calculate the second component of D_SDE. To estimate ${\bar{Q}}_{M} = E (E (Y | W, Z, M) | W, Z)$, we integrate out M using the data-dependent stochastic intervention on M, ${\hat{g}}_{M | 0, W} (m | W)$: ${\bar{Q}}_{M, n} = \sum_{m = 0}^{1} {\bar{Q}}_{Y, n} (m, Z, W) {\hat{g}}_{M | 0, W} (m | W)$.
3. Finally, we calculate the third component of D_SDE. To estimate ${\bar{Q}}_{Z} = E (E (E (Y | W, Z, M) | W, Z) | W, A)$, we integrate out Z from ${\bar{Q}}_{M, n}$: ${\bar{Q}}_{Z, n} = \sum_{z = 0}^{1} {\bar{Q}}_{M, n} (z, W) g_{Z, n} (z | A, W)$.
4. The estimate of $Ψ_{S D E}$ is given by solving D_SDE, that is, by setting its empirical mean to zero.
5. The estimate of $Ψ_{F S}$ is given by solving D_FS, where each component is calculated as described above.
6. The ratio of these two estimates gives the EE estimate of $Ψ_{C S D E}$.
7. The associated variance can be estimated as the sample variance of the EIC, $D_{C S D E} (P)$, which is given in Equation 9.
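
The steps above can be sketched in Python once the nuisance estimates are available for each observation. This is a hedged illustration rather than the authors' R implementation: the function names (`clever_covariate_y`, `ee_csde`) and the dictionary keys are hypothetical, and the nuisance values are assumed to have been fitted elsewhere (for example with the constrained regression for Z described in Step 1).

```python
from math import sqrt

def clever_covariate_y(z, ga1, gz1, gz0, g_hat, g_m):
    """First-component multiplier of D_SDE for one observation.

    ga1 = P(A=1|W); gz1 = P(Z=1|A=1,W); gz0 = P(Z=1|A=0,W);
    g_hat and g_m are evaluated at the observed M.
    """
    pz_a1 = gz1 if z == 1 else 1.0 - gz1          # P(Z = z | A = 1, W)
    pz_a0 = gz0 if z == 1 else 1.0 - gz0          # P(Z = z | A = 0, W)
    pz = pz_a1 * ga1 + pz_a0 * (1.0 - ga1)        # marginalize out A
    g1_wz = ga1 * pz_a1 / pz                      # P(A = 1 | W, Z) by Bayes' rule
    g0_wz = (1.0 - ga1) * pz_a0 / pz              # P(A = 0 | W, Z)
    return (g1_wz / ga1 - g0_wz / (1.0 - ga1)) * g_hat / g_m

def ee_csde(obs):
    """obs: list of dicts with keys A, Z, Y, ga1, gz1, gz0, g_hat, g_m,
    q_y (E[Y|W,Z,M] at observed values), q_m1 and q_m0 (Q_M at z=1, z=0)."""
    n = len(obs)
    sde_terms, fs_terms = [], []
    for o in obs:
        g_a = o["ga1"] if o["A"] == 1 else 1.0 - o["ga1"]
        gz_a = o["gz1"] if o["A"] == 1 else o["gz0"]   # P(Z=1 | observed A, W)
        h = (2 * o["A"] - 1) / g_a
        c_y = clever_covariate_y(o["Z"], o["ga1"], o["gz1"], o["gz0"],
                                 o["g_hat"], o["g_m"])
        q_z1 = o["q_m1"] * o["gz1"] + o["q_m0"] * (1.0 - o["gz1"])
        q_z0 = o["q_m1"] * o["gz0"] + o["q_m0"] * (1.0 - o["gz0"])
        sde_terms.append(c_y * (o["Y"] - o["q_y"])
                         + h * (o["q_m1"] - o["q_m0"]) * (o["Z"] - gz_a)
                         + (q_z1 - q_z0))
        fs_terms.append(h * (o["Z"] - gz_a) + (o["gz1"] - o["gz0"]))
    psi_sde = sum(sde_terms) / n      # solves the empirical mean of D_SDE = 0
    psi_fs = sum(fs_terms) / n        # solves the empirical mean of D_FS = 0
    psi = psi_sde / psi_fs
    ic = [(s - psi_sde) / psi_fs - psi_sde * (f - psi_fs) / psi_fs ** 2
          for s, f in zip(sde_terms, fs_terms)]
    mean_ic = sum(ic) / n
    se = sqrt(sum((d - mean_ic) ** 2 for d in ic) / (n - 1) / n)
    return psi_sde, psi_fs, psi, se

row = dict(A=1, Z=1, Y=1, ga1=0.5, gz1=0.8, gz0=0.5,
           g_hat=0.5, g_m=0.5, q_y=0.5, q_m1=0.7, q_m0=0.4)
result = ee_csde([row, dict(row)])   # two identical made-up rows
print(result)
```

Note that the ratio $g_{a | W, Z} / g_{a | W}$ reduces to $P (Z = z | A = a, W) / P (Z = z | W)$, so the first component depends on A only through the fitted distribution of Z, not through the observed value of A.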

4.1.3. Targeted Minimum Loss-Based Estimator

We now describe two TMLE approaches to estimate $Ψ_{C S D E}$ using separate estimates for the numerator, $Ψ_{S D E}$ , and denominator, $Ψ_{F S}$ .

Inefficient TMLE.

The first approach uses a previously developed TMLE for estimating $Ψ_{S D E}$ (Rudolph et al., 2017b) and uses the TMLE for an average treatment effect (van der Laan and Rubin, 2006) for estimating $Ψ_{F S}$ . However, using the previously developed TMLE for $Ψ_{S D E}$ respects neither the exclusion restriction nor monotonicity constraints on our statistical model, so we refer to this as an inefficient TMLE. This inefficient TMLE estimate of $Ψ_{C S D E}$ is given by the ratio of the estimates of $Ψ_{S D E}$ over $Ψ_{F S}$ . The variance of this estimator can be calculated as the sample variance of the EIC, given in Equation 9.

Efficient TMLE.

The second approach proposes a novel TMLE for $Ψ_{S D E}$ that respects the exclusion restriction and monotonicity statistical constraints. Additionally, as in the EE estimation approach detailed in Section 4.1.2, we use the same estimate of the conditional distribution of Z given A and W for both the numerator and denominator, and employ constrained regression in estimating this conditional distribution to enforce the monotonicity statistical constraint. Thus, we refer to this as an efficient TMLE and describe, step-by-step, how to compute this particular TMLE. The R code to program this efficient TMLE is included in the supplementary Web appendix.

1. Consider submodel $\{{\bar{Q}}_{Y, n} (ϵ) (M, Z, W) : ϵ\}$ defined as: $\text{logit} ({\bar{Q}}_{Y, n} (ϵ) (M, Z, W)) = \text{logit} ({\bar{Q}}_{Y, n} (M, Z, W)) + ϵ C_{Y}$, where $C_{Y} = (\frac{g_{1 | W, Z}}{g_{1 | W}} - \frac{g_{0 | W, Z}}{g_{0 | W}}) \frac{{\hat{g}}_{M | A = 0, W}}{g_{M | Z, W}}$. This C_Y differs from the C_Y in the previously developed inefficient TMLE for $Ψ_{S D E}$ in that the targeting step does not introduce dependence on A (Rudolph et al., 2017b); instead, the exclusion restriction constraint is preserved. The components of this submodel can be estimated as described in Step 1 of Section 4.1.2.
2. Let ϵ_n be the MLE fitted coefficient on C_Y in the logistic regression model of Y on C_Y with $\text{logit}\, {\bar{Q}}_{Y, n}$ as an offset, using the binary log-likelihood loss function. Alternatively, a non-negative portion of $C_{Y}$ (e.g., $\frac{{\hat{g}}_{M | A = 0, W}}{g_{M | Z, W}}$) may be moved into the weights and a weighted logistic regression model may be fitted. Y can be bounded to the [0,1] scale as previously recommended (Gruber and van der Laan, 2010).
3. The updated estimator is given by ${\bar{Q}}_{Y, n}^{*} (M, Z, W) = {\bar{Q}}_{Y, n} (ϵ_{n}) (M, Z, W)$, noting again that conditional independence with A is preserved.
4. We next integrate out M using the data-dependent stochastic intervention on M, ${\hat{g}}_{M | 0, W}$, to estimate ${\bar{Q}}_{M} = E (E (Y | W, Z, M) | W, Z)$: ${\bar{Q}}_{M, n}^{*} = \sum_{m = 0}^{1} {\bar{Q}}_{Y, n}^{*} (m, Z, W) {\hat{g}}_{M | 0, W} (m | W)$.
5. The next step is to target $g_{Z, n}$, given above. We denote this targeted $g_{Z, n}$ that is used in the numerator with $g_{Z, n}^{N *}$ to distinguish it from the targeted version that is used in the denominator. Consider submodel $\{g_{Z, n} (ϵ_{1}, ϵ_{2}) : ϵ_{1}, ϵ_{2}\}$ defined as:
$\text{logit}\, g_{Z, n, ϵ_{1}, ϵ_{2}} (1 | W, A) = \text{logit}\, g_{Z, n} (1 | W, A) + ϵ_{1} I (A = 1) C_{Z} ({\bar{Q}}_{M, n}^{*} (1, W) - {\bar{Q}}_{M, n}^{*} (0, W)) + ϵ_{2} I (A = 0) C_{Z} ({\bar{Q}}_{M, n}^{*} (1, W) - {\bar{Q}}_{M, n}^{*} (0, W)),$
where $C_{Z} = \frac{1}{g_{A | W}}$. Let $\{ϵ_{1, n}, ϵ_{2, n}\}$ be the MLE fitted coefficients on $I (A = 1) ({\bar{Q}}_{M, n}^{*} (1, W) - {\bar{Q}}_{M, n}^{*} (0, W))$ and $I (A = 0) ({\bar{Q}}_{M, n}^{*} (1, W) - {\bar{Q}}_{M, n}^{*} (0, W))$ in the weighted logistic regression model of Z with $\text{logit}\, g_{Z, n}$ as an offset, using the binary log-likelihood loss function, and weights C_Z.
6. The updated estimator is given by $g_{Z, n}^{N *} = g_{Z, n} (ϵ_{1, n}, ϵ_{2, n})$.
7. We can now estimate ${\bar{Q}}_{Z} = E (E (E (Y | W, Z, M) | W, Z) | W, A)$ by integrating out Z from ${\bar{Q}}_{M, n}^{*}$: ${\bar{Q}}_{Z, n} = \sum_{z = 0}^{1} {\bar{Q}}_{M, n}^{*} (z, W) g_{Z, n}^{N *} (z | A, W)$.
8. The estimate of $Ψ_{S D E}$ is given by $Q_{W, n} ({\bar{Q}}_{Z, n} (1, W) - {\bar{Q}}_{Z, n} (0, W))$, where $Q_{W, n}$ is the empirical distribution of W. It is the empirical mean of the difference in ${\bar{Q}}_{Z, n}$, setting a = 1 versus a = 0.
9. We now turn our attention to targeting $g_{Z, n}$ in the denominator. We denote the targeted $g_{Z, n}$ used in the denominator with $g_{Z, n}^{D *}$ to distinguish it from the targeted version used in the numerator, $g_{Z, n}^{N *}$. Consider submodel $\{g_{Z, n} (ϵ_{D 1}, ϵ_{D 2}) : ϵ_{D 1}, ϵ_{D 2}\}$ defined as: $\text{logit}\, g_{Z, n, ϵ_{D 1}, ϵ_{D 2}} (1 | W, A) = \text{logit}\, g_{Z, n} (1 | W, A) + ϵ_{D 1} C_{Z} I (A = 1) + ϵ_{D 2} C_{Z} I (A = 0)$, where $C_{Z} = \frac{1}{g_{A | W}}$. Let $\{ϵ_{D 1, n}, ϵ_{D 2, n}\}$ be the MLE fitted coefficients on $I (A = 1)$ and $I (A = 0)$ in the weighted logistic regression model of Z with $\text{logit}\, g_{Z, n}$ as an offset, using the binary log-likelihood loss function, and weights C_Z.
10. The updated estimator is given by $g_{Z, n}^{D *} = g_{Z, n} (ϵ_{D 1, n}, ϵ_{D 2, n})$.
11. The estimate of $Ψ_{F S}$ is given by $Q_{W, n} (g_{Z, n}^{D *} (1 | 1, W) - g_{Z, n}^{D *} (1 | 0, W))$. It is the empirical mean of the difference in $g_{Z, n}^{D *}$, setting a = 1 versus a = 0.
12. The ratio of these two estimates of $Ψ_{S D E}$ over $Ψ_{F S}$, $\frac{Q_{W, n} ({\bar{Q}}_{Z, n} (1, W) - {\bar{Q}}_{Z, n} (0, W))}{Q_{W, n} (g_{Z, n}^{D *} (1 | 1, W) - g_{Z, n}^{D *} (1 | 0, W))}$, gives the efficient TMLE estimate of $Ψ_{C S D E}$. The TMLE solves the empirical means of the efficient influence curves (EIC) for $Ψ_{S D E}$ and $Ψ_{F S}$ (Equations 10–11 in Section 4.1.2), replacing P with $P_{n}^{*}$, where $P_{n}^{*}$ represents $(Q_{W, n}, g_{A, n}, g_{Z, n}^{N *}, g_{Z, n}^{D *}, {\bar{Q}}_{Y, n}^{*})$.
13. The variance of the TMLE of $Ψ_{C S D E}$ is estimated as the sample variance of $D_{C S D E} (P_{n}^{*})$.
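
The targeting steps above all share one computational building block: fitting the coefficient of a logistic regression that has the initial fit as an offset and a clever covariate as its only predictor. A minimal Python sketch of that building block follows, using Newton-Raphson on the (optionally weighted) binary log-likelihood. The names `fit_epsilon` and `update_q` are hypothetical, the outcome is assumed to already lie in [0, 1], and the sketch covers only the one-dimensional fluctuation of ${\bar{Q}}_{Y, n}$, not the two-parameter fluctuation of $g_{Z, n}$.

```python
from math import exp, log

def expit(x):
    return 1.0 / (1.0 + exp(-x))

def logit(p):
    return log(p / (1.0 - p))

def fit_epsilon(y, q_init, clever, weights=None, tol=1e-10, max_iter=50):
    """MLE of eps in logit(Q(eps)) = logit(Q) + eps * C, with logit(Q) as offset."""
    if weights is None:
        weights = [1.0] * len(y)
    offset = [logit(q) for q in q_init]
    eps = 0.0
    for _ in range(max_iter):
        p = [expit(o + eps * c) for o, c in zip(offset, clever)]
        # Score and Fisher information of the weighted binary log-likelihood.
        score = sum(w * c * (yi - pi)
                    for w, c, yi, pi in zip(weights, clever, y, p))
        info = sum(w * c * c * pi * (1.0 - pi)
                   for w, c, pi in zip(weights, clever, p))
        if info == 0.0:
            break
        step = score / info
        eps += step
        if abs(step) < tol:
            break
    return eps

def update_q(q_init, clever, eps):
    """Targeted predictions Q*(.) = expit(logit(Q) + eps * C)."""
    return [expit(logit(q) + eps * c) for q, c in zip(q_init, clever)]

# Made-up example: a flat initial fit of 0.5 and a constant clever covariate.
y = [1, 0, 1, 1, 0]
q0 = [0.5] * 5
c = [1.0] * 5
eps_n = fit_epsilon(y, q0, c)
q_star = update_q(q0, c, eps_n)
print(eps_n, q_star)
```

At convergence the score is zero, which is the sense in which the updated fit solves the corresponding component of the efficient influence curve equation. Passing a non-negative part of the clever covariate through `weights` corresponds to the weighted-regression alternative mentioned in Step 2.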

4.2. TMLE that estimates the CSDE ratio directly

We now describe a TMLE that targets the CSDE ratio itself. This TMLE is both efficient, because it respects the model constraints, and compatible, because it simultaneously targets the numerator and denominator. We henceforth refer to this as the compatible TMLE and describe, step-by-step, how to compute it. Several of the steps that follow are identical to those for the Efficient TMLE described in the above Section 4.1.3 (e.g., Steps 1–4). The difference lies in the targeting of $g_{Z, n} .$ The R code to program this compatible TMLE is included in the supplementary Web appendix.

1. Again, consider submodel $\{{\bar{Q}}_{Y, n} (ϵ) (M, Z, W) : ϵ\}$ defined as: $\text{logit} ({\bar{Q}}_{Y, n} (ϵ) (M, Z, W)) = \text{logit} ({\bar{Q}}_{Y, n} (M, Z, W)) + ϵ C_{Y}$, where $C_{Y} = (\frac{g_{1 | W, Z}}{g_{1 | W}} - \frac{g_{0 | W, Z}}{g_{0 | W}}) \frac{{\hat{g}}_{M | A = 0, W}}{g_{M | Z, W}}$. Recall that this C_Y preserves the exclusion restriction constraint on our statistical model such that Y is conditionally independent of A given M, Z, W. The components of this submodel can be estimated as described in Step 1 of Section 4.1.2.
2. Let ϵ_n be the MLE fitted coefficient on C_Y in the logistic regression model of Y on C_Y with $\text{logit}\, {\bar{Q}}_{Y, n}$ as an offset, using the binary log-likelihood loss function. Alternatively, a non-negative portion of $C_{Y}$ (e.g., $\frac{{\hat{g}}_{M | A = 0, W}}{g_{M | Z, W}}$) may be moved into the weights and a weighted logistic regression model may be fitted. Y can be bounded to the [0,1] scale as previously recommended (Gruber and van der Laan, 2010).
3. The updated estimator is given by ${\bar{Q}}_{Y, n}^{*} (M, Z, W) = {\bar{Q}}_{Y, n} (ϵ_{n}) (M, Z, W)$, noting again that conditional independence with A is preserved.
4. We next integrate out M using the data-dependent stochastic intervention on M, ${\hat{g}}_{M | 0, W}$, to estimate ${\bar{Q}}_{M} = E (E (Y | W, Z, M) | W, Z)$: ${\bar{Q}}_{M, n}^{*} = \sum_{m = 0}^{1} {\bar{Q}}_{Y, n}^{*} (m, Z, W) {\hat{g}}_{M | 0, W} (m | W)$.
5. The next step is to target $g_{Z, n}$, given above, for both the numerator and denominator and in such a way that preserves monotonicity of A on Z. Consider submodel $\{g_{Z, n} (ϵ_{1}, ϵ_{2}, ϵ_{3}, ϵ_{4}) : ϵ_{1}, ϵ_{2}, ϵ_{3}, ϵ_{4}\}$ defined as: $\text{logit}\, g_{Z, n, ϵ_{1}, ϵ_{2}, ϵ_{3}, ϵ_{4}} (1 | W, A) = \text{logit}\, g_{Z, n} (1 | W, A) + ϵ_{1} I (A = 1) C_{Z} + ϵ_{2} I (A = 0) C_{Z} + ϵ_{3} I (A = 1) C_{Z} ({\bar{Q}}_{M, n}^{*} (1, W) - {\bar{Q}}_{M, n}^{*} (0, W)) + ϵ_{4} I (A = 0) C_{Z} ({\bar{Q}}_{M, n}^{*} (1, W) - {\bar{Q}}_{M, n}^{*} (0, W))$, where $C_{Z} = \frac{1}{g_{A | W}}$. Let $(ϵ_{1, n}, ϵ_{2, n}, ϵ_{3, n}, ϵ_{4, n})$ be the MLE fitted coefficients on $I (A = 1)$, $I (A = 0)$, $I (A = 1) ({\bar{Q}}_{M, n}^{*} (1, W) - {\bar{Q}}_{M, n}^{*} (0, W))$, and $I (A = 0) ({\bar{Q}}_{M, n}^{*} (1, W) - {\bar{Q}}_{M, n}^{*} (0, W))$ in the weighted logistic regression model of Z with $\text{logit}\, g_{Z, n}$ as an offset, using the binary log-likelihood loss function, and weights C_Z.
6. The updated estimator is given by $g_{Z, n}^{*} = g_{Z, n} (ϵ_{1, n}, ϵ_{2, n}, ϵ_{3, n}, ϵ_{4, n})$.
7. We can now estimate ${\bar{Q}}_{Z} = E (E (E (Y | W, Z, M) | W, Z) | W, A)$ by integrating out Z from ${\bar{Q}}_{M, n}^{*}$: ${\bar{Q}}_{Z, n} = \sum_{z = 0}^{1} {\bar{Q}}_{M, n}^{*} (z, W) g_{Z, n}^{*} (z | A, W)$.
8. The estimate of $Ψ_{S D E}$ is given by $Q_{W, n} ({\bar{Q}}_{Z, n} (1, W) - {\bar{Q}}_{Z, n} (0, W))$, where $Q_{W, n}$ is the empirical distribution of W. It is the empirical mean of the difference in ${\bar{Q}}_{Z, n}$, setting a = 1 versus a = 0.
9. The estimate of $Ψ_{F S}$ is given by $Q_{W, n} (g_{Z, n}^{*} (1 | 1, W) - g_{Z, n}^{*} (1 | 0, W))$. It is the empirical mean of the difference in $g_{Z, n}^{*}$, setting a = 1 versus a = 0.
10. The ratio of the estimates of $Ψ_{S D E}$ over $Ψ_{F S}$, $\frac{Q_{W, n} ({\bar{Q}}_{Z, n} (1, W) - {\bar{Q}}_{Z, n} (0, W))}{Q_{W, n} (g_{Z, n}^{*} (1 | 1, W) - g_{Z, n}^{*} (1 | 0, W))}$, gives the TMLE estimate of $Ψ_{C S D E}$. The TMLE solves the empirical mean of the EIC for $Ψ_{C S D E}$ (Equation 9), replacing P with $P_{n}^{*}$, where $P_{n}^{*}$ represents $(Q_{W, n}, g_{A, n}, g_{Z, n}^{*}, {\bar{Q}}_{Y, n}^{*})$.
11. The variance of the TMLE of $Ψ_{C S D E}$ is estimated as the sample variance of $D_{C S D E} (P_{n}^{*})$.

5. Simulation

5.1. Overview

We conduct a simulation study to examine finite sample performance of the IPTW, EE, and TML estimators for $Ψ_{C S D E}$ from the two data-generating mechanisms (DGMs) shown in Table 1. In the Moving to Opportunity data used for the empirical illustration, we have $O = (W, Δ, Δ A, Δ Z, Δ M, Δ Y),$ where Δ is an indicator of selection into the sample (in the Moving to Opportunity data, one child from each family is selected to participate). This results in a factorized likelihood:

p_{0} (O) = p_{0} (Y | W, Z, M, Δ = 1) p_{0} (M | W, Z, Δ = 1) p_{0} (Z | W, A, Δ = 1) \times p_{0} (A | W, Δ = 1) p_{0} (Δ = 1 | W) p_{0} (W) .

(12)

Table 1.

Simulation data-generating mechanisms.

Moderate-Strong Instrument Simulation
$W_{1} \sim Ber (0.5)$	P(W₁ = 1) = 0.50
$W_{2} \sim Ber (0.4 + 0.2 W_{1})$	P(W₂ = 1) = 0.50
$Δ \sim Ber (- 1 + \log (4) W_{1} + \log (4) W_{2})$	P(Δ = 1) = 0.58
$A = Δ A^{*}$, where $A^{*} \sim Ber (0.5)$	P(A = 1) = 0.50
$Z = Δ Z^{*}$, where $Z^{*} \sim Ber (\log (4) A - \log (2) W_{2})$	P(Z = 1) = 0.58
$M = Δ M^{*}$, where $M^{*} \sim Ber (- \log (3) + \log (10) Z - \log (1.4) W_{2})$	P(M = 1) = 0.52
$Y = Δ Y^{*}$, where $Y^{*} \sim Ber (\log (1.2) + \log (3) Z + \log (3) M - \log (1.2) W_{2} + \log (1.2) Z W_{2})$	P(Y = 1) = 0.76
Weak Instrument Simulation
$Z = Δ Z^{*}$, where $Z^{*} \sim Ber (0.005 + 0.1 A + 0.5 W_{2})$	P(Z = 1) = 0.31


Under the assumptions enumerated in Section 3, our causal quantity of interest, the CSDE, is identified by the statistical parameter, $Ψ_{C S D E} = Ψ_{S D E} / Ψ_{F S}$, where $Ψ_{S D E}$ is identified as

Ψ_{S D E} \equiv E (E (E_{g_{M | 0, W}^{*}} {E (Y | W, Δ = 1, Z, M) | W, Δ = 1, Z} | W, Δ = 1, A = 1) | W) - E (E (E_{g_{M | 0, W}^{*}} {E (Y | W, Δ = 1, Z, M) | W, Δ = 1, Z} | W, Δ = 1, A = 0) | W),

(13)

and where $Ψ_{F S}$ is identified as

Ψ_{F S} \equiv E (E (Z | W, Δ = 1, A = 1) | W) - E (E (Z | W, Δ = 1, A = 0) | W) .

(14)

This slight modification to the SCM results in correspondingly slight modifications to the estimators. For the IPTW estimators, the weights are multiplied by the inverse probability of sampling weights $Δ / π$, where $π$ represents $P (Δ = 1 | W)$. The EE estimators solve the numerator and denominator of an EIC that is nearly identical to that given in Equations 10–11, only now multiplied by $Δ / π$, giving the modified EICs for $Ψ_{S D E}$ and $Ψ_{F S}$:

D_{S D E} (P) = \frac{Δ}{π} (\frac{g_{1 | W, Z}}{g_{1 | W}} - \frac{g_{0 | W, Z}}{g_{0 | W}}) \frac{{\hat{g}}_{M | A = 0, W}}{g_{M | Z, W}} (Y - {\bar{Q}}_{Y} (M, Z, W)) + \frac{Δ}{π} (\frac{2 A - 1}{g_{A | W}} ({\bar{Q}}_{M} (1, W) - {\bar{Q}}_{M} (0, W)) (Z - g_{Z} (1 | A, W)) + ({\bar{Q}}_{Z} (1, W) - {\bar{Q}}_{Z} (0, W))) - Ψ_{S D E}

(15)

and where

D_{F S} (P) = \frac{Δ}{π} (\frac{2 A - 1}{g_{A | W}} (Z - g_{Z} (1 | A, W)) + (g_{Z} (1, W) - g_{Z} (0, W))) - Ψ_{F S} .

(16)

The TMLEs are now inverse-weighted TMLEs where the clever covariates are multiplied by $Δ / π .$
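To make the role of Equation 16 concrete, the following is a minimal Python sketch (not the commented R code from the supplementary Web appendix) of an estimating-equation estimate of $Ψ_{F S}$: setting the sample mean of $D_{F S}$ to zero and solving for $Ψ_{F S}$ gives the mean of the weighted term in parentheses. The function name `ee_first_stage`, the dictionary keys, and the reading of $g_{Z} (1 | A, W)$ as the fitted probability of $Z = 1$ at the observed $A$ and of $g_{Z} (1, W), g_{Z} (0, W)$ as the fitted probabilities under $A = 1$ and $A = 0$ are our assumptions for illustration. The nuisance values are taken as given; in practice they would be estimated.

```python
import math


def ee_first_stage(rows):
    """Estimating-equation estimate of Psi_FS based on Equation 16.

    Each row is a dict with hypothetical keys:
      delta   : selection indicator (0/1)
      a, z    : observed instrument and exposure (0/1)
      pi      : P(Delta = 1 | W)
      g_a1    : P(A = 1 | W) among the selected
      g_z_a1  : P(Z = 1 | A = 1, W) among the selected
      g_z_a0  : P(Z = 1 | A = 0, W) among the selected
    Returns the point estimate and an influence-curve-based standard error.
    """
    contribs = []
    for r in rows:
        if r["delta"] == 0:
            # Unselected observations contribute zero through Delta / pi.
            contribs.append(0.0)
            continue
        g_a_obs = r["g_a1"] if r["a"] == 1 else 1.0 - r["g_a1"]
        g_z_obs = r["g_z_a1"] if r["a"] == 1 else r["g_z_a0"]
        # Augmentation term: (2A - 1) / g_{A|W} * (Z - g_Z(1 | A, W)).
        augmentation = (2 * r["a"] - 1) / g_a_obs * (r["z"] - g_z_obs)
        # Substitution term: g_Z(1, W) - g_Z(0, W).
        substitution = r["g_z_a1"] - r["g_z_a0"]
        contribs.append((augmentation + substitution) / r["pi"])

    n = len(contribs)
    psi = sum(contribs) / n
    # Sample variance of the estimated influence curve, contribs - psi.
    variance = sum((c - psi) ** 2 for c in contribs) / (n - 1)
    return psi, math.sqrt(variance / n)
```

The same pattern, with the longer expression in Equation 15, would give the numerator, and the ratio of the two would give an EE-type estimate of the CSDE.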

Table 1 uses the same notation as in Section 2, excepting the addition of Δ. The first DGM represents the primary simulation and a moderate-strong instrument scenario. The second DGM represents a weak instrument scenario that may be more likely to result in CSDE estimates that lie outside the bounds of the parameter space when estimated by non-substitution-based estimators.
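For readers who wish to reproduce the flavor of these simulations, the sketch below draws data from the DGMs in Table 1 using only the Python standard library. It is an illustration under stated assumptions rather than the code used for the reported results. We assume that the linear predictors in the moderate-strong instrument rows are on the logit scale (under that reading the implied $P(Δ = 1)$ is approximately 0.58, matching Table 1) and that the weak instrument row gives the probability of $Z^{*} = 1$ directly. The function names `simulate_one` and `simulate` are hypothetical.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def bern(p, rng):
    """Single Bernoulli(p) draw."""
    return 1 if rng.random() < p else 0


def simulate_one(rng, weak_instrument=False):
    """Draw one observation following Table 1 (logit link is an assumption)."""
    w1 = bern(0.5, rng)
    w2 = bern(0.4 + 0.2 * w1, rng)
    delta = bern(expit(-1 + math.log(4) * w1 + math.log(4) * w2), rng)
    a = delta * bern(0.5, rng)
    if weak_instrument:
        # Assumption: this row is already on the probability scale.
        p_z = 0.005 + 0.1 * a + 0.5 * w2
    else:
        p_z = expit(math.log(4) * a - math.log(2) * w2)
    z = delta * bern(p_z, rng)
    p_m = expit(-math.log(3) + math.log(10) * z - math.log(1.4) * w2)
    m = delta * bern(p_m, rng)
    p_y = expit(
        math.log(1.2)
        + math.log(3) * z
        + math.log(3) * m
        - math.log(1.2) * w2
        + math.log(1.2) * z * w2
    )
    y = delta * bern(p_y, rng)
    return {"W1": w1, "W2": w2, "Delta": delta, "A": a, "Z": z, "M": m, "Y": y}


def simulate(n, seed=0, weak_instrument=False):
    """Draw n independent observations with a reproducible seed."""
    rng = random.Random(seed)
    return [simulate_one(rng, weak_instrument) for _ in range(n)]
```

Because every downstream variable is multiplied by Δ, observations with Δ = 0 carry zeros for A, Z, M, and Y, which is what the $Δ / π$ weighting in the estimators accounts for.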

We compare the performance of the IPTW, EE, and TML estimators. We show estimator performance in terms of absolute bias, percent bias, closeness to the efficiency bound (mean estimator standard error (SE) × the square root of the number of observations), 95% confidence interval (CI) coverage, and mean squared error (MSE) across 1,000 simulations for sample sizes of N=5,000, N=500, and N=100. In addition, we consider 1) correct specification of all models, 2) misspecification of the Y model that included a term for Z only, 3) misspecification of the M model that included a term for W only, 4) misspecification of the M and Y models, 5) misspecification of the Z model that included a term for A only, and 6) misspecification of the Z and Y models.
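The performance measures listed above can be computed from the per-simulation point estimates and standard errors roughly as follows. This is a sketch in Python, not the code used to produce the tables. The function name `summarize` is hypothetical, and we assume Wald-type intervals (estimate ± 1.96 × SE) when computing coverage.

```python
import math


def summarize(estimates, std_errors, truth, n):
    """Summarize one simulation scenario.

    estimates, std_errors : per-simulation point estimates and SEs
    truth                 : true parameter value under the DGM
    n                     : sample size used in each simulation
    """
    k = len(estimates)
    z_crit = 1.959963984540054  # 97.5th percentile of the standard normal
    bias = sum(estimates) / k - truth
    covered = sum(
        1 for est, se in zip(estimates, std_errors)
        if abs(est - truth) <= z_crit * se
    )
    return {
        "bias": bias,
        "pct_bias": 100.0 * bias / truth,
        # Mean SE times sqrt(n), to be compared with the efficiency bound.
        "se_sqrt_n": sum(std_errors) / k * math.sqrt(n),
        "coverage": 100.0 * covered / k,
        "mse": sum((est - truth) ** 2 for est in estimates) / k,
    }
```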

5.2. Results

Table 2 gives results under the moderate-strong instrument simulation scenario using correct model specification for $Ψ_{C S D E}$, comparing the TML, IPTW, and EE estimators. We see that the TML, IPTW, and EE estimators are consistent when all models are correctly specified and sample sizes are large (N=5,000), showing biases of less than 1% in the case of the TML and EE estimators and just over 1% in the case of the IPTW estimator. Bias increases for the IPTW estimator under the smaller sample sizes of N=500 and N=100 to 4% and 42%, respectively, indicating that even under correct model specification, this estimator is challenged in finite samples. In contrast, the TML and EE estimators continue to perform well when sample size decreases to N=500 and N=100, although the efficient TMLE that targets the numerator and denominator separately shows a bias of 13% under N=100. The compatible TML and EE estimators perform similarly and close to the efficiency bound for all sample sizes, though efficiency decreases slightly with decreasing sample size. 95% CI coverage for both is close to 95% for N=5,000 and N=500 and is reduced slightly for N=100. 95% CI coverage is conservative for the IPTW estimator, at around 99% for all sample sizes.

Table 2.

Simulation results comparing estimators of $Ψ_{CSDE}$ under correct model specification for various sample sizes. 1,000 simulations. Estimation methods compared include IPTW, EE, efficient TMLE, and compatible TMLE. Bias and MSE values are averages across the simulations. The estimator standard error $\times \sqrt{n}$ should be compared to the efficiency bound, which is 1.10.

Estimator	Bias	%Bias	$SE \times \sqrt{n}$	95%CI Cov	MSE
All correctly specified
N=5000
TMLE, compatible	0.000	0.07	1.11	94.90	0.000
TMLE, efficient	0.000	0.07	1.11	94.90	0.000
IPTW	0.003	1.45	6.53	98.70	0.005
EE	0.000	0.12	1.11	94.50	0.000
N=500
TMLE, compatible	−0.001	−0.47	1.11	94.90	0.002
TMLE, efficient	−0.001	−0.47	1.11	95.00	0.002
IPTW	−0.009	−4.08	6.89	98.40	0.051
EE	−0.002	−0.75	1.11	95.50	0.002
N=100
TMLE, compatible	−0.009	−3.96	1.14	90.64	0.014
TMLE, efficient	0.028	13.43	1.17	86.50	0.045
IPTW	−0.093	−42.39	21.71	99.20	1.484
EE	−0.005	−2.15	1.12	93.30	0.013


Table 3 gives simulation results under various model misspecifications with large sample size of N=5,000. The IPTW estimator is consistent if the A and M models are correctly specified. Deriving the robustness properties for the EE and TML estimators from the EIC, under large sample size, one of three scenarios is required for estimates of $Ψ_{S D E}$ to be consistent: 1) the $g_{A | W}$, $g_{Z | A, W}$, and $g_{M | Z, W}$ models need to be correctly specified, or 2) the $g_{Z | A, W}$ and ${\bar{Q}}_{Y}$ models need to be correctly specified, or 3) the $g_{A | W}$, $g_{M | Z, W}$, and ${\bar{Q}}_{Y}$ models need to be correctly specified. For estimates of $Ψ_{F S}$ to be consistent, either the $g_{A | W}$ or $g_{Z | A, W}$ model needs to be correctly specified. Thus, for the EE and TML estimators, robustness requirements for the denominator are subsumed in the robustness requirements for the numerator. In the simulation results that follow, we note that A is randomly assigned in the DGM we consider, aligned with its role as an instrument and with our motivating example.
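The robustness conditions just stated can be encoded as Boolean rules, which makes it easy to check that the denominator requirements are subsumed by the numerator requirements and to read off which misspecification scenarios should remain consistent for the EE and TML estimators. The sketch below is illustrative only. The function names are hypothetical, and mapping the "M model" of the simulation scenarios to $g_{M | Z, W}$ is our assumption.

```python
from itertools import product


def sde_consistent(g_a, g_z, g_m, q_y):
    """True if one of the three stated conditions for Psi_SDE holds.

    Arguments flag whether g_{A|W}, g_{Z|A,W}, g_{M|Z,W}, and Q-bar_Y
    are correctly specified.
    """
    return (g_a and g_z and g_m) or (g_z and q_y) or (g_a and g_m and q_y)


def fs_consistent(g_a, g_z):
    """True if the stated condition for Psi_FS holds."""
    return g_a or g_z


def denominator_subsumed():
    """Check that numerator consistency implies denominator consistency."""
    return all(
        fs_consistent(g_a, g_z)
        for g_a, g_z, g_m, q_y in product([True, False], repeat=4)
        if sde_consistent(g_a, g_z, g_m, q_y)
    )


# Misspecification scenarios; g_{A|W} is always correct because A is randomized.
SCENARIOS = {
    "M misspecified": dict(g_a=True, g_z=True, g_m=False, q_y=True),
    "Y misspecified": dict(g_a=True, g_z=True, g_m=True, q_y=False),
    "M and Y misspecified": dict(g_a=True, g_z=True, g_m=False, q_y=False),
    "Z misspecified": dict(g_a=True, g_z=False, g_m=True, q_y=True),
    "Z and Y misspecified": dict(g_a=True, g_z=False, g_m=True, q_y=False),
}


def expected_consistency():
    """Map each scenario to whether the EE/TML numerator should be consistent."""
    return {name: sde_consistent(**flags) for name, flags in SCENARIOS.items()}
```

Under these rules, only the two scenarios in which the Y model is misspecified together with the M or Z model are expected to be inconsistent.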

Table 3.

Simulation results comparing estimators of $Ψ_{C S D E}$ under various model misspecifications for sample size N=5,000. 1,000 simulations. Estimation methods compared include IPTW, EE, efficient TMLE, and compatible TMLE. Bias and MSE values are averages across the simulations. The estimator standard error $\times \sqrt{n}$ should be compared to the efficiency bound, which is 1.10. 95% CI Coverage as determined by bootstrapping is denoted in parentheses for scenarios in which the Z model is misspecified.

Estimator	Bias	%Bias	$SE \times \sqrt{n}$	95%CI Cov	MSE
M model misspecified, N=5,000
TMLE, compatible	0.000	0.02	1.05	94.30	0.000
TMLE, efficient	0.000	0.02	1.05	94.30	0.000
IPTW	−0.024	−11.28	5.32	99.40	0.003
EE	0.000	0.08	1.05	94.20	0.000
Y model misspecified, N=5,000
TMLE, compatible	0.000	0.09	1.14	95.60	0.000
TMLE, efficient	0.000	0.09	1.14	95.60	0.000
IPTW	0.003	1.45	6.53	98.70	0.005
EE	0.000	0.06	1.19	96.10	0.000
M and Y models misspecified, N=5,000
TMLE, compatible	0.096	44.90	1.06	0.00	0.009
TMLE, efficient	0.096	44.90	1.06	0.00	0.009
IPTW	−0.024	−11.28	5.32	99.40	0.003
EE	0.095	44.74	1.06	0.00	0.009
Z model misspecified, N=5,000
TMLE, compatible	0.001	0.28	1.16	87.50 (92.90)	0.000
TMLE, efficient	0.012	6.28	1.38	83.10 (92.60)	0.001
IPTW	0.003	1.45	6.53	98.70	0.005
EE	0.001	0.26	1.16	88.30 (92.80)	0.000
Z and Y models misspecified, N=5,000
TMLE, compatible	0.066	33.64	1.16	2.60 (47.00)	0.005
TMLE, efficient	0.041	20.72	1.42	45.70 (46.90)	0.002
IPTW	0.003	1.45	6.53	98.70	0.005
EE	0.071	36.09	1.23	2.10 (38.30)	0.005


As expected from each estimator’s robustness properties, we see that all estimators are consistent under misspecification of the Y model, with performance equivalent to performance under correct model specifications and N=5,000. Also as expected, under misspecification of the M model, the IPTW estimator is no longer consistent, with 11% bias, but the TML and EE estimators remain consistent. When both the M and Y models are misspecified, all three estimators are inconsistent, with biases ranging from 11% for IPTW to 45% for TML and EE, and 95% CI coverage of the TML and EE estimators is reduced to 0%.

The compatible TML, EE, and IPTW estimators are consistent under misspecification of the Z model. In this scenario, the efficient TMLE demonstrates slight bias (6%), possibly because $g_{Z | A, W}$ is targeted separately in the numerator and denominator, resulting in incompatibility. In this scenario, and more generally under any model misspecification scenario consistent with the robustness properties guaranteeing estimator consistency, the IC-based inference for the TML and EE estimators may be inaccurate, because the IC differs under misspecification. Indeed, we see here that using IC-based inference results in undercoverage (Table 3). Coverage improves when bootstrapping is used for inference (also shown in Table 3), because bootstrapping will approximate the true variance of the estimator in this case where parametric models are used in fitting. However, bootstrapping may not provide valid inference under misspecification if machine learning is used in model fitting, because the resulting estimator is no longer asymptotically linear. Others have addressed the challenge of inference in scenarios of model misspecification under estimator consistency by targeting the nuisance parameters of the IC to provide accurate IC-based inference at the cost of efficiency (Benkeser et al., 2017; van der Laan, 2014).
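As an illustration of bootstrap-based inference for a ratio-type estimator, the sketch below implements a nonparametric percentile bootstrap in Python. The text above does not specify which bootstrap variant was used, so the percentile interval is an assumption. The simple `wald_ratio` function is only a stand-in so that the example runs; it is not one of the estimators proposed in this paper. With the EE or TML estimators, the callable passed as `estimator` would need to refit all nuisance models on each resample.

```python
import random
import statistics


def wald_ratio(rows):
    """Stand-in ratio estimator: effect of A on Y divided by effect of A on Z.

    Returns None for degenerate resamples (an empty arm or zero denominator).
    """
    arm1 = [r for r in rows if r["A"] == 1]
    arm0 = [r for r in rows if r["A"] == 0]
    if not arm1 or not arm0:
        return None
    numerator = (statistics.fmean([r["Y"] for r in arm1])
                 - statistics.fmean([r["Y"] for r in arm0]))
    denominator = (statistics.fmean([r["Z"] for r in arm1])
                   - statistics.fmean([r["Z"] for r in arm0]))
    if denominator == 0:
        return None
    return numerator / denominator


def percentile_bootstrap_ci(rows, estimator, n_boot=500, seed=0):
    """95% percentile bootstrap interval for estimator(rows)."""
    rng = random.Random(seed)
    n = len(rows)
    draws = []
    for _ in range(n_boot):
        # Resample observations with replacement and re-run the estimator.
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        estimate = estimator(resample)
        if estimate is not None:  # degenerate resamples are skipped
            draws.append(estimate)
    # With n=40, the first and last cut points are the 2.5th and 97.5th percentiles.
    cuts = statistics.quantiles(draws, n=40)
    return cuts[0], cuts[-1]
```

Skipping degenerate resamples is a simplification; with a weak instrument and a small sample they may be frequent, and the resulting interval should then be interpreted with caution.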

Misspecification of both the Z and Y models results not only in invalid inference for the EE and TML estimators but also in inconsistent estimates. We note that for the two scenarios where the Z model was misspecified, the true DGM was changed to $Z^{*} \sim Ber (\log (4) A - \log (40) W)$ to make misspecifying the Z model meaningful.

Table 4 gives results under the weak instrument simulation scenario using correct model specification, comparing the IPTW, EE, efficient TML, and compatible TML estimators. Finite sample performance is challenged in this scenario: we see all estimators having larger biases and worse efficiency for a given sample size than in the stronger instrument scenario. Performance of the IPTW estimator is particularly affected. Under sample size N=500 and correct model specification, the IPTW estimator is 27% biased compared to 3% bias of the other estimators. With sample size N=100 under this weak instrument scenario, all estimators perform poorly, with the IPTW estimator displaying particularly egregious performance.

Table 4.

Simulation results comparing the efficient and compatible TML, IPTW, and EE estimators of $Ψ_{C S D E}$ under the weak instrument simulation scenario and correct model specification for various sample sizes. 1,000 simulations. Bias and MSE values are averages across the simulations. The estimator standard error $\times \sqrt{n}$ should be compared to the efficiency bound, which is 1.13.

Estimator	Bias	%Bias	$SE \times \sqrt{n}$	95%CI Cov	MSE	% Out of Bounds
N=5000
TMLE, compatible	−0.000	−0.08	1.13	73.50	0.001	0.00
TMLE, efficient	−0.000	−0.07	1.13	73.50	0.001	0.00
IPTW	0.016	8.14	19.29	98.70	0.045	0.10
EE	−0.000	−0.07	1.13	74.40	0.001	0.00
N=500
TMLE, compatible	0.008	3.53	1.38	74.60	0.010	0.10
TMLE, efficient	0.008	3.55	1.38	74.50	0.010	0.10
IPTW	0.057	27.05	28.51	99.90	0.964	18.80
EE	0.008	3.69	1.40	75.30	0.010	0.10
N=100
TMLE, compatible	0.048	22.19	344.04	86.49	3.302	4.10
TMLE, efficient	0.112	50.95	1226.43	88.51	3.550	⁴⁶⁹
IPTW	−1.14 ×10¹²	−5.11 ×10¹⁴	5.37 ×10²⁸	100.00	1.24 ×10²⁷	53.10
EE	0.042	19.40	53.66	88.50	0.740	3.30


In part, the poor efficiency of the IPTW estimator is due to very small or even negative denominator estimates in this weak instrument scenario, which results in CSDE estimates lying outside of the parameter space. Indeed, we see that 18% and 53% of the IPTW estimates were out of the bounds of the parameter space for the N=500 and N=100 sample sizes, respectively. In contrast, the EE and TMLE estimates largely stay within the parameter space for N=500. For the smallest sample size of N=100, about 3–4% lie outside of the parameter space for the EE and TMLE estimates.

6. Empirical Illustration

6.1. Overview and set-up

We now apply our proposed TMLE to the Moving to Opportunity study (MTO): a longitudinal, randomized trial where families living in public housing were randomized to receive a Section 8 housing voucher that they could then use to move out of public housing (Kling et al., 2007). In this example, the CSDE is the direct effect of using the housing voucher to move out of public housing on adolescent substance use outcomes, not mediated by aspects of parental wellbeing, among those who would comply with the intervention.

The instrument, A, is defined as randomization to receive a Section 8 housing voucher that one can then use to rent on the private market. The exposure, Z, is defined as adherence to the intervention, that is, using the housing voucher, if one received it, to move out of public housing. We examined direct effects not operating through each of five mediators, M, all measured at an interim assessment that occurred 4–7 years after the baseline assessment: 1) parental employment; 2) parental anxiety, defined as feeling worried, tense or anxious most of the time or worrying much more than others in his/her situation for at least one month during the past year; 3) parental depression, consistent with DSM-IV diagnostic criteria, as measured by the CIDI-SF instrument (Kessler et al., 1998); 4) parental distress, as measured by the Kessler Psychological Distress Scale (K6) (Kessler et al., 2002); and 5) parental warmth towards the adolescent, as measured by direct observation, consisting of nine items. The first three mediators were binary. The last two were indices bounded in [0,1]. We examined three adolescent substance use outcomes, Y, which were also measured at the interim assessment: past-month cigarette use, past-month marijuana use, and past-month problematic drug use. Problematic drug use was defined as using hard drugs or using marijuana before school or work in the past month. We used a high-dimensional vector of covariates measured at baseline, W, that included social-demographic information for the adolescent and his/her family, information on the adolescent’s behavior and learning while a child, neighborhood characteristics, and reasons for the family’s participation in MTO. Definitions of W, A, Z and Y align with previous work estimating direct and indirect effects of A on Y in the MTO study (Rudolph et al., 2017a). These variables follow the same structural causal model as detailed in Section 2.

Aligned with a prior analysis estimating direct effects in MTO (Rudolph et al., 2017a), our sample includes adolescents participating in MTO who were 12–17 years old at the interim assessment. We exclude the Baltimore site, as Section 8 voucher receipt did not increase a family’s likelihood of moving to a low-poverty neighborhood, which differs from other sites and from the intention of the intervention. We conducted analyses stratified by gender, as previous work documented qualitatively and quantitatively different intervention effects between girls and boys (Orr et al., 2003; Clampet-Lundquist et al., 2011). We combine sites with similar intervention effects, as has been done previously (Rudolph et al., 2017a). Lastly, we restrict to those with nonmissing mediator and outcome data. Multiple imputation by chained equations (Buuren and Groothuis-Oudshoorn, 2011) was used to create 30 imputed datasets to address missing covariate values (none had more than 5% missing). The University of California, Davis, and Columbia University determined this analysis of deidentified data to be non-human subjects research.

6.2. Results

Total complier average causal effects (CACEs) (Angrist et al., 1996) (also called treatment-on-treated effects (Orr et al., 2003)) are shown in Web Figures 1–3 (see supplementary Web appendix). This is the effect of Z on Y among compliers, using randomization of the instrument A. In other words, it is the total effect of moving with the voucher out of public housing on the outcome, among those who comply with the intervention. Moving with the voucher out of public housing increased risk of cigarette use among boys by 8% (RD: 0.08, 95%CI: −0.00, 0.17) and reduced risk of marijuana use among girls by 7% (RD: −0.07, 95%CI: −0.13, −0.01). This aligns with previous work finding that the intervention generally improved health and risk behavior outcomes among girls but had negative impacts in terms of these same types of outcomes for boys (Orr et al., 2003; Clampet-Lundquist et al., 2011).

We next estimated the first-stage, data-dependent, stochastic effect of A on each mediator, M, $E ({\hat{g}}_{M | 1, W} - {\hat{g}}_{M | 0, W})$, which is used in each of the CSDE estimators. These first-stage effects are shown in Table 5. Across outcome samples and genders, we see that being randomized to receive a Section 8 voucher increases parental employment and anxiety but decreases parental depression. Effects of voucher receipt on distress and warmth were mixed or null. These first-stage effects of A on M operate through Z. The effect of A on Z (voucher receipt on moving with the voucher out of public housing) ranged from an increased likelihood of 47% to 52%, depending on the outcome sample.

Table 5.

Risk differences of the effect of voucher receipt on mediator by outcome sample (marginal effects, adjusting for baseline covariates and adherence, Z).

Mediator	Boys	Girls
	RD (95% CI)	RD (95% CI)
Cigarette Use Sample
Parental employment	0.058 (0.047, 0.070)	0.026 (0.015, 0.037)
Parental anxiety	0.036 (0.028, 0.044)	0.041 (0.039, 0.043)
Parental depression	−0.004 (−0.007, −0.001)	−0.004 (−0.006, −0.001)
Parental distress	0.004 (−0.001, 0.009)	0.013 (0.009, 0.017)
Parental warmth	−0.006 (−0.032, 0.019)	−0.005 (−0.028, 0.019)
Marijuana Use Sample
Parental employment	0.079 (0.067, 0.092)	0.027 (0.015, 0.038)
Parental anxiety	0.011 (0.002, 0.021)	0.042 (0.041, 0.043)
Parental depression	−0.024 (−0.026, −0.021)	−0.003 (−0.005, −0.001)
Parental distress	−0.021 (−0.027, −0.015)	0.012 (0.008, 0.016)
Parental warmth	0.005 (−0.023, 0.033)	−0.001 (−0.025, 0.023)
Problematic Drug Use Sample
Parental employment	0.061 (0.052, 0.070)	0.052 (0.041, 0.063)
Parental anxiety	0.039 (0.031, 0.047)	0.050 (0.045, 0.056)
Parental depression	−0.006 (−0.009, −0.003)	−0.011 (−0.016, −0.005)
Parental distress	0.004 (−0.001, 0.009)	0.016 (0.009, 0.022)
Parental warmth	−0.005 (−0.027, 0.017)	−0.011 (−0.040, 0.017)


The TMLE estimates of $Ψ_{C S D E}$ by outcome sample, gender, and mediator are shown in Figures 1–3. The estimates are similar across mediators.

Fig. 1 — Data-dependent complier stochastic direct effect estimates and 95% confidence intervals on past-month cigarette use by mediator. Data from the Moving to Opportunity experiment, interim follow up.

Fig. 3 — Data-dependent complier stochastic direct effect estimates and 95% confidence intervals on past-month problematic drug use by mediator. Data from the Moving to Opportunity experiment, interim follow up.

Lastly, we compare the CSDE estimates in Figures 1–3 with the stochastic direct and indirect effect (SDE and SIE) estimates (see supplementary Web appendix Figures 1–3). The SDE is the direct effect of A on Y not through M, and the SIE is the indirect effect of A on Y through M. The total intent-to-treat average treatment effects (the total effect of A on each Y) are included in the figures. Such further examination may be of interest, because previous research has shown that significant indirect effects may be present with null total or direct effects (Imai et al., 2010). However, all SDE and SIE estimates are null.

Together, these results suggest that none of the five parental well-being variables tested are on the causal pathway from Section 8 voucher receipt to subsequent adolescent substance use. The first-stage results (Table 5) provided evidence against mediation by parental distress or warmth. The CSDE and SDE/SIE estimates then provided evidence against mediation by the remaining variables of parental employment, anxiety, or depression.

7. Conclusion

In this paper, we identified the IV direct effect of exposure, Z, on outcome, Y, not operating through mediator, M, which we call the complier stochastic direct effect. We detailed several estimators of such effects: a ratio of inverse-probability of treatment-weighted estimators, a ratio of estimating equation estimators, a ratio of targeted minimum loss-based estimators, and a TMLE that targets the CSDE directly. These estimators would be applicable for a variety of study designs, including 1) randomized encouragement trials, like the MTO housing voucher experiment we consider as an illustrative example, 2) treatment discontinuities, such as when a policy or practice changes abruptly either over time or at a certain value, and 3) Mendelian randomization (Baiocchi et al., 2014). To facilitate implementation of our proposed estimators, we include step-by-step instructions in the main text and commented R code in the supplementary Web appendix.

Estimators of the CSDE will be challenged by the finite sample sizes encountered in real-world data. Both mediation estimators and IV estimators are less efficient than their total average treatment effect counterparts. Because estimators of the CSDE combine both mediation and IV components, these estimators will likewise have efficiency challenges, particularly in finite samples. Thus, it is important to choose an estimator that is more robust to finite sample bias.

We found the IPTW estimator to be the most sensitive to finite sample bias, resulting in bias of over 40% even when all models were correctly specified in a sample size of N=100 (Table 2). In contrast, the EE estimator and compatible TMLE were far less sensitive, demonstrating slight losses of efficiency in sample sizes of N=100.

The EE and TML estimators also have advantages over the IPTW estimator in terms of efficiency and reduced reliance on correct parametric model specification due to 1) being robust to certain combinations of model misspecifications and 2) having theory-based inference when incorporating data-adaptive methods like machine learning into model fitting. In addition, the compatible TMLE solves the efficient influence equation of the CSDE ratio, with the targeting being done in such a way that it is compatible across the numerator and denominator. This compatibility improves the TMLE’s performance particularly in the weaker instrument scenario, as was shown in Table 4 comparing the compatible and efficient TMLEs. A previous TMLE for the complier average total effect (as opposed to the complier direct effect) used a separate TMLE for each of the numerator and denominator, so did not have the advantage of this compatibility (Rudolph and van der Laan, 2017).

However, the estimators we propose are limited in that they use a data-dependent stochastic intervention on M, ${\hat{g}}_{M | a, W}$, which assumes that the stochastic draw is from a known distribution of $M | a, W$, estimated from the observed data. It would be significantly more complex to solve the compatible EIC for the non-data-dependent version; however, we plan to complete such an extension.

Perhaps the most significant limitation is that we were unable to identify a corresponding complier stochastic indirect effect without additional restrictive assumptions. This limitation is corroborated by recent work by Frölich and Huber (2017), where a similar IV indirect effect could only be identified by assuming two distinct instruments, one for Z and one for M, that themselves were conditionally independent, $A_{1} ⫫ A_{2} | W$.

Supplementary Material

Supp 1

NIHMS1550244-supplement-Supp_1.pdf (399.7 KB, PDF)

Fig. 2 — Data-dependent complier stochastic direct effect estimates and 95% confidence intervals on past-month marijuana use by mediator. Data from the Moving to Opportunity experiment, interim follow up.

Acknowledgments

The authors gratefully acknowledge R00DA042127 (PI: Rudolph).

Contributor Information

Kara E. Rudolph, Department of Epidemiology, Columbia University, New York, New York

Oleg Sofrygin, Division of Biostatistics, University of California, Berkeley

Mark J. van der Laan, Division of Biostatistics, University of California, Berkeley

References

Albert JM (2008) Mediation analysis via potential outcomes models. Statistics in Medicine, 27, 1282–1304.
Angrist JD, Imbens GW and Rubin DB (1996) Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91, 444–455.
Baiocchi M, Cheng J and Small DS (2014) Instrumental variable methods for causal inference. Statistics in Medicine, 33, 2297–2340.
Benkeser D, Carone M, van der Laan MJ and Gilbert P (2017) Doubly robust nonparametric inference on the average treatment effect. Biometrika, 104, 863–880.
van Buuren S and Groothuis-Oudshoorn K (2011) mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45.
Chén OY, Crainiceanu C, Ogburn EL, Caffo BS, Wager TD and Lindquist MA (2015) High-dimensional multivariate mediation with application to neuroimaging data. Biostatistics.
Clampet-Lundquist S, Edin K, Kling JR and Duncan GJ (2011) Moving teenagers out of high-risk neighborhoods: How girls fare better than boys. American Journal of Sociology, 116, 1154–89.
Didelez V, Dawid AP and Geneletti S (2006) Direct and indirect effects of sequential treatments. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, 138–146. AUAI Press.
Dunn G and Bentall R (2007) Modelling treatment-effect heterogeneity in randomized controlled trials of complex interventions (psychological treatments). Statistics in Medicine, 26, 4719–4745.
Frölich M and Huber M (2017) Direct and indirect treatment effects-causal chains and mediation analysis with instrumental variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 1645–1666.
Gruber S and van der Laan MJ (2010) A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. The International Journal of Biostatistics, 6.
Imai K, Keele L, Yamamoto T et al. (2010) Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science, 25, 51–71.
Joffe MM, Small D, Ten Have T, Brunelli S and Feldman HI (2008) Extended instrumental variables estimation for overall effects. The International Journal of Biostatistics, 4.
Kessler RC, Andrews G, Colpe LJ, Hiripi E, Mroczek DK, Normand SL, Walters EE and Zaslavsky AM (2002) Short screening scales to monitor population prevalences and trends in non-specific psychological distress. Psychological Medicine, 32, 959–976.
Kessler RC, Andrews G, Mroczek D, Ustun B and Wittchen H-U (1998) The World Health Organization Composite International Diagnostic Interview short-form (CIDI-SF). International Journal of Methods in Psychiatric Research, 7, 171–185.
Kling JR, Liebman JB and Katz LF (2007) Experimental analysis of neighborhood effects. Econometrica, 75, 83–119.
van der Laan MJ (2014) Targeted estimation of nuisance parameters to obtain valid statistical inference. The International Journal of Biostatistics, 10, 29–57.
van der Laan MJ and Rubin D (2006) Targeted maximum likelihood learning. The International Journal of Biostatistics, 2.
Ogburn EL (2012) Commentary on "Mediation analysis without sequential ignorability: Using baseline covariates interacted with random assignment as instrumental variables" by Dylan Small. Journal of Statistical Research, 46, 105.
Orr L, Feins J, Jacob R, Beecroft E, Sanbonmatsu L, Katz LF, Liebman JB and Kling JR (2003) Moving to opportunity: Interim impacts evaluation.
Pearl J (2009) Causality. Cambridge University Press.
Rice MS, Bertrand KA, VanderWeele TJ, Rosner BA, Liao X, Adami H-O and Tamimi RM (2016) Mammographic density and breast cancer risk: a mediation analysis. Breast Cancer Research, 18, 94.
Rudolph KE and van der Laan MJ (2017) Robust estimation of encouragement-design intervention effects transported across sites. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 1509–1525.
Rudolph KE, Sofrygin O, Schmidt NM, Crowder R, Glymour MM, Ahern J and Osypuk TL (2017a) Mediation of neighborhood effects on adolescent substance use by the school and peer environments in a large-scale housing voucher experiment. Epidemiology, In Press.
Rudolph KE, Sofrygin O, Zheng W and van der Laan MJ (2017b) Robust and flexible estimation of data-dependent stochastic mediation effects: a proposed method and example in a randomized trial setting. Epidemiologic Methods, In Press.
Small DS (2011) Mediation analysis without sequential ignorability: Using baseline covariates interacted with random assignment as instrumental variables. arXiv preprint arXiv:1109.1070.
Ten Have TR, Joffe MM, Lynch KG, Brown GK, Maisto SA and Beck AT (2007) Causal mediation analyses with rank preserving models. Biometrics, 63, 926–934.
VanderWeele TJ and Tchetgen Tchetgen EJ (2017) Mediation analysis with time varying exposures and mediators. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 917–938.
Zheng W and van der Laan M (2017) Longitudinal mediation analysis with time-varying mediators and exposures, with application to survival outcomes. Journal of Causal Inference.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supp 1

NIHMS1550244-supplement-Supp_1.pdf^{(399.7KB, pdf)}

[R1] Albert JM (2008) Mediation analysis via potential outcomes models. Statistics in medicine, 27, 1282–1304. [DOI] [PubMed] [Google Scholar]

[R2] Angrist JD, Imbens GW and Rubin DB (1996) Identification of causal effects using instrumental variables. Journal of the American statistical Association, 91, 444–455. [Google Scholar]

[R3] Baiocchi M, Cheng J and Small DS (2014) Instrumental variable methods for causal inference. Statistics in medicine, 33, 2297–2340. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R4] Benkeser D, Carone M, Laan MVD and Gilbert P (2017) Doubly robust nonparametric inference on the average treatment effect. Biometrika, 104, 863–880. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] Buuren S and Groothuis-Oudshoorn K (2011) mice: Multivariate imputation by chained equations in r. Journal of statistical software, 45. [Google Scholar]

[R6] Chén OY, Crainiceanu C, Ogburn EL, Caffo BS, Wager TD and Lindquist MA (2015) High-dimensional multivariate mediation with application to neuroimaging data. Biostatistics. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R7] Clampet-Lundquist S, Edin K, Kling JR and Duncan GJ (2011) Moving teenagers out of high-risk neighborhoods: How girls fare better than boys. American Journal of Sociology, 116, 1154–89. [DOI] [PubMed] [Google Scholar]

[R8] Didelez V, Dawid AP and Geneletti S (2006) Direct and indirect effects of sequential treatments. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, 138–146. AUAI Press. [Google Scholar]

[R9] Dunn G and Bentall R (2007) Modelling treatment-effect heterogeneity in randomized controlled trials of complex interventions (psychological treatments). Statistics in medicine, 26, 4719–4745. [DOI] [PubMed] [Google Scholar]

[R10] Frölich M and Huber M (2017) Direct and indirect treatment effects-causal chains and mediation analysis with instrumental variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 1645–1666. [Google Scholar]

[R11] Gruber S and van der Laan MJ (2010) A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. The International Journal of Biostatistics, 6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] Imai K, Keele L, Yamamoto T et al. (2010) Identification, inference and sensitivity analysis for causal mediation effects. Statistical science, 25, 51–71. [Google Scholar]

[R13] Joffe MM, Small D, Ten Have T, Brunelli S and Feldman HI (2008) Extended instrumental variables estimation for overall effects. The International Journal of Biostatistics, 4.

[R14] Kessler RC, Andrews G, Colpe LJ, Hiripi E, Mroczek DK, Normand SL, Walters EE and Zaslavsky AM (2002) Short screening scales to monitor population prevalences and trends in non-specific psychological distress. Psychological Medicine, 32, 959–976.

[R15] Kessler RC, Andrews G, Mroczek D, Ustun B and Wittchen H-U (1998) The World Health Organization Composite International Diagnostic Interview short-form (CIDI-SF). International Journal of Methods in Psychiatric Research, 7, 171–185.

[R16] Kling JR, Liebman JB and Katz LF (2007) Experimental analysis of neighborhood effects. Econometrica, 75, 83–119.

[R17] van der Laan MJ (2014) Targeted estimation of nuisance parameters to obtain valid statistical inference. The International Journal of Biostatistics, 10, 29–57.

[R18] van der Laan MJ and Rubin D (2006) Targeted maximum likelihood learning. The International Journal of Biostatistics, 2.

[R19] Ogburn EL (2012) Commentary on "Mediation analysis without sequential ignorability: Using baseline covariates interacted with random assignment as instrumental variables" by Dylan Small. Journal of Statistical Research, 46, 105.

[R20] Orr L, Feins J, Jacob R, Beecroft E, Sanbonmatsu L, Katz LF, Liebman JB and Kling JR (2003) Moving to opportunity: Interim impacts evaluation.

[R21] Pearl J (2009) Causality. Cambridge University Press.

[R22] Rice MS, Bertrand KA, VanderWeele TJ, Rosner BA, Liao X, Adami H-O and Tamimi RM (2016) Mammographic density and breast cancer risk: a mediation analysis. Breast Cancer Research, 18, 94.

[R23] Rudolph KE and van der Laan MJ (2017) Robust estimation of encouragement-design intervention effects transported across sites. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 1509–1525.

[R24] Rudolph KE, Sofrygin O, Schmidt NM, Crowder R, Glymour MM, Ahern J and Osypuk TL (2017a) Mediation of neighborhood effects on adolescent substance use by the school and peer environments in a large-scale housing voucher experiment. Epidemiology, In Press.

[R25] Rudolph KE, Sofrygin O, Zheng W and van der Laan MJ (2017b) Robust and flexible estimation of data-dependent stochastic mediation effects: a proposed method and example in a randomized trial setting. Epidemiologic Methods, In Press.

[R26] Small DS (2011) Mediation analysis without sequential ignorability: Using baseline covariates interacted with random assignment as instrumental variables. arXiv preprint arXiv:1109.1070.

[R27] Ten Have TR, Joffe MM, Lynch KG, Brown GK, Maisto SA and Beck AT (2007) Causal mediation analyses with rank preserving models. Biometrics, 63, 926–934.

[R28] VanderWeele TJ and Tchetgen Tchetgen EJ (2017) Mediation analysis with time varying exposures and mediators. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 917–938.

[R29] Zheng W and van der Laan M (2017) Longitudinal mediation analysis with time-varying mediators and exposures, with application to survival outcomes. Journal of Causal Inference.
