Transporting stochastic direct and indirect effects to new populations

Kara E Rudolph; Jonathan Levy; Mark J van der Laan

doi:10.1111/biom.13274

Author manuscript; available in PMC: 2022 Mar 1.

Published in final edited form as: Biometrics. 2020 May 4;77(1):197–211. doi: 10.1111/biom.13274

PMCID: PMC7664994 NIHMSID: NIHMS1644546 PMID: 32277465

Summary:

Transported mediation effects may contribute to understanding how interventions work differently when applied to new populations. However, we are not aware of any estimators for such effects. Thus, we propose two doubly robust, efficient estimators of transported stochastic (also called randomized interventional) direct and indirect effects. We demonstrate their finite sample properties in a simulation study. We then apply the preferred substitution estimator to longitudinal data from the Moving to Opportunity Study, a large-scale housing voucher experiment, to transport stochastic indirect effect estimates of voucher receipt in childhood on subsequent risk of mental health or substance use disorder mediated through parental employment across sites, thereby gaining understanding of drivers of the site differences.

Keywords: External validity, Generalizability, Instrumental variables, Randomized interventional indirect effects, Stochastic indirect effects, Mediation, Targeted maximum likelihood estimation, Transportability

1. Introduction

Often, an intervention, program, or policy that works in one place or population fails to replicate in another place or population (20) or can even have unintended harmful effects (7). This is problematic from a public policy or public health perspective in that the goals of such interventions are to help, not harm, and problematic from a financial perspective in that limited resources may not be spent optimally.

When such initiatives fail to replicate or have unintended effects in new populations, transportability theory and methods offer a chance to understand why. Transportability is the ability (based on identifying assumptions) to transport a causal effect from a source population to a new, target population, accounting for differences between the two populations (e.g., differences in compositional factors, treatment adherence, etc.) (16). Previous work developed estimators to transport total effects from a source to target population (22) and, similarly, to generalize effects from a sample to the population (4, 5, 10, 24).

In some cases, examining transportability of the total effect may shed light on reasons for lack of replication. However, in other cases, transporting the total effect may not identify the relevant differences and it may be beneficial to go further and examine transportability of the underlying mediation mechanisms. Although there has been work identifying transported indirect effects (2, 16), we are not aware of any previous work developing estimators for such effects (transported direct and indirect effects from a source to target population). Thus, we address this research gap by proposing two different estimators of transported stochastic direct and indirect effects: a doubly robust estimator that solves the estimating equation and a doubly robust substitution estimator in the targeted minimum loss-based framework.

Differences in indirect effects across sites in the Moving to Opportunity study (MTO) (7) motivate this work. MTO was a longitudinal randomized trial conducted by the U.S. Department of Housing and Urban Development from 1994 to 2007 in five U.S. cities: Baltimore, Boston, Chicago, Los Angeles, and New York (23). Families living in high-rise public housing in these cities could sign up to be randomized at baseline to receive a Section 8 housing voucher that they could use to move out of public housing and into a rental on the private market. The adult participants and their children were then surveyed at two follow-up time points to estimate the effects of the intervention on economic, educational, and health outcomes. Previous work has discussed differences in total MTO intervention effects across sites (14, 20). Building on this prior work, we use the transported stochastic indirect effect estimators we propose to examine reasons for the site differences in a particular indirect effect of the MTO intervention.

The paper is organized as follows. In Section 2, we introduce notation and the structural equations model generating our data. In Section 3, we define the parameters of interest, the transported stochastic (also called randomized interventional) direct and indirect effects. Section 4 gives the identification results. Sections 5 and 6 detail estimating equation estimators and targeted minimum loss-based estimators (TMLE) for these effects, respectively. In Section 7, we present the results of simulation studies that demonstrate the relative performance of the aforementioned estimators in finite samples. In Section 8, we apply the estimators to transport a particular stochastic indirect effect estimate across MTO sites, thus gaining understanding of drivers of the site differences. Section 9 concludes.

2. Notation and structural equations model

The full data is generated by a structural equations model (SEM) (17, 31), which consists of the data generating process to which we would like to have access. The SEM first generates a random draw of a vector U of unknown measurements (15), where U = (U_S, U_W, U_A, U_Z, U_M, U_Y) ~ P_U. Then our variables are generated in the following time ordering:

S = f_{S} (U_{S})

W = f_{W} (U_{W}, S)

A = f_{A} (U_{A}, W, S)

Z = f_{Z} (U_{Z}, A, W, S)

M = f_{M} (U_{M}, Z, W, S)

Y = f_{Y} (U_{Y}, M, Z, W),

where S is a binary indicator of site, W is a vector of covariates, A is a binary treatment (though it could be categorical under our approach), Z is a binary intermediate variable (though it could also be categorical), M is a binary mediator (though it could also be categorical), and Y is a binary or continuous outcome. The SEM generates the full data as $(U, O) \sim P_{U O} \in M^{F}$ , our full-data statistical model. If we had access to the SEM, we could generate potential (i.e., counterfactual) outcomes (11, 18), which define our causal parameters of interest. We observe data O = (S, W, A, Z, M, S × Y) for n participants, with the true distribution $O_{1}, \dots, O_{n} \overset{iid}{\sim} P_{O} \in M$ , our observed data statistical model. Note that we observe the outcome, Y, only for site S = 1, and from the above SEM, the selection mechanism is nondifferential with respect to the outcome.

Putting the above notation in the context of our motivating example, S is MTO site. For this particular example, we treat the combined Los Angeles and New York sites as the source population, S = 1, and the Boston site as the transport population, S = 0. W represents baseline covariates describing family and individual characteristics. A represents randomization to receive a Section 8 housing voucher, A = 1, or not, A = 0. We consider two types of Z variables, described more below. For the first, Z represents use of the housing voucher to move out of public housing. For the second, Z represents moving to a low-poverty neighborhood. M represents whether the participating parent was employed, M = 1, or not, M = 0, during follow-up. Finally, Y represents having (Y = 1) a current mental health or substance use disorder versus not (Y = 0).

We consider two statistical models, $M_{I}$ and $M_{I I}$ , depicted as directed acyclic graphs in Figure 1, corresponding to the motivating example. $M_{I}$ includes restrictions in alignment with Z representing use of the housing voucher to move out of public housing. In that case, A is considered to be an instrumental variable (IV) for Z (1). The restrictions are: 1) A is randomly assigned (possibly conditional on (W, S)), and 2) there is no direct effect of A on M or of A on Y; downstream effects of A operate only through Z. This makes sense for this motivating example, given that being randomized to receive the housing voucher, A, can only affect the mediator and outcome if one uses the housing voucher, Z (21). Under $M_{I}$ the density of the true distribution P₀ of O, p₀(O) can be factorized as:

p_{0} (O) = p_{0} (Y ∣ M, Z, W, S = 1) p_{0} (M ∣ Z, W, S) p_{0} (Z ∣ A, W, S) p_{0} (A ∣ W, S) p_{0} (W ∣ S) p_{0} (S) .

However, the methods we propose can be used under statistical model $M_{I I}$ , where we allow A to directly affect M and/or Y. This corresponds to the other Z we consider, moving to a low-poverty neighborhood. In this case, housing voucher receipt, A, would be hypothesized to affect the mediator and outcome largely through moving to a low-poverty neighborhood, Z, but could also affect these downstream variables through alternative pathways, for example, through the act of moving alone. Thus, the exclusion restriction would not necessarily hold in this motivating example. Under $M_{I I}$ , p₀(O) can be factorized as:

p_{0} (O) = p_{0} (Y ∣ M, Z, A, W, S = 1) p_{0} (M ∣ Z, A, W, S) p_{0} (Z ∣ A, W, S) p_{0} (A ∣ W, S) p_{0} (W ∣ S) p_{0} (S) .

Throughout this paper, we will focus on our consideration of $M_{I}$ . Wherever consideration of $M_{I I}$ involves nontrivial differences, such as in terms of estimation, we will discuss such differences.

Figure 1: — Directed acyclic graphs depicting the two statistical models considered.

In our notation, we use lowercase letters to denote fixed, assignment values of variables and uppercase letters to denote observed values. We use subscripts for descriptive purposes—subscripts are not to be considered a variable. For instance, we use a capital letter in p_Y, the conditional density of Y, because it is a density of the random variable Y.

3. Parameters of interest

We consider two causal quantities of interest that we call transported stochastic direct and indirect effects. These causal quantities represent stochastic direct and indirect effects (21, 30) transported from a source population to a new, target population. Stochastic direct and indirect effects, also called randomized interventional direct and indirect effects (30), represent 1) the direct effect of A on Y not through M and 2) the indirect effect of A on Y through M. Stochastic direct and indirect effects are similar to natural direct and indirect effects but do not require the natural direct and indirect effects identifying assumption of no measured or unmeasured post-treatment confounder of the mediator-outcome relationship (30). There is also not an individual-specific exact decomposition of the average treatment effect in terms of these stochastic effects as there is with natural direct and indirect effects. As has been described previously (19), one can consider versions of these effects that condition on Z and thus estimate the indirect pathway of A to M to Y, not through Z (33), or versions that marginalize over Z and thus estimate the combined indirect pathways of 1) A to Z to M to Y and 2) A to M to Y (21, 30). Under the IV exclusion restriction in $M_{I}$ , no effect operates through pathway A to M to Y, so we focus herein on the versions of these effects that marginalize over Z. Versions of indirect effects using stochastic interventions on M that are conditional on Z would be zero under $M_{I}$ and noninformative. Previously (21, 30), such a stochastic intervention on M has been defined as

g_{M ∣ a^{*}, W}^{*} (M ∣ W) = \sum_{z} \Pr (M = m ∣ Z = z, W) \Pr (Z = z ∣ A = a^{*}, W) .

The subscript for $g_{M ∣ a^{*}, W}^{*}$ specifies that it is a conditional density of random variable M given random variable W and value a*, where the lowercase letter indicates that the value is fixed and the same for all participants. We note that marginalizing out Z introduces dependence on A. The nontransported parameter of interest was defined previously as $Ψ^{F} (P_{U O}) = E [Y_{a, g_{M ∣ a^{*}, W}^{*}}]$ where the expectation is taken over the full data model and $Y_{a, g_{M ∣ a^{*}, W}^{*}}$ is a potential outcome intervening on A to set it to a, and then downstream, intervening on M to set it to a random (i.e., stochastic) draw from the distribution of M defined by $g_{M ∣ a^{*}, W}^{*} (M ∣ W)$ (21). We wish to transport this parameter to a new site where the outcome was not observed (S = 0), and thus make the following modification:

Ψ^{F} (P_{U O}) = E [Y_{a, g_{M ∣ a^{*}, W, s}^{*}} ∣ S = 0]

where

g_{M ∣ a^{*}, W, s}^{*} (M ∣ W) = \sum_{z} \Pr (M = m ∣ Z = z, W, S = s) \Pr (Z = z ∣ A = a^{*}, W, S = s),

(1)

and where we impose a certain a* and a certain s in both the Z and M models.

The transported stochastic direct effect entails setting a* to 0 and taking the difference in mean outcome between setting a to 1 and setting a to 0, denoted

E [Y_{1, g_{M ∣ 0, W, s}^{*}} - Y_{0, g_{M ∣ 0, W, s}^{*}} ∣ S = 0]

and the transported stochastic indirect effect entails setting a = 1 and then taking the difference in mean outcome between setting a* = 1 and a* = 0, denoted

E [Y_{1, g_{M ∣ 1, W, s}^{*}} - Y_{1, g_{M ∣ 0, W, s}^{*}} ∣ S = 0] .
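
As a small worked step (an observation added here for orientation, not a result stated by the definitions above), adding the two transported effects telescopes, because the term with a = 1 and a* = 0 appears once with each sign:

E [Y_{1, g_{M ∣ 0, W, s}^{*}} - Y_{0, g_{M ∣ 0, W, s}^{*}} ∣ S = 0] + E [Y_{1, g_{M ∣ 1, W, s}^{*}} - Y_{1, g_{M ∣ 0, W, s}^{*}} ∣ S = 0] = E [Y_{1, g_{M ∣ 1, W, s}^{*}} - Y_{0, g_{M ∣ 0, W, s}^{*}} ∣ S = 0] .

This is a population-level identity for the stochastic contrasts only; it should not be read as an individual-level decomposition of the average treatment effect.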

We note that $g_{M ∣ a^{*}, W, s}^{*} (M ∣ W)$ represents any stochastic intervention, which can include stochastic draws from the true, unknown models, but can also include a data-dependent version, estimated from observed data distributions, which we denote $g_{M, n ∣ a^{*}, W, s}^{*} (M ∣ W)$ .

4. Identifiability

To identify the transported stochastic direct effect and transported stochastic indirect effect we will need to impose additional assumptions on $M^{F}$ , which implies models $M_{I}$ and $M_{I I}$ . Again, we focus our consideration on $M_{I}$ , so we enumerate these additional assumptions in relation to $M_{I}$ , but note that rewriting them under $M_{I I}$ is trivial.

1. Positivity: For all S and W we need a positive probability of assigning any level of A. For all combinations of S, W, and A = a, we need a positive probability of any level of Z. For S = 1 and all combinations of Z and W we need a positive probability of any level of the mediator, M.
2. Common outcome model across sites: $E [Y ∣ M, Z, W, S = 1] = E [Y ∣ M, Z, W, S = 0]$ .
3. Sequential Randomization: Y_am ⊥ A | W, S, Y_am ⊥ M | W, Z, S, and M_a ⊥ A | W, S. This is the usual sequential randomization assumption for a two-time point longitudinal intervention where at the first time point, we statically intervene to set the treatment, A = a, and at the second time point, we stochastically intervene on the mediator, M. If we use data-dependent $g_{M, n ∣ a^{*}, W, s}^{*}$ , we no longer need that M_a ⊥ A | W, S because $g_{M, n ∣ a^{*}, W, s}^{*}$ is assumed to be known.

Theorem 4.1. Given the above assumptions, we can establish the following identification:

Ψ (P) = Ψ^{F} (P_{U O}) = E [E [E_{g_{M ∣ a^{*}, W, s}^{*}} [E [Y ∣ W, Z, M, S = 1] ∣ W, Z] ∣ W, A = a, S = 0] ∣ S = 0]

= E [E [\sum_{m} E [Y ∣ M = m, W, Z, A = a, S = 1] g_{M ∣ a^{*}, W, s}^{*} (m ∣ W) ∣ A = a, W, S = 0] ∣ S = 0] .

The proof is in the Supporting Information. In words, first, one computes the conditional mean of Y among those with S = 1, then integrates out M under the stochastic intervention $g_{M ∣ a^{*}, W, s}^{*}$ . One then integrates out Z, setting A = a by computing the mean of the resulting object conditional on W, S, and A = a. Marginalizing over the distribution of W among those with S = 0 gives the statistical parameter.
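
The three steps described in words above can be sketched as a plug-in computation for binary M and Z. This is a minimal illustration, not the authors' implementation: the callables `q_y`, `g_star_m1`, and `p_z1_target` are hypothetical stand-ins for fitted nuisance functions, and W is treated as a single value per unit for brevity.

```python
def plug_in_psi(w_target, a, q_y, g_star_m1, p_z1_target):
    """Plug-in value of the identified parameter for binary M and Z.

    w_target          : covariate values W observed among units with S = 0
    a                 : fixed treatment value (0 or 1)
    q_y(m, z, w)      : hypothetical estimate of E[Y | M=m, Z=z, W=w, S=1]
    g_star_m1(w)      : hypothetical g*(M=1 | W=w) for the chosen a* and s
    p_z1_target(a, w) : hypothetical P(Z=1 | A=a, W=w, S=0)
    """
    total = 0.0
    for w in w_target:
        g1 = g_star_m1(w)
        pz1 = p_z1_target(a, w)
        value = 0.0
        for z, pz in ((1, pz1), (0, 1.0 - pz1)):
            # Step 1-2: integrate M out of the outcome regression under g*.
            q_m = q_y(1, z, w) * g1 + q_y(0, z, w) * (1.0 - g1)
            # Step 3: integrate Z out given A = a in the target site.
            value += pz * q_m
        total += value
    # Final step: average over the empirical distribution of W among S = 0.
    return total / len(w_target)


# Illustrative call with made-up nuisance functions (assumptions, not estimates).
psi_example = plug_in_psi(
    w_target=[0, 1, 1],
    a=1,
    q_y=lambda m, z, w: 0.2 + 0.5 * m - 0.1 * z,
    g_star_m1=lambda w: 0.3 + 0.2 * w,
    p_z1_target=lambda a, w: 0.4,
)
```

Passing the nuisance functions as arguments keeps the sketch agnostic about how they are fitted (regression or machine learning).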

We note that this identification applies for either a fixed parameter that assumes an unknown, true $g_{M ∣ a^{*}, W, s}^{*}$ or data-dependent parameter that assumes a known $g_{M, n ∣ a^{*}, W, s}^{*}$ , estimated from the observed data with the difference being in the sequential randomization assumptions noted in number 3 above.

In the sections that follow, we focus on estimation of the data-dependent target parameter under $g_{M, n ∣ a^{*}, W, s}^{*}$ . The intervention $g_{M ∣ a^{*}, W, s}^{*} (M ∣ W)$ can be expressed as $g_{M ∣ a^{*}, W, s}^{*} (M ∣ W) = \sum_{z = 0}^{1} P (M = m ∣ Z = z, W, S = s) P (Z = z ∣ A = a^{*}, W, S = s)$ . It can be estimated from the observed data as follows. P(M = m|Z = z, W, S = s) can be estimated using a logistic regression of M on Z, W, and S, obtaining predicted probabilities for M = m setting S = s and separately setting Z = 1 and Z = 0. P(Z = z|A = a*,W,S = s) can be estimated using a logistic regression of Z on A, W, and S, obtaining predicted probabilities for Z = 1 and for Z = 0, setting A = a* and S = s and using observed values for W.
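
A minimal sketch of this marginalization for binary M and Z follows. The functions `p_m1` and `p_z1` are hypothetical placeholders for the predicted probabilities from the two logistic regressions; the coefficients used in the example are made up for illustration and are not estimates from any data.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def g_star(m, w, a_star, s, p_m1, p_z1):
    """Stochastic intervention g*(M = m | W = w) for fixed a* and s.

    p_m1(z, w, s): hypothetical fitted P(M = 1 | Z = z, W = w, S = s)
    p_z1(a, w, s): hypothetical fitted P(Z = 1 | A = a, W = w, S = s)
    """
    total = 0.0
    for z in (0, 1):
        pm1 = p_m1(z, w, s)
        pz1 = p_z1(a_star, w, s)
        prob_m = pm1 if m == 1 else 1.0 - pm1
        prob_z = pz1 if z == 1 else 1.0 - pz1
        total += prob_m * prob_z  # sum over z of P(M=m | z, w, s) P(Z=z | a*, w, s)
    return total


# Made-up logistic "fits" used only to exercise the function.
p_m1_demo = lambda z, w, s: expit(-1.0 + 1.5 * z + 0.5 * w)
p_z1_demo = lambda a, w, s: expit(-0.5 + 1.0 * a + 0.3 * w - 0.2 * s)

g1 = g_star(1, w=1, a_star=0, s=0, p_m1=p_m1_demo, p_z1=p_z1_demo)
g0 = g_star(0, w=1, a_star=0, s=0, p_m1=p_m1_demo, p_z1=p_z1_demo)
```

Because the sum runs over both levels of Z, the two values `g1` and `g0` form a proper distribution over M for each W.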

5. Estimating equation estimator

Next, we describe two estimating equation (EE) estimators of Ψ(P) under data-dependent $g_{M, n ∣ a^{*}, W, s}^{*} (M ∣ W)$ : 1) one that incorporates the exclusion restrictions on our statistical model, $M_{I}$ , that there is no direct effect of A on M or of A on Y, and 2) another that does not impose those restrictions under $M_{I I}$ . A link to R code to implement these estimators is provided in the Supporting Information section following the references. We describe model fitting using regression language for simplicity but note that machine learning can be used instead. For machine learning to be used, we would need to guarantee our nuisance parameter fits that define the influence curve fall into a Donsker class. This condition is satisfied if using standard regression techniques such as logistic regression. We also need to satisfy a second order remainder equal to $o_{P} (1 / \sqrt{n})$ (13). We include discussion and a theorem in Section 2.3 of the Supporting Information. By using the highly adaptive lasso (HAL), we satisfy the Donsker condition with probability tending to 1 (12). As an alternative to the Donsker condition, we can use sample splitting (32), which we also describe in more detail in Section 2.3 in the Supporting Information.

5.1. Estimator incorporating exclusion restrictions under $M_{I}$

This EE estimator solves the efficient influence curve (EIC) equation of the parameter Ψ(P) for $M_{I}$ where M and Y do not depend directly on A and for a data-dependent $g_{M, n ∣ a^{*}, W, s}^{*}$ that is considered known and estimated from the observed data. An EIC is defined in terms of the statistical model (e.g., $M_{I}$ ) and target parameter, Ψ, as the canonical gradient of the pathwise derivative of Ψ at P along each possible submodel of $M_{I}$ through P. The canonical gradient is a function of O and depends on P. An estimator, Ψ_n, is asymptotically linear if $\sqrt{n} (Ψ_{n} - Ψ (P_{0})) = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} D (P_{0}) (O_{i}) + o_{p} (1)$ , where D(P₀) is a mean 0 function called the influence curve of the estimator. If a regular asymptotically linear (RAL) estimator has influence curve equal to the EIC, then the estimator is asymptotically efficient, meaning it is of minimum variance for a RAL estimator.

The EIC of this parameter is given by

D_{I} (P) = D_{Y, I} (P) + D_{Z, I} (P) + D_{W, I} (P), where

D_{Y, I} (P) = (Y - {\bar{Q}}_{Y} (M, Z, W)) \times \frac{g_{M, n ∣ a^{*}, W, s}^{*} (M ∣ W) p_{Z} (Z ∣ A = a, W, S = 0) p_{W} (S = 0 ∣ W) I (S = 1)}{g_{M} (M ∣ Z, W, S = 1) p_{Z} (Z ∣ W, S = 1) p_{W} (S = 1 ∣ W) P_{S} (S = 0)},

(2)

D_{Z, I} (P) = ({\bar{Q}}_{M} (Z, W, S) - {\bar{Q}}_{Z} (a, W, S)) \frac{I (S = 0, A = a)}{p_{A} (a ∣ W, S) p_{S} (S = 0)}, and

D_{W, I} (P) = ({\bar{Q}}_{Z} (a, W, S) - Ψ (P)) \frac{I (S = 0)}{p_{S} (S = 0)},

where subscript I indicates that this is the canonical gradient under $M_{I}$ and additional notation is explained further below.

We estimate D_Y,I(P) as follows. First, we estimate $E [Y ∣ M, Z, W]$ as ${\bar{Q}}_{Y, n} (M, Z, W)$ , using predicted values from a regression of Y on M, Z, W among those with S = 1. p_Z(Z|W,S = 1) can be expressed as $p_{Z} (Z ∣ W, S = 1) = \sum_{a = 0}^{1} P (Z = z ∣ A = a, W, S = 1) P (A = a ∣ W, S = 1)$ . P(A = a|W, S = 1) can be estimated using a logistic regression of A on W and S, obtaining predicted probabilities for A = 1 and A = 0 setting S = 1 and using observed values for W. p_W (S = s|W) can be estimated using a logistic regression of S on W, obtaining predicted probabilities for S = 1 and S = 0. P_S(S = 0) can be estimated as the empirical proportion of observations with S = 0, that is, one minus the empirical mean of S.

We next estimate D_Z,I(P). To do so, we apply the stochastic intervention on ${\bar{Q}}_{Y, n} (M, Z, W)$ via the computation ${\bar{Q}}_{M, n} (Z, W, S) = E_{g_{M, n ∣ a^{*}, W, s}^{*}} [{\bar{Q}}_{Y, n} (M, Z, W) ∣ Z, W, S]$ . Specifically, we can generate predicted values of ${\bar{Q}}_{Y, n} (1, Z, W)$ and ${\bar{Q}}_{Y, n} (0, Z, W)$ and then marginalize over $g_{M, n ∣ a^{*}, W, s}^{*} (M ∣ W) : \sum_{m = 0}^{1} {\bar{Q}}_{Y, n} (m, Z, W) g_{M, n ∣ a^{*}, W, s}^{*} (m ∣ W)$ . Next, we estimate ${\bar{Q}}_{Z, n} (a, W, S)$ by regressing ${\bar{Q}}_{M, n} (Z, W, S)$ on A, W, and S, and getting predicted values setting A = a.

The EE estimator is given by

Ψ_{n} = \frac{1}{n} \sum_{i = 1}^{n} (D_{Y, I, n} (O_{i}) + D_{Z, I, n} (O_{i}) + {\bar{Q}}_{Z, i, n} (a, W_{i}, S_{i}) \frac{I (S_{i} = 0)}{p_{S, n} (S = 0)}),

where D_Y,I and D_Z,I are given in Equation 2 and are estimated as described above.
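
To make the assembly of the EE estimator under $M_{I}$ concrete, the following sketch combines already-estimated nuisance values row by row. It is an illustration under stated assumptions rather than the authors' R code: the `Row` container and its field names are hypothetical, and `h_y` is assumed to hold the estimated ratio from Equation 2 without the indicator I(S = 1) and without the factor 1 / P_S(S = 0), which the function applies itself.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Row:
    s: int              # site indicator (1 = source, 0 = target)
    a: int              # observed treatment
    y: Optional[float]  # outcome, observed only when s == 1
    h_y: float          # estimated density ratio in D_Y (see lead-in)
    q_y: float          # Q_Y,n(M_i, Z_i, W_i)
    q_m: float          # Q_M,n(Z_i, W_i, S_i)
    q_z: float          # Q_Z,n(a, W_i, S_i)
    p_a: float          # estimated p_A(a | W_i, S_i) for the fixed a


def ee_estimate(rows: Sequence[Row], a: int) -> float:
    """Average of D_Y + D_Z + Q_Z * I(S = 0) / P(S = 0) over all rows."""
    n = len(rows)
    p_s0 = sum(1 for r in rows if r.s == 0) / n  # empirical P(S = 0)
    total = 0.0
    for r in rows:
        # Outcome-residual term: contributes only in the source site.
        d_y = (r.y - r.q_y) * r.h_y / p_s0 if r.s == 1 else 0.0
        # Z-stage term: contributes only for target-site rows with A = a.
        d_z = (r.q_m - r.q_z) / (r.p_a * p_s0) if (r.s == 0 and r.a == a) else 0.0
        # Plug-in term: Q_Z averaged over the target site.
        w_term = r.q_z / p_s0 if r.s == 0 else 0.0
        total += d_y + d_z + w_term
    return total / n


demo_rows = [
    Row(s=1, a=1, y=1.0, h_y=0.5, q_y=0.6, q_m=0.5, q_z=0.5, p_a=0.5),
    Row(s=1, a=0, y=0.0, h_y=0.5, q_y=0.4, q_m=0.5, q_z=0.5, p_a=0.5),
    Row(s=0, a=1, y=None, h_y=0.0, q_y=0.5, q_m=0.3, q_z=0.2, p_a=0.5),
    Row(s=0, a=0, y=None, h_y=0.0, q_y=0.5, q_m=0.5, q_z=0.4, p_a=0.5),
]
psi_ee = ee_estimate(demo_rows, a=1)
```

When both residual terms are zero, the function reduces to the mean of Q_Z among rows with S = 0, which is the plug-in part of the estimator.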

The transported stochastic direct effect (SDE) entails setting a* to 0 and taking the difference in Ψ(P) setting a to 1 versus setting a to 0. The corresponding SDE EIC is the difference between the EIC for the parameter defined by setting a* = 0, a = 1 and the EIC for the parameter defined by setting a* = 0, a = 0. The transported stochastic indirect effect (SIE) entails setting a = 1 and then taking the difference in Ψ(P) setting a* = 1 versus a* = 0. The corresponding SIE EIC is again the difference between the EIC for the parameter defined by setting a* = 1, a = 1 and the EIC for the parameter defined by setting a* = 0, a = 1.

We estimate the variance as the sample variance of the EIC estimate. The conditions for consistency and asymptotic efficiency of this estimator are discussed in the Supporting Information.
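
The contrast and its standard error can be sketched as follows. This assumes the usual influence-curve convention in which the variance of the estimator is the sample variance of the estimated EIC divided by n, and it uses a Wald-type 95% interval; the function name `effect_with_ci` and the input lists are hypothetical.

```python
import math
import statistics


def effect_with_ci(psi_first, psi_second, eic_first, eic_second, z=1.96):
    """Effect estimate, standard error, and Wald interval for a contrast.

    psi_first, psi_second : estimates of Psi(P) under the two (a, a*) settings
    eic_first, eic_second : per-observation estimated EIC values, same order
    """
    n = len(eic_first)
    # The EIC of a difference is the difference of the EICs.
    eic_diff = [d1 - d2 for d1, d2 in zip(eic_first, eic_second)]
    estimate = psi_first - psi_second
    # Assumed convention: Var(estimate) ~ sample variance of the EIC / n.
    se = math.sqrt(statistics.variance(eic_diff) / n)
    return estimate, se, (estimate - z * se, estimate + z * se)


# SDE: (a = 1, a* = 0) minus (a = 0, a* = 0); the inputs here are made up.
sde, sde_se, sde_ci = effect_with_ci(
    0.5, 0.2, [1.0, -1.0, 1.0, -1.0], [0.0, 0.0, 0.0, 0.0]
)
```

The same function gives the SIE by passing the estimates and EICs for (a = 1, a* = 1) and (a = 1, a* = 0).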

5.2. Estimator not imposing exclusion restrictions

This EE estimator solves the EIC equation of the parameter Ψ(P) for $M_{I I}$ where M and Y may depend directly on A. It will be inefficient under $M_{I}$ where M and Y do not depend directly on A. This EIC is given by

D_{I I} (P) = D_{Y, I I} (P) + D_{Z, I I} (P) + D_{W, I I} (P), where

D_{Y, I I} (P) = (Y - {\bar{Q}}_{Y} (M, Z, A, W)) \times \frac{g_{M, n ∣ a^{*}, W, s}^{*} (M ∣ W) p_{Z} (Z ∣ A = a, W, S = 0)}{g_{M} (M ∣ Z, A, W, S = 1) p_{Z} (Z ∣ A = a, W, S = 1)} \times \frac{p_{W} (S = 0 ∣ W) I (S = 1, A = a)}{p_{A} (a ∣ W, S = 1) p_{W} (S = 1 ∣ W) P_{S} (S = 0)},

(3)

D_{Z, I I} (P) = ({\bar{Q}}_{M} (Z, A, W, S) - {\bar{Q}}_{Z} (a, W, S)) \frac{I (S = 0, A = a)}{p_{A} (a ∣ W, S) p_{S} (S = 0)}, and

D_{W, I I} (P) = ({\bar{Q}}_{Z} (a, W, S) - Ψ (P)) \frac{I (S = 0)}{p_{S} (S = 0)},

where subscript II indicates that this is the EIC under $M_{I I}$ and additional notation is explained further below. We estimate D_Y,II(P) as follows. First, we estimate $E [Y ∣ M, Z, A, W]$ as ${\bar{Q}}_{Y, n} (M, Z, A, W)$ , using predicted values from a regression of Y on M, Z, A, W among those with S = 1. Each of the probabilities in D_Y,II(P) can be estimated using predicted probabilities from logistic regressions for M, Z, A, and S given the relevant conditioning variables. $g_{M, n ∣ a^{*}, W, s}^{*} (M ∣ W)$ can be estimated as described in Section 4.

We next estimate D_Z,II(P). To do so, we apply the stochastic intervention on ${\bar{Q}}_{Y, n} (M, Z, A, W)$ via the computation ${\bar{Q}}_{M, n} (Z, A, W, S) = E_{g_{M, n ∣ a^{*}, W, s}^{*}} [{\bar{Q}}_{Y, n} (M, Z, A, W) ∣ Z, A, W, S]$ . Specifically, we can generate predicted values of ${\bar{Q}}_{Y, n} (1, Z, A, W)$ and ${\bar{Q}}_{Y, n} (0, Z, A, W)$ and then marginalize over $g_{M, n ∣ a^{*}, W, s}^{*} (M ∣ W) : \sum_{m = 0}^{1} {\bar{Q}}_{Y, n} (m, Z, A, W) g_{M, n ∣ a^{*}, W, s}^{*} (m ∣ W)$ . Next, we estimate ${\bar{Q}}_{Z, n} (a, W, S)$ by regressing ${\bar{Q}}_{M, n} (Z, A, W, S)$ on A, W, and S, and getting predicted values setting A = a.

This EE estimator is given by

Ψ_{n} = \frac{1}{n} \sum_{i = 1}^{n} (D_{Y, I I, n} (O_{i}) + D_{Z, I I, n} (O_{i}) + {\bar{Q}}_{Z, i, n} (a, W_{i}, S_{i}) \frac{I (S_{i} = 0)}{p_{S, n} (S = 0)}),

where D_II,n(O) is the estimating function with components of D_II(P) (given in Equation 3) that are estimated as described above. The transported SDE and SIE, and their corresponding EICs and standard errors can be estimated as described in Section 5.1. The variance is estimated as the sample variance of the EIC estimate. The conditions for consistency and asymptotic efficiency of this estimator are discussed in the Supporting Information.

6. Targeted minimum loss-based estimator

We now describe how to estimate Ψ(P) using targeted minimum loss-based estimation (TMLE). This estimation approach, which is just one of several TMLE approaches that could be used, is a substitution estimator that uses sequential regression, updating the conditional outcome model at each stage both to solve the EIC equation and to lower the empirical negative log-likelihood loss. The process is identical to that for a two time-point longitudinal intervention (27). Similar to the EE estimator section, we describe two TMLE estimators of Ψ(P): 1) one that incorporates the exclusion restrictions in $M_{I}$ , that there is no direct effect of A on M or of A on Y, and 2) another under model $M_{I I}$ that does not impose those restrictions. A link to R code to implement these estimators is provided in the Supporting Information section following the references. As in the previous sections, we describe model fitting using regression language for simplicity but note that machine learning can be used instead.

6.1. Estimator incorporating exclusion restrictions

This TMLE is a plug-in estimator for parameter Ψ(P) under the restricted model $M_{I}$ , the EIC of which is given in Equation 2.

Let ${\bar{Q}}_{Y, n}^{0} (M, Z, W)$ be an initial estimate of $E [Y ∣ M, Z, W]$ . ${\bar{Q}}_{Y, n}^{0} (M, Z, W)$ can be estimated by predicted values from a regression of Y on M, Z, W among those with S = 1.

Next, we update that initial estimate using the weights

H (M, Z, W, S) = \frac{g_{M, n ∣ a^{*}, W, s}^{*} (M ∣ W) p_{Z} (Z ∣ A = a, W, S = 0) p_{W} (S = 0 ∣ W) I (S = 1)}{g_{M} (M ∣ Z, W, S = 1) p_{Z} (Z ∣ W, S = 1) p_{W} (S = 1 ∣ W) P_{S} (S = 0)},

(4)

which are estimated with H_n(M, Z, W, S). ${\bar{Q}}_{Y, n}^{0} (M, Z, W)$ is updated by performing a weighted parametric logistic regression of Y with $logit ({\bar{Q}}_{Y, n}^{0} (M, Z, W))$ as an offset, intercept ϵ_Y, and weights H_n(M, Z, W, S). ϵ_Y,n is the MLE fit of intercept ϵ_Y. The update is given by ${\bar{Q}}_{Y, n}^{*} (M, Z, W) = {\bar{Q}}_{Y, n}^{0} (ϵ_{Y, n}) (M, Z, W)$ .
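
The updating step can be illustrated with a small sketch of a weighted, intercept-only logistic regression with an offset, solved by Newton's method. This is an assumption-laden illustration, not the authors' code: the function names are hypothetical, the inputs are restricted to rows with S = 1 (where Y is observed), and initial predictions are bounded away from 0 and 1 so that the logit is finite.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p, eps=1e-6):
    p = min(max(p, eps), 1.0 - eps)  # bound away from 0 and 1
    return math.log(p / (1.0 - p))


def fit_epsilon(y, q0, weights, tol=1e-10, max_iter=100):
    """MLE of the intercept in a weighted logistic regression with offset logit(q0)."""
    offsets = [logit(q) for q in q0]
    eps = 0.0
    for _ in range(max_iter):
        preds = [expit(o + eps) for o in offsets]
        # Weighted score and information for the single intercept parameter.
        score = sum(w * (yi - p) for w, yi, p in zip(weights, y, preds))
        info = sum(w * p * (1.0 - p) for w, p in zip(weights, preds))
        if info == 0.0:
            break
        step = score / info
        eps += step
        if abs(step) < tol:
            break
    return eps


def update(q0, eps):
    """Updated predictions expit(logit(q0) + eps)."""
    return [expit(logit(q) + eps) for q in q0]


# Toy inputs: four source-site rows with equal weights and a flat initial fit.
y_demo = [1.0, 0.0, 1.0, 1.0]
q0_demo = [0.5, 0.5, 0.5, 0.5]
w_demo = [1.0, 1.0, 1.0, 1.0]
eps_y = fit_epsilon(y_demo, q0_demo, w_demo)
q_star = update(q0_demo, eps_y)
```

At the fitted intercept the weighted residuals sum to zero, which is the property that makes the updated fit solve the corresponding component of the EIC equation. The same routine could be reused for the second update, with the stochastically intervened predictions as the outcome.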

We then perform the stochastic intervention on ${\bar{Q}}_{Y, n}^{*} (M, Z, W)$ via the computation ${\bar{Q}}_{M, n}^{*} (Z, W, S) = E_{g_{M, n ∣ a^{*}, W, s}^{*}} [{\bar{Q}}_{Y, n}^{*} (M, Z, W) ∣ Z, W, S]$ . This can be done by generating predicted values of ${\bar{Q}}_{Y, n}^{*} (1, Z, W)$ and ${\bar{Q}}_{Y, n}^{*} (0, Z, W)$ and then marginalizing over $g_{M, n ∣ a^{*}, W, s}^{*} (M ∣ W) : \sum_{m = 0}^{1} {\bar{Q}}_{Y, n}^{*} (m, Z, W) g_{M, n ∣ a^{*}, W, s}^{*} (m ∣ W)$ .

Next, we estimate ${\bar{Q}}_{Z, n}^{0} (a, W, S)$ by regressing ${\bar{Q}}_{M, n}^{*} (Z, W, S)$ on A, W and S, and getting predicted values setting A = a. We then update this initial estimate using a second set of weights,

H_{a} (a, W, S) = \frac{I (S = 0, A = a)}{p_{A} (a ∣ W, S) p_{S} (S = 0)},

(5)

in a weighted logistic regression of ${\bar{Q}}_{M, n}^{*} (Z, W, S)$ with $logit ({\bar{Q}}_{Z, n}^{0} (a, W, S))$ as an offset and intercept ϵ_Z. ϵ_Z,n is the MLE fit of intercept ϵ_Z. The updated estimate is denoted ${\bar{Q}}_{Z, n}^{*} (a, W, S) = {\bar{Q}}_{Z, n}^{0} (ϵ_{Z, n}) (a, W, S)$ .

The empirical mean of ${\bar{Q}}_{Z, n}^{*} (a, W, S)$ among those for whom S = 0 is the TMLE estimate of Ψ(P). It solves $\frac{1}{n} \sum_{i = 1}^{n} D_{I, n}^{*} (O_{i}) = 0$ . The TMLE updating steps also decrease the empirical loss of the model fits. The variance of the TMLE estimate is estimated as the sample variance of the estimated EIC. The transported SDE and SIE, and their corresponding EICs and standard errors, can be estimated as described in Section 5.1. The conditions for consistency and asymptotic efficiency of this estimator are discussed in the Supporting Information.

6.2. Estimator not imposing exclusion restrictions

This TMLE is a plug-in estimator for parameter Ψ(P) under the unrestricted model $M_{I I}$ , the EIC of which is given in Equation 3. It will be inefficient under $M_{I}$ where M and Y do not depend directly on A.

Let ${\bar{Q}}_{Y, n}^{0} (M, Z, A, W)$ be an initial estimate of $E [Y ∣ M, Z, A, W]$ . ${\bar{Q}}_{Y, n}^{0} (M, Z, A, W)$ can be estimated by predicted values from a regression of Y on M, Z, A, W among those with S = 1.

Next, we update that initial estimate using the weights,

H (M, Z, A, W, S) = \frac{g_{M, n ∣ a^{*}, W, s}^{*} (M ∣ W) p_{Z} (Z ∣ A = a, W, S = 0) p_{W} (S = 0 ∣ W) I (S = 1, A = a)}{g_{M} (M ∣ Z, A, W, S = 1) p_{Z} (Z ∣ A = a, W, S = 1) p_{A} (a ∣ W, S = 1) p_{W} (S = 1 ∣ W) P_{S} (S = 0)},

(6)

which are estimated with H_n(M, Z, A, W, S). ${\bar{Q}}_{Y, n}^{0} (M, Z, A, W)$ is updated by performing a weighted parametric logistic regression of Y with $logit ({\bar{Q}}_{Y, n}^{0} (M, Z, A, W))$ as an offset, intercept ϵ_Y, and weights H_n(M, Z, A, W, S). ϵ_Y,n is the MLE fit of intercept ϵ_Y. The update is given by ${\bar{Q}}_{Y, n}^{*} (M, Z, A, W) = {\bar{Q}}_{Y, n}^{0} (ϵ_{Y, n}) (M, Z, A, W)$ .

We then perform the stochastic intervention on ${\bar{Q}}_{Y, n}^{*} (M, Z, A, W)$ via the computation ${\bar{Q}}_{M, n}^{*} (Z, A, W, S) = E_{g_{M, n ∣ a^{*}, W, s}^{*}} [{\bar{Q}}_{Y, n}^{*} (M, Z, A, W) ∣ Z, A, W, S]$ . This can be done by generating predicted values of ${\bar{Q}}_{Y, n}^{*} (1, Z, A, W)$ and ${\bar{Q}}_{Y, n}^{*} (0, Z, A, W)$ and then marginalizing over $g_{M, n ∣ a^{*}, W, s}^{*} (M ∣ W) : \sum_{m = 0}^{1} {\bar{Q}}_{Y, n}^{*} (m, Z, A, W) g_{M, n ∣ a^{*}, W, s}^{*} (m ∣ W)$ .

Next, we estimate ${\bar{Q}}_{Z, n}^{0} (a, W, S)$ by regressing ${\bar{Q}}_{M, n}^{*} (Z, A, W, S)$ on A, W, and S, and getting predicted values setting A = a. The remainder of the steps for this TMLE are identical to those for the restricted TMLE in the above subsection.

7. Simulation

7.1. Overview

We compare finite sample performance of our estimators in estimating the data-dependent transported SDE and transported SIE using simulation under $M_{I}$ . We include the 1) versions that are efficient under $M_{I}$ (henceforth TMLE efficient and EE efficient) and 2) versions that allow for the exclusion restriction not to hold but are inefficient under $M_{I}$ (henceforth TMLE general and EE general). We show estimator performance in terms of absolute bias, efficiency, 95% confidence interval (CI) coverage, root mean squared error (RMSE), and percent of estimates lying outside the bounds of the parameter space across 1,000 simulations. The purpose of this last performance metric is to highlight the advantage of TMLE being a substitution estimator and hence its resultant estimates always lying in the parameter space. For calculating the efficiency and the 95% CI coverage, we use both the EIC and 500 bootstrap replicates. Because our parameter of interest is data-dependent, for each simulated dataset, we recompute the truth, which is then used in calculating the performance measures.
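
A sketch of how such performance measures might be computed across simulation replicates is given below. The exact definitions used in the paper are not restated in this section, so the choices here are assumptions: bias is taken as the absolute value of the mean error against the replicate-specific truth, coverage uses Wald-type intervals, and the default parameter bounds of [-1, 1] assume a difference in means of a binary outcome.

```python
import math


def summarize(estimates, truths, std_errors, bounds=(-1.0, 1.0), z=1.96):
    """Summaries over simulation replicates; truths are recomputed per replicate."""
    n = len(estimates)
    errors = [e - t for e, t in zip(estimates, truths)]
    bias = abs(sum(errors) / n)
    rmse = math.sqrt(sum(err * err for err in errors) / n)
    covered = sum(
        1
        for e, t, se in zip(estimates, truths, std_errors)
        if e - z * se <= t <= e + z * se
    )
    out_of_bounds = sum(1 for e in estimates if not bounds[0] <= e <= bounds[1])
    return {
        "bias": bias,
        "rmse": rmse,
        "coverage": covered / n,
        "pct_out_of_bounds": 100.0 * out_of_bounds / n,
    }


# Three made-up replicates, only to exercise the function.
summary = summarize(
    estimates=[0.1, 0.3, 1.2],
    truths=[0.2, 0.2, 0.2],
    std_errors=[0.1, 0.01, 1.0],
)
```

An estimating equation estimator can return values such as 1.2 for a risk difference, whereas a substitution estimator cannot, which is what the last metric is meant to expose.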

We consider two data-generating mechanisms (DGMs) within the structural equations model described in Section 2. The DGMs are detailed in Table 1 using the same notation as in Section 2.

Table 1:

Simulation data-generating mechanisms.

Data Generating Mechanism 1
W₁ ~ bernoulli	P (W₁ = 1) = 0.5
W₂ ~ bernoulli	P (W₂ = 1) = expit(0.4 + 0.2W₁)
S ~ bernoulli	P (S = 1) = expit(3W₂ − 1)
A ~ bernoulli	P (A = 1) = 0.5
Z ~ bernoulli	P (Z = 1) = expit(−0.1A + −0.2S + 0.2W₂ +5AW₂ +0.14AS + 0.2W₂S − 0.2AW₂S − 1)
M ~ bernoulli	P (M = 1) = expit(1Z + 3ZW₂ +0.2ZS − 0.2W₂S + 2W₂Z + 0.2S − 0.2ZW₂S − W₂ − 2)
Y ~ bernoulli	P (Y = 1) = expit(−6Z + 0.2ZW₂ + 2ZM + 2W₂M − 2W₂ +4M + 1ZW₂M − 0.2)
Data Generating Mechanism 2
W₁ ~ bernoulli	P (W₁ = 1) = 0.5
W₂ ~ bernoulli	P (W₂ = 1) = expit(0.4 + 0.2W₁)
S ~ bernoulli	P (S = 1) = expit(3W₂ − 1)
A ~ bernoulli	P (A = 1) = 0.5
Z ~ bernoulli	P (Z = 1) = expit(−3A + −0.2S + 2W₂ +0.2AW₂ − 0.2AS + 0.2W₂S + 2AW₂S − 0.2)
M ~ bernoulli	P (M = 1) = expit(1Z + 6W₂Z − 2W₂ − 2)
Y ~ bernoulli	P (Y = 1) = expit(log(1.2) + log(40)Z − log(30)M − log(1.2)W₂ − log(40)W₂Z)
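
For readers who want to experiment with these mechanisms, the following is a sketch of DGM 1 in Python using only the standard library. The coefficients are transcribed from Table 1 as printed, including the two separate Z·W₂ terms in the M model. The function names and the seed handling are illustrative choices and are not taken from the authors' R code.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_dgm1(n, seed=0):
    """Draw n observations (W1, W2, S, A, Z, M, Y) from DGM 1 of Table 1."""
    rng = random.Random(seed)

    def bern(p):
        return 1 if rng.random() < p else 0

    rows = []
    for _ in range(n):
        w1 = bern(0.5)
        w2 = bern(expit(0.4 + 0.2 * w1))
        s = bern(expit(3 * w2 - 1))
        a = bern(0.5)
        z = bern(expit(-0.1 * a - 0.2 * s + 0.2 * w2 + 5 * a * w2 + 0.14 * a * s
                       + 0.2 * w2 * s - 0.2 * a * w2 * s - 1))
        m = bern(expit(1 * z + 3 * z * w2 + 0.2 * z * s - 0.2 * w2 * s + 2 * w2 * z
                       + 0.2 * s - 0.2 * z * w2 * s - w2 - 2))
        y = bern(expit(-6 * z + 0.2 * z * w2 + 2 * z * m + 2 * w2 * m - 2 * w2
                       + 4 * m + 1 * z * w2 * m - 0.2))
        rows.append({"W1": w1, "W2": w2, "S": s, "A": a, "Z": z, "M": m, "Y": y})
    return rows


sim = simulate_dgm1(5000, seed=1)
```

Note that A does not enter the M or Y equations, which is consistent with the exclusion restrictions of $M_{I}$ .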


7.2. Results

First, in Table 2, we show results under correct specification of all models for sample sizes of N=5,000, N=500, and N=100 using DGM 1. Results using alternative DGMs were similar and are shown in the Supporting Information.

Table 2:

Simulation results comparing estimators of the transported stochastic direct effect and transported stochastic indirect effect under DGM 1 and correct model specification for various sample sizes. 1,000 simulations. Estimation methods compared include solving the estimating equation (EE) and targeted minimum loss-based estimation (TMLE). We compare versions of the estimators that incorporate the exclusion restrictions in our statistical model (TMLE efficient, EE efficient) and versions that do not (TMLE general, EE general). Efficiency and 95% CI coverage are estimated separately using 1) the influence curve (IC) and 2) bootstrapping (boot). Bias and RMSE values are averages across the simulations.

Estimator	Bias	Efficiency (IC)	Efficiency (Boot)	95% CI Coverage (IC)	95% CI Coverage (Boot)	RMSE	% Out of Bounds
Transported stochastic direct effect
N=5000
TMLE efficient	0.000	100.17	100.42	0.955	0.957	0.008	0
TMLE general	0.001	321.02	321.92	0.945	0.942	0.027	0
EE efficient	0.000	100.21	100.42	0.955	0.957	0.008	0
EE general	0.001	321.27	321.00	0.946	0.942	0.027	0
N=500
TMLE efficient	0.000	101.37	105.16	0.946	0.957	0.028	0
TMLE general	−0.004	319.08	327.43	0.929	0.936	0.089	0
EE efficient	0.000	101.40	104.51	0.946	0.957	0.028	0
EE general	−0.004	321.97	316.56	0.947	0.935	0.088	0
N=100
TMLE efficient	0.003	102.26	139.52	0.970	0.992	0.064	0
TMLE general	0.007	293.07	375.10	0.852	0.945	0.218	0
EE efficient	0.004	102.13	118.65	0.974	0.989	0.060	0
EE general	0.005	316.03	292.87	0.957	0.930	0.186	0
Transported stochastic indirect effect
N=5000
TMLE efficient	0.000	99.98	100.46	0.942	0.942	0.004	0
TMLE general	0.000	101.75	101.99	0.942	0.942	0.004	0
EE efficient	0.000	100.01	100.46	0.942	0.942	0.004	0
EE general	0.000	101.69	101.95	0.942	0.943	0.004	0
N=500
TMLE efficient	−0.001	99.56	104.59	0.926	0.929	0.012	0
TMLE general	−0.001	101.74	106.99	0.928	0.935	0.012	0
EE efficient	−0.001	99.58	101.11	0.926	0.929	0.012	0
EE general	−0.001	101.94	103.39	0.932	0.930	0.012	0
N=100
TMLE efficient	−0.003	93.01	132.95	0.878	0.944	0.033	0
TMLE general	−0.001	99.85	158.57	0.865	0.952	0.041	0
EE efficient	−0.003	93.03	113.13	0.878	0.924	0.033	0
EE general	−0.002	99.09	117.32	0.871	0.915	0.036	0


Table 2 shows that under correct parametric model specifications, all estimators result in consistent estimates with 95% CI coverage close to 95% for sample sizes of N=5,000 and N=500. Influence curve-based efficiency is close to 100% of the efficiency bound for both the efficient TMLE and EE estimators for all sample sizes. We see a gain in the efficiency bound under $M_{I}$ versus $M_{I I}$ as a result of making the extra exclusion restriction assumptions. This is particularly apparent for the transported SDE comparing the general TMLE and EE estimators to their efficient versions.

Performance (e.g., RMSE and coverage) generally degrades across the estimators we consider with smaller sample sizes. In addition, under the smallest sample size of N=100, IC-based coverage of all estimators is low for the transported SIE (≈ 87%) and is also low for the general TML estimator for the transported SDE (85%). There could be two contributing factors. First, the general, inefficient TML and EE estimators are slightly less stable than their efficient counterparts due to an extra factor (which may be small) in the denominator of D_Y (P) (comparing Eq 3 to Eq 2). Second, among the general, inefficient estimators, TMLE has slightly worse coverage than EE. It is possible that in such small samples, the two update steps that are part of the TMLE algorithm further contribute to the instability. In this case, where we are using parametric regressions, we recommend using the bootstrap rather than the IC for inference in small samples; the bootstrap can recover some of this lost coverage by approximating the true variance of the estimator.
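The bootstrap recommendation above can be illustrated with a generic nonparametric bootstrap. In this sketch, `estimator` stands in for any function that maps a dataset to a point estimate (for example, a full refit of the TMLE), and the interval is a normal-approximation interval built from the bootstrap standard error. Both the function name `bootstrap_wald_ci` and the choice of interval type are assumptions, since this section does not state how the bootstrap intervals were formed.

```python
import random
import statistics


def bootstrap_wald_ci(data, estimator, n_boot=500, seed=0, z=1.96):
    """Nonparametric bootstrap SE and normal-approximation CI for estimator(data)."""
    rng = random.Random(seed)
    n = len(data)
    point = estimator(data)
    replicates = []
    for _ in range(n_boot):
        # Resample whole observations with replacement, then re-estimate.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    se = statistics.stdev(replicates)
    return point, se, point - z * se, point + z * se


def sample_mean(xs):
    """Toy estimator used only to exercise the bootstrap routine."""
    return sum(xs) / len(xs)


toy_data = [i % 2 for i in range(200)]
est, boot_se, ci_lo, ci_hi = bootstrap_wald_ci(toy_data, sample_mean)
```

In practice every nuisance model would be refit within each bootstrap replicate, so that the replicates reflect the variability of the entire estimation procedure.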

Next, we show results under various model misspecifications in Table 3 for sample size N=5,000. We consider: 1) misspecification of the Y model that only includes Z as a predictor; 2) misspecification of the Y and Z models, with the same Y model misspecification as above and misspecifying the Z model to only include an A−S interaction; 3) misspecification of the Y and M models, with the Y model misspecified as above and the M model misspecified to only include Z as a predictor; 4) misspecification of the Y and S models, with the Y model misspecified as above and S misspecified to only include an intercept; and 5) misspecification of the Z, M, and S models, with all misspecifications as previously described. We use DGM 2 for misspecification of the Y model and misspecification of the Y and M models. We use DGM 1 for misspecification of the Y and Z models, misspecification of the Y and S models, and misspecification of the Z, M, and S models. Full results for misspecified models under more DGMs and various sample sizes are shown in the Supporting Information.

Table 3:

Simulation results comparing estimators of the transported stochastic direct effect and transported stochastic indirect effect under various model misspecifications for sample size N=5,000. 1,000 simulations. Estimation methods compared include solving the estimating equation (EE) and targeted minimum loss-based estimation (TMLE). We compare versions of the estimators that incorporate the exclusion restrictions in our statistical model (TMLE efficient, EE efficient) and versions that do not (TMLE general, EE general). Efficiency and 95% CI coverage are estimated separately using 1) the influence curve (IC) and 2) bootstrapping (boot). Bias and RMSE values are averages across the simulations.

Estimator	Bias	Efficiency (IC)	Efficiency (Boot)	95% CI Coverage (IC)	95% CI Coverage (Boot)	RMSE	% Out of Bounds
Y model misspecified, DGM 2
Transported stochastic direct effect
TMLE efficient	0.000	191.06	118.82	0.999	0.965	0.014	0
TMLE general	0.000	354.81	288.79	0.984	0.954	0.035	0
EE efficient	0.000	240.26	126.67	0.999	0.968	0.015	0
EE general	0.000	374.95	291.49	0.992	0.957	0.035	0
Transported stochastic indirect effect
TMLE efficient	0.000	186.27	102.20	0.995	0.914	0.002	0
TMLE general	0.000	239.35	186.41	0.986	0.942	0.004	0
EE efficient	0.000	237.87	103.75	0.997	0.919	0.002	0
EE general	0.000	302.49	216.48	0.991	0.945	0.004	0
Y and Z models misspecified, DGM 1
Transported stochastic direct effect
TMLE efficient	0.032	75.49	77.35	0.000	0.000	0.032	0
TMLE general	0.188	973.02	463.63	0.223	0.003	0.192	0
EE efficient	0.033	80.22	81.92	0.000	0.000	0.034	0
EE general	0.545	1,112.09	1,008.79	0.000	0.000	0.552	0
Transported stochastic indirect effect
TMLE efficient	−0.014	192.60	92.53	0.380	0.004	0.014	0
TMLE general	−0.040	492.45	143.03	0.185	0.000	0.040	0
EE efficient	0.018	307.60	121.67	0.767	0.029	0.019	0
EE general	0.066	911.65	686.60	0.447	0.231	0.072	0
Y and M models misspecified, DGM 2
Transported stochastic direct effect
TMLE efficient	−0.272	280.08	149.17	0.000	0.000	0.273	0
TMLE general	−0.272	698.69	321.77	0.147	0.001	0.276	0
EE efficient	−1.110	852.67	735.38	0.000	0.000	1.118	99.2
EE general	−1.111	1,093.33	877.83	0.000	0.000	1.121	98.4
Transported stochastic indirect effect
TMLE efficient	0.104	249.26	173.89	0.000	0.000	0.106	0
TMLE general	0.103	370.51	279.70	0.036	0.016	0.106	0
EE efficient	0.135	336.39	227.03	0.000	0.000	0.137	0
EE general	0.137	535.91	413.33	0.054	0.016	0.141	0
Y and S models misspecified, DGM 1
Transported stochastic direct effect
TMLE efficient	−0.164	285.91	168.14	0.000	0.000	0.165	0
TMLE general	−0.164	511.22	399.63	0.019	0.010	0.167	0
EE efficient	−0.220	415.64	191.91	0.000	0.000	0.220	0
EE general	−0.218	699.90	586.82	0.029	0.023	0.224	0
Transported stochastic indirect effect
TMLE efficient	0.222	507.95	186.55	0.000	0.000	0.223	0
TMLE general	0.222	600.12	343.91	0.000	0.000	0.222	0
EE efficient	0.278	1,033.84	216.14	0.000	0.000	0.279	0
EE general	0.278	1,201.77	675.26	0.000	0.000	0.279	0
Z, M, and S models misspecified, DGM 1
Transported stochastic direct effect
TMLE efficient	0.000	98.78	100.01	0.956	0.960	0.008	0
TMLE general	0.001	228.37	231.43	0.939	0.941	0.020	0
EE efficient	0.000	98.80	100.03	0.956	0.960	0.008	0
EE general	0.001	228.50	203.24	0.966	0.942	0.018	0
Transported stochastic indirect effect
TMLE efficient	0.001	54.95	100.51	0.727	0.938	0.010	0
TMLE general	0.001	129.68	136.04	0.879	0.920	0.014	0
EE efficient	0.000	54.95	100.46	0.732	0.939	0.010	0
EE general	0.001	130.08	137.03	0.891	0.930	0.014	0


As derived from the robustness properties proven in the Supporting Information, we expect the TMLE and EE estimators to be consistent if 1) the Y model is correctly specified or if 2) the Z, M, and S models are correctly specified. Note that we assume A is randomly assigned, so we don’t examine misspecification of the A model. However, in cases where A is nonrandom, condition 2 above would change to require correct specification of the A, Z, M, and S models.

We see in Table 3 that, as expected, all estimators remain unbiased when the Y model is misspecified. Compared to the correctly specified case (Table 2), there is a slight reduction in efficiency, evidenced by efficiencies greater than 100% of the efficiency bound and slight overcoverage when using the IC as opposed to bootstrapping (Table 3).

When the Z, M, and S models are misspecified, the TMLE and EE estimators also remain unbiased, as expected (Table 3). However, in this scenario, coverage is anticonservative when using the IC, evidenced by efficiencies as low as 55% of the efficiency bound for the transported SIE and coverage as low as 73%. This result is not unexpected: under misspecification, the influence curves of the EE and TMLE estimators change. In such instances of model misspecification, using the bootstrap can recover coverage, as long as the fitting methods applied yield valid standard errors under the nonparametric bootstrap. This is the case for generalized linear regression models, but it might not be the case for data-adaptive machine learning algorithms (29).
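For context, IC-based inference typically takes the sample standard deviation of the estimated influence curve values, divides by the square root of n, and forms a Wald interval. The sketch below shows that standard construction; the function name `ic_wald_ci` is hypothetical, and the estimated influence curve values are assumed to be supplied by the estimator. The interval is valid only if the estimator is asymptotically linear with this influence curve, which is the condition that can fail under the misspecification described above.

```python
import math
import statistics


def ic_wald_ci(estimate, ic_values, z=1.96):
    """Wald interval with standard error sd(IC) / sqrt(n)."""
    n = len(ic_values)
    se = statistics.stdev(ic_values) / math.sqrt(n)
    return se, estimate - z * se, estimate + z * se


# Hypothetical influence curve values for n = 100 observations.
ic_se, ic_lo, ic_hi = ic_wald_ci(estimate=0.02, ic_values=[-1.0, 1.0] * 50)
```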

In the remaining model misspecifications shown in Table 3, none of the estimators are guaranteed to be unbiased. In the case where the Y and M models are misspecified, all estimators perform poorly in estimating both the transported direct and indirect effects, and performance of the EE estimators is particularly poor in terms of bias and inefficiency. The marked inefficiency may be due, in part, to estimates lying outside the bounds of the parameter space. Indeed, we see that nearly 100% of the EE transported SDE estimates lie outside the parameter space. In contrast, even though the TMLE estimators solve the same influence curve equation, none of the TMLE estimates lie outside the parameter space, demonstrating the advantage of TMLE as a substitution estimator. However, this was the only case where performance of the EE and TML estimators differed markedly, and because neither estimator was expected to be consistent, we caution against overinterpreting comparisons of performance; in all other simulation scenarios, performance was comparable.

8. Illustrative example

8.1. Overview and approach

We now apply the EE and TMLE transport estimators proposed to our motivating examples corresponding to $M_{I}$ and $M_{I I}$ . Specifically, we consider the path-specific effect of Section 8 housing voucher receipt (A) and, in the case of $M_{I}$ , subsequent use (Z), and in the case of $M_{I I}$ , subsequently living in low-poverty neighborhoods (Z), through parental employment (M), on having a current DSM-IV mental health or substance use disorder at the final follow-up, 10–15 years later (Y). When considering Z to be subsequent use of the voucher, the exclusion restriction in $M_{I}$ likely holds, because receiving a housing voucher should only affect subsequent health if one actually uses the voucher. When considering Z to be moving to a low-poverty neighborhood, the exclusion restriction is less likely to hold, because the effect of receiving the voucher could operate through its use and subsequent exposure to neighborhood poverty but also through alternative pathways like the number of residential moves that would be captured by the path A → M → Y that is allowed in $M_{I I}$ .

We assume no violation of the identifying assumptions under $M_{I}$ or $M_{I I}$ . We believe the positivity assumptions are likely met; we do not have evidence for practical positivity violations in our observed data. The sequential randomization assumptions Y_a,m ⊥ A|W, S and M_a ⊥ A|W, S are likely met, as A was randomized. However, the assumption Y_a,m ⊥ M|W, Z, S may not be met, as M is not randomly assigned, so there may be unobserved confounding variables of the M − Y relationship. In an effort to reduce this bias, we use a large number of measured covariates at the child, family, and neighborhood levels. The last identification assumption is that of a common outcome model across sites. We use a nonparametric omnibus test of equality in distribution to test for evidence against a common outcome model (8) and do not find such evidence.

We incorporated an extensive set of covariates at the individual and family levels, including sociodemographics, neighborhood characteristics as reported by the adult family member, and reasons for participation. A is a binary instrument measured at baseline when children were young (aged 0–10 years) that indicates whether or not the family was randomized to the group that received a Section 8 housing voucher. Z under $M_{I}$ is a binary intermediate variable that indicates whether or not the family used their voucher, if they received it, to move out of public housing and into a private rental. Z under $M_{I I}$ is a binary intermediate variable that indicates whether or not the family lived in low-poverty neighborhoods following randomization. M is a binary variable indicating whether or not one of the adult family members living in the home was employed, measured at the interim follow-up 4–7 years after randomization. Y is also binary, and indicates whether or not the child met criteria for any mental health or substance use disorder in the past year. It was measured at the final follow-up 10–15 years after randomization when the children were adolescents (aged 10–20 years (23)), using the CIDI-SF, which corresponds to clinical diagnoses of mental health and substance use disorders using DSM-IV criteria (6). We incorporate MTO study weights, which account for sampling of the child within the household, dropout over time, and changing random assignment ratios (23).

For the purposes of this illustrative analysis, we ignore missing data and perform a complete-case analysis; sample sizes are given below. A more in-depth examination of the transportability of various indirect effects informing mental health and risk behavior among MTO children is a subject of a future paper. We used the least absolute shrinkage and selection operator (lasso) (25) in fitting the Z, M, Y and Q_Z models. To mitigate risk of overfitting, we implemented the most regularized version of lasso that included all covariates and second-order interactions that resulted in model prediction error that was within one standard error of the minimum model prediction error in addition to always including race-ethnicity and age, as they may be important predictors and modifiers, coupled with 10-fold cross validation to select the model.
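The one-standard-error selection rule described above can be sketched as follows, assuming the cross-validated mean error and its standard error have already been computed for each candidate penalty. The function name `one_se_rule` and the numeric inputs are hypothetical, and the sketch covers only the selection step, not the lasso fit or the forced inclusion of race-ethnicity and age.

```python
def one_se_rule(lambdas, cv_mean, cv_se):
    """Return the largest penalty whose CV error is within one SE of the minimum."""
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[best] + cv_se[best]
    eligible = [i for i in range(len(lambdas)) if cv_mean[i] <= threshold]
    # A larger penalty means a more regularized (sparser) model.
    chosen = max(eligible, key=lambda i: lambdas[i])
    return lambdas[chosen]


# Hypothetical 10-fold cross-validation summary for four candidate penalties.
selected_lambda = one_se_rule(
    lambdas=[0.01, 0.1, 1.0, 10.0],
    cv_mean=[0.250, 0.245, 0.255, 0.400],
    cv_se=[0.010, 0.012, 0.011, 0.020],
)
```

Here the minimum error occurs at a penalty of 0.1, but the rule selects 1.0 because its error (0.255) is still below the threshold of 0.245 + 0.012 = 0.257.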

We first estimated the stochastic indirect effect (SIE) of the path A → Z → M → Y by MTO site among boys using a previously developed TMLE estimator for stochastic indirect effects (21). The New York City and Los Angeles sites had similar SIEs and so were combined (n≈550; sample sizes are rounded to the nearest 50 people per Census regulations). For the example under $M_{I}$ , their combined SIE (risk difference (RD): −0.0029, 95% CI: −0.0068, 0.0010) was in the opposite direction from the SIE for the Boston site (RD: 0.0017, 95% CI: −0.0038, 0.0073, n≈250), and the two were significantly different (p-value = 0.002) (Figure 2). For the example under $M_{I I}$ , their combined SIE was (RD: −0.0010, 95% CI: −0.0017, −0.0003) as compared to the SIE for the Boston site (RD: −0.0006, 95% CI: −0.0021, 0.0010, n≈250) (Figure 2). For reference, the total effect of A on Y among boys is −0.037 (95% CI: −0.103, 0.029) in the NYC and LA combined sites but in the opposite direction in Boston (RD: 0.118, 95% CI: 0.021, 0.215).

Figure 2: Stochastic indirect effect estimates and 95% confidence intervals using longitudinal data from the Moving to Opportunity experiment. The SIE is interpreted as the effect of being randomized to receive a Section 8 housing voucher on risk of having a current or past-year DSM-IV mental health disorder operating through parental employment in the interim.

We then used our EE and TMLE transport estimators under $M_{I}$ and $M_{I I}$ to assess the extent to which differences were due to differences in compositional characteristics between the sites (W), differences in intervention take-up (Z), and differences in the intervention’s effects on parental employment (M). That is, we test the null hypothesis that the predicted, transported SIE for Boston equals the observed SIE for Boston. The transported effect estimate uses the conditional outcome model for the NYC and LA combined sites and accounts for the differing distributions of the population compositions between the sites and differences in intervention adherence and effects on parental employment. If we fail to reject the null hypothesis, this suggests that the SIEs can be transported between the sites in question. If we reject the null hypothesis, it suggests that the path-specific effect reflected by the SIE does not generalize across the sites, given our measured variables and assumed SEM.

8.2. Illustrative Example Results

Figure 2 shows the nontransported SIEs for the NYC and LA combined sites and the Boston site and the transported SIE for the Boston site that treats NYC and LA as the source population and Boston as the target population in the A) example under $M_{I}$ and B) example under $M_{I I}$ . Comparing the transported estimate for Boston with the nontransported, observed estimate for Boston, we see that accounting for site differences in (W, Z, M) explains much of the difference in estimates between the sites. In other words, even though the SIEs appear different across sites, once we account for differing distributions of (W, Z, M), there appears to be a common mediation mechanism linking housing voucher receipt to subsequent DSM-IV disorder through parental employment.

9. Discussion

In this paper, we defined and identified parameters that transport stochastic direct and indirect mediating effects from a source population (S = 1) to a new, target population (S = 0). Identification of such parameters relies on the typical sequential randomization and positivity assumptions of other stochastic mediation effects (21, 30, 33) as well as a common outcome model assumption, described previously for transport estimators (22). Such parameters enable the prediction of mediating effects in new populations based on data about the mediation mechanism in a source population and the differing distributions of compositional characteristics between the two populations. Thus, transported SDE and SIE parameters contribute to understanding how interventions may work differently and/or have differing effects when applied to new populations.

We proposed an estimating equation approach and a TMLE approach. We describe a version of each estimator that is efficient under a statistical model with exclusion restrictions such that A does not directly affect M or Y, and another version that is efficient under a statistical model without those restrictions such that A can directly affect M and Y. The EE and TMLE estimators solve the EIC for a particular statistical model, which results in double robustness, meaning that they are unbiased if one consistently estimates either the Y model or the Z, M, and S models. The TMLE estimator has the additional advantage of staying within the bounds of the parameter space by virtue of being a substitution estimator.

We also saw empirical evidence that, even in cases where robustness properties guarantee estimator consistency, the IC may no longer provide accurate inference when models are misspecified. In other words, the estimators are doubly robust in terms of point estimates, but valid IC-based inference requires that the influence curve at our estimated distribution converges to the influence curve at the true distribution. When model fits that determine the influence curve are misspecified, this requirement might fail even though the point estimates are robust to this misspecification. If parametric models are used, as in our simulation, the bootstrap can be used to recover appropriate coverage in such scenarios (3). However, the bootstrap is not an appropriate strategy if data-adaptive methods are used in model fitting (28). To address this problem, we can employ extra targeting of the nuisance parameters that preserves asymptotic linearity with a known influence curve and thereby provides correct influence curve-based variance at the cost of losing some efficiency, as has been described previously (3, 26). Applying such extra targeting to the TML estimators we describe here is an area for future work.

The estimators we propose are limited in that they consider a stochastic intervention on the mediator, M, that is assumed known and estimated from observed data. However, we plan to extend them to a true, unknown stochastic intervention in the future. Another limitation is that the parameters are only identified if one assumes a common outcome model across the source and target populations. There will be some research questions for which it is not possible to establish evidence for or against this assumption, as in questions about predicting a long-term outcome in a new population. However, when the research question instead focuses on establishing the extent to which mechanisms are shared across populations, and the full set of data O = (S, W, A, Z, M, Y) is observed for both populations, one can empirically test whether there is evidence against such a shared outcome model (9).

We focused on transporting mediation estimates where an instrument, A, was statically intervened on and the mediator, M, was stochastically intervened on. Moreover, we were primarily concerned with a statistical model that imposed instrumental variable assumptions such as the exclusion restriction assumption. However, we describe how each estimator can be easily modified to accommodate statistical models that do not impose instrumental variable assumptions, allowing for a direct effect of A on M and of A on Y. Extending our proposed estimators to data generating mechanisms that do not include an intermediate variable, Z, is straightforward. Thus, our transport mediation estimators can be applied to a wide range of common data generating mechanisms.

Supplementary Material

NIHMS1644546-supplement-sup_01.pdf (543.8 KB, PDF)

Acknowledgements

The contents of this document do not necessarily reflect the views or policies of HUD or the U.S. Government. CBDRB-FY19-201 and CBDRB-FY20-015. APPROVED BY DRB ON 2019-03-11 and 2019-10-09.

Footnotes

Data Availability Statement

Researchers may apply for data access through the U.S. Census Bureau. Code to replicate the simulation results is also provided at https://github.com/jlstiles/SDEtransportsim/blob/master/simulations.Rmd.

Supporting Information

Web Appendices and Tables referenced in Sections 4–7 are available with this paper at the Biometrics website on Wiley Online Library.

Software to implement the estimators described is available on GitHub: https://github.com/jlstiles/SDEtransport. Code to replicate the simulation is also available on GitHub: https://github.com/jlstiles/SDEtransportsim/blob/master/simulations.Rmd.

Contributor Information

Kara E. Rudolph, Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, New York, U.S.A.

Jonathan Levy, Division of Biostatistics, University of California, Berkeley, U.S.A.

Mark J. van der Laan, Division of Biostatistics, University of California, Berkeley, U.S.A.

References

1. Angrist JD, Imbens GW, and Rubin DB (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91, 444–455.
2. Bareinboim E and Pearl J. A general algorithm for deciding transportability of experimental results. Journal of Causal Inference 1, 107–134.
3. Benkeser D, Carone M, van der Laan MJ, and Gilbert P (2017). Doubly robust nonparametric inference on the average treatment effect. Biometrika 104, 863–880.
4. Cole SR and Stuart EA (2010). Generalizing evidence from randomized clinical trials to target populations: The ACTG 320 trial. American Journal of Epidemiology 172, 107–115.
5. Frangakis C (2009). The calibration of treatment effects from clinical trials to target populations. Clinical Trials (London, England) 6, 136.
6. Kessler RC, Andrews G, Mroczek D, Ustun B, and Wittchen H-U (1998). The World Health Organization Composite International Diagnostic Interview Short-Form (CIDI-SF). International Journal of Methods in Psychiatric Research 7, 171–185.
7. Kling JR, Liebman JB, and Katz LF (2007). Experimental analysis of neighborhood effects. Econometrica 75, 83–119.
8. Luedtke A, Carone M, and van der Laan MJ (2019). An omnibus non-parametric test of equality in distribution for unknown functions. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 81, 75–99.
9. Luedtke AR, Carone M, and van der Laan MJ (2015). An omnibus nonparametric test of equality in distribution for unknown functions. arXiv preprint arXiv:1510.04195.
10. Miettinen OS (1972). Standardization of risk ratios. American Journal of Epidemiology 96, 383–388.
11. Neyman JS (1923). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. (Translated and edited by DM Dabrowska and TP Speed, Statistical Science (1990), 5, 465–480). Annals of Agricultural Sciences 10, 1–51.
12. van der Laan M (2016). A generally efficient targeted minimum loss based estimator. U.C. Berkeley Division of Biostatistics Working Paper Series 343.
13. van der Laan M and Rubin D (2006). Targeted maximum likelihood learning. U.C. Berkeley Division of Biostatistics Working Paper Series.
14. Orr L, Feins J, Jacob R, Beecroft E, Sanbonmatsu L, Katz LF, Liebman JB, and Kling JR (2003). Moving to opportunity: Interim impacts evaluation.
15. Pearl J (2009). Causality. Cambridge University Press.
16. Pearl J and Bareinboim E (2014). External validity: From do-calculus to transportability across populations. Statistical Science pages 579–595.
17. Pearl J et al. (2009). Causal inference in statistics: An overview. Statistics Surveys 3, 96–146.
18. Rubin DB (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66, 688.
19. Rudolph KE, Goin DE, Paksarian D, Crowder R, Merikangas KR, and Stuart EA (2018). Causal mediation analysis with observational data: Considerations and illustration examining mechanisms linking neighborhood poverty to adolescent substance use. American Journal of Epidemiology.
20. Rudolph KE, Schmidt NM, Crowder R, Galin J, Glymour MM, Ahern J, and Osypuk TL (2018). Composition or context: using transportability to understand drivers of site differences in a large-scale housing experiment. Epidemiology 29, 199–206.
21. Rudolph KE, Sofrygin O, Zheng W, and van der Laan MJ (2017). Robust and flexible estimation of data-dependent stochastic mediation effects: a proposed method and example in a randomized trial setting. Epidemiologic Methods, in press.
22. Rudolph KE and van der Laan MJ (2017). Robust estimation of encouragement-design intervention effects transported across sites. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 1509–1525.
23. Sanbonmatsu L, Katz LF, Ludwig J, Gennetian LA, Duncan GJ, Kessler RC, Adam EK, McDade T, and Lindau ST (2011). Moving to opportunity for fair housing demonstration program: Final impacts evaluation.
24. Stuart EA, Cole SR, Bradshaw CP, and Leaf PJ (2011). The use of propensity scores to assess the generalizability of results from randomized trials. Journal of the Royal Statistical Society: Series A (Statistics in Society) 174, 369–386.
25. Tibshirani R (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58, 267–288.
26. van der Laan MJ (2014). Targeted estimation of nuisance parameters to obtain valid statistical inference. The International Journal of Biostatistics 10, 29–57.
27. van der Laan MJ and Gruber S (2012). Targeted minimum loss based estimation of causal effects of multiple time point interventions. The International Journal of Biostatistics 8.
28. van der Vaart A (2000). Asymptotic Statistics, volume Chapter 25. Cambridge University Press, Cambridge, UK.
29. van der Vaart AW and Wellner JA (1996). Weak convergence. In Weak convergence and empirical processes, pages 16–28. Springer.
30. VanderWeele TJ and Tchetgen Tchetgen EJ (2017). Mediation analysis with time varying exposures and mediators. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 917–938.
31. Wright S (1921). Correlation and causation. Journal of Agricultural Research 20, 557–585.
32. Zheng W and van der Laan M (2010). Asymptotic theory for cross-validated targeted maximum likelihood estimation. U.C. Berkeley Division of Biostatistics Working Paper Series.
33. Zheng W and van der Laan M (2017). Longitudinal mediation analysis with time-varying mediators and exposures, with application to survival outcomes. Journal of Causal Inference.

Associated Data

Supplementary Materials

sup 01: NIHMS1644546-supplement-sup_01.pdf (543.8 KB, PDF)

