Abstract
There is a growing literature on finding rules by which to assign treatment based on an individual’s characteristics such that a desired outcome under the intervention is maximized. A related goal entails identifying a subpopulation of individuals predicted to have a harmful indirect effect (the effect of treatment on an outcome through mediators), perhaps even in the presence of a predicted beneficial total treatment effect. In some cases, the implications of a likely harmful indirect effect may outweigh an anticipated beneficial total treatment effect, and would motivate further discussion of whether to treat identified individuals. We build on the mediation and optimal treatment rule literatures to propose a method of identifying a subgroup for which the treatment effect through the mediator is expected to be harmful. Our approach is nonparametric, incorporates post-treatment confounders of the mediator-outcome relationship, and does not make restrictions on the distribution of baseline covariates, mediating variables, or outcomes. We apply the proposed approach to identify a subgroup of boys in the MTO housing voucher experiment who are predicted to have a harmful indirect effect of housing voucher receipt on subsequent psychiatric disorder incidence through aspects of their school and neighborhood environments.
Keywords: dynamic treatment regime, optimal individualized treatment regime, cross-validation, causal inference, optimal rule, mediation, interventional indirect effect
1. Introduction
1.1. Overview
When individuals differ in their responses to a treatment or intervention, it may be desirable to use individual-level information to predict for whom treatments/interventions may work well versus not. This goal has fueled the growing literature on finding so-called optimal individualized treatment rules/regimes, which assign treatment based on an individual’s characteristics such that a desired outcome of the intervention is maximized (Murphy, 2003).
Nabi et al. (2018) and Shpitser and Sherman (2018) recently extended these methods to finding a treatment rule that optimizes a path-specific effect. Nabi et al. (2018) focus on optimizing a direct effect (an effect of treatment on an outcome, not through a mediator)—specifically, a natural direct effect (Tchetgen Tchetgen and Shpitser, 2014). In their application, the natural direct effect of interest is the effect of a treatment not operating through adherence. The authors estimate conditional natural direct effects, which are the type of effect that would form the basis for individualized treatment rules designed to optimize the natural direct effect (Nabi et al., 2018).
A related goal entails identifying a subgroup of individuals who are predicted to have a harmful indirect effect (the effect of treatment on an outcome through mediators). We propose an approach to do this by developing an estimator for conditional interventional indirect effects (this type of mediational estimand is further discussed below). This work represents an extension of work by Díaz et al. (2021b), which focused on estimation of marginal interventional direct and indirect effects.
1.2. Real-world relevance
A natural question is when it would be of practical interest to estimate conditional indirect effects in particular, as opposed to estimating conditional total effects. Or, in other words, when there would be value in identifying the subgroup predicted to have a harmful indirect effect in addition to identifying the subgroup predicted to have a harmful total effect.
First, we point out that it is possible for an individual to experience a harmful indirect effect even if their predicted total treatment effect is beneficial, sometimes referred to as inconsistent mediation (MacKinnon et al., 2000). For some subgroups, a predicted harmful indirect effect may outweigh a predicted beneficial total effect of treatment if there are additional outcomes, other than the one being studied, that the treatment harms, and if the predicted harmful indirect effect is a proxy surrogate for those unobserved harmful effects. It is possible that these additional negative outcomes may be of more consequence than the outcome being evaluated in the indirect effect. A directed acyclic graph (DAG) depicting an example of such a scenario is given in Figure A1 in the supplement.
In some cases, the relative importance of the indirect effect, the total effect, and/or the total effect on other outcomes may be quantifiable. If so, the direct and indirect effects could be considered together as a scalar composite outcome, similar to how multiple outcomes are currently considered in learning optimal treatment rules (e.g., set-valued treatment regimes, inverse preference elicitation, and constrained estimation) (Butler et al., 2018; Laber et al., 2014; Lizotte et al., 2012; Linn et al., 2015). However, deciding on a composite outcome would necessitate a more complete understanding of the relevant causal processes surrounding the research question, including the relative importance of the relevant measured outcomes to patients. Although quantification of the relative importance of outcomes has parallels with the valuation of years of life via DALYs and QALYs (Murray et al., 2012; Whitehead and Ali, 2010), and there has been work in this area in the optimal treatment rule literature (Butler et al., 2018; Laber et al., 2014; Lizotte et al., 2012; Linn et al., 2015), we do not engage in this philosophically and ethically complicated exercise here.
Second, it is also possible that such a harmful indirect effect would be unexpected. In these cases, alerting researchers to such a subgroup could prompt them to work to better understand the reasons underlying the harmful indirect effect among the identified individuals. Ideally, researchers would eventually understand why some subgroups experienced a harmful indirect effect while others did not, which could inform improvements to subsequent iterations of either the intervention itself or its delivery, including the inclusion of supports or additional resources targeted to the identified subgroups to mitigate unintended harms, thereby making it more likely that everyone would benefit from the intervention.
1.2.1. Descriptive versus prescriptive real-world implications
For those predicted to experience harmful indirect effects, treatment may be reconsidered in a more nuanced attempt to weigh risks—as demonstrated by harmful indirect effects and the processes that they may represent—versus benefits—as demonstrated by a slight beneficial total effect or direct effect. In this sense, such subgroup identification is descriptive rather than prescriptive. Although the approach we propose is mathematically grounded in the optimal individualized treatment rule literature, for the remainder of the paper, we avoid using the phrase “treatment rule” in describing it, because our subgroup identification is not a treatment recommendation. A decision to treat (or not) individuals in this subset would likely be based on a nuanced, comprehensive assessment involving aspects like the predicted direct effect as well as ethics, costs, and qualitative measures.
1.3. Motivating Data Example
We can use a motivating example from the Moving To Opportunity study (MTO) to think through a scenario where the harmful indirect effect may be a proxy surrogate for a harmful process that would result in multiple negative outcomes (reflected in Figure A1). In MTO, families were randomized to receive a Section 8 housing voucher that would allow them to move out of public housing and into a rental on the private market (Kling et al., 2007). Housing vouchers have been and remain a core component of federal housing policy in the United States (US) over the past several decades, as policy has moved away from public housing and towards voucher-based subsidies. MTO was an early housing voucher evaluation. MTO participants were followed for 10–15 years and a broad array of outcomes related to health, education and income were assessed (Sanbonmatsu et al., 2011). In general, housing voucher receipt resulted in better mental health and risk behavior outcomes for girls, but worse outcomes for boys (Kling et al., 2007; Sanbonmatsu et al., 2011; Kessler et al., 2014). For example, for boys, voucher receipt was estimated to increase risk of later developing any psychiatric disorder, including a mood disorder, and/or an externalizing disorder (Sanbonmatsu et al., 2011; Kessler et al., 2014; Rudolph et al., 2021a). We previously found evidence of harmful marginal indirect effects of voucher receipt on the mental health and risk behavior outcomes of adolescent boys through so-called “improvements” in objective indicators of school and neighborhood quality (e.g., higher school rank, lower school poverty, lower neighborhood poverty) (Rudolph et al., 2021a).
This harmful indirect effect was unexpected. One hypothesis is that it could be a surrogate for an unmeasured harmful social process like discrimination. Most boys in MTO were non-white (98%), and when their families moved with the voucher, they tended to move to slightly whiter neighborhoods and schools (Sanbonmatsu et al., 2011), where the boys could have faced discrimination, which in turn, could negatively and profoundly affect a large array of outcomes.
In this example and in others where a subgroup is predicted to have a harmful total effect or harmful indirect effect that may be a proxy surrogate for additional negative outcomes that may outweigh a beneficial total effect of one particular outcome, policymakers/interventionists may want to reconsider intervening on the subgroups who are predicted to experience harms, or may want to provide them with supports designed to mitigate those harms. In the MTO application, this may mean reconsidering giving the voucher to families with male children who would be identified for being at-risk of a harmful indirect effect. However, receiving a housing voucher affects the whole family—not just the male children—and benefits were consistently observed for girls. So, it is also plausible that benefits could be maximized for a family by continuing the intervention but offering supports to the families, relevant institutions (e.g., schools), or both, to mitigate harms for the relevant subgroup. Ideally, there would be data from a study designed to evaluate whether it is better to not give the intervention to the subgroup predicted to experience harm or to give the intervention in combination with strategies to mitigate harm. However, in the absence of such data, as here, one would only be able to predict the indirect effect and total effect had the intervention not been given to that subgroup predicted to experience a harmful indirect effect, so one would not be able to draw conclusions about what should be done in such a scenario.
1.4. Paper’s objective and organization
Identifying a subgroup predicted to experience an unintended harmful indirect effect is a necessary first step—a prerequisite—to having a more nuanced discussion in weighing treatment benefits and risks for these individuals. This paper’s objective is to propose an approach for this first step. We build on both the current mediation and optimal individualized treatment rule literature to propose an estimator that identifies a subgroup who may experience an unintended harmful indirect effect. We then estimate (using an existing estimator) how the population path-specific effects would hypothetically change if the identified subgroup was not treated. Our approach is nonparametric, incorporates an ensemble of machine learning algorithms in model fitting, incorporates post-treatment variables that may confound the mediator-outcome relationship, and does not make restrictions on the distribution of baseline covariates, mediating variables (considered jointly), or outcomes. We apply the proposed approach to identify a subgroup of boys in the MTO housing voucher experiment who are predicted to have a harmful indirect effect of housing voucher receipt on subsequent psychiatric disorder incidence through their school and neighborhood environments.
2. Notation and background
2.1. Notation
Let O = (W, A, Z, M, Y) represent observed data, where W is a vector of baseline variables; A is a binary treatment (representing housing voucher receipt in the case of MTO); M represents mediating variables (in the case of MTO: related to the school, neighborhood, and social environments); Y represents the outcome (past-year psychiatric disorder 10–15 years after voucher receipt, in the case of MTO) and Z represents post-treatment variables that may confound the M → Y relationship (in the case of MTO, moving with the housing voucher). Let O1, …, On represent the sample of n i.i.d. observations of O. We use capital letters to denote random variables and lowercase letters to denote realizations of those variables. Let P represent the distribution of O. We assume the data are generated according to the following nonparametric structural equation model (Pearl, 2000):
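$$W = f_W(U_W); \quad A = f_A(W, U_A); \quad Z = f_Z(A, W, U_Z); \quad M = f_M(Z, A, W, U_M); \quad Y = f_Y(M, Z, A, W, U_Y), \tag{1}$$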
where each U represents unmeasured, exogenous factors and each f represents an unknown, deterministic function.
We define Pf = ∫ f(o)dP(o) for a given function f(o). We assume P is continuous with respect to some dominating measure ν and let p denote the corresponding probability density function. We will also use the following notation: b(a, z, m, w) denotes E(Y | A = a, Z = z, M = m, W = w); g(a | w) denotes P(A = a | W = w); e(a | m, w) denotes P(A = a | M = m, W = w); q(z | a, w) denotes P(Z = z | A = a, W = w); and r(z | a, m, w) denotes P(Z = z | A = a, M = m, W = w).
2.2. Background on interventional direct and indirect effects
There are several types of path-specific direct and indirect causal effects (Ogburn, 2012). Natural direct and indirect effects are common and decompose the average total treatment effect (ATE) as follows: $E(Y_{1,M_1} - Y_{0,M_0}) = E(Y_{1,M_1} - Y_{1,M_0}) + E(Y_{1,M_0} - Y_{0,M_0})$, where the first term on the right is the natural indirect effect, the second is the natural direct effect, and $Y_{a,M_{a'}}$ denotes the nested counterfactual outcome, interpreted as what the outcome would have been had treatment been set to some value a and had the mediator taken the value it would have had under treatment set to some value a′ (Pearl, 2001).
However, natural direct and indirect effects are not generally point-identified in the presence of a variable Z that confounds the mediator-outcome relation and that is affected by treatment (Avin et al., 2005). This poses a problem in the case of our MTO motivating example, because “take-up” of the intervention (i.e., using the Section 8 voucher) is such a Z variable. If natural direct and indirect effects are estimated anyway (as is commonly done in practice), significant bias can result (Rudolph et al., 2020).
Interventional direct and indirect effects overcome this limitation and are point-identified in the presence of such a Z variable. They are defined in terms of counterfactual outcomes under a deterministic intervention on the treatment and a stochastic intervention on the mediator (Petersen et al., 2006; van der Laan and Petersen, 2008; Zheng and van der Laan, 2012; VanderWeele et al., 2014; Rudolph et al., 2017). We define a counterfactual outcome under interventions that set the treatment and mediator to (A, M) = (a, Ga′) as $Y_{a,G_{a'}} = f_Y(G_{a'}, Z_a, a, W, U_Y)$, where Za = fZ(a, W, UZ) is the counterfactual variable Z observed under an intervention setting A = a, and where Ga denotes a random draw from the distribution of Ma conditional on W. We could also denote this counterfactual outcome as . Although interventional direct and indirect effects do not decompose the ATE, they decompose a slightly different total effect that depends on stochastic interventions: $E(Y_{1,G_1} - Y_{0,G_0}) = E(Y_{1,G_1} - Y_{1,G_0}) + E(Y_{1,G_0} - Y_{0,G_0})$, which we call the population interventional total effect (Díaz et al., 2021b). The interventional indirect effect, $E(Y_{1,G_1} - Y_{1,G_0})$, compares the expected counterfactual outcomes under hypothetical interventions in which the treatment is fixed but the mediator is changed from a draw under its counterfactual treatment distribution (G1) to a draw under its counterfactual control distribution (G0). In terms of our MTO research question, $E(Y_{1,G_1} - Y_{1,G_0})$ has the causal interpretation of the average difference in predicted risk of past-year psychiatric disorder in adolescence when voucher receipt is fixed at received and a stochastic intervention on M under voucher receipt is contrasted with one under no voucher receipt.
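To make the stochastic intervention concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes a toy structural model with binary W, Z, and M; the functions `p_z`, `p_m`, and `mean_y` and all coefficients are hypothetical and chosen only for illustration. The mediator draws G1 and G0 are generated from the distribution of M under A = 1 and A = 0 given W, independently of the Z1 that enters the outcome equation.

```python
import random

# Hypothetical toy structural model (all coefficients are illustrative assumptions).
def p_z(a, w):
    """P(Z = 1 | A = a, W = w)."""
    return 0.2 + 0.4 * a + 0.2 * w

def p_m(z, a, w):
    """P(M = 1 | Z = z, A = a, W = w)."""
    return 0.1 + 0.3 * a + 0.3 * z + 0.1 * w

def mean_y(a, z, m, w):
    """E(Y | A = a, Z = z, M = m, W = w)."""
    return 0.1 - 0.05 * a + 0.1 * z + 0.2 * m + 0.1 * w

def draw_g(a, w, rng):
    """Draw G_a: a value of M_a given W, using its own fresh draw of Z_a."""
    z = int(rng.random() < p_z(a, w))
    return int(rng.random() < p_m(z, a, w))

def simulate_iie(n=100_000, seed=1):
    """Monte Carlo approximation of E(Y_{1,G_1} - Y_{1,G_0})."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        w = int(rng.random() < 0.5)
        z1 = int(rng.random() < p_z(1, w))   # Z under A = 1, used in the outcome
        g1 = draw_g(1, w, rng)               # mediator drawn as if treated
        g0 = draw_g(0, w, rng)               # mediator drawn as if untreated
        total += mean_y(1, z1, g1, w) - mean_y(1, z1, g0, w)
    return total / n

iie_estimate = simulate_iie()
```

In this toy model the effect can be worked out by hand: treatment raises P(M = 1 | W) by 0.3 + 0.3 × 0.4 = 0.42, and M raises the outcome risk by 0.2, so the interventional indirect effect is 0.2 × 0.42 = 0.084. The simulation should land close to that value.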
Interventional direct and indirect effects target a population-level path from A to Y through M for the indirect effect and not through M for the direct effect. If there is no post-treatment confounder, Z, and the assumption M0 ⊥ Y1,m | W holds, the identification formulas (and consequently estimators) of the interventional and natural direct and indirect effects are the same (VanderWeele and Tchetgen Tchetgen, 2017). We include additional discussion related to the interpretation of these effects in the Appendix. VanderWeele et al. (2014) proved identification of these effects under the assumptions (i) Ya,m ⊥ A | W, (ii) Ma ⊥ A | W, and (iii) Ya,m ⊥ M | (W, A, Z). In addition, robust and efficient estimators for interventional direct and indirect effects exist (VanderWeele and Tchetgen Tchetgen, 2017; Díaz et al., 2021b).
3. Identification of the subgroup with a predicted harmful indirect effect
3.1. Conditional interventional indirect effect definition and identification
We are concerned with identifying subgroups, characterized by baseline covariate values, v, that have predicted harmful interventional indirect effects, $E(Y_{1,G_1} - Y_{1,G_0} \mid V = v)$. These conditional interventional indirect effects can be interpreted as the effect contrasting two stochastic interventions on M (under voucher receipt vs. no voucher receipt), averaged over the particular subgroup, v.
Throughout, we use the example from MTO in which a reduction in risk of subsequent psychiatric disorder in adolescence is the outcome of interest; thus, negative risk differences are considered “beneficial” and positive risk differences are considered “harmful”. The definition of “harmful” for what follows should be changed to reflect the particular research question.
The subgroup predicted to not have a harmful indirect effect (where “harmful” is indicated by a positive effect, as described above) can be identified by the target parameter
$$d(v) = \mathbb{1}\{B(v) \le 0\},$$
where B(v) is the expected interventional indirect effect conditional on a subset of baseline variables V ⊆ W, and is defined as
$$B(v) = E(Y_{1,G_1} - Y_{1,G_0} \mid V = v).$$
In what follows we let Wc = W\V. The function B is often referred to as a “blip” function (Robins, 2004), and is a predictive function that takes covariates V as input and outputs the conditional interventional indirect effect of treatment on the additive scale. The function d(v) is then an indicator of a predicted nonharmful indirect effect for covariate profile V = v. In this case, if B(v) ≤ 0, then individuals in strata V = v are indicated as not having a predicted harmful effect; if B(v) > 0, then individuals are indicated as having a predicted harmful effect.
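As a small illustration of how the indicator is read off the blip function, the sketch below applies the threshold described above (B(v) ≤ 0 is not harmful, B(v) > 0 is harmful). The covariate profile names and blip values are hypothetical placeholders, not estimates from MTO.

```python
def nonharm_indicator(blip_value):
    """d(v): 1 if the predicted indirect effect is not harmful (B(v) <= 0), else 0."""
    return 1 if blip_value <= 0 else 0

# Hypothetical predicted blips for three covariate profiles (illustrative values only).
predicted_blips = {"profile_a": -0.03, "profile_b": 0.00, "profile_c": 0.07}

d = {profile: nonharm_indicator(b) for profile, b in predicted_blips.items()}

# Profiles flagged as having a predicted harmful indirect effect.
harmful_subgroup = [profile for profile, flag in d.items() if flag == 0]
```

Note that a blip of exactly zero is treated as not harmful, matching the weak inequality in the definition.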
The following assumptions, which were established by VanderWeele et al. (2014), will allow us to identify the causal parameters B(v) and d(v):
Assumption 1 (No unmeasured confounders of the A → Y relation). Ya,m ⊥ A | W
Assumption 2 (No unmeasured confounders of the A → M relation). Ma ⊥ A | W, for a ∈ {0, 1}.
Assumption 3 (No unmeasured confounders of the M → Y relation). Ya,m ⊥ M | (W, A, Z).
Assumption 4 (Positivity). p(W) > 0 implies p(a | W) > 0, p(M | a⋆, W) > 0 and p(Z | a′, W) > 0 imply p(M | a′, Z, W) > 0 with probability one for (a′, a⋆) = (1, 1) and (a′, a⋆) = (1, 0).
In the context of the conditional interventional indirect effect of MTO housing voucher receipt on subsequent risk of psychiatric disorder in adolescence through the school, neighborhood and social environments, Assumptions 1 and 2 are expected to hold, as the intervention is randomized. Assumption 3 may not hold, most likely due to unmeasured post-randomization confounding variables, such as those related to changes in parental employment, income, parent-child dynamics, etc. However, we do include a large number of baseline covariates related to parental socioeconomic status, motivations for enrolling in MTO, and relationships; child characteristics; and neighborhood characteristics. We also include an indicator representing moving with the housing voucher, which could be one such post-treatment confounder. Lastly, we include numerous mediating variables capturing aspects of the school, neighborhood and social environments. Any unmeasured variables would need to contribute to mediator-outcome confounding independently of the aforementioned variables to violate Assumption 3.
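Under Assumptions 1–4, VanderWeele and Tchetgen Tchetgen (2017) showed that the conditional interventional indirect effect is identified from the observed data as follows: for any values $(a', a^\star) \in \{0, 1\}^2$, $E(Y_{a',G_{a^\star}} \mid V = v)$ is identified as
$$E\left[\int b(a', z, m, W)\, q(z \mid a', W)\, p(m \mid a^\star, W)\, d\nu(m, z) \,\Big|\, V = v\right].$$
And thus we have
$$B(v) = E\left[\int b(1, z, m, W)\, q(z \mid 1, W)\, \{p(m \mid 1, W) - p(m \mid 0, W)\}\, d\nu(m, z) \,\Big|\, V = v\right]. \tag{2}$$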
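The identification result can be illustrated with a plug-in calculation in a toy setting. The sketch below assumes the identifying functional has the form of an average over (z, m) of b(1, z, m, w) q(z | 1, w) {p(m | 1, w) − p(m | 0, w)}, with binary Z and M so that integrals become sums, V = W, and known nuisance functions. The functions and coefficients are hypothetical; in practice b, q, and the mediator density would be estimated.

```python
def q(z, a, w):
    """q(z | a, w) = P(Z = z | A = a, W = w) in a hypothetical toy model."""
    p1 = 0.2 + 0.4 * a + 0.2 * w
    return p1 if z == 1 else 1.0 - p1

def p_m_given_z(m, z, a, w):
    """P(M = m | Z = z, A = a, W = w) in the toy model."""
    p1 = 0.1 + 0.3 * a + 0.3 * z + 0.1 * w
    return p1 if m == 1 else 1.0 - p1

def p_m(m, a, w):
    """p(m | a, w): mediator distribution with Z averaged out."""
    return sum(p_m_given_z(m, z, a, w) * q(z, a, w) for z in (0, 1))

def b(a, z, m, w):
    """b(a, z, m, w) = E(Y | a, z, m, w); M is harmful when w = 0, protective when w = 1."""
    return 0.3 - 0.05 * a + 0.1 * z + (0.2 - 0.4 * w) * m

def blip(w):
    """Plug-in conditional interventional indirect effect at W = w."""
    return sum(
        b(1, z, m, w) * q(z, 1, w) * (p_m(m, 1, w) - p_m(m, 0, w))
        for z in (0, 1)
        for m in (0, 1)
    )

blips = {w: blip(w) for w in (0, 1)}
flags = {w: int(value <= 0) for w, value in blips.items()}  # 1 = not harmful
```

Here treatment shifts P(M = 1 | w) by 0.42 in both strata, while the effect of M on the outcome is +0.2 for w = 0 and −0.2 for w = 1, so the blip is 0.084 for w = 0 (flagged as harmful) and −0.084 for w = 1 (not flagged).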
Alternative interpretation in terms of the optimal dynamic treatment rule.
Although we generally do not interpret d(v) as a treatment rule in this paper, such an interpretation may be justified in other contexts. In these cases, d(v) may be alternatively defined in terms of an optimal individualized dynamic treatment rule:
where, for any d′ and d⋆, $Y_{d',G_{d^\star}}$ is the counterfactual outcome in a hypothetical world where treatment is set to the rule d′(V) ∈ {0, 1} and the mediator is set to a random draw from the distribution of M conditional on A = d⋆(V). Here 𝒟 is the space of all functions that map the covariates V to a treatment decision in {0, 1}.
3.2. Estimation
As pointed out in the Introduction, estimation of B(v) and d(v) is equivalent to estimation of conditional average effects (here, the conditional interventional indirect effect) and estimation of optimal treatment rules. Consequently, estimation techniques such as Q-learning (Murphy, 2003; Qian and Murphy, 2011; Moodie et al., 2012; Laber et al., 2014), outcome-weighted learning (OWL) (Zhang et al., 2012; Zhao et al., 2012, 2015), and doubly robust techniques (Luedtke and van der Laan, 2016; Díaz et al., 2018; Kennedy, 2020) may be used. Q-learning is a regression-based approach related to g-computation that relies on sequential regression formulas. Q-learning is not directly applicable to our problem because Equation 2 does not readily yield a sequential regression representation. OWL uses inverse probability weights to recast the problem of estimating d(v) as a weighted classification problem. Although we could use OWL to estimate B(v) and d(v) here, we instead choose to use a doubly robust approach that combines regressions with inverse probability weights to obtain an estimator that remains consistent under certain configurations of inconsistent estimation of nuisance parameters. This approach relies on so-called unbiased transformations, which we define below.
Definition 1 (Unbiased transformation). The function D(o) is an unbiased transformation for B(v) if it satisfies E[D(O) | V = v] = B(v).
In particular, we use a multiply robust unbiased transformation (Rubin and van der Laan, 2007). For the case of optimal treatment rules, multiply robust unbiased transformations are constructed using the efficient influence function (EIF) of the average treatment effect (Luedtke and van der Laan, 2016; Díaz et al., 2018; Kennedy, 2020). We take a similar approach here. Specifically, the EIF for the counterfactual mean $E(Y_{a',G_{a^\star}})$ under Assumptions 1–4 was derived by Díaz et al. (2021a). We denote this EIF as $D^{a',a^\star}_\eta(o)$, where η represents a vector of nuisance parameters.
Thus, we use $D_\eta(o) = D^{1,1}_\eta(o) - D^{1,0}_\eta(o)$ as an unbiased transformation for B(v). The transformation is called multiply robust because it remains valid under mis-specification of some of the nuisance models; the specific conditions are given in Díaz et al. (2021b).
Our estimator proceeds by obtaining an estimate, $\hat\eta$, of η. All nuisance parameters are estimated using an ensemble regression approach known as the Super Learner (van der Laan et al., 2006, 2007), which creates an optimally weighted combination of algorithms. Then, the pseudo-outcome $D_{\hat\eta}(O_i)$ is computed for all individuals in the sample, and a regression of the pseudo-outcome on baseline covariates V is fitted. Following the approach of Luedtke and van der Laan (2016), we also use Super Learner in fitting the regression of $D_{\hat\eta}(O_i)$ on $V_i$ and use the fitted values to obtain predictions $\hat B(V_i)$. The group of individuals at risk for a harmful indirect effect is identified as those indices i with $\hat B(V_i) > 0$.
We use cross-fitting (Klaassen, 1987; Zheng and van der Laan, 2011; Chernozhukov et al., 2016) throughout the estimation process in all fits. Let 𝒱1, …, 𝒱J denote a random partition of data with indices i ∈ {1, …, n} into J prediction sets of approximately the same size such that $\bigcup_{j=1}^{J} \mathcal{V}_j = \{1, \ldots, n\}$. For each j, the training sample is given by 𝒯j = {1, …, n} \ 𝒱j. $\hat\eta_j$ denotes the estimator of η obtained by training the corresponding prediction algorithm using only data in the sample 𝒯j, and j(i) denotes the index of the validation set that contains observation i. We then use these fits, $\hat\eta_{j(i)}$, in computing $D_{\hat\eta_{j(i)}}(O_i)$. Likewise, regressions of the pseudo-outcome on Vi are trained within each training sample, and the final estimate $\hat B(V_i)$ is computed by predicting in the corresponding validation data set.
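The cross-fitting mechanics for the second-stage regression can be sketched as follows. This is an illustrative simplification, not the authors' R implementation: it assumes the pseudo-outcomes have already been computed, uses a single binary covariate V, and replaces the Super Learner with a toy learner (`fit_stratum_means`, a hypothetical helper that predicts the training-sample mean within each level of V). The data are simulated.

```python
import random
from statistics import mean

def make_folds(n, n_folds, seed=0):
    """Randomly partition indices 0..n-1 into n_folds validation sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[j::n_folds] for j in range(n_folds)]

def fit_stratum_means(v, pseudo, train_idx):
    """Toy stand-in for a learner: mean pseudo-outcome within each level of V."""
    overall = mean(pseudo[i] for i in train_idx)
    levels = {v[i] for i in train_idx}
    fit = {lev: mean(pseudo[i] for i in train_idx if v[i] == lev) for lev in levels}
    return lambda x: fit.get(x, overall)

def cross_fit_blip(v, pseudo, n_folds=5, seed=0):
    """Train on each training sample, predict the blip in the held-out validation set."""
    n = len(v)
    b_hat = [None] * n
    for val_idx in make_folds(n, n_folds, seed):
        held_out = set(val_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        predict = fit_stratum_means(v, pseudo, train_idx)
        for i in val_idx:
            b_hat[i] = predict(v[i])
    return b_hat

# Simulated example: true blip is +0.1 when V = 0 and -0.1 when V = 1.
rng = random.Random(42)
v = [rng.randint(0, 1) for _ in range(2000)]
pseudo = [(0.1 if vi == 0 else -0.1) + rng.gauss(0.0, 0.3) for vi in v]

b_hat = cross_fit_blip(v, pseudo, n_folds=5)
harmful = [i for i, b in enumerate(b_hat) if b > 0]  # predicted harmful indirect effect
```

Each observation's prediction comes from a model that never saw that observation, which is the property cross-fitting is meant to guarantee. In the full procedure the nuisance estimates that enter the pseudo-outcome would be cross-fitted in the same way.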
4. Estimating the population interventional indirect effect under a hypothetical treatment decision d(v)
As stated in the Introduction, our goal is to identify a subset of the population that is predicted to have a harmful indirect effect, and for whom treatment may be carefully considered or reconsidered. Even though our goal is not to develop a treatment rule, in some situations it may be important to assess the population effects that would be observed if the function d were used as a treatment rule. To that end, in this section we estimate the hypothetical population interventional indirect effect if we were to use d(V) to assign treatment to each individual. Specifically, we define the population interventional total effect of implementing d(V) as $E(Y_{d,G_d} - Y_{0,G_0})$, and decompose it in terms of indirect and direct effects as
$$E(Y_{d,G_d} - Y_{0,G_0}) = E(Y_{d,G_d} - Y_{d,G_0}) + E(Y_{d,G_0} - Y_{0,G_0}).$$
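The population interventional indirect effect implementing d(V), $E(Y_{d,G_d} - Y_{d,G_0})$, can be identified under the sequential randomization assumptions and positivity in Section 3.1 as:
$$E\left[\int b(d(V), z, m, W)\, q(z \mid d(V), W)\, \{p(m \mid d(V), W) - p(m \mid 0, W)\}\, d\nu(m, z)\right]. \tag{3}$$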
Because we do not have the true, unknown d, but instead have an estimate of it, $\hat d$, we estimate the population interventional indirect effect that would be observed if our estimate $\hat d$ were implemented. We use the one-step estimation approach described in Díaz et al. (2021a), which is based on solving the EIF estimating equation. Specifically, our estimator is defined as
We use a cross-fitted version of this estimator, as described in Section 3.2. The variance of this estimator can be estimated as the sample variance of the EIF. Theorem 5 of van der Laan and Luedtke (2015) proves that plugging in $\hat d$ estimated from the data results in asymptotically linear estimation of its effect, even though the same data were used to estimate $\hat d$ and to assess its effect.
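The variance calculation can be sketched as below. This is a simplified illustration under stated assumptions: it takes per-observation values of the (uncentered) EIF-based quantity as given, treats their sample mean as the point estimate, and uses the sample variance divided by n for a Wald-type 95% interval. It ignores the survey weights used in the MTO analysis, and the input values are made up.

```python
from math import sqrt
from statistics import mean, variance

def one_step_summary(eif_values, z_crit=1.96):
    """Point estimate, standard error, and Wald interval from per-observation EIF values."""
    n = len(eif_values)
    estimate = mean(eif_values)             # sample mean of the uncentered EIF
    se = sqrt(variance(eif_values) / n)     # sample variance of the EIF, scaled by n
    ci = (estimate - z_crit * se, estimate + z_crit * se)
    return estimate, se, ci

# Hypothetical per-observation values (illustrative only).
eif_values = [0.2, -0.1, 0.4, 0.0, 0.3, -0.2, 0.1, 0.5]
estimate, se, ci = one_step_summary(eif_values)
```

Centering the values at the estimate would not change the sample variance, so the same standard error is obtained whether the centered or uncentered form is used.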
The R code to implement this cross-fitted estimator is available: https://github.com/kararudolph/optpiie.
5. Identifying MTO participants with predicted harmful interventional indirect effects
We now apply our proposed approach to identify the subgroup of boys with a predicted harmful interventional indirect effect of voucher receipt on risk of past-year psychiatric disorder in adolescence through aspects of the school, neighborhood, and social environments. In this case, the population marginal interventional indirect effect through mediators related to the school and neighborhood environments and instability of the social environment is estimated to contribute to a statistically significant increased risk of having any psychiatric disorder in adolescence. The population interventional total effect is also estimated to be harmful in terms of this outcome, but not statistically significantly so.
If one is interested in optimizing the total treatment effect, one could apply existing methods to identify the subgroup who may be harmed by the intervention (Luedtke and van der Laan, 2016; Robins, 2004; Zhao et al., 2012). However, the possibility that some subgroups could have a beneficial total treatment effect but a harmful indirect effect, especially if the indirect effect serves as a proxy surrogate for a more consequential process than the particular total effect being estimated, motivates our proposed approach to identify such a subset of individuals. In this application, we 1) identify the subgroup of boys who would be predicted to experience a harmful interventional indirect effect and thus, for whom voucher receipt may not be recommended, and 2) estimate the hypothetical population interventional indirect effect and hypothetical population interventional total treatment effect had the identified subgroup not received the intervention. We remind the reader here that the population interventional total effect is different from the intent-to-treat ATE (which is the type of total effect estimand commonly estimated in the MTO literature); we discuss these differences in Section 2.2.
This first task of identifying the subgroup based on predicted positive vs. negative conditional interventional indirect effects is a classification problem, which is successful if the robustness conditions given in Díaz et al. (2021b) are met. Using the so-called “black-box” algorithms proposed in Section 3.2 for this task would result in a flexible classifier, but we would be left without knowing which characteristics are important in predicting whether a conditional interventional indirect effect will be positive or negative. Consequently, we also use a simple regression tree (Breiman et al., 1984) to learn a less flexible, less-than-optimal, but interpretable d in terms of baseline characteristics.
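To illustrate what an interpretable rule of this kind might look like, the sketch below fits a depth-one regression tree (a single split) to predicted blip values by least squares and converts the two leaf means into the indicator. It is a hand-rolled simplification of a regression tree, not the CART procedure used in the analysis; the feature (age) and all values are hypothetical.

```python
def fit_stump(x, y):
    """Find the single split on numeric feature x that minimizes squared error in y."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((yi - left_mean) ** 2 for yi in left)
        sse += sum((yi - right_mean) ** 2 for yi in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean

# Hypothetical baseline ages and predicted blips (illustrative values only).
age = [12, 13, 14, 15, 16, 17]
blip = [-0.05, -0.04, -0.06, 0.08, 0.07, 0.09]

threshold, left_mean, right_mean = fit_stump(age, blip)

def interpretable_rule(a):
    """1 if the leaf's mean predicted blip is not harmful (<= 0), else 0."""
    leaf_mean = left_mean if a <= threshold else right_mean
    return 1 if leaf_mean <= 0 else 0
```

With these values the best split falls at age 14, so the resulting rule reads as a single statement about one covariate, which is the kind of transparency the tree is meant to buy at the cost of flexibility.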
5.1. Data and Analysis
MTO was summarized in the Introduction. For this data analysis, we restricted to adolescent boys who were surveyed at the final follow-up timepoint in the Boston, Chicago, New York City, and Los Angeles sites (N=2,100, rounded sample size).
5.1.1. Baseline covariates
We considered baseline covariates at the individual, family, and neighborhood level, W: study site, age, race/ethnicity, number of family members, previous problems in school, enrolled in special class for gifted and talented students, parent is high school graduate, parent marital status, parent work status, receipt of AFDC/TANF, whether any family member has a disability, perceived neighborhood safety, neighborhood satisfaction, neighborhood poverty level, reported reasons for participating in MTO, previous number of moves, previous application for Section 8 voucher.
5.1.2. Treatment
The treatment variable was baseline randomized intervention status of received voucher or not (A, binary 1/0).
5.1.3. Intermediate variable
The intermediate variable was whether or not the family used the voucher to move (Z, binary 1/0).
5.1.4. Mediators
Mediating variables represented aspects of the school and neighborhood environments and instability of the social environment over the 10–15 years of follow-up, M, and included: school rank, student-to-teacher ratio, % students receiving free or reduced-price lunch, % schools attended that were Title I (a measure of school poverty), number of schools attended, number of school changes within the year, whether or not the most recent school was in the same district as the baseline school, number of moves, neighborhood poverty; all weighted over the duration of follow-up. While it would be difficult to intervene directly on any of these mediators, their distribution differs by voucher status. This is because they are all affected by the neighborhood of residence. In the US, school poverty and school quality are tied to neighborhood, because school catchment zones are geographically drawn, and property taxes are a common source of funding.
5.1.5. Outcome
The long-term outcome was past-year psychiatric disorder at the final timepoint, when the children were adolescents (Y, binary 1/0).
5.1.6. Analysis
Two baseline covariates had missing data—race/ethnicity and baseline neighborhood poverty level were each missing for 2% of boys. Some mediators did not have any missing data (follow-up neighborhood poverty and number of moves), and the mediator of school rank had the most missingness at 12%. All other mediators were missing for 8–9% of boys. The outcome of past-year psychiatric disorder was missing for 8% of boys. For the purposes of this illustration, we use just one imputed dataset. Using multiple imputed datasets would add complications. For example, membership in the subgroup predicted to experience harmful interventional indirect effects could differ across imputed datasets, and reconciling differences, while interesting and of practical importance, represents additional open problems that we leave for future work. As recommended for all MTO analyses, we use individual-level weights that account for the sampling of children within families, assignment ratios, and loss-to-follow-up (Sanbonmatsu et al., 2011).
We first apply the estimation approach in Section 3.2 to identify the subgroup of boys predicted to experience a harmful interventional indirect effect of voucher receipt on subsequent risk of psychiatric disorder through aspects of the school, neighborhood, and social environments. We then apply the estimation procedure in Section 4 to estimate the population interventional indirect effects and the population interventional total effects. We use five folds in cross-fitting and include the following learners in our Super Learner ensemble: a simple mean model, a main-terms generalized linear model, lasso, and extreme gradient boosted machines. Finally, we report the estimated population interventional indirect effects and population interventional total effects using the alternative, interpretable rule estimated using the simple regression tree.
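The sample-splitting idea behind five-fold cross-fitting can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimator of Sections 3.2 and 4: it uses only the simplest candidate learner named above (a mean model), ignores covariates and survey weights, and runs on a simulated toy outcome. The function names `make_folds` and `cross_fit` are hypothetical.

```python
import random
from statistics import mean


def make_folds(n, k=5, seed=0):
    """Randomly partition indices 0..n-1 into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_fit(y, fit, predict, k=5, seed=0):
    """Out-of-fold predictions: each unit is predicted by a model that was
    trained only on the other k-1 folds."""
    n = len(y)
    preds = [None] * n
    for held_out in make_folds(n, k, seed):
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        model = fit([y[i] for i in train])
        for i in held_out:
            preds[i] = predict(model, i)
    return preds


def fit_mean(y_train):
    """Simple mean model: the fitted 'model' is just the training mean."""
    return mean(y_train)


def predict_mean(model, i):
    """The mean model predicts the same value for every unit."""
    return model


# Toy binary outcome (simulated; not MTO data).
rng = random.Random(1)
y = [1 if rng.random() < 0.3 else 0 for _ in range(200)]
oof = cross_fit(y, fit_mean, predict_mean, k=5, seed=0)
```

In a fuller implementation, each nuisance regression would be fit with the ensemble on the training folds and evaluated on the held-out fold in the same way, so that no unit's prediction depends on its own outcome.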
5.2. Results
Figure 1 shows the typical population interventional indirect effect (PIIE) and total effect (PITE) estimates without any individualization, labeled “no individualization”, alongside the PIIE and PITE estimates obtained by using the estimated rule to hypothetically assign the voucher only to boys who were not predicted to experience a harmful effect through mediators of the school, neighborhood, and social environments (learned via Super Learner), labeled “superlearner estimation”. Using individualization learned through Super Learner would be expected to attenuate the harmful population interventional indirect effect point estimate (0.05 increased incidence of any psychiatric disorder during adolescence (95% CI: −0.04, 0.14) vs. 0.15 increased incidence (95% CI: 0.01, 0.30)). This individualization would also be expected to slightly attenuate the otherwise harmful population interventional total effect point estimate of voucher assignment on risk of having any psychiatric disorder in adolescence (a change in risk of −0.02 (95% CI: −0.22, 0.17) vs. 0.08 increased risk (95% CI: −0.20, 0.36)). However, we acknowledge that all the total effect estimates have wide confidence intervals that span the null.
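As a quick check on how to read these intervals, the sketch below tabulates the four reported estimates and flags which intervals include zero. The implied standard errors assume the reported intervals are symmetric Wald-type intervals built with a 1.96 normal quantile; that is an assumption made here for illustration, not something stated in the text.

```python
Z_95 = 1.96  # normal quantile assumed for a Wald-type 95% interval

# (point estimate, lower 95% bound, upper 95% bound), as reported in the text.
estimates = {
    ("PIIE", "no individualization"): (0.15, 0.01, 0.30),
    ("PIIE", "superlearner estimation"): (0.05, -0.04, 0.14),
    ("PITE", "no individualization"): (0.08, -0.20, 0.36),
    ("PITE", "superlearner estimation"): (-0.02, -0.22, 0.17),
}


def spans_null(lower, upper):
    """True when the interval contains zero (no effect on the risk-difference scale)."""
    return lower <= 0.0 <= upper


def implied_se(lower, upper, z=Z_95):
    """Standard error implied by a symmetric Wald interval (assumed form)."""
    return (upper - lower) / (2 * z)


summary = {
    key: {"spans_null": spans_null(lo, hi), "implied_se": implied_se(lo, hi)}
    for key, (_, lo, hi) in estimates.items()
}

for (effect, rule), row in summary.items():
    print(f"{effect:5s} {rule:25s} spans null: {row['spans_null']!s:5s} "
          f"implied SE ~ {row['implied_se']:.3f}")
```

Under these assumptions, only the PIIE interval without individualization excludes zero.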
Figure 1:

Population interventional indirect effects and total effect by rule type. Effects of random assignment to receive a housing voucher on long-term risk (i.e., probability) of developing a DSM-IV psychiatric disorder in adolescence among boys in the Moving to Opportunity study. All results were approved for release by the U.S. Census Bureau, authorization number CBDRB-FY22-CES018–008.
Applying a regression tree to estimate an interpretable rule, we find that boys were likely to have a harmful interventional indirect effect if: 1) their parent did not list “better schools” as a reason for volunteering for MTO, or 2) they self-reported non-Hispanic/Latino black, white, or “other” race (92% reporting black race) in combination with a household size of 2 or 3 (including themselves). The PIIE and PITE estimates using this rule are denoted “regression tree estimation”.
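The two conditions found by the regression tree can be written as a simple predicate. The sketch below is illustrative only: the argument names and the category labels (`"nh_black"`, `"nh_white"`, `"nh_other"`, `"hispanic"`) are hypothetical codings introduced here, not variable names from the MTO data.

```python
# Hypothetical labels for the non-Hispanic/Latino race categories named in the text.
NON_HISPANIC_GROUPS = {"nh_black", "nh_white", "nh_other"}


def predicted_harmful_indirect_effect(listed_better_schools, race_ethnicity, household_size):
    """Return True if a boy falls in the subgroup the tree predicts to have a
    harmful interventional indirect effect.

    Condition 1: parent did NOT list "better schools" as a reason for volunteering.
    Condition 2: non-Hispanic/Latino black, white, or "other" race AND a
                 household size of 2 or 3 (including the boy himself).
    """
    condition_1 = not listed_better_schools
    condition_2 = race_ethnicity in NON_HISPANIC_GROUPS and household_size in (2, 3)
    return condition_1 or condition_2


def hypothetical_voucher_assignment(listed_better_schools, race_ethnicity, household_size):
    """Under the individualized rule, the voucher is hypothetically assigned only
    to boys NOT predicted to have a harmful indirect effect."""
    return not predicted_harmful_indirect_effect(
        listed_better_schools, race_ethnicity, household_size
    )
```

For example, a boy whose parent listed “better schools”, who is coded `"nh_black"`, and who lives in a household of 5 meets neither condition and would be hypothetically assigned the voucher under this rule.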
However, these results may be biased if there is interference among the boys participating in MTO, meaning that one boy’s counterfactual outcome may be affected by another boy’s family’s voucher assignment. The extent to which interference is present has been debated in the MTO literature (Sobel, 2006; Ludwig et al., 2008). Sobel (2006) argued that interference is likely in social, school, and neighborhood settings, that the effects estimated in the presence of interference are a mixture of the causal effect without interference (e.g., the ATE) and the spillover effect in the control group, and that both effects are specific to the particular allocation of participants to treatment groups and their interactions. Unfortunately, the social networks among participants were not measured, so we cannot measure interference. In a response paper, the primary investigators countered that interference was likely minimal, as the majority of MTO participants reported no friends in the baseline neighborhood, an even larger majority reported no family in the baseline neighborhood, and there was very little clustering of families in relocation neighborhoods at follow-up (Ludwig et al., 2008). The (rounded) N=2,100 boys in this analysis resided in 40 housing projects at baseline (Sobel, 2006). In Chicago, the Robert Taylor Homes had more than 4,000 apartments, and in New York, large projects had between 1,000 and 3,000 apartments (Authority, 2021; Belluck, Sep 6, 1998). This means that the MTO families included in this analysis may have comprised about 1–5% of each housing project’s families, lending support to the primary investigators’ argument that the effect of interference may be minimal.
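The 1–5% figure can be reproduced with back-of-the-envelope arithmetic. The sketch below makes three simplifying assumptions that are not stated in the text: boys are spread evenly across the 40 projects, each boy comes from a distinct family, and each apartment houses one family.

```python
n_boys = 2100       # rounded analytic sample size
n_projects = 40     # housing projects at baseline (Sobel, 2006)

# Assumption: boys are spread evenly across projects, one boy per family.
families_per_project = n_boys / n_projects

# Project sizes mentioned in the text (apartments, assumed one family each).
project_sizes = {
    "large New York project, lower end": 1000,
    "large New York project, upper end": 3000,
    "Robert Taylor Homes (at least)": 4000,
}

shares = {
    label: families_per_project / apartments
    for label, apartments in project_sizes.items()
}

for label, share in shares.items():
    print(f"{label}: {share:.1%} of families")
```

With roughly 52.5 analysis families per project, the share ranges from about 1.3% of a 4,000-apartment project to about 5.3% of a 1,000-apartment project, consistent with the 1–5% range.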
6. Conclusions
We proposed a nonparametric approach to identify a subgroup predicted to experience unintended harmful interventional indirect effects. We then proposed a nonparametric estimator of the hypothetical population interventional indirect and total effects that would result if one decided not to treat the subgroup predicted to experience harmful interventional indirect effects. Our cross-fitted estimators solve the efficient influence function estimating equation, so they are robust and can incorporate machine learning in model fitting.
This work was motivated by surprising results from a large housing policy intervention (the Moving to Opportunity Study) in which boys whose families were randomized to receive a housing voucher had statistically significant harmful indirect effects of voucher receipt on risk of having a subsequent psychiatric disorder, operating through aspects of the school and social environments. One hypothesis is that this harmful population interventional indirect effect could reflect (that is, be a proxy or surrogate marker for) processes like racial discrimination. Boys whose families moved with the vouchers were slightly more likely to move to lower poverty neighborhoods and to attend schools with higher academic performance (Sanbonmatsu et al., 2011). However, data suggest they may also have been more likely to face discriminatory practices, like suspension and expulsion, and more social isolation (Rudolph et al., 2021a). It is possible that some may wish to avoid such negative processes, prompting the desire to identify those at risk for them. In this example, we do so and find that not giving the intervention to this subgroup would indeed have attenuated the harmful predicted population interventional indirect effect point estimate, though the confidence intervals are wide and overlapping.
Although in this case we were interested in the policy-relevant question of identifying subgroups who may have harmful indirect effects of random assignment to an intervention, others may be interested in a related question that uses randomized assignment as an instrumental variable for a nonrandomized exposure of interest. Despite much work on identification and estimation of conditional complier average treatment effects (Angrist and Imbens, 1995; Abadie, 2003; Frölich, 2007; Kasy, 2009; Ogburn et al., 2015), we know of no work on the identification and estimation of conditional complier direct and indirect effects. In fact, we know of few papers on the more general problem of identifying and estimating complier direct and indirect effects (Frölich and Huber, 2017; Rudolph et al., 2021b; Imai et al., 2013). We are currently pursuing work in this area (Rudolph et al., 2021c) and hope others will as well.
An alternative to the approach we proposed that relies on interventional conditional direct and indirect effects would be to instead estimate bounds of natural conditional direct and indirect effects when they are not point identified, as has been done in the marginal case (Miles et al., 2017). This is another area for future work.
In terms of estimation, our proposal builds on both the causal mediation and individualized optimal treatment regime literatures. However, we caution against interpreting the estimated subgroup rule as a treatment decision rule, understanding that, at least in some instances, deciding whether or not someone would benefit from an intervention/treatment is necessarily more nuanced than the mediation mechanisms and total effects we consider. Future work could focus on incorporating additional complexities and nuances into the existing tools used in personalized medicine to reflect the multiple mechanisms by which treatments or interventions may affect numerous, and sometimes competing, outcomes among individuals, recognizing heterogeneities (anticipated or not) at each stage in the process.
Supplementary Material
Funding information:
KER’s time was supported by R00DA042127 and R01DA053243.
References
- Abadie A (2003) Semiparametric instrumental variable estimation of treatment response models. Journal of Econometrics, 113, 231–263.
- Angrist JD and Imbens GW (1995) Two-stage least squares estimation of average causal effects in models with variable treatment intensity. Journal of the American Statistical Association, 90, 431–442.
- Authority NYCH (2021) Map of NYCHA developments. https://www1.nyc.gov/assets/nycha/downloads/pdf/nychamap.pdf. Date accessed: 23 March 2022.
- Avin C, Shpitser I and Pearl J (2005) Identifiability of path-specific effects.
- Belluck P (Sep 6, 1998) End of a ghetto: A special report. Razing the slums to rescue the residents. The New York Times, https://www.nytimes.com/1998/09/06/us/end--of--a--ghetto--a--special--report--razing--the--slums--to--rescue--the--residents.html. Date accessed: 23 March 2022.
- Breiman L, Friedman J, Stone CJ and Olshen RA (1984) Classification and regression trees. CRC Press.
- Butler EL, Laber EB, Davis SM and Kosorok MR (2018) Incorporating patient preferences into estimation of optimal individualized treatment rules. Biometrics, 74, 18–26.
- Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C et al. (2016) Double machine learning for treatment and causal parameters. arXiv preprint arXiv:1608.00060.
- Díaz I, Hejazi NS, Rudolph KE and van der Laan MJ (2021a) Nonparametric efficient causal mediation with intermediate confounders. Biometrika, 108, 627–641.
- Díaz I, Savenkov O and Ballman K (2018) Targeted learning ensembles for optimal individualized treatment rules with time-to-event outcomes. Biometrika, 105, 723–738.
- Díaz I, Williams N, Hoffman KL and Schenck EJ (2021b) Nonparametric causal effects based on longitudinal modified treatment policies. Journal of the American Statistical Association, 1–16.
- Frölich M (2007) Nonparametric IV estimation of local average treatment effects with covariates. Journal of Econometrics, 139, 35–75.
- Frölich M and Huber M (2017) Direct and indirect treatment effects–causal chains and mediation analysis with instrumental variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 1645–1666.
- Imai K, Tingley D and Yamamoto T (2013) Experimental designs for identifying causal mechanisms. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176, 5–51.
- Kasy M (2009) Semiparametrically efficient estimation of conditional instrumental variables parameters. The International Journal of Biostatistics, 5.
- Kennedy EH (2020) Optimal doubly robust estimation of heterogeneous causal effects. arXiv preprint arXiv:2004.14497.
- Kessler RC, Duncan GJ, Gennetian LA, Katz LF, Kling JR, Sampson NA, Sanbonmatsu L, Zaslavsky AM and Ludwig J (2014) Associations of housing mobility interventions for children in high-poverty neighborhoods with subsequent mental disorders during adolescence. JAMA, 311, 937–948. [Retracted]
- Klaassen CA (1987) Consistent estimation of the influence function of locally asymptotically linear estimators. The Annals of Statistics, 1548–1562.
- Kling JR, Liebman JB and Katz LF (2007) Experimental analysis of neighborhood effects. Econometrica, 75, 83–119.
- van der Laan MJ and Luedtke AR (2015) Targeted learning of the mean outcome under an optimal dynamic treatment rule. Journal of Causal Inference, 3, 61–95.
- van der Laan MJ and Petersen ML (2008) Direct effect models. The International Journal of Biostatistics, 4.
- van der Laan MJ, Polley EC and Hubbard AE (2007) Super learner. Statistical Applications in Genetics and Molecular Biology, 6.
- Laber EB, Linn KA and Stefanski LA (2014) Interactive model building for Q-learning. Biometrika, 101, 831–847.
- Linn KA, Laber EB and Stefanski LA (2015) Chapter 15: Estimation of dynamic treatment regimes for complex outcomes: balancing benefits and risks. In Adaptive treatment strategies in practice: Planning trials and analyzing data for personalized medicine, 249–262. SIAM.
- Lizotte DJ, Bowling M and Murphy SA (2012) Linear fitted-Q iteration with multiple reward functions. The Journal of Machine Learning Research, 13, 3253–3295.
- Ludwig J, Liebman JB, Kling JR, Duncan GJ, Katz LF, Kessler RC and Sanbonmatsu L (2008) What can we learn about neighborhood effects from the Moving to Opportunity experiment? American Journal of Sociology, 114, 144–188.
- Luedtke AR and van der Laan MJ (2016) Super-learning of an optimal dynamic treatment rule. The International Journal of Biostatistics, 12, 305–332.
- MacKinnon DP, Krull JL and Lockwood CM (2000) Equivalence of the mediation, confounding and suppression effect. Prevention Science, 1, 173–181.
- Miles C, Kanki P, Meloni S and Tchetgen ET (2017) On partial identification of the natural indirect effect. Journal of Causal Inference, 5.
- Moodie EE, Chakraborty B and Kramer MS (2012) Q-learning for estimating optimal dynamic treatment rules from observational data. Canadian Journal of Statistics, 40, 629–645.
- Murphy S (2003) Optimal dynamic treatment regimes. Journal of the Royal Statistical Society: Series B, 65, 331–66.
- Murray CJ, Vos T, Lozano R, Naghavi M, Flaxman AD, Michaud C, Ezzati M, Shibuya K, Salomon JA, Abdalla S et al. (2012) Disability-adjusted life years (DALYs) for 291 diseases and injuries in 21 regions, 1990–2010: a systematic analysis for the Global Burden of Disease Study 2010. The Lancet, 380, 2197–2223.
- Nabi R, Kanki P and Shpitser I (2018) Estimation of personalized effects associated with causal pathways. In Uncertainty in artificial intelligence: proceedings of the… conference. Conference on Uncertainty in Artificial Intelligence, vol. 2018. NIH Public Access.
- Ogburn EL (2012) Commentary on “Mediation analysis without sequential ignorability: Using baseline covariates interacted with random assignment as instrumental variables” by Dylan Small. Journal of Statistical Research, 46, 105–111.
- Ogburn EL, Rotnitzky A and Robins JM (2015) Doubly robust estimation of the local average treatment effect curve. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77, 373–396.
- Pearl J (2000) Causality: models, reasoning, and inference. Cambridge University Press. ISBN 0, 521, 8.
- Pearl J (2001) Direct and indirect effects. In Proceedings of the seventeenth conference on uncertainty in artificial intelligence, 411–420. Morgan Kaufmann Publishers Inc.
- Petersen ML, Sinisi SE and van der Laan MJ (2006) Estimation of direct causal effects. Epidemiology, 276–284.
- Qian M and Murphy SA (2011) Performance guarantees for individualized treatment rules. Annals of Statistics, 39, 1180.
- Robins JM (2004) Optimal structural nested models for optimal sequential decisions. In Proceedings of the Second Seattle Symposium in Biostatistics (eds. Lin DY and Heagerty PJ), 189–326. Seattle, Washington: Springer Science+Business Media.
- Rubin D and van der Laan MJ (2007) A doubly robust censoring unbiased transformation. The International Journal of Biostatistics, 3.
- Rudolph KE, Gimbrone C and Díaz I (2021a) Helped into harm: Mediation of a housing voucher intervention on mental health and substance use in boys. Epidemiology, 32, 336–346.
- Rudolph KE, Goin DE and Stuart EA (2020) The peril of power: a tutorial on using simulation to better understand when and how we can estimate mediating effects. American Journal of Epidemiology, 189, 1559–1567.
- Rudolph KE, Sofrygin O and van der Laan MJ (2021b) Complier stochastic direct effects: identification and robust estimation. Journal of the American Statistical Association, 116, 1254–1264.
- Rudolph KE, Sofrygin O, Zheng W and van der Laan MJ (2017) Robust and flexible estimation of stochastic mediation effects: A proposed method and example in a randomized trial setting. Epidemiologic Methods, 7.
- Rudolph KE, Williams N and Díaz I (2021c) Causal mediation with instrumental variables. arXiv preprint arXiv:2112.13898.
- Sanbonmatsu L, Katz LF, Ludwig J, Gennetian LA, Duncan GJ, Kessler RC, Adam EK, McDade T and Lindau ST (2011) Moving to Opportunity for Fair Housing demonstration program: Final impacts evaluation.
- Shpitser I and Sherman E (2018) Identification of personalized effects associated with causal pathways. In Uncertainty in artificial intelligence: proceedings of the… conference. Conference on Uncertainty in Artificial Intelligence, vol. 2018. NIH Public Access.
- Sobel ME (2006) What do randomized studies of housing mobility demonstrate? Causal inference in the face of interference. Journal of the American Statistical Association, 101, 1398–1407.
- Tchetgen Tchetgen EJ and Shpitser I (2014) Estimation of a semiparametric natural direct effect model incorporating baseline covariates. Biometrika, 101, 849–864.
- van der Laan M, Dudoit S and van der Vaart AW (2006) The cross-validated adaptive epsilon-net estimator. Statistics & Decisions, 24, 373–395.
- VanderWeele TJ and Tchetgen Tchetgen EJ (2017) Mediation analysis with time varying exposures and mediators. Journal of the Royal Statistical Society. Series B, Statistical Methodology, 79, 917.
- VanderWeele TJ, Vansteelandt S and Robins JM (2014) Effect decomposition in the presence of an exposure-induced mediator-outcome confounder. Epidemiology (Cambridge, Mass.), 25, 300.
- Whitehead SJ and Ali S (2010) Health outcomes in economic evaluation: the QALY and utilities. British Medical Bulletin, 96, 5–21.
- Zhang B, Tsiatis AA, Davidian M, Zhang M and Laber E (2012) Estimating optimal treatment regimes from a classification perspective. Stat, 1, 103–114.
- Zhao Y, Zeng D, Rush AJ and Kosorok MR (2012) Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association, 107, 1106–1118.
- Zhao Y-Q, Zeng D, Laber EB and Kosorok MR (2015) New statistical learning methods for estimating optimal dynamic treatment regimes. Journal of the American Statistical Association, 110, 583–598.
- Zheng W and van der Laan MJ (2011) Cross-validated targeted minimum-loss-based estimation. In Targeted Learning, 459–474. Springer.
- Zheng W and van der Laan MJ (2012) Targeted maximum likelihood estimation of natural direct effects. The International Journal of Biostatistics, 8, 1–40.
