Author manuscript; available in PMC: 2017 Dec 1.
Published in final edited form as: Ann Stat. 2016 Nov 23;44(6):2433–2466. doi: 10.1214/15-AOS1411

CAUSAL INFERENCE WITH A GRAPHICAL HIERARCHY OF INTERVENTIONS

Ilya Shpitser, Eric Tchetgen Tchetgen
PMCID: PMC5597261  NIHMSID: NIHMS900488  PMID: 28919652

Abstract

Identifying causal parameters from observational data is fraught with subtleties due to the issues of selection bias and confounding. In addition, more complex questions of interest, such as effects of treatment on the treated and mediated effects, may not always be identified even in data where treatment assignment is known and under investigator control, or may be identified under one causal model but not another.

Increasingly complex effects of interest, coupled with a diversity of causal models in use, have resulted in a fragmented view of identification. This fragmentation makes it unnecessarily difficult to determine if a given parameter is identified (and in what model), and what assumptions must hold for this to be the case. This, in turn, complicates the development of estimation theory and sensitivity analysis procedures.

In this paper, we give a unifying view of a large class of causal effects of interest, including novel effects not previously considered, in terms of a hierarchy of interventions, and show that identification theory for this large class reduces to an identification theory of random variables under interventions from this hierarchy. Moreover, we show that one type of intervention in the hierarchy is naturally associated with queries identified under the Finest Fully Randomized Causally Interpretable Structure Tree Graph (FFRCISTG) model of Robins (via the extended g-formula), and another is naturally associated with queries identified under the Non-Parametric Structural Equation Model with Independent Errors (NPSEM-IE) of Pearl, via a more general functional we call the edge g-formula.

Our results motivate the study of estimation theory for the edge g-formula, since we show it arises both in mediation analysis, and in settings where treatment assignment has unobserved causes, such as models associated with Pearl’s front-door criterion.

Keywords: causal inference, graphical models, mediation analysis, identification

1. Introduction

The goal of the empirical sciences is discerning cause-effect relationships by experimentation and analysis. This is made difficult by the ubiquity of hidden variables, and the difficulty of collecting data free from confounding and selection bias. Two useful frameworks for addressing these difficulties have been potential outcomes, introduced by Neyman [8] and expanded by Rubin [21], and causal graphical models, first used in linear models by Wright [35], and later expanded into a general framework (see for example [30] and [11]). There exists a modern synthesis of these two frameworks, where causal models based on non-parametric structural equations are defined on potential outcome random variables, and assumptions defining these models can be represented by the absence of arrows in a graph. See [11], chapter 7, and [13] for a detailed treatment.

Potential outcome random variables represent outcomes under a hypothetical intervention operation, which corresponds to an idealized randomized control trial. Concepts such as the overall causal effect of a treatment can be represented as causal parameters on appropriate potential outcomes, and as statistical estimands if appropriate assumptions hold.

The synthesis of potential outcomes and graphs has been instrumental in much of the recent work on identification of various types of causal parameters such as total effects [14, 33, 25, 26, 27], and mediated effects [10, 1, 24].

Nevertheless, the existing literature suffers from three problems. First, a single graph may correspond to different causal models, which means a particular causal parameter may be identified under one causal model, but not under another, even though the models share the same graph. Second, different types of causal parameters seem to have different key issues underlying their identification, which makes it difficult to determine the specific assumptions that must hold for identification. For instance, certain types of unobserved confounding must be absent in order for overall effects to be identifiable, while even completely unconfounded mediated effects may be unidentified [1]. Finally, because of the complex nature of identification theory for causal parameters, existing conventional wisdom on what is identifiable is too conservative. For example, it is often assumed that a mediator and outcome must remain completely unconfounded in order to obtain identification of mediated causal effects. However, this is not true [24].

These issues make it difficult to determine if a particular causal parameter is identified, and under what model, what assumptions underlie this identification, and what the corresponding statistical parameter is. This complicates estimation theory, the development of parametric relaxations that permit identification, and sensitivity analysis procedures.

1.1. Outline of the Paper

The contents of the paper can be summarized by a picture in Fig. 1. In section 2, we introduce our notation, necessary graph theory, standard interventions (which we call node interventions in this manuscript) and potential outcomes, which are responses to node interventions. We also introduce the FFRCISTG model of Robins, which in this paper we call the “single world model (SWM),” and the NPSEM-IE of Pearl, which is a submodel of the FFRCISTG model, and which we call the “multiple worlds model (MWM).” The reasons for these names will become clear when these models are defined. The subset relationship of these two models is shown explicitly in Fig. 1. Finally, we discuss targets of interest in causal inference known as total effects, which are defined in terms of node interventions, and discuss identification theory for these targets under the SWM via the extended g-formula.

Fig 1. A hierarchy of responses to interventions defined with respect to features of a causal graph, the relationship of this hierarchy to targets of interest in causal inference, such as path-specific effects (PSEs), effects of treatments on the multiply treated (ETMTs), and new targets such as effects of treatments on the indirectly treated (ETITs), and identifiability under causal models defined in the literature.

In section 3, we define additional types of interventions, that we term edge and path interventions, and responses to these types of interventions via recursive substitution. Responses to node, edge and path interventions form an inclusion hierarchy in the sense that responses to node interventions are a special case of responses to edge interventions, which are in turn a special case of responses to path interventions. This inclusion is denoted by the subset relations in Fig. 1. We also discuss how targets of inference in mediation analysis known as direct and indirect effects are defined in terms of edge interventions.

In section 4, we show how we can express a wide variety of targets of interest in causal inference, such as path-specific effects (PSEs) or effects of treatment on the multiply treated (ETMTs) as responses to path interventions. In addition, we show that path interventions are general enough to accommodate novel targets which combine features of PSEs and ETMTs, which we call effects of treatment on the indirectly treated (ETITs). Our results then imply novel identification results for these targets, and others not previously considered in the literature, but expressible as path interventions.

In section 5, we show that there is a natural correspondence between causal models and intervention types we discuss in the following sense. We show that responses to node interventions are identified under the SWM, and responses to edge interventions are identified under the MWM. Furthermore, we show that if a response to an edge intervention cannot be expressed as a node intervention, then it is not identified under the SWM, and if a response to a path intervention cannot be expressed as an edge intervention, then it is not identified under the MWM.

The identification of node interventions under the SWM is via the well known extended g-formula [20, 13], which we give as equation (2). The identification of edge interventions under the MWM is via a generalization of (2), which we call the edge g-formula, and give as equation (5).

We also give examples of targets of interest in causal inference that do not correspond to responses to path interventions, as well as an example of a submodel of the MWM where even path interventions not ordinarily identified under the MWM are identified.

In Section 6 we briefly discuss the relationship of our results to Single World Intervention Graphs (SWIGs) [13].

Section 7 shows that a certain class of functionals that identify causal effects in latent variable causal models [33, 25] corresponds to functionals derived from the edge g-formula. This implies, in particular, that functionals that arise for treatment effects with unobserved causes of treatments, such as the front-door functional, also arise in mediation analysis.

In section 8, we illustrate the connection of our work to existing estimation theory for causal parameters, and suggest avenues of future work, by giving a known example of an estimator for a parameter derived from a special case of the edge g-formula.

What the overall picture implies is that once we solve the identification problem for the responses to interventions in our hierarchy, as we do here, we immediately reduce the identification problem for a wide class of targets of interest to the much easier problem of translating those targets into responses to path interventions. Once that translation is complete, the question of what is identified under what model is immediately settled. In addition, our developments imply that estimation theory for functionals derived from the edge g-formula is relevant for a large class of inference targets identified under the MWM, including path-specific effects, effects of treatment on the multiply treated, and certain total causal effects with unobserved causes of treatments.

In the interests of space, the vast majority of arguments for our results appear in the appendices in the supplementary materials [29]. In addition, the supplementary materials contain our rationale for the use of path interventions, rather than simpler or more algebraic representations of causal inference targets.

2. Notation and Definitions

We introduce graph theory terms, potential outcomes, and statistical and causal graphical models.

2.1. Graphs and Random Variables

We will associate random variables with vertices in graphs. We will denote both a single vertex and a single corresponding random variable as an uppercase Roman letter, e.g. A. Sets of vertices (and corresponding random variables) will be denoted by uppercase bold letters, e.g. A.

For a random variable V, let 𝔛V be the state space of V. For example if V is binary, then 𝔛V = {0, 1}. We denote elements of a set 𝔛A (values of A) by lowercase Roman letters: a ∈ 𝔛A. The state space of a set $\mathbf{V}$ of random variables is simply the Cartesian product of the individual state spaces: $\mathfrak{X}_{\mathbf{V}} = \times_{V \in \mathbf{V}} \mathfrak{X}_V$.

Sets of values corresponding to sets of random variables will be denoted by lowercase bold letters, e.g. a ∈ 𝔛A. Sometimes we will denote a restriction of a set of values by a set subscript. That is, if $\mathbf{v}$ is a set of values of $\mathbf{V}$, and $\mathbf{A} \subseteq \mathbf{V}$, then $\mathbf{v}_{\mathbf{A}}$ is the restriction of $\mathbf{v}$ to $\mathbf{A}$.

An edge in a graph is a vertex adjacency coupled with an orientation. A path in a directed graph is a (possibly empty) sequence of nodes of the form (A1A2A3 . . .Ak−1Ak), where each node in the sequence occurs exactly once, and each Ai, Ai+1 share an edge. The first vertex in a path sequence is called the source, and the last vertex is called the sink. A path with two vertices (A1A2) is just an edge.

A subpath of a path is a subsequence of edges in a path that themselves form a path. A suffix subpath of (A1A2 . . . Am−1Am . . . Ak−1Ak) is a subpath of the form (Am−1Am . . . Ak−1Ak), while a prefix subpath is a subpath of the form (A1A2 . . . Am−1Am). A directed path from A1 to Ak has, for every i, an edge directed from Ai to Ai+1. We will denote a directed path as (A1A2 . . . Ak), and also by Greek letters, e.g. α, and sets of directed paths by bold Greek letters, e.g. α. A source vertex of α will be written so𝒢 (α), and the sink vertex will be written sink𝒢 (α).

We say a directed cycle exists in a graph if it contains a directed path (A1A2A3 . . . Ak) and an edge (AkA1). A directed graph lacking directed cycles is called a directed acyclic graph (DAG).
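The graph-theoretic terms above can be made concrete with a small sketch. The code below is an illustration only: it encodes a hypothetical four-vertex directed graph by its parent sets, checks that it has no directed cycle, and enumerates the directed paths between two vertices. The vertex names and the helpers `is_acyclic` and `directed_paths` are assumptions of this sketch, not notation from the paper.

```python
# Hypothetical graph, stored as vertex -> tuple of parents.
PARENTS = {"W": (), "A": ("W",), "M": ("A", "W"), "Y": ("A", "M")}

def children(parents, v):
    """Vertices that have v as a parent."""
    return [c for c, ps in parents.items() if v in ps]

def is_acyclic(parents):
    """Repeatedly strip vertices with no remaining parents; a cycle blocks this."""
    remaining = {v: set(ps) for v, ps in parents.items()}
    while remaining:
        roots = [v for v, ps in remaining.items() if not ps]
        if not roots:
            return False  # every remaining vertex has a parent: a directed cycle exists
        for r in roots:
            del remaining[r]
        for ps in remaining.values():
            ps.difference_update(roots)
    return True

def directed_paths(parents, source, sink):
    """All directed paths from source to sink, each as a tuple of vertices."""
    if source == sink:
        return [(source,)]
    paths = []
    for c in children(parents, source):
        for tail in directed_paths(parents, c, sink):
            paths.append((source,) + tail)
    return paths
```

In this hypothetical graph there are two directed paths from `A` to `Y`: one through `M` and one consisting of the single edge from `A` to `Y`.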

2.2. Causal Models of a DAG

For a subset A of random variables V, and a value assignment a to A, we denote a forced assignment of A to an element of 𝔛A as a node intervention. A node intervention which maps A to a ∈ 𝔛A will be denoted by νa. Pearl denoted node interventions νa by do(a), and Robins by g = a. We use alternative notation in this paper to avoid ambiguity, because we will consider other types of interventions. It is also possible to consider more complex types of interventions on nodes, known as dynamic treatment regimes, where assigned values to A are not constants, but functions of variables assigned and observed in the past [14, 7, 6]. Although generalizations of our results to this setting are possible, we do not pursue them in the interests of space.

For a random variable $Y \in \mathbf{V}$ and a ∈ 𝔛A for a set $\mathbf{A} \subseteq \mathbf{V}$, we denote a (random) response to a node intervention νa as Y (a). These random variables are also called potential outcomes, because Y is often an outcome of interest, and the intervention is often hypothetical, rather than actually occurring. Given a set Y = {Y1, . . ., Yk} of random variables, we denote {Y1(a), . . ., Yk(a)} by Y(a) or {Y}(a).

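Let pa𝒢 (V) be the set of parents of V in 𝒢, that is, the set {W | (WV) is in 𝒢}. Following [13], given a DAG 𝒢 with vertices $\mathbf{V}$, we will assume the existence of $V(\mathbf{v}_{\mathrm{pa}_{\mathcal{G}}(V)})$ for every $V \in \mathbf{V}$ and for all $\mathbf{v}_{\mathrm{pa}_{\mathcal{G}}(V)} \in \mathfrak{X}_{\mathrm{pa}_{\mathcal{G}}(V)}$, as well as a well-defined joint distribution over these random variables, and use these potential outcomes, and the associated joint, to define others using recursive substitution.

In particular, for any $\mathbf{A} \subseteq \mathbf{V}$, and any $\mathbf{a} \in \mathfrak{X}_{\mathbf{A}}$, we define for every $V \in \mathbf{V}$

$$V(\mathbf{a}) \equiv V\left(\mathbf{a}_{\mathrm{pa}_{\mathcal{G}}(V)}, \{\mathrm{pa}_{\mathcal{G}}(V) \setminus \mathbf{A}\}(\mathbf{a})\right) \tag{1}$$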

In words, this states that the response of V to νa is defined as the potential outcome where all parents of V which are in A are assigned an appropriate value from a, and all other parents are assigned whatever value they would have attained under a node intervention νa (these are defined recursively, and the definition terminates because of the lack of directed cycles in 𝒢). For example, in the graph in Fig. 2 (a), Y (a) = Y (a, M (a)).
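The recursion in (1) can be sketched for a single unit as follows. This is a minimal illustration under stated assumptions: the parent sets are those suggested by the potential outcomes A(w), M(a, w), Y(a, m) used in the text for Fig. 2 (a), while the binary structural functions in `FUNCS`, the per-unit error terms, and the name `node_response` are hypothetical.

```python
# Parent sets for Fig. 2 (a), read off from A(w), M(a, w), Y(a, m).
PARENTS = {"W": (), "A": ("W",), "M": ("A", "W"), "Y": ("A", "M")}

# Hypothetical binary structural functions: (parent values, error) -> value.
FUNCS = {
    "W": lambda pa, e: e,
    "A": lambda pa, e: pa["W"] ^ e,
    "M": lambda pa, e: (pa["A"] & pa["W"]) ^ e,
    "Y": lambda pa, e: (pa["A"] | pa["M"]) ^ e,
}

def node_response(v, a, errors):
    """Value of V(a) for one unit, following the recursion in (1)."""
    pa_values = {}
    for p in PARENTS[v]:
        if p in a:
            pa_values[p] = a[p]  # parent is intervened on: use its assigned value
        else:
            pa_values[p] = node_response(p, a, errors)  # otherwise: its own response
    return FUNCS[v](pa_values, errors[v])

# One hypothetical unit.
errors = {"W": 1, "A": 0, "M": 0, "Y": 0}
y_a = node_response("Y", {"A": 1}, errors)
m_a = node_response("M", {"A": 1}, errors)
```

For this unit, `node_response("Y", {"A": 1, "M": m_a}, errors)` returns the same value as `y_a`, which mirrors the identity Y (a) = Y (a, M (a)).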

Fig 2. (a) A simple causal graph. (b) The transitive closure with respect to blue arrows of this graph is a causal graph representing two time slices of a longitudinal study in HIV research.

It is possible to construct additional types of potential outcomes other than those that are responses to node interventions. We will discuss some such potential outcomes later. However, responses to node interventions are sufficient to define causal models. Just as a statistical model is a set of distributions over V defined by some restriction, we view a causal model as a set of distributions over $\{V(\mathbf{v}_{\mathrm{pa}_{\mathcal{G}}(V)}) \mid V \in \mathbf{V}\}$ defined by some restriction. We will call elements of a causal model causal structures, and denote them as c(V, 𝒢), by analogy with p(V), but indexed by a graph. In this paper we will consider two causal models.

We adopt the definitions presented in [13]. We define the finest fully randomized causally interpretable structured tree graph (FFRCISTG) model associated with a DAG 𝒢 with vertices V, as the set of all possible potential outcome responses subject to the restriction that the variables in the set

$$\left\{ V(\mathbf{v}_{\mathrm{pa}_{\mathcal{G}}(V)}) \mid V \in \mathbf{V} \right\}$$

are mutually independent for every v ∈ 𝔛V. We define the non-parametric structural equation model with independent errors (NPSEM-IE) associated with a DAG 𝒢 with vertices V, as the set of all possible potential outcome responses subject to the restriction that the sets of variables

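$$\left\{ \left\{ V(\mathbf{a}_V) \mid \mathbf{a}_V \in \mathfrak{X}_{\mathrm{pa}_{\mathcal{G}}(V)} \right\} \mid V \in \mathbf{V} \right\}$$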

are mutually independent. The NPSEM-IE associated with a particular graph is a submodel of the FFRCISTG model associated with the same graph, because it always places at least as many restrictions on potential outcome responses, and in most cases many more.

For example, the binary FFRCISTG model associated with the DAG in Fig. 2 (a) asserts that variables W, A(w), M (a, w), Y (a, m) are mutually independent for any a, m, w ∈ {0, 1}, while the binary NPSEM-IE model associated with the same DAG asserts that sets {W}, {A(w) | w ∈ {0, 1}}, {M (a, w) | a ∈ {0, 1}, w ∈ {0, 1}}, {Y (a, m) | a ∈ {0, 1}, m ∈ {0, 1}} are mutually independent. The FFRCISTG model always imposes restrictions on a set of variables under a single set of interventions (a “single world”), while the NPSEM-IE may also impose restrictions on variables across multiple conflicting sets of interventions simultaneously. To emphasize this, we will refer to the FFRCISTG model as a “single world model” (SWM), and to the NPSEM-IE as a “multiple worlds model” (MWM) in the remainder of this paper.

A crucial difference between the SWM and the MWM is that the assumptions of the former are possible to test, at least in principle, by checking independences in a distribution of responses in an idealized randomized controlled trial. That is, if we wanted to check if W is independent of A(w), we could check independence in a joint distribution obtained from recording, for a set of units, the values of W immediately before treatment w is assigned, and the response values of A under that assignment. However, checking if M (a) is independent of Y (a′, m) would entail somehow knowing how the response M of a unit behaves under assigned treatment a, and simultaneously how the response Y of the unit behaves under a conflicting treatment a′ (and m). One may be able to argue for explicit construction of such joint responses in certain designs [5], or for certain types of units, for instance logic gates in a digital circuit. However, in general, assumptions defining the MWM are not experimentally testable.

2.3. Identification of Node Interventions

Responses to interventions of various types can be used to define targets of interest, discussed in more detail in Section 4. However, in order for these definitions to be useful, they must be linked to actually observed data. If such a link can be provided, that is, if a particular response can be expressed as a functional of the observed joint distribution p(V) for any element of a causal model, we say that the response is identified under that causal model from p(V).

In causal models, this link is typically provided via the consistency assumption, which is sometimes informally stated as “in the subpopulation where A = a, Y(a) behaves as Y.” Under the definition of the SWM (and the MWM), consistency is implied by (1), see [13], p. 21. Consistency is therefore “folded in” to the model definition, and we will describe identification in terms of a particular model without mentioning consistency itself. Note that (1) is an assumption defined using a particular graph. If we are mistaken about the true graph, for instance due to the presence of unaccounted hidden variables, then some parts of (1), and thus some parts of the consistency assumption, may not be justifiable under the true causal model.

Identification theory for node interventions in causal DAG models is well understood. Given a DAG 𝒢 with vertices V, and two arbitrary subsets A, Y of V (not necessarily disjoint), the distribution p(Y(a)) for any value assignment a ∈ 𝔛A can be identified under the SWM as a functional of the observed distribution p(V) using the extended g-formula [20], given by

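$$p(\mathbf{Y}(\mathbf{a}) = \mathbf{v}_{\mathbf{Y}}) = \sum_{\mathbf{v}_{\mathbf{V} \setminus \mathbf{Y}}} \prod_{V \in \mathbf{V}} p\left(v_V \mid \mathbf{a}_{\mathrm{pa}_{\mathcal{G}}(V) \cap \mathbf{A}}, \mathbf{v}_{\mathrm{pa}_{\mathcal{G}}(V) \setminus \mathbf{A}}\right) \tag{2}$$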

where v ∈ 𝔛V. A recent proof of this appears in [13]. Special cases of (2) where A and Y are disjoint are known as the g-formula [14], the manipulated distribution [30], or the truncated factorization [11]. Because the MWM is a causal submodel of the SWM, (2) also holds under the MWM.
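As a rough numerical illustration of (2), the sketch below evaluates the extended g-formula on a binary version of the graph in Fig. 2 (a). The conditional probability tables in `P1` are invented for the example, and the function names `cond` and `g_formula` are hypothetical; the parent sets are those implied by A(w), M(a, w), Y(a, m) in the text.

```python
from itertools import product

PARENTS = {"W": (), "A": ("W",), "M": ("A", "W"), "Y": ("A", "M")}
ORDER = ["W", "A", "M", "Y"]

# Hypothetical values of p(V = 1 | parents), keyed by parent values in PARENTS order.
P1 = {
    "W": {(): 0.4},
    "A": {(0,): 0.3, (1,): 0.7},
    "M": {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9},
    "Y": {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.8},
}

def cond(v, value, pa_values):
    """p(V = value | parents = pa_values) for a binary vertex."""
    p1 = P1[v][pa_values]
    return p1 if value == 1 else 1.0 - p1

def g_formula(y_vars, y_vals, a):
    """p(Y(a) = y) via (2): sum over all other vertices of a product of conditionals."""
    target = dict(zip(y_vars, y_vals))
    total = 0.0
    for vals in product([0, 1], repeat=len(ORDER)):
        v = dict(zip(ORDER, vals))
        if any(v[k] != target[k] for k in target):
            continue  # keep only assignments that agree with the requested outcome values
        term = 1.0
        for node in ORDER:
            # Parents in A take their intervened value; the rest take their value in v.
            pa = tuple(a[p] if p in a else v[p] for p in PARENTS[node])
            term *= cond(node, v[node], pa)
        total += term
    return total

p_y1_do_a1 = g_formula(["Y"], [1], {"A": 1})
```

Because the product in (2) runs over every vertex, including those in A, the same function also returns the distribution of the natural value of a treated variable, for example `g_formula(["A"], [1], {"A": 1})`.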

2.4. Total Effects as Responses to Node Interventions

Node interventions are used to represent causal effects of treatments as a contrast of potential outcome responses to different treatment assignments. By considering an intervention we remove the impact of confounding via assignment policy. For example, consider the simple causal graph shown in Fig. 2 (a), representing an observational study with a single application of one of two treatments m, m′. Variable M is assigned to either m or m′ based on (observed) patient health status (A, W), and survival Y is measured. Doctors follow a known policy p(M | A, W) in assigning M where sicker patients are more likely to get m. Note that p(alive | m) < p(alive | m′) may hold simply due to the assignment policy in the study which introduces confounding by health status, even if m is a better drug.

One appropriate contrast that adjusts for the influence of confounding by health status on the effect of interest can be expressed via node interventions, and is known as the average causal effect (ACE): 𝔼[Y (m)] − 𝔼[Y (m′)]. This contrast can be computed from the distribution p(Y (m)) for all m ∈ 𝔛M, which is equal, under (2), to

$$p(Y(m)) = \sum_{w,a,\tilde{m}} p(Y \mid m,a,w)\, p(\tilde{m} \mid a,w)\, p(a,w) = \sum_{w,a} p(Y \mid m,a,w)\, p(a,w),$$

where $\tilde{m}$ ranges over the values M would take naturally, so that its factor sums to one.

This recovers the well-known back-door formula [11].

Consider now a more complex example corresponding to the following problem from HIV research. In a longitudinal study, HIV patients were put on an antiretroviral drug regimen, where the specific level of drug exposure over time was controlled by a known policy, which was based on covariates observed for each patient. However, the outcome of the study has been disappointing. The question is whether this was due to the drug itself performing poorly, or whether patients’ adherence was poor. Consider a causal graph representing two time slices of this longitudinal study. To avoid cluttering the figure with too many edges, we represent the causal graph schematically by its transitive reduction with respect to blue edges, shown in Fig. 2 (b). That is, the true graph 𝒢* contains a blue arrow between any pair of nodes A, B connected by a blue directed path in Fig. 2 (b) (and inherits all red edges as well).

Here C0 is a vector of observed baseline confounders, A1, A2 are exposures over time, W1, W2 are drug toxicity levels at each exposure time, C1, C2 are adherence levels at each time, Y1, Y2 are outcomes, and U is an unobserved confounder. Both red and blue arrows represent direct causation. In general, a reasonable causal graph will contain unobserved common causes of most vertices, but in this example we assume adherence C1, C2, and treatments A1, A2 are only directly affected by the observed variables in the past, such as the toxicity level of the drug, and not by U . These assumptions are represented graphically by the absence of red edges from U to A1, A2, C1, C2.

We first consider the total effect of the two exposures on outcome Y2, formalized as the two-exposure version of ACE. We consider more complex effects involving mediation by adherence in subsequent sections. The ACE contrast is defined with respect to active treatment levels, which we denote a1, a2, and baseline treatment levels, which we denote a′1, a′2. In our case, the contrast is equal to ACE ≡ 𝔼[Y2(a1, a2)] − 𝔼[Y2(a′1, a′2)]. If we were able to randomize treatment assignment to A1, A2, we could evaluate the ACE directly from experimental data. However, our data comes from an observational longitudinal study, and therefore we must properly adjust for observed confounders of the exposures. Robins [14] noted that in cases like these, assuming the underlying SWM represented by our graph is correct, we can get a bias-free estimand of the ACE from observational data using the g-computation algorithm, which in this case gives

$$\mathrm{ACE} = \sum_{y_1,c_1,w_1,c_0} \mathbb{E}[Y_2 \mid a_2,y_1,c_1,w_1,a_1,c_0]\, p(y_1,c_1,w_1 \mid a_1,c_0)\, p(c_0) - \sum_{y_1,c_1,w_1,c_0} \mathbb{E}[Y_2 \mid a'_2,y_1,c_1,w_1,a'_1,c_0]\, p(y_1,c_1,w_1 \mid a'_1,c_0)\, p(c_0)$$

This is, yet again, a special case of (2). This estimand can be estimated via either the parametric g-formula [15], inverse weighting methods [19], or doubly robust methods [18].

In the following section, we introduce intervention types that generalize node interventions, and consider other types of causal effects which may be represented as responses to such intervention types.

3. Edge and Path Interventions

We consider two additional types of interventions defined on graphical features, edge and path interventions, and define responses to these interventions using recursive substitution in a natural way. As we shall see, responses to path interventions include many targets of interest in causal inference, including effects of treatment on the treated, mediated effects, and even novel effects that combine features of both.

3.1. Edge Interventions

For a set of edges α in a DAG 𝒢, define 𝔛α ≡ 𝔛so𝒢 (α). In other words, 𝔛α is a Cartesian product of the state spaces of source variables of all directed edges in α.

The state space of a given vertex in 𝒢 may occur multiple times in 𝔛α if multiple edges in α share the same source vertex. We denote members of 𝔛α by lowercase Fraktur font: 𝔞 ∈ 𝔛α. We do so to emphasize that elements of 𝔛α may contain multiple conflicting value assignments to the same random variable, unlike elements of 𝔛A. For example, consider the graph in Fig. 2 (a), where 𝔛A = {0, 1}. Then if α = {(AM), (AY)}, a valid element 𝔞 of 𝔛α associates 0 with the variable associated with the parent vertex A of (AM) and 1 with the variable associated with the parent vertex A of (AY). Unlike elements of 𝔛A, it is not immediately clear what set of edges 𝔞 is referring to, so we will subscript the set of edges, if necessary, like so: 𝔞α.

We call a forced assignment of variables corresponding to source vertices of edges from α to an element of 𝔛α an edge intervention. An edge intervention which assigns α to an element 𝔞α ∈ 𝔛α will be denoted by η𝔞α. As with elements of 𝔛A, we denote a restriction of 𝔞 by a set subscript. That is, if 𝔞α ∈ 𝔛α, and β ⊆ α, then 𝔞β is a restriction of 𝔞 to variables corresponding to source vertices of β.

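We define responses of outcomes to edge interventions in the natural way using recursive substitution, the potential outcomes of the form $V(\mathbf{v}_{\mathrm{pa}_{\mathcal{G}}(V)})$, and a joint distribution over these potential outcomes. For every $V \in \mathbf{V}$, a set of edges α in a DAG 𝒢, and an element 𝔞α ∈ 𝔛α, we define the response of V to η𝔞α as

$$V(\mathfrak{a}_{\alpha}) \equiv V\left(\mathfrak{a}_{\{(\ast V) \in \alpha\}}, \{\mathrm{pa}^{\bar{\alpha}}_{\mathcal{G}}(V)\}(\mathfrak{a}_{\alpha})\right) \tag{3}$$

where $\mathrm{pa}^{\bar{\alpha}}_{\mathcal{G}}(V) \equiv \{A \in \mathrm{pa}_{\mathcal{G}}(V) \mid (AV) \notin \alpha\}$, and $\{(\ast V) \in \alpha\}$ denotes the edges in α that point into V.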

In words, this states that the response of V to η𝔞α, where 𝔞α ∈ 𝔛α is defined as the potential outcome where all parents of V along edges in α are assigned an appropriate value from 𝔞α, and all other parents are assigned whatever value they would have attained under an edge intervention η𝔞α (these are defined recursively, and the definition terminates because of the lack of directed cycles in 𝒢).
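The recursion in (3) can be sketched for a single unit in the same way as for node interventions. The code below is illustrative only: the parent sets follow Fig. 2 (a) as implied by A(w), M(a, w), Y(a, m), while the structural functions in `FUNCS`, the error terms, and the name `edge_response` are hypothetical. An edge intervention is represented as a dictionary from edges, written as (parent, child) pairs, to assigned values.

```python
PARENTS = {"W": (), "A": ("W",), "M": ("A", "W"), "Y": ("A", "M")}

# Hypothetical binary structural functions: (parent values, error) -> value.
FUNCS = {
    "W": lambda pa, e: e,
    "A": lambda pa, e: pa["W"] ^ e,
    "M": lambda pa, e: (pa["A"] & pa["W"]) ^ e,
    "Y": lambda pa, e: (pa["A"] | pa["M"]) ^ e,
}

def edge_response(v, assignment, errors):
    """Value of V under an edge intervention for one unit, following (3)."""
    pa_values = {}
    for p in PARENTS[v]:
        if (p, v) in assignment:
            pa_values[p] = assignment[(p, v)]  # edge (p, v) is intervened on
        else:
            pa_values[p] = edge_response(p, assignment, errors)  # same intervention, applied to p
    return FUNCS[v](pa_values, errors[v])

errors = {"W": 1, "A": 0, "M": 0, "Y": 0}
# A is set to 1 along (A, M) and to 0 along (A, Y).
y_mixed = edge_response("Y", {("A", "M"): 1, ("A", "Y"): 0}, errors)
```

Assigning the same value to every outgoing edge of `A` reproduces the response to a node intervention on `A`, which is the sense in which node interventions are a special case.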

As before, given a set Y = {Y1, . . ., Yk} of random variables, we denote {Y1(𝔞α), . . ., Yk (𝔞α)} by Y(𝔞α) or {Y}(𝔞α).

3.2. Direct and Indirect Effects as Responses to Edge Interventions

Just as responses to node interventions can be used to represent total causal effects, so can responses to edge interventions be used to represent direct and indirect effects. Consider again Fig. 2 (a), but now assume A is the treatment (one of two drugs a, a′), Y is the outcome (survival), and M is a dangerous side effect that mediates some of the effect of A on Y.

We may be interested in how much of the total effect, as formalized via the ACE contrast 𝔼[Y (a)] − 𝔼[Y (a′)], can be attributed to the direct effect of the drugs on Y, and how much to the mediated effect via the side effect M. To formalize this, we want to consider how Y varies if we can set treatments separately for the purposes of the direct causal pathway represented by (AY) and the pathway mediated by M, represented by (AM). This is precisely what edge interventions allow us to do. Consider η𝔞 that sets (AM) to a and (AY) to a′. Then (3) implies Y (𝔞) = Y (a′, M (a)). We can use this type of response to define the direct effect as the contrast 𝔼[Y (a)] − 𝔼[Y (a′, M (a))], and the indirect effect as the contrast 𝔼[Y (a′, M (a))] − 𝔼[Y (a′)]. Note that the ACE is a sum of the direct and indirect effect contrasts above.
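The two steps used in this paragraph can be written out explicitly. The derivation below is a sketch that assumes the parent sets suggested by the potential outcomes M (a, w) and Y (a, m) for Fig. 2 (a), so that M has parents A and W, and Y has parents A and M; the first two lines unroll (3), and the last line is the telescoping sum behind the decomposition of the ACE.

```latex
\begin{align*}
M(\mathfrak{a}) &= M\big(a, W(\mathfrak{a})\big) = M(a, W) = M(a), \\
Y(\mathfrak{a}) &= Y\big(a', M(\mathfrak{a})\big) = Y\big(a', M(a)\big), \\
\mathbb{E}[Y(a)] - \mathbb{E}[Y(a')]
  &= \underbrace{\mathbb{E}[Y(a)] - \mathbb{E}[Y(a', M(a))]}_{\text{direct effect}}
   + \underbrace{\mathbb{E}[Y(a', M(a))] - \mathbb{E}[Y(a')]}_{\text{indirect effect}}.
\end{align*}
```

The first line uses the fact that W has no parents and is not the source of any edge in α, so its response is its observed value.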

The idea of using nested responses like Y (a′, M (a)) to represent direct and indirect effects for mediation analysis appears in [16], and is discussed in the context of graphical causal models in [10]. Our contribution is to aid interpretability of such nested responses by viewing them as responses to interventions associated with edges, graphical features intuitively associated with effects we are trying to formalize.

Just as it is good practice to only discuss node interventions in settings where it is possible, at least in principle, to assign treatment by fiat, so it is good practice to only discuss edge interventions in settings where it is possible, at least in principle, to conceive of assigning only those components of the overall treatment that influence a particular direct consequence. For instance, if smoking affects cardiovascular disease only by means of nicotine content, then we might simulate the absence of smoking, but only for the purposes of cardiovascular disease, by assigning the “treatment” of nicotine-free cigarettes. In this paper, we leave the issues of applicability of edge interventions and mediation analysis in particular settings aside [17], and consider, in subsequent sections, questions of identification and the form of resulting functionals.

3.3. Path Interventions

We are going to define responses to path interventions, which associate a set of directed paths with values of sources of every path in the set. A response to a path intervention will behave as if the source of a path were set to a particular value, but only for the purposes of a particular outgoing directed path. This behavior generalizes the behavior of edge interventions, where vertices may behave differently with respect to different outgoing edges. Path interventions serve as a very general, graphical representation of counterfactual quantities associated with causal pathways that generalizes both node and edge interventions. The supplementary materials [29] contain our rationale for the use of path interventions versus simpler or more algebraic approaches to representing counterfactuals of interest.

To make sure we end up with well-defined responses, we insist on a property for sets of directed paths called properness. A set of directed paths α in a DAG 𝒢 is called proper if no path in α is a prefix subpath of another path in α. A set consisting of a single path is always proper, as is a set of length 1 paths (e.g. a set of edges). In the remainder of the paper, when we say “a set of paths α,” we mean a proper set of directed paths.
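Properness is straightforward to check mechanically. The sketch below represents each directed path as a tuple of vertex names, which is a representation chosen for this illustration; the helper names `is_prefix` and `is_proper` are likewise hypothetical.

```python
def is_prefix(p, q):
    """True if path p is a prefix subpath of a different, longer path q."""
    return len(p) < len(q) and q[:len(p)] == p

def is_proper(paths):
    """A set of directed paths is proper if no path is a prefix subpath of another."""
    return not any(is_prefix(p, q) for p in paths for q in paths)

edges_only = [("A", "M"), ("A", "Y")]          # a set of edges
two_paths = [("A", "M", "Y"), ("A", "Y")]      # same source, neither is a prefix of the other
clashing = [("A", "M"), ("A", "M", "Y")]       # the first path is a prefix of the second
```

Here `edges_only` and `two_paths` are proper, while `clashing` is not, because its two paths could disagree about the value of `A` along the same initial segment.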

For a set of paths α in a DAG 𝒢, define 𝔛α ≡ 𝔛so𝒢 (α). In other words, 𝔛α is a Cartesian product of the state spaces of source variables of all directed paths in α. Since sets of paths clearly generalize sets of edges, the same issue occurs where a single vertex in 𝒢 may occur multiple times in 𝔛α. As before, to emphasize this, we will denote elements of 𝔛α by lowercase Fraktur font: 𝔞, possibly indexed by a path set subscript: 𝔞α.

We denote a forced assignment of variables corresponding to source vertices of paths from α to an element of 𝔛α as a path intervention. A path intervention which assigns α to an element 𝔞α ∈ 𝔛α will be denoted by π𝔞α. As with elements of 𝔛A, we denote a restriction of 𝔞 by a set subscript. That is, if 𝔞α ∈ 𝔛α, and β ⊆ α, then 𝔞β is a restriction of 𝔞 to variables corresponding to source vertices of β.

As was the case with node and edge interventions, our definition of path interventions will be inductive. To get the induction to work, we need to consider how treatments affect the response via pathways that end in a particular edge. We use the following definition to formalize this. Given a set of paths α in a DAG 𝒢, and an edge (WY), define a funnel operator ◁(WY) which maps α to the set of paths ◁(WY) (α) obtained from α by replacing any path of the form (A, . . ., W, Y) by (A, . . ., W), by removing all paths that contain W but do not have (WY) as a suffix, and by keeping all other paths intact.

Lemma 3.1

If α is proper, then for any edge (WY), so is ◁(WY) (α).
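The funnel operator can be sketched in a few lines of Python. This is an illustrative sketch only: paths are assumed to be tuples of vertex names, the helper names `funnel` and `is_proper` are hypothetical, and the sketch assumes the edge (WY) itself is not in the path set, which is the only case in which the operator is applied in the inductive definition of responses.

```python
def is_proper(paths):
    """True if no path is a strict prefix of another path in the set."""
    return not any(len(p) < len(q) and q[:len(p)] == p for p in paths for q in paths)


def funnel(paths, w, y):
    """Apply the funnel operator for the edge (w, y) to a set of paths."""
    funneled = set()
    for path in paths:
        if path == (w, y):
            # Assumption of this sketch: the edge itself is handled elsewhere.
            raise ValueError("the edge (w, y) itself must not be in the path set")
        if path[-2:] == (w, y):
            funneled.add(path[:-1])   # (A, ..., W, Y) becomes (A, ..., W)
        elif w not in path:
            funneled.add(path)        # paths avoiding W are kept intact
        # paths containing W without the suffix (W, Y) are dropped
    return funneled


alpha = {("W", "A", "M", "Y"), ("W", "A", "Y")}
via_my = funnel(alpha, "M", "Y")   # {("W", "A", "M"), ("W", "A", "Y")}
via_ay = funnel(alpha, "A", "Y")   # {("W", "A")}
```

In both cases the funneled set remains proper, in line with Lemma 3.1.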

Given a path intervention π that assigns α to 𝔞α, and a funnel operator ◁(WY), we consider funneled path interventions on ◁(WY) (α). Every path in α that the funnel operator leaves intact keeps the value assigned to it by the original path intervention. Every path of the form (A . . . W, Y) is replaced by (A . . . W), and the funneled path intervention assigns to this shortened path the value 𝔞(A...WY) that the original intervention gave to (A . . . WY). We denote such an assignment by 𝔞◁(WY) (α).

Our insistence on α being proper, together with Lemma 3.1, means that there is never any ambiguity in defining the funneled path intervention. That is, it is never the case that two distinct paths in α are of the form (A . . . W) and (A . . . WY). If such a pair of paths were allowed, the difficulty would then be that these paths can both reasonably be claimed to represent an effect of setting A along the path (A . . . WY), while potentially disagreeing on what that setting is.

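We are now ready to define responses to path interventions. For every vertex V ∈ V, a set of paths α in a DAG 𝒢, and an element 𝔞α ∈ 𝔛α, we define the response of V to π𝔞α as

$$V(\mathfrak{a}_{\alpha}) \equiv V\big(\mathfrak{a}_{\{(WV) \in \alpha\}},\ \{W(\mathfrak{a}_{\triangleleft_{(WV)}(\alpha)}) \mid W \in \mathrm{pa}^{\bar{\alpha}}_{\mathcal{G}}(V)\}\big) \tag{4}$$

where $\mathrm{pa}^{\bar{\alpha}}_{\mathcal{G}}(V) \equiv \{W \in \mathrm{pa}_{\mathcal{G}}(V) \mid (WV) \notin \alpha\}$.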

In words, this states that the response of V to π𝔞α, where 𝔞α ∈ 𝔛α is defined as the potential outcome where all parents of V along edges which are (length 1) paths in α are assigned an appropriate value from 𝔞α, and all other parents W are assigned whatever value they would have attained under the funneled path intervention associated with a funnel operator for the edge between that parent W and V. Note that the definition is inductive for such parents, with the result of applying a funnel operator serving as the new set of paths. Lemma 3.1 ensures that properness propagates to this set, and thus the overall response is well-defined.

For example, if π𝔞 assigns w to (W AMY) in Fig. 2 (a), then Y (𝔞) is defined by (4) to equal Y (M (A(w)), A). We will use a notational short-hand for responses to path interventions, where rather than listing nested responses in parentheses after the response, we list the paths with the source node replaced by the intervened on value. For example, we write Y (𝔞) = Y (M (A(w)), A) above as Y ((wAMY)). We use the same short-hand for responses to edge interventions.
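The recursion in (4) can be traced symbolically. The sketch below is illustrative only: it assumes Fig. 2 (a) has edges W→A, W→M, A→M, A→Y, M→Y (inferred from the factorization p(y | m, a)p(m | a, w)p(a | w)p(w) used later in the text), encodes a path intervention as a dictionary from paths to values, and uses hypothetical helper names. Responses untouched by the intervention are printed as the natural variable.

```python
# Parents of each vertex in the assumed graph for Fig. 2 (a).
PARENTS = {"W": (), "A": ("W",), "M": ("A", "W"), "Y": ("M", "A")}


def funnel(assign, w, v):
    """Funneled assignment for the edge (w, v); assumes (w, v) is not a key of assign."""
    out = {}
    for path, value in assign.items():
        if path[-2:] == (w, v):
            out[path[:-1]] = value    # shorten the path, keep its value
        elif w not in path:
            out[path] = value         # unrelated paths are kept
    return out


def response(v, assign):
    """Return (expression, touched) for the response of v under definition (4)."""
    args, touched = [], False
    for w in PARENTS[v]:
        if (w, v) in assign:          # parent along an intervened edge
            args.append(str(assign[(w, v)]))
            touched = True
        else:                         # recurse on the funneled intervention
            text, hit = response(w, funnel(assign, w, v))
            args.append(text)
            touched = touched or hit
    if not touched:
        return v, False               # natural value of v
    return f"{v}({', '.join(args)})", True


expression, _ = response("Y", {("W", "A", "M", "Y"): "w"})
print(expression)   # Y(M(A(w), W), A)
```

The printed expression lists the natural value W of the second parent of M explicitly; the text's Y (M (A(w)), A) suppresses that argument.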

As before, given a set Y = {Y1, . . ., Yk} of random variables, we denote {Y1(𝔞α), . . ., Yk (𝔞α)} by Y(𝔞α) or {Y}(𝔞α).

3.4. Responses to Path Interventions to Natural Values

So far we have defined path interventions as a mapping from a proper set of directed paths α to values in 𝔛α. However, we might be interested in considering responses to interventions that assign a variable not to a specific constant value, but to a value the variable would have attained under a no-intervention regime. For instance, this might happen if the baseline exposure is one received by the general population, not a specific exposure level assigned by the experimenter, or if the effect of multiple treatments on the treated is of interest. In the context of node interventions, this situation was discussed in [4]. In order for responses to path interventions to include this case, we must extend the definition of path interventions to include intervening to natural values, that is, values attained by variables under no interventions.

Allowing arbitrary variables to be set to natural values may lead to identification difficulties even in very simple cases. Consider the following response to a node intervention in the MWM given by Fig. 2 (a): {A, Y}(A, w). In words, this is the joint response of A and Y to an intervention where W is set to value w, and A is set to the natural value it attains under no interventions. The definition of responses to node interventions via recursive substitution shows that {A, Y}(A, w) = {Y (A), A(w)}. However, the distribution p(A, A(w)) is not identified under the MWM for the graph in Fig. 2 (a), see Lemma 5.8, and thus neither is the joint response in question.

To avoid this difficulty, we consider only a special subset of path interventions containing settings on natural values. This special subset can safely be rephrased in such a way that only interventions on constants remain explicit. To define this special subset, we need a few preliminary definitions.

For a node A, and a directed path (or an edge) α with source A, define the extended state spaces 𝔛*A ≡ 𝔛A ∪ {A} and 𝔛*α ≡ 𝔛α ∪ {A}, where the added element A stands for the natural value of A. We define the extended state space for sets of nodes, edges, and paths via a Cartesian product as before. An intervention on an extended state space may set either to any constant value or to the “natural value.”

Given a set of paths α and a response set Y, we call a single directed path α relevant for Y given the set α if α = (A . . . Y), where the sink Y is an element of the response set Y, and no path in the set α is a subpath of α except possibly a prefix of α. We denote the set of all relevant paths for Y given α in 𝒢 by rel𝒢 (Y | α).

Paths relevant for Y given α are those paths consisting of vertices that follow a particular recursive sequence of invocations of definition (4). For example, assume we are interested in the singleton response set {Y} and a singleton path set {(W AMY)} in Fig. 2 (a). Then defining Y ((wAMY)) for a particular w via (4) entails defining intermediate responses M ((wAM)) and A((wA)). The sequence of vertices (A, M, Y) are all linked by directed edges by (4), and (AMY) is relevant for {Y} given {(W AMY)}. Similarly, (W AMY) and (W AY) are relevant for {Y} given {(W AMY)}.

We now give two useful results about relevant paths.

Lemma 3.2

If α ∈ rel𝒢 (Y | α), then β ∈ rel𝒢 (Y | α) for any suffix subpath β of α.

Lemma 3.3

If β ⊆ α, then for any Y, rel𝒢 (Y | α) ⊆ rel𝒢 (Y | β).

A set of interventions may not all have an effect on a response, due to constraints of the model. For instance, since Y (a, m, w) ≠ Y (a′, m, w) but Y (a, m, w) = Y (a, m, w′) for any m, a, w, a′, w′ in Fig. 2 (a), A has an effect on Y, but W does not, given that we also intervene on A and M. We extend this notion to path interventions, and call those paths with sources that actually have an effect on the response, given interventions on other paths, live. More precisely, given a proper set of paths α and a response set Y, we call a path α ∈ α live for Y given α if there is an element of rel𝒢 (Y | α) containing α as a prefix.

Consider the maximal subset of α consisting of paths in α live for Y given α, or αY ≡ {α ∈ α | α live for Y given α}. We say a set of directed paths α is live for Y if α = αY. When discussing path interventions, we can always restrict our attention to sets of paths live for Y without loss of generality, due to the following result.

Lemma 3.4

For any Y and α proper for Y, rel𝒢 (Y | α) = rel𝒢 (Y | αY), (αY)Y = αY, and in addition, for any 𝔞α, p(Y(𝔞α)) = p(Y(𝔞αY)).
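The notions of relevant and live paths can be computed by brute-force enumeration on small graphs. The sketch below is illustrative only: it assumes Fig. 2 (a) has edges W→A, W→M, A→M, A→Y, M→Y (inferred from the factorization used later in the text), encodes paths as tuples of vertex names, and uses hypothetical function names.

```python
EDGES = {("W", "A"), ("W", "M"), ("A", "M"), ("A", "Y"), ("M", "Y")}


def paths_into(target, edges):
    """All directed paths with at least one edge that end at target (graph must be a DAG)."""
    found = []

    def extend(path):
        for u, v in edges:
            if v == path[0]:
                found.append((u,) + path)
                extend((u,) + path)

    extend((target,))
    return found


def is_subpath(p, q):
    """True if p occurs as a contiguous segment of q."""
    return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))


def relevant(responses, path_set, edges=EDGES):
    """Paths ending in a response that contain no member of path_set except as a prefix."""
    rel = set()
    for y in responses:
        for cand in paths_into(y, edges):
            if all(cand[:len(p)] == p or not is_subpath(p, cand) for p in path_set):
                rel.add(cand)
    return rel


def live(responses, path_set, edges=EDGES):
    """Members of path_set that are a prefix of some relevant path."""
    rel = relevant(responses, path_set, edges)
    return {p for p in path_set if any(r[:len(p)] == p for r in rel)}


rel_example = relevant({"Y"}, {("W", "A", "M", "Y")})
# A node intervention on A, M and W corresponds to all outgoing edges of these nodes.
node_edges = {("A", "Y"), ("A", "M"), ("M", "Y"), ("W", "A"), ("W", "M")}
live_example = live({"Y"}, node_edges)   # {("A", "Y"), ("M", "Y")}
```

In this encoding, `rel_example` contains (AMY), (WAMY) and (WAY), as in the example above, and `live_example` contains only the edges into Y, reflecting that W has no effect on Y once A and M are also intervened on.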

We now show that we can either ignore interventions to natural values in a response to a path intervention, or the response is not identified under the MWM. The set of paths for which the former is true for the response Y will be called natural for Y. Due to this result, we do not need to consider interventions to natural values explicitly.

Definition 1

Let α be live for Y. Let π𝔞α be a path intervention in 𝒢 where a subset α* ⊆ α is assigned constant values, and α \ α* is assigned natural values. Then if no element of rel𝒢 (Y | α*) with a prefix subpath in α* contains a subpath in α \ α*, we say π is natural for Y.

Lemma 3.5

Let π𝔞α be a path intervention natural for Y, and let α* ⊆ α be the set of all paths assigned constant values by π. Then p(Y(𝔞α)) = p(Y(𝔞α*)).

Lemma 3.6

If π𝔞α is not natural for Y in 𝒢, then p(Y(𝔞α)) is not identified under the MWM for 𝒢.

Lemma 3.5 does not guarantee that a response to a natural path intervention is identifiable, merely that it can be expressed as a response to an intervention only setting to constant values.

4. Causal Inference Targets as Responses to Path Interventions

In this section we consider how a number of targets of interest in causal inference, including novel targets not previously considered in the literature, may be expressed as responses to path interventions.

We use as our running example the two time point fragment of a longitudinal study in HIV research, described in Section 2.4. We consider path-specific effects that arise in mediation analysis, and effects of treatment on the multiply treated, which are of interest in tort cases (since these are effects of the exposure on those actually exposed), and in epidemiology if natural exposure levels carry information about the causal effect of the exposure. We also describe a novel inference target that combines features of mediated effects, and effects of treatment on the treated, that we call effect of treatment on the indirectly treated. It is not straightforward to see whether these types of effects are identifiable, and under what model, nor is it obvious whether there is a single unifying principle which governs identification for these effects.

By translating the effect types above into responses to path interventions, we show that such responses form a very general class of causal inference targets. Thus, the advantage of path interventions is that we can use them to give a single characterization for a wide variety of targets of interest at once. The close relationship between effects of treatment on the treated and mediated effects, hinted at by their common generalization as responses to path interventions, is currently not widely known.

We will define a special set of directed paths important for our translation scheme. Given a treatment set A and an outcome set Y (that possibly intersect) in a DAG 𝒢, define the set αA, Y, 𝒢 to be the set of all directed paths with a source in A, a sink in A ∪ Y, and which do not intersect A ∪ Y except at the source and sink. Since A and Y are allowed to intersect, the names “treatment” and “outcome” are slightly misleading. We allow the intersection to admit cases such as effect of treatment on the treated (ETT) where some treatments are also treated as outcomes for the purposes of certain paths.

Lemma 4.1

αA,Y,𝒢 is always proper.
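The construction of this path set is easy to mechanize. The following sketch is illustrative only: it assumes Fig. 2 (a) has edges W→A, W→M, A→M, A→Y, M→Y (inferred from the factorization used later in the text), takes treatments {W, M} and outcome {Y}, and uses the hypothetical function names `alpha_set` and `is_proper`.

```python
EDGES = {("W", "A"), ("W", "M"), ("A", "M"), ("A", "Y"), ("M", "Y")}


def alpha_set(treatments, outcomes, edges):
    """Directed paths from a treatment to a treatment or outcome that avoid
    treatments and outcomes at every interior vertex."""
    stop = set(treatments) | set(outcomes)
    children = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
    found = set()

    def walk(path):
        for child in children.get(path[-1], []):
            if child in stop:
                found.add(path + (child,))   # path ends at the first hit
            else:
                walk(path + (child,))

    for a in treatments:
        walk((a,))
    return found


def is_proper(paths):
    """True if no path is a strict prefix of another path in the set."""
    return not any(len(p) < len(q) and q[:len(p)] == p for p in paths for q in paths)


alpha = alpha_set({"W", "M"}, {"Y"}, EDGES)
# alpha == {("W", "A", "Y"), ("W", "A", "M"), ("W", "M"), ("M", "Y")}
```

The path (W, M, Y) is excluded because its interior vertex M is a treatment, and the resulting set is proper, as Lemma 4.1 states.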

4.1. Effects of Treatment on the Treated

We consider an effect on the mean difference scale where we condition on the naturally observed treatment levels. This is known as the effect of treatment on the treated (ETT), and in our two time point HIV example, it is defined as follows

$$\mathrm{ETT} \equiv \mathbb{E}[Y(a_1, a_2) \mid a_1, a_2] - \mathbb{E}[Y(a_1', a_2') \mid a_1, a_2].$$

This contrast is often of interest to epidemiologists. It also arises in cases where interventions are functions of the natural value of the exposure. For example, we may be interested in outcome for people who were encouraged to exercise for 30 more minutes than they normally would, which is a random variable of the form Y (A + 30) ≡ Y (a + 30) | A = a. These types of interventions are discussed in [36], in particular sufficient conditions for identification under the SWM, in terms of the extended g-formula (2) are given there and in [13].

Assume A1 is a binary variable (only two treatment levels). If we consider, instead, the ETT with respect to only the exposure A1, we obtain the following derivation for the second term in the contrast

$$p(Y_2(a_1') \mid a_1) = \frac{p(Y_2(a_1'), a_1)}{p(a_1)} = \frac{p(Y_2(a_1')) - p(Y_2(a_1'), a_1')}{p(a_1)},$$

where the first identity is by definition, and the second by the binary treatment assumption. Since consistency implies p(Y2(a1), a1) = p(Y2, a1) for any value a1, the ETT for a single binary exposure A1 can be identified if p(Y2(a1)) is identified.
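The step that relies on a binary exposure can be spelled out as follows. This is only an expansion of the argument above, writing a1′ for the assigned level and a1 for the other, naturally observed level of the binary exposure A1.

```latex
\begin{align*}
p(Y_2(a_1')) &= p(Y_2(a_1'), a_1) + p(Y_2(a_1'), a_1')
  && \text{($A_1$ takes only the values $a_1$, $a_1'$)} \\
p(Y_2(a_1'), a_1') &= p(Y_2, a_1')
  && \text{(consistency)} \\
p(Y_2(a_1') \mid a_1) &= \frac{p(Y_2(a_1')) - p(Y_2, a_1')}{p(a_1)}
\end{align*}
```

Every term on the right-hand side of the last line is a function of the observed data except p(Y2(a1′)), which is why identification of the ETT reduces to identification of that marginal.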

However, if the exposure is not binary, or if there are multiple exposures, as in our example, we cannot use the same algebraic trick to obtain identification, and we must proceed by exploiting additional assumptions in our causal model.

In our case, the first conditional mean in the contrast can be readily identified via consistency: 𝔼[Y (a1, a2) | a1, a2] = 𝔼[Y | a1, a2]. However, the second conditional mean presents a problem, because it contains a conflict between the naturally observed exposures, and the assigned exposures. Here we show how to represent the underlying joint distribution over potential outcomes, p(Y2(a1, a2), A1, A2), in terms of path interventions, and then attack the identification problem for all responses to path interventions, which would then include the problematic second term of the ETT.

We consider all directed paths from A2 to Y2, which we assign a value a2, all directed paths from A1 to Y2 not through A2, which we assign a value a1, and all directed paths from A1 to A2, which we assign the natural value of A1. Note that this set of paths is simply α{A1,A2},{Y2},𝒢 for 𝒢 that is the transitive closure with respect to blue edges of the graph in Fig. 2 (b), and thus is proper by Lemma 4.1. We then consider the response of A1, A2, Y2 to the path intervention so defined, or {A1, A2, Y2}(𝔞α). By our definition, all paths set to a value ancestral for A1, A2 are set to natural values. Thus, {A1, A2}(𝔞α) is defined in terms of natural values of its direct causal parents, or as A1(C0) = A1 and A2(Y1, C1, W1, A1, C0) = A2.

Finally, we consider all paths ancestral for Y2. Since A1 and A2 are parents of Y2 in 𝒢*, the single edge paths (A1Y2) and (A2Y2) are in our set, thus we substitute a1 and a2 into the potential outcome answer. Furthermore, for other parents of Y2, namely C0, U, W1, C1, Y1, W2 and C2, we consider an appropriate set derived from α. For example, for the node W2, we replace the path A2W2Y2 by a path A2W2 (while keeping the assignment a2). We proceed in this way recursively until we obtain the response for Y2, which is

$$Y_2(a_1, a_2, U, C_0, W_1(a_1, \ldots), C_1(a_1, \ldots), Y_1(a_1, \ldots), W_2(a_1, a_2, \ldots), C_2(a_1, a_2, \ldots)),$$

where . . . is a shorthand that means “include all earlier potential outcomes.” For example, C1(a1, . . .) means C1(a1, W1(a1, U, C0), U, C0). By definition of node intervention responses, this counterfactual is equal to Y2(a1, a2), and our overall joint distribution over the responses is p(Y2(a1, a2), A1, A2).

For arbitrary sets of treatments A and outcomes Y, and active treatment values a, we may still represent ETT as a single mean difference, for example 𝔼[f (y)][p(Y(a)|a)] − 𝔼[f (y)][p(Y(a′)|a)], for some function f (y).

Note that though ETT resembles the total effect, it is in fact a more complex kind of counterfactual. This is because we are simultaneously interested in “outcome responses” Y, and “treatment responses” A. Defining these treatment responses may introduce conflicts among intermediate counterfactual responses, not well represented by node interventions, which is why we represent ETT as a response to a path intervention.

The ETT path intervention π, whose assignment 𝔞 to the paths in αA, Y, 𝒢 is determined by a, simply assigns all paths in αA, Y, 𝒢 to the appropriate value. That is, paths from A to A are assigned the appropriate natural value, and paths from A to Y are assigned the appropriate value in a. Given this definition, either the ETT is not identified, or the joint distribution from which ETT is obtained corresponds to the joint response of Y ∪ A to the ETT path intervention.

Lemma 4.2

If there exists A ∈ A such that A(𝔞αA, Y, 𝒢) ≠ A, then p(Y(a), A) is not identified under the MWM for 𝒢. If there does not exist such an A, then p(Y(a), A) = p({Y ∪ A}(𝔞αA, Y, 𝒢)).

If p(Y(a), A) is expressible as a response to a path intervention, it may still not be identifiable under the MWM.

Our subsequent results on identification of path interventions under the MWM complement identification results in [36, 13]. In particular, our results imply the distribution p(Y (a, m) | A, M) is identified under the MWM for Fig. 2 (a), but not under the SWM for Fig. 2 (a).

4.2. Path-Specific Effects

Next, we consider the mediated effect of A1, A2 on Y2 through C1, C2, in other words, the effect of exposures on outcome mediated by adherence. Originally these kinds of effects were considered in [3] in the context of linear models, and were generalized to a form not restricted by particular parametric models in [16]. We discuss a simple version of mediated effects in the graph in Fig. 2 (a), known as natural direct and indirect effects [16, 10] in Section 3.2, where we represented them as responses to edge interventions.

In our case, we are interested in a more complicated effect, but we can represent it using a similar idea with paths rather than edges: paths we are interested in are assigned active treatment values a1, a2, while paths we are not interested in are assigned baseline treatment values a1′, a2′. The paths we are interested in are all directed paths whose first edge is one of {(A1C1), (A1C2), (A2C2)}, which end in Y2, and which do not proceed through A2 if started at A1. The paths we are not interested in are all other paths which start with A2 or A1 (and do not proceed through A2) and end in Y2. Call this assignment 𝔞1. Note that the assignment 𝔞1 is on the set of paths that is precisely equal to α{A1, A2},{Y2}, 𝒢 for 𝒢 that is the transitive closure with respect to blue edges of the graph in Fig. 2 (b), and thus is proper.

We apply our definition to obtain a response of Y2 to this intervention. We must substitute a value for every parent of Y2. The values for A1, A2 will be the baseline a1′, a2′, while the values for C0, U will just be the natural values of those variables. Complications arise for other parents, due to the recursive nature of the definition. We proceed recursively:

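$$\begin{aligned}
Y_2(\mathfrak{a}_1) &= Y_2(a_1', a_2', \{C_2, W_2, Y_1, C_1, W_1, C_0\}(\mathfrak{a}_1), U) \\
C_2(\mathfrak{a}_1) &= C_2(a_1, a_2, \{W_2, Y_1, C_1, W_1, C_0\}(\mathfrak{a}_1), U) \\
W_2(\mathfrak{a}_1) &= W_2(a_1', a_2', \{Y_1, C_1, W_1, C_0\}(\mathfrak{a}_1), U) \\
Y_1(\mathfrak{a}_1) &= Y_1(a_1', \{C_1, W_1, C_0\}(\mathfrak{a}_1), U) \\
C_1(\mathfrak{a}_1) &= C_1(a_1, \{W_1, C_0\}(\mathfrak{a}_1)) \\
W_1(\mathfrak{a}_1) &= W_1(a_1', C_0(\mathfrak{a}_1), U) \\
C_0(\mathfrak{a}_1) &= C_0(U) = C_0
\end{aligned}$$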

In a manner similar to direct and indirect effects, we can use this response along with the total effect responses to define “the effect along paths we want” as 𝔼[Y (𝔞1)] − 𝔼[Y (a1′, a2′)], and “the effect along paths we do not want” as 𝔼[Y (a1, a2)] − 𝔼[Y (𝔞1)]. As before, the ACE additively decomposes into these two effect measures. This definition (without the use of path interventions) appears in [24].
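The additive decomposition is a telescoping sum. Writing a1, a2 for the active values and a1′, a2′ for the baseline values (notation assumed here for the two treatment levels), it can be displayed as:

```latex
\begin{align*}
\mathrm{ACE}
  &= \mathbb{E}[Y(a_1, a_2)] - \mathbb{E}[Y(a_1', a_2')] \\
  &= \underbrace{\big\{\mathbb{E}[Y(\mathfrak{a}_1)] - \mathbb{E}[Y(a_1', a_2')]\big\}}_{\text{effect along paths we want}}
   + \underbrace{\big\{\mathbb{E}[Y(a_1, a_2)] - \mathbb{E}[Y(\mathfrak{a}_1)]\big\}}_{\text{effect along paths we do not want}}
\end{align*}
```

The term 𝔼[Y (𝔞1)] cancels between the two braces, so the decomposition holds regardless of whether 𝔼[Y (𝔞1)] itself is identified.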

We may also consider a response of Y2 where the paths we are not interested in are assigned the natural values, as discussed in Section 3.4, rather than fixed baseline values. Such an effect is defined similarly.

Consider a set of active treatment values a of A, a set of fixed baseline treatment values a′, and a subset β of αA,Y,𝒢 (which contains “paths of interest”). Define the fixed baseline PSE path intervention, with assignment $\mathfrak{a}^{a,a',\beta}_{\alpha_{A,Y,\mathcal{G}}}$, as the path intervention that assigns the appropriate active values in a to sources of paths in β and the appropriate baseline values in a′ to sources of all paths in αA,Y,𝒢 \ β.

Similarly, we call the path intervention with assignment $\mathfrak{a}^{a,\beta}_{\alpha_{A,Y,\mathcal{G}}}$, which assigns active values in a to sources of paths in β and the appropriate natural values to sources of all paths in αA,Y,𝒢 \ β, the average baseline PSE path intervention.

Path specific effects along all paths in β (with a fixed baseline) can then be defined on the mean difference scale as $\mathbb{E}[Y(\mathfrak{a}^{a,a',\beta}_{\alpha_{A,Y,\mathcal{G}}})] - \mathbb{E}[Y(a')]$, and along all paths not in β as $\mathbb{E}[Y(a)] - \mathbb{E}[Y(\mathfrak{a}^{a,a',\beta}_{\alpha_{A,Y,\mathcal{G}}})]$. Average baseline path specific effects on the difference scale are defined similarly.

4.3. Effects of Treatment on the Indirectly Treated

In this section we show that the language of path interventions is general enough to incorporate novel targets not currently considered in the literature. Our results immediately settle identification questions for any such target.

We consider a seemingly innocuous ETT with two treatments that in fact can only be represented by a path intervention, not an edge intervention, and variations of this target that are identified under the SWM and the MWM. Assume Fig. 2 (a) represents a simple two time point partially randomized observational study, where W and M are treatments at the first and second time points, respectively, A is an intermediate health measure, and Y is the outcome. We make very strong assumptions about this study. In particular, W is randomized, while M is only assigned based on A, W. Finally, no unobserved confounding exists anywhere, including between W, M and Y. We are interested in the effect of treatments W, M on the treated in this study. To obtain this contrast, we need to identify p(Y (m, w) | W, M) which is identified if and only if p(Y (m, w), W, M) is. It is not difficult to show that

p(Y(m,w),W,M)=p({Y,M,W}((wAY),(mY)))=p(Y(m,A(w)),M(A(W),W),W).

As we will show in the next section, there is no way to express this response as a response to an edge intervention, and it is not identified under the MWM. This is the case despite the fact that there is no unobserved confounding in this study. The difficulty is that the response is defined in terms of A(w) and A jointly, and the distribution p(A(w), A) is not identified under the MWM without more assumptions.

To obtain a target that is identified under the SWM in this case we may consider the response Y (w, m) on the treated to the natural value W, and the value of M occurring under the intervention setting W to w. This results in p(Y (m, w), W, M (w)) = p(Y (m, A(w)), M (A(w), w), W), which is then identified under the SWM. To obtain identification we gave up on conditioning on the natural value of the second treatment M. This may not be “in the spirit” of the ETT target.

One compromise is to assume a stronger model, the MWM, and allow the response M to be “as natural as possible” while still retaining identification. This would correspond to defining a contrast in terms of p({Y, W, M}((mY), (wAY), (wAM))), which in turn is equivalent to p({Y, W, M}((mY), (wA))). A conditional distribution p(Y ((mY), (wA)) | M ((wA), (w′ M)) = m′, W = w′) represents the response Y (w, m) among those individuals for whom the value for W is naturally w′ (untreated), and for whom the value for M would have been m′ (untreated) under the situation where W acts as if set to treatment value w for paths shared by Y and M, and acts as if set to untreated value w′ otherwise.

We can define a contrast based on this quantity as follows

$$\mathbb{E}[Y((mY),(wA)) \mid M((wA),(w'M)) = m', W = w'] - \mathbb{E}[Y((m'Y),(w'A)) \mid M((wA),(w'M)) = m', W = w'],$$

which we call “the effect of treatment on the indirectly treated (ETIT).” The name is due to the fact that we consider people whose natural treatment value W is untreated, and whose followup treatment M assumes the untreated value if viewed as a response to the indirect effect of the first treatment. Such a quantity would be difficult to conceive of without a direct representation of effects along pathways, something path interventions provide. Our results also directly imply that this quantity is identified under the MWM, but not SWM.

5. Identification of Edge and Path Interventions

Having established a correspondence between responses to path interventions and a variety of targets of interest in causal inference, we now consider what assumptions are necessary to express path interventions as edge interventions, edge interventions as node interventions, and edge and node interventions as functions of the observed data.

As we showed in section 3.4, we can restrict our attention to path interventions that only assign paths to constant values, since paths that are assigned natural values either can be dropped from the intervention without affecting the response, or the overall response is not identified.

5.1. Node and Edge Interventions as Path Interventions

If node interventions are a special case of edge interventions, which are in turn a special case of path interventions, we ought to be able to give a path intervention the response of which is equal to the response to an arbitrary node or edge intervention. For any such response there may be multiple path interventions the responses to which are identical. Here we construct particular path interventions that work based on the set of paths αA,Y,𝒢 we defined earlier.

Lemma 5.1

Let A, Y be disjoint vertex sets in a DAG 𝒢, and a a value assignment to A. Let the path intervention π𝔞αA,Y,𝒢 assign each α ∈ αA,Y,𝒢 to the value in a of its source so𝒢(α). Then p(Y(𝔞αA,Y,𝒢)) = p(Y(a)).

Lemma 5.2

Let Y be a vertex set in a DAG 𝒢, and α a set of edges, with 𝔞α an assignment to α. Let A = so𝒢 (α), and let αY,𝒢 be the subset of αA,Y,𝒢 consisting of paths with an edge prefix in α. Let the path intervention π𝔞αY,𝒢 assign each α ∈ αY,𝒢 to the value assigned to the edge prefix of α by 𝔞α. Then p(Y(𝔞αY,𝒢)) = p(Y(𝔞α)).

5.2. Identification of Edge Interventions

The difficulty with edge interventions is that a single response to an edge intervention may involve other responses with conflicting treatment assignments. It is this feature of edge interventions which in general prevents their identification under the SWM, and which requires the stronger assumptions of the MWM. If such a conflicting assignment is absent, the edge intervention can be rephrased as a node intervention. We show this absence of conflict is characterized by a property we call node consistency.

A set of edges α live for Y is called consistent for Y if for every node A, the set of prefix edges of the path set {α ∈ rel𝒢 (Y | α)|so𝒢 (α) = A} is either disjoint from α or contained in α.

For a set of edges α live and consistent for Y, we call an edge intervention η𝔞α node consistent for Y if for every node A, all edges in α with A as the source node are assigned the same value (say a). Any edge intervention that is not node consistent we call node inconsistent, including any edge intervention on a set of edges not consistent for an outcome set of interest.

The edge set {(AY)} in Fig. 2 (a) is live but not consistent for {Y}, thus any edge intervention on this set (that sets to constant values) is inconsistent for {Y}. An edge intervention corresponding to Y ((aY), (aM)) is node consistent for Y, while an edge intervention corresponding to Y ((aY), (a′ M)) is consistent, but not node consistent for Y.

For an edge intervention η𝔞α node consistent for Y, define the following set of value assignments to A = so𝒢 (α): aα ≡ {a | η assigns a to (AB) ∈ α}. Let νaα be the induced node intervention for η𝔞α.
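The value-agreement part of node consistency, and the induced node intervention, can be sketched as follows. This is an illustration under stated assumptions: an edge intervention is encoded as a dictionary from (source, target) pairs to values, the edge set is assumed to be already live and consistent for the response (that condition is not checked here), and the function names are hypothetical.

```python
def is_node_consistent(edge_assignment):
    """True if all intervened edges sharing a source node are assigned the same value."""
    seen = {}
    for (source, _target), value in edge_assignment.items():
        if seen.setdefault(source, value) != value:
            return False
    return True


def induced_node_intervention(edge_assignment):
    """Collapse a node consistent edge intervention into a node intervention."""
    if not is_node_consistent(edge_assignment):
        raise ValueError("edge intervention is not node consistent")
    return {source: value for (source, _target), value in edge_assignment.items()}


# Y((aY), (aM)): both edges out of A carry the same value.
agree = {("A", "Y"): "a", ("A", "M"): "a"}
# Y((aY), (a'M)): the edges out of A carry conflicting values.
conflict = {("A", "Y"): "a", ("A", "M"): "a'"}

node_intervention = induced_node_intervention(agree)   # {"A": "a"}
```

For `conflict`, `is_node_consistent` returns `False`, which corresponds to the nested counterfactual Y (a, M (a′)) that cannot be rewritten as a response to a node intervention.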

Lemma 5.3

Given a DAG 𝒢 with vertices V, and an edge intervention η𝔞α node consistent for Y ⊆ V, p(Y(𝔞α)) = p(Y(aα)).

Proof

This follows by Lemmas 5.1 and 5.2.

Corollary 5.1

If η𝔞α is node consistent for Y, then p(Y(𝔞α)) is identified as a functional of p(V) under the SWM via the appropriate marginal of the extended g-formula (2) for the response to the corresponding induced node intervention.

We next show that if an edge intervention is not node consistent, then responses to this intervention are not identifiable from p(V) under the SWM. By this we mean that the definition of identifiability given in Section 2.3 fails, and more specifically that we can find two elements of a causal model, in the sense of section 2.2, that agree on p(V) but disagree on the distribution of the response of interest. We start with a simple example of a non-identified parameter in the SWM.

Lemma 5.4

Responses p({B, C}((aB), (a′C))), p({B, C}((aB))), and p({B, C}((aC))), are not identifiable from p(A, B, C) under the SWM for Fig. 3 (b).

Fig 3.


(a) A causal model where p(B(a), B(a′)) and p(B(a), B) are not identified, in Lemma 5.8. (b) A causal model where p({B, C}((aB), (a′C))), p({B, C}((aB))), and p({B, C}((aC))) are not identified, in Lemma 5.4. (c),(d) Graphs corresponding to elements of the causal model in (b) witnessing non-identifiability of edge interventions in (b).

The proof of this result, which appears in the appendix, exhibits two causal structures c1({A, B, C}, 𝒢), and c2({A, B, C}, 𝒢) that agree on p(A, B, C), but disagree on the above responses to (node inconsistent) edge interventions. These two structures correspond to the graphs in Fig. 3 (c), (d). In particular, c2 is constructed in such a way that the confounding of B and C introduced by UB and UC is masked under any single node intervention, but manifests if we consider responses to multiple interventions simultaneously. This is similar in spirit to an example in [17]. We can extend this simple example to a general result, due to the following lemma (stated in a more general form in terms of path rather than edge interventions).

Lemma 5.5

Let 𝒢 be a DAG, Y, A disjoint sets of vertices in 𝒢, α a set of paths live for Y. Let 𝒢* be any edge supergraph of 𝒢, Y* any superset of Y in 𝒢*, α* a superset of α in 𝒢* live and proper for Y*, such that no path in α* \ α exists in 𝒢. Finally, let π𝔞α* be a path intervention. If p(Y(𝔞α), A) is not identified under the MWM (SWM) for 𝒢, then p(Y*(𝔞α*), A) and p(Y(𝔞α*), A ∪ Y* \ Y) are not identified under the MWM (SWM) for 𝒢*.

Theorem 5.1

Consider a DAG 𝒢 with vertices V, and a set of edges α live for Y. Then p(Y(𝔞α)) is identifiable from p(V) under the SWM for 𝒢 if and only if η𝔞α is node consistent. Moreover, if p(Y(𝔞α)) is identifiable, it is equal to the appropriate marginal of the extended g-formula (2) for p(Y(aα)), the response to the induced node intervention.

What we have shown is that node consistent edge interventions are identifiable under the SWM, but an edge intervention that is node inconsistent is not, as long as this inconsistency is “causally relevant” for some response, in the sense of there existing causal pathways from the inconsistent edges to some responses that are not interrupted by other parts of the edge intervention. However, if we are willing to adopt stronger independence assumptions of the MWM, we obtain identification of any edge intervention via a modification of the g-formula, as the following result shows.

Lemma 5.6 (edge g-formula)

For a DAG 𝒢 with vertices V, and an edge intervention η𝔞α on an edge set α, we have, under the MWM for 𝒢,

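$$p(\mathbf{V}(\mathfrak{a}_{\alpha}) = \mathbf{v}) = \prod_{V \in \mathbf{V}} p\big(V = v_V \mid v_{\mathrm{pa}^{\bar{\alpha}}_{\mathcal{G}}(V)},\ \mathfrak{a}_{\{(WV) \in \alpha\}}\big), \tag{5}$$

where $\mathrm{pa}^{\bar{\alpha}}_{\mathcal{G}}(V) \equiv \{A \in \mathrm{pa}_{\mathcal{G}}(V) \mid (AV) \notin \alpha\}$.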

For example, in the graph in Fig. 2 (a), we can express the distribution of the response of Y ((a′M), (aY)) using (5) as follows:

$$p(Y(a, M(a')) = y) = \sum_{w, a'', m} p(y \mid m, a)\, p(m \mid a', w)\, p(a'' \mid w)\, p(w) = \sum_{m, w} p(y \mid m, a)\, p(m \mid a', w)\, p(w)$$

If we are interested in a mean difference parameter, for example 𝔼[Y (a, M (a′))] − 𝔼[Y (a′)], and assume there are no baseline factors W, the above reduces to

$$\sum_{m} \big\{\mathbb{E}[Y \mid m, a] - \mathbb{E}[Y \mid m, a']\big\}\, p(m \mid a')$$

which recovers the well known mediation formula [12].
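As a numerical sanity check of the edge g-formula for Y ((a′M), (aY)), the following Python sketch evaluates both the full sum and the reduced sum on a small discrete model. All probability tables below are hypothetical numbers chosen purely for illustration, all variables are assumed binary, and the factorization p(y | m, a)p(m | a, w)p(a | w)p(w) is taken from the display above.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical conditional probability tables (binary W, A, M, Y).
P_W = {0: F(3, 10), 1: F(7, 10)}                      # p(w)
P_A1 = {0: F(1, 5), 1: F(3, 5)}                       # p(A = 1 | w)
P_M1 = {(0, 0): F(1, 10), (0, 1): F(3, 10),
        (1, 0): F(1, 2), (1, 1): F(4, 5)}             # p(M = 1 | a, w)
P_Y1 = {(0, 0): F(1, 10), (0, 1): F(2, 5),
        (1, 0): F(1, 2), (1, 1): F(9, 10)}            # p(Y = 1 | m, a)


def bern(q, x):
    """Probability that a binary variable with success probability q equals x."""
    return q if x == 1 else 1 - q


def p_a(a, w):
    return bern(P_A1[w], a)


def p_m(m, a, w):
    return bern(P_M1[(a, w)], m)


def p_y(y, m, a):
    return bern(P_Y1[(m, a)], y)


def edge_g_full(y, a, a_prime):
    """Sum over w, the natural value of A, and m, with A set to a for Y and a_prime for M."""
    return sum(p_y(y, m, a) * p_m(m, a_prime, w) * p_a(a_nat, w) * P_W[w]
               for w, a_nat, m in product((0, 1), repeat=3))


def edge_g_reduced(y, a, a_prime):
    """Same quantity after summing out the natural value of A."""
    return sum(p_y(y, m, a) * p_m(m, a_prime, w) * P_W[w]
               for m, w in product((0, 1), repeat=2))


# Mean difference E[Y(a, M(a'))] - E[Y(a')] for a = 1, a' = 0 (Y is binary).
direct_effect = edge_g_reduced(1, 1, 0) - edge_g_reduced(1, 0, 0)
```

Exact rational arithmetic is used so that the two forms of the sum can be compared with equality rather than up to floating point error. When a = a′ the functional coincides with the ordinary g-formula for p(Y (a) = y).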

The independence assumptions which were necessary to derive this functional, namely (Y (m, a) ⫫ M (a′) ⫫ A), are implied by the MWM for the graph in Fig. 2 (a). It is possible to consider such assumptions independently of a graph. However the advantage of graphs is their ability to encode assumptions of this type systematically, which allowed us to derive such functionals for a wide variety of problems, and moreover, to give simple visual characterizations of when such derivations are possible.

5.3. Identification of Path Interventions

As we saw in the previous section, identification of responses to edge interventions under the SWM requires node consistency, while any joint response to any edge intervention is identified under the MWM. In this section we show that path interventions are identified under the MWM as long as edge consistency holds, that is as long as a path intervention can be expressed as an edge intervention. Lack of edge consistency will result in non-identification under the MWM. The presence of a “recanting witness” in a path-specific effect [1] can be viewed as a special case of the lack of edge consistency.

A set of paths α live for Y is called consistent for Y if for every edge (AB) that is an edge prefix of some α ∈ α, whenever (AB) is in β ∈ rel𝒢 (Y | α), (AB) is an edge prefix of a prefix subpath of β in α.

For a set of paths α live and consistent for Y, we call a path intervention π𝔞α edge consistent for Y if for every edge (AB), all paths in α, with (AB) as a prefix are assigned the same value (say a). Any path intervention that is not edge consistent we call edge inconsistent, including any path intervention on a set of paths not consistent for an outcome set of interest.

The path set {(W AM Y)} in Fig. 2 (a) is live but not consistent for {Y}, thus any path intervention on this set is inconsistent for {Y}. A path intervention corresponding to Y ((wAMY), (wAY)) is edge consistent for Y while a path intervention corresponding to Y ((wAMY), (w′ AY)) is consistent for Y but not edge consistent for Y.

For a path intervention π𝔞α, edge consistent for Y, define the set of edges α1 ≡ {(AB) | (AB) is a prefix for αα}. Let η𝔞α1 be the induced edge intervention for π𝔞α, where η assigns (AB)α1 to the value assigned by π to all αα, which have (AB) as an edge prefix.
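The construction of the induced edge intervention can be sketched in a few lines. The function below is a hypothetical helper (its name and the representation of paths as tuples of vertex names are assumptions), and it presumes the path set has already been checked to be live and consistent for the outcome; it only tests whether paths sharing a first edge are assigned the same value.

```python
def induced_edge_intervention(path_values):
    """Map {path: value} to {first edge: value}.

    Paths are tuples of vertex names, e.g. ("W", "A", "M", "Y").
    Returns None when two paths with the same edge prefix carry
    different values, i.e. the path intervention is edge inconsistent.
    Assumes the path set is live and consistent for the outcome.
    """
    edge_values = {}
    for path, value in path_values.items():
        edge = (path[0], path[1])  # the edge prefix of the path
        if edge_values.setdefault(edge, value) != value:
            return None
    return edge_values

# The two examples from Fig. 2 (a) discussed in the text.
consistent = induced_edge_intervention(
    {("W", "A", "M", "Y"): "w", ("W", "A", "Y"): "w"}
)
inconsistent = induced_edge_intervention(
    {("W", "A", "M", "Y"): "w", ("W", "A", "Y"): "w'"}
)
```

Here `consistent` is the single-edge intervention on (W, A) with value w, while `inconsistent` is `None` because the shared prefix edge would need two conflicting values.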

Lemma 5.7

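Given a DAG 𝒢 with vertices $\mathbf{V}$ and a path intervention $\pi_{\mathfrak{a}_{\boldsymbol\alpha}}$ edge consistent for $\mathbf{Y} \subseteq \mathbf{V}$, $p(\mathbf{Y}(\mathfrak{a}_{\boldsymbol\alpha})) = p(\mathbf{Y}(\mathfrak{a}_{\boldsymbol\alpha_1}))$.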

Corollary 5.2

If $\pi_{\mathfrak{a}_{\boldsymbol\alpha}}$ is edge consistent for $\mathbf{Y}$, then the distribution $p(\mathbf{Y}(\mathfrak{a}_{\boldsymbol\alpha}))$ is identified as a functional of $p(\mathbf{V})$ under the MWM via the appropriate marginal of the edge g-formula for the response to the corresponding induced edge intervention.

We will show that responses to edge inconsistent path interventions are not identifiable under the MWM using the same strategy as we used for node inconsistent edge interventions. First, we reproduce a result stating that a joint response to a conflicting exposure is not identifiable. Then we extend this result to the general case we need.

Lemma 5.8

The distributions p(B(a), B(a′)) and p(B(a), B) are not identifiable from p(A, B) under the MWM for the DAG in Fig. 3 (a).

Theorem 5.2

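Consider a DAG 𝒢 with vertices $\mathbf{V}$, and a proper set of paths $\boldsymbol\alpha$ live for $\mathbf{Y}$. Then $p(\mathbf{Y}(\mathfrak{a}_{\boldsymbol\alpha}))$ is identifiable from $p(\mathbf{V})$ under the MWM for 𝒢 if and only if $\pi_{\mathfrak{a}_{\boldsymbol\alpha}}$ is edge consistent. Moreover, if $p(\mathbf{Y}(\mathfrak{a}_{\boldsymbol\alpha}))$ is identifiable, it is equal to the appropriate marginal of the edge g-formula for $p(\mathbf{Y}(\mathfrak{a}_{\boldsymbol\alpha_1}))$, the response to the induced edge intervention.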

5.4. A Model Where Responses to Path Interventions Are Identified

Though we have shown that responses to path interventions that cannot be expressed as responses to edge interventions are not in general identified under the MWM, there exist submodels of the MWM where all responses to path interventions are identified. In particular, consider the linear structural equation model (SEM), which is an MWM where the mapping from $\mathbf{v}_{\mathrm{pa}_{\mathcal{G}}(V)} \in \mathfrak{X}_{\mathrm{pa}_{\mathcal{G}}(V)}$ to $V(\mathbf{v}_{\mathrm{pa}_{\mathcal{G}}(V)})$ is a linear function of $\mathbf{v}_{\mathrm{pa}_{\mathcal{G}}(V)}$ and an error term $\epsilon_V$, where such error terms are normally distributed and mutually independent.

Theorem 5.3

Let $\pi_{\mathfrak{a}_{\boldsymbol\alpha}}$ be a path intervention. Then $p(\mathbf{Y}(\mathfrak{a}_{\boldsymbol\alpha}))$ is identified under the linear SEM.

This follows as a corollary of results in [2]. The reason even edge-inconsistent path interventions are identified is that linearity, normality and independence are such strong assumptions that we can directly evaluate even counterfactuals of the form p(W(a), W(a′)) using the algorithm in [2]. A fruitful open question is whether there are other interesting (for instance maximal) submodels of the MWM where all responses to path interventions are identified.

5.5. Targets Not Representable as Path Interventions

We have shown that a wide class of targets of interest in causal inference can be expressed as responses to path interventions. Nevertheless, there exist targets of interest which are known not to be representable in this way, such as principal stratification effects. For instance, the principal stratum direct effect (PSDE) [22, 23] is defined to be a treatment contrast only among those individuals for whom the mediator assumes a particular value for both active and baseline treatment levels. In Fig. 2 (a), the PSDE is a contrast of the form

$$\mathbb{E}[Y(a, m) \mid M(a) = M(a') = m] - \mathbb{E}[Y(a', m) \mid M(a) = M(a') = m].$$

Under the MWM, we obtain independences Y (a, m) ⫫ {M (a), M (a′)}, and Y (a′, m) ⫫ {M (a), M (a′)}, which implies the PSDE is equal to the controlled direct effect contrast under the MWM: 𝔼[Y (a, m)] − 𝔼[Y (a′, m)]. Under the SWM, the PSDE contrast is not identified without more assumptions. In either case, it is not possible to express the condition defining the principal strata, namely M (a) = M (a′) = m as a response to a path intervention, since this will entail assigning conflicting values to a directed edge from A to M. This is perhaps not surprising, since responses to path interventions are meant to encode effects along particular causal pathways which is not something principal strata effects encode. Note that despite this, the MWM allows us to rephrase the PSDE as a node intervention.
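The reduction of the PSDE to the controlled direct effect under the MWM can be spelled out in one step. Assuming the conditioning event M(a) = M(a′) = m has positive probability, the two independences stated above give

$$\mathbb{E}[Y(a, m) \mid M(a) = M(a') = m] = \mathbb{E}[Y(a, m)], \qquad \mathbb{E}[Y(a', m) \mid M(a) = M(a') = m] = \mathbb{E}[Y(a', m)],$$

so that subtracting the second identity from the first yields 𝔼[Y (a, m)] − 𝔼[Y (a′, m)].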

6. The Edge G-Formula and Single World Intervention Graphs

A connection between the SWM, node interventions, the extended g-formula, and a type of graph with split nodes called the Single World Intervention Graph (SWIG) was given in [13].

If a set of responses $\mathbf{V}$ to a node intervention $\nu_{\mathbf{a}}$ includes all variables (including $\mathbf{A}$), then, under the SWM, the response is linked to the observed distribution via (2), and can be viewed as a kind of Markov factorization [9] of the joint response $\mathbf{V}(\mathbf{a})$, where terms $p(V \mid \mathrm{pa}_{\mathcal{G}}(V))$ with $\mathrm{pa}_{\mathcal{G}}(V) \cap \mathbf{A} \neq \emptyset$ are replaced with $p(V \mid \mathrm{pa}_{\mathcal{G}}(V) \setminus \mathbf{A}, \mathbf{a}_{\mathrm{pa}_{\mathcal{G}}(V) \cap \mathbf{A}})$. SWIGs are a graphical representation of this factorization, in the sense that independences in $p(\mathbf{V}(\mathbf{a}))$ can be read off from the corresponding SWIG. Since $\mathbf{A}$ occurs both as a treatment and a response, SWIGs split each vertex in $\mathbf{A}$ into a random and a fixed version (we draw fixed vertices as squares).

For example, the SWIG in Fig. 4 (a) represents p({Y, M, W, A}(a)) in the SWM corresponding to Fig. 2(a). We can check independences of counterfactuals in the joint p({Y, M, W, A}(a)), via a simple modification of the d-separation criterion [9]. For instance, Y (a) ⫫ A | W since all d-connected paths from Y to A are blocked by W.
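The independence check described above can be reproduced with a small d-separation routine. The sketch below uses the standard moralized-ancestral-graph test, and treats the fixed vertex a as always belonging to the conditioning set; that treatment of fixed vertices, the vertex labels, and the function names are assumptions made for this illustration rather than the construction in [13].

```python
from itertools import combinations

def ancestors(parents, nodes):
    """All vertices with a directed path into `nodes`, including `nodes` themselves."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def d_separated(parents, xs, ys, zs):
    """Test whether xs and ys are d-separated given zs in a DAG {child: [parents]}."""
    keep = ancestors(parents, set(xs) | set(ys) | set(zs))
    adj = {v: set() for v in keep}
    for child in keep:
        pa = list(parents.get(child, ()))
        for p in pa:                      # undirected copy of each edge
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(pa, 2):  # connect parents of a common child
            adj[p].add(q)
            adj[q].add(p)
    blocked = set(zs)
    seen, stack = set(), [x for x in xs if x not in blocked]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(w for w in adj[v] if w not in blocked and w not in seen)
    return not (seen & set(ys))

# SWIG for Fig. 2 (a) under the node intervention on A: the random half "A"
# keeps the incoming edge from W, the fixed half "a" keeps the outgoing edges.
swig = {"W": [], "A": ["W"], "a": [], "M(a)": ["W", "a"], "Y(a)": ["M(a)", "a"]}
fixed = {"a"}

indep_given_w = d_separated(swig, {"Y(a)"}, {"A"}, {"W"} | fixed)
indep_marginal = d_separated(swig, {"Y(a)"}, {"A"}, fixed)
```

In this sketch `indep_given_w` is true, matching Y (a) ⫫ A | W, while `indep_marginal` is false because the path through M(a) and W remains open when W is not conditioned on.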

Fig 4. (a) A SWIG for {Y, M, W, A}(a). (b) An edge intervention version for $\{Y, M, W, A\}((wM)_\to, (w'A)_\to, (aY)_\to)$.

Similarly, if a set of responses $\mathbf{V}$ to an edge intervention $\eta_{\mathfrak{a}_{\boldsymbol\alpha}}$ includes all variables (including $\mathbf{A}$), then, under the MWM, the response is linked to the observed distribution via (5), and can be viewed as a kind of Markov factorization [9] of the joint response $\mathbf{V}(\mathfrak{a}_{\boldsymbol\alpha})$, where terms $p(V \mid \mathrm{pa}_{\mathcal{G}}(V))$ with $\mathrm{pa}_{\mathcal{G}}(V) \cap \mathrm{so}_{\mathcal{G}}(\boldsymbol\alpha) \neq \emptyset$ are replaced with $p(V \mid \mathrm{pa}^{\bar{\boldsymbol\alpha}}_{\mathcal{G}}(V), \mathfrak{a}_{\{(WV)_\to \in \boldsymbol\alpha\}})$. It is possible to generalize SWIGs to give a graphical representation of this factorization. Instead of splitting the vertices into fixed and random versions, we shatter every intervened-on vertex into a set corresponding to the distinct values (including the natural value) that vertex assumes when defining the response. For example, the graph in Fig. 4 (b) represents $p(\{Y, M, W, A\}((wM)_\to, (w'A)_\to, (aY)_\to))$ in the MWM corresponding to Fig. 2 (a). We can check independences of counterfactuals in this joint via a simple modification of d-separation: $Y((aY)_\to, (wM)_\to, (w'A)_\to) \perp\!\!\!\perp A((w'A)_\to) \mid M((w'A)_\to, (wM)_\to)$, since all d-connected paths from Y to A are blocked by M.

Note that we shatter W in Fig. 2 (a) into three vertices, and A into two, where the random vertex has an outgoing arrow to M. This is because there are two treatment values for W, and W is also a response, while A is a response for the purposes of the $(AM)_\to$ edge and a treatment for the purposes of the $(AY)_\to$ edge. That responses to edge interventions factorize according to these kinds of “shattered graphs” under the MWM (but not the SWM) follows as a straightforward generalization of the proof of Proposition 11 in [13]. In fact, these shattered graphs can be viewed as SWIGs defined on an augmented graph where a treatment vertex is split into copies, corresponding to (individually intervenable) components of the treatment associated with direct and indirect effects. For examples of such graphs, and associated discussion, see [17], Section 6, and Fig. 6.

Fig 6. A flowchart for identification results for path interventions under the MWM and the SWM.

Thus, the edge g-formula can be viewed as the MWM analogue of the extended g-formula, and it is possible to construct graphs that stand in the same relation to edge interventions, the edge g-formula, and the MWM as SWIGs do to node interventions, the extended g-formula, and the SWM. In the interests of space, we do not derive this formally, nor pursue this connection further here.

7. The Edge G-Formula and Causal Effects in Hidden Variable DAGs

If some variables in a causal model of a DAG are unobserved, not every response to a node intervention is identified, since (2) cannot always be directly applied. A complete theory for identifying Y(a) from p(V), where A and Y are disjoint, was given in [33, 25]. In this section we show that certain identifying functionals for p(Y(a)) correspond to marginals of the edge g-formula (5).

For example, it can be shown that p(Y (a)) is identified via the front-door functional $\sum_{m,a'} p(Y \mid a', m)\, p(m \mid a)\, p(a')$ under the SWM for the DAG shown in Fig. 5 (a), where H is not observable. If we replace H and its outgoing arrows by an arrow from A to Y, we obtain the DAG in Fig. 5 (c). A straightforward consequence of (5) is that $p(Y((aM)_\to))$ is identified via the same functional under the MWM for Fig. 5 (c). In this section, we give a general condition for the case when this correspondence of functionals occurs.
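A numerical check of the front-door claim may help fix ideas. The sketch below draws random conditional probability tables for a binary model with the structure of Fig. 5 (a) (H → A, H → Y, A → M, M → Y), computes p(Y(a)) from the full model including H, and compares it with the front-door functional evaluated on the observed margin p(A, M, Y) only. The table values, variable cardinalities, and function names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(*shape):
    """Random conditional table; the last axis is the child and sums to one."""
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

# Hypothetical binary model with the structure of Fig. 5 (a).
p_h = random_cpt(2)            # p(h)
p_a_h = random_cpt(2, 2)       # p(a | h), indexed [h, a]
p_m_a = random_cpt(2, 2)       # p(m | a), indexed [a, m]
p_y_mh = random_cpt(2, 2, 2)   # p(y | m, h), indexed [m, h, y]

# Full joint over (h, a, m, y) and the observed margin over (a, m, y).
joint = np.einsum("h,ha,am,mhy->hamy", p_h, p_a_h, p_m_a, p_y_mh)
p_amy = joint.sum(axis=0)

# Conditionals computed from the observed margin only.
p_a = p_amy.sum(axis=(1, 2))
p_m_given_a = p_amy.sum(axis=2) / p_a[:, None]            # [a, m]
p_y_given_am = p_amy / p_amy.sum(axis=2, keepdims=True)   # [a, m, y]

def front_door(a):
    """sum_{m,a'} p(y | a', m) p(m | a) p(a'), as a vector over y."""
    return np.einsum("bmy,m,b->y", p_y_given_am, p_m_given_a[a], p_a)

def truth(a):
    """p(Y(a) = y) from the full model: sum_{h,m} p(y | m, h) p(m | a) p(h)."""
    return np.einsum("mhy,m,h->y", p_y_mh, p_m_a[a], p_h)
```

Comparing `front_door(a)` with `truth(a)` for each a shows agreement up to floating-point error, even though H never enters the front-door computation.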

Fig 5. (a) A hidden variable DAG where the causal effect p(Y(a) = y) is identified via the front-door formula $\sum_{m,a'} p(y \mid a', m)\, p(m \mid a)\, p(a')$. (b) A DAG for a simple setting in mediation analysis where multiply robust estimators for functionals derived from (5) for $Y((aY)_\to, (a'M)_\to)$ are known. (c) A DAG where $Y((aM)_\to)$ is identified via the front-door formula in (a). (d) A latent projection ADMG of the DAG in (a) onto {A, M, Y}.

We introduce additional terminology to help us formulate our results. Rather than defining the identification problem on hidden variable DAGs directly, we will define it on acyclic directed mixed graphs (ADMGs) which represent a class of hidden variable DAGs that all share identifying functionals. An ADMG is a mixed graph with directed (→) and bidirected edges (↔), with no directed cycles. ADMGs represent classes of hidden variable DAGs via a latent projection operation onto a graph defined only on observed variables [34]. For example, this operation applied to Fig. 5 (a) results in an ADMG shown in Fig. 5 (d). Connected components in a graph obtained from an ADMG 𝒢 by dropping all directed edges are called districts of 𝒢. For example, the sets {A, Y} and {M} are districts of the graph in Fig. 5 (d). The set of districts of 𝒢 is denoted by 𝒟(𝒢). If a set S is in a district of 𝒢, we denote that district by dis𝒢 (S).

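For an ADMG 𝒢 with vertices $\mathbf{V}$, and $\mathbf{A} \subseteq \mathbf{V}$, let $\mathcal{G}_{\mathbf{A}}$ be a subgraph consisting only of vertices in $\mathbf{A}$ and edges between them. Let $\mathrm{an}_{\mathcal{G}}(V) = \{A \mid A \to \ldots \to V \text{ is in } \mathcal{G}\}$. A total order $\prec_{\mathcal{G}}$ on $\mathbf{V}$ in 𝒢 is topological if whenever $V_1 \prec_{\mathcal{G}} V_2$, $V_2 \notin \mathrm{an}_{\mathcal{G}}(V_1)$. For a total order ≺ on $\mathbf{V}$, for any $V \in \mathbf{V}$, let $\mathrm{pre}_{\prec}(V) \equiv \{W \in \mathbf{V} \setminus \{V\} \mid W \prec V\}$.

In the remainder of this section we will, without loss of generality, restrict attention to identification problems where $\mathbf{V} \subseteq \mathrm{an}_{\mathcal{G}}(\mathbf{Y})$, and $\mathbf{V} \setminus \mathrm{an}_{\mathcal{G}_{\mathbf{V} \setminus \mathbf{A}}}(\mathbf{Y}) \subseteq \mathbf{A}$, and consider responses to node interventions that yield an identifying functional in a particular convenient form that only involves conditional distributions derived from $p(\mathbf{V})$.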

Definition 2

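Given $p(\mathbf{V})$, for any total order ≺ on $\mathbf{V}$, and $\mathbf{v} \in \mathfrak{X}_{\mathbf{V}}$, a functional of $p(\mathbf{V})$ of the form $\sum_{\mathbf{V}' \setminus \mathbf{Y}} \prod_{V \in \mathbf{V}'} p(V \mid \mathbf{S}_V, \mathbf{v}_{\mathrm{pre}_{\prec}(V) \setminus \mathbf{S}_V})$, where $\mathbf{Y} \subseteq \mathbf{V}' \subseteq \mathbf{V}$, and $\mathbf{S}_V \subseteq \mathrm{pre}_{\prec}(V) \cap \mathbf{V}'$, is called a g-functional.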

The output of the g-computation algorithm [14], mentioned in Section 2.4, is always a g-functional, but some identifying functionals for responses to node interventions are g-functionals that cannot arise from g-computation. For instance, consider the front-door functional $\sum_{m,a'} p(Y \mid a', m)\, p(m \mid a)\, p(a')$, which identifies p(Y (a)) in the graph in Fig. 5 (a). If we let $\mathbf{V} = \mathbf{V}' = \{A, M, Y\}$, $\mathbf{Y} = \{Y\}$, take ≺ to be the topological ordering for the graph, and let $\mathbf{S}_Y = \{M, A\}$, and $\mathbf{S}_M = \mathbf{S}_A = \{\}$, we see that this satisfies Definition 2 and so is a g-functional. However, g-computation cannot be used to identify responses to node interventions in cases where intervened-on variables have unobserved causes in common with responses, as is the case in Fig. 5 (a).

We give a sufficient condition on ADMGs 𝒢, and on response sets $\mathbf{Y}$ to node interventions on $\mathbf{A}$, such that the output is a g-functional, and then show that it is possible to construct from the ADMG 𝒢 a DAG in which a certain response to an edge intervention is identified by the same g-functional via (5).

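For a particular treatment set $\mathbf{A}$ in 𝒢, let $D_{S,\mathbf{A},\mathcal{G}} = \mathrm{dis}_{\mathcal{G}_{\mathrm{an}_{\mathcal{G}}(S)}}(S)$ for each $S \in \mathcal{D}(\mathcal{G}_{\mathbf{V} \setminus \mathbf{A}})$. We will omit $\mathbf{A}$ and 𝒢 from the subscript if they are obvious, to yield $D_S$, and let $\mathbf{A}_f = \mathbf{A} \setminus \bigcup_{S \in \mathcal{D}(\mathcal{G}_{\mathbf{V} \setminus \mathbf{A}})} D_S$. In words, $\mathcal{D}(\mathcal{G}_{\mathbf{V} \setminus \mathbf{A}})$ is the set of districts in a graph where treatments $\mathbf{A}$ are removed. For instance, in Fig. 5 (d), with treatment A, these districts are {M} and {Y}. For each such district S, $D_S$ is a (possibly larger) district containing all of S in a graph containing ancestors of S. For instance, $D_{\{Y\}}$ in Fig. 5 (d) is {A, Y}. $\mathbf{A}_f$ is the set of all treatments not in any such $D_S$. Since A is the only treatment in Fig. 5 (d) and is in $D_{\{Y\}}$, $\mathbf{A}_f = \{\}$ in this case.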

Lemma 7.1

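If $(\forall S_1, S_2 \in \mathcal{D}(\mathcal{G}_{\mathbf{V} \setminus \mathbf{A}}))\ (D_{S_1} \cap D_{S_2} \neq \emptyset) \Rightarrow (S_1 = S_2)$, then the sets $\{D_S \mid S \in \mathcal{D}(\mathcal{G}_{\mathbf{V} \setminus \mathbf{A}})\}$ partition $\mathbf{V} \setminus \mathbf{A}_f$.

Given Lemma 7.1, for every $V \in \mathbf{V} \setminus \mathbf{A}_f$, let $D_V = D_S$ for the unique $D_S$ such that $V \in D_S$.

The following lemma gives two conditions sufficient to yield an identifying g-functional. First, any district $S \in \mathcal{D}(\mathcal{G}_{\mathbf{V} \setminus \mathbf{A}})$ must not have parents outside S that are elements of $D_S$, and second, the sets $D_S$ must partition $\mathbf{V} \setminus \mathbf{A}_f$ as in Lemma 7.1. This is satisfied by Fig. 5 (d), since $D_{\{Y\}} = \{Y, A\}$, $D_{\{M\}} = \{M\}$, and $\mathrm{pa}_{\mathcal{G}}(\{M\}) = \{A\}$, $\mathrm{pa}_{\mathcal{G}}(\{Y\}) = \{M\}$.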

Lemma 7.2

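Fix $\mathbf{A}$, $\mathbf{Y}$, 𝒢 such that

  1. $(\forall S \in \mathcal{D}(\mathcal{G}_{\mathbf{V} \setminus \mathbf{A}}))$, $(\mathrm{pa}_{\mathcal{G}}(S) \setminus S) \cap D_S = \emptyset$, and

  2. $(\forall S_1, S_2 \in \mathcal{D}(\mathcal{G}_{\mathbf{V} \setminus \mathbf{A}}))$, $(D_{S_1} \cap D_{S_2} \neq \emptyset) \Rightarrow (S_1 = S_2)$.

Then $p(\mathbf{Y}(\mathbf{a}) = \mathbf{y})$ is identified by a g-functional

$$\sum_{\mathbf{v}_{\mathbf{V} \setminus (\mathbf{Y} \cup \mathbf{A}_f)}} \prod_{V \in \mathbf{V} \setminus \mathbf{A}_f} p\left((\mathbf{y} \cup \mathbf{v})_V \,\middle|\, \mathbf{a}_{\mathrm{pre}_{\prec_{\mathcal{G}}}(V) \cap (\mathbf{A} \setminus D_V)},\ (\mathbf{y} \cup \mathbf{v})_{\mathrm{pre}_{\prec_{\mathcal{G}}}(V) \setminus (\mathbf{A} \setminus D_V)}\right). \tag{6}$$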

Finally, given that the preconditions of Lemma 7.2 are satisfied by an ADMG 𝒢, the following result states that we can modify 𝒢 into a DAG in which there is some edge intervention with a response identified by the same g-functional as given by Lemma 7.2. For the ADMG in Fig. 5 (d), this DAG is Fig. 5 (c).
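The sets $D_S$, $\mathbf{A}_f$ and the two preconditions of Lemma 7.2 can be computed directly from an edge list. The following sketch does so for the ADMG of Fig. 5 (d); the graph encoding (lists of directed and bidirected vertex pairs) and the function names are assumptions for illustration, and ancestors are taken to include the vertices themselves.

```python
def districts(vertices, bidirected):
    """Connected components of the bidirected part of the graph induced on `vertices`."""
    vertices = set(vertices)
    adj = {v: set() for v in vertices}
    for u, w in bidirected:
        if u in vertices and w in vertices:
            adj[u].add(w)
            adj[w].add(u)
    comps, seen = [], set()
    for v in sorted(vertices):
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps

def ancestors(directed, targets):
    """Vertices with a directed path into `targets`, including `targets`."""
    anc, changed = set(targets), True
    while changed:
        changed = False
        for u, w in directed:
            if w in anc and u not in anc:
                anc.add(u)
                changed = True
    return anc

def lemma_7_2_check(vertices, directed, bidirected, treatments):
    """Return ({S: D_S}, A_f, whether both preconditions of Lemma 7.2 hold)."""
    rest = set(vertices) - set(treatments)
    D = {}
    for S in districts(rest, bidirected):
        an_S = ancestors(directed, S)
        # District containing S in the subgraph induced by the ancestors of S.
        D[S] = next(d for d in districts(an_S, bidirected) if S <= d)
    A_f = set(treatments) - set().union(*D.values())
    cond1 = all(not (({u for u, w in directed if w in S} - S) & D[S]) for S in D)
    cond2 = all(not (D[S1] & D[S2]) for S1 in D for S2 in D if S1 != S2)
    return D, A_f, cond1 and cond2

# Fig. 5 (d): A -> M -> Y with A <-> Y, treatment set {A}.
D, A_f, ok = lemma_7_2_check(
    {"A", "M", "Y"}, [("A", "M"), ("M", "Y")], [("A", "Y")], {"A"}
)
```

For Fig. 5 (d) this reproduces the values quoted in the text: the district of {Y} grows to {A, Y}, the district of {M} stays {M}, no treatment is left in $\mathbf{A}_f$, and both preconditions hold.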

Lemma 7.3

For an ADMG 𝒢 with vertex set $\mathbf{V}$, fix disjoint $\mathbf{Y}, \mathbf{A} \subseteq \mathbf{V}$ that satisfy the preconditions of Lemma 7.2. Then there exists a DAG with vertex set $\mathbf{V}$, and an edge intervention $\eta_{\mathfrak{a}_{\boldsymbol\alpha}}$ on a set of edges $\boldsymbol\alpha$ in that DAG, such that $p(\mathbf{Y}(\mathfrak{a}_{\boldsymbol\alpha}))$ is identified under the MWM for that DAG via a margin of the functional in (5) that is equal to the identifying g-functional for $p(\mathbf{Y}(\mathbf{a}))$ in terms of $p(\mathbf{V})$ in the ADMG 𝒢.

A natural question raised by Lemma 7.3 is the converse: is it the case that every identifying functional for an edge intervention corresponds to an identifying functional of a response to a node intervention? We leave this question for future work.

The fact that a class of causal effects identified via a g-functional, even those effects with unobserved causes of treatments, corresponds to responses to edge interventions in a DAG gives an additional reason to study estimation theory of the edge g-formula (5). Furthermore, this connection gives another setting in which front-door type functionals may arise – the context of mediation analysis where the baseline treatment is not a constant value, but a naturally occurring value in the population.

8. A Multiply Robust Estimator for a Special Case of the Edge G-Formula

We have shown that the edge g-formula (5) encodes a wide class of identified targets in causal inference. Here we give an example of how a response to an edge-consistent path intervention is represented as an edge intervention, identified via a marginal of (5), and re-expressed as a contrast parameter for which an estimator exists which is robust to misspecification of parts of the likelihood function. We consider discrete state spaces, but extensions to continuous state spaces are straightforward in this case.

Consider the graph in Fig. 5 (b), which represents a simple mediation setting, with A an exposure, Y an outcome, M a mediator, and C a set of baseline covariates. We might be interested in a direct or indirect effect of A on Y. As discussed in section 4, we may represent such effects as contrasts obtained from a response to a path intervention $p(Y((aMY)_\to, (a'Y)_\to))$. This path intervention is natural, and edge consistent, and the response of Y to it is equal to the response to an edge intervention $p(Y((aM)_\to, (a'Y)_\to))$, which is identified as a marginal of (5), namely $\sum_{c,m} p(Y \mid a', m, c)\, p(m \mid a, c)\, p(c)$. Let $\Upsilon(a, a', c) = \sum_m \mathbb{E}(Y \mid a', m, c) \cdot p(m \mid a, c)$. Then the mean response is $\Phi(a, a') = \sum_c \Upsilon(a, a', c)\, p(c)$, and the efficient influence function of Φ(a, a′) under the saturated model $\mathcal{P}_s$, that is the set of all densities p(Y, A, M, C), is

$$U^{\mathrm{eff}}_{\mathcal{P}_s}(\Phi(a, a')) = \frac{\mathbb{I}(A = a')\, p(M \mid a, C)}{p(a' \mid C)\, p(M \mid a', C)} \{Y - \mathbb{E}(Y \mid C, M, a')\} + \frac{\mathbb{I}(A = a)}{p(a \mid C)} \{\mathbb{E}(Y \mid C, M, a') - \Upsilon(a, a', C)\} + \Upsilon(a, a', C) - \Phi(a, a'),$$

where 𝕀(.) is the indicator function for an event [31].

To represent direct and indirect effects as contrasts, we also need to consider the response of Y to A being set to a for the purposes of all pathways from A to Y, which simply corresponds to p(Y (a)), and which is identified via a marginal of (2), namely $\sum_c p(Y \mid a, c)\, p(c)$. The mean response is then $\Phi(a, a) = \sum_c \mathbb{E}(Y \mid a, c)\, p(c)$. The efficient influence function of Φ(a, a) under the saturated model $\mathcal{P}_s$ is simply $U^{\mathrm{eff}}_{\mathcal{P}_s}(\Phi(a, a))$, which simplifies to

$$\frac{\mathbb{I}(A = a)}{p(a \mid C)} \{Y - \Upsilon(a, a, C)\} + \Upsilon(a, a, C) - \Phi(a, a),$$

the efficient influence function derived in the context of total effects in [18].

Natural direct and indirect effects may be defined on the difference scale as Φ(a, a) − Φ(a, a′), and Φ(a, a′) − Φ(a′, a′). Alternatively, for binary outcomes we may also define such effects in a natural way on the risk ratio or odds ratio scale.

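Estimating these parameters using an unrestricted likelihood is not a feasible strategy in settings with a high dimensional vector of baseline covariates, which means we must resort to modeling. An approach in [31] is to assume models $\{\mathbb{E}^{par}(Y \mid a, m, c; \hat\alpha),\ f^{par}(m \mid a, c; \hat\beta),\ f^{par}(a \mid c; \hat\gamma)\}$, and use a substitution estimator which solves the estimating equations

$$\mathbb{P}_n\left(\hat{U}^{\mathrm{eff}}_{\mathcal{P}_s}(\Phi(a, a'))\right) = 0,$$

where $\mathbb{P}_n(\cdot)$ is the empirical average (for sample size n), and $\hat{U}^{\mathrm{eff}}_{\mathcal{P}_s}$ is equal to $U^{\mathrm{eff}}_{\mathcal{P}_s}$ evaluated at $\{\mathbb{E}^{par}(Y \mid a, m, c; \hat\alpha),\ f^{par}(m \mid a, c; \hat\beta),\ f^{par}(a \mid c; \hat\gamma)\}$.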

The resulting estimator exhibits the property of triple robustness, that is, it remains consistent in the union model where any two of the above three parametric models are correct. This estimator is combined with a similarly defined doubly robust estimator for Φ(a, a) derived in [18] to yield a triply robust estimator for the direct and indirect parameters on the difference scale. This was extended to semi-parametric models for direct effects on the additive and multiplicative scales [32].
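To make the estimating equation concrete, the sketch below solves it for Φ(a, a′) with binary C, A, M, Y. It is not the parametric procedure of [31]: for brevity the three nuisance functions are estimated by saturated empirical cell frequencies, the data-generating model is invented, and all function names are hypothetical. With saturated nuisance estimates the two correction terms average to zero, so the solution coincides with the plug-in marginal of (5), which gives a simple sanity check.

```python
import numpy as np

def simulate(n, seed=0):
    """Draw binary (C, A, M, Y) from a hypothetical model compatible with Fig. 5 (b)."""
    rng = np.random.default_rng(seed)
    C = rng.binomial(1, 0.5, n)
    A = rng.binomial(1, 0.3 + 0.4 * C)
    M = rng.binomial(1, 0.2 + 0.3 * A + 0.2 * C)
    Y = rng.binomial(1, 0.1 + 0.3 * A + 0.3 * M + 0.2 * C)
    return C, A, M, Y

def saturated_nuisances(C, A, M, Y):
    """Empirical cell frequencies for E(Y | a, m, c), p(m | a, c) and p(a | c)."""
    e_y = np.zeros((2, 2, 2))   # indexed [a, m, c]
    f_m = np.zeros((2, 2, 2))   # indexed [m, a, c]
    f_a = np.zeros((2, 2))      # indexed [a, c]
    for c in (0, 1):
        for a in (0, 1):
            f_a[a, c] = np.mean(A[C == c] == a)
            for m in (0, 1):
                f_m[m, a, c] = np.mean(M[(A == a) & (C == c)] == m)
                e_y[a, m, c] = np.mean(Y[(A == a) & (M == m) & (C == c)])
    return e_y, f_m, f_a

def upsilon(C, a, a_prime, e_y, f_m):
    """Upsilon(a, a', C_i) = sum_m E(Y | a', m, C_i) p(m | a, C_i) for each unit i."""
    return sum(e_y[a_prime, m, C] * f_m[m, a, C] for m in (0, 1))

def phi_hat(C, A, M, Y, a, a_prime, e_y, f_m, f_a):
    """Solve P_n[U_hat] = 0 for Phi(a, a'): mediator edge at a, outcome edge at a'."""
    ups = upsilon(C, a, a_prime, e_y, f_m)
    e_obs = e_y[a_prime, M, C]
    w1 = (A == a_prime) * f_m[M, a, C] / (f_a[a_prime, C] * f_m[M, a_prime, C])
    w2 = (A == a) / f_a[a, C]
    return np.mean(w1 * (Y - e_obs) + w2 * (e_obs - ups) + ups)

def plug_in(C, a, a_prime, e_y, f_m):
    """Plug-in estimate sum_c Upsilon(a, a', c) p_hat(c)."""
    return np.mean(upsilon(C, a, a_prime, e_y, f_m))
```

A typical call is `C, A, M, Y = simulate(50000)`, then `phi_hat(C, A, M, Y, 1, 0, *saturated_nuisances(C, A, M, Y))`. Replacing the saturated tables with fitted parametric models, as in [31], is where the robustness property becomes relevant.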

Since our results show that the edge g-formula encompasses a wide range of causal inference targets, including effects of treatments on the multiply treated, path-specific effects, and causal effects with unobserved causes of treatments, an interesting avenue of future work is to generalize estimation theory for simple instances of the edge g-formula, like above, to more general cases, for instance longitudinal cases like that shown in Fig. 2 (b).

9. Discussion

We have defined an inclusion hierarchy of interventions associated with graphical features: node interventions corresponding to standard treatment interventions, edge interventions corresponding to intervening on a portion of the treatment mechanism associated with a particular outgoing edge, and path interventions corresponding to intervening on a portion of the treatment mechanism associated with a particular outgoing causal pathway. We have shown that a variety of causal inference targets of interest, including effects of treatment on the multiply treated, and path-specific effects can be viewed as special cases of responses to path interventions. In addition, we have shown that edge interventions are in some sense naturally associated with the MWM of Pearl as the responses to such interventions are naturally identified under the assumptions of this model, just as node interventions are naturally associated with the SWM of Robins. The question of whether a particular causal inference target is identified, and under what model thus reduces to expressing the target as a path intervention, and then considering whether the path intervention is natural, and whether it can be re-expressed as an edge intervention or a node intervention. This process is summarized in a flowchart shown in Fig. 6.

An obvious extension of our work is to consider identification of responses in our hierarchy in hidden variable DAG models in terms of observed marginal distributions. Existing results on mediation analysis [24] and ETT identification [28] would be subsumed as special cases under this framework, but it would entail novel identification results for any new target expressible as a path intervention response. In addition, an interesting question is whether all identifying functionals for responses to node interventions in a hidden variable DAG model correspond to some sort of identified response to an edge intervention, although possibly not in a DAG but an ADMG. If true, this would recast any identified causal effect as a certain type of identified mediated effect.

While estimation theory of functionals derived from the extended g-formula (2) has received attention in the literature [20], multiply robust estimators for functionals obtained from the edge g-formula (5) are known only in very special cases such as the point treatment setting we discussed in section 8 [31]. As we have shown in this paper, developing estimators for general functionals obtained from the edge g-formula (5) results in estimators for a wide class of targets of interest in causal inference, including path-specific effects, effects of treatment on the multiply treated, effects of treatments on the indirectly treated, and causal effects in the presence of unobserved causes of treatments.

Our results thus not only provide a unifying view of identification, under various models, of a large class of targets of interest in causal inference, but also motivate the development of estimation theory for a more general functional than the g-formula.

Acknowledgments

‡ Supported in part by NIH Grant R01 AI104459-01A1.

§ Supported in part by NIH Grants ES020337 and AI104459.

References

  • 1. Avin Chen, Shpitser Ilya, Pearl Judea. Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05) Vol. 19. Morgan Kaufmann; San Francisco: 2005. Identifiability of path-specific effects; pp. 357–363.
  • 2. Balke Alexander, Pearl Judea. Proceedings of the Twelfth Conference on Artificial Intelligence (AAAI-94) Morgan Kaufmann; San Francisco: 1994. Probabilistic evaluation of counterfactual queries; pp. 230–237.
  • 3. Baron Reuben M, Kenny David A. The moderator-mediator variable distinction in social psychology research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology. 1986;51:1173–1182. doi: 10.1037//0022-3514.51.6.1173.
  • 4. Hubbard Alan E, Van Der Laan Mark J. Population intervention models in causal inference. Biometrika. 2008;95(1):35–47. doi: 10.1093/biomet/asm097.
  • 5. Imai Kosuke, Tingley Dustin, Yamamoto Teppei. Experimental designs for identifying causal mechanisms. Journal of the Royal Statistical Society, series (A) 2013;176(1)
  • 6. Moodie Erica EM, Richardson Thomas S, Stephens David A. Demystifying optimal dynamic treatment regimes. Biometrics. 2007;63(2):447–455. doi: 10.1111/j.1541-0420.2006.00686.x.
  • 7. Murphy Susan A. Optimal dynamic treatment regimes. Journal of the Royal Statistical Society (Series B) 2003;65:331–366.
  • 8. Neyman Jerzy. Sur les applications de la thar des probabilities aux experiences agaricales: Essay des principle. excerpts reprinted (1990) in English. Statistical Science. 1923;5:463–472.
  • 9. Pearl Judea. Probabilistic Reasoning in Intelligent Systems. Morgan and Kaufmann; San Mateo: 1988.
  • 10. Pearl Judea. Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI-01) Morgan Kaufmann; San Francisco: 2001. Direct and indirect effects; pp. 411–420.
  • 11. Pearl Judea. Causality: Models, Reasoning, and Inference. 2. Cambridge University Press; 2009.
  • 12. Pearl Judea. The causal mediation formula - a guide to the assessment of pathways and mechanisms. Technical Report R-379. Cognitive Systems Laboratory, University of California; Los Angeles: 2011.
  • 13. Richardson Thomas S, Robins Jamie M. Single world intervention graphs (SWIGs): A unification of the counterfactual and graphical approaches to causality. 2013 preprint: http://www.csss.washington.edu/Papers/wp128.pdf.
  • 14. Robins James M. A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect. Mathematical Modeling. 1986;7:1393–1512.
  • 15. Robins James M. A graphical approach to the identification and estimation of causal parameters in mortality studies with sustained exposure periods. Journal of Chronic Diseases. 1987;40:139–161. doi: 10.1016/s0021-9681(87)80018-8.
  • 16. Robins James M, Greenland Sander. Identifiability and exchangeability of direct and indirect effects. Epidemiology. 1992;3:143–155. doi: 10.1097/00001648-199203000-00013.
  • 17. Robins James M, Richardson Thomas S. Alternative graphical causal models and the identification of direct effects. Causality and Psychopathology: Finding the Determinants of Disorders and their Cures. 2010
  • 18. Robins James M, Rotnitzky Andrea, Zhao Lue P. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association. 1994;89:846–866.
  • 19. Robins James M, Hernán Miguel, Brumback Babette. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11(5):550–560. doi: 10.1097/00001648-200009000-00011.
  • 20. Robins James M, Hernán Miguel A, Siebert Uwe. Effects of multiple interventions. Comparative quantification of health risks: global and regional burden of disease attributable to selected major risk factors. 2004;2(28):2191–2230.
  • 21. Rubin Donald B. Estimating causal effects of treatments in randomized and non-randomized studies. Journal of Educational Psychology. 1974;66:688–701.
  • 22. Rubin Donald B. Direct and indirect causal effects via potential outcomes. Scandinavian Journal of Statistics. 2004;31:161–170.
  • 23. Rubin Donald B. Causal inference using potential outcomes: design, modeling, decisions. Journal of the American Statistical Association. 2005;100:322–331.
  • 24. Shpitser Ilya. Counterfactual graphical models for longitudinal mediation analysis with unobserved confounding. Cognitive Science (Rumelhart special issue) 2013;37:1011–1035. doi: 10.1111/cogs.12058.
  • 25. Shpitser Ilya, Pearl Judea. Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06) AAAI Press; Palo Alto: 2006. Identification of joint interventional distributions in recursive semi-Markovian causal models.
  • 26. Shpitser Ilya, Pearl Judea. Proceedings of the Twenty Second Conference on Uncertainty in Artificial Intelligence (UAI-06) AUAI Press; Corvallis, Oregon: 2006. Identification of conditional interventional distributions; pp. 437–444.
  • 27. Shpitser Ilya, Pearl Judea. Complete identification methods for the causal hierarchy. Journal of Machine Learning Research. 2008 Sep;9:1941–1979.
  • 28. Shpitser Ilya, Pearl Judea. Uncertainty in Artificial Intelligence. Vol. 25. AUAI Press; 2009. Effects of treatment on the treated: identification and generalization.
  • 29. Shpitser Ilya, Tchetgen Tchetgen Eric. Supplementary materials for: Causal inference with a graphical hierarchy of interventions. 2015 doi: 10.1214/15-AOS1411.
  • 30. Spirtes Peter, Glymour Clark, Scheines Richard. Causation, Prediction, and Search. 2. Springer Verlag; New York: 2001.
  • 31. Tchetgen Tchetgen Eric J, Shpitser Ilya. Semiparametric theory for causal mediation analysis: efficiency bounds, multiple robustness, and sensitivity analysis. Annals of Statistics. 2012 doi: 10.1214/12-AOS990. in Press.
  • 32. Tchetgen Tchetgen Eric J, Shpitser Ilya. Estimation of a semiparametric natural direct effect model incorporating baseline covariates. Biometrika. 2014 doi: 10.1093/biomet/asu044. in Press.
  • 33. Tian Jin, Pearl Judea. Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI-02) Vol. 18. AUAI Press; Corvallis, Oregon: 2002. On the testable implications of causal models with hidden variables; pp. 519–527.
  • 34. Verma Thomas S, Pearl Judea. Technical Report R-150, Department of Computer Science. University of California; Los Angeles: 1990. Equivalence and synthesis of causal models.
  • 35. Wright Sewall. Correlation and causation. Journal of Agricultural Research. 1921;20:557–585.
  • 36. Young Jessica G, Hernán Miguel A, Robins James M. Identification, estimation and approximation of risk under interventions that depend on the natural value of treatment using observational data. Epidemiologic Methods. 2014;3(1):1–19. doi: 10.1515/em-2012-0001.
