A Potential Outcomes Calculus for Identifying Conditional Path-Specific Effects

Daniel Malinsky; Ilya Shpitser; Thomas Richardson

Author manuscript; available in PMC: 2019 Dec 28.

Published in final edited form as: Proc Mach Learn Res. 2019 Apr;89:3080–3088.


PMCID: PMC6935349 NIHMSID: NIHMS1063775 PMID: 31886462

Abstract

The do-calculus is a well-known deductive system for deriving connections between interventional and observed distributions, and has been proven complete for a number of important identifiability problems in causal inference [1, 8, 18]. Nevertheless, as it is currently defined, the do-calculus is inapplicable to causal problems that involve complex nested counterfactuals which cannot be expressed in terms of the “do” operator. Such problems include analyses of path-specific effects and dynamic treatment regimes. In this paper we present the potential outcome calculus (po-calculus), a natural generalization of do-calculus for arbitrary potential outcomes. We thereby provide a bridge between identification approaches which have their origins in artificial intelligence and statistics, respectively. We use po-calculus to give a complete identification algorithm for conditional path-specific effects with applications to problems in mediation analysis and algorithmic fairness.

1. Introduction

Pearl’s do-calculus [6, 7, 8] is an abstract set of rules for reasoning about interventions that has proven to be influential in settings, such as computer science and artificial intelligence, where graphical models are used to represent causal relationships. In statistics and some social/biomedical sciences, the potential outcome framework [4, 15] is more commonly used to express causal assumptions and reason about interventions. Richardson and Robins [11] have made an important contribution by unifying causal formalisms grounded in graphical causal models with the potential outcomes framework. In this paper we build on those connections, presenting a calculus for reasoning about interventions in the potential outcomes notation that is equivalent to Pearl’s do-calculus for standard interventions, but allows generalizations to nested causal quantities pertinent to evaluating (e.g.) dynamic treatment regimes or path-specific interventions (for which the “do” notation is insufficiently expressive). We show how the new calculus can be applied to problems in mediation analysis, specifically the identification of conditional path-specific causal effects. We introduce a procedure which is complete for expressing such quantities as functions of the observed data distribution, i.e., an algorithm which will produce an identifying expression for a conditional path-specific effect if and only if the effect is identifiable.

Conditional path-specific effects are quantified via conditional distributions over potential outcomes, where treatment variables are assigned to possibly distinct values for different causal pathways. In mediation analysis, functions of such distributions are used to isolate the effect of a drug, therapy, or other treatment assignment along a specific pathway in a specific subpopulation, defined by pre-treatment variables (such as age or gender) or post-treatment variables (such as adverse reactions to the treatment). Importantly, there are settings where the marginal path-specific effect is identified but the conditional path-specific effect is not identified; we later discuss one simple example shown in Fig. 1.

Figure 1: (a) A hidden variable causal DAG where p(Y(a, M(a′))) is identified, but p(Y(a, M(a′)) | C) is not identified. (b) A seemingly similar hidden variable causal DAG where both p(Y(a, M(a′))) and p(Y(a, M(a′)) | C) are identified.

Another context in which conditional path-specific effects may be of interest is in the study of algorithmic fairness. Recent papers [2, 3, 21] have proposed to combat disparities perpetuated by some automated decision-making systems by identifying, estimating, and constraining unfair causal influences that propagate along certain pathways, e.g., the direct effect of gender on hiring outcomes or the indirect effect of race on criminal justice outcomes via geographical factors. It may also be desirable to constrain such path-specific effects for certain subpopulations, which requires identifying conditional path-specific effects.

We begin by introducing potential outcomes, causal models, graphs, and some relevant results. Then we review the do-calculus, propose our potential outcome calculus, demonstrate they are equivalent, and give some simple derivations to establish the soundness of the rules in the language of potential outcomes. Finally, we introduce a formalism for expressing path-specific effects (PSEs) and a complete identification procedure for conditional PSEs.

2. Potential Outcomes, the Do Operator and Causal Models

Fix a set of indices K ≡ {1, …, k} under a total ordering ≺. For each random variable V_i, i∈ K, define a state space $X_{i}$ , and the sets Pre_i ≡ {1, …, i – 1}. Given A ⊆ K, we will denote subsets of random variables indexed by A with V_A and elements υ_A of $X_{A}$ by a (lowercase letters).

We assume the existence of all one-step-ahead potential outcome random variables (a.k.a. counterfactuals) of the form $V_{i} ({pa}_{i}) \equiv V_{i} (v_{{Pa}_{i}})$, where Pa_i is a fixed subset of Pre_i, and ${pa}_{i} \equiv v_{{Pa}_{i}}$ is any element in $X_{{Pa}_{i}}$. The variable V_i(pa_i) denotes the value of V_i had the set of direct causes of V_i, or $V_{{Pa}_{i}}$, been set, possibly contrary to fact, to values pa_i. The existence of a total ordering ≺ on indices, together with the fact that Pa_i ⊆ Pre_i, precludes the existence of cyclic causation. (That is, we consider causal models that are recursive.) V_i(pa_i) may be conceptualized as the output of a structural equation $f_{i} : X_{{Pa}_{i} \cup \{ϵ_{i}\}} \mapsto X_{i}$, a function representing a causal mechanism that maps values of Pa_i, as well as the value of an exogenous disturbance variable ϵ_i, to values of V_i. We define causal models as sets of densities over the set of random variables

V \equiv \{V_{i} ({pa}_{i}) \mid i \in \{1, \dots, k\}, {pa}_{i} \in X_{{Pa}_{i}}\} .

For simplicity of presentation, we assume $X_{i}$ is always finite, and thus ignore the measure theoretic complications that arise with defining densities over sets of random variables above in the case where some state spaces on Pa_i are infinite.¹

Given a set of one-step-ahead potential outcomes $V$, for any A ⊆ K and i ∈ K we define the potential outcome V_i(a), the response of V_i had variables in V_A been set to a, by the definition known as recursive substitution:

V_{i} (a) \equiv V_{i} (a \cap {pa}_{i}, \{V_{j} (a) \mid j \in {Pa}_{i} \setminus A\}) .

(1)

In words, this states that V_i(a) is the potential outcome where the variables of Pa_i that belong to A are set to their corresponding values in a, and all elements of Pa_i not in A are set to whatever values their recursively defined counterfactual versions would have had, had A been set to a. Equivalently, V_i(a) is the random variable induced by a modified set of structural equations: specifically, the functions f_j for all V_j ∈ A are replaced by constant functions $f_{j}^{*}$ that set V_j to the corresponding value in a.
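Recursive substitution can be made concrete with a small sketch. The binary structural equations below are purely illustrative assumptions (they are not taken from the paper), placed on a DAG shaped like Fig. 2 (a); the names `PARENTS`, `EQUATIONS`, `potential_outcome`, and `all_disturbances` are hypothetical.

```python
from itertools import product

# Hypothetical binary structural equations on a DAG shaped like Fig. 2(a):
# C -> A, C -> M, A -> M, C -> Y, A -> Y, M -> Y. Illustrative only.
PARENTS = {"C": [], "A": ["C"], "M": ["C", "A"], "Y": ["C", "A", "M"]}
EQUATIONS = {
    "C": lambda pa, eps: eps,
    "A": lambda pa, eps: pa["C"] ^ eps,
    "M": lambda pa, eps: (pa["C"] & pa["A"]) ^ eps,
    "Y": lambda pa, eps: (pa["A"] | pa["M"]) ^ (pa["C"] & eps),
}


def potential_outcome(i, a, eps):
    """Evaluate V_i(a) by recursive substitution (1) for one disturbance draw eps."""
    pa_values = {}
    for j in PARENTS[i]:
        if j in a:
            pa_values[j] = a[j]  # parent is intervened on: plug in its value from a
        else:
            pa_values[j] = potential_outcome(j, a, eps)  # otherwise use V_j(a)
    return EQUATIONS[i](pa_values, eps[i])


def all_disturbances():
    """Enumerate every joint value of the four binary disturbances."""
    for bits in product([0, 1], repeat=len(PARENTS)):
        yield dict(zip(PARENTS, bits))
```

For instance, `potential_outcome("Y", {"A": 1}, eps)` evaluates Y(a = 1) for a given disturbance draw. In this sketch `potential_outcome("A", {"A": 1}, eps)` returns the natural value of A, which matches the convention that A(a) denotes the random variable rather than the constant.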

We denote by $V^{*}$ the set of all variables derived by (1) from $V$ , together with $V$ . In addition, for notational conciseness, we will use index sets to denote sets of potential outcomes themselves. That is, for Y ⊆ K, A ⊆ K, we will denote the set {V_i(a) | i ∈ Y} by Y(a). Note that we allow Y and A to intersect. Thus, we allow sets of potential outcomes of the form A(a), which denote the sets {V_i(a) | i ∈ A}, where each V_i(a) is defined using (1) above. In particular, if A = {V_i} (a singleton), V_i(υ_i) is defined in our notation to be the random variable V_i, not the constant υ_i.

In cases where Y and A do not intersect, the distribution p(Y(a)) has been denoted by Pearl as p(Y | do(a)) [8]. This formulation places emphasis on the intervention operator do(a), which replaces structural equations by constants.

Recursive substitution provides a link between observed variables and potential outcomes. In particular, it implies the consistency property:² for any disjoint A, B ⊆ K, i ∈ K \ (A ⋃ B), $a \in X_{A}$ , $b \in X_{B}$ ,

B (a) = b implies V_{i} (a, b) = V_{i} (a) .

(2)

Proposition 1 (consistency) If $V^{*}$ is derived from $V$ via (1), then (2) holds.

Proof: By (1), V_i(a) and V_i(a, b) are defined as

V_{i} (a_{{Pa}_{i}}, \{V_{j} (a) \mid j \in {Pa}_{i} \setminus (A \cup B)\}, \{V_{j} (a) = b_{j} \mid j \in B \cap {Pa}_{i}\})

and $V_{i} (a_{{Pa}_{i}}, {V_{j} (a, b) | j \in {Pa}_{i} \ (A \cup B)}, b_{{Pa}_{i}})$ , respectively. The conclusion follows immediately. □

Equation (1) implies that every V_i(a) can be written as a function of a unique minimal causally relevant subset of a.

Proposition 2 (causal irrelevance) Given $V^{*}$ derived from $V$ via (1), let $V_{i} (a) \in V^{*}$, and let A* be the maximal subset of A such that for every A_j ∈ A*, there exists a sequence $V_{w_{1}}, \dots, V_{w_{m}}$ that does not intersect A, where $A_{j} \in {Pa}_{w_{1}}$, $V_{w_{l}} \in {Pa}_{w_{l + 1}}$ for l = 1, …, m − 1, and $V_{w_{m}} \in {Pa}_{i}$. Then V_i(a) = V_i(a*), where a* is the restriction of a to A*.

Proof: Follows by definition of A* and (1). □
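The set A* in Proposition 2 can be computed by walking backwards from V_i and stopping at the first intervened vertex on each path. The sketch below is one possible reading, under the assumption that a direct parent of V_i in A also counts (an empty intermediate sequence); `relevant_subset` and the `parents` dictionary format are hypothetical.

```python
def relevant_subset(parents, i, A):
    """Members of A with a directed path to i whose intermediate vertices avoid A."""
    relevant, stack, seen = set(), [i], {i}
    while stack:
        v = stack.pop()
        for p in parents[v]:
            if p in A:
                relevant.add(p)  # reached an intervened vertex: do not walk past it
            elif p not in seen:
                seen.add(p)
                stack.append(p)
    return relevant


# Chain X -> Z -> Y with both X and Z intervened on: only Z matters for Y.
chain = {"X": [], "Z": ["X"], "Y": ["Z"]}
print(relevant_subset(chain, "Y", {"X", "Z"}))  # {'Z'}
```

In the chain example, Y(x, z) = Y(z) because every directed path from X to Y passes through the intervened vertex Z.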

A functional causal model (a.k.a. a non-parametric structural equation model with independent errors, NPSEM-IE) asserts that the sets of variables

\{\{V_{i} ({pa}_{i}) \mid {pa}_{i} \in X_{{Pa}_{i}}\} \mid i \in \{1, \dots, k\}\}

(3)

are mutually independent. Phrased in terms of structural equations, the functional causal model states that the joint distribution of the disturbance terms factorizes into a product of marginals: $p (ϵ_{1}, \dots, ϵ_{k}) = \prod_{i = 1}^{k} p (ϵ_{i})$ .

Alternative causal models, which make fewer assumptions than the functional model but are sufficient for all inferences we aim to make in this paper, are discussed in [11, 20]. We focus on the functional causal model here, since it is simpler to describe and is the original setting of Pearl’s do-calculus. We discuss how our results apply to a weaker causal model [11] in the Supplement.

3. Graphical Models

Much conceptual clarity may be gained by viewing causal models as graphs. We will consider graphs with either directed edges only (→), or mixed graphs with both directed and bidirected (↔) edges. Vertices correspond to random variables, and we simplify notation by using V_i to refer to both the graph vertex and corresponding random variable. In all cases we will require the absence of directed cycles, meaning that whenever the graph contains a path of the form V_i → ⋯ → V_j, the edge V_j → V_i cannot exist. Directed graphs with this property are called directed acyclic graphs (DAGs), and mixed graphs with this property are called acyclic directed mixed graphs (ADMGs). We will refer to graphs by $G (V)$ , where V is the set of random variables indexed by {1, …, k}. We will use the following standard definitions for sets of vertices in a graph:

\begin{array}{ll}
{Pa}_{i}^{G} \equiv \{V_{j} \mid V_{j} \to V_{i} \text{ in } G\} & \text{(parents of } V_{i}\text{)} \\
{An}_{i}^{G} \equiv \{V_{j} \mid V_{j} \to \dots \to V_{i} \text{ in } G\} & \text{(ancestors of } V_{i}\text{)} \\
{De}_{i}^{G} \equiv \{V_{j} \mid V_{j} \leftarrow \dots \leftarrow V_{i} \text{ in } G\} & \text{(descendants of } V_{i}\text{)}
\end{array}

By convention, we assume $V_{i} \in {An}_{i}^{G}$ and $V_{i} \in {De}_{i}^{G}$ . We will generally drop the superscript $G$ if the relevant graph is obvious and sometimes write $G$ in place of $G (V)$ when the vertex set is clear. Given a DAG $G (V)$ , a statistical DAG model (a.k.a. a Bayesian network) associated with $G (V)$ is a set of distributions that are Markov relative to $G (V)$ , i.e., the set of distributions that can be written as the following product of conditional densities:

p (V) = \prod_{i = 1}^{k} p (V_{i} | {Pa}_{i}) .

(4)

Given p(V) that is Markov relative to a DAG $G (V)$ , conditional independence relations (written: (Y ⫫ Z | X), where X, Y, Z are disjoint subsets of the index set K) satisfied by p(V) can be derived using the well-known d-separation criterion [5], which we reproduce in the Supplement. We write ${(Y ⫫_{d} Z | X)}_{G (V)}$ when Y is d-separated from Z given X in $G (V)$ . If p(V) is Markov relative to $G (V)$ , then the following global Markov property holds: for any disjoint X, Y, Z

{(Y ⫫_{d} Z | X)}_{G (V)} \Rightarrow (Y ⫫ Z | X) \text{ in } p (V) .

Functional causal models may also be associated with a DAG $G$ by identifying Pa_i with the graphical parents of V_i in $G (V)$ . Given a functional causal model for DAG $G$ , the joint distribution for any V(a) derived from $V$ using (1) is identified via the following formula:

p (V (a)) = \prod_{i = 1}^{k} p (V_{i} | {Pa}_{i} \setminus A, a \cap {pa}_{i}),

(5)

provided $\prod_{a_{j} \in a} p (a_{j} | {Pa}_{j} \setminus A, a \cap {pa}_{j}) > 0$. See [11] for a simple proof. The modified factorization (5) is known as the extended g-formula [11, 13]. Note that (5) has a term for every V_i ∈ V, just like (4).
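To illustrate how (5) is evaluated, the following sketch computes p(V(a)) for binary variables on a DAG shaped like Fig. 2 (a). The conditional probabilities in `P_ONE` are made-up numbers and all function names are hypothetical; only the product structure is taken from (5).

```python
from itertools import product

ORDER = ["C", "A", "M", "Y"]  # a topological order of the assumed DAG
PARENTS = {"C": [], "A": ["C"], "M": ["C", "A"], "Y": ["C", "A", "M"]}

# Hypothetical conditional probabilities P(V_i = 1 | parents), binary variables.
P_ONE = {
    "C": lambda pa: 0.6,
    "A": lambda pa: 0.2 + 0.5 * pa["C"],
    "M": lambda pa: 0.1 + 0.3 * pa["C"] + 0.4 * pa["A"],
    "Y": lambda pa: 0.05 + 0.2 * pa["C"] + 0.3 * pa["A"] + 0.4 * pa["M"],
}


def cond_prob(v, value, pa):
    """P(V_v = value | parents = pa) under the hypothetical tables above."""
    p_one = P_ONE[v](pa)
    return p_one if value == 1 else 1.0 - p_one


def g_formula(values, a):
    """p(V(a) = values) as the product in (5): parents in A take their value
    from a, all other parents take their value from `values`."""
    prob = 1.0
    for v in ORDER:
        pa = {j: a[j] if j in a else values[j] for j in PARENTS[v]}
        prob *= cond_prob(v, values[v], pa)
    return prob


def marginal(target, value, a):
    """p(V_target(a) = value), summing (5) over all other variables."""
    total = 0.0
    for vals in product([0, 1], repeat=len(ORDER)):
        values = dict(zip(ORDER, vals))
        if values[target] == value:
            total += g_formula(values, a)
    return total
```

Here `marginal("Y", 1, {"A": 1})` gives p(Y(a = 1) = 1). The factor for A itself is still present in the product, so `marginal("A", 1, {"A": 1})` returns the observational p(A = 1), in line with the remark that (5) has a term for every V_i.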

The formula (5) resembles (4) and in fact may be viewed as a factorization of p(V(a)) with respect to a certain graph derived from $G$. Such graphs, called Single World Intervention Graphs (SWIGs), were introduced in [11]. SWIGs are graphical representations of potential outcome densities that help unify the graphical and potential outcome formalisms. Given a set A of variables which are assigned to values a, a SWIG $G (a)$ is constructed from $G (V)$ by splitting all vertices in A into a random half and a fixed half, with the random half inheriting all edges with an incoming arrowhead and the fixed half inheriting all outgoing directed edges. Then, all random vertices V_i are re-labelled as V_i(a) or equivalently (due to Proposition 2) as $V_{i} (a \cap {an}_{i}^{*})$, where ${an}_{i}^{*}$ consists of values of the ancestors of V_i in the split graph. In [11], unsplit vertices were drawn as circles, and split nodes as half circles, with fixed nodes denoted by lowercase letters. Fixed nodes are enclosed by a double line. For an example of a SWIG representing the joint density p(Y(a), M(a), C(a), A(a)) = p(Y(a), M(a), C, A), see Fig. 2 (b). Because of the resemblance of (5) to a DAG factorization, we say that p(V(a)) is Markov relative to a SWIG $G (a)$ if p(V(a)) may be written as (5).
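The node-splitting construction can be sketched in a few lines. The edge set `FIG2A` is an assumed encoding of Fig. 2 (a) (inferred from the counterfactual expressions used in the text), and the convention of naming the fixed half by the lowercase vertex name is an illustrative choice that presumes uppercase vertex names.

```python
FIG2A = {("C", "A"), ("C", "M"), ("C", "Y"), ("A", "M"), ("A", "Y"), ("M", "Y")}


def swig_edges(edges, treated):
    """Split each treated vertex: outgoing edges move to a fixed half (lowercase
    name), while the random half keeps the original name and all incoming edges."""
    return {(u.lower() if u in treated else u, v) for (u, v) in edges}


def fixed_ancestors(split, vertex, fixed):
    """Fixed vertices that are ancestors of `vertex` in the split graph."""
    found, stack, seen = set(), [vertex], {vertex}
    while stack:
        w = stack.pop()
        for (p, c) in split:
            if c == w and p not in seen:
                seen.add(p)
                stack.append(p)
                if p in fixed:
                    found.add(p)
    return found


def swig_labels(edges, treated):
    """Label each random vertex V_i by the fixed values among its ancestors."""
    split = swig_edges(edges, treated)
    fixed = {u.lower() for u in treated}
    labels = {}
    for v in {x for e in edges for x in e}:
        args = sorted(fixed_ancestors(split, v, fixed))
        labels[v] = f"{v}({', '.join(args)})" if args else v
    return labels
```

With `treated = {"A"}`, the labels come out as C, A, M(a), and Y(a), which agrees with the joint density p(Y(a), M(a), C, A) mentioned above.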

Figure 2: (a) A simple causal DAG $G$, with a treatment A, an outcome Y, a vector C of baseline variables, and a mediator M. (b) A SWIG $G (a)$ derived from (a) corresponding to the world where A is intervened on to value a. (c) An extended graph $G^{e}$ derived from (a).

A SWIG $G (a)$ is a DAG with a vertex set {V(a), a}, and may be viewed as a conditional graph, with vertices in V(a) corresponding to random variables, and vertices in a corresponding to variables fixed to a value. We extend the notion of d-separation to allow fixed vertices. Specifically, we allow d-separation statements of the form ${(Y (a), a^{'} ⫫_{d} Z (a) | X (a))}_{G (a)}$ , for disjoint random subsets Y(a), Z(a), X(a) of V(a) and a′ a subset of a. Note that a possibly d-connecting path may only contain random nodes as non-endpoint vertices (as in [11] where fixed nodes are always blocked). Our extension here consists only in allowing fixed vertices to also appear as one endpoint in a d-separation statement. Just as (4) implied the global Markov property for a DAG, the modified factorization (5) implies a global Markov property for a SWIG.

Proposition 3 (SWIG global Markov property) If p(V(a)) is Markov relative to $G (a)$ , then for any disjoint subsets Y(a), Z(a), X(a) of V(a) and a subset a′ of a, if ${(Y (a), a^{'} ⫫_{d} Z (a) | X (a))}_{G (a)}$ then, for some f(·),

p (Z (a) | Y (a), X (a)) = p (Z (a) | X (a)) = f (Z, X, a \setminus a^{'}) .

Proof: The first equality is due to Theorem 12 in [11], the second follows from Theorem 19 in [10]. □

Note that f(Z, X, a\a′) is not necessarily equal to p(Z(a\a′) | X(a \ a′)).

The SWIG global Markov property implies the following intuitive result (proved in the Supplement) relating independence statements in p(V(a)) for various sets A. Specifically, the result is that interventions “always help” when it comes to conditional independence.

Proposition 4 (intervention monotonicity) For any disjoint subsets Y(a), Z(a), X(a) of V(a) and a subset a′ of a, if ${(Y (a), a^{'} ⫫ Z (a) | X (a))}_{G (a)}$ then for any A″ ⊇ A, ${(Y (a^{''}), a^{'} ⫫ Z (a^{''}) | X (a^{''}))}_{G (a^{''})}$.

Graphical Models With Hidden Variables

We also consider causal models where some variables are unmeasured (a.k.a. “latent” or “hidden” variables). Given a DAG $G (V \cup H)$ , define a latent projection mixed graph $G (V)$ as follows. V is the vertex set of $G (V)$ , and for any V_i, V_j ∈ V there is an edge V_i → V_j if there exists a directed path from V_i to V_j in $G (V \cup H)$ , with all intermediate nodes on the path in H; there is an edge V_i ↔ V_j if there exists a path from V_i to V_j of the form V_i ← ⋯ → V_j, where every intermediate node on the path is in H and no consecutive edges on the path are of the form → H_k ← for H_k ∈ H. The latent projection $G (V)$ obtained from a DAG $G (V \cup H)$ is always an ADMG. Our results in this paper apply to ADMGs, and indeed this is the intended setting for Pearl’s do-calculus (he used the terminology “semi-Markovian models”).
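The latent projection can be sketched as two reachability computations. The sketch assumes the input is a DAG given as a set of (parent, child) pairs; `hidden_reach` and `latent_projection` are hypothetical names. For bidirected edges it uses the observation that a collider-free path V_i ← ⋯ → V_j through hidden vertices exists exactly when some hidden vertex reaches both endpoints through hidden-only directed paths.

```python
def hidden_reach(edges, start, hidden):
    """Observed vertices reachable from `start` along directed paths whose
    intermediate vertices are all hidden."""
    found, stack, seen = set(), [start], {start}
    while stack:
        u = stack.pop()
        for (p, c) in edges:
            if p != u:
                continue
            if c in hidden:
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
            else:
                found.add(c)
    return found


def latent_projection(edges, hidden):
    """Return (directed, bidirected) edge sets of the latent projection ADMG."""
    observed = {v for e in edges for v in e} - set(hidden)
    directed = {(u, v) for u in observed for v in hidden_reach(edges, u, hidden)}
    bidirected = set()
    for h in hidden:
        kids = sorted(hidden_reach(edges, h, hidden))
        for i, u in enumerate(kids):
            for v in kids[i + 1:]:
                bidirected.add(frozenset((u, v)))
    return directed, bidirected
```

For example, with hidden vertices U and W in the DAG U → M, U → Y, A → M, M → Y, A → W → Y, the projection has directed edges A → M, M → Y, A → Y and the single bidirected edge M ↔ Y.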

The definition of d-separation naturally generalizes to ADMGs with minor modification for bidirected edges; the resulting criterion is called m-separation [9]. We write ${(Y ⫫_{m} Z | X)}_{G (V)}$ if Y is m-separated from Z given X in ADMG $G (V)$ . In the following we sometimes drop the d or m subscripts and just write ⫫, where the relevant criterion is implicit.

Given an ADMG $G (V)$ , we define a SWIG $G (V) (a)$ by the analogous node splitting construction as for DAGs. Specifically, each node is split into a random half and a fixed half, with random halves inheriting all incoming directed and bidirected edges, and fixed halves inheriting all outgoing directed edges. Alternatively given a SWIG $G (V) (a)$ derived from a DAG $G (V \cup H)$ , we define the latent projection operation in the natural way, yielding the SWIG $G (a) (V)$ with random vertices V, fixed vertices a, and directed edges from a_i ∈ a or V_i ∈ V to V_j ∈ V if there is a directed path from the corresponding vertices in $G (a)$ with all intermediate vertices in H, and bidirected edges from V_i ∈ V to V_j ∈ V if there exists a path from V_i to V_j of the form V_i ← … → V_j, where every intermediate node on the path is in H and no consecutive edges on the path are of the form → H_k ← for H_k ∈ H. These operations commute, and we can derive independence statements via m-separation on G(V)(a), as we prove in the Supplement.

4. Do-Calculus and Potential Outcomes Calculus

Pearl formulated the do-calculus originally as follows:

1 : p (y | z, w, do (x)) = p (y | w, do (x)) if {(Y ⫫ Z | W, X)}_{G_{\bar{X}}}

2 : p (y | z, w, do (x)) = p (y | w, do (z), do (x)) if {(Y ⫫ Z | W, X)}_{G_{\bar{X}, \underline{Z}}}

3 : p (y | w, do (z), do (x)) = p (y | w, do (x)) if {(Y ⫫ Z | W, X)}_{G_{\bar{X}, \overline{Z (W)}}}

where $G_{\bar{X}}$ denotes the graph obtained from $G$ by removing all edges with arrowheads into X, $G_{\underline{Z}}$ denotes the graph obtained from $G$ by removing all directed edges out of Z, and $Z (W) \equiv Z \setminus {An}_{G_{\bar{X}}} (W)$.

Here we present the do-calculus entirely in terms of potential outcomes (the “potential outcomes calculus” or “po-calculus” for short). The conditions are phrased in terms of conditional independencies implied by SWIGs, e.g., $G (x)$ for the SWIG where X is assigned value x. We restate the rules as follows:

1 : p (Y (x) | Z (x), W (x)) = p (Y (x) | W (x)) if {(Y (x) ⫫ Z (x) | W (x))}_{G (x)}

2 : p (Y (x, z) | W (x, z)) = p (Y (x) | W (x), Z (x) = z) if {(Y (x, z) ⫫ Z (x, z) | W (x, z))}_{G (x, z)}

3 : p (Y (x, z) | W (x, z)) = p (Y (x) | W (x)) if {(Y (x, z_{1}), W (x, z_{1}) ⫫ z_{1})}_{G (x, z_{1})} and {(Y (x, z_{1}) ⫫ Z_{2} (x, z_{1}) | W (x, z_{1}))}_{G (x, z_{1})}, where Z_{1} = Z \setminus {An}_{G (x)} (W), Z_{2} = Z \cap {An}_{G (x)} (W)

Recall that random variables in a SWIG $G (x)$ are labelled V_i(x) or equivalently as $V_{i} (x \cap {an}_{i}^{*})$ , where ${an}_{i}^{*}$ consists of values of the ancestors of V_i in the split graph. We can view Rule 1 as the fragment of the SWIG global Markov property that pertains to random variables in V(a). Rule 2 may be called “generalized conditional ignorability” because it is a general version of the standard ignorability assumption used in causal inference settings, where (Y(a) ⫫ A | C), or equivalently (Y(a) ⫫ A(a) | C(a)), enables identification of (e.g.) the average treatment effect by adjusting for C. Note that Rule 3 does not have a simple interpretation, as it involves an equality of interventional distributions in two distinct “worlds,” given an independence condition in a third. However, below we suggest an alternative, simpler rule which may be used without loss of generality, and is more intuitive. First, we state some basic results.
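As a worked instance of Rule 2, consider the DAG of Fig. 2 (a) with x empty, Z = A, and W = C. This is a sketch that assumes positivity, p(A = a | C = c) > 0 for all c, and uses the separations (Y(a) ⫫ A(a) | C(a)) and (C(a) ⫫ a), both of which can be read off the SWIG $G (a)$ in Fig. 2 (b):

```latex
\begin{aligned}
p(Y(a)) &= \sum_{c} p(Y(a) \mid C(a) = c)\, p(C(a) = c) \\
        &= \sum_{c} p(Y \mid A = a, C = c)\, p(C(a) = c)
           && \text{Rule 2, since } (Y(a) ⫫ A(a) \mid C(a))_{G(a)} \\
        &= \sum_{c} p(Y \mid A = a, C = c)\, p(C = c)
           && \text{causal irrelevance, since } (C(a) ⫫ a)_{G(a)}
\end{aligned}
```

The last line is the familiar adjustment formula, recovered here without leaving the potential outcomes notation.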

Proposition 5 Rule 1 of po-calculus holds if and only if Rule 1 of do-calculus holds.

Proof: Follows from the definition of $G (x)$ and $G_{\bar{X}}$ , and the definition of m-separation. □

Proposition 6 Rule 2 of po-calculus holds if and only if Rule 2 of do-calculus holds.

Proof: Follows from the definition of $G (x, z)$ and $G_{\bar{X}, \underline{Z}}$ , and the definition of m-separation in $G (x, z)$ . □
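The equivalence can be checked numerically on a small example. The sketch below assumes the edge set `FIG2A` for Fig. 2 (a), implements d-separation through the moralized ancestral graph (a standard technique, not the paper's own procedure), and treats the fixed vertex of the SWIG as always blocked by adding it to the conditioning set. All function names are hypothetical.

```python
FIG2A = {("C", "A"), ("C", "M"), ("C", "Y"), ("A", "M"), ("A", "Y"), ("M", "Y")}


def remove_outgoing(edges, zs):
    """Pearl's underline operation: delete every directed edge out of zs."""
    return {(u, v) for (u, v) in edges if u not in zs}


def swig_edges(edges, treated):
    """Node splitting: outgoing edges of treated vertices move to a lowercase fixed half."""
    return {(u.lower() if u in treated else u, v) for (u, v) in edges}


def ancestors(edges, nodes):
    """All ancestors of `nodes`, including the nodes themselves."""
    result, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for (u, w) in edges:
            if w == v and u not in result:
                result.add(u)
                stack.append(u)
    return result


def d_separated(edges, ys, zs, given):
    """d-separation of ys and zs given `given` in a DAG, via the moral ancestral graph."""
    keep = ancestors(edges, set(ys) | set(zs) | set(given))
    sub = {(u, v) for (u, v) in edges if u in keep and v in keep}
    links = {frozenset(e) for e in sub}
    for child in keep:  # marry the parents of every vertex
        pas = sorted(u for (u, v) in sub if v == child)
        for i, p in enumerate(pas):
            for q in pas[i + 1:]:
                links.add(frozenset((p, q)))
    reach = set(ys) - set(given)
    stack = list(reach)
    while stack:  # undirected search that never enters a conditioned vertex
        v = stack.pop()
        for link in links:
            if v in link:
                (w,) = link - {v}
                if w not in given and w not in reach:
                    reach.add(w)
                    stack.append(w)
    return not (reach & set(zs))


do_graph = remove_outgoing(FIG2A, {"A"})  # G with edges out of A removed
po_graph = swig_edges(FIG2A, {"A"})       # SWIG G(a)
print(d_separated(do_graph, {"Y"}, {"A"}, {"C"}))       # do-calculus Rule 2 condition
print(d_separated(po_graph, {"Y"}, {"A"}, {"C", "a"}))  # po-calculus Rule 2 condition
```

In this example both conditions hold when conditioning on C and both fail when C is dropped, as Proposition 6 leads one to expect.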

Proposition 7 Rule 3 of po-calculus holds if and only if Rule 3 of do-calculus holds.

Proof: Since path separation criteria on graphs quantify over elements in vertex sets, and since Z is a disjoint union of Z₁ (Z(W) in Pearl’s terminology) and Z₂, the precondition in Rule 3 of do-calculus may be written as two preconditions: ${(Y ⫫ Z_{1} | W, X)}_{G_{\bar{X}, \bar{Z_{1}}}}$ and ${(Y ⫫ Z_{2} | W, X)}_{G_{\bar{X}, \bar{Z_{1}}}}$ .

By definition of Z₁, it contains only non-ancestors of W in $G_{\bar{X}}$ (and therefore also in $G_{\bar{X}, \bar{Z_{1}}}$, which is an edge sub-graph of $G_{\bar{X}}$). Since Z₁ only has adjacent outgoing directed arrows in $G_{\bar{X}, \bar{Z_{1}}}$, all elements of W are marginally m-separated from Z₁ in $G_{\bar{X}, \bar{Z_{1}}}$. Thus, ${(W (x, z_{1}) ⫫ z_{1})}_{G (x, z_{1})}$ by the definition of $G (x, z_{1})$. Furthermore, no element of Z₁ can be an ancestor of Y in $G_{\bar{X}, \bar{Z_{1}}}$. To see this, suppose an element Z_i of Z₁ were an ancestor of Y. Then since ${(Y ⫫ Z_{1} | W, X)}_{G_{\bar{X}, \bar{Z_{1}}}}$, the directed path from Z_i must be blocked by W and X. W cannot be on this directed path because it is a non-descendant of Z₁, and X cannot be on the path because $G_{\bar{X}, \bar{Z_{1}}}$ has no directed edges into X. So we conclude that Z_i is not an ancestor of Y in $G_{\bar{X}, \bar{Z_{1}}}$ and therefore ${(Y (x, z_{1}) ⫫ z_{1})}_{G (x, z_{1})}$ by the definition of $G (x, z_{1})$. Thus, if the precondition of do-calculus Rule 3 holds, so does the precondition of po-calculus Rule 3.

We now prove the converse. If ${(Y (x, z_{1}) ⫫ z_{1})}_{G (x, z_{1})}$ then Z₁ is not an ancestor of Y in $G_{\bar{X}, \bar{Z_{1}}}$. Similarly, if ${(W (x, z_{1}) ⫫ z_{1})}_{G (x, z_{1})}$ then Z₁ is not an ancestor of W in $G_{\bar{X}, \bar{Z_{1}}}$. Since Z₁ only has adjacent edges that are outgoing directed edges, this implies ${(Y, W ⫫ Z_{1} | X)}_{G_{\bar{X}, \bar{Z_{1}}}}$ holds. Since semi-graphoid axioms hold for m-separation, this implies ${(Y ⫫ Z_{1} | W, X)}_{G_{\bar{X}, \bar{Z_{1}}}}$ holds. Finally, ${(Y (x, z_{1}) ⫫ Z_{2} (x, z_{1}) | W (x, z_{1}))}_{G (x, z_{1})}$ holds if and only if ${(Y ⫫ Z_{2} | W, X)}_{G_{\bar{X}, \bar{Z_{1}}}}$ holds, by the definitions of $G (x, z_{1})$, $G_{\bar{X}, \bar{Z_{1}}}$, and m-separation. □

We now briefly demonstrate the soundness of the three rules of the po-calculus using only potential outcomes machinery and our background assumptions.

Proposition 8 Rules 1, 2, and 3 are sound.

Proof: Proposition 3 licenses deriving conditional independence statements corresponding to the graphical conditions in each rule. Then we have the following derivations:

Rule 1 : p (Y (x) | Z (x), W (x)) = p (Y (x) | W (x)) by Y (x) ⫫ Z (x) | W (x).

Rule 2 : p (Y (x, z) | W (x, z)) = p (Y (x, z) | Z (x, z) = z, W (x, z)) = p (Y (x) | Z (x) = z, W (x)) by Y (x, z) ⫫ Z (x, z) | W (x, z) and consistency.

Rule 3 :

\begin{aligned}
p (Y (x) | W (x)) &= p (Y (x, z_{1}) | W (x, z_{1})) && \text{since } Y (x, z_{1}), W (x, z_{1}) ⫫ z_{1} \\
&= p (Y (x, z_{1}) | Z_{2} (x, z_{1}) = z_{2}, W (x, z_{1})) && \text{since } Y (x, z_{1}) ⫫ Z_{2} (x, z_{1}) | W (x, z_{1}) \\
&= p (Y (x, z_{1}, z_{2}) | Z_{2} (x, z_{1}, z_{2}) = z_{2}, W (x, z_{1}, z_{2})) && \text{by consistency} \\
&= p (Y (x, z) | Z_{2} (x, z) = z_{2}, W (x, z)) && \text{since } z = (z_{1}, z_{2}) \\
&= p (Y (x, z) | W (x, z)) && \text{by Proposition 4, since } Y (x, z_{1}) ⫫ Z_{2} (x, z_{1}) | W (x, z_{1}) \text{ and } Z_{2} \subseteq Z
\end{aligned}

□

The proof of Proposition 8 has a number of interesting consequences. First, the soundness of Rule 2 follows by Rule 1 and consistency. Second, the soundness of Rule 3 follows by applications of Rule 1, Rule 2, consistency, causal irrelevance, and intervention monotonicity.

Causal irrelevance, as used in the proof, is implied by m-separation statements in the SWIG $G (x, z_{1})$ ; however this property, like consistency, follows by (1) alone and does not require any assumption regarding the distributions p(V(a)) for any A ⊆ V; specifically, (5) is not required. As a result the three rules of po-calculus, taken as a whole, are consequences of consistency and causal irrelevance, which hold in any recursive causal model, together with the SWIG Markov property for random variables in V(a). (Intervention monotonicity follows from these.)

The proof of Proposition 8 also implies that a simpler reformulation of po-calculus suffices without loss of generality. Specifically, this reformulation replaces Rule 3 by the following simpler rule (encoding causal irrelevance in graphical form):

3^{*} : p (Y (x, z)) = p (Y (x)) if {(Y (x, z) ⫫ z)}_{G (x, z)} .

A benefit of translating the do-calculus exactly into our potential outcomes formulation is that the do-calculus rules as stated have been shown to be sufficient for a wide class of possible derivations on distributions expressible in terms of the do operator [1, 18]. However, since we phrased the rules for arbitrary potential outcomes, they may be applied to causal contrasts not expressible in standard do notation. We illustrate this by applying these rules to mediation analysis.

5. Path-Specific Effects and Extended Graphs

The identification theory for path-specific effects generally proceeds by considering nested, path-specific potential outcomes. Fix a set of treatment variables A, and a subset of proper causal paths π from any element in A. A proper causal path only intersects A at the source node. Next, pick a pair of value sets a and a′ for elements in A. For any V_i ∈ V, define the potential outcome V_i(π, a, a′) by setting A to a for the purposes of paths in π, and to a′ for the purposes of proper causal paths from A to Y not in π. Formally, the definition is as follows, for any V_i ∈ V:

\begin{array}{l}
V_{i} (π, a, a^{'}) \equiv a \quad \text{if } V_{i} \in A, \\
V_{i} (π, a, a^{'}) \equiv V_{i} (\{V_{j} (π, a, a^{'}) \mid V_{j} \in {Pa}_{i}^{π}\}, \{V_{j} (a^{'}) \mid V_{j} \in {Pa}_{i}^{\bar{π}}\}) \quad \text{otherwise,}
\end{array}

(6)

where V_j(a′) ≡ a′ if V_j ∈ A, and given by (1) otherwise, ${Pa}_{i}^{π}$ is the set of parents of V_i along an edge which is a part of a path in π, and ${Pa}_{i}^{\bar{π}}$ is the set of all other parents of V_i.

A counterfactual V_i(π, a, a′) is said to be edge inconsistent if counterfactuals of the form V_j(a_k, …) and $V_{j} (a_{k}^{'}, \dots)$ occur in V_i(π, a, a′), otherwise it is said to be edge consistent. It is well known that a joint distribution p(V(π, a, a′)) containing an edge-inconsistent counterfactual V_i(π, a, a′) is not identified in the functional causal model (nor weaker causal models) with a corresponding graphical criterion on π and $G (V)$ called the ‘recanting witness’ [16, 20]. For example, in Fig. 2 (a), given π = {C → A → Y}, Y(π, c, c′) ≡ Y(c′, M(c′, A(c′)), A(c)), while given π = {A → Y }, Y(π, a, a′) ≡ Y(C, a, M(a′, C)). Note that Y(π, c, c′) is edge inconsistent due to the presence of A(c) and A(c′), while Y(π, a, a′) is edge consistent.
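The two examples above can be reproduced mechanically. The sketch below expands V_i(π, a, a′) symbolically following (6), under the simplifying assumptions that π is represented by the set of edges lying on its paths and that the target V_i is not itself a treatment. It records which value label each edge out of a treatment receives and reports edge consistency; `PARENTS` encodes the assumed structure of Fig. 2 (a), and all names are hypothetical.

```python
PARENTS = {"C": [], "A": ["C"], "M": ["C", "A"], "Y": ["C", "A", "M"]}


def _expand(i, pi_edges, treated, use_pi, seen):
    """Expression for V_i(pi, a, a') if use_pi is True, else for V_i(a')."""
    if not PARENTS[i]:
        return i  # an untreated root is just the observed variable
    args = []
    for j in PARENTS[i]:
        edge_on_pi = use_pi and (j, i) in pi_edges
        if j in treated:
            # treatment parent: value a along pi edges, a' along all other edges
            label = j.lower() if edge_on_pi else j.lower() + "'"
            seen.setdefault((j, i), set()).add(label)
            args.append(label)
        else:
            args.append(_expand(j, pi_edges, treated, edge_on_pi, seen))
    return f"{i}({', '.join(args)})"


def nested_counterfactual(i, pi_edges, treated):
    """Return the expanded expression and whether it is edge consistent."""
    seen = {}
    expr = _expand(i, pi_edges, treated, True, seen)
    consistent = all(len(labels) == 1 for labels in seen.values())
    return expr, consistent


print(nested_counterfactual("Y", {("A", "Y")}, {"A"}))
# ("Y(C, a, M(C, a'))", True)
print(nested_counterfactual("Y", {("C", "A"), ("A", "Y")}, {"C"}))
# ("Y(c', A(c), M(c', A(c')))", False)
```

Up to the order of arguments, the outputs match the expressions in the text: the second counterfactual is edge inconsistent because the edge C → A receives both c and c′.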

Counterfactuals defined by (6) form the basis for direct, indirect, and path-specific effects estimated in the mediation analysis literature. There are generalizations where elements in A are set to arbitrary values for different paths, under the name of path interventions [20]. Similarly, edge consistent counterfactuals V(π, a, a′) generalize to responses to edge interventions [20]. We do not discuss this further here in the interests of space, although the results presented below generalize without issue. Note that edge consistent counterfactuals cannot, in general, be phrased in terms of the do operator.

We have the following result, proven in [20].

Theorem 1 If V(π, a, a′) is edge consistent, then under the functional causal model for DAG $G$ ,

p (V (π, a, a^{'})) = \prod_{i = 1}^{k} p (V_{i} | a \cap {pa}_{i}^{π}, a^{'} \cap {pa}_{i}^{\bar{π}}, {Pa}_{i}^{G} \setminus A) .

(7)

As an example, the distribution p(Y(π, a, a′)) = p(Y(C, a, M(a′, C))) of the edge consistent counterfactual in Fig. 2 (a) is identified as a marginal distribution derived from (7), specifically $\sum_{C, M} p (Y | a, M, C) p (M | a^{'}, C) p (C)$. The po-calculus as presented above may be applied to any sort of potential outcome, including nested potential outcomes representing path-specific effects. In the following, we exploit an equivalence between path-specific potential outcomes and standard potential outcomes defined from an extended graph $G^{e}$, which is constructed from $G$ following [14]. This both simplifies complex nested potential outcome expressions and enables us to leverage a series of prior results to identify conditional PSEs.
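The identifying functional for p(Y(C, a, M(a′, C))) can be checked by brute force on a toy functional causal model. Everything numeric below is an assumption made for illustration: the binary structural equations, the disturbance probabilities in `P_EPS`, and the function names. The sketch enumerates the independent disturbances to obtain both the observed joint and the nested counterfactual distribution, then compares the latter with the functional computed from observed conditionals only.

```python
from itertools import product

# Hypothetical P(eps_v = 1); the disturbances are assumed mutually independent.
P_EPS = {"C": 0.5, "A": 0.3, "M": 0.2, "Y": 0.1}
NAMES = ("c", "a", "m", "y")


def f_A(c, e): return c ^ e
def f_M(c, a, e): return (c & a) ^ e
def f_Y(c, a, m, e): return (a | m) ^ (c & m) ^ e


def weight(eps):
    """Probability of one joint disturbance value."""
    w = 1.0
    for v, e in eps.items():
        w *= P_EPS[v] if e == 1 else 1.0 - P_EPS[v]
    return w


def all_eps():
    for bits in product([0, 1], repeat=4):
        yield dict(zip("CAMY", bits))


def observed_joint():
    """Observed p(C, A, M, Y), obtained by pushing disturbances through the equations."""
    joint = {}
    for eps in all_eps():
        c = eps["C"]
        a = f_A(c, eps["A"])
        m = f_M(c, a, eps["M"])
        y = f_Y(c, a, m, eps["Y"])
        joint[(c, a, m, y)] = joint.get((c, a, m, y), 0.0) + weight(eps)
    return joint


def prob(joint, **fixed):
    """Marginal probability of the coordinates named in `fixed`."""
    return sum(p for cell, p in joint.items()
               if all(cell[NAMES.index(k)] == v for k, v in fixed.items()))


def mediation_formula(joint, a, a_prime, y=1):
    """sum_{C,M} p(Y = y | a, M, C) p(M | a', C) p(C), from observed data only."""
    total = 0.0
    for c, m in product([0, 1], repeat=2):
        p_y = prob(joint, c=c, a=a, m=m, y=y) / prob(joint, c=c, a=a, m=m)
        p_m = prob(joint, c=c, a=a_prime, m=m) / prob(joint, c=c, a=a_prime)
        total += p_y * p_m * prob(joint, c=c)
    return total


def nested_probability(a, a_prime, y=1):
    """p(Y(C, a, M(a', C)) = y), computed directly from the structural equations."""
    total = 0.0
    for eps in all_eps():
        c = eps["C"]
        m = f_M(c, a_prime, eps["M"])
        if f_Y(c, a, m, eps["Y"]) == y:
            total += weight(eps)
    return total
```

Under these assumptions the two quantities agree for every pair (a, a′) up to floating point error, which is what identification via (7) asserts for this edge consistent counterfactual.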

Given an ADMG $G (V)$, define for each A_i ∈ A ⊆ V the set of variables $A_{i}^{Ch} \equiv \{A_{i}^{j} \mid V_{j} \in {Ch}_{i}\}$, where Ch_i denotes the set of children of A_i in $G (V)$, and let $A^{Ch} \equiv \cup_{A_{i} \in A} A_{i}^{Ch}$. We define the extended graph of $G (V)$, written $G^{e} (V \cup A^{Ch})$, as the graph with the vertex set V ∪ A^Ch, with edges of the form $A_{i} \to A_{i}^{j} \to V_{j}$ if and only if A_i → V_j is present in $G (V)$, for A_i ∈ A, V_j ∈ V; furthermore, V_i ↔ V_j in $G^{e} (V \cup A^{Ch})$ if and only if V_i ↔ V_j is present for V_i, V_j ∈ V in $G (V)$. As an example, the extended graph for the DAG in Fig. 2 (a), with A = V, is shown in Fig. 2 (c). For conciseness, we will generally drop explicit references to vertices V ∪ A^Ch, and denote the extended graph of $G (V)$ by $G^{e}$. Extended graphs as we define them here are straightforward generalizations of those presented in [14], which considers only “node copies” of a single “treatment” variable, whereas here extended graphs have “copies” corresponding to every parent-child relationship of a set of treatments A.
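A minimal sketch of this construction follows. It assumes that directed edges whose source is not in A are carried over unchanged, and it names the copy $A_{i}^{j}$ with the string `"A^V"`; both the naming scheme and the function name are illustrative choices.

```python
def extended_graph(directed, bidirected, treatments):
    """Replace every edge A_i -> V_j (A_i a treatment) by A_i -> A_i^j -> V_j.
    Other directed edges and all bidirected edges are kept as they are."""
    ext = set()
    for (u, v) in directed:
        if u in treatments:
            copy = f"{u}^{v}"  # hypothetical name for the node copy A_i^j
            ext.add((u, copy))
            ext.add((copy, v))
        else:
            ext.add((u, v))
    return ext, set(bidirected)


fig2a = {("C", "A"), ("C", "M"), ("C", "Y"), ("A", "M"), ("A", "Y"), ("M", "Y")}
ext_directed, ext_bidirected = extended_graph(fig2a, set(), {"A"})
```

With the single treatment A, the edges A → M and A → Y are replaced by A → A^M → M and A → A^Y → Y, and every other edge is unchanged.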

The edges $A_{i} \to A_{i}^{j}$ in $G^{e}$ are understood to represent deterministic relationships. More precisely, we associate a causal model with $G^{e}$ as follows. For $G$ we had associated a set of potential outcomes $V$ , and for $G^{e}$ we have $V^{e}$ . For every $V_{i} ({pa}_{i}) \in V$ , we let V_i(pa_i) be in $V^{e}$ . Note that this is well-defined, since V_i in $G$ and $G^{e}$ share the number of parents, and the parent sets for every V_i share state spaces. In addition, for every $A_{i}^{j} \in A^{Ch}$ , we let $A_{i}^{j} (a_{i})$ for $a_{i} \in X_{A_{i}}$ be in $V^{e}$ . By assumption, every $A_{i}^{j} \in A^{Ch}$ has a single parent A_i, and we further require that $p (A_{i}^{j} (a_{i}))$ is a deterministic density, with $p (A_{i}^{j} (a_{i}) = a_{i}) = 1$ . To fix intuitions, consider the example of Pearl’s discussed in [14]. They consider an analysis where A_i corresponds to smoking status, and affects hypertensive status V_j as well as myocardial infarction status V_k through nicotine $A_{i}^{j}$ and non-nicotine $A_{i}^{k}$ components respectively. The relationships $A_{i} \to A_{i}^{j}$ and $A_{i} \to A_{i}^{k}$ are deterministic relationships between smoking and exposure to nicotine/non-nicotine components. [14] go on to consider potential outcomes of the form $V_{k} (a_{i}^{j}, a_{i}^{k})$ (where the “node copies” $A_{i}^{j}$ and $A_{i}^{k}$ are assigned to perhaps different values) inspired by a hypothetical intervention on the nicotine components of cigarette exposure that fixes non-nicotine components at some reference value (e.g., a new nicotine-free cigarette). In this case, the path-specific effect of smoking on outcome via nicotine components is easy to write down and identify, at the price of introducing new variables and deterministic relationships into the model.

We now show the following two results. First, we show that an edge-consistent V(π, a, a′) may be represented without loss of generality by a counterfactual response to an intervention on a subset of A^Ch in $G^{e}$ with the causal model defined above. Second, we show that this response is identified by the same functional (7).

Given an edge-consistent V(π, a, a′), define $G^{e}$ via A ⊆ V. We define a^π that assigns a_i to $A_{i}^{j} \in A^{Ch}$ if A_i → V_j in $G (V)$ is in π, and assigns $a_{i}^{'}$ to $A_{i}^{j} \in A^{Ch}$ if A_i → V_j in $G (V)$ is not in π. The resulting set of counterfactuals V(a^π) is well defined in the model for $V^{e}$ , and we have the following result, proved in the Supplement.
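The assignment a^π can be sketched in a few lines of Python. Here `pi_edges` is assumed to be the set of edges A_i → V_j that lie on some path in π; that encoding, the function name, and the toy values are hypothetical and chosen only for illustration.

```python
def path_specific_assignment(directed, treatments, pi_edges, a, a_prime):
    """Sketch: build a^pi as a map from each treatment edge (A_i, V_j),
    standing for the copy A_i^j, to the value assigned to that copy.

    pi_edges: edges (A_i, V_j) assumed to lie on a path in pi
    a, a_prime: dicts mapping each treatment A_i to its active / baseline value
    """
    assignment = {}
    for parent, child in sorted(directed):
        if parent in treatments:
            # Copy A_i^j receives a_i if A_i -> V_j is in pi, and a_i' otherwise.
            in_pi = (parent, child) in pi_edges
            assignment[(parent, child)] = a[parent] if in_pi else a_prime[parent]
    return assignment


# Toy example: pi contains only the direct edge A -> Y.
directed = {("C", "A"), ("C", "M"), ("C", "Y"), ("A", "M"), ("A", "Y"), ("M", "Y")}
a_pi = path_specific_assignment(directed, {"A"}, {("A", "Y")}, {"A": 1}, {"A": 0})
```

In this toy case the copy feeding Y is set to the active value and the copy feeding M to the baseline value, which corresponds to a nested outcome of the form Y(a, M(a′)).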

Proposition 9 Fix an element $p (V)$ in the causal model for a DAG $G (V)$ , and consider the corresponding element $p^{e} (V^{e})$ in the restricted causal model associated with a DAG $G^{e} (V \cup A^{Ch})$ . Then p(V) = p^e(V ⋃ A^Ch) and p(V(π, a, a′)) = p^e(V(a^π)).

Corollary 1 Given an extended DAG $G^{e}$ ,

$p (V (a^{π})) = \prod_{i = 1}^{K} p^{e} (V_{i} \mid a^{π} \cap {pa}_{i}, {Pa}_{i}^{G^{e}} \setminus A)$ .

Proof: This follows from Proposition 9, and the fact that the functional in (7) in p(V) is equal to $\prod_{i = 1}^{K} p^{e} (V_{i} \mid a^{π} \cap {pa}_{i}, {Pa}_{i}^{G^{e}} \setminus A)$ in p^e(V ∪ A^Ch). □

In the causal models derived from DAGs with unobserved variables (e.g., $G (V \cup H)$ ), identification of distributions on potential outcomes such as p(V(a)) or p(V(π, a, a′)) may be stated without loss of generality on the latent projection ADMG $G (V)$ . A complete algorithm for identification of path-specific effects in hidden variable models was given in [16] and presented in a more concise form in [19]. We describe this form in detail in the Supplement. We also note (and prove in the Supplement) that the latent projection and the extended graph operations commute.

We now show that identification theory for p(V(π, a, a′)) in latent projection ADMGs $G (V)$ may be restated, without loss of generality, in terms of identification of p(V(a^π)) in $G^{e} (V \cup A^{Ch})$ .

Proposition 10 For any Y ⊆ V, p(Y(π, a, a′)) is identified in the ADMG $G (V)$ if and only if p(Y(a^π)) is identified in the ADMG $G^{e} (V \cup A^{Ch})$ . Moreover, if p(Y(a^π)) is identified, it is by the same functional as p(Y(π, a, a′)).

Note that this Proposition is a generalization of Corollary 1 from DAGs to latent projection ADMGs. The proof of this claim, and all claims in the next section, are given in the Supplement.

6. Identification of Conditional PSEs

Having established that we can identify path-specific effects by working with potential outcomes derived from the $G^{e}$ model, we turn to the identification of conditional path-specific effects using the po-calculus. In [17], the authors present the conditional identification (IDC) algorithm for identifying quantities of the form p(Y(x)|W(x)) (in our notation), given an ADMG. Since conditional path-specific effects correspond to exactly such quantities defined on the extended model $G^{e}$ , we can leverage their scheme for our purposes. The idea is to reduce the conditional problem, identification of p(Y(a^π)|W(a^π)), to an unconditional (joint) identification problem for which a complete identification algorithm already exists.

The algorithm has three steps: first, exhaustively apply Rule 2 of the po-calculus to reduce the conditioning set as much as possible; second, identify the relevant joint distribution using Proposition 10 and the complete algorithm in [19]; third, divide that joint by the marginal distribution of the remaining conditioning set to yield the conditional path-specific potential outcome distribution. The procedure is presented formally as Algorithm 1, with the subroutine corresponding to Proposition 10 named PS-ID.

Note that we make use of SWIGs defined from extended graphs, e.g., $G^{e} (a^{π}, z)$ . Beginning with $G^{e}$ , the SWIG $G^{e} (a^{π}, z)$ is constructed by the usual node-splitting operation: split nodes Z and $A_{i}^{j}$ into random and fixed halves, where $A_{i}^{j}$ has fixed copy $a_{i}$ if A_i → V_j in $G (V)$ is in π, and $a_{i}^{'}$ if A_i → V_j in $G (V)$ is not in π. Relabeling of random nodes proceeds as previously described.

The following two results are adapted from [17]; they are simply translated into potential outcomes and applied to extended graphs $G^{e}$ .

Proposition 11 If ${(Y (x, z) ⫫ Z (x, z) | W (x, z))}_{G^{e} (x, z)}$ and T ⊆ W, then ${(Y (x, t) ⫫ T (x, t) | Z (x, t), W_{1} (x, t))}_{G^{e} (x, t)}$ if and only if ${(Y (x, z, t) ⫫ T (x, z, t) | W_{1} (x, z, t))}_{G^{e} (x, z, t)}$ , where W₁ = W \ T.

Corollary 2 For any $G^{e} (x)$ and any conditional distribution p(Y(x)|W(x)), there exists a unique maximal set Z(x) = {Z_i(x) ∈ W(x) | p(Y(x)|W(x)) = p(Y(x, z_i)|W(x, z_i) \ {Z_i(x, z_i)})} such that Rule 2 applies for Z(x, z) in $G^{e} (x, z)$ for p(Y(x, z)|W(x, z)).

Algorithm 1 PS-IDC(Y, a^π, W, $G^{e}$ )

Input: outcome Y, path-specific setting a^π, conditioning set W, and graph $G^{e}$

Output: p(Y(a^π)|W(a^π))

1: if ∃Z ∈ W s.t. ${(Y (a^{π}, z) ⫫ Z (a^{π}, z) | W (a^{π}, z))}_{G^{e} (a^{π}, z)}$ , return PS-IDC $(Y, a^{π} \cup z, W \setminus Z, G^{e})$

2: else let $p^{'} (Y (a^{π}), W (a^{π})) \leftarrow$ PS-ID $(Y \cup W, a^{π}, G^{e})$ ; return $p^{'} (Y (a^{π}), W (a^{π})) / \sum_{y} p^{'} (Y (a^{π}), W (a^{π}))$
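The control flow of Algorithm 1 can be sketched as follows. This is only a skeleton under stated assumptions: the Rule 2 independence check and the PS-ID subroutine are passed in as caller-supplied functions (`rule2_applies` and `ps_id`, both hypothetical names), the conditioning set is given together with its observed values, and joint distributions are represented as plain dictionaries. The toy oracles and numbers at the end are made up and do not come from any graph in the paper.

```python
def ps_idc(Y, setting, W, rule2_applies, ps_id):
    """Skeleton of PS-IDC.

    Y:       set of outcome names
    setting: dict of intervened-on names -> values (a^pi plus any z moved by Rule 2)
    W:       dict of conditioning names -> observed values
    rule2_applies(Y, setting, Z, W) -> bool   (hypothetical oracle for line 1)
    ps_id(targets, setting) -> {frozenset of (name, value) pairs: probability}
                                              (hypothetical oracle for PS-ID)
    """
    # Line 1: move a conditioning variable into the intervention set if Rule 2 allows.
    for Z in sorted(W):
        if rule2_applies(Y, setting, Z, W):
            new_setting = {**setting, Z: W[Z]}
            new_W = {name: val for name, val in W.items() if name != Z}
            return ps_idc(Y, new_setting, new_W, rule2_applies, ps_id)
    # Line 2: identify the joint over Y and the remaining W, then normalize.
    joint = ps_id(frozenset(Y) | frozenset(W), setting)
    rows = {k: p for k, p in joint.items()
            if all(dict(k)[name] == val for name, val in W.items())}
    total = sum(rows.values())  # marginal of the remaining conditioning event
    return {frozenset((n, v) for n, v in k if n in Y): p / total
            for k, p in rows.items()}


# Toy oracles (assumptions for illustration only).
def toy_rule2(Y, setting, Z, W):
    return Z == "Z1"            # pretend Rule 2 applies to Z1 and to nothing else

def toy_ps_id(targets, setting):
    assert targets == frozenset({"Y", "C"}) and setting.get("Z1") == 1
    table = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.4}  # keyed by (y, c)
    return {frozenset({("Y", y), ("C", c)}): p for (y, c), p in table.items()}

result = ps_idc({"Y"}, {"A^Y": 1, "A^M": 0}, {"Z1": 1, "C": 0}, toy_rule2, toy_ps_id)
```

Representing W with its values makes explicit that a variable moved by Rule 2 is fixed at the value it took in the conditioning event. A real implementation would also need to handle a failed PS-ID call and a zero normalizing constant, which this sketch ignores.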

The following is similar to Theorem 6 in [17], but extended to path-specific queries in extended graphs. The proof is in the Supplement.

Theorem 2 Let p(Y(π, a, a′) | W(π, a, a′)) be a conditional path-specific distribution in the causal model for $G$ , and let p(Y(a^π) | W(a^π)) be the corresponding distribution in the extended causal model for $G^{e} (V \cup A^{Ch})$ . Let Z be the maximal subset of W such that p(Y(a^π) | W(a^π)) = p(Y(a^π, z) | W(a^π, z) \ Z(a^π, z)). Then p(Y(a^π) | W(a^π)) is identifiable in $G^{e}$ if and only if p(Y(a^π, z), W(a^π, z) \ Z(a^π, z)) is identifiable in $G^{e}$ .

We then have by Corollary 2, Theorem 2, and completeness of the identification algorithm for path-specific effects [19]:

Theorem 3 Algorithm 1 is complete.

As an example, p(Y(a, M(a′))) is identified from p(C, A, M, Y) in the causal model in Fig. 1 (a), via

$\sum_{M} \frac{\sum_{C} p (Y, M \mid a, C) p (C) \sum_{C} p (M \mid a^{'}, C) p (C)}{\sum_{C} p (M \mid a, C) p (C)}$

However, p(Y(a, M(a′))|C) is not identified, since PS-IDC concludes p(Y(a, M(a′)), C) must first be identified, and this joint distribution is not identified via results in [16]. On the other hand, p(Y(a, M(a′))|C) is identified from p(C, A, M, Y) in a seemingly similar graph in Fig. 1 (b), via ∑_M p(Y | M, a, C)p(M | a′, C).
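To make the last functional concrete, the sketch below evaluates ∑_M p(Y | M, a, C) p(M | a′, C) from a joint table over binary C, A, M, Y. The joint is generated from made-up conditional probabilities, and the helper names (`prob`, `cond`, `functional`) are hypothetical. The snippet only shows how the expression is computed from observed data; it does not check whether the graphical conditions for identification hold.

```python
from itertools import product

NAMES = ("C", "A", "M", "Y")

def bern(p1, value):
    """Probability that a binary variable with P(1) = p1 takes the given value."""
    return p1 if value == 1 else 1.0 - p1

# Made-up joint p(C, A, M, Y), keyed by (c, a, m, y).
joint = {
    (c, a, m, y): 0.5
    * bern(0.3 + 0.4 * c, a)
    * bern(0.2 + 0.5 * a + 0.1 * c, m)
    * bern(0.1 + 0.3 * m + 0.2 * a + 0.1 * c, y)
    for c, a, m, y in product((0, 1), repeat=4)
}

def prob(table, **event):
    """Marginal probability of an event such as prob(joint, A=1, C=0)."""
    return sum(p for row, p in table.items()
               if all(row[NAMES.index(k)] == v for k, v in event.items()))

def cond(table, target, given):
    """Conditional probability p(target | given); both are name -> value dicts."""
    return prob(table, **target, **given) / prob(table, **given)

def functional(table, y, a, a_prime, c):
    """Evaluate sum_M p(Y=y | M, A=a, C=c) * p(M | A=a_prime, C=c)."""
    return sum(
        cond(table, {"Y": y}, {"M": m, "A": a, "C": c})
        * cond(table, {"M": m}, {"A": a_prime, "C": c})
        for m in (0, 1)
    )

value = functional(joint, y=1, a=1, a_prime=0, c=0)
```

The outcome model is evaluated at the active value a while the mediator distribution is evaluated at the baseline value a′, both within the stratum C = c.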

7. Conclusion

In this paper we introduced the potential outcomes calculus, a generalization of do-calculus that applies to arbitrary potential outcomes. We have shown that the potential outcomes calculus is equivalent to Pearl’s do-calculus for standard interventional quantities, and is a logical consequence of the properties of consistency and causal irrelevance, as well as the global Markov property associated with SWIGs. Finally, we used the potential outcomes calculus to give a sound and complete algorithm for conditional distributions defined on potential outcomes associated with path-specific effects. This algorithm may be viewed as a path-specific generalization of the identification algorithm for conditional interventional distributions in [17].

Supplementary Material

Appendix

NIHMS1063775-supplement-Appendix.pdf (275.2 KB, PDF)

8. Acknowledgments

The authors would like to thank the American Institute of Mathematics for supporting this research via the SQuaRE program. This project is sponsored in part by the National Institutes of Health grant R01 AI127271-01 A1, and the Office of Naval Research grants N00014-18-1-2760 and N00014-15-1-2672. The authors would like to thank James M. Robins for helpful discussions.

Footnotes

The set of $p (V)$ for a particular set of Pa_i and an ordering ≺ was called the finest causally interpretable structured tree graph (FCISTG) in [12].

Some readers may be more familiar with the simpler formulation where a = ∅, so “B = b implies V_i(b) = V_i.” Our reasons for allowing multiple intervention sets will become clear in what follows.

Contributor Information

Daniel Malinsky, Johns Hopkins University, Department of Computer Science, Baltimore, MD USA.

Ilya Shpitser, Johns Hopkins University, Department of Computer Science, Baltimore, MD USA.

Thomas Richardson, University of Washington, Department of Statistics, Seattle, WA USA.

References

[1] Huang Yimin and Valtorta Marco. Pearl’s calculus of interventions is complete. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI-06), pages 217–224. AUAI Press, 2006.
[2] Nabi Razieh, Malinsky Daniel, and Shpitser Ilya. Learning optimal fair policies. arXiv preprint arXiv:1809.02244, 2018.
[3] Nabi Razieh and Shpitser Ilya. Fair inference on outcomes. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pages 1931–1940. AAAI Press, 2018.
[4] Neyman Jerzy. On the application of probability theory to agricultural experiments: essay on principles (1923), section 9. Reprinted in English, with Discussion. Statistical Science, pages 463–480, 1990.
[5] Pearl Judea. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo, 1988.
[6] Pearl Judea. A probabilistic calculus of actions. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence (UAI-94), pages 452–462. Morgan Kaufmann, 1994.
[7] Pearl Judea. Causal diagrams for empirical research. Biometrika, 82(4):669–709, 1995.
[8] Pearl Judea. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2nd edition, 2009.
[9] Richardson Thomas S. Markov properties for acyclic directed mixed graphs. Scandinavian Journal of Statistics, 30(1):145–157, 2003.
[10] Richardson Thomas S., Evans Robin J., Robins James M., and Shpitser Ilya. Nested Markov properties for acyclic directed mixed graphs. preprint: arXiv:1701.06686, 2017.
[11] Richardson Thomas S. and Robins Jamie M. Single world intervention graphs (SWIGs): A unification of the counterfactual and graphical approaches to causality. preprint: http://www.csss.washington.edu/Papers/wp128.pdf, 2013.
[12] Robins James M. A new approach to causal inference in mortality studies with sustained exposure periods – application to control of the healthy worker survivor effect. Mathematical Modelling, 7:1393–1512, 1986.
[13] Robins James M., Hernan Miguel A., and Siebert Uwe. Effects of multiple interventions. In Comparative Quantification of Health Risks: Global and Regional Burden of Disease Attributable to Selected Major Risk Factors, volume 2, chapter 28, pages 2191–2230. World Health Organization, 2004.
[14] Robins James M. and Richardson Thomas S. Alternative graphical causal models and the identification of direct effects. In Causality and Psychopathology: Finding the Determinants of Disorders and their Cures. Oxford University Press, 2011.
[15] Rubin DB. Causal inference and missing data (with discussion). Biometrika, 63:581–592, 1976.
[16] Shpitser Ilya. Counterfactual graphical models for longitudinal mediation analysis with unobserved confounding. Cognitive Science (Rumelhart Special Issue), 37:1011–1035, 2013.
[17] Shpitser Ilya and Pearl Judea. Identification of conditional interventional distributions. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI-06), pages 437–444. AUAI Press, 2006.
[18] Shpitser Ilya and Pearl Judea. Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the Twenty-First AAAI Conference on Artificial Intelligence (AAAI-06), pages 1219–1226. AAAI Press, 2006.
[19] Shpitser Ilya and Sherman Eli. Identification of personalized effects associated with causal pathways. In Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence (UAI-18). AUAI Press, 2018.
[20] Shpitser Ilya and Tchetgen Tchetgen Eric J. Causal inference with a graphical hierarchy of interventions. Annals of Statistics, 44(6):2433–2466, 2016.
[21] Zhang Lu, Wu Yongkai, and Wu Xintao. A causal framework for discovering and removing direct and indirect discrimination. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), pages 3929–3935, 2017.
