Author manuscript; available in PMC: 2019 Sep 27.
Published in final edited form as: Uncertain Artif Intell. 2018 Aug;2018:198.

Identification of Personalized Effects Associated With Causal Pathways

Ilya Shpitser 1, Eli Sherman 2
PMCID: PMC6764091  NIHMSID: NIHMS1001702  PMID: 31565035

Abstract

Unlike classical causal inference, where the goal is to estimate average causal effects within a population, in settings such as personalized medicine, the goal is to map a unit’s characteristics to a treatment tailored to maximize the expected outcome for that unit. Obtaining high-quality mappings of this type is the goal of the dynamic treatment regime literature. In healthcare settings, optimizing policies with respect to a particular causal pathway is often of interest as well. In the context of average treatment effects, estimation of effects associated with causal pathways is considered in the mediation analysis literature.

In this paper, we combine mediation analysis and dynamic treatment regime ideas and consider how unit characteristics may be used to tailor a treatment strategy that maximizes an effect along specified sets of causal pathways. In particular, we define counterfactual responses to such policies, give a general identification algorithm for these counterfactuals, and prove completeness of the algorithm for unrestricted policies. A corollary of our results is that the identification algorithm for responses to policies given in [16] is complete for arbitrary policies.

1. INTRODUCTION

Establishing causal relationships between actions and outcomes is fundamental to rational decision-making. The gold standard for establishing causal relationships is the randomized controlled trial (RCT), which may be used to establish average causal effects within a population. Causal inference is a branch of statistics that seeks to predict effects of RCTs from observational data, where treatment assignment is not randomized. Such data is often gathered from observational studies, surveys given to patients during follow up, and in-hospital electronic medical records.

While average treatment effects, reported from implemented RCTs or from hypothetical RCTs emulated by causal inference methods using observational data, establish whether a particular action is helpful on average, optimal decision making must tailor decisions to specific situations. In the context of causal inference this involves finding a map from characteristics of an experimental unit, such as baseline features, to an action that optimizes some outcome for that unit. Methods for finding such maps are studied in the dynamic treatment regime literature [3], and in off-policy reinforcement learning [2].

If an action is known to have a beneficial effect on some outcome, it is often desirable to understand the causal mechanism behind this effect. A popular type of mechanism analysis is mediation analysis, which seeks to decompose average treatment effects into direct and indirect components, or more generally into components associated with specific causal pathways. These components of the average causal effect are known as direct, indirect, and path-specific effects, and are also defined as population averages [1, 8, 12].

In this paper, we define counterfactual outcomes necessary to personalize effects associated with causal pathways, give an algorithm for non-parametric identification of these outcomes and prove that it is complete for arbitrary policies. We consider estimation methods for identified outcomes of this type in a companion paper [7].

Why Personalize Effects Along Causal Pathways?

It often makes sense to structure decision-making such that the overall effect of an action on the outcome is maximized for a given unit. However, in some cases it is appropriate to choose an action such that only a part of the effect of an action on the outcome is maximized. Consider management of HIV patients' care. Since HIV is a chronic disease, care for HIV patients involves designing a long-term treatment plan to minimize the chance of viral failure (an undesirable outcome). In designing such a plan, an important choice is when to initiate primary therapy, and when to switch to a second line therapy. Initiating or switching too early risks unneeded side effects and "wasting" treatment efficacy, while initiating or switching too late risks viral failure [4].

In the context of HIV, however, treatment adherence is an important component of the overall effect of the drug on the outcome. Patients who do not take prescribed doses compromise the efficacy of the drug, and different drugs may have different levels of adherence. Thus, for HIV patients, the overall effect of the drug can be viewed as a combination of the chemical effect and the adherence effect [6]. Therefore, choosing an action that maximizes the overall effect of HIV treatment on viral failure entangles these two very different causal mechanisms. One approach to tailoring treatments to patients in a way that disentangles these mechanisms is to find a policy that optimizes a part of the effect, say the chemical (direct) effect of the drug, while hypothetically keeping the adherence levels to some reference level. Finding such a policy yields information on how best to assign drugs to maximize their chemical efficacy in settings where adherence levels can be controlled to that of a reference treatment – even if the only data available is one where patients have differential adherence.

2. PRELIMINARIES

We proceed as follows. We first give graph theoretic preliminaries, and define graphical causal models that equate counterfactual responses to interventions (setting variables to values, contrary to fact) with truncated factorizations of the observed data distribution [11]. Next, we describe the more general edge intervention that sets variables to different values for different outgoing edges in a graph. Edge interventions are used to formulate direct, indirect, and path-specific effects in mediation analysis. Then, we define counterfactual responses to policies that set variables not to constant values but to values that potentially depend on other sets of variables. Extending these notions, we describe counterfactuals that generalize both responses to edge interventions, and responses to policies, namely responses to edge-specific policies. We briefly describe identification theory for these counterfactuals in causal models with no hidden variables, and note this theory is based on variations of a truncated factorization known as the g-formula [11].

We next consider identification theory for all counterfactuals we described in hidden variable causal models. This theory is more complex, and is based on the ID algorithm [14, 17]. We rephrase the algorithm and its necessary variations in a single line formula based on the fixing operator described in [10]. This reformulation allows us to express any functional corresponding to a counterfactual distribution identifiable in a hidden variable causal model as a single truncated factorization formula, just as identifiable counterfactual distributions in fully observed models are expressed via the g-formula. Finally, we describe a completeness result for the identification algorithm for responses to unrestricted edge-specific policies in hidden variable causal models.

While our primary contributions lie in the presentation of counterfactuals and identification theory for edge-specific policies, we include some discussion of prior theory to build up to our result, and show how identification theory of edge-specific policies generalizes identification theory for edge-specific effects and policy interventions.

Graph Theory

We will define statistical and causal models as sets of distributions defined by restrictions associated with graphs. We will use vertices and variables interchangeably – capital letters for a vertex or variable ($V$), bold capital letters for a set ($\mathbf{V}$), lowercase letters for values ($v$), and bold lowercase letters for sets of values ($\mathbf{v}$). By convention, each graph is defined on a vertex set $\mathbf{V}$.

For a set of values $\mathbf{a}$ of $\mathbf{A}$, and a subset $\mathbf{A}' \subseteq \mathbf{A}$, define $\mathbf{a}_{\mathbf{A}'}$ to be the restriction of $\mathbf{a}$ to elements in $\mathbf{A}'$. The state space of $A$ will be denoted by $\mathfrak{X}_A$, and the (Cartesian product) state space of $\mathbf{A}$ will be denoted by $\mathfrak{X}_{\mathbf{A}}$.

For a mixed graph $\mathcal{G}$ with directed and bidirected edges, and any $V \in \mathbf{V}$, we define the following genealogic sets: parents, children, ancestors, descendants, and districts as $\mathrm{pa}_{\mathcal{G}}(V) \equiv \{W \in \mathbf{V} \mid W \to V\}$, $\mathrm{ch}_{\mathcal{G}}(V) \equiv \{W \in \mathbf{V} \mid V \to W\}$, $\mathrm{an}_{\mathcal{G}}(V) \equiv \{W \in \mathbf{V} \mid W \to \cdots \to V\}$, $\mathrm{de}_{\mathcal{G}}(V) \equiv \{W \in \mathbf{V} \mid V \to \cdots \to W\}$, and $\mathrm{dis}_{\mathcal{G}}(V) \equiv \{W \in \mathbf{V} \mid W \leftrightarrow \cdots \leftrightarrow V\}$. By convention, $\mathrm{an}_{\mathcal{G}}(V) \cap \mathrm{de}_{\mathcal{G}}(V) \cap \mathrm{dis}_{\mathcal{G}}(V) = \{V\}$. These sets generalize to sets $\mathbf{V}' \subseteq \mathbf{V}$ disjunctively; for example, $\mathrm{pa}_{\mathcal{G}}(\mathbf{V}') \equiv \bigcup_{V \in \mathbf{V}'} \mathrm{pa}_{\mathcal{G}}(V)$. For $\mathbf{A} \subseteq \mathbf{V}$, define $\mathrm{pa}^s_{\mathcal{G}}(\mathbf{A}) \equiv \mathrm{pa}_{\mathcal{G}}(\mathbf{A}) \setminus \mathbf{A}$, the parents of the set $\mathbf{A}$ not themselves in $\mathbf{A}$.

The non-descendants of $V$ are denoted $\mathrm{nd}_{\mathcal{G}}(V) \equiv \mathbf{V} \setminus \mathrm{de}_{\mathcal{G}}(V)$. The set of districts forms a partition of vertices in $\mathcal{G}$ and is denoted $\mathcal{D}(\mathcal{G})$. Finally, given a graph $\mathcal{G}$ and $\mathbf{A} \subseteq \mathbf{V}$, the subgraph of $\mathcal{G}$ containing only vertices in $\mathbf{A}$ and edges between these vertices is denoted $\mathcal{G}_{\mathbf{A}}$.
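These genealogic sets reduce to simple reachability computations on an edge-list representation. A minimal sketch in Python; the four-node mixed graph below is a made-up example, not one of the paper's figures:

```python
from itertools import chain

# Directed edges of a hypothetical mixed graph; bidirected edges listed separately.
di_edges = [("W", "A"), ("A", "M"), ("M", "Y"), ("W", "Y")]
bi_edges = [("A", "Y")]

def pa(v):
    return {s for s, t in di_edges if t == v}

def ch(v):
    return {t for s, t in di_edges if s == v}

def closure(v, step):
    """Reflexive transitive closure of a one-step neighborhood function."""
    out, frontier = {v}, {v}
    while frontier:
        frontier = set(chain.from_iterable(step(u) for u in frontier)) - out
        out |= frontier
    return out

def an(v):  # ancestors, including v by convention
    return closure(v, pa)

def de(v):  # descendants, including v by convention
    return closure(v, ch)

def dis(v):  # district: bidirected-connected component, including v
    nbr = lambda u: ({b for a, b in bi_edges if a == u}
                     | {a for a, b in bi_edges if b == u})
    return closure(v, nbr)
```

The districts of a whole graph, $\mathcal{D}(\mathcal{G})$, are then just the distinct values of `dis(·)` over the vertex set.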

Statistical And Causal Models Of A DAG

A directed acyclic graph (DAG), or Bayesian network, is a graph $\mathcal{G}$ with vertex set $\mathbf{V}$ connected by directed edges such that there are no directed cycles in the graph (i.e., no vertex $W$ with a directed path $V \to \cdots \to W$ and an edge $W \to V$). A statistical model of a DAG $\mathcal{G}$ is the set of distributions $p(\mathbf{V})$ such that $p(\mathbf{V}) = \prod_{V \in \mathbf{V}} p(V \mid \mathrm{pa}_{\mathcal{G}}(V))$. Such a $p(\mathbf{V})$ is said to be Markov relative to $\mathcal{G}$.

Causal models of a DAG are also sets of distributions, but on counterfactual random variables. Given $Y \in \mathbf{V}$ and $\mathbf{A} \subseteq \mathbf{V} \setminus \{Y\}$, a counterfactual variable, or 'potential outcome', written as $Y(\mathbf{a})$, represents the value of $Y$ in a hypothetical situation where $\mathbf{A}$ were set to values $\mathbf{a}$ by an intervention operation [9]. Given a set $\mathbf{Y}$, define $\mathbf{Y}(\mathbf{a}) \equiv \{Y(\mathbf{a}) \mid Y \in \mathbf{Y}\}$. The distribution $p(\mathbf{Y}(\mathbf{a}))$ is sometimes written as $p(\mathbf{Y} \mid \mathrm{do}(\mathbf{a}))$ [9].

Causal models of a DAG $\mathcal{G}$ consist of distributions defined on counterfactual random variables of the form $V(\mathbf{a})$, where $\mathbf{a}$ are values of $\mathrm{pa}_{\mathcal{G}}(V)$. In this paper we assume Pearl's functional model for a DAG $\mathcal{G}$ with vertices $\mathbf{V}$, which is the set containing any joint distribution over all potential outcome random variables where the sets of variables

$$\left\{ \{V(\mathbf{a}_V) \mid \mathbf{a}_V \in \mathfrak{X}_{\mathrm{pa}_{\mathcal{G}}(V)}\} \;\middle|\; V \in \mathbf{V} \right\}$$

are mutually independent [9]. The atomic counterfactuals in the above set model the relationship between $\mathrm{pa}_{\mathcal{G}}(V)$, representing the direct causes of $V$, and $V$ itself. From these, all other counterfactuals may be defined using recursive substitution. For any $\mathbf{A} \subseteq \mathbf{V} \setminus \{V\}$,

$$V(\mathbf{a}) \equiv V\big(\mathbf{a}_{\mathrm{pa}_{\mathcal{G}}(V) \cap \mathbf{A}},\ \{\mathrm{pa}_{\mathcal{G}}(V) \setminus \mathbf{A}\}(\mathbf{a})\big). \tag{1}$$

For example, in the DAG in Fig. 1 (a), $Y(a)$ is defined to be $Y(a, M(a, W), W)$.

Figure 1:

(a) A simple causal DAG, with a treatment A, an outcome Y, a vector W of baseline variables, and a mediator M. (b) A more complex causal DAG with two treatments A1, A2, an intermediate outcome W1, and the final outcome W2. H is a hidden common cause of the W variables. (c) A graph where p(Y(a,M(a))) is identified, but p(Y(fA(W),M(a))) is not.

A causal parameter is said to be identified in a causal model if it is a function of the observed data distribution $p(\mathbf{V})$. Otherwise the parameter is said to be non-identified. In all causal models of a DAG $\mathcal{G}$, all interventional distributions $p(\{\mathbf{V} \setminus \mathbf{A}\}(\mathbf{a}))$ are identified by the g-formula [11]:

$$p(\{\mathbf{V} \setminus \mathbf{A}\}(\mathbf{a})) = \prod_{V \in \mathbf{V} \setminus \mathbf{A}} p(V \mid \mathrm{pa}_{\mathcal{G}}(V)) \Big|_{\mathbf{A} = \mathbf{a}} \tag{2}$$
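In the discrete case, formula (2) reduces to a finite sum. A minimal sketch for a fully observed binary version of the DAG in Fig. 1 (a); all conditional probability tables are made-up numbers for illustration:

```python
import itertools

# Hypothetical CPTs for Fig. 1 (a): W -> {A, M, Y}, A -> {M, Y}, M -> Y.
pW = {0: 0.6, 1: 0.4}                                        # p(W = w)
pM_AW = {(0, 0): 0.2, (0, 1): 0.5,                           # p(M = 1 | a, w)
         (1, 0): 0.4, (1, 1): 0.8}
pY_AMW = {(a, m, w): 0.1 + 0.5 * a + 0.2 * m + 0.1 * w       # p(Y = 1 | a, m, w)
          for a, m, w in itertools.product((0, 1), repeat=3)}

def bern(p1, v):
    """Probability of value v under a Bernoulli with success probability p1."""
    return p1 if v == 1 else 1.0 - p1

def g_formula(a):
    """p(Y = 1 | do(A = a)) = sum_{w,m} p(Y = 1 | a, m, w) p(m | a, w) p(w)."""
    return sum(pY_AMW[(a, m, w)] * bern(pM_AW[(a, w)], m) * pW[w]
               for m, w in itertools.product((0, 1), repeat=2))

ace = g_formula(1) - g_formula(0)  # average causal effect of A on Y
```

Note that no model of $p(A \mid W)$ is needed: the factor for the intervened-on variable is simply dropped from the factorization.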

Not all interventional distributions are identified when there are hidden variables present in the causal model. We discuss identification theory in hidden variable DAGs later in this paper.

Edge Interventions

A more general type of intervention in a graphical causal model is the edge intervention [15], which maps a set of directed edges in G to values of their source vertices. Edge interventions have a natural interpretation in cases where a treatment variable has multiple components that a) influence the outcome in different ways, b) occur or do not occur together in observed data, and c) may in principle be intervened on separately. For instance, smoking leads to poor health outcomes due to two components: smoke inhalation and exposure to nicotine. A smoker would be exposed to both of these components, while a non-smoker to neither. However, one might imagine exposing someone selectively only to nicotine but not smoke inhalation (via a nicotine patch), or only smoke inhalation but not nicotine (via smoking plant matter not derived from tobacco leaves). These types of hypothetical experiments correspond precisely to edge interventions, and have been used to conceptualize direct and indirect effects [8, 12], often on the mean difference scale.

Formally, we will write the mapping of a set of edges to values of their source vertices using the following shorthand: $(a_1 \to W_1), (a_2 \to W_2), \ldots, (a_k \to W_k)$ means that edge $(A_1 \to W_1)$ is assigned value $a_1$, $(A_2 \to W_2)$ is assigned value $a_2$, and so on until $(A_k \to W_k)$ is assigned value $a_k$. Alternatively, we will write $\mathbf{a}_\alpha$ to mean that edges in $\alpha$ are mapped to values in the multiset $\mathbf{a}$ (since multiple edges may share the same source vertex, and be assigned different values). For a subset $\beta \subseteq \alpha$ and an assignment $\mathbf{a}_\alpha$, denote by $\mathbf{a}_\beta$ the restriction of $\mathbf{a}_\alpha$ to edges in $\beta$.

We will write counterfactual responses to edge interventions as $Y(\mathbf{a}_\alpha)$ or, in simple cases, as $Y((a \to Y), (a' \to M))$, meaning the response of $Y$ where $A$ is set to value $a$ for the purposes of the edge $(A \to Y)$ and to $a'$ for the purposes of the edge $(A \to M)$. An edge intervention that sets a set of edges $\alpha$ to values in the multiset $\mathbf{a}$ is defined via the following generalization of recursive substitution (1):

$$Y(\mathbf{a}_\alpha) \equiv Y\big(\mathbf{a}_{\{(Z \to Y) \in \alpha\}},\ \{\mathrm{pa}^{\bar\alpha}_{\mathcal{G}}(Y)\}(\mathbf{a}_\alpha)\big), \tag{3}$$

where $\mathrm{pa}^{\bar\alpha}_{\mathcal{G}}(Y) \equiv \{W \mid (W \to Y) \notin \alpha\}$. For example, in the DAG in Fig. 1 (a), $Y((a \to Y), (a' \to M))$ is defined as $Y(a, M(a', W), W)$.

For simplicity of presentation, we will restrict attention to edge interventions with the property that if $(A \to W) \in \alpha$, then for any $V \in \mathrm{ch}_{\mathcal{G}}(A)$, $(A \to V) \in \alpha$. These types of edge interventions set values for all causal pathways for a set of treatment variables. This is the convention in the majority of the existing mediation literature, as these interventions are most relevant in practical mediation analysis problems. Specifically, in our HIV example, we are interested in the effect of a drug along all pathways that start with a particular edge, while the effect of the drug via pathways that begin with other edges is kept to a reference level. This assumption may be relaxed, at the price of complicating the theory [15].

Edge interventions are used to define direct and indirect effects. For example, in the model given by the DAG in Fig. 1 (a), the direct effect of $A$ on $Y$ is defined as $E[Y((a \to Y), (a \to M))] - E[Y((a' \to Y), (a \to M))]$, which is equal to $E[Y(a)] - E[Y(a', M(a))]$. The indirect effect may be defined similarly as $E[Y((a' \to Y), (a \to M))] - E[Y((a' \to Y), (a' \to M))]$, which is equal to $E[Y(a', M(a))] - E[Y(a')]$. The direct and indirect effects add up to the ACE, $E[Y(a)] - E[Y(a')]$.

Note that while direct, indirect, and path-specific effects may be defined directly as nested counterfactuals [8, 13], this notation quickly becomes unreadable for complicated interventions applied at multiple time points. The edge intervention notation may be viewed as a generalization of the do(.) operator notation of Pearl to mediation problems, which avoids having to specify the entire nested counterfactual, and instead directly ties interventions and sets of causal pathways to which these interventions apply (as represented by the first edge shared by all pathways in the set).

Identification of edge interventions in graphical causal models without hidden variables corresponds quite closely with identification of regular (node) interventions, as follows. Let $\mathbf{A}_\alpha \equiv \{A \mid (A \to B) \in \alpha\}$. Consider an edge intervention given by the mapping $\mathbf{a}_\alpha$. Then, under the functional model of a DAG $\mathcal{G}$, the joint distribution of counterfactual responses $p(\{\mathbf{V} \setminus \mathbf{A}_\alpha\}(\mathbf{a}_\alpha))$ is identified via the following generalization of (2), called the edge g-formula:

$$\prod_{V \in \mathbf{V} \setminus \mathbf{A}_\alpha} p\big(V \mid \mathbf{a}_{\{(Z \to V) \in \alpha\}},\ \mathrm{pa}^{\bar\alpha}_{\mathcal{G}}(V)\big). \tag{4}$$

For example, in Fig. 1 (a), $p(Y((a \to Y), (a' \to M))) = \sum_{W,M} p(Y \mid a, M, W)\, p(M \mid a', W)\, p(W)$, which is obtained by marginalizing $W$, $M$ out of the edge g-formula.
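The edge g-formula differs from the ordinary g-formula only in which value of $A$ enters each factor. A sketch on a made-up binary model of Fig. 1 (a) (same style of illustrative CPTs as before), computing direct and indirect effects on the mean difference scale:

```python
import itertools

# Hypothetical CPTs for Fig. 1 (a).
pW = {0: 0.6, 1: 0.4}
pM_AW = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.8}   # p(M=1 | a, w)
pY_AMW = {(a, m, w): 0.1 + 0.5 * a + 0.2 * m + 0.1 * w         # p(Y=1 | a, m, w)
          for a, m, w in itertools.product((0, 1), repeat=3)}

def bern(p1, v):
    return p1 if v == 1 else 1.0 - p1

def edge_g(a_y, a_m):
    """p(Y=1; (a_y -> Y), (a_m -> M)) = sum_{w,m} p(Y=1|a_y,m,w) p(m|a_m,w) p(w)."""
    return sum(pY_AMW[(a_y, m, w)] * bern(pM_AW[(a_m, w)], m) * pW[w]
               for m, w in itertools.product((0, 1), repeat=2))

direct = edge_g(1, 1) - edge_g(0, 1)    # E[Y(1)] - E[Y(0, M(1))]
indirect = edge_g(0, 1) - edge_g(0, 0)  # E[Y(0, M(1))] - E[Y(0)]
```

The two pieces telescope: `direct + indirect` recovers the total effect `edge_g(1, 1) - edge_g(0, 0)`, mirroring the decomposition above.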

Edge interventions represent a special case of the more general notion of a path intervention [15]. Responses to both of these interventions are used to define path-specific effects [8]; however, responses to edge interventions are precisely those that are always identified under the functional model of a DAG, via (3). Responses to path interventions that cannot be rephrased as responses to edge interventions are not identified even in a DAG model, including the functional model, due to the presence of recanting witnesses [1]. For this reason, in this paper we restrict attention only to edge interventions and responses to edge-specific policies.

Responses To Treatment Policies

In personalized medicine settings, counterfactual responses to conditional interventions, which set treatment values in response to other variables via a known function, are of interest. As an example, assume the graph in Fig. 1 (b) represents an observational study of cancer patients where $W_0$ represents baseline patient metrics, $A_1$ is the primary therapy, $W_1$ is the measured intermediate response to the primary therapy, $A_2$ is a decision to either continue primary therapy or switch to a secondary therapy in the event of a poor response to $A_1$, and $W_2$ is the outcome of interest. In this setting, we might be interested in evaluating policies in the set $\{f_{A_1} : \mathfrak{X}_{W_0} \to \mathfrak{X}_{A_1},\ f_{A_2} : \mathfrak{X}_{\{W_0, W_1\}} \to \mathfrak{X}_{A_2}\}$ that map patient characteristics to decisions about therapies $A_1$ and $A_2$. We evaluate the efficacy of these policies via the counterfactual variable $W_2(f_{A_1}, f_{A_2})$, representing patient outcomes had treatment decisions been made according to those policies.

These types of variables are defined via a generalization of (1), where instead of setting parents in $\{A_1, A_2\}$ to values fixed by the intervention, those parents are set according to $f_{A_1}$ and $f_{A_2}$. In particular, $W_2(f_{A_1}, f_{A_2})$ is defined as

$$W_2\big[f_{A_2}(W_1[f_{A_1}(W_0), W_0], W_0),\ W_1[f_{A_1}(W_0), W_0],\ f_{A_1}(W_0),\ W_0\big]. \tag{5}$$

The distribution of this variable is identified under the functional model via the natural generalization of (2) as

$$\sum_{W_0, W_1} p\big(W_2 \mid W_0, f_{A_1}(W_0), W_1, f_{A_2}(W_0, W_1)\big) \times p\big(W_1 \mid W_0, f_{A_1}(W_0)\big)\, p(W_0). \tag{6}$$
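Formula (6) is again a finite sum in the discrete case. A minimal sketch for a binary version of Fig. 1 (b), with made-up conditional probabilities and an illustrative "switch only on poor intermediate response" policy:

```python
import itertools

# Hypothetical CPTs for Fig. 1 (b): W0 -> A1 -> W1 -> A2 -> W2, plus direct edges.
pW0 = {0: 0.5, 1: 0.5}
pW1_W0A1 = {(w0, a1): 0.2 + 0.5 * a1 + 0.1 * w0              # p(W1=1 | w0, a1)
            for w0, a1 in itertools.product((0, 1), repeat=2)}
pW2 = {(w0, a1, w1, a2): 0.1 + 0.3 * a1 + 0.2 * w1 + 0.3 * a2 + 0.05 * w0
       for w0, a1, w1, a2 in itertools.product((0, 1), repeat=4)}  # p(W2=1 | ...)

def bern(p1, v):
    return p1 if v == 1 else 1.0 - p1

def policy_value(f_a1, f_a2):
    """E[W2(f_A1, f_A2)] via formula (6): only W0 and W1 are summed over."""
    total = 0.0
    for w0, w1 in itertools.product((0, 1), repeat=2):
        a1 = f_a1(w0)
        a2 = f_a2(w0, w1)
        total += (pW2[(w0, a1, w1, a2)]
                  * bern(pW1_W0A1[(w0, a1)], w1)
                  * pW0[w0])
    return total

# Treat up front; switch to second-line therapy (A2 = 1) only on poor response.
value = policy_value(lambda w0: 1, lambda w0, w1: 1 - w1)
```

The treatment factors $p(A_1 \mid W_0)$ and $p(A_2 \mid \cdot)$ never appear; the policies replace them.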

More generally, given a DAG $\mathcal{G}$, a topological ordering $\prec$, and a set $\mathbf{A} \subseteq \mathbf{V}$, for each $A \in \mathbf{A}$ define $W_A$ to be some subset of the predecessors of $A$ according to $\prec$. Then, given a set of functions $f_A$ of the form $f_A : \mathfrak{X}_{W_A} \to \mathfrak{X}_A$, define $Y(f_{\mathbf{A}})$, the counterfactual response of $Y \in \mathbf{V}$ to $\mathbf{A}$ being intervened on via $f_{\mathbf{A}} \equiv \{f_A \mid A \in \mathbf{A}\}$, as

$$Y\big(\{f_A(W_A(f_{\mathbf{A}})) \mid A \in \mathrm{pa}_{\mathcal{G}}(Y) \cap \mathbf{A}\},\ \{\mathrm{pa}_{\mathcal{G}}(Y) \setminus \mathbf{A}\}(f_{\mathbf{A}})\big). \tag{7}$$

In a functional model of a DAG $\mathcal{G}$, the effect of $f_{\mathbf{A}}$ on the set of variables not intervened on, $\mathbf{V} \setminus \mathbf{A}$, represented by the distribution $p(\{\mathbf{V} \setminus \mathbf{A}\}(f_{\mathbf{A}}))$, is identified by the following modification of (2) [16]:

$$\prod_{V \in \mathbf{V} \setminus \mathbf{A}} p\big(V \mid \{f_A(W_A) \mid A \in \mathbf{A} \cap \mathrm{pa}_{\mathcal{G}}(V)\},\ \mathrm{pa}_{\mathcal{G}}(V) \setminus \mathbf{A}\big). \tag{8}$$

3. EDGE-SPECIFIC POLICIES

We now give a general definition of counterfactual responses to edge-specific policies that generalize both responses to edge interventions (where a variable is set to different constants for different outgoing edges) and responses to policies, where a variable is set according to a single known function for all causal pathways at once.

As an example, we can view Fig. 1 (a) as representing a cross-sectional study of HIV patients of the kind described in [6], where $W$ is a set of baseline characteristics, $A$ is one of a set of possible antiretroviral treatments, $M$ is adherence to treatment, and $Y$ is a binary outcome variable signifying viral failure. In this type of study, we may wish to find $f_A(W)$ that maximizes the expected outcome $Y$ had $A$ been set according to $f_A(W)$ for the purposes of the direct effect of $A$ on $Y$, and $A$ were set to some reference level $a$ for the purposes of the effect of $A$ on $M$. In other words, we may wish to find $f_A(W)$ that maximizes the counterfactual mean $E[Y(f_A(W), M(a, W), W)]$. This would correspond to finding a treatment policy that maximizes the direct (chemical) effect, if it were possible to keep adherence to the level $M(a)$ that would arise if a reference (easy to adhere to) treatment $a$ were given.

We now give a general definition for responses to such edge-specific policies. Fix a set of directed edges $\alpha$, and define $\mathbf{A}_\alpha \equiv \{A \mid (A \to B) \in \alpha\}$. As before, we assume that if $(A \to W) \in \alpha$, then for all $V \in \mathrm{ch}_{\mathcal{G}}(A)$, $(A \to V) \in \alpha$. Define $f_\alpha \equiv \{f_A^{(A \to W)} : \mathfrak{X}_{W_A} \to \mathfrak{X}_A \mid (A \to W) \in \alpha\}$ as the set of policies associated with edges in $\alpha$. Note that $f_\alpha$ may contain multiple policies for a given treatment variable $A$.

Define $Y(f_\alpha)$, the counterfactual response of $Y$ to the set of edge-specific policies $f_\alpha$, as the following generalization of (3) and (7):

$$Y\big(\{f_A^{(A \to Y)}(W_A(f_\alpha)) \mid (A \to Y) \in \alpha\},\ \{\mathrm{pa}^{\bar\alpha}_{\mathcal{G}}(Y)\}(f_\alpha)\big) \tag{9}$$

In our earlier example, if $f_{\{(A \to Y), (A \to M)\}} \equiv \{f_A^{(A \to Y)}(W),\ \tilde f_A^{(A \to M)}\}$, where $\tilde f_A$ assigns $A$ to a constant value $a$, then $Y(f_{\{(A \to Y), (A \to M)\}}) \equiv Y(f_A(W), M(a, W), W)$.

The joint counterfactual distribution for responses to edge-specific policies, $p(\{V(f_\alpha) \mid V \in \mathbf{V} \setminus \mathbf{A}_\alpha\})$, is identified under the functional model, and generalizes (4) and (6) as follows:

$$\prod_{V \in \mathbf{V} \setminus \mathbf{A}_\alpha} p\big(V \mid \{f_A^{(A \to V)}(W_A) \mid (A \to V) \in \alpha\},\ \mathrm{pa}^{\bar\alpha}_{\mathcal{G}}(V)\big). \tag{10}$$

This is a consequence of the fact that (4) holds regardless of how edge interventions are set. In Fig. 1 (a), for example, $p(Y(f_A(W), M(a, W), W)) = \sum_{W,M} p(Y \mid f_A(W), M, W)\, p(M \mid a, W)\, p(W)$.
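Because the identifying functional is a finite sum in the discrete case, the policy $f_A$ maximizing the direct-effect objective can be found by enumerating the (finitely many) maps $f_A : \mathfrak{X}_W \to \mathfrak{X}_A$. All probabilities below are made up; the effect of $A$ on $Y$ is deliberately made to flip sign with $W$ so that the optimal policy depends on $W$:

```python
import itertools

# Hypothetical CPTs for a binary Fig. 1 (a).
pW = {0: 0.6, 1: 0.4}
pM_AW = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.8}   # p(M=1 | a, w)

def pY(a, m, w):  # p(Y=1 | a, m, w): treatment helps when W=0, harms when W=1
    return (0.2 + 0.4 * a + 0.1 * m) if w == 0 else (0.6 - 0.3 * a + 0.1 * m)

def bern(p1, v):
    return p1 if v == 1 else 1.0 - p1

def value(f, a_ref):
    """E[Y(f_A(W), M(a_ref, W), W)] = sum_{w,m} p(Y=1|f(w),m,w) p(m|a_ref,w) p(w)."""
    return sum(pY(f[w], m, w) * bern(pM_AW[(a_ref, w)], m) * pW[w]
               for w, m in itertools.product((0, 1), repeat=2))

# Enumerate all four deterministic maps f : {0,1} -> {0,1}; reference level a = 0.
best = max(({0: f0, 1: f1} for f0, f1 in itertools.product((0, 1), repeat=2)),
           key=lambda f: value(f, a_ref=0))
```

Here the optimal edge-specific policy treats when $W = 0$ and withholds treatment when $W = 1$, while adherence is held at the level induced by the reference treatment.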

4. IDENTIFICATION IN HIDDEN VARIABLE DAG MODELS

In a causal model of a DAG where some variables are hidden, not every causal parameter is a function of the observed data distribution. It is well known, however, that any two hidden variable DAGs which share a special mixed graph called a latent projection [9] share identification theory (see [10] for a proof).

Given a DAG $\mathcal{G}(\mathbf{V} \cup \mathbf{H})$, where $\mathbf{V}$ are observed and $\mathbf{H}$ are hidden variables, define the latent projection $\mathcal{G}(\mathbf{V})$ to be an acyclic directed mixed graph (ADMG) with vertex set $\mathbf{V}$ and $\to$ and $\leftrightarrow$ edges. An edge $A \to B$ exists in $\mathcal{G}(\mathbf{V})$ if there is a directed path from $A$ to $B$ in $\mathcal{G}(\mathbf{V} \cup \mathbf{H})$ with all intermediate vertices in $\mathbf{H}$. Similarly, an edge $A \leftrightarrow B$ exists in $\mathcal{G}(\mathbf{V})$ if there is a path from $A$ to $B$ without consecutive edges of the form $\to \circ \leftarrow$, with the first edge on the path of the form $A \leftarrow$ and the last edge of the form $\to B$, and all intermediate vertices on the path in $\mathbf{H}$. For example, the graph in Fig. 2 (b) is the latent projection of Fig. 2 (a).
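The latent projection is easy to compute when, as in typical examples, no hidden vertex has an observed parent; under that simplifying assumption every bidirected edge arises from a hidden vertex with hidden-intermediate directed paths to two observed vertices. A sketch (the graph used is a generic confounded-mediator example, not Fig. 2 (a) itself):

```python
def latent_projection(di_edges, observed, hidden):
    """Project a DAG with hidden vertices onto `observed`.

    Simplifying assumption: no hidden vertex has an observed parent."""
    assert not any(s in observed and t in hidden for s, t in di_edges)

    def obs_reach(x):
        # Observed vertices reachable from x by a directed path (length >= 1)
        # whose intermediate vertices are all hidden.
        out, seen, stack = set(), set(), [x]
        while stack:
            u = stack.pop()
            for s, t in di_edges:
                if s != u:
                    continue
                if t in observed:
                    out.add(t)
                elif t not in seen:
                    seen.add(t)
                    stack.append(t)
        return out

    directed = {(a, b) for a in observed for b in obs_reach(a)}
    bidirected = set()
    for h in hidden:
        r = sorted(obs_reach(h))
        bidirected |= {(a, b) for i, a in enumerate(r) for b in r[i + 1:]}
    return directed, bidirected

# Hidden H confounds A and Y; A affects Y only through M.
di, bi = latent_projection(
    [("H", "A"), ("H", "Y"), ("A", "M"), ("M", "Y")],
    observed={"A", "M", "Y"}, hidden={"H"})
```

The result is the familiar front-door ADMG: $A \to M \to Y$ with $A \leftrightarrow Y$.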

Figure 2:

(a) A causal model with a treatment $A$ and outcome $Y$. (b) A latent projection of the DAG in (a). (c) The graph derived from (b) corresponding to $\mathcal{G}_{\mathbf{Y}^*} = \mathcal{G}_{\{Y, M, W_0, W_1\}}$. (d) A CADMG corresponding to $p(M, W_0 \mid \mathrm{do}(a))$.

We will describe identification results on latent projections directly. General algorithms for identification of interventional distributions were given in [14, 17], for responses to edge interventions in [13], and for policies in [16]. Here we reformulate these results as one line formulas using the fixing operator described in [10]. We do so to explicate the connection between these earlier results, and our new identification algorithm.

Reformulation Of The ID Algorithm

A complete algorithm, called the ID algorithm, for identifying interventional distributions of the form $p(\mathbf{Y} \mid \mathrm{do}(\mathbf{a}))$, or $p(\mathbf{Y}(\mathbf{a}))$, for $\mathbf{Y} \subseteq \mathbf{V} \setminus \mathbf{A}$ was given in [17] and simplified in [14]. We now illustrate how this algorithm may be further simplified into a one line formula, which can be viewed as a generalization of the g-formula from the fully observed DAG to the hidden variable DAG case. We then show how this formula may be generalized appropriately to yield identification algorithms for edge interventions and edge-specific policies in hidden variable causal models, just as the g-formula was generalized to these cases in fully observed DAGs.

The version of the ID algorithm in [14], shown in Fig. 1 in the Appendix, proceeds as follows. Lines 2 and 3 reformulate the original query $p(\mathbf{Y}(\mathbf{a}))$ as $\sum_{\mathbf{Y}^* \setminus \mathbf{Y}} p(\mathbf{Y}^*(\mathbf{a}^*))$, where $\mathbf{Y}^*, \mathbf{A}^*$ partition $\mathrm{an}_{\mathcal{G}}(\mathbf{Y})$, and $\mathbf{Y}^* \equiv \mathrm{an}_{\mathcal{G}_{\mathbf{V} \setminus \mathbf{A}}}(\mathbf{Y})$. In line 4, the distribution $p(\mathbf{Y}^*(\mathbf{a}^*))$ is factorized into terms corresponding to districts $\mathbf{D}$ in the subgraph $\mathcal{G}_{\mathbf{Y}^*}$, with the ID algorithm called recursively on each term. These terms correspond to interventional distributions $p(\mathbf{D} \mid \mathrm{do}(\mathbf{V} \setminus \mathbf{D} = \mathbf{c}_{\mathbf{V} \setminus \mathbf{D}}))$, where $\mathbf{c}_{\mathbf{V} \setminus \mathbf{D}}$ is any set of values of $\mathbf{V} \setminus \mathbf{D}$ consistent with $\mathbf{a}$. In subsequent recursive calls, lines 2, 6 and 7 are iterated for each term until it is identified, or the failure condition is reached. Here line 2 corresponds to marginalizing out irrelevant variables, and lines 6 and 7 correspond to identifying a part of the set of intervened-on variables in $\mathbf{V} \setminus \mathbf{D}$ via the g-formula.

Consider Fig. 2 (b), where $A$ represents a binary treatment, $Y$ an outcome of interest, $W_0$ a vector of baseline confounding factors, and $M$, $W_1$ variables mediating the causal effect of $A$ on $Y$. We are interested in identifying the counterfactual distribution $p(Y(a))$ as a function of the observed data distribution $p(W_0, A, M, W_1, Y)$. Here $\mathrm{an}_{\mathcal{G}}(Y) = \{Y, M, W_1, W_0, A\}$ is partitioned into $\mathbf{Y}^* \equiv \{Y, M, W_1, W_0\}$ and $\mathbf{A}^* \equiv \{A\}$, with $\mathcal{G}_{\mathbf{Y}^*}$ shown in Fig. 2 (c). There are three districts in this graph: $\{W_0, M\}$, $\{W_1\}$, and $\{Y\}$. Thus, the ID algorithm attempts to identify $p(W_0, M \mid \mathrm{do}(w_1, y, a))$, $p(W_1 \mid \mathrm{do}(w_0, m, y, a))$, and $p(Y \mid \mathrm{do}(w_0, m, w_1, a))$.

As an example, identifying $p(W_0, M \mid \mathrm{do}(w_1, y, a))$ entails the following steps. First, $Y$ and $W_1$, as irrelevant variables that do not cause $W_0$ and $M$, are marginalized out via line 2, leading to a subproblem where $p(W_0, M \mid \mathrm{do}(a))$ is identified from $p(W_0, A, M)$, with the subgraph corresponding to this subproblem shown in Fig. 2 (d). In this subproblem, $p(W_0, M \mid \mathrm{do}(a))$ is identified as $p(M \mid a, W_0)\, p(W_0)$ via the g-formula in line 6. The recursion, which alternates steps that marginalize with steps that apply the g-formula, can be unified via a fixing operator applied to the graphs and distributions that arise in the intermediate steps of the ID algorithm. We now define these graphs and distributions formally.

CADMGs And Kernels

A kernel $q_{\mathbf{V}}(\mathbf{V} \mid \mathbf{W})$ is a mapping from $\mathfrak{X}_{\mathbf{W}}$ to normalized densities over $\mathbf{V}$. Conditioning and marginalization are defined in kernels in the usual way:

$$q_{\mathbf{V}}(\mathbf{A} \mid \mathbf{W}) \equiv \sum_{\mathbf{V} \setminus \mathbf{A}} q_{\mathbf{V}}(\mathbf{V} \mid \mathbf{W}); \qquad q_{\mathbf{V}}(\mathbf{V} \setminus \mathbf{A} \mid \mathbf{A} \cup \mathbf{W}) \equiv \frac{q_{\mathbf{V}}(\mathbf{V} \mid \mathbf{W})}{q_{\mathbf{V}}(\mathbf{A} \mid \mathbf{W})},$$

for $\mathbf{A} \subseteq \mathbf{V}$. A conditional distribution is one type of kernel, but others are possible. The functional $p(M \mid a, W_0)\, p(W_0) = p(W_0, M \mid \mathrm{do}(a))$ in the previous example is a kernel, $q(M, W_0 \mid a)$, that is not in general equal to the conditional distribution $p(M, W_0 \mid a)$.

A conditional ADMG (CADMG) $\mathcal{G}(\mathbf{V}, \mathbf{W})$ is a type of ADMG where nodes are partitioned into two sets: the set $\mathbf{W}$ corresponds to fixed constants, and the set $\mathbf{V}$ corresponds to random variables. A CADMG has the property that no edges with an arrowhead into an element of $\mathbf{W}$ may exist. Intuitively, a CADMG represents a situation where some variables have already been intervened on. Pearl introduced a similar concept called the 'mutilated graph' in [9]. For example, the graph in Fig. 2 (d) is a CADMG $\mathcal{G}(\{W_0, M\}, \{A\})$ corresponding to the situation where $W_0$, $M$ are random variables and $A$ is fixed to a constant. Just as a distribution may be associated with a DAG via factorization, so may a kernel be associated with a CADMG in a particular way [10]. For example, the CADMG in Fig. 2 (d) may be associated with $p(W_0, M \mid \mathrm{do}(a)) = p(M \mid a, W_0)\, p(W_0)$. Genealogic definitions, such as $\mathrm{pa}_{\mathcal{G}}(\cdot)$, carry over identically to CADMGs. Districts in a CADMG are defined as subsets of $\mathbf{V}$.

The Fixing Operator And The ID Algorithm

Given a CADMG $\mathcal{G}(\mathbf{V}, \mathbf{W})$, a variable $V \in \mathbf{V}$ is fixable if $\mathrm{de}_{\mathcal{G}}(V) \cap \mathrm{dis}_{\mathcal{G}}(V) = \{V\}$. For example, in Fig. 2 (b), $M$ is fixable, while $W_0$ is not. Intuitively, $V$ is fixable in a CADMG $\mathcal{G}(\mathbf{V}, \mathbf{W})$ if, in a causal graph representing a hypothetical situation $p(\mathbf{V} \mid \mathrm{do}(\mathbf{w}))$, where variables in $\mathbf{W}$ were already intervened on, $p(\mathbf{V} \setminus \{V\} \mid \mathrm{do}(\mathbf{w}, v))$ is identified by an application of the g-formula to $p(\mathbf{V} \mid \mathrm{do}(\mathbf{w}))$. Whenever a variable $V$ is fixable, a fixing operator may be applied to both the CADMG and the kernel to yield a new causal graph and a new kernel representing the situation where $V$ is also intervened on.

Given $V \in \mathbf{V}$ fixable in a CADMG $\mathcal{G}(\mathbf{V}, \mathbf{W})$, the fixing operator $\phi_V(\mathcal{G})$ yields a new CADMG $\tilde{\mathcal{G}}(\mathbf{V} \setminus \{V\}, \mathbf{W} \cup \{V\})$, where all vertices and edges in $\mathcal{G}(\mathbf{V}, \mathbf{W})$ are kept, except $V$ is viewed as fixed, and all edges with arrowheads into $V$ are removed. Given $V \in \mathbf{V}$ fixable in a CADMG $\mathcal{G}(\mathbf{V}, \mathbf{W})$, and a kernel $q_{\mathbf{V}}(\mathbf{V} \mid \mathbf{W})$ associated with $\mathcal{G}$, the fixing operator $\phi_V(q_{\mathbf{V}}; \mathcal{G})$ yields a new kernel

$$\tilde q_{\mathbf{V} \setminus \{V\}}(\mathbf{V} \setminus \{V\} \mid \mathbf{W} \cup \{V\}) \equiv \frac{q_{\mathbf{V}}(\mathbf{V} \mid \mathbf{W})}{q_{\mathbf{V}}(V \mid \mathbf{W} \cup \mathrm{nd}_{\mathcal{G}}(V))},$$

where the denominator is defined as above by marginalization and conditioning within the kernel $q_{\mathbf{V}}$. If $\mathrm{ch}_{\mathcal{G}}(V) = \emptyset$, division by $q_{\mathbf{V}}(V \mid \mathbf{W} \cup \mathrm{nd}_{\mathcal{G}}(V))$ is equivalent to marginalizing $V$ out of $q_{\mathbf{V}}$. In this way, the fixing operator unifies the applications of the g-formula in lines 6 and 7 of the ID algorithm with the marginalization of irrelevant variables in line 2, and the recursive operation of the ID algorithm can be expressed concisely as repeated invocations of the operator. This allows us to express the functionals returned by the ID algorithm and its variations, including our new algorithm for identifying responses to edge-specific policies, as one line formulas.
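At the graph level, the fixability check and $\phi_V$ are a few lines of code. A sketch on an ADMG patterned loosely after Fig. 2 (b); the edge list is an illustrative assumption, not read off the figure:

```python
def descendants(v, di_edges):
    out, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for s, t in di_edges:
            if s == u and t not in out:
                out.add(t)
                stack.append(t)
    return out

def district(v, bi_edges, random_nodes):
    out, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for a, b in bi_edges:
            for x, y in ((a, b), (b, a)):
                if x == u and y in random_nodes and y not in out:
                    out.add(y)
                    stack.append(y)
    return out

def fixable(v, random_nodes, di_edges, bi_edges):
    """V is fixable iff its descendants and district intersect only in {V}."""
    return descendants(v, di_edges) & district(v, bi_edges, random_nodes) == {v}

def fix(v, random_nodes, fixed_nodes, di_edges, bi_edges):
    """phi_V: move v to the fixed set and delete all edges with a head into v."""
    assert fixable(v, random_nodes, di_edges, bi_edges)
    return (random_nodes - {v}, fixed_nodes | {v},
            [(s, t) for s, t in di_edges if t != v],
            [(a, b) for a, b in bi_edges if v not in (a, b)])

# Hypothetical ADMG: W0 -> A -> M -> W1 -> Y, with W0 <-> M and A <-> Y.
V = {"W0", "A", "M", "W1", "Y"}
di = [("W0", "A"), ("A", "M"), ("M", "W1"), ("W1", "Y")]
bi = [("W0", "M"), ("A", "Y")]
```

In this graph $M$ is fixable (its district $\{W_0, M\}$ meets its descendants only in $M$), while $W_0$ is not, since $M$ is both a descendant and a district-mate of $W_0$.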

A set $\mathbf{S} \subseteq \mathbf{V}$ is said to be fixable in a latent projection $\mathcal{G}(\mathbf{V})$ if there is a valid sequence $V_1, V_2, \ldots, V_k$ of the variables in $\mathbf{S}$ such that $V_1$ is fixable in $\mathcal{G}$, $V_2$ is fixable in $\phi_{V_1}(\mathcal{G})$, and so on. If $\mathbf{S}$ is fixable, $\mathbf{V} \setminus \mathbf{S}$ is called a reachable set. If $p(\mathbf{V})$ is a marginal of a distribution $p(\mathbf{V} \cup \mathbf{H})$ Markov relative to a DAG $\mathcal{G}(\mathbf{V} \cup \mathbf{H})$, and $\mathcal{G}(\mathbf{V})$ is a latent projection, then the CADMG/kernel pair obtained from $\mathcal{G}(\mathbf{V})$ and $p(\mathbf{V})$ by any valid sequence for $\mathbf{S}$ is the same [10, 17]. As a result, for any fixable set $\mathbf{S}$ in $\mathcal{G}$, writing $\phi_{\mathbf{S}}(\mathcal{G})$ or $\phi_{\mathbf{S}}(q_{\mathbf{V}}; \mathcal{G})$ is well-defined, and means "apply the fixing operator to elements of $\mathbf{S}$ in some valid sequence," with the understanding that any such sequence will yield the same result.

The existence of a valid fixing sequence for $\mathbf{V} \setminus \mathbf{D}$ for each district $\mathbf{D}$ in $\mathcal{G}_{\mathbf{Y}^*}$ implies the corresponding terms may be identified via lines 2, 6, and 7 of the ID algorithm, and the overall algorithm can be rephrased as:

$$p(\mathbf{Y} \mid \mathrm{do}(\mathbf{a})) = \sum_{\mathbf{Y}^* \setminus \mathbf{Y}} \prod_{\mathbf{D} \in \mathcal{D}(\mathcal{G}_{\mathbf{Y}^*})} p(\mathbf{D} \mid \mathrm{do}(\mathbf{V} \setminus \mathbf{D})) \Big|_{\mathbf{A} = \mathbf{a}} = \sum_{\mathbf{Y}^* \setminus \mathbf{Y}} \prod_{\mathbf{D} \in \mathcal{D}(\mathcal{G}_{\mathbf{Y}^*})} \phi_{\mathbf{V} \setminus \mathbf{D}}(p(\mathbf{V}); \mathcal{G}(\mathbf{V})) \Big|_{\mathbf{A} = \mathbf{a}}, \tag{11}$$

which yields the following identifying formula for $p(Y \mid \mathrm{do}(a))$ in our example in Fig. 2 (a):

$$p(Y(a)) = \sum_{W_0, M, W_1} p(W_1 \mid M, A = a, W_0) \times p(M \mid A = a, W_0)\, p(W_0) \sum_{W_0', A} p(Y \mid W_1, M, A, W_0')\, p(W_0', A). \tag{12}$$

We omit the full derivation in the interest of space. See the section on identification of edge-specific policy interventions and the appendix for a complete example. Observe that this equation is a generalized version of Pearl’s front-door formula [9].
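Since (12) generalizes the front-door formula, a numerical sanity check of the simplest front-door case is instructive. The sketch below builds a toy hidden-variable joint $p(U, A, M, Y)$ with made-up structural probabilities, evaluates the front-door functional using only the observed margin $p(A, M, Y)$, and compares it with the ground-truth interventional distribution computed using the hidden $U$:

```python
import itertools

# Structural CPTs of a hypothetical front-door model: U -> A, U -> Y, A -> M, M -> Y.
pU = {0: 0.7, 1: 0.3}
pA_U = lambda u: 0.2 + 0.6 * u                 # p(A=1 | u)
pM_A = lambda a: 0.3 + 0.5 * a                 # p(M=1 | a)
pY_MU = lambda m, u: 0.1 + 0.4 * m + 0.3 * u   # p(Y=1 | m, u)

def bern(p1, v):
    return p1 if v == 1 else 1.0 - p1

# Full joint over (U, A, M, Y), then the observed margin p(A, M, Y).
joint = {(u, a, m, y): pU[u] * bern(pA_U(u), a) * bern(pM_A(a), m)
                       * bern(pY_MU(m, u), y)
         for u, a, m, y in itertools.product((0, 1), repeat=4)}
obs = {}
for (u, a, m, y), pr in joint.items():
    obs[(a, m, y)] = obs.get((a, m, y), 0.0) + pr

def p_obs(**kw):
    """Sum observed probabilities of events matching the given a/m/y values."""
    return sum(pr for (a, m, y), pr in obs.items()
               if all(v == {"a": a, "m": m, "y": y}[k] for k, v in kw.items()))

def front_door(a):
    """p(Y=1 | do(A=a)) = sum_m p(m|a) sum_{a'} p(Y=1 | m, a') p(a')."""
    total = 0.0
    for m in (0, 1):
        p_m_given_a = p_obs(a=a, m=m) / p_obs(a=a)
        inner = sum(p_obs(a=ap, m=m, y=1) / p_obs(a=ap, m=m) * p_obs(a=ap)
                    for ap in (0, 1))
        total += p_m_given_a * inner
    return total

def truth(a):
    """Ground truth using the hidden U: sum_{u,m} p(u) p(m|a) p(Y=1|m,u)."""
    return sum(pU[u] * bern(pM_A(a), m) * pY_MU(m, u)
               for u, m in itertools.product((0, 1), repeat=2))
```

The two computations agree exactly, even though `front_door` never touches $U$; this is the sense in which the functional is a function of the observed data distribution alone.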

Whenever $\mathbf{V} \setminus \mathbf{D}$ is fixable for every $\mathbf{D}$, formula (11) yields the correct expression for $p(\mathbf{Y} \mid \mathrm{do}(\mathbf{a}))$ in terms of the observed data. If some $\mathbf{V} \setminus \mathbf{D}$ is not fixable, the algorithm fails, and $p(\mathbf{Y} \mid \mathrm{do}(\mathbf{a}))$ is not identified. See [10] for a detailed proof.

Edge Interventions

Identification of path-specific effects, where each path is associated with one of two possible value sets $\mathbf{a}$, $\mathbf{a}'$, was given a general characterization in [13] via the recanting district criterion. Here, we reformulate this result in terms of the fixing operator in a way that generalizes (11), and applies to the response of any edge intervention, including those that set edges to multiple values rather than two. This result can also be viewed as a generalization of node consistency of edge interventions in DAG models, found in [15].

Given $\mathbf{A}_\alpha \equiv \{A \mid (A \to B) \in \alpha\}$, and an edge intervention given by the mapping $\mathbf{a}_\alpha$, define $\mathbf{Y}^* \equiv \mathrm{an}_{\mathcal{G}_{\mathbf{V} \setminus \mathbf{A}_\alpha}}(\mathbf{Y})$. The joint distribution of the counterfactual response $p(\{\mathbf{V} \setminus \mathbf{A}_\alpha\}(\mathbf{a}_\alpha))$ is identified if $p(\{\mathbf{V} \setminus \mathbf{A}_\alpha\}(\mathbf{a}))$ is identified via (11), and for every $\mathbf{D} \in \mathcal{D}(\mathcal{G}_{\mathbf{Y}^*})$ and every $A \in \mathbf{A}_\alpha$, $\mathbf{a}_\alpha$ has the same value assignment for every directed edge out of $A$ into $\mathbf{D}$. Under these assumptions, we have the following result.

Theorem 1

$p(\mathbf{Y}(\mathbf{a}_\alpha))$ is identified and equal to

$$\sum_{\mathbf{Y}^* \setminus \mathbf{Y}} \prod_{\mathbf{D} \in \mathcal{D}(\mathcal{G}_{\mathbf{Y}^*})} \phi_{\mathbf{V} \setminus \mathbf{D}}(p(\mathbf{V}); \mathcal{G}) \Big|_{\{a_{(A \to D)} \in \mathbf{a}_\alpha \,\mid\, D \in \mathbf{D},\, A \in \mathbf{A}_\alpha\}} \tag{13}$$

Proof: This follows directly from results in [13] and [10]. Identifying edge interventions entails identifying $\prod_{\mathbf{D} \in \mathcal{D}(\mathcal{G}_{\mathbf{Y}^*})} p(\mathbf{D} \mid \mathrm{do}(\mathbf{a}_{\mathbf{D}}))$, where $\mathbf{a}_{\mathbf{D}}$ is an assignment for $\mathrm{pa}^s_{\mathcal{G}}(\mathbf{D})$, and $\mathbf{a}_{\mathbf{D}}$ possibly assigns different values to elements of $\mathbf{A}$ with respect to different districts. That this identification algorithm can be rephrased as (13) follows directly from Theorem 60 in [10]. □

Consider again the example in Fig. 2 (a). Now assume we set $A = a$ for the edge $(A \to M)$ and $A = a'$ for the edge $(A \to W_1)$. The identifying functional for $p(Y((a' \to W_1), (a \to M)))$ has the same form as (12), but some terms are evaluated at $A = a$, and some at $A = a'$:

$$\sum_{W_0, M, W_1} p(W_1 \mid M, A = a', W_0)\, p(M \mid A = a, W_0)\, p(W_0) \sum_{W_0', A} p(Y \mid W_0', A, M, W_1)\, p(W_0', A) \tag{14}$$

Policy Interventions (Dynamic Treatment Regimes)

A general algorithm for identification of responses to a set of policies $f_{\mathbf{A}}$ was given in [16]. We again reformulate this algorithm in terms of the fixing operator. Define the graph $G_{f_{\mathbf{A}}}$ to be the graph obtained from $G$ by removing all edges into $\mathbf{A}$, and adding, for every $A \in \mathbf{A}$, directed edges from $W_A$ to $A$. By definition of $W_A$, $G_{f_{\mathbf{A}}}$ is guaranteed to be acyclic. Define $Y^* \equiv \mathrm{an}_{G_{f_{\mathbf{A}}}}(Y) \setminus \mathbf{A}$. Assume $p(Y^*(a))$ is identified in $G$. Then, under the above assumptions, we have the following result.

Theorem 2

$p(Y(f_{\mathbf{A}}))$ is identified in $G$. Moreover, the identification formula is

$$\sum_{(Y^* \cup \mathbf{A}) \setminus Y} \prod_{D \in \mathcal{D}(G_{Y^*})} \phi_{V \setminus D}(p(V); G)\,\Big|_{\tilde{a}_{\mathrm{pa}^s_G(D) \cap \mathbf{A}}} \tag{15}$$

where $\tilde{a}_{\mathrm{pa}^s_G(D) \cap \mathbf{A}}$ is defined as

$$\tilde{a}_{\mathrm{pa}^s_G(D) \cap \mathbf{A}} = \begin{cases} \{A = f_A(W_A) \mid A \in \mathrm{pa}_G(D) \cap \mathbf{A}\} & \text{if } \mathrm{pa}_G(D) \cap \mathbf{A} \neq \emptyset \\ \emptyset & \text{otherwise.} \end{cases}$$

Proof: This follows from the fact that identification of $p(Y(f_{\mathbf{A}}))$ can be rephrased as identification of $p(Y^*(a))$, with values $a$ set according to the policies applied to $\{W_A \mid A \in \mathbf{A}\}$, where all $W_A$ in this set are subsets of $Y^*$. That identification of $p(Y^*(a))$ may be rephrased as (15) follows by Theorem 60 in [10]. □

The outer sum over $\mathbf{A}$ in (15) is vacuous if $f_{\mathbf{A}}$ is a set of deterministic policies. To illustrate (15), in our example in Fig. 2 (b), $p(Y(A = f_A(W_0)))$ is identified as

$$\sum_{W_0, A, M, W_1} p(W_1 \mid M, A = f_A(W_0), W_0)\, p(M \mid A = f_A(W_0), W_0)\, p(W_0) \sum_{W_0, A} p(Y \mid W_1, M, A, W_0)\, p(W_0, A) \tag{16}$$

Identification of Edge-Specific Policies

Having reformulated existing identification results on responses to policies (15) and responses to edge interventions arising in mediation analysis (13) in terms of the fixing operator, we generalize these results for identification of responses to edge-specific policies.

Given $\mathbf{A}_\alpha \equiv \{A \mid (A,B) \in \alpha\}$, and a set of edge-specific policies given by the set of mappings $f_\alpha$, define the graph $G_{f_\alpha}$ to be one where all edges with arrowheads into $\mathbf{A}_\alpha$ are removed, and directed edges from any vertex in $W_A$ to $A \in \mathbf{A}_\alpha$ added. Fix a set $Y$ of outcomes of interest, and define $Y^* \equiv \mathrm{an}_{G_{f_\alpha}}(Y) \setminus \mathbf{A}_\alpha$. We have the following result.

Theorem 3

$p(Y(f_\alpha))$ is identified if $p(Y^*(a))$ is identified, and for every $D \in \mathcal{D}((G_{f_\alpha})_{Y^*})$, $f_\alpha$ yields the same policy assignment for every edge from $A \in \mathbf{A}_\alpha$ to $D$. Moreover, the identifying formula is

$$\sum_{(Y^* \cup \mathbf{A}_\alpha) \setminus Y} \prod_{D \in \mathcal{D}(G_{Y^*})} \phi_{V \setminus D}(p(V); G)\,\Big|_{\tilde{a}_{\mathrm{pa}^s_G(D) \cap \mathbf{A}_\alpha}} \tag{17}$$

where $\tilde{a}_{\mathrm{pa}^s_G(D) \cap \mathbf{A}_\alpha}$ is defined to be $\{A = f_A(W_A) \in f_\alpha \mid A \in \mathrm{pa}_G(D) \cap \mathbf{A}_\alpha\}$ if $\mathrm{pa}_G(D) \cap \mathbf{A}_\alpha \neq \emptyset$, and the empty set otherwise.

Proof: This is a straightforward generalization of the proofs of Theorems 1 and 2. □

Responses to edge-specific policies are identified in strictly fewer cases than responses to edge interventions, because $Y^*$ is a larger set in the former case. As an example, consider the graph in Fig. 1 (c), where we are interested either in the counterfactual $p(Y(a, M(a')))$, used to define pure direct effects, or the counterfactual $p(Y(f_A(W), M(a')))$.

For the former counterfactual, we have $Y^* = \{Y, M\}$, and $p(Y(a, M(a')))$ equal to

$$\sum_m \left( \frac{\sum_w p(Y, m \mid a, w)\, p(w)}{\sum_w p(m \mid a, w)\, p(w)} \right) \sum_w p(m \mid a', w)\, p(w)$$

We omit the detailed derivation in the interest of space. For the latter counterfactual, however, the set $Y^* = \{Y, M, W\}$ forms a single district in $G_{Y^*}$, and the edge-specific policy set $f_{\{(A,M),(A,Y)\}}$ sets edges from $A$ into this district to different policies. As a result, Theorem 3 is insufficient to conclude identification.
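For illustration, the identified functional above for $p(Y(a, M(a')))$ can be evaluated by brute force on a synthetic joint over binary $(W, A, M, Y)$; the toy distribution and helper names are ours and serve only to exercise the arithmetic.

```python
from itertools import product
import random

random.seed(1)
# Synthetic strictly positive joint p(W, A, M, Y) over binary variables.
raw = {v: random.random() for v in product((0, 1), repeat=4)}
Z = sum(raw.values())
p = {v: x / Z for v, x in raw.items()}  # index order: (w, a, m, y)

def marg(pattern):
    """Probability of the event where each non-None slot of `pattern` matches."""
    return sum(pr for v, pr in p.items()
               if all(t is None or t == s for t, s in zip(pattern, v)))

def pde_response(y, a, a_prime):
    """Plug-in evaluation of p(Y(a, M(a'))) via the displayed functional."""
    total = 0.0
    for m in (0, 1):
        # sum_w p(y, m | a, w) p(w)
        num = sum(marg((w, a, m, y)) / marg((w, a, None, None))
                  * marg((w, None, None, None)) for w in (0, 1))
        # sum_w p(m | a, w) p(w)
        den = sum(marg((w, a, m, None)) / marg((w, a, None, None))
                  * marg((w, None, None, None)) for w in (0, 1))
        # sum_w p(m | a', w) p(w)
        shift = sum(marg((w, a_prime, m, None)) / marg((w, a_prime, None, None))
                    * marg((w, None, None, None)) for w in (0, 1))
        total += (num / den) * shift
    return total

print(abs(sum(pde_response(y, a=1, a_prime=0) for y in (0, 1)) - 1.0) < 1e-9)  # True
```

Setting `a_prime == a` collapses the functional to the ordinary interventional response $p(Y(a))$, as expected for a pure direct effect contrast.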

Generalizations of the example in Fig. 1 (b) are the most relevant in practice, as their causal structure corresponds to longitudinal observational studies of the kind considered in [11] and many other papers. However, we illustrate complications that may arise in identifiability of responses to edge-specific policies with our running example in Fig. 2 (b), where we are interested in the response of $Y$ to the edge-specific policies $f_{\{(A,M),(A,W_1)\}} = \{f_A^{(A,M)}(W_0),\, f_A^{(A,W_1)}(W_0)\}$. Theorem 3 yields the following identifying formula:

$$\sum_{W_0, A, M, W_1} \Big[ p\big(W_1 \mid M, A = f_A^{(A,W_1)}(W_0), W_0\big) \times p\big(M \mid A = f_A^{(A,M)}(W_0), W_0\big)\, p(W_0) \times \sum_{W_0, A} p(Y \mid W_1, M, A, W_0)\, p(W_0, A) \Big] \tag{18}$$

Note that (18) generalizes both (14), which sets $A$ to different constants in different terms, and (16), which sets $A$ to the output of a function of $W_0$. We give a detailed derivation of this functional in the appendix.
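Formula (18) can likewise be evaluated by a brute-force plug-in on a synthetic joint, now with a separate policy function per edge; constants for both policies recover (14), and a single shared policy recovers (16). The toy distribution and names below are our own illustrative sketch.

```python
from itertools import product
import random

random.seed(2)
# Synthetic strictly positive joint p(W0, A, M, W1, Y) over binary variables.
raw = {v: random.random() for v in product((0, 1), repeat=5)}
Z = sum(raw.values())
p = {v: x / Z for v, x in raw.items()}  # index order: (w0, a, m, w1, y)

def marg(pattern):
    """Probability of the event where each non-None slot of `pattern` matches."""
    return sum(pr for v, pr in p.items()
               if all(t is None or t == s for t, s in zip(pattern, v)))

def policy_response(y, f_AM, f_AW1):
    """Plug-in evaluation of (18); f_AM and f_AW1 map w0 to a treatment value
    for the edges (A, M) and (A, W1), respectively."""
    def q(y, w1, m):  # sum_{w0, a} p(y | w1, m, a, w0) p(w0, a)
        return sum(marg((w0, a, m, w1, y)) / marg((w0, a, m, w1, None))
                   * marg((w0, a, None, None, None))
                   for w0, a in product((0, 1), repeat=2))
    total = 0.0
    for w0, m, w1 in product((0, 1), repeat=3):
        a_m, a_w1 = f_AM(w0), f_AW1(w0)
        total += (marg((w0, a_w1, m, w1, None)) / marg((w0, a_w1, m, None, None))   # p(w1 | m, ., w0)
                  * marg((w0, a_m, m, None, None)) / marg((w0, a_m, None, None, None))  # p(m | ., w0)
                  * marg((w0, None, None, None, None))                              # p(w0)
                  * q(y, w1, m))
    return total

response = [policy_response(y, f_AM=lambda w0: w0, f_AW1=lambda w0: 1 - w0) for y in (0, 1)]
print(abs(sum(response) - 1.0) < 1e-9)  # True
```

In a real analysis, the conditional densities would be replaced by fitted models rather than empirical marginals, as discussed for the plug-in estimator in Section 6.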

5. ON COMPLETENESS

An identification algorithm for a class of parameters is said to be complete relative to a class of causal models if, whenever the algorithm fails to identify a parameter within a model class, the parameter is in fact not identified within that class.

The ID algorithm is known to be complete for the class of interventional distributions in the class of functional models [5, 14]. We restate this result here, and give a sequence of increasingly general completeness results for the identification algorithms described so far. The completeness results on policies and edge-specific policies are new. For completeness results pertaining to policies, we assume a completely unrestricted class of policies. If the set of policies of interest, $f_{\mathbf{A}}$ or $f_\alpha$, is restricted, or alternatively if the causal model has parametric restrictions, the completeness results we present may no longer hold.

Theorem 4

Given disjoint subsets $Y, \mathbf{A}$ of $V$ in an ADMG $G$, define $Y^* \equiv \mathrm{an}_{G_{V \setminus \mathbf{A}}}(Y)$. Then $p(Y(a))$ is not identified if there exists $D \in \mathcal{D}(G_{Y^*})$ that is not a reachable set in $G$.

Corollary 1

The algorithm for identification of $p(Y(a))$, as phrased in (11), is complete.

Theorem 5

Given $\mathbf{A}_\alpha \equiv \{A \mid (A,B) \in \alpha\}$, and an edge intervention given by the mapping $\mathfrak{a}_\alpha$, define $Y^* \equiv \mathrm{an}_{G_{V \setminus \mathbf{A}_\alpha}}(Y)$. The joint distribution of the counterfactual response $p(\{V \setminus \mathbf{A}_\alpha\}(\mathfrak{a}_\alpha))$ is not identified if $p(\{V \setminus \mathbf{A}_\alpha\}(a))$ is not identified, or there exist $D \in \mathcal{D}(G_{Y^*})$ and $A \in \mathbf{A}_\alpha$ such that $\mathfrak{a}_\alpha$ assigns different values to a pair of directed edges out of $A$ into $D$.

Corollary 2

The algorithm for identification of $p(Y(\mathfrak{a}_\alpha))$, as phrased in (13), is complete.

Theorem 6

Define $G_{f_{\mathbf{A}}}$ to be the graph obtained from $G$ by removing all edges into $\mathbf{A}$, and adding, for every $A \in \mathbf{A}$, directed edges from $W_A$ to $A$. Define $Y^* \equiv \mathrm{an}_{G_{f_{\mathbf{A}}}}(Y) \setminus \mathbf{A}$. Then if $p(Y^*(a))$ is not identified in $G$, $p(Y(f_{\mathbf{A}}))$ is not identified in $G$ when $f_{\mathbf{A}}$ ranges over the unrestricted class of policies.

Corollary 3

The algorithm for identification of $p(Y(f_{\mathbf{A}}))$, as phrased in (15), is complete for unrestricted policies.

Theorem 7

Define the graph $G_{f_\alpha}$ to be one where all edges with arrowheads into $\mathbf{A}_\alpha$ are removed, and directed edges from any vertex in $W_A$ to $A \in \mathbf{A}_\alpha$ added. Fix a set $Y$ of outcomes of interest, and define $Y^* \equiv \mathrm{an}_{G_{f_\alpha}}(Y) \setminus \mathbf{A}_\alpha$. Then if $p(Y^*(a))$ is not identified, or there exists $D \in \mathcal{D}((G_{f_\alpha})_{Y^*})$ such that $f_\alpha$ yields different policy assignments for two edges from $A \in \mathbf{A}_\alpha$ to $D$, then $p(Y(f_\alpha))$ is not identified.

Corollary 4

The algorithm for identification of $p(Y(f_\alpha))$, as phrased in (17), is complete for unrestricted policies.

Detailed proofs of these results are given in the Appendix; the corollaries are immediate consequences of the preceding theorems.

6. CONCLUSION

In this paper, we defined counterfactual responses to policies that set treatment values in such a way that they affect outcomes with respect to certain causal pathways only. Such counterfactuals arise when we wish to personalize only some portion of the causal effect of a treatment, while keeping other portions set to some reference values. An example might be optimizing the chemical effect of a drug, while keeping drug adherence to a reference value.

We gave a general algorithm for identifying these responses from data, which generalizes similar algorithms due to [16, 13] for dynamic treatment regimes, and edge-specific effects, respectively. Further, we showed that given an unrestricted class of policies the algorithm is complete. As a corollary, this established that the identification algorithm for dynamic treatment regimes in [16] is complete for unrestricted policies.

Given a fixed set of policies associated with a set of causal pathways, and assuming (17) yields a functional containing only conditional densities, as is the case for the functional (18), the counterfactual mean under those policies $E[Y(f_\alpha)]$ may be estimated using the maximum likelihood plug-in estimator. Such an estimator can be viewed as a generalization of the parametric g-formula [11] to edge-specific policies. More general estimation strategies, and approaches to learning the optimal set of policies, are the subject of our companion paper [7].

Supplementary Material

appendix

Acknowledgments

This research was supported in part by the NIH grants R01 AI104459–01A1 and R01 AI127271–01A1. We thank the anonymous reviewers for their insightful comments that greatly improved this manuscript.

Contributor Information

Ilya Shpitser, Department of Computer Science, Johns Hopkins University, Baltimore, MD.

Eli Sherman, Department of Computer Science, Johns Hopkins University, Baltimore, MD.

References

  • [1] Avin C, Shpitser I, and Pearl J. Identifiability of path-specific effects. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), volume 19, pages 357–363. Morgan Kaufmann, San Francisco, 2005.
  • [2] Bertsekas DP and Tsitsiklis J. Neuro-Dynamic Programming. Athena Publishing, 1996.
  • [3] Chakraborty B and Moodie EEM. Statistical Methods for Dynamic Treatment Regimes (Reinforcement Learning, Causal Inference, and Personalized Medicine). Springer, New York, 2013.
  • [4] Hernan MA, Lanoy E, Costagliola D, and Robins JM. Comparison of dynamic treatment regimes via inverse probability weighting. Basic and Clinical Pharmacology and Toxicology, 98:237–242, 2006.
  • [5] Huang Y and Valtorta M. Pearl's calculus of interventions is complete. In Twenty-Second Conference on Uncertainty in Artificial Intelligence, 2006.
  • [6] Miles C, Shpitser I, Kanki P, Melone S, and Tchetgen Tchetgen EJ. Quantifying an adherence path-specific effect of antiretroviral therapy in the Nigeria PEPFAR program. Journal of the American Statistical Association, 2017.
  • [7] Nabi R and Shpitser I. Estimation of personalized effects associated with causal pathways. In Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI), 2018.
  • [8] Pearl J. Direct and indirect effects. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI-01), pages 411–420. Morgan Kaufmann, San Francisco, 2001.
  • [9] Pearl J. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2nd edition, 2009.
  • [10] Richardson TS, Evans RJ, Robins JM, and Shpitser I. Nested Markov properties for acyclic directed mixed graphs, 2017. Working paper.
  • [11] Robins JM. A new approach to causal inference in mortality studies with sustained exposure periods – application to control of the healthy worker survivor effect. Mathematical Modelling, 7:1393–1512, 1986.
  • [12] Robins JM and Greenland S. Identifiability and exchangeability of direct and indirect effects. Epidemiology, 3:143–155, 1992.
  • [13] Shpitser I. Counterfactual graphical models for longitudinal mediation analysis with unobserved confounding. Cognitive Science (Rumelhart special issue), 37:1011–1035, 2013.
  • [14] Shpitser I and Pearl J. Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06). AAAI Press, Palo Alto, 2006.
  • [15] Shpitser I and Tchetgen Tchetgen EJ. Causal inference with a graphical hierarchy of interventions. Annals of Statistics, 44(6):2433–2466, 2016.
  • [16] Tian J. Identifying dynamic sequential plans. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI-08), pages 554–561, Corvallis, Oregon, 2008. AUAI Press.
  • [17] Tian J and Pearl J. On the testable implications of causal models with hidden variables. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI-02), volume 18, pages 519–527. AUAI Press, Corvallis, Oregon, 2002.
