Stochastic counterfactuals and stochastic sufficient causes

TYLER J VANDERWEELE; JAMES M ROBINS

doi:10.5705/ss.2008.186

Author manuscript; available in PMC: 2014 Dec 1.

Published in final edited form as: Stat Sin. 2012 Jan 1;22(1):379–392. doi: 10.5705/ss.2008.186

PMCID: PMC4249711 NIHMSID: NIHMS632470 PMID: 25473251

Abstract

Most work in causal inference concerns deterministic counterfactuals; the literature on stochastic counterfactuals is small. In the stochastic counterfactual setting, the outcome for each individual under each possible set of exposures follows a probability distribution so that for any given exposure combination, outcomes vary not only between individuals but also probabilistically for each particular individual. The deterministic sufficient cause framework supplements the deterministic counterfactual framework by allowing for the representation of counterfactual outcomes in terms of sufficient causes or causal mechanisms. In the deterministic sufficient cause framework it is possible to test for the joint presence of two causes in the same causal mechanism, referred to as a sufficient cause interaction. In this paper, these ideas are extended to the setting of stochastic counterfactuals and stochastic sufficient causes. Formal definitions are given for a stochastic sufficient cause framework. It is shown that the empirical conditions that suffice to conclude the presence of a sufficient cause interaction in the deterministic sufficient cause framework suffice also to conclude the presence of a sufficient cause interaction in the stochastic sufficient cause framework. Two examples from the genetics literature, in which there is evidence that sufficient cause interactions are present, are discussed in light of the results in this paper.

Keywords: Causal inference, Interaction, Stochastic counterfactual, Sufficient cause, Synergism

1. Introduction

Although most work in causal inference concerns deterministic counterfactuals, a few papers address the setting of stochastic counterfactuals (Greenland (1987); Robins and Greenland (1989, 2000)). In the deterministic counterfactual framework, each set of exposures corresponds to only one outcome for each individual. The same set of exposures may bring about different outcomes for different individuals, but for a particular individual the set of exposures fixes the outcome. The collection of individuals is generally treated as the sample space and the outcome is then regarded as a random variable over the space of individuals. In contrast, within the stochastic counterfactual framework (Greenland (1987); Robins and Greenland (1989, 2000)), each set of exposures corresponds to a distribution of the outcome for each individual; the random outcome variable defined over the space of individuals is then itself a distribution-valued random variable.

Counterfactuals make reference to different outcomes (or distributions of outcomes) under different exposures or interventions. Rothman (1976) described causation in a somewhat different manner by conceiving of the relationship between cause and effect as a series of different causal mechanisms each sufficient to bring about the outcome. These causal mechanisms Rothman called “sufficient causes,” informally defined as minimal sets of actions, events or states of nature which together initiate a process that inevitably results in the outcome. For a particular outcome there would likely be many different sufficient causes, i.e., many different causal mechanisms by which the outcome could come about. For example, perhaps we have some genetic factor G and some environmental factor E which are our causes of interest for a particular cancer outcome D; perhaps one such cause requires the environmental factor E and some unknown factors A₁ in order to operate. Within a deterministic framework, whenever both E and A₁ are present an individual will inevitably have the outcome D. Perhaps another sufficient cause for D consists of the genetic factor G and some other unknown factors A₂ and perhaps a third sufficient cause for D consists of the environmental factor E, the genetic factor G, and some other unknown factors A₃. We would then have three sufficient causes: A₁E, A₂G, and A₃EG. Each sufficient cause involves some combination of the various component causes, namely, E, G, A₁, A₂ and A₃. Under a deterministic sufficient cause framework, whenever all components of a particular sufficient cause are present, the outcome D will inevitably occur; within every sufficient cause, each component is necessary for that sufficient cause to lead to the outcome. If two distinct causes are both components of the same sufficient cause then the causes participate together in the same causal mechanism, and synergism is said to be present. 
Thus if there were indeed a sufficient cause, such as A₃EG, that required both E and G, then it would be said that synergism is present between the effects of E and G. In many settings it will not be known whether synergism is present, i.e., whether there is a sufficient cause corresponding to a causal mechanism that requires both of two causes such as E and G to operate; we might then be interested in empirically testing whether synergism is present.

VanderWeele and Robins (2008) gave formal definitions for sufficient causes, sufficient cause representations, and sufficient cause interactions in the deterministic setting, and furthermore derived empirical conditions for testing for synergism. In this paper we formulate a stochastic sufficient cause framework, relate stochastic sufficient causes to stochastic counterfactuals, and show that it is possible to test for sufficient cause interactions even in the stochastic sufficient cause setting.

Technical details are provided below, but the basic approach for testing for sufficient cause interactions in the deterministic setting is as follows. For a binary outcome D and a number of binary exposures X₁, …, X_n, a sufficient cause representation is defined to be a set of sufficient causes (involving X₁, …, X_n and possibly also other unknown variables or causes denoted by A_i) that replicate a particular set of counterfactual outcomes. Thus the ith sufficient cause would take the form $A_{i} F_{1}^{i} \dots F_{n_{i}}^{i}$ , where each $F_{k}^{i}$ is either a member of the set {X₁, …, X_n} or is the complement of such a member. A sufficient cause interaction is said to be present between X₁, …, X_k if every representation of the counterfactual outcomes by sufficient causes has a sufficient cause in which X₁, …, X_k are all present. A sufficient cause interaction necessarily implies synergism (VanderWeele and Robins (2008)) but synergism may be present without a sufficient cause interaction. In the case of two binary variables, say, an X₁X₂ term may not be logically necessary to represent the counterfactual outcomes by sufficient causes, but there might be an X₁X₂ sufficient cause term in the representation that actually corresponds to the biological mechanisms; see VanderWeele and Robins (2007, 2008) for further discussion.

For two exposures, X₁ and X₂, we let $D_{x_{1} x_{2}}$ denote the counterfactual value of D intervening to set X₁ = x₁ and X₂ = x₂. We say that the effects of X₁ and X₂ on D are unconfounded conditional on C if $D_{x_{1} x_{2}}$ ∐ {X₁, X₂} | C, where A ∐ B | C denotes that A is independent of B conditional on C. VanderWeele and Robins (2008) showed that for a binary outcome D and two binary exposures X₁ and X₂, if the effects of X₁ and X₂ on D are unconfounded conditional on C, then if

p_{11 c} - p_{10 c} - p_{01 c} > 0

(1)

where $p_{x_{1} x_{2} c}$ = E(D | X₁ = x₁, X₂ = x₂, C = c), then a sufficient cause interaction must be present between X₁ and X₂. It was furthermore shown that if $D_{x_{1} x_{2}}$ is non-decreasing in x₁ and x₂, then if

p_{11 c} - p_{10 c} - p_{01 c} + p_{00 c} > 0,

(2)

then a sufficient cause interaction must be present between X₁ and X₂. Extensions to three-way sufficient cause interactions were also noted. For three binary exposures X₁, X₂ and X₃, let $p_{x_{1} x_{2} x_{3} c}$ = E(D | X₁ = x₁, X₂ = x₂, X₃ = x₃, C = c). If the effects of X₁, X₂, and X₃ on D are unconfounded conditional on C, and if

p_{111 c} - p_{110 c} - p_{101 c} - p_{011 c} > 0,

(3)

then a sufficient cause interaction must be present between X₁, X₂, and X₃. Finally, if $D_{x_{1} x_{2} x_{3}}$ is non-decreasing in x₁, x₂, and x₃, then any of the following three conditions implies that a sufficient cause interaction is present between X₁, X₂ and X₃:

\begin{array}{l} p_{111 c} - p_{110 c} - p_{101 c} - p_{011 c} + p_{100 c} + p_{010 c} > 0 \\ p_{111 c} - p_{110 c} - p_{101 c} - p_{011 c} + p_{100 c} + p_{001 c} > 0 \\ p_{111 c} - p_{110 c} - p_{101 c} - p_{011 c} + p_{010 c} + p_{001 c} > 0. \end{array}

(4)

See VanderWeele (2009) for discussion of the relation of conditions (1)–(4) to interaction terms in linear, log-linear, and logistic models. In the context of no confounding factors, the fact that condition (2) is sufficient to conclude the presence of a sufficient cause interaction was stated explicitly and proved by Rothman and Greenland (1998). Theory concerning sufficient causes developed by VanderWeele and Robins (2008) was necessary to derive conditions (1), (3), and (4). In this paper we provide necessary definitions for a stochastic sufficient cause framework and show that conditions (1)–(4) above also imply the presence of sufficient cause interactions in the stochastic sufficient cause framework.
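Conditions (1)–(4) are linear contrasts of stratum-specific outcome probabilities, so they are straightforward to evaluate once those probabilities have been estimated. The following is a minimal sketch only: the function names and the numerical inputs are hypothetical illustrations, not taken from any study, and a real analysis would also need to account for sampling variability.

```python
def two_way_conditions(p11, p10, p01, p00):
    """Evaluate conditions (1) and (2) within one covariate stratum c.

    Arguments are the probabilities p_{x1 x2 c} = E(D | X1=x1, X2=x2, C=c).
    """
    cond1 = p11 - p10 - p01 > 0          # condition (1): no monotonicity needed
    cond2 = p11 - p10 - p01 + p00 > 0    # condition (2): valid under monotonic effects
    return {"condition_1": cond1, "condition_2": cond2}


def three_way_conditions(p):
    """Evaluate conditions (3) and (4); p maps (x1, x2, x3) to p_{x1 x2 x3 c}."""
    base = p[1, 1, 1] - p[1, 1, 0] - p[1, 0, 1] - p[0, 1, 1]
    cond3 = base > 0                     # condition (3)
    # Condition (4): any one of the three inequalities suffices (under monotonicity).
    extras = (
        p[1, 0, 0] + p[0, 1, 0],
        p[1, 0, 0] + p[0, 0, 1],
        p[0, 1, 0] + p[0, 0, 1],
    )
    cond4 = any(base + extra > 0 for extra in extras)
    return {"condition_3": cond3, "condition_4": cond4}


# Hypothetical probabilities, chosen only to illustrate the two cases.
strong = two_way_conditions(0.5, 0.2, 0.2, 0.1)
weak = two_way_conditions(0.3, 0.2, 0.2, 0.15)
```

In the second hypothetical stratum, condition (1) fails but condition (2) holds, so a conclusion of sufficient cause interaction would rest on the monotonicity assumption.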

2. Stochastic Sufficient Causes and Sufficient Cause Interactions

Under a deterministic counterfactual model, each set of potential interventions corresponds to only one outcome for each individual. The same set of interventions may bring about different outcomes on different individuals but for a particular individual the set of interventions fixes the outcome. The deterministic counterfactual framework can be generalized to a stochastic counterfactual framework wherein, for each individual, a particular set of interventions gives rise to a distribution of outcomes for that individual (Greenland (1987); Robins and Greenland (1989, 2000)). Likewise, the deterministic sufficient-component cause model can be generalized to a stochastic setting so that for each individual the completion of a sufficient cause gives rise to a probability of developing the outcome.

We use the following notation. An event is a binary variable taking values in {0, 1}. The disjunctive or OR operator, ⋁, is defined by A ⋁ B = A + B − AB so that A ⋁ B = 1 if A = 1 or B = 1 or both, but A ⋁ B = 0 if A = B = 0. Note, however, that by defining ⋁ more generally as A ⋁ B = A + B − AB, we can apply the OR operator, ⋁, to numbers other than 0 and 1. The complement of an event A will be denoted by Ā. A conjunction or product of the events X₁, …, X_n will be written as X₁…X_n so that X₁…X_n = 1 if and only if each of the events X₁, …, X_n takes the value 1.

We will first assume that there are only two causes of primary interest, X₁ and X₂. In the stochastic counterfactual setting, for each exposure combination and for each individual there is some probability of the outcome. Thus in the stochastic setting, for each individual ω the counterfactual $D_{x_{1} x_{2}} (ω)$ is in fact a Bernoulli random variable with probability $p_{x_{1} x_{2}} (ω)$. Note that the probabilities $p_{x_{1} x_{2}} (ω)$ are allowed to vary with ω, i.e., from one individual to another. In a stochastic setting, as with the deterministic setting, each sufficient cause may involve either X₁ or $\bar{X_{1}}$ or neither, and may involve either X₂ or $\bar{X_{2}}$ or neither, and may also involve various other background variables or causes which we denote by A_i. There are nine possible sufficient causes for D: A₀, A₁X₁, A₂X₂, $A_{3} \bar{X_{1}}, A_{4} \bar{X_{2}}$ , A₅X₁X₂, $A_{6} \bar{X_{1}} X_{2}, A_{7} X_{1} \bar{X_{2}}$ and $A_{8} \bar{X_{1}} \bar{X_{2}}$ . We assume the background cause variables A_i are not affected by interventions on X₁ and X₂ (cf. VanderWeele and Robins (2007, 2008) for further discussion). For individual ω, if the ith sufficient cause takes the value 1, then in the stochastic counterfactual setting there is some probability v_i(ω) that the sufficient cause brings about the outcome. The probabilities v_i(ω) are allowed to vary with ω.
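The count of nine possible sufficient causes arises because each of the two exposures can enter a conjunction as itself, as its complement, or not at all (3² = 9). A minimal sketch of that enumeration follows; the encoding (`1` for the exposure, `0` for its complement, `None` for absent) and the function names are assumptions made for illustration.

```python
from itertools import product


def sufficient_cause_conjunctions(n):
    """List every conjunction over n binary exposures.

    Each position is 1 (exposure present), 0 (its complement present),
    or None (exposure not part of the conjunction).
    """
    return list(product((None, 1, 0), repeat=n))


def is_complete(conjunction, x):
    """True when exposure values x satisfy every literal in the conjunction."""
    return all(lit is None or lit == xi for lit, xi in zip(conjunction, x))


conjunctions = sufficient_cause_conjunctions(2)

# Conjunctions completed when X1 = 1 and X2 = 1; in the text's labelling these
# are the ones accompanying A0, A1 X1, A2 X2 and A5 X1 X2.
complete_at_11 = [c for c in conjunctions if is_complete(c, (1, 1))]
```

The same helper gives 3ⁿ conjunctions for n exposures, which is the collection indexed by i in the general definition.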

In the deterministic setting, for any set of variables A₀, …, A₈ not affected by interventions on X₁ and X₂, the disjunction of sufficient causes $A_{0} ⋁ A_{1} X_{1} ⋁ A_{2} X_{2} ⋁ A_{3} \bar{X_{1}} ⋁ A_{4} \bar{X_{2}} ⋁ A_{5} X_{1} X_{2} ⋁ A_{6} \bar{X_{1}} X_{2} ⋁ A_{7} X_{1} \bar{X_{2}} ⋁ A_{8} \bar{X_{1}} \bar{X_{2}}$ is said to constitute a sufficient cause representation if

\begin{array}{l} D_{x_{1} x_{2}} = A_{0} ⋁ A_{1} x_{1} ⋁ A_{2} x_{2} ⋁ A_{3} (1 - x_{1}) ⋁ A_{4} (1 - x_{2}) ⋁ A_{5} x_{1} x_{2} \\ ⋁ A_{6} (1 - x_{1}) x_{2} ⋁ A_{7} x_{1} (1 - x_{2}) ⋁ A_{8} (1 - x_{1}) (1 - x_{2}) . \end{array}

In the stochastic setting the causes of interest X_i and the background causes A_i are random variables over the population but fixed for an individual; however, in this stochastic setting, for each individual the completion of a sufficient cause will only bring about an outcome with some probability, and this probability may vary across individuals. For a set of variables A₀, …, A₈ not affected by interventions on X₁ and X₂, and a set of possibly dependent Bernoulli random variables $\{R_{i} (ω)\}_{ω ∈ Ω}$ with corresponding probabilities $\{v_{i} (ω)\}_{ω ∈ Ω}$ that the completion of the ith sufficient cause brings about the outcome, we say that the disjunction $A_{0} R_{0} ⋁ A_{1} R_{1} X_{1} ⋁ A_{2} R_{2} X_{2} ⋁ A_{3} R_{3} \bar{X_{1}} ⋁ A_{4} R_{4} \bar{X_{2}} ⋁ A_{5} R_{5} X_{1} X_{2} ⋁ A_{6} R_{6} \bar{X_{1}} X_{2} ⋁ A_{7} R_{7} X_{1} \bar{X_{2}} ⋁ A_{8} R_{8} \bar{X_{1}} \bar{X_{2}}$ is a stochastic sufficient cause representation for D if for all ω and all x₁ and x₂,

\begin{array}{l} D_{x_{1} x_{2}} (ω) = A_{0} (ω) R_{0} (ω) ⋁ A_{1} (ω) R_{1} (ω) x_{1} ⋁ A_{2} (ω) R_{2} (ω) x_{2} ⋁ A_{3} (ω) R_{3} (ω) (1 - x_{1}) \\ ⋁ A_{4} (ω) R_{4} (ω) (1 - x_{2}) ⋁ A_{5} (ω) R_{5} (ω) x_{1} x_{2} ⋁ A_{6} (ω) R_{6} (ω) (1 - x_{1}) x_{2} \\ ⋁ A_{7} (ω) R_{7} (ω) x_{1} (1 - x_{2}) ⋁ A_{8} (ω) R_{8} (ω) (1 - x_{1}) (1 - x_{2}) . \end{array}

Note that R_i(ω) is the random variable which, for individual ω, denotes whether the ith sufficient cause, if complete, brings about the outcome. Note also that for a fixed ω ∈ Ω, we do not assume that for i ≠ j, R_i(ω) is independent of R_j(ω). For a particular ω ∈ Ω, if it is in fact the case that {R₀(ω), …, R₈(ω)} are mutually independent with probabilities {v₀(ω), …, v₈(ω)}, then it is also the case that the counterfactual outcome probability $p_{x_{1} x_{2}} (ω)$ is:

\begin{array}{l} p_{x_{1} x_{2}} (ω) = A_{0} (ω) v_{0} (ω) ⋁ A_{1} (ω) v_{1} (ω) x_{1} ⋁ A_{2} (ω) v_{2} (ω) x_{2} ⋁ A_{3} (ω) v_{3} (ω) (1 - x_{1}) \\ ⋁ A_{4} (ω) v_{4} (ω) (1 - x_{2}) ⋁ A_{5} (ω) v_{5} (ω) x_{1} x_{2} ⋁ A_{6} (ω) v_{6} (ω) (1 - x_{1}) x_{2} \\ ⋁ A_{7} (ω) v_{7} (ω) x_{1} (1 - x_{2}) ⋁ A_{8} (ω) v_{8} (ω) (1 - x_{1}) (1 - x_{2}) . \end{array}

This follows since if Y₁, …, Y_k are independent Bernoulli random variables with success probabilities p₁, …, p_k, then $Y^{(k)}$ = Y₁ ⋁ … ⋁ Y_k is a Bernoulli random variable with success probability $p^{(k)}$ = p₁ ⋁ … ⋁ p_k.
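The fact used here, that the OR of independent Bernoulli variables has success probability equal to the generalized OR of the individual probabilities, can be checked exactly by enumerating outcomes. The sketch below does this with rational arithmetic; the function names and the three probabilities are hypothetical choices for illustration.

```python
from fractions import Fraction
from functools import reduce
from itertools import product


def general_or(a, b):
    """Generalized OR operator: a OR b = a + b - a*b (works for 0/1 and for probabilities)."""
    return a + b - a * b


def or_probability_by_enumeration(probs):
    """P(Y1 OR ... OR Yk = 1) for independent Bernoulli Yi, summing over all outcomes."""
    total = Fraction(0)
    for ys in product((0, 1), repeat=len(probs)):
        # Probability of this joint outcome under independence.
        weight = Fraction(1)
        for y, p in zip(ys, probs):
            weight *= p if y == 1 else 1 - p
        if reduce(general_or, ys, 0) == 1:
            total += weight
    return total


probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]  # hypothetical values
closed_form = reduce(general_or, probs, Fraction(0))      # p1 OR p2 OR p3
enumerated = or_probability_by_enumeration(probs)
```

Both quantities equal 1 − (1 − p₁)(1 − p₂)(1 − p₃). The agreement relies on independence; with dependent R_i(ω) the closed form need not hold, which is why the definition of a representation is stated in terms of the R_i rather than the v_i.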

Note that for any given set of stochastic counterfactuals $\{D_{x_{1} x_{2}} (ω)\}_{ω ∈ Ω}$ there always exists at least one stochastic sufficient cause representation, since we may take A₀(ω) = A₁(ω) = A₂(ω) = A₃(ω) = A₄(ω) = 0 for all ω and A₅(ω) = A₆(ω) = A₇(ω) = A₈(ω) = 1 for all ω, and we may take $\{R_{0} (ω), \dots, R_{8} (ω)\}_{ω ∈ Ω}$ as the Bernoulli random variables R₀(ω) = R₁(ω) = R₂(ω) = R₃(ω) = R₄(ω) = 0 for all ω and R₅(ω) = D₁₁(ω), R₆(ω) = D₀₁(ω), R₇(ω) = D₁₀(ω), R₈(ω) = D₀₀(ω) for all ω. In what follows, probabilities and expectations with an Ω subscript, such as P_Ω and E_Ω, denote probabilities and expectations over individuals but not within individuals; probabilities and expectations without a subscript denote double expectations over the individuals in the population and over the possible outcome realizations for the stochastic sufficient causes within individuals. We say that there is a stochastic sufficient cause interaction between X₁ and X₂ if in every stochastic sufficient cause representation for D we have P(A₅R₅ = 1) > 0. Note that the sufficient cause corresponding to A₅ is the one with both X₁ and X₂ in its conjunction. Stochastic sufficient cause interactions for $\bar{X_{1}}$ and X₂, X₁ and $\bar{X_{2}}$ , or $\bar{X_{1}}$ and $\bar{X_{2}}$ can be defined similarly. If there is a stochastic sufficient cause interaction between X₁ and X₂ then there must be some mechanism which requires both X₁ and X₂ to operate and which results in the outcome with a non-zero probability.
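The existence argument can be checked mechanically for a single individual: for every possible realization of the four counterfactual outcomes, the trivial choice of A_i and R_i described above reproduces those outcomes. The following is a sketch with hypothetical function names; it works with realized 0/1 outcomes rather than with their distributions.

```python
from itertools import product


def disjunction(A, R, x1, x2):
    """Evaluate the nine-term disjunction A_i R_i g_i(x1, x2) using the OR operator."""
    # g lists the conjunction indicators in the order of A0, ..., A8 from the text.
    g = [1, x1, x2, 1 - x1, 1 - x2, x1 * x2,
         (1 - x1) * x2, x1 * (1 - x2), (1 - x1) * (1 - x2)]
    value = 0
    for a, r, gi in zip(A, R, g):
        term = a * r * gi
        value = value + term - value * term
    return value


def trivial_representation(d):
    """Build (A, R) from realized counterfactuals d[(x1, x2)], as in the text."""
    A = [0, 0, 0, 0, 0, 1, 1, 1, 1]
    R = [0, 0, 0, 0, 0, d[1, 1], d[0, 1], d[1, 0], d[0, 0]]
    return A, R


def reproduces_all_outcome_patterns():
    """Check every one of the 16 possible realizations of (D11, D10, D01, D00)."""
    for d11, d10, d01, d00 in product((0, 1), repeat=4):
        d = {(1, 1): d11, (1, 0): d10, (0, 1): d01, (0, 0): d00}
        A, R = trivial_representation(d)
        for x1, x2 in product((0, 1), repeat=2):
            if disjunction(A, R, x1, x2) != d[x1, x2]:
                return False
    return True
```

Because this trivial representation always includes the A₅ term, the definition of a stochastic sufficient cause interaction requires P(A₅R₅ = 1) > 0 in every representation, not merely in some representation.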

We need one further concept. We say X₁ or X₂ has a positive monotonic effect in the stochastic sufficient cause sense if sufficient causes with $\bar{X_{1}}$ or $\bar{X_{2}}$ , respectively, are excluded from all stochastic sufficient cause representations. In other words, X₁, say, has a positive monotonic effect in the stochastic sufficient cause sense if we know a priori, or are willing to assume, that there are no mechanisms for the outcome D that require the absence of X₁ to operate. The following theorems show that the conditions that suffice to conclude the presence of a sufficient cause interaction in the deterministic setting suffice also to conclude the presence of sufficient cause interactions in the stochastic sufficient cause framework. As noted below, the proofs can be reconstrued so as to follow from the results in the deterministic setting.

Theorem 1

If E(D₁₁ − D₁₀ − D₀₁) > 0, then there is a stochastic sufficient cause interaction between X₁ and X₂.

Proof

We have E(D₁₁ − D₁₀ − D₀₁) = E_Ω(p₁₁ − p₁₀ − p₀₁). For any stochastic sufficient cause representation for D,

\begin{array}{l} D_{x_{1} x_{2}} (ω) = A_{0} (ω) R_{0} (ω) ⋁ A_{1} (ω) R_{1} (ω) x_{1} ⋁ A_{2} (ω) R_{2} (ω) x_{2} ⋁ A_{3} (ω) R_{3} (ω) (1 - x_{1}) \\ ⋁ A_{4} (ω) R_{4} (ω) (1 - x_{2}) ⋁ A_{5} (ω) R_{5} (ω) x_{1} x_{2} ⋁ A_{6} (ω) R_{6} (ω) (1 - x_{1}) x_{2} \\ ⋁ A_{7} (ω) R_{7} (ω) x_{1} (1 - x_{2}) ⋁ A_{8} (ω) R_{8} (ω) (1 - x_{1}) (1 - x_{2}) . \end{array}

Define B_i(ω) = A_i(ω)R_i(ω) and let b_i = E_Ω(B_i), b_ij = E_Ω(B_iB_j), b_ijk = E_Ω(B_iB_jB_k), and b_ijkl = E_Ω(B_iB_jB_kB_l). Then

\begin{array}{l} E_{Ω} (p_{11}) = b_{0} + b_{1} + b_{2} + b_{5} - (b_{01} + b_{02} + b_{05} + b_{12} + b_{15} + b_{25}) + (b_{012} + b_{015} + b_{025} + b_{125}) - b_{0125} \\ E_{Ω} (p_{10}) = b_{0} + b_{1} + b_{4} + b_{7} - (b_{01} + b_{04} + b_{07} + b_{14} + b_{17} + b_{47}) + (b_{014} + b_{017} + b_{047} + b_{147}) - b_{0147} \\ E_{Ω} (p_{01}) = b_{0} + b_{2} + b_{3} + b_{6} - (b_{02} + b_{03} + b_{06} + b_{23} + b_{26} + b_{36}) + (b_{023} + b_{026} + b_{036} + b_{236}) - b_{0236} \end{array}

If b₅ = 0, then E_Ω(p₁₁) = b₀ + b₁ + b₂ − (b₀₁ + b₀₂ + b₁₂) + b₀₁₂ and

\begin{array}{l} E_{Ω} (p_{11} - p_{10} - p_{01}) \\ = b_{0} + b_{1} + b_{2} - (b_{01} + b_{02} + b_{12}) + b_{012} \\ \quad - \{b_{0} + b_{1} + b_{4} + b_{7} - (b_{01} + b_{04} + b_{07} + b_{14} + b_{17} + b_{47}) + (b_{014} + b_{017} + b_{047} + b_{147}) - b_{0147}\} \\ \quad - \{b_{0} + b_{2} + b_{3} + b_{6} - (b_{02} + b_{03} + b_{06} + b_{23} + b_{26} + b_{36}) + (b_{023} + b_{026} + b_{036} + b_{236}) - b_{0236}\} \\ = - (b_{12} - b_{012}) - \{b_{4} + b_{7} - (b_{04} + b_{07} + b_{14} + b_{17} + b_{47}) + (b_{014} + b_{017} + b_{047} + b_{147}) - b_{0147}\} \\ \quad - \{b_{0} + b_{3} + b_{6} - (b_{03} + b_{06} + b_{23} + b_{26} + b_{36}) + (b_{023} + b_{026} + b_{036} + b_{236}) - b_{0236}\} \\ = - E (B_{0}) - E (\bar{B}_{0} B_{1} B_{2}) - E (\bar{B}_{0} \bar{B}_{1} \bar{B}_{4} B_{7}) - E (\bar{B}_{0} \bar{B}_{2} \bar{B}_{3} B_{6}) - \{E (\bar{B}_{0} B_{4}) - E (\bar{B}_{0} B_{1} B_{4})\} \\ \quad - \{E (\bar{B}_{0} B_{3}) - E (\bar{B}_{0} B_{2} B_{3})\} \\ \leq 0. \end{array}

Thus if E(D₁₁ − D₁₀ − D₀₁) > 0 then b₅ > 0, and so E_Ω(A₅R₅) > 0 and consequently P(A₅R₅ = 1) > 0, and there is a stochastic sufficient cause interaction between X₁ and X₂.
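As a complementary check on Theorem 1 (not the authors' proof, which works with expectations), one can enumerate every 0/1 configuration of B₀, …, B₈ with B₅ = 0 and confirm that D₁₁ − D₁₀ − D₀₁ is never positive. Since the inequality then holds for each realization, it holds in expectation whatever the dependence among the R_i. The function names below are hypothetical.

```python
from itertools import product


def or_all(values):
    """OR of a sequence of 0/1 values via a OR b = a + b - a*b."""
    out = 0
    for v in values:
        out = out + v - out * v
    return out


def contrast(b):
    """D11 - D10 - D01 for one realization b = (B0, ..., B8), with B_i = A_i R_i."""
    d11 = or_all([b[0], b[1], b[2], b[5]])   # causes complete at (x1, x2) = (1, 1)
    d10 = or_all([b[0], b[1], b[4], b[7]])   # causes complete at (1, 0)
    d01 = or_all([b[0], b[2], b[3], b[6]])   # causes complete at (0, 1)
    return d11 - d10 - d01


def max_contrast(allow_joint_cause):
    """Largest contrast over all configurations, optionally forcing B5 = 0."""
    configs = product((0, 1), repeat=9)
    return max(contrast(b) for b in configs if allow_joint_cause or b[5] == 0)


without_joint = max_contrast(allow_joint_cause=False)
with_joint = max_contrast(allow_joint_cause=True)
```

With B₅ forced to zero the maximum is 0; once the X₁X₂ cause is allowed to operate the contrast can reach 1, which is what makes a positive expected contrast informative.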

Theorem 2

If X₁ and X₂ have monotonic effects on D in the stochastic sufficient cause sense and E(D₁₁ − D₁₀ − D₀₁ + D₀₀) > 0, then there is a stochastic sufficient cause interaction between X₁ and X₂.

Proof

We have that E(D₁₁ − D₁₀ − D₀₁ + D₀₀) = E_Ω(p₁₁− p₁₀ − p₀₁ + p₀₀). Since X₁ and X₂ have monotonic effects on D, for any stochastic sufficient cause representation for D,

D_{x_{1} x_{2}} (ω) = A_{0} (ω) R_{0} (ω) ⋁ A_{1} (ω) R_{1} (ω) x_{1} ⋁ A_{2} (ω) R_{2} (ω) x_{2} ⋁ A_{5} (ω) R_{5} (ω) x_{1} x_{2} .

Define B_i(ω) = A_i(ω)R_i(ω) and let b_i = E_Ω(B_i), b_ij = E_Ω(B_iB_j), b_ijk = E_Ω(B_iB_jB_k) and b_ijkl = E_Ω(B_iB_jB_kB_l). Then

\begin{array}{l} E_{Ω} (p_{11}) = b_{0} + b_{1} + b_{2} + b_{5} - (b_{01} + b_{02} + b_{05} + b_{12} + b_{15} + b_{25}) + (b_{012} + b_{015} + b_{025} + b_{125}) - b_{0125} \\ E_{Ω} (p_{10}) = b_{0} + b_{1} - b_{01} \\ E_{Ω} (p_{01}) = b_{0} + b_{2} - b_{02} \\ E_{Ω} (p_{00}) = b_{0} . \end{array}

If b₅ = 0, then E_Ω(p₁₁) = b₀ + b₁ + b₂ − (b₀₁ + b₀₂ + b₁₂) + b₀₁₂ and

\begin{array}{l} E_{Ω} (p_{11} - p_{10} - p_{01} + p_{00}) = b_{0} + b_{1} + b_{2} - (b_{01} + b_{02} + b_{12}) + b_{012} - (b_{0} + b_{1} - b_{01}) - (b_{0} + b_{2} - b_{02}) + b_{0} \\ = - (b_{12} - b_{012}) \leq 0. \end{array}

Thus if E(D₁₁ − D₁₀ − D₀₁ + D₀₀) > 0 then b₅ > 0, and so E_Ω(A₅R₅) > 0 and consequently P(A₅R₅ = 1) > 0, and there is a stochastic sufficient cause interaction between X₁ and X₂.
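The final step of this proof, that the contrast reduces to −(b₁₂ − b₀₁₂) when b₅ = 0, has a realization-level counterpart: under monotonicity and with B₅ = 0, D₁₁ − D₁₀ − D₀₁ + D₀₀ equals −(1 − B₀)B₁B₂ for every configuration. A brief sketch that verifies this identity by enumeration (function names are hypothetical):

```python
from itertools import product


def or2(a, b):
    """OR operator a + b - a*b on 0/1 values."""
    return a + b - a * b


def monotone_contrast(b0, b1, b2):
    """D11 - D10 - D01 + D00 when only A0, A1 X1, A2 X2 can operate (B5 = 0)."""
    d11 = or2(or2(b0, b1), b2)
    d10 = or2(b0, b1)
    d01 = or2(b0, b2)
    d00 = b0
    return d11 - d10 - d01 + d00


def identity_holds():
    """Check contrast == -(1 - B0) * B1 * B2 over all eight configurations."""
    return all(
        monotone_contrast(b0, b1, b2) == -(1 - b0) * b1 * b2
        for b0, b1, b2 in product((0, 1), repeat=3)
    )
```

Taking expectations of −(1 − B₀)B₁B₂ gives −(b₁₂ − b₀₁₂), matching the displayed expression.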

The definitions for stochastic sufficient causes given in the case of two causes of interest can be generalized to settings in which there are n causes of interest, X₁, …, X_n. The counterfactual $D_{x_{1} \dots x_{n}} (ω)$ is a Bernoulli random variable with probability $p_{x_{1} \dots x_{n}} (ω)$. The probabilities $p_{x_{1} \dots x_{n}} (ω)$ are allowed to vary with ω. A sufficient cause is of the form $A_{i} F_{1}^{i} \dots F_{n_{i}}^{i}$ where each $F_{k}^{i}$ is either a member of the set {X₁, …, X_n} or is the complement of such a member, and the variables A_i are not affected by interventions on {X₁, …, X_n}. For individual ω, if the ith sufficient cause $A_{i} F_{1}^{i} \dots F_{n_{i}}^{i} = 1$ , then there is some probability v_i(ω) that the sufficient cause brings about the outcome. The probabilities v_i(ω) are allowed to vary with ω. For a set of binary variables $\{A_{i}\}_{i = 0}^{T}$ that are not affected by interventions on {X₁, …, X_n} and a set of Bernoulli random variables $\{R_{i} (ω)\}_{ω ∈ Ω}$ with probabilities $\{v_{i} (ω)\}_{ω ∈ Ω}$, we say that the disjunction of $\{A_{i} R_{i} F_{1}^{i} \dots F_{n_{i}}^{i}\}_{i = 0}^{T}$ constitutes a stochastic sufficient cause representation if for all ω and all x₁, …, x_n, $D_{x_{1} \dots x_{n}} (ω) = \underset{i}{⋁} R_{i} (ω) A_{i} (ω) g_{i} (x_{1}, \dots, x_{n})$ , where g_i(x₁, …, x_n) = 1 if $F_{1}^{i} \dots F_{n_{i}}^{i} = 1$ when (X₁, …, X_n) = (x₁, …, x_n) and 0 otherwise. For any given set of stochastic counterfactuals $\{D_{x_{1} \dots x_{n}} (ω)\}_{ω ∈ Ω}$, there always exists at least one stochastic sufficient cause representation since for each conjunction $F_{1}^{i} \dots F_{n_{i}}^{i}$ we may take A_i = 1 if n_i = n and A_i = 0 otherwise, and we may take the Bernoulli random variables $\{R_{i} (ω)\}_{ω ∈ Ω}$ with R_i(ω) = 0 for all ω if A_i = 0 and, for i such that A_i = 1, R_i(ω) = $D_{x_{1} \dots x_{n}} (ω)$ for x₁, …, x_n which satisfy $F_{1}^{i} \dots F_{n_{i}}^{i} = 1$ . 
We say X_k has a positive monotonic effect in the stochastic sufficient cause sense if sufficient causes with $\bar{X_{k}}$ are excluded from all stochastic sufficient cause representations. We say that there is a sufficient cause interaction between the effects of X₁, …, X_k if in every stochastic sufficient cause representation for D there exists a sufficient cause $A_{i} F_{1}^{i} \dots F_{n_{i}}^{i}$ with X₁, …, X_k in its conjunction such that P (A_iR_i = 1) > 0.

The method we used in the proofs of Theorems 1 and 2 in fact applies more generally. By letting B_i = A_iR_i we can express the expectation of a counterfactual contrast in terms of the probabilities b_i = E(B_i). If in this expression we replace b_i with a_i we obtain the same expression obtained in the deterministic case by taking the expectation of the counterfactual contrast and expressing it in terms of the probabilities a_i = P(A_i = 1) (cf. VanderWeele and Robins (2007)). In the deterministic case, we know from prior results (VanderWeele and Robins (2008), VanderWeele and Richardson (2011)) that if the expectation of certain counterfactual contrasts is positive, then some a_j corresponding to a sufficient cause interaction must be non-zero. It thus also follows in the stochastic setting that if the expectation of the same counterfactual contrasts is positive, then some b_j corresponding to a sufficient cause interaction must also be non-zero. Since b_j ≠ 0 we have E(B_j) > 0, and thus E_Ω(A_jR_j) > 0 and P(A_jR_j = 1) > 0 so there must be a stochastic sufficient cause interaction. Effectively, we reduce the problem in the stochastic setting to an equivalent problem in the deterministic setting for which the solution is already known. Thus for three-way sufficient cause interactions in the stochastic sufficient cause setting we have the following results.

Theorem 3

If E(D₁₁₁ − D₁₁₀ − D₁₀₁ − D₀₁₁) > 0, then there is a stochastic sufficient cause interaction between X₁, X₂ and X₃.

Theorem 4

If X₁, X₂ and X₃ have monotonic effects on D in the stochastic sufficient cause sense, then if any of the following three conditions holds,

\begin{array}{l} E (D_{111} - D_{110} - D_{101} - D_{011} + D_{100} + D_{010}) > 0 \\ E (D_{111} - D_{110} - D_{101} - D_{011} + D_{100} + D_{001}) > 0 \\ E (D_{111} - D_{110} - D_{101} - D_{011} + D_{010} + D_{001}) > 0 \end{array}

there is a stochastic sufficient cause interaction between X₁, X₂ and X₃.
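One informal way to see why Theorem 3 is plausible is structural: every conjunction other than X₁X₂X₃ that is complete at (1, 1, 1) omits at least one exposure, so it is also complete at one of (1, 1, 0), (1, 0, 1) or (0, 1, 1). Hence, if the X₁X₂X₃ cause never operates, D₁₁₁ ≤ D₁₁₀ + D₁₀₁ + D₀₁₁ for every realization. The sketch below checks the structural claim exactly and then spot-checks the inequality on random configurations; it is an illustration under these stated assumptions, not the argument of VanderWeele and Robins (2008), and all names are hypothetical.

```python
import random
from itertools import product

# Each exposure enters as itself (1), its complement (0), or not at all (None).
CONJUNCTIONS = list(product((None, 1, 0), repeat=3))
JOINT = (1, 1, 1)                      # the X1 X2 X3 conjunction
LOWER = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]


def complete(conj, x):
    """True when exposure values x satisfy every literal of the conjunction."""
    return all(lit is None or lit == xi for lit, xi in zip(conj, x))


def covered_by_lower_points():
    """Every non-joint conjunction complete at (1,1,1) is complete at some LOWER point."""
    return all(
        any(complete(c, x) for x in LOWER)
        for c in CONJUNCTIONS
        if c != JOINT and complete(c, (1, 1, 1))
    )


def outcome(b, x):
    """D_x as the OR of B_i over the conjunctions complete at x (b maps conjunction to 0/1)."""
    return int(any(b[c] and complete(c, x) for c in CONJUNCTIONS))


def sampled_max_contrast(draws=2000, seed=0):
    """Largest D111 - D110 - D101 - D011 seen over random configurations with B_joint = 0."""
    rng = random.Random(seed)
    worst = -3
    for _ in range(draws):
        b = {c: rng.randint(0, 1) for c in CONJUNCTIONS}
        b[JOINT] = 0
        value = outcome(b, (1, 1, 1)) - sum(outcome(b, x) for x in LOWER)
        worst = max(worst, value)
    return worst
```

The sampled maximum should never exceed zero; the random draws are only a sanity check, while the structural function is what carries the argument.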

Note that if the effects of {X₁, X₂} or {X₁, X₂, X₃} on D are unconfounded, then the conditions given in Theorems 1–4 are simply conditions (1)–(4) given in the introduction. We thus have shown that the empirical conditions given by VanderWeele and Robins (2008) that suffice to conclude the presence of a sufficient cause interaction in the deterministic sufficient cause framework suffice also to conclude the presence of a sufficient cause interaction in the stochastic sufficient cause framework. If the effects of {X₁, X₂} or {X₁, X₂, X₃} on D are unconfounded conditional on some set of covariates C, then conditions (1)–(4) and Theorems 1–4 can also be made conditional on C. Conditions for n-way sufficient cause interactions have been derived elsewhere (VanderWeele and Richardson (2011)). By the arguments above, these conditions for n-way interactions also imply the presence of n-way sufficient cause interactions in the stochastic setting. In the appendix we discuss a stochastic version of recent work on the sufficient cause model in which the exposures are categorical or ordinal with more than two levels.

3. Genetics Applications Revisited

In this section we discuss two genetic studies (Bennett et al. (1999), Zhang et al. (2005)). In other work (VanderWeele, Hernández-Diaz, and Hernán (2010), VanderWeele (2010)), it was shown that there is evidence in these two studies of a sufficient cause interaction within a deterministic sufficient cause framework. Here we revisit these examples in light of the stochastic sufficient cause framework.

Bennett et al. (1999) studied the interaction between passive smoking, X₁, and glutathione S-transferase M1 (GSTM1), X₂, on lung cancer risk, D, among non-smokers. The authors used a case-only design with 106 lung cancer cases and logistic regression, controlling for age, history of non-neoplastic lung disease, radon exposure, and intake of saturated fat and vegetables (denoted here by C), to estimate that $\frac{P (D \mid X_{1} = 1, X_{2} = 1, C = c) P (D \mid X_{1} = 0, X_{2} = 0, C = c)}{P (D \mid X_{1} = 1, X_{2} = 0, C = c) P (D \mid X_{1} = 0, X_{2} = 1, C = c)} = 2.6$ (95% CI: 1.1–6.1). It can be shown (VanderWeele (2009)) that, under monotonicity, a value of this ratio greater than 1 implies condition (2), i.e., in the notation of the introduction, $p_{11 c} - p_{10 c} - p_{01 c} + p_{00 c} > 0$. Since even the lower bound of the confidence interval for the ratio is greater than 1, VanderWeele et al. (2010) argued that there was evidence of a sufficient cause interaction within the deterministic sufficient cause framework if it could be assumed that the effects of passive smoking and glutathione S-transferase M1 (GSTM1) on lung cancer risk are monotonic. By the results in the previous section, since condition (2) is satisfied, if the effects of passive smoking and glutathione S-transferase M1 (GSTM1) are monotonic in the stochastic sufficient cause sense, then one can also conclude that a sufficient cause interaction is present within the stochastic sufficient cause framework. 
In other words, even if we relax the requirement that the completion of a particular sufficient cause inevitably gives rise to the outcome and assume it does so only with some probability, we still have evidence for a mechanistic interaction between the effects of passive smoking and glutathione S-transferase M1 (GSTM1) on lung cancer.
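The step from a multiplicative ratio above 1 to the additive condition (2) can be motivated as follows; this is a sketch and not necessarily the argument given in VanderWeele (2009). If p₁₁p₀₀ > p₁₀p₀₁ then p₀₀ > 0, and so p₁₁ − p₁₀ − p₀₁ + p₀₀ > (p₁₀ − p₀₀)(p₀₁ − p₀₀)/p₀₀, which is non-negative when the probabilities are non-decreasing in each exposure. The code below searches a grid of probabilities for a counterexample; the grid resolution and function name are assumptions for illustration.

```python
from itertools import product


def counterexample_on_grid(steps=10):
    """Search probabilities k/steps for a case where monotonicity and ratio > 1 hold
    but condition (2) fails; return the offending tuple, or None if there is none.

    Integer numerators are used throughout so that comparisons are exact.
    """
    ks = range(steps + 1)
    for k11, k10, k01, k00 in product(ks, repeat=4):
        monotone = k11 >= k10 >= k00 and k11 >= k01 >= k00
        # Ratio p11*p00 / (p10*p01) > 1, cross-multiplied to avoid dividing by zero.
        ratio_above_one = k11 * k00 > k10 * k01
        condition_2 = k11 - k10 - k01 + k00 > 0
        if monotone and ratio_above_one and not condition_2:
            return (k11, k10, k01, k00)
    return None


result = counterexample_on_grid()
```

No counterexample exists on the grid, consistent with the inequality above. The converse does not hold in general: condition (2) can be satisfied with a ratio at or below 1, so the ratio test is sufficient for condition (2) under monotonicity but not necessary.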

Zhang et al. (2005) studied the lung cancer risk associated with ADPRT Val762Ala and XRCC1 Arg399Gln polymorphisms using a case-control study design. Let the ADPRT Val/Val, Val/Ala, and Ala/Ala genotypes be denoted by V₁ = 0, V₁ = 1 and V₁ = 2, respectively. Let the XRCC1 Arg/Arg, Arg/Gln, and Gln/Gln genotypes be denoted by V₂ = 0, V₂ = 1 and V₂ = 2, respectively. Using logistic regression controlling for sex, age and smoking status (denoted here by C), the authors tested the null hypothesis that $\frac{P (D \mid V_{1} = 2, V_{2} = 2, C = c) P (D \mid V_{1} = 0, V_{2} = 0, C = c)}{P (D \mid V_{1} = 2, V_{2} = 0, C = c) P (D \mid V_{1} = 0, V_{2} = 2, C = c)} = 1$ and obtained a p-value of 0.018, indicating that the ratio is greater than 1. VanderWeele (2010) showed that this would imply the empirical condition $p_{22 c} - p_{20 c} - p_{02 c} + p_{00 c} > 0$ which, under the assumption that V₁ and V₂ have monotonic effects on D, implies that a sufficient cause containing the term 1(V₁ = 2)1(V₂ = 2) must be present in a deterministic sufficient cause framework. The results in this paper and in the appendix imply that, provided that V₁ and V₂ have monotonic effects on D in the stochastic sufficient cause sense, one also has evidence for a sufficient cause containing the term 1(V₁ = 2)1(V₂ = 2) even in the stochastic sufficient cause setting.

4. Concluding Remarks

In this paper we have considered settings in which the outcome for each individual under each possible set of exposures follows a probability distribution so that, for any given exposure combination, outcomes vary not only between individuals but also within individuals thereby giving rise to stochastic counterfactual outcomes. This additional level of random variation may be seen as desirable. Although there is already a small literature on stochastic counterfactuals (Greenland (1987), Robins and Greenland (1989, 2000)), the literature on the sufficient cause framework to date concerns the deterministic setting. The definitions and results given here provide a stochastic sufficient cause framework, and relate stochastic counterfactuals to stochastic sufficient causes. In particular we have shown that, under the assumption of no unmeasured confounding, it is possible to empirically test for the joint presence of two causes in the same sufficient cause or causal mechanism under a stochastic sufficient cause and stochastic counterfactual framework.

Developments during the last century in quantum physics suggest that the world may be inherently probabilistic. Similar ideas may be found in the medical literature (Elwood (1988), Karhausen (2001)). Extending the theory of sufficient causes to a stochastic setting may thus constitute an important step towards conceptualizing causation in a manner more consistent with physical realities. We have shown that regardless of whether the underlying causal mechanisms are deterministic or stochastic, the same empirical conditions can be used to test for sufficient cause interactions. Our results did not require that the stochastic event of a particular mechanism bringing about the outcome be independent of the stochastic event of some other mechanism bringing about the outcome. These developments are furthermore important from a philosophical point of view. Because counterfactual outcomes cannot be simultaneously observed, assumptions about them cannot be empirically verified; it is therefore important that assumptions made about counterfactuals be as general as possible. It is thus of interest that the conditions for sufficient cause interactions also hold under a stochastic counterfactual and stochastic sufficient cause setting.

Appendix

In this appendix, we discuss how the approach to stochastic sufficient causes described in the paper applies also to the sufficient cause setting when the variables are categorical or ordinal. For illustration we consider a setting in which there are two variables, V₁ and V₂, each with three possible levels: 0, 1, 2. The remarks apply more generally. The counterfactual $D_{v_1 v_2}(\omega)$ is a Bernoulli random variable with probability $p_{v_1 v_2}(\omega)$; the probabilities $p_{v_1 v_2}(\omega)$ are allowed to vary with ω. For simplicity assume Ω is finite. Let 1(V = v) be the indicator function that V = v and take 1(V = *) ≡ 1. A sufficient cause is of the form $A_{ij} 1(V_1 = i) 1(V_2 = j)$, i ∈ {0, 1, 2, *}, j ∈ {0, 1, 2, *}, where the $A_{ij}$ variables are not affected by interventions on {V₁, V₂}. For individual ω, if $A_{ij} 1(V_1 = i) 1(V_2 = j) = 1$, then there is some probability $v_{ij}(\omega)$ that the sufficient cause brings about the outcome. The probabilities $v_{ij}(\omega)$ are allowed to vary with ω. For a set of binary variables $\{A_{ij}\}_{i \in \{0,1,2,*\},\, j \in \{0,1,2,*\}}$ which are not affected by interventions on {V₁, V₂}, and a set of Bernoulli random variables $\{R_{ij}(\omega)\}_{\omega \in \Omega,\, i \in \{0,1,2,*\},\, j \in \{0,1,2,*\}}$ with probabilities $\{v_{ij}(\omega)\}_{\omega \in \Omega,\, i \in \{0,1,2,*\},\, j \in \{0,1,2,*\}}$, we say that $\bigvee_{i \in \{0,1,2,*\},\, j \in \{0,1,2,*\}} A_{ij} R_{ij} I(v_1 = i) I(v_2 = j)$ constitutes a stochastic sufficient cause representation if for all ω and all v₁ and v₂ we have $D_{v_1 v_2}(\omega) = \bigvee_{i \in \{0,1,2,*\},\, j \in \{0,1,2,*\}} A_{ij}(\omega) R_{ij}(\omega) I(v_1 = i) I(v_2 = j)$. For any given set of stochastic counterfactuals $\{D_{v_1 v_2}(\omega)\}_{\omega \in \Omega}$, there always exists at least one stochastic sufficient cause representation since for each conjunction $I(v_1 = i) I(v_2 = j)$, i ∈ {0, 1, 2, *}, j ∈ {0, 1, 2, *}, we may take $A_{ij} = 1$ if i ≠ * and j ≠ * and $A_{ij} = 0$ otherwise, and we may define the Bernoulli random variables $\{R_{ij}(\omega)\}_{\omega \in \Omega}$ by $R_{ij}(\omega) = 0$ for all ω if $A_{ij} = 0$ and, for i, j such that $A_{ij} = 1$, $R_{ij}(\omega) = D_{ij}(\omega)$.

We say V₁ (or V₂) has a positive monotonic effect in the stochastic counterfactual sense if for all ω, $D_{v_1 v_2}(\omega)$ is non-decreasing in v₁ (or v₂ respectively) for all points in the sample space for $D_{v_1 v_2}(\omega)$. We say that there is a weak (cf. VanderWeele (2010)) sufficient cause interaction between $I(v_1 = i)$ and $I(v_2 = j)$ if in every stochastic sufficient cause representation for D there exists a sufficient cause $A_{ij} I(v_1 = i) I(v_2 = j)$ such that $P(A_{ij} R_{ij} = 1) > 0$.
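
The existence argument above can be checked numerically. The following is a minimal Python sketch, not part of the original paper: the function names, the dictionary encoding of the indices and the probability values used when calling it are illustrative assumptions. It draws one realization of the counterfactuals for a single individual, builds the representation with A_ij = 1 and R_ij = D_ij whenever neither index is *, and confirms that the disjunction reproduces every counterfactual.

```python
import itertools
import random

LEVELS = (0, 1, 2)
STAR = "*"  # wildcard index: the indicator 1(V = *) is identically 1
INDEX = LEVELS + (STAR,)


def indicator(v, i):
    """Return 1 if v equals level i, or if i is the wildcard '*'."""
    return 1 if i == STAR or v == i else 0


def draw_counterfactuals(p, rng):
    """Draw one Bernoulli counterfactual D_{v1 v2} per exposure combination.

    p maps (v1, v2) to the individual's outcome probability p_{v1 v2}.
    """
    return {(v1, v2): int(rng.random() < p[(v1, v2)])
            for v1, v2 in itertools.product(LEVELS, LEVELS)}


def trivial_representation(d):
    """Build the representation used in the existence argument.

    A_ij = 1 and R_ij = D_ij when neither index is '*';
    otherwise A_ij = 0 and R_ij = 0.
    """
    a, r = {}, {}
    for i, j in itertools.product(INDEX, INDEX):
        concrete = i != STAR and j != STAR
        a[(i, j)] = 1 if concrete else 0
        r[(i, j)] = d[(i, j)] if concrete else 0
    return a, r


def evaluate(a, r, v1, v2):
    """Evaluate the disjunction over (i, j) of A_ij R_ij I(v1 = i) I(v2 = j)."""
    return max(a[k] * r[k] * indicator(v1, k[0]) * indicator(v2, k[1])
               for k in a)


def check_individual(p, rng):
    """Return True if the representation reproduces every D_{v1 v2}."""
    d = draw_counterfactuals(p, rng)
    a, r = trivial_representation(d)
    return all(evaluate(a, r, v1, v2) == d[(v1, v2)]
               for v1, v2 in itertools.product(LEVELS, LEVELS))
```

Calling `check_individual(p, random.Random(0))` with any dictionary `p` of probabilities keyed by `(v1, v2)` exercises the construction for one simulated individual. The check holds by construction; the sketch only makes the bookkeeping of the wildcard index concrete.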

Without the assumption of monotonicity, the proof used for Theorem 1 again applies here: by letting $B_{ij} = A_{ij} R_{ij}$ we can express the expectation of a counterfactual contrast in terms of the probabilities $b_{ij} = E(B_{ij})$. If in this expression we replace $b_{ij}$ with $a_{ij}$, we obtain the same expression obtained in the deterministic case by taking the expectation of the counterfactual contrast and expressing it in terms of the probabilities $a_{ij} = P(A_{ij} = 1)$, and from results in the deterministic case (VanderWeele (2010)) it then follows that if the expectation of certain counterfactual contrasts is positive, then some $a_{ij}$ corresponding to a sufficient cause interaction must be non-zero. It thus also follows in the stochastic setting that if the expectation of the same counterfactual contrasts is positive then some $b_{ij}$ corresponding to a sufficient cause interaction must also be non-zero. Since $b_{ij} \neq 0$, we have $E(B_{ij}) > 0$, and thus $E_{\Omega}(A_{ij} R_{ij}) > 0$ and $P(A_{ij} R_{ij} = 1) > 0$, so there must be a stochastic sufficient cause interaction. In the case that one or both of V₁ and V₂ have a positive monotonic effect in the stochastic counterfactual sense, then using arguments similar to those in the deterministic case (VanderWeele (2010)), the empirical conditions under monotonicity implying the existence of an individual ω with a deterministic sufficient cause interaction imply also an individual ω with $A_{ij}(\omega) = 1$ and $v_{ij}(\omega) > 0$, and thus a stochastic sufficient cause interaction.

References

Bennett WP, Alavanja MCR, Blomeke B, Vähäkangas KH, Castrén K, Welsh JA, Bowman ED, Khan MA, Flieder DB, Harris CC. Environmental tobacco smoke, genetic susceptibility, and risk of lung cancer in never-smoking women. J Natl Cancer Inst. 1999;91:2009–2014. doi: 10.1093/jnci/91.23.2009.
Elwood JM. Causal Relationships in Medicine. Oxford University Press; Oxford: 1988.
Greenland S. Interpretation and choice of effect measures in epidemiologic analyses. Am J Epidemiol. 1987;125:761–8. doi: 10.1093/oxfordjournals.aje.a114593.
Karhausen LR. Exposures, mutations and the history of causality. J Epidemiol Community Health. 2001;55:607. doi: 10.1136/jech.55.8.607.
Robins JM, Greenland S. The probability of causation under a stochastic model for individual risk. Biometrics. 1989;45:1125–38.
Robins JM, Greenland S. Comment on: “Causal inference without counterfactuals” by A.P. Dawid. J Am Statist Assoc. 2000;95:477–82.
Rothman KJ. Causes. Am J Epidemiol. 1976;104:587–92. doi: 10.1093/oxfordjournals.aje.a112335.
Rothman KJ, Greenland S. Modern Epidemiology. Lippincott-Raven; Philadelphia: 1998.
VanderWeele TJ. Sufficient cause interactions and statistical interactions. Epidemiol. 2009;20:6–13. doi: 10.1097/EDE.0b013e31818f69e7.
VanderWeele TJ. Sufficient cause interactions for categorical and ordinal exposures with three levels. Biometrika. 2010;97:647–659. doi: 10.1093/biomet/asq030.
VanderWeele TJ, Hernández-Diaz S, Hernán MA. Case-only gene-environment interaction studies: when does association imply mechanistic interaction? Genetic Epidemiology. 2010;34:327–334. doi: 10.1002/gepi.20484.
VanderWeele TJ, Richardson TS. General theory for sufficient cause interactions for dichotomous exposures. Annals of Statistics. 2011. Conditionally accepted. doi: 10.1214/12-aos1019.
VanderWeele TJ, Robins JM. The identification of synergism in the sufficient-component cause framework. Epidemiol. 2007;18:329–39. doi: 10.1097/01.ede.0000260218.66432.88.
VanderWeele TJ, Robins JM. Empirical and counterfactual conditions for sufficient cause interactions. Biometrika. 2008;95:49–61.
Zhang X, Miao X, Liang G, Hao B, Wang Y, Tan W, Li Y, Guo Y, He F, Wei Q, Lin D. Polymorphisms in DNA base excision repair genes ADPRT and XRCC1 and risk of lung cancer. Can Res. 2005;65:722–726.
