Abstract
Haavelmo's seminal 1943 and 1944 papers are the first rigorous treatment of causality. In them, he distinguished the definition of causal parameters from their identification. He showed that causal parameters are defined using hypothetical models that assign variation to some of the inputs determining outcomes while holding all other inputs fixed. He thus formalized and made operational Marshall's (1890) ceteris paribus analysis. We embed Haavelmo's framework into the recursive framework of Directed Acyclic Graphs (DAGs) used in one influential recent approach to causality (Pearl, 2000) and in the related literature on Bayesian nets (Lauritzen, 1996). We compare the simplicity of an analysis of causality based on Haavelmo's methodology with the complex and nonintuitive approach used in the causal literature of DAGs—the “do-calculus” of Pearl (2009). We discuss the severe limitations of DAGs and in particular of the do-calculus of Pearl in securing identification of economic models. We extend our framework to consider models for simultaneous causality, a central contribution of Haavelmo. In general cases, DAGs cannot be used to analyze models for simultaneous causality, but Haavelmo's approach naturally generalizes to cover them.
Keywords: Causality, Identification, Do-Calculus, Directed Acyclic Graphs, Simultaneous Treatment Effects
1 Trygve Haavelmo's Causality
Trygve Haavelmo made fundamental contributions to understanding the formulation and identification of causal models. In two seminal papers (1943, 1944), he formalized the distinction between correlation and causation,1 laid the foundation for counterfactual policy analysis, and distinguished the concept of "fixing" from the statistical operation of conditioning—a central tenet of structural econometrics. He developed an empirically operational version of Marshall's (1890) notion of ceteris paribus, which is a central notion of economic theory, even though Haavelmo never explicitly used that terminology.
In Haavelmo's framework, the causal effects of inputs on outputs are determined by the impacts of hypothetical manipulations of inputs on outputs which he distinguishes from correlations between inputs and outputs in observational data. The causal effect of an input is defined using a hypothetical model that abstracts from the empirical data generating process by making hypothetical variation in inputs that are independent of all other determinants of outputs. As a consequence, Haavelmo's notion of causality relies on a thought experiment in which the model that governs the observed data is extended to allow for independent manipulation of inputs, irrespective of whether or not they vary independently in the data.
Haavelmo formalized Frisch's notion that “causality is in the mind.”2 Causal effects are not empirical statements or descriptions of actual worlds, but descriptions of hypothetical worlds obtained by varying—hypothetically—the inputs determining outcomes. Causal relationships are often suggested by observed phenomena, but they are abstractions from it.3
This paper revisits Haavelmo's notions of causality using the mathematical language of Directed Acyclic Graphs (DAGs). We start with a recursive framework less general than that of Haavelmo (1943). This allows us to represent causal models as Directed Acyclic Graphs which are intensively studied in the literature on Bayesian networks (Howard and Matheson, 1981; Lauritzen, 1996; Pearl, 2000). We then consider the general non-recursive framework of Haavelmo (1943, 1944) which cannot, in general, be framed as DAGs.
Following Haavelmo, we define hypothetical models that are used to generate causal parameters as idealizations of empirical models that govern the data generating processes. This facilitates discussion of causal concepts such as “fixing” using an intuitive approach that draws on Haavelmo's notion of causality. Identification relies on linking the parameters defined in a hypothetical model using data generated by an empirical model.
This paper makes the following contributions to the literature on causality: (1) We build a framework for the study of causality inspired by Haavelmo's concept of hypothetical variation of inputs. (2) In doing so, we express Haavelmo's notion of causality in the mathematical language of DAGs. (3) For this class of models, we compare the simplicity of Haavelmo's framework with the cumbersome and nonintuitive causal framework for the do-calculus proposed by Pearl (2000), which is beginning to be used in economics (see, e.g., Margolis et al., 2012; White and Chalak, 2009). (4) We discuss the limitations of the use of DAGs for econometric identification. We show that even in recursive models, the methods that rely solely on the information in DAGs do not exploit identification strategies based on functional restrictions and exclusion restrictions that are generated by economic theory. This limitation produces apparent non-identification in classically identified econometric models. We show how Haavelmo's approach naturally extends to notions of simultaneous causality while the DAG approach is fundamentally recursive.
Our paper is on the methodology of causality. We do not create a new concept of causality, but rather propose a new framework within which to discuss it. We show that Haavelmo's approach is a complete framework for the study of causality that accommodates the main tools of identification used in the current literature in econometrics, whereas an approach based on DAGs does not.
We show that the causal operation of fixing described in Haavelmo (1943) and Heckman (2005, 2008) is equivalent to statistical conditioning when embedded in a hypothetical model that assigns independent variation to inputs with regard to all variables not caused by those inputs. Pearl (2009) uses the term do for the concept of fixing a variable. We show the relationship between statistical conditioning in a hypothetical model and the do-operator. Fixing, in our framework, differs from the operation of the do-operator because it targets specific causal links instead of variables that operate across multiple causal links. A benefit of targeting causal links is that it simplifies the analysis of the subsets of causal relationships associated with an input variable when compared to the do-operator. An analysis of causality based on Haavelmo's approach uses standard rules of probability to define and analyze causal parameters. In contrast, the complex and nonintuitive analysis of Pearl invents new and cumbersome nonstandard rules for analyzing causal models that do not cover a variety of models identified under the Haavelmo approach.
Haavelmo's approach allows for a precise yet intuitive definition of causal effects. With it, analysts can identify causal effects by applying standard statistical tools. In contrast with the do-calculus, application of Haavelmo's concepts eliminates the need for additional extra-statistical graphical/statistical rules to achieve identification of causal parameters. Haavelmo's approach also covers the case of simultaneous causality in its full generality whereas frameworks for causal analysis currently used in statistics cannot, except through introduction and application of ad hoc rules.
This paper is organized in the following way. Section 2 reviews Haavelmo's causal framework. Section 3 uses a modern framework of causality to assess Haavelmo's contributions to the literature. Section 4 examines how application of this framework differs from Pearl's (2009) do-calculus and enables analysts to apply the standard tools of probability and statistics without having to invent new extra-statistical rules. It gives an example of the identification of causal effects that considers Pearl's "Front-Door" criterion and the nonidentifiability of the instrumental variables model using the rules of the do-calculus. Section 5 discusses the limitations of DAGs in implementing the variety of sources of identification available to economists. We focus on the simplest cases of confounding models where instrumental variables are available. Section 6 extends the discussion to a simultaneous equations framework. Section 7 concludes.
2 Haavelmo's Causal Framework
We review the key concepts of causality developed by Haavelmo (1943, 1944)—starting with a recursive model. A causal model is based on a system of structural equations that define causal relationships among a set of variables. In the language of Frisch (1938), these structural equations are autonomous mechanisms represented by deterministic functions mapping inputs to outputs. By autonomy we mean, as did Frisch, that these relationships remain invariant under external manipulations of their arguments. They are functions in the ordinary usage of the term in mathematics. They produce the same values of the outcomes when inputs are assigned to a fixed set of values, however those values are determined. Even though the functional form of a structural equation may be unknown, the causal directions among the variables of a structural equation are assumed to be known. They are determined by thought experiments that may sometimes be validated in data. The variables chosen as arguments in a structural equation are assumed to account for all causes of the associated output variable.
Haavelmo developed his work on causality for aggregate economic models. He considered mean causal effects and, for the sake of simplicity, invoked linearity, assumed uniformity of responses to inputs across agents, and focused on continuous variables. More recent approaches generalize his framework.
Haavelmo formalized the distinction between correlation and causation using a simple model. In order to examine his ideas, consider three variables Y, X, U associated with error terms ε = (εU, εX, εY) such that X, Y are observed by the analyst while variables U, ε are not.4 He assumed that U is a confounding variable that causes Y and X. We represent this model through the following structural equations:
Y = fY(X, U, εY),  X = fX(U, εX),  U = fU(εU),

where ε is a vector of mutually independent error terms with cumulative distribution function Qε. Thus, if X, U, εY take values of x, u, eY, then Y must take the value y = fY(x, u, eY). By iterated substitution we can express all variables in terms of ε. Moreover, the mutual independence assumption of error terms implies that εY is independent of (X, U) as X = fX(fU(εU), εX) and U = fU(εU). Notationally, we write (X, U) ╨ εY, where ╨ denotes statistical independence. In the same fashion, we have that εX ╨ U but X is not independent of εU.
Haavelmo defines the causal effect of X on Y as being generated by a hypothetical manipulation of variable X that does not affect the values that U or ε take. This is called fixing X by a hypothetical manipulation.5 Notationally, outcome Y when X is fixed at x is denoted by Y(x) = fY(x, U, εY) and its expectation is given by E(U,εY)(Y(x)) = E(U,εY)(fY(x, U, εY)), where E(U,εY) means expectation over the distribution of the random variables U and εY. The average causal effect of X on Y when X takes values x and x′ is given by E(Y(x)) − E(Y(x′)). For notational simplicity, we henceforth suppress the subscript on E denoting the random variable with respect to which the expectation is computed.
Conditioning is a statistical operation that accounts for the dependence structure in the data. Fixing is an abstract operation that assigns independent variation to the variable being “fixed.” The standard linear regression framework is convenient for illustrating these ideas and in fact is the one used by Haavelmo (1943).
Consider the standard linear model Y = Xβ + U + εY, which represents the data generating process for Y. The expectation of outcome Y when X is fixed at x is given by E(Y(x)) = xβ + E(U). This equation corresponds to Haavelmo's (1943) hypothetical model. The expectation of Y when X is conditioned on X = x is given by E(Y|X = x) = xβ + E(U|X = x), because εY ╨ X. If cov(X, U) = 0 and elements of X are not collinear, then OLS identifies β, and β generates a causal parameter: the average treatment effect of a change in X on Y. Specifically, (x − x′)β is the average difference between the expectation of Y when X is fixed at x and x′.
The difficulty of identifying the average causal effect of X on Y when cov(X, U) ≠ 0 (and thereby plim(β̂) ≠ β) stems from the potential confounding effects of the unobserved variable U on X. In this case, the standard Least Squares estimator does not generate an autonomous causal or structural parameter because plim(β̂) = β + cov(X, U)/var(X) depends on the covariance between X and U. While the concept of a causal effect does not rely on the properties of the data generating process, the identification of causal effects does.
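To see this numerically, consider a minimal simulation sketch (the coefficients, error distributions, and sample size are our own illustrative choices, not part of Haavelmo's analysis):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
beta = 2.0

# Confounded data generating process: U causes both X and Y.
U = rng.normal(size=n)
X = 0.8 * U + rng.normal(size=n)          # X = f_X(U, eps_X)
Y = beta * X + U + rng.normal(size=n)     # Y = X*beta + U + eps_Y

# Least squares slope of Y on X.
b_ols = np.cov(X, Y)[0, 1] / np.var(X)

# Its probability limit: beta + cov(X, U)/var(X).
plim = beta + np.cov(X, U)[0, 1] / np.var(X)

print(f"OLS slope:        {b_ols:.3f}")   # about 2.49 -- biased
print(f"beta + cov/var:   {plim:.3f}")    # matches the OLS slope
print(f"true causal beta: {beta:.3f}")    # 2.0
```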
Without linearity, one needs an assumption stronger than cov(X, U) = 0 to obtain E(Y|X = x) = E(Y(x)). Indeed, if one assumes no confounding effects of U, that is to say that X and U are independent (X ╨ U), then one can show that fixing is equivalent to statistical conditioning:

E(Y|X = x) = ∫ fY(x, u, eY) dQ(U,εY)|X=x(u, eY) = ∫ fY(x, u, eY) dQ(U,εY)(u, eY) = E(Y(x)),

where Q(U,εY)|X=x(u, eY) denotes the cumulative joint distribution function of (U, εY) conditional on X = x, and the second equality comes from the fact that U, X and εY are mutually independent. If X ╨ (U, εY) holds, we can use observational data to identify the mean value of Y fixing X = x by evaluating the expected value of Y conditional on X = x. Note that in general, the value obtained depends on the functional form of fY(x, u, εY).
Haavelmo's notation has led to some confusion in the statistical literature. His argument was aimed at economists of the 1940s and does not use modern notation. Haavelmo's key definitions and ideas are given by examples rather than by formal definitions. We restate and clarify his framework in this paper.
To simplify the exposition, assume that all variables are discrete and let Pr denote their probability measure. The factorization of the joint distribution of Y, U conditional on X is given by Pr(Y, U|X = x) = Pr(Y|U, X = x)Pr(U|X = x). In contrast, the abstract operation of fixing X is assumed not to affect the marginal distribution of U. That is to say, U(x) = U. Therefore the joint distribution of Y, U when X is fixed at x is given by Pr(Y(x), U(x)) = Pr(Y(x), U) = Pr(Y|U, X = x)Pr(U).
Fixing lies outside the scope of standard statistical theory and is often a source of confusion. Indeed, even though the probabilities Pr(Y|U, X = x) and Pr(U) are well defined, neither the causal operation of fixing nor the resulting joint distribution follow from standard statistical arguments.6 Conditioning is equivalent to fixing under independence of X and U. In this case the conditional joint distribution of Y and U becomes Pr(Y, U|X = x) = Pr(Y|U, X = x)Pr(U|X = x) = Pr(Y|U, X = x)Pr(U).
To gain more intuition on the difference between fixing and conditioning, express the conditional expectation as an integral across ε over a restricted set 𝒜C. By iterated substitution, we can write Y as Y = fY(fX(fU(εU), εX), fU(εU), εY). Thus
E(Y|X = x) = ∫𝒜C fY(x, fU(εU), εY) dQε(ε | 𝒜C),   (1)
𝒜C = {ε : fX(fU(εU), εX) = x}.   (2)
Fixing, on the other hand, is written as the integral across ε over its full support:
E(Y(x)) = ∫𝒜F fY(x, fU(εU), εY) dQε(ε),   (3)
𝒜F = supp(ε).   (4)
Fixing differs from conditioning in terms of the difference in the integration sets 𝒜F and 𝒜C. While conditional expectation (1) is a standard operation in statistics, the operation used to define fixing is not. Equation (1) is an expectation conditional on the event fX(fU(εU), εX) = x, which affects the integration set 𝒜C given in (2). Fixing (3), on the other hand, integrates the function fY(x, fU(εU), εY) across the whole support of ε given in (4). The inconsistency between fixing and conditioning in the general case comes from the fact that fixing X is equivalent to setting the expression fX(fU(εU), εX) to x without changing the probability measures of εU, εX associated with the operation of conditioning on the event X = x.
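The difference between the integration sets 𝒜C and 𝒜F can be made concrete by simulation. In the sketch below (the functional forms fU, fX, fY and the binary support of X are our own illustrative assumptions), conditioning averages Y only over the draws of ε for which fX(fU(εU), εX) = 1, while fixing evaluates fY(1, fU(εU), εY) over every draw:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Illustrative structural equations.
def f_U(eU):        return eU
def f_X(u, eX):     return (u + eX > 0).astype(float)  # binary X, confounded by U
def f_Y(x, u, eY):  return np.exp(0.5 * x) + u + eY    # nonlinear in X

eU, eX, eY = rng.normal(size=(3, n))
U = f_U(eU)
X = f_X(U, eX)
Y = f_Y(X, U, eY)

# Conditioning: average over the restricted set A_C = {eps : f_X(f_U(eU), eX) = 1}.
cond = Y[X == 1.0].mean()
# Fixing: evaluate f_Y(1, U, eY) over the full support of eps (the set A_F).
fix = f_Y(1.0, U, eY).mean()

print(f"E(Y | X = 1) by conditioning: {cond:.3f}")  # shifted by E(U | X = 1) > 0
print(f"E(Y(1))      by fixing:       {fix:.3f}")   # about exp(0.5) = 1.649
```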
This paper interprets Haavelmo's approach by introducing a hypothetical model that enables analysts to examine fixing using standard tools of probability. The hypothetical model departs from the data generating process by exploiting autonomy and creating a hypothetical variable that has the desired property of independent variation with regard to U. The hypothetical model is an idealization of the empirical model. Standard statistical tools apply to both the data generating process and the hypothetical model.
To formalize Haavelmo's notions of causality, consider a hypothetical model with error terms ε and four variables: Y, X, U, and a new variable X̃ with the property that X̃ ╨ (X, U, ε).7 Invoking autonomy, the hypothetical model shares the same structural equation as the empirical one but departs from it by replacing the X-input with an X̃-input, namely Y = fY(X̃, U, εY). The hypothetical model is not a wildly speculative departure from the empirical data generating process but an expanded version of it. Thus (Y|X = x, U = u) = fY(x, u, εY) in the empirical model and (Y|X̃ = x, U = u) = fY(x, u, εY) in the hypothetical model. The hypothetical model has the same marginal distribution of U as the empirical model. The joint distributions of variables in the empirical model PrE and the hypothetical model PrH may differ.
The hypothetical model clarifies the notion of fixing in the empirical model. Fixing in the empirical model is based on non-standard statistical operations. However, the distribution of the outcome Y when X is fixed at x in the empirical model can be interpreted as standard statistical conditioning in the hypothetical model, namely, PrE(Y(x)) = PrH(Y|X̃ = x). The next section formalizes these ideas using one modern language of causality.8
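Before turning to that formalization, the equality PrE(Y(x)) = PrH(Y|X̃ = x) can be checked numerically by continuing the illustrative sketch above (the block below reuses n, rng, U, eY, and f_Y from it):

```python
# Hypothetical model: append X_tilde, independent of (X, U, eps), and reuse
# the SAME autonomous structural equation f_Y.
X_tilde = rng.binomial(1, 0.5, size=n).astype(float)
Y_hyp = f_Y(X_tilde, U, eY)  # Y = f_Y(X_tilde, U, eps_Y)

# Standard statistical conditioning in the hypothetical model reproduces
# fixing in the empirical model: Pr_E(Y(1)) = Pr_H(Y | X_tilde = 1).
print(f"E_H(Y | X_tilde = 1): {Y_hyp[X_tilde == 1.0].mean():.3f}")  # ~1.649, equals E(Y(1))
```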
3 Recasting Haavelmo's Ideas
We recast Haavelmo's model in the framework of Directed Acyclic Graphs (DAGs). DAGs are studied in Bayesian Networks (Howard and Matheson, 1981; Lauritzen, 1996) and are often used to define and estimate causal relationships (Lauritzen, 2001). The literature on causality based on DAGs was advanced by Judea Pearl (2000, 2009).9
In this fundamentally recursive framework, a causal model consists of a set of variables 𝒯 = {T1, …, Tn} associated with a set of mutually independent error terms ε = {ε1, …, εn} and a system of autonomous structural equations {f1, …, fn}. Variable set 𝒯 includes both observed and unobserved variables. It also includes both external and internal variables. We clarify these concepts in the following way.
Causal relationships between a dependent variable Ti ∈ 𝒯 and its arguments are defined by Ti = fi(Pa(Ti), εi), where Pa(Ti) ⊂ 𝒯 and εi ∈ ε are called parents of Ti and are said to directly cause Ti. If Pa(T) = ∅ then variable T is not caused by any variable in 𝒯. In this case, T is an external variable determined outside the system, otherwise the variable is called an internal or endogenous variable. The error terms in ε are not caused by any variable and are introduced to avoid degenerate conditioning statements among variables in 𝒯. For simplicity of notation, we keep the error terms ε implicit, except when it clarifies matters to do so. We assume that all random variables in this section and the next are discrete valued although this requirement is easily relaxed.
Causal relationships are represented by a graph G where each node corresponds to a variable T ∈ 𝒯. Nodes are connected by arrows from Pa(T) to T and represent causal influences among variables. Descendants of a variable T, i.e., D(T) ⊂ 𝒯, consist of all variables connected to T by arrows of the same direction arising from T. Graph G is called a DAG if no variable is a descendant of itself, i.e., T ∉ D(T), ∀ T ∈ 𝒯. Observe that this assumption rules out simultaneity—a central feature of Haavelmo's approach. Children of a variable T are the set of variables that have T as a parent, namely, Ch(T) = {T′ ∈ 𝒯;T ∈ Pa(T′)}.
Causal relationships are translated into statistical relationships in a DAG through a property termed the Local Markov Condition (LMC) (Kiiveri et al., 1984; Lauritzen, 1996). LMC states that a variable is independent of its non-descendants conditional on its parents. LMC (5) also holds among variables in 𝒯 under the assumption that error terms {ε1, …, εn} are mutually independent (Pearl, 1988; Pearl and Verma, 1994), namely:
T ╨ (𝒯 ∖ (D(T) ∪ Pa(T))) | Pa(T)  for all T ∈ 𝒯.   (5)
We use Dawid's (1979) notation to denote conditional independence. If W, K, Z are subsets of 𝒯, the expression W ╨ K|Z means that each variable in W is statistically independent of each variable in K conditional on all variables in Z. The conditional independence relationships generated by LMC (5) can be further manipulated using the Graphoid relations.10 An important use of LMC (5) is to factorize the joint distribution of variables Pr(T1, …, TN). Under a recursive model, we can assume without loss of generality that the variables (T1, …, TN) are ordered so that, for each n, (T1, …, Tn−1) are non-descendants of Tn and thereby Pa(Tn) ⊂ {T1, …, Tn−1}. Thus,
Pr(T1, …, TN) = ∏_{n=1}^{N} Pr(Tn | T1, …, Tn−1) = ∏_{n=1}^{N} Pr(Tn | Pa(Tn)),   (6)
where the last equality comes from applying LMC (5).
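For discrete variables, factorization (6) is mechanical to compute. The sketch below parameterizes a binary version of Haavelmo's model with toy conditional probability tables (the numbers are arbitrary illustrations) and builds the joint distribution as a product of parental conditionals:

```python
import itertools

# Toy conditional probability tables for the binary Haavelmo model:
# U is external, X <- U, Y <- (X, U). Numbers are illustrative only.
pU = [0.6, 0.4]                    # pU[u] = Pr(U = u)
pX_U = [[0.7, 0.3], [0.2, 0.8]]    # pX_U[u][x] = Pr(X = x | U = u)

def pY_XU(y, x, u):
    """Pr(Y = y | X = x, U = u)."""
    p1 = 0.1 + 0.5 * x + 0.3 * u   # Pr(Y = 1 | x, u)
    return p1 if y == 1 else 1.0 - p1

# Factorization (6): Pr(Y, X, U) = Pr(Y | X, U) Pr(X | U) Pr(U).
joint = {(y, x, u): pY_XU(y, x, u) * pX_U[u][x] * pU[u]
         for y, x, u in itertools.product([0, 1], repeat=3)}

assert abs(sum(joint.values()) - 1.0) < 1e-12  # a proper distribution
print(joint[(1, 1, 0)])  # Pr(Y=1, X=1, U=0) = 0.6 * 0.3 * 0.6 = 0.108
```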
Table 1 uses the Haavelmo model described in Section 2 to illustrate the concepts discussed here. Table 1 presents two models and six panels separated by a series of horizontal lines. The first panel names the models. The second panel presents the structural equations generating the models. Columns 1 and 2 are based on structural equations that have the same functional form, but different inputs. The third panel represents the associated model as a DAG. Squares represent observed variables, circles represent unobserved variables. (Except in the first panel, the components of ε are kept implicit in the table.) The fourth panel displays the parents in 𝒯 for each variable. The fifth panel shows the conditional independence relationships generated by the application of LMC (5), and the sixth and final panel presents the factorization of the joint distribution.
Table 1. Haavelmo Empirical and Hypothetical Models.
| | 1. Haavelmo Empirical Model | 2. Haavelmo Hypothetical Model |
|---|---|---|
| Structural equations | 𝒯 = {U, X, Y}; ε = {εU, εX, εY}; Y = fY(X, U, εY); X = fX(U, εX); U = fU(εU) | 𝒯 = {U, X, Y, X̃}; ε = {εU, εX, εY}; Y = fY(X̃, U, εY); X = fX(U, εX); U = fU(εU) |
| DAG | [DAG: U → X, U → Y, X → Y] | [DAG: U → X, U → Y, X̃ → Y] |
| Parents | Pa(U) = ∅; Pa(X) = {U}; Pa(Y) = {X, U} | Pa(U) = Pa(X̃) = ∅; Pa(X) = {U}; Pa(Y) = {X̃, U} |
| LMC | — | Y ╨ X|(X̃, U); X ╨ (X̃, Y)|U; X̃ ╨ U |
| Factorization | PrE(Y, X, U) = PrE(Y|X, U)PrE(X|U)PrE(U) | PrH(Y, X, U, X̃) = PrH(Y|X̃, U)PrH(X|U)PrH(U)PrH(X̃) |
This table has two columns and six panels separated by horizontal lines. Each column presents a causal model. The first panel names the models. The second panel presents the structural equations generating the models. In this row alone we make ε explicit. In the other rows it is kept implicit to avoid clutter. Columns 1 and 2 are based on structural equations that have the same functional form, but have different inputs. The third panel represents the model as a DAG. Squares represent observed variables, circles represent unobserved variables. The fourth panel presents the parents in 𝒯 of each variable. The fifth panel shows the conditional independence relationships generated by the application of the Local Markov Condition. The sixth panel presents the factorization of the joint distribution of variables in the Bayesian Network.
We use the framework presented above to discuss the concept of fixing in greater generality. According to Section 2, we define the causal operation of fixing a variable in a model represented by a graph G by the intervention that sets a value to this variable in 𝒯 in a fashion that does not affect the distribution of its non-descendants. In other words, fixing a random variable (or a set of random variables) X ∈ 𝒯 to x translates to setting X = x for all X-inputs in the structural equations associated with variables in Ch(X). Pearl (2009) uses the term doing for what we call fixing. We use his notation in writing Equation (7), below. The post-intervention distribution of variables in 𝒯 when X is fixed at x is given by
Pr(𝒯 ∖ {X} | do(X) = x) = ∏_{T ∈ 𝒯 ∖ {X}} Pr(T | Pa(T))|_{X = x}.   (7)
Versions of Equation (7) can be found in Pearl (2001), Spirtes et al. (2000), and Robins (1986). In this instance, do(X) = x is equivalent to conditioning X̃ at X̃ = x.
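For the discrete toy model above, Equation (7) amounts to dropping the factor for the fixed variable and substituting its value wherever it appears as a parent. A sketch (reusing pU, pX_U, pY_XU, and joint from the previous block):

```python
# Post-intervention distribution (7): Pr(Y, U | do(X = x)) = Pr(Y | X = x, U) Pr(U).
# The factor Pr(X | U) is dropped; X = x is substituted where X is a parent.
def p_do(y, u, x):
    return pY_XU(y, x, u) * pU[u]

p_do_x1 = sum(p_do(1, u, 1) for u in (0, 1))             # Pr(Y = 1 | do(X = 1))

# Ordinary conditioning, for contrast: weights U by Pr(U | X = 1), not Pr(U).
pX1 = sum(pX_U[u][1] * pU[u] for u in (0, 1))
p_cond_x1 = sum(joint[(1, 1, u)] for u in (0, 1)) / pX1  # Pr(Y = 1 | X = 1)

print(f"Pr(Y=1 | do(X=1)) = {p_do_x1:.3f}")    # 0.720
print(f"Pr(Y=1 | X=1)     = {p_cond_x1:.3f}")  # 0.792
```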
As noted in Section 2, standard arguments based on statistical conditioning are unable to describe the probability laws governing the fixing operation used in Equation (7). Our solution to this problem draws on Haavelmo's insight that causality is a property of hypothetical models in which causal effects on output variables are generated through hypothetical independent variations of inputs. Specifically, we are able to map causal manipulations of the fixing operation into standard statistical language by formalizing the concept of a hypothetical model in Section 3.1.
3.1 The Hypothetical Model
We formalize the concept of a hypothetical model and study its properties. The notions discussed here constitute our theoretical basis to examine causal effects. To recall, we use the term empirical model to designate the data generating process and the term hypothetical model to designate the model used to characterize causal effects.
The hypothetical model is generated from an empirical model. It shares the same structural equations and same distribution of error terms as the empirical model. The hypothetical model differs from the empirical model in two ways. First, it appends to the empirical model an external variable (or a set of external variables) termed a hypothetical variable(s). Second, it replaces the action of existing inputs. If X ∈ 𝒯 is the target variable to be fixed in the empirical model, then the newly created hypothetical variable X̃ replaces the X-input of one, some or all variables in Ch(X). In other words, children of X in the empirical model will have their X-input replaced by a X̃-input in the hypothetical model. We assume that X and X̃ have common supports.
Table 1 illustrates the concept of a hypothetical model using the Haavelmo model introduced in Section 2. Column 1 presents the Haavelmo empirical model while Column 2 presents its associated hypothetical model.
For the sake of clarity, we use GE for the DAG representing the empirical model and 𝒯E for its associated set of variables. We use PaE, DE, ChE for the parents, descendants, and children associated with DAG GE. We use PrE for the probability measure of variables in 𝒯E. For the corresponding counterparts in the hypothetical model we use GH, 𝒯H, PaH, DH, ChH, and PrH.
We now list some salient features of the hypothetical model. Let X̃ denote the hypothetical variable (or variables) associated with X ∈ 𝒯E. We expand the list of variables in the hypothetical model so that 𝒯H = 𝒯E ∪ {X̃}. The hypothetical variable can replace some or all of the input X for variables in ChE(X), i.e., ChH(X) ⊆ ChE(X). Children of X in the empirical model can be partitioned among X and X̃ in the hypothetical model: ChE(X) = ChH(X) ∪ ChH(X̃).11 As a consequence we also have that DE(X) = DH(X) ∪ DH(X̃), that is, X-descendants of the empirical model constitute the X and X̃ descendants in the hypothetical model. Parental sets of the hypothetical model are defined by PaH(T) = PaE(T) ∀ T ∈ 𝒯E\ChH(X̃) and PaH(T) = {PaE(T)\{X}}∪{X̃} ∀ T ∈ ChH(X̃). Moreover, X̃ is an external variable, that is, PaH(X̃) = ∅. The hypothetical model is also a DAG. Thus LMC (5) holds and the joint distribution of the variables in 𝒯H can be factorized using equation (6). By sharing the same structural equations and distribution of error terms, the conditional probabilities of the hypothetical model can be written as
PrH(T | PaH(T)) = PrE(T | PaE(T))  for all T ∈ 𝒯E ∖ ChH(X̃),   (8)
and
PrH(T | PaH(T) ∖ {X̃}, X̃ = x) = PrE(T | PaE(T) ∖ {X}, X = x)  for all T ∈ ChH(X̃) and all x in the common support of X and X̃.   (9)
Equations (8)–(9) arise because the distribution of a variable T ∈ 𝒯E conditional on its parents is determined by the distribution of its error terms, which is the same for hypothetical and empirical models.
We now link the probability measures of the empirical and hypothetical models. Theorem T-1 uses LMC (5) and Equation (8) to show that the distribution of non-descendants of X̃ are the same in both hypothetical and empirical models:
Theorem T-1. Let X̃ be the hypothetical variable in the hypothetical model represented by GH associated with variable X in empirical model GE. Let W, Z be any disjoint sets of variables in 𝒯E ∖ DH(X̃). Then

PrH(W | Z) = PrH(W | Z, X̃) = PrE(W | Z).
Proof. See Appendix.
Theorem T-1 also holds for the set of variables that are non-descendants of X according to the empirical model, which are a subset of 𝒯E \ DH(X̃). Thus, PrH(W|Z) = PrH(W|Z, X̃) = PrE(W|Z) for all {W, Z} ⊂ 𝒯E \ DE(X).
The following theorem uses Theorem T-1 and Equations (8)–(9) to show that the distribution of variables conditional on X and X̃ taking the same value x in the hypothetical model is equal to the distribution of the variables conditional on X = x in the empirical model:
Theorem T-2. Let X̃ be the hypothetical variable in the hypothetical model represented by GH associated with variable X in empirical model GE and let W, Z be any disjoint12 sets of variables in 𝒯E. Then

PrH(W | Z, X = x, X̃ = x) = PrE(W | Z, X = x).
A useful corollary of Theorem T-2 is the method of matching:
Corollary C-2. Matching: Let Z, W be any disjoint sets of variables in 𝒯E and let X̃ be a hypothetical variable in model GH associated with X ∈ 𝒯E in model GE such that, in the hypothetical model, X ╨ W|(Z, X̃). Then

PrH(W | X̃ = x, Z) = PrE(W | X = x, Z).
Proof. See Mathematical Appendix.
Variables Z of C-2 are called matching variables. In statistical jargon, it is said that matching variables solve the problem of confounding effects between a treatment indicator X and an outcome W. Matching is commonly used to identify treatment effects in propensity score matching models.14 In these models, the conditional independence relation of Matching C-2 is assumed to be true. Pearl (1993) describes a graphical test called the "Back-Door" criterion that can be applied to a DAG in order to check whether a set of variables satisfies the assumptions of Matching C-2.
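Corollary C-2 is the formula practitioners implement when they "adjust for" or "match on" Z. A minimal simulation sketch (a single binary observed confounder Z; the data generating process and its coefficients are our own illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Illustrative DGP: observed Z confounds X and Y; true effect of X on Y is 1.0.
Z = rng.binomial(1, 0.5, n)
X = rng.binomial(1, 0.2 + 0.6 * Z)            # Pr(X = 1 | Z) is 0.2 or 0.8
Y = 1.0 * X + 2.0 * Z + rng.normal(size=n)

naive = Y[X == 1].mean() - Y[X == 0].mean()

# Matching (C-2): E(Y | X_tilde = x) = sum_z E(Y | X = x, Z = z) Pr(Z = z).
adjusted = sum(
    (Y[(X == 1) & (Z == z)].mean() - Y[(X == 0) & (Z == z)].mean()) * (Z == z).mean()
    for z in (0, 1)
)

print(f"naive contrast:    {naive:.3f}")     # about 2.2 -- biased upward by Z
print(f"matching estimate: {adjusted:.3f}")  # about 1.0 -- the causal effect
```

We turn next to an explicit discussion of the do-calculus.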
3.2 Benefits of the Hypothetical Model
The major benefit of the hypothetical model is to allow us to perform causal operations using standard statistical tools. As previously noted, the fixing operation is poorly defined in statistics. Statistical tools such as LMC (5), Graphoid Axioms or the Law of Iterated Expectation do not apply to the fixing operator. The hypothetical model solves this mismatch between causal language and statistical operations because the operation of fixing a variable in the empirical model is easily translated into statistical conditioning in the hypothetical model. This comes as a consequence of the properties of the hypothetical variable, which is defined to have the desired independent variation to generate causal effects. In particular, if we replace the X-input by a X̃-input for all children of X, as suggested by the operation of fixing or “doing,” we have that the distribution of an outcome Y ∈ 𝒯E of the empirical model when variable X is fixed at x (for all its children) is equivalent to the distribution of Y conditional on the hypothetical variable X̃ being assigned to value x. This is captured by the following theorem:
Theorem T-3. Let X̃ be the hypothetical variable in GH associated with variable X in the empirical model GE, such that ChH(X̃) = ChE(X). Then

PrE(Y | do(X) = x) = PrH(Y | X̃ = x), that is, PrE(Y(x)) = PrH(Y | X̃ = x).
Proof. See Appendix.
Theorem T-3 avoids the need for defining new mathematical tools that would be necessary to integrate the fixing operator into statistical language. Section 4 illustrates this point by comparing the identification of causal effects in the “Front-Door” model of Pearl (2009) using his “do-calculus” and identification using a hypothetical model, which does not require any additional apparatus outside standard statistical analysis.
Another benefit of the hypothetical model is that it clearly distinguishes the characterization of a causal effect from its identification in data. Causal effects of a variable X on an outcome Y are characterized within the hypothetical model by the distribution of Y conditioned on the hypothetical variable X̃. Identification of causal effects, on the other hand, requires analysts to relate the hypothetical and empirical distributions in a fashion that allows the evaluation of causal effects examined in the hypothetical model using data generated by the empirical model. This task relies on the statistical properties that connect both models; that is, Equations (8) and (9), Theorems T-1 and T-2, and Corollary C-2. Section 4 illustrates model identification while discussing the differences between the hypothetical model approach and the do-calculus of Pearl (1995).
The hypothetical model does not suppress the variable we seek to fix, but rather creates a new hypothetical variable that allows us to examine a variety of causal effects. This approach provides a natural framework within which to examine counterfactual outcomes that involve both fixing and conditioning. For example, suppose X denotes schooling choice: X = 1 denotes college education and X = 0 otherwise. The treatment-on-the-treated parameter stands for the average causal effect of college education for the sub-sample of agents who choose to go to college. This parameter is readily defined by EH(Y | X̃ = 1, X = 1) − EH(Y | X̃ = 0, X = 1) in the hypothetical model. For more examples of such parameters, see Heckman and Vytlacil (2007a).
In contrast with the do-operator, the hypothetical model targets causal links, not variables. In other words, the hypothetical model allows analysts to target separate causal relationships of X. We can choose subsets of variables in Ch(X) that will be caused by a hypothetical variable X̃, which in turn replaces some of the X-inputs. The do-operator, on the other hand, targets all causal relationships involving the variable X. However, it is often the case that the analyst would like to examine counterfactual outcomes generated by fixing X at different levels in different structural equations that have X as an input.
For example, consider the standard mediation model in which X denotes treatment (1 for treated and 0 for control assignments), M is a mediator variable that is caused by X, and Y is an outcome of interest that is caused by X and M. Suppose the analyst is interested in distinguishing the effect of X on Y that operates through X itself from the effect that operates through the mediator M. Say we are interested in the expected value of the counterfactual outcome Y generated by fixing its X-input at 1 while using the distribution of M that would be generated by fixing X at 0. We can interpret this counterfactual as the result of a treatment direct effect and a control mediation effect.
By invoking do(X) = 1, we fix the value of X at 1 in both the Y and M equations. But the counterfactual outcome Y is based on the distribution of M that is generated by fixing X at 0. To solve this problem, we are required to also fix M at its desired level. As a consequence, we need to invoke both do(X) = 1 and do(M). Specifically, we can characterize the suggested counterfactual outcome in terms of the do-operator by:
E(Y(1, M(0))) = ∫ E(Y | do(X) = 1, do(M) = m) dQM|do(X)=0(m),   (10)
where QM|do(X)=0 denotes the cumulative distribution function of the variable M when X is fixed at 0, that is, the distribution of (M|do(X) = 0). Under the hypothetical model presented in Model 2 of Table 2, the studied counterfactual outcome is easily written as EH(Y | X = 1, X̃ = 0).
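To make this counterfactual concrete, the sketch below simulates an illustrative linear mediation model (the coefficients are our own assumptions) and computes EH(Y | X = 1, X̃ = 0): the X-input of Y is fixed at 1 while M is drawn as if X were 0:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
eM, eY = rng.normal(size=(2, n))

# Illustrative structural equations: M <- X and Y <- (X, M).
def f_M(x, eM):     return 0.8 * x + eM
def f_Y(x, m, eY):  return 1.0 * x + 0.5 * m + eY

# Counterfactual of Equation (10): the X-input of Y is fixed at 1 while the
# mediator is drawn from its distribution under X fixed at 0 (X_tilde = 0).
M0 = f_M(0.0, eM)
Y_cf = f_Y(1.0, M0, eY)

print(f"E[Y(1, M(0))] = {Y_cf.mean():.3f}")                        # 1.0: direct effect only
print(f"E[Y(1, M(1))] = {f_Y(1.0, f_M(1.0, eM), eY).mean():.3f}")  # 1.4 = 1.0 + 0.5*0.8
```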
Table 2. Models for Mediation Analysis.
| 1. Empirical Mediation Model | 2. Hypothetical Model for Indirect Effect of X on Y |
|---|---|
| [DAG: X → M, X → Y, M → Y] | [DAG: X → Y, X̃ → M, M → Y] |

This table shows two models represented by DAGs. To simplify the displays we keep the unobservables in ε implicit. Model 1 represents the empirical model for mediation analysis. Model 2 represents the hypothetical model that targets the indirect (mediated) effect of X on Y: the hypothetical variable X̃ replaces the X-input of the mediator equation.
4 Do-Calculus versus Hypothetical Model
This section compares the identification strategy of the do-calculus developed by Pearl (1995) with the causal framework based on a hypothetical model. The hypothetical model approach is conceptually simpler, more tractable and more intuitive.
The do-calculus consists of three graphical and statistical rules that operate on the empirical model. These rules supplement standard statistical theory in order to process the information of causal models that can be represented by a DAG. The goal of the do-calculus rules is to integrate the statistically ill-defined concept of fixing into a standard statistical framework. Examples of this literature are Huang and Valtorta (2006, 2008) and Tian and Pearl (2002, 2003).
Contrary to the hypothetical model approach, the do-calculus requires special graph-theoretic notation outside common statistical usage. Namely, let X, Y, Z, W be arbitrary disjoint sets of variables (nodes) in a causal graph G, then the graphical operations of the do-calculus are given by:
GX̄ denotes a modification of DAG G obtained by deleting the arrows pointing to X;
GX̲ denotes the modified DAG obtained by deleting the arrows emerging from X;
GX̄,Z̲ denotes the DAG obtained by deleting both the arrows pointing to X and the arrows emerging from Z.
These graphical operations also apply to the nodes Z(W), which denotes the set of Z-nodes that are not ancestors of any W-node in a DAG G. Table 3 gives examples of this notation.
Table 3. Examples of the Do-calculus Notation.
| 1. Empirical Model G | 2. Modified Empirical Model GX̄ |
|---|---|
| [DAG] | [DAG] |

| 3. Modified Empirical Model GX̲ | 4. Modified Empirical Model GX̄,Z̲ |
|---|---|
| [DAG] | [DAG] |
This table shows four models represented by DAGs that exemplify the notation of the do-calculus. Graph 1 is the original empirical model G. The remaining DAGs are generated by the deletion of arrows according to the do-calculus notation.
Let G be a DAG and let X, Y, Z, W be any disjoint sets of variables. The do-calculus rules are:
- Rule 1 (Insertion/deletion of variables): Y ╨ Z|(X, W) under GX̄ ⇒ Pr(Y|do(X), Z, W) = Pr(Y|do(X), W).

- Rule 2 (Action/observation exchange): Y ╨ Z|(X, W) under GX̄,Z̲ ⇒ Pr(Y|do(X), do(Z), W) = Pr(Y|do(X), Z, W).

- Rule 3 (Insertion/deletion of actions): Y ╨ Z|(X, W) under GX̄,Z(W)̄ (the DAG with arrows into X and into Z(W) deleted) ⇒ Pr(Y|do(X), do(Z), W) = Pr(Y|do(X), W), where Z(W) is the set of Z-nodes that are not ancestors of any W-node in GX̄.
Application of the do-calculus to a DAG entails several distinct steps:
Step 1: Define an empirical model of interest expressed by a DAG G.
Step 2: Generate a range of DAGs derived from the original DAG G according to the do-calculus graphical operations, e.g., GX̄ or GX̲.
Step 3: Generate the conditional independence relations associated with the DAGs produced in Step 2. These conditional independence relationships are obtained through the application of the Local Markov Condition and the Graphoid Axioms to each DAG.
Step 4: Check if one (or more) of the three do-calculus rules apply for a selected DAG and a selected conditional independence relationship.
Step 5: If they apply, compute the do-operator equality associated with the rule in question. This step defines causal parameters.
Step 6: Identification of a causal parameter occurs if the generated do-operator expressions for the causal parameters can be expressed in terms of the conditional distribution of observed variables.
These steps are summarized in Figure 1.
Figure 1. The Steps Required to Implement the Do-Calculus.

Shpitser and Pearl (2006) and Huang and Valtorta (2006) show that Rules 1–3 are complete, meaning that they are sufficient for deriving all causal effects that can be identified by the conditional independence assumptions implicit in the DAG.
The do-calculus rules can be applied to any DAG, but they are restricted to models that can be defined as DAGs. In other words, the do-calculus rules only apply to models that can be expressed solely by DAGs. They are not suitable for the identification of models whose assumptions lie outside the DAG conditional independence assumptions. In particular, they cannot identify the standard instrumental variable model examined in the next section.
We illustrate the use of the do-calculus through two examples. Section 4.1 examines the standard instrumental variable model. Section 4.2 examines the Front-Door Model.
4.1 The Instrumental Variable Model
We consider the simplest instrumental variable model that consists of four variables: (1) a confounding variable U that is external and unobserved; (2) an external instrumental variable Z; (3) an observed variable X caused by U and Z; and (4) an outcome Y caused by U and X. The empirical instrumental variable model is described in the first column of Table 4. Its hypothetical counterpart is presented in the second column of Table 4.
Table 4. Instrumental Variable Empirical and Hypothetical Models.
| | 1. Instrumental Variable Empirical Model | 2. Instrumental Variable Hypothetical Model |
|---|---|---|
| Structural equations | 𝒯 = {U, X, Z, Y}; ε = {εU, εX, εZ, εY}; Y = gY(X, U, εY); X = gX(U, Z, εX); Z = gZ(εZ); U = gU(εU) | 𝒯 = {U, X, Z, Y, X̃}; ε = {εU, εX, εZ, εY}; Y = gY(X̃, U, εY); X = gX(U, Z, εX); Z = gZ(εZ); U = gU(εU) |
| DAG | [DAG: U → X, U → Y, Z → X, X → Y] | [DAG: U → X, U → Y, Z → X, X̃ → Y] |
| Parents | Pa(U) = Pa(Z) = ∅; Pa(X) = {U, Z}; Pa(Y) = {U, X} | Pa(U) = Pa(Z) = Pa(X̃) = ∅; Pa(X) = {U, Z}; Pa(Y) = {U, X̃} |
| LMC | Z ╨ U; Y ╨ Z|(X, U) | U ╨ (Z, X̃); Z ╨ (U, X̃, Y); X̃ ╨ (U, Z, X); X ╨ (X̃, Y)|(U, Z); Y ╨ (Z, X)|(U, X̃) |
| Factorization | PrE(Y, Z, X, U) = PrE(Y|X, U)PrE(X|U, Z)PrE(Z)PrE(U) | PrH(Y, Z, X, U, X̃) = PrH(Y|X̃, U)PrH(X|U, Z)PrH(Z)PrH(U)PrH(X̃) |
| Fixing/conditioning | PrE(Y, Z, U|do(X) = x) = PrE(Y|X = x, U)PrE(Z)PrE(U) | PrH(Y, Z, X, U|X̃ = x) = PrH(Y|X̃ = x, U)PrH(X|U, Z)PrH(Z)PrH(U) |
This table has two columns and seven panels separated by horizontal lines. Each column presents a causal model. The first panel names the model. The second panel presents the structural equations generating the model. In this row alone we make ε explicit. In the other rows it is kept implicit to avoid notational clutter. Columns 1 and 2 are based on structural equations that have the same functional form, but have different inputs. The third panel represents the model as a DAG. Squares represent observed variables and circles represent unobserved variables. The fourth panel presents the parents in 𝒯 of each variable. The fifth panel shows the conditional independence relationships generated by the application of the Local Markov Condition. The sixth panel presents the factorization of the joint distribution of variables in the Bayesian Network. The last panel of column 1 presents the joint distribution of variables when X is fixed at x (do(X) = x). The last panel of column 2 gives the joint distribution of variables generated by the hypothetical model associated with empirical model 1 when X̃ is conditioned on X̃ = x.
The next step in the application of the do-calculus is the creation of DAGs derived from the original instrumental variable DAG. The rules of the do-calculus do not offer guidance on which DAGs to generate. For the sake of brevity, we consider four DAGs that can be operated on by the three do-calculus rules and thereby generate equalities involving the do-operator.
We first focus on the DAG GX̄, represented in the first DAG of Table 5. In this modified DAG, LMC (5) applied to variable Z generates Z ╨ (U, X, Y). Thereby Y ╨ Z|X holds from application of the Decomposition, Weak Union, and Symmetry properties of the Graphoid Axioms. We next map this relationship into Rule 1. Namely, if Y ╨ Z|X holds in the graph GX̄, then by Rule 1,
Table 5. Do-calculus and the Instrumental Variable Model.
| 1. Modified IV Model GX̄ | 2. Modified IV Model GX̄,U̲ |
|---|---|
| [DAG] | [DAG] |
| U ╨ (Z, X); Z ╨ (U, X, Y); X ╨ (U, Z); Y ╨ Z|(U, X) | U ╨ (Z, X, Y); Z ╨ (U, X, Y); X ╨ (U, Z); Y ╨ (U, Z)|X |

| 3. Modified IV Model GX̲ | 4. Modified IV Model GU̲ |
|---|---|
| [DAG] | [DAG] |
| U ╨ Z; Z ╨ (U, Y); X ╨ Y|(U, Z); Y ╨ (Z, X)|U | U ╨ (Z, X, Y); Z ╨ U; X ╨ U|Z; Y ╨ (U, Z)|X |
This table shows four models represented by DAGs (ε are kept implicit to avoid notational clutter). Squares represent observed variables, circles represent unobserved variables. Each DAG is generated by the deletion of arrows of the original Instrumental Variable model (first column of Table 4) according to the rules of the do-calculus. Below each model, we show the conditional independence relations generated by the application of the Local Markov Condition (5) to the variables of the models.
Pr(Y|do(X), Z) = Pr(Y|do(X)).   (11)
Equation (11) states that Z is independent of Y when X is fixed. This implication of Rule 1 is captured by the relationship Y ╨ Z|X̃ in the hypothetical model. This relationship is automatically obtained by applying LMC (5) to variable Z in the hypothetical model.
Using the Haavelmo approach, we create a hypothetical model shown in column 2 of Table 4. Without using any of Pearl's rules, it is immediate, applying LMC (5) to Z, that

Z ╨ (U, X̃, Y), and hence PrH(Y|X̃, Z) = PrH(Y|X̃),

and the causal parameter PrH(Y|X̃) is well defined. This parameter can be identified using standard IV methods widely used in econometrics (see, e.g., Matzkin, 2013).
We now use the do-calculus to show that causal effects can be obtained by conditioning on the unobserved confounding variable U. To illustrate the application of Rule 2, consider the graph depicted in DAG 2 of Table 5. In this modified DAG, LMC (5) applied to variable Y generates Y ╨ (U, Z)|X and therefore Y ╨ U|X. If Y ╨ U|X holds in the graph GX̄,U̲, then by Rule 2,
Pr(Y|do(X), do(U)) = Pr(Y|do(X), U).   (12)
Next consider the graph depicted in DAG 3 of Table 5. In this modified DAG, LMC (5) applied to variable Y generates Y ╨ (Z, X)|U and therefore Y ╨ X|U. Now if Y ╨ X|U holds in the graph GX̲, then by Rule 2,
Pr(Y|do(U), do(X)) = Pr(Y|do(U), X).   (13)
Finally, consider the graph depicted in DAG 4 of Table 5. Again, LMC (5) applied to variable Y in this modified DAG generates Y ╨ (U, Z)|X and therefore Y ╨ U|X. If Y ╨ U|X holds in the graph GU̲, then by Rule 2,
Pr(Y|do(U), X) = Pr(Y|U, X).   (14)
Combining Equations (12)–(14) we obtain:
Pr(Y|do(X), U) = Pr(Y|X, U).   (15)
Equation (15) states that the distribution of Y conditioned on U when X is fixed is the same as the distribution of Y conditioned on U and X. In other words, the causal effects of X on Y can be obtained by conditioning Y on the unobserved variable U. This well-known fact is easily obtained through the Hypothetical Model: LMC (5) on X generates Y ╨ X|(X̃, U), and by Matching C-2, PrH(Y|X̃, U) = PrE(Y|X, U).
As noted in Pearl (2009, chapters 3 and 5), the relationships discussed here — which exhaust the implications of the do-calculus for the instrumental variable model — are not sufficient to identify it unless the analyst can condition on U. Using the Haavelmo approach, a variety of identification strategies can identify the causal parameters using instrumental variables. These strategies, however, rely on assumptions that cannot be expressed by DAGs. This renders the do-calculus unsuitable for the identification analysis of these types of models. Section 5 discusses this topic in detail. Heckman and Pinto (2013) use the hypothetical model approach to examine the necessary and sufficient conditions that identify the instrumental variable model.
4.2 Identifying the “Front-Door” Model
We next compare the do-calculus and an analysis based on our hypothetical model by identifying the causal effects of Pearl's “Front-Door model”. Table 7 presents an example of the application of the rules of the do-calculus for the Front-Door model described in Table 6. That model consists of four variables: (1) an external unobserved variable U; (2) an observed variable X caused by U; (3) an observed variable M caused by X; and (4) an outcome Y caused by U and M. The Front-Door model is presented in the first column of Table 6.
Table 7. Do-calculus and the Front-Door Model.
| 1. Modified Front-Door Model GX̲ | 2. Modified Front-Door Model GM̲ |
|---|---|
| [DAG] | [DAG] |
| (Y, M) ╨ X|U; (X, U) ╨ M | (X, M) ╨ Y|U; (Y, U) ╨ M|X |

| 3. Modified Front-Door Model GX̄,M̲ | 4. Modified Front-Door Model GX̄,M̄ |
|---|---|
| [DAG] | [DAG] |
| (X, M) ╨ (Y, U) | (Y, M, U) ╨ X; U ╨ M |
This table shows four models represented by DAGs (ε are kept implicit to avoid notational clutter). Squares represent observed variables, circles represent unobserved variables. Each DAG is generated by the deletion of arrows of the original Front-Door model (first column of Table 6) according to the rules of the do-calculus. Below each model, we show the conditional independence relations generated by the application of the Local Markov Condition (5) to the variables of the models.
Table 6. “Front-Door” Empirical and Hypothetical Models.
| | 1. Pearl's "Front-Door" Empirical Model | 2. Our Version of the "Front-Door" Hypothetical Model |
|---|---|---|
| Structural equations | 𝒯 = {U, X, M, Y}; ε = {εU, εX, εM, εY}; Y = fY(M, U, εY); X = fX(U, εX); M = fM(X, εM); U = fU(εU) | 𝒯 = {U, X, M, Y, X̃}; ε = {εU, εX, εM, εY}; Y = fY(M, U, εY); X = fX(U, εX); M = fM(X̃, εM); U = fU(εU) |
| DAG | [DAG: U → X, U → Y, X → M, M → Y] | [DAG: U → X, U → Y, X̃ → M, M → Y] |
| Parents | Pa(U) = ∅; Pa(X) = {U}; Pa(M) = {X}; Pa(Y) = {M, U} | Pa(U) = Pa(X̃) = ∅; Pa(X) = {U}; Pa(M) = {X̃}; Pa(Y) = {M, U} |
| LMC | Y ╨ X|(M, U); M ╨ U|X | Y ╨ (X̃, X)|(M, U); M ╨ (U, X)|X̃; X ╨ (M, X̃, Y)|U; U ╨ (M, X̃); X̃ ╨ (X, U) |
| Factorization | PrE(Y, M, X, U) = PrE(Y|M, U)PrE(X|U)PrE(M|X)PrE(U) | PrH(Y, M, X, U, X̃) = PrH(Y|M, U)PrH(X|U)PrH(M|X̃)PrH(U)PrH(X̃) |
| Fixing/conditioning | PrE(Y, M, U|do(X) = x) = PrE(Y|M, U)PrE(M|X = x)PrE(U) | PrH(Y, M, U, X|X̃ = x) = PrH(Y|M, U)PrH(X|U)PrH(M|X̃ = x)PrH(U) |
This table has two columns and seven panels separated by horizontal lines. Each column presents a causal model. The first panel names the models. The second panel presents the structural equations generating the model. In this row alone we make ε explicit. In the other rows it is kept implicit to avoid clutter. Columns 1 and 2 are based on structural equations that have the same functional form, but have different inputs. The third panel represents the model as a DAG. Squares represent observed variables, circles represent unobserved variables. The fourth panel presents the parents in 𝒯 of each variable. The fifth panel shows the conditional independence relationships generated by the application of the Local Markov Condition. The sixth panel presents the factorization of the joint distribution of variables in the Bayesian Network. The last panel of column 1 presents the joint distribution of variables when X is fixed at x using the "do operator." The last panel of column 2 gives the joint distribution of variables generated by the hypothetical model associated with empirical model 1 when X̃ is conditioned on X̃ = x.
We are interested in identifying the distribution of the outcome Y when X is fixed at x. Within the context of the do-calculus, by identification we mean expressing the quantity Pr(Y|do(X)) in terms of the distribution of observed variables.
The do-calculus identifies Pr(Y|do(X)) through four steps, which we now perform. Steps 1, 2, and 3 identify Pr(M|do(X)), Pr(Y|do(M)), and Pr(Y|M, do(X)), respectively. Step 4 uses the first three steps to identify Pr(Y|do(X)).
Step 1: Invoking LMC (5) for variable M of DAG GX̲ (DAG 1 of Table 7) generates X ╨ M. Thus, by Rule 2 of the do-calculus, we obtain Pr(M|do(X)) = Pr(M|X).
Step 2: Invoking LMC (5) for variable M of DAG GM̄ (also DAG 1 of Table 7: deleting the arrow into M coincides here with deleting the arrow out of X) generates X ╨ M. Thus, by Rule 3 of the do-calculus, Pr(X|do(M)) = Pr(X). In addition, applying LMC (5) for variable M of DAG GM̲ (DAG 2 of Table 7) generates M ╨ Y|X. Thus, by Rule 2 of the do-calculus, Pr(Y|X, do(M)) = Pr(Y|X, M). Combining these results,

Pr(Y|do(M)) = Σx∈supp(X) Pr(Y|X = x, do(M)) Pr(X = x|do(M)) = Σx∈supp(X) Pr(Y|X = x, M) Pr(X = x),

where "supp" means support.
Step 3: Invoking LMC (5) for variable M of DAG GX̄,M̲ (DAG 3 of Table 7) generates Y ╨ M|X. Thus, by Rule 2 of the do-calculus, Pr(Y|M, do(X)) = Pr(Y|do(M), do(X)). In addition, applying LMC (5) for variable X of DAG GX̄,M̄ (DAG 4 of Table 7) generates (Y, M, U) ╨ X. By Weak Union and Decomposition, we obtain Y ╨ X|M. Thus, by Rule 3 of the do-calculus, we obtain Pr(Y|do(X), do(M)) = Pr(Y|do(M)). Thus, Pr(Y|M, do(X)) = Pr(Y|do(M), do(X)) = Pr(Y|do(M)).

Step 4: We collect the results from the three previous steps to identify Pr(Y|do(X)) from observed data:

Pr(Y|do(X) = x) = Σm Pr(Y|M = m, do(X) = x) Pr(M = m|do(X) = x)
 = Σm Pr(Y|do(M) = m) Pr(M = m|X = x)
 = Σm [Σx′∈supp(X) Pr(Y|X = x′, M = m) Pr(X = x′)] Pr(M = m|X = x).
In this fashion, we can use the do-calculus to identify the desired causal parameter. It is instructive to compare this proof of identification with one based on the approach of Haavelmo. We identify the causal effects of X on Y for the Front-Door model using a hypothetical model. We replace the relationship of X on M by a hypothetical variable X̃ that causes M. We use PrE to denote the probability of the Front-Door model that generates the data (Column 1 of Table 6) and PrH for the hypothetical model (Column 2 of Table 6). As before, we seek to identify PrH(Y|X̃) (the equivalent of Pr(Y|do(X))) from observed distributions in the empirical model.
We first present a lemma that states three useful conditional independence relations of the hypothetical model. The lemma is based on the application of LMC (5) and the Graphoid relationships:
Lemma L-1. In the Front-Door hypothetical model, (1) Y ╨ X̃|M, (2) X ╨ M, and (3) Y ╨ X̃|(M, X).
Proof. By LMC (5) for X, we obtain (Y, M, X̃) ╨ X|U. By LMC (5) for Y, we obtain Y ╨ (X, X̃)|(M, U). By Contraction applied to (Y, M, X̃) ╨ X|U and Y ╨ (X, X̃)|(M, U), we obtain (Y, X) ╨ X̃|(M, U). By LMC (5) for U, we obtain (M, X̃) ╨ U. By Contraction applied to (M, X̃) ╨ U and (Y, M, X̃) ╨ X|U, we obtain (X, U) ╨ (M, X̃). The second relationship in the Lemma is obtained by Decomposition. In addition, by Contraction on (Y, X) ╨ X̃|(M, U) and (M, X̃) ╨ U, we obtain (Y, X, U) ╨ X̃|M. The two remaining conditional independence relationships of the Lemma are obtained by Weak Union and Decomposition.15
Applying these results,

PrH(Y|X̃ = x) = Σm PrH(Y|M = m, X̃ = x) PrH(M = m|X̃ = x)
 = Σm PrH(Y|M = m) PrH(M = m|X̃ = x)
 = Σm Σx′ PrH(Y|M = m, X = x′) PrH(X = x′|M = m) PrH(M = m|X̃ = x)
 = Σm Σx′ PrH(Y|M = m, X = x′) PrH(X = x′) PrH(M = m|X̃ = x)
 = Σm Σx′ PrH(Y|M = m, X = x′, X̃ = x′) PrH(X = x′) PrH(M = m|X̃ = x)
 = Σm Σx′ PrE(Y|M = m, X = x′) PrE(X = x′) PrE(M = m|X = x).
The second equality comes from relationship (1) Y ╨ X̃|M of Lemma L-1. The fourth equality comes from relationship (2) X ╨ M of Lemma L-1. The fifth equality comes from relationship (3) Y ╨ X̃|(M, X) of Lemma L-1. The last equality links the distributions of the hypothetical model with the ones of the empirical model. The first term uses Theorem T-2 to equate PrH(Y|X = x′, X̃ = x′, M = m) = PrE(Y|M, X = x′). The second term uses the fact that X is not a child of X̃; thus, by Theorem T-1, PrH(X = x′) = PrE(X = x′). Finally, the last term uses Matching applied to M. Namely, LMC (5) for M generates M ╨ X|X̃ in the hypothetical model. Then, by Matching C-2, PrH(M|X̃ = x) = PrE(M|X = x).
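The identification formula just derived can be checked by simulation. The sketch below draws data from an illustrative binary front-door data generating process (all parameter values are ours), computes the front-door formula from observables only, and compares it with the interventional probability computed directly from the structural equations:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000_000

# Illustrative binary front-door DGP: U -> X, X -> M, (M, U) -> Y; U is unobserved.
U = rng.binomial(1, 0.5, n)
X = rng.binomial(1, 0.2 + 0.6 * U)
M = rng.binomial(1, 0.1 + 0.7 * X)
Y = rng.binomial(1, 0.1 + 0.5 * M + 0.3 * U)

def front_door(x):
    """sum_m Pr(M=m | X=x) * sum_x' Pr(Y=1 | M=m, X=x') Pr(X=x'), observables only."""
    total = 0.0
    for m in (0, 1):
        p_m = (M[X == x] == m).mean()
        inner = sum(Y[(M == m) & (X == xp)].mean() * (X == xp).mean() for xp in (0, 1))
        total += p_m * inner
    return total

def truth(x):
    """Pr(Y = 1 | do(X = x)) computed directly from the structural equations."""
    Mx = rng.binomial(1, 0.1 + 0.7 * x, n)     # mediator when X is fixed at x
    return (0.1 + 0.5 * Mx + 0.3 * U).mean()   # average Pr(Y = 1 | Mx, U)

for x in (0, 1):
    print(f"x = {x}: front-door {front_door(x):.3f}   truth {truth(x):.3f}")
```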
It is clear from the analysis presented in this section that, even when the do-calculus and the hypothetical model produce the same identification formulas, they differ greatly. Using the hypothetical model and adjoining X̃ to the original empirical model, we can define causal parameters in a straightforward way using the standard tools of probability theory. Analyses based on X̃ offer a simple, transparent, way to introduce Haavelmo's notion of fixing into models.
5 The Benefits and Limitations of DAGs
A major benefit of DAGs is their intuitively appealing description of models as causal chains. DAG assumptions list the variables in a model and their causal relationships. A DAG does not generate or characterize any restrictions on functional forms or parametric specifications of the structural equations. In this sense, if an identification result is achieved, it is obtained under very weak conditions.
This benefit of DAGs is also the source of their limitations. Methods that focus on identification of models solely described by DAGs lack the tools for invoking additional assumptions that would generate the identification of an a priori non-identified model. The instrumental variable model examined in Section 4.1 is a fundamental ingredient of a huge literature on econometric identification (see, e.g., Matzkin, 2013). It is the basis for identifying more sophisticated models such as the Generalized Roy model of Section 3.2, which is widely used in econometrics in the analysis of selection bias and in evaluating social programs (Heckman, 1976, Heckman, 1979, Heckman and Robb, 1985, Powell, 1994, Heckman and Vytlacil, 2007a, b). Examples of this literature are nonparametric control functions (see, e.g., Blundell and Powell, 2003) and identification through instrumental variables (Reiersöl, 1945). Heckman and Pinto (2013) use the concept of a hypothetical model to examine the identification of the instrumental variable model. They present a unified approach that summarizes a range of identification strategies. Section 4.1 shows that the instrumental variable model is not identified applying the rules of the do-calculus. As mentioned, it is impossible to identify the causal effect of X on Y without using additional information.
The non-identification of the instrumental variable model poses a major limitation for the identification literature that relies exclusively on DAGs. Identification of the instrumental variable model relies on assumptions outside the scope of the DAG literature. For example, we can use LMC (5) to obtain the following conditional independence relationships: Y ╨ Z|(U, X) and U ╨ Z. These relationships, together with the requirement that X and Z are dependent (instrument relevance), satisfy the necessary criteria to apply the method of Two Stage Least Squares (TSLS). TSLS identifies the instrumental variable model under a linearity assumption. As a consequence, if we assume that the causal relationships of X and U on outcome Y are represented by a linear equation, i.e., Y = Xβ + U, then it is well known that the parameter β can be identified by cov(Z, Y)/cov(Z, X) under standard rank conditions.
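A short simulation sketch (a linear data generating process with our own illustrative coefficients) confirms that the ratio cov(Z, Y)/cov(Z, X) recovers β while least squares does not:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
beta = 1.5

Z = rng.normal(size=n)                       # external instrument, independent of U
U = rng.normal(size=n)                       # unobserved confounder
X = 0.7 * Z + 0.8 * U + rng.normal(size=n)   # X = g_X(U, Z, eps_X)
Y = beta * X + U + rng.normal(size=n)        # linear outcome equation

b_ols = np.cov(X, Y)[0, 1] / np.var(X)
b_iv = np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]

print(f"OLS: {b_ols:.3f}")  # about 1.88: biased by cov(X, U)/var(X)
print(f"IV:  {b_iv:.3f}")   # about 1.50: recovers beta
```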
Linearity and homogeneity of the effects of X on Y across agents (i.e., β is the same across the values X, U take) are strong assumptions about the causal links that govern the relationship between Y and X. This assessment fostered a huge literature in economics devoted to methods that relax linearity and homogeneity and that allow coefficients to be correlated with regressors. Examples of this literature are Imbens and Angrist (1994), Vytlacil (2002), and Heckman and Vytlacil (2005, 2007a, b), who identify the instrumental variable model under more general conditions by making assumptions on the causal relationship of Z with X. Imbens and Angrist (1994) show that the instrumental variable model can be identified under a “monotonicity” assumption (increasing the values of an instrument has the same qualitative effect on all agents). Vytlacil (2002) shows that this assumption is equivalent to assuming an instrumental variable model in which the treatment assignment decision rule is separable in terms of unobserved characteristics of the agents and the instrumental variable. Heckman and Vytlacil (1999, 2005, 2007a, b) develop and apply this result.
Table 8 summarizes the common and distinct features of Pearl's do-calculus and the approach based on Haavelmo's hypothetical model. Both approaches use structural equation models in the sense of Koopmans and Reiersøl (1950). Both invoke autonomy and assume mutually independent errors ε. In recursive models, both use the Local Markov Condition and the Graphoid axioms. Both use “fixing” or the “do operator” to define counterfactuals.
Table 8. Summarizing the Do-calculus of Pearl (2009) and the Haavelmo Approach.
Common features of the Haavelmo approach and the do-calculus:
- Autonomy (Frisch, 1938)
- Error terms: ε mutually independent
- Statistical tools: the LMC and the Graphoid axioms apply
- Counterfactuals: fixing (the do-operator) is a causal, not a statistical, operation

Distinctive features of the Haavelmo approach and the do-calculus:

| | Haavelmo | Do-calculus |
| Approach | Thinks outside the box of the empirical model by constructing a new hypothetical model, motivated by but distinct from the empirical model, in which fixing can be analyzed using standard tools of probability | Thinks inside the box of the empirical model; creates complex nonstandard rules to introduce fixing into a probabilistic framework |
| Introduces | A hypothetical model | Graphical rules |
| Identification | Connects PrH and PrE | Iteration of do-calculus rules |
| Versatility | Basic statistical principles apply | Creates new rules of statistics |
The approaches diverge in their analyses of identification. The approach based on Haavelmo creates a hypothetical variable X̃ and an associated hypothetical model that is "outside the box" of the empirical model. It applies standard probability calculus to the hypothetical model in order to connect it to the empirical model. Pearl's do-calculus creates a new set of extra-statistical tools to identify the causal parameters created by fixing or the "do-operator." Our analysis shows that in the hypothetical model of Haavelmo, the special extra-statistical tools of the do-calculus are not required to identify causal parameters. By relying exclusively on statistical tools, the hypothetical model approach allows for additional econometric assumptions that identify a broader range of models, including models that cannot be identified using the rules that apply only to DAGs, i.e., the do-calculus.
6 Hypothetical Models and Simultaneous Equations
The literature on causality provides a framework for modeling causal processes based on DAGs. Less is known about the Directed Cyclic Graphs (DCGs) that are used to represent Simultaneous Equations. Indeed, the fundamental Local Markov Condition no longer holds for DCGs (Spirtes, 1995). Nevertheless, the notion of fixing readily extends to a system of simultaneous equations.
Consider a system of two equations:
Y1 = gY1(Y2, X1, U1),   (7a)
Y2 = gY2(Y1, X2, U2).   (7b)
The set of variables of the empirical model is 𝒯E = {Y1, Y2, X1, X2, U1, U2}. Our analysis can be readily generalized to systems with more than two equations, but for the sake of brevity, we focus on the two-equation case. To simplify notation, we keep the variables in ε implicit.
The empirical Simultaneous Equations Model of (7a) and (7b) is represented as Model 1 of Table 9. Many different versions of this model appear in the literature. For simplicity, we assume U1 ╨ U2 and (U1, U2) ╨ (X1, X2).16
Table 9. Models for Simultaneous Equations.
1. Empirical Model for Simultaneous Equations (graph not reproduced here)
2. Hypothetical Model for Simultaneous Equations (graph not reproduced here)
3. Recursive Representation (graph not reproduced here)
This table shows three models. (The variables in ε are kept implicit.) Model 1 represents the empirical model for Simultaneous Equations, in which Y1 and Y2 cause each other. Model 1 is cyclic, and hence it is not a DAG. Model 2 represents one possible hypothetical model associated with the empirical model for Simultaneous Equations. In Model 2, the hypothetical variable Ỹ2 is associated with the causal link of Y2 on Y1 of Model 1, and the hypothetical variable Ỹ1 is associated with the causal link of Y1 on Y2 of Model 1. Model 3 presents a recursive ("causal chain") representation of a hypothetical model in which simultaneity in the original model is broken and the resulting Y1 is set outside of the model through hypothetical variation. This thought experiment is one way to produce the model on the left hand side of panel 2.
The hypothetical model associated with the causal operation of fixing both Y2 and Y1 is represented in Model 2 of Table 9. Under autonomy, the causal effect of Y2 on Y1 when Y2 is fixed at y2 is given by Y1(y2) = gY1(y2, X1, U1). Symmetrically, Y2(y1) = gY2(y1, X2, U2). We define hypothetical random variables Ỹ1, Ỹ2. They replace the Y1, Y2 inputs in Equations (7a) and (7b) in the same fashion as discussed in previous sections. We assume (Ỹ1, Ỹ2) ╨ (X1, X2, U1, U2) and Ỹ1 ╨ Ỹ2. The set of variables of the hypothetical model is 𝒯H = {Y1, Y2, Ỹ1, Ỹ2, X1, X2, U1, U2}. We assume a common support for (Y1, Y2) and (Ỹ1, Ỹ2).
In the same fashion as in the models previously discussed, the distribution of Y1 when Y2 is fixed at y2 is given by PrH(Y1|Ỹ2 = y2). The average causal effect of Y2 on Y1 when Y2 is fixed at the two values y2 and y′2 is given by EH(Y1|Ỹ2 = y2) − EH(Y1|Ỹ2 = y′2), where EH denotes expectation with respect to the probability measure PrH of the hypothetical model. The hypothetical variation of Ỹ2 corresponds to the standard Marshallian and Walrasian thought experiments in which quantities or prices are fixed to trace out demand and supply curves (see, e.g., Mas-Colell et al., 1995). A symmetric analysis produces the causal effect of Y1 on Y2. Thus we obtain the counterpart to the counterfactuals defined for the recursive models earlier in this paper.
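To make the fixing operation concrete, the following sketch (our own illustration; the linear structural functions are hypothetical, not from the text) computes the average causal effect of Y2 on Y1 by feeding an externally set value into gY1, without solving for equilibrium, and contrasts it with conditioning in equilibrium data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

# Hypothetical linear structural functions (illustrative only):
#   (7a)  Y1 = g_Y1(Y2, X1, U1) = 0.5*Y2 + X1 + U1
#   (7b)  Y2 = g_Y2(Y1, X2, U2) = -0.3*Y1 + X2 + U2
x1, x2 = rng.normal(size=n), rng.normal(size=n)
u1, u2 = rng.normal(size=n), rng.normal(size=n)

def g_y1(y2, x1, u1):
    return 0.5 * y2 + x1 + u1

# Fixing: Y2 is replaced by an externally set value; no equilibrium is solved.
fixed_effect = g_y1(1.0, x1, u1).mean() - g_y1(0.0, x1, u1).mean()

# Conditioning: regress Y1 on Y2 in observational (equilibrium) data.
y1_eq = (x1 + u1 + 0.5 * (x2 + u2)) / (1 - 0.5 * (-0.3))  # solved system
y2_eq = -0.3 * y1_eq + x2 + u2
slope = np.cov(y2_eq, y1_eq)[0, 1] / np.var(y2_eq)

print(fixed_effect)  # exactly 0.5: E_H(Y1|Y2 fixed at 1) - E_H(Y1|Y2 fixed at 0)
print(slope)         # != 0.5: conditioning is not fixing
```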
Under simultaneity, the graph for Model 1 is cyclic and the relationships that hold for DAGs, such as the LMC (5), break down (Lauritzen and Richardson, 2002; Spirtes, 1995). Equations (7a) and (7b) cannot be represented as Directed Bayesian networks. The tools developed for DAGs do not directly apply and require modification. Equations (7a) and (7b) are fundamentally non-recursive and observed variables emerge from a feedback process.
The do-calculus does not have the concept of a hypothetical model. It focuses exclusively on the empirical model. Thus, in the simultaneous equations models depicted in Table 9, the do-calculus can only be applied to the model in the first panel (the empirical model). But in the case of the simultaneous equations model, the graph is cyclic, it is not a DAG, and the rules of the do-calculus do not hold.
A traditional assumption in the simultaneous equations literature is “completeness”—the existence of at least a local solution for Y1 and Y2 in terms of (X1, X2, U1, U2):
Y1 = hY1(X1, X2, U1, U2),   (8a)
Y2 = hY2(X1, X2, U1, U2).   (8b)
These are called “reduced form” equations (see, e.g., Matzkin, 2008, 2013). They inherit the autonomy properties of the structural equations.18
The assumption of the existence of a reduced form is not innocuous even in the linear cases for continuous Y1 and Y2 analyzed by Haavelmo (1943, 1944) and the Cowles Foundation pioneers (see Koopmans et al., 1950). Heckman (1978), Tamer (2003), and Chesher and Rosen (2012) analyze the case in which Y1 and Y2 are discrete valued. Solutions (8a) and (8b) may not exist except under conditions given in those papers.19 Alternatively, there may be multiple solutions giving rise to reduced form correspondences. In the case where no solutions exist, the model is incoherent as an equilibrium model unless additional assumptions are invoked. However, one can construct hypothetical models using Haavelmo's insights even in incoherent cases.20
In addition, some frameworks for multivariate discrete data may not be sufficiently rich to distinguish correlation from causation. Heckman (1978) shows that log-linear models for discrete data used in statistics (see, e.g., Bishop et al., 1975) have too few parameters to make causal distinctions. He introduces a class of latent variable models in which such distinctions are possible.
Note further that even in models in which the reduced form equations are well defined, it is not possible, in general, to simultaneously vary Ỹ1 and Ỹ2 so that they (i) solve Equations (7a) and (7b) and (ii) also satisfy the requirement that (Ỹ1, Ỹ2) ╨ (X1, X2, U1, U2). This is apparent from the reduced form equations (8a) and (8b) that, under completeness, the proposed variations must also satisfy. Nonetheless, Ỹ2 and Ỹ1 can be separately constructed to create hypothetical models corresponding to Equations (7a) and (7b) respectively. These equations exist as theoretical constructs independent of any particular equilibrium construct.
Matzkin (2007, 2008, 2012, 2013) presents comprehensive and definitive treatments of alternative approaches for identifying simultaneous equations. Our analysis readily extends to systems with more than two equations, but for the sake of brevity we do not make the extension here.
7 Summary and Conclusions
This paper examines Haavelmo's fundamental contributions to the study of causal inference. He produced the first formal analysis of the distinction between causation and correlation. He carefully distinguished the process of defining causality—a mental act that assigns hypothetical variation to inputs—from the act of identifying causal models from data. Haavelmo was remarkably clear about concepts that are still muddled in some quarters of statistics.21
Haavelmo shows us that causal effects of inputs on outputs are defined in abstract models that assign independent variation to inputs. He formalized Frisch's notion that causality is in the mind. We formalize his insight, extending his analysis of linear models to more general models. This enables us to discuss causal concepts such as "fixing" using an intuitive approach that applies Haavelmo's ideas.
Following Haavelmo, we distinguish the definition of causal parameters from their identification. Our approach to defining causality relies on the assumption of autonomy joined with Haavelmo's notion of hypothetical random variables. Together they enable us to express the distribution of counterfactual outcomes in terms of structural equations and the distributions of the data, by replacing the variables whose causal effects we seek to establish with their hypothetical counterparts. Causal models thus defined can be analyzed with standard statistical tools and do not require new procedures, like the do-calculus, that lie outside the scope of the standard tools of probability and statistics.
In recursive models, identification in Haavelmo's framework is achieved by applying standard statistical tools to Bayesian networks. We link the distributions of the empirical and hypothetical models by expressing the quantities of interest in the hypothetical model in terms of observed quantities in the empirical one.
We illustrate the benefits of Haavelmo's approach first by analyzing an IV DAG and then by comparing identification of the causal effects in Pearl's flagship Front-Door model (Pearl, 2009) using a method based on the Haavelmo approach and a method based on the do-calculus of Pearl (2009). The IV model is not identified using solely the calculus of the do-operator. While both methods generate the same estimator for the Front-Door model, the identification methods differ on both conceptual and methodological grounds. The definition of causal parameters is clearly separated from their identification in the Haavelmo approach; the two tasks are often confounded in applications of the do-calculus. The do-calculus is cumbersome compared to the elegance and simplicity of the Haavelmo approach. We also discuss the limitations of methods of identification that rely on the fundamentally recursive approach of Directed Acyclic Graphs.
That framework cannot accommodate the fundamentally non-recursive simultaneous equations model, which falls outside the scope of Bayesian causal nets and DAGs. The hypothetical model readily accommodates an analysis of causality in simultaneous equations. The rigorous definition of causality in a variety of models, including the simultaneous equations framework, and the identification of causal parameters are central and enduring contributions of Haavelmo (1944).
Acknowledgments
We thank the guest editor of this issue, Olav Bjerkholt, for very helpful comments; participants at the Haavelmo symposium, Oslo, Norway, December, 2011, and the Haavelmo Lecture, University of Oslo, December 13, 2013; as well as Aureo de Paula, Steven Durlauf, Thibaut Lamadon, Maryclare Griffin, and Cullen Roberts; three anonymous referees; participants in the seminar on Quantitative Research Methods in Education, Health, and Social Sciences at the University of Chicago, March 2013; and participants at a seminar at UCL, September 4, 2013. Steve Stigler gave us helpful bibliographic references. This research was supported in part by the American Bar Foundation, a grant from the European Research Council DEVHEALTH-269874, and NICHD R37-HD065072. The views expressed in this paper are those of the authors and not necessarily those of the funders or commentators mentioned here.
A Mathematical Appendix
Theorem T-1:
Proof. If T is a non-descendant of X̃ in the hypothetical model, i.e., T ∈ 𝒯E \ DH(X̃), then T ∈ 𝒯E \ ChH(X̃), as ChH(X̃) ⊂ DH(X̃). Thus, PrH(T|PaH(T)) = PrE(T|PaE(T)) by Equation (8). Moreover, it must be the case that the parents of T are also non-descendants of X̃; i.e., PaH(T) ⊂ 𝒯E \ DH(X̃) ⊂ 𝒯E \ ChH(X̃) ∴ PaH(T) = PaE(T) by Equation (8). Another way of saying this is that the parents of T are not children of X̃. Thus, we can use factorization (6) to write:
As a consequence, PrH(W) = PrE(W) for all W ⊂ 𝒯E \ DH(X̃), and thereby
Conditioning on X̃ comes from the fact that X̃ ╨ (𝒯E \ DH(X̃)), which is obtained by applying LMC (5) to X̃ in GH.
Theorem T-2:
Proof. In order to prove the theorem, we first partition the set of variables 𝒯E into four sets:
Set 1 consists of all variables in 𝒯E that are non-descendants of X in the empirical model and thereby non-descendants of X̃ in the hypothetical one. Set 2 consists of descendants of X that are not directly caused by X, i.e., descendants of X other than its Children. Sets 3 and 4 are the Children of X and of X̃ in the hypothetical model. Note that Sets 3 and 4 together consist of all Children of X in the empirical model, as ChE(X) = ChH(X) ∪ ChH(X̃). We now examine the variables of each set separately:
For all T ∈ 𝒯H \ DE(X), we have {T, PaH(T)} ⊂ 𝒯H \ DE(X) ⊂ 𝒯E \ DH(X̃), as DH(X̃) ⊂ DE(X). Also X ∈ 𝒯E \ DH(X̃). Thus, by Theorem T-1, PrH(T|PaH(T), X̃ = x, X = x) = PrE(T|PaE(T), X = x).
T ∈ DE(X) \ ChE(X) ⇒ X̃ ∉ PaH(T), X ∉ PaH(T), and PaH(T) = PaE(T). Moreover, X, X̃ must be non-descendants of T due to the acyclic property of the empirical model on X. Thus, by LMC (5), (X̃, X) ╨ T|PaH(T). By Weak Union, X̃ ╨ T|(PaH(T), X). Therefore PrH(T|PaH(T), X̃ = x, X = x) = PrH(T|PaH(T), X = x) = PrE(T|PaE(T), X = x) by Equation (8).
T ∈ ChH(X) ⇒ X̃ ∉ PaH(T) and X ∈ PaH(T) = PaE(T). Also, X̃ is external, thus X̃ ╨ T|PaH(T) by LMC (5) applied to T. Therefore PrH(T|PaH(T) \ X, X̃ = x, X = x) = PrH(T|PaH(T) \ X, X = x) = PrE(T|PaE(T) \ X, X = x) by Equation (8) as T ∈ ChH(X) ⊂ 𝒯E \ ChH(X̃).
T ∈ ChH(X̃) ⇒ X̃ ∈ PaH(T). Moreover, X must be a non-descendant of T due to the acyclic property of the empirical model on X. Thus, by LMC (5), X ╨ T|PaH(T). Therefore PrH(T|PaH(T) \ X̃, X̃ = x, X = x) = PrH(T|PaH(T) \ X̃, X̃ = x) = PrE(T|PaE(T) \ X, X = x) by Equation (9).
Grouping items 1–4, we have that for all T ∈ 𝒯H, PrH(T|PaH(T), X̃ = x, X = x) = PrE(T|PaE(T), X = x). Thus we can use the factorization (6) to obtain
PrH(𝒯E \ {X}|X̃ = x, X = x) = PrE(𝒯E \ {X}|X = x).   (17)
The claim of the theorem is a direct consequence of Equation (17).
Corollary C-1:
Proof.
The second equality stems from PaH(X̃) = ∅ and the fact that X is not a descendant of X̃; thus, by LMC (5), X ╨ X̃. Therefore PrH(X = x, X̃ = x) = PrH(X = x)PrH(X̃ = x). The third equality comes from the assumption that PrH(X̃ = x) is constant due to uniformity. The fourth equality comes from the fact that . The first term of the fifth equality comes from an application of Theorem T-2. The second term of the fifth equality comes from Theorem T-1 and the fact that X ∈ 𝒯E \ DH(X̃).
Theorem T-3:
Proof.
The first equality comes from the fact that the hypothetical model is a DAG; therefore, we can apply factorization (6). The second equality comes from the defining characteristic of the do-operator, which targets all causal links of the fixed variable X; thus, the hypothetical variable X̃ must replace all X inputs, which is equivalent to ChH(X̃) = ChE(X). The first and second terms of the third equality come as a consequence of Equations (8) and (9), respectively. The last equality comes from the definition of the do-operator.
Matching C-2:
Proof.
Footnotes
To our knowledge, the first recorded statement of the distinction between correlation and causation is due to Fechner (1851), who distinguished “causal dependency” from what he called “functional relationship.” See Heidelberger (2004, p. 102). In later work, Yule (1895, footnote 2, p. 605) discussed the distinction between correlation and causation in a discussion of the effect of relief payments on pauperism. We thank, respectively, Olav Bjerkholt and Steve Stigler for these references.
“… we think of a cause as something imperative which exists in the exterior world. In my opinion this is fundamentally wrong. If we strip the word cause of its animistic mystery, and leave only the part that science can accept, nothing is left except a certain way of thinking, an intellectual trick … which has proved itself to be a useful weapon … the scientific … problem of causality is essentially a problem regarding our way of thinking, not a problem regarding the nature of the exterior world.” (Frisch 1930, p. 36, published 2011)
“In many fields the objective of the investigator's inquisitiveness is not just a “population” in the sense of a distribution of observable variables, but a physical structure projected behind this distribution, by which the latter is thought to be generated. The word “physical” is used merely to convey that the structure concept is based on the investigator's ideas as to the “explanation” or “formation” of the phenomena studied, briefly, on his theory of these phenomena, whether they are classified as physical in the literal sense, biological, psychological, sociological, economic or otherwise.” (Koopmans and Reiersøl 1950, p. 165)
See Simon (1953), Heckman (2008) and Freedman (2010), for later statements of this point of view.
All models—empirical or hypothetical—are idealized thought experiments. There are no formalized rules for creating models, causal or empirical. Analysts may differ about the inputs and relationships in either type of model. A model is more plausible the more phenomena it predicts and the deeper are its foundations in established theory. Causal models are idealizations of empirical models which are in turn idealizations of phenomena. Some statisticians reject the validity of hypothetical models and seek to define causality using empirical methods (Sobel, 2005). As an example we can cite the “Rubin model” of Holland (1986), which equates establishing causality with the empirical feasibility of conducting experiments. This approach confuses the definition of causal parameters with their identification from data. See Heckman (2005, 2008) for a discussion of this approach.
This framework allows for uncertainty on the part of agents if realizations of the uncertain variables are captured through variables X and U. In that sense the model can be characterized as a method for examining ex-post relationships between variables. For a discussion of causal analysis of ex-post versus ex-ante models, see, e.g., Hansen and Sargent (1980) and Heckman (2008).
Haavelmo (1943) did not explicitly use the term “fixing.” He set U (in our notation) to a specified value and manipulated X in his “hypothetical model.” Specifically Haavelmo set U = 0 but the point of evaluation is irrelevant in the linear case he analyzed.
See Pearl (2009) and Spirtes et al. (2000) for discussions.
We could express X̃ = fX̃(εX̃) to be notationally consistent.
Frisch's (1938) notion of invariance used by Haavelmo is called SUTVA in one model of causality popular in statistics. See Holland (1986) and Rubin (1986).
Chalak and White (2012) present generalizations of this approach.
The Graphoid axioms are:
- Symmetry: X ╨ Y|Z ⇒ Y ╨ X|Z.
- Decomposition: X ╨ (W, Y)|Z ⇒ X ╨ Y|Z.
- Weak Union: X ╨ (W, Y)|Z ⇒ X ╨ W|(Y, Z).
- Contraction: X ╨ Y|Z and X ╨ W|(Y, Z) ⇒ X ╨ (W, Y)|Z.
- Intersection: X ╨ W|(Y, Z) and X ╨ Y|(W, Z) ⇒ X ╨ (W, Y)|Z.
- Redundancy: X ╨ Y|X.
The intersection relation is only valid for variables with strictly positive probability distributions. See also Dawid (2001).
As an example, let a simple empirical model for mediation analysis consist of three variables: an input variable X, a mediation variable M caused by X, and an outcome of interest Y caused by X and M. This model is represented as a DAG in Model 1 of Table 2 and ChE(X) = {M, Y}. Suppose we are interested in the indirect effect, that is, the effect of X on Y that operates exclusively through changes in M while holding the distribution of X unaltered. The hypothetical model for the evaluation of the indirect causal effect assigns the causal link of X on M to the hypothetical variable X̃: X still causes Y, but X̃ causes M. This hypothetical model is represented by Model 3 of Table 2. In this model ChH(X) = {Y}, ChH(X̃) = {M} and ChE(X) = ChH(X) ∪ ChH(X̃).
Disjoint (i.e., distinct) from X
We also note the following result:
Corollary C-1. Let X̃ be uniformly distributed on the support of X and let W, Z be any disjoint sets of variables in 𝒯E; then:
Proof. See Appendix. We thank an anonymous referee for suggesting this result and its proof.
See, e.g., Rosenbaum and Rubin (1983).
A path p is said to be blocked by a set of nodes Z if and only if:
- p contains a chain i → m → j or a fork i ← m → j such that the middle node m is in Z, or
- p contains an inverted fork (or collider) i → m ← j such that the middle node m is not in Z and such that no descendant of m is in Z.
A set Z is said to d-separate X from Y if and only if Z blocks every path from a node in X to a node in Y. If X and Y are d-separated by Z according to a graph G, then X ╨ Y|Z in G. We are examining the hypothetical model described by the second column of Table 2. Variables Y and X̃ are connected by a single path X̃ → M → Y. Thus we have that Y ╨ X̃|M, according to part 1 of the d-separation criteria. Moreover, we can also state that Y ╨ X̃|(M, X), as X is neither a collider nor a descendant of a collider (part 2 of the d-separation criteria). Finally, the only path connecting X and M is X → Y ← M, which contains the collider Y, and neither Y nor a descendant of Y is in the conditioning set. Thus we can state that X ╨ M according to part 2 of the d-separation criteria.
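To make the blocking criterion concrete, the following minimal Python sketch (our own illustration; the function and node names are hypothetical, and we write Xt for X̃) implements the equivalent ancestral moral graph test of Lauritzen (1996) and verifies the independencies claimed above:

```python
from collections import deque
from itertools import combinations

def d_separated(parents, x, y, z):
    """Test whether x and y are d-separated given set z in a DAG.

    parents maps each node to the set of its parents. Uses the
    ancestral moral graph criterion: x and y are d-separated by z
    iff z separates them in the moralized ancestral graph.
    """
    # 1. Ancestral set of {x, y} and z.
    relevant, stack = set(), [x, y, *z]
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(parents.get(node, ()))
    # 2. Moralize: link each parent to its child and "marry" co-parents.
    adj = {node: set() for node in relevant}
    for child in relevant:
        for p in parents.get(child, ()):
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(parents.get(child, ()), 2):
            adj[p].add(q)
            adj[q].add(p)
    # 3. Remove z and check whether x can still reach y.
    seen, queue = {x}, deque([x])
    while queue:
        for nbr in adj[queue.popleft()]:
            if nbr not in seen and nbr not in z:
                seen.add(nbr)
                queue.append(nbr)
    return y not in seen

# Hypothetical model of Table 2: Xt -> M -> Y <- X
pa = {"Xt": set(), "X": set(), "M": {"Xt"}, "Y": {"M", "X"}}
print(d_separated(pa, "Y", "Xt", {"M"}))       # True: Y indep of Xt given M
print(d_separated(pa, "Y", "Xt", {"M", "X"}))  # True: Y indep of Xt given (M, X)
print(d_separated(pa, "X", "M", set()))        # True: X indep of M
```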
These assumptions are made to simplify the analysis. A large literature relaxes these assumptions and develops identification criteria for cases where U1 and U2 are dependent and (U1, U2) is dependent on (X1, X2). The literature considers a variety of specifications (see Matzkin, 2008). We maintain the assumptions that U1 ╨ U2 and (U1, U2) ╨ (X1, X2) for simplicity.
We use the term “completeness” in the sense of Koopmans et al. (1950); i.e., the existence of a local solution of Equations (7a) and (7b). This concept is to be distinguished from the notion of completeness in the nonparametric IV literature (Newey and Powell, 2003) or in hypothesis testing (Lehmann and Romano, 2005).
Under completeness, we can use a version of indirect least squares to define causal parameters and identify them where the induced variation in Ỹ1 and Ỹ2 satisfies equilibrium conditions. Thus, if X1 and X2 are disjoint, one can use indirect least squares to identify ∂gY1/∂Y2 from the reduced form equations (8a) and (8b), assumed to be differentiable. From the reduced forms, differentiating with respect to an element of X2 (which is excluded from Equation (7a)) yields ∂Y1/∂X2 = (∂gY1/∂Y2)(∂Y2/∂X2). Thus, ∂gY1/∂Y2 = (∂Y1/∂X2)/(∂Y2/∂X2).
If X1 and X2 contain common elements, the method can be modified to use only the distinct elements in X1 and X2 in this analysis.
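As a numerical check (our own illustration; the linear structural coefficients below are hypothetical), the following sketch verifies that the ratio of reduced-form derivatives recovers the structural effect of Y2 on Y1:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Hypothetical linear structural system (illustrative only):
#   Y1 = 0.5*Y2 + X1 + U1,   Y2 = -0.3*Y1 + X2 + U2
x1, x2 = rng.normal(size=n), rng.normal(size=n)
u1, u2 = rng.normal(size=n), rng.normal(size=n)
y1 = (x1 + u1 + 0.5 * (x2 + u2)) / (1 - 0.5 * (-0.3))  # reduced form (8a)
y2 = -0.3 * y1 + x2 + u2                                # reduced form (8b)

# Indirect least squares: ratio of reduced-form coefficients on X2.
dy1_dx2 = np.cov(y1, x2)[0, 1] / np.var(x2)
dy2_dx2 = np.cov(y2, x2)[0, 1] / np.var(x2)
print(dy1_dx2 / dy2_dx2)  # ~0.5, the structural effect of Y2 on Y1
```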
Linear probability model approximations to Equations (7a) and (7b), as advocated by Angrist and Pischke (2008), although widely used, are in general not autonomous. They can, however, be estimated and identified for incoherent models, creating the illusion of coherency through approximation error. See Heckman and MaCurdy (1985) for a discussion.
This might be a conceptually unsatisfactory exercise unless the data intended to be described by the model display disequilibrium cycling phenomena and a time sequence for the evolution of the system, e.g., Y1^(t), Y2^(t), t = 1, 2, …, is postulated as functions of inputs, where superscripts denote time-dated variables.
See, e.g., Holland (1986) and Sobel (2005) for examples of the confusion between models and identification strategies exemplified by the claim that no causal statements are possible unless persons are randomly assigned to treatment.
Contributor Information
James Heckman, Email: jjh@uchicago.edu, The University of Chicago, Department of Economics, 1126 E. 59th St. Chicago, IL 60637 (773) 702-0634.
Rodrigo Pinto, Email: rodrig@uchicago.com, The University of Chicago, Department of Economics, 1155 E. 60th St., Room 215, Chicago, IL 60637 (773) 702-9436.
References
- Angrist JD, Pischke JS. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton: Princeton University Press; 2008.
- Bishop YM, Fienberg SE, Holland PW. Discrete Multivariate Analysis: Theory and Practice. Cambridge, Massachusetts: The MIT Press; 1975.
- Blundell R, Powell J. Endogeneity in nonparametric and semiparametric regression models. In: Dewatripont M, Hansen LP, Turnovsky SJ, editors. Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress. Vol. 2. Cambridge, UK: Cambridge University Press; 2003.
- Chalak K, White H. Causality, conditional independence, and graphical separation in settable systems. Neural Computation. 2012;24(7):1611–1668.
- Chesher A, Rosen A. Simultaneous equations for discrete outcomes: Coherence, completeness, and identification. Working Paper CWP21/12, cemmap; 2012.
- Dawid A. Separoids: A mathematical framework for conditional independence and irrelevance. Annals of Mathematics and Artificial Intelligence. 2001;32(1–4):335–372.
- Dawid AP. Conditional independence in statistical theory (with discussion). Journal of the Royal Statistical Society, Series B (Methodological). 1979;41(1):1–31.
- Fechner GT. Outline of a new principle of mathematical psychology. Psychological Research. 1851;49:203–207. doi: 10.1007/BF00309027.
- Freedman D. Statistical Models and Causal Inference: A Dialogue with the Social Sciences. Collier D, Sekhon J, Stark P, editors. Cambridge, UK: Cambridge University Press; 2010.
- Frisch R. A Dynamic Approach to Economic Theory: The Yale Lectures of Ragnar Frisch, 1930. Bjerkholt O, Qin D, editors. New York: Routledge; 1930, published 2010.
- Frisch R. Autonomy of economic relations: Statistical versus theoretical relations in economic macrodynamics. Paper given at the League of Nations; 1938. Reprinted in Hendry DF, Morgan MS, editors. The Foundations of Econometric Analysis. Cambridge: Cambridge University Press; 1995.
- Goth G. Judea Pearl interview. IEEE Internet Computing. 2006;10(5):6.
- Haavelmo T. The statistical implications of a system of simultaneous equations. Econometrica. 1943 January;11(1):1–12.
- Haavelmo T. The probability approach in econometrics. Econometrica. 1944;12(Supplement):iii–vi, 1–115.
- Hansen LP, Sargent TJ. Formulating and estimating dynamic linear rational expectations models. Journal of Economic Dynamics and Control. 1980 February;2(1):7–46.
- Heckman J, Pinto R. A unified approach to examine treatment effects: Causality and identification. Unpublished manuscript, University of Chicago, Department of Economics; 2013.
- Heckman JJ. The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement. 1976 December;5(4):475–492.
- Heckman JJ. Dummy endogenous variables in a simultaneous equation system. Econometrica. 1978 July;46(4):931–959.
- Heckman JJ. Sample selection bias as a specification error. Econometrica. 1979 January;47(1):153–162.
- Heckman JJ. The scientific model of causality. Sociological Methodology. 2005 August;35(1):1–97.
- Heckman JJ. Econometric causality. International Statistical Review. 2008 April;76(1):1–27.
- Heckman JJ, MaCurdy TE. A simultaneous equations linear probability model. Canadian Journal of Economics. 1985 February;18(1):28–37.
- Heckman JJ, Robb R. Alternative methods for evaluating the impact of interventions: An overview. Journal of Econometrics. 1985 October–November;30(1–2):239–267.
- Heckman JJ, Vytlacil EJ. Local instrumental variables and latent variable models for identifying and bounding treatment effects. Proceedings of the National Academy of Sciences. 1999 April;96(8):4730–4734. doi: 10.1073/pnas.96.8.4730.
- Heckman JJ, Vytlacil EJ. Structural equations, treatment effects and econometric policy evaluation. Econometrica. 2005 May;73(3):669–738.
- Heckman JJ, Vytlacil EJ. Econometric evaluation of social programs, part I: Causal models, structural models and econometric policy evaluation. In: Heckman J, Leamer E, editors. Handbook of Econometrics. Vol. 6B. Amsterdam: Elsevier; 2007a. pp. 4779–4874.
- Heckman JJ, Vytlacil EJ. Econometric evaluation of social programs, part II: Using the marginal treatment effect to organize alternative economic estimators to evaluate social programs and to forecast their effects in new environments. In: Heckman J, Leamer E, editors. Handbook of Econometrics. Vol. 6B, Chapter 71. Amsterdam: Elsevier; 2007b. pp. 4875–5143.
- Heidelberger M. Nature From Within: Gustav Theodor Fechner and His Psychophysical Worldview. Pittsburgh, PA: University of Pittsburgh Press; 2004.
- Holland PW. Statistics and causal inference. Journal of the American Statistical Association. 1986 December;81(396):945–960.
- Howard RA, Matheson JE. Influence diagrams. In: Principles and Applications of Decision Analysis. Vol. 1. Menlo Park, CA: Stanford Research Institute; 1981. pp. 720–762.
- Huang Y, Valtorta M. A study of identifiability in causal Bayesian networks. Technical report, University of South Carolina, Department of Computer Science; 2006.
- Huang Y, Valtorta M. On the completeness of an identifiability algorithm for semi-Markovian models. Annals of Mathematics and Artificial Intelligence. 2008;54(4):363–408.
- Imbens GW, Angrist JD. Identification and estimation of local average treatment effects. Econometrica. 1994 March;62(2):467–475.
- Kiiveri H, Speed TP, Carlin JB. Recursive causal models. Journal of the Australian Mathematical Society (Series A). 1984;36(1):30–52.
- Koopmans TC, Reiersøl O. The identification of structural characteristics. The Annals of Mathematical Statistics. 1950 June;21(2):165–181.
- Koopmans TC, Rubin H, Leipnik RB. Measuring the equation systems of dynamic economics. In: Koopmans TC, editor. Statistical Inference in Dynamic Economic Models. Cowles Commission Monograph No. 10, Chapter 2. New York: John Wiley & Sons; 1950. pp. 53–237.
- Lauritzen SL. Graphical Models. Oxford, UK: Clarendon Press; 1996.
- Lauritzen SL. Causal inference from graphical models. In: Barndorff-Nielsen O, Cox DR, Klüppelberg C, editors. Complex Stochastic Systems. London: Chapman and Hall; 2001. pp. 63–107.
- Lauritzen SL, Richardson TS. Chain graph models and their causal interpretations. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2002;64(3):321–348.
- Lehmann EL, Romano JP. Testing Statistical Hypotheses. 3rd ed. New York: Springer Science and Business Media; 2005.
- Margolis M, List J, Osgood D. Endangered options and endangered species: What we can learn from a dubious design. Unpublished manuscript, Gettysburg College, Department of Economics; 2012 April.
- Marshall A. Principles of Economics. New York: Macmillan and Company; 1890.
- Mas-Colell A, Whinston MD, Green JR. Microeconomic Theory. New York: Oxford University Press; 1995.
- Matzkin RL. Nonparametric identification. In: Heckman J, Leamer E, editors. Handbook of Econometrics. Vol. 6B. Amsterdam: Elsevier; 2007.
- Matzkin RL. Identification in nonparametric simultaneous equations models. Econometrica. 2008;76(5):945–978.
- Matzkin RL. Identification in nonparametric limited dependent variable models with simultaneity and unobserved heterogeneity. Journal of Econometrics. 2012;166(1):106–115.
- Matzkin RL. Nonparametric identification of structural economic models. Annual Review of Economics. 2013;5. Forthcoming.
- Newey WK, Powell JL. Instrumental variable estimation of nonparametric models. Econometrica. 2003 September;71(5):1565–1578.
- Pearl J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann; 1988.
- Pearl J. [Bayesian analysis in expert systems]: Comment: Graphical models, causality and intervention. Statistical Science. 1993;8(3):266–269.
- Pearl J. Causal diagrams for empirical research. Biometrika. 1995 December;82(4):669–688.
- Pearl J. Causality. Cambridge, England: Cambridge University Press; 2000.
- Pearl J. Causality: Models, Reasoning, and Inference. Reprinted with corrections. New York: Cambridge University Press; 2001.
- Pearl J. Causality: Models, Reasoning, and Inference. 2nd ed. New York: Cambridge University Press; 2009.
- Pearl J, Verma TS. A theory of inferred causation. In: Prawitz D, Skyrms B, Westerståhl D, editors. Logic, Methodology, and Philosophy of Science: Proceedings of the Ninth International Congress of Logic, Methodology, and Philosophy of Science, Uppsala, Sweden, August 7–14, 1991. Amsterdam: Elsevier Science; 1994. pp. 789–812.
- Powell JL. Estimation of semiparametric models. In: Engle R, McFadden D, editors. Handbook of Econometrics. Vol. 4. Amsterdam: Elsevier; 1994. pp. 2443–2521.
- Reiersöl O. Confluence analysis by means of instrumental sets of variables. Arkiv för Matematik, Astronomi och Fysik. 1945;32A(4):1–119.
- Robins J. A new approach to causal inference in mortality studies with a sustained exposure period: Application to control of the healthy worker survivor effect. Mathematical Modelling. 1986;7(9–12):1393–1512.
- Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983 April;70(1):41–55.
- Rubin DB. Statistics and causal inference: Comment: Which ifs have causal answers. Journal of the American Statistical Association. 1986;81(396):961–962.
- Simon HA. Causal ordering and identifiability. In: Hood WC, Koopmans TC, editors. Studies in Econometric Method. Chapter 3. New York, NY: John Wiley & Sons; 1953. pp. 49–74.
- Sobel ME. Discussion: 'The scientific model of causality'. Sociological Methodology. 2005;35(1):99–133.
- Spirtes P. Directed cyclic graphical representations of feedback models. In: Proceedings of the Eleventh Annual Conference on Uncertainty in Artificial Intelligence (UAI-95). San Francisco, CA: Morgan Kaufmann; 1995. pp. 491–498.
- Spirtes P, Glymour CN, Scheines R. Causation, Prediction, and Search. 2nd ed. Cambridge, MA: MIT Press; 2000.
- Tamer E. Incomplete simultaneous discrete response model with multiple equilibria. Review of Economic Studies. 2003 January;70(1):147–165.
- Tian J, Pearl J. A general identification condition for causal effects. In: Proceedings of the Eighteenth National Conference on Artificial Intelligence. Cambridge, MA: AAAI Press; 2002. pp. 567–573.
- Tian J, Pearl J. On the identification of causal effects. Technical report, Cognitive Systems Laboratory, University of California at Los Angeles; 2003.
- Vytlacil EJ. Independence, monotonicity, and latent index models: An equivalence result. Econometrica. 2002 January;70(1):331–341.
- White H, Chalak K. Settable systems: An extension of Pearl's causal model with optimization, equilibrium, and learning. Journal of Machine Learning Research. 2009;10:1759–1799.
- Yule GU. On the correlation of total pauperism with proportion of out-relief. The Economic Journal. 1895;5(20):603–611.
