Published in final edited form as: Psychol Bull. 2013 Apr 1;140(1):109–139. doi: 10.1037/a0031903

Reasoning about Causal Relationships: Inferences on Causal Networks

Benjamin Margolin Rottman 1, Reid Hastie 2
PMCID: PMC3988659  NIHMSID: NIHMS484441  PMID: 23544658

Abstract

Over the last decade, a normative framework for making causal inferences, Bayesian Probabilistic Causal Networks, has come to dominate psychological studies of inference based on causal relationships. The following causal networks—X→Y→Z, X←Y→Z, X→Y←Z—supply answers for questions like, “Suppose both X and Y occur, what is the probability Z occurs?” or “Suppose you intervene and make Y occur, what is the probability Z occurs?” In this review, we provide a tutorial on how to calculate these inferences normatively. Then, we systematically detail the results of behavioral studies comparing human qualitative and quantitative judgments to the normative calculations for many network structures and for several types of inferences on those networks. Overall, when the normative calculations imply that an inference should increase, judgments usually go up; when calculations imply a decrease, judgments usually go down. However, two systematic deviations appear. First, people’s inferences violate the Markov assumption. For example, when inferring Z from the structure X→Y→Z, people think that X is relevant even when Y completely mediates the relationship between X and Z. Second, even when people’s inferences are directionally consistent with the normative calculations, they are often not as sensitive to the parameters and the structure of the network as they should be. We conclude with a discussion of productive directions for future research.

Keywords: Causal Inference, Causal Structures, Bayes Nets, Markov Assumption, Discounting, Explaining Away, Conditional Reasoning, Logical Arguments

Introduction

Most human judgments under uncertainty involve reasoning about causal relationships. For example, a physician tries to infer which disease is the most likely cause of a patient’s symptoms (effects). Then, the physician intervenes to alleviate the symptoms by changing the causal dynamics within the patient. Or a corn futures trader forecasts the price of corn by considering the consequences of various possible economic and geopolitical events (e.g., Will a change in China’s trade policy influence the value of corn in North America?). And, more personally, one commits to an exercise and diet plan because one believes that the program will produce specific health benefits.

However, until recently, the role of causal reasoning in judgments under uncertainty has been neglected in psychological research. One reason for this neglect has been the lack of a good normative model for the reasoning process that underlies even simple everyday causal inferences, such as in the examples above. In the past ten years, there has been a paradigm shift in behavioral research on causal inference. The shift has been driven by the dissemination of the Bayesian Probabilistic Causal Network approach to modeling causality (henceforth referred to as “causal networks”). This approach provides prescriptions for rational calculations for inferences on causal networks. The approach has its roots in theoretical papers by Pearl (1988, 2000), Lauritzen and Spiegelhalter (1988), and Spirtes, Glymour, and Scheines (1993/2000) in mathematics and statistics. It has been communicated to behavioral scientists in books by Glymour (2001) and Sloman (2005), as well as papers by many other researchers (Danks, 2009; Gopnik et al., 2004; Rehder & Hastie, 2001; Steyvers, Tenenbaum, Wagenmakers, & Blum, 2003; Waldmann, 1996; Waldmann & Martignon, 1998).

Our focus here is on deliberate and partly conscious reasoning about causal beliefs. For example, when our car fails to start one morning, we engage in a deliberate, partly verbalizable sequence of inferences based on our beliefs about what is causing what within the car and its immediate environment: Could something about the weather – recent rainfall – have interfered with the normal sequence of events that occur after we turn the ignition key? Or could the gas tank be empty, the battery be dead, a fuse blown, or a wire chewed through by a squirrel? Here we start from a single fact or set of facts (the car won’t start and it rained last night), and then reason within a system of beliefs (about how the car works) to update our beliefs about the world (rain probably caused a short). Our focus here is not on how we obtain knowledge about how the car works, but rather on how we make inferences or judgments about the car given our knowledge of how the car works.

Introduction to Causal Networks

Throughout this article we will refer to a stylized example about farming represented in Figure 1. Imagine a farmer who grows cantaloupes and tomatoes. Both cantaloupes and tomatoes are damaged by an early frost (F); they are effects of a common cause. In addition, the tomato harvest (T) is hurt by the tomato fruitworm (W); however, this pest does not affect the cantaloupe harvest (C). Finally, if a farmer has a poor tomato harvest, then he or she is likely to reap a small profit from the tomatoes (P).

Figure 1. Farming Scenario

The graph in Figure 1 conveys the structure of the causal relationships. The nodes represent variables that can take on multiple values. For example, uppercase F represents whether there was an early frost or not. Lowercase represents the state of the node; f=1 denotes that there was an early frost, and f=0 denotes that there was not. The causal relationships are represented by arrows (or “edges”) between the nodes.

A fully realized causal network also contains parameters. “Base rate” parameters capture the probability of exogenous nodes, P(F) and P(W), which do not have any explicitly represented causes. “Strength” parameters model how likely each cause is to generate or inhibit each of its effects. When multiple causes influence the same effect (such as F and W on T), a function must be identified to describe how these causes combine to produce T.

Once we know the structure and parameters, the normative theory of graphical causal models prescribes how one should infer the state of one variable given the state of another. For example, suppose we learn that a farm had a tomato fruitworm infestation. We would infer that the farm probably had a poor tomato harvest, but we would not rationally infer anything about the cantaloupe harvest. This sort of inference is often called an inference from an observation; we observe the state of one variable and then infer another. We will also discuss inferences from interventions, when we manipulate the state of one variable and then infer the state of another (e.g., if we spray the tomatoes with a pesticide to prevent a fruitworm infestation and then infer the tomato harvest). We also consider reasoning about counterfactuals such as “What would the profit from tomatoes have been had the tomato fruitworm infestation not occurred?” All of these questions can be interpreted as inferences on the causal network in Figure 1.

So, what makes a graph and parameters of this type a causal network, rather than merely a “probability graph”? It is simply the interpretation of the graph: if the graph is defined as representing causal relationships, then it is a causal graph. Certain conventions of these networks convey this distinctly causal interpretation. These include (a) the interpretation of arrows as indicating temporal ordering on the variables, (b) the assumption that interventions on the value of one node will be propagated only “downstream” to future states of other nodes, and (c) the presumption that counterfactual inferences can be made about “what would have happened” if the states of nodes had been otherwise than they in fact were. In sum, the causal interpretation of these graphs comes from how they are used and what they represent, not from the probability calculus itself.

Three steps of causal inference

To clarify our focus we distinguish three steps of causal inference: (i) learning the structure of the causal network; (ii) learning the parameters; and (iii) making a judgment about one node given our knowledge about the other nodes. Our review focuses on making judgments. However, all of the behavioral experiments “teach” the structure and parameters to the human research participants in some manner. Because learning the structure and parameters conceptually precedes making judgments, we provide a brief overview of these prior types of learning.

Learning the structure of a causal network (i.e., which variables cause which other variables) often occurs through explicit teaching (e.g., in a biology class or reading The Economist) and deducing plausible causal pathways based on mechanistic hypotheses (e.g., rain water could have caused a short in the car ignition system). Additionally, much of the recent research on “causal learning” has focused on how people learn causal structures from experience (e.g., Gopnik et al., 2004; Lagnado & Sloman, 2004, 2006; Rottman & Keil, 2012; Steyvers et al., 2003; see Lagnado, Waldmann, Hagmayer, & Sloman, 2007, for a summary). For example, a parent might form beliefs about how to raise a well-behaved child by observing correlations between children’s behaviors and the behaviors of those children’s parents. Of course, it is notoriously difficult to learn causal relationships from correlations alone. A second way to learn causal structures is from “interventions”: a parent might try various child-rearing habits to see which one works best. Finally, people also learn causal structures from a variety of temporal cues. For example, a mother might infer different causal relationships if she notices that after her son has a restless night he misbehaves, versus the observation that after he misbehaves he sleeps poorly.

It is still unclear how successful we are at learning causal structures from experience. Furthermore, we often have beliefs about what causes what (e.g., rain might have caused an ignition short in my car) and can make judgments and decisions (e.g., I’ll wait to see if the short is fixed after the water dries) without having to learn the causal structure through some form of statistical induction from experience.

The second step is learning the causal strengths, i.e., the degree to which a cause influences each of its effects (see Hattori & Oaksford, 2007, for a summary of 41 potential models). Studies investigating learning focus on scenarios in which there are one or more possible causes (A, B, C) of a single effect (E) and the goal is to learn the strengths of the alternate causes. Often these experiments do not distinguish whether participants learned about whether the link A→E exists versus the strength of the link A→E; thus experiments about “causal strength learning” and “causal structure learning” as well as “multiple cue learning” and even “covariation detection” can overlap.

Earlier literature on causal strength learning often focused on “irrational” inferences like illusory correlation (e.g., Jenkins & Ward, 1965) and how strengths could be learned through associative mechanisms (e.g., Dickinson, Shanks, & Evenden, 1984). However, the recent trend has been to focus on rational explanations for patterns in causal strength learning such as conditioning on alternative causes (e.g., Waldmann, 1996; Waldmann & Hagmayer, 2001; Spellman, 1996), accounting for ceiling and floor effects (Cheng, 1997; Novick & Cheng, 2004), understanding the interaction between whether a link exists and the strength of the link (Griffiths & Tenenbaum, 2005), and incorporating prior beliefs about the likely strength of potential causes (Lu et al., 2008).

Most of the studies reviewed here that focus on judgment simply told experimental participants the causal structure rather than having them learn it from experience. Participants learned the parameters by observing correlations between causes and effects or from textual descriptions or prior knowledge (sometimes participants did not have any specific quantitative knowledge of the parameters). Thus, even in controlled experiments there may be questions about participants’ beliefs about the causal system (we attend to these issues on a study-by-study basis in this review). Overall, our focus is on people’s judgments given their causal belief system.

Simplifications and Limitations of Causal Networks

It is important to keep the nature of the simplifications that are inherent in the causal network framework clearly in mind, as this approach to human judgment depends upon accepting that such simplifications do not drastically distort everyday habits of thinking about causal relationships. The causal network framework is very flexible and can be extended to loosen various assumptions. However, the standard framework—the one that has been the primary focus in the causal reasoning literature—makes the following assumptions.

First, the networks we review do not represent any temporal durations such as the length of delay between a punctate cause and effect or the timing of a maximum effect (e.g., ibuprofen has its maximum effect at about 1 hour after ingestion). We note, however, that standard causal networks can be expanded to include temporal information (e.g., Buchanan & Sobel, 2011; Rottman & Keil, 2012).

Second, the causal networks we review are acyclic; they cannot have any loops like X→Y→Z→X or “bidirectional” relationships like X↔Y. In acyclic networks each variable can be represented as a function of the variables that directly cause it, but if a variable causes itself then this function is indeterminate. Standard networks can be “unfolded over time” to account for causal loops (e.g., Griffiths & Tenenbaum, 2009; Kim, Luhmann, Pierce, & Ryan, 2009; Rehder & Martin, 2011).

Third, the networks considered in most applications are incomplete. Surely there are many variables that could be added that precede, mediate, and/or follow the variables explicitly represented in any network (e.g., other causes and effects of tomato fruitworms or small profits).

Fourth, there are many “zero links” in the network, when in reality there are small causal influences between relevant causal events. For example, in a realistic economic context, the cantaloupe harvest probably has an impact on the market price of tomatoes, but this influence is ignored in the Farming Scenario. This sparseness is also typical of all the relevant behavioral research.

Fifth, an essential property of causal networks is the Markov Assumption. In reference to Figure 1, this assumption says that when the state of T is known, and we infer P, F does not provide any additional information about P. In other words, T completely mediates the relationship from F to P. The Markov assumption greatly simplifies normative causal inference because it identifies variables that can be ignored for certain inferences. Unlike the simplifications above, the Markov assumption cannot be relaxed or abandoned; it is fundamental to the causal network framework.

These limitations have led some philosophers and mathematicians to conclude that the entire enterprise of modeling realistic situations with such graphs is futile (e.g., Cartwright, 1999, 2001, 2002; see also papers in Gelman & Meng, 2004). We still believe that the approach helps us understand real causal systems and how ordinary people think about causality. But not all readers will agree, and we want to be clear about the strong assumptions required to believe that Causal Networks provide a useful tool for understanding causal cognition.

Simplifications and Limitations of Psychological Research on Causal Networks

In addition to the limitations and simplifications of the normative causal network approach, there are additional simplifications in the ways that causal inference is typically studied in psychology experiments. First, although the variables in the example network could be continuous, ordinal, or categorical, the majority of behavioral research has focused on binary causes and effects (e.g., the tomato harvest was good or poor, not number of tons of tomatoes harvested). Second, although each causal relationship could be generative or inhibitory, most of the existing research has focused on generative links. In the farming example we represented Early Frost as causing a Poor Tomato Harvest, not preventing a Good Tomato Harvest.

Third, when two or more causes influence one effect, the causes can potentially combine in many different ways. For example, when causes are multi-valued they could produce the effect additively, multiplicatively, or with any other function (Waldmann, 2007). However, most research (which has focused on independent, generative, binary causes) has assumed a particular “functional form” called the “Noisy-OR gate” (e.g., Cheng, 1997; Griffiths & Tenenbaum, 2005; Novick & Cheng, 2004; Pearl, 1988; see Yuille & Lu, 2008 for other functional forms). For example, one might believe that the probability of a poor tomato harvest is determined by the union (as opposed to the intersection, or some other function) of a frost or an infestation that successfully causes a poor tomato harvest.

Plan for this Review

Our focus is on how people make inferences and whether their inferences agree with the normative calculations on causal networks. We first discuss whether people’s inferences follow the Markov Assumption, which simplifies reasoning by identifying which nodes are relevant for making a particular inference. The rest of the manuscript focuses on how people make use of the parameters of the causal structures. We look at whether people’s inferences go in the predicted directions, as well as how close people’s inferences come to the normative calculations. We analyze these questions for a variety of different types of paradigmatic causal structures including chains, common cause structures, one-link structures, common effect structures, and diamond structures.

We finish by making some observations about the quality of human reasoning about causal relationships. To foreshadow our conclusions, many aspects of human reasoning about causal systems reflect the qualitative prescriptions of the normative model. When the calculations imply the probability of an event should increase, usually judgments go up; when they imply a decrease, they go down. But, there are some reliable anomalies. In particular, people seem not to respect the Markov Assumption and their inferences tend to be weaker than would be implied by the normative model. We also comment on the value of comparing behavioral results to a normative model. Among other reasons, we submit that the comparison is useful because it identifies potential pitfalls for human reasoning about practical matters.

The Markov Assumption

The Markov Assumption identifies which nodes are relevant to an inference and which nodes are irrelevant. Consider the chain in Figure 2, which is a sub-graph from Figure 1. Suppose that we are trying to infer whether there will be a large or small profit from tomatoes this year. If we know that there was an early frost, we would be likely to infer a poor tomato harvest and thus a small profit. The probability of p=1 is higher given that f=1 than given f=0; P(p=1|f=1)>P(p=1|f=0).

Figure 2. Three Prototype Causal Networks Embedded in the Farming Scenario

However, suppose that we already know that there was a poor tomato harvest (t=1) and later learn that it was caused by an early frost (f=1). Given the poor tomato harvest we would already have inferred that there is likely to be a small profit from tomatoes, and learning that there was or was not an early frost does not change the inference about the tomato profit: P(p=1|t=1) = P(p=1|t=1, f=1) = P(p=1|t=1, f=0). F is irrelevant to P once T is known. The technical term for this relationship is that early frost and small profit from tomatoes are “d-separated” by poor tomato harvest; profit is no longer dependent on frost once the mediator (poor harvest) is known. These inference patterns are symmetric. For F→T→P, once T is known, learning the state of P does not affect the inference of F.
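
This screening-off relation is easy to verify numerically. The following sketch (in Python) factorizes the chain F→T→P and confirms that, once T is known, additionally conditioning on F leaves the inference about P unchanged; the conditional probabilities are illustrative values we have assumed, not estimates from any study.

# Illustrative (assumed) parameters for the chain F -> T -> P.
p_f = 0.3                         # P(f=1), base rate of an early frost
p_t_given_f = {1: 0.8, 0: 0.2}    # P(t=1 | f)
p_p_given_t = {1: 0.9, 0: 0.1}    # P(p=1 | t)

# Build the joint distribution from the chain factorization P(F, T, P) = P(F)P(T|F)P(P|T).
joint = {}
for f in (0, 1):
    for t in (0, 1):
        for p in (0, 1):
            pf = p_f if f == 1 else 1 - p_f
            pt = p_t_given_f[f] if t == 1 else 1 - p_t_given_f[f]
            pp = p_p_given_t[t] if p == 1 else 1 - p_p_given_t[t]
            joint[(f, t, p)] = pf * pt * pp

def prob(p=None, t=None, f=None):
    # Sum the joint probabilities of all states consistent with the specified values.
    return sum(v for (fv, tv, pv), v in joint.items()
               if (f is None or fv == f) and (t is None or tv == t) and (p is None or pv == p))

# All three quantities are equal (0.9 here): F adds nothing once T is known.
print(prob(p=1, t=1) / prob(t=1))            # P(p=1 | t=1)
print(prob(p=1, t=1, f=1) / prob(t=1, f=1))  # P(p=1 | t=1, f=1)
print(prob(p=1, t=1, f=0) / prob(t=1, f=0))  # P(p=1 | t=1, f=0)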

More generally, the Markov Assumption states that a given node, conditional on all its direct causes, is statistically independent of all other nodes that are not its direct or indirect effects. (See Charniak, 1991, and Sloman, 2005, for gentle introductions to causal graphical models, and Jensen & Nielsen, 2007, for a more technical introduction.) The Markov Assumption becomes even more useful in structures with large numbers of variables because the Markov Assumption may be able to label many of them as irrelevant for a given inference.

The common cause graph works much like the chain. If we find out that there was a poor tomato harvest, we might infer that there was an early frost, and thus that there was also a poor cantaloupe harvest. However, if we already know that there was an early frost, then we would predict that there was a poor cantaloupe harvest regardless of whether there was a poor tomato harvest or not.

For the common effect structure, neither F nor W have any direct causes in the network, so they are unconditionally independent. Just because there was an early frost does not mean that there was a fruitworm infestation, or vice versa. (In some modeling applications exogenous causes like F and W are not necessarily assumed to be independent.)

Evidence of the Use of the Markov Assumption for Inferences

The Markov Assumption identifies which variables can be ignored for particular inferences, simplifying the inference process. Rehder and Burnett (2005) provided the first comprehensive test of the Markov Assumption. Here is an example of one scenario they used involving a causal chain. Participants learned about Kehoe ants, which typically have blood high in iron sulfate, which causes a hyperactive immune system, which causes thick blood, which causes them to build nests quickly [I→S→T→Q], but participants were not given the specific parameters of the causal model. Participants were then presented with an ant with certain features such as [s=1, t=1, q=0] and were asked to infer the probability of I. Whether T and Q are 1 or 0 should not affect the inference of I because S is known to be 1.

Rehder and Burnett (2005) found that participants systematically violated the Markov Assumption. For the chain structure (see Figure 3), even when they knew the state of M1, if M2 and E were present, then they were more likely to infer that C was present. There were analogous effects for inferring E. For the common cause, even if they knew the state of C, the states of E2 and E3 influenced participants’ inferences of E1. For the common effect structure, if C2 and C3 were present (and the state of E was unknown), participants were more likely to infer that C1 was present.

Figure 3. Causal Structures Investigated by Rehder and Burnett (2005).

In order to account for these violations of the Markov Assumption, Rehder and Burnett suggested that their participants inferred that there was another feature, an unobserved “mechanism” that was a direct cause of all other features, somewhat like a category essence (see Figure 3, bottom row). With the unobserved mechanisms, all the features that were previously independent are dependent because they are common effects of the unobserved mechanism. For example, in the chain structure, even when the state of M1 is known, if M2 is present then the mechanism is more likely to be present, and thus C is more likely to be present.

Explaining this violation of the Markov rule by assuming participants had “imported” an unobserved cause into their mental representations leaves some open questions. First of all, there is no direct evidence that people believe in this unobserved mechanism. For the case of living kinds, it is plausible to hypothesize factors like DNA that might serve as underlying causes of many causal features. But Rehder and Burnett (2005) also used other categories such as Romanian cars (with features such as butane-laden gas and loose fuel filter gaskets). In one experiment they even used “skeletal” categories called “Daxes” and the four features were simply labeled A, B, C, and D with no additional meaning. It is unclear what sort of unobserved mechanism could be posited in these cases. Furthermore, Rehder (2006) replicated these results in scenarios that did not involve categories (e.g., low interest rate → small trade deficit → high retirement savings) as well as with a completely abstract domain (e.g., Variable A → Variable B → Variable C). These experiments suggest that even if the Markov violations can be modeled by adding an unobserved common cause to the structure, it is not obvious why people would assume such a node.

In order to eliminate the unobserved category “mechanism” as a possible explanation for the Markov violations, Rehder (under review) used nodes labeled as causes and effects that were not features of a category (e.g., urbanization causes socio-economic mobility). Rehder also wondered if people were inferring other direct causal relationships between the variables based on their prior knowledge, which could lead to apparent violations of the Markov Assumption. Thus, he also counterbalanced the nodes in a way such that systematically inferred additional links between the nodes would not lead to violations of the Markov Assumption. Yet, he still found persistent violations.

Rehder (under review) also tested the effects of deliberative reasoning versus more intuitive judgments. In one condition he required participants to respond to the inference questions in under 10 seconds, and in another he asked them to justify their inferences. There was no consistent effect of the justification or speeded manipulations; if anything, it appeared that justification led to more Markov violations. This pattern of findings suggests that Markov violations are not merely due to a quick intuitive judgment such as associative reasoning.

Burnett (2004) conducted a number of similar experiments and also found significant violations of the Markov Assumption. In addition, he found evidence that people’s inferences fit a proximity heuristic: nodes that are closer to the inferred node are weighted more, even when an intermediate node is known. For example, in the chain C→M1→M2→E, when inferring C and the state of M1 is known, the state of M2 has a larger impact on C than does the state of E.

Mayrhofer, Hagmayer, and Waldmann (2010) tested the Markov assumption in a task in which aliens read the minds of other aliens (see the following section for more details about this study). One condition involved a chain C→M1→M2→E, such that Alien E read the mind of Alien M2, who read the mind of M1, who read the mind of C. Participants inferred that Alien E’s thoughts were almost entirely dependent upon M2 (and very weakly dependent upon C and M1). This particular domain and chain structure (essentially the “telephone game”) seems to emphasize to participants that only the direct cause is relevant for any given inference.

Finally, Sussman and Oppenheimer (2011) conducted a study in which they told participants causal relationships between three fictitious plumbing devices. On each trial, participants were told integer values of two of the devices and their task was to estimate the value of the third. They found that both for chains and common cause structures, people showed small and probably non-significant violations of the Markov Assumption.

In sum, many studies using a variety of materials have demonstrated that people violate the Markov Assumption. However, the authors of these reports believed that it was plausible that participants imagined an additional unobserved variable that was a common cause or inhibitor of the observed variables. Burnett (2004) even called violations of the Markov Assumption “adaptive” if people believed that there are additional causal relationships aside from those specified by the experimenter. Rehder and Burnett (2005) also pointed out that the Markov Assumption could appear to be violated if people treat all the observed variables as imperfect observations. This means that in realistic scenarios it is very difficult to rule out all rational explanations for “apparent” violations of the Markov Assumption.

At the same time, a number of studies have used scenarios in which there is no compelling reason why people would infer additional causal links. It is also notable that the Markov violations always seem to be “positive.” For A→B→C, people essentially infer an additional positive correlation between A and C above and beyond the correlation implied by B. If people were really inferring additional unobserved links it is unclear why these links would overwhelmingly be positive. Thus, some of these inferences seem to be true violations of the Markov Assumption, in that there is no plausible adaptive reason for inferring an unobserved common cause given the particular cover story. We summarize the results of this section in Figure 4, and use the same notation presented in the Key of Figure 4 throughout the remainder of this review. Bold represents nodes that are being inferred. Normal weight represents nodes with known states (0 or 1). Dashed lines represent nodes with unknown states. Octagons (stop sign) represent nodes that are used even though they should be ignored for the given inference.

Figure 4. Summary of Markov Assumption Violations.

Reasoning About Plausible Unobserved Links on Common Cause Structures [E1←C→E2]

So far we have framed the Markov Assumption as being normative. We have discussed some potential explanations for apparent violations of the Markov Assumption. However, in all the previous scenarios, if people had actually inferred additional unobserved links they were doing so without good reasons. In the current section we discuss some situations in which people seem to adeptly reason about the scenario to infer plausible unobserved links.

In a standard common cause structure, E1←C→E2, we can conceive of the two effects as having additional independent unobserved influences (see the Us in Figure 5a). Though Figure 5a is the standard way to interpret common cause structures when we have no additional information, there are some situations in which we might believe that the effects would be correlated above and beyond what would be implied by C alone. Figures 5b and c represent two such structures (see also the Feature Uncertainty Model in Rehder and Burnett, 2005).

Figure 5. Independent versus Correlated Errors on a Common Cause Network
Note: Dashed circles represent unobserved or unknown causal variables.

Mayrhofer, Hagmayer, and Waldmann (2010) investigated a social transmission scenario and found that describing the nodes as either active or passive moderated whether people interpreted the structure as having independent versus correlated errors. They used a cover story about four telepathic aliens who could transfer their thoughts (either “por”=1 or “tus”=0) through mind reading. In the “active” condition, one alien (cause; C) was described as sending his thoughts to the effect aliens (Es). Participants inferred P(e2=1|c=1, e1=1)>P(e2=1|c=1, e1=0). When C was described as the agent “sending” the message to E1 and E2, one could plausibly reason that something could cause an error in the transmission of the message to both E1 and E2 (e.g., Alien C wasn’t concentrating hard enough). This “sending” condition seems to imply correlated errors (e.g., Figure 5b or c). In contrast, when the effect aliens were described as passively “reading” the mind of Alien C, there was a smaller difference between P(e2=1|c=1, e1=1) and P(e2=1|c=1, e1=0). A plausible reason is that if one effect alien misread the message, it should not have an impact on another alien’s ability to read the message; this implies independent errors, as depicted in Figure 5a.

Mayrhofer, Goodman, Waldmann, and Tenenbaum (2008) investigated another aspect of a causal scenario likely to convey beliefs in unobserved correlated errors. They used the same alien cover story, but now there were two different types of effect aliens, green and yellow. When one yellow alien misread the message people tended to infer that another yellow alien would also misread the message, but whether the green aliens correctly read the message did not matter for inferring whether a yellow alien would correctly read the message. This pattern can be interpreted as indirect evidence for two sets of correlated errors for the two types of aliens.

Walsh and Sloman (2004; 2007) also investigated rational explanations for correlations between effects of a common cause above and beyond the correlation implied by the cause. They used realistic common cause scenarios [e.g., jogging causes increased fitness level and weight loss]. When told that Tim did not lose weight, people often came up with explanations that were common causes or disablers of both effects (e.g., jogging increased Tim’s appetite, which caused him not to lose weight and prevented his fitness level from increasing).

In sum, the studies in this section have identified a number of scenarios for which it seems reasonable for people to use their own prior knowledge or information conveyed in the description of the scenario to infer structures with correlated errors. However, the fact that inferring such correlated errors was to be expected in these studies does not diminish the fact that in the studies in the previous section there was no similar reason to infer correlated errors, and thus no compelling reason to believe that the Markov Assumption did not hold.

Beliefs about whether the Causes in a Common Effect Structure [C1→E←C2] are Correlated

For common effect structures, C1→E←C2, the strict interpretation of the Markov Assumption implies that the two exogenous causes, C1 and C2, are independent from each other. Yet in practice, researchers who fit statistical causal models (e.g., in LISREL) often allow for the possibility that they are correlated. For example, imagine an economist considering three economic indexes postulated to have the following structure: I1→I3←I2. The modeler would likely want to test for the possibility that I1 and I2 are correlated rather than just assume that they are independent. In this section we discuss situations in which people believe that C1 and C2 are correlated, exploring how these beliefs are influenced by different types of experience and whether people’s beliefs are consistent.

As already mentioned, Rehder and Burnett (2005) told participants a cover story involving a common effect structure C1→E←C2. The participants treated C1 and C2 as correlated even though there was no obvious compelling reason to do so. Von Sydow, Hagmayer, Meder, and Waldmann (2010; Experiment 2) told participants the structure C1→E←C2. In a set of learning trials participants observed whether each variable was present or absent; C1 and C2 were uncorrelated. Afterwards, participants inferred that C1 and C2 were independent, P(c2=1|c1=1) = P(c2=1|c1=0). Thus, even if people tend to believe that C1 and C2 are correlated, they can fairly quickly learn that C1 and C2 are independent.

Hagmayer and Waldmann (2000) conducted a similar study. On a given learning trial, however, participants saw either C1 and E, or C2 and E, so they could not calculate the correlation between C1 and C2. At the end of the learning trials, participants judged P(c2=1|c1=1) and P(c2=1|c1=0), which were converted into the correlation measure phi. The correlations in two experiments were slightly positive (.16 and .24). Perales, Catena, and Maldonado (2004) conducted a parallel study. In most conditions participants inferred correlations close to zero, but in one condition with strong causal relationships about one third of the participants inferred a substantial positive correlation.

The assumption that C1 and C2 are independent is particularly important for inferring the causal strength of C1 on E when C2 is unobserved (Cheng, 1997). Suppose that C1 and E are strongly correlated. If one believes that there are no other potential causes of E that are correlated with C1, then one might infer that C1 is a strong cause of E. However, what if one knows that there is another factor, C2, which causes both C1 and E? In this case it is possible that C1 is not a cause of E at all and the correlation between C1 and E is an artifact of C2. Thus, believing that other causes of E are independent of C1 is critical for inferring the strength of C1.

Hagmayer and Waldmann (2007) and Luhmann and Ahn (2007; Experiment 3) examined whether people believe that C1 is independent from an unobserved C2. On each trial people observed C1 and inferred whether C2 was present or absent. Both of these studies found that people judged C1 and C2 to be correlated. More importantly, the estimated correlation depended on the learning conditions. In Hagmayer and Waldmann’s Experiment 1, when the two causes were relatively weak, people thought that C1 and C2 were positively correlated; when they were relatively strong, people thought they were negatively correlated. But Luhmann and Ahn (Experiment 3) found a different result: When C1 had a positive influence on E, people inferred a positive correlation, but in the condition in which C1 had zero effect on E, people inferred a negative correlation. These results are surprising because there is no normative reason why changing the strengths should lead people to change their inferences of the correlation between C1 and C2. Hagmayer and Waldmann (2007) also asked people to make summary judgments of P(c2=1|c1=1) and P(c2=1|c1=0) at the end of the learning trials. Unlike the trial-by-trial judgments, these judgments reflected a belief that C1 and C2 were independent. It is surprising and unclear why these judgments were inconsistent.

Until now we hedged about what people should infer about the correlation between C1 and C2, proposing that people’s inferences should merely be consistent. However, Hagmayer and Waldmann (2007, Experiment 2) conducted a study that normatively implies that C1 and C2 are independent. On each trial participants chose whether C1 occurred or not, and then inferred whether C2 would be present or not. Because participants chose C1 without knowing C2 or E, this intervention should be interpreted as cutting any links to possible unobserved common causes. Yet, participants usually inferred that C1 and C2 were negatively correlated. In sum, people’s beliefs about the relationship between C1 and C2 are inconsistent, and in one instance go against the normative framework.

Summary

The Markov Assumption greatly simplifies learning and reasoning with causal networks. However, people appear to be unaware of the simplicity it affords. When making inferences, people often use nodes that, according to the Markov Assumption, are irrelevant for the particular inference. Furthermore, related research outside the focus of this review also shows that people fail to capitalize on the Markov Assumption when learning causal networks (Steyvers, Tenenbaum, Wagenmakers, & Blum, 2003, Experiment 3; Fernbach & Sloman, 2009; Jara, Vila, & Maldonado, 2006).

Normative Quantitative Inferences on Graphical Causal Models

The rest of this review focuses on quantitative inferences people make based on the structure and parameters of the causal model. In the following sections we explain how to simulate the functioning of a causal network. By understanding how a causal structure “works,” from causes to effects, it is possible to make inferences—that is, to deduce the probability of any variable in the network given information about the states of other variables.

Parameterizing a Structure: Modeling how Causes Combine to Produce an Effect

The first task required to make quantitative inferences on a causal network is to model how each individual node is produced by its direct causes, otherwise known as the “parameterization” of the model. We start with a one-link structure C→E. One common way to conceive of C→E is with an additional alternative unobserved cause of E, which we will call A; C→E←A. A represents the “causal background” or the likelihood of other possible factors that we cannot directly observe generating E (and they are assumed to be independent of C). A psychological explanation for adding A into the model is that if E ever occurs without C, then we must believe that some other cause produced E. We denote the likelihood of A (a=1) generating E (e=1), or the “strength” of A, as SA, which equals P(e=1|c=0). (Metaphysically SA reflects both the probabilities of all the unobserved causes as well as the strengths of all these unobserved causes. But because we do not know specifics about the probabilities and strengths of all these unobserved causes, we use SA for simplicity.) The two other parameters of the model are the base rate of C, P(c=1), and the strength of C causing E, SC. C can cause E only when C is present.

According to this parameterization, E can be produced two ways: C can produce E, with the probability [P(c=1)SC], and A can produce E with the probability SA. Thus, [1−P(c=1)SC] is the probability that C fails to generate E, and [1−SA] is the probability that A fails to generate E. Because we are assuming that C and A are independent, the probability of E occurring is 1 minus the probability that C and A both fail to produce E (the probabilistic union of either C or A generating E; see Table 1, Row 1). This is the Noisy-OR combination rule for two independent causes (see Cheng, 1997; Pearl, 1988).

Table 1. The Probability of an Effect given Different Combinations of Binary Generative and Inhibitory Causes, Assuming Noisy-OR (Generative) or Noisy-AND-NOT (Inhibitory) Functions

Row 1. One generative cause plus A: P(e=1) = 1−[1−SA][1−P(c=1)SC]
Row 2. Two generative causes plus A: P(e=1) = 1−[1−SA][1−P(c1=1)SC1][1−P(c2=1)SC2]
Row 3. One inhibitory cause plus A: P(e=1) = SA[1−P(c=1)SC]
Row 4. Two inhibitory causes plus A: P(e=1) = SA[1−P(c1=1)SC1][1−P(c2=1)SC2]
Row 5. One generative cause (C1) and one inhibitory cause (C2) plus A: P(e=1) = [1−[1−SA][1−P(c1=1)SC1]][1−P(c2=1)SC2]

This same logic can be extended to cases with two or more generative causes, all of which can independently produce E (Table 1 Row 2). To determine the union of any of the three causes successfully producing E, one can calculate 1 minus the probability of all of the generative causes failing to produce e=1.

What if C inhibits or decreases the probability of E on a CE structure? The standard function to represent an inhibitory cause is called “noisy-And-Not.” A can produce e=1 with the probability SA. SC is the probability that C would inhibit e=1, so (1−SC) is the probability that C fails to inhibit e=1. Thus, P(e=1) is the product of A generating E, and C failing to inhibit E (Row 3 in Table 1). Rows 4 and 5 show other cases that can be determined with the same logic (see Novick and Cheng, 2004).

From the formulas in Table 1 it is trivial to calculate conditional probabilities of E given knowledge of the states of the causes. When a given cause is known to be present (or absent), P(c=1) simplifies to 1 (or 0). For example, the following two conditional probabilities are deduced from Row 1: P(e=1|c=1)=1− (1−SA)(1−SC) and P(e=1|c=0)=SA. These conditional probabilities will be used in the next section.
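
As a concrete illustration, the short Python sketch below implements the noisy-OR and noisy-AND-NOT formulas of Table 1 and reproduces the two conditional probabilities just derived; the strength values (SA=.2, SC=.5) are assumptions chosen only for illustration.

def noisy_or(s_a, generative_causes):
    # P(e=1): E occurs unless the background A and every generative cause fail (Table 1, Rows 1-2).
    p_all_fail = 1 - s_a
    for p_c, s_c in generative_causes:      # each cause is a (P(c=1), strength) pair
        p_all_fail *= 1 - p_c * s_c
    return 1 - p_all_fail

def noisy_and_not(s_a, inhibitory_causes, generative_causes=()):
    # P(e=1): whatever the generators produce must survive every inhibitor (Table 1, Rows 3-5).
    p_e = noisy_or(s_a, generative_causes)
    for p_c, s_c in inhibitory_causes:
        p_e *= 1 - p_c * s_c
    return p_e

s_a, s_c = 0.2, 0.5                          # assumed strengths
print(noisy_or(s_a, [(1.0, s_c)]))           # P(e=1|c=1) = 1 - (1-SA)(1-SC) = .6
print(noisy_or(s_a, [(0.0, s_c)]))           # P(e=1|c=0) = SA = .2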

There are other ways that multiple binary causes could influence an effect. For example, P(e=1) could be determined by a simple sum of the strengths of the causes with cutoffs so that P(e=1) cannot go above 1 or below 0. Or, analogous to a logistic regression, P(e=1) could be determined by an S-shaped function over the sum of the strengths of the generative and inhibitory causes. Behavioral research has almost exclusively focused on noisy-OR and noisy-AND-NOT functions, so we do not consider these other possibilities any further.

So far we have discussed how to parameterize a structure with multiple causes of a single effect. To parameterize a larger structure, each exogenous node needs a parameter to represent its base rate, and each arrow needs a causal strength parameter. Additionally, if a node ever occurs when its causes are absent, then it also needs an SA parameter. Figure 6 shows the parameters for five canonical causal structures.

Figure 6. Parameters for Five Structures

In sum, this section explains how the conditional probability of an effect given its causes can be derived from causal strengths assuming a noisy-OR integration. It is also possible to parameterize a structure at the level of conditional probabilities instead of going down to causal strengths: C→E would be parameterized by P(e=1|c=1) and P(e=1|c=0). Either parameterization works, although reasoning with causal strengths provides a deeper level of analysis and is simpler to represent when there are multiple causes of a single effect.

From Conditional Probabilities to the Factorization and Joint Distribution

The previous section explained how to model the probability of an effect given its direct causes. The second step for representing a causal structure is the joint probability distribution, the probability that the variables in the network are each in a particular state. For the farming example Early Frost (F) → Poor Tomato Harvest (T), the joint distribution specifies the percentage of farms that experienced an early frost and a poor tomato harvest, P(f=1, t=1), the percentage of farms that experienced an early frost but a normal tomato harvest, P(f=1, t=0), and so on. Determining the joint distribution requires applying the “factorization” of the network. The factorization represents the structure of the graph in terms of conditional probabilities associated with each causal relationship in the graph. For the C→E structure, the factorization is simply P(E,C)=P(E|C)P(C). For example, suppose that C is generative and P(c=1)=.1, SA=.2, and SC=.5, and thus P(e=1|c=0)=.2, and P(e=1|c=1)=.6. Table 2 shows how to calculate the joint probability distribution for the four joint states of C and E. The four joint probabilities are mutually exclusive and exhaustive, so they sum to 1.

Table 2. Joint Probability Table for C→E

Row 1. P(c=1, e=1) = P(e=1|c=1)P(c=1) = .6×.1 = .06
Row 2. P(c=1, e=0) = P(e=0|c=1)P(c=1) = .4×.1 = .04
Row 3. P(c=0, e=1) = P(e=1|c=0)P(c=0) = .2×.9 = .18
Row 4. P(c=0, e=0) = P(e=0|c=0)P(c=0) = .8×.9 = .72

For more complicated causal structures, the factorization of the joint probability distribution works in essentially the same way: the probabilities of each variable given its direct causes are multiplied together (and for exogenous variables with no causes in the network the base rate is used). Figure 6 shows how to calculate the joint probability for five canonical causal structures.
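
To make the factorization concrete, the following Python sketch builds the full joint distribution for a three-node chain C→M→E; the base rate and conditional probabilities are assumed values chosen only for illustration.

from itertools import product

p_c = 0.1                        # P(c=1)
p_m_given_c = {1: 0.6, 0: 0.2}   # P(m=1 | c)
p_e_given_m = {1: 0.7, 0: 0.1}   # P(e=1 | m)

def bern(p, x):
    # P(X=x) for a binary variable with P(X=1) = p
    return p if x == 1 else 1 - p

# Chain factorization: P(C, M, E) = P(C) P(M|C) P(E|M)
joint = {(c, m, e): bern(p_c, c) * bern(p_m_given_c[c], m) * bern(p_e_given_m[m], e)
         for c, m, e in product((0, 1), repeat=3)}

print(sum(joint.values()))       # the eight joint probabilities sum to 1 (up to rounding)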

Marginal Probabilities

Whereas a joint probability is the probability of all the nodes in a network assuming a specific set of states, a marginal probability is the probability of a subset of the nodes in the structure assuming a specific set of states. For example, on the C→E structure, one might want to know P(e=1). P(e=1) can be calculated by summing P(c=1, e=1) and P(c=0, e=1)—that is, Rows 1 and 3 in Table 2—which is known as “marginalizing” over C. Note that certain marginal probabilities, such as this one, can also be calculated directly from the parameterization in Table 1. Consider a marginal probability on a structure with three nodes A, B, and C. The marginal probability P(a=1, c=1) can be obtained from the sum of the two joint probabilities P(a=1, b=1, c=1) and P(a=1, b=0, c=1), effectively “marginalizing out” B. In sum, marginal probabilities can be calculated by summing over joint probabilities.
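
For instance, the marginalization over C can be carried out directly on the Table 2 joint probabilities:

# Marginalizing over C on the C -> E network of Table 2: P(e=1) sums Rows 1 and 3.
joint_ce = {(1, 1): 0.06, (1, 0): 0.04, (0, 1): 0.18, (0, 0): 0.72}   # (c, e) -> P(c, e)
p_e1 = sum(v for (c, e), v in joint_ce.items() if e == 1)
print(p_e1)                       # .06 + .18 = .24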

Marginal probabilities are important for two reasons. First, they are inferences in their own right. For example, on the chain C→M→E, one might want to know P(m=1) or P(e=1). Second, marginal probabilities are important because they are often required when deducing conditional inferences, which is explained in the next section.

From Joint Probabilities and Marginal Probabilities to Conditional Inferences

A conditional inference is an inference of the probability of the state of one variable when the states of some or all of the other variables are known. Suppose that we want to infer P(c=1|e=1) on a C→E structure (perhaps the probability that a farm had an early frost given that there was a poor tomato harvest). Equation 1, which involves an application of Bayes’ rule, provides the math. The derivation requires four steps; compare Equation 1 with Equation 2, which shows the four steps used for any inference in this manuscript. The first step is simply the definition of a conditional probability, which equals the joint probability of the two variables (C and E) divided by the marginal probability of the variable that is conditioned upon (E). The second step expands the denominator by marginalization. The third step converts the joint probabilities into the factorization for the causal structure. The fourth step uses the parameterization to convert the conditional probabilities into the parameters. From the final product, it can be seen that P(c=1|e=1) increases as P(c=1) and SC increase and as SA decreases.

This relatively simple math provides the basis for a wide variety of inferences across different types of causal structures. (We make a suggestion when deriving inferences: convert probabilities of the form P(a=0) to [1−P(a=1)] and P(a=0|b=1) to [1−P(a=1|b=1)]. But remember that P(a=1|b=0) ≠ [1−P(a=1|b=1)].)

P(c=1|e=1) = P(c=1, e=1)/P(e=1)
= P(c=1, e=1)/[P(c=1, e=1) + P(c=0, e=1)]
= P(e=1|c=1)P(c=1)/[P(e=1|c=1)P(c=1) + P(e=1|c=0)P(c=0)]
= [SC + SA − SC·SA]/[SC + SA/P(c=1) − SC·SA]

Eq. 1. An Inference on a C→E Structure

P(X|Y, Z) = P(X, Y, Z)/P(Y, Z)
= P(X, Y, Z)/[P(x=1, Y, Z) + P(x=0, Y, Z)]
→ convert the joint probabilities into the factorization
→ convert the conditional probabilities into parameters
→ simplify

Eq. 2. Canonical Method of Calculating Inferences
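
The Python sketch below works through Equation 1 with the same parameter values used for Table 2 (P(c=1)=.1, SA=.2, SC=.5) and confirms that the Bayes’-rule form and the closed form in terms of the parameters agree.

p_c, s_a, s_c = 0.1, 0.2, 0.5
p_e1_given_c1 = 1 - (1 - s_a) * (1 - s_c)     # .6, from the parameterization
p_e1_given_c0 = s_a                           # .2

# Third line of Eq. 1 (Bayes' rule over the factorization):
numer = p_e1_given_c1 * p_c
denom = p_e1_given_c1 * p_c + p_e1_given_c0 * (1 - p_c)
print(numer / denom)                          # P(c=1|e=1) = .06/.24 = .25

# Last line of Eq. 1 (closed form in terms of the parameters):
print((s_c + s_a - s_c * s_a) / (s_c + s_a / p_c - s_c * s_a))   # also .25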

Reasoning Based on Observed Frequencies

So far we have explained how to derive quantitative inferences from the structure and the parameters of the network. However, in many instances in the real world (and in some experiments), people experience the probabilistic relationships between the variables in a network. In such cases participants may rely on memories of specific events rather than reason on the structure itself.

Consider a scenario in which you are told that C→E, and then you observe whether C and E are present or absent on 20 separate trials. For example, perhaps you observe 20 different farms and note whether each farm had an early frost or not (the cause) and whether each farm had a poor tomato harvest or not (the effect). One could theoretically tabulate the frequencies of C and E to compile Table 2 and perform inferences on Table 2 without using Bayes’ rule. For example, P(c=1|e=1) = P(c=1, e=1)/P(e=1) = Row 1/(Row 1 + Row 3). However, the number of rows in the joint probability table grows exponentially with the number of variables. The causal network framework greatly simplifies inference because the number of parameters (base rates and strengths) is often much smaller than the number of rows in a joint probability table.
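
A small sketch of this tabulation approach follows; the 20 observations are hypothetical values invented purely for illustration.

# Each observation is a (c, e) pair recording whether the cause and effect were present.
observations = [(1, 1)] * 2 + [(1, 0)] * 1 + [(0, 1)] * 3 + [(0, 0)] * 14   # 20 hypothetical farms

n = len(observations)
p_c1_e1 = sum(1 for c, e in observations if c == 1 and e == 1) / n   # empirical P(c=1, e=1)
p_e1 = sum(1 for c, e in observations if e == 1) / n                 # empirical P(e=1)
print(p_c1_e1 / p_e1)            # empirical P(c=1 | e=1) = (2/20) / (5/20) = .4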

Reasoning about Interventions

So far all the inferences have involved situations in which a person learns about one piece of information and then infers another (e.g., “What is the likelihood of a poor tomato harvest given an early frost?”). Causal networks are also useful for modeling “interventions,” when an actor intervenes on a causal structure to set a variable to a particular value and then infers the effects of that intervention on other variables. The ability to distinguish interventional versus observational inferences has often been cited as a hallmark of causal reasoning in humans (e.g., Meder, Hagmayer, & Waldmann, 2008; Sloman & Lagnado, 2005; Waldmann & Hagmayer, 2005), and even in rats (Blaisdell, Sawa, Leising, & Waldmann, 2006).

Pearl (2000) and Spirtes et al. (1993) presented a framework for understanding interventions. The basic idea is that when an intervention sets a variable to a particular state, it severs all the ties from other causes of the manipulated variable. The intervention propagates to the effects of the manipulated variable but not to its causes. For example, suppose that a jealous neighbor sprays a poison on the cantaloupes, ensuring a poor cantaloupe harvest. This intervention can be modeled by cutting the link from F to C (Figure 7). Normally a poor cantaloupe harvest might be a sign that there was an early frost. However, because we know that the cantaloupes were poisoned, we know that there is no longer a relationship between an early frost and the cantaloupe harvest. Once the links to the manipulated variable have been eliminated, all the inferences on the resulting structure are exactly the same as explained above. This method of calculating the effect of interventions is appropriate for “perfect” interventions – when the intervention completely determines the state of the manipulated variable and the intervention is independent of the rest of the network (Meder et al., 2010; Woodward, 2003).
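
The sketch below illustrates this “graph surgery” on the F→C link; the parameter values are our own assumptions. Observing a poor cantaloupe harvest raises the probability of an early frost, whereas setting the cantaloupe harvest by intervention leaves that probability at its base rate.

# Assumed parameters for F -> C: P(f=1)=.3, P(c=1|f=1)=.8, P(c=1|f=0)=.1.
p_f = 0.3
p_c_given_f = {1: 0.8, 0: 0.1}

# Observation: a poor cantaloupe harvest is evidence about frost (Bayes' rule).
p_f1_obs = (p_c_given_f[1] * p_f) / (p_c_given_f[1] * p_f + p_c_given_f[0] * (1 - p_f))
print(p_f1_obs)                  # P(f=1 | c=1) is roughly .77

# Intervention: poisoning the cantaloupes severs the F -> C link, so C carries
# no information about F and the inference reverts to the base rate.
p_f1_do = p_f
print(p_f1_do)                   # P(f=1 | c=1 set by intervention) = .3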

Figure 7.

Figure 7

Farming Scenario After an Intervention Poisoning the Cantaloupe

In the next sections we discuss inferences on various causal structures. Note that the earlier discussion of the Markov Assumption has already noted many inferences on causal structures. Here we discuss the rest of the inferences for which empirical research exists.

Chain C→M→E

Here we discuss transitive and marginal inferences on chains. We skip consideration of inferences about the state of the mediator given C and E, because no studies have provided results on the quality of these judgments.

Inferring the Effect from the Cause: Transitive Causal Inferences

Probabilistic causal relations are transitive. On the chain causal structure, if C is known to cause M, and M is known to cause E, then there should be a correlation between C and E. If both links are positive or if both are negative, then the relationship between C and E should be positive. However, if one link is positive and the other is negative, then the relationship between C and E should be negative. Equation 3 shows how to derive the transitive inference. We do not reduce Equation 3 all the way down to causal strengths because the present format in terms of conditional probabilities can be used regardless of whether the links are positive or negative (e.g., P(e=1|m=1)<P(e=1|m=0)).

P(e=1|c=1) = P(c=1, e=1)/P(c=1)
= [P(c=1, m=1, e=1) + P(c=1, m=0, e=1)]/P(c=1)
= [P(e=1|m=1) − P(e=1|m=0)]·P(m=1|c=1) + P(e=1|m=0)

Eq. 3. A Transitive Inference on a Chain
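
A numerical check of Equation 3, with conditional probabilities assumed only for illustration, shows that marginalizing over the mediator and the simplified last line of Equation 3 give the same answer.

# Assumed conditional probabilities for a C -> M -> E chain.
p_m1_given_c1 = 0.8
p_e1_given_m = {1: 0.7, 0: 0.2}

# Marginalize over the mediator M (cf. the second line of Eq. 3):
via_sum = (p_e1_given_m[1] * p_m1_given_c1 +
           p_e1_given_m[0] * (1 - p_m1_given_c1))

# Simplified form (last line of Eq. 3):
via_eq3 = (p_e1_given_m[1] - p_e1_given_m[0]) * p_m1_given_c1 + p_e1_given_m[0]

print(via_sum, via_eq3)          # both .6: the transitive inference P(e=1|c=1)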

Baetu and Baker (2009) had participants learn the contingencies between C and M and between M and E separately, and then asked about the relationship between C and E. They found that people generally followed the normative pattern: a positive relation if both links were positive or both were negative, otherwise a negative relation. However, their inferences from C to E were weaker than predicted by Equation 3; i.e., the difference between P(e=1|c=1) and P(e=1|c=0) was too small. Note that participants’ inferences were made on a −10 [“when C is 1 it perfectly prevents E from being 1”] to +10 [“when C is 1 it perfectly causes E to be 1”] scale. We describe results on a 0.00 to 1.00 probability scale when we felt that they could be transformed into a probability scale without a significant change in meaning.

Jara, Vila, and Maldonado (2006) examined the learning of chain structures in a “second order conditioning” paradigm. Participants saw M paired with E and C paired with M. In one set of experiments, participants inferred that C causes E even though they never saw C and E appear together: they made the transitive inference. In a second set of experiments, after participants learned the M→E and C→M relationships, they were subsequently presented with a set of trials in which M occurred without E, which was intended to extinguish the M→E relationship. Surprisingly, participants still inferred that C would cause E. This finding is consistent with associative models that predict people form a direct association between C and E, contrary to what the chain structure implies.

In another study, participants were told about the chain structure, worked through 192 trials in which they observed C, M, and E, and lastly judged P(e=1|c=1) (von Sydow, Hagmayer, Meder, & Waldmann, 2010; also see von Sydow, Meder, & Hagmayer, 2009). Normally if the C→M and M→E links are both positive, then there will be a positive relation from C to E. However, Von Sydow et al. created a set of stimuli in which there was zero correlation between C and E even though the correlations between C and M and between M and E were both positive. Technically their stimuli violated the Markov condition; conditional on M, C and E were negatively correlated. In this way, these experiments were designed to test whether people rely more on the actual observed contingencies or on the transitive relationship implied by a chain structure that is faithful to the Markov Assumption. Even though there was zero correlation between C and E, participants inferred a positive correlation of about .25 (in Experiment 1; .10–.15 in Experiment 2). These results show that people infer transitivity, a conceptual property of causal Bayesian networks, even when the experienced data do not support it.

These three studies suggest that people make transitive inferences from C to E and that these inferences persist even when contradicted by data in which there is no correlation between C and E. But, somewhat paradoxically, when there is a correlation between C and E in the data, the transitive inferences are not as strong as would be predicted by the normative model. One explanation for this pattern of findings is that people’s transitive inferences are based on their beliefs about the causal structure (i.e., that transitive inferences are warranted by a chain structure), and are less sensitive to the experienced contingencies. Figure 8 summarizes these findings. Throughout the review, diamond-shaped caution signs mark parameters that are not used as they should be.

Figure 8. Summary of the P(E|C) Inference.

Marginal Probabilities

Rehder and Kim (2010) investigated how people infer the marginal probability of a mediator and effect; P(m=1) and P(e=1). They presented people with chain structures and told participants the strengths of the causal relationships. We used participants’ inferences of P(c=1), combined with the strengths that they were given, to model P(m=1) and P(e=1).

Overall, participants were sensitive to the qualitative predictions of the normative causal network; however, they were not sensitive enough to the strengths. In one condition (Experiment 2), if a cause occurred its effect would occur 75% of the time, SC=SM=.75, and the effects would occur only if their causes occurred (i.e. SA=0). In this case, the marginal probability of each successive node should decrease; however, the decreasing slope was not as steep as the normative model implies. People inferred that P(c=1)=.78, P(m=1)=.73 and P(e=1)=.67, but given their belief that P(c=1)=.78, the other two inferences should have been P(m=1)=.58, and P(e=1)=.44. In sum, people were insufficiently sensitive to the strengths.
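The normative calculation described in this paragraph is a short noisy-OR computation; a minimal sketch, using P(c=1)=.78 from participants' own estimate and assuming SA=0 as in that condition:

# Chain C -> M -> E with SA = 0: each node's marginal probability is the previous
# node's marginal probability multiplied by the link strength.
p_c = 0.78                 # participants' estimate of P(c=1)
s_c = s_m = 0.75           # link strengths given to participants
p_m = s_c * p_c            # P(m=1) = 0.585, i.e., about .58
p_e = s_m * p_m            # P(e=1) = 0.439, i.e., about .44
print(round(p_m, 2), round(p_e, 2))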

Common Cause: E1←C→E2

Inferring the Cause from Effects: P(C|E1, E2)

Assuming positive causal relationships, the more effects that are present, the more likely the cause is to be present. Rehder and Burnett (2005) confirmed that research participants demonstrate this effect. However, there are no results yet on how close this inference is to the normative calculations. In particular, consider a case in which C influences three effects, all with the same strength. The difference in the likelihood of C being present when only 1 versus 2 of the effects are present should be larger than the difference between 2 versus 3 of the effects. This pattern of reasoning is not apparent in Rehder and Burnett’s (2005) experiments, although their experiments do not provide a strong test for this effect because participants did not know the precise parameters.

Inferring One Effect from Another Effect: P(E1|E2)

Two effects of a common cause should in general be correlated (Equation 4). The reason is simply that when the common cause C is present, assuming positive causal relations, then all the effects are more likely to be present, but when the common cause is absent, all the effects are more likely to be absent.

P(e1=1|e2=1) = [P(e1=1|c=1)P(e2=1|c=1)P(c=1) + P(e1=1|c=0)P(e2=1|c=0)P(c=0)] / [P(e2=1|c=1)P(c=1) + P(e2=1|c=0)P(c=0)]   Eq. 4
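A minimal sketch of Equation 4 (our own illustration; parameter values are assumed):

# Common cause E1 <- C -> E2: infer one effect from the other by summing over C.
def p_e1_given_e2(p_c, p_e1_c1, p_e1_c0, p_e2_c1, p_e2_c0):
    num = p_e1_c1 * p_e2_c1 * p_c + p_e1_c0 * p_e2_c0 * (1 - p_c)
    den = p_e2_c1 * p_c + p_e2_c0 * (1 - p_c)
    return num / den

# With positive links, the two effects are correlated:
print(p_e1_given_e2(0.5, 0.9, 0.1, 0.9, 0.1))  # P(e1=1|e2=1) = 0.82
print(p_e1_given_e2(0.5, 0.9, 0.1, 0.1, 0.9))  # P(e1=1|e2=0) = 0.18 (pass P(e2=0|c) in the e2 slots)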

Waldmann and Hagmayer (2005) performed a study in which participants were given a common cause structure and experienced a series of learning trials during which they observed all three variables. At the end participants inferred P(e1=1|e2=1) > P(e1=1|e2=0), reflecting transitivity. However, this is not surprising because out of the 20 learning trials, in all but two, E1 and E2 had the same value. More impressive was that these inferences were sensitive to P(c=1) and the causal strengths. However, one problem with this study for our purposes was that participants experienced learning trials in which they observed C, E1, and E2. Thus, it is possible that when they were inferring P(e1=1|e2=1), they were merely making a direct inference from E2 to E1 rather than reasoning from E2 up to C and then back down to E1.

Hagmayer and Waldmann (2000; see also Waldmann et al., 2008) conducted a similar study, but on each learning trial participants either observed C and E1 or C and E2. They had to infer the correlation between the two effects based on the causal model. At the end, participants inferred P(e1=1|e2=1) and P(e1=1|e2=0), and later these were converted to a phi correlation coefficient. This condition was also compared to a common effect structure, C1→E←C2, in which participants learned about C1 and E or C2 and E, and later judged the correlation between C1 and C2. Unlike a common cause, a common effect structure implies no correlation between C1 and C2.

In Experiment 1,i people’s estimates of the correlation (r = .29) were much weaker than the true correlation (r = .62) and were not significantly different from the common effect control condition. In Experiment 2, the inferences also were quite low (r = .26) compared to the normative value (r = .44)ii and, again, not different from a control condition. Perales, Catena, and Maldonado (2004) reported a similar set of experiments. Their participants did infer correlations between E1 and E2 and they gave higher correlations in the common cause than in the common effect condition. However in some of the conditions, particularly those with deterministic links, the inferred correlations were considerably lower than the normative calculation (although they used an unusual correlation rating scale).

One final experiment tested this inference in a different way. Von Sydow, Hagmayer, Meder, and Waldmann (2010, Experiment 2; see also the discussion of transitivity in causal chains above) told participants about the common cause structure and had them observe 192 learning trials of all three variables. Recall that even though there were correlations between C and E1 and C and E2, there was zero correlation between E1 and E2 (i.e., the learning trials violated the Markov condition). In contrast to the chain structure in which people inferred transitivity, their judgments of P(E2|E1) for the common cause structure implied no transitivity. In sum, these experiments suggest that people do not always believe that effects of a common cause are correlated, even though causal Bayesian networks imply that they usually are.

Inferring E1 after an Intervention on E2

Waldmann and Hagmayer (2005; Experiments 3 and 4) also had participants infer E1 after an intervention on E2; P(e1=1|set e2=1) or P(e1=1|set e2=0). An intervention on E2 severs the link from C to E2, so the only way to infer E1 is directly from C. Waldmann and Hagmayer found that when C had a higher base rate, P(e1=1|set e2=1) was higher. This finding suggests that people have an understanding of what an intervention means in terms of causal structures and that they are able to perform inferences on the remaining causal structure. Manipulating the strength of C on E1 also had some effect on the inference of E1.

However, participants did not answer these questions entirely normatively. First, when the base rate of C and the strength of C on E1 were manipulated, the inferences did not change as much as the normative model predicts they should change. Additionally, participants predicted that E1 was more likely to be present when E2 was intervened upon and set to 1 compared to 0, even though the intervention implies that E2 is irrelevant for inferring E1.

Hagmayer and Sloman (2009) tested whether people would recommend an action intervening on E2 to produce a change in E1. Surprisingly, some participants did recommend such an intervention. However, as with all studies using real-world knowledge, it is hard to know if these participants had additional beliefs, not explicit in the instructions, that would justify such an intervention (e.g., perhaps they believed that there might be an additional link E2→E1).

One Link: C→E

In this section we discuss how people use the three parameters of a one-link causal structure, P(c=1), SC, and SA, when performing various inferences.

Inferring E Given C

As can be deduced from Table 1 Row 1, P(e=1|c=1)=1−(1−SC)(1−SA). Fernbach, Darlow, and Sloman (2011; Experiment 2) tested whether people are sensitive to SA using scenarios involving generic real-world events. For example, the unpopularity of the mayor of a city (C) could cause the mayor’s new policy to be unpopular (E), but a policy could be unpopular for other reasons even if the mayor is popular (A). Participants were asked three questions that defined the parameters of the one-link structure: the probability that a mayor of a major city is unpopular, P(c=1), the probability that the mayor’s unpopularity would cause his or her new policy to be unpopular, SC, and the probability that a new policy would be unpopular even if the mayor is popular, P(e=1|c=0) = SA. Fernbach et al. then used these three parameters to predict how participants would judge P(e=1|c=1), the probability of a policy’s being unpopular given that the mayor is unpopular.

Fernbach et al.’s (2011) participants’ inferences were mainly determined by the strength of the primary cause, SC, and were not correlated with their beliefs about SA. Their inferences of P(e=1|c=1) were also 8 percent lower than the normative model (calculated using each participant’s responses to the other three questions). Fernbach et al. attribute both of these results to participants’ failure to consider the possibility that A could produce E. An alternative interpretation is that unless A is explicitly mentioned, people interpret the question P(e=1|c=1) to be asking for SC.

Of course, whenever real-world stimuli are used it is hard to know if participants have additional causal beliefs not picked up by the experimenters. Fernbach and Rehder (2012; Experiments 1 and 2) tested the same phenomenon, but with artificial stimuli. For example, they said that iodized helium (C) causes stars to be very hot (E). They also told participants the strength of the causal relationship, SC, and the likelihood that some other factor caused the same effect, P(e=1|c=0) = SA. When they manipulated both factors in a 2×2 design, participants clearly made use of SC but were not at all sensitive to SA.

Fernbach and Rehder (2012) also asked participants to estimate P(e=1|c=0). This should have been extremely easy because participants were explicitly told the likelihood of some other cause producing the effect; SA. However, they were insensitive to variation in SA. This is odd given that participants literally had this piece of information right in front of them – it was a parameter, not an inference. In sum, people appear to view SA as less relevant than it is in reality for both P(e=1|c=1) and P(e=1|c=0).
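For reference, the two predictive inferences discussed in this section follow directly from the noisy-OR rule; a minimal sketch with assumed parameter values:

# One-link structure C -> E with background cause A (noisy-OR).
def p_e1_given_c1(s_c, s_a):
    return 1 - (1 - s_c) * (1 - s_a)   # P(e=1|c=1)

def p_e1_given_c0(s_a):
    return s_a                         # P(e=1|c=0): only the background cause can act

# Normatively SA matters for both judgments: holding SC = .75, raising SA from .2
# to .5 raises P(e=1|c=1) from .80 to .875 and P(e=1|c=0) from .20 to .50.
print(p_e1_given_c1(0.75, 0.2), p_e1_given_c0(0.2))
print(p_e1_given_c1(0.75, 0.5), p_e1_given_c0(0.5))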

Inferring the Effect

As shown in Table 1 Row 1, P(e=1)=1−(1−SC·P(c=1))(1−SA). Fernbach, Darlow, and Sloman (2011; see description above) collected judgments of P(e=1), and we analyzed the results by calculating what P(e=1) should have been given their participants’ average estimates of P(c=1), SC, and SA. Just as for the P(e=1|c=1) inference, their participants’ inferences of P(e=1) were 9 percent lower than the normative model. This under-prediction might be explained as a failure to consider the possibility that alternative causes could produce the effect.

Rehder and Kim (2010; see also Fernbach & Rehder, 2012) also asked participants to infer P(e=1). Overall, their participants were sensitive, but not sufficiently sensitive, to SA. For example, in one condition (Experiment 2 in Appendix C) SC=.75 and SA was manipulated between 0 and .75. Based on their beliefs of P(c=1), participants’ inferences should have changed from .56 to .88, but they changed only from .69 to .79.
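The normative values in the Rehder and Kim condition can be checked with the marginal formula; a minimal sketch, assuming P(c=1)=.75 (an approximation consistent with the .56 figure above):

# Marginal probability of the effect for the one-link structure.
def p_e1(p_c, s_c, s_a):
    return 1 - (1 - s_c * p_c) * (1 - s_a)

print(round(p_e1(0.75, 0.75, 0.00), 2))  # ~0.56 when SA = 0
print(round(p_e1(0.75, 0.75, 0.75), 2))  # ~0.89 when SA = .75 (reported above as .88)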

Inferring C Given E

Inferring a cause given knowledge of an effect is called a “diagnostic inference” as an analogy to medical diagnosis, in which a disease (cause) is sought to explain a set of symptoms (effect). Equations 5 and 6 show these inferences, and Table 3 shows the directions of the influences of the parameters assuming positive strengths. [See Meder, Mayrhofer, & Waldmann (2009) for a modified normative framework for inferring P(c=1|e=1) when the causal structure is not known a priori.] In the following three sections we separately evaluate the evidence of whether people are sensitive to the three parameters for inferring P(c=1|e=1).

Table 3.

Direction of Influence of Variables in Eqs. 5 and 6

Parameter | P(c=1|e=1) | P(c=1|e=0)
P(c=1) | ↑ | ↑
SC | ↑ | ↓
SA | ↓ | – (no influence)
P(c=1|e=1) = (SC + SA − SC·SA) / (SC + SA/P(c=1) − SC·SA)   Eq. 5
P(c=1|e=0) = (1 − SC) / (1/P(c=1) − SC)   Eq. 6
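A minimal sketch of Equations 5 and 6, illustrating the directions in Table 3 (parameter values are our own assumptions):

# Diagnostic inferences for the one-link structure C -> E with background cause A.
def p_cause_given_e1(p_c, s_c, s_a):
    # Eq. 5
    return (s_c + s_a - s_c * s_a) / (s_c + s_a / p_c - s_c * s_a)

def p_cause_given_e0(p_c, s_c):
    # Eq. 6: SA cancels out of this inference
    return (1 - s_c) / (1 / p_c - s_c)

# Raising SA from .2 to .6 lowers P(c=1|e=1) (about .51 -> .35), while P(c=1|e=0)
# does not depend on SA at all.
print(round(p_cause_given_e1(0.3, 0.35, 0.2), 2), round(p_cause_given_e1(0.3, 0.35, 0.6), 2))
print(round(p_cause_given_e0(0.3, 0.35), 2))   # 0.22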

Use of P(c=1)

When inferring causes from effects, people notoriously exhibit base rate “neglect” or “underappreciation”; in other words, they fail to use P(c=1) to the extent dictated by Bayes’ rule (e.g., Eddy, 1982; Kahneman & Tversky, 1972; Bar-Hillel, 1980; Koehler, 1996). In contrast, others have suggested that when people learn the parameters from experience instead of being told the parameters, their inferences are closer to the correct Bayesian calculation (e.g., Christensen-Szalanski & Beach, 1982, and Gigerenzer & Hoffrage, 1995; see also the discussion above on reasoning based on observed frequencies).

Irrespective of this debate, the previous research on “base rate neglect” often involved statistical dependencies and did not necessarily engage causal reasoning habits. Thus, we rely on Meder, Hagmayer, and Waldmann’s (2009) Experiment 2, which involved explicitly causal scenarios. Meder et al. taught participants a structure with four nodes and showed them a series of learning trials allowing them to learn the parameters from experience. Afterwards, participants estimated P(c=1|e=1). Even though the CE link was part of a larger causal structure, the rest of the structure is irrelevant for this particular inference.

In one condition for which P(c=1)=.30, SC=.35, and SA=.57, P(c=1|e=1) should be .35. Participants’ inferences were right on target. However, in another condition in which P(c=1)=.65, SC=.80, and SA=.24, P(c=1|e=1) should be .87; yet participants’ inferences were only .51. Unfortunately for our purposes, all three parameters changed across the two conditions, prohibiting a clean analysis of the use of P(c=1). However, in the second condition P(c=1|e=1) was lower than P(c=1), which should never happen with positive causal relations. This result reflects, at minimum, a misuse of the base rate.

Use of SC

Meder, Hagmayer, and Waldmann (2008, 2009) also examined the use of SC: if SC>0, then P(c=1|e=1) > P(c=1|e=0). One trend across all their experiments is that participants’ inferences were not nearly as extreme as predicted. For example, in one experiment (Meder et al., 2008, Experiment 1), the normative probabilities were P(c=1|e=1)=.95 and P(c=1|e=0)=.10. However, participants’ responses, converted to probabilities, were P(c=1|e=1)=.76 and P(c=1|e=0)=.43. Thus, even though the direction of the effect was correct, the estimates were “conservative.”

Fernbach and Rehder (2012; Experiment 1) told their participants the parameters SC and SA and collected judgments of P(c=1|e=1). Both SC and SA were manipulated in a 2×2 design to be either strong or weak. Unfortunately, the third relevant parameter, P(c=1), was not provided to participants. We analyze this study two ways. First, we assumed a plausible value of P(c=1)=.67. Comparing the conditions when SC was increased but SA was held constant, we would only expect an increase of about .03–.06 in P(c=1|e=1). However, participants’ inferences increased by about .20. Another way to analyze these data is to reverse-derive P(c=1) using the normative model given the supplied values of SC, SA, and the participants’ average judgment of P(c=1|e=1) for one condition. Then the normative inference for P(c=1|e=1) can be derived for the other condition. This analysis also shows that participants’ inferences varied too much based on the change in SC.

Fernbach and Rehder’s (2012) study also asked for inferences of P(c=1|e=0). The increase in SC led to a decrease in estimates of P(c=1|e=0); this general pattern is normative. However, both methods of analysis reveal that the inferences of P(c=1|e=0) changed too little based on the change in SC. In sum, there does not appear to be a clear pattern of how SC is utilized when inferring the state of a cause from an effect: there are complicated patterns of over- and under-utilization of SC relative to the normative model.

Use of SA

Fernbach and Rehder (2012) told participants the parameters of the CE structure and they manipulated SA. In their Experiment 1, the manipulation of SA should have produced differences in P(c=1|e=1) of about .12–.15, but participants inferred differences of only about .06. Although this was a significant difference, it is about half as much as is expected by the normative model.

Two studies have examined the impact of making the alternative cause A explicit when inferring the cause. The idea is to provide a reason for the times when the effect occurs without the observed cause. The standard task so far involves a C→E structure with an implicit alternative cause; we have interpreted P(e=1|c=0) as SA. Two studies have reframed the scenario as a common effect structure C→E←A, in which A is explicitly mentioned and the parameters of A are provided to participants.

Krynski and Tenenbaum (2007; Experiment 2) used the standard mammography base rate neglect problem (cancer causes a positive mammogram, but there can also be false positives). Participants were told the parameters P(c=1), SC, and SA, and they inferred P(c=1|e=1). In one condition the false positive rate SA was not explained; the possible causes were implicit. In another condition the wording explicitly mentioned a second cause of a positive mammogram result, a benign cyst.

The inferences were more normative in the condition in which the alternative cause was explicitly mentioned. About 42% of the inferences in the “explicit” benign cyst condition were right on target. In contrast, only 16% of the inferences in the “implicit alternative cause” condition were right on target. Krynski and Tenenbaum (2007) interpreted this result as showing that people have an easier time reasoning about explicit causes than about fundamentally stochastic causes (the unexplained false positive rate). However, this explanation is not entirely satisfying because it is unclear why people wouldn’t just infer an additional cause for any one-link causal structure in which the effect occurred (positive mammogram) without the observed cause (malignant tumor).

Fernbach and Rehder (2012; Experiment 2) performed a similar manipulation making the alternative cause either implicit or explicit. Manipulating SA had no effect in the explicit condition, and the effect in the implicit condition was much smaller than expected. In sum, multiple studies have found insufficient use of SA.

Common Effect: C1→E←C2

In this section we focus on common effect structures when there are only two causes, C1 and C2, both of which are explicit in the model. This means that if both causes are absent then E must be absent, because there is no alternative background cause A. In this case, still assuming a noisy-OR gate with no interactions, then P(e=1) = 1− [1−P(c1=1)SC1][1−P(c2=1)SC2].

Discounting: P(C1| E) versus P(C1|E,C2)

Here we continue the discussion of P(c=1|e=1) that began in the section on C→E structures, but now discuss this inference in relation to P(c1=1|e=1, c2=1). Assuming generative causal relationships (which we assume for this entire section), P(c1=1|e=1)>P(c1=1|e=1, c2=1). This inference is atypical: in most other structures the presence of one node increases the probability of another node (again assuming generative causal links). Alternatively, sometimes due to the Markov condition (“screening off”), one node is irrelevant to the probability of another node. But for a common effect structure, the presence of C2 actually decreases the likelihood of C1. This atypical reasoning pattern has been viewed as a key aspect of causal reasoning (see Khemlani & Oppenheimer, 2010, for a review).

We explain this pattern of reasoning using the Farming Scenario. In this scenario, an early frost and a tomato fruit-worm infestation are both sufficient to cause a poor tomato harvest; F→T←W. Table 4 shows a hypothetical sample of 1000 farms for which 10% experience an early frost and 10% experience a tomato fruit-worm infestation.

Table 4.

Example Data for Discounting

Row | Early Frost (F) | Tomato Fruit-Worm Infestation (W) | Poor Tomato Harvest (T) | Number of Farms
A | 1 | 1 | 1 | 10
B | 1 | 0 | 1 | 90
C | 0 | 1 | 1 | 90
D | 0 | 0 | 0 | 810

Within the 190 farms that had a poor tomato harvest (Rows A–C), 100 of them had a tomato fruit-worm infestation; P(w=1|t=1)=.53. But if we know that a farm had a poor harvest and that it also had an early frost (rows A and B), only 10 out of the 100 had a tomato fruit-worm infestation; P(w=1|t=1, f=1)=.10. This phenomenon has been known in Artificial Intelligence (e.g., Pearl, 1988) as “explaining away,” and in psychology as “discounting”: knowing that the farm had an early frost explains away or discounts the possibility that the farm had an infestation.
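The same numbers can be recovered directly from the counts in Table 4; a minimal sketch:

# Discounting computed from the hypothetical sample of farms in Table 4.
farms = {                       # (frost, worm, poor_harvest): number of farms
    (1, 1, 1): 10,
    (1, 0, 1): 90,
    (0, 1, 1): 90,
    (0, 0, 0): 810,
}
poor = sum(n for (f, w, t), n in farms.items() if t == 1)                   # 190
poor_worm = sum(n for (f, w, t), n in farms.items() if t == 1 and w == 1)   # 100
poor_frost = sum(n for (f, w, t), n in farms.items() if t == 1 and f == 1)  # 100
print(round(poor_worm / poor, 2))               # P(w=1|t=1) = 0.53
print(round(farms[(1, 1, 1)] / poor_frost, 2))  # P(w=1|t=1, f=1) = 0.10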

More generally, the pattern of discounting can be conceived in the following way. Observing that E is present increases the probability that C1 is present compared to its base rate; P(c1=1)<P(c1=1|e=1). Subsequently observing that C2 is also present decreases the likelihood of C1; P(c1=1|e=1)>P(c1=1|e=1, c2=1). If C1 and C2 are both sufficient to produce E, then the probability of C1 falls all the way back down to its base rate; P(c1=1) = P(c1=1|e=1, c2=1). However, if C2 is weak and is unlikely to explain the presence of E, then the probability of C1 still remains higher than its base rate; P(c1=1)<P(c1=1|e=1, c2=1). See Equations 7 and 8 for the normative calculations. We now discuss empirical results related to discounting.

P(c1=1|e=1) = [SC1 + P(c2=1)·SC2 − SC1·P(c2=1)·SC2] / [SC1 + P(c2=1)·SC2/P(c1=1) − SC1·P(c2=1)·SC2]   Eq. 7
P(c1=1|e=1, c2=1) = (SC1 + SC2 − SC1·SC2) / (SC1 + SC2/P(c1=1) − SC1·SC2)   Eq. 8
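A minimal sketch of Equations 7 and 8. With both strengths set to 1 and base rates of .10, they reproduce the values computed from Table 4 (.53 and .10), and the latter falls back to the base rate as described above:

# Discounting on the common effect structure C1 -> E <- C2 (noisy-OR, no background cause).
def eq7_p_c1_given_e1(p_c1, p_c2, s1, s2):
    num = s1 + p_c2 * s2 - s1 * p_c2 * s2
    den = s1 + p_c2 * s2 / p_c1 - s1 * p_c2 * s2
    return num / den

def eq8_p_c1_given_e1_c2(p_c1, s1, s2):
    num = s1 + s2 - s1 * s2
    den = s1 + s2 / p_c1 - s1 * s2
    return num / den

print(round(eq7_p_c1_given_e1(0.1, 0.1, 1.0, 1.0), 2))  # 0.53
print(round(eq8_p_c1_given_e1_c2(0.1, 1.0, 1.0), 2))    # 0.1, the base rate of C1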

The Prototypical “Discounting” Effect: P(c1=1|e=1, c2=1)<P(c1=1|e=1)

Morris and Larrick (1995) defined discounting as the relationship between P(c1=1|e=1) and P(c1=1|e=1, c2=1) and asked whether people discount normatively. There is a rich history of research on discounting within social psychology. However, most of these studies did not present people with all the parameters of the model, nor did they assess the parameters that participants were intuitively using, so that a normative analysis is not possible. To answer this question, Morris and Larrick (1995; pp 340–341) conducted a study using a classic discounting scenario in which participants were told that they would read essays written by other students about Castro’s regime in Cuba; half of the writers were randomly assigned to write essays that were pro- or anti-Castro (Jones & Harris, 1967). In terms of the causal structure framework, one of the potential causes, C1, was whether the writer’s personal attitude was pro or anti-Castro. The second potential cause, C2, was whether the writer was assigned to write an essay that was pro or anti-Castro. The effect, E, was whether the essay was pro- or anti-Castro.

Participants first judged the following four parameters: P(c1=1), the prior probability of the writer having a pro-Castro attitude; P(c2=1), the prior probability of a writer being assigned to write a pro-Castro essay; P(e=1|c1=1, c2=0)=SC1, the probability that a person with a pro-Castro attitude would write a pro-Castro essay even if he or she was assigned to write an anti-Castro essay; and P(e=1|c1=0, c2=1)=SC2, the probability that a person with an anti-Castro attitude would write a pro-Castro essay if he or she was assigned to write a pro-Castro essay. After reading the essay, which was always pro-Castro, the participants rated the probability that the writer had a pro-Castro attitude, P(c1=1|e=1). Finally, participants were told that the writer was assigned to write a pro-Castro essay, and the participants judged again whether the writer’s attitude was pro-Castro, P(c1=1|e=1, c2=1).

Morris and Larrick (1995) found the normative discounting effect, P(c1=1|e=1) = .35 > P(c1=1|e=1, c2=1) = .30. In fact, the inference of P(c1=1|e=1) was close to the normative calculation of .36 based on participants’ own beliefs about the parameters. However, the P(c1=1|e=1, c2=1) = .30 inference was numerically higher than the normative calculations (.26), though not significantly so. Thus, it seems that participants did discount, though only about half as much as they should have.

Fernbach and Rehder (2012; Experiment 3, “present” condition) told participants a hypothetical scenario about a common effect structure, instructed them about SC1 and SC2, and asked them to infer P(c1=1|e=1) and P(c1=1|e=1, c2=1). In one condition in which SC2 was strong, there was a slight trend for P(c1=1|e=1, c2=1)<P(c1=1|e=1). Yet in another condition in which SC2 was weak, there was a slight trend in the opposite direction. Discounting should normatively be greater when SC2 is stronger, but it should never go in the opposite direction so long as the two causes are independent (see the next section). Unfortunately, because participants were not told specific values for P(c1=1) and P(c2=1), we cannot quantitatively compare these inferences to the normative model.

Rehder (under review) told participants about a common effect structure without the parameters, and then had them choose which one is higher (or equal): P(c1=1|e=1, c2=1) versus P(c1=1|e=1). Across two experiments participants were either more likely to choose the former – the opposite of discounting – or there was not a significant difference.

In sum, relatively few studies that have examined discounting allow for comparisons to the normative model. Out of those that do, discounting appears to be weak, and sometimes the inferences go in the opposite direction of discounting.

A Related Discounting Effect: P(c1=1|e=1, c2=1) < P(c1=1|e=1, c2=0)

Several studies have compared inferences of C1 when C2 is present versus absent. Hagmayer and Waldmann (2007) and Luhmann and Ahn (2007; Experiment 3) presented participants with a series of learning trials; on each trial they observed C2 and E and judged whether C1 was present or absent. Fernbach and Rehder (2012; absent versus unknown conditions in Experiment 3) and Rehder (2011; independent condition) told people the parameters SC1 and SC2 and had them make judgments of P(c1=1|e=1, c2=1) and P(c1=1| e=1, c2=0).

For all these studies, participants’ inferences did exhibit the expected asymmetry P(c1=1| e=1, c2=1) < P(c1=1| e=1, c2=0). Unfortunately, in all these studies P(c1=1) was not identified, so quantitative comparisons to the normative model for P(c1=1| e=1, c2=1) were not possible. There was, however, an unexpected pattern. Because there are only two possible causes of E, P(c1=1| e=1, c2=0) should equal 1. In Hagmayer and Waldmann’s study (Experiment 1), participants’ inferences of P(c1=1| e=1, c2=0) were close to 1, but in Luhmann and Ahn’s study and Fernbach and Rehder’s study they were around 0.75. A possible interpretation would be that participants inferred that there was another unobserved cause of E. Inferring another unobserved cause could also dampen the standard discounting effect P(c1=1|e=1, c2=1) < P(c1=1|e=1). However, if participants in these studies had inferred unobserved generative causes, they should also have given higher ratings of P(e=1|c1=1) than would be expected from the two known causes. At least in Fernbach and Rehder’s (2012) study, this was not the case.

Sussman and Oppenheimer (2011) also tested discounting, using three variables representing plumbing parts (e.g., the tightness of a clamp, the amount of water flowing through a spout). They found no discounting in Experiment 1, and in Experiment 2 discounting was weaker than predicted by the normative model.

The Influence of SC2 on P(c1=1|e=1) and P(c1=1|e=1, c2=1)

Both P(c1=1|e=1) and P(c1=1|e=1, c2=1) should decrease with higher values of SC2; the stronger that C2 is, the more sufficient that C2 is to explain the presence of E, and thus the less that C1 is needed to explain E. Fernbach and Rehder’s (2012) participants were told about the common effect structure and told the parameters SC1 and SC2. Participants’ inferences of P(c1=1|e=1, c2=1) were lower when SC2 was higher (Experiment 3; Present condition). However participants’ inferences of P(c1=1|e=1) were not sensitive to SC2 (Experiment 2, explicit condition and Experiment 3, unknown condition). But, these inferences cannot be quantitatively compared to the normative model because two parameters, P(c1=1) and P(c2=1), were not known.

Discounting when Two Causes Are Correlated

In the previous discussion of discounting on a common effect model, C1→E←C2, the two causes were assumed to be independent, in which case P(c1=1) ≤ P(c1=1|e=1) ≥ P(c1=1|e=1, c2=1). Here we consider instances when the two causes are correlated (see Figure 15b), in which case these inequalities do not necessarily hold. Instead of presenting equations, we explain discounting with correlated causes using Figure 15. C1 and C2 could be correlated if there is an underlying common cause or a direct link between C1 and C2; the inferences in Figure 15a are derived assuming an additional link C1→C2, with the joint probability P(C1, E, C2)=P(E|C1, C2)P(C2|C1)P(C1).

Figure 15. Discounting when Causes are Dependent versus Independent; the graph plots P(c1=1) as the states of E and C2 are learned.

One easy way to think about discounting is to consider how the inference about C1 changes after first learning that e=1, and again after also learning that c2=1. Thus, one should read Figure 15a from left to right. The “independent” line in Figure 15a shows a typical discounting pattern when the two causes are independent. Learning that E is present increases the probability of C1. Then, learning that C2 is also present decreases the probability of C1.

The “positive” line in Figure 15a shows how the pattern of inferences involved in discounting is affected when the two causes are positively correlated. First, P(c1=1|e=1) is higher when they are positively correlated compared to when they are independent. To understand why, consider the common effect with correlated causes structure in Figure 15b. Learning that e=1 increases the probability of C1 through the C1→E link, and it also indirectly increases the probability of C1 through the C1→C2→E route. Subsequently learning that c2=1 results in a smaller drop in the probability of C1 compared to the structure with independent causes. The reason is that learning that c2=1 decreases the probability of C1 through normal discounting (the “bottom” path C1→E←C2), but increases the probability of C1 through the C1→C2 route.

Now consider how discounting is influenced by a negative correlation between the two causes. Learning that e=1 increases the probability of C1 through the direct link C1→E, but decreases the probability of C1 through the path C1→C2→E. This means that P(c1=1|e=1) is lower compared to when the causes are independent (line Negative 1 in Figure 15). In fact, if the C1→C2→E path is strong and C1→E is weak (Negative 2 line), then it is possible for P(c1=1) > P(c1=1|e=1).

Subsequently learning that c2=1 results in a greater drop from P(c1=1|e=1) to P(c1=1|e=1, c2=1), compared to when the causes are independent (compare the Independent versus Negative 1 lines because they have similar parameters). Learning that c2=1 decreases the probability of C1 through normal discounting (the “bottom” path C1→E←C2) and also directly decreases the probability of C1 through the C1→C2 path.
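The qualitative patterns in Figure 15a can be reproduced by enumerating the joint distribution implied by the factorization given above. The following minimal sketch assumes a noisy-OR effect and illustrative parameter values of our own choosing (not those used to draw Figure 15):

from itertools import product

def joint(p_c1, p_c2_given_c1, s1, s2):
    """Joint distribution over (c1, c2, e) for C1 -> C2, C1 -> E <- C2 (noisy-OR)."""
    table = {}
    for c1, c2, e in product([0, 1], repeat=3):
        p_e1 = 1 - (1 - c1 * s1) * (1 - c2 * s2)
        p = ((p_c1 if c1 else 1 - p_c1)
             * (p_c2_given_c1[c1] if c2 else 1 - p_c2_given_c1[c1])
             * (p_e1 if e else 1 - p_e1))
        table[(c1, c2, e)] = p
    return table

def cond(table, query, given):
    """P(query | given); query and given are dicts over the names 'c1', 'c2', 'e'."""
    keys = ('c1', 'c2', 'e')
    match = lambda combo, d: all(combo[keys.index(k)] == v for k, v in d.items())
    num = sum(p for combo, p in table.items() if match(combo, {**given, **query}))
    den = sum(p for combo, p in table.items() if match(combo, given))
    return num / den

for label, p_c2_given_c1 in [("independent", {0: 0.5, 1: 0.5}),
                             ("negative", {0: 0.7, 1: 0.3})]:
    t = joint(0.5, p_c2_given_c1, 0.8, 0.8)
    print(label,
          round(cond(t, {"c1": 1}, {"e": 1}), 2),           # P(c1=1|e=1)
          round(cond(t, {"c1": 1}, {"e": 1, "c2": 1}), 2))  # P(c1=1|e=1, c2=1)
# independent: about .69 -> .55; negative: about .60 -> .34, the larger drop described above.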

Morris and Larrick (1995; Experiment 2) tested whether people use the correlation between C1 and C2 in a discounting task. Again, the participants read essays about Castro’s regime, and they inferred whether the writer was pro-Castro (C1) given that the essay was pro-Castro (e=1), both before and after learning that the writer was assigned to write a pro-Castro essay (c2=1). In the independent condition, writers were supposedly assigned to write pro- or anti-Castro essays randomly. In the positive versus negative correlation conditions, pro-Castro writers were likely to be assigned to write pro-Castro essays (positive condition) or anti-Castro essays (negative condition). Consistent with the normative standard, participants discounted most strongly [P(c1=1|e=1) vs. P(c1=1|e=1, c2=1)] in the negative correlation condition, least strongly in the positive correlation condition, and at an intermediate amount in the independent condition.

However, one aspect of the results, not discussed by Morris and Larrick, was that across all conditions and for both judgments of P(c1=1|e=1) and P(c1=1|e=1, c2=1), participants tended to provide lower estimates compared to the normative standard. This under-prediction resulted in some surprising patterns of reasoning. In the negative correlation condition, participants’ average inference of P(c1=1|e=1) = .38 was lower than their inference of P(c1=1) = .48 (we do not know if it was significantly lower); P(c1=1|e=1) should have been .51. In the independent condition, the two inferences were essentially equal, P(c1=1|e=1) = P(c1=1) = .48, even though P(c1=1|e=1) should have been .63. In the positive correlation condition P(c1=1|e=1) > P(c1=1), although the difference was not as large as expected. In sum, this experiment suggests that people are remarkably normative in their overall pattern of discounting, but their inferences were biased to be low. These low inferences might be explained by participants underweighting their own prior probability of C1 or their own estimate of the strength of C1.

Summary of Discounting

A number of studies have demonstrated that people sometimes discount the likelihood of one cause when another cause is known to have occurred and is sufficient to explain the presence of the effect. People are even sensitive to the correlation between the two causes. However, there are also a number of findings in which discounting was considerably smaller than the amount implied by the normative model, in which there was no discounting at all, or in which the inferences went in the opposite direction of discounting. Clearly there are many remaining empirical questions about discounting.

Use of the Base Rates in Diagnostic Judgments

Reips and Waldmann (2008) conducted a study of diagnostic learning when there were two diseases (causes) that both caused the same symptom (effect). Both diseases always caused the symptom, so the diagnostic judgment P(c1=1|e=1) should perfectly reflect the relative frequency of the diseases. Participants learned from experience that C1 was three times more common than C2, and they could accurately report the base rates. Although their judgments of P(c1=1|e=1) were greater than P(c2=1|e=1), the difference was not close to the expected 3:1 ratio. Similar to the results for the C→E structure, this result implies under-sensitivity to the base rates.

P(c1=1|e=0, c2=1) versus P(c1=1|e=0, c2=0) with Conjunctive Causes

So far we have only discussed scenarios in which C1 and C2 combine through a noisy-OR rule. Rehder (2011) investigated how people reason about probabilistic conjunctive causes: when C1 and C2 are both present, the combination can cause E to be present, and there is also another independent unobserved cause of E. In this case, P(c1=1|e=0, c2=1) < P(c1=1|e=0, c2=0). When the effect is absent, if c2=1, then C1 is probably absent (if it were present then E would probably have been present). However, if c2=0, then c1 could be either 0 or 1.
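A minimal sketch of this asymmetry, assuming the conjunction combines with an unobserved background cause via noisy-OR; all numbers are our own illustrative assumptions:

# Conjunctive causes: E can be produced by the conjunction of C1 and C2 (strength S12)
# or by an unobserved background cause (strength SB).
P_C1, S12, SB = 0.7, 0.9, 0.2     # C1 is usually present

def p_c1_given_e0(c2):
    likelihood = lambda c1: (1 - c1 * c2 * S12) * (1 - SB)   # P(e=0|c1, c2)
    num = P_C1 * likelihood(1)
    den = P_C1 * likelihood(1) + (1 - P_C1) * likelihood(0)
    return num / den

print(round(p_c1_given_e0(c2=1), 2))  # 0.19: C1 is probably absent
print(round(p_c1_given_e0(c2=0), 2))  # 0.70: C1 stays at its base rate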

Rehder (2011) presented participants with a common effect structure and a conjunctive causes cover story, then told participants the causal strength parameters, and asked them to infer P(c1=1|c2=1, e=0) and P(c1=1|c2=0, e=0). He found the predicted asymmetry. Additionally, he found that the inferences of P(c1=1|e=0, c2=0) were low, despite the fact that C1 was described as usually being present. This pattern probably reflects misuse of the base rate of C1.

Inferring E from Multiple Causes: P(E|C1, C2)

Fernbach and Rehder (2012; Experiment 3, “present” condition) conducted a study in which participants were told about a common effect structure, were told the strength parameters [SC1=.6 and SC2=.25 versus .75], and were asked to infer P(e=1|c1=1, c2=1). Although this inference was higher when SC2 was higher, the difference was not as large as it should have been. The normative model predicts .9 versus .7, a difference of .2, but their participants inferred only a difference of about .07. People do not use the strength parameters as strongly as they should.

Counterfactual Questions: P(c1=1| if e had been 0 instead of 1)

So far we have discussed inferences based on observations and interventions. Here we discuss a third type of inference, counterfactuals. Counterfactuals involve first observing the states of the nodes, and then asking a question about what would have been true if the actual conditions had not all occurred. For example, suppose one year on the farm (Figure 1) there was neither an early frost nor an infestation, and there was a good tomato harvest. A counterfactual could be “What is the likelihood of a good tomato harvest if there had been an early frost?”

One possible solution is to treat counterfactuals as observations; P(good harvest | early frost). The problem with this interpretation is that it discards our knowledge that the farm did not have an infestation. Pearl (2000) proposed that in many situations counterfactuals can be interpreted as interventions. In this case the counterfactual would be interpreted as P(good harvest | early frost, no infestation).

Consider a different counterfactual: “What is the likelihood of an early frost if there had been a poor tomato harvest?” (Remember that we know that there was neither an early frost nor an infestation and that there was a good harvest.) According to the intervention account, the intervention is on the poor harvest, which would mean that we maintain the belief that there was neither an early frost nor an infestation. The potential problem with this account is that one might think the following: “if there had been a poor harvest, there is a decent chance that it is due to an early frost.” The intervention account does not allow for reasoning “upstream.”

Hiddleston (2005) proposed another, more complicated, way to represent counterfactuals that involves thinking about the possible minimal changes to the causal network in which the counterfactual is true, but all the other nodes are “minimally” different from their actual states. In contrast to the intervention account, a minimal change could involve changes in nodes “upstream” of the counterfactual variable.

Rips (2010; see also Sloman & Lagnado, 2005) tested how people interpret counterfactuals on a common effect structure. Across a variety of conditions, Rips found that none of the strategies (observations, interventions, or “minimal-networks”) by itself could account for all the results; he eventually proposed a modified version of the minimal-networks approach. In sum, it is not yet clear exactly how people interpret counterfactuals, and there is still expert disagreement on the normative interpretation of counterfactuals.

Conditional, “If … Then” Reasoning and Acceptability of Logical Arguments

There is a large literature on people’s inferences involving propositions stated in an “If … Then” syntactic format (Evans & Over, 2004, provides an excellent introduction). Many such sentences refer to causal relationships, and some philosophers and experimentalists have proposed that conditional statements “If p, then q,” are often interpreted probabilistically as P(q=1|p=1) (Evans, Handley & Over, 2003; Oberauer & Wilhelm, 2003; Over, Hadjichristidis, Evans, Handley & Sloman, 2007; see Bennett, 2003, on “The Ramsey Test”).

Furthermore, the logical rules of inference (Modus Ponens, Modus Tollens, Denying the Antecedent, and Affirming the Consequent) can also be interpreted as probabilistic inferences instead of logical ones. For example, consider the premise “If c=1, then e=1” (i.e., C→E). “Affirming the consequent” is the inference “e=1, therefore c=1.” Logically this inference is invalid; however, consider how this inference might be viewed from a causal structure perspective (Fernbach & Erb, in press; see Liu, Lo, & Wu, 1996; Oaksford, Chater, & Larkin, 2000, for other probabilistic accounts). First, in instances when the premise “If c=1, then e=1” refers to a causal relationship, C→E, one may extend the structure with background knowledge and include P(c=1) and SC as well as other generative or inhibitory causes of E to form a structure like that in Figure 18. Second, assessing the acceptability of the inference “e=1, therefore c=1” could be interpreted as a request for the inference P(c=1|e=1). Thereby, the four canonical forms of logical argumentation can be reframed as conditional probability inferences. Table 6 maps between the logical and probabilistic interpretations of these inferences and gives mathematical derivations of the probabilistic inferences on the structure in Figure 18. For comprehensibility we talk about these logical inferences as interchangeable with conditional probability notation, even though many of the experiments actually had people judge the validity or acceptability of the logical arguments.

Figure 18. Causal Structure for Conditional “If…then” Reasoning.

Table 6.

Logical and Probabilistic Interpretations of the Acceptability of Arguments

Logical Name | Inference | Logically Valid | Probabilistic Inference | Mathematical Derivation | Influence of Increasing SG | Influence of Increasing SI
MP: Modus Ponens | c=1, ∴ e=1 | Y | P(e=1|c=1) | (SC + SG − SC·SG)(1 − SI) | ↑ | ↓*
MT: Modus Tollens | e=0, ∴ c=0 | Y | P(c=0|e=0) | 1 − [1 − (SC + SG − SC·SG)(1 − SI)] / [1/P(c=1) − (SC + SG/P(c=1) − SC·SG)(1 − SI)] | ↓ | ↓*
DA: Deny Antecedent | c=0, ∴ e=0 | N | P(e=0|c=0) | 1 − SG(1 − SI) | ↓* | ↑
AC: Affirm Consequent | e=1, ∴ c=1 | N | P(c=1|e=1) | (SC + SG − SC·SG) / (SC + SG/P(c=1) − SC·SG) | ↓* | –

Note. ∴ stands for “therefore.” * denotes the classic effects. The arrows refer to an increase, decrease, or no change in the acceptance of the logical argument.

Although not explicitly framed in terms of Causal Bayesian Networks, a number of studies have examined the effect of manipulating various parameters of the causal structure, the number or strength of alternative generative or inhibitory causes, on the perceived validity of logical arguments. We use SG and SI to denote the total likelihood of alternative causes generating vs. inhibiting E. Higher SG reflects lower necessity and higher SI reflects lower sufficiency of the CE relation. There are four classic and robust findings (see *s in Table 6; e.g., Cummins et al., 1991; Cummins, 1995; Quinn & Markovits, 1998; De Neys, Schaeken, & d’Ydewalle, 2003a). Increasing SI leads to 1) lower judgments of Modus Ponens and 2) lower judgments of Modus Tollens. For example, given the conditional from Cummins (1995), “If John studied hard, then he did well on the test,” it is possible to think of many possible disabling conditions (e.g., the test was very hard), which lead to a lower endorsement of Modus Ponens “John studied hard… therefore he did well on the test.” Additionally, increasing SG leads to 3) lower judgments of Denying the Antecedent and 4) lower judgments of Affirming the Consequent. These classic findings make perfect sense in terms of inferences on a causal network, and Fernbach and Erb (in press) have proposed a causal network framework to model such effects (although they used a slightly different structure than Figure 18). (Note that these classic studies predicted no effect of SG on Modus Ponens and Modus Tollens and no effect of SI on Denying the Antecedent and Affirming the Consequent.)

This causal model account of conditional inference has a number of benefits. First, it clarifies the features of the causal scenario that matter; the number, base rates, and strengths of the alternative generative and inhibitory causes (Fernbach & Erb, in press). We group these factors together as SG and SI. In addition, one’s belief in the integration function would be critical, although here we are only discussing noisy-OR.

Second, this account makes many of the same predictions as those made in the classic studies (e.g., Cummins, 1995; see the *s in Table 6). In fact, it explains why SI is predicted to have no effect on Affirming the Consequent; the (1 − SI) term cancels out of the derivation in Row 4. However, we note that some studies have found a positive effect of SI (De Neys, Schaeken, & d’Ydewalle, 2002, Experiment 2; 2003b; Beller, 2006).

Third, this account makes three predictions that differ from the standard ones (see the arrows in Table 6 not marked with an asterisk). First, increasing SG should increase the acceptance of MP, P(e=1|c=1). People sometimes ignore implicit alternative generative causes for P(e=1|c=1) judgments (see the C→E section), and Fernbach and Erb (in press) did not include them in their model, although two studies found this effect (Beller, 2006; Thompson, 1994). Second, increasing SG should decrease the acceptance of Modus Tollens, P(c=0|e=0). The effect is predicted to be small and the reason is quite complex. Normally, once it is known that e=0 it is very likely that c=0. But the stronger SG is, the more likely it is that alternative inhibitory causes were present (otherwise E would probably have occurred), in which case it is less certain that C must have been absent (Cummins et al., 1991; De Neys et al., 2002, Ex. 2; 2003b; Markovits & Handley, 2005, Experiment 1; see also Cummins, 1995). Third, increasing SI should increase the acceptance of Denying the Antecedent, P(e=0|c=0) (De Neys, Schaeken, & d’Ydewalle, 2002, Ex. 2; Beller, 2006).
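These directional claims can be checked numerically against the Table 6 derivations; a minimal sketch with assumed parameter values (our own illustration):

# Acceptability of the four arguments as conditional probabilities on Figure 18.
def mp(p_c, s_c, s_g, s_i):   # P(e=1|c=1)
    return (s_c + s_g - s_c * s_g) * (1 - s_i)

def mt(p_c, s_c, s_g, s_i):   # P(c=0|e=0)
    num = 1 - (s_c + s_g - s_c * s_g) * (1 - s_i)
    den = 1 / p_c - (s_c + s_g / p_c - s_c * s_g) * (1 - s_i)
    return 1 - num / den

def da(p_c, s_c, s_g, s_i):   # P(e=0|c=0)
    return 1 - s_g * (1 - s_i)

def ac(p_c, s_c, s_g, s_i):   # P(c=1|e=1); note that SI cancels out
    return (s_c + s_g - s_c * s_g) / (s_c + s_g / p_c - s_c * s_g)

# With P(c=1)=.5, SC=.8, SI=.2, raising SG from .1 to .5 raises MP and lowers
# MT, DA, and AC, matching the SG column of Table 6.
for s_g in (0.1, 0.5):
    print([round(f(0.5, 0.8, s_g, 0.2), 2) for f in (mp, mt, da, ac)])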

Our goal here is to point out how causal network models may be useful for explaining conditional reasoning effects, with the benefits of a formal yet flexible framework. The research is still insufficient to provide a strong argument for or against the value of this interpretation.

In a similar vein, Ali, Chater, and Oaksford (2011) have used a causal network framework to model conditional reasoning comparing common cause vs. common effect structures. For example, one of the common effect scenarios had two conditionals: “If I do not clean my teeth, then I get cavities” and “If I eat lots of sugar, then I get cavities.” Then participants were asked “I got a cavity… how likely is it that I did not clean my teeth?” P(c1=1|e=1), and “I got a cavity and I ate lots of sugar… how likely is it that I did not clean my teeth?” P(c1=1|e=1, c2=1). They found some of the effects predicted by causal structures such as discounting, but they also found some effects that are inconsistent with causal structures such as violations of the Markov Assumption, P(c1=1)>P(c1=1|c2=1).

In sum, there are some intriguing applications of causal networks to model the acceptability of logical arguments and conditional reasoning. Although these approaches show promise, these paradigms rely heavily upon the application of knowledge that people have about the causal relationships as well as linguistic pragmatics, which pose challenges for assessing the causal structure framework.

Diamond Structures

Diamond structures are unique in that there are two routes from the cause to the effect, and both routes must be simultaneously considered when performing inference. We already discussed a structure with two routes in the section on discounting when the two causes are correlated. Here we use M1 and M2 to refer to alternative mediators of the two routes (see Figure 19).

Figure 19. Summary of the P(E|M1) Inference.

Reasoning about Both Routes Simultaneously

Meder, Hagmayer, and Waldmann (2008) investigated whether people take M2 into account when inferring P(E|M1). To test this, they compared two inferences, when M1 is observed to be present P(e=1|m1=1) versus when one intervenes and sets M1 to be present, P(e=1|set m1=1). If M1 is observed to be present, then C and M2 are probably present, so P(e=1|m1=1) should be very high. In contrast, when M1 is intervened upon and set to 1, the intervener has no knowledge of the state of C or M2; the best estimate of M2 is its base rate. Thus, P(e=1|set m1=1) should be lower than P(e=1|m1=1). Through the same logic, the opposite pattern holds for observing versus setting m1=0. In sum, the following asymmetries should hold: P(e=1|m1=0) < P(e=1|set m1=0) < P(e=1|set m1=1) < P(e=1|m1=1).
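These four quantities can be computed by enumeration; the following minimal sketch assumes a diamond with no background causes, noisy-OR at E, and illustrative parameter values of our own choosing:

from itertools import product

P_C, S1, S2, W1, W2 = 0.5, 0.8, 0.8, 0.8, 0.8   # base rate of C; strengths of C->M1, C->M2, M1->E, M2->E

def p_e_given_m1(m1_value, intervene):
    num = den = 0.0
    for c, m2 in product([0, 1], repeat=2):
        p_c = P_C if c else 1 - P_C
        p_m2 = c * S2 if m2 else 1 - c * S2
        # An intervention sets M1 directly, severing the C -> M1 link;
        # an observation keeps P(m1|c) as part of the weight.
        p_m1 = 1.0 if intervene else (c * S1 if m1_value else 1 - c * S1)
        p_e = 1 - (1 - m1_value * W1) * (1 - m2 * W2)
        weight = p_c * p_m2 * p_m1
        num += weight * p_e
        den += weight
    return num / den

print(round(p_e_given_m1(0, False), 2), round(p_e_given_m1(0, True), 2),
      round(p_e_given_m1(1, True), 2), round(p_e_given_m1(1, False), 2))
# 0.11 < 0.32 < 0.86 < 0.93: the predicted ordering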

Meder, Hagmayer, and Waldmann (2008; Experiment 1) told participants about the diamond structure and participants experienced a series of trials to learn the parameters. Remarkably, participants’ answers to the inference questions reflected the predicted asymmetries. These results suggest that their participants understood the difference between interventions and observations, and understood that M1 and M2 would be correlated for observations but not for interventions, and used both M1 and M2 to infer E. In follow-up experiments, Meder, Hagmayer, and Waldmann (2009) also demonstrated that people’s inferences are sensitive to the base rate of C and the strength of the causal relations.

Even though these studies did demonstrate the basic normative patterns, the inferences tended to be weaker (i.e., closer to the middle of the scale) than expected. For example, in Meder et al. (2008, Experiment 1), the difference between P(e=1|m1=1) and P(e=1|m1=0) should have been .78 (when converted to a probability scale). However, participants inferred a difference of only .37. This could have been due to underweighting the strength of M1, or underweighting any of the three other causal strengths (Figure 19). Additionally, the difference between P(e=1|set m1=1) and P(e=1|set m1=0) should have been .48, but participants inferred a difference of only .15. This reflects an underweighting of the impact of M1 on E (Figure 19).

Similar effects were obtained in the 2009 study as well. Meder et al. (2009, Experiment 2) examined how differences in the base rate of C would affect inferences of P(e=1|set m1=0). When M1 is intervened upon and set to 0, the only possible cause of E is M2, and the probability of E should be higher to the extent that the base rate of C is higher. The manipulation of P(c=1) did produce a difference in the judgment of E, but the difference was only about half as large as it should have been (.15 vs. .30). In sum, these experiments systematically demonstrate that people do use the parameters of a diamond model for inferring E, but in every case they seem not to use them as much as expected by the normative model.

Counterfactuals in Diamond Structures

Meder, Hagmayer, and Waldmann (2009) asked another question that they called a “counterfactual intervention.” In the standard “hypothetical intervention” question, the states of the variables are not known before the intervention. However, in the counterfactual intervention question, participants were told the state of M1 before it was manipulated, such as the following: “What is the probability of E given that you saw that M1 was absent and then you intervened and made it present?” Because the state of M1 was known to be 0 before the intervention, one can infer that C, and thus M2, are probably also 0. In this way, the counterfactual intervention question requires reasoning about both routes (M1→E and M1←C→M2→E).

Meder, Hagmayer, and Waldmann (2009) found that people’s inferences were only minimally different when comparing standard intervention questions and counterfactual interventions. This lack of a difference could be interpreted as underweighting any or all of the causal strengths along the M1←C→M2→E route, or as general confusion about the question.

Intervening on Causal Structures to Produce Desired Outcomes

So far our discussion has focused on inference for its own sake. But, inferences also serve another purpose: they can help us identify interventions that produce desired outcomes (Meder, Gerstenberg, Hagmayer, & Waldmann, 2010; Sloman & Hagmayer, 2006). We can expand the standard causal network framework introduced in the introduction with utility nodes to represent the desirability of various events. In fact, the “Profit from Tomatoes” node in our Farming Scenario is essentially a utility node. Rationally, it would make sense to choose interventions that maximize the utility over all the utility nodes in the network.

Choosing an intervention to maximize the utility nodes of a causal network requires two steps in addition to performing inferences. First, instead of inferring the value of one node given an intervention, one must infer the value of all the utility nodes for a given intervention and sum across them. Second, one must choose the intervention to maximize expected utility. This decision may seem trivial, but given that people often exhibit probability matching instead of maximization in choice paradigms, it is possible that they will fail to maximize the utility of the network (Eberhardt & Danks, 2011).
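A minimal sketch of these two steps for a single utility node, with assumed numbers loosely patterned on the chain study described below (not the actual payoffs used in that experiment):

# Choosing the intervention with the highest expected payoff.
def expected_payoff(p_outcome, reward, cost):
    return p_outcome * reward - cost

options = {
    "intervene on C": expected_payoff(0.5, reward=10, cost=1),   # E follows only via M
    "intervene on M": expected_payoff(0.8, reward=10, cost=5),   # more reliable but costlier
}
best = max(options, key=options.get)
print(options, best)   # maximizing picks "intervene on C" (4.0 vs. 3.0)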

Nichols and Danks (2007, Experiment 1) taught people a common effect structure C1→E←C2, in which C1 was stronger than C2. Participants could intervene on either C1 or C2 to try to produce E, which was tied to a monetary reward. Not surprisingly, they were more likely to intervene on C1. Of the participants who intervened on C2, most incorrectly believed that C2 was stronger than C1.

In a second experiment, Nichols and Danks taught participants about a chain structure C→M→E. Intervening on M was more likely to produce E than intervening on C; however, the “cost” of intervening on M was greater than the “cost” of intervening on C, making the average expected payoff higher for C than for M. Seventy-eight percent of participants intervened on the variable that, according to their beliefs about the network, would maximize their expected payoff. However, 15% of participants still chose interventions that did not maximize expected payoff, according to their own beliefs about the causal structure.

Hagmayer and Meder (2008, 2012; see also Meder & Hagmayer, 2009) investigated a similar phenomenon with the structures in Figure 20. The square nodes represent possible interventions, the P node represents an outcome to be maximized, and the plus signs denote the size of the outcome given that a particular combination of nodes (A, and/or B, and/or C) is active. In Hagmayer and Meder’s (2012) study (Experiment 3), participants first learned the causal structures (either Figure 20a or 20b) by activating L or W 100 times and observing whether A, B, or C became active and the value of P. Afterwards, participants were told that the A node was removed from the network, and they had 10 opportunities to activate L or W in order to maximize P.

Figure 20. Choosing Actions to Maximize Payoff P.

Those who believed the structure to be the one in Figure 20a almost always chose W; intervening on L would have no chance of producing P now that A was removed from the network. However, participants who believed the structure to be the one in Figure 20b only chose L 55% of the time, even though they understood that L still had a higher expected value than W. In sum, people’s beliefs in the causal structure did have a large influence on their choices, but they also did not choose interventions that would fully maximize the outcome according to their own beliefs. Probability matching is a likely explanation.

These initial studies suggest that, for the most part, people use their beliefs about causal structures to choose actions that will increase payoffs. We speculate that when people confidently believe in a causal structure, they tend to maximize instead of probability match (e.g., Taylor, Landy, and Ross, 2012). But, in the real world, where people often choose interventions with incomplete knowledge of the relevant causal system, the probability matching habit emerges.

General Discussion

In this review we focused on studies in which people are given a causal structure, learn the parameters, and then make inferences. We started with a review of behavioral studies that examined violations to the Markov Assumption. We then catalogued various inferences that can be made on chain, common cause, one link, common effect, and diamond structures. Finally, we discussed how people decide to intervene on causal structures in order to produce a desired effect. In this General Discussion, we first discuss the uses of a normative rational analysis. Then we summarize and reorganize key results into two sections based on (a) violations of the Markov Assumption and (b) conservative inferences or under-use of parameter information. We also discuss possible approaches to a more descriptive, semi-rational model.

The Uses of the Normative, Rational Analysis

The present review was motivated by the relatively recent invention of a normative model for representing and calculating the implications of a system of causal relationships. However, there is no consensus on the value of normative, optimal, rational models in the behavioral sciences. Indeed, the very nature of a normative analysis is a matter of dispute. So, we provide a brief discussion that clarifies our own views on this knotty bundle of questions.

What is an optimal, rational model?

Scientific, mathematical, and philosophical analyses produce models of real-world situations that can be used to guide actions to achieve goals in an optimally efficient manner. In some cases the goals are implicit (e.g., logical truth-maintaining coherence; precise forecasts of the operations of a mechanical system), and in others the goals are stated as part of the model of the situation (e.g., to trade off expected risk and return at a designated rate in an investment). Such normative models are evaluated with reference to their accuracy or their usefulness in achieving outcomes in objective physical, biological, or social realities. Some examples of normative models that have been used in the behavioral sciences are elementary logic, probability and other mathematical theories, utility theories and Game Theory in the von Neumann-Morgenstern tradition, "ideal" models for identifying sensory stimulus events, and the physical laws of mechanics. For all of these normative models there is close to unanimous consensus among experts that the models are accurate descriptions of the relevant domains of reality. The Bayesian Causal Networks framework is a new candidate for a normative model to represent objective causal systems.

Normative models can be contrasted with descriptive or psychological models that attempt to explain and predict behavior by proposing psychological mechanisms. Some theorists believe that normative models are closely related to psychological-descriptive models (e.g., many economists assume that a “rational man” model provides a good description of the actual behavior of economic agents; many behavioral ecologists believe that optimal models are the best descriptive models for the behavior of foraging animals; cf. Krebs & Davies, 1993). But, most psychologists believe that there are significant differences between the predictions of normative models and actual behavior.

The first conceptual challenge facing behavioral researchers who want to use normative models is to specify how a normative framework applies to a behavioral task. In many cases the identification of an optimal model is not obvious, so alternative rational models must be entertained (see disputes in Jones & Love, 2011, and discussion in Holyoak & Cheng, 2011). Examples in the present context include maximizing the total payoff versus maximizing the probability of a payoff when choosing an intervention (e.g., Nichols & Danks, 2007), whether people interpret the scenarios used in typical experiments to be atemporal or temporal (Rottman & Keil, 2012), and whether people intuitively believe that causes combine using noisy-OR or some other function. Even in the highly constrained environment of a psychology experiment, there is always the potential for ambiguity about the scenario, task, goals, and relevant prior knowledge. In sum, claiming a model as optimal or rational for a particular task requires justification and often requires making simplifying assumptions about the task.

The special difficulty of defending a normative model for causality

Identifying a normative framework for causal reasoning is particularly challenging because of its rich and diverse nature. We talk (and think) fluently about many different domains of causality, including biological, mechanical, psychological, and social causation: "The fruitworm infestation caused the poor tomato harvest"; "the icy highway caused the traffic accident"; "Jill's intelligence caused her to get a perfect score on the SAT test." We can comprehend the meaning of causal statements despite lacking an understanding of how the causal relations operate (e.g., "God caused the Red Sea to part"; "Fossil fuel emissions cause global warming"; "Smoking causes lung cancer"). We think about causal processes that unfold at many different time frames and orders of magnitude, and we fluently reason about both single cause-effect instances and statistical regularities.

Many find the Causal Networks formalism to be a useful normative framework for objective causation. However, there is still much controversy about using Causal Networks as a foundation for conceptualizing causation. First, there is less acceptance of its status as a normative model than for other popular normative systems (e.g., elementary mathematics, logic, probability, and mechanics). Second, Causal Networks are only a couple of decades old, and are still changing at a higher rate than older, more established normative systems. Third, there is more disagreement on metaphysical assumptions concerning objective causation than there is on the referents of the other normative systems.

Uses of a normative analysis with no claims about its psychologically descriptive validity

Several useful applications of normative models involve no claims about relationships between the normative and psychological-descriptive theories (cf. Garner, 1974, pp. 192–193). Normative frameworks provide a language to describe experimental tasks and goals, to specify at least one procedure for performing a task, and to determine standards for accurate or optimal performance. For example, Morris and Larrick's (1995) analysis of discounting provided a language to discuss discounting [as the relationship between P(c1=1), P(c1=1|e=1), and P(c1=1|e=1, c2=1)]. Their analysis also clarified the "objectives" that were underspecified in the previous attribution theory literature; depending on the causal structure and parameters, P(c1=1|e=1, c2=1) should sometimes be greater than, equal to, or less than P(c1=1). Differences between the normative "answers" and human performance are often consequential. Knowing when humans are non-optimal may be useful in practical endeavors and can guide the design of remedial procedures. In the case of the present review, we believe it is important to know what kinds of errors people are likely to make when they reason intuitively or analytically, even in a controlled experimental setting, about what's causing what and how to use causal knowledge to bring about desired outcomes.
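As an illustration of how these three quantities can be computed by enumerating a joint distribution, the following sketch assumes a common effect structure C1→E←C2 with two independent causes combined by a noisy-OR and with parameters of our own choosing; it is not Morris and Larrick's analysis, only a numerical example of the quantities their analysis defined.

from itertools import product

# Illustrative parameters (ours, not Morris & Larrick's): two independent
# causes combined by a noisy-OR with no background cause.
p_c1, p_c2 = 0.3, 0.3   # base rates of the two causes
s_c1, s_c2 = 0.8, 0.8   # causal strengths

def p_e_given(c1, c2):
    # Noisy-OR: E fails to occur only if every present cause fails to produce it.
    return 1 - (1 - s_c1 * c1) * (1 - s_c2 * c2)

# Enumerate the joint distribution over (c1, c2, e).
joint = {}
for c1, c2, e in product([0, 1], repeat=3):
    p = (p_c1 if c1 else 1 - p_c1) * (p_c2 if c2 else 1 - p_c2)
    p *= p_e_given(c1, c2) if e else 1 - p_e_given(c1, c2)
    joint[(c1, c2, e)] = p

def prob(condition):
    # Sum the joint probability over all states satisfying the condition.
    return sum(p for state, p in joint.items() if condition(*state))

p_c1_marginal   = prob(lambda c1, c2, e: c1 == 1)                          # .30
p_c1_given_e    = (prob(lambda c1, c2, e: c1 == 1 and e == 1)
                   / prob(lambda c1, c2, e: e == 1))                       # about .60
p_c1_given_e_c2 = (prob(lambda c1, c2, e: c1 == 1 and e == 1 and c2 == 1)
                   / prob(lambda c1, c2, e: e == 1 and c2 == 1))           # about .34

print(p_c1_marginal, p_c1_given_e, p_c1_given_e_c2)

With these particular parameters, learning that the effect occurred raises the probability of C1 from .30 to about .60, and additionally learning that C2 occurred lowers it back to about .34; with other parameters or structures (e.g., correlated causes), P(c1=1|e=1, c2=1) can instead equal or fall below P(c1=1), which is exactly the point of Morris and Larrick's analysis.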

A closely related approach is to keep normative and descriptive accounts separate, but to pursue a research program to map the two levels onto each other. The most commonly cited inspiration for this research tactic is David Marr’s three-level framework (1982), which distinguished between a Computational Level (a functional analysis, often a normative model, including the actor’s goals), an Algorithmic-Representational Level (the descriptive-psychological model), and an Implementational Level (a neural-biological model)—and which promoted formal mappings between adjacent levels.

Normatively-inspired descriptive models

Many behavioral researchers go a step further and use normative models as an inspiration for psychological-descriptive theories or principles (see Anderson, 1990; Anderson & Milson, 1989 as exemplars). They first complete a normative analysis of the task and then use that analysis (with samples of behavioral data) to guide the invention of a descriptive model. The most commonly mentioned justification for this interaction between the two types of models is to note that humans are selected by evolution and shaped by learning to excel at tasks that are important to our survival, so that many of the normative principles are likely to be “wired-in” genetically or learned from individual experience as an adaptive strategy.

When applying a rational framework to empirical results, it is often found that human minds are bounded or lazy in ways that prevent them from performing the optimal calculations required for "full rationality" (Gigerenzer, Todd, & The ABC Research Group, 1999; Kahneman, 2003; Payne, Bettman, & Johnson, 1993; Shah & Oppenheimer, 2008; Simon, 1955). The notion that informal causal inference would follow shortcuts is especially plausible when one thinks through all of the calculations that would be necessary for a sufficient model of the optimal computation (Fernbach & Rehder, 2012; or see the complex equations in this article). Because the application of Causal Network models is so new, there are no full-fledged general proposals for the manner in which the rational model should be adjusted to be more descriptive. In the following sections we cite some proposals for parts of the problem.

The normative model is the descriptive model

The most extreme approach is to say that we do not need a descriptive model because we can predict behavior from the normative model alone. No one so far has explicitly proposed this claim for causal reasoning, although some researchers have come close by emphasizing the correspondences between Bayesian networks and participants' judgments (e.g., Krynski & Tenenbaum, 2007; Sloman & Lagnado, 2005; Waldmann & Hagmayer, 2005). Yet we believe that most researchers expect there will be some reliable differences between normative and descriptive accounts (Jones & Love, 2011, and commentary). Our review refutes the strong claim with several examples of consistent discrepancies between human judgments and the implications of well-defined causal networks.

In the next sections we discuss the two main deviations from the normative model: violations of the Markov Assumption and conservative or weak inferences. We also discuss possible modifications to the normative model to make it more descriptive.

Summary of Main Results

The studies we have reviewed almost all had the goal of examining whether an experimental manipulation produced a significant effect in the direction predicted by the normative model. We also looked for patterns in the quantitative fit of the normative model, such as conservative biases in which the inference was "in the right direction" but too weak. We emphasized systematic patterns across experiments rather than deviations in particular means in single experiments. However, because many of these inferences have been studied in only one or two experiments, and often those experiments were not designed to investigate the particular comparison we were interested in, some of our conclusions are educated judgment calls. We discuss these findings in terms of the farming example used in the introduction, reprinted in Figure 21.

Figure 21. Farming Scenario.

Violations of the Markov Assumption

The Markov Assumption specifies which nodes should be ignored for a particular inference, which simplifies reasoning. However, many studies found violations of the Markov Assumption. For example, if one knows that there was a poor tomato harvest (T), learning about an early frost (F) should not have any impact on inferences about profit (P), yet it did. Likewise, if one knows that there was an early frost on the farm (F), learning that there was a poor cantaloupe harvest or a good cantaloupe harvest (C) should not have any bearing on whether there was a poor tomato harvest (T). Yet it did here, too. Burnett (2004) also found bigger violations for closer variables (e.g., F would have a bigger effect than C on inferring P even when the state of T is known).

Some of these violations can be explained through alternative accounts that justify the apparent deviation with a rational or adaptive interpretation, such as imagining additional nodes in the network or additional causal relationships outside those specified by the experimenter (e.g., Burnett, 2004). Returning to the farming example, perhaps observing that there is a poor cantaloupe harvest is a sign that there was not enough rain, a variable not represented in the network, which might also cause a poor tomato harvest. Everyday causal systems are more complex than those in the experiments. Because of this complexity, some skeptics of the Causal Networks approach for engineering and data mining have argued that the Markov Assumption is unrealistically restrictive (Cartwright, 1999, 2001, 2002).

A more philosophical justification derives from the probabilistic nature of causality in these experiments. When a cause occurs and an effect does not (or vice versa), one interpretation is that there must be additional (generative or inhibitory) causes that also influence the effect (Rottman, Ahn, & Luhmann, 2011). More fundamentally, if people act as if we live in a Laplacean world (i.e., one in which knowing the state of everything in the universe makes it possible to predict the future perfectly), any contradiction between the causes and the predicted effects implies that there must be unknown factors. A person who conceives of causal relationships in this manner would certainly interpret an experimenter's description of a small set of probabilistically related events as a subset of all relevant events. Believing that there are unobserved causes is not a problem for the causal network framework per se. But it is a problem if these unobserved relationships result in additional correlations between observed variables. Admitting this possibility undermines the validity of the experimental tests of the causal network framework, and it also challenges the validity of the framework in all applications.

Our view is that there are enough violations of the Markov independence condition, in cases where “importing” additional causal links was highly implausible or unjustified, to force the conclusion that humans reliably violate the principle. As noted before, it is also informative that these additional correlations have always been found to be positive; there is no reason a priori why they would not be negative.

There are several potential explanations for these patterns of reasoning. First, some people may engage in associative reasoning (e.g., Rehder, under review). Associative-style reasoning implies that people do not distinguish the direction of causal relationships (such as the difference between a common cause and a common effect structure). Processes like second-order conditioning could potentially explain why people think that screened-off variables are still relevant. Alternatively, Hagmayer and Waldmann (2002) developed a constraint-satisfaction model of causal learning and reasoning. A characteristic of this model is that it is easy to learn individual causal relationships but harder to understand entire causal structures and their conditional and unconditional independencies (e.g., the difference between common cause and common effect structures). A related approach is to propose reasoning "locally" on subsets of the graph or on single causal relations at a time (e.g., Fernbach & Sloman, 2009; Kruschke, 2006; Waldmann, Cheng, Hagmayer, & Blaisdell, 2008). In sum, these persistent violations warrant considering non-normative, consistency-seeking explanations.

Conservative Inferences

Another result that has been reported in many different studies is that people made less extreme inferences than are implied by the parameters of the causal networks. "Base rate neglect" is the most obvious example of an under-sensitive inference. Consider the one-link structure: early frost → poor cantaloupe harvest. One would expect the probability of an early frost given a poor cantaloupe harvest to be higher than the prior probability of an early frost, although this was not always observed (Meder, Hagmayer, & Waldmann, 2009).
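To illustrate with hypothetical parameters of our own choosing, suppose P(f=1) = .20, P(c=1|f=1) = .90, and P(c=1|f=0) = .30, where f denotes an early frost and c a poor cantaloupe harvest. Bayes' rule gives P(f=1|c=1) = (.90 × .20) / (.90 × .20 + .30 × .80) = .18/.42 ≈ .43, which is well above the .20 base rate; an inference that stays near .20 is conservative relative to this normative value.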

Consider the chain structure: early frost → poor tomato harvest → small profit from tomatoes. What is the chance of a small profit given an early frost? For analogous questions, Baetu and Baker (2009) found that transitive inferences are not as strong as they should be. Rehder and Kim (2010) asked their participants to infer the marginal probability of a small profit from tomatoes. Although participants' inferences were influenced by the appropriate parameters (the base rate of early frost and the strengths of the causal links), they were not as sensitive to those parameters as they should have been.
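As a worked example with parameters of our own choosing, suppose P(t=1|f=1) = .80, P(t=1|f=0) = .20, P(s=1|t=1) = .75, and P(s=1|t=0) = .25, where f, t, and s denote early frost, poor tomato harvest, and small profit. Because the chain implies P(s|t, f) = P(s|t), the transitive inference is obtained by summing over the mediator: P(s=1|f=1) = .75 × .80 + .25 × .20 = .65, and P(s=1|f=0) = .75 × .20 + .25 × .80 = .35. A conservative reasoner produces a smaller gap than this normative .65 versus .35.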

Consider the common cause structure: poor cantaloupe harvest ← early frost → poor tomato harvest. During years in which there is a poor (vs. good) cantaloupe harvest it is likely that there would also be a poor (good) tomato harvest. In analogous situations in which people separately learned about the two causal relationships, they did not fully appreciate the extent to which the effects of a common cause are correlated (Hagmayer & Waldmann, 2000; Perales, Catena, & Maldonado, 2004).
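Again using illustrative parameters of our own, suppose P(f=1) = .50, P(c=1|f=1) = P(t=1|f=1) = .80, and P(c=1|f=0) = P(t=1|f=0) = .20, with the cantaloupe (c) and tomato (t) harvests conditionally independent given the frost (f). Then P(f=1|c=1) = .80, so P(t=1|c=1) = .80 × .80 + .20 × .20 = .68, compared to a marginal P(t=1) = .50. Observing the poor cantaloupe harvest should raise the judged probability of a poor tomato harvest from .50 to .68, a correlation that participants tend to underestimate.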

Consider the common effect structure: early frost → poor tomato harvest ← tomato fruitworm infestation. Learning that there was a poor tomato harvest makes an infestation more likely, but subsequently learning that there was an early frost suggests that there was not an infestation; the frost "explains away" the poor tomato harvest. Although "explaining away" is considered to be a hallmark of causal reasoning, the existing research has found it to be weaker than it should be, if present at all (Morris & Larrick, 1995; Sussman & Oppenheimer, 2011; Rehder, under review). Fernbach, Darlow, and Sloman (2011) asked participants questions analogous to "An early frost occurred; what is the probability that there was a poor tomato harvest?" Participants tended to ignore the possibility that an infestation could also cause a poor tomato harvest.

Because Figure 21 does not contain a diamond structure, we modified it (Figure 22). Meder, Hagmayer, and Waldmann (2008, 2009) asked participants questions analogous to, "What is the probability of a small total profit given that there is a poor cantaloupe harvest?", which implies that there probably was also an early frost and probably also a poor tomato harvest. They also asked the same question "… given that the cantaloupes were poisoned?", which implies nothing about an early frost or the tomato harvest. Both of these inferences were closer to the middle of the scale than expected, which could reflect insufficient use of the parameters.

Figure 22. Diamond Farming Scenario.

There are a number of possible explanations for conservative inferences that derive from characteristics of the experimental tasks. First, it is possible that even though participants in these experiments were told the causal structure, they did not fully accept the experimenter's statement of it. If people are uncertain about the causal structure, they might perform inferences over multiple possible structures (Meder, Mayrhofer, & Waldmann, 2009; see Schum & Martin, 1982, for a related problem in law). However, many of the studies we reviewed used novel variables, and it is not clear why participants would have rejected the experimenters' cover stories about the causal structure, especially when the learning data also matched that structure.

Second, it is possible that people had not fully learned the parameters of the causal model; if they had observed more evidence, their beliefs in the parameters might have been stronger. Meder, Hagmayer, and Waldmann (2009) proposed that their participants' parameter estimates might have been influenced by a prior distribution (e.g., a uniform prior) that could have pulled inferences towards the middle of the scale. However, other theorists have argued that participants have non-uniform priors in mind. For example, Lu et al. (2008; see also Yeung & Griffiths, 2011) suggested that people expect causes to be either strong or non-existent, but not moderately strong. If people actually used these priors, their inferences would be more extreme than implied by the standard analysis; yet people's inferences tend to be more conservative.

There are two other pieces of evidence suggesting that conservative inferences are not just due to insufficient or pre-asymptotic learning of the parameters, or to averaging over participants with extreme but different judgments. First, the standard observation of base rate neglect (e.g., the predictive value of a positive mammogram for breast cancer, P(c=1|e=1); Eddy, 1982) occurs even when people are explicitly told the parameters. Indeed, base rate neglect has traditionally been found to be more extreme in situations in which the base rates are explicitly stated than when they are learned from experience (Christensen-Szalanski & Beach, 1982; Koehler, 1996). Second, some of the studies found conservative inferences even relative to participants' own stated beliefs in the parameters (e.g., Fernbach, Darlow, & Sloman, 2011; Morris & Larrick, 1995). Thus, we conclude that conservative inferences are caused by something more than insufficient learning of the parameters.

Third, it is difficult to separate true conservative reasoning from methodological artifacts associated with the rating scales used in all of the studies that ask for numerical ratings. It is plausible that some of the conservative habits are merely response biases produced by using response formats with a salient, “safe” or “compromise” midpoint. This artifact cannot be evaluated without systematic variations of the response scale formats, tests on inferences that involve different regions on the scales, and performance-contingent incentives.

Overall, some of the conservatism in judgments is likely due to general habits of caution. But we also believe that there are hints in the conservative patterns of inferences that additional judgment habits are involved. We think it is unlikely that deliberate reasoning processes exactly map onto the Bayesian calculations. We conjecture that anchoring and insufficient adjustment habits are psychologically plausible (cf. Lopes, 1987). The problem with this interpretation is that it simply re-labels the observed results, without providing deeper understanding, unless the anchoring process is further specified.

Let's walk through a speculation on anchoring strategies. Consider inferences on the C→E structure. Suppose the following causal parameters are provided via verbal-numerical instructions [P(c=1)=.30, P(e=1|c=1)=.80, P(e=1|c=0)=.40], which imply P(e=1)=.52. What are some of the plausible anchor values for inferring P(e=1)? (i) One anchor would be zero; assume that E is not occurring and then adjust upwards for the causal forces that increase its chances of occurring [P(e=1|c=1)=.80 and P(e=1|c=0)=.40]. (ii) Another anchor could be the salient value P(e=1|c=1)=.80; then adjust down towards P(e=1|c=0)=.40, or in the opposite direction. (iii) Some people might anchor on a mid-point between P(e=1|c=0)=.40 and P(e=1|c=1)=.80, perhaps .60, and then adjust downwards given that P(c=1)=.30. Note that these alternative anchoring strategies produce a range of predictions: anchoring on zero is likely to produce a low rating, whereas anchoring on .80 is likely to produce a high rating. The predictions are blurred further by the plausible assumption that different participants are likely to anchor on different parameters.
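For reference, the normative answer in this example follows from averaging the two conditionals by the base rate of the cause: P(e=1) = P(c=1)P(e=1|c=1) + P(c=0)P(e=1|c=0) = .30 × .80 + .70 × .40 = .24 + .28 = .52. Anchoring on zero with insufficient adjustment should yield ratings below .52, anchoring on .80 should yield ratings above it, and anchoring on the .60 midpoint and adjusting for the low base rate could land on either side.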

We can also speculate about psychological processes when the causal structure is learned from samples, rather than declaratively through words and numbers. For an inference like P(e=1), participants might assess the memory strength or frequency in memory of (e=1) experiences, in which case the assessment is likely to be regressive with over-estimated low frequencies and under-estimated high frequencies (Attneave, 1953; Zacks & Hasher, 2002). For an inference like P(c=1|e=1), participants could try to recall the percent of (c=1) experiences out of the recalled set of (e=1) experiences.

This discussion makes it obvious that anyone who wants to make an empirical argument for an alternative to the Bayesian calculation will have to be clear about the alternative calculations that are proposed and will have to increase the control and precision of the experimental methods. In fact, we hope researchers proceed in this fashion, as we do not believe that humans' explicit inferences about causal relationships are fully Bayesian. We also believe that some kind of serial averaging process is the most likely candidate for an alternative calculation, given the vast number of averaging results in the judgment literature and given that averaging, for most parameter values in the research we have reviewed, produces conservative final estimates.

We should also note a final complexity. One important aspect of the weak causal inferences is that they seem to run against the Markov violations. Take the structure C→M→E. A typical violation of the Markov assumption involves inferring P(e=1|m=1, c=1) > P(e=1|m=1, c=0). The two judgments are too far apart when they should be equal; C affects the inference about E when it should have been "screened off" by knowledge of the mediator (M). In contrast, a standard too-weak transitive inference involves inferring that P(e=1|c=1) and P(e=1|c=0) are too close together. The only difference between these two sets of findings is whether the state of M is known or not. Recall that some researchers (e.g., Rehder & Burnett, 2005) proposed adding a "hidden mechanism" node to explain the Markov violations, but adding such a node would lead to overly strong rather than overly weak transitive inferences. The implication is that it is doubtful that there is a unitary rational explanation for these two results.

A potential way to model these two findings is with a linear averaging approach. When inferring E, M gets most of the weight but C still gets some weight. This approach could potentially capture the fact that C is weighted too little for transitive inferences, but it is weighted too much (it should have zero weight) when M is known. This approach might also be useful for explaining how people infer M on the chain C→M→E or C on the common cause structure E1←C→E2. There is not much research on how people make judgments like P(m=1|c=1, e=1) relative to the normative calculations, but it is likely that people use some sort of linear averaging instead of a Bayesian likelihood ratio calculation (e.g., N. H. Anderson, 1996; Lopes, 1987).
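One minimal way such an averaging model could be formalized is sketched below; the weights and conditional probabilities are hypothetical placeholders of our own and are not fitted to any of the studies reviewed.

# A sketch of a linear-averaging account of inferences about E on the chain
# C -> M -> E. Normatively, once M is known C should get zero weight (it is
# screened off); giving C a small weight produces both Markov violations and
# overly weak transitive inferences.

# Hypothetical conditional probabilities the reasoner has learned.
p_e_given_m = {1: 0.80, 0: 0.20}   # P(e=1 | m)
p_e_given_c = {1: 0.65, 0: 0.35}   # P(e=1 | c): the normative transitive values

def judged_e_given_m_and_c(m, c, w_m=0.8, w_c=0.2):
    # Weighted average of the cues from M and C (weights sum to 1).
    return w_m * p_e_given_m[m] + w_c * p_e_given_c[c]

# Markov violation: C shifts the judgment even though M is known.
print(judged_e_given_m_and_c(m=1, c=1))  # 0.77
print(judged_e_given_m_and_c(m=1, c=0))  # 0.71 (normatively both should equal 0.80)

def judged_e_given_c(c, w_c=0.6, w_default=0.4, default=0.5):
    # Transitive inference blended with a "safe" scale midpoint when M is unknown.
    return w_c * p_e_given_c[c] + w_default * default

print(judged_e_given_c(c=1))  # 0.59
print(judged_e_given_c(c=0))  # 0.41 (the normative values are 0.65 and 0.35)

With one set of weights the model overuses C when M is known, and with another it underuses C when M is unknown; whether a single weighting scheme can capture both patterns simultaneously is an open empirical question.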

Summary of Possible Psychological Processes Involved in Causal Inference

Here we summarize some of the judgment problems faced in causal inference and present some potential cognitive process explanations; references appear in sections above.

Causal Structures are Complex

People may have difficulty understanding all of the dependencies and conditional independencies implied by structures with multiple variables even if they understand each of the individual links. For example, explaining away and the independencies implied by the Markov assumption are not necessarily intuitive. Constraint-satisfaction and associative reasoning strategies may provide some people with alternative representations for the structures. "Local" reasoning on parts of the structure could also explain why people have difficulty understanding properties of the structure that emerge when reasoning about three or more nodes simultaneously.

Too Much Information and Integration is Confusing

Performing the full Bayesian calculations requires reasoning about many nodes simultaneously, understanding how causes combine in complex ways (e.g., noisy-OR rule), and understanding how to use multiple parameters for a single inference. Even though anchoring is more of a description than a process model, it suggests a way to reduce complexity by focusing primarily on one piece of information, and then sequentially adjusting for other pieces of information.

Too Much Uncertainty

When one is uncertain about the causal structure or strengths, one might use “safe” defaults for judgments, such as the middle of the scale, or potentially rely on base rates with little updating. Uncertainty can also be built into the normative framework by integrating over possible structures or conditioning on sample size.

Limited Memory

When one experiences the probabilistic relationships between multiple variables, the number of cells in the joint probability table (e.g., Table 2) required to represent those experiences becomes very large. Focusing on the parameters instead of the contingencies simplifies the reasoning process, although we do not know whether people naturally reason using the parameters or the raw experiences. Either way, memory biases could impact the assessment of parameters or judgments based directly on a mental version of the joint probabilities.
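As a rough illustration of the scale involved, even three binary variables on a chain X→Y→Z require a joint probability table with 2^3 = 8 cells (7 free probabilities), whereas the network parameterization of the same chain needs only five numbers: P(x=1), P(y=1|x=1), P(y=1|x=0), P(z=1|y=1), and P(z=1|y=0). As variables are added, the joint table grows exponentially, while the number of network parameters grows much more slowly for sparsely connected structures.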

In sum, there are a variety of potential cognitive strategies and biases that could affect inferences on causal structures. We hope that summarizing these possibilities will encourage future research.

Conclusions

The Bayesian Probabilistic Causal Networks framework has stimulated a productive research program on human inferences on causal networks. Such inferences have clear analogues in everyday judgments about social attributions, medical diagnosis and treatment, legal reasoning, and many other domains involving causal cognition. So far, research suggests two persistent deviations from the normative model. People's inferences about one event are often inappropriately influenced by other events that are normatively irrelevant because they are unconditionally independent or are "screened off" by intervening nodes. At the same time, people's inferences tend to be weaker than is warranted by the normative framework.

These conclusions do not sharply constrain the form of a descriptive model for causal reasoning. At one end of the spectrum, some psychologists may want to ignore the normative framework (although we hope they would still consider its value as a model of objective causation). Such a theorist might want to "work up" from a lower, implementational level, such as associative networks or constraint-satisfaction networks, which can mimic many of the properties of normative Causal Networks but are not committed to the strict normative calculus.

Another option is to start with the normative Causal Networks and to relax some of the assumptions. Some candidates for “relaxation” include (i) shifting from exhaustive hypothesis spaces to attention-limited subsets of cognitively salient hypotheses; (ii) considering alternative prior belief probability distributions (e.g., Lu et al., 2008); (iii) limiting updating inferences to a subset of network nodes (presumably because of working memory limits, attention limits, pragmatics, or proximity; e.g., Burnett, 2004); (iv) conditioning confidence in experimentally-learned parameter values on sample size or credibility to more realistically represent uncertainty about the network (cf. Winkler & Murphy, 1973); and (v) experimentally verifying that the participants in experiments have not added plausible nodes or links to the experimenter-defined causal system (e.g., Burnett, 2004).

Causal reasoning is one dramatic example of an exceptionally sophisticated system of inferences that approximates many properties of normative belief systems. The research we reviewed has shown that when the normative calculations of causal networks imply that the probability of an event should increase, the judgments usually go up; when they imply a decrease, judgments usually go down. At the same time, the experimental literature contains some substantial and systematic discrepancies between human inferences and those of the normative Causal Network framework. Empirical and theoretical research on these discrepancies is an important frontier for our exploration of human cognition and human nature more generally.

Figure 9. Summary of the P(E1|E2) Inference.

Figure 10. Summary of the P(E1|set E2) Inference.

Figure 11. Summary of the P(E|C) Inference.

Figure 12. Summary of the P(E) Inference.

Figure 13. Summary of the P(C|E) Inference.

Figure 14. Summary of Discounting.

Figure 16. Summary of Discounting with Correlated Causes.

Figure 17. Summary of the P(E|C1, C2) Inference.

Table 5. Direction of Influence of Variables in Eqs. 7 and 8 (rows: P(c1=1), SC1, P(c2=1), SC2; columns: P(c1=1|e=1) and P(c1=1|e=1, c2=1)).

Acknowledgments

This research was supported by NIH grant 1F32HL108711 and the University of Chicago Booth School of Business.

Footnotes

i. Hagmayer and Waldmann (2000) also collected a separate "implicit" measure that was closer to the normative value. However, this measure again might not reflect pure reasoning from E2 to C and then to E1, because participants observed C and then predicted E1 and E2.

ii. This is a different value than the one cited in the original article because of a slight error in calculating ΔP.

Contributor Information

Benjamin Margolin Rottman, Section of Hospital Medicine, Department of Medicine, University of Chicago

Reid Hastie, The University of Chicago Booth School of Business

References

  1. Ali N, Chater N, Oaksford M. The mental representation of causal conditional inference: Causal models or mental models. Cognition. 2011;119:403–418. doi: 10.1016/j.cognition.2011.02.005. [DOI] [PubMed] [Google Scholar]
  2. Anderson JR. The adaptive character of thought. Hillsdale, NJ: Erlbaum Associates; 1990. [Google Scholar]
  3. Anderson JR, Milson R. Human memory: An adaptive perspective. Psychological Review. 1989;96(4):703–719. [Google Scholar]
  4. Anderson NH. A functional theory of cognition. Mahwah, NJ: Erlbaum Associates; 1996. [Google Scholar]
  5. Attneave F. Psychological probability as a function of experienced frequency. Journal of Experimental Psychology. 1953;46(2):81–86. doi: 10.1037/h0057955. [DOI] [PubMed] [Google Scholar]
  6. Baetu I, Baker AG. Human judgments of positive and negative causal chains. Journal of Experimental Psychology: Animal Behavior Processes. 2009;35(2):153–68. doi: 10.1037/a0013764. [DOI] [PubMed] [Google Scholar]
  7. Bar-Hillel M. The base-rate fallacy in probability judgment. Acta Psychologica. 1980;44(3052):211–233. [Google Scholar]
  8. Beller S. What we can learn from causal conditional reasoning about the naive understanding of causality. In: Sun R, Miyake N, editors. Proceedings of the Twentieth Annual Conference of the Cognitive Science Society. Mahwah, NJ: Erlbaum; 2006. pp. 59–64. [Google Scholar]
  9. Bennett J. A Philosophical Guide to Counterfactuals. Oxford: Oxford U.P; 2003. [Google Scholar]
  10. Blaisdell AP, Sawa K, Leising KJ, Waldmann MR. Causal reasoning in rats. Science. 2006;311:1020–1022. doi: 10.1126/science.1121872. [DOI] [PubMed] [Google Scholar]
  11. Buchanan DW, Sobel DM. Children posit hidden causes to explain causal variability. Proceedings of the 33rd Annual Meeting of the Cognitive Science Society; Boston, MA. 2011. [Google Scholar]
  12. Burnett RC. Inference from complex causal models (Doctoral dissertation). Retrieved from ProQuest Dissertations and Theses. 2004. [Google Scholar]
  13. Cartwright N. The dappled world: A study of the boundaries of science. Cambridge, England: Cambridge University Press; 1999. [Google Scholar]
  14. Cartwright N. What is wrong with Bayes nets? Monist. 2001;84:242–264. [Google Scholar]
  15. Cartwright N. Against modularity, the causal Markov condition, and any link between the two. British Journal for the Philosophy of Science. 2002;53:411–453. [Google Scholar]
  16. Charniak E. Bayesian networks without tears. AI magazine. 1991;12(4):50. [Google Scholar]
  17. Cheng PW. From covariation to causation: A causal power theory. Psychological Review. 1997;104(2):367–405. doi: 10.1037//0033-295X.104.2.367. [DOI] [Google Scholar]
  18. Christensen-Szalanski JJ, Beach LR. Experience and the base-rate fallacy. Organizational behavior and human performance. 1982;29(2):270–8. doi: 10.1016/0030-5073(82)90260-4. [DOI] [PubMed] [Google Scholar]
  19. Cummins DD. Naive theories and causal deduction. Memory & cognition. 1995;23(5):646–58. doi: 10.3758/bf03197265. [DOI] [PubMed] [Google Scholar]
  20. Cummins DD, Lubart T, Alksnis O, Rist R. Conditional reasoning and causation. Memory & cognition. 1991;19(3):274–82. doi: 10.3758/bf03211151. [DOI] [PubMed] [Google Scholar]
  21. Danks D. The psychology of causal perception and reasoning. In: Beebee H, Hitchcock C, Menzies P, editors. Oxford handbook of causation. Oxford: Oxford University Press; 2009. pp. 447–470. [Google Scholar]
  22. De Neys W, Schaeken W, d’Ydewalle G. Causal conditional reasoning and semantic memory retrieval: a test of the semantic memory framework. Memory & cognition. 2002;30(6):908–20. doi: 10.3758/bf03195776. [DOI] [PubMed] [Google Scholar]
  23. De Neys W, Schaeken W, d’Ydewalle G. Causal conditional reasoning and strength of association: The disabling condition case. European Journal of Cognitive Psychology. 2003a;15(2):161–176. [Google Scholar]
  24. De Neys W, Schaeken W, d’Ydewalle G. Inference suppression and semantic memory retrieval: every counterexample counts. Memory & cognition. 2003b;31(4):581–95. doi: 10.3758/bf03196099. [DOI] [PubMed] [Google Scholar]
  25. Dickinson A, Shanks D, Evenden J. Judgment of act-outcome contingency: The role of selective attribution. Quarterly Journal of Experimental Psychology. 1984;36A:29–50. [Google Scholar]
  26. Eberhardt F, Danks D. Confirmation in the Cognitive Sciences: The Problematic Case of Bayesian Models. Minds and Machines. 2011;21(3):389–410. doi: 10.1007/s11023-011-9241-3. [DOI] [Google Scholar]
  27. Eddy DM. Probabilistic reasoning in clinical medicine: Problems and opportunities. In: Kahneman D, Slovic P, Tversky A, editors. Judgment under uncertainty: Heuristics and biases. Cambridge University Press; 1982. pp. 249–67. [Google Scholar]
  28. Evans JSBT, Handley SJ, Over DE. Conditionals and conditional probability. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2003;29(2):321–335. doi: 10.1037/0278-7393.29.2.321. [DOI] [PubMed] [Google Scholar]
  29. Evans JSBT, Over DE. If. Oxford: Oxford University Press; 2004. [Google Scholar]
  30. Fernbach PM, Erb CD. A quantitative causal model theory of conditional reasoning. Journal of Experimental Psychology: Learning, Memory, and Cognition. doi: 10.1037/a0031851. in press. [DOI] [PubMed] [Google Scholar]
  31. Fernbach PM, Rehder B. Argument and Computation. 2012. Cognitive shortcuts in causal inference. [DOI] [Google Scholar]
  32. Fernbach PM, Sloman SA. Causal learning with local computations. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2009;35(3):678–93. doi: 10.1037/a0014928. [DOI] [PubMed] [Google Scholar]
  33. Fernbach PM, Darlow A, Sloman SA. Asymmetries in predictive and diagnostic reasoning. Journal of Experimental Psychology: General. 2011;140(2):168–85. doi: 10.1037/a0022100. [DOI] [PubMed] [Google Scholar]
  34. Garner WR. The processing of information and structure. Potomac, MD: Erlbaum Associates; 1974. [Google Scholar]
  35. Gelman A, Meng X-L. Applied Bayesian modeling and causal inference from incomplete data perspectives. New York: Wiley; 2004. [Google Scholar]
  36. Gigerenzer G, Hoffrage U. How to Improve Bayesian Reasoning Without Instruction: Frequency Formats. Psychological Review. 1995;102:684–704. [Google Scholar]
  37. Gigerenzer G, Todd PM & The ABC Research Group. Simple heuristics that make us smart. New York: Oxford University Press; 1999. [Google Scholar]
  38. Glymour C. The mind’s arrows: Bayes nets and graphical causal models in psychology. Cambridge, MA: MIT Press; 2001. [Google Scholar]
  39. Gopnik A, Glymour C, Sobel DM, Schulz LE, Kushnir T, Danks D. A theory of causal learning in children: Causal maps and Bayes nets. Psychological review. 2004;111(1):3–32. doi: 10.1037/0033-295X.111.1.3. [DOI] [PubMed] [Google Scholar]
  40. Griffiths TL, Tenenbaum JB. Structure and strength in causal induction. Cognitive psychology. 2005;51(4):334–84. doi: 10.1016/j.cogpsych.2005.05.004. [DOI] [PubMed] [Google Scholar]
  41. Griffiths TL, Tenenbaum JB. Theory-based causal induction. Psychological review. 2009;116(4):661–716. doi: 10.1037/a0017201. [DOI] [PubMed] [Google Scholar]
  42. Hagmayer Y, Meder B. Causal learning through repeated decision making. In: Love BC, McRae K, Sloutsky VM, editors. Proceedings of the 30th Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society; 2008. pp. 179–184. [Google Scholar]
  43. Hagmayer Y, Meder B. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2012. Repeated causal decision making. Advance online publication. [DOI] [PubMed] [Google Scholar]
  44. Hagmayer Y, Sloman S. Decision makers conceive of their choices as interventions. Journal of Experimental Psychology: General. 2009;138(1):22–38. doi: 10.1037/a0014585. [DOI] [PubMed] [Google Scholar]
  45. Hagmayer Y, Waldmann MR. Simulating Causal Models: The Way to Structural Sensitivity. In: Gleitman LR, Joshi AK, editors. Proceedings of the Twenty-Second Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society; 2000. pp. 214–219. [Google Scholar]
  46. Hagmayer Y, Waldmann MR. Proceedings of the Twenty-Fourth Annual Conference of the Cognitive Science Society. Mahwah, NJ: Erlbaum; 2002. A constraint satisfaction model of causal learning and reasoning; pp. 405–410. [Google Scholar]
  47. Hagmayer Y, Waldmann MR. Inferences about unobserved causes in human contingency learning. Quarterly journal of experimental psychology (2006) 2007;60(3):330–55. doi: 10.1080/17470210601002470. [DOI] [PubMed] [Google Scholar]
  48. Hattori M, Oaksford M. Adaptive non-interventional heuristics for covariation detection in causal induction: Model comparison and rational analysis. Cognitive Science. 2007;31:765–814. doi: 10.1080/03640210701530755. [DOI] [PubMed] [Google Scholar]
  49. Hiddleston E. A causal theory of counterfactuals. Noûs. 2005;39:632–657. [Google Scholar]
  50. Holyoak KJ, Cheng PW. Causal learning and inference as a rational process: the new synthesis. Annual review of psychology. 2011;62:135–63. doi: 10.1146/annurev.psych.121208.131634. [DOI] [PubMed] [Google Scholar]
  51. Jara E, Vila J, Maldonado A. Second-order conditioning of human causal learning. Learning and Motivation. 2006;37(3):230–246. doi: 10.1016/j.lmot.2005.12.001. [DOI] [Google Scholar]
  52. Jenkins HM, Ward WC. Judgment of contingency between responses and outcomes. Psychological Monographs: General and Applied. 1965;79(1):1–17. doi: 10.1037/h0093874. [DOI] [PubMed] [Google Scholar]
  53. Jensen FJ, Nielsen TD. Bayesian networks and decision graphs. New York: Springer Verlag; 2007. [Google Scholar]
  54. Jones EE, Harris VA. The attribution of attitudes. Journal of Experimental Social Psychology. 1967;3:1–24. [Google Scholar]
  55. Jones M, Love BC. Bayesian Fundamentalism or Enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behavioral and Brain Sciences. 2011;34(04):169–188. doi: 10.1017/S0140525X10003134. [DOI] [PubMed] [Google Scholar]
  56. Kahneman D. A perspective on judgment and choice: Mapping bounded rationality. American Psychologist. 2003;58(9):697–720. doi: 10.1037/0003-066X.58.9.697. [DOI] [PubMed] [Google Scholar]
  57. Kahneman D, Tversky A. Subjective probability: A judgment of representativeness. Cognitive Psychology. 1972;3:430–454. [Google Scholar]
  58. Khemlani SS, Oppenheimer DM. When one model casts doubt on another: a levels-of-analysis approach to causal discounting. Psychological bulletin. 2011;137(2):195–210. doi: 10.1037/a0021809. [DOI] [PubMed] [Google Scholar]
  59. Kim NS, Luhmann CC, Pierce ML, Ryan MM. The conceptual centrality of causal cycles. Memory & Cognition. 2009;37:744–758. doi: 10.3758/MC.37.6.744. [DOI] [PubMed] [Google Scholar]
  60. Koehler JJ. The base rate fallacy reconsidered: Descriptive, normative, and methodological challenges. Behavioral and Brain Sciences. 1996;19(01):1. doi: 10.1017/S0140525X00041157. [DOI] [Google Scholar]
  61. Krebs JR, Davies NB. An Introduction to Behavioural Ecology, 4th ed. Oxford: Blackwell; 1993. [Google Scholar]
  62. Kruschke JK. Locally Bayesian learning with applications to retrospective revaluation and highlighting. Psychological Review. 2006;113(4):677–699. doi: 10.1037/0033-295X.113.4.677. [DOI] [PubMed] [Google Scholar]
  63. Krynski TR, Tenenbaum JB. The role of causality in judgment under uncertainty. Journal of experimental psychology. General. 2007;136(3):430–50. doi: 10.1037/0096-3445.136.3.430. [DOI] [PubMed] [Google Scholar]
  64. Lagnado DA, Sloman S. The advantage of timely intervention. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2004;30(4):856–76. doi: 10.1037/0278-7393.30.4.856. [DOI] [PubMed] [Google Scholar]
  65. Lagnado DA, Sloman SA. Time as a guide to cause. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2006;32(3):451–60. doi: 10.1037/0278-7393.32.3.451. [DOI] [PubMed] [Google Scholar]
  66. Lagnado DA, Waldmann MR, Hagmayer Y, Sloman SA. Beyond covariation: Cues to Causal Structure. In: Gopnik A, Schulz L, editors. Causal learning: Psychology, philosophy, and computation. Oxford: Oxford University Press; 2007. pp. 154–172. [Google Scholar]
  67. Lauritzen SL, Spiegelhalter DJ. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society. Series B (Methodological) 1988;50(2):157–224. [Google Scholar]
  68. Liu I, Lo K, Wu J. A Probabilistic Interpretation of "If-Then." Journal of Experimental Psychology. 1996;49A(3):828–845. [Google Scholar]
  69. Lopes LL. Procedural debiasing. Acta Psychologica. 1987;64(2):167–185. [Google Scholar]
  70. Lu H, Yuille AL, Liljeholm M, Cheng PW, Holyoak KJ. Bayesian generic priors for causal learning. Psychological review. 2008;115(4):955–84. doi: 10.1037/a0013256. [DOI] [PubMed] [Google Scholar]
  71. Luhmann CC, Ahn W. BUCKLE: a model of unobserved cause learning. Psychological review. 2007;114(3):657–77. doi: 10.1037/0033-295X.114.3.657. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Markovits H, Handley S. Is inferential reasoning just probabilistic reasoning in disguise? Memory & cognition. 2005;33(7):1315–23. doi: 10.3758/bf03193231. [DOI] [PubMed] [Google Scholar]
  73. Marr D. Vision: A computational investigation into human representation and processing of visual information. San Diego: W.H. Freeman; 1982. [Google Scholar]
  74. Mayrhofer R, Goodman ND, Waldmann MR, Tenenbaum JB. Proceedings of the Thirtieth Annual Conference of the Cognitive Science Society. Austin, TX: The Cognitive Science Society; 2008. Structured correlation from the causal background; pp. 303–308. [Google Scholar]
  75. Mayrhofer R, Hagmayer Y, Waldmann MR. Proceedings of the Thirty-Second Annual Conference of the Cognitive Science Society. Austin, TX: The Cognitive Science Society; 2010. Agents and causes: A Bayesian error attribution model of causal reasoning. [Google Scholar]
  76. Meder B, Hagmayer Y. Causal induction enables adaptive decision making. In: Taatgen NA, van Rijn H, editors. Proceedings of the 31st Annual Conference of the Cognitive Science Society. Vol. 70. Austin, TX: Cognitive Science Society; 2009. pp. 1651–1656. [Google Scholar]
  77. Meder B, Gerstenberg T, Hagmayer Y, Waldmann MR. Observing and Intervening: Rational and Heuristic Models of Causal Decision Making. The Open Psychology Journal. 2010;(3):119–135. [Google Scholar]
  78. Meder B, Hagmayer Y, Waldmann MR. Psychonomic Bulletin & Review. 1. Vol. 15. Springer; 2008. Inferring interventional predictions from observational learning data; pp. 75–80. [DOI] [PubMed] [Google Scholar]
  79. Meder B, Hagmayer Y, Waldmann MR. The role of learning data in causal reasoning about observations and interventions. Memory & cognition. 2009;37(3):249–64. doi: 10.3758/MC.37.3.249. [DOI] [PubMed] [Google Scholar]
  80. Meder B, Mayrhofer R, Waldmann MR. A rational model of elemental diagnostic inference. In: Taatgen NA, van Rijn H, editors. Proceedings of the 31st Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society; 2009. pp. 2176–2181. [Google Scholar]
  81. Morris MW, Larrick RP. When one cause casts doubt on another: A normative analysis of discounting in causal attribution. Psychological Review. 1995;102(2):331–355. doi: 10.1037/0033-295X.102.2.331. [DOI] [Google Scholar]
  82. Nichols W, Danks D. Proceedings of the 29th annual meeting of the cognitive science society. Austin, TX: The Cognitive Science Society; 2007. Decision making using learned causal structures; pp. 1343–1348. [Google Scholar]
  83. Novick LR, Cheng PW. Assessing interactive causal influence. Psychological Review. 2004;111(2):455–485. doi: 10.1037/0033-295X.111.2.455. [DOI] [PubMed] [Google Scholar]
  84. Oaksford M, Chater N, Larkin J. Probabilities and polarity biases in conditional inference. Journal of experimental psychology. Learning, memory, and cognition. 2000;26(4):883–99. doi: 10.1037//0278-7393.26.4.883. [DOI] [PubMed] [Google Scholar]
  85. Oberauer K, Wilhellm O. The meaning(s) of conditionals: conditional probabilities, mental models and personal utilities. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2003;29:680–693. doi: 10.1037/0278-7393.29.4.680. [DOI] [PubMed] [Google Scholar]
  86. Over DE, Hadjichristidis C, Evans JSBT, Handley SJ, Sloman SA. The probability of causal conditionals. Cognitive Psychology. 2007;54(1):62–97. doi: 10.1016/j.cogpsych.2006.05.002. [DOI] [PubMed] [Google Scholar]
  87. Payne JW, Bettman JR, Johnson EJ. The adaptive decision maker. New York: Cambridge University Press; 1993. [Google Scholar]
  88. Pearl J. Probabilistic Reasoning in Intelligent Systems. San Mateo: Morgan Kaufmann; 1988. [Google Scholar]
  89. Pearl J. Causality: Models, Reasoning, and Inference. Cambridge University Press; 2000. [Google Scholar]
  90. Perales J, Catena A, Maldonado A. Inferring non-observed correlations from causal scenarios: The role of causal knowledge. Learning and Motivation. 2004;35(2):115–135. doi: 10.1016/S0023-9690(03)00042-0. [DOI] [Google Scholar]
  91. Quinn S, Markovits H. Conditional reasoning, causality, and the structure of semantic memory: strength of association as a predictive factor for content effects. Cognition. 1998;68(3):B93–101. doi: 10.1016/s0010-0277(98)00053-5. [DOI] [PubMed] [Google Scholar]
  92. Rehder B. Human Deviations from Normative Causal Reasoning. Poster presented at the 28th Annual Conference of the Cognitive Science Society; Vancouver, British Columbia. 2006. [Google Scholar]
  93. Rehder B. Proceedings of the 33rd Annual Conference of the Cognitive Science Society. Boston, MA: Cognitive Science Society; 2011. Reasoning with conjunctive causes. [Google Scholar]
  94. Rehder B. Independence and nonindependence in human causal reasoning. Manuscript under review. [Google Scholar]
  95. Rehder B, Burnett RC. Feature inference and the causal structure of categories. Cognitive psychology. 2005;50(3):264–314. doi: 10.1016/j.cogpsych.2004.09.002. [DOI] [PubMed] [Google Scholar]
  96. Rehder B, Hastie R. Causal knowledge and categories: The effect of causal beliefs on categorization, induction, and similarity. Journal of Experimental Psychology: General. 2001;130(3):323–360. doi: 10.1037//0096-3445.130.3.323. [DOI] [PubMed] [Google Scholar]
  97. Rehder B, Kim S. Causal status and coherence in causal-based categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2010;36(5):1171–1206. doi: 10.1037/a0019765. [DOI] [PubMed] [Google Scholar]
  98. Rehder B, Martin JB. Proceedings of the 33rd Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society; 2011. A generative model of causal cycles. [Google Scholar]
  99. Reips UD, Waldmann MR. When Learning Order Affects Sensitivity to Base Rates. Experimental Psychology. 2008;55(1):9–22. doi: 10.1027/1618-3169.55.1.9. [DOI] [PubMed] [Google Scholar]
  100. Rips LJ. Two causal theories of counterfactual conditionals. Cognitive science. 2010;34(2):175–221. doi: 10.1111/j.1551-6709.2009.01080.x. [DOI] [PubMed] [Google Scholar]
  101. Rottman BM, Keil FC. Causal Structure Learning over Time: Observations and Interventions. Cognitive Psychology. 2012;64:93–125. doi: 10.1016/j.cogpsych.2011.10.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  102. Rottman BM, Ahn W, Luhmann CC. When and how do people reason about unobserved causes? In: Illari P, Russo F, Williamson J, editors. Causality in the Sciences. Oxford U.P; 2011. pp. 150–183. [Google Scholar]
  103. Schum DA, Martin AW. Formal and empirical research on cascaded inference in jurisprudence. Law and Society Review. 1982;17(1):105–152. [Google Scholar]
  104. Shah AK, Oppenheimer DM. Heuristics made easy: An effort-reduction framework. Psychological Bulletin. 2008;134(2):207–222. doi: 10.1037/0033-2909.134.2.207. [DOI] [PubMed] [Google Scholar]
  105. Simon HA. A behavioral model of rational choice. Quarterly Journal of Economics. 1955;69(1):99–118. [Google Scholar]
  106. Sloman SA. Causal models: How we think about the world and its alternatives. Oxford: Oxford U.P; 2005. [Google Scholar]
  107. Sloman SA, Hagmayer Y. The causal psycho-logic of choice. Trends in cognitive sciences. 2006;10(9):407–12. doi: 10.1016/j.tics.2006.07.001. [DOI] [PubMed] [Google Scholar]
  108. Sloman SA, Lagnado DA. Do we “do”? Cognitive Science. 2005;29:5–39. doi: 10.1207/s15516709cog2901_2. [DOI] [PubMed] [Google Scholar]
  109. Spellman BA. Acting as Intuitive Scientists: Contingency Judgments Are Made While Controlling for Alternative Potential Causes. Psychological Science. 1996;7(6):337–343. [Google Scholar]
  110. Spirtes P, Glymour C, Scheines R. Causation, Prediction, and Search. N.Y: Springer-Verlag; 1993/2000. [Google Scholar]
  111. Steyvers M, Tenenbaum JB, Wagenmakers EJ, Blum B. Inferring causal networks from observations and interventions. Cognitive Science. 2003;27(3):453–489. doi: 10.1016/S0364-0213(03)00010-7. [DOI] [Google Scholar]
  112. Sussman AB, Oppenheimer D. A Causal Model Theory of Judgment. In: Hölscher C, Carlson, Shipley T, editors. Proceedings of the 33rd Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society; 2011. pp. 1703–1708. [Google Scholar]
  113. Taylor EG, Landy DH, Ross BH. The effect of explanation in simple binary prediction tasks. Quarterly Journal of Experimental Psychology. 2012;65(7):1361–1375. doi: 10.1080/17470218.2012.656664. [DOI] [PubMed] [Google Scholar]
  114. Thompson VA. Interpretational factors in conditional reasoning. Memory & cognition. 1994;22(6):742–58. doi: 10.3758/bf03209259. [DOI] [PubMed] [Google Scholar]
  115. von Sydow M, Hagmayer Y, Meder B, Waldmann MR. Proceedings of the Thirty-Second Annual Conference of the Cognitive Science Society. Vol. 6. Austin, TX: Cognitive Science Society; 2010. How causal reasoning can bias empirical evidence; pp. 2087–2092. [Google Scholar]
  116. von Sydow M, Meder B, Hagmayer Y. A transitivity heuristic of probabilistic causal reasoning. In: Taatgen NA, van Rijn H, editors. Proceedings of the 31st annual conference of the Cognitive Science Society. Vol. 1. Amsterdam: The Cognitive Science Society; 2009. pp. 803–808. [Google Scholar]
  117. Waldmann MR. Knowledge-based causal induction. In: Shanks DR, Holyoak KL, Medin DL, editors. The psychology of learning and motivation. Vol. 34. San Diego: 1996. pp. 47–88. [Google Scholar]
  118. Waldmann MR. Combining versus analyzing multiple causes: how domain assumptions and task context affect integration rules. Cognitive science. 2007;31(2):233–56. doi: 10.1080/15326900701221231. [DOI] [PubMed] [Google Scholar]
  119. Waldmann MR, Hagmayer Y. Estimating causal strength: the role of structural knowledge and processing effort. Cognition. 2001;82:27–58. doi: 10.1016/s0010-0277(01)00141-x. [DOI] [PubMed] [Google Scholar]
  120. Waldmann MR, Hagmayer Y. Seeing versus doing: two modes of accessing causal knowledge. Journal of experimental psychology. Learning, memory, and cognition. 2005;31(2):216–27. doi: 10.1037/0278-7393.31.2.216. [DOI] [PubMed] [Google Scholar]
  121. Waldmann MR, Martignon L. A Bayesian network model of causal learning. In: Gernsbacher MA, Derry SJ, editors. Proceedings of the Twentieth Annual Conference of the Cognitive Science Society. Mahwah, NJ: Erlbaum; 1998. pp. 1102–1107. [Google Scholar]
  122. Waldmann MR, Cheng PW, Hagmayer Y, Blaisdell AP. Causal learning in rats and humans: a minimal rational model. In: Chater N, Oaksford M, editors. The probabilistic mind: Prospects for Bayesian cognitive science. Oxford: Oxford University Press; 2008. pp. 453–484. [Google Scholar]
  123. Walsh CR, Sloman SA. Revising causal beliefs. In: Forbus K, Gentner D, Regier T, editors. Proceedings of the 26th Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum Associates; 2004. pp. 1423–1427. [Google Scholar]
  124. Walsh CR, Sloman SA. Updating beliefs with causal models: Violations of screening off. In: Gluck MA, Anderson JR, Kosslyn SM, editors. Memory and Mind: A Festschrift for Gordon H Bower. NY: Lawrence Erlbaum Associates; 2007. pp. 345–358. [Google Scholar]
  125. Winkler RL, Murphy AH. Experiments in the laboratory and the real world. Organizational Behavior and Human Performance. 1973;10(3):252–270. [Google Scholar]
  126. Woodward J. Making things happen: A theory of causal explanation. New York: Oxford University Press; 2003. [Google Scholar]
  127. Yeung S, Griffiths TL. Proceedings of the 33rd Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society; 2011. Estimating human priors on causal strength; pp. 1709–1714. [Google Scholar]
  128. Yuille AL, Lu H. Advances in neural information processing systems, Vol. 20. Cambridge, MA: MIT Press; 2008. The noisy-logical distribution and its application to causal inference. [Google Scholar]
  129. Zacks RT, Hasher L. Frequency processing: A twenty-five year perspective. In: Sedlmeier P, Betsch T, editors. Etc. frequency processing and cognition. New York: Oxford University Press; 2002. pp. 21–36. [Google Scholar]
