Significance
We developed a theory of how mental simulations underlie the abductions of informal algorithms and deductions from these algorithms. Experiments tested the theory’s predictions using a task designed to investigate how naive individuals think about algorithms. Participants solved problems, abduced and described in their own words algorithms that solved such problems, and deduced the consequences of algorithms. Difficulty in formulating an algorithm and in deducing its consequences depended on the algorithm’s Kolmogorov complexity. Results corroborated the use of kinematic mental models in creating and testing informal algorithms and showed that individuals differ reliably in the ability to carry out these tasks.
Keywords: cognitive processes, informal programming, problem solving, reasoning
Abstract
We present a theory, and its computer implementation, of how mental simulations underlie the abductions of informal algorithms and deductions from these algorithms. Three experiments tested the theory’s predictions, using an environment of a single railway track and a siding. This environment is akin to a universal Turing machine, but it is simple enough for nonprogrammers to use. Participants solved problems that required use of the siding to rearrange the order of cars in a train (experiment 1). Participants abduced and described in their own words algorithms that solved such problems for trains of any length, and, as the use of simulation predicts, they favored “while-loops” over “for-loops” in their descriptions (experiment 2). Given descriptions of loops of procedures, participants deduced the consequences for given trains of six cars, doing so without access to the railway environment (experiment 3). As the theory predicts, difficulty in rearranging trains depends on the numbers of moves and cars to be moved, whereas in formulating an algorithm and deducing its consequences, it depends on the Kolmogorov complexity of the algorithm. Overall, the results corroborated the use of a kinematic mental model in creating and testing informal algorithms and showed that individuals differ reliably in the ability to carry out these tasks.
The basis of much human thinking is the ability to make mental simulations, that is, to imagine a process step-by-step, so that it unfolds in the mind in the same temporal order as the events in the actual process. This hypothesis is central to the theory of mental models (1–4). The theory explains how individuals reason, but in tasks such as syllogistic or conditional reasoning, rival theories offer alternative accounts (5, 6), and it is not easy to decide among them empirically (7). The aim of the present paper, accordingly, is to show that human reasoners use kinematic mental models to simulate events. This concept of mental models in simulations depends on three assumptions, which derive from the model theory (8).
i) The mental models in simulations are iconic [i.e., their structures correspond to the structures of what they represent (9)]. Hence, a model of a spatial layout is itself spatial, and so the relations between objects in the world are mirrored in the spatial relations between them in the model (10).
ii) A kinematic model unfolds in time, and the sequence of situations that it represents corresponds to the temporal order of events in the world, real or imaginary (2, 11).
iii) Mental models can be schematic and more parsimonious than visual images, which they underlie (1), because models need not represent the world from a particular point of view or represent all of its visual features (12). They represent what is common to many possibilities differing in details, and they yield faster inferences than images (13).
Some cognitive scientists are skeptical about the existence of any mental representations (14, 15), some emphasize the role of the environment in constraining, affording, or situating intelligent behavior (16, 17), some allow representations only in the form of syntactically structured strings of symbols in a mental language (18), and some to the contrary allow representations only in sensory modalities (19). Our experiments were designed to illuminate these various ideas about representations.
The model theory postulates that the formulation of algorithms and computer programs depends on mental simulations. Computer programming calls for knowledge of programming languages, and so our studies focused on how naive individuals—those who knew nothing about programming—formulated algorithms in informal language. Programs often depend on a loop of operations (e.g., “For each of the n elements in an input list, put that element at the head of the output”). This “for-loop” reverses the order of a list, such as (A B C). The first step places A at the head of an otherwise empty output, the second step puts B at the head of the output, and the third step puts C at the head of the output. The result is (C B A). The same reversal can be carried out with a “while-loop” (e.g., “While the input list contains at least one item, put the item at the head of the input list to the head of the output”). While-loops are more powerful than for-loops, because only they can compute certain functions (20).
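To make the contrast concrete, here is a minimal sketch in Python (our illustration; the studies below used informal language, and mAbducer itself is written in Lisp) of the two kinds of loop reversing a list. The function names are ours.

```python
def reverse_with_for_loop(items):
    """For-loop: the number of iterations is fixed in advance, one per element."""
    output = []
    for item in items:           # first A, then B, then C
        output.insert(0, item)   # put each element at the head of the output
    return output

def reverse_with_while_loop(items):
    """While-loop: iterate until a halting condition holds (the input is empty)."""
    remaining = list(items)
    output = []
    while remaining:                        # halting condition: items remain in the input
        output.insert(0, remaining.pop(0))  # move the head of the input to the head of the output
    return output

print(reverse_with_for_loop(["A", "B", "C"]))    # ['C', 'B', 'A']
print(reverse_with_while_loop(["A", "B", "C"]))  # ['C', 'B', 'A']
```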
There have been investigations of deductions that call for a repeated loop of mental operations (21, 22) and of novice programmers’ grasp of loops (23, 24). Studies of algorithmic thinking in nonprogrammers are rare, but they suggest that nonprogrammers tend not to make spontaneous use of loops (25–27).
To investigate the mental simulation of loops, we needed a task suitable for individuals with no knowledge of programming. We devised a simple computer environment of a toy train, which mimics a Turing machine (20) but can be immediately grasped by naive participants, including children. Unlike classical problems, such as the Tower of Hanoi (28) or missionaries and cannibals (29), the railway environment can be used to examine problems that differ in computational complexity (30), as we describe below. Fig. 1 presents the environment as it is shown on a computer screen. It consists of a railway track with a siding and labeled cars. Only three types of moves are possible: a move from the left track to the right track, a move from the left track to the siding, and a move from the siding to the left track.
Fig. 1.

The railway environment with an example of an initial configuration: a set of cars is on the left side (A) of the track, and the siding (B) can hold one or more cars while other cars are moved to the right side of the track (C). The program allows individuals to select a car (e.g., the highlighted “E” car) and to move it and all of the cars in front of it to the siding or the right track.
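To fix ideas, the environment can be captured in a few lines of code. The sketch below is our illustration, not the authors’ Lisp implementation: a state is the sequence of cars on the left track, the siding, and the right track, and the three legal moves transfer a block of cars between them. The ordering conventions are assumptions made for the example.

```python
# A state is (left, siding, right), each a string of car labels. The left and
# right tracks read left to right; the siding reads from its entrance inward,
# so it behaves like a stack (our assumed convention).

def move_right(state, n):
    """R: move the n cars nearest the junction from the left track to the right track."""
    left, siding, right = state
    return (left[:-n], siding, left[-n:] + right)

def move_to_siding(state, n):
    """S: move the n cars nearest the junction from the left track onto the siding."""
    left, siding, right = state
    return (left[:-n], left[-n:] + siding, right)

def move_to_left(state, n):
    """L: move the n cars nearest the siding's entrance back onto the left track."""
    left, siding, right = state
    return (left + siding[:n], siding[n:], right)

state = ("ABCDEF", "", "")         # six cars on the left track
state = move_to_siding(state, 2)   # ('ABCD', 'EF', '')
state = move_right(state, 1)       # ('ABC', 'EF', 'D')
state = move_to_left(state, 2)     # ('ABCEF', '', 'D')
```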
We used the train environment to examine naive individuals’ performance of three distinct categories of tasks. (i) “Problem solving” calls for individuals to rearrange a train, initially on the left track, so that it is in a specified order on the right track. (ii) “Abductive reasoning” yields explanations (31), and we broaden the term to cover reasoning that yields algorithms. Our task calls for individuals to abduce algorithms that solve whole classes of rearrangements, such as an algorithm that reverses the order of a train of any length. (iii) “Deductive reasoning” calls for individuals to infer the consequences of an algorithm for a given train. Here, we describe the model theory of these three tasks, its computer implementation, and the results of three experiments that corroborate its predictions about the three tasks. Finally, we draw some general conclusions about mental representations and simulation.
The Model Theory of Algorithms
To create an algorithm that solves any problem in a class of problems, the first step is to solve representative instances in the class. The second step is to use a simulation of the process of their solution to abduce an algorithm that solves any problem in the class. Finally, to test the algorithm’s correctness, the third step is to use the algorithm itself, or a simulation of it, to deduce its consequences for new problems in the class. Each of these steps is a component of the model theory, and we have implemented each component in a computer program, mAbducer (for “model-based abducer,” available at http://mentalmodels.princeton.edu/models). We describe the theory of its three components in turn.
Problem Solving.
Although there are only three possible types of moves in rearrangement problems (R: move one or more cars to the right track; S: move one or more cars to the siding; and L: move one or more cars to the left track), trial and error soon leads to an exponential number of possibilities. A problem such as the Tower of Hanoi can be solved using means–ends analysis in which one works backward from the desired goal, invoking operations to reduce the difference between it and the current state (32). A Sudoku puzzle, however, cannot be solved using means–ends analysis, because, by design, it lacks a complete description of the goal (33). Rearrangement problems can be solved in a relatively unusual way, using a “partial” means–ends analysis, in which individuals decompose the goal, starting with the right-most car on the right track, and solve the problem of arranging one or more adjacent cars into their required position in a “piecemeal” way.
The input to mAbducer is the starting state of the track and the required goal. It maintains a model of the current state of the track and of the goal, and it solves the problem in a psychologically plausible way. The kinematic model that it uses to represent the railway is highly schematic. For example, the model A[BA]BCC from a kinematic sequence represents the car A on the left track, the cars BA on the siding (as denoted by the square brackets), and the cars BCC on the right track. The goal is represented as a single sequence of cars, which need to be on the right track, with no cars on the siding or the left track (e.g., [ ]AABBCC). The program, which implements a partial means–ends analysis, matches cars on the left track and the siding with those required to be on the right track, updating the goal whenever at least one car is moved to the right track until it solves the problem. The program’s output is a trace of the successful sequence of moves.
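As an illustration of the idea, and only as a rough sketch, a greedy version of a partial means–ends analysis can be written as follows. It is our simplification, not mAbducer’s Lisp procedure; it builds the goal train on the right track from its rightmost car leftward, using the state and move conventions of the sketch above.

```python
def solve(start, goal):
    """Simplified greedy partial means-ends analysis: deliver the goal train to
    the right track from its rightmost car leftward (a sketch, not mAbducer)."""
    left, siding, right = start, "", ""
    remaining = goal                 # cars still to be delivered to the right track
    moves = []
    while remaining:
        # Largest block at the junction end of the left track that matches the
        # cars needed next on the right track.
        k = 0
        while (k < len(left) and k < len(remaining)
               and left[-(k + 1):] == remaining[-(k + 1):]):
            k += 1
        if k > 0:                                       # R: deliver the matching block
            left, right = left[:-k], left[-k:] + right
            remaining = remaining[:-k]
            moves.append(("R", k))
        elif siding and siding[0] == remaining[-1]:     # L: the needed car tops the siding
            left, siding = left + siding[0], siding[1:]
            moves.append(("L", 1))
        elif remaining[-1] in left:                     # S: clear the cars in front of it
            n = len(left) - 1 - left.rindex(remaining[-1])
            left, siding = left[:-n], left[-n:] + siding
            moves.append(("S", n))
        else:                                           # uncover a car buried in the siding
            n = siding.index(remaining[-1])
            left, siding = left + siding[:n], siding[n:]
            moves.append(("L", n))
    return moves

print(solve("ABCDEF", "FEDCBA"))  # 12 moves: [('S', 5), ('R', 1), ('L', 1), ..., ('R', 1)]
print(solve("ABCCBA", "AABBCC"))  # 6 moves, with 10 cars moved in all
```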
The sequences of moves in the program’s solutions are intended to be psychologically plausible. Hence, the relative difficulty of a problem should depend on the number of moves in the program’s solution and the mean number of operands per move. In a reversal problem, as the program’s trace of its solution shows, each move after the first one has an operand of a single car. We can contrast this case with the solution of a “palindrome” problem, such as the rearrangement from ABCCBA[ ] to [ ]AABBCC. We refer to the problem as a palindrome because, when the input is a palindrome, as in this case, it is sorted into the order illustrated above. The program’s solution calls for six moves, and the total number of operands (moved cars) is 10, which is greater than the seven operands for the reversal problem. Even though the two problems have the same number of moves, the theory, therefore, predicts that the palindrome should be more difficult to solve than the reversal. Number of operands has a family resemblance to “relational complexity,” which concerns the number of arguments in a relation and affects the difficulty of solving problems (34). However, the number of operands concerns not the number of arguments of an operator but whether the value of a single argument is one or more cars. The two have in common that they increase the processing load on working memory. A corollary is that individuals should be likely to make unnecessary moves in their solutions (i.e., they should often fail to solve problems parsimoniously) because they move just one car instead of two or more.
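In terms of these counts, a worked example (the particular six-move sequence for the palindrome problem above is ours, written in the move notation of the sketch):

```python
# ABCCBA -> AABBCC: one six-move solution, with the number of cars moved at each step
palindrome_moves = [("S", 2), ("R", 2), ("L", 1), ("R", 2), ("L", 1), ("R", 2)]
total_operands = sum(n for _, n in palindrome_moves)    # 10 cars moved in all
mean_operands = total_operands / len(palindrome_moves)  # 10 / 6, about 1.7 per move
```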
An alternative theoretical approach is that solution depends instead, for example, on a proof procedure or on an algebraic manipulation (35). The difficulty of a problem is then likely to depend on the Levenshtein “edit” distance (36) (i.e., the number of additions, deletions, or substitutions to obtain the goal string of cars from the starting string of cars). This metric predicts the difficulty of certain deductive tasks (37).
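For completeness, the edit-distance metric is the standard dynamic-programming computation; the sketch below (ours) applies it to strings of car labels.

```python
def edit_distance(source, target):
    """Levenshtein distance: the minimum number of insertions, deletions, or
    substitutions needed to turn one string of car labels into another."""
    m, n = len(source), len(target)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[m][n]

print(edit_distance("ABCDEF", "FEDCBA"))  # 6, as in Table 1's reversal row
```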
Abductions of Algorithms.
Consider the task of formulating an algorithm for reversing a train of any length (i.e., given an input of a train of some arbitrary length, ABC…XYZ, the algorithm should yield ZYX…CBA). A train with a small number of cars can be reversed in a small number of moves and with no loops. However, the example calls for reversing trains of any length, and so a correct solution is bound to call for a loop of operations. The model theory postulates that individuals can nevertheless carry out the task. The process is abductive because it depends on creating an explanation of how to get from the input to the output (31). A putative solution can be tested using deduction, but it is not discovered by deduction alone—no more than is the discovery of a mathematical proof. According to the model theory, the creation of an algorithm depends on three steps, which are each modeled in the mAbducer program.
The program’s first step is to simulate the solutions to two instances of the problem to avoid ambiguity. It makes the simulations using the process described above. Because each move concerns a set of one or more cars, which move together, the process parallels the piecemeal simulation of the workings of complex mechanisms (4).
The program’s second step is to recover the loop of moves, and any moves that have to be performed before or after the loop. The program finds the repeated sequences of at least two moves. However, what determines the number of iterations of the loop? Because the loop can be either a for-loop or a while-loop, there are two ways to proceed. One way is to solve a pair of simultaneous linear equations to obtain the values of a and b in n = a × length + b, where n is the number of iterations of a for-loop, and length is the number of cars in the train. For example, the simulated reversals of trains of four and five cars yield the equations 3 = 4a + b and 4 = 5a + b, and the solution is that a = 1 and b = −1. Hence, for a train of length 6, a for-loop can be constructed in which the number of iterations of the loop for a reversal, n, equals (1 × 6) − 1 = 5. Another way to ensure that a loop is carried out for the required number of iterations is to determine the conditions under which a while-loop halts. A simulation shows that for a reversal the while-loop halts as soon as the siding is empty. Other types of problems have different halting conditions, which can be used in the description of a while-loop.
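A sketch of the arithmetic (ours; the pairs of values come from the simulated reversals of four-car and five-car trains, as in the equations above):

```python
# (train length, number of loop iterations) from two simulated reversals
(len1, n1), (len2, n2) = (4, 3), (5, 4)

# Solve n = a * length + b for a and b (two linear equations in two unknowns)
a = (n2 - n1) / (len2 - len1)        # (4 - 3) / (5 - 4) = 1
b = n1 - a * len1                    # 3 - 1 * 4 = -1

iterations_for_six_cars = a * 6 + b  # (1 * 6) - 1 = 5 iterations of the for-loop
```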
Next, mAbducer determines any moves that precede or follow the loop. In the present example, the loop is preceded by a move, S3 or S4, in which the number of operands again depends on the length of the train or, in the simulation, is fixed by the point at which only one car remains on the left track. After the end of the loop of moves, a final R1 occurs. The loop in the present example is “static” in that the number of operands for the moves in the loop remains constant from one iteration to the next. In other rearrangement problems, including those that use two stacks for their solution, loops are “dynamic” [i.e., the number of cars in a move within a loop varies depending on the length of the train and on whether the loop is in its first iteration, its second iteration, and so on (see the faro shuffle in SI Text S1)].
The program’s third step is to convert the structure of the solution, including the loop, into a verbal description of the algorithm. It translates both for-loops and while-loops into explicit descriptions in the programming language Lisp (see SI Text S1 for the translations). It also translates while-loops into informal English.
The theory predicts that naive individuals use simulations to abduce algorithms, and so it should be easier for them to detect the halting conditions needed for while-loops than to solve the simultaneous equations needed for for-loops. They should, therefore, be biased to use while-loops. The prime source of difficulty in solving a problem is the number of moves and operands, whereas the prime source of difficulty in abducing an algorithm should be the complexity of the algorithm itself. We used Kolmogorov complexity as the relevant metric (38, 39), and we applied it to mAbducer’s while-loops because of their psychological plausibility. We used the numbers of characters in its algorithms in Common Lisp (SI Text S1), multiplied by the number of bits in a character [i.e., 7 for ASCII (American Standard Code for Information Interchange)]. The first three problems in Table 1 call for static loops, but the faro shuffle, which is the converse of the parity sort, calls for a dynamic loop. The faro shuffle of cards (also known as a “riffle”) has interesting mathematical properties relating to parallel computation and to the fast Fourier transform (40). The four algorithms, which we used in our experiments, increase in complexity and in computational power—two stacks are needed to solve faro shuffles. However, Kolmogorov complexity is a simple general metric that captures this increase, which is otherwise hard to quantify.
Table 1.
Examples of four types of rearrangements, the total number of moves for each example of six cars, their mean number of operands, their edit distance, and the Kolmogorov complexities of the Lisp functions containing while-loops for rearranging trains of any length
| Rearrangements of ABCDEF | No. of moves | Mean no. of operands | Edit distance | Kolmogorov complexity (bits) |
| --- | --- | --- | --- | --- |
| Reversal yields: FEDCBA | 12 | 1.3 | 6 | 1,288 |
| Palindrome yields: AFBECD | 6 | 1.6 | 4 | 1,295 |
| Parity sort yields: ACEBDF | 7 | 1.4 | 4 | 1,519 |
| Faro shuffle yields: ADBECF | 9 | 1.3 | 4 | 1,771 |
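The complexity metric in the right-hand column is simply program length in bits. A sketch of the calculation (ours; the placeholder string stands in for the actual Lisp source in SI Text S1):

```python
BITS_PER_ASCII_CHARACTER = 7

def kolmogorov_estimate(lisp_source):
    """Approximate Kolmogorov complexity as the length of the program in bits:
    the number of characters times 7 bits per ASCII character."""
    return len(lisp_source) * BITS_PER_ASCII_CHARACTER

# The reversal function's 1,288 bits in Table 1 correspond to 1288 / 7 = 184
# characters of Common Lisp source.
print(kolmogorov_estimate("x" * 184))  # 1288
```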
Deductions from Descriptions of Algorithms.
The final task that we investigated is to deduce the consequences of an algorithm. mAbducer carries out this procedure to check the algorithm that it has abduced. For a train of a new length, it simulates the consequences of the algorithm. An obvious sign of an erroneous algorithm is that it halts before solving the problem. This type of error has not occurred with mAbducer, and so it is capable of automatic programming (for other methods, see refs. 41 and 42). Suppose that naive individuals familiar with the railway environment have to deduce the consequences of the reversal algorithm for the train, ABCDEF. They should carry out this task by mentally simulating a sequence of operations. Of course, the task of imagining this sequence could be too difficult for most individuals without access to pencil and paper, and so one aim of our empirical research was to determine whether they could cope with it. The primary factor that should cause difficulty in such simulations, given that the algorithms call for comparable numbers of moves and operands, is the Kolmogorov complexity of the algorithms.
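To illustrate what such a deduction requires, the following sketch (ours) mechanically simulates the while-loop reversal described earlier on the train ABCDEF; participants had to carry out the corresponding simulation in their heads.

```python
def simulate_reversal(train):
    """Simulate the while-loop reversal: move every car except the leading one to
    the siding, then repeatedly send the car on the left track to the right track
    and retrieve the next car from the siding, until the siding is empty."""
    left, siding, right = train[:1], train[1:], ""
    while siding:                              # halting condition: the siding is empty
        right = left + right                   # R1: the car on the left track goes right
        left, siding = siding[:1], siding[1:]  # L1: retrieve the next car from the siding
    return left + right                        # a final R1 delivers the last car

print(simulate_reversal("ABCDEF"))  # 'FEDCBA'
```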
We have outlined the model theory, and its computer implementation, of how individuals solve rearrangement problems, how they use simulations to abduce algorithms to solve them, and how they use simulations of the algorithms to deduce their consequences. We now turn to empirical tests of the theory’s predictions that number of moves and operands should determine the difficulty of solving problems, whereas Kolmogorov complexity should determine the difficulty of the abductions and deductions.
Experiment 1: Problem Solving
The experiment examined the ability of 20 students to solve rearrangement problems—a prerequisite for the subsequent studies, because if individuals cannot solve these problems with reasonable efficiency, they can hardly devise algorithms for their solution. However, the experiment was also a test of the first component of mAbducer—its procedure for solving rearrangement problems. It uses a single algorithm to carry out a partial means–ends analysis to decide what move to make next, which may have one or more operands. The experiment allowed the participants to manipulate the trains (on a computer screen), and so they did not have to simulate the process of solution but could carry it out directly. The aim was to determine whether naive individuals could carry out the task, whether its difficulty depended on mAbducer’s numbers of moves and operands, and whether they tended to err in overlooking parsimonious moves. The problems were presented using a graphical interface on a computer and consisted of all 24 possible rearrangements of trains containing four cars.
The important result was that naive individuals were able to solve these problems with ease. They produced very few incorrect solutions. We omitted the two extreme problems from the statistical analysis so that they would not bias the results (i.e., the problem that required only one move to solution and the problem that had a total of 12 operands). The participants’ mean number of moves to solve a problem increased with mAbducer’s number of moves (Page’s trend test; L = 1,809.5; z = 8.47; P < 0.0001), and it also increased with mAbducer’s number of operands (Page’s trend test; L = 276; z = 5.69; P < 0.0001; see SI Text S2 for means and additional analyses). In other words, as the number of operands increased, so did the mean number of moves, independently of the number of moves in mAbducer’s solution. The latency results likewise corroborated both of these effects. There was a reliable tendency for the participants to make redundant moves. Every participant made at least one redundant move, and we replicated this tendency in a follow-up experiment designed to elicit such errors. The main reason for redundant moves was perseveration: when the participants moved a single car from the siding to the left track, they often overlooked the possibility of moving two cars together from the left track to the right track. The participants differed reliably in their ability to find parsimonious solutions (Friedman test; χ2 = 45.05; P < 0.001): the best participant made a mean of 5.63 moves per problem, whereas the worst made a mean of 7.54. After the end of the experiment proper, the participants had to think aloud as they solved two further problems, and their protocols corroborated the use of a partial means–ends analysis in which they focused on the successive parts of the goal rather than the goal as a whole.
Experiment 2: Abduction of Algorithms
The experiment examined the model theory of how naive individuals abduce informal algorithms that solve rearrangement problems. They should rely on mental simulations of solutions of the problems. The experiment accordingly tested three empirical predictions. First, algorithms to solve rearrangements of trains of eight cars should be easier to create than those for trains of any length. The former do not require loops of operations, and so they should be simpler to deal with than the latter. Second, the difficulty in formulating algorithms should depend on their Kolmogorov complexity, not on metrics such as edit distance or number of moves (Table 1). Third, if participants use mental simulation, then they should be biased in favor of while-loops rather than for-loops, because they can observe the condition on the track when a while-loop ends, whereas the abduction of a for-loop calls for mental arithmetic to solve simultaneous equations. The experiment examined the three categories of problems with static loops, namely, reversals, palindromes, and parity sorts, which call for loops with a constant number of operands in their instructions (SI Text S3). The 20 participants, who were not programmers, first solved five practice problems (different from those in the experiment) using the railway environment. The environment was then switched off, and they had to create algorithms for solving the three categories of problems either for trains of eight cars or for trains of any length. The problems of these two types were presented in separate blocks in two counterbalanced orders to make a total of six trials. The participants wrote their algorithms in informal language; a typical example of a participant’s correct algorithm for a reversal of trains of any length is as follows: “Move all cars to the right of A to the side. Then move A to the right. Shift B to left, then right. Shift C to left, then right...repeat until pattern is reached.” It is based on a while-loop (for other examples of informal algorithms, see SI Text S3). Because solutions were near ceiling for the eight-car trains (92% correct), Fig. 2 presents the percentages of correct algorithms and the times the participants took to produce them (whether correct or not) only for trains of any length. The results corroborated the three predictions of the model theory. First, it was easier to formulate algorithms for trains of eight cars (92% correct) than for trains of any length (52% correct; Wilcoxon test; z = 3.29; P < 0.001). Second, the three types of rearrangements yielded the predicted trend in accuracy [i.e., reversals (90% correct), palindromes (70% correct), and parity sorts (63% correct); Page’s trend test; L = 256.5; z = 2.60; P < 0.005]. Participants created accurate algorithms more often when they tackled eight-car trains in the first block than when they tackled trains of any length in the first block (82% vs. 65%; Mann–Whitney test; z = 1.70; P < 0.05). However, there was a three-way interaction (Mann–Whitney test; z = 1.94; P < 0.05) in that eight-car problems were close to ceiling regardless of block or type of problem, whereas algorithms for trains of any length were affected by both variables. Once again, the latencies showed the same pattern of results (SI Text S4). Third, analyses of the algorithms revealed that the participants used reliably more while-loops than for-loops. For trains of eight cars, 61% of correct algorithms embodied loops (38% while-loops and 23% for-loops).
For trains of any length, correct solutions were bound to use loops (82% while-loops and 18% for-loops). These data are based on the 18 participants who formulated at least one correct algorithm for trains of any length; 12 of them used more while-loops than for-loops and there were 3 ties (binomial test; P < 0.02). The bias toward while-loops was greater for trains of any length (Wilcoxon test; z = 2.4; P < 0.01). The use of while-loops had a reliable correlation with accuracy (r = 0.43; P < 0.005), whereas the use of for-loops tended toward a negative correlation with accuracy (r = −0.26; P = 0.09). Finally, the participants, who knew nothing about programming, differed overall in their ability to formulate correct algorithms (Friedman nonparametric analysis of variance; χ2 = 35.96; P = 0.01). The most accurate participant was correct on every problem, whereas the least accurate participant was correct for less than 20% of the problems.
Fig. 2.
The proportions of correct algorithms in experiment 2 for trains of any length, depending on the type of rearrangement and on whether the participants carried out the problems with trains of any length in the first block (A) or the second block (B).
Experiment 3: Deduction from Algorithms
The model theory postulates that when naive individuals deduce the consequences of carrying out an algorithm on a particular train, they rely on simulating the sequence of the algorithm’s operations. Hence, according to the theory, the difficulty of the task should depend, not on the number of moves to be carried out, but on the Kolmogorov complexity of the algorithm. The experiment tested this prediction using while-loops for all four types of problems in Table 1 (i.e., reversals, palindromes, parity sorts, and faro shuffles). Each of them, however, was described in exactly the same number of words. The participants, who were not programmers, first watched a movie that explained and illustrated the railway environment. They then had no access to this environment for the deduction task, and they were not allowed to write anything down. After two simple practice problems, they had to deduce the consequences of the descriptions of algorithms on a given train of six cars. They did the task twice for each of the four types of algorithms, once with trains labeled with letters and once with trains labeled with numbers. The descriptions of the algorithms were in Polish, the native language of the participants; they were not the minimal descriptions underlying the complexities in Table 1 but were rewritten to be as clear as possible and to contain the same number of words (SI Text S5).
The percentages of correct deductions for the 43 participants who produced at least one complete answer corroborated the model theory’s predictions. The participants were correct for 41% of reversals, 35% of palindromes, 32% of parity sorts, and 23% of faro shuffles (Page’s L test; z = 1.94; P < 0.03). The latencies of correct deductions also supported this trend for those participants who were correct on at least one deduction of each algorithm (i.e., 77 s for reversals, 130 s for palindromes, 106 s for parity sorts, and 151 s for faro problems). The means are slightly misleading, because the stochastic increase in latencies for individual participants corroborated the predicted trend in a highly reliable way (Page’s L; z = 3.55; P < 0.0005). Neither the number of moves in the simulations, nor the number of operands, nor the edit distance (Table 1) can explain the trends in accuracy and latency. The participants differed overall in their ability to make correct deductions (Friedman nonparametric analysis of variance; χ2 = 17.29; P < 0.001). The most accurate participants got all eight problems correct; the least accurate got none of them correct.
General Discussion
In reasoning, the mind is fallible about both logical and probabilistic conclusions (43–45), but it has a striking ability to make mental simulations. They can be static mental models or kinematic sequences of them in which the sequences represent temporal orders (11). The model theory that we outlined in this article, and its computer implementation in mAbducer, show how such simulations can underlie the abduction of algorithms and the deduction of their consequences—at least in the case of a seemingly simple environment of toy trains. In fact, unlike, say, syllogistic inferences (7), the number of rearrangement problems is unbounded, and some of them call for considerable computational power. Faro shuffles, as illustrated in Table 1, call for the use of two stacks, so that a car shifted from the siding to the left track has to be shifted back to the siding again. The computational power needed here—two stacks—exceeds the power embodied in a well-known conjecture about the syntax of natural languages (46).
Individuals readily solve problems in the railway domain when they manipulate the cars on the track. The difficulty of solving these problems, as experiment 1 showed, depends on mAbducer’s number of moves in a solution but also independently on the number of cars in these moves. Participants often overlooked parsimonious moves of more than one car at a time. In the experiment, they did not have to simulate the moves because they could use the track itself.
The ability to solve problems is a prerequisite for abducing algorithms for their solution. The mAbducer program depends on simulating solutions using schematic models that it updates kinematically. Given that a loop of operations has to be repeated, it formulates a while-loop from its observations of the halting condition in the simulations. The program can also describe a for-loop and determine the number of times that the loop should iterate from its solution of a pair of simultaneous equations. The task of abducing algorithms is difficult, and, at first, we doubted whether naive individuals would be able to perform it because previous studies of informal programming showed that they avoided the use of loops (25–27). However, without access to the railway environment, as experiment 2 showed, they were able to simulate loops of operations, to figure out what was going on in them, and to describe them in informal algorithms. The participants had the predicted bias toward while-loops rather than for-loops. Likewise, the difficulty of the four types of rearrangements depended, not on the numbers of moves or cars to be moved, but on the Kolmogorov complexity of the Lisp algorithms that mAbducer creates (Table 1).
Prudent programmers debug their code by deducing its consequences for specific inputs. This task also provided evidence for the role of simulation. With no access to the railway environment and without being allowed to write anything down, naive individuals in experiment 3 were able to infer the results of carrying out the four types of algorithms on trains containing six cars. As the theory predicts, the difficulty of making the deductions depended, not on numbers of moves or cars to be moved, but on the complexity of the algorithms, which varied from reversing the order of cars to the more complex faro shuffle (Table 1).
The evidence we have reported supports the theory of the simulation using kinematic mental models. It provides a unified account of the abduction of algorithms and the deductions of their consequences. As far as we know, no other theory of naive reasoning about algorithms exists. Probabilities hardly enter the process and so Bayesian theories of reasoning may be irrelevant (5). However, a theory could be developed from an axiomatization of the railway domain in logic (6). The difficulties for this approach are to frame a complete set of axioms in a way that captures both what changes and what does not change with each move (47), and to ensure that the resulting system makes the correct predictions about human performance.
As we mentioned in the Introduction, psychologists hold almost all possible views about mental representations, from the claim that they are not needed for intelligent behavior (16) to the competing views that they are either abstract strings of symbols (18) or rooted in sensory modalities (19). Our results seem impossible to explain without invoking mental representations, and, most plausibly, kinematic models with an iconic structure that corresponds to the railway environment. These models may be mapped into visual images or they may be as abstract as they are in mAbducer (4, 12). Individuals can reason from models without forming visual images from them, and evidence suggests that images impede reasoning (13). Of course, it does not follow that all reasoning depends on simulating the world: a person can learn to use formal rules of inference. Likewise, it does not follow that all mental representations are iconic models (48). The model theory itself relies on another type of representation to capture the meaning of an assertion, which it then uses to construct models (49).
Mathematicians, logicians, and computer programmers reason about the repeated loops of operations in algorithms. Previous studies have examined how novice programmers try to formulate such algorithms in a programming language (e.g., refs. 23–27). However, as computer scientists often complain, no valid test exists to predict the ability of naive individuals as computer programmers (50). The results show that individuals differ reliably in their ability to abduce informal algorithms and to deduce the consequences of these algorithms. It remains to be seen whether such tasks, which depend on mental simulation, are reliable predictors of ability in programming. However, the evidence corroborates the theory that naive individuals use mental simulations to create informal algorithms, even those containing loops of operations, and to infer their consequences.
Methods
Experiment 1.
Twenty undergraduate students at Princeton University served as participants (mean age of 19.7 y), and none had had any prior training in logic or computer science. Participants gave informed consent, and the study was approved by the Princeton University Institutional Review Panel for Human Subjects. The participants were tested individually and carried out the experiment on a personal computer using LispWorks Version 4.4. They interacted with the system using the mouse and the keyboard of the computer. They were shown a 3-min instructional video that guided them through the elements of the railway environment and that presented the instructions. The problems showed the initial state with the cars on the left track and the required goal state with the cars on the right track. The participants made moves using a mouse to control a graphical interface. The key instruction stated that they should try to solve each problem with as few moves as possible. They acted as their own controls and carried out all 24 problems, which were presented in a different random order to each of them.
Experiment 2.
Twenty participants from the same population as before were tested individually. The session began with five practice problems akin to those in experiment 1, which the participants had to solve by interacting with the railway system. These problems were unrelated to the experimental problems, and each of them used a train of six cars with a solution of eight moves. The experiment proper followed, and the participants’ task was to type out a procedure that would solve each problem, but they could not interact with the railway environment or write anything down. They carried out two blocks of trials, one with problems for trains of eight cars and one with problems for trains of any length (i.e., a total of six trials). The blocks were presented in a counterbalanced order to two groups of participants. The order of the three types of rearrangement was randomized for each participant within each block. For the problems with trains of any length, the participants were told that a car containing an ellipsis stood in for any number of cars that had the same pattern. They were free to use their own words in any way that they wanted. Two independent judges (one of the authors and a research assistant) scored the informal algorithms in terms of whether they were correct or incorrect and whether they contained a while-loop or a for-loop. The two judges agreed on the accuracy of 93% of the algorithms (111 out of 120 problems; Cohen’s κ = 0.82), and they agreed on the nature of the loops in 83% of the algorithms (99 out of 120 problems; Cohen’s κ = 0.73). A third independent judge resolved the discrepant evaluations in both cases.
Experiment 3.
Fifty-four undergraduate psychology students from Warsaw University of Social Sciences and Humanities took part in the experiment (mean age 21.6 y), and because logic is obligatory in most Polish universities, over half of them had taken at least one course in logic. Twenty-two participants were paid a small sum (equivalent to $2) for participating in the experiment, and the rest took part in exchange for course credit. This difference had no reliable effect on either of the dependent variables, and so we pooled the data from those two conditions. Each participant carried out two versions of the reversal, palindrome, parity, and faro problems. One version had cars labeled with letters, and one version had cars labeled with numbers. Each description of an informal algorithm started and ended with the same phrases, and each description contained 109 words in Polish (see SI Text S4 for the original descriptions and translations into English). The descriptions were presented in one of eight counterbalanced orders allocated at random to the participants. The experiment was presented on a computer screen and the students typed in their answers. They were instructed not to type their response until they knew the position of all six cars on the right track, and they were not allowed to write anything down.
Acknowledgments
We thank Ruth Byrne, Sam Glucksberg, Adele Goldberg, Geoffrey Goodwin, Louis Lee, David Lobina, Max Lotstein, Paula Rubio, and Carlos Santamaría for advice. This work was supported by a National Science Foundation Graduate Research fellowship (to S.S.K.), Polish Ministry of Science and Higher Education Grant 2836/01/E/560/S/2012 (to R.M.), by Italian Ministry of Education University and Research Grant 2010RP5RNM (to M.B.) to study problem solving and decision making, and by National Science Foundation Grant SES 0844851 (to P.N.J.-L.) to study deductive and probabilistic reasoning.
Footnotes
The authors declare no conflict of interest.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1316275110/-/DCSupplemental.
References
- 1. Shepard RN, Metzler J. Mental rotation of three-dimensional objects. Science. 1971;171(3972):701–703. doi: 10.1126/science.171.3972.701.
- 2. Johnson-Laird PN. Mental Models. Cambridge, UK: Cambridge Univ Press; 1983.
- 3. Bower GH, Morrow DG. Mental models in narrative comprehension. Science. 1990;247(4938):44–48. doi: 10.1126/science.2403694.
- 4. Hegarty M. Mechanical reasoning by mental simulation. Trends Cogn Sci. 2004;8(6):280–285. doi: 10.1016/j.tics.2004.04.001.
- 5. Oaksford M, Chater N. Bayesian Rationality: The Probabilistic Approach to Human Reasoning. New York: Oxford Univ Press; 2007.
- 6. Rips LJ. The Psychology of Proof. Cambridge, MA: MIT Press; 1994.
- 7. Khemlani S, Johnson-Laird PN. Theories of the syllogism: A meta-analysis. Psychol Bull. 2012;138(3):427–457. doi: 10.1037/a0026841.
- 8. Johnson-Laird PN. Mental models and human reasoning. Proc Natl Acad Sci USA. 2010;107(43):18243–18250. doi: 10.1073/pnas.1012933107.
- 9. Peirce CS. In: Collected Papers of Charles Sanders Peirce. Hartshorne C, Weiss P, Burks A, editors. Vol 4. Cambridge, MA: Harvard Univ Press; 1931–1958.
- 10. Johnson-Laird PN, Byrne RMJ. Deduction. Hillsdale, NJ: Erlbaum; 1991.
- 11. Schaeken WS, Johnson-Laird PN, d’Ydewalle G. Mental models and temporal reasoning. Cognition. 1996;60(3):205–234. doi: 10.1016/0010-0277(96)00708-1.
- 12. Hegarty M, Stieff M, Dixon BL. Cognitive change in mental models with experience in the domain of organic chemistry. J Cogn Psychol. 2013;25(2):220–228.
- 13. Knauff M, Fangmeier T, Ruff CC, Johnson-Laird PN. Reasoning, models, and images: Behavioral measures and cortical activity. J Cogn Neurosci. 2003;15(4):559–573. doi: 10.1162/089892903321662949.
- 14. Margolis E, Laurence S. The ontology of concepts—abstract objects or mental representations? Noûs. 2007;41(4):561–593.
- 15. Ramsey WM. Representation Reconsidered. Cambridge, MA: MIT Press; 2007.
- 16. Brooks R. Intelligence without representation. Artif Intell. 1991;47(1-3):139–160.
- 17. Thelen E, Smith LB. A Dynamic Systems Approach to the Development of Cognition and Action. Cambridge, MA: MIT Press; 1994.
- 18. Pylyshyn Z. Return of the mental image: Are there really pictures in the brain? Trends Cogn Sci. 2003;7(3):113–118. doi: 10.1016/s1364-6613(03)00003-2.
- 19. Barsalou LW. In: Embodied Grounding: Social, Cognitive, Affective, and Neuroscientific Approaches. Semin GR, Smith ER, editors. New York: Cambridge Univ Press; 2008. pp. 9–42.
- 20. Rogers H. Theory of Recursive Functions and Effective Computability. New York: McGraw–Hill; 1967.
- 21. Cherubini P, Johnson-Laird PN. Does everyone love everyone? The psychology of iterative reasoning. Think Reason. 2004;10(1):31–53.
- 22. Mazzocco K, Cherubini AM, Cherubini P. On the short horizon of spontaneous iterative reasoning in logical puzzles and games. Organ Behav Hum Decis Process. 2013;121(1):24–40.
- 23. Kurland DM, Pea RD. Children’s mental models of recursive LOGO programs. J Educ Comput Res. 1985;1(2):235–244.
- 24. Anderson JR, Pirolli P, Farrell R. In: The Nature of Expertise. Chi M, Glaser R, Farr M, editors. Hillsdale, NJ: Erlbaum; 1988. pp. 153–183.
- 25. Miller L. Programming by non-programmers. Int J Man Mach Stud. 1974;6(2):237–260.
- 26. Miller L. Natural language programming: Styles, strategies, and contrasts. IBM Syst J. 1981;20(2):184–215.
- 27. Pane JF, Ratanamahatana CA, Myers BA. Studying the language and structure in non-programmers’ solutions to programming problems. Int J Hum Comput Stud. 2001;54(2):237–264.
- 28. Simon HA. The functional equivalence of problem-solving skills. Cognit Psychol. 1975;7(2):268–288.
- 29. Simon HA, Reed SK. Modeling strategy shifts in a problem-solving task. Cognit Psychol. 1976;8(1):86–97.
- 30. Hopcroft JE, Ullman JD. Introduction to Automata Theory, Languages, and Computation. Reading, MA: Addison-Wesley; 1979.
- 31. Peirce CS. In: Philosophical Writings of Peirce. Buchler J, editor. New York: Dover; 1955.
- 32. Newell A. Unified Theories of Cognition. Cambridge, MA: Harvard Univ Press; 1990.
- 33. Lee NYL, Goodwin GP, Johnson-Laird PN. The psychological problem of Sudoku. Think Reason. 2008;14(4):342–364.
- 34. Halford GS, Wilson WH, Phillips S. Processing capacity defined by relational complexity: Implications for comparative, developmental, and cognitive psychology. Behav Brain Sci. 1998;21(6):803–831, discussion 831–864. doi: 10.1017/s0140525x98001769.
- 35. Anderson JR, Betts S, Ferris JL, Fincham JM. Cognitive and metacognitive activity in mathematical problem solving: Prefrontal and parietal patterns. Cogn Affect Behav Neurosci. 2011;11(1):52–67. doi: 10.3758/s13415-010-0011-0.
- 36. Levenshtein V. Binary codes capable of correcting deletions, insertions, and reversals. Sov Phys Dokl. 1966;10(8):707–710.
- 37. Ragni M, Khemlani S, Johnson-Laird PN. The evaluation of the consistency of quantified assertions. Mem Cognit. 2013; in press. doi: 10.3758/s13421-013-0349-y.
- 38. Li M, Vitányi P. An Introduction to Kolmogorov Complexity and Its Applications. 2nd Ed. New York: Springer; 1997.
- 39. Chater N, Vitányi P. Simplicity: A unifying principle in cognitive science? Trends Cogn Sci. 2003;7(1):19–22. doi: 10.1016/s1364-6613(02)00005-0.
- 40. Diaconis P, Graham RL, Kantor WM. The mathematics of perfect shuffles. Adv Appl Math. 1983;4:175–196.
- 41. Koza JR. Genetic Programming II: Automatic Discovery of Reusable Programs. Cambridge, MA: MIT Press; 1994.
- 42. Flener P, Yilmaz S. Inductive synthesis of recursive logic programs: Achievements and prospects. J Log Program. 1999;41(2-3):141–195.
- 43. Johnson-Laird PN. How We Reason. New York: Oxford Univ Press; 2006.
- 44. Nickerson RS. Aspects of Rationality. New York: Psychology Press; 2008.
- 45. Khemlani SS, Lotstein M, Johnson-Laird PN. The probabilities of unique events. PLoS ONE. 2012;7(10):e45975. doi: 10.1371/journal.pone.0045975.
- 46. Gazdar G. On syntactic categories. Philos Trans R Soc Lond B Biol Sci. 1981;295(1077):267–283.
- 47. McCarthy J. Applications of circumscription to formalizing common-sense knowledge. Artif Intell. 1986;28(1):89–116.
- 48. Khemlani S, Orenes I, Johnson-Laird PN. Negation. J Cogn Psychol. 2012;24(5):541–559.
- 49. Khemlani S, Johnson-Laird PN. The processes of inference. Argument & Computation. 2012;4(1):4–20.
- 50. Bornat R, Dehnadi S, Simon. Mental models, consistency and programming aptitude. In: Proceedings of the Tenth Australasian Computing Education Conference (ACE 2008), CRPIT, Vol 78. Simon, Hamilton M, editors. Wollongong, Australia: Australian Computer Society; 2008. pp. 53–61.