Mechanisms of Inferential Order Judgments in Humans (Homo sapiens) and Rhesus Monkeys (Macaca mulatta)

Dustin J Merritt; Herbert S Terrace

doi:10.1037/a0021572

Author manuscript; available in PMC: 2012 Jul 17.

Published in final edited form as: J Comp Psychol. 2011 May;125(2):227–238. doi: 10.1037/a0021572


PMCID: PMC3398861 NIHMSID: NIHMS245733 PMID: 21341909

Abstract

If A > B, and B > C, it follows logically that A > C. The process of reaching that conclusion is called transitive inference (TI). Several mechanisms have been offered to explain transitive performance. Scanning models claim that the list is scanned from the ends inward until a match is found. Positional discrimination models claim that positional uncertainty accounts for accuracy and reaction time patterns. In Experiment 1, we trained rhesus monkeys and humans on adjacent pairs (e.g., AB, BC, CD, DE, EF) and tested them with previously untrained nonadjacent pairs (e.g., BD). In Experiment 2, we trained a second list and tested with nonadjacent pairs selected between lists (e.g., B from List 1, D from List 2). In Experiment 3, we introduced associative competition between adjacent items by training two items per position (e.g., B₁C₁, B₂C₂) before testing with untrained nonadjacent items. In all three experiments, humans and monkeys showed distance effects in which accuracy increased, and reaction time decreased, as the distance between the items in each pair increased (e.g., BD vs. BE). In Experiment 4, we trained adjacent pairs with separate 9-item and 5-item lists. We then tested with nonadjacent pairs selected between lists to determine whether list items were chosen according to their absolute position (e.g., D, 5-item list > E, 9-item list) or their relative position (e.g., D, 5-item list < E, 9-item list). Both monkeys’ and humans’ choices were most consistent with a relative positional organization.

Keywords: distance effects, transitive inference, absolute position, relative position, serial order

According to the principle of transitivity, given the premises that A is greater than B and that B is greater than C, it logically follows that A must also be greater than C. Piaget argued that the ability to make transitive inferences (TI) is a defining feature of concrete operational thinking in young children (Piaget, 1954; Piaget, Inhelder & Szeminska, 1960). This ability is useful because it allows hierarchical relationships to be organized using minimal information. TI may also be useful to animals, particularly those whose social structure is organized by a linear dominance hierarchy. Because dominance relationships are often established through aggressive physical encounters, the chances of injury or death may be reduced if an animal can learn its rank with as few encounters as possible. Inferring dominance relationships by watching dyadic interactions would eliminate the need to physically engage each animal in the group (Bond, Kamil, & Balda, 2003; Grosenick, Clement, & Fernald, 2007).

To investigate transitivity in animals, McGonigle and Chalmers (1977) developed a nonverbal testing technique in which pairs of adjacent, arbitrarily selected list items (e.g., AB, BC, CD, DE) were presented to squirrel monkeys (see also Bryant & Trabasso, 1971). Selection of the "correct" item from a given pair produced a reward; selection of the incorrect item, a time-out (TO). Once a criterion level of performance was reached, a previously untrained, nonadjacent pair was presented (e.g., BD). If the animal could extrapolate an overall list order from the relationships between adjacent pairs, then accuracy on nonadjacent pairs (e.g., BD, BE, CE) should be greater than that predicted by chance. McGonigle and Chalmers (1977) showed that this was the case for squirrel monkeys. Additional evidence for TI has been found in many other species as well [Bond, Kamil, & Balda, 2003 (pinyon jays); D’Amato & Colombo, 1990 (Cebus monkeys); Davis, 1992 (rats); Gillan, 1981 (chimpanzees); Grosenick, Clement, & Fernald, 2007 (fish); Lazareva et al., 2004 (crows); MacLean, Merritt, & Brannon, 2008 (lemurs); Treichler & van Tilburg, 1996 (rhesus monkeys); von Fersen et al., 1991 (pigeons)].

These studies are significant because they extend the scope of animal cognition in a new direction. They also pose the challenge of identifying a nonverbal mechanism that can account for the two behavioral signatures commonly observed during training and testing of TI. One is the serial position effect (SPE) that occurs during adjacent-pair training; the other is the distance effect that is observed during testing with nonadjacent pairs (D’Amato & Colombo, 1990). The SPE refers to the fact that, during training, accuracy is greatest for pairs that contain an end item (e.g., AB and EF from a 6-item list) and lowest for pairs that contain only middle items (e.g., CD). The distance effect refers to the fact that, during testing, accuracy increases and reaction time (RT) decreases as the distance between the items of a pair increases (e.g., BC, BD, BE).

The SPE and distance effects have been cited as evidence that, like humans, monkeys integrate list items into a linear representation (D’Amato & Colombo, 1990; Treichler & van Tilburg, 1996; Treichler, Raghanti, & van Tilburg, 2003). Others have argued that elementary conditioning processes and reinforcement history can account for the SPE and distance effects (Couvillon & Bitterman, 1992; Delius & Siemann, 1998; Frank, Rudy, & O’Reilly, 2003; Frank et al., 2005; Wynne et al., 1992).

Reinforcement-based theories can indeed account for some of the qualitative response patterns observed during both training and testing. During training, selection of each non-end item produces reinforcement half of the time (e.g., C in a CD pair) and does not produce reinforcement the other half of the time (e.g., C in a BC pair). A challenge for reinforcement-based models is explaining transitive performance for these middle items, which are reinforced equally often. One of the most frequently cited models is based on Value Transfer Theory (VTT), which asserts that positive value is transferred from the reinforced item (e.g., A) to the nonreinforced item (e.g., B) in any given pairing (von Fersen et al., 1991). Thus, even though the middle items are all reinforced equally, the transferred positive values create the ordering B > C > D. Accordingly, B is chosen over D in a transitive test. Other models can predict TI performance, but usually only under very specific training conditions (Couvillon & Bitterman, 1992; Lazareva et al., 2004; Wynne, 1995).
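The value-transfer idea can be made concrete with a toy calculation. The sketch below is not the published VTT model; it assumes, hypothetically, that an item's value equals the proportion of its training pairings in which it is reinforced, plus a fraction `k` of the value of the S+ item it is paired with when it is the S− item. Under that assumption the equally reinforced middle items still come out ordered B > C > D > E.

```python
# Toy value-transfer calculation (hypothetical parameterization, not the published VTT equations).
LIST = "ABCDEF"
# Adjacent training pairs written as (S+, S-): A+B-, B+C-, C+D-, D+E-, E+F-
PAIRS = [(LIST[i], LIST[i + 1]) for i in range(len(LIST) - 1)]


def transferred_values(k: float = 0.3) -> dict:
    """Value = share of pairings in which the item is reinforced
    + k * mean value of the S+ items it was paired with as the S- item."""
    values = {}
    for item in LIST:  # list order ensures each S+ partner is valued before its S- partner
        as_plus = [pair for pair in PAIRS if pair[0] == item]
        as_minus = [pair for pair in PAIRS if pair[1] == item]
        direct = len(as_plus) / (len(as_plus) + len(as_minus))
        partners = [values[plus] for plus, _ in as_minus]
        transfer = sum(partners) / len(partners) if partners else 0.0
        values[item] = direct + k * transfer
    return values


if __name__ == "__main__":
    for item, value in transferred_values().items():
        print(item, round(value, 3))
```

In this toy version the ordering emerges only for 0 < k < 0.5; at k = 0.5 all middle items receive the same value.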

Because conditioning models are based on reward history, there are some data for which any conditioning model would seem inadequate. For example, Paz-y-Mino et al. (2004) showed that pinyon jays were able to infer their rank within a social dominance hierarchy simply by watching known conspecifics interact with novel birds. Conditioning models have difficulty explaining this result because the pinyon jays had no history of reinforcement with the birds in question. In another study, Treichler and van Tilburg (1996) trained monkeys on two separate 5-item lists (A+B−, B+C−, C+D−, D+E−, and F+G−, G+H−, H+I−, I+J−) and later linked these lists by training the single pair E+F−. Accurate performance on test pairs composed of items from both lists demonstrated that the monkeys connected the two lists to form a single 10-item list, rather than two separate lists organized by their within-list associative values. The linking of two lists via training on a single pair presents difficulties for any model that bases its predictions on differences in the associative strength of list items. These experiments, and others in which monkeys learned various serial tasks (Chen, Swartz, & Terrace, 1997; Terrace, 1991, 1993; Terrace, Son, & Brannon, 2003), suggest that monkeys may solve transitivity problems by forming a linear representation of the list items.
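The list-linking result can be illustrated with a purely logical sketch: compute the transitive closure of the trained premise pairs. This is an idealization of what a linear representation would license, not a model of how the monkeys accomplished it; the letter labels follow the example in the text, and the function names are hypothetical. Before the single E+F− link is added, no cross-list relation (such as B over H) follows from the premises; after it is added, all 10 items can be ranked.

```python
from itertools import product


def adjacent_premises(items: str) -> set:
    """Premise pairs (winner, loser) for adjacent items, e.g. A+B- -> ("A", "B")."""
    return {(items[i], items[i + 1]) for i in range(len(items) - 1)}


def transitive_closure(premises: set) -> set:
    """Warshall-style closure: add (i, j) whenever (i, k) and (k, j) are known."""
    items = sorted({x for pair in premises for x in pair})
    relation = set(premises)
    for k in items:  # k is the intermediate item
        for i, j in product(items, items):
            if (i, k) in relation and (k, j) in relation:
                relation.add((i, j))
    return relation


def rank(relation: set) -> str:
    """Order items by how many other items each one is inferred to beat."""
    items = {x for pair in relation for x in pair}
    wins = {x: sum(1 for a, _ in relation if a == x) for x in items}
    return "".join(sorted(items, key=lambda x: -wins[x]))


separate = adjacent_premises("ABCDE") | adjacent_premises("FGHIJ")
linked = separate | {("E", "F")}
print(("B", "H") in transitive_closure(separate))  # False: lists not yet connected
print(rank(transitive_closure(linked)))            # ABCDEFGHIJ
```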

How can serial position and distance effects be explained if monkeys use a linear representation to make transitive choices? For humans, it has been proposed that transitive choice might result from an ends-inward serial scan. Two scan processes are initiated, one from each end of the list, and each proceeds inward until one of them finds a match (Holyoak & Patterson, 1981; Jou, 1997; Merikle & Coltheart, 1972; Parkman, 1971; Trabasso, Riley, & Wilson, 1975; Trabasso & Riley, 1975; Woocher, Glass, & Holyoak, 1978). If the scan that starts at the beginning of the list is first to reach one of the two test items, a response is made to that item. Conversely, if the scan that starts at the end of the list reaches a test item first, a response is made to the other item by default. This process can explain both the SPE and the distance effect, because pairs whose items are separated by a larger distance will, on average, include an item closer to one of the ends than pairs whose items are separated by smaller distances. If the time needed to scan through the list is the primary determinant of response latency, and if the probability of error increases with each scanned item, then accuracy should increase, and RT should decrease, as the distance between the items of a pair increases. By the same logic, an ends-inward scan can also account for the SPE. For example, on a 7-item list, fewer transitions are needed for pairs near one end of the list (such as BC or EF) than for pairs located in the middle (such as CD).
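The ends-inward scan can be sketched in a few lines. The version below assumes that both scans advance one item per step in lockstep and that the number of items examined stands in for RT; these are simplifying assumptions, and the function names are hypothetical. Because the forward scan always meets the earlier item of a pair first and the backward scan meets the later item first, the earlier item is chosen either way; only the step count varies.

```python
from itertools import combinations
from statistics import mean


def ends_inward_scan(pair, order):
    """Return (chosen item, items examined) for two scans run in lockstep from both ends."""
    i, j = sorted(order.index(x) for x in pair)  # i: earlier position, j: later position
    front_steps = i + 1           # forward scan meets the earlier item first
    back_steps = len(order) - j   # backward scan meets the later item first
    # Forward win: choose the item found. Backward win: choose the other item by default.
    # Either way the earlier item is chosen; only the step count differs.
    return order[i], min(front_steps, back_steps)


def mean_steps_by_distance(order):
    """Mean steps for internal (non-end) pairs, grouped by positional distance."""
    by_distance = {}
    for a, b in combinations(order[1:-1], 2):
        distance = order.index(b) - order.index(a)
        by_distance.setdefault(distance, []).append(ends_inward_scan((a, b), order)[1])
    return {d: mean(steps) for d, steps in sorted(by_distance.items())}


if __name__ == "__main__":
    print(mean_steps_by_distance("ABCDEF"))
    print([ends_inward_scan(p, "ABCDEFG")[1] for p in (("B", "C"), ("C", "D"), ("E", "F"))])  # [2, 3, 2]
```

For the internal pairs of a 6-item list, the mean step count is higher at distance 1 (BC, CD, DE) than at distances 2 and 3, and on a 7-item list CD requires one more step than BC or EF, matching the SPE example in the text.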

Positional discriminability models can also account for the SPE and the distance effect (Holyoak & Patterson, 1981). During training, the beginning and end items are typically learned first. Some models suggest that these items serve as positional reference points for the other items. As a result, the remaining items are given two-dimensional position codes (beginning and end) based on their proximity to the beginning and end items (Henson, 1998, 1999). These codes are imprecise, forming a normal distribution centered on each item's veridical position. Once a code is generated for each item, a comparison process computes the difference between the two position codes. This information-sampling process is iterative, repeating until the cumulative difference reaches a threshold, at which point one of the two items is selected (Holyoak & Patterson, 1981; Jou, 1997).
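The iterative sampling process can be simulated as a random walk. In this sketch, which is an assumption rather than a fitted model, each sample draws a noisy position code for each item from a normal distribution centered on its true position, the difference is added to a running total, and a choice is made when the total crosses ± `threshold`. The parameter values (`sigma`, `threshold`) and function names are hypothetical.

```python
import random


def samples_to_decision(pos_a, pos_b, sigma=1.5, threshold=5.0, rng=None, max_samples=10_000):
    """Return (chose_a, n_samples). Positive evidence favors item a being earlier."""
    rng = rng or random.Random()
    total = 0.0
    for n in range(1, max_samples + 1):
        code_a = rng.gauss(pos_a, sigma)  # noisy position code for item a
        code_b = rng.gauss(pos_b, sigma)  # noisy position code for item b
        total += code_b - code_a
        if abs(total) >= threshold:
            return total > 0, n
    return total > 0, max_samples


def simulate(distance, trials=2000, seed=1):
    """Item a is the earlier (correct) item; item b lies `distance` positions later."""
    rng = random.Random(seed)
    results = [samples_to_decision(0.0, float(distance), rng=rng) for _ in range(trials)]
    accuracy = sum(chose_a for chose_a, _ in results) / trials
    mean_samples = sum(n for _, n in results) / trials
    return accuracy, mean_samples


if __name__ == "__main__":
    for d in (1, 2, 3):
        acc, n = simulate(d)
        print(f"distance {d}: accuracy {acc:.3f}, mean samples {n:.2f}")
```

With these settings, larger distances should yield higher accuracy and fewer samples before a decision, i.e., a distance effect in both accuracy and (sample-based) RT.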

Because items that are close together (e.g., CD) have more positional overlap than items that are farther apart (e.g., BE), more information sampling is required before a decision can be made. Items that are farther apart will thus be discriminated more quickly and easily than items that are close together (i.e., the distance effect). Positional uncertainty can also explain the SPE. By mere proximity, the positional uncertainty distributions of middle items will overlap more with the distributions of other items than will those of beginning and end items. In addition, in many models the distributions of middle items are wider and flatter than those of beginning and end items (Bower, 1971; Henson, 1999; Murdock, 1960; Trabasso & Riley, 1975). Thus, one would expect more errors and longer RTs for middle items than for end items.
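The effect of wider middle-item distributions can be seen with a single-sample (rather than iterative) approximation: the probability that the earlier item's position code falls before the later item's is Φ(d / √(σᵢ² + σⱼ²)), where d is the positional separation and σᵢ, σⱼ are the two items' uncertainties. The rule that σ grows with distance from the nearer list end, and the values of `base` and `widen`, are illustrative assumptions.

```python
from math import erf, sqrt


def position_sigma(pos, n, base=0.4, widen=0.3):
    """Hypothetical uncertainty: grows with distance from the nearer list end."""
    return base + widen * min(pos, n - 1 - pos)


def p_correct(i, j, n):
    """P(earlier item's sampled code precedes the later item's code) for 0-based positions."""
    si, sj = position_sigma(i, n), position_sigma(j, n)
    z = abs(j - i) / sqrt(si ** 2 + sj ** 2)
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF


if __name__ == "__main__":
    n = 6
    print("adjacent pairs:", [round(p_correct(k, k + 1, n), 3) for k in range(n - 1)])
    print("B vs C, D, E:", [round(p_correct(1, j, n), 3) for j in (2, 3, 4)])
```

The predicted accuracy is U-shaped across the adjacent pairs (an SPE) and rises with separation from B (a distance effect).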

We conducted four experiments to investigate which of these underlying mechanisms might be responsible for transitive performance. During each experiment, humans and rhesus monkeys were trained with various types of adjacent pairs and then tested on nonadjacent pairs that were composed of the items used during the training phase. The purpose of Experiment 1 was to verify that our method for training transitive inference would produce the standard SPE and distance effects previously observed in other experiments. Each of the subsequent experiments was designed so that successes and failures were diagnostic of the manner in which list items were organized and represented in memory. Although previous experiments have demonstrated that humans and monkeys show similar response patterns, the present experiments go one step further by asking whether these patterns are also governed by similar mechanisms.

The purpose of Experiment 2 was to determine whether the human and non-human participants in Experiment 1 were able to make accurate ordinal judgments when pairs of items were selected from different lists (see Treichler, Raghanti, & van Tilburg, 2003). For example, given the presentation of pair B₁D₂, where B₁ was selected from List 1, and item D₂ was selected from List 2, would participants correctly order the items of the B₁D₂ test pair by selecting item B₁ (Figure 1A)? Failure to do so would suggest that accurate ordinal judgments were restricted to within-list pairs, and that the probable mechanism was an ends-inward associative scan that could only be performed on a single list at a time.

Figure 1. (A) During between-list testing, test pairs were composed of items drawn from different lists. The subscripts represent the list number. (B) There were two items per position, with each item having a 50% chance of appearing for any given position. For example, “A” was randomly paired with one of two “B” items (e.g., B₁ and B₂). (C) A five- and a nine-item list arranged spatially according to the between-list subjective similarities predicted by a relative versus an absolute positional representation. The subscripts represent the list length.

Notably absent from most scanning models is any mention of the connective linkage that allows the scan process to transition from one item to the next. Experiment 3 tested the hypothesis that participants used positional information when making order judgments. If transitions from one item to the next were mediated by associative linkages (Jou, 1997), associative competition between list items should make it very difficult for such linkages to operate (Figure 1B). Failure to make accurate judgments during testing would suggest that participants had not encoded the positions of the items and instead relied on associative links to guide an ends-inward scan.
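The competition manipulation can be illustrated by generating training pairs with two exemplars per position and listing which items can follow each exemplar. The exemplar labels (B1, B2, and so on), the 6-position list, and the assumption that the end positions also have two exemplars are hypothetical choices for this sketch. Every exemplar ends up with two possible successors, so no single associative chain links the list.

```python
import random

POSITIONS = "ABCDEF"


def two_per_position(positions=POSITIONS):
    """Two hypothetical exemplars per ordinal position, e.g. B -> ["B1", "B2"]."""
    return {p: [f"{p}1", f"{p}2"] for p in positions}


def training_pair(exemplars, rng):
    """Pick an adjacent position pair, then one exemplar at random from each position."""
    k = rng.randrange(len(POSITIONS) - 1)
    earlier, later = POSITIONS[k], POSITIONS[k + 1]
    return rng.choice(exemplars[earlier]), rng.choice(exemplars[later])  # (S+, S-)


def successor_map(exemplars):
    """All exemplars that can appear as the S- partner of each S+ exemplar."""
    succ = {}
    for k in range(len(POSITIONS) - 1):
        for item in exemplars[POSITIONS[k]]:
            succ[item] = set(exemplars[POSITIONS[k + 1]])
    return succ


if __name__ == "__main__":
    exemplars = two_per_position()
    rng = random.Random(0)
    print([training_pair(exemplars, rng) for _ in range(5)])
    print(sorted(successor_map(exemplars)["B1"]))  # ['C1', 'C2']
```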

Although successful test performance in Experiment 3 would suggest that associative scanning was not used, it would still leave open the possibility that scanning by ordinal position occurred. Under this scenario, each list item is placed into an ordinal slot or bin whose contents can later be scanned (see Conrad, 1965; Henson, 1998; Orlov et al., 2006). This is functionally very similar to an associative scanning process, but it differs with respect to the linkage that allows scanning to proceed from one item to the next. Rather than relying on associative linkages, the slots themselves have a fixed order that dictates how scanning proceeds. This results in a very different type of positional representation from that described by the positional discriminability model. Because the memory slots are fixed, the positional representation will be ordinal, such as "first", "second", and "third" (Orlov et al., 2000; Orlov et al., 2006). In contrast, if the beginning and end items serve as salient anchor points by which other list items are organized (e.g., Henson, 1999), then items should be represented by their relative positions ("beginning", "middle", and "end").

In Experiment 4, we asked whether item positions were represented by their absolute or their relative values. A simple way to contrast the absolute and relative models is to map two lists of different lengths, say a 5-item and a 9-item list, onto each other. Absolute and relative representations give rise to different judgments of subjective similarity. If judgments are made by scanning ordinal memory slots, item C from the 5-item list (A, B, C, D, E) should be subjectively most similar to item C from the 9-item list (A, B, C, D, E, F, G, H, I). However, if items are organized relative to the beginning and end of the list (as assumed by the positional discriminability model), then subjective similarity should be greatest when item C from the 5-item list is compared with item E from the 9-item list, because both occupy the middle position within their respective lists (cf. Figure 1C).
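The two coding schemes can be compared directly. In this sketch, an absolute code is the item's ordinal index, and a relative code rescales the index to run from 0 (beginning) to 1 (end); these particular formulas and the function names are illustrative assumptions. The predicted choice is taken to be the item with the earlier code, as in the abstract's notation.

```python
LETTERS = "ABCDEFGHI"


def absolute_code(index: int, length: int) -> float:
    return float(index)  # ordinal slot: first, second, third, ...


def relative_code(index: int, length: int) -> float:
    return index / (length - 1)  # 0 = beginning, 0.5 = middle, 1 = end


def most_similar(index, short_len, long_len, code):
    """Index in the long list whose code is closest to the short-list item's code."""
    target = code(index, short_len)
    return min(range(long_len), key=lambda j: abs(code(j, long_len) - target))


def predicted_choice(short_item, long_item, code, short_len=5, long_len=9):
    """Return (item, list length) of the item coded as earlier, or None for a tie."""
    a = code(LETTERS.index(short_item), short_len)
    b = code(LETTERS.index(long_item), long_len)
    if a == b:
        return None
    return (short_item, short_len) if a < b else (long_item, long_len)


if __name__ == "__main__":
    print(LETTERS[most_similar(2, 5, 9, absolute_code)])  # C
    print(LETTERS[most_similar(2, 5, 9, relative_code)])  # E
    print(predicted_choice("D", "E", absolute_code))      # ('D', 5)
    print(predicted_choice("D", "E", relative_code))      # ('E', 9)
```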

Experiment 1

Within list training and within list testing

The purpose of Experiment 1 was to verify that our training methods would produce the SPE and distance effects observed in previous experiments (e.g. Moyer & Landauer, 1967; Trabasso & Riley, 1975). We also wanted to compare RT and accuracy functions obtained from humans and monkeys to determine the extent to which they were qualitatively similar.

Subjects

Three male rhesus monkeys (Macaca mulatta) and 15 adult humans participated in this experiment¹. One monkey (Benedict) was 7 years old, and had approximately 5 years of previous cognitive testing before these experiments began. The other two monkeys (Ebbinghaus and Lashley) were both 5 years old with approximately 3 years of previous cognitive testing with numerical stimuli (Brannon & Terrace, 1998). The 15 undergraduate students from Columbia University were paid to participate in all 4 experiments conducted in this study. As with monkeys, a within-group design was used to train and test human participants.

Apparatus

Humans were tested with a Macintosh G4 computer that was connected to a 15-inch computer monitor. A 15-inch Magictouch touch screen was affixed to the computer monitor in order to record selections made by the participant. The monkeys were transferred from their home cages to the experimental apparatus prior to each testing session. Training and testing took place in a chamber (23" wide × 27" long × 28.5" deep) that contained a 15" touch sensitive video monitor and a reinforcement hopper. The test chamber was completely enclosed in a sound-attenuated booth. All software was written in the RealBasic programming language.

Procedure

The stimuli were digitized photographs of natural objects and artificial structures (e.g. animals, people, scenery, flowers, cars, bridges etc.). Each photograph measured 1.5" × 1.5" in size, and was arbitrarily selected from a library of approximately 2500 photographs. The screen was divided into a 4×4 matrix that provided 16 locations for presenting photographs. However, in the present experiment, only 3 locations were used; specifically, the 3 locations in the top row of the matrix, starting from the left edge of the matrix. The positions of the photographs used during each trial were chosen at random. All of the stimuli were presented on a blue background. The start-stimulus was a 2" × 2" white square that was presented in the center of the screen.

At the onset of each trial, the start-stimulus was presented. A response to the start-stimulus caused it to disappear. After a 0.5-sec delay, monkeys and humans were presented with a pair of adjacent items randomly selected from a 6-item list (A+B−, B+C−, C+D−, D+E−, and E+F−). Selection of the correct item (+) caused the screen to turn red and a reward to be delivered (the word “CORRECT!” for humans, and a banana-flavored pellet for the monkeys). Selection of the incorrect item (−) produced a 3-sec timeout (TO) during which the screen was black. For humans, the word “INCORRECT!” was flashed across the screen. Differential auditory feedback was provided both for correct responses and errors. Correct responses produced a “positive” sound; incorrect responses, a “negative” sound. If a selection was not made before a 5-sec time limit expired, the trial ended with a TO.
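The trial contingencies described above can be summarized as a small scoring function. It is a sketch, not the original RealBasic software; the `Outcome` type and its field names are hypothetical, and it assumes that a missed response produces the same 3-sec TO as an error.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

TIMEOUT_S = 3.0         # black-screen timeout after an error
RESPONSE_LIMIT_S = 5.0  # trial ends with a TO if no choice is made within this time


@dataclass
class Outcome:
    rewarded: bool          # for monkeys, True stands for pellet delivery
    timeout_s: float
    message: Optional[str]  # text shown to human participants, if any


def score_trial(choice: Optional[str], pair: Tuple[str, str], latency_s: float) -> Outcome:
    """pair is (S+, S-); choice is None when no response was made."""
    if choice is None or latency_s >= RESPONSE_LIMIT_S:
        return Outcome(rewarded=False, timeout_s=TIMEOUT_S, message=None)
    if choice == pair[0]:
        return Outcome(rewarded=True, timeout_s=0.0, message="CORRECT!")
    return Outcome(rewarded=False, timeout_s=TIMEOUT_S, message="INCORRECT!")


if __name__ == "__main__":
    print(score_trial("B", ("B", "C"), latency_s=1.4))
    print(score_trial("C", ("B", "C"), latency_s=1.4))
```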

The monkeys were trained daily, 100 trials per session. Correct and incorrect choices had to be learned by trial and error. The testing phase began once a monkey reached a criterion of 80% correct on each adjacent pair for two consecutive sessions. During testing, nonadjacent pairs were presented for the first time, randomly intermixed with adjacent pairs.

Human participants received a single 300-trial training session. Participants were told that some pictures were “better” than others and that they would learn these relationships through trial and error. No mention was made of the inherent order of the items. Accuracy was analyzed in blocks of 50 trials. To be eligible for testing, participants were required to reach a criterion of 80% correct on all training pairs by the fifth block. For humans, training accuracy was assessed during the criterion block and the block that immediately preceded it. For monkeys, the last 5 sessions (including the criterion session) were assessed.
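The two acquisition criteria can be written as simple checks. These are sketches under stated assumptions: session accuracies are supplied per adjacent pair, and "by the fifth block" is read as any one of the first five 50-trial blocks in which every training pair reached 80% correct. The function names and data layouts are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def monkey_criterion(sessions: List[Dict[str, float]], level: float = 0.80,
                     run: int = 2) -> Optional[int]:
    """Index of the session completing `run` consecutive sessions with every pair >= level."""
    streak = 0
    for idx, pair_accuracy in enumerate(sessions):
        streak = streak + 1 if all(acc >= level for acc in pair_accuracy.values()) else 0
        if streak >= run:
            return idx
    return None


def human_criterion(trials: Sequence[Tuple[str, bool]], n_pairs: int = 5, block_size: int = 50,
                    level: float = 0.80, max_block: int = 5) -> Optional[int]:
    """1-based block in which all n_pairs training pairs were >= level correct, else None."""
    for b in range(max_block):
        per_pair: Dict[str, List[bool]] = {}
        for pair, correct in trials[b * block_size:(b + 1) * block_size]:
            per_pair.setdefault(pair, []).append(correct)
        if len(per_pair) == n_pairs and all(sum(v) / len(v) >= level for v in per_pair.values()):
            return b + 1
    return None
```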

Pair Testing

The monkeys were given six 105-trial test sessions in which all possible pairs were drawn from the 6-item list (AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF, EF). The pairs were presented in random order, with the constraint that every pair was presented an equal number of times (7) during each 105-trial session. Choices were non-differentially reinforced (i.e., both correct and incorrect choices were rewarded), and no time-outs were given. Because the non-differential contingency created the possibility that monkeys could learn to select any item to obtain reward, test and training sessions alternated (i.e., test sessions and training sessions were given on alternate days). Testing of the human participants was identical to that of the monkeys, except that human participants were given only a single 105-trial test session.
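A test session with the stated constraint (every possible pair presented equally often within 105 trials) can be generated as follows. This is a sketch; the function name is hypothetical, and it does not model the random screen locations of the photographs. Because reinforcement was non-differential, every choice in such a session would be rewarded.

```python
import random
from collections import Counter
from itertools import combinations


def test_session(items="ABCDEF", trials=105, seed=None):
    """Shuffled list of all item pairs, each presented trials / n_pairs times."""
    pairs = list(combinations(items, 2))  # 15 pairs for a 6-item list
    if trials % len(pairs):
        raise ValueError("trials must be a multiple of the number of pairs")
    session = pairs * (trials // len(pairs))  # 105 / 15 = 7 presentations per pair
    random.Random(seed).shuffle(session)
    return session


if __name__ == "__main__":
    session = test_session(seed=0)
    counts = Counter(session)
    print(len(session), len(counts), set(counts.values()))  # 105 15 {7}
```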

Results and Discussion

Training

Serial position effects were obtained from both humans and monkeys. As shown in Figure 2, an SPE was obtained from both Benedict and Ebbinghaus: accuracy for middle pairs (e.g., CD) was lower than accuracy for end pairs (e.g., AB, EF). Although both humans and monkeys showed a U-shaped accuracy function, the serial position effect for the human participants was much smaller than that for the monkeys. One reason may be that the human participants had much more experience processing symbolic sequential information than the monkeys, and may therefore have developed strategies to improve performance on the difficult middle pairs.

Pair Testing

Because responses to the first item (A) were always rewarded, and responses to the last item (F) were never rewarded, reinforcement alone could explain differences in accuracy and reaction time for pairs containing those items. This could also produce distance effects, because pairs that contain an end item are separated, on average, by larger distances than pairs composed only of non-end items. To eliminate this confound, only internal (non-end) pairs were used in the reaction time and accuracy analyses.
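The size of this confound is easy to check for a 6-item list. The sketch below (function name hypothetical) splits all 15 pairs into those that include an end item and those composed only of internal items, and returns their positional distances.

```python
from itertools import combinations
from statistics import mean


def split_by_end_items(items="ABCDEF"):
    """Return (distances of pairs with an end item, distances of internal pairs)."""
    ends = {items[0], items[-1]}
    with_end, internal = [], []
    for a, b in combinations(items, 2):
        distance = items.index(b) - items.index(a)
        (with_end if {a, b} & ends else internal).append(distance)
    return with_end, internal


if __name__ == "__main__":
    with_end, internal = split_by_end_items()
    print(len(with_end), round(mean(with_end), 2))  # 9 2.78
    print(len(internal), round(mean(internal), 2))  # 6 1.67
```

For a list A through F, the nine pairs with an end item average 25/9 ≈ 2.8 positions apart, whereas the six internal pairs average 10/6 ≈ 1.7, and distances of 4 and 5 occur only in pairs that contain an end item.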

The same statistical analyses were used in all four experiments. For monkeys, linear regressions were conducted on the median session RTs for internal test pairs, and one-way analyses of variance (ANOVA) were conducted on the mean session accuracies for internal test pairs. For humans, within-subject ANOVAs were conducted on accuracy and median RT for internal test pairs.

Accuracy

As shown in Figure 3, a distance effect was obtained for both humans and monkeys: accuracy increased as the distance between items increased. However, for humans, the accuracy distance effect was only marginally significant [Benedict, F(2,27) = 3.57, p < .05, η² = 0.21; Ebbinghaus, F(2,27) = 6.00, p < .05, η² = 0.21; Humans, F(2,26) = 3.12, p = .06, η² = 0.12]. As in other studies of transitivity (e.g., D’Amato & Colombo, 1990), accuracy was higher on the novel pairs, which received non-differential feedback, than on the adjacent pairs on which participants had received extensive training.

Reaction Time

As shown in Figure 4, an RT-based distance effect was also obtained from humans and from one monkey. RT decreased as the distance between the items of the test pairs increased [Ebbinghaus, F(1,28) = 6.82, p < .05, R² = 0.20; Humans, F(2,26) = 4.2, p < .05, η² = 0.06]. Although Benedict showed a similar pattern of decreasing RTs with increasing distance between test items, the effect fell short of statistical significance [F(1,28) = 1.66, p = .21].
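For the monkeys' one-way ANOVAs and simple regressions, an effect size can be recovered from the F statistic and its degrees of freedom, since η² (or R² when df_effect = 1) equals F·df_effect / (F·df_effect + df_error). The sketch below applies this identity; the function name is hypothetical, and the result may not match the values reported for the humans' within-subject ANOVAs, where η² can be defined in several ways (e.g., partial or generalized).

```python
def variance_explained(f: float, df_effect: int, df_error: int) -> float:
    """eta^2 for a one-way between-groups ANOVA, or R^2 for a simple regression (df_effect = 1)."""
    return (f * df_effect) / (f * df_effect + df_error)


if __name__ == "__main__":
    print(round(variance_explained(3.57, 2, 27), 2))  # Benedict accuracy: 0.21
    print(round(variance_explained(6.82, 1, 28), 2))  # Ebbinghaus RT: 0.2
```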

Because non-differentially reinforced test sessions were interspersed with regular training sessions, repeated exposure to non-differential reinforcement may have affected performance over the course of testing. To examine this possibility, the first five test sessions (Session Block 1) and the last five test sessions (Session Block 2) were compared using a two-way ANOVA (Distance × Session Block) for accuracy and RT. Because Ebbinghaus’ accuracy was at ceiling for all pairs, his accuracy was not included in this analysis. We found no main effect of session block for either Benedict (Accuracy, F(1, 24) = 0.001, p = 0.97; RT, F(1, 24) = 1.05, p = 0.36) or Ebbinghaus (RT, F(1, 24) = 0.16, p = 0.69), and no interaction between session block and distance for Benedict (Accuracy, F(2, 24) = 0.41, p = 0.67; RT, F(2, 24) = 0.12, p = 0.88) or Ebbinghaus (RT, F(2, 24) = 0.20, p = 0.82).

The results of Experiment 1 achieved two goals. First, they provided baseline accuracy and RT functions that were used to assess participants’ performance in Experiments 2–4. Second, they provided evidence that qualitatively similar patterns of accuracy and RT could be obtained from humans and monkeys during both training and testing.

Experiment 2

Within list training and between list testing

Experiment 2 was designed to determine whether participants engaged in an associative, ends-inward serial scan when making judgments of relative order. The participants, both humans and monkeys, were the same as in Experiment 1. In Experiment 2, they were trained on a second list and were then tested with nonadjacent pairs (Figure 1A) in which one item came from List 1 and the other from List 2 (e.g., B₁D₂, C₂E₁); subscripts indicate list number. If a self-terminating search process were used to scan a single list at a time (e.g., Sternberg, 1969), then between-list comparisons should not be possible.
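The between-list test pairs can be enumerated as a sketch. It assumes two 6-item lists with the same letter labels, treats "nonadjacent" as a positional separation of at least two, and scores the item with the earlier position as correct; these are illustrative assumptions rather than the exact test set used, and the function name is hypothetical. A scan confined to one list at a time would have no way to compare, for example, B₁ with D₂.

```python
from itertools import product
from string import ascii_uppercase


def between_list_pairs(length=6, min_distance=2):
    """(List 1 item, List 2 item, correct item) for cross-list pairs at least min_distance apart."""
    letters = ascii_uppercase[:length]
    pairs = []
    for (i, a), (j, b) in product(enumerate(letters), repeat=2):
        if abs(i - j) >= min_distance:
            item_1, item_2 = f"{a}1", f"{b}2"  # suffix gives the list number
            correct = item_1 if i < j else item_2  # earlier position is correct
            pairs.append((item_1, item_2, correct))
    return pairs


if __name__ == "__main__":
    pairs = between_list_pairs()
    print(len(pairs))  # 20
    print(pairs[:3])
```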