Beyond Stimulus Cues and Reinforcement Signals: A New Approach to Animal Metacognition

Justin J Couchman; Mariana V C Coutinho; Michael J Beran; J David Smith

doi:10.1037/a0020129

Author manuscript; available in PMC: 2011 Nov 1.

Published in final edited form as: J Comp Psychol. 2010 Nov;124(4):356–368. doi: 10.1037/a0020129

PMCID: PMC2991470 NIHMSID: NIHMS205291 PMID: 20836592

Abstract

Some metacognition paradigms for nonhuman animals encourage the alternative explanation that animals avoid difficult trials based only on reinforcement history and stimulus aversion. To explore this possibility, we placed humans and monkeys in successive uncertainty-monitoring tasks that were qualitatively different, eliminating many associative cues that might support transfer across tasks. In addition, task transfer occurred under conditions of deferred and rearranged feedback—both species completed blocks of trials followed by summary feedback. This ensured that animals received no trial-by-trial reinforcement. Despite distancing performance from associative cues, humans and monkeys still made adaptive uncertainty responses by declining the most difficult trials. These findings suggest that monkeys’ uncertainty responses could represent a higher-level, decisional process of cognitive monitoring, though that process need not involve full self-awareness or consciousness. The dissociation of performance from reinforcement has theoretical implications concerning the status of reinforcement as the critical binding force in animal learning.

Keywords: metacognition, uncertainty monitoring, primate cognition, comparative psychology, monkeys

Humans can discern when they are certain or uncertain. They know when they know and do not know, and they can use this knowledge to avoid difficult situations or seek out additional information. Extensive research on uncertainty monitoring and metacognition has explored these phenomena (Benjamin et al., 1998; Brown et al., 1983; Flavell, 1979; Hart, 1965; Koriat, 1993, 2007; Koriat et al., 2006; Metcalfe, 2000; Metcalfe & Shimamura, 1994; Nelson, 1992; Schwartz, 1994; Serra & Dunlosky, 2005).

Researchers take humans’ metacognitive behaviors to indicate important mental capacities, including hierarchical layers of cognitive control (Nelson & Narens, 1990), self-awareness (Gallup, 1982), and declarative consciousness (Nelson, 1996). In fact, metacognition may be such a sophisticated cognitive capacity in humans that it is unique to humans (Metcalfe & Kober, 2005). Thus, it is an important empirical question with wide-ranging theoretical implications whether nonhuman animals have similar cognitive capacities and what these capacities might say about the nonhuman mind (Kornell, 2009; Smith, Shields, & Washburn, 2003; Terrace & Metcalfe, 2005).

Recent research has suggested that nonhuman animals (hereafter, animals) also have a capacity for metacognition or cognitive monitoring (Beran, Smith, Redford, & Washburn, 2006; Call & Carpenter, 2001; Foote & Crystal, 2007; Hampton, 2001; Inman & Shettleworth, 1999; Kornell, Son, & Terrace, 2007; Shields, Smith, & Washburn, 1997; Shields, Smith, Guttmannova, & Washburn, 2005; Smith, Beran, Redford, & Washburn, 2006; Smith, Schull, et al., 1995; Smith, Shields, Schull, & Washburn, 1997; Smith, Shields, Allendoerfer, & Washburn, 1998; Suda-King, 2008; Sutton & Shettleworth, 2008; Washburn, Smith, & Shields, 2006). In these studies, researchers used perception and memory tasks that presented a mix of easy and difficult trials. They gave animals primary discrimination responses (e.g., Sparse-Dense; Familiar-Unfamiliar) but also a secondary response that allowed them to decline any trials they chose. Animals with the capacity to monitor their cognitive states should recognize difficult trials as problematic and decline these trials selectively. Animals have produced data patterns in some uncertainty-monitoring tasks that are strikingly like those of humans (Shields et al., 1997; Smith et al., 1997; Smith et al., 1998). This secondary trial-decline response has come to be called the uncertainty response.

Comparative researchers naturally proceed cautiously in attributing metacognitive capacities to animals. Indeed, comparative psychology’s tradition of parsimony, as exemplified by Morgan’s Canon (1906, p. 53), demands a search for an explanation of the animal data patterns that relies on the lowest-level psychological capabilities possible. This is why the appropriate psychological interpretation of animals’ uncertainty responses has been a source of ongoing theoretical discussion.

As part of this discussion, Smith, Beran, Couchman, and Coutinho (2008) explored possible associative explanations for uncertainty responses. One persistent methodological concern is that researchers have rewarded animal participants directly for their uncertainty responses (e.g., Foote & Crystal, 2007; Inman & Shettleworth, 1999; Kornell et al., 2007; Hampton, 2001; Suda-King, 2008; Sutton & Shettleworth, 2008). This approach has the problem that it might give the uncertainty response a positive response strength independent of any metacognitive role it plays in a task. The response might be used because its reward properties are attractive, producing the observed data patterns absent any metacognitive assessment by the animal. If so, a first-order associative account of those data patterns would be more parsimonious than a metacognitive account. One purpose of the present research was to evaluate animals’ uncertainty responses when no direct food rewards were ever offered for those responses. Of course, eliminating the primary reinforcers attending the uncertainty response does not address all possible associative explanations of it. Accordingly, this article also carefully considers more indirect and secondary associative explanations.

In fact, another potential problem with uncertainty paradigms is that researchers generally make reinforcement transparent by giving feedback on every trial. As a result, every consequence can be immediately and directly associated with the stimulus-response pairing that produced the negative or positive outcome. Difficult stimuli/trials, seldom rewarded and frequently punished, could come to be aversive for animals, and they could be conditioned not to make primary discrimination responses in those trial contexts. The uncertainty response could then become the default avoidance response to aversive stimuli, instead of a metacognitive report. This potential problem was raised from a formal-modeling perspective by Smith et al. (2008) and Staddon, Jozefowiez, and Cerutti (2007), and from a philosophical perspective by Carruthers (2008). A second purpose of the present research was to evaluate animals’ uncertainty responses when they were denied the trial-by-trial reinforcement that could produce gradients of stimulus avoidance/aversion.

In a first attempt to dissociate metacognitive from associative strategies in uncertainty tasks, Smith, Beran, Redford, and Washburn (2006) found that humans and one of two monkeys were able to make cognitive, decisional uncertainty responses that were independent of feedback signals.

Smith et al. (2006) gave humans and monkeys a psychophysical density-discrimination task and then trained them to complete the task under deferred feedback. That is, humans and monkeys were acclimated to a procedure in which they performed blocks of four trials and then received summary feedback for the block. In that way, they were denied trial-by-trial feedback along with the possibility of directly associating outcomes with specific stimulus-response pairs. The reinforcement situation thus became opaque, and they could not construct reinforcement histories or response tendencies based on their trial-by-trial experience.

However, this study had two significant limitations. First, only one of the two monkeys successfully transferred the use of the uncertainty response from one Sparse-Dense task to the next. The present article sought a general finding based on results from several monkeys. Second, and more important, the stimulus continua in all transfer tasks were Sparse-to-Dense continua and the primary responses did not change in kind. Thus, while humans and monkeys could not track the reinforcement history of the new tasks, they could have transferred their general knowledge of reinforcement history and task structure from one task to the next. To address this concern, the present research used transfer tasks that were qualitatively different from one another, so that neither task knowledge, associative cues, reinforcement history, nor aversion/reinforcement gradients could easily transfer from one task to another. If monkeys could adapt their use of an uncertainty response to these qualitatively different tasks, and do so without any recourse to trial-by-trial feedback from their responses, it would indicate that monkeys cognitively construe new tasks and respond Uncertain based on psychological signals of indeterminacy/difficulty, not based on cues of aversion/avoidance. This would represent an important, converging line of evidence that nonhuman animals have an uncertainty-monitoring system that has functional similarities to human metacognition.

Experiment 1: Humans

Threshold tasks play a prominent role in human and animal psychophysics (Au & Moore, 1990; MacMillan & Creelman, 1991; Schusterman & Barrett, 1975; Thompson & Herman, 1975; Yunker & Herman, 1973). In these tasks, the experimenter moves one stimulus distribution nearer to or farther from a stable, contrast stimulus distribution in order to titrate the perceptual limen or discrimination threshold between the stimulus classes (Corso, 1963; Fechner, 1860/1966). The threshold task creates constant, focused task difficulty because at threshold the identity of the stimulus on all trials is barely discernible. Uncertainty monitoring and responding should be at a premium in these tasks. Experiment 1 examines human uncertainty monitoring in psychophysical threshold tasks performed with deferred and rearranged feedback.

We had a specific reason for evaluating humans’ uncertainty monitoring under threshold conditions. In the original study of psychophysical uncertainty monitoring by monkeys (Smith et al., 1997), monkeys performed both threshold and constant-stimuli psychophysical tasks. (In the constant-stimuli task, each trial represents a random choice from a set/constant distribution of stimuli, not a focused test of the animal’s discrimination performance near threshold.) Both monkeys responded Uncertain adaptively and robustly in the threshold paradigm. But one monkey essentially did not respond Uncertain in the constant-stimuli task. Smith et al. (2006) reported the same lack of uncertainty responding by one monkey in a constant-stimuli task. Thus, we had reason to believe that the threshold paradigm might produce the most robust uncertainty responding by monkeys. We tested humans in the threshold paradigm in order to have a human performance profile that would be directly comparable to that of the monkeys.

Method

Participants

Seventy-two undergraduates at the University at Buffalo, the State University of New York, participated.

Psychophysical tasks

Each participant first completed a Sparse-Dense task. The threshold task was run along a continuum of 100 density levels designated Levels 20-120. The number of lit (white) pixels in the 200 × 100 box was given by Pixels_Level = round((11800 – (120 – Level)²) div 4). Thus, the density continuum went from 450 pixels (Sparse, Level 20) to 2950 pixels (Dense, Level 120).
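As a quick check on the density formula above, the following Python sketch computes the number of lit pixels at a given level. It assumes that "div" denotes integer (truncating) division, as in Pascal, which leaves the outer round with nothing to round. The function names are illustrative and do not come from the original software.

```python
BOX_PIXELS = 200 * 100  # total pixels in the stimulus box described in the text


def lit_pixels(level: int) -> int:
    """Number of lit pixels at a density level (20 = Sparse end, 120 = Dense)."""
    if not 20 <= level <= 120:
        raise ValueError("level must lie in 20..120")
    # Assumption: 'div' is integer division, so no further rounding is needed.
    return (11800 - (120 - level) ** 2) // 4


def lit_fraction(level: int) -> float:
    """Proportion of the box that is lit at a given level."""
    return lit_pixels(level) / BOX_PIXELS


if __name__ == "__main__":
    for level in (20, 70, 120):
        print(level, lit_pixels(level), lit_fraction(level))
```

Under this reading, Level 20 yields 450 pixels and Level 120 yields 2950 pixels, matching the endpoints stated in the text. Because the formula is quadratic in (120 – Level), the pixel count changes more slowly per level as the roving level approaches Level 120.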

Participants then transferred to a Continuity task. They judged whether a white circle was Discontinuous or Continuous based on the number of radial dots it contained on its perimeter. The number of radial dots was determined by the formula Dots_Level = round((11800 – (120 – Level)²) div 100). Each dot’s X and Y coordinates, respectively, were determined by the formulae round(100 * cos(j * 2 * pi / Dots_Level)) and round(80 * sin(j * 2 * pi / Dots_Level)). In this way, as the variable j increased from 1 up to the required number of dots, the dots were evenly distributed around the circle’s perimeter. The angular distances between radial dots ranged from 20 degrees (Discontinuous, Level 20, 18 total radial dots) to about 3 degrees (Continuous, Level 120, 118 radial dots).
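The dot layout for the Continuity stimuli can be sketched in Python as follows. This is an illustration under two assumptions: "div" is read as integer division, and coordinates are offsets from the center of the figure (the text does not say where the origin lies). The function names are hypothetical.

```python
import math


def dot_count(level: int) -> int:
    """Number of radial dots on the perimeter at a given level (20..120)."""
    if not 20 <= level <= 120:
        raise ValueError("level must lie in 20..120")
    # Assumption: 'div' is integer division.
    return (11800 - (120 - level) ** 2) // 100


def dot_positions(level: int) -> list[tuple[int, int]]:
    """(x, y) offsets of each dot, for j = 1 .. dot_count(level)."""
    n = dot_count(level)
    return [
        (
            round(100 * math.cos(j * 2 * math.pi / n)),
            round(80 * math.sin(j * 2 * math.pi / n)),
        )
        for j in range(1, n + 1)
    ]


def angular_spacing_degrees(level: int) -> float:
    """Angle between neighboring dots, in degrees."""
    return 360 / dot_count(level)


if __name__ == "__main__":
    print(dot_count(20), angular_spacing_degrees(20))
    print(dot_count(120), angular_spacing_degrees(120))
```

With these definitions, Level 20 gives 18 dots spaced 20 degrees apart and Level 120 gives 118 dots spaced about 3 degrees apart, in agreement with the values reported above.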

Finally, participants transferred to an Ellipse task. Humans judged whether a red ellipse was relatively round or flattened. In this task, the lengths of the X-radius and Y-radius of the ellipse were manipulated. The X- and Y-radii, respectively, were given by the formulae 90 + round((11800 – (120 – Level)²) div 400) and 50 – round((11800 – (120 – Level)²) div 400). This resulted in X-radii that ranged from 94 (Round, Level 20) to 119 (Flat, Level 120) and Y-radii that correspondingly ranged from 46 to 21.
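The ellipse radii can be checked with a short Python sketch. It assumes that "div" is integer division; that reading reproduces the stated endpoints, whereas rounding the exact quotients 4.5 and 29.5 half up would give X-radii of 95 and 120 instead of 94 and 119. The function name is illustrative.

```python
def ellipse_radii(level: int) -> tuple[int, int]:
    """Return (x_radius, y_radius) of the ellipse at a given level (20..120)."""
    if not 20 <= level <= 120:
        raise ValueError("level must lie in 20..120")
    # Assumption: 'div' truncates, so 1800 div 400 = 4 and 11800 div 400 = 29.
    step = (11800 - (120 - level) ** 2) // 400
    return 90 + step, 50 - step


if __name__ == "__main__":
    print(ellipse_radii(20))   # Round end of the continuum
    print(ellipse_radii(120))  # Flat end of the continuum
```

Note that the two radii always sum to 140 under this formula, so the ellipse widens exactly as much as it flattens.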

Responses

Threshold tasks have one stable stimulus distribution against which another roving stimulus distribution moves in order to titrate the participant’s threshold. We arranged the response grammar of the threshold task so as to honor this structural asymmetry. A stimulus was presented at the screen’s top right. A response icon “S” was presented at the screen’s top left. On all Level 120 trials (that is, Dense, Continuous, and Flattened stimuli), participants needed to make a response that selected the stimulus itself. On all Level 20-119 trials (that is, Sparse, Discontinuous, and Rounded stimuli), participants needed to make a response that selected the “S” response icon. This response grammar was close to the asymmetrical Go/No Go discrimination paradigm that is familiar to comparative psychologists. Correct responses resulted in a computer-generated whoop sound. Incorrect responses resulted in an 8-s computer-generated buzz. The Uncertain response was a “?” in the bottom-center of the screen, and it allowed humans to escape the current trial (with no reward or penalty). During the Density task, these consequences were given immediately; during the Continuity and Ellipse tasks they were not (see below).

Titrating threshold

As the session began, the participant made Sparse-Dense decisions about the extreme stimuli in the task (Level 20, Level 120). As participants correctly completed trials, the level of the sparse trials increased so that they became more similar in density to Level 120 trials. The level of the sparse trials increased in 2-step increments in the range of Levels 20-64, allowing task difficulty to approach threshold rapidly while difficulty was still fairly low. The level of the sparse trials increased in 1-step increments in the range of Levels 65-119, allowing fine adjustments of task difficulty in the region surrounding each participant’s threshold. As the sparse level approached threshold, the accuracy in comparing Level 120s (true Denses) and (e.g.) Level 85s (threshold Sparses) eventually dropped below 70%. Then, and whenever accuracy on Sparse trials fell below 70%, the roving level of sparse trials decreased, loosening the discrimination and making it easier. Whenever accuracy on Sparse trials rose above 70%, the roving level of sparse trials increased, tightening the discrimination and making it more difficult. In this way, participants’ discriminative capabilities were continually challenged, their thresholds were monitored and maintained, and the task presented highly focused difficulty and sustained uncertainty. Accuracy was determined by finding the proportion of correct responses over the last ten trials completed for the given task (Density, Continuity, or Ellipse).

The threshold-titration method was the same in all three tasks. In each task, approximately 30% of trials were spent approaching the participant’s threshold and 70% of trials were spent titrating around the threshold.
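The titration rule described above can be sketched as a small Python class. Several details are assumptions made for illustration because the text does not specify them: the level is updated after every recorded trial, downward steps mirror the upward step sizes, the level is left unchanged when accuracy is exactly 70%, and accuracy is computed over however many trials are available until ten have accumulated. The class and method names are hypothetical.

```python
from collections import deque


class RovingLevel:
    """Tracks the roving (Sparse-side) level, Levels 20..119."""

    def __init__(self, start: int = 20, window: int = 10, criterion: float = 0.70):
        self.level = start
        self.criterion = criterion
        self.outcomes = deque(maxlen=window)  # keeps only the last `window` trials

    @staticmethod
    def step_size(level: int) -> int:
        # 2-step moves in Levels 20-64, 1-step moves in Levels 65-119.
        return 2 if level < 65 else 1

    def accuracy(self) -> float:
        return sum(self.outcomes) / len(self.outcomes)

    def record(self, correct: bool) -> int:
        """Record one trial outcome and return the updated roving level."""
        self.outcomes.append(correct)
        acc = self.accuracy()
        if acc > self.criterion:
            # Tighten the discrimination (harder), never reaching Level 120.
            self.level = min(119, self.level + self.step_size(self.level))
        elif acc < self.criterion:
            # Loosen the discrimination (easier). Assumed mirror-image step.
            self.level = max(20, self.level - self.step_size(self.level))
        return self.level


if __name__ == "__main__":
    tracker = RovingLevel()
    for outcome in [True] * 10 + [False] * 4:
        print(tracker.record(outcome))
```

In this sketch, ten consecutive correct responses move the roving level from 20 to 40, and the level begins to fall only once accuracy over the ten-trial window drops below the 70% criterion.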

Instructions

At the beginning of the experiment, participants were told: “You will see boxes that are DENSELY or SPARSELY filled with pixels. Your job is to decide whether the boxes are DENSE or SPARSE.” They were shown the keys to press to make the stimulus-touch or “S” response, informed of the rewards and penalties, then told, “if you are UNCERTAIN, press the ‘?’ key”. Instructions for the Continuity and Ellipse tasks were in exactly the same format, with appropriate changes corresponding to the stimuli (e.g., “boxes” and “DENSE” changed to “ellipses” and “FLATTENED” for the Ellipse task). For the Continuity and Ellipse tasks, they were also told, “you will now find out how you are doing every 4 trials.”

Deferred/rearranged feedback

During deferred and rearranged feedback, participants completed four trials, with each response immediately bringing about the next trial. After four trials were completed, all reward whoops earned for the four trials were given, followed by all penalty buzzes. Consequences were separated by 250 ms, so that it was apparent how many rewards and penalties had been accrued. Any uncertainty responses made during the four trials simply reduced the total number of rewards and penalties. For example, if a participant made 2 correct responses, 1 incorrect response, and 1 uncertainty response, they would receive 2 reward whoops followed by 1 penalty buzz. It was thus possible to get direct feedback on a trial by making the uncertainty response to the first 3 stimuli and attempting to answer the fourth. However, no human (and no monkey) responded in this way. The feedback cycle was immediately followed by the next block of trials. Each trial took about 1 s to complete. Stimuli were not arranged in blocks. Every trial was selected randomly, with a 60% chance of being a Level 120 and a 40% chance of being the current roving level. Thus, there was no way to know or strategize about upcoming trials in a block based on what trials had already occurred.
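The feedback and trial-selection rules in this paragraph can be illustrated with a brief Python sketch. The outcome labels, function names, and the use of Python's random module are assumptions for illustration and are not taken from the original experimental software.

```python
import random


def deferred_feedback(block_outcomes: list[str]) -> list[str]:
    """Rearranged feedback for one block of trials.

    Each outcome is 'correct', 'incorrect', or 'uncertain'. All rewards are
    delivered first, then all penalties; uncertainty responses earn neither.
    """
    rewards = block_outcomes.count("correct")
    penalties = block_outcomes.count("incorrect")
    return ["whoop"] * rewards + ["buzz"] * penalties


def next_trial_level(roving_level: int, rng: random.Random, p_level_120: float = 0.60) -> int:
    """Pick the next stimulus level independently of all earlier trials."""
    return 120 if rng.random() < p_level_120 else roving_level


if __name__ == "__main__":
    print(deferred_feedback(["correct", "incorrect", "uncertain", "correct"]))
    rng = random.Random(1)
    print([next_trial_level(85, rng) for _ in range(4)])
```

Because the feedback sequence depends only on the counts of correct and incorrect responses, the order of events within a block cannot be recovered from it. The one exception is the case noted in the text, where three uncertainty responses leave a single consequence that can be attributed to the one answered trial.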

Task durations

Participants were given 301 trials in the Density task with trial-by-trial feedback. They were given 453 trials in the Continuity and Ellipse transfer tasks, with all trials receiving only deferred and rearranged feedback. Thus, humans completed two qualitatively novel tasks (making the reinforcement history from previous tasks uninformative and unhelpful) entirely under deferred feedback (making the reinforcement history from the present task unavailable).

Results

Performance bins

Each task contained 100 stimulus levels. Humans were highly accurate on lower stimulus levels, and progressed through those levels quickly, with few trials delivered. To create data bins with more equal numbers of trials, the data were binned as follows: Levels 20-29, Levels 30-39, Levels 40-49, Levels 50-59, and Levels 60-69, respectively, became Bins 1-5 (10 levels per bin). Levels 70-74 and 75-79, respectively, became Bins 6 and 7 (5 levels per bin). Levels 80-115 became Bins 8-19 at 3 levels per bin. Levels 116-119 became Bin 20. For all bins up to and including Bin 20, “S” responses were correct. The final level (Level 120) – the only level for which the stimulus-touch response was correct – became Bin 21. Bins with fewer than 10 trials were eliminated from analysis and from the corresponding graphs, because it is not useful to estimate three response proportions based on fewer than 10 events.
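The binning scheme just described maps each stimulus level to one of 21 bins. A minimal Python sketch of that mapping, together with the 10-trial inclusion rule, is given below. The function names and the dictionary-based trial counts are illustrative assumptions.

```python
def level_to_bin(level: int) -> int:
    """Map a stimulus level (20..120) to its performance bin (1..21)."""
    if not 20 <= level <= 120:
        raise ValueError("level must lie in 20..120")
    if level <= 69:   # Levels 20-69: Bins 1-5, 10 levels per bin
        return 1 + (level - 20) // 10
    if level <= 79:   # Levels 70-79: Bins 6-7, 5 levels per bin
        return 6 + (level - 70) // 5
    if level <= 115:  # Levels 80-115: Bins 8-19, 3 levels per bin
        return 8 + (level - 80) // 3
    if level <= 119:  # Levels 116-119: Bin 20
        return 20
    return 21         # Level 120: the only stimulus-touch level


def usable_bins(trial_counts: dict[int, int], minimum: int = 10) -> list[int]:
    """Bins with at least `minimum` trials, as required for analysis."""
    return sorted(b for b, n in trial_counts.items() if n >= minimum)


if __name__ == "__main__":
    print([level_to_bin(l) for l in (20, 69, 70, 79, 80, 115, 116, 120)])
    print(usable_bins({1: 4, 15: 37, 21: 180}))
```

The narrower bins at higher levels reflect the fact that most trials were delivered near threshold, so finer bins there still contain enough trials to estimate the three response proportions.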

Bin effects

In the Sparse-Dense task, peak uncertainty responding was 31.9% at Density Bin 15. Accuracy here was 62.5%, appropriately low because this point was near the perceptual threshold of our sample relative to true Dense trials at Density Bin 21. Figure 1A shows humans’ proportional use of the three responses in this task. At each density bin, the three response proportions sum to 1.0 because participants made one of three responses for each trial. This same additivity applies to Figures 2, 4, and 5. The error bars indicate 95% confidence intervals for the peak of uncertainty responding at Bin 15, for the first level to the left of the peak at which uncertainty responding significantly declined, and for Bin 21 representing the true Dense trials. Uncertainty responding fell off to either side of the peak, becoming reliably different at Density Bin 6 to the left (where participants could generally tell that stimuli were Sparse) and Density Bin 21 to the right (where participants could generally tell that stimuli were Dense).

Figure 1. A. Humans’ performance on a density discrimination with trial-by-trial feedback using the titrating threshold method. Grey diamonds, grey triangles, and black squares represent the proportion of Sparse, Dense, and Uncertain responses, respectively. Error bars indicate 95% confidence intervals at selected data bins as described in the text. B. Humans’ performance on a continuity discrimination with deferred feedback using the threshold method. Grey diamonds, grey triangles, and black squares represent the proportion of Discontinuous, Continuous, and Uncertain responses, respectively. Error bars indicate 95% confidence intervals at selected data bins. C. Humans’ performance on an ellipse discrimination with deferred feedback using the threshold method. Grey diamonds, grey triangles, and black squares represent the proportion of Round, Flattened, and Uncertain responses, respectively.

A, B, C. The performance of monkey Gale in the Length, Continuity, and Ellipse discriminations using deferred feedback and the titrating threshold method, depicted as in Figure 1.

A, B, C. The performance of monkey Lou in the Slope, Continuity, and Asterisk discriminations using deferred feedback and the titrating threshold method, depicted as in Figure 1.

A, B, C. The performance of monkey Murph in the Length, Continuity, and Asterisk discriminations using deferred feedback and the titrating threshold method, depicted as in Figure 1.

In the Continuity task, peak uncertainty responding was 33.3% at Bin 14. Accuracy here was 55.0%. In the Ellipse task, peak uncertainty responding was 57.8% at Bin 20. Accuracy here was 58.7%. Figures 1B and 1C show humans’ performance on the Continuity and Ellipse tasks, respectively. The error bars in these figures were placed as just described. In both tasks, participants most often declined the difficult trials at the crux of their ability to discriminate Discontinuous from Continuous circles and Rounded from Flattened ellipses.

In all three tasks, there was a significant effect of Bin on the level of uncertainty responding, F(1, 20) = 125.3, p < .001; η² = .86. Humans were able to decline selectively the most difficult trials. In the two tasks featuring deferred and rearranged reinforcement, they were clearly not dependent on trial-by-trial feedback to do so. There was also a significant effect of task on the pattern of uncertainty responding, F(2, 40) = 7.9, p < .001; η² = .28. This suggests that uncertainty responding was flexible and adapted to the structure of each task. It was not a function of previously experienced reinforcement history, nor was it a carryover effect from previous tasks. Participants made their trial-decline responses to the most difficult stimuli in each task whether reinforcement-based cues were present (Figure 1A) or not (Figures 1B, 1C). In the latter two tasks, they had to choose the uncertainty response based on their cognitive assessment of the difficulty of the trial. Thus, in both these tasks, humans showed a data pattern under deferred reinforcement that suggests a capacity for metacognition shorn of reinforcement cues. Though this conclusion is not surprising, it provides a standard against which we can compare the uncertainty responding of monkeys placed into a similar situation.

Experiment 2: Monkeys

Experiment 2 gave rhesus monkeys the threshold situation of Experiment 1. If monkeys could make adaptive uncertainty responses even in qualitatively novel tasks when feedback was deferred and rearranged, then it would be difficult for the uncertainty response to be conditioned by feedback signals, to be responsive to reinforcement history, or to be based in low-level associative cues. If that response were preserved, it would suggest that monkeys were choosing uncertainty responses at a higher, decisional level that could be isomorphic to humans’ uncertainty-monitoring performance in Experiment 1.