Abstract
Rewards and punishments (reinforcement) powerfully shape behavior. Accordingly, their neuronal representation is of significant interest, both for understanding normal brain-behavior relationships and the pathophysiology of disorders such as depression and addiction. A recent article by Vickery and colleagues (Neuron 72: 166–177, 2011) provides evidence that the neural response to rewards and punishments is surprisingly widespread, suggesting the need for examination of the specific roles of areas not commonly included in the canonical reward circuitry in processing reinforcement.
Keywords: reward, punishment, reinforcement learning, fMRI
Whether we engage in a particular behavior, opt for one course of action over another, or approach or withdraw from a particular situation is strongly influenced by the outcomes previously experienced under similar conditions (Mazur 2012). That is, motivation, decision making, and social interaction all depend on expectations that a particular action, choice, or environment will yield a given result. In general, these returns, be they pleasant (rewards) or aversive (punishments), powerfully shape behaviors from the simple to the complex, across timescales that range from milliseconds to years (Mazur 2012). This influence of rewards and punishments is disrupted in clinical brain disorders such as depression, addiction, and obsessive-compulsive disorder (Charney and Nestler 2009). Individuals with these conditions no longer seek to gain rewards and avoid punishments, compulsively seek particular rewards to the detriment of other needs, or attribute rewards to maladaptive behaviors, respectively (Charney and Nestler 2009). Thus, an understanding of the neurobiological representation of rewards and punishments is essential for understanding both the healthy and the diseased brain.
Evidence for a dedicated neurobiological pathway representing reinforcement first came from experiments that found rats would lever-press to earn electrical microstimulation of brain sites previously identified as critical for seeking natural rewards (Olds and Milner 1954). Subsequently, it was demonstrated that a key feature of these sites, including the ventral striatum, amygdala, medial prefrontal cortex, and orbital prefrontal cortex, was innervation from midbrain dopaminergic neurons (for a review, see Haber and Knutson 2010). Studies demonstrating significant responses to primary (e.g., food) and secondary (e.g., money) reinforcers and reinforcement-predictive stimuli in these areas, modulation of these neuronal responses by dopamine agonists and antagonists, and behavioral impairments following dopamine depletions served to cement the primary role of dopamine and its cortical and subcortical efferents in encoding and valuing rewards (for a review, see Schultz 2007).
Given the above evidence for a well-conserved and circumscribed pathway for encoding rewards, it is surprising that a recent study reported widespread encoding of reinforcement signals across multiple cortical and subcortical domains (Vickery et al. 2011).
In this study, Vickery and coworkers (2011) examined blood oxygen level-dependent (BOLD) responses using functional magnetic resonance imaging (fMRI) while human subjects played a simple competitive game, matching pennies or rock-paper-scissors, against a computer opponent. During these experiments, the subject's task was to guess the computer's choice, either heads or tails in matching pennies, or rock, paper, or scissors in rock-paper-scissors. In the matching pennies game, guesses that matched the computer's selection were counted as wins and mismatches were counted as losses. Subject-computer choice relationships in rock-paper-scissors determined wins, ties, and losses (paper > rock > scissors > paper). In matching pennies, any block of trials that ended with the subjects having answered correctly on greater than 50% of the trials resulted in a monetary reward. Payouts in rock-paper-scissors were earned whenever subjects won more than they lost over a block of trials. In both games, subjects used previous outcomes to guide future choices, exhibiting a greater tendency to switch their response following a loss on the previous trial and stay with the same response following a win on the previous trial (“win-stay lose-shift” strategy).
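The win-stay lose-shift dynamic described above can be made concrete with a short simulation. The following Python sketch is an illustration only, not the authors' task code; the two strategy probabilities are hypothetical parameters, and the opponent here simply chooses at random:

```python
import random

def play_matching_pennies(n_trials=1000, p_stay_after_win=0.8,
                          p_shift_after_loss=0.8, seed=0):
    """Win-stay lose-shift player in matching pennies.

    The player guesses the computer's coin; a match counts as a win.
    The two probability parameters are hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    choices = ("heads", "tails")
    guess = rng.choice(choices)
    outcomes = []
    for _ in range(n_trials):
        computer = rng.choice(choices)  # unexploitable random opponent
        win = guess == computer
        outcomes.append(win)
        # win-stay lose-shift: tend to repeat the guess after a win
        # and switch after a loss
        stay = rng.random() < (p_stay_after_win if win else 1 - p_shift_after_loss)
        if not stay:
            guess = "tails" if guess == "heads" else "heads"
    return outcomes

wins = play_matching_pennies()
print(f"win rate: {sum(wins) / len(wins):.2f}")
```

Against a uniformly random opponent, the strategy parameters do not change the expected win rate (about 50%); the point of the sketch is only the outcome-dependent switching behavior itself, the behavioral signature that Vickery et al. (2011) observed in their subjects.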
To examine whether fMRI responses in a region of interest (ROI) encoded the outcome of the previous trial, Vickery et al. (2011) used two separate methods, a standard generalized linear model (GLM) and a multivoxel pattern analysis (MVPA), to decode previous trial outcomes during an inter-trial period. Unlike GLM, in which classification of the outcome is based on a comparison of the average activity following wins and losses, MVPA uses the pattern of activity across the voxels in an ROI to determine whether the previous trial was a win or loss (Fig. 1A). This technique has previously been used, for example, to decode subjective perception of visual stimuli from activity in primary visual cortex (Kamitani and Tong 2005).
Fig. 1.

Reconfiguring the pathways for processing reinforcement signals. A: cartoon illustration of the seminal finding of Vickery et al. (2011). The rhinal (Rh, red) and orbitofrontal cortex (OFC, blue) are highlighted on MR surface maps depicting ventral and lateral views of the human brain. Arrows link Rh and OFC to corresponding schematic illustrations of hypothetical fMRI BOLD responses in each region of interest (ROI) during conditions when the previous trial was either a win or a loss. When determination of reward processing is based on a comparison of average activity in an ROI across conditions (generalized linear model, GLM), a reward signal is found in OFC but not Rh. In contrast, when reward processing is assessed using the specific pattern of activation across the voxels in an ROI in different conditions (multi-voxel pattern analysis, MVPA), reward processing is observed in both OFC and Rh. Surface maps rendered with WebCARET software (http://www.nitrc.org/projects/caret/). B: simplified diagram of the canonical reward pathway as well as primary connections with circuits for vision, action, and memory. Box and outline colors indicate inclusion in a specific circuit. The results of Vickery et al. (2011) suggest that reward processing extends beyond a circumscribed reward circuit to include primary sensory areas (e.g., V1) and motor planning areas (e.g., SMA) as well as mnemonic areas (e.g., Rh). SMA, supplementary motor area; ACC, anterior cingulate cortex; LPFC, lateral prefrontal cortex; vmPFC, ventro-medial prefrontal cortex; VA/MD, ventral anterior and medial dorsal thalamic nuclei; Hpc, hippocampus; Rh, rhinal cortex; TH/TF, parahippocampal cortex; GP, globus pallidus; STN, subthalamic nucleus; STR, striatum; SN/VTA, substantia nigra/ventral tegmental area; Amy, amygdala.
Even under the strictest statistical criterion, the MVPA model performed significantly better than chance in decoding win-loss information in an approximately fourfold greater number of areas than the GLM model (greater than 80% of the areas studied). This suggests the more limited representation of reinforcement described in previous imaging studies (for a review, see Haber and Knutson 2010) may have resulted from examining only differences in the mean signal. To examine the overlap between reward and punishment signals, separate MVPA models were used to classify wins vs. ties and ties vs. losses in the rock-paper-scissors game. Most areas that encoded wins vs. ties also encoded ties vs. losses, leading the authors to suggest that reinforcement and punishment are encoded within the same circuit. A notable exception to this pattern was the nucleus accumbens, an area previously identified as a key node in the reward circuit and hypothesized to implement an important step in leading computational models of reinforcement learning (for a review, see Houk et al. 1995), which did not encode losses. More generally, this finding also conflicts with studies reporting a segregation of responses to reward and punishment (Matsumoto and Hikosaka 2007).
Finally, two important control analyses were run to ensure that these results were not an artifact of a particular statistical model and could not be attributed to emotional or cognitive factors that co-vary with outcome.
First, to ensure that the widespread encoding of reinforcement signals was not simply an artifact of an overly sensitive method, MVPA was also used to decode visual feedback, motor response, and decision strategy signals from activity during the feedback, response, and inter-trial periods, respectively. In all cases, significant encoding was restricted to a small subset of ROIs (fewer than 15% of those examined) in which, based on a large corpus of evidence, each type of signal would be expected to be observed (e.g., visual feedback could only be decoded from activity in visual cortex).
Second, to determine the extent to which the signals observed during the inter-trial period reflect reinforcement and not other variables such as arousal, separate MVPA classifiers were constructed using only win and tie outcomes or tie and loss outcomes in the rock-paper-scissors game. Assuming that wins and losses are more emotionally or cognitively “salient,” i.e., elicit greater happiness or frustration, respectively, or grab more attention, than ties, the tendency of each classifier to assign untrained losses (for the win-tie classifier) or wins (for the tie-loss classifier) as a win or a loss can be taken as an indication of the extent to which these signals reflect outcome salience rather than reinforcement. Because neither of these models showed a significant tendency to misclassify signals from any ROI in this manner, Vickery et al. (2011) concluded that these signals do not reflect salience.
Current theories of reward-guided behavior posit that sensory, motor, and mnemonic information is integrated with information about reinforcement in a specialized pathway for valuing and selecting rewards (Fig. 1B) (for a review, see Haber and Knutson 2010). The work of Vickery and colleagues (2011) suggests at least two alternative interpretations: 1) reward is encoded far more broadly than previously hypothesized, or 2) processing within this specialized reward pathway dramatically influences subsequent processing throughout nearly the entire brain. Given the aforementioned importance of reinforcement to understanding normal and pathological behavior, it is crucial to critically evaluate the results of Vickery et al. to determine the most promising avenue for further research.
While the results of the control analyses examining visual feedback, response, and decision signals are strong evidence that the ubiquitous encoding of trial outcomes observed during the inter-trial period was not simply due to the sensitivity of the MVPA method, there remain several open questions that must be answered to better interpret the central findings of Vickery et al. (2011).
The results of the separate win-tie/tie-loss classifier analysis are strong evidence that these signals do not reflect the degree of emotional arousal or attention elicited by a particular outcome. However, when planning future choices, ties are potentially as strategically salient as wins and losses (i.e., just as informative about what to do next). The above analysis does not control for salience in this latter, informational sense. Furthermore, the manner in which subjects used this information might have been dependent on their knowledge that they were playing against a computer rather than human opponent. A recent study reported encoding of socially dependent strategies in many of the core areas of the canonical reward pathway (Bault et al. 2011). Examining whether a subject's task strategies and the pattern of neural encoding of outcomes change when games are played against human vs. computer opponents could shed further light on the degree to which the signals described in Vickery et al. (2011) are dependent on these factors.
In a similar vein, while competitive games such as matching pennies or rock-paper-scissors are attractive, in that reinforcement is not dependent on a particular stimulus or response, and previous outcomes guide future choices, the relationship between any individual outcome and the ultimate payout (determined at the end of a block of trials) is complex and temporally variable. Thus, the widespread encoding of reinforcement seen in these tasks could be related to either task complexity or temporal distance from actual reward. Either 1) providing real-time feedback on the overall probability of reward as subjects advance through a block of trials, 2) examining whether distance from payout significantly affected outcome encoding, or 3) employing a simpler task, for example, one in which each choice is directly rewarded or punished, would be interesting manipulations to test in further studies.
Studies examining the relationship between BOLD and underlying neural activity have generally reported that BOLD reflects synaptic input and/or local processing within an ROI (Logothetis et al. 2001). However, a recent paper combining intrinsic signal optical imaging (measuring a hemodynamic response similar to the BOLD signal) with multi-unit and local field potential recordings in non-human primate primary visual cortex reported a striking dissociation between electrophysiological and hemodynamic responses to expected reward (Sirotin and Das 2009). Specifically, Sirotin and Das (2009) found a significant increase in the hemodynamic signal, but no corresponding increase in any of their electrophysiological measures, when monkeys anticipated upcoming trials (and thus rewards). These authors suggested that this hemodynamic modulation could be the result of dopamine-driven changes in vascular tone in anticipation of upcoming neural processing. Because subjects might have been anticipating future outcomes during the time period in which Vickery et al. (2011) reported widespread reinforcement encoding, these signals could reflect anticipatory changes in local cerebral blood flow that are uncoupled from the neural activity in their ROIs. Replicating these effects with a more direct measure of neural activity (e.g., MEG or EEG), despite the obvious differences in spatial resolution, would at least provide assurance that the BOLD signals measured here reflect underlying neural activity in these conditions (although this still might not reflect the spiking output of these regions). Conversely, differences in BOLD and electrophysiological responses in such an experiment would strongly suggest that the signals reported by Vickery et al. (2011) reflect changes in local processing, perhaps an increase in the sensitivity of these circuits, driven by activity in distant structures.
Finally, in future studies, it would be helpful to correlate the performance of MVPA models like those used here with estimates of functional connectivity (Fox et al. 2006). This would allow for the determination of whether reinforcement signals are stronger in particular networks. Such clustering data would allow for the targeted investigation of signal flow within these circuits by, for example, using reversible inactivations, focal lesions, and disconnections in primate and rodent models to determine whether, and under what conditions, the reinforcement signals found in specific brain regions are necessary for normal reward-guided behavior.
The greater variety of techniques available for recording and manipulating neural activity, and concomitant behavioral output, in animal models can be put to great use in extending the work of Vickery et al. (2011). Indeed, these authors' work is in agreement with accumulating evidence in both rodents and monkeys that reinforcement signals are processed more widely than previously thought. For example, it has been reported that a large fraction of units in rodent primary visual cortex (V1) signal the magnitude or timing of an expected reward in the absence of any visual stimulation (Shuler and Bear 2006). Similarly, a recent study found that removal of macaque rhinal cortex (Rh), the final stage of the ventral stream of visual processing in the primate, induced a severe impairment in reward valuation in conditions in which no visual stimuli predicted reward value (Clark et al. 2012). Importantly, both V1 and Rh showed evidence of reinforcement processing in Vickery et al. (2011).
In conclusion, maximizing rewards while minimizing losses is essential for survival. Thus, widespread abstract reinforcement signals, unrelated to any particular stimulus, response, or choice, could potentially be extremely useful not only in finding food and avoiding predators, but also, for example, in deciding whether specific information or a particular episode is valuable enough to encode in memory. It will be necessary to determine whether, as has been thought for more than fifty years, such signals are exclusive to dedicated pathways for processing reward and punishment, or whether wider networks of sensorimotor and association areas participate directly in encoding rewards and punishments.
GRANTS
This work was supported by the Intramural Research Programs of the National Institute of Mental Health.
DISCLOSURES
No conflicts of interest, financial or otherwise, are declared by the author(s). The opinions expressed in this article are the author's own and do not reflect the view of the U.S. National Institutes of Health, the Department of Health and Human Services, or the United States Government.
AUTHOR CONTRIBUTIONS
Author contributions: A.M.C. conception and design of research; A.M.C. performed experiments; A.M.C. analyzed data; A.M.C. interpreted results of experiments; A.M.C. prepared figures; A.M.C. drafted manuscript; A.M.C. edited and revised manuscript; A.M.C. approved final version of manuscript.
REFERENCES
- Bault N, Joffily M, Rustichini A, Coricelli G. Medial prefrontal cortex and striatum mediate the influence of social comparison on the decision process. Proc Natl Acad Sci USA 108: 16044–16049, 2011
- Charney D, Nestler E (editors). Neurobiology of Mental Illness. Oxford: Oxford University Press, 2009
- Clark AM, Bouret S, Young AM, Richmond BJ. Intersection of reward and memory in monkey rhinal cortex. J Neurosci 32: 6869–6877, 2012
- Fox MD, Corbetta M, Snyder AZ, Vincent JL, Raichle ME. Spontaneous neuronal activity distinguishes human dorsal and ventral attention systems. Proc Natl Acad Sci USA 103: 10046–10051, 2006
- Haber SN, Knutson B. The reward circuit: linking primate anatomy and human imaging. Neuropsychopharmacology 35: 4–26, 2010
- Houk JC, Adams JL, Barto AG. A model of how the basal ganglia generate and use neural signals that predict reinforcement. In: Models of Information Processing in the Basal Ganglia, edited by Houk JC, Davis JL, Beiser DG. Cambridge: MIT Press, 1995, p. 379
- Kamitani Y, Tong F. Decoding the visual and subjective contents of the human brain. Nat Neurosci 8: 679–685, 2005
- Logothetis NK, Pauls J, Augath M, Trinath T, Oeltermann A. Neurophysiological investigation of the basis of the fMRI signal. Nature 412: 150–157, 2001
- Matsumoto M, Hikosaka O. Lateral habenula as a source of negative reward signals in dopamine neurons. Nature 447: 1111–1115, 2007
- Mazur JE. Learning and Behavior. New York: Pearson, 2012
- Olds J, Milner P. Positive reinforcement produced by electrical stimulation of septal area and other regions of rat brain. J Comp Physiol Psychol 47: 419–427, 1954
- Schultz W. Behavioral dopamine signals. Trends Neurosci 30: 203–210, 2007
- Shuler MG, Bear MF. Reward timing in the primary visual cortex. Science 311: 1606–1609, 2006
- Sirotin YB, Das A. Anticipatory haemodynamic signals in sensory cortex not predicted by local neuronal activity. Nature 457: 475–479, 2009
- Vickery TJ, Chun MM, Lee D. Ubiquity and specificity of reinforcement signals throughout the human brain. Neuron 72: 166–177, 2011
