Proceedings of the National Academy of Sciences of the United States of America. 2012 Feb 27;109(11):4285–4289. doi: 10.1073/pnas.1119969109

Adaptive coding of reward prediction errors is gated by striatal coupling

Soyoung Q Park a,b,1, Thorsten Kahnt b,c, Deborah Talmi d, Jörg Rieskamp e, Raymond J Dolan d, Hauke R Heekeren a,b,f
PMCID: PMC3306682  PMID: 22371590

Abstract

To efficiently represent all of the possible rewards in the world, dopaminergic midbrain neurons dynamically adapt their coding range to the momentarily available rewards. Specifically, these neurons increase their activity for an outcome that is better than expected and decrease it for an outcome worse than expected, independent of the absolute reward magnitude. Although this adaptive coding is well documented, it remains unknown how this rescaling is implemented. To investigate the adaptive coding of prediction errors and its underlying rescaling process, we used human functional magnetic resonance imaging (fMRI) in combination with a reward prediction task that involved different reward magnitudes. We demonstrate that reward prediction errors in the human striatum are expressed according to an adaptive coding scheme. Strikingly, we show that adaptive coding is gated by changes in effective connectivity between the striatum and other reward-sensitive regions, namely the midbrain and the medial prefrontal cortex. Our results provide evidence that striatal prediction errors are normalized by a magnitude-dependent alteration in the interregional connectivity within the brain's reward system.

Keywords: adjustment, dopamine, normalization, context invariance, functional connectivity


From receiving a piece of chocolate to winning a lottery, the range of possible rewards in the world is immense, yet the coding range of reward-sensitive neurons is limited. An efficient way for the brain to solve this problem is by dynamically adjusting the activity range of neurons according to the momentarily available rewards. Such an adaptive coding mechanism maximizes the discriminability between different values in a given reward context, thus enabling efficient information processing.

Specifically, adaptive coding of reward prediction errors (PEs) has been suggested by a wide range of theories from economics and reinforcement learning. A PE quantifies the difference between the expected and the actually received reward. Prospect theory suggests that such changes are coded according to an individual reference outcome, such as the status quo or individual expectations (1, 2). In reinforcement learning theory, on the other hand, the PE is considered to be essential for updating the reward values associated with the predicting cue, thus acting as a teaching signal (3, 4). Adaptive coding of PEs is essential for two reasons. First, the reward magnitude (lottery or chocolate) is already encoded during expectation. Hence, in terms of effective neural coding, it is not necessary to represent the reward magnitude redundantly when computing the PEs. Second, to cover the wide range of all possible rewards, the limited range of neural firing rates has to be exploited optimally. By doing so, the neural system can represent the whole range of rewards and, at the same time, remain sensitive to change.

Indeed, animal recording studies have shown that dopaminergic midbrain neurons encode reward PEs (5, 6) according to an adaptive coding scheme (7). Specifically, these neurons increase their activity for the larger of two potential reward outcomes and decrease their activity for the smaller outcome independent of the absolute reward magnitude (7). Human studies using functional magnetic resonance imaging (fMRI) highlight PE-related activity in the ventral striatum (8–14), activity that is often presumed to reflect a dopaminergic input from the midbrain.

Adaptive coding is a normalization process that brings different magnitudes onto the same coding scale. Although adaptive coding in reward-sensitive neurons is well documented (7, 15–17), it is unknown how the brain normalizes different ranges of rewards to enable the adaptive coding. One possible mechanism for such normalization is via modulation of connectivity with other reward-coding areas, including areas that might show sensitivity to actual reward magnitudes. The major dopaminergic innervations to the striatum originate in the ventral tegmental area (VTA) and the substantia nigra (SN) (18). Additionally, there is strong input from regions encoding reward value, notably the orbitofrontal cortex (OFC) and the ventromedial prefrontal cortex (vmPFC) (19–24). Based on these connections, we hypothesized that changes in striatal connectivity with these regions would underlie an adjustment in the coding range of striatal PEs. Specifically, when a high reward magnitude is encountered, a dynamic change in connectivity would render striatal PE coding comparable to that of a lower reward magnitude.

In the current study, we aimed to investigate the normalization process underlying adaptive coding by means of fMRI. First, we addressed the question of whether PEs are represented according to an adaptive coding scheme in the human striatum. Second, we investigated how the brain implements the reward rescaling for adaptive coding.

Subjects performed a simple reward prediction task that induces PEs. In each trial, subjects saw a cue indicating the possible reward; in trials with high reward magnitude, subjects saw 1€ combined with either high or low probability (66% or 33%). In trials with low reward magnitude, subjects saw 10ct with high or low probability (Fig. 1A). The reward cues appeared on either the right or the left side of the screen, and subjects were asked to indicate the position of the reward cue by pressing a button. After a variable delay, subjects saw either the corresponding coin, indicating the outcome was obtained, or the coin with a red cross superimposed, indicating the outcome was omitted (Fig. 1A). Importantly, in our task, the reward magnitudes were combined with different probabilities, allowing us to disentangle PE-related activity from outcome-related activity.

Fig. 1.

Task description and behavioral data. (A) In each trial, subjects saw a visual cue indicating the reward magnitude and probability on the left or right side of the screen. Subjects indicated the location of the cue by pressing a button, after which a green circle surrounded the cue. After a variable delay, the reward outcome was shown to the subjects. (B) Mean corrected reaction time data. Left two bars show high and low reward magnitudes, and right two bars show high and low reward probabilities. The greater the reward magnitude and the higher the probability, the faster the subjects responded. (Error bars: SEM.)

Results

Behavioral Results.

Subjects correctly indicated the location of the cue in 99.33% ± 0.02% of the trials. A two-by-two ANOVA (reward magnitude × probability) on the reaction time data revealed a significant main effect of magnitude (F1,27 = 13.75, P < 0.001) and a significant main effect of probability (F1,27 = 12.67, P < 0.001). There was no significant magnitude-by-probability interaction (F1,27 = 1.73, P = 0.2). Subjects responded faster in high- compared with low-magnitude trials (high, 555 ms; low, 568 ms) and in high- compared with low-probability trials (high, 557 ms; low, 567 ms), indicating that both reward magnitude and probability affected reward expectations independently (Fig. 1B).

Neuroimaging Results.

Our first analysis of imaging data focused on the question of adaptive coding of PEs. We applied a whole-brain general linear model (GLM) that included onset regressors for the reward cue and the outcome as well as two parametric regressors at the time of the reward outcome. The first parametric regressor coded outcome delivery as 1 for received and −1 for omitted rewards, thus accounting for the variance caused by received vs. omitted rewards. The second parametric regressor accounted for the PE-related variance, and, importantly, this regressor was orthogonalized with respect to the first parametric regressor (received vs. omitted). Hence, this PE regressor accounts for variance in the blood oxygen level-dependent (BOLD) signal that is independent of outcome-related (received vs. omitted) activity. Voxel-wise one-sample t tests on the parameter estimates of the parametric PE regressor revealed a significant correlation between PE and activity in the ventral striatum [Fig. 2A; P < 0.05, family-wise error (FWE) small volume-corrected (SVC), [−6, 18, −3]; t27 = 4.19; see Table S1 for whole-brain results].

Fig. 2.

Adaptive coding of PEs in the striatum. (A) Parametric modulation with trial-wise PE revealed significant correlation with BOLD responses in the striatum. (B) Across different reward magnitudes, no significant difference in PE-related responses was observed (t27 = −0.98, P = 0.34), confirming an adaptive coding of striatal PE. (C) Adaptive PE predicted striatal BOLD responses significantly better than nonadaptive PE did (t27 = 2.82, P = 0.0089). (B and C) The y axes represent the parameter estimates of PE. (Error bars: SEM.)

Having identified the region in the ventral striatum coding reward PEs, we determined whether this striatal region adaptively codes PE responses. Specifically, we tested whether the striatal PE responses were invariant for different reward magnitudes. If PEs are coded according to an adaptive coding scheme, representations of striatal PEs for high reward should not differ from those for low reward. We applied a GLM to the striatal data, but this time all of the regressors were split for high- and low-reward trials. The onset regressors for cue and outcome, the two parametric regressors for outcome-related variance, and the two PE coding regressors were regressed against the BOLD signal in the striatum. Both PE regressors were orthogonalized with respect to the parametric modulation of reward outcome. This comparison of PE-related responses in high vs. low reward magnitudes revealed no significant difference in the striatum (Fig. 2B; t27 = −0.98, P = 0.34).

We performed an additional analysis in which we directly tested whether the striatal changes in BOLD signal are significantly better predicted by an adaptive PE (PE modulated only by probability) than by a nonadaptive PE (PE modulated by probability × magnitude). If an adaptive PE predicts striatal activity better than a nonadaptive PE, this would support adaptive PE coding in the striatum. If, instead, PEs were modulated by reward magnitude, a nonadaptive PE would provide a better prediction of striatal activity. Importantly, our analysis shows that the striatal BOLD signal is significantly better predicted by an adaptive compared with a nonadaptive PE (Fig. 2C; t27 = 2.82, P = 0.0089; mean parameter estimates: adaptive PE, 0.37 ± 0.014; nonadaptive PE, 0.29 ± 0.018). This invariance in the representation of striatal PEs is consistent with an adaptive coding scheme shown in dopaminergic neurons in primates (7).
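The difference between the two candidate regressors can be made concrete with a minimal sketch. It assumes that the PE is the outcome (1 or 0) minus the cued probability and that the nonadaptive variant simply scales this value by the reward magnitude in euros; the exact scaling and the function names are illustrative assumptions, not the authors' implementation.

```python
# Cue properties taken from the task description; magnitudes are in euros.
MAGNITUDES = {"high": 1.00, "low": 0.10}
PROBABILITIES = {"high": 0.66, "low": 0.33}


def adaptive_pe(received: bool, probability: float) -> float:
    """PE modulated only by probability: outcome (1 or 0) minus cued probability."""
    return (1.0 if received else 0.0) - probability


def nonadaptive_pe(received: bool, probability: float, magnitude: float) -> float:
    """Assumed nonadaptive form: the same PE scaled by the reward magnitude."""
    return magnitude * adaptive_pe(received, probability)


if __name__ == "__main__":
    # List both PE variants for every cue type and outcome.
    for mag_name, magnitude in MAGNITUDES.items():
        for prob_name, probability in PROBABILITIES.items():
            for received in (True, False):
                print(
                    f"magnitude={mag_name:4s} probability={prob_name:4s} "
                    f"received={received!s:5s} "
                    f"adaptive={adaptive_pe(received, probability):+.3f} "
                    f"nonadaptive={nonadaptive_pe(received, probability, magnitude):+.3f}"
                )
```

Under the adaptive scheme, a received low-probability reward yields the same PE for 1€ and 10ct, whereas under the assumed nonadaptive scheme the two values differ by a factor of ten.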

Having shown that PEs are adaptively coded in the striatum, our next analysis sought to determine whether there were dynamic changes in striatal coupling associated with this magnitude-dependent rescaling. We hypothesized that reward-sensitive brain regions with striatal innervations, specifically the vmPFC and the VTA/SN, would modulate striatal activity as a function of reward magnitude. We performed a whole-brain psychophysiological interaction (PPI) analysis where the striatal time series of PE-related activity (Fig. 3A) was selected as a physiological variable and the reward magnitude was selected as a psychological variable (high vs. low reward). Comparing striatal connectivity between reward magnitudes revealed significant modulations in the VTA/SN ([6, −8, −21]; P < 0.05, FWE SVC, t27 = 4.74; Fig. 3B) and the vmPFC ([3, 54, −3], BA10/BA9; P < 0.05, FWE SVC, t27 = 4.85; Fig. 3D, and see Table S2 for the whole-brain results). Specifically, coupling between the striatum and both the VTA/SN and the vmPFC was significantly weaker during high compared with low reward magnitude (Fig. 3C).

Fig. 3.

Striatal coupling gates adaptive coding of PEs. (A) The striatum, showing adaptive coding of PEs, was used as seed region in the functional connectivity analysis. (B and D) VTA/SN (B) and vmPFC (D) showing significant magnitude-dependent connectivity modulation with the striatum. (B Inset) Activity superimposed on a T2-weighted image. (D Inset) A coronal view. (C) Bar graph depicts significantly less midbrain–striatal and fronto–striatal connectivity during high compared with low reward magnitudes. (Error bars: SEM.)

Discussion

From visual neurons (25, 26) to reward-coding dopaminergic neurons, scale invariance is a ubiquitous encoding principle in the brain. Retinal neurons rapidly adapt to an enormous range of light to guarantee high visual discriminability. Dunn et al. (27) have shown that this adaptation is implemented via a relay from cone bipolar cells to ganglion cells, demonstrating that such a rapid rescaling of range occurs via influences of innervating neurons. Analogously, we demonstrate that PE-related activity in the human striatum adapts to the momentarily available reward magnitude, and this effect is driven by changes in neuronal dynamics associated with different reward magnitudes.

Adaptive coding of PEs enhances discriminability, so that the system remains sensitive to changes of all sizes; this makes coding efficient in neural circuits constrained by a limited firing range (28). Previous animal studies have provided evidence of neuronal adaptation in coding different aspects of reward. Whereas midbrain dopaminergic neurons adaptively code reward PEs (7), OFC neurons show adaptive coding of reward preference (15). OFC neurons also adapt their firing range according to the momentarily available reward range and distribution of rewards (16, 17). Furthermore, it has recently been shown that the primate lateral intraparietal cortex represents saccade values depending on other available choice options. This context dependency is precisely predicted by the divisive normalization mechanism (29). In humans, fMRI studies have also shown that BOLD signals in reward-sensitive areas reveal magnitude adaptation across possible rewards (14, 30–33).

In our data, we observe significantly less connectivity between the striatum and the mPFC/midbrain in high-reward compared with low-reward conditions (Fig. 3C). In the PPI analysis, nonspecific correlations across the brain were removed by regressing out the global mean from every voxel. However, this global-mean normalization shifts the correlation distribution to have a mean near zero and forces negative correlations to appear (34–36). Therefore, it is important to interpret only the difference in connectivity between task conditions.
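Why global-mean regression forces negative correlations can be illustrated with toy data. The sketch below is not the authors' pipeline: the simulated voxels, the helper names, and the simple least-squares regression are assumptions chosen only to show the algebraic effect. After the global mean is regressed out, the residuals sum to zero across voxels at every time point, so the covariances between a seed and all other voxels must sum to a negative number.

```python
import random

random.seed(0)  # fixed seed so the toy data are reproducible
N_TIME, N_VOXELS = 50, 8


def center(ts):
    """Subtract the temporal mean from a time series."""
    m = sum(ts) / len(ts)
    return [x - m for x in ts]


def regress_out(ts, nuisance):
    """Remove the least-squares fit of `nuisance` from `ts` (both zero-mean)."""
    beta = sum(a * b for a, b in zip(ts, nuisance)) / sum(b * b for b in nuisance)
    return [a - beta * b for a, b in zip(ts, nuisance)]


def covariance(a, b):
    """Covariance of two zero-mean series."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


# Toy voxels: every voxel shares one positive common signal plus private noise.
shared = [random.gauss(0, 1) for _ in range(N_TIME)]
data = [center([s + random.gauss(0, 1) for s in shared]) for _ in range(N_VOXELS)]

# Global mean across voxels at each time point, then regress it out of each voxel.
global_mean = [sum(v[t] for v in data) / N_VOXELS for t in range(N_TIME)]
residuals = [regress_out(v, global_mean) for v in data]

# Covariance of a seed voxel with every other voxel after the regression.
seed = residuals[0]
cov_with_others = [covariance(seed, other) for other in residuals[1:]]

if __name__ == "__main__":
    print("sum of covariances with other voxels:", sum(cov_with_others))
```

Although all simulated voxels were positively coupled before the regression, the summed covariance with the seed is negative afterwards, which is why only differences between task conditions are interpreted.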

Our PPI results are in accord with previous studies investigating the dynamics of neuronal activation in this anatomic network. The primate striatum is tightly interconnected with the midbrain as well as with cortical areas (18, 37–39). Specifically, the striatum receives dopaminergic input from midbrain regions, creating an ascending midbrain–striatal loop (13, 40). Furthermore, PFC activation modulates striatal dopamine release via inhibitory midbrain neurons (41). More specifically, PFC neurons activate GABAergic cells in the midbrain that in turn inhibit neighboring dopaminergic neurons projecting to the striatum (42). Thus, one possible pathway underlying our connectivity result is that high magnitudes of reward activate PFC neurons, thereby increasing midbrain GABA inhibition, which in turn results in reduced dopamine release in the striatum (42). This pathway may underlie the adjustment of PE signals during high-reward outcomes and explains the relative decrease in connectivity between the mPFC/midbrain and the striatum in high vs. low magnitudes.

Although our connectivity result is in line with the neuroanatomical and neurochemical systems, we acknowledge that it is not possible to identify the underlying neurotransmitters by using functional connectivity analyses with BOLD responses. Future studies are needed to investigate the neurochemical nature of this mechanism by means of positron emission tomography or pharmacological interventions.

When a specific reward magnitude is expected, dopaminergic neurons are not sensitive to the magnitude when coding the PE (7). Analogously, in the present study, all rewards were cued to ensure that rewards are expected before the outcome was presented. Also, we show that the striatal PE was invariant for high and low reward magnitudes, which is in line with other studies showing that PEs are insensitive to different reward magnitudes when cues signal the possible reward magnitude (7, 14, 32, 43). In contrast, once a reward outcome is presented unexpectedly, activity in the striatum is different for high vs. low reward magnitudes (44). Our connectivity results suggest that, with cued magnitude, the context-dependent regulation of striatal activity takes place via striatal coupling with the PFC and the midbrain. However, in the absence of a cue, when the outcome is delivered unexpectedly, no magnitude-dependent adaptation can occur, and the striatal activity might remain sensitive to different reward magnitudes.

Altogether, our results provide evidence that adaptive coding of PEs in humans is gated by striatal coupling. In line with recordings from primate dopamine neurons (7), our results show that striatal PEs do not differ for high and low reward magnitudes. Using an effective connectivity analysis, we show that mPFC and midbrain significantly modulate their coupling with the striatum in the face of a high reward magnitude, rendering striatal PEs comparable to the lower reward magnitude. Our results suggest an elaborate neural mechanism that facilitates the accurate representation of the reward value. The adaptive coding of PEs enables the brain to dynamically allocate its limited neural firing range to better discriminate among momentarily available reward outcomes. This mechanism provides a rich framework with which mechanistic hypotheses for pathologies that include abnormal value processing, such as drug addiction, can be examined.

Materials and Methods

Experimental Design.

In each trial of the task, a reward-predicting cue was presented on either the left or right side of fixation. Subjects were asked to press the corresponding button on a response box as fast as possible. Each cue contained information about both the probability (67% or 33%) and the magnitude (1€ or 10ct) of the possible reward. Accordingly, there were four different cues, indicating high probability of 1€, low probability of 1€, high probability of 10ct, and low probability of 10ct (Fig. 1A). After the button press, a green circle highlighted the stimulus, and after a variable interstimulus interval, the outcome cue was presented for 2 s. In high-reward trials, the outcome was an image of a 1€ coin (for received reward) or a red cross superimposed on the 1€ coin (for omitted reward). Analogously, in low-reward trials, subjects saw either a 10ct coin in received trials or a red cross superimposed on the 10ct coin in omitted trials. The actual reception or omission of reward was determined by the probability indicated by the cue. Subjects performed five sessions with 60 trials each. A total of 33 healthy subjects were tested in the study. Five subjects were excluded from the sample [one was removed because of extreme head movement (more than 3 mm or 3°) during scanning, three subjects aborted the scanning because they felt sick, and one subject was left-handed], resulting in a final sample size of n = 28 (13 females and 15 males; mean ± SD age, 25.04 ± 2.5 y).
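The trial structure described above can be sketched as a small schedule generator. Several details are assumptions made for illustration only: that the four cues occur equally often within a session, that each outcome is an independent random draw with the cued probability, and that cue side is chosen at random; the text does not specify these points, and all names are hypothetical.

```python
import random

# The four cue types: magnitude label and cued reward probability.
CUES = [
    {"magnitude": "1 euro", "probability": 0.67},
    {"magnitude": "1 euro", "probability": 0.33},
    {"magnitude": "10 ct", "probability": 0.67},
    {"magnitude": "10 ct", "probability": 0.33},
]
N_SESSIONS = 5
TRIALS_PER_SESSION = 60


def make_session(rng):
    """Build one shuffled session (assumption: cues are balanced within a session)."""
    trials = []
    for cue in CUES:
        for _ in range(TRIALS_PER_SESSION // len(CUES)):
            trials.append(
                {
                    **cue,
                    "side": rng.choice(["left", "right"]),
                    # Assumption: independent draw with the cued probability.
                    "received": rng.random() < cue["probability"],
                }
            )
    rng.shuffle(trials)
    return trials


rng = random.Random(1)  # fixed seed for a reproducible toy schedule
sessions = [make_session(rng) for _ in range(N_SESSIONS)]
n_trials = sum(len(session) for session in sessions)

if __name__ == "__main__":
    print("total trials:", n_trials)
```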

Behavioral Data Analysis.

To monitor subjects’ attention to the task, we analyzed the percentage of correct responses (indicating the location of the cue). All following analyses included only the correctly responded trials. We examined whether subjects established reward expectations during the reward-predicting cue by testing whether reaction times are influenced by both reward magnitude and probability. For this test, we computed a two-by-two (magnitude × probability) ANOVA with repeated measures on reaction time data.
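For a two-by-two design in which both factors vary within subjects, each effect has one degree of freedom, so its F statistic equals the squared one-sample t statistic of the corresponding per-subject contrast. The following sketch uses that identity; the reaction times are invented for illustration and the function names are hypothetical, so it is not a reproduction of the reported analysis.

```python
from math import sqrt
from statistics import mean, stdev


def contrast_f(scores):
    """F(1, n-1) of a within-subject contrast: the squared one-sample t statistic."""
    n = len(scores)
    t = mean(scores) / (stdev(scores) / sqrt(n))
    return t * t


def rm_anova_2x2(cells):
    """cells: one dict per subject with mean RTs for 'HH', 'HL', 'LH', 'LL'.

    First letter codes magnitude (High/Low), second letter codes probability.
    """
    magnitude = [(s["HH"] + s["HL"]) / 2 - (s["LH"] + s["LL"]) / 2 for s in cells]
    probability = [(s["HH"] + s["LH"]) / 2 - (s["HL"] + s["LL"]) / 2 for s in cells]
    interaction = [(s["HH"] - s["HL"]) - (s["LH"] - s["LL"]) for s in cells]
    return {
        "magnitude": contrast_f(magnitude),
        "probability": contrast_f(probability),
        "interaction": contrast_f(interaction),
    }


# Hypothetical mean reaction times (ms) for four subjects; not data from the study.
toy_cells = [
    {"HH": 548, "HL": 560, "LH": 563, "LL": 571},
    {"HH": 552, "HL": 561, "LH": 566, "LL": 577},
    {"HH": 540, "HL": 555, "LH": 559, "LL": 566},
    {"HH": 561, "HL": 566, "LH": 570, "LL": 580},
]
toy_result = rm_anova_2x2(toy_cells)

if __name__ == "__main__":
    for effect, f_value in toy_result.items():
        print(f"{effect}: F(1, {len(toy_cells) - 1}) = {f_value:.2f}")
```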

fMRI Data Acquisition and Preprocessing.

Functional imaging was conducted on a 3-T Siemens Trio scanner with a 12-channel head coil. In each of the five runs, 366 T2*-weighted gradient-echo echoplanar images containing 37 slices (3 mm thick) separated by a gap of 0.75 mm were acquired. Imaging parameters were as follows: repetition time (TR), 2,000 ms; echo time (TE), 30 ms; flip angle, 70°; matrix size, 64 × 64; field of view, 192 mm; and voxel size, 3 × 3 × 3.75 mm. T1-weighted and T2-weighted structural datasets were collected for the purpose of anatomical localization. The parameters for the T1-weighted dataset were as follows: TR, 1,900 ms; TE, 2.52 ms; matrix size, 256 × 256; field of view, 256 mm; 176 slices (1 mm thick); and flip angle, 9°. The parameters for the T2-weighted dataset were as follows: TR, 8,170 ms; TE, 0.93 ms; matrix size, 256 × 256; field of view, 256 mm; 48 slices (3 mm thick); and flip angle, 120°.
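The reported functional voxel size and the scan duration follow from the other acquisition parameters, as the short check below shows. It assumes that the through-plane voxel dimension is the slice thickness plus the inter-slice gap, which matches the reported 3.75 mm; the variable names are illustrative.

```python
# Functional acquisition parameters as reported in the text.
FIELD_OF_VIEW_MM = 192.0
MATRIX_SIZE = 64
SLICE_THICKNESS_MM = 3.0
SLICE_GAP_MM = 0.75
TR_SECONDS = 2.0
VOLUMES_PER_RUN = 366
N_RUNS = 5

# In-plane resolution: field of view divided by the matrix size.
in_plane_mm = FIELD_OF_VIEW_MM / MATRIX_SIZE

# Assumed through-plane spacing: slice thickness plus gap.
slice_spacing_mm = SLICE_THICKNESS_MM + SLICE_GAP_MM

# Duration of one run and of all functional runs together.
run_duration_s = VOLUMES_PER_RUN * TR_SECONDS
total_duration_min = N_RUNS * run_duration_s / 60

if __name__ == "__main__":
    print(f"voxel size: {in_plane_mm} x {in_plane_mm} x {slice_spacing_mm} mm")
    print(f"run duration: {run_duration_s} s; all runs: {total_duration_min} min")
```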

Functional data were analyzed with SPM5 (Wellcome Department of Imaging Neuroscience, University College London Institute of Neurology, London, UK). Images were slice timing-corrected, realigned, spatially normalized to a standard echoplanar image template of the Montreal Neurological Institute (MNI), resampled to 3-mm isotropic voxels, and spatially smoothed with an 8-mm full width at half maximum Gaussian kernel. All included subjects moved less than the size of a single voxel (3 mm).

Model-Based fMRI Data Analysis.

We computed a GLM with a parametric design (45) to identify brain regions coding PEs in an adaptive fashion. In each trial t, the PE δ_t was defined as

$$\delta_t = r_t - p_t$$

where r_t is the reward outcome (1 for received and 0 for omitted outcomes) and p_t is the expected probability of the reward (0.33 or 0.66). Note that the task is not a learning task because the reward probability and magnitudes were explicitly shown and did not change over the experiment. Four regressors were included in the GLM in the following order: (i) onset of the cue, (ii) onset of the outcome, (iii) parametric modulation of the outcome (coded as 1 when received and −1 when omitted), and (iv) parametric modulation of the trial-wise PEs. The PE regressor was created by parametrically modulating the stimulus function of the outcome by the normalized (mean = 0, SD = 1) trial-wise PEs. Importantly, the PE regressor (the fourth regressor) was orthogonalized with respect to the outcome-related (received vs. omitted) parametric regressor (the third regressor). All regressors were convolved with a canonical hemodynamic response function. Individual contrast images were computed for PE-related responses and taken to a second-level mixed-effect analysis using voxel-wise one-sample t tests.
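The construction of the two parametric modulators can be sketched at the trial level. The example assumes that the PE is the outcome minus the cued probability and that orthogonalization amounts to removing the least-squares projection onto the mean-centered outcome modulator. It is a simplified stand-in rather than the SPM5 routine, it omits convolution with the hemodynamic response function, and the toy trial list is invented.

```python
from statistics import mean, pstdev


def zscore(values):
    """Normalize to mean 0 and SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def orthogonalize(target, reference):
    """Remove from `target` the part linearly explained by `reference`."""
    t_c = [t - mean(target) for t in target]
    r_c = [r - mean(reference) for r in reference]
    beta = sum(t * r for t, r in zip(t_c, r_c)) / sum(r * r for r in r_c)
    return [t - beta * r for t, r in zip(t_c, r_c)]


# Hypothetical trials: (outcome r_t, cued probability p_t).
toy_trials = [(1, 0.66), (0, 0.66), (1, 0.33), (0, 0.33), (1, 0.66), (0, 0.33)]

# Third regressor: outcome coded as 1 when received and -1 when omitted.
outcome_mod = [1.0 if r else -1.0 for r, _ in toy_trials]

# Fourth regressor: normalized trial-wise PE, orthogonalized to the third.
pe = [r - p for r, p in toy_trials]
pe_orth = orthogonalize(zscore(pe), outcome_mod)

if __name__ == "__main__":
    for (r, p), value in zip(toy_trials, pe_orth):
        print(f"r={r} p={p:.2f} orthogonalized PE={value:+.3f}")
```

After this step the PE modulator no longer shares variance with received vs. omitted outcomes; what remains is driven by the cued probability.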

Reward Magnitude-Dependent Changes in Striatal Connectivity.

We performed a whole-brain PPI analysis (12, 13, 46) with the striatum as a seed region. Here, the entire time series over the experiment was extracted from each subject in the clusters of the striatum, in which activity significantly correlated with PE on the group level. To create the PPI regressor, we multiplied the normalized time series with two condition vectors containing ones for six TRs after each reward-magnitude type (one regressor for high and one for low magnitudes) and zeros otherwise. The method used here relies on correlations in the observed BOLD time-series data and makes no assumptions about the nature of the neural event contributing to the BOLD signal (13). The time window of six TRs (12 s) was selected to capture the entire hemodynamic response function, which peaks after three TRs and is back at baseline at approximately eight TRs after stimulus onset. These PPI regressors were used as covariates in a separate PPI-GLM, in which the following regressors were included: (i) cue onset, (ii) psychological regressor accounting for high-reward outcomes, (iii) psychological regressor accounting for low-reward outcomes, (iv) physiological regressor (i.e., the entire time series of the seed region over the whole experiment), (v) the PPI regressor for high-reward outcomes, and (vi) the PPI regressor for low-reward outcomes. The onset regressors were convolved with a hemodynamic response function. The resulting parameter estimates of the two PPI regressors represent the extent to which activity in each voxel correlates with activity in the striatum for each condition. Individual contrast images for functional connectivity during high vs. low reward magnitude were then computed and entered into a one-sample t test. We then identified voxels with a significant connectivity difference during low vs. high reward magnitude.
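The PPI regressors described above can be sketched as follows. The example assumes that the six-TR window starts at the scan in which the outcome appears; the seed time series and onsets are invented and the function names are hypothetical.

```python
from statistics import mean, pstdev

TR_SECONDS = 2.0
WINDOW_TRS = 6  # six TRs, i.e. 12 s, after each outcome


def condition_vector(n_scans, outcome_onsets, window=WINDOW_TRS):
    """Ones for `window` scans from each outcome onset (in TRs), zeros otherwise."""
    vec = [0.0] * n_scans
    for onset in outcome_onsets:
        for k in range(onset, min(onset + window, n_scans)):
            vec[k] = 1.0
    return vec


def ppi_regressor(seed_ts, condition):
    """Normalized seed time series multiplied element-wise by the condition vector."""
    m, s = mean(seed_ts), pstdev(seed_ts)
    z = [(x - m) / s for x in seed_ts]
    return [zi * ci for zi, ci in zip(z, condition)]


# Hypothetical seed time series (24 scans) and outcome onsets in TRs.
seed_ts = [0.2, -0.1, 0.4, 1.1, 0.9, 0.3, -0.2, -0.5, 0.0, 0.1, 0.6, 1.3,
           1.0, 0.2, -0.3, -0.6, -0.1, 0.2, 0.0, 0.1, 0.3, -0.2, 0.1, 0.0]
cond_high = condition_vector(len(seed_ts), outcome_onsets=[2])
cond_low = condition_vector(len(seed_ts), outcome_onsets=[10])

ppi_high = ppi_regressor(seed_ts, cond_high)
ppi_low = ppi_regressor(seed_ts, cond_low)

if __name__ == "__main__":
    print("window length in seconds:", WINDOW_TRS * TR_SECONDS)
    print("PPI (high):", [round(v, 2) for v in ppi_high])
    print("PPI (low): ", [round(v, 2) for v in ppi_low])
```

Each PPI regressor equals the normalized seed signal inside its condition window and zero elsewhere, so its parameter estimate reflects coupling with the seed during that condition only.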

We applied an omnibus threshold for all whole-brain analyses of P < 0.001, uncorrected with a cluster extent threshold of k = 10 (whole-brain results are shown in Tables S1 and S2). Correction for multiple comparisons (P < 0.05, FWE correction) was then performed for clusters surviving this threshold by using 12-mm spheres around previously reported peak voxels (SVC): for the ventral striatum and midbrain, [−8, 8, −4] and [8, −18, −20], respectively (6); for mPFC, [3, 54, −15] (25). All reported coordinates (x, y, z) are in MNI space.
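The geometry of the small-volume correction can be checked with a few lines. The sketch computes the Euclidean distance in MNI space between each reported peak and the center of the corresponding 12-mm sphere. Treating a point that lies exactly on the radius as inside is an assumption of this sketch, and the dictionary and function names are illustrative.

```python
from math import sqrt

SVC_RADIUS_MM = 12.0

# Sphere centers (previously reported peaks) and peaks reported in the Results.
SPHERE_CENTERS = {
    "ventral striatum": (-8, 8, -4),
    "midbrain": (8, -18, -20),
    "mPFC": (3, 54, -15),
}
REPORTED_PEAKS = {
    "ventral striatum": (-6, 18, -3),
    "midbrain": (6, -8, -21),
    "mPFC": (3, 54, -3),
}


def distance_mm(a, b):
    """Euclidean distance between two MNI coordinates in millimeters."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def in_sphere(point, center, radius=SVC_RADIUS_MM):
    """True if `point` lies within `radius` of `center` (boundary counts as inside)."""
    return distance_mm(point, center) <= radius


if __name__ == "__main__":
    for region, center in SPHERE_CENTERS.items():
        peak = REPORTED_PEAKS[region]
        print(f"{region}: distance {distance_mm(peak, center):.2f} mm, "
              f"inside sphere: {in_sphere(peak, center)}")
```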

Acknowledgments

This work was supported by the Excellence Initiative of the German Federal Ministry of Education and Research, Deutsche Forschungsgemeinschaft, Grants GSC86/1-2009 and EXC 302, the Max Planck Society, and Swiss National Science Foundation Grant 100014_130352.

Footnotes

The authors declare no conflict of interest.

*This Direct Submission article had a prearranged editor.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1119969109/-/DCSupplemental.

References

  • 1.Tversky A, Kahneman D. Judgment under uncertainty: Heuristics and biases. Science. 1974;185:1124–1131. doi: 10.1126/science.185.4157.1124. [DOI] [PubMed] [Google Scholar]
  • 2.Kahneman D, Tversky A. Prospect theory: Analysis of decision under risk. Econometrica. 1979;47(2):263–291. [Google Scholar]
  • 3.Rescorla RA, Wagner AR. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF, editors. Classical Conditioning II: Current Research and Theory. New York: Appleton-Century-Crofts; 1972. pp. 64–99. [Google Scholar]
  • 4.Sutton R, Barto A. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press; 1998. [Google Scholar]
  • 5.Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275:1593–1599. doi: 10.1126/science.275.5306.1593. [DOI] [PubMed] [Google Scholar]
  • 6.Bayer HM, Glimcher PW. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron. 2005;47(1):129–141. doi: 10.1016/j.neuron.2005.05.020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Tobler PN, Fiorillo CD, Schultz W. Adaptive coding of reward value by dopamine neurons. Science. 2005;307:1642–1645. doi: 10.1126/science.1105370. [DOI] [PubMed] [Google Scholar]
  • 8.Pessiglione M, Seymour B, Flandin G, Dolan RJ, Frith CD. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature. 2006;442:1042–1045. doi: 10.1038/nature05051. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Pagnoni G, Zink CF, Montague PR, Berns GS. Activity in human ventral striatum locked to errors of reward prediction. Nat Neurosci. 2002;5(2):97–98. doi: 10.1038/nn802. [DOI] [PubMed] [Google Scholar]
  • 10.McClure SM, Berns GS, Montague PR. Temporal prediction errors in a passive learning task activate human striatum. Neuron. 2003;38:339–346. doi: 10.1016/s0896-6273(03)00154-5. [DOI] [PubMed] [Google Scholar]
  • 11.O'Doherty JP, Dayan P, Friston K, Critchley H, Dolan RJ. Temporal difference models and reward-related learning in the human brain. Neuron. 2003;38:329–337. doi: 10.1016/s0896-6273(03)00169-7. [DOI] [PubMed] [Google Scholar]
  • 12.Park SQ, et al. Prefrontal cortex fails to learn from reward prediction errors in alcohol dependence. J Neurosci. 2010;30:7749–7753. doi: 10.1523/JNEUROSCI.5587-09.2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Kahnt T, et al. Dorsal striatal–midbrain connectivity in humans predicts how reinforcements are used to guide decisions. J Cogn Neurosci. 2009;21:1332–1345. doi: 10.1162/jocn.2009.21092. [DOI] [PubMed] [Google Scholar]
  • 14.Breiter HC, Aharon I, Kahneman D, Dale A, Shizgal P. Functional imaging of neural responses to expectancy and experience of monetary gains and losses. Neuron. 2001;30:619–639. doi: 10.1016/s0896-6273(01)00303-8. [DOI] [PubMed] [Google Scholar]
  • 15.Tremblay L, Schultz W. Relative reward preference in primate orbitofrontal cortex. Nature. 1999;398:704–708. doi: 10.1038/19525. [DOI] [PubMed] [Google Scholar]
  • 16.Padoa-Schioppa C. Range-adapting representation of economic value in the orbitofrontal cortex. J Neurosci. 2009;29:14004–14014. doi: 10.1523/JNEUROSCI.3751-09.2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Kobayashi S, Pinto de Carvalho O, Schultz W. Adaptation of reward sensitivity in orbitofrontal neurons. J Neurosci. 2010;30:534–544. doi: 10.1523/JNEUROSCI.4009-09.2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Haber SN, McFarland NR. The concept of the ventral striatum in nonhuman primates. Ann N Y Acad Sci. 1999;877:33–48. doi: 10.1111/j.1749-6632.1999.tb09259.x. [DOI] [PubMed] [Google Scholar]
  • 19.Kahnt T, Heinzle J, Park SQ, Haynes JD. The neural code of reward anticipation in human orbitofrontal cortex. Proc Natl Acad Sci USA. 2010;107:6010–6015. doi: 10.1073/pnas.0912838107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Philiastides MG, Biele G, Heekeren HR. A mechanistic account of value computation in the human brain. Proc Natl Acad Sci USA. 2010;107:9430–9435. doi: 10.1073/pnas.1001732107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Ferry AT, Ongür D, An X, Price JL. Prefrontal cortical projections to the striatum in macaque monkeys: Evidence for an organization related to prefrontal networks. J Comp Neurol. 2000;425:447–470. doi: 10.1002/1096-9861(20000925)425:3<447::aid-cne9>3.0.co;2-v. [DOI] [PubMed] [Google Scholar]
  • 22.Ongür D, Price JL. The organization of networks within the orbital and medial prefrontal cortex of rats, monkeys and humans. Cereb Cortex. 2000;10(3):206–219. doi: 10.1093/cercor/10.3.206. [DOI] [PubMed] [Google Scholar]
  • 23.Park SQ, Kahnt T, Rieskamp J, Heekeren HR. Neurobiology of value integration: When value impacts valuation. J Neurosci. 2011;31:9307–9314. doi: 10.1523/JNEUROSCI.4973-10.2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Kahnt T, Heinzle J, Park SQ, Haynes JD. Decoding different roles for vmPFC and dlPFC in multi-attribute decision making. Neuroimage. 2011;56:709–715. doi: 10.1016/j.neuroimage.2010.05.058. [DOI] [PubMed] [Google Scholar]
  • 25.Fairhall AL, Lewen GD, Bialek W, de Ruyter Van Steveninck RR. Efficiency and ambiguity in an adaptive neural code. Nature. 2001;412:787–792. doi: 10.1038/35090500. [DOI] [PubMed] [Google Scholar]
  • 26.Brenner N, Bialek W, de Ruyter van Steveninck R. Adaptive rescaling maximizes information transmission. Neuron. 2000;26:695–702. doi: 10.1016/s0896-6273(00)81205-2. [DOI] [PubMed] [Google Scholar]
  • 27. Dunn FA, Lankheet MJ, Rieke F. Light adaptation in cone vision involves switching between receptor and post-receptor sites. Nature. 2007;449:603–606. doi: 10.1038/nature06150.
  • 28. Preuschoff K, Bossaerts P. Adding prediction risk to the theory of reward learning. Ann N Y Acad Sci. 2007;1104:135–146. doi: 10.1196/annals.1390.005.
  • 29. Louie K, Grattan LE, Glimcher PW. Reward value-based gain control: Divisive normalization in parietal cortex. J Neurosci. 2011;31:10627–10639. doi: 10.1523/JNEUROSCI.1237-11.2011.
  • 30. Elliott R, Agnew Z, Deakin JF. Medial orbitofrontal cortex codes relative rather than absolute value of financial rewards in humans. Eur J Neurosci. 2008;27:2213–2218. doi: 10.1111/j.1460-9568.2008.06202.x.
  • 31. Nieuwenhuis S, et al. Activity in human reward-sensitive brain areas is strongly context dependent. Neuroimage. 2005;25:1302–1309. doi: 10.1016/j.neuroimage.2004.12.043.
  • 32. Bunzeck N, Dayan P, Dolan RJ, Duzel E. A common mechanism for adaptive scaling of reward and novelty. Hum Brain Mapp. 2010;31:1380–1394. doi: 10.1002/hbm.20939.
  • 33. Lohrenz T, McCabe K, Camerer CF, Montague PR. Neural signature of fictive learning signals in a sequential investment task. Proc Natl Acad Sci USA. 2007;104:9493–9498. doi: 10.1073/pnas.0608842104.
  • 34. Van Dijk KR, et al. Intrinsic functional connectivity as a tool for human connectomics: Theory, properties, and optimization. J Neurophysiol. 2010;103:297–321. doi: 10.1152/jn.00783.2009.
  • 35. Murphy K, Birn RM, Handwerker DA, Jones TB, Bandettini PA. The impact of global signal regression on resting state correlations: Are anti-correlated networks introduced? Neuroimage. 2009;44:893–905. doi: 10.1016/j.neuroimage.2008.09.036.
  • 36. Fox MD, Zhang D, Snyder AZ, Raichle ME. The global signal and observed anticorrelated resting state brain networks. J Neurophysiol. 2009;101:3270–3283. doi: 10.1152/jn.90777.2008.
  • 37. Haber SN, Kunishio K, Mizobuchi M, Lynd-Balta E. The orbital and medial prefrontal circuit through the primate basal ganglia. J Neurosci. 1995;15:4851–4867. doi: 10.1523/JNEUROSCI.15-07-04851.1995.
  • 38. Haber SN, Fudge JL, McFarland NR. Striatonigrostriatal pathways in primates form an ascending spiral from the shell to the dorsolateral striatum. J Neurosci. 2000;20:2369–2382. doi: 10.1523/JNEUROSCI.20-06-02369.2000.
  • 39. Haber SN, Knutson B. The reward circuit: Linking primate anatomy and human imaging. Neuropsychopharmacology. 2010;35(1):4–26. doi: 10.1038/npp.2009.129.
  • 40. Ikemoto S. Dopamine reward circuitry: Two projection systems from the ventral midbrain to the nucleus accumbens–olfactory tubercle complex. Brain Res Brain Res Rev. 2007;56(1):27–78. doi: 10.1016/j.brainresrev.2007.05.004.
  • 41. Frankle WG, Laruelle M, Haber SN. Prefrontal cortical projections to the midbrain in primates: Evidence for a sparse connection. Neuropsychopharmacology. 2006;31:1627–1636. doi: 10.1038/sj.npp.1300990.
  • 42. Karreman M, Moghaddam B. The prefrontal cortex regulates the basal release of dopamine in the limbic striatum: An effect mediated by ventral tegmental area. J Neurochem. 1996;66:589–598. doi: 10.1046/j.1471-4159.1996.66020589.x.
  • 43. Yacubian J, et al. Dissociable systems for gain- and loss-related value predictions and errors of prediction in the human brain. J Neurosci. 2006;26:9530–9537. doi: 10.1523/JNEUROSCI.2915-06.2006.
  • 44. Delgado MR, Locke HM, Stenger VA, Fiez JA. Dorsal striatum responses to reward and punishment: Effects of valence and magnitude manipulations. Cogn Affect Behav Neurosci. 2003;3(1):27–38. doi: 10.3758/cabn.3.1.27.
  • 45. Büchel C, Holmes AP, Rees G, Friston KJ. Characterizing stimulus-response functions using nonlinear regressors in parametric fMRI experiments. Neuroimage. 1998;8(2):140–148. doi: 10.1006/nimg.1998.0351.
  • 46. Friston KJ, et al. Psychophysiological and modulatory interactions in neuroimaging. Neuroimage. 1997;6(3):218–229. doi: 10.1006/nimg.1997.0291.

