Summary
Reinforcement learning (RL) theories posit that dopaminergic signals are integrated within the striatum to associate choices with outcomes. Often overlooked is that the amygdala also receives dopaminergic input and is involved in Pavlovian processes that influence choice behavior. To determine the relative contributions of the ventral striatum (VS) and amygdala to appetitive RL, we tested rhesus macaques with VS or amygdala lesions on deterministic and stochastic versions of a two-arm bandit reversal learning task. When learning was characterized with an RL model, amygdala lesions, relative to controls, caused general decreases in learning from positive feedback and in choice consistency. By comparison, VS lesions only affected learning in the stochastic task. Moreover, VS lesions shortened the monkeys’ choice reaction times, revealing a speed-accuracy tradeoff that accounted for errors in deterministic learning. These results update standard accounts of RL by emphasizing distinct contributions of the amygdala and VS to RL.
Keywords: amygdala, ventral striatum, reinforcement learning, decision making, speed-accuracy tradeoff, Bayesian, Rescorla-Wagner, Pearce-Hall, associability, lesion
Introduction
Reinforcement learning refers to learning the value of states in the environment and actions taken in those states. A common assumption about RL is that dopamine neurons in the midbrain signal reward prediction errors (RPEs; Schultz, 2015), which drive plasticity in the striatum to facilitate learning (e.g. Collins and Frank, 2014; Gurney et al., 2015; Houk et al., 1995). This view is supported by evidence that RPEs are correlated with ventral striatum BOLD responses (e.g. O'Doherty et al., 2004; O'Doherty et al., 2003; Rutledge et al., 2010) and fluctuations in striatal dopamine levels (e.g. Day et al., 2007; Hart et al., 2014). However, VS inactivations or lesions do not impair instrumental learning or conditioned reinforcement (Cardinal et al., 2002; Floresco, 2015; Stern and Passingham, 1996). This raises a long-standing question about the causal role of the VS in RL. It also highlights a need to understand how other parts of the mesolimbic dopamine system contribute to RL.
The amygdala is often overlooked or assigned a modulatory role in RL (Haber and Behrens, 2014). The amygdala, however, receives dopaminergic inputs (Haber and Fudge, 1997) and is heavily involved in Pavlovian learning of appetitive and aversive associations (Baxter and Murray, 2002; Janak and Tye, 2015; Moscarello and LeDoux, 2014). Although the amygdala exerts Pavlovian control over operant behavior (Cardinal et al., 2002; Robinson et al., 2014; Stuber et al., 2011), it remains unclear whether it contributes directly to choice behavior. There is evidence across species both in favor of (Hampton et al., 2007; Rygula et al., 2015; Seymour and Dolan, 2008) and against (Izquierdo et al., 2013; Izquierdo and Murray, 2007; Jang et al., 2015) the amygdala playing a role in RL during choice tasks.
To determine the causal roles of the amygdala and VS in RL, the present study used computational modeling to quantify the effect of targeted excitotoxic lesions of the amygdala or VS on monkeys’ choice behavior in a two-arm bandit task. We show that VS is necessary for learning stochastic reward associations, and that the amygdala contributes substantially to RL in both stochastic and deterministic environments.
Results
We tested rhesus macaques on a two-arm bandit reversal learning task using deterministic or stochastic reward schedules (Fig. 1A). The subjects included four monkeys with bilateral excitotoxic lesions of the amygdala (Fig. 1B), three monkeys with bilateral excitotoxic lesions of the VS (Fig. 1C), and four unoperated controls. The monkeys were first tested on randomly interleaved blocks following one of three probabilistic reward schedules: 80/20, 70/30, and 60/40. Learning was impaired in both lesion groups compared to controls. To determine whether their deficits were specific to the stochastic task, we next tested the monkeys on the same task using a deterministic reward schedule (100/0). The task was carried out in blocks of 80 trials. On each trial a pair of visual cues was displayed as the choice options. One cue was associated with a greater probability of reward delivery than the other. The monkeys selected one cue per trial by making a saccade to it. By sampling the stimulus-reward outcomes across trials, they could determine which of the two cues was more often rewarded. The stimulus-reward mappings of the two options were reversed once per block, on a randomly chosen trial between trials 30 and 50. At the end of the block, two new cues were introduced (signaling the start of a new block), and the monkeys again had to discover by trial and error which of the two options was more often rewarded.
Figure 1. Two-arm bandit task and lesion maps.
A. The task unfolded over a block of 80 trials. In each trial, the animals first fixated a central point, followed by the presentation of two stimuli, left and right of fixation. The animals selected one of the stimuli by making a saccade to it and fixating the chosen cue. In each block one object was rewarded more often than the other, according to the current schedule (100/0%, 80/20%, 70/30% or 60/40%). Stimulus-reward mappings were reversed randomly between trials 30–50. A new stimulus pair was introduced at the start of each block, and multiple blocks were completed per session. B and C. Lesion extent mapped for animals with bilateral excitotoxic lesions of amygdala or VS.
To assess motivation, we quantified the percentage of trials per session aborted due to a failure to acquire or hold central fixation, or a failure to make a choice-related saccade. The lesion (VS: 12.83%; amygdala: 12.78%) and control (11.16%) groups did not differ in the percentage of trials aborted due to fixation errors (F(2,204) = 1.77, p = .173). In addition, the lesion (VS: 0.31%; amygdala: 0.26%) and control (0.01%) groups did not differ in the percentage of trials aborted due to choice errors (F(2,205) = 1.82, p = .165). We also computed how quickly the monkeys acquired central fixation. Fixation RTs were faster in the VS lesion group compared to the amygdala (t(197) = 17.4, p < .001) or control groups (t(157) = 16.23, p < .001). Fixation RTs were similar in the amygdala and control groups (t(204) = −1.15, p = .251; Table S1). Therefore, lesion effects on RL are not attributable to reduced motivation.
Choice behavior
To visualize the monkeys’ choice behavior, we aligned the trials in each block around the reversal point estimated by a Bayesian change-point analysis (Fig. 2; Supplemental Methods). In both the deterministic (Fig. 2A) and stochastic (Fig. 2B) tasks, monkeys learned to select the better option and to switch their choice behavior once they detected the contingency reversal. To quantify lesion effects on choice behavior, we computed the fraction of correct choices in the acquisition and reversal phases (Fig. 2C). Group performance differed in the deterministic task when averaged across acquisition and reversal (F(2,41) = 74.59, p < .001). The striatal lesion group performed worse than the controls (t(19) = −7.79, p < .001), and the monkeys with amygdala lesions performed worse than those with striatal lesions (t(33) = −6.23, p < .001).
Figure 2. Choice behavior for the lesion and control groups.
A and B. Fraction of times each group chose the cue with the higher initial value in the deterministic (A) and stochastic (B) tasks. Dark lines show means; shaded regions show ± 1 S.E.M. (computed across sessions). Dotted vertical lines show the reversal point. Choice curves were smoothed with a moving average window of six trials. Because the number of trials before and after acquisition varied across blocks, trial number was normalized to be between 0 and 1 for each phase, and then averaged. C. Overall fraction of correct choices for each group by reward schedule and learning phase (see Table S1). D. Logistic regression coefficients (symbols) predicting the monkeys’ current choice from previous choices and outcomes. Two predictors per lag indicated whether prior selection of each option was rewarded or unrewarded. A positive coefficient on either predictor indicates a greater likelihood of repeating a past choice that yielded the corresponding outcome (reward or no reward). Lines indicate exponential fits of the estimated regression coefficients.
Choice accuracy also differed by group in the stochastic task (Fig. 2C; F(2,232) = 289.71, p < .001). Again, monkeys with striatal lesions chose the correct option less often than controls (t(145) = −13.77, p < .001) but more often than animals with amygdala lesions (t(156) = 7.41, p < .001). The control group also showed improved performance in the reversal compared to the acquisition phase, whereas performance in the VS and amygdala lesion groups declined in the reversal phase (Group × Phase, F(2,232) = 15.57, p < .001). Although the reward schedule influenced performance (F(2,461) = 407.57, p < .001), controls showed greater linear improvement relative to the VS (t(142) = 7.7, p < .001) or amygdala lesion groups (t(160) = 11.48, p < .001; Group × Schedule, F(4,461) = 29.05, p < .001). Schedule-related performance increases were equivalent in the two lesion groups (t(154) = 1.53, p = .131).
We also used a logistic regression model to quantify how previous trial choices and outcomes influenced the monkeys’ ongoing choice behavior (Fig. 2D). A positive regression coefficient indicated an increased likelihood that the monkeys chose the same rewarded or unrewarded option they chose on a past trial. In both tasks, the monkeys’ current choices were influenced more by past choices that were rewarded versus unrewarded. In the deterministic (Group × Outcome, F(2,42) = 4.85, p = .009) and stochastic (Group × Outcome, F(2,208) = 261.07, p < .001) tasks, regression coefficients differed between the lesion and control groups as a function of outcome. Across tasks, the mean regression coefficient weighting rewarded outcomes was larger in controls compared to the VS (deterministic: t(19) = 7.12, p < .001; stochastic: t(143) = 13.96, p < .001) or amygdala lesion groups (deterministic: t(30) = 11.37, p < .001; stochastic: t(163) = 18.67, p < .001). In the deterministic task, the mean regression coefficient for unrewarded outcomes did not differ between the lesion and control groups (F(2,42) = 3.09, p = .056). In the stochastic task, however, unrewarded outcomes were weighted less in controls compared to the amygdala (t(164) = −10.6, p < .001) or VS lesion groups (t(144) = −6.52, p < .001).
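For readers who want to reproduce this style of analysis, a minimal sketch of the lagged outcome regression follows (Python; illustrative only and not the authors’ code, so the variable names and the five-trial history are assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lagged_outcome_design(choice, reward, n_lags=5):
    """Build signed lagged predictors: for each lag, one regressor for
    rewarded past choices and one for unrewarded past choices.
    choice: 0/1 array (option chosen); reward: 0/1 array."""
    signed = 2 * choice - 1              # option 1 -> +1, option 0 -> -1
    rew = signed * reward                # nonzero only on rewarded trials
    unrew = signed * (1 - reward)        # nonzero only on unrewarded trials
    X, y = [], []
    for t in range(n_lags, len(choice)):
        X.append(np.concatenate([rew[t - n_lags:t][::-1],   # lags 1..n
                                 unrew[t - n_lags:t][::-1]]))
        y.append(choice[t])
    return np.asarray(X), np.asarray(y)

# Toy data: 500 random trials stand in for one session.
rng = np.random.default_rng(0)
choice = rng.integers(0, 2, 500)
reward = rng.integers(0, 2, 500)
X, y = lagged_outcome_design(choice, reward)
coefs = LogisticRegression().fit(X, y).coef_.reshape(2, -1)
# coefs[0]: weights on rewarded lags; coefs[1]: weights on unrewarded lags
```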
Reinforcement learning
Model selection
We assessed whether an RL model fit the monkeys’ choice behavior better than alternative models (Fig. 3; Table S2). We compared Bayesian Information Criterion (BIC) estimates for four models: (1) a feedback dependent RL model that incorporated different learning rate parameters for positive and negative outcomes, (2) a non-feedback dependent RL model (e.g. Rescorla-Wagner), (3) a win-stay/lose-shift model, and (4) a Bayesian ideal observer model (Supplemental Experimental Methods). BIC estimates were lowest overall when we fit the feedback dependent RL model to each group’s choice behavior in the deterministic (Fig. 3A) and stochastic (Fig. 3B) tasks. Overlaying averaged model predictions on the actual choice behavior showed that this model accurately fit the monkeys’ choices (Fig. 3C), as did scatterplots of predicted and measured choice behavior (Fig. 3D). Therefore, we proceeded by comparing feedback dependent RL model parameter fits for the lesion and control groups.
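For concreteness, BIC penalizes each model’s maximized log-likelihood by its parameter count; a minimal sketch of the computation follows (Python; the log-likelihood values shown are hypothetical, not the study’s estimates):

```python
import numpy as np

def bic(log_lik, n_params, n_trials):
    """Bayesian Information Criterion; lower values indicate a better
    fit after penalizing model complexity."""
    return n_params * np.log(n_trials) - 2.0 * log_lik

# Hypothetical session of 400 trials. The feedback dependent RL model has
# three parameters (alpha+, alpha-, beta); Rescorla-Wagner has two.
bic_feedback = bic(log_lik=-210.4, n_params=3, n_trials=400)
bic_rw = bic(log_lik=-215.9, n_params=2, n_trials=400)
print(bic_feedback, bic_rw)   # the model with the lowest BIC is preferred
```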
Figure 3. Model comparisons indicated choice behavior was best fit by a feedback dependent RL model.
A and B. Bayesian information criterion (BIC) estimates for the different models fit to each group’s choice behavior in the deterministic (A) and stochastic (B) tasks (see Table S2). Filled bars indicate the model with the lowest BIC estimate summed across sessions. Numbers within each bar give the percentage of sessions in which BIC was lowest for that model. C. Feedback dependent model predictions overlaid on choice behavior for each group and reward schedule. D. Scatterplots of session-level averaged trial-by-trial estimates of model predictions and monkeys’ choice behavior.
Feedback dependent learning
We first analyzed how quickly the monkeys learned to update the expected value of each choice option based on positive and negative feedback. Positive and negative learning rates were fit for choices that resulted in reward or no reward, respectively. In the deterministic task (Fig. 4A), positive and negative learning rates did not differ (F(1,42) = 3.44, p = .071). In the stochastic task (Fig. 4B), learning was driven more by positive than negative feedback (F(1,231) = 273.84, p < .001), and mean learning rates were higher in the reversal compared to the acquisition phase (F(1,234) = 45.16, p < .001). There were no effects of reward schedule on learning rates in the stochastic task (p > .6 for all main effects and interactions).
Figure 4. Learning rates and choice consistency.
A and B. Feedback dependent learning rates for the lesion and control groups when the choice behavior is fit with an RL model in the deterministic (A) and stochastic (B) tasks. C and D. Inverse temperature values for the lesion and control groups in the deterministic (C) and stochastic (D) tasks broken out by learning phase. Error bars are ± 1 SEM. See Table S3 for average parameter values.
Next, we examined group differences in learning rates. In the deterministic task, the groups differed in their mean learning rate (F(2,42) = 12.93, p < .001; Fig 4A). Specifically, the mean learning rate was reduced in monkeys with amygdala lesions compared to those with VS lesions (t(34) = −3.99, p < .001) or intact controls (t(30) = −4.23, p < .001). The mean learning rate did not differ between the VS lesion and control groups (t(19) = 0.85, p = .431). Thus, in a deterministic reward environment, learning rates were only impaired in the monkeys with amygdala lesions.
When we analyzed the learning rates in the stochastic task, mean learning rates were lower in both the amygdala and VS lesion groups (F(2,234) = 37.18, p < .001; Fig. 4B). Learning from positive feedback was reduced in the monkeys with amygdala (t(165) = −8.64, p < .001) or VS lesions (t(146) = −8.14, p < .001) compared to controls. Learning from negative feedback was also reduced in the amygdala lesion group (t(166) = −3.42, p < .001) relative to controls. Negative learning rates did not differ between the VS lesion and control groups (t(146) = 2.4, p = .116). To summarize, monkeys with amygdala lesions showed deficits in feedback dependent learning in both the stochastic and deterministic tasks, whereas the monkeys with VS lesions showed learning deficits only in the stochastic task relative to controls.
Because the VS lesion group was more affected in the stochastic task, we also examined how learning rates changed across the two tasks. For the stochastic task we averaged learning rates across schedules and directly compared them to learning rates in the deterministic task. Negative learning rates were lower in the stochastic versus deterministic task (F(1,41) = 25.93, p < .001), and this effect was consistent in the lesion and control groups (Group × Task, F(2,41) = 1.19, p = .314). However, there were clear task related differences between the lesion and control groups in learning from positive feedback (Group × Task, F(2,41) = 8.96, p < .001). For the VS lesion (t(11) = −4.58, p < .001) and control (t(8) = −6.59, p < .001) groups, positive learning rates decreased in the stochastic versus deterministic task. This was not the case for the amygdala lesion group, in which positive learning rates were equivalent in the stochastic and deterministic tasks (t(22) = 0.43, p = .666). Thus, task effects were confined to learning from positive feedback, supporting the differences found between the lesion and control groups in each task.
Choice consistency
We next quantified choice consistency using the inverse temperature parameter β from the RL model. This parameter quantifies how consistently the monkeys chose the higher value option; smaller inverse temperature values indicate noisier choice behavior. For both the deterministic (F(1,42) = 11.32, p = .002) and stochastic tasks (F(1,232) = 39.74, p < .001), choice consistency was higher in the acquisition than the reversal phase (Fig. 4C and D).
In both the deterministic (F(2,42) = 11.16, p < .001) and stochastic (F(2,228) = 39.74, p < .001) tasks, choice consistency was reduced in the lesion groups relative to controls (Fig. 4C and D). Compared to controls, monkeys with amygdala lesions (deterministic: t(31) = −4.42, p < .001; stochastic: t(155) = 14.28, p < .001) or VS lesions (deterministic: t(20) = −2.62, p = .016; stochastic: t(143) = −6.7, p < .001) chose the higher value option less consistently. In the deterministic task, the lesion groups showed equivalent decreases in choice consistency (t(33) = −1.82, p = .078), whereas in the stochastic task the amygdala lesion group was less consistent than the VS lesion group (t(158) = −2.44, p = .016). Also, in the stochastic task the reward schedule modulated choice consistency only in the control group (Group × Schedule, F(4,476) = 2.51, p = .041). In controls (Schedule, F(2,155) = 5.09, p = .007), choice consistency was higher in the 80/20 schedule compared to the 70/30 (t(156) = 2.38, p = .018) or 60/40 (t(156) = 3.03, p = .002) schedules, but did not differ between the latter two schedules.
When we examined how inverse temperature values changed across the two tasks, we found they were slightly higher in the stochastic compared to the deterministic task (F(1,216) = 4.79, p = .031), and this effect was consistent in the lesion and control groups (Group × Task, F(2,181) < 1, p = .723). Therefore, the monkeys with VS and amygdala lesions exhibited noisier choice behavior in general.
Reaction time effects on choice consistency
Choice RTs differed between the lesion and control animals (Fig. 5; Table S1). The monkeys with VS lesions were faster than the monkeys with amygdala lesions or controls in both the deterministic (F(2,42) = 84.81, p < .001) and stochastic (F(2,228) = 802.6, p < .001) tasks. Consistent with this result, the choice RT distribution of the VS lesion group was shifted leftward relative to the control (KS test, deterministic: p < .001; stochastic: p < .001) or amygdala lesion (deterministic: p < .001; stochastic: p < .001) groups. Choice RTs did not differ between the amygdala lesion and control groups in either task (all ps > .14). In addition, RTs did not differ by task (F(1,250) < 1, p = .797), and there were no effects of reward schedule on choice RTs in any group.
Figure 5. Striatal lesion effects on choice reaction times and speed-accuracy tradeoff.
A and B. Upper plots, with the ordinate oriented to the left, illustrate how the chosen softmax probability varied as a function of choice RT in the deterministic (A) and stochastic (B) tasks. Larger softmax probabilities indicate more consistent selection of the higher value option. Gaussian kernel regression was used to estimate mean softmax probabilities as a function of RT (see Experimental Procedures). The lower plots, with the ordinate oriented to the right, show RT probability distributions for each group (20 ms bins; see Table S1). Line thickness of the estimated softmax probability indicates the relative density of the RT probability distribution at that bin. C. Linear correlation coefficient between chosen softmax probabilities and RTs, for each reward schedule and lesion group.
The faster choice RTs observed in the VS lesion group suggest that their less consistent selection of high value options reflects a speed-accuracy tradeoff (i.e., faster responses led to less accurate choices). To visualize how choice RTs were related to choice accuracy, we extracted the softmax probability of each of the monkeys’ choices from the RL model. These estimates index the model’s probability of the option chosen on a given trial, with larger values reflecting more consistent selection of the higher value option. We then used Gaussian kernel regression to estimate the function relating RTs to choice probabilities for each block, sampled the function in equally spaced 20 ms RT bins, and averaged the function estimates across blocks. This was done for both the deterministic and stochastic tasks (Fig. 5A and B). In each task, choice accuracy increased with RT from about 100 ms up to approximately 225 ms, after which it plateaued and then declined as RT increased further. This relationship between RTs and choice probabilities held in all three groups, although the curves were vertically shifted.
To quantify the relationship between RTs and choice probabilities we computed the correlation coefficient for each block of trials and averaged them by reward schedule and session (Fig. 5C). The correlation magnitude increased along with the reward schedule (F(2,457) = 4.16, p = .016), confirming the animals slowed down as it became easier to discriminate the value of the two choice options. The lesion and control groups differed in the magnitude and sign of the correlation coefficient relating choice RT and accuracy. In the deterministic task (Group, F(2,42) = 5.87, p = .006), direct comparison of correlation coefficients by group confirmed a larger, positive correlation in the VS lesion group, compared to the amygdala lesion (t(37) = 2.46, p =.023) or control groups (t(20) = 3.19, p = .005). Similarly in the stochastic task, collapsed across schedules, there was a larger, positive correlation in the VS lesion group, compared to the amygdala (t(153) = 7.46, p < .001) or control groups (t(143) = 3.12, p < .001).
The VS lesion group’s choice RTs were overall faster and showed a speed-accuracy tradeoff. Their less consistent selection of high value options might therefore reflect impulsive responding, especially in the deterministic task, where striatal lesions had a minor effect on choice consistency relative to controls. To test this hypothesis, we matched the choice RT distributions of the lesion groups and controls, pairwise, and computed the fraction of times the monkeys in each group chose the higher value option in each task.
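One plausible way to implement this matching is to subsample trials bin by bin so that both groups retain the same number of trials in each 20 ms RT bin. The sketch below (Python) illustrates the idea at the group level; the pairwise, animal-by-animal bookkeeping used in the study is omitted and should be treated as an assumption:

```python
import numpy as np

def match_rt_distributions(rt_a, rt_b, bin_ms=20, rng=None):
    """Subsample trials from two groups so their RT histograms match.
    rt_a, rt_b: RT arrays (ms). Returns retained trial indices per group."""
    rng = rng or np.random.default_rng()
    edges = np.arange(0, max(rt_a.max(), rt_b.max()) + bin_ms, bin_ms)
    keep_a, keep_b = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        ia = np.flatnonzero((rt_a >= lo) & (rt_a < hi))
        ib = np.flatnonzero((rt_b >= lo) & (rt_b < hi))
        n = min(len(ia), len(ib))                 # matched count in this bin
        keep_a.extend(rng.choice(ia, n, replace=False))
        keep_b.extend(rng.choice(ib, n, replace=False))
    return np.sort(keep_a), np.sort(keep_b)
```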
When the choice RT distributions of the VS and control groups were matched in the deterministic task (Fig. 6A and C), the fraction of correct choices was equivalent in the VS lesion and control groups (t(19) = 1.294, p = .211). This indicates the effect of VS lesions on choice consistency in the deterministic task was limited to trials in which the animals responded too quickly. In the stochastic task, however, equating group choice RT distributions did not mitigate previously identified differences in choice accuracy between the VS lesion group and controls (F(1,141) = 36.6, p <.001; Fig. 6B and C). For example, in the 80/20 schedule choice performance was consistently lower in the VS lesion group compared to controls, after matching the choice RT distributions (t(143) = −5.73, p < .001; Fig. 6B).
Figure 6. Faster RTs in the VS lesion group modulated choice consistency in the deterministic task.
A. Fraction of times the monkeys in the lesion and control groups chose the cue with the higher initial value in the deterministic task when RT distributions were matched. RTs were matched pairwise between each of the animals in the lesion and control groups. Specifically, the data labeled Striatum and Controls:Striatum plot the behavioral performance when the RT distributions of these two groups were matched, and likewise for the data labeled Amygdala and Controls:Amygdala. B. Fraction of times the animals in each group chose the cue with the higher initial value in the 80/20 blocks of the stochastic task, when RT distributions were matched as described in A. C. Fraction of correct choices for the lesion and control groups broken out by reward schedule when group RT distributions were matched. Shaded regions and error bars indicate ± 1 SEM.
The amygdala lesion and control groups did not differ in their choice RT distributions. Thus, we did not expect matching these groups’ choice RTs to eliminate differences in choice consistency. Even after choice RTs were matched, monkeys with amygdala lesions showed decreased choice consistency relative to controls in the deterministic (t(30) = −5.88, p < .001) and stochastic (F(1,149) = 196.6, p < .001) tasks.
Pearce-Hall error learning
RPEs can promote learning directly, or indirectly by altering attention to reward predictive cues (Roesch et al., 2012). The amygdala appears to use error signals to modulate attention during learning (Roesch et al., 2012; Holland and Schiffino, 2016), whereas the VS (Li et al., 2011) and midbrain dopamine neurons (Roesch et al., 2012) encode RPEs directly. Therefore, we fit a hybrid Pearce-Hall (PH) model (Li et al., 2011) to the monkeys’ choices to examine how amygdala and VS lesions affected attentional gating of learning. This model contains an associability term that modulates the learning rate as a function of the absolute magnitude of past RPEs: when RPEs are large, associability (i.e. attention) increases. The model has two free parameters, κ and η (Table S3). The κ parameter is a fixed scalar that modulates value updating for each cue, akin to the learning rate parameter in the feedback dependent RL model (to control model complexity, we did not fit separate learning rate parameters for positive and negative feedback). The η parameter is a decay parameter that controls the temporal dynamics of associability across trials.
The fitted κ parameters were consistent with the learning rate results obtained with the feedback dependent RL model. In the deterministic task, κ (Fig. 7A) was reduced in the monkeys with amygdala lesions relative to controls (t(30) = 2.32, p = .027). The value of κ in the VS lesion group was intermediate and did not differ from the amygdala lesion (t(33) < 1, p = .517) or control groups (t(19) = −1.56, p = .136; Group, F(2,42) = 2.62, p = .084). In the stochastic task, by contrast, κ was reduced in both the amygdala (t(163) = −5.13, p < .001) and VS (t(145) = −3.78, p < .001) lesion groups compared to controls (F(2,233) = 14.16, p < .001).
Figure 7. Pearce-Hall model parameter fits to choice behavior.
A–C. Fitted parameter estimates for κ (fixed learning rate), η (decay rate of associability) and β (choice consistency). See Table S3 for average parameter values. Error bars indicate ± 1 SEM. D. BIC estimates for each group and task when choices were fit using a PH or RW model.
In the deterministic task, η values only marginally differed between the lesion and control groups (F(2,41) = 3.08, p = .057; Fig. 7B). Planned post-hoc comparisons did indicate that η was reduced in monkeys with amygdala lesions relative to those with VS lesions (t(33) = −2.6, p = .014). However, η values in both lesion groups were equivalent to those estimated for controls (ps > .13). In the stochastic task, there were again only marginal differences in η values between the lesion and control groups (F(2,235) = 2.94, p = .054). Planned post-hoc comparisons did indicate a larger η value in the amygdala lesion group compared to controls (t(166) = 2.49, p = .016), but not in comparison to the VS lesion group (t(157) < 1, p = .345). Thus, we found relatively weak evidence that amygdala or VS lesions affected associability.
Because we fit the PH model to the monkeys’ choice behavior, we also fit an inverse temperature parameter to measure choice consistency. Similar to our results using the feedback dependent RL model, inverse temperature values were lower in the animals with lesions of the amygdala or VS compared to controls, both in the deterministic (F(2,41) = 6.01, p = .005) and stochastic (F(2,212) = 62.13, p < .001) tasks.
The PH hybrid model is an extension of the Rescorla-Wagner (RW) learning rule. The RW algorithm has a fixed learning rate and no associability term. To determine which model (PH or RW) was more parsimonious, we computed BIC estimates to summarize how well each model fit the monkeys’ choices (Fig. 7D). In each task and in all three groups, the RW model, with one fewer parameter, yielded the more parsimonious fit.
Discussion
We combined computational modeling with selective lesions of the amygdala or VS to determine the causal role of these regions in two facets of RL: feedback dependent learning and choice consistency. Relative to controls, learning from positive feedback was generally impaired in monkeys with amygdala lesions, whereas monkeys with VS lesions were impaired only when learning stochastic associations. Both the VS and amygdala lesion groups chose higher value options less consistently than controls, although in the deterministic task the decreased consistency in the VS lesion group was attributable to their overall faster choice RTs. In addition, we did not find strong evidence that the primate amygdala critically encodes Pearce-Hall associability in bandit tasks.
Ventral striatum is only necessary for learning stochastic reward associations
Neurocomputational accounts of RL implicate VS in value updating (Collins and Frank, 2014; Schultz, 2015). On the other hand, inactivation of VS in rodents does not impair instrumental learning or conditioned reinforcement (Cardinal et al., 2002; Dalton et al., 2014; Floresco, 2015). Likewise, excitotoxic VS lesions in monkeys do not affect particular forms of stimulus-reward learning (Stern and Passingham, 1996). In all these cases only deterministic learning was examined. Therefore, our finding that VS lesions had a greater impact on feedback dependent learning in the stochastic versus deterministic task highlights the necessity of the VS in learning probabilistic reward associations. Moreover, it reconciles the results of prior VS inactivation studies with existing theories about neural implementations of RL.
It was recently shown that the nucleus accumbens (NAc) shell, but not the core, preferentially mediates probabilistic learning (Dalton et al., 2014). In rats, shell inactivation reduced win-stay behavior and increased errors to criterion. This result parallels our findings that VS lesions reduced sensitivity to positive feedback and decreased choice consistency in the stochastic task. Although our VS lesions encompassed both the shell and core (Friedman et al., 2002), future research should determine whether our effects are specifically due to shell damage.
Although we confirm that the VS mediates stochastic RL, this does not necessarily mean that value updating in the striatum relies on dopaminergic RPEs. When control monkeys were tested on the same stochastic two-arm bandit task, enhancing dopaminergic function with levodopa heightened learning from positive feedback relative to dopamine antagonism with haloperidol, yet neither drug modulated the monkeys’ learning rates compared to saline (Costa et al., 2015). Systemic blockade of the dopamine transporter similarly does not modulate learning rates during a probabilistic three-arm bandit task (Costa et al., 2014). Thus, whether the role of the VS in stochastic learning is dopamine related remains an open question.
Another issue is whether stochastic learning deficits in the VS lesion group are specific to acquiring stimulus-outcome associations, as opposed to action-outcome associations. D2 receptor antagonism in dorsal striatum disrupts action selection based on reinforcement of past choices (Lee et al., 2015) and dopamine terminals in dorsomedial striatum preferentially encode choices relative to specific movements (Parker et al., 2016). Thus adaptations of the current task to study action-outcome learning may reveal greater dependence on dorsal versus ventral striatum.
Ventral striatum modulates speed-accuracy tradeoff
VS lesions hastened choice RTs compared to the amygdala lesion or control groups. Because VS projections inhibit dopamine neurons (Haber et al., 2000; Menegas et al., 2015), VS lesions may disinhibit them. The faster choice RTs in the VS lesion group also made them more susceptible to a speed-accuracy tradeoff, particularly in the deterministic task. In rodents, NAc core lesions increase delay discounting (Cardinal et al., 2001) and impair performance when reward is contingent on withholding a response for a fixed period of time (Pothuizen et al., 2005). VS lesions also disrupt the temporal specificity of dopaminergic RPEs induced by changes in reward timing (Takahashi et al., 2016). Altogether, these findings suggest the VS regulates the temporal integration of reward information to guide choice behavior.
Amygdala directly contributes to reinforcement learning
The amygdala plays a fundamental role in Pavlovian learning and the valuation of sensory stimuli (Baxter and Murray, 2002; Janak and Tye, 2015; Salzman and Fusi, 2010; Seymour and Dolan, 2008). Pavlovian associations encoded in amygdala are also known to modulate instrumental behavior (Cardinal et al., 2002; Corbit and Balleine, 2005; Holland and Gallagher, 2003) through circuit interactions involving the ventral tegmental area and VS (Corbit et al., 2007; Stuber et al., 2011). Therefore, the amygdala is well positioned to guide choice behavior.
Yet it is unclear whether the amygdala directly contributes to value-guided decision making. This is in part due to mixed evidence across species about the necessity of the amygdala for object discrimination reversal learning (Chau et al., 2015; Hampton et al., 2007; Izquierdo et al., 2013; Izquierdo and Murray, 2007; Rudebeck and Murray, 2008; Stalnaker et al., 2007). Our results are consistent with evidence of impaired discrimination learning in human patients with amygdala damage (Hampton et al., 2007) and in monkeys with aspiration or radiofrequency amygdala lesions (Jones and Mishkin, 1972; Schwartz and Poulos, 1965). They are inconsistent with evidence that excitotoxic amygdala lesions do not disrupt object discrimination reversal learning (Izquierdo and Murray, 2007). Our data also counter evidence that the amygdala encodes inflexible value representations that impede reversal learning and credit assignment (Chau et al., 2015; Stalnaker et al., 2007).
The apparent contradiction between our results and past studies is likely due in part to methodological differences. For example, monkeys with excitotoxic amygdala lesions perform better than controls on a deterministic, object reversal learning task (Jang et al., 2015; Rudebeck and Murray, 2008). This benefit is limited to the first few reversals (Jang et al., 2015), which suggests amygdala contributions to RL may differ in contexts of unexpected versus expected uncertainty. Another difference is that prior studies tested animals with extensive, if not exclusive, training on deterministic stimulus-reward associations (Izquierdo and Murray, 2007). It is therefore possible that the amygdala lesion group might have shown little or no deficit in the deterministic task if they had only experienced deterministic stimulus-outcome pairings. However, the general and sizable deficit in learning from positive feedback seen in the amygdala lesion group argues against this idea.
Our results do support a nascent view of amygdala function that emphasizes its active role in decision making (Grabenhorst et al., 2012; Seymour and Dolan, 2008; Phelps et al., 2014). Amygdala activity tracks signals relevant to economic choice (Grabenhorst et al., 2012; Hernadi et al., 2015), model-based RL (Prevost et al., 2013), and abstract context representations (Saez et al., 2015). Human patients with amygdala damage show learning deficits in both stochastic and deterministic bandit tasks (Hampton et al., 2007). Also, serotonin depletion in the marmoset amygdala delays probabilistic discrimination learning by reducing sensitivity to reward and punishment (Rygula et al., 2015). Thus, when learning to choose between two objects, the amygdala is needed to form strong feedback dependent stimulus-reward associations, which allow the most valuable objects to be identified and chosen consistently. It is likely, however, that value representations formed in the amygdala drive choice behavior via projections to the VS and dopamine subpopulations.
Lesion effects on Pearce-Hall error learning in choice contexts
Pearce-Hall learning theory describes how associability (i.e. the amount of attention paid to a cue) influences the rate at which conditioned associations are learned (Li et al., 2011; Pearce and Hall, 1980). Experiments designed around this view suggest the amygdala computes associability (Holland and Schiffino, 2016; Roesch et al., 2012). Consistent with that account, in the deterministic task we found that associability was reduced in the monkeys with amygdala lesions relative to those with VS lesions. One caveat to this finding, however, is that associability in both lesion groups was equivalent to that in controls. Moreover, associability did not differ between the amygdala and striatal lesion groups in the stochastic task. Therefore, it is difficult to draw firm conclusions about the relative roles of the amygdala and VS in signaling associability.
Previous studies have found that the amygdala and VS differentially encode associability when associability is computed by fitting a hybrid PH model to Pavlovian conditioned autonomic responses (Atlas et al., 2016; Li et al., 2011; Zhang et al., 2016). In those cases, conditioned responses were better fit by the PH model than the RW model. The opposite was true when we fit the same models to the monkeys’ choice data: the RW model, with one fewer parameter, yielded more parsimonious fits than the hybrid PH model. Modeling learning in terms of preparatory versus consummatory responses may reveal differential roles for the amygdala in signaling associability (Zhang et al., 2016). We might also have detected deficits in PH error learning if the monkeys had been tested on choice-related variants of serial prediction or unblocking tasks (Holland and Schiffino, 2016). At the very least, our results highlight a need to better understand how the amygdala and VS contribute to PH error learning, particularly in stochastic environments where surprise-induced learning is not necessarily advantageous (Courville et al., 2006).
Conclusion
Traditional views of reinforcement learning hold that the ventral striatum integrates dopamine coded RPEs to facilitate learning, but do not ascribe a specific role to the amygdala. Our results suggest standard accounts of RL should be revised to reflect a key role for the amygdala in learning to choose between stimuli, and to account for the relative role of the VS in mediating stochastic versus deterministic learning. Moreover, because amygdala projections to the VS preferentially facilitate appetitive behaviors (Namburi et al., 2015), our data offer a new perspective on how interactions between the amygdala and ventral striatum enable reinforcement learning.
Experimental Procedures
Subjects
Eleven male rhesus macaques, weighing 6.5 to 10.5 kg, were studied. Three monkeys received bilateral excitotoxic lesions of the VS, four received bilateral excitotoxic lesions of the amygdala, and four were retained as unoperated controls. All monkeys were placed on water control for the duration of the study, and on test days they earned all of their fluid through performance on the task. All experimental procedures were performed in accordance with the Guide for the Care and Use of Laboratory Animals and were approved by the National Institute of Mental Health Animal Care and Use Committee.
Surgery
For detailed surgical information see Supplemental Experimental Methods. In brief, each monkey was first implanted with a titanium head restraint device. The monkeys assigned to each lesion group then received targeted bilateral excitotoxic lesions of the amygdala or VS. All testing occurred after the lesion surgeries.
Lesion assessment
For each monkey, lesions of the amygdala or VS were quantitatively assessed from postoperative MRI scans. The extent of damage was evaluated from T2-weighted scans obtained within 10 days of surgery. For each operated animal, MR scan slices were matched to drawings of coronal sections from a standard rhesus monkey brain at 1-mm intervals. Each lesion was then plotted onto the standard sections to map its location and extent (Fig. 1B and C).
Experimental setup
The monkeys completed an average of 16.81 (SD = 6.72) blocks per session of a two-arm bandit task. Each block consisted of 80 trials and involved a single reversal of the stimulus-reward contingencies (Fig. 1). On each trial, the monkeys first had to acquire and hold a central fixation point (500–750 ms). After the monkey fixated for the required duration, two stimuli appeared to the left and right (6° visual angle) of the central fixation point. Stimuli varied in shape and color, and their screen locations were randomized from trial to trial. Monkeys chose between the stimuli by making a saccade to one of them and fixating that cue for 500 ms. One stimulus was assigned a high reward probability and the other a low reward probability. Juice rewards were delivered at the end of each trial according to the assigned reward schedule, followed by a fixed 1.5 s intertrial interval. A failure to acquire fixation within 5 s, to hold central fixation, or to make a choice within 1 s resulted in an immediate repeat of the trial. The trial on which the cue-reward mapping reversed within each block was selected randomly from a uniform distribution across trials 30 to 50. The reversal trial did not depend on the monkey reaching a performance criterion. The reward schedule was randomly selected at the start of each block and remained constant within a block. We ran two experiments. In the first experiment we used three reward schedules: 80/20%, 70/30% and 60/40%. In the second experiment we used a deterministic reward schedule (100/0%).
Stimuli consisted of a fixed set of six images: a circle and a square in each of three colors (red, green, and blue). The two choice options always differed in both color and shape, yielding six possible stimulus pairings. These pairings were crossed with the reward schedules in each experiment and with whether a particular stimulus was initially the more rewarded option (e.g., whether the blue square was the best choice before or after the reversal). This yielded 12 block combinations in the deterministic task (6 pairings × 2 initial-best assignments) and 36 in the stochastic task (12 × 3 schedules), as enumerated in the sketch below. Block presentations were fully randomized without replacement, ensuring that a specific stimulus-reward combination was never repeated until all block combinations had been experienced. Although combinations could repeat across sessions, upon inspection there was no evidence of improved performance across sessions.
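To make the counterbalancing arithmetic concrete, the following sketch (Python; illustrative, not the task code) enumerates the block combinations:

```python
from itertools import product

colors = ["red", "green", "blue"]
shapes = ["circle", "square"]
schedules_stochastic = [(0.8, 0.2), (0.7, 0.3), (0.6, 0.4)]

# Option pairs always differ in both color and shape:
# (circle, color A) vs (square, color B) with A != B -> 6 pairs.
pairs = [((shapes[0], a), (shapes[1], b))
         for a, b in product(colors, colors) if a != b]

# Cross pairs with which option starts as the better one (2) and schedule (3).
blocks = [(pair, best_first, sched)
          for pair in pairs
          for best_first in (0, 1)
          for sched in schedules_stochastic]
print(len(pairs), len(blocks))   # 6 pairs -> 36 stochastic block types
# With the single deterministic schedule (1.0, 0.0), the same crossing
# yields 6 * 2 = 12 block types.
```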
Stimulus presentation and behavioral monitoring were controlled by a PC running MonkeyLogic (Asaad and Eskandar, 2008). Stimuli were displayed on an LCD monitor (1024 × 768 resolution) situated 40 cm from the monkeys’ eyes. Eye movements were monitored at 1 kHz (Arrington Research, Scottsdale, AZ). On rewarded trials, apple juice (0.085 mL) was delivered through a pressurized plastic tube gated by a computer-controlled solenoid valve (Mitz, 2005).
Task training
All monkeys that received lesions were trained and tested following surgery. Initial training focused on having the monkeys learn the task structure using a deterministic reward schedule. They were then trained on a probabilistic version of the task until they routinely completed 15–20 blocks per session and demonstrated stable performance on interleaved blocks of the three probabilistic reward schedules. Data from these training sessions were not included in the reported analyses. Once testing began, the monkeys completed the stochastic and deterministic tasks in separate series of daily sessions. All of the monkeys completed the deterministic task after they completed testing on the stochastic task. Prior to testing on the deterministic task, the lesion and control groups did not differ in the number of sessions (F(2,8) = 1.48, p = .234) or total number of blocks completed (F(2,8) < 1, p = .554) on the stochastic task. In particular, the amygdala and VS lesion groups did not differ (number of sessions: t(5) < 1, p = .377; total number of blocks: t(5) < 1, p = .818). Therefore, group differences in the deterministic task cannot be attributed to time in training.
Computational modeling
We fit three models to the choice data. The first was a causal Bayesian model (see Supplemental Methods) that we used to estimate the reversal points in the stimulus-reward mappings. We then fit a stateless temporal difference learning model; because it is stateless, this model is equivalent to a RW model (Rescorla and Wagner, 1972). Finally, we fit a hybrid PH model (Li et al., 2011; Pearce and Hall, 1980).
Reinforcement learning model
We split the trials in each block into acquisition and reversal phases using the reversal point calculated with the Bayesian model. We then fit separate RL models to each phase. This was done for each session, and separately for each schedule. We used a standard RL model to estimate learning from positive and negative feedback, as well as to estimate the inverse temperature. Specifically, value updates were given by:
$$v_i(k+1) = v_i(k) + \alpha_f \left[ R(k) - v_i(k) \right] \tag{7}$$
where $v_i$ is the value estimate for option $i$, $R$ is the reward feedback for the current choice, and $\alpha_f$ is the feedback dependent learning rate parameter, with $f$ indexing whether the current choice was rewarded ($R = 1$) or not ($R = 0$). In other words, on each trial $\alpha_f$ takes one of two fitted values used to scale prediction errors based on the reward feedback for the current choice. These value estimates were then passed through a logistic function to generate choice probability estimates:
$$p_1(k) = \frac{1}{1 + e^{-\beta \left[ v_1(k) - v_2(k) \right]}}, \qquad p_2(k) = 1 - p_1(k) \tag{8}$$
The likelihood is then given by:
$$\mathcal{L} = \prod_k p_1(k)^{c_1(k)} \, p_2(k)^{c_2(k)} \tag{9}$$
where $c_1(k)$ equals 1 if option 1 was chosen on trial $k$, $c_2(k)$ equals 1 if option 2 was chosen, and both are 0 otherwise. Standard function optimization techniques were used to maximize the log-likelihood of the data given the parameters. Because the estimation can settle on local optima, we refit the model from 20 initial parameter values and retained the fit with the maximum log-likelihood.
When we fit the RL model to the acquisition phase, starting values were reset to 0.5 at the beginning of each block, unless the stimulus in the new block was one of the same stimuli from the previous block. This happened in some cases because we used a small set of stimuli that were paired in a counter-balanced way with rewards. When a stimulus in the new block was the same as the stimulus from the previous block we carried the value estimate from the previous block forward. For the reversal phase, starting values for each block corresponded to end values of the corresponding acquisition phase.
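A minimal sketch of this model and fitting procedure is given below (Python; the authors’ analyses were implemented in MATLAB, and the block-to-block carryover of value estimates described above is omitted for brevity):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, choices, rewards):
    """Feedback dependent RL model (equations 7-9).
    params: [alpha_pos, alpha_neg, beta]; choices, rewards: 0/1 arrays."""
    a_pos, a_neg, beta = params
    v = np.array([0.5, 0.5])                    # starting value per option
    nll = 0.0
    for c, r in zip(choices, rewards):
        p1 = 1.0 / (1.0 + np.exp(-beta * (v[0] - v[1])))   # equation 8
        p_choice = p1 if c == 0 else 1.0 - p1
        nll -= np.log(max(p_choice, 1e-12))     # guard against log(0)
        alpha = a_pos if r == 1 else a_neg      # feedback dependent rate
        v[c] += alpha * (r - v[c])              # equation 7
    return nll

def fit_model(choices, rewards, n_starts=20, rng=None):
    """Maximum likelihood with multiple restarts to avoid local optima."""
    rng = rng or np.random.default_rng()
    best = None
    for _ in range(n_starts):
        x0 = np.append(rng.uniform(0.05, 0.95, 2), rng.uniform(0.5, 10.0))
        res = minimize(neg_log_lik, x0, args=(choices, rewards),
                       bounds=[(0, 1), (0, 1), (0, 50)], method="L-BFGS-B")
        if best is None or res.fun < best.fun:
            best = res
    return best   # best.x = [alpha_pos, alpha_neg, beta]
```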
Pearce-Hall error learning
We also fit a hybrid PH model (Li et al., 2011). In this model, learning is dynamically gated by two free parameters, κ and η. Value updates are given by:
$$v_i(k+1) = v_i(k) + \kappa \, \alpha_i(k) \, \delta_i(k) \tag{10}$$
$$\delta_i(k) = R(k) - v_i(k) \tag{11}$$
$$\alpha_i(k+1) = \eta \, \lvert \delta_i(k) \rvert + (1 - \eta) \, \alpha_i(k) \tag{12}$$
where $v_i(k)$ is the value estimate for option $i$, $R$ is the reward feedback for the current choice, $\delta_i$ is the reward prediction error for each option, $\kappa$ is a constant learning rate parameter, $\alpha_i(k)$ is the associability of each option, and $\eta$ is a weighting factor that governs the gradual update of associability across trials. To limit the number of free parameters, each option was assigned a starting value of 0.5 and $\alpha_i(0)$ was set to 1. Also, because of the structure of this model, it is not straightforward to fit it separately to the acquisition and reversal phases; we therefore fit one model across the entire block of trials.
Just as for the feedback dependent RL model, value estimates were passed through a logistic function (equation 8) to generate choice probability estimates. The likelihood is then given by:
$$\mathcal{L} = \prod_k p_1(k)^{c_1(k)} \, p_2(k)^{c_2(k)} \tag{13}$$
As described above, standard function optimization techniques were used to maximize the log of the likelihood of the data given the parameters.
For comparison with the PH model, we also fit a RW model to the choice data. This was done by setting η in the above model to a fixed value of 0. In this case the RW model was also not fit separately to each phase.
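To make the reduction to RW explicit, a single-trial sketch of the hybrid update follows (Python; variable names are illustrative). With η = 0, associability stays at its starting value of 1 and equation 10 collapses to a fixed-learning-rate RW update:

```python
import numpy as np

def ph_value_update(v, assoc, choice, reward, kappa, eta):
    """One trial of the hybrid Pearce-Hall update (equations 10-12).
    Setting eta = 0 freezes associability, recovering Rescorla-Wagner."""
    delta = reward - v[choice]                       # equation 11: RPE
    v[choice] += kappa * assoc[choice] * delta       # equation 10: value
    assoc[choice] = (eta * abs(delta)
                     + (1.0 - eta) * assoc[choice])  # equation 12: associability
    return v, assoc

# Initial conditions used in the paper: values 0.5, associability 1.
v = np.array([0.5, 0.5])
assoc = np.array([1.0, 1.0])
v, assoc = ph_value_update(v, assoc, choice=0, reward=1, kappa=0.3, eta=0.5)
```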
Saccadic reaction times
We computed choice RTs on a trial-by-trial basis. Choice RTs were defined as the amount of time between the onset of the choice options and initiation of a saccade targeting either option. RT probability distribution functions were constructed (Fig. 5A and B) by binning RTs in 20 ms bins.
Speed-accuracy tradeoff
To examine the speed-accuracy tradeoff, we first extracted the chosen softmax probability, $d_i$, from the feedback dependent RL model for each of the monkeys’ choices. Kernel regression, a non-parametric technique for computing the conditional expectation of a random variable (Hastie et al., 2009), was used to visualize how choice RTs related to the chosen softmax probabilities (Fig. 5A and B). We specified a Gaussian kernel (20 ms bandwidth), and the estimated function was sampled in evenly spaced 20 ms bins from 0 to 500 ms. We quantified the speed-accuracy tradeoff by estimating correlation coefficients for each block and averaging them across schedules for each session.
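A minimal Nadaraya-Watson sketch of this estimator (Python; assumes RTs in ms and one chosen softmax probability per trial):

```python
import numpy as np

def gaussian_kernel_regression(rt, p_chosen, bandwidth=20.0):
    """Nadaraya-Watson estimate of mean chosen softmax probability as a
    function of RT (ms), sampled on an evenly spaced 20 ms grid."""
    grid = np.arange(0.0, 501.0, 20.0)
    est = np.full(grid.shape, np.nan)
    for i, g in enumerate(grid):
        w = np.exp(-0.5 * ((rt - g) / bandwidth) ** 2)   # Gaussian weights
        if w.sum() > 0:                                  # skip empty regions
            est[i] = np.sum(w * p_chosen) / w.sum()
    return grid, est
```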
Aborted trials due to fixation or choice errors
Fixation errors were defined as a failure to acquire fixation up to 1 s after onset of the fixation cue, or a failure to hold fixation up until presentation of the two choice options. Choice errors were defined as a failure to saccade to one of the two choice options up to 1 s after onset of the choice options. Each error type was quantified as a percentage of the total number of trials completed per session.
Classical statistics
Each dependent variable was entered into mixed effects ANOVAs implemented in MATLAB. When appropriate, schedule, learning phase, feedback type, and monkey were specified as fixed effects, and session as a random factor nested under monkey. Post-hoc analyses of significant main effects used Fisher’s least significant difference test to correct for multiple comparisons (Levin et al., 1994). Post-hoc tests of significant interactions consisted of computing univariate ANOVAs for component effects, corrected for multiple comparisons.
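For orientation, an analogous nested mixed model can be specified as below (Python/statsmodels; the authors used MATLAB, and the formula, column names, and toy data here are assumptions for illustration only):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy data: accuracy by group, schedule, monkey, and session.
rng = np.random.default_rng(1)
rows = []
for g in ["control", "amygdala", "vs"]:
    for m in range(3):                        # toy: 3 monkeys per group
        for s in range(8):                    # 8 sessions per monkey
            for sched in ["80/20", "70/30", "60/40"]:
                rows.append(dict(group=g, monkey=f"{g}{m}",
                                 session=f"{g}{m}_s{s}", schedule=sched,
                                 acc=rng.uniform(0.4, 0.9)))
df = pd.DataFrame(rows)

# Fixed effects of group and schedule; random intercepts per monkey,
# with session treated as a variance component nested under monkey.
model = smf.mixedlm("acc ~ C(group) * C(schedule)", data=df,
                    groups=df["monkey"],
                    vc_formula={"session": "0 + C(session)"})
result = model.fit()
print(result.summary())
```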
Acknowledgments
This work was supported by the Intramural Research Program of the National Institute of Mental Health. We thank Laura Kennerly, Laura Kakalios, Kati Rothenhoefer, Raquel Vicario-Feliciano, and Pam Noble for assistance with data collection, Stephanie Huard for assistance in constructing the lesion density maps, Andrew Mitz for technical assistance.
Footnotes
Author Contributions:
Conceptualization, V.D.C., O.D., E.A.M., and B.B.A.; Methodology, V.D.C., O.D., E.A.M. and B.B.A; Software, V.D.C. and B.B.A.; Formal Analysis, V.D.C and B.B.A.; Investigation, V.D.C, O.D. and D.R.L; Writing—Initial Draft, V.D.C. and B.B.A; Writing—Review and Editing, V.D.C., O.D., E.A.M., and B.B.A.; Resources, E.A.M and B.B.A.
References
- Asaad WF, Eskandar EN. A flexible software tool for temporally-precise behavioral control in Matlab. J Neurosci Methods. 2008;174:245–258. doi: 10.1016/j.jneumeth.2008.07.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Atlas LY, Doll BB, Li J, Daw ND, Phelps EA. Instructed knowledge shapes feedback-driven aversive learning in striatum and orbitofrontal cortex, but not the amygdala. eLife. 2016;5:e15192. doi: 10.7554/eLife.15192. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Baxter MG, Murray EA. The amygdala and reward. Nat Rev Neurosci. 2002;3:563–573. doi: 10.1038/nrn875. [DOI] [PubMed] [Google Scholar]
- Cardinal RN, Parkinson JA, Hall J, Everitt BJ. Emotion and motivation: the role of the amygdala, ventral striatum, and prefrontal cortex. Neurosci Biobehav Rev. 2002;26:321–352. doi: 10.1016/s0149-7634(02)00007-6. [DOI] [PubMed] [Google Scholar]
- Cardinal RN, Pennicott DR, Sugathapala CL, Robbins TW, Everitt BJ. Impulsive choice induced in rats by lesions of the nucleus accumbens core. Science. 2001;292:2499–2501. doi: 10.1126/science.1060818. [DOI] [PubMed] [Google Scholar]
- Chau BK, Sallet J, Papageorgiou GK, Noonan MP, Bell AH, Walton ME, Rushworth MF. Contrasting Roles for Orbitofrontal Cortex and Amygdala in Credit Assignment and Learning in Macaques. Neuron. 2015;87:1106–1118. doi: 10.1016/j.neuron.2015.08.018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Collins AG, Frank MJ. Opponent actor learning (OpAL): modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive. Psych Rev. 2014;121:337–366. doi: 10.1037/a0037015. [DOI] [PubMed] [Google Scholar]
- Corbit LH, Balleine BW. Double dissociation of basolateral and central amygdala lesions on the general and outcome-specific forms of pavlovian-instrumental transfer. J Neurosci. 2005;25:962–970. doi: 10.1523/JNEUROSCI.4507-04.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Corbit LH, Janak PH, Balleine BW. General and outcome-specific forms of Pavlovian-instrumental transfer: the effect of shifts in motivational state and inactivation of the ventral tegmental area. Eur J Neurosci. 2007;26:3141–3149. doi: 10.1111/j.1460-9568.2007.05934.x. [DOI] [PubMed] [Google Scholar]
- Costa VD, Tran VL, Turchi J, Averbeck BB. Dopamine modulates novelty seeking behavior during decision making. Behav Neurosci. 2014;128:556–566. doi: 10.1037/a0037128. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Costa VD, Tran VL, Turchi J, Averbeck BB. Reversal learning and dopamine: a bayesian perspective. J Neurosci. 2015;35:2407–2416. doi: 10.1523/JNEUROSCI.1989-14.2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Courville AC, Daw ND, Touretzky DS. Bayesian theories of conditioning in a changing world. Trends Cogn Sci. 2006;10:294–300. doi: 10.1016/j.tics.2006.05.004.
- Dalton GL, Phillips AG, Floresco SB. Preferential involvement by nucleus accumbens shell in mediating probabilistic learning and reversal shifts. J Neurosci. 2014;34:4618–4626. doi: 10.1523/JNEUROSCI.5058-13.2014.
- Day JJ, Roitman MF, Wightman RM, Carelli RM. Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nat Neurosci. 2007;10:1020–1028. doi: 10.1038/nn1923.
- Floresco SB. The nucleus accumbens: an interface between cognition, emotion, and action. Annu Rev Psychol. 2015;66:25–52. doi: 10.1146/annurev-psych-010213-115159.
- Friedman DP, Aggleton JP, Saunders RC. Comparison of hippocampal, amygdala, and perirhinal projections to the nucleus accumbens: combined anterograde and retrograde tracing study in the macaque brain. J Comp Neurol. 2002;450:345–365. doi: 10.1002/cne.10336.
- Grabenhorst F, Hernadi I, Schultz W. Prediction of economic choice by primate amygdala neurons. Proc Natl Acad Sci USA. 2012;109:18950–18955. doi: 10.1073/pnas.1212706109.
- Gurney KN, Humphries MD, Redgrave P. A new framework for cortico-striatal plasticity: behavioural theory meets in vitro data at the reinforcement-action interface. PLoS Biol. 2015;13:e1002034. doi: 10.1371/journal.pbio.1002034.
- Haber SN, Behrens TE. The neural network underlying incentive-based learning: implications for interpreting circuit disruptions in psychiatric disorders. Neuron. 2014;83:1019–1039. doi: 10.1016/j.neuron.2014.08.031.
- Haber SN, Fudge JL. The primate substantia nigra and VTA: integrative circuitry and function. Crit Rev Neurobiol. 1997;11:323–342. doi: 10.1615/critrevneurobiol.v11.i4.40.
- Haber SN, Fudge JL, McFarland NR. Striatonigrostriatal pathways in primates form an ascending spiral from the shell to the dorsolateral striatum. J Neurosci. 2000;20:2369–2382. doi: 10.1523/JNEUROSCI.20-06-02369.2000.
- Hall J, Parkinson JA, Connor TM, Dickinson A, Everitt BJ. Involvement of the central nucleus of the amygdala and nucleus accumbens core in mediating Pavlovian influences on instrumental behaviour. Eur J Neurosci. 2001;13:1984–1992. doi: 10.1046/j.0953-816x.2001.01577.x.
- Hampton AN, Adolphs R, Tyszka MJ, O'Doherty JP. Contributions of the amygdala to reward expectancy and choice signals in human prefrontal cortex. Neuron. 2007;55:545–555. doi: 10.1016/j.neuron.2007.07.022.
- Hart AS, Rutledge RB, Glimcher PW, Phillips PE. Phasic dopamine release in the rat nucleus accumbens symmetrically encodes a reward prediction error term. J Neurosci. 2014;34:698–704. doi: 10.1523/JNEUROSCI.2489-13.2014.
- Hastie T, Tibshirani R, Friedman JH. The elements of statistical learning: data mining, inference, and prediction. 2nd ed. New York, NY: Springer; 2009.
- Hernadi I, Grabenhorst F, Schultz W. Planning activity for internally generated reward goals in monkey amygdala neurons. Nat Neurosci. 2015;18:461–469. doi: 10.1038/nn.3925.
- Holland PC, Gallagher M. Double dissociation of the effects of lesions of basolateral and central amygdala on conditioned stimulus-potentiated feeding and Pavlovian-instrumental transfer. Eur J Neurosci. 2003;17:1680–1694. doi: 10.1046/j.1460-9568.2003.02585.x.
- Holland PC, Schiffino FL. Mini-review: prediction errors, attention and associative learning. Neurobiol Learn Mem. 2016;131:207–215. doi: 10.1016/j.nlm.2016.02.014.
- Houk JC, Adams JL, Barto AG. A model of how the basal ganglia generate and use neural signals that predict reinforcement. In: Houk JC, Davis JL, Beiser DG, editors. Models of information processing in the basal ganglia. Cambridge, MA: MIT Press; 1995. pp. 249–274.
- Izquierdo A, Darling C, Manos N, Pozos H, Kim C, Ostrander S, Cazares V, Stepp H, Rudebeck PH. Basolateral amygdala lesions facilitate reward choices after negative feedback in rats. J Neurosci. 2013;33:4105–4109. doi: 10.1523/JNEUROSCI.4942-12.2013.
- Izquierdo A, Murray EA. Selective bilateral amygdala lesions in rhesus monkeys fail to disrupt object reversal learning. J Neurosci. 2007;27:1054–1062. doi: 10.1523/JNEUROSCI.3616-06.2007.
- Janak PH, Tye KM. From circuits to behaviour in the amygdala. Nature. 2015;517:284–292. doi: 10.1038/nature14188.
- Jang AI, Costa VD, Rudebeck PH, Chudasama Y, Murray EA, Averbeck BB. The role of frontal cortical and medial-temporal lobe brain areas in learning a Bayesian prior belief on reversals. J Neurosci. 2015;35:11751–11760. doi: 10.1523/JNEUROSCI.1594-15.2015.
- Jones B, Mishkin M. Limbic lesions and the problem of stimulus-reinforcement associations. Exp Neurol. 1972;36:362–377. doi: 10.1016/0014-4886(72)90030-1.
- Lee E, Seo M, Dal Monte O, Averbeck BB. Injection of a dopamine type 2 receptor antagonist into the dorsal striatum disrupts choices driven by previous outcomes, but not perceptual inference. J Neurosci. 2015;35:6298–6306. doi: 10.1523/JNEUROSCI.4561-14.2015.
- Levin JR, Serlin RC, Seaman MA. A controlled, powerful multiple-comparison strategy for several situations. Psychol Bull. 1994;115:153–159.
- Li J, Schiller D, Schoenbaum G, Phelps EA, Daw ND. Differential roles of human striatum and amygdala in associative learning. Nat Neurosci. 2011;14:1250–1252. doi: 10.1038/nn.2904.
- Menegas W, Bergan JF, Ogawa SK, Isogai Y, Umadevi Venkataraju K, Osten P, Uchida N, Watabe-Uchida M. Dopamine neurons projecting to the posterior striatum form an anatomically distinct subclass. eLife. 2015;4. doi: 10.7554/eLife.10032.
- Mitz AR. A liquid-delivery device that provides precise reward control for neurophysiological and behavioral experiments. J Neurosci Methods. 2005;148:19–25. doi: 10.1016/j.jneumeth.2005.07.012.
- Moscarello JM, LeDoux J. Diverse effects of conditioned threat stimuli on behavior. Cold Spring Harb Symp Quant Biol. 2014;79:11–19. doi: 10.1101/sqb.2014.79.024968.
- Namburi P, Beyeler A, Yorozu S, Calhoon GG, Halbert SA, Wichmann R, Holden SS, Mertens KL, Anahtar M, Felix-Ortiz AC, et al. A circuit mechanism for differentiating positive and negative associations. Nature. 2015;520:675–678. doi: 10.1038/nature14366.
- O'Doherty J, Dayan P, Schultz J, Deichmann R, Friston K, Dolan RJ. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science. 2004;304:452–454. doi: 10.1126/science.1094285.
- O'Doherty JP, Dayan P, Friston K, Critchley H, Dolan RJ. Temporal difference models and reward-related learning in the human brain. Neuron. 2003;38:329–337. doi: 10.1016/s0896-6273(03)00169-7.
- Parker NF, Cameron CM, Taliaferro JP, Lee J, Choi JY, Davidson TJ, Daw ND, Witten IB. Reward and choice encoding in terminals of midbrain dopamine neurons depends on striatal target. Nat Neurosci. 2016;19:845–854. doi: 10.1038/nn.4287.
- Pearce JM, Hall G. A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychol Rev. 1980;87:532–552.
- Phelps EA, Lempert KM, Sokol-Hessner P. Emotion and decision making: multiple modulatory neural circuits. Annu Rev Neurosci. 2014;37:263–287. doi: 10.1146/annurev-neuro-071013-014119.
- Pothuizen HH, Jongen-Relo AL, Feldon J, Yee BK. Double dissociation of the effects of selective nucleus accumbens core and shell lesions on impulsive-choice behaviour and salience learning in rats. Eur J Neurosci. 2005;22:2605–2616. doi: 10.1111/j.1460-9568.2005.04388.x.
- Prevost C, McNamee D, Jessup RK, Bossaerts P, O'Doherty JP. Evidence for model-based computations in the human amygdala during Pavlovian conditioning. PLoS Comput Biol. 2013;9:e1002918. doi: 10.1371/journal.pcbi.1002918.
- Rescorla RA, Wagner AR. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF, editors. Classical conditioning II: current research and theory. New York: Appleton-Century-Crofts; 1972. pp. 64–99.
- Robinson MJ, Warlow SM, Berridge KC. Optogenetic excitation of central amygdala amplifies and narrows incentive motivation to pursue one reward above another. J Neurosci. 2014;34:16567–16580. doi: 10.1523/JNEUROSCI.2013-14.2014.
- Roesch MR, Esber GR, Li J, Daw ND, Schoenbaum G. Surprise! Neural correlates of Pearce-Hall and Rescorla-Wagner coexist within the brain. Eur J Neurosci. 2012;35:1190–1200. doi: 10.1111/j.1460-9568.2011.07986.x.
- Rudebeck PH, Murray EA. Amygdala and orbitofrontal cortex lesions differentially influence choices during object reversal learning. J Neurosci. 2008;28:8338–8343. doi: 10.1523/JNEUROSCI.2272-08.2008.
- Rutledge RB, Dean M, Caplin A, Glimcher PW. Testing the reward prediction error hypothesis with an axiomatic model. J Neurosci. 2010;30:13525–13536. doi: 10.1523/JNEUROSCI.1747-10.2010.
- Rygula R, Clarke HF, Cardinal RN, Cockcroft GJ, Xia J, Dalley JW, Robbins TW, Roberts AC. Role of central serotonin in anticipation of rewarding and punishing outcomes: effects of selective amygdala or orbitofrontal 5-HT depletion. Cereb Cortex. 2015;25:3064–3076. doi: 10.1093/cercor/bhu102.
- Saez A, Rigotti M, Ostojic S, Fusi S, Salzman CD. Abstract context representations in primate amygdala and prefrontal cortex. Neuron. 2015;87:869–881. doi: 10.1016/j.neuron.2015.07.024.
- Salzman CD, Fusi S. Emotion, cognition, and mental state representation in amygdala and prefrontal cortex. Annu Rev Neurosci. 2010;33:173–202. doi: 10.1146/annurev.neuro.051508.135256.
- Schultz W. Neuronal reward and decision signals: from theories to data. Physiol Rev. 2015;95:853–951. doi: 10.1152/physrev.00023.2014.
- Schwartz JS, Poulos DA. Discrimination behavior after amygdalectomy in monkeys: learning set and discrimination reversals. J Comp Physiol Psychol. 1965;60:320–328. doi: 10.1037/h0022551.
- Seymour B, Dolan R. Emotion, decision making, and the amygdala. Neuron. 2008;58:662–671. doi: 10.1016/j.neuron.2008.05.020.
- Stalnaker TA, Franz TM, Singh T, Schoenbaum G. Basolateral amygdala lesions abolish orbitofrontal-dependent reversal impairments. Neuron. 2007;54:51–58. doi: 10.1016/j.neuron.2007.02.014.
- Stern CE, Passingham RE. The nucleus accumbens in monkeys (Macaca fascicularis): II. Emotion and motivation. Behav Brain Res. 1996;75:179–193. doi: 10.1016/0166-4328(96)00169-6.
- Stuber GD, Sparta DR, Stamatakis AM, van Leeuwen WA, Hardjoprajitno JE, Cho S, Tye KM, Kempadoo KA, Zhang F, Deisseroth K, Bonci A. Excitatory transmission from the amygdala to nucleus accumbens facilitates reward seeking. Nature. 2011;475:377–380. doi: 10.1038/nature10194.
- Takahashi YK, Langdon AJ, Niv Y, Schoenbaum G. Temporal specificity of reward prediction errors signaled by putative dopamine neurons in rat VTA depends on ventral striatum. Neuron. 2016. doi: 10.1016/j.neuron.2016.05.015.
- Zhang S, Mano H, Ganesh G, Robbins T, Seymour B. Dissociable learning processes underlie human pain conditioning. Curr Biol. 2016;26:52–58. doi: 10.1016/j.cub.2015.10.066.