Author manuscript; available in PMC 2018 Apr 1. Published in final edited form as: Behav Neurosci. 2017 Apr;131(2):127–134. doi: 10.1037/bne0000192

Effects of inference on dopaminergic prediction errors depend on orbitofrontal processing

Yuji K Takahashi 1, Thomas A Stalnaker 1, Matthew R Roesch 2, Geoffrey Schoenbaum 1,3,4
PMCID: PMC5356489  NIHMSID: NIHMS849006  PMID: 28301188

Abstract

Dopaminergic reward prediction errors in monkeys reflect inferential reward predictions that well-trained animals can make when associative rules change. Here, in a new analysis of previously described data, we test whether dopaminergic error signals in rats are influenced by inferential predictions and whether such effects depend on the orbitofrontal cortex (OFC). Dopamine neurons were recorded from controls or rats with ipsilateral OFC lesions during performance of a choice task in which odor cues signaled the availability of sucrose reward in two wells. To induce prediction errors, we manipulated either the timing or number of rewards delivered in each well across blocks of trials. Importantly, a change in reward at one well predicted a change in reward at the other on later trials. We compared behavior and neural activity on trials when such inference was possible versus trials involving the same reward change when inference was not possible. Rats responded faster when they could infer an increase in reward compared to when the same reward was coming but they could not infer a change. This inferential prediction was reflected in the firing of dopamine neurons in controls, which changed less to unexpected delivery (or omission) of reward and more to the new high-value cue on inference versus non-inference trials. These effects were absent in dopamine neurons recorded in rats with ipsilateral OFC lesions. Thus, dopaminergic error signals recorded in rats are influenced by both experiential and inferential reward predictions, and the effects of inferential predictions depend on OFC.

Keywords: dopamine, reward prediction error, inference, orbitofrontal cortex, single unit, rat

INTRODUCTION

Dopamine neurons signal reward prediction errors (Hollerman & Schultz, 1998; Mirenowicz & Schultz, 1994; Schultz, Dayan, & Montague, 1997), and phasic manipulation of these neurons is able to substitute for positive and negative prediction errors to drive associative learning (Chang et al., 2016; Steinberg et al., 2013). Importantly, mounting evidence indicates that these teaching signals reflect reward predictions that go beyond an animal’s most recent, direct experience (Glascher, Daw, Dayan, & O'Doherty, 2010). For example, in monkeys, it has been shown that dopaminergic errors change in reversal tasks when there are predictable changes in a set of rewards, even before a change has been experienced for a particular cue or response (Bromberg-Martin, Matsumoto, Hong, & Hikosaka, 2010).

The ability to update a reward prediction without additional direct experience with that reward occurs because a well-trained animal can use a change in one reward in a set to infer that changes have occurred in the other rewards. Another way to describe this is that different abstract context representations, states, or latent causes can be assigned different, even conflicting, associative rules (Gershman & Niv, 2010; Saez, Rigotti, Ostojic, Fusi, & Salzman, 2015; Wilson, Takahashi, Schoenbaum, & Niv, 2014). Upon encountering a change in reward, the well-trained animal can switch to the appropriate state and recall the appropriate rules not just for that reward but also for the other rewards it typically receives in that state. This is not just idle speculation. Neuronal correlates of this process have been demonstrated in prefrontal cortex and amygdala in monkeys (Saez et al., 2015), and in cholinergic interneurons in the dorsal striatum of rats, where they appear to depend on the orbitofrontal cortex (OFC) (Stalnaker, Berg, Aujla, & Schoenbaum, 2016). The demonstration that dopamine neurons have access to these inferential predictions is important because it allows dopaminergic teaching signals to be constrained by this knowledge – i.e., to not operate in conflict with it – and further it allows these signals to facilitate learning when actual outcomes do not match what we infer should happen in a new state.

Here, in a new analysis that takes advantage of previously described data (Takahashi, Langdon, Niv, & Schoenbaum, 2016; Takahashi et al., 2011), we test whether dopaminergic error signals in rats are influenced by inferential predictions of the sort highlighted in the aforementioned study in monkeys (Bromberg-Martin et al., 2010), and whether such effects depend on the OFC.

METHODS

Subjects

Twenty-two male Long-Evans rats (175–200 g on arrival) were obtained from Charles River Laboratories (Wilmington, MA). Rats were tested at the NIDA-IRP and the University of Maryland School of Medicine, in accordance with NIH guidelines and with protocols approved by the University of Maryland School of Medicine Animal Care Committee.

Surgical procedures and histology

Lesions were made and electrodes implanted under stereotaxic guidance; all surgical procedures adhered to guidelines for aseptic technique. OFC lesions were made by infusing NMDA (Sigma; 12.5 mg ml−1) at four sites in the hemisphere where the recording electrode was to be implanted: at 4.0 mm anterior to bregma, 3.8 mm ventral to the skull surface, and 2.2 mm (0.1 µl) and 3.7 mm (0.1 µl) lateral to the midline; and at 3.0 mm anterior to bregma, 5.2 mm ventral to the skull surface, and 3.2 mm (0.05 µl) and 4.2 mm (0.1 µl) lateral to the midline. A drivable bundle of eight 25-µm diameter FeNiCr wires (Stablohm 675; California Fine Wire, Grover Beach, CA) was chronically implanted dorsal to VTA in the left or right hemisphere at 5.0–5.4 mm posterior to bregma, 0.7 mm lateral to the midline, and 7.0 mm ventral to the brain surface, at an angle of 5° toward the midline from vertical. Wires were cut with surgical scissors to extend ~1.5 mm beyond the cannula and electroplated with platinum (H2PtCl6; Aldrich, Milwaukee, WI) to an impedance of ~300 kΩ. Cephalexin (15 mg/kg p.o.) was administered twice daily for two weeks post-operatively. At the end of the experiment, rats were perfused, and their brains were removed and processed for histology (Roesch, Taylor, & Schoenbaum, 2006).

Odor-guided choice task

Recording was conducted in aluminum chambers approximately 18” on each side, with sloping walls narrowing to an area of 12” × 12” at the bottom. A central odor port was located above two fluid wells (Fig. 1b), and two lights were located above the panel. The odor port was connected to an airflow dilution olfactometer to allow rapid delivery of olfactory cues. Odors were chosen from compounds obtained from International Flavors and Fragrances (New York, NY).

Figure 1. Apparatus and task design.

(a) Brain sections illustrating the extent of the maximum (gray) and minimum (black) lesion at each level of OFC in the lesioned rats. (b) Picture of the apparatus used in the task, showing the odor port (~2.5 cm diameter) and the two fluid wells. (c) Line deflections indicate the time course of stimuli (odors and rewards) presented to the animal on each trial. Dashed lines show when reward was omitted, and solid lines show when reward was delivered. At the start of each recording session, one well was randomly designated as short (a 0.5-s delay before reward) and the other as long (a 1–7-s delay before reward) (block 1). In the second block of trials, these contingencies were switched. In block 3, the delay was held constant while the number of rewards was manipulated: in the well designated as big, a second bolus of reward was delivered (big reward), while a single bolus was delivered in the other well (small reward). In block 4, these contingencies were switched again. (d) Choice behavior on the first and last 3 trials of blocks in the Control (black) and OFCx (gray) groups. (e) Difference in reaction times between the T1 trial (or the inference trial) and the average of the last 3 trials in the previous block, in the Control (black) and OFCx (gray) groups. Adapted from “Expectancy-related changes in firing of dopamine neurons depend on orbitofrontal cortex,” by Takahashi, YK, Roesch, MR, Wilson, RC, Toreson, K, O’Donnell, P, Niv, Y, and Schoenbaum, G, 2011, Nature Neuroscience, 14, 1590–1597. Copyright 2011 by Nature Neuroscience.

Trials were signaled by illumination of the panel lights inside the box. When these lights were on, nosepoke into the odor port resulted in delivery of the odor cue to a small hemicylinder located behind this opening. One of three different odors was delivered to the port on each trial, in a pseudorandom order. At odor offset, the rat had 3 seconds to make a response at one of the two fluid wells. One odor instructed the rat to go to the left to get reward, a second odor instructed the rat to go to the right to get reward, and a third odor indicated that the rat could obtain reward at either well. Odors were presented in a pseudorandom sequence such that the free-choice odor was presented on 7/20 trials and the left/right odors were presented in equal numbers. In addition, the same odor could be presented on no more than 3 consecutive trials.
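To make these sequencing constraints concrete, the schedule can be sketched as follows (a minimal illustration in Python; the rejection-sampling approach, the function name, and the as-even-as-possible split of forced trials are our assumptions, not a description of the actual task code):

    import random

    def generate_odor_sequence(n_trials=20, n_free=7, max_run=3):
        """Pseudorandom odor schedule: free-choice odor on 7/20 trials,
        forced left/right odors split as evenly as possible, and no odor
        repeated on more than 3 consecutive trials."""
        n_forced = n_trials - n_free
        pool = (['free'] * n_free +
                ['left'] * (n_forced // 2) +
                ['right'] * (n_forced - n_forced // 2))
        while True:
            random.shuffle(pool)
            run, ok = 1, True
            for prev, cur in zip(pool, pool[1:]):
                run = run + 1 if cur == prev else 1
                if run > max_run:   # reject sequences with a too-long run
                    ok = False
                    break
            if ok:
                return pool

    print(generate_odor_sequence())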

Once the rats were shaped to perform this basic task, we introduced blocks in which we independently manipulated the size of the reward and the delay preceding reward delivery (Fig. 1c). For recording, one well was randomly designated as short and the other as long at the start of the session (Fig. 1c, 1sh and 1lo). In the second block of trials, these contingencies were switched (Fig. 1c, 2sh and 2lo). The delay on the long side followed an algorithm in which it started at 1 s and increased by 1 s every time that side was chosen, until it reached 3 s. If the rat continued to choose that side, the delay continued to increase by 1 s per choice, up to a maximum of 7 s. If the rat chose the long side on fewer than 8 of the last 10 choice trials, the delay was reduced by 1 s, to a minimum of 3 s. The reward delay on long forced-choice trials was yoked to the delay on free-choice trials during these blocks. In later blocks, we held the delay preceding reward constant while manipulating the size of the reward (Fig. 1c, 3bg, 3sm, 4bg and 4sm). The reward was a 0.05-ml bolus of 10% sucrose solution; the reward magnitude used in the delay blocks was the same as that used in the size blocks. For the big reward, additional boli were delivered after gaps of 500 ms.
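Read as pseudocode, the delay-titration rule amounts to the following (a sketch only; how the increase and decrease rules interact on a single trial is our assumption):

    def update_long_delay(delay, chose_long, last10_chose_long):
        """One-step update of the delay (in s) on the side designated as long.

        delay             -- current long-side delay (starts at 1 s)
        chose_long        -- True if the rat chose the long side on this trial
        last10_chose_long -- booleans for the last 10 choice trials
        """
        if chose_long:
            delay = min(delay + 1, 7)   # each long choice adds 1 s, capped at 7 s
        elif sum(last10_chose_long) < 8:
            delay = max(delay - 1, 3)   # chosen < 8 of last 10: back off, floor 3 s
        return delay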

Single-unit recording

Wires were screened for activity daily; if no activity was detected, the rat was removed and the electrode assembly was advanced 40 or 80 µm. Otherwise, active wires were selected for recording, a session was conducted, and the electrode was advanced at the end of the session. Neural activity was recorded using Plexon Multichannel Acquisition Processor systems (Plexon Inc, Dallas, TX). Signals from the electrode wires were amplified 20× by an op-amp headstage (Plexon Inc, HST/8o50-G20-GR) located on the electrode array. Immediately outside the training chamber, the signals were passed through a differential pre-amplifier (Plexon Inc, PBX2/16sp-r-G50/16fp-G50), where the single-unit signals were amplified 50× and filtered at 150–9,000 Hz. The single-unit signals were then sent to the Multichannel Acquisition Processor box, where they were further filtered at 250–8,000 Hz, digitized at 40 kHz, and amplified 1–32×. Waveforms (>2.5:1 signal-to-noise) were extracted from active channels and recorded to disk by an associated workstation.

Data analysis

Units were sorted using Offline Sorter software (Plexon Inc, Dallas, TX). Sorted files were then processed and analyzed in NeuroExplorer and Matlab (MathWorks, Natick, MA). Dopamine neurons were identified via a waveform analysis used and validated previously by us and others (Jin & Costa, 2010; Jo, Lee, & Mizumori, 2013; Roesch, Calu, & Schoenbaum, 2007; Takahashi et al., 2011; Takahashi et al., 2016). Briefly, cluster analysis was performed based on the half-time of the spike duration and the ratio comparing the amplitudes of the first positive and negative waveform segments. The center and variance of each cluster were computed without data from the neuron of interest, and that neuron was then assigned to a cluster if it fell within 3 s.d. of the cluster’s center. Neurons that met this criterion for more than one cluster were not classified. This process was repeated for each neuron. Analyses were conducted using Matlab or Statistica (StatSoft, Inc, Tulsa, OK) as described in the main text.
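The leave-one-out assignment step can be illustrated schematically (a sketch under stated assumptions: an initial labeling of the two waveform features into clusters is taken as given, and the per-feature 3-s.d. criterion approximates whatever distance convention was actually used):

    import numpy as np

    def assign_neurons(features, labels, n_sd=3.0):
        """Leave-one-out cluster assignment over two waveform features:
        half-time of spike duration and the amplitude ratio of the first
        positive and negative waveform segments.

        features -- (n_neurons, 2) array of the two features
        labels   -- initial cluster labels, one per neuron (assumed given)
        Returns each neuron's cluster, or None if it fits zero or >1 clusters.
        """
        labels = np.asarray(labels)
        idx = np.arange(len(features))
        out = []
        for i, x in enumerate(features):
            hits = []
            for c in np.unique(labels):
                # center and s.d. computed without the neuron being classified
                others = features[(labels == c) & (idx != i)]
                center, sd = others.mean(axis=0), others.std(axis=0)
                if np.all(np.abs(x - center) <= n_sd * sd):
                    hits.append(c)
            out.append(hits[0] if len(hits) == 1 else None)
        return out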

For the activity during reward delivery and omission, both forced-choice and free-choice trials were included in the analysis. The T1 trial was defined as the trial on which reward was first delivered or omitted at a well after each block switch. The inference trial was defined as the trial on which reward was first delivered or omitted at the other well, after the T1 trial.

For the activity during cue sampling, only forced-choice trials were included. A trial was defined as T1 if (i) it was a forced-choice trial in which reward was delivered or omitted at a well and (ii) it was the first trial after a block switch. A trial was defined as an inference trial if (i) it was a forced-choice trial after the T1 trial in which reward was delivered or omitted at the other well and (ii) there had been no intervening free-choice trial after the T1 trial in which reward was delivered or omitted at that same well.
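For concreteness, the reward-epoch labeling might be implemented as below (a minimal sketch with a hypothetical trial record; the cue-epoch version would additionally filter for forced-choice trials and for the absence of an intervening free-choice trial at the same well, as described above):

    def label_reward_trials(block_trials):
        """Label trials within one block as 'T1', 'inference', or None.

        block_trials -- trials after a block switch, in order, each a dict
                        with a 'well' key ('left' or 'right'); forced- and
                        free-choice trials are both included here.
        """
        labels = [None] * len(block_trials)
        if not block_trials:
            return labels
        labels[0] = 'T1'                      # first reward delivery/omission
        t1_well = block_trials[0]['well']
        for i, trial in enumerate(block_trials[1:], start=1):
            if trial['well'] != t1_well:      # first trial at the *other* well
                labels[i] = 'inference'
                break
        return labels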

We peak-normalized each neuron by dividing all firing rates by the peak firing rate across the trial in 500-ms bins, after averaging across the last 10 trials of each of the eight well conditions in the four blocks. To clearly see the suppression of firing after reward omission, we also subtracted the peak-normalized baseline firing (500 ms during the inter-trial interval before trial onset) from all peak-normalized firing rates.
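A sketch of this normalization (our reconstruction; the array shapes and binning bookkeeping are assumptions):

    import numpy as np

    def peak_normalize(binned_rates, baseline_rate):
        """Peak-normalize one neuron's firing and subtract its baseline.

        binned_rates  -- (8, n_bins) mean rates in 500-ms bins, one row per
                         well condition, averaged over the last 10 trials
        baseline_rate -- mean rate in the 500-ms inter-trial interval
        """
        peak = binned_rates.max()           # peak rate across bins and conditions
        normalized = binned_rates / peak    # peak bin becomes 1.0
        # subtracting the normalized baseline lets omission-induced
        # suppression appear as negative values
        return normalized - baseline_rate / peak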

RESULTS

We recorded single-unit activity from VTA in control rats (n = 15) and rats with ipsilateral OFC lesions (OFCx, n = 7) (see Takahashi et al., 2011, and Takahashi et al., 2016, for more information on recording locations). The lesions targeted the ventral and lateral orbital regions and the ventral and dorsal agranular insular areas in the bank of the rhinal sulcus (Fig. 1a). Neurons were recorded in rats performing an odor-guided choice task used previously to characterize signaling of reward predictions and reward prediction errors (Roesch et al., 2007; Takahashi et al., 2009; Takahashi et al., 2011; Takahashi et al., 2016). On each trial, rats sampled one of three different odor cues at a central port and then responded at one of two adjacent wells (Fig. 1b). One odor signaled the availability of sucrose reward only in the left well (forced left), a second odor signaled sucrose reward only in the right well (forced right), and a third odor signaled that reward was available at either well (free choice). To induce errors in the prediction of rewards, we manipulated either the timing or number of rewards delivered in each well across blocks of trials (Fig. 1c). Positive prediction errors were induced by making a previously delayed reward immediate (blocks 2sh and 3bg) or by adding more rewards (blocks 3bg and 4bg), whereas negative prediction errors were induced by delaying a previously immediate reward (block 2lo) or by decreasing the number of rewards (block 4sm).

Dopamine neurons were identified by means of a waveform analysis similar to that used to identify dopamine neurons in primate studies (Bromberg-Martin et al., 2010; Fiorillo, Newsome, & Schultz, 2008; Hollerman & Schultz, 1998; Kobayashi & Schultz, 2008; Matsumoto & Hikosaka, 2009; Mirenowicz & Schultz, 1994; Morris, Nevet, Arkadir, Vaadia, & Bergman, 2006). This approach identified 103 putative dopamine neurons in controls (Takahashi et al., 2011; Takahashi et al., 2016) and 76 in OFCx rats (Takahashi et al., 2011). Of these, 60 neurons in controls and 50 in OFCx rats increased firing in response to reward (compared with a 500-ms baseline taken during the inter-trial interval before trial onset). The proportion of reward-responsive dopamine neurons did not differ between groups (χ2 test, χ2 = 1.05, p = 0.31). The average baseline activity and the average firing at the time of reward (500 ms after reward delivery) were similar in the two groups (control vs OFCx, baseline, t-test, df = 108, p = 0.96; reward, t-test, df = 108, p = 0.92; data not shown). Thus, ipsilateral OFC lesions did not appear to have dramatic effects on the prevalence, waveform characteristics, or reward-related firing of putative dopamine neurons (see Takahashi et al., 2011, and Takahashi et al., 2016, for more information on waveform features).

Both groups of rats changed their choice behavior across blocks in response to the changing reward contingencies, choosing the high-value reward more frequently at the end than at the beginning of blocks (Fig. 1d). A two-way ANOVA comparing Group × Learning (early/late) revealed a main effect of Learning (F1,108 = 534.5, p < 0.01) but no main effect of, or interaction involving, Group (F’s < 1.9, p > 0.17). Step-down comparisons showed that the percentage of high-value choices was significantly higher late than early in blocks for both groups (ANOVA, Control, F1,59 = 273.5, p < 0.01; OFCx, F1,49 = 265.9, p < 0.01) (see Takahashi et al., 2011, and Takahashi et al., 2016, for more information on behavior).

Previously we reported that these reward-responsive dopamine neurons exhibited phasic responses at the time of reward and to the cues, consistent with signaling of reward prediction errors (Takahashi et al., 2011; Takahashi et al., 2016). Here we conducted a new analysis in which we examined whether the error signals in these same dopamine neurons reflected the rats’ ability to infer changes in the value of the reward available in one well as a result of the block structure, in which a change in reward at one well was always accompanied by a highly predictable change in reward at the other well. As a result of this arrangement, when a rat received an unexpected reward on one side at the beginning of a new block (on trial 1, the T1 trial), it could immediately infer the value of the new reward available on the other side. This inferential prediction was evident in faster responding for reward, particularly when the rats could infer that the high-value reward was available in a new block of trials. Indeed, both controls and OFCx rats responded much faster on such inference trials than on the comparable first (T1) trial in the new block (Fig. 1e; controls: ANOVA, F1,98 = 12.8, p < 0.01; OFCx: ANOVA, F1,79 = 9.33, p < 0.01). Faster responding did not occur when the low-value reward could be inferred, although the rats did not respond more slowly either (Control, difference in T1 = 0.05 ± 0.03, difference in inference = −0.006 ± 0.01, F1,77 = 2.98, p = 0.09; OFCx, difference in T1 = 0.001 ± 0.03, difference in inference = 0.02 ± 0.02, F1,51 = 0.33, p = 0.57). The lack of faster responding is important because it shows that the effect on high-value trials is not a general effect due to a change in attention or motivation after the change in reward. The failure to see frank slowing of responding is unfortunate, but it is generally in accord with observations that extinction or inhibition of behavior is more resistant to change than acquisition or recovery (Rescorla, 2002), and also with observations in monkeys, where similar inference-based behaviors have much larger effects on encouraging responding to previously unrewarded cues than on suppressing responding to previously rewarded cues (Saez et al., 2015).

The faster responding on high-value inference trials suggests that the rats do utilize the block structure and the predictable nature of the reward changes to make inferential predictions, as we have found previously in this setting (Stalnaker et al., 2014). We hypothesized that if this inferential or updated prediction were conveyed to the dopamine neurons, then their prediction error signal at the time of the change in reward on an inference trial would be weaker than that to the same change in reward experienced on the first (T1) trial in a new block. To make this comparison, we focused our analyses on the transitions between otherwise similar blocks (block 1→2 or block 3→4).

Consistent with our hypothesis, the population response of the reward-responsive dopamine neurons recorded in controls exhibited a smaller increase in firing to an unexpected reward (Fig. 2a) and a smaller decrease in firing upon omission of an expected reward (Fig. 2b) on inference trials when compared to firing to the same reward change on the T1 trial. To evaluate these effects, we compared the average peak-normalized firing on the first and last 3 trials in blocks when the first trial occurred immediately after the beginning of a new block (T1 trial, blue lines) versus when the same trial occurred after rats had already experienced the change in reward on the other side (inference trial, red lines). This analysis showed that, for both unexpected and omitted reward, firing differed significantly between T1 and inference trials (left plots in Figs. 2e and 2f) (ANOVA, reward delivery, F1,118 = 4.77, p = 0.031; reward omission, F1,118 = 4.60, p = 0.034). This difference in firing was only evident on the first trial; firing on subsequent trials did not differ significantly (Figs. 2e and 2f) (ANOVA, F’s < 2.25, p’s > 0.14).

Figure 2. Effects of inferential predictions on firing of dopamine neurons to unexpected delivery and omission of reward.

(a – d) Peak-normalized population activity of dopamine neurons to unexpected delivery (a and c) and omission (b and d) of reward on T1 (blue) and inference trials (red) in control (a and b) and OFCx (c and d) rats. (e – h) Average peak-normalized activity of dopamine neurons to unexpected delivery (e and g) and omission (f and h) of reward on the first 3 and last 3 trials on the side of the T1 (blue) versus inference trial (red) in control (e and f) and OFCx (g and h) rats.

Previously we reported that input from OFC contributes to dopaminergic error signaling in rat VTA by helping to define value predictions more accurately, particularly in situations or states that cannot be directly observed (Takahashi et al., 2011). Further, we have shown that OFC neurons signal inferred predictions about the not-yet-experienced reward in this choice task (Stalnaker et al., 2014). Based on this, we hypothesized that dopamine neurons might depend on OFC for information regarding inferred reward predictions. To assess this, we examined the effect of inference on error signaling in dopamine neurons recorded in OFCx rats.

Consistent with our hypothesis, the difference in firing on T1 and inference trials evident in controls (Figs. 2a and 2b) was absent in dopamine neurons recorded in OFCx rats (Figs. 2c and 2d). Importantly, this loss occurred even though dopamine neurons in OFCx rats still exhibited phasic firing to the unexpected reward at the beginning of a new block (Takahashi et al., 2011); this phasic activity was simply not any larger than when the same unexpected reward occurred after rats had experienced the changes in reward on the other side (T1, blue lines, versus inference trial, red lines). Indeed, average firing on these two trial types was almost identical on both the first trial (left plot, Fig. 2g) (ANOVA, F1,98 = 0.58, p = 0.45) and subsequent trials (Fig. 2g) (ANOVA, F’s < 1.59, p’s > 0.21). Firing upon omission of reward also did not differ between these two trial types on the first trial (Figs. 2d and 2h) (ANOVA, F1,98 = 0.01, p = 0.90) or on subsequent trials (Fig. 2h) (ANOVA, F’s < 1.36, p’s > 0.24), although the meaning of this is less clear since, as in our original report, the dopamine neurons in OFCx rats did not show a statistically significant suppression of firing to reward omission. Note that the control and OFCx rats received the same number of pre-training sessions before recordings (Control, 26.1 ± 0.9 sessions; OFCx, 25.8 ± 1.3 sessions; t-test, t = 0.13, p = 0.89). Thus, the lack of a difference in dopamine firing on T1 versus inference trials did not reflect a difference in the amount of training before recording.

Prediction error signals were also evident in response to the reward-predictive odor cues in this task; these dopamine neurons responded phasically during sampling of the odor cues, and after learning, this phasic response was larger to cues that predicted a high-value reward (Takahashi et al., 2011; Takahashi et al., 2016). Here we conducted a new analysis in which we examined whether this cue-evoked response was affected by inference. For this, we compared the firing to the high-value odor cue on the T1 trial or the inference trial with firing to the same odor cue the last 3 times it was presented at the end of the previous block, when it signaled low value. Consistent with the hypothesis that the dopamine neurons have access to inferential predictions, dopamine neurons in control rats exhibited an increase in firing to the high-value cue on the inference trial (top distribution plot in Fig. 3b) but not on the T1 trial (top distribution plot in Fig. 3a). Furthermore, the changes in activity to the high-value cue on the inference trial were correlated with the faster reaction times on these trials (scatter plot in Fig. 3b), whereas there was no correlation between reaction times and firing to the high-value cue on T1 trials (scatter plot in Fig. 3a).

Figure 3. Effects of inferential predictions on firing of the dopamine neurons to the high value cue.

Each panel (a – d) shows the distributions of, and correlations between, the activity (cue index) and behavior (reaction time) evoked by the high-value cue at the start of a new trial block. The cue index of each neuron is computed as the peak-normalized firing to the high-value cue on the T1 trial or inference trial minus the average firing to that cue on its last 3 presentations in the previous block. Reaction time is computed as the reaction time on the T1 trial or inference trial minus the average reaction time to that cue on its last 3 presentations in the previous block. Data are plotted separately for neurons recorded in controls (a and b) and OFCx rats (c and d). The numbers in each distribution plot indicate the results of a Wilcoxon signed-rank test (p) and the mean of the index (μ).

Notably, dopamine neurons recorded in OFCx rats failed to show any effect of inference on their firing to the high-value cue at the start of new blocks (top distribution plots in Figs. 3c and 3d). This effect of the lesions occurred even though the lesioned rats showed normal changes in response latencies on inference trials (left distribution plot in Fig. 3d) and the dopamine neurons exhibited phasic responses to the cues that reflected some aspects of their predicted value after learning (Takahashi et al., 2011).

DISCUSSION

Here, in a new analysis of previously described data (Takahashi et al., 2011; Takahashi et al., 2016), we tested whether dopaminergic error signals in rats were influenced by inferential predictions and whether such effects depended on the OFC. Dopamine neurons were recorded from controls or rats with ipsilateral OFC lesions during performance of a choice task in which odor cues signaled the availability of sucrose reward in two wells. Errors were induced by changes in the timing or number of rewards delivered in each well across blocks of trials. Importantly, a change in reward at one well predicted a change in reward at the other. We found that rats used this information to adapt their behavior in these recording sessions. Further, the underlying inferential predictions were reflected in the firing of dopamine neurons in controls, which changed less to unexpected delivery or omission of reward (and more to the new high-value cue) when rats could infer a change in reward versus when the same reward change occurred but inference was not possible. These effects were absent in dopamine neurons recorded in OFC-lesioned rats. These results confirm reports in monkeys of similar effects of inferential predictions on dopaminergic error signals (Bromberg-Martin et al., 2010), and further show that the ability of inferential predictions to influence dopaminergic error signals depends on OFC.

As noted earlier, the ability to update a reward prediction in this and similar tasks without additional direct experience with that reward likely occurs, even in the absence of clear contextual cues, because a well-trained animal can use a change in one reward to infer that changes have occurred in the other rewards. State representations can be used to separate different associative rules in different trial blocks. Upon encountering a change in reward, the well-trained animal can recall the appropriate rules for the experienced reward as well as for any others that the change in state predicts. Indeed, the influence of such state representations may be ubiquitous once one begins to look for it. Event-related firing in the OFC, as well as in other prefrontal areas and even the amygdala, is influenced by context or state defined as trial blocks with different reward contingencies (Saez et al., 2015; Stalnaker et al., 2014), and cholinergic interneurons in the dorsal striatum provide an OFC-dependent state correlate that distinguishes such different blocks of trials (Stalnaker et al., 2016).

Given evidence that many of these same areas influence dopamine neuron firing generally (Lodge, 2011; Takahashi et al., 2011) and coding of prediction errors specifically (Jo, Lee, & Mizumori, 2013; Jo & Mizumori, 2015; Takahashi et al., 2011), our current findings are perhaps not surprising. However, the confirmation that dopamine neurons have access to these inferential predictions, not only in monkeys but also in rats, is important because it expands, and at the same time makes subtler, the potential contributions of these powerful teaching signals to learning. At a minimum, it allows them to operate in concert with, rather than dissociated from, this knowledge, and it puts them in a position to more appropriately facilitate learning when actual outcomes violate expectations, since the errors can be based on hopefully more accurate inferential predictions rather than on outdated, slower-to-change, experiential ones.

In this regard, it is worth noting that dopaminergic prediction errors do not seem to be limited to accessing inferred predictions about rewards that have been received in the past. At least in rats, dopaminergic errors reflect inferential predictions that do not require any direct experience with the reward. For example, several groups have shown that cue-evoked dopamine release changes as a result of changes in the desirability of the predicted reward (Aitken, Greenfield, & Wassum, 2016; Cone et al., 2016; Papageorgiou, Baudonnat, Cucca, & Walton, 2016), and we have recently shown that dopamine neurons fire to preconditioned cues that have never been directly paired with a reward or reward-predicting event (Sadacca, Jones, & Schoenbaum, 2016). While similar correlates have not been tested for in monkeys, BOLD correlates of reward prediction errors in humans reflect inferential predictions (Daw, Gershman, Seymour, Dayan, & Dolan, 2011; Glascher et al., 2010), and their overlap with areas signaling experiential reward prediction errors suggests that this may reflect dopaminergic activity. These data highlight the growing evidence that dopaminergic errors may have access to a much wider variety of information than is generally appreciated and may be involved much more broadly in learning than envisioned by current canon (Glimcher, 2011; Schultz, 2016).

Acknowledgments

This work was supported by funding from NIDA. The opinions expressed in this article are the authors’ own and do not reflect the view of the National Institutes of Health, the Department of Health and Human Services, or the United States government.

REFERENCES

1. Aitken TJ, Greenfield VY, Wassum KM. Nucleus accumbens core dopamine signaling tracks the need-based motivational value of food-paired cues. Journal of Neurochemistry. 2016. doi: 10.1111/jnc.13494.
2. Bromberg-Martin ES, Matsumoto M, Hong S, Hikosaka O. A pallidus-habenula-dopamine pathway signals inferred stimulus values. Journal of Neurophysiology. 2010;104:1068–1076. doi: 10.1152/jn.00158.2010.
3. Chang CY, Esber GR, Marrero-Garcia Y, Yau H-J, Bonci A, Schoenbaum G. Brief optogenetic inhibition of VTA dopamine neurons mimics the effects of endogenous negative prediction errors during Pavlovian over-expectation. Nature Neuroscience. 2016;19:111–116. doi: 10.1038/nn.4191.
4. Cone JJ, Fortin SM, McHenry JA, Stuber GD, McCutcheon JE, Roitman JF. Physiological state gates acquisition and expression of mesolimbic reward prediction signals. Proceedings of the National Academy of Sciences. 2016;113:1943–1948. doi: 10.1073/pnas.1519643113.
5. Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ. Model-based influences on humans' choices and striatal prediction errors. Neuron. 2011;69:1204–1215. doi: 10.1016/j.neuron.2011.02.027.
6. Fiorillo CD, Newsome WT, Schultz W. The temporal precision of reward prediction in dopamine neurons. Nature Neuroscience. 2008;11:966–973. doi: 10.1038/nn.2159.
7. Gershman SJ, Niv Y. Learning latent structure: carving nature at its joints. Current Opinion in Neurobiology. 2010;20:251–256. doi: 10.1016/j.conb.2010.02.008.
8. Glascher J, Daw N, Dayan P, O'Doherty JP. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron. 2010;66:585–595. doi: 10.1016/j.neuron.2010.04.016.
9. Glimcher PW. Understanding dopamine and reinforcement learning: The dopamine reward prediction error hypothesis. Proceedings of the National Academy of Sciences. 2011;108:15647–15654. doi: 10.1073/pnas.1014269108.
10. Hollerman JR, Schultz W. Dopamine neurons report an error in the temporal prediction of reward during learning. Nature Neuroscience. 1998;1:304–309. doi: 10.1038/1124.
11. Jin X, Costa RM. Start/stop signals emerge in nigrostriatal circuits during sequence learning. Nature. 2010;466:457–462. doi: 10.1038/nature09263.
12. Jo YS, Lee J, Mizumori SJ. Effects of prefrontal cortical inactivation on neural activity in the ventral tegmental area. Journal of Neuroscience. 2013;33:8159–8171. doi: 10.1523/JNEUROSCI.0118-13.2013.
13. Jo YS, Mizumori SJ. Prefrontal regulation of neuronal activity in the ventral tegmental area. Cerebral Cortex. 2015. doi: 10.1093/cercor/bhv215.
14. Kobayashi K, Schultz W. Influence of reward delays on responses of dopamine neurons. Journal of Neuroscience. 2008;28:7837–7846. doi: 10.1523/JNEUROSCI.1600-08.2008.
15. Lodge DJ. The medial prefrontal and orbitofrontal cortices differentially regulate dopamine system function. Neuropsychopharmacology. 2011;36:1227–1236. doi: 10.1038/npp.2011.7.
16. Matsumoto M, Hikosaka O. Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature. 2009;459:837–841. doi: 10.1038/nature08028.
17. Mirenowicz J, Schultz W. Importance of unpredictability for reward responses in primate dopamine neurons. Journal of Neurophysiology. 1994;72:1024–1027. doi: 10.1152/jn.1994.72.2.1024.
18. Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H. Midbrain dopamine neurons encode decisions for future action. Nature Neuroscience. 2006;9:1057–1063. doi: 10.1038/nn1743.
19. Papageorgiou GK, Baudonnat M, Cucca F, Walton ME. Mesolimbic dopamine encodes prediction errors in a state-dependent manner. Cell Reports. 2016;15:221–228. doi: 10.1016/j.celrep.2016.03.031.
20. Rescorla RA. Comparison of the rates of associative change during acquisition and extinction. Journal of Experimental Psychology: Animal Behavior Processes. 2002;28:406–415.
21. Roesch MR, Calu DJ, Schoenbaum G. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nature Neuroscience. 2007;10:1615–1624. doi: 10.1038/nn2013.
22. Roesch MR, Taylor AR, Schoenbaum G. Encoding of time-discounted rewards in orbitofrontal cortex is independent of value representation. Neuron. 2006;51:509–520. doi: 10.1016/j.neuron.2006.06.027.
23. Sadacca BF, Jones JL, Schoenbaum G. Midbrain dopamine neurons compute inferred and cached value prediction errors in a common framework. eLife. 2016. doi: 10.7554/eLife.13665.
24. Saez A, Rigotti M, Ostojic S, Fusi S, Salzman CD. Abstract context representations in primate amygdala and prefrontal cortex. Neuron. 2015;87:869–881. doi: 10.1016/j.neuron.2015.07.024.
25. Schultz W. Dopamine reward prediction-error signalling: a two-component response. Nature Reviews Neuroscience. 2016;17:183–195. doi: 10.1038/nrn.2015.26.
26. Schultz W, Dayan P, Montague PR. A neural substrate for prediction and reward. Science. 1997;275:1593–1599. doi: 10.1126/science.275.5306.1593.
27. Stalnaker TA, Berg BA, Aujla N, Schoenbaum G. Cholinergic interneurons use orbitofrontal input to track beliefs about current state. Journal of Neuroscience. 2016;36:6242–6257. doi: 10.1523/JNEUROSCI.0157-16.2016.
28. Stalnaker TA, Cooch NK, McDannald MA, Liu T-L, Wied H, Schoenbaum G. Orbitofrontal neurons infer the value and identity of predicted outcomes. Nature Communications. 2014;5:3926. doi: 10.1038/ncomms4926.
29. Steinberg EE, Keiflin R, Boivin JR, Witten IB, Deisseroth K, Janak PH. A causal link between prediction errors, dopamine neurons and learning. Nature Neuroscience. 2013;16:966–973. doi: 10.1038/nn.3413.
30. Takahashi YK, Roesch MR, Wilson RC, Toreson K, O'Donnell P, Niv Y, Schoenbaum G. Expectancy-related changes in firing of dopamine neurons depend on orbitofrontal cortex. Nature Neuroscience. 2011;14:1590–1597. doi: 10.1038/nn.2957.
31. Takahashi YK, Langdon AJ, Niv Y, Schoenbaum G. Temporal specificity of reward prediction errors signaled by putative dopamine neurons in rat VTA depends on ventral striatum. Neuron. 2016. doi: 10.1016/j.neuron.2016.05.015.
32. Takahashi YK, Roesch MR, Stalnaker TA, Haney RZ, Calu DJ, Taylor AR, Schoenbaum G. The orbitofrontal cortex and ventral tegmental area are necessary for learning from unexpected outcomes. Neuron. 2009;62:269–280. doi: 10.1016/j.neuron.2009.03.005.
33. Wilson RC, Takahashi YK, Schoenbaum G, Niv Y. Orbitofrontal cortex as a cognitive map of task space. Neuron. 2014;81:267–279. doi: 10.1016/j.neuron.2013.11.005.
