Author manuscript; available in PMC: 2017 Jul 6.
Published in final edited form as: Neuron. 2016 Jun 9;91(1):182–193. doi: 10.1016/j.neuron.2016.05.015

Temporal specificity of reward prediction errors signaled by putative dopamine neurons in rat VTA depends on ventral striatum

Yuji K Takahashi 1,*, Angela J Langdon 2,*, Yael Niv 2, Geoffrey Schoenbaum 1,3,4
PMCID: PMC4938771  NIHMSID: NIHMS786796  PMID: 27292535

Summary

Dopamine neurons signal reward prediction errors. This requires accurate reward predictions. It has been suggested that the ventral striatum provides these predictions. Here we tested this hypothesis by recording from putative dopamine neurons in the VTA of rats performing a task in which prediction errors were induced by shifting reward timing or number. In controls, the neurons exhibited error signals in response to both manipulations. However, dopamine neurons in rats with ipsilateral ventral striatal lesions exhibited errors only to changes in number and failed to respond to changes in timing of reward. These results, supported by computational modeling, indicate that predictions about the temporal specificity and the number of expected rewards are dissociable, and that dopaminergic prediction-error signals rely on the ventral striatum for the former but not the latter.

Introduction

Reward prediction errors are famously signaled, at least in primates and rodents, by midbrain dopamine neurons (Barto, 1995; Mirenowicz and Schultz, 1994; Montague et al., 1996; Schultz et al., 1997). Key to signaling reward prediction errors are reward predictions (Bush and Mosteller, 1951; Rescorla and Wagner, 1972; Sutton and Barto, 1981). Theoretical and experimental work has suggested that the ventral striatum (VS) is an important source of these predictions, particularly to dopamine neurons in the ventral tegmental area (VTA) (Daw et al., 2006; Daw et al., 2005; Joel et al., 2002; O'Doherty et al., 2003; O'Doherty et al., 2004; Seymour et al., 2004; Willuhn et al., 2012). Here we tested this hypothesis by recording the activity of putative dopaminergic neurons in the VTA of rats performing a task in which positive and negative prediction errors were induced by shifting either the timing or the number of expected rewards. We found that dopamine neurons recorded in sham-lesioned controls exhibited prediction error signals in response to both manipulations. By contrast, dopamine neurons in rats with ipsilateral VS lesions exhibited prediction error signals only in response to changes in the number of rewards; these neurons failed to respond to changes in reward timing. Computational modeling of these data, using a framework that separates learning about reward timing from learning about reward number (Daw et al., 2006), showed that this pattern of results could be obtained by degrading the model's ability to learn precise timing, while leaving all other aspects of the model intact. Contrary to proposals that VS might be required for all aspects of reward prediction (Joel et al., 2002; O'Doherty et al., 2003; Willuhn et al., 2012), these results suggest that the VS is critical for endowing reward predictions with temporal specificity.

Results

We recorded single-unit activity in the VTA of rats with ipsilateral sham (n = 9) or neurotoxic (n = 7) lesions of VS (see Fig. S1a-b for recording locations). Recordings were made in rats with ipsilateral lesions to avoid confounding any neural effects of the lesions with behavioral changes (Burton et al., 2014). Lesions targeted the VS core, resulting in visible loss of neurons in 57% (range 35-75%) of this region across subjects (Fig. S1f). Neurons in the lesioned area are known to fire to reward-predictive cues (Bissonette et al., 2013; O'Doherty et al., 2003; O'Doherty et al., 2004; Oleson et al., 2012; Roesch et al., 2009) and to send output to VTA (Bocklisch et al., 2013; Grace and Bunney, 1985; Groenewegen et al., 1990; Mogenson et al., 1980; Voorn et al., 2004; Watabe-Uchida et al., 2012; Xia et al., 2011), consistent with the proposal that this part of VS influences dopaminergic prediction error signaling, as posited in neural instantiations of temporal difference reinforcement learning (TDRL) models (Daw et al., 2006; Joel et al., 2002).

Neurons were recorded in rats performing an odor-guided choice task used previously to characterize signaling of reward predictions and reward prediction errors (Roesch et al., 2007; Takahashi et al., 2009; Takahashi et al., 2011). On each trial, rats sampled one of three different odor cues at a central port, and then responded at one of two adjacent wells (Fig. 1a). One odor signaled the availability of sucrose reward only in the left well (forced left), a second odor signaled sucrose reward only in the right well (forced right), and a third odor signaled that reward was available at either well (free choice). To induce errors in the prediction of rewards, we manipulated either the timing or the number of rewards delivered in each well across blocks of trials (Fig. 1b). Positive prediction errors were induced by making a previously delayed reward immediate (Fig. 1b, blue arrows in blocks 2sh and 3bg) or by adding more rewards (Fig. 1b, green arrows in blocks 3bg and 4bg), whereas negative prediction errors were induced by delaying a previously immediate reward (Fig. 1b, red arrow in block 2lo) or by decreasing the number of rewards (Fig. 1b, orange arrows in block 4sm).

Figure 1. Apparatus and behavioral results.


(a) Picture of the apparatus used in the task, showing the odor port (∼2.5 cm diameter) and two fluid wells. (b) Line deflections indicate the time course of stimuli (odors and rewards) presented to the animal on each trial. Dashed lines show when reward was omitted, and solid lines show when reward was delivered. At the start of each recording session, one well was randomly designated as short (a 0.5 s delay before reward) and the other as long (a 1-7 s delay before reward) (block 1). In the second block of trials, these contingencies were switched. In block 3, the delay was held constant while the number of rewards was manipulated: one well was designated as big, in which a second bolus of reward was delivered, and a single bolus was delivered in the other (small) well. In block 4, these contingencies were switched again. Blue arrows, unexpected short reward; red arrow, short reward omission; green arrows, unexpected big reward; orange arrow, big reward omission. (c, f) Choice behavior in the last 3 trials before and the first 8 and last 8 trials after the switch from a high-valued to a low-valued outcome in delay (c) and size blocks (f). Inset bar graphs show average percentage choice for high-valued (black) and low-valued (white) outcomes across all free-choice trials. Black line, sham-lesioned rats (Sham, n = 9, 75 sessions); gray line, unilateral VS-lesioned rats (VS×U, n = 7, 71 sessions). (d, e, g, h) Behavior on forced-choice trials in delay (d, e) and size blocks (g, h). Bar graphs show percentage correct (d, g) and reaction times (e, h) in response to the high and low value across all recording sessions. *p < 0.05 or better (see main text); NS, nonsignificant. Error bars, s.e.m.

Recording began after the rats were shaped to perform the task. Shaping was similar across the two groups, and there were no significant differences in the number of trials per block in the recording sessions (ANOVA, F3,432 = 0.54, p = 0.66). As expected, sham-lesioned rats changed their behavior across blocks in response to the changing rewards, choosing the larger/earlier reward more often on free-choice trials (75 sessions; delay blocks, t-test, t74 = 7.96, p < 0.01, Fig. 1c; number blocks, t-test, t74 = 11.72, p < 0.01, Fig. 1f) and responding more accurately (delay blocks, t-test, t74 = 12.01, p < 0.01, Fig. 1d; number blocks, t-test, t74 = 9.29, p < 0.01, Fig. 1g) and with shorter reaction times (delay blocks, t-test, t74 = 5.81, p < 0.01, Fig. 1e; number blocks, t-test, t74 = 3.06, p < 0.01, Fig. 1h) on forced-choice trials when the earlier or larger reward was at stake. Rats with unilateral VS lesions (VS×U) showed similar behavior (71 sessions; percent choice in delay blocks, t-test, t70 = 12.81, p < 0.01, Fig. 1c; in number blocks, t-test, t70 = 8.29, p < 0.01, Fig. 1f; percent correct in delay blocks, t-test, t70 = 10.39, p < 0.01, Fig. 1d; in number blocks, t-test, t70 = 5.74, p < 0.01, Fig. 1g; reaction times in delay blocks, t-test, t70 = 7.03, p < 0.01, Fig. 1e; in number blocks, t-test, t70 = 3.06, p < 0.05, Fig. 1h). Two-factor ANOVAs (group × reward number or group × reward timing) revealed neither main effects nor any interactions involving group in free-choice performance, percent correct, or reaction times in delay (F's < 3.1, p's > 0.08) or number (F's < 0.7, p's > 0.07) blocks. Thus, the two groups showed similar differences in all our behavioral measures in both block types.

Dopamine neurons signal prediction errors in response to changes in timing or number of rewards

We identified putative dopamine neurons by means of a waveform analysis similar to that used to identify dopamine neurons in primate studies (Bromberg-Martin et al., 2010; Fiorillo et al., 2008; Hollerman and Schultz, 1998; Kobayashi and Schultz, 2008; Matsumoto and Hikosaka, 2009; Mirenowicz and Schultz, 1994; Morris et al., 2006; Waelti et al., 2001). Although the use of waveform criteria has not been uniformly accepted for isolating dopamine neurons (Margolis et al., 2006), this analysis isolates neurons in rat VTA whose firing is sensitive to intravenous infusion of apomorphine or quinpirole (Jo et al., 2013; Roesch et al., 2007), and nigral neurons identified by a similar cluster analysis in mice are selectively activated by optical stimulation in tyrosine hydroxylase-channelrhodopsin-2 mutants and show reduced bursting in tyrosine hydroxylase striatal-specific NMDAR1 knockouts (Xin and Costa, 2010). Although these criteria may exclude some neurons containing enzymatic markers relevant to dopamine synthesis (Margolis et al., 2006; Ungless and Grace, 2012), only neurons in this cluster signaled reward prediction errors in appreciable numbers in our previous work (Roesch et al., 2007; Takahashi et al., 2011).

This approach identified as putatively dopaminergic 51 of 501 and 55 of 407 neurons recorded from VTA in sham- and VS-lesioned rats, respectively (Fig. S1a-c). These proportions did not differ between groups (Chi-square = 2.4, df = 1, p = 0.12). Of these, 30 neurons in sham- and 31 in VS-lesioned rats increased firing in response to reward (compared with a 500 ms baseline taken during the inter-trial interval before trial onset). The average baseline activity and the average firing at the time of reward were similar in the two groups, both for these reward-responsive neurons and for the remaining dopamine neurons that were not responsive to reward (Fig. S1d). Thus, VS lesions did not appear to have dramatic effects on the prevalence, waveform characteristics, or reward-related firing of putative dopamine neurons. Of note, neurons categorized as non-dopaminergic did show significantly higher baseline firing in the VS-lesioned rats (Fig. S1e).

As in prior studies (Roesch et al., 2007; Takahashi et al., 2011), we found that prediction error signaling was largely restricted to reward-responsive, wide-waveform putative dopamine neurons (see Fig. S2 for analyses of error signaling in other populations). In sham-lesioned rats, the activity of these neurons increased in response to an unexpected reward and decreased in response to omission of an expected reward. In each case, the change was largest at the start of the block, diminishing with learning of the new reward contingencies as the block proceeded. Firing to unexpected reward and its change with learning were similar whether we changed the timing (Fig. 2a, e) or number of rewards (Fig. 2b, f). To quantify these effects, we analyzed the average firing rate on the first and last ten trials in blocks in which we changed the timing (Fig. 2i) or number (Fig. 2j) of rewards. Activity was taken at the time of reward or reward omission in the relevant blocks, as indicated by the matching colored arrows in Fig. 1b. A three-factor ANOVA (reward/omission × timing/number manipulation × trial) of the trial-by-trial neural data plotted in Figs. 2i and 2j revealed main effects of reward/omission (F1,29 = 15.0, p < 0.001) and a significant interaction between reward/omission and trial (F19,551 = 6.15, p < 0.001), but no main effect nor any interactions involving timing/number manipulation (F's < 2.9, p's > 0.10). Separate ANOVAs indicated main effects of trial in each data series (p's < 0.01), but no main effects or interactions with manipulation (F's < 2.7, p's > 0.1). Comparisons to baseline (gray lines, Fig. 2i, j) revealed initial changes in firing in response to unexpected reward or reward omission in delay blocks (Fig. 2i: reward/baseline × early/late phase, F1,29 = 9.42, p < 0.01; omission/baseline × early/late phase, F1,29 = 9.59, p < 0.01) and number blocks (Fig. 2j: reward/baseline × early/late phase, F1,29 = 8.97, p < 0.01; omission/baseline × early/late phase, F1,29 = 15.3, p < 0.01). Post-hoc analyses showed significant differences in firing versus baseline for both reward and omission early in each type of block (Fig. 2i: reward, F1,29 = 12.65, p < 0.01; omission, F1,29 = 18.9, p < 0.01; Fig. 2j: reward, F1,29 = 8.11, p < 0.01; omission, F1,29 = 6.92, p < 0.05), but not late in the block (F's < 3.8, p's > 0.05). In addition, difference scores comparing each neuron's firing early versus late in these blocks were distributed significantly above zero for unexpected reward (upper histograms in Fig. 2i and 2j) and below zero for reward omission (lower histograms in Fig. 2i and 2j).
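
For concreteness, the difference score behind these histograms (defined in the Figure 2 legend as each neuron's average firing over the first 5 trials of a block minus the last 10) can be sketched as below. This is an illustrative reconstruction in Python, not the authors' analysis code; the function name and default trial counts are assumptions taken from the legend.

```python
import numpy as np

# Per-neuron difference score, as described in the Figure 2 legend:
# mean rate over the first trials of a block minus mean rate over the
# last trials. Scores above zero indicate firing that declined as the
# new contingency was learned; scores below zero, firing that increased.
def difference_score(rates_in_block, n_early=5, n_late=10):
    """rates_in_block: per-trial firing rates (spikes/s) at the reward
    epoch, in trial order within one block."""
    rates = np.asarray(rates_in_block, dtype=float)
    return rates[:n_early].mean() - rates[-n_late:].mean()
```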

Figure 2. Changes in activity of reward-responsive dopamine neurons to unexpected changes in timing and size of reward.


(a – h) Three-dimensional heat plots represent activity averaged across all reward-responsive dopamine neurons in sham (n = 30) (a, b, e, f) and VS-lesioned rats (n = 31) (c, d, g, h) in response to introduction of unexpected delivery of short reward (a and c, blue arrows), unexpected big reward (b and d, green arrows), omission of expected short reward (e and g, red arrows), and omission of expected big reward (f and h, orange arrows). (i – l) Average firing during the 500 ms after delivery of short reward (blue) and big reward (green), or omission of short reward (red) and big reward (orange), in sham (i and j) and VS-lesioned rats (k and l). Error bars, s.e.m. Gray dotted lines and gray shadings indicate baseline firing and s.e.m. t = significant interaction vs baseline; * = significant difference from baseline early or late; ns = non-significant. Small insets in each panel represent the distribution of difference scores comparing firing to unexpected reward (top) and reward omission (bottom) in the first 5 versus the last 10 trials of the relevant trial blocks. Difference scores were computed from the average firing rate of each neuron in the first 5 minus the last 10 trials in the relevant trial blocks. The numbers in the upper left of each panel indicate results of the Wilcoxon signed-rank test (p) and the average difference score (μ). (m, n, p, q) Difference in firing between delivery and omission of short reward (m and p) and between delivery and omission of big reward (n and q) in sham (m and n) and VS-lesioned rats (p and q). Dashed lines and gray shadings indicate average and 2 s.e.m. of shuffled data. Error bars, s.e.m. (o, r) Distribution of difference scores comparing (m) versus (n) in sham (o) and (p) versus (q) in VS-lesioned rats (r). The numbers in the upper left of each panel indicate results of the Wilcoxon signed-rank test (p) and the average difference score (μ).

Notably, the effects of changing the timing versus number of rewards on the firing of these neurons were statistically indistinguishable inasmuch as the difference scores for unexpected reward and omission for each manipulation did not differ (Wilcoxon rank sum test, p's > 0.5). The similarity was also apparent when we plotted the difference in firing upon delivery of unexpected reward versus reward omission, trial by trial, separately for changes in timing (Fig. 2m) or number (Fig. 2n). This difference was large initially in each data series and then diminished after a small number of trials, approaching a shuffled baseline. A two-factor ANOVA comparing these data across the two manipulations found a significant main effect of trial (F19,551 = 6.23, p < 0.001), but no main effect nor any interaction with manipulation (F's < 0.7, p's > 0.75, see also difference scores plotted in the histogram in Fig. 2o). Together, these results suggest that our reward timing and number manipulations were equally successful at generating reward prediction errors in sham-lesioned animals.

VS lesions disrupt dopamine neuron signaling of prediction errors in response to changes in timing but not number of rewards

Ipsilateral lesions of the ventral striatum had a marked effect on the firing of dopamine neurons. In particular, reward-responsive dopamine neurons recorded in rats with ipsilateral VS lesions showed very little prediction error signaling when the timing of a reward changed (a delayed reward was made immediate, Fig. 2c, or an immediate reward was delayed, Fig. 2g). These neurons nevertheless showed prediction error signaling when reward number changed (new rewards added, Fig. 2d, or removed, Fig. 2h), although the changes in firing also seemed somewhat muted compared to those observed in sham controls.

These effects were again quantified by analyzing the average firing of the reward-responsive dopamine neurons on the first and last ten trials in all blocks in which we changed the timing (Fig. 2k) or number (Fig. 2l) of rewards. A three-factor ANOVA (reward/omission × timing/number manipulation × trial) of the neural data plotted in Figs. 2k and 2l revealed main effects of reward/omission (F1,30 = 15.4, p < 0.001) and trial (F19,570 = 1.83, p < 0.05), and a significant interaction between reward/omission and trial (F19,570 = 3.46, p < 0.001). However, in addition to these effects, which were similar to those seen in sham controls, there was also a significant three-way interaction involving the timing/number manipulation (F19,570 = 2.66, p < 0.001). Separate ANOVAs revealed significant interactions between trial and manipulation for both unexpected reward (F19,570 = 1.91, p < 0.05) and reward omission (F19,570 = 1.67, p < 0.05), and there were significant differences in firing on early versus late trials in response to changes in reward number (p's < 0.001) but not reward timing (F's < 1.45, p's > 0.10). Accordingly, difference scores comparing each neuron's firing early versus late in blocks where reward number was changed were distributed significantly above zero for unexpected reward and below zero for reward omission (histograms, Fig. 2l); however, similar scores computed when there was a change in reward timing did not differ from zero (histograms, Fig. 2k). The distributions of these scores also differed significantly across manipulations (Wilcoxon rank sum tests, p's < 0.05), indicating that the effects of changing the timing versus the number of rewards were statistically different.

Comparisons to baseline (gray lines, Fig. 2k, l) generally confirmed this difference between the effects of the two manipulations. There were no changes in firing versus baseline in response to manipulation of reward timing (Fig. 2k: reward/baseline × early/late phase, F1,30 = 1.65, p = 0.21; omission/baseline × early/late phase, F1,30 = 0.79, p = 0.38), whereas there were changes in response to manipulation of reward number (Fig. 2l: reward/baseline × early/late phase, F1,30 = 5.83, p < 0.05; omission/baseline × early/late phase, F1,30 = 18.9, p < 0.01). Post-hoc analyses showed a significant difference in firing versus baseline early in the block only for an unexpected increase in reward number (Fig. 2l: reward, F1,30 = 7.66, p < 0.01).

The different effects of the two manipulations were particularly apparent when we plotted the difference in firing upon delivery of unexpected reward versus reward omission, trial by trial, separately for changes in timing (Fig. 2p) or number (Fig. 2q) of rewards. For changes in reward number, this difference was large initially, diminishing after a small number of trials to approach a shuffled baseline value. However, for changes in reward timing, this difference score was flat throughout the block, showing only a difference due to reward receipt or omission. A two-factor ANOVA comparing these data across the two manipulations found a significant interaction between trial and manipulation (F19,570 = 2.65, p < 0.001, see also difference scores plotted in the histogram in Fig. 2r).

Finally, we directly compared data from sham- and VS-lesioned rats. ANOVAs comparing data for number blocks (Fig. 2j versus 2l or Fig. 2n versus 2q) found no main effects nor any interactions involving group (F's < 1.3, p's > 0.18), and the distributions of the scores comparing firing early versus late in these blocks (histograms, Figs. 2j versus 2l) were not statistically different across groups (Wilcoxon rank sum test, addition: p = 0.96, omission: p = 0.52). These analyses indicate that, within the limits of our statistical power, the two groups showed very similar changes in firing in response to the addition or omission of extra rewards. On the other hand, we found significant group interactions when comparing data from timing blocks in Figs. 2i versus 2k (reward/omission × trial × group: F19,1121 = 2.63, p < 0.001) and in Figs. 2m versus 2p (trial × group: F19,1121 = 2.63, p < 0.001). Post-hoc analyses showed a significant interaction between group and firing on early versus late trials in response to both unexpected reward and reward omission (p's < 0.01). Accordingly, the distributions of the difference scores comparing firing changes in these blocks (histograms, Figs. 2i versus 2k) were significantly different between the groups (Wilcoxon rank sum tests, p's < 0.01), reflecting the fact that dopamine neurons recorded in sham controls changed firing in response to changes in reward timing, whereas those recorded in VS-lesioned rats did not. Distributions of the difference scores plotted in Figs. 2o and 2r were also significantly different (Wilcoxon rank sum test, p = 0.011). Together, these analyses show that putative dopamine neurons in VS-lesioned rats responded differently to changes in the timing of reward compared to putative dopamine neurons recorded in sham-lesioned rats.

Figure 3. Effects of simulating a lesion of temporal expectations in the semi-Markov TDRL model.


(a-d) Simulated average prediction errors during the 500 ms after delivery of short reward (blue) and big reward (green), or omission of short reward (red) and big reward (orange), in the intact model (a-b) and in the lesioned model (c-d). Dotted lines in (c-d) show the expected pattern of prediction-error signals if total reward predictions were lost. Dark and light lines in (c-d) show simulated prediction-error signals for the full lesion and partial lesion models, respectively (colors indicate the same conditions as in (a-b)). (e) State space representation of the task, with transitions between states marked in black arrows and the characteristic observation for each state marked in gray. Note that the characteristic observation is emitted only with p = 0.7; otherwise, the state emits a null (empty) observation (p = 0.2) or any of the other 5 possible observations (with p = 0.02 each). Below, the observation matrix shows the probability of each observation given each state, and the transition matrix shows the probability of each successor state given each state. (f) Learned dwell-time distributions at the end of block 1 (delay block) for state 2 in the short delay condition (blue) and state 3 in the long delay condition (red) for the intact, partial lesion, and full lesion models. (g) Similar to (f) but at the end of block 3 (size block) for state 2 in the big reward condition (green) and state 3 in the small reward condition (orange). The more severe the lesion, the more of the probability mass lies at infinity (equivalent to no prediction of state duration).

VS provides temporally precise reward predictions to the VTA

The in vivo results of our experiment suggest that temporal aspects of reward prediction can be dissociated from predictions of the number of expected rewards, and that VS supplies essential information about the former, but perhaps not the latter, to VTA dopamine neurons. To make this interpretation concrete, we developed a TDRL model of the task using a framework that explicitly separates learning of reward number from learning to predict the timing of future rewards. Briefly, rather than using the standard TDRL framework in which event identity and event timing are inextricable, we used a semi-Markov framework that represents and learns about reward timing (and more generally, the duration of delays between events) separately and in parallel to representing and learning about expected amounts of reward (Daw et al., 2006).

In the model, the task is represented as a sequence of states (Fig. 3e), with the duration of each state being potentially different. Key events in the task (e.g., appearance of an odor or a reward) signal transitions between states; however, transitions can also occur without an external signal, due to the passage of time. Importantly, through experience with the task, the animal learns the expected duration of each state, as well as the value (that is, expected amount of future reward) associated with the state. The latter is acquired through a temporal-difference learning rule, as in classic TDRL (see Methods for full details), but it is separate from learning about state durations, which occurs through averaging of previous durations of each state. The separation of state durations from state values allows the model to capture disruption to one function but not the other (as evidenced in the dopamine signals), and constitutes a departure from commonly held assumptions regarding the computational basis of prediction learning in the basal ganglia. That is, our model suggests a completely different implementation of reinforcement learning in basal ganglia circuitry than is commonly considered (Joel et al., 2002).
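
To illustrate this separation, the sketch below shows the two parallel updates in Python. All names, learning rates, and the discount factor are our illustrative assumptions; the full specification of the model is in the Methods and Supplemental Information.

```python
import numpy as np

N_STATES = 6
ALPHA_V = 0.1   # learning rate for state values (assumed)
ALPHA_D = 0.1   # learning rate for duration averaging (assumed)
GAMMA = 0.98    # per-second discount factor (assumed)

V = np.zeros(N_STATES)   # expected future reward for each state
D = np.ones(N_STATES)    # expected dwell time for each state, in seconds

def on_transition(s, s_next, dwell, reward):
    """Apply both updates when state s ends after `dwell` seconds,
    yielding `reward` and a transition to state s_next."""
    # Value learning: a semi-Markov TD rule, discounted by elapsed time.
    delta = reward + (GAMMA ** dwell) * V[s_next] - V[s]
    V[s] += ALPHA_V * delta
    # Duration learning: a running average of observed dwell times,
    # entirely separate from the value update above.
    D[s] += ALPHA_D * (dwell - D[s])
    return delta   # the model's dopamine-like prediction-error signal
```

Keeping V and D separate in this way is what allows a simulated lesion to degrade timing knowledge (D) while leaving reward predictions (V) intact.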

One important departure from classic TDRL is that in our model, prediction error signals occur only on state transitions, and thus can be seen as "gated" according to the probability of a state transition occurring at any given time. This probability is 1 (or near 1, thus the "gate" is fully "open") when an external event is observed, as when a new reward is delivered. However, critically, the transition probability can also be high if there is a precise expectation regarding the duration of a specific state, and this duration has terminated. It is in this case—when a state is deemed to have terminated due to the passage of time despite the expected reward not having arrived—that a negative prediction error signal will be gated. Thus, temporal expectations can control, or gate, prediction errors by causing a transition between states to be inferred even in the absence of an external event.
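
A minimal sketch of this gate, under the simplifying assumption that a state's learned duration is summarized by a single expected dwell time (the full model maintains a dwell-time distribution; the tolerance parameter is our assumption):

```python
def transition_gate(elapsed, expected_dwell, event_observed, tolerance=0.5):
    """Return True when a state transition should be inferred.
    elapsed, expected_dwell, and tolerance are in seconds."""
    if event_observed:
        return True   # external events (cues, rewards) open the gate fully
    # Silent transition: the state has outlived its learned duration,
    # so the model infers that the state ended without the expected
    # reward, gating a negative prediction error against its value.
    return elapsed > expected_dwell + tolerance
```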

To test the predictions of the model, we simulated the evolution of state values, state durations, and the associated prediction errors using the actual task event timings that each group of rats experienced when performing the task. Model prediction-error signals were transformed into an instantaneous firing rate and then averaged and analyzed using the same epochs as we used for the neural data. For the control group (fully intact model), the simulations yielded the characteristic pattern of prediction error signals observed in vivo in this study and previously (Roesch et al., 2007; Takahashi et al., 2011). Specifically, the simulation produced positive prediction errors to unexpected early or large rewards and negative prediction errors to unexpected reward omission (Figs. 3a and b). These errors were strongest early in each block and gradually disappeared over subsequent trials, and the pattern was similar for errors induced by changes in number versus timing of reward.

To simulate a VS lesion in the model, we prevented the model from learning or using precise temporal expectations for the duration of states. To do this, updates of state durations were effectively "blurred" by decreasing the amplitude of the timing "kernel" that was used during learning (see Methods and Fig. 3f, g; partial lesions were modeled by decreasing the kernel amplitude by half, full lesions by decreasing it to baseline). This simulated lesion captures the intuition that the VS is critically involved in gating prediction errors according to learned state durations, and that a loss of VS therefore corresponds specifically to a (partially or fully) decreased amplitude of a signal that tracks the expectation of a reward-related event occurring at that point within a trial. One effect of this "lesion" is to effectively block the model from inferring a transition between states without an observation—the model will wait in a state indefinitely (or until the next observation of a cue or reward) and is unable to infer a transition in the case of a missing (or late) reward.
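
As a sketch of what such a "blurred" update could look like, assuming a discretized dwell-time distribution and a Gaussian timing kernel (the exact kernel form is given in the Methods; the bin size, rate, and width here are illustrative):

```python
import numpy as np

DT = 0.1                                # time bin in seconds (assumed)
durations = np.arange(DT, 10.0, DT)     # support of the dwell distribution

def update_dwell_dist(p, observed_dwell, width=0.5, lesion_scale=1.0):
    """Mix a kernel centered on the observed dwell time into the
    distribution p. lesion_scale: 1.0 intact, 0.5 partial, 0.0 full."""
    kernel = np.exp(-0.5 * ((durations - observed_dwell) / width) ** 2)
    kernel /= kernel.sum()
    lr = 0.2 * lesion_scale             # the lesion shrinks the update
    p = (1 - lr) * p + lr * kernel
    return p / p.sum()

# With lesion_scale = 0.0 the distribution never sharpens beyond its
# flat prior, so the model cannot infer a silent transition when a
# reward arrives late or not at all.
```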

This timing-lesioned model produced results that were remarkably similar to the firing of the dopamine neurons in VS-lesioned rats. Specifically, the simulation produced positive prediction errors in response to the delivery of new rewards (Fig. 3d, green line) but showed neither positive nor negative errors in response to changes in reward timing (Fig. 3c). These results closely match the data observed in vivo. Moreover, the model did not register negative prediction errors when the additional rewards were omitted in number blocks (Fig. 3d, orange line). This is because the lesioned model could not use state duration to infer a transition at the time of the expected (but omitted) reward, and thus it did not gate a negative prediction-error signal. Notably, our neural data were equivocal on whether a negative prediction error occurred for this event. On the one hand, there was not a significant difference in firing to reward omission between groups, and there appeared to be a significant shift below zero in the activity of the individual neurons at the time of omission in the number blocks. On the other hand, comparing firing of the dopamine neurons recorded in the lesioned rats at the time of reward omission to baseline at the start of these blocks, the apparent decline in firing was not statistically significant. In any event, any discrepancy here is not necessarily at odds with this prediction of the model, as it is possible that the lesions were not equivalent to a complete loss of function (see light lines in Fig. 3c, d for simulation of the effects of a partial lesion). We also simulated a lesion in which the width of the kernel update to the state duration distributions was increased (rather than its amplitude decreased) and the calculation of expected duration within a state was left intact. This simulation produced similar results (Fig. S3), suggesting that the specific implementation of the lesion was not paramount, as long as timing information in the model was degraded.

Finally, we also tested the original hypothesis, commonly held in the literature (Joel et al., 2002; O'Doherty et al., 2003; Willuhn et al., 2012), that the VS serves as a (unitary) source of predictions, for VTA dopamine neurons, of both when and how much reward is expected. To do this, in the lesioned model we set all state values to zero and blocked their update during the task, creating a situation where prediction errors must be computed absent input regarding both the timing and the number of predicted rewards. In this case, and as expected, the model generated persistent prediction errors to reward delivery and no prediction errors to reward omission (dotted lines in Figs. 3c and d). This pattern of results is clearly at odds with the in vivo data, suggesting that the assumption embedded in classical TDRL models—that predictions of reward number and of reward timing go hand in hand (and thus are either present or absent as a unit)—is incorrect.

Discussion

Reward prediction errors are signaled by midbrain dopamine neurons (Barto, 1995; Mirenowicz and Schultz, 1994; Montague et al., 1996; Schultz et al., 1997). To do this, dopamine neurons require predictions to compare to actual obtained rewards (Bush and Mosteller, 1951; Rescorla and Wagner, 1972; Sutton and Barto, 1981). Theoretical and experimental work has suggested that the VS is an important source of these predictions, particularly to dopamine neurons in the VTA (Daw et al., 2006; Daw et al., 2005; Joel et al., 2002; O'Doherty et al., 2003; O'Doherty et al., 2004; Seymour et al., 2004). Here we tested this hypothesis, recording from VTA dopamine neurons in rats with a lesioned VS, while they performed a task in which positive and negative prediction errors were induced by shifting either the timing or the number of expected rewards. Sham-lesioned rats exhibited prediction error signals in response to both manipulations, replicating our previous findings (Roesch et al., 2007; Takahashi et al., 2011). By contrast, dopamine neurons in rats with ipsilateral VS lesions exhibited intact prediction errors in response to increases in the number of rewards but no prediction errors to changes in reward timing on the order of several seconds (and possibly also to complete omission of expected rewards). These effects were reproduced by a computational model that used a non-traditional reinforcement-learning framework to separate learning about reward timing from learning about reward number (“state value”). Our results thus suggest a critical role for the VS in providing information about the predicted timing of rewards, but not their number, to VTA dopamine neurons. These data and our theoretical interpretive framework may require a rethinking of the implementation of prediction-error computation in the neural circuitry of the basal ganglia.

Before considering the implications of our findings, we address some important caveats. One key determinant of our findings may be the amount of training the rats underwent before recording began—while our rats were trained extensively, it may be that the VS has a broader role in supporting reward predictions in very early stages of learning a task. It is also possible that had we allowed less time for compensation by using reversible inactivation, or lesioned the VS more completely or bilaterally, we might have observed effects of the lesion on both types of prediction errors. In any case, our data suggest that delay-induced prediction-error signaling is more sensitive to VS damage than are prediction errors induced by changes in number of rewards. We also failed to observe any relationship between the amount of damage and the loss or preservation of prediction errors in our lesioned group (Fig 2r versus % damage yields an r = 0.08, p = 0.66). Even relatively modest damage to VS was sufficient to entirely disrupt delay-induced errors with little effect on those induced by number changes.

It is also an empirical question whether these results will generalize to primates or other rodent species, or to other midbrain regions. While the waveform sorting approach we used is roughly similar to the approach used to identify prediction-error signaling dopamine neurons in primates (Bromberg-Martin et al., 2010; Fiorillo et al., 2008; Hollerman and Schultz, 1998; Kobayashi and Schultz, 2008; Matsumoto and Hikosaka, 2009; Mirenowicz and Schultz, 1994; Morris et al., 2006; Waelti et al., 2001), and the error signals we find in our neurons obey many of the same rules as those demonstrated in comparable primate studies (Bromberg-Martin et al., 2010; Fiorillo et al., 2008; Kobayashi and Schultz, 2008; Morris et al., 2006), it is possible that the influence of VS on these signals differs in other species. This may be particularly true in mice, where dopamine neurons seem more prevalent in single-unit data (>50% of recorded neurons) and prediction-error signals are often reported in populations whose waveform features are largely indistinguishable from those of other populations (Cohen et al., 2012; Eshel et al., 2015; Tian and Uchida, 2015). It is also possible that the influence of VS on dopaminergic error signals in other parts of the midbrain differs from what we have observed in (mostly lateral) VTA.

As a last caveat, we note that our conclusions are based on a relatively small proportion of our recorded neurons, smaller than would be identified as dopaminergic by immunohistological criteria (Li et al., 2013). While we did not pre-select neurons for recording, it is possible that neurons with different firing correlates were not visible to our electrodes. Nevertheless, the neurons we isolated had waveform and firing correlates similar to putative dopamine neurons in other studies in both rats (Jo et al., 2013; Jo and Mizumori, 2015; Pan et al., 2005) and primates (Bromberg-Martin et al., 2010; Fiorillo et al., 2008; Kobayashi and Schultz, 2008; Morris et al., 2006), recorded in both VTA and substantia nigra pars compacta. Further, most neurons in lateral VTA of the rat are thought to be classical dopamine neurons, meaning that they exhibit the same enzymatic phenotype characteristic of dopamine-releasing neurons in the substantia nigra (Li et al., 2013). As a result, we do not believe we are missing substantial numbers of dopamine neurons in our classification. Consistent with this, we also analyzed activity from the other neural populations that we recorded, but did not see any evidence of significant dopamine-like error signaling (see Supplemental data, especially Table S1).

We now turn to the possible implications of our results. Most importantly, our findings are inconsistent with the currently popular model in which VS supplies the reward predictions used by VTA dopamine neurons to calculate reward prediction errors (Daw et al., 2006; Daw et al., 2005; Joel et al., 2002; O'Doherty et al., 2003; O'Doherty et al., 2004; Seymour et al., 2004). If this model were correct, removing VS input to dopamine neurons would have amounted to removing all reward predictions, and therefore would have resulted in persistent firing to all rewards (now “unexpected”) and no signals to omission of rewards, irrespective of the manipulation (timing or number) that induced prediction errors. We did not observe these results, suggesting that the strong version of this hypothesis is not viable.

However, we did find major effects of VS removal on VTA error signaling, but only when prediction errors were induced by changing the timing of the reward. Here it is important to note that even for these timing-dependent prediction errors, this is not the effect we would have expected if we had simply eliminated timing predictions—while negative prediction errors would be lost, positive errors would remain high in that case, as all rewards would be surprising. Instead, we found that positive errors were also eliminated (as compared to the response of these neurons to other unexpected rewards), as if a reward was predicted whenever it arrived. That is, lacking VS input, putative dopamine neurons knew that rewards would appear but did not know (or care) when. This suggests that VS is not necessary for predicting the occurrence of reward; however, it is necessary for endowing that prediction with temporal specificity (e.g., that the reward is coming after 500 ms). Intriguingly, absent this information, any timing of the reward was treated as the expected timing of the reward.

A dissociation between knowing that reward is coming and knowing when it should arrive directly contradicts the standard TDRL framework, in which predictions about reward timing and number are inextricably linked (Montague et al., 1996; Schultz et al., 1997; Sutton and Barto, 1981). However, this dissociation was produced effectively in a model based on a semi-Markov framework proposed by Daw et al. (2006) that learns about reward timing and reward number in parallel. "Lesioning" this model by removing expectations for the duration between events in the task, while leaving all other aspects of prediction learning intact, produced the exact pattern of altered prediction-error signaling seen in the VS-lesioned rats: a reward that came too early, or was delayed such that its timing became uncertain, produced no prediction error, even though there were robust prediction error signals in response to changes in the number of rewards. The model also predicted the loss of number-induced negative prediction errors (Fig. 3d, orange line). It is not clear from our in vivo data whether this signal is indeed affected by the VS lesion (Fig. 2j vs. 2l, orange lines), as the low baseline rates of dopamine neurons make suppression of firing (or lack thereof) difficult to demonstrate reliably. In any case, if one assumes that the lesions did not completely ablate the VS, some residual negative prediction error might be expected (Fig. 3c, d, light lines).

Our modeling results link signaling from the VS with the gating of prediction-error signals according to the learned timing of rewards, suggesting that activity in the VS might evolve within a trial by tracking learned temporal expectations regarding the timing of reward delivery (or other reward-predictive events). Such a signal from the VS might thus be similar to adaptive time representations found in the dorsal striatum in an instrumental task (Mello et al., 2015). A role for VS in constraining reward predictions to specific (learned) time intervals is also consistent with previous reports that place the VS at the center of a neural timing circuit (Meck et al., 2008). Notably this function is thought to depend on input from hippocampus (Meck, 1988; Meck et al., 2013), which has been linked to keeping track of internal representations of time (Eichenbaum, 2014), and is known to regulate VS input to VTA (Floresco et al., 2001). The loss of the ability to track these temporal predictions absent a VS and its effect on the perception of negative prediction errors may also be of relevance to the apparent role of VS in the perception of risk (Dalton et al., 2014; St Onge et al., 2012; Stopper and Floresco, 2011), since an inability to perceive negative prediction errors would dramatically reduce the apparent “riskiness” of a probabilistic reward. One prediction of our model is that substituting a small reward for reward omission might restore normal function in such tasks.

A specific role of VS in signaling state timing can even be observed in simple Pavlovian conditioning tasks—previous work has shown that when rats with bilateral neurotoxic lesions of VS are trained to associate long-duration cues with reward, learning and the ultimate levels of responding are not affected by the lesions, but the specific pattern of responding during the cues is disrupted (Singh et al., 2011). Specifically, while non-lesioned controls exhibit progressively more conditioned responding as the reward period approaches, rats with VS lesions respond at constant levels throughout the delay to reward. This is consistent with a failure to learn temporally precise predictions but a conserved ability to learn that a reward is impending. Finally, Klein-Flugge et al. (2011) compared fMRI responses in VS and VTA in humans learning to predict the timing, but not the number, of variably timed rewards. They showed that blood-oxygen-level-dependent (BOLD) signals in the VS are sensitive to information about timing, but not information about number, whereas VTA prediction-error signals are sensitive to both. Indeed, VS signals in that study were consistent with a role for the VS in learning when rewards will occur, showing larger responses to cues that predicted information about timing and to rewards that occurred within the expected timeframe. Here we have shown more directly that the VS supplies information about the temporal specificity of predictions to VTA neurons.

Single-unit studies show that signals related to reward number or value are present in the VS, and many other reports indicate that VS integrity can be critical to behaviors seemingly based on value or even reward number (Berridge and Robinson, 1998; Di Chiara, 2002; Hauber et al., 2000; McDannald et al., 2011; Nicola, 2007; Steinberg et al., 2014). In fact, single-unit data from VS in the same task as used here show that information about number is present in VS unit activity (Roesch et al., 2009), and rats with bilateral VS lesions tested in this task initially showed free-choice deficits (Burton et al., 2014), though the deficits in delay blocks were significantly more severe than those in the number blocks. Indeed, when analyzed separately, VS-lesioned rats showed a preference for larger rewards on free-choice trials but did not show a preference between immediate versus delayed rewards (M.R. Roesch, personal communication), consistent with our findings. Interestingly, with further training, rats with bilateral VS lesions did learn to respond normally, even in the delay blocks. The authors concluded that this recovery of normal behavior must reflect compensation by other, slower learning mechanisms in dorsal striatum (Burton et al., 2014). Our data from extensively trained rats suggest that these mechanisms can operate independently of normal delay-induced dopaminergic error signals from the VTA.

Our data suggest that even though VS neurons may signal information about the expected number of rewards (Roesch et al., 2009), this information may be redundant with inputs to VTA from other areas, as VS lesions were not detrimental to (positive) prediction-error signaling due to changes in reward number. Indeed, information about reward number or simply the future occurrence of reward is sufficiently fundamental that it is likely signaled by a diverse array of areas impinging on the midbrain, any one of which may be sufficient to support error signaling in response to simply adding a reward. Neural responses to rewards or to cues that predict different numbers of rewards are found in many brain areas, including areas that have either direct or indirect projections to VTA or its surrounds. Preserved number-induced error signals in our recordings may therefore reflect input from any of these areas. By contrast, our results suggest that signaling of when a reward is expected to occur is an aspect of reward prediction that is mediated uniquely by circuitry that converges on the VS.

Experimental Procedures

Subjects

Male Long-Evans rats (n = 16) were obtained at 175-200 g from Charles River Labs, Wilmington, MA. Rats were tested at the NIDA-IRP in accordance with NIH guidelines.

Surgical procedures and histology

Lesions were made and electrodes implanted under stereotaxic guidance; all surgical procedures adhered to guidelines for aseptic technique. VS lesions were made by infusing quinolinic acid (Sigma) in Dulbecco's phosphate vehicle. Infusions of 0.4 μl of quinolinic acid (20 μg/μl) were made at 1.9 mm anterior to bregma and 1.9 mm lateral to the midline, at a depth of 7.3 mm ventral to the skull surface. Sham controls received identical treatment, except that no infusion was made. After this procedure, a drivable bundle of eight 25-μm diameter FeNiCr wires (Stablohm 675, California Fine Wire, Grover Beach, CA) was chronically implanted dorsal to VTA in the left or right hemisphere at 5.2 mm posterior to bregma, 0.7 mm lateral to the midline, and 7.0 mm ventral to the brain surface, at an angle of 5° toward the midline from vertical. Wires were cut with surgical scissors to extend ∼1.5 mm beyond the cannula and electroplated with platinum (H2PtCl6, Aldrich, Milwaukee, WI) to an impedance of ∼300 kOhms. Cephalexin (15 mg/kg p.o.) was administered twice daily for two weeks post-operatively. The rats were then perfused, and their brains removed and processed for histology (Roesch et al., 2006).

Odor-guided choice task

Recording was conducted in aluminum chambers approximately 18″ on each side, with sloping walls narrowing to an area of 12″ × 12″ at the bottom. A central odor port was located above two fluid wells (Fig. 1a). Two lights were located above the panel. The odor port was connected to an airflow dilution olfactometer to allow rapid delivery of olfactory cues. Odors were chosen from compounds obtained from International Flavors and Fragrances (New York, NY).

Trials were signaled by illumination of the panel lights inside the box. When these lights were on, a nosepoke into the odor port resulted in delivery of the odor cue to a small hemicylinder located behind this opening. One of three different odors was delivered to the port on each trial, in a pseudorandom order. At odor offset, the rat had 3 seconds to make a response at one of the two fluid wells. One odor instructed the rat to go to the left well to receive reward, a second odor instructed the rat to go to the right well to receive reward, and a third odor indicated that the rat could obtain reward at either well. Odors were presented in a pseudorandom sequence such that the free-choice odor was presented on 7/20 trials and the left/right odors were presented in equal numbers. In addition, the same odor could be presented on no more than 3 consecutive trials.
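
One plausible way to generate a sequence meeting these constraints is rejection sampling, as in the sketch below. This is illustrative only, not the task-control code; note that the 13 forced trials per 20 cannot split exactly evenly, so left/right balance is approximate over a session.

```python
import random

def make_odor_sequence(n_trials=200, max_run=3, seed=0):
    """Draw odors so that ~7/20 trials are free choice, forced left and
    right appear about equally often, and no odor repeats more than
    `max_run` times in a row."""
    rng = random.Random(seed)
    seq = []
    while len(seq) < n_trials:
        odor = rng.choices(['free', 'left', 'right'],
                           weights=[7, 6.5, 6.5])[0]
        # Reject any draw that would extend a run past max_run.
        if len(seq) >= max_run and all(o == odor for o in seq[-max_run:]):
            continue
        seq.append(odor)
    return seq
```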

Once the rats were shaped to perform this basic task, we introduced blocks in which we independently manipulated the number of rewards and the delay preceding reward delivery (Fig. 1b). For recording, one well was randomly designated as short and the other as long at the start of the session (Fig. 1b, 1sh and 1lo). In the second block of trials, these contingencies were switched (Fig. 1b, 2sh and 2lo). The length of the delay under long conditions followed an algorithm in which the side designated as long started off at 1 s and increased by 1 s every time that side was chosen, until it became 3 s. If the rat continued to choose that side, the length of the delay increased by 1 s up to a maximum of 7 s. If the rat chose the side designated as long on fewer than 8 of the last 10 choice trials, the delay was reduced by 1 s, to a minimum of 3 s. The reward delay for long forced-choice trials was yoked to the delay on free-choice trials during these blocks. In later blocks, we held the delay preceding reward constant while manipulating the number of rewards (Fig. 1b, 3bg, 3sm, 4bg and 4sm). The reward was a 0.05 ml bolus of 10% sucrose solution. The number of rewards used in delay blocks was the same as in the small-reward blocks (a single bolus). For the big reward, additional boli were delivered after gaps of 500 ms.
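
As we read it, the delay staircase reduces to the sketch below; the helper name and the exact bookkeeping (e.g., when the last-10-trials window is evaluated) are our assumptions.

```python
def update_long_delay(delay, chose_long, recent_long_choices):
    """delay: current long-side delay (s); chose_long: whether the long
    side was chosen this trial; recent_long_choices: booleans for the
    last 10 free-choice trials."""
    if chose_long and delay < 7:
        delay += 1                      # escalate toward the 7 s maximum
    if sum(recent_long_choices) < 8 and delay > 3:
        delay -= 1                      # relax toward the 3 s minimum
    return delay
```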

Single-unit recording

Wires were screened for activity daily; if no activity was detected, the rat was removed, and the electrode assembly was advanced 40 or 80 μm. Otherwise, active wires were selected for recording, a session was conducted, and the electrode was advanced at the end of the session. Neural activity was recorded using Plexon Multichannel Acquisition Processor systems (Plexon Inc, Dallas, TX). Signals from the electrode wires were amplified 20× by an op-amp headstage (Plexon Inc, HST/8o50-G20-GR) located on the electrode array. Immediately outside the training chamber, the signals were passed through a differential pre-amplifier (Plexon Inc, PBX2/16sp-r-G50/16fp-G50), where the single-unit signals were amplified 50× and filtered at 150-9000 Hz. The single-unit signals were then sent to the Multichannel Acquisition Processor box, where they were further filtered at 250-8000 Hz, digitized at 40 kHz, and amplified 1-32×. Waveforms (>2.5:1 signal-to-noise) were extracted from active channels and recorded to disk by an associated workstation.

Data analysis

Units were sorted using Offline Sorter software from Plexon Inc (Dallas, TX). Sorted files were then processed and analyzed in NeuroExplorer and Matlab (MathWorks, Natick, MA). Dopamine neurons were identified via a waveform analysis used and validated previously by us and others (Jo et al., 2013; Roesch et al., 2007; Takahashi et al., 2011; Xin and Costa, 2010). Briefly, cluster analysis was performed based on the half time of the spike duration and the ratio of the amplitudes of the first positive and negative waveform segments. The center and variance of each cluster were computed without data from the neuron of interest, and that neuron was then assigned to a cluster if it fell within 3 s.d. of the cluster's center. Neurons that met this criterion for more than one cluster were not classified. This process was repeated for each neuron.
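
An illustrative reconstruction of this leave-one-out assignment rule, assuming two waveform features per neuron and reading "within 3 s.d." as a per-feature criterion (the original analysis may have defined the distance differently):

```python
import numpy as np

def assign_neuron(i, features, labels, n_sd=3.0):
    """features: (n_neurons, 2) array of (half duration, amplitude ratio);
    labels: provisional cluster label per neuron. Returns the cluster for
    neuron i, or None if it matches zero clusters or more than one."""
    labels = np.asarray(labels)
    matches = []
    for c in np.unique(labels):
        # Recompute the cluster center and spread without neuron i.
        members = [j for j in np.where(labels == c)[0] if j != i]
        center = features[members].mean(axis=0)
        sd = features[members].std(axis=0) + 1e-12  # guard against zero
        if np.all(np.abs((features[i] - center) / sd) < n_sd):
            matches.append(c)
    # Neurons matching more than one cluster are left unclassified.
    return matches[0] if len(matches) == 1 else None
```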

To quantify changes in firing due to reward delivery or omission, we examined activity in the 500 ms periods identified by the arrows in Fig. 1b. The start of this time window coincided with the opening of the solenoid valve, and the duration was chosen to encompass the maximum duration of opening (actual open times were calibrated to maintain 0.05 ml boli and so could be shorter than this). Importantly, no other trial event occurred until at least 500 ms after the end of this time window. Thus, this period allowed us to isolate activity related to delivery or omission of each individual reward bolus. Analyses were conducted using Matlab (MathWorks, Natick, MA) or Statistica (StatSoft, Inc, Tulsa, OK) as described in the main text.
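
The epoch measure itself reduces to counting spikes in that window; a minimal sketch (assuming spike and event times in seconds):

```python
import numpy as np

def epoch_rate(spike_times, event_time, window=0.5):
    """Firing rate (spikes/s) in the 500 ms epoch starting at valve
    opening (or at the expected opening time, for omitted rewards)."""
    spikes = np.asarray(spike_times)
    in_window = (spikes >= event_time) & (spikes < event_time + window)
    return np.count_nonzero(in_window) / window
```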

Computational modeling

We simulated learning and prediction error signaling in the task using temporal difference reinforcement learning in a semi-Markov framework with partial observability (Daw et al., 2006). Briefly, in this approach, we assume the rats represent the behavioral task as a sequence of states that each have an associated value, V, and a distribution over dwell times in that state, D. Observations during the task, such as an odor cue or the delivery of a reward in a well, signal a transition between states, at which time a prediction error is signaled and used to update state values. Additionally, transitions can occur without an external observation due to the mere passage of time (e.g., at the time that a reward was expected but failed to arrive, see below). That is, knowledge of the likely dwell time in a state (represented by D) can be used to infer a silent transition, and gate the signaling of a prediction error and the update of state values. To simulate a VS lesion in the model, we prevented the model from learning accurate dwell time distributions for each state, thereby degrading the ability of the model to infer these silent transitions when a reward is omitted or delayed. We describe the model in more detail in the Supplemental Information.

Supplementary Material

supplement

Acknowledgments

This work was supported by funding from NIDA (Y.T. and G.S.), the Human Frontier Science Program Organization (A.L.), and NIMH grant R01MH098861 (Y.N.). The opinions expressed in this article are the authors' own and do not reflect the view of the National Institutes of Health, the Department of Health and Human Services, or the United States government. The authors would like to acknowledge Nathaniel Daw for helpful suggestions regarding the semi-Markov model of the task.

Footnotes

The authors declare no competing financial interests.

Author Contributions: Y.T. and G.S. conceived and designed the behavioral and single-unit experiments, and Y.T. conducted the experiments and analyzed the data. A.L. conducted the modeling, with input from Y.N. Y.T. and G.S. prepared the manuscript with input from the other authors.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

1. Barto AG. Adaptive critics and the basal ganglia. In: Houk JC, Davis JL, Beiser DG, editors. Models of Information Processing in the Basal Ganglia. Cambridge, MA: MIT Press; 1995. pp. 215–232.
2. Berridge KC, Robinson TE. What is the role of dopamine in reward: hedonic impact, reward learning, or incentive salience? Brain Res Brain Res Rev. 1998;28:309–369. doi: 10.1016/s0165-0173(98)00019-8.
3. Bissonette GB, Burton AC, Gentry RN, Goldstein BL, Hearn TN, Barnett BR, Kashtelyan V, Roesch MR. Separate populations of neurons in ventral striatum encode value and motivation. PLoS One. 2013;8:e64673. doi: 10.1371/journal.pone.0064673.
4. Bocklisch C, Pascoli V, Wong JCY, House DRC, Yvon C, de Roo M, Tan KR, Luscher C. Cocaine disinhibits dopamine neurons by potentiation of GABA transmission in the ventral tegmental area. Science. 2013;341:1521–1525. doi: 10.1126/science.1237059.
5. Bradtke SJ, Duff MO. Reinforcement learning methods for continuous-time Markov decision problems. In: Tesauro G, Touretzky DS, Leen TK, editors. Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press; 1995. pp. 393–400.
6. Bromberg-Martin ES, Matsumoto M, Hong S, Hikosaka O. A pallidus-habenula-dopamine pathway signals inferred stimulus values. Journal of Neurophysiology. 2010;104:1068–1076. doi: 10.1152/jn.00158.2010.
7. Burton AC, Bissonette GB, Lichtenberg NT, Kashtelyan V, Roesch MR. Ventral striatum lesions enhance stimulus and response encoding in dorsal striatum. Biological Psychiatry. 2014;75:132–139. doi: 10.1016/j.biopsych.2013.05.023.
8. Bush RR, Mosteller F. A mathematical model for simple learning. Psychological Review. 1951;58:313–323. doi: 10.1037/h0054388.
9. Cohen JY, Haesler S, Vong L, Lowell BB, Uchida N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature. 2012;482:85–88. doi: 10.1038/nature10754.
10. Dalton GL, Phillips AG, Floresco SB. Preferential involvement by nucleus accumbens shell in mediating probabilistic learning and reversal shifts. Journal of Neuroscience. 2014;34:4618–4626. doi: 10.1523/JNEUROSCI.5058-13.2014.
11. Daw ND, Courville AC, Touretzky DS. Representation and timing in theories of the dopamine system. Neural Computation. 2006;18:1637–1677. doi: 10.1162/neco.2006.18.7.1637.
12. Daw ND, Niv Y, Dayan P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience. 2005;8:1704–1711. doi: 10.1038/nn1560.
13. Di Chiara G. Nucleus accumbens shell and core dopamine: differential role in behavior and addiction. Behav Brain Res. 2002;137:75–114. doi: 10.1016/s0166-4328(02)00286-3.
14. Eichenbaum H. Time cells in the hippocampus: a new dimension for mapping memories. Nature Reviews Neuroscience. 2014;15:732–744. doi: 10.1038/nrn3827.
15. Eshel N, Bukwich M, Rao V, Hemmelder V, Tian J, Uchida N. Arithmetic and local circuitry underlying dopamine prediction errors. Nature. 2015;525:243–246. doi: 10.1038/nature14855.
16. Fiorillo CD, Newsome WT, Schultz W. The temporal precision of reward prediction in dopamine neurons. Nature Neuroscience. 2008;11:966–973. doi: 10.1038/nn.2159.
17. Floresco SB, Todd CL, Grace AA. Glutamatergic afferents from the hippocampus to the nucleus accumbens regulate activity of ventral tegmental area dopamine neurons. Journal of Neuroscience. 2001;21:4915–4922. doi: 10.1523/JNEUROSCI.21-13-04915.2001.
18. Gallistel CR, King A, McDonald R. Sources of variability and systematic error in mouse timing behavior. Journal of Experimental Psychology: Animal Behavior Processes. 2004;30:3–16. doi: 10.1037/0097-7403.30.1.3.
19. Gibbon J. Scalar expectancy theory and Weber's law in animal timing. Psychological Review. 1977;84:279–325.
20. Grace AA, Bunney BS. Opposing effects of striatonigral feedback pathways on midbrain dopamine cell activity. Brain Research. 1985;333:271–284. doi: 10.1016/0006-8993(85)91581-1.
21. Groenewegen HJ, Berendse HW, Wolters JG, Lohman AHM. The anatomical relationship of the prefrontal cortex with the striatopallidal system, the thalamus and the amygdala: evidence for a parallel organization. Progress in Brain Research. 1990;85:95–118. doi: 10.1016/s0079-6123(08)62677-1.
22. Hauber W, Bohn I, Giertler C. NMDA, but not dopamine D2, receptors in rat nucleus accumbens are involved in guidance of instrumental behavior by stimuli predicting reward magnitude. Journal of Neuroscience. 2000;20:6282–6288. doi: 10.1523/JNEUROSCI.20-16-06282.2000.
23. Hollerman JR, Schultz W. Dopamine neurons report an error in the temporal prediction of reward during learning. Nature Neuroscience. 1998;1:304–309. doi: 10.1038/1124.
24. Jo YS, Lee J, Mizumori SJ. Effects of prefrontal cortical inactivation on neural activity in the ventral tegmental area. Journal of Neuroscience. 2013;33:8159–8171. doi: 10.1523/JNEUROSCI.0118-13.2013.
25. Jo YS, Mizumori SJ. Prefrontal regulation of neuronal activity in the ventral tegmental area. Cerebral Cortex. 2015. doi: 10.1093/cercor/bhv215. Epub ahead of print.
26. Joel D, Niv Y, Ruppin E. Actor-critic models of the basal ganglia: new anatomical and computational perspectives. Neural Networks. 2002;15:535–547. doi: 10.1016/s0893-6080(02)00047-3.
27. Klein-Flugge MC, Hunt LT, Bach DR, Dolan RJ, Behrens TEJ. Dissociable reward and timing signals in human midbrain and ventral striatum. Neuron. 2011;72:654–664. doi: 10.1016/j.neuron.2011.08.024.
28. Kobayashi K, Schultz W. Influence of reward delays on responses of dopamine neurons. Journal of Neuroscience. 2008;28:7837–7846. doi: 10.1523/JNEUROSCI.1600-08.2008.
29. Lammel S, Steinberg EE, Foldy C, Wall NR, Beier K, Luo L, Malenka RC. Diversity of transgenic mouse models for selective targeting of midbrain dopamine neurons. Neuron. 2015;85:429–438. doi: 10.1016/j.neuron.2014.12.036.
30. Li X, Qi J, Yamaguchi T, Wang HL, Morales M. Heterogeneous composition of dopamine neurons of the rat A10 region: molecular evidence for diverse signaling properties. Brain Structure and Function. 2013;218:1159–1176. doi: 10.1007/s00429-012-0452-z.
31. Margolis EB, Lock H, Hjelmstad GO, Fields HL. The ventral tegmental area revisited: Is there an electrophysiological marker for dopaminergic neurons? Journal of Physiology. 2006;577:907–924. doi: 10.1113/jphysiol.2006.117069.
32. Matsumoto M, Hikosaka O. Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature. 2009;459:837–841. doi: 10.1038/nature08028.
33. McDannald MA, Lucantonio F, Burke KA, Niv Y, Schoenbaum G. Ventral striatum and orbitofrontal cortex are both required for model-based, but not model-free, reinforcement learning. Journal of Neuroscience. 2011;31:2700–2705. doi: 10.1523/JNEUROSCI.5499-10.2011.
34. Meck WH. Hippocampal function is required for feedback control of an internal clock's criterion. Behavioral Neuroscience. 1988;102:54–60. doi: 10.1037//0735-7044.102.1.54.
35. Meck WH, Church RM, Olton DS. Hippocampus, time, and memory. Behavioral Neuroscience. 2013;127:644–668. doi: 10.1037/a0034188.
36. Meck WH, Penney TB, Pouthas V. Cortico-striatal representation of time in animals and humans. Current Opinion in Neurobiology. 2008;18:145–152. doi: 10.1016/j.conb.2008.08.002.
37. Mello G, Soares S, Paton JJ. A scalable population code for time in the striatum. Current Biology. 2015;25:1113–1122. doi: 10.1016/j.cub.2015.02.036.
38. Mirenowicz J, Schultz W. Importance of unpredictability for reward responses in primate dopamine neurons. Journal of Neurophysiology. 1994;72:1024–1027. doi: 10.1152/jn.1994.72.2.1024.
39. Mogenson GJ, Jones DL, Yim CY. From motivation to action: functional interface between the limbic system and the motor system. Progress in Neurobiology. 1980;14:69–97. doi: 10.1016/0301-0082(80)90018-0.
40. Montague PR, Dayan P, Sejnowski TJ. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience. 1996;16:1936–1947. doi: 10.1523/JNEUROSCI.16-05-01936.1996.
41. Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H. Midbrain dopamine neurons encode decisions for future action. Nature Neuroscience. 2006;9:1057–1063. doi: 10.1038/nn1743.
42. Nicola SM. The nucleus accumbens as part of a basal ganglia action selection circuit. Psychopharmacology (Berl). 2007;191:521–550. doi: 10.1007/s00213-006-0510-4.
43. O'Doherty J, Dayan P, Friston KJ, Critchley H, Dolan RJ. Temporal difference learning model accounts for responses in human ventral striatum and orbitofrontal cortex during Pavlovian appetitive learning. Neuron. 2003;38:329–337. doi: 10.1016/s0896-6273(03)00169-7.
44. O'Doherty J, Dayan P, Schultz J, Deichmann R, Friston KJ, Dolan RJ. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science. 2004;304:452–454. doi: 10.1126/science.1094285.
45. Oleson EB, Beckert MV, Morra JT, Lansink CS, Cachope R, Abdullah RA, Loriaux AL, Schetters D, Pattij T, Roitman MF, et al. Endocannabinoids shape accumbal encoding of cue-motivated behavior via CB1 receptor activation in the ventral tegmentum. Neuron. 2012;73:360–373. doi: 10.1016/j.neuron.2011.11.018.
46. Pan WX, Schmidt R, Wickens JR, Hyland BI. Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. Journal of Neuroscience. 2005;25:6235–6242. doi: 10.1523/JNEUROSCI.1478-05.2005.
47. Rescorla RA, Wagner AR. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF, editors. Classical Conditioning II: Current Research and Theory. New York: Appleton-Century-Crofts; 1972. pp. 64–99.
48. Roesch MR, Calu DJ, Schoenbaum G. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nature Neuroscience. 2007;10:1615–1624. doi: 10.1038/nn2013.
49. Roesch MR, Singh T, Brown PL, Mullins SE, Schoenbaum G. Ventral striatal neurons encode the value of the chosen action in rats deciding between differently delayed or sized rewards. Journal of Neuroscience. 2009;29:13365–13376. doi: 10.1523/JNEUROSCI.2572-09.2009.
50. Roesch MR, Taylor AR, Schoenbaum G. Encoding of time-discounted rewards in orbitofrontal cortex is independent of value representation. Neuron. 2006;51:509–520. doi: 10.1016/j.neuron.2006.06.027.
51. Schultz W, Dayan P, Montague PR. A neural substrate for prediction and reward. Science. 1997;275:1593–1599. doi: 10.1126/science.275.5306.1593.
52. Seymour B, O'Doherty J, Dayan P, Koltzenburg M, Jones AK, Dolan RJ, Friston KJ, Frackowiak R. Temporal difference models describe higher order learning in humans. Nature. 2004;429:664–667. doi: 10.1038/nature02581.
53. Singh T, McDannald MA, Takahashi YK, Haney RZ, Cooch NK, Lucantonio F, Schoenbaum G. The role of the nucleus accumbens in knowing when to respond. Learning & Memory. 2011;18:85–87. doi: 10.1101/lm.2008111.
54. St Onge JR, Stopper CM, Zahm DS, Floresco SB. Separate prefrontal-subcortical circuits mediate different components of risk-based decision making. Journal of Neuroscience. 2012;32:2886–2899. doi: 10.1523/JNEUROSCI.5625-11.2012.
55. Steinberg EE, Boivin JR, Saunders BT, Witten IB, Deisseroth K, Janak PH. Positive reinforcement mediated by midbrain dopamine neurons requires D1 and D2 receptor activation in the nucleus accumbens. PLoS ONE. 2014;9:e94771. doi: 10.1371/journal.pone.0094771.
56. Stopper CM, Floresco SB. Contributions of the nucleus accumbens and its subregions to different aspects of risk-based decision making. Cognitive, Affective, & Behavioral Neuroscience. 2011;11:97–112. doi: 10.3758/s13415-010-0015-9.
57. Stuber GD, Stamatakis AM, Kantak PA. Considerations when using Cre-driver rodent lines for studying ventral tegmental area circuitry. Neuron. 2015;85:439–445. doi: 10.1016/j.neuron.2014.12.034.
58. Sutton RS, Barto AG. Toward a modern theory of adaptive networks: expectation and prediction. Psychological Review. 1981;88:135–170.
59. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press; 1998.
60. Takahashi Y, Roesch MR, Stalnaker TA, Haney RZ, Calu DJ, Taylor AR, Burke KA, Schoenbaum G. The orbitofrontal cortex and ventral tegmental area are necessary for learning from unexpected outcomes. Neuron. 2009;62:269–280. doi: 10.1016/j.neuron.2009.03.005.
61. Takahashi YK, Roesch MR, Wilson RC, Toreson K, O'Donnell P, Niv Y, Schoenbaum G. Expectancy-related changes in firing of dopamine neurons depend on orbitofrontal cortex. Nature Neuroscience. 2011;14:1590–1597. doi: 10.1038/nn.2957.
62. Tian J, Uchida N. Habenula lesions reveal that multiple mechanisms underlie dopamine prediction errors. Neuron. 2015;87:1304–1316. doi: 10.1016/j.neuron.2015.08.028.
63. Ungless MA, Grace AA. Are you or aren't you? Challenges associated with physiologically identifying dopamine neurons. Trends in Neurosciences. 2012;35:422–430. doi: 10.1016/j.tins.2012.02.003.
64. Voorn P, Vanderschuren LJMJ, Groenewegen HJ, Robbins TW, Pennartz CMA. Putting a spin on the dorsal-ventral divide of the striatum. Trends in Neurosciences. 2004;27:468–474. doi: 10.1016/j.tins.2004.06.006.
65. Waelti P, Dickinson A, Schultz W. Dopamine responses comply with basic assumptions of formal learning theory. Nature. 2001;412:43–48. doi: 10.1038/35083500.
66. Watabe-Uchida M, Zhu L, Ogawa SK, Vamanrao A, Uchida N. Whole-brain mapping of direct inputs to midbrain dopamine neurons. Neuron. 2012;74:858–873. doi: 10.1016/j.neuron.2012.03.017.
67. Willuhn I, Burgeno LM, Everitt BJ, Phillips PEM. Hierarchical recruitment of phasic dopamine signaling in the striatum during the progression of cocaine use. Proceedings of the National Academy of Sciences. 2012;109:20703–20708. doi: 10.1073/pnas.1213460109.
68. Xia Y, Driscoll JR, Wilbrecht L, Margolis EB, Fields HL, Hjelmstad GO. Nucleus accumbens medium spiny neurons target non-dopaminergic neurons in the ventral tegmental area. Journal of Neuroscience. 2011;31:7811–7816. doi: 10.1523/JNEUROSCI.1504-11.2011.
69. Jin X, Costa RM. Start/stop signals emerge in nigrostriatal circuits during sequence learning. Nature. 2010;466:457–462. doi: 10.1038/nature09263.
