Author manuscript; available in PMC 2020 Aug 1.
Published in final edited form as: Neuroscience. 2019 Apr 25;412:259–269. doi: 10.1016/j.neuroscience.2019.04.035

Decreases in Cued Reward Seeking After Reward-Paired Inhibition of Mesolimbic Dopamine

Sarah Fischbach-Weiss 1, Patricia Janak 2,3,4,*
PMCID: PMC6858844  NIHMSID: NIHMS1527870  PMID: 31029728

Abstract

Reward-paired optogenetic manipulation of dopamine neurons can increase or decrease behavioral responding to antecedent cues when subjects have the opportunity for new learning, in accordance with a dopamine-mediated error learning signal. Here we examined the impact of reward-paired dopamine neuron inhibition on behavioral responding to reward-predictive cues after subjects had learned. We trained male TH-IRES-Cre mice to nose poke for food reward in a progressive ratio procedure, a 2-cue choice procedure, or under continuous reinforcement; in all procedures, completion of the response requirement was signaled by an auditory cue presented prior to food delivery. After training, mice underwent successive sessions in which optogenetic inhibition of dopamine neurons was triggered during food receipt. Rather than mimic the brief inhibitions associated with negative reward prediction errors, we applied inhibition throughout the ingestion period on each trial. In all procedures, optogenetic inhibition of dopamine neurons during reward receipt decreased behavioral responding to the preceding reward-predictive cue over days, a behavioral change observed during time periods without optogenetic neuronal inhibition. This extinction-like decrease was selective for learned associations: in the 2-cue choice procedure, in which each subject was trained on two associations and inhibition was paired with reward for only one of them, responding decreased only for the inhibition-paired association. Thus, inhibition during reward receipt can decrease responding to reward-predictive cues, sharing some features of behavioral extinction. These findings suggest that changes in mesolimbic dopaminergic transmission at the time of experienced reward impact subsequent responding to cues in well-trained subjects, as predicted for a learning signal.

Keywords: Pavlovian learning, reward prediction error, extinction, halorhodopsin, Th-cre mice, optogenetics, ventral tegmental area

Introduction

Although learning depends upon multiple interacting neural systems, the activity of midbrain dopamine (DA) neurons continues to receive considerable attention due to the parallel between phasic firing of these neurons and the notion of a reward prediction error (RPE) capable of supporting certain forms of associative learning. In this view, phasic DA neuron firing signals the difference between actual and expected reward, with a relative increase when reward is better than expected, little change when reward is correctly predicted, and a decrease when reward is worse than expected or omitted. There are many examples of neural response profiles consistent with this view (Cohen et al., 2012; Day et al., 2007; Eshel et al., 2015; Flagel et al., 2011; Hart et al., 2014; Ljungberg et al., 1992; Matsumoto and Hikosaka, 2009; Owesson-White et al., 2008; Roesch et al., 2007; Saunders et al., 2018; Schultz et al., 2015, 1997; Waelti et al., 2001), although the concept of RPE signaling may not encompass all features of DA mediation of learning (Sharpe et al., 2017), nor of behavior more generally (e.g., Collins et al., 2016; Howe and Dombeck, 2016).
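To make the RPE computation concrete, the following minimal sketch (illustrative only, not a model fit to the present data; the learning rate and reward scale are arbitrary choices) shows how the value of a cue is nudged by the difference between received and expected reward, the quantity phasic DA activity is proposed to report.

```python
# Minimal sketch of error-driven value updating; parameters are illustrative.
def rpe_update(cue_value, received_reward, alpha=0.1):
    rpe = received_reward - cue_value   # positive when reward is better than expected
    return cue_value + alpha * rpe      # cue value moves toward the experienced outcome

v = 0.0
for _ in range(50):                     # repeated cue-reward pairings
    v = rpe_update(v, received_reward=1.0)
print(round(v, 2))                      # approaches 1.0: reward becomes fully predicted

for _ in range(50):                     # reward omitted (or, by hypothesis, the DA signal suppressed)
    v = rpe_update(v, received_reward=0.0)
print(round(v, 2))                      # decays back toward 0: extinction-like decrease
```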

The idea that DA activity may serve as a teaching signal has been directly tested using optogenetic manipulation of VTA DA neuron activity to mimic a positive or negative RPE (Chang et al., 2017, 2016; Hamid et al., 2016; Keiflin et al., 2017; Parker et al., 2016; Saunders et al., 2018; Steinberg et al., 2013; Tsai et al., 2009). When brief optogenetic activation (Keiflin et al., 2017; Steinberg et al., 2013) or inhibition (Chang et al., 2017, 2016) of VTA DA neurons in transgenic TH-cre rats is applied during the time of natural reward receipt, conditioned responding to a preceding Pavlovian cue is increased or decreased, respectively, during a later test session, an effect congruent with an RPE mechanism. In these experiments, reward-paired inhibition was applied during sessions in which the cues or rewards changed, offering an opportunity for new learning. In addition, when a neutral sensory cue is consistently paired with optogenetic activation of VTA DA neurons in lieu of natural reward, subjects also develop conditioned responding to the cue (Saunders et al., 2018).

In addition to learning, a role for VTA DA in motivation, or behavioral activation, has long been appreciated (Berridge, 2007; Berridge and Robinson, 2003; Robbins and Everitt, 2007; Salamone, 2002; Salamone and Correa, 2012; Beeler et al., 2012; Cagniard et al., 2006; Niv, 2007; Ostlund et al., 2012). Recently we tested the effects of optogenetic inhibition of midbrain DA neurons in TH-cre mice performing sequences of nosepoke responses reinforced by serial presentation of a cue and a food pellet. We found that DA neuron inhibition just prior to or during the performance of sequences of nosepoke responses to earn reward reduced the likelihood that mice would continue to emit nosepoke responses during the short 15 sec epoch of DA neuron inhibition, an effect that rapidly recovered and that is congruent with an acute effect on motivated behavior (Fischbach-Weiss et al., 2017). In addition, given the opportunity, mice increased response bout initiation to compensate for lost reward and maintain relatively constant overall levels of reinforcement, and no obvious impairment in responding was observed across sessions, as might be expected if learning were impacted (Fischbach-Weiss et al., 2017). Given the prior work on DA activity at the time of reward (discussed above), shifting the inhibition period from the instrumental response bout to the reward delivery period should produce a quite different effect: a decrease in behavior over time (in the instrumental behavior and/or the cue-elicited behavior) that outlasts, or is expressed outside of, the period of inhibition. Thus, we hypothesized that suppression of DA neuron activity when mice are required to execute a sequence of responses to obtain reward would have differential effects depending on when the DA suppression occurred.

We therefore sought here to examine the impact of inhibition limited to the time of reward receipt under static behavioral conditions in which the cues and rewards were held constant in well-trained subjects. We trained TH-cre mice on appetitive instrumental procedures for food reward, and then triggered DA neuron inhibition during reward delivery, or during the intertrial interval as a control. To parallel our prior work, we first tested reward-paired DA inhibition in mice well-trained to respond under a progressive ratio schedule. This first experiment was followed by a second experiment to test the generality of the effect of reward-paired inhibition within a low-effort task, and a third experiment to test the specificity of the effect of reward-paired inhibition within a 2-cue forced choice procedure. In these experiments, we found that TH+ neuron inhibition during the time of reward receipt produces a gradual decrease in behavioral responding to the cue that predicts that reward, a behavioral change that is observed during time periods without neuronal inhibition and that lasts at least 24 hrs. In addition, the decrease in responding to reward-predictive cues is limited to the specific association that was manipulated. Thus, inhibition during reward receipt can produce a change in behavior congruent with an effect on learning, in this case extinction learning, in well-trained subjects under consistent stimulus and reward conditions.

EXPERIMENTAL PROCEDURES

Subjects

Male TH-Cre (tyrosine hydroxylase (TH)-IRES-Cre) mice aged 8–12 weeks were housed individually on a reverse 12 h light/dark cycle (lights off at 10:00). Mice were tested under food restriction to 90% of their free-fed body weight. The Institutional Animal Care and Use Committee of the Ernest Gallo Clinic and Research Center at the University of California, San Francisco, approved all experimental procedures. Six of the ten mice used for the progressive ratio experiments also participated in the progressive ratio studies described in Fischbach-Weiss et al. (2017), in which optogenetic inhibition was applied just prior to or during bouts of nosepoking. Seven additional previously naïve mice were used for the 2-cue forced choice experiment.

Surgical procedures and virus injection

Surgery to infuse the inhibitory opsin-expressing virus and to implant optical fibers was performed as previously reported (Fischbach-Weiss et al., 2017). Briefly, mice received a 1 μl infusion of either the Cre-dependent halorhodopsin virus (AAV5/Ef1a-DIO-eNpHR3.0-eYFP) or, for control subjects, a Cre-dependent virus expressing eYFP alone, targeting the VTA (target coordinates: ML ±0.5 mm, AP −3.5 mm, DV −4.45 mm relative to bregma), at a speed of 0.1 μl per minute. The infuser was left in place for an additional 10 minutes following each infusion. Two optical fibers (BFL37–200, Thorlabs) were implanted bilaterally (ML ±1.5 mm) at an angle of ±13.5° to a targeted depth of DV −4.45 mm relative to skull and were secured to the skull surface with two metal screws and dental cement. Mice were given 7 days to recover before the start of handling and behavioral procedures. Behavioral tests were conducted at least 5 weeks after surgery.

Behavioral procedures

Progressive ratio task

Mice were trained in standard operant chambers (Med Associates, St. Albans, VT, USA) equipped with two nose poke ports flanking a recessed port into which reward pellets were delivered. Mice were initially trained under a fixed ratio 1 (FR-1) schedule of reinforcement. The back of one nose poke port was illuminated with a cue light to signal when reward was available. A 500 ms, 1 kHz tone was played after each nose poke completed at the lit port, indicating that a reward pellet was available. One high-fat food pellet (35% fat, 20 mg; Bio-Serv, Flemington, NJ, USA) was delivered upon the first reward port entry after cue presentation. After acquiring the FR-1 task, mice progressed to FR-3 training for ~3 sessions before progressing to a progressive ratio schedule, in which the number of nose poke responses required to earn a pellet increased after each successive reward according to the following equation: response requirement = (5 × e^(0.24 × reward number)) − 5. For the progressive ratio procedure, completion of each current response requirement resulted in presentation of the 500 ms, 1 kHz tone cue, which also signaled the availability of the reward. Subjects received optogenetic inhibition of the midbrain after at least one week of progressive ratio training (inhibition parameters described below).
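For concreteness, the sketch below computes the per-reward response requirement implied by the equation above; the integer rounding and the floor of one response are assumptions, as the text does not specify how fractional values were handled.

```python
import math

# Hypothetical helper computing the progressive ratio response requirement from the
# equation above; rounding to an integer and the minimum of 1 response are assumptions.
def pr_requirement(reward_number, coefficient=0.24):
    return max(1, round(5 * math.exp(coefficient * reward_number) - 5))

print([pr_requirement(n) for n in range(1, 11)])
# -> [1, 3, 5, 8, 12, 16, 22, 29, 38, 50] with this rounding choice
```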

Fixed ratio (FR)-1 task

Mice were trained in standard operant chambers as above. The back of the active nose poke port was illuminated with a cue light when active, and a 500 ms, 1 kHz tone was delivered after each nose poke completed at the active port, indicating that a reward pellet was available upon reward port entry. Mice were trained on a fixed ratio 1 (FR-1) schedule reinforced with high-fat food pellets (35% fat, 20 mg; Bio-Serv). Mice received optogenetic inhibition of the VTA after at least one week of training (inhibition parameters described below).

2-cue forced choice task

The two nose poke ports that flanked the center reward port were activated in a pseudo-random fashion such that, when the left nose poke port was active, the back of that port was illuminated with a cue light and a 500 ms, 1 kHz tone was delivered after each nose poke completed at that port, indicating that a reward pellet was available. Likewise, when the right nose poke port was active, the back of that port was illuminated with a cue light and a 500 ms white noise stimulus was delivered after each nose poke completed at that port, indicating that a reward pellet was available. For both right and left active nose poke trials, the type of reward pellet and the location of pellet delivery (center reward port) were held constant. To allow a within-subject test of the impact of reward-paired inhibition on a specific association, on inhibition days (inhibition parameters described below) one nose poke was paired with DA neuron inhibition, such that DA neurons were inhibited at the time of reward consumption after mice had performed an active nose poke on the left side, but not after mice had performed an active nose poke on the right side, or vice versa. The nosepoke side/cue that was followed by inhibition remained constant across the six days of testing, and the side paired with DA neuron inhibition was counterbalanced across mice.
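The trial logic of this task can be summarized schematically as follows; the port labels, cue names, and simple random side selection are illustrative stand-ins for the Med PC implementation actually used.

```python
import random

# Schematic sketch of the 2-cue forced choice trial structure described above.
# Names and the unconstrained random choice are illustrative assumptions only.
def run_trial(inhibition_paired_side):
    active_side = random.choice(["left", "right"])           # pseudo-random side selection
    cue = "1 kHz tone" if active_side == "left" else "white noise"
    # correct nosepoke at the lit port -> 500 ms cue -> pellet at the center reward port
    laser_on_reward_entry = (active_side == inhibition_paired_side)
    return {"side": active_side, "cue": cue, "laser": laser_on_reward_entry}

session = [run_trial(inhibition_paired_side="left") for _ in range(20)]
```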

In vivo optical inhibition parameters

The inhibitory opsin approach used here has previously been applied in TH-cre mice in multiple published reports (e.g., Chaudhury et al., 2013; Fischbach-Weiss et al., 2017; Parker et al., 2016; Tye et al., 2013). In these reports, TH+ neuron inhibition was confirmed in vitro and in vivo, and little to no rebound excitation was reported, including after long inhibitions (Chaudhury et al., 2013). During sessions with optical inhibition of DA neurons, patch cables (Doric Lenses, Ville de Quebec, QC, Canada) were attached to the implanted optical fibers and to bilateral optical commutators (Doric Lenses) and connected to a 200 mW DPSS 532 nm laser (OEM Laser Systems, Midvale, UT, USA). Med PC IV (Med Associates) software was used to control onset and offset of optical inhibition and to record behavioral events, i.e., nose poke and reward port responses. Subjects were habituated to patch cable tethering for five days prior to inhibition; the last two days of this habituation period served as the two control test sessions, in which subjects were attached to the patch cables but no inhibition was delivered. This was followed by 6 consecutive days of inhibition during the progressive ratio, FR-1, or 2-cue forced choice tasks, in which 15-second 532 nm laser pulses were triggered each time mice entered the reward port to collect a reward pellet. The laser was triggered 0.01 seconds after the animal entered the reward port, and the pellet was dropped 0.1 seconds after the animal entered the reward port, to ensure that DA neurons were fully inhibited before the animal received the reward. Following the 6 inhibition test days, mice were trained for 6 additional “recovery” days in which the laser was not turned on but mice remained tethered to the system via patch cables.
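The event timing around each reward port entry can be sketched as below. This is an illustration of the stated parameters only; the actual sessions were controlled by Med PC IV, and the laser and feeder objects here are hypothetical stand-ins for the hardware interfaces.

```python
import threading

# Timing sketch only; delays and duration follow the parameters described above.
LASER_DELAY_S = 0.01     # laser on 0.01 s after reward-port entry
PELLET_DELAY_S = 0.10    # pellet dropped 0.1 s after reward-port entry
LASER_DURATION_S = 15.0  # 15-s inhibition epoch spanning retrieval and consumption

class LaserStub:          # hypothetical stand-in for the laser interface
    def on(self): print("laser on")
    def off(self): print("laser off")

class FeederStub:         # hypothetical stand-in for the pellet dispenser
    def drop_pellet(self): print("pellet delivered")

def on_reward_port_entry(laser, feeder):
    # Laser onset precedes pellet delivery so DA neurons are inhibited
    # before the animal contacts the reward.
    threading.Timer(LASER_DELAY_S, laser.on).start()
    threading.Timer(PELLET_DELAY_S, feeder.drop_pellet).start()
    threading.Timer(LASER_DELAY_S + LASER_DURATION_S, laser.off).start()

on_reward_port_entry(LaserStub(), FeederStub())
```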

Histological verification of NpHR virus expression and fiber optic placement

Coronal 50 μm sections were processed using standard procedures to visualize viral expression, using antibodies against GFP (Fischbach-Weiss et al., 2017). Mounted sections were imaged on a confocal microscope to visualize virus expression and verify fiber placements. Subjects with insufficient virus expression, or with fiber placements that were not above or within the VTA, were excluded from the study. An example image from one subject tested in the current experiments and our prior related study can be found in Fischbach-Weiss et al. (2017).

Statistics

All values are expressed as mean ± SEM. Statistical differences across testing days were assessed using 1-way and 2-way ANOVA, as appropriate. Significant interactions were followed up by comparisons within conditions across days, or between conditions on specific days. Non-parametric Friedman and Wilcoxon signed rank tests were used to compare latency measures across days within groups, and the Kruskal-Wallis test was used to compare latencies between conditions. The significance threshold was set at p < 0.05, and Bonferroni corrections were applied in cases of multiple comparisons.
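As an illustration of how the nonparametric comparisons described above might be computed (under an assumed mice × sessions data layout; this is not the authors' analysis code), consider the following sketch. The repeated-measures ANOVAs reported in the Results would additionally require a repeated-measures framework, which is a tooling assumption not specified in the text.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
latency = rng.exponential(scale=2.0, size=(10, 6))   # assumed layout: 10 mice x 6 sessions

# Friedman test: does latency change across sessions within one condition?
chi_sq, p_friedman = stats.friedmanchisquare(*[latency[:, s] for s in range(6)])

# Wilcoxon signed-rank test: e.g., first vs. last inhibition session within mice
w, p_wilcoxon = stats.wilcoxon(latency[:, 0], latency[:, 5])

# Bonferroni correction when comparing conditions on each of the 6 sessions
alpha_per_test = 0.05 / 6
print(p_friedman, p_wilcoxon, alpha_per_test)
```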

Results

For all studies, TH-IRES-Cre mice received infusions of the Cre-dependent virus expressing eNpHR3.0 aimed at the ventral tegmental area (VTA) and also received bilateral chronic optical fiber implants above the VTA (Chaudhury et al., 2013; Fischbach-Weiss et al., 2017; Ilango et al., 2014; Parker et al., 2016; Tan et al., 2012; Tye et al., 2013). After surgical recovery, mice were trained to respond for food reward by inserting their snout into a recess in the experimental chamber, a behavior termed ‘nosepoking’. Completion of correct nosepoke responses resulted in presentation of a 500 msec auditory tone and one high-fat food pellet. We first tested the effect of reward-paired inhibition in mice working under a progressive ratio schedule. We next compared these findings to subjects responding on a low-effort FR-1 schedule for reward. Finally, to test for cue-selective effects of inhibition during reward receipt, we conducted an additional experiment in which mice learned to associate reward with two different cues.

Reward-paired DA neuron inhibition impairs behavioral responses to reward cues in a progressive ratio procedure

Inhibitions of DA neuron activity that occur during omission of an expected reward are proposed to act as a negative RPE signal that drives extinction learning (Glimcher, 2011; Pan et al., 2013, 2008; Steinberg et al., 2013), and a test of this hypothesis in the setting of an overexpectation procedure provided direct evidence for a role of negative prediction errors in reducing cue-elicited responding (Chang et al., 2016). Here we sought to determine whether DA neuron inhibition during reward in a progressive ratio procedure would similarly reduce behavior, in effect acting to extinguish responding. Mice were trained on a progressive ratio schedule in which the number of responses required to earn a single reward pellet increased exponentially across trials (Figure 1a). In our version of this procedure, completion of the correct number of nosepoke responses resulted in presentation of a 500 msec auditory tone and one high-fat food pellet (Figure 1b). We chose a 15-second inhibition epoch because we observed that mice sometimes took this long to complete food pellet retrieval and consumption. In this way we sought to block the influence of any food-related neural input that might normally impinge upon midbrain DA neurons during consumption. A disadvantage of this 15-second epoch is that it does not mimic previously reported durations of DA neuron inhibition during reward omission (see Discussion). Mice received either reward-paired or unpaired inhibition in separate tests (see Methods). For sessions with reward-paired inhibition, a 15-second laser pulse was triggered 0.01 seconds after subjects entered the reward port to collect the pellet reward (Fig. 1b). For unpaired inhibition, a 15-second laser pulse was triggered 15 seconds after subjects exited the reward port following reward delivery, when they had already finished pellet consumption (Fig. 1b, unpaired inhibition). We expected that, if the artificial negative reward prediction error were to induce an extinction-like effect, multiple sessions of inhibition would be required. Thus, each test extended over 14 days: two sessions of baseline behavior, 6 days of testing with DA neuron inhibition, and 6 recovery days with no DA neuron inhibition to assess return to baseline (Fig. 1c).

Figure 1. Reward-paired DA neuron inhibition decreases reactions to a reward-predictive cue in a progressive ratio procedure.

(a) In the progressive ratio (PR) procedure, the number of responses required to earn a single reward pellet increases exponentially across trials. (b) Schematic showing the timing of nosepoke responses, cue onset and laser delivery relative to reward port entry for sessions with Paired and Unpaired inhibition. (c) Timeline of the experiment. Two-hour sessions were conducted daily, with six sessions of optogenetic inhibition flanked by Control and Recovery sessions. (d) Reward-paired inhibition decreased total nose poke responding across the 6 Inhibition sessions (main effect of Inhibition, p<.033). (e) Reward-paired inhibition decreased the number of rewards earned (main effect of Inhibition, p<.034). (f) Reward-paired inhibition did not decrease the number of entries into the reward port when no reward was available. (g) Task efficiency, measured by the ratio of nose pokes to all reward port entries, decreased in Paired inhibition sessions (within-condition effect of Session, p<.012), but not Unpaired sessions (p=.27). (h) The percentage of latencies to collect the reward after cue presentation that were greater than 5 seconds increased in Paired vs. Unpaired sessions on Inhibition sessions 3–6 (*p<.005, Paired vs. Unpaired). (i) Reward-paired inhibition increased the number of nose pokes performed after cue presentation and before reward collection (main effect of Inhibition, p<.015). (j) Latency to collect the reward after cue presentation on the first trial of each session is increased in the Paired condition on Inhibition sessions 3–6 and Recovery session 1 and returns to baseline by Recovery session 2 (*p<.005, Paired vs. Unpaired; #p<.008, Recovery session 1 vs. Recovery session 2 for Paired condition). n=10 mice. Data are presented as means and error bars represent S.E.M.

Reward-paired inhibition during the progressive ratio procedure resulted in modest but significant decreases in instrumental responding over the six test sessions (paired vs. unpaired sessions: main effect of inhibition, F(1,9)=6.31, p=.033; main effect of session, F(5,45)=2.78, p=.028; inhibition × session interaction, F(5,45)=.15, p=.98) (Fig. 1d), and, consequently, an approximately 20% decrease in the number of rewards earned (main effect of inhibition, F(1,9)=6.20, p=.034; main effect of session, F(5,45)=4.82, p=.001; interaction, F(5,45)=.83, p=.53) (Fig. 1e), all of which could be consistent with a decrease in motivation. Yet, with the exception of one pellet for one mouse, all rewards earned were consumed, indicating that the inhibition during reward did not reduce the motivation to consume that reward. During performance in this task, mice ‘check’ the reward port frequently throughout a behavioral session, generating high levels of reward port entries during times when reward is not being delivered. This measure of reward-seeking behavior was also not decreased by reward-paired DA neuron inhibition (main effect of inhibition, F(1,9)=.90, p=.37; main effect of session, F(5,45)=.65, p=.66; inhibition × session interaction, F(5,45)=5.62, p<.0001; Fig. 1f). Although the interaction term here was significant, there were no significant differences in Paired vs. Unpaired port entries on any individual inhibition session. The absence of significant overall decreases in unrewarded port entries across reward-paired inhibition days suggests that mice did not find DA neuron inhibition aversive, as they did not avoid the location paired with DA neuron inhibition during the session.

Because nosepokes and rewards earned decreased slightly, yet port entries did not, we considered that these decreases were due to a decrease in task efficiency, rather than to a decrease in motivation. We assessed task efficiency by measuring the average number of nosepokes performed before checking the reward port (Ostlund et al., 2012) and found that the pattern of behavior changed over the six inhibition sessions (main effect of inhibition, F(1,9)=4.84, p=.055; main effect of session, F(5,45)=2.55, p=.044; inhibition × session interaction, F(5,45)=3.34, p<.012); the significant inhibition by session interaction was accounted for by a decrease in efficiency in mice receiving reward-paired DA inhibition (within-group effect of session: F(5,45)=5.06, p=.001) but not during control sessions with unpaired inhibition (F(5,45)=1.32, p=.27) (Fig. 1g).

If mice were still motivated to seek reward but responding less efficiently, we considered that they could be failing to utilize the reward-paired auditory cue, which signals completion of the current response requirement and availability of the reward pellet. We therefore examined the latency to move from the nosepoke recess to the reward port after cue presentation. In control unpaired laser sessions, the latency to collect the reward after cue presentation was very fast, with means of 1.50 (+/− .20) and 1.81 (+/− .25) seconds for the two control sessions, respectively; it exceeded 5 seconds on only 2%±1% of trials. In reward-paired laser sessions, however, the mean latencies increased from 9.44 (+/− 2.34) to 44.26 (+/− 30.9) seconds from the first to the sixth inhibition session. We quantified this as the percentage of cue-reward latencies > 5 seconds, a measure that increased across reward-paired inhibition sessions (main effect of inhibition, F(1,9)=17.39, p=.002; main effect of session, F(5,45)=2.99, p=.021; inhibition × session interaction, F(5,45)=2.86, p=.025), indicating that mice no longer responded promptly to the cue signaling reward availability (Fig. 1h). Comparisons on individual inhibition sessions revealed significant effects of condition for sessions 3–6 (p<.003 for each comparison). To assess behavior in the period between cue presentation and reward port entry, we measured the number of nosepokes mice performed after each cue presentation before moving to the reward port to collect the reward. We found that reward-paired inhibition resulted in significantly more post-cue nosepokes compared to unpaired sessions (main effect of inhibition, F(1,9)=8.94, p=.015; main effect of session, F(5,45)=3.44, p=.01; inhibition × session interaction, F(5,45)=3.68, p=.007) (Fig. 1i). The comparisons across conditions for all 6 inhibition sessions had p-values <.05, but none survived a Bonferroni correction for multiple comparisons.
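The latency measure used here is straightforward to compute; the helper below is a hypothetical illustration of the percentage-of-latencies-greater-than-5-seconds metric, not the authors' analysis code.

```python
import numpy as np

# Hypothetical helper: percentage of cue-to-reward-collection latencies
# exceeding 5 seconds, computed separately for each session.
def percent_slow_latencies(latencies_by_session, threshold_s=5.0):
    return [100.0 * np.mean(np.asarray(trials) > threshold_s)
            for trials in latencies_by_session]

# Two example sessions with per-trial latencies in seconds
print(percent_slow_latencies([[1.2, 0.9, 6.5, 1.4], [7.1, 12.0, 2.3, 9.8]]))
# -> [25.0, 75.0]
```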

The failure to terminate nosepoke bouts upon cue presentation suggests that mice no longer used the cue as a signal of reward availability, an effect consistent with a learned decrease in cue value and/or extinction of cue-elicited behavior. If this were the case, then we should observe decreases in behavioral responding to the cue that outlast the inhibition session, reflecting learning. We therefore measured the latency to collect the reward after cue presentation on the first trial of each session, prior to any laser activation on that day. The latency to collect reward increased over days in inhibition sessions (Friedman test, chi-square = 13.66, p<.019) but not in control sessions with unpaired laser (Friedman test, chi-square = 6.37, p=.27), such that latencies were significantly longer for sessions 3–6 (Kruskal-Wallis test, paired vs. unpaired laser conditions: all chi-square > 8, all ps<.005 for sessions 3–6) (Fig. 1j). Additionally, this first-trial measure did not return to baseline levels until the second recovery session, as would be expected (Kruskal-Wallis test, paired vs. unpaired laser conditions: on recovery day 1, chi-square = 7.8, p<.006; on recovery day 2, chi-square = 1.29, p=.28; Wilcoxon signed rank test, recovery day 1 vs. recovery day 2 within the paired condition, z=−2.7, p=.007). These findings indicate that reward-paired DA neuron inhibition produces deficits in reward cue-conditioned behavior that persist across test sessions.

Together, these results show that inhibition during reward receipt decreased behavioral responding to the cue, consistent with predictions arising from optogenetic creation of a negative DA RPE. Importantly, the inhibition never overlapped with the presentation of the cue itself. These effects were not due to reward-paired illumination itself; TH-IRES-Cre mice (n=6) injected with a YFP-containing control virus showed no significant changes in total nose pokes (control vs. inhibition days, p=.93), rewards earned (control vs. inhibition days, p=.93), extra nosepokes performed after cue presentation and before reward collection (control vs. inhibition days, p=.97), or the percentage of cue-reward latencies greater than 5 seconds (control vs. inhibition days, p=.38).

Reward-paired inhibition reduces responding on a low-effort FR1 task

Reward-paired inhibition in the progressive ratio task resulted in a substantial decrement in responding to the reward cue, along with a mixed effect on nosepoke responding, decreasing overall numbers but increasing them locally on some trials. To determine whether this outcome was somehow dependent on the progressive ratio, or high-effort, aspect of the task, we trained new subjects on a simple fixed ratio (FR)-1 schedule. In this task, mice received a 500 msec tone followed by reward pellet delivery after one nosepoke, with the same relative timing among the nosepoke, tone, port entry and reward delivery as described for the progressive ratio procedure. Likewise, optogenetic inhibition of midbrain dopamine neurons (reward-paired or unpaired) occurred over 6 days after mice were trained.

During 6 days of reward-paired DA neuron inhibition on this schedule, we observed a significant decrease in the number of total nose pokes performed (main effect of inhibition, F(1,8)=11.56, p=.009; main effect of session, F(5,40)=1.82, p=.13; inhibition × session interaction, F(5,40)=.566, p=.73) (Fig. 2a); total nosepokes include the reinforced nosepokes and any extra nosepokes subjects emitted. The number of rewards earned was correspondingly decreased (main effect of inhibition, F(1,8)=18.42, p=.003; main effect of session, F(5,40)=2.45, p=.05; inhibition × session interaction, F(5,40)=1.13, p=.36) (Fig. 2b). In addition, we observed an increase in the percentage of cue-reward latencies greater than 5 seconds in the reward-paired inhibition condition compared to control unpaired sessions (main effect of inhibition, F(1,8)=27.19, p=.001; main effect of session, F(5,40)=3.72, p=.007; inhibition × session interaction, F(5,40)=5.49, p=.001) (Fig. 2c). However, we did not see an increase in the number of post-cue nose pokes performed across reward-paired inhibition sessions (main effect of inhibition, F(1,8)=.438, p=.527; main effect of session, F(5,40)=1.44, p=.23; inhibition × session interaction, F(5,40)=.373, p=.864) (Fig. 2d), suggesting that animals disengaged from the nose poke port but did not immediately approach the reward port after the cue. Thus, in a relatively low-effort task in which every response is paired with both a cue and reward, reward-paired DA neuron inhibition reduces both the instrumental response and the cue-elicited response. However, the decrease in the number of rewards earned and the increase in cue-reward latencies occur earlier within the six inhibition sessions than the decrease in nosepokes, suggesting that the instrumental response decreases are subsequent to effects on cue-elicited behavior. Illumination alone cannot account for these effects, as YFP-expressing TH-cre mice receiving reward-paired laser activation exhibited no decrease in instrumental responding or cue-related behavior (total nosepokes, control vs. inhibition days, p=.92; rewards earned, control vs. inhibition days, p=.90; percentage of cue-reward latencies > 5 sec, control vs. inhibition days, p=.69; n=6).

Figure 2. Reward-paired DA neuron inhibition reduces continuously-reinforced operant responses and reactions to a reward-predictive cue.

(a) Reward-paired inhibition decreased total nose poke responding on Inhibition sessions 4 and 6 (*p<.008). (b) Reward-paired inhibition decreased the number of rewards earned on sessions 1 and 3–6 (*p<.008). (c) The percentage of latencies to collect the reward after cue presentation that were greater than 5 seconds increased for Paired vs. Unpaired on Inhibition sessions 3–6 (*p<.003, Paired vs. Unpaired). (d) No change in the number of nose pokes performed after cue presentation and before reward collection. n=9 mice. Data are presented as means and error bars represent S.E.M.

Behavioral decrement produced by reward-paired dopamine neuron inhibition is cue specific

To confirm that reward-paired inhibition impairs responding to the preceding cue in an association-specific manner, we trained a third group of mice in a two-cue forced-choice procedure (Figure 3a). In this task, two nosepoke ports flanked a central reward port. On each trial, a light was illuminated in the back of a nosepoke port to indicate whether the right or left nosepoke port was active; a correct response at the lit nosepoke port triggered one of two 500 msec auditory cues, each consistently paired with either the left or right nosepoke port, followed by delivery of a reward pellet to the center reward port. The same reward type was delivered after both cues. After training, testing commenced as above, with six consecutive inhibition sessions flanked by control sessions with no inhibition. A 15-s laser pulse was triggered when mice entered the reward port following one, but not the other, auditory cue.

Figure 3. Decreases in responding to reward-predictive cues produced by reward-paired DA neuron inhibition are selective for learned cue-reward associations.

(a) Diagram of test parameters. Two nose poke ports flank a center reward port. DA neurons were inhibited during reward consumption after mice performed a nose poke at the left nose poke port, but not after mice performed a nose poke at the right nose poke port. The side paired with DA neuron inhibition was counterbalanced across animals. (b) The percentage of latencies to collect the reward after cue presentation that were greater than 5 seconds increased for Paired vs. Unpaired on Inhibition sessions 3–6 (*p<.002, Paired vs. Unpaired). (c) Non-selective decrease in total nose pokes completed (main effect of Session, p<.003). (d) Mean latency from reward port entry to the nosepoke that initiated the next trial (*p<.005). n=7 mice. Data are presented as means and error bars represent S.E.M.

We found that inhibiting DA neurons during reward consumption differentially increased the latency to collect reward after the cue that preceded that inhibition, an effect reflected as an increased percentage of cue-reward latencies > 5 seconds (main effect of inhibition, F(1,6)=43.20, p=.001; main effect of session, F(5,30)=2.94, p=.028; inhibition × session interaction, F(5,30)=14.08, p=.0001) (Fig. 3b). We also observed a decrease in the total number of nosepokes performed at both nosepoke ports over days; however, no differential effect of inhibition emerged, potentially because mice were forced to perform a nosepoke on each trial to advance to the next trial, constraining the number of opportunities mice had to interact with each nosepoke port (main effect of inhibition, F(1,6)=3.83, p=.098; main effect of session, F(5,30)=4.96, p=.002; inhibition × session interaction, F(5,30)=1.81, p=.14) (Fig. 3c). Subjects also showed divergent changes in the latency to resume nosepoking after reward, with longer latencies on trials following inhibition but shorter latencies on trials with no inhibition (main effect of inhibition, F(1,6)=9.63, p=.02; effect of day, F(5,30)=1.06, p=.41; inhibition × day interaction, F(5,30)=7.14, p<.0001) (Fig. 3d). Taken together, these results indicate that the effect of reward-paired DA neuron inhibition is specific to the cue that preceded the inhibition and does not generalize to other cues experienced in the same behavioral session.

Discussion

Reward-paired DA neuron inhibition impairs responding to reward predictive cues

We paired DA neuron inhibition with reward receipt and examined the effects on behaviors mice made to obtain reward. We found the strongest impact on the latency to collect reward following cue onset; optogenetic inhibition of midbrain dopamine neurons during the time of reward receipt produced a gradual decrease in behavioral responding to a reward-predictive cue, a behavioral change that was observed during time periods without neuronal inhibition, and that lasted at least 24 hrs. Thus inhibition during reward receipt produces a long-lasting change in behavior congruent with an effect on learning. Importantly, the precise timing of DA neuron inhibition was critical. DA neuron inhibition triggered just 15 seconds after reward consumption (unpaired sessions) did not produce a change in the behaviors measured.

Natural inhibitions of DA neuron activity that occur during omission of an expected reward are proposed to act as a negative RPE signal that drives reductions in conditioned responding (Steinberg et al. 2013; Pan, Brown, and Dudman 2013; Pan et al. 2008; Glimcher 2011), and a recent test of this hypothesis in the setting of an overexpectation procedure provided direct evidence for a role of negative prediction errors in reducing cue-elicited responding (Chang et al. 2016). Here we sought to test whether DA neuron inhibition during reward in our progressive ratio procedure would reduce later behavior, inducing an extinction-like effect. We chose a 15 second inhibition epoch because our mice sometimes took this long to complete food pellet retrieval and consumption. Note that this time interval is longer than the durations of DA neuron inhibition previously reported during reward omission. However, we do not know in our procedure which sensory aspects of food detection are critical for the brain’s normal ability to tag a reinforcement event as ‘same as expected’. We considered that sensory aspects of food handling and consumption might elicit neural activity that could counter any optogenetically-induced inhibition, if, for example, the food handling and consumption were still in progress when the laser was turned off. With a longer 15-sec inhibition, we sought to circumvent this issue and prevent the impact on midbrain DA neurons of any food-related neural processing that might normally impinge upon these DA neurons.

Because behavior decreased as a result of reward-paired inhibition, we hypothesize that our inhibition served as the equivalent of an artificial negative RPE signal able to impact behavior, even in the face of actual reward delivery. It is also possible that our manipulation directly reduced the perceived value of the food reward. Detailed examination of consummatory behavior during laser stimulation would be useful in this regard. We presume that our manipulation induced an inhibitory signal because our subjects were well-trained and because optogenetic inhibition of VTA dopamine neurons inhibits firing in slices in vitro and reduces dopamine release in vivo (McCutcheon et al., 2014; Parker et al., 2016). While it is reasonable to expect that our manipulation decreased the firing rate of DA neurons, we did not directly measure DA neuron activity during performance in this task.

Of note, the primary action that was impacted, the cue-triggered movement from the nosepoke operandum to the reward port, was never paired with DA neuron inhibition; this behavioral change was secondary to the inhibition rather than occurring during the inhibition. In addition, we found in a follow-up experiment that the behavioral changes observed in response to the reward-predictive cue were cue-specific and were not explained by a simple lasting reduction in the motivational value of the reward, since the same reward was delivered after each of the two cues in the final experiment. These results indicate that reward-paired DA neuron inhibition generates a signal that is specific to the cue that preceded inhibition. Multiple measures suggest that reward collection after the cue belonging to the association not targeted by our inhibition actually improved, with faster latencies apparent over days. This may reflect a contrast effect, or the overall strategy of these mildly hungry mice to acquire food.

An unexplained finding is that reward-paired inhibition in the progressive ratio task resulted in a weakening/extinction of the cue-reward relationship but not of the instrumental response. This is puzzling because optogenetic activation of VTA dopamine neurons alone can support acquisition of new actions (Ilango et al., 2014; Saunders et al., 2018; Witten et al., 2011), and optogenetic dopaminergic inhibition after instrumental responses can decrease the probability of appropriate responses on the next trial (Hamid et al., 2016; Radke et al., 2018). One possibility is that the well-trained mice in our study may have learned to nosepoke in rapid sequence, and that this highly practiced, possibly habitual behavior could be supported by non-VTA circuits, including substantia nigra-dorsal striatal circuits, that were less affected by our manipulation. Since we aimed our viral infusions and optical fiber placements towards the VTA, it is likely that most of our impact was on VTA, rather than SNc, DA neurons. It is also possible that the difference in extinction susceptibility is due to differences in reward uncertainty after the response and after the cue. In the progressive ratio task, multiple nose pokes are required to earn a reward, and the animal does not explicitly know the number of nose pokes required to earn each reward. In addition, the instrumental response in the progressive ratio procedure is variably reinforced, and thus the instrumental contingency between response and reward is much weaker than that between the cue and reward, as the cue signals availability of the reward pellet on 100% of trials. Previous experiments have demonstrated that variably-reinforced cues and responses are more resistant to extinction than continuously reinforced cues and responses (e.g., Haselgrove et al., 2004), and that weaker response-outcome relationships favor habitual control of responding (Dickinson et al., 1983). In the current study, we began to investigate these issues by training mice on an FR-1 schedule, in which each nosepoke response was followed by a cue and reward delivery. In this experiment, dopamine neuron inhibition during reward in well-trained mice again greatly increased the latency to respond to the reward cue, but, in contrast to the progressive ratio task, also produced a large decrease in nosepoke behavior, without a corresponding increase in post-cue nosepokes. These data provide support for an effect of dopamine neuron inhibition on both cue-elicited behavior and some reinforced actions, although the specific determinants (automaticity, certainty, reinforcement rate, etc.) of the differences between the progressive ratio and FR-1 tasks remain to be elucidated. It is also important to note that, in the 2-cue forced choice task, our optogenetic manipulation followed both a nosepoke made in a specific spatial location and the presentation of an auditory cue; thus the experiment cannot determine whether the effect on learning is specific to the cue or to the nosepoke response, or impacts both. We nonetheless favor an impact on the cue, given the subtle increase in nosepokes after inhibition relative to the control association in the 2-cue task, reminiscent of the extra nosepokes observed in the progressive ratio task after reward-paired inhibition.

Taken together with our prior findings, we conclude that inhibition of TH+ neurons has distinct effects on behavior depending on whether the inhibition occurs just before or during the production of an instrumental response or during the receipt of reward. When inhibition was applied around the time of the instrumental response, the probability of a response was decreased, an effect that tended to recover when the inhibition was removed (Fischbach-Weiss et al., 2017). When inhibition was applied during the receipt and consumption of a food pellet reward (present findings), behavioral responses to the reward predictive cue that preceded reward delivery were attenuated, an effect that did not recover immediately upon inhibition offset.

It is of interest to consider whether reward-paired DA neuron inhibition may drive changes in synaptic plasticity in parallel with the observed changes in behavior. Previous research has shown that learning of a cue-reward association results in an increase in excitatory drive onto DA neurons (Stuber et al., 2008). Conversely, extinction training has been shown to increase inhibitory drive onto midbrain DA neurons (Pan et al., 2013). Pan and colleagues used extracellular recordings during associative learning and found that midbrain GABAergic neurons respond to both rewards and reward-predictive cues at a latency that precedes the onset of phasic DA neuron activity. In addition, GABAergic units that showed the strongest response to the reward developed the strongest responses to the conditioned stimulus after extinction. These results suggest that midbrain DA neurons may be inhibited by phasic GABAergic activity when extinguished reward-predictive cues are presented, resulting in suppression of the conditioned response. In the present study, we found that reward-paired DA neuron inhibition resulted in a decrease in measures of cue-related behavior that in some ways mimics extinction learning. However, it remains to be seen whether our optoinhibition parameters result in an increase in inhibitory drive onto DA neurons similar to that observed during extinction caused by reward omission. Future studies comparing the behavioral and neural changes resulting from extinction caused by reward omission with those caused by reward-paired DA neuron inhibition will further our understanding of how DA neuron inhibitions drive new learning.

Highlights.

  • We tested effects of dopamine neuron optoinhibition during reward consumption in mice responding to reward-predictive cues

  • Reward-paired dopamine neuron inhibition increased the latency to respond to preceding cues

  • This effect was long-lasting, observed the following day when inhibition was not applied

  • These findings strengthen the notion that dopamine neuron activity at the time of reward can act as a teaching signal

Acknowledgments

Supported by National Institutes of Health Grant R01 DA035943. Author contributions: Experimental conception, design and interpretation: SFW, PHJ; data collection: SFW; data analysis: SFW, PHJ; writing and editing: SFW, PHJ.

Footnotes

Conflict of Interest

The authors declare no conflict of interest.


References

  1. Beeler JA, Frazier CRM, Zhuang X, 2012. Putting desire on a budget: dopamine and energy expenditure, reconciling reward and resources. Front. Integr. Neurosci. 6, 49 10.3389/fnint.2012.00049 [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Berridge KC, 2007. The debate over dopamine’s role in reward: the case for incentive salience. Psychopharmacology (Berl.) 191, 391–431. 10.1007/s00213-006-0578-x [DOI] [PubMed] [Google Scholar]
  3. Berridge KC, Robinson TE, 2003. Parsing reward. Trends Neurosci. 26, 507–513. 10.1016/S0166-2236(03)00233-9 [DOI] [PubMed] [Google Scholar]
  4. Cagniard B, Beeler JA, Britt JP, McGehee DS, Marinelli M, Zhuang X, 2006. Dopamine scales performance in the absence of new learning. Neuron 51, 541–547. 10.1016/j.neuron.2006.07.026 [DOI] [PubMed] [Google Scholar]
  5. Chang CY, Esber GR, Marrero-Garcia Y, Yau H-J, Bonci A, Schoenbaum G, 2016. Brief optogenetic inhibition of dopamine neurons mimics endogenous negative reward prediction errors. Nat. Neurosci. 19, 111–116. 10.1038/nn.4191 [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Chang CY, Gardner M, Di Tillio MG, Schoenbaum G, 2017. Optogenetic Blockade of Dopamine Transients Prevents Learning Induced by Changes in Reward Features. Curr. Biol. CB 27, 3480–3486.e3. 10.1016/j.cub.2017.09.049 [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Chaudhury D, Walsh JJ, Friedman AK, Juarez B, Ku SM, Koo JW, Ferguson D, Tsai H-C, Pomeranz L, Christoffel DJ, Nectow AR, Ekstrand M, Domingos A, Mazei-Robison MS, Mouzon E, Lobo MK, Neve RL, Friedman JM, Russo SJ, Deisseroth K, Nestler EJ, Han M-H, 2013. Rapid regulation of depression-related behaviours by control of midbrain dopamine neurons. Nature 493, 532–536. 10.1038/nature11713 [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Cohen JY, Haesler S, Vong L, Lowell BB, Uchida N, 2012. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88. 10.1038/nature10754 [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Collins AL, Greenfield VY, Bye JK, Linker KE, Wang AS, Wassum KM, 2016. Dynamic mesolimbic dopamine signaling during action sequence learning and expectation violation. Sci. Rep. 6, 20231 10.1038/srep20231 [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Day JJ, Roitman MF, Wightman RM, Carelli RM, 2007. Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nat. Neurosci. 10, 1020–1028. 10.1038/nn1923 [DOI] [PubMed] [Google Scholar]
  11. Dickinson A, Nicholas DJ, Adams CD (1983). The effect of the instrumental training contingency on susceptibility to reinforcer devaluation. Quarterly Journal of Experimental Psychology, 35B, 35–51. [Google Scholar]
  12. Eshel N, Bukwich M, Rao V, Hemmelder V, Tian J, Uchida N, 2015. Arithmetic and local circuitry underlying dopamine prediction errors. Nature 525, 243–246. 10.1038/nature14855 [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Fischbach-Weiss S, Reese RM, Janak PH, 2017. Inhibiting Mesolimbic Dopamine Neurons Reduces the Initiation and Maintenance of Instrumental Responding. Neuroscience. 10.1016/j.neuroscience.2017.12.003 [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Flagel SB, Clark JJ, Robinson TE, Mayo L, Czuj A, Willuhn I, Akers CA, Clinton SM, Phillips PEM, Akil H, 2011. A selective role for dopamine in stimulus-reward learning. Nature 469, 53–57. 10.1038/nature09588 [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Glimcher PW, 2011. Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis. Proc. Natl. Acad. Sci. U. S. A. 108 Suppl 3, 15647–15654. 10.1073/pnas.1014269108 [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Hamid AA, Pettibone JR, Mabrouk OS, Hetrick VL, Schmidt R, Vander Weele CM, Kennedy RT, Aragona BJ, Berke JD, 2016. Mesolimbic dopamine signals the value of work. Nat. Neurosci. 19, 117–126. 10.1038/nn.4173 [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Hart AS, Rutledge RB, Glimcher PW, Phillips PEM, 2014. Phasic dopamine release in the rat nucleus accumbens symmetrically encodes a reward prediction error term. J. Neurosci. Off. J. Soc. Neurosci. 34, 698–704. 10.1523/JNEUROSCI.2489-13.2014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Haselgrove M, Aydin A, Pearce JM, 2004. A partial reinforcement extinction effect despite equal rates of reinforcement during Pavlovian conditioning. J. Exp. Psychol. Anim. Behav. Process. 30, 240–250. 10.1037/0097-7403.30.3.240 [DOI] [PubMed] [Google Scholar]
  19. Howe MW, Dombeck DA, 2016. Rapid signalling in distinct dopaminergic axons during locomotion and reward. Nature 535, 505–510. 10.1038/nature18942 [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Ilango A, Kesner AJ, Keller KL, Stuber GD, Bonci A, Ikemoto S, 2014. Similar roles of substantia nigra and ventral tegmental dopamine neurons in reward and aversion. J. Neurosci. Off. J. Soc. Neurosci. 34, 817–822. 10.1523/JNEUROSCI.1703-13.2014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Keiflin R, Pribut HJ, Shah NB, Janak PH, 2017. Phasic Activation of Ventral Tegmental, but not Substantia Nigra, Dopamine Neurons Promotes Model-Based Pavlovian Reward Learning. bioRxiv 232678 10.1101/232678 [DOI] [Google Scholar]
  22. Ljungberg T, Apicella P, Schultz W, 1992. Responses of monkey dopamine neurons during learning of behavioral reactions. J. Neurophysiol. 67, 145–163. [DOI] [PubMed] [Google Scholar]
  23. Matsumoto M, Hikosaka O, 2009. Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature 459, 837–841. 10.1038/nature08028 [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. McCutcheon JE, Cone JJ, Sinon CG, Fortin SM, Kantak PA, Witten IB, Deisseroth K, Stuber GD, Roitman MF, 2014. Optical suppression of drug-evoked phasic dopamine release. Front. Neural Circuits 8, 114 10.3389/fncir.2014.00114 [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Niv Y, 2007. Cost, benefit, tonic, phasic: what do response rates tell us about dopamine and motivation? Ann. N. Y. Acad. Sci. 1104, 357–376. 10.1196/annals.1390.018 [DOI] [PubMed] [Google Scholar]
  26. Ostlund SB, Kosheleff AR, Maidment NT, 2012. Relative response cost determines the sensitivity of instrumental reward seeking to dopamine receptor blockade. Neuropsychopharmacol. Off. Publ. Am. Coll. Neuropsychopharmacol. 37, 2653–2660. 10.1038/npp.2012.129 [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Owesson-White CA, Cheer JF, Beyene M, Carelli RM, Wightman RM, 2008. Dynamic changes in accumbens dopamine correlate with learning during intracranial self-stimulation. Proc. Natl. Acad. Sci. U. S. A. 105, 11957–11962. 10.1073/pnas.0803896105 [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Pan W-X, Brown J, Dudman JT, 2013. Neural signals of extinction in the inhibitory microcircuit of the ventral midbrain. Nat. Neurosci. 16, 71–78. 10.1038/nn.3283 [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Pan W-X, Schmidt R, Wickens JR, Hyland BI, 2008. Tripartite mechanism of extinction suggested by dopamine neuron activity and temporal difference model. J. Neurosci. Off. J. Soc. Neurosci. 28, 9619–9631. 10.1523/JNEUROSCI.0255-08.2008 [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Parker NF, Cameron CM, Taliaferro JP, Lee J, Choi JY, Davidson TJ, Daw ND, Witten IB, 2016. Reward and choice encoding in terminals of midbrain dopamine neurons depends on striatal target. Nat. Neurosci. 19, 845–854. 10.1038/nn.4287 [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Radke AK, Kocharian A, Covey DP, Lovinger DM, Cheer JF, Mateo Y, Holmes A, 2018. Contributions of nucleus accumbens dopamine to cognitive flexibility. Eur. J. Neurosci. 10.1111/ejn.14152 [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Robbins TW, Everitt BJ, 2007. A role for mesencephalic dopamine in activation: commentary on Berridge (2006). Psychopharmacology (Berl.) 191, 433–437. 10.1007/s00213-006-0528-7 [DOI] [PubMed] [Google Scholar]
  33. Roesch MR, Calu DJ, Schoenbaum G, 2007. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nat. Neurosci. 10, 1615–1624. 10.1038/nn2013 [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Salamone JD, 2002. Functional significance of nucleus accumbens dopamine: behavior, pharmacology and neurochemistry. Behav. Brain Res. 137, 1. [DOI] [PubMed] [Google Scholar]
  35. Salamone JD, Correa M, 2012. The mysterious motivational functions of mesolimbic dopamine. Neuron 76, 470–485. 10.1016/j.neuron.2012.10.021 [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Saunders BT, Richard JM, Margolis EB, Janak PH, 2018. Dopamine neurons create Pavlovian conditioned stimuli with circuit-defined motivational properties. Nat. Neurosci. 21, 1072–1083. 10.1038/s41593-018-0191-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Schultz W, Carelli RM, Wightman RM, 2015. Phasic dopamine signals: from subjective reward value to formal economic utility. Curr. Opin. Behav. Sci. 5, 147–154. 10.1016/j.cobeha.2015.09.006 [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Schultz W, Dayan P, Montague PR, 1997. A neural substrate of prediction and reward. Science 275, 1593–1599. [DOI] [PubMed] [Google Scholar]
  39. Sharpe MJ, Chang CY, Liu MA, Batchelor HM, Mueller LE, Jones JL, Niv Y, Schoenbaum G, 2017. Dopamine transients are sufficient and necessary for acquisition of model-based associations. Nat. Neurosci. 20, 735–742. 10.1038/nn.4538 [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Steinberg EE, Keiflin R, Boivin JR, Witten IB, Deisseroth K, Janak PH, 2013. A causal link between prediction errors, dopamine neurons and learning. Nat. Neurosci. 16, 966–973. 10.1038/nn.3413 [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Stuber GD, Klanker M, de Ridder B, Bowers MS, Joosten RN, Feenstra MG, Bonci A, 2008. Reward-predictive cues enhance excitatory synaptic strength onto midbrain dopamine neurons. Science 321, 1690–1692. 10.1126/science.1160873 [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Tan KR, Yvon C, Turiault M, Mirzabekov JJ, Doehner J, Labouèbe G, Deisseroth K, Tye KM, Lüscher C, 2012. GABA neurons of the VTA drive conditioned place aversion. Neuron 73, 1173–1183. 10.1016/j.neuron.2012.02.015 [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Tsai H-C, Zhang F, Adamantidis A, Stuber GD, Bonci A, de Lecea L, Deisseroth K, 2009. Phasic firing in dopaminergic neurons is sufficient for behavioral conditioning. Science 324, 1080–1084. 10.1126/science.1168878 [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Tye KM, Mirzabekov JJ, Warden MR, Ferenczi EA, Tsai H-C, Finkelstein J, Kim S-Y, Adhikari A, Thompson KR, Andalman AS, Gunaydin LA, Witten IB, Deisseroth K, 2013. Dopamine neurons modulate neural encoding and expression of depression-related behaviour. Nature 493, 537–541. 10.1038/nature11740 [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Waelti P, Dickinson A, Schultz W, 2001. Dopamine responses comply with basic assumptions of formal learning theory. Nature 412, 43–48. 10.1038/35083500 [DOI] [PubMed] [Google Scholar]
  46. Witten IB, Steinberg EE, Lee SY, Davidson TJ, Zalocusky KA, Brodsky M, Yizhar O, Cho SL, Gong S, Ramakrishnan C, Stuber GD, Tye KM, Janak PH, Deisseroth K, 2011. Recombinase-driver rat lines: tools, techniques, and optogenetic application to dopamine-mediated reinforcement. Neuron 72, 721–733. 10.1016/j.neuron.2011.10.028 [DOI] [PMC free article] [PubMed] [Google Scholar]
