Summary
Addiction is a disorder of behavioral control and learning. While this may reflect pre-existing propensities, drug use also clearly contributes by causing changes in outcome processing in prefrontal and striatal regions. This altered processing is associated with behavioral deficits, including changes in learning. These areas provide critical input to midbrain dopamine neurons regarding expected outcomes, suggesting that effects on learning may result from changes in dopaminergic error signaling. Here we show that dopamine neurons recorded in rats that had self-administered cocaine failed to suppress firing on omission of an expected reward and exhibited lower-amplitude, imprecisely timed increases in firing to an unexpected reward. Learning also appeared to have less of an effect on reward-evoked and cue-evoked firing in the cocaine-experienced rats. Overall, the changes are consistent with cocaine use reducing the fidelity of input regarding expected outcomes, such as their size, timing, and overall value.
Keywords: addiction, cocaine, dopamine, prediction error, learning, single-unit, rodent
In the current study, Takahashi et al. show that prior cocaine use causes lasting changes in dopaminergic error signals, consistent with diminished predictive input. These changes may play a role in long-term sequelae of drug use, such as relapse, which define addiction.
Introduction
Addiction is a disorder characterized by a loss of control over behavior (APA, 2013). While this may involve an overwhelming desire or even need for drug (Koob and Le Moal, 2001, 2005; Robbins and Everitt, 1999; Robinson and Berridge, 2003, 2008), it may also reflect changes in how the brain processes information about normal or non-drug consequences (Jentsch and Taylor, 1999; Lucantonio et al., 2014a). Addicts are clearly less sensitive to such outcomes, whether punishing or rewarding. In practice, this reduced sensitivity is evident both in ongoing drug seeking and in devastating long-term phenomena such as relapse (APA, 2013); in each case, drug is (by definition) chosen in the face of significant negative outcomes or in lieu of more positive outcomes. Experimentally, addiction is associated with changes in the value functions for rewards and punishments (Ersche et al., 2016; Goldstein et al., 2007b; Konova et al., 2012; Parvaz et al., 2012), changes linked to reduced insight and dysfunction of prefrontal circuits critical for evaluating consequences on-the-fly to guide behavior, especially the orbitofrontal cortex and striatum (Goldstein et al., 2007a; Goldstein et al., 2009; Jentsch and Taylor, 1999; Lucantonio et al., 2012; Moeller et al., 2012; Schoenbaum and Shaham, 2007; Volkow and Fowler, 2000). Interestingly, many of these deficits could be explained by changes in learning mechanisms, a speculation supported by work in animal models that isolates deficits in learning when predictions about outcomes are violated (Lucantonio et al., 2014b; Lucantonio et al., 2014c; Wied et al., 2013).
While not ruling out pre-existing deficits in these or other neural systems, extensive work in animal models has shown that deficits in using information about non-drug outcomes, both punishing and rewarding, can be caused by addictive drugs (Burke et al., 2006; Calu et al., 2007; Chen et al., 2013; George et al., 2008; Groman et al., 2018; Jentsch et al., 2002; Lucantonio et al., 2014b; Lucantonio et al., 2014c; Mendez et al., 2010; Nelson and Killcross, 2006; Roesch et al., 2007b; Schoenbaum and Setlow, 2005; Wied et al., 2013; Wyvell and Berridge, 2001). Notably, these drug-induced deficits are observed after relatively limited drug exposure and are not restricted to a subset of animals, suggesting that declines in processing in these systems do not operate alone but are instead a key early change that could lay the foundation for subsequent progression to addiction. Further, the effects are long-lasting, persisting weeks or months after cessation of drug use in short-lived rodents; thus, they may be particularly relevant to long-term clinical problems such as relapse.
A key brain region involved in learning when predictions about outcomes are violated is the ventral tegmental area (VTA) and the dopamine neurons that reside therein (Langdon et al., 2017; Schultz, 2016; Schultz et al., 1997). Transient changes in the firing of these neurons correlate with errors in the prediction of rewarding (Cohen et al., 2012; D’Ardenne et al., 2008; Hollerman and Schultz, 1998; Mirenowicz and Schultz, 1994; Pan et al., 2005; Roesch et al., 2007a; Sadacca et al., 2016; Waelti et al., 2001) and punishing outcomes (Joshua et al., 2008; Matsumoto et al., 2016; Matsumoto and Hikosaka, 2009; Oleson et al., 2012) as well as value-neutral, informational events (Bromberg-Martin and Hikosaka, 2009; Horvitz, 2000; Horvitz et al., 1997; Howard and Kahnt, 2018; Takahashi et al., 2017; Tobler et al., 2003). These theoretically important teaching signals are thought to be important for a broad array of learned behaviors (Rescorla and Wagner, 1972; Sutton and Barto, 1981), and optogenetic advances have allowed this proposal to be directly confirmed for several specific exemplars (Chang et al., 2016; Chang et al., 2017; Keiflin et al., 2017; Sharpe et al., 2017; Steinberg et al., 2013). These include settings in which learning is impaired by addictive drugs, cited above. Contrary to ideas that these teaching signals may be directly enhanced in addiction (Redish, 2004), these results suggest they are in fact degraded, either as a result of primary effects on integration of information in VTA or secondary effects on how neural systems upstream represent expected outcomes (Borgland et al., 2004; Burton et al., 2018; Burton et al., 2017; Lucantonio et al., 2014c; Stalnaker et al., 2006; Takahashi et al., 2007; Takahashi et al., 2008; Ungless et al., 2001). 
Indeed, several groups have shown changes in error-related negativity in prefrontal regions or BOLD response in striatal areas in addicts (Baker et al., 2010; Park et al., 2010; Parvaz et al., 2015; Tanabe et al., 2013), measures thought to reflect dopaminergic error signaling.
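These dopaminergic teaching signals are typically formalized as reward prediction errors of the kind used in Rescorla-Wagner and temporal-difference models (Rescorla and Wagner, 1972; Sutton and Barto, 1981). A minimal sketch of the idea, for illustration only (the function name and learning rate are ours, not part of any study cited here):

```python
def rescorla_wagner(rewards, alpha=0.2):
    """Track a scalar reward prediction V across trials and return the
    per-trial prediction errors, delta_t = r_t - V_t.

    delta is positive when reward exceeds the prediction (unexpected
    reward) and negative when an expected reward is omitted."""
    v, errors = 0.0, []
    for r in rewards:
        delta = r - v          # prediction error on this trial
        errors.append(delta)
        v += alpha * delta     # update the prediction
    return errors

# Ten rewarded trials followed by an omission: the error is maximal on
# trial 1, shrinks as the prediction is learned, and goes negative on
# omission -- the pattern intact dopamine neurons are reported to show.
errors = rescorla_wagner([1.0] * 10 + [0.0])
```

The claim that cocaine degrades predictive input corresponds, in this toy model, to a weakened or absent V term: without it, omissions produce no negative error and learned rewards keep producing large positive errors.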
Here we directly tested whether a short course of cocaine self-administration, previously shown to alter outcome-guided behavior and learning (Lucantonio et al., 2014b; Lucantonio et al., 2014c; Roesch et al., 2007b; Wied et al., 2013), affects signaling of reward prediction errors by midbrain dopamine neurons. Rats were trained to self-administer cocaine and then, after at least a month of forced abstinence, single-unit activity was recorded from VTA dopamine neurons in response to positive and negative reward prediction errors induced by changes in the number or timing of an expected reward, using a task in which we have shown initial learning to be affected by cocaine self-administration (Roesch et al., 2007b). Compared to dopamine neurons recorded in the sucrose-trained or naïve controls, those in cocaine-experienced rats failed to signal negative prediction errors in response to reward omission, showed diminished signaling of positive prediction errors in response to unexpected rewards, and failed to fire differentially to reward-predictive cues. These results resemble changes in dopaminergic error signals observed after lesions to orbitofrontal and ventral striatal regions directly upstream (Jo and Mizumori, 2015; Takahashi et al., 2016; Takahashi et al., 2011), consistent with evidence that drug experience affects processing of outcomes in these regions (Lucantonio et al., 2014c; Stalnaker et al., 2006; Takahashi et al., 2007; Takahashi et al., 2008). These data add to evidence that early experience with addictive drugs has profound effects on neural circuits that process outcomes and provide direct evidence for how these changes may impact basic associative learning mechanisms.
Results
Prior to recording, rats in both the experimental (n = 9) and control (n = 8) groups were shaped to perform an odor-guided choice task used previously to characterize signaling of reward prediction errors by VTA dopamine neurons (Roesch et al., 2007a; Takahashi et al., 2016; Takahashi et al., 2011). Subsequently, rats in the experimental group were trained to self-administer cocaine on an FR1 schedule; sessions lasted 3 hours per day for 14 days. For comparison, some rats in the control group (n = 3 of 8) were trained to self-administer sucrose using identical procedures (Figs. 1B and C, see caption for statistics).
Figure 1: Experimental timeline and sucrose/cocaine self-administration.
(A) Shown is the experimental timeline. (B) Line graph showing the number of reinforcements (triangles) and responses on the active (filled circles) and inactive (open circles) levers during 3-hr sucrose self-administration sessions (n = 3). A 2-way ANOVA comparing lever (active/inactive) and session revealed a significant interaction between lever and session (F13,52 = 4.65, p < 0.01). (C) Line graph showing the number of reinforcements (triangles) and responses on the active (filled circles) and inactive (open circles) levers during 3-hr cocaine self-administration sessions (n = 9). A 2-way ANOVA comparing lever (active/inactive) and session revealed a significant interaction between lever and session (F13,208 = 12.5, p < 0.01). Error bars, s.e.m.
After 14 days of self-administration, each rat had a drivable recording electrode implanted in the ventral tegmental area (VTA) and, approximately 3 weeks later, we began recording single-unit activity in VTA in this task. During recording, the rats sampled one of three different odor cues at a central port on each trial and then responded at one of two adjacent fluid wells (Fig. 2A). One odor signaled the availability of reward only in the left well (forced left), a second odor signaled the availability of reward only in the right well (forced right), and a third odor signaled that reward was available at either well (free choice). To induce errors in reward prediction, we manipulated either the timing or the number of rewards delivered in each well across 5 blocks of trials (Fig. 2B). Positive prediction errors were induced by making a previously delayed reward immediate (blue arrows in Fig. 2B, 2sh, 3sh and 4bg) or by increasing the number of rewards (yellow arrows in Fig. 2B, 4bg and 5bg), whereas negative prediction errors were induced by delaying a previously immediate reward (red arrows in Fig. 2B, 2lo and 3lo) or by decreasing the number of rewards (green arrow in Fig. 2B, 5sm).
Figure 2: Apparatus and behavioral results.
(A) Picture of the apparatus used in the task, showing the odor port (~2.5 cm diameter) and two fluid wells. (B) Line deflections indicate the time course of stimuli (odors and rewards) presented to the animal on each trial. Dashed lines show when reward was omitted, and solid lines show when reward was delivered. At the start of each recording session, one well was randomly designated as short (a 0.5 s delay before reward) and the other as long (a 1–7 s delay before reward) (block 1). In the second and third blocks of trials, these contingencies were switched. In block 4, the delay was held constant while the number of rewards was manipulated; two additional boli of reward were delivered in one well (big reward), and a single bolus of reward was delivered in the other well (small reward). In block 5, these contingencies were switched again. Blue arrows, unexpected short reward; red arrows, short reward omission; yellow arrows, unexpected big reward; green arrow, big reward omission. (C and I) Choice behavior in the last 3 trials before the switch and the first 10 and last 10 trials after the switch from a high-valued outcome to a low-valued outcome in timing (C) and number blocks (I). Inset bar graphs show the average percentage choice of high-valued (black) and low-valued (white) outcomes across all free-choice trials. Ctrl, control rats; Cocaine, cocaine-treated rats. (D – H and J – N) Behavior on forced-choice trials in timing (D – H) and number blocks (J – N). Bar graphs show percentage correct (D and J), percentage of early unpokes (E and K), reaction times from odor offset to leaving the odor port (F and L), reaction times to well entry (G and M), and reaction times to nosepoke after light onset (H and N) in response to the high- and low-value outcomes across all recording sessions. *p < 0.05 (see main text for statistics); NS, non-significant. Error bars, s.e.m.
Cocaine self-administration did not affect value-based responding at the time of recording
As expected, control rats changed their choice behavior across blocks in response to the changing rewards, choosing the higher value reward more often on free-choice trials in both timing and number blocks (timing blocks, t-test, t97 = 18.5, p < 0.01, Fig. 2C; number blocks, t97 = 14.3, p < 0.01, Fig. 2I). On forced-choice trials, they responded more accurately (timing blocks, t-test, t97 = 11.7, p < 0.01, Fig. 2D; number blocks, t97 = 9.28, p < 0.01, Fig. 2J), with more early unpokes from the odor port (timing blocks, t-test, t97 = 3.02, p < 0.01, Fig. 2E; number blocks, t97 = 2.04, p < 0.05, Fig. 2K) and shorter reaction times to leave the odor port (timing blocks, t-test, t97 = 4.23, p < 0.01, Fig. 2F; number blocks, t97 = 4.44, p < 0.01, Fig. 2L) and to move from the odor port to the fluid well (timing blocks, t-test, t97 = 3.02, p < 0.01, Fig. 2G; number blocks, t97 = 3.93, p < 0.01, Fig. 2M) when either the earlier or larger reward was at stake. There was no effect of sucrose training on behavior during recording (ANOVAs comparing behavior of sucrose and naïve rats: F's < 3.3, p's > 0.08) or on the error correlates of the neurons recorded in the controls (see Supplemental Figure 2, related to Fig. 5), so they were treated as a single group in the subsequent neural analyses.
Figure 5: Changes in activity of reward-responsive dopamine neurons to unexpected changes in timing and number of reward.
(A – H) Average firing of all reward-responsive dopamine neurons in control (A, B, E, F) and cocaine rats (C, D, G, H) on the first five and last five trials after introduction of unexpected delivery of short reward (A and C), unexpected big reward (B and D), omission of expected short reward (E and G), and omission of expected big reward (F and H). (I – L) Distributions of difference scores comparing firing to unexpected reward (left) and reward omission (right) in the first 5 versus last 5 trials in timing (I and K) and number (J and L) blocks in the control (I and J) and cocaine (K and L) groups. Difference scores were computed from the average firing rate of each neuron in the first 5 minus last 5 trials in relevant trial blocks. The numbers in the upper right of each panel indicate results of Wilcoxon signed-rank test (p) and the average difference score (u). (M – P) Average firing from 100–500ms after delivery of short reward and big reward, or after omission of short reward and big reward, in control (M and N) and cocaine rats (O and P). Error bars, s.e.m. Both forced- and free-choice trials were included for the analysis shown in this figure.
Rats in the cocaine group showed similar changes in behavior (percent choice in timing blocks, t-test, t146 = 19.2, p < 0.01, Fig. 2C; in number blocks, t146 = 15.4, p < 0.01, Fig. 2I; percent correct in timing blocks, t146 = 9.42, p < 0.01, Fig. 2D; in number blocks, t146 = 9.27, p < 0.01, Fig. 2J; percent early unpokes in timing blocks, t146 = 5.64, p < 0.01, Fig. 2E; in number blocks, t146 = 6.32, p < 0.01, Fig. 2K; reaction time leaving the odor port in timing blocks, t146 = 7.58, p < 0.01, Fig. 2F; in number blocks, t146 = 9.23, p < 0.01, Fig. 2L; reaction time responding at the fluid well in timing blocks, t146 = 5.41, p < 0.01, Fig. 2G; in number blocks, t146 = 7.55, p < 0.01, Fig. 2M). Three-factor ANOVAs (group x manipulation x value) on the data from each behavioral measure revealed no significant effects involving group except in three places. The first was in the analysis of percent correct on forced-choice trials (Fig. 2D vs 2J), where we found a significant group x value interaction (F1,243 = 5.12, p < 0.05). This appeared to reflect modestly better accuracy in the cocaine rats on low-value trials; note that the specificity of this effect to low-value trials may simply be due to the ceiling on performance on high-value trials. The second was in the analysis of early unpokes from the odor port (Fig. 2E vs 2K), which revealed a significant main effect of group (F1,243 = 33.3, p < 0.01). This appeared to reflect somewhat increased impulsivity in the cocaine rats, such that they left the odor port early (before odor) a bit more often than controls. The third was in the analysis of reaction time (Fig. 2G vs 2M), where we found a significant main effect of group (F1,243 = 13.6, p < 0.01). This reflected modestly slower response times from the odor port to the fluid well in the cocaine rats. In each case, the differences in behavior were minimal and did not involve time periods used for subsequent neural analyses.
Cocaine self-administration did not affect the prevalence, features and baseline activity of putative dopamine neurons
We identified putative dopamine neurons by means of a waveform analysis similar to that typically used to identify dopamine neurons in primate studies (Bromberg-Martin et al., 2010; Fiorillo et al., 2008; Hollerman and Schultz, 1998; Kobayashi and Schultz, 2008; Matsumoto and Hikosaka, 2009; Mirenowicz and Schultz, 1994; Morris et al., 2006; Waelti et al., 2001). This analysis isolates neurons in rat VTA whose firing is sensitive to intravenous infusion of apomorphine or quinpirole (Jo et al., 2013; Roesch et al., 2007a). Neurons identified in this manner are also selectively eliminated by expression of a Casp3 neurotoxin in TH+ neurons in VTA (by infusion of AAV1-Flex-TaCasp3-TEVp into TH-Cre transgenic rats; Takahashi et al., 2017).
This approach identified 58 of 320 and 85 of 492 VTA neurons recorded in control and cocaine-treated rats, respectively, as putatively dopaminergic (Figs. 3A and B). These proportions did not differ between groups (Chi-square = 0.10, p = 0.76), and there were no apparent effects of cocaine on the waveform characteristics of the putative dopamine neurons (Fig. 3C, t-test; p's > 0.20). Of these, 42 neurons in controls and 55 in cocaine rats increased firing in response to reward. The average baseline activity was similar in the two groups, both for these reward-responsive dopamine neurons and for the remaining dopamine neurons that were not responsive to reward (Fig. 3D, control vs cocaine, t-test; reward-responsive dopamine neurons, t95 = −0.74, p = 0.46; reward-nonresponsive dopamine neurons, t44 = −0.64, p = 0.52). Of note, neurons categorized as non-dopaminergic did show significantly higher baseline firing rates in the cocaine-treated rats (t-test; t667 = 3.05, p = 0.002); however, their activity in response to prediction errors did not differ (see Supplemental Figure 1, related to Fig. 5, for analyses of non-dopamine neurons).
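For illustration, the two waveform features used in this classification can be computed as follows. This is a simplified sketch: the peak detection, sampling interval, and function name are our own assumptions, not the recording pipeline used in the study.

```python
def waveform_features(waveform, dt_ms=0.025):
    """Compute the two features used to cluster putative dopamine neurons:
    the amplitude ratio (n - p) / (n + p) of the negative (n) and positive
    (p) waveform segments, and the spike duration at half-maximum amplitude.

    `waveform` is a list of voltage samples; `dt_ms` is an assumed sampling
    interval in milliseconds."""
    p = max(waveform)                  # positive peak amplitude
    n = abs(min(waveform))             # negative trough amplitude
    ratio = (n - p) / (n + p)
    half = p / 2.0
    above = [i for i, v in enumerate(waveform) if v >= half]
    half_dur_ms = (above[-1] - above[0]) * dt_ms
    return ratio, half_dur_ms
```

In the actual analysis, these two features feed a cluster analysis (Fig. 3A and B) rather than a fixed threshold, so the sketch covers only the feature extraction step.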
Figure 3: Identification and waveform features of putative dopamine neurons.
(A, B) Result of cluster analysis based on the half-time of the spike duration and the ratio comparing the amplitude of the first positive and negative waveform segments ((n – p) / (n + p)). Data in (A) show VTA neurons (n = 320) from the control group (n = 8), plotted as reward-responsive (filled black circles) and nonresponsive dopamine neurons (filled gray circles), and neurons that classified with other clusters, no clusters, or more than one cluster (open circles). Data in (B) show VTA neurons (n = 492) from the rats that received prior cocaine treatment (n = 9). Insets in each panel indicate location of electrode tracks in control (A) and cocaine (B) groups. (C) Bar graphs indicate average amplitude ratio and half duration of putative dopamine neurons in control (black) and cocaine rats (open). (D) Average baseline firing of reward-responsive (Rew DA), nonresponsive dopamine neurons (Non-Rew DA) and non-dopamine neurons (Non DA) in control (black) and cocaine rats (open). *p < 0.05 (see main text for statistics); NS, non-significant. Error bars, s.e.m.
Cocaine self-administration disrupted value-based error signaling at the time of reward
As in prior studies (Roesch et al., 2007a; Takahashi et al., 2016; Takahashi et al., 2011), prediction error signaling was largely restricted to reward-responsive putative dopamine neurons (see Supplemental Figure 1, related to Fig. 5, for analysis of non-dopamine neurons). As shown in the raster examples in Figure 4, the activity of these neurons in control rats increased in response to an unexpected reward and decreased in response to omission of an expected reward, both in timing (Figs. 4A and E) and number (Figs. 4C and G) blocks. As a population, these neurons also showed changes in firing when an unexpected reward was delivered (Figs. 5A and B) or an expected reward was omitted (Figs. 5E and F) in both timing and number blocks. To quantify these changes in firing, we computed difference scores for each neuron comparing the average firing at the beginning versus the end of the blocks at the time of reward delivery or omission. In timing blocks, the distribution was shifted above zero when a delayed reward became immediate (left in Fig. 5I) and below zero when an immediate reward was delayed (right in Fig. 5I). In number blocks, the distribution was shifted above zero when an additional 2nd reward was delivered (left in Fig. 5J) and below zero when the 2nd reward was omitted (right in Fig. 5J). In each case, the changes in firing were maximal at the start of the block, diminishing with learning as the block proceeded (Figs. 5M and N, ANOVAs, F's > 4.30, p's < 0.05).
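The difference-score computation is simple to state concretely. A sketch, assuming each neuron's data within a block is a per-trial list of firing rates (function and variable names are illustrative):

```python
from statistics import mean

def difference_score(trial_rates, n=5):
    """Average firing on the first n trials of a block minus the last n.

    For an error-signaling neuron, scores are positive after an unexpected
    reward is introduced (firing starts high and fades with learning) and
    negative after an expected reward is omitted (firing is suppressed most
    at the start of the block)."""
    return mean(trial_rates[:n]) - mean(trial_rates[-n:])

# e.g., a neuron whose response to a newly introduced reward fades
# across the block as the reward becomes expected:
score = difference_score([8, 7, 6, 5, 4, 3, 2, 2, 2, 2])
```

A population of such scores, one per neuron, is what the Wilcoxon signed-rank tests in Figure 5I–L evaluate against zero.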
Figure 4: Activity in representative neurons in response to changes in timing and number of reward.
(A and B) Raster plots of activity of three representative neurons in control (A) and cocaine rats (B) in response to an unexpected short reward. Gray shading, time of short reward delivery. (C and D) Raster plots of activity of three representative neurons in control (C) and cocaine rats (D) in response to an omission of short reward. Gray shading, time of short reward omission. (E and F) Raster plots of activity of three representative neurons in control (E) and cocaine rats (F) in response to an unexpected big reward. Gray shading, time of big reward. (G and H) Raster plots of activity of three representative neurons in control (G) and cocaine rats (H) in response to an omission of big reward. Gray shading, time of big reward omission. Both forced- and free-choice trials were included for the analysis shown in this figure.
Prior cocaine exposure disrupted these iconic error correlates. In the timing blocks, reward-responsive dopamine neurons recorded in rats in the cocaine group failed to show changes in firing when a delayed reward was made immediate (Figs. 4B and 5C) or when an immediate reward was delayed (Figs. 4D and 5G). In the number blocks, these neurons increased firing when an additional drop of reward was delivered (Figs. 4F and 5D), although the phasic increase in firing was somewhat more muted and less well timed compared to that observed in controls (Fig. 5D vs 5B). In addition, these neurons did not show reduced firing when these additional drops were omitted (Figs. 4H and 5H). These effects were again quantified by computing difference scores in response to changes in reward in the timing (Fig. 5K) and number blocks (Fig. 5L). The distribution of these scores was above zero when a new drop was delivered in the number blocks (Fig. 5L, left); however, it was unaffected by omission of these additional drops (Fig. 5L, right) or by changes in reward timing (Fig. 5K). In the case of additional drops in the number blocks, the firing was maximal initially and declined across the block, as in controls (Fig. 5P, black line; F1,54 = 7.64, p < 0.01); however, activity did not change significantly across the block in response to the other manipulations (Figs. 5O and 5P, F's > 0.50, p's > 0.05).
ANOVAs comparing data between the two groups revealed significant group interactions for timing blocks (Fig. 5M versus 5O; group x reward/omission x trial; F19,1805 = 1.99, p < 0.01). Accordingly, the distributions of the difference scores quantifying the firing changes in these blocks were significantly different between the two groups (histograms, Fig. 5I versus 5K; Wilcoxon rank-sum test; p's < 0.05), reflecting the fact that dopamine neurons recorded in controls changed firing in response to changes in reward timing, whereas those recorded in cocaine-experienced rats did not. On the other hand, ANOVAs comparing data from the number blocks revealed no significant group interactions (Fig. 5N versus Fig. 5P; group x reward/omission x trial; F19,1805 = 0.86, p = 0.64). Post-hoc analyses showed a significant interaction between group and firing on early versus late trials in response to reward omission (F1,95 = 7.79, p < 0.01), but not to the addition of an unexpected reward (F1,95 = 0.74, p = 0.39). Consistent with this analysis, the distributions of the difference scores (Figs. 5J and 5L) were also significantly different for reward omission (Wilcoxon rank-sum test; p < 0.05) but not reward addition (p = 0.96). Together, these analyses show that putative dopamine neurons in rats with prior cocaine exposure responded differently to changes in the timing of reward as well as to the omission of an expected reward.
Cocaine self-administration diminished the influence of learned expectations on value-based error signaling at the time of reward
Although the pattern of effects described above appears complex, much of it may be parsimoniously interpreted as resulting from a loss of input regarding the expected rewards. That is, the dopamine neurons in the cocaine-experienced rats appear not to receive, or not to appropriately process, the learned expectations about reward that normally influence the firing of these neurons, evident in controls in this experiment and in many other studies. This is most easily seen in the complete lack of any effect of reward omission on the firing of the neurons in cocaine-experienced rats in both the timing and number blocks. The effect of reward omission depends entirely on the receipt of information that reward is to be expected at a particular time. If the neurons were not receiving this information, they would fail to properly register reward omission.
Diminished influence of learned expectations was also apparent in other aspects of the results. For example, dopamine neurons in the cocaine group fired similarly to immediate reward early versus late in the block (Figs. 5C and 5K). While this could be interpreted as a failure to signal the prediction error that occurs at the beginning of the block, the actual firing of these neurons appears to be higher at the end of the block than it is in controls (blue lines, Fig. 5C vs 5A). This suggests that cocaine may prevent the normal decline in firing that occurs with learning across blocks. To investigate this possibility, we compared firing to immediate reward during timing blocks, when it is temporally unstable on one of the two sides in each block, to that in number blocks, when it is always delivered consistently at the same time relative to well entry. We used the average firing in response to this more temporally stable reward in block 4 to normalize the firing of the dopamine neurons to the immediate reward in the timing blocks. This analysis revealed that while firing in controls declined from an initial maximum almost all the way down to the level of the stable reward delivery by the end of the timing blocks (black line in Fig. 6A, ANOVA, F1,41 = 22.6, p < 0.01), activity in cocaine-experienced rats did not decline significantly in the timing blocks and thus failed to reach the level of stable reward delivery by the end of the block (gray line in Fig. 6A, ANOVA, F1,54 = 3.73, p > 0.05). This suggests that the firing of these neurons was not influenced as strongly by learned expectations developed across the timing blocks. A direct comparison between groups revealed a significant interaction between group and learning (ANOVA, F1,95 = 4.42, p < 0.05).
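The normalization used in this analysis amounts to subtracting the mean response to the temporally stable block-4 reward from each reward-evoked rate. A sketch, with function and variable names of our own choosing:

```python
from statistics import mean

def normalize_to_stable(rates, stable_block_rates):
    """Express reward-evoked firing relative to the mean response to the
    temporally stable reward of block 4, so that a fully learned response
    should approach zero by the end of a timing block."""
    baseline = mean(stable_block_rates)
    return [r - baseline for r in rates]

# Control-like pattern: firing starts above the stable level and declines
# toward it across the block; a failure of this decline (as reported in
# the cocaine group) leaves late-trial values elevated above zero.
normalized = normalize_to_stable([6.0, 4.5, 3.1], [3.0, 3.0, 3.0])
```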
Figure 6: Firing of reward responsive dopamine neurons to reward.
(A) Average normalized firing of all reward-responsive dopamine neurons in control (black) and cocaine (white) rats to an unexpected short reward on early and late trials. Activity was normalized by subtracting average firing to an expected small reward in block 4. An ANOVA revealed a significant effect of learning (F1,41 = 22.6, p < 0.01) in control rats, but not in cocaine rats (F1,54 = 3.73, p > 0.05). A direct comparison between groups revealed a significant interaction between group and learning (F1,95 = 4.42, p < 0.05). F5, first 5 trials; L5, last 5 trials. (B) Average normalized firing of reward-responsive dopamine neurons in control (black) and cocaine (white) rats to each bolus of big reward. Activity was normalized by subtracting average firing to an expected small reward in block 4. Firing of dopamine neurons to the 1st bolus of big reward in the cocaine group was significantly higher than that in the control group (ANOVA, F1,95 = 5.58, p < 0.05). There were no differences between the control and cocaine groups in firing to the 2nd and 3rd boli of big reward (F's < 1.14, p's > 0.29). 1st, 1st bolus; 2nd, 2nd bolus; 3rd, 3rd bolus of big reward. (C) Average normalized firing of reward-responsive dopamine neurons in control (black) and cocaine (white) rats during peak and gap periods of the big reward epoch. Activity was normalized by subtracting average baseline firing. Peak epochs consisted of 100–400ms after each bolus delivery, whereas gap epochs consisted of the 100ms before and after each peak epoch. Dopamine neurons in cocaine rats showed significantly higher firing than those in control rats in gap epochs (ANOVA, F1,95 = 4.02, p < 0.05), whereas there was no difference in peak epochs (F1,95 = 0.12, p > 0.05). A two-way ANOVA comparing group and epoch (peak/gap) revealed a significant interaction between group and epoch (F1,95 = 5.70, p < 0.05). Pk, peak epoch; Gp, gap epoch.
Both forced- and free-choice trials were included for the analysis shown in this figure.
Similarly subtle effects were evident in the number blocks. In the final trial block, dopamine neurons in controls showed increased firing to the unexpected 2nd and 3rd drops, but not to the 1st drop (Fig. 5B), which would have been fully expected based on its delivery throughout the previous block. By contrast, dopamine neurons in cocaine-experienced rats fired slightly more to the 1st drop on early versus late trials in this block (Fig. 5D), an effect that is again consistent with a reduced influence of learned expectations about reward delivery. To quantify this, we again normalized the firing to each drop of the big reward by the average firing to the fully expected single drop in block 4. The result confirmed that the firing of the dopamine neurons in the cocaine group was significantly higher than in controls to the 1st but not the 2nd and 3rd drops (Fig. 6B; 1st drop ANOVA, F1,95 = 5.58, p < 0.05; 2nd and 3rd drop ANOVAs, F's < 1.14, p's > 0.29).
Finally, cocaine also appeared to reduce the temporal specificity of the positive prediction error signal elicited by the unexpected additional drops of reward in the number blocks (Fig. 5D). Again, this effect could be attributed to a loss of information about when reward should be expected. To quantify this, we subdivided the 1500ms reward period after the 1st bolus into peak epochs, 100–400ms after delivery of each bolus, and gap epochs, consisting of all the time in between. In the peak epochs, dopamine neurons in the two groups showed similar average activity (Fig. 6C, ANOVA, F1,95 = 0.12, p > 0.05), whereas in the gap epochs dopamine neurons in the cocaine group showed significantly higher average firing (Fig. 6C, ANOVA, F1,95 = 4.02, p < 0.05). A two-way ANOVA comparing group and epoch revealed a significant interaction (Fig. 6C, ANOVA, F1,95 = 5.70, p < 0.05).
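The peak/gap partition of the reward period can be written down directly. A sketch assuming bolus delivery times in ms, with peak epochs spanning 100–400ms after each bolus and gaps filling the intervals between successive peak windows (names and the 500ms spacing in the example are our assumptions):

```python
def epoch_windows(bolus_times_ms, peak_start=100, peak_end=400):
    """Return (peak, gap) epoch windows for a train of reward boli.

    Peak epochs are 100-400 ms after each bolus delivery; gap epochs are
    the intervals between successive peak windows. Temporally imprecise
    firing would show up as elevated spike counts in the gap windows."""
    peaks = [(t + peak_start, t + peak_end) for t in bolus_times_ms]
    gaps = [(peaks[i][1], peaks[i + 1][0]) for i in range(len(peaks) - 1)]
    return peaks, gaps

# Three boli spaced 500 ms apart:
peaks, gaps = epoch_windows([0, 500, 1000])
```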
Cocaine self-administration diminished the influence of learned expectations on value-based error signaling at the time of the cues
Prediction error signaling was also evident in control rats in response to presentation of the odor cues that predicted differently valued rewards, with dopamine neurons firing more to the cue predicting the higher-valued reward at the end of blocks in both timing (Fig. 7A) and size blocks (Fig. 7B). This again reflected learned information about reward expectations, as it was not observed at the beginning of blocks but developed across blocks (Figs. 7E and 7F). Since there was no difference between the timing and number manipulations in either the control or cocaine group (ANOVA, Fs < 1.39, ps > 0.12), we collapsed data from timing and number blocks in subsequent analyses. An ANOVA comparing firing to the high- and low-value cues on the first and last 10 trials revealed a significant interaction between value and trial (F19,1577 = 2.83, p < 0.01), and difference scores comparing each neuron’s firing to the high- versus low-value cues early versus late were significantly above zero (Fig. 7B). Dopamine neurons recorded in the cocaine-experienced rats did not show this learning effect; instead, their firing was largely insensitive to the acquired predictive value of the cues (Figs. 7C and 7D). An ANOVA comparing average firing on the first and last 10 trials revealed no significant interaction between value and trial (Fig. 7C; F19,2071 = 0.99, p > 0.05), and difference scores comparing each neuron’s firing to the high- versus low-value cues early versus late were distributed around zero (Fig. 7D). Direct comparison of data from the two groups (3-factor ANOVA) indicated a significant interaction (group x value x trial; F19,3648 = 1.73, p < 0.05).
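The difference-score computation can be sketched as follows (a Python illustration with a hypothetical per-neuron dictionary layout; the paper's analyses were run in Matlab, and the key names here are invented for the sketch):

```python
def cue_difference_scores(neurons):
    """For each neuron, compute (high - low cue firing) on late trials minus
    (high - low cue firing) on early trials. Scores above zero indicate that
    the difference in firing to high- vs low-value cues grew with learning.
    Each entry is a dict of mean firing rates with hypothetical keys
    'high_early', 'low_early', 'high_late', and 'low_late'."""
    return [
        (n['high_late'] - n['low_late']) - (n['high_early'] - n['low_early'])
        for n in neurons
    ]
```

In the analysis described above, the resulting distribution of per-neuron scores was then tested against zero with a Wilcoxon signed-rank test.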
Figure 7: Changes in activity of reward-responsive dopamine neurons to high- and low-valued cues.
(A – D) Right plot in each panel shows the difference in normalized firing to the high- and low-value cues for all reward-responsive dopamine neurons on early (red) and late (blue) trials in control (A and B) and cocaine rats (C and D) in delay (A and C) and size (B and D) blocks. Left plot in each panel shows the distribution of difference scores between the high-valued (short or big) and low-valued (long or small) cues in the first 5 versus the last 5 trials of the relevant trial blocks. The numbers in the upper left of each panel indicate the result of a Wilcoxon signed-rank test (p) and the average difference score (μ). (E – H) Normalized average firing of reward-responsive dopamine neurons to the cues signaling short reward (black in E and G), long reward (gray in E and G), big reward (black in F and H), and small reward (gray in F and H) on the previous 2, first 10, and last 10 trials in the relevant trial blocks. Error bars, s.e.m. Only forced-choice trials were included in the analysis shown in this figure.
Discussion
Here we tested whether prior use of cocaine would affect expectancy-related changes in the signaling of reward prediction errors by VTA dopamine neurons. This hypothesis was suggested by direct evidence in animal models that such experience reduces outcome signaling in prefrontal (Chen et al., 2013; Lucantonio et al., 2014c; Stalnaker et al., 2006) and striatal regions (Burton et al., 2018; Burton et al., 2017; Takahashi et al., 2007), areas known to provide critical input to VTA dopamine neurons regarding expected outcomes (Jo et al., 2013; Jo and Mizumori, 2015; Takahashi et al., 2016; Takahashi et al., 2011). Reduced outcome signaling is associated with behavioral deficits, including changes in learning in response to unexpected outcomes (Lucantonio et al., 2014c), suggesting a loss of error signaling downstream. Consistent with this prediction, we found that self-administration of cocaine was associated with changes in error signaling in VTA dopamine neurons when compared to similar neurons recorded in controls, including after sucrose self-administration. Specifically, dopamine neurons recorded in rats experienced with cocaine failed to suppress firing on omission of an expected reward and exhibited lower-amplitude and imprecisely timed increases in firing on delivery of an unexpected reward. Learning also appeared to have less of an effect on reward-evoked firing in the cocaine-experienced rats. This was evident in the response of these neurons to the immediate reward in the timing blocks, which remained abnormally high at the end of the block, and to the first reward in the number blocks, which was abnormally high initially; in each case, the changes suggest a reduced influence of learned expectations of reward (notably despite normal behavior in this well-learned setting). In addition, these neurons failed to show the normal changes in firing to the differently valued cues with learning, changes that likewise depend on the influence of learned expectations.
Overall the effects of cocaine are consistent with reduced fidelity of input regarding the expected outcomes, such as their size, timing, and overall value.
Such reductions may be related to changes previously shown in upstream regions after use of cocaine and other addictive drugs. Although correlative reports suggest that dopamine neurons receive somewhat redundant information from multiple sources regarding expected and actual outcomes, and even prediction errors themselves (Matsumoto and Hikosaka, 2007; Tian et al., 2016), causal studies show that loss of input from prefrontal areas and ventral striatum has marked and very specific effects on error signaling, consistent with a loss of information necessary for shaping the predictions underlying the dopaminergic reward prediction errors (Jo et al., 2013; Jo and Mizumori, 2015; Takahashi et al., 2016; Takahashi et al., 2011). For example, in this exact task, we found that dopamine neurons in orbitofrontal-lesioned rats responded less to unexpected reward, not at all to reward omission, and also showed less of an influence of learning both at the time of reward and in response to reward-predictive cues. These effects are remarkably similar to those reported here. The similarity to the effects of orbitofrontal and, to a lesser extent, striatal lesions is consistent with the effects of psychostimulants (and possibly opiates) on processing in these areas. As cited earlier, a large body of literature shows that use of psychostimulants in animal models causes behavioral deficits that resemble those caused by striatal and prefrontal, particularly orbitofrontal, damage. We have linked impaired behavior and learning to a specific loss of outcome-related signaling in orbitofrontal and ventral striatal neurons after cocaine use. The results here are consistent with what a loss of outcome-related signaling in upstream areas would be expected to produce.
Without minimizing the potential importance of drug-induced changes in other areas, including the midbrain itself (Argilli et al., 2008; Borgland et al., 2004; Ungless et al., 2001), this raises the possibility that altered dopaminergic error signaling may result from the effects of drug use on these upstream areas.
Is this relevant to addiction? As noted above, there is good evidence that addicts have very similar behavioral and learning deficits. In addition, several groups have reported reductions in secondary measures thought to reflect dopaminergic prediction errors in addicts (Baker et al., 2010; Park et al., 2010; Parvaz et al., 2015; Tanabe et al., 2013). Although dopamine neurons are not the only error signaling mechanism in the brain that might be affected to alter these measures (Asaad and Eskandar, 2011; Bryden et al., 2011; Hyman et al., 2017; Matsumoto and Hikosaka, 2007; Matsumoto et al., 2007), the current data directly implicate changes in dopaminergic error signaling. Adding to the potential relevance of these results are reports of diminished dopamine efflux in striatum in rats engaged in cocaine-seeking (Willuhn et al., 2014). This reduction may be related to the diminished signaling observed here, reflecting perhaps a decoupling of the expectation of the drug high from the drug-seeking behavior. Notably, the reduction was most marked in rats that escalated their drug intake, suggesting that reduced phasic dopamine function, at least in striatum, may be a particularly important marker for progression to addiction.
However, the effects demonstrated here would presumably have other far-reaching consequences, since they occurred independently of, and in subjects long removed from, drug taking. The changes in error signaling demonstrated here and the altered processing in other regions are neither short-lived nor restricted to learning about drugs or drug-associated cues. As a result, their impact would be felt long-term and widely, altering the ability of addicts to behave and learn normally in their daily lives. Critically, the ultimate effects on behavior could be complex, even if one considers only simple situations involving learning to maximize utility or value. For example, while the changes shown here would impair learning on reward omission, they might be expected to enhance learning in response to unexpected rewards and to lead to inappropriate learning about low- versus high-value reward predictors. Indeed, we have observed such effects, if subtle, during initial learning in the same task used here (Roesch et al., 2007b). This complexity – and its impact – only increases if one considers the emerging evidence that dopaminergic errors contribute to value-neutral associative learning (Langdon et al., 2017). Though generally one would expect suboptimal and abnormal choices, the exact behavioral results might be quite variable in uncontrolled settings.
Another interesting future question is whether the altered signaling might be restricted to dopaminergic subpopulations with specific downstream targets. While current dogma holds that dopaminergic error signals are broadcast widely and homogeneously, emerging data have begun to contradict this account. And even if the signal in normal subjects is homogeneous, the drug-induced changes evident here might be more pronounced in projections to some target regions than in projections to others. Such anatomical specificity would be an important piece of the puzzle of addiction.
Notably, these changes may be amenable to therapeutic intervention. In animal models, the hypoexcitability that seems to underlie changes in prefrontal information processing can be reversed by activating prefrontal networks (Chen et al., 2013; Lucantonio et al., 2014c); these manipulations reduce compulsive cocaine seeking and recover normal learning, suggesting that prefrontal manipulations both change behavior directly relevant to drug-seeking and impact the changes in error signals to non-drug outcomes shown here. Attempts to leverage this clinically, using transcranial magnetic stimulation, have met with success (Terraneo et al., 2016). Alternatively, it may also be possible to compensate for the changes in dopamine function in a more limited way, using pharmacologic agents. L-DOPA given to rats was sufficient to reduce escalation in the aforementioned study associating reduced dopamine with drug seeking (Willuhn et al., 2014), and such tonic treatments have been reported to specifically affect learning dependent on phasic dopamine signaling (Rutledge et al., 2009; Wunderlich et al., 2012). Indeed, dopaminergic agents have shown promise in combination with contingency management as a way to treat addiction (Schmitz et al., 2010; Schmitz et al., 2008).
STAR Methods Text
CONTACT FOR REAGENT AND RESOURCE SHARING
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Geoffrey Schoenbaum (geoffrey.schoenbaum@nih.gov).
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Seventeen male Long-Evans rats (Charles River Labs, Wilmington, MA), aged approximately 3 months at the start of the experiment, were used in this study. Rats were tested at the NIDA-IRP in accordance with NIH guidelines determined by the Animal Care and Use Committee.
METHOD DETAILS
Surgical procedures:
All surgical procedures adhered to guidelines for aseptic technique. For implantation of chronically indwelling intravenous catheters to allow intravenous self-administration of cocaine, rats were anaesthetized with ketamine (100 mg/ kg, i.p., Sigma) and xylazine (10 mg/kg, i.p., Sigma), and a silastic catheter was inserted into the right jugular vein and passed subcutaneously to the back, where it was attached to a modified 22-gauge cannula (Plastics One) and fixed to the rat’s back with sutures. Carprofen (0.1 mg/kg, s.c., Pfizer) was given after surgery as an analgesic. Rats recovered for 7–10 days before starting behavioral testing. During recovery and self-administration training, catheters were flushed every day with sterile 0.9% saline + Gentamicin (0.08 mg/mL, BioSource International).
For implantation of recording electrodes, rats were anaesthetized with isoflurane and underwent stereotaxic surgery to implant a drivable bundle of eight 25-μm diameter FeNiCr wires (Stablohm 675, California Fine Wire, Grover Beach, CA) dorsal to VTA in the left or right hemisphere at 5.3 mm posterior to bregma, 0.7 mm laterally, and 7.5 mm ventral to the brain surface at an angle of 5° toward the midline from vertical. Prior to insertion, wires were cut with surgical scissors to extend ~2.0 mm beyond the cannula and electroplated with platinum (H2PtCl6, Aldrich, Milwaukee, WI) to an impedance of 500–900 kΩ. Cephalexin (15 mg/kg p.o.) was administered twice daily for two weeks post-operatively.
Histology:
After the recording experiment, all rats were perfused with phosphate-buffered saline (PBS) followed by 4% paraformaldehyde (Santa Cruz Biotechnology Inc., CA). Brains were cut in 40 μm sections and stained with thionin to visualize electrode location.
Self-Administration:
Rats were trained to self-administer cocaine-HCl (n = 9, 0.75 mg/kg/infusion, dissolved in sterile 0.9% saline) or sucrose (n = 3) for 14 consecutive days, 3h per day. Rats were trained in standard behavioral chambers purchased from Coulbourn Instruments, each enclosed in a sound-resistant shell. Each chamber was equipped with two levers, located on opposite walls and 8 cm from the grid floor. For cocaine training, silastic tubing shielded with a metal spring extended from the intravenous catheter to a liquid swivel (Instech Laboratories, Plymouth Meeting) mounted on an arm fixed outside of the operant chamber. Tygon tubing extended from the swivel to an infusion pump (Med Associates Inc) located adjacent to the external chamber. For sucrose training, a dipper was recessed in the center of one end wall.
Cocaine or sucrose was delivered under a fixed ratio 1 (FR1) schedule of reinforcement, such that every press on the active lever delivered either a 0.04 ml bolus of 10% sucrose or a 4-s infusion of cocaine. Daily sessions lasted 3 h, with 15-min timeout periods after each hour. Each session began with the insertion of the active lever. Each sucrose delivery or drug infusion was accompanied by the retraction of the active lever and followed by a 40-s timeout period, after which the lever was inserted again. Pressing on the inactive lever had no programmed consequences. The number of rewards (sucrose or cocaine) was limited to 20/h to prevent cocaine overdose. After 20 rewards, the active lever was retracted for the remainder of the hour.
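The schedule's contingencies can be sketched as a small state machine (a hypothetical Python illustration, not the actual operant-chamber control software; `FR1Session` and its method names are invented for this sketch):

```python
class FR1Session:
    """Toy FR1 schedule: each active-lever press while the lever is
    available earns one reward, each reward starts a 40-s timeout with the
    lever retracted, and rewards are capped at 20 per hour."""

    def __init__(self):
        self.per_hour = {}        # hour index -> rewards earned that hour
        self.available_at = 0.0   # time (s) the lever is next available

    def press(self, t):
        """Return True if an active-lever press at time t (s) earns a reward."""
        hour = int(t // 3600)
        if t < self.available_at or self.per_hour.get(hour, 0) >= 20:
            return False          # in timeout, or hourly cap reached
        self.per_hour[hour] = self.per_hour.get(hour, 0) + 1
        self.available_at = t + 40.0
        return True
```

The hourly cap plays the role of the lever retraction after 20 rewards; presses during the timeout or past the cap have no programmed consequence, as described above.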
Odor-guided choice task:
Recording was conducted in aluminum chambers approximately 18” on each side with sloping walls narrowing to an area of 12” x 12” at the bottom. A central odor port was located above two fluid wells (Fig. 2A). Two lights were located above the panel. The odor port was connected to an air flow dilution olfactometer to allow the rapid delivery of olfactory cues. Odors were chosen from compounds obtained from International Flavors and Fragrances (New York, NY). Trials were signaled by illumination of the panel lights inside the box. When these lights were on, a nosepoke into the odor port resulted in delivery of the odor cue to a small hemicylinder located behind this opening. One of three different odors was delivered to the port on each trial, in a pseudorandom order. At odor offset, the rat had 3 seconds to make a response at one of the two fluid wells. One odor instructed the rat to go to the left to get reward, a second odor instructed the rat to go to the right to get reward, and a third odor indicated that the rat could obtain reward at either well. Odors were presented in a pseudorandom sequence such that the free-choice odor was presented on 7/20 trials, and the left/right odors were presented in equal numbers. In addition, the same odor could be presented on no more than 3 consecutive trials. Once the rats were shaped to perform this basic task, we introduced blocks in which we manipulated either the number (1 vs 3 drops) or timing (immediate versus delayed) of reward delivery (Fig. 2B). For recording, one well was randomly designated as short and the other long at the start of the session (Fig. 2B, 1sh and 1lo). In the second and third blocks of trials, these contingencies were switched (Fig. 2B, 2sh, 2lo, 3sh and 3lo). The length of the delay under long conditions followed an algorithm in which the side designated as long started off as 1 s and increased by 1 s every time that side was chosen until it became 3 s.
If the rat continued to choose that side, the length of the delay increased by 1 s up to a maximum of 7 s. If the rat chose the side designated as long on fewer than 8 of the last 10 choice trials, the delay was reduced by 1 s, to a minimum of 3 s. The reward delay on long forced-choice trials was yoked to the delay on free-choice trials during these blocks. In later blocks we held the delay preceding reward constant while manipulating the number of rewards (Fig. 2B, 4bn, 4sm, 5bg and 5sm). The reward was a 0.05 ml bolus of 10% sucrose solution. The reward in the timing blocks was the same single bolus used in the small-reward blocks. For the big reward, two additional boli were delivered after gaps of 500 ms. Block 1 was 30–50 trials long, and all subsequent blocks were 60–100 trials long. Block switches were unsignaled, occurring at the experimenter’s discretion within this window provided the rat had chosen the high-value side on more than 60% of the last 10 free-choice trials.
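The delay-adjustment rule can be sketched as follows (a Python illustration; the task was controlled by custom software, and whether forced-choice trials count toward the 8-of-10 criterion is not specified, so this sketch counts free-choice trials only):

```python
from collections import deque

class LongDelay:
    """Adaptive delay for the side designated 'long': starts at 1 s, grows
    by 1 s with each choice of that side up to a 7-s cap, and shrinks by
    1 s (floor 3 s) once the long side was chosen on fewer than 8 of the
    last 10 choice trials."""

    def __init__(self):
        self.delay = 1
        self.history = deque(maxlen=10)   # True if the long side was chosen

    def free_choice(self, chose_long):
        """Update and return the delay (s) after one free-choice trial."""
        self.history.append(chose_long)
        if chose_long:
            self.delay = min(self.delay + 1, 7)
        elif len(self.history) == 10 and sum(self.history) < 8:
            self.delay = max(self.delay - 1, 3)
        return self.delay
```

Whether the decrement is also applied on long-side choices is left out here; the description above only specifies the reduction when the rat shifts away from the long side.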
Single-unit recording:
Wires were screened for activity daily; if no activity was detected, the rat was removed and the electrode assembly was advanced 40 or 80 μm. Otherwise, active wires were selected for recording, a session was conducted, and the electrode was advanced at the end of the session. Neural activity was recorded using Plexon Multichannel Acquisition Processor systems (Dallas, TX). Signals from the electrode wires were amplified 20X at the headstage (Plexon Inc, HST/8o50-G20). Immediately outside the training chamber, the signals were passed through a differential pre-amplifier (Plexon Inc, PBX2/16sp-r-G50/16fp-G50), where the single unit signals were amplified 50X and filtered at 150–9000 Hz. The single unit signals were then sent to the Multichannel Acquisition Processor box, where they were further filtered at 250–8000 Hz, digitized at 40 kHz and amplified at 1–32X. Waveforms (>2.5:1 signal-to-noise) were extracted from active channels and recorded to disk by an associated workstation.
QUANTIFICATION AND STATISTICAL ANALYSIS
All data were analyzed using Matlab. Instances of multiple comparisons were corrected for with the Benjamini-Hochberg procedure. Error bars in figures denote the standard error of the mean. The number of subjects was chosen based on previous similar single-unit recording studies in rats.
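The correction works as in this generic Benjamini-Hochberg sketch (a Python illustration of the standard procedure; the paper's analyses were run in Matlab):

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return booleans marking which p-values survive FDR control at
    level alpha: find the largest rank k with p_(k) <= (k/m) * alpha and
    reject the k smallest p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            reject[i] = True
    return reject
```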
Data analysis:
Units were sorted using Offline Sorter software from Plexon Inc (Dallas, TX). Sorted files were then processed and analyzed in Neuroexplorer and Matlab (MathWorks, Natick, MA). Dopamine neurons were identified via waveform analysis. Briefly, a cluster analysis was performed based on the half time of the spike duration and the ratio comparing the amplitude of the first positive and negative waveform segments. The center and variance of each cluster were computed without data from the neuron of interest, and that neuron was then assigned to a cluster if it was within 3 s.d. of the cluster’s center. Neurons that met this criterion for more than one cluster were not classified. This process was repeated for each neuron. Putative dopamine neurons that showed an increase in firing to reward compared to baseline (400 ms before reward) were further classified as reward-responsive (t-test, p < 0.05). To analyze neural activity to reward, we examined firing rate in the 400 ms beginning 100 ms after reward delivery. Reward activity was normalized by subtracting average baseline firing (400 ms before light on).
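The leave-one-out assignment can be sketched as follows (a Python illustration, not the authors' Matlab code; the per-dimension 3-s.d. test is a simplifying assumption, as the distance metric is not specified):

```python
import math

def assign(neuron, clusters):
    """Assign a neuron to a waveform cluster: each cluster's center and
    s.d. are computed without the neuron of interest, the neuron matches a
    cluster if it lies within 3 s.d. of that center on both features, and
    neurons matching more than one cluster are left unclassified (None).
    Features are (half-duration, amplitude-ratio) pairs; `clusters` maps
    cluster name -> list of feature pairs."""
    matches = []
    for name, members in clusters.items():
        others = [m for m in members if m is not neuron]
        within = True
        for dim in range(2):
            vals = [m[dim] for m in others]
            mu = sum(vals) / len(vals)
            sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
            if abs(neuron[dim] - mu) > 3 * sd:
                within = False
        if within:
            matches.append(name)
    return matches[0] if len(matches) == 1 else None
```

Running this for every recorded unit reproduces the classification logic described above: unambiguous members of the putative-dopamine waveform cluster are kept, and ambiguous units are discarded.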
Supplementary Material
Addiction is characterized by poor decision-making and learning impairments.
Midbrain dopamine neurons broadcast key teaching signals to support learning.
Here we show that these teaching signals are disrupted by prior cocaine use.
The disruption is consistent with a loss of predictive input from upstream regions.
Acknowledgments
This work was supported by the Intramural Research Program at the National Institute on Drug Abuse. The opinions expressed in this article are the authors’ own and do not reflect the view of the NIH/DHHS.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Declarations of Interests
The authors declare no competing interests.
References
- APA (2013). Diagnostic and statistical manual of mental disorders (5th ed) (Arlington, VA: American Psychiatric Publishing).
- Argilli E, Sibley DR, Malenka RC, England PM, and Bonci A (2008). Mechanism and time course of cocaine-induced long-term potentiation in the ventral tegmental area. Journal of Neuroscience 28, 9092–9100.
- Asaad WF, and Eskandar EN (2011). Encoding of both positive and negative reward prediction errors by neurons of the primate lateral prefrontal cortex and caudate nucleus. Journal of Neuroscience 31, 17772–17787.
- Baker TE, Stockwell T, Barnes G, and Holroyd CB (2010). Individual differences in substance dependence: at the intersection of brain, behavior and cognition. Addiction Biology 16, 458–466.
- Borgland SL, Malenka RC, and Bonci A (2004). Acute and chronic cocaine-induced potentiation of synaptic strength in the ventral tegmental area: electrophysiological and behavioral correlates in individual rats. Journal of Neuroscience 24, 7482–7490.
- Bromberg-Martin ES, and Hikosaka O (2009). Midbrain dopamine neurons signal preference for advance information about upcoming rewards. Neuron 63, 119–126.
- Bromberg-Martin ES, Matsumoto M, Hong S, and Hikosaka O (2010). A pallidus-habenula-dopamine pathway signals inferred stimulus values. Journal of Neurophysiology 104, 1068–1076.
- Bryden DW, Johnson EE, Tobia SC, Kashtelyan V, and Roesch MR (2011). Attention for learning signals in anterior cingulate cortex. Journal of Neuroscience 31, 18266–18274.
- Burke KA, Franz TM, Gugsa N, and Schoenbaum G (2006). Prior cocaine exposure disrupts extinction of fear conditioning. Learning and Memory 13, 416–421.
- Burton AC, Bissonette GB, Vazquez D, Blume EM, Donnelly M, Heatley KC, Hinduja A, and Roesch MR (2018). Previous cocaine self-administration disrupts reward expectancy encoding in ventral striatum. Neuropsychopharmacology 43, 2350–2360.
- Burton AC, Bissonette GB, Zhao AC, Patel PK, and Roesch MR (2017). Prior cocaine self-administration increases response-outcome encoding that is divorced from actions selected in dorsal lateral striatum. Journal of Neuroscience 37, 7737–7747.
- Calu DJ, Stalnaker TA, Franz TM, Singh T, Shaham Y, and Schoenbaum G (2007). Withdrawal from cocaine self-administration produces long-lasting deficits in orbitofrontal-dependent reversal learning in rats. Learning and Memory 14, 325–328.
- Chang CY, Esber GR, Marrero-Garcia Y, Yau H-J, Bonci A, and Schoenbaum G (2016). Brief optogenetic inhibition of VTA dopamine neurons mimics the effects of endogenous negative prediction errors during Pavlovian over-expectation. Nature Neuroscience 19, 111–116.
- Chang CY, Gardner M, Di Tillio MG, and Schoenbaum G (2017). Optogenetic blockade of dopamine transients prevents learning induced by changes in reward features. Current Biology 27, 3480–3486.
- Chen BT, Yau H-J, Hatch C, Kusumoto-Yoshida I, Cho SL, Hopf FW, and Bonci A (2013). Rescuing cocaine-induced prefrontal cortex hypoactivity prevents compulsive cocaine seeking. Nature 496, 359–362.
- Cohen JY, Haesler S, Vong L, Lowell BB, and Uchida N (2012). Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88.
- D’Ardenne K, McClure SM, Nystrom LE, and Cohen JD (2008). BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science 319, 1264–1267.
- Ersche KD, Gillan CM, Jones PS, Williams GB, Ward LHE, Luijten M, de Wit S, Sahakian BJ, Bullmore ET, and Robbins TW (2016). Carrots and sticks fail to change behavior in cocaine addiction. Science 352, 1468–1471.
- Fiorillo CD, Newsome WT, and Schultz W (2008). The temporal precision of reward prediction in dopamine neurons. Nature Neuroscience 11, 966–973.
- George O, Mandyam CD, Wee S, and Koob GF (2008). Extended access to cocaine self-administration produces long-lasting prefrontal cortex-dependent working memory impairments. Neuropsychopharmacology 33, 2474–2482.
- Goldstein RZ, Alia-Klein N, Tomasi D, Zhang L, Cottone LA, Maloney T, Telang F, Caparelli EC, Chang L, Ernst T, et al. (2007a). Is decreased prefrontal cortical sensitivity to monetary reward associated with impaired motivation and self-control in cocaine addiction? American Journal of Psychiatry 164, 43–51.
- Goldstein RZ, Craig AD, Bechara A, Garavan H, Childress AR, Paulus MP, and Volkow ND (2009). The neurocircuitry of impaired insight in drug addiction. Trends in Cognitive Sciences 13, 372–380.
- Goldstein RZ, Tomasi D, Alia-Klein N, Cottone LA, Zhang L, Telang F, and Volkow ND (2007b). Subjective sensitivity to monetary gradients is associated with frontolimbic activation to reward in cocaine abusers. Drug and Alcohol Dependence 87, 233–240.
- Groman SM, Rich KM, Smith NJ, Lee D, and Taylor JR (2018). Chronic exposure to methamphetamine disrupts reinforcement-based decision making in rats. Neuropsychopharmacology 43, 770–780.
- Hollerman JR, and Schultz W (1998). Dopamine neurons report an error in the temporal prediction of reward during learning. Nature Neuroscience 1, 304–309.
- Horvitz JC (2000). Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events. Neuroscience 96, 651–656.
- Horvitz JC, Stewart T, and Jacobs BL (1997). Burst activity of ventral tegmental dopamine neurons is elicited by sensory stimuli in the awake cat. Brain Research 759, 251–258.
- Howard JD, and Kahnt T (2018). Identity prediction errors in the human midbrain update reward-identity expectations in the orbitofrontal cortex. Nature Communications 9, 1–11.
- Hyman JM, Holroyd CB, and Seamans JK (2017). A novel neural prediction error found in anterior cingulate cortex ensembles. Neuron 95, 447–456.
- Jentsch JD, Olausson P, De La Garza R, and Taylor JR (2002). Impairments of reversal learning and response perseveration after repeated, intermittent cocaine administrations to monkeys. Neuropsychopharmacology 26, 183–190.
- Jentsch JD, and Taylor JR (1999). Impulsivity resulting from frontostriatal dysfunction in drug abuse: implications for the control of behavior by reward-related stimuli. Psychopharmacology 146, 373–390.
- Jo YS, Lee J, and Mizumori SJ (2013). Effects of prefrontal cortical inactivation on neural activity in the ventral tegmental area. Journal of Neuroscience 33, 8159–8171.
- Jo YS, and Mizumori SJ (2015). Prefrontal regulation of neuronal activity in the ventral tegmental area. Cerebral Cortex, epub ahead of print.
- Joshua M, Adler A, Mitelman R, Vaadia E, and Bergman H (2008). Midbrain dopamine neurons and striatal cholinergic interneurons encode the difference between reward and aversive events at different epochs of probabilistic classical conditioning trials. Journal of Neuroscience 28, 11673–11684.
- Keiflin R, Pribut HJ, Shah NB, and Janak PH (2017). Phasic activation of ventral tegmental, but not substantia nigra, dopamine neurons promotes model-based Pavlovian reward learning. bioRxiv 232678.
- Kobayashi K, and Schultz W (2008). Influence of reward delays on responses of dopamine neurons. Journal of Neuroscience 28, 7837–7846.
- Konova AB, Moeller SJ, Tomasi D, Parvaz MA, Alia-Klein N, Volkow ND, and Goldstein RZ (2012). Structural and behavioral correlates of abnormal encoding of money value in the sensorimotor striatum in cocaine addiction. European Journal of Neuroscience 36, 2979–2988.
- Koob GF, and Le Moal M (2001). Drug addiction, dysregulation of reward, and allostasis. Neuropsychopharmacology 24, 97–121.
- Koob GF, and Le Moal M (2005). Plasticity of reward neurocircuitry and the ‘dark side’ of drug addiction. Nature Neuroscience 8, 1442–1444.
- Langdon AJ, Sharpe MJ, Schoenbaum G, and Niv Y (2017). Model-based predictions for dopamine. Current Opinion in Neurobiology 49, 1–7.
- Lucantonio F, Caprioli D, and Schoenbaum G (2014a). Transition from ‘model-based’ to ‘model-free’ behavioral control in addiction: involvement of the orbitofrontal cortex and dorsolateral striatum. Neuropharmacology 76 Pt B, 407–415.
- Lucantonio F, Kambhampati S, Haney RZ, Atalayer D, Rowland NE, Shaham Y, and Schoenbaum G (2014b). Effects of prior cocaine versus morphine or heroin self-administration on extinction learning driven by overexpectation versus omission of reward. Biological Psychiatry 77, 912–920.
- Lucantonio F, Stalnaker TA, Shaham Y, Niv Y, and Schoenbaum G (2012). The impact of orbitofrontal dysfunction on cocaine addiction. Nature Neuroscience 15, 358–366.
- Lucantonio F, Takahashi YK, Hoffman AF, Chang CY, Bali-Chaudhary S, Shaham Y, Lupica CR, and Schoenbaum G (2014c). Orbitofrontal activation restores insight lost after cocaine use. Nature Neuroscience 17, 1092–1099.
- Matsumoto H, Tian J, Uchida N, and Watabe-Uchida M (2016). Midbrain dopamine neurons signal aversion in a reward-context-dependent manner. eLife 5, e17328.
- Matsumoto M, and Hikosaka O (2007). Lateral habenula as a source of negative reward signals in dopamine neurons. Nature 447, 1111–1115.
- Matsumoto M, and Hikosaka O (2009). Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature 459, 837–841.
- Matsumoto M, Matsumoto K, Abe H, and Tanaka K (2007). Medial prefrontal cell activity signaling prediction errors of action values. Nature Neuroscience 10, 647–656.
- Mendez IA, Simon NW, Hart N, Mitchell MR, Nation JR, Wellman PJ, and Setlow B (2010). Self-administered cocaine causes long-lasting increases in impulsive choice in a delay discounting task. Behavioral Neuroscience 124, 470–477.
- Mirenowicz J, and Schultz W (1994). Importance of unpredictability for reward responses in primate dopamine neurons. Journal of Neurophysiology 72, 1024–1027.
- Morris G, Nevet A, Arkadir D, Vaadia E, and Bergman H (2006). Midbrain dopamine neurons encode decisions for future action. Nature Neuroscience 9, 1057–1063.
- Nelson A, and Killcross S (2006). Amphetamine exposure enhances habit formation. Journal of Neuroscience 26, 3805–3812. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Oleson EB, Gentry RN, Chioma VC, and Cheer JF (2012). Subsecond dopamine release in the nucleus accumbens predicts conditioned punishment and its successful avoidance. Journal of Neuroscience 32, 10692–10702. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pan W-X, Schmidt R, Wickens JR, and Hyland BI (2005). Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. Journal of Neuroscience 25, 6235–6242. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Park SQ, Kahnt T, Beck A, Cohen MX, Dolan RJ, Wrase J, and Heinz A (2010). Prefrontal cortex fails to learn from reward prediction errors in alcohol dependence. Journal of Neuroscience 30, 7749–7753. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Parvaz MA, Konova AB, Proudfit GH, Dunning JP, Malaker P, Moeller SJ, Maloney T, Alia-Klein N, and Goldstein RZ (2015). Impaired neural response to negative prediction errors in cocaine addiction. Journal of Neuroscience 35, 1872–1879. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Parvaz MA, Maloney T, Moeller SJ, Woicik PA, Alia-Klein N, Telang F, Wang G-J, Squires NK, Volkow ND, and Goldstein RZ (2012). Sensitivity to monetary reward is most severely compromised in recently abstaining cocaine addicted individuals: A cross-sectional ERP study. Psychiatry Research: Neuroimaging 203, 75–82. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Redish AD (2004). Addiction as a Computational Process Gone Awry. Science 306, 1944–1947. [DOI] [PubMed] [Google Scholar]
- Rescorla RA, and Wagner AR (1972). A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement In Classical Conditioning II: Current Research and Theory, Black AH, and Prokasy WF, eds. (New York: Appleton-Century-Crofts; ), pp. 64–99. [Google Scholar]
- Robbins TW, and Everitt BJ (1999). Drug addiction: bad habits add up. Nature 398, 567–570. [DOI] [PubMed] [Google Scholar]
- Robinson TE, and Berridge KC (2003). Addiction. Annual review of Psychology 54, 25–53. [DOI] [PubMed] [Google Scholar]
- Robinson TE, and Berridge KC (2008). Review. The incentive sensitization theory of addiction: some current issues. Philosophical Transactions of the Royal Society of London B 363, 3137–3146. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Roesch MR, Calu DJ, and Schoenbaum G (2007a). Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nature Neuroscience 10, 1615–1624. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Roesch MR, Takahashi Y, Gugsa N, Bissonette GB, and Schoenbaum G (2007b). Previous cocaine exposure makes rats hypersensitive to both delay and reward magnitude. Journal of Neuroscience 27, 245–250. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rutledge RB, Lazzaro SC, Lau B, Myers CE, Gluck MA, and Glimcher PW (2009). Dopaminergic drugs modulate learning rates and perseveration in Parkinson’s patients in a dynamic foraging task. Journal of Neuroscience 29, 15104–15114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sadacca BF, Jones JL, and Schoenbaum G (2016). Midbrain dopamine neurons compute inferred and cached value prediction errors in a common framework. eLIFE DOI: 10.7554/eLife.13665.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schmitz JM, Lindsay JA, Stotts AL, Green CE, and Moeller FG (2010). Contingency management and levodopa-carbidopa for cocaine treatment: a comparison of three behavioral targets. Experimental and Clinical Psychopharmacology 18, 238–244. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schmitz JM, Mooney ME, Moeller FG, Stotts AL, Green C, and Grabowski J (2008). Levodopa pharmacotherapy for cocaine dependence: choosing the optimal behavioral therapy platform. Drug and Alcohol Dependence 94, 142–150. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schoenbaum G, and Setlow B (2005). Cocaine makes actions insensitive to outcomes but not extinction: implications for altered orbitofrontal-amygdalar function. Cerebral Cortex 15, 1162–1169. [DOI] [PubMed] [Google Scholar]
- Schoenbaum G, and Shaham Y (2007). The role of orbitofrontal cortex in drug addiction: a review of preclinical studies. Biological Psychiatry 63, 256–262. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schultz W (2016). Dopamine reward prediction-error signalling: a two-component response. Nature Reviews Neuroscience 17, 183–195. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schultz W, Dayan P, and Montague PR (1997). A neural substrate for prediction and reward. Science 275, 1593–1599. [DOI] [PubMed] [Google Scholar]
- Sharpe MJ, Chang CY, Liu MA, Batchelor HM, Mueller LE, Jones JL, Niv Y, and Schoenbaum G (2017). Dopamine transients are sufficient and necessary for acquisition of model-based associations. Nature Neuroscience 20, 735–742. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stalnaker TA, Roesch MR, Franz TM, Burke KA, and Schoenbaum G (2006). Abnormal associative encoding in orbitofrontal neurons in cocaine-experienced rats during decision-making. European Journal of Neuroscience 24, 2643–2653. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Steinberg EE, Keiflin R, Boivin JR, Witten IB, Deisseroth K, and Janak PH (2013). A causal link between prediction errors, dopamine neurons and learning. Nature Neuroscience 16, 966–973. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sutton RS, and Barto AG (1981). Toward a modern theory of adaptive networks: expectation and prediction. Psychological Review 88, 135–170. [PubMed] [Google Scholar]
- Takahashi Y, Roesch MR, Stalnaker TA, and Schoenbaum G (2007). Cocaine exposure shifts the balance of associative encoding from ventral to dorsolateral striatum. Frontiers in Integrative Neuroscience 1:11. doi: 10.3389/neuro.07/011.2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Takahashi Y, Schoenbaum G, and Niv Y (2008). Silencing the critics: understanding the effects of cocaine sensitization on dorsolateral and ventral striatum in the context of an actor/critic model. Frontiers in Neuroscience 2, 86–99. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Takahashi YK, Batchelor HM, Liu B, Khanna A, Morales M, and Schoenbaum G (2017). Dopamine neurons respond to errors in the prediction of sensory features of expected rewards. Neuron 95, 1395–1405. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Takahashi YK, Langdon AJ, Niv Y, and Schoenbaum G (2016). Temporal specificity of reward prediction errors signaled by putative dopamine neurons in rat VTA depends on ventral striatum. Neuron 91, 182–193. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Takahashi YK, Roesch MR, Wilson RC, Toreson K, O’Donnell P, Niv Y, and Schoenbaum G (2011). Expectancy-related changes in firing of dopamine neurons depend on orbitofrontal cortex. Nature Neuroscience 14, 1590–1597. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tanabe J, Reynolds J, Krmpotich T, Claus E, THompson LL, Du YP, and Banich MT (2013). Reduced neural tracking of prediction error in substance-dependent individuals. American Journal of Psychiatry 170, 1356–1363. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Terraneo A, Leggio L, Saladini M, Ermani M, Bonci A, and Gallimberti L (2016). Transcranial magnetic stimulation of dorsolateral prefrontal cortex reduces cocaine use: A pilot study. European Journal of Neuropsychopharmacology 26, 37–44. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tian J, Huang R, Cohen JD, Osakada F, Kobak D, Machens CK, Callaway EM, Uchida N, and Watabe-Uchida M (2016). Distributed and mixed information in monosynaptic inputs to dopamine neurons. Neuron 91, 1374–1389. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tobler PN, Dickinson A, and Schultz W (2003). Coding of predicted reward omission by dopamine neurons in a conditioned inhibition paradigm. Journal of Neuroscience 23, 10402–10410. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ungless MA, Whistler JL, Malenka RC, and Bonci A (2001). Single cocaine exposure in vivo induces long-term potentiation in dopamine neurons. Nature 411, 583–587. [DOI] [PubMed] [Google Scholar]
- Volkow ND, and Fowler JS (2000). Addiction, a disease of compulsion and drive: involvement of orbitofrontal cortex. Cerebral Cortex 10, 318–325. [DOI] [PubMed] [Google Scholar]
- Waelti P, Dickinson A, and Schultz W (2001). Dopamine responses comply with basic assumptions of formal learning theory. Nature 412, 43–48. [DOI] [PubMed] [Google Scholar]
- Wied HM, Jones JL, Cooch NK, Berg BA, and Schoenbaum G (2013). Disruption of model-based behavior and learning by cocaine self-administration in rats. Psychopharmacology 229, 493–501. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Willuhn I, Burgeno LM, Groblewski PA, and Phillips PEM (2014). Excessive cocaine use results from decreased phasic dopamine signaling in the striatum. Nature Neuroscience 17, 704–711. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wunderlich K, Smittenaar P, and Dolan RJ (2012). Dopamine enhances model-based over model-free choice behavior. Neuron 75, 418–424. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wyvell CL, and Berridge KC (2001). Incentive sensitization by previous amphetamine exposure: increased cue-triggered “wanting” for sucrose reward. Journal of Neuroscience 21, 7831–7840. [DOI] [PMC free article] [PubMed] [Google Scholar]