Author manuscript; available in PMC 2013 May 15.
Published in final edited form as: Biol Psychiatry. 2012 Feb 2;71(10):846–854. doi: 10.1016/j.biopsych.2011.12.019

Phasic mesolimbic dopamine signaling precedes and predicts performance of a self-initiated action sequence task

Kate M Wassum 1,2,3,*, Sean B Ostlund 2,3,*, Nigel T Maidment 2,3
PMCID: PMC3471807  NIHMSID: NIHMS346947  PMID: 22305286

Abstract

Background

Sequential reward-seeking actions are readily learned despite the temporal gap between the earliest (distal) action in the sequence and the reward delivery. Fast dopamine signaling is hypothesized to mediate this form of learning by reporting errors in reward prediction. However, such a role for dopamine release in voluntarily initiated action sequences remains to be demonstrated.

Methods

Using fast-scan cyclic voltammetry, we monitored phasic mesolimbic dopamine release in real time as rats performed a self-initiated sequence of lever presses to earn sucrose rewards. Prior to testing, rats received either 0 (n=11), 5 (n=11) or 10 (n=8) days of action sequence training.

Results

For rats acquiring the action sequence task at test, dopamine release was strongly elicited by response-contingent (but unexpected) rewards. With learning, a significant elevation in dopamine release preceded performance of the proximal action and subsequently came to precede the distal action. This pre-distal dopamine release response was also observed in rats previously trained on the action sequence task, and the amplitude of this signal predicted the latency with which rats completed the action sequence. Importantly, the dopamine response to contingent reward delivery was not observed in rats given extensive pre-training. Pharmacological analysis confirmed that task performance was dopamine dependent.

Conclusions

These data suggest that phasic mesolimbic dopamine release mediates the influence that rewards exert over the performance of self-paced, sequentially-organized behavior and shed light on how dopamine signaling abnormalities may contribute to disorders of behavioral control.

Keywords: Dopamine, Free-Operant, Ventral Striatum, Reinforcement Learning, Reward, Flupenthixol


A long-standing challenge in behavioral science, termed the distal reward (1) or credit assignment problem (2), has been to explain how action sequences are learned, given that one initiates a sequence by performing an action that never directly earns reward. There is growing evidence that dopamine is involved in mediating this influence of rewards over distal events. It is imperative to understand the role of dopamine in action sequence learning, and in the control of self-initiated behavior more generally, since these actions appear to be particularly disrupted by disorders involving alterations in striatal dopaminergic transmission, such as Parkinson’s disease (3) and addiction (4-6).

Studies recording dopamine cell activity have shown that reward presentation elicits a firing burst that shifts to reward-predictive cues (7-9), or discriminative stimuli (10, 11) with training (12). Fast-scan cyclic voltammetry studies have found that phasic mesolimbic dopamine release also backpropagates from reward to reward-predictive cues during Pavlovian learning (13, 14) or to a discriminative stimulus during discrete-trial instrumental tasks (15, 16). In paradigms involving a chain of reward-predictive cues dopamine cell activity also shifts from proximal to distal cues (17, 18). Such findings support the hypothesis that dopamine reports reward prediction errors, i.e., discrepancies between the observed and expected values of rewards and cues (19, 20). This signal is considered important for the acquisition of complex reward-seeking behaviors (21, 22). Interestingly, dopamine receptor blockade has recently been shown to reduce response likelihood during a self-initiated reward-seeking action sequence (23) and accumbal dopamine depletions disrupt action performance when many actions are required to obtain reward (24), which is consistent with a large literature implicating dopamine in incentive motivation (25, 26), the process that allows reward-predictive cues to invigorate reward-seeking actions. Such findings raise the possibility that task-related dopamine signaling also exerts a motivational influence over reward-seeking actions.

Although the simplifying assumptions common to reinforcement learning theory make it difficult to generate predictions about the characteristics of reward prediction error (and phasic dopamine) signaling in self-paced, free-operant instrumental tasks, there are limited data suggesting that in such situations a similar pattern of dopamine responses emerges with learning. Indeed, both dopamine neuron firing (27, 28) and phasic mesolimbic dopamine release (29, 30) precede well-learned self-initiated actions, consistent with a backpropagating reward prediction error signal. However, studies of dopamine release during free-operant learning have typically used cocaine reward (29, 30), which may not generate normal prediction error signals (31), given that it elicits dopamine when predicted (29) and over time undoubtedly alters dopaminergic function (32). Whether phasic mesolimbic dopamine release shows the properties of a prediction error signal under naturalistic free-operant conditions therefore remains unanswered. Moreover, given the prominent role that modern reinforcement learning models attribute to phasic dopamine in action sequence learning and a reward’s ability to influence distal events, it is surprising how little is known about the characteristics of dopamine release in such situations. Uncovering this information is vital if modern reinforcement learning concepts are to be used to explain the neural mechanisms of normal and aberrant decision-making.
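The backpropagating prediction-error account discussed above can be made concrete with a minimal temporal-difference (TD) learning simulation. This is an illustrative sketch only: the state names, learning rate, and the assumption that the inter-trial interval carries no predictive value (because trial initiation is self-paced) are ours, not a model taken from the paper.

```python
# Minimal TD(0) sketch: the prediction error is concentrated at reward early
# in training and shifts toward the earliest (distal) event with learning.
# States and parameters are illustrative assumptions, not the paper's model.
chain = ["iti", "distal_press", "lever_cue", "proximal_press", "reward"]
V = {s: 0.0 for s in chain}  # learned state values
alpha, gamma = 0.3, 1.0      # learning rate, discount factor

def run_trial():
    """Step through one action sequence; return the TD error at each state."""
    deltas = {}
    for i, s in enumerate(chain):
        r = 1.0 if s == "reward" else 0.0                  # pellet at the end
        v_next = V[chain[i + 1]] if i + 1 < len(chain) else 0.0
        delta = r + gamma * v_next - V[s]                  # prediction error
        if s != "iti":
            # Self-paced initiation: the ITI cannot predict when a trial
            # starts, so (by assumption) it retains no value.
            V[s] += alpha * delta
        deltas[s] = delta
    return deltas

first = run_trial()
for _ in range(48):
    run_trial()
last = run_trial()
# Early: error peaks at reward. Late: the transition into the distal press
# carries the error, mirroring the pre-distal dopamine signal.
print(f"trial 1:  reward delta={first['reward']:.2f}, pre-distal delta={first['iti']:.2f}")
print(f"trial 50: reward delta={last['reward']:.2f}, pre-distal delta={last['iti']:.2f}")
```

Note that in this sketch the pre-distal error persists at asymptote only because the ITI state is barred from acquiring value; this is one simple way to capture the unpredictability of a self-initiated action.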

We used fast-scan cyclic voltammetry to provide an initial characterization of phasic dopamine release in the ventral striatum/nucleus accumbens of rats performing an unsignaled, self-paced multi-action sequence task for sucrose reward. Given the combined findings that phasic dopamine activity backpropagates from reward through a series of Pavlovian cues (17, 18) and can come to precede first-order self-initiated instrumental actions for cocaine reward (29, 30, 33), we assessed whether phasic dopamine release shifted with learning from the reward itself to increasingly more distal events. Given the large body of evidence implicating mesolimbic dopamine in incentive motivation (25, 34, 35), we also tested the relationship between task-related dopamine responses and action sequence performance.

Methods and Materials

Male Sprague Dawley rats (n=46, Charles River Laboratories, Wilmington, MA) served as the subjects for these experiments. For the fast-scan cyclic voltammetry experiments, rats were trained on a sequence of lever-pressing actions to earn sucrose pellet rewards (Bioserv, Frenchtown, NJ). Briefly, the behavioral paradigm (see Supplement) required rats to perform a fixed sequence of two different lever press actions to earn sucrose pellets, such that one action was temporally distal and the other temporally proximal to reward delivery. The distal lever was continuously available and, when pressed, resulted in the insertion of the proximal lever into the chamber. Pressing the proximal lever resulted in the delivery of a pellet and caused that lever to be retracted. Importantly, this task was self-paced, in that the subject could control both the initiation of each sequence and the speed with which the sequence of actions was performed (i.e., trial latency). Prior to testing, rats received either 0 (n=11), 5 (n=11) or 10 (n=8) days of action sequence training. During the test phasic dopamine concentration changes in the ventral portion of the neostriatum/nucleus accumbens, a region previously implicated in reward-motivated behavior (36-38), were monitored with fast-scan cyclic voltammetry (see Supplement) while rats earned a total of 30 sucrose pellet rewards through their action sequence performance. All recording sites were verified with histological procedures (Figure 1).
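As a reading aid, the two-lever contingency described above can be sketched as a simple state machine. This is our paraphrase of the paradigm as summarized in the text; the class and method names are hypothetical, not the authors' behavioral control code.

```python
# Sketch of the self-paced two-action sequence task described in the text:
# a distal press inserts the proximal lever; a proximal press delivers one
# pellet and retracts the lever. Names are ours, not the authors' code.
class SequenceTask:
    def __init__(self, max_rewards=30):
        self.proximal_available = False  # proximal lever retracted at start
        self.rewards_earned = 0
        self.max_rewards = max_rewards

    def press_distal(self):
        """Distal lever is always available; pressing it inserts the
        proximal lever (which also serves as a reward-predictive cue)."""
        self.proximal_available = True
        return "proximal lever inserted"

    def press_proximal(self):
        """Pressing the proximal lever delivers one sucrose pellet and
        retracts the lever, ending the trial."""
        if not self.proximal_available:
            return "no lever present"  # lever is retracted between trials
        self.proximal_available = False
        self.rewards_earned += 1
        return "pellet delivered"

task = SequenceTask()
print(task.press_proximal())  # before any distal press: lever not available
print(task.press_distal())
print(task.press_proximal())
```

The rat therefore controls both when a trial begins (the first distal press) and how quickly the sequence is completed, which is what makes the task self-paced.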

Figure 1. Schematic representation of recording sites.

Figure 1

Line drawings of coronal section are taken from Paxinos and Watson (57). Numbers to the lower right of each section represent the anterior-posterior distance (mm) from bregma of the section. Gray circles represent electrode placements from the 0-day training group, red circles show placements for rats in the 5-day group, and blue circles show placements for rats in the 10-day training group.

In a separate set of rats we examined the effects of flupenthixol (Sigma Aldrich, St. Louis, MO) on sequence performance after either 0, 5 or 10 days of action sequence training. Flupenthixol (0.5mg/kg/ml i.p.) or saline control was administered 1h prior to a test in which rats were allowed to respond on the action sequence to earn up to 30 sucrose pellet rewards.

For full methodological details see the Supplement.

Results

Using several measures of task performance (distal and proximal action rate, task efficiency and the total time to complete the sequence) we found that rats given 5 or 10 days of training displayed similar levels of performance and that task performance was significantly better in these groups relative to rats that did not receive any pre-test sequence training. Analysis of rats’ distal lever press rate (Figure 2A) revealed a main effect of training (F2,27=3.32, p=0.05), suggesting that distal action rate was significantly higher in groups with more training. Similarly, there was a significant increase in proximal action rate across training groups (F2,27=5.88, p=0.007, Figure 2B). Importantly, the proximal/distal action ratio, a measure of task efficiency, also showed a significant improvement across training groups (F2,27=38.24, p<0.0001; Figure 2B-inset). Dunnett’s post-hoc analyses of these data showed that task efficiency was significantly better in the 5-day training group (p<0.001) relative to the 0-day group, and that no further improvement occurred after 10 days of sequence training (5-day v. 10-day p>0.05). Not only did training result in an increase in task efficiency, there was also an effect of training on the average time it took rats to complete each sequence (F2,27=11.23, p=0.0003; Figure 2C). Post hoc analyses revealed that the average action sequence time was significantly longer in rats without sequence pre-training relative to rats trained for 5 days on the sequence prior to test (p<0.001), while rats trained for 10 days did not perform differently than those trained for 5 (p>0.05).

Figure 2. Action sequence performance during the fast-scan cyclic voltammetry tests.

Figure 2

Rats were given either 0 (gray), 5 (red) or 10 (blue) days of action sequence training prior to the fast-scan cyclic voltammetry test. A. The average rate of lever pressing on the distal lever. B. The average rate of lever pressing on the proximal lever at test. The inset shows the efficiency ratio, the total number of proximal presses/the total number of distal presses. An efficiency ratio of 1 would indicate optimal performance. C. The average time taken to complete a sequence (totaling the time from initial distal press to the next initiating distal press as described in the text above). Error bars indicate +1 standard error of the mean. *= p<0.05, ***= p<0.001.

Figure 3. Examples of phasic mesolimbic dopamine release during action sequence learning and performance.

Figure 3

Phasic mesolimbic dopamine release was recorded with fast-scan cyclic voltammetry in rats learning to perform a 2-action sequence to earn sucrose rewards after either 0, 5 or 10 days of pre-training on the action sequence. Current at the peak oxidative potential of dopamine is plotted as a function of time. The insets show the background-subtracted cyclic voltammogram from the peak of current for each trace. Below are 2-dimensional color plots of cyclic voltammograms over time. The y-axis indicates the applied potential, the x-axis shows time in seconds (s) and current is plotted in false-color (see lower left scale). Dashed lines mark the time of either the distal or proximal press, as denoted above the current trace. Under the press demarcation Cue refers to the insertion of the proximal lever following the distal lever press and Rwd refers to the delivery of the sucrose pellet reward after the proximal lever press. For the group of rats acquiring the task for the first time at test (0-day group), representative examples are taken from the 1st and last action sequence trials of the same subject. The first action sequence trial is broken into separate 5s bins because the rat’s latency to press the proximal lever after the distal press was greater than 5s.

Within a single learning session, phasic mesolimbic dopamine release comes to precede first the proximal then the distal action in a lever-pressing sequence earning reward

As mentioned above, mesolimbic dopamine release is hypothesized to backpropagate from reward to more distal elements in a sequence with learning. Our results appear to be consistent with this idea. Figure 4 shows the dopamine concentration change averaged across rats in the 5s prior to and after the distal (4A) and proximal (4B) lever press at the beginning (representative example shown in Figure 3-left), middle and end (representative example Figure 3-2nd panel) of the acquisition session for 0-day training group rats. Early in training, when reward delivery was relatively unexpected, mesolimbic dopamine release peaked following reward delivery (which occurred immediately following the proximal lever press, marked “Reward” in Figure 4B). Although rats had no training on the full action sequence prior to test, they had received brief training on the proximal lever-reward contingency and as such the proximal lever insertion likely served as a reward-predictive cue and elicited a small dopamine response. In the middle of the acquisition session the maximal dopamine signal amplitude occurred earlier, coincident with the proximal press. By the end of the session, when the rats had presumably come to expect response-contingent reward delivery, there was a dopamine increase prior to the distal action.

Figure 4. Phasic mesolimbic dopamine release comes to precede the proximal action then to precede the distal lever press during initial action sequence learning.

Figure 4

A&B. Change in dopamine concentration in the 5 sec prior to and after the distal (A) or proximal (B) lever press was averaged across subjects in the 0-day group of rats acquiring the action sequence task for the first time at test. The lightest gray trace (top) represents the dopamine concentration change during the beginning of the session, averaged across the first 5 action sequence trials, the darker gray trace (middle) represents the dopamine concentration change averaged across the middle 5 action sequence trials and the black trace (bottom) represents the same for the end of the session, the last 5 trials. The scale bar represents a 5nM dopamine concentration change. Tick marks in A represent the average time, for the first 5 (light gray-top), middle 5 (gray-middle) or last 5 (black-bottom) trials, at which the dopamine concentration change peaked prior to the distal press. C. The maximal dopamine peak concentration change preceding the distal action, upon the lever insertion cue, preceding the proximal action and following reward delivery averaged across the first 5 (Beginning), middle 5 (Middle) or last 5 (End) trials for each rat and then averaged across rats in the 0-day training group only. Asterisks and carets mark changes in peak dopamine concentration at the middle and end of the sequence acquisition session relative to the beginning of the session. Number signs demarcate a significant difference in peak dopamine concentration preceding the distal or proximal actions or to the lever cue within each phase (i.e. beginning) relative to the reward-related dopamine response. 0-day training group n=11. Dashed lines and error bars indicate +1 standard error of the mean. *= p<0.05, **=p<0.01.

Peak dopamine concentration change to each event in the sequence (preceding distal action, to the lever insertion cue, preceding proximal action, and following reward delivery) averaged across session phase, either the first 5 (beginning), middle 5 (middle) or last 5 (end) trials of the acquisition session, and then averaged across subjects, is presented in Figure 4C. Analysis of these data revealed no significant effect of event (F3,30=1.30, p=0.29), but did show a significant session phase effect (F2,20=8.43, p=0.002), as well as a significant phase by event interaction (F6,60=13.08, p=0.01), demonstrating that the dopamine release in response to each task event was differentially altered by action sequence learning. Indeed, individual analysis of the dopamine signal associated with each event in the beginning of the session confirmed a significant effect of event (F3,30=5.18, p=0.005), with post hoc analysis revealing that the dopamine response to the reward delivery was significantly higher than that preceding the distal action (p<0.01), to the lever cue (p<0.05) and preceding the proximal action (p<0.01). This was not the case in the middle and end of the session; there was only a nonsignificant trend toward an effect of event on dopamine peak concentration in the middle of the session (F3,30=2.6, p=0.07) and no effect of event at the end of the acquisition session (F3,30=0.66, p=0.58). Importantly, analysis of the dopamine response to each event individually in the beginning, middle and end of the acquisition session supports the notion that task-related dopamine signaling was modulated by training. There was a significant effect of time on the peak dopamine concentration change prior to the proximal response (F3,30=6.53, p=0.007), with post hoc analysis confirming that this response was significantly increased in both the middle (p<0.05) and end (p<0.01) of the session relative to the beginning of the session.
Similarly, there was a significant effect of time on the pre-distal dopamine response (F3,30=4.62, p=0.02); however, post hoc analyses showed that, relative to the beginning, the amplitude of this response was increased only at the end of the session (p<0.05). Thus, it appears that for rats learning to perform a new sequence of actions, the mesolimbic dopamine response increased first during the period immediately before proximal action performance, and then during the period just before distal action performance, consistent with a backpropagating, reward-prediction error profile. Importantly, rather than dopamine being solely elicited by overt cues or events, it also came to precede the rats’ initiation of the action sequence, which is notable for a self-paced task in which rats’ reward seeking was voluntary.

Across training groups there is a shift in the phasic mesolimbic dopamine release pattern indicating backpropagation from reward to more distal action sequence elements

This shift in dopamine to more distal elements, observed within the group acquiring the action sequence task, appeared to be followed by longer-term changes in task-related dopamine signaling, which were apparent when assessed across groups of rats with differing sequence training levels. Figures 5A and 5B show the dopamine concentration change in the 5s prior to and after each distal (5A) and proximal (5B) lever press averaged across the 30-trial session for each rat and then averaged across rats for subjects in the 0-, 5-, or 10-day training groups. As is clear from this figure and the representative examples shown in Figure 3, mesolimbic dopamine was elevated both prior to and after the distal and proximal actions in all 3 groups. However, the amplitude and pattern of these dopamine concentration changes differed across training groups; the phasic dopamine signal was more prominent in the end stages of the sequence (proximal lever press and following reward delivery) in rats acquiring the sequence at test (0-day group), and became preferentially associated with more distal elements of the sequence in extensively trained rats (10-day group).

Figure 5. Phasic mesolimbic dopamine release backpropagates from reward delivery to preceding the distal action across training groups.

Figure 5

A. Change in dopamine concentration in the 5 sec prior to and after the distal lever press, averaged across all 30 trials in the test for each rat and then averaged across rats, for rats receiving either 0 (top, black), 5 (middle, red), or 10 (bottom, blue) days of action sequence training prior to test. Tick marks on each graph represent the average time at which the dopamine peak occurred for each rat. The shaded bar after the distal press line indicates the time of the proximal lever insertion cue. B. Change in dopamine concentration in the 5 sec prior to and after the proximal lever press for rats receiving either 0, 5, or 10 days of action sequence training prior to test. The shaded bar after the proximal press line indicates the time from lever press to sucrose pellet delivery. Scale bar marks a 5nM dopamine concentration change. C. The maximal dopamine peak concentration change preceding the distal action, to the lever insertion cue, preceding the proximal action and to the reward delivery averaged across all trials for each rat and then averaged across rats in each training group receiving either 0 (gray), 5 (red) or 10 (blue) days of action sequence training prior to test. Dashed lines and error bars indicate +1 standard error of the mean. 0 day group n=11, 5 day group n=11, 10 day group n=8. *=p<0.05

Statistical analysis of the peak dopamine release associated with each event in the sequence (Figure 5C) further supports the notion that dopamine levels surrounding the major events within the sequence critically depended on the extent of training prior to testing (event x training group interaction: F6,81=46.42, p=0.005). Dopamine levels were significantly greater during the reward delivery compared to the pre-distal press period (p<0.05) for rats without sequence pre-training, whereas no such difference was observed in rats given 5 days of training prior to testing (p>0.05). Rats given extensive pre-training (10-day group) showed the opposite effect, exhibiting a larger dopamine response before performing the distal action than to reward delivery (p<0.05). There were no main effects of either event (F3,81=15.02, p=0.36), or training group (F2,81=1.29, p=0.29).

Rats were intermittently given non-contingent sucrose pellets before each test to compare dopamine responses to these unexpected rewards with those earned by lever pressing. This analysis (Figure 6) revealed a main effect of expectancy (F1,27=10.83, p=0.002), indicating that the dopamine response to the earned, and therefore expected, reward was less than to unexpected reward delivery. Although there was no training group effect (F2,27=1.56, p=0.23) or interaction between training group and expectancy (F2,27=0.60, p=0.55), Bonferroni post hoc analysis revealed that the effect of expectancy was significant only in the 10-day training group (p<0.05), i.e. only in the most well-trained rats did earned reward elicit significantly less dopamine release than unexpected reward delivery. Importantly, no group differences were observed in the magnitude of the dopamine response to unexpected reward (F2,29=0.42, p=0.66), demonstrating that the group differences described above were specific to predictable rewards and depended on rats’ training history rather than other potential differences across groups (e.g., electrode sensitivity).

Figure 6. Dopamine peak amplitude to unexpected and earned reward delivery.

Figure 6

Prior to the lever-pressing session at test, rats were given 3 free reward deliveries. The dopamine release induced by these unexpected reward deliveries is compared to that induced by delivery of a reward earned by completing the action sequence. Gray bars represent rats with no training on the full action sequence prior to test, red bars show rats with 5 days of sequence training prior to test and blue bars present rats with 10 days of sequence training prior to test. Error bars indicate +1 standard error of the mean. *= p<0.05.

Taken together, these data show that the rapid, within-session modulation of task-related dopamine signaling observed in rats learning to perform the action sequence continues to occur over sessions, as animals become proficient in the task. Moreover, there appears to be an interim learning phase in which phasic dopamine is elevated to each event in the sequence (5-day group) prior to transitioning from the reward to more distal elements (10-day group).

Importantly, as can be seen in Figure 5A, the dopamine peak preceding the distal action is tightly time-locked to the actual distal lever press in both the 0- and 5-day training groups (see tick marks, Figure 5A-top and -middle). In both these groups, dopamine levels peaked between 3.8 and 0s prior to the distal lever press and for most rats the peak occurred within 1s prior to the press. The average time between these dopamine peaks and the distal lever press was not significantly different between the 0- and 5-day training groups (t20=0.31, p=0.76). In the group receiving extended training (10 days) on the action sequence, time-locking of the dopamine peak preceding the distal action was more variable across rats (ranging from 4.9 to 0.3s prior to the press, see tick marks, Figure 5A-bottom) and occurred much earlier, on average, than it did for the 5-day group that had less training but showed comparable behavioral performance (t17=2.17, p=0.04). This pattern explains the apparent slow rise in average dopamine levels prior to the distal action in the 10-day group (Figure 5A). Rather than reflecting a consistent pattern across subjects, the averaged data obscure individual differences in pre-response dopamine signaling. Thus, it appears that the temporal relationship between phasic dopamine transients and initiation of sequence performance became decoupled for rats receiving extensive training.
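The per-trial measure discussed above, the peak dopamine amplitude and its latency within the 5 s window preceding the press, can be sketched as follows. The data, sampling rate, and array layout are illustrative assumptions, not the authors' analysis code.

```python
# Sketch of a pre-press peak measure: for one trial, find the maximum
# dopamine concentration in the 5 s window preceding the lever press and
# its latency relative to the press. Synthetic data; the 10 Hz sampling
# rate and trace layout are assumptions.
import numpy as np

rate_hz = 10                      # assumed FSCV sampling rate
window = 5 * rate_hz              # 5 s pre-press window, in samples
rng = np.random.default_rng(1)

def pre_press_peak(trace, press_idx):
    """Return (peak amplitude, seconds before press) for one trial."""
    segment = trace[press_idx - window:press_idx]
    peak_idx = int(np.argmax(segment))
    amplitude = float(segment[peak_idx])
    latency_s = (window - peak_idx) / rate_hz  # time before the press
    return amplitude, latency_s

# One synthetic trial: a ~4 nM transient 1 s before a press at sample 100
trace = rng.normal(0, 0.2, 120)   # baseline noise (nM)
trace[90] += 4.0                  # injected transient
amp, latency = pre_press_peak(trace, press_idx=100)
print(f"peak {amp:.1f} nM, {latency:.1f} s before press")
```

Averaging such per-rat peak latencies is what produces the tick marks described in the Figure 5A legend.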

Phasic mesolimbic dopamine predicts task performance

Our finding that mesolimbic dopamine release actually came to precede the rats’ initiation of the action sequence suggests that dopamine release may not simply be a response to overt task cues, but may also reflect a motivational component of task performance. Indeed, not only was there an evolution of phasic mesolimbic dopamine release prior to action sequence performance (i.e., before the distal press), the magnitude of this effect predicted task performance. Figure 7A presents the concentration of the dopamine peak prior to the distal action for each of the 30 total events (shaded to reflect 5-trial bins) relative to the average amount of time it took to complete the sequence, averaged across rats in the 0-day training group. The magnitude of the dopamine peak preceding each distal action was significantly negatively correlated with action sequence time (R30= -0.49, p=0.006). A similar relationship was also apparent between subjects across all three training groups (Figure 7B). Statistical analysis of these data, controlling for training group, also revealed a significant negative correlation (R27= -0.44, p=0.02). Thus, it appeared that the more dopamine released prior to initiation of the action sequence, the quicker the rat completed the sequence. This finding suggests that phasic dopamine is not solely serving as a prediction error signal during action sequence learning, but may also be related to the influence of incentive motivation on task performance. Interestingly, there were no significant correlations between the dopamine release amplitude to any of the action sequence elements and the rats’ response rate on either lever (Table S1 in the Supplement), indicating that the rats’ overt behavioral output level was not associated with phasic mesolimbic dopamine activity and is, in this sense, dissociable from their action sequence performance.
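A between-subjects correlation that controls for training group, as in the analysis above, can be computed by regressing out dummy-coded group membership and correlating the residuals. The data below are synthetic and the function is a hypothetical sketch, not the authors' statistical procedure.

```python
# Illustrative partial correlation controlling for a categorical covariate
# (training group). All data are synthetic; numbers are not from the paper.
import numpy as np

rng = np.random.default_rng(0)
group = np.repeat([0, 1, 2], 10)                       # 0-, 5-, 10-day groups
da_peak = rng.normal(5 + 2 * group, 1.0)               # pre-distal DA peak (nM)
seq_time = 12 - 1.5 * da_peak + rng.normal(0, 1, 30)   # sequence latency (s)

def partial_corr(x, y, covar):
    """Correlate x and y after regressing out dummy-coded covariate levels."""
    dummies = np.column_stack([covar == g for g in np.unique(covar)]).astype(float)
    def resid(v):
        beta, *_ = np.linalg.lstsq(dummies, v, rcond=None)
        return v - dummies @ beta
    return float(np.corrcoef(resid(x), resid(y))[0, 1])

r = partial_corr(da_peak, seq_time, group)
print(f"partial r = {r:.2f}")  # negative: more pre-distal DA, faster sequence
```

Because the group dummies span the intercept, this removes each group's mean from both variables before correlating, so the result reflects within-group covariation only.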

Figure 7. Phasic mesolimbic dopamine release is associated with action sequence performance.

Figure 7

A. For each of the 30 trials the maximum dopamine concentration peak prior to the distal lever press was averaged across rats (y-axis) and correlated with the average time to complete each action sequence performance (totaling the time from initial distal press to the next initiating distal press), also averaged across rats for each trial (x-axis). Shading represents the trial in 5 trial bins (light to dark=beginning to end of test). The black line represents the linear regression for all the trials combined. B. For each rat the dopamine concentration change peak amplitude prior to the distal lever press was averaged across trials (y-axis) and correlated with the average sequence time, also averaged across the 30 trials for each rat (x-axis). Gray circles represent rats receiving no action sequence training prior to test, red and blue represent rats receiving 5 and 10 days of action sequence training, respectively. The black bar represents the linear regression for all rats.

Action-sequence learning and performance are attenuated under dopamine receptor blockade

To further explore the relationship between dopamine signaling and self-initiated action sequence performance, we conducted an experiment in a separate group of animals to assess the sensitivity of the current task to dopamine receptor blockade. Naive rats were given 10 days of training on the action sequence. Rats were pretreated with the nonspecific dopamine receptor antagonist flupenthixol (0.5 mg/kg, i.p.) or vehicle prior to the 1st, 5th or 10th day of action sequence training. As Figure 8A demonstrates, we found that flupenthixol administered at test altered the time it took rats to complete the action sequence; there was a main effect of both drug (F1,28=8.58, p=0.01) and training (F2,28=11.24, p=0.0003), as well as an interaction between these factors (F2,28=7.83, p=0.002). Post hoc analysis found that rats treated with flupenthixol took significantly longer to complete each action sequence, on average, relative to vehicle-treated controls when administered on the initial day of action sequence learning (p<0.001), but not when administered after 5 or 10 days of training (p>0.05). Thus, this aspect of action sequence performance appears to be dopamine-dependent only during initial acquisition of the task, presumably because the underlying learning process requires dopamine signaling. However, flupenthixol also had a more persistent task performance effect, reducing the rate of responding on both levers irrespective of training (Figure 8B and C). For both lever-pressing actions there was a significant main effect of training (distal: F2,28=13.52, p<0.0001; proximal: F2,28=22.38, p<0.0001), as well as a main effect of drug (distal: F1,28=30.66, p<0.0001; proximal: F1,28=18.46, p=0.0007), with no significant interaction (distal: F2,28=1.50, p=0.24; proximal: F2,28=1.43, p=0.26), indicating that, with training, rats pressed faster on both levers and that flupenthixol attenuated press rate in a manner that did not depend on training history.
Taken together these results confirm that dopamine signaling plays a critical role in the acquisition and performance of self-initiated sequential actions.

Figure 8. Dopamine receptor blockade attenuates action sequence performance.

Figure 8

A separate group of rats (n=16) were trained on the action sequence and tested after 0 (gray), 5 (red) or 10 (blue) days of action sequence training. At test rats were either given saline vehicle (open bars) or flupenthixol (0.5mg/kg i.p., shaded bars). A. Effects of flupenthixol on action sequence performance, measured by the total action sequence time (averaged across trials then averaged across rats). B. The average rate of lever pressing on the distal lever at test. C. The average rate of lever pressing on the proximal lever at test. Error bars represent +1 standard error of the mean. ***= p<0.0001

Discussion

This study characterized the pattern of phasic mesolimbic dopamine release during the acquisition and performance of a self-paced two-action sequencing task in rats. We found that dopamine release shifted from the reward to more distal elements of the sequence, a pattern detected both within-subjects, in rats acquiring the action sequence for the first time, and across groups of rats given varying amounts of pre-training on the task. Moreover, we found that the concentration of the dopamine transient preceding the initiation of the action sequence predicted the speed with which rats completed the sequence.

These results are generally consistent with the findings of previous studies showing that phasic mesolimbic dopamine signaling transitions from reward delivery to reward-predictive cues during passive Pavlovian learning (7, 13, 14, 39) and discrete-trial, discriminative stimulus-controlled instrumental tasks (10, 11, 15, 40). Our current findings significantly extend this work by demonstrating that phasic dopamine release shifts from reward delivery to precede a self-initiated, free-operant instrumental action. In this respect our results are consistent with the few studies that have examined phasic mesolimbic dopamine release during free-operant behavior for drug reward (29, 30, 41), and they show that phasic mesolimbic dopamine signaling comes to precede a self-initiated action under naturalistic conditions in which the dopamine system is not pharmacologically altered (42, 43). Importantly, unlike those studies with drug reward, we show that the phasic dopamine signal to earned food reward diminishes with training. Moreover, our results add to previous studies by suggesting that, with learning, the initiation of a self-paced action sequence can be preceded by phasic mesolimbic dopamine release and that performance of an action that has never directly earned reward can be accompanied by phasic dopamine signaling. Interestingly, dopamine release did not transition immediately from the reward to more distal sequence elements in our self-paced action sequence task; rather, there was an interim learning phase in which phasic dopamine was elevated to each event in the sequence, including the reward delivery. This result is consistent with electrophysiological recordings of midbrain dopamine neurons during cue-reward pairings (39).

Temporal difference models of reinforcement learning assume that learning is regulated by a reward prediction error signal (21, 22, 44-46), and there is now considerable evidence that this reward prediction error is mediated by phasic dopamine (7, 19). However, using a temporal difference algorithm to model free-operant conditioning is challenging because there are no unambiguous rules for defining model states; this clearly applies to the current task, since our rats were allowed to decide when to initiate the sequence. The main features of our data are nevertheless in line with the general themes of such models. We show that rewards earned by the performance of well-established self-initiated actions elicit significantly smaller phasic mesolimbic dopamine responses than unexpected reward deliveries. While training attenuated the dopamine response to response-contingent rewards, it increased phasic dopamine release during the period before the distal lever press, and this increase was apparent within a single learning session. Although not a response to any overt cue, this dopamine response may have been elicited by environmental or internal cues unappreciated by the experimenter.
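The backward migration of the prediction error described above can be illustrated with a minimal tabular TD(0) sketch. This is not the authors' model: the three-state chain (distal press, proximal press, reward delivery), the learning rate, discount factor, and reward value are all illustrative assumptions chosen only to show how the error signal starts at reward delivery and, with training, shifts toward the distal action.

```python
# Minimal sketch (illustrative assumptions, not the authors' model):
# tabular TD(0) over a three-state chain standing in for
# distal press -> proximal press -> reward delivery.

def train_td(episodes, alpha=0.1, gamma=0.9):
    """Run TD(0) over the chain; return learned values and per-episode errors."""
    V = {"distal": 0.0, "proximal": 0.0}  # state value estimates
    history = []
    for _ in range(episodes):
        deltas = {}
        # Transition: distal -> proximal (no reward delivered yet).
        deltas["distal"] = 0.0 + gamma * V["proximal"] - V["distal"]
        V["distal"] += alpha * deltas["distal"]
        # Transition: proximal -> reward delivery (reward r = 1, terminal).
        deltas["proximal"] = 1.0 - V["proximal"]
        V["proximal"] += alpha * deltas["proximal"]
        history.append(deltas)
    return V, history

V, history = train_td(200)
# Early in training the error occurs at reward delivery (after the
# proximal press); late in training it has propagated backward and
# the reward-time error has largely vanished.
print(history[0])    # proximal error = 1.0, distal error = 0.0
print(history[-1])   # both errors near 0; values have converged
```

On this toy chain, `V["proximal"]` converges to the reward value (1.0) and `V["distal"]` to its discounted value (0.9), which parallels the observed attenuation of the reward-evoked dopamine transient alongside the emergence of a pre-distal signal.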

In addition to the view that dopamine mediates reinforcement learning, a second popular hypothesis assumes that dopamine is responsible for mediating the incentive motivation that allows reward-paired cues to invigorate reward-seeking actions (25, 47-49). There have been several recent attempts to integrate the concept that dopamine mediates incentive motivation into the reinforcement learning framework (50-52). For example, McClure et al. (2003) posit that phasic dopamine's role in mediating the direct incentive motivational effects of reward-predictive cues on action selection is dissociable from its role in reporting the reward prediction errors that support reinforcement learning (22, 44, 45) or action chunking (53, 54). While not providing a critical test of such theories, our finding that the amplitude of the dopamine transient preceding the initiation of the action sequence predicts the speed with which that sequence will be completed is generally consistent with a role for phasic mesolimbic dopamine signaling in incentive motivation. Our pharmacological data support this correlational finding by showing that dopamine receptor blockade attenuates both action sequence learning and performance. These effects of dopamine receptor antagonism on action sequence performance are consistent with findings from nonhuman primate studies showing that dopamine transmission is preferentially involved during the early stages of action sequence learning (53, 54), when discrete actions are being integrated into sequence-level action chunks (55, 56), and with literature implicating dopamine in incentive motivation (25, 34, 35).

Taken together, these data suggest that phasic mesolimbic dopamine release reflects the properties of a prediction error signal during the acquisition and performance of a self-paced sequence of actions, and that such release is also associated with the incentive motivational properties of rewards and reward-paired cues, potentially providing a mechanism by which rewards come to exert influence over temporally distal actions.

Supplementary Material

01

Acknowledgements

This research was supported by grants DA09359 and DA05010 from NIDA to N.T.M., grant T32 DA024635 from NIDA and Hatos scholarship to K.M.W. and grant DA029035 to S.B.O. The authors would like to thank Katie McNutt for research assistance.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Financial Disclosures: All authors report no biomedical financial interests or potential conflicts of interest.

References

1. Hull C. Principles of Behavior. New York: Appleton; 1943.
2. Minsky M. Steps toward artificial intelligence. Proceedings of the IRE. 1961:8–30.
3. Jankovic J. Parkinson's disease: clinical features and diagnosis. J Neurol Neurosurg Psychiatry. 2008;79:368–376. doi:10.1136/jnnp.2007.131045.
4. Hyman SE. The neurobiology of addiction: implications for voluntary control of behavior. Am J Bioeth. 2007;7:8–11. doi:10.1080/15265160601063969.
5. Volkow ND, Fowler JS, Wang GJ, Baler R, Telang F. Imaging dopamine's role in drug abuse and addiction. Neuropharmacology. 2009;56(Suppl 1):3–8. doi:10.1016/j.neuropharm.2008.05.022.
6. Koob GF, Volkow ND. Neurocircuitry of addiction. Neuropsychopharmacology. 2009. doi:10.1038/npp.2009.110.
7. Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275:1593–1599. doi:10.1126/science.275.5306.1593.
8. Schultz W. The reward signal of midbrain dopamine neurons. News Physiol Sci. 1999;14:249–255. doi:10.1152/physiologyonline.1999.14.6.249.
9. Schultz W. Getting formal with dopamine and reward. Neuron. 2002;36:241–263. doi:10.1016/s0896-6273(02)00967-4.
10. Ljungberg T, Apicella P, Schultz W. Responses of monkey dopamine neurons during learning of behavioral reactions. J Neurophysiol. 1992;67:145–163. doi:10.1152/jn.1992.67.1.145.
11. Schultz W, Apicella P, Ljungberg T, Romo R, Scarnati E. Reward-related activity in the monkey striatum and substantia nigra. Prog Brain Res. 1993;99:227–235. doi:10.1016/s0079-6123(08)61349-7.
12. Bromberg-Martin ES, Matsumoto M, Hikosaka O. Dopamine in motivational control: rewarding, aversive, and alerting. Neuron. 2010;68:815–834. doi:10.1016/j.neuron.2010.11.022.
13. Flagel SB, Clark JJ, Robinson TE, Mayo L, Czuj A, Willuhn I, et al. A selective role for dopamine in stimulus-reward learning. Nature. 2010. doi:10.1038/nature09588.
14. Day JJ, Roitman MF, Wightman RM, Carelli RM. Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nat Neurosci. 2007;10:1020–1028. doi:10.1038/nn1923.
15. Roitman MF, Stuber GD, Phillips PE, Wightman RM, Carelli RM. Dopamine operates as a subsecond modulator of food seeking. J Neurosci. 2004;24:1265–1271. doi:10.1523/JNEUROSCI.3823-03.2004.
16. Jones JL, Day JJ, Aragona BJ, Wheeler RA, Wightman RM, Carelli RM. Basolateral amygdala modulates terminal dopamine release in the nucleus accumbens and conditioned responding. Biol Psychiatry. 2010;67:737–744. doi:10.1016/j.biopsych.2009.11.006.
17. Takikawa Y, Kawagoe R, Hikosaka O. A possible role of midbrain dopamine neurons in short- and long-term adaptation of saccades to position-reward mapping. J Neurophysiol. 2004;92:2520–2529. doi:10.1152/jn.00238.2004.
18. Schultz W, Apicella P, Ljungberg T. Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J Neurosci. 1993;13:900–913. doi:10.1523/JNEUROSCI.13-03-00900.1993.
19. Waelti P, Dickinson A, Schultz W. Dopamine responses comply with basic assumptions of formal learning theory. Nature. 2001;412:43–48. doi:10.1038/35083500.
20. Montague PR, Dayan P, Sejnowski TJ. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci. 1996;16:1936–1947. doi:10.1523/JNEUROSCI.16-05-01936.1996.
21. Suri RE, Schultz W. Learning of sequential movements by neural network model with dopamine-like reinforcement signal. Exp Brain Res. 1998;121:350–354. doi:10.1007/s002210050467.
22. Joel D, Niv Y, Ruppin E. Actor-critic models of the basal ganglia: new anatomical and computational perspectives. Neural Netw. 2002;15:535–547. doi:10.1016/s0893-6080(02)00047-3.
23. Veeneman MM, van Ast M, Broekhoven MH, Limpens JH, Vanderschuren LJ. Seeking-taking chain schedules of cocaine and sucrose self-administration: effects of reward size, reward omission, and α-flupenthixol. Psychopharmacology (Berl). 2011. doi:10.1007/s00213-011-2525-8.
24. Salamone JD, Wisniecki A, Carlson BB, Correa M. Nucleus accumbens dopamine depletions make animals highly sensitive to high fixed ratio requirements but do not impair primary food reinforcement. Neuroscience. 2001;105:863–870. doi:10.1016/s0306-4522(01)00249-4.
25. Berridge KC, Robinson TE. What is the role of dopamine in reward: hedonic impact, reward learning, or incentive salience? Brain Res Brain Res Rev. 1998;28:309–369. doi:10.1016/s0165-0173(98)00019-8.
26. Dickinson A, Smith J, Mirenowicz J. Dissociation of Pavlovian and instrumental incentive learning under dopamine antagonists. Behav Neurosci. 2000;114:468–483. doi:10.1037//0735-7044.114.3.468.
27. Romo R, Schultz W. Dopamine neurons of the monkey midbrain: contingencies of responses to active touch during self-initiated arm movements. J Neurophysiol. 1990;63:592–606. doi:10.1152/jn.1990.63.3.592.
28. Jin X, Costa RM. Start/stop signals emerge in nigrostriatal circuits during sequence learning. Nature. 2010;466:457–462. doi:10.1038/nature09263.
29. Stuber GD, Roitman MF, Phillips PE, Carelli RM, Wightman RM. Rapid dopamine signaling in the nucleus accumbens during contingent and noncontingent cocaine administration. Neuropsychopharmacology. 2005;30:853–863. doi:10.1038/sj.npp.1300619.
30. Phillips PE, Stuber GD, Heien ML, Wightman RM, Carelli RM. Subsecond dopamine release promotes cocaine seeking. Nature. 2003;422:614–618. doi:10.1038/nature01476.
31. Redish AD. Addiction as a computational process gone awry. Science. 2004;306:1944–1947. doi:10.1126/science.1102384.
32. Anderson SM, Pierce RC. Cocaine-induced alterations in dopamine receptor signaling: implications for reinforcement and reinstatement. Pharmacol Ther. 2005;106:389–403. doi:10.1016/j.pharmthera.2004.12.004.
33. Romo R, Schultz W. Somatosensory input to dopamine neurones of the monkey midbrain: responses to pain pinch under anaesthesia and to active touch in behavioural context. Prog Brain Res. 1989;80:473–478. doi:10.1016/s0079-6123(08)62245-1.
34. Robbins TW, Everitt BJ. A role for mesencephalic dopamine in activation: commentary on Berridge (2006). Psychopharmacology (Berl). 2007;191:433–437. doi:10.1007/s00213-006-0528-7.
35. Salamone JD, Correa M, Farrar A, Mingote SM. Effort-related functions of nucleus accumbens dopamine and associated forebrain circuits. Psychopharmacology (Berl). 2007;191:461–482. doi:10.1007/s00213-006-0668-9.
36. Ostlund SB, Wassum KM, Murphy NP, Balleine BW, Maidment NT. Extracellular dopamine levels in striatal subregions track shifts in motivation and response cost during instrumental conditioning. J Neurosci. 2011;31:200–207. doi:10.1523/JNEUROSCI.4759-10.2011.
37. Lynd-Balta E, Haber SN. The organization of midbrain projections to the striatum in the primate: sensorimotor-related striatum versus ventral striatum. Neuroscience. 1994;59:625–640. doi:10.1016/0306-4522(94)90182-1.
38. Lynd-Balta E, Haber SN. The organization of midbrain projections to the ventral striatum in the primate. Neuroscience. 1994;59:609–623. doi:10.1016/0306-4522(94)90181-3.
39. Pan WX, Schmidt R, Wickens JR, Hyland BI. Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. J Neurosci. 2005;25:6235–6242. doi:10.1523/JNEUROSCI.1478-05.2005.
40. Nishino H, Ono T, Muramoto K, Fukuda M, Sasaki K. Neuronal activity in the ventral tegmental area (VTA) during motivated bar press feeding in the monkey. Brain Res. 1987;413:302–313. doi:10.1016/0006-8993(87)91021-3.
41. Owesson-White CA, Ariansen J, Stuber GD, Cleaveland NA, Cheer JF, Wightman RM, et al. Neural encoding of cocaine-seeking behavior is coincident with phasic dopamine release in the accumbens core and shell. Eur J Neurosci. 2009;30:1117–1127. doi:10.1111/j.1460-9568.2009.06916.x.
42. Addy NA, Daberkow DP, Ford JN, Garris PA, Wightman RM. Sensitization of rapid dopamine signaling in the nucleus accumbens core and shell after repeated cocaine in rats. J Neurophysiol. 2010;104:922–931. doi:10.1152/jn.00413.2010.
43. Whitby LG, Hertting G, Axelrod J. Effect of cocaine on the disposition of noradrenaline labelled with tritium. Nature. 1960;187:604–605. doi:10.1038/187604a0.
44. Sutton RS. Learning to predict by the methods of temporal differences. Machine Learning. 1988;3:9–44.
45. Barto AG, Sutton RS, Watkins CJCH. Sequential decision problems and neural networks. In: Touretzky DS, editor. Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press; 1989. pp. 686–693.
46. Houk JC, Adams JL, Barto AG. A model of how the basal ganglia generate and use neural signals that predict reinforcement. In: Houk JC, Davis JL, Beiser DG, editors. Models of Information Processing in the Basal Ganglia. MIT Press; 1995. pp. 249–270.
47. Berridge KC. The debate over dopamine's role in reward: the case for incentive salience. Psychopharmacology (Berl). 2007;191:391–431. doi:10.1007/s00213-006-0578-x.
48. Berridge KC, Robinson TE, Aldridge JW. Dissecting components of reward: 'liking', 'wanting', and learning. Curr Opin Pharmacol. 2009;9:65–73. doi:10.1016/j.coph.2008.12.014.
49. Ikemoto S, Panksepp J. The role of nucleus accumbens dopamine in motivated behavior: a unifying interpretation with special reference to reward-seeking. Brain Res Brain Res Rev. 1999;31:6–41. doi:10.1016/s0165-0173(99)00023-5.
50. McClure SM, Daw ND, Montague PR. A computational substrate for incentive salience. Trends Neurosci. 2003;26:423–428. doi:10.1016/s0166-2236(03)00177-2.
51. Zhang J, Berridge KC, Tindell AJ, Smith KS, Aldridge JW. A neural computational model of incentive salience. PLoS Comput Biol. 2009;5:e1000437. doi:10.1371/journal.pcbi.1000437.
52. Niv Y, Daw ND, Joel D, Dayan P. Tonic dopamine: opportunity costs and the control of response vigor. Psychopharmacology (Berl). 2007;191:507–520. doi:10.1007/s00213-006-0502-4.
53. Tremblay PL, Bedard MA, Levesque M, Chebli M, Parent M, Courtemanche R, et al. Motor sequence learning in primate: role of the D2 receptor in movement chunking during consolidation. Behav Brain Res. 2009;198:231–239. doi:10.1016/j.bbr.2008.11.002.
54. Levesque M, Bedard MA, Courtemanche R, Tremblay PL, Scherzer P, Blanchet PJ. Raclopride-induced motor consolidation impairment in primates: role of the dopamine type-2 receptor in movement chunking into integrated sequences. Exp Brain Res. 2007;182:499–508. doi:10.1007/s00221-007-1010-4.
55. Ostlund SB, Winterbauer NE, Balleine BW. Evidence of action sequence chunking in goal-directed instrumental conditioning and its dependence on the dorsomedial prefrontal cortex. J Neurosci. 2009;29:8280–8287. doi:10.1523/JNEUROSCI.1176-09.2009.
56. Graybiel AM. The basal ganglia and chunking of action repertoires. Neurobiol Learn Mem. 1998;70:119–136. doi:10.1006/nlme.1998.3843.
57. Paxinos G, Watson C. The Rat Brain in Stereotaxic Coordinates. 4th ed. Academic Press; 1998.
