Abstract
The dopamine projection from ventral tegmental area (VTA) to nucleus accumbens (NAc) is critical for motivation to work for rewards, and reward-driven learning. How dopamine supports both functions is unclear. Dopamine spiking can encode prediction errors, vital learning signals in computational theories of adaptive behavior. By contrast, dopamine release ramps up as animals approach rewards, mirroring reward expectation. This mismatch might reflect differences in behavioral tasks, slower changes in dopamine cell spiking, or spike-independent modulation of dopamine release. Here we compare spiking of identified VTA dopamine cells with NAc dopamine release in the same decision-making task. Cues indicating upcoming reward increased both spiking and release. Yet NAc core dopamine release also covaried with dynamically-evolving reward expectations, without corresponding changes in VTA dopamine cell spiking. Our results suggest a fundamental difference in how dopamine release is regulated to achieve distinct functions: broadcast burst signals promote learning, while local control drives motivation.
Dopamine is famously related to “reward” – but how exactly? One function involves learning from unexpected rewards. Brief increases in dopamine cell firing encode reward prediction errors (RPEs1–3), learning signals for optimizing future motivated behavior. Dopamine manipulations can affect learning as if altering RPEs4–6. But they also affect motivated behaviors immediately, as if dopamine signals reward expectation (value)5. Furthermore, NAc dopamine escalates during motivated approach, consistent with encoding value7–11.
With few exceptions2,13,14, midbrain dopamine firing has been examined during classical conditioning in head-fixed animals3,12, unlike forebrain dopamine release. We therefore compared firing with release under the same conditions. We identified VTA dopamine neurons using optogenetic tagging3,14. To measure NAc dopamine release we used three independent methods – microdialysis, voltammetry, and the optical sensor dLight15 – with convergent results. Our primary conclusion is that although RPE-scaled VTA dopamine spike bursts provide abrupt changes in dopamine release appropriate for learning, separate NAc dopamine fluctuations associated with motivation arise independently from VTA dopamine cell firing.
Dopamine tracks motivation in key loci
We trained rats in an operant “bandit” task5 (Fig.1a,b). On each trial illumination of a nose poke port (Light-On) prompted approach and entry (Center-In). After a variable hold period (0.5–1.5s), white noise (Go Cue) led the rat to withdraw (Center-Out) and poke an adjacent port (Side-In). On rewarded trials this Side-In event was accompanied by a food hopper click, prompting the rat to approach a food port (Food-Port-In) to collect a sugar pellet. Leftward and rightward choices were each rewarded with independent probabilities, which occasionally changed without warning. When rats were more likely to receive rewards, they were more motivated to perform the task. This was apparent in their “latency” – the time between Light-On and Center-In – which was sensitive to the outcome of the preceding few trials (Extended Data Fig.1) and thereby scaled inversely with reward rate (Fig.1b).
We previously reported5 a correlation between NAc dopamine release and reward rate, consistent with the motivational role of mesolimbic dopamine16. Here, we first wished to know whether this is observed throughout forebrain targets, consistent with globally “broadcast” dopamine signaling17 or is restricted to specific subregions. We further hypothesized that these dopamine dynamics would differ between striatum and cortex, since these structures have distinct dopamine uptake/degradation kinetics18 and may use dopamine for distinct functions19,20.
Using microdialysis with liquid chromatography–mass spectrometry we surveyed medial frontal cortex and striatum (Fig.1c; Extended Data Fig.1). We simultaneously assayed 21 neurotransmitters and metabolites with 1 min time resolution, and used regression to compare chemical time series with behavioral variables (Extended Data Fig.2).
We replicated the correlation between reward rate and NAc dopamine – unlike other neurotransmitters (Fig.1c,d). However, this relationship was localized to NAc core, not NAc shell or dorsal-medial striatum. Contrary to our hypothesis, we observed a similar spatial pattern in frontal cortex: dopamine release correlated with reward rate in ventral prelimbic cortex, but not more dorsal or ventral subregions (Fig.1c,e). Though unexpected, these twin “hotspots” of value-related dopamine release have an intriguing parallel in human neuroimaging: BOLD signal correlates with subjective value specifically in NAc and ventral-medial prefrontal cortex21.
VTA firing is unrelated to motivation
We next addressed whether this motivation-related forebrain dopamine arises from variable firing of midbrain dopamine cells. NAc core receives dopamine input from lateral portions of VTA (VTA-l;6,22,23). In head-fixed mice, VTA-l dopamine neurons reportedly have uniform, RPE-like responses to conditioned stimuli3. To record VTA-l dopamine cells we infected the VTA with virus for Cre-dependent expression of channelrhodopsin (AAV-DIO-ChR2), in rats that express Cre recombinase under a tyrosine hydroxylase promoter3,14 (see Methods). Optrodes (Fig.2a,b) recorded single-unit responses to brief blue laser pulses (Fig.2c; Extended Data Figs.3,4; Supplementary Figure). We found 27 well-isolated VTA-l cells with reliable short-latency spikes, and considered these identified dopamine neurons.
All dopamine neurons were tonically active, with relatively low firing rates (mean 7.7Hz; range 3.7–12.9Hz; compared to all VTA-l neurons recorded together with dopamine cells, p<0.001 one-tailed Mann–Whitney). They also had longer-duration spike waveforms (p<5×10^−6, one-tailed Mann–Whitney), although there were exceptions (Fig. 2d), confirming that waveform duration is an insufficient marker of dopamine cells in vivo3,24. A distinct cluster of VTA-l neurons (n=38, from the same sessions) had brief waveforms, higher firing rates (>20Hz; mean 41.3Hz, range 20.1–97.1Hz), and included no tagged dopamine cells. We presume these cells are GABAergic and/or glutamatergic3,25, and refer to them as “non-dopamine” below.
We recorded the same dopamine cells across multiple behavioral tasks. VTA-l dopamine cells responded strongly to randomly-timed food hopper clicks, and progressively less strongly when these clicks were made more predictable by preceding cues (Extended Data Fig.5). This is consistent with canonical RPE-like coding by dopamine cells in Pavlovian tasks2,3,26.
Based on evidence from anesthetized animals, it has been argued that altered dopamine levels measured with microdialysis arise from changes in the tonic firing rate of dopamine cells27, and/or the proportion of active versus inactive dopamine neurons28. However, in the bandit task tonic dopamine cell firing in each block of trials was strikingly indifferent to reward rate (Fig.2e,g). There was no significant change in the firing rates of individual dopamine cells – or any other VTA-l neurons - between higher- and lower-reward blocks (Fig.2f,h; see also29 for concordant results in head-fixed mice). There was also no overall change in the rate at which dopamine cells fire bursts of spikes (Fig.2i). Furthermore, we never observed any dopamine cells switching between active and inactive states. The proportion of time dopamine cells spent inactive (long inter-spike-intervals) was very low, and did not change between higher- and lower-reward blocks (Fig.2i).
The anatomy of the VTA-NAc dopamine projection has been intensively investigated6,22,23, but given this apparent functional mismatch we reconfirmed that we were recording from the correct portion of VTA. Small injections of the retrograde tracer cholera toxin B (CTb) into NAc core resulted in dense labeling of TH+ neurons within the same VTA-l area as our optrode recordings (Extended Data Fig.3). Within the approximate recording zone 21% of TH+ cells were also CTb+, and this is likely an underestimate of the fraction of NAc core-projecting VTA-l dopamine cells as our tracer injections did not completely fill NAc core. Hence our sample of n=27 tagged VTA dopamine cells (plus many more untagged cells) almost certainly includes NAc core-projecting neurons. Finally, in an additional rat we recorded two tagged VTA-l dopamine cells after infusing virus selectively into NAc core (Extended Data Fig.3). Both retrogradely-infected cells had firing patterns that closely resembled the other tagged dopamine cells in all respects, including lack of tonic firing changes with varying reward rate (Supplementary Figure). We conclude that changes in tonic VTA-l dopamine cell firing are not responsible for motivation-related changes in forebrain dopamine release.
Tracking release on multiple timescales
Does NAc dopamine release track reward rate per se, as suggested in some theories30, or is this correlation driven by dynamic fluctuations in dopamine release that are too fast to resolve with microdialysis? We argued for the latter possibility based on voltammetry data5, but sought confirmation using an independent measure of dopamine release that can span different timescales. The dLight1 suite of genetically-encoded optical dopamine indicators was engineered by inserting circularly permuted GFP into dopamine D1 receptors15. Binding of dopamine causes a highly specific increase in fluorescence (Fig.3a). We infused viruses into NAc to express either dLight1.1 (4 verified NAc placements from 3 rats) or the brighter variant dLight1.3b (6 verified NAc placements from 4 rats), and monitored fluorescence by fiber photometry. We observed clear NAc dopamine responses to Pavlovian reward-predictive cues, similarly to VTA dopamine cell firing (Extended Data Fig.5).
For the bandit task we first examined the dLight signal in 1min bins (Fig.3b) for comparison to microdialysis. We again saw a clear relationship between NAc dopamine release and reward rate, in both cross-correlation and analysis of block transitions (Fig.3c,d). We then looked more closely at how this relationship arises. Rather than slowly-varying on a timescale of minutes, the dLight signal showed highly dynamic fluctuations within and between each trial (Fig.3e). We compared these fluctuations to instantaneous state values, and RPEs, estimated from a reinforcement learning model (a Semi-Markov Decision Process5). As we previously reported using voltammetry5, moment-by-moment NAc dopamine showed a strong correlation with state values (Fig.3f), visible as ramping up within trials when rewards were expected (Fig.3e). We also saw transient increases with less-expected reward deliveries, consistent with RPE (examined below). In every individual dLight session dopamine showed a stronger correlation with values than either RPEs or reward rate (Fig.3h and Extended Data Fig.6). Correlations with both state values and RPE were maximal for the dLight signal ~0.3 s later, consistent with a brief lag caused by neural processing of cues and sensor response time (Fig.3g; with voltammetry we reported a ~0.4–0.5s lag5).
Dopamine firing does not explain release
We next compared dopamine cell firing and release around bandit task events. External stimuli at Light-On, Go cue, and rewarded Side-In (food hopper click) each evoked a rapid firing increase (Fig.4a). These responses were observed in the great majority of dopamine cells (Fig.4c), although the magnitude of responses to different cues varied from cell to cell (Supplementary Figure). The NAc dLight signal also responded rapidly and reliably to each of these salient cues (Fig.4b,c), consistent with burst firing of dopamine cells driving dopamine release.
We also saw clear increases in NAc dopamine release as rats approached the start port (just before Center-In) and the food port (just before Food-Port-In). This fits well with the extensive voltammetry literature showing that motivated approach behaviors are accompanied by rapid increases in NAc core dopamine5,7–11. However, the VTA-l dopamine cell population did not show a corresponding increase in firing at these times (Fig.4a; see Extended Data Fig.7 for additional comparisons, including non-dopamine cells).
To better dissociate cue-evoked, and approach-related, dopamine activity we separated trials by short (<1s) and long (>2s) latencies (Fig.4d,e). Increases in dopamine cell firing were consistently locked to the cue onset at Light-On, preferentially for short-latency trials. All 25 dopamine cells with significant firing rate increases after Light-On were better aligned to Light-On than Center-In (Fig.4e). By contrast, increases in NAc dopamine release before Center-In were distinct from cue-evoked dopamine release (Fig.4d,e). dLight signals consistently increased before Center-In on long-latency trials (10/10 sessions), and also before Food-Port-In (9/10 sessions), without corresponding increases in dopamine firing (Fig.4f).
Finally we considered how event-related dopamine signals depend upon recent reward history. During the early part of each trial dopamine cell firing was not dependent on reward rate (Fig.5a) - despite the influence of reward rate on motivation (Fig.5b). Subsequently, the phasic response to the reward cue was reliably stronger when reward rate was lower (Fig.5a), consistent with positive RPE encoding. When this reward cue was omitted dopamine cells paused firing, though encoding of negative RPEs was much weaker or absent, whether examined at the population level (Fig.5a,b) or individual cells (Extended Data Fig.8). It has been proposed that negative RPEs are encoded in the duration of dopamine pauses31, but this was observed in just 2/29 individual neurons. Similar results were obtained if reward expectation was estimated in other ways, including trial-based reinforcement learning models (actor-critic, Q-learning) or simply counting recent rewards (Extended Data Fig.8).
Dopamine release at Side-In also showed a clear, transient encoding of positive, but not negative, RPEs (Fig.5c,d). This dLight response was slightly delayed and prolonged compared to firing, consistent with time taken for release and reuptake32, but remained a subsecond phenomenon. Unlike firing, however, dLight signals early in each trial were greater when recent trials had been rewarded (Fig.5c), consistent with value coding. We observed this dependence on reward history even while the rat was not actively moving, but was maintaining a nosepoke in the center port waiting for the Go cue (Fig.5d). Overall, we conclude that NAc dopamine release reflects both cue-evoked responses and reward expectation, and that only the former can be well accounted for by VTA-l dopamine cell firing.
Discussion.
VTA-l provides the predominant source of dopamine to NAc core6,23,24. VTA-l dopamine cells consistently display RPE-like bursts in both Pavlovian3 and operant13 tasks, including VTA-l dopamine cells that project to NAc core. VTA bursts are thought to be particularly important for driving NAc dopamine32, and indeed we found that cue-evoked VTA bursts were matched by NAc release. However, we additionally found value-related patterns of NAc dopamine release that were not generated by firing of VTA-l dopamine cells, either on long (“tonic”) or short (“phasic”) timescales. Other dopamine subpopulations may carry distinct signals14,33,34, and we cannot rule out the possibility that firing of dopamine cell subpopulations not recorded from here produces value-related dopamine in NAc core. However, value-related firing has never been reported for any dopamine cells, across a wide range of studies. Our results suggest that NAc dopamine dynamics are controlled in different ways, at different times, for different functions, and that recording dopamine cells is important but not sufficient for understanding dopamine signals35.
Release from dopamine terminals is powerfully controlled by local, non-spiking mechanisms36–40. For example, NAc dopamine release is modulated by the basolateral amygdala even when VTA spiking is pharmacologically suppressed41,42. It has been noted for decades that local control of dopamine release might achieve distinct functions to dopamine cell spiking36,43, but this has not been incorporated into theoretical views of dopamine. Distinct striatal subregions contribute to different types of decision, and may influence their own dopamine release according to need44. It remains to be determined just how local this local control of dopamine release can be. One limitation shared by the three ways that we measured dopamine release is that they all sample on a 100μm+ spatial scale. There is evidence from in vivo microscopy that dopamine release may be heterogeneous at considerably smaller scales15.
Our results do not support the existence of any separate “tonic” dopamine signal that could mediate motivational effects of dopamine. Instead, dopamine shifts that appear slow if measured slowly (with microdialysis) resolve into rapid fluctuations if measured rapidly (with voltammetry or dLight). Furthermore, recordings of identified VTA dopamine cells by ourselves and others30 provide strong evidence against the idea29 that changes in tonic dopamine cell firing drive tonic changes in dopamine release. While tonic firing can be altered by lesions or drug manipulations28, we are not aware of sustained changes in firing rate in any behavioral task. Firing can ramp downwards on a ~1s timescale during anticipation of motivationally-relevant events45,46. However, this decline is the opposite of what would be required to boost dopamine release with reward expectation, and instead seems more akin to a sequence of transient negative prediction errors47. Although sustained signals encoding ongoing reward rate could be computationally useful30, dopamine instead provides rapidly-fluctuating error and value signals. It remains possible that sustained signals are computed at a subsequent step, by intracellular signaling pathways downstream of dopamine receptors.
Many groups have observed ramping dopamine release as rats approach rewards5,7–11, consistent with encoding escalating reward expectations. Some have argued that these dopamine ramps simply reflect RPEs, by supposing that rats either rapidly forget values48, or that they have a warped set of state representations49. This latter idea is not supported by our observation that ramping is rapidly modulated from trial to trial based on updated reward expectations - becoming stronger within a short sequence of successive rewards while RPE-like responses to cues become weaker (Fig. 3e). More generally, any theory in which dopamine solely conveys RPEs (learning signals) cannot account for the very well-established connection between ongoing mesolimbic dopamine and motivation16. The NAc core is not needed for highly-trained responses to conditioned stimuli, but is particularly important when deciding to perform time-consuming work to obtain rewards50. NAc core dopamine appears to provide an essential dynamic signal of how worthwhile it is to allocate time and effort to work5,44, even though this signal is not present in VTA dopamine cell firing.
Methods.
Animals.
All animal procedures were approved by the University of Michigan or University of California San Francisco Institutional Committees on Use and Care of Animals. Male rats (300–500g, either wild-type Long-Evans or TH-Cre+ with a Long-Evans background51) were maintained on a reverse 12:12 light:dark cycle and tested during the dark phase. Rats were mildly food deprived, receiving 15 g of standard laboratory rat chow daily in addition to food rewards earned during task performance. No sample size precalculation was performed.
Behavior.
Pretraining and testing were performed in computer-controlled Med Associates operant chambers (25 cm × 30 cm at widest point) each with a five-hole nose-poke wall, as previously described5. Bandit task sessions used the following parameters: block lengths were 35–45 trials, randomly selected for each block; hold period before Go cue was 500–1500 ms (uniform distribution); left/right reward probabilities were 10, 50, 90% (for electrophysiology, photometry, voltammetry, and previously reported microdialysis rats5) or 20, 50, 80% (newly reported microdialysis rats).
Current reward rate was estimated using a simple, time-based leaky-integrator52. Reward rate was incremented each time a reward was received, and decayed exponentially at a rate set by parameter τ (the time in s for the reward rate to decrease by ~63%, 1–1/e). For all analyses, τ was selected based on the rat’s behavior, maximizing the (negative) correlation between reward rate and log(latency) in each session. The correlations between forebrain dopamine and reward rate were not highly sensitive to this choice of τ (Extended Data Fig.1).
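As an illustration of the estimator described above, the following is a minimal Python sketch, not the authors' MATLAB code. It assumes that each reward increments the estimate by 1 and that event times are in seconds; the function names (`leaky_reward_rate`, `pearson`, `select_tau`) and the grid of candidate τ values are hypothetical.

```python
import math


def leaky_reward_rate(reward_times, query_times, tau):
    """Leaky-integrator reward rate at each query time (ascending order).

    Assumption: every reward adds 1 to the estimate. Between events the
    estimate decays as exp(-dt / tau), i.e. by ~63% (1 - 1/e) per tau seconds.
    """
    rewards = sorted(reward_times)
    rates = []
    rate, last_t, i = 0.0, 0.0, 0
    for t in query_times:
        # Fold in every reward delivered up to and including time t.
        while i < len(rewards) and rewards[i] <= t:
            rate = rate * math.exp(-(rewards[i] - last_t) / tau) + 1.0
            last_t = rewards[i]
            i += 1
        # Decay from the most recent event to the query time.
        rate *= math.exp(-(t - last_t) / tau)
        last_t = t
        rates.append(rate)
    return rates


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_tau(reward_times, light_on_times, latencies, candidate_taus):
    """Pick the tau giving the most negative correlation between reward
    rate at Light-On and log(latency) for that trial."""
    log_lat = [math.log(lat) for lat in latencies]
    return min(
        candidate_taus,
        key=lambda tau: pearson(
            leaky_reward_rate(reward_times, light_on_times, tau), log_lat
        ),
    )
```

Evaluating the reward rate at each Light-On time and searching a grid of τ values is one straightforward way to implement the per-session selection; the grid spacing and range are left to the user.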
To classify block transitions as “increasing” or “decreasing” in reward rate, we compared the average leaky-integrator reward rate in the last 5 min of a block to the average reward rate in the first 8 min of the subsequent block.
Rats used for electrophysiology and photometry also performed a Pavlovian approach task, in the same operant chamber with the houselight on throughout the session. Three auditory cues (2kHz, 5kHz, 9kHz) were associated with different probabilities of food delivery (counterbalanced across rats). Cues were played as a train of tone pips (100ms on / 50ms off) for a total duration of 2.6s followed by a delay period of 500ms. Cues, and unpredicted reward deliveries, were delivered in pseudorandom order with a variable inter-trial interval (15–30s, uniform distribution).
Microdialysis.
Surgery.
Rats were implanted bilaterally with guide cannulae (CMA, #830 9024) in cortex and striatum. One group (n=8) received one guide cannula targeting prelimbic and infralimbic cortex (AP +3.2 mm, ML 0.6 mm relative to bregma; DV 1.4 mm below brain surface) and another targeting dorsomedial striatum and nucleus accumbens in the opposite hemisphere (AP +1.3, ML 1.9, DV 3.4). Both implants were angled 5 degrees away from each other along the rostral-caudal plane. A second group (n=4) received one guide cannula targeting anterior cingulate cortex (AP +1.6, ML 0.8, DV 0.8) and another targeting accumbens core/shell in the opposite hemisphere (AP +1.6, ML 1.4, DV 5.5, n=2; or AP +1.6, ML 1.9, DV 5.7, n=2). Implant sides were counterbalanced across rats. Animals were allowed to recover for 1 week prior to retraining.
Chemicals.
Water, methanol, and acetonitrile for mobile phases were Burdick & Jackson HPLC grade, purchased from VWR (Radnor, PA). All other chemicals were purchased from Sigma Aldrich (St. Louis, MO) unless otherwise noted. Artificial cerebrospinal fluid (aCSF) consisted of 145 mM NaCl, 2.68 mM KCl, 1.40 mM CaCl2, 1.01 mM MgSO4, 1.55 mM Na2HPO4, and 0.45 mM NaH2PO4, with pH adjusted to 7.4 with NaOH. Ascorbic acid (250 nM final concentration) was added to reduce oxidation of analytes.
Sample Collection and HPLC-MS.
On testing day, animals were placed in the operant chamber with the houselight on. Custom-made concentric polyacrylonitrile membrane microdialysis probes (1 mm dialyzing AN69 membrane; Hospal, Bologna, Italy) were inserted bilaterally into the guide cannulae and perfused continuously (Chemyx Inc., Fusion 400) with aCSF at 2 μL/min for 90 min to allow equilibration. After 5 min baseline collection the houselight was extinguished, cueing the animal to bandit task availability. Sample collection continued at 1 min intervals and samples were immediately derivatized53 with 1.5 μL of 100 mM sodium carbonate; 1.5 μL of 2% (v/v) benzoyl chloride (BzCl) in acetonitrile; and 1.5 μL of isotopically labeled internal standard mixture diluted in 50% (v/v) acetonitrile containing 1% (v/v) sulfuric acid, and spiked with deuterated ACh and choline (C/D/N isotopes, Pointe-Claire, Canada) to a final concentration of 20 nM. Sample series collection alternated between the two probes at 30-second intervals in each of 26 sessions, except for one session in which a broken membrane resulted in just one series (51 sample series total). Samples were analyzed using Thermo Scientific UHPLC systems (Accela or Vanquish Horizon) interfaced to a Quantum Ultra triple quadrupole mass spectrometer fitted with a HESI II ESI probe, operating in multiple reaction monitoring mode. Five μL samples were injected onto a Phenomenex core-shell biphenyl Kinetex HPLC column (2.1 mm × 100 mm). Mobile phase A was 10 mM ammonium formate with 0.15% formic acid, and mobile phase B was acetonitrile. The mobile phase was delivered as an elution gradient at 450 μL/min as follows: initial, 0% B; 0.01 min, 19% B; 1 min, 26% B; 1.5 min, 75% B; 2.5 min, 100% B; 3 min, 100% B; 3.1 min, 5% B; and 3.5 min, 5% B. Thermo Xcalibur QuanBrowser (Thermo Fisher Scientific) was used to automatically process and integrate peaks. Each of the >100,000 peaks was visually inspected individually to ensure proper integration.
Analysis.
All neurochemical concentration data were smoothed with a 3-point moving average (y′(t) = 0.25·y(t−1) + 0.5·y(t) + 0.25·y(t+1)) and z-score normalized within each session to facilitate between-session comparisons. For each target region, a cross-correlogram was generated for each session and the average of the sessions was plotted. 1% confidence boundaries were generated for each subplot by shuffling one time series 100,000 times and generating a distribution of correlation coefficients for each session. Multiple regression models were generated using the regress function in MATLAB, with the neurochemical as the outcome variable and behavioral metrics as predictors. Regression coefficients were deemed significant at three alpha levels (0.05, 0.0005, 0.000005), after Bonferroni correction for multiple comparisons (alpha / (21 chemicals × 7 regions × 9 behavioral regressors)). For analysis of block transitions, data were binned into 3 min epochs, discarding the sample that included the transition time.
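The preprocessing and multiple-comparison thresholds described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the original MATLAB analysis. Two details are assumptions not stated in the text: the first and last samples of each series are left unsmoothed, and the z-score uses the sample (n−1) standard deviation. The function names are hypothetical.

```python
import statistics


def smooth3(y):
    """3-point weighted moving average with weights 0.25, 0.5, 0.25.

    Assumption: endpoints, which lack a neighbour on one side, are copied
    through unchanged.
    """
    out = [float(v) for v in y]
    for t in range(1, len(y) - 1):
        out[t] = 0.25 * y[t - 1] + 0.5 * y[t] + 0.25 * y[t + 1]
    return out


def zscore(y):
    """Z-score a series within a session (assumption: sample standard deviation)."""
    mu = statistics.fmean(y)
    sd = statistics.stdev(y)
    return [(v - mu) / sd for v in y]


def bonferroni_thresholds(alphas=(0.05, 0.0005, 0.000005),
                          n_chemicals=21, n_regions=7, n_regressors=9):
    """Divide each alpha level by the total number of comparisons."""
    n_tests = n_chemicals * n_regions * n_regressors  # 21 * 7 * 9 = 1323
    return [alpha / n_tests for alpha in alphas]
```

With the counts given in the text there are 1323 comparisons, so the least stringent corrected threshold is 0.05/1323, roughly 3.8×10^−5.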
Electrophysiology.
Rats (n=25) were implanted with custom designed drivable optrodes, each consisting of 16 tetrodes (constructed from 12.5μm nichrome wire, Sandvik, Palm Coast, FL) glued onto the side of a 200μm optic fiber and extending up to 500μm below the fiber tip. During the same surgery, we injected 1μl of AAV2/5-EF1a-DIO-ChR2(H134R)-EYFP into the lateral VTA (AP 5.6, ML 0.8, DV 7.5) or NAc core (AP 1.6, ML 1.6, DV 6.4). Wideband (1–9000Hz) brain signals were sampled (30,000 samples/s) using Intan digital headstages. Optrodes were lowered at least 80μm at the end of each recording session. Individual units were isolated offline using a MATLAB implementation of MountainSort54 followed by careful manual inspection.
Classification.
To identify whether an isolated VTA-l unit was dopaminergic (TH+), we used the stimulus-associated latency test55. Briefly, at the end of each experimental session, we connected the optrode to a laser diode and delivered light pulse trains of different widths and frequencies. For a unit to be identified as light responsive it needed to reach the significance level of p<0.001 for 5ms and 10ms pulse trains. We also compared the light evoked waveforms (within 10ms of laser pulse onset) to session-wide averages; all light-evoked units had a Pearson correlation coefficient of >0.9. Dopamine neurons were successfully recorded from four rats with VTA-l virus infusions (IM657, 1 unit; IM1002, 3 units; IM1003, 15 units; IM1037, 9 units) and one rat with NAc core virus (IM-1078, 2 units). Peak width was defined as the full-width-at-half-maximum of the most prominent negative component of the aligned, averaged spike waveform. Non-tagged VTA neurons with session-wide firing rate > 20Hz and peak width < 200μs were classified as non-dopamine cells. To ensure that we were comparing dopamine and non-dopamine cells within the same subregions, we only analyzed non-dopamine cells recorded during sessions with at least one optically-tagged dopamine cell.
Analysis.
Spike bursts were detected by the conventional “80/160 template” approach56: each time an inter-spike-interval of 80 ms or less occurs, these and subsequent spikes are considered part of a burst until there is an interval of 160 ms or more. For comparison of “tonic” firing to reward rate, dopamine spikes were counted in 1 min bins. To examine faster changes, spike density functions were constructed by convolving spike trains with a Gaussian kernel with variance 20ms. To determine how quickly a neuron responded to a given cue, we used 40ms bins (sliding in steps of 20ms) and used a shuffle test (10,000 shuffles) for each time bin comparing the firing rate after cue onset to firing rate in the 250 ms immediately preceding the cue. The first bin at which the post cue firing rate was significantly (p<0.01, correcting for multiple comparisons) greater than baseline firing was considered the time to cue response.
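The burst rule described above can be written as a short Python sketch. It assumes spike times are sorted and given in seconds; the function name `detect_bursts` and its parameter names are hypothetical and do not come from the original analysis code.

```python
def detect_bursts(spike_times, start_isi=0.080, end_isi=0.160):
    """Group spikes into bursts using the 80/160 ms rule.

    A burst starts when an inter-spike interval (ISI) is <= start_isi and
    continues until an ISI >= end_isi is encountered. Returns a list of
    bursts, each a list of spike times in seconds.
    """
    bursts = []
    current = None  # spikes in the burst being built, or None if outside a burst
    for prev, cur in zip(spike_times, spike_times[1:]):
        isi = cur - prev
        if current is None:
            if isi <= start_isi:
                current = [prev, cur]  # both spikes bounding the short ISI
        elif isi < end_isi:
            current.append(cur)
        else:
            bursts.append(current)
            current = None
    if current is not None:
        bursts.append(current)  # burst still open at the end of the recording
    return bursts


# Example usage: count bursts in a short spike train.
example = [0.0, 0.05, 0.10, 0.30, 1.0, 1.07, 1.5]
n_bursts = len(detect_bursts(example))
```

Note that an ISI between 80 and 160 ms extends an ongoing burst but cannot start a new one, which is what distinguishes the two thresholds.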
Peak firing rate was calculated as the maximum (Gaussian-smoothed) firing rate of each trial in a 250 ms window after Side-In for rewarded trials, and the valley was calculated as the minimum firing rate in a 2 s window, starting one second after Side-In for unrewarded trials.
To calculate a ramp angle during approach behaviors we smoothed mean firing rates with a 50ms Gaussian kernel, detected the maximum/minimum of the resulting signal in a 0.5s window prior to each event (Center-In or Food-Port-In) and measured the signed angle connecting the two extrema. To compare firing rates in “high” and “low” reward blocks, for each session we performed a median split of average leaky-integrator reward rate in each block.
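One possible reading of the ramp-angle measure is sketched below in Python. The text does not specify how the two axes are scaled, so the angle here is computed directly from the signal units and seconds, which is an assumption; the function name `ramp_angle` is also hypothetical. The sign is positive when the later extremum is higher than the earlier one.

```python
import math


def ramp_angle(times, signal, event_time, window=0.5):
    """Signed angle (degrees) of the line joining the minimum and maximum
    of `signal` within `window` seconds before `event_time`.

    Assumption: the angle is taken in raw units (signal units per second);
    the original analysis may have normalized the axes differently.
    Raises ValueError if no samples fall inside the window.
    """
    idx = [i for i, t in enumerate(times)
           if event_time - window <= t <= event_time]
    i_max = max(idx, key=lambda i: signal[i])
    i_min = min(idx, key=lambda i: signal[i])
    first, second = sorted((i_min, i_max))  # order the extrema in time
    dy = signal[second] - signal[first]
    dt = times[second] - times[first]
    return math.degrees(math.atan2(dy, dt))
```

An upward ramp toward the event gives a positive angle and a downward ramp gives a negative one; a flat signal returns zero.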
Voltammetry and computational model.
Fast-scan cyclic voltammetry results shown here reanalyze data previously presented in detail5. Within-trial estimates of state value and reward prediction errors were calculated using a semi-Markov decision process reinforcement learning model, exactly as previously described5.
Photometry.
We used a viral approach to express the genetically encoded optical dopamine sensor dLight15. Under isoflurane anesthesia, 1μL of AAV9-CAG-dLight (1×10^12 vg/mL; UC Davis vector core) was slowly (100nL/min) injected (Nanoject III, Drummond, Broomall, PA) through a 30μm glass micropipette in ventral striatum bilaterally (AP:1.7mm, ML:1.7mm, DV:−7.0mm). During the same surgery optical fibers (400μm core, 430μm total diameter) attached to a metal ferrule (Doric) were inserted (target depth 200μm higher than virus) and cemented in place. Data were collected >3 weeks later, to allow for dLight expression.
For dLight excitation blue (470nm) and violet (405nm; control) LEDs were sinusoidally modulated at distinct frequencies (211Hz, 531Hz respectively57). Both excitation and emission signals passed through minicube filters (Doric) and bulk fluorescence was measured with a femtowatt detector (Newport, Model 2151) sampling at 10kHz. Demodulation produced separate 470nm (dopamine) and 405nm (control) signals, which were then rescaled to each other via a least-squares fit57. Fractional fluorescence signal (dF/F) was then defined as (470 − 405_fit)/405_fit. For all analyses this signal was downsampled to 50Hz and smoothed with a 5-point median filter. For presentation of 470nm and 405nm signals separately, see Extended Data Fig.7.
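The dF/F definition above can be illustrated with a minimal Python sketch. It assumes the demodulated 470nm and 405nm traces are already available as equal-length lists, and that the least-squares rescaling is a straight-line fit (slope and intercept) of the 405nm control onto the 470nm signal. The function names are hypothetical, and the edge handling of the median filter (shrinking the window at the ends) is an assumption.

```python
import statistics


def fit_control(sig470, ctl405):
    """Least-squares line mapping the 405 nm control onto the 470 nm signal.

    Returns the fitted control trace (405_fit). Assumes ctl405 is not constant.
    """
    n = len(sig470)
    mx = sum(ctl405) / n
    my = sum(sig470) / n
    sxx = sum((x - mx) ** 2 for x in ctl405)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ctl405, sig470))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [slope * x + intercept for x in ctl405]


def dff(sig470, ctl405):
    """Fractional fluorescence: (470 - 405_fit) / 405_fit."""
    fitted = fit_control(sig470, ctl405)
    return [(s - f) / f for s, f in zip(sig470, fitted)]


def median_filter5(x):
    """5-point median filter (assumption: window shrinks at the edges)."""
    n = len(x)
    return [statistics.median(x[max(0, i - 2):min(n, i + 3)])
            for i in range(n)]
```

Fitting the control to the signal before subtraction removes fluctuations shared by both channels, such as motion artifacts, so that the remaining dF/F reflects dopamine-dependent fluorescence.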
Data from an optic fiber placement were included in analyses if the fiber tip was in NAc, and the fluorescence response to at least one task cue had a Z-score of >1. These criteria excluded one rat, and yielded three rats/four placements (IM1065-left, IM1066-bilateral, IM1089-right) for dLight1.1, and four rats/six placements (IM1088-bilateral, IM1105-right, IM1106-bilateral, IM1107-right) for dLight1.3b. Similar results were obtained for dLight1.1 and dLight1.3b (Extended Data Fig.7) so data were combined.
To calculate a ramp angle during approach behaviors, we detected the maximum and minimum of the resulting signal in a 0.5s window prior to each event (Center-In or Food-Port-In) and measured the signed angle of the line connecting the two extrema.
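One possible reading of the ramp-angle measure is sketched below. It assumes a 50Hz trace, takes the angle with `atan2` of the signal change over the elapsed time between the two extrema, and treats the angle as positive when the minimum precedes the maximum. The axis scaling (signal units per second) affects the numerical value and is not specified in the text, so the function name, arguments, and units here are illustrative assumptions only.

```python
import math


def ramp_angle(trace, event_index, fs=50.0, window_s=0.5):
    """Signed angle (degrees) of the line joining the two extrema of `trace`
    in the `window_s` seconds before `event_index`.

    Positive if the signal rises (minimum first), negative if it falls.
    Assumes time in seconds on the x-axis and raw signal units on the y-axis.
    """
    n = int(round(window_s * fs))
    start = event_index - n
    if start < 0:
        raise ValueError("not enough samples before the event")
    window = trace[start:event_index]

    i_max = max(range(len(window)), key=window.__getitem__)
    i_min = min(range(len(window)), key=window.__getitem__)
    if i_max == i_min:  # flat window: no ramp
        return 0.0

    first, second = sorted((i_min, i_max))
    dt = (second - first) / fs             # elapsed time between extrema
    dy = window[second] - window[first]    # signed change in signal
    return math.degrees(math.atan2(dy, dt))
```

For a trace that rises by one signal unit per second, this returns roughly 45 degrees; the mirror-image falling trace gives roughly −45 degrees.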
Affinity and molecular specificity of dLight1.3b.
In vitro measurements were performed as previously described15. Briefly, HEK293T (ATCC CRL#1573) cells were cultured and transfected with plasmids encoding dLight1.3b driven by a CMV promoter, and washed with HBSS (Life Technologies) supplemented with Ca2+ (4mM) and Mg2+ (2mM) before imaging. Imaging was performed using a 40X oil-based objective on an inverted Zeiss Observer LSM710 confocal microscope with 488nm/513nm (excitation/emission) wavelengths. To test the sensor’s fluorescence responses, neurotransmitters were applied directly to the bath during time-lapse imaging, in at least two independent experiments. Titrations of dopamine and norepinephrine were obtained by performing 10-fold serial dilutions to achieve 8 different concentrations. All other neurotransmitters were tested at three sequential concentrations (100nM, 1μM, 10μM). All neurotransmitter concentrations were obtained by dilution from a 1mM stock in HBSS, prepared fresh. Raw fluorescence intensities from time-lapse imaging were quantified in Fiji; each region of interest (ROI) was manually drawn on the membrane of an individual cell. Fluorescence fold change (ΔF/F) was calculated as (F_peak − F_basal)/F_basal, where F_peak is the fluorescence intensity averaged over 4 frames at the peak and F_basal is the fluorescence intensity averaged over the 4 frames before addition of ligands. Graphs and statistical analyses were produced using GraphPad Prism 6. Data points were fit with a one-site specific binding curve to obtain Kd values. In box-and-whisker plots, the box covers the 25% to 75% range and whiskers extend from minimum to maximum values.
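The fold-change and Kd calculations can be illustrated with a short sketch. The authors used GraphPad Prism, so the code below is not their procedure. It assumes the standard one-site specific binding model, Y = Bmax·X/(Kd + X), and fits it by ordinary least squares with a simple grid scan over log10(Kd) followed by golden-section refinement. Function names, the search range, and the concentration units (molar) are assumptions made for illustration.

```python
import math


def fold_change(peak_frames, basal_frames):
    """dF/F = (F_peak - F_basal) / F_basal, each averaged over its frames."""
    f_peak = sum(peak_frames) / len(peak_frames)
    f_basal = sum(basal_frames) / len(basal_frames)
    return (f_peak - f_basal) / f_basal


def _bmax_and_sse(kd, conc, resp):
    """For a fixed Kd, the least-squares Bmax has a closed form."""
    g = [x / (kd + x) for x in conc]
    bmax = sum(y * gi for y, gi in zip(resp, g)) / sum(gi * gi for gi in g)
    sse = sum((y - bmax * gi) ** 2 for y, gi in zip(resp, g))
    return bmax, sse


def fit_one_site(conc, resp, log_kd_min=-10.0, log_kd_max=-3.0, steps=141):
    """Fit Y = Bmax * X / (Kd + X); returns (kd, bmax).

    The search range for log10(Kd) is a hypothetical default.
    """
    def sse_at(log_kd):
        return _bmax_and_sse(10.0 ** log_kd, conc, resp)[1]

    # Coarse scan over log10(Kd) to bracket the minimum.
    width = (log_kd_max - log_kd_min) / (steps - 1)
    grid = [log_kd_min + i * width for i in range(steps)]
    best = min(range(steps), key=lambda i: sse_at(grid[i]))
    lo = grid[max(best - 1, 0)]
    hi = grid[min(best + 1, steps - 1)]

    # Golden-section refinement inside the bracket.
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(60):
        a = hi - phi * (hi - lo)
        b = lo + phi * (hi - lo)
        if sse_at(a) < sse_at(b):
            hi = b
        else:
            lo = a

    kd = 10.0 ** ((lo + hi) / 2.0)
    bmax, _ = _bmax_and_sse(kd, conc, resp)
    return kd, bmax
```

Searching on a logarithmic Kd axis suits titrations built from 10-fold serial dilutions, where the concentrations span several orders of magnitude.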
Availability of Reagents, Code, and Data.
The AAV.Synapsin.dLight1.3b virus used in this study has been deposited with Addgene (www.addgene.org). Custom MATLAB code is available upon request to J.D.B. All data will be available through the Collaborative Research in Computational Neuroscience (CRCNS.org) data sharing website.
Acknowledgments.
We thank Peter Dayan, Howard Fields, Loren Frank, Chris Donaghue, and Thomas Faust for their comments on an early version of the manuscript, and Vaughn Hetrick, Rahim Hashim, and Tom Davidson for technical assistance and advice. This work was supported by the National Institute on Drug Abuse, the National Institute of Mental Health, the National Institute of Neurological Disorders and Stroke, the University of Michigan, Ann Arbor, and the University of California, San Francisco.
Footnotes
Competing Interests. The Authors declare no competing interests.
References.
- 1. Schultz W, Dayan P & Montague PR A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
- 2. Pan WX, Schmidt R, Wickens JR & Hyland BI Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. J Neurosci 25, 6235–6242 (2005).
- 3. Cohen JY, Haesler S, Vong L, Lowell BB & Uchida N Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88 (2012).
- 4. Steinberg EE, Keiflin R, Boivin JR, Witten IB, et al. A causal link between prediction errors, dopamine neurons and learning. Nat Neurosci (2013).
- 5. Hamid AA, Pettibone JR, Mabrouk OS, Hetrick VL, et al. Mesolimbic dopamine signals the value of work. Nat Neurosci 19, 117–126 (2016).
- 6. Saunders BT, Richard JM, Margolis EB & Janak PH Dopamine neurons create Pavlovian conditioned stimuli with circuit-defined motivational properties. Nat Neurosci 21, 1072 (2018).
- 7. Phillips PE, Stuber GD, Heien ML, Wightman RM & Carelli RM Subsecond dopamine release promotes cocaine seeking. Nature 422, 614–618 (2003).
- 8. Roitman MF, Stuber GD, Phillips PE, Wightman RM & Carelli RM Dopamine operates as a subsecond modulator of food seeking. J Neurosci 24, 1265–1271 (2004).
- 9. Wassum KM, Ostlund SB & Maidment NT Phasic mesolimbic dopamine signaling precedes and predicts performance of a self-initiated action sequence task. Biol Psychiatry 71, 846–854 (2012).
- 10. Howe MW, Tierney PL, Sandberg SG, Phillips PE & Graybiel AM Prolonged dopamine signalling in striatum signals proximity and value of distant rewards. Nature 500, 575–579 (2013).
- 11. Syed EC, Grima LL, Magill PJ, Bogacz R, et al. Action initiation shapes mesolimbic dopamine encoding of future rewards. Nat Neurosci (2015).
- 12. Fiorillo CD, Tobler PN & Schultz W Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898–1902 (2003).
- 13. Morris G, Nevet A, Arkadir D, Vaadia E & Bergman H Midbrain dopamine neurons encode decisions for future action. Nat Neurosci 9, 1057–1063 (2006).
- 14. Silva JAD, Tecuapetla F, Paixão V & Costa RM Dopamine neuron activity before action initiation gates and invigorates future movements. Nature 554, 244 (2018).
- 15. Patriarchi T, Cho JR, Merten K, Howe MW, et al. Ultrafast neuronal imaging of dopamine dynamics with designed genetically encoded sensors. Science 360, eaat4422 (2018).
- 16. Salamone JD & Correa M The mysterious motivational functions of mesolimbic dopamine. Neuron 76, 470–485 (2012).
- 17. Schultz W Predictive reward signal of dopamine neurons. J Neurophysiol 80, 1–27 (1998).
- 18. Garris PA & Wightman RM Different kinetics govern dopaminergic transmission in the amygdala, prefrontal cortex, and striatum: an in vivo voltammetric study. J Neurosci 14, 442–50 (1994).
- 19. Frank MJ, Doll BB, Oas-Terpstra J & Moreno F Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation. Nat Neurosci 12, 1062–1068 (2009).
- 20. St Onge JR, Ahn S, Phillips AG & Floresco SB Dynamic fluctuations in dopamine efflux in the prefrontal cortex and nucleus accumbens during risk-based decision making. J Neurosci 32, 16880–16891 (2012).
- 21. Bartra O, McGuire JT & Kable JW The valuation system: a coordinate-based meta-analysis of BOLD fMRI experiments examining neural correlates of subjective value. Neuroimage 76, 412–427 (2013).
- 22. Ikemoto S Dopamine reward circuitry: two projection systems from the ventral midbrain to the nucleus accumbens-olfactory tubercle complex. Brain Res Rev 56, 27–78 (2007).
- 23. Breton JM, Charbit AR, Snyder BJ, Fong PTK, et al. Relative contributions and mapping of ventral tegmental area dopamine and GABA neurons by projection target in the rat. J Comp Neurol (2018).
- 24. Ungless MA, Magill PJ & Bolam JP Uniform inhibition of dopamine neurons in the ventral tegmental area by aversive stimuli. Science 303, 2040–2042 (2004).
- 25. Morales M & Margolis EB Ventral tegmental area: cellular heterogeneity, connectivity and behaviour. Nat Rev Neurosci 18, 73–85 (2017).
- 26. Morris G, Arkadir D, Nevet A, Vaadia E & Bergman H Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43, 133–143 (2004).
- 27. Floresco SB, West AR, Ash B, Moore H & Grace AA Afferent modulation of dopamine neuron firing differentially regulates tonic and phasic dopamine transmission. Nat Neurosci 6, 968–973 (2003).
- 28. Grace AA Dysregulation of the dopamine system in the pathophysiology of schizophrenia and depression. Nat Rev Neurosci 17, 524 (2016).
- 29. Cohen JY, Amoroso MW & Uchida N Serotonergic neurons signal reward and punishment on multiple timescales. Elife 4, (2015).
- 30. Niv Y, Daw N & Dayan P How fast to work: Response vigor, motivation and tonic dopamine. Advances in Neural Information Processing Systems 18, 1019 (2006).
- 31. Bayer HM, Lau B & Glimcher PW Statistics of midbrain dopamine neuron spike trains in the awake primate. J Neurophysiol 98, 1428–1439 (2007).
- 32. Chergui K, Suaud-Chagny MF & Gonon F Nonlinear relationship between impulse flow, dopamine release and dopamine elimination in the rat brain in vivo. Neuroscience 62, 641–65 (1994).
- 33. Parker NF, Cameron CM, Taliaferro JP, Lee J, et al. Reward and choice encoding in terminals of midbrain dopamine neurons depends on striatal target. Nat Neurosci (2016).
- 34. Menegas W, Babayan BM, Uchida N & Watabe-Uchida M Opposite initialization to novel cues in dopamine signaling in ventral and posterior striatum in mice. Elife 6, (2017).
- 35. Trulson ME Simultaneous recording of substantia nigra neurons and voltammetric release of dopamine in the caudate of behaving cats. Brain Res Bull 15, 221–223 (1985).
- 36. Glowinski J, Chéramy A, Romo R & Barbeito L Presynaptic regulation of dopaminergic transmission in the striatum. Cell Mol Neurobiol 8, 7–17 (1988).
- 37. Zhou FM, Liang Y & Dani JA Endogenous nicotinic cholinergic activity regulates dopamine release in the striatum. Nat Neurosci 4, 1224–1229 (2001).
- 38. Threlfell S, Lalic T, Platt NJ, Jennings KA, et al. Striatal dopamine release is triggered by synchronized activity in cholinergic interneurons. Neuron 75, 58–64 (2012).
- 39. Cachope R, Mateo Y, Mathur BN, Irving J, et al. Selective activation of cholinergic interneurons enhances accumbal phasic dopamine release: setting the tone for reward processing. Cell Rep 2, 33–41 (2012).
- 40. Sulzer D, Cragg SJ & Rice ME Striatal dopamine neurotransmission: Regulation of release and uptake. Basal Ganglia 6, 123–148 (2016).
- 41. Floresco SB, Yang CR, Phillips AG & Blaha CD Basolateral amygdala stimulation evokes glutamate receptor-dependent dopamine efflux in the nucleus accumbens of the anaesthetized rat. Eur J Neurosci 10, 1241–1251 (1998).
- 42. Jones JL, Day JJ, Aragona BJ, Wheeler RA, et al. Basolateral amygdala modulates terminal dopamine release in the nucleus accumbens and conditioned responding. Biol Psychiatry 67, 737–744 (2010).
- 43. Schultz W Responses of midbrain dopamine neurons to behavioral trigger stimuli in the monkey. J Neurophysiol 56, 1439–1461 (1986).
- 44. Berke JD What does dopamine mean? Nat Neurosci (2018).
- 45. Bromberg-Martin ES, Matsumoto M & Hikosaka O Distinct tonic and phasic anticipatory activity in lateral habenula and dopamine neurons. Neuron 67, 144–155 (2010).
- 46. Pasquereau B & Turner RS Dopamine neurons encode errors in predicting movement trigger occurrence. J Neurophysiol 113, 1110–1123 (2014).
- 47. Fiorillo CD, Newsome WT & Schultz W The temporal precision of reward prediction in dopamine neurons. Nat Neurosci (2008).
- 48. Morita K & Kato A Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits. Front Neural Circuits 8, 36 (2014).
- 49. Gershman SJ Dopamine ramps are a consequence of reward prediction errors. Neural Comput 26, 467–471 (2014).
- 50. Nicola SM The flexible approach hypothesis: unification of effort and cue-responding hypotheses for the role of nucleus accumbens dopamine in the activation of reward-seeking behavior. J Neurosci 30, 16585–16600 (2010).
Methods References.
- 51. Witten IB, Steinberg EE, Lee SY, Davidson TJ, et al. Recombinase-driver rat lines: tools, techniques, and optogenetic application to dopamine-mediated reinforcement. Neuron 72, 721–733 (2011).
- 52. Sugrue LP, Corrado GS & Newsome WT Matching behavior and the representation of value in the parietal cortex. Science 304, 1782–1787 (2004).
- 53. Wong JM, Malec PA, Mabrouk OS, Ro J, et al. Benzoyl chloride derivatization with liquid chromatography-mass spectrometry for targeted metabolomics of neurochemicals in biological samples. J Chromatogr A 1446, 78–90 (2016).
- 54. Chung JE, Magland JF, Barnett AH, Tolosa VM, et al. A Fully Automated Approach to Spike Sorting. Neuron 95, 1381–1394.e6 (2017).
- 55. Kvitsiani D, Ranade S, Hangya B, Taniguchi H, et al. Distinct behavioural and network correlates of two interneuron types in prefrontal cortex. Nature 498, 363–366 (2013).
- 56. Grace AA & Bunney BS The control of firing pattern in nigral dopamine neurons: burst firing. J Neurosci 4, 2877–2890 (1984).
- 57. Lerner TN, Shilyansky C, Davidson TJ, Evans KE, et al. Intact-Brain Analyses Reveal Distinct Information Carried by SNc Dopamine Subcircuits. Cell 162, 635–647 (2015).
- 58. Paxinos G & Watson C The rat brain in stereotaxic coordinates (5th edition) (Elsevier Academic Press, 2005).