Author manuscript; available in PMC: 2025 Nov 25.
Published in final edited form as: Nature. 2024 Nov 25;639(8053):143–152. doi: 10.1038/s41586-024-08412-x

Opponent control of reinforcement by striatal dopamine and serotonin

Daniel F Cardozo Pinto 1, Matthew B Pomrenze 1, Michaela Y Guo 1, Gavin C Touponse 1, Allen PF Chen 1, Brandon S Bentzley 2, Neir Eshel 1, Robert C Malenka 1,*
PMCID: PMC12614767  NIHMSID: NIHMS2113083  PMID: 39586475

Abstract

The neuromodulators dopamine (DA) and serotonin (5-hydroxytryptamine; 5HT) powerfully regulate associative learning1–8. Similarities in the activity and connectivity of these neuromodulatory systems have inspired competing models of how DA and 5HT interact to drive the formation of new associations9–14. However, these hypotheses have not been tested directly because it has not been possible to interrogate and manipulate multiple neuromodulatory systems in a single subject. Here, we establish a mouse model enabling simultaneous genetic access to the brain’s DA and 5HT neurons. Anterograde tracing revealed the nucleus accumbens (NAc) to be a putative hotspot for the integration of convergent DA and 5HT signals. Simultaneous recording of DA and 5HT axon activity, together with genetically encoded DA and 5HT sensor recordings, revealed that rewards increase DA signaling and decrease 5HT signaling in the NAc. Optogenetically dampening DA or 5HT reward responses individually produced modest behavioral deficits in an appetitive conditioning task, while blunting both signals together profoundly disrupted learning and reinforcement. Optogenetically reproducing DA and 5HT reward responses together was sufficient to drive acquisition of new associations and supported reinforcement more potently than either manipulation alone. Together, these results demonstrate that striatal DA and 5HT signals shape learning by exerting opponent control of reinforcement.


Survival depends on an animal’s ability to seek rewards and learn about environmental cues that predict them. From insects to primates, the neuromodulators DA and 5HT have been found to play key roles in this Pavlovian (i.e. cue-outcome) learning process by signaling the presence of rewards (unconditioned stimuli; US) or reward-predictive cues (conditioned stimuli; CS) and regulating plasticity mechanisms thought to underlie formation of CS-US associations1,5–7,14–17. In mammals, DA and 5HT neurons innervate limbic structures18,19, suggesting that these areas are coordinately modulated during learning, but precisely how DA and 5HT interactions contribute to the formation of new associations remains unclear.

Historically, two contradictory hypotheses about the roles of DA and 5HT in associative learning have been debated. The synergy hypothesis posits that DA and 5HT signals carry information about reward expectation on short and long timescales, respectively9,12,14, thereby synthesizing theories about DA’s function in temporal difference learning20–22 and 5HT’s role in regulating mood23,24. The opponency hypothesis proposes that DA invigorates25,26 and 5HT suppresses27,28 behavioral activation to optimize reward seeking with respect to cognitive flexibility29, intertemporal choice tradeoffs28,30–32 or under the threat of punishment10,11,33,34, such that imbalances between these processes could lead to compulsion and addiction35,36. So far, however, efforts to test ideas about DA and 5HT interactions directly have been stymied by the difficulty of precisely manipulating multiple neuromodulatory systems at the same time. Here, we present a double transgenic strategy enabling genetic access to the brain’s DA and 5HT systems in a single animal and leverage this advance to reveal that the striatum integrates opponent ventral tegmental area DA (VTADA) and dorsal raphe 5HT (DR5HT) reward signals that coordinately control reinforcement during appetitive learning.

Simultaneous access to DA and 5HT circuits

First, we established a mouse model enabling orthogonal genetic access to DA and 5HT neurons by crossing the DAT-Cre37 and SERT-Flp38 mouse lines to produce DAT-Cre+/−;SERT-Flp+/− progeny (Fig. 1a). To evaluate the cell-type specificity of this mouse line, we injected DAT-Cre+/−;SERT-Flp+/− mice with viral vectors encoding Cre-dependent mCherry in the VTA and Flp-dependent EYFP in the DR (Fig. 1b, c), and analyzed the colocalization between these fluorophores and immunostains for DA and 5HT cell markers. We observed >90% colocalization between mCherry and tyrosine hydroxylase (TH) in the VTA, and between EYFP and tryptophan hydroxylase 2 (TpH) in the DR (Fig. 1d–i). Wild-type control mice injected with these viruses into the same target structures did not show detectable levels of mCherry or EYFP expression, further validating the specificity of our genetic targeting strategy (Extended Data Fig. 1a–c). DAT-Cre+/−;SERT-Flp+/− mice injected with the same viruses into the opposite target regions also showed negligible EYFP expression in the VTA and lacked mCherry expression in DR5HT neurons, confirming our ability to target distinct transgenes to the DA and 5HT systems independently (Extended Data Fig. 1d–g). Together, these data confirmed that DAT-Cre+/−;SERT-Flp+/− mice enable simultaneous, orthogonal genetic access to midbrain DA and 5HT systems.

Fig. 1: Mapping convergent DA and 5HT inputs to limbic structures involved in learning.


a, Schematic describing the generation of DAT-Cre+/−;SERT-Flp+/− mice enabling simultaneous and independent genetic access to DA and 5HT neurons. b, Viral strategy for labeling VTADA and DR5HT neurons in a single mouse. c, Example sagittal section depicting mCherry-expressing neurons in the VTA and EYFP-expressing neurons in the DR. d-e, Example images showing colocalization between Cre-dependent mCherry and TH in the VTA. f, Cell-type specificity quantification for VTADA neurons in DAT-Cre+/−;SERT-Flp+/− mice (n = 3 mice). g-h, Example images showing colocalization between Flp-dependent EYFP and TpH in the DR. i, Cell-type specificity quantification for DR5HT neurons in DAT-Cre+/−;SERT-Flp+/− mice (n = 3 mice). j, Example images of VTADA and DR5HT inputs to limbic structures. k, Relative colocalization between VTADA and DR5HT axons across limbic regions (left) and striatal subregions (right; n = 5 mice). l, Left, injection strategy to label projection-defined DA subsystems. Right, example image showing retrogradely labeled VTADA neurons. m, Left, injection strategy to label projection-defined 5HT subsystems. Right, example image showing retrogradely labeled DR5HT neurons. Note the lack of colocalization between Ctb-488 and the other two tracers in l-m. Data are shown as mean +/− s.e.m. and significance is denoted as *P<0.05, **P<0.01, ***P<0.001, ****P<0.0001. See Supplementary Table 1 for statistics.

Identifying sites of DA and 5HT convergence

Next, we performed anterograde axon tracing to identify hotspots of VTADA and DR5HT input convergence in limbic structures important for associative learning. Labeling VTADA neurons with mCherry and DR5HT neurons with EYFP revealed partially overlapping axon fields in the anterior cortex (Ant Ctx), basolateral amygdala (BLA), and nucleus accumbens (NAc) (Fig. 1j, Extended Data Fig. 2a–c). Colocalization between VTADA and DR5HT axons was lowest in frontal cortical areas, variable across subregions of the amygdala, and highest in the posterior medial shell of the NAc (NAcpmSh) (Fig. 1k and Extended Data Fig. 2d–i). Retrograde tracing with fluorescently tagged cholera toxin confirmed39–41 that DA inputs to the NAcpmSh arise from medial VTADA neurons that are distinct from NAc core- (NAccore) and lateral shell- (NAclatSh) projecting subpopulations (Fig. 1l). In agreement with other studies42,43, DR5HT inputs to the NAcpmSh were concentrated in the dorsomedial part of the DR, but surprisingly we found that these neurons are entirely distinct from previously described DR5HT subsystems44, comprising a unique third branch of the DR5HT system (Fig. 1m and Extended Data Fig. 3a–j). Based on these data, we focused on the NAcpmSh as the region best positioned to integrate convergent DA and 5HT signals during reward learning.

DA and 5HT dynamics during learning

We next asked how striatal inputs from VTADA and DR5HT neurons respond during an appetitive conditioning task. Using DAT-Cre+/−;SERT-Flp+/− mice, we expressed the red calcium indicator RCaMP2 in VTADA neurons and the green calcium indicator GCaMP6m in DR5HT neurons (Fig. 2a). This allowed us to simultaneously record the activity of VTADA and DR5HT axons in the NAcpmSh as mice learned a new cue-reward association (Fig. 2b–c and Extended Data Fig. 4a). In this task, the CS (a compound sound and port-light cue) predicted delivery of sucrose solution (US) into the reward port. Mice successfully acquired the CS-US association as evidenced by a reduced latency to collect the reward and an increase in the number of port entries following CS-onset late in training (Fig. 2d). Aligning the axon recordings to CS-onset revealed that neither VTADA nor DR5HT inputs to the NAcpmSh acquired strong responses to a reward-predictive cue (Fig. 2e–g), consistent with previous work showing this striatal subregion is innervated by DA neurons that lack a reward prediction error-like response profile39,45. By contrast, neuromodulatory inputs to the NAcpmSh showed robust US responses, with VTADA axons excited and DR5HT axons inhibited during reward consumption late in training (Fig. 2h–j). These inverse DA and 5HT reward responses could be observed simultaneously within individual animals, were consistent across mice, and were not explained by motion artefacts (Fig. 2k–l and Extended Data Fig. 4b–e).
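The event-aligned quantification described above (traces aligned to CS onset or reward consumption, then summarized as a max or min Z-score) can be illustrated with a minimal sketch. The exact preprocessing used in this study is not described in this section, so the baseline choice (Z-scoring against the pre-event samples), the function names, and the window lengths below are all assumptions made for illustration only.

```python
from statistics import mean, stdev

def peri_event_zscore(signal, event_idx, pre, post):
    """Z-score a window around one event.

    Assumption: the `pre` samples immediately before the event serve as the
    baseline; the paper's actual normalization may differ.
    """
    start, stop = event_idx - pre, event_idx + post
    if start < 0 or stop > len(signal):
        raise ValueError("window extends beyond the recording")
    baseline = signal[start:event_idx]
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        raise ValueError("flat baseline; Z-score is undefined")
    return [(x - mu) / sigma for x in signal[start:stop]]

def response_extremes(z_trace, pre):
    """Return (max, min) of the post-event part of a Z-scored trace."""
    post_event = z_trace[pre:]
    return max(post_event), min(post_event)

# Hypothetical usage with a toy trace (not data from the study):
toy_trace = [0.0, 1.0, 0.0, 1.0, 3.0, 2.0]
z = peri_event_zscore(toy_trace, event_idx=4, pre=4, post=2)
peak, trough = response_extremes(z, pre=4)
```

Under this convention an excitatory response would be summarized by the post-event maximum and an inhibitory response by the post-event minimum.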

Fig. 2: Convergent DA and 5HT inputs to NAcpmSh show inverse responses to rewards.


a, Surgical strategy to record VTADA and DR5HT axon calcium activity in the NAc. b, Confocal image of the recording site from an example mouse. c, Schematics of the fiber photometry system (left) and Pavlovian conditioning task (right). d, By the end of training, mice had acquired the CS-US association as indicated by a decreased latency to collect rewards following CS-onset (left) and an increased number of anticipatory port entries made during the first 5 s of the CS before US delivery (right; n = 5 mice). e-j, Population recordings and max/min Z-score quantifications of VTADA and DR5HT axon calcium activity aligned to CS onset (top) or reward consumption (bottom) during early (left) and late (right) training. Neither VTADA nor DR5HT axons showed a CS response at any stage of training (e-g). Late in training, VTADA axons were excited by rewards while DR5HT axons were inhibited (h-j). k-l, Simultaneously recorded VTADA and DR5HT axon calcium responses in an individual mouse during late training aligned to CS-onset (k) and reward consumption (l). m, There was no correlation between the relative timing of the RCaMP2 max and GCaMP6 min (left), but we observed a negative correlation between the magnitude of the RCaMP2 max and the GCaMP6 min (right). n, Surgical strategy (left) and example image of the recording site (right) for GRAB sensor recordings in the NAcpmSh. o-p, DA release increased (o) and 5HT release decreased (p) following reward consumption. In d-j and m, n = 5 mice. In o-p, n = 5 mice per group. Data are shown as mean +/− s.e.m. and significance is denoted as *P<0.05, **P<0.01, ***P<0.001, ****P<0.0001. See Supplementary Table 1 for statistics.

Given that the excitatory VTADA axon response began before mice entered the reward port, we wondered if DA release during reward approach could be driving the inhibitory response observed in DR5HT axons during reward consumption. Leveraging our simultaneous two-color recording design, we compared the relative timing and magnitudes of the two US responses. Correlation analyses showed that although the peak of the VTADA signal generally preceded the trough of the DR5HT response, the relative timing of one signal was not predictive of the other, and in fact greater excitatory VTADA reward responses were associated with smaller, not larger, inhibitory DR5HT responses to rewards (Fig. 2m). These findings suggest that DR5HT axons in the NAcpmSh are not simply responding to VTADA signals but rather separately encoding reward consumption in a manner opposite to striatal DA inputs.
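The magnitude comparison described above can be sketched as a Pearson correlation between the size of the DA peak and the size of the 5HT trough. This is only an illustration of the logic: the statistical test actually used is listed in Supplementary Table 1, not here, and the function name and the toy numbers below are hypothetical rather than data from the study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical values (Z-score units), invented for illustration only.
da_peaks = [1.0, 2.0, 3.0, 4.0]          # RCaMP2 max
sht_troughs = [-4.0, -3.0, -2.5, -1.0]   # GCaMP6 min (negative = inhibition)

# Compare magnitudes: a negative r means larger DA peaks accompany
# smaller (less negative) 5HT troughs.
r = pearson_r(da_peaks, [abs(t) for t in sht_troughs])
```

Correlating the peak against the absolute value of the trough keeps the sign interpretation aligned with the text: a negative coefficient corresponds to larger DA responses paired with smaller 5HT inhibitions.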

To directly test whether, as expected, the inverse activity profiles in VTADA and DR5HT axons drive corresponding changes in DA and 5HT release, we injected mice in the NAcpmSh with viral vectors carrying the genetically encoded sensors, GRAB-DA or GRAB-5HT, and performed photometry recordings as mice consumed randomly delivered rewards (Fig. 2n and Extended Data Fig. 5a). On average, reward consumption led to an increase in DA release and decrease in 5HT release, which persisted for several seconds or more in every mouse we studied (Fig. 2o,p and Extended Data Fig. 5b,c). Interestingly, the 5HT reward response was variable over time, showing a mix of very weak excitatory and inhibitory responses across trials during the first few days of the task (Extended Data Fig. 5d) that was similar to what we observed in our axon recordings during the early training phase of appetitive conditioning (Fig. 2h). Notably, the GRAB 5HT reward response developed into a strong and consistent inhibition over time (Extended Data Fig. 5e) even though rewards were never paired with predictive cues. Thus the change in the 5HT reward responses appears to be independent of learning. In sum, our axon and GRAB sensor recording experiments suggest that the striatum receives dramatically different, inverse DA and 5HT responses that converge in the NAcpmSh during reward consumption.

Learning requires opposite DA and 5HT signals

Similar to most neural recording studies, the previous experiments monitoring DA and 5HT input activity were correlational in nature. To test for a causal relationship between inverse VTADA and DR5HT reward responses in appetitive learning, we next performed a loss-of-function experiment designed to blunt either DA or 5HT reward signals alone, or both together, with two-color optogenetics. Using DAT-Cre+/−;SERT-Flp+/− mice, we expressed halorhodopsin (NpHR) or an EYFP control in VTADA neurons, and channelrhodopsin (ChR2) or an EYFP control in DR5HT neurons in a two-by-two design, and implanted bilateral optical fibers in the NAcpmSh (Fig. 3a–b and Extended Data Fig. 6a–h). This design enabled us to use long pulses of red light to blunt excitatory VTADA reward responses and short pulses of blue light to blunt inhibitory DR5HT reward responses in the same animals precisely during US consumption in an appetitive Pavlovian conditioning task (Fig. 3c).

Fig. 3: Blunting convergent DA and 5HT reward responses disrupts learning and reinforcement.


a, Optogenetic strategy to blunt VTADA and/or DR5HT reward responses during learning. b, Images of injection/implantation sites from an example mouse. c, Schematic of the Pavlovian conditioning task. d, Number of rewards obtained over training. Inset, average rewards across days. e, Number of port entries over training. Inset, port entries on the final day of training. f, Median latency to enter the port after CS-onset over training. Inset, median latency across days. g, Probability of occupying the reward port as a function of time within a trial. Inset, average probability during the CS period. h, Baseline-normalized probability of occupying the reward port as a function of time within a trial. Inset, average normalized probability during the CS period. i, Percentage of time that mice occupied the port during each period of a trial, relative to the baseline period. j, Number of port entries during an extinction session. k, Latency to enter reward port after CS-onset during an extinction session. l-m, Same as h, but during an extinction session. n, Percentage of time that mice occupied the port during first 5 trials of the extinction session. o, Distance traveled in the open field. p, Time spent per chamber in the real-time place preference test. q, Average body weight during training days. r, Amount of sucrose solution consumed during a free-access reward task. In d-n and q: EYFP/EYFP, n = 10 mice; NpHR/EYFP, n = 8 mice; EYFP/ChR2, n = 8 mice; NpHR/ChR2, n = 9 mice. In o-p: EYFP/EYFP, n = 9 mice; NpHR/EYFP, n = 8 mice; EYFP/ChR2, n = 7 mice; NpHR/ChR2, n = 9 mice. In r: EYFP/EYFP, n = 9 mice; NpHR/EYFP, n = 8 mice; EYFP/ChR2, n = 8 mice; NpHR/ChR2, n = 9 mice. Data are shown as mean +/− s.e.m. and significance is denoted as *P<0.05, **P<0.01, ***P<0.001, ****P<0.0001. See Supplementary Table 1 for statistics.

We found that mice from the control (EYFP/EYFP) group performed hundreds of port entries per session, earned an increasing number of rewards across days, and collected rewards more quickly over time by learning to perform anticipatory port entries following CS-onset (Fig. 3d–f). Mice in the single opsin groups – in which either DA inputs were inhibited (NpHR/EYFP) or 5HT inputs were excited (EYFP/ChR2) – earned a similar number of rewards to the control group despite making fewer port entries (Fig. 3d,e). They also learned to collect rewards more efficiently over time by making anticipatory port entries following CS-onset, although mice in the EYFP/ChR2 group were slower to acquire this learned response compared to EYFP/EYFP controls (Fig. 3f). By contrast, mice in the double opsin (NpHR/ChR2) group – in which both optogenetic manipulations occurred simultaneously – obtained fewer rewards compared to EYFP/EYFP controls and made the fewest port entries of all (Fig. 3d,e). Crucially, mice in this group took the longest to enter the port following CS-onset, suggesting a disruption in the acquisition of a conditioned response (Fig. 3f).

Additional analyses focusing on the dynamics of the animals’ behavior within a trial supported the conclusion that the coordinated reduction of both neuromodulatory reward signals impaired the ability to learn the CS-US association to a greater extent than the reduction of the DA and 5HT reward responses independently. Specifically, mice in both the control and single opsin groups late in training showed strong evidence of having acquired the CS-US association as evidenced by an increase in their probability of occupying the reward port following CS-onset (Fig. 3g). In contrast, mice in the double opsin group did not adapt their reward seeking behavior in response to CS-onset and were least likely to occupy the reward port during the anticipatory period between CS-onset and reward delivery (Fig. 3g). Furthermore, despite differences in their likelihood of occupying the port at baseline (Fig. 3g), mice in the single opsin groups showed equivalent levels of responding to the CS compared to controls (Fig. 3h,i), whereas mice in the NpHR/ChR2 group exhibited minimal learning as indicated by the near complete absence of a behavioral response to the CS before reward delivery (Fig. 3h,i).
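The within-trial measures described above (probability of occupying the port as a function of time, and its baseline-normalized version) can be sketched as follows. How the normalization was actually performed is not stated in this section; the sketch assumes simple subtraction of the mean pre-CS probability, and the function names and binning are hypothetical.

```python
def occupancy_probability(trials):
    """Fraction of trials in which the mouse is in the port, per time bin.

    `trials` is a list of equal-length lists of 0/1 flags (1 = in port).
    """
    if not trials:
        raise ValueError("no trials given")
    n_bins = len(trials[0])
    if any(len(t) != n_bins for t in trials):
        raise ValueError("all trials must have the same number of bins")
    return [sum(t[b] for t in trials) / len(trials) for b in range(n_bins)]

def baseline_normalize(prob, n_baseline_bins):
    """Subtract the mean probability of the first `n_baseline_bins` bins.

    Assumption: subtraction is used here; division by baseline would be an
    equally plausible convention.
    """
    if not 0 < n_baseline_bins <= len(prob):
        raise ValueError("invalid baseline length")
    baseline = sum(prob[:n_baseline_bins]) / n_baseline_bins
    return [p - baseline for p in prob]

# Hypothetical usage: two trials, four bins, first two bins are pre-CS.
toy_trials = [[0, 0, 1, 1], [0, 1, 1, 1]]
prob = occupancy_probability(toy_trials)       # [0.0, 0.5, 1.0, 1.0]
normalized = baseline_normalize(prob, 2)       # [-0.25, 0.25, 0.75, 0.75]
```

Normalizing to baseline separates a CS-evoked change in port occupancy from group differences in how often mice sit in the port before the cue.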

These differences were also observed in a subsequent extinction test, during which mice in the double opsin group made fewer port entries and continued to show a blunted behavioral response to the CS compared to mice in the other three groups, even in the absence of any optogenetic manipulations on this day (Fig. 3j–n). These results demonstrate that although blunting DA or 5HT reward responses individually in the NAcpmSh can drive modest behavioral deficits in an appetitive learning task, loss-of-function manipulations targeting both neuromodulatory responses produce a dramatic disruption in the ability to form new associations.

Importantly, the effects of our loss-of-function manipulations on learning could not be explained by locomotor or affective (i.e. valence) effects of our optogenetic manipulations because in the same subjects these manipulations had no effects in real time place preference (RTPP) or open field test (OFT) control assays (Fig. 3o–p and Extended Data Fig. 6i–j). Furthermore, there were no differences between groups in the extent of the modest food restriction we used (Fig. 3q), suggesting no differences in the animals’ physiological drive for sucrose rewards. However, when the same mice were given free access to sucrose rewards paired with the optogenetic manipulations, we observed a trend toward reduced reward consumption in the NpHR/EYFP group and significantly reduced sucrose consumption in the EYFP/ChR2 and NpHR/ChR2 groups (Fig. 3r). Thus, although there may have been a difference in the relative potency of our single opsin manipulations – likely because the inhibitory optogenetic manipulations with NpHR were less effective than excitatory manipulations with ChR2 (Extended Data Fig. 7a,b) – our data suggest that the loss-of-function manipulations blunted learning by reducing the hedonic properties of the sucrose US.

Integration of DA and 5HT signals drives learning

Our results thus far suggest that optimal associative reward learning requires coordinated and opponent changes in DA and 5HT reward signals, which may regulate the hedonic properties of a reward. This hypothesis predicts that optogenetically reproducing VTADA and DR5HT reward signals (Extended Data Fig. 7c) together should drive new learning more potently than either manipulation alone in the absence of a natural reward. To test this prediction, we expressed ChR2 or EYFP in VTADA and NpHR or EYFP in DR5HT neurons of DAT-Cre+/−;SERT-Flp+/− mice in a two-by-two design and implanted them with bilateral optical fibers in the NAcpmSh (Fig. 4a,b and Extended Data Fig. 8a–h). We tested these mice on an optogenetic conditioned place preference (CPP) assay where one side of a two-chambered box was paired with optostimulation/inhibition and the other side was paired with no optogenetic manipulation (Fig. 4c). After two conditioning sessions, neither DR5HT inhibition nor VTADA stimulation in the NAcpmSh was individually sufficient to drive CPP in the single opsin groups (EYFP/NpHR, ChR2/EYFP). However, both manipulations delivered together produced appetitive conditioning in every mouse that expressed both opsins (ChR2/NpHR; Fig. 4d–e), suggesting that integration of inverse VTADA and DR5HT reward responses drives learning more strongly than either manipulation alone.

Fig. 4: Integration of opponent DA and 5HT reward responses drives new learning.


a, Optogenetic strategy to reproduce VTADA and/or DR5HT reward responses. b, Images of injection/implantation sites from an example mouse. c, Schematic of the CPP tasks. d-e, VTADA excitation and DR5HT inhibition together, but not either manipulation alone, produced CPP. f-h, Neither DR5HT inhibition alone (f) nor VTADA stimulation alone (g) produced CPP in the same mice that previously showed CPP for both manipulations together (h). Purple bars in h represent the same data as the purple bars in e. i, VTADA stimulation together with DR5HT inhibition produced a greater real-time place preference than either manipulation alone. j, VTADA stimulation alone, or together with DR5HT inhibition, increased locomotion in the open field test. k, Schematic of the optogenetic conditioning paradigm with three CS- (compound sound and port light cues) US- (VTADA stimulation and/or DR5HT inhibition) pairs. l, Mice acquired conditioned approach responses to CSs paired with VTADA stimulation alone, DR5HT inhibition alone, or both manipulations together (left), but conditioned responses were more accurate for the CS paired with both manipulations together. m, After training, CSs paired with VTADA stimulation alone, DR5HT inhibition alone, or both manipulations together all functioned as conditioned reinforcers, but responding was above chance level only for the CS paired with both manipulations together. n, During the primary reinforcement test, mice preferred VTADA stimulation and DR5HT inhibition delivered together to either manipulation alone. In d-i, n = 6 mice per group. In k-l: EYFP/EYFP, n = 6 mice; EYFP/NpHR, n = 6 mice; ChR2/EYFP, n = 5 mice; ChR2/NpHR, n = 6 mice. In n-p, n = 10 mice. In j, m-n dashed lines represent chance levels. Data are shown as mean +/− s.e.m. and significance is denoted as *P<0.05, **P<0.01, ***P<0.001, ****P<0.0001. See Supplementary Table 1 for statistics.

Leveraging our ability to manipulate DA signaling, 5HT signaling, or both together in the same mouse, we tested this hypothesis again in a within-subjects design by running mice from the ChR2/NpHR group through the CPP assay two more times with only blue and only red light while using distinct contextual cues in each iteration of the test to minimize the chances of carryover learning. Consistent with previous results, two conditioning sessions with VTADA stimulation alone or DR5HT inhibition alone in the NAcpmSh were insufficient to produce CPP on their own in the same mice that exhibited a robust CPP effect when the identical optogenetic manipulations were delivered together (Fig. 4f–h).

Because our loss-of-function manipulations appeared to disrupt learning by blunting the rewarding effects of a sucrose reinforcer, we predicted that simultaneous VTADA stimulation and DR5HT inhibition could be eliciting CPP by driving a stronger rewarding effect than either manipulation alone. We tested this hypothesis in a real-time place preference experiment, where mice could directly choose between spending time in a chamber paired with optostimulation/inhibition or another chamber paired with nothing. Although there was a trend toward a place preference in response to VTADA stimulation alone, only both manipulations together produced robust reinforcement (Fig. 4i). Importantly, these findings could not be explained by limitations of the optogenetic manipulations because the same optogenetic stimulation/inhibition parameters were individually sufficient to produce corresponding changes in DA and 5HT release in the NAcpmSh (Extended Data Fig. 7c), and in the case of VTADA stimulation, to drive a hyperlocomotor effect in the OFT (Extended Data Fig. 8i).

In further experiments manipulating VTADA and DR5HT inputs to the NAcpmSh, we examined a more complex form of associative learning. ChR2/NpHR mice from the double opsin group were placed into an operant chamber with three ports, each equipped with its own port light. Mice were then presented with three cue-outcome pairings where compound sound and port-light CSs predicted delivery of optogenetic USs consisting of VTADA stimulation alone, DR5HT inhibition alone, or both together (Fig. 4k). Crucially, US delivery was not contingent upon any port entry, allowing us to use port entries as a measure of conditioned approach, a behavioral read-out of learning. We found that by the final days of the task, mice had developed conditioned approach responses to all three CSs, suggesting that with extensive training (far beyond the two conditioning sessions used in our CPP experiments) VTADA stimulation alone, DR5HT inhibition alone, and both together in the NAcpmSh can all drive some degree of learning. Nevertheless, we found that the animals’ conditioned responses were most accurate in response to the CS that predicted both manipulations together, despite the mice being trained on an equivalent number of trials of each type (Fig. 4l and Extended Data Fig. 8j).

As an additional test of our hypothesis that VTADA stimulation and DR5HT inhibition together drive learning more strongly than either manipulation alone, we tested the same mice in a conditioned reinforcement assay where entries into each of the reward ports triggered brief presentations of the CS paired with that port, but no USs (i.e., optogenetic manipulations) were delivered. This assay measured the work subjects exerted to receive each of the three CSs, allowing direct comparison of the incentive value that each CS acquired over the course of training. Mice worked to obtain presentation of all three CSs, but performance exceeded chance levels only for the CS that predicted VTADA stimulation delivered together with DR5HT inhibition (Fig. 4m). Finally, we compared the relative potency of VTADA stimulation alone, DR5HT inhibition alone, or both together as primary reinforcers by giving mice the option to perform port entries to receive any one of these three manipulations. Again, we found that mice strongly preferred to receive the two optogenetic manipulations together compared to either one alone (Fig. 4n). We conclude that inverse VTADA and DR5HT signals in the NAcpmSh exert opponent control over reward and are integrated to drive contextual, Pavlovian, and instrumental forms of learning.

DA-5HT opponency generalizes to the NAccore

Our experiments thus far focused on the integration of DA and 5HT reward signals in the NAcpmSh, the region that receives the greatest density of converging midbrain DA and 5HT inputs. However, this area is thought to be molecularly and functionally distinct from other striatal subregions46–48. Notably, the NAcpmSh is innervated by VTADA neurons that are distinct from NAc core- (NAccore) and lateral shell- (NAclatSh) projecting populations39–41, lack a classical reward prediction error-like response profile39, and are excited by punishments as well as rewards45,49. We replicated several of these findings (Fig. 1l, Fig. 2e–j, Extended Data Fig. 4d), thereby confirming that our experiments targeted a NAc subregion innervated by non-canonical DA neurons, activation of which had surprisingly weak effects on learning and reinforcement in CPP and RTPP tasks. The unusual properties of the VTADA inputs to NAcpmSh raise the possibility that the opponent relationship between DA and 5HT signaling that we uncovered may be unique to this brain region rather than a more general mechanism driving associative learning.

To address this possibility, we repeated key experiments in the NAccore, the striatal region most extensively studied in the context of reward learning and in which DA release is potently reinforcing3,50 and has consistently been found to encode reward prediction errors21,22,51,52. Because excitatory DA responses to rewards are documented to occur throughout the striatum53 and the source of VTADA inputs to the NAccore is known, the key result about the generalizability of our conclusions hinged on the activity of the NAccore 5HT inputs. To identify the source of these inputs, we injected retrograde tracers tagged with different fluorophores into the NAcpmSh, NAccore, and NAclatSh (Fig. 5a,b). NAcpmSh-projecting DR5HT neurons overlapped extensively with NAccore projectors and partially with NAclatSh-projectors, indicating that individual DR5HT neurons often sent axon collaterals to more than one striatal target region (Fig. 5c). This suggests that 5HT input dynamics are likely to be conserved between the NAcpmSh and NAccore.

Fig. 5: Opponent control of reinforcement by DA and 5HT generalizes to the NAccore.


a, Surgical strategy to label DR5HT neurons projecting to different NAc subregions. b, Example image of the injection sites. c, Example images of retrogradely labeled DR5HT neurons showing colocalization between NAcpmSh-, NAccore-, and NAclatSh- projecting subpopulations (n = 2 mice). d-e, Surgical strategy (d) and example image of the recording site (e) for GRAB sensor recordings in the NAccore. f-g, DA release increased (f, n = 3 mice) and 5HT release decreased (g, n = 4 mice) following reward consumption. h, Optogenetic strategy to reproduce VTADA and/or DR5HT reward responses in the NAccore. i, Images of injection/implantation sites from an example mouse. j, Schematics of the RTPP tasks. k, DR5HT inhibition alone did not produce RTPP (EYFP/EYFP, n = 5 mice; ChR2/NpHR, n = 6 mice). l, VTADA stimulation drove RTPP in the ChR2/NpHR group (right, n = 6 mice) but not in EYFP/EYFP controls (left, n = 5 mice). m, Mice in the ChR2/NpHR group (right, n = 5–6 mice), but not the EYFP/EYFP group (left, n = 5 mice), preferred VTADA stimulation together with DR5HT inhibition compared to VTADA stimulation alone. n, Difference score for the red-light only RTPP experiment in k. o, Same as n but for the 12 mW experiment in l. p, Same as n, but for the 3 mW experiment in m. In n-p: EYFP/EYFP, n = 5 mice; ChR2/NpHR, n = 6 mice. q-r, In a 4-choice RTPP task, mice showed a within-session place preference for VTADA stimulation delivered together with DR5HT inhibition compared to either manipulation alone (n = 5 mice). Data are shown as mean +/− s.e.m. and significance is denoted as *P<0.05, **P<0.01, ***P<0.001, ****P<0.0001. See Supplementary Table 1 for statistics.

To test this prediction, mice were injected with viral vectors encoding GRAB-DA or GRAB-5HT into the NAccore and implanted with an optical fiber above the injection site (Fig. 5d,e and Extended Data Fig. 9a,b). Fiber photometry recordings aligned to consumption of randomly delivered rewards revealed the expected increase in DA release and decrease in 5HT release (Fig. 5f,g and Extended Data Fig. 9c,d), which was comparable in timing and magnitude to the 5HT recordings obtained from the NAcpmSh (Fig. 2p).

Next, we examined whether the inverse DA and 5HT reward responses in the NAccore exert opponent control over reinforcement as demonstrated for the NAcpmSh. DAT-Cre+/−;SERT-Flp+/− mice were used to allow expression of ChR2 or EYFP in VTADA neurons and NpHR or EYFP in DR5HT neurons, and bilateral optical fibers were implanted above the NAccore (Fig. 5h,i and Extended Data Fig. 10a–d). We hypothesized that in the NAccore, DR5HT inhibition would enhance the reinforcing properties of VTADA stimulation in an RTPP task. To avoid potential ceiling effects in the ability to measure strong place preferences, we repeated experiments using a broad range of optogenetic stimulation light powers (Fig. 5j). As observed in the NAcpmSh, DR5HT inhibition in the NAccore alone was neither rewarding nor aversive (Fig. 5k). On the other hand, stimulating VTADA release in the NAccore caused a clear place preference in the ChR2/NpHR group, which scaled with the intensity of the stimulating LED light and which did not occur in EYFP/EYFP controls (Fig. 5l). Using the same mice, we next performed a series of RTPP experiments in which we paired one side of the RTPP apparatus with VTADA stimulation alone and the other side with VTADA stimulation together with DR5HT inhibition. At every level of light power, mice exhibited a consistent preference for both manipulations together as opposed to VTADA stimulation alone, an effect not observed in EYFP/EYFP controls (Fig. 5m–p).

As a final direct test of the generalizability of our conclusion that striatal DA and 5HT signaling exert opponent control over reinforcement, we offered mice in the ChR2/NpHR group a direct choice in the same behavioral session between VTADA stimulation alone, DR5HT inhibition alone, neither, or both together in a 4-choice RTPP task (Fig. 5q). We found that in the NAccore, where VTADA stimulation is strongly rewarding on its own3,50, VTADA stimulation delivered together with DR5HT inhibition was a dramatically more potent reinforcer than stimulation of the same VTADA axons alone (Fig. 5r). These results demonstrate that inverse DA and 5HT reward signals work in an opponent manner to regulate reward learning in two distinct striatal subregions, including one commonly thought to act as a key locus for learning and reinforcement3,21,22,50,51,54.

Discussion

We have presented evidence that optimal associative learning requires the integration of inverse VTADA and DR5HT reward signals that converge in the striatum. How do these findings fit with proposed models of DA and 5HT interactions? Overall, our data support theories of DA and 5HT opponency10,11,13, albeit with some important modifications. For example, our optogenetic manipulation experiments suggest that opposing VTADA and DR5HT responses drive learning about reward-predictive cues in CPP and Pavlovian conditioning assays, but they do not support earlier notions that this effect may be driven by neuromodulatory control of behavioral activation. If this were true, one would expect manipulations of both VTADA and DR5HT NAc inputs to affect locomotion, but this was not observed when we excited or inhibited DR5HT inputs alone during the open field test. Pairing VTADA stimulation with DR5HT inhibition during the open field test also did not enhance the locomotor response induced by VTADA stimulation alone as would be expected if DA and 5HT were coordinately controlling general behavioral activation.

Furthermore, the design of our optogenetic conditioning experiment where DA stimulation alone, 5HT inhibition alone, or both together were paired with different CS cues offered internal controls for unpaired presentations between each of the CSs and the USs that they did not predict. In this context, the finding that learning was specifically enhanced for the CS associated with both manipulations together rules out the possibility that DA and 5HT signals are working by regulating a general arousal or motivational state. Instead, our findings that integration of inverse VTADA and DR5HT signals produced robust real-time place preference, and that blunting neuromodulatory responses reduced reward consumption, suggest that striatal VTADA and DR5HT inputs provide coordinated, opponent modulatory control over reinforcement. This is consistent with work showing that inhibition of serotonin signaling, both systemically36 and in the dorsal striatum35, potentiates the reinforcing properties of a cocaine reward, and with longstanding studies showing 5HT releasing drugs dampen the reinforcing potency of amphetamines55,56. Nevertheless, our experiments cannot formally exclude the possibility that slow fluctuations in tonic levels of DA and 5HT may also carry information about reward expectation or value12,14,25 as proposed by some synergy theories.

In summary, we developed new tools and strategies for studying DA and 5HT signaling in the same mouse that facilitate the investigation of neuromodulatory interactions in ways not previously possible. Our results show that opponency on short timescales is at least one way that DA and 5HT systems shape motivated behavior, and support an updated opponency model where the integration of inverse DA and 5HT signals governs learning through reinforcement – a finding that held for contextual, Pavlovian, and instrumental forms of conditioning and generalized across two striatal regions. We hope this work will motivate further investigations into how the combinatorial actions of DA and 5HT elsewhere in the brain shape motivated behaviors thought to be under multiplexed neuromodulatory control, including aversive learning4,15 and sociability57,58, with the potential to deepen our understanding of neuropsychiatric disorders characterized by dysfunction in these behaviors.

METHODS

Mice

Adult (>6 week old) wild-type C57BL/6J (Jackson Laboratory #000664), SERT-Cre (Mouse Mutant Resource and Research Centers, stock number: 017260-UCD, strain code: Tg(Slc6a4-cre)ET33Gsat/Mmucd), and DAT-Cre+/−;SERT-Flp+/− (in-house cross between Jackson Laboratory lines #006660 and #034050 on a C57BL/6 background) mice of both sexes were used in this study. All mice had ad-libitum access to food and water unless otherwise specified, and all experimental procedures were approved by the Stanford University Administrative Panel on Laboratory Animal Care and the Administrative Panel on Biosafety.

Stereotactic surgeries

Mice were anesthetized with either isoflurane (4–5% induction, 1–2% maintenance) or a ketamine (60 mg/kg) and dexmedetomidine (0.6 mg/kg) cocktail. Within each experiment, the type of anesthesia used was the same for every mouse. A stereotaxic frame (Kopf instruments) was used to target injections and implantations to the following structures (coordinates are in mm, with AP and ML relative to bregma and DV relative to the skull surface): DR, AP −4.6, ML 0, DV −3.3; VTA, AP −3.4, ML +/− 0.2, DV −4.2; NAc medSh, AP +1.0, ML +/−0.7, DV −4.3 to −4.5; NAc core, AP +1.2, ML 0.9 to 1.2, DV −4.5; NAc latSh, AP +1.1, ML 1.8, DV −4.5; OFC, AP +2.6, ML 1.1, DV −2.5; CeA, AP −1.5, ML 2.5, DV −4.5. Viral vectors (~10^13 gc/mL for photometry experiments, ~10^12 gc/mL otherwise; from the Stanford Gene Vector and Virus core unless otherwise specified) and Ctb retrograde tracers (1 µg/µL; Invitrogen #C34777, #C34776, #C34775) were infused with a syringe pump (Harvard Apparatus) at a rate of 100–200 nL/min and allowed to diffuse for at least 5 min. Incubation times were >3 weeks for viruses and 1–3 weeks for retrograde tracers. Optical fibers (Doric Lenses, 200–400 µm diameter, 0.48–0.66 NA) were secured to the skull using screws (Antrin Miniature Specialties) and dental cement (Geristore). Optical fiber placements were histologically verified post-hoc, and mice with mistargeted fibers were excluded from further analysis.

Histology and Immunohistochemistry

Histological procedures were performed as previously described60. Mice were transcardially perfused with 4% (w/v) paraformaldehyde (PFA) in phosphate-buffered saline (PBS) and postfixed in PFA overnight. Coronal sections 50 µm thick were immunostained for tyrosine hydroxylase (primary: mouse anti-TH, Millipore #MAB318; secondary: goat anti-mouse 647, Invitrogen #A-21236 or goat anti-mouse 405, Invitrogen #A-31553), tryptophan hydroxylase 2 (primary: rabbit anti-TpH, Novus #NB100–74555; secondary: goat anti-rabbit 546, Invitrogen #A-11035 or goat anti-rabbit 405, Invitrogen #A-31556), EYFP (primary: chicken anti-GFP, Aves #GFP-1020; secondary: goat anti-chicken 488, Invitrogen #A-11039), and/or mCherry (primary: rat anti-mCherry, Invitrogen #M11217; secondary: goat anti-rat 594, Invitrogen #11007). Primary antibodies were used at a dilution of 1:1000 and incubated on a shaker overnight at room temperature. Secondary antibodies were used at a dilution of 1:750 and incubated on a shaker for 2 hours at room temperature.

Cell-type specificity analysis

DAT-Cre+/−;SERT-Flp+/− mice were injected with 500 nL of AAV-DJ-EF1a-DIO-mCherry unilaterally in the VTA and 500 nL of AAV-DJ-EF1a-fDIO-EYFP along the midline in the DR. Cell-type specificity analysis was then performed as previously described60. Briefly, coronal sections of the VTA were stained for TH and DR sections were stained for TpH. For every other section, 40x images (~300 µm × 300 µm, 40× 1.3 NA objective, Nikon A1 confocal microscope) were acquired in each of the dorsomedial, ventromedial, and lateral (unilateral for VTA, bilateral for DR) subregions of the DR or VTA. Cell-type specificity for VTADA and DR5HT neurons was defined as the fraction of mCherry+ cells that were TH+ and the fraction of EYFP+ cells that were TpH+, respectively. For the orthogonality control experiment, samples were prepared identically to those in the cell-type specificity experiment but with the targets for the viruses swapped, and alternating DR sections were stained for TH and TpH to confirm that Cre-dependent mCherry expression was restricted to DRDA neurons and not present in DR5HT neurons. For the specificity control experiment, samples were prepared identically to those in the cell-type specificity experiment but a wild-type C57BL/6J mouse was used.

Anterograde anatomical tracing

DAT-Cre+/−;SERT-Flp+/− mice were injected with 400 nL of AAV-DJ-EF1a-DIO-mCherry bilaterally into the VTA and 800 nL of AAV-DJ-EF1a-fDIO-EYFP into the DR. Coronal sections were immunostained for EYFP and mCherry to amplify the signal from labeled axons. Epifluorescence images of the Ant Ctx, anterior BLA (AP −0.8 mm to −1.5 mm), posterior BLA (AP −1.5 mm to −2.2 mm), anterior NAc (AP 1.2 mm to 1.6 mm), and posterior NAc (AP 0.8 mm to 1.2 mm; 6–8 samples/region/mouse) were acquired on a Keyence BZ-X800 microscope using a 10x objective and identical imaging settings for all samples. Image analysis was done with ImageJ. Each image was background subtracted twice (first with a rolling ball radius of 50 px, then 25 px) and the green and red channels were binarized using Otsu’s thresholding method61. To measure colocalization between VTADA and DR5HT axons, the binarized green and red images were multiplied together, yielding images that only had signal in pixels that were thresholded in both the green and red channels. Since pixels measured ~2 × 2 µm each and both 5HT and DA can signal via volume transmission62 over comparable distances63, we reasoned that this analysis would reveal regions with cells that are well positioned to receive convergent DA and 5HT inputs. Regions of interest were manually defined based on DAPI signal and the Paxinos mouse brain atlas64. For each region, the VTADA axon density, DR5HT axon density, and axon colocalization were measured as the fraction of segmented pixels in the red, green, and colocalization channels, respectively, and the data for each channel were normalized within mice (scale function in R with the center argument set to FALSE).
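The colocalization measure described above can be illustrated with a minimal Python sketch. It is not the ImageJ/R pipeline used in the study: the function names are hypothetical, images are treated as flat lists of 8-bit intensities, background subtraction is omitted, and the Otsu implementation is a generic textbook version. The final helper mimics R's scale(x, center = FALSE), which divides each value by the root mean square computed with an n − 1 denominator.

```python
import math

def otsu_threshold(pixels, levels=256):
    """Generic Otsu threshold for integer intensities in [0, levels)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor).
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(pixels):
    """1 for pixels above the Otsu threshold, 0 otherwise."""
    t = otsu_threshold(pixels)
    return [1 if p > t else 0 for p in pixels]

def axon_metrics(red, green):
    """Fraction of segmented pixels per channel and in their product."""
    red_mask, green_mask = binarize(red), binarize(green)
    # Multiplying the masks keeps only pixels segmented in both channels.
    coloc = [r * g for r, g in zip(red_mask, green_mask)]
    n = len(red)
    return {"da": sum(red_mask) / n,
            "5ht": sum(green_mask) / n,
            "coloc": sum(coloc) / n}

def scale_no_center(values):
    """Mimic R's scale(x, center=FALSE): divide by sqrt(sum(x^2)/(n-1))."""
    rms = math.sqrt(sum(v * v for v in values) / (len(values) - 1))
    return [v / rms for v in values]
```

In this sketch, the per-region fractions from one mouse would be collected into a list per channel and passed to scale_no_center so that regions can be compared across mice on a common scale.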

Retrograde anatomical tracing

Wild-type or DAT-Cre+/−;SERT-Flp+/− mice were injected with 200–300 nL each of Ctb-488, Ctb-555, and Ctb-647. To identify populations of VTADA and DR5HT neurons targeting different striatal subregions, retrograde tracers were injected in the NAc medSh, NAc core, and NAc latSh, respectively. VTA sections were immunostained for TH and DR sections were immunostained for TpH. To identify subsystems of DR5HT neurons, retrograde tracers were injected in the NAc medSh, CeA, and OFC, respectively, and DR sections were immunostained for TpH. For each DR section, the anteroposterior coordinate was estimated using the Paxinos mouse brain atlas64, 40x images (~300 µm × 300 µm, 40x objective, Nikon A1 confocal microscope) were acquired in each of the dorsomedial, ventromedial, lateral left, and lateral right subregions of the DR, and colocalization between the tracers and TpH was manually quantified. For retrograde tracing control experiments, mice were injected with 200–300 nL of a 1:1:1 cocktail of all three retrograde tracers in the NAc medSh and the DR was imaged and analyzed as above.

Two-Color Fiber Photometry Recordings of VTADA and DR5HT axons

DAT-Cre+/−;SERT-Flp+/− mice were injected with 1000 nL of AAV-DJ-EF1a-DIO-RCaMP2 unilaterally into the left VTA and 1000 nL of AAV-DJ-EF1a-fDIO-GCaMP6m along the midline in the DR and implanted unilaterally with an optical fiber in the NAc medSh (ipsilateral to the VTA injection). Data collection and preprocessing followed procedures described previously65 that were modified to enable simultaneous recording in the red and green channels. Briefly, mice were connected through low-autofluorescence patch cords and a connectorized fluorescence minicube (Doric #FMC6_IE(400–410)_E1(460–490)_F1(500–540)_E2(555–570)_F2(590–680)) to a photodetector (Newport #2151). GCaMP and RCaMP sensors were excited by frequency modulated 405 nm, 465 nm, and 560 nm light with LED power settings held constant across all mice. The resulting fluorescence signals were sampled at 1.0173 kHz, band pass filtered, demodulated, and recorded using a signal processor (RZ5P, Tucker-Davis Technologies). GCaMP and RCaMP traces were then each detrended by subtracting the linear fit of that trace to the 405 nm channel, down-sampled by a factor of 10, digitally filtered, aligned to behavioral events, and z-scored relative to a baseline period from 10 to 4 seconds before the event of interest. Traces aligned to each event of interest were averaged within mice and smoothed using local regression over a sliding window of 100 ms. For the correlation analyses in Fig. 2m, loess smoothing was applied and the timing and magnitude of the extrema for the RCaMP2 and GCaMP6 traces were calculated for each trial and averaged within mice. Recordings were made as mice performed a Pavlovian conditioning task that is detailed in the corresponding section below. One mouse in which the optical patch cord often slipped off the mouse’s implanted ferrule during recording was excluded from further analysis.
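The detrending and baseline z-scoring steps described above can be sketched in Python as follows. This is an illustrative reconstruction, not the analysis code used in the study: the function names are hypothetical, down-sampling, filtering, and smoothing are omitted, and the choices of a half-open baseline window and a population standard deviation are assumptions.

```python
from statistics import mean, pstdev

def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def detrend(signal, reference):
    """Subtract the linear fit of the signal to the 405 nm reference."""
    slope, intercept = linear_fit(reference, signal)
    return [s - (slope * r + intercept) for s, r in zip(signal, reference)]

def zscore_to_baseline(trace, times, start=-10.0, end=-4.0):
    """Z-score an event-aligned trace against a pre-event baseline.

    times are in seconds relative to the event; the baseline is taken as
    start <= t < end (assumed boundary handling).
    """
    base = [v for v, t in zip(trace, times) if start <= t < end]
    mu, sd = mean(base), pstdev(base)
    return [(v - mu) / sd for v in trace]
```

A signal that is a purely linear function of the reference detrends to zero, which is the intended behavior: only fluctuations not shared with the 405 nm channel survive.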

Fiber Photometry Recordings with GRAB sensors

GRAB sensor recordings were performed in DAT-Cre+/−;SERT-Flp+/− mice injected with 600 nL of AAV9-hSyn-GRABDA2m or AAV9-hSyn-GRAB5HT3.0 (WZBiosciences) in the NAcpmSh or NAccore and implanted unilaterally with an optical fiber directly above the injection site. Hemispheres were counterbalanced across mice. Mice in the NAcpmSh group received additional viral injections and fiber implants to enable optogenetic manipulations during GRAB recordings (see below), but the reward response GRAB recording data in Fig. 2n–p and Extended Data Fig. 5b–e were collected before any optogenetic manipulations were ever performed on these mice. Photometry recordings were collected as described above for two-color axon recordings, except that only the 465 nm excitation LED was used. The data were analyzed as described above for two-color axon recordings, except that no detrending was performed because the GRAB sensors employed do not exhibit isosbestic excitation in the 405 nm range. Recordings were performed over 4–7 days as mice consumed 32% sucrose rewards delivered on a variable interval schedule without any predictive cues.

GRAB Sensor Validation of Optogenetic Manipulations

To validate optogenetic manipulations of VTADA neurons, DAT-Cre+/−;SERT-Flp+/− mice were injected bilaterally with 400 nL of AAV-DJ-DIO-ef1a-ENPAC2.0-YFP in the VTA (for a total of 800 nL), and unilaterally with 600 nL of AAV9-hSyn-GRABDA2m (WZBiosciences) into the NAcpmSh (with hemispheres counterbalanced across mice). An optical fiber was implanted directly above the GRAB sensor injection site, and a subset of these mice also received a second optical fiber implant above the VTA. To validate optogenetic manipulations of DR5HT neurons, SERT-Cre hemizygous mice were injected along the midline with 800 nL of AAV-DJ-DIO-ef1a-ENPAC2.0-YFP in the DR, and unilaterally with 600 nL of AAV9-hSyn-GRAB5HT3.0 (WZBiosciences) into the NAcpmSh (with hemispheres counterbalanced across mice). An optical fiber was implanted directly above the GRAB sensor injection site, and a subset of these mice also received a second optical fiber implant above the DR. Recordings took place during two sessions per day. During the first session of each day, mice consumed 32% sucrose rewards delivered on a variable interval schedule in the absence of any predictive cues and without any optogenetic manipulations. After the conclusion of the first session each day, mice immediately underwent one of two types of test sessions. The first type of test session was designed to validate the effects of our loss-of-function manipulations. During this test, mice continued to consume unpaired rewards on a variable interval schedule, but reward consumption was paired with VTADA axon inhibition (2 s on, 0.5 s off pattern for a total of 5 s, 6–8 mW) for the GRAB-DA group and with DR5HT soma excitation (5 ms pulses at 20 Hz for 5 s, ~12 mW) for the GRAB-5HT group. The second type of test session was designed to validate the effects of our gain-of-function manipulations. During this test, mice in the GRAB-DA group received VTADA stimulation (5 ms pulses at 20 Hz for 5 s, ~12 mW) and mice in the GRAB-5HT group received DR5HT axon inhibition (2 s on, 0.5 s off pattern for a total of 5 s, 6–8 mW) on a variable interval schedule and in the absence of any sucrose rewards. Thus, these assays allowed us to directly compare, in the same mice and on the same day, the GRAB-DA and GRAB-5HT reward responses to sucrose rewards with the GRAB-DA and GRAB-5HT responses to our optogenetic gain-of-function manipulations alone or to sucrose reward consumption paired with our optogenetic loss-of-function manipulations. For experiments involving optogenetic excitation of VTA or DR cell bodies, optical fiber placement in the midbrain was validated by recording GRAB-DA or GRAB-5HT responses in the NAcpmSh while delivering optogenetic stimulation to the corresponding cell bodies in the midbrain, and only mice that showed a consistent GRAB sensor response to the optogenetic stimulation were included in subsequent analyses. Photometry analysis of GRAB sensor recordings was performed as described above, except that data were averaged over trials pooled across mice.
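The two light-delivery patterns named above (5 ms pulses at 20 Hz for 5 s, and a 2 s on, 0.5 s off pattern for a total of 5 s) can be written out as explicit on/off intervals. The sketch below is only an illustration of the timing arithmetic; the function names are hypothetical, it is unrelated to the pulser hardware or software used in the study, and it assumes that the final cycle is truncated at the total duration.

```python
def pulse_train(pulse_ms, freq_hz, duration_s):
    """(on, off) times in ms for brief pulses at a fixed frequency."""
    period_ms = 1000 / freq_hz
    n_pulses = int(duration_s * freq_hz)
    return [(i * period_ms, i * period_ms + pulse_ms) for i in range(n_pulses)]

def on_off_pattern(on_s, off_s, total_s):
    """(on, off) times in s for repeated on/off cycles, cut at total_s."""
    intervals, t = [], 0.0
    while t < total_s:
        intervals.append((t, min(t + on_s, total_s)))
        t += on_s + off_s
    return intervals

# Blue-light excitation: 100 pulses of 5 ms, i.e. 0.5 s of light in 5 s.
blue = pulse_train(5, 20, 5)
# Red-light inhibition: two 2 s epochs separated by 0.5 s, i.e. 4 s of light in 5 s.
red = on_off_pattern(2.0, 0.5, 5.0)
```

Under these assumptions the excitation pattern has a 10% duty cycle, whereas the inhibition pattern is on for 80% of the 5 s window.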

Optogenetic stimulation and inhibition

For loss-of-function experiments, DAT-Cre+/−;SERT-Flp+/− mice were injected with 400–500 nL of either AAV-DJ-ef1a-DIO-NpHR3.0-eYFP or AAV-DJ-ef1a-DIO-eYFP bilaterally into the VTA (for a total of 800–1000 nL), and 800–1000 nL of AAV-DJ-EF1a-fDIO-ChR2 or AAV-DJ-ef1a-fDIO-eYFP along the midline in the DR and implanted bilaterally with optical fibers in the NAcpmSh. For gain-of-function experiments, DAT-Cre+/−;SERT-Flp+/− mice were injected with 400–500 nL of either AAV-DJ-ef1a-DIO-ChR2-eYFP or AAV-DJ-ef1a-DIO-eYFP bilaterally into the VTA (for a total of 800–1000 nL), and 500–1000 nL of AAV-8-nEF-CoffFon-NpHR3.3-eYFP (Addgene #137154) or AAV-DJ-ef1a-fDIO-eYFP along the midline into the DR and implanted bilaterally with optical fibers in the NAcpmSh or NAccore. Prizmatix Pulsers and STSI LEDs were used to deliver blue (450 nm) and/or red (620 nm) light through a patch cord that mice were connected to via a rotary joint (Doric). The optical parameters used for each behavioral task are specified in the corresponding sections below. For all experiments where optogenetic manipulations were used, mice in the EYFP/EYFP control groups received identical light delivery to mice in the experimental groups.

General behavioral procedures

Mice received at least two sessions of habituation to handling, one session of habituation to each new behavioral arena, and an additional ~2 min of habituation time once in the behavioral apparatus prior to each behavioral testing session. In experiments involving optical fibers, mice were also habituated to being tethered. In experiments involving sucrose rewards, mice were pre-exposed to sucrose solution, food restricted to ~85% of their ad-libitum body weight prior to beginning and for the duration of the behavioral task, and fed daily after the completion of behavioral testing. Blinding was not used because all behavioral measurements were made in an automated way without any manual scoring. Sucrose conditioning, variable interval reward schedule, and optogenetic conditioning tasks were run using Med-PC IV/V software; all other behavioral tasks were run using the Biobserve Viewer behavioral tracking program.

Appetitive conditioning with sucrose

Sucrose conditioning took place inside sound-insulated operant chambers equipped with a red house light, sound generating speakers, and a reward port fed by a syringe pump (all from Med Associates). The CS consisted of a compound port light and sound cue (axon recording photometry experiments: white noise or a 4500 Hz tone, counterbalanced; 70–80 dB) or a sound cue alone (loss-of-function experiments: white noise only without port light; 70–80 dB) that lasted for 20 s. The US was ~12 µL of sucrose solution (32% w/v) delivered into the reward port 5 s after CS-onset. Mice needed to pick up the previous reward before the next trial could begin. The ITI was variable with an average of 40 s, for a maximum of 40 rewards per 40 min session. Wherever the latency to enter the reward port following CS onset is shown, we took the median latency for each session from each mouse.

For axon recording fiber photometry experiments, mice received 5 days of initial training and then continued to receive daily training sessions until they attained 30/40 possible rewards for 3 consecutive days (minimum 8 training days, maximum 14, average ~10). Early and late training photometry data shown correspond to the first 3 and last 3 days of training for each mouse, respectively. Reward consumption was defined as the first reward port entry following each reward delivery.

For loss-of-function experiments, all mice received 12 days of training and the late training data shown consist of data from days 7–12. For all training days, optostimulation/inhibition was aligned to reward consumption. Specifically, reward port entries made while the CS was on triggered 5 s of optostimulation/inhibition (450 nm light in 5 ms pulses at 20 Hz and ~12 mW; 620 nm light in a 2 s on, 0.5 s off pattern at 6–8 mW). If no reward port entry was made while the CS was on, then a single 5 s bout of optostimulation/inhibition was delivered upon the next port entry. Data from one training session between days 9 and 10 during which a software crash caused the LEDs not to work on some trials were excluded from analysis. The probability of occupying the reward port as a function of time was calculated as the fraction of trials during which each mouse was occupying the reward port at each point in time within a trial, measured in 10 ms bins and smoothed within mice using local regression (span ~0.5 s). The baseline normalized probability of occupying the reward port was calculated by Z-scoring each mouse’s probability trace relative to the baseline period for each trial, defined as the epoch from 10 s before CS-onset to 5 s before CS-onset, which always corresponded to part of the ITI. Similarly, the percentage of time spent in the reward port at each period within a trial was calculated, and for some analyses the percentage of time spent in the reward port during the baseline period was subtracted to account for differences in responding at the reward port during the ITI. Extinction sessions had an average ITI of 80 s for a total of 20 trials where neither US nor optostimulation/inhibition were delivered.
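As an illustration of the port-occupancy measure described above, the following Python sketch computes the fraction of trials in which the port was occupied in each time bin and then z-scores that trace against the baseline epoch from 10 s to 5 s before CS onset. It is a simplified reconstruction under stated assumptions: the function names and the input format (per-trial lists of entry and exit times in integer milliseconds relative to CS onset) are hypothetical, the local regression smoothing is omitted, and the z-score is applied to the trial-averaged trace.

```python
from statistics import mean, pstdev

def occupancy_probability(trials, t_start_ms, t_end_ms, bin_ms=10):
    """Fraction of trials with the port occupied in each time bin.

    trials: list of trials, each a list of (enter_ms, leave_ms) visits
    relative to CS onset.
    """
    n_bins = (t_end_ms - t_start_ms) // bin_ms
    counts = [0] * n_bins
    for visits in trials:
        occupied = [False] * n_bins
        for enter_ms, leave_ms in visits:
            first = max(0, (enter_ms - t_start_ms) // bin_ms)
            # Ceiling division so a partially covered bin counts as occupied.
            last = min(n_bins, -(-(leave_ms - t_start_ms) // bin_ms))
            for b in range(first, last):
                occupied[b] = True
        for b in range(n_bins):
            counts[b] += occupied[b]
    return [c / len(trials) for c in counts]

def baseline_zscore(prob, t_start_ms, bin_ms=10,
                    base_start_ms=-10_000, base_end_ms=-5_000):
    """Z-score a probability trace against the pre-CS baseline epoch."""
    i0 = (base_start_ms - t_start_ms) // bin_ms
    i1 = (base_end_ms - t_start_ms) // bin_ms
    base = prob[i0:i1]
    mu, sd = mean(base), pstdev(base)
    if sd == 0:
        raise ValueError("flat baseline; z-score undefined")
    return [(p - mu) / sd for p in prob]
```

With the default 10 ms bins, a window from −10 s to +20 s around CS onset yields 3,000 bins per mouse; a completely flat baseline raises an error rather than returning an undefined z-score.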

Real-time place preference

Real-time place preference tests were done using a 70 cm × 23 cm box made up of three chambers: a neutral center chamber with clear floors, and two identical side chambers with white walls and floors. Mice were confined to the center chamber for a ~2 min habituation period. Then, the barriers were removed to begin the initial phase of the test during which mice were free to explore all three chambers for 15 min and one side chamber (left or right, counterbalanced) was paired with optostimulation/inhibition that began when the mouse entered the light-paired chamber and continued for the entire duration that the mouse occupied the chamber. Immediately after completion of the initial phase, the side that was paired with optostimulation/inhibition was swapped and mice were free to explore all three chambers for an additional 15 min reversal phase. For real-time place preference experiments in Fig. 3p, Extended Data Fig. 6b, and Fig. 2k–l, both blue (11–14 mW, 450 nm light in 5 ms pulses at 20 Hz) and red (6–8 mW, 620 nm light in a 2 s on, 0.5 s off pattern) light were used. For real-time place preference experiments in Fig. 5j–p, the light parameters used are specified on the figure. In Fig. 4k–l, n=1 mouse that failed to explore all three chambers of the box was excluded from analysis.

4-Choice real-time place preference

Mice were placed into a 40 × 40 cm behavioral arena divided into four equally sized and identical quadrants. One quadrant was paired with blue light stimulation (3 mW, 450 nm light in 5 ms pulses at 20 Hz), another with red light stimulation (6–8 mW, 620 nm light in a 2 s on, 0.5 s off pattern), another with both red and blue light stimulation together, and the fourth quadrant with no optical stimulation at all. As in conventional two-chamber real-time place preference tests, the optical stimulation paired with each quadrant began immediately when the mouse entered the light-paired quadrant and continued for the entire duration that the mouse occupied that quadrant. Mice were free to explore the entire behavioral arena for 90 min and the amount of time spent in each quadrant was measured.
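The quadrant occupancy measure can be sketched as follows, assuming tracked (x, y) positions in cm sampled at a fixed frame interval. The function names, the coordinate convention, and the assignment of conditions to quadrants are hypothetical and chosen only for illustration; they do not describe the tracking software's output.

```python
ARENA_CM = 40.0

def quadrant(x, y, size=ARENA_CM):
    """Index 0-3 of the quadrant containing (x, y), row-major from the origin."""
    half = size / 2
    col = 0 if x < half else 1
    row = 0 if y < half else 1
    return row * 2 + col

def time_per_quadrant(positions, frame_s, conditions):
    """Seconds spent under each condition.

    positions: list of (x, y) samples in cm; frame_s: seconds per sample;
    conditions: list of four labels, indexed by quadrant number.
    """
    seconds = {c: 0.0 for c in conditions}
    for x, y in positions:
        seconds[conditions[quadrant(x, y)]] += frame_s
    return seconds

# Hypothetical assignment of conditions to quadrants 0-3.
example = time_per_quadrant(
    [(5, 5), (30, 5), (30, 30), (30, 30)],
    frame_s=0.5,
    conditions=["none", "blue", "red", "both"],
)
```

Because light delivery in this task follows the mouse's position, the time spent in each quadrant is also the time spent receiving the corresponding stimulation.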

Free sucrose access

Mice were placed into a 40 × 25 cm behavioral arena with a sipper bottle in one corner of the box and given free access to 32% sucrose solution (w/v) for 25 min/day for 4–5 days. Optogenetic manipulations (11–14 mW, 450 nm light in 5 ms pulses at 20 Hz; and constant 6–8 mW, 620 nm light) were delivered for the entire time that each mouse occupied a 10 × 10 cm zone around the sucrose sipper bottle. The amount of sucrose solution consumed was measured as the change in weight of the sucrose bottle before and after the task. In 3/115 testing sessions when mice partially chewed through the rubber stopper on the sucrose bottle and spilled some of the sucrose solution inside, the amount consumed was estimated as the change in the mouse’s weight before and after the task. Data are shown as the average amount of sucrose solution consumed (by weight) across the days of the task minus the average change in weight of an identical bottle full of sucrose that was placed into an empty behavioral arena (to control for the small amount of sucrose that can drip out when the bottle is moved to be weighed).

Open field test

Mice were placed into a 40 × 40 cm behavioral arena and allowed to explore freely for 12 min. Optogenetic manipulations were delivered in four alternating light-on and light-off epochs that lasted 3 min each, and which epoch came first was counterbalanced across mice.

Conditioned place preference

A 70 cm × 23 cm box was divided into two chambers with different visual cues on the walls and floors with different textures. On the first day of the task, mice underwent a pre-test where they were free to explore the entire box for 15 min. On the second day of the task, mice received two conditioning sessions where first they were confined to one chamber for 25 min and administered optostimulation/inhibition (see Fig. 4a–c for the optical parameters of each experiment), and then >4 h later they were confined to the other chamber and administered nothing. On the third day, the conditioning procedure was repeated, but with the order of the sessions (optostimulation/inhibition or nothing) counterbalanced relative to the first conditioning day. On the fourth day the mice underwent a post-test that was identical to the pre-test, and the amount of time spent in each chamber was compared to the pre-test day. Which chamber was paired with optostimulation/inhibition was counterbalanced across mice in an unbiased design. For mice that underwent multiple CPP experiments, the visual and tactile cues in the boxes were changed for each new experiment.

Optogenetic conditioning

Optogenetic conditioning experiments were carried out in operant chambers enclosed in sound attenuating boxes and equipped with a sound generator and three reward ports, each with its own reward port light. Mice were presented with three CS-US pairs where CSs were compound sound (white noise, 4500 Hz pure tone, or 40 Hz clicks; all ~70 dB) and port-light (left, center, or right port) cues and USs consisted of red light stimulation alone (5–8 mW, 620 nm light in a 2 s on, 0.5 s off pattern), blue light stimulation alone (11–14 mW, 450 nm light in 5 ms pulses at 20 Hz), or both together. Each CS lasted 20 s, each US lasted 5 s, and the CS and US co-terminated. CS-US pairings were counterbalanced across mice, and US delivery was not contingent upon entry into any port. Mice underwent 12 training sessions lasting 14 hours each and consisting of ~420 trials (variable ITI with an average of 100 s) divided approximately evenly between the three trial types. During the first 9 days of training, trials were delivered in blocks of a single trial type (~90 min/block, 3 blocks of each trial type per session, delivered in randomized order). On the last 3 days of training, trials of different types were interleaved and delivered in randomized order. Data from one day of training when a software crash caused the optostimulation/inhibition to fail for some mice were excluded from analysis. Learning was measured as conditioned approach (number of entries made into any port during the initial 15 s of each CS before the corresponding US was delivered) and approach accuracy (number of entries made into the paired port versus the unpaired ports during the initial 15 s of each CS before the corresponding US was delivered) on the last 3 training days when trials were delivered in randomized interleaved order. 
After the final day of training, mice underwent one 14-hour conditioned reinforcement test during which each entry into a port triggered a 2.5 s long presentation of the CS originally paired with that port, but no US was delivered. After the conditioned reinforcement test, mice also underwent one 14-hour primary reinforcement test during which each entry into a port triggered a 2.5 s long presentation of the US originally paired with that port, but no CS was presented. One mouse was run through the primary reinforcement test a second time because its optical patch cord came off partway through the first attempt.
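The two learning measures defined above, conditioned approach and approach accuracy, amount to counting port entries in the first 15 s of each CS. The Python sketch below shows one way to tabulate them; the function name and the trial record format are hypothetical, and because the text describes accuracy only as paired versus unpaired entries, the sketch reports both counts rather than assuming a particular ratio.

```python
def approach_metrics(trials, window_s=15.0):
    """Count port entries during the pre-US portion of each CS.

    trials: list of dicts with keys
      "paired_port": label of the port whose light belongs to this CS
      "entries": list of (time_s, port) with time relative to CS onset
    """
    total = paired = unpaired = 0
    for trial in trials:
        for t, port in trial["entries"]:
            # Only entries in the initial 15 s, before US onset, are counted.
            if 0 <= t < window_s:
                total += 1
                if port == trial["paired_port"]:
                    paired += 1
                else:
                    unpaired += 1
    return {
        "entries_per_trial": total / len(trials),
        "paired_entries": paired,
        "unpaired_entries": unpaired,
    }

example = approach_metrics([
    {"paired_port": "left",
     "entries": [(2.0, "left"), (16.0, "left"), (10.0, "right")]},
    {"paired_port": "center",
     "entries": [(1.0, "center")]},
])
```

In the example, the entry at 16 s falls after US onset and is ignored, leaving three counted entries across two trials, two of them into the paired port.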

Statistical analysis

Statistical analyses were performed in R and GraphPad Prism. Data that satisfied the assumptions of parametric tests (by inspection of the fitted values vs residuals and Q-Q diagnostic plots) were analyzed using t-tests or one-way ANOVAs followed by Holm-Sidak’s or Dunnett’s post-hoc tests for one-factor designs, and two-way ANOVAs or mixed-effects models followed by Holm-Sidak’s or Dunnett’s post-hoc tests for two-factor designs. Paired or repeated measures comparisons and the Geisser-Greenhouse correction were used where appropriate. Data that did not satisfy the assumptions of parametric tests were log transformed and then analyzed using parametric tests if the transformed data met the requisite assumptions. Data that did not satisfy the assumptions of parametric tests after transformation, or that could not be log transformed (e.g., because of negative values), were analyzed using nonparametric Kruskal-Wallis or Friedman’s tests followed by Dunn’s post-hoc tests. All hypotheses tested were two-tailed, and data are shown as mean +/− s.e.m.
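The order of decisions described above (parametric tests first, then log transformation, then nonparametric tests) can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' R or Prism workflow: the function names are hypothetical, and the `assumptions_ok` callable is a stand-in for the analyst's judgment, which in the study was made by inspecting diagnostic plots rather than by an automated check. A small helper for the mean and s.e.m. is included for completeness.

```python
import math
import statistics
from typing import Callable, List, Sequence, Tuple

Groups = Sequence[Sequence[float]]


def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def choose_analysis(
    groups: Groups,
    assumptions_ok: Callable[[Groups], bool],  # hypothetical stand-in for plot inspection
) -> Tuple[str, List[List[float]]]:
    """Return the test family to use and the data (raw or log transformed) to analyze."""
    raw = [list(group) for group in groups]

    # Step 1: use parametric tests if the raw data meet their assumptions.
    if assumptions_ok(raw):
        return "parametric", raw

    # Step 2: try a log transform, which is only defined for strictly positive values.
    if all(x > 0 for group in raw for x in group):
        logged = [[math.log(x) for x in group] for group in raw]
        if assumptions_ok(logged):
            return "parametric_on_log", logged

    # Step 3: otherwise fall back to nonparametric tests on the raw data
    # (Kruskal-Wallis or Friedman's test followed by Dunn's post-hoc tests in the text).
    return "nonparametric", raw


# Made-up data with a negative value: the log transform is skipped entirely.
family, data = choose_analysis([[-1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], lambda g: False)
print(family)
```

Passing the assumption check in as a callable keeps the sketch honest about the fact that the original decision was a visual one; any formal test substituted there would be a departure from the described method.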

Extended Data

Extended Data Fig. 1: DAT-Cre+/−;SERT-Flp+/− mice enable orthogonal and specific access to VTADA and DR5HT neurons.

a, Surgical strategy to validate orthogonality of genetic access to VTADA and DR5HT neurons in DAT-Cre+/−;SERT-Flp+/− mice. b, Example image showing negligible Flp-dependent EYFP expression in the VTA. c-d, Example images of the DR showing Cre-dependent mCherry expression is restricted to DRDA neurons (c) and is not observed in DR5HT neurons (d). e, Surgical strategy for control experiments to validate the specificity of our viral targeting strategy. f-g, example images showing negligible Cre-dependent mCherry expression in the VTA in the absence of Cre (f) and negligible Flp-dependent EYFP expression in the DR in the absence of Flp (g). In a-d, n = 1 mouse. In e-f, n = 1 mouse.

Extended Data Fig. 2: Overlap between VTADA and DR5HT axons varies across limbic regions.

a, Surgical strategy for VTADA and DR5HT axon tracing experiments. b, Example images showing labeled VTADA and DR5HT axons in sagittal sections (top). Insets (center, bottom) correspond to the boxed regions in the top images. c, Relative density of VTADA (left) and DR5HT (right) axons across limbic regions. d, Background subtracted (left) and segmented (center) images showing VTADA and DR5HT axons in the anterior NAc. Insets (right) show magnified views of the corresponding boxed areas in the left and center images. e, same as d, but for the posterior NAc. f, same as d, but for the anterior BLA. g, same as d, but for the posterior BLA. h, same as d, but for the Ant Ctx. i, Relative colocalization between VTADA and DR5HT axons across the regions shown in d-h. In c and i, n = 5 mice. Data are shown as mean +/− s.e.m. and significance is denoted as *P<0.05, **P<0.01, ***P<0.001, ****P<0.0001. See Supplementary Table 1 for statistics.

Extended Data Fig. 3: NAc-projecting DR5HT neurons are distinct from CeA- and OFC- projecting DR5HT subsystems.

a, Surgical strategy for retrograde labeling of projection-defined DR5HT subsystems. b, Example images of the DR showing retrogradely labeled neurons (posterior DR image reproduced here from Fig. 1m for comparison). c, Percentage of OFC-, CeA-, and NAc- projecting DR neurons that are TpH+. d, Pie graphs showing the fraction of OFC- (left), CeA- (center), and NAc- (right) projecting DR5HT neurons that send axon collaterals to the other two target regions. e, NAc- projecting DR5HT neurons are a distinct population from the CeA- and OFC- projecting DR5HT subsystems. f, Distributions of OFC- (left), CeA- (center), and NAc- (right) projecting DR5HT neurons across the DR’s anteroposterior axis. g, same as f, but across the dorsomedial (dm), ventromedial (vm), and lateral (l) subregions shown in b. h, Injection strategy (top) and example injection site images (bottom) for retrograde tracing control experiments. i, Example images showing retrogradely labeled cells in the DR. j, When all three retrograde tracers were injected together into the same target structure, the vast majority of labeled cells in the DR were positive for all three tracers. In a-g, n = 3 mice. In h-j, n = 2 mice. Data are shown as mean +/− s.e.m. and significance is denoted as *P<0.05, **P<0.01, ***P<0.001, ****P<0.0001. See Supplementary Table 1 for statistics.

Extended Data Fig. 4: Inverse VTADA and DR5HT axon reward responses are consistent across mice and are not explained by motion.

a, Optical fiber tip placements for mice used in two-color axon photometry experiments. b, RCaMP2 (VTADA axon) recordings from individual mice aligned to CS-onset (top) or reward consumption (bottom) early (left) and late (right) in training. c, Same as b, but for GCaMP6 (DR5HT axon) recordings. d, Average GCaMP6, RCaMP2, and UV photometry traces from an example mouse (n = 25 trials from 1 mouse) aligned to shock onset. Top, demodulated and mean subtracted traces before motion correction, z-scoring, or smoothing; center, Z-scored GCaMP6 traces from the same session with and without motion correction; bottom, Z-scored RCaMP2 traces from the same session with and without motion correction (see Methods for motion correction details). e, same as d, but for photometry traces aligned to reward consumption (n = 35 trials from 1 mouse). Example traces in d-e are from the same mouse shown in Fig. 2b and Fig. 2k–l.

Extended Data Fig. 5: Fiber placement validation and additional analyses for GRAB sensor recording experiments in NAcpmSh.

a, Optical fiber tip placements for mice used in GRAB-DA (top) and GRAB-5HT (bottom) experiments in the NAcpmSh. b, GRAB-DA recordings aligned to reward consumption showing the average response across trials for each mouse. c, Same as b, but for GRAB-5HT. d-e, GRAB-5HT recordings aligned to reward consumption during days 1–3 (d) and 5–7 (e) of a task where rewards were delivered randomly and without any predictive cues (Data are shown as mean +/− s.e.m.). For all panels, n = 5 mice per group.

Extended Data Fig. 6: Optical fiber placement validation and control assays for loss-of-function experiments in the NAcpmSh.

a-d, Example images of the injection sites (DR, top left; VTA, top right) and optical fiber implantation sites (bottom) for the EYFP/EYFP (a), EYFP/NpHR (b), ChR2/EYFP (c), and ChR2/NpHR (d) groups. e-h, Optical fiber tip placements for mice in the EYFP/EYFP (e), EYFP/NpHR (f), ChR2/EYFP (g), and ChR2/NpHR (h) groups. i, Percent change in velocity during the light-on epochs relative to the light-off epochs in the open field test. j, Difference score for time spent on each side of the chamber in the RTPP task. In i-j: EYFP/EYFP, n = 9 mice; NpHR/EYFP, n = 8 mice; EYFP/ChR2, n = 7 mice; NpHR/ChR2, n = 9 mice. Data are shown as mean +/− s.e.m. and significance is denoted as *P<0.05, **P<0.01, ***P<0.001, ****P<0.0001. See Supplementary Table 1 for statistics.

Extended Data Fig. 7: GRAB sensor validation of the gain- and loss- of function manipulations of VTADA and DR5HT reward responses.

a, Viral strategy enabling VTADA stimulation, VTADA inhibition, and GRAB-DA recordings in the same mouse (top); and, viral strategy enabling DR5HT stimulation, DR5HT inhibition, and GRAB-5HT recordings in the same mouse (bottom). b, GRAB-DA recordings aligned to sucrose consumption alone (gray, n = 913 trials from 11 mice) or sucrose consumption with VTADA inhibition (red, n = 1380 trials from 11 mice) in the same mice (top); and, GRAB-5HT recordings aligned to sucrose consumption alone (gray, n = 539 trials from 2 mice) or sucrose consumption with DR5HT stimulation (blue, n = 680 trials from 2 mice) in the same mice (bottom). c, GRAB-DA recordings aligned to sucrose reward consumption (gray, n = 274 trials from 5 mice) or to the onset of VTADA stimulation (blue, n = 779 trials from 5 mice) in the same mice (top); and, GRAB-5HT recordings aligned to sucrose reward consumption (gray, n = 1006 trials from 12 mice) or to the onset of DR5HT inhibition (red, n = 1872 trials from 12 mice) in the same mice (bottom). Data are shown as mean +/− s.e.m.

Extended Data Fig. 8: Optical fiber placement validation and control assays for gain-of-function experiments in the NAcpmSh.

a-d, Example images of the injection sites (DR, top left; VTA, top right) and optical fiber implantation sites (bottom) for the EYFP/EYFP (a), NpHR/EYFP (b), EYFP/ChR2 (c), and NpHR/ChR2 (d) groups. e-h, Optical fiber tip placements for mice in the EYFP/EYFP (e), NpHR/EYFP (f), EYFP/ChR2 (g), and NpHR/ChR2 (h) groups. Two mice in the NpHR/EYFP group died before their brains could be collected for histology. i, Percent change in velocity during the light-on epochs relative to the light-off epochs in the open field test (n = 6 mice per group). j, Number of trials of each type obtained during days 1–9 of training in the optogenetic conditioning task shown in Fig. 4m–p (n = 10 mice per group). Data are shown as mean +/− s.e.m. and significance is denoted as *P<0.05, **P<0.01, ***P<0.001, ****P<0.0001. See Supplementary Table 1 for statistics.

Extended Data Fig. 9: Optical fiber placement validation and individual mouse traces for GRAB sensor recording experiments in NAccore.

a-b, Optical fiber tip placements for mice used in GRAB-DA (a) and GRAB-5HT (b) experiments in the NAccore. One mouse in the GRAB-5HT group died before its brain could be collected for histology. c, GRAB-DA recordings aligned to reward consumption showing the average response across trials for each mouse (n = 3 mice). d, Same as c, but for GRAB-5HT (n = 4 mice).

Extended Data Fig. 10: Optical fiber placement validation for loss-of-function experiments in the NAccore.

a-b, Example images of the injection sites (DR, left; VTA, center) and optical fiber implantation sites (right) for the EYFP/EYFP (a), and ChR2/NpHR (b) groups. c-d, Optical fiber tip placements for the EYFP/EYFP (c), and ChR2/NpHR (d) groups.

Supplementary Material

Supplementary table

ACKNOWLEDGMENTS

We thank Drs. Stephan Lammel, Liqun Luo, Lisa Giocomo, Gregory Scherrer, and members of the Malenka and STAAR labs for discussions, Jayashri Viswanathan for assistance with histology, and the Stanford Gene Vector and Virus Core for reagents. This work was supported by philanthropic funds donated to the Nancy Pritzker Laboratory at Stanford University. M.B.P. was supported by NIH grant K99DA056573. G.C.T. was supported by the Berg Scholars program at Stanford School of Medicine. N.E. was supported by NIH grant K08MH123791, a Brain & Behavior Research Foundation Young Investigator Grant, a Burroughs Wellcome Fund Career Award for Medical Scientists, a Stanford NeuroChoice Initiative Pilot Award, and a Simons Foundation Bridge to Independence Award. D.F.C.P. was supported by an NSF Graduate Research Fellowship and an HHMI Gilliam Fellowship for Advanced Study (with R.C.M.).

Footnotes

COMPETING INTERESTS

N.E. is a consultant for Boehringer Ingelheim. B.S.B. is a co-founder of Magnus Medical. R.C.M. is on the scientific advisory boards of MapLight Therapeutics, MindMed, and Aelis Farma.

DATA AVAILABILITY

The datasets generated and analyzed during this study are available from the corresponding author upon reasonable request.

CODE AVAILABILITY

Code used for data processing and analysis is available from the corresponding author upon reasonable request.

REFERENCES

  • 1. Schultz W, Dayan P & Montague PR A Neural Substrate of Prediction and Reward. Science 275, 1593–1599 (1997).
  • 2. Steinberg EE et al. A causal link between prediction errors, dopamine neurons and learning. Nat Neurosci 16, 966–973 (2013).
  • 3. Saunders BT, Richard JM, Margolis EB & Janak PH Dopamine neurons create Pavlovian conditioned stimuli with circuit-defined motivational properties. Nat Neurosci 21, 1072–1083 (2018).
  • 4. Sengupta A & Holmes A A Discrete Dorsal Raphe to Basal Amygdala 5-HT Circuit Calibrates Aversive Memory. Neuron 103, 489–505.e7 (2019).
  • 5. Zeng J et al. Local 5-HT signaling bi-directionally regulates the coincidence time window for associative learning. Neuron 111, 1118–1135.e5 (2023).
  • 6. Siegelbaum SA, Camardo JS & Kandel ER Serotonin and cyclic AMP close single K+ channels in Aplysia sensory neurones. Nature 299, 413–417 (1982).
  • 7. Brunelli M, Castellucci V & Kandel ER Synaptic Facilitation and Behavioral Sensitization in Aplysia: Possible Role of Serotonin and Cyclic AMP. Science 194, 1178–1181 (1976).
  • 8. Izquierdo A et al. Impaired reward learning and intact motivation after serotonin depletion in rats. Behavioural Brain Research 233, 494–499 (2012).
  • 9. Luo M, Li Y & Zhong W Do dorsal raphe 5-HT neurons encode “beneficialness”? Neurobiol Learn Mem 135, 40–49 (2016).
  • 10. Daw ND, Kakade S & Dayan P Opponent interactions between serotonin and dopamine. Neural Networks 15, 603–616 (2002).
  • 11. Boureau Y-L & Dayan P Opponency Revisited: Competition and Cooperation Between Dopamine and Serotonin. Neuropsychopharmacology 36, 74–97 (2011).
  • 12. Feng YY, Bromberg-Martin ES & Monosov IE Dorsal raphe neurons integrate the values of reward amount, delay, and uncertainty in multi-attribute decision-making. Cell Rep 43, (2024).
  • 13. Redgrave P Modulation of intracranial self-stimulation behaviour by local perfusions of dopamine, noradrenaline and serotonin within the caudate nucleus and nucleus accumbens. Brain Res 155, 277–295 (1978).
  • 14. Cohen JY, Amoroso MW & Uchida N Serotonergic neurons signal reward and punishment on multiple timescales. Elife 2015, (2015).
  • 15. Bissière S, Humeau Y & Lüthi A Dopamine gates LTP induction in lateral amygdala by suppressing feedforward inhibition. Nat Neurosci 6, 587–92 (2003).
  • 16. Tye KM et al. Methylphenidate facilitates learning-induced amygdala plasticity. Nat Neurosci 13, 475–481 (2010).
  • 17. Fisher YE, Marquis M, D’Alessandro I & Wilson RI Dopamine promotes head direction plasticity during orienting movements. Nature 612, 316–322 (2022).
  • 18. Beier KT et al. Circuit Architecture of VTA Dopamine Neurons Revealed by Systematic Input-Output Mapping. Cell 162, 622–634 (2015).
  • 19. Muzerelle A, Scotto-Lomassese S, Bernard JF, Soiza-Reilly M & Gaspar P Conditional anterograde tracing reveals distinct targeting of individual serotonin cell groups (B5–B9) to the forebrain and brainstem. Brain Struct Funct 221, 535–561 (2016).
  • 20. Eshel N et al. Arithmetic and local circuitry underlying dopamine prediction errors. Nature 525, 243–246 (2015).
  • 21. Kim HGR et al. A Unified Framework for Dopamine Signals across Timescales. Cell 183, 1600–1616.e25 (2020).
  • 22. Amo R et al. A gradual temporal shift of dopamine responses mirrors the progression of temporal difference error in machine learning. Nat Neurosci 25, 1082–1092 (2022).
  • 23. Merens W, Willem Van der Does AJ & Spinhoven P The effects of serotonin manipulations on emotional information processing and mood. Journal of Affective Disorders 103, 43–62 (2007) doi: 10.1016/j.jad.2007.01.032.
  • 24. Ruhé HG, Mason NS & Schene AH Mood is indirectly related to serotonin, norepinephrine and dopamine levels in humans: a meta-analysis of monoamine depletion studies. Mol Psychiatry 12, 331–359 (2007).
  • 25. Hamid AA et al. Mesolimbic dopamine signals the value of work. Nat Neurosci 19, 117–126 (2015).
  • 26. Da Silva JA, Tecuapetla F, Paixão V & Costa RM Dopamine neuron activity before action initiation gates and invigorates future movements. Nature 554, 244–248 (2018).
  • 27. Soubrié P Reconciling the role of central serotonin neurons in human and animal behavior. Behavioral and Brain Sciences 9, 319–335 (1986).
  • 28. Miyazaki KW et al. Optogenetic activation of dorsal raphe serotonin neurons enhances patience for future rewards. Current Biology 24, 2033–2040 (2014).
  • 29. Matias S, Lottem E, Dugué GP & Mainen ZF Activity patterns of serotonin neurons underlying cognitive flexibility. Elife 6, (2017).
  • 30. Xu S, Das G, Hueske E & Tonegawa S Dorsal Raphe Serotonergic Neurons Control Intertemporal Choice under Trade-off. Current Biology 27, 3111–3119.e3 (2017).
  • 31. Schweighofer N et al. Low-serotonin levels increase delayed reward discounting in humans. Journal of Neuroscience 28, 4528–4532 (2008).
  • 32. Miyazaki KW, Miyazaki K & Doya K Activation of dorsal raphe serotonin neurons is necessary for waiting for delayed rewards. Journal of Neuroscience 32, (2012).
  • 33. Cools R, Nakamura K & Daw ND Serotonin and dopamine: Unifying affective, activational, and decision functions. Neuropsychopharmacology 36 (2011) doi: 10.1038/npp.2010.121.
  • 34. Dayan P & Huys QJM Serotonin in affective control. Annual Review of Neuroscience 32 (2009) doi: 10.1146/annurev.neuro.051508.135607.
  • 35. Li Y et al. Synaptic mechanism underlying serotonin modulation of transition to cocaine addiction. Science 373, (2021).
  • 36. Pelloux Y, Dilleen R, Economidou D, Theobald D & Everitt BJ Reduced forebrain serotonin transmission is causally involved in the development of compulsive cocaine seeking in rats. Neuropsychopharmacology 37, 2505–2514 (2012).
  • 37. Bäckman CM et al. Characterization of a mouse strain expressing Cre recombinase from the 3′ untranslated region of the dopamine transporter locus. genesis 44, 383–390 (2006).
  • 38. Ren J et al. Single-cell transcriptomes and whole-brain projections of serotonin neurons in the mouse dorsal and median raphe nuclei. Elife 8, (2019).
  • 39. de Jong JW et al. A Neural Circuit Mechanism for Encoding Aversive Stimuli in the Mesolimbic Dopamine System. Neuron 101, 133–151.e7 (2019).
  • 40. Yang H et al. Nucleus Accumbens Subnuclei Regulate Motivated Behavior via Direct Inhibition and Disinhibition of VTA Dopamine Subpopulations. Neuron 97, 434–449.e4 (2018).
  • 41. Lammel S et al. Unique Properties of Mesoprefrontal Neurons within a Dual Mesocorticolimbic Dopamine System. Neuron 57, 760–773 (2008).
  • 42. Pomrenze MB et al. Modulation of 5-HT release by dynorphin mediates social deficits during opioid withdrawal. Neuron 110, (2022).
  • 43. Huang KW et al. Molecular and anatomical organization of the dorsal raphe nucleus. Elife 8, (2019).
  • 44. Ren J et al. Anatomically Defined and Functionally Distinct Dorsal Raphe Serotonin Sub-systems. Cell 175, 472–487.e20 (2018).
  • 45. Salinas-Hernández XI, Zafiri D, Sigurdsson T & Duvarci S Functional architecture of dopamine neurons driving fear extinction learning. Neuron 111, 3854–3870.e5 (2023).
  • 46. Stanley G, Gokce O, Malenka RC, Südhof TC & Quake SR Continuous and Discrete Neuron Types of the Adult Murine Striatum. Neuron 105, 688–699.e8 (2020).
  • 47. Liu Y et al. A subset of dopamine receptor-expressing neurons in the nucleus accumbens controls feeding and energy homeostasis. Nat Metab (2024) doi: 10.1038/s42255-024-01100-0.
  • 48. Zhao Z-D et al. A molecularly defined D1 medium spiny neuron subtype negatively regulates cocaine addiction. Sci Adv 8, eabn3552 (2022).
  • 49. Badrinarayan A et al. Aversive stimuli differentially modulate real-time dopamine transmission dynamics within the nucleus accumbens core and shell. Journal of Neuroscience 32, 15779–15790 (2012).
  • 50. Steinberg EE et al. Positive reinforcement mediated by midbrain dopamine neurons requires D1 and D2 receptor activation in the nucleus accumbens. PLoS One 9, (2014).
  • 51. Day JJ, Roitman MF, Wightman RM & Carelli RM Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nat Neurosci 10, 1020–1028 (2007).
  • 52. Engel L et al. Dopamine neurons drive spatiotemporally heterogeneous striatal dopamine signals during learning. Current Biology 34, 3086–3101.e4 (2024).
  • 53. Tsutsui-Kimura I et al. Distinct temporal difference error signals in dopamine axons in three regions of the striatum in a decision-making task. Elife 9, 1–39 (2020).
  • 54. Parkinson JA, Olmstead MC, Burns LH, Robbins TW & Everitt BJ Dissociation in Effects of Lesions of the Nucleus Accumbens Core and Shell on Appetitive Pavlovian Approach Behavior and the Potentiation of Conditioned Reinforcement and Locomotor Activity by D-Amphetamine. (1999).
  • 55. Wee S & Woolverton WL Self-administration of mixtures of fenfluramine and amphetamine by rhesus monkeys. Pharmacol Biochem Behav 84, 337–343 (2006).
  • 56. Wee S et al. Relationship between the serotonergic activity and reinforcing effects of a series of amphetamine analogs. Journal of Pharmacology and Experimental Therapeutics 313, 848–854 (2005).
  • 57. Gunaydin LA et al. Natural neural projection dynamics underlying social behavior. Cell 157, 1535–1551 (2014).
  • 58. Walsh JJ et al. 5-HT release in nucleus accumbens rescues social deficits in mouse autism model. Nature 560, 589–594 (2018).

ADDITIONAL REFERENCES

  • 60. Cardozo Pinto DF et al. Characterization of transgenic mouse models targeting neuromodulatory systems reveals organizational principles of the dorsal raphe. Nat Commun 10, 443 (2019).
  • 61. Otsu N A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9, 62–66 (1979).
  • 62. Bunin MA & Wightman RM Quantitative evaluation of 5-hydroxytryptamine (serotonin) neuronal release and uptake: an investigation of extrasynaptic transmission. The Journal of Neuroscience 18, 4854–4860 (1998).
  • 63. Liu C, Goel P & Kaeser PS Spatial and temporal scales of dopamine transmission. Nat Rev Neurosci 22, 345–358 (2021).
  • 64. Paxinos G & Franklin KBJ The mouse brain in stereotaxic coordinates. (Academic Press, 2001).
  • 65. Eshel N et al. Striatal dopamine integrates cost, benefit, and motivation. Neuron 112, 500–514.e5 (2024).
