Author manuscript; available in PMC: 2026 Jan 29.
Published in final edited form as: Curr Opin Behav Sci. 2024 Dec 6;61:101464. doi: 10.1016/j.cobeha.2024.101464

Complementary roles of serotonin and dopamine in model-based learning

Masakazu Taira 1, Melissa J Sharpe 1
PMCID: PMC12851621  NIHMSID: NIHMS2128689  PMID: 41613786

Abstract

Model-based learning involves building mental representations of predictive relationships between events. Although there is growing evidence clarifying the role of dopamine in model-based learning, we know less about how serotonin contributes to these processes. Here, we present evidence examining the role of serotonin neurons in the dorsal raphe nucleus (DRN) in regulating model-based learning. We contrast this with the role of ventral tegmental area (VTA) dopamine neurons. We further propose a new framework suggesting how dopamine and serotonin could influence model-based learning in unique but complementary ways. Specifically, we know that the phasic firing of dopamine neurons acts as a teaching signal to drive model-based learning. In contrast, we argue that the sustained activity seen in DRN serotonin neurons while waiting for rewards, or during presentation of reward-predictive cues, reflects a precise expectation of future rewards across multiple dimensions. These two systems allow us to refine our cognitive maps by multiplexing different types of information.

Introduction

Reinforcement learning is an adaptive process that allows us to make optimal choices. Through this process, the brain can build internal models of the world, which constitute mental representations of predictive relationships between events [1-4]. For example, if you find yourself in a new city looking for a quick lunch, you’re often drawn to familiar signage that indicates a trusted food source, like sugarfish sushi. This tendency reflects our ability to learn about the relationship between cues (sugarfish signage) and rewards (sushi) in a sensory-specific manner. Building internal models helps guide flexible and adaptive actions [1,2,4]. For example, if you read a news article highlighting cases of mercury poisoning from eating sashimi, you may be less drawn to the signage of sugarfish and gamble on somewhere new. This type of learning is called model-based learning, and disruption of this cognitive process is observed across psychological disorders including addiction, schizophrenia, and obsessive-compulsive disorder [5-9].

Dopamine and serotonin are two important neuromodulators that regulate different aspects of reinforcement learning [10,11]. Both neuromodulators are important for acquiring internal models and using learned representations to guide appropriate actions [12,13*]. Many studies have examined the role of dopamine neurons in the ventral tegmental area (VTA) in model-based learning [12,14-16]. More specifically, current models of dopamine function propose that the phasic firing of VTA dopamine neurons reflects a teaching signal to instantiate learned representations between cues and sensory-specific rewards [17**,18*,19,20**]. Indeed, this teaching signal is necessary to form internal models of associations between events whether or not they contain rewarding information [17**,18*,21-25]. This allows us to infer possible outcomes through mental simulation (like avoiding sushi when you’ve previously seen information about mercury poisoning; [26-28]). Despite our growing knowledge of the role of dopamine in model-based learning, the role of serotonin neurons in model-based learning is less clear. Here, we review evidence that implicates serotonin in model-based learning. First, we discuss the role of serotonin neurons in impulsive behaviours, which have recently been argued to reflect a model-based computation [29**]. Then, we discuss manipulation of serotonin in classic model-based procedures involving outcome devaluation that cannot be accounted for by model-free mechanisms. Next, we review studies that use in vivo recording of DRN serotonin neurons during tasks where subjects are learning to use sensory cues or temporal information to predict upcoming rewards. Critically, these recording studies dissociate the activity of serotonin neurons from that of dopamine neurons. With the knowledge garnered from these studies, we propose a new framework that illustrates how dopamine and serotonin work together to influence model-based behaviour in a unique and complementary manner.

The role of serotonin in reinforcement learning

Serotonin regulates impulsive behaviours

Serotonin has long been known to regulate impulsive behaviours [30-33]. Impulsive behaviours are broadly categorized into impulsive choice and impulsive action [30,31,34, but see, 35,36]. Impulsive choice is generally operationalised as the tendency to choose small, immediate rewards over a large, delayed reward [30]. On the other hand, impulsive action is characterized by the failure to suppress an inappropriate action [30,32]. One example of impulsive action is premature responding in a food port when a criterion is put in place that requires withholding a response to obtain reward. Intracerebroventricular infusion of a neurotoxin targeting serotonin neurons [37] or systemic injection of a serotonin synthesis inhibitor [38], both of which induce a systemic reduction of serotonin, results in choice of immediate small rewards over large, delayed rewards and also increases premature responses, which suggests that serotonin neurons critically regulate both types of impulsive behaviours.

More recently, specific manipulation of serotonin neurons in the DRN has revealed that these neurons are the source of the serotonergic role in impulsive behaviours [39-41]. For example, in one study, DRN serotonin neurons were optogenetically activated or inhibited in an intertemporal choice task, in which mice made choices between a small reward with a short delay and a large reward with a long delay [39]. The activation of DRN serotonin neurons before the decision point increased the probability of choosing the larger delayed rewards (i.e., decreased impulsive choice). In contrast, optogenetic inhibition of DRN serotonin neurons increased the choice of the smaller, immediate reward. In another study, mice were required to maintain a nose-poke response into a food port for a period of time (e.g., staying at the port for 3, 6, or 9 s) to obtain food rewards [40,41]. Here, the authors applied optogenetic activation of DRN serotonin neurons during the nose-poking period and found that this decreased the number of trials aborted before reward delivery (i.e., an increase in waiting time, and a decrease in impulsive action).

To account for how serotonin regulates impulsive behaviours, it has been proposed that serotonin weighs the relative importance of future rewards, which corresponds to the discount factor in model-free reinforcement learning [10,42]. This account predicts that increased serotonin levels will increase the discounted value of expected future rewards (i.e., increase the discount factor) and promote actions to obtain larger rewards despite a longer delay. However, a recent study showed that the effect of optogenetic activation of DRN serotonin neurons on waiting behaviours is context-dependent [29**], which is difficult to reconcile with the discount factor hypothesis. In the context-dependent waiting task, multiple contexts are created that vary in reward probability and number of rewards [i.e., high (75%), moderate (50%), or low (25%) probability of reward after waiting 3 s]. As some of these trials are unrewarded, this gives an opportunity to see how long mice will wait for reward in these different contexts. Optogenetic activation of DRN serotonin neurons increased the waiting duration in omission trials only when reward probability was high, regardless of whether the actual reward was small or large. In contrast, when the reward probability was low or moderate, optogenetic activation did not change the waiting duration in omission trials, regardless of reward magnitude. This task shows that serotonin neurons regulate waiting for future rewards depending on reward probability, but not the value of the reward per se. That is, optogenetic activation of DRN serotonin neurons only impacted behaviour when the reward probability in the task was high, even though the expected value of reward was matched across contexts.
For example, if the 25% context gave more rewards per delivery (3 pellets; expected value of reward is 3 x 0.25 = 0.75), the expected value in the 75% context was equated by giving fewer rewards more frequently (1 pellet; expected value of reward is 1 x 0.75 = 0.75). However, even when the reward value was equated between those two contexts, an optogenetic effect was only found in the high probability case, suggesting that serotonin is not encoding value per se, but is rather sensitive to high reward probability. This effect was also dependent on uncertainty in the timing of reward delivery: the optogenetic effect on omission trials was more prominent when the delay was random than when it was fixed. That is, if mice were unsure of the delay, optogenetic activation of DRN serotonin neurons made them more likely to wait longer for reward. To explain this context-dependent effect, the authors proposed a Bayesian decision model of waiting. This mathematical model assumes that mice have acquired an internal model of reward delivery timing. Mice then update the belief that the current trial is likely to result in reward by integrating the prior estimate of reward probability and the internal model of reward timing to compute the value of the action. This model suggests that mice may estimate the value of actions (e.g., waiting for delayed rewards) in a model-based manner, in that they use an internal model of reward timing to infer the current state and estimate the value of the actions. If mice calculate a high value of the action (i.e., higher posterior estimate of reward), they are more likely to wait, particularly if the timing is currently uncertain (and could be longer than 3 s). Here, serotonin adaptively increases the estimate of reward probability according to reward context and timing.
Overall, this study suggests that DRN serotonin neurons encode the estimate of reward probability and relay that information to achieve flexible model-based prediction of upcoming reward.

Serotonin is critical for learning and using model-based associations

Recently, studies have explicitly shown that serotonin is necessary for model-based associations between actions and rewards [43*,44**]. For example, it has now been demonstrated that serotonin neurons in the DRN are critical for reward devaluation. In the reward devaluation task, animals first learn actions leading to rewards (e.g., left lever leads to grain pellets and right lever leads to sucrose solution). After learning the action-reward contingencies, devaluation of one reward is achieved by specific satiety (i.e., overconsumption of one of the rewards) or by pairing reward consumption with lithium chloride (LiCl) injection to induce sickness. Subsequently, animals receive a test session where they can press either lever without reward feedback. If animals use the model-based internal representation of rewards in a sensory-specific manner, responding on the lever associated with the devalued reward should be significantly reduced relative to the lever leading to the non-devalued reward. One study used a selective serotonin 1B receptor agonist to manipulate serotonin in rats [43*]. Serotonin 1B receptor agonists act on autoreceptors located on the terminals of serotonergic projections and reduce serotonin release. Here, systemic injection of the serotonin 1B receptor agonist during instrumental training diminished the devaluation effect in the subsequent test session. This was despite rats being able to acquire general lever pressing directed towards the rewards [43*]. These results with systemic reduction in serotonin have now been extended with specific optogenetic inhibition of serotonin neurons in the DRN. Specifically, when DRN serotonin neurons were optogenetically inhibited during the test session after devaluation, the devaluation effect was eliminated.
Interestingly, this same study showed that optogenetic inhibition of serotonin neurons in the median raphe nucleus, another major serotonergic subnucleus, left the devaluation effect intact [44**]. This suggests that DRN serotonin neurons are necessary for guiding actions based on current representations of rewards in a sensory-specific manner. Overall, these results demonstrate that the serotonergic system is necessary for learning the model-based relationship between actions and rewards, as well as using these learned relationships to guide current actions.
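The logic of the devaluation test can be made concrete with a short Python sketch. It is an illustration only, not an analysis from [43*] or [44**]: the dictionaries, the numeric values, and the function name `model_based_values` are hypothetical. The sketch contrasts an agent that evaluates each lever through the outcome it predicts (model-based) with one that relies on values cached during training (model-free).

```python
# Hypothetical action-outcome contingencies learned during training.
action_outcome = {"left_lever": "grain_pellet", "right_lever": "sucrose"}


def model_based_values(outcome_value):
    """Value each action through the current worth of its predicted outcome."""
    return {action: outcome_value[outcome]
            for action, outcome in action_outcome.items()}


# Cached (model-free) action values acquired when both rewards were valued.
cached_values = {"left_lever": 1.0, "right_lever": 1.0}

# Devaluation (e.g., specific satiety on sucrose) changes the current value
# of the outcome without any new lever-outcome experience.
outcome_value_after = {"grain_pellet": 1.0, "sucrose": 0.0}

mb_after = model_based_values(outcome_value_after)
# The test is run without reward feedback, so cached values cannot update.
mf_after = dict(cached_values)

print("model-based:", mb_after)
print("model-free: ", mf_after)
```

In this sketch only the model-based agent reduces responding on the lever tied to the devalued reward, which is why sensitivity to devaluation is taken as a behavioural signature of model-based control.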

Data from humans demonstrate a role for serotonergic systems in model-based behaviour

Studies in humans using tryptophan depletion, which induces systemic reduction in serotonin, have also provided support for a role of serotonergic systems in model-based behaviour [45,46]. For example, Worbe et al. demonstrated that tryptophan depletion produced a shift towards devaluation-insensitive model-free behaviour in the slips-of-action task [45]. Here, the magnitude of the reduction in the model-based component of behaviour was explicitly correlated with the reduction of tryptophan in the blood [45]. Further, another study used the well-established Daw two-step task to reveal that tryptophan depletion reduces the model-based weighting of performance in this task, as confirmed by the pattern of stay probabilities and computational modelling [46]. Interestingly, here there were differences in how behaviour changed with tryptophan depletion depending on whether the task employed monetary gain or monetary loss. Specifically, while tryptophan depletion clearly biased choice performance away from model-based strategies in the task employing monetary reward, this was not the case in the task that employed monetary loss (i.e., punishment). Instead, tryptophan depletion may have increased model-based performance in the punishment version of the task, though this could not be recapitulated with a computational analysis of the data. Nonetheless, a differential role for serotonergic systems in reward vs. punishment is consistent with the distinct response patterns to reward and punishment in distinct circuits [47,48**]. Together, these data support a role for serotonergic systems in model-based behaviour in reward-based tasks. In the future, it will be important to examine the response profiles of different serotonergic projections and their roles in model-based learning associated with reward or punishment.
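The idea of a "model-based weighting" in two-step tasks can be illustrated with a small Python sketch. It follows the commonly used hybrid form in which first-stage action values are a weighted mixture of model-based and model-free estimates; it is not the specific model fitted in [46], and the weight `w`, the transition probabilities, the value estimates, and the inverse temperature `beta` are all assumed numbers chosen for illustration.

```python
import math


def hybrid_q(q_mf, q_mb, w):
    """Mix model-free and model-based action values with weight w in [0, 1]."""
    return [w * mb + (1.0 - w) * mf for mf, mb in zip(q_mf, q_mb)]


def softmax(q, beta=3.0):
    """Convert action values to choice probabilities (beta: inverse temperature)."""
    m = max(q)
    exps = [math.exp(beta * (x - m)) for x in q]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed transition model: P(second-stage state | first-stage action).
transitions = [[0.7, 0.3],
               [0.3, 0.7]]
# Assumed best available value in each second-stage state.
stage2_values = [0.8, 0.2]

# Model-based values: expected second-stage value under the transition model.
q_mb = [sum(p * v for p, v in zip(row, stage2_values)) for row in transitions]
# Model-free values: cached from direct reinforcement (assumed numbers).
q_mf = [0.3, 0.6]

p_high_w = softmax(hybrid_q(q_mf, q_mb, w=0.8))  # mostly model-based
p_low_w = softmax(hybrid_q(q_mf, q_mb, w=0.2))   # mostly model-free

print("w=0.8:", [round(p, 3) for p in p_high_w])
print("w=0.2:", [round(p, 3) for p in p_low_w])
```

With these assumed numbers, lowering `w` flips the preferred first-stage action from the one favoured by the task structure to the one favoured by cached values. A reduction in fitted `w` is the kind of shift that would be described as reduced model-based weighting.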

Activity of DRN serotonin neurons signals expected future rewards

Recent recording studies using both in vivo calcium imaging [48**,49,50,51*] and single-unit electrophysiological recording [49,52,53] support the notion that DRN serotonin neurons are important in predicting upcoming rewards. Recording single-unit electrophysiological activity allows us to capture heterogeneity in the response of individual serotonin neurons, while fibre photometry reflects the calcium signal from the population of serotonin neurons and will capture patterns exhibited by a majority of serotonin neurons. For example, a single-unit electrophysiological recording study of DRN serotonin neurons while mice waited for rewards found that the largest population of DRN serotonin neurons exhibited sustained firing during the delay period [49]. In this paper, the authors also used fibre photometry to examine how serotonergic activity is changed by reward delays. When the delay period was extended, activity in DRN serotonin neurons remained sustained and increased closer to expected delivery, suggesting that the activity of these neurons is modulated by a temporal expectation for rewards [49]. Another study recorded the activity of DRN serotonin neurons using fibre photometry while mice were learning the association between an auditory cue and reward [51*]. Initially, DRN serotonin neurons showed a phasic response to the reward, but not to the to-be-learned cue [51*]. However, across learning, DRN serotonin neurons developed a sustained response across the reward cue (or delay to reward), showing a ramp up toward reward delivery [51*], consistent with prior studies [49]. Furthermore, a recent study recorded in vivo calcium signals from individual DRN serotonin neurons using endoscope imaging [48**] and showed how the response to a reward-predictive cue develops across learning, in a task where the reward was delivered following the involuntary turn of a wheel attached to a spout.
On the first day of training, about half of serotonin neurons (47.1%) exhibited an increase in activity at reward delivery, while a smaller proportion of serotonin neurons (0.01%) increased during the wheel turn. However, after learning, a greater number of serotonin neurons (26.8%) showed an excitatory response during the wheel turn. Importantly, a smaller proportion of neurons developed an inhibitory response at wheel turn across learning (3.9% to 11.9% on the first day and after learning, respectively), suggesting heterogeneity in the response profile of serotonin neurons. Recording studies have also shown that the activity of DRN serotonin neurons to reward cues is modulated by the size of the received outcome, which was consistent across both fibre photometry [50,51] and single-unit electrophysiological recording [52]. That is, DRN serotonin neurons show a greater magnitude of activity to rewards of greater magnitude, regardless of the method of recording used to capture neural activity.

It's also worth noting that other studies have recorded DRN serotonin neural activity during cue-outcome learning (water, no outcome, or airpuff) in head-fixed mice [50,52]. When capturing neural activity using fibre photometry in a head-fixed scenario, DRN serotonin neurons show sustained activity across an odour and delay to reward, which plateaus early and does not exhibit a ramp up towards expected reward delivery [50]. However, electrophysiological recordings of serotonin neurons reveal nuances in this finding such that about half of serotonin neurons exhibited phasic excitation to a reward-predictive cue, with a subcomponent of this population also showing this sustained increase in activity [52]. A recent electrophysiological study also found heterogeneous DRN serotonin neural activity while head-fixed mice learnt cue-reward associations, where the authors varied the reward probabilities associated with the cue across learning [53]. Among a subset of DRN serotonin neurons, the response during the cue and the subsequent delay before reward was correlated with the degree of uncertainty in the expectation for reward delivery [53]. Specifically, some neurons showed increases in responding to the cue and during the delay when mice had a high expectation for reward (i.e., low expected uncertainty), while others showed a decrease in activity. This suggests that DRN serotonin neurons were encoding certainty within this probabilistic reward setting. Overall, these studies suggest that DRN serotonin neurons show sustained activity during reward anticipatory periods, which evolves with learning and reflects the different aspects of upcoming rewards including size, delay, and uncertainty.

As shown above, different response profiles (e.g., ramping up or a sustained increase) have been reported across studies that measure serotonergic activity during reinforcement learning (Figure 1A-C). These differences may come from variations in the experimental setup. In particular, some of these differences can be explained by whether the animal is freely moving or head-fixed during the behaviour. For example, the ramp-like activity in DRN neurons is most prevalent in freely-moving rodents [48**,49,51*], while sustained increases are more prevalent in the head-fixed scenario [50,52]. This could be because head fixation changes the behavioural demands of the task, and subjects no longer need to adjust behaviour to retrieve reward. Further, head fixation also alters the stress level of the animal. In line with this, when mice are subjected to head fixation, DRN serotonin neurons reduce their activity [51*]. Additionally, there are several differences in experimental parameters including the length of cues, delays to the rewards, inter-trial intervals, and sensory modality of the cues, which may further reveal differences in firing of serotonin neurons [49,54,55]. This is what we would expect if serotonin neurons reflected a multidimensional expectation signal, which necessarily varies with these parameters.

Figure 1. VTA dopamine and DRN serotonin neuronal activity during reward-based tasks.

(A) Profiles of serotonin neurons during waiting for delayed rewards in freely moving rodents: While mice are waiting for rewards, the activity of DRN serotonin neurons ramps up as mice near reward delivery, when recorded using fibre photometry [49]. In single-unit electrophysiological recordings, the largest population (58%) of DRN serotonin neurons showed similar ramping-up activity in the same task. However, other neurons show different responses: sustained increase during the delay (21%), ramping-up without phasic response to reward delivery (11%), or ramping-down without phasic response to reward delivery (10%) [49]. (B) Calcium fluorescent signals from DRN serotonin neurons across cue-reward learning in freely moving mice [48**,51*]: In early sessions of cue-reward learning, serotonin neurons show phasic responses to reward but not the auditory cue ([48**,51*]). As learning progresses, DRN serotonin neurons show sustained ramps up towards reward ([51*]), with some heterogeneity in the single-unit response characterised by a proportion of neurons developing an inhibitory response to the cue (11%) [48**]. (C) Other studies showed a phasic response to odour cues predictive of rewards with a sustained increase in activity that plateaus early on in head-fixed mice using fibre photometry and single-unit electrophysiology [50,52]. (D) Activity of dopamine neurons in medial and lateral VTA. Left: The largest population of reward-responsive VTA dopamine neurons (~60-70%) shows phasic firing at the cue with smaller firing at the reward [62], which recapitulates classic findings [56-61]. Right: Other subsets of reward-responsive VTA dopamine neurons, which are mainly located in medial VTA, show sustained (27%) activity during a reward-predictive cue prior to reward delivery [62].

Unique and complementary roles for dopamine and serotonin in model-based learning

Phasic firing to rewards, and to the sensory cues that predict rewards, is a major feature of VTA dopamine neural activity during learning (Figure 1D; [26,56-62]). While there is heterogeneity in this response (Figure 1D-E), the largest population of reward-responsive dopamine neurons, which is predominantly located in the lateral part of VTA, exhibits phasic firing to reward and reward-predictive cues (~60-70%; [62]). Importantly, it has been shown that phasic activity in VTA dopamine neurons critically regulates learning of model-based associations. For example, optogenetic studies have shown that phasic activity in dopamine neurons is necessary for learning sensory features of rewards [18*,20**,21,23] as well as the associations between value-neutral cues [22,24], without endowing general values to antecedent events [20**,24,25]. Electrophysiological studies support the function of VTA dopamine neurons in model-based learning [26,63,64,65**]. VTA neurons respond to changes in the sensory features of associated outcomes [63,66,67] as sensory prediction errors [26]. Indeed, dopamine neurons also reflect learning about value-neutral cues [68]. This error signal contains specific information about what is mis-predicted [64,65**], which illustrates that dopamine is likely providing the content of the error to downstream regions to direct learning. Furthermore, we recently showed that VTA dopamine neurons are even necessary to learn backward associations between rewards and cues, where reward delivery is followed by a value-neutral cue [17**]. Altogether, these findings demonstrate that phasic dopamine signals act as a universal teaching signal to associate related events together regardless of their motivational significance or inherent value.
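The notion of a sensory prediction error that carries the content of what was mis-predicted can be sketched as a feature-wise error, as below. This is a minimal Python illustration and not a model taken from the cited studies: the feature names (`value`, `banana_flavour`, `grape_flavour`), the learning rate, and the functions `prediction_error` and `update` are hypothetical.

```python
def prediction_error(expected, observed):
    """Feature-wise error: observed minus expected, one entry per feature."""
    features = set(expected) | set(observed)
    return {f: observed.get(f, 0.0) - expected.get(f, 0.0) for f in features}


def update(expected, observed, learning_rate=0.5):
    """Move each expected feature towards the observation by a fraction of its error."""
    err = prediction_error(expected, observed)
    return {f: expected.get(f, 0.0) + learning_rate * err[f] for f in err}


# A cue predicts a banana-flavoured reward; a grape-flavoured reward of equal
# value is delivered instead (an identity change with no change in value).
expected = {"value": 1.0, "banana_flavour": 1.0, "grape_flavour": 0.0}
observed = {"value": 1.0, "banana_flavour": 0.0, "grape_flavour": 1.0}

err = prediction_error(expected, observed)
new_expected = update(expected, observed)

print("error:", err)
print("updated expectation:", new_expected)
```

In this sketch the error on the value dimension is zero, so a purely scalar value error would drive no learning, whereas the feature-wise error still signals which sensory features were mis-predicted and in which direction.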

Studies have shown that serotonin is also important for model-based learning [29**,43*,44**]. Specifically, activity in DRN serotonin neurons is important for developing and using model-based associations between actions and outcomes [43*,44**]. Further, serotonin regulates the ability to use time to predict reward and incorporates reward probabilities to influence how to respond appropriately [29**]. One of the greatest differences between dopamine and serotonin neurons is in the temporal dynamics of their neural activity (Figure 1). This is critical to our understanding of the differential function of these systems. Specifically, dopamine neurons exhibit phasic activity to unexpected events, which range from changes in sensory features of rewards, their size, or even presentation of valence-neutral cues [26,63,64,65**,69]. These studies have indicated that the phasic dopamine response can act as a multidimensional teaching signal to instantiate learning. On the other hand, serotonin neurons show sustained activity across a reward-predictive cue or the temporal delay between reward deliveries, which ramps up as reward delivery becomes imminent [49,51*]. This temporal profile of serotonin activity suggests that serotonin may relay precise expectations for upcoming rewards, which incorporate multidimensional aspects of the environment to facilitate adaptive behaviour. This role may be particularly apparent in tasks that require a temporal criterion, as this relies on an internal model rather than on physical cues in the environment to motivate behaviour [65**,69]. In line with this, the role of serotonin in estimating time under reward prediction is revealed to be more complex once other variables like reward certainty and probability come into play [29**]. Thus, we propose that serotonin sends a multidimensional expectation signal to instantiate precise expectations for upcoming rewards to regulate learning and behaviour generated downstream.
This is supported by the projection profiles of DRN serotonin neurons. Much like dopamine neurons, serotonin neurons send vast and dense projections throughout the brain, particularly to areas important for flexible behaviour, including prefrontal cortex, amygdala, and striatum [47,70-75]. Future research would benefit from manipulation of these projections during tasks that engender states with complex reward predictions to test this hypothesis, particularly as it relates to learned relationships between value-neutral events, as the current work with serotonin neurons in model-based learning always involves rewards. Applying these tasks in combination with the temporal specificity of optogenetics will allow us to test the complementary roles for serotonin and dopamine neurons in representing the multidimensional expectation (e.g., during the reward-predictive cue) and the prediction error (e.g., during reward receipt), respectively.

One interesting direction of future study is to understand how DRN serotonin neurons interact with VTA dopamine neurons (Figure 2; [76]). Particularly relevant to our proposed framework is that VTA GABAergic (VTAGABA) neurons exhibit ramping-up activity in cue-outcome learning [60] much like DRN serotonin neurons [49,51*] and regulate prediction error signals exhibited by VTA dopamine neurons [77]. This ramping profile has been interpreted as VTAGABA neurons sending reward expectations to VTA dopamine neurons to regulate ongoing learning [60]. Interestingly, VTAGABA and DRN neurons are interconnected. VTAGABA neurons synapse onto serotonin and GABAergic neurons in DRN [78,79]. On the other hand, DRN serotonin neurons send dense projections to VTA [80-84]. Both VTA dopamine and VTAGABA neurons express serotonin receptors [83,85,86]. The DRN to VTA projection as well as VTA to DRN projection are known to regulate real-time or conditioned place preference [78,83,87], which may indicate a role for learning. Future studies examining how the DRN to VTA projections regulate expectation and reward learning would help reveal the differential roles these regions play in reinforcement learning. In particular, it would be interesting to examine whether projections from DRN serotonin neurons inform the reward expectation exhibited by VTAGABA neurons, or vice versa.

Figure 2. A schematic of interconnections between VTA and DRN neurons and their afferent projections.

VTA and DRN are interconnected. The DRN sends serotonergic and glutamatergic projections to VTA dopamine neurons [80,83,86]. Some DRN neurons co-release serotonin and glutamate to regulate dopamine neurons in VTA [83]. VTA GABAergic neurons express serotonin receptors [85]. On the other hand, VTA primarily sends GABAergic projections to both DRN serotonin and GABAergic neurons [78,79]. DRN GABA neurons also locally project to DRN serotonin neurons [79]. Both VTA dopamine and DRN serotonin neurons send projections to overlapping brain regions that critically regulate model-based learning, such as orbitofrontal cortex (OFC) [47], medial prefrontal cortex (mPFC) [73,74], nucleus accumbens (NAc) [73], lateral hypothalamus (LH) [47,74], and basolateral amygdala (BLA) [70-72].

Future directions

Here, we discussed the possible roles of DRN serotonin neurons in model-based learning. However, it is important to acknowledge that serotonin neurons are involved in diverse forms of reinforcement learning, which may also include model-free learning mechanisms. Indeed, recent studies using reward-driven probabilistic choice tasks show choice patterns that can be explained by a variant of model-free reinforcement learning [53,88]. For example, one study showed that optogenetic activation of serotonin neurons at reward delivery more strongly biased subsequent choices compared to choices following no optogenetic activation [88]. Using a mathematical model of reinforcement learning, the study showed that optogenetic activation of serotonin neurons increased the learning rate, consistent with a model-free learning algorithm [88]. In another study using a probabilistic choice task, chemogenetic inhibition of DRN serotonin neurons disrupted adaptation of choices following a change in reward probability, which could also be characterised as the disruption of the computational mechanisms that tune the learning rate under a model-free reinforcement learning algorithm [53]. Essentially, these studies consistently showed that choices were strongly informed by the long-term history of direct outcomes, which is a feature of model-free learning [2]. Indeed, it has been demonstrated that model-free learning algorithms can better capture choice patterns in probabilistic choice tasks relative to model-based learning algorithms in which animals use the structure of the task [89]. Altogether, these studies suggest that the contribution of serotonin neurons in probabilistic choice tasks could reflect a role in model-free learning.
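The role of the learning rate in a model-free update can be shown with a short Python sketch. It uses a standard delta-rule value update and is not the specific model fitted in [53] or [88]: the function names `q_update` and `run`, the outcome history, and the learning rates are assumptions made for illustration.

```python
def q_update(q, reward, alpha):
    """Delta-rule update: move the cached value towards the received outcome."""
    return q + alpha * (reward - q)


def run(rewards, alpha, q0=0.5):
    """Apply the update over a sequence of outcomes and return the final value."""
    q = q0
    for r in rewards:
        q = q_update(q, r, alpha)
    return q


# Hypothetical outcome history for one option: four rewards, then an omission.
history = [1.0, 1.0, 1.0, 1.0, 0.0]

q_low_alpha = run(history, alpha=0.1)   # slow learner: averages over a long history
q_high_alpha = run(history, alpha=0.6)  # fast learner: dominated by recent outcomes

print(f"alpha=0.1 -> {q_low_alpha:.3f}")
print(f"alpha=0.6 -> {q_high_alpha:.3f}")
```

With a higher learning rate, the single most recent outcome pulls the cached value much further, so it biases the next choice more strongly. This is the sense in which an increased learning rate would make individual outcomes weigh more heavily on subsequent choices.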

However, as we have discussed above, recent studies have also shown that serotonin plays important roles in model-based learning [29**,43*,44**], in a manner that cannot be accounted for by model-free learning mechanisms. For example, the context-dependent effect of optogenetic activation of serotonin neurons on waiting times for rewards is consistent with a Bayesian decision model that uses an internal model of task structure [29**]. Further, outcome devaluation studies provide unequivocal behavioural evidence that serotonin neurons regulate learning about sensory-specific features of rewards and are also necessary to use that information to infer possible outcomes, both of which are cognitive processes underlying model-based learning [43*,44**]. Finally, data from humans have shown that serotonin depletion reduces the model-based weighting of behaviour as measured by performance on the Daw two-step task [45,46]. Given the evidence that serotonin may contribute to both model-free and model-based computations, the next important question is whether and how serotonin neurons take part in different computational aspects of learning. That is, can a suggested role for serotonin in model-free learning be accounted for by model-based learning mechanisms? Or, if serotonin plays a role in both model-free and model-based components of learning, can this be achieved via different projections or different populations of neurons at different time points (e.g., during delays to rewards or at the observation of outcomes)? Answering this question will require careful experimentation using sophisticated behavioural tasks in combination with optogenetics, similar to how researchers have attempted to dissociate these roles with midbrain dopamine neurons [18*,20**,21-25]. Such an approach will help us further dissect the roles of serotonin neurons in reinforcement learning.

As a final point, this complementary role for serotonin and dopamine in model-based learning may be consistent with work that has implicated dopamine and serotonin as playing distinct roles in working memory [90]. Studies using macaque monkeys examined the role of dorsolateral prefrontal cortex (dlPFC) in a spatial working-memory task [91,92]. In this task, monkeys were briefly presented with a reward-associated cue, which was followed by a delay period. Once a ‘go’ signal was presented after the delay, the monkeys were required to make a saccade toward the cue’s prior position, which had to be held in working memory. Here, neurons in dlPFC exhibited sustained activity during the delay period [91]; these neurons are referred to as delay cells. Delay cells also have preferred spatial locations for which they fire, and their spatial tuning and activity during the delay critically regulate spatial working memory in the task. Subsequent studies have revealed that local infusion of D1 receptor antagonists or serotonin 2A receptor agonists influences spatial tuning of the delay cells and improves behaviour on the task [90]. Although these data do not speak to the temporal dynamics of the dopaminergic and serotonergic influence over dlPFC neurons during the spatial working memory task, single-unit electrophysiological studies have recorded midbrain dopamine neurons and DRN neurons during similar tasks. Here, dopamine neurons show a phasic response to the cue [93], while DRN neurons exhibit tonic activity during the delay prior to the ‘go’ signal [94]. These results suggest that dopamine and serotonin have complementary roles in working memory. This is consistent with our current framework, which advocates for a role of dopamine in providing the teaching signal to facilitate model-based learning, and of serotonin in relaying the multidimensional expectation for reward that informs subsequent teaching signals.

Conclusion

Here, we compared the unique and complementary roles of DRN serotonin neurons and VTA dopamine neurons in model-based learning. We know that phasic firing of dopamine neurons tells us when to learn and what to learn [64,65**]. Here, we propose that serotonin allows us to refine expectations for reward by interplexing information across multiple dimensions of the cue-reward contingencies, including time [49], certainty [29,53], magnitude [50,51*,52], and probabilities associated with reward delivery [29**], and by relaying this expectation through the brain to influence ongoing learning and behaviour. This facilitates a flexible trade-off in behaviour that benefits from precise estimates of upcoming rewards [29**]. Dopamine and serotonin neurons send projections to multiple brain regions in an anatomically segregated manner [47,74,82,95], and the behavioural function of the neuromodulators depends on the target regions [18*,33,47,48**,96,97*]. Recent developments in genetic tools allow us to examine the neural activity of specific projections and to optogenetically manipulate them to test their function. This approach will also benefit our understanding of the neural circuits involved in psychiatric disorders, which almost invariably implicate serotonin and dopamine [98].

Highlights.

  • Dopamine and serotonin play important roles in acquiring internal models of predictive relationships and using these learned representations to guide actions.

  • Dopamine neurons exhibit phasic activity to unexpected events, and causal studies have shown that phasic firing of dopamine neurons can act as a teaching signal to form associations between events regardless of their motivational significance or inherent value.

  • Serotonin neurons show sustained activity during waiting or during reward-predictive cues later in learning, and this activity often ramps up as subjects approach reward.

  • Sustained serotonin activity may relay precise expectations for upcoming rewards, incorporating multidimensional aspects of the environment to facilitate adaptive behaviour.

Acknowledgements

This work was supported by the National Institutes of Health [NIDA R01 054967; NIDA R01 057084], awarded to MJS.

Footnotes

Conflict of Interest Statement

Declarations of interest: None

Reference list

  • 1. Behrens TEJ, Muller TH, Whittington JCR, Mark S, Baram AB, Stachenfeld KL, Kurth-Nelson Z: What Is a Cognitive Map? Organizing Knowledge for Flexible Behavior. Neuron 2018, 100:490–509.
  • 2. Drummond N, Niv Y: Model-based decision making and model-free learning. Curr Biol 2020, 30:R860–R865.
  • 3. Whittington JCR, McCaffary D, Bakermans JJW, Behrens TEJ: How to build a cognitive map. Nat Neurosci 2022, 25:1257–1272.
  • 4. Costa KM, Scholz R, Lloyd K, Moreno-Castilla P, Gardner MPH, Dayan P, Schoenbaum G: The role of the lateral orbitofrontal cortex in creating cognitive maps. Nat Neurosci 2023, 26:107–115.
  • 5. Gillan CM, Kosinski M, Whelan R, Phelps EA, Daw ND: Characterizing a psychiatric symptom dimension related to deficits in goal-directed control. Elife 2016, 5.
  • 6. Culbreth AJ, Westbrook A, Daw ND, Botvinick M, Barch DM: Reduced model-based decision-making in schizophrenia. J Abnorm Psychol 2016, 125:777–787.
  • 7. Groman SM, Massi B, Mathias SR, Lee D, Taylor JR: Model-Free and Model-Based Influences in Addiction-Related Behaviors. Biol Psychiatry 2019, 85:936–945.
  • 8. Seow TXF, Benoit E, Dempsey C, Jennings M, Maxwell A, O'Connell R, Gillan CM: Model-Based Planning Deficits in Compulsivity Are Linked to Faulty Neural Representations of Task Structure. J Neurosci 2021, 41:6539–6550.
  • 9. Sharp PB, Dolan RJ, Eldar E: Disrupted state transition learning as a computational marker of compulsivity. Psychological Medicine 2023, 53:2095–2105.
  • 10. Doya K: Metalearning and neuromodulation. Neural Netw 2002, 15:495–506.
  • 11. Doya K: Modulators of decision making. Nat Neurosci 2008, 11:410–416.
  • 12. Akam T, Walton ME: What is dopamine doing in model-based reinforcement learning? Curr Opin Behav Sci 2021, 38:74–82.
  • *13. Doya K, Miyazaki KW, Miyazaki K: Serotonergic modulation of cognitive computations. Current Opinion in Behavioral Sciences 2021, 38:116–123. The authors reviewed recent studies examining the role of serotonin neurons in learning and behaviour. They discussed the diverse roles of serotonin neurons, which go beyond the classical temporal discount factor hypothesis.
  • 14. Langdon AJ, Sharpe MJ, Schoenbaum G, Niv Y: Model-based predictions for dopamine. Curr Opin Neurobiol 2018, 49:1–7.
  • 15. Starkweather CK, Uchida N: Dopamine signals as temporal difference errors: recent advances. Curr Opin Neurobiol 2021, 67:95–105.
  • 16. Seitz BM, Blaisdell AP, Sharpe MJ: Higher-Order Conditioning and Dopamine: Charting a Path Forward. Front Behav Neurosci 2021, 15:745388.
  • **17. Seitz BM, Hoang IB, DiFazio LE, Blaisdell AP, Sharpe MJ: Dopamine errors drive excitatory and inhibitory components of backward conditioning in an outcome-specific manner. Curr Biol 2022, 32:3210–3218 e3213. Here, we investigated the effect of optogenetic inhibition of VTA dopamine neurons during a backward conditioning task where reward delivery is reliably followed by a value-neutral auditory stimulus. We found that the inhibition of VTA dopamine neurons at the onset of the auditory stimulus (when an error would occur) during backward conditioning disrupted the learned associations during this task, which include sensory-specific inhibitory associations and general excitatory associations. This study demonstrated that VTA dopamine neurons are necessary for learning associations between events, regardless of motivational significance.
  • *18. Sias AC, Jafar Y, Goodpaster CM, Ramirez-Armenta K, Wrenn TM, Griffin NK, Patel K, Lamparelli AC, Sharpe MJ, Wassum KM: Dopamine projections to the basolateral amygdala drive the encoding of identity-specific reward memories. Nat Neurosci 2024, 27:728–736. The authors investigated the role of VTA dopamine neurons projecting to basolateral amygdala (BLA) in cue-reward learning. They found that dopamine is released in BLA during cue-reward learning and that optogenetic inhibition of VTA dopamine projections to BLA during cue-reward learning disrupted the use of identity-specific reward memories. This study demonstrated that VTA-BLA dopamine projections are critical to learning identity-specific reward memories and reinforces the emerging idea that VTA dopamine neurons play multifaceted roles in learning through distinct projections.
  • 19. Liu Q, Zhao Y, Attanti S, Voss JL, Schoenbaum G, Kahnt T: Midbrain signaling of identity prediction errors depends on orbitofrontal cortex networks. Nat Commun 2024, 15:1704.
  • **20. Millard SJ, Hoang IB, Sherwood S, Taira M, Reyes V, Greer Z, O’Connor SL, Wassum KM, James MH, Barker DJ, et al.: Cognitive representations of intracranial self-stimulation of midbrain dopamine neurons depend on stimulation frequency. Nature Neuroscience 2024. Here, we examined the representation that drives intracranial self-stimulation (ICSS) of dopamine neurons while varying the frequency of dopamine neuronal firing. We found that a 20Hz stimulation train (approximating a learning-relevant prediction error) did not function as a reward. However, a supraphysiological 50Hz stimulation of dopamine neurons functions as a sensory-specific reward that could guide future actions. These results demonstrate that a learning-relevant prediction error signal does not contain value sufficient to uphold the value theory of dopamine.
  • 21. Chang CY, Gardner M, Di Tillio MG, Schoenbaum G: Optogenetic Blockade of Dopamine Transients Prevents Learning Induced by Changes in Reward Features. Curr Biol 2017, 27:3480–3486 e3483.
  • 22. Sharpe MJ, Chang CY, Liu MA, Batchelor HM, Mueller LE, Jones JL, Niv Y, Schoenbaum G: Dopamine transients are sufficient and necessary for acquisition of model-based associations. Nat Neurosci 2017, 20:735–742.
  • 23. Keiflin R, Pribut HJ, Shah NB, Janak PH: Ventral Tegmental Dopamine Neurons Participate in Reward Identity Predictions. Curr Biol 2019, 29:93–103 e103.
  • 24. Sharpe MJ, Batchelor HM, Mueller LE, Yun Chang C, Maes EJP, Niv Y, Schoenbaum G: Dopamine transients do not act as model-free prediction errors during associative learning. Nat Commun 2020, 11:106.
  • 25. Maes EJP, Sharpe MJ, Usypchuk AA, Lozzi M, Chang CY, Gardner MPH, Schoenbaum G, Iordanova MD: Causal evidence supporting the proposal that dopamine transients function as temporal difference prediction errors. Nat Neurosci 2020, 23:176–178.
  • 26. Sadacca BF, Jones JL, Schoenbaum G: Midbrain dopamine neurons compute inferred and cached value prediction errors in a common framework. Elife 2016, 5.
  • 27. Groman SM, Massi B, Mathias SR, Curry DW, Lee D, Taylor JR: Neurochemical and Behavioral Dissections of Decision-Making in a Rodent Multistage Task. J Neurosci 2019, 39:295–306.
  • 28. Krausz TA, Comrie AE, Kahn AE, Frank LM, Daw ND, Berke JD: Dual credit assignment processes underlie dopamine signals in a complex spatial environment. Neuron 2023, 111:3465–3478 e3467.
  • **29. Miyazaki K, Miyazaki KW, Yamanaka A, Tokuda T, Tanaka KF, Doya K: Reward probability and timing uncertainty alter the effect of dorsal raphe serotonin neurons on patience. Nat Commun 2018, 9:2048. The authors found that optogenetic activation of DRN serotonin neurons increased the degree to which mice were willing to wait for reward in a context-dependent manner. They showed that the effect of optogenetic activation was more prominent when reward probability is high, and the timing of reward delivery is uncertain. They also proposed a Bayesian decision model of waiting, suggesting that DRN serotonin neurons are involved in developing and deploying a model-based prediction about upcoming rewards rather than model-free valuation of discounted future rewards.
  • 30. Miyazaki K, Miyazaki KW, Doya K: The role of serotonin in the regulation of patience and impulsivity. Mol Neurobiol 2012, 45:213–224.
  • 31. Nautiyal KM, Wall MM, Wang S, Magalong VM, Ahmari SE, Balsam PD, Blanco C, Hen R: Genetic and Modeling Approaches Reveal Distinct Components of Impulsive Behavior. Neuropsychopharmacology 2017, 42:1182–1191.
  • 32. Desrochers SS, Spring MG, Nautiyal KM: A Role for Serotonin in Modulating Opposing Drive and Brake Circuits of Impulsivity. Front Behav Neurosci 2022, 16:791749.
  • 33. Salvan P, Fonseca M, Winkler AM, Beauchamp A, Lerch JP, Johansen-Berg H: Serotonin regulation of behavior via large-scale neuromodulation of serotonin receptor networks. Nat Neurosci 2023, 26:53–63.
  • 34. Wang Q, Chen C, Cai Y, Li S, Zhao X, Zheng L, Zhang H, Liu J, Chen C, Xue G: Dissociated neural substrates underlying impulsive choice and impulsive action. Neuroimage 2016, 134:540–549.
  • 35. Dalley JW, Robbins TW: Fractionating impulsivity: neuropsychiatric implications. Nat Rev Neurosci 2017, 18:158–171.
  • 36. Strickland JC, Johnson MW: Rejecting Impulsivity as a Psychological Construct: A Theoretical, Empirical, and Sociocultural Argument. Psychological Review 2021, 128:336–361.
  • 37. Winstanley CA, Dalley JW, Theobald DE, Robbins TW: Fractionating impulsivity: contrasting effects of central 5-HT depletion on different measures of impulsive behavior. Neuropsychopharmacology 2004, 29:1331–1343.
  • 38. Denk F, Walton ME, Jennings KA, Sharp T, Rushworth MF, Bannerman DM: Differential involvement of serotonin and dopamine systems in cost-benefit decisions about delay or effort. Psychopharmacology (Berl) 2005, 179:587–596.
  • 39. Xu S, Das G, Hueske E, Tonegawa S: Dorsal Raphe Serotonergic Neurons Control Intertemporal Choice under Trade-off. Curr Biol 2017, 27:3111–3119 e3113.
  • 40. Miyazaki KW, Miyazaki K, Tanaka KF, Yamanaka A, Takahashi A, Tabuchi S, Doya K: Optogenetic activation of dorsal raphe serotonin neurons enhances patience for future rewards. Curr Biol 2014, 24:2033–2040.
  • 41. Fonseca MS, Murakami M, Mainen ZF: Activation of dorsal raphe serotonergic neurons promotes waiting but is not reinforcing. Curr Biol 2015, 25:306–315.
  • 42. Schweighofer N, Tanaka SC, Doya K: Serotonin and the evaluation of future rewards: theory, experiments, and possible neural mechanisms. Ann N Y Acad Sci 2007, 1104:289–300.
  • *43. Corbit L, Kendig M, Moul C: The role of serotonin 1B in the representation of outcomes. Sci Rep 2019, 9:2497. The authors examined the role of the serotonergic system in learning model-based associations between actions and rewards. They found that systemic inhibition of the serotonergic system using a 5-HT1B receptor agonist prevented rats from learning model-based associations between actions and rewards. This study demonstrates that the serotonergic system critically regulates learning of model-based associations.
  • **44. Ohmura Y, Iwami K, Chowdhury S, Sasamori H, Sugiura C, Bouchekioua Y, Nishitani N, Yamanaka A, Yoshioka M: Disruption of model-based decision making by silencing of serotonin neurons in the dorsal raphe nucleus. Curr Biol 2021, 31:2446–2454 e2445. The authors found that the optogenetic inhibition of DRN serotonin neurons disrupted the use of learned model-based associations in an instrumental reward devaluation task. On the other hand, optogenetic inhibition of serotonin neurons in the median raphe nucleus did not affect the use of model-based associations. This is the first study clearly demonstrating that serotonin neurons in the DRN, but not the median raphe nucleus, are necessary for using model-based associations to drive instrumental actions.
  • 45. Worbe Y, Savulich G, de Wit S, Fernandez-Egea E, Robbins TW: Tryptophan Depletion Promotes Habitual over Goal-Directed Control of Appetitive Responding in Humans. Int J Neuropsychopharmacol 2015, 18:pyv013.
  • 46. Worbe Y, Palminteri S, Savulich G, Daw ND, Fernandez-Egea E, Robbins TW, Voon V: Valence-dependent influence of serotonin depletion on model-based choice strategy. Mol Psychiatry 2016, 21:624–629.
  • 47. Ren J, Friedmann D, Xiong J, Liu CD, Ferguson BR, Weerakkody T, DeLoach KE, Ran C, Pun A, Sun Y, et al.: Anatomically Defined and Functionally Distinct Dorsal Raphe Serotonin Sub-systems. Cell 2018, 175:472–487 e420.
  • **48. Paquelet GE, Carrion K, Lacefield CO, Zhou P, Hen R, Miller BR: Single-cell activity and network properties of dorsal raphe nucleus serotonin neurons during emotionally salient behaviors. Neuron 2022, 110:2664–2679 e2668. In this study, the authors recorded activity from a large number of DRN serotonin neurons using miniscope calcium imaging while mice were presented with emotionally salient stimuli, such as reward, foot shock, or a social stimulus. They found that DRN serotonin neurons responded to various emotional stimuli with differential selectivity. They also showed that subpopulations of reward-responsive and punishment-responsive neurons had differential projection targets, suggesting that DRN neuronal selectivity is in part defined by projection specificity.
  • 49. Li Y, Zhong W, Wang D, Feng Q, Liu Z, Zhou J, Jia C, Hu F, Zeng J, Guo Q, et al.: Serotonin neurons in the dorsal raphe nucleus encode reward signals. Nat Commun 2016, 7:10503.
  • 50. Matias S, Lottem E, Dugue GP, Mainen ZF: Activity patterns of serotonin neurons underlying cognitive flexibility. Elife 2017, 6.
  • *51. Zhong W, Li Y, Feng Q, Luo M: Learning and Stress Shape the Reward Response Patterns of Serotonin Neurons. J Neurosci 2017, 37:8863–8875. The authors examined how the activity of DRN serotonin neurons developed across cue-reward learning. They found that DRN serotonin neurons developed activity that increased as the reward period approached. They also showed that the activity of DRN serotonin neurons changes under stressful conditions, such as head fixation or quinine consumption. This demonstrates the adaptive nature of DRN serotonin neural activity.
  • 52. Cohen JY, Amoroso MW, Uchida N: Serotonergic neurons signal reward and punishment on multiple timescales. Elife 2015, 4.
  • 53. Grossman CD, Bari BA, Cohen JY: Serotonin neurons modulate learning rate through uncertainty. Curr Biol 2022, 32:586–599 e587.
  • 54. Sizemore TR, Hurley LM, Dacks AM: Serotonergic modulation across sensory modalities. J Neurophysiol 2020, 123:2406–2425.
  • 55. Khalighinejad N, Manohar S, Husain M, Rushworth MFS: Complementary roles of serotonergic and cholinergic systems in decisions about when to act. Curr Biol 2022, 32:1150–1162 e1157.
  • 56. Schultz W, Dayan P, Montague PR: A neural substrate of prediction and reward. Science 1997, 275:1593–1599.
  • 57. Hollerman JR, Schultz W: Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci 1998, 1:304–309.
  • 58. Bayer HM, Glimcher PW: Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 2005, 47:129–141.
  • 59. Pan WX, Schmidt R, Wickens JR, Hyland BI: Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. J Neurosci 2005, 25:6235–6242.
  • 60. Cohen JY, Haesler S, Vong L, Lowell BB, Uchida N: Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 2012, 482:85–88.
  • 61. Eshel N, Tian J, Bukwich M, Uchida N: Dopamine neurons share common response function for reward prediction error. Nat Neurosci 2016, 19:479–486.
  • 62. de Jong JW, Liang Y, Verharen JPH, Fraser KM, Lammel S: State and rate-of-change encoding in parallel mesoaccumbal dopamine pathways. Nat Neurosci 2024, 27:309–318.
  • 63. Takahashi YK, Batchelor HM, Liu B, Khanna A, Morales M, Schoenbaum G: Dopamine Neurons Respond to Errors in the Prediction of Sensory Features of Expected Rewards. Neuron 2017, 95:1395–1405 e1393.
  • 64. Stalnaker TA, Howard JD, Takahashi YK, Gershman SJ, Kahnt T, Schoenbaum G: Dopamine neuron ensembles signal the content of sensory prediction errors. Elife 2019, 8.
  • **65. Takahashi YK, Stalnaker TA, Mueller LE, Harootonian SK, Langdon AJ, Schoenbaum G: Dopaminergic prediction errors in the ventral tegmental area reflect a multithreaded predictive model. Nat Neurosci 2023, 26:830–839. In this study, the authors examined how VTA dopamine neurons achieve an error reflecting multiple events including delay, reward size, and reward identity. They recorded VTA dopamine neurons while rats performed an odour-based choice task and showed that the pattern observed in the task can be theoretically explained by a multithreaded temporal difference reinforcement learning model. This result demonstrates that dopamine can relay errors that contain multiple aspects of reward-related information such as delay, feature, and magnitude.
  • 66. Howard JD, Kahnt T: Identity prediction errors in the human midbrain update reward-identity expectations in the orbitofrontal cortex. Nat Commun 2018, 9:1611.
  • 67. Suarez JA, Howard JD, Schoenbaum G, Kahnt T: Sensory prediction errors in the human midbrain signal identity violations independent of perceptual distance. Elife 2019, 8.
  • 68. Young AM, Ahier RG, Upton RL, Joseph MH, Gray JA: Increased extracellular dopamine in the nucleus accumbens of the rat during associative learning of neutral stimuli. Neuroscience 1998, 83:1175–1183.
  • 69. Takahashi YK, Langdon AJ, Niv Y, Schoenbaum G: Temporal Specificity of Reward Prediction Errors Signaled by Putative Dopamine Neurons in Rat VTA Depends on Ventral Striatum. Neuron 2016, 91:182–193.
  • 70. Asan E, Steinke M, Lesch KP: Serotonergic innervation of the amygdala: targets, receptors, and implications for stress and anxiety. Histochem Cell Biol 2013, 139:785–813.
  • 71. Sengupta A, Bocchio M, Bannerman DM, Sharp T, Capogna M: Control of Amygdala Circuits by 5-HT Neurons via 5-HT and Glutamate Cotransmission. J Neurosci 2017, 37:1785–1796.
  • 72. Sengupta A, Holmes A: A Discrete Dorsal Raphe to Basal Amygdala 5-HT Circuit Calibrates Aversive Memory. Neuron 2019, 103:489–505 e487.
  • 73. Huang KW, Ochandarena NE, Philson AC, Hyun M, Birnbaum JE, Cicconet M, Sabatini BL: Molecular and anatomical organization of the dorsal raphe nucleus. Elife 2019, 8.
  • 74. Ren J, Isakova A, Friedmann D, Zeng J, Grutzner SM, Pun A, Zhao GQ, Kolluru SS, Wang R, Lin R, et al.: Single-cell transcriptomes and whole-brain projections of serotonin neurons in the mouse dorsal and median raphe nuclei. Elife 2019, 8.
  • 75. Cardozo Pinto DF, Yang H, Pollak Dorocic I, de Jong JW, Han VJ, Peck JR, Zhu Y, Liu C, Beier KT, Smidt MP, et al.: Characterization of transgenic mouse models targeting neuromodulatory systems reveals organizational principles of the dorsal raphe. Nat Commun 2019, 10:4633.
  • 76. Peters KZ, Cheer JF, Tonini R: Modulating the Neuromodulators: Dopamine, Serotonin, and the Endocannabinoid System. Trends Neurosci 2021, 44:464–477.
  • 77. Eshel N, Bukwich M, Rao V, Hemmelder V, Tian J, Uchida N: Arithmetic and local circuitry underlying dopamine prediction errors. Nature 2015, 525:243–246.
  • 78. Li Y, Li CY, Xi W, Jin S, Wu ZH, Jiang P, Dong P, He XB, Xu FQ, Duan S, et al.: Rostral and Caudal Ventral Tegmental Area GABAergic Inputs to Different Dorsal Raphe Neurons Participate in Opioid Dependence. Neuron 2019, 101:748–761 e745.
  • 79. Rahaman SM, Chowdhury S, Mukai Y, Ono D, Yamaguchi H, Yamanaka A: Functional Interaction Between GABAergic Neurons in the Ventral Tegmental Area and Serotonergic Neurons in the Dorsal Raphe Nucleus. Front Neurosci 2022, 16:877054.
  • 80. Liu Z, Zhou J, Li Y, Hu F, Lu Y, Ma M, Feng Q, Zhang JE, Wang D, Zeng J, et al.: Dorsal raphe neurons signal reward through 5-HT and glutamate. Neuron 2014, 81:1360–1374.
  • 81. McDevitt RA, Tiran-Cappello A, Shen H, Balderas I, Britt JP, Marino RAM, Chung SL, Richie CT, Harvey BK, Bonci A: Serotonergic versus nonserotonergic dorsal raphe projection neurons: differential participation in reward circuitry. Cell Rep 2014, 8:1857–1869.
  • 82. Beier KT, Steinberg EE, DeLoach KE, Xie S, Miyamichi K, Schwarz L, Gao XJ, Kremer EJ, Malenka RC, Luo L: Circuit Architecture of VTA Dopamine Neurons Revealed by Systematic Input-Output Mapping. Cell 2015, 162:622–634.
  • 83. Wang HL, Zhang S, Qi J, Wang H, Cachope R, Mejias-Aponte CA, Gomez JA, Mateo-Semidey GE, Beaudoin GMJ, Paladini CA, et al.: Dorsal Raphe Dual Serotonin-Glutamate Neurons Drive Reward by Establishing Excitatory Synapses on VTA Mesoaccumbens Dopamine Neurons. Cell Rep 2019, 26:1128–1142 e1127.
  • 84. Beier K: Modified viral-genetic mapping reveals local and global connectivity relationships of ventral tegmental area dopamine cells. Elife 2022, 11.
  • 85. Bubar MJ, Stutz SJ, Cunningham KA: 5-HT(2C) receptors localize to dopamine and GABA neurons in the rat mesoaccumbens pathway. PLoS One 2011, 6:e20508.
  • 86. Gao M, Der-Ghazarian TS, Li S, Qiu S, Neisewander JL, Wu J: Dual Effect of 5-HT(1B/1D) Receptors on Dopamine Neurons in Ventral Tegmental Area: Implication for the Functional Switch After Chronic Cocaine Exposure. Biol Psychiatry 2020, 88:922–934.
  • 87. Nagai Y, Takayama K, Nishitani N, Andoh C, Koda M, Shirakawa H, Nakagawa T, Nagayasu K, Yamanaka A, Kaneko S: The Role of Dorsal Raphe Serotonin Neurons in the Balance between Reward and Aversion. Int J Mol Sci 2020, 21.
  • 88. Iigaya K, Fonseca MS, Murakami M, Mainen ZF, Dayan P: An effect of serotonergic stimulation on learning rates for rewards apparent after long intertrial intervals. Nat Commun 2018, 9:2477.
  • 89. Bari BA, Grossman CD, Lubin EE, Rajagopalan AE, Cressy JI, Cohen JY: Stable Representations of Decision Variables for Flexible Behavior. Neuron 2019, 103:922–933 e927.
  • 90. Cools R, Arnsten AFT: Neuromodulation of prefrontal cortex cognitive function in primates: the powerful roles of monoamines and acetylcholine. Neuropsychopharmacology 2022, 47:309–328.
  • 91. Funahashi S, Bruce CJ, Goldman-Rakic PS: Mnemonic coding of visual space in the monkey's dorsolateral prefrontal cortex. J Neurophysiol 1989, 61:331–349.
  • 92. Arnsten AFT: Retrospective: Patricia S. Goldman-Rakic, pioneer in neuroscience and co-founder of the journal, Cerebral Cortex. Cereb Cortex 2023, 33:8089–8100.
  • 93. Matsumoto M, Takada M: Distinct representations of cognitive and motivational signals in midbrain dopamine neurons. Neuron 2013, 79:1011–1024.
  • 94. Nakamura K, Matsumoto M, Hikosaka O: Reward-dependent modulation of neuronal activity in the primate dorsal raphe nucleus. J Neurosci 2008, 28:5331–5343.
  • 95. Zou WJ, Song YL, Wu MY, Chen XT, You QL, Yang Q, Luo ZY, Huang L, Kong Y, Feng J, et al.: A discrete serotonergic circuit regulates vulnerability to social stress. Nat Commun 2020, 11:4218.
  • 96. Saunders BT, Richard JM, Margolis EB, Janak PH: Dopamine neurons create Pavlovian conditioned stimuli with circuit-defined motivational properties. Nat Neurosci 2018, 21:1072–1083.
  • *97. Miyazaki K, Miyazaki KW, Sivori G, Yamanaka A, Tanaka KF, Doya K: Serotonergic projections to the orbitofrontal and medial prefrontal cortices differentially modulate waiting for future rewards. Sci Adv 2020, 6. The authors examined the differential roles of DRN serotonin neurons projecting to the orbitofrontal cortex (OFC), medial prefrontal cortex (mPFC), and nucleus accumbens (NAc). They found that optogenetic activation of DRN-OFC and DRN-mPFC projections increased waiting duration, while the activation of DRN-NAc projections did not. They also found that the optogenetic activation of DRN-OFC projections prolonged waiting duration in any condition (i.e. different reward probability or uncertainty of reward timing), while DRN-mPFC activation was effective only when reward timing was uncertain, demonstrating the different roles of DRN projections in behavior.
  • 98. Millard SJ, Bearden CE, Karlsgodt KH, Sharpe MJ: The prediction-error hypothesis of schizophrenia: new data point to circuit-specific changes in dopamine activity. Neuropsychopharmacology 2022, 47:628–640.
