Abstract
To make good decisions, humans need to learn about and integrate different sources of appetitive and aversive information. While serotonin has been linked to value-based decision-making, its role in learning is less clear, with acute manipulations often producing inconsistent results. Here, we show that when the effects of a selective serotonin reuptake inhibitor (SSRI, citalopram) are studied over longer timescales, learning is robustly improved. We measured brain activity with functional magnetic resonance imaging (fMRI) in volunteers as they performed a concurrent appetitive (money) and aversive (effort) learning task. We found that 2 weeks of citalopram enhanced reward and effort learning signals in a widespread network of brain regions, including ventromedial prefrontal and anterior cingulate cortex. At a behavioral level, this was accompanied by more robust reward learning. This suggests that serotonin can modulate the ability to learn via a mechanism that is independent of stimulus valence. Such effects may partly underlie SSRIs’ impact in treating psychological illnesses. Our results highlight both a specific function in learning for serotonin and the importance of studying its role across longer timescales.
Author summary
Drugs acting on the neurotransmitter serotonin in the brain are commonly prescribed to treat depression, but we still lack a complete understanding of their effects on the brain and behavior. We do, however, know that patients who suffer from depression learn about the links between their choices and pleasant and unpleasant outcomes in a different manner than healthy controls. Neural markers of learning are also weakened in depressed people. Here, we looked at the effects of a short-term course (2 weeks) of a serotonergic antidepressant on brain and behavior in healthy volunteers while they learnt to predict what consequences their choices had in a simple computer task. We found that the antidepressant increased how strongly brain areas concerned with predictions of pleasant and unpleasant consequences became active during learning of the task. At the same time, participants who had taken the antidepressant also performed better on the task. Our results suggest that serotonergic drugs might exert their beneficial clinical effects by changing how the brain learns.
Introduction
To make good decisions in complex environments, humans and animals need to learn about and integrate different sources of information, such as the good and bad aspects of the outcomes of choices. The neurotransmitter serotonin has been implicated in value-based choice and assumed to play a role in value learning, but even basic aspects of its function in such learning remain contested.
Serotonin has recently been implicated in simple aspects of reward-guided learning and decision-making. For example, when serotonergic neuron activity is recorded on a millisecond timescale or levels are manipulated acutely, serotonin has been found to code information about different aspects of good/appetitive [1–4] or bad/aversive [5,6] outcomes or to relate to avoidance behaviors [7,8]. On this basis, it has been suggested that it has a role in learning [9,10]. However, other studies have found no effects on learning [3,11–14] or have not dissociated learning from altered responsiveness to the valence of the reinforcing events themselves [15]. Thus, there is still no clear understanding of serotonin’s role in value learning. Here, we propose that this gap can be bridged by examining the effects of serotonin on value learning over a different timescale.
From animal studies, it is known that serotonin not only transfers information on millisecond timescales but also acts over protracted timescales of days and weeks. In fact, at the neuronal level, prolonged increases in serotonin over such timescales lead to changes in plasticity [16,17]. Increasing serotonin levels over several weeks, for example, by administering selective serotonin reuptake inhibitors (SSRIs), can reintroduce juvenile-like plasticity in the visual cortex [18] and the limbic system [19] in animals. Moreover, such effects are, respectively, linked to improvements in learning about visual stimuli and fear extinction. Interestingly, the timeframe of plasticity observed in animals is very similar to the timeframe for antidepressants to take effect in patients, so it may be particularly revealing to study serotonin’s neural and behavioral effects during learning at this timescale.
We therefore examined here whether prolonged increases in serotonin affect learning about appetitive (monetary reward) and aversive outcomes (investment of effort) independently of any effects such a manipulation may have on the coding of stimulus outcome valence per se. In other words, we examined whether serotonin increase has any effect on how we learn from pleasant or unpleasant outcomes as well as any direct impact on responsivity to the pleasant and unpleasant events per se. Using effort as the unpleasant dimension appeared particularly relevant in the context of serotonin’s role in treating clinical depression, in which motivation deficits are observed [20,21]. We recruited 29 human participants, who received 20 mg/d of the SSRI citalopram for 2 weeks or placebo (in a double-blind design), a similar dosage to that used clinically. Repeated administrations of SSRIs for this period of time have been shown to increase serotonin levels in nonhuman primates [22] and related markers of serotonin levels in humans [23–25]. A two-week administration schedule was chosen, as this is similar to the timeframe of appearance of early clinical effects of SSRIs in depression [26], of behavioral change in mild-stress animal models of depression [17], and of changes in neural plasticity in animals [17]. Participants subsequently performed a learning task while we measured their brain activity using functional magnetic resonance imaging (fMRI). In the task [27], participants concurrently learned about the changing values of two stimuli between which they had to choose on each trial. The value of each stimulus was determined by a pleasant (amount of monetary reward) and an unpleasant (amount of effort) dimension (we ensured that participants perceived the reward dimension as rewarding and the effort as aversive, see S1 Text #2 “Task description—training”). 
Our task allowed us to measure the neural and behavioral effects SSRIs might have on learning signals for pleasant and/or unpleasant information. It also made it possible to dissociate the effect SSRIs had on responses to receipt of positive and negative outcomes per se as opposed to the effect on learning about positive or negative outcomes. We hypothesized that if SSRIs affected learning independently of coding of valence, they should strengthen neural learning signals for both dimensions similarly. We found that SSRIs led to stronger reward and effort-related learning signals (i.e., reward and effort prediction errors [RPEs and EPEs], the neural responses to the differences between the received and the expected outcomes) in a widespread network of brain areas coding value information. At the same time, however, activity related simply to the receipt of reward/effort outcomes per se was unaffected. This suggests that prolonged SSRI administration directly influences learning signals in humans over and above any effect it has on signaling pleasant or unpleasant outcomes per se. Not only were neural learning signals stronger but, at a behavioral level, we found that reward-related learning in complex environments was improved.
Results
Task and study design
This study investigated how neural and behavioral measures of reward and effort learning were modulated by repeated administration of an SSRI when both dimensions needed to be learned concurrently and could potentially interfere with one another.
To address this, healthy human participants (for details, see S1 Text #1 “Participants”) performed a previously established multidimensional learning task [27,28], while we measured their brain activity using fMRI. Participants were randomly assigned to 2 weeks of a clinical dose of the SSRI citalopram (20 mg/d, n = 15) or placebo (n = 14). Participants did not differ in any sociodemographic measures, and citalopram did not lead to any changes (baseline versus after 2 weeks of treatment) for any self-reported scores of depression, anxiety, positive or negative affect, or mood state (S1 Table).
The task [27] is described more extensively in the supporting methods (S1 Text, #2 “Task description”). In short, in the task, participants repeatedly chose between the same two options, aiming to choose the options maximizing their monetary gain and minimizing the effort they needed to exert to obtain the reward (Fig 1). When making their decisions (Fig 1A), they therefore had to take into account the independent reward and effort magnitudes associated with each of the two options, which they had to learn from experience across trials. These magnitudes slowly varied over the course of the experiment (Fig 1D). At the time of the decision, participants were additionally shown on the screen the randomly drawn probability of how likely each option was to lead to a real or hypothetical reward outcome (the probability determined what we later refer to, for the sake of brevity, as the “reward type” of a choice). If an option led to what we called a real reward outcome, participants received the reward magnitude points as monetary pay-off for the experiment; if an option led to a hypothetical reward outcome, participants were only shown how much money they could have won for this choice, but were not awarded it as monetary pay-off on this particular occasion. As these reward probabilities were randomly drawn on each trial, the reward type (real versus hypothetical) of one trial should not influence participants’ decisions on the next trial. However, we have shown previously [27] that reward type on one trial nevertheless biased participants’ behavior on the next trial in this learning task.
After each choice, participants saw the outcome of their choice (Fig 1B). At this time point, participants could learn about the reward and effort magnitudes of the two options. In other words, they could update their effort/reward expectation if it was violated, i.e., if the current trial’s outcome deviated from their prior expectation. Note that a numeric expression of how much one’s prediction deviated from the actual outcome and therefore should be changed is called the prediction error (PE), see below. This task was designed to be challenging for participants: ideally, they should simultaneously learn about both the reward and effort magnitude associations of the options regardless of whether rewards were real or hypothetical. Thus, optimal learning needed to represent all these components separately to prevent them interfering with one another. In constructing the task schedule (example in Fig 1D), we ensured that all factors of interest, e.g., the behavioral and neural measures of learning about reward and effort, varied independently from each other. This ensured, as in other studies [3,27,29,30], that their independent effects on behavior and neural activity could be determined (S4 and S11 Figs). After careful training, participants showed a good understanding of the task and good performance (Fig 2A and 2B and S1 Text #2 “Task description—Training”).
fMRI
In order to quantify value learning, simple computational models have been proposed [31]. A key component of these models is the PE, the difference between the actual and the expected outcome, or, in other words, how much better (or worse) than expected the outcome is. Such PEs then drive learning, i.e., they lead to changes in predictions for the next occasion that a choice can be taken. It has been shown previously that neural correlates of PEs can be found in different areas of the human brain using fMRI and that they relate to behavioral markers of learning [32–37]. We therefore tested whether citalopram affected PEs as the neural substrate of learning. Later, we examined how these changes translated into behavioral changes. Neural correlates might be more proximal to the molecular level action of citalopram than behavioral measures, which are the integrated (and binarized) outputs of many different brain processes. If citalopram increased synaptic plasticity and therefore induced learning-related changes in neural activity, then this should manifest in increased PE signals. We found this to be the case for both reward and effort learning.
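As an illustration of how a PE drives learning, the following minimal sketch implements a simple delta-rule (Rescorla-Wagner-style) update. It is not the Bayesian model used for the main analyses; the function name, the learning rate of 0.3, and the example outcome values are assumptions chosen only for illustration.

```python
def delta_rule_update(expected, outcome, learning_rate=0.3):
    """Return (prediction_error, new_expectation) for a single trial.

    learning_rate is a hypothetical value, not a fitted parameter.
    """
    # PE: how much better (positive) or worse (negative) than expected
    prediction_error = outcome - expected
    # The expectation moves a fraction of the way towards the outcome
    new_expected = expected + learning_rate * prediction_error
    return prediction_error, new_expected


# Hypothetical example: the option repeatedly pays more than expected
expected_reward = 10.0
for outcome in [14.0, 14.0, 14.0]:
    pe, expected_reward = delta_rule_update(expected_reward, outcome)
    print(f"PE = {pe:+.2f}, updated expectation = {expected_reward:.2f}")
```

With each repetition the PE shrinks as the expectation approaches the outcome, which is the sense in which PEs "lead to changes in predictions for the next occasion."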
Increased serotonin enhances reward and effort prediction errors (EPEs)
We measured neural learning signals (PEs) at the time of the outcome. The regressors for this analysis were derived from a Bayesian learning model ([27] and S1 Text #3 “Bayesian model”), which provided a good fit to our data (Fig 2A–2C). Using a fitted Rescorla-Wagner reinforcement learning model instead produced the same neural and behavioral results.
To test whether citalopram affected neural correlates of reward PEs (RPEs), we first selected regions of interest (ROIs) that were sensitive to the receipt of reward (i.e., whether reward was real or hypothetical, analysis fGLM1, Fig 3A, table of coordinates in S2 Table, results cluster-corrected at p < 0.05, voxel inclusion threshold: z > 2.3) and that had previously been implicated in the processing of rewards or learning [33,34]. These regions included striatum and ventromedial prefrontal cortex (vmPFC). Using the averaged BOLD data from these regions, we then tested whether citalopram affected the neural RPE signals (i.e., the difference between the received and the expected reward magnitude, independent of whether reward was real or hypothetical). This analysis was thus statistically independent of any analyses used to establish the ROIs in the first place. We found (Fig 3B, analysis fGLM2) that in those ROIs, citalopram strongly enhanced the neural correlates of RPEs for the chosen option (an ANOVA revealed significant group differences across all areas, i.e., difference in the mean value across all areas: F(1,27) = 9.21, p = 0.005; an additional analysis [fGLM2reduced] not controlling for reward/effort outcomes produced the same result: F(1,27) = 7.3, p = 0.012; t tests for each area individually are as follows: striatum: t(27) = −1.74, p = 0.093; mid cingulate cortex: t(20.24) = −2.12, p = 0.048; vmPFC: t(27) = −2.88, p = 0.008; parietal cortex: t(27) = −2.64, p = 0.014). Supplementary analyses confirmed that this result was robust to different RPE modeling choices (S1 Fig).
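The fractional degrees of freedom reported for mid cingulate cortex, t(20.24), are of the kind produced by an unequal-variance (Welch) correction. The text does not name the test, so this is an assumption. Under that assumption, the sketch below shows how the Welch-Satterthwaite degrees of freedom depend on the group variances for group sizes of 15 and 14; the variance values are hypothetical and are not taken from the data.

```python
def welch_df(var_a, n_a, var_b, n_b):
    """Welch-Satterthwaite approximate degrees of freedom for two groups."""
    se_a = var_a / n_a  # squared standard error of group A's mean
    se_b = var_b / n_b  # squared standard error of group B's mean
    return (se_a + se_b) ** 2 / (
        se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1)
    )


# Hypothetical variances; only the group sizes (15 and 14) come from the study
df_equal = welch_df(1.0, 15, 1.0, 14)    # similar variances: close to 27
df_unequal = welch_df(4.0, 15, 1.0, 14)  # unequal variances: noticeably lower
print(round(df_equal, 2), round(df_unequal, 2))
```

The more the two group variances differ, the further the degrees of freedom fall below the pooled value of 27.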
We next performed an analogous analysis to find neural correlates of EPEs. First, we identified effort outcome-related brain areas by finding areas that became more active when the chosen option was associated with more effort than the option that was unchosen (we refer to this contrast as the relative effort outcome contrast; Fig 3C, S2 Fig, S2 Table, analysis fGLM1). As the neural correlates of effort processing have received comparatively less attention than those related to reward processing, there were no strong a priori hypotheses about which regions might carry EPEs. However, despite the relative absence of specific information about EPEs, dorsal anterior cingulate cortex (dACC) has been consistently linked to effort processing in both animals and humans [39–44]. We therefore tested for EPEs in all effort-sensitive regions, although prior work suggested that dACC should be a focus of particular interest. Again, note that the EPE contrast is independent from the contrast used to establish the ROIs in the first place. In all ROIs, we examined whether citalopram increased neural correlates of EPEs. In all areas, higher EPEs led to a decrease in activity (S2 Fig). An ANOVA across these areas revealed that in some areas EPEs were stronger (more negative) in the citalopram group (interaction effect group × area: F(5,135) = 2.45, p = 0.037; main effect of group: F(1,27) = 1.43, p = 0.24, analysis fGLM2; an additional analysis [fGLM2reduced] not controlling for reward/effort outcomes suggests a group difference main effect across all tested brain areas: F(1,27) = 4.8, p = 0.037, interaction effect group × area: F(3.8,101.5) = 3.34, p = 0.015). This effect was particularly striking in dACC (between-subject t test: t(27) = 3.01, p = 0.0056, significance threshold of Bonferroni correction for six brain areas: p < 0.008; Fig 3D). No significant differences were found in the other areas (S2 Fig).
In summary, our fMRI results showed strong evidence that repeated administration of citalopram increased neural correlates of learning signals for both reward and effort. This is in agreement with what would be predicted if an increase in synaptic plasticity in value learning-related brain regions is induced by repeated citalopram administration.
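The Bonferroni threshold quoted for the six effort-sensitive areas can be reproduced with a short calculation. The sketch assumes a familywise alpha of 0.05, which is consistent with the reported threshold of p < 0.008 but is not stated explicitly in the text; the function name is hypothetical.

```python
def bonferroni_threshold(familywise_alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return familywise_alpha / n_tests


# Assumed familywise alpha of 0.05 across the six ROIs
threshold = bonferroni_threshold(0.05, 6)
p_dacc = 0.0056  # reported p value for the dACC group difference
print(f"threshold = {threshold:.4f}, dACC survives: {p_dacc < threshold}")
```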
Neural learning effects are not the result of increased outcome processing
As control analyses, we tested whether changes in learning with repeated SSRI administration might be secondary to increases in the coding of appetitive or aversive outcomes per se. This is based on theories that suggest that serotonin is involved in the coding of the valence of outcomes [8,45]. In our paradigm, PEs were sufficiently decorrelated from the outcomes themselves (S4 Fig) so that the impact of SSRIs on both could be investigated in the same analysis. This was possible because participants needed to learn reward and effort magnitudes rather than probabilities (the reward probabilities associated with the options did not need to be learned because this information was explicitly cued and provided to the participants on every trial; Fig 1).
Therefore, we tested whether citalopram also affected the reward magnitude outcome signals (analysis fGLM2, regressor of reward magnitude outcome). We found that citalopram did not increase reward magnitude outcome signals in the ROIs that had shown increased RPEs with citalopram (Fig 4A; ANOVA, group difference across all areas: F(1,27) = 1.19, p = 0.29) or on a whole-brain level. On the contrary, there was some evidence for reward outcome signals to be reduced in striatum and vmPFC by citalopram in a relatively late period during the processing of the outcomes. In other words, although there was no significant difference between the groups in the hemodynamically convolved signals time-locked to the onset of the outcome phase, there were statistical differences in the time course of the BOLD signal late during the outcome phase when using a more lenient statistical approach that did not correct for multiple comparisons (Fig 4A).
Similarly, we next tested (analysis fGLM2) whether citalopram affected the coding of effort magnitude outcomes in ROIs sensitive to effort as identified above. Again, this was not the case in an ANOVA across all six ROIs (ANOVA, testing for a group difference across all areas: F(1,27) = 0.29, p = 0.60), nor was it the case more specifically in dACC, in which citalopram had increased EPEs (Fig 4B; t test comparing the effort outcome signals across the two groups: t(27) = 0.65, p = 0.52) or on a whole-brain level.
In a further control analysis, we tested whether citalopram affected the BOLD response in general (rather than specifically increasing RPE/EPE signals). This was not the case (S5 Fig).
Together, our fMRI results suggest that citalopram enhanced neural learning signals independently of any increases to reward or effort outcome sensitivity per se.
Citalopram makes reward learning more robust against interference
Having found that citalopram increased neural learning signals, we next tested whether it also affected behavioral markers of learning. The analogous behavioral test is to measure the impact of PEs on behavior, as has been done previously [29,37,46,47] (learning can also be measured using alternative methods, S7 Fig). In short, this approach assesses to what extent PEs on one trial have an impact on participants’ choices on the next trial. For example, one would expect that if there is a positive RPE on one trial (i.e., the option is better than expected) then this should make participants more likely to select the option again on the next trial. The approach is thus closely related to the neural regression analyses, in which we measured the impact of PEs on brain activity, rather than choices: a regressor that is “chosen” in the outcome phase of one trial (fMRI), is a regressor favoring “stay” (i.e., choosing again the same option) on the next trial (behavioral analysis).
Improved neural PE signals could translate into different kinds of behavioral learning improvements. They might result in a general overall improvement in using PEs to drive behavior. This was not the case (all p > 0.26, S7 Fig). This was probably because participants in the placebo group were already generally very good at learning, making it difficult to measure further general improvements. However, even if it is the case that a general, overall improvement is not observable, there may still be evidence of improvements if we focus on situations in which learning is particularly challenging. In the present task, participants needed to simultaneously learn about reward and effort; furthermore, reward was only hypothetical (rather than real) on some trials. Learning in these trials is particularly challenging. Optimally, participants should learn similarly from both real and hypothetical reward outcomes; even if the latter have less intrinsic value, they should still be equally informative for learning. Similarly, learning about reward magnitudes should be independent from learning about effort magnitudes. We therefore hypothesized more specifically that learning might be subject to some degree of interference from irrelevant factors and that this might be remedied by citalopram: learning about one dimension (e.g., reward) might be interfered with to some degree by the absence of real reward experience (i.e., when rewards were only hypothetical) and/or having to learn about the other dimension simultaneously. For example, a surprisingly high effort outcome might attract processing resources so much that reward learning is impaired. Citalopram’s enhancing effect on neural learning signals in general might then manifest in the behavior as more robust learning specifically in the face of interference.
To test the impact of citalopram on such learning, we performed a regression analysis (bGLM2) that assessed how much participants’ decisions on each trial to “stay” (i.e., to select again the same option as on the last trial) or to “switch” to the alternative option took the PE into account differently in the face of interfering factors (either the fact that the RPE involved a reward that was just hypothetical or the fact that the RPE occurred in the context of a high EPE). In this regression, interference was measured as an interaction between the interfering factors (reward type [i.e., real reward versus hypothetical reward] and EPEs) and the RPEs (see Methods for list of additional confound regressors included). Significant positive interaction terms between RPE and reward type (real versus hypothetical) or EPEs would then mean that participants were not as efficient at using the RPEs when reward was hypothetical compared to real or when effort was surprisingly high. However, if the interaction terms were not different from zero, then it would mean that participants were equally efficient at using RPEs whether rewards were real or hypothetical and regardless of whether the EPE was high. The analysis showed (Fig 5A) that the reward learning of participants in the placebo group was more subject to interference than that of participants taking citalopram (the placebo participants had larger regression weights for the interaction terms: ANOVA, measuring the average interference effect across both interaction term regression weights, comparing the two groups: F(1,27) = 7.00, p = 0.013). 
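The logic of the interaction terms can be made concrete with a small sketch of a logistic stay/switch model. This is an illustration only, not the bGLM2 specification: the coding of reward type (+1 real, −1 hypothetical), the weights, and the omission of the EPE interaction and confound regressors are all assumptions.

```python
import math


def effective_rpe_weight(reward_type, w_rpe, w_interaction):
    """How strongly the RPE influences 'stay' for a given reward type.

    reward_type: +1 for real, -1 for hypothetical (assumed coding).
    """
    return w_rpe + w_interaction * reward_type


def stay_probability(rpe, reward_type, w_rpe, w_interaction, intercept=0.0):
    """Logistic probability of repeating the previous choice."""
    logit = intercept + effective_rpe_weight(
        reward_type, w_rpe, w_interaction
    ) * rpe
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical weights: a positive interaction means the RPE is used
# less when the reward was only hypothetical (interference)
with_interference = effective_rpe_weight(-1, w_rpe=1.0, w_interaction=1.0)
# A zero interaction means the RPE is used equally for both reward types
no_interference = effective_rpe_weight(-1, w_rpe=1.0, w_interaction=0.0)
print(with_interference, no_interference)
```

Under this coding, a positive interaction weight shrinks the effective RPE weight on hypothetical-reward trials, whereas an interaction weight of zero leaves RPE use unchanged across reward types. The RPE × EPE interaction follows the same logic.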
This effect is illustrated in Fig 5B and 5C (analyses bGLM3a and b): When there was no interference, because rewards were real and EPEs were low, the two groups did not differ in how much they could use RPEs on one trial for making decisions on the next trial (between-subject t tests for group differences: when reward was real [bGLM3a]: t(27) = −0.47, p = 0.64; when EPEs were favorable [bGLM3b]: t(27) = −0.32, p = 0.75). However, when there was potential for interference, because rewards were hypothetical or EPEs were surprisingly high, only the citalopram group still used RPEs for decisions on the next trial (between-subject t tests for group differences: when reward was hypothetical: t(27) = −2.21, p = 0.036; when EPEs were unfavorable: t(27) = −2.69, p = 0.012; one-sample t tests within each group testing whether RPEs significantly affected decisions: when reward was hypothetical: placebo: t(13) = 0.38, p = 0.71; citalopram: t(14) = 4.23, p = 0.001; when EPEs were unfavorable: placebo: t(13) = −0.43, p = 0.68; citalopram: t(14) = 3.13, p = 0.007).
This effect was specific to learning about the reward dimension (S8 Fig). This was potentially so because effort was always real (it had to be exerted on every trial) and, therefore, potentially easier to learn about. We also note that while the behavioral learning effects were selective for situations of interference, neural learning signal improvements were always present rather than specific to situations of interference (S9 Fig).
As a last behavioral finding, we also noted that citalopram did not affect how participants exerted effort, nor did it disrupt how motivation affected effort exertion (S10 Fig). Thus, just as citalopram did not affect neural responses to reward and effort outcomes per se, so it had no impact on the effect of reward or effort outcomes per se on different behavioral measures.
In summary, we found that citalopram not only enhanced PE coding at a neural level, but it also enhanced the impact of PEs at a behavioral level. Citalopram made RPE-based learning more robust against interference.
Discussion
This study examined the role of serotonin in value learning by looking at the effects of a repeated administration of the selective serotonin reuptake inhibitor (SSRI) citalopram on reward and effort learning in human participants. Participants performed a multidimensional learning task in which it was necessary to learn about both reward and effort. Neurally, we found that citalopram increased learning signals, i.e., PEs, for both reward and effort. RPEs were increased in a widespread network of brain regions, including vmPFC. At the same time, EPEs were increased in dACC. This increase in learning signals occurred in the absence of increases to the overall outcome signals for reward or effort. Behaviorally, we found that citalopram made reward learning more robust or resilient to negative interference.
SSRIs enhance neural measures of learning
Citalopram enhanced neural learning signals for both pleasant and unpleasant outcomes across many brain areas. This general and widespread effect is in agreement with a general increase in learning and plasticity after repeated SSRI administration [48], rather than a specific effect on only either appetitive or aversive learning.
One brain area that we identified as having increased RPE signals was the vmPFC. This area has repeatedly been identified as being involved with reward-guided reversal learning [49]. Furthermore, it has been shown that depletion of serotonin levels in adjacent prefrontal areas impaired reversal learning in marmosets [50–52]. While our results further support the claim that serotonin affects the role of vmPFC in reversal learning, we also note that our results point to an effect of SSRIs on learning that is not selective to vmPFC, as we found changes in the RPE signals in a wide range of areas. This is unsurprising given the systemic administration of citalopram used here.
EPEs were also enhanced, particularly in dACC. Although less is known about the neural mechanisms of effort processing, compared to reward processing, the dACC has been linked to aspects of effort processing in both animal models and humans [28,39–43]. Arguably, in the present paradigm, reward learning was more challenging than the more straightforward effort learning; while some level of effort exertion was constantly required on every trial, knowledge of the changes in reward magnitudes that the participants learned had to be integrated with information about reward probabilities presented on each trial, and the potentially distracting fact of whether or not a reward was real or hypothetical had to be ignored. The more widespread impact of the SSRI we studied on RPEs than EPEs may, therefore, simply reflect the more challenging nature of reward learning in this task.
Our results thus suggest an important role for serotonin in the control of value-guided learning from both pleasant and unpleasant outcomes. This is in agreement with findings that in patients with obsessive-compulsive disorder (OCD), who were given SSRIs as part of their treatment, performance in a learning task with monetary wins and losses was improved [53]. These learning effects may relate to changes in synaptic plasticity. SSRIs have been shown to lead to plasticity changes in the brain when administered repeatedly, and other studies have found that serotonin levels naturally increase in situations in which new learning occurs [16]. Furthermore, when SSRIs are administered for a prolonged time in animal models, there have been reports of an impact on basic forms of learning, such as visual adaptation and fear extinction [16,18,19]. This has been linked to changes in synaptic plasticity [16], synapse remodeling [17], and neurogenesis in the hippocampus [54] in animals. In other words, when serotonin levels are changed for a prolonged time, a series of adaptive downstream changes occur, which ultimately lead to increased learning and plasticity. This is in contrast to studies reporting shorter timescale serotonin manipulations in humans, for example, through tryptophan depletion, that have found no effect on reward learning [3,11–15,55–58], but see also [10] and [9]. While the present results reveal that sustained serotonin manipulation has a causal influence on the neural correlates of RPEs and EPEs and on learning, it was not possible to identify the various downstream changes that mediated the impact of serotonin, and it is possible that these included other neurotransmitter changes [59] or might produce other independent effects on behavior beyond improvements in learning. 
For example, studies have found SSRI administration to reduce GABA levels [18], to make certain forms of long-term potentiation easier to induce [18,19,60], to increase markers of LTP [16], to increase neurogenesis, and to change the morphology of neural dendrites [17]. Further work in animals will be needed to elucidate how these mechanisms relate to value learning.
Learning signal increases are not secondary to increases to reward/effort receipt coding
Importantly, beyond demonstrating increased neural learning signals, our neural data also allowed us to rule out the possibility that the effects of SSRIs on learning were secondary to an effect on the coding of positive or negative outcomes per se. If this were the case, we would have expected SSRIs to increase signals for reward and effort receipts (“outcomes”) at the same time as increasing learning signals. However, we instead found that SSRIs did not increase neural reward or effort outcome signals, and, if anything, at a lower statistical threshold, we found that in vmPFC and striatum, reward outcome signals were decreased. This is similar to previous studies that found decreases in brain activity to rewarding stimuli with prolonged SSRI administration [61–63]. This is also in agreement with optogenetic studies that failed to find evidence for serotonergic activity per se being reinforcing or aversive [1,4].
While our longer-term manipulation of serotonin did not reveal effects on the processing of aversive outcomes that could explain the learning effects, we note that previous studies, particularly looking at the function of serotonin at shorter timescales or through genetic variations in serotonin transporter polymorphisms, have found evidence for a role of serotonin in (aversive) outcome processing or inhibition [2,6,7,10,14,15,55,58,64–68]. Rather, what our study suggests is that, over the longer term, serotonin additionally plays an independent role in value-guided learning by modulating learning capability directly, rather than just as an indirect consequence of any impact it has on appetitive or aversive outcome signals over shorter timescales.
SSRIs enhance behavioral measures of learning and plasticity
Beyond changes to neural markers of learning, we also found, at a behavioral level, that repeated administration of SSRIs increased learning. A priori, there are several ways in which improved neural learning signals could translate into improved learning behaviorally. Firstly, in classic theories [31], learning is about establishing expectations about mean magnitudes or probabilities. In such a scenario, learning can be influenced by a general change in the speed of learning these expectations, i.e., by changing the learning rate. Secondly, learning can also be improved by changing the precision of the representation of the learnt information, i.e., by increasing the signal-to-noise ratio. Importantly, in the second scenario, better learning can mean being able to use the learnt information more consistently in situations in which learning is particularly challenging, for example, because of interference from other outcome value dimensions. Neurally, a more precise estimate of learning would be reflected in stronger PE signals. Our findings thus align best with the second scenario: neural PEs were increased by the SSRI, but learning speed per se was not changed. Additionally, the SSRI had a protective effect on learning in challenging situations in which negative interference would normally have drawn resources away from the processing of key reward-related contingencies. Neurophysiologically, increased signal-to-noise could be achieved by SSRIs changing the properties of individual synapses or the number of synapses in a state that allows learning [17], or by allowing better integration between the predictions and the outcomes. Such cellular changes could either increase the signal itself or reduce the noise; both possibilities would improve how the information for learning is represented in the brain.
Clinical relevance
The results of our study might shed light onto the mechanism by which SSRIs work as treatments for psychological illnesses, such as depression. Early clinical effects have been reported with the same dosage and duration at which we administered citalopram to our participants [26,48]. In fact, our results might suggest that SSRIs exert part of their clinical effect by enhancing how well patients can learn about positive relationships in complex environments even in the presence of negative interfering information, which otherwise could prevent such learning. This effect may occur in addition to, or even underlie, previous reports that SSRIs shift the processing of social cues so that they are perceived as more positive [69]. In fact, these two effects, relating to learning and attentional biases, might interact with each other, resulting in patients perceiving the world as more positive and learning more reliably about positive aspects of the world. In this context, it is noteworthy that our behavioral results are in agreement with other studies that found that changes in serotonin levels affect how much negative stimuli can bias behavior [70,71].
Limitations
In the fMRI data, we noted that while the citalopram group showed strong overall RPEs, the placebo group did not. This is in contrast to previous reports of PEs in healthy participants performing probabilistic reward tasks. There were, however, some differences in our task and analyses: our participants learned about changing reward magnitudes rather than changing reward probabilities, and our analysis carefully dissociated RPE responses from responses to reward delivery per se. Nevertheless, it is important not to over-interpret the lack of strong RPEs in the placebo group as evidence of absence; it is quite possible that the placebo group’s brains carried overall RPE signals that were simply below our threshold for detection. However, and most importantly, we can conclude from our data that the citalopram group had stronger overall RPE signals than the placebo group.
Furthermore, while the neural PE effects we found were not reflected in general changes in all aspects of PE-based behavioral learning, we did observe some more specific changes in aspects of PE-based behavioral learning. Namely, we observed improved RPE-based learning in situations of interference. This may be so because the serotonergic manipulation acted directly on the brain, but those neural changes only impacted behavior in certain situations in which learning was particularly challenging (because of interfering factors) and, therefore, most likely to be subject to disruption. We do not want to rule out the possibility of finding more general effects in future studies using larger participant samples or other tests of RPE-based learning.
Conclusion
We found that repeated administration of an SSRI increased neural PE signals during reward and effort learning. Concomitantly, behavioral measures of reward learning in the face of negative interference were improved. Thus, prolonged administration of SSRIs can strengthen learning signals for both appetitive and aversive outcomes in a manner that is consistent with previous demonstrations of the impact of serotonin manipulation on neural plasticity. These results are also of clinical relevance, supporting theories that SSRIs’ treatment effects on, for example, depression may be related to increases in neural plasticity and learning.
Methods
Ethics statement
Participants gave written informed consent to take part in the study, which was approved by the NRES Committee South Central—Portsmouth (12/SC/0276).
fMRI analyses
All analyses were performed in FSL, Matlab, and SPSS. Greenhouse-Geisser corrections for violations of sphericity and nonparametric tests were used where appropriate.
MRI data acquisition and preprocessing
Structural MRI and fMRI measurements were taken using a Siemens 3 Tesla MRI scanner (see S1 Text #5, “MRI” and [27]). In short, we used a Deichmann echo-planar imaging sequence [72]. We used FMRIB’s Software Library (FSL) [73] for image preprocessing and analysis. All main effect images shown are cluster-corrected (p < 0.05) with the standard voxel inclusion threshold of z = 2.3. We also analyzed data in ROIs, extracted from spheres with a three-voxel (or two, for small brain structures, i.e., striatum) radius, identified in MNI standard space on the basis of orthogonal whole-group contrasts.
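The spherical ROIs described above can be sketched in a few lines of Python. This is only an illustration: the function names, the hypothetical peak coordinate, and the inclusion rule (Euclidean distance from the peak of at most the radius, in voxel units) are assumptions, and the masking tools actually used may define the sphere slightly differently.

```python
from itertools import product


def sphere_offsets(radius_vox):
    """Integer (dx, dy, dz) offsets within radius_vox voxels of the centre.

    Assumption: a voxel belongs to the sphere when its Euclidean distance
    from the centre voxel is at most radius_vox.
    """
    r = int(radius_vox)
    rng = range(-r, r + 1)
    return [
        (dx, dy, dz)
        for dx, dy, dz in product(rng, rng, rng)
        if dx * dx + dy * dy + dz * dz <= radius_vox ** 2
    ]


def sphere_roi(peak, radius_vox):
    """Translate the offsets to a (hypothetical) peak voxel coordinate."""
    px, py, pz = peak
    return [(px + dx, py + dy, pz + dz) for dx, dy, dz in sphere_offsets(radius_vox)]


# Hypothetical peak voxel; three-voxel radius, or two for small structures.
roi_large = sphere_roi((45, 63, 36), 3)
roi_small = sphere_roi((45, 63, 36), 2)
print(len(roi_large), len(roi_small))
```

Under this inclusion rule, the smaller radius yields roughly a quarter as many voxels, which is why it suits small structures such as the striatum.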
Whole-brain
In the first fMRI analysis, we investigated whether citalopram affected neural learning signals and, more broadly, which brain areas were sensitive to reward and effort information (analysis fGLM1, similar design as previously described [27]). The regressors used in this design were as follows (all correlations between regressors were r < 0.44, S4 Fig): We used three boxcar regressors, indicating the onset and duration of the decision phase (from the beginning of the trial until participants made a choice), the onset and duration of the outcome phase (from the appearance of the chosen outcome until the chosen and the unchosen outcomes disappeared from the screen), and, lastly, the effort exertion phase (from the appearance of the first effort target until participants had removed the last target). We included the following parametric regressors in the decision phase: whether the reward of the chosen option had been real or hypothetical on the last trial, reward and effort magnitude predictions (derived from a Bayesian learning model as described previously [27,74], see S1 Text #3 “Bayesian model” for additional details), and the reward probabilities that were displayed on the screen. In the outcome phase, we included the following parametric regressors: the reward type (real versus hypothetical) delivered for the chosen option, the reward probability for the chosen option, the reward and effort magnitude outcomes for the chosen and the unchosen option, and the RPEs and EPEs for the chosen and the unchosen option. The onset of all regressors for the chosen option was time-locked to the onset of the outcome phase; the onset and duration of the regressors for the unchosen option were time-locked to their display. In each case, separate regressors for the chosen and the unchosen option were used; they were later combined to derive relative (i.e., chosen minus unchosen option) values at the contrast level. 
In the effort execution phase, we included the clicking rate as a parametric regressor. Finally, we included, as confound regressors, six movement regressors and a regressor indexing when additional visual stimuli were presented to warn participants that they had not clicked the targets on time and that the halfway point of the experiment had been reached. We used FSL’s FLAME [75] to perform higher-level analysis; the two groups were modeled as separate groups with shared variance; outlier de-weighting was used. To identify ROIs (see below and also S1 Text #5 “MRI”), both groups were combined at the contrast level, i.e., we identified areas that showed activations (or deactivations) across both groups.
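To make the distinction between boxcar and parametric regressors concrete, the following Python sketch builds both on a coarse time grid. The timings, the sampling step, and the mean-centring of the trial-wise values are assumptions for illustration; convolution with a haemodynamic response function, which FSL applies in the actual analysis, is deliberately left out.

```python
def boxcar(onsets, durations, n_samples, dt):
    """0/1 regressor sampled every dt seconds; 1 while any event is active."""
    reg = [0.0] * n_samples
    for onset, dur in zip(onsets, durations):
        for i in range(n_samples):
            if onset <= i * dt < onset + dur:
                reg[i] = 1.0
    return reg


def parametric(onsets, durations, values, n_samples, dt):
    """Boxcar scaled by a mean-centred trial-wise value (e.g., an RPE)."""
    mean = sum(values) / len(values)
    reg = [0.0] * n_samples
    for onset, dur, value in zip(onsets, durations, values):
        for i in range(n_samples):
            if onset <= i * dt < onset + dur:
                reg[i] = value - mean
    return reg


# Hypothetical timings (seconds): two outcome phases lasting 2 s each.
onsets, durations = [2.0, 8.0], [2.0, 2.0]
outcome_box = boxcar(onsets, durations, n_samples=12, dt=1.0)
rpe_reg = parametric(onsets, durations, [0.5, -0.5], n_samples=12, dt=1.0)
print(outcome_box)
print(rpe_reg)
```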
Time course analyses
We used analysis fGLM1 to identify regions for time course analyses. Specifically, we used the contrast of reward type to identify ROIs for analyses of RPEs and reward outcomes and we used the contrast of relative effort outcomes to identify ROIs for analyses of EPEs and effort outcomes (Figs 3 and 4). The ROIs were identified on the basis of the peaks of the relevant whole-brain cluster-corrected activations. The aim of these analyses was to test whether citalopram affected PE and/or outcome signals. We therefore tested whether the two groups differed in reward/effort PEs and in the coding of the reward/effort outcomes (analysis fGLM2). Please note that these analyses were orthogonal to how the ROIs were identified. All time courses were extracted, regressed, and statistically tested as described in Scholl et al. [27] and similar to previous studies [49,76,77] (S6 Fig).
In analysis fGLM2, we included as regressors of interest the RPEs and EPEs (separately for the chosen and the unchosen option) as well as the relative (chosen minus unchosen option) reward and effort magnitude outcomes. As regressors of no interest, we included the reward type (real versus hypothetical) PE (i.e., the reward type of the trial minus the probability of the chosen option that had been shown at the time of choice):
PEs were included as separate regressors for chosen and unchosen options, as previous work [78] suggested that different areas might carry signals for chosen and unchosen PEs. In contrast, chosen and unchosen outcome signals have been found to be present in the same areas with opposite signs [27,49] and were, therefore, included as a relative value regressor. Analysis fGLM2 was time-locked to the onset of the outcome phase. In this, as in all other regressions, all regressors were z-score normalized. We tested whether RPEs, EPEs, reward, or effort outcomes were affected by citalopram using repeated-measures ANOVAs within the sets of ROIs. Significant group differences (omnibus main effects or interactions) were then followed up using t tests to assess whether these effects were driven by group differences in specific regions. For example, for EPEs, we performed an ANOVA across the neural data from all ROIs identified in the effort outcomes contrast. Follow-up t tests then examined group differences separately in all ROIs. To look at reward and effort PEs, it is also possible to construct a variant of fGLM2 in which no regressors for reward and effort outcomes, or, in other words, fewer nuisance regressors, are included (“fGLM2reduced”). Note that in this case, PE regressors are not just the orthogonal component left after parceling out outcomes and may also capture signal variation due to outcomes per se. This is because PEs and outcomes are not completely uncorrelated (S4 Fig).
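The point about the reduced model can be illustrated with a toy regression in Python. In this sketch the simulated signal depends on the outcome only, and the PE regressor is correlated with the outcome. The data are invented, the correlation is far higher than in the actual design, and the helper names are hypothetical; the aim is only to show why a PE weight estimated without outcome regressors can absorb outcome-related variance.

```python
def zscore(x):
    """z-score normalize a list (sample standard deviation)."""
    n = len(x)
    m = sum(x) / n
    sd = (sum((v - m) ** 2 for v in x) / (n - 1)) ** 0.5
    return [(v - m) / sd for v in x]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def ols_one(x, y):
    """Slope of y on a single mean-zero regressor."""
    return dot(x, y) / dot(x, x)


def ols_two(x1, x2, y):
    """Slopes of y on two mean-zero regressors (2x2 normal equations)."""
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Toy data: the signal follows the outcome; the PE merely correlates with it.
outcome = zscore([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
pe = zscore([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
signal = [2.0 * o for o in outcome]

b_pe_full, b_out_full = ols_two(pe, outcome, signal)  # outcome included
b_pe_reduced = ols_one(pe, signal)                    # outcome left out
print(round(b_pe_full, 6), round(b_out_full, 6), round(b_pe_reduced, 3))
```

With the outcome regressor included, the PE weight is essentially zero; without it, the PE weight picks up the shared variance.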
Behavioral analyses
Task validation
To assess whether participants took all features of the task into account when making their choices, we performed a logistic regression analysis (bGLM1). We predicted participants’ decisions to “stay” (choose the same option again as on the last trial) or “switch” (choose the alternative option compared to the last trial); as regressors, we included the relative probabilities (i.e., the probability in favor of the “stay” minus the probability in favor of the “switch” option) shown at the time of choice, the relative learnt reward and effort magnitude predictions (derived from the Bayesian model, see S1 Text #3 “Bayesian model”), and the reward type of the previous trial (whether a reward had been real or hypothetical). All regressors were z-score normalized.
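As a minimal sketch of the stay/switch coding used in bGLM1, the Python below converts a sequence of choices into a binary dependent variable and expresses a regressor relative to the previously chosen option before z-scoring it. The option labels, toy values, and function names are hypothetical and serve only to illustrate the coding scheme.

```python
def stay_switch(choices):
    """Code each trial from the second onward as 1 (stay) or 0 (switch)."""
    return [1 if cur == prev else 0 for prev, cur in zip(choices, choices[1:])]


def relative_to_stay(value_a, value_b, prev_choice):
    """Value in favour of 'stay' minus value in favour of 'switch'."""
    return value_a - value_b if prev_choice == "A" else value_b - value_a


def zscore(x):
    """z-score normalize a list (sample standard deviation)."""
    n = len(x)
    m = sum(x) / n
    sd = (sum((v - m) ** 2 for v in x) / (n - 1)) ** 0.5
    return [(v - m) / sd for v in x]


# Hypothetical choices and displayed reward probabilities for options A and B.
choices = ["A", "A", "B", "B", "A"]
prob_a = [0.6, 0.7, 0.4, 0.5, 0.8]
prob_b = [0.4, 0.3, 0.6, 0.5, 0.2]

y = stay_switch(choices)
rel_prob = [
    relative_to_stay(prob_a[t], prob_b[t], choices[t - 1])
    for t in range(1, len(choices))
]
rel_prob_z = zscore(rel_prob)
print(y)
print(rel_prob_z)
```

The resulting `y` and z-scored regressors would then enter a standard logistic regression, which is not reproduced here.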
Model fitting and comparison
Instead of using an a priori Bayesian model (see S1 Text #3 “Bayesian model”) to generate regressors of reward/effort predictions for the behavioral and neural regression analysis, it is also possible to derive these from reinforcement-learning models that are fitted to participants’ choice data (similar to analyses that we described previously [27,79]). We also fitted this class of models to ensure that our Bayesian model was appropriate, i.e., that it at least provided an equally good fit to the data. In short, the model consisted of three main components: firstly, the model had predictions of the mean reward/effort magnitudes underlying both outcomes. These were updated on every trial using a reinforcement-learning algorithm:
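$$\text{prediction}_{t+1} = \text{prediction}_{t} + \alpha \times \text{PE}_{t}$$
with
$$\text{PE}_{t} = \text{outcome}_{t} - \text{prediction}_{t}$$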
where α was the learning rate. Thus, the learning rate was a measure of how much participants updated their reward/effort magnitude prediction when the reward/effort magnitude outcome differed from their expectation (i.e., depending on the PE). We fitted models that differed in their number of learning rates: they either shared the same learning rate for reward and effort, or they had separate learning rates. Finally, we also used a model with no fitted learning rate that instead used the predictions for reward and effort derived from the Bayesian model.
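The update rule can be sketched in Python as a standard delta rule, in which the prediction moves toward each outcome by a fraction α of the prediction error. The function names, the starting prediction, and the toy outcomes are assumptions for illustration only.

```python
def rw_update(prediction, outcome, alpha):
    """One delta-rule step: return the new prediction and the PE."""
    pe = outcome - prediction
    return prediction + alpha * pe, pe


def run_learner(outcomes, alpha, initial=0.5):
    """Return the prediction held before each outcome, the PEs, and the final prediction."""
    prediction = initial
    predictions, pes = [], []
    for outcome in outcomes:
        predictions.append(prediction)
        prediction, pe = rw_update(prediction, outcome, alpha)
        pes.append(pe)
    return predictions, pes, prediction


# Hypothetical normalized magnitude outcomes with a learning rate of 0.5.
predictions, pes, final = run_learner([1.0, 1.0, 0.0], alpha=0.5)
print(predictions, pes, final)
```

A model with separate learning rates for reward and effort would simply run two such learners with different values of `alpha`.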
Secondly, the model then combined these reward/effort magnitude predictions together with the reward probabilities (explicitly shown to participants on each trial on the screen) to calculate how valuable each option was (i.e., their utility). Similar to [79], we considered different ways in which utility could be computed: optimally, probability and reward should be integrated multiplicatively:
$$\text{Utility}_{A} = \frac{\text{Probability}_{A} \times \text{RewardMagnitude}_{A} - \lambda \times \text{EffortMagnitude}_{A}}{k}$$
in which k was a normalization constant with k = 1+λ, and λ was the effort magnitude decision weight. Alternatively, participants might use a heuristic strategy in which they instead integrate probability and reward linearly (we previously found this to provide a better fit to similar data [79]):
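$$\text{Utility}_{A} = \frac{\text{Probability}_{A} + \gamma \times \text{RewardMagnitude}_{A} - \lambda \times \text{EffortMagnitude}_{A}}{k}$$
in which k was a normalization constant with k = 1+γ+λ, γ was the reward magnitude decision weight, and λ was the effort magnitude decision weight. The utility for option B was computed in the same way.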
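The two utility rules can be sketched in Python as follows. The exact functional forms are assumptions inferred from the normalization constants given in the text: the multiplicative rule is taken to be (probability × reward − λ × effort)/k with k = 1+λ, and the linear rule (probability + γ × reward − λ × effort)/k with k = 1+γ+λ. The function names and example values are hypothetical.

```python
def utility_multiplicative(prob, reward, effort, lam):
    """Assumed form: probability and reward multiplied, effort subtracted."""
    k = 1.0 + lam  # normalization constant
    return (prob * reward - lam * effort) / k


def utility_linear(prob, reward, effort, gamma, lam):
    """Assumed form: probability and reward added with decision weights."""
    k = 1.0 + gamma + lam  # normalization constant
    return (prob + gamma * reward - lam * effort) / k


# Hypothetical option: 50% reward probability, predicted magnitudes on a 0-1 scale.
u_mult = utility_multiplicative(prob=0.5, reward=0.8, effort=0.2, lam=1.0)
u_lin = utility_linear(prob=0.5, reward=0.8, effort=0.2, gamma=1.0, lam=1.0)
print(u_mult, u_lin)
```

Dividing by k keeps the decision weights on a common scale, so that the inverse temperature in the choice rule is comparable across models with different numbers of weights.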
Thirdly, the model then compared the utility of the two options to predict how likely participants would be to choose either, using a standard soft-max decision rule:
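$$P(\text{choose } A) = \frac{\exp(\beta \times \text{Utility}_{A})}{\exp(\beta \times \text{Utility}_{A}) + \exp(\beta \times \text{Utility}_{B})}$$
in which β reflected participants’ ability to pick the option with the higher utility (i.e., the inverse temperature).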
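For two options, a soft-max rule reduces to a logistic function of the utility difference scaled by β. The Python sketch below uses that standard form; the function name and the example utilities are hypothetical.

```python
import math


def p_choose_a(utility_a, utility_b, beta):
    """Two-option soft-max: probability of choosing option A."""
    return 1.0 / (1.0 + math.exp(-beta * (utility_a - utility_b)))


# A higher inverse temperature makes choices more deterministic.
p_low_beta = p_choose_a(0.4, 0.3, beta=1.0)
p_high_beta = p_choose_a(0.4, 0.3, beta=20.0)
print(p_low_beta, p_high_beta)
```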
The data from each participant were fitted individually using Matlab’s fminsearch routine, which adjusted the free parameters to minimize the difference between the predicted choice and the actual choice. To compare how well each model fitted the data, we performed model comparison using the Bayesian Information Criterion (BIC) summed across all participants. As this model comparison revealed the Bayesian model to provide the best fit to the data, we used it to generate regressors for all regression analyses (fMRI and behavior). However, we note that all findings and conclusions from the regression analyses remain the same when regressors are instead derived from the best-fitting Rescorla-Wagner model. This is due to the very high correlations between the regressors derived from the Bayesian and the best-fitting Rescorla-Wagner model (r > 0.99).
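The fitting and model-comparison logic can be sketched in Python under several simplifying assumptions. The only free parameter here is the inverse temperature, a coarse grid search stands in for a simplex optimizer such as fminsearch, the objective is the negative log-likelihood of the observed choices, and the BIC is computed in its usual form, k·ln(n) + 2·NLL. The toy data and all names are hypothetical.

```python
import math


def neg_log_lik(beta, util_diffs, chose_a):
    """Negative log-likelihood of binary choices under a soft-max rule.

    util_diffs[t] is utility(A) - utility(B); chose_a[t] is 1 if A was chosen.
    """
    nll = 0.0
    for diff, choice in zip(util_diffs, chose_a):
        p_a = 1.0 / (1.0 + math.exp(-beta * diff))
        p = p_a if choice == 1 else 1.0 - p_a
        nll -= math.log(max(p, 1e-12))  # guard against log(0)
    return nll


def fit_beta(util_diffs, chose_a, grid):
    """Grid search used here as a stand-in for a simplex optimizer."""
    return min(grid, key=lambda b: neg_log_lik(b, util_diffs, chose_a))


def bic(nll, n_params, n_trials):
    """Bayesian Information Criterion; lower values indicate a better fit."""
    return n_params * math.log(n_trials) + 2.0 * nll


# Two toy participants: (utility differences, choices of option A).
participants = [
    ([0.4, -0.2, 0.6, -0.5], [1, 0, 1, 0]),
    ([0.1, -0.3, 0.2, -0.1], [1, 0, 0, 1]),
]
grid = [0.5 * i for i in range(21)]  # candidate beta values from 0 to 10

summed_bic = 0.0
for diffs, chosen in participants:
    best_beta = fit_beta(diffs, chosen, grid)
    summed_bic += bic(neg_log_lik(best_beta, diffs, chosen), 1, len(chosen))
print(summed_bic)
```

Competing models would each produce their own summed BIC in this way, and the model with the lowest sum would be preferred.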
Reward learning interference
To assess whether citalopram protected reward learning from interference, we performed a logistic regression analysis assessing how well information to be learnt from one trial (in the form of PEs) was used in the next trial when there was interference or when there was no interference (analysis bGLM2, see S11 Fig for correlations between the regressors). The regression predicted participants’ decisions to “stay” (choose the same option again as on the last trial) or “switch” (choose the alternative option compared to the last trial). Coding decisions as “stay/switch” makes the behavioral analyses similar to the neural analyses: a regressor that is “chosen” in the outcome phase of one trial (fMRI) is a regressor favoring “stay” on the next trial (behavioral analysis). Equally, an “unchosen” regressor favors “switch” on the next trial. As main regressors of interest to measure interference, we included interaction terms between the last trial’s relative (“stay” minus “switch”) RPEs and the two factors that could produce interference: the relative EPEs (positive values indicating higher, i.e., less favorable EPEs) and whether reward was real (coded as +1) or hypothetical (coded as −1). For this, we z-score normalized each regressor separately and then multiplied each interfering factor with the relative RPEs. As regressors of no interest, we included the explicitly shown (at the time of choice) relative probabilities, the relative reward and effort magnitude predictions, the relative reward and effort PEs, and the reward type (real versus hypothetical):
β1 and β2 were the two regression weights of interest, measuring the interference effects of reward type and effort learning on reward learning. If there is no interference by reward type, β1 should be (on average) zero, while if there is interference, β1 should be positive (if reward learning is worse when reward is only hypothetical). Similarly, if there is no interference by EPE, β2 should be (on average) zero, while if there is interference (i.e., if reward learning is worse when EPEs are high), β2 should be negative (as EPE increases the probability of switch rather than stay decisions). For Fig 5A, β2 has been sign reversed for illustration purposes so that for both β1 and β2, positive values indicate more interference. PEs and predictions were again obtained from the same Bayesian learning model as described for the analysis of the neural data; note also that very similar results were found using regressors derived from a fitted learning model instead.
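The construction of the two interaction regressors in bGLM2 can be sketched in Python: each regressor is z-scored separately and the interfering factor is then multiplied elementwise with the relative RPE. The toy values and function names are hypothetical; the coding of reward type (+1 real, −1 hypothetical) follows the text.

```python
def zscore(x):
    """z-score normalize a list (sample standard deviation)."""
    n = len(x)
    m = sum(x) / n
    sd = (sum((v - m) ** 2 for v in x) / (n - 1)) ** 0.5
    return [(v - m) / sd for v in x]


def interaction(a, b):
    """z-score each regressor separately, then multiply elementwise."""
    za, zb = zscore(a), zscore(b)
    return [u * v for u, v in zip(za, zb)]


# Hypothetical values from the previous trial ("stay" minus "switch").
rel_rpe = [0.2, -0.1, 0.4, -0.3]
rel_epe = [0.1, 0.3, -0.2, -0.4]
reward_type = [1, -1, 1, -1]  # +1 real, -1 hypothetical

rpe_x_type = interaction(rel_rpe, reward_type)  # weight of interest: beta1
rpe_x_epe = interaction(rel_rpe, rel_epe)       # weight of interest: beta2
print(rpe_x_type)
print(rpe_x_epe)
```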
To illustrate the results of analysis bGLM2 more intuitively, we compared the extent to which participants could use RPEs on trials when reward was real or hypothetical using a regression analysis (bGLM3a). We included as our two regressors of interest relative RPEs (in favor of “stay” minus in favor of “switch”) separately on trials when reward was real or hypothetical. As regressors of no interest, we also included the relative reward probabilities (which were displayed to participants on each trial), the relative reward and effort predictions, the relative EPEs (also split into trials where reward was real or hypothetical), and reward type (real versus hypothetical):
To assess whether RPEs were used differently by the two groups when reward was real or hypothetical, we compared β1 and β2 using t tests. To assess whether EPEs were used differently when reward was real or hypothetical (S8 Fig), we compared β3 and β4 using t tests.
Similarly, we tested to what extent participants could use RPEs on trials when EPEs were particularly favorable or unfavorable (i.e., surprisingly high effort) in analysis bGLM3b. For this, we included as our two regressors of interest relative RPEs separately on the quartile of trials with most favorable and the quartile of trials with most unfavorable EPEs (the split into quartiles was necessary as the relative EPE regressor was continuous, rather than categorical). As regressors of no interest, we also included the relative RPEs in the remaining trials (i.e., the half of trials when relative EPE was neither high nor low), the relative reward probabilities, the relative reward and effort predictions, the relative EPE, and the reward type.
We compared the resulting regression weights of the two groups for RPEs when EPEs were particularly high or low (β1 and β2) using t tests.
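The quartile split used in bGLM3b can be sketched in Python as follows. The sketch assumes that lower relative EPEs are the favorable ones (in line with positive values indicating higher, less favorable effort), that each split regressor is zero outside its own subset of trials, and that ties are resolved by sort order; the function name and toy values are hypothetical.

```python
def split_rpe_by_epe_quartile(rel_rpe, rel_epe):
    """Split relative RPEs into three regressors according to the relative EPE.

    Returns (favorable quartile, unfavorable quartile, remaining trials).
    """
    n = len(rel_epe)
    order = sorted(range(n), key=lambda i: rel_epe[i])
    q = n // 4
    favorable = set(order[:q])        # lowest EPEs: less effort than expected
    unfavorable = set(order[n - q:])  # highest EPEs: surprisingly high effort
    rpe_fav = [rel_rpe[i] if i in favorable else 0.0 for i in range(n)]
    rpe_unfav = [rel_rpe[i] if i in unfavorable else 0.0 for i in range(n)]
    rpe_rest = [
        rel_rpe[i] if i not in favorable and i not in unfavorable else 0.0
        for i in range(n)
    ]
    return rpe_fav, rpe_unfav, rpe_rest


# Eight hypothetical trials.
rel_rpe = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
rel_epe = [0.5, -0.4, 0.1, -0.1, 0.3, -0.6, 0.0, 0.2]
rpe_fav, rpe_unfav, rpe_rest = split_rpe_by_epe_quartile(rel_rpe, rel_epe)
print(rpe_fav)
print(rpe_unfav)
print(rpe_rest)
```

Because every trial contributes to exactly one of the three regressors, their elementwise sum recovers the original relative RPE regressor.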
Supporting information
Acknowledgments
We thank Zachary Mainen for helpful comments on an earlier version of this work. We thank Beata Godlewska, Martina Di Simplicio, Daniel Whiting and Franco De Crescenzo for help with screening of the participants.
Abbreviations
- BIC: Bayesian Information Criterion
- dACC: dorsal anterior cingulate cortex
- EPE: effort prediction error
- fMRI: functional magnetic resonance imaging
- OCD: obsessive-compulsive disorder
- PE: prediction error
- ROI: region of interest
- RPE: reward prediction error
- SSRI: selective serotonin reuptake inhibitor
- vmPFC: ventromedial prefrontal cortex
Data Availability
The data are publicly available from the Oxford University Research Archive (https://doi.org/10.5287/bodleian:PP805bgDz). Source data files are provided with the article for all figures presented in the manuscript.
Funding Statement
Medical Research Council (grant number MR/N014448/1). Received by JS. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Christ Church College (Stipendiary Junior Research Fellowship). Received by NK. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Wellcome Trust wellcome.ac.uk (grant number 092759/Z/10/Z, 89280/Z/09/Z, WT100973AIA). Senior investigator award received by MFSR (WT100973AIA), Four-Year PhD Studentships received by JS (092759/Z/10/Z) and NK (89280/Z/09/Z). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Miyazaki KW, Miyazaki K, Tanaka KF, Yamanaka A, Takahashi A, et al. (2014) Optogenetic activation of dorsal raphe serotonin neurons enhances patience for future rewards. Curr Biol 24: 2033–2040. doi:10.1016/j.cub.2014.07.041
- 2. Liu Z, Zhou J, Li Y, Hu F, Lu Y, et al. (2014) Dorsal raphe neurons signal reward through 5-HT and glutamate. Neuron 81: 1360–1374. doi:10.1016/j.neuron.2014.02.010
- 3. Seymour B, Daw ND, Roiser JP, Dayan P, Dolan R (2012) Serotonin selectively modulates reward value in human decision-making. J Neurosci 32: 5833–5842. doi:10.1523/JNEUROSCI.0053-12.2012
- 4. Fonseca MS, Murakami M, Mainen ZF (2015) Activation of dorsal raphe serotonergic neurons promotes waiting but is not reinforcing. Curr Biol 25: 306–315. doi:10.1016/j.cub.2014.12.002
- 5. Amo R, Fredes F, Kinoshita M, Aoki R, Aizawa H, et al. (2014) The habenulo-raphe serotonergic circuit encodes an aversive expectation value essential for adaptive active avoidance of danger. Neuron 84: 1034–1048. doi:10.1016/j.neuron.2014.10.035
- 6. Cohen JY, Amoroso MW, Uchida N (2015) Serotonergic neurons signal reward and punishment on multiple timescales. Elife 4.
- 7. Crockett MJ, Clark L, Apergis-Schoute AM, Morein-Zamir S, Robbins TW (2012) Serotonin modulates the effects of Pavlovian aversive predictions on response vigor. Neuropsychopharmacology 37: 2244–2252. doi:10.1038/npp.2012.75
- 8. Boureau YL, Dayan P (2011) Opponency revisited: competition and cooperation between dopamine and serotonin. Neuropsychopharmacology 36: 74–97. doi:10.1038/npp.2010.151
- 9. Rogers RD, Blackshaw AJ, Middleton HC, Matthews K, Hawtin K, et al. (1999) Tryptophan depletion impairs stimulus-reward learning while methylphenidate disrupts attentional control in healthy young adults: implications for the monoaminergic basis of impulsive behaviour. Psychopharmacology 146: 482–491.
- 10. Chamberlain SR, Müller U, Blackwell AD, Clark L, Robbins TW, et al. (2006) Neurochemical modulation of response inhibition and probabilistic learning in humans. Science 311: 861–863. doi:10.1126/science.1121218
- 11. Murphy FC, Smith KA, Cowen PJ, Robbins TW, Sahakian BJ (2002) The effects of tryptophan depletion on cognitive and affective processing in healthy volunteers. Psychopharmacology (Berl) 163: 42–53.
- 12. Schweighofer N, Bertin M, Shishida K, Okamoto Y, Tanaka SC, et al. (2008) Low-serotonin levels increase delayed reward discounting in humans. J Neurosci 28: 4528–4532. doi:10.1523/JNEUROSCI.4982-07.2008
- 13. Worbe Y, Palminteri S, Savulich G, Daw ND, Fernandez-Egea E, et al. (2015) Valence-dependent influence of serotonin depletion on model-based choice strategy. Mol Psychiatry.
- 14. Guitart-Masip M, Economides M, Huys QJ, Frank MJ, Chowdhury R, et al. (2014) Differential, but not opponent, effects of L-DOPA and citalopram on action learning with reward and punishment. Psychopharmacology (Berl) 231: 955–966.
- 15. Cools R, Robinson OJ, Sahakian B (2008) Acute tryptophan depletion in healthy volunteers enhances punishment prediction but does not affect reward prediction. Neuropsychopharmacology 33: 2291–2299. doi:10.1038/sj.npp.1301598
- 16. Jitsuki S, Takemoto K, Kawasaki T, Tada H, Takahashi A, et al. (2011) Serotonin mediates cross-modal reorganization of cortical circuits. Neuron 69: 780–792. doi:10.1016/j.neuron.2011.01.016
- 17. Bessa JM, Ferreira D, Melo I, Marques F, Cerqueira JJ, et al. (2009) The mood-improving actions of antidepressants do not depend on neurogenesis but are associated with neuronal remodeling. Mol Psychiatry 14: 764–773, 739. doi:10.1038/mp.2008.119
- 18. Vetencourt JFM, Sale A, Viegi A, Baroncelli L, De Pasquale R, et al. (2008) The antidepressant fluoxetine restores plasticity in the adult visual cortex. Science 320: 385–388. doi:10.1126/science.1150516
- 19. Karpova NN, Pickenhagen A, Lindholm J, Tiraboschi E, Kulesskaya N, et al. (2011) Fear erasure in mice requires synergy between antidepressant drugs and extinction training. Science 334: 1731–1734. doi:10.1126/science.1214592
- 20. Clery-Melin ML, Schmidt L, Lafargue G, Baup N, Fossati P, et al. (2011) Why don't you try harder? An investigation of effort production in major depression. PLoS ONE 6: e23178. doi:10.1371/journal.pone.0023178
- 21. Treadway MT, Bossaller NA, Shelton RC, Zald DH (2012) Effort-based decision-making in major depressive disorder: a translational model of motivational anhedonia. Journal of abnormal psychology 121: 553. doi:10.1037/a0028813
- 22. Anderson GM, Barr CS, Lindell S, Durham AC, Shifrovich I, et al. (2005) Time course of the effects of the serotonin-selective reuptake inhibitor sertraline on central and peripheral serotonin neurochemistry in the rhesus monkey. Psychopharmacology (Berl) 178: 339–346.
- 23. Haahr ME, Fisher PM, Jensen CG, Frokjaer VG, Mahon BM, et al. (2014) Central 5-HT4 receptor binding as biomarker of serotonergic tonus in humans: a [11C]SB207145 PET study. Mol Psychiatry 19: 427–432. doi:10.1038/mp.2013.147
- 24. Cowen PJ, Sargent PA (1997) Changes in plasma prolactin during SSRI treatment: evidence for a delayed increase in 5-HT neurotransmission. Journal of Psychopharmacology 11: 345–348.
- 25. Laine K, Anttila M, Heinonen E, Helminen A, Huupponen R, et al. (1997) Lack of adverse interactions between concomitantly administered selegiline and citalopram. Clinical neuropharmacology 20: 419–433.
- 26. Taylor MJ, Freemantle N, Geddes JR, Bhagwagar Z (2006) Early onset of selective serotonin reuptake inhibitor antidepressant action: systematic review and meta-analysis. Archives of general psychiatry 63: 1217–1223. doi:10.1001/archpsyc.63.11.1217
- 27. Scholl J, Kolling N, Nelissen N, Wittmann MK, Harmer CJ, et al. (2015) The Good, the Bad, and the Irrelevant: Neural Mechanisms of Learning Real and Hypothetical Rewards and Effort. J Neurosci 35: 11233–11251. doi:10.1523/JNEUROSCI.0396-15.2015
- 28. Scholl J, Kolling N, Nelissen N, Stagg CJ, Harmer CJ, et al. (2017) Excitation and inhibition in anterior cingulate predict use of past experiences. eLife 6: e20365. doi:10.7554/eLife.20365
- 29. Chiu PH, Lohrenz TM, Montague PR (2008) Smokers' brains compute, but ignore, a fictive error signal in a sequential investment task. Nat Neurosci 11: 514–520. doi:10.1038/nn2067
- 30. Glascher JP, O'Doherty JP (2010) Model-based approaches to neuroimaging: combining reinforcement learning theory with fMRI data. Wiley Interdiscip Rev Cogn Sci 1: 501–510. doi:10.1002/wcs.57
- 31.Rescorla RA, Wagner AR (1972) Classical conditioning 2: current research and theory. Appleton Century-Crofts, New York: 64–69. [Google Scholar]
- 32.O'Doherty JP, Dayan P, Friston K, Critchley H, Dolan RJ (2003) Temporal difference models and reward-related learning in the human brain. Neuron 38: 329–337. [DOI] [PubMed] [Google Scholar]
- 33.Garrison J, Erdeniz B, Done J (2013) Prediction error in reinforcement learning: a meta-analysis of neuroimaging studies. Neurosci Biobehav Rev 37: 1297–1310. 10.1016/j.neubiorev.2013.03.023 [DOI] [PubMed] [Google Scholar]
- 34.Rutledge RB, Dean M, Caplin A, Glimcher PW (2010) Testing the reward prediction error hypothesis with an axiomatic model. J Neurosci 30: 13525–13536. 10.1523/JNEUROSCI.1747-10.2010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.McClure SM, Li J, Tomlin D, Cypert KS, Montague LM, et al. (2004) Neural correlates of behavioral preference for culturally familiar drinks. Neuron 44: 379–387. 10.1016/j.neuron.2004.09.019 [DOI] [PubMed] [Google Scholar]
- 36.Schonberg T, Daw ND, Joel D, O'Doherty JP (2007) Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making. J Neurosci 27: 12860–12867. 10.1523/JNEUROSCI.2496-07.2007 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Cohen MX (2007) Individual differences and the neural representations of reward expectation and reward prediction error. Soc Cogn Affect Neurosci 2: 20–30. 10.1093/scan/nsl021 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Mars RB, Jbabdi S, Sallet J, O'Reilly JX, Croxson PL, et al. (2011) Diffusion-weighted imaging tractography-based parcellation of the human parietal cortex and comparison with human and macaque resting-state functional connectivity. J Neurosci 31: 4087–4100. 10.1523/JNEUROSCI.5102-10.2011 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Skvortsova V, Palminteri S, Pessiglione M (2014) Learning To Minimize Efforts versus Maximizing Rewards: Computational Principles and Neural Correlates. J Neurosci 34: 15621–15630. 10.1523/JNEUROSCI.1350-14.2014 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Walton ME, Bannerman DM, Alterescu K, Rushworth MFS (2003) Functional specialization within medial frontal cortex of the anterior cingulate for evaluating effort-related decisions. J Neurosci 23: 6475–6479. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Rudebeck PH, Walton ME, Smyth AN, Bannerman DM, Rushworth MF (2006) Separate neural pathways process different decision costs. Nat Neurosci 9: 1161–1168. 10.1038/nn1756 [DOI] [PubMed] [Google Scholar]
- 42.Croxson PL, Walton ME, O'Reilly JX, Behrens TE, Rushworth MF (2009) Effort-based cost-benefit valuation and the human brain. J Neurosci 29: 4531–4541. 10.1523/JNEUROSCI.4515-08.2009 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43. Prevost C, Pessiglione M, Metereau E, Clery-Melin ML, Dreher JC (2010) Separate valuation subsystems for delay and effort decision costs. J Neurosci 30: 14080–14090. doi:10.1523/JNEUROSCI.2752-10.2010
- 44. Klein-Flügge MC, Kennerley SW, Friston K, Bestmann S (2016) Neural signatures of value comparison in human cingulate cortex during decisions requiring an effort-reward trade-off. Journal of Neuroscience 36: 10002–10015. doi:10.1523/JNEUROSCI.0292-16.2016
- 45. Daw ND, Kakade S, Dayan P (2002) Opponent interactions between serotonin and dopamine. Neural Networks 15: 603–616.
- 46. Boorman ED, Rajendran VG, O'Reilly JX, Behrens TE (2016) Two Anatomically and Computationally Distinct Learning Signals Predict Changes to Stimulus-Outcome Associations in Hippocampus. Neuron 89: 1343–1354. doi:10.1016/j.neuron.2016.02.014
- 47. McGuire JT, Nassar MR, Gold JI, Kable JW (2014) Functionally dissociable influences on learning rate in a dynamic environment. Neuron 84: 870–881. doi:10.1016/j.neuron.2014.10.013
- 48. Castren E (2013) Neuronal network plasticity and recovery from depression. JAMA Psychiatry 70: 983–989. doi:10.1001/jamapsychiatry.2013.1
- 49. Boorman ED, Behrens TE, Woolrich MW, Rushworth MF (2009) How green is the grass on the other side? Frontopolar cortex and the evidence in favor of alternative courses of action. Neuron 62: 733–743. doi:10.1016/j.neuron.2009.05.014
- 50. Clarke HF, Walker SC, Crofts HS, Dalley JW, Robbins TW, et al. (2005) Prefrontal serotonin depletion affects reversal learning but not attentional set shifting. J Neurosci 25: 532–538. doi:10.1523/JNEUROSCI.3690-04.2005
- 51. Rygula R, Clarke HF, Cardinal RN, Cockcroft GJ, Xia J, et al. (2014) Role of Central Serotonin in Anticipation of Rewarding and Punishing Outcomes: Effects of Selective Amygdala or Orbitofrontal 5-HT Depletion. Cereb Cortex. doi:10.1093/cercor/bhu102
- 52. Clarke HF, Dalley JW, Crofts HS, Robbins TW, Roberts AC (2004) Cognitive inflexibility after prefrontal serotonin depletion. Science 304: 878–880. doi:10.1126/science.1094987
- 53. Palminteri S, Clair AH, Mallet L, Pessiglione M (2012) Similar improvement of reward and punishment learning by serotonin reuptake inhibitors in obsessive-compulsive disorder. Biol Psychiatry 72: 244–250. doi:10.1016/j.biopsych.2011.12.028
- 54. Malberg JE, Eisch AJ, Nestler EJ, Duman RS (2000) Chronic antidepressant treatment increases neurogenesis in adult rat hippocampus. The Journal of Neuroscience 20: 9104–9110.
- 55. Geurts DE, Huys QJ, den Ouden HE, Cools R (2013) Serotonin and aversive Pavlovian control of instrumental behavior in humans. J Neurosci 33: 18932–18939. doi:10.1523/JNEUROSCI.2749-13.2013
- 56. Tanaka SC, Shishida K, Schweighofer N, Okamoto Y, Yamawaki S, et al. (2009) Serotonin affects association of aversive outcomes to past actions. J Neurosci 29: 15669–15674. doi:10.1523/JNEUROSCI.2799-09.2009
- 57. Robinson OJ, Cools R, Sahakian BJ (2012) Tryptophan depletion disinhibits punishment but not reward prediction: implications for resilience. Psychopharmacology (Berl) 219: 599–605.
- 58. Evers EA, Cools R, Clark L, van der Veen FM, Jolles J, et al. (2005) Serotonergic modulation of prefrontal cortex during negative feedback in probabilistic reversal learning. Neuropsychopharmacology 30: 1138–1147. doi:10.1038/sj.npp.1300663
- 59. Fischer AG, Jocham G, Ullsperger M (2014) Dual serotonergic signals: a key to understanding paradoxical effects? Trends Cogn Sci.
- 60. Batsikadze G, Paulus W, Kuo MF, Nitsche MA (2013) Effect of serotonin on paired associative stimulation-induced plasticity in the human motor cortex. Neuropsychopharmacology 38: 2260–2267. doi:10.1038/npp.2013.127
- 61. McCabe C, Mishor Z, Cowen PJ, Harmer CJ (2010) Diminished neural processing of aversive and rewarding stimuli during selective serotonin reuptake inhibitor treatment. Biol Psychiatry 67: 439–445. doi:10.1016/j.biopsych.2009.11.001
- 62. Macoveanu J, Fisher PM, Haahr ME, Frokjaer VG, Knudsen GM, et al. (2014) Effects of selective serotonin reuptake inhibition on neural activity related to risky decisions and monetary rewards in healthy males. Neuroimage 99: 434–442. doi:10.1016/j.neuroimage.2014.05.040
- 63. Fletcher PC, Napolitano A, Skeggs A, Miller SR, Delafont B, et al. (2010) Distinct modulatory effects of satiety and sibutramine on brain responses to food images in humans: a double dissociation across hypothalamus, amygdala, and ventral striatum. J Neurosci 30: 14346–14355. doi:10.1523/JNEUROSCI.3323-10.2010
- 64. Dayan P, Huys QJ (2009) Serotonin in affective control. Annu Rev Neurosci 32: 95–126. doi:10.1146/annurev.neuro.051508.135607
- 65. Crockett MJ, Clark L, Robbins TW (2009) Reconciling the role of serotonin in behavioral inhibition and aversion: acute tryptophan depletion abolishes punishment-induced inhibition in humans. J Neurosci 29: 11993–11999. doi:10.1523/JNEUROSCI.2513-09.2009
- 66. Fischer AG, Endrass T, Reuter M, Kubisch C, Ullsperger M (2015) Serotonin reuptake inhibitors and serotonin transporter genotype modulate performance monitoring functions but not their electrophysiological correlates. J Neurosci 35: 8181–8190. doi:10.1523/JNEUROSCI.5124-14.2015
- 67. Volman I, Verhagen L, den Ouden HE, Fernandez G, Rijpkema M, et al. (2013) Reduced serotonin transporter availability decreases prefrontal control of the amygdala. J Neurosci 33: 8974–8979. doi:10.1523/JNEUROSCI.5518-12.2013
- 68. Roiser JP, de Martino B, Tan GC, Kumaran D, Seymour B, et al. (2009) A genetically mediated bias in decision making driven by failure of amygdala control. J Neurosci 29: 5985–5991. doi:10.1523/JNEUROSCI.0407-09.2009
- 69. Harmer CJ, Goodwin GM, Cowen PJ (2009) Why do antidepressants take so long to work? A cognitive neuropsychological model of antidepressant drug action. Br J Psychiatry 195: 102–108. doi:10.1192/bjp.bp.108.051193
- 70. Walker SC, Robbins TW, Roberts AC (2009) Differential contributions of dopamine and serotonin to orbitofrontal cortex function in the marmoset. Cereb Cortex 19: 889–898. doi:10.1093/cercor/bhn136
- 71. Evers EA, van der Veen FM, Jolles J, Deutz NE, Schmitt JA (2006) Acute tryptophan depletion improves performance and modulates the BOLD response during a Stroop task in healthy females. Neuroimage 32: 248–255. doi:10.1016/j.neuroimage.2006.03.026
- 72. Deichmann R, Gottfried JA, Hutton C, Turner R (2003) Optimized EPI for fMRI studies of the orbitofrontal cortex. NeuroImage 19: 430–441.
- 73. Smith SM, Jenkinson M, Woolrich MW, Beckmann CF, Behrens TE, et al. (2004) Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 23 Suppl 1: S208–219.
- 74. Behrens TE, Woolrich MW, Walton ME, Rushworth MF (2007) Learning the value of information in an uncertain world. Nat Neurosci 10: 1214–1221. doi:10.1038/nn1954
- 75. Woolrich MW, Behrens TE, Beckmann CF, Jenkinson M, Smith SM (2004) Multilevel linear modelling for FMRI group analysis using Bayesian inference. Neuroimage 21: 1732–1747. doi:10.1016/j.neuroimage.2003.12.023
- 76. Behrens TEJ, Hunt LT, Woolrich MW, Rushworth MFS (2008) Associative learning of social value. Nature 456: 245–249. doi:10.1038/nature07538
- 77. Jocham G, Furlong PM, Kröger IL, Kahn MC, Hunt LT, et al. (2014) Dissociable contributions of ventromedial prefrontal and posterior parietal cortex to value-guided choice. NeuroImage 100: 498–506. doi:10.1016/j.neuroimage.2014.06.005
- 78. Boorman ED, Behrens TE, Rushworth MF (2011) Counterfactual choice and learning in a neural network centered on human lateral frontopolar cortex. PLoS Biol 9: e1001093. doi:10.1371/journal.pbio.1001093
- 79. Scholl J, Gunthner J, Kolling N, Favaron E, Rushworth MF, et al. (2014) A role beyond learning for NMDA receptors in reward-based decision-making-a pharmacological study using d-cycloserine. Neuropsychopharmacology 39: 2900–2909. doi:10.1038/npp.2014.144
- 80. Beck AT, Steer RA, Brown GK (1996) Beck depression inventory-II. San Antonio.
- 81. Spielberger CD, Gorsuch RL (1983) State-trait anxiety inventory for adults: Manual, instrument, and scoring guide: Mind Garden, Incorporated.
- 82. Watson D, Clark LA, Tellegen A (1988) Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology 54: 1063–1070.
- 83. Cloninger CR, Svrakic DM, Przybeck TR (1993) A psychobiological model of temperament and character. Archives of general psychiatry 50: 975–990.
- 84. Bond A, Lader M (1974) The use of analogue scales in rating subjective feelings. British Journal of Medical Psychology 47: 211–218.
- 85. Beckmann M, Johansen-Berg H, Rushworth MF (2009) Connectivity-based parcellation of human cingulate cortex and its relation to functional specialization. J Neurosci 29: 1175–1190. doi:10.1523/JNEUROSCI.3328-08.2009
- 86. Sallet J, Mars RB, Noonan MP, Neubert FX, Jbabdi S, et al. (2013) The organization of dorsal frontal cortex in humans and macaques. J Neurosci 33: 12255–12274. doi:10.1523/JNEUROSCI.5108-12.2013
Data Availability Statement
The data are publicly available from the Oxford University Research Archive (https://doi.org/10.5287/bodleian:PP805bgDz). Source data files are provided with the article for all figures presented in the manuscript.