Translational Psychiatry. 2020 Mar 3;10:84. doi: 10.1038/s41398-020-0762-5

The neurochemical substrates of habitual and goal-directed control

Valerie Voon 1,2,3, Juho Joutsa 4,5,6,7, Joonas Majuri 4,6, Kwangyeol Baek 1,8, Camilla L Nord 1,9, Eveliina Arponen 6, Sarita Forsback 6, Valtteri Kaasinen 4,7
PMCID: PMC7054261  PMID: 32127520

Abstract

Our daily decisions are governed by the arbitration between goal-directed and habitual strategies. However, the neurochemical basis of this arbitration is unclear. We assessed the contribution of dopaminergic, serotonergic, and opioidergic systems to this balance across reward and loss domains. Thirty-nine participants (17 healthy controls, 15 patients with pathological gambling, and 7 with binge eating disorder) underwent positron emission tomography (PET) imaging with [18F]FDOPA, [11C]MADAM and [11C]carfentanil to assess presynaptic dopamine function, serotonin transporter binding potential, and mu-opioid receptor binding potential, respectively. Separately, participants completed a modified two-step task, which quantifies the degree to which decision-making is influenced by goal-directed or habitual strategies. All participants completed a version with reward outcomes; healthy controls additionally completed a version with loss outcomes. In the context of rewarding outcomes, we found that greater serotonin transporter binding potential in prefrontal regions was associated with habitual control, while greater serotonin transporter binding potential in the putamen was marginally associated with goal-directed control; however, the findings were no longer significant when controlling for the opposing valence (loss). In blocks with loss outcomes, we found that the opioidergic system, specifically greater [11C]carfentanil binding potential, was positively associated with goal-directed control and negatively associated with habitual control. Our findings illuminate the complex neurochemical basis of goal-directed and habitual behavior, implicating differential roles for prefrontal and subcortical serotonin in decision-making across healthy and pathological populations.

Subject terms: Neuroscience, Human behaviour

Introduction

Two distinct systems influence our choice behavior: goal-directed and habitual control. Goal-directed (or model-based) control is characterized by a learned internal model of the environment that can dynamically evaluate optimal actions, a flexible but computationally expensive strategy1–3. By contrast, habitual (or model-free) control computes the value of each action entirely from past experience (reward prediction errors), sacrificing flexibility for greater efficiency. Disruptions in the balance of these strategies may underlie a range of pathological behaviours, in particular psychiatric disorders characterized by compulsivity3–5.

This balance between goal-directed and habitual strategies is mediated by various neurochemical processes. Among these, the dopamine system is most frequently implicated; a smaller number of studies also point to the involvement of the serotonin and opioid systems3,6. The role of dopamine in this balance is a topic of some debate. Traditionally, dopamine has been associated with model-free reinforcement learning: in rodents, pharmacologically enhancing dopamine increases habit formation7, while dopaminergic nigrostriatal lesions impair habit formation8. However, more recent human research has shown that depleting dopamine increases habitual control9, while administration of the dopamine precursor levodopa was reported to enhance goal-directed control in two studies10,11 and reduce habitual control in a third12 (in the latter study, participants with high working memory capacity did show enhancement of goal-directed control). There is evidence that a key locus of this influence is the ventral striatum: a study that combined 6-[18F]fluoro-L-dopa ([18F]FDOPA) positron emission tomography (PET) with functional magnetic resonance imaging found goal-directed learning correlated with ventral striatal presynaptic dopamine synthesis capacity13. In line with this work, we expected that heightened dopamine levels might shift decision-making toward a goal-directed and away from a habitual strategy. However, most previous work has focused exclusively on choice behavior in the reward domain14–16, a crucial limitation, making the involvement of dopamine in the loss domain unclear. Thus, probing the neurochemical substrates of model-based and model-free control across reward and loss domains may yield a fuller picture of the neural basis of decision-making.

The opioid and serotonin systems appear to play a role in arbitrating between goal-directed and habitual control of behaviour. In rodents, decreasing forebrain serotonin (5-HT) increases compulsive cocaine seeking and manipulating the serotonergic system shifts these habitual behaviours16. Overexpression of rodent dorsolateral striatal 5-HT6 receptors also decreases habitual control15. In healthy humans, central serotonin depletion enhances habitual responding17. However, central serotonin depletion impairs goal-directed control to rewards, but enhances goal-directed control to losses6, illustrating the importance of including both reward and loss domains experimentally. The opioid system also plays an essential role in goal-directed behaviour. A large body of evidence implicates the opioid system in goal-directed aspects of reward processing: opioid peptide-containing neurons, their terminals, and opioid receptors are present in the same basal forebrain regions implicated in learning and performance of goal-directed actions (e.g., the nucleus accumbens (NAcc) core)18,19.

Compellingly, in rodents, blockade of the opioid system during learning with naloxone compromises goal-directed learning, enhancing habitual control of actions14. Naloxone administration also decreases goal-directed alcohol consumption in an animal model of alcoholism, and blocks reinstatement of alcohol-seeking learned in a goal-directed schedule20. Opioid processes seem critical for the acquisition of normal goal-directed control of actions: potentially, higher endogenous opioid levels would have the opposite effect to naloxone administration, enhancing goal-directed control of actions.

Here, we investigate the balance of goal-directed (model-based) and habitual (model-free) control in the appetitive and aversive domains (monetary rewards and losses), and its relationship with NAcc and ventromedial prefrontal cortex (vmPFC)/medial orbitofrontal cortex (mOFC) presynaptic dopamine function, and serotonin transporter (SERT) and mu-opioid receptor (MOR)-binding potential (BP). Previous studies investigating dopamine or serotonin function in association with model-free/model-based control have primarily focused on the striatum (e.g., refs. 13,15). We additionally include a vmPFC/mOFC ROI, due to previous work suggesting the vmPFC is involved at least in part in model-based evaluation in this task2. Moreover, in healthy populations, lower medial OFC and vmPFC volumes (as well as striatal volumes) are associated with reduced model-based control4, while reduced medial prefrontal cortex activation during model-based control is predictive of relapse in alcohol-dependent patients21, underlining the clinical relevance of this region’s computations during the task.

We include three populations of subjects: healthy controls, patients with pathological gambling (PG), and those with binge-eating disorder (BED); in both BED and addictive disorders, decision-making is shifted away from goal-directed toward habitual control (and this shift is thought to be a transdiagnostic symptom dimension common across disorders of compulsivity)4. However, the primary purpose of this study was not to assess between-group differences, which we explored separately22, but rather to illuminate the role of these three neurochemical systems (dopamine, serotonin, and opioid) in goal-directed and habitual control, across reward and loss domains. Thus, we included psychiatric populations in our sample in order to capture a wider range of goal-directed and habitual behavior (associated with healthier and pathological states, respectively). We hoped this approach would yield greater insight into the neurochemical substrates of this behaviour.

We hypothesized that heightened [18F]FDOPA uptake (signifying greater pre-synaptic dopamine function) would be associated with heightened goal-directed learning to rewards; that lower [11C]MADAM BP (which binds selectively to the SERT) would be associated with decreased goal-directed control; and that lower [11C]carfentanil BP (which binds to the MOR) would be associated with decreased goal-directed control.

Materials and methods

Participants

Sixty-seven prospective participants were screened for the study. Subjects recruited to BED and PG groups fulfilled the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) criteria for BED and PG, respectively, confirmed in a structured clinical interview. Exclusion criteria common to both groups, as well as healthy volunteers, included any substance use disorder during the last 6 months prior to PET imaging, diagnosed DSM-IV axis I psychiatric disorder, any clinically relevant somatic disorder (e.g., diabetes mellitus), pregnancy or lactation, and weight over 180 kg (the scanner limit). After screening, 17 healthy controls, 15 PG patients, and 7 BED patients were recruited to the study. The study protocol was approved by the local ethical committee, and all participants gave written informed consent. We required 36 subjects to detect a large effect size (f2 = 0.3) with 80% power (G*Power: Linear multiple regression). The study was conducted according to the principles of the Declaration of Helsinki.

Two-step task

Healthy participants performed the two-step task in two conditions, monetary reward or loss; all patient groups performed only the reward version of the task. We have previously described the task4,23. Briefly, the task consisted of two stages (see Fig. 1a). In stage 1, participants chose between two stimuli, each of which led to one of two stimulus pairs with a fixed probability (p = 0.70) and to the other stimulus pair with opposite probability (p = 0.30). In stage 2, participants chose a single stimulus from the resulting pair; this choice led to an outcome.

Fig. 1. Two-step task and regions of interest.

a Subjects choose between a pair of stimuli at the first stage, which leads with fixed probability (p = 0.70) to one of two states at the second stage. Subjects then choose between the two stimuli of the resulting second-stage pair, each of which is associated with a shifting probability of reward based on a random Gaussian walk. The reward outcome is shown. b Regions of interest. Top: FreeSurfer parcellation. Middle and bottom: regions of interest. The caudate is shown in blue; the putamen is shown in green; the nucleus accumbens (ventral striatum) is shown in yellow. The medial orbitofrontal cortex parcellation shown in the FreeSurfer parcellation (top) consisted of both the ventromedial prefrontal cortex and medial orbitofrontal cortex (shown in red at the bottom). For illustration purposes, the FreeSurfer reconstruction is shown overlaid on the MNI152 brain (middle and bottom).

Each of the four stimuli in stage 2 was attached to a different probability distribution, with probability varying slowly and independently over time between 0.25 and 0.75. The association between each stage 2 stimulus and its reward probability was counterbalanced across participants. Choices at each stage had to be made within 2 s, and the result of each choice was presented for 1 s, after a 1.5 s delay. The stimuli chosen in stages 1 and 2 remained on screen as a reminder in stage 2 and the outcome stage, respectively. If the stage 2 choice was rewarded, participants saw a 1 Euro coin for 1 s; otherwise, they saw a grey circle for 1 s. In the reward condition, subjects either saw a 1 Euro coin with a green square (win outcome), or a grey circle (no-win outcome). In the loss condition, subjects either saw a 1 Euro coin with a red square and red cross over the coin (loss outcome), or a grey circle (no-loss outcome).
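The task structure described above can be illustrated with a minimal simulation sketch. It is not the authors' Cogent/Matlab implementation: the step size of the Gaussian walk (WALK_SD), the reflecting boundaries, the random-choice agent, and all function names are assumptions made for illustration. Only the 0.70 common-transition probability, the 0.25 to 0.75 bounds, and the 67-trial block length come from the text.

```python
import random

COMMON_P = 0.70            # common-transition probability (from the task description)
P_MIN, P_MAX = 0.25, 0.75  # bounds on stage-2 outcome probabilities (from the text)
WALK_SD = 0.025            # ASSUMPTION: walk step size is not reported in the text


def reflect(p, lo=P_MIN, hi=P_MAX):
    """Reflect a probability back into [lo, hi] (ASSUMED boundary rule)."""
    while p < lo or p > hi:
        if p < lo:
            p = 2 * lo - p
        if p > hi:
            p = 2 * hi - p
    return p


def stage2_state(stage1_choice, rng):
    """Stage-1 choice 0/1 leads to state 0/1 with p = 0.70, else to the other state."""
    common = rng.random() < COMMON_P
    return stage1_choice if common else 1 - stage1_choice


def simulate(n_trials=67, seed=0):
    """Simulate one block with a random-choice agent (hypothetical agent)."""
    rng = random.Random(seed)
    # outcome_p[state][stimulus]: four independently drifting probabilities
    outcome_p = [[rng.uniform(P_MIN, P_MAX) for _ in range(2)] for _ in range(2)]
    trials = []
    for _ in range(n_trials):
        c1 = rng.randrange(2)                  # stage-1 choice
        s2 = stage2_state(c1, rng)             # probabilistic transition
        c2 = rng.randrange(2)                  # stage-2 choice
        outcome = int(rng.random() < outcome_p[s2][c2])
        trials.append((c1, s2, c2, outcome))
        # each probability takes an independent Gaussian step after every trial
        for s in range(2):
            for a in range(2):
                outcome_p[s][a] = reflect(outcome_p[s][a] + rng.gauss(0.0, WALK_SD))
    return trials, outcome_p


if __name__ == "__main__":
    trials, final_p = simulate()
    print(len(trials), final_p)
```

In the loss condition the same structure would apply, with the binary outcome read as loss versus no-loss rather than win versus no-win.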

The task consisted of two blocks of 67 trials each per condition. The order of the conditions was randomized (but the two blocks of each condition were always run sequentially). Prior to the task, participants underwent extensive computer-based instructions, which included explanatory examples of changes in transition and probability, and a short block of 50 trials in the same format as the experimental task but with different stimuli. The task was run with Cogent 2000 (http://www.vislab.ucl.ac.uk/cogent.php) on Matlab R2011a (Mathworks, Natick, USA). See Supplemental Materials for an analysis on existing datasets comparing this shortened (two-block) version of the task with the typical three-block version: we showed that the average main outcome measure was highly correlated between the two versions.

PET imaging

All subjects underwent PET scanning three times: first using the MOR-ligand [11C]carfentanil, then with the SERT-ligand [11C]MADAM, and finally with the dopamine precursor ligand [18F]FDOPA. The syntheses of these tracers have been described in detail previously22,24. The PET imaging was performed with a high-resolution research tomograph PET scanner (Siemens Medical Solutions, Knoxville, TN, USA) operated in 3D list mode with scatter correction. A transmission scan was performed before each PET scan with a [137Cs] rotating point source. The dynamic scanning times were 51, 90, and 90 min for [11C]carfentanil, [11C]MADAM, and [18F]FDOPA, respectively. All three PET scans were conducted on the same day at fixed intervals: [11C]carfentanil scan at 0900–1000 h, regular hospital lunch at 1100–1200 h, [11C]MADAM scan at 1200–1300 h and [18F]FDOPA scan at 1430–1530 h. One [11C]carfentanil scan and three [18F]FDOPA scans were performed on a separate day due to tracer production failure or scanner malfunction. Head movements were minimized using a personalized thermoplastic mask or a Velcro strap, and recorded with a stereotaxic infrared camera (Polaris Vicra, Northern Digital, Waterloo, Canada). One [11C]carfentanil scan, three [18F]FDOPA scans and three [11C]MADAM scans were excluded due to scanner malfunction or subject withdrawal. Thus, the final sample sizes were 7 BED, 15 PG, and 16 controls with [11C]carfentanil, and 7 BED, 13 PG, and 16 controls with [11C]MADAM and [18F]FDOPA.

The preprocessing and analysis have been described in detail previously22. Briefly, PET images were corrected for between-frame motion and coregistered with individual anatomical T1-weighted magnetic resonance imaging (MRI) using Statistical Parametric Mapping software (SPM8, http://www.fil.ion.ucl.ac.uk/spm/software/spm8/). Time-activity data were extracted using regions of interest (ROI) for the mean NAcc area, caudate, putamen and mOFC, which were determined from the individual T1-weighted MR images using FreeSurfer automatic parcellation (Fig. 1b, top) (version 5.3.0, http://surfer.nmr.mgh.harvard.edu/) as described earlier25–27. Note that the automated mOFC ROI includes both vmPFC and mOFC regions and is referred to in this study as vmPFC/mOFC (Fig. 1b). The simplified reference tissue model was applied to calculate [11C]carfentanil and [11C]MADAM estimates of specific binding relative to non-displaceable BPs (BPND)28. The [18F]FDOPA influx rate constant (Ki) was determined from a Patlak plot with the reference region as the input function29. The occipital cortex was designated as the reference region for [11C]carfentanil and [18F]FDOPA, and the cerebellar cortex was the reference region for [11C]MADAM. Different reference regions were used to ensure there is no specific tracer binding in the reference region (in the case of [11C]MADAM, there is specific binding in the occipital cortex but no specific binding in the cerebellar cortex30; for [11C]carfentanil and [18F]FDOPA, there is no specific binding in the occipital cortex31,32).
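As a rough illustration of the reference-region Patlak approach mentioned above, the sketch below estimates Ki as the slope of the line relating C_tissue(t)/C_ref(t) to the integral of C_ref up to t divided by C_ref(t), using late frames only. It is a generic textbook formulation, not the authors' pipeline: the function names, the trapezoidal integration, the choice of t_star, and the synthetic curves are all assumptions.

```python
import math


def cumulative_trapezoid(t, y):
    """Running trapezoidal integral of y over t, starting at 0 for the first sample."""
    out = [0.0]
    for i in range(1, len(t)):
        out.append(out[-1] + 0.5 * (y[i] + y[i - 1]) * (t[i] - t[i - 1]))
    return out


def patlak_ki(t, c_tissue, c_ref, t_star):
    """Return (Ki, intercept) from an ordinary least-squares Patlak fit.

    t        : frame mid-times (min), assumed to start at scan onset
    c_tissue : target-region time-activity curve
    c_ref    : reference-region time-activity curve (used as input function)
    t_star   : HYPOTHETICAL start of the linear phase; frames before it are ignored
    """
    integral = cumulative_trapezoid(t, c_ref)
    xs, ys = [], []
    for ti, ct, cr, ig in zip(t, c_tissue, c_ref, integral):
        if ti >= t_star and cr > 0:
            xs.append(ig / cr)   # "stretched time" axis
            ys.append(ct / cr)   # normalised tissue activity
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Synthetic curves (illustrative values only, not study data)
    t = [float(m) for m in range(0, 95, 5)]
    c_ref = [10.0 * math.exp(-0.02 * ti) + 1.0 for ti in t]
    true_ki, true_v0 = 0.012, 0.8
    integ = cumulative_trapezoid(t, c_ref)
    c_tissue = [true_ki * ig + true_v0 * cr for ig, cr in zip(integ, c_ref)]
    print(patlak_ki(t, c_tissue, c_ref, t_star=20.0))
```

With time in minutes, the fitted slope has units of 1/min; the synthetic example is constructed so that the fit recovers the Ki used to generate it.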

Analysis

All PET data were tested for outliers (>3 standard deviations (SD) from the group mean) and normality of distribution (Shapiro–Wilk test p > 0.05). The computational analysis for the two-step task has been extensively described previously4,33. In brief, we fit the choice data of each participant to a hybrid algorithm that combined model-free (i.e., reinforcement learning) and model-based learning algorithms. This model estimates five parameters based on the behavioural data for each participant: a choice reliability parameter (β), a learning rate (α), a reinforcement eligibility parameter (λ), a perseveration rate, and a weighting parameter (w, which extends from 1 (purely model-based) to 0 (purely model-free)). We analyse only this final parameter, described as wr = w for the reward condition, and wl = w for the loss condition.
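To make the role of the weighting parameter concrete, the following sketch shows how w might combine model-based and model-free first-stage values in a hybrid learner of this general kind. It follows a commonly used formulation and is not necessarily the exact model fitted here; the function names, the example values, and the way perseveration enters the softmax are assumptions.

```python
import math


def model_based_values(q_stage2, transition):
    """First-stage model-based values: expected best second-stage value.

    q_stage2[s][a]   : learned value of stimulus a in second-stage state s
    transition[a][s] : probability that first-stage action a leads to state s
    """
    return [sum(transition[a][s] * max(q_stage2[s]) for s in range(2))
            for a in range(2)]


def hybrid_values(q_mf, q_mb, w):
    """Weighted mixture: w = 1 is purely model-based, w = 0 purely model-free."""
    return [w * mb + (1.0 - w) * mf for mf, mb in zip(q_mf, q_mb)]


def choice_probabilities(q_net, beta, persev=0.0, last_choice=None):
    """Softmax over first-stage actions.

    ASSUMPTION: perseveration adds a bonus to the previously chosen action
    inside the softmax; the exact parameterisation in the fitted model may differ.
    """
    logits = [beta * (q + (persev if a == last_choice else 0.0))
              for a, q in enumerate(q_net)]
    top = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    transition = [[0.7, 0.3], [0.3, 0.7]]  # fixed transition structure of the task
    q_stage2 = [[0.6, 0.4], [0.2, 0.3]]    # illustrative values only
    q_mf = [0.2, 0.5]                      # illustrative model-free values
    q_mb = model_based_values(q_stage2, transition)
    for w in (0.0, 0.289, 1.0):            # 0.289 is the healthy-volunteer mean wr
        print(w, choice_probabilities(hybrid_values(q_mf, q_mb, w), beta=5.0))
```

In this toy example the two systems disagree about the better first-stage action, so the predicted choice flips as w moves from 0 toward 1.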

Two healthy controls did not complete the two-step task for loss outcomes. We tested wr and wl for outliers (>3 SD from the group mean) and normality of distribution (Shapiro–Wilk test p > 0.05). As the scores were normally distributed, we used parametric analyses. We compared wr between groups in the behavioural analysis using a one-way ANOVA (but did not conduct any group comparisons for wl as only healthy volunteers were tested in the loss condition). For the relationship with neural regions associated with the PET ligands, we conducted six stepwise multiple linear regressions with backwards elimination, with either wr or wl as the dependent variable and the mean bilateral NAcc, caudate, putamen and vmPFC/mOFC values for each PET ligand as the independent variables (no multicollinearity was detected with VIF < 10; homoscedasticity and normality of residuals were confirmed). The wr analysis included healthy controls, PG, and BED; since only healthy controls were tested in the loss condition, the wl model included only healthy controls. For these models, p < 0.0083 was considered significant (after Bonferroni correction for six regression analyses: one model for each ligand, for both reward and loss).

Results

We assessed 17 healthy controls, 15 patients with PG, and 7 patients with BED (see Table 1 for demographic details, and see previous publications for additional clinical details22,34). Age did not differ between groups (p = 0.35), though there was a group effect of body mass index (BMI) (p = 0.003, driven by an increased BMI in the BED population) and on the Beck Depression Inventory (BDI) (p < 0.0005, driven by higher BDI scores in both patient populations). There were also group differences across all gambling measures (driven by higher scores in the PG group) and binge eating measures (driven by higher scores in the BED group); all p < 0.01 (see Table 1).

Table 1.

Demographic details of the participants.

Measure | Healthy controls (N = 17) | BED (N = 7) | PG (N = 15)
Mean age (SD) | 43.29 (11.10) | 49.43 (5.09) | 42.60 (11.81)
Males | 8 | 0 | 8
Mean BMI (SD) | 24.82 (2.10) | 30.87 (6.58) | 25.41 (3.64)
Mean BDI (SD) | 2.82 (3.09) | 15.43 (9.62) | 14.36 (7.76)
SOGS | 0.1 (0.3) | 0.4 (0.5) | 13.3 (2.3)
Duration of problem gambling (y) | n.a. | n.a. | 11.6 (7.3)
Gambling per week (€) | 3.9 (7.4) | 2.9 (4.6) | 152 (149)
Gambling per week (h) | 0.5 (1.2) | 0.5 (1.2) | 8.7 (7.2)
Gambling debt (€) | 0 (0) | 0 (0) | 18,000 (15,600)
Binge eating scale | 2.1 (2.1) | 30.9 (4.6) | 4.4 (4.4)
Yale food addiction scale | 5.4 (3.4) | 42.3 (6.5) | 9.1 (9.5)
DEBQ emotional | 20.5 (5.0) | 50.0 (8.3) | 21.2 (8.7)
DEBQ external | 23.7 (5.3) | 37.5 (6.3) | 26.1 (7.3)
DEBQ restrained | 24.8 (6.8) | 35.3 (3.4) | 20.9 (10.6)
Duration of problem eating (y) | n.a. | 18.1 (14.9) | n.a.

SD standard deviation, BED Binge eating disorder, PG pathological gambling, BMI body mass index, BDI Beck depression inventory, DEBQ the Dutch eating behavior questionnaire, SOGS south oaks gambling screen, n.a. not applicable.

We first analysed the behavioural results alone to test whether the groups differed in wr (the weighting parameter extracted from the computational model, which putatively describes the relative degree of model-based versus model-free control of a subject). There were no significant group differences in wr (healthy volunteers: 0.289 (0.254); PG: 0.139 (0.126); BED: 0.247 (0.232); F(2,34) = 1.70, p = 0.12); wl was not compared between groups as only healthy volunteers were tested.

We also tested whether other computational parameters differed between groups. There were no significant differences in the other parameters, including learning rates, temperature and the reinforcement eligibility parameter. There was a significant group difference in perseveration, or the tendency to select the same choice in the first stage irrespective of outcome (PG: 0.06 (0.14); healthy volunteers: 0.16 (0.10); BED: 0.3 (0.26); p = 0.009), with posthoc analysis showing differences between PG and BED (p = 0.007). These findings are consistent with high perseveration scores in BED previously reported4.

We compared the model fits and did not show a difference between groups (negative log likelihoods (−LL): w_r: control: 142.08 (27.60); PG 154.06 (29.57); BED 137.74 (43.11); p = 0.46; w_l: 138.65 (37.17)). We note that the model fit for this analysis was largely similar to our existing healthy control data set (see Supplemental Materials). We also ran a supplementary analysis with [11C]MADAM and w_r and [11C]carfentanil and w_l with −LL for reward and loss included as a variable respectively with both models remaining significant (reward: p = ; loss: p = 0.007).

PET imaging data

Reward

The linear regression for wr (collapsed across all three groups) showed a significant relationship with [11C]MADAM BP (R2 = 0.330, F = 4.791, p = 0.008) (which was significant after Bonferroni correction). The NAcc was not associated with wr and was subsequently removed from the model. The final model showed that wr was significantly negatively correlated with vmPFC/mOFC (Beta = −0.653, t = −3.406, p = 0.002), positively correlated with putamen (Beta = 0.421, t = 2.352, p = 0.040), and marginally associated with caudate (Beta = 0.332, t = 1.876, p = 0.071) [11C]MADAM BP. In sum, greater goal-directed control (and weaker habitual control) was associated with putamen [11C]MADAM BP, while greater habitual control (and weaker goal-directed control) was associated with vmPFC/mOFC [11C]MADAM BP. There were no significant relationships between wr and [18F]FDOPA (R2 = 0.081, F = 2.554, p = 0.121) or [11C]carfentanil BP (R2 = 0.008, F = 0.259, p = 0.614). See Fig. 2a, b.

Fig. 2. The neurochemical substrates of goal-directed control for rewards and losses.

a Significant relationship between the relative balance of model-based and model-free control for rewards and medial orbitofrontal cortex (mOFC)/ventromedial prefrontal cortex (vmPFC) [11C]MADAM BP (across all participants). b Nonsignificant relationship between the relative balance of model-based and model-free control for rewards and putamen [11C]MADAM BP (across all participants). c Significant relationship between the relative balance of model-based and model-free control for losses and nucleus accumbens (NAcc) [11C]carfentanil BP (healthy controls only). d Nonsignificant relationship between the relative balance of model-based and model-free control for losses and putamen [18F]FDOPA. **p < 0.0083 (alpha Bonferroni-corrected for six multiple comparisons).

Loss

The linear regression for wl and [11C]carfentanil BP was significant with all regions included in the model (R2 = 0.472, F = 10.728, p = 0.007) and remained significant after Bonferroni correction (note that wl includes only healthy participants, as this version of the task was run only in healthy participants). However, the vmPFC/mOFC, caudate, and putamen were not significantly associated with wl, and were therefore removed from the model. In the final model, wl, reflecting greater goal-directed control (or impaired habitual control) toward losses, was significantly positively correlated with bilateral NAcc [11C]carfentanil BP (Beta = 0.687, t = 2.275, p = 0.007). See Fig. 2c.

The linear regression for wl and [18F]FDOPA showed only a trend (R2 = 0.337, F = 5.598, p = 0.037), which did not survive Bonferroni correction. The vmPFC/mOFC, caudate and NAcc were not significantly associated with wl and were removed from the model. The final model retained only bilateral putamen [18F]FDOPA (Beta = −0.581, t = −2.366, p = 0.037), such that higher putaminal [18F]FDOPA was associated with impaired goal-directed control (or greater habitual control) toward losses; critically, however, this was not significant after correction (see Fig. 2d). Given previous positive findings13, we also specifically tested a regression analysis with NAcc [18F]FDOPA for w_r and w_l, and found no significant relationships (p = 0.98 and p = 0.32, respectively).

There were no significant relationships between wl and [11C]MADAM BP (R2 = 0.038, F = 0.475, p = 0.504).
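The Bonferroni threshold used for the six regression models can be checked directly against the overall model p-values reported in this section. The snippet below is only a bookkeeping sketch; the dictionary labels and function name are illustrative, while the p-values are those given in the text.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Overall model p-values as reported in the Results (one model per ligand and valence)
model_p_values = {
    "[11C]MADAM, reward": 0.008,
    "[18F]FDOPA, reward": 0.121,
    "[11C]carfentanil, reward": 0.614,
    "[11C]carfentanil, loss": 0.007,
    "[18F]FDOPA, loss": 0.037,
    "[11C]MADAM, loss": 0.504,
}

threshold = bonferroni_alpha(0.05, len(model_p_values))  # 0.05 / 6
survivors = [name for name, p in model_p_values.items() if p < threshold]

if __name__ == "__main__":
    print(f"corrected alpha = {threshold:.4f}")
    for name in survivors:
        print("survives correction:", name)
```

Only the [11C]MADAM reward model and the [11C]carfentanil loss model fall below the corrected threshold; the [18F]FDOPA loss model is nominally significant at 0.05 but does not survive correction.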

Valence specificity and behavioural measures of model-based and model-free control

To assess specificity of the effect of the tracer on valence we reran the multiple regression analysis controlling for the opposing valence. As there was no evidence of multicollinearity between w for gain and loss (Tolerance = 0.74, VIF = 1.34), we conducted a secondary analysis of the regression analysis for [11C]MADAM and w_r including w_l into the model. The overall model including caudate and vmPFC/mOFC (but not putamen) remained significant at p = 0.014 (caudate p = 0.018; vmPFC/mOFC p = 0.006). Similarly, the regression analysis for [11C]carfentanil and w_l including w_r into the model also remained significant at p = 0.007 with only NAcc in the model (NAcc p = 0.007). The regression analysis for [18F]FDOPA and w_l including w_r into the model also showed an overall model p value of 0.04 with only putamen in the model (putamen p = 0.04). Put together, these secondary findings highlight the specificity of [11C]MADAM and vmPFC/mOFC for w_r and [11C]carfentanil and NAcc for w_l.

For the purposes of exploring the trade-off between goal-directed and habitual effects and the relationship with neurotransmitter levels, we conducted supplementary analyses with the behaviourally derived measures of model-based and model-free control as the independent variables rather than w. For [11C]MADAM and reward, there was no significant relationship with either model-based or model-free control. For [11C]carfentanil and loss, the model for model-based control was significant (p = 0.019), with a positive correlation with NAcc [11C]carfentanil BP (t = 3.24, p = 0.008); the model for model-free control was also significant (p = 0.01), with a negative correlation with NAcc [11C]carfentanil BP (t = −3.04, p = 0.01).

Discussion

We reveal a differential role for prefrontal and striatal serotonergic systems in mediating the balance of goal-directed and habitual control in the reward domain: lower mOFC/vmPFC, but higher putamen, [11C]MADAM BP correlated with a shift toward goal-directed control; however, the latter relationship was not specific when controlled for the opposing valence (loss). In the loss domain, we also find that NAcc [11C]carfentanil BP correlates positively with goal-directed control and negatively with habitual control.

Opioid peptides in goal-directed control

In the loss domain, we also found a positive relationship between the opioidergic system and goal-directed control and a negative relationship with habitual control. Here, greater NAcc [11C]carfentanil BP may reflect either greater MOR density or lower endogenous synaptic peptide opioid levels, which compete for binding with [11C]carfentanil. These findings are consistent with preclinical evidence suggesting that blockade of endogenous opioid activity in rodents by the competitive opioid receptor antagonist naloxone during acquisition learning of food rewards shifts behavior toward habitual control, and decreases sensitivity to changes in the value of reward14. This effect was restricted to the acquisition of goal-directed actions, and was not seen during performance in the test phase, suggesting a specific effect of MOR antagonism during goal-directed learning. An alternate explanation lies in the effect of opioids on aversive processing: opioids decrease pain ratings, particularly in the expectation of pain relief35, and decrease non-painful aversive responses such as conditioned aversion in rodents36. In healthy humans, blocking MOR with naloxone during a gamble task increased the subjective aversive ratings to monetary loss outcomes36. Furthermore, naloxone increases blood oxygen level-dependent activity during loss outcomes in caudal and subgenual cingulate, bilateral insula, thalamus, and visual cortex; caudal cingulate activity correlates with aversive ratings36. Thus, in our data, an alternate plausible explanation may be that lower endogenous opioid peptide levels enhance the aversiveness of monetary loss, thus improving goal-directed control to losses. Note that although MOR stimulation is associated with striatal dopamine release via GABAergic mechanisms in the ventral tegmental area37, we did not observe any relationship between [18F]FDOPA and goal-directed control in our study, nor any relationship between [18F]FDOPA and [11C]MADAM or [11C]carfentanil BP.

A differential role for prefrontal and striatal serotonergic systems

Perhaps the most interesting finding emerging from our study is a potential differential relationship between prefrontal and striatal serotonergic systems in mediating the balance between goal-directed and habitual control. In rodents, decreasing forebrain 5-HT and systemic 5-HT2C antagonism enhance compulsive cocaine seeking, an effect that was reversed by both a 5-HT2C agonist and a selective serotonin reuptake inhibitor14. Furthermore, overexpression of dorsolateral striatal 5-HT6 receptors decreases habitual control in rodents15. In healthy humans, central serotonin depletion enhances habitual responding17 and impairs goal-directed control to rewards, while enhancing goal-directed control to losses6. Patients with obsessive–compulsive disorder (with putative impairments in serotonergic function) show impaired goal-directed control for rewards and enhanced goal-directed control for losses33.

It is worth noting that SERT BP is interpreted in terms of serotonin terminal density (SERT density), which can be either primary or adaptive in response to endogenous serotonin level changes; these have opposing implications for serotonin levels. If we presume that low SERT BP reflects fewer serotonergic terminals, and hence lower serotonergic activity, our prefrontal results support previous findings that low forebrain serotonin in rodents enhances compulsive cocaine seeking14 and central serotonin depletion in healthy humans impairs goal-directed control and shifts behavior toward habitual responding for rewards17. However, we fail to confirm previous studies showing valence-dependent effects of serotonin on goal-directed processing6 (we show no effect in the loss domain), which is inconsistent with previous work showing a key role of serotonin in loss or punishment processes6,38.

Presynaptic dopamine synthesis and habitual control

There are conflicting preclinical and human reports regarding dopaminergic function in goal-directed and habitual control. In rodents, pharmacologically enhancing dopamine (with amphetamine) accelerates habit formation7, a process reversed by D1 antagonism (but enhanced by D2 antagonism)39; selective nigrostriatal dopaminergic lesions impair habit formation8. In contrast, in healthy humans, depletion of the dopamine precursor increases habitual control9. The severity of Parkinson’s disease, characterized by dopaminergic deficits, is associated with impairments in goal-directed control9; patients tested off-medication show impaired goal-directed control. Pharmacological enhancement of dopamine with levodopa increases goal-directed control in both Parkinson’s disease patients11 and healthy controls10, although this may not generalize to all individuals, as a more recent study found that levodopa decreased habitual control, with increases in goal-directed control only seen in individuals with a high working memory capacity12. Nevertheless, greater ventral striatal presynaptic dopamine synthesis, measured using [18F]FDOPA PET, correlates with greater goal-directed control13. These human studies contrast with the preclinical literature9,11 and may be related to task differences such as overtraining in rodent relative to human studies, lack of anatomical specificity of dopaminergic medication challenges in humans, or overlap of neural substrates underlying goal-directed and habitual control3.

Our observations in healthy controls are more consistent with the preclinical literature: we show a marginal relationship between greater presynaptic dopamine synthesis in putaminal regions and habitual control in the loss domain, which was no longer significant after correction for multiple comparisons. A previous study showed a weak positive relationship between [18F]FDOPA and goal-directed control to rewards in 29 healthy controls13. However, we were unable to replicate these findings. Our lack of positive findings in the reward domain should be interpreted with caution, as we may not have had adequate power to replicate this effect. However, the negative relationship we observed in the loss domain could imply that the role of dopamine in goal-directed and habitual control differs for rewards versus losses.
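
To illustrate how a marginal uncorrected association can lose significance once several regions are tested, the following is a minimal sketch of a Bonferroni adjustment. The correction method, the function name `bonferroni_adjust`, and the p-values are illustrative assumptions; they are not the procedure or the values reported in this study.

```python
def bonferroni_adjust(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and significance flags.

    Each p-value is multiplied by the number of tests and capped at 1.0.
    This is a generic textbook correction, assumed here for illustration.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj < alpha for p_adj in adjusted]
    return adjusted, significant


# Hypothetical uncorrected p-values for three regions of interest.
example_p = [0.03, 0.20, 0.60]
adjusted, significant = bonferroni_adjust(example_p)
print(adjusted, significant)
```

In this hypothetical case the first test would pass an uncorrected threshold of 0.05 but not the corrected one, which is the pattern described above for the putaminal result.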

Limitations

Our study is the first to investigate the role of three neurochemical systems (serotonergic, dopaminergic, and opioidergic) in goal-directed and habitual control. As such, while we reveal a number of interesting potential relationships, we are limited both by inherent ambiguities in the interpretation of BP effects and by a relative dearth of similar investigations in humans. Furthermore, while our study was adequately powered for within-group comparisons, our lack of a group effect may simply reflect inadequate power to detect between-group differences. This lack of power could also account for the absence of group differences on our behavioural measure (wr); previous studies have shown this measure to be generally compromised across disorders of compulsivity35.

We also tested whether other computational parameters differed between groups. There were no significant differences in the other parameters, including learning rates, temperature, or the reinforcement eligibility parameter. There was a significant group difference in perseveration, the tendency to select the same choice in the first stage irrespective of outcome (PG: 0.06 (0.14), healthy volunteers: 0.16 (0.10), BED: 0.3 (0.26); p = 0.009), with a post hoc analysis showing differences between PG and BED (p = 0.007). Despite our small sample size of patients with BED, we replicate the finding of increased perseveration irrespective of outcome, which we previously reported in a much larger sample: patients with BED showed increased perseveration on this task compared to obese participants without BED4. This fits with a larger experimental and clinical literature reporting cognitive inflexibility in BED: patients with BED show decreased cognitive flexibility on a neuropsychological battery compared to either healthy controls or patients with anorexia40 (for a review of the literature, see ref. 41). This impairment in cognitive flexibility could contribute to the symptoms of BED by making patients less able to change their decisions about food consumption after changing environmental outcomes (e.g., the food losing value after satiety, or nausea or discomfort as a result of overeating).
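
For readers unfamiliar with how these parameters interact, the sketch below shows how a weighting parameter, a temperature (here expressed as an inverse temperature), and a perseveration parameter can jointly determine first-stage choice probabilities in a hybrid model of the two-step task. It follows the general formulation of Daw et al.2 rather than the exact model fitted in this study; the function name, argument names, and example values are hypothetical, and the learning-rate and eligibility updates are omitted.

```python
import math


def first_stage_choice_probs(q_mf, q_stage2, transition, w, beta, persev,
                             prev_choice=None):
    """Softmax choice probabilities for the first stage of a two-step task.

    q_mf        model-free value of each first-stage action
    q_stage2    q_stage2[s][a]: value of action a in second-stage state s
    transition  transition[a][s]: P(second-stage state s | first-stage action a)
    w           weight on model-based values (1 = purely goal-directed)
    beta        inverse temperature (assumed parameterisation)
    persev      bonus for repeating the previous first-stage choice
    """
    # Model-based value: expected best second-stage value under the
    # known transition structure.
    q_mb = [
        sum(p * max(q_stage2[s]) for s, p in enumerate(transition[a]))
        for a in range(len(q_mf))
    ]
    # Weighted mixture of model-based and model-free values.
    q_net = [w * mb + (1.0 - w) * mf for mb, mf in zip(q_mb, q_mf)]
    # The perseveration bonus applies only to the previously chosen action.
    logits = [
        beta * (q + (persev if a == prev_choice else 0.0))
        for a, q in enumerate(q_net)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: action 0 usually leads to the more valuable state.
transition = [[0.7, 0.3], [0.3, 0.7]]
q_stage2 = [[1.0, 0.0], [0.0, 0.0]]
probs_mb = first_stage_choice_probs([0.0, 0.0], q_stage2, transition,
                                    w=1.0, beta=1.0, persev=0.0)
probs_persev = first_stage_choice_probs([0.0, 0.0], q_stage2, transition,
                                        w=0.0, beta=1.0, persev=0.5,
                                        prev_choice=1)
print(probs_mb, probs_persev)
```

With w = 1 the agent favours the action that most often leads to the rewarded state; with w = 0 and equal model-free values, only the perseveration term separates the two options, which mirrors the definition of perseveration as repetition irrespective of outcome.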

In addition, our findings in the reward domain were strengthened by the inclusion of both healthy controls and a transdiagnostic psychiatric population; in contrast, our findings in the loss domain were limited to the healthy population. In future, it will be essential both to extend our transdiagnostic results to the loss domain and to investigate samples large enough to characterize any between-group differences in each neurochemical system and its role in goal-directed and habitual control.

Conclusions

We highlight a potential role for dopaminergic, opioidergic and serotonergic mechanisms in arbitrating between behavioral controllers. In the reward domain, we showed a differential role for prefrontal and striatal serotonergic mechanisms, which were associated with habitual and goal-directed control, respectively. In the loss domain, we found that the NAcc opioidergic system was positively associated with goal-directed control and, more tentatively, that the putaminal dopaminergic system was associated with habitual control. These findings begin to reveal the complex neurochemical substrates of a key aspect of decision-making. Uncovering these mechanisms could be crucial to developing interventions that target these behavioural strategies in the context of psychiatric disorders.

Supplementary information

Supplemental Material (11.9KB, docx)

Acknowledgements

We thank the personnel of the Turku PET Centre for their expertise and assistance in PET and MR imaging. This study was supported by the Academy of Finland (grant #256836), the Finnish Alcohol Research Foundation, the Finnish Medical Foundation and the Turku University Central Hospital (EVO grants). C.L.N. is supported by the UK Medical Research Council (Grant Reference: SUAG/043 G101400). V.V. is supported by a Medical Research Council Senior Clinical Fellowship (MR/P008747/1).

Conflict of interest

The authors declare that they have no conflict of interest.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information accompanies this paper at (10.1038/s41398-020-0762-5).

References

  • 1. Gläscher J, Daw N, Dayan P, O’Doherty JP. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron. 2010;66:585–595. doi: 10.1016/j.neuron.2010.04.016.
  • 2. Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ. Model-based influences on humans’ choices and striatal prediction errors. Neuron. 2011;69:1204–1215. doi: 10.1016/j.neuron.2011.02.027.
  • 3. Voon V, Reiter A, Sebold M, Groman S. Model-based control in dimensional psychiatry. Biol. Psychiatry. 2017;82:391–400. doi: 10.1016/j.biopsych.2017.04.006.
  • 4. Voon V, et al. Disorders of compulsivity: a common bias towards learning habits. Mol. Psychiatry. 2015;20:345–352. doi: 10.1038/mp.2014.44.
  • 5. Gillan CM, Kosinski M, Whelan R, Phelps EA, Daw ND. Characterizing a psychiatric symptom dimension related to deficits in goal-directed control. Elife. 2016;5:e11305.
  • 6. Worbe Y, et al. Valence-dependent influence of serotonin depletion on model-based choice strategy. Mol. Psychiatry. 2016;21:624. doi: 10.1038/mp.2015.46.
  • 7. Nelson A, Killcross S. Amphetamine exposure enhances habit formation. J. Neurosci. 2006;26:3805–3812. doi: 10.1523/JNEUROSCI.4305-05.2006.
  • 8. Faure A, Haberland U, Condé F, El Massioui N. Lesion to the nigrostriatal dopamine system disrupts stimulus-response habit formation. J. Neurosci. 2005;25:2771–2780. doi: 10.1523/JNEUROSCI.3894-04.2005.
  • 9. de Wit S, et al. Reliance on habits at the expense of goal-directed control following dopamine precursor depletion. Psychopharmacology. 2012;219:621–631. doi: 10.1007/s00213-011-2563-2.
  • 10. Wunderlich K, Smittenaar P, Dolan RJ. Dopamine enhances model-based over model-free choice behavior. Neuron. 2012;75:418–424. doi: 10.1016/j.neuron.2012.03.042.
  • 11. Sharp ME, Foerde K, Daw ND, Shohamy D. Dopamine selectively remediates ‘model-based’ reward learning: a computational approach. Brain. 2015;139:355–364. doi: 10.1093/brain/awv347.
  • 12. Kroemer NB, et al. L-DOPA reduces model-free control of behavior by attenuating the transfer of value to action. NeuroImage. 2019;186:113–125. doi: 10.1016/j.neuroimage.2018.10.075.
  • 13. Deserno L, et al. Ventral striatal dopamine reflects behavioral and neural signatures of model-based control during sequential decision making. Proc. Natl Acad. Sci. 2015;112:1595–1600. doi: 10.1073/pnas.1417219112.
  • 14. Wassum K, Cely I, Maidment N, Balleine B. Disruption of endogenous opioid activity during instrumental learning enhances habit acquisition. Neuroscience. 2009;163:770–780. doi: 10.1016/j.neuroscience.2009.06.071.
  • 15. Eskenazi D, Neumaier JF. Increased expression of 5‐HT6 receptors in dorsolateral striatum decreases habitual lever pressing, but does not affect learning acquisition of simple operant tasks in rats. Eur. J. Neurosci. 2011;34:343–351. doi: 10.1111/j.1460-9568.2011.07756.x.
  • 16. Pelloux Y, Dilleen R, Economidou D, Theobald D, Everitt BJ. Reduced forebrain serotonin transmission is causally involved in the development of compulsive cocaine seeking in rats. Neuropsychopharmacology. 2012;37:2505. doi: 10.1038/npp.2012.111.
  • 17. Worbe Y, Savulich G, de Wit S, Fernandez-Egea E, Robbins TW. Tryptophan depletion promotes habitual over goal-directed control of appetitive responding in humans. Int. J. Neuropsychopharmacol. 2015;18:1–5.
  • 18. Daunais JB, et al. Functional and anatomical localization of mu opioid receptors in the striatum, amygdala, and extended amygdala of the nonhuman primate. J. Comp. Neurol. 2001;433:471–485. doi: 10.1002/cne.1154.
  • 19. Balleine BW, Dickinson A. Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology. 1998;37:407–419. doi: 10.1016/S0028-3908(98)00033-1.
  • 20. Hay RA, Jennings JH, Zitzman DL, Hodge CW, Robinson DL. Specific and nonspecific effects of naltrexone on goal‐directed and habitual models of alcohol seeking and drinking. Alcoholism. 2013;37:1100–1110. doi: 10.1111/acer.12081.
  • 21. Sebold M, et al. When habits are dangerous: alcohol expectancies and habitual decision making predict relapse in alcohol dependence. Biol. Psychiatry. 2017;82:847–856. doi: 10.1016/j.biopsych.2017.04.019.
  • 22. Majuri J, et al. Dopamine and opioid neurotransmission in behavioral addictions: a comparative PET study in pathological gambling and binge eating. Neuropsychopharmacology. 2017;42:1169–1177.
  • 23. Nord CL, et al. The effect of frontoparietal paired associative stimulation on decision-making and working memory. Cortex. 2019;117:266–276.
  • 24. Halldin C, et al. [11C] MADAM, a new serotonin transporter radioligand characterized in the monkey brain by PET. Synapse. 2005;58:173–183. doi: 10.1002/syn.20189.
  • 25. Fischl B, et al. Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron. 2002;33:341–355. doi: 10.1016/S0896-6273(02)00569-X.
  • 26. Desikan RS, et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage. 2006;31:968–980. doi: 10.1016/j.neuroimage.2006.01.021.
  • 27. Alakurtti K, et al. Long-term test–retest reliability of striatal and extrastriatal dopamine D2/3 receptor binding: study with [11C] raclopride and high-resolution PET. J. Cereb. Blood Flow. Metab. 2015;35:1199–1205. doi: 10.1038/jcbfm.2015.53.
  • 28. Gunn RN, Lammertsma AA, Hume SP, Cunningham VJ. Parametric imaging of ligand-receptor binding in PET using a simplified reference region model. Neuroimage. 1997;6:279–287. doi: 10.1006/nimg.1997.0303.
  • 29. Patlak CS, Blasberg RG, Fenstermacher JD. Graphical evaluation of blood-to-brain transfer constants from multiple-time uptake data. J. Cereb. Blood Flow. Metab. 1983;3:1–7. doi: 10.1038/jcbfm.1983.1.
  • 30. Lundberg J, Odano I, Olsson H, Halldin C, Farde L. Quantification of 11C-MADAM binding to the serotonin transporter in the human brain. J. Nucl. Med. 2005;46:1505–1515.
  • 31. Hoshi H, et al. 6-[18F] fluoro-L-dopa metabolism in living human brain: a comparison of six analytical methods. J. Cereb. Blood Flow. Metab. 1993;13:57–69. doi: 10.1038/jcbfm.1993.8.
  • 32. Endres CJ, Bencherif B, Hilton J, Madar I, Frost JJ. Quantification of brain μ-opioid receptors with [11C] carfentanil: reference-tissue methods. Nucl. Med. Biol. 2003;30:177–186. doi: 10.1016/S0969-8051(02)00411-0.
  • 33. Voon V, et al. Motivation and value influences in the relative balance of goal-directed and habitual behaviours in obsessive-compulsive disorder. Transl. Psychiatry. 2015;5:e670. doi: 10.1038/tp.2015.165.
  • 34. Majuri J, et al. Serotonin transporter density in binge eating disorder and pathological gambling: A PET study with [11C]MADAM. Eur. Neuropsychopharmacol. 2017;27:1281–1288.
  • 35. Levine J, Gordon N, Jones R, Fields H. The narcotic antagonist naloxone enhances clinical pain. Nature. 1978;272:826. doi: 10.1038/272826a0.
  • 36. Narayanan S, et al. Endogenous opioids mediate basal hedonic tone independent of dopamine D-1 or D-2 receptor activation. Neuroscience. 2004;124:241–246. doi: 10.1016/j.neuroscience.2003.11.011.
  • 37. Spanagel R, Herz A, Shippenberg TS. Opposing tonically active endogenous opioid systems modulate the mesolimbic dopaminergic pathway. Proc. Natl Acad. Sci. 1992;89:2046–2050. doi: 10.1073/pnas.89.6.2046.
  • 38. Crockett MJ, Clark L, Apergis-Schoute AM, Morein-Zamir S, Robbins TW. Serotonin modulates the effects of Pavlovian aversive predictions on response vigor. Neuropsychopharmacology. 2012;37:2244–2252. doi: 10.1038/npp.2012.75.
  • 39. Nelson AJD, Killcross S. Accelerated habit formation following amphetamine exposure is reversed by D1, but enhanced by D2, receptor antagonists. Front. Neurosci. 2013;7:76. doi: 10.3389/fnins.2013.00076.
  • 40. Aloi M, et al. Decision making, central coherence and set-shifting: a comparison between Binge Eating Disorder, Anorexia Nervosa and Healthy Controls. BMC Psychiatry. 2015;15:6. doi: 10.1186/s12888-015-0395-z.
  • 41. Voon V. Cognitive biases in binge eating disorder: the hijacking of decision making. CNS Spectr. 2015;20:566–573. doi: 10.1017/S1092852915000681.

Articles from Translational Psychiatry are provided here courtesy of Nature Publishing Group
