Abstract
Variations in the fat mass and obesity-associated (FTO) gene are linked to obesity. However, the underlying neurobiological mechanisms by which these genetic variants influence obesity, behavior, and brain are unknown. Given that Fto regulates D2/3R signaling in mice, we tested in humans whether variants in FTO would interact with a variant in the ANKK1 gene, which alters D2R signaling and is also associated with obesity. In a behavioral and fMRI study, we demonstrate that gene variants of FTO affect dopamine (D2)-dependent midbrain responses to reward learning and behavioral responses associated with learning from negative outcomes in humans. Furthermore, dynamic causal modeling confirmed that FTO variants modulate the connectivity in a basic reward circuit of meso-striato-prefrontal regions, suggesting a mechanism by which genetic predisposition alters reward processing not only in obesity, but also in other disorders with altered D2R-dependent impulse control, such as addiction.
SIGNIFICANCE STATEMENT Variations in the fat mass and obesity-associated (FTO) gene are associated with obesity. Here we demonstrate that variants of FTO affect dopamine-dependent midbrain responses and learning from negative outcomes in humans during a reward learning task. Furthermore, FTO variants modulate the connectivity in a basic reward circuit of meso-striato-prefrontal regions, suggesting a mechanism by which genetic vulnerability in reward processing can increase predisposition to obesity.
Keywords: dopamine, fMRI, genetics, modeling, obesity, reinforcement learning
Introduction
Although variations in the fat mass and obesity-associated (FTO) gene are currently the strongest known genetic factor predisposing humans to nonmonogenic obesity (Dina et al., 2007; Frayling et al., 2007), recent experiments have linked the same variants to a broad spectrum of altered behavioral responses (for review, see Hess and Brüning, 2014), including food choice, attention deficiency, impulse control, and substance abuse (Sobczyk-Kopciol et al., 2011; Choudhry et al., 2013; Karra et al., 2013; Chuang et al., 2015); also, the A variant of rs9939609 has been recently associated with a lower risk of depression (Samaan et al., 2013). However, the underlying neurobiological mechanisms by which FTO or obesity-predisposing variants of the human FTO gene affect behavior remain elusive.
Importantly, the dopaminergic mesolimbic circuitry and brain autonomic networks are critical regulators of these behaviors and are altered upon obesity development (Stice et al., 2008; Kenny, 2011a; Volkow et al., 2011). Moreover, our recent analysis of Fto-deficient mice revealed that a lack of Fto specifically impairs dopamine receptor D2/3-mediated control of neuronal activation. Here Fto deficiency led to increased 6-methyl adenosine modification of specific mRNAs of critical components of D2/3R-signaling, including that of D3R and the GIRK2-channel, thus reducing their translation and affecting dopamine-dependent regulation of locomotor activity and reward sensitivity (Hess et al., 2013). Consistently, behavioral alterations associated with FTO variants in humans have also been linked to altered dopaminergic transmission (Kenny, 2011b). These findings raise the possibility that dopamine-regulated neuronal responses and associated behavioral patterns might be affected in human carriers of FTO risk alleles as well.
Another genetic factor influencing D2R signaling and body weight is the TaqIA restriction fragment length polymorphism (rs1800497), located in the ankyrin repeat and protein kinase domain-containing protein (ANKK)1 gene, downstream from the D2R gene (Neville et al., 2004). Healthy individuals who carry the A1 allele, compared with those who do not, show diminished striatal D2R density (Jönsson et al., 1999) and reduced glucose metabolism in dopaminoceptive regions involved in reward processing (Noble et al., 1997). This genetic trait has been shown to moderate (1) increased likelihood of obesity (Noble et al., 1994), (2) food reinforcement and intake, especially in obese individuals (Epstein et al., 2007), and (3) the association between neural responses and weight gain (Stice et al., 2008).
Because Fto regulates dopaminergic signaling in mice and ANKK1 affects D2R signaling in humans, we tested whether FTO and ANKK1 gene variants may interact to control D2-dependent behavior and associated neural responses. We reasoned that such an interaction would provide direct evidence that FTO gene variants modulate D2-dependent neurotransmission in humans as well.
To evaluate the individual contributions and potential interaction of FTO and ANKK1 gene variants in dopamine-controlled behavior, we tested the effect of genotype on reward and avoidance learning. To investigate whether rewarding outcomes engage DA signaling depending on genotype, we used fMRI. Our prior finding from Fto-deficient mice (Hess et al., 2013) suggested that a lack of Fto specifically impairs D2/3R-mediated autoinhibition of dopaminergic midbrain neurons. Furthermore, ANKK1 genotype modulates midbrain response to rewards in humans (Felsted et al., 2010), and reward prediction errors (PEs) are encoded by phasic dopamine release from neurons in the ventral tegmental area/substantia nigra (VTA/SN) (Schultz et al., 1997; Montague et al., 2004). For these reasons, we focused our primary analysis on PE signals in the midbrain.
To further scrutinize these genetic effects on dopaminergic processes, we investigated whether FTO and ANKK1 variants would modulate connections between reward-responsive regions, including mesolimbic and mesocortical efferents of the midbrain. To this end, we used dynamic causal modeling (DCM) (Friston et al., 2003) for inferring effective connectivity from fMRI data. Specifically, we examined connectivity between (1) midbrain, (2) ventral striatum, which plays a central role in reward processing (Haber and Knutson, 2010), and (3) medial prefrontal cortex, which is crucial for evaluating contextual aspects of reward and for adaptive coding of reward PEs (Park et al., 2012).
Materials and Methods
Participants.
Ninety-two healthy volunteers (45 male) participated in the study. Participants were selected based on the genetic stratification of a larger sample (589 healthy individuals) and differed according to their FTO (rs9939609 T/A variant) and ANKK1 (rs1800497 G/A variant) genotype but were matched for age (26 ± 0.45 years), body mass index (BMI) (23 ± 0.22), and general intelligence (Table 1); participants were further assessed with the Beck Depression Inventory II (Beck et al., 1996) to rule out acute depression. For reasons unrelated to these criteria, 13 subjects had to be excluded from data analysis: two participants due to malfunction of the MR scanner, another for an incomplete test phase as the participant experienced panic inside the scanner, and 10 others because they did not perform the task satisfactorily. We used an elimination criterion regarding the performance in the test phase such that subjects whose correct responses on AB trials were less frequent than wrong responses (A<B) were eliminated. In total, 79 subjects were included in further data analyses (Table 1). All participants gave written informed consent to participate in the experiment, which had been approved by the local ethics committee of the Medical Faculty of the University of Cologne (Cologne, Germany).
Table 1.
Genotype | No. of subjects | Male | Female | Age (yr), mean | Age (yr), SEM | WAIS-MS, mean | WAIS-MS, SEM | BMI, mean | BMI, SEM | BDI-II, mean | BDI-II, SEM |
---|---|---|---|---|---|---|---|---|---|---|---|
A1−FTO− | 20 | 8 | 12 | 25 | 0.7 | 12.0 | 0.5 | 22.5 | 0.6 | 7.0 | 1.2 |
A1−FTO+ | 21 | 9 | 12 | 27 | 1.1 | 11.2 | 0.4 | 23.9 | 0.9 | 7.6 | 1.0 |
A1+FTO− | 16 | 6 | 10 | 26 | 1.0 | 11.6 | 0.1 | 22.4 | 0.5 | 10.8 | 1.4 |
A1+FTO+ | 22 | 9 | 13 | 26 | 0.9 | 11.0 | 0.1 | 22.5 | 0.4 | 8.0 | 1.2 |
 | | | | F(3,74) = 0.78 | | F(3,74) = 1.79 | | F(3,74) = 1.29 | | F(3,74) = 1.73 | |
 | | | | p = 0.51 | | p = 0.16 | | p = 0.29 | | p = 0.17 | |
Data are mean ± SEM. To rule out acute depression, participants were assessed using the Beck Depression Inventory II (BDI-II).
DNA isolation and SNP genotyping.
Isolation of DNA from buccal swabs was performed using the QIAamp DNA Blood Mini Kit (# 51106, QIAGEN) according to the manufacturer's instructions. Concentration and quality of the DNA were determined with an ND-1000 UV/Vis-Spectrophotometer (Peqlab). SNP genotyping for rs9939609 (FTO) and rs1800497 (ANKK1) was performed with 20 ng of DNA in triplicates using allelic discrimination assays (TaqMan SNP Genotyping Assays, Applied Biosystems by Invitrogen). The genotyping PCR was performed on a 7900HT Fast Real-Time PCR System (Applied Biosystems), and the resulting fluorescence data were analyzed with Sequence Detection Software version 2.3 (Applied Biosystems).
Reinforcement learning and choice task.
After informed consent was obtained, participants completed the probabilistic selection task developed by Frank et al. (2004) and formerly applied in the same form by Jocham et al. (2011). It consisted of two phases: an initial reinforcement learning (“training”) phase and a subsequent transfer (“test”) phase. We performed both phases during one fMRI session.
During the learning phase, participants were presented with pairs of symbols that were probabilistically associated with reward. In each of three pairs, one symbol was always “better” (i.e., associated with a higher reward probability) than the other, but the differences in the reward probability were unequal across the three pairs. Symbol pairs were presented in random order, and subjects had to learn to choose the more frequently rewarded symbols from these pairs. Immediately after each choice, the outcome (a smiling face indicating a reward or a frowning face for no reward, see Fig. 1a) was revealed. The three stimuli pairs were animal figures and associated with 80%/20%, 70%/30%, or 60%/40% of positive feedback (see Fig. 1b). This setup provided a varied learning scenario, including difficult-to-learn and easier-to-learn trials. Each pair was presented 120 times; the whole session comprised 401 trials, including 41 null events (black screen). After this reinforcement learning session, subjects underwent a test phase where the stimuli consisted of all 15 possible combinations of the 6 animal figures presented during the learning session. In this test phase, the subject was asked to choose the better option (“choose A” trials) or to avoid the worse option (“avoid B” trials, see Fig. 1b) based on the previous experience with the stimulus pairs.
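The trial structure described above can be sketched in a few lines of Python. This is only an illustration, not the authors' presentation code: the symbol labels A to F, the assignment of A/B to the 80%/20% pair, the random interleaving of null events, and the treatment of the AB pair as neither a "choose A" nor an "avoid B" trial are assumptions made here for the example.

```python
import itertools
import random

# Reward probabilities per training pair (assumed labelling: A/B = 80%/20%, etc.).
PAIRS = {("A", "B"): (0.8, 0.2), ("C", "D"): (0.7, 0.3), ("E", "F"): (0.6, 0.4)}
PRESENTATIONS_PER_PAIR = 120
NULL_EVENTS = 41


def build_training_schedule(seed=0):
    """Return a shuffled list of training trials; None marks a null event."""
    rng = random.Random(seed)
    trials = [pair for pair in PAIRS for _ in range(PRESENTATIONS_PER_PAIR)]
    trials += [None] * NULL_EVENTS
    rng.shuffle(trials)
    return trials


def draw_feedback(pair, choice, rng):
    """Return 1 (smiling face) or 0 (frowning face) for the chosen symbol."""
    p_reward = PAIRS[pair][pair.index(choice)]
    return 1 if rng.random() < p_reward else 0


def build_test_pairs():
    """All unordered combinations of the six symbols used in the test phase."""
    return list(itertools.combinations("ABCDEF", 2))


def classify_test_pair(pair):
    """Label a test pair as 'choose A', 'avoid B', or 'other'.

    Assumption: pairs containing both A and B, or neither, count as 'other'.
    """
    has_a, has_b = "A" in pair, "B" in pair
    if has_a and not has_b:
        return "choose A"
    if has_b and not has_a:
        return "avoid B"
    return "other"
```

Under these assumptions the schedule contains 3 × 120 + 41 = 401 events, and the 15 test pairs split into 4 "choose A" pairs, 4 "avoid B" pairs, and 7 others.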
Statistical analysis.
To statistically evaluate differences between FTO and ANKK1 gene variants on choice behavior, unpaired t tests were performed; before this and to test their interaction, an ordinary one-way ANOVA was calculated. Before all statistical calculations, a D'Agostino-Pearson omnibus normality test was done to verify that the data were compatible with a normal distribution. A significance level of p = 0.05 was chosen in all statistical tests.
Reinforcement learning model.
A standard action-value (Q) learning model (Watkins and Dayan, 1992) was fitted to each participant's behavior in the reinforcement learning phase (for details, see Jocham et al., 2011). The model estimates the action values, Q(A), Q(B), …, Q(F), for each of the six stimuli, A to F. These values are updated on every trial as follows:
$$Q_{i+1}(A) = Q_i(A) + \alpha \, \delta_i$$
where i is the current trial, α is the learning rate, and δ is the PE, which is computed for any given trial i according to the following:
$$\delta_i = r_i - Q_i(A)$$
where r_i is the reward on trial i, which is either 1 or 0.
Therefore, in case of a positive reward, PE will be positive because reward is modeled with a value of 1; by contrast, a nonrewarding trial is modeled with a value of 0, resulting in a negative PE. The learning rate α scales the impact of the PE (i.e., the degree to which PE is used to update the action value). This model assumes that subjects make choices based on the softmax decision rule (Sutton and Barto, 1998). That is, on each trial i, the probability that the model chooses a given option (e.g., A when A and B are presented) is as follows:
$$P_i(A) = \frac{\exp\left(Q_i(A)/\tau\right)}{\exp\left(Q_i(A)/\tau\right) + \exp\left(Q_i(B)/\tau\right)}$$
The parameter τ reflects the subject's individual bias toward either exploratory (random choice of a response) or exploitatory (choice of response with the highest Q value) behavior. To fit the free model parameters α and τ to choices that were actually made by the participant, the negative log-likelihood was minimized. A systematic grid search procedure examined both parameters, from 0.01 to 1 for α, and from 0.01 to 3 for τ, with a step size of 0.01.
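The fitting procedure can be illustrated with a minimal Python sketch. It assumes the standard forms of the model, namely the update Q ← Q + αδ with δ = r − Q for the chosen stimulus and a two-option softmax in which τ divides the action values. The function names, the trial representation, and the initial action values of 0 are illustrative choices, not details taken from the study.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def negative_log_likelihood(alpha, tau, trials):
    """Negative log-likelihood of observed choices under Q-learning + softmax.

    trials: sequence of (chosen, unchosen, reward) with reward in {0, 1}.
    Assumption: all action values start at 0.
    """
    q = {}
    nll = 0.0
    for chosen, unchosen, reward in trials:
        q_c = q.get(chosen, 0.0)
        q_u = q.get(unchosen, 0.0)
        # -log P(chosen), with P(chosen) = 1 / (1 + exp((q_u - q_c) / tau))
        nll += softplus((q_u - q_c) / tau)
        delta = reward - q_c  # prediction error for the chosen stimulus
        q[chosen] = q_c + alpha * delta
    return nll


def grid(lo, hi, step):
    """Evenly spaced values lo, lo + step, ... that do not exceed hi."""
    n = int((hi - lo) / step + 1e-9)
    return [round(lo + k * step, 10) for k in range(n + 1)]


def fit_grid_search(trials, step=0.01):
    """Return (nll, alpha, tau) minimising the negative log-likelihood."""
    best = None
    for alpha in grid(0.01, 1.0, step):
        for tau in grid(0.01, 3.0, step):
            nll = negative_log_likelihood(alpha, tau, trials)
            if best is None or nll < best[0]:
                best = (nll, alpha, tau)
    return best
```

With the step size of 0.01 given in the text, the grid contains 100 × 300 = 30,000 parameter pairs per participant, so an exhaustive search stays tractable for a two-parameter model; a coarser `step` is convenient for quick checks.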
Previous studies on reinforcement learning have suggested that humans may differ in learning from positive or negative PEs (e.g., Niv et al., 2012). It has therefore been proposed (Collins and Frank, 2012; Niv et al., 2012; Gershman, 2015) that separate learning rates may mediate updates in response to positive and negative PEs, respectively. For the behavioral data recorded in this study, however, statistical model comparison indicated that a model with two distinct learning rates was inferior to a model with a single learning rate (i.e., assuming two learning rates did not explain the behavioral data better than using a single learning rate when taking into account the added model complexity afforded by the additional parameter). Specifically, using the Bayesian Information Criterion (Schwarz, 1978) and random effects Bayesian model selection (Stephan et al., 2009), we found that the more parsimonious model with a single learning rate was favored very strongly, with a protected exceedance probability of 0.995 (Rigoux et al., 2014). As a consequence, we used the results from the single learning rate model for all subsequent analyses.
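The complexity penalty behind this comparison can be made concrete with a short Python sketch of the Bayesian Information Criterion. The parameter counts (two for α and τ, three for a model with separate learning rates), the use of 360 choice trials as the sample size, and the likelihood values in the usage note are assumptions for illustration only. The random effects Bayesian model selection step is not reproduced here.

```python
import math


def bic(neg_log_lik, n_params, n_trials):
    """Bayesian Information Criterion: 2 * NLL + k * ln(n). Lower is better."""
    return 2.0 * neg_log_lik + n_params * math.log(n_trials)


def extra_parameter_threshold(n_trials):
    """Reduction in NLL that one additional parameter must achieve to lower BIC."""
    return 0.5 * math.log(n_trials)
```

For example, with 360 trials an additional learning rate has to lower the negative log-likelihood by more than about 2.94 to be preferred. With purely hypothetical fits of 200 (single rate) and 198 (dual rate), `bic(200, 2, 360)` is smaller than `bic(198, 3, 360)`, so the simpler model would be favored.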
fMRI acquisition and analysis.
Imaging was performed on a Siemens 3T Trio scanner (maximum gradient strength 40 mT/m). Functional time series of each subject were acquired with a TxRx head coil (Siemens). For functional time series, 30 axial slices (field of view 192 mm × 192 mm, thickness 3 mm, 0.3 mm interslice gap, 64 × 64 pixel matrix) parallel to the commissural line (AC-PC) were acquired in a descending order from top to bottom using a single-shot gradient echo-planar imaging sequence (EPI: TR = 2000 ms, TE = 30 ms, bandwidth = 116 kHz, flip angle 90°). Additionally, high-resolution T1-weighted images were acquired in a separate scanning session using a 12-channel array head coil with a whole-brain field of view (MDEFT3D: TR = 1930 ms, TI = 650 ms, TE = 5.8 ms, 128 sagittal slices, resolution = 1 × 1 × 1.25 mm3, flip angle = 18°).
fMRI data were analyzed using statistical parametric mapping (SPM8; Wellcome Department of Imaging Neuroscience, London) in MATLAB version 7.12 (The MathWorks). Following realignment of the functional images and coregistration of the structural image to the mean functional image, we segmented the structural image and normalized both functional and structural images to a standard template in MNI coordinate space. The functional images were smoothed, applying an 8 mm FWHM Gaussian kernel and resampled to isotropic resolution. Additionally, a high-pass filter with a cutoff of 1/128 Hz was applied to remove all slowly varying signals from functional data.
Preprocessed scans from the learning phase were analyzed with GLMs, using maximum likelihood estimation for serially autocorrelated observations (Worsley and Friston, 1995) at the first level, with SPM8 (version 4290). The design matrix comprised regressors for reward and punishment onsets as well as the motion parameters, and the positive and negative PEs separately derived from the model as parametric modulators. Preprocessed scans from the test phase were modeled with a separate GLM at single-subject level, where onsets for “choose A” and “avoid B,” as well as onsets for events of no interest (any stimuli except “choose A” and “avoid B”), and motion parameters were included as regressors. Relevant single-subject activations were further evaluated with volume-of-interest (VOI) analysis. Based on our prior finding from Fto-deficient mice (Hess et al., 2013), that a lack of Fto specifically impairs D2/3R-mediated autoinhibition of dopamine neurons in the midbrain, we applied an anatomical mask for VOI analyses of the VTA/SN (Bunzeck and Düzel, 2006). Additionally, there is evidence that the ANKK1 genotype modulates the midbrain response to rewards in humans (Felsted et al., 2010); this result, however, was based on a categorical analysis and did not use a model of trialwise computational quantities.
After individual fMRI data from learning and transfer phases had been subjected to GLM analysis and relevant contrasts had been estimated, we searched for the peak effect size within our VTA/SN VOI for each condition at the single-subject level, using the RFXplot toolbox (Gläscher, 2009). To statistically test differences between FTO and ANKK1 gene variants on VTA/SN activation, unpaired two-tailed t tests were performed; before this and to test their interaction, an ANOVA was calculated. Before all statistical calculations, a D'Agostino-Pearson omnibus normality test was done to verify that the data were compatible with a normal distribution. A significance level of p = 0.05 was chosen in all statistical tests. To assess the relationship between PE coding in the VTA/SN VOI and choice performance, a linear (Pearson's correlation) regression model was used.
DCM.
DCM represents a Bayesian framework for identification and comparison of hierarchical generative models (state-space models) of neuroimaging data. For fMRI, DCM is primarily used to estimate the strengths of directed connections between brain regions that are active during a particular task, and how these connections change dynamically as a function of some controlled experimental variable (e.g., PE) (Friston et al., 2003). Here, we constructed a simple three-region DCM to infer whether FTO and ANKK1 genotype variants modulate connections of a basic reward circuit, including mesolimbic and mesocortical efferents of the dopaminergic midbrain. Specifically, we quantified the effective connectivity between (1) VTA/SN, (2) nucleus accumbens (NAcc), and (3) medial prefrontal cortex (mPFC); see the Introduction for the rationale and references. We used Bayesian model selection to investigate different variants of our three-region DCM; the set of alternative models (model space) is described below.
Specification of DCMs.
We created and estimated DCMs with DCM10 as implemented in SPM8 (version 4290). The DCMs were based on the ROIs and time series extraction described below (see ROI time series extraction) and used the main effects as outlined below (see Model space construction). All DCMs were deterministic; two of the eight alternative models were bilinear, and all others were nonlinear DCMs, where activity between two regions is modulated by a third region (Stephan et al., 2008).
ROI time series extraction.
We used a combination of anatomical and functional constraints to extract regional time series. For an anatomical definition of VTA/SN and NAcc, we applied masks provided by Bunzeck and Düzel (2006) and the Harvard-Oxford Subcortical Structural Atlas, respectively. For time series extraction from VTA/SN and NAcc, a 3 mm sphere was defined within the anatomical masks and around the peak voxel of each subject's “positive PE” (for VTA/SN) and “reward > punishment” contrast (for NAcc). For mPFC, we defined a 3 mm sphere around the peak voxel, which was limited to a search region of 6 mm distance from the group level “reward > punishment” contrast maximum [−3, 53, 1] and anatomically constrained by an mPFC mask created with the tool neurosynth (http://wagerlab.colorado.edu/). All time series were adjusted by an “effects of interest” F contrast, thus removing confounds, such as head movements (represented by a linear combination of realignment parameters).
Model space construction.
For defining the inputs used in the DCM analyses, we constructed a further GLM where rewarded and unrewarded trials were merged into the same regressor (“trial”), followed by two regressors of positive and negative PEs. Based on the activations in our GLM analyses as well as the longstanding literature about reward neurocircuitry, we focused on a simple three-region model. The key idea of this model is that activity in VTA/SN encodes trialwise PEs (i.e., a bilinear modulation of VTA/SN self-connections with trialwise PEs; encoded in the B matrix), and that the efferent connections of VTA/SN convey this PE signal to dopaminoceptive target regions (here: NAcc and mPFC), either via the endogenous connections of the model (A matrix) or in a nonlinear (multiplicative) fashion (D matrix).
Two structural elements (the A and B matrices) were identical for all models. We assumed a fully connected model (i.e., bidirectional connections between VTA/SN, NAcc, and mPFC) (see Fig. 2e), and we assumed that PEs modulated the self-connections of VTA/SN in a trial-by-trial fashion. Two other model components varied across models (C and D matrices), resulting in a model space with a 2 × 4 factorial structure. First, we considered that the driving input “trial” would either enter the midbrain or drive all three regions. Second, in each model, midbrain activity modulated one of the four different pairs of connections (i.e., VTA/SN↔NAcc, VTA/SN↔mPFC, mPFC↔NAcc, or none of the connections). In total, we thus estimated eight alternative models per subject with DCM.
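The 2 × 4 factorial model space can be enumerated explicitly. The Python sketch below only lists the model specifications; it does not call SPM, and the dictionary keys and option names are hypothetical labels chosen for this illustration.

```python
from itertools import product

REGIONS = ("VTA/SN", "NAcc", "mPFC")

# Shared across all models: full bidirectional connectivity (A matrix) and
# PE modulation of the VTA/SN self-connection (B matrix).
A_CONNECTIONS = [(src, dst) for src in REGIONS for dst in REGIONS if src != dst]
B_MODULATED = [("VTA/SN", "VTA/SN")]

# Factor 1 (C matrix): where the driving input "trial" enters.
DRIVING_INPUT_OPTIONS = {"midbrain_only": ("VTA/SN",), "all_regions": REGIONS}

# Factor 2 (D matrix): which pair of connections is modulated by midbrain activity.
NONLINEAR_OPTIONS = {
    "VTA/SN<->NAcc": ("VTA/SN", "NAcc"),
    "VTA/SN<->mPFC": ("VTA/SN", "mPFC"),
    "mPFC<->NAcc": ("mPFC", "NAcc"),
    "none": None,
}


def build_model_space():
    """Return one specification dict per model in the 2 x 4 design."""
    models = []
    for (c_name, c_targets), (d_name, d_pair) in product(
        DRIVING_INPUT_OPTIONS.items(), NONLINEAR_OPTIONS.items()
    ):
        models.append(
            {
                "name": f"C={c_name}, D={d_name}",
                "A": A_CONNECTIONS,
                "B": B_MODULATED,
                "C": c_targets,
                "D": d_pair,
                # Without a D-matrix modulation the model reduces to a bilinear DCM.
                "bilinear": d_pair is None,
            }
        )
    return models
```

Enumerated this way, the space contains eight models, of which the two with no D-matrix modulation are bilinear and the remaining six are nonlinear.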
Bayesian model selection and Bayesian model averaging.
We used random effects Bayesian model selection with Gibbs sampling for model comparison, which yields a posterior probability for each of the tested models (Stephan et al., 2009). We grouped models into four families based on the D matrix (i.e., how midbrain activity modulated different connections as explained above). Each family consisted of two nested models differing only by their driving input configuration (C matrix). Familywise model comparison showed that the models without any nonlinear (quadratic) influence of the VTA/SN on other connections best described the data, as indicated by a protected exceedance probability of 1 (compare Rigoux et al., 2014). We then merged the two models of the winning family using Bayesian model averaging (Penny et al., 2010), where the parameter estimates of each model considered are weighted by the posterior probability of the model. The resulting parameter estimates provided a basis for examining genetic effects on connectivity strengths within our modeled reward circuit. Specifically, we ran post hoc two-tailed t tests between carriers and noncarriers on the subjectwise parameter estimates provided by Bayesian model averaging: (1) A1+ versus A1− group, (2) FTO+ versus FTO− group, and (3) the interaction of both.
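The weighting idea behind Bayesian model averaging can be sketched as a probability-weighted mean of the parameter estimates. This is a deliberate simplification for illustration: the SPM routine averages over posterior samples rather than point estimates, and the function name and the numbers in the usage note are hypothetical.

```python
def bayesian_model_average(estimates, posterior_probs):
    """Average parameter vectors across models, weighted by model probability.

    estimates: one list of parameter estimates per model (same length each).
    posterior_probs: posterior probability of each model; renormalised here so
    that the weights sum to 1 within the family being averaged.
    """
    if len(estimates) != len(posterior_probs):
        raise ValueError("need exactly one probability per model")
    total = sum(posterior_probs)
    weights = [p / total for p in posterior_probs]
    n_params = len(estimates[0])
    return [
        sum(w * est[j] for w, est in zip(weights, estimates))
        for j in range(n_params)
    ]
```

For instance, averaging two models with estimates `[0.2, 0.4]` and `[0.4, 0.8]` under weights 0.75 and 0.25 yields approximately `[0.25, 0.5]`, so the better-supported model dominates the averaged connectivity strengths.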
Results
To evaluate the individual contributions and potential interaction of FTO and ANKK1 gene variants in dopamine-controlled behavior, we tested the effect of genotype on reward and avoidance learning by using a probabilistic selection task. To this end, we recruited a cohort of 589 healthy young individuals, who underwent neuropsychological testing and genotyping for the rs9939609 T/A variant of the FTO gene and for the rs1800497 G/A variant in the ANKK1 gene known to affect D2R density. This allowed us to stratify our sample and focus on a subgroup of 92 individuals who were matched for age, BMI, and general intelligence but differed according to their FTO and ANKK1 genotype (Table 1). Groups were defined according to FTO genotype (FTO−: TT; FTO+: AT, AA), ANKK1 genotype (A1−: GG; A1+: AG, AA), or the combination of both (FTO−A1−; FTO−A1+; FTO+A1−; FTO+A1+). All individuals underwent an established probabilistic learning fMRI task that is sensitive to alterations of dopaminergic transmission as evidenced both by genetic and pharmacological tests (Frank et al., 2004; Klein et al., 2007; Jocham et al., 2011) (Fig. 1a,b) and distinguishes learning about rewarded events (“choose A” trials) from learning to avoid nonrewarded choices (“avoid B” trials).
Comparing behavioral performance on this task between FTO genotypes (group × choice interaction: F(1,152) = 6.22, p = 0.014) revealed no significant difference during “choose A” trials (p = 0.112; Fig. 1c), whereas correct choices during the “avoid B” trials were significantly (p = 0.009) reduced in the FTO+ compared with the FTO− group of participants. Similarly, and in line with previous studies (Klein et al., 2007), whereas correct choices during “choose A” trials did not differ significantly between A1− and A1+ individuals (p = 0.206; Fig. 1d), correct choices during the “avoid B” trials were significantly (p = 0.048) reduced in A1+ compared with A1− individuals (group × choice interaction: F(1,152) = 3.05, p = 0.08). Interestingly, comparing the effect of combined FTO and ANKK1 genotypes revealed a trend toward a reduction of correct choices during “choose A” trials only between FTO−A1− and FTO+A1+ carriers (p = 0.066), whereas the gene × gene interaction was not significant (F(3,74) = 3.74, p = 0.601; Fig. 1e). However, there was a robust reduction of correct choices during “avoid B” trials in a gene dosage-dependent manner; that is, correct choices to “avoid B” decreased in the presence of either the FTO+ or A1+ allele and carriers of the combination of both at-risk alleles performed significantly worse than carriers of the individual at-risk alleles (gene × gene interaction: F(3,74) = 2.88, p = 0.041; Fig. 1f). These experiments indicate that FTO gene variants affect D2-dependent learning from negative outcomes and that the group differences in learning behavior are determined by the combination of both genotypes, moreover pointing toward a genetic interaction of FTO- and ANKK1-regulated processes.
To address the effect of the FTO gene variants, the ANKK1 gene variant, and their interaction on neuronal activation, we used fMRI to investigate whether rewarding outcomes engaged DA neurons depending on genotype. Here, we focused our primary analysis on PE processing in the VTA/SN and thus performed VOI analyses in this area.
fMRI measurements of VTA/SN activity revealed a significantly reduced positive PE response in a gene-dosage-dependent manner (i.e., the peak effect size associated with neural response of the positive PE in VTA/SN decreased in the presence of either the FTO+ or A1+ allele), and carriers of the combination of both risk alleles exhibited significantly reduced responses compared with noncarriers of the individual at-risk alleles (FTO−A1−; Fig. 2a).
Strikingly, these reduced PE responses in the dopaminergic VTA/SN (Fig. 2b) were associated with poorer ability to avoid negative outcomes during a later test phase (Fig. 2d), whereas learning to select the most rewarding stimulus (choose A) did not correlate with a positive PE response in VTA/SN (Fig. 2c). These results demonstrate that the FTO gene variants alter midbrain responses during reward learning, which is in turn associated with impaired avoidance learning. Again, we found a gene × gene interaction with ANKK1 variants (F(3,74) = 3.82, p = 0.013).
To further scrutinize these genetic effects on dopaminergic processes, we investigated whether FTO and ANKK1 variants would modulate the functional coupling between reward-responsive mesolimbic and mesocortical regions. Hence, we used DCM for inferring effective connectivity from the fMRI data (Fig. 3a). This connectivity analysis suggested a modulatory effect of FTO on the connectivity from VTA/SN to NAcc (p = 0.055; Fig. 3b, left) and from NAcc to mPFC (p = 0.017; Fig. 3b, middle), whereas ANKK1 influenced connections from VTA/SN to mPFC (p = 0.045; Fig. 3b, right). Strikingly, increased connection strengths between VTA/SN and NAcc were associated with poorer ability to avoid negative outcomes (Fig. 3c). These results provide further evidence that FTO regulates the function of dopaminergic brain areas in the context of avoidance learning.
Discussion
Variations in the FTO gene have been robustly linked to obesity across multiple studies and ethnicities (Frayling et al., 2007). The underlying mechanism explaining how the FTO gene product contributes to obesity-related behaviors has remained largely unclear. Based on recent evidence that Fto regulates D2/3R signaling in mice, we tested whether obesity-predisposing variants of FTO in humans would influence D2R-dependent behavioral and neural responses during a reward and avoidance learning task and whether they would interact with variants of ANKK1, which is also associated with obesity and D2R signaling (Noble et al., 1994; Stice et al., 2008), to influence behavioral, neural, and perceptual responses during reward learning and in response to reward. The present behavioral and fMRI analyses indeed revealed an interaction of these gene variants and thereby suggest a role for FTO variants in regulating reward learning in humans.
Both pharmacological and genetic studies have linked the ability to learn from positive and negative feedback to dopamine-dependent neurotransmission within basal ganglia neurocircuitry (Hikida et al., 2010; Frank and Fossella, 2011; Chowdhury et al., 2013). The “direct” pathway, populated mostly by D1R-expressing neurons, is critical for optimizing behavior based on positive outcomes and error signaling, whereas the “indirect” pathway, using mostly D2/3R-expressing neurons, is critical in optimizing behavior based on negative outcomes and error monitoring (Frank and Fossella, 2011). Consistent with our hypothesis that FTO and ANKK1 variants produce synergistic effects on D2/3 receptor neurotransmission, individuals possessing both at-risk alleles performed significantly worse on negative, but not positive, outcome learning. We also found reduced responses in the VTA/SN associated with PE signaling in a gene-dosage-dependent fashion, with reduced responses in carriers of a single at-risk allele and further reductions in carriers of both at-risk alleles. This diminished midbrain response during the generation of PEs was associated with the magnitude of impaired performance in negative outcome learning. No associations were observed here with positive outcome learning, which was unaffected by genotype.
Frank et al. (2007) previously showed that T/T homozygotes of the DRD2 gene, who have the highest D2 receptor availability, performed selectively better on avoiding nonrewarded choices (“avoid-B” trials). This is compatible with our findings, which show that performance on “avoid-B” trials was significantly reduced in genetic groups with lower D2 receptor density in the striatum (i.e., A1+ group) as demonstrated by combined genetics and positron emission tomography studies (Pohjalainen et al., 1998; Jönsson et al., 1999) and lower mesostriatal DA transmission due to higher midbrain autoinhibition mediated by D2R autoreceptors (FTO+ group). As a cautionary note, it must be pointed out that the latter is inferred from genetic and physiological studies in mice (Hess et al., 2013) and has not been directly demonstrated in humans. Nevertheless, by using positron emission tomography with separate tracers selective for D1R and D2R and relating these measures to performance on a different probabilistic selection task (Frank et al., 2004; Frank and Hutchison, 2009), Cox et al. (2015) recently confirmed that individual differences in human reward and avoidance learning are indeed predicted by variability in striatal D1 and D2 receptor binding, respectively. Specifically, their results do support a selective modulation of learning from negative outcomes by D2 signaling.
Our results demonstrate that FTO and ANKK1 variants not only alter midbrain responses individually during reward learning but also suggest that their effects interact. Moreover, both variants differentially affected mesolimbic and mesostriatal connectivity during reward learning, which, in turn, was associated with impaired avoidance learning. We need to emphasize, however, that our connectivity results are reported at uncorrected levels and should thus be considered with some caution. Generally, correcting connectivity estimates based on any generative model, such as DCM, for multiple comparisons is a nontrivial issue because of the posterior dependencies of model parameters that are ubiquitously encountered in biological systems (Gutenkunst et al., 2007). These dependencies render conventional correction methods, such as Bonferroni correction, very conservative (Stephan et al., 2010).
An additional limitation of our study concerns the imaging method (fMRI). One interpretational caveat concerning all fMRI studies of midbrain activity or connectivity is that BOLD signals from the midbrain are not guaranteed to reflect the activity of dopaminergic neurons because the midbrain is heterogeneous in cellular composition and also contains GABAergic neurons (Steffensen et al., 1998; Korotkova et al., 2004) and a small proportion of glutamatergic neurons (Morales and Root, 2014). This anatomical complexity is paralleled by a functional complexity because dopaminergic neurons can corelease glutamate or GABA (Pignatelli and Bonci, 2015). However, as demonstrated by multimodal investigations of the correspondence between striatal DA release and midbrain BOLD activity in response to reward PEs or novel stimuli (Düzel et al., 2009), for paradigms specifically probing (reward) PEs, one may be relatively confident that phasic BOLD responses mainly arise from dopamine neuron activity. Additionally, the genetic effects we investigate in this paper have an established biological relation to dopamine signaling; as mentioned above, the ANKK1 gene is known to affect D2R density (Pohjalainen et al., 1998; Jönsson et al., 1999), and Fto affects DRD2 signaling in mice (Hess et al., 2013).
Direct evidence linking impaired avoidance learning to risk for obesity is lacking, and our study only considers normal-weight participants. A comparison of groups with different mean BMI, however, would have introduced more severe potential confounds. For example, BMI per se has previously been demonstrated to affect dopaminergic signaling in both rodents and humans (Wang et al., 2001; Volkow et al., 2011; Babbs et al., 2013). By matching BMI across groups in our study, we deliberately focused on “pure” effects of risk genes on DA signaling in the absence of general BMI effects. In other words, we asked whether the combined presence of ANKK1 and FTO risk alleles might affect DA signaling and choice behavior before the manifestation of increased body weight.
However, obese compared with normal-weight individuals perform worse on negative, but not positive, outcome learning (Coppin et al., 2014), and rodents with diet-induced obesity fail to resist lever pressing for food in the presence of an aversive shock (Johnson and Kenny, 2010). Failure to avoid negative outcomes is in turn linked to impulsivity and obesity (Davis and Fox, 2008; Davis et al., 2008; Stoeckel et al., 2013). Thus, the synergistic impairment of D2/3R-dependent signaling in combined FTO/ANKK1 risk allele carriers, which our study found to reduce the ability for avoidance learning, may thereby contribute to a predisposition for obesity. Further investigations of the combined effect of ANKK1/FTO risk alleles on dopaminergic signaling will be important, not only for clarifying their role in obesity development but also in the manifestation of other disorders with altered D2/3R-dependent impulse control, such as substance abuse or pathological gambling.
Footnotes
This study was carried out within the framework of the German Competence Network Obesity (01GI1122A). Furthermore, M.T., T.O.J.G., and M.U. were supported by the German Research Foundation in the Clinical Research Group 219. L.S., K.E.S., J.C.B., and M.T. were supported by the German Research Foundation in the Transregional Collaborative Research Center 134. K.E.S. was supported by the René and Susanne Braginsky Foundation. We thank Nadine Spenrath for excellent technical assistance; and Nico Bunzeck and Emrah Düzel for providing the anatomical masks for delineating the VTA/SN.
The authors declare no competing financial interests.
References
- Babbs RK, Sun X, Felsted J, Chouinard-Decorte F, Veldhuizen MG, Small DM. Decreased caudate response to milkshake is associated with higher body mass index and greater impulsivity. Physiol Behav. 2013;121:103–111. doi: 10.1016/j.physbeh.2013.03.025.
- Beck AT, Steer RA, Brown GK. Manual for Beck Depression Inventory-II (BDI-II). San Antonio, TX: Psychological Corporation; 1996.
- Bunzeck N, Düzel E. Absolute coding of stimulus novelty in the human substantia nigra/VTA. Neuron. 2006;51:369–379. doi: 10.1016/j.neuron.2006.06.021.
- Choudhry Z, Sengupta SM, Grizenko N, Thakur GA, Fortier ME, Schmitz N, Joober R. Association between obesity-related gene FTO and ADHD. Obesity (Silver Spring). 2013;21:E738–E744. doi: 10.1002/oby.20444.
- Chowdhury R, Guitart-Masip M, Lambert C, Dayan P, Huys Q, Düzel E, Dolan RJ. Dopamine restores reward prediction errors in old age. Nat Neurosci. 2013;16:648–653. doi: 10.1038/nn.3364.
- Chuang YF, Tanaka T, Beason-Held LL, An Y, Terracciano A, Sutin AR, Kraut M, Singleton AB, Resnick SM, Thambisetty M. FTO genotype and aging: pleiotropic longitudinal effects on adiposity, brain function, impulsivity and diet. Mol Psychiatry. 2015;20:140–147. doi: 10.1038/mp.2014.49.
- Collins AG, Frank MJ. How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis. Eur J Neurosci. 2012;35:1024–1035. doi: 10.1111/j.1460-9568.2011.07980.x.
- Coppin G, Nolan-Poupart S, Jones-Gotman M, Small DM. Working memory and reward association learning impairments in obesity. Neuropsychologia. 2014;65:146–155. doi: 10.1016/j.neuropsychologia.2014.10.004.
- Cox SM, Frank MJ, Larcher K, Fellows LK, Clark CA, Leyton M, Dagher A. Striatal D1 and D2 signaling differentially predict learning from positive and negative outcomes. Neuroimage. 2015;109:95–101. doi: 10.1016/j.neuroimage.2014.12.070.
- Davis C, Fox J. Sensitivity to reward and body mass index (BMI): evidence for a non-linear relationship. Appetite. 2008;50:43–49. doi: 10.1016/j.appet.2007.05.007.
- Davis C, Levitan RD, Kaplan AS, Carter J, Reid C, Curtis C, Patte K, Hwang R, Kennedy JL. Reward sensitivity and the D2 dopamine receptor gene: a case-control study of binge eating disorder. Prog Neuropsychopharmacol Biol Psychiatry. 2008;32:620–628. doi: 10.1016/j.pnpbp.2007.09.024.
- Dina C, Meyre D, Gallina S, Durand E, Körner A, Jacobson P, Carlsson LM, Kiess W, Vatin V, Lecoeur C, Delplanque J, Vaillant E, Pattou F, Ruiz J, Weill J, Levy-Marchal C, Horber F, Potoczna N, Hercberg S, Le Stunff C, et al. Variation in FTO contributes to childhood obesity and severe adult obesity. Nat Genet. 2007;39:724–726. doi: 10.1038/ng2048.
- Düzel E, Bunzeck N, Guitart-Masip M, Wittmann B, Schott BH, Tobler PN. Functional imaging of the human dopaminergic midbrain. Trends Neurosci. 2009;32:321–328. doi: 10.1016/j.tins.2009.02.005.
- Epstein LH, Temple JL, Neaderhiser BJ, Salis RJ, Erbe RW, Leddy JJ. Food reinforcement, the dopamine D2 receptor genotype, and energy intake in obese and nonobese humans. Behav Neurosci. 2007;121:877–886. doi: 10.1037/0735-7044.121.5.877.
- Felsted JA, Ren X, Chouinard-Decorte F, Small DM. Genetically determined differences in brain response to a primary food reward. J Neurosci. 2010;30:2428–2432. doi: 10.1523/JNEUROSCI.5483-09.2010.
- Frank MJ, Fossella JA. Neurogenetics and pharmacology of learning, motivation, and cognition. Neuropsychopharmacology. 2011;36:133–152. doi: 10.1038/npp.2010.96.
- Frank MJ, Hutchison K. Genetic contributions to avoidance-based decisions: striatal D2 receptor polymorphisms. Neuroscience. 2009;164:131–140. doi: 10.1016/j.neuroscience.2009.04.048.
- Frank MJ, Seeberger LC, O'Reilly RC. By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science. 2004;306:1940–1943. doi: 10.1126/science.1102941.
- Frank MJ, Moustafa AA, Haughey HM, Curran T, Hutchison KE. Genetic triple dissociation reveals multiple roles for dopamine in reinforcement learning. Proc Natl Acad Sci U S A. 2007;104:16311–16316. doi: 10.1073/pnas.0706111104.
- Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, Lindgren CM, Perry JR, Elliott KS, Lango H, Rayner NW, Shields B, Harries LW, Barrett JC, Ellard S, Groves CJ, Knight B, Patch AM, Ness AR, Ebrahim S, Lawlor DA, et al. A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science. 2007;316:889–894. doi: 10.1126/science.1141634.
- Friston KJ, Harrison L, Penny W. Dynamic causal modelling. Neuroimage. 2003;19:1273–1302. doi: 10.1016/S1053-8119(03)00202-7.
- Gershman SJ. Do learning rates adapt to the distribution of rewards? Psychon Bull Rev. 2015. doi: 10.3758/s13423-014-0790-3. Advance online publication. Retrieved Jan. 13, 2015.
- Gläscher J. Visualization of group inference data in functional neuroimaging. Neuroinformatics. 2009;7:73–82. doi: 10.1007/s12021-008-9042-x.
- Gutenkunst RN, Waterfall JJ, Casey FP, Brown KS, Myers CR, Sethna JP. Universally sloppy parameter sensitivities in systems biology models. PLoS Comput Biol. 2007;3:1871–1878. doi: 10.1371/journal.pcbi.0030189.
- Haber SN, Knutson B. The reward circuit: linking primate anatomy and human imaging. Neuropsychopharmacology. 2010;35:4–26. doi: 10.1038/npp.2009.129.
- Hess ME, Brüning JC. The fat mass and obesity-associated (FTO) gene: obesity and beyond? Biochim Biophys Acta. 2014;1842:2039–2047. doi: 10.1016/j.bbadis.2014.01.017.
- Hess ME, Hess S, Meyer KD, Verhagen LA, Koch L, Brönneke HS, Dietrich MO, Jordan SD, Saletore Y, Elemento O, Belgardt BF, Franz T, Horvath TL, Rüther U, Jaffrey SR, Kloppenburg P, Brüning JC. The fat mass and obesity associated gene (Fto) regulates activity of the dopaminergic midbrain circuitry. Nat Neurosci. 2013;16:1042–1048. doi: 10.1038/nn.3449.
- Hikida T, Kimura K, Wada N, Funabiki K, Nakanishi S. Distinct roles of synaptic transmission in direct and indirect striatal pathways to reward and aversive behavior. Neuron. 2010;66:896–907. doi: 10.1016/j.neuron.2010.05.011.
- Jocham G, Klein TA, Ullsperger M. Dopamine-mediated reinforcement learning signals in the striatum and ventromedial prefrontal cortex underlie value-based choices. J Neurosci. 2011;31:1606–1613. doi: 10.1523/JNEUROSCI.3904-10.2011.
- Johnson PM, Kenny PJ. Dopamine D2 receptors in addiction-like reward dysfunction and compulsive eating in obese rats. Nat Neurosci. 2010;13:635–641. doi: 10.1038/nn.2519.
- Jönsson EG, Nöthen MM, Grünhage F, Farde L, Nakashima Y, Propping P, Sedvall GC. Polymorphisms in the dopamine D2 receptor gene and their relationships to striatal dopamine receptor density of healthy volunteers. Mol Psychiatry. 1999;4:290–296. doi: 10.1038/sj.mp.4000532.
- Karra E, O'Daly OG, Choudhury AI, Yousseif A, Millership S, Neary MT, Scott WR, Chandarana K, Manning S, Hess ME, Iwakura H, Akamizu T, Millet Q, Gelegen C, Drew ME, Rahman S, Emmanuel JJ, Williams SC, Rüther UU, Brüning JC, et al. A link between FTO, ghrelin, and impaired brain food-cue responsivity. J Clin Invest. 2013;123:3539–3551. doi: 10.1172/JCI44403.
- Kenny PJ. Reward mechanisms in obesity: new insights and future directions. Neuron. 2011a;69:664–679. doi: 10.1016/j.neuron.2011.02.016.
- Kenny PJ. Common cellular and molecular mechanisms in obesity and drug addiction. Nat Rev Neurosci. 2011b;12:638–651. doi: 10.1038/nrn3105.
- Klein TA, Neumann J, Reuter M, Hennig J, von Cramon DY, Ullsperger M. Genetically determined differences in learning from errors. Science. 2007;318:1642–1645. doi: 10.1126/science.1145044.
- Korotkova TM, Ponomarenko AA, Brown RE, Haas HL. Functional diversity of ventral midbrain dopamine and GABAergic neurons. Mol Neurobiol. 2004;29:243–259. doi: 10.1385/MN:29:3:243.
- Montague PR, Hyman SE, Cohen JD. Computational roles for dopamine in behavioural control. Nature. 2004;431:760–767. doi: 10.1038/nature03015.
- Morales M, Root DH. Glutamate neurons within the midbrain dopamine regions. Neuroscience. 2014;282C:60–68. doi: 10.1016/j.neuroscience.2014.05.032.
- Neville MJ, Johnstone EC, Walton RT. Identification and characterization of ANKK1: a novel kinase gene closely linked to DRD2 on chromosome band 11q23.1. Hum Mutat. 2004;23:540–545. doi: 10.1002/humu.20039.
- Niv Y, Edlund JA, Dayan P, O'Doherty JP. Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain. J Neurosci. 2012;32:551–562. doi: 10.1523/JNEUROSCI.5498-10.2012.
- Noble EP, Noble RE, Ritchie T, Syndulko K, Bohlman MC, Noble LA, Zhang Y, Sparkes RS, Grandy DK. D2 dopamine receptor gene and obesity. Int J Eat Disord. 1994;15:205–217. doi: 10.1002/1098-108X(199404)15:3<205::AID-EAT2260150303>3.0.CO;2-P.
- Noble EP, Gottschalk LA, Fallon JH, Ritchie TL, Wu JC. D2 dopamine receptor polymorphism and brain regional glucose metabolism. Am J Med Genet. 1997;74:162–166. doi: 10.1002/(SICI)1096-8628(19970418)74:2<162::AID-AJMG9>3.0.CO;2-W.
- Park SQ, Kahnt T, Talmi D, Rieskamp J, Dolan RJ, Heekeren HR. Adaptive coding of reward prediction errors is gated by striatal coupling. Proc Natl Acad Sci U S A. 2012;109:4285–4289. doi: 10.1073/pnas.1119969109.
- Penny WD, Stephan KE, Daunizeau J, Rosa MJ, Friston KJ, Schofield TM, Leff AP. Comparing families of dynamic causal models. PLoS Comput Biol. 2010;6:e1000709. doi: 10.1371/journal.pcbi.1000709.
- Pignatelli M, Bonci A. Role of dopamine neurons in reward and aversion: a synaptic plasticity perspective. Neuron. 2015;86:1145–1157. doi: 10.1016/j.neuron.2015.04.015.
- Pohjalainen T, Rinne JO, Någren K, Lehikoinen P, Anttila K, Syvälahti EK, Hietala J. The A1 allele of the human D2 dopamine receptor gene predicts low D2 receptor availability in healthy volunteers. Mol Psychiatry. 1998;3:256–260. doi: 10.1038/sj.mp.4000350.
- Rigoux L, Stephan KE, Friston KJ, Daunizeau J. Bayesian model selection for group studies: revisited. Neuroimage. 2014;84:971–985. doi: 10.1016/j.neuroimage.2013.08.065.
- Samaan Z, Anand SS, Anand S, Zhang X, Desai D, Rivera M, Pare G, Thabane L, Xie C, Gerstein H, Engert JC, Craig I, Cohen-Woods S, Mohan V, Diaz R, Wang X, Liu L, Corre T, Preisig M, Kutalik Z, et al. The protective effect of the obesity-associated rs9939609 A variant in fat mass- and obesity-associated gene on depression. Mol Psychiatry. 2013;18:1281–1286. doi: 10.1038/mp.2012.160.
- Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275:1593–1599. doi: 10.1126/science.275.5306.1593.
- Schwarz G. Estimating the dimension of a model. Ann Stat. 1978;6:461–464. doi: 10.1214/aos/1176344136.
- Sobczyk-Kopciol A, Broda G, Wojnar M, Kurjata P, Jakubczyk A, Klimkiewicz A, Ploski R. Inverse association of the obesity predisposing FTO rs9939609 genotype with alcohol consumption and risk for alcohol dependence. Addiction. 2011;106:739–748. doi: 10.1111/j.1360-0443.2010.03248.x.
- Steffensen SC, Svingos AL, Pickel VM, Henriksen SJ. Electrophysiological characterization of GABAergic neurons in the ventral tegmental area. J Neurosci. 1998;18:8003–8015. doi: 10.1523/JNEUROSCI.18-19-08003.1998.
- Stephan KE, Kasper L, Harrison LM, Daunizeau J, den Ouden HE, Breakspear M, Friston KJ. Nonlinear dynamic causal models for fMRI. Neuroimage. 2008;42:649–662. doi: 10.1016/j.neuroimage.2008.04.262.
- Stephan KE, Penny WD, Daunizeau J, Moran RJ, Friston KJ. Bayesian model selection for group studies. Neuroimage. 2009;46:1004–1017. doi: 10.1016/j.neuroimage.2009.03.025.
- Stephan KE, Penny WD, Moran RJ, den Ouden HE, Daunizeau J, Friston KJ. Ten simple rules for dynamic causal modeling. Neuroimage. 2010;49:3099–3109. doi: 10.1016/j.neuroimage.2009.11.015.
- Stice E, Spoor S, Bohon C, Small DM. Relation between obesity and blunted striatal response to food is moderated by TaqIA A1 allele. Science. 2008;322:449–452. doi: 10.1126/science.1161550.
- Stoeckel LE, Murdaugh DL, Cox JE, Cook EW, 3rd, Weller RE. Greater impulsivity is associated with decreased brain activation in obese women during a delay discounting task. Brain Imaging Behav. 2013;7:116–128. doi: 10.1007/s11682-012-9201-4.
- Sutton RS, Barto AG. Reinforcement learning. Cambridge, MA: Massachusetts Institute of Technology; 1998.
- Volkow ND, Wang GJ, Baler RD. Reward, dopamine and the control of food intake: implications for obesity. Trends Cogn Sci. 2011;15:37–46. doi: 10.1016/j.tics.2010.11.001.
- Wang GJ, Volkow ND, Logan J, Pappas NR, Wong CT, Zhu W, Netusil N, Fowler JS. Brain dopamine and obesity. Lancet. 2001;357:354–357. doi: 10.1016/S0140-6736(00)03643-6.
- Watkins CJCH, Dayan P. Technical note: Q-Learning. Machine Learning. 1992;8:279–292. doi: 10.1007/BF00992698.
- Worsley KJ, Friston KJ. Analysis of fMRI time-series revisited, again. Neuroimage. 1995;2:173–181. doi: 10.1006/nimg.1995.1023.