PLOS ONE. 2021 Apr 7;16(4):e0243899. doi: 10.1371/journal.pone.0243899

Reward signalling in brainstem nuclei under fluctuating blood glucose

Tobias Morville 1, Kristoffer H Madsen 2, Hartwig R Siebner 1,3, Oliver J Hulme 1,*
Editor: Tom Verguts 4
PMCID: PMC8026025  PMID: 33826633

Abstract

Phasic dopamine release from mid-brain dopaminergic neurons is thought to signal errors of reward prediction (RPE). If reward maximisation is to maintain homeostasis, then the value of primary rewards should be coupled to the homeostatic errors they remediate. This leads to the prediction that RPE signals should be configured as a function of homeostatic state and thus diminish with the attenuation of homeostatic error. To test this hypothesis, we collected a large volume of functional MRI data from five human volunteers on four separate days. After fasting for 12 hours, subjects consumed preloads that differed in glucose concentration. Participants then underwent a Pavlovian cue-conditioning paradigm in which the colour of a fixation-cross was stochastically associated with the delivery of water or glucose via a gustometer. This design afforded computation of RPE separately for better- and worse-than expected outcomes during ascending and descending trajectories of serum glucose fluctuations. In the parabrachial nuclei, regional activity coding positive RPEs scaled positively with serum glucose for both ascending and descending glucose levels. The ventral tegmental area and substantia nigra became more sensitive to negative RPEs when glucose levels were ascending. Together, the results suggest that RPE signals in key brainstem structures are modulated by homeostatic trajectories of naturally occurring glycaemic flux, revealing a tight interplay between homeostatic state and the neural encoding of primary reward in the human brain.

Introduction

A basic assumption of many models of adaptive behavior is that the value of primary rewards is modulated by their capacity to rectify future homeostatic deficits [1, 2]. Compatible with this notion, deprivation-induced hypoglycaemia increases willingness to work for food in rats and humans [3], as well as the subjectively reported pleasure [2]. Dopamine is a neurotransmitter that plays a key role in signalling reward [4] and is involved in behavioural reinforcement, learning, and motivation [5, 6]. Via meso-cortical and mesolimbic dopaminergic projections, synaptic dopamine release modulates the plasticity of cortico-striatal networks and thereby sculpts behavioural policies according to their reward contingencies [4, 7]. Patterns of phasic dopaminergic firing have been demonstrated to follow closely the principles of reinforcement learning, encoding the errors in the prediction of reward [6, 8–10]. Reward prediction error (RPE) signals appear commensurate with the economic construct of marginal utility, defined as the additional utility obtained through additional units of consumption, where utility is a subjective value inferred from choice [7, 11, 12].

Although animals are motivated by a homeostatic deficit of thirst or hunger, homeostatic states are rarely considered as relevant modulators of dopaminergic signalling of reward prediction errors. In typical paradigms involving cumulative consumption, the homeostatic deficit gradually diminishes as the animal plays for consumption of water or sugar-containing juice. Eventually, the animal rejects further play, presumably because the marginal utility of consumption has diminished to a point of indifference or even aversion. Interestingly, a recent electrophysiology study in rats demonstrated that oral consumption of sodium solution causes phasic dopaminergic signals in the nucleus accumbens that are modulated by sodium depletion [13].

There is now growing evidence for a multifaceted interface between dopamine mediated reward-signalling and the systems underpinning energy homeostasis. Firstly, dopamine neurons in the ventral tegmental area (VTA) express a suite of receptors targeted by energy-reporting hormones ghrelin, insulin, amylin, leptin and Glucagon Like Peptide 1 (GLP-1) [14, 15]. This provides numerous degrees of freedom for flexibly interfacing between homeostatic state and reward signalling. Although hormonal modulations of phasic dopamine are yet to be fully scrutinised, there is emerging evidence that circulating factors do indeed modulate its magnitude. For instance, amylin, a hormone co-released with insulin, acts on the VTA to reduce phasic dopamine release in its mesolimbic projection sites [16]. In terms of neuronal input, there are many such opportunities for the appetitive control of dopamine mediated signalling.

Appetitive control can be delineated into three interacting systems [17]. The first system generates a negative valence signal which involves activity of the Agouti-related peptide (AgRP) neurons of the arcuate nucleus of the hypothalamus (ARC). Activity of ARCAgRP neurons reports on energy deficits, inhibits energy expenditure, and regulates glucose metabolism [18–21]. ARC neurons that contain peptide products of pro-opiomelanocortin (POMC) form an opponent code compared with ARCAgRP neurons. The balance between the two neuronal ARC sub-populations putatively encodes the value of near-term energetic states, becoming rapidly modulated just prior to food consumption [22]. The second system codes positive valence signals and consists of circuits involving the lateral hypothalamus (LH). It is linked to positively reinforcing consummatory behaviours via its GABAergic projections to VTA dopamine neurons [23], which are assumed to trigger positive feedback to keep consumption going during feeding bouts. The third valuation system involves calcitonin gene-related peptide (CGRP)-expressing neurons in the parabrachial nuclei (PBN) that potently suppress eating when activated, but do not increase food intake when inhibited. PBNCGRP neurons are activated by signals associated with food intake, and they provide a signal of satiety that has negative valence when strongly activated [24]. The PBN has been characterised as a hedonic hotspot, the modulation of which by either GABA or benzodiazepines potently modulates experienced reward [25]; ARCAgRP neurons GABA-ergically inhibit PBN neurons, thus stimuli predicting glucose consumption should inhibit ARCAgRP, releasing the PBN from inhibition [26]. Further, hormones related to hunger and feeding (GLP-1 and leptin) modulate PBN activity and subsequent behaviour [27, 28]. Of note, these three valuation systems all project to and modulate the dopaminergic neurons in the ventral tegmental area (VTADA). The interface between these hypothalamic-brainstem networks and the VTADA is arguably the most important interface for mediating the dialogue between energy homeostasis and value computation.

While most evidence for encoding of RPEs is obtained under homeostatic deprivation, the modulation of RPE signalling triggered by physiological fluctuations in glucose availability (glycaemic flux) has yet to be characterised in the human brain. This raises the question: how are RPE signals modulated by these subcortical circuits that integrate, evaluate, and predict energy-homeostatic states? We hypothesized that glucose fluctuations above and below average levels of serum glucose would respectively down- and up-modulate RPE responses in these hypothalamic-brainstem networks. To test this hypothesis, we acquired a large volume of fMRI data in five participants during a simple Pavlovian cue-conditioning task, while their serum glucose was systematically manipulated.

Methods

Subjects

Five healthy, normal-weight subjects (3 male) aged 23 to 29 participated in the study. Exclusion criteria were: BMI below 20 or above 25; age below 18 or above 32 years; any metabolic or endocrine disease or gastrointestinal disorder; any known medication that might interfere with the study; claustrophobia; and any metal implants or devices that could not be removed. Informed consent was obtained in writing from all subjects as approved by the Regional Ethics Committee of Region Hovedstaden (protocol H-4-2013-100) and in accordance with the Declaration of Helsinki.

Experimental procedure

The experimental design constituted a single-blinded, randomised controlled trial with a repeated-measures crossover design. On four separate days, subjects fasted for a minimum of twelve hours before testing. Compliance with the fasting instruction was based both on trust, and on the understanding that we would be able to detect via blood tests if participants had not fasted. Any participant who was not in the hypoglycemic range (defined here as <6 mmol/L) at the start of the experiment would be assumed not to have fasted, and the session would be aborted. This was not necessary for any participant. At the beginning of an experimental session, subjects ingested either a hi-glucose (75 g, 300 kcal) or lo-glucose preload (10 g, 40 kcal) diluted to 100 ml with a non-caloric lemon juice, used to mask the taste of the glucose. The lo-glucose preload resulted in glucose ascending during the fMRI acquisition period, due to the consumption of glucose, whereas during the same period after the hi-glucose preload, glucose levels descended (Fig 1B). The preload conditions are thus referred to as ascending and descending conditions. The temporal order of the conditions was randomised within subjects, with each condition being performed twice. Both preloads were anecdotally reported by independent samplers to be highly palatable. Each delivery of glucose reward was 0.4 ml, corresponding to 0.3 g of glucose (1.2 kcal) per delivery.

Fig 1. Experimental design, glucose trajectories, and expected reward signals.

a, participants were presented with the Cueonset (grey fixation cross) for 1-3s after which either Cuehigh (blue cross) or Cuelow (brown cross) is presented with a probability of 0.5 each. Cuehigh signalled a high probability (0.8) of glucose delivery and a low probability (0.2) of water delivery. Cuelow signalled a low probability of glucose (0.2) and a high probability of water (0.8). The 0.4ml of the liquid were delivered over 2.5 seconds, followed by 10-15s wait period and a Cueswallow that cued the subject to swallow (here, purple) which lasted for 5s. All jitters are uniformly distributed within the ranges specified. b, serum glucose trajectories for the high and low glucose preload conditions. Grey shading indicates the period of fMRI acquisition for a single session. The different line plots indicate different sessions for all subjects. Glucose levels ascend during the fMRI acquisition period in the lo-glucose condition, and descend in the hi-glucose condition. c, graph depicts the objective reward expectations, expressed as the expected value in ml glucose, and the perturbation of these expectations under the onset of the experimental cues and outcomes. Note that reward expectations are updated three times per trial: at the onset of the Cueonset; at the onset of Cuehigh or Cuelow; at the onset of Outcomeglucose or Outcomewater. d, illustrates simulated BOLD responses to RPE signals resulting from the updated reward expectations shown in c, generated by convolving the canonical hemodynamic response function with the RPE stick functions evoked by changes to the reward expectations.

Experimental task

After consuming the preload, participants engaged in a simple Pavlovian cue-conditioning task. The colour of the fixation cross cued both the onset of each trial (Cueonset), as well as stochastically predicting glucose delivery (Fig 1A), with one colour signalling a high probability of glucose delivery (Cuehigh), and another signalling a low probability (Cuelow). 10–15 seconds after delivery of the liquid, a purple cross signalled that subjects were to swallow. The large temporal distance between the swallowing and the reward onset, as well as the levels of temporal jitter, was designed to mitigate the contamination of the reward signals by swallowing related artefacts. All probabilities and contingencies were implicitly revealed only through experience in the scanner, and all were stationary over all test days. The mapping between colour and outcome probabilities was counterbalanced across subjects, while the mapping was stationary within and between sessions. Participants went through ~82 trials [82 ± 1.5 SEM] each day giving ~328 trials per subject. Serum glucose measurements were attained immediately before and 20 minutes after ingestion, using a Contour® Next glucose meter (Fig 1B). It should be noted that whilst there is no jitter between the cue and the outcome, independent estimation of each of the effects is achievable by virtue of their probabilistic transitions (Fig 1A).
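The design figures above allow a rough estimate of how much glucose the gustometer itself delivers in one session. The following is a minimal sketch of that arithmetic, not a calculation reported by the authors; the constant and function names are illustrative, and the trial count is the approximate mean quoted in the text.

```python
# Design values taken from the text; the expectation below is our own arithmetic.
P_CUE_HIGH = 0.5            # probability that a trial shows Cue_high
P_GLUC_GIVEN_HIGH = 0.8     # P(glucose | Cue_high)
P_GLUC_GIVEN_LOW = 0.2      # P(glucose | Cue_low)
GLUCOSE_PER_DELIVERY_G = 0.3
KCAL_PER_G = 4.0            # implied by 75 g = 300 kcal in the text
TRIALS_PER_SESSION = 82     # approximate mean reported in the text


def p_glucose_per_trial() -> float:
    """Marginal probability that a trial ends in glucose delivery."""
    return P_CUE_HIGH * P_GLUC_GIVEN_HIGH + (1 - P_CUE_HIGH) * P_GLUC_GIVEN_LOW


def expected_session_intake_g(n_trials: int = TRIALS_PER_SESSION) -> float:
    """Expected grams of glucose delivered by the gustometer in one session."""
    return n_trials * p_glucose_per_trial() * GLUCOSE_PER_DELIVERY_G


if __name__ == "__main__":
    grams = expected_session_intake_g()
    print(f"P(glucose per trial) = {p_glucose_per_trial():.2f}")
    print(f"Expected intake = {grams:.1f} g ({grams * KCAL_PER_G:.0f} kcal)")
```

Under these assumptions, half of all trials end in glucose, so a session delivers roughly 12 g of glucose in expectation, which is of the same order as the 10 g lo-glucose preload.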

Scanning procedure

Task related changes in regional brain activity were mapped with blood oxygen level dependent (BOLD) MRI immediately after the second glucose measurement (t20). Functional MRI measurements were performed with a 3T Philips Achieva and a 32 channel receive head coil using a gradient echo T2* weighted echo-planar image (EPI) sequence with a repetition time of 2526 ms, and a flip-angle of 80°. Each volume consisted of 40 axial slices of 3 mm thickness and 3 mm in-plane resolution (220 x 220 mm). The axial field-of-view was 120 mm covering the whole brain, cutting off the medulla oblongata partially. This sequence was extensively piloted and optimised specifically for reducing distortion and maximising resolution within the hypothalamic and brainstem regions of interest. During each session, 800 EPI volumes were acquired, resulting in 3200 EPI volumes per subject. Further, an anatomical T1-weighted image was recorded for each subject. Respiration and heart rate were measured to assess and model possible artefacts. Liquid tastants were contained in two 50 ml syringes, one containing water only (hereafter water), the other containing a solution of glucose and water (hereafter glucose), attached to two programmable syringe pumps (AL1000-220, World Precision Instruments Ltd, Stevenage, UK), controlled by the stimulus paradigm script. The liquid was delivered orally via two separate silicone tubes, each 5 m long and 3 mm wide. Each tube was attached to a gustatory manifold specifically built for the Philips head-coil (John B. Pierce Laboratory, Yale University). Visual stimuli were presented on a screen positioned ~30 cm away from the scanner.

Pre-processing

Pre-processing and image analysis were done using SPM12 software (Statistical Parametric Mapping, Wellcome Department of Imaging Neuroscience, Institute of Neurology, London, UK). To correct for motion, EPI scans were realigned to their mean using a two-step procedure and co-registered to the T1 weighted anatomical image. The realigned images were spatially normalised to the standard ICBM space template of European brains, with a resampled voxel size of 3 mm.

fMRI analysis

After model specification, the ascending condition sessions were concatenated using the function spm_fmri_concatenate (SPM12) for each subject. Equivalently, the same concatenation was performed for the descending condition sessions. A first-level fixed effects model was run over all subjects. The concatenation was performed to avoid state-dependent effects being expressed via the session-specific regression coefficients. All variables of interest were convolved with the canonical hemodynamic response function, along with their associated temporal and dispersion derivatives, and fitted to the data using the specified GLM. The temporal evolution of cues and outcomes was modelled as separate conditions, each with state as a parametric modulator. Regressors of no interest included a discrete cosine transform based high-pass filter with a 1/128 Hz cut-off frequency, rigid body realignment parameters using a 24-parameter Volterra expansion [29], and physiological noise from heart rate and respiration using the RETROICOR method [30]. We specified the striatum (caudate, putamen and nucleus accumbens), brainstem (pons, ventral tegmental area and substantia nigra) and hypothalamus as regions of interest (ROI). These ROIs were determined on the basis of the literature describing dopamine projections from midbrain to the striatum and its role in regulating behaviour as a function of reward. The pons was selected to accommodate the literature described above, which sets certain nuclei within the pons as important homeostatic modulators. All ROIs were defined with the WFU pick atlas [31, 32] and cross checked against the book Atlas of the human brain [33]. All initial first-level analysis was performed as whole-brain uncorrected at p < 0.001. Significant clusters in ROIs are all reported as small-volume corrected with a family-wise threshold of p < 0.05 at cluster level (abbreviated SVC FWE), unless otherwise stated.
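To illustrate what convolving event regressors with a canonical hemodynamic response function involves, here is a minimal sketch in plain Python. It uses the commonly quoted double-gamma form (shape parameters 6 and 16, undershoot ratio 1/6); these parameters and all function names are assumptions made for illustration, and SPM's own implementation differs in details such as time binning and normalisation.

```python
import math


def gamma_pdf(t: float, shape: float, scale: float = 1.0) -> float:
    """Gamma probability density; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (math.gamma(shape) * scale ** shape)


def double_gamma_hrf(t: float) -> float:
    """Double-gamma HRF: an early positive response minus a late undershoot.

    Shape parameters 6 and 16 with an undershoot ratio of 1/6 are assumed
    defaults for this sketch, not values reported in the paper.
    """
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0


def convolve_sticks(onsets, amplitudes, duration_s, dt=0.1):
    """Predicted response on a grid of step dt for weighted stick events."""
    n = int(round(duration_s / dt))
    signal = [0.0] * n
    for onset, amp in zip(onsets, amplitudes):
        for i in range(n):
            # Each stick contributes a scaled, time-shifted copy of the HRF.
            signal[i] += amp * double_gamma_hrf(i * dt - onset)
    return signal


if __name__ == "__main__":
    # Hypothetical event times (seconds) and weights, for illustration only.
    response = convolve_sticks(onsets=[0.0, 2.0], amplitudes=[0.2, 0.12], duration_s=30.0)
    peak_index = max(range(len(response)), key=lambda i: response[i])
    print(f"peak at about {peak_index * 0.1:.1f} s")
```

Because the convolution is linear, the predicted response to several weighted events is simply the sum of the scaled single-event responses, which is what allows RPE-weighted contrasts to be read as scaled copies of the same response shape.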

Modelling RPEs

At the first level, a general linear model (GLM) was set up to model cue and outcome related brain activity. We specified separate regressors which modelled the onset of cue events (Cueonset, Cuehigh and Cuelow) and outcome events (Outcomegluc and Outcomewater). Since there is no overt behavior in this task to which a temporal difference learning algorithm can be fit, we used a different approach based on how RPEs converge to changes in the conditional expectation values of reward outcomes. As subjects learn the contingencies between the cues and the outcomes, the RPEs evoked by these events converge toward the change in expected value of the reward (here the volume of glucose), conditional on the events they have cumulatively experienced during the trial. This has been shown in single cell recordings, where the RPE signals of dopaminergic cells in the VTA signal errors whose magnitudes reflect the change in expected value of juice volume, conditional on the events that have been experienced at that time [11]. Fig 1C illustrates how the expectation value of glucose volume evolves over time as a function of the cues observed. The conditional expectation value upon seeing the Cueonset is the expectation value of glucose volume for each trial, conditional on the fact that the trial has started. This is shown in Fig 1C, where all lines begin from this starting point of 0.2 ml. Thus, Cueonset triggers a small positive RPE, seen as the first spike on the left in Fig 1D. From there the expectation value of glucose increases or decreases depending on whether the Cuehigh or Cuelow is experienced, as labelled in Fig 1C. This causes the second set of spikes seen in Fig 1D, where the green corresponds to the Cuehigh and the purple corresponds to Cuelow. Finally, the outcomes arrive, changing again the conditional expectation value of glucose to either 0.4 ml or 0 ml. This can be thought of as the expectation value of the glucose that the agent can expect to metabolise having received the liquid in its mouth. This corresponds to the third set of spikes in Fig 1D. To approximate the RPEs without behavior, we specified contrasts which were formulated as linear combinations of these cue and outcome regressors, weighted as a function of the RPE values that would be expected from the temporal-difference learning algorithm, once converged [34]. In other words, the cue and outcome regressors were weighted by the change in conditional expectation of glucose volume caused by the regressor’s event (either the cue or the outcome). A contrast of positive RPE signals (RPEpos) was computed by assigning the positive valence cue and outcome regressors (i.e. Cueonset, Cuehigh, and Outcomegluc) contrast weights that were proportional to the change in expectation value of glucose volume that these events caused. Negative RPEs (RPEneg) were computed equivalently, as contrasts including only the negative valence events (i.e. Cuelow and Outcomewater). It should be noted that this approach to modelling RPEs does not incorporate any effect of learning, in effect modelling the signals that are expected once learning has converged on the expected reward values.
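The converged RPE values described above follow directly from the task probabilities. The sketch below reproduces them as successive changes in the conditional expectation of glucose volume. It is an illustration under stated assumptions: the function names are hypothetical, the pre-trial expectation is taken to be zero, and outcome RPEs are kept separate for each cue type, since the text does not state how they were collapsed into the single Outcomegluc and Outcomewater contrast weights.

```python
VOLUME_ML = 0.4                      # glucose volume per delivery (from the text)
P_CUE = {"high": 0.5, "low": 0.5}    # probability of each colour cue
P_GLUC = {"high": 0.8, "low": 0.2}   # P(glucose | cue)


def expected_glucose_ml(cue=None, outcome=None):
    """Conditional expectation of glucose volume given the events seen so far."""
    if outcome is not None:
        return VOLUME_ML if outcome == "glucose" else 0.0
    if cue is not None:
        return P_GLUC[cue] * VOLUME_ML
    # Only the trial onset has been observed: average over both cues.
    return sum(P_CUE[c] * P_GLUC[c] * VOLUME_ML for c in P_CUE)


def converged_rpes(cue, outcome, baseline_ml=0.0):
    """RPE at trial onset, cue and outcome, as successive changes in expectation.

    baseline_ml is the expectation before the trial starts; zero is an
    assumption made for this sketch.
    """
    v_onset = expected_glucose_ml()
    v_cue = expected_glucose_ml(cue=cue)
    v_out = expected_glucose_ml(cue=cue, outcome=outcome)
    return {
        "onset": v_onset - baseline_ml,
        "cue": v_cue - v_onset,
        "outcome": v_out - v_cue,
    }


if __name__ == "__main__":
    for cue in ("high", "low"):
        for outcome in ("glucose", "water"):
            rpes = converged_rpes(cue, outcome)
            print(cue, outcome, {k: round(v, 2) for k, v in rpes.items()})
```

Within any trial the three RPEs sum to the delivered volume minus the baseline, so the stick functions in Fig 1D can be read as a decomposition of the final outcome into the surprise carried by each event.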

Modelling RPE modulation by glucose

The effect of serum glucose on RPE was modelled via a first-order parametric modulator of the cue and outcome regressors, taking the interpolated serum glucose at each time point as the covariate (demeaned). This resulted in five parametric modulator regressors, namely Cueonset*state, Cuehigh*state, Cuelow*state, Outcomegluc*state and Outcomewater*state. From these regressors, contrasts can be specified to model the effect of serum glucose on RPEs. The effect of glucose state on positive RPEs (RPEpos*state) was computed as a linear combination of the Cueonset*state, Cuehigh*state, and Outcomegluc*state regressors. Equivalently, the effect of glucose state on negative RPEs (RPEneg*state) was computed as a linear combination of the Cuelow*state and Outcomewater*state regressors. All first-order parametric modulators are orthogonal to their associated onset regressors by construction. No other orthogonalization of regressors was performed.
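A minimal sketch of how such a demeaned modulator could be built is given below. The text does not specify the interpolation scheme, so piecewise-linear interpolation is an assumption, as are the function names and the glucose readings in the usage example, which are invented for illustration.

```python
def interpolate(t, times, values):
    """Piecewise-linear interpolation, clamped to the first and last values."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for (t0, v0), (t1, v1) in zip(zip(times, values), zip(times[1:], values[1:])):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return values[-1]


def demeaned_modulator(event_times, sample_times, sample_values):
    """Glucose level at each event, minus the mean over those events."""
    raw = [interpolate(t, sample_times, sample_values) for t in event_times]
    mean = sum(raw) / len(raw)
    return [x - mean for x in raw]


if __name__ == "__main__":
    # Hypothetical glucose readings (mmol/L) at hypothetical times (seconds).
    sample_times = [0.0, 600.0, 1200.0, 1800.0]
    sample_values = [5.0, 6.0, 6.5, 6.2]
    events = [0.0, 300.0, 900.0, 1500.0]
    modulator = demeaned_modulator(events, sample_times, sample_values)
    print([round(m, 3) for m in modulator])
    # Demeaning makes the modulator sum to zero over the events it scales.
    print(abs(sum(modulator)) < 1e-9)
```

Demeaning is what makes the modulator orthogonal to its onset regressor before convolution: the onset regressor weights every event equally, so its inner product with the modulator is the sum of the modulator values, which is zero.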

Results

Cueonset induced brain activity

An RPE signal should respond to the Cueonset, with an error signal that signals the expected value of glucose reward for the whole trial [11]. The main effect of this regressor was an increase in activity in VTA bilaterally (Fig 2A). Thus cue-induced VTA activation is consistent with existing evidence of VTA signalling RPEs [35–37]. Cueonset also led to the deactivation of the postcentral gyrus (primary somatosensory cortex), mediodorsal thalamus, and the striatum [whole brain, uncorrected p < 0.001] (not shown).

Fig 2. Statistical parametric maps of main effects of trial onset, positive and negative RPEs.

a, main effect of Cueonset, which reflects an RPE following the mean reward expectation for the whole trial, revealed activity in VTA bilaterally (β = 2.77; R: [4, -17, -20] and L: [-8, -17, -20]; FWE SVC). This contrast further revealed deactivation of postcentral gyrus (primary somatosensory cortex), mediodorsal thalamus, and striatum (FWE whole brain, not shown). Lower panel shows fitted response for the Cueonset event within the same region. b, main effect of RPEpos revealed activity in left lateral caudate [β = 1.21; coordinates -8, 4, 7; FWE SVC]. Lower panel shows fitted response for the RPEpos contrast within the region. The colours of time courses follow the same meaning as shown in Fig 1C and 1D. c, main effect of RPEneg revealed bilateral activity in caudate (L: [-11, -2, 13]; R: [10, 7, 1]; β = 12.2), medial dorsal thalamic nucleus [7, -2, 22], and lateral insula [43, -2, -17] (all FWE). Lower panel shows fitted response for the RPEneg contrast within the region. All fitted responses were generated by convolving the canonical hemodynamic response function with the RPE stick function multiplied by their respective beta-values extracted from the local maxima of the ROI in units of percent signal change. Again, the time courses follow the same meaning as shown in Fig 1C and 1D. Error bars show standard errors of the mean.

Positive and negative reward prediction error signals

In several brain regions, regional task-related activity changed in proportion with the magnitude of positive-going (i.e. better-than-expected) reward prediction errors (RPEpos) or negative-going (i.e. worse-than-expected) reward prediction errors (RPEneg). Task related activity scaling with the RPEpos, formalized as an RPE-weighted linear combination of Cueonset, Cuehigh, and Outcomegluc, was found in left lateral caudate nucleus (Fig 2B). Conversely, task related activity reflecting RPEneg, formalized as an RPE-weighted linear combination of Cuelow and Outcomewater, was located in the caudate nucleus bilaterally (Fig 2C), the medial dorsal thalamic nucleus, and insula (not shown).

Modulation of task-related brain activity by glycaemic state

We were interested in identifying changes in RPE signalling over time as serum glucose either ascended or descended. A bilateral cluster, including the parabrachial nuclei (PBN), showed a modulation of the regional neural responses to RPEs by the glycaemic state dynamics (Fig 3A). Higher levels of serum glucose amplified the response to RPEpos in the PBN region (Fig 3B). The main effect of RPEneg*state, which models the interaction between RPEneg and state, did not yield any significant results in any ROI, or in exploratory analyses using uncorrected thresholds, in positive or negative contrasts. When considering both ascending and descending serum glucose fluctuations together, there was no detectable region where the RPEneg signal was either positively or negatively modulated by serum glucose. Brain responses to Cueonset were also not altered by glycaemic state.

Fig 4. Statistical parametric maps of RPEneg*state subtracted for increasing minus decreasing.

a, negative reward prediction error RPEneg*state revealed glucose modulated activity in SN [±12, -22, -10] and VTA [0, -15, -9] when subtracting the effect of descending from the ascending glucose state [FWE SVC]. b, fitted response (β = 0.34) of the local maximum of the cluster [7, -11, 8; 52 voxels] to the three possible trajectories that RPEneg*state yields, modulated by serum glucose state. Onsets are not at zero because the negative trajectories do not envelop the trial mean, which has a positive expectation. In order of closeness to the viewer, the first set of trajectories is for the Cuehigh−Outcomegluc trial; the second set is for the Cuehigh−Outcomewater trial; the third set is for the Cuelow−Outcomegluc trial; the fourth set is for the Cuelow−Outcomewater trial. The fifth set of trajectories (furthest from the viewer) superimposes the trajectories from all trial types into one plot, depicting the complexity of how the trajectories are patterned according to their modulation by glucose state.

Fig 3. Statistical parametric maps of RPEpos*state and fitted responses over varying glycaemic state.

a, main effect of RPEpos*state revealed bilateral activity in the PBN [-2, -29, -26; FWE SVC]. The colour bar indicates the t-value on a scale of white to blue. b, fitted response (β = 1.66) of the local maximum of the PBN cluster (7 voxels) to the possible trajectories that RPEpos*state yields (Fig 1D), modulated by serum glucose state. There are four possible trajectories of the positive RPE, according to the four different possible trial types depicted in Fig 1C. These trajectories are modulated by serum glucose and shown as different colours with 5 different equally spaced glycemic states. There are thus 4 sets of these trajectories (the four closest to the viewer), each showing their modulation by glucose in the different colours. In order of closeness to the viewer, the first set of trajectories is for the Cuehigh−Outcomegluc trial; the second set is for the Cuehigh−Outcomewater trial; the third set is for the Cuelow−Outcomegluc trial; the fourth set is for the Cuelow−Outcomewater trial. The fifth set of trajectories (furthest from the viewer) superimposes all possible trajectories, depicting the complexity of how the trajectories are patterned according to their modulation by glucose state. For comparison, this presentation is analogous to the superimposed trajectories shown in Fig 1D. Note that the colour bar for the graphs in (b) is serum glucose and is distinct from the colour bar in (a) used to indicate t-values.

We also tested for state-dependent modulatory effects on RPE processing that depended on whether serum glucose was ascending (Fig 1B, left) or descending (Fig 1B, right) over time. This yields four different contrasts (ascending vs. descending and the converse, for RPEpos*state and RPEneg*state) that are directly relevant to glucose state. Subtracting descending trajectories from ascending, and vice versa, revealed no significant activity changes for RPEpos*state [whole brain, uncorrected]. The same comparisons for RPEneg*state did reveal significant effects in VTA and substantia nigra for ascending trajectories relative to descending trajectories (Fig 4A). This result shows a relative amplification of the RPEneg*state signal as glucose state increases. In instances where reward was lower than expected (thus yielding a negative RPE), the glucose state modulated the RPEneg signal in VTA and SN more strongly when glucose levels were ascending than when they were descending.

Discussion

Participants performed a simple cue-conditioning task involving the probabilistic delivery of glucose or water, whilst their blood glucose fluctuated over the course of an hour. We had hypothesized that low levels of serum glucose would positively modulate the scale of positive RPE responses in hypothalamic-brainstem networks, reflecting the marginal utility of glucose as a function of homeostatic needs. Contrary to this hypothesis, we did not observe any positive RPE that increased its scaling with decreasing levels of serum glucose. In exploratory analyses there were, however, several observations of a dependency between serum glucose and RPE signals. Reward prediction error signalling in the parabrachial nuclei scaled positively with serum glucose levels, and this was true whether glucose was ascending or descending over time. We found that both the VTA and SN became more sensitive to negative RPEs for ascending compared to descending glycaemic trajectories. We begin by discussing the interpretation of these state modulated RPE effects, before considering other effects, and the limitations inherent in this paradigm.

In rodent models, the PBN acts as a second-order relay of inputs from the nucleus tractus solitarius, and is critical in the control of energy homeostasis via its projections to amygdala [38, 39], VTA [40], hypothalamus [39, 41] and the nucleus accumbens [42]. Subnuclei of the PBN are targeted by descending projections from several nuclei implicated in energy homeostasis, including hypothalamus, amygdala, and the bed nucleus of the stria terminalis [39, 43]. The PBN is known to be a potent site of reward modulation and subsequent behaviour in rodents. Microinjection of benzodiazepines [44–46], endocannabinoids [47], opioids [48, 49] and melanocortin agonists [50] into the PBN all evoke hyperphagias. To our knowledge, the involvement of PBN in the context of hedonics and reward signalling in the human brain remains to be charted. Here we provide tentative evidence that PBN activity generates a positive RPE-like signal that is sensitive to blood glucose and is time-locked to both the sensory cues predicting glucose, as well as glucose consumption.

We found no state modulation of RPEneg signalling (RPEneg*state) that was expressed across both ascending and descending glycaemic trajectories. Instead, for the RPEneg signal, the modulatory effect of glycaemic state depended on whether glucose was ascending or descending. For regional activity scaling with RPEneg, the VTA and SN showed significantly stronger state modulation during ascending than during descending glycaemic trajectories. In our experiment, the ascending glucose trajectory resulted from a low-glucose preload with the subsequent increase over time likely occurring by virtue of the continual ingestion of glucose throughout the paradigm (Fig 1B). In the ascending condition, the neural response to RPEneg is attenuated at lower levels of serum glucose, while it becomes amplified by the transition to higher serum glucose. Given that there is some evidence that dopaminergic neurons of the VTA and SN are directly inhibited by insulin [15], it is possible that the insulin release following the hi-glucose preload was highest at the start of the paradigm, decreasing over time, and thus resulting in a gradual decrease in inhibition. However, it should be noted that insulin can also have a stimulatory effect on dopaminergic firing rates [51]. The difference in the state modulation of RPEneg (RPEneg*state) between ascending and descending conditions may therefore be attributed to differential dynamics of insulin secretion [52], though other hormones such as ghrelin [52–54] or leptin [55–58] may play a role. It is not presently known why a behaviorally reinforcing signal, such as phasic RPE, would be expressed less as glucose levels decrease, in a situation where the body is moving towards a state of potential dyshomeostasis.

Our finding that the VTA and SN responses are linked to RPEneg may appear counterintuitive, given that these midbrain regions are typically associated with BOLD responses signalling positive-going RPEs. This is assumed to be by virtue of the fact that a greater range of firing rates can be devoted to the better-than-expected range, signalled by above baseline firing. This is contrasted to the worse-than-expected range, which can only be signalled by a decrease from an already low baseline frequency. It is conceivable that what we are asserting as being RPEneg is in fact a positive RPE resulting from the gradual avoidance of glucose, which increases in magnitude with increasing levels of serum glucose as reported in humans [2] and rats [59]. Thus, as the experimental paradigm continues, especially under the conditions of glucose preload, serum glucose increases, and this may change the valence of the outcome, switching the affective connotation of glucose from palatable to aversive.

As detailed in the introduction, little is known about how the interface between dopaminergic RPE signalling and energy homeostasis is implemented in the human brain. While there are many means by which circulating factors can modulate activity in the VTA and SN, the mechanisms by which this is mediated cannot be revealed without wider hormonal assays. Contemporaneous hormonal sampling, as well as continuous glucose monitoring in the scanner will prove an important step in revealing these latent factors.

There are several technical limitations that should be noted in discussing this experiment. Though relatively high volumes of functional data (150 minutes per subject) were acquired in each subject, the total number of subjects was small. The reason for this was a focus on maximising experimental power within subjects, for finer-scale inference on the longer-timescale glucose dynamics. It was not known ahead of time how large the modulatory effects would be, and thus we deployed a conservative strategy of testing fewer subjects for longer. The long regressors that result from the concatenation of sessions may have meant that the high-pass filtering reduced the effects of ascending and descending glucose levels. Inferring slow timescale dynamics is generally a problem for fMRI. However, it is circumvented to some degree here, insofar as we are inferring the modulatory effect of glucose on RPE signalling dynamics, which occur on a faster timescale than that which is removed by the high-pass filter. Due to the small number of subjects, we deployed a fixed effects analysis over all subjects. It should be noted that this makes assumptions about the nature of the noise that might not be compatible with the repeated measures design, since it is difficult to correct for non-sphericity in this setting. Future work will expand this paradigm with a larger group of subjects to afford random effects modelling, and thus generalisation to the population sampled from. Contrary to our hypotheses, we found no modulatory effect of hypothalamic nuclei on RPE signalling. We stress that the current imaging protocols and field strength (3T) were not optimal to dissociate neural activity in the hypothalamic nuclei. Due to the proximity of air sinuses adjacent to the hypothalamus and the effective resolution available, the present study most likely had insufficient sensitivity to capture activity in hypothalamic regions of interest.

Another issue is whether the hemodynamic modelling was appropriate for detecting evoked responses in subcortical regions, which may deviate from the canonical hemodynamic response functions typically used. The regression model we used deployed temporal and dispersion derivatives for all regressors of interest in order to account for idiosyncratic variance in the timing and temporal spread of the hemodynamic response function. Finally, it should be noted that the cue-conditioning employed in this study was passive. Hence, subjects produced no overt choice behaviour against which to fit learning rate parameters for the RPE model; instead, we relied on the asymptote values for the RPE signals. The problem of modelling RPEs in the absence of choice behaviour motivates fitting learning rate parameters directly to brain data, a computational imaging approach that future work will exploit [60].
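
As a minimal sketch of what relying on asymptotic values means in a passive Pavlovian design: once learning has converged, the value of a cue equals its expected outcome, so the outcome-time RPE is the delivered reward minus that expectation, and no learning rate needs to be fitted. The reward probability, reward magnitude and function name below are hypothetical placeholders, not the contingencies or code used in this study.

```python
def asymptotic_rpe(p_reward, reward_magnitude=1.0):
    """Return (cue_value, rpe_if_delivered, rpe_if_withheld) at asymptote.

    Assumes a fully learned cue whose value equals the expected reward,
    and that a withheld reward has a value of zero.
    """
    if not 0.0 <= p_reward <= 1.0:
        raise ValueError("p_reward must lie in [0, 1]")
    cue_value = p_reward * reward_magnitude
    rpe_pos = reward_magnitude - cue_value  # better than expected
    rpe_neg = 0.0 - cue_value               # worse than expected
    return cue_value, rpe_pos, rpe_neg


# Hypothetical cue that is followed by glucose on 75% of trials.
value, rpe_pos, rpe_neg = asymptotic_rpe(0.75)
print(value, rpe_pos, rpe_neg)
```

Under these assumptions, a cue with a high reward probability yields small positive RPEs when the reward arrives and large negative RPEs when it does not, which is why the two signs of RPE can be modelled as separate regressors.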

Recent literature on the computational neuroscience of reinforcement learning (RL) has evidenced how decision-making in the mammalian brain is driven by optimizing the net value of both primary and non-primary rewards. Such reward computations have been shown to rest on a comparison between the expectation and outcome of external environmental cues, integrating both the physical and cognitive effort costs of the agent [61–63]. The work presented here tentatively expands this perspective by showing that reward and RPE signals are dependent on internal homeostatic states, which may serve to modify motivational values according to the personal and time-varying homeostatic needs of the organism.

In conclusion, we exploited a simple paradigm, capable of eliciting RPEs under differential glycaemic trajectories, to identify brainstem structures that show a modulation of RPE signalling depending on the glycaemic state. We found that the PBN signals a positive-going reward prediction error that is subject to systematic modulation by serum glucose. In the VTA and SN, negative-going RPEs were modulated by serum glucose trajectories, but in a way that was specific to an ascending glycaemic slope. Together, the results show that RPE signals in key brainstem structures can be modulated by homeostatic trajectories inherent in naturally occurring glycaemic flux, revealing a potentially tight interplay between homeostatic state and the signalling of primary reward in the human brain.

Acknowledgments

We thank Mehdi Keramati and Boris Gutkin for several helpful discussions.

Data Availability

A fully anonymized dataset along with analysis code for this paper is available at https://doi.org/10.5281/zenodo.4616695.

Funding Statement

This work was supported by the following funders: H.R.S. (Lundbeck Foundation Grant of Excellence “ContAct” ref: R59 A5399; Novo Nordisk Foundation Interdisciplinary Synergy Programme Grant “BASICS” ref: NNF14OC0011413), O.J.H. (Lundbeck Foundation, ref: R140-2013-13057; Danish Research Council ref: 12-126925), T.M. (Lundbeck Foundation ref: R140-2013-13057).

References

  • 1.Pompilio L., Kacelnik A. & Behmer S.T., 2006. State-dependent learned valuation drives choice in an invertebrate. Science (New York, N.Y.), 311(5767), pp.1613–1615. [DOI] [PubMed] [Google Scholar]
  • 2.Cabanac M., 1971. Physiological Role of Pleasure. Science, 173(4002), pp.1103–1107. 10.1126/science.173.4002.1103 [DOI] [PubMed] [Google Scholar]
  • 3.Sclafani A. Learned controls of ingestive behaviour. Appetite. 1997. October;29(2):153–8. 10.1006/appe.1997.0120 . [DOI] [PubMed] [Google Scholar]
  • 4.Haber S.N. & Knutson B., 2010. The reward circuit: linking primate anatomy and human imaging. Neuropsychopharmacology, 35(1), pp.4–26. 10.1038/npp.2009.129 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Berridge K.C., 2006. The debate over dopamine’s role in reward: the case for incentive salience. Psychopharmacology, 191(3), pp.391–431. 10.1007/s00213-006-0578-x [DOI] [PubMed] [Google Scholar]
  • 6.Schultz W., Dayan P. & Montague P.R., 1997. A neural substrate of prediction and reward. Science (New York, N.Y.), 275(5306), pp.1593–1599. [DOI] [PubMed] [Google Scholar]
  • 7.Schultz W., 2015. Neuronal Reward and Decision Signals: From Theories to Data. Physiological reviews, 95(3), pp.853–951. 10.1152/physrev.00023.2014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.O’Doherty J. et al., 2004. Dissociable Roles of Ventral and Dorsal Striatum in Instrumental Conditioning. Science (New York, N.Y.), 304(5669), pp.452–454. 10.1126/science.1094285 [DOI] [PubMed] [Google Scholar]
  • 9.Rangel A., Camerer C. & Montague P.R., 2008. A framework for studying the neurobiology of value-based decision making. Nature Reviews Neuroscience, 9(7), pp.545–556. Available at: http://www.nature.com/doifinder/10.1038/nrn2357. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Tobler P.N., Fiorillo C.D. & Schultz W., 2005. Adaptive coding of reward value by dopamine neurons. Science (New York, N.Y.), 307(5715), pp.1642–1645. 10.1126/science.1105370 [DOI] [PubMed] [Google Scholar]
  • 11.Stauffer W.R., Lak A. & Schultz W., 2014. Dopamine Reward Prediction Error Responses Reflect Marginal Utility. Current Biology, 24(21), pp.2491–2500. 10.1016/j.cub.2014.08.064 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Schultz W., 2005. Behavioral Theories and the Neurophysiology of Reward. Annual Review of Psychology, 57(1), pp.87–115. [DOI] [PubMed] [Google Scholar]
  • 13.Cone J.J. et al., 2016. Physiological state gates acquisition and expression of mesolimbic reward prediction signals. Proceedings of the National Academy of Sciences of the United States of America, 113(7), pp.1943–1948. 10.1073/pnas.1519643113 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Ferrario C.R. et al., 2016. Homeostasis Meets Motivation in the Battle to Control Food Intake. Journal of Neuroscience, 36(45), pp.11469–11481. 10.1523/JNEUROSCI.2338-16.2016 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Palmiter R.D., 2007. Is dopamine a physiologically relevant mediator of feeding behavior? Trends in Neurosciences, 30(8), pp.375–381. Available at: http://linkinghub.elsevier.com/retrieve/pii/S0166223607001336. 10.1016/j.tins.2007.06.004 [DOI] [PubMed] [Google Scholar]
  • 16.Mietlicki-Baase E.G. et al., 2015. Amylin modulates the mesolimbic dopamine system to control energy balance. Neuropsychopharmacology, 40(2), pp.372–385. 10.1038/npp.2014.180 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Sternson S.M. & Eiselt A.-K., 2017. Three Pillars for the Neural Control of Appetite. Annual Review of Physiology, 79(1), pp.401–423. Available at: http://www.annualreviews.org/doi/10.1146/annurev-physiol-021115-104948. [DOI] [PubMed] [Google Scholar]
  • 18.Aponte Y., Atasoy D. & Sternson S.M., 2011. AGRP neurons are sufficient to orchestrate feeding behavior rapidly and without training. Nature Neuroscience, 14(3), pp.351–355. 10.1038/nn.2739 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Dietrich M.O. et al., 2015. Hypothalamic Agrp neurons drive stereotypic behaviors beyond feeding. Cell, 160(6), pp.1222–1232. 10.1016/j.cell.2015.02.024 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Luquet S. et al., 2005. NPY/AgRP neurons are essential for feeding in adult mice but can be ablated in neonates. Science (New York, N.Y.), 310(5748), pp.683–685. 10.1126/science.1115524 [DOI] [PubMed] [Google Scholar]
  • 21.Cansell C. et al., 2012. Arcuate AgRP neurons and the regulation of energy balance. Frontiers in Endocrinology, 3, p.169. 10.3389/fendo.2012.00169 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Mandelblat-Cerf Y. et al., 2015. Arcuate hypothalamic AgRP and putative POMC neurons show opposite changes in spiking across multiple timescales. eLife, 4, p.351. Available at: http://elifesciences.org/lookup/doi/10.7554/eLife.07122 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Nieh E.H. et al. 2015. Decoding Neural Circuits that Control Compulsive Sucrose Seeking. Cell, 160, 528–541 10.1016/j.cell.2015.01.003 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Campos C.A. et al., 2016. Parabrachial CGRP Neurons Control Meal Termination. Cell metabolism, 23(5), pp.811–820. 10.1016/j.cmet.2016.04.006 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Söderpalm A.H.V. & Berridge K.C., 2000. The hedonic impact and intake of food are increased by midazolam microinjection in the parabrachial nucleus. Brain Research, 877(2), pp.288–297. 10.1016/s0006-8993(00)02691-3 [DOI] [PubMed] [Google Scholar]
  • 26.Qunli et al., 2014. The Temporal Pattern of cfos Activation in Hypothalamic, Cortical, and Brainstem Nuclei in Response to Fasting and Refeeding in Male Mice. Endocrinology, 155(3), pp.840–853. 10.1210/en.2013-1831 [DOI] [PubMed] [Google Scholar]
  • 27.Alhadeff A.L., Baird J.-P., et al., 2014. Glucagon-like Peptide-1 receptor signaling in the lateral parabrachial nucleus contributes to the control of food intake and motivation to feed. Neuropsychopharmacology, 39(9), pp.2233–2243. 10.1038/npp.2014.74 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Alhadeff A.L., Hayes M.R. & Grill H.J., 2014. Leptin receptor signaling in the lateral parabrachial nucleus contributes to the control of food intake. American journal of physiology. Regulatory, integrative and comparative physiology, 307(11), pp.R1338–R1344. 10.1152/ajpregu.00329.2014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Friston K.J. et al., 1996. Movement-related effects in fMRI time-series. Magnetic resonance in medicine, 35(3), pp.346–355. 10.1002/mrm.1910350312 [DOI] [PubMed] [Google Scholar]
  • 30.Glover GH, Li TQ, Ress D. Image-based method for retrospective correction of physiological motion effects in fMRI: RETROICOR. Magnetic Resonance in Medicine. 2000;44:162–167. [DOI] [PubMed] [Google Scholar]
  • 31.Lancaster J.L. et al., 2000. Automated Talairach atlas labels for functional brain mapping. Human brain mapping, 10(3), pp.120–131. Available at: http://onlinelibrary.wiley.com/doi/10.1002/1097-0193. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Lancaster J.L. et al., 1997. The Talairach Daemon, a database server for Talairach atlas labels, Neuroimage. [Google Scholar]
  • 33.Mai JK, Majtanik M, Paxinos G. Atlas of the human brain. Academic Press; 2015. Dec 2. [Google Scholar]
  • 34.Sutton R.S. & Barto A.G., 1998. Reinforcement Learning: An Introduction. IEEE Transactions on Neural Networks, 9(5), pp.1054–1054. [Google Scholar]
  • 35.D’Ardenne K. et al., 2008. BOLD Responses Reflecting Dopaminergic Signals in the Human Ventral Tegmental Area. Science (New York, N.Y.), 319(5867), pp.1264–1267. 10.1126/science.1150605 [DOI] [PubMed] [Google Scholar]
  • 36.Page K.A. et al., 2011. Circulating glucose levels modulate neural control of desire for high-calorie foods in humans. Journal of Clinical Investigation, 121(10), pp.4161–4169. 10.1172/JCI57873 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Eshel N. et al., 2016. Dopamine neurons share common response function for reward prediction error. Nature Neuroscience, 19(3), pp.479–486. 10.1038/nn.4239 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Norgren R., 1978. Projections from the nucleus of the solitary tract in the rat. Neuroscience, 3(2), pp.207–218. 10.1016/0306-4522(78)90102-1 [DOI] [PubMed] [Google Scholar]
  • 39.Loewy A.D., 1998. The Lower Brainstem and Bodily Homeostasis. Trends in Neurosciences, 21(6), pp.270–271. [Google Scholar]
  • 40.Miller R.L., Stein M.K. & Loewy A.D., 2011. Serotonergic inputs to FoxP2 neurons of the pre-locus coeruleus and parabrachial nuclei that project to the ventral tegmental area. Neuroscience, 193, pp.229–240. 10.1016/j.neuroscience.2011.07.008 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Norgren R., 1976. Taste pathways to hypothalamus and amygdala. The Journal of comparative neurology, 166(1), pp.17–30. 10.1002/cne.901660103 [DOI] [PubMed] [Google Scholar]
  • 42.Li C.-S. et al., 2012. Descending projections from the nucleus accumbens shell suppress activity of taste-responsive neurons in the hamster parabrachial nuclei. Journal of Neurophysiology, 108(5), pp.1288–1298. 10.1152/jn.00121.2012 [DOI] [PubMed] [Google Scholar]
  • 43.Zhang C., Kang Y. & Lundy R.F., 2011. Terminal field specificity of forebrain efferent axons to the pontine parabrachial nucleus and medullary reticular formation. Brain Research, 1368(2), pp.108–118. 10.1016/j.brainres.2010.10.086 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Söderpalm A.H.V. & Berridge K.C., 2000. The hedonic impact and intake of food are increased by midazolam microinjection in the parabrachial nucleus. Brain Research, 877(2), pp.288–297. 10.1016/s0006-8993(00)02691-3 [DOI] [PubMed] [Google Scholar]
  • 45.Wu Q., Boyle M.P. & Palmiter R.D., 2009. Loss of GABAergic signaling by AgRP neurons to the parabrachial nucleus leads to starvation. Cell, 137(7), pp.1225–1234. 10.1016/j.cell.2009.04.022 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.De Oliveira L.B. et al., 2011. Baclofen into the lateral parabrachial nucleus induces hypertonic sodium chloride and sucrose intake in rats. Neuroscience, 183, pp.160–170. 10.1016/j.neuroscience.2011.02.019 [DOI] [PubMed] [Google Scholar]
  • 47.DiPatrizio N.V. & Simansky K.J., 2008. Activating parabrachial cannabinoid CB1 receptors selectively stimulates feeding of palatable foods in rats. The Journal of neuroscience: the official journal of the Society for Neuroscience, 28(39), pp.9702–9709. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Wilson J.D. et al., 2003. An orexigenic role for μ-opioid receptors in the lateral parabrachial nucleus. American journal of physiology. Regulatory, integrative and comparative physiology, 285(5), pp.R1055–R1065. 10.1152/ajpregu.00108.2003 [DOI] [PubMed] [Google Scholar]
  • 49.Chaijale N.N., Aloyo V.J. & Simansky K.J., 2013. The stereoisomer (+)-naloxone potentiates G-protein coupling and feeding associated with stimulation of mu opioid receptors in the parabrachial nucleus. Journal of Psychopharmacology, 27(3), pp.302–311. 10.1177/0269881112472561 [DOI] [PubMed] [Google Scholar]
  • 50.Skibicka K.P. & Grill H.J., 2009. Hypothalamic and hindbrain melanocortin receptors contribute to the feeding, thermogenic, and cardiovascular action of melanocortins. Endocrinology, 150(12), pp.5351–5361. 10.1210/en.2009-0804 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Konner A.C. et al., 2011. Role for insulin signaling in catecholaminergic neurons in control of energy homeostasis. Cell Metabolism, 3(6):720–8 10.1016/j.cmet.2011.03.021 [DOI] [PubMed] [Google Scholar]
  • 52.Sun X. et al., 2014. The neural signature of satiation is associated with ghrelin response and triglyceride metabolism. Physiology & Behavior, 136, pp.63–73. 10.1016/j.physbeh.2014.04.017 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Malik S. et al., 2008. Ghrelin modulates brain activity in areas that control appetitive behavior. Cell metabolism, 7(5), pp.400–409. 10.1016/j.cmet.2008.03.007 [DOI] [PubMed] [Google Scholar]
  • 54.Kroemer N.B. et al., 2013. Fasting levels of ghrelin covary with the brain response to food pictures. Addiction Biology, 18(5), pp.855–862. 10.1111/j.1369-1600.2012.00489.x [DOI] [PubMed] [Google Scholar]
  • 55.Domingos A.I. et al., 2011. Leptin regulates the reward value of nutrient. Nature Neuroscience, 14(12), pp.1562–1568. Available at: http://www.nature.com/doifinder/10.1038/nn.2977. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Figlewicz D.P. et al., 2003. Expression of receptors for insulin and leptin in the ventral tegmental area/substantia nigra (VTA/SN) of the rat. Brain Research, 964(1), pp.107–115. 10.1016/s0006-8993(02)04087-8 [DOI] [PubMed] [Google Scholar]
  • 57.Fulton S., 2000. Modulation of Brain Reward Circuitry by Leptin. Science, 287(5450), pp.125–128. Available at: http://www.sciencemag.org/cgi/doi/10.1126/science.287.5450.125. [DOI] [PubMed] [Google Scholar]
  • 58.Takahashi K.A. & Cone R.D., 2005. Fasting induces a large, leptin-dependent increase in the intrinsic action potential frequency of orexigenic arcuate nucleus neuropeptide Y/Agouti-related protein neurons. Endocrinology, 146(3), pp.1043–1047. 10.1210/en.2004-1397 [DOI] [PubMed] [Google Scholar]
  • 59.Berridge KC. Modulation of taste affect by hunger, caloric satiety, and sensory-specific satiety in the rat. Appetite. 1991. April;16(2):103–20. 10.1016/0195-6663(91)90036-r . [DOI] [PubMed] [Google Scholar]
  • 60.Meder D. et al., 2017. Simultaneous representation of a spectrum of dynamically changing value estimates during decision making. Nature Communications, 8(1), p.1942. 10.1038/s41467-017-02169-w [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Alexander W. H., & Brown J. W. (2011). Medial prefrontal cortex as an action-outcome predictor. Nature Neuroscience, 14(10), 1338–1344. 10.1038/nn.2921 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Silvetti M., Alexander W., Verguts T., & Brown J. W. (2014). From conflict management to reward-based decision making: Actors and critics in primate medial frontal cortex. Neuroscience and Biobehavioral Reviews, 46 Pt 1, 44–57. 10.1016/j.neubiorev.2013.11.003 [DOI] [PubMed] [Google Scholar]
  • 63.Verguts T., Vassena E., & Silvetti M. (2015). Adaptive effort investment in cognitive and physical tasks: A neurocomputational model. Frontiers in Behavioral Neuroscience, 9. 10.3389/fnbeh.2015.00057 [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

Tom Verguts

21 Jul 2020

PONE-D-20-05797

Reward signalling in brainstem nuclei under fluctuating blood glucose

PLOS ONE

Dear Dr. Hulme,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The main comment is the lack of methodological detail (see comments of both reviewers). It is impossible to currently understand how the TD model was constructed. Note that a TD model requires adding assumptions that may change its qualitative behavior (Pan et al., 2005, JNeurosci), so it must be made explicit. Equally important, I could not reconstruct how the 1st-level GLM was constructed (which events? which parametric modulators? were they orthogonalized?), and how contrasts were subsequently defined based on the 1st-level regressors.

Relatedly, it seems that the modulation by serum glucose runs opposite to what the authors predicted (more glucose, stronger response). If so, this must be made explicit in the Discussion.

Please submit your revised manuscript by Sep 04 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Tom Verguts

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please provide additional details regarding participant consent. In the Methods section, please ensure that you have specified what type of consent you obtained (for instance, written or verbal) and whether the ethics committee approved this consent procedure. If verbal consent was obtained please state why it was not possible to obtain written consent and how verbal consent was recorded. If your study included minors, state whether you obtained consent from parents or guardians.

3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

4. Thank you for stating the following in the Competing Interests section:

[The authors declare no competing interests.  H.R.S. has received honoraria as speaker from Genzyme, Denmark and as senior editor of Neuroimage from Elsevier Publishers, Amsterdam, The Netherlands. H.R.S. has received a research fund from Biogen-idec, Denmark.].

Please confirm that this does not alter your adherence to all PLOS ONE policies on sharing data and materials, by including the following statement: "This does not alter our adherence to  PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests).  If there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include your updated Competing Interests statement in your cover letter; we will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors propose a study aimed at investigating the influence of homeostatic state on RPE signals. They manipulated homeostatic state by varying the blood glucose levels with two paradigms, one ascending (start low and grow) and the other descending (start high and decrease). The results indicate a clear modulation RPE x glucose level in several cortical and subcortical areas, including the VTA, PBN, and caudate nucleus. The homeostatic modulation of RPE was found to depend also on the glucose level phase (ascending or descending). I found the study interesting and very relevant for the understanding of the biological/evolutionary meaning of RPE. I have no major concerns about the methods, nonetheless the authors should improve the clarity of results exposition, in order to make their findings more easily understandable by the reader. In particular (sorted by appearance order in the manuscript):

1. Figure 1b. The authors should define already here as “ascending” and “descending” the two glucose conditions. Moreover, it is not specified the meaning of the eight different time courses in each condition.

2. Figure 2. I suppose the y axis indicates PSC. It should be specified. The authors should stress out the matching between the meaning of the time courses color here and in Figure 1d

3. Figure 3b and Figure 4b are hard to understand. I think the trajectory furthest away from the viewer does not provide important information, it is difficult to read and confounds the reader (it seems an additional trajectory besides those described in Figure 1d). Moreover, the authors should help the reader in matching these trajectories with those described in Figure 1d (who’s who?).

4. The authors should provide a table of fMRI results.

Finally, recent literature on computational neuroscience of Reinforcement Learning (RL) is evidencing how decision-making in the mammalian brain is strongly driven by optimizing the net value (discounted by costs) about both primary and non-primary rewards (e.g. Alexander & Brown 2011; Silvetti, Alexander et al. 2014; Verguts et al. 2015; Silvetti, Vassena et al., 2018). I think this work is relevant for clarifying how reward and RPE signals are dependent from internal states, and how the latter modulate RL processes, suggesting that RPEs are not only the comparison between expectation and environmental outcome (objective), but are dependent on homeostatic states (subjective). These results should be linked to the above literature in the Discussion section, as they contribute to a paradigmatic change that shifts RL-based decision-making from being “objective” to be more “subjective”.

Reviewer #2: Morville and colleagues use FMRI to investigate the relationship between reward prediction error (RPE) signals and homeostatic state (ascending and descending serum glucose trajectories). The authors find that RPE responses in midbrain structures are sensitive to glucose trajectories, which demonstrates a link between RPE responses and homeostatic processes. The authors acknowledge several important limitations in their work, and draw generally appropriate conclusions from their findings. Although I think the manuscript possesses many good features, I have a number of concerns that should be addressed in a revision.

Major Concerns

1) My biggest problem with the manuscript is the lack of methodological details. First, it is difficult to evaluate what was done with respect to the analyses. I think the authors rely on the canonical hemodynamic response function. But it's not clear to me whether this shape would capture meaningful variation tied to differences in neuronal activity in this design (cf. Chen et al., 2015). Some discussion of the potential impact of HDR shape would be important to include. Second, it is not clear how the authors modeled RPEs and included those responses as parametric modulators in their analyses. Third, given the design and concatenation of the FMRI sessions, how did this interact with the high pass filter?

2) The authors state that they will release the data on NeuroVault. However, this repository is for statistical maps, which should be shared of course (e.g., Botvinik-Nezer et al., 2020). I recommend that the authors share the raw data (formatted into the Brain Imaging Data Structure) on OpenNeuro (Gorgolewski et al., 2015). With such a rich dataset and many important open questions (e.g., fitting the RL model directly to the brain data), it would be unfortunate not to share the data openly and publicly. (Of course, I realize the authors might still be working on some of these questions, and thus could elect to embargo the data for some period of time. But, as is, the manuscript is not compliant with data sharing policies at this journal, which is why I list this as a "major" comment.)

Minor Concerns

1) Although the paper from Stauffer and colleagues (2014) links dopaminergic activity to marginal utility and the explanation in the Introduction makes sense to me, I think it would be worth explaining this concept further in the Discussion. Unless I missed it, the authors do not mention marginal utility again after the Introduction.

2) I think the concept of "three interacting valuation systems" (Sternson & Eiselt 2017) could get confused with other valuation systems, as described in related work (e.g., Rangel et al., 2008). Please try to clarify and reconcile these different frameworks.

3) The reward cue and liquid would be very difficult to separate without any jitter. In addition, the responses to these events could occur in the same or consecutive TRs. Please clarify how these phases of the task were modeled (see above major comment).

4) How were the scanning parameters optimized to record midbrain responses? Were there any corrections for physiological noise?

5) How did the authors ensure that participants were compliant with the 12-hour fasting protocol?

6) Please check figure legends for completeness/accuracy. Some elements (e.g., line color for the HDR functions in Figure 2) are not defined.

7) In Figure 2, please add error bars to the HDR functions.

8) In Figures 3 and 4, consider using different color bars for the brain activation and serum glucose levels. As is, the color bars look somewhat similar, which could lead to confusion.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Apr 7;16(4):e0243899. doi: 10.1371/journal.pone.0243899.r002

Author response to Decision Letter 0


10 Aug 2020

Our reviewer responses are uploaded as a separate document in this resubmission. We make use of several screenshots of the revised document in our reply which this text field cannot support. We hope this does not cause inconvenience.

Attachment

Submitted filename: PlosOne reviewers comments _ Glucose reward paper .docx

Decision Letter 1

Tom Verguts

19 Aug 2020

PONE-D-20-05797R1

Reward signalling in brainstem nuclei under fluctuating blood glucose

PLOS ONE

Dear Dr. Hulme,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

I thank the authors for responding to the comments by myself and the reviewers. The paper became much clearer as a result. I would still like you to address the following (minor) remaining points of unclarity.

  • The study design is not entirely clear: Every subject does 4 sessions, but is it LLHH, or LHHL, or a random order,…?

  • P 6: Ascending and descending are used for the first time here, but are only explained on page 10. Please define them on first use.

  • P 6: “A standard fixed effects level…” Do you mean that a fixed effects level model was fitted across all data? Although a fixed effects model in itself is standard, it is definitely not standard to apply it across subjects (without random effects). Please explicitly comment on this in the paper.

  • The model clarification is good, but I wonder about the statement “Since there is no overt behaviour…” This is true, but I don’t see how this is relevant. Behavioral data are indeed needed when you want to estimate the parameters of a model; but you can perfectly well generate regressors for a Rescorla-Wagner, TD, or any other model (which, apparently, is what you do). TD is just about how you model how value “percolates” to earlier time points (such as cue) from reward. You can generate regressors for that model. But the way I see it, your model does not learn, but you give it for each time point (cue, feedback, …), the equilibrium value that it has to converge to, based on the statistics of the experiment? Then say explicitly that your model does not learn, not that you don’t have behavioral data to fit a model.

  • P 12: “that this was true”: change to “this was true”
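
To make the distinction in the model comment above concrete, the following is a minimal Python sketch, not the authors' analysis code. It contrasts RPE regressors from a learning Rescorla-Wagner model with those from a non-learning model whose expectation is fixed at the equilibrium value implied by the task statistics. The reward probability (0.5), learning rate (0.1), trial count, and function names are illustrative assumptions, not values taken from the manuscript.

```python
import random


def rw_rpes(outcomes, alpha=0.1, v0=0.0):
    """Rescorla-Wagner: the expected value v is updated after every trial,
    so the RPE for a given outcome changes as learning proceeds."""
    v = v0
    rpes = []
    for r in outcomes:
        delta = r - v          # prediction error on this trial
        rpes.append(delta)
        v += alpha * delta     # learning step
    return rpes


def fixed_rpes(outcomes, p_reward):
    """Non-learning model: the expectation is held at the equilibrium
    value (here the assumed reward probability) on every trial."""
    return [r - p_reward for r in outcomes]


# Hypothetical cue-outcome sequence: reward delivered with probability 0.5.
random.seed(0)
p = 0.5
outcomes = [1.0 if random.random() < p else 0.0 for _ in range(200)]

learned = rw_rpes(outcomes, alpha=0.1)   # regressor from a learning model
fixed = fixed_rpes(outcomes, p)          # regressor from a non-learning model
```

Both regressors can be generated without any behavioural data. They differ mainly in early trials, before the learned value has approached the equilibrium expectation.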

Please submit your revised manuscript by Oct 03 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Tom Verguts

Academic Editor

PLOS ONE


Decision Letter 2

Tom Verguts

1 Dec 2020

Reward signalling in brainstem nuclei under fluctuating blood glucose

PONE-D-20-05797R2

Dear Dr. Hulme,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Tom Verguts

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Tom Verguts

29 Mar 2021

PONE-D-20-05797R2

Reward signalling in brainstem nuclei under fluctuating blood glucose

Dear Dr. Hulme:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Tom Verguts

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: PlosOne reviewers comments _ Glucose reward paper .docx

    Attachment

    Submitted filename: Plos One reviewer comments rFlux.docx

    Data Availability Statement

    A fully anonymized dataset along with analysis code for this paper is available at https://doi.org/10.5281/zenodo.4616695.


    Articles from PLoS ONE are provided here courtesy of PLOS