Excitation and inhibition in anterior cingulate predict use of past experiences

Jacqueline Scholl; Nils Kolling; Natalie Nelissen; Charlotte J Stagg; Catherine J Harmer; Matthew FS Rushworth

doi:10.7554/eLife.20365

eLife. 2017 Jan 5;6:e20365. doi: 10.7554/eLife.20365

Editor: Joshua I Gold⁵

PMCID: PMC5213710 PMID: 28055824

Abstract

Dorsal anterior cingulate cortex (dACC) mediates updating and maintenance of cognitive models of the world used to drive adaptive reward-guided behavior. We investigated the neurochemical underpinnings of this process. We used magnetic resonance spectroscopy in humans, to measure levels of glutamate and GABA in dACC. We examined their relationship to neural signals in dACC, measured with fMRI, and cognitive task performance. Both inhibitory and excitatory neurotransmitters in dACC were predictive of the strength of neural signals in dACC and behavioral adaptation. Glutamate levels were correlated, first, with stronger neural activity representing information to be learnt about the tasks’ costs and benefits and, second, greater use of this information in the guidance of behavior. GABA levels were negatively correlated with the same neural signals and the same indices of behavioral influence. Our results suggest that glutamate and GABA in dACC affect the encoding and use of past experiences to guide behavior.

DOI: http://dx.doi.org/10.7554/eLife.20365.001

Research Organism: Human

Introduction

Dorsal anterior cingulate cortex (dACC) has a central role in reward-guided decision-making, behavioral adaptation, learning, and formation of task models (Heilbronner and Hayden, 2016; Kolling et al., 2016a; Holroyd and Yeung, 2012; Khamassi et al., 2011; Ullsperger et al., 2014). Recently dACC’s role in health and disease has been underscored by findings that structural variability predicts a broad spectrum of mental illnesses (Goodkind et al., 2015). Most of our knowledge of dACC is based on measurements tied to neuronal firing such as human functional magnetic resonance imaging (fMRI) and animal recording studies or to investigations of loss of function after lesions and inactivation (Kennerley et al., 2006; Amiez et al., 2006). However, the neurochemical modulation and orchestration of dACC’s role is largely unknown.

The importance of variation in neurotransmitter levels has recently become apparent in other frontal brain areas. For example, ventromedial prefrontal cortex (vmPFC) has been linked to value-guided decisions (Boorman et al., 2009; Rushworth et al., 2011). Biophysical neural network models of decision-making in vmPFC (Hunt et al., 2012) predict that the inhibitory neurotransmitter gamma-aminobutyric acid (GABA) mediates the dynamics of the value comparison process. The predictions were borne out in a study looking at the neurochemistry of this structure with magnetic resonance spectroscopy (MRS) (Jocham et al., 2012). Relatedly, levels of GABA in motor cortex (Stagg et al., 2011) and in the frontal eye field (Sumner et al., 2010) have been found to predict the speed of selection of responses and inhibition of incorrect responses to distractors, respectively. In all three cases, neurotransmitter levels were predictive of the dynamics of the decision or selection process within different domains.

Here we use a similar approach to examine the relation between GABA and glutamate in dACC, fMRI-based indices of neural activity, and behavior. We relate these neurotransmitters to a key function of dACC that is quite distinct from the selection processes previously examined in MRS studies, namely the use of a task model to guide behavior based on past experience. More specifically, we hypothesized that if excitatory and inhibitory neurotransmitters in dACC determine the processing and use of information to form a model of the world (O'Reilly et al., 2013), or at least the task at hand, then measures of these neurotransmitters should relate to both behavioral and neural markers of this process (Figure 1—figure supplement 1).

Results

We used MRS to obtain measures of the total amount of GABA and glutamate in 27 humans at rest in dACC (Figure 1A and B). Participants then performed a previously established multi-dimensional learning task (Scholl et al., 2015) during fMRI acquisition. Participants had to repeatedly choose between the same two options, based on the reward probabilities and the reward and effort magnitudes (i.e. requirement of a sustained effort) associated with each option. The reward probabilities changed randomly from trial to trial and were displayed to participants on each trial on the screen. By contrast the reward and effort magnitudes associated with each option had to be learnt from experience across trials (Figure 1C and D). The participants’ goal was thus to choose options that would lead to the highest reward magnitude with the highest probability of being rewarded, but at the same time requiring the least effort. Participants performed the task well (Figure 2) after careful training.

Figure 1. — (A) Spectroscopy voxels were placed in dACC. Cingulate sulcal morphology was used to guide voxel placement and this resulted in consistent positioning of the voxel in the same location in MNI space (white color indicates overlap in voxel position in all 27 participants). (B) Example spectroscopy spectrum from one participant. The fitted LCModel (red) is plotted overlaid on the actual data (black). The difference between the data and the model (residuals) is shown at the top and the baseline at the bottom. (C) Participants performed 240 trials of a reward- and effort-guided learning task. On each trial, participants were shown two options overlaid with the probability of receiving a reward for each choice (Ci). Participants chose between the options on the basis of the reward probabilities displayed on the screen and on the basis of reward and effort magnitudes learnt from experience on previous trials. After participants chose one option, they were shown feedback information for both the option they had chosen and the unchosen option (Cii). The reward magnitudes were shown as purple bars (top of the screen), and the effort magnitude was indicated through the position of a dial on a circle. Whether the participant actually received a reward or not (because of the reward probability) was indicated through a tick mark (green) or a cross (red, not shown here). If participants received a reward, the chosen reward magnitude was added to a status bar at the bottom of the screen, which tracked participants’ earnings over the course of the experiment. Finally, participants exerted the effort associated with the chosen option in a final phase of the trial (Ciii). They had to exert an extended effort by responding to select targets that appeared on the screen over a period of time before the trial ended; the higher the trial’s effort level, the more targets participants had to eliminate. (D) Example of reward magnitude and effort magnitude variation associated with the two options across the course of the 120 trials for one of the two sessions in the experiment.

**DOI:** http://dx.doi.org/10.7554/eLife.20365.002

Participants’ performance can be described using a computational reinforcement-learning model (see Figure 2—figure supplement 1 and 2). This allows parsing a single behavior (choices on each trial) into different underlying components. Our hypothesis was that neurotransmitter levels in dACC should relate to how much participants used the learnt information or, in other words, a model of what choices are associated with high/low reward/effort magnitudes, to guide their choices (rather than just relying on the displayed probability information). This use of learnt information was captured by a single parameter in the model (γ, Figure 2—figure supplement 1C), which was independent from participants’ other behavioral parameters (Figure 2—figure supplement 2B).

If the use of learnt information depends on the excitation/inhibition balance, we should find correlations between γ and the neurotransmitters. Indeed, partial correlation analyses revealed that higher glutamate relative to GABA levels related to increased use of the learnt information (ρ=0.53, p=0.011). This effect was specific to the use of learnt information (Figure 3—figure supplement 1). When considering the effects of the two neurotransmitters separately, we found that both higher levels of glutamate (ρ=0.45, p=0.039) and lower levels of GABA (ρ=−0.43, p=0.05) were independently related to increased use of the learnt information (Figure 3A).
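To make the idea of a partial correlation concrete, the following is a minimal sketch, not the authors' analysis code. It assumes a rank-based (Spearman) correlation, which the ρ notation suggests but the text does not confirm, and it controls for a single hypothetical covariate `z`; the actual covariates are described in the Materials and methods. The function names are illustrative.

```python
import math
from statistics import mean


def ranks(values):
    """Return average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_spearman(x, y, z):
    """First-order partial rank correlation of x and y, controlling for z."""
    rx, ry, rz = ranks(x), ranks(y), ranks(z)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

In this sketch `x` would hold a neurotransmitter measure per participant, `y` the fitted γ values, and `z` the covariate; controlling for several covariates at once would require regressing them out instead of using the first-order formula.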

Figure 3. — (A) Participants with higher concentrations of glutamate (ρ=0.45, p=0.039) and lower concentrations of GABA (ρ=−0.43, p=0.05) in dACC were better able to use the learnt information (parameter γ from computational model) to guide their choices (the graphs illustrate partial correlations, i.e. the plotted values have been adjusted for covariates, see Materials and methods). (Bi) Neural activity in dACC was sensitive to the information to be learnt at the time of the outcome: it showed an inverse outcome value signal, i.e. BOLD activity increased with relative value of the alternative (unchosen minus chosen option, reward minus effort magnitude; yellow, whole-brain cluster-corrected, p=5*10⁻⁵, GLM1). (Bii) Individual differences in this neural signal were predictive of individual differences in how well participants could use the learnt information (red, whole-brain cluster-corrected, p=2*10⁻⁵, GLM2). (Biii) Individual differences in this neural signal also correlated with relative glutamate and GABA levels (blue, cluster-corrected in spectroscopy ROI, p=0.039, GLM3). See Figure 3—figure supplement 2 for overlaps of these activations with the spectroscopy voxel in sagittal cross-sections. Data for A in Figure3_SourceData1; data for B in Figure3_SourceData2.zip.

**DOI:** http://dx.doi.org/10.7554/eLife.20365.010

Figure 3—source data 1. This table contains the spectroscopy, brain volume and behavioral parameters used for correlations in Figure 3A.
**DOI:** http://dx.doi.org/10.7554/eLife.20365.011

elife-20365-fig3-data1.xlsx (46.7 KB, xlsx)

Figure 3—source data 2. This folder contains the MRI contrast maps, both thresholded (i.e. corrected for multiple comparison using cluster correction) and non-thresholded.
The maps are in NIfTI format and can be opened with freely available data viewers such as FSLView or MRIcron.

**DOI:** http://dx.doi.org/10.7554/eLife.20365.012

elife-20365-fig3-data2.zip (5.9 MB, zip)

Figure 3—source data 3. This folder relates to Figure 3—figure supplement 3.
It contains non-thresholded contrast maps showing the relationship between Glu-GABA and brain activity to information to be learnt, using different corrections of spectroscopy measurements for partial brain volumes (Figure 3—figure supplement 3i). Maps are in NIfTI format.

**DOI:** http://dx.doi.org/10.7554/eLife.20365.013

elife-20365-fig3-data3.zip (2.5 MB, zip)

Figure 3—source data 4. This table relates to Figure 3—figure supplement 3.
It contains the brain-volume corrected spectroscopy measurements (brain volume corrections B-D labeled as in figure) and behavioral parameters used for correlations in Figure 3—figure supplement 3ii.

**DOI:** http://dx.doi.org/10.7554/eLife.20365.014

elife-20365-fig3-data4.xlsx (52.6 KB, xlsx)

One way in which resting state glutamate/GABA levels could be linked to behavioral performance is through an impact on brain activity. To test this, we first identified brain areas that represented the information to be learnt (GLM1) at the time of learning. We identified activity in dACC and adjacent cortex (Figure 3Bi, x = 6, y = 32, z = 36, z-score = 3.62, cluster p-value=5*10⁻⁵) and in other areas (Table 1A) as coding the information to be learnt as an inverse outcome value signal (relative reward outcome minus relative effort outcome) or, in other words, a signal related to the relative value of the alternative not chosen on the current trial. Such a signal has previously been noted in dACC and has been related to behavioral adaptation: decisions to maintain or change behavior in diverse contexts (Shima and Tanji, 1998; Kolling et al., 2012; Stoll et al., 2016; Meder et al., 2016; Kolling et al., 2016b). Other areas with different types of outcome-related activity are listed in Table 1B. Next, we examined whether variation in this neural signal was related to our behavioral measure of use of learnt information (γ, GLM2). Again, we found this to be the case in a partly overlapping dACC area (Figure 3Bii and Table 1C, x=−14, y = 24, z = 58, z-score = 3.44, cluster p-value=2*10⁻⁵): participants with stronger neural representation of the information to be learnt in dACC were better at using the learnt information to guide their choices.

Table 1.

(A) Several areas carried a signal for learnt information (relative reward outcome minus relative effort outcome) as an inverse outcome value signal, in other words a signal related to the value of the alternative choice compared to the value of the action actually taken. (B) Other areas signaled outcomes to be learnt as the value of the action actually taken relative to the value of the alternative action. (C) Areas in which individual differences in the strength of the neural signal for the learnt information correlated with the behavioral use of the learnt information. All results are cluster-corrected at whole-brain level (z > 2.3, p<0.05, with actual p-value and number of voxels in the cluster indicated in the table). Region labels were obtained using atlases in FSL: ¹(Neubert et al., 2015), ²(Mars et al., 2011), ³(Sallet et al., 2013), ⁴(Mori et al., 2005).

DOI: http://dx.doi.org/10.7554/eLife.20365.019

A) Learnt information (inverse value signal)
	x	y	z	z-score	Voxels	p-value
dACC (area 8m, anterior rostro-cingulate zone¹)	6	32	36	3.62	821	5*10⁻⁵
Parietal (IPL-D, IPL-C²), left	−52	−58	42	3.68	855	3*10⁻⁵
Parietal (IPL-C, IPL-D²), right	52	−46	42	3.98	753	1*10⁻⁴
Dorsolateral prefrontal cortex (area 9/46 V³), right	40	22	38	3.61	840	4*10⁻⁵
Cerebellum	−10	−80	−26	4.19	454	6*10⁻³
Lateral frontal pole¹, right	32	54	6	3.21	360	0.02
B) Learnt information (outcome value signal)
Temporal cortex, extending to parietal opercular cortex, left	−36	−32	16	3.49	1371	1*10⁻⁷
Temporal cortex, extending to parietal opercular cortex, right	50	−28	26	3.23	458	5*10⁻³
C) Brain behavior interaction for learnt information
Midcingulate cortex (posterior rostro-cingulate zone¹)	2	−4	54	3.12	719	2*10⁻⁴
Pre-SMA extending into dACC and area 8m¹	−14	24	58	3.44	771	2*10⁻⁵
Occipital lobe	−12	−84	4	3.22	488	0.001
White matter (corticospinal tract⁴)	−18	−14	32	3.4	431	0.003
Precentral gyrus, right	40	−14	52	3.28	311	0.02

Finally, we tested whether neurotransmitter levels in dACC were related to the neural representation of the information to be learnt (GLM3). Indeed, we found that the strength of the representation of the information to be learnt in dACC correlated with the relative glutamate to GABA levels (Figure 3Biii, x = 4, y = 22, z = 40, z-score = 3.11, cluster p-value=0.039). This result was specific to dACC; analogous analyses in other ROIs identified in the contrasts for learnt information (Table 1A and B) revealed no significant effects. These findings suggest that neurotransmitters in dACC are predictive of a behavior dependent on dACC and of fMRI-based measures of neural activity in dACC related to the same behavior.

Discussion

We looked at the effects of neurotransmitter variation on dACC function. We found that differences in glutamate and GABA both related, firstly, to the strength of neural signals in dACC encoding the outcomes of decisions, i.e. the feedback information that should guide behavioral adaptation on future decisions. Secondly, the neurotransmitters also related to behavior, i.e. how well participants used this feedback information to guide future choices. Strikingly, we found opposing patterns of relationships for excitatory and inhibitory neurotransmitters: higher levels of glutamate and lower levels of GABA were linked to increased use of the learnt information.

Our findings are consistent with an emerging view of dACC in forming, updating and maintaining a model of the world and of behavioral strategies (O'Reilly et al., 2013; Karlsson et al., 2012; Kolling et al., 2014; Wittmann et al., 2016). In our paradigm, it was always advantageous to use information learnt from the outcome of one decision to guide subsequent decisions. In contrast, in other situations, it may be beneficial to behave more randomly (for example when exploring new environments). In such situations, increased GABA concentrations might enable better performance by ensuring that one does not rely too much on previous information. In fact, inhibition in ACC of rats has been shown to disable reward history-guided behaviors, making them more random, which, depending on the task, led to better or worse performance; similarly, inactivation of dACC in macaques completely prevented them from using reward history (Kennerley et al., 2006; Amiez et al., 2006; Karlsson et al., 2012; Tervo et al., 2014). It is possible that transient inhibition (through increased GABA) might allow for learning a new model of the task, whereas glutamate might mediate the exploitation of such a model.

DACC has also been implicated in error monitoring. In this context, global changes in another neurotransmitter, acetylcholine, have been shown to affect dACC-mediated post-error adjustments (Danielmeier et al., 2015). This suggests that there are additional neurochemical factors, potentially mediating dACC’s impact on neural activity in other brain areas.

Our results contrast with findings in vmPFC, where increased GABA levels are linked to improved decision accuracy and slower ramping of neural signals (Jocham et al., 2012). Here we found that both decreased levels of GABA and increased levels of glutamate were related to the degree to which a learnt task model, as opposed to information displayed on each trial, influenced behavior. This suggests a fundamental difference in function: dACC represents and regulates the use of a model of the world based on past experiences, rather than mediating the integration and selection of all arbitrary types of information during decisions (Hunt et al., 2015). It is particularly in complex environments that monitoring and fine-tuning of how much to use learnt information, as opposed to immediately perceived information, may be crucial.

These findings are of potential clinical relevance, as dACC has been linked to psychiatric disorders generally (Goodkind et al., 2015) and to mood disorders more specifically (Yüksel and Öngür, 2010). In the future, it would be important to test whether glutamate and GABA measurements, and their effects on value-guided learning, are changed in mood disorders.

Materials and methods

Participants

30 healthy volunteers took part in the study after giving informed consent. One participant was excluded because he/she fell asleep, one participant was excluded because of corrupted spectroscopy data and one participant was excluded because of noise in the spectroscopy measurements (i.e. Cramer-Rao lower bound values for GABA were 38%). Of the remaining 27 participants, 13 were assigned to a selective serotonergic re-uptake inhibitor for two weeks, while 14 were assigned to placebo as part of previously reported studies (Scholl et al., 2015). The drug manipulation had no effect on neurotransmitter levels (p>0.84). Nevertheless, we included it in all analyses as a confound regressor.

Task description

This task description is adapted from a previously published study based on the same task (Scholl et al., 2015). We designed a task that allowed measuring how participants learnt about reward and effort and how well they could use this information to guide decisions. In the task, participants made repeated choices between two options with the aim of maximizing their monetary pay-off and minimizing the effort they needed to exert in an interleaved ‘effort phase’. On each trial, there were three phases: first participants chose between two options (‘choice phase’), then they were shown the outcome of their choice (‘outcome phase’), then they had to exert the effort associated with the option they had chosen (‘effort phase’).

In the choice phase, participants chose between two options using two buttons on a trackball mouse. Each option had three independent attributes: a reward magnitude (reward points, later translated into monetary pay-off), an effort magnitude (amount of effort required in the effort phase), and a probability of receiving a real reward (rather than a hypothetical reward, see below). The probability of each option was shown on the screen at the time of choice. In contrast, the reward and effort magnitudes of the options were not explicitly instructed and instead participants had to learn and track these slowly changing features of the two choices across trials. These magnitudes were drawn from normal distributions whose means fluctuated pseudorandomly, slowly and independently over the course of the experiment between three levels (low, mid, high). Participants were instructed to learn and keep track of the changing mean value of each magnitude across the experiment. Only one of the reward or effort magnitude means was drifting at any one time and each of the four magnitudes was at each mean level equally often.

After the participants had selected an option, it was highlighted until the ensuing outcome phase. In the outcome phase, participants were first shown the reward and effort magnitudes of the option they had chosen, as well as whether they received a reward or not (in other words, whether the outcome was a real secondary reinforcer indicating a specific monetary payment or instead hypothetical). If they received a reward, the current trial’s chosen reward magnitude was added to their total reward accumulated so far (which was translated into a monetary reward at the end of the experiment). They were then shown the reward and effort magnitudes for the option they had not chosen. During the outcome phase, participants could thus use the information displayed to update their estimates of the reward and effort magnitudes associated with the choices.

Finally, on every trial, participants had to perform the effort phase of the trial. Participants had to exert a sustained effort by selecting circles that appeared on the screen using the trackball mouse. The circles were added to random positions on the screen in threes every three seconds (up to a total equal to the chosen effort magnitude). To make the task more effortful a random jitter (five pixels, the total screen size was 1280 × 800 pixels) was added to the mouse movement and circles only had a 70% probability of disappearing when clicked on. Furthermore, we pre-screened participants and only invited participants for the fMRI session if they had perceived the effort as aversive and were willing to trade-off money to reduce the effort that they needed to exert.

Participants had 25 s to complete the clicking phase and otherwise lost money equivalent to the potential reward magnitude of the chosen option (participants failed to complete the effort phase on fewer than 1% of trials).

On most trials (100 out of 120) participants had to choose between the two options with changing reward and effort magnitudes. The reward magnitudes were set between 0 and 20 pence and the effort magnitudes were set between 0 and 15 circles that needed to be clicked. On the remaining trials (‘Special-option-trials’, SOTs), participants had to choose between one of the changing options and one of two fixed options whose values participants learned in a training session outside the scanner. The value of both fixed options was 7.5 pence, but one had a fixed effort magnitude of 4 circles and the other had one of 12 circles. The SOTs were included to ensure participants learned the values of each choice, rather than just their preference for one option over the other (a relative preference for one option over the other would not enable participants to choose effectively on the SOTs).

Interspersed with the 120 learning trials, there were 20 trials on which participants just had to indicate which option had a higher mean effort magnitude. These trials were included to ensure participants paid attention to the effort dimension. They were not given feedback about their choice. These trials were not included in the data analysis.

Participants performed 120 trials of the learning task inside the scanner and an additional 120 trials on the following day outside the scanner to increase the number of trials for the behavioral data analysis. Each participant performed the same two schedules in randomized order. Participants were informed about the features of the task in two training sessions before the scan, including the fixed number of trials they would perform. This ensured that they did not perceive low effort options as having a potentially higher monetary value because taking them might allow participants to move on to the next trial more quickly and to perform more trials with more chances to win money. Further details of the training were as follows: In the first training session (45 min), participants performed a version of the task without a learning component, i.e. not only the probability, but also reward and effort magnitudes were explicitly shown. This training ensured that participants were familiar with the features of the task, for example, that they understood what the probability information meant. We also used this session to exclude, before the fMRI session, participants who did not find the effort sufficiently aversive to produce robust effects on behavior. In a second training session (1 hr), we instructed participants about the learning task that they later performed in the fMRI scanner. At the end of the training, participants were queried about how they made decisions (specifically, they were asked ‘What are you thinking about when you’re making your decision’). All participants reported trying to learn the reward and effort magnitudes and using the explicitly cued probabilities to make decisions. This suggested that participants were well aware of how to do the task before the beginning of the scan.

Experiment timings

The options were displayed for 1.4 to 4.5 s before participants could make a choice. After the choice was made, the chosen option was highlighted for 2.9 to 8.0 s. Next, the outcome was first displayed for the chosen option (1.9–2.1 s), then for the unchosen option (1.9–6.9 s). Participants then performed the effort exertion task (0–25 s). Finally, the trial ended with an ITI (2.3–7.5 s).
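As a quick arithmetic check on these timings, the sketch below sums the shortest and longest phase durations to bound the length of a single trial. The phase names are illustrative labels, and the bounds exclude the participant's response time, which the text does not specify.

```python
# Phase durations in seconds as (minimum, maximum), taken from the text above.
PHASES = {
    "options_displayed": (1.4, 4.5),
    "choice_highlighted": (2.9, 8.0),
    "outcome_chosen": (1.9, 2.1),
    "outcome_unchosen": (1.9, 6.9),
    "effort_exertion": (0.0, 25.0),
    "iti": (2.3, 7.5),
}


def trial_duration_bounds(phases=PHASES):
    """Return (shortest, longest) trial duration in seconds, rounded to 0.1 s."""
    shortest = round(sum(low for low, _ in phases.values()), 1)
    longest = round(sum(high for _, high in phases.values()), 1)
    return shortest, longest
```

Under these assumptions a trial lasts between 10.4 s and 54.0 s, with most of the spread coming from the effort phase.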

Data sharing

The data are publicly available from the Oxford University Research Archive (https://doi.org/10.5287/bodleian:PP805bgDz). Analysis scripts are available on request from the corresponding author. Source data files are provided with the article for all figures presented in the manuscript.

Task validation

We performed a logistic regression to validate that participants performed the task well, i.e. that they took all relevant task features into account when making their decisions. In the regression, we predicted whether participants chose again the same option as on the previous trial (‘stay’) or instead selected the alternative option (‘switch’). As predictors we included the displayed reward probabilities (from current trial, t) and the reward (‘RM’) and effort magnitudes (‘EM’) from the past four trials (t-1, t-2, t-3, t-4). These regressors were coded in the frame of reference of the ‘stay’ choice relative to the ‘switch’ choice [e.g. reward magnitude on the last trial (t-1) for the option that would be a ‘stay choice’ minus the reward magnitude (at t-1) for the alternative option]. All regressors were z-score normalized.

\begin{matrix} Y = \beta_{0} + \beta_{1} \text{RewardProbability}_{t} + \beta_{2} \text{RM}_{t-1} + \beta_{3} \text{RM}_{t-2} + \beta_{4} \text{RM}_{t-3} + \beta_{5} \text{RM}_{t-4} \\ + \beta_{6} \text{EM}_{t-1} + \beta_{7} \text{EM}_{t-2} + \beta_{8} \text{EM}_{t-3} + \beta_{9} \text{EM}_{t-4} \end{matrix}

We used ANOVAs to test whether participants could use the learnt information (main effect across the four reward magnitude (RM) or the four effort magnitude (EM) regression weights). We controlled for group assignment as a between-participant confound.
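The regressor coding described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: the trial data layout (a list of dicts with keys `choice`, `prob`, `rm`, `em`) is hypothetical, and the sketch assumes that the displayed probability is also coded as ‘stay’ minus ‘switch’. Fitting the weights would be done with any standard logistic regression routine and is not shown.

```python
import math
from statistics import mean, pstdev


def zscore(values):
    """Z-score normalize a list of numbers (population standard deviation)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def stay_minus_switch(value_a, value_b, stay_is_a):
    """Code a per-option quantity as 'stay' option minus 'switch' option."""
    return value_a - value_b if stay_is_a else value_b - value_a


def design_row(trials, t):
    """Regressors for trial t (t >= 4), ordered to match beta_1 .. beta_9.

    Each entry of `trials` is a dict with hypothetical keys: 'choice'
    ('A' or 'B') and 'prob', 'rm', 'em' (each a dict keyed by 'A'/'B').
    """
    # The 'stay' option is whatever was chosen on the previous trial.
    stay_is_a = trials[t - 1]["choice"] == "A"
    prob = trials[t]["prob"]
    row = [stay_minus_switch(prob["A"], prob["B"], stay_is_a)]
    for key in ("rm", "em"):
        for lag in (1, 2, 3, 4):
            past = trials[t - lag][key]
            row.append(stay_minus_switch(past["A"], past["B"], stay_is_a))
    return row


def p_stay(betas, row):
    """Logistic link: betas[0] is the intercept, betas[1:] match `row`."""
    y = betas[0] + sum(b * x for b, x in zip(betas[1:], row))
    return 1.0 / (1.0 + math.exp(-y))
```

In practice each column of the design matrix would be passed through `zscore` before fitting, as stated in the text.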

The same result can be illustrated by binning participants’ choices according to the predicted reward and effort magnitudes on each trial, as derived from a previously established Bayesian optimal observer model (Scholl et al., 2015), see also Figure 2—figure supplement 2A and the Materials and methods below for a validation of this model.

Behavioral modeling

We adapted a previously described computational learning model (Scholl et al., 2015) to measure how much participants used the information they learnt to guide their choices (γ). This Rescorla-Wagner learning model was fit to participants’ choices in the task. In short, the model consisted of three components. Firstly, the model held predictions of the mean reward/effort magnitudes underlying both outcomes. These were updated on every trial:

\mathrm{Prediction}_{t} = \mathrm{Prediction}_{t-1} + α * \mathrm{PE}_{t-1}

with

\mathrm{PE}_{t-1} = \mathrm{Outcome}_{t-1} - \mathrm{Prediction}_{t-1}

where α was the learning rate.
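The update rule above can be sketched in a few lines. This is a minimal illustration under assumed values: the function names, the starting prediction and the outcome sequence are hypothetical and are not taken from the study.

```python
def rw_update(prediction, outcome, alpha):
    """One Rescorla-Wagner step: move the prediction toward the outcome."""
    prediction_error = outcome - prediction
    return prediction + alpha * prediction_error

def rw_trajectory(outcomes, alpha, initial):
    """Return the initial prediction followed by the prediction after each outcome."""
    predictions = [initial]
    for outcome in outcomes:
        predictions.append(rw_update(predictions[-1], outcome, alpha))
    return predictions

# Hypothetical outcome sequence (magnitudes scaled to 0-1) with alpha = 0.5.
preds = rw_trajectory([1.0, 1.0, 0.0], alpha=0.5, initial=0.0)
# preds == [0.0, 0.5, 0.75, 0.375]
```

A larger α makes the prediction track recent outcomes more closely; a smaller α averages over a longer history.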

Secondly, the model combined these reward/effort magnitude predictions together with the reward probabilities (shown to participants on each trial) to calculate how valuable each option was (i.e. their utility).

\begin{array}{ll} \mathrm{Utility}_{\mathrm{OptionA}} & = (1 - γ) * \mathrm{Probability}_{\mathrm{Reward}} + γ * (λ * \mathrm{MagPrediction}_{\mathrm{Reward}} \\ & \quad - (1 - λ) * \mathrm{MagPrediction}_{\mathrm{Effort}}) \end{array}

λ describes to what extent participants relied more on reward versus effort magnitudes. In the present context, the parameter of interest that describes how much participants used the learnt information was γ.
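To make the roles of γ and λ concrete, the utility computation can be written as a small function. This is a sketch only: the argument names are hypothetical, and it assumes that probabilities and magnitude predictions are all expressed on a common 0 to 1 scale.

```python
def utility(prob_reward, mag_pred_reward, mag_pred_effort, gamma, lam):
    """Linear utility of one option.

    gamma weights the learnt information against the displayed reward probability;
    lam weights the predicted reward magnitude against the predicted effort magnitude.
    """
    learnt = lam * mag_pred_reward - (1.0 - lam) * mag_pred_effort
    return (1.0 - gamma) * prob_reward + gamma * learnt

# With gamma = 0 only the displayed probability matters;
# with gamma = 1 only the learnt magnitudes matter.
u_prob_only = utility(0.8, 0.6, 0.2, gamma=0.0, lam=0.5)
u_learnt_only = utility(0.8, 0.6, 0.2, gamma=1.0, lam=0.5)
```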

Thirdly, the model then compared the utility of the two options to predict choices, using a standard soft-max decision rule:

P(\mathrm{Option}_{A}) = \frac{e^{β * \mathrm{Utility}_{A}}}{e^{β * \mathrm{Utility}_{A}} + e^{β * \mathrm{Utility}_{B}}}

where β (inverse temperature) reflected participants’ tendency to pick the option with the higher utility.
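For two options, the soft-max rule reduces to a logistic function of the utility difference, which gives a compact way to compute it. The sketch below is illustrative; the function name is hypothetical and the branch on the sign of the argument is only there to avoid floating-point overflow.

```python
import math

def p_choose_a(utility_a, utility_b, beta):
    """Soft-max probability of choosing option A over option B.

    Algebraically equal to exp(beta*U_A) / (exp(beta*U_A) + exp(beta*U_B)).
    """
    x = beta * (utility_a - utility_b)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so this cannot overflow
    return z / (1.0 + z)

p_equal = p_choose_a(0.5, 0.5, beta=3.0)  # equal utilities give 0.5
```

With β near zero, choices approach chance; with large β, the option with the higher utility is chosen almost deterministically.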

We also considered alternative models (M2-M6, see Figure 2—figure supplement 2A). Firstly, these models differed in their number of learning rates: they either shared the same learning rate for reward and effort, or they had separate learning rates. Secondly, instead of computing utility as a linear combination of reward magnitudes and probability, utility could be computed based on a multiplicative integration of probability and reward:

\begin{array}{ll} \mathrm{Utility}_{\mathrm{OptionA}} & = (1 - λ) * \mathrm{Probability}_{\mathrm{Reward}} * \mathrm{MagPrediction}_{\mathrm{Reward}} \\ & \quad - λ * \mathrm{MagPrediction}_{\mathrm{Effort}} \end{array}

where λ was the relative effort (to reward) sensitivity.

Finally, to ensure that the previously described (Scholl et al., 2015) Bayesian optimal observer model that we used to illustrate the participants’ behavior in Figure 2A and to derive regressors for the fMRI analysis provided a good fit to the data, we also used a model with no fitted learning rate that instead used the predictions for reward and effort derived from the Bayesian model. We also note here that fMRI regressors derived from the Bayesian optimal observer model correlated very strongly (r > 0.99) with those obtained from the fitted reinforcement learning model and therefore using either type of model to obtain regressors does not affect our results.

All models were fit using Bayesian parameter estimation (Lee and Wagenmakers, 2014) as implemented in Stan (Carpenter et al., 2016). We used a hierarchical modeling approach, i.e. parameter estimates for individual participants were constrained by a group-level distribution of those parameters. We obtained three chains of 1000 samples after an initial warm-up of 1000 samples; convergence of chains was checked (Gelman and Rubin, 1992). Based on an initial fitting of individual participants, parameter ranges and transformations were selected so that parameters were reasonable, and normally distributed at the group level. Specifically, the learning rates, weight of learnt information and sensitivity to the effort were sampled from a group normal distribution on a scale from -∞ to +∞ and then transformed to a scale from 0 to 1; the inverse temperature was sampled from a group normal distribution on a scale from 0 to 1. For the group level distributions, mean values were given a flat prior in the allowed range and standard deviations were given a prior of mean zero and standard deviation 10 and constrained to be positive.
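The convergence check can be illustrated with the basic potential scale reduction factor of Gelman and Rubin (1992). This is a sketch for a single parameter and equal-length chains; whether the study used this basic form or the split-chain variant reported by Stan is not stated, so the exact variant is an assumption, and the function name is hypothetical.

```python
from statistics import mean, variance

def gelman_rubin(chains):
    """Basic R-hat for several equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return (pooled / within) ** 0.5

# Two hypothetical chains stuck in different regions give a value far above 1.
r_hat_bad = gelman_rubin([[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]])
```

Values close to 1 indicate that the chains agree with each other; values well above 1 indicate that they have not converged to a common distribution.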

We assessed model-fit in two ways. Firstly, we performed a cross validation using a half-split of the data: we fitted all participants’ data for either the session inside or outside the scanner and then used the estimated parameters to assess predictive accuracy (summed log likelihoods) for the data from the other session (Vehtari et al., 2016). Secondly, as an alternative method for model comparison, we also computed summed (across participants) BIC values for a non-hierarchical version of the models, fitted using Matlab’s fminsearch. We also used the parameter estimates derived from the separate sessions to examine test-retest reliability of the parameter estimates (Figure 2—figure supplement 2C).
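The second model-comparison measure can be sketched as follows, using the standard definition BIC = k·ln(n) − 2·ln(L), summed over participants. The function names and the numbers in the example are hypothetical and are not results from the study.

```python
import math

def bic(log_likelihood, n_params, n_trials):
    """Bayesian information criterion for one participant (lower is better)."""
    return n_params * math.log(n_trials) - 2.0 * log_likelihood

def summed_bic(fits):
    """Sum BIC over participants; fits holds (log_likelihood, n_params, n_trials)."""
    return sum(bic(ll, k, n) for ll, k, n in fits)

# Hypothetical maximum log-likelihoods for two participants, 4 parameters, 240 trials.
total = summed_bic([(-120.0, 4, 240), (-135.5, 4, 240)])
```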

In supplementary analyses, we validated the model (M1) further (Figure 2—figure supplement 1). To check that our model was indeed able to capture participants’ behavior, we simulated data from 10 sets of 27 participants with parameter group mean and standard deviations as derived from the real data. We analyzed these simulated data using the same regression and model-fitting approaches as described above. To illustrate our behavioral effect of interest, differences in the use of learnt information (γ), we simulated another two groups of 270 participants whose mean γ was at the extreme ends of the confidence intervals for those found in real participants.

Spectroscopy

Spectroscopy and fMRI data were acquired using a Siemens Verio 3 Tesla MRI scanner (32-channel coil). Spectroscopy data were obtained from dACC. Previous studies have shown that spectroscopy measurements of neurotransmitter levels are region specific (Emir et al., 2012; van der Veen and Shen, 2013). First, a high-resolution T1-weighted scan was acquired using an MPRAGE sequence. Based on this scan, the spectroscopy voxel (2 × 2 × 2 cm) was centered on dACC by reference to the location of the corpus callosum, the cingulate and adjacent sulci. The center of gravity of the region of maximum voxel overlap across participants lay at x = 0, y = 28, z = 28 in the Montreal Neurological Institute (MNI) space. The relatively large size of the spectroscopy voxel meant that it extended to include tissue in the paracingulate sulcus in those participants in whom it was present. MRS data (128 samples) were acquired using the SPECIAL sequence (Mekle et al., 2009; Mlynárik et al., 2006) as described previously (Stagg et al., 2011). The data were preprocessed using the FID-Appliance (github.com/CIC-methods/FID-A [Simpson et al., 2015]) to correct for frequency and phase drift. The data were then analyzed using LCModel (Provencher, 2001). Voxels for which Cramér-Rao lower bound values exceeded 20% were excluded. GABA and glutamate values were divided by total creatine. To correct for partial volumes within the spectroscopy voxels, all analyses included as confound regressors the relative volumes of grey and white matter (i.e. grey or white matter divided by total tissue = grey + white + cerebrospinal fluid) and total tissue in the spectroscopy voxel. These values were obtained using FAST (FMRIB’s automated segmentation tool, [Zhang et al., 2001]). The results were independent of the precise manner of controlling for partial volume (see Figure 3—figure supplement 3).
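The quality criterion, creatine referencing and tissue fractions described above amount to simple arithmetic, sketched here. The function names and example values are hypothetical; in the study the concentrations and Cramér-Rao bounds came from LCModel and the tissue volumes from FAST.

```python
def metabolite_ratio(concentration, crlb_percent, total_creatine, crlb_limit=20.0):
    """Metabolite level relative to total creatine, or None if the fit is excluded."""
    if crlb_percent > crlb_limit:
        return None  # Cramér-Rao lower bound above the limit: exclude this measurement
    return concentration / total_creatine

def tissue_fractions(grey, white, csf):
    """Relative grey and white matter volumes within the spectroscopy voxel."""
    total = grey + white + csf  # total tissue as defined in the text
    return grey / total, white / total

# Hypothetical values for one participant.
gaba_ratio = metabolite_ratio(1.0, crlb_percent=10.0, total_creatine=8.0)
grey_frac, white_frac = tissue_fractions(grey=4.0, white=3.0, csf=1.0)
```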

Relating behavior to spectroscopy

As in previous reports (Jocham et al., 2012), we found that glutamate and GABA were correlated (r = 0.47, p=0.013). Therefore, to be able to measure the separate impact of glutamate and GABA on the use of the learnt information, we performed nonparametric (Spearman) partial correlations, between the use of learnt information (γ) and either glutamate or GABA, controlling for the other neurotransmitter. In each analysis, we additionally controlled for group assignment, inverse temperature (β, from the behavioral model), relative gray and white matter and total tissue. To reduce the number of multiple comparisons in our initial analysis, when testing each model parameter for its relationship to glutamate and GABA, we combined glutamate and GABA to one value (glutamate minus GABA). This also reflected our initial hypothesis that it might not be each neurotransmitter in isolation that influences neural activity and behavior but rather the relationship between glutamate and GABA that is critical.
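The logic of a Spearman partial correlation can be sketched for the simplest case of a single control variable, for example relating γ to glutamate while controlling for GABA. This is an illustration only: the analysis described above controlled for several further covariates, which this first-order formula does not handle, and all function names are hypothetical.

```python
from statistics import mean

def ranks(xs):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_partial(x, y, z):
    """Spearman correlation between x and y, controlling for one variable z."""
    rx, ry, rz = ranks(x), ranks(y), ranks(z)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5
```

Because the computation operates on ranks, it is insensitive to monotonic transformations of the variables and less affected by outlying values than a Pearson partial correlation.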

FMRI

Data acquisition

For the fMRI, we used a Deichmann echo-planar imaging (EPI) sequence (Deichmann et al., 2003) [repetition time (TR): 3 s; 3 × 3 × 3 mm voxel size; echo time (TE): 30 ms; flip angle: 87°; slice angle of 15° with local z-shimming] to minimize signal distortions in orbitofrontal brain areas. This entailed orienting the field-of-view at approximately 30° with respect to the AC-PC line. We acquired between 1100 and 1300 volumes (depending on the time needed to complete the task) of 45 slices per participant.

Preprocessing

FMRI data were analyzed using FMRIB’s Software Library (FSL [Smith et al., 2004]; see also [Scholl et al., 2015]), run on a local computer using HTCondor (Thain et al., 2005) and code from NeuroDebian (Halchenko and Hanke, 2012). We used the standard settings in FSL (Smith et al., 2004) for image pre-processing and analysis. Motion was corrected using the FSL tool MCFLIRT (Jenkinson et al., 2002). This also provided six motion regressors that we included in the FMRI analyses. Functional images were first spatially smoothed (Gaussian kernel with 5 mm full-width half-maximum) and temporally high-pass filtered (3 dB cut-off of 100 s). Afterward, the functional data were manually denoised using probabilistic independent component analysis (Beckmann and Smith, 2004), visually identifying and regressing out obvious noise components (Kelly et al., 2010); for each participant, we considered only the first 40 components (out of a total of up to 550), as these had the greatest potential to interfere with task data. We used the Brain Extraction Tool (BET) from FSL (Smith, 2002) on the high-resolution structural MRI images to separate brain matter from non-brain matter. The resulting images guided registration of functional images to Montreal Neurological Institute (MNI) space using non-linear registrations as implemented in FNIRT (Jenkinson et al., 2012). The data were pre-whitened before analysis to account for temporal autocorrelations (Woolrich et al., 2001).
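As a small worked example of the smoothing parameter, a Gaussian kernel specified by its full-width at half-maximum (FWHM) corresponds to a standard deviation of FWHM / (2·sqrt(2·ln 2)). The function name below is hypothetical.

```python
import math

def fwhm_to_sigma(fwhm_mm):
    """Standard deviation (mm) of a Gaussian kernel with the given FWHM (mm)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

sigma_mm = fwhm_to_sigma(5.0)  # roughly 2.12 mm for the 5 mm kernel
```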

Data analysis

In the first analysis (GLM1), we looked for brain areas that showed activity varying with the reward and effort information to be learnt. A full list of regressors and correlations between them is shown in Figure 3—figure supplement 4 (all r<0.33). We used three boxcar regressors, indicating the onset and duration of the decision phase (from the beginning of the trial until participants made a choice), the onset and duration of the outcome phase (from the appearance of the chosen outcome until the chosen and the unchosen outcomes disappeared from the screen) and lastly the effort exertion phase (from the appearance of the first effort target until participants had removed the last target). In the outcome phase, we included the following parametric regressors: whether a reward was delivered for the chosen option, the reward probability for the chosen option and the reward and effort magnitude outcomes for the chosen and the unchosen option. In each case, separate regressors for the chosen and the unchosen option were used.

The main contrast of interest (Figure 3B) was the total information to be learnt, i.e. the contrast of the relative (chosen minus unchosen) reward minus effort magnitude:

\begin{array}{l} \mathrm{LearntInformation}_{\mathrm{Contrast}} = \\ (\mathrm{RewardMagnitude}_{\mathrm{Chosen}} - \mathrm{RewardMagnitude}_{\mathrm{Unchosen}}) - \\ (\mathrm{EffortMagnitude}_{\mathrm{Chosen}} - \mathrm{EffortMagnitude}_{\mathrm{Unchosen}}) \end{array}
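Expanding the brackets shows that this contrast assigns weights of +1, −1, −1 and +1 to the four outcome-phase magnitude regressors. A minimal sketch of applying those weights to a set of parameter estimates follows; the dictionary keys and the example estimates are hypothetical.

```python
# Contrast weights over the four outcome-phase magnitude regressors.
CONTRAST = {
    "reward_chosen": 1.0,
    "reward_unchosen": -1.0,
    "effort_chosen": -1.0,
    "effort_unchosen": 1.0,
}

def learnt_information_contrast(betas):
    """Weighted sum of regression estimates for the 'information to be learnt'."""
    return sum(weight * betas[name] for name, weight in CONTRAST.items())

# Hypothetical parameter estimates for one voxel.
example = learnt_information_contrast(
    {"reward_chosen": 0.4, "reward_unchosen": 0.1,
     "effort_chosen": 0.3, "effort_unchosen": 0.2}
)  # (0.4 - 0.1) - (0.3 - 0.2)
```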

We used FSL’s FLAME 1 + 2 (Woolrich et al., 2004) to perform higher-level analyses; outlier de-weighting was used. We included group assignment as a confound regressor. Results were cluster-corrected (p<0.05, voxel inclusion threshold: z = 2.3).

Next, we tested whether individual differences in how much participants could use the learnt information related to differences in neural signals (GLM2). For this, we included at the group level the behavioral measure γ as a covariate. We again included group assignment, as well as inverse temperature (β) as confound regressors. The results were cluster-corrected (p<0.05, voxel inclusion threshold: z = 2.3).

Relating fMRI to spectroscopy measures

We tested how measures of GABA and glutamate influenced the neural signal of the information to be learnt (GLM3). For this, we included at the group level GABA and glutamate measurements as covariates. As confound regressors we included, as in the behavioral analysis, group assignment, inverse temperature (β), as well as gray matter (voxel-wise, obtained using FSL’s feat_gm_prepare), relative white matter and total tissue in the spectroscopy voxel. We combined regressors for the effect of glutamate and GABA to a single contrast for statistical testing (i.e. glutamate minus GABA levels). We used the group average spectroscopy voxel as a mask; results were again cluster-corrected (p<0.05, voxel inclusion threshold: z = 2.3).

Acknowledgement

The authors thank Gerhard Jocham for helpful advice on methods and data analysis.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Funding Information

This paper was supported by the following grants:

Medical Research Council MR/N014448/1 to Jacqueline Scholl.
Wellcome Trust 092759/Z/10/Z to Jacqueline Scholl.
Wellcome Trust 089280/Z/09/Z to Nils Kolling.
Wellcome Trust WT100973AIA to Matthew FS Rushworth.
Christ Church College Stipendiary Junior Research Fellowship to Nils Kolling.

Additional information

Competing interests

The authors declare that no competing interests exist.

Author contributions

JS, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article.

NK, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article.

NN, Acquisition of data, Drafting or revising the article.

CJS, Acquisition of data, Drafting or revising the article.

CJH, Conception and design, Analysis and interpretation of data, Drafting or revising the article.

MFSR, Conception and design, Analysis and interpretation of data, Drafting or revising the article.

Ethics

Human subjects: Participants gave informed consent to take part in the study, which was approved by the NRES Committee South Central - Portsmouth (12/SC/0276).

Additional files

Major datasets

The following dataset was generated:

Scholl J, Kolling N, Rushworth M, Harmer C, 2016, Neurotransmitters in learning and decision-making - Data, https://ora.ox.ac.uk/objects/uuid:bde6944b-3021-4942-a2cb-9f76fef12788, Publicly available at the Oxford University Research Archive (https://ora.ox.ac.uk)

References

Amiez C, Joseph JP, Procyk E. Reward encoding in the monkey anterior cingulate cortex. Cerebral Cortex. 2006;16:1040–1055. doi: 10.1093/cercor/bhj046.
Beckmann CF, Smith SM. Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE Transactions on Medical Imaging. 2004;23:137–152. doi: 10.1109/TMI.2003.822821.
Boorman ED, Behrens TE, Woolrich MW, Rushworth MF. How green is the grass on the other side? frontopolar cortex and the evidence in favor of alternative courses of action. Neuron. 2009;62:733–743. doi: 10.1016/j.neuron.2009.05.014.
Carpenter B, Gelman A, Hoffman M, Lee D, Goodrich B. Stan: A probabilistic programming language. Journal of Statistical Software. 2016 doi: 10.18637/jss.v076.i01. In press.
Danielmeier C, Allen EA, Jocham G, Onur OA, Eichele T, Ullsperger M. Acetylcholine mediates behavioral and neural post-error control. Current Biology. 2015;25:1461–1468. doi: 10.1016/j.cub.2015.04.022.
Deichmann R, Gottfried JA, Hutton C, Turner R. Optimized EPI for fMRI studies of the orbitofrontal cortex. NeuroImage. 2003;19:430–441. doi: 10.1016/S1053-8119(03)00073-9.
Emir UE, Auerbach EJ, Van De Moortele PF, Marjańska M, Uğurbil K, Terpstra M, Tkáč I, Oz G. Regional neurochemical profiles in the human brain measured by ¹H MRS at 7 T using local B₁ shimming. NMR in Biomedicine. 2012;25:152–160. doi: 10.1002/nbm.1727.
Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Statistical Science. 1992;7:457–472. doi: 10.1214/ss/1177011136.
Goodkind M, Eickhoff SB, Oathes DJ, Jiang Y, Chang A, Jones-Hagata LB, Ortega BN, Zaiko YV, Roach EL, Korgaonkar MS, Grieve SM, Galatzer-Levy I, Fox PT, Etkin A. Identification of a common neurobiological substrate for mental illness. JAMA Psychiatry. 2015;72:305–315. doi: 10.1001/jamapsychiatry.2014.2206.
Halchenko YO, Hanke M. Open is not enough. let's Take the Next Step: An Integrated, Community-Driven Computing Platform for Neuroscience. Frontiers in Neuroinformatics. 2012;6:22. doi: 10.3389/fninf.2012.00022.
Heilbronner SR, Hayden BY. Dorsal anterior cingulate cortex: A Bottom-Up view. Annual Review of Neuroscience. 2016;39:149–170. doi: 10.1146/annurev-neuro-070815-013952.
Holroyd CB, Yeung N. Motivation of extended behaviors by anterior cingulate cortex. Trends in Cognitive Sciences. 2012;16:122–128. doi: 10.1016/j.tics.2011.12.008.
Hunt LT, Behrens TE, Hosokawa T, Wallis JD, Kennerley SW. Capturing the temporal evolution of choice across prefrontal cortex. eLife. 2015;4:e11945. doi: 10.7554/eLife.11945.
Hunt LT, Kolling N, Soltani A, Woolrich MW, Rushworth MFS, Behrens TEJ. Mechanisms underlying cortical activity during value-guided choice. Nature Neuroscience. 2012;15:470–476. doi: 10.1038/nn.3017.
Jenkinson M, Bannister P, Brady M, Smith S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. NeuroImage. 2002;17:825–841. doi: 10.1006/nimg.2002.1132.
Jenkinson M, Beckmann CF, Behrens TE, Woolrich MW, Smith SM. FSL. NeuroImage. 2012;62:782–790. doi: 10.1016/j.neuroimage.2011.09.015.
Jocham G, Hunt LT, Near J, Behrens TE. A mechanism for value-guided choice based on the excitation-inhibition balance in prefrontal cortex. Nature Neuroscience. 2012;15:960–961. doi: 10.1038/nn.3140.
Karlsson MP, Tervo DG, Karpova AY. Network resets in medial prefrontal cortex mark the onset of behavioral uncertainty. Science. 2012;338:135–139. doi: 10.1126/science.1226518.
Kelly RE, Alexopoulos GS, Wang Z, Gunning FM, Murphy CF, Morimoto SS, Kanellopoulos D, Jia Z, Lim KO, Hoptman MJ. Visual inspection of independent components: defining a procedure for artifact removal from fMRI data. Journal of Neuroscience Methods. 2010;189:233–245. doi: 10.1016/j.jneumeth.2010.03.028.
Kennerley SW, Walton ME, Behrens TE, Buckley MJ, Rushworth MF. Optimal decision making and the anterior cingulate cortex. Nature Neuroscience. 2006;9:940–947. doi: 10.1038/nn1724.
Khamassi M, Wilson C, Rothé R, Quilodran R, Dominey PF. Meta-learning, cognitive control, and physiological interactions between medial and lateral prefrontal cortex. Neural Basis of Motivational and Cognitive Control. 2011:351–370. doi: 10.7551/mitpress/9780262016438.003.0019.
Kolling N, Behrens T, Wittmann MK, Rushworth M. Multiple signals in anterior cingulate cortex. Current Opinion in Neurobiology. 2016a;37:36–43. doi: 10.1016/j.conb.2015.12.007.
Kolling N, Behrens TE, Mars RB, Rushworth MF. Neural mechanisms of foraging. Science. 2012;336:95–98. doi: 10.1126/science.1216930.
Kolling N, Wittmann M, Rushworth MF. Multiple neural mechanisms of decision making and their competition under changing risk pressure. Neuron. 2014;81:1190–1202. doi: 10.1016/j.neuron.2014.01.033.
Kolling N, Wittmann MK, Behrens TE, Boorman ED, Mars RB, Rushworth MF. Value, search, persistence and model updating in anterior cingulate cortex. Nature Neuroscience. 2016b;19:1280–1285. doi: 10.1038/nn.4382.
Lee MD, Wagenmakers E-J. Bayesian Cognitive Modeling: A Practical Course. Cambridge University Press; 2014.
Mars RB, Jbabdi S, Sallet J, O'Reilly JX, Croxson PL, Olivier E, Noonan MP, Bergmann C, Mitchell AS, Baxter MG, Behrens TE, Johansen-Berg H, Tomassini V, Miller KL, Rushworth MF. Diffusion-weighted imaging tractography-based parcellation of the human parietal cortex and comparison with human and macaque resting-state functional connectivity. Journal of Neuroscience. 2011;31:4087–4100. doi: 10.1523/JNEUROSCI.5102-10.2011.
Meder D, Haagensen BN, Hulme O, Morville T, Gelskov S, Herz DM, Diomsina B, Christensen MS, Madsen KH, Siebner HR. Tuning the Brake while raising the stake: Network dynamics during sequential Decision-Making. Journal of Neuroscience. 2016;36:5417–5426. doi: 10.1523/JNEUROSCI.3191-15.2016.
Mekle R, Mlynárik V, Gambarota G, Hergt M, Krueger G, Gruetter R. MR spectroscopy of the human brain with enhanced signal intensity at ultrashort echo times on a clinical platform at 3t and 7t. Magnetic Resonance in Medicine. 2009;61:1279–1285. doi: 10.1002/mrm.21961.
Mlynárik V, Gambarota G, Frenkel H, Gruetter R. Localized short-echo-time proton MR spectroscopy with full signal-intensity acquisition. Magnetic Resonance in Medicine. 2006;56:965–970. doi: 10.1002/mrm.21043.
Mori S, Wakana S, Van Zijl PCM, Nagae-Poetscher LM. MRI Atlas of Human White Matter. Elsevier; 2005. p. 284.
Neubert FX, Mars RB, Sallet J, Rushworth MF. Connectivity reveals relationship of brain areas for reward-guided learning and decision making in human and monkey frontal cortex. PNAS. 2015;112:E2695–E2704. doi: 10.1073/pnas.1410767112.
O'Reilly JX, Schüffelgen U, Cuell SF, Behrens TE, Mars RB, Rushworth MF. Dissociable effects of surprise and model update in parietal and anterior cingulate cortex. PNAS. 2013;110:E3660–E3669. doi: 10.1073/pnas.1305373110.
Provencher SW. Automatic quantitation of localized in vivo 1H spectra with LCModel. NMR in Biomedicine. 2001;14:260–264. doi: 10.1002/nbm.698.
Rushworth MF, Noonan MP, Boorman ED, Walton ME, Behrens TE. Frontal cortex and reward-guided learning and decision-making. Neuron. 2011;70:1054–1069. doi: 10.1016/j.neuron.2011.05.014.
Sallet J, Mars RB, Noonan MP, Neubert FX, Jbabdi S, O'Reilly JX, Filippini N, Thomas AG, Rushworth MF. The organization of dorsal frontal cortex in humans and macaques. Journal of Neuroscience. 2013;33:12255–12274. doi: 10.1523/JNEUROSCI.5108-12.2013.
Scholl J, Günthner J, Kolling N, Favaron E, Rushworth MF, Harmer CJ, Reinecke A. A role beyond learning for NMDA receptors in reward-based decision-making-a pharmacological study using d-cycloserine. Neuropsychopharmacology. 2014;39:2900–2909. doi: 10.1038/npp.2014.144.
Scholl J, Kolling N, Nelissen N, Wittmann MK, Harmer CJ, Rushworth MF. The good, the Bad, and the Irrelevant: Neural Mechanisms of Learning Real and Hypothetical Rewards and Effort. Journal of Neuroscience. 2015;35:11233–11251. doi: 10.1523/JNEUROSCI.0396-15.2015.
Shima K, Tanji J. Role for Cingulate Motor Area cells in voluntary movement selection based on reward. Science. 1998;282:1335–1338. doi: 10.1126/science.282.5392.1335.
Simpson R, Devenyi GA, Jezzard P, Hennessy TJ, Near J. Advanced processing and simulation of MRS data using the FID appliance (FID-A)-An open source, MATLAB-based toolkit. Magnetic Resonance in Medicine. 2015;77:23–33. doi: 10.1002/mrm.26091.
Smith SM, Jenkinson M, Woolrich MW, Beckmann CF, Behrens TE, Johansen-Berg H, Bannister PR, De Luca M, Drobnjak I, Flitney DE, Niazy RK, Saunders J, Vickers J, Zhang Y, De Stefano N, Brady JM, Matthews PM. Advances in functional and structural MR image analysis and implementation as FSL. NeuroImage. 2004;23 Suppl 1:S208–219. doi: 10.1016/j.neuroimage.2004.07.051.
Smith SM. Fast robust automated brain extraction. Human Brain Mapping. 2002;17:143–155. doi: 10.1002/hbm.10062.
Stagg CJ, Best JG, Stephenson MC, O'Shea J, Wylezinska M, Kincses ZT, Morris PG, Matthews PM, Johansen-Berg H. Polarity-sensitive modulation of cortical neurotransmitters by transcranial stimulation. Journal of Neuroscience. 2009;29:5202–5206. doi: 10.1523/JNEUROSCI.4432-08.2009.
Stagg CJ, Bestmann S, Constantinescu AO, Moreno LM, Allman C, Mekle R, Woolrich M, Near J, Johansen-Berg H, Rothwell JC. Relationship between physiological measures of excitability and levels of glutamate and GABA in the human motor cortex. The Journal of Physiology. 2011;589:5845–5855. doi: 10.1113/jphysiol.2011.216978.
Stoll FM, Fontanier V, Procyk E. Specific frontal neural dynamics contribute to decisions to check. Nature Communications. 2016;7:11990. doi: 10.1038/ncomms11990.
Sumner P, Edden RA, Bompas A, Evans CJ, Singh KD. More GABA, less distraction: a neurochemical predictor of motor decision speed. Nature Neuroscience. 2010;13:825–827. doi: 10.1038/nn.2559.
Terhune DB, Russo S, Near J, Stagg CJ, Cohen Kadosh R. GABA predicts time perception. Journal of Neuroscience. 2014;34:4364–4370. doi: 10.1523/JNEUROSCI.3972-13.2014.
Tervo DG, Proskurin M, Manakov M, Kabra M, Vollmer A, Branson K, Karpova AY. Behavioral variability through stochastic choice and its gating by anterior cingulate cortex. Cell. 2014;159:21–32. doi: 10.1016/j.cell.2014.08.037.
Thain D, Tannenbaum T, Livny M. Distributed computing in practice: the condor experience. Concurrency and Computation: Practice and Experience. 2005;17:323–356. doi: 10.1002/cpe.938.
Ullsperger M, Danielmeier C, Jocham G. Neurophysiology of performance monitoring and adaptive behavior. Physiological Reviews. 2014;94:35–79. doi: 10.1152/physrev.00041.2012.
van der Veen JW, Shen J. Regional difference in GABA levels between medial prefrontal and occipital cortices. Journal of Magnetic Resonance Imaging. 2013;38:745–750. doi: 10.1002/jmri.24009.
Vehtari A, Gelman A, Gabry J. Practical bayesian model evaluation using leave-one-out cross-validation and WAIC. arXiv.org. 2016 arXiv:1507.04544
Wittmann MK, Kolling N, Akaishi R, Chau BK, Brown JW, Nelissen N, Rushworth MF. Predictive decision making driven by multiple time-linked reward representations in the anterior cingulate cortex. Nature Communications. 2016;7:12327. doi: 10.1038/ncomms12327.
Woolrich MW, Behrens TE, Beckmann CF, Jenkinson M, Smith SM. Multilevel linear modelling for FMRI group analysis using bayesian inference. NeuroImage. 2004;21:1732–1747. doi: 10.1016/j.neuroimage.2003.12.023.
Woolrich MW, Ripley BD, Brady M, Smith SM. Temporal autocorrelation in univariate linear modeling of FMRI data. NeuroImage. 2001;14:1370–1386. doi: 10.1006/nimg.2001.0931.
Yüksel C, Öngür D. Magnetic resonance spectroscopy studies of glutamate-related abnormalities in mood disorders. Biological Psychiatry. 2010;68:785–794. doi: 10.1016/j.biopsych.2010.06.016.
Zhang Y, Brady M, Smith S. Segmentation of brain MR images through a hidden markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging. 2001;20:45–57. doi: 10.1109/42.906424.

eLife. 2017 Jan 5;6:e20365. doi: 10.7554/eLife.20365.024

Decision letter

Editor: Joshua I Gold¹

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Excitation and inhibition in dorsal anterior cingulate predict brain activity and use of past experiences" for consideration by eLife. Your article has been favorably evaluated by David Van Essen as the Senior Editor and three reviewers, including Emmanuel Procyk (Reviewer #3) and Joshua Gold, who is a member of our Board of Reviewing Editors.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This study tested whether levels of Glu and GABA, as measured by MR spectroscopy in dorsal anterior cingulate cortex (dACC), correlate with brain function and behavioral markers attributed to dACC. They first made these measures in the rostral part of the mid-cingulate cortex, taking into account sulcal morphology. The defined ROI in the region served for spectroscopic measurements and for comparisons with BOLD measurements obtained while subjects performed a multidimensional learning task. The task tested the degree to which the subjects used outcome information to adapt decisions to choose between alternatives. A model-based approach, validated by model comparisons, allowed the authors to fit subjects' behavior and extract a parameter of interest describing the tendency of subjects to use learnt information.

The authors present three primary findings: 1) dACC levels of Glu were positively correlated and of GABA were negatively correlated with the degree of experience-based learning on a subject-by-subject basis; 2) a partly overlapping brain area represented the information to be learned in a manner that was also correlated with the behavioral effect; and 3) there was also a relationship between the strength of this representation and neurotransmitter levels. In addition, this study replicated previous work with the multidimensional learning task, thereby strengthening the notion that dACC mediates updating of cognitive models used to drive adaptive reward-guided behavior.

The reviewers were in agreement that this study is interesting and relevant, in that it addresses mechanisms of adaptive decisions, which is a critical aspect of higher brain function. In addition, the methodology is sound, and the paper is clearly written.

Essential revisions:

1) The task has some nice features, including the use of explicit and learned information that can be used flexibly to guide behavior. However, it also seems awfully complicated to allow for a single parameter to effectively describe the overall influence of learned information on behavior. Some issues that this raises: 1) is the learning parameter even consistent for a single subject over time? What is the test/retest reliability? 2) Were there systematic relationships between how well the model fit behavior for individual subjects and the MRS/fMRI activity?

2) The main results obviously depend on statistical outcomes that hopefully are not overly sensitive to multiple comparisons and other factors. Was that tested? The glutamate result presented in Figure 3A seems particularly weak; does it survive a non-parametric test? Did the neurotransmitter measures relate to any of the other behavioral parameters?

3) It would be useful to discuss in more detail the interpretation of the findings in terms of the role of dACC in adaptive behavior. For example, why is it expected that dACC BOLD inversely correlates with the information to be learnt, and how does this finding logically lead to the negative correlation with GABA? The answer to this might explain why the testing of alternative areas concerned only the deactivated network (Table 1A). On a related note, are there more specific points to be made regarding the meaning of the sign of the "information to be learned" relative difference? Likewise, the results suggest that more GABA in dACC leads to less use of information to guide behavior. How does this relate to more random behavior found in rats? More generally, a more mechanistic interpretation of the role of GABA/GLU in reducing integration of reward-history (or other behaviorally relevant computations) would be useful.

eLife. 2017 Jan 5;6:e20365. doi: 10.7554/eLife.20365.025

Author response

Essential revisions:

We agree with the reviewer that our experiment is more complex than most learning tasks. This is because we wished to investigate specific aspects of behavior. However, we believe that we can capture the key component of using learned information as opposed to instructed information with one specific parameter that we can separate from the other major determinants of behavior. We have been able to measure this component independently of learning speeds (for rewards or for costs), inverse temperature, and sensitivity to costs vs. rewards.

The reviewer is of course also correct in pointing out that in such a complex task it is important to test whether the model fits reliably. We have now expanded the manuscript with two further figures (Figure 2—figure supplements 1+2) that assess appropriateness of the model, model fit and reliability, and which illustrate the modeling approach better.

In a new supplementary figure (Figure 2—figure supplement 1), we simulated agents that make decisions in exactly the same task as our participants, with behavior being guided by the same computational model (M1) that we use throughout the paper. Importantly, these simulated agents value the different choices according to a weighted sum of learnt information and cued information that is traded-off by a single parameter, the use of learnt information, γ. We find that these simulated agents behave just like our human participants, as revealed by a regression analysis (Figure 2—figure supplement 1B) analogous to the one performed on human participants’ data (Figure 2B). We also find that when we then use our computational model to analyze the simulated participants’ data, we can recover the model parameters well (Figure 2—figure supplement 1A). This further suggests that a model that describes the overall influence of learnt information with a single parameter is appropriate for our task.

We have now also added an illustration of how a change in the use of learnt information will affect behavior (Figure 2—figure supplement 1C): we simulated agents that differ in the value of their parameter determining the use of learnt information. We find, when binning the simulated choices by either the learnt value or the explicitly cued value, that when γ is higher, behavior is, as expected, more driven by the learnt information. In contrast, when γ is lower, behavior is more driven by differences between the options in terms of their cued value (i.e. explicitly shown probabilities).
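The trade-off described above can be illustrated with a minimal sketch. It assumes, for illustration only, that each option's value is a γ-weighted sum of a learnt value and a cued value and that choices follow a two-option softmax with inverse temperature β. The function names, the exact functional form and all numbers are hypothetical and are not taken from model M1.

```python
import math

def option_value(learnt, cued, gamma):
    """Assumed form: gamma-weighted sum of learnt and cued value."""
    return gamma * learnt + (1.0 - gamma) * cued

def p_choose_a(learnt_a, cued_a, learnt_b, cued_b, gamma, beta=5.0):
    """Softmax (logistic) probability of choosing option A over option B.

    beta is the inverse temperature; its default here is arbitrary.
    """
    diff = (option_value(learnt_a, cued_a, gamma)
            - option_value(learnt_b, cued_b, gamma))
    return 1.0 / (1.0 + math.exp(-beta * diff))

# Hypothetical trial: A is better on learnt value, B is better on cued value.
high_gamma = p_choose_a(0.8, 0.3, 0.2, 0.7, gamma=0.8)
low_gamma = p_choose_a(0.8, 0.3, 0.2, 0.7, gamma=0.2)
print(f"P(choose A) with gamma=0.8: {high_gamma:.3f}")
print(f"P(choose A) with gamma=0.2: {low_gamma:.3f}")
```

Under these assumptions, the agent with the higher γ prefers the option favored by learnt information, whereas the agent with the lower γ prefers the option favored by the cued information.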

We have expanded the Methods (section ‘Behavioral Modeling’) in the following way to reflect these considerations:

“In supplementary analyses, we validated the model (M1) further (Figure 2—figure supplement 1). […] To illustrate our behavioral effect of interest, differences in the use of learnt information (γ), we simulated another two groups of 270 participants whose mean γ was at the extreme ends of the confidence intervals for those found in real participants.”

We have also included a new supplementary figure showing the new results of model validation (Figure 2—figure supplement 1).

Some issues that this raises: 1) is the learning parameter even consistent for a single subject over time? What is the test/retest reliability?

We have carried out a comparison of the learning parameter (use of learnt information γ), and other parameters used to describe our participants’ behavior, across time periods as requested by the reviewer. In short, we find that parameters are indeed very consistent over time.

Our behavioral data was collected in two separate sessions, one inside the MRI scanner, while we collected fMRI data (120 trials) and one outside the scanner, on the next day (120 trials). We can therefore examine the test/retest reliability of parameter estimates based on either the 120 trials inside or the 120 trials outside the scanner (collected on the next day). To do this, we have changed our modeling approach to employ a hierarchical Bayesian model fitting method as implemented in Stan (Carpenter et al., 2016), as this has been suggested to provide more robust fits when less data per participant is available (a detailed description of this new fitting approach has been included in the manuscript and is also shown below). First, we note that the parameters, based on data from both sessions, obtained using either the hierarchical or the non-hierarchical approach that we used before are very highly correlated (see Author response image 1).

Author response image 1. — **DOI:** http://dx.doi.org/10.7554/eLife.20365.020

Second, using the hierarchical fitting approach on data from each session separately, we find (Figure 2—figure supplement 2C) a strong correlation between the parameter estimates for the two sessions for our main parameter of interest, γ (how much participants used what they had learnt), and also for all other parameters of the model. We therefore conclude that our model parameters are reliable and robust within individual subjects across time (at least from one day to the next).

We also note that as this form of test/retest reliability was only done on half the data, it seems likely that parameters fitted on the whole data set (which is the approach that we use in the main paper) would be estimated even more precisely and therefore show even higher test/re-test reliability.
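A test/retest check of this kind can be sketched as a correlation between per-participant estimates from the two sessions. The example below uses a rank (Spearman) correlation, which is an assumption made for illustration; the γ values are invented and the helper names are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented gamma estimates for five participants in the two sessions.
gamma_mri_session = [0.21, 0.35, 0.48, 0.60, 0.74]
gamma_outside_session = [0.25, 0.52, 0.31, 0.58, 0.80]
reliability = spearman(gamma_mri_session, gamma_outside_session)
print(f"test/retest rho = {reliability:.2f}")
```

A value close to 1 would indicate that participants keep their rank order across sessions; with real data one would also report a p-value or interval, which this sketch omits.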

We have expanded the Methods (section ‘Behavioral modeling’) to describe this new hierarchical model fitting approach:

“All models were fit using Bayesian parameter estimation (Lee and Wagenmakers, 2014) as implemented in Stan (Carpenter et al., 2016). We used a hierarchical modeling approach, i.e. parameter estimates for individual participants were constrained by a group-level distribution of those parameters. We obtained three chains of 1000 samples after an initial warm-up of […] We also used the parameter estimates derived from the separate sessions to examine test-retest reliability of the parameter estimates (Figure 2—figure supplement 2C).”

We have included a new supplementary figure to show the test/re-test reliability (Figure 2—figure supplement 2 panel C).

2) Were there systematic relationships between how well the model fit behavior for individual subjects and the MRS/fMRI activity?

In short, there were no relationships between model fit and our measures of interest. Additionally, controlling for a proxy of individual differences in model fit in our analyses does not affect any of our results.

Using model-fitting based on individual participants (i.e. without a hierarchical model), we obtain one measure of model fit for each participant. This correlates strongly with the estimated inverse temperature, i.e. an index of behavioral stochasticity according to our model, see Author response image 2. The overall goodness of model fit does not, however, correlate with our parameter of interest, γ, the use of learnt information, showing that an overall non-specific signal strength that correlates with participants’ willingness or ability to do the task cannot be invoked to explain our results.

Author response image 2. — However it does not correlate with other model parameters (all p>0.12).

**DOI:** http://dx.doi.org/10.7554/eLife.20365.021

Using the new hierarchical fitting method, we no longer obtain a measure of model fit for each person (instead there is only a measure of model fit for all participants together). As a proxy, because of the relationship between inverse temperature and model fit for individual participants (Author response image 2), we instead used the inverse temperature. We find that inverse temperature does not correlate with spectroscopy measures (see Figure 3—figure supplement 1). We therefore conclude that it is not the case that individual differences in how much people use what they have learnt are an artifact of overall behavioral model fit. We have now included the inverse temperature as a control parameter in analyses throughout the manuscript. We have updated all figures and tables in the manuscript accordingly.

In short, we find that our results hold after correction for multiple comparisons and when using non-parametric tests. Additionally, neurotransmitter measurements do not relate to any other behavioral parameters, but instead are very specifically related to how participants use the learnt information.

Correction for multiple comparisons is not always straightforward, as it is critically dependent on the number of plausible, equivalent i.e. interchangeable tests that would confirm or deny one’s hypothesis. In the analysis of the effects of glutamate and GABA on behavior, our main hypothesis was, based on previous work, that this would relate to the use of learnt information (Karlsson et al., 2014). For this reason we did not examine correlations between neurotransmitter levels and all other behavioral parameters.

However, to show the specificity of our results, we have now also examined these additional correlations and find that none of the other behavioral measures correlate with neurotransmitter measurements. Moreover, the correlation between γ and neurotransmitters remains significant even after correction for multiple comparisons (see Figure 3—figure supplement 1; for the use of learnt information: ρ=0.53, p=0.010; for all other parameters, p>0.7). Specifically, to reduce the number of comparisons, we have combined the glutamate and GABA measurements (as z-score normalized glutamate minus GABA levels, as for the fMRI) and performed partial correlations between this value and the four behavioral parameters (use of learnt information, inverse temperature, relative reliance on learnt reward or effort information, and learning rate). These four comparisons thus have a Bonferroni-corrected significance threshold of p=0.0125 (0.05/4). Furthermore, all these results are now reported as nonparametric partial correlations (Spearman’s rho, ρ; this has also been implemented in Figure 3A in the main manuscript). The fMRI analyses are controlled for multiple comparisons at the cluster level. In the fMRI analysis, we used FSL’s outlier de-weighting (in FLAME 1+2), which means that the possible impact of outlier data points is reduced. Therefore, our results are robust to correction for multiple comparisons and to outliers.
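The combined measure and the corrected threshold can be written out as a short sketch. The spectroscopy values below are invented and the function names are hypothetical; only the threshold arithmetic (0.05/4) and the reported p-value (0.010) come from the text above.

```python
from statistics import mean, stdev

def zscore(values):
    """Standardize values to zero mean and unit sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def glu_minus_gaba(glu, gaba):
    """Combined index: z-scored glutamate minus z-scored GABA."""
    return [g - b for g, b in zip(zscore(glu), zscore(gaba))]

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

# Invented spectroscopy values for five participants (arbitrary units).
glu = [9.8, 10.4, 11.1, 10.0, 10.7]
gaba = [1.9, 1.6, 1.4, 1.8, 1.5]
ei_index = glu_minus_gaba(glu, gaba)

# Four behavioral parameters were tested against the combined index.
alpha_corrected = bonferroni_alpha(0.05, 4)
survives = 0.010 < alpha_corrected  # reported p-value for gamma
print(f"corrected threshold = {alpha_corrected}, survives = {survives}")
```

Combining the two measurements into one index halves the number of tests, which is why only four comparisons enter the correction.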

We also note (see Figure 3—figure supplement 3) that our results were not sensitive to the precise method of correction of spectroscopy values for brain volumes: we present in our paper results from partial correlation analyses that treat partial grey and white matter brain volume in the spectroscopy voxel as confound regressors. If instead we use other correction methods, we find very similar results. This further attests to the robustness of our results.
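Treating tissue volumes as confound regressors amounts to a partial correlation. The sketch below uses the standard recursive formula for partial correlation and made-up data in which two variables are related only through a shared confound. All names and numbers are hypothetical; for a Spearman version one would rank-transform every variable first.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def partial_corr(x, y, controls):
    """Correlation of x and y after removing linear effects of the controls.

    Uses the recursive formula, removing one control variable per level.
    """
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Made-up data: x and y both depend on the confound z plus unrelated noise.
confound = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2, 1, 2, 5, 6, 5, 6, 9]
y = [2, 3, 2, 3, 4, 5, 8, 9]

raw = pearson(x, y)
controlled = partial_corr(x, y, [confound])
print(f"raw r = {raw:.2f}, partial r = {controlled:.2f}")
```

In this constructed example the raw correlation is high but disappears once the confound is controlled for, which is the situation the volume correction is meant to rule out.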

We are happy to expand our Discussion and we have added a new conceptual figure (Figure 1—figure supplement 1), to help clarify these issues.

We agree that referring to the activity pattern that we found as “deactivation” is potentially confusing and we have made changes throughout our manuscript to make our argument clearer: Similar to our BOLD effect in dACC, previous fMRI studies have found activity in dACC reflects what are sometimes called “inverse value” signals. For example, Kurniawan et al. (2013), Prevost et al. (2010) and Skvortsova et al. (2014) have all reported that dACC activity increases with increasing effort levels and decreases with increased levels of reward, which results in a signal that reports the opposite/inverse of the subjective value of the chosen option. Such a signal can, for the sake of brevity, be described as a “deactivation” in proportion to subjective value of the choice taken as opposed to the choice not taken. However, we agree with the reviewers that such a description is potentially confusing and arguably the activity pattern becomes more intuitive if it is described as positively related to the value of the alternative choice rather than the choice taken. Because the values of the two choices are varied independently we know that the activity is definitely also positively related to the value of the unchosen option. A number of fMRI and single neuron recording studies have suggested that such activity in dACC is related to the weighing up of evidence for changing or maintaining behavior (Kolling et al., 2012; Kolling et al., 2014; Kolling et al., 2016a; Kolling et al., 2016b; Meder et al., 2016; O'Reilly et al., 2013; Scholl et al., 2015; Shima and Tanji, 1998; Stoll et al., 2016; Wittmann et al., 2016). Such an interpretation is certainly consistent with the data in our current experiment.

In short, previous literature and the current study suggest that there is a relationship between dACC and the value of switching to an alternative behavior as opposed to the action actually taken. In the revised manuscript we have tried to express this without using the word “deactivation” which carries entirely inappropriate connotations of an overall decrement in activity during task performance. To take this to the domain of neurotransmitters, an enhanced learning signal in dACC would mean a stronger signal indicating the relative value of information to be learnt for an alternative as opposed to the current behavior. If glutamate increases a signal and GABA decreases it then we should see a larger instance of such a signal as glutamate increases and as GABA decreases. This is exactly what we find.

Regarding the choice of control regions: the aim of these control analyses was to assess whether the neurotransmitter levels measured in dACC specifically predicted BOLD signal in dACC or also BOLD signal in other brain areas. We did this as an additional control, despite evidence that glutamate and GABA levels in different brain areas are relatively uncorrelated (Emir et al., 2012; van der Veen and Shen, 2013). In order to perform the most stringent control analysis, we chose control areas that were identified as having activity levels that changed in the same manner and direction as dACC in the same contrast (information to be learnt). However, we agree with the reviewer that we could have also examined activity in areas that showed the opposite pattern of activity in relation to our contrast of interest (i.e. the areas listed in Table 1B). We have now performed this additional control analysis, identifying regions of interest in mid cingulate cortex and temporal cortex, extending into parietal cortex. Again, we find no evidence that dACC spectroscopy measures correlate with activity relating to information to be learnt in these other brain regions, suggesting, once again, the specificity of our effect.

We have now updated the manuscript to reflect these considerations. We have included a new conceptual figure (Figure 1—figure supplement 1).

We have also reworded relevant parts in the Results section to avoid the term ‘deactivation’:

“We identified activity in dACC and adjacent cortex (Figure 3Bi, x=6, y=32, z=36, z-score = 3.62, cluster p-value=5x10^-5) and in other areas (Table 1A) as coding the information to be learnt as an inverse outcome value signal (relative reward outcome minus relative effort outcome) or, in other words, a signal related to the relative value of the alternative not chosen on the current trial. Such a signal has previously been noted in dACC and has been related to behavioral adaptation: decisions to maintain or change behavior in diverse contexts (Kolling et al., 2012; Kolling et al., 2016b; Meder et al., 2016; Shima and Tanji, 1998; Stoll et al., 2016).”

We have now also added a note in the Results section to highlight that we have used both types of regions coding information to be learnt, i.e. regions with a value signal in the framework of the chosen or the unchosen option, as control regions:

“This result was specific to dACC; analogous analyses in other ROIs identified in the contrasts for learnt information (Table 1A+B) revealed no significant effects.”

Likewise, the results suggest that more GABA in dACC leads to less use of information to guide behavior. How does this relate to more random behavior found in rats? More generally, a more mechanistic interpretation of the role of GABA/GLU in reducing integration of reward-history (or other behaviorally relevant computations) would be useful.

We apologize to the reviewer for not making it clearer how we interpret our findings mechanistically and how they relate to more basic rodent studies. We have now made an additional illustration (Figure 1—figure supplement 1) to make things clearer and we have included some more explanations in the text.

In short, we interpret our results as suggesting that the relative levels of glutamate and GABA in dACC are a mechanism through which the brain can control to what extent it relies on either information that has been learnt (reward histories) or on new information available at the time of the decision. More excitation could drive increased firing of neurons with reward-history-based estimates of value, leading behavior to be more influenced by reward histories. GABA, on the other hand, reduces firing, preventing such information from driving behavior and effectively suppressing the effect of such past experiences on choice.

This relates very directly to findings by Tervo et al. (2014). They found that manipulating activity in rat ACC (through e.g. muscimol inactivation), in other words manipulating inhibitory activity, reduced the extent to which rats based their choices on information they had learnt. In other words it reduced the influence of the rats’ task models on their behavior. In the study of Tervo and colleagues, however, not using learnt information meant animals behaved randomly, whereas in our task they relied more heavily on other features of the task, namely the non-learnt probability information in a trial.

Relatedly, neurophysiological recordings made in ACC by Karlsson et al. (2012) found that ensemble activity patterns related to a model of the world (or prior beliefs) rats had learnt. When animals needed to disregard their learnt model, activity in ACC abruptly changed.

We have now updated the Discussion to reflect these considerations; however, because of word limits, we had to keep this quite brief. We hope that the new Figure 1—figure supplement 1 also illustrates the proposed mechanism better:

“Our findings are consistent with an emerging view of dACC in forming, updating and maintaining a model of the world and of behavioral strategies (Karlsson et al., 2012; Kolling et al., 2014; O'Reilly et al., 2013; Wittmann et al., 2016). […] It is possible that transient inhibition (through increased GABA) might allow for learning a new model of the task, whereas glutamate might mediate the exploitation of such a model.”

Associated Data

Supplementary Materials

Figure 2—source data 1. This table contains the regression coefficients for individual participants for the analysis shown in Figure 2B.

DOI: http://dx.doi.org/10.7554/eLife.20365.005

elife-20365-fig2-data1.xlsx (43.6 KB, xlsx)

Figure 2—source data 2. This table relates to Figure 2—figure supplement 1.

It contains data of 270 simulated participants, including the simulated parameters (‘simulated’, ‘s’), the parameters that were estimated from the computational model M1 (‘fitted’, ‘f’) and the regression weights (‘w’) resulting from performing the same behavioral regression as on the real data, see Materials and methods, section ‘task validation’.

DOI: http://dx.doi.org/10.7554/eLife.20365.006

elife-20365-fig2-data2.xlsx (93.4 KB, xlsx)

Figure 2—source data 3. This table relates to Figure 2—figure supplement 2.

It contains the model fits (Bayesian Information Criterion, BIC) for a non-hierarchical version of all models (models one to six, M1-M6) and the individual parameters for all participants from the hierarchical version of model M1. The individual parameters are shown for a model fitted on behavioral data from both sessions combined (‘both session’, ‘bs’), for a model fitted on data from the MRI session only (‘MRI session’, ‘ms’) and for a model fitted on data from outside the MRI scanner only (‘outside MRI session’, ‘os’).

DOI: http://dx.doi.org/10.7554/eLife.20365.007

elife-20365-fig2-data3.xlsx (50.1 KB, xlsx)

Figure 3—source data 1. This table contains the spectroscopy, brain volume and behavioral parameters used for correlations in Figure 3A.

DOI: http://dx.doi.org/10.7554/eLife.20365.011

elife-20365-fig3-data1.xlsx (46.7 KB, xlsx)

Figure 3—source data 2. This folder contains the MRI contrast maps, both thresholded (i.e. corrected for multiple comparison using cluster correction) and non-thresholded.

The maps are in NIfTI format and can be opened with freely available data viewers such as FSLView or MRIcron.

DOI: http://dx.doi.org/10.7554/eLife.20365.012

elife-20365-fig3-data2.zip (5.9 MB, zip)

Figure 3—source data 3. This folder relates to Figure 3—figure supplement 3.

It contains non-thresholded contrast maps showing the relationship between Glu-GABA and brain activity to information to be learnt, using different corrections of spectroscopy measurements for partial brain volumes (Figure 3—figure supplement 3i). Maps are in NIfTI format.

DOI: http://dx.doi.org/10.7554/eLife.20365.013

elife-20365-fig3-data3.zip (2.5 MB, zip)

Figure 3—source data 4. This table relates to Figure 3—figure supplement 3.

It contains the brain-volume corrected spectroscopy measurements (brain volume corrections B-D labeled as in figure) and behavioral parameters used for correlations in Figure 3—figure supplement 3ii.

DOI: http://dx.doi.org/10.7554/eLife.20365.014

elife-20365-fig3-data4.xlsx (52.6 KB, xlsx)

[bib1] Amiez C, Joseph JP, Procyk E. Reward encoding in the monkey anterior cingulate cortex. Cerebral Cortex. 2006;16:1040–1055. doi: 10.1093/cercor/bhj046. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib2] Beckmann CF, Smith SM. Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE Transactions on Medical Imaging. 2004;23:137–152. doi: 10.1109/TMI.2003.822821. [DOI] [PubMed] [Google Scholar]

[bib3] Boorman ED, Behrens TE, Woolrich MW, Rushworth MF. How green is the grass on the other side? frontopolar cortex and the evidence in favor of alternative courses of action. Neuron. 2009;62:733–743. doi: 10.1016/j.neuron.2009.05.014. [DOI] [PubMed] [Google Scholar]

[bib4] Carpenter B, Gelman A, Hoffman M, Lee D, Goodrich B. Stan: A probabilistic programming language. Journal of Statistical Software. 2016 doi: 10.18637/jss.v076.i01. In press. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib5] Danielmeier C, Allen EA, Jocham G, Onur OA, Eichele T, Ullsperger M. Acetylcholine mediates behavioral and neural post-error control. Current Biology. 2015;25:1461–1468. doi: 10.1016/j.cub.2015.04.022. [DOI] [PubMed] [Google Scholar]

[bib6] Deichmann R, Gottfried JA, Hutton C, Turner R. Optimized EPI for fMRI studies of the orbitofrontal cortex. NeuroImage. 2003;19:430–441. doi: 10.1016/S1053-8119(03)00073-9. [DOI] [PubMed] [Google Scholar]

[bib7] Emir UE, Auerbach EJ, Van De Moortele PF, Marjańska M, Uğurbil K, Terpstra M, Tkáč I, Oz G. Regional neurochemical profiles in the human brain measured by ¹H MRS at 7 T using local B₁ shimming. NMR in Biomedicine. 2012;25:152–160. doi: 10.1002/nbm.1727. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib8] Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Statistical Science. 1992;7:457–472. doi: 10.1214/ss/1177011136. [DOI] [Google Scholar]

[bib9] Goodkind M, Eickhoff SB, Oathes DJ, Jiang Y, Chang A, Jones-Hagata LB, Ortega BN, Zaiko YV, Roach EL, Korgaonkar MS, Grieve SM, Galatzer-Levy I, Fox PT, Etkin A. Identification of a common neurobiological substrate for mental illness. JAMA Psychiatry. 2015;72:305–315. doi: 10.1001/jamapsychiatry.2014.2206. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib10] Halchenko YO, Hanke M. Open is not enough. let's Take the Next Step: An Integrated, Community-Driven Computing Platform for Neuroscience. Frontiers in Neuroinformatics. 2012;6:22. doi: 10.3389/fninf.2012.00022. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib11] Heilbronner SR, Hayden BY. Dorsal anterior cingulate cortex: A Bottom-Up view. Annual Review of Neuroscience. 2016;39:149–170. doi: 10.1146/annurev-neuro-070815-013952. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib12] Holroyd CB, Yeung N. Motivation of extended behaviors by anterior cingulate cortex. Trends in Cognitive Sciences. 2012;16:122–128. doi: 10.1016/j.tics.2011.12.008. [DOI] [PubMed] [Google Scholar]

[bib13] Hunt LT, Behrens TE, Hosokawa T, Wallis JD, Kennerley SW. Capturing the temporal evolution of choice across prefrontal cortex. eLife. 2015;4:e11945. doi: 10.7554/eLife.11945. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib14] Hunt LT, Kolling N, Soltani A, Woolrich MW, Rushworth MFS, Behrens TEJ. Mechanisms underlying cortical activity during value-guided choice. Nature Neuroscience. 2012;15:470–476. doi: 10.1038/nn.3017. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib15] Jenkinson M, Bannister P, Brady M, Smith S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. NeuroImage. 2002;17:825–841. doi: 10.1006/nimg.2002.1132. [DOI] [PubMed] [Google Scholar]

[bib16] Jenkinson M, Beckmann CF, Behrens TE, Woolrich MW, Smith SM. FSL. NeuroImage. 2012;62:782–790. doi: 10.1016/j.neuroimage.2011.09.015. [DOI] [PubMed] [Google Scholar]

[bib17] Jocham G, Hunt LT, Near J, Behrens TE. A mechanism for value-guided choice based on the excitation-inhibition balance in prefrontal cortex. Nature Neuroscience. 2012;15:960–961. doi: 10.1038/nn.3140. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib18] Karlsson MP, Tervo DG, Karpova AY. Network resets in medial prefrontal cortex mark the onset of behavioral uncertainty. Science. 2012;338:135–139. doi: 10.1126/science.1226518. [DOI] [PubMed] [Google Scholar]

[bib19] Kelly RE, Alexopoulos GS, Wang Z, Gunning FM, Murphy CF, Morimoto SS, Kanellopoulos D, Jia Z, Lim KO, Hoptman MJ. Visual inspection of independent components: defining a procedure for artifact removal from fMRI data. Journal of Neuroscience Methods. 2010;189:233–245. doi: 10.1016/j.jneumeth.2010.03.028. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib20] Kennerley SW, Walton ME, Behrens TE, Buckley MJ, Rushworth MF. Optimal decision making and the anterior cingulate cortex. Nature Neuroscience. 2006;9:940–947. doi: 10.1038/nn1724. [DOI] [PubMed] [Google Scholar]

[bib21] Khamassi M, Wilson C, Rothé R, Quilodran R, Dominey PF. Meta-learning, cognitive control, and physiological interactions between medial and lateral prefrontal cortex. Neural Basis of Motivational and Cognitive Control. 2011:351–370. doi: 10.7551/mitpress/9780262016438.003.0019. [DOI] [Google Scholar]

[bib22] Kolling N, Behrens T, Wittmann MK, Rushworth M. Multiple signals in anterior cingulate cortex. Current Opinion in Neurobiology. 2016a;37:36–43. doi: 10.1016/j.conb.2015.12.007. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib23] Kolling N, Behrens TE, Mars RB, Rushworth MF. Neural mechanisms of foraging. Science. 2012;336:95–98. doi: 10.1126/science.1216930. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib24] Kolling N, Wittmann M, Rushworth MF. Multiple neural mechanisms of decision making and their competition under changing risk pressure. Neuron. 2014;81:1190–1202. doi: 10.1016/j.neuron.2014.01.033. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib25] Kolling N, Wittmann MK, Behrens TE, Boorman ED, Mars RB, Rushworth MF. Value, search, persistence and model updating in anterior cingulate cortex. Nature Neuroscience. 2016b;19:1280–1285. doi: 10.1038/nn.4382. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib26] Lee MD, Wagenmakers E-J. Bayesian Cognitive Modeling: A Practical Course. Cambridge university press; 2014. [Google Scholar]

[bib27] Mars RB, Jbabdi S, Sallet J, O'Reilly JX, Croxson PL, Olivier E, Noonan MP, Bergmann C, Mitchell AS, Baxter MG, Behrens TE, Johansen-Berg H, Tomassini V, Miller KL, Rushworth MF. Diffusion-weighted imaging tractography-based parcellation of the human parietal cortex and comparison with human and macaque resting-state functional connectivity. Journal of Neuroscience. 2011;31:4087–4100. doi: 10.1523/JNEUROSCI.5102-10.2011. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib28] Meder D, Haagensen BN, Hulme O, Morville T, Gelskov S, Herz DM, Diomsina B, Christensen MS, Madsen KH, Siebner HR. Tuning the Brake while raising the stake: Network dynamics during sequential Decision-Making. Journal of Neuroscience. 2016;36:5417–5426. doi: 10.1523/JNEUROSCI.3191-15.2016. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib29] Mekle R, Mlynárik V, Gambarota G, Hergt M, Krueger G, Gruetter R. MR spectroscopy of the human brain with enhanced signal intensity at ultrashort echo times on a clinical platform at 3t and 7t. Magnetic Resonance in Medicine. 2009;61:1279–1285. doi: 10.1002/mrm.21961. [DOI] [PubMed] [Google Scholar]

[bib30] Mlynárik V, Gambarota G, Frenkel H, Gruetter R. Localized short-echo-time proton MR spectroscopy with full signal-intensity acquisition. Magnetic Resonance in Medicine. 2006;56:965–970. doi: 10.1002/mrm.21043. [DOI] [PubMed] [Google Scholar]

[bib31] Mori S, Wakana S, Van Zijl PCM, Nagae-Poetscher LM. MRI Atlas of Human White Matter. Elsevier; 2005. p. 284. [DOI] [PubMed] [Google Scholar]

[bib32] Neubert FX, Mars RB, Sallet J, Rushworth MF. Connectivity reveals relationship of brain areas for reward-guided learning and decision making in human and monkey frontal cortex. PNAS. 2015;112:E2695–E2704. doi: 10.1073/pnas.1410767112. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib33] O'Reilly JX, Schüffelgen U, Cuell SF, Behrens TE, Mars RB, Rushworth MF. Dissociable effects of surprise and model update in parietal and anterior cingulate cortex. PNAS. 2013;110:E3660–E3669. doi: 10.1073/pnas.1305373110.

[bib34] Provencher SW. Automatic quantitation of localized in vivo 1H spectra with LCModel. NMR in Biomedicine. 2001;14:260–264. doi: 10.1002/nbm.698.

[bib35] Rushworth MF, Noonan MP, Boorman ED, Walton ME, Behrens TE. Frontal cortex and reward-guided learning and decision-making. Neuron. 2011;70:1054–1069. doi: 10.1016/j.neuron.2011.05.014.

[bib36] Sallet J, Mars RB, Noonan MP, Neubert FX, Jbabdi S, O'Reilly JX, Filippini N, Thomas AG, Rushworth MF. The organization of dorsal frontal cortex in humans and macaques. Journal of Neuroscience. 2013;33:12255–12274. doi: 10.1523/JNEUROSCI.5108-12.2013.

[bib37] Scholl J, Günthner J, Kolling N, Favaron E, Rushworth MF, Harmer CJ, Reinecke A. A role beyond learning for NMDA receptors in reward-based decision-making: a pharmacological study using d-cycloserine. Neuropsychopharmacology. 2014;39:2900–2909. doi: 10.1038/npp.2014.144.

[bib38] Scholl J, Kolling N, Nelissen N, Wittmann MK, Harmer CJ, Rushworth MF. The good, the bad, and the irrelevant: neural mechanisms of learning real and hypothetical rewards and effort. Journal of Neuroscience. 2015;35:11233–11251. doi: 10.1523/JNEUROSCI.0396-15.2015.

[bib39] Shima K, Tanji J. Role for cingulate motor area cells in voluntary movement selection based on reward. Science. 1998;282:1335–1338. doi: 10.1126/science.282.5392.1335.

[bib40] Simpson R, Devenyi GA, Jezzard P, Hennessy TJ, Near J. Advanced processing and simulation of MRS data using the FID appliance (FID-A): an open source, MATLAB-based toolkit. Magnetic Resonance in Medicine. 2015;77:23–33. doi: 10.1002/mrm.26091.

[bib41] Smith SM, Jenkinson M, Woolrich MW, Beckmann CF, Behrens TE, Johansen-Berg H, Bannister PR, De Luca M, Drobnjak I, Flitney DE, Niazy RK, Saunders J, Vickers J, Zhang Y, De Stefano N, Brady JM, Matthews PM. Advances in functional and structural MR image analysis and implementation as FSL. NeuroImage. 2004;23 Suppl 1:S208–219. doi: 10.1016/j.neuroimage.2004.07.051.

[bib42] Smith SM. Fast robust automated brain extraction. Human Brain Mapping. 2002;17:143–155. doi: 10.1002/hbm.10062.

[bib43] Stagg CJ, Best JG, Stephenson MC, O'Shea J, Wylezinska M, Kincses ZT, Morris PG, Matthews PM, Johansen-Berg H. Polarity-sensitive modulation of cortical neurotransmitters by transcranial stimulation. Journal of Neuroscience. 2009;29:5202–5206. doi: 10.1523/JNEUROSCI.4432-08.2009.

[bib44] Stagg CJ, Bestmann S, Constantinescu AO, Moreno LM, Allman C, Mekle R, Woolrich M, Near J, Johansen-Berg H, Rothwell JC. Relationship between physiological measures of excitability and levels of glutamate and GABA in the human motor cortex. The Journal of Physiology. 2011;589:5845–5855. doi: 10.1113/jphysiol.2011.216978.

[bib45] Stoll FM, Fontanier V, Procyk E. Specific frontal neural dynamics contribute to decisions to check. Nature Communications. 2016;7:11990. doi: 10.1038/ncomms11990.

[bib46] Sumner P, Edden RA, Bompas A, Evans CJ, Singh KD. More GABA, less distraction: a neurochemical predictor of motor decision speed. Nature Neuroscience. 2010;13:825–827. doi: 10.1038/nn.2559.

[bib47] Terhune DB, Russo S, Near J, Stagg CJ, Cohen Kadosh R. GABA predicts time perception. Journal of Neuroscience. 2014;34:4364–4370. doi: 10.1523/JNEUROSCI.3972-13.2014.

[bib48] Tervo DG, Proskurin M, Manakov M, Kabra M, Vollmer A, Branson K, Karpova AY. Behavioral variability through stochastic choice and its gating by anterior cingulate cortex. Cell. 2014;159:21–32. doi: 10.1016/j.cell.2014.08.037.

[bib49] Thain D, Tannenbaum T, Livny M. Distributed computing in practice: the Condor experience. Concurrency and Computation: Practice and Experience. 2005;17:323–356. doi: 10.1002/cpe.938.

[bib50] Ullsperger M, Danielmeier C, Jocham G. Neurophysiology of performance monitoring and adaptive behavior. Physiological Reviews. 2014;94:35–79. doi: 10.1152/physrev.00041.2012.

[bib51] van der Veen JW, Shen J. Regional difference in GABA levels between medial prefrontal and occipital cortices. Journal of Magnetic Resonance Imaging. 2013;38:745–750. doi: 10.1002/jmri.24009.

[bib52] Vehtari A, Gelman A, Gabry J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. arXiv.org. 2016. arXiv:1507.04544.

[bib53] Wittmann MK, Kolling N, Akaishi R, Chau BK, Brown JW, Nelissen N, Rushworth MF. Predictive decision making driven by multiple time-linked reward representations in the anterior cingulate cortex. Nature Communications. 2016;7:12327. doi: 10.1038/ncomms12327.

[bib54] Woolrich MW, Behrens TE, Beckmann CF, Jenkinson M, Smith SM. Multilevel linear modelling for FMRI group analysis using Bayesian inference. NeuroImage. 2004;21:1732–1747. doi: 10.1016/j.neuroimage.2003.12.023.

[bib55] Woolrich MW, Ripley BD, Brady M, Smith SM. Temporal autocorrelation in univariate linear modeling of FMRI data. NeuroImage. 2001;14:1370–1386. doi: 10.1006/nimg.2001.0931.

[bib56] Yüksel C, Öngür D. Magnetic resonance spectroscopy studies of glutamate-related abnormalities in mood disorders. Biological Psychiatry. 2010;68:785–794. doi: 10.1016/j.biopsych.2010.06.016.

[bib57] Zhang Y, Brady M, Smith S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging. 2001;20:45–57. doi: 10.1109/42.906424.
