Journal of Neurophysiology
2015 Feb 18;113(9):3056–3068. doi: 10.1152/jn.00564.2014

Neural basis of decision making guided by emotional outcomes

Kentaro Katahira 1,2, Yoshi-Taka Matsuda 1,2, Tomomi Fujimura 1,2, Kenichi Ueno 3, Takeshi Asamizuya 3, Chisato Suzuki 3, Kang Cheng 3, Kazuo Okanoya 1,2,4, Masato Okada 1,5
PMCID: PMC4455562  PMID: 25695644

Abstract

Emotional events resulting from a choice influence an individual's subsequent decision making. Although the relationship between emotion and decision making has been widely discussed, previous studies have mainly investigated decision outcomes that can easily be mapped to reward and punishment, including monetary gain/loss, gustatory stimuli, and pain. These studies regard emotion as a modulator of decision making that can be made rationally in the absence of emotions. In our daily lives, however, we often encounter various emotional events that affect decisions by themselves, and mapping the events to a reward or punishment is often not straightforward. In this study, we investigated the neural substrates of how such emotional decision outcomes affect subsequent decision making. By using functional magnetic resonance imaging (fMRI), we measured brain activities of humans during a stochastic decision-making task in which various emotional pictures were presented as decision outcomes. We found that pleasant pictures differentially activated the midbrain, fusiform gyrus, and parahippocampal gyrus, whereas unpleasant pictures differentially activated the ventral striatum, compared with neutral pictures. We assumed that the emotional decision outcomes affect the subsequent decision by updating the value of the options, a process modeled by reinforcement learning models, and that the brain regions representing the prediction error that drives the reinforcement learning are involved in guiding subsequent decisions. We found that some regions of the striatum and the insula were separately correlated with the prediction error for either pleasant pictures or unpleasant pictures, whereas the precuneus was correlated with prediction errors for both pleasant and unpleasant pictures.

Keywords: emotional pictures, reinforcement learning, valence, striatum, insula


Emotion influences decision making in humans and other animals. The process underlying this influence has generated interest because it can explain various aspects of decision making that are regarded as either rational or irrational (Bechara et al. 2005; Cohen 2005; Loewenstein et al. 2001; Seymour and Dolan 2008; Shiv et al. 2005). Previous studies have used monetary gain/loss or gustatory stimuli (e.g., juice) as decision outcomes that guide subsequent decision making. The value or magnitude of these stimuli can be manipulated and quantified relatively easily in an experimental environment. However, such reward or punishment not only induces an emotional response but also can be objectively quantified; thus it can be subject to the rational calculations of the decision maker.

To address the pure impact of emotion induced by a decision outcome, we designed a decision-making task in which the decision outcomes induce emotions whose values are difficult to compute in a straightforward manner. Emotional pictures have been used for investigating behavioral, physiological, or neural reactions to emotional events (e.g., Bradley et al. 2001; Lang et al. 1998a). For example, brain activities during the passive viewing of emotional pictures have been studied using imaging techniques, and studies have found that emotional (pleasant and unpleasant) pictures activate several brain regions, such as the occipital cortex, medial prefrontal cortex, thalamus, hypothalamus, and midbrain, to a greater extent than neutral pictures (Lane et al. 1997; Lang et al. 1998b).

In the main task of the current study, the emotional pictures and neutral pictures were presented as outcomes of choice, and the valence of the pictures was stochastically contingent on the participants' choices (Katahira et al. 2011, 2014). The neural basis of decision making and associative learning based on reward or punishment has been intensively studied, and several brain regions, such as the striatum and the insula, are known to be involved in the learning processes (O'Doherty et al. 2003b, 2004; Seymour et al. 2004, 2005; Tanaka et al. 2004). We hypothesized that emotional pictures recruit the striatum and the insula in contexts in which the participants have to learn and make decisions.

According to the reinforcement learning theory, the reward prediction error that quantifies the discrepancy between what was expected and what is actually observed is a key variable that drives learning (Niv 2009; Niv et al. 2005; Niv and Schoenbaum 2008). Numerous studies have reported that the blood oxygen level-dependent (BOLD) signal in several regions in the striatum and the insula is correlated with the prediction error (Daw 2011; Daw et al. 2006; O'Doherty et al. 2003b; Seymour et al. 2004). We derived the prediction error signal by fitting the reinforcement learning model to the subjects' behavioral data. We assumed that the emotional decision outcomes affected subsequent decisions by updating the values of the options and that the brain regions representing the prediction error that drives the reinforcement learning play a pivotal role in emotion-guided decision making.

MATERIALS AND METHODS

Participants

Thirty healthy volunteers participated in this study. All participants were neurologically normal and had normal or corrected-to-normal vision. Data from five participants were excluded because of incomplete data acquisition (1 participant), excessive head motion (1 participant), or poor behavioral performance that did not exceed the chance level (2 participants). The data from the 25 remaining participants (13 men and 12 women, age 24.44 ± 5.28 yr, mean ± SD) were analyzed. All participants provided informed consent according to the procedures approved by the RIKEN Ethics Committee and the RIKEN Functional MRI Safety and Ethics Committee.

Behavioral Task

The task consisted of 80 decision-picture trials, 80 decision-money trials, and 40 no-decision-picture trials (Fig. 1). The no-decision-picture trial was a control trial used to investigate the effects of choice (in decision-picture trials) on the response to picture stimuli. In the two decision trial types, participants were faced with the choice between two possible actions that were represented by affectively neutral fractal images (cues). Two different cue sets were assigned to two decision trial types. In the decision-picture trial, a picture with its valence (pleasant, neutral, or unpleasant) dependent on the choice was presented. In the decision-money trial, a resulting monetary outcome (+500, 0, or −500 yen) was presented. In the no-decision-picture trials, two cues, marked with “?” and “×,” were first presented. The participants were asked to press the button corresponding to the position of “?,” and then a picture whose valence was randomly determined was presented. All three trial types were pseudorandomly intermixed throughout the task. The choice-outcome contingency (in the decision-picture and decision-money trials) was as follows. For each decision trial, one fractal image was an “advantageous” or “optimal” option, which was associated with a pleasant picture/+500 yen with a probability of 65%, a neutral picture/0 yen with a probability of 20%, or an unpleasant picture/−500 yen with a probability of 15%. Another fractal image was a “disadvantageous” or “non-optimal” option, which was associated with a pleasant picture/+500 yen with a probability of 15%, a neutral picture/0 yen with a probability of 20%, or an unpleasant picture/−500 yen with a probability of 65%. The advantageous option and disadvantageous option switched between two fractal images for each trial type without any signal at the 20th, 35th, and 45th trials of the decision-picture trials and at the 15th, 30th, and 50th trials of the decision-money trials. 
The assignment of the fractal images to options was counterbalanced across participants. The locations of the fractal images were also randomized across trials. If a response was not made within the time limit of 1.5 s, a response omission was indicated to the participants, and the trial was aborted. The mean/maximum fractions of aborted trials across the participants were 0.019/0.062 for the decision-picture trials, 0.020/0.088 for the decision-money trials, and 0.014/0.075 for the no-decision-picture trials. After a white frame indicating the choice had been presented for 4.0 s, an outcome image (a picture or an image of a monetary outcome) was presented. The outcome image lasted 2 s and was followed by a jittered intertrial interval whose duration was 4–5 s (drawn from the uniform distribution).
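The stochastic choice-outcome contingency described above can be sketched as follows. The probabilities are taken from the text; the valence coding (+1 for pleasant/+500 yen, 0 for neutral/0 yen, −1 for unpleasant/−500 yen) and all names are illustrative, not part of the original task code.

```python
import random

# Outcome schedules from the text: (valence, probability) pairs, where +1
# codes a pleasant picture/+500 yen, 0 a neutral picture/0 yen, and -1 an
# unpleasant picture/-500 yen (illustrative coding).
ADVANTAGEOUS = [(+1, 0.65), (0, 0.20), (-1, 0.15)]
DISADVANTAGEOUS = [(+1, 0.15), (0, 0.20), (-1, 0.65)]

def sample_outcome(schedule, rng=random):
    """Draw one outcome valence from a (valence, probability) schedule."""
    r = rng.random()
    cumulative = 0.0
    for valence, p in schedule:
        cumulative += p
        if r < cumulative:
            return valence
    return schedule[-1][0]  # guard against floating-point round-off
```

Swapping which fractal image points at `ADVANTAGEOUS` at the listed switch trials reproduces the unsignaled contingency reversals.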

Fig. 1.


Schematic of the flow of the 3 types of trials. On 2 decision trials, the participant chose 1 of 2 fractal images and indicated his/her choice by pressing a corresponding key. After the choice was made, a white frame surrounding the chosen stimulus was presented for 4 s. For a decision-picture trial, a picture whose valence (neutral, pleasant, or unpleasant) depended on the choice was then presented for 2.0 s. For a decision-money trial, a resulting monetary outcome (0, +500, or −500 yen) was presented for 2.0 s. For no-decision-picture trials, the participant had to press a key corresponding to the position of a square marked with “?,” and then a picture with a randomly determined valence was presented.

For each picture category (pleasant, unpleasant, neutral), 20 pictures were selected from the International Affective Picture System (IAPS) (Lang et al. 2008). IAPS has been commonly used in emotion studies (e.g., Bradley et al. 2001; Codispoti et al. 2001; Lang and Bradley 2010). We avoided using sexual pictures and pictures that included attractive faces or smiling faces because previous imaging studies have reported that these stimuli activate the reward system by themselves (Bray and O'Doherty 2007; O'Doherty et al. 2003a; Sabatinelli et al. 2007). Examples of pleasant pictures include beautiful scenes, such as gardens, sunsets, and beaches; cute living things, such as dolphins, puppies, and human children; exciting scenes, such as sky divers; and appetizing objects, such as pancakes. Examples of unpleasant pictures include violent scenes; tragic scenes; and harmful objects, such as heroin, cockroaches, and dirty garbage. Examples of neutral pictures include simple objects, such as spoons, buttons, and umbrellas, and scenes from daily living, such as a girl sitting in front of a computer screen. The pleasant and unpleasant pictures were selected to be equidistant from the neutral pictures in terms of valence and arousal. The normative valence/arousal ratings of these pictures were as follows (mean ± SD, 1 = most unpleasant/least arousing, 9 = most pleasant/most arousing): 5.03 ± 0.21/2.92 ± 0.85 for the neutral pictures, 7.37 ± 0.59/5.06 ± 0.71 for the pleasant pictures, and 2.70 ± 0.36/5.26 ± 0.79 for the unpleasant pictures. Pictures in the decision-picture trials and the no-decision-picture trials were randomly sampled from the same set of pictures. Before entering the scanner, the participants experienced 1 training session consisting of 30 trials (10 trials per trial type) with a different picture set from the main experiment.
Before the training session, the participants were told that there were two pairs of stimuli for choice trials and that on each trial, one of these pairs would be displayed. They were instructed to select one of the stimuli on each trial by pressing the left or right response button. The participants were told that after making their choices, they would be shown a picture or an image indicating the monetary outcome. The participants were instructed to carefully look at the picture to answer questions about the scenes and individuals in the picture after the entire experimental session had finished. The participants were not told which stimulus was associated with which particular outcome, but they were told that one option was associated with a higher probability of obtaining an outcome than the other and that the probability might change without any cue. The participants were encouraged to 1) try to make a choice so that they could see a picture they wanted to see and avoid seeing a picture they did not want to see for the decision-picture trials and 2) try to maximize the gain for the decision-money trials. In our previous study that used emotional pictures as the decision outcome (Katahira et al. 2011), we found that pleasant pictures act as appetitive stimuli and that unpleasant pictures act as aversive stimuli. This result supports the assumption that normal participants have a tendency to want to see pleasant pictures and to not want to see unpleasant pictures. The participants underwent 2 sessions, each consisting of 100 trials. The participants were informed that they would receive compensation proportional to the total money earned in the decision-money trials, although minimum compensation (∼5,000 yen) was guaranteed regardless of performance. The actual compensation in yen equaled 5,500 + 7.5 × min [max (average gain per decision-money trial, 0), 200].

The participants first performed the practice session outside the scanner and then performed the two main sessions in the scanner. After completing the experimental task, the participants rated all pictures for valence on a scale from 1 (most unpleasant) to 9 (most pleasant) on a paper-based questionnaire. They were asked to rate how the images made them feel during the decision-making experiment. A paper-based recognition test in which the participants were asked whether each picture had appeared in the task was also administered. The mean fraction of pictures presented in the task was 0.84 (SD 0.04). The mean correct recognition rate (including correct hits and correct rejections) across the participants was 0.94 (SD 0.05), suggesting that the participants attended satisfactorily to the picture stimuli.

Reinforcement Learning Models

To model the participants' choice behaviors, we employed Q-learning models (Sutton and Barto 1998; Watkins and Dayan 1992). Because the decision-picture trials and the decision-money trials were independent (i.e., nonoverlapping and independent reward schedules were used for the 2 tasks), we could treat the data for these two tasks independently. Below we first describe the standard Q-learning model (with valence-mixed representation) and then the valence-separated representation of the Q-learning model.

Standard Q-learning model (with valence-mixed representation).

The standard Q-learning model represents the value of each action (selecting one option) as Q values (action values). Let Qi(t) denote the Q value for option i (i = 1, 2) on trial t. The Q values are updated according to the choice and the resulting outcome (the outcome in this study corresponded to a picture or monetary feedback). Let a(t) denote the option the participant chooses on trial t. If a(t) = i, then the Q value corresponding to the selected option is updated as follows:

Q_i(t+1) = Q_i(t) + α δ(t),
δ(t) = v(t) − Q_i(t),

whereas the Q value for the unselected option does not change. Here, α (0 ≤ α ≤ 1) is the learning rate that determines the degree of the update and v(t) is the motivational value, or reward value, for the picture or monetary feedback presented in trial t, which is specified below. δ(t) is called the reward-prediction error. Given a Q-value set, a choice is assumed to be made according to the probability of choosing option 1, P[a(t) = 1], given by the soft-max function:

P[a(t) = 1] = exp[Q_1(t)] / {exp[Q_1(t)] + exp[Q_2(t)]},

with P[a(t) = 2] = 1 − P[a(t) = 1]. The model sets the motivational value v(t) as follows:

For the t-th trial of decision-picture trials,

v(t) = κ_pict^P   if a pleasant picture was presented,
       0          if a neutral picture was presented,
       κ_pict^N   if an unpleasant picture was presented.

For the t-th trial of decision-money trials,

v(t) = κ_money^P   if the outcome was +500 yen,
       0           if the outcome was 0 yen,
       κ_money^N   if the outcome was −500 yen.

Because we do not know the motivational value of each outcome category a priori, κP and κN are free parameters to be estimated from the participants' choice data. To quantify the motivational value of emotional outcomes relative to neutral outcomes, we set the value of the neutral outcome to zero and then estimated the motivational value parameters for pleasant and unpleasant pictures. Although neutral pictures may have an effectively non-zero motivational value, including a free parameter for the neutral outcome complicates the interpretation of the model. The absolute values of the motivational value parameters are meaningful only in comparison with the initial action values. Thus the effective motivational value of the neutral outcome (used as the reference point) can be offset by adjusting the initial action values Q0 [= Q1(1) = Q2(1)]: setting Q0 to a negative value while keeping the value of neutral outcomes at zero represents an effectively positive motivational value of neutral outcomes. To examine whether the effective motivational value of neutral outcomes was non-zero, we compared the results obtained from the standard Q-learning model when Q0 was either fixed at zero or left as a free parameter. We applied the model to the decision-picture trials and the decision-money trials separately; thus all parameters and Q values were used independently for each trial type, and the trial index t was counted separately.
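As a minimal sketch, the update and choice rules above can be written as follows. The parameter values shown are placeholders for illustration, not the fitted estimates reported in RESULTS, and the function names are our own.

```python
import math

def softmax_p1(q1, q2):
    """Probability of choosing option 1 under the soft-max rule."""
    e1, e2 = math.exp(q1), math.exp(q2)
    return e1 / (e1 + e2)

def q_update(q_chosen, v, alpha):
    """One Q-learning step for the chosen option.

    Returns the updated Q value and the reward-prediction error delta;
    the unchosen option's Q value is left untouched by the model.
    """
    delta = v - q_chosen
    return q_chosen + alpha * delta, delta

# Motivational values v(t) per outcome category; kappa_P and kappa_N are
# the model's free parameters (placeholder numbers here).
MOTIVATIONAL_VALUE = {"pleasant": 1.0, "neutral": 0.0, "unpleasant": -1.5}
```

With equal Q values, `softmax_p1` returns 0.5, and repeated application of `q_update` moves the chosen option's value toward v at a rate set by α.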

Valence-separated representation of the Q-learning model.

The aforementioned standard Q-learning model expresses positive and negative valence along one dimension; that is, negative valence takes the opposite sign of positive valence. Therefore, we refer to this model as a standard Q-learning model with valence-mixed representation. However, several reports suggest that the activities of our regions of interest (ROIs), the insula and the striatum, correlate with negative valence with a positive sign (Seymour et al. 2004, 2005, 2007). In our task, the affectively neutral options (fractal images) were associated with outcomes of both valences. To represent the positive-valence value and the negative-valence value separately, we propose a valence-separated representation of the Q-learning model. In this representation, the Q value for option i is decomposed as follows:

Q_i(t) = Q_i^+(t) + Q_i^−(t),

where Q_i^+(t) represents the Q value related to positive (appetitive) valence and Q_i^−(t) represents the Q value related to negative (aversive) valence. Accordingly, the update rule and prediction error are divided into an appetitive component and an aversive component as follows:

Q_i^+(t+1) = Q_i^+(t) + α^+ δ^+(t),
Q_i^−(t+1) = Q_i^−(t) + α^− δ^−(t),

where the appetitive prediction error δ^+(t) and the aversive prediction error δ^−(t) are calculated depending on the outcome valence as follows:

For an appetitive or neutral outcome case [v(t) ≥ 0],

δ^+(t) = v(t) − Q_i^+(t),
δ^−(t) = −Q_i^−(t),

For an aversive outcome case [v(t) < 0],

δ^+(t) = −Q_i^+(t),
δ^−(t) = v(t) − Q_i^−(t).

For the special case α^+ = α^−, the valence-separated representation of the Q-learning model makes the same choice predictions as the standard Q-learning model. We simply assumed the initial Q values to be Q_i^+(1) = Q_i^−(1) = Q0/2. To test whether the learning rate should differ between the appetitive and aversive components, we compared the common learning rate model (α^+ = α^−) with the different learning rate model (α^+ and α^− are independent free parameters). In addition, to check whether the positive and negative outcomes indeed worked as appetitive and aversive stimuli, respectively, we compared these models with simpler models in which either value parameter κP or κN was set to zero (equal to the neutral outcome). An example of the behavior of the Q-learning model is depicted in Fig. 2A.
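The valence-separated update can be sketched as below, assuming the variable names used in the equations above; the function itself is our illustrative paraphrase of the rules, not the authors' code.

```python
def valence_separated_update(q_pos, q_neg, v, alpha_pos, alpha_neg):
    """One valence-separated Q-learning step for the chosen option.

    For v >= 0 the appetitive error is v - q_pos while q_neg decays
    toward zero; for v < 0 the roles are swapped. Returns the updated
    values and both prediction errors.
    """
    if v >= 0:
        delta_pos = v - q_pos
        delta_neg = -q_neg
    else:
        delta_pos = -q_pos
        delta_neg = v - q_neg
    q_pos += alpha_pos * delta_pos
    q_neg += alpha_neg * delta_neg
    return q_pos, q_neg, delta_pos, delta_neg
```

When `alpha_pos == alpha_neg`, the sum `q_pos + q_neg` follows exactly the standard (valence-mixed) update, consistent with the special case noted above.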

Fig. 2.


Illustration of the reinforcement learning model and the model-based functional magnetic resonance imaging (fMRI) analysis. A: an example of choice data from a single participant and the prediction of the reinforcement learning model. The first half of the decision-picture trials is shown. The vertical bars (top panel) indicate the chosen option for each trial; their color and length represent the valence of the outcome pictures. The probability of choosing option 1 (green line) was calculated from the fitted Q-learning model with a parameterized initial value and a single learning rate. The valence-separated representation of Q values and the prediction error (for the outcome picture), as well as their valence-mixed representation, were derived from the same model (bottom panels). Note that for aversive components, the prediction errors (PEs) were sign flipped so that PE = PE^+ − PE^−. B: estimated temporal-difference (TD) errors for the 37th trial in the example. Options (fractal images) appear unpredictably and thus induce PEs approximately equal to the Q values of the chosen option. Predicted blood oxygen level-dependent (BOLD) responses (dotted lines) were obtained by convolving the TD errors with the hemodynamic response function.

Parameter fit and model comparison.

We used the maximum likelihood approach to estimate the model parameters from the participants' choice data. For each model and each trial type, a single parameter set was estimated for the participants as a whole to obtain stable parametric regressors (Daw 2011). The model parameters were optimized by minimizing the negative log likelihood using the Matlab function "fmincon." To compare the models, we used the likelihood-ratio test of the null hypothesis that the improvement in the likelihood of the more complicated model relative to the simpler one occurred by chance alone. Under the null hypothesis, the likelihood-ratio statistic obeys the χ2 distribution with degrees of freedom equal to the difference in the number of model parameters between the two models. The degrees of freedom do not depend on the number of participants because a single parameter set was estimated for each model using the pooled data from all participants.
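The fitting procedure can be sketched as follows, with SciPy's `minimize` standing in for Matlab's `fmincon`. The two-parameter toy model (a learning rate and an outcome sensitivity) is an illustrative stand-in for the full models, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def neg_log_lik(params, choices, outcomes):
    """Negative log likelihood of a toy 2-option Q-learning model."""
    alpha, kappa = params
    q = np.zeros(2)
    nll = 0.0
    for a, v in zip(choices, outcomes):
        p = np.exp(q) / np.exp(q).sum()      # soft-max choice probabilities
        nll -= np.log(p[a])
        q[a] += alpha * (kappa * v - q[a])   # prediction-error update
    return nll

def lr_test_pvalue(nll_full, nll_restricted, df):
    """Likelihood-ratio test: p-value under the chi-squared null."""
    statistic = 2.0 * (nll_restricted - nll_full)
    return chi2.sf(statistic, df)
```

A call such as `minimize(neg_log_lik, x0=[0.5, 1.0], args=(choices, outcomes), bounds=[(0, 1), (-5, 5)])` then recovers the pooled maximum-likelihood estimates.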

fMRI Data Acquisition

The functional imaging was conducted by using an Agilent 4-Tesla whole body MRI system (Agilent, Santa Clara, CA) with a circularly polarized quadrature birdcage radiofrequency coil as a transmitter and 12 array coils as receivers (Nova Medical, Wilmington, MA). Fifty axial slices (19.2-cm field of view, 64 × 64 matrix, 3-mm thickness, 0-mm gap) with 30° forward rotation from the AC-PC plane were acquired using a four-shot echo planar imaging (EPI) pulse sequence (volume TR 7.32 s, TE 25 ms, flip angle 76°) for the two functional runs, each consisting of 180 volumes. After TSENSE (acceleration factor 4) reconstruction (Kellman 2001; Pruessmann et al. 1999), the sampling frequency was quadruplicated and the effective volume TR became 1.83 s. Before the functional runs, a whole brain anatomic image (voxel size = 1 × 1 × 1 mm3) was acquired using a three-dimensional magnetization-prepared rapid gradient echo (MPRAGE) pulse sequence.

fMRI Data Analysis

After EPI reconstruction, the four-volume cycle intensity alternation caused by TSENSE reconstruction was removed. The four data sets (4n-th, 4n + 1-th, 4n + 2-th, and 4n + 3-th volumes) were averaged on a voxel-by-voxel basis, and we calculated the multiplication factor between the volume sets. A pressure sensor was used to measure the respiration signal, and a pulse oximeter was used to measure the cardiac signal. The respiratory and cardiac signals were used to remove the physiological fluctuations from the functional images by using a retrospective estimation and correction method (Hu et al. 1995). The data were then preprocessed using SPM12 (Wellcome Department of Imaging Neuroscience, Institute of Neurology, London, UK). The preprocessing of the EPIs included slice-timing correction, which adjusted each slice to the middle of the scan, and motion correction (rigid body realignment of all images to the first volume). The T1-weighted structural image of each participant was normalized to a standard T1 image template in Montreal Neurological Institute (MNI) space. The EPIs were then normalized according to transformed structural images and thus transformed into the standard MNI space. The EPIs were then spatially smoothed using a Gaussian kernel with a full width at half-maximum of 8 mm.

Statistical analysis of the fMRI data was performed using a general linear model (GLM) with regressors composed of sets of delta (stick) functions. We analyzed the data separately with the three independent GLMs described below. The first GLM (GLM1) was composed of simple regressors without a reinforcement learning model. The other GLMs (GLM2 and GLM3) included temporal-difference (TD) errors as parametric modulators. All regressors were convolved with a standard two-gamma hemodynamic response function. For each GLM, the six scan-to-scan motion parameters were also included to account for the residual effects of movement.
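The construction of a parametric regressor — delta (stick) functions at event onsets, scaled by TD errors and convolved with a two-gamma HRF — can be sketched as below. The HRF shape parameters follow common SPM-style defaults, which is an assumption: the text specifies only a "standard two-gamma hemodynamic response function."

```python
import numpy as np
from scipy.stats import gamma

def two_gamma_hrf(dt, duration=32.0):
    """Canonical-style HRF: a response gamma minus a scaled undershoot.

    Shape parameters (6 and 16, undershoot ratio 1/6) are assumed
    SPM-style defaults, not values from the paper.
    """
    t = np.arange(0.0, duration, dt)
    response = gamma.pdf(t, 6)           # peak near 5-6 s
    undershoot = gamma.pdf(t, 16) / 6.0  # late undershoot
    h = response - undershoot
    return h / h.sum()

def td_regressor(onsets, td_errors, n_scans, tr, dt=0.1):
    """Stick functions scaled by TD errors, convolved with the HRF and
    resampled at scan times."""
    n = int(round(n_scans * tr / dt))
    sticks = np.zeros(n)
    for onset, err in zip(onsets, td_errors):
        sticks[int(onset / dt)] += err
    convolved = np.convolve(sticks, two_gamma_hrf(dt))[:n]
    scan_idx = (np.arange(n_scans) * tr / dt).astype(int)
    return convolved[scan_idx]
```

A unit-amplitude event at t = 0 then produces a predicted BOLD response that peaks a few seconds later, as illustrated by the dotted lines in Fig. 2B.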

GLM1: regressors with stimulus identity.

The GLM1 included regressors at the time of cue onset for the three trial types: the decision-picture-trial cue, the decision-money-trial cue, and the no-decision-picture-trial cue. The GLM1 also included regressors at the time of outcome onset for the three valence groups in each of the three trial types (9 regressors in total). Here, we were especially interested in how decision making modulates the neural response to emotional pictures. Brain regions that were specifically activated in response to emotional pictures in the decision trials compared with the no-decision trials were assumed to be involved in the emotional-outcome-guided decision-making process. To observe this effect, we constructed the following two contrast images: 1) [(decision-pleasant picture − decision-neutral picture) − (no-decision-pleasant picture − no-decision-neutral picture)] and 2) [(decision-unpleasant picture − decision-neutral picture) − (no-decision-unpleasant picture − no-decision-neutral picture)].

GLM2: TD error with valence-mixed representation.

We used the Q-learning model to analyze the fMRI data. A TD error with the valence-mixed representation was derived from the Q-learning model. To construct the TD-error regressor, the Q value of the chosen option, Q_i(t), was entered at the cue onset, and the reward-prediction error δ(t) was entered at the outcome onset ("valence-mixed TD error" in Fig. 2B).

GLM3: TD errors with valence-separated representation.

We also constructed a regressor set with the valence-separated representation of the Q-learning model. Two distinct TD errors, the appetitive and the aversive TD errors, were derived from the Q-learning model. For the appetitive TD error, the Q value of the chosen option, Q_i^+(t), was entered at the cue onset, and the appetitive reward-prediction error δ^+(t) was entered at the outcome onset (Fig. 2C). For the aversive TD error, the sign-flipped values −Q_i^−(t) and −δ^−(t) were used at the cue onset and the outcome onset, respectively. We used the sign-flipped TD error because previous studies using punishment reported that greater-than-predicted punishment induced positively greater BOLD signals in the striatum and the insula (Seymour et al. 2004, 2005).

The individual contrast images were then entered into a second-level analysis using a one-sample t-test. The resulting summary statistical map was initially given a threshold at P < 0.005 (uncorrected for multiple comparisons); a small volume correction (SVC) was then applied to our ROIs. The anatomically defined ROIs were extracted from all ROI libraries distributed with the MarsBaR toolbox (Brett et al. 2002) for the bilateral insula, the putamen, and the caudate. For the nucleus accumbens (NAcc), we used 6-mm radius spherical ROIs centered at the locations with MNI coordinates (x, y, z): right NAcc, 12, 8, −8; left NAcc, −12, 8, −8. Clusters that reached a threshold (P < 0.005 or 0.001, uncorrected) are shown overlaid on the average of all participants' normalized structural images.
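A 6-mm spherical ROI of the kind used for the NAcc can be generated as in the following sketch, assuming an affine matrix that maps voxel indices to MNI mm coordinates; the grid size and affine in the example are hypothetical.

```python
import numpy as np

def spherical_roi(center_mm, radius_mm, shape, affine):
    """Boolean mask of voxels whose MNI mm coordinates lie within
    radius_mm of center_mm, given a voxel-to-mm affine."""
    ii, jj, kk = np.meshgrid(*(np.arange(s) for s in shape), indexing="ij")
    vox = np.stack([ii, jj, kk, np.ones_like(ii)], axis=-1)
    mm = vox @ affine.T                     # voxel indices -> mm coordinates
    dist = np.linalg.norm(mm[..., :3] - np.asarray(center_mm), axis=-1)
    return dist <= radius_mm
```

For a 2-mm isotropic grid, a 6-mm sphere contains roughly (4/3)π·6³ / 8 ≈ 113 voxels.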

RESULTS

The participants carried out two decision-making tasks in which they selected one of two neutral fractal images, and a picture-viewing task without a decision (no-decision-picture trials). In the decision-picture trial, which was our main task, an emotional or emotionally neutral picture was presented as the outcome of the decision. In the decision-money trial, monetary feedback was presented, as in a conventional value-based decision-making task.

Behavioral Results

The valence ratings for pictures (1 = most unpleasant, 5 = neutral, and 9 = most pleasant) were collected after the fMRI scan. Here, we report the results only for the pictures that were actually presented in the experiment. The mean valence rating for the neutral pictures (mean ± SD across participants) was 5.08 ± 0.28, which did not significantly differ from neutral [= 5; t(24) = 1.336, P = 0.19]. The mean valence rating for the pleasant pictures was 7.53 ± 0.63, and the mean valence rating for the unpleasant pictures was 2.51 ± 0.58, both of which significantly differed from neutral [both P < 10−10; t(24) = 20.17 and t(24) = −21.07, respectively]. These results indicate that the picture categories assigned to each picture were valid. In addition, the absolute deviations from neutral of the ratings for the pleasant and unpleasant pictures did not significantly differ [t(24) = 0.108, P = 0.914], implying the symmetry of subjective valence for our picture set.

The decision-making tasks were challenging because the participants were required to continue learning due to the abrupt changes of the cue assignment to the advantageous/disadvantageous option. The fractions of trials in which the participants chose the advantageous option were 0.68 ± 0.11 for the decision-picture trials and 0.69 ± 0.09 for the decision-money trials. These values were significantly above the chance level [i.e., 0.5; both P < 10−5; t(24) = 7.98 and t(24) = 10.21, respectively]. As the time courses of the proportions of choosing the advantageous options indicated, after the contingencies (which options were advantageous) changed, the participants quickly shifted their preference to the newly advantageous option within a few trials of each switch in both trial types (Fig. 3).

Fig. 3.


Learning time courses of our participants for decision-picture trials (A) and decision-money trials (B). The solid lines depict the fraction of participants who chose the advantageous option on each trial. The vertical solid lines indicate the points at which contingencies changed (the advantageous options flipped). For the intervals marked with the same gray-shaded bar (in each panel), the same option was advantageous. The broken lines indicate the 95% confidence intervals for optimal choice (choosing the advantageous option) on each trial.

We fit several variants of the reinforcement learning models to the choice data of the decision trials and compared their goodness of fit. First, we examined whether the motivational value of neutral pictures, which was used as a reference point (= 0), differed from the initial value of each action (Q0). To achieve this, we compared the standard Q-learning model, in which the initial value Q0 was fixed at zero, with the standard Q-learning model, in which Q0 was parameterized (see materials and methods). The standard Q-learning model with a parameterized initial value (estimated parameters were α = 0.663, κP = 0.710, κN = −1.714, and Q0 = −1.666 for the decision-picture trials and α = 0.698, κP = 1.602, κN = −1.707, and Q0 = −0.766 for the decision-money trials) provided a significantly better fit to the data than the restricted Q-learning model with Q0 = 0 [χ2(1) = 31.49, P < 10−7 for the decision-picture trials and χ2(1) = 4.56, P = 0.032 for the decision-money trials, likelihood ratio test], suggesting that the neutral outcomes have appetitive motivational values in both the decision-picture trials and the decision-money trials. Next, to confirm that the pleasant pictures had a positive motivational value (relative to neutral pictures) and that unpleasant pictures had a negative motivational value in the decision-picture trial, we compared the standard Q-learning model with parameterized initial value and restricted models in which either the motivational value of pleasant or unpleasant pictures was set at zero (κP = 0 or κN = 0). 
For the decision-picture trials, the unrestricted standard Q-learning model fit the data significantly better than the restricted models with κP = 0 [χ2(1) = 109.8, P < 10^−10, likelihood ratio test] and with κN = 0 [χ2(1) = 14.63, P < 0.001], suggesting that, compared with the neutral pictures, the pleasant pictures had a more positive and the unpleasant pictures a more negative motivational value. The decision-money trials showed similar results: setting the motivational value of either gain or loss to zero significantly decreased the likelihood [χ2(1) = 102.38, P < 10^−10 and χ2(1) = 63.22, P < 10^−10, respectively]. Next, we examined a Q-learning model with two different learning rates for the appetitive value and the aversive value in the valence-separated representation (see materials and methods). For the decision-picture trials, the different-learning-rate model (estimated parameters: α+ = 0.647, α− = 0.669, κP = 0.719, κN = −1.704, and Q0 = −1.661) did not fit significantly better than the standard Q-learning model with a common learning rate [χ2(1) = 0.01, P = 0.943]. Likewise, for the decision-money trials, the different-learning-rate model (estimated parameters: α+ = 0.618, α− = 0.795, κP = 1.739, κN = −1.545, and Q0 = −0.647) did not fit significantly better than the standard Q-learning model [χ2(1) = 0.87, P = 0.350]. Taken together, the Q-learning model with a nonzero (negative) initial value, valence-specific motivational values, and a single learning rate was the best model for both the decision-picture and the decision-money trials. We therefore used this model to generate prediction errors as parametric modulators of regressors in the fMRI analysis.
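The nested-model comparisons above rest on the likelihood-ratio test; a sketch of the computation for the one-restricted-parameter (df = 1) case, where the χ² survival function reduces to erfc(√(stat/2)). The log-likelihood values below are made up purely for illustration:

```python
import math

def likelihood_ratio_test_1df(loglik_full, loglik_restricted):
    """Likelihood-ratio test of a full model against a nested model with one
    parameter restricted (df = 1), as in the Q0 = 0 and kappa = 0 comparisons.
    For 1 degree of freedom, the chi-squared survival function equals
    erfc(sqrt(stat / 2)), so no external stats library is needed."""
    stat = 2.0 * (loglik_full - loglik_restricted)  # chi-squared statistic
    p = math.erfc(math.sqrt(stat / 2.0))            # P value (df = 1)
    return stat, p

# Hypothetical log-likelihoods chosen so that stat matches the reported 31.49:
stat, p = likelihood_ratio_test_1df(-1200.0, -1215.745)
```

With this pair of (invented) log-likelihoods, stat = 31.49 and the resulting P value falls below 10^−7, matching the magnitude reported for the decision-picture comparison.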

fMRI Results

Neural activities reflecting the stimulus identity.

We first investigated several contrast images derived from the GLM in which stimulus identities were included as regressors (GLM1). Numerous studies have examined brain activity in response to emotional pictures selected from the same database as ours (Britton et al. 2006; Lang et al. 1998b; Lang and Bradley 2010; Sabatinelli et al. 2005, 2007). The basic contrast image analysis for BOLD responses to outcomes in the decision-picture trials is shown in Fig. 4. The presentation of emotional (pleasant and unpleasant) pictures activated the occipital cortex more strongly than did neutral pictures (Fig. 4, A and B; Table 1), which might be caused in part by differences in the visual properties of the pictures among the three categories. In addition, unpleasant pictures activated the fusiform gyrus and thalamus (Fig. 4B; Table 1), in agreement with previous studies (e.g., Lane et al. 1997). Our primary interest, however, was how presenting these pictures as decision outcomes would influence the neural responses to them. Thus we tested the differences between the pleasant/unpleasant pictures and the neutral pictures in the decision-picture trials after subtracting the corresponding contrast for the no-decision trials (see materials and methods for details). For the pleasant picture contrast, there were no significant regions in our ROIs, i.e., the insula and the striatum. Instead, whole brain analysis revealed differential activations in the midbrain, the left fusiform gyrus, and the right parahippocampal gyrus (all P < 0.001, uncorrected; Fig. 5A; Table 2). For the unpleasant picture contrast, there were significant activations in our ROIs, i.e., the left anterior insula (x = −30, y = 26, z = −2) and the right ventral striatum, including the putamen (x = 18, y = 10, z = −6) and the NAcc (x = 16, y = 10, z = −6) (all P < 0.01, after SVC for multiple comparisons within anatomically defined masks; Fig. 5B). The estimated parameters (β values) for each outcome condition indicated that these significant contrasts were driven mainly by decreases in activity in response to the neutral pictures in the decision trials, rather than by increases in activity in response to the pleasant or unpleasant pictures (Fig. 5C).

Fig. 4.

Regions showing significant statistical contrast at the outcome onsets. A: pleasant > neutral for decision-picture trials. B: unpleasant > neutral for decision-picture trials. C: decision-picture trials > no-decision-picture trials (valence collapsed). D: decision-picture trials > no-decision-picture trials (valence separated). Colors indicate valence. Thresholds for the maps have been set at P < 0.001 for A–D. dlPFC, dorsolateral prefrontal cortex; dmPFC, dorsomedial prefrontal cortex; IPL, inferior parietal lobule.

Table 1.

fMRI peak voxels for emotional vs. neutral picture contrasts

Area L/R x y z k t Value
Pleasant > neutral
    Cuneus L −10 −94 14 1,147 6.43
    Calcarine gyrus L −6 −86 2 5.63
    Middle temporal gyrus R 50 −36 −6 24 4.75
    Middle temporal gyrus L −60 −6 −16 18 4.40
    Middle occipital gyrus L −52 −78 6 24 4.31
    Middle cingulate cortex L −2 26 32 30 4.02
    Lingual gyrus L −14 −64 −10 47 3.91
    Supramarginal gyrus R 66 −40 26 22 3.83
Unpleasant > neutral
    Fusiform gyrus L −40 −46 −26 117 6.92
    Calcarine gyrus L −6 −90 4 544 6.40
    Middle cingulate cortex R 2 24 36 212 6.06
    Superior medial gyrus L −2 42 34 3.80
    Thalamus R 4 −10 2 108 5.55
    Supplementary motor area R 12 16 58 257 5.30
    Supplementary motor area R 10 10 64 5.18
    Middle temporal gyrus R 54 −72 0 43 4.75
    Parahippocampal gyrus R 16 −6 −18 62 4.59
    Inferior frontal gyrus R 48 22 24 57 4.40
    Inferior frontal gyrus R 40 22 22 4.19
    Inferior temporal gyrus R 46 −52 −22 30 4.39
    Temporal pole R 40 24 −32 20 4.35
    Middle occipital gyrus L −50 −76 6 19 4.23
    Supramarginal gyrus R 50 −40 32 18 4.17
    Superior frontal gyrus R 24 56 14 18 4.14
    Lingual gyrus R 12 −60 −4 22 4.12

The uncorrected P-value threshold was P < 0.001, and the cluster size (k) threshold was 15 voxels. An empty cluster size indicates that the peak lies in the same cluster as the peak listed above it.

Fig. 5.

Regions differentially activated by emotional pictures specifically in the decision-picture trials. A: regions activated by pleasant pictures: right parahippocampal gyrus, midbrain, and fusiform gyrus. B: regions activated by unpleasant pictures: right ventral striatum (putamen) and left anterior insula. The thresholds for the maps have been set at P < 0.001 or 0.005, uncorrected. C: estimated parameter values (β) for the right parahippocampal gyrus, fusiform gyrus, left anterior insula, and right ventral striatum. Vtr Str, ventral striatum.

Table 2.

fMRI peak voxels for decision vs. no-decision contrasts

Area L/R x y z k t Value
Pleasant pictures
(Decision-pleasant − decision-neutral) > (no-decision-pleasant − no-decision-neutral)
    Midbrain 2 −38 −12 22 5.13
    Fusiform gyrus L −38 −48 −24 31 4.82
    Temporal lobe/subgyral R 42 0 −24 24 4.73
    Parahippocampal gyrus R 34 −42 −8 46 4.68
    Superior temporal gyrus R 54 −10 −6 24 4.47
Unpleasant pictures
(Decision-unpleasant − decision-neutral) > (no-decision-unpleasant − no-decision-neutral)
    Superior frontal gyrus R 24 54 2 81 6.60
    Putamen R 18 10 −6 74 5.02
    Middle frontal gyrus R 32 12 44 135 4.76
    Anterior cingulate cortex R 8 40 26 41 4.69
    Fusiform gyrus L −40 −46 −24 38 4.60
    Mid orbital gyrus R 8 54 −4 26 4.56
    Precentral gyrus L −34 −10 52 22 4.44
    Supramarginal gyrus R 46 −38 36 80 4.27
    Middle occipital gyrus R 38 −92 4 18 4.18
    Rolandic operculum L −50 6 2 25 4.02
    Postcentral gyrus R 48 −30 54 38 4.00
    Cerebellum R 10 −52 −60 21 3.98
    Middle cingulate cortex L −2 26 34 19 3.91
    Middle frontal gyrus R 30 22 56 18 3.87
    Angular gyrus R 50 −66 38 25 3.87

The threshold was P < 0.001, and the cluster size threshold was 15 voxels.

The results of other contrasts between the decision-picture trials and the no-decision-picture trials are shown in Fig. 4, C and D. Several regions, such as the right dorsolateral prefrontal cortex (dlPFC), the dorsomedial prefrontal cortex (dmPFC), the right inferior parietal lobule (IPL), and the precuneus, showed differential activations (Fig. 4C). Subsequent analyses that separated the data by valence revealed that pictures of all valences activated the dlPFC and the right IPL, although the regions activated by neutral pictures were smaller than those activated by emotional pictures (Fig. 4D). The precuneus was activated only by emotional (pleasant and unpleasant) pictures, and the dmPFC only by unpleasant pictures. Because characterizing the basic neural responses to monetary rewards was not a primary aim of this study, we did not include a no-decision control condition for the monetary trials, so the analyses used for the picture trials were infeasible here. Instead, we analyzed basic contrasts, such as gain (+500 yen) vs. no gain (0 yen) and loss (−500 yen) vs. no gain (0 yen) (Table 3). These contrasts revealed significant activations primarily in the occipital regions (visual cortex), perhaps reflecting differences in the visual properties of the monetary feedback displays.

Table 3.

fMRI peak voxels for outcome of decision-money trials

Area L/R x y z k t Value
Gain > no gain
    Lingual gyrus L −6 −82 −8 3769 10.50
    Cuneus R 14 −98 6 7.97
    Calcarine gyrus R 14 −94 −2 7.80
    Superior temporal gyrus L −48 −24 4 238 4.91
    Rolandic operculum L −46 −24 20 4.66
    Cerebellum R 50 −52 −38 26 4.66
    Anterior cingulate cortex R 8 42 12 17 4.65
    Precentral gyrus L −34 −18 72 69 4.50
    Temporal pole R 42 6 −22 22 4.44
    Superior temporal gyrus L −48 −12 −6 32 4.30
Loss > no gain
    Lingual gyrus L −6 −82 −8 4879 13.39
    Cuneus R 14 −100 6 8.80
    Calcarine gyrus L −4 −96 0 8.02
    Calcarine gyrus R 2 −40 −22 34 4.59
    Precentral gyrus L −46 −6 38 44 4.40
    Supramarginal gyrus L −60 −34 34 19 4.32
    Postcentral gyrus L −48 −18 24 16 3.92

The uncorrected P-value threshold was P < 0.001, and the cluster size threshold was 15 voxels.

Neural activities reflecting TD errors.

To investigate the brain regions representing the learning signal, i.e., the TD error, we entered the TD errors into the GLM as parametric modulators (GLM2 and GLM3). The TD errors were derived on a trial-by-trial basis from the fitted Q-learning model with the valence-mixed representation (for GLM2) and the valence-separated representation (for GLM3). With the valence-mixed representation, unpleasant and pleasant outcomes are assumed to have values of opposite sign, and a single TD error is assumed. With the valence-separated representation, two types of learning signals are assumed: the appetitive TD error, which is related to the appetitive value of the cues, and the aversive TD error, which is related to the aversive value of the cues (Fig. 2B; see materials and methods). The appetitive TD error updates the expected value of the cues related to appetitive events (pleasant pictures or monetary gain), whereas the aversive TD error updates the expected value of the cues related to aversive events (unpleasant pictures or monetary loss). Because these two representations are different expressions of a single model (the standard Q-learning model with a non-zero initial value), they provide statistically identical predictions regarding choice behavior. Consequently, there is no reason to favor one representation over the other on the basis of statistical fits to the behavioral data. Thus we used both representations in separate analyses.
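Assuming that each outcome's motivational value is split into an appetitive and an aversive component (r = r+ − r−; this decomposition is our reading of the description above, and the exact formulation is given in materials and methods), the two TD errors for a single trial could be sketched as follows. The function name and argument layout are hypothetical:

```python
def td_errors(outcome_value, q_app, q_avs, alpha):
    """One-trial TD errors under the valence-separated representation.

    The net action value is q_app - q_avs, and the aversive component grows
    more positive the more aversive the outcomes (a sketch of the scheme
    described in the text, not the paper's exact formulation).

    outcome_value: motivational value of the outcome (e.g., kappa_P > 0 for
    a pleasant picture, kappa_N < 0 for an unpleasant one).
    """
    r_app = max(outcome_value, 0.0)   # appetitive component of the outcome
    r_avs = max(-outcome_value, 0.0)  # aversive component (positive magnitude)
    delta_app = r_app - q_app         # appetitive TD error
    delta_avs = r_avs - q_avs         # aversive TD error
    q_app += alpha * delta_app        # update appetitive value
    q_avs += alpha * delta_avs        # update aversive value
    # The valence-mixed TD error is recovered as the difference of the two:
    # delta = (r_app - r_avs) - (q_app - q_avs) = delta_app - delta_avs
    delta_mixed = delta_app - delta_avs
    return delta_app, delta_avs, delta_mixed, q_app, q_avs
```

The last identity makes concrete why the two representations are behaviorally equivalent: the mixed-valence TD error is simply the appetitive TD error minus the aversive one, so choice predictions are unchanged even though the two component signals can be regressed against BOLD separately.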

No regions in our ROIs correlated with the TD error derived from the valence-mixed representation (GLM2), even at a very lenient threshold (P < 0.01, uncorrected), in either the decision-picture or the decision-money trials. In contrast, for the valence-separated representation (GLM3), we found several regions in the ROIs that were correlated with the TD errors. For the appetitive TD error in the decision-picture trials, several regions within our ROIs, including the insula and striatum, showed significant correlations after SVC (Fig. 6A; left insula: x = −32, y = 14, z = 8, P < 0.01; right caudate: x = 8, y = 20, z = 6, P < 0.05). Whole brain analysis revealed that the left precuneus showed the strongest correlation with the appetitive TD error (Fig. 6A; Table 4). In addition, several regions in our ROIs showed activity significantly correlated with the aversive TD error (Fig. 6B; Table 4; all P < 0.05, SVC; right NAcc: x = 16, y = 10, z = −4; right insula: x = 34, y = 16, z = 6; left insula: x = −42, y = −2, z = 8; right caudate: x = 8, y = 18, z = 4; right putamen: x = 18, y = 10, z = −2). The bilateral precuneus was also correlated with the aversive TD error (left: x = −6, y = −61, z = 40; right: x = 9, y = −73, z = 61).

Fig. 6.

Regions correlated with TD errors derived from the Q-learning model with valence-separated representation. A: the appetitive TD error for decision-picture trials was correlated with the bilateral precuneus, right caudate, and left middle insula. B: the aversive TD error for decision-picture trials was also correlated with the bilateral precuneus, right caudate, right ventral striatum, and right middle insula. C: the appetitive and aversive TD errors for decision-money trials were correlated with the left caudate and right ventral striatum. The thresholds for the maps have been set at P < 0.001 or 0.005, uncorrected.

Table 4.

fMRI peak voxels for TD errors in decision-picture trials

Area L/R x y z k t Value
Appetitive TD error for decision-picture trials
    Precuneus L −8 −70 46 4725 7.41
    Precuneus R 14 −52 50 6.26
    Middle frontal gyrus L −28 2 58 6.13
    Middle cingulate cortex L −2 22 32 257 5.60
    Inferior occipital gyrus L −18 −98 −10 67 5.51
    Supramarginal gyrus R 66 −38 40 243 5.45
    Supplementary motor area R 12 4 56 610 5.43
    Rolandic operculum L −52 6 4 150 5.22
    Inferior frontal gyrus L −60 14 4 4.19
    Cerebellum R 2 −48 −42 29 5.12
    Caudate nucleus R 8 18 4 44 4.82
    Supramarginal gyrus L −62 −24 16 141 4.74
    Superior temporal gyrus L −48 −22 12 4.42
    Insula lobe L −32 12 8 44 4.74
    Paracentral lobule R 2 −26 76 32 4.52
    Putamen R 26 16 4 34 4.47
    Caudate nucleus R 22 22 8 3.63
    Caudate nucleus R 4 −62 −16 34 4.23
    Cerebellum R 26 −48 −26 17 4.20
    Postcentral gyrus R 34 −38 66 17 3.88
    Middle frontal gyrus L −30 42 16 18 3.87
    Superior frontal gyrus L −28 56 22 17 3.75
Aversive TD error for decision-picture trials
    Supplementary motor area L 2 20 48 1507 6.18
    Caudate nucleus R 8 18 2 228 6.18
    Cerebellum R 36 −48 −50 142 5.84
    Middle frontal gyrus L −26 2 56 721 5.79
    Inferior parietal lobule L −44 −40 40 781 5.69
    Supramarginal gyrus R 62 −50 26 911 5.61
    Middle frontal gyrus R 30 56 24 248 5.57
    Precuneus L −6 −76 44 736 5.49
    Precuneus R 14 −50 46 4.87
    Precentral gyrus L −44 2 26 227 5.47
    Insula lobe R 34 16 6 45 5.12
    Cerebellum L −38 −46 −38 29 5.06
    Inferior occipital gyrus L −26 −86 −6 18 4.77
    Cerebellum L −28 −54 −36 15 4.76
    Cerebellum R 18 −52 −26 28 4.53
    Precentral gyrus L −28 −20 52 46 4.42
    Rolandic operculum L −44 −2 8 27 4.31
    Cerebellum R 6 −82 −26 25 4.16
    Superior frontal gyrus R 14 36 52 17 4.15
    Superior frontal gyrus L −28 56 22 23 4.10

The threshold was P < 0.001, and the cluster size threshold was 15 voxels.

With respect to the decision-money trials, the striatum regions were significantly correlated with the appetitive TD error (Fig. 6C; Table 5; all P < 0.05, SVC; left caudate: x = −8, y = 6, z = 16; right NAcc: x = 12, y = 14, z = −8; right insula, x = 42, y = 10, z = 4). The right NAcc also correlated with the aversive TD error (P < 0.05; right NAcc: x = 16, y = 8, z = −10). Other regions in our ROI showed a tendency to correlate with the aversive TD error but failed to reach the level of significance after SVC (Fig. 6C; Table 5; left caudate: x = −10, y = 10, z = 8, P = 0.055; right insula: x = 38, y = 14, z = 4, P = 0.077).

Table 5.

fMRI peak voxels for TD errors in decision-money trials

Area L/R x y z k t Value
Appetitive TD error for decision-money trials
    Calcarine gyrus L −14 −96 −4 5178 10.23
    Precentral gyrus L −34 −2 62 4255 7.93
    Cerebellum R 40 −50 −38 363 6.86
    Cuneus R 26 −62 20 122 5.65
    Cerebellum L −20 −36 −46 36 5.20
    Insula lobe R 42 10 4 46 5.05
    Superior parietal lobule L −14 −68 54 268 5.03
    Anterior cingulate cortex L −2 42 10 15 4.43
    Middle frontal gyrus R 32 46 26 17 4.33
    Hippocampus R 40 −28 −12 20 4.31
    Superior frontal gyrus L −22 42 48 131 4.02
    Middle occipital gyrus R 38 −76 24 29 3.95
    Postcentral gyrus L −54 −16 28 43 3.74
Aversive TD error for decision-money trials
    Calcarine gyrus L −14 −98 −4 5302 11.13
    Supplementary motor area L −2 2 60 3634 10.50
    Precuneus L −8 −60 58 177 7.70
    Superior parietal lobule L −16 −70 46 232 6.82
    Cerebellum L −22 −38 −46 34 5.20
    Cerebellum R 40 −50 −32 160 4.61
    Superior occipital gyrus R 28 −80 32 46 4.53
    Superior parietal lobule L −38 −46 60 97 4.36
    Postcentral gyrus L −48 −20 28 31 4.30
    Fusiform gyrus L −32 −58 −18 50 4.16
    Precuneus R 20 −58 22 57 4.10
    Middle frontal gyrus L −24 42 24 16 3.94

The threshold was P < 0.001, and the cluster size threshold was 15 voxels.

DISCUSSION

We investigated how emotional stimuli drive subsequent decision-making processes via learning processes. Although previous studies have investigated how emotional facial stimuli modulate financial reward-based decision making (Evans et al. 2011), to date, no study has addressed how purely emotional stimuli, presented solely as decision outcomes, guide decision making. In addition, we used general emotion-evoking pictures, including emotional scenes adopted from a standard emotional picture set, rather than pictures of facial expressions. The reinforcement learning model in which motivational value parameters were free parameters allowed us to quantify the impact of the emotional images on the subsequent choice behavior; this model also provided trial-by-trial learning signals, i.e., prediction errors (TD errors), which were correlated with the BOLD signal.

The analysis with basic stimulus identity contrasts revealed significant differential activations (compared with neutral pictures) in the ventral striatum (including the putamen and the NAcc) for unpleasant pictures. As shown in Fig. 5C, these differences were mainly driven by a decrease in activity in response to the neutral images in the decision trials rather than by an increase in activity in response to the unpleasant pictures. This decreased activity in response to neutral pictures was not observed in the no-decision trials. Thus the decrease occurred specifically in the decision-making context, possibly reflecting different attitudes of the participants toward the task. One possible explanation for the decreased activity in response to neutral images is a negative prediction error, caused by expecting pleasant or unpleasant pictures with a non-zero probability. Consistent with this interpretation, the model-based analysis showed significant correlations between the aversive prediction errors (aversive TD errors) and activity in these regions. As with appetitive rewards, several studies have reported that prediction errors for aversive stimuli, such as electric shock and monetary loss, are positively correlated with the BOLD signal in the ventral striatum (Li et al. 2011; Seymour et al. 2007). In addition, activity in the insula has been shown to correlate with the TD error, particularly for aversive events, which is consistent with the present study if we regard seeing an unpleasant picture as an aversive event (Seymour et al. 2004, 2005). Our results suggest that the negative emotion induced by the unpleasant pictures similarly drove avoidance behavior through neural systems shared with other aversive stimuli. In addition, our results are consistent with Levita et al. (2012), who reported that BOLD responses in the ventral striatum are higher in active avoidance conditions than in passive avoidance conditions.
A series of fMRI studies reported that the BOLD responses in striatum are dominated by action requirements (e.g., go vs. no go) rather than by valence, in accord with the view that dopamine has a role in modulating vigor or motivation for actions independent of valence (Guitart-Masip et al. 2011, 2014).

We considered two types of action-value representations. One was a valence-mixed representation, the straightforward representation employed in standard reinforcement learning models, which expresses positive and negative valence along one continuous dimension, with negative valences taking negative values. The other was a valence-separated representation, which we proposed in the present study, which decomposes the action value into an appetitive component and an aversive component; the more aversive events an action induces, the more positive the value of its aversive component. Our results showed that the striatum and insula correlated mainly with TD errors under the valence-separated representation. This result suggests that at least these regions represent positive and negative valences separately, rather than representing both along one continuous dimension.

TD errors computed in the present study consist of a cue-induced anticipated value and a prediction error regarding the outcome value (Fig. 2B). Previous studies have reported that the insula is activated during the anticipation of emotional pictures (Grupe et al. 2013). This anticipation-related activation of the insula can be interpreted as a part of TD error-related activity and was incorporated in our analysis. Similar neural substrates might underlie the anticipation of aversive stimuli and TD error-based action-value updates.

We previously found that the magnitude of motivational values was larger for unpleasant pictures than for pleasant pictures (Katahira et al. 2011). In the present study, this asymmetry was also observed as a difference in the estimated motivational value parameters. Significant correlations with the aversive TD errors were found in both the ventral and dorsal parts of the striatum and in the insula. On the other hand, activity in the ventral part of the striatum (NAcc and putamen) did not correlate significantly with the appetitive TD error, whereas activity in the caudate did. These differences in activation patterns may account for the asymmetry between the effects of the pleasant and unpleasant pictures on decision making. Because ventral striatal activity is directly related to motivation, the activity in response to unpleasant pictures may have a larger influence (toward avoidance) on subsequent choice behavior.

In addition to the brain regions in our ROIs, pleasant pictures in the decision-making task activated the fusiform gyrus and the parahippocampal gyrus (compared with neutral pictures, as revealed by the GLM1 analysis), as well as the bilateral precuneus, which was correlated with appetitive TD errors. The precuneus was also activated by the aversive prediction error in the decision-picture trials. Previous functional imaging studies have suggested that the precuneus is involved in self-processing operations, mental imagery, and episodic memory retrieval (for a review, see Cavanna and Trimble 2006). One fMRI study employing an empathic judgment task during the viewing of emotional pictures reported that attributing emotions to the self and to other individuals commonly activated the precuneus (Ochsner et al. 2004). The activation of the precuneus for both appetitive and aversive prediction errors in our decision-making task may thus reflect emotional pictures affecting the value of options through a mental imagery process, such as "how would I feel if I were placed in this situation?" or "how does the person in the picture feel?" Some pictures used in our study were not biologically or directly pleasant (such as sexual images) or unpleasant (such as pictures of snakes or spiders) but acquired emotional valence only through interpretation of the depicted scenes. The fusiform gyrus, the parahippocampal gyrus, and the precuneus may engage in such interpretation processes. The precuneus has anatomic projections to the dorsolateral caudate nucleus and the putamen, possibly allowing it to directly influence the updating of choice values (Cavanna and Trimble 2006).

The TD errors for both monetary gain and monetary loss in the decision-money trials activated regions of the striatum, in agreement with previous fMRI studies (O'Doherty et al. 2004; Pessiglione et al. 2006; Seymour et al. 2007). However, we found no significantly increased activation of the ventral striatum or orbitofrontal cortex in the gain vs. no-gain contrast (Table 3), in contrast to previous studies that employed basic tasks with monetary feedback. This inconsistency might reflect the weaker detection power of our experimental design for monetary outcomes compared with pictures, and the greater sensitivity of ventral striatum activity to reward prediction error than to simple differences in outcome value. The weak detection power for monetary reward may arise from two factors. First, the picture trials and money trials were intermixed in our experiment; thus the picture trials may have dampened the saliency of the monetary outcomes. Saliency is a key factor for activation of the striatum (Zink et al. 2003, 2004). Second, because the participants were informed that a sufficiently high minimum payment (∼5,000 yen) was guaranteed for participation in the experiment, they might have been relatively indifferent to the monetary feedback.

After we conducted the present experiment, we became aware of the study by Lin et al. (2012) investigating the neural substrates of social reward learning, in which smiling, angry, or neutral faces paired with emotional words were used as decision outcomes. They found substantial overlap in the ventral striatum between regions that correlated with prediction errors for social rewards and those that correlated with prediction errors for monetary rewards. Our experimental design was superficially similar to theirs but differed in several respects. First, they used emotional facial expressions as decision outcomes, whereas we used general emotional pictures, including natural scenes, rather than social rewards. This difference might explain the disparities in the activated regions: our results showed notable correlations with prediction error in the precuneus, the insula, and the ventral striatum. Second, the association rules between cues and outcomes differed. In Lin et al. (2012), the target cues produced either a positive or a negative outcome in addition to a neutral outcome, whereas control cues produced all valences with equal probability (1/3). In contrast, in our experiment, cues always produced both positive and negative outcomes, which enabled us to evaluate how the brain learns the values of cues incorporating both valences. Third, we separated the influence of appetitive and aversive outcomes by using the valence-separated representation of the Q-learning model, whereas Lin et al. (2012) collapsed them, because their model expressed both valences in one dimension, with the motivational values of appetitive outcomes set to 1, those of neutral outcomes set to 0.5, and those of aversive outcomes set to 0. Furthermore, in our experimental design, the contingency switched without any cue, which made the task difficult to learn and produced sufficient variability in the prediction error.

In conclusion, the present study examined the neural basis of decision making guided by emotional events rather than a quantifiable reward or punishment. We found that brain regions correlated with prediction error related to emotional pictures overlapped, in part, with reward/motivation systems (the striatum and the insula). In addition, these regions appear to encode positive and negative valence separately, rather than on one continuous dimension. Because we restricted our interest to the most basic dimension of emotion (i.e., valence), other important features, such as arousal and dominance, remain unexplored in the context of decision making. Our experimental and modeling paradigm is the first step in exploring how more complex aspects of emotion guide decision making.

GRANTS

This work was supported, in part, by funding from the Japan Science and Technology Agency, Exploratory Research for Advanced Technology, the Okanoya Emotional Information Project, and Grant-in-Aid for Scientific Research 24700238.

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the authors.

AUTHOR CONTRIBUTIONS

K.K., Y.-T.M., T.F., K.C., K.O., and M.O. conception and design of research; K.K., Y.-T.M., K.U., T.A., and C.S. performed experiments; K.K., Y.-T.M., K.U., T.A., and C.S. analyzed data; K.K., Y.-T.M., T.F., and K.C. interpreted results of experiments; K.K. prepared figures; K.K. drafted manuscript; K.K., Y.-T.M., T.F., K.U., C.S., and K.C. edited and revised manuscript; K.K., Y.-T.M., T.F., K.U., T.A., C.S., K.C., K.O., and M.O. approved final version of manuscript.

ACKNOWLEDGMENTS

We thank Shinsuke Suzuki for advice on the experimental design and the data analysis, and Hiroki C. Tanabe and Kunihiro Hasegawa for advice on the fMRI data processing.

Footnotes

1

The IAPS slide numbers (and descriptions) used in this study were 2411 (Girl), 7004 (Spoon), 7217 (Clothes Rack), 7491 (Building), 2840 (Chess), 7010 (Basket), 7175 (Lamp), 7500 (Building), 7950 (Tissue), 5740 (Plant), 7014 (Scissors), 7077 (Stove), 7018 (Screw), 7021 (Whistle), 7001 (Buttons), 7590 (Traffic), 7006 (Bowl), 5395 (Boat), 7150 (Umbrella), and 7026 (Picnic Table) for the neutral pictures; 1440 (Seal), 5199 (Garden), 5910 (Fireworks), 5994 (Skyline), 7470 (Pancakes), 1710 (Puppies), 5833 (Beach), 7502 (Castle), 8031 (Skier), 5301 (Galaxy), 1920 (Dolphins), 2655 (Child), 5890 (Earth), 1410 (Ferret), 5829 (Sunset), 2314 (Binoculars), 7508 (Ferris Wheel), 5825 (Sea), 1720 (Lion), and 5621 (Sky Divers) for the pleasant pictures; and 6550 (Attack), 3230 (Dying Man), 9041 (Scared Child), 9295 (Garbage), 9419 (Assault), 1271 (Roaches), 6231 (Aimed Gun), 9421 (Soldier), 9530 (Boys), 9610 (Accident), 2750 (Bum), 9280 (Smoke), 6242 (Gang), 7380 (Roach On Pizza), 9290 (Garbage), 1111 (Snakes), 9102 (Heroin), 2301 (Kid Crying), 2276 (Girl), and 6560 (Attack) for the unpleasant pictures.

REFERENCES

  1. Bechara A, Damasio H, Tranel D, Damasio AR. The Iowa Gambling Task and the somatic marker hypothesis: some questions and answers. Trends Cogn Sci 9: 159–164, 2005.
  2. Bradley MM, Codispoti M, Cuthbert BN, Lang PJ. Emotion and motivation I: defensive and appetitive reactions in picture processing. Emotion 1: 276–298, 2001.
  3. Bray S, O'Doherty J. Neural coding of reward-prediction error signals during classical conditioning with attractive faces. J Neurophysiol 97: 3036–3045, 2007.
  4. Brett M, Anton JL, Valabregue R, Poline JB. Region of interest analysis using the MarsBar toolbox for SPM 99. Neuroimage 16: S497, 2002.
  5. Britton JC, Taylor SF, Sudheimer KD, Liberzon I. Facial expressions and complex IAPS pictures: common and differential networks. Neuroimage 31: 906–919, 2006.
  6. Cavanna AE, Trimble MR. The precuneus: a review of its functional anatomy and behavioural correlates. Brain 129: 564–583, 2006.
  7. Codispoti M, Bradley MM, Lang PJ. Affective reactions to briefly presented pictures. Psychophysiology 38: 474–478, 2001.
  8. Cohen JD. The vulcanization of the human brain: a neural perspective on interactions between cognition and emotion. J Econ Perspect 19: 3–24, 2005.
  9. Daw ND. Trial-by-trial data analysis using computational models. Tutorial review. In: Decision Making, Affect, and Learning: Attention and Performance XXIII, edited by Delgado MR, Phelps EA, Robbins TW. Oxford, UK: Oxford Univ. Press, 2011, p. 3–38.
  10. Daw ND, O'Doherty JP, Dayan P, Seymour B, Dolan RJ. Cortical substrates for exploratory decisions in humans. Nature 441: 876–879, 2006.
  11. Evans S, Fleming SM, Dolan RJ, Averbeck BB. Effects of emotional preferences on value-based decision-making are mediated by mentalizing and not reward networks. J Cogn Neurosci 23: 2197–2210, 2011.
  12. Grupe DW, Oathes DJ, Nitschke JB. Dissecting the anticipation of aversion reveals dissociable neural networks. Cereb Cortex 23: 1874–1883, 2013.
  13. Guitart-Masip M, Duzel E, Dolan R, Dayan P. Action versus valence in decision making. Trends Cogn Sci 18: 194–202, 2014.
  14. Guitart-Masip M, Fuentemilla L, Bach DR, Huys QJM, Dayan P, Dolan RJ, Duzel E. Action dominates valence in anticipatory representations in the human striatum and dopaminergic midbrain. J Neurosci 31: 7867–7875, 2011.
  15. Hu X, Le TH, Parrish T, Erhard P. Retrospective estimation and correction of physiological fluctuation in functional MRI. Magn Reson Med 34: 201–212, 1995.
  16. Katahira K, Fujimura T, Matsuda YT, Okanoya K, Okada M. Individual differences in heart rate variability are associated with the avoidance of negative emotional events. Biol Psychol 103: 322–331, 2014.
  17. Katahira K, Fujimura T, Okanoya K, Okada M. Decision-making based on emotional images. Front Psychol 2: 311, 2011.
  18. Kellman P. Adaptive sensitivity encoding incorporating temporal filtering (TSENSE). Magn Reson Med 45: 846–852, 2001.
  19. Lane RD, Reiman EM, Bradley MM, Lang PJ, Ahern GL, Davidson RJ, Schwartz GE. Neuroanatomical correlates of pleasant and unpleasant emotion. Neuropsychologia 35: 1437–1444, 1997.
  20. Lang PJ, Bradley MM. Emotion and the motivational brain. Biol Psychol 84: 437–450, 2010.
  21. Lang PJ, Bradley MM, Cuthbert BN. Emotion, motivation, and anxiety: brain mechanisms and psychophysiology. Biol Psychiatry 44: 1248–1263, 1998a.
  22. Lang PJ, Bradley MM, Cuthbert BN. International Affective Picture System (IAPS): affective ratings of pictures and instruction manual. Technical Report A-8. Gainesville, FL: University of Florida, 2008.
  23. Lang PJ, Bradley MM, Fitzsimmons JR, Cuthbert BN, Scott JD, Moulder B, Nangia V. Emotional arousal and activation of the visual cortex: an fMRI analysis. Psychophysiology 35: 199–210, 1998b.
  24. Levita L, Hoskin R, Champi S. Avoidance of harm and anxiety: a role for the nucleus accumbens. Neuroimage 62: 189–198, 2012.
  25. Li J, Schiller D, Schoenbaum G, Phelps EA, Daw ND. Differential roles of human striatum and amygdala in associative learning. Nat Neurosci 14: 1250–1252, 2011.
  26. Lin A, Adolphs R, Rangel A. Social and monetary reward learning engage overlapping neural substrates. Soc Cogn Affect Neurosci 7: 274–281, 2012.
  27. Loewenstein GF, Weber EU, Hsee CK, Welch N. Risk as feelings. Psychol Bull 127: 267–286, 2001.
  28. Niv Y. Reinforcement learning in the brain. J Math Psychol 53: 139–154, 2009.
  29. Niv Y, Duff MO, Dayan P. Dopamine, uncertainty and TD learning. Behav Brain Funct 1: 6, 2005.
  30. Niv Y, Schoenbaum G. Dialogues on prediction errors. Trends Cogn Sci 12: 265–272, 2008.
  31. Ochsner KN, Knierim K, Ludlow DH, Hanelin J, Ramachandran T, Glover G, Mackey SC. Reflecting upon feelings: an fMRI study of neural systems supporting the attribution of emotion to self and other. J Cogn Neurosci 16: 1746–1772, 2004.
  32. O'Doherty J, Dayan P, Schultz J, Deichmann R, Friston K, Dolan RJ. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304: 452–454, 2004.
  33. O'Doherty J, Winston J, Critchley H, Perrett D. Beauty in a smile: the role of medial orbitofrontal cortex in facial attractiveness. Neuropsychologia 41: 147–155, 2003a.
  34. O'Doherty JP, Dayan P, Friston K, Critchley H, Dolan RJ. Temporal difference models and reward-related learning in the human brain. Neuron 38: 329–337, 2003b.
  35. Pessiglione M, Seymour B, Flandin G, Dolan RJ, Frith CD. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature 442: 1042–1045, 2006.
  36. Pruessmann KP, Weiger M, Scheidegger MB, Boesiger P. SENSE: sensitivity encoding for fast MRI. Magn Reson Med 42: 952–962, 1999.
  37. Sabatinelli D, Bradley MM, Fitzsimmons JR, Lang PJ. Parallel amygdala and inferotemporal activation reflect emotional intensity and fear relevance. Neuroimage 24: 1265–1270, 2005.
  38. Sabatinelli D, Bradley MM, Lang PJ, Costa VD, Versace F. Pleasure rather than salience activates human nucleus accumbens and medial prefrontal cortex. J Neurophysiol 98: 1374–1379, 2007.
  39. Seymour B, Daw N, Dayan P, Singer T, Dolan R. Differential encoding of losses and gains in the human striatum. J Neurosci 27: 4826–4831, 2007.
  40. Seymour B, O'Doherty JP, Dayan P, Koltzenburg M, Jones AK, Dolan RJ, Friston KJ, Frackowiak RS. Temporal difference models describe higher-order learning in humans. Nature 429: 664–667, 2004.
  41. Seymour B, Dolan R. Emotion, decision making, and the amygdala. Neuron 58: 662–671, 2008.
  42. Seymour B, O'Doherty JP, Koltzenburg M, Wiech K, Frackowiak R, Friston K, Dolan R. Opponent appetitive-aversive neural processes underlie predictive learning of pain relief. Nat Neurosci 8: 1234–1240, 2005.
  43. Shiv B, Loewenstein G, Bechara A, Damasio H, Damasio AR. Investment behavior and the negative side of emotion. Psychol Sci 16: 435–439, 2005.
  44. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
  45. Tanaka SC, Doya K, Okada G, Ueda K, Okamoto Y, Yamawaki S. Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops. Nat Neurosci 7: 887–893, 2004.
  46. Watkins CJ, Dayan P. Q-learning. Mach Learn 8: 279–292, 1992.
  47. Zink CF, Pagnoni G, Martin ME, Dhamala M, Berns GS. Human striatal response to salient nonrewarding stimuli. J Neurosci 23: 8092–8097, 2003.
  48. Zink CF, Pagnoni G, Martin-Skurski ME, Chappelow JC, Berns GS. Human striatal responses to monetary reward depend on saliency. Neuron 42: 509–517, 2004.
