Author manuscript; available in PMC: 2013 Nov 21.
Published in final edited form as: Neuron. 2012 Nov 21;76(4):826–837. doi: 10.1016/j.neuron.2012.09.030

The primate ventral pallidum encodes expected reward value and regulates motor action

Yoshihisa Tachibana 1,2, Okihide Hikosaka 1
PMCID: PMC3519929  NIHMSID: NIHMS412468  PMID: 23177966

SUMMARY

Motor actions are facilitated when expected reward value is high. It is hypothesized that there are neurons that encode expected reward values to modulate impending actions and potentially represent motivation signals. Here we present evidence suggesting that the ventral pallidum (VP) may participate in this process. We recorded single neuronal activity in the monkey VP using a saccade task with a direction-dependent reward bias. Depending on the amount of the expected reward, VP neurons increased or decreased their activity tonically until the reward was delivered, for both ipsiversive and contraversive saccades. Changes in expected reward values were also associated with changes in saccade performance (latency and velocity). Furthermore, bilateral muscimol-induced inactivation of the VP abolished the reward-dependent changes in saccade latencies. These data suggest that the VP provides expected reward value signals that are used to facilitate or inhibit motor actions.

INTRODUCTION

Animals and humans make an action quickly and preferentially if the action is expected to provide a valuable reward. This is the hallmark of `goal-directed behavior based on motivational values' (Dickinson and Balleine, 1994). Goal-directedness requires that the outcome of the action should be represented as a goal at the time of performance, and motivation is a mental state that embodies the goal-directedness. These considerations suggest that somewhere inside the brain there are neurons that represent motivation (or the value of an upcoming action) and that the motivational signal facilitates or inhibits the initiation/execution of the action.

A prominent candidate for such a goal-directed motivator is the basal ganglia. It has been shown that activity of many neurons in the basal ganglia is heavily influenced by expected reward values (Ding and Hikosaka, 2006; Hollerman and Schultz, 1998; Hong and Hikosaka, 2008; Joshua et al., 2009; Kawagoe et al., 1998; Lauwereyns et al., 2002; Pasquereau et al., 2007; Samejima et al., 2005; Sato and Hikosaka, 2002; Shidara et al., 1998). Anatomical studies have suggested that the basal ganglia act as an interface between non-motor processes and motor processes (Haber, 2003). In particular, the limbic part of the basal ganglia, which includes the ventral striatum (VS) and ventral pallidum (VP) (Heimer and Wilson, 1975), may convert motivation signals to motor signals (Mogenson et al., 1980). Accumulated evidence has suggested that the VS receives inputs from the `limbic' prefrontal cortex, hippocampus, and amygdala, which regulate motivational/emotional behaviors (Daw and Doya, 2006; Haber and McFarland, 1999; O'Doherty et al., 2007; Robbins and Everitt, 1996; Wise, 1996). The VP receives GABAergic projections from the VS and, in turn, projects to many brain areas involved in control of motivation such as the ventral tegmental area (VTA), substantia nigra pars compacta (SNc) and pars reticulata (SNr), thalamic mediodorsal nucleus (MD), and lateral habenula (LHb) (Haber and Knutson, 2010; Humphries and Prescott, 2010).

We chose the VP as a first step to address this question, because its activity should be more directly correlated with changes in the animal's performance due to the close connectivity between the VP and motor output regions. It has been suggested that the VP may serve as a `limbic final common pathway' for processing of rewards (Smith et al., 2009). In Pavlovian tasks, neurons in the rat VP responded to sensory stimuli that predicted an upcoming reward as well as to the reward itself (Smith et al., 2011; Tindell et al., 2004). Lesions, inactivations, and chemical manipulations of the rodent VP enhance or suppress reward-seeking behaviors (Cromwell and Berridge, 1993; Farrar et al., 2008; Johnson et al., 1996; McAlonan et al., 1993; Smith and Berridge, 2005). Chemical activation of the monkey VP by local bicuculline injection induced stereotyped, non-purposive behavior (Grabli et al., 2004). In humans, bilateral lesions of the globus pallidus (GP) and the VP lead to a lack of motivation and pleasure (Bhatia and Marsden, 1994; Miller et al., 2006). Recent human imaging studies have demonstrated that the VP is activated during various kinds of motivational tasks associated with primary (food and water) and secondary (monetary gains) rewards (Beaver et al., 2006; Pessiglione et al., 2007). However, few studies have examined how individual VP neurons encode motivational/emotional states (Ito and Doya, 2009; Smith et al., 2011; Tindell et al., 2004) and how they modulate goal-directed behavior.

In our experiments we manipulated the value of an upcoming action by asking two macaque monkeys to perform a reward-biased saccade task. We found that many VP neurons showed differential activity encoding expected reward values. The role of the VP neuronal activity in motivating or demotivating the goal-directed saccade was supported by chemical inactivations of the VP in one of the monkeys.

RESULTS

To examine the functional roles of the VP in reward-oriented actions, we recorded activity of single neurons in the VP while two monkeys (P and H) were performing a reward-biased memory-guided saccade task (Ding and Hikosaka, 2006; Kawagoe et al., 1998; Nakamura et al., 2008), in which reward size was associated with the direction of saccade (Figures 1A and 1B). A trial started with the appearance of a central point that the monkey had to fixate. During the central fixation, a cue indicating the position of the saccade target was briefly presented randomly to the left or right. After a delay period the central fixation point turned off (go signal), and the monkeys had to make a saccade to the remembered target position. Within one block of 24 trials, saccades to one position were associated with a large amount of liquid reward (large-reward trials) while saccades to the other position were associated with a small amount of reward (small-reward trials). In the next block of 24 trials the position-reward contingency was reversed without external instructions. Animals reliably adjusted their saccade performance along with the position-reward contingency reversals: saccades to the large-reward position had higher velocities and shorter latencies and saccades to the small-reward position had lower velocities and longer latencies (Figures 5A and 5B). This indicates that the animals' performance was modulated by the expected reward size.

Figure 1.


Behavioral task and the location of the ventral pallidum (VP)

(A and B) Reward-biased memory-guided saccade task. (A) Time course of the task. While the monkey was fixating a central spot, another spot (target cue) appeared either on the left or right. After a delay period the fixation point turned off (go signal) and the monkeys had to make a saccade to the remembered position of the target cue. A correct saccade was signaled by the appearance of the target (feedback). One of the two targets was associated with a large amount of reward, while the other target was associated with a small amount of reward. (B) Reward schedule. Within a block of 24 trials, the reward amount was fixed with regard to the direction of saccade. The position-reward contingency was reversed in the next block of trials without external instructions. The position of the target was chosen pseudorandomly (see Experimental Procedures). (C) Recording sites, shown on an MR image (left panel) and on a Nissl-stained section (right panel) in monkey H. The T2-weighted MR image shows the pallidal complex as black areas (indicated by an arrow). A green line indicates a simulated electrode penetration that passed through the anterior commissure (AC) and reached the VP. The Nissl-stained section shows two electrolytic lesions, one in the AC and the other at the ventral end of the VP. In this penetration, reward-related neurons were recorded 0.2–1.4 mm above the VP lesion. Cd, caudate nucleus; IC, internal capsule; Put, putamen. (D) Locations of reward-related neurons in the VP: positive type (red) and negative type (blue). Recording sites from three hemispheres in two monkeys are superimposed on two representative coronal sections. GPe, external segment of the globus pallidus (GP); v, lateral ventricle.

Tachibana et al.

Figure 5.


Correlated changes between saccade performance and VP neuronal activity

Average saccade performance (A–B) and VP neuronal activity (C–D) are shown across trials as their outcomes changed blockwise between a small reward (blue) and a large reward (red). The saccade performance is shown as saccade latency (A) and saccade peak velocity (B). VP neuronal activity is shown for the pre-saccade period (300–0 ms before saccade onset) for the positive type neurons (C) and the negative type neurons (D). The pre-saccadic activity was defined as the firing rate during the pre-saccade period minus the baseline firing rate. For saccade latency and velocity, data were obtained for saccades in both directions (i.e., left and right). Since saccades in both directions went through block-wise transitions of reward outcome, but in different phases (e.g., left-large and right-small), we realigned the data in both directions based on their reward outcomes. The saccade data were normalized with respect to the mean values for each recording session, and then were averaged across the sessions. The data from monkeys P and H were combined. Error bars indicate ± 1 SEM. Following the reversals of the position-reward contingency, the pre-saccadic VP activity changed with a time course similar to that of saccade latency and velocity.
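The definition of pre-saccadic activity given in this legend (firing rate in the 300 ms before saccade onset, minus the baseline rate) can be illustrated with a minimal sketch. The function names, the representation of a trial as a list of spike times in milliseconds, and the half-open window convention are assumptions made for illustration; they are not taken from the authors' analysis code.

```python
def firing_rate(spike_times_ms, start_ms, end_ms):
    """Mean firing rate (spikes/s) in the half-open window [start_ms, end_ms)."""
    n_spikes = sum(1 for t in spike_times_ms if start_ms <= t < end_ms)
    return n_spikes / ((end_ms - start_ms) / 1000.0)


def presaccadic_activity(spike_times_ms, saccade_onset_ms, baseline_rate,
                         window_ms=300.0):
    """Firing rate in the window_ms before saccade onset minus the baseline rate.

    baseline_rate is supplied by the caller (spikes/s); how the baseline
    window is chosen is left outside this sketch.
    """
    rate = firing_rate(spike_times_ms,
                       saccade_onset_ms - window_ms,
                       saccade_onset_ms)
    return rate - baseline_rate


# Hypothetical trial: three spikes fall in the 300 ms before a saccade at 1000 ms.
example_spikes = [100.0, 750.0, 800.0, 900.0, 1100.0]
example_value = presaccadic_activity(example_spikes, 1000.0, baseline_rate=4.0)
```

A positive value indicates activity above baseline in the pre-saccade period, and a negative value indicates suppression below baseline.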


While the monkey was performing the reward-biased memory-guided saccade task, we recorded from single neurons in and around the VP, which was defined as the structure below the anterior commissure (AC) and above the substantia innominata (Figures 1C and 1D), following Haber et al. (1993). The structure above the AC was defined as part of the external segment of the globus pallidus (GPe). Posterior to these regions are the main bodies of the GPe and the internal segment of the globus pallidus (GPi) (see Figure 6A). This anatomical classification corresponded roughly to physiological and functional differences in single neuronal activity. In particular, reward-related neurons were found predominantly in the VP, less frequently in the GPe dorsal to the AC, and rarely in the GPe-GPi posterior to the AC.

Figure 6.


Bilateral inactivation of the VP diminished the reward-dependent modulation of saccade latency

(A) Injection sites of a GABAA agonist, muscimol, in and around the left VP (see Table 1, 21 injections in total). Muscimol was also injected in the mirror-symmetric position in the right hemisphere. Magenta marks indicate highly effective sites, where the saccade latency bias (the difference in the mean saccade latencies between small- and large-reward trials) was decreased below 30 ms within 20 min after the injection. Green marks indicate weakly effective sites, where the saccade latency bias decreased below 30 ms between 20 min and 40 min. Black circles indicate ineffective sites. The effective sites included the VP, which was centered at 1 mm anterior to the AC (AC+1), and the GPe-GPi, which was centered at 2 mm posterior to the AC (AC-2). NAc, nucleus accumbens; GPi, internal segment of the globus pallidus. (B) A representative effect of VP inactivation (indicated by a magenta square in A) on saccade latencies. The latencies of rightward saccades are plotted against the trial number in the conditions before the inactivation (left) and after the inactivation (right). After the VP inactivation, the saccade latency bias disappeared. (C) Population data for the effects of the bilateral inactivations of the effective sites (n = 11) on rightward saccade latencies. Error bars indicate ± 1 SEM. (D) Changes in the saccade latencies in small-reward trials (ordinate) and large-reward trials (abscissa) from before (filled symbols) to after (open symbols) the muscimol injections into the effective sites (VP and GP, left panel) and ineffective sites (right panel). The data in the graphs were obtained within 40 min after the injection. Theoretically, a data point on the diagonal line means that the saccade latencies in small- and large-reward trials were the same. Note that some data within 40 min after the injections could not be sampled because the animal refused to perform the task (injection into the structure 4 mm lateral to the VP) or showed involuntary eye movements (injection into the subthalamic nucleus/substantia nigra pars reticulata).


In this study, we focused on the single neuronal activity recorded in the VP. We tested 190 neurons in the VP using the reward-biased memory-guided saccade task. Among them, 118 neurons (32 in monkey P and 86 in monkey H) (62%) showed task-related modulations. In addition to the neurons formally tested, we encountered 73 neurons that were judged to be unrelated to the task and thus were not tested formally. The average spontaneous firing rate of the task-related VP neurons was 26.6 ± 14.8 spikes/s. The average spike duration of VP neurons was 0.82 ± 0.12 ms.

VP neurons encoded expected reward values

Figure 2 shows two examples of single VP neurons recorded during the reward-biased saccade task. As shown in the raster display, both VP neurons changed their activity completely depending on the expected outcome (large or small reward), for both ipsiversive and contraversive saccades. The first VP neuron increased its activity after the onset of the fixation point (Figure 2A). This neuron's activity further increased after the appearance of the target cue indicating the delivery of a large reward (large-reward cue, red), but decreased after the appearance of the cue indicating a small reward (small-reward cue, blue). These responses were long-lasting and thus the difference in the neuron's activity between the large- and small-reward trials remained until the reward delivery. We classified this neuron as a `reward positive' neuron (see Experimental Procedures). The second VP neuron (Figure 2B) decreased its activity after the appearance of the large-reward cue, but increased after the appearance of the small-reward cue. We classified this neuron as a `reward negative' neuron.

Figure 2.


Two VP neurons showing distinct reward modulations

(A) Spike activity of a neuron showing positive reward modulation. Its activity is shown separately for the ipsiversive (left panel) and contraversive (right panel) saccades in the reward-biased memory-guided saccade task. For each saccade direction, rasters of spikes (top panel) and spike density functions (SDFs, σ = 10 ms; bottom panel) are aligned at the onsets of fixation point, target cue, saccade, and reward delivery. The spike rasters are shown in order of occurrence of trials from top to bottom. Large- and small-reward trials are indicated, respectively, by red and blue bars on the left side of the rasters. The SDFs are shown separately for large-reward trials (red) and small-reward trials (blue); the first trial in each block was excluded. A gray dotted line indicates the baseline firing. (B) Spike activity of a neuron showing negative reward modulation.
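The spike density functions mentioned in this legend (σ = 10 ms) can be sketched as follows. This is a minimal illustration that assumes a Gaussian kernel, which is a common choice but is not stated in the text above; the function names and the list-of-spike-times representation are likewise hypothetical.

```python
import math


def spike_density(spike_times_ms, t_ms, sigma_ms=10.0):
    """Gaussian-kernel spike density (spikes/s) of one trial at time t_ms.

    Each spike contributes a Gaussian of unit area (in ms) centered on the
    spike time; the sum is scaled from spikes/ms to spikes/s.
    """
    norm = 1.0 / (sigma_ms * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((t_ms - s) / sigma_ms) ** 2)
                for s in spike_times_ms)
    return 1000.0 * norm * total


def mean_spike_density(trials, t_ms, sigma_ms=10.0):
    """Average the single-trial densities across trials at time t_ms."""
    return sum(spike_density(tr, t_ms, sigma_ms) for tr in trials) / len(trials)


# Hypothetical use: density at 50 ms for two trials aligned on the same event.
example_density = mean_spike_density([[40.0, 55.0], [48.0]], t_ms=50.0)
```

Evaluating `mean_spike_density` on a grid of time points, separately for large- and small-reward trials, yields curves of the kind plotted in red and blue.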


Among the 118 task-related VP neurons, 92 neurons showed a significant main effect of reward modulation throughout the task (P < 0.05, two-way ANOVA; see Experimental Procedures). A majority of reward-modulated VP neurons (16 in monkey P and 51 in monkey H, total: 67; 73%) were classified as reward positive type, while a minority (8 in monkey P and 17 in monkey H, total: 25; 27%) were classified as reward negative type. Their average activities (Figures 3A and 3B) were similar to those of the sample neurons shown in Figure 2. Both types showed sustained reward modulation which started after cue onset and outlasted reward delivery. This was true for many of the individual VP neurons (Figure 3C). In contrast, their activity was rarely modulated by saccade direction (Figure 3D).

Figure 3.


Population activity of VP neurons

(A) Population activity of VP neurons showing significant positive reward modulations (n=67). The convention of this figure is the same as the one in Figure 2. The light shaded areas indicate ± 1 SEM. (B) Population activity of VP neurons showing significant negative reward modulations (n=25). (C and D) Changes in reward-dependent (C) and direction-dependent (D) modulations of activity in 118 task-related VP neurons. The modulation of each neuron's activity is presented as a row of pixels, each pixel indicating the degree of modulation on the time point. The degree of modulation is expressed as an ROC area based on the comparison of firing rate (100 ms test window) between large- and small-reward trials (C) and between contraversive- and ipsiversive-saccade trials (D). The calculation of the ROC area was repeated by moving the test window in 20 ms steps. Warm colors (ROC > 0.5) indicate higher firing rates on large- than on small-reward trials (C) and on contraversive than on ipsiversive trials (D). The neurons in (C) and (D) are sorted in order of ROC areas for the reward and direction modulation, respectively.
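The sliding-window ROC analysis described in this legend (100 ms test window moved in 20 ms steps) can be sketched as below. The sketch computes the ROC area as the probability that a firing rate drawn from large-reward trials exceeds one drawn from small-reward trials, counting ties as one half, which is equivalent to the area under the ROC curve. All function names and the data layout (one list of spike times per trial, in ms relative to the alignment event) are assumptions for illustration.

```python
def roc_area(large_rates, small_rates):
    """ROC area: P(large > small) + 0.5 * P(large == small) over all trial pairs."""
    wins = 0.0
    for a in large_rates:
        for b in small_rates:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(large_rates) * len(small_rates))


def window_rate(spike_times_ms, start_ms, width_ms):
    """Firing rate (spikes/s) in the half-open window [start, start + width)."""
    n = sum(1 for t in spike_times_ms if start_ms <= t < start_ms + width_ms)
    return n / (width_ms / 1000.0)


def sliding_roc(large_trials, small_trials, t_start_ms, t_end_ms,
                width_ms=100.0, step_ms=20.0):
    """Return (window_start_ms, roc_area) for each window position."""
    results = []
    start = t_start_ms
    while start + width_ms <= t_end_ms:
        large = [window_rate(tr, start, width_ms) for tr in large_trials]
        small = [window_rate(tr, start, width_ms) for tr in small_trials]
        results.append((start, roc_area(large, small)))
        start += step_ms
    return results


# Hypothetical data: two large-reward and two small-reward trials.
example = sliding_roc([[10, 50, 90, 130], [20, 60, 140]], [[70], []], 0.0, 200.0)
```

Values above 0.5 correspond to the warm colors in panel (C), that is, higher firing on large-reward trials. The same functions apply to panel (D) if the two trial groups are contraversive and ipsiversive saccades.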


Other than the opposite reward modulations, the positive and negative neurons were not different in their physiological properties, including average spontaneous firing rate (positive type: 22.6 spikes/s, negative type: 28.7 spikes/s; P = 0.11, Mann-Whitney U test), average spike duration (positive type: 0.81 ms, negative type: 0.79 ms; P = 0.80), and average irregularity index (positive type: 0.57, negative type: 0.54; P = 0.87; see Davies et al., 2006).

A remarkable feature of VP neuronal activity was its stepwise and gradual increase over the entire course of a trial. This was found particularly in positive type neurons (Figure 4A). The VP activity seemed to encode the `expected reward value' depending on the behavioral state during the task (Figure 4B). To test this hypothesis, we calculated the VP activity in four different states (pre-fixation, pre-cue, pre-saccade, pre-reward periods; indicated by gray columns in Figure 4A). In large-reward trials, linear increases in the state-dependent reward expectation were observed in the population (Figure 4C) and in individual neurons (Figure 4D). The increase in the VP activity appears to reflect the nearing of the upcoming reward, which was expressed in two ways: 1) stepwise by discrete events (fixation point, target cue, and saccade) and 2) linearly by the passage of time. Except for the post-cue phasic changes in activity, neuronal changes occurred similarly in both large- and small-reward trials (Figure 4A), and therefore the difference between large- and small-reward trials remained largely unchanged (Figure S1). Alternatively, the changes in the VP activity might reflect changes in expected cost as well as expected reward, as explained in Supplemental Text.

Figure 4.


Expected values and costs encoded by reward positive VP neurons

(A) Population activity of reward positive VP neurons showing significant positive reward modulations (same as Figure 3A, but includes both ipsiversive and contraversive saccades). (B) Hypothetical signals encoding expected values and costs. (C) Changes in the average activity of reward positive VP neurons measured at four periods (each, 300 ms duration) indicated by gray bars in (A). The activity was normalized with respect to the baseline firing rate (1,300–300 ms before the onset of fixation point). (D) Changes in the activity of single VP neurons, shown by red dots connected with lines. Only the activity in large-reward trials is shown (which was expected to increase monotonically). Inset graph shows the distribution of correlation coefficients between the VP neuronal activity and the sequential task periods. The mean of the correlation coefficients (indicated by a triangle) deviated significantly from zero in the positive direction (P < 0.001, Wilcoxon signed-rank test).
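The per-neuron correlation between activity and the sequential task periods in panel (D) can be illustrated with a short sketch. The legend does not state which correlation coefficient was used; the sketch assumes Pearson's r and codes the four periods as the integers 1 to 4, both of which are assumptions, as are the function names.

```python
import math

# Assumed coding of the four periods: pre-fixation, pre-cue, pre-saccade, pre-reward.
PERIOD_INDEX = [1, 2, 3, 4]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def period_correlation(activity_by_period):
    """Correlate one neuron's activity in the four periods with the period order."""
    return pearson_r(PERIOD_INDEX, activity_by_period)


# Hypothetical neuron whose normalized activity rises across the four periods.
example_r = period_correlation([1.0, 1.4, 1.9, 2.6])
```

A neuron whose activity rises monotonically toward the reward gives a coefficient near +1, and the distribution of such coefficients across neurons is what the inset summarizes.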


Another notable activity pattern was found during the period before the cue onset. The average pre-cue activity of positive neurons was higher on large-reward trials (Figures 3A and 4C; see also Figure S1A, arrow), while the average activity of negative neurons was higher on small-reward trials (Figure 3B; Figure S1B). It was as if the VP neurons predicted the reward value of the current trial even before the reward cue was presented. The prediction was possible because we used a pseudorandom reward schedule in which four consecutive trials consisted of two large-reward and two small-reward trials. Thus, the monkeys could predict a large reward with a high probability in the next trial after they obtained a small reward, and vice versa (Bromberg-Martin et al., 2010). To test this issue, we compared VP neurons' activity during the pre-cue period (Figure S2). Thirteen out of 25 negative neurons and 11 out of 67 positive neurons showed significant differences in pre-cue activity in reward-predictive manners (P < 0.05, Mann-Whitney U test). These results are consistent with the hypothesis that the VP neurons predicted the reward value of the current trial based on the reward history.
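The predictability created by this schedule can be made concrete with a small enumeration. The sketch assumes that trials come in consecutive, non-overlapping sub-blocks of four, that each sub-block is an independent and equally likely ordering of two large-reward (L) and two small-reward (S) trials, and that the observer conditions only on the previous trial's reward. These modeling assumptions go beyond what the text states, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def p_large_after_small():
    """P(next trial is large-reward | current trial is small-reward)."""
    subblocks = sorted(set(permutations("LLSS")))  # 6 equally likely orderings
    numerator = Fraction(0)
    denominator = Fraction(0)

    # Consecutive pairs inside one sub-block (positions 0-1, 1-2, 2-3).
    for sb in subblocks:
        for i in range(3):
            if sb[i] == "S":
                denominator += 1
                if sb[i + 1] == "L":
                    numerator += 1

    # Pair that straddles two independent sub-blocks (last trial, then first trial).
    # Each (a, b) combination is weighted so this position counts as much as
    # each within-block position.
    weight = Fraction(1, len(subblocks))
    for a in subblocks:
        for b in subblocks:
            if a[3] == "S":
                denominator += weight
                if b[0] == "L":
                    numerator += weight

    return numerator / denominator


probability = p_large_after_small()  # Fraction(5, 8) under these assumptions
```

Under these assumptions a large reward follows a small reward with probability 5/8 rather than 1/2, and by symmetry a small reward follows a large reward with the same probability. An observer that also tracked its position within the sub-block could predict even better.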

VP activity was related to saccadic performance

An animal's reward expectation is known to influence saccadic performance (Takikawa et al., 2002; Watanabe et al., 2003). We hypothesized that VP neurons regulate the initiation of saccades using the reward expectation-related information. As a first step to test this hypothesis, we examined whether the activity of VP neurons was correlated with saccadic performance (i.e., saccade latency and velocity). We focused on the VP neurons' activity during the pre-saccade period because it could directly modulate the saccadic preparatory signals in the oculomotor system. The pre-saccadic activity of VP neurons should then be correlated with the saccadic performance as it changed across trials. More specifically, since the position-reward contingency was reversed relatively frequently in our task, both VP pre-saccadic neuronal activity and saccadic performance should also be reversed in similar time courses.

The results were basically consistent with this prediction (Figure 5). Following the reversal of the position-reward contingency, both saccade latency (Figure 5A) and saccade velocity (Figure 5B) showed clear changes. There were two kinds of reversal: small-to-large reversal (the saccade, which had been associated with a small reward, was now associated with a large reward) and large-to-small reversal (the saccade, which had been associated with a large reward, was now associated with a small reward). The saccade latency decreased and the saccade velocity increased instantly after the small-to-large reversal. In contrast, the saccade latency increased and the saccade velocity decreased more slowly after the large-to-small reversal.

The pre-saccadic activity of VP neurons also changed clearly following the reversal of the position-reward contingency (Figures 5C and 5D). Positive and negative VP neurons changed their activity in opposite manners. The time course of these changes was similar to the time course of the saccade latency and velocity. In particular, positive VP neurons changed their activity quickly after the small-to-large reversal, but more slowly after the large-to-small reversal, similarly to the changes in saccade latency and velocity. This impression was supported by a statistical analysis: their activity on the 2nd trial after the small-to-large reversal was not statistically different from the activity in the subsequent 10 trials (P = 0.51, Wilcoxon signed-rank test), whereas the activity on the 2nd trial after the large-to-small reversal was statistically different from the activity in the subsequent 10 trials (P = 0.008).

We also examined the activity of VP neurons in two different periods: post-cue and post-reward periods (Figure S3). The changes in the post-cue activity (Figure S3A) were similar to the changes in the pre-saccadic activity, confirming that the VP neurons maintained their reward expectation information derived from the cue. The changes in the post-reward activity were more complex (Figure S3B). Both positive and negative neurons changed their activity roughly in relation to the amount of the received reward. Thus, VP neurons did not encode reward prediction errors, unlike several groups of neurons that are involved in dopamine release (Hong and Hikosaka, 2008; Hong et al., 2011; Matsumoto and Hikosaka, 2007), but are similar to neurons in the dorsal raphe nucleus including serotonin neurons (Nakamura et al., 2008).

VP inactivation abolished reward-dependent modulations of saccadic performance

Our results so far showed that VP neurons encoded the expected reward value in a manner that was associated with behavioral measures of motivation. If these neurons truly have a causal role in generating motivation, then inactivating the VP should abolish the effects of expected reward value on behavior. Crucially, inactivation should not interfere with the sensorimotor aspects of behavior (such as perceiving the target or executing the saccade), only with the ability to regulate behavior based on the expected value.

To test the hypothesis, we locally inactivated the VP and its surrounding regions by injecting a GABAA receptor agonist, muscimol (0.88–44 mM, 1–2 μl) while one monkey performed a reward-biased visually-guided saccade task (Lauwereyns et al., 2002) (see Experimental Procedures). We tested whether the changes in saccade latency based on reward expectation (hereafter called `reward-dependent saccade latency bias') were changed by the muscimol-induced inactivation. We carried out 4 unilateral and 17 bilateral injection experiments in monkey H within the period of 84 days (Table 1). Figure 6A depicts the injection sites in the left hemisphere on the basis of the histological reconstruction. We confirmed that, in case of bilateral injections, muscimol was injected roughly at the mirror-symmetric position in the right hemisphere (data not shown).
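The molar concentrations quoted above (0.88–44 mM) and the mass concentrations listed in Table 1 (0.1–5 μg/μl) describe the same solutions. The conversion below is a minimal sketch that assumes the molar mass of muscimol free base (C4H6N2O2, about 114.10 g/mol); a salt or hydrate form would give slightly different values. The constant and function names are illustrative.

```python
# Assumed molar mass of muscimol free base, C4H6N2O2 (g/mol).
MUSCIMOL_MOLAR_MASS_G_PER_MOL = 114.10


def ug_per_ul_to_mM(conc_ug_per_ul, molar_mass=MUSCIMOL_MOLAR_MASS_G_PER_MOL):
    """Convert a mass concentration in ug/ul to a molar concentration in mM.

    1 ug/ul equals 1 g/l, so mol/l = (g/l) / (g/mol); multiply by 1000 for mM.
    """
    return conc_ug_per_ul / molar_mass * 1000.0


lowest_mM = ug_per_ul_to_mM(0.1)   # about 0.88 mM
highest_mM = ug_per_ul_to_mM(5.0)  # about 44 mM
```

The lowest and highest Table 1 concentrations therefore map onto the two ends of the range given in the text.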

Table 1.

Muscimol injection in and around the ventral pallidum. Latency of the first effect was defined as the time when muscimol injection decreased the saccade latency bias below 30 ms. GPe, external globus pallidus; GPi, internal globus pallidus; NAc, nucleus accumbens; STN, subthalamic nucleus; VP, ventral pallidum.

Concn. (μg/μl) | Volume (μl) | Side | Structure (in comparison to VP) | Latency of the First Effect (Right Saccade / Left Saccade; min)
--- | --- | --- | --- | ---
1 | 1 | Left | VP | No effect / No effect
1 | 2 | Right | VP | 60 / 90
5 | 2 | Left | VP | No effect / No effect
5 | 2 | Left | GPe; 2 mm caudal | 250 / 250
5 | 2 | Bilateral | VP | 5 / 5
1 | 2 | Bilateral | VP | 10 / 10
0.2 | 1 | Bilateral | VP | 5 / 46
0.1 | 1 | Bilateral | VP | 5 / 5
1 | 2 | Bilateral | ventral to VP; 2 mm ventral | 10 / 10
1 | 2 | Bilateral | anterior commissure; 2 mm dorsal | 74 / 40
1 | 2 | Bilateral | GPe; 4 mm dorsal | 40 / 130
1 | 2 | Bilateral | medial to VP; 4 mm medial | 89 / 58
1 | 2 | Bilateral | lateral to VP; 4 mm lateral | 143 / 143
1 | 2 | Bilateral | lateral part of ventral striatum (NAc); 3 mm rostral | 250 / 70
1 | 2 | Bilateral | medial part of ventral striatum (NAc); 3 mm rostral, 3 mm medial | 95 / 65
1 | 2 | Bilateral | GPi (Left), ventral to GPe (Right); 3 mm caudal | 10 / 10
0.1 | 1 | Bilateral | GPi (Left), ventral to GPe (Right); 3 mm caudal | 5 / 5
1 | 2 | Bilateral | GPe; 3 mm caudal, 4 mm dorsal | 5 / 19
0.1 | 1 | Bilateral | medial to GPi (Left), medial to GPe (Right); 3 mm caudal, 3 mm medial | 5 / 5
0.1 | 1 | Bilateral | ventromedial to GPi (Left), ventromedial to GPe (Right); A part of hypothalamic region was included in each hemisphere. 3 mm caudal, 3 mm medial, 3 mm ventral | No effect / 183
1 | 2 | Bilateral | STN; 6 mm caudal | No effect / No effect

Bilateral, but not unilateral, injections of muscimol in the VP diminished or eliminated the reward-dependent saccade latency bias with short latencies (Table 1). One example is shown in Figure 6B. Its injection site is shown by a magenta square in Figure 6A. The saccade latency bias, which was present before the injection (Figure 6B, left), disappeared 10 min after the muscimol injection (Figure 6B, right). We repeated the same experiment (bilateral muscimol injections in the VP) four times, and the results were very similar (Table 1).

To examine how localized the effects of muscimol injections were, we made bilateral muscimol injections at 12 sites around the VP (Table 1). The reward-dependent saccade latency bias decreased after most injections, but the effects tended to appear later than after the injections in the VP (i.e., longer latencies). We graded the effectiveness of the injection based on the latency (Sakamoto and Hikosaka, 1989): highly effective if the latency was less than 20 min and weakly effective if the latency was more than 20 min but less than 40 min. Such effective injection sites are depicted by magenta (highly effective) and green (weakly effective) in Figure 6A.
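The grading rule above, together with the Table 1 definition of the latency of the first effect (the time at which the saccade latency bias fell below 30 ms), can be sketched as follows. How the bias was sampled over time after an injection is not described here, so the input format (pairs of minutes after injection and bias in ms) is an assumption, as are the function names. The text leaves a latency of exactly 20 min unclassified; the sketch assigns it to the weakly effective grade, which is also an assumption.

```python
def latency_bias_ms(small_reward_latencies_ms, large_reward_latencies_ms):
    """Saccade latency bias: mean small-reward latency minus mean large-reward latency."""
    mean_small = sum(small_reward_latencies_ms) / len(small_reward_latencies_ms)
    mean_large = sum(large_reward_latencies_ms) / len(large_reward_latencies_ms)
    return mean_small - mean_large


def first_effect_latency_min(bias_by_time, threshold_ms=30.0):
    """Earliest time (min after injection) at which the bias is below threshold.

    bias_by_time: iterable of (minutes_after_injection, bias_ms) pairs.
    Returns None if the bias never drops below the threshold.
    """
    for minutes, bias in sorted(bias_by_time):
        if bias < threshold_ms:
            return minutes
    return None


def grade_site(latency_min):
    """Grade an injection site from the latency of the first effect."""
    if latency_min is None:
        return "ineffective"
    if latency_min < 20:
        return "highly effective"
    if latency_min < 40:  # exactly 20 min is placed here by assumption
        return "weakly effective"
    return "ineffective"


# Hypothetical time series of the bias after one injection.
example_latency = first_effect_latency_min([(5, 80.0), (10, 25.0), (15, 12.0)])
example_grade = grade_site(example_latency)
```

Applied to the latencies listed in Table 1, a value of 5 or 10 min falls in the highly effective grade, while values such as 143 min or "No effect" are graded as ineffective.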

As shown in Figure 6A, the highly effective sites were centered on the VP (AC+1), but extended posteriorly into the GPe and GPi, which was 3 mm posterior to the VP (AC−2). At a more anterior level (3 mm anterior to the VP, AC+4) injections into the nucleus accumbens (NAc) had no clear effect (n = 2). At a more posterior level (6 mm posterior to the VP, not shown) injections into the subthalamic nucleus (STN)/SNr induced oculomotor effects that prevented the monkey from making saccades to the target, such as involuntary saccades and nystagmic eye movements (Hikosaka and Wurtz, 1985). These data suggest that the effects of muscimol on the reward-dependent saccade latency bias were relatively localized in a region including the VP and possibly the GPe-GPi.

Figure 6C shows the population data based on muscimol injections into the effective sites (n = 11), indicating similarly devastating effects on the reward-dependent saccade latency bias. Crucially, these inactivations selectively affected the motivational bias, without severe impairments in the sensorimotor control of saccades. If anything, inactivation caused saccade latencies to become shorter, especially on small-reward trials, on which saccades normally had very long latencies (Figure 6D; Figure S4A). These bilateral muscimol injections also caused changes in saccade peak velocity and the stability of gaze fixation (Figure S4, B and C). The bias in the saccade peak velocity (i.e., faster when a large reward was expected) disappeared after the injections. After the muscimol injections the monkey broke fixation more frequently before the target came on (error rate of fixation break: P < 0.01 for both small- and large-reward conditions, Wilcoxon signed-rank test).

DISCUSSION

We found that VP neurons encoded the expected value associated with the upcoming action (i.e., saccade), rather than the physical properties of the action (e.g., saccadic direction). In this sense, the VP may be different from other parts of the basal ganglia such as the caudate nucleus (Hikosaka et al., 1989), GPe/GPi (Yoshida and Tanaka, 2009), and SNr (Hikosaka and Wurtz, 1983) where neurons carry sensorimotor signals. Although their sensorimotor activity may be modulated by reward value signals, the outputs of these neurons could still be used to control actions physically (e.g., bias saccades to the contralateral side) (Ding and Hikosaka, 2006; Lauwereyns et al., 2002; Sato and Hikosaka, 2002).

Instead, our finding seems to support the hypothesis that the VP is involved in motivational control of actions (Mogenson et al., 1980). Indeed, the activity of VP neurons shares essential properties with subcortical motivation-related neurons which are found in the LHb (Matsumoto and Hikosaka, 2007), border region of the GP (GPb) (Hong and Hikosaka, 2008), rostromedial tegmental nucleus (RMTg) (Hong et al., 2011), dorsal raphe nucleus (DRN) (Nakamura et al., 2008), and dopamine (DA) neurons in the SNc/VTA (Matsumoto and Hikosaka, 2007; Nakamura et al., 2008). These neurons, at least partially, form neural circuits that control the release of both dopamine and serotonin in the basal ganglia and other forebrain structures (Ikemoto, 2010), thereby modulating sensorimotor processing (Hikosaka et al., 2008). Moreover, the VP is known to project to the LHb, RMTg, DRN, and SNc/VTA (Haber and Knutson, 2010; Humphries and Prescott, 2010). The projection to the SNc/VTA may target dopamine neurons directly, or indirectly through GABAergic neurons which behave similarly to VP neurons (Cohen et al., 2012). Therefore, the expected value information encoded by VP neurons might be used to control actions through dopaminergic or serotonergic actions.

However, the nature of the reward value coding in VP neurons was different from most of the subcortical motivation-related neurons, especially neurons in the GPb, LHb, and RMTg which altogether control dopamine neurons. The activation (or suppression) of these dopamine-controlling neurons (including dopamine neurons themselves) occurs phasically in response to sensory events that indicate `changes' in the level of reward (or its expectation). If a reward is fully expected, the dopamine-controlling neurons may not respond to a sensory event that cues an action leading to the reward (Bromberg-Martin et al., 2011). The signal may be suitable for learning the value of a behavioral context (i.e., sensory event – action – reward), but not for facilitating or suppressing ongoing actions.

In contrast, VP neurons encoded expected reward values as they currently stand (rather than as they change). Even after the cue was presented and the monkey had acquired the information about the amount of the upcoming reward, VP neurons continued to be active (or inactive) until the reward was delivered. This was true for both positive type neurons (i.e., more active with larger rewards) and negative type neurons (i.e., more active with smaller rewards). The positive VP neurons, specifically, seem to represent the worthiness of an action by combining expected values and expected costs. Such sustained activity would be highly useful for the sensorimotor system because it could directly modulate the preparatory processes of the goal-directed action. In fact, the activity of VP neurons during the period preceding a saccade was well associated with the latency and velocity of the saccade, as the saccade performance changed across blocks of trials due to changes in reward amount.

These data raised the possibility that the VP neuronal activity is used for modulating impending motor actions based on expected reward values, in addition to learning the value of behavioral context. If so, the removal of the VP neuronal signals should reduce the reward-based modulation of motor actions. To test this prediction, we reversibly inactivated the VP and tested its effects on reward-oriented behavior. Bilateral VP inactivation had little effect on the basic sensorimotor processes underlying saccadic eye movements, but caused a rapid and dramatic deficit in reward-based biases in saccade initiation. Furthermore, unilateral inactivations of the VP had little effect, suggesting that the VP on both sides conjointly contributes to the reward expectation-dependent modulation of motor actions.

Notably, the bilateral VP inactivation shortened saccade latencies on small-reward trials, while the saccade latencies on large-reward trials remained short. This may seem odd, since a majority of reward-related VP neurons were of the positive type and blocking their activity might be expected to dampen the initiation of saccades on large-reward trials. We propose two mechanisms that, together, might explain this phenomenon. First, reward negative VP neurons, though a minority, might exert comparatively strong behavioral effects, suppressing the initiation of saccades on small-reward trials. If so, blocking their activity would remove the suppressive effect on small-reward trials. However, this mechanism alone may not explain our results, because the response of VP neurons was bidirectional. On large-reward trials the negative VP neurons were inhibited and therefore saccades would be facilitated (i.e., disinhibited). Blocking this effect would remove the facilitatory effect, leading to an increase in saccade latency. Such an effect was not observed clearly. However, the above argument is focused on the reward-biased phasic responses of VP neurons. Actually, a majority of VP neurons fire tonically, and this may act as a second mechanism. Thus, VP neurons might exert general inhibitory effects on the target motor areas in addition to the reward-biased phasic effect. The inactivation of VP neurons would then lead to a removal of the inhibitory effects, thus causing general shortening of saccade latencies. On large-reward trials, the phasic facilitatory effect of the negative VP neurons would be cancelled by their tonic suppressive effect. On small-reward trials, the phasic suppressive effect would be enhanced by the tonic suppressive effect. To summarize, the VP may influence motor behavior using the reward-biased phasic signal and the reward-unbiased tonic signal.

The second effect (i.e., general inhibitory effect) may be worth considering further for experimental and theoretical reasons. Studying inputs to dopamine neurons in the rat, Floresco et al. (2003) showed that the muscimol-induced inactivation of the VP led to an increase in the number of spontaneously active dopamine neurons in the ventral tegmental area and a tonic increase in extracellular dopamine levels in the nucleus accumbens. Niv et al. (2007) proposed that the tonic dopamine level controls the vigor of action so that the amount of reward obtained per time is optimized in relation to the cost required in performing the action. This might explain our result that the saccade latency as well as velocity on small-reward trials became shorter after the VP inactivation. The increase in the rate of fixation break errors might reflect an abnormal increase in the vigor of action (i.e., saccade). These results together appear to suggest that the output of the VP normally decreases the level of motivation. This seems at odds with human lesion and imaging studies (Beaver et al., 2006; Bhatia and Marsden, 1994; Miller et al., 2006; Pessiglione et al., 2007). The apparent discrepancy remains to be investigated.

The reduction of the reward-dependent saccade latency bias also occurred after inactivations of the GPe–GPi region, which is located posterior to the VP. These effects are unlikely to be due to the diffusion of muscimol from the VP to the GPe–GPi or vice versa, because the effects appeared very quickly, typically within 5–10 min after these injections. Such short latencies would be expected if the inactivation target is no more than 1.5 mm away from the injection site (Sakamoto and Hikosaka, 1989). Therefore, both the VP and GPe–GPi may independently contribute to the reward-dependent saccade latency bias. Indeed, highly reward-sensitive neurons are distributed usually in the border between the GPe and GPi and sometimes inside the GPi or its medial border, and most of them transmit the reward signals to the LHb (Hong and Hikosaka, 2008). This population of reward-sensitive neurons has been called GPb (i.e., GP border). Since the GPb–LHb connection controls both dopamine and serotonin release (Hikosaka, 2010), the strong effects of muscimol injections in the GPe–GPi region may be caused by the interruption of the reward information transmitted through the GPb–LHb connection.

Our study raises the important question of how the VP gains access to the sensorimotor system to cause the reward-dependent saccade latency bias. This might be achieved through the VP's connections to the output structures of the basal ganglia, directly to the SNr or indirectly through the STN (Haber and Knutson, 2010; Humphries and Prescott, 2010). Indeed, the activity of SNr neurons changes depending on expected reward values, but in a direction-dependent manner (Sato and Hikosaka, 2002). Alternatively, the VP may influence thalamo-cortical information processing mainly through its connection to the MD thalamus (Haber and Knutson, 2010). These connections might enable the VP to modulate the preparation and execution of motor actions. The contribution of these connections to the preparation and execution of motor actions should be elucidated in future studies.

In summary, our data indicate that VP neurons encode the moment-by-moment changes of expected reward values (and possibly expected costs). These signals may be used for modulating the vigor with which goal-directed actions are executed or for learning the value of behavioral context.

EXPERIMENTAL PROCEDURES

We used three hemispheres of two male rhesus monkeys, P and H, in this study. All animal care and experimental procedures were approved by the Institute Animal Care and Use Committee and complied with the Public Health Service Policy on the humane care and use of laboratory animals. Details of the surgical procedures to implant the head holder, recording chamber, and scleral search coil were described elsewhere (Ding and Hikosaka, 2006; Nakamura et al., 2008).

Behavioral task

The monkeys were trained to perform a reward-biased memory-guided saccade task (Ding and Hikosaka, 2006; Kawagoe et al., 1998; Nakamura et al., 2008). During experimental sessions, the monkey was seated quietly in a primate chair in a dimly lit room. Visual stimuli were rear-projected by an active matrix liquid crystal display projector (PJ759, ViewSonic) onto a frontoparallel screen 32 cm from the monkey's eyes. Eye movements were monitored using a scleral search coil method with 1 ms resolution. Each trial started with the appearance of a central fixation point (diam., 0.6°) (Figure 1A). After 1,100 ms central fixation within a window of 4°, a target cue (diam., 0.7°) indicating the position of saccade was briefly (50 ms) presented either to the right or left 20° from the fixation point. The target position was adjusted in either normal or oblique angles when the recorded neuron showed some direction selectivity. After a delay period (1,000–1,200 ms) the fixation point turned off, and the monkeys had to make a saccade to the remembered target position. If the eye position after the saccade was within the target window (7°), the target appeared and the monkeys had to maintain their gaze for 500 ms on the target. This correct performance was then signaled with a tone (duration: 100 ms) together with the delivery of a liquid reward (water or juice). If the monkeys made an error (i.e., breaking fixation or making an incorrect saccade), the trial was aborted (with all stimuli extinguished) and the monkeys had to wait for the next trial.

As a key feature of this task, we used a block-wise alternating biased reward schedule (Figure 1B). Within a block of 24 trials, the amount of reward was always large (0.25 or 0.3 ml) for the saccades to one direction and small (0 or 0.03 ml) for the saccades to the other direction. Even in the small-reward trials, the monkeys had to make a correct saccade; otherwise, the same trial was repeated. The reward-position contingency was reversed for the next block of trials without external instructions. We used a pseudorandom reward schedule in which each block was divided into six `sub-blocks', each consisting of two large-reward and two small-reward trials presented in a random order.
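The block structure described above can be illustrated with a short sketch. This is not the task-control software used in the study; the function names, the string labels, and the seeded random generator are assumptions made for illustration. Only the block length (24 trials), the six sub-blocks of two large-reward and two small-reward trials, and the uncued reversal between blocks come from the description above. Repetition of error trials is not modeled.

```python
import random

def make_block(large_side, rng):
    """Build one 24-trial block as a list of (target_side, reward_size)."""
    small_side = "left" if large_side == "right" else "right"
    trials = []
    for _ in range(6):  # six sub-blocks per block
        # each sub-block: two large-reward and two small-reward trials
        sub_block = [(large_side, "large")] * 2 + [(small_side, "small")] * 2
        rng.shuffle(sub_block)  # random order within the sub-block
        trials.extend(sub_block)
    return trials

def make_session(n_blocks, first_large_side="right", seed=0):
    """Build consecutive blocks, reversing the reward-position contingency."""
    rng = random.Random(seed)  # seed is an illustrative choice
    side = first_large_side
    blocks = []
    for _ in range(n_blocks):
        blocks.append(make_block(side, rng))
        side = "left" if side == "right" else "right"  # reversal without any cue
    return blocks
```

Because every sub-block is balanced, the two target positions occur equally often within any block, while the short sub-block length leaves some room for predicting the upcoming reward size from recent trials.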

In the following inactivation study, we used a reward-biased visually guided saccade task (Lauwereyns et al., 2002). After the central fixation (600 or 1,000 ms), the fixation point turned off and simultaneously the target appeared either to the right or left 20° from the fixation point. The monkeys had to immediately make a saccade to the visible target. There was no cue during the fixation period. The reward schedule was the same as the memory-guided saccade task.

Electrophysiological recording

We followed Haber et al. (1993) for the anatomical localization of the VP, which is located ventral to the AC and anterior to the GPe–GPi. The location of the VP thus defined was estimated on the basis of magnetic resonance (MR) images (4.7 T, Bruker). Single-unit recordings of VP neurons were performed with an epoxy-coated or a glass-coated tungsten microelectrode (0.8–1.5 MΩ at 1 kHz). The electrode was inserted obliquely (36° from vertical in the frontal plane) into the pallidum (Figure 1C) using an oil-driven micromanipulator (MO-97A, Narishige). The recording sites were determined using a grid system, which allowed recordings at every 1 mm between penetrations. The unitary activity recorded from the microelectrode was amplified, filtered (200 Hz to 5 kHz), converted into digital data with an online window discriminator, and stored in a computer at the sampling rate of 1 kHz. During recording, the VP was located below the AC, which was identified on the basis of axonal signals such as high-frequency background noises and initially positive spikes. Only stable and well-isolated neurons were included in the present data.

Muscimol inactivation

After the electrophysiological recording (mapping) of VP neurons in monkey H, we performed inactivation experiments to test a causal relationship between the VP activity and the reward modulation of saccadic performance. To accurately inactivate the brain structure, we used an electrode assembly (injectrode) consisting of an epoxy-coated Tungsten microelectrode for unit recording and a silica tube for drug delivery as described previously (Tachibana et al., 2008). After the precise identification of the aimed structures by unit recording, we injected a GABAA receptor agonist, muscimol (Sigma; 0.88–44 mM; 1–2 μl), into the target structure of each hemisphere. Because the effect of muscimol on the targets lasted several hours, the inactivation was limited to twice per week. After inserting the injectrode, the animal performed one session of reward-biased visually guided saccade task (at least four blocks), and the data were used as preinjection control. Soon after the injection was completed (within 5 min), the animal was required to resume the same saccade task, and to repeat it every 30 min for 2–3 hrs. We used a reward-biased visually guided saccade task, because the behavioral bias of saccadic performance could be detected more clearly than the reward-biased memory-guided saccade task (see the section `Behavioral task').

Histology

Reference lesions were placed at several recording sites of task-related neurons by passing a cathodal DC current of 15 μA for 30 s through the electrode. At the conclusion of the experiments, the monkeys were deeply anesthetized with an overdose of sodium pentobarbital and perfused transcardially with saline followed by 4% paraformaldehyde. The head was fixed to the stereotaxic frame, and the brain was cut into blocks in the coronal plane parallel to the electrode penetrations. Serial 50-μm sections were processed for Nissl staining. The recording and drug injection sites were reconstructed according to the lesions made by the cathodal DC current, the traces of electrode tracks, and MR images.

Data analysis

Only correct trials were included in the data analysis. In addition, the first trials after the reversal of reward-position contingency were excluded in most cases. An exception was the analysis of the time courses of neuronal and behavioral changes after the reversal of the position-reward contingency. To determine saccade latency, we detected the onset of a saccade if the velocity of an eye movement exceeded a threshold value (50°/s). To examine the across-block behavioral changes, we normalized saccade latency by subtracting the mean saccade latency for each saccade direction in each monkey. Saccade velocity was also normalized in the same manner.
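The saccade detection and normalization steps above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study. It assumes a one-dimensional eye-position trace sampled every 1 ms (matching the stated recording resolution) that begins at the go signal, so that the returned sample index equals the latency in ms; the function names are hypothetical. Only the 50°/s threshold and the mean subtraction are taken from the text.

```python
def saccade_onset_index(position_deg, dt_ms=1.0, threshold_deg_per_s=50.0):
    """Return the index of the first sample whose speed exceeds the threshold.

    Speed is estimated from the difference between consecutive samples.
    Returns None if the threshold is never exceeded.
    """
    for i in range(1, len(position_deg)):
        speed = abs(position_deg[i] - position_deg[i - 1]) / (dt_ms / 1000.0)
        if speed > threshold_deg_per_s:
            return i
    return None

def normalize_latencies(latencies_ms):
    """Subtract the mean latency (computed per saccade direction and monkey)."""
    mean = sum(latencies_ms) / len(latencies_ms)
    return [x - mean for x in latencies_ms]
```

In use, `normalize_latencies` would be called separately for each saccade direction in each monkey, so that values from different directions and animals can be pooled on a common scale.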

We analyzed the task-related activity of VP neurons across the following five task periods: post-cue (100–400 ms after cue onset), delay (700–1,000 ms after cue onset), pre-saccade (300–0 ms before saccade onset), post-saccade (0–300 ms after saccade onset), and post-reward periods (0–500 ms after reward delivery). During each period, we analyzed neuronal activity using two-way ANOVA [reward size (large reward and small reward) × direction of saccade target (ipsilateral and contralateral to the recording site)]. To correct for multiple comparisons across the five periods, we set the significance level at P = 0.01 (i.e., 0.05/5). If a neuron showed a main effect of reward and/or direction in any of the five task periods, it was classified as a task-related neuron. To determine the reward selectivity of individual VP neurons, we used a long test window (from 100 ms after cue onset to 500 ms after reward delivery) with ANOVA. We also applied ROC (receiver operating characteristic) analysis to compare spike counts sampled in the long test window under different reward conditions (i.e., large reward vs. small reward) (Nakamura et al., 2008). We classified the reward-related VP neurons into three groups: 1) reward positive type, if their activity was larger in the large-reward condition than in the small-reward condition (P < 0.05, ANOVA and ROC > 0.5); 2) reward negative type, if their activity was larger in the small-reward condition than in the large-reward condition (P < 0.05, ANOVA and ROC < 0.5); 3) no reward modulation type (P > 0.05, ANOVA). To determine the direction selectivity of individual VP neurons, we performed the ROC analysis in the same long test window under different direction conditions (i.e., contraversive vs. ipsiversive).
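The ROC area and the three-way classification can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the original analysis code. The ROC area is computed as the probability that a spike count from a large-reward trial exceeds one from a small-reward trial, with ties counted as one half, which is a standard equivalent of the area under the ROC curve. The ANOVA P value is taken as an input computed elsewhere. The function names and the "unclassified" label for the edge case of a significant effect with an ROC area of exactly 0.5 are assumptions.

```python
def roc_area(large_counts, small_counts):
    """Area under the ROC curve for large- vs. small-reward spike counts.

    Equals P(large > small) + 0.5 * P(large == small) over all trial pairs.
    """
    wins = 0.0
    for a in large_counts:
        for b in small_counts:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(large_counts) * len(small_counts))

def classify_reward_type(anova_p, roc, alpha=0.05):
    """Assign a neuron to one of the three reward-modulation groups."""
    if anova_p >= alpha:
        return "no reward modulation"
    if roc > 0.5:
        return "reward positive"
    if roc < 0.5:
        return "reward negative"
    return "unclassified"  # assumption: the text does not cover ROC == 0.5

# Bonferroni-style threshold for the five task periods
ALPHA_PER_PERIOD = 0.05 / 5
```

An ROC area of 1.0 means every large-reward trial had more spikes than every small-reward trial, 0.0 means the reverse, and 0.5 means the two distributions cannot be discriminated.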

To visualize event-dependent changes in reward and direction modulations, we computed ROC areas comparing the firing rates in the same test window of 100 ms between large- and small-reward trials (reward modulation) (see Figure 3C) and between contraversive- and ipsiversive-saccade trials (direction modulation) (see Figure 3D). We repeatedly computed ROC areas by sliding the test window in 20 ms steps.
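The sliding-window procedure can be sketched as follows. This is a minimal illustration assuming that each trial is represented as a list of spike times in ms aligned to a task event; the data layout and function names are hypothetical. Spike counts within a fixed 100 ms window are used in place of firing rates, which leaves the ROC area unchanged because the two differ only by a constant factor.

```python
def roc_area(a_counts, b_counts):
    """P(a > b) + 0.5 * P(a == b) over all pairs of trials."""
    wins = 0.0
    for a in a_counts:
        for b in b_counts:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(a_counts) * len(b_counts))

def count_spikes(spike_times_ms, start, stop):
    """Number of spikes with start <= t < stop."""
    return sum(1 for t in spike_times_ms if start <= t < stop)

def sliding_roc(trials_a, trials_b, t_start, t_stop, window_ms=100, step_ms=20):
    """Return (window_start, roc_area) pairs for windows stepped through time.

    trials_a and trials_b are lists of per-trial spike-time lists, e.g.
    large- vs. small-reward trials or contraversive vs. ipsiversive trials.
    """
    curve = []
    t = t_start
    while t + window_ms <= t_stop:
        counts_a = [count_spikes(tr, t, t + window_ms) for tr in trials_a]
        counts_b = [count_spikes(tr, t, t + window_ms) for tr in trials_b]
        curve.append((t, roc_area(counts_a, counts_b)))
        t += step_ms
    return curve
```

Passing large- and small-reward trials gives a reward-modulation time course, and passing contraversive and ipsiversive trials gives the corresponding direction-modulation time course.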

To investigate if the VP signals encode expected reward values, we calculated the VP neurons' activity during the following four test periods: prefixation (300–0 ms before fixation point onset), precue (300–0 ms before target cue onset), presaccade (300–0 ms before saccade onset), and prereward periods (300–0 ms before reward delivery). To test the state-dependent changes in VP signals reflecting the expected reward values, we calculated correlation coefficients between the VP responses and the behavioral states. We further tested whether the reward-history could affect the expected reward values. Because our task included the pseudorandom reward schedule, the monkeys might be able to predict the reward size in next trials. To test the reward-history effect, we calculated the VP activity on the basis of the preceding reward history (i.e., whether the preceding trial was a small-reward trial or a large-reward trial) (Figure S2).

To examine neuronal changes after the reversal of position-reward contingency, post-cue, pre-saccade, and post-reward responses were calculated as the firing rate during post-cue, post-saccade, or post-reward period minus the baseline firing rate (1,300–300 ms before the onset of fixation point), respectively.

For the inactivation experiment, we focused on the changes in the reward-dependent saccade latency bias, which was defined as the difference in the average saccade latencies between small- and large-reward trials. We judged that a muscimol injection was effective if the saccade latency bias in either the left or right saccades decreased and became statistically nonsignificant (P > 0.05, Mann-Whitney U test) within 40 min after the injection, which roughly corresponded to a saccade latency bias of less than 30 ms.
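The effectiveness criterion can be sketched as below for one saccade direction. This is an illustration under stated assumptions rather than the procedure actually used: the Mann-Whitney P value is obtained from a normal approximation with tie correction and no continuity correction, which may differ from the exact test in small samples, and all function names are hypothetical. The 40 min time limit is assumed to have been applied when selecting the post-injection trials.

```python
import math
from collections import Counter

def latency_bias(small_ms, large_ms):
    """Mean latency on small-reward trials minus mean on large-reward trials."""
    return sum(small_ms) / len(small_ms) - sum(large_ms) / len(large_ms)

def mann_whitney_p(x, y):
    """Two-sided P value (normal approximation, tie-corrected variance)."""
    n1, n2 = len(x), len(y)
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(list(x) + list(y)).values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return 1.0  # all values identical: no evidence of a difference
    z = (u - n1 * n2 / 2.0) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))

def injection_effective(pre_small, pre_large, post_small, post_large, alpha=0.05):
    """True if the bias decreased and is no longer significant after injection."""
    decreased = latency_bias(post_small, post_large) < latency_bias(pre_small, pre_large)
    not_significant = mann_whitney_p(post_small, post_large) > alpha
    return decreased and not_significant
```

Following the criterion in the text, an injection would count as effective if `injection_effective` returns True for either the leftward or the rightward saccades.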

ACKNOWLEDGMENTS

We thank M. Matsumoto, S. Hong, E. Bromberg-Martin, M. Yasuda, S. Yamamoto, H. Kim and I. Monosov for helpful comments and discussion, M. Smith for histological expertise, A. Nichols, T. Ruffner, A. Hays and J. McClurkin for technical assistance, and D. Parker and B. Nagy for animal care.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

REFERENCES

  1. Beaver JD, Lawrence AD, van Ditzhuijzen J, Davis MH, Woods A, Calder AJ. Individual differences in reward drive predict neural responses to images of food. J Neurosci. 2006;26:5160–5166. doi: 10.1523/JNEUROSCI.0350-06.2006.
  2. Bhatia KP, Marsden CD. The behavioural and motor consequences of focal lesions of the basal ganglia in man. Brain. 1994;117(Pt 4):859–876. doi: 10.1093/brain/117.4.859.
  3. Bromberg-Martin ES, Matsumoto M, Hikosaka O. Dopamine in motivational control: rewarding, aversive, and alerting. Neuron. 2011;68:815–834. doi: 10.1016/j.neuron.2010.11.022.
  4. Bromberg-Martin ES, Matsumoto M, Nakahara H, Hikosaka O. Multiple timescales of memory in lateral habenula and dopamine neurons. Neuron. 2010;67:499–510. doi: 10.1016/j.neuron.2010.06.031.
  5. Cohen JY, Haesler S, Vong L, Lowell BB, Uchida N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature. 2012;482:85–88. doi: 10.1038/nature10754.
  6. Cromwell HC, Berridge KC. Where does damage lead to enhanced food aversion: the ventral pallidum/substantia innominata or lateral hypothalamus? Brain Res. 1993;624:1–10. doi: 10.1016/0006-8993(93)90053-p.
  7. Daw ND, Doya K. The computational neurobiology of learning and reward. Curr Opin Neurobiol. 2006;16:199–204. doi: 10.1016/j.conb.2006.03.006.
  8. Davies RM, Gerstein GL, Baker SN. Measurement of time-dependent changes in the irregularity of neural spiking. J Neurophysiol. 2006;96:906–918. doi: 10.1152/jn.01030.2005.
  9. Dickinson A, Balleine B. Motivational control of goal-directed action. Anim Learn Behav. 1994;22:1–18.
  10. Ding L, Hikosaka O. Comparison of reward modulation in the frontal eye field and caudate of the macaque. J Neurosci. 2006;26:6695–6703. doi: 10.1523/JNEUROSCI.0836-06.2006.
  11. Farrar AM, Font L, Pereira M, Mingote S, Bunce JG, Chrobak JJ, Salamone JD. Forebrain circuitry involved in effort-related choice: Injections of the GABAA agonist muscimol into ventral pallidum alter response allocation in food-seeking behavior. Neuroscience. 2008;152:321–330. doi: 10.1016/j.neuroscience.2007.12.034.
  12. Floresco SB, West AR, Ash B, Moore H, Grace AA. Afferent modulation of dopamine neuron firing differentially regulates tonic and phasic dopamine transmission. Nat Neurosci. 2003;6:968–973. doi: 10.1038/nn1103.
  13. Grabli D, McCairn K, Hirsch EC, Agid Y, Féger J, François C, Tremblay L. Behavioural disorders induced by external globus pallidus dysfunction in primates: I. Behavioural study. Brain. 2004;127:2039–2054. doi: 10.1093/brain/awh220.
  14. Haber SN. The primate basal ganglia: parallel and integrative networks. J Chem Neuroanat. 2003;26:317–330. doi: 10.1016/j.jchemneu.2003.10.003.
  15. Haber SN, Lynd-Balta E, Mitchell SJ. The organization of the descending ventral pallidal projections in the monkey. J Comp Neurol. 1993;329:111–128. doi: 10.1002/cne.903290108.
  16. Haber SN, Knutson B. The reward circuit: linking primate anatomy and human imaging. Neuropsychopharmacology. 2010;35:4–26. doi: 10.1038/npp.2009.129.
  17. Haber SN, McFarland NR. The concept of the ventral striatum in nonhuman primates. Ann N Y Acad Sci. 1999;877:33–48. doi: 10.1111/j.1749-6632.1999.tb09259.x.
  18. Heimer L, Wilson RD. The subcortical projections of the allocortex: similarities in the neural associations of the hippocampus, the piriform cortex, and the neocortex. In: Santani M, editor. Golgi Centennial Symposium, Proceedings. 1975. pp. 177–193.
  19. Hikosaka O. The habenula: from stress evasion to value-based decision-making. Nat Rev Neurosci. 2010;11:503–513. doi: 10.1038/nrn2866.
  20. Hikosaka O, Bromberg-Martin E, Hong S, Matsumoto M. New insights on the subcortical representation of reward. Curr Opin Neurobiol. 2008;18:203–208. doi: 10.1016/j.conb.2008.07.002.
  21. Hikosaka O, Sakamoto M, Usui S. Functional properties of monkey caudate neurons. I. Activities related to saccadic eye movements. J Neurophysiol. 1989;61:780–798. doi: 10.1152/jn.1989.61.4.780.
  22. Hikosaka O, Wurtz RH. Visual and oculomotor functions of monkey substantia nigra pars reticulata. III. Memory-contingent visual and saccade responses. J Neurophysiol. 1983;49:1268–1284. doi: 10.1152/jn.1983.49.5.1268.
  23. Hikosaka O, Wurtz RH. Modification of saccadic eye movements by GABA-related substances. II. Effects of muscimol in monkey substantia nigra pars reticulata. J Neurophysiol. 1985;53:292–308. doi: 10.1152/jn.1985.53.1.292.
  24. Hollerman JR, Schultz W. Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci. 1998;1:304–309. doi: 10.1038/1124.
  25. Hong S, Hikosaka O. The globus pallidus sends reward-related signals to the lateral habenula. Neuron. 2008;60:720–729. doi: 10.1016/j.neuron.2008.09.035.
  26. Hong S, Jhou TC, Smith M, Saleem KS, Hikosaka O. Negative reward signals from the lateral habenula to dopamine neurons are mediated by rostromedial tegmental nucleus in primates. J Neurosci. 2011;31:11457–11471. doi: 10.1523/JNEUROSCI.1384-11.2011.
  27. Humphries MD, Prescott TJ. The ventral basal ganglia, a selection mechanism at the crossroads of space, strategy, and reward. Prog Neurobiol. 2010;90:385–417. doi: 10.1016/j.pneurobio.2009.11.003.
  28. Ikemoto S. Brain reward circuitry beyond the mesolimbic dopamine system: a neurobiological theory. Neurosci Biobehav Rev. 2010;35:129–150. doi: 10.1016/j.neubiorev.2010.02.001.
  29. Ito M, Doya K. Validation of decision-making models and analysis of decision variables in the rat basal ganglia. J Neurosci. 2009;29:9861–9874. doi: 10.1523/JNEUROSCI.6157-08.2009.
  30. Johnson PI, Parente MA, Stellar JR. NMDA-induced lesions of the nucleus accumbens or the ventral pallidum increase the rewarding efficacy of food to deprived rats. Brain Res. 1996;722:109–117. doi: 10.1016/0006-8993(96)00202-8.
  31. Joshua M, Adler A, Rosin B, Vaadia E, Bergman H. Encoding of probabilistic rewarding and aversive events by pallidal and nigral neurons. J Neurophysiol. 2009;101:758–772. doi: 10.1152/jn.90764.2008.
  32. Kawagoe R, Takikawa Y, Hikosaka O. Expectation of reward modulates cognitive signals in the basal ganglia. Nat Neurosci. 1998;1:411–416. doi: 10.1038/1625.
  33. Lauwereyns J, Watanabe K, Coe B, Hikosaka O. A neural correlate of response bias in monkey caudate nucleus. Nature. 2002;418:413–417. doi: 10.1038/nature00892.
  34. Matsumoto M, Hikosaka O. Lateral habenula as a source of negative reward signals in dopamine neurons. Nature. 2007;447:1111–1115. doi: 10.1038/nature05860.
  35. McAlonan GM, Robbins TW, Everitt BJ. Effects of medial dorsal thalamic and ventral pallidal lesions on the acquisition of a conditioned place preference: further evidence for the involvement of the ventral striatopallidal system in reward-related processes. Neuroscience. 1993;52:605–620. doi: 10.1016/0306-4522(93)90410-h.
  36. Miller JM, Vorel SR, Tranguch AJ, Kenny ET, Mazzoni P, van Gorp WG, Kleber HD. Anhedonia after a selective bilateral lesion of the globus pallidus. Am J Psychiatry. 2006;163:786–788. doi: 10.1176/ajp.2006.163.5.786.
  37. Mogenson GJ, Jones DL, Yim CY. From motivation to action: functional interface between the limbic system and the motor system. Prog Neurobiol. 1980;14:69–97. doi: 10.1016/0301-0082(80)90018-0.
  38. Nakamura K, Matsumoto M, Hikosaka O. Reward-dependent modulation of neuronal activity in the primate dorsal raphe nucleus. J Neurosci. 2008;28:5331–5343. doi: 10.1523/JNEUROSCI.0021-08.2008.
  39. Niv Y, Daw ND, Joel D, Dayan P. Tonic dopamine: opportunity costs and the control of response vigor. Psychopharmacology. 2007;191:507–520. doi: 10.1007/s00213-006-0502-4.
  40. O'Doherty JP, Hampton A, Kim H. Model-based fMRI and its application to reward learning and decision making. Ann N Y Acad Sci. 2007;1104:35–53. doi: 10.1196/annals.1390.022.
  41. Pessiglione M, Schmidt L, Draganski B, Kalisch R, Lau H, Dolan RJ, Frith CD. How the brain translates money into force: a neuroimaging study of subliminal motivation. Science. 2007;316:904–906. doi: 10.1126/science.1140459.
  42. Pasquereau B, Nadjar A, Arkadir D, Bezard E, Goillandeau M, Bioulac B, Gross CE, Boraud T. Shaping of motor responses by incentive values through the basal ganglia. J Neurosci. 2007;27:1176–1183. doi: 10.1523/JNEUROSCI.3745-06.2007.
  43. Robbins TW, Everitt BJ. Neurobehavioural mechanisms of reward and motivation. Curr Opin Neurobiol. 1996;6:228–236. doi: 10.1016/s0959-4388(96)80077-8.
  44. Sakamoto M, Hikosaka O. Eye movements induced by microinjection of GABA agonist in the rat substantia nigra pars reticulata. Neurosci Res. 1989;6:216–233. doi: 10.1016/0168-0102(89)90061-8.
  45. Samejima K, Ueda Y, Doya K, Kimura M. Representation of action-specific reward values in the striatum. Science. 2005;310:1337–1340. doi: 10.1126/science.1115270.
  46. Sato M, Hikosaka O. Role of primate substantia nigra pars reticulata in reward-oriented saccadic eye movement. J Neurosci. 2002;22:2363–2373. doi: 10.1523/JNEUROSCI.22-06-02363.2002.
  47. Shidara M, Aigner TG, Richmond BJ. Neuronal signals in the monkey ventral striatum related to progress through a predictable series of trials. J Neurosci. 1998;18:2613–2625. doi: 10.1523/JNEUROSCI.18-07-02613.1998.
  48. Smith KS, Berridge KC. The ventral pallidum and hedonic reward: neurochemical maps of sucrose “liking” and food intake. J Neurosci. 2005;25:8637–8649. doi: 10.1523/JNEUROSCI.1902-05.2005.
  49. Smith KS, Berridge KC, Aldridge JW. Disentangling pleasure from incentive salience and learning signals in brain reward circuitry. Proc Natl Acad Sci U S A. 2011;108:255–264. doi: 10.1073/pnas.1101920108.
  50. Smith KS, Tindell AJ, Aldridge JW, Berridge KC. Ventral pallidum roles in reward and motivation. Behav Brain Res. 2009;196:155–167. doi: 10.1016/j.bbr.2008.09.038.
  51. Tachibana Y, Kita H, Chiken S, Takada M, Nambu A. Motor cortical control of internal pallidal activity through glutamatergic and GABAergic inputs in awake monkeys. Eur J Neurosci. 2008;27:238–253. doi: 10.1111/j.1460-9568.2007.05990.x.
  52. Takikawa Y, Kawagoe R, Itoh H, Nakahara H, Hikosaka O. Modulation of saccadic eye movements by predicted reward outcome. Exp Brain Res. 2002;142:284–291. doi: 10.1007/s00221-001-0928-1.
  53. Tindell AJ, Berridge KC, Aldridge JW. Ventral pallidal representation of pavlovian cues and reward: population and rate codes. J Neurosci. 2004;24:1058–1069. doi: 10.1523/JNEUROSCI.1437-03.2004.
  54. Watanabe K, Lauwereyns J, Hikosaka O. Effects of motivational conflicts on visually elicited saccades in monkeys. Exp Brain Res. 2003;152:361–367. doi: 10.1007/s00221-003-1555-9.
  55. Wise RA. Neurobiology of addiction. Curr Opin Neurobiol. 1996;6:243–251. doi: 10.1016/s0959-4388(96)80079-1.
  56. Yoshida A, Tanaka M. Enhanced modulation of neuronal activity during antisaccades in the primate globus pallidus. Cereb Cortex. 2009;19:206–217. doi: 10.1093/cercor/bhn069.
