Author manuscript; available in PMC: 2016 Nov 19.
Published in final edited form as: Cell. 2015 Nov 19;163(5):1165–1175. doi: 10.1016/j.cell.2015.10.063

Dopamine neurons encoding long-term memory of object value for habitual behavior

Hyoung F Kim 1,2,3, Ali Ghazizadeh 1, Okihide Hikosaka 1,4
PMCID: PMC4656142  NIHMSID: NIHMS734606  PMID: 26590420

SUMMARY

Dopamine neurons promote learning by processing recent changes in reward values, such that reward may be maximized. However, such a flexible signal is not suitable for habitual behaviors that are sustained regardless of recent changes in reward outcome. We discovered a type of dopamine neuron in the monkey substantia nigra pars compacta (SNc) that retains past-learned reward values stably. After reward values of visual objects are learned, these neurons continue to respond differentially to the objects, even when reward is not expected. Responses are strengthened by repeated learning and are evoked upon presentation of the objects long after learning is completed. These “sustain-type” dopamine neurons are confined to the caudal-lateral SNc and project to the caudate tail, which encodes long-term value memories of visual objects and guides gaze automatically to stably valued objects. This population of dopamine neurons thus selectively promotes learning and retention of habitual behavior.

INTRODUCTION

Dopamine (DA) neurons are sensitive to reward value that is different from predicted, the signal often called reward prediction error (RPE) (Schultz et al., 1997). Positive or negative RPE is used to facilitate or inhibit, respectively, behavior associated with the reward until a desirable behavior is chosen (Sutton, 1988). However, reward-seeking behavior changes as it progresses. Initially, behavior changes flexibly depending on recent reward outcomes (goal-directed), but once a desirable pattern is acquired, it is maintained stably regardless of reward outcomes (habit) (Balleine and Dickinson, 1998; Graybiel, 2008; Seger and Spiering, 2011). The RPE-based DA signal would thus be suitable for acquiring goal-directed behavior, but not for sustaining habits. Then, how can a habit be sustained?

It has been suggested that goal-directed behavior and habits are controlled by separate mechanisms, especially separate circuits in the basal ganglia: dorsomedial vs. dorsolateral striatum in rodents (Yin and Knowlton, 2006), rostral vs. caudal striatum in monkeys (Hikosaka et al., 1999) and humans (Balleine and O’Doherty, 2010; Lehericy et al., 2005). Notably, all of these striatal areas are heavily innervated by DA neurons (Richfield et al., 1987). Furthermore, we recently found that a distinct group of DA neurons selectively project to the tail of the caudate nucleus (CDt), part of the caudal striatum, which has a critical role in habitual visual-oculomotor behavior (Fernandez-Ruiz et al., 2001; Kim and Hikosaka, 2013; Kim et al., 2014; Yamamoto et al., 2013).

This raises a question: Are the CDt-projecting DA neurons involved in habitual visual-oculomotor behavior? To answer this question, we let monkeys experience many visual objects in two contexts sequentially: 1) each object associated with a high or low reward value consistently and repeatedly (learning context), 2) the same objects with no contingent reward feedback (habitual context). We found that a spatially localized group of DA neurons acquired object value signals without encoding RPE in the learning context and then continued to respond to the objects differentially by their past-learned values in the habitual context. These DA neurons projected to CDt, suggesting that they play a critical role in learning and sustaining habitual visual-oculomotor behavior.

RESULTS

Habitual visual-oculomotor behavior caused by object-value learning

We used computer-generated fractals for experimental objects, half of them associated with a reward (high-valued) and the other half with no reward (low-valued) (Figure 1A). Our experiments consisted of two steps: 1) object-reward association (learning) and 2) behavior and neuronal encoding of object values (testing) (Figure 1B). In the learning procedure, the monkey made a saccade to the presented object, which was followed by a liquid reward or no reward depending on the presented object (Figure 1C). In each learning session, the object was chosen pseudo-randomly from a set of eight objects (each row in Figure 1A). Monkeys developed the difference in target acquisition time gradually across the trials in the 1st learning session (Figure 1D), indicating that the saccade was controlled by the expected reward outcome.

Figure 1. Object-value learning and habitual visual-oculomotor behavior.

A. Fractal objects, each consistently associated with a reward (high-valued) or no reward (low-valued). Monkeys PK and DW learned 440 and 840 fractals, respectively, among which 56 and 376 were long-term learned (> 4d).

B. Learning and testing procedures.

C. Object-value learning task. A fractal object was presented at the neuron’s preferred position, and the monkey made a saccade to it after the central fixation dot turned off. This was followed by a reward if the object was high-valued (top) or no reward if the object was low-valued (bottom). A set of eight objects (as in A) was used in each learning session.

D. Behavioral changes during learning. Mean target acquisition time (time after the fixation dot disappeared until the gaze reached the object) is plotted against the number of trials for each object during the 1st learning (n = 107). Data are shown separately for high-valued objects (red) and low-valued objects (blue). Green line indicates the difference of target acquisition time between the high- and low-valued objects (mean ± SE).

E. Free viewing procedure to test behavioral changes after learning. Four fractal objects among one set of eight objects were chosen pseudorandomly and presented simultaneously. Monkeys were free to look at the objects (or look elsewhere) for 2 sec without reward feedback.

F. Increase in gaze bias during free viewing after repeated learning. The saccade-choice rate (left) and gazing duration rate (right) are plotted before learning (Before, n = 42); after 1 day learning (1d, n = 22); after more than 4 day learning (> 4d, n = 316).

See also Figure S1.

To test habitual behavior, we let the monkey freely look at the value-learned objects with no reward outcome (free viewing procedure) (Figure 1E). When the previously learned objects were presented, monkeys looked at high-valued objects more frequently with longer durations than low-valued objects, even though no reward was given. The gaze bias became stronger after repeated learning (Figure 1F). Once established, the gaze bias occurred each time the free viewing was tested without further learning, confirming previous studies (Kim and Hikosaka, 2013; Yasuda et al., 2012). Although the gaze bias declined initially, it showed no further decrease, even though monkeys viewed the learned objects many times without contingent reward outcomes (i.e., free viewing or passive viewing) (Figure S1). This extinction-resistant gaze bias would be regarded as a habitual visual-oculomotor behavior, although we did not apply ‘devaluation’, a common procedure to characterize habits (Balleine and Dickinson, 1998).

Two types of dopamine neurons updating and sustaining object values

To test if DA neurons encode object values habitually, we recorded from presumed DA neurons in SNc during the two steps: learning and testing (Figure 1B). Figure 2 shows the activity of two example neurons. They fired slowly with long duration spikes (Figure 2C,F), which is distinct from GABAergic neurons in the substantia nigra pars reticulata (SNr) (Schultz, 1986). As the first step (learning), we recorded their activity while the monkey was looking at novel fractal objects with or without reward (Figure 2A, C, and F). Both neurons responded to these objects differentially, more excited by high-valued objects (Figure 2D and G), consistent with previous reports (Tobler et al., 2005). Such value-differential responses developed gradually (Figure S2A and C), as the difference in target acquisition time developed (Figure 1D).

Figure 2. Neuronal coding of object values during learning and post-learning.

Responses of two presumed DA neurons in SNc are shown during learning (object-value learning task) and post-learning (passive viewing procedure).

A. Object-value learning task (see Figure 1C).

B. Passive viewing task. The learned objects were presented sequentially in the neuron’s preferred location, while the monkey was fixating at the center. A reward was delivered non-contingently with the presented objects.

C–E. Responses of neuron #1 (spike shape shown in C, bottom) to eight objects (shown in C, top) during the first object learning task (D), followed by the passive viewing task (E). Average activity (shown by spike density functions, SDFs) is aligned at the onset of object presentation.

F–H. Responses of neuron #2 in the same format.

See also Figure S2
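
The spike density functions (SDFs) used in these figures are smoothed firing-rate estimates. The following is a minimal sketch of one common way to compute an SDF for a single trial, assuming a Gaussian kernel. The kernel shape, the 10 ms width, the 1 ms step, and the function name `spike_density` are illustrative assumptions and are not taken from this study.

```python
import math


def spike_density(spike_times_ms, t_start_ms, t_stop_ms, sigma_ms=10.0, step_ms=1.0):
    """Return (times, rates) for one trial; rates are in spikes/s.

    Each spike is replaced by a unit-area Gaussian (hypothetical kernel choice),
    and the Gaussians are summed at every time step.
    """
    n_steps = int(round((t_stop_ms - t_start_ms) / step_ms)) + 1
    times = [t_start_ms + i * step_ms for i in range(n_steps)]
    norm = 1.0 / (sigma_ms * math.sqrt(2.0 * math.pi))  # kernel peak, in 1/ms
    rates = []
    for t in times:
        density = sum(
            norm * math.exp(-0.5 * ((t - s) / sigma_ms) ** 2) for s in spike_times_ms
        )
        rates.append(density * 1000.0)  # convert 1/ms to spikes/s
    return times, rates


# Example: one spike at object onset (0 ms), evaluated from -100 to +100 ms.
times, rates = spike_density([0.0], -100.0, 100.0)
```

Averaging such single-trial curves across trials, aligned at object onset, would give a population-style trace comparable in form to the ones plotted.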

However, the two neurons behaved differently in the second step (testing). The learned objects were presented sequentially while the monkey was fixating at the center (passive viewing task, Figure 2B). Unlike the learning procedure, each object presentation induced no increase or decrease of the expected reward value. Neuron #1 stopped responding to the previously learned objects (Figure 2E), reflecting the lack of reward contingency. In contrast, neuron #2 continued to respond to the objects differentially (Figure 2H). In short, neuron #1 updated object values flexibly based on immediate reward expectation (update type), whereas neuron #2 sustained object values stably based on past experience (sustain type).

We recorded activity of 133 presumed DA neurons and found 69 neurons that encoded object values in two monkeys. In the passive viewing task, 45 neurons showed value-differential responses (sustain type) and 24 neurons showed no response (update type) (p < 0.05, Wilcoxon rank-sum test). Their average activity is shown in Figure 3.
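
The sustain/update split described above can be sketched as a two-sample comparison of a neuron's passive-viewing responses to high- and low-valued objects. The sketch below assumes per-trial firing rates as input and uses the normal approximation to the Wilcoxon rank-sum test without tie correction. The function names, the input format, and the approximation are assumptions for illustration, not the authors' analysis code.

```python
import math


def rank_sum_p(sample_a, sample_b):
    """Two-sided p-value of the rank-sum test (normal approximation, no tie correction)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # U statistic: number of (a, b) pairs with a > b, counting ties as one half.
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))


def classify_value_neuron(high_rates, low_rates, alpha=0.05):
    """Label a value-coding neuron from its passive-viewing responses (hypothetical helper)."""
    return "sustain" if rank_sum_p(high_rates, low_rates) < alpha else "update"


# Example with made-up firing rates (spikes/s), not data from the study.
label = classify_value_neuron([12.0, 15.0, 14.0, 16.0, 13.0], [5.0, 6.0, 4.0, 7.0, 5.5])
```

The label is meaningful only for neurons that already showed value coding during learning, since the update type is defined by losing that response in the passive viewing task.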

Figure 3. Distinct patterns of object value responses in two types of presumed DA neurons.

A–E. Average activity (shown by SDFs) of update-type DA neurons (top) and sustain-type DA neurons (bottom). A–C: Responses to novel objects during three steps: passive viewing (A), object value learning (B), passive viewing (C). D–E: Responses to well learned objects (> 4d) during relearning (D) and passive viewing after > 1d retention (E). Green line indicates the difference between the high- and low-valued object responses (mean ± SE). The number of neurons examined (n) is shown in each graph.

F. Increase in value discrimination by repeated learning. Neuronal discrimination between high-and low-valued objects in passive viewing task (measured as ROC area, mean ± SE) is plotted before learning (Before), after 1-day learning (1d), and after more than 4-day learning (> 4d).

G. Retention of value discrimination after learning. Neuronal discrimination in passive viewing task is plotted before learning (Before), immediately after learning (After), 1–4 days after learning (≤ 4d), and > 4 days after learning (> 4d). For sustain-type DA neurons, data are separated by the number of learning: 1 day learning (dashed line, 1d) and more than 4 day learning (solid line, > 4d).

For F and G, the number of neurons examined (n) is shown at each data point. *p < 0.05, **p < 0.01, ***p < 0.001 by Wilcoxon rank-sum test.

See also Figure S2 and S3.
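
Neuronal discrimination in panels F and G is expressed as an ROC area. As a minimal sketch, the ROC area between two sets of responses equals the probability that a randomly drawn response to a high-valued object exceeds a randomly drawn response to a low-valued object, with ties counted as one half. The function name `roc_area` and the list-of-rates input are assumptions for illustration.

```python
def roc_area(high_responses, low_responses):
    """ROC area: 0.5 means no discrimination, 1.0 means high always exceeds low."""
    if not high_responses or not low_responses:
        raise ValueError("both response lists must be non-empty")
    wins = 0.0
    for h in high_responses:
        for l in low_responses:
            if h > l:
                wins += 1.0
            elif h == l:
                wins += 0.5  # ties contribute half a win
    return wins / (len(high_responses) * len(low_responses))


# Example with made-up responses (spikes/s), not data from the study.
area = roc_area([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
```

Values below 0.5 would indicate stronger responses to low-valued objects.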

In order to examine the learning/memory process of these neurons, we repeated the object-value learning and passive viewing task across daily sessions. Update-type neurons continued to be non-responsive in the passive viewing task even after extensive long-term learning (> 4 daily sessions) (Figure 3E, top). In contrast, sustain-type neurons continued to be responsive in the passive viewing task (Figure 3C and E, bottom), and their responses were enhanced after repeated learning (Figure 3F), together with the development of the gaze bias during free viewing (Figure 1F). Their value discrimination remained robust many days after the last learning (Figure 3G, >4d). The retention of the object-value response was affected by the amount of learning: no decrease after > 4 day learning (Figure 3G, solid magenta lines); some decrease after 1 day learning (Figure 3G, dashed magenta lines). Importantly, during the retention period the monkeys viewed the learned objects many times without contingent reward outcomes (i.e., free viewing or passive viewing), yet sustain-type neurons showed no significant decrease in the object-value response (Figure S2E–G). These results suggest that sustain-type neurons contribute to the acquisition and retention of habitual visual-oculomotor behavior.

Figure 4. Differences and similarities of update- and sustain-type DA neurons.

A. Response to the unpredicted reward outcome. Data were collected only for the 1st trial of each object in object value learning, and are shown as average SDFs (left) and individual neuronal discrimination (right) for update-type DA neurons (n = 24) and sustain-type DA neurons (n = 29). The mean neuronal discrimination (indicated by triangle, calculated as ROC area) was significantly higher than 0.5 (i.e., no discrimination) for update-type DA neurons (ROC = 0.78), but not for sustain-type DA neurons (ROC = 0.53) (p < 0.001, Wilcoxon rank-sum test).

B. Spatial selectivity of visual response. This was tested for update-type DA neurons (top, n = 24) and sustain-type DA neurons (bottom, n = 39) by ipsilateral and contralateral object presentations. Data are shown by averaged SDFs (left) (dashed line: ipsilateral, solid line: contralateral). In scatterplots (right), each data point indicates the responses of each neuron to ipsilateral (ordinate) and contralateral (abscissa) objects. Red dots indicate neurons whose spatial selectivity is statistically significant (Wilcoxon rank-sum test, p < 0.05). N.S., non-significant.

C. Stereotaxic locations of sustain- and update-type DA neurons in SN in coronal (top-left) and sagittal (bottom-left) views. D: dorsal, V: ventral, M: medial, L: lateral, R: rostral, C: caudal. Their distributions are projected to each of 3D axes (right). Number 0 indicates the midline (Medial-Lateral), the dorsal end of SN (Dorsal-Ventral), and the rostral end of SN (Rostral-Caudal). Their means (triangles) were statistically different in the medial-lateral and rostral-caudal dimensions (p < 0.001 by Wilcoxon rank-sum test). The coordinates 0, 0, 0 (abscissa) are rostral, medial and dorsal edges of SN.

D. Electrophysiological properties. Sustain- and update-type DA neurons had similar spike shapes, which are different from non-DA (presumed GABAergic) neurons (left). Relationship between spike duration and baseline firing rate for sustain- and update-type DA neurons, and non-DA neurons (right).

See also Figure S2, S3 and S4.

Update- and sustain-type neurons also had different sensitivities to reward itself. In the first trial of the 1st learning procedure, when the reward outcome was unpredictable, update-type neurons were excited more strongly by reward than by no reward (Figures 4A, top and S2B), whereas sustain-type neurons showed variable responses and overall no discrimination (Figures 4A, bottom and S2D). This reward response in update-type neurons disappeared in < 3 trials as the reward became predictable (Figure S2B). These data suggest that update-type neurons encode RPE signals and therefore are suitable for goal-directed behavior, but not for sustaining habitual behavior. In contrast, sustain-type neurons may be more suitable for habitual behavior.
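
The way an RPE-like reward response fades as the reward becomes predictable can be illustrated with a textbook delta-rule model. This is a sketch only: the learning rate of 0.5, the reward magnitude of 1, and the function name `simulate_rpe` are arbitrary assumptions, and the model was not fitted to the recordings described here.

```python
def simulate_rpe(rewards, learning_rate=0.5, initial_value=0.0):
    """Return the prediction error on each trial and the final predicted value.

    On each trial the error is (received reward - predicted value), and the
    prediction moves toward the received reward by learning_rate * error.
    """
    value = initial_value
    errors = []
    for reward in rewards:
        delta = reward - value  # reward prediction error on this trial
        errors.append(delta)
        value += learning_rate * delta  # update the prediction
    return errors, value


# Four consecutive rewarded trials for one object (hypothetical reward size of 1).
errors, final_value = simulate_rpe([1.0, 1.0, 1.0, 1.0])
# errors shrink trial by trial: 1.0, 0.5, 0.25, 0.125
```

In this toy model the error is largest on the first, unpredicted trial and shrinks on every repetition, which is the qualitative pattern attributed to update-type neurons.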

These suggestions were supported by another object-value association task in which two objects reversed their values frequently (Figure S3A). Update-type neurons reversed their responses quickly and clearly following the object value reversal (Figure S3B). In contrast, sustain-type neurons started showing value-biased responses slowly and weakly (Figure S3C). Overall, neurons showing stronger value-biased responses in the passive viewing task tended to show weaker value-biased responses in the reversal task (Figure S3F). The response to the unexpected reward outcome occurred in update-type neurons (Figure S3D), but not sustain-type neurons (Figure S3E), confirming the data in the learning procedure (Figure 4A).

Interestingly, most sustain-type neurons (30/39, 77%) responded more strongly to visual objects presented in the contralateral than ipsilateral field (p < 0.05, Wilcoxon rank-sum test) (Figure 4B, bottom, Figure S3C), unlike update-type neurons (Figure 4B, top, Figure S3B). Neurons showing stronger value-biased responses in the passive viewing task tended to show stronger spatial selectivity (Figure S3G). The scattered data in Figure S3F and G may suggest that presumed DA neurons can be divided into update- and sustain-type neurons along a gradient in multiple features, rather than as two distinct groups of neurons.

We also found that the locations of update- and sustain-type neurons were largely separate: update-type neurons were found in the more rostral and medial part of SNc (rmSNc), whereas sustain-type neurons were found in the more caudal and lateral part (clSNc) (Figure 4C). Overall, the response in the passive viewing task was stronger in the caudal and lateral parts (Figure S4A). The same tendencies were present for the spatial selectivity (Figure S4B). In contrast, the reversal response was stronger in the rostral and medial parts (Figure S4C).

Despite all of the differences described above, the update- and sustain-type neurons showed similar electrophysiological properties (Figure 4D and Table 1). Compared with presumed GABAergic neurons in SNr, both update- and sustain-type neurons had longer spike durations and lower baseline activity. These features are similar to those used previously to characterize DA neurons in the monkey SNc (Schultz, 1986), suggesting that both update- and sustain-type neurons were dopaminergic.

Table 1.

Electrophysiological properties for sustain-type DA, update-type DA, and presumed GABAergic neurons (mean ± SD).

| Property | Sustain DA | Update DA | Presumed GABAergic |
| --- | --- | --- | --- |
| Spike duration (ms) | 1.3 ± 0.2 | 1.3 ± 0.2 | 0.9 ± 0.1 |
| Baseline firing rate (spikes/s) | 4.7 ± 2.9 | 5.0 ± 2.3 | 53.7 ± 19.7 |

Update-type neurons apparently correspond to the RPE-sensitive DA neurons which have been investigated repeatedly in various animal species including humans (Cohen et al., 2012; D’Ardenne et al., 2008; Schultz, 1998). In contrast, sustain-type neurons are a novel finding, and their functions are unknown so far. In pursuing this question, we hypothesized that the value signal encoded by sustain-type neurons is sent to CDt, because we previously showed that CDt receives inputs from DA neurons in the caudal-dorsal-lateral part of SNc (cdlSNc) (Kim et al., 2014), a distribution similar to that of sustain-type neurons (Figure 4C).

Sustain-type DA neurons connecting with caudate tail

To test this hypothesis further, we examined whether the sustain-type neurons were activated antidromically by the electrical stimulation of CDt (Figures 5 and S6B). We placed the stimulating electrode accurately in CDt by recording single neuronal activity with the electrode (Figure 5A). Among 31 SNc neurons tested, 7 neurons were activated antidromically (hereafter called Anti(+) neurons). One example is shown in Figure 5B. The CDt stimulation evoked spikes at a fixed latency (6.9 ms) (Figure 5B, top). The antidromic nature was confirmed by a collision test: when the stimulation followed a spontaneous spike by less than 7.9 ms, the stimulation no longer evoked a spike (Figure 5B, bottom). Among the 7 Anti(+) neurons, the antidromic response appeared sometimes as a partial initial segment (IS) spike in 4 neurons (Figure S5A–D), a feature common among DA neurons (Grace and Bunney, 1983).
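
The logic of the collision test can be sketched as a simple rule on the interval between a spontaneous spike and the following stimulation. The sketch uses the two values reported for the example neuron (6.9 ms latency, 7.9 ms collision interval). The function names are hypothetical, and reading the collision interval as roughly the antidromic latency plus the axonal refractory period is a standard textbook assumption rather than a statement made in this paper.

```python
def antidromic_spike_expected(spike_to_stim_ms, collision_interval_ms):
    """True if stimulation should still evoke an antidromic spike at the soma.

    spike_to_stim_ms: time from a spontaneous spike to the stimulation pulse.
    Shorter intervals mean the evoked spike collides with the spontaneous one.
    """
    if spike_to_stim_ms < 0:
        raise ValueError("stimulation must follow the spontaneous spike")
    return spike_to_stim_ms >= collision_interval_ms


def implied_refractory_ms(latency_ms, collision_interval_ms):
    """Refractory period implied if collision interval = latency + refractory (assumption)."""
    return collision_interval_ms - latency_ms


# Values reported for the example neuron.
LATENCY_MS = 6.9
COLLISION_MS = 7.9
blocked = not antidromic_spike_expected(5.0, COLLISION_MS)  # stimulation 5 ms after a spike
refractory = implied_refractory_ms(LATENCY_MS, COLLISION_MS)  # about 1 ms
```

An orthodromically driven spike would not show this dependence on the preceding spontaneous spike, which is why the test separates antidromic from synaptic activation.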

Figure 5. Efferent and afferent connections of sustain-type DA neurons.

A. Scheme showing electrical stimulation in caudate tail (CDt) and neuronal recording in caudal-lateral SNc (clSNc). SNr: substantia nigra pars reticulata. Coronal view. Scale bar: 1 mm.

B. An SNc neuron activated by electrical stimulation in CDt with a fixed latency (6.9 ms) (PK#1, Figure S5E). This activation was eliminated when CDt stimulation occurred < 7.9 ms after a spontaneous spike (bottom), confirming its antidromic nature (collision test).

C. Value discrimination of antidromically activated (Anti(+)) neurons (n=6) in passive viewing task (> 4d learning and > 1d retention), shown as average SDFs (left) and ROC distribution (right).

D–E. Orthodromic responses of sustain-type DA neurons (D), but not update-type DA neurons (E), to CDt stimulation. Average activity (shown by peristimulus time histogram, PSTH) is aligned on CDt stimulation (dotted line). The lack of activity just after the stimulation was caused by stimulus artifact.

F. Responses of individual DA neurons to CDt stimulation shown by a scatterplot. Each data point indicates each neuron’s activity 10–40 ms after (ordinate) and 0–80 ms before (abscissa) CDt stimulation. Red dots indicate neurons whose response is statistically significant (t-test, p < 0.05). N.S., non-significant.

See also Figure S5 and S6.
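
The comparison plotted in panel F contrasts each neuron's firing rate 0–80 ms before CDt stimulation with its rate 10–40 ms after. A minimal sketch of that windowed-rate calculation follows. It assumes spike times are given in ms relative to the stimulation at 0 ms, with one list per trial. The function names are hypothetical, and the significance test mentioned in the caption is not reproduced.

```python
from statistics import mean


def window_rate(spike_times_ms, start_ms, stop_ms):
    """Firing rate (spikes/s) within the half-open window [start_ms, stop_ms)."""
    count = sum(1 for t in spike_times_ms if start_ms <= t < stop_ms)
    return count * 1000.0 / (stop_ms - start_ms)


def stim_response(trials):
    """Mean rate before (-80 to 0 ms) and after (10 to 40 ms) stimulation across trials."""
    before = mean(window_rate(trial, -80.0, 0.0) for trial in trials)
    after = mean(window_rate(trial, 10.0, 40.0) for trial in trials)
    return before, after


# One made-up trial: a single baseline spike and two spikes in the response window.
before_rate, after_rate = stim_response([[-40.0, 20.0, 25.0]])
```

Starting the response window at 10 ms keeps it clear of the period just after stimulation, where the caption notes that activity is obscured by the stimulus artifact.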

We then examined the responses of 6 out of the 7 Anti(+) neurons using our object-value procedures (Figures 1 and S5E–G). These Anti(+) neurons shared the same features common to the sustain-type neurons: 1) Anti(+) neurons showed strong value-differential responses in the passive viewing task (average activity in Figure 5C; individual activity in Figures S5F and S4A); 2) Anti(+) neurons showed only weak value-differential responses in the reversal task (Figure S4C); 3) Anti(+) neurons were insensitive to the reward outcome (Figure S5G); 4) Anti(+) neurons responded more strongly to contralateral objects (Figure S4B); 5) Anti(+) neurons were located within a cluster of sustain-type neurons (Figure 6F, Figure S4). These results suggest that the sustained value signal is sent from SNc to CDt.

Figure 6. Colocalization of sustain-type DA neurons and CDt-projecting DA neurons.

A. Location of an Anti(+) neuron (PK#3), indicated by a marking lesion (black arrow) in a Nissl-stained coronal section. Black line indicates the border of SN.

B. Combination of antidromic and retrograde tracer experiments in monkey PK. A retrograde tracer, cholera toxin subunit B (CTB), was injected in CDt.

C. An adjacent section (50 μm from the section in A) showing immunoreactivity to TH (green) and CTB (red) and the location of the marking lesion (white arrow). A red plexus in the dorsolateral SNr indicates the anterogradely labeled axon terminals of CDt neurons. Scale bar: 2 mm.

D. Enlarged view of the area around the marking lesion. Among many TH-positive DA neurons (green) are TH and CTB-double labeled neurons (yellow or orange color, indicated by white arrowheads). Scale bar: 100 μm.

E. Stereotaxic locations of CDt-projecting neurons (retrogradely CTB-labeled). 98.5% of them were TH-positive. Scale bar: 1 mm.

F. Recording sites of sustain-type and update-type SNc neurons in stereotaxic coordinates. Anti(+) neurons among the sustain-type are indicated by yellow dots. The locations of sustain-type neurons were included in clSNc where DA neurons projected to CDt (E).

In E and F, the locations of neurons are projected to the coronal and sagittal perspectives of SN, based on MRI (E) and histological sections (F). Scale bar: 1 mm.

See also Figure S6.

We also found that CDt stimulation induced orthodromic effects (Figure 5D–F). Among 15 sustain-type neurons examined, 10 neurons showed a phasic increase in activity (Figure 5D and F). Their locations are shown in Figure S4. The latency of the excitatory response as a population was 7 ms. In contrast, none of the update-type neurons examined (n = 6) showed an orthodromic response (Figure 5E and F). These data raise the possibility that sustain-type SNc neurons not only send signals to CDt but also receive signals from CDt, possibly through SNr (Figure 5A, see Discussion).

Given their projections to CDt, the Anti(+) neurons were likely to be dopaminergic. To test this hypothesis, we anatomically reconstructed dopaminergic neurons in SNc that projected to CDt. We injected Cholera toxin B subunit (CTB, retrograde tracer) into CDt and immunohistochemically processed SNc-containing sections to detect both CTB and Tyrosine hydroxylase (TH, DA neuronal marker) (Figure 6B–D). Before this process, we marked the recording site of one Anti(+) neuron by passing a small DC current through the recording electrode. The marking lesion was found in the caudal-lateral part of SNc (clSNc) (Figure 6A and C). It was surrounded by many TH-positive cells (green cell somas in Figure 6D), some of which were also CTB-positive and therefore projected to CDt (orange cell somas in Figure 6D). These CTB-positive cells were clustered in clSNc and most of them (98.5%, 338/343) were TH-positive (Figures 6E and S6A), confirming our previous study (Kim et al., 2014). Importantly, the recording sites of sustain-type neurons (Figure 6F) were included in the clSNc region where DA neurons projected specifically to CDt (Figure 6E). These results suggest that sustain-type neurons are dopaminergic and project to CDt.

DISCUSSION

Anatomical and functional segregation of DA effects

Our experiments showed that two types of DA neurons encoded reward values of visual objects in different manners (i.e., updating and sustaining), although they might represent two extremes of a functional gradient. Update-type DA neurons were sensitive to unpredicted changes in the values of incoming rewards (i.e., RPE) and their predictors (i.e., fractal objects). These are typical features that characterize DA neurons (Schultz, 1998) and are suitable for goal-directed behavior which is modified flexibly by changes in reward outcomes (Balleine and Dickinson, 1998). In contrast, sustain-type DA neurons constitute a different group of DA neurons. After long-term learning they became insensitive to changes in expected rewards: they continued to respond to previously reward-associated visual objects even when the reward outcome was no longer expected. Their responses showed no significant decrease even when non-contingent outcomes were repeated many times across many days (Figure S2E–G). These extinction-resistant neuronal responses are compatible with habits which remain functional even after reward outcomes are eliminated (Balleine and Dickinson, 1998; Graybiel, 2008), although ‘devaluation’ has not been applied.

The two types of DA neurons would influence behavior by their projections to separate regions of the striatum. Update-type neurons were localized in the rostral-medial part of SNc (rmSNc) where many DA neurons project to the caudate head (CDh) (Kim et al., 2014). Indeed, depending on the expected reward outcome, neurons in CDh change their activity flexibly (i.e., mostly higher when reward is expected) similarly to update-type DA neurons (Figure S3B and F) and monkeys change their behavior flexibly (e.g., quicker saccades when reward is expected) (Hikosaka et al., 1989a; Kim and Hikosaka, 2013).

A different scenario applies to the other type of DA neurons: sustain-type DA neurons. Sustain-type DA neurons were localized in the caudal-lateral part of SNc (clSNc) where many DA neurons project to CDt (Kim et al., 2014). Notably, all neurons in clSNc projecting to CDt (indicated by antidromic activation) showed sustained object-value responses. This sustained DA signal seems to control the activity of CDt neurons: most CDt neurons continued to show value-differential visual responses stably even with no contingent reward outcomes (Kim and Hikosaka, 2013; Yamamoto et al., 2013). In short, habitual object-value signals are encoded by the sustainDA-CDt circuit, but not the updateDA-CDh circuit (Figure S6C and D).

In fact, these two circuits seem to control gaze-orienting behavior (saccade) in distinctly different manners. The flexible saccade bias (i.e., caused by expected rewards) is reduced by the inactivation of CDh, but not CDt; in contrast, the stable saccade bias (i.e., caused by previous reward associations) is reduced by the inactivation of CDt, but not CDh (Kim and Hikosaka, 2013). This distinct value-signal processing is supported by mostly separate downstream circuits (Figure S6E): CDh outputs through the rostral-ventral-medial part of SNr (rvmSNr) and CDt outputs through the caudal-dorsal-lateral part of SNr (cdlSNr), both of which converge to the superior colliculus (SC) (Yasuda and Hikosaka, 2015; Yasuda et al., 2012).

To summarize, short-term and long-term memories of object values are processed separately by the two basal ganglia circuits (Figure S6E). Relying on short-term memories, the updateDA-CDh circuit contributes to voluntary, goal-directed saccades. Relying on long-term memories, the sustainDA-CDt circuit contributes to automatic, habitual saccades.

Sustain-type DA neurons for habitual visual-oculomotor behavior

A remarkable feature of sustain-type DA neurons is that, once they have acquired memories of object values, they rarely unlearn the object values (Figure 3G, Figure S2E–G). The resistance to extinction apparently allowed them to accumulate value memories as the monkey experienced more objects. The cumulative value memories of sustain-type DA neurons seem translated into the extremely high capacity of visual object memories in their target neurons in CDt and cdlSNr (Hikosaka et al., 2013; Yamamoto et al., 2013; Yasuda et al., 2012). This was clearly shown for cdlSNr neurons which discriminate > 300 visual objects by their stable values and retain the value memories for > 100 days (Yasuda et al., 2012).

How can sustain-type DA neurons process object values consistently regardless of the immediate reward outcomes? To address this question, we will discuss their detailed properties. First, sustain-type DA neurons showed little response to water reward itself even when the reward was unpredicted (Figures 4A and S2D). Instead, they responded to visual objects even when they were novel (Figure 3A), and then became consistently sensitive to the values of visual objects. These results suggest that sustain-type DA neurons rely on another mechanism that can identify the valuable object based on its association with reward outcomes; in fact, this is what update-type DA neurons would do. This might be accomplished by the connection of update-type DA neurons to SC (Figure S6E) which projects back to the DA neurons via excitatory connections (Comoli et al., 2003). Specifically, update-type DA neurons would guide SC neurons to signal attention/gaze based on recent reward experiences (Ikeda and Hikosaka, 2003, 2007) and this signal may be sent to sustain-type DA neurons. Consistent with this hypothesis, sustain-type DA neurons responded to visual objects more strongly when they appeared on the contralateral side (Figure 4B), similarly to SC visual neurons (Goldberg and Wurtz, 1972). According to this scenario, sustain-type DA neurons appear to rely on conditioned reinforcement (Taylor and Robbins, 1986), not reward itself, as the source of object values.

Second, the extinction-resistant memories for habitual visual-oculomotor behavior might be facilitated by a loop-circuit mechanism. Electrical stimulation of CDt induced orthodromic excitations in many of the sustain-type DA neurons (Figure 5D and F), in addition to occasional antidromic activations (Figure 5A–C). Notably, cdlSNr neurons are inhibited by CDt stimulation through the direct GABAergic connection (Yasuda and Hikosaka, 2015). SNr neurons are known to have GABAergic axon collaterals that synapse on adjacent DA neurons (Deniau et al., 1982; Tepper et al., 1995). Therefore, the CDt-induced excitation of sustain-type DA neurons may be caused by a disinhibition mediated by cdlSNr neurons. In fact, sustain-type DA neurons were located very close to cdlSNr neurons which encode stable value memories (Kim et al., 2014; Yasuda et al., 2012). The presumed loop circuit (CDt-cdlSNr-sustainDA-CDt in Figure S6E) would act as a positive loop, since the DA effect on direct pathway neurons in the striatum is mediated through D1 receptors (Gerfen, 1992) and is thought to be facilitatory (Surmeier et al., 2007; West and Grace, 2002). This mechanism might underlie the long-term memories for habitual visual-oculomotor behavior.

Implications of sustain-type DA neurons in unconscious memories

CDt has long been implicated in unconscious memories of visual objects. This “visual habit” concept was initiated by studies on macaque monkeys using a concurrent discrimination task (Fernandez-Ruiz et al., 2001) and was confirmed by human studies: people with extensive lesions in the medial temporal lobe, including the hippocampus, may lose conscious memories (i.e., amnesia) but can still learn to choose high-valued objects, even though they cannot recognize the objects (Bayley et al., 2005). Our data suggest that CDt-projecting DA neurons contribute to these unconscious visual memories and to automatic gaze orienting. In fact, people with Parkinson’s disease (PD) showed no learning in the concurrent discrimination task unless conscious memories were deployed (Moody et al., 2010).

Implications of sustain-type DA neurons in basal ganglia dysfunctions

Our study provides new perspectives on basal ganglia dysfunctions. In Parkinson’s disease, DA cell loss tends to occur in the lateral part of SNc (Goto et al., 1989), where sustain-type neurons dominate. As predicted from our data, people with Parkinson’s disease often have difficulty performing daily routines automatically (Kim and Hikosaka, 2015; Redgrave et al., 2010). This may be caused partly by dysfunction of the CDt-cdlSNr-SC circuit. In contrast, people who abuse drugs are persistently and often unconsciously attracted by visual cues associated with addictive drugs (Goldstein et al., 2009). This might be caused by malfunctioning of sustain-type DA neurons in SNc targeting the CDt-cdlSNr-SC circuit.

Heterogeneity of DA neurons

Recent studies suggest that DA neurons are heterogeneous in terms of their functions (Brischoux et al., 2009; Lerner et al., 2015; Matsumoto and Hikosaka, 2009). Our experiments support this idea. In macaque monkeys, another kind of heterogeneity was reported from our lab (Matsumoto and Hikosaka, 2009): In response to visual cues that predicted an aversive stimulus, DA neurons in the ventromedial SNc were inhibited (value-coding), whereas DA neurons in the dorsolateral SNc were excited (salience-coding). Their locations roughly match the locations of update-type and sustain-type neurons, respectively. However, it is still unknown how individual DA neurons are involved in these two kinds of functional categories. This remains a crucial question to integratively understand the functions of DA neurons.

EXPERIMENTAL PROCEDURES

General procedures

Two adult male rhesus monkeys (Macaca mulatta), PK (8 kg) for neuronal recording and histology and DW (11 kg) for neuronal recording, were used for the experiments. All animal care and experimental procedures were approved by the National Eye Institute Animal Care and Use Committee and complied with the Public Health Service Policy on the humane care and use of laboratory animals. We implanted a plastic head holder and two plastic recording chambers on the skull under general anesthesia and sterile surgical conditions. One chamber, aimed at CDt, was tilted laterally by 25°, and the other, aimed at SNc, was tilted posteriorly by 40°. Two search coils were surgically implanted under the conjunctiva of the eyes to record eye movements. After the monkeys had fully recovered from surgery, we started training them on the object-value learning and passive viewing tasks.

Single unit recording

While the monkey was performing a task, the activity of single neurons in SNc and CDt was recorded using conventional methods. The recording sites were determined using a grid system with 1 mm spacing, with the aid of MR images (4.7T, Bruker) obtained along the direction of the chamber. Single-unit recording was performed using glass-coated electrodes (Alpha-Omega). The electrode was inserted into the brain through a stainless-steel guide tube and advanced by an oil-driven micromanipulator (MO-97A, Narishige). The electric signal from the electrode was amplified and band-pass filtered (0.2–10 kHz; BAK). Neuronal spikes were isolated online using custom voltage-time window discrimination software (MEX, Laboratory of Sensorimotor Research, National Eye Institute – National Institutes of Health [LSR/NEI/NIH]) and their timings were detected at 1 kHz. The waveforms of individual spikes were collected at 50 kHz.

Identification of dopamine neurons by electrophysiological properties

Presumed dopamine (DA) neurons were identified by their tonic baseline activity of around five spikes per second and their broad spike potential. To characterize the electrophysiological properties of recorded neurons, we used two parameters: (1) baseline firing rate and (2) spike waveform. The baseline firing rate was the mean firing rate during the 1 s before the onset of the fixation dot in the passive viewing task. To quantify the spike waveform, we measured the spike duration, defined as the time between the first and second negative peaks.
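The two parameters described above can be illustrated with a minimal Python sketch. It is not the analysis code used in the study: the function names, the input layout (spike times in seconds, one waveform as an array of samples), and the peak-detection rule (strict local minima below zero) are assumptions made for illustration. Only the 1 s baseline window and the 50 kHz waveform sampling rate come from the text.

```python
import numpy as np

def baseline_rate(spike_times_s, fixation_onsets_s, window_s=1.0):
    """Mean firing rate (spikes/s) in the window just before each fixation-dot onset."""
    spikes = np.sort(np.asarray(spike_times_s, dtype=float))
    # Count spikes in [onset - window, onset) for every trial.
    counts = [
        np.searchsorted(spikes, t) - np.searchsorted(spikes, t - window_s)
        for t in fixation_onsets_s
    ]
    return float(np.mean(counts)) / window_s

def spike_duration_ms(waveform, fs_hz=50_000.0):
    """Time (ms) between the first and second negative peaks of one spike waveform.

    Assumption: a negative peak is a strict local minimum below zero. Real
    recordings are noisy and would need smoothing or averaging first.
    """
    w = np.asarray(waveform, dtype=float)
    interior = np.arange(1, len(w) - 1)
    is_min = (w[1:-1] < w[:-2]) & (w[1:-1] < w[2:]) & (w[1:-1] < 0)
    peaks = interior[is_min]
    if len(peaks) < 2:
        raise ValueError("waveform has fewer than two negative peaks")
    return (peaks[1] - peaks[0]) / fs_hz * 1000.0
```

With these definitions, a neuron would be a candidate DA neuron if `baseline_rate` is near five spikes per second and `spike_duration_ms` is long; the exact cutoffs are not stated in this section and are therefore left out of the sketch.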

Behavioral procedure

The behavioral procedure was controlled by a QNX-based real-time experimentation data acquisition system (REX, LSR/NEI/NIH). The monkey sat in a primate chair, facing a frontoparallel screen in a sound-attenuated and electrically shielded room. Visual stimuli generated by an active matrix liquid crystal display projector (PJ550, ViewSonic) were rear-projected on the screen. We created the visual stimuli using fractal geometry (Yamamoto et al., 2012). Their sizes were ~8° × 8°.

The behavioral procedure consisted of two phases: learning (object-value learning task) and testing (passive viewing task for neuronal testing, free viewing procedure for behavioral testing). Importantly, the learning was guided by reward (i.e., water), but the testing was done with no reward outcome. Details are explained below.

Object-value learning task

In this task, monkeys viewed visual objects repeatedly in association with consistent reward outcomes and thus learned their stable values (Figure 1C) (Yamamoto et al., 2013). In each session of this and the following tasks, a set of eight computer-generated fractals was used as visual objects. While the monkey was fixating on a central white dot, one of the objects was presented at the neuron’s preferred position (ipsilateral or contralateral position, 15 deg from center). The center fixation spot turned off 400 ms later, and the monkey was required to make a saccade to the object. Half of the objects were always associated with a liquid reward (high-valued objects), whereas the other half were associated with no reward (low-valued objects). A tone was presented with either outcome. One training session consisted of 112 trials (14 trials for each object). Each set was learned in one learning session per day. The same sets of objects were learned repeatedly with the same object-value associations across days, while new sets of fractals were also introduced for learning across days. When the neuronal recording and behavioral experiments started, there were many objects (40–440 for monkey PK, 608–840 for monkey DW) whose levels of learning varied (from 0 days to 1053 days), which allowed us to examine how neuronal and behavioral responses changed during long-term learning. When this learning task was used while a presumed DA neuron was being recorded, the fractal was mostly presented at a contralateral position (15 deg from center) because these neurons often showed contralateral spatial selectivity (Figure 4B).

Passive viewing task

This task was used to examine how a presumed DA neuron responded to the value-learned objects, but now without any contingent reward outcome (Figure 2B) (Kim and Hikosaka, 2013; Yamamoto et al., 2013; Yasuda et al., 2012). While the monkey was fixating on a central white dot, some of the fractals (n = 2–6) were chosen pseudorandomly and presented sequentially at a contralateral position (15 deg from center, presentation time: 400 ms, inter-object interval: 500–700 ms). Reward was delivered 300 ms after the last object was presented. The reward was thus not contingently associated with any object. Each object was presented at least six times in one session. The value-coding activity of DA neurons was tested before learning, immediately after the first learning session, and after long-term learning (more than four daily learning sessions) with a sufficient retention period (> 1 day after the last learning session). For each neuron, we used multiple sets of well-learned objects (2–4 sets, or 16–32 objects) to test its stable value-coding during retention.

Reversal task

To examine value-updating activity, we used a task in which the object-value contingency was reversed in every block of 20–35 trials (Kim and Hikosaka, 2013; Yamamoto et al., 2013; Yasuda et al., 2012). The procedure in each trial was the same as in the object-value learning task (above). Unlike the object-value learning task, the same two fractal objects (2 pairs for monkey PK and 3 pairs for monkey DW) were used as the saccade target. In each trial, one of them was presented pseudorandomly at a right or left position (15 deg from center). In a block of 20–35 trials, one of the objects was associated with a reward and the other with no reward. In the next block, the object-reward contingency was reversed. At least four blocks were included in one experiment.
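The block structure of the reversal task can be sketched as a trial schedule. This is a hypothetical illustration rather than the REX task code: the function name, the dictionary layout, the uniform draws for block length, object, and side, and the choice of which object is rewarded first are all assumptions. The block length of 20–35 trials, the two objects, the left/right positions, and the reversal after each block come from the text.

```python
import random

def make_reversal_schedule(n_blocks=4, min_len=20, max_len=35, seed=0):
    """Build a list of trials in which the rewarded object flips after every block."""
    rng = random.Random(seed)
    schedule = []
    rewarded = 0  # assumption: object 0 is the rewarded object in the first block
    for block in range(n_blocks):
        # Assumption: block length is drawn uniformly from the stated range.
        for _ in range(rng.randint(min_len, max_len)):
            obj = rng.choice((0, 1))
            side = rng.choice(("left", "right"))
            schedule.append(
                {"block": block, "object": obj, "side": side, "reward": obj == rewarded}
            )
        rewarded = 1 - rewarded  # reverse the object-reward contingency
    return schedule
```

Because the same object is rewarded in one block and unrewarded in the next, responses that track the `reward` field within each block reflect recently updated values rather than long-term values.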

Free viewing procedure

This task was used to examine how the monkey responded to the value-learned objects, but without any reward outcome (Kim and Hikosaka, 2013; Yamamoto et al., 2013; Yasuda et al., 2012). After the monkey fixated on a central white dot for 300 ms, four objects were chosen pseudorandomly and presented simultaneously in four symmetric positions (15 deg from center) (Figure 1E). The monkey was free to look at them for 2 seconds without any reward outcome. After a blank period (500 ms), another four objects were presented. On half of the trials, a white dot was presented at one of eight positions. If the monkey made a saccade to it, a liquid reward was delivered. Each object was presented at least 16 times in one session.

Neuronal spatial preference

To test the spatial preference of presumed DA neurons, we presented fractal objects at either a left or a right position (15 deg from center) in a saccade task. The task was similar to the object-value learning task, but only two familiar fractals were used, and the object-value association was reversed after each block of 20–35 trials.

Identifying the CDt-projecting neurons by antidromic activation

To test whether a presumed DA neuron projected to CDt, we inserted two electrodes into CDt and SNc through the lateral and posterior chambers, respectively (Figure 5A). First, to determine the stimulation site, we lowered the CDt electrode until we found neurons that had the typical electrophysiological properties of striatal output neurons (Figure S6B) (Hikosaka et al., 1989b). If we found that the neurons responded to fractal objects with stable value-coding (Kim and Hikosaka, 2013; Yamamoto et al., 2013), we fixed the position of the electrode for stimulation. We then lowered another electrode to SNc while stimulating CDt, until we found spikes that were evoked with a fixed latency. The antidromic nature of the spikes was confirmed by a collision test. A biphasic negative-positive pulse (cathodal followed by anodal) was used for the stimulation, with a duration of 0.4 ms per phase. The current for the cathodal pulse ranged from 100 μA to 1000 μA (the anodal pulse was lower). The CDt stimulation site, measured from the anterior commissure, was 8.5 mm posterior for monkey PK and 13 mm posterior for monkey DW.

Testing the influence of CDt on SNc neurons by orthodromic activation

We noticed that CDt stimulation induced changes in activity in some of the presumed DA neurons, but with no fixed latencies. The data were collected as the orthodromic effects on the SNc neurons (Figure 5D–F).

Data analysis

To assess the neuronal discrimination, we first measured the magnitude of the neuron’s response to each fractal object and reward outcome by counting the number of spikes within a test window in individual trials. The test window was set to 0–400 ms after the onset of the object and after the onset of the reward outcome in both the object-value learning and passive viewing tasks. The neuronal discrimination was defined as the area under the receiver operating characteristic (ROC) curve based on the response magnitudes of the neurons to high-valued objects versus low-valued objects, reward versus no reward, or before reward versus after reward. The statistical significance of the neuronal discrimination and its changes was tested using the Wilcoxon rank-sum test. To assess the behavioral discrimination, we used several measures. For the object-value learning task, we computed the target acquisition time, measured as the time from the disappearance of the fixation dot until the gaze reached the object. For the free viewing procedure, we measured the saccade-choice rate and the gazing duration rate. The saccade-choice rate was defined as (nSACh − nSACl)/(nSACh + nSACl), where nSACh and nSACl are the numbers of saccades toward high-valued and low-valued objects, respectively. The gazing duration rate was defined as (tGAZh − tGAZl)/(tGAZh + tGAZl), where tGAZh and tGAZl are the durations of gaze on high-valued and low-valued objects, respectively.
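The discrimination measures described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' analysis code: the function names are hypothetical, and the ROC area is computed through its standard equivalence with the probability that a randomly drawn high-valued response exceeds a randomly drawn low-valued response (ties counted as one half). The saccade-choice rate and the gazing duration rate share the same normalized-difference form, so a single helper covers both.

```python
import numpy as np

def roc_area(high_counts, low_counts):
    """Area under the ROC curve for spike counts on high- vs low-valued trials.

    Equals P(high > low) + 0.5 * P(high == low) over all trial pairs;
    0.5 means no discrimination, 1.0 means perfect preference for high value.
    """
    high = np.asarray(high_counts, dtype=float)[:, None]
    low = np.asarray(low_counts, dtype=float)[None, :]
    return float(np.mean((high > low) + 0.5 * (high == low)))

def normalized_difference(high, low):
    """(high - low) / (high + low), used for saccade counts or gaze durations."""
    total = high + low
    if total == 0:
        raise ValueError("no saccades or gaze time recorded for either value")
    return (high - low) / total
```

For example, 30 saccades to high-valued objects and 10 to low-valued objects give a saccade-choice rate of 0.5. Both behavioral rates range from −1 (only low-valued objects chosen) to +1 (only high-valued objects chosen).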

Anatomical procedures

Electric marking lesion, retrograde tracer injection, histology and immunohistochemistry are described in the Supplemental Experimental Procedures.

Acknowledgments

We thank M. Yasuda for discussions and D. Parker, I. Bunea, M.K. Smith, G. Tansey, A.M. Nichols, T.W. Ruffner, J.W. McClurkin, and A.V. Hays for technical assistance. This research was supported by the Intramural Research Program at the National Institutes of Health, National Eye Institute.

Footnotes

Author contributions

A.G. and O.H. did pilot experiments. H.F.K. and O.H. designed the main part of the experiments. H.F.K. collected and analyzed data. H.F.K., O.H. and A.G. wrote the paper.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  1. Balleine BW, Dickinson A. Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology. 1998;37:407–419. doi: 10.1016/s0028-3908(98)00033-1.
  2. Balleine BW, O’Doherty JP. Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology. 2010;35:48–69. doi: 10.1038/npp.2009.131.
  3. Bayley PJ, Frascino JC, Squire LR. Robust habit learning in the absence of awareness and independent of the medial temporal lobe. Nature. 2005;436:550–553. doi: 10.1038/nature03857.
  4. Brischoux F, Chakraborty S, Brierley DI, Ungless MA. Phasic excitation of dopamine neurons in ventral VTA by noxious stimuli. Proc Natl Acad Sci. 2009;106:4894–4899. doi: 10.1073/pnas.0811507106.
  5. Cohen JY, Haesler S, Vong L, Lowell BB, Uchida N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature. 2012;482:85–88. doi: 10.1038/nature10754.
  6. Comoli E, Coizet V, Boyes J, Bolam JP, Canteras NS, Quirk RH, Overton PG, Redgrave P. A direct projection from superior colliculus to substantia nigra for detecting salient visual events. Nat Neurosci. 2003;6:974–980. doi: 10.1038/nn1113.
  7. D’Ardenne K, McClure SM, Nystrom LE, Cohen JD. BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science. 2008;319:1264–1267. doi: 10.1126/science.1150605.
  8. Deniau JM, Kitai ST, Donoghue JP, Grofova I. Neuronal interactions in the substantia nigra pars reticulata through axon collaterals of the projection neurons. An electrophysiological and morphological study. Exp Brain Res. 1982;47:105–113. doi: 10.1007/BF00235891.
  9. Fernandez-Ruiz J, Wang J, Aigner TG, Mishkin M. Visual habit formation in monkeys with neurotoxic lesions of the ventrocaudal neostriatum. Proc Natl Acad Sci U S A. 2001;98:4196–4201. doi: 10.1073/pnas.061022098.
  10. Gerfen CR. The neostriatal mosaic: multiple levels of compartmental organization. Trends Neurosci. 1992;15:133–139. doi: 10.1016/0166-2236(92)90355-c.
  11. Goldberg ME, Wurtz RH. Activity of superior colliculus in behaving monkey. I Visual receptive fields of single neurons. J Neurophysiol. 1972;35:542–559. doi: 10.1152/jn.1972.35.4.542.
  12. Goldstein RZ, Craig ADB, Bechara A, Garavan H, Childress AR, Paulus MP, Volkow ND. The neurocircuitry of impaired insight in drug addiction. Trends Cogn Sci. 2009;13:372–380. doi: 10.1016/j.tics.2009.06.004.
  13. Goto S, Hirano A, Matsumoto S. Subdivisional involvement of nigrostriatal loop in idiopathic Parkinson’s disease and striatonigral degeneration. Ann Neurol. 1989;26:766–770. doi: 10.1002/ana.410260613.
  14. Grace AA, Bunney BS. Intracellular and extracellular electrophysiology of nigral dopaminergic neurons--1. Identification and characterization. Neuroscience. 1983;10:301–315. doi: 10.1016/0306-4522(83)90135-5.
  15. Graybiel AM. Habits, rituals, and the evaluative brain. Annu Rev Neurosci. 2008;31:359–387. doi: 10.1146/annurev.neuro.29.051605.112851.
  16. Hikosaka O, Sakamoto M, Usui S. Functional properties of monkey caudate neurons. III Activities related to expectation of target and reward. J Neurophysiol. 1989a;61:814–832. doi: 10.1152/jn.1989.61.4.814.
  17. Hikosaka O, Sakamoto M, Usui S. Functional properties of monkey caudate neurons. I Activities related to saccadic eye movements. J Neurophysiol. 1989b;61:780–798. doi: 10.1152/jn.1989.61.4.780.
  18. Hikosaka O, Nakahara H, Rand MK, Sakai K, Lu X, Nakamura K, Miyachi S, Doya K. Parallel neural networks for learning sequential procedures. Trends Neurosci. 1999;22:464–471. doi: 10.1016/s0166-2236(99)01439-3.
  19. Hikosaka O, Yamamoto S, Yasuda M, Kim HF. Why skill matters. Trends Cogn Sci. 2013;17:434–441. doi: 10.1016/j.tics.2013.07.001.
  20. Ikeda T, Hikosaka O. Reward-dependent gain and bias of visual responses in primate superior colliculus. Neuron. 2003;39:693–700. doi: 10.1016/s0896-6273(03)00464-1.
  21. Ikeda T, Hikosaka O. Positive and negative modulation of motor response in primate superior colliculus by reward expectation. J Neurophysiol. 2007;98:3163–3170. doi: 10.1152/jn.00975.2007.
  22. Kim HF, Hikosaka O. Distinct Basal Ganglia circuits controlling behaviors guided by flexible and stable values. Neuron. 2013;79:1001–1010. doi: 10.1016/j.neuron.2013.06.044.
  23. Kim HF, Hikosaka O. Parallel basal ganglia circuits for voluntary and automatic behaviour to reach rewards. Brain. 2015;138:1776–1800. doi: 10.1093/brain/awv134.
  24. Kim HF, Ghazizadeh A, Hikosaka O. Separate groups of dopamine neurons innervate caudate head and tail encoding flexible and stable value memories. Front Neuroanat. 2014;8:120. doi: 10.3389/fnana.2014.00120.
  25. Lehericy S, Benali H, Van de Moortele PF, Pelegrini-Issac M, Waechter T, Ugurbil K, Doyon J. Distinct basal ganglia territories are engaged in early and advanced motor sequence learning. Proc Natl Acad Sci. 2005;102:12566–12571. doi: 10.1073/pnas.0502762102.
  26. Lerner TN, Shilyansky C, Davidson TJ, Evans KE, Beier KT, Zalocusky KA, Crow AK, Malenka RC, Luo L, Tomer R, et al. Intact-Brain Analyses Reveal Distinct Information Carried by SNc Dopamine Subcircuits. Cell. 2015;162:635–647. doi: 10.1016/j.cell.2015.07.014.
  27. Matsumoto M, Hikosaka O. Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature. 2009;459:837–841. doi: 10.1038/nature08028.
  28. Moody TD, Chang GY, Vanek ZF, Knowlton BJ. Concurrent discrimination learning in Parkinson’s disease. Behav Neurosci. 2010;124:1–8. doi: 10.1037/a0018414.
  29. Redgrave P, Rodriguez M, Smith Y, Rodriguez-Oroz MC, Lehericy S, Bergman H, Agid Y, DeLong MR, Obeso JA. Goal-directed and habitual control in the basal ganglia: implications for Parkinson’s disease. Nat Rev Neurosci. 2010;11:760–772. doi: 10.1038/nrn2915.
  30. Richfield EK, Young AB, Penney JB. Comparative distribution of dopamine D-1 and D-2 receptors in the basal ganglia of turtles, pigeons, rats, cats, and monkeys. J Comp Neurol. 1987;262:446–463. doi: 10.1002/cne.902620308.
  31. Schultz W. Responses of midbrain dopamine neurons to behavioral trigger stimuli in the monkey. J Neurophysiol. 1986;56:1439–1461. doi: 10.1152/jn.1986.56.5.1439.
  32. Schultz W. Predictive reward signal of dopamine neurons. J Neurophysiol. 1998;80:1–27. doi: 10.1152/jn.1998.80.1.1.
  33. Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275:1593–1599. doi: 10.1126/science.275.5306.1593.
  34. Seger CA, Spiering BJ. A critical review of habit learning and the Basal Ganglia. Front Syst Neurosci. 2011;5:66. doi: 10.3389/fnsys.2011.00066.
  35. Surmeier D, Ding J, Day M, Wang Z, Shen W. D1 and D2 dopamine-receptor modulation of striatal glutamatergic signaling in striatal medium spiny neurons. Trends Neurosci. 2007;30:228–235. doi: 10.1016/j.tins.2007.03.008.
  36. Sutton RS. Learning to predict by the methods of temporal differences. Mach Learn. 1988;3:9–44.
  37. Taylor JR, Robbins TW. 6-Hydroxydopamine lesions of the nucleus accumbens, but not of the caudate nucleus, attenuate enhanced responding with reward-related stimuli produced by intra-accumbens d-amphetamine. Psychopharmacology (Berl). 1986;90:390–397. doi: 10.1007/BF00179197.
  38. Tepper JM, Martin LP, Anderson DR. GABAA receptor-mediated inhibition of rat substantia nigra dopaminergic neurons by pars reticulata projection neurons. J Neurosci. 1995;15:3092–3103. doi: 10.1523/JNEUROSCI.15-04-03092.1995.
  39. Tobler PN, Fiorillo CD, Schultz W. Adaptive coding of reward value by dopamine neurons. Science. 2005;307:1642–1645. doi: 10.1126/science.1105370.
  40. West AR, Grace AA. Opposite influences of endogenous dopamine D1 and D2 receptor activation on activity states and electrophysiological properties of striatal neurons: studies combining in vivo intracellular recordings and reverse microdialysis. J Neurosci. 2002;22:294–304. doi: 10.1523/JNEUROSCI.22-01-00294.2002.
  41. Yamamoto S, Monosov IE, Yasuda M, Hikosaka O. What and where information in the caudate tail guides saccades to visual objects. J Neurosci. 2012;32:11005–11016. doi: 10.1523/JNEUROSCI.0828-12.2012.
  42. Yamamoto S, Kim HF, Hikosaka O. Reward value-contingent changes of visual responses in the primate caudate tail associated with a visuomotor skill. J Neurosci. 2013;33:11227–11238. doi: 10.1523/JNEUROSCI.0318-13.2013.
  43. Yasuda M, Hikosaka O. Functional territories in primate substantia nigra pars reticulata separately signaling stable and flexible values. J Neurophysiol. 2015;113:1681–1696. doi: 10.1152/jn.00674.2014.
  44. Yasuda M, Yamamoto S, Hikosaka O. Robust Representation of Stable Object Values in the Oculomotor Basal Ganglia. J Neurosci. 2012;32:16917–16932. doi: 10.1523/JNEUROSCI.3438-12.2012.
  45. Yin HH, Knowlton BJ. The role of the basal ganglia in habit formation. Nat Rev Neurosci. 2006;7:464–476. doi: 10.1038/nrn1919.
