Abstract
A goal-directed action aiming at an incentive outcome, if repeated, becomes a skill that may be initiated automatically. We now report that the tail of the caudate nucleus (CDt) may serve to control a visuomotor skill. Monkeys looked at many fractal objects, half of which were always associated with a large reward (high-valued objects) and the other half with a small reward (low-valued objects). After several daily sessions, they developed a gaze bias, looking at high-valued objects even when no reward was associated. CDt neurons developed a response bias, typically showing stronger responses to high-valued objects. In contrast, their responses showed no change when object values were reversed frequently, although monkeys showed a strong gaze bias, looking at high-valued objects in a goal-directed manner. The biased activity of CDt neurons may be transmitted to the oculomotor region so that animals can choose high-valued objects automatically based on stable reward experiences.
Introduction
A transition from a goal-directed action to a skill occurs in everyday life. Suppose that you are in front of a vending machine where you find several new kinds of drinks. You try one of them. If you like it, you start choosing it more often and eventually choose it right away without thinking much. Such a habitual choice may remain desirable because the taste (or value) of the drink will remain stable. Therefore, it can be called a “skill,” which would be defined as a well-adjusted and acquired performance, depending on motor behavior (Adams, 1987).
What then is the neural mechanism underlying the skillful choice of visual objects? There are several requirements. First, the skillful choice mechanism must receive detailed information on many objects, most likely as visual information. Otherwise, different kinds of drinks may not be discriminated. Second, it needs to encode the spatial positions of the objects. Otherwise, a desirable object cannot be chosen out of many objects. Third, it must have easy access to motor outputs, because a choice requires body movements (e.g., looking and reaching). Finally, the skill mechanism must encode the stable values of many objects.
The tail of the caudate nucleus (CDt) satisfies the first three criteria. The CDt is a morphologically distinct subregion of the caudate nucleus. The CDt is prominently developed in the primates (Paxinos and Watson, 2007; Saleem and Logothetis, 2007), which heavily rely on visual information (Orban et al., 2004). First, many CDt neurons respond to complex visual objects and do so in an object-selective manner (Caan et al., 1984; Brown et al., 1995; Yamamoto et al., 2012). Second, CDt neurons showed strong spatial selectivity (Yamamoto et al., 2012). This is critically different from neurons in the inferotemporal cortex (Gross et al., 1972), which is thought to provide the CDt with visual object information (Yeterian and Van Hoesen, 1978; Van Hoesen et al., 1981; Saint-Cyr et al., 1990; Webster et al., 1995). Third, electrical stimulation in the CDt readily induced spatially selective saccades (Yamamoto et al., 2012). Finally, however, there is no evidence that CDt neurons encode the values of visual objects.
A role of the CDt in visual choice learning was proposed by Mishkin et al. (1984). Based on lesion and anatomical studies, they proposed that the connection from the inferior temporal cortex to the CDt mediates visual choice learning (“visual habit” in their words). Indeed, the monkey's performance on a concurrent discrimination task was impaired by the lesions of the CDt (Fernandez-Ruiz et al., 2001).
However, Brown et al. (1995) found no evidence that the responses of CDt neurons to visual objects were influenced by the reward values associated with the objects. In their experiment, the values of objects were created flexibly through short-term object–reward experiences. In contrast, the skillful choice may require the stable values of objects that could be acquired only through long-term object–reward experiences. Indeed, we found that CDt neurons encode the stable, but not flexible, values of visual objects.
Materials and Methods
We used three male rhesus monkeys (Macaca mulatta): monkeys Z, D, and T. With each monkey under general anesthesia, we implanted a head holder, a chamber for unit recording, and eye coils. All animal care and experimental procedures were approved by the National Eye Institute Animal Care and Use Committee and complied with the Public Health Service Policy on the humane care and use of laboratory animals.
Behavioral tasks
Behavioral tasks were controlled by a custom real-time experimentation data acquisition system (REX; Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health) (Hays et al., 1982). Three monkeys (Z, D, and T) participated in the experiments. The monkeys sat in a primate chair and faced a front screen on which visual stimuli were presented.
Fractals used as visual objects.
We created visual stimuli using fractal geometry (Miyashita et al., 1991). One fractal was composed of four point-symmetrical polygons that were overlaid around a common center such that smaller polygons were positioned more toward the front. The parameters that determined each polygon (size, edges, color, etc.) were chosen randomly. Its size was ∼8° × 8°. Because it was unlikely that the monkey had seen any of the fractal objects before the experiment, we could control the level of object–reward association. Furthermore, we could generate an infinite number of novel fractal objects. These features allowed us to repeat object–reward association learning and at the same time test the short-term and long-term effects of learning.
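The layered structure described above can be illustrated with a minimal sketch. It is a simplification and not the published generation algorithm: the vertex counts, radius ranges, color model, and the names `make_polygon` and `make_fractal` are all assumptions made for illustration. It only reproduces the stated properties (four point-symmetrical polygons around a common center, smaller ones in front, random parameters, overall size of about 8°).

```python
import math
import random

def make_polygon(rng, max_radius):
    """Return the vertices of one point-symmetrical polygon centred on (0, 0)."""
    half_edges = rng.randint(2, 6)  # vertices in one half-turn (assumed range)
    radii = [rng.uniform(0.4, 1.0) * max_radius for _ in range(half_edges)]
    angles = sorted(rng.uniform(0.0, math.pi) for _ in range(half_edges))
    half = [(r * math.cos(a), r * math.sin(a)) for r, a in zip(radii, angles)]
    # Point symmetry: every vertex (x, y) is paired with (-x, -y).
    return half + [(-x, -y) for x, y in half]

def make_fractal(seed, n_polygons=4, size_deg=8.0):
    """Return polygon layers ordered back to front (largest first)."""
    rng = random.Random(seed)
    layers = []
    for i in range(n_polygons):
        # Each successive layer is smaller, so it ends up in front when drawn.
        max_radius = (size_deg / 2.0) * (1.0 - i / (n_polygons + 1))
        color = tuple(rng.random() for _ in range(3))  # random RGB (assumed)
        layers.append({"vertices": make_polygon(rng, max_radius),
                       "color": color,
                       "max_radius": max_radius})
    return layers
```

Drawing the returned layers in list order places the smallest polygon on top; a different seed yields a different object, which is what makes an effectively unlimited supply of novel stimuli possible.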
Flexible object–value association procedure.
This procedure allowed us to examine the effects of short-term object–reward association on saccadic behavior and CDt neuronal activity. The learning was short-term because the values of visual objects were reversed in blocks of trials. Thus, learning (of object values) and testing (of the monkey's behavior and of the activity of the CDt neuron) were done in one task procedure (object-directed saccade task), as illustrated in Figure 2, A and B. For each monkey a fixed set of two fractal objects (say, A and B) was used as the saccade target. The coding of flexible values by the CDt neuron was assessed by comparing its responses to the same object between two conditions: when the object was associated with a reward (high-valued) and when it was associated with no reward (low-valued).
Each trial started with the appearance of a central white spot on which the monkey had to fixate. After 700 ms, while the monkey was fixating on the central spot, one of the two fractal objects was chosen pseudorandomly and was presented at the preferred position of the neuron. In monkey D, the fractal object was also presented at the point-symmetrical position of the preferred position of the neuron to the central fixation point (FP). The fixation spot disappeared some time later (600 ms for monkey Z, 400 ms for monkey D, and 450 ms for monkey T), and then the monkey was required to make a saccade to the object within 1000 ms in monkeys Z and T or within 3000 ms in monkey D. If the gaze was held on the object for 600 ms for monkeys Z and T and for 300 ms for monkey D, an outcome was delivered. The outcome was a tone and a larger amount of liquid reward if the saccade was made to one object (e.g., A) and a tone alone or a smaller amount of liquid reward if the saccade was made to the other object (e.g., B). Some CDt neurons had preferred positions close to the center (i.e., parafoveal receptive fields) (Yamamoto et al., 2012). In this case, the target object was presented at the center, and the monkey was not required to make any saccade.
During a block of trials (36 trials in monkeys Z and T, 25–45 trials in monkey D), the object–reward contingency was fixed (e.g., A-reward/B-no reward), but it was reversed in a following block (e.g., B-reward/A-no reward) without any external cue. While a CDt neuron was being recorded, the two contingencies were alternated in blocks (their order counterbalanced across neurons). Most trials (89% in monkeys Z and T and 80% in monkey D) were forced trials: one of the two objects was presented, and the monkey had to make a saccade to it. The object was presented at the preferred position of the recorded neuron in monkeys Z and T and at the preferred position or its point-symmetrical position with respect to the fixation spot in monkey D. The rest of the trials (11% in monkeys Z and T and 20% in monkey D) were choice trials: two objects were presented simultaneously, one at the preferred position of the recorded neuron and the other at the point-symmetrical position with respect to the fixation spot. The monkey had to choose one of the objects by making a saccade to it. Then, the outcome associated with the chosen object (reward or no reward) was delivered. If the preferred position of the neuron was close to the center, two objects were presented at right and left with the eccentricity of 15°. The targets and their positions were counterbalanced. If the monkey failed to make a saccade correctly on either forced or choice trials, the same trial was repeated. In each recording session, these two types of block were repeated at least twice.
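The block structure of the flexible procedure can be sketched as a trial schedule. The sketch below uses the values given for monkeys Z and T (36 trials per block); treating the 11% of choice trials as exactly 4 of 36 is an assumption, as are the function names, the dictionary fields, and the random placement of choice trials within a block. Repetition of failed trials is omitted.

```python
import random

def make_block(rng, rewarded, n_trials=36, n_choice=4):
    """Build one block in which `rewarded` is the high-valued object."""
    kinds = ["choice"] * n_choice + ["forced"] * (n_trials - n_choice)
    rng.shuffle(kinds)
    trials = []
    for kind in kinds:
        # Forced trials show one object; choice trials show both.
        shown = ("A", "B") if kind == "choice" else (rng.choice("AB"),)
        trials.append({"kind": kind, "shown": shown, "rewarded": rewarded})
    return trials

def make_session(seed, n_blocks=4, first_rewarded="A"):
    """Alternate the object-reward contingency from block to block."""
    rng = random.Random(seed)
    session, rewarded = [], first_rewarded
    for _ in range(n_blocks):
        session.append(make_block(rng, rewarded))
        rewarded = "B" if rewarded == "A" else "A"  # uncued reversal
    return session
```

Because the schedule carries no cue about the reversal, the only way to track the current contingency is through the outcomes themselves, which is what makes the learning short-term.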
Stable object–value association procedure.
This procedure allowed us to examine the effects of long-term object–reward association on saccadic behavior and CDt neuronal activity. The learning was long-term because the values of visual objects were fixed across daily training sessions. Learning (of object values) and testing (of the monkey's behavior and of the activity of the CDt neuron) were done separately: (1) procedure for object–reward association learning (see Fig. 6); (2) procedure for testing saccadic behavior (see Fig. 7); and (3) procedure for testing CDt neuronal activity (see Fig. 9). This separation of the learning–testing procedures precluded possible influences of short-term reward effects. Also importantly, the testing procedure was done in a neutral condition: the monkey obtained no reward when learned objects were presented (in case of behavioral testing) or the monkey did obtain reward but not in association with particular objects (in case of neuronal testing). The “neutral” condition during testing was critical because, otherwise, any change in the monkey's behavior or neuronal activity could be derived from short-term reward experiences.
The goal of the learning procedure was to create a fixed bias among fractal objects in their reward values (i.e., high-valued and low-valued objects). For this purpose, we used an object-directed saccade task (see Fig. 6). In each session, a set of eight fractal objects was used as the target. On each trial, one of the fractal objects was chosen pseudorandomly as the target and was presented at one of five positions (right, up, left, bottom, and center). The monkey was required to make a saccade to the target (except when it was presented at the center). Importantly, half of the fractal objects were always associated with a larger amount of liquid reward (i.e., high-valued objects), whereas the other half were always associated with no reward or a smaller amount of liquid reward (i.e., low-valued objects). One learning session consisted of 160 trials (four trials for each object at each position). Each set of objects was trained in one learning session in 1 d. The same sets of fractals were used repeatedly for learning across days (or months), throughout which each object remained either a high-valued object or a low-valued object.
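The trial count of a learning session follows from the design: 8 objects × 5 positions × 4 repetitions = 160 trials. A minimal sketch of such a session is given below; the function name, the field names, the use of integer object labels, and the convention that the first four objects are the high-valued ones are assumptions for illustration.

```python
import random
from collections import Counter

POSITIONS = ["right", "up", "left", "bottom", "center"]

def make_learning_session(seed, n_objects=8, repeats=4):
    """Return a shuffled list of trials covering every object at every position."""
    rng = random.Random(seed)
    trials = [
        {
            "object": obj,
            "position": pos,
            # Assumed labelling: the first half of the set is high-valued.
            "value": "high" if obj < n_objects // 2 else "low",
            # No saccade is needed when the target appears at the center.
            "saccade_required": pos != "center",
        }
        for obj in range(n_objects)
        for pos in POSITIONS
        for _ in range(repeats)
    ]
    rng.shuffle(trials)
    return trials
```

The value assigned to an object depends only on its identity, not on the seed or the day, which corresponds to the stable object–value association.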
The goal of the testing procedure was to examine the long-term reward effects on saccadic behavior and CDt neuronal activity, while excluding any short-term reward effect. For testing saccadic behavior, we used a free-viewing task (see Fig. 7). For testing neuronal activity, we used a passive-viewing task (see Fig. 9). These procedures are explained below.
Free-viewing task.
On each trial, four of a set of eight fractal objects were chosen pseudorandomly and were presented simultaneously at four radially symmetrical positions (or at those positions tilted by 45°) with an eccentricity of 15° (see Fig. 7B). On each odd-numbered trial, four fractals were chosen randomly from a set of eight fractal objects. On the following even-numbered trial, the remaining four fractals were presented. Each fractal presentation lasted for 3000 ms in monkeys Z and T and 2000 ms in monkey D. The monkey was free to look at these objects (or something else) by making saccades between them, but no reward was given. After a blank period (500–700 ms), another four objects were presented. Occasionally, a small white dot was presented instead, at one of the four positions that had not been occupied by fractals in the preceding trial. If the monkey made a saccade to it and held the gaze on it for 600–800 ms in monkeys Z and T and for 300–600 ms in monkey D, a reward was delivered. Each object was presented ∼50 times in monkeys Z and T and 32 times in monkey D in one session.
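The odd/even pairing of free-viewing trials guarantees that every object in a set is shown once per pair of trials. A minimal sketch of that selection rule follows; the function name and the integer object labels are hypothetical, and the occasional white-dot trials are omitted.

```python
import random

def free_viewing_pairs(seed, n_pairs, objects=tuple(range(8))):
    """Return 2 * n_pairs trials; each trial lists the four objects shown."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_pairs):
        first = rng.sample(objects, 4)                   # odd-numbered trial
        second = [o for o in objects if o not in first]  # remaining four
        rng.shuffle(second)  # list index stands for screen position
        trials.append(first)
        trials.append(second)
    return trials
```

With this rule all eight objects are presented equally often, so differences in gaze duration cannot be explained by unequal exposure during testing.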
Passive-viewing task.
While the monkey was fixating on a central spot of light, two to six fractal objects (pseudorandomly chosen from a set of eight objects) were presented in the preferred location of the neuron in sequence (presentation start, 600–800 ms after monkey's fixation; presentation duration, 400 ms; interobject time interval, 500–700 ms) (see Fig. 9B). A liquid reward was delivered 500–700 ms after the presentation of the last fractal. Thus, the reward was not associated with particular objects. Each object was presented at least 15 times in monkeys Z and T and eight times in monkey D in one session.
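The timing of one passive-viewing trial can be sketched from the stated parameters. The sketch assumes that the interobject interval runs from the offset of one object to the onset of the next and that objects are not repeated within a trial; neither detail is stated explicitly, and the function and field names are hypothetical. Times are in seconds relative to fixation onset.

```python
import random

def passive_viewing_trial(rng, n_objects=8):
    """Return (events, reward_time) for one trial of two to six objects."""
    n_shown = rng.randint(2, 6)
    shown = rng.sample(range(n_objects), n_shown)  # no repeats (assumed)
    t = rng.uniform(0.600, 0.800)  # first onset after fixation
    events = []
    for obj in shown:
        events.append({"object": obj, "onset": t, "offset": t + 0.400})
        # Advance past this presentation plus a 500-700 ms gap.
        t += 0.400 + rng.uniform(0.500, 0.700)
    reward_time = t  # 500-700 ms after the last object disappears
    return events, reward_time
```

Because the number of objects varies from trial to trial, the reward time is predicted by the end of the sequence and not by the identity of any single object.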
Electrophysiology
Based on a stereotaxic atlas (Saleem and Logothetis, 2007), a rectangular (28 × 26 mm or 36 × 26 mm; anteroposterior × mediolateral) or circular (19 mm diameter) recording chamber was placed over the parietal cortex, tilted laterally by 25°, and aimed at the CDt. MR images (4.7 T; Bruker) were then obtained along the direction of the recording chamber, which was visualized by filling the grid holes and the inside of the chamber with gadolinium (Fig. 1).
Single-neuron recordings were performed using tungsten electrodes [0.25 mm diameter, 1–3 MΩ (FHC); 0.39 mm diameter, 1–3 MΩ (Alpha Omega)]. The recording site was determined using a grid system, which allowed electrode penetrations at every 1 mm. Based on the MR images and preceding recording data, we chose a grid hole to hold the stainless steel guide tube, through which the electrode was inserted and was advanced by an oil-driven micromanipulator (MO-97A; Narishige). Based on the grid hole position and the reading of the electrode depth, we estimated the three-dimensional position of the electrode.
The electrical signal from the electrode was amplified with a bandpass filter (200 Hz to 10 kHz; BAK) and collected at 1 kHz via a custom-made window discriminator. Single neurons were isolated online using the custom voltage–time window discrimination software (MEX; Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health).
To find visually responsive CDt neurons, we let the monkey continue to perform the passive-viewing task or the object-directed saccade task. Because CDt neurons fired spikes only occasionally, we could find and examine only those neurons that responded to these visual-saccade tasks. Thus, it is likely that other nonvisual neurons, if present, remained undetected or uncharacterized.
We recorded 73, 39, and 25 neurons (monkeys Z, D, and T, respectively) in a flexible object–reward association procedure and 102, 46, and 58 neurons (monkeys Z, D, and T, respectively) in the passive-viewing task in a stable object–reward association procedure. The recorded neurons did not contain tonically active neurons, which were characterized by their tonic firing pattern (Aosaki et al., 1995). Most recorded neurons were presumed to be medium-spiny projection neurons, although some may have been GABAergic interneurons (Hikosaka et al., 1989; Kimura et al., 1996).
Data analyses
Effects of object–value association learning on saccadic behavior.
To evaluate the effects of the flexible object–reward association learning, we measured two parameters: (1) the probability of choosing high-valued objects in the choice trials and (2) the reaction time (RT) in the forced trials. The RT was measured as the time from the offset of the FP to the onset of the saccade to the fractal object (Fig. 2A). Based on these parameters, we defined a choice index and an RT index as follows:
where Hchoice and Lchoice are the averaged choice probabilities of the high-valued and low-valued objects, respectively, and
where HRT and LRT are the averaged RTs for the high-valued and low-valued objects, respectively. To show the across-trial changes in the probability of choosing high-valued objects and the RT, we averaged these values for each trial in the flexible object–value association task (Figs. 2C,D, 3A,C).
To evaluate the behavioral effects of the stable object–reward association learning, we measured two parameters obtained in the free-viewing task: (1) gaze duration on each object and (2) the probability of saccades to each object. Based on these parameters, we defined a gaze index and a first saccade index as follows:
where Hgaze and Lgaze are the averaged gaze durations on the high-valued and low-valued objects, respectively, and
where HFS and LFS are the averaged probabilities that the first saccade targeted the high-valued and low-valued objects, respectively.
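The index equations themselves are not shown in this text. A normalized contrast between the high-valued and low-valued measures is a common way to build such indices, and the sketch below assumes that form. The function names, the scaling to the range −1 to 1, and the sign convention for the RT index are assumptions and should not be read as the authors' definitions.

```python
def contrast_index(high, low):
    """Assumed form: (high - low) / (high + low).

    +1 means only the high-valued measure is nonzero, 0 means no bias,
    and -1 means only the low-valued measure is nonzero.
    """
    total = high + low
    if total == 0:
        raise ValueError("index is undefined when both inputs are zero")
    return (high - low) / total

def rt_index(high_rt, low_rt):
    """Assumed sign: faster saccades to high-valued objects give a positive index."""
    return contrast_index(low_rt, high_rt)
```

Under these assumptions, `contrast_index` would apply to the choice probabilities, gaze durations, and first-saccade probabilities, whereas `rt_index` swaps the arguments so that a positive value always indicates a bias toward high-valued objects.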
Effects of object–value association learning on CDt neuronal responses.
To evaluate the neuronal discrimination, we measured the magnitude of the response of the CDt neuron to each fractal object by counting the numbers of spikes within a test window in individual trials. For flexible object–value learning, we defined the test window as the 400 ms after the target onset in the flexible object–value association task (the same duration as in the passive-viewing task). For stable object–value learning, we defined the test window as the whole object presentation period (400 ms) in the passive-viewing task (see Fig. 9B).
The value modulation index of CDt neurons was defined as the area under the receiver operating characteristic (ROC) based on the response magnitudes of the CDt neurons to high-valued objects versus low-valued objects. The value modulation index was calculated using individual responses in each trial for all objects tested. The statistical significance of the value modulation index was tested using Wilcoxon's rank-sum test based on the response magnitudes in individual trials.
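The area under the ROC curve equals the probability that a randomly drawn response to a high-valued object exceeds a randomly drawn response to a low-valued object, with ties counted as one half. A minimal sketch of the spike counting and of this pairwise formulation is given below; the function names are hypothetical, and the pairwise loop is one of several equivalent ways to obtain the area, not necessarily the one the authors used.

```python
def count_spikes(spike_times, onset, window=0.400):
    """Count spikes in [onset, onset + window); times are in seconds."""
    return sum(1 for t in spike_times if onset <= t < onset + window)

def roc_area(high, low):
    """Area under the ROC curve for per-trial responses to high vs low objects.

    0.5 means no value modulation, values above 0.5 mean stronger responses
    to high-valued objects, and values below 0.5 mean the opposite.
    """
    wins = 0.0
    for h in high:
        for l in low:
            if h > l:
                wins += 1.0
            elif h == l:
                wins += 0.5  # ties count as half
    return wins / (len(high) * len(low))
```

The numerator of `roc_area` is the Mann–Whitney U statistic, which is why the rank-sum test is a natural significance test for this index.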
Histology
In the later part of the experiments in monkey T, we made electrolytic microlesions at the recording and stimulation sites (12 μA and 30 s). We chose several sites for the microlesions along the anteroposterior axis of the CDt. For each site, we made two to three microlesions with different patterns of intervals, one of them usually inside the CDt. The animal was then deeply anesthetized with pentobarbital sodium and perfused with 4% paraformaldehyde. Frozen sections were cut every 50 μm in the coronal plane. The sections were stained with cresyl violet (Yamamoto et al., 2012, their Fig. 1C,D).
Results
Our previous study (Yamamoto et al., 2012) suggested that the CDt has a mechanism that guides gaze to particular objects in particular positions. However, it was still unknown how this mechanism is used. We hypothesized that the CDt mechanism is trained by past experience so that gaze is directed to more valuable objects. A prominent outcome that determines the value of an object is reward. Therefore, we presented visual objects followed by different amounts of reward and then examined the responses of single CDt neurons to the objects. Past experiences can be short-term or long-term. In some cases, short-term experiences are more important than long-term experiences; in other cases, long-term experiences are more important. Therefore, we devised two paradigms so that short-term experiences and long-term experiences can be tested separately. Using these paradigms, we recorded spike activity of single neurons in a wide area of the CDt in three monkeys. The CDt was identified as an elongated structure located along and above the inferior horn of the lateral ventricle, as visualized on MR images (Fig. 1) and later confirmed histologically.
Effects of short-term reward experiences
To examine the effects of short-term reward experiences, we devised a flexible object–value association task (Fig. 2A,B) in which the object–reward contingency was reversed in a blockwise manner. In a first block of trials (Fig. 2B, top), one object (say, A) was associated with a reward and the other (B) was associated with no reward. In a second block (Fig. 2B, bottom), the relationship was reversed. There were two trial types: (1) one object was presented (forced trials) or (2) two objects were presented (choice trials).
The monkey's saccadic behavior changed quickly each time the object–reward contingency was reversed. Examples are shown in Figure 2, C and D, for monkey T. On choice trials (Fig. 2C), the monkey chose an object that had been associated with a reward. As soon as the object–reward contingency was reversed, the monkey's choice was reversed. In other words, the monkey's behavior changed flexibly based on short-term reward experiences, choosing the object that had been associated with a reward (hereafter called “high-valued object”) and avoiding the object associated with no reward (“low-valued object”).
The monkey's flexible preference is also evident on forced trials (Fig. 2D). The saccade RT (measured from the offset of the FP) was shorter and less variable for the saccades directed to the high-valued object than those directed to the low-valued object. The RT bias was reversed quickly after the object–reward contingency was reversed.
All three monkeys showed similarly flexible changes in saccadic behavior (Fig. 3). Notably, however, there were individual differences. Monkeys T and D showed quicker and stronger choice biases than monkey Z (Fig. 3A,B). Monkey T showed a larger RT bias than monkey D or Z (Fig. 3C,D). Overall, the bias based on short-term reward experiences was strongest in monkey T and weakest in monkey Z.
If CDt neurons contribute to the flexible changes in saccadic behavior, they should show similarly flexible changes in their responses to the visual objects. This was not what we found. Figure 4 shows an example obtained in monkey Z. This CDt neuron responded to the two objects with different magnitudes, confirming object feature selectivity of CDt neurons (Caan et al., 1984; Brown et al., 1995; Yamamoto et al., 2012). However, the response of the neuron to each object showed little change, regardless of whether the object was high-valued (i.e., recently associated with a reward) or low-valued (i.e., recently associated with no reward) (Fig. 4B, top vs bottom).
Overall, CDt neurons were not affected by short-term reward experiences (Fig. 5). The averaged visual responses showed no difference between high-valued and low-valued objects (Fig. 5A, red and blue curves). To evaluate the value-dependent bias for each neuron, we calculated an ROC area and defined it as a value modulation index. An ROC area larger (or smaller) than 0.5 indicates that the neuron responded more strongly to high- than low-valued (or low- than high-valued) objects. The ROC areas were distributed around 0.5 for all three monkeys (mean, 0.504, 0.499, and 0.478 for monkeys Z, D, and T, respectively) and were not significantly different from 0.5 (t test, p = 0.51, 0.95, and 0.39 for monkeys Z, D, and T, respectively). Neurons whose ROC areas were significantly different from 0.5 are shown in black in Figure 5B (Wilcoxon's rank-sum test, n = 5 of 73, 3 of 39, and 10 of 25 for monkeys Z, D, and T, respectively). Their number was not significantly larger than that expected by chance in monkeys Z and D but was larger in monkey T (binomial test, p = 0.12, 0.22, and 1.5 × 10^−15, respectively). The ROC areas overall did not deviate significantly from the level of no modulation (i.e., 0.5) in monkeys Z and D but did deviate significantly in monkey T (permutation test, p = 0.44, 0.97, and 0.028, respectively).
These results suggest that CDt neurons play little role in the flexible changes of saccadic behavior based on short-term reward experiences. This conclusion is supported by the comparison between saccadic behavior and CDt neuronal activity across the three monkeys. Significant saccade biases based on short-term reward experiences were observed in all three monkeys (Fig. 3), but there was no significant bias in CDt neurons as a population in any of the monkeys (Fig. 5). However, it is possible that the bidirectional neuronal biases in monkey T contributed to its stronger behavioral biases.
Effects of long-term reward experiences
Alternatively, the CDt may be involved in the adaptation of saccadic behavior based on long-term reward experiences. To test this hypothesis, the monkey needs to experience reward-associated visual objects for a long time, only after which its saccadic behavior and CDt neuronal activity can be tested. Specifically, this stable object–value association procedure consisted of the following: (1) object–reward association learning (Fig. 6) and (2) testing of saccadic behavior (Fig. 7) and CDt neuronal activity (see Fig. 9). These subprocedures were done on separate days so that any effect of recently updated object values (i.e., short-term reward experiences) was excluded (see Materials and Methods).
For object–reward association learning, the monkeys were trained with many sets of fractal objects with different amounts of reward by using a saccade task (Fig. 6). Among a set of eight fractal objects, four were always associated with a reward (high-valued objects); the other four were always associated with no (or small) reward (low-valued objects). Each object set was trained in one session in 1 d. Other object sets were added for learning on subsequent days. Thus, the monkey learned multiple object sets on most experimental days, with each set being learned on consecutive days.
To test the effects of long-term reward experiences on saccadic behavior, we examined the monkey's saccadic eye movements in a free-viewing condition while learned objects were presented (Fig. 7). On each trial, four objects were chosen randomly from a set of eight objects and were presented simultaneously (Fig. 7B). The monkey was free to look at them for several seconds (3 s for monkeys Z and T and 2 s for monkey D). Note that the monkey obtained no reward by looking at any of the learned objects. This was critical because, otherwise, the value of each object would be updated or modified based on short-term reward experiences.
In the example shown in Figure 7, we examined free-viewing saccades of monkey Z to a set of eight objects before and after the long-term reward experiences (training). Before the training, the monkey looked at these objects with different durations (Fig. 7C). After 9 d of training (one session per day), the gaze durations changed dramatically (Fig. 7D): the monkey looked at high-valued objects (i.e., previously associated with a reward) longer than low-valued objects (i.e., previously associated with no reward). Some examples of eye-movement trajectories are shown in Figure 7B. The monkey first made a saccade to one of the high-valued objects and then made saccades to other high-valued objects sequentially (if there were any), while mostly avoiding low-valued objects.
To quantify the gaze bias, we calculated the index of gaze duration and the index of the first saccade and compared them between before and after training (≥3 d) for each set of objects (Fig. 8A,C). In monkey Z, both indices increased consistently: the monkey's gaze stayed longer on high-valued objects and the first saccade was more likely directed to a high-valued object. This tendency was weaker in monkey D and weakest in monkey T (Fig. 8B,D).
To test the effects of long-term reward experiences on CDt neuronal activity, we recorded the spike activity of single CDt neurons using a passive-viewing task (Fig. 9A,B). The fractal objects were chosen randomly and presented one at a time at the preferred position for the recorded neuron while the monkey was fixating on the FP. To evaluate only the effects of long-term reward experiences while avoiding any effect of short-term (i.e., flexible) object values, we used two specific procedures: (1) the neuronal testing was done at least 1 d after the last learning; and (2) a reward was given at the end of a trial but it was not associated with either high-valued or low-valued objects.
Figure 9C shows an example of the activity of a CDt neuron in monkey Z. The neuron responded to some of the fractal objects but not others. Importantly, the preference of the neuron was biased to high-valued objects: the two most preferred objects were high-valued objects. We examined the same neuron using four more object sets (total number of objects was 40). A Wilcoxon's rank-sum test showed that the responses of the neuron to high-valued objects were significantly larger than those to low-valued objects (p = 1.1 × 10^−15). The value modulation index, calculated as an ROC area, was 0.655.
Overall, CDt neurons responded to high-valued objects more strongly than to low-valued objects, as shown by the averaged activity for each monkey (Fig. 10A). The value modulation indices of individual CDt neurons were biased toward 1.0 (which would indicate the absolute preference for high-valued objects) in all monkeys (Fig. 10B). On average, the value modulation index was larger in monkey Z (0.57) than in monkey D (0.54) or monkey T (0.53). In all monkeys, the index was significantly larger than 0.5 (t test, p = 3.7 × 10^−11, 0.015, and 0.0029 for monkeys Z, D, and T, respectively). Many CDt neurons (n = 70 of 102, 33 of 46, and 33 of 58 for monkeys Z, D, and T, respectively) showed statistically significant biases in response between high- and low-valued objects (Wilcoxon's rank-sum test, p < 0.05; Fig. 10B, black). As a whole, CDt neurons encoded stable object values based on long-term reward experiences.
Figure 10A also shows the time courses of the stable value coding of CDt neurons. This is indicated as the difference in the responses to high- and low-valued objects (Fig. 10A, black curve). The stable value coding started at ∼125 ms, peaked at ∼200 ms, and gradually decreased. Because the visual responses started at ∼90 ms, the neuronal bias based on long-term reward experiences appeared ∼35 ms after the arrival of visual information.
The population data of the neuronal bias shown in Figure 10 might seem rather modest. However, this seemingly weak bias was partly attributable to the fact that many CDt neurons are highly object selective (Yamamoto et al., 2012), as shown graphically in Figure 11. Here, for each neuron, we plot the response magnitudes against the response rank separately for high- and low-valued objects. All of the neurons shown here appear to have value-dependent biases, but, except for one neuron (Fig. 11A), their value modulation indices were between 0.3 and 0.7. This was because the visual responses to lower-rank objects were often close to 0. This tendency (i.e., high object selectivity) was stronger among the CDt neurons shown in the right column (Fig. 11B,D,F). In most neurons, the difference in response between high- and low-valued objects was larger for higher-rank objects and smaller (sometimes absent) for lower-rank objects.
It is noteworthy that some neurons showed the opposite bias: larger responses to low-valued objects than to high-valued objects. A small percentage of neurons showed a value modulation index significantly smaller than 0.5 (Fig. 10B). The rank-ordered response plot for those neurons (Fig. 11G,H) showed the opposite response pattern to that of the other neurons (Fig. 11A–F).
The averaged rank-order plot obtained in monkey Z (Fig. 12) suggests that the response bias of CDt neurons based on stable values was multiplicative rather than additive. At each rank, we calculated the difference and ratio of the responses of CDt neurons to the high-valued and low-valued objects. The difference was larger for the higher-rank objects (p = 5.6 × 10⁻¹⁶, t test for a correlation coefficient), whereas the ratio was not different across ranks (p = 0.41, t test for a correlation coefficient).
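The logic of this test can be made concrete with a small sketch. A multiplicative bias (high = k × low) predicts a difference that varies with rank and a ratio that stays constant, whereas an additive bias (high = low + c) predicts a constant difference and a ratio that varies with rank. The Python code below is a minimal illustration with hypothetical response values; the function names are assumptions, rank 1 is taken to be the most effective object, and the p value of the correlation (the t test mentioned above) is not computed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_trends(high_by_rank, low_by_rank):
    """Correlate rank with the high-low difference and with the high/low ratio.

    Ranks with a non-positive low-valued response are skipped for the ratio,
    because the ratio is undefined there (an illustrative choice).
    """
    ranks = list(range(1, len(high_by_rank) + 1))
    diffs = [h - l for h, l in zip(high_by_rank, low_by_rank)]
    ratio_pairs = [(r, h / l) for r, h, l in zip(ranks, high_by_rank, low_by_rank) if l > 0]
    r_diff = pearson_r(ranks, diffs)
    r_ratio = pearson_r([r for r, _ in ratio_pairs], [q for _, q in ratio_pairs])
    return r_diff, r_ratio


# Hypothetical mean responses (spikes/s), rank 1 = most effective object.
low = [40.0, 30.0, 20.0, 10.0]
multiplicative_high = [1.5 * l for l in low]   # high = 1.5 x low
additive_high = [l + 10.0 for l in low]        # high = low + 10

print("multiplicative:", rank_trends(multiplicative_high, low))
print("additive:      ", rank_trends(additive_high, low))
```

In the multiplicative case only the difference is correlated with rank, which is the pattern described for the averaged data; in the additive case only the ratio is.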
After the long-term object–value association learning, the three monkeys showed different degrees of preference for high-valued objects in terms of their saccadic behavior (Fig. 8) as well as the CDt neuronal responses (Fig. 10). Figure 13 shows that the behavioral preference was correlated with the neuronal preference across the three monkeys. The data are consistent with the hypothesis that the CDt neuronal preference contributes to the animal's behavioral preference.
Discussion
Our results support the hypothesis that the CDt serves to control a visuomotor skill. There are several parallels between skill and CDt neuronal activity. A skill emerges after repeating a goal-directed action (Fitts, 1964; Anderson, 1982; Hikosaka et al., 1995); CDt neurons responded to visual objects differentially only after the monkey experienced each object many times in association with stable values. A skill is executed automatically (Logan, 1985; Ericsson and Lehmann, 1996; Ashby et al., 2010); the value-differential responses of CDt neurons occurred in a passive-viewing task in which the presented visual objects were likely ignored. A skill persists even after learning is stopped (Ammons et al., 1958; Adams, 1987; Hikosaka et al., 2002); the value-differential responses of CDt neurons remained robust after repeated sessions of the passive-viewing task in which each object presentation was not associated with a reward. A skill is not hindered by the capacity limitation of short-term or working memory (Shiffrin and Schneider, 1977; Ericsson and Lehmann, 1996); each monkey experienced many fractal objects (e.g., >400 in monkey Z), and yet CDt neurons overall showed clear value-differential responses. A skill is executed as a motor action (Newell, 1991); the value-differential signals of CDt neurons are likely sent to the superior colliculus (SC) to induce saccadic eye movements directed to high-valued objects. A skill is executed quickly (Newell and Rosenbloom, 1981; Hikosaka et al., 1995); the value-differential responses of CDt neurons started at ∼125 ms after the appearance of fractal objects, so that saccades are likely initiated within 150 ms.
The last two points require more explanation. Our first study on this project showed that electrical stimulation of CDt neurons induced saccadic eye movements, and its threshold was as low as that of the frontal eye fields (Bruce et al., 1985; Yamamoto et al., 2012). This oculomotor effect is likely mediated by a striatonigral pathway, more specifically the direct connection of the CDt to the substantia nigra pars reticulata (SNr) (Saint-Cyr et al., 1990). In our second study, we showed that SNr neurons responded to visual objects differentially based on long-term stable reward experiences, similarly to CDt neurons but with an opposite polarity (i.e., stronger inhibitions by more valued objects) (Yasuda et al., 2012). The comparison suggests that the inhibitions of SNr neurons are mediated by the direct pathway (i.e., direct inhibitory connection from the CDt to the SNr). The CDt–SNr effect is likely exerted on the saccade-generation mechanism in the SC, because a majority of SNr neurons exhibiting the value-differential visual responses were shown to project to the SC (Yasuda et al., 2012).
It should be noted that the activity of CDt neurons can explain the monkey's value-seeking behavior only partially. When the values of fractal objects changed frequently, the monkey adjusted its preference accordingly, choosing the objects that had recently been high valued. Such flexibility is crucial in a volatile condition in which object values change (Paton et al., 2006). This behavior would be characterized as “goal-directed behavior” because the choice occurs when a valued goal is predicted (Balleine and Dickinson, 1998). Importantly, CDt neurons showed only weak value-differential responses in the flexible value condition and therefore are unlikely to contribute to goal-directed behavior or deliberate object choice. Identifying which brain areas, including the head and body of the caudate nucleus, control deliberate object choice is an important issue for future studies.
Our conclusion supports the hypothesis by Mishkin et al. (1984) that visual habit is controlled by the connection from the temporal cortex to the CDt. In their paradigm, the monkey was presented with ∼20 fixed pairs of visual objects, one associated with a reward and the other with no reward, and learned to choose the reward-associated objects. The memory acquired through this task was distinct from episodic memory, because the learning was not impaired by lesions of the hippocampal region (Malamut et al., 1984). Instead, it was impaired by lesions of the CDt (Fernandez-Ruiz et al., 2001). Furthermore, this type of learning or memory may not be controlled consciously, because human amnesic patients with hippocampal lesions were able to learn this task, but they were not able to indicate verbally which object they chose (Cohen and Squire, 1980). These seminal findings and our new finding together suggest that the CDt is a key structure that controls automatic choices of visual objects based on long-term stable reward experiences.
There is one important issue that remains to be solved: where does the stable value-based plasticity occur? Because CDt neurons show value-differential responses, the plasticity must occur either at the synapses on CDt neurons or somewhere upstream. The CDt plasticity hypothesis would be favored by the literature on basal ganglia physiology. It is well known that corticostriatal synapses are susceptible to plasticity, particularly when there is a corelease of dopamine (Wickens et al., 2003). Because dopamine neurons are known to encode reward-related signals (Schultz, 1998), the plasticity at corticostriatal synapses would provide a perfect mechanism for CDt neurons to change their responses to visual objects based on past reward experiences. However, the CDt plasticity mechanism has never been examined experimentally. This is because the plasticity mechanism has been studied mainly in rodents (Yin and Knowlton, 2006), whose caudate nucleus has no tail (Franklin and Paxinos, 2007; Paxinos and Watson, 2007).
Thus, it is still possible that the plasticity occurs in areas upstream to the CDt and that CDt neurons simply receive their signals. An obvious area to test this hypothesis is the inferotemporal cortex, which encodes visual object information (Miyashita, 1993; Logothetis et al., 1995; Tanaka, 1996) and heavily projects to the CDt (Yeterian and Van Hoesen, 1978; Van Hoesen et al., 1981; Saint-Cyr et al., 1990; Webster et al., 1995). Neurons in the inferotemporal cortex do show plastic changes so that they can respond to particular visual objects regardless of their appearances (Li and DiCarlo, 2008, 2012), but this “tolerance” learning occurs in an unsupervised manner and does not require reward experiences (Li and DiCarlo, 2012). Some neurons in the inferotemporal cortex change their responses to visual objects depending on their stable reward values, but the changes are minor (Jagadeesh et al., 2001; Mogami and Tanaka, 2006). However, there have been few studies that tested this hypothesis, especially in relation to long-term reward experiences. This hypothesis remains to be tested.
We cannot exclude the possibility that the same type of plasticity occurs downstream to the CDt. In fact, SNr neurons (one synapse downstream to the CDt) show differential responses such that they categorize visual objects based on long-term stable reward experiences, largely disregarding their visual features (unlike CDt neurons) (Yasuda et al., 2012). Thus, it is possible that the stable value-based plasticity occurs at the CDt–SNr synapses as well so that the output of the SNr overwhelmingly indicates whether the object is valued or not.
To summarize, our data suggest that the CDt–SNr–SC circuit provides a selective mechanism to choose valued visual objects based on long-term reward experiences, although it is still unclear where the underlying synaptic plasticity occurs. This mechanism has two important features. First, this mechanism encodes only stable values (not flexible values); CDt neurons (and SNr neurons as well) are unable to learn the values of visual objects quickly but instead learn their values slowly when their values remain stable. Second, this mechanism works automatically even when the appearance of an object is not associated with a rewarding or nonrewarding consequence (i.e., passive-viewing task). This would allow monkeys to respond automatically to valued objects by making saccades to them; this is exactly what they did. Similar value-based automatic orienting of gaze occurs in humans (Della Libera and Chelazzi, 2009; Anderson et al., 2011; Anderson and Yantis, 2012; Theeuwes and Belopolsky, 2012).
These features in turn may support other aspects of skill. First, stable value coding ensures a large memory capacity. Without stable value coding, the brain would have to rely on flexible value coding, for which both learning and unlearning must occur quickly based on short-term reward experiences. Working memory is a typical example of flexible coding, and its capacity is very small (Shiffrin and Schneider, 1977; Cowan, 2001). In contrast, for stable value coding, both learning and unlearning occur very slowly. Unlearning could have occurred in our procedure for testing the effect of long-term reward experiences (because object–reward contingency was absent), and yet the monkey's saccadic behavior as well as CDt neuronal activity remained biased. This tolerance to devaluation, which is often related to habit (Yin and Knowlton, 2006), ensures the accumulation of value-based information (or memories). Thus, as the monkey experiences a large number of visual objects in association with stably biased rewards for a prolonged period of time, the CDt–SNr system would acquire a large capacity of object–value memories (Yasuda et al., 2012). Second, the automatic nature of the CDt–SNr–SC mechanism ensures that the monkey responds to valued objects obligatorily and therefore quickly. Without the stable value-coding mechanism, animals and humans would be at a loss facing so many objects, unable to quickly choose valuable objects.
Footnotes
This work was supported by the Intramural Research Program at the National Eye Institute. We thank M. Yasuda, I. E. Monosov, E. S. Bromberg-Martin, S. Hong, and Y. Tachibana for valuable discussions and A. Hays, J. W. McClurkin, B. Nagy, A. M. Nichols, D. Parker, T. W. Ruffner, M. K. Smith, G. Tansey, N. Phipps, C. Zhu, F. Ye, and D. Leopold for technical assistance.
The authors declare no competing financial interests.
References
- Adams JA. Historical review and appraisal of research on the learning, retention, and transfer of human motor skills. Psychol Bull. 1987;101:41–74. doi: 10.1037/0033-2909.101.1.41.
- Ammons RB, Farr RG, Bloch E, Neumann E, Dey M, Marion R, Ammons CH. Long-term retention of perceptual-motor skills. J Exp Psychol. 1958;55:318–328. doi: 10.1037/h0041893.
- Anderson BA, Yantis S. Value-driven attentional and oculomotor capture during goal-directed, unconstrained viewing. Attent Percept Psychophys. 2012;74:1644–1653. doi: 10.3758/s13414-012-0348-2.
- Anderson BA, Laurent PA, Yantis S. Value-driven attentional capture. Proc Natl Acad Sci U S A. 2011;108:10367–10371. doi: 10.1073/pnas.1104047108.
- Anderson JR. Acquisition of cognitive skill. Psychol Rev. 1982;89:369–406. doi: 10.1037/0033-295X.89.4.369.
- Aosaki T, Kimura M, Graybiel AM. Temporal and spatial characteristics of tonically active neurons of the primate's striatum. J Neurophysiol. 1995;73:1234–1252. doi: 10.1152/jn.1995.73.3.1234.
- Ashby FG, Turner BO, Horvitz JC. Cortical and basal ganglia contributions to habit learning and automaticity. Trends Cogn Sci. 2010;14:208–215. doi: 10.1016/j.tics.2010.02.001.
- Balleine BW, Dickinson A. Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology. 1998;37:407–419. doi: 10.1016/S0028-3908(98)00033-1.
- Brown VJ, Desimone R, Mishkin M. Responses of cells in the tail of the caudate nucleus during visual discrimination learning. J Neurophysiol. 1995;74:1083–1094. doi: 10.1152/jn.1995.74.3.1083.
- Bruce CJ, Goldberg ME, Bushnell MC, Stanton GB. Primate frontal eye fields. II. Physiological and anatomical correlates of electrically evoked eye movements. J Neurophysiol. 1985;54:714–734. doi: 10.1152/jn.1985.54.3.714.
- Caan W, Perrett DI, Rolls ET. Responses of striatal neurons in the behaving monkey. 2. Visual processing in the caudal neostriatum. Brain Res. 1984;290:53–65. doi: 10.1016/0006-8993(84)90735-2.
- Cohen NJ, Squire LR. Preserved learning and retention of pattern-analyzing skill in amnesia: dissociation of knowing how and knowing that. Science. 1980;210:207–210. doi: 10.1126/science.7414331.
- Cowan N. The magical number 4 in short-term memory: a reconsideration of mental storage capacity. Behav Brain Sci. 2001;24:87–114, discussion 114–185. doi: 10.1017/s0140525x01003922.
- Della Libera C, Chelazzi L. Learning to attend and to ignore is a matter of gains and losses. Psychol Sci. 2009;20:778–784. doi: 10.1111/j.1467-9280.2009.02360.x.
- Ericsson KA, Lehmann AC. Expert and exceptional performance: evidence of maximal adaptation to task constraints. Annu Rev Psychol. 1996;47:273–305. doi: 10.1146/annurev.psych.47.1.273.
- Fernandez-Ruiz J, Wang J, Aigner TG, Mishkin M. Visual habit formation in monkeys with neurotoxic lesions of the ventrocaudal neostriatum. Proc Natl Acad Sci U S A. 2001;98:4196–4201. doi: 10.1073/pnas.061022098.
- Fitts PM. Perceptual-motor skill learning. In: Melton AW, editor. Categories of human learning. New York: Academic; 1964. pp. 243–285.
- Franklin KBJ, Paxinos G. The mouse brain in stereotaxic coordinates. Ed 3. New York: Academic; 2007.
- Gross CG, Rocha-Miranda CE, Bender DB. Visual properties of neurons in inferotemporal cortex of the macaque. J Neurophysiol. 1972;35:96–111. doi: 10.1152/jn.1972.35.1.96.
- Hays AV, Jr, Richmond BJ, Optican LM. Unix-based multiple-process system, for real-time data acquisition and control. WESCON Conf Proc. 1982:1–10.
- Hikosaka O, Sakamoto M, Usui S. Functional properties of monkey caudate neurons. I. Activities related to saccadic eye movements. J Neurophysiol. 1989;61:780–798. doi: 10.1152/jn.1989.61.4.780.
- Hikosaka O, Rand MK, Miyachi S, Miyashita K. Learning of sequential movements in the monkey: process of learning and retention of memory. J Neurophysiol. 1995;74:1652–1661. doi: 10.1152/jn.1995.74.4.1652.
- Hikosaka O, Rand MK, Nakamura K, Miyachi S, Kitaguchi K, Sakai K, Lu X, Shimo Y. Long-term retention of motor skill in macaque monkeys and humans. Exp Brain Res. 2002;147:494–504. doi: 10.1007/s00221-002-1258-7.
- Jagadeesh B, Chelazzi L, Mishkin M, Desimone R. Learning increases stimulus salience in anterior inferior temporal cortex of the macaque. J Neurophysiol. 2001;86:290–303. doi: 10.1152/jn.2001.86.1.290.
- Kimura M, Kato M, Shimazaki H, Watanabe K, Matsumoto N. Neural information transferred from the putamen to the globus pallidus during learned movement in the monkey. J Neurophysiol. 1996;76:3771–3786. doi: 10.1152/jn.1996.76.6.3771.
- Li N, DiCarlo JJ. Unsupervised natural experience rapidly alters invariant object representation in visual cortex. Science. 2008;321:1502–1507. doi: 10.1126/science.1160028.
- Li N, DiCarlo JJ. Neuronal learning of invariant object representation in the ventral visual stream is not dependent on reward. J Neurosci. 2012;32:6611–6620. doi: 10.1523/JNEUROSCI.3786-11.2012.
- Logan GD. Skill and automaticity: relations, implications, and future directions. Can J Psychol. 1985;39:367–386. doi: 10.1037/h0080066.
- Logothetis NK, Pauls J, Poggio T. Shape representation in the inferior temporal cortex of monkeys. Curr Biol. 1995;5:552–563. doi: 10.1016/S0960-9822(95)00108-4.
- Malamut BL, Saunders RC, Mishkin M. Monkeys with combined amygdalo-hippocampal lesions succeed in object discrimination learning despite 24-hour intertrial intervals. Behav Neurosci. 1984;98:759–769. doi: 10.1037/0735-7044.98.5.759.
- Mishkin M, Malamut B, Bachevalier J. Memories and habits: two neural systems. In: Lynch G, McGaugh JL, Weinberger NM, editors. Neurobiology of learning and memory. New York: Guilford; 1984.
- Miyashita Y. Inferior temporal cortex: where visual perception meets memory. Annu Rev Neurosci. 1993;16:245–263. doi: 10.1146/annurev.ne.16.030193.001333.
- Miyashita Y, Higuchi S, Sakai K, Masui N. Generation of fractal patterns for probing the visual memory. Neurosci Res. 1991;12:307–311. doi: 10.1016/0168-0102(91)90121-E.
- Mogami T, Tanaka K. Reward association affects neuronal responses to visual stimuli in macaque TE and perirhinal cortices. J Neurosci. 2006;26:6761–6770. doi: 10.1523/JNEUROSCI.4924-05.2006.
- Newell A, Rosenbloom PS. Mechanisms of skill acquisition and the law of practice. In: Anderson JR, editor. Cognitive skills and their acquisition. Hillsdale, NJ: Erlbaum; 1981. pp. 1–55.
- Newell KM. Motor skill acquisition. Annu Rev Psychol. 1991;42:213–237. doi: 10.1146/annurev.ps.42.020191.001241.
- Orban GA, Van Essen D, Vanduffel W. Comparative mapping of higher visual areas in monkeys and humans. Trends Cogn Sci. 2004;8:315–324. doi: 10.1016/j.tics.2004.05.009.
- Paton JJ, Belova MA, Morrison SE, Salzman CD. The primate amygdala represents the positive and negative value of visual stimuli during learning. Nature. 2006;439:865–870. doi: 10.1038/nature04490.
- Paxinos G, Watson C. The rat brain in stereotaxic coordinates. Ed 6. Amsterdam: Academic; 2007.
- Saint-Cyr JA, Ungerleider LG, Desimone R. Organization of visual cortical inputs to the striatum and subsequent outputs to the pallido-nigral complex in the monkey. J Comp Neurol. 1990;298:129–156. doi: 10.1002/cne.902980202.
- Saleem KS, Logothetis NK. A combined MRI and histology atlas of the rhesus monkey brain. Amsterdam: Academic; 2007.
- Schultz W. Predictive reward signal of dopamine neurons. J Neurophysiol. 1998;80:1–27. doi: 10.1152/jn.1998.80.1.1.
- Shiffrin RM, Schneider W. Controlled and automatic human information processing. II. Perceptual learning, automatic attending and a general theory. Psychol Rev. 1977;84:127–190. doi: 10.1037/0033-295X.84.2.127.
- Tanaka K. Inferotemporal cortex and object vision. Annu Rev Neurosci. 1996;19:109–139. doi: 10.1146/annurev.ne.19.030196.000545.
- Theeuwes J, Belopolsky AV. Reward grabs the eye: oculomotor capture by rewarding stimuli. Vision Res. 2012;74:80–85. doi: 10.1016/j.visres.2012.07.024.
- Van Hoesen GW, Yeterian EH, Lavizzo-Mourey R. Widespread corticostriate projections from temporal cortex of the rhesus monkey. J Comp Neurol. 1981;199:205–219. doi: 10.1002/cne.901990205.
- Webster MJ, Bachevalier J, Ungerleider LG. Transient subcortical connections of inferior temporal areas TE and TEO in infant macaque monkeys. J Comp Neurol. 1995;352:213–226. doi: 10.1002/cne.903520205.
- Wickens JR, Reynolds JN, Hyland BI. Neural mechanisms of reward-related motor learning. Curr Opin Neurobiol. 2003;13:685–690. doi: 10.1016/j.conb.2003.10.013.
- Yamamoto S, Monosov IE, Yasuda M, Hikosaka O. What and where information in the caudate tail guides saccades to visual objects. J Neurosci. 2012;32:11005–11016. doi: 10.1523/JNEUROSCI.0828-12.2012.
- Yasuda M, Yamamoto S, Hikosaka O. Robust representation of stable object values in the oculomotor basal ganglia. J Neurosci. 2012;32:16917–16932. doi: 10.1523/JNEUROSCI.3438-12.2012.
- Yeterian EH, Van Hoesen GW. Cortico-striate projections in the rhesus monkey: the organization of certain cortico-caudate connections. Brain Res. 1978;139:43–63. doi: 10.1016/0006-8993(78)90059-8.
- Yin HH, Knowlton BJ. The role of the basal ganglia in habit formation. Nat Rev Neurosci. 2006;7:464–476. doi: 10.1038/nrn1919.