Abstract
Neurons at the top of primate ventral visual stream [inferior temporal cortex (IT)] have selectivity for objects that is highly tolerant to variation in the object's appearance on the retina. Previous nonhuman primate (Macaca mulatta) studies suggest that this neuronal tolerance is at least partly supported by the natural temporal contiguity of visual experience, because altering that temporal contiguity can robustly alter adult IT position and size tolerance. According to that work, it is the statistics of the subject's visual experience, not the subject's reward, that instruct the specific images that IT treats as equivalent. But is reward necessary for gating this type of learning in the ventral stream? Here we show that this is not the case—temporal tolerance learning proceeds at the same rate, regardless of reward magnitude and regardless of the temporal co-occurrence of reward, even in a behavioral task that does not require the subject to engage the object images. This suggests that the ventral visual stream uses autonomous, fully unsupervised mechanisms to constantly leverage all visual experience to help build its invariant object representation.
Introduction
Neurons at the top of the ventral stream, inferior temporal cortex (IT), have shape selectivity that is remarkably tolerant to variations in each object's appearance (e.g., object position, size variation) (Logothetis and Sheinberg, 1996; Tanaka, 1996; Vogels and Orban, 1996). This neuronal tolerance likely underlies the primate ability to recognize objects in the face of image variation (Hung et al., 2005; Li et al., 2009). How IT neurons obtain this tolerance remains poorly understood. One hypothesis is that the ventral stream leverages natural visual experience to build tolerance, and temporal contiguity cues can participate in “instructing” tolerance—because each object's identity is temporally stable, different retinal images of the same object tend to be temporally contiguous. The ventral stream could take advantage of this natural tendency and learn to associate neuronal representations that occur closely in time to yield tolerant object selectivity (i.e., optimizing the “slowness” of visual representation) (Foldiak, 1991; Stryker, 1991; Wiskott and Sejnowski, 2002). We previously reported evidence for this hypothesis—the position and size tolerance of IT neurons are predictably reshaped by targeted manipulations of temporally contiguous experience. In particular, visual experience can both destroy existing tolerance and build new tolerance, depending on the details of the provided experience [i.e., “unsupervised temporal tolerance learning” (UTL)] (Li and DiCarlo, 2008, 2010).
In this study, we ask: does UTL depend on the reward (R) contingencies of the animal? This question is central because it assesses the relevance of UTL to natural vision, and it informs computational models of the ventral stream that incorporate temporal contiguity learning (Wallis and Rolls, 1997; Wiskott and Sejnowski, 2002; Wyss et al., 2006; Masquelier and Thorpe, 2007). One hypothesis is that reward plays a “permissive” role and the ventral stream only learns tolerance from visual experience during elevated reward states. Alternatively, the ventral stream continually leverages all visual experience to build tolerance, regardless of reward state. Intuitively, our central underlying question is: does UTL take place in animals that are naturally experiencing the world in which external rewards may be few and far between?
We first note that we termed UTL “unsupervised” because the statistics of the visually arriving images (not the external reward) instruct what will be learned—images that occur in close spatiotemporal proximity are gradually learned to be treated as equivalents in the IT representation. However, our previous work was done in the context of water-restricted animals receiving liquid co-occurrent with all experimentally controlled visual experience (Li and DiCarlo, 2008, 2010). Thus, those studies cannot rule out the hypothesis that reward is required to gate UTL.
Here we examine the effect of reward on gating UTL by strongly varying the magnitude and the timing of rewards to monkey subjects during visual experience previously shown to induce UTL. We used behavioral measures to confirm reward state changes in the animals, and we measured the strength of UTL under different reward contingencies. We found that UTL proceeded at the same, experience-driven rate as previous reports, regardless of the reward contingencies.
Materials and Methods
Animal subjects
All animal procedures were performed in accordance with NIH guidelines and the MIT Committee on Animal Care. Two male rhesus monkeys (Macaca mulatta, 8 and 6 kg) were used in the study. Aseptic surgery was performed to implant a head post and a scleral search coil. After brief behavioral training (1–3 months), a second surgery was performed to place a recording chamber (18 mm diameter) to reach the anterior half of the temporal lobe.
Visual stimuli
In each experimental session, visual stimuli were presented on a 53 cm CRT monitor (85 Hz refresh rate; ∼48 cm away; background gray luminance: 22 cd/m²; maximum white: 46 cd/m²). Pairs of object stimuli were chosen from a pool of 96 images to ensure IT neuronal responsivity and selectivity (described in Experimental design). Eye position was monitored using the standard scleral coil technique (Robinson, 1963), and was used in real time to support the presentation of object images in retinal coordinates, with the details depending on the particular experiment (see below, Experimental design).
Neuronal recording
Multiunit activity (MUA) was recorded using single-microelectrode methods (Li and DiCarlo, 2010). MUA was defined as all voltage waveforms in the spiking band (300 Hz to 7 kHz) that crossed a threshold set to ∼2 SDs of the background activity. That threshold was held constant for the entire session. MUA has the important advantage of allowing reliable monitoring of the same IT site for several hours. In this study, we leveraged this advantage to directly assess (before vs after) the change in IT tolerance induced by experience over 1–2 h. We focused on MUA here because we have previously shown that UTL is quantitatively similar when measured using isolated single units or MUA (Li and DiCarlo, 2008, 2010).
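The event-detection rule described above (a threshold fixed at ∼2 SDs of the background activity for the whole session) can be sketched as follows. This is an illustrative reconstruction, not the authors' acquisition code; the sampling rate, the synthetic signal, and the assumption of negative-going spikes are our own.

```python
import numpy as np

# Illustrative sketch of the MUA event-detection rule: threshold
# crossings of the spiking-band signal, with the threshold fixed at
# ~2 SDs of the background activity and held constant thereafter.
# All names and the synthetic data are assumptions.

rng = np.random.default_rng(0)
fs = 20_000                       # assumed sampling rate (Hz)
noise = rng.normal(0.0, 1.0, fs)  # 1 s of band-passed "background"

def detect_mua_events(signal, n_sd=2.0):
    """Return sample indices where the signal first drops below
    -n_sd * SD(signal); the threshold is computed once and held fixed."""
    threshold = -n_sd * np.std(signal)
    below = signal < threshold
    # keep only the first sample of each sub-threshold run
    return np.flatnonzero(below & ~np.roll(below, 1))

# embed three large artificial "spikes" and check that they are found
spike_times = [5000, 10000, 15000]
test_signal = noise.copy()
for t in spike_times:
    test_signal[t:t + 10] -= 8.0  # far below the ~2 SD threshold

events = detect_mua_events(test_signal)
```

Note that a 2 SD threshold also passes some background hash; capturing that multiunit envelope, rather than isolating single cells, is the point of an MUA measure.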
Experimental design
Each experimental session (day) began by lowering a single microelectrode into anterior IT cortex (for details on recording locations, see Li and DiCarlo, 2010). Once a site was found that showed clear visual drive in the measured MUA, we began the recording session with an initial screening in which the site was probed with object images (4.5°) each presented for 100 ms on the animals' center of gaze (interleaved in pseudo-random order with a 100 ms blank period between each image, five to eight object images presented per trial). We used 96 achromatic images of two classes of visual objects: 48 cutout natural objects and 48 silhouette shapes, both presented on gray background. These two classes of objects are substantially different from each other in their pixel-wise similarity, and we have previously shown that neuronal plasticity induced among one object class does not “spill over” to the other class (Li and DiCarlo, 2010). Based on the response of each recording site within one class of objects (natural or silhouette), we chose the most preferred (P) and least preferred [i.e., nonpreferred (N)] objects among the objects that drove the site significantly (t test against background response, p < 0.05, not corrected for multiple tests). Typically, only one such pair of objects was chosen to be further tested and manipulated (“swap” objects). In some sessions (R1 + R2 sessions) (see Fig. 2a, described below), we ran two experiments in interleaved fashion within the session, and thus two pairs of swap objects were chosen to be manipulated. For these sessions, we always chose the two pairs of objects from separate object classes (e.g., one object pair from the “natural” class used for R1 and one pair from the silhouette class used for R2), and the assignment of object class to R1/R2 was strictly alternated across sessions. 
In addition, six other images (three natural, three silhouette) were chosen as control images, and IT selectivity among these control images served as a measure of recording stability (see Data analyses). The animals were provided with altered visual experience with the swap-object pairs in the exposure phase, and the details of that experience and the animals' tasks are provided below. IT selectivity was measured for both the swap-object pairs and control images during two test phases—one before the exposure phase and one following the exposure phase. Further details of the experimental design and stimuli are as previously described (Li and DiCarlo, 2010) (outlined in Fig. 1).
Each test phase was used to probe IT neuronal selectivity. During the test phase, each animal was engaged in a slightly different task, but we and others have previously found that both tasks lead to very similar testing results (DiCarlo and Maunsell, 2000; Li and DiCarlo, 2008, 2010). Specifically, Monkey 1 performed a passive fixation task while object images were presented on the center of gaze (100 ms duration followed by 100 ms blank, five to eight object images per fixation trial). Monkey 2 performed a visual foraging task in which it freely searched an array of eight small dots (0.2° in size, vertically arranged) in a manner like the visual foraging task used in some exposure phases (R3, described below). During free viewing, object images were presented for 100 ms on the animals' center of gaze during brief periods of natural fixations. The retinal stimulation produced by each object image in the two tasks was essentially identical (DiCarlo and Maunsell, 2000). Fifty to 60 repetitions per object image were collected.
During each exposure phase, visual experience of the object images was delivered as punctate exposure events on the animals' center of gaze in the context of free viewing. Specifically in one exposure event, one object image was presented for 100 ms, after which it was immediately replaced by a second object image at a different size for another 100 ms. This flow of image presentation is illustrated as one arrow in Figure 1a. There were four different exposure event types: some of the exposure events contained object identity change across the size change (so-called swap exposures; Fig. 1a, red arrows), while others maintained object identity over a different size change (“nonswap” exposures; Fig. 1a, blue arrows). Unless stated otherwise, the animals typically received a total of 800 swap and 800 nonswap exposures (randomly interleaved) for each swap-object pair. We strictly alternated the object size (1.5° or 9°) at which the swap exposure was deployed between experimental sessions. In sum, to induce neuronal learning, we used the same visual experience delivery paradigm as in our previous study (Li and DiCarlo, 2010). For different experimental sessions, we aimed to make the visual experience delivery identical (albeit with different object pairs), but we manipulated the delivery of reward accompanying the exposure events (the main variable of interest in this study, described next).
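As a concrete illustration of the exposure-phase structure just described (800 swap plus 800 nonswap events, randomly interleaved, with the swap size alternating between 1.5° and 9° across sessions), one could assemble the event list as below. The scheduler, function names, and event encoding are hypothetical; the paper does not publish its stimulus-control code.

```python
import random

# Hypothetical sketch of one exposure phase's event list: 800 swap and
# 800 nonswap exposure events, randomly interleaved. Each event shows
# the first image for 100 ms, then the second image at a different
# size for 100 ms.

MEDIUM_SIZE = 4.5  # images were 4.5 deg at the center of gaze

def build_exposure_phase(swap_size, n_each=800, seed=0):
    """Return a shuffled list of ((obj, size), (obj, size)) events.

    Swap events change object identity (P <-> N) across the size
    change to `swap_size`; nonswap events preserve identity across
    the change to the other size.
    """
    nonswap_size = 9.0 if swap_size == 1.5 else 1.5
    events = []
    for i in range(n_each):
        a, b = ("P", "N") if i % 2 == 0 else ("N", "P")
        events.append(((a, MEDIUM_SIZE), (b, swap_size)))     # swap
        events.append(((a, MEDIUM_SIZE), (a, nonswap_size)))  # nonswap
    random.Random(seed).shuffle(events)
    return events

phase = build_exposure_phase(swap_size=1.5)
```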
Reward magnitude was operationally defined as juice volume delivered in short pulses (17–117 μl) to water-restricted animals (see reward condition details below). Juice delivery was controlled via a solenoid valve with its opening time and duration under computer control. There was a linear relationship between the juice volume delivered and the duration of the solenoid valve opening (calibrated before the experiments), such that the juice volume delivery could be precisely controlled. In separate experimental sessions, we tested four different ways of delivering the reward and exposure events to the animals (conditions R0, R1, R2, and R3, described below) (Fig. 2b). Condition R0 was published data from Li and DiCarlo (2010) reanalyzed in the current study. Conditions R1, R2, and R3 were newly collected data. Sessions of R1/R2 were conducted first. Then, sessions of R3 were conducted (Fig. 2a).
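Because the valve-open duration maps linearly onto delivered volume, the duration needed for any target volume can be read off a fitted calibration line. A minimal sketch follows; the calibration points are invented for illustration, and only the linearity assumption comes from the text.

```python
import numpy as np

# Sketch of the juice-delivery calibration: fit a line to
# (valve-open duration, delivered volume) pairs, then invert it to
# get the duration for a target volume. Data are invented.

open_ms = np.array([10.0, 20.0, 40.0, 60.0, 80.0, 100.0])
volume_ul = np.array([12.0, 24.0, 47.0, 71.0, 93.0, 118.0])  # "measured"

slope, intercept = np.polyfit(open_ms, volume_ul, 1)

def duration_for_volume(target_ul):
    """Invert the fitted calibration line: ms of valve opening."""
    return (target_ul - intercept) / slope

low_ms = duration_for_volume(17.0)    # low-reward pulse (17 ul)
high_ms = duration_for_volume(117.0)  # high-reward pulse (117 ul)
```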
Reward condition R0.
Object images appeared at random positions on a gray computer screen, and free-viewing animals naturally looked to the objects. Visual exposure events were deployed upon the animals' foveation, and a drop of apple juice of fixed volume (17 μl) was delivered immediately after each exposure event (Fig. 2b). Unlike conditions R1, R2, and R3, the exposure phase in condition R0 consisted of only 800 swap exposures (no nonswap exposures). The R0 condition was not designed to re-demonstrate the basic UTL phenomenology (see Results); thus, it did not include the nonswap control condition. Rather, data from this condition were collected in a previous study and are reanalyzed here with the sole aim of comparing the rate of IT selectivity change under visual experience conditions exactly matched to those used in reward conditions R1–R3 (swap condition).
Reward conditions R1 and R2.
Visual exposure events were delivered in the same way as in R0. However, in each experimental session, we manipulated visual experience with two pairs of swap objects. All exposure events were immediately followed by juice reward (as in R0). However, that reward was nearly seven times larger for exposures to images from one object pair (R1, “high reward,” 117 μl) than for exposures to images from the other object pair (R2, “low reward,” 17 μl). The R1 exposures and R2 exposures were delivered in blocks (Fig. 3a; 200 exposure events per block), and small colored dots (0.4°) on the four corners of the monitor indicated the block type to the animal (green indicated high reward, red indicated low reward). This block design was used under the logic that it would give us the best chance of detecting a dependency of IT tolerance learning on reward magnitude, even if the hypothetical underlying reward-state mechanisms had low temporal resolution. The rate of exposure event delivery was calibrated based on pilot data such that the total duration of the two types of blocks was equal; this calibration was done before the beginning of the experiment to ensure that any observed difference in the magnitude of induced selectivity change could not be due to a longer or shorter time interval for selectivity change to unfold. Only Monkey 1 was tested in this condition (see logic described in Reward condition R3).
Reward condition R3.
Each animal freely searched an array of 16 small dots looking for hidden reward (Fig. 4a), and our goal was to insert exposure events during natural visual fixations that were retinally identical to those provided under reward conditions R0–R2. The dots never changed in appearance, but on each “trial” one dot would be randomly baited, in that a juice reward was given when the animals foveated that dot, and the next trial continued uninterrupted. During this foraging task, exposure events were deployed on a subset of the animals' natural fixations. Each exposure event's onset time was the detected end time of a saccade (see Fig. 4a). One such exposure event was provided on approximately every fourth fixation, and no two exposure events were provided on back-to-back fixations. Because the delivery of the exposure events was unrelated to the animals' performance in the foraging task, the temporal contingency of the exposure events and the animals' reward was strongly disrupted (Fig. 4d). Our goal was to provide at least 800 complete exposure events per exposure phase. However, because a complete exposure event is 200 ms in duration (100 ms for each image in the event) and some of the natural fixation durations turned out to be shorter than this (minimum fixation duration was 80 ms), a fraction of the intended exposure events (∼30%) were terminated early (i.e., removal of the currently displayed image at the end of the fixation). For example, the shortest natural fixation could produce an exposure event that only displayed the first image for 80 ms and the second image was not shown—an event that clearly fails to deliver the intended image-pair experience. We ensured that each exposure phase contained exactly 800 complete swap exposure events, plus some number of incomplete ones (same for the nonswap exposures). 
Because we did not know whether some of these incomplete exposure events induced any additional neuronal learning (beyond the learning induced by the 800 complete exposures), we used off-line analyses to control for that possibility. Specifically, we counted an incomplete swap exposure event as valid if the second object image was shown for at least 24 ms (i.e., natural fixation >124 ms). On average, our data include ∼200 such partial but “valid” exposure events in the exposure phase (∼25%). Thus, the total magnitude of learning we report was normalized by the total number of valid exposure events (so that the reported magnitude of learning rate could be compared with the results from other reward conditions; Fig. 2b). Because these incomplete swap exposure events may induce less learning than complete exposure events, our reported learning rate could have underestimated the R3 learning rate by as much as 20% (Fig. 2c). Both Monkey 1 and Monkey 2 were tested in this condition. Monkey 1 was tested in the R3 condition after first being tested in the R1/R2 condition. Then, Monkey 2 was tested in the R3 condition. Because manipulation of reward time (R3) is the most powerful test for any permissive role of reward, having tested both animals in the R3 condition and observed neuronal learning that was at least as strong as our original R0 conditions (see Results), we did not further test Monkey 2 in the R1/R2 sessions.
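The off-line validity rule just described (an event is complete if the fixation lasted the full 200 ms; a partial event still counts as valid if the second image appeared for at least 24 ms, i.e., fixation >124 ms) amounts to a simple classification. A minimal sketch, with invented fixation durations:

```python
# Sketch of the off-line validity rule for R3 exposure events.
# The fixation durations below are invented for illustration.

FIRST_IMAGE_MS = 100
MIN_SECOND_IMAGE_MS = 24

def classify_event(fixation_ms):
    if fixation_ms >= 200:
        return "complete"
    if fixation_ms > FIRST_IMAGE_MS + MIN_SECOND_IMAGE_MS:
        return "valid_partial"
    return "invalid"

fixations_ms = [80, 110, 130, 150, 200, 250, 124, 125]
labels = [classify_event(f) for f in fixations_ms]

# learning is then normalized by the number of valid exposure events
n_valid = sum(lab != "invalid" for lab in labels)
```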
Data analyses
Neuronal data collected from 164 IT sites were first screened for recording stability. As in our previous work, we deemed a multiunit site's recording to be “stable” if the selectivity among the control object images (see Experimental design) was “unchanged” (r > 0.7 for Pearson's correlation between the responses to the six control images measured in the first and last test phases). We have previously verified the robustness of our results to this criterion (Li and DiCarlo, 2010). After this screen, we were left with the following dataset presented below: R0, n = 31 sites; R1/R2, n = 25 sites; R3, n = 23 sites.
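The stability screen reduces to a Pearson correlation between two six-element response vectors. A minimal sketch, with invented response vectors:

```python
import numpy as np

# Sketch of the recording-stability screen: a site is kept only if the
# Pearson correlation of its responses to the six control images,
# measured in the first vs the last test phase, exceeds 0.7.
# The response vectors below are invented.

def is_stable(resp_first, resp_last, r_min=0.7):
    r = np.corrcoef(resp_first, resp_last)[0, 1]
    return r > r_min

stable_site = is_stable([5, 20, 8, 31, 12, 17], [6, 18, 9, 29, 14, 16])
drifted_site = is_stable([5, 20, 8, 31, 12, 17], [25, 6, 30, 4, 22, 3])
```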
The IT response to each image was taken as the spike count in a time window of 100–250 ms post-stimulus onset (test phases only). Neuronal selectivity was computed as the response difference in units of spikes per second between images of objects P and N at different object sizes. To avoid bias in this estimate of selectivity, for each IT site we used a portion of the pre-exposure data (10 repetitions per image, pooled across object size) to determine the object labels P and N, and the remaining independent data to compute the selectivity values reported in the text (Li and DiCarlo, 2010). To address possible adaptation concerns, we repeated the key analysis (Fig. 2c) after discarding the first image presentation in each test phase trial, and the result was qualitatively unchanged. The key results were evaluated statistically using a combination of t tests, ANOVAs, and nonparametric bootstraps, as described in the Results.
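The unbiased selectivity estimate above (label P/N on one subset of trials, measure P − N on the held-out remainder) can be sketched as follows; all trial data are synthetic, and the variable names are our own.

```python
import numpy as np

# Sketch of the selectivity computation: spike counts in the
# 100-250 ms window give rates; 10 pre-exposure repetitions assign
# the P/N labels, and held-out trials yield the reported (P - N)
# values, avoiding selection bias. Data are synthetic.

rng = np.random.default_rng(1)
WINDOW_S = 0.150  # 100-250 ms post-stimulus

def mean_rate(spike_counts):
    """Mean firing rate (spikes/s) from per-trial window spike counts."""
    return np.mean(spike_counts) / WINDOW_S

obj_a = rng.poisson(6.0, 50)  # 50 trials for object A (counts)
obj_b = rng.poisson(2.0, 50)  # 50 trials for object B (counts)

# label P/N from the first 10 repetitions only...
label_p_is_a = mean_rate(obj_a[:10]) > mean_rate(obj_b[:10])
# ...then compute selectivity on the independent remainder
if label_p_is_a:
    p_minus_n = mean_rate(obj_a[10:]) - mean_rate(obj_b[10:])
else:
    p_minus_n = mean_rate(obj_b[10:]) - mean_rate(obj_a[10:])
```

Splitting the data this way matters because choosing P as the empirically larger response and then measuring P − N on the same trials would inflate selectivity purely through noise.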
Single-unit sorting
We performed principal component analysis (PCA)-based spike sorting on the waveform data (sampled every 0.07 ms) collected during each test phase to isolate single units (Li and DiCarlo, 2010). K-means clustering was performed in the PCA feature space to yield multiple units. The number of clusters was determined automatically by maximizing the distances between points of different clusters. Each unit obtained from the clustering was further evaluated by its signal-to-noise ratio (SNR: ratio of peak-to-peak mean waveform amplitude to the SD of the noise). We set an SNR threshold of 4.0, above which we termed a unit a “single unit.” Then, from the pools of single units, we determined stable units across the two test phases within a recording session. A unit was deemed to be stable if its response pattern among the control object images (unexposed during the exposure phase) remained unchanged (Pearson's correlation, r > 0.9), and its waveform maintained a consistent shape (r > 0.9). Each recording session yielded at most one such stable unit, and we were able to isolate 18 such units from 48 stable recording sessions (as defined in the Data analyses section) (Fig. 5a).
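The two quantitative pieces of this pipeline, the PCA projection and the SNR acceptance criterion, can be sketched as below. The waveforms are synthetic, PCA is done with NumPy's SVD rather than the authors' sorting software, and the clustering step is only indicated, not implemented.

```python
import numpy as np

# Sketch of the single-unit isolation pipeline: project waveforms
# into PCA space (the input to k-means clustering), and accept a
# cluster as a "single unit" only if its SNR (peak-to-peak mean
# waveform over the residual noise SD) exceeds 4.0.

rng = np.random.default_rng(2)
n_spikes, n_samples = 200, 32
template = 5.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, n_samples))
waveforms = template + rng.normal(0.0, 0.5, (n_spikes, n_samples))

def pca_project(x, k=2):
    """Project rows of x onto their first k principal components."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:k].T

def snr(cluster_waveforms):
    """Peak-to-peak of the mean waveform over the residual noise SD."""
    mean_wf = cluster_waveforms.mean(axis=0)
    noise_sd = (cluster_waveforms - mean_wf).std()
    return (mean_wf.max() - mean_wf.min()) / noise_sd

features = pca_project(waveforms)  # would feed k-means clustering
unit_snr = snr(waveforms)          # here the "cluster" is all waveforms
is_single_unit = unit_snr > 4.0
```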
Results
We focused on a form of neuronal learning that ventral stream neurons may rely on to build their tolerance (UTL): manipulating the temporal contiguity of animals' experience with object images can predictably reshape the size and position tolerance of IT neuronal responses (Li and DiCarlo, 2008, 2010). In all experiments reported here, we focused on size tolerance, and we used a very similar experimental paradigm as in our previous work (see Materials and Methods) (Li and DiCarlo, 2010), which is briefly summarized as follows. We exposed two animals to an altered visual world where we gave each animal temporally contiguous experience with object images from two objects (P and N objects) at different object sizes (exposure phase). For half of such exposure events, images of each object were correctly paired in time (e.g., a small size image of a dog was followed by a medium size image of a dog) (Fig. 1a, blue arrows, nonswap exposures). As in our previous work, we tested the strength of UTL by using the other half of the exposure events to consistently pair the images of two different objects across two sizes (e.g., a medium size image of a dog was followed by a large size image of a rhinoceros) (Fig. 1a, red arrows, swap exposures). If the ventral stream is learning to associate temporally contiguous visual representations, the prediction, confirmed previously by Li and DiCarlo (2010), is that the selectivity at each IT site for P and N objects will begin to reverse primarily for the object size where the swap exposure was deployed (i.e., “swap size”) as IT neurons incorrectly associate the representations for P and N (Fig. 1b; a change in selectivity among large size dog and large size rhinoceros in the above example). While the temporal contiguity learning hypothesis makes no quantitative prediction for the selectivity change at the medium size (Fig. 1b, gray oval) (Li and DiCarlo, 2010), it does predict that there should be no reduction in selectivity at the nonswap size.
The above learning phenomenology has been previously described, and we have previously explored key issues that are not re-explored here (e.g., order of stimulus presentation, effects on nonexposed stimuli) (Li and DiCarlo, 2010). Instead, our first, control goal was to replicate the basic UTL phenomenology: a decrease in the original object preference (P over N) at the swap size. The second, primary goal of this study was to examine the effect of reward on the rate of UTL to gauge the potential permissive role of reward. To achieve these goals, we performed experimental sessions in which we always provided the animals with the same kind of temporal contiguity experience (described above) to induce UTL, but in some sessions we strongly varied the reward magnitude (Fig. 2a; R1/R2, n = 25 recording sessions), and in other sessions we strongly varied the reward timing (Fig. 2a; R3, n = 23 recording sessions) accompanying that visual experience (see Materials and Methods).
In the following results presentation, we first check for the replication of the basic UTL phenomenology by analyzing the newly collected dataset, ignoring the reward structure (Fig. 2a; R1, R2, R3, pooled). Next, we examine the strength of UTL separately for each reward condition to isolate the effect of reward. As a reference, we also compare the rate of UTL in the current study to data collected as part of our previously published work (Li and DiCarlo, 2010), reanalyzed here (Fig. 2a, R0, which has nominally identical visual experience to R1, R2, and R3 swap conditions).
Replication of the basic UTL phenomenology
To assess changes in the tolerance of IT responses, we recorded IT multiunit activity in short test phases before and after each exposure phase (see Materials and Methods) (Fig. 1). As expected, we reproduced the previously reported UTL in that, after exposure to the altered visual object size statistics, IT object selectivity changed in a specific way—it tended toward reversal at the swap size, while remaining largely unchanged at the nonswap size (for the central idea, see Fig. 1b). This effect was large enough to be observed in some individual multiunit sites after just 1 h of exposure (Fig. 1c). To quantify the effect for each IT site, we measured the difference in IT response to object P and N (P − N, in units of spikes per second) at each object size, and we took the difference in this selectivity before and after exposure to the altered visual statistics. This Δ(P − N) reflected the rate of IT selectivity change per 800 exposure events. Across all the tested object pairs (n = 73), Δ(P − N) at the swap size was highly significant (mean: −12.2 spikes/s change per 800 exposures; p < 0.001, t test against zero), and there was no significant change at the nonswap size (mean: −2.7 spikes/s; p = 0.17) (Fig. 1d). The size specificity was further confirmed statistically in the following two different ways: (1) a direct t test on Δ(P − N) between the swap and nonswap size (p < 0.001, two-tailed; Monkey 1: p < 0.01; Monkey 2: p = 0.019); and (2) a significant interaction of “exposure × object size” on the raw selectivity measurements (P − N)—that is, IT selectivity was decreased by exposure only at the swap size (p < 0.001, repeated-measures ANOVA, with “exposure” and “object size” being the within-group factors; Monkey 1: p < 0.01; Monkey 2: p = 0.014).
In this study, we concentrated on multiunit response data because it had a clear advantage as a direct test of our hypothesis—it allowed us to longitudinally track IT selectivity during altered visual experience across the entirety of each experimental session. We have previously shown that both single-unit and multiunit data reveal the UTL phenomenology (Li and DiCarlo, 2008, 2010). Nevertheless, we here also examined underlying single-unit data to confirm its consistency with the multiunit results. Single units have a limited hold time in awake primate physiology preparations. In 18 of the 48 recording sessions, we were able to track the same single unit across an entire (1–2 h) recording session (see Materials and Methods). Figure 5a shows six such IT single units. The judgment that we were recording from the same unit came from the consistency of the waveform of the unit and its consistent pattern of response among the nonexposed control object images (Fig. 5a). Across the single-unit population, the P − N selectivity at the swap size was significantly reduced, whereas the selectivity at the nonswap size remained stable (Fig. 5b; p = 0.003, t test, swap vs nonswap size), thus reconfirming the multiunit results at the single-unit level (Fig. 1d).
Since Δ(P − N) at the swap size quantifies the rate of IT selectivity change induced by our manipulation, we use it as our measure of IT neuronal tolerance learning in the rest of this article. The result presented so far is from the pooled data and ignores the reward condition under which the change in IT tolerance was induced. Next, we answer the main question of this study by breaking out the data to determine the rate of learning under three reward conditions: high reward (R1), low reward (R2), and temporally decoupled reward (R3). Because the animals received the same type and amount of visual experience with each swap pair of objects under each reward condition (see Materials and Methods), this enabled us to use the change in IT selectivity for each swap-object pair to independently measure and then directly compare the rate of IT tolerance learning under each reward condition.
Reward magnitude does not affect the rate of UTL
We first manipulated the magnitude of reward accompanying visual experience (Fig. 2a, R1 and R2). During the exposure phase, the animal naturally looked to each object image that we presented at arbitrary positions on the display screen. Here, all key exposure events took place after this foveation, and each exposure event was immediately followed by a juice reward, as in our previous work (Fig. 3) (Li and DiCarlo, 2010). However, here we systematically varied the magnitude of that reward. Specifically, the exposure phase consisted of alternating blocks: a high-reward block (condition R1, 117 μl of juice per exposure event) and a low-reward block (condition R2, 17 μl of juice per exposure event). Each block was ∼5 min in duration, each contained 200 exposure events, and each included a peripheral visual cue to indicate the current block type to the animal (Materials and Methods).
Behaviorally, we found that the animal was sensitive to this difference in reward magnitude. For example, the object-foveating saccade was faster during the high-reward block (Fig. 3a). The latency of primates' foveating saccades to visual stimuli has previously been used as an overt, objective measure of the animals' internal reward state, and this behavioral attribute is correlated with neuronal measures of dopaminergic release in basal ganglia (Hikosaka, 2007; Bromberg-Martin et al., 2010). Across all the experimental sessions, we observed overall shorter saccade latency for the high-reward blocks, confirming that we had successfully manipulated each animal's reward state (Fig. 3b; mean saccade latency; high reward, 170 ms; low reward, 200 ms; p ≪ 0.001, t test).
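The latency comparison above is a standard two-sample test. A minimal sketch, using Welch's t statistic computed by hand to stay dependency-free; the latency samples are synthetic, centered on the reported means (170 vs 200 ms).

```python
import numpy as np

# Sketch of the behavioral check: compare foveating-saccade latencies
# between high- and low-reward blocks. Data are synthetic.

rng = np.random.default_rng(3)
lat_high = rng.normal(170.0, 20.0, 300)  # high-reward block (ms)
lat_low = rng.normal(200.0, 20.0, 300)   # low-reward block (ms)

def welch_t(a, b):
    """Welch's t statistic for an unequal-variance two-sample test."""
    va = a.var(ddof=1) / len(a)
    vb = b.var(ddof=1) / len(b)
    return (a.mean() - b.mean()) / np.sqrt(va + vb)

t_stat = welch_t(lat_high, lat_low)  # strongly negative: high < low
```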
To measure experience-induced UTL under each reward condition, we used two pairs of objects—one pair was exposed (Fig. 1a) during the high-reward block, and the other pair was exposed during the low-reward block. We chose these objects based on the activity at each IT site using an objective procedure such that, on average, they were equivalent in terms of each IT site's selectivity and size tolerance (Materials and Methods). Importantly, we have previously shown that UTL learning from exposure to one such pair does not transfer, on average, to the nonexposed object pair when the pairs are chosen using these same pools of objects and same methods (Li and DiCarlo, 2010). As a further confirmation of this, we found that there was, on average, no significant change in IT selectivity among pairs of control objects unexposed to the animal during the exposure phase (Δ(P − N) = −2.4 spikes/s, p = 0.12, t test; see “control object images” in Materials and Methods; the six control objects were randomly split into three pairs here). In contrast to the strong behavioral effect of reward, we found that our visual exposure paradigm induced robust UTL of nearly identical rate in both the high-reward block and the low-reward block (Fig. 2c; Δ(P − N), R1: −12.1 spikes/s; R2: −11.7 spikes/s; pooled: −12 spikes/s change per 800 exposures; p < 0.01, t test). In both reward conditions, the experience-induced selectivity change was specific to the swap size, with no significant change at the nonswap size (R1: −3.0 spikes/s, p = 0.46; R2: −5.3 spikes/s, p = 0.18, t test).
We considered the possibility that our analyses of mean effect size might hide a subtle effect of reward magnitude on UTL. Specifically, we performed three additional analyses to leverage the most power from our data. First, we noted that the animal's behavior was not consistent from day to day. On some days, the animal was highly sensitive to reward magnitude (i.e., a large difference in saccade latency), whereas on other days, the animal was not overtly sensitive to our reward manipulation (little difference in saccade latency) (Fig. 3c, top). Leveraging this day-to-day variation in the animal's sensitivity to reward, we reasoned that if UTL proceeds at a faster rate under a heightened reward state, then any difference in UTL between the high-reward (R1) and low-reward (R2) states should be the greatest in sessions with the largest behavioral difference. However, plotting the mean UTL rate [mean Δ(P − N)] from sessions with the largest saccade latency difference in the two reward blocks revealed that this was not the case—sessions with larger behavioral difference did not lead to greater UTL in the high-reward condition (Fig. 3c, bottom). Instead, the data showed a slight, nonsignificant trend in the opposite direction (p = 0.46, t test).
Second, owing to daily variation in the animal's thirst, we observed substantial day-to-day variation in the animal's mean motivational state, as measured by its mean saccade latency on each day, collapsed across the two reward blocks (Fig. 3d, top). Leveraging this day-to-day variation in motivational state, we asked whether a higher motivational state on a given day led to a higher (or lower) rate of neuronal tolerance learning on that day (relative to other days). However, when we plotted the rate of UTL as a function of the animal's mean saccade latency, we found no such relationship (Fig. 3d, bottom; slope = 0.05, p = 0.56).
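The session-level analysis above amounts to a simple linear regression of learning rate on a motivational proxy. A minimal sketch of that computation is below; all numbers are synthetic and illustrative, not the study's data, and the variable names are our own.

```python
# Minimal sketch of the session-level regression: does the daily motivational
# proxy (mean saccade latency) predict the daily UTL rate? All numbers here
# are synthetic and illustrative, not the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sessions = 25
mean_saccade_latency_ms = rng.normal(180.0, 15.0, n_sessions)  # per-session proxy
utl_rate = rng.normal(-12.0, 4.0, n_sessions)  # delta(P - N) per 800 exposures

# Ordinary least squares; a near-zero slope with a large p value indicates
# no detectable dependence of learning rate on motivational state.
result = stats.linregress(mean_saccade_latency_ms, utl_rate)
```

A null result in such a regression (as in Fig. 3d) is, of course, only as powerful as the data allow, which motivates the explicit power analysis that follows.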
Third, we evaluated the statistical power of our dataset using a bootstrap analysis. By resampling the raw response data from the test phases with replacement, we determined the confidence interval, due to trial-by-trial response variability, around the ratio of the measured effect sizes in Δ(P − N) between the high- and low-reward conditions (Fig. 2c, R1/R2). That analysis showed that our experiment should have detected (i.e., with 95% probability) a modulation of UTL by reward magnitude had that modulation fallen outside the range of 59–173%. In sum, these results argue that UTL is not substantially gated by reward magnitude.
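A percentile bootstrap of this kind can be sketched as follows. The per-site effect values here are hypothetical stand-ins for the measured Δ(P − N) data, and resampling units (sites vs. trials) are a simplifying assumption of the sketch.

```python
# Illustrative bootstrap of the ratio of two mean effect sizes. The per-site
# values below are hypothetical stand-ins for the measured delta(P - N) effects;
# the percentile method is used to form the 95% confidence interval.
import numpy as np

rng = np.random.default_rng(1)
effect_r1 = rng.normal(-12.1, 6.0, 40)   # hypothetical per-site effects, high reward
effect_r2 = rng.normal(-11.7, 6.0, 40)   # hypothetical per-site effects, low reward

n_boot = 10_000
ratios = np.empty(n_boot)
for i in range(n_boot):
    # Resample sites with replacement within each condition, then recompute R1/R2.
    m1 = rng.choice(effect_r1, size=effect_r1.size, replace=True).mean()
    m2 = rng.choice(effect_r2, size=effect_r2.size, replace=True).mean()
    ratios[i] = m1 / m2

lo, hi = np.percentile(ratios, [2.5, 97.5])  # 95% CI on the effect-size ratio
```

If the resulting interval comfortably contains 1 (equal effect sizes), the data cannot distinguish the two reward conditions; the width of the interval bounds the modulation the experiment could have detected.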
Next, we compared the rate of UTL in the present study with that in our previous work. If the magnitude of reward does not gate UTL, we expect the same rate of UTL whenever we provide the animal with the same kind of visual experience (note that UTL rate is defined as selectivity change per exposure event; Fig. 1a). To test this prediction, we reanalyzed our previously published data, which were collected under the same kind of temporal contiguity experience, with each exposure followed by a fixed amount of reward approximating the low-reward condition used here (Fig. 2b, condition R0) (Li and DiCarlo, 2010). We found that the rate of UTL induced in the present study is nearly identical to that found in our previous work. When quantified as Δ(P − N) per 800 exposures, the mean rate of selectivity change was remarkably similar across all conditions (Fig. 2c, R0, R1, R2; p = 0.99, one-way ANOVA).
Reward timing does not affect UTL
In the animals' visual experience described so far, and in all of our previous work on UTL (Li and DiCarlo, 2010), reward was always delivered immediately at the end of each visual exposure event. That is, all exposure events were tightly temporally coupled with reward. Dopaminergic neurons can be activated by stimuli predictive of an upcoming reward, and this modulation can be short-lived (200–300 ms) (Schultz, 2007; Bromberg-Martin et al., 2010). Thus, we considered the possibility that a tight temporal contingency between reward and each visual experience event is critical in allowing those visual events to drive UTL. Because all previous work on UTL implicitly engaged the animals with the object images to be learned (the animals had no task other than to spontaneously look at the object images and then receive a reward), we also wondered whether UTL would still occur if the animals were engaged in an explicit task unrelated to the object images.
To examine these two related possibilities in a single experiment, we trained animals to perform an orthogonal visual foraging task while the key visual exposure events were delivered at times unrelated to the times of reward delivery. Specifically, free-viewing animals searched for a hidden reward randomly placed under 1 of 16 dots on the computer screen; reward was given only upon foveation of the baited dot (Fig. 4a; Materials and Methods). Behaviorally, the animals were highly engaged in the task: they spent the majority of their time looking from dot to dot (Fig. 4a), the majority of their fixations landed on or near one of the dots (Fig. 4c), and they successfully located the reward in >95% of the trials. By monitoring the animals' eye position in real time, we delivered the exposure events during brief periods of natural fixation (i.e., a retinal exposure event began as the animals' gaze arrived at one of the dots; Fig. 4a) while the animals performed this task (Fig. 4b; see Materials and Methods). Critically, the fixation intervals used for visual exposure delivery were selected at random, independently of the animals' progress in finding the hidden reward (see Materials and Methods). This strongly disrupted the tight temporal contingency between reward and the UTL-driving exposure events that was present in all previous work (Fig. 4d), and it also made the exposure events completely irrelevant to the animals' acquisition of reward.
Importantly, we kept the type and number of object retinal exposure events virtually identical to those in the other reward conditions, in which reward was tightly temporally coupled (R0, R1, R2). Thus, we could directly compare the rate of UTL induced in this condition (R3) with that produced in those conditions (Fig. 2b). We found that the rate of UTL in this condition was indistinguishable from that found in all three previously tested conditions (Fig. 2c; p = 0.99, one-way ANOVA). Bootstrap analysis showed that our data could rule out any difference in Δ(P − N) >3.8 spikes/s per 800 exposures with 95% confidence (Fig. 2c; R3 vs R0, R1, R2 pooled; ∼32% modulation in the learning rate). Therefore, we conclude that neither a tight temporal contingency of reward nor task relevance of the object images is required for visual experience to induce UTL.
Discussion
The objective of this study was to examine a form of visual experience-induced neuronal learning (UTL) that appears to support the ventral visual stream's tolerance to identity-preserving image variation (Cox et al., 2005; Li and DiCarlo, 2008, 2010). The results reported here replicate those previous findings by showing that the temporal contiguity of visual experience can modify IT tolerance in qualitative agreement with temporal contiguity (also termed “slowness”) theories of invariance learning (Foldiak, 1991; Stryker, 1991; Wallis and Rolls, 1997; Wiskott and Sejnowski, 2002; Wyss et al., 2006; Masquelier and Thorpe, 2007). The main contribution of this study is the finding that the rate of experience-induced learning is unaffected by strong changes in the animals' reward state (Fig. 2) or by strong changes in the temporal relationship of reward to visual experience. Moreover, the visual experience produced the same rate of learning even when the animals performed a foraging task that did not require engagement with the visual stimuli.
While all of our experiments and analyses revealed no effect of external reward on tolerance learning in the ventral stream, it could still be argued that appropriate setting of internal state variables (e.g., dopamine activity patterns) related to reward “state” is somehow required to gate that learning, and that our experiments simply did not explore the appropriate operating range to expose that hypothetical gating. For example, one might argue that “rewards” in the natural world are much smaller, much larger, or more temporally dispersed than we were able to achieve in the laboratory with water-restricted animals. While experimental results can never, in principle, dismiss all variants of this hypothesis, the practical goal is to explore the relevant operating range of the system. In that regard, we note that the magnitude and rate of reward delivery used in our study, as well as the animals' measurable behavioral attributes (i.e., mean saccade latencies), are in close agreement with other studies that involved reward manipulations, and those studies reported similar modulation of behavioral attributes by reward, as well as modulation of neuronal activity in other brain regions (Hikosaka, 2007; Bromberg-Martin et al., 2010; Frankó et al., 2010). Nevertheless, future studies might resolve any remaining uncertainty by directly measuring reward-associated neuromodulatory signals in vivo (e.g., voltammetrically recorded dopamine concentration) (Schultz, 2007) in freely behaving animals in husbandry settings. Such data could then be used to establish the true natural range of reward-related signals in the brain and to determine the relationship of that operating range to what can be achieved in laboratory settings similar to ours. Alternatively, one could attempt to study UTL directly “in the wild,” but this would require clever control and/or monitoring of the animals' visual experience.
Our results here, taken in the context of earlier results (Li and DiCarlo, 2008, 2010), show that the tolerance learning observed in IT is driven by the temporal contiguity of visual experience and that this learning is fully unsupervised in that it (thus far) shows no requirement for external supervision. That is, the learning does not require explicit labels of what to learn (e.g., no specific reinforcement signals such as rewards for particular object image associations). Similarly, it does not require the subject to be in a highly rewarding environment, and it shows no dependence on the animals' motivational state (Fig. 2). Thus, we speculate that the UTL reported here and in previous studies (Li and DiCarlo, 2008, 2010) reflects general, autonomous sensory cortical plasticity mechanisms. More concretely, it appears that the ventral visual cortical stream uses such mechanisms to constantly leverage all visual experience to help build its highly tolerant object representation—that is, a representation that supports object identification and categorization with simple decoding schemes and few labeled training examples (DiCarlo and Cox, 2007). Indeed, previous computational work has shown that autonomous extraction of slowly varying features from unsupervised visual input can effectively create tolerant object representations (Wallis and Rolls, 1997; Wiskott and Sejnowski, 2002; Wyss et al., 2006; Masquelier and Thorpe, 2007), such that a simple supervised stage applied to the learned representation can achieve high performance on demanding visual tasks (Legenstein et al., 2010).
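The “slowness” principle invoked above (Wiskott and Sejnowski, 2002) can be illustrated with a toy linear slow feature analysis: given observations that mix a slowly varying latent (analogous to object identity) with a quickly varying one (analogous to retinal view), minimizing the variance of the temporal derivative under a unit-variance constraint recovers the slow latent. The signals and parameters below are entirely made up for illustration, not the authors' analysis.

```python
# Toy linear slow feature analysis (SFA), sketched with synthetic signals.
# "slow" stands in for a temporally stable latent (e.g., object identity);
# "fast" stands in for a quickly varying latent (e.g., retinal view).
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
n = 5000
t = np.arange(n)
slow = np.sin(2 * np.pi * t / 500)        # slowly varying latent
fast = rng.normal(size=n)                 # quickly varying nuisance
X = np.column_stack([slow + 0.1 * fast, fast + 0.1 * slow])  # mixed observations

X = X - X.mean(axis=0)
dX = np.diff(X, axis=0)
A = dX.T @ dX / len(dX)     # covariance of temporal differences ("speed")
B = X.T @ X / len(X)        # covariance of the signal (variance constraint)
evals, evecs = eigh(A, B)   # generalized eigenproblem; smallest eigenvalue = slowest
w_slow = evecs[:, 0]
y = X @ w_slow              # extracted slowest feature

# The recovered feature tracks the slow latent, despite never being told
# which latent was "object identity"—the temporal statistics alone suffice.
corr = abs(np.corrcoef(y, slow)[0, 1])
```

The analogy to UTL is only schematic—the cortical mechanism is unknown—but it shows how temporal statistics alone, with no reward or labels, can instruct a tolerant feature.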
Beyond informing the neuronal mechanisms that underlie invariant object representation in the cortical ventral stream, it is worth considering how the results reported here relate to our understanding of learning at the level of sensory perception. A few perceptual learning studies have investigated the role of reward and have reported that verbal feedback and experience-contingent reward can enhance or gate perceptual learning in human subjects (Seitz et al., 2009; Shibata et al., 2009). However, such studies could not directly investigate the neuronal changes in sensory cortices. Our group previously showed position tolerance learning in human subjects' object perception under conditions analogous to those used here for size tolerance (Cox et al., 2005), and related human tolerance learning results have been reported by others (Wallis and Bülthoff, 2001; Wallis et al., 2009). It is noteworthy that our human study failed to induce such learning when visual experience was delivered passively to fixating subjects (Cox et al., 2005), which suggests a role for some internal state variable (e.g., “attention,” “arousal,” “reward”) in gating tolerance learning. That finding does not dovetail with the IT neuronal results reported here, which show that IT learning is robust across a wide range of reward and some task conditions. Many differences between the two studies might explain the apparent discrepancy, including differences in visual stimuli and in tasks. Moreover, a failure to detect learning in human subjects under some conditions (Cox et al., 2005) does not mean that such learning does not exist at the neuronal level. Nevertheless, the existing psychophysical results suggest caution in fully generalizing the neuronal results reported here.
Most importantly, the link between the IT neuronal learning reported here and human perception remains mysterious, and a key direction going forward is combined psychophysical and neuronal studies in monkeys.
Direct neuronal investigations of learning in sensory cortices have revealed a wide range of results on the role of reward—some studies reported that reward-related manipulations can gate various forms of associative learning (Sakai and Miyashita, 1991; Messinger et al., 2001) and sensory learning (Bao et al., 2001; Froemke et al., 2007; Law and Gold, 2009; Frankó et al., 2010), while others induced learning in the absence of reward (Yao and Dan, 2001; Li et al., 2008). This range of effects may not be easily reconciled with a single learning mechanism; rather, different types of learning may be at play. Because the instructions for some types of sensory learning cannot be obtained from the sensory experience alone (e.g., learning categorical groupings of unrelated stimuli, or learning to privilege one class of stimuli among many equally exposed ones), these types of learning likely require feedback of some kind, such as reward (Bao et al., 2001; Froemke et al., 2007; Frankó et al., 2010). Other types of learning rely on statistical regularities in the sensory input to shape neuronal representations (Yao and Dan, 2001; Li et al., 2008). Our results suggest that UTL belongs in the latter category, in that its instruction arises purely from the temporal contiguity of the visual experience itself. UTL is closely related to the previously reported “paired associate” learning (Miyashita, 1988; Messinger et al., 2001). Indeed, the fact that some studies have reported paired associate learning in the absence of reward (Miyashita, 1988; Erickson and Desimone, 1999) suggests that the two phenomena may share underlying mechanisms. Furthermore, exposure to temporal pairing of visual stimuli can induce UTL-like neuronal learning in the primary visual cortex of the anesthetized cat (Yao and Dan, 2001).
These accumulating and converging results point to neuronal mechanisms that speak to how the ventral stream may assemble its powerful adult object representations at all levels of visual cortical processing during visual development.
Footnotes
This work was supported by the National Institutes of Health (Grant R01-EY014970 and its American Recovery and Reinvestment Act supplement, National Research Service Award 1F31EY020057 to N.L.) and The McKnight Endowment Fund for Neuroscience. We thank N. Majaj, E. Issa, and A. Afraz for valuable discussion and comments on this work; and K. Schmidt, C. Stawarz, and R. Marini for technical support.
References
- Bao S, Chan VT, Merzenich MM. Cortical remodelling induced by activity of ventral tegmental dopamine neurons. Nature. 2001;412:79–83. doi: 10.1038/35083586.
- Bromberg-Martin ES, Matsumoto M, Nakahara H, Hikosaka O. Multiple timescales of memory in lateral habenula and dopamine neurons. Neuron. 2010;67:499–510. doi: 10.1016/j.neuron.2010.06.031.
- Cox DD, Meier P, Oertelt N, DiCarlo JJ. “Breaking” position-invariant object recognition. Nat Neurosci. 2005;8:1145–1147. doi: 10.1038/nn1519.
- DiCarlo JJ, Cox DD. Untangling invariant object recognition. Trends Cogn Sci. 2007;11:333–341. doi: 10.1016/j.tics.2007.06.010.
- DiCarlo JJ, Maunsell JH. Form representation in monkey inferotemporal cortex is virtually unaltered by free viewing. Nat Neurosci. 2000;3:814–821. doi: 10.1038/77722.
- Erickson CA, Desimone R. Responses of macaque perirhinal neurons during and after visual stimulus association learning. J Neurosci. 1999;19:10404–10416. doi: 10.1523/JNEUROSCI.19-23-10404.1999.
- Foldiak P. Learning invariance from transformation sequences. Neural Comput. 1991;3:194–200. doi: 10.1162/neco.1991.3.2.194.
- Frankó E, Seitz AR, Vogels R. Dissociable neural effects of long-term stimulus-reward pairing in Macaque visual cortex. J Cogn Neurosci. 2010;22:1425–1439. doi: 10.1162/jocn.2009.21288.
- Froemke RC, Merzenich MM, Schreiner CE. A synaptic memory trace for cortical receptive field plasticity. Nature. 2007;450:425–429. doi: 10.1038/nature06289.
- Hikosaka O. Basal ganglia mechanisms of reward-oriented eye movement. Ann N Y Acad Sci. 2007;1104:229–249. doi: 10.1196/annals.1390.012.
- Hung CP, Kreiman G, Poggio T, DiCarlo JJ. Fast readout of object identity from macaque inferior temporal cortex. Science. 2005;310:863–866. doi: 10.1126/science.1117593.
- Law CT, Gold JI. Reinforcement learning can account for associative and perceptual learning on a visual-decision task. Nat Neurosci. 2009;12:655–663. doi: 10.1038/nn.2304.
- Legenstein R, Wilbert N, Wiskott L. Reinforcement learning on slow features of high-dimensional input streams. PLoS Comput Biol. 2010;6:e1000894. doi: 10.1371/journal.pcbi.1000894.
- Li N, DiCarlo JJ. Unsupervised natural experience rapidly alters invariant object representation in visual cortex. Science. 2008;321:1502–1507. doi: 10.1126/science.1160028.
- Li N, DiCarlo JJ. Unsupervised natural visual experience rapidly reshapes size-invariant object representation in inferior temporal cortex. Neuron. 2010;67:1062–1075. doi: 10.1016/j.neuron.2010.08.029.
- Li N, Cox DD, Zoccolan D, DiCarlo JJ. What response properties do individual neurons need to underlie position and clutter “invariant” object recognition? J Neurophysiol. 2009;102:360–376. doi: 10.1152/jn.90745.2008.
- Li Y, Van Hooser SD, Mazurek M, White LE, Fitzpatrick D. Experience with moving visual stimuli drives the early development of cortical direction selectivity. Nature. 2008;456:952–956. doi: 10.1038/nature07417.
- Logothetis NK, Sheinberg DL. Visual object recognition. Annu Rev Neurosci. 1996;19:577–621. doi: 10.1146/annurev.ne.19.030196.003045.
- Masquelier T, Thorpe SJ. Unsupervised learning of visual features through spike timing dependent plasticity. PLoS Comput Biol. 2007;3:e31. doi: 10.1371/journal.pcbi.0030031.
- Messinger A, Squire LR, Zola SM, Albright TD. Neuronal representations of stimulus associations develop in the temporal lobe during learning. Proc Natl Acad Sci U S A. 2001;98:12239–12244. doi: 10.1073/pnas.211431098.
- Miyashita Y. Neuronal correlate of visual associative long-term memory in the primate visual cortex. Nature. 1988;335:817–820. doi: 10.1038/335817a0.
- Robinson DA. A method of measuring eye movements using a scleral search coil in a magnetic field. IEEE Trans Biomed Eng. 1963;10:137–145. doi: 10.1109/tbmel.1963.4322822.
- Sakai K, Miyashita Y. Neural organization for the long-term memory of paired associates. Nature. 1991;354:152–155. doi: 10.1038/354152a0.
- Schultz W. Multiple dopamine functions at different time courses. Annu Rev Neurosci. 2007;30:259–288. doi: 10.1146/annurev.neuro.28.061604.135722.
- Seitz AR, Kim D, Watanabe T. Rewards evoke learning of unconsciously processed visual stimuli in adult humans. Neuron. 2009;61:700–707. doi: 10.1016/j.neuron.2009.01.016.
- Shibata K, Yamagishi N, Ishii S, Kawato M. Boosting perceptual learning by fake feedback. Vision Res. 2009;49:2574–2585. doi: 10.1016/j.visres.2009.06.009.
- Stryker MP. Neurobiology. Temporal associations. Nature. 1991;354:108–109. doi: 10.1038/354108d0.
- Tanaka K. Inferotemporal cortex and object vision. Annu Rev Neurosci. 1996;19:109–139. doi: 10.1146/annurev.ne.19.030196.000545.
- Vogels R, Orban GA. Coding of stimulus invariances by inferior temporal neurons. Prog Brain Res. 1996;112:195–211. doi: 10.1016/s0079-6123(08)63330-0.
- Wallis G, Bülthoff HH. Effects of temporal association on recognition memory. Proc Natl Acad Sci U S A. 2001;98:4800–4804. doi: 10.1073/pnas.071028598.
- Wallis G, Rolls ET. Invariant face and object recognition in the visual system. Prog Neurobiol. 1997;51:167–194. doi: 10.1016/s0301-0082(96)00054-8.
- Wallis G, Backus BT, Langer M, Huebner G, Bülthoff H. Learning illumination- and orientation-invariant representations of objects through temporal association. J Vis. 2009;9:6. doi: 10.1167/9.7.6.
- Wiskott L, Sejnowski TJ. Slow feature analysis: unsupervised learning of invariances. Neural Comput. 2002;14:715–770. doi: 10.1162/089976602317318938.
- Wyss R, König P, Verschure PF. A model of the ventral visual system based on temporal stability and local memory. PLoS Biol. 2006;4:e120. doi: 10.1371/journal.pbio.0040120.
- Yao H, Dan Y. Stimulus timing-dependent plasticity in cortical processing of orientation. Neuron. 2001;32:315–323. doi: 10.1016/s0896-6273(01)00460-3.