Abstract
We perceive objects as permanent and stable despite frequent occlusions and eye movements, but their representation in the visual cortex is neither permanent nor stable. Feature selective cells respond only as long as objects are visible, and their responses depend on eye position. We explored the hypothesis that the system maintains object pointers that provide permanence and stability. Pointers should send facilitatory signals to the feature cells of an object, and these signals should persist across temporary occlusions and remap to compensate for image displacements caused by saccades. Here, we searched for such signals in monkey areas V2 and V4 (Macaca mulatta). We developed a new paradigm in which a monkey freely inspects an array of objects in search of reward while some of the objects are temporarily occluded by opaque drifting strips. Two types of objects were used to manipulate attention. The results were as follows. 1) Eye movements indicated a robust representation of location and type of the occluded objects; 2) in neurons of V4, but not V2, occluded objects produced elevated activity relative to the blank condition; 3) the elevation of activity was reduced for objects that had been fixated immediately before the current fixation (‘inhibition of return’); and 4) when attended, or when the target of a saccade, visible objects produced enhanced responses in V4, but occluded objects produced no modulation. Although results 1–3 confirm the hypothesis, the absence of modulation under occlusion is not consistent with it. Further experiments are needed to resolve this discrepancy.
NEW & NOTEWORTHY The way we perceive objects as permanent contrasts with the short-lived responses of visual cortical neurons. A theory postulates pointers that give objects continuity, predicting a class of neurons that respond not only to visual objects but also when an occluded object moves into their receptive field. Here, we tested this theory with a novel paradigm in which a monkey freely scans an array of objects while some of them are transiently occluded.
Keywords: area V4, object permanence, remapping, visual cortex, visual organization
INTRODUCTION
Our eyes continually make saccadic movements several times per second, tossing the images around on the retinae. Despite these image fluctuations, we are not confused about where things are. Even when several objects move around independently, like students on a campus, we have no problem keeping track of multiple individuals at a time to avoid collisions despite temporary occlusions. Clearly, these amazing skills require some form of memory; the system must be able to relate what is visible now to what was visible a moment ago. This kind of memory has been demonstrated with various paradigms, such as the multiple-object tracking paradigm, showing that the system can keep track of specific objects in a crowd of moving, identical-looking objects (Pylyshyn and Storm 1988), and the double-step saccade task where the system, before executing a saccade to one object, already plans the next saccade to another object (Duhamel et al. 1992; Umeno and Goldberg 1997; Walker et al. 1995). The system computes where on the retina the other object will be after the first saccade and stores the result until executing the next saccade. The neural mechanisms underlying this advance programming have been clarified (see Goldberg et al. 2006 and Wurtz 2008 for reviews).
What we do not understand well is how objects can be stable in perception. For tracking and saccade planning the system merely needs to update locations in space. But objects are characterized by their features, and features are represented by neurons in the visual cortex that are known to have fixed receptive fields in retinal coordinates. The activity of these feature neurons refers to retinal locations, whereas perception locates objects in space, ignoring retinal location. Also, the activity of feature neurons does not afford memory, because it is reset at every saccade (typically 3–4 times/s). So, how do we know what is where? Whereas previous theories postulated “remapping” of visual receptive fields to compensate for the retinal image displacements, a recent theory proposes that the system maintains object pointers, and what is being remapped is not the receptive fields but the pointers (Cavanagh et al. 2010). Specifically, it interprets the “remapping responses” of Duhamel et al. (1992) as attention pointer signals. This is an interesting conjecture that needs to be tested. In more general terms, we need to understand the neural basis of object permanence (Michotte 1950).
In this study, we searched for object pointer activity in the visual cortex. Stable object perception means continual access to object feature information across saccades. The neural mechanisms that lump feature signals together and enable object-based readout are likely the mechanisms of figure ground organization that modulate the visual responses in areas V1 and V2 (Bakin et al. 2000; Lamme 1995; Poort et al. 2012; Roelfsema et al. 1998; Zhou et al. 2000). The result of this organization process may be called a proto-object representation (Rensink 2000; von der Heydt 2015). But this representation is still retinotopic. For stable object perception, it must be combined with pointer mechanisms that link the proto-objects to locations in space. Thus, we are looking for neurons that project to the distributed feature neurons of objects and modulate their signals according to the pointer activity.
We hypothesized that such neurons might be found in area V4, because V4 is densely connected to V2 in both directions, with back projection axons spreading out several millimeters in V2 (Rockland et al. 1994), which is a prerequisite for accessing the distributed feature signals of an object (cf. models in Craft et al. 2007, Pooresmaeili and Roelfsema 2014, and Tsotsos 2011), and because V4 activity enhances contour grouping in V1 (Chen et al. 2014) and is itself enhanced by attention and saccade-related activity in the frontal eye fields (Moore and Armstrong 2003; Noudoost et al. 2010). However, because most V4 neurons represent basic visual features like orientation and curvature of contours (Pasupathy and Connor 1999), only a fraction of V4 neurons is likely to mediate pointer activity.
To distinguish neurons that mediate pointer activity from ordinary feature selective neurons, we use predictions from the theory. 1) Whereas visual feature responses cease immediately when the stimulating object is occluded, object pointer activity should persist; 2) neurons targeted by an object pointer should be activated not only by the visual object but also when the occluded object moves into their receptive field, that is, without any visual stimulation; and 3) the occluded object activity should be enhanced when the object is attended.
We tested these predictions with a novel paradigm in which monkeys make saccades across an array of objects in search for reward while some of the objects are briefly occluded. We call this the “object permanence paradigm.” We reasoned that, when engaged in such a task, the system would be likely to set up internal representations of the objects and use the hypothetical pointer mechanism to guide the eyes to the objects of interest, whether visible or occluded. We recorded single-cell activity from V4, and V2 for comparison, and analyzed the responses with respect to the monkey’s fixations and saccades. To derive specific predictions, we first need to sketch the theory of object pointers in some detail. We will then describe the experiment and compare the results with each of the predictions.
Theory and Predictions
The term remapping has been used in different studies with different meanings. We will explain the computational goal of remapping, the concept of object pointers, and the mechanism of remapping the pointers.
Primates make frequent saccades because their eyes have high resolution only near the center of gaze. With saccades, they can explore the entire visual scene with high resolution. But this creates the problem of integrating information across saccades. In particular, for object-based tasks the system needs to be able to integrate information from the same object sampled during several fixations. This is by no means trivial; the system has to combine the high-resolution foveal pattern of signals with peripheral patterns of much lower resolution. It needs to discern and individuate objects in the scene and integrate information by object, possibly for several objects simultaneously. Thus, producing stable perception involves much more than just providing subjective stability.
There are two fundamentally different ideas of how integration by objects might be achieved. One assumes that the afferent signals are “remapped” at each saccade (Fig. 1A), and the other postulates pointers and assumes that the pointers are remapped (Fig. 1B). In both cases, the remapping requires some kind of shifter circuit (Anderson and Van Essen 1987; Olshausen et al. 1993) that is activated by a copy of the eye movement control signal (“efference copy” or “corollary discharge”), but the outcome of the shifting operation is quite different: In scheme A, it is a stabilized representation of visual features from which the system can select, for example, by directing a top-down attention signal. Low- and high-resolution signals are combined spatio-topically in the stabilized representation. Galletti et al. (1993) found neurons in parietal cortex that had spatio-topic receptive fields (at least in the fixed-head condition), and some of them were orientation selective, suggesting a stabilized feature representation as in scheme A. But the most detailed visual representation is found in areas V1 and V2, and each of these areas contains several hundred million neurons signaling visual features retinotopically. The sheer number of signals makes it seem unlikely that the system uses scheme A to achieve stabilization; the shifter circuit would have to be gigantic. In scheme B, the system uses a feature representation that shifts with every saccade, as it does in V1 and V2, and the shifter circuit moves the selection gate(s) correspondingly, using the efference copy to predict the shifts. In this scheme, the system is able to keep attention on the object of interest despite continual saccadic movements. Also, assuming that the system maintains several pointers, it can select one out of several objects by directing the attention signal to the corresponding pointer. In this scheme, low- and high-resolution signals are in different locations of the representation and need to be integrated at the higher processing stages. This would require some translation invariance, which is actually common in shape-selective neurons in inferotemporal cortex (Connor et al. 2007). Thus, although shifter circuits occur in both schemes, their functions are different and their envisioned sizes massively different.
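To make the computational contrast concrete, the following toy sketch (our illustration, not a model from the literature; the map size, object position, and saccade vector are arbitrary assumptions) shows why remapping a few pointers (scheme B) is far cheaper than remapping the entire feature representation (scheme A):

```python
import numpy as np

N = 1000                       # toy retinotopic feature map (V1/V2 have ~1e8 cells)
feature_map = np.zeros(N)
feature_map[600] = 1.0         # an object imaged at retinal position 600
shift = -250                   # efference copy: the saccade displaces the image by -250

# Scheme A: shift the whole feature representation into a stabilized map.
# The cost scales with the size of the entire feature representation.
stabilized = np.roll(feature_map, shift)

# Scheme B: leave the feature map retinotopic and shift only the pointer(s).
pointers = np.array([600])             # at most ~4 pointers (Pylyshyn and Storm 1988)
pointers_after = pointers + shift      # remapped pointer now selects position 350

# After the saccade, the retinotopic map itself has shifted, so the remapped
# pointer still indexes the same object:
feature_map_after = np.roll(feature_map, shift)
assert feature_map_after[pointers_after[0]] == 1.0
```

The point is only one of scale: in scheme B the shifter circuit operates on a handful of pointer coordinates, whereas in scheme A it would have to move every feature signal.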
The term “receptive field remapping” suggests a stabilization mechanism of the kind shown in Fig. 1A, but in fact, its original use referred to the observation that some neurons that have a retinotopic receptive field (RF) can also be activated, for a limited time before a saccade, from the location where RF will be after the saccade (Duhamel et al. 1992; Umeno and Goldberg 1997; Walker et al. 1995). In keeping with the receptive field concept, authors often refer to this location as the “future receptive field” (FF), as if RF would move to a new location. However, the “remapping” is only transient; after execution of the saccade, all excitability returns to the original RF. As shown in Fig. 2, these observations can be explained more easily as remapping of a pointer. The stimulus in FF activates a pointer cell whose activity flows down to the recorded cell when the connections are remapped. Thus, in this interpretation, the activation from FF is not a visual response but the result of remapping a pointer (Cavanagh et al. 2010).
The ultimate goal of remapping is to provide continuity of object representation for visual processing as well as for guiding eye movements. When an object is displayed as if moving behind an occluding object and then reappearing “on the other side,” it is perceived as the same object (Michotte 1950), and when an object simply disappears from the display and, within a short time, another object appears in a different location, the sequence is interpreted as one moving object. Importantly, even when the two display objects differ in shape, color, or size, the impression of seeing one single object is still compelling (Kolers and von Grünau 1976). Thus, spatio-temporal continuity, not the visual attributes, determines what is perceived as the same. All these observations show that there is a neural structure that tracks sameness and provides “hooks” for carrying the various attributes. Kahneman et al. (1992) call this structure an “object file,” and Pylyshyn and Storm (1988) introduced the idea of object pointers that provide perceptual continuity, showing that the system can maintain about four independent pointers simultaneously.
A first roadblock in understanding the pointer scheme is that it requires an object representation before object-based processing can start. A common assumption is that object representation involves object recognition. Can an object be represented before it is recognized? The clue to this puzzle was provided by the discovery of border ownership selective neurons in the visual cortex (Zhou et al. 2000). When such a neuron is stimulated by the contour of a figure, such as a square, the strength of its responses differs, depending on the side on which the figure is located; with the identical contour in the RF, a figure on one side produces a stronger response than a figure on the other side. Even when the stimulus displays are identical over a large region around the RF, the responses differ. Despite its small RF, the neuron seems to know where the figure is located. A large proportion of neurons in V2/V1 show this selectivity, and many of these also show side-of-object selectivity in images of natural scenes (Williford and von der Heydt 2016). And what is important is that border ownership selectivity emerges long before shape selectivity in inferotemporal cortex, the region that computes and represents object attributes like shape and color: 70 ms (Sugihara et al. 2011) compared with 120 ms (Brincat and Connor 2006). Thus, the brain defines objects before processing them in detail.
To explain the early onset of border ownership selectivity, Craft et al. (2007) proposed a model in which “grouping cells” (G cells) detect possible objects based on simple Gestalt criteria such as compact shape. In their model, the G cells have fixed templates that sum the responses of edge-selective cells with roughly co-circular arrangement of RFs. Each G cell also feeds back to the same edge cells. The feedback connections are modulatory; they enhance or suppress responses evoked from the RF but do not produce responses by themselves (Zhang and von der Heydt 2010). Thus, when a G cell is strongly activated by a configuration of edges, the representation of these edges will be enhanced. For this to be fast, G cells must reside in a higher-level area (e.g., V4) so that the feedforward and feedback signals travel through white matter fibers, which are much faster than “horizontal fibers” within V1/V2. This simple model explains a range of findings on border ownership selectivity (Sugihara et al. 2011; Zhang and von der Heydt 2010; Zhou et al. 2000) and even the results on natural scenes (Hu et al. 2019; Williford and von der Heydt 2016). Jehee et al. (2007) proposed another feedforward-feedback model comprising a hierarchy of visual areas. The actual neural mechanisms are a bit more sophisticated, making use of a number of specific figure ground cues (Qiu and von der Heydt 2005; von der Heydt and Zhang 2018).
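The core of the grouping idea can be stated in a few lines. The sketch below is our own minimal illustration of the two operations attributed to G cells, a bottom-up weighted sum over edge responses under a fixed template and a multiplicative, modulatory feedback; the weights, gain, and toy stimulus are assumptions, not parameters of the Craft et al. (2007) model:

```python
import numpy as np

def g_cell_activity(edge_resp, template_w):
    """Bottom-up: weighted sum of edge responses under the G cell's fixed template."""
    return float(np.dot(template_w, edge_resp))

def modulated_edges(edge_resp, template_w, g_act, gain=0.5):
    """Top-down: multiplicative feedback; a zero edge response stays zero."""
    return edge_resp * (1.0 + gain * g_act * template_w)

# Four edge cells around a putative object center; equal template weights stand in
# for the roughly co-circular arrangement of their receptive fields.
edges    = np.array([1.0, 1.0, 1.0, 0.0])   # three contour segments visible
template = np.array([0.25, 0.25, 0.25, 0.25])

g = g_cell_activity(edges, template)         # 0.75: a partial figure is still grouped
print(modulated_edges(edges, template, g))   # visible edges enhanced, blank input stays 0
```

Because the feedback is multiplicative, a G cell can enhance the edge signals of its object but cannot create activity where there is no afferent input, which is the property exploited in the occlusion experiments below.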
The postulate that a single G cell feeds back to multiple feature cells leads to the unlikely prediction that there should be increased spiking synchrony between pairs of border ownership selective neurons, even between pairs that are widely separated in cortex, if both are stimulated by the same object, but not if stimulated by different objects. This is because, in the case of stimulation by the same object, the G cells projecting to both neurons send spike trains to both of them, and the common spike inputs produce synchrony. But when the two neurons are stimulated by different objects, different G cells are activated and the two neurons do not receive common spike inputs. This is exactly what Anne Martin found when recording with two separate electrodes from area V2 (Martin and von der Heydt 2015; also see Wagatsuma et al. 2016 for model analysis of synchrony). The result shows that distant neurons in V2 indeed receive common input from single neurons, just like in the model edge cells receive feedback from G cells.
How does object-based attention relate to the early grouping mechanisms? Wannig et al. (2011) found that attentional response modulation of V1 neurons spreads along visual structure as defined by Gestalt criteria. A likely vehicle for such top-down modulation are G cells, as in the border ownership model, because their feedback to feature cells of V1/V2 is modulatory. By activating a single G cell, a top-down attention signal can enhance a large number of edge responses corresponding to the contours of the object of interest (for a model simulation, see Mihalas et al. 2011; see also Pooresmaeili and Roelfsema 2014 for a related model). This leads to the prediction that the attention effect in border ownership selective cells should be asymmetric, because these cells are connected asymmetrically to G cells having templates on one side (this is what makes them border ownership selective). This asymmetry was actually observed; attention to an object on the preferred side of border ownership produced enhancement, whereas attention to an object on the nonpreferred side produced suppression (Qiu et al. 2007). Thus, the grouping mechanisms provide the structure for object-based attentional selection, whereby the feedback connections of G cells serve to select the distributed feature signals representing an object.
The grouping circuits organize the feature representation into entities that may be called proto-objects (von der Heydt 2015). But proto-objects are retinotopic. A G cell represents a retinal location, and its bottom-up input changes with every saccade. Object continuity requires a pointer that persists and can be remapped. One surprising result, indeed, the finding that inspired the present study, was the discovery of a short-term memory for border ownership (O’Herron and von der Heydt 2009, 2011): When a border ownership selective cell is stimulated by the contour of a square and the display is then reduced to this contour in isolation, making its ownership ambiguous, the modulation persists; that is, when the initial presentation is on the preferred side, the following response to the isolated contour is strong, but when initial presentation is on the other side, the following response is weak. The modulation reflecting the preceding border ownership persists for >1 s; it decays ∼20 times more slowly on average than the visual responses after removal of the RF stimulus (O’Herron and von der Heydt 2009; also see Gillary et al. 2017 for the possible synaptic mechanism of persistence).
Short-term memory is of course a prerequisite for the function of object pointers. Indeed, in their study demonstrating remapping, Duhamel et al. (1992) found that, in some neurons of the lateral intraparietal cortex, the FF stimulus could be presented up to 1 s before a saccade and still produce activity at the time of the saccade. A trace of the stimulus must have persisted. In the model of Fig. 2 this is persisting activity in object pointer cells (OP). Upon discovering short-term memory for border ownership, O’Herron and von der Heydt (2013) asked whether that memory would also support remapping. They presented a square figure outside the RF of V2 neurons, reduced the display to show only one of its contours in isolation, and then had the monkey make a saccade that landed the RF on the isolated contour. At that moment, the neurons responded, and sure enough, the strength of the response depended on the side of ownership in the foregoing figure presentation (O’Herron and von der Heydt 2013). Thus, figure ground organization has memory and remaps across saccades. The next step was to combine the pointer remapping scheme with the grouping model.
Figure 3 shows a synthesis of the two schemes. It differs from the diagram of Fig. 2 in that the layer of visual cells (VIS) has been expanded into a layer of feature cells F (V1/V2) and a layer of G cells. The F cells have their RFs on the retina (red/green ellipses on shaded band). For the illustration, a square is depicted whose contours stimulate some of the F cells, which in turn activate G cell no. 1 (the main projection of the F cells, not shown here for simplicity, goes to higher-level processing regions like V4 and IT). The excitatory bottom-up projections to G cells are depicted with filled arrows and the modulatory top-down projections with open arrows. As before, the top and bottom diagrams illustrate the states of the model before and after a saccade. In this model, F cells show the remapping of border ownership (O’Herron and von der Heydt 2013), whereas G cells show the classical remapping response (Duhamel et al. 1992).
Below we will use model diagrams derived from Fig. 3 to explain every prediction tested in the neurophysiological experiments. Our goal was to discriminate whether our recordings are from F cells or from G cells. The two electrode symbols labeled h1 and h2 represent these two alternative hypotheses. We will omit the F cell layer in the following to simplify the figures, because only the predictions for G cells are of interest for the purpose of this paper.
MATERIALS AND METHODS
We studied neurons in the visual cortices of two male rhesus macaques (Macaca mulatta). All procedures conformed to National Institutes of Health and US Department of Agriculture guidelines as verified and approved by the Animal Care and Use Committee of Johns Hopkins University.
Preparation and Single-Cell Recording
Three small head posts for head fixation were implanted in the skull, and recording chambers were placed over the visual cortex of each hemisphere under general anesthesia.
Isolated neuronal activity was recorded extracellularly with glass-coated platinum-iridium microelectrodes (Pt-Ir 0.1 mm diameter, etched taper ∼0.1, impedance 3–9 MΩ at 1 kHz) that were inserted through the dura mater. A spike time detection system (Alpha Omega MSD 3.22) was used.
V4 neurons were recorded in the prelunate gyrus, and V2 neurons were recorded mostly in the lunate sulcus after passing through V1, but also in the lip of the post-lunate gyrus. The assignment of cells to areas is based on location of tracks, depth of recording, and physiological criteria such as topography and size of receptive fields.
Stimuli and Experimental Design
Stimulus display.
The stimuli were presented to the monkeys with a ViewSonic G220fb color monitor having the refresh rate set to 100 Hz and resolution to 1,600 × 1,200. The monitors were viewed at a distance of 1 m and subtended 21 × 16° of the visual field.
Recording of gaze direction.
The direction of gaze was recorded for one eye by corneal reflection tracking using an infrared video system (Arrington MCU400) that was aligned with the axis of the eye via an infrared-reflecting mirror. The system recorded direction of gaze with a resolution of 0.08 × 0.16° of visual angle at a sampling rate of 250 Hz.
Behavioral design.
Monkeys were trained to perform two kinds of task, a conventional fixation task and a free-viewing foraging task. The fixation task was used for exploring feature selectivity and mapping receptive fields of neurons. Monkeys were given a juice reward for keeping the eye position signal on a fixation point for a few seconds. A fixation period was initiated when gaze was in a predetermined fixation window of 1° radius for a duration of 300 ms. At this time, the first stimulus appeared. The animal was rewarded for keeping its gaze in the fixation window for a minimum duration of 2.2–2.5 s. After successful completion of a fixation trial, the display was blanked for an interval of 0.5–1.2 s. When fixation was broken, the trial was terminated and the following blank interval was increased by 1 s.
In the free-viewing task, a cue object was presented, and after the monkey acquired fixation and maintained it for 200–500 ms, the cue object was extinguished and an array of objects was presented (Fig. 4). The animal was then free to inspect the array while direction of gaze was monitored. Following Mazer and Gallant (2003), the array was constructed according to the eccentricity of the RF of the neuron under study so that fixation of one of the objects would in most cases land the RF on one of the other objects (object condition) and in the other cases land the RF on the uniform background (blank condition). One of the objects carried a reward; that is, when this object was fixated for 200 ms, a reward was delivered (otherwise there was no contingency on fixation time). The radius of the fixation windows was typically 1°, but for some neurons, especially when the receptive field was near the fovea, we used smaller windows to avoid having two objects inside the same window. For a few V4 neurons, we used a window of 2° when the objects were large.
The arrays consisted of two different types of objects distinguished either by shape or by color. An array contained a total of 10 objects, half of them matching the cue object (targets) and the others not matching (distracters). One of the targets was the reward object. While testing a neuron, the geometry of the array was fixed, but how target and distracter objects were distributed in the array, and which of the target objects carried the reward, was assigned at random before each trial, and the cue was reassigned every 20 trials. Thus, at the beginning of a trial, the cue informed the monkey to which group of objects to attend, and the monkey knew that fixating one of these five objects would produce the reward. After delivery of reward, or after 3.5–4 s of display, whichever came first, the array disappeared and the screen went blank for 0.5–0.8 s (intertrial interval), after which the next cue object appeared, enabling a new trial.
Test procedures.
Upon isolating a cell, its classical receptive field (CRF) was manually mapped with bars, drifting gratings, and/or rectangles, depending on the CRF properties of the neuron. Color and orientation were varied to determine the optimal bar stimulus. The manual mapping was typically confirmed by a position test in which a bar or grating was flashed at predetermined positions in randomized order. The orientation preference was tested in steps of 30°.
After these preliminary tests, the object array and a set of occluding strips were constructed, and recording was continued with the free-viewing foraging task (Fig. 4). The spacing of objects along the long axis of the array was made equal to the eccentricity of the RF (as required by the paradigm) and the spacing along the short axis ∼30% larger (to minimize interference). The width of the occluding strips and the open spaces approximately equaled the shorter array spacing, and their orientation was made orthogonal to the preferred orientation of the cell (if it was orientation selective). The speed of drift of the strips was chosen so that an object would be fully occluded for 200 ms. The cue object was placed outside the array at a position that randomly varied between trials.
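As a back-of-envelope illustration of how the drift speed follows from the desired occlusion time, consider a strip wider than the object; the specific widths below are hypothetical, not the values used in the experiments:

```python
def strip_speed(strip_width_deg, object_width_deg, occlusion_s=0.2):
    """Full occlusion lasts (strip width - object width) / speed,
    so solve for the speed that yields the desired occlusion duration."""
    assert strip_width_deg > object_width_deg
    return (strip_width_deg - object_width_deg) / occlusion_s   # deg/s

# e.g., a 3 deg strip over a 1.5 deg object would need to drift at 7.5 deg/s
print(strip_speed(3.0, 1.5))
```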
We used two variants of the paradigm that differed in the way the colors were assigned to objects, background, and occluding strips. In the first variant, the preferred color of the neuron (or white in the absence of color selectivity) was assigned to the objects and to the occluding strips and a medium gray to the background. This was so in half of the trials; in the other half, object and background colors were flipped. Thus, when a strip drifted over an object, the object disappeared, but the color in the object area did not change. In this version, all objects in a trial necessarily had to have the same color, and target and distracter objects were made distinct by different shapes (square and trapezoid). This variant was used in monkey WI and partly also in monkey HA. To be able to use objects of different colors for targets and distracters, we used a different design in part of the recordings in monkey HA in which the occluding strips were a fixed gray darker than the background. In this design, target and distracters differed either in color or in shape, but not both.
Data Analysis
Fixation and saccades.
To compare saccades guided by direct visual information with saccades produced from memory, we define “visible” and “occluded” by requiring that an object was visible/occluded from 150 ms before the onset of a saccade until 50 ms before the beginning of the next fixation (rather than only until the onset of the saccade, to exclude atypically long saccades during which the system might be able to acquire additional visual information). Thus, we classify a saccade as “visually guided” if the saccade goal was exposed throughout this interval and as “memory saccade” if the saccade goal was occluded throughout this interval. If the exposure changed during this interval, the saccade was excluded.
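A minimal sketch of this classification rule is given below; the representation of exposure as a list of (start, end) visibility intervals in milliseconds and all variable names are our assumptions, not the analysis code used in the study:

```python
def classify_saccade(saccade_onset, next_fixation_onset, visible_intervals):
    """Return 'visually_guided', 'memory', or None (excluded).
    visible_intervals: list of (start, end) times during which the goal object
    was visible, in the same time base (ms) as the event markers."""
    t0, t1 = saccade_onset - 150, next_fixation_onset - 50
    visible_all  = any(s <= t0 and t1 <= e for s, e in visible_intervals)
    occluded_all = all(e <= t0 or t1 <= s for s, e in visible_intervals)
    if visible_all:
        return "visually_guided"   # goal exposed throughout [t0, t1]
    if occluded_all:
        return "memory"            # goal occluded throughout [t0, t1]
    return None                    # exposure changed within the interval: exclude
```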
Firing rates.
The analysis of the data from the object permanence paradigm is complicated by the fact that the fixation periods are determined by the monkey, and only the object array and occluding strips are under experimental control. As a result, the periods of object occlusion are not synchronized to the fixation periods. We only analyzed the activity corresponding to fully visible and fully occluded objects. To ensure that an object classified as visible/occluded was fully in that state throughout a given fixation, we imposed the additional condition that the object had been fully visible/occluded ≥30 ms before the fixation onset (saccades were generally <30 ms). Only fixations with a clear history were included. We excluded 1) the first fixation of each foraging trial, 2) fixations on the same object as the previous one (most likely resulting from a correction saccade after fixation moved out of the detection window and then back in), and 3) fixations for which the preceding saccade had started >30 ms before fixation onset (atypically long saccades not covered by the 30-ms criterion). We counted spikes from 50 ms after fixation onset to the end of fixation or until 30 ms after the end of the visible/occluded period, whichever came first. Exactly the same counting scheme was applied for the blank condition (uniform background in the RF). Here, we distinguished between visible background and occluded background. In a separate analysis, we looked at the activity at the end of the fixation periods, aligning the spike trains to the onset of the next saccade. For this analysis, the visible and occluded conditions were defined by requiring that the object in the RF was visible/occluded ≥100 ms before the end of fixation, and spikes were counted from that time point until 50 ms after the onset of the saccade.
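The spike-counting rule for the fixation-onset analysis can be summarized as follows (a sketch under assumed variable names and millisecond time stamps, not the original analysis code):

```python
import numpy as np

def analysis_window(fix_on, fix_off, state_end):
    """Counting window: 50 ms after fixation onset until the end of fixation or
    30 ms after the end of the visible/occluded period, whichever comes first."""
    start = fix_on + 50
    end = min(fix_off, state_end + 30)
    return (start, end) if end > start else None   # None: no usable interval

def mean_rate(spike_times, window):
    """Mean firing rate (spikes/s) in the window; spike_times in ms."""
    start, end = window
    n = np.sum((np.asarray(spike_times) >= start) & (np.asarray(spike_times) < end))
    return 1000.0 * n / (end - start)
```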
We calculated the mean firing rate for each fixation by dividing the spike count by the length ta of the analysis interval and applied general linear models (GLMs) to the square root transformed mean firing rates of a neuron. A square root transform, also known as Anscombe transform, is often used for spike counts of visual cortical neurons, in which the variance tends to be proportional to the mean; it stabilizes the variance and makes the distribution approximately normal (Bartlett 1936). For a given length of the analysis interval ta, this holds also for the mean firing rates. For variable ta, preliminary analysis of our data showed that, in a GLM on the square root transformed mean firing rates, the residual variance tended to vary inversely with ta (varres = ta^−0.93 in the average across neurons). Therefore, to stabilize variances, we weighted the inputs to the GLM with ta^1/2 (assuming varres = ta^−1; this corresponds to weighting with the inverse of the residual standard deviations). Population results were derived by averaging the GLM coefficients across neurons, weighting each neuron with the inverse of its RMSE of the GLM of the visual object responses. Bar graphs represent means and 95% confidence limits according to t-test. Significance of the means compared with zero was determined by Wilcoxon signed-rank test (in Figs. 9, 10, and 13, *P < 0.05, **P < 0.01).
To determine whether occluded objects are represented in the neural activity, we compared the activity for the occluded object with the activity for the occluded blank region. For the analysis locked to the beginning of fixation we used the GLM:
(1) SQFR = β0 + β1·OBJ + β2·PREOBJ + β3·MASKDIR + β4·COL + ε
where the dependent variable is square root of firing rate (SQFR), and OBJ codes for the presence of an object (1 = object, 0 = blank). The other variables are covariates that absorb variance produced by unrelated factors, such as the decaying responses when there had been an object in the receptive field during the previous fixation (PREOBJ), the direction of movement of the occluding strips (MASKDIR), and color (COL; only in the first design, where the color of objects and occluding strips varied between background gray and the cell’s preferred color); ε is the error term.
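As an illustration, Eq. 1 could be fit per neuron roughly as sketched below, with the weighting described under Firing rates; the column names, the categorical coding of MASKDIR, and the use of statsmodels are our assumptions, not the software used in the study:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def fit_eq1(df: pd.DataFrame):
    """df: one row per included fixation, with columns rate (spikes/s), ta
    (length of the analysis interval), and OBJ, PREOBJ, MASKDIR, COL coded
    as described in the text."""
    df = df.assign(sqfr=np.sqrt(df["rate"]))
    # statsmodels WLS weights are inverse variances, so weights = ta is
    # equivalent to multiplying the inputs by ta**0.5 (assuming var_res ~ 1/ta).
    fit = smf.wls("sqfr ~ OBJ + PREOBJ + C(MASKDIR) + COL",
                  data=df, weights=df["ta"]).fit()
    return fit.params["OBJ"], fit.bse["OBJ"]   # occluded-object effect and its SE
```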
For the analysis locked to the end of fixation, the PREOBJ term was omitted:
(2) SQFR = β0 + β1·OBJ + β2·MASKDIR + β3·COL + ε
The same GLM (without PREOBJ) was also used for analyzing the subset of fixations for which the RF was in a blank region during the preceding fixation.
The influence of surround objects was estimated on all but the blank conditions:
(3) SQFR = β0 + β1·NSURR + ε
where NSURR is the mean number of visible surround objects (which depends on RF position in the array). This estimate was used to correct the effect of OBJ estimated according to Eq. 1.
Inhibition of return was estimated using
(4) SQFR = β0 + β1·FROMRF + β2·TYPE + ε
where FROMRF = 1 if the object in RF had just been fixated, and 0 otherwise, and TYPE is the type of object in RF; analysis interval is locked to the beginning of fixation. Because fixation was mostly on target objects, we excluded fixations where the RF was on a distracter object. Also, to exclude saccade planning effects, we excluded fixations in which the object in RF was the goal of the next saccade. This GLM and the following (Eqs. 5 and 6) were applied to occluded object intervals (see above).
The effect of attention was estimated using
(5) SQFR = β0 + β1·ATTEND + ε
where ATTEND = 1 if TYPE equals the cue object, and 0 otherwise; analysis interval is locked to end of fixation. To exclude the effects of inhibition of return, responses to just-fixated objects were excluded.
The effect of saccade planning was estimated using
(6) SQFR = β0 + β1·SGOAL + ε
where SGOAL indicates whether the object in RF is the goal of the next saccade or not, and ATTEND = 1 was selected; analysis interval is locked to the end of fixation. Again, responses to just-fixated objects were excluded.
Time Course of Responses
Peristimulus time histograms of the responses were compiled with 1-ms resolution and smoothed using Lowess with tension 0.04 (for a histogram length of 350 ms) for graphic displays. We computed separate histograms for the beginning and end of fixation by aligning individual responses appropriately. Because the state of occlusion (visible, occluded) before and after a saccade varied unpredictably and the frequencies of the combinations were not balanced, simply averaging over the visible and occluded conditions would have carried that imbalance into each mean. Therefore, we first computed histograms for each combination (vis-vis, vis-occ, occ-vis, and occ-occ) and then calculated means as appropriate. For example, to show the responses at the onset of fixation for visible and occluded objects, we calculated mean (vis-vis and occ-vis) and mean (vis-occ and occ-occ). The analogous procedure was used for other parameters and combinations of parameters. Thus, we obtained representations of the mean pre- and post-saccadic activity that were independent of the numbers of presentation of the various combinations of parameters. For population averages, neurons were weighted as explained above in Firing rates. For the time course plots (but not the firing rate analysis), we averaged over monkeys because their individual plots were similar.
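The balancing scheme amounts to computing one PSTH per occlusion combination and then averaging those PSTHs with equal weight; a sketch follows (the data structures and the mapping of the Lowess “tension 0.04” setting to the frac parameter are our assumptions):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def psth(spike_trains, t_edges):
    """1-ms PSTH (spikes/s) from a list of spike-time arrays aligned to an event."""
    counts = sum(np.histogram(st, bins=t_edges)[0] for st in spike_trains)
    return 1000.0 * counts / max(len(spike_trains), 1)

def balanced_mean(hists, keys):
    """Average precomputed per-combination PSTHs with equal weight, e.g.
    keys = ['vis-vis', 'occ-vis'] for the post-saccadic 'visible' condition."""
    return np.mean([hists[k] for k in keys], axis=0)

def smooth(hist, t):
    """Lowess smoothing for plotting; frac=0.04 stands in for 'tension 0.04'."""
    return lowess(hist, t, frac=0.04, return_sorted=False)
```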
RESULTS
Behavior: Fixation and Saccades
We studied a quasi-natural situation in which the animal freely inspects an array of objects in search of reward while some of the objects are temporarily occluded. The animal was taught that fixating on one of the objects for 200 ms would yield a reward in the form of a small amount of fruit juice. There were two types of objects in the array, and the animal was informed ahead of each trial which of the two groups included the “reward object.” We call this group of objects targets and the other distracters. After the monkeys had mastered this task, we added a series of opaque strips that drifted across the array, occluding half of it at any time. We collected data from two macaque monkeys that we identify as monkey WI and monkey HA.
Both monkeys performed well on the task. Counting as a fixation any period in which gaze remained within the fixation window of an object for >90 ms, monkey WI placed 74% of all fixations on targets and monkey HA 75% (each >50% at P < 10^−16, normal approximation test for proportions). The rates were even higher (82% and 87%) if we count only fixations >200 ms (the minimum required for a reward). The animals learned to cope with the occlusion surprisingly fast; within one session they performed the foraging task as before. Moreover, analysis of their eye movements showed that they did not hesitate to saccade to an occluded object and fixated occluded objects nearly as accurately as visible objects (see materials and methods for definition of “visual” and “occluded” conditions).
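For reference, the normal approximation test for proportions used here is the standard one-sample z-test; a small sketch is given below (the counts in the usage comment are illustrative only, not the actual numbers of fixations):

```python
from math import sqrt
from scipy.stats import norm

def prop_test_one_sided(k, n, p0=0.5):
    """z-test of H0: p = p0 against p > p0, for k successes out of n."""
    phat = k / n
    z = (phat - p0) / sqrt(p0 * (1 - p0) / n)
    return z, norm.sf(z)          # z statistic and one-sided P value

# e.g., prop_test_one_sided(14800, 20000) yields a P value far below 1e-16
```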
Monkey WI fixated occluded objects as often as visible objects (12,884 vs. 12,451; total no. of fixations in the data included in the neurophysiology analysis), whereas monkey HA fixated on the occluded object less frequently (13,576 vs. 19,509); apparently, monkey HA had learned to some extent to avoid saccading to occluded objects. Saccades to target objects were more frequent than saccades to distracter objects, irrespective of whether they were visible or occluded before the saccade (by a factor of 1.80 for visible and 1.82 for occluded in monkey WI; 1.81 for visible and 1.85 for occluded in monkey HA; each >1 at P < 10^−16, normal approximation test for proportions). The duration of fixation was quite similar whether the next saccade was to an occluded or a visible target (median: 222 vs. 230 ms in monkey WI; 244 vs. 214 ms in monkey HA).
In Fig. 5, we compare the fixation accuracy during the first 50 ms of fixation between the visible and occluded conditions. The histograms represent the distributions of the horizontal deviation of gaze from the centers of objects. The distributions were only slightly broader for occluded than for visible objects. The same was true for the vertical deviation (not shown).
These results show that the initiation of saccades was quite similar for memory-guided and visually guided saccades and that fixation on occluded objects was nearly as accurate as fixation on visible objects. We conclude that the system maintains pointers to guide eye movements.
Neurophysiology
We studied areas V4 and V2 of the visual cortex. Complete data from the object permanence paradigm were obtained from 206 V4 cells and 76 V2 cells, but only cells with clear responses in the visible condition (coefficient of OBJ > 0 with P < 0.05 in the model of Eq. 1) were included in the analysis, leaving 164 V4 cells (WI: 88; HA: 76) and 54 V2 cells (WI: 45; HA: 9). For comparing responses between occluded-object and occluded-blank conditions, we further excluded cells whose RFs sometimes moved outside or close to the margin of the display, leaving 87 V4 cells and 54 V2 cells (WI: 59/45; HA: 28/9) for that analysis.
A complication of our paradigm is that only the timing of object presentation and occlusion is under experimental control, whereas the fixation periods are determined by the monkey. As a result, the fixation periods are not synchronized to the periods of object occlusion (Fig. 6). We selected the activity corresponding to fully visible and fully occluded objects, excluding fixations in which the RF might have landed on an object before it was fully visible/occluded and limiting the length of the analysis window so as to exclude effects of visual stimulation after the end of fixation (see materials and methods). We counted spikes from 50 ms after fixation onset to the end of fixation or until 30 ms after the end of the visible/occluded period, whichever came first. The resulting analysis intervals correspond to the triangular regions outlined in red and blue for visible and occluded, respectively, in Fig. 6. In a separate analysis, we looked at the activity at the end of the fixation periods, aligning the spike trains to the onset of the next saccade (not illustrated). For this analysis, the visible and occluded conditions were defined by requiring that the object in the RF was visible/occluded ≥100 ms before the end of fixation, and spikes were counted from that time point until 50 ms after the onset of the saccade (minimum latency of visual responses in V4).
We shall first discuss the time course plots of the neural activity that provide a qualitative view of the results and then present a quantitative analysis of the firing rate data based on general linear models.
Occluded Object Signals
Figure 7 shows the time course of the mean neuronal responses of V4 and V2 to visible object (red), occluded object (blue), and occluded blank (black). For the plots on the left, the fixation periods were aligned to the beginning of fixation and for the plots on the right, to the end of fixation (note that, because of the restrictions on the analysis time for the visible and occluded conditions, the left and right plots are based on different selections of fixations). The visual conditions before the beginning of fixation (shaded area) are a mixture determined by the monkey’s fixation sequence, and so are the visual conditions after the end of fixation where responses could be influenced after the minimum latency (50 ms for V4 and 40 ms for V2, indicated by shading). Because the monkeys were essentially free to inspect the array of objects (there were no restrictions except that the reward was contingent on fixating the designated object for >200 ms, the typical minimum duration of fixation in general), the plots in Fig. 7 show the time course of saccade-related neural responses in V2 and V4 under natural conditions. The results show that the natural responses to static object arrays are remarkably similar to the responses observed in the conventional fixation paradigm where a stimulus is turned on and off, except for the beginning and end, which do not reflect the spontaneous activity but a mixture of responses. Note that there is no indication of “saccadic suppression.” The responses extend well into the post-saccadic phase, right to the time when the volley of new afferent activity arrives (as marked by the rise of the black and blue curves). Besides the slow decay during the steady phase (which is typical for visual responses), there was no drop in firing around the time of the saccade.
In the occluded conditions, the activity drops after the saccade, but V4 neurons tend to fire at a slightly higher rate for occluded objects compared with occluded background, whereas V2 neurons do not show such a difference. However, the difference is hard to see because of the decaying responses to stimulation during the previous fixation. To show the difference more clearly, we plotted responses in the subset of fixations that were preceded by the blank condition (Fig. 8). Figure 8A shows what the theory predicts in the case that recording is from a G cell. It illustrates two fixations. During the first fixation (Fig. 8A, top), the recorded cell (G5) is not activated because there is no object in its receptive field, whereas G3 is activated by an object and drives an object pointer cell OP. This object is then occluded and brought into the RF by a saccade (Fig. 8A, bottom). At the same time, the shifter circuit reroutes the activity from OP to G5 (yellow arrow). Thus, at this point, G5 should be activated. This was actually observed, as shown by the blue trace in Fig. 8B, compared with the blank condition (black trace).
The results of Fig. 8B appear like a strong confirmation of the prediction. However, there is an alternative interpretation. The comparison between occluded object and occluded background responses is only valid if other factors that can influence the responses are identical. But this might not be the case if the RFs were actually larger than the RF plots suggest, because an occluded object is surrounded by a larger number of objects than the occluded background patch on average (cf. Fig. 4). Thus, it is possible that the blue curve simply reflects RF surround activation. We will address this concern in the firing rate analysis to be described in the following.
Using a general linear model (GLM), we estimated the difference in activity between the occluded object condition and the occluded blank condition in each neuron (the effect of factor OBJ; see Eqs. 1 and 2 in materials and methods) and computed their means across neurons. Figure 9 shows the results for the neurons from both monkeys pooled (black bars) and for each monkey separately (gray bars). Figure 9A corresponds to Fig. 7, which includes all fixations; Fig. 9B corresponds to Fig. 8, which includes only fixations that were preceded by a blank condition. The results from V2 show no systematic effects, whereas V4 neurons showed positive differences, the occluded object response being greater than the occluded blank response. For the pooled data, the differences are significant for both analysis intervals, whether aligned to the beginning or the end of fixation.
To address the possibility that these effects were the result of surround modulation by neighboring visible objects, we applied a correction. Because the geometry of object array and occluding strips determines which objects are visible, we know the number of visible surrounding objects for each occluded object position and each blank position (Fig. 4). For example, for an occluded object at one of the two central positions, four surrounding objects are visible on average; for each corner position, the average is two objects, etc. For the different blank positions the averages are 4/3 and 2/3. Using a GLM, we can thus determine the effect of the number of visible neighboring objects on the occluded object signals (without using the blank position responses; Eq. 3). We can then derive a prediction for the difference between mean array position and mean blank position. This is an estimate of the object effect that would result under the null hypothesis that the object effect was zero and the response modulation was entirely caused by surround stimulation. Figure 9, bottom, shows the result of subtracting the correction from the original object effects. The corrected effects are smaller and no longer reach the significance criterion when all fixations are included (Fig. 9A) but are still significant for the pooled data when only fixations after a blank condition are included (Fig. 9B). Thus, it is not clear whether activation by neighboring visible objects is a valid explanation for the observed activation in the occluded object condition.
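The correction amounts to predicting, from the Eq. 3 coefficient, how large an OBJ effect would be expected if it were driven entirely by the surrounding visible objects, and subtracting that prediction; a sketch of the arithmetic is given below (coefficient and argument names are ours):

```python
def corrected_obj_effect(beta_obj, beta_nsurr, mean_nsurr_object, mean_nsurr_blank):
    """beta_obj:      OBJ coefficient from Eq. 1 (occluded object vs. occluded blank)
    beta_nsurr:    NSURR coefficient from Eq. 3 (effect per visible neighboring object)
    mean_nsurr_*:  average number of visible neighbors for occluded-object and
                   blank RF positions, derived from the array/strip geometry (Fig. 4)."""
    predicted_under_null = beta_nsurr * (mean_nsurr_object - mean_nsurr_blank)
    return beta_obj - predicted_under_null
```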
Another question is whether the activation in the occluded condition was stronger when the occluded object was a preferred stimulus of the neuron than a nonpreferred stimulus, as would be expected if the activation indeed reflected object permanence. We tested the subset of V4 neurons that were selective for the type of object in the visible condition, but there was no significant difference in the occluded condition.
Inhibition of Return
Interestingly, the occluded object signal was reduced for objects that had just been fixated before the present fixation (Fig. 10). The prediction is explained in Fig. 10A, which illustrates a saccade between two objects, a blue star indicating the fovea. To initiate the saccade, the system suppresses the OP cell for the right-hand object and enhances the OP cell for the left-hand object. Around the time of the saccade, the shifter circuit reroutes the OP activity for the left-hand object from G1 to G3 and the OP activity for the right-hand object from G3 to G5 (assumed to be the recorded cell). As in Fig. 8, this cell is not activated during the first fixation because its RF is in a blank region but would be activated at the time of the saccade through the rerouted connection from the OP cell. However, that OP cell is being suppressed so that the expected “response” is weakened. Had the RF of G5 initially been on one of the not-fixated objects, there would be no suppression, and the rerouting would produce activation of G5 in full (prediction not shown).
The dashed trace in Fig. 10B shows the response corresponding to the occluded object that had just been fixated, as illustrated in Fig. 10A, and the solid trace shows the response corresponding to occluded objects that had not just been fixated. The difference was significant. We found no significant reduction of responses in the visible condition (data not shown). Suppression of the representation of an object that has just been fixated would be the analog of what is called “inhibition of return” in the psychological literature. For the sake of brevity, we use this term here also for the observed reduction of neural activity. Suppression by inhibition of return is a strong argument for the interpretation of the occluded object signal as a rerouting of object pointer activity, because the hypothesis predicts a response reduction for the occluded object signal when the G cell is activated by rerouted activity from OP (Fig. 10A), but not when the G cell is driven by afferent visual signals.
Both curves in Fig. 10B, top, are based on occluded object signals. The RF is therefore always at positions within the array, so the difference between array positions and background positions in the number of surrounding objects should not be a problem here. However, the RF positions underlying the two curves might still not be equivalent. Moving the RF to an object that had just been fixated means that the monkey makes a saccade away from the RF with an amplitude equal to the eccentricity of the RF. Given the shape of the array, and requiring that the RF initially be on a blank region, there are only three such saccade vectors producing three end positions of RF, whereas RF movements to not-just-fixated objects include other end positions. Figure 11 shows the RF movements to just-fixated objects as dashed arrows. Examples of saccades that move the RF to objects that had not just been fixated are shown with solid arrows.
To rule out any position effects, we have recomputed the two response curves of Fig. 10B, this time selecting only saccades with the same end positions for both. These curves are shown in Fig. 10B, bottom. The result is “noisy” because the selection greatly reduced the amount of included data, but a difference in the sense of “inhibition of return” is still significant. Thus, a saccade that moves the RF from a blank field to an occluded object activates the neuron, and this activation is weaker for the object that had just been fixated than for the other objects. This result is consistent with the inhibition-of-return explanation but would be hard to explain if the activation were merely a surround response.
Effects of Selective Attention and Intended Eye Movements
We now look at the effects of type-based attention and saccade planning on the responses (Fig. 12). In our experiments, type-based selective attention was controlled by presenting a cue object at the beginning of each trial that instructed the monkey to fixate one of the two types of objects distinguished either by shape or by color. The effect was assessed by comparing the activity corresponding to the presence of either type of object in the receptive field. The saccade planning effect was assessed by comparing the activity corresponding to the object that is the goal of the next saccade, with the activity corresponding to the objects that are not the goal of the next saccade.
Figure 12A illustrates the prediction for a saccade to an occluded object. Initially, the right-hand object is fixated, as shown in Fig. 12Aa (blue star), which means that the corresponding OP cell is centrally activated, enhancing a foveal G cell, G3. When a saccade is initiated (Fig. 12Ab), this OP cell becomes suppressed, and the OP cell corresponding to the left-hand object is centrally activated, which also activates G1, the recorded cell. G1 was not active before, because the left-hand object was occluded. After the saccade (Fig. 12Ac), this object, still occluded, is represented by the foveal G cell G3, which receives top-down activity when the shifter circuit reroutes the OP activity (Fig. 12Ad; the rerouting is not necessarily synchronous with the saccade).
The observed V4 responses are shown in Fig. 12B. As in the previous figures, visual responses are plotted with reddish colors, and occluded object signals are plotted with bluish colors. Responses to targets (objects that equal the cue) are plotted with solid lines, and responses to distracters are plotted with dashed lines. For this comparison, we excluded situations when the object was the goal of the next saccade. Thus, the difference between solid and dashed curves shows the effect of a selection mechanism that differentiates all objects in the display in parallel, enhancing the representations of the cued type relative to the representations of the other type. Responses to target objects that were the goal of the next saccade are plotted with intense red and blue (fixations that moved the receptive field to an object that had just been fixated, and would generally not be goal of the next saccade, were excluded because they would confound the assessment of the saccade goal effect). The red curve is higher than the solid brown curve; thus, the saccade mechanism further enhances the representation of the selected target object relative to the others.
The visible object responses clearly show the modulation by type-based attention and the effect of saccade planning. Comparison of the plots aligned to beginning (Fig. 12B, top) and end of fixation (Fig. 12B, bottom) shows that both effects appear with a delay after response onset and strengthen toward the end of fixation, continuing across the ensuing saccade and part of the next fixation all the way to the beginning of the post-saccadic responses (evidenced by the rise of the blue curves beginning at ∼60 ms).
Although the effects of distributed attention and saccade planning were clear in the visual condition, making target representations stronger compared with distracter representations and further enhancing the representation of the target object that was the goal of the upcoming saccade, there was no differential activity when the object in the RF was occluded (Fig. 12B, bluish curves). Thus, the V4 neurons showed modulation of visual responses, but there was no modulation where the rerouted activity was expected.
The results from the firing rate analysis (Eqs. 5 and 6) are shown in Fig. 13. Consistent with the time course plots, attention and saccade planning produced significant enhancement when the objects were visible but no modulation when they were occluded.
We also found that in the visible condition, the attention effect was stronger for the cells’ preferred types of object than the nonpreferred types of object (data not shown), in agreement with Bichot et al. (2005). This is consistent with a multiplicative mechanism as proposed for the back projection from the G to the F layer in the grouping cell model (Fig. 3), suggesting that our recordings were from F (feature) cells.
Figures 12 and 13 only represent means across the population. However, cells varied with respect to attentional and saccadic modulation, and there is the possibility that individual cells might show significant effects. If there were attention effects for occluded objects in individual cells, they should correlate with the attention effects for visible objects. In Fig. 14, we have plotted the effect for the occluded versus the effect for the visible condition in each cell of V4. As the test results show (Fig. 14, inset), there was no correlation between the two effects, indicating that attention did not affect the activity in the same way when the object in the receptive field was occluded compared with visible. Also, for modulation by the impending saccade, we found no such correlation (r = 0.07, n = 54, P = 0.61 for monkey WI; r = 0.09, n = 70, P = 0.44 for monkey HA). The absence of these correlations suggests that the enhancement of activity that occurred when there was an occluded object in the RF, as observed in V4 cells (Figs. 7–10), is not related to the expected top-down activation of G cells. If it were, attention would have a similar (excitatory or inhibitory) effect for occluded as for visible objects in each cell, and the same holds for the saccade planning effect. Thus, the results represented in Figs. 12–14 indicate that the V4 cells we recorded were probably F cells, not G cells.
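For completeness, the per-cell analysis reported above is a plain Pearson correlation across neurons between the two modulation estimates; a minimal sketch (the input arrays of per-neuron GLM coefficients are assumed) is:

```python
import numpy as np
from scipy.stats import pearsonr

def attention_effect_correlation(effect_visible, effect_occluded):
    """Pearson correlation between each cell's attention (or saccade-goal)
    effect for visible objects and its effect for occluded objects."""
    r, p = pearsonr(np.asarray(effect_visible), np.asarray(effect_occluded))
    return r, p
```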
DISCUSSION
We studied the hypothetical representation of object pointers in the visual cortex. Specifically, we examined neurons in V2 and V4 to see whether their activity reflects pointers rather than, as is commonly assumed, visual features. According to the theory sketched at the outset, a pointer activates a grouping (G) cell that in turn modulates the feature (F) cells of an object. Thus, grouping cells can be activated centrally (top down), whereas feature cells can be activated only by afferent visual signals. For visible objects the difference is subtle, because feature-selective cells in V4 can be modulated by top-down signals (e.g., see Motter 1994), and it would be hard to distinguish central activation from central modulation. That is why we tested a situation in which objects are “visually present” although the stimuli are occluded and thus do not generate afferent signals. We expected that a visual foraging task with temporary occlusion of objects would induce the system to set up and use internal representations that persist across the occlusions.
Our results on fixation behavior confirm this expectation. The subjects did not hesitate to saccade to occluded objects, and fixation of occluded objects was nearly as accurate as fixation of visible objects (Fig. 5). Also, both subjects saccaded to target objects more often than to distracter objects, whether these were visible or occluded before the saccade. Thus, the subjects appeared to maintain a robust representation of the locations and qualities (whether target or distracter) of the occluded objects.
The neuronal recordings showed increased activity in V4 neurons, as predicted, when fixation brought an occluded object into the receptive field compared with an occluded background region (Figs. 7 and 9A); no such activation was found in neurons of V2. The activation under occlusion was easiest to see when the receptive field moved from a blank region to an occluded object, because that excluded decaying activity from previous stimulation (Figs. 8 and 9B). To control for the possibility that this activation reflected stimulation by visible objects outside the mapped RF, we measured how much the objects surrounding the RF contributed to the firing rates, using a linear model that estimates the firing rate as a function of the number of visible surround objects. We then corrected the actual responses by subtracting the estimated surround contributions (Fig. 9). This correction reduced the occluded object activity but did not eliminate it. In the subset of fixations that moved the receptive field from a blank region to an occluded object, the corrected activation was still significant (Fig. 9B).
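The surround correction described above amounts to a simple linear regression followed by subtraction of the fitted surround term. The following sketch shows the principle with synthetic data; the variable names and numbers are our assumptions, not the recorded values.

```python
# Sketch of the surround correction (synthetic data, assumed names): firing rate
# is regressed on the number of visible objects surrounding the RF, and the
# fitted surround contribution is subtracted from each fixation's response.
import numpy as np

rng = np.random.default_rng(2)
n_surround = rng.integers(0, 6, 500).astype(float)          # visible surround objects
rate = 5.0 + 1.2 * n_surround + rng.normal(0.0, 2.0, 500)   # synthetic rates (sp/s)

A = np.column_stack([np.ones_like(n_surround), n_surround])
b0, b1 = np.linalg.lstsq(A, rate, rcond=None)[0]            # rate ~ b0 + b1 * n

corrected = rate - b1 * n_surround   # residual activity attributed to the RF object
print(f"estimated surround slope: {b1:.2f} sp/s per visible object")
```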
Another result suggesting that V4 cells indeed represent occluded objects was the finding of inhibition of return: when the occluded object currently in the receptive field had been fixated immediately before, the activation was reduced (Fig. 10). Had the activity merely been a surround response, there should have been no such reduction; inhibition of return serves to suppress the representation of the just-fixated object so that gaze does not return to it, but why would the system suppress activation produced by the objects surrounding it?
Two other important predictions of the theory are that occluded object representations should exhibit modulation by attention and saccade planning. The subjects saccaded selectively to target objects (which led to reward), and they did so for occluded as well as visible objects. The responses to visible objects showed the corresponding modulation: enhancement of responses to target objects compared with responses to distracter objects and additional enhancement when the object was the goal of the next saccade (Fig. 12B, brown and red traces). However, attention and saccade planning did not affect the activity of the same cells when objects were occluded (Fig. 12B, blue and purple traces).
The analysis of firing rates with a GLM showed the same pattern: highly significant effects of cue (target versus distracter) and saccade planning (goal of saccade versus not goal of saccade) for visible objects but no such effects for occluded objects (Fig. 13). The positive results for visible objects show that our tests are sensitive; the absence of these effects for occluded objects is therefore strong negative evidence. Because the theory predicts attention and saccade planning effects for occluded objects for grouping cells, but not for feature cells, this result indicates that our V4 population consisted of feature cells and probably did not include grouping cells.
Taken together, our results conflict with respect to the predictions of the theory. The increase of firing rates when the receptive field moved from a blank region to an occluded object (Fig. 8) and its suppression by inhibition of return (Fig. 10) are indicative of grouping cells, but we did not find the effects of attention and saccade planning for occluded objects that are predicted for grouping cells (Figs. 12–14). Some of the firing rate increase might be explained by direct visual activation from outside the mapped receptive field; correction for surround activation reduced the observed increase without eliminating it completely, which supports this explanation (Fig. 9). However, the suppression of the activity increase by inhibition of return cannot be explained this way. The initiation of a saccade involves enhancement of the object pointer cell for the object to be fixated next and suppression of the object pointer cell for the currently fixated object. This suppression reduces the top-down activation of the grouping cells subserving the occluded object (Fig. 10A). Because it affects only object pointer cells, only the top-down input to the momentarily connected grouping cell is lost; visually evoked activity, including afferent activation from the surround, is not suppressed. Thus, our finding (Fig. 10B) that the activity produced when receptive fields moved from a blank region to an occluded object was reduced if that object had been fixated immediately before is indicative of grouping cells. Nevertheless, there may be alternative explanations. For example, a modification of the model circuit of Fig. 3 in which the suppression signals from OP reach down all the way to the feature representation level might explain the apparent inhibition of return in the population of recorded cells.
In conclusion, we think our results do not provide convincing evidence that area V4 contains cells that are activated “top down” by object pointers (i.e., grouping cells in the sense of the theory of Fig. 3). Creating and remapping object pointers are functions of fundamental importance for visual systems that use saccades and are able to integrate the information from multiple fixations. To be efficient, the underlying mechanisms must be accurate and fast. We expected to find strong unequivocal signals in individual cells at least once in a while, but we found only changes in population activity that we think are unlikely to support these functions.
Object pointer signals and grouping cells might be found elsewhere. Object permanence has been demonstrated in the anterior regions of the superior temporal cortex, where neurons signal hidden objects for seconds after occlusion (Baker et al. 2001). These signals are selective for the identity of the occluded object/person and for location in real space, so they must be the result of the remapping of object pointers, though presumably they are not the object pointers themselves. In inferotemporal cortex, experiments with displays simulating occlusion of an object showed that a small fraction of neurons produced larger (“surprise”) responses when a different object emerged from the occluder, whereas others showed larger (“match”) responses when the object reappeared as expected (Puneeth and Arun 2016). Because these responses peaked at around 130 ms, which is typical for shape-selective responses in IT and much later than the border ownership signals, they are probably more related to object/shape memory than to object pointer signals. We also want to emphasize that we are not considering all conditions in which object permanence is observed but rather the simple condition in which an object is temporarily occluded without changing its attributes, as described by Michotte (1950). As explained under theory and predictions, cells in the lateral intraparietal area (LIP) that show remapping activity fit some of the criteria for grouping cells. It would be interesting to see whether LIP cells would also “respond” when their RF lands on an occluded object. For them to qualify as grouping cells, it must also be shown that they modulate the activity of V2 feature cells, making them border ownership selective and susceptible to top-down attention. However, anatomic studies have shown that LIP projects to V4 but has no direct projections to V2 (Felleman and Van Essen 1991), and direct projections would be required to produce the observed increase in synchrony between V2 cells when stimulated by a common object (Martin and von der Heydt 2015). Because object pointers are intimately related to the deployment of attention and visual awareness, a structure of particular interest in this context is the pulvinar, where visual responses have been shown to correlate with perceptual suppression of an object as experienced by human observers in an illusion paradigm (Wilke et al. 2009). Moreover, the pulvinar has been implicated as a possible site of the shifter circuits that are required for the remapping of pointers (Olshausen et al. 1993).
Neuronal function in hitherto little-explored regions of the brain is often studied with shape selectivity as the primary criterion for characterizing visual function. But the G cells of the grouping model are not particularly shape selective (Craft et al. 2007); they define object shape and other attributes only by way of their projections to feature cells (von der Heydt 2015). Moreover, studies of border ownership selectivity indicate that G cells do not require the simultaneous presence of multiple features to respond, like a logical AND, as IT cells do, but rather sum figure-ground cues in the manner of a logical OR (Zhang and von der Heydt 2010; von der Heydt and Zhang 2018). We hope that the object permanence paradigm developed in our study will help in identifying object pointer signals somewhere in the diverse visual regions of the brain.
GRANTS
This study was supported by National Eye Institute Grants R01-EY-002966 and R01-EY-027544.
DISCLOSURES
No conflicts of interest, financial or otherwise, are declared by the authors.
AUTHOR CONTRIBUTIONS
S.D.Z., L.A.Z., and R.v.d.H. conceived and designed research; S.D.Z. and L.A.Z. performed experiments; S.D.Z., L.A.Z., and R.v.d.H. analyzed data; S.D.Z., L.A.Z., and R.v.d.H. interpreted results of experiments; S.D.Z., L.A.Z., and R.v.d.H. prepared figures; R.v.d.H. drafted manuscript; S.D.Z., L.A.Z., and R.v.d.H. edited and revised manuscript; S.D.Z., L.A.Z., and R.v.d.H. approved final version of manuscript.
ACKNOWLEDGMENTS
We thank Ofelia Garalde for technical assistance and Ernst Niebur for helpful discussions.
Footnotes
The term “occluded object” describes the perceptual condition of a remembered object; when created on a computer display, there is no physical object.
REFERENCES
- Anderson CH, Van Essen DC. Shifter circuits: a computational strategy for dynamic aspects of visual processing. Proc Natl Acad Sci USA 84: 6297–6301, 1987. doi: 10.1073/pnas.84.17.6297.
- Baker CI, Keysers C, Jellema T, Wicker B, Perrett DI. Neuronal representation of disappearing and hidden objects in temporal cortex of the macaque. Exp Brain Res 140: 375–381, 2001. doi: 10.1007/s002210100828.
- Bakin JS, Nakayama K, Gilbert CD. Visual responses in monkey areas V1 and V2 to three-dimensional surface configurations. J Neurosci 20: 8188–8198, 2000. doi: 10.1523/JNEUROSCI.20-21-08188.2000.
- Bartlett MS. The square root transformation in the analysis of variance. J R Stat Soc 3, Suppl 3: 68–78, 1936. doi: 10.2307/2983678.
- Bichot NP, Rossi AF, Desimone R. Parallel and serial neural mechanisms for visual search in macaque area V4. Science 308: 529–534, 2005. doi: 10.1126/science.1109676.
- Brincat SL, Connor CE. Dynamic shape synthesis in posterior inferotemporal cortex. Neuron 49: 17–24, 2006. doi: 10.1016/j.neuron.2005.11.026.
- Cavanagh P, Hunt AR, Afraz A, Rolfs M. Visual stability based on remapping of attention pointers. Trends Cogn Sci 14: 147–153, 2010. doi: 10.1016/j.tics.2010.01.007.
- Chen M, Yan Y, Gong X, Gilbert CD, Liang H, Li W. Incremental integration of global contours through interplay between visual cortical areas. Neuron 82: 682–694, 2014. doi: 10.1016/j.neuron.2014.03.023.
- Connor CE, Brincat SL, Pasupathy A. Transformation of shape information in the ventral pathway. Curr Opin Neurobiol 17: 140–147, 2007. doi: 10.1016/j.conb.2007.03.002.
- Craft E, Schütze H, Niebur E, von der Heydt R. A neural model of figure-ground organization. J Neurophysiol 97: 4310–4326, 2007. doi: 10.1152/jn.00203.2007.
- Duhamel JR, Colby CL, Goldberg ME. The updating of the representation of visual space in parietal cortex by intended eye movements. Science 255: 90–92, 1992. doi: 10.1126/science.1553535.
- Felleman DJ, Van Essen DC. Distributed hierarchical processing in the primate cerebral cortex. Cereb Cortex 1: 1–47, 1991. doi: 10.1093/cercor/1.1.1.
- Galletti C, Battaglini PP, Fattori P. Parietal neurons encoding spatial locations in craniotopic coordinates. Exp Brain Res 96: 221–229, 1993. doi: 10.1007/BF00227102.
- Gillary G, von der Heydt R, Niebur E. Short-term depression and transient memory in sensory cortex. J Comput Neurosci 43: 273–294, 2017. doi: 10.1007/s10827-017-0662-8.
- Goldberg ME, Bisley JW, Powell KD, Gottlieb J. Saccades, salience and attention: the role of the lateral intraparietal area in visual behavior. Prog Brain Res 155: 157–175, 2006. doi: 10.1016/S0079-6123(06)55010-1.
- Hu B, von der Heydt R, Niebur E. Figure-ground organization in natural scenes: performance of a recurrent neural model compared with neurons of area V2. eNeuro 6: ENEURO.0479-18.2019, 2019. doi: 10.1523/ENEURO.0479-18.2019.
- Jehee JF, Lamme VA, Roelfsema PR. Boundary assignment in a recurrent network architecture. Vision Res 47: 1153–1165, 2007. doi: 10.1016/j.visres.2006.12.018.
- Kahneman D, Treisman A, Gibbs BJ. The reviewing of object files: object-specific integration of information. Cognit Psychol 24: 175–219, 1992. doi: 10.1016/0010-0285(92)90007-O.
- Kolers PA, von Grünau M. Shape and color in apparent motion. Vision Res 16: 329–335, 1976. doi: 10.1016/0042-6989(76)90192-9.
- Lamme VAF. The neurophysiology of figure-ground segregation in primary visual cortex. J Neurosci 15: 1605–1615, 1995. doi: 10.1523/JNEUROSCI.15-02-01605.1995.
- Martin AB, von der Heydt R. Spike synchrony reveals emergence of proto-objects in visual cortex. J Neurosci 35: 6860–6870, 2015. doi: 10.1523/JNEUROSCI.3590-14.2015.
- Mazer JA, Gallant JL. Goal-related activity in V4 during free viewing visual search. Evidence for a ventral stream visual salience map. Neuron 40: 1241–1250, 2003. doi: 10.1016/S0896-6273(03)00764-5.
- Michotte A. A propos de la permanence phénoménale faits et theories. Acta Psychol (Amst) 7: 298–322, 1950. doi: 10.1016/0001-6918(50)90021-7.
- Mihalas S, Dong Y, von der Heydt R, Niebur E. Mechanisms of perceptual organization provide auto-zoom and auto-localization for attention to objects. Proc Natl Acad Sci USA 108: 7583–7588, 2011. doi: 10.1073/pnas.1014655108.
- Moore T, Armstrong KM. Selective gating of visual signals by microstimulation of frontal cortex. Nature 421: 370–373, 2003. doi: 10.1038/nature01341.
- Motter BC. Neural correlates of attentive selection for color or luminance in extrastriate area V4. J Neurosci 14: 2178–2189, 1994. doi: 10.1523/JNEUROSCI.14-04-02178.1994.
- Noudoost B, Chang MH, Steinmetz NA, Moore T. Top-down control of visual attention. Curr Opin Neurobiol 20: 183–190, 2010. doi: 10.1016/j.conb.2010.02.003.
- O’Herron P, von der Heydt R. Short-term memory for figure-ground organization in the visual cortex. Neuron 61: 801–809, 2009. doi: 10.1016/j.neuron.2009.01.014.
- O’Herron P, von der Heydt R. Representation of object continuity in the visual cortex. J Vis 11: 12, 2011. doi: 10.1167/11.2.12.
- O’Herron P, von der Heydt R. Remapping of border ownership in the visual cortex. J Neurosci 33: 1964–1974, 2013. doi: 10.1523/JNEUROSCI.2797-12.2013.
- Olshausen BA, Anderson CH, Van Essen DC. A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. J Neurosci 13: 4700–4719, 1993. doi: 10.1523/JNEUROSCI.13-11-04700.1993.
- Pasupathy A, Connor CE. Responses to contour features in macaque area V4. J Neurophysiol 82: 2490–2502, 1999. doi: 10.1152/jn.1999.82.5.2490.
- Pooresmaeili A, Roelfsema PR. A growth-cone model for the spread of object-based attention during contour grouping. Curr Biol 24: 2869–2877, 2014. doi: 10.1016/j.cub.2014.10.007.
- Poort J, Raudies F, Wannig A, Lamme VA, Neumann H, Roelfsema PR. The role of attention in figure-ground segregation in areas V1 and V4 of the visual cortex. Neuron 75: 143–156, 2012. doi: 10.1016/j.neuron.2012.04.032.
- Puneeth NC, Arun SP. A neural substrate for object permanence in monkey inferotemporal cortex. Sci Rep 6: 30808, 2016. doi: 10.1038/srep30808.
- Pylyshyn ZW, Storm RW. Tracking multiple independent targets: evidence for a parallel tracking mechanism. Spat Vis 3: 179–197, 1988. doi: 10.1163/156856888X00122.
- Qiu FT, Sugihara T, von der Heydt R. Figure-ground mechanisms provide structure for selective attention. Nat Neurosci 10: 1492–1499, 2007. doi: 10.1038/nn1989.
- Qiu FT, von der Heydt R. Figure and ground in the visual cortex: V2 combines stereoscopic cues with gestalt rules. Neuron 47: 155–166, 2005. doi: 10.1016/j.neuron.2005.05.028.
- Rensink RA. Seeing, sensing, and scrutinizing. Vision Res 40: 1469–1487, 2000. doi: 10.1016/S0042-6989(00)00003-1.
- Rockland KS, Saleem KS, Tanaka K. Divergent feedback connections from areas V4 and TEO in the macaque. Vis Neurosci 11: 579–600, 1994. doi: 10.1017/S0952523800002480.
- Roelfsema PR, Lamme VAF, Spekreijse H. Object-based attention in the primary visual cortex of the macaque monkey. Nature 395: 376–381, 1998. doi: 10.1038/26475.
- Sugihara T, Qiu FT, von der Heydt R. The speed of context integration in the visual cortex. J Neurophysiol 106: 374–385, 2011. doi: 10.1152/jn.00928.2010.
- Tsotsos JK. A Computational Perspective on Visual Attention. Cambridge, MA: MIT Press, 2011.
- Umeno MM, Goldberg ME. Spatial processing in the monkey frontal eye field. I. Predictive visual responses. J Neurophysiol 78: 1373–1383, 1997. doi: 10.1152/jn.1997.78.3.1373.
- von der Heydt R. Figure-ground organization and the emergence of proto-objects in the visual cortex. Front Psychol 6: 1695, 2015. doi: 10.3389/fpsyg.2015.01695.
- von der Heydt R, Zhang NR. Figure and ground: how the visual cortex integrates local cues for global organization. J Neurophysiol 120: 3085–3098, 2018. doi: 10.1152/jn.00125.2018.
- Wagatsuma N, von der Heydt R, Niebur E. Spike synchrony generated by modulatory common input through NMDA-type synapses. J Neurophysiol 116: 1418–1433, 2016. doi: 10.1152/jn.01142.2015.
- Walker MF, Fitzgibbon EJ, Goldberg ME. Neurons in the monkey superior colliculus predict the visual result of impending saccadic eye movements. J Neurophysiol 73: 1988–2003, 1995. doi: 10.1152/jn.1995.73.5.1988.
- Wannig A, Stanisor L, Roelfsema PR. Automatic spread of attentional response modulation along Gestalt criteria in primary visual cortex. Nat Neurosci 14: 1243–1244, 2011. doi: 10.1038/nn.2910.
- Wilke M, Mueller KM, Leopold DA. Neural activity in the visual thalamus reflects perceptual suppression. Proc Natl Acad Sci USA 106: 9465–9470, 2009. doi: 10.1073/pnas.0900714106.
- Williford JR, von der Heydt R. Figure-ground organization in visual cortex for natural scenes. eNeuro 3: ENEURO.0127-16.2016, 2016. doi: 10.1523/ENEURO.0127-16.2016.
- Wurtz RH. Neuronal mechanisms of visual stability. Vision Res 48: 2070–2089, 2008. doi: 10.1016/j.visres.2008.03.021.
- Zhang NR, von der Heydt R. Analysis of the context integration mechanisms underlying figure-ground organization in the visual cortex. J Neurosci 30: 6482–6496, 2010. doi: 10.1523/JNEUROSCI.5168-09.2010.
- Zhou H, Friedman HS, von der Heydt R. Coding of border ownership in monkey visual cortex. J Neurosci 20: 6594–6611, 2000. doi: 10.1523/JNEUROSCI.20-17-06594.2000.