Significance
Theories of visual attention postulate the existence of a saliency map that guides attention/gaze toward the most visually conspicuous stimuli in complex scenes. This study compared saliency coding in the two dominant visual gateways: the primary visual cortex (V1) and the evolutionarily older visual system that exists in the midbrain superior colliculus. Our results show that neurons in the superficial visual layers of the superior colliculus (SCs) encoded saliency earlier and more robustly than V1 neurons. This was surprising, because the dominant input to the SCs arises from V1. This result is in line with models that place a feature processing stage (V1) before the feature-agnostic saliency map in SCs.
Keywords: attention, priority, vision, gaze, oculomotor
Abstract
Models of visual attention postulate the existence of a bottom-up saliency map that is formed early in the visual processing stream. Although studies have reported evidence of a saliency map in various cortical brain areas, determining the contribution of phylogenetically older pathways is crucial to understanding its origin. Here, we compared saliency coding from neurons in two early gateways into the visual system: the primary visual cortex (V1) and the evolutionarily older superior colliculus (SC). We found that, while the response latency to visual stimulus onset was shorter for V1 neurons than for superior colliculus superficial visual-layer neurons (SCs), the saliency representation emerged earlier in SCs than in V1. Because the dominant input to the SCs arises from V1, these relative timings are consistent with the hypothesis that SCs neurons pool the inputs from multiple V1 neurons to form a feature-agnostic saliency map, which may then be relayed to other brain areas.
Most theories and computational models of saliency postulate that visual input is transformed into a topographic representation of visual conspicuity (Fig. 1A, red), whereby certain stimuli stand out from others based on low-level features of the input image (Fig. 1A, blue) (1–3). The concept of a priority map describes a combined representation of visual saliency and behavioral relevancy (Fig. 1A, yellow), which is thought to be the core determinant of attention and gaze (4, 5). To date, most studies have reported evidence of saliency and/or priority maps in a distributed network of cortical brain areas [e.g., primary visual cortex (V1) (6–9), visual area 4 (V4) (10), lateral intraparietal area (LIP) (11–13), and frontal eye fields (14, 15)]. However, there is mounting evidence for a subcortical saliency mechanism in the premammalian optic tectum (16–18) or superior colliculus (SC) in primates (Fig. 1B). The primate SC, best known for its role as an oculomotor hub, might be considered an unlikely candidate for a visual saliency map, but there is a rich history of research on visual attention in the SC (a recent review is in ref. 19), which has broadened our view of its role in processes previously thought to be the domain of neocortex.
The SC (Fig. 1B) is multilayered but is often described as having two dominant functional layers, a superior colliculus superficial layer (SCs) associated exclusively with visual processing and a superior colliculus multisensory–cognitive–motor intermediate layer (SCi) linked to the control of attention and gaze (19–22). Because SCs is interconnected with multiple visual areas (23–25), it is ideally positioned to pool diverse visual inputs to form a feature-agnostic saliency representation. A recent study (26) showed that the activity of SCs neurons, whose dominant inputs arise from the retina and V1 (Fig. 1B) (24, 25), is well predicted by a computational saliency model that has been validated on the free-viewing behavior of humans (27) and nonhuman primates (28). In addition, a recent lesion study showed that attention guidance is preserved in the absence of V1 (29), challenging the hypothesis that saliency coding depends critically on V1 (6–9) and instead implicating a possible pathway through the SC. This raises the question of whether the evolutionarily older SC still plays a significant role in the computation of saliency or whether saliency representations observed there reflect computations done elsewhere.
Here, we explored the hypothesis that SCs, not V1, embodies the role of a saliency map. To test this hypothesis, we recorded from neurons in the SC and V1 (in different animals) while using a task and stimuli (Fig. 1 C–F) designed to compare the timing of saliency representations and the long-range spatial interactions that give rise to saliency in complex scenes. Rhesus monkeys performed a visually guided saccade task while presented with salient but goal-irrelevant stimuli that they were required to ignore. The goal-irrelevant stimulus consisted of a wide-field array (210 items spanning 40° to 50° of visual angle) of oriented color bars (∼0.4° × 1.2°) with a salient oddball, forming what is traditionally described as perceptual pop-out (30). We measured visually evoked responses when the salient oddball appeared within (IN) vs. opposite/contralateral (OPP) to the response field (RF), representing high- vs. low-saliency regions of the display. Because models of visual saliency depend critically on center-surround feature contrasts that extend widely across the visual field (1–3), quantifying surround suppression can also provide a strong index of the degree to which each brain area represents saliency. Therefore, a single-item control condition (Fig. 1 E and F) served as a benchmark of visual responses with no competing surround, which allowed us to differentiate first-order saliency (local luminance change associated with a sudden visual onset) from second-order saliency computations (center-surround contrasts that make a stimulus stand out from its isoluminant neighbors in a complex scene). In addition, the goal-irrelevant oddball/item was placed IN or OPP the RF in two ways: (i) the stimulus appeared abruptly with the oddball/single item IN or OPP the RF; or (ii) the oddball/single item was brought IN or OPP the RF via a saccade (11).
This allowed us to examine possible differences in saliency coding associated with passive viewing of an abrupt visual onset vs. active gaze-dependent shifting of stimuli over different parts of the visual field via saccades (11), which is characteristic of real-world viewing.
Results
Saliency Coding Emerges Earlier in SCs than V1.
We compared visually evoked responses when a salient but goal-irrelevant oddball appeared IN vs. OPP the RF, representing high- vs. low-saliency regions of the display; the difference between these responses defined the saliency index (SI Materials and Methods). Fig. 2 A–D shows normalized population-averaged responses for V1 (Fig. 2, blue) and SCs (Fig. 2, red) neurons (these results were not affected by the normalization procedure, as indicated by the identical statistical outcome using absolute firing rates) (Fig. S1). For a given neuron, we normalized to the averaged peak response of the single-item condition within the window from 0 to 500 ms (SI Materials and Methods). Because the single item always produced the greatest visual response, this ensured that the relative differences between conditions for a given neuron were retained. Although V1 neurons and, to a lesser degree, SCs neurons (31) showed some feature preference (Fig. S2), the response curves depicted in Fig. 2 represent the data collapsed across feature combinations, effectively canceling out feature-specific differences. Note that the V1 response curves combine single units and multiunit sites (Materials and Methods); results from the single units alone were qualitatively similar but less robust (Fig. S3). At the population level, V1 and SCs neurons showed a significant preference for the salient oddball, as indicated by the difference between the oddball IN (Fig. 2, thick traces) and oddball OPP (Fig. 2, thin traces) conditions (P < 0.05, Wilcoxon signed rank test at 10-ms intervals, Bonferroni–Holm corrected). This oddball selectivity was qualitatively similar whether the oddball appeared abruptly in the RF (Fig. 2 A and C, array-aligned condition) or was brought into the RF via a saccade (Fig. 2 B and D, saccade end-aligned condition).
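The population statistic described above can be illustrated with a short Python sketch. It is not the analysis code used in this study; it assumes one normalized spike density function per neuron and condition sampled at 1 ms, averages each into consecutive 10-ms bins, applies a two-sided Wilcoxon signed rank test across neurons in each bin (normal approximation, with no tie correction to the variance), and then applies the Holm step-down correction across bins. All function names are hypothetical.

```python
import math


def signed_rank_p(diffs):
    """Two-sided Wilcoxon signed rank p-value (normal approximation).

    Assumption: zero differences are discarded, tied |differences| get
    average ranks, and the variance is not corrected for ties.
    """
    d = [x for x in diffs if x != 0]
    n = len(d)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def holm_reject(pvals, alpha=0.05):
    """Holm-Bonferroni step-down: True where the null hypothesis is rejected."""
    m = len(pvals)
    reject = [False] * m
    for step, i in enumerate(sorted(range(m), key=lambda i: pvals[i])):
        if pvals[i] > alpha / (m - step):
            break  # stop at the first non-significant ordered p-value
        reject[i] = True
    return reject


def significant_bins(resp_in, resp_opp, bin_ms=10, alpha=0.05):
    """resp_in[k][t], resp_opp[k][t]: neuron k's normalized SDF (1-ms samples)
    for oddball IN and OPP. Returns one Holm-corrected flag per bin."""
    n_t = len(resp_in[0])
    pvals = []
    for start in range(0, n_t - bin_ms + 1, bin_ms):
        diffs = [
            sum(a[start:start + bin_ms]) / bin_ms - sum(b[start:start + bin_ms]) / bin_ms
            for a, b in zip(resp_in, resp_opp)
        ]
        pvals.append(signed_rank_p(diffs))
    return holm_reject(pvals, alpha)
```

With 20 synthetic neurons whose IN response exceeds OPP only in the middle bin, only that bin survives correction. For real data, a tested library routine such as SciPy's `scipy.stats.wilcoxon` would normally replace the hand-written test.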
To explore these differences in detail, we derived two quantities from the response profiles of each neuron: the visual response onset latency (VROL; the earliest emergence of a visual response) and the saliency response onset latency (SROL; the earliest point at which the neuron signaled a preference for the oddball) (Materials and Methods). Fig. 2 E and F shows the cumulative distributions of VROLs (Fig. 2, solid traces) and SROLs (Fig. 2, dotted traces) for V1 and SCs neurons. Only the subset of neurons that showed a significant preference for the oddball was included in the cumulative distribution of SROLs. For the array-aligned condition (Fig. 2E), VROL was significantly earlier for V1 (median = 40 ms, n = 56) (Fig. 2, vertical blue line) than SCs (median = 49 ms, n = 24; P = 0.0008, rank sum test) (Fig. 2, vertical red line), whereas SROL was significantly earlier for SCs (median = 65 ms, n = 24) than V1 (median = 139 ms, n = 56; P = 0.000021, rank sum test). In the saccade end-aligned condition (Fig. 2F), the VROL ordering reversed: VROL was significantly longer for V1 (median = 47 ms, n = 64) than SCs (median = 38 ms, n = 21; P = 0.013, rank sum test). More importantly, SROL was again significantly earlier for SCs (median = 60 ms, n = 21) than V1 (median = 121 ms, n = 64; P = 0.0041, rank sum test). The results were qualitatively similar whether RFs were overlapping or nonoverlapping between brain areas (Fig. S4).
These results indicate that, although V1 registered the arrival of visual signals earliest in the stimulus-aligned condition (but not in the saccade end-aligned condition), the saliency representation in V1 emerged, on average, 60–75 ms after it had already appeared in SCs. This delay was shorter for V1 single units alone (14–37 ms) (Fig. S3 E and F) than for V1 single units plus multiunits (Fig. 2 E and F), but the direction of the effect was the same in either case. Including the multiunit V1 recordings yielded more reliable and statistically powerful V1 results (compare, for example, the population-averaged spike density traces in Fig. 2 A and B with those in Fig. S3 A and B).
In addition, in V1, there was an average delay of 45–99 ms from visual onset to the emergence of the saliency representation (comparing the blue solid traces with the blue dotted traces in both Fig. 2 E and F and Fig. S3 E and F). This delay is exemplified by the fact that the saliency representation in V1 did not emerge until after the initial volley of visual activity, as indicated by the blue vertical dotted line in Fig. 2A (Fig. S3A). In contrast, while SCs neurons showed overall later VROLs than V1 in the stimulus-aligned condition, the saliency representation emerged within the initial volley of visual activity around the earliest part of the visual response (∼15–20 ms after visual response onset) as indicated by the red vertical dotted line in Fig. 2C. This relatively slow signaling of saliency in V1 suggests either a processing delay within V1 itself to generate a saliency representation or that the saliency representation in V1 emerged via feedback from other brain areas. Interestingly, VROL was the same or shorter for SCs than V1 in the saccade end-aligned case (Fig. 2F and Fig. S3F). This might indicate that, under real-world viewing conditions, where stimuli are more often brought into a neuron’s RF as a result of saccades, VROL may be earlier in SCs than V1. Most importantly, the saliency representation emerged reliably earlier in SCs than V1.
Surround Modulation Emerges Earlier and Stronger in SCs than V1.
Most models of visual saliency depend critically on center-surround feature contrasts that extend widely across the visual field (1–3); therefore, comparing surround suppression characteristics can provide a strong index of the degree to which each brain area represents saliency. To quantify surround suppression, we compared visual responses evoked by the wide-field array (surround) (Fig. 1C) with those evoked by a single item (no surround) (Fig. 1E). Fig. 3 A–D shows population responses in the single-item IN condition (no surround) (Fig. 3, black trace) vs. the oddball IN condition (surround) (Fig. 3, color trace). These averaged population traces reveal clear differences in the timing of surround suppression between brain areas, with SCs (Fig. 3 C and D) showing markedly earlier suppression than V1 (Fig. 3 A and B) [P < 0.05, Wilcoxon signed rank test at 10-ms intervals, Bonferroni–Holm corrected; these results were not affected by the normalization procedure, as indicated by the identical statistical outcome using absolute firing rates (Fig. S5)]. In particular, suppression in V1 did not emerge until well after the initial volley of visual activity, as exemplified by the blue vertical dotted line in Fig. 3A (Fig. S6A). In contrast, suppression in SCs emerged within or before the peak of the initial volley of visual activity, as exemplified by the red vertical dotted line in Fig. 3C. To quantify the magnitude of these differences, we compared the percentage of suppression within two time periods: (i) during the peak of the visual response and (ii) during the sustained portion of the response (details are in SI Materials and Methods). Percentage suppression (Fig. 3 E–H) was significantly greater for SCs neurons (Fig. 3, red) than for V1 neurons (Fig. 3, blue) during both the peak (Fig. 3E, array-aligned, P = 9.3e-07; Fig. 3G, saccade end-aligned, P = 1.2e-05) and sustained (Fig. 3F, array-aligned, P = 0.0089; Fig. 3H, saccade end-aligned, P = 2.8e-04) portions of the response (rank sum test).
As with SROL above, we computed the surround suppression onset latency (SSOL) for each neuron that showed significant surround suppression, defined as the earliest time at which a neuron signaled a significantly greater response for the single-item IN condition (no surround) vs. the oddball IN condition (surround) (SI Materials and Methods). Fig. 3I shows that, for V1 neurons in the stimulus-aligned condition, SSOL (median = 73 ms, n = 69) was significantly delayed relative to VROL (median = 40 ms, n = 69; P = 6.8e-16). In contrast, for SCs neurons in the stimulus-aligned condition, SSOL (median = 51 ms, n = 25) was not significantly delayed relative to VROL (median VROL = 49 ms, n = 25; P = 0.18). Fig. 3J shows that, for V1 neurons in the saccade end-aligned condition, SSOL (median = 95 ms, n = 52) was again significantly delayed relative to VROL (median = 46 ms, n = 52; P = 2.2e-12). For SCs neurons in the saccade end-aligned condition, SSOL (median = 51 ms, n = 22) was moderately delayed relative to VROL (median VROL = 39 ms, n = 22; P = 0.0004). Importantly, SSOL emerged in V1, on average, 22–44 ms later than in SCs (note the difference between the red and blue dotted curves in Fig. 3 I and J) (P = 2.8e-05 for stimulus-aligned data; P = 6.1e-05 for saccade end-aligned data, rank sum test). These results were qualitatively similar using V1 single units alone (Fig. S6 I and J). The results were also qualitatively similar whether RFs were overlapping or nonoverlapping between brain areas (Fig. S7). This indicates that, similar to the SROL results described above, SSOL also showed relatively slow dynamics in V1. This suggests either a processing delay to generate surround suppression within V1 itself or that surround modulation in V1 arises, in part, via feedback from other brain areas (32–34).
In either case, these results are in agreement with the hypothesis that the SCs, but not V1, has center-surround characteristics suited for computing saliency from wide-field visual inputs (35).
Goal-Irrelevant Saliency Is Weakly Represented in SCi.
The input–output structure and response characteristics of SCi neurons differ in many important ways from SCs neurons (23–25). In particular, SCi neurons receive a diverse set of inputs predominantly from frontal and parietal cortices and the basal ganglia (Fig. 1B) and project directly to the brainstem saccade generator (23–25). Functionally, these neurons have multisensory and oculomotor responses and have been shown to play a central role in spatial attention (19, 20). This has led to the hypothesis that SCi is best described as a priority map for the control of attention and gaze (4) (Fig. 1 A and B). For these reasons, we predicted that SCi neurons would show important differences from SCs neurons on our task designed to measure goal-irrelevant saliency.
Fig. 4 shows the results of our sample of n = 25 SCi neurons. In contrast to SCs neurons (Fig. 2), SCi neurons only weakly signaled the presence of the salient oddball, showing a small difference over a shorter period than observed in SCs (Fig. 4 A and B) (P < 0.05, moving Wilcoxon signed rank test, Bonferroni–Holm corrected). Moreover, the response of SCi neurons was especially attenuated in the saccade end-aligned condition (Fig. 4B). Recall that these response profiles were normalized to the response to a single unitary item (Materials and Methods and Fig. 1E); thus, while SCi neurons responded well to a single visual stimulus (Fig. 4C, black trace), presentation of the entire array resulted in a dramatic response attenuation (Fig. 4 A–D, yellow traces). In addition, SCi neurons showed significant surround suppression during the peak response in the stimulus-aligned condition (Fig. 4E) (P = 0.0005, Wilcoxon signed rank test against zero median) but not during the sustained portion of the response (Fig. 4F) (P = 0.072) or in the saccade end-aligned condition (Fig. 4 G and H) (P > 0.49 in both cases). SSOL emerged around the same time as or just after VROL, and the difference was not significant (Fig. 4I) (median VROL = 48 ms, median SSOL = 52 ms, n = 14; P = 0.24, rank sum test).
These results suggest that SCi neurons are not a good candidate for the representation of a saliency map, a conclusion consistent with the results of an earlier study (26). Moreover, the strong attenuation of the overall response in the saccade end-aligned case implies that locations on the SCi map not associated with the saccade goal were particularly suppressed around the time of the saccade, and that this suppression continued briefly into the subsequent fixation when the stimuli were brought into the RF (Fig. 4D). This was not the result of a misalignment between the stimulus and RF (because of less than perfect saccade accuracy), because V1 neurons showed little attenuation under the same conditions (Fig. 3B), and stimulus–RF misalignment would have been more likely in V1 given its characteristically smaller RFs (36, 37). These results suggest a mechanism that limits the passage of goal-irrelevant saliency signals from SCi to downstream gaze control circuitry.
SI Materials and Methods
Animal Preparation.
Data were collected from four male Rhesus monkeys (Macaca mulatta) weighing between 10 and 12 kg: two for the SC recordings (monkeys I and U) and two for the V1 recordings (monkeys D and Y). For the SC recordings, surgical procedures and extracellular recording techniques have been detailed previously (45). For V1 recordings, one animal (monkey D) had a recording chamber implanted over V1, centered on the midline to allow access to both left and right lower visual field representations using single microelectrodes. In the second animal (monkey Y), a 96-channel microelectrode array (Blackrock Microsystems) was chronically implanted on the surface of the right V1 using surgical procedures outlined by Blackrock Microsystems (46). All animal care and experimental procedures were approved by the Queen’s University Animal Care Committee in accordance with the guidelines of the Canadian Council on Animal Care.
Stimuli and Equipment.
Visual stimuli were presented on a high-definition LCD video monitor (Sony Bravia 55 inch; model KDL-46XBR6) at a screen resolution of 1,920 × 1,080 pixels (60-Hz noninterlaced, 16-bit color depth). Viewing distance was 70 cm, resulting in a viewing angle of 82° horizontally and 52° vertically. The part of the visual field extending beyond the monitor was covered with black nonreflective cloth.
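As a worked check of the stated geometry, the following sketch computes the visual angle subtended by a flat 16:9 screen viewed head-on, assuming a 55-inch diagonal (the 1,920 × 1,080 resolution implies a 16:9 aspect ratio). Under these assumptions, a 70-cm viewing distance gives ≈82° × 52°, matching the values above. The function names are illustrative.

```python
import math


def screen_visual_angle(diagonal_in, aspect_w, aspect_h, distance_cm):
    """Full horizontal and vertical visual angle (deg) of a flat screen
    viewed head-on from its center; also returns the screen width (cm)."""
    diag_units = math.hypot(aspect_w, aspect_h)
    width_cm = diagonal_in * 2.54 * aspect_w / diag_units
    height_cm = diagonal_in * 2.54 * aspect_h / diag_units
    horiz = 2 * math.degrees(math.atan(width_cm / 2 / distance_cm))
    vert = 2 * math.degrees(math.atan(height_cm / 2 / distance_cm))
    return horiz, vert, width_cm


def deg_to_px_at_center(size_deg, px_across, width_cm, distance_cm):
    """Pixels spanned by a stimulus of size_deg centered on the line of sight."""
    size_cm = 2 * distance_cm * math.tan(math.radians(size_deg / 2))
    return size_cm * px_across / width_cm
```

With the values above, `screen_visual_angle(55, 16, 9, 70)` returns approximately 82.0° and 52.1°, and a 1.2° bar near the screen center spans about 23 pixels. Because a flat screen is not equidistant from the eye, a fixed pixel size subtends a smaller angle in the periphery, so item sizes at larger eccentricities would need a location-specific conversion.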
The main stimuli consisted of a radial arrangement of equally spaced color bars (210 items), with the diameter of the entire display spanning 40° to 45° visual angle (Fig. 1 C and D). The items were horizontally or vertically oriented (typically 0.4° × 1.2° but modified slightly depending on RF size of V1 neurons) and were red or green derived from the red–green cardinal axis in Derrington–Krauskopf–Lennie (DKL) color space (47), with −40% luminance contrast relative to the neutral gray background (65 cd/m2). The main condition consisted of a stimulus array with a single oddball that always differed from the remaining items in both color and orientation [e.g., red horizontal against green vertical (depicted in Fig. 1 C and D), red vertical against green horizontal, green horizontal against red vertical, green vertical against red horizontal].
The oddball could appear IN or OPP the RF. This was compared with a single-item control condition, in which a single red or green, horizontal or vertical, stimulus appeared IN or OPP the RF (Fig. 1 E and F). All experimental conditions were interleaved.
The tasks were controlled by a Dell 8100 computer running a UNIX-based real-time data control system [real-time experimentation system (REX) 7.6] (48), which communicated with a second computer running in-house graphics software (written in C++) for presentation of stimuli. Stimulus timing was controlled using a photodiode placed at the lower left corner of the monitor and hidden by nonreflective tape. The photodiode measured the onset of a stimulus (20 × 20 pixels) that pulsed for one frame simultaneously with the onset of the main stimuli (i.e., the photodiode stimulus turned white for one frame and then returned to black). REX was synchronized to the timing of the photodiode pulse by holding its current state until the pulse was detected.
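The synchronization logic can be sketched as a rising-edge detector applied to a sampled photodiode signal. This is an illustration only: REX waited in its current state until the pulse arrived rather than searching a recorded trace, and the sampling interval and threshold used here are hypothetical. At 60 Hz, the one-frame pulse lasts about 16.7 ms.

```python
def detect_rising_edge(trace, threshold, dt_ms=1.0):
    """Time (ms) of the first sample at which the photodiode signal crosses
    threshold from below, or None if no pulse is found.
    Assumption: threshold is a hypothetical value between dark and white levels."""
    for i in range(1, len(trace)):
        if trace[i - 1] <= threshold < trace[i]:
            return i * dt_ms
    return None


def align_to_onset(event_times_ms, onset_ms):
    """Re-express spike or event times relative to the measured stimulus onset."""
    return [t - onset_ms for t in event_times_ms]
```

Aligning spikes to the measured pulse rather than to the software command removes any variable delay between the drawing request and the monitor refresh.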
Eye position was monitored using a 1,000-Hz video-based eye tracker (Eyelink 1000; SR Research). Saccades were detected using velocity (eye velocity > 50°/s) and amplitude (>1°) criteria and confirmed offline. Saccade end was defined as the first time point at which the velocity of a detected saccade fell below the velocity threshold defined above, provided that the eye was inside the ∼3° × 3° computer-controlled (imaginary) target window required for receiving a reward (only rewarded trials were analyzed). The data were recorded on a third computer running a multichannel data acquisition system (Plexon Inc.). Eye position, event data, and spike times were digitized at 1 kHz.
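A minimal sketch of velocity-threshold saccade detection with the criteria stated above (50°/s, 1°) and a square ∼3° × 3° target window follows. The two-point velocity estimate without smoothing, the onset and end conventions, and the function names are assumptions; video eye-tracker data usually need filtering before thresholding.

```python
import math


def detect_saccades(x, y, dt_ms=1.0, vel_thresh=50.0, min_amp=1.0):
    """Velocity-threshold saccade detection on eye position (deg).

    Returns (onset_index, end_index, amplitude_deg) tuples. Onset is the last
    sample before speed exceeds vel_thresh (deg/s); end is the first sample at
    which speed falls back to or below it. Assumption: no smoothing is applied.
    """
    n = len(x)
    speed = [0.0] * n
    for i in range(1, n):
        speed[i] = math.hypot(x[i] - x[i - 1], y[i] - y[i - 1]) / (dt_ms / 1000.0)
    found = []
    i = 1
    while i < n:
        if speed[i] > vel_thresh:
            j = i
            while j < n and speed[j] > vel_thresh:
                j += 1
            if j == n:
                break  # saccade still in flight at the end of the record
            amp = math.hypot(x[j] - x[i - 1], y[j] - y[i - 1])
            if amp > min_amp:  # amplitude criterion
                found.append((i - 1, j, amp))
            i = j
        i += 1
    return found


def ends_in_window(x_end, y_end, target, half_width=1.5):
    """True if the saccade endpoint lies inside a square (~3 deg x 3 deg) window."""
    return abs(x_end - target[0]) <= half_width and abs(y_end - target[1]) <= half_width
```

For a synthetic 10° step at 500°/s, the detector reports a single saccade whose end sample lies inside a window centered on the 10° target.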
Procedure.
Monkeys were seated with the head restrained in a primate chair (Crist Instruments) ≈70 cm from the LCD video monitor. For SC recordings, single glass-insulated tungsten microelectrodes (2.0 MΩ; Alpha Omega) were lowered into the SC through a stainless steel guide tube. For V1 recordings in one animal (monkey D), we used more durable tungsten microelectrodes (250-µm shank diameter; 1.0 MΩ; FHC) that could pierce the dura directly, stabilized by a guide tube positioned against the dura. V1 recordings in the second animal (monkey Y) were obtained from a 96-channel chronically implanted microelectrode array (impedance range = 0.08–0.35 MΩ; Blackrock Microsystems). The animals viewed a dynamic video, which provided rich visual stimulation that facilitated localization of the visually responsive dorsal SC surface or the uppermost layers of V1.
When a neuron was isolated, its visual RF was mapped using a rapid visual stimulation procedure described previously (21). Briefly, for SC neurons, single stimuli (white spot against a black background) were successively flashed (150-ms flashes with 150-ms interflash intervals) across the visual field while the animal held fixation on a central fixation stimulus. For V1 neurons, whose RFs are significantly smaller than those in SC (36, 37), the visual stimulation procedure was restricted to a ∼5° × 5° grid positioned over the approximate location of the neuron’s RF. The exact size and position of the stimulus grid could be adjusted online. The stimuli consisted of alternating black–white flashes against a neutral gray background (65 cd/m2). For neurons obtained from the V1 array, this stimulus grid was positioned so that we could coarsely map the spatial RFs across all electrode sites simultaneously. This online RF mapping was used to determine stimulus placement in the main experiment. Although it was not possible to determine the optimal stimulus parameters (orientation, size, direction of motion), the chosen stimulus parameters evoked vigorous visual responses for most of the V1 neurons and multiunits that we sampled; we therefore assumed that our stimuli were effective for the purpose of this study. For SC neurons, a delayed saccade task was also used to characterize whether the neurons had visual and/or motor response properties using previously established methods (21). The RFs of V1 neurons fell in the lower left visual field, a few degrees below the horizontal meridian, ≈5° to 10° eccentric from the fovea. The RFs of SC neurons were generally left or right of the fovea and were above, below, or very near the horizontal meridian at eccentricities ranging from 6° to 24°, with the most frequent eccentricity at ∼9° (mode = 8.8°).
The array was then rotated so that the oddball was placed at the center of the RF or at the location opposite the RF.
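The mapping and placement steps described above can be sketched as follows, under several stated assumptions: the flash grid is a hypothetical 5 × 5 lattice over the ∼5° × 5° V1 mapping region, the RF center is estimated as a response-weighted centroid, and the OPP location is taken as the RF location reflected through fixation. Rotating the whole array about fixation keeps every item's eccentricity and spacing intact while moving the oddball to the desired location. All names are illustrative.

```python
import math
import random


def flash_schedule(center, grid_deg=5.0, n_side=5, on_ms=150, off_ms=150, seed=None):
    """Shuffled flash sequence on an n_side x n_side grid spanning grid_deg
    (deg) around center. Returns (onset_ms, offset_ms, (x, y)) tuples.
    Assumption: grid resolution and shuffling are illustrative choices."""
    cx, cy = center
    step = grid_deg / (n_side - 1)
    positions = [(cx - grid_deg / 2 + i * step, cy - grid_deg / 2 + j * step)
                 for i in range(n_side) for j in range(n_side)]
    random.Random(seed).shuffle(positions)
    schedule, t = [], 0
    for pos in positions:
        schedule.append((t, t + on_ms, pos))
        t += on_ms + off_ms  # 150-ms flash, then a 150-ms interflash interval
    return schedule


def rf_centroid(responses):
    """responses: {(x, y): baseline-subtracted mean evoked rate}.
    Assumption: the RF center is the response-weighted centroid of the
    positive responses."""
    pos = {p: r for p, r in responses.items() if r > 0}
    total = sum(pos.values())
    return (sum(p[0] * r for p, r in pos.items()) / total,
            sum(p[1] * r for p, r in pos.items()) / total)


def opposite_features(color, orientation):
    """The oddball differs from the distractors in both color and orientation."""
    return ("green" if color == "red" else "red",
            "vertical" if orientation == "horizontal" else "horizontal")


def place_oddball(items, rf_xy, condition="IN"):
    """Rotate the item layout about fixation so the oddball lands at the RF
    center (IN) or at the point reflected through fixation (OPP, assumed).
    items: (x, y) positions in deg relative to fixation.
    Returns (rotated_items, oddball_index)."""
    rf_ecc = math.hypot(*rf_xy)
    rf_angle = math.atan2(rf_xy[1], rf_xy[0])
    target = rf_angle if condition == "IN" else rf_angle + math.pi
    # Assumption: the item whose eccentricity best matches the RF becomes the oddball.
    k = min(range(len(items)), key=lambda i: abs(math.hypot(*items[i]) - rf_ecc))
    rot = target - math.atan2(items[k][1], items[k][0])
    c, s = math.cos(rot), math.sin(rot)
    # A rigid rotation preserves every item's eccentricity and spacing.
    return [(c * x - s * y, s * x + c * y) for x, y in items], k
```

In practice, the RF estimate from `rf_centroid` would feed `place_oddball`, and `opposite_features` gives the oddball's color and orientation for each of the four distractor combinations.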
The animals then performed the main task. Specifically, on a given trial (Fig. 1), the animal fixated a fixation point (FP; black Gaussian-windowed spot, SD = 0.3°), which appeared at the screen center (array-aligned or single item-aligned conditions) (Fig. 1 C and E) or above or below center (saccade end-aligned conditions) (Fig. 1 D and F), orthogonal to the RF at the same eccentricity (exactly ±90° polar angle from RF center). The animals were required to maintain fixation on the FP for a random period of 0.5–0.7 s, after which the goal-irrelevant stimulus/array appeared. The animals were required to continue fixating on the FP for an additional 0.5–0.7 s, after which the FP stepped from center to one of the specified peripheral locations (array-aligned or item-aligned conditions) (Fig. 1 C and E) or from the peripheral location to center (saccade end-aligned conditions) (Fig. 1 D and F). The animals were then required to make a saccade to the new location of the FP and to hold fixation there for 0.5–0.7 s within a ∼3° × 3° computer-controlled window, after which a liquid reward was delivered. Thus, in the array/item-aligned case, the goal-irrelevant but salient stimulus (oddball or single item) appeared abruptly IN or OPP the RF, whereas in the saccade end-aligned case, the goal-irrelevant stimulus (oddball or single item) was brought IN or OPP the RF by the saccade to screen center. Importantly, when the eyes were at center, the stimulus display was the same in both conditions. Visually evoked responses were measured at the stimulus-aligned vs. saccade end-aligned time points, which are highlighted by the black outlines in the key frames of the illustration in Fig. 1 C–F.
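The trial geometry can be sketched in a few lines. The ±90° rotation of the RF vector follows the description above; drawing each random period uniformly from 0.5–0.7 s and choosing between the two orthogonal FP locations at random are assumptions, and the event dictionary is a hypothetical structure, not the REX state machine.

```python
import random


def orthogonal_fp_positions(rf_xy):
    """FP locations at the RF eccentricity, rotated +90 and -90 deg in polar
    angle from the RF center about the screen center."""
    x, y = rf_xy
    return (-y, x), (y, -x)


def trial_events(condition, rf_xy, rng):
    """Hypothetical event schedule (s) for one trial of the main task.
    condition: 'stimulus_aligned' or 'saccade_end_aligned'.
    Assumption: random periods are drawn uniformly from 0.5-0.7 s."""
    peripheral = rng.choice(orthogonal_fp_positions(rf_xy))
    center = (0.0, 0.0)
    if condition == "stimulus_aligned":
        # Eyes at center when the display appears: oddball/item is IN or OPP at onset.
        fp_start, fp_end = center, peripheral
    else:
        # Eyes start peripheral: the saccade to center brings the oddball/item IN or OPP.
        fp_start, fp_end = peripheral, center
    array_on = rng.uniform(0.5, 0.7)             # fixation before the display appears
    fp_step = array_on + rng.uniform(0.5, 0.7)   # continued fixation with the display on
    hold = rng.uniform(0.5, 0.7)                 # fixation at the new FP before reward
    return {"fp_start": fp_start, "fp_end": fp_end, "array_on_s": array_on,
            "fp_step_s": fp_step, "hold_s": hold}
```

Because the FP sits at the RF eccentricity but 90° away in polar angle, the display seen from the center is identical in both conditions; only the way the oddball arrives in the RF differs.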
Data Analyses.
Single unit and multiunit recordings.
Single units were isolated online using a window discriminator and confirmed offline using spike-sorting software (Plexon Inc.). Spikes were convolved with a function that resembled an excitatory postsynaptic potential (49), with rise and decay time constants of 5 and 20 ms, respectively. A total of 94 neurons were isolated (38 V1, 31 SCs, 25 SCi). With the microelectrode array in V1, several electrodes yielded reliable visually evoked multiunit spiking activity that showed results qualitatively similar to those of the V1 single units. Therefore, for the main analyses, we combined V1 single units (n = 38) and multiunit sites (n = 55) for a total of n = 93. Although this gave V1 a larger sample size and thus potentially greater statistical power, SCs neurons still showed a more robust and earlier saliency representation, which strengthens the support for our hypothesis.
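The spike-smoothing step can be sketched as a causal convolution. This assumes the commonly used EPSP-like form k(t) = (1 − e^(−t/τ_rise)) · e^(−t/τ_decay) with τ_rise = 5 ms and τ_decay = 20 ms; ref. 49 may parameterize the kernel differently. Because the kernel is zero for t < 0, a spike never raises the estimated rate before it occurs, which avoids biasing onset latencies early.

```python
import math


def epsp_kernel(rise_ms=5.0, decay_ms=20.0, length_ms=200, dt_ms=1.0):
    """Causal EPSP-like kernel (1 - exp(-t/rise)) * exp(-t/decay), t >= 0,
    scaled so that one spike integrates to 1 (output in spikes/s).
    Assumption: this functional form and the 200-ms truncation."""
    k = [(1 - math.exp(-i * dt_ms / rise_ms)) * math.exp(-i * dt_ms / decay_ms)
         for i in range(int(length_ms / dt_ms))]
    area_s = sum(k) * dt_ms / 1000.0
    return [v / area_s for v in k]


def spike_density(spike_times_ms, t_start_ms, t_stop_ms, kernel, dt_ms=1.0):
    """Sum one kernel per spike; each spike affects only later samples."""
    n = int((t_stop_ms - t_start_ms) / dt_ms)
    sdf = [0.0] * n
    for s in spike_times_ms:
        i0 = int(round((s - t_start_ms) / dt_ms))
        for j, v in enumerate(kernel):
            idx = i0 + j
            if idx >= n:
                break
            if idx >= 0:
                sdf[idx] += v
    return sdf
```

With these time constants, the kernel peaks 5·ln 5 ≈ 8 ms after each spike; trial-averaged spike density functions are obtained by averaging the per-trial outputs.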
SC neuron classification.
The SC is composed of two dominant functional layers (9, 10), a visual-only superficial layer (SCs) and a multisensory–cognitive–motor-related intermediate layer (SCi). SC neurons were functionally classified as visual SCs or visuomotor SCi based on their discharge characteristics using a visual RF mapping procedure (21) to determine the presence of a visual component and a delayed saccade task to determine the presence of a motor component using previously established methods (21, 31). Briefly, neurons were defined as having a visual component if the visual mapping procedure yielded the presence of a localized visual RF (21). Neurons were defined as having a motor component if the average firing rate around the time of the saccade (−25 to +25 ms relative to saccade onset) into the neuron’s preferred visual RF was significantly greater than a presaccade baseline period (−150 to −50 ms relative to saccade onset).
Data normalization.
The normalization procedure was similar to conventional normalization to the maximum firing rate, except that we scaled the firing rate of each neuron to its minimum and maximum evoked by the single item within the relevant time window (from stimulus onset for 500 ms). Specifically, for each neuron, neuronal discharge rates were normalized to the minimum and maximum values of the single-item condition (which always yielded the greatest response) using a zero-to-one rescaling of the spike density function:
$$\mathrm{SDF}_{\mathrm{norm}}(t) = \frac{\mathrm{SDF}(t) - \mathrm{SDF}_{\min}}{\mathrm{SDF}_{\max} - \mathrm{SDF}_{\min}},$$
where $\mathrm{SDF}(t)$ is the original spike density function (averaged across trials) for a given condition, and $\mathrm{SDF}_{\min}$ and $\mathrm{SDF}_{\max}$ are the minimum and maximum values, respectively, within the poststimulus period (0–500 ms) of the single-item condition. Thus, when normalizing the array condition, for example, its response can be read as a proportion of the maximum response evoked by a unitary item. This retains the relative differences between conditions for a given neuron. It also makes the percentage surround suppression index intuitive and easy to compute, because the response curves in the array condition are always some proportion of the maximum response elicited by the single item.
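The min–max rescaling above amounts to a few lines of code. This sketch assumes trial-averaged spike density functions sampled at 1 ms, with `zero_index` (a hypothetical parameter) marking the stimulus-onset sample.

```python
def minmax_normalize(sdf, single_item_sdf, zero_index, window=500):
    """Rescale a condition's SDF by the min and max of the single-item SDF
    within the 0-500 ms poststimulus window (zero_index = onset sample).
    Assumption: 1-ms samples, so window=500 covers 500 ms."""
    ref = single_item_sdf[zero_index:zero_index + window]
    lo, hi = min(ref), max(ref)
    return [(v - lo) / (hi - lo) for v in sdf]
```

Because the single item always produced the greatest response, normalized traces for the other conditions stay at or below 1; they can fall slightly below 0 if a condition drops below the single-item minimum.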
Latency and magnitude indices.
VROL, SROL, and SSOL were computed for each neuron when possible (i.e., when the neuron exhibited each of these properties). We used an established method from the attention selection literature (10, 50, 51): a running Wilcoxon rank sum test at a fixed alpha level (P < 0.05). This method has been shown to be comparable with conventional receiver operating characteristic analysis for determining neuronal selection times (51) and therefore provides a reasonable estimate of latencies and selection times that is comparable with other studies. The test was run on a temporally averaged moving window at every millisecond from 0 to 300 ms. Specifically, VROL was defined as the time at which activation (averaged over a moving 5-ms window) was significantly greater than a prestimulus baseline (single item-aligned condition: −100 ms to stimulus onset; saccade end-aligned condition: −200 to −100 ms relative to saccade onset). SROL was defined as the time at which activation (averaged over a moving 50-ms window) was significantly greater for the oddball IN than for the oddball OPP condition. We used a longer moving window here than for VROL because the dynamics of this process were slower and more variable, and the longer window yielded more reliable results. SSOL was defined similarly to SROL, but instead of comparing the oddball IN with the oddball OPP condition, the single-item IN condition (no surround) was compared with the oddball IN condition (surround). The latency estimates were verified by plotting each neuron’s VROL, SROL, and SSOL against its spike density function.
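The running test described above can be sketched as follows. This is an illustration under stated assumptions, not the code used here: trial-level spike density functions are sampled at 1 ms, the moving window is trailing (it averages only the current and preceding samples), the latency is the first millisecond with P < 0.05 (no persistence criterion is imposed), and the rank sum test uses the normal approximation. Function names are hypothetical.

```python
import math


def rank_sum_greater_p(a, b):
    """One-sided Wilcoxon rank sum (Mann-Whitney U) p-value for a > b.
    Assumption: normal approximation; ties get average ranks, and the
    variance is not tie-corrected."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    n1, n2 = len(a), len(b)
    u = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 0.5 * math.erfc(z / math.sqrt(2))  # equals 1 - Phi(z)


def trailing_means(trials, end, width):
    """Per-trial mean over samples end-width+1 .. end (inclusive)."""
    return [sum(tr[end - width + 1:end + 1]) / width for tr in trials]


def onset_latency(trials, reference_at, zero_index, width, t_max=300, alpha=0.05):
    """First t in 0..t_max ms at which the windowed response of `trials` is
    significantly greater than reference_at(t); None if never."""
    for t in range(t_max + 1):
        test = trailing_means(trials, zero_index + t, width)
        if rank_sum_greater_p(test, reference_at(t)) < alpha:
            return t
    return None
```

For VROL, the reference is each trial's fixed baseline mean, for example `onset_latency(trials, lambda t: baseline, zero, 5)`; for SROL, it is the moving-window mean of the OPP trials, `onset_latency(in_trials, lambda t: trailing_means(opp_trials, zero + t, 50), zero, 50)`; SSOL tests single-item IN trials against oddball IN trials in the same way. A persistence requirement, common in other studies, could be added inside the loop.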
Because surround suppression emerged at noticeably different times across the brain areas (Fig. 3), we computed surround suppression magnitude indices during two periods: the peak response and a later, sustained portion of the response. The surround suppression index was defined as the ratio of the response evoked by the single item relative to the response evoked by the array. The time of the peak response differed for each neuron and brain area but, in general, ranged from ∼40 to 100 ms after stimulus onset. The sustained period was defined as the 300-ms period beginning at the VROL of a given neuron.
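The two analysis windows and the index can be sketched as below. The sketch follows the definition in the text literally: the single-item response divided by the array response, so values above 1 indicate suppression. The following choices are hypothetical, made for illustration only:

- The peak window spans ±10 ms around the single-item peak, which is searched for within 40–100 ms. The paper does not state how the peak window was bounded.
- All function names are illustrative.
- Spike density functions are assumed to be sampled in 1-ms bins.

```python
from typing import Sequence, Tuple

Window = Tuple[int, int]  # (start_ms, end_ms) relative to stimulus onset, end exclusive


def mean_in_window(sdf: Sequence[float], onset_idx: int, window: Window) -> float:
    start, end = window
    seg = sdf[onset_idx + start:onset_idx + end]
    return sum(seg) / len(seg)


def peak_window(single_sdf: Sequence[float], onset_idx: int,
                search: Window = (40, 100), half_width: int = 10) -> Window:
    """Hypothetical: center a window on the single-item peak found within 40-100 ms."""
    lo, hi = search
    peak = max(range(lo, hi + 1), key=lambda t: single_sdf[onset_idx + t])
    return (peak - half_width, peak + half_width + 1)


def sustained_window(vrol_ms: int, duration_ms: int = 300) -> Window:
    """The 300-ms period beginning at the neuron's VROL."""
    return (vrol_ms, vrol_ms + duration_ms)


def suppression_index(single_sdf: Sequence[float], array_sdf: Sequence[float],
                      onset_idx: int, window: Window) -> float:
    """Single-item response divided by array response within the window."""
    return mean_in_window(single_sdf, onset_idx, window) / mean_in_window(array_sdf, onset_idx, window)


single = [0.0] * 50 + [1.0] * 350
array = [0.0] * 50 + [0.5] * 350
```

For these toy curves, the array response is half the single-item response in both windows, so the index is 2 in each.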
Discussion
This study compared saliency coding in two early visual areas of the primate visual system: V1 and SCs, the hub of the evolutionarily older visual system. We found that, while both V1 and SCs neurons encoded the presence of a salient but goal-irrelevant oddball, the saliency representation emerged in V1 only after it had already emerged in SCs (Fig. 2 E and F). This indicates that, although V1 is the dominant input to SCs (24, 25), the saliency representation in SCs was not simply a readout of computations done upstream in V1. This is notable because V1 neurons registered the arrival of visual signals reliably earlier than SCs in the stimulus-aligned condition (although not in the saccade end-aligned condition). Yet V1 was slower to signal both saliency (Fig. 2 E and F) and surround modulation (Fig. 3), an essential component for computing saliency across the visual field (1–3). Moreover, the saliency representation was generally limited to the superficial visual SC layers. SCi visuomotor neurons, whose dominant inputs arise from frontal–parietal cortices and the basal ganglia (23–25), did not show a significant preference for the salient oddball (Fig. 4 A and B). This finding is consistent with an earlier study that reported a weak saliency representation in SCi during free viewing of natural scenes (26). Taken together, the results provide evidence that the evolutionarily older SCs, not V1, embodies the role of a saliency map.
The relative timing of these visual representations offers insight into the flow of information processing between V1 and SC. Using the traditional index of visual latency timed from stimulus onset (Fig. 2 A and C), V1 was the earliest to receive visual inputs (Fig. 2), which is consistent with its role as an early feature extraction stage (Fig. 1 A, blue and B, blue). The evolutionarily older SCs then signaled the arrival of visual inputs, which may, in part, be inherited from pooled inputs of multiple V1 neurons (24, 25). Such pooling would effectively cancel out most feature-specific preferences, consistent with the broadly tuned characteristics of SCs neurons (31, 38, 39). Interestingly, VROL was the same or shorter for SCs than for V1 in the saccade end-aligned case (Fig. 2F and Figs. S1, S3–S5, and S7). This might indicate that, under real-world viewing conditions, where stimuli are more often brought into a neuron’s RF by saccades, retinotectal inputs dominate the earliest part of the SCs visual response. In either case, SCs signaled the presence of the salient oddball before V1. Notably, the saliency representation in SCs was immediately preceded by robust surround suppression, which was markedly stronger and earlier than in V1. This is strong evidence that SCs has center-surround characteristics better suited for computing saliency from wide-field visual inputs (35). In SCi, visual inputs not associated with the saccade goal were strongly attenuated (Fig. 4D). This result suggests a mechanism that suppresses goal-irrelevant visual inputs in SCi, thereby prioritizing which salient signals gain access to the downstream gaze control system. Taken together, these results support a simple, intuitive framework for the formation of the saliency map in the primate midbrain, echoing earlier studies in the premammalian optic tectum (16–18).
The timing of saliency representations in other cortical brain areas also suggests that they are not the source of the saliency representation observed here in SCs. For example, V4 neurons signal the presence of an oddball too late (∼85–100 ms poststimulus) to be the source of the saliency representation observed in SCs (∼65 ms poststimulus) (10). Similarly, estimates of a pure saliency response in LIP emerge too late (∼70–75 ms poststimulus onset) (13) to be the source of the saliency representation in SCs; moreover, LIP does not project to SCs (24, 25). The timing of the V1 saliency representation in our study is also very similar to the timing of V1 figure-ground modulation (40, 41); both reflect a modulation that occurs only well after the initial transient visual response. In sum, these cortical saliency representations appear too slow to be the source of the saliency representation observed in SCs. Whether these cortical saliency representations arise from ascending tectothalamocortical pathways [for example, via pulvinar (42, 43)] is unknown.
Although there have been viable arguments for the existence of a feature-agnostic saliency map in V1 (6, 7), these arguments are weakened by the lack of direct neurophysiological evidence (a detailed discussion is in ref. 44). The results of this study further challenge the hypothesis that V1 generates the visual saliency map. They show that SCs neurons, whose dominant inputs arise from V1, encoded the presence of a salient oddball sooner and more robustly than V1. This is in close agreement with a recent V1 lesion study by Yoshida et al. (29), which used a computational saliency model to weigh the contribution of each visual feature to free-viewing gaze behavior after surgical removal of V1. In that study, the removal of V1 abolished the contribution of oriented features, whereas the remaining features (luminance, color, motion) were relatively unaffected. Taken together, these findings are strong evidence that V1 is not essential for the computation of saliency, whereas SCs remains a strong candidate for the early coding of saliency.
While it has been argued that the saliency map evolved from the optic tectum (SC) in lower vertebrates to V1 in primates (7), the results of this study pose a serious challenge to this hypothesis. It would seem advantageous for primitive species to have a mechanism that rapidly engages the orienting system toward stimuli defined by conspicuous features. The evolutionarily old visual system in the midbrain seems to have solved this biological problem long before the elaboration of visual cortex (16–18). A potential functional advantage of a saliency mechanism in SC is its proximity to the gaze-orienting circuitry. The sensory and motor maps of the SCs and SCi are closely aligned (45), which ideally situates them to integrate diverse inputs from cortex and to engage the orienting system to act on those inputs. Thus, we postulate that, through evolution, V1 was integrated into an already existing saliency system, providing greater feature specificity and finer control over visually guided behavior.
Materials and Methods
Data were collected from four male rhesus monkeys (Macaca mulatta): two for the SC recordings and two for the V1 recordings (details are in SI Materials and Methods). All procedures were approved by the Queen’s University Animal Care Committee in accordance with the guidelines of the Canadian Council on Animal Care.
The stimuli consisted of a radial arrangement of equally spaced color bars (210 items), with the entire display spanning 40° to 45° of visual angle in diameter (Fig. 1 C and D). The oddball, which could appear IN or OPP the RF, always differed from the remaining items in both color and orientation, and all stimulus combinations were run interleaved. A single-item control condition served as a benchmark with no surround stimuli (Fig. 1 E and F).
On each trial (Fig. 1), the animal fixated a fixation point (FP). The FP appeared at the screen center in the stimulus-aligned conditions (Fig. 1 C and E). In the saccade end-aligned conditions (Fig. 1 D and F), it appeared above or below center, orthogonal to the RF at the same eccentricity. The animals were required to fixate the FP for 0.5–0.7 s, after which the goal-irrelevant stimulus/array appeared. The animals continued to fixate the FP for an additional 0.5–0.7 s. The FP then stepped either from center to one of the specified peripheral locations (array/item-aligned conditions) (Fig. 1 C and E) or from the peripheral location to center (saccade end-aligned conditions) (Fig. 1 D and F). The animals were then required to make a saccade to the new FP location and hold fixation for 0.5–0.7 s within a ∼3° × 3° computer-controlled window, after which a liquid reward was delivered. Visually evoked responses were measured at the stimulus-aligned and saccade end-aligned time points, highlighted by the black outlines in key frames of Fig. 1 C–F.
SI Materials and Methods has extended details about stimuli, equipment, procedure, and data analyses. The data, materials, and code that support the findings of this study are available from B.J.W. on reasonable request.
Acknowledgments
We thank Ann Lablans, Donald Brien, Sean Hickman, and Mike Lewis for technical assistance. This project was funded by Canadian Institutes of Health Research Grant FDN-148418 and National Science Foundation Grants BCS-0827764 and CCF-1317433. D.P.M. was supported by the Canada Research Chair Program.
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission. M.C. is a guest editor invited by the Editorial Board.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1701003114/-/DCSupplemental.
References
- 1. Itti L, Koch C, Niebur E. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell. 1998;20:1254–1259.
- 2. Itti L, Koch C. Computational modelling of visual attention. Nat Rev Neurosci. 2001;2:194–203. doi: 10.1038/35058500.
- 3. Borji A, Itti L. State-of-the-art in visual attention modeling. IEEE Trans Pattern Anal Mach Intell. 2013;35:185–207. doi: 10.1109/TPAMI.2012.89.
- 4. Fecteau JH, Munoz DP. Salience, relevance, and firing: A priority map for target selection. Trends Cogn Sci. 2006;10:382–390. doi: 10.1016/j.tics.2006.06.011.
- 5. Serences JT, Yantis S. Selective visual attention and perceptual coherence. Trends Cogn Sci. 2006;10:38–45. doi: 10.1016/j.tics.2005.11.008.
- 6. Li Z. A saliency map in primary visual cortex. Trends Cogn Sci. 2002;6:9–16. doi: 10.1016/s1364-6613(00)01817-9.
- 7. Zhaoping L. From the optic tectum to the primary visual cortex: Migration through evolution of the saliency map for exogenous attentional guidance. Curr Opin Neurobiol. 2016;40:94–102. doi: 10.1016/j.conb.2016.06.017.
- 8. Zhang X, Zhaoping L, Zhou T, Fang F. Neural activities in v1 create a bottom-up saliency map. Neuron. 2012;73:183–192. doi: 10.1016/j.neuron.2011.10.035.
- 9. Li W, Piëch V, Gilbert CD. Contour saliency in primary visual cortex. Neuron. 2006;50:951–962. doi: 10.1016/j.neuron.2006.04.035.
- 10. Burrows BE, Moore T. Influence and limitations of popout in the selection of salient visual stimuli by area V4 neurons. J Neurosci. 2009;29:15169–15177. doi: 10.1523/JNEUROSCI.3710-09.2009.
- 11. Gottlieb JP, Kusunoki M, Goldberg ME. The representation of visual salience in monkey parietal cortex. Nature. 1998;391:481–484. doi: 10.1038/35135.
- 12. Bisley JW, Goldberg ME. Attention, intention, and priority in the parietal lobe. Annu Rev Neurosci. 2010;33:1–21. doi: 10.1146/annurev-neuro-060909-152823.
- 13. Arcizet F, Mirpour K, Bisley JW. A pure salience response in posterior parietal cortex. Cereb Cortex. 2011;21:2498–2506. doi: 10.1093/cercor/bhr035.
- 14. Thompson KG, Bichot NP. A visual salience map in the primate frontal eye field. Prog Brain Res. 2005;147:251–262. doi: 10.1016/S0079-6123(04)47019-8.
- 15. Purcell BA, Schall JD, Logan GD, Palmeri TJ. From salience to saccades: Multiple-alternative gated stochastic accumulator model of visual search. J Neurosci. 2012;32:3433–3446. doi: 10.1523/JNEUROSCI.4622-11.2012.
- 16. Mysore SP, Asadollahi A, Knudsen EI. Signaling of the strongest stimulus in the owl optic tectum. J Neurosci. 2011;31:5186–5196. doi: 10.1523/JNEUROSCI.4592-10.2011.
- 17. Knudsen EI. Control from below: The role of a midbrain network in spatial attention. Eur J Neurosci. 2011;33:1961–1972. doi: 10.1111/j.1460-9568.2011.07696.x.
- 18. Asadollahi A, Mysore SP, Knudsen EI. Stimulus-driven competition in a cholinergic midbrain nucleus. Nat Neurosci. 2010;13:889–895. doi: 10.1038/nn.2573.
- 19. Krauzlis RJ, Lovejoy LP, Zénon A. Superior colliculus and visual spatial attention. Annu Rev Neurosci. 2013;36:165–182. doi: 10.1146/annurev-neuro-062012-170249.
- 20. Ignashchenkova A, Dicke PW, Haarmeier T, Thier P. Neuron-specific contribution of the superior colliculus to overt and covert shifts of attention. Nat Neurosci. 2004;7:56–64. doi: 10.1038/nn1169.
- 21. Marino RA, et al. Linking visual response properties in the superior colliculus to saccade behavior. Eur J Neurosci. 2012;35:1738–1752. doi: 10.1111/j.1460-9568.2012.08079.x.
- 22. White BJ, Munoz DP. Separate visual signals for saccade initiation during target selection in the primate superior colliculus. J Neurosci. 2011;31:1570–1578. doi: 10.1523/JNEUROSCI.5349-10.2011.
- 23. White BJ, Munoz DP. Oxford Handbook of Eye Movements. Oxford University Press; Oxford, UK: 2011. The superior colliculus; pp. 195–213.
- 24. Cerkevich CM, Lyon DC, Balaram P, Kaas JH. Distribution of cortical neurons projecting to the superior colliculus in macaque monkeys. Eye Brain. 2014;2014:121–137. doi: 10.2147/EB.S53613.
- 25. Lock TM, Baizer JS, Bender DB. Distribution of corticotectal cells in macaque. Exp Brain Res. 2003;151:455–470. doi: 10.1007/s00221-003-1500-y.
- 26. White BJ, et al. Superior colliculus neurons encode a visual saliency map during free viewing of natural dynamic video. Nat Commun. 2017;8:14263. doi: 10.1038/ncomms14263.
- 27. Itti L. Quantifying the contribution of low-level saliency to human eye movements in dynamic scenes. Vis Cogn. 2005;12:1093–1123.
- 28. Berg DJ, Boehnke SE, Marino RA, Munoz DP, Itti L. Free viewing of dynamic stimuli by humans and monkeys. J Vis. 2009;9:1–15. doi: 10.1167/9.5.19.
- 29. Yoshida M, et al. Residual attention guidance in blindsight monkeys watching complex natural scenes. Curr Biol. 2012;22:1429–1434. doi: 10.1016/j.cub.2012.05.046.
- 30. Treisman AM, Gelade G. A feature-integration theory of attention. Cognit Psychol. 1980;12:97–136. doi: 10.1016/0010-0285(80)90005-5.
- 31. White BJ, Boehnke SE, Marino RA, Itti L, Munoz DP. Color-related signals in the primate superior colliculus. J Neurosci. 2009;29:12159–12166. doi: 10.1523/JNEUROSCI.1986-09.2009.
- 32. Nurminen L, Angelucci A. Multiple components of surround modulation in primary visual cortex: Multiple neural circuits with multiple functions? Vision Res. 2014;104:47–56. doi: 10.1016/j.visres.2014.08.018.
- 33. Cavanaugh JR, Bair W, Movshon JA. Nature and interaction of signals from the receptive field center and surround in macaque V1 neurons. J Neurophysiol. 2002;88:2530–2546. doi: 10.1152/jn.00692.2001.
- 34. Nassi JJ, Lomber SG, Born RT. Corticocortical feedback contributes to surround suppression in V1 of the alert primate. J Neurosci. 2013;33:8504–8517. doi: 10.1523/JNEUROSCI.5124-12.2013.
- 35. Phongphanphanee P, et al. Distinct local circuit properties of the superficial and intermediate layers of the rodent superior colliculus. Eur J Neurosci. 2014;40:2329–2343. doi: 10.1111/ejn.12579.
- 36. Hubel DH, Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. J Physiol. 1968;195:215–243. doi: 10.1113/jphysiol.1968.sp008455.
- 37. Goldberg ME, Wurtz RH. Activity of superior colliculus in behaving monkey. I. Visual receptive fields of single neurons. J Neurophysiol. 1972;35:542–559. doi: 10.1152/jn.1972.35.4.542.
- 38. Davidson RM, Bender DB. Selectivity for relative motion in the monkey superior colliculus. J Neurophysiol. 1991;65:1115–1133. doi: 10.1152/jn.1991.65.5.1115.
- 39. Marrocco RT, Li RH. Monkey superior colliculus: Properties of single cells and their afferent inputs. J Neurophysiol. 1977;40:844–860. doi: 10.1152/jn.1977.40.4.844.
- 40. Lamme VAF. The neurophysiology of figure-ground segregation in primary visual cortex. J Neurosci. 1995;15:1605–1615. doi: 10.1523/JNEUROSCI.15-02-01605.1995.
- 41. Self MW, van Kerkoerle T, Supèr H, Roelfsema PR. Distinct roles of the cortical layers of area V1 in figure-ground segregation. Curr Biol. 2013;23:2121–2129. doi: 10.1016/j.cub.2013.09.013.
- 42. Berman RA, Wurtz RH. Functional identification of a pulvinar path from superior colliculus to cortical area MT. J Neurosci. 2010;30:6342–6354. doi: 10.1523/JNEUROSCI.6176-09.2010.
- 43. Berman RA, Wurtz RH. Signals conveyed in the pulvinar pathway from superior colliculus to cortical area MT. J Neurosci. 2011;31:373–384. doi: 10.1523/JNEUROSCI.4738-10.2011.
- 44. Veale R, Hafed ZM, Yoshida M. How is visual salience computed in the brain? Insights from behaviour, neurobiology and modelling. Philos Trans R Soc Lond B Biol Sci. 2017;372 doi: 10.1098/rstb.2016.0113.
- 45. Marino RA, Rodgers CK, Levy R, Munoz DP. Spatial relationships of visuomotor transformations in the superior colliculus map. J Neurophysiol. 2008;100:2564–2576. doi: 10.1152/jn.90688.2008.
- 46. Fellows M, Suner S. 2009. Blackrock Microsystems Array Surgical Implant Procedure Training Manual Purpose (Blackrock Microsystems LLC, Salt Lake City), Rev 5.0, LB-0220.
- 47. Derrington AM, Krauskopf J, Lennie P. Chromatic mechanisms in lateral geniculate nucleus of macaque. J Physiol. 1984;357:241–265. doi: 10.1113/jphysiol.1984.sp015499.
- 48. Hays AV, Richmond BJ, Optican LM. WESCON Conference Proceedings. Vol 2. Electron Conventions; El Segundo, CA: 1982. A UNIX-based multiple-process system for real-time data acquisition and control; pp. 1–10.
- 49. Thompson KG, Hanes DP, Bichot NP, Schall JD. Perceptual and motor processing stages identified in the activity of macaque frontal eye field neurons during visual search. J Neurophysiol. 1996;76:4040–4055. doi: 10.1152/jn.1996.76.6.4040.
- 50. Cohen JY, et al. Cooperation and competition among frontal eye field neurons during visual target selection. J Neurosci. 2010;30:3227–3238. doi: 10.1523/JNEUROSCI.4600-09.2010.
- 51. Thomas NWD, Paré M. Temporal processing of saccade targets in parietal cortex area LIP during visual search. J Neurophysiol. 2007;97:942–947. doi: 10.1152/jn.00413.2006.