Visual Search is Guided to Categorically Defined Targets

Hyejin Yang; Gregory J Zelinsky

doi:10.1016/j.visres.2009.05.017

Author manuscript; available in PMC: 2010 Jul 1.

Published in final edited form as: Vision Res. 2009 Jun 3;49(16):2095–2103. doi: 10.1016/j.visres.2009.05.017

PMCID: PMC2756560 NIHMSID: NIHMS121341 PMID: 19500615

Abstract

To determine whether categorical search is guided we had subjects search for teddy bear targets either with a target preview (specific condition) or without (categorical condition). Distractors were random realistic objects. Although subjects searched longer and made more eye movements in the categorical condition, targets were fixated far sooner than was expected by chance. By varying target repetition we also determined that this categorical guidance was not due to guidance from specific previously viewed targets. We conclude that search is guided to categorically-defined targets, and that this guidance uses a categorical model composed of features common to the target class.

Keywords: categorical guidance, eye movements, categorization, object detection, visual attention

Visual search is one of our most common cognitive behaviors. Hundreds of times each day we seek out objects and patterns in our environment in the performance of search tasks. Some of these are explicit, such as when we scan the shelves for a particular food item in a grocery store. Other search tasks are so seamlessly integrated into an ongoing behavior that they become all but invisible, such as when gaze flicks momentarily to each ingredient when preparing a meal (Hayhoe, Shrivastava, Mruczek, & Pelz, 2003).

Such a widely used cognitive operation requires a highly flexible method for representing targets, a necessary first step in any search task. In many cases a search target can be described in terms of very specific visual features. When searching for your car in a crowded parking lot or your coffee cup in a cluttered room, relatively specific features from these familiar objects can be recalled from long-term memory, assembled into a working memory description of the target, and used to guide your search (Wolfe, 1994; Zelinsky, 2008). However, in many other cases such an elaborated target description is neither possible nor desirable. Very often we need to find any cup or any pen or any trash bin, not a particular one. In these cases different target-defining features are required, as the features would need to represent an entire class of objects and cannot be tailored to a specific member. How does the search for such a categorically defined target differ from the search for a specific member of a target class?

Although several studies have used categorically defined targets in the context of a search task (e.g., Bravo & Farid, 2004; Ehinger, Hidalgo-Sotelo, Torralba, & Oliva, in press; Fletcher-Watson, Findlay, Leekam, & Benson, 2008; Foulsham & Underwood, 2007; Henderson, Weeks, & Hollingworth, 1999; Mruczek & Sheinberg, 2005; Newell, Brown, & Findlay, 2004; Torralba, Oliva, Castelhano, & Henderson, 2006), surprisingly few studies have been devoted specifically to understanding categorical visual search (Bravo & Farid, 2009; Castelhano, Pollatsek, & Cave, 2008; Schmidt & Zelinsky, 2009). Early work on categorical search used numbers and letters as target classes. For example, Egeth, Jonides, and Wall (1972) had subjects search for a digit target among a variable number of letter distractors, and found nearly flat target present and target absent search slopes (see also Jonides & Gleitman, 1972). Brand (1971) also showed that the search for a digit among letters tended to be faster than the search for a letter among other letters. The general conclusion from these studies was that categorical search is not only possible, but that it can be performed very efficiently, at least in the case of stimuli having highly restrictive feature sets (Duncan, 1983).

Wolfe, Friedman-Hill, Stewart, and O'Connell (1992) attempted to identify some of the relevant categorical dimensions that affect search in simple visual contexts. They found that the search for an oriented bar target among heterogeneous distractors could be very efficient when the target was categorically distinct in the display (e.g., the only “steep” or “left leaning” item), and concluded that categorical factors can facilitate search by reducing distractor heterogeneity via grouping. Wolfe (1994) later elaborated on this proposal by hypothesizing the existence of categorical features, and incorporating these features into his influential Guided Search Model (GSM). According to GSM, targets and search objects are represented categorically, and it is the match between these categorical representations that generates the top-down signal used to guide attention in a search task. However, left unanswered from this work was whether this evidence for categorical guidance would extend to more complex object classes in which the categorical distinctions between targets and distractors are less apparent.

Levin, Takarae, Miner, and Keil (2001) directly addressed this question and provided the first evidence that categorical search might be possible for visually complex object categories. Subjects viewed 3-9 line drawings of objects and were asked to search for either an animal target among artifact distractors or an artifact target among animal distractors. They found that both categorical searches were very efficient, particularly in the case of the artifact search task. Levin and colleagues concluded that subjects might learn the features distinguishing targets from distractors for these two object classes (e.g., rectilinearity and curvilinearity), then use these categorical features to efficiently guide their search.

More recent work suggests that categorical guidance may in fact be quite limited in tasks involving fully realistic objects (Vickery, King, & Jiang, 2005; Wolfe, Horowitz, Kenner, Hyle, & Vasan, 2004, Experiments 5-6). Using a target preview, subjects in the Wolfe et al. study were shown either an exact picture of the target (e.g., a picture of an apple), a text label describing the target type (e.g., “apple”), or a text label describing the target category (e.g., “fruit”). They found that search was most efficient using the picture cue, less efficient using a type cue, and least efficient using a categorical cue (see also Schmidt & Zelinsky, 2009). Contrary to the highly efficient guidance reported in the Levin et al. (2001) study, these results suggest that categorical guidance may be weak or non-existent for search tasks using common real-world object categories.

At least two factors may have contributed to the discrepant findings from previous categorical search studies. First, most of these studies measured categorical guidance exclusively in terms of manual search efficiency. However, this measure makes it difficult to cleanly separate actual guidance to the target from decision processes needed to reject search distractors (Zelinsky & Sheinberg, 1997). For example, it may be the case that subjects were very efficient in rejecting animal distractors in the Levin et al. (2001) study, thereby resulting in shallow search slopes for artifact targets (see also Kirchner & Thorpe, 2006). Given that object verification times vary widely for categorically-defined targets (Castelhano et al., 2008), differences in search efficiency reported across studies and conditions might reflect different rejection rates for distractors, and have very little to do with actual categorical guidance. Second, previous studies using a categorical search task have invariably repeated stimuli over trials. For example, Wolfe et al. (2004, Experiment 5) used only 22 objects as targets, despite having 600 trials in their experiment. Such reuse of stimuli might compromise claims of categorical search, as subjects could have retrieved instances of previously viewed targets from memory and used these as search templates. To the extent that object repetition speeds categorical search (Mruczek & Sheinberg, 2005), differences in search efficiency between studies might be explained by different object repetition rates.

By addressing both of the above-described concerns, the present study clarifies our capacity to guide search to categorically defined targets. In Experiment 1 we removed the potential for target and distractor repetition to affect categorical search by using entirely new targets and distractors on every trial. If stimuli are not reused from trial to trial, no opportunity for object-specific guidance would exist. In Experiment 2 we explicitly manipulated target repetition so as to determine whether guidance results from the categorical representation of target features or from previously viewed targets serving as specific templates. Eye movements were monitored and analyzed in both experiments so as to separate actual categorical guidance from decision factors relating to distractor rejection. To the extent that search is guided to categorical targets, we expect these targets to be fixated preferentially by gaze (e.g., Chen & Zelinsky, 2006; Schmidt & Zelinsky, 2009). However, finding no preference to fixate targets over distractors would suggest that differences in search efficiency are due to different rates of distractor rejection or target verification under categorical search conditions.

Experiment 1

Can search be guided to categorical targets? To answer this question we had subjects search for a teddy bear target among common real-world objects under specific and categorical search conditions. Following Levin et al. (2001), if categorical search is highly efficient we would expect relatively shallow manual search slopes, perhaps as shallow as those in the specific search condition where the target is designated using a preview. We would also expect categorically-defined targets to be acquired directly by gaze, again perhaps as directly as those under target specific conditions. However, if categorical descriptions cannot be used to guide search to a target (Castelhano et al., 2008; Vickery et al., 2005; Wolfe et al., 2004), we would expect steep manual search slopes in the categorical condition, and a chance or near chance probability of looking initially to the categorical target.

Method

Participants

Twenty-four students from Stony Brook University participated in the experiment for course credit. All had normal or corrected-to-normal visual acuity, by self-report, and were naïve to the goals of the experiment.

Stimuli & Apparatus

Targets were 198 color images of teddy bears from The teddy bear encyclopedia (Cockrill, 2001). Of these objects, 180 bears were used as targets in the search task, and 18 bears were used as targets in practice trials. The distractors were 2,475 color images of real-world objects from the Hemera Photo Objects Collection (Gatineau, Quebec, Canada). Of these objects, 2,250 were used as distractors in the search task, and 225 were used in practice trials. No object, target or distractor, was shown more than once during the experiment. All were normalized to have the same bounding box area (8,000 pixels), where a bounding box is defined as the smallest rectangle enclosing an object. Normalizing object area roughly equated for size, but precise control was not possible given the irregular shape of real-world objects. Consequently, object width varied between 1.12° and 4.03°, and object height varied between 1.0° and 3.58°. Figure 1 shows representative targets and distractors.

Figure 1. Representative targets (A) and distractors (B) used as stimuli.
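The text does not say how the area normalization was carried out. The following is a minimal sketch, assuming each image was scaled uniformly (aspect ratio preserved) so that its bounding box reached 8,000 pixels. The function names and the example image sizes are hypothetical and serve only to illustrate why equal area still leaves width and height free to vary.

```python
import math

TARGET_AREA_PX = 8000  # bounding-box area stated in the text


def normalization_scale(width_px, height_px, target_area=TARGET_AREA_PX):
    """Uniform scale factor that brings a bounding box to target_area.

    Assumption: aspect ratio is preserved, so area grows with the
    square of the scale factor.
    """
    if width_px <= 0 or height_px <= 0:
        raise ValueError("bounding box dimensions must be positive")
    return math.sqrt(target_area / (width_px * height_px))


def normalized_size(width_px, height_px, target_area=TARGET_AREA_PX):
    """Return the (width, height) of the bounding box after rescaling."""
    scale = normalization_scale(width_px, height_px, target_area)
    return width_px * scale, height_px * scale


# Hypothetical source images with different aspect ratios.
for w, h in [(400, 200), (150, 600), (320, 320)]:
    new_w, new_h = normalized_size(w, h)
    print(f"{w}x{h} -> {new_w:.1f}x{new_h:.1f} px (area {new_w * new_h:.0f})")
```

Under this assumption every object ends up with the same bounding-box area, but elongated objects remain wider or taller than compact ones, consistent with the range of object widths and heights reported above.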

Objects were arranged into 6-, 13-, and 20-item search displays, which were presented in color on a 19-inch flat screen CRT monitor at a refresh rate of 100 Hz. A custom-made program written in Visual C/C++ (v. 6.0) and running under Microsoft Windows XP was used to control the stimulus presentation. Items were positioned randomly in displays, with the constraints that the minimum center-to-center distance between objects, and the distance from center fixation to the nearest object, were 180 pixels (about 4°). Approximate viewing angle was 26° horizontally and 20° vertically. Head position and viewing distance (72 cm) were fixed with a chinrest, and all responses were made with a Game Pad controller attached to the computer's USB port. Eye position was sampled at 500 Hz using the EyeLink II eye tracking system (SR Research Ltd.) with default saccade detection settings. Calibrations were not accepted until the average spatial error was less than .49° and the maximum error was less than .99°.

Design

The 180 experimental trials per subject were evenly divided into 2 target presence conditions (present/absent) and 3 set size conditions (6/13/20), leaving 30 trials per cell of the design. The type of search, specific or categorical, was a between-subjects variable. Half of the subjects were shown a preview of a specific target bear at the start of each trial (specific search); the other half were instructed to search for a non-specific teddy bear (categorical search). The search displays viewed by these two groups of subjects were identical (i.e., the same targets and distractors in the same locations); the only difference between these groups was that subjects in the specific condition were searching for a particular target.

Procedure

The subject's task was to determine, as quickly and as accurately as possible, the presence or absence of a teddy bear target among a variable number of real-world distractors. Each trial began with subjects looking at a central fixation point and pressing a “start” button, which also served to drift correct the eye tracker. In the specific search condition a preview of the target teddy bear was displayed for 1 second, followed by the search display. In the categorical search condition subjects were instructed at the start of the experiment to search for any teddy bear, based on the bears viewed during the practice trials and their general knowledge of this category; there were no target previews. Target present judgments were registered using the left trigger of the Game Pad; target absent judgments were registered using the right trigger. Accuracy feedback was provided after each response. The experiment consisted of one session of 18 practice trials and 180 experimental trials, lasting about 30 minutes.

Results and Discussion

Manual data

Error rates were generally low. In the categorical condition the false alarm and miss rates were 0.5% and 2.5%, respectively. Corresponding errors in the specific condition were 0.4% and 1.6%. These trials were excluded from all subsequent analyses.

Figure 2 shows manual reaction times (RTs) for the categorical and specific search conditions, as a function of set size and target presence. There were significant main effects of target presence, F(1, 11) = 34.0, p < .001, and set size, F(2, 22) = 36.7, p < .001, in the categorical search data, as well as a significant target × set size interaction, F(2, 22) = 20.5, p < .001. Similar patterns characterized the specific search data. There were again significant main effects of target presence, F(1, 11) = 18.6, p < .01, and set size, F(2, 22) = 39.1, p < .001, as well as a significant interaction between the two, F(2, 22) = 16.1, p < .001. However, we also found a significant three-way interaction between target presence, set size, and search condition, F(2, 44) = 5.30, p < .01. Target present and absent search slopes in the categorical condition were 33.8 ms/item and 111.4 ms/item, respectively. Search slopes for a specific target were only 14.9 ms/item in the target present condition and 48.5 ms/item in the target absent condition. With respect to manual measures, categorical search was much less efficient than the search for a specific target designated by a preview.

Figure 2. Mean manual reaction times from correct trials in Experiment 1. Error bars indicate standard error.

Although the manual data can be interpreted as evidence for search guidance only when the target's specific features are known in advance, such a conclusion would be premature. First, as already noted there is an inherent ambiguity in the relationship between manual search slopes and the decision processes involved in distractor rejection. Search slopes may be shallower in the specific condition, not because of better guidance to the target, but rather because the availability of a specific target template makes it easier to classify objects as distractors. Second, although slopes were steeper in the categorical condition compared to the specific, all that one can conclude from this difference is that target specific search is more efficient, not that categorical search is un-guided. To more directly compare guidance under specific and categorical search conditions, we therefore turn to eye movement measures.

Eye movement data

If search is guided to specific targets but not to categorical targets we should find a high percentage of immediate target fixations under specific search conditions, and a chance level of such fixations under categorical search conditions. Finding above-chance levels of immediate target fixations in either condition would constitute evidence for search guidance. Figure 3 shows the percentage of trials in which the target was the first object fixated after onset of the search display. Immediate target fixations were quite common in both search conditions, but more so when a target was specified, F(1, 22) = 23.3, p < .001. As expected, the frequency of these fixations also declined with increasing set size under both categorical, F(2, 22) = 76.7, p < .001, and specific, F(2, 22) = 90.4, p < .001, search conditions. Together, these patterns are highly consistent with the manual data and indicate stronger search guidance to targets defined by a preview. However, we also compared these immediate fixation rates to the rates expected by a random movement of gaze to one of the display objects. These correspond to the chance baselines of 16.7%, 7.7%, and 5% in the 6, 13, and 20 set size conditions, respectively. Targets in the specific search conditions were clearly fixated initially more often than what would be expected by chance, t(11) ≥ 8.93, p < .001. This is unsurprising given the already strong evidence for guidance under specific conditions provided by the other search measures. More interestingly, a similar pattern of above-chance immediate target fixations was also found in the categorical search conditions. These preferential fixation rates were significantly greater than chance at each set size, t(11) ≥ 9.04, p < .001, with guidance estimates ranging from 31.9% at a set size of 6 to 15.1% at a set size of 20. This preference to look initially to the target, even in the absence of knowledge about the target's specific appearance, constitutes strong evidence for categorical guidance.

Figure 3. Percentage of Experiment 1 target present trials in which the first fixated object was the target. Dotted, dashed, and solid lines indicate chance levels of guidance in the 6, 13, and 20 set size conditions, respectively. Error bars indicate standard error.
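The chance baselines for immediate target fixations follow from treating the first fixated object as a uniform random draw from the display. The sketch below reproduces that arithmetic; the function name is hypothetical, and the only inputs are the three set sizes used in the experiment.

```python
SET_SIZES = (6, 13, 20)


def chance_first_fixation(set_size):
    """Probability that a randomly chosen first-fixated object is the target.

    Assumes one target per display and an equal chance of gaze landing
    on any of the set_size objects.
    """
    if set_size < 1:
        raise ValueError("set size must be at least 1")
    return 1.0 / set_size


# Chance baseline, in percent, for each set size.
chance_pct = {n: 100 * chance_first_fixation(n) for n in SET_SIZES}

for n, pct in chance_pct.items():
    print(f"set size {n:2d}: chance of fixating the target first = {pct:.1f}%")
```

Rounded to one decimal place these values are 16.7%, 7.7%, and 5.0%, matching the baselines against which the observed immediate fixation rates were tested.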

Guidance might also increase during the course of a search trial, and this evidence for guidance would be missed if one focuses exclusively on immediate target fixations. To better capture this dimension of guidance we analyzed the number of distractor objects that were fixated prior to fixation on the target. The results from this analysis, shown in Figure 4, indicate that fixations on distractors were rare; fewer than two distractors were fixated even in the relatively dense twenty-object displays. Nevertheless, the average number of distractors fixated before the target increased significantly with set size in both the categorical, F(2, 22) = 109, p < .001, and specific, F(2, 22) = 68.7, p < .001, search conditions. The number of fixated distractors was also slightly smaller in the target specific condition than in the categorical condition, F(1, 22) = 14.6, p < .01, with this difference interacting with set size, F(2, 44) = 13.6, p < .001. We also compared these distractor fixation rates to baselines reflecting the number of pre-target distractor fixations that would be expected by chance. These baselines, 2.5, 6, and 9.5 in the 6, 13, and 20 set size conditions, respectively, assumed a random fixation of objects in which no fixated object was revisited by gaze (i.e., sampling without replacement). Consistent with our analysis of immediate target fixations, distractor fixation rates were well below chance levels for both categorical and target specific search conditions, with significant differences obtained at each set size, t(11) ≤ -41.5, p < .001. These below-chance rates of distractor fixation provide additional direct evidence for guidance during categorical search.

Figure 4. Average number of distractors fixated before the target (without replacement) from target present Experiment 1 trials. Dotted, dashed, and solid lines indicate chance levels of guidance in the 6, 13, and 20 set size conditions, respectively. Error bars indicate standard error.
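The chance baselines for pre-target distractor fixations can be derived from the sampling-without-replacement assumption stated above. If objects are fixated in a random order with no revisits, the target is equally likely to fall at any position in that order, so the expected number of distractors fixated first is (N - 1) / 2 for a display of N objects. The following sketch, with a hypothetical function name, computes this expectation exactly.

```python
from fractions import Fraction

SET_SIZES = (6, 13, 20)


def expected_distractors_before_target(set_size):
    """Expected distractors fixated before the target under random search.

    Assumes a random fixation order without revisits: the target sits at
    each position k = 1..N with probability 1/N, and k - 1 distractors
    are fixated before it.
    """
    if set_size < 1:
        raise ValueError("set size must be at least 1")
    return sum(Fraction(k - 1, set_size) for k in range(1, set_size + 1))


for n in SET_SIZES:
    expected = expected_distractors_before_target(n)
    print(f"set size {n:2d}: {float(expected):.1f} distractors expected by chance")
```

The results are 2.5, 6, and 9.5 distractors for set sizes of 6, 13, and 20, which are the baselines reported in the text.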

In addition to search being guided to targets, search efficiency might also be affected by how long it takes to initiate a search (Zelinsky & Sheinberg, 1997), and the time needed to verify that an object is a target once it is located (Castelhano et al., 2008). To examine search initiation time we analyzed the latencies of the initial saccades, defined as the time between search display onset and the start of the first eye movement (Table 1). Although we found small but highly significant increases in initial saccade latency with set size (Zelinsky & Sheinberg, 1997), F(2, 22) ≥ 14.1, p < .001, search initiation times did not reliably differ between categorical and specific conditions, F(1, 22) = 0.49, p > .05. To examine target verification time we subtracted the time taken to first fixate a target from the manual RT, on a trial-by-trial basis (Table 1). This analysis revealed only marginally reliable effects of set size, F(2, 22) ≤ 3.21, p ≥ .06, and search condition, F(1, 22) = 3.54, p = .07. Perhaps more telling is that an analysis of just the time to fixate the target revealed large effects of both set size, F(2, 22) ≥ 57.0, p < .001, and search condition, F(2, 44) = 17.8, p < .001. Although search initiation times and target verification times can affect search efficiency, in the current task search efficiency was determined mainly by guidance to the target.
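The verification-time measure described above is a per-trial subtraction that is then averaged, rather than a difference between two condition means. A minimal sketch of that computation follows. The `Trial` record, its field names, and the three example trials are hypothetical and are not data from the experiment.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    manual_rt_ms: float       # display onset to button press
    time_to_target_ms: float  # display onset to first fixation on the target


def verification_time(trial):
    """Time between first fixating the target and the manual response."""
    return trial.manual_rt_ms - trial.time_to_target_ms


# Hypothetical correct target-present trials from one subject and cell.
trials = [Trial(820, 370), Trial(905, 455), Trial(760, 340)]

per_trial = [verification_time(t) for t in trials]
mean_verification = mean(per_trial)
print(f"per-trial verification times (ms): {per_trial}")
print(f"mean verification time (ms): {mean_verification}")
```

Computing the difference within each trial keeps the measure tied to the fixation that actually preceded each response, which is what allows verification time to be separated from the time spent reaching the target.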

Table 1. Initial saccade latencies, RTs to target, and target verification times from Experiment 1.

	Categorical			Specific
Set Size	6	13	20	6	13	20
Initial Saccade Latency
Target Present	181 (6.3)	185 (6.9)	191 (8.4)	182 (6.9)	189 (5.8)	196 (7.7)
Target Absent	174 (6.8)	186 (8.1)	193 (8.6)	184 (6.9)	203 (8.0)	198 (5.9)
RT to Target	362 (12.4)	450 (20.3)	667 (21.8)	331 (8.5)	391 (12.1)	495 (20.7)
Target Verification Time	446 (24.4)	622 (107.5)	564 (86.9)	376 (34.4)	396 (37.4)	421 (36.5)

Note: All values in msec. Values in parentheses indicate standard error.
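As an illustration of how a search slope is obtained, the sketch below fits a least-squares line to the mean time-to-target values in Table 1 as a function of set size. These slopes are not reported in the text and are computed here only from the tabled means, so they are illustrative and distinct from the manual RT slopes given earlier. The function and variable names are hypothetical.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys regressed on xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


SET_SIZES = [6, 13, 20]

# Mean time to first fixate the target (ms), from the "RT to Target" row of Table 1.
TIME_TO_TARGET_MS = {
    "categorical": [362, 450, 667],
    "specific": [331, 391, 495],
}

slopes = {
    condition: least_squares_slope(SET_SIZES, times)
    for condition, times in TIME_TO_TARGET_MS.items()
}

for condition, slope in slopes.items():
    print(f"{condition}: {slope:.1f} ms/item")
```

With these tabled means the fitted slopes are roughly 21.8 ms/item for categorical search and 11.7 ms/item for specific search, so the time needed to reach the target grows faster with set size when no preview is available.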

To summarize, the goal of this experiment was to determine whether target guidance exists in a categorical search task, and on this point the data were clear; although search guidance was strongest when subjects knew the target's specific features, search was also guided to categorically-defined targets. To our knowledge this constitutes the first oculomotor-based evidence for categorical search guidance in the context of a controlled experiment (but see also Schmidt & Zelinsky, 2009). Also noteworthy is the fact that the observed level of categorical guidance was quite pronounced, larger than the difference in guidance between the specific and categorical search conditions. Rather than being unguided, categorical search more closely resembles the strong guidance observed under target-specific search conditions.

Experiment 2

We know from Experiment 1 that categorical search guidance exists, but how does it work? There are two broad possibilities. One is that subjects were guiding their search based on the features of a specific, previously viewed teddy bear. Although Experiment 1 prevented the repetition of objects, thereby minimizing the potential for this sort of bias from developing over the course of trials, it is nevertheless possible that subjects had in mind a specific target (perhaps a favorite teddy bear from childhood) and were using this pattern to guide their categorical search. Rather than guiding search based on the features of a specific target template, another possibility is that subjects assembled in their working memory a categorical target template based on visual features common to their teddy bear category. Search might therefore have been guided by teddy bear color, texture, and shape features, even if the subject had never viewed the exact combination of these features in a specific teddy bear.

To distinguish between these two possibilities we manipulated the number of times that a target would repeat during the course of the experiment. The net effect of this manipulation is to vary the featural uncertainty of the target category, and the capacity for subjects to anticipate or predict the correct features of the target bear. If the same target repeated on every trial, this prediction would be easy and roughly equivalent to target specific search (e.g., Wolfe et al., 2004); if each of ten targets repeated on a small percentage of trials, a correct prediction becomes more difficult. According to a specific-template model, categorical guidance should decrease monotonically with the number of potential targets, and their accompanying lower repetition rates, due to the lower probability of selecting the correct target to be used as a specific guiding template. A categorical-template model makes a very different prediction. If a subject builds a target template from all of the teddy bears existing in their long-term memory, then the repetition of specific target bears would be expected to affect categorical guidance only minimally, or not at all.