Does Contextual Cueing Guide the Deployment of Attention?

Melina A Kunar; Stephen Flusberg; Todd S Horowitz; Jeremy M Wolfe

doi:10.1037/0096-1523.33.4.816

Author manuscript; available in PMC: 2010 Aug 17.

Published in final edited form as: J Exp Psychol Hum Percept Perform. 2007 Aug;33(4):816–828. doi: 10.1037/0096-1523.33.4.816

PMCID: PMC2922990 NIHMSID: NIHMS75735 PMID: 17683230

Abstract

Contextual cueing experiments show that when displays are repeated, reaction times (RTs) to find a target decrease over time even when observers are not aware of the repetition. It has been thought that the context of the display guides attention to the target. We tested this hypothesis by comparing the effects of guidance in a standard search task to the effects of contextual cueing. Firstly, in standard search, an improvement in guidance causes search slopes (derived from RT × Set Size functions) to decrease. In contrast, we found that search slopes in contextual cueing did not become more efficient over time (Experiment 1). Secondly, when guidance is optimal (e.g. in easy feature search) we still found a small, but reliable contextual cueing effect (Experiments 2a and 2b), suggesting that other factors, such as response selection, contribute to the effect. Experiment 3 supported this hypothesis by showing that the contextual cueing effect disappeared when we added interference to the response selection process. Overall, our data suggest that the relationship between guidance and contextual cueing is weak and that response selection can account for part of the effect.

Keywords: Contextual Cueing, Attention, Guidance, Response selection, Visual Search

Introduction

In everyday life we are inundated by a glut of visual stimuli. Visual scenes are often complex, containing a large amount of irrelevant information. In a commonplace search for something like a particular student in an auditorium, we would be overwhelmed if we were to attempt to attend to every stimulus at once. In response to the inability to process all visual stimuli simultaneously, the visual system has attentional mechanisms that permit us to search for the student by deploying attention to one or a few objects at a time out of the crowded world. Given the inherent complexity of the task, the visual system has evolved a variety of mechanisms to optimize this selection process. Many of these mechanisms come under the rubric of “attentional guidance” (see Wolfe & Horowitz, 2004). Guidance processes speed search by directing attention to items more likely to be targets. Thus, the student is likely to be human-sized and elongated. Attention is guided to objects with those attributes in preference to, for example, small, cubic objects. Spatial configuration of items is a candidate source of guidance. The visual system appears to be sensitive to the predictive value of repeated spatial configurations. In this paper, we ask whether this contextual cueing (Chun & Jiang, 1998) is a form of guidance. Our answer will be that contextual cueing is, at best, a very weak form of guidance and that there are other mechanisms involved in the beneficial impact of repeated configuration on response times.

It has long been known that context speeds object recognition (Biederman, 1972). For example, we would be faster to name a potato masher on a kitchen countertop than the same implement on a workbench. Similarly, a student easily recognized in the classroom might be difficult to place if we ran into her at the mall. But does context affect our ability to search for a specific target? Intuition suggests that it should be easier to find the potato masher if it were habitually stored to the right of the fridge than if it could appear anywhere in the kitchen, and that we would have a better chance of finding our student if she always sat in the same seat than if we had to search the entire auditorium.

Research by Chun and Jiang (1998; 2003) seemed to confirm these intuitive predictions. They demonstrated that the spatial layout of a search display could influence how quickly participants found a target. In a series of studies they found that if the target item was embedded in an invariant configuration that was repeated across the experiment, reaction times (RTs) to find the target were quicker than when it appeared in a novel or unrepeated configuration; this is the basic contextual cueing phenomenon. Further research has found that contextual cueing can be based on implicit memory, is learned after only 5 repetitions of the display (Chun & Jiang, 1998), and can persist for up to a week (Chun & Jiang, 2003).

In their initial paper, Chun and Jiang (1998) suggested that contextual cueing occurs because the visual context can guide spatial attention towards the target. In fact, the notion that contextual cueing helps guidance is repeated throughout the literature (e.g., Chun, 2000; Chun & Jiang, 1998; Chun & Jiang, 1999; Chun & Jiang, 2003; Endo & Takeda, 2004; Hoffmann & Sebald, 2005; Jiang & Chun, 2001; Jiang & Leung, 2005; Jiang, Song & Rigas, 2005; Jiang & Wagner, 2004; Lleras & Von Mühlenen, 2004; Olson & Chun, 2002; Tseng & Li, 2004). This fits with our intuitive notion that when we know where to expect a target we do not need to search extensively; instead, to return to the example of looking for a student in an auditorium, we deploy our attention directly to the expected seat. However, one might observe faster search times without improving the search process at all. For example, it might take just as long to search for a target in a repeated configuration, but once found, the target in the expected location might be recognized and/or responded to more quickly, just as the student is more readily identified in the classroom than in the mall. In this paper we ask whether contextual cueing really guides the search process itself, making the search more efficient, or whether other factors such as facilitation in response selection play a part in contextual cueing.

RTs in visual search experiments can be affected by any processing stage between the retina and the hand (Wolfe et al., 2002). In order to isolate the cost of search proper from perceptual, decision, and response factors, researchers studying search behavior in the RT domain typically vary the number of items (set size), and fit a line to the RT × set size function. The slope of this line can be taken as a measure of the efficiency of search, while non-search factors, such as initial perceptual processing and response selection processes, contribute to the intercept. A wide range of slopes has been observed in the literature (Wolfe, 1998). A slope of 0 msec/item shows that RT is independent of the number of distractors, indicating that attention is directed immediately to the target. Such highly efficient search is characteristic of “feature search” (Treisman & Gelade, 1980), where the target differs markedly from distractors along some basic feature dimension, such as search for a red letter among green letters or for a horizontal bar among verticals (see Wolfe & Horowitz, 2004, for a review). In less efficient search tasks, each additional distractor is associated with an increase in mean RT. For example, conjunction search, in which the target is defined by a combination of features each of which is present separately in the distractors (e.g. finding a red vertical bar among red horizontals and green verticals), is generally less efficient than feature search (Treisman & Gelade, 1980), with slopes averaging 10–15 msec/item (Wolfe, 1998). More difficult spatial configuration searches, where the target is defined by the spatial arrangement of elements (e.g. finding a digital 2 among digital 5s) might produce slopes of 20–40 msec/item (Wolfe, 1998). Many differences in search efficiency can be attributed to differences in guidance. To give one example, the inefficient search for a 2 among 5s becomes more efficient if it is a search for a red 2 among red and black 5s. Attention would be guided to red items, reducing the effective set size.
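As an illustration of the slope and intercept measures described above, the following minimal sketch fits a least-squares line to mean RT as a function of set size. The function name and the RT values are hypothetical and chosen only for illustration; they are not data from this study.

```python
from statistics import mean

def fit_rt_by_set_size(set_sizes, mean_rts):
    """Least-squares fit of RT = intercept + slope * set_size.

    Returns (slope in msec/item, intercept in msec).
    """
    x_bar = mean(set_sizes)
    y_bar = mean(mean_rts)
    sxx = sum((x - x_bar) ** 2 for x in set_sizes)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(set_sizes, mean_rts))
    slope = sxy / sxx                    # search efficiency
    intercept = y_bar - slope * x_bar    # non-search (perceptual, response) costs
    return slope, intercept

# Hypothetical mean RTs (msec) at set sizes 8 and 12
slope, intercept = fit_rt_by_set_size([8, 12], [900.0, 1020.0])
print(f"slope = {slope:.1f} msec/item, intercept = {intercept:.1f} msec")
```

With only two set sizes the fitted line simply passes through both points; with three or more set sizes the same function gives the best-fitting line.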

If contextual cueing were the result of guiding attention to the target, there are several predictions we could make based on the extensive visual search literature. For example, contextual cueing ought to result in improved search efficiency, so we should see a decrease in search slope over the course of a contextual cueing experiment. To take an extreme example, if contextual cueing produced perfect guidance, attention would go directly to the target item and the search slope would drop to zero. While such perfect guidance is unlikely, search slopes for repeated displays should, at the very least, be markedly reduced compared to those from unrepeated displays. We tested this prediction in Experiment 1 and found little, if any, improvement in search efficiency¹.

If guidance cannot account for the whole contextual cueing effect, then what can? Experiments 2 and 3 tested the hypothesis that response priming contributes to contextual cueing. Experiment 2 showed that small but reliable contextual cueing effects occur even in tasks where there is already ‘perfect’ guidance (i.e. displays with a single item and feature search tasks). However, contextual cueing disappeared in these tasks when we introduced interference at the level of response selection (Experiment 3). Taken together, these experiments support a role for response factors in contextual cueing. We conclude that several factors, including but probably not limited to response selection, contribute to contextual cueing. Attentional guidance makes, at best, a small contribution.

Experiment 1

If the benefit found in contextual cueing experiments were a result of improved attentional guidance then we would expect to find an improvement in search efficiency when the display was repeated, as well as a benefit in reaction time. Previous contextual cueing studies, with the exception of Chun and Jiang (1998), have not varied set size, and so could not measure search efficiency. Here we ran a contextual cueing experiment in which set size varied from 8 to 12 items, allowing us to compute the RT × set size slope.

Method

Participants

Twelve observers between the ages of 18 and 55 years served as participants. Each participant passed the Ishihara test for color blindness and had normal or corrected to normal vision. All participants gave informed consent and were paid for their time.

Apparatus and Stimuli

This experiment, and all subsequent experiments, were conducted on a Macintosh G4 computer using Matlab 5.2.1 software with the PsychToolbox (Brainard, 1997; Pelli, 1997). The distractor items were L shapes presented randomly in one of four orientations (0°, 90°, 180° or 270°). The target item was a T shape rotated 90° either to the left or to the right with equal probability. There was always a single target present. A blue dot at the center of the screen served as a fixation point. The background color of the screen was a uniform gray. Three black concentric circles surrounded the fixation point with diameters of 9.5°, 15.5°, and 25° visual angle. Sixteen black lines radiated out from the fixation point roughly equidistant from one another to form a radial lattice. On every trial, either eight or twelve (depending on the set size) circular “placeholders” appeared at the intersections between the concentric circles and the spokes. To compensate for the decline in visual acuity with distance from the fixation point, the size of the place-holding circles and of the Ts and Ls increased with eccentricity. Those on the closest concentric circle were 2° in diameter, those on the middle concentric circle were 3.3°, and those on the furthest concentric circle were 5.4°.

All stimuli were made up of two lines of equal length (forming either an L or a T) and appeared within the circular placeholders. Stimuli enclosed in the smallest placeholders subtended a visual angle of 1° × 1°, those enclosed in the middle placeholders subtended 1.5° × 1.5°, and those enclosed in the largest placeholders subtended 2.5° × 2.5°. A tone sounded at the start of each trial, at which point the items appeared on the screen. The color of the items and the placeholders varied for each participant (either yellow, red, blue, orange, cyan, green, purple, or white) but remained constant throughout the experiment. Participants were asked to respond to the direction of the target letter T by pressing the letter ‘a’ if the stem of the T was pointing right and ‘l’ if the stem of the T was pointing left. Error feedback was given after each trial. Example displays are shown in Figure 1.

Procedure

Participants were given a practice block of 10 trials, followed by 512 experimental trials divided into 8 epochs of 64 trials. Approximately half of the trials in each epoch had a set size of 8. The remaining trials had a set size of 12.

Within each set size, for epochs 1 to 7, approximately half the trials had fixed placeholder configurations that were repeated throughout the experiment (predictive displays). These consisted of 4 fixed displays that were repeated 4 times within an epoch for each set size. Overall, each repeated display was shown approximately 28 times throughout the experiment. The other half of the trials had a novel configuration that was generated at random. In order to ensure that participants were not simply learning absolute target locations from the predictive displays, in the random displays targets appeared equally often in 4 randomly selected locations but these appearances were not correlated with any pattern of distractor locations. In epoch 8 the absolute target locations for predictive and random trials remained the same, but all configurations were now made random, so that the context was no longer predictive on any of the trials. This was implemented as a secondary check to make sure any benefit observed for predictive displays was due to the learning of display context rather than the learning of the absolute target locations. If participants were learning the context, then epoch 8 should produce slower RTs than epoch 7, even on trials where the target locations were identical to those used in the predictive displays of epochs 1–7.
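The trial structure described above can be sketched as follows. This is a simplified illustration and not the original Matlab code: it assumes exactly equal numbers of predictive and random trials (the text says "approximately half"), and the field names (`condition`, `display`, `layout_repeats`) are hypothetical.

```python
import random
from collections import Counter

SET_SIZES = (8, 12)
N_FIXED_DISPLAYS = 4     # repeated displays per set size
REPEATS_PER_EPOCH = 4    # presentations of each repeated display per epoch
N_EPOCHS = 8             # in epoch 8 every layout is random

def build_epoch(epoch, rng):
    """Return the shuffled trial list for one epoch (64 trials)."""
    trials = []
    for set_size in SET_SIZES:
        for display in range(N_FIXED_DISPLAYS):
            for _ in range(REPEATS_PER_EPOCH):
                # "Predictive" trial: the layout repeats in epochs 1-7 only.
                trials.append({"epoch": epoch, "set_size": set_size,
                               "condition": "predictive", "display": display,
                               "layout_repeats": epoch < N_EPOCHS})
                # "Random" trial: one of 4 fixed target locations, new layout each time.
                trials.append({"epoch": epoch, "set_size": set_size,
                               "condition": "random", "display": display,
                               "layout_repeats": False})
    rng.shuffle(trials)
    return trials

rng = random.Random(0)
session = [t for e in range(1, N_EPOCHS + 1) for t in build_epoch(e, rng)]

# Count how often each repeated layout is actually shown as a repeat.
repeat_counts = Counter((t["set_size"], t["display"])
                        for t in session if t["layout_repeats"])
print(len(session), set(repeat_counts.values()))
```

Under these assumptions the session contains 512 trials and each repeated display appears 28 times (4 presentations in each of epochs 1 to 7), matching the counts given in the text.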

Data analysis

In the literature, there have been many ways to formally define contextual cueing. Chun & Jiang (1998) suggested that the contextual cueing effect should be measured as the difference between predictive and random configurations across the last three epochs (see also Jiang, Leung & Burks, submitted, and Kunar, Flusberg & Wolfe, in press). This procedure focuses on the asymptotic benefit for having learned a predictive context over a random one. Following their reasoning, we collapsed the data across the last 3 predictive epochs (here epochs 5 to 7) and used this as our standard measure of contextual cueing.
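A minimal sketch of this measure is given below. It assumes a flat list of trial records with hypothetical field names (`epoch`, `condition`, `rt`, `correct`) and pools trials directly, whereas an actual analysis would compute the difference per participant before averaging. The 200 to 4000 msec bounds are the RT trimming limits reported in the Results; the trial values are invented for illustration.

```python
from statistics import mean

RT_MIN_MSEC, RT_MAX_MSEC = 200, 4000   # trimming bounds reported in the Results

def contextual_cueing_effect(trials, epochs=(5, 6, 7)):
    """Mean random RT minus mean predictive RT (msec) over the given epochs.

    A positive value means predictive displays were answered faster.
    """
    def condition_mean(condition):
        return mean(
            t["rt"] for t in trials
            if t["condition"] == condition
            and t["epoch"] in epochs
            and t["correct"]
            and RT_MIN_MSEC <= t["rt"] <= RT_MAX_MSEC
        )
    return condition_mean("random") - condition_mean("predictive")

# Hypothetical trials, for illustration only
trials = [
    {"epoch": 5, "condition": "predictive", "rt": 850, "correct": True},
    {"epoch": 6, "condition": "predictive", "rt": 950, "correct": True},
    {"epoch": 7, "condition": "random", "rt": 1000, "correct": True},
    {"epoch": 7, "condition": "random", "rt": 1100, "correct": True},
    {"epoch": 7, "condition": "random", "rt": 4500, "correct": True},  # trimmed: above 4000 msec
    {"epoch": 2, "condition": "random", "rt": 1500, "correct": True},  # ignored: outside epochs 5-7
]
effect = contextual_cueing_effect(trials)
print(effect)
```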

Results and Discussion

Figures 2a and 2b show RTs for both predictive and random configurations for set sizes 8 and 12, respectively. RTs below 200 msec and above 4000 msec were removed. This led to the removal of less than 1% of the data. Examining the RTs, we see that both set sizes showed a contextual cueing effect. For set size 8, there was a main effect of configuration and epoch (for epochs 1 to 7), where RTs in the predictive display were faster than those in the random, F(1, 11) = 12.2, p < 0.01, and RTs became faster over time, F(6, 66) = 2.3, p < 0.05. There was also a significant configuration x epoch interaction, F(6, 66) = 3.4, p < 0.01. RTs decreased more across epoch when the display was predictive than when it was random. Comparing the “predictive” RTs between epoch 7 and epoch 8 (where the predictive configurations were no longer valid) we see that RTs increased when the configuration was no longer predictive, t(11) = 3.0, p < 0.05. This suggests that it is the context that is important rather than the absolute target locations². When we collapsed the data across epochs 5–7, the results showed a positive contextual cueing effect: predictive RTs were 152 msec faster than random ones, t(11) = 3.8, p < 0.01.

Figure 2. Mean correct RTs (msec) for each condition over epoch in Experiment 1. In Epoch 8, all displays are random. Error bars in all graphs represent the standard error.

A similar pattern could be seen for set size 12. Here there was a main effect of configuration and epoch (for epochs 1 to 7), where RTs in the predictive display were faster than those in the random, F(1, 11) = 23.3, p < 0.01, and RTs became faster over time, F(6, 66) = 3.4, p < 0.01. However, there was no configuration x epoch interaction. Collapsing the data across epochs 5 to 7 again showed a valid contextual cueing effect. RTs for predictive trials were 174 msec faster than those for random, t(11) = 4.2, p < 0.01.

Overall error rates were quite low at 3%. There was a significant effect of configuration, F(1, 11) = 5.2, p < 0.05; random trials showed a higher error rate than predictive. None of the other main effects or interactions proved reliable.

The RT data for both set sizes showed a reliable contextual cueing effect. For present purposes, the critical question is the effect of contextual cueing on search slope. Slopes for predictive and random displays are shown as a function of epoch in Figure 3. While there may be some effect, it is not very robust and certainly never yields efficient search for contextually cued targets. There was a main effect of context. Over epochs 1 to 7, search slopes were more efficient when the displays were predictive than when they were random, F(1, 11) = 6.5, p < 0.05. The effect of epoch was not reliable, F(6, 66) = 0.4, p = n.s. Nor was there a reliable condition x display size interaction, F(6, 66) = 0.7, p = n.s. If we take our standard measure and collapse the data across epochs 5 to 7, there was no contextual cueing effect on slope, t(11) = 0.6, p = n.s. If anything, more learning makes the contextual cueing effect on slope less reliable.

Figure 3. Search slopes (msec/item) for each condition over epoch in Experiment 1. In Epoch 8, all displays are random.

Another way to look at this question is to see whether the difference in slope between predictive and random displays can account for the size of the contextual cueing benefit. For example, at set size 12, contextual cueing speeded responses by 174 msec (as calculated from epochs 5–7). In order to account for an effect of this magnitude, slopes in the predictive case would have to be 174 ÷ 12, or roughly 15 msec/item, shallower than in the random case. The observed slope difference, however, was only 5 msec/item (and not reliably different from 0 msec/item). It seems that guidance on its own cannot account for the contextual cueing effect.
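The arithmetic behind this argument can be written out as a short sketch. It assumes, as the argument does, that guidance affects only the slope, so that the whole RT benefit at a given set size equals the slope difference multiplied by the set size. The function name is hypothetical.

```python
def slope_advantage_needed(rt_benefit_msec, set_size):
    """Slope difference (msec/item) required if guidance alone produced the RT benefit.

    Assumes equal intercepts for predictive and random displays.
    """
    return rt_benefit_msec / set_size

needed = slope_advantage_needed(174, 12)   # Experiment 1, set size 12
observed = 5.0                             # observed slope difference reported in the text
shortfall = needed - observed              # benefit per item left unexplained by slope

print(f"needed = {needed:.1f}, observed = {observed:.1f}, shortfall = {shortfall:.1f} msec/item")
```

On these figures the required advantage is 14.5 msec/item, so the observed 5 msec/item difference covers only about a third of the RT benefit at set size 12.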

If there were any effect of contextual cueing on search efficiency, it was very modest. Instead of seeing a marked improvement in search efficiency, search slopes from repeated displays hovered around 30 msec/item, suggesting, at best, that observers can only eliminate a few items from search³. This is similar to data reported by Chun & Jiang (1998). Since it is hard to interpret essentially negative findings, over the course of our research we have replicated this experiment nine other times (see Figure 4). Table 1 gives a brief description of each of the nine new experiments. None of these experiments yielded a reliable difference between predictive and random slopes (again collapsing the data across epochs 5–7), although two experiments did show a marginal benefit (p = 0.09 in both cases). Furthermore, unlike Experiment 1, eight out of nine of these new experiments showed no reliable main effect of predictive versus random configuration on slope (see Table 1). This again suggests that there was little guidance benefit from having a repeated display. A meta-analysis across all 118 participants in all ten experiments showed that the overall RT contextual cueing effect (as measured from the last three epochs) for set size 12 was 172 msec. Using the logic introduced above, we would predict a 14.4 msec/item slope advantage for the predictive displays, if guidance were to account for the contextual cueing effect. However, the average observed benefit was only half this at 6.9 msec/item (again not reliably different from 0 msec/item, t(117) = 1.4, p = n.s.). Predictive displays produce, at best, weak slope benefits. Guidance seems to account for, if anything, only a small part of the contextual cueing effect.

Figure 4. Ten experiments showing that there is little difference between predictive and random search slopes within contextual cueing studies. The data labeled Experiment 1 are those from Experiment 1 of the present paper.

Table 1.

A brief description of the nine new experiments investigating the effect of contextual cueing on slope and RT in Figure 4

Experiment	N	SS	Stimuli	Background Lattice	Main effect of Configuration (Slope)	Main effect of Configuration (RT)
2	12	8, 12	Letters	Yes	No	Marginal
3	8	8, 12	T vs L	No	No	Yes
4	12	4, 8, 12	T vs L	Yes	No	Yes
5	12	8, 12	V vs H	Yes	No	Yes
6	12	8, 12	V vs H	Yes	No	Yes
7	12	8, 12	T vs L*	Yes	Marginal	Yes
8	12	8, 12	V vs H	Yes	No	Yes
9	13	8, 12	T vs L	Yes	No	Yes
10	13	8, 12	V vs H	Yes	Yes	Yes

Where N = Number of participants and SS = Set Sizes

Stimuli Description:

Letters = Stimuli were heterogeneous letters. The task was to respond to the mirror reversed letter

T vs L = The task was to respond to the orientation of the letter T among rotated distractor Ls (n.b., this was the same task as that of Experiment 1)

T vs L* = The task was to respond to the color of the T among Ls. All stimuli were randomly colored red or green

V vs H = The task was to report whether the target was a vertical or horizontal line. The distractors were oblique lines oriented 30, 60, −30 or −60 degrees from the vertical

Learning appeared rapidly over the first few epochs in Experiment 1 (see Figure 2b). Therefore, one could argue that any slope difference should have emerged early on, perhaps over the first few repetitions. In fact, Chun and Jiang (1998) reported that learning could occur within the first two repeats of a display. To investigate this, we compared the data over the first four repetitions (Block 1) and the next four repetitions of the display (Block 2). If learning occurred after a few trials and resulted in improved guidance, we would expect to find a slope benefit within the first few blocks. However, we did not. There was no difference between search slopes for predictive trials versus random for either Block 1 or Block 2 (t(11) = 1.2, p = n.s., and t(11) = 1.6, p = n.s., respectively). Thus even if learning occurred early on in the experiment, this did not result in improved search slopes. Even if we extend these analyses to look at search slopes across all subsequent blocks (i.e., groups of four successive predictive versus random displays), we see that throughout the experiment there was no reliable benefit (all ts < 2.0, ps = n.s.). Furthermore, a meta-analysis on all ten experiments shown in Figure 4 found no difference between predictive and random search slopes for Blocks 1 or 2 (t(117) = 1.0, p = n.s., and t(117) = 0.1, p = n.s., respectively). This analysis argues that the contextual cueing effect involves, at best, a limited improvement in guidance.

Perhaps we did not find an effect on search slope because of differential contextual cueing effects across set size. It has been suggested that contextual cueing does not occur in crowded displays (e.g., see Hodsoll & Humphreys, 2005), as the context loses some of its distinctiveness. If this were the case, and a display of set size 12 was less distinct than set size 8, there would be less contextual cueing with the former than the latter. Thus a reduction in distinctiveness with increasing set size might offset the benefit of guidance, leading to no net change in slope. We find this explanation unlikely. In Hodsoll and Humphreys’ experiments, displays of set size 10 produced strong contextual cueing, while displays of set size 20 did not. Displays of set size 12 have been shown to produce a robust contextual cueing effect throughout the literature, indicating that they are seen to provide unique and distinct contexts. The difference in distinctiveness between set size 8 and set size 12 seems unlikely to offset any but the weakest of potential guidance effects. However, in the absence of further data we cannot rule out this possibility.

If factors other than attentional guidance were involved in contextual cueing, then we would expect to see reliable differences in intercepts between predictive and random displays. Intercept effects are thought to reflect perceptual processes and/or response selection processes. Figure 5a shows intercept effects across epoch for all of the ten experiments reported above, and Figure 5b shows the difference in predictive versus random displays over the last three epochs. As can be seen there was a clear difference between predictive and random intercepts, reflected in a reliable main effect between predictive and random displays, F(1, 117) = 4.3, p < 0.05, and a significant difference across the last three epochs, t(117) = 2.4, p < 0.05. These data suggest that processes other than guidance must account for some portion of the contextual cueing benefit. Presumably these will be either a facilitation of early processing stages or a facilitation of response selection processes. Experiment 3 investigated the role of this latter component, while Experiment 2 explored whether a contextual cueing effect can still occur when guidance is already optimal.

Figure 5. Intercept effects (msec) for predictive and random displays across all 118 participants tested in Figure 4 (a) for each epoch and (b) over the last three predictive epochs.

Experiment 2a

If contextual cueing improved search by guiding attention to the target then it should be of little use when the guidance signal is already strong enough to attract attention to the target location with near certainty. In Experiment 2a, a single letter was presented on each trial. Empty circular placeholders provided the context. There were no distractor items. In this case, standard guidance should direct attention straight to the target. Any guidance by contextual cueing would be redundant.