Guided Search for Triple Conjunctions

Maria Nordfang; Jeremy M Wolfe

doi:10.3758/s13414-014-0715-2

Author manuscript; available in PMC: 2017 Aug 22.

Published in final edited form as: Atten Percept Psychophys. 2014 Aug;76(6):1535–1559. doi: 10.3758/s13414-014-0715-2

PMCID: PMC5565881 NIHMSID: NIHMS882391 PMID: 25005070

Abstract

A key tenet of Feature Integration Theory and related theories such as Guided Search (GS) is that the binding of basic features requires attention. This would seem to predict that conjunctions of features of objects that have not been attended should not influence search. However, Found (1998) reported that an irrelevant feature (size) improved the efficiency of search for a color × orientation conjunction if it was correlated with the other two features across the display compared to the case where size was not correlated with color and orientation features. We examine this issue with somewhat different stimuli. We use triple conjunctions of color, orientation and shape (e.g. search for a red, vertical, oval-shaped item). This allows us to manipulate the number of features that each distractor shares with the target (Sharing) and it allows us to vary the total number of distractor types (and, thus, the number of groups of identical items; Grouping). We find these triple conjunction searches are generally very efficient – producing very shallow reaction time (RT) × set size slopes, consistent with strong guidance by basic features. Nevertheless, both of these variables, Sharing and Grouping, modulate performance. These influences are not predicted by previous accounts of GS. However, both can be accommodated in a GS framework. Alternatively, it is possible, if not necessary, to see these effects as evidence for “preattentive binding” of conjunctions.

In a typical visual scene, many objects will share features with each other. There may be several big things, several blue things, several shiny things, and so forth. Consequently, looking for a specific object is likely to entail search for a conjunction of features (the big, blue, shiny thing). Conjunction searches have been a subject of considerable interest in the visual search literature for many years. In her original “Feature Integration Theory (FIT)”, Treisman classified conjunction searches as “serial” as contrasted with “parallel” feature searches (Treisman & Gelade, 1980). Central evidence for this claim came from the functions relating set size (the number of items in a search display) to reaction time (RT). For salient features (e.g. red among green or big among small), the slope of the RT × set size function was near zero, suggesting that there was no additional cost of added distractor items. For conjunction searches, in contrast, RT increased linearly with set size. Each additional distractor imposed a cost. The data were consistent with a serial search through the items at a rate of 20–40 items per second. It should be noted that the same data are also consistent with various versions of parallel models in which all items are processed at the same time (Townsend, 1971; Townsend & Wenger, 2004) but where noise or capacity limitations cause a rise in RT with set size (Palmer, 1995).

A key theoretical claim of FIT was that the features forming conjunctions could not be ‘bound’ without the application of selective attention. However, whether or not conjunction identification required serial binding, subsequent work made it clear that conjunction search did not need to be particularly inefficient. With salient component features, conjunction searches tended to produce RT × set size slopes that were intermediate between the most efficient feature searches and the least efficient basic searches in which items were big enough to be identified without requiring fixation (e.g. Ts among Ls or 2s among 5s) (Egeth, Virzi, & Garbart, 1984; Dick, Ullman, & Sagi, 1987; McLeod, Driver, Dienes, & Crisp, 1991; Nakayama & Silverman, 1986; Treisman & Sato, 1990; Wolfe, Cave, & Franzel, 1989). There is a continuum of search efficiency from highly efficient feature searches to inefficient searches for items defined by their spatial configurations (like Ts and Ls) (Wolfe, 1998).

Guided Search (GS) Theory is one approach to understanding this continuum (Eckstein, 1998; Wolfe, 1994, 2007; Wolfe, et al., 1989). GS preserves the central role for binding via selective attention. According to GS, relatively efficient conjunction search occurs because basic features can be used to guide attention to items that are more likely to be the target item. Thus, in a search for a red vertical item, attention can be guided to red items and to vertical items; the intersection of those two sets being an excellent place to look for red vertical items. The claim of FIT and GS is that the red vertical item is not bound and recognized until the item falls under the ‘spotlight’ of selective attention. Various other experimental results (Driver, McLeod, & Dienes, 1992; Duncan, 1995; Enns & Rensink, 1990; Roggeveen, Kingstone, & Enns, 2004) and various other theoretical formulations propose that features can be bound without the need to focus selective attention on the item (McElree & Carrasco, 1999; Palmer, 1995). The “Similarity” model proposed by Duncan and Humphreys put an emphasis on the role of grouping of items by similarity (Duncan & Humphreys, 1989), including the grouping of items whose similarity was based on the binding of features without attention (Humphreys, Quinlan, & Riddoch, 1989). GS argued against such preattentive binding (Wolfe, 1992).
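The guidance idea described above can be illustrated with a toy calculation. The sketch below is not the GS model itself; it simply assumes that each target feature an item carries adds one unit of top-down activation and that attention would visit items in order of decreasing activation. The example display, the unit weights, and the function names `activation` and `guidance_order` are illustrative assumptions.

```python
# Toy illustration of feature guidance (not the published GS model).
# Assumption: each target feature an item carries adds one unit of activation.

TARGET = {"color": "red", "orientation": "vertical"}


def activation(item, target=TARGET):
    """Count how many target features the item carries."""
    return sum(item[dim] == value for dim, value in target.items())


def guidance_order(items, target=TARGET):
    """Return items sorted from highest to lowest activation (stable sort)."""
    return sorted(items, key=lambda it: activation(it, target), reverse=True)


display = [
    {"color": "green", "orientation": "horizontal"},
    {"color": "red", "orientation": "horizontal"},
    {"color": "green", "orientation": "vertical"},
    {"color": "red", "orientation": "vertical"},  # the conjunction target
]

for item in guidance_order(display):
    print(activation(item), item)
```

In this toy display the red vertical item is the only one that lies in both the "red" and the "vertical" sets, so it receives the highest activation and would be visited first.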

Found (1998) put these competing claims to an interesting test. He had participants search for tilted red lines among tilted white and vertical red lines. The critical manipulation was an irrelevant variation in a third variable, size. Items were either big or small, and the size of items was either correlated with the color and orientation of items or not. In the correlated case, within a trial, all items of one conjunction type had the same size and all items of the other conjunction type had the other size. For example, red vertical items might be big while all white tilted items might be small; however, the specific relationship between size and the orientation × color varied from trial to trial. When size was uncorrelated, the size varied randomly with the orientation × color conjunctions within a trial, with the restriction that half the elements in a trial were big and half were small. In both cases, the target item was equally likely to be either big or small; thus, size was uninformative of target presence. Found reasoned that GS should not care about whether the irrelevant size variable was tied to the task relevant feature dimensions or not. If features were processed independently, prior to the arrival of attention, the contribution of size would be the same in the two conditions. However, the results showed that the strongly correlated case was more efficient. Found argued that the size correlated case had two groups of items (e.g. big red vertical and small white tilted) while the size uncorrelated case had four. The displays with more and smaller groups looked ‘noisier’ and were somewhat harder to search through. Found considers this to be consistent with a Similarity Theory in which “preattentive vision delivers bound sets of features that relate to the same segmented object” (Found, 1998, p. 1123) and not consistent with GS which would not deliver such preattentive bindings.

Proulx (2007) expanded on these considerations and found that salient, task-irrelevant singleton features influence search efficiency. This led Proulx to propose that both GS and Similarity Theory understate the role of bottom-up saliency in conjunction searches (Proulx, 2007).

There is good evidence that feature conjunctions can influence behavior even for conjunction items that GS and similar serial theories assume are available only preattentively or with minimal attention. For example, Mordkoff, Yantis, & Egeth (1990) had observers look for Red X targets in displays with other items that could be red or Xs but not both. In displays of two or six items, the critical comparison was between trials with one or two Red Xs. RTs are faster when there are two (redundancy gain; see also Pashler, 1987). Importantly for the argument, the RTs are faster than would be predicted if each conjunction needed to be processed separately (Mordkoff, Yantis, & Egeth, 1990). Mordkoff et al. argue that, in a redundant display, both Red Xs can be processed as conjunctions of red and X at the same time.

Converging evidence for this sort of preattentive processing of conjunctions comes from Mordkoff and Halterman’s (2008) “correlated flankers task”. In the standard flanker task, observers might be shown groups of three letters and told to hit the left key if the middle letter was an A and the right key if the middle letter was a B (C W Eriksen & Hoffman, 1973; B A Eriksen & C W Eriksen, 1974). The standard finding is that it will take a little longer to respond if the flanking letters are incongruent with the central letter (BAB and ABA) than if the flankers are congruent (AAA, BBB). In Mordkoff and Halterman’s (2008) version of the task, the target was a color shape conjunction (e.g. Red Square) and the flankers were other conjunctions that could be correlated with the target. Thus, Blue Diamond flankers might be correlated with Red Square though blue and diamond by themselves were not. These conjunctive flankers have an effect on RT to the target, indicating that the combination of blue and diamond has been registered.

There is a long running debate about the source of the flanker effect. The original hypothesis was that the flanker effect was evidence that the flanker letters were processed without attention because attention was directed to the central letter. Later work questioned the assumption that one could completely deny attention to the flankers. For instance, Lavie and Tsal (1994) argued that, if the central task was not very demanding, some attentional resources would spill over to process the flankers. Kyllingsbæk, Sy, & Giesbrecht (2011) demonstrated that this load effect on the flanker task can also be explained by a parallel model with limited processing capacity and limited visual working memory. Regardless of one’s position on this continuing debate (Tsal and Benoni, 2010; Lavie and Torralbo, 2010), results like those of Mordkoff and Halterman (2008) do indicate that, under some circumstances, the conjoint appearance of basic features in an object can be processed with little or no attention.

Krummenacher and colleagues find evidence for coactivation of multiple features in visual search tasks (Krummenacher et al., 2001, 2002; Krummenacher, Grubert, & Müller, 2010). As in the Mordkoff work, conjunctions of color and shape produce RTs that are too fast to be explained if the two features are not being combined in some manner. Their “dimension weighting” solution to this problem is a modification of GS.

In this paper, we use higher order conjunctions to revisit this issue of preattentive processing of the combinations of basic features. By higher order conjunctions, we mean targets that are defined by more than two features. In the real world, most objects in a complex environment would need to be defined by multiple features. Moreover, as will be seen, higher order conjunctions give us other tools with which to address the question of conjunctive target feature guidance and preattentive effects of feature conjunctions separately. Earlier work with triple conjunctions has provided evidence for an ability to guide attention on the basis of multiple dimensions (Dehaene, 1989; Quinlan & Humphreys, 1987). Consistent with either GS or Similarity Theory, it is easier to find a triple conjunction if distractors share just one feature with the target than if they share two (Wolfe, et al., 1989). Typically, some features seem to guide more effectively than others, with color being a frequent winner (Williams & Reingold, 2001).

The basic puzzle

Figure 1 illustrates the basic challenge to models like GS posed by Found’s work. The target in each case is a horizontal red rectangle. This is a triple conjunction task because some distractor items are red, some are horizontal, and some are rectangles. No single feature is adequate to do the task. In each case, one third of the items have each of the target properties. That is, both examples contain one third red items, one third horizontal items, and one third rectangles. A standard model with separate representations for each dimension would see no preattentive differences between the two conditions. The difference between the conditions lies in the combinations of the features. On the left, every combination of the three values of the three feature dimensions is present, leading to a display with a target and 26 distractor types. On the right, only three of the 26 distractor types are used. However, the distribution of the individual features is the same in both displays; each feature is represented equally often. It is probably intuitively clear that the 3 Distractor case is easier than the 26 Distractor case. Experiment 1a tests this intuition and shows that it can be supported by data.

Figure 1. It seems intuitively clear (and will be shown empirically below) that it is harder to find a red, horizontal rectangle in the left image than in the right, even though both sets of stimuli have 1/3 red items, 1/3 horizontal items, and 1/3 rectangular shapes.

Experiment 1a

In seven experiments, we examine guidance of attention in visual search for targets defined by 3 or 6 features. We look for and find evidence that cannot be explained by guidance by representations of independent stimulus attributes and we consider whether these findings require a mechanism of preattentive binding. In Experiment 1a, we provide empirical support for the impression that triple conjunctions are easier to find when there are fewer types of distractor items.

Participants

Thirteen paid volunteers (four women) participated in the experiment. Age information was available for 12 of the 13 participants; for these participants, the age range was 19 to 47. The participants had normal or corrected to normal 20/25 vision, no history of eye or muscular disorders, and no color vision deficits when tested on Ishihara’s tests for color blindness (Ishihara, 1987). All participants gave informed consent prior to participation. One participant was excluded from the data analysis due to excessive miss rates. The miss rates of this participant exceeded the mean miss rate across all other participants by over two standard deviations.

Apparatus

The stimuli were presented on Apple Macintosh OS X 10.5.8 computers. The experiments were run using the Psychophysics toolbox in MatLab 7.5.0 (R2007b). Each computer was connected to a 20” CRT screen, and the screen resolution was 1280 × 960 pixels with a refresh rate of 85 Hz. Participants viewed the screen freely at a distance of approximately 60 centimeters. Responses were collected using a standard U.S. Apple keyboard.

Stimuli

The stimulus set consisted of elements that had one of three feature values in each of the three feature dimensions: color, shape, and orientation. A stimulus element could be red (RGB: 200, 0, 0), green (RGB: 0, 170, 45) or blue (RGB: 0, 230, 230); vertical (0°), oblique (45°) or horizontal (90°); rectangular, oval or jagged. There were thus 27 possible types of feature conjunctions (see Figure 2 for the basic stimulus set).

Figure 2. The 27 items defined by 3 colors × 3 shapes × 3 orientations.
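The stimulus set can be enumerated directly from the three dimensions. The following is a minimal sketch, assuming the feature labels used in the text (with "zigzag" standing for the jagged shape) and the Experiment 1a target; the helper name `shared_features` is hypothetical.

```python
from collections import Counter
from itertools import product

COLORS = ("red", "green", "blue")
ORIENTATIONS = ("vertical", "oblique", "horizontal")
SHAPES = ("rectangle", "oval", "zigzag")

# Every (color, orientation, shape) combination: 3 x 3 x 3 = 27 types.
ALL_TYPES = list(product(COLORS, ORIENTATIONS, SHAPES))

TARGET = ("red", "vertical", "rectangle")


def shared_features(item, target=TARGET):
    """Number of dimensions on which the item matches the target."""
    return sum(a == b for a, b in zip(item, target))


# Tally the 26 non-target types by how many features they share with the target.
sharing_counts = Counter(shared_features(t) for t in ALL_TYPES if t != TARGET)
print(len(ALL_TYPES), dict(sorted(sharing_counts.items())))
# prints: 27 {0: 8, 1: 12, 2: 6}
```

Of the 26 non-target types, 8 share no feature with the target, 12 share one, and 6 share two.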

In Experiment 1a, all participants searched for the same target: a red, vertical rectangle. Four distractor sets were used. In the first distractor set, all of the possible conjunction types, excluding the target, made up the set (as the first display in Figure 1). We call this the 26 conjunction (26D) set. Distractor sets two and three each consisted of three conjunction types. These two conditions differed in how many features each distractor type shared with the target. In one of the three-distractor conditions, the distractors were red, vertical ovals (sharing two features with the target); blue horizontal rectangles (one shared feature); and green, oblique zigzag shapes (no shared features). This condition will be designated 3D(012). In the other three-distractor condition, the distractors were red, oblique ovals; green, vertical zigzag shapes; and blue, horizontal rectangles. Each distractor shared one feature with the target; hence, this condition is designated 3D(1). The fourth and last distractor set in Experiment 1a was a 5D set and consisted of a red, vertical zigzag shape (share 2); a red, vertical oval (share 2); a green, oblique rectangle (share 1); a blue, horizontal zigzag shape (share none); a blue, horizontal oval (share none). In the 26D and 3D sets, the proportions of basic features remain the same: one third of the items having each color, each orientation, and each shape. In the 5D set there were fewer representations of green, rectangular, and oblique than of the other features. Importantly, in all conditions, the distractors shared one feature with the target on average.
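The Sharing values of the four distractor sets can be checked with a short tally. This is a sketch under the assumption that sharing is averaged over distractor types (not over the items in a particular display); the dictionary `DISTRACTOR_SETS` and the helper names are hypothetical, while the set contents are taken from the description above.

```python
from fractions import Fraction
from itertools import product

TARGET = ("red", "vertical", "rectangle")


def shared(item, target=TARGET):
    """Number of dimensions on which the item matches the target."""
    return sum(a == b for a, b in zip(item, target))


DISTRACTOR_SETS = {
    "3D(012)": [("red", "vertical", "oval"),
                ("blue", "horizontal", "rectangle"),
                ("green", "oblique", "zigzag")],
    "3D(1)": [("red", "oblique", "oval"),
              ("green", "vertical", "zigzag"),
              ("blue", "horizontal", "rectangle")],
    "5D": [("red", "vertical", "zigzag"),
           ("red", "vertical", "oval"),
           ("green", "oblique", "rectangle"),
           ("blue", "horizontal", "zigzag"),
           ("blue", "horizontal", "oval")],
}

# 26D: every combination of the three dimensions except the target itself.
all_types = product(("red", "green", "blue"),
                    ("vertical", "oblique", "horizontal"),
                    ("rectangle", "oval", "zigzag"))
DISTRACTOR_SETS["26D"] = [t for t in all_types if t != TARGET]


def mean_sharing(types):
    """Exact mean number of target features per distractor type."""
    return Fraction(sum(shared(t) for t in types), len(types))


for name, types in DISTRACTOR_SETS.items():
    print(name, len(types), mean_sharing(types))
```

Under this per-type tally, the 3D and 5D sets average exactly one shared feature. The 26D set averages 24/26 (about 0.92), because the target itself accounts for three of the 27 feature matches in the full stimulus set.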

Display set size was 27 on half the trials and 54 on the other half. These set sizes were picked so that all 26 distractors plus a target could be presented on a single trial. When distractor sets were subsets of the full set, distractors were repeated in a display. Equal (or almost equal) numbers of each distractor were presented on each trial. When the number of distractors did not divide evenly into the set size (in the 5D condition), the required additional distractors were drawn at random without replacement from the current distractor set.
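The composition rule described above can be sketched as follows. This is an illustration only, not the experiment code: the function name `build_display` is hypothetical, and the sketch assumes that on target-present trials the target occupies one of the set-size slots, a detail the text does not spell out.

```python
import random

TARGET = ("red", "vertical", "rectangle")
FIVE_D = [("red", "vertical", "zigzag"),
          ("red", "vertical", "oval"),
          ("green", "oblique", "rectangle"),
          ("blue", "horizontal", "zigzag"),
          ("blue", "horizontal", "oval")]


def build_display(distractor_types, set_size, target=None, rng=None):
    """Return set_size items with equal or near-equal counts per distractor type.

    Assumption: a present target takes one of the set_size slots.
    """
    rng = rng or random.Random()
    n_distractors = set_size - (1 if target is not None else 0)
    per_type, remainder = divmod(n_distractors, len(distractor_types))
    items = [t for t in distractor_types for _ in range(per_type)]
    # Left-over slots: draw distinct types at random, without replacement.
    items += rng.sample(list(distractor_types), remainder)
    if target is not None:
        items.append(target)
    rng.shuffle(items)
    return items


display = build_display(FIVE_D, 54, target=TARGET, rng=random.Random(0))
print(len(display), [display.count(t) for t in FIVE_D])
```

Because the remainder is always smaller than the number of distractor types, sampling without replacement guarantees that no type appears more than once beyond the equal share.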

The stimuli were presented on a white background (RGB: 255, 255, 255) in an eight × eight matrix, with a diameter of 950 pixels and centered on the screen. The stimulus elements were randomly presented in the 64 tiles of the matrix. Each element was placed in the center of a randomly chosen unoccupied tile and jittered a few pixels in order to avoid alignment of elements.
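The placement procedure can be sketched as below. Several values are assumptions made for illustration: the 950-pixel extent is treated as the side length of a square field centered on the 1280 × 960 screen, and the jitter range of ±5 pixels stands in for "a few pixels". The function name `place_items` is hypothetical.

```python
import random

GRID = 8               # 8 x 8 matrix, i.e. 64 tiles
FIELD_PX = 950         # assumed side length of the stimulus field
SCREEN = (1280, 960)   # screen resolution from the Apparatus section
JITTER_PX = 5          # assumption: the text only says "a few pixels"


def place_items(n_items, rng=None):
    """Return (x, y) pixel positions, one per item, each in its own tile."""
    rng = rng or random.Random()
    tile = FIELD_PX / GRID
    left = (SCREEN[0] - FIELD_PX) / 2
    top = (SCREEN[1] - FIELD_PX) / 2
    # Sampling without replacement gives every item an unoccupied tile.
    tiles = rng.sample(range(GRID * GRID), n_items)
    positions = []
    for idx in tiles:
        row, col = divmod(idx, GRID)
        x = left + (col + 0.5) * tile + rng.uniform(-JITTER_PX, JITTER_PX)
        y = top + (row + 0.5) * tile + rng.uniform(-JITTER_PX, JITTER_PX)
        positions.append((x, y))
    return positions


positions = place_items(54, random.Random(1))
print(positions[:3])
```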

Procedure

Participants were instructed to look for the target, defined by three target features (i.e., the red, vertical rectangle), and to respond as quickly and accurately as possible as to whether the target element was present or absent. The target remained the same across the whole experiment. Responses were made by pressing the predetermined “present” or “absent” key on the keyboard. The two response keys were marked by a red and a blue sticker on top of the A key and the L key, respectively. Participants were instructed to place each of their index fingers on top of each of the two keys. Targets were present on half the trials.

Each trial followed the same sequence of events. First, the description of the three target features appeared in the center of the screen for 500 msec accompanied by a warning beep. This was followed by a stimulus display that remained present on screen until the participant responded. After the response, a screen showing the trial number, accuracy feedback, and the reaction time for that trial was displayed for 500 msec. If an error response was made, three error beeps would sound, concurrent with the presentation of the feedback screen. After the feedback, the next trial was initiated after a 1000 msec delay.

Participants started the experiment by completing 10–30 practice trials, followed by 900 experimental trials with presentations of all display types intermixed pseudo-randomly.

Data Analysis

Reaction time (RT) data were trimmed by removing “outlier” trials with reaction times more than three standard deviations greater than the mean of that participant. Trials with RTs below 200 msec were also removed from the analysis.
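The trimming rule can be sketched for a single participant as follows. The sketch assumes that the mean and standard deviation are computed over all of that participant's trials before any trial is removed; the text does not say whether they were computed per condition or after applying the 200 msec floor. The function name `trim_rts` is hypothetical.

```python
from statistics import mean, stdev


def trim_rts(rts_ms, sd_cutoff=3.0, floor_ms=200.0):
    """Keep RTs that are >= floor_ms and <= participant mean + sd_cutoff * SD."""
    m = mean(rts_ms)
    sd = stdev(rts_ms)
    upper = m + sd_cutoff * sd
    return [rt for rt in rts_ms if floor_ms <= rt <= upper]


# Hypothetical RTs: twenty ordinary trials, one very slow, one anticipation.
rts = [500.0] * 20 + [5000.0, 150.0]
kept = trim_rts(rts)
print(len(rts) - len(kept), "trials removed")
```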

RT data and accuracy data were examined separately by repeated-measures analyses of variance (ANOVA) with factors: distractor condition and set size. In the following, Greenhouse-Geisser corrected p-values are reported where Mauchly’s test revealed that sphericity could not be assumed. The analyses were carried out separately for target-present and target-absent trials. For the RTs, we were particularly interested in whether the distractor sets significantly influenced search efficiency. Hence, when the general RT ANOVA revealed a significant interaction between set size and distractor condition, the relevant distractor conditions were compared by post-hoc ANOVAs or Student’s t-tests. Post-hoc p-values were Bonferroni-Holm corrected. For the error data, our primary interest was to ensure that speed-accuracy trade-offs were not contributing substantially to reaction time differences for the various distractor conditions. Therefore, when the error rate ANOVA revealed a significant effect of distractor set, the error rates were further investigated. For all ANOVAs, generalized eta squared (ges) is reported for effect sizes.
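The Bonferroni-Holm correction applied to the post-hoc p-values can be sketched as a step-down adjustment. This is a generic implementation of the Holm procedure, not the authors' analysis script; the function name `holm_adjust` and the example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Adjusted values may not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values from three post-hoc comparisons.
print(holm_adjust([0.01, 0.04, 0.03]))
```

For these three hypothetical values the adjusted p-values are approximately .03, .06, and .06, so only the first comparison would remain significant at the .05 level.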

Results & Discussion

Using the outlier procedure described above, 2.1 % of the trials were removed from further analysis. Mean RTs are shown in Figure 3. First, they confirm that triple conjunction searches are very efficient when the target shares an average of one feature with the distractors. Note that all target-present slopes are less than 5 msec/item. Second, the results show reliable differences between the conditions, even though the feature maps should be equivalent in three of the four conditions (the 5D condition had slightly fewer green, oblique, and rectangular items).
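With only two set sizes (27 and 54), the slope of the RT × set size function reduces to a difference quotient. The sketch below shows the arithmetic; the RT values are hypothetical and chosen only for illustration, and the function name `search_slope` is an assumption.

```python
def search_slope(rt_small_ms, rt_large_ms, n_small=27, n_large=54):
    """Slope in msec/item between two set sizes."""
    return (rt_large_ms - rt_small_ms) / (n_large - n_small)


# Hypothetical mean RTs, not values from the experiment.
print(search_slope(600.0, 681.0))  # prints 3.0
```

A slope of 3 msec/item in this hypothetical case would fall under the 5 msec/item bound noted for the target-present conditions.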

Figure 3. Mean RTs for Experiment 1a. Error bars are +/− 1 within-observer s.e.m. using the method of Cousineau (2005), corrected as suggested by Morey (2008; see also Cousineau & O’Brien, 2014). In some cases, error bars fall within the graphed datapoint.
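The within-observer error bars can be sketched as follows: each participant's overall level is removed (Cousineau, 2005) and the resulting standard error is rescaled by the factor suggested by Morey (2008). This is a generic illustration, not the authors' plotting code; the function name `cousineau_morey_sem` and the input layout are assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def cousineau_morey_sem(data):
    """data[p][c] is the mean RT of participant p in condition c.

    Returns one within-participant standard error per condition.
    """
    n_participants = len(data)
    n_conditions = len(data[0])
    grand_mean = mean(x for row in data for x in row)
    # Cousineau (2005): remove each participant's overall level.
    normalized = [[x - mean(row) + grand_mean for x in row] for row in data]
    # Morey (2008): rescale to undo the bias introduced by normalizing.
    correction = sqrt(n_conditions / (n_conditions - 1))
    sems = []
    for c in range(n_conditions):
        column = [row[c] for row in normalized]
        sems.append(correction * stdev(column) / sqrt(n_participants))
    return sems


# Hypothetical data: participants differ only in overall speed.
print(cousineau_morey_sem([[500, 600], [700, 800], [900, 1000]]))
```

When participants differ only by a constant offset, as in this hypothetical input, the normalized scores are identical across participants and the within-observer standard errors are zero.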

Reaction times

The reaction time ANOVAs revealed that the effects of distractor condition and the interaction between distractor condition and set size were significant, for both target-present and target-absent trials. In general, reaction times and search slopes increased (i.e., search became less efficient) when the number of conjunction types in the distractor sets increased. For the target-present trials, the two 3D – 26D slope comparisons were significant, as was the 5D – 3D(1) comparison. For the target-absent trials, all slope comparisons, except the 3D(1) – 3D(012) comparison, were significant. The results thus indicate that the searches were more efficient when fewer conjunction types were present, and that this pattern is more pronounced for the target-absent trials.

Error rates

There were 255 errors out of the 5602 target-present trials that were not removed by the outlier procedure, and 115 error trials out of the 5337 target-absent trials. Investigations of the error rates revealed no significant effects for the target-present trials. For the target-absent trials, there was a significant main effect of distractor set (see Table 2); however, none of the separate distractor type comparisons revealed any significant effects. Numerically, the error rates followed the pattern suggested by the RTs, with higher error rates for the 26D condition (5.6 % errors), intermediate for the 5D condition (1.7 %) and lowest for the 3D conditions (< 0.1 %). The error rate analyses thus do not suggest a speed-accuracy trade-off.

Table 2.

Error rate analysis of variance for Experiments 1a, 1b, 2, 3, 6, and 7. Only significant effects are reported. GGe denotes the Greenhouse-Geisser epsilon and is reported when significant; ges denotes generalized eta squared.

Experiment	Target	Effect	F	df	GGe	p	ges
1A	Absent	Distractor	5.12	3, 33	.37	.040^*	.16
1B	Present	Distractor	4.49	2, 24	n.s.	.022^*	.11
		Set size	8.31	1, 12	n.s.	.014^*	.07
	Absent	Distractor	7.78	2, 24	.51	.016^*	<.01
2 – 8 pp	Present	Distractor	10.71	4, 28	n.s.	<.001^***	.31
2 – all pp	Present	Set Size	7.20	1, 10	n.s.	.023^*	.01
		Distractor	11.74	4, 40	.42	<.001^***	.24
	Absent	Distractor	4.76	4, 40	.36	.035^*	.04
3	Present	Distractor	6.03	3, 36	n.s.	.002^**	.08
		Set size	5.42	1, 12	n.s.	.038^*	.06
	Absent	Distractor	7.38	3, 36	.42	.012^*	.15
		Set size	5.24	1, 12	n.s.	.041^*	.03
6	Present	Distractor	36.03	3, 30	.42	<.001^***	.53
		Set size	8.49	1, 10	n.s.	.015^*	.06
		Distractor × Set size	4.15	3, 30	n.s.	.014^*	.08
	Absent	Distractor × Set size	6.03	3, 30	n.s.	.002^**	.08
7	Present	Distractor	21.35	4, 36	n.s.	<.001^***	.37
		Set size	39.42	1, 9	n.s.	<.001^***	.10

^* corresponds to significance below the .05 level, ^** corresponds to significance below the .01 level, ^*** corresponds to significance below the .001 level.

Experiment 1b – replication

The results of Experiment 1a clearly indicate that the efficiency of search cannot be explained entirely by the activity in individual feature maps or their linear sum. If that were the case, there should be no difference between 26D and 3D searches. Even though all of these searches are very efficient, the 3D searches are easier than the 26D case.

In Experiment 1a, all participants searched for the same red vertical rectangle. Moreover, replication is good practice. Accordingly, Experiment 1b is a replication of Experiment 1a with modest modifications. The 5D condition was dropped and the target conjunction varied between participants.