Studying visual search using systems factorial methodology with target–distractor similarity as the factor

Mario Fifić; James T Townsend; Ami Eidels

doi:10.3758/pp.70.4.583

Author manuscript; available in PMC: 2009 Jan 26.

Published in final edited form as: Percept Psychophys. 2008 May;70(4):583–603. doi: 10.3758/pp.70.4.583

PMCID: PMC2630500 NIHMSID: NIHMS64887 PMID: 18556921

Abstract

Systems factorial technology (SFT) is a theory-driven set of methodologies oriented toward identification of basic mechanisms, such as parallel versus serial processing, of perception and cognition. Studies employing SFT in visual search with small display sizes have repeatedly shown decisive evidence for parallel processing. The first strong evidence for serial processing was recently found in short-term memory search, using target–distractor (T–D) similarity as a key experimental variable (Townsend & Fifić, 2004). One of the major goals of the present study was to employ T–D similarity in visual search to learn whether this mode of manipulating processing speed would affect the parallel versus serial issue in that domain. The result was a surprising and regular departure from ordinary parallel or serial processing. The most plausible account at present relies on the notion of positively interacting parallel channels.

Walking into a café, you are looking for the familiar face of an old colleague with whom you have set an appointment. Doing so, you are engaged in a visual search for a target (your friend's face) among distractors (other people's faces). Are you more prone to quickly find her when the café is half empty than when it is crowded? Would certain properties, such as red hair, make her more likely to be detected? Similar themes arise when one is scanning a newspaper for words or topics of interest or even scrutinizing a face for a birthmark. The nature of visual search and the factors affecting the speed of processing have been extensively studied over the past half-century. A number of models have been put forth to explain how response times (RTs) are influenced by various stimuli and experimental conditions (Bundesen, 1990; Duncan & Humphreys, 1989; Palmer, Verghese, & Pavel, 2000; Treisman & Sato, 1990; Wenger & Townsend, 2006; Wolfe, 1994).

An especially intriguing question is that of processing architecture.¹ In the café, can we simultaneously process all faces (parallel processing), or is it necessary to process one face at a time (serial processing), until we recognize our friend? Is some kind of more complicated set of processes involved? Although this has been an issue of concern since the 19th century, and intensive experimental and theoretical research on such issues has continued for several decades, decisive resolution has been hard to come by.

The most common experimental approach since the 1960s has been to manipulate the number of search objects (workload) and measure RTs (for memory search, see Sternberg, 1966; for visual search, see Atkinson, Holmgren, & Juola, 1969, and Egeth, 1966). For example, when participants search for a single visual target among distractors that are highly similar to the target, mean RTs can increase in a linear fashion as a function of workload, thus suggesting serial processing (e.g., Atkinson et al., 1969; Townsend & Roos, 1973; Treisman & Gelade, 1980; Wolfe, Cave, & Franzel, 1989). On the other hand, under certain conditions and even with manipulation of workload, participants can exhibit rapid RTs or patterns of accuracy that are more compatible with parallel processing (e.g., Bacon & Egeth, 1994; Bundesen, 1990; Folk & Remington, 1998; Palmer et al., 2000; Pashler & Harris, 2001; Thornton & Gilden, 2007; Wenger & Townsend, 2006).

Unfortunately, there is a marked asymmetry in the use of workload as an independent variable to assess architecture. On the positive side, when RT decreases or, in some cases, remains roughly constant with an increase in workload, models based on parallel processing are virtually the only viable type of alternative. The reason for the conclusiveness is that serial models would have to increase their processing speed in an outlandish way in order to make such predictions. Yet, in scores of experiments utilizing a single target embedded in a set of distractors (a few have been indicated just above), RT increases with workload, and here the inference is decidedly more ambiguous. The problem is that quite natural parallel models whose channels² become less efficient as workload increases can make predictions identical to those of serial models (e.g., Townsend, 1969, 1971, 1972; Townsend & Ashby, 1983); this is the well-known model-mimicking dilemma.³

There is a more powerful theory-driven methodology that has evolved over the past several decades that is able to avoid most, if not all, model-mimicking challenges and certainly those associated with the architecture–capacity confounding. This approach grew out of Sternberg's additive factors method, which allowed assessment of the hypothesis of serial processing. We will explain the methodology further below, but suffice it to say for now that the prime logic is that experimental factors can be found that selectively slow down or speed up separate subprocesses or subsystems in the overall processing network (the so-called postulate of selective influence; Ashby & Townsend, 1980; Dzhafarov, 1999; Egeth & Dagenbach, 1991; Sternberg, 1969; Townsend, 1984; Townsend & Thomas, 1994). These methods were subsequently generalized to permit identification not only of parallel processing, but also of other feedforward systems of rather surprising complexity (Schweickert, 1978; Schweickert, Giorgini, & Dzhafarov, 2000; Schweickert & Townsend, 1989; Townsend & Ashby, 1983).

We have employed this methodology, now often referred to globally as systems factorial technology (SFT), to investigate visual search with a small workload in a number of important contexts: binocularly presented dots (e.g., Hughes & Townsend, 1998; Townsend & Nozawa, 1988, 1995), realistic facial feature identification (e.g., Fifić, 2006; Wenger & Townsend, 2001), emotional facial feature perception (e.g., Innes-Ker, 2003), short-term memory search (e.g., Townsend & Fifić, 2004), and nonface meaningful and meaningless objects (e.g., Wenger & Townsend, 2001). In all the visual search situations, the results have always strongly supported parallel processing. All these experiments utilized some type of salience manipulation as a selective experimental factor, and none of them involved letters or words as stimuli. In contrast, in a short-term memory search experiment with word-like stimuli (Townsend & Fifić, 2004), it was necessary to engage a different type of factor: the similarity of distractors to the target. That study showed marked individual differences, with some observers revealing serial and others parallel processing.

As has been noted, although the factorial approach in general can assess complex networks, our data have so far corroborated simple parallel or, sometimes in memory search, serial processing. We shall refer to these as single-stage models, since the main search mechanisms are confined to a single stage in the overall cognitive-processing chain. Yet a highly popular class of models is based on three stages of processing: a very efficient parallel early stage, followed by a selection process that sends certain items to a third, limited-capacity, usually serial, later stage used to search through more difficult items (Sagi & Julesz, 1985; Treisman & Gelade, 1980; Treisman & Sato, 1990; Wolfe, 1994). Although there appear to be many directions one could take in extracting predictions from such multistage models, we will begin to probe some apparently natural, if simplified, possibilities.

The present study was intended first to expand the purview of SFT to meaningless word-like or letter stimuli. Second, we employed similarity of distractors to the target as a selective factor, as was done in the short-term memory study (Townsend & Fifić, 2004), but not heretofore in visual search. Third, we wished to manipulate the complexity of the items in order to explore its effect on architecture. Would the experimental evidence again provide uniform support for parallel processing? How would our simple extensions of the popular three-stage models fare? As will be seen, the results deviated radically from earlier findings.

Factorial Technology Tests of Parallel Versus Serial Systems

SFT is a theory-driven experimental methodology that unveils a taxonomy of four critical characteristics of the cognitive system under study: architecture (serial vs. parallel), stopping rule (exhaustive vs. minimum time), workload capacity (limited, unlimited, or super), and channel independence. The first three are directly tested by our RT methodology. Independence can only be indirectly assessed, although channel dependencies can affect capacity (e.g., Townsend & Wenger, 2004b). To directly test for independence, the investigator needs the accuracy-based analyses afforded by general recognition theory (e.g., Ashby & Townsend, 1986). Architecture and stopping rule are the primary characteristics targeted in this study, but we will see that capacity and, possibly, channel dependencies may be implicated in the interpretations.

It is important to observe that all of our assessment procedures are distribution and parameter free. Most experimentation pursues tests of qualitative predictions (e.g., Group A is faster than Group B) derived from verbally founded theories. More rigorous modeling typically tests mathematical models that are based on specific probability functions (normal/Gaussian, gamma, etc.) by estimating the parameters that provide a “best” fit to the data and then, sometimes, determining whether the fit is statistically significant or not. If it is, the model is said to be supported by the data; if not, the model is said to be falsified. In the SFT approach, powerful qualitative predictions are made that, if wrong, can falsify huge classes of models (for instance, the set of all mathematical functions that obey the prime psychological assumptions). If confirmed (not falsified), more specific, parameter-based models can be explored.

As was intimated earlier, SFT analysis is based on a factorial manipulation of at least two factors with two levels, and it utilizes two main statistics: the mean interaction contrast (MIC; Ashby & Townsend, 1980; Schweickert, 1978; Sternberg, 1969) and the survivor interaction contrast (SIC). The latter extension makes use of data at the distributional level, rather than means, and therefore permits analysis at a more powerful and detailed level (Townsend, 1990; Townsend & Nozawa, 1988, 1995; for extension to complex networks, see Schweickert et al., 2000).

The MIC statistic describes the interaction between the mean RTs of two factors with two levels each and can be presented as follows:

\begin{matrix} MIC & = ({RT}_{LL} - {RT}_{LH}) - ({RT}_{HL} - {RT}_{HH}) \\ & = {RT}_{LL} - {RT}_{LH} - {RT}_{HL} + {RT}_{HH} . \end{matrix}

(1)

There are two subscript letters; the first denotes the level of the first factor (H = high, L = low), and the second indicates the level of the second factor. Note that MIC gives the difference between differences, which is literally the definition of interaction. MIC = 0 indicates that the effect of one factor on processing latency is exactly the same, whether the level of the other factor is L or H. Conversely, if two factors interact, manipulating the salience of one factor would yield different effects depending on the level of the other factor; hence, MIC ≠ 0. Underadditive interaction, or MIC < 0, is a typical prediction of parallel exhaustive processing, whereas additivity, or MIC = 0, is associated with serial exhaustive processing (Townsend & Ashby, 1983; Townsend & Nozawa, 1995).
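As a concrete illustration of the difference-between-differences logic, the following minimal sketch computes the MIC from per-condition RT samples. The function name and all RT values are hypothetical and chosen only to show an additive case and an underadditive case; they are not data from this study.

```python
from statistics import mean

def mean_interaction_contrast(rt_ll, rt_lh, rt_hl, rt_hh):
    """MIC = (mean LL - mean LH) - (mean HL - mean HH); inputs are RT lists in msec."""
    return (mean(rt_ll) - mean(rt_lh)) - (mean(rt_hl) - mean(rt_hh))

# Hypothetical RTs (msec), for illustration only.
# Additive case: the effect of the second factor is 50 msec at both levels of the first.
additive = mean_interaction_contrast([700, 720], [650, 670], [640, 660], [590, 610])

# Underadditive case: the second factor has a smaller effect when the first is low.
underadditive = mean_interaction_contrast([700, 720], [690, 710], [680, 700], [590, 610])

print(additive, underadditive)
```

In this sketch the first call yields a contrast of zero (additivity) and the second a negative contrast (underadditivity), matching the sign conventions described above.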

The survivor interaction contrast function (SIC) is defined as

SIC (t) = [S_{LL} (t) - S_{LH} (t)] - [S_{HL} (t) - S_{HH} (t)],

(2)

where S(t) denotes the RT survivor function. In brief, to calculate the SIC, we divide the time scale into bins (say, of 10 msec each) and calculate the proportion of responses given within each time bin to produce an approximation to the density function, f(t), and the cumulative probability function F(t). That is, F(t) is equal to the probability that RT is less than or equal to t. The survivor function, S(t), is the complement of the cumulative probability function [1 – F(t)] and tells us the probability that the process under study finishes later than time t. To produce the SIC, one calculates the difference between differences of the survivor functions of the four corresponding factorial conditions the way it is derived for the means, but does so for every bin of time.
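The binning procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names, the 50-msec grid, and the RT values are hypothetical, and the survivor function is estimated directly as the proportion of RTs exceeding each bin edge.

```python
def survivor(rts, t):
    """Empirical survivor function S(t): proportion of RTs strictly greater than t."""
    return sum(rt > t for rt in rts) / len(rts)

def sic(rt_ll, rt_lh, rt_hl, rt_hh, t_max, bin_width=10):
    """Return (t, SIC(t)) pairs on a grid of bin edges from 0 to t_max (msec)."""
    grid = range(0, t_max + 1, bin_width)
    return [
        (t, (survivor(rt_ll, t) - survivor(rt_lh, t))
            - (survivor(rt_hl, t) - survivor(rt_hh, t)))
        for t in grid
    ]

# Hypothetical RTs (msec) per factorial condition, for illustration only.
curve = sic([500, 600], [400, 500], [400, 500], [300, 400], t_max=600, bin_width=50)
for t, value in curve:
    print(t, value)
```

With these made-up samples the contrast is zero at early times, dips negative, then turns positive before returning to zero, which shows how the statistic is a whole function of time rather than a single number.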

Note that this statistic produces an entire function across the values of observed RTs. Furthermore, there is a specific signature of each architecture and stopping rule, with respect to the shape of the SIC function (Townsend & Nozawa, 1988, 1995). For example, the SIC function for a parallel exhaustive model is negative for all time. Perhaps surprisingly, the serial exhaustive processing SIC function is not identically equal to 0 but, rather, is first negative and then positive (see Figure 1). Furthermore, the positive area is predicted to be identical to the negative area. Because this study will, as in Townsend and Fifić (2004), analyze only target-absent trials, the focus will be on the exhaustive (i.e., AND) stopping rule.⁴ MIC and SIC predictions for serial exhaustive and parallel exhaustive models are presented in Figure 1. Although the MIC is not nearly as diagnostic as the SIC, it both reinforces the SIC results and provides a means of statistically assessing any interactions associated with the SIC function.⁵

Figure 1.

The predicted signatures of serial exhaustive (left) and parallel exhaustive (right) processing revealed by systems factorial technology analysis. The expected survivor interaction contrast (SIC) as a function of time is presented for each model. The predicted factorial interaction on the means (MIC) is boxed inside each figure. Both SIC functions are calculated without an inclusion of a residual time (processing time that is due to nonsystematic variability, such as a motor response).
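The two signatures can be reproduced numerically under simple assumptions. The sketch below assumes exponentially distributed channel completion times with hypothetical rates (a low level is slower than a high level), takes RT as the sum of the two stages for serial exhaustive processing and as the maximum for parallel exhaustive processing, and omits residual time. These distributional choices are illustrative only; the SFT predictions themselves do not depend on them.

```python
import math

def cdf(rate, t):
    """CDF of an exponential channel completion time."""
    return 1.0 - math.exp(-rate * t)

def surv_parallel_exhaustive(r1, r2, t):
    """S(t) when RT is the maximum of two independent exponential channels."""
    return 1.0 - cdf(r1, t) * cdf(r2, t)

def surv_serial_exhaustive(r1, r2, t):
    """S(t) when RT is the sum of two independent exponential stages (r1 != r2)."""
    return (r2 * math.exp(-r1 * t) - r1 * math.exp(-r2 * t)) / (r2 - r1)

def sic_curve(surv, low=(1.0, 1.5), high=(2.0, 3.0), t_max=15.0, n_steps=1500):
    """SIC(t) on an even grid; low/high hold hypothetical (channel 1, channel 2) rates."""
    (l1, l2), (h1, h2) = low, high
    dt = t_max / n_steps
    ts = [i * dt for i in range(n_steps + 1)]
    sic = [(surv(l1, l2, t) - surv(l1, h2, t)) - (surv(h1, l2, t) - surv(h1, h2, t))
           for t in ts]
    return ts, sic, dt

def area(sic, dt):
    """Trapezoidal integral of SIC(t), which approximates the MIC."""
    return sum((a + b) / 2 * dt for a, b in zip(sic, sic[1:]))

_, serial_sic, dt = sic_curve(surv_serial_exhaustive)
_, parallel_sic, _ = sic_curve(surv_parallel_exhaustive)
print("serial area:", area(serial_sic, dt), "parallel area:", area(parallel_sic, dt))
```

Because the integral of a survivor function is the mean, the area under each SIC curve approximates the corresponding MIC: close to zero for the serial model (negative lobe followed by an equal positive lobe) and negative for the parallel model.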

The topic of workload capacity plays a relatively minor role here, but it will be required in some of the discussion. Basically, we measure capacity by comparing the actual efficiency when, say, n channels are operating with that predicted by a regular, independent (unlimited capacity, by definition) parallel-processing model. This is accomplished by taking the RT distribution at n = 1 and compounding it in the way predicted by the regular parallel model. This computation creates a capacity function over time. If the actual result is identical to the regular parallel predictions, we call the underlying mechanism (be it serial, parallel, or hybrid) unlimited capacity. If the speed is less than that predicted by the regular parallel model, we say it is limited capacity. If it somehow exceeds the regular parallel prediction, it is called super capacity. In particular, we have repeatedly found that perception of good, configural visual objects can, in certain circumstances, elicit super capacity or, contrarily, lead to limited capacity (e.g., Townsend & Wenger, 2004b; Wenger & Townsend, 2006).
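One simple way to picture the compounding step is sketched below for an exhaustive stopping rule. It assumes that, for independent unlimited-capacity parallel channels that must both finish, the two-item CDF equals the product of the single-item CDFs. The function names, the pointwise three-way classification, and the RT values are hypothetical simplifications, not the capacity statistic used in the cited work.

```python
def ecdf(rts, t):
    """Empirical CDF F(t): proportion of RTs less than or equal to t."""
    return sum(rt <= t for rt in rts) / len(rts)

def predicted_exhaustive_cdf(single_a, single_b, t):
    """Baseline for independent, unlimited-capacity parallel channels that must
    both finish: F_AB(t) = F_A(t) * F_B(t)."""
    return ecdf(single_a, t) * ecdf(single_b, t)

def classify_capacity(observed_double, single_a, single_b, t):
    """Compare the observed two-item CDF at time t with the parallel baseline."""
    observed = ecdf(observed_double, t)
    predicted = predicted_exhaustive_cdf(single_a, single_b, t)
    if observed > predicted:
        return "super"      # faster than the regular parallel prediction
    if observed < predicted:
        return "limited"    # slower than the regular parallel prediction
    return "unlimited"

# Hypothetical single-item RTs (msec), for illustration only.
single = [400, 500, 600, 700]
print(classify_capacity([500, 600, 700, 800], single, single, t=550))
```

In practice the comparison would be made across the whole time axis and with a statistical criterion rather than an exact equality at one time point.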

Before ending this section, we need to mention two important nonindependent parallel channels models. One rather distinct type, coactive parallel processing, does away with the assumption that separate decisions (e.g., recognition of an item) are made in the distinct channels. Rather, it is assumed that the multiple channels pool (for instance, add) their activations together in a final single mutual conduit. The activation level then is continuously compared with a single decision criterion. This idea and its name stem from J. Miller's work on better-than-regular parallel processing (e.g., Miller, 1982; for mathematical renditions of such processing, see Colonius & Townsend, 1997; Diederich & Colonius, 1991). Townsend and Nozawa (1995) proved that a large class of coactive models, based on general counting processes (whose individual channels do not slow down with increased workload and that are summed in a subsequent conduit), inevitably predicts super capacity. In fact, the capacity is sufficiently super that Miller's (1982) well-known inequality must be violated. Townsend and Nozawa (1995) also showed that there is a special case of coactive models, based on Poisson counting channels, that predicts SIC curves that are S-shaped like those of exhaustive serial models but whose early negative parts are small, in contrast to the serial models.

A rather different variety of nonindependent parallel models simply assumes that the parallel channels interact in a mutually facilitatory manner but still possess decision thresholds within each channel (Mordkoff & Yantis, 1991; Townsend & Wenger, 2004b). We know that this class of models is quite broad, including true coactive models as a special case (Colonius & Townsend, 1997). Hence, they must be able to predict S-shaped SIC functions with total positive area, but we do not know whether this can be accomplished in a psychologically interesting way. These mutually facilitatory channel models are also capable of predicting supercapacity findings (Townsend & Wenger, 2004b). We shall indicate some recent theoretical results with such models in the General Discussion section.

The factorial combination of (item position) × (target–distractor visual dissimilarity) provides for the tests of system architecture. When factorial tests are applied, the simple, single-stage models predict the shape of the SIC function to be consistent with the signature predicted by either a serial exhaustive or a parallel exhaustive system (Figure 1). The MIC value should, correspondingly, reveal either additivity or underadditivity.

A number of alternative varieties of process architecture, including some special simple types of three-stage models, are assayed. It turns out that the simplest displays, where the stimuli consisted of a single letter (C = 1), afford what, at first glance, looks like standard serial processing. However, when the nonsignificant interaction trend in C = 1 is combined with the intriguing results from the more complex conditions (C = 2, 3), we are forced to move to more intricate single-stage models for the overall corpus of data. Potentially, some multistage models may also account for the data. Naturally, multistage models in general form a huge class of alternatives, and it is challenging to pinpoint detailed predictions in the general case. Certain natural, if quite simple, special types of multistage models can be decisively falsified.

EXPERIMENT 1

Method

Participants

Four participants, 2 females and 2 males, were paid for their participation. All had normal or corrected-to-normal vision. They were all native speakers of Serbian and familiar with the Cyrillic alphabet.

Materials

The stimuli were letter strings made up of Cyrillic letters. In a visual search task, a target is presented before a set of test items, which constitute a test display. The task was to search the test items for the presence of the target. Test items that are not identical to the target are called distractors. We manipulated the visual complexity of the stimuli and the degree of visual dissimilarity between the target and each of the test items. Note that we use the term dissimilarity, rather than similarity, in order to be consistent with previous work (Townsend & Fifić, 2004). Table 1 presents examples of actual target-absent trials for different levels of item complexity (C = 1, 2, 3) and different levels of target-to-test item dissimilarity (HH, HL, LH, and LL, where H stands for high dissimilarity and L for low dissimilarity), in Experiment 1.

Table 1.

Examples of the Actual Target-Absent Trials for Different Item Complexity Levels (C = 1, 2, 3) for Curved and Straight-Line Letters in Experiment 1

Factorial Condition	Target	Test (Display) Items
C = 1
LL	Б	З, B
	П	H, П
HL	З	Ш, B
	H	З, П
LH	H	П, B
	B	З, H
HH	Ш	B, Б
	Б	П, Ш
C = 2
LL	БB	БЗ, BЗ
	ПШ	ПH, ШП
HL	ЗB	ШП, BЗ
	HШ	BЗ, ПH
LH	HШ	ШП, BЗ
	ЗB	BЗ, ПH
HH	ПШ	BЗ, BБ
	BБ	HП, HШ
C = 3
LL	БBЗ	БЗB, BЗБ
	ПШH	ПHШ, ШПH
HL	ЗBБ	ШПH, BЗБ
	HШП	BЗБ, ПHШ
LH	HШП	ШПH, BЗБ
	ЗBБ	BЗБ, ПHШ
HH	ПШH	BЗБ, BБЗ
	БBЗ	HПШ, HШП


Note—The factorial condition indicates the degree of dissimilarity (L for low, H for high) between each test item and the target item. For example, HL denotes a trial on which the left test item was highly dissimilar to the target, whereas the right test item shared low dissimilarity with the target.

We manipulated the visual complexity of the letter-string stimuli by varying the number of letters that made up a single item (1, 2, or 3). For a given level of complexity, the number of letters in the target item was identical to the number of letters in each test item.

To manipulate the degree of visual dissimilarity, we employed two sets of letters: letters with curved features (Б, В, З) and letters with straight-line features (П, Ш, Н). Note that the items within each group were mutually confusable; that is, each of the items was relatively similar to other members of its own group but relatively dissimilar to members of the other group. We then generated different dissimilarity levels by constructing the target and test items from letters drawn either from the same group or from different groups. In Appendix A we test and demonstrate how this technique directly affects the perceptual dissimilarity between items.

To generate the factorial conditions necessary for the MIC and SIC architecture tests, we factorially combined the position of the test item and its visual dissimilarity to the target item. Four orthogonal conditions were thereby generated: HH, HL, LH, and LL, where the position of the letter denotes the position of the relevant test item with respect to the fixation point (left or right). So, HL indicates that two test items were presented; the left was highly dissimilar to the target, whereas the right item possessed low dissimilarity to the target. Since we used two sets of letters, curved and straight, the HL display may represent a case in which the target and the right test item were made of, say, curved letters, whereas the left test item was made of straight letters. Alternatively, on HH trials, when the target was made of curved letters, both test items were made of straight-line letters, and vice versa. Observe that in the HH display, the two test items would be relatively similar to each other, whereas in the HL display, they would be dissimilar to each other (Table 1).
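The labeling scheme can be sketched in a few lines. The helper names below are hypothetical, and the sketch assumes that every item is built from letters of a single group (curved or straight), as in the design described above. Letters are written as Cyrillic code points to avoid confusion with look-alike Latin letters.

```python
# Letter groups as Cyrillic code points (comments show the glyphs).
CURVED = set("\u0411\u0412\u0417")    # Б, В, З
STRAIGHT = set("\u041F\u0428\u041D")  # П, Ш, Н

def letter_group(item):
    """Return the feature group of an item whose letters all come from one group."""
    letters = set(item)
    if letters <= CURVED:
        return "curved"
    if letters <= STRAIGHT:
        return "straight"
    raise ValueError("item mixes letter groups or uses unknown letters")

def dissimilarity(target, test_item):
    """'L' (low) if target and test item share a letter group, else 'H' (high)."""
    return "L" if letter_group(target) == letter_group(test_item) else "H"

def factorial_condition(target, left_item, right_item):
    """Two-letter label: first letter for the left item, second for the right item."""
    return dissimilarity(target, left_item) + dissimilarity(target, right_item)

# Target З with left item Ш and right item В.
print(factorial_condition("\u0417", "\u0428", "\u0412"))
```

The same function applies unchanged to two- and three-letter items, since only group membership of the constituent letters matters for the label.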

Design and Procedure

The participants were tested individually in a dimly lit room. The two test items in the most complex condition (C = 3, with the widest stimuli) spanned 5 cm horizontally. At a viewing distance of 1.7 m from the computer screen, this width corresponds to a visual angle of approximately 1.7°, well within the fovea. A test display was presented until a response was made, and then a new trial began.
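The visual angle subtended by the display follows from the standard relation angle = 2·atan(size / (2·distance)). The short sketch below applies it to the stated display width and viewing distance; the function name is hypothetical.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of a given size at a given distance."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# Display width of 5 cm viewed from 1.7 m (170 cm).
angle = visual_angle_deg(5, 170)
print(angle)
```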

Each trial started with a fixation point that appeared for 700 msec and a low-pitch warning tone of 1,000 msec, followed by the presentation of the target item for 400 msec. Then a mask was presented for 130 msec, followed by two crosshairs that indicated the positions of the two upcoming test items. A high-pitch warning tone was then played for 700 msec, followed by the presentation of the test items. The test display remained on the screen until a response was given.⁶

Half of the trials were target present, and half were target absent. On each trial, the participant had to indicate whether or not the target item appeared on the test display by pressing either the left or the right mouse key with his/her corresponding index finger. RTs were recorded from the onset of the test display, up to the time of the response. The participants were asked to respond both quickly and accurately.

Each participant performed on 30 blocks of 128 trials each, each block on a different day. The order of trials was randomized within blocks. The complexity of the presented items (i.e., the number of letters: C = 1, 2, or 3) was manipulated between blocks, whereas factorial combinations (HH, HL, LH, LL) varied within blocks. For each participant, the mean RT for each conjunction of item complexity and factorial combination was calculated from approximately 200 trials. We found this number of trials to be sufficient for testing models at the individual-participant level, both for the ANOVA and for calculating survivor functions.

Results

We analyzed data within individual participants. When the results exhibited a uniform pattern across participants, we also report group statistics.

We ran a one-way ANOVA on mean RT averaged over participants. We used the factor of the item complexity level (C = 1, 2, 3). The main effect of item complexity was significant [F(2,6) = 74.28, p < .01], showing a trend of increasing RTs with increasing complexity of the presented items (mean RT_C1 = 576 msec, SE = 26.37 msec; mean RT_C2 = 786 msec, SE = 57.89 msec; mean RT_C3 = 924 msec, SE = 54.37 msec).

We performed a separate ANOVA for individual participants, where F₁ was defined as the visual dissimilarity between the left test item and the target (high or low). F₂ was similarly defined for the right test item. Together, F₁ and F₂ yielded four experimental conditions: HH, HL, LH, and LL. The F₁ × F₂ interaction indicated the significance of the observed MIC value.
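For a 2 × 2 design, the interaction test amounts to testing whether the MIC differs from zero relative to within-cell variability. The sketch below is one way to compute that F ratio from trial-level RTs. It treats trials as independent observations within a participant, which is an assumption about the analysis rather than a description of the authors' software, and the function name and data are hypothetical.

```python
from statistics import mean

def interaction_f(cells):
    """cells maps 'LL', 'LH', 'HL', 'HH' to lists of trial RTs.
    Returns (MIC, F, df_error) for the 1-df interaction contrast."""
    means = {k: mean(v) for k, v in cells.items()}
    mic = (means["LL"] - means["LH"]) - (means["HL"] - means["HH"])
    # Pooled within-cell error variance.
    sse = sum((rt - means[k]) ** 2 for k, v in cells.items() for rt in v)
    df_error = sum(len(v) for v in cells.values()) - 4
    mse = sse / df_error
    # Squared contrast divided by its estimated sampling variance.
    f = mic ** 2 / (mse * sum(1 / len(v) for v in cells.values()))
    return mic, f, df_error

# Hypothetical RTs, for illustration only.
example = {"LL": [10, 12], "LH": [6, 8], "HL": [7, 9], "HH": [1, 3]}
print(interaction_f(example))
```

The error degrees of freedom equal the number of trials minus four, which is the form of the df values reported for each participant's interaction test.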

Both main effects (F₁ and F₂) were significant at p < .01 in the higher item complexity conditions (C = 2, 3) for all the participants. For 3 out of the 4 participants, both main effects were significant in the single-letter presentation, C = 1 (Participant 1 exhibited a significant main effect for F₂ but not for F₁). Next, we will turn to an analysis of the results of the mean interaction contrast, MIC, and the survivor interaction contrast, SIC.

Mean and survivor interaction contrast functions

In Table 2, we present the interaction test on means (MIC) for each of the participants for different complexity levels. The results are remarkably uniform across participants. They all exhibited an overadditive MIC on the higher item complexity levels (C = 2, 3), and all of them, excluding Participant 1, exhibited an additive MIC on the lowest level of item complexity (C = 1; stimulus items that were made of one letter). Hence, additivity, which is associated with a serial exhaustive processing, was manifested by 3 out of the 4 participants in the C = 1 condition. Nonetheless, it is important that we note that the tendency of all the participants, even those in the C = 1 condition, was toward overadditivity, just not significantly so. Furthermore, it is evident that all the participants exhibited less overadditivity (i.e., smaller MIC values) in the C = 3 than in the C = 2 condition. That is, there is a nonmonotonicity with regard to how overadditivity acted as a function of complexity, first increasing, and then decreasing, with complexity.

Table 2.

Mean Interaction Contrast (MIC) Analysis of Experiment 1, Calculated Separately for Each Level of Item Complexity and for Each Participant

Participant	MIC (msec)	df	F	p	Power
C = 1 (One-Letter Target)
1	−41^*	1, 558	3.85	.05	.50
2	12	1, 554	1.38	.24	.22
3	18	1, 553	0.97	.32	.17
4	16	1, 558	2.13	.14	.31
C = 2 (Two-Letter Target)
1	168^**	1, 596	33.92	.00	1.00
2	133^**	1, 593	43.25	.00	1.00
3	179^**	1, 595	65.79	.00	1.00
4	140^**	1, 596	36.20	.00	1.00
C = 3 (Three-Letter Target)
1	64^*	1, 594	4.93	.03	.60
2	52	1, 589	3.53	.06	.47
3	98^**	1, 584	12.56	.00	.94
4	96^**	1, 590	12.88	.00	.95


Note—The MIC factors were the degree of visual dissimilarity between the target item and each of the test items. Since the main effects were significant for almost all cases, we present only the interaction contrast (but see the text for detailed information). df stands for degrees of freedom.

^* p < .05. ^** p < .01.

In Figure 2 we present graphically the MIC and SIC figures for the individual participants. The SIC functions are on the left, and the corresponding MIC figures are on the right. As the reader may note, most SIC functions exhibit an S-shape, which by itself is associated with serial exhaustive processing.⁷ However, concurring with the ANOVA results, for the C = 2 and 3 conditions, the summed areas of several SIC functions are significantly greater than zero, which is, of course, inconsistent with serial exhaustive processing. We will discuss this finding in the next section. As has been noted, a single exception is the data of Participant 1 from the C = 1 condition, who showed mainly a negative SIC (and demonstrated a consequent negative MIC value; see Table 2 and Figure 2), which is consistent with parallel exhaustive processing. However, this atypical result is disputable, given that one of the main effects (F₁) did not reach significance.⁸

Figure 2.

Systems factorial technology analysis of the data from Experiment 1 (only for the target-absent trials). The survivor interaction contrast (SIC) and the mean interaction contrast (MIC) are presented from left to right. Error bars around each mean represent the standard error statistic. Different participants are presented in different rows, and the figures are sorted by the item complexity level (C = 1, 2, 3).

Error analysis

The mean error level for target-absent trials, averaged across all participants, was 1.1%. No speed–accuracy trade-off was observed: Both mean RTs and mean errors increased across item complexity.

Discussion

The participants exhibited longer RTs when moving from item complexity C = 1 to more complex displays, C = 2 and C = 3. Both MIC and SIC patterns were consistent across participants. The SIC functions in the C = 2 and C = 3 conditions generally revealed an S-shape that roughly resembles the signature of serial exhaustive processing, and the C = 1 condition could be taken as indicative of serial processing, given that the MIC is nonsignificant. However, the positive portion of the curve was typically much larger than the negative portion for C > 1. Taken together with the nonsignificant trend in C = 1, the results suggest that overadditivity, not additivity or underadditivity (as in parallel exhaustive processing), was the modal characteristic of processing in this study.⁹

Furthermore, MIC increased from C = 1 to C = 2 and decreased from C = 2 to C = 3. However, it was still significant for most of the C = 3 participants. Does the shift from additive to overadditive MIC, as we vary the complexity of the visual display, indicate a qualitative change in the architecture (from serial to parallel)? To acquire an even more accurate answer, we conducted additional analyses on the trends of mean RTs over different item complexity levels, for distinct factorial conditions (see Appendix B for details).

These additional analyses revealed that RTs on different factorial conditions (HH, HL, LH, and LL) changed differently as a function of item complexity. RTs on LL and HH conditions increased in a concave down (negatively accelerated) fashion when item complexity was increased, whereas RTs on HL and LH conditions increased linearly. This differential change, as we increase the complexity of the presented items, leads to a change from additive to highly overadditive to low overadditive MIC that does not necessarily reflect a real change in the processing architecture.

One account of why the effect of item complexity on processing in the LL and HH conditions departs from its effect in the HL and LH conditions (in fact, on the overall mean RT patterns) is that, in accordance with Duncan and Humphreys (1989), visual search latencies are affected both by the target-to-distractor dissimilarity (i.e., the dissimilarity between the target and each of the test items, denoted earlier as T–D) and by the mutual similarity of distractors (denoted D–D). The two test items in the LL display are highly similar to each other, as are the test items in the HH display. In the HL (alternatively, LH) condition, on the other hand, the two test items are highly dissimilar to each other.

By increasing the complexity of items (as we increase the number of their constituting letters), we generally impair the visual search, but, in the HH and LL conditions, this effect may be countered by a relative facilitation due to the increased mutual similarity. In popular multistage models, this dynamic could be represented in an underlying selection process: When (two) distractor items are mutually similar, an efficient grouping process that can select them on the basis of a common property is activated. That grouping or scene segmentation process is a fast parallel process that is postulated in somewhat different ways in several theories: as feature inhibition (Treisman & Sato, 1990), as spreading suppression (Duncan & Humphreys, 1989), or as a complex network of several subnetworks that maintain efficient parsing of the visual scene (Grossberg, Mingolla, & Ross, 1994). When such a grouping process is deployed, higher efficiency of visual search is achieved, and overall mean RTs are shorter. To have an intuitive idea of how mutual similarity between test items may increase the efficiency of visual search, consider again the example provided at the beginning of the article: Looking for the familiar face of a friend in a crowded café may become much easier if all the other patrons are identical siblings.

It is worth noting that we observed the mean overadditivity in the more complex conditions (C = 2, 3) in an unpublished experiment that comprised several hundred trials per condition. The C = 1 condition offered ambiguous results and prompted the full-scale study presented here. In the C > 1 conditions, as in the present work, we consistently observed S-shaped SIC functions with a positive part larger than the negative one for different durations of display: from a limited duration of 300 msec to an “unlimited” duration (where the test items are displayed until a response is given). Although it could be expected that presenting a display for an unlimited duration would encourage serial processing (McElree & Carrasco, 1999; Zelinsky & Sheinberg, 1997), the overadditive S-shaped SIC function, which is not predicted by serial processing, was consistently evident in both viewing conditions, limited and unlimited.

Previous visual search experiments conducted in our lab have yielded consistent evidence for parallelism. Perhaps the most obvious difference, apart from use of T–D similarity, in the present study was the employment of linguistic or alphabetic-related stimuli, as opposed to the nonalphabetic stimuli (dots, faces, etc.) employed in previous experiments. Experiment 2 was designed to investigate whether the overadditivity combined with S-shaped SIC curves could be associated with letter stimulus items, as opposed to our usual pictorial stimulus patterns (for instance, via reading habits). We therefore used meaningless visual patterns instead of letters in the next experiment. We employed only the single-item complexity level (C = 1), since it was the only condition so far to engage anything close to pure serial processing.