Divided attention limits perception of 3-D object shapes

Alec Scharff; John Palmer; Cathleen M Moore

doi:10.1167/13.2.18

2013 Feb 12;13(2):18. doi: 10.1167/13.2.18

PMCID: PMC5833208 PMID: 23404158

Abstract

Can one perceive multiple object shapes at once? We tested two benchmark models of object shape perception under divided attention: an unlimited-capacity and a fixed-capacity model. Under unlimited-capacity models, shapes are analyzed independently and in parallel. Under fixed-capacity models, shapes are processed at a fixed rate (as in a serial model). To distinguish these models, we compared conditions in which observers were presented with simultaneous or sequential presentations of a fixed number of objects (the extended simultaneous-sequential method: Scharff, Palmer, & Moore, 2011a, 2011b). We used novel physical objects as stimuli, minimizing the role of semantic categorization in the task. Observers searched for a specific object among similar objects. We ensured that non-shape stimulus properties such as color and texture could not be used to complete the task. Unpredictable viewing angles were used to preclude image-matching strategies. The results rejected unlimited-capacity models for object shape perception and were consistent with the predictions of a fixed-capacity model. In contrast, a task that required observers to recognize 2-D shapes with predictable viewing angles yielded an unlimited-capacity result. Further experiments ruled out alternative explanations for the capacity limit, leading us to conclude that there is a fixed-capacity limit on the ability to perceive 3-D object shapes.

Keywords: divided attention, shape perception, object perception, visual search

Introduction

In perception, divided attention refers to situations in which an observer must monitor multiple stimuli at once. Such demands are common in real-world tasks such as driving a car. Here we investigate how divided attention affects shape-based object recognition. Although many theories posit detrimental effects of divided attention on object shape perception, there has been no definitive empirical study of these effects.

The scarcity of research on this topic may be due to the difficulty of isolating the phenomena involved from other cognitive and perceptual processes. Shape perception can be conflated with non-shape feature discrimination (Evans & Treisman, 2005) or semantic categorization (Warrington & Taylor, 1978). We aim to dissociate the effects of attention on perceptual processes from effects on sensory, memory, and decision processes (Palmer, 1995).

Shape-based object recognition

Shape-based object recognition is achieved when an observer apprehends an object's spatial contours sufficiently to match it to a known exemplar. For the purpose of object recognition, the diagnostic value of a three-dimensional shape is unsurpassed by other visual features. Object shape perception is the result of shape-constancy mechanisms that infer three-dimensional forms from a retinal image containing only two-dimensional projections of those forms. Thus, object shape percepts remain constant despite changes in the corresponding retinal images (Pizlo, 2008).

Our goal is to measure divided attention effects on the perception of object shape. To do so, we use a psychophysical task that relies solely on object shape perception. Below we elaborate our approach to minimizing extraneous non-shape information in the task.

Equating non-shape features across objects to prevent feature matching

We attempted to eliminate any diagnostic non-shape features in the stimuli because these features might be exploited to identify target objects without achieving object shape constancy. For example, an elephant may be identified not only by its shape but also by its gray color, bumpy texture, or large size. We refer to a strategy that exploits these diagnostic non-shape features as feature matching. To prevent feature matching, we equated non-shape features across objects. For example, if an elephant were a stimulus in the experiment, then this approach would require observers to recognize an elephant among other large, gray, bumpy objects.

Rotating objects to preclude template matching

Even when non-shape image features are equated, there remains the possibility that observers could employ a template-matching strategy to identify an object without achieving shape constancy. In an example template strategy, the observer compares luminance value at each pixel in an image with a representation in memory and makes a decision based on the degree of match. More plausibly, the observer could identify an object on the basis of some partial template (e.g., a diagonal edge in the lower-left corner of the image). Template matching strategies are possible when the observer knows the exact form the image will take, so we attempted to prevent such strategies by unpredictably varying the observer's view of the object. In two of our experiments (Experiments 5 and 6), we examined the possibility of template matching by using predictable images.

Using novel objects to minimize the role of semantic categorization

Another concern is that performance in an ostensibly perceptual task might also depend on semantic processes. For example, reading words is subject to an attentional-capacity limitation (Scharff, Palmer, & Moore, 2011b), but it is ambiguous whether this limit is attributable to the perceptual demands or the semantic demands of the task. To minimize the role of semantics, we used novel object stimuli that did not have prior semantic associations and required observers to recognize specific physical objects rather than categorize them.

Divided attention and capacity

In perception, effects of divided attention can be characterized in terms of capacity, the quantity of information that can pass through a system during a given time interval (Broadbent, 1958; Kahneman, 1973; Townsend, 1974). Two extreme, boundary-defining model classes are unlimited-capacity models and fixed-capacity models. Here, capacity refers to throughput at the level of the perceptual system. All models assume that we accrue information over time, but they differ in the extent to which dividing attention among multiple objects affects this information accrual.

Unlimited-capacity models posit that divided attention among multiple objects does not limit perception, because there is no system-level limit on perceptual processing. Importantly, unlimited-capacity processing does not necessarily imply fast or accurate object recognition, because perception is still noise-limited at the level of local mechanisms. The definitive property of unlimited-capacity models is that the rate of individual stimulus processing is not degraded by divided attention. The prototype model of this class is the standard parallel model in which multiple analyses occur in parallel, with the rate of each analysis unaffected by the number of objects being analyzed (Gardner, 1973). A less intuitive example model is the super-capacity serial model in which only one analysis can be carried out at a time, but the speed of each analysis increases proportionally with the number of objects of interest so that system-level capacity is unlimited (Townsend, 1974). The standard parallel model and super-capacity serial model can make equivalent predictions for many experiments, making them difficult to distinguish.

In contrast, fixed-capacity models assume an inflexible limit on the overall rate of information accrual. An intuitive example is the standard serial model in which one object process must be completed before another can begin (Davis, Shikano, Peterson, & Michel, 2003; Townsend, 1974). Parallel processing can also have fixed capacity, for example, if multiple concurrent analyses are carried out with efficiency inversely proportional to the number of relevant stimuli (e.g., the parallel sampling model of Shaw, 1980).

Between these two extremes are intermediate-capacity models, in which the limit is less restrictive than fixed-capacity models propose (e.g., limited resource models, Norman & Bobrow, 1975; crosstalk models, Mozer, 1991). The more common term limited capacity encompasses both intermediate-capacity and fixed-capacity models.

Relationship to parallel-serial architecture

Many authors have characterized divided attention effects in terms of processing architecture—whether object processes occur in serial or in parallel. It is difficult to distinguish between parallel and serial models because the predictions of a particular serial model can often be mimicked by an appropriately formulated parallel model and vice versa (Townsend, 1974). For example, the standard serial and parallel sampling models make equivalent predictions for many experiments, as do the standard parallel and super-capacity serial models. For this reason, we focus on capacity, rather than architecture, as a tractable property that characterizes how perception is affected by divided attention.

Effects on perception versus decision

Attention research is usually concerned with the effects of non-stimulus manipulations on the quality of perception. However, effects on perception can be difficult to distinguish from effects on the decision processes involved in psychophysical performance (Eckstein, Peterson, Pham, & Droll, 2009; Palmer, Verghese, & Pavel, 2000). In the domain of divided attention phenomena, the most common experimental approach is the visual search paradigm, in which an observer searches for a specified target stimulus among some number of distracter stimuli. Response time is often measured as a function of the number of distracter stimuli with the total number of stimuli referred to as set size. Large effects of set size on response time are sometimes interpreted as evidence for limited-capacity perception. However, this interpretation neglects the impact of set size on decision processes. If judgments about individual stimuli are subject to error, then increasing the number of stimuli under consideration in a given task increases the overall error rate, as when multiple comparisons are made in statistical decisions (Palmer et al., 2000; Shaw, 1980; Tanner, 1961). Thus, large set-size effects may be observed even in the absence of capacity limitations, as demonstrated by Huang and Pashler (2005).
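The decision-level argument can be illustrated with a minimal sketch. It assumes a simplified yes/no search in which each distracter judgment independently produces a false alarm with the same probability, and the observer reports a target if any judgment does. The 5% per-item rate and the function name are hypothetical, chosen only for illustration. They are not values from the studies cited.

```python
def any_false_alarm_probability(per_item_rate: float, set_size: int) -> float:
    """Chance that at least one of `set_size` independent judgments is a false alarm."""
    if not 0.0 <= per_item_rate <= 1.0:
        raise ValueError("per_item_rate must be a probability")
    return 1.0 - (1.0 - per_item_rate) ** set_size


# Hypothetical per-item false-alarm rate of 5%. The quality of each individual
# judgment is held constant (unlimited capacity), so any change with set size
# is purely a decision-level effect.
for set_size in (1, 2, 4, 8):
    print(set_size, round(any_false_alarm_probability(0.05, set_size), 3))
```

Even though each judgment is equally good at every set size, the overall error rate grows with the number of stimuli, which is why a set-size effect alone does not establish a capacity limit.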

A second common method of investigating perceptual capacity limitations is the psychophysical dual-task experiment, in which observers make two independent psychophysical judgments concurrently (Bonnel, Stein, & Bertucci, 1992; Sperling & Melchner, 1978). Performance in the dual task can then be compared to single-task controls to determine whether dividing attention in the dual-task condition caused performance decrements. The challenge with this method is to determine whether any performance decrements are due to a loss in perceptual sensitivity or to interference in decision and/or response mechanisms, because the number of stimuli to be monitored is confounded with the number of decisions and responses to be made.

In the experiments presented here, we use the extended simultaneous-sequential method to measure the effects of divided attention on perception. This method uses a visual search task but keeps the decision structure and response alternatives constant across conditions. Thus, any divided attention effects are attributable solely to perception.

Theories of attention and object recognition

Early accounts described all of perception as capacity limited (Kahneman, 1973; Posner, 1980). This notion has been contradicted by numerous studies that show unlimited-capacity performance in search for image features such as stimulus luminance, orientation, color saturation, and size (Bonnel et al., 1992; Palmer, 1994; Scharff et al., 2011b). Subsequent theories proposed that some aspects of perception have unlimited capacity, while others are subject to capacity limits. Most prominently, Treisman and Gelade (1980) proposed that while image feature perception has unlimited capacity, perceiving the conjunction of two or more features in the same object requires a limited-capacity mechanism. This theory is challenged by studies showing unlimited-capacity performance in conjunction search tasks (Eckstein, 1998; Eckstein, Thomas, Palmer, & Shimozaki, 2000; Huang & Pashler, 2005). Several modern theories describe object perception as a limited-capacity process (Kahneman, Treisman, & Gibbs, 1992; Wolfe, 1994). Others have challenged the view that object perception is limited capacity, proposing that at least some special cases can have unlimited capacity (Allport, 1987; Rousselet, Thorpe, & Fabre-Thorpe, 2004; Van der Heijden, 1996).

There is no solid evidence to support either kind of theory for shape perception. Though several studies have addressed the idea or related questions, none provide unambiguous results. Each is rendered ambiguous by a combination of reasons given above. Here we briefly review some representative studies and discuss why we consider them inconclusive.

Biederman, Blickle, Teitelbaum, and Klatsky (1987) used the response time search task with line drawings of familiar objects as stimuli (e.g., car, fire hydrant, filing cabinet). Following a brief display of an array of objects, subjects indicated whether or not a specific object had been present. Both error rate and response time increased with set size, effects that were interpreted as evidence for limited-capacity object perception. However, Biederman et al.'s use of the response-time set-size paradigm makes this interpretation ambiguous, because the set-size effect could be explained by an effect on the decision processes as described above. Furthermore, the semantic categorization required by the task confounds object shape recognition with semantic categorization.

More recent work by Rousselet, Fabre-Thorpe, and Thorpe (2002) used search studies in which subjects searched for animals or vehicles among distracter images that did not contain those targets. Rousselet et al. compared speed and accuracy performance to the predictions of unlimited-capacity models. The results were consistent with standard parallel models of object perception, at least in the special case of animal and vehicle searches. However, as others have pointed out, such image categories might easily be discriminated by non-shape image features, either those intrinsic to the object (e.g., furriness; Evans & Treisman, 2005) or those induced by the style of photography used in images containing animals (e.g., sharp focus and blurred background; Wichmann, Drewes, Rosas, & Gegenfurtner, 2010). Thus, although object recognition tasks yielded unlimited-capacity results in these studies, it is not clear whether parallel search was accomplished on the basis of object shape recognition or on the basis of non-shape diagnostic features.

Our recent study addresses a closely related question but confounds semantic categorization with object shape perception (Scharff, Palmer, & Moore, 2011a). We tested object categorization under divided attention, using a search task in which observers searched for a particular kind of animal on each trial. In an effort to minimize non-shape features, we used animal images as both targets and distracters. We required observers to indicate the presence of a specific animal type (e.g., squirrel or moose). In this study, we found unambiguous fixed-capacity results. However, it is still unclear whether it was the shape perception or the categorization aspect of this task that caused the fixed-capacity limit.

To summarize, our goal was to characterize divided attention effects on shape-based object recognition. We designed our stimuli and tasks so that objects could be distinguished only on the basis of object shape, not by feature or template matching, and without requiring semantic categorization. To measure divided attention effects, we used the extended simultaneous-sequential paradigm. This method can distinguish unlimited-capacity and fixed-capacity processes while relying on relatively modest assumptions (Scharff et al., 2011b).

General method

The six experiments described here used the extended simultaneous-sequential paradigm to distinguish models of divided attention (Eriksen & Spencer, 1969; Scharff et al., 2011b; Shiffrin & Gardner, 1972). All of the experiments use the same three experimental conditions. These conditions are simultaneous, sequential, and repeated (illustrated in Figure 1). In this section we describe the three conditions and the logic of the predictions. Quantitative predictions are presented in the Appendix.

Figure 1. Schematic of conditions used in Experiment 1. Each condition began with a preview of the target object. Next the critical display appeared, depicting the target object in one of four locations. The target object was always shown from a different viewpoint in the critical display than in the preview display. The critical display differed across conditions. In the simultaneous condition, all four stimuli were shown concurrently. In the sequential condition, two stimuli were shown in one display and then the remaining two stimuli were shown in a subsequent display. In the repeated condition, all stimuli were shown twice. Following the critical display, the observer responded by indicating the location where the target object had appeared.

Localization search task

In all conditions, observers performed a localization task, indicating the location of a specified target stimulus in a briefly presented array. Four stimulus locations were positioned at the corners of an invisible square surrounding fixation. To begin each trial, observers saw a preview image that indicated the target object for that trial (first row of Figure 1). Target stimuli varied randomly between trials. The preview image subtended 3.9° in the center of the display and persisted for 2000 ms. The preview was then replaced by a fixation cross that persisted for 1000 ms. Next, the critical display began. The critical display was a simultaneous, sequential, or repeated presentation of the four stimuli, as described in the sections that follow. Observers were instructed to maintain fixation throughout the trial (in Experiment 4 fixation was enforced with eye tracking). In all conditions, the display included the target stimulus along with three distracter stimuli. Following the critical display, the preview image reappeared and observers pressed one of four keys to indicate the location where the target had appeared. Observers were instructed to strive for the best possible accuracy and not to rush their responses. Conditions alternated in blocks throughout each session, with six trials to a block. Each block was preceded by a warning screen indicating the condition of the following block.

Simultaneous condition

In the simultaneous condition (Panel A of Figure 1), the critical display consisted of a brief, simultaneous presentation of all four stimuli. Each image subtended 3.9° and was centered 3° eccentric from the fixation cross. The center-to-center spacing of neighboring images was 4.2°. The objects were shown with superimposed dynamic noise (described in the Method section of each experiment).
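The display geometry can be checked with a short sketch. It assumes the invisible square is upright and centered on fixation, so that each image center lies on a diagonal. The function and variable names are illustrative only. With image centers 3° from fixation, the side of the square (the spacing between neighboring centers) comes out close to the reported 4.2°.

```python
import math

ECCENTRICITY_DEG = 3.0  # distance of each image center from fixation (from the text)
IMAGE_SIZE_DEG = 3.9    # width of each square image (from the text)


def corner_positions(eccentricity_deg):
    """(x, y) centers of four locations at the corners of an upright square
    centered on fixation, each `eccentricity_deg` from the center."""
    half_side = eccentricity_deg / math.sqrt(2.0)
    return [(sx * half_side, sy * half_side) for sy in (1, -1) for sx in (-1, 1)]


positions = corner_positions(ECCENTRICITY_DEG)

# Side of the square = center-to-center spacing of neighboring images.
neighbor_spacing = 2.0 * ECCENTRICITY_DEG / math.sqrt(2.0)

# Edge-to-edge gap left between neighboring images of the stated size.
gap = neighbor_spacing - IMAGE_SIZE_DEG

print([(round(x, 2), round(y, 2)) for x, y in positions])
print(round(neighbor_spacing, 2), round(gap, 2))
```

Under these assumptions neighboring images are separated by only a fraction of a degree, which is relevant to the crowding concern addressed in Experiment 2.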

Sequential condition

In the sequential condition (Panel B of Figure 1), the critical display showed the four stimuli sequentially in two subdisplays (henceforth referred to as intervals). Two of the four stimuli were shown in the first interval and then the other two stimuli were shown in a second interval. There was an 1,800 ms interstimulus interval (a blank screen between the two displays). This relatively long interstimulus interval was used to provide ample time for shifts of attention (Duncan, Ward, & Shapiro, 1994). To discourage eye movements, each interval always presented stimuli on opposite sides of fixation (in Experiment 4, the role of eye movements was explicitly tested with eye tracking). The order in which the subsets of objects appeared alternated by block, and observers were informed of the order by a warning screen preceding each block.

Repeated condition

In the repeated condition, the critical display had two intervals presented sequentially and all four images appeared in both intervals (Panel C of Figure 1). Timing was as in the sequential condition. Identical stimuli were used in the first and second intervals, but unique noise sequences were generated for each interval.
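The three critical displays can be summarized as interval schedules. The sketch below is illustrative only: the location numbering (0 = upper left, 1 = upper right, 2 = lower left, 3 = lower right) and the function names are assumptions, not part of the original apparatus code. Each schedule lists which locations are shown in each interval, with the sequential condition always pairing locations on opposite sides of fixation.

```python
ALL_LOCATIONS = (0, 1, 2, 3)

# Assumed numbering: 0 = upper left, 1 = upper right, 2 = lower left,
# 3 = lower right. Each pair lies on opposite sides of fixation.
DIAGONAL_PAIRS = ((0, 3), (1, 2))


def critical_display(condition, first_pair=0):
    """Return a list of intervals; each interval is a tuple of visible locations."""
    if condition == "simultaneous":
        return [ALL_LOCATIONS]
    if condition == "sequential":
        return [DIAGONAL_PAIRS[first_pair], DIAGONAL_PAIRS[1 - first_pair]]
    if condition == "repeated":
        return [ALL_LOCATIONS, ALL_LOCATIONS]
    raise ValueError(f"unknown condition: {condition!r}")


def exposure_per_stimulus(schedule):
    """Number of intervals in which each location is visible."""
    return {loc: sum(loc in interval for interval in schedule) for loc in ALL_LOCATIONS}


for name in ("simultaneous", "sequential", "repeated"):
    print(name, critical_display(name), exposure_per_stimulus(critical_display(name)))
```

The exposure counts make the design explicit: every stimulus is visible for one interval in the simultaneous and sequential conditions and for two intervals in the repeated condition. What differs between simultaneous and sequential is only how many stimuli compete within an interval.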

Predictions

All models predict better accuracy in the repeated than in the simultaneous condition. Fixed-capacity models predict that sequential accuracy will be similar to repeated accuracy, while unlimited-capacity models predict similarity between simultaneous and sequential accuracies. We refer to Figure 2 to demonstrate the logic of these predictions. The placement of each gray bar indicates how and when the stimuli appear in a trial. Locations are listed vertically and numbered one to four; display intervals are listed horizontally and labeled “first” and “second.” The black arrows overlying the gray bars represent the observer's analysis of a stimulus.

The predictions of unlimited-capacity models are illustrated with the standard parallel model (left column of Figure 2). Here, the observer analyzes all visible stimuli independently and in parallel. Accuracy is determined by the amount of time available to analyze the stimuli. In both the simultaneous and sequential conditions, each stimulus is displayed for one interval duration; therefore the model predicts equivalent accuracy between these two conditions (Shiffrin & Gardner, 1972). In the repeated condition, the stimuli are visible for a longer time (two interval durations), yielding better accuracy than the other conditions. Overall, the standard parallel model (and other unlimited-capacity models) predicts the pattern simultaneous = sequential < repeated.

The predictions of fixed-capacity models are illustrated with the standard serial model (right column of Figure 2). In this example, the observer has time to scan through just two of the stimuli during each brief interval duration. The observer has no information about the unscanned stimuli. Under this model, accuracy is impaired in the simultaneous condition (in which the observer has no information about two of the stimuli) compared to the sequential and repeated conditions (in which the observer has information about all four stimuli). Thus accuracy is equivalent in the repeated and sequential conditions, and both are superior to the simultaneous condition, yielding the overall predicted pattern simultaneous < sequential = repeated. This prediction can be generalized to several other serial models and to fixed-capacity parallel models (see Appendix).
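The ordinal predictions of the two model classes can be reproduced with a toy calculation. This is a sketch under strong simplifying assumptions and is not the quantitative treatment given in the Appendix. It assumes that the target is either recognized or not, that the probability of recognition grows exponentially with the amount of analysis the target receives (the `rate` parameter is hypothetical), that the observer guesses among the four locations otherwise, and that a fixed-capacity system shares one unit of processing per interval equally among the visible stimuli.

```python
import math

N_LOCATIONS = 4

# For each interval in which a given stimulus is shown, the number of
# stimuli visible in that interval.
VISIBLE_COUNTS = {
    "simultaneous": (4,),   # shown once, together with three others
    "sequential": (2,),     # shown once, together with one other
    "repeated": (4, 4),     # shown twice, together with three others
}


def analysis_per_stimulus(condition, fixed_capacity):
    """Amount of analysis one stimulus receives, in interval units."""
    if fixed_capacity:
        # One unit of processing per interval, split among visible stimuli.
        return sum(1.0 / n_visible for n_visible in VISIBLE_COUNTS[condition])
    # Unlimited capacity: a full unit per interval regardless of competitors.
    return sum(1.0 for _ in VISIBLE_COUNTS[condition])


def predicted_accuracy(condition, fixed_capacity, rate=1.0):
    """Proportion correct: recognize the target or guess among four locations."""
    amount = analysis_per_stimulus(condition, fixed_capacity)
    p_recognize = 1.0 - math.exp(-rate * amount)
    return p_recognize + (1.0 - p_recognize) / N_LOCATIONS


for fixed in (False, True):
    label = "fixed capacity" if fixed else "unlimited capacity"
    row = {c: round(predicted_accuracy(c, fixed), 3) for c in VISIBLE_COUNTS}
    print(label, row)
```

Under these assumptions the unlimited-capacity version yields simultaneous = sequential < repeated and the fixed-capacity version yields simultaneous < sequential = repeated, whatever positive value of `rate` is chosen. Only the ordering is meaningful here, not the absolute accuracies.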

The predictions are predicated on the assumption that accuracy is limited by stimulus display duration (i.e., accuracy would improve given increased display time). The comparison between the simultaneous and repeated conditions can be used to verify whether this assumption is correct. In some pilot studies, we have found cases that yielded similar accuracy in the simultaneous and repeated conditions even though the repeated condition presents stimuli for twice as long as the simultaneous condition. For example, brief displays of objects without noise can yield this result in violation of the assumption of duration-dependent accuracy. One explanation for this violation is that the duration of the simultaneous condition exceeds the maximum window of temporal integration, so that additional viewing time provides no benefit. Whatever the explanation, if repeating the display does not improve accuracy, then results from this paradigm are insensitive to the capacity differences of interest. Consequently, simultaneous-sequential experiments without a repeated condition cannot distinguish an unlimited-capacity result from an insensitive experiment.

Session length and practice

Experimental sessions included 48 trials in each of the three conditions and lasted approximately 20 minutes. Each observer completed at least two practice sessions before beginning the experiment. If observers were not performing reliably better in the repeated than in the simultaneous condition, they were given additional practice and instructions to exploit both intervals of the repeated display. Following practice, each observer completed 10 experimental sessions for a total of 480 trials per condition per observer. Experiment 3 instead used 30 sessions with 24 trials per condition per duration.
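The trial counts follow directly from the session structure. The short bookkeeping sketch below uses only the numbers stated in the text (48 trials per condition per session, three conditions, six trials per block, 10 sessions, and for Experiment 3, 30 sessions of 24 trials per condition per duration). The variable names are illustrative.

```python
TRIALS_PER_CONDITION_PER_SESSION = 48
N_CONDITIONS = 3
TRIALS_PER_BLOCK = 6
N_SESSIONS = 10

trials_per_session = TRIALS_PER_CONDITION_PER_SESSION * N_CONDITIONS
blocks_per_session = trials_per_session // TRIALS_PER_BLOCK
blocks_per_condition = TRIALS_PER_CONDITION_PER_SESSION // TRIALS_PER_BLOCK
trials_per_condition = TRIALS_PER_CONDITION_PER_SESSION * N_SESSIONS

# Experiment 3: 30 sessions, 24 trials per condition per duration.
exp3_trials_per_condition_per_duration = 30 * 24

print(trials_per_session, blocks_per_session, blocks_per_condition)
print(trials_per_condition, exp3_trials_per_condition_per_duration)
```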

Overview of experiments

Here we provide a brief summary of the experiments to aid the reader in following their progression. Full methods and results follow.

In Experiment 1, we used the extended simultaneous-sequential method to measure capacity limits in a shape-based recognition task. We used the strategies described earlier to prevent template matching, feature matching, and semantic categorization. Here we found results consistent with fixed capacity. This is the primary finding of the study: A carefully controlled shape-based object recognition task yields fixed-capacity results.

Experiments 2 through 4 ruled out alternative explanations for the fixed-capacity result in Experiment 1. Experiment 2 showed that this result was not due to physical display differences between conditions. Experiment 3 showed that capacity is independent of task difficulty. Experiment 4 showed that the fixed-capacity result was not caused by saccadic eye movements. Through the first four experiments, all results were consistent with fixed-capacity processing in shape-based object perception.

For Experiments 5 and 6, we sought complementary unlimited-capacity results. Contrasting unlimited-capacity results would help delineate which aspects of perception invoke capacity limits. Experiment 5 was similar to Experiment 1, but we used unrotated object stimuli and encouraged observers to adopt a template-matching strategy. Contrary to our prediction, we found results consistent with fixed capacity. We speculate that the presence of object shape information made it difficult or unintuitive for observers to use a template matching strategy. In Experiment 6, we further encouraged template matching by having observers search for simple, familiar, unrotated two-dimensional shapes. With these shapes we found unlimited-capacity results that were consistent across observers. This is the second main finding of the study, that a simple shape discrimination task can also yield unlimited-capacity results. In the Discussion section of this article, we contrast this simple task with the object shape perception tasks of the earlier experiments to better understand the locus of capacity limitations in object perception.

Experiment 1: Shape-based object recognition

This initial experiment is the central experiment for this study. The goal of Experiment 1 was to create a shape task in which accuracy relied solely on perception of an object's real-world 3-D shape. Observers attempted to locate a specific object in an array of similar objects. Object sets were created to minimize non-shape cues to object recognition. The extended simultaneous-sequential paradigm was used to measure capacity limitations.

Method

Stimuli

Figure 3 shows example stimuli from the study. The stimulus set included three categories: foam blocks, toy duplos, and paper crumples. Each category comprised photographs of six physical models. To create the stimuli, we photographed each physical model from six evenly spaced viewpoints. The models were placed on a circular table covered in white cloth and illuminated by ambient room light and a stationary desk lamp. The objects occupied the central region of each photograph and the white tablecloth filled the background. We then applied a square crop to each image and converted all photographs to grayscale images of 100 × 100 pixels.

Superimposed dynamic noise

We found that conventional dynamic pixel noise was not effective in limiting accuracy for this task. To create noise that was effective in limiting accuracy, we superimposed movies of scrambled stimulus images on the stimuli. To create scrambled stimulus images, we divided up a central region of each stimulus image into squares and randomly shuffled the squares within the central region that included the object. The central region was 74 × 74 pixels for the foam block images and 50 × 50 pixels for the duplo and crumple images. Noise movies were superimposed on stimuli from their own categories using the following procedure. First, we subtracted the mean pixel value of the noise image from each individual pixel value in that image, centering the distribution of pixel values at zero. We then multiplied each pixel in both the noise image and the stimulus image by 0.7 (to prevent clipping) and combined them by taking the pixelwise sum. This gave the appearance of the stimulus image superimposed with a transparent lower contrast noise image. Each stimulus display in the experiment lasted 120 ms (nine screen refreshes), during which one noise image persisted for 40 ms (three refreshes) before changing to a new randomly chosen noise image.
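The scrambling and superposition steps can be sketched as follows. This is a minimal illustration using plain nested lists as grayscale images, not the original stimulus-generation code. The tile size is an assumption, since the text does not state the size of the shuffled squares, and the function names are hypothetical. The mean subtraction and the 0.7 scaling follow the description above.

```python
import random


def scramble_central_region(image, region_size, tile_size, rng):
    """Shuffle square tiles within the central region of a square image.

    `tile_size` is an assumed parameter; pixels outside the central region
    (and any remainder that does not fill a whole tile) are left untouched.
    """
    size = len(image)
    offset = (size - region_size) // 2
    tiles_per_side = region_size // tile_size
    origins = [
        (offset + r * tile_size, offset + c * tile_size)
        for r in range(tiles_per_side)
        for c in range(tiles_per_side)
    ]
    sources = origins[:]
    rng.shuffle(sources)
    scrambled = [row[:] for row in image]
    for (dst_r, dst_c), (src_r, src_c) in zip(origins, sources):
        for dr in range(tile_size):
            for dc in range(tile_size):
                scrambled[dst_r + dr][dst_c + dc] = image[src_r + dr][src_c + dc]
    return scrambled


def superimpose(stimulus, noise, gain=0.7):
    """Zero-center the noise, scale both images by `gain`, and sum pixelwise."""
    n_pixels = sum(len(row) for row in noise)
    noise_mean = sum(sum(row) for row in noise) / n_pixels
    return [
        [gain * s + gain * (n - noise_mean) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(stimulus, noise)
    ]


# Tiny synthetic 8 x 8 "image" for demonstration; real stimuli were 100 x 100.
rng = random.Random(0)
demo_image = [[r * 8 + c for c in range(8)] for r in range(8)]
demo_noise = scramble_central_region(demo_image, region_size=4, tile_size=2, rng=rng)
demo_frame = superimpose(demo_image, demo_noise)
```

Because the noise is zero-centered before it is added, it leaves the mean luminance of the scaled stimulus unchanged and only perturbs local contrast. In the experiment a new noise image was drawn every three refreshes, so each 120 ms display would cycle through three such frames.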

Procedure

Details of the procedure are described in the General method section. Each trial consisted of a preview display followed by a critical display. The observer's task was to indicate which of four locations in the critical display contained the object that was shown in the preview and post-critical displays. The object category (foam blocks, duplos, or crumples) varied randomly from trial to trial. Distracters were always drawn from the same category of objects as the target. Importantly, different viewpoint images were used in the preview display and the critical display. Thus, observers knew what object they were searching for but not the viewpoint image that would appear in the critical display. Interval duration was 120 ms in all conditions. The apparatus was the same as in Scharff et al. (2011b).

Observers

The six observers were paid and unpaid volunteers. All had normal or corrected-to-normal vision. Author AS was an observer in this experiment and all others except Experiment 3. In this and all other experiments, each observer completed at least two practice sessions before participating.

Results

The results of Experiment 1 are depicted in Figure 4. Proportion correct is plotted against the three conditions (average of six observers with error bars corresponding to standard error of the mean across observers). The dotted lines plot the predictions of the unlimited-capacity (simultaneous = sequential) and fixed-capacity models (sequential = repeated). There was a reliable advantage for sequential over simultaneous presentation, mean within-observer difference of 10.3% ± 0.7% standard error of the mean difference, two-tailed paired-samples t(5) = 15.59, p < 0.0001, Cohen's d = 1.24, and for repeated over simultaneous presentation, 11.3% ± 0.3%, t(5) = 34.8, p < 1.0 × 10⁻⁶, d = 1.37. However, there was no reliable difference between the sequential and repeated conditions, 0.9% ± 0.8% in favor of repeated, t(5) = 1.16, p = 0.29, d = 0.11. This pattern of results across the three main conditions is consistent with fixed-capacity models and rejects unlimited-capacity models.
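For readers unfamiliar with the statistic, the paired-samples t value reported above is the mean within-observer difference divided by its standard error. The sketch below shows that computation. The six difference scores are made-up numbers chosen for illustration and are not the data from this experiment. The function name is likewise hypothetical.

```python
import math
import statistics


def paired_t(differences):
    """Return (mean difference, standard error, t, degrees of freedom)
    for a list of within-observer difference scores."""
    n = len(differences)
    if n < 2:
        raise ValueError("need at least two observers")
    mean_diff = statistics.mean(differences)
    sem = statistics.stdev(differences) / math.sqrt(n)
    return mean_diff, sem, mean_diff / sem, n - 1


# Hypothetical sequential-minus-simultaneous accuracy differences for six
# observers (proportion correct). NOT the study's data.
hypothetical_differences = [0.09, 0.11, 0.10, 0.12, 0.08, 0.10]
mean_diff, sem, t_value, df = paired_t(hypothetical_differences)
print(round(mean_diff, 3), round(sem, 4), round(t_value, 2), df)
```

With six observers the test has five degrees of freedom, matching the t(5) values reported in the text.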

To test whether observers treated the two sequential display intervals similarly, we compared accuracy between the sequential trials with targets in the first versus the second interval. There was no reliable accuracy difference between trials with targets in the first versus the second interval of the sequential condition, 2.4% ± 3.1% in favor of second interval trials, t(5) = 0.77, p = 0.48, d = 0.27. If such a difference had been present, it would undermine a critical assumption of the model.

We tested whether there was an effect of the angular viewpoint difference between the preview image and the target image in the critical display. There was no reliable effect of viewpoint difference on accuracy: For 60° viewpoint differences, mean accuracy was 77% ± 3%; for 120° viewpoint differences, mean accuracy was 76% ± 3%; and for 180° viewpoint differences, mean accuracy was 75% ± 4%, F(2, 5) = 1.12, p = 0.37.

Discussion

In sum, results of Experiment 1 were consistent with fixed-capacity models and rejected unlimited-capacity models for shape-based object recognition. Secondary analyses indicated that observers treated the two sequential display intervals similarly and that image viewpoints had a negligible effect on accuracy. The central finding of this study is that shape-based object recognition has fixed capacity. The remaining experiments are controls and comparisons needed to more fully interpret this result.

Experiment 2: Identical displays with cueing

In Experiment 1, we observed a fixed-capacity result for shape-based object perception. In Experiments 2, 3, and 4, we test a series of alternative explanations for the fixed-capacity limit: sensory limitations, limitations due to task difficulty, and limitations due to eye movements, respectively.

In Experiment 2, we use a cueing design to rule out sensory limitations as causes of the limited perceptual capacity observed in Experiment 1. The simultaneous-sequential design equates decision and response demands but not sensory demands. In particular, the simultaneous condition presents a denser array of stimuli (with four stimuli presented at once) compared to the sequential condition. Consequently, accuracy differences may have been stimulus driven (e.g., by mutual crowding of densely presented stimuli, Pelli & Tillman, 2008).

The cueing design used here eliminates physical differences between the simultaneous and sequential conditions; thus all sensory factors (e.g., crowding) are held constant across conditions. Any observed effects must therefore be attributed to attentional capacity limitations. For similar experiments, see Palmer (1994) and Scharff et al. (2011a).

Method

Experiment 2 included three conditions, as depicted in Figure 5. These conditions all used two display intervals, but they are fundamentally different from the repeated condition from Experiment 1 because stimuli were not repeated. The cued simultaneous and cued sequential conditions (Panels B and C of Figure 5) replicate the corresponding conditions from Experiment 1 in terms of the number and order of relevant stimuli. The neutral condition (Panel A of Figure 5) was a check to ensure observers followed the cueing instructions. We did not include a repeated condition because it would have required inconsistent instructions and stimulus presentations. However, the repeated condition from Experiment 1 can be used as a reference.

Figure 5. Schematic of conditions used in Experiment 2. This experiment repeated the simultaneous-sequential comparison using identical stimulus displays. Each condition included eight unique images, four in each interval, with only one target among them. Cues preceding each interval indicated possible target positions. In the neutral condition, the target can appear in any location in either interval. In the cued simultaneous condition, the target can appear in any location in one of the intervals (the second interval in the example shown). In the cued sequential condition, the target can occur within the cued subset of locations in either interval.

The stimuli and display parameters were similar to the repeated condition from Experiment 1. The critical difference was that each trial included eight unique stimuli displayed across the two intervals. Each trial included one target and seven distracters with four shown in the first interval and four shown in the second. Conditions were run in separate blocks. Cueing instructions were given before each block and were reinforced with central precues pointing toward the relevant locations. Only cued locations could contain the target. In the cued simultaneous condition, cues indicated which interval could contain the target. The cued interval (first or second) was consistent within a block of trials. In the cued sequential condition, observers were told which subset of locations would contain the target—two in the first interval and two in the second. The cued locations were always at opposite corners of the display, and the order in which they were cued was consistent within a block of trials. In the neutral condition, observers were told that the target could appear at any location and in either interval. The precues were white lines extending from near fixation towards relevant locations. Precues started 0.5° from fixation and ended 1° away. Preceding each stimulus display interval, cues appeared for 300 ms and were followed by a 300 ms blank interval.
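The cueing rules above can be summarized as the set of (interval, location) pairs in which the target is allowed to appear in each condition. The following is a minimal sketch of that summary. The function name `possible_targets`, the condition labels, and the location indexing (0 to 3 going around the display, so that 0/2 and 1/3 are opposite corners) are hypothetical conventions. The sketch also assumes that the second interval of the cued sequential condition cues the pair of corners not cued in the first interval.

```python
ALL_LOCATIONS = (0, 1, 2, 3)   # hypothetical indexing; 0/2 and 1/3 are opposite corners
INTERVALS = (0, 1)             # first and second display interval


def possible_targets(condition, cued_interval=0, first_pair=(0, 2)):
    """Return the set of (interval, location) pairs that may contain the target."""
    if condition == "neutral":
        # Target may appear at any location in either interval.
        return {(i, loc) for i in INTERVALS for loc in ALL_LOCATIONS}
    if condition == "cued_simultaneous":
        # All four locations of the single cued interval are relevant.
        return {(cued_interval, loc) for loc in ALL_LOCATIONS}
    if condition == "cued_sequential":
        # Two opposite corners in the first interval; the other two
        # (assumed) in the second interval.
        second_pair = tuple(loc for loc in ALL_LOCATIONS if loc not in first_pair)
        return {(0, loc) for loc in first_pair} | {(1, loc) for loc in second_pair}
    raise ValueError(f"unknown condition: {condition}")


for name in ("neutral", "cued_simultaneous", "cued_sequential"):
    print(name, sorted(possible_targets(name)))
```

Under these assumptions, both cued conditions leave four relevant stimuli, as in Experiment 1, while the physical displays always contain eight images. The neutral condition leaves all eight relevant.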

Observers

Four observers completed the experiment, three of whom had previously completed Experiment 1. The fourth observer practiced Experiment 1 for several hours before participating. Author AS was one of the observers.