Object-level visual information gets through the bottleneck of crowding

Jason Fischer; David Whitney

doi:10.1152/jn.00904.2010

J Neurophysiol. 2011 Jun 15;106(3):1389–1398. doi: 10.1152/jn.00904.2010


PMCID: PMC3174808 PMID: 21676930

Abstract

Natural visual scenes are cluttered. In such scenes, many objects in the periphery can be crowded, blocked from identification, simply because of the dense array of clutter. Outside of the fovea, crowding constitutes the fundamental limitation on object recognition and is thought to arise from the limited resolution of the neural mechanisms that select and bind visual features into coherent objects. Thus it is widely believed that in the visual processing stream, a crowded object is reduced to a collection of dismantled features with no surviving holistic properties. Here, we show that this is not so: an entire face can survive crowding and contribute its holistic attributes to the perceived average of the set, despite being blocked from recognition. Our results show that crowding does not dismantle high-level object representations to their component features.

Keywords: face recognition, object recognition, emotion, sparse selection, ensemble, texture

Much of the visual processing in the brain culminates in object recognition, whereby the vast array of visual features in a scene is segmented and combined into discrete objects that we can interact with. This process happens rapidly and effortlessly, so that our everyday experience is one of a seamless visual environment richly populated with objects. Yet at a given moment, many, perhaps most, of the objects in our peripheral vision are rendered unrecognizable by surrounding visual clutter. This phenomenon, known as visual crowding, constitutes the fundamental limitation on object recognition (Levi 2008; Pelli and Tillman 2008; Whitney and Levi 2011).

Crowding is not a problem of insufficient visual acuity. An object in the periphery that is readily identifiable on its own can become unrecognizable simply by the addition of one or two neighboring objects (Bouma 1970; Korte 1923). Thus crowding reflects a limitation on the mechanisms by which visual features are combined into objects and brought to perceptual awareness. Understanding why crowding occurs is a crucial step in understanding visual object recognition.

Of the many theories that have been advanced to explain visual crowding, most have in common the underlying notion of a resolution limit in the object processing stream. That is, they posit that at some stage of object processing, the visual system lacks the necessary resolution to process an object individually when it is surrounded too closely by other objects. Theories have variously placed the resolution bottleneck at the level of elementary feature detection (Flom et al. 1963; Levi and Waugh 1994), feature pooling (Pelli et al. 2004; Pelli and Tillman 2008), attentional selection (He et al. 1996; Intriligator and Cavanagh 2001), and others, but all share the premise that beyond this resolution bottleneck, information about an individual crowded object is lost. The degraded representation of the crowded object leads observers to experience a jumbled, ambiguous percept in its place.

In the present study, we challenged the long-standing belief that crowding destroys object-level information. We presented observers with groups of faces and tested whether the central face in a group, despite being crowded by the surrounding faces, nonetheless exerted an influence on the perceived ensemble information about the set. We asked subjects to report the average expression in a group of faces when the central face was crowded such that it could not be individually identified (Fig. 1C). Despite the fact that subjects were unable to recognize the expression of the central crowded face, they were paradoxically able to incorporate its precise expression into an estimate of the overall group expression. Our results suggest a different account of crowding than that posited by traditional theories: crowding is not a failure of object processing but a failure of how individual objects are perceptually accessed from the object-processing stream.

Fig. 1. — Experimental stimuli. A: faces ranged from neutral (0) to disgusted (49) in a 50-frame morph; 1 emotional unit was defined as the difference between 2 consecutive frames. B: the average luminance profile across all 50 faces in the main stimulus set. Each face had a unique noise distribution added to equate its pixel contents to the group histogram (see methods). The example image shows the noise added to frame 36 of the morph. C: in the main experiment, we presented 2 groups of faces centered at 16.5° to the left and right of a fixation dot (faces are shown here at a reduced eccentricity for visualization). In separate runs, subjects judged either which central face was more disgusted or which set, on average, was more disgusted. In this example, the left and right flanking sets contain the same 6 faces (average expression of 35 emotional units), whereas the left central face is 12 units more disgusted than the right.

METHODS

Experimental design.

The study protocol was reviewed and approved by the University of California, Berkeley, Institutional Review Board, and informed written consent was obtained from all participants. The neutral and disgusted faces used in the emotional morph were drawn from Ekman's Pictures of Facial Affect (POFA) collection and are reprinted in modified form with permission from the Paul Ekman Group. The experimental stimuli consisted of 50 faces falling on an emotional continuum from neutral to disgusted (Fig. 1A). We generated this emotional continuum by creating a 50-frame morph between a neutral expression and a disgusted expression for the same person, and we defined the difference between 2 consecutive frames as 1 emotional unit. We also added noise to the images, as depicted in Fig. 1B, to control for luminance differences between the faces. To generate the noise in the face images, we first created a luminance histogram (with 256 bins corresponding to the possible grayscale values) for each of the 50 faces in the emotional morph. We averaged the histograms of all 50 images to produce a group histogram (Fig. 1B) and then added noise to each individual image until its histogram matched the group histogram. For example, if image X had fewer pixels of value 133 than the group histogram, then a pixel of value 133 was added to a random location in the image. By randomly moving through all possible grayscale values and iterating this process, we produced 50 images that all contained the same pixels, only arranged differently. On the right in Fig. 1B is an example face after the addition of noise to equate its luminance profile with that of the group average. The histogram-based noise approach ensured that the mean luminance of all images was the same, and the mean luminance within local regions of the images was very similar. 
Regions of an image that were slightly darker or lighter than in most other images, such as the bridge of the nose in the pictured face, received an appropriate level of noise to bring them in line with all images. We confirmed psychophysically that the discriminability between two adjacent frames did not vary significantly across the emotional continuum.
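The histogram-equating step can be illustrated with a minimal Python sketch. It is not the original MATLAB code, and it rests on several assumptions: images are flat lists of integer grayscale values of equal length, the averaged histogram is rounded to integer counts, and "adding" a pixel is read as overwriting a randomly chosen pixel whose value the image has in surplus. The function names are hypothetical.

```python
import random

def histogram(pixels, n_bins=256):
    """Count how many pixels take each grayscale value."""
    counts = [0] * n_bins
    for value in pixels:
        counts[value] += 1
    return counts

def group_histogram(images, n_bins=256):
    """Average the per-image histograms and round to integer counts
    that still sum to the number of pixels in one image."""
    n_images, n_pixels = len(images), len(images[0])
    sums = [0] * n_bins
    for image in images:
        for value, count in enumerate(histogram(image, n_bins)):
            sums[value] += count
    target = [s // n_images for s in sums]
    # Give the pixels lost to rounding down to the bins with the largest remainders.
    leftover = n_pixels - sum(target)
    by_remainder = sorted(range(n_bins), key=lambda v: sums[v] % n_images, reverse=True)
    for value in by_remainder[:leftover]:
        target[value] += 1
    return target

def match_to_group(pixels, target, rng):
    """Return a copy of the image whose histogram equals the target.
    Assumption: surplus pixels at random locations are overwritten with
    the grayscale values the image is missing."""
    counts = histogram(pixels, len(target))
    # One list entry per missing pixel, in random order.
    needed = [v for v in range(len(target))
              for _ in range(max(0, target[v] - counts[v]))]
    rng.shuffle(needed)
    surplus = [max(0, counts[v] - target[v]) for v in range(len(target))]
    positions = list(range(len(pixels)))
    rng.shuffle(positions)
    out = list(pixels)
    for pos in positions:
        value = pixels[pos]
        if surplus[value] > 0:
            surplus[value] -= 1
            out[pos] = needed.pop()
    return out
```

Calling `match_to_group(image, group_histogram(images), random.Random(0))` on each image yields a set of images that contain the same pixel values arranged differently, which is the property the text describes. Because surplus and deficit counts are equal by construction, every overwritten pixel is paired with exactly one missing value.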

We first established a performance baseline for each subject's ability to discriminate the emotional expressions in the periphery by presenting 2 faces, 1 to the left and 1 to the right of fixation, at 16.5° eccentricity (the 2 central face positions in Fig. 1C without flanking faces). Although difficult, observers can still readily recognize faces at this eccentricity (Louie et al. 2007; McKone 2004). The faces were presented on a gray background, and each subtended 1.8 × 2.6° of visual angle. The faces appeared for 1 s followed by a gray background and fixation point awaiting the subject's response. We varied the emotional separation between the left and right faces between 2 and 12 emotional units, and subjects reported whether the left or right face was more disgusted in a 2-alternative forced choice method of constant stimuli task. For each subject, the threshold emotional separation (THR₇₅) was defined as the smallest emotional separation between the faces that could be discriminated 75% of the time (Fig. 2).
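As an illustration of the threshold definition, the following Python sketch picks the smallest tested separation at which performance reaches the 75% criterion. The function name and the example values are hypothetical, and the sketch operates on whatever percent-correct values it is given; whether raw or fitted performance was used is not specified here.

```python
def thr75(percent_correct_by_separation, criterion=75.0):
    """Smallest emotional separation (in emotional units) whose percent
    correct meets the criterion; None if no tested separation qualifies."""
    qualifying = [separation
                  for separation, percent in percent_correct_by_separation.items()
                  if percent >= criterion]
    return min(qualifying) if qualifying else None

# Hypothetical baseline data: separation (emotional units) -> percent correct.
example = {2: 55.0, 4: 62.5, 6: 70.0, 8: 77.5, 10: 85.0, 12: 92.5}
```

With these made-up values, `thr75(example)` returns 8, the first separation at or above 75% correct.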

Fig. 2. — Single-face emotional discrimination. For each subject (Subj.), performance at discriminating 2 target faces (percent correct) in the absence of flankers is plotted as a function of the emotional separation between the faces (emotional units). Error bars indicate ±68 and ±95% confidence intervals for logistic curves fit to the data. For each subject, the threshold emotional separation (THR₇₅) was defined as the smallest emotional separation for which performance was at 75% correct or better; THR₇₅ was used as the fixed separation between the target faces in the main experiment. When performance at the THR₇₅ emotional separation was retested with flankers present, performance was significantly impaired in every subject (shaded bars; error bars are ±1 SE; all P values < 0.001), indicating that the flankers were effective at crowding the central faces.

In the main experiment, we presented 2 sets of 7 faces, with central faces positioned at 16.5° to the left and right of fixation and 6 flanking faces surrounding each central face (Fig. 1C). The emotional separation between the 2 central faces of the sets was fixed at either THR₇₅ (established in the baseline experiment) or 0. The difference between the average expressions of the surrounding faces on the left and right (mean_right − mean_left) varied between −8 and 8 emotional units and was independent of the relationship between the 2 central faces. We computed the average expression of each flanker set as the arithmetic mean of the emotional morph numbers of the 6 flanking faces in the set (i.e., the difference in flanker means plotted on the x-axis in Fig. 3 does not include the target faces). In separate runs, subjects either reported which central face was more disgusted or which group of faces as a whole was more disgusted. These 2 run types were interleaved. Subjects completed ≥4 runs of each type, with each run comprising 255 trials.

In two control experiments, we repeated the main experiment with the central faces inverted (Fig. 3B) and scrambled (Fig. 3C), whereas the flanking faces remained upright and intact. To create the scrambled faces, we defined three sections of the face, containing the eyes, nose, and mouth, respectively, and swapped the arrangement of these sections. We also horizontally offset the three segments slightly to disrupt the alignment of the features. This approach left all of the individual facial features intact and upright while aiming to disrupt the configural arrangement of the face by repositioning the features relative to each other.

In an additional dual-task control experiment, subjects made two judgments on each trial: 1) which set, on average, was more disgusted?; and 2) were the central faces of the sets upright or inverted? The central faces were either upright or inverted on randomly interleaved trials. The left and right flanking sets were fixed to have the same average expression on each trial, whereas the difference in the expressions of the central faces was ±THR₇₅. We selected only the trials in which the subjects misreported the orientation of the central faces for further analysis. Three subjects participated in the dual-task control, two of whom were naïve to the purpose of the experiment.

Analysis.

All data analysis was conducted in MATLAB R2009a (The MathWorks, Natick, MA). Our primary interest in the main experiment was whether the emotions of the central (crowded) faces influenced the perceived expressions of the 2 groups in which they fell. To test for such an effect, we first fit psychometric functions to each subject's data. We separately fit 1 curve to the trials in which the right central face was more disgusted (open circles in Fig. 3) and another curve to the trials in which the left central face was more disgusted (filled circles in Fig. 3). As a reference, we also fit a third curve to the trials in which the right and left central faces had the same emotion (squares in Fig. 3). We used a standard logistic equation of the form

y = \frac{1}{1 + e^{- a (x - b)}}

to fit each curve. An influence of the central faces on the perceived average expressions of the sets would manifest as a horizontal displacement of the black and gray curves away from each other (i.e., significantly different b parameters from the curve fits). To test for a significant displacement between the curves, we used a bootstrapping approach similar to the one outlined by Wichmann and Hill (2001). We generated a bootstrapped distribution of fits for each curve by resampling the data with replacement 1,000 times and fitting a new curve on each iteration. Each bootstrapped sample contained the same number of trials as the original data, but some trials were repeated, and some did not appear in a given sample due to the replacement during sampling. The resampling was performed within each bin along the abscissa so that each data point in the resampled data represented the same number of trials as in the original data (20 trials per data point). In this way, the resampling procedure provided an estimate of the variability of the original data at each point along the abscissa, and the distribution of resulting curve fit parameters reflected the confidence intervals around the estimates of those parameters. To compare the black and gray curves, we found the difference between the b parameter estimates for the 2 curves on each iteration and tested whether the resulting distribution of difference scores was significantly different from 0.
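The curve fit and the within-bin bootstrap can be sketched in Python as follows. This is a sketch under stated assumptions rather than the original MATLAB analysis: the logistic is fit by maximum likelihood using Newton's method (the fitting criterion is not given in the text), each curve is summarized as per-bin counts of "right" responses, and the P value is taken as a two-tailed proportion of bootstrapped differences in b on either side of 0. All function names are hypothetical.

```python
import math
import random

def fit_logistic(xs, ks, ns, n_iter=50):
    """Fit y = 1 / (1 + exp(-a (x - b))) by Newton's method on the binomial
    likelihood. xs: bin centers; ks: 'right' responses per bin; ns: trials per bin."""
    b0, b1 = 0.0, 0.0  # linear predictor z = b0 + b1 * x
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k, n in zip(xs, ks, ns):
            z = max(-30.0, min(30.0, b0 + b1 * x))  # clamp to keep exp() safe
            p = 1.0 / (1.0 + math.exp(-z))
            w = n * p * (1.0 - p)
            g0 += k - n * p            # gradient terms
            g1 += (k - n * p) * x
            h00 += w                   # Fisher information terms
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b1, -b0 / b1  # slope a = b1, threshold b = -b0 / b1

def resample_within_bins(ks, ns, rng):
    """Resample each bin's trials with replacement, keeping n trials per bin."""
    return [sum(rng.random() < k / n for _ in range(n)) for k, n in zip(ks, ns)]

def bootstrap_shift(curve_1, curve_2, n_boot=1000, seed=0):
    """Bootstrap the difference in b between two curves, each given as (xs, ks, ns).
    Returns the differences and a two-tailed P value (an assumed test)."""
    rng = random.Random(seed)
    (xs1, ks1, ns1), (xs2, ks2, ns2) = curve_1, curve_2
    diffs = []
    for _ in range(n_boot):
        _, b_1 = fit_logistic(xs1, resample_within_bins(ks1, ns1, rng), ns1)
        _, b_2 = fit_logistic(xs2, resample_within_bins(ks2, ns2, rng), ns2)
        diffs.append(b_1 - b_2)
    frac_positive = sum(d > 0 for d in diffs) / n_boot
    return diffs, 2.0 * min(frac_positive, 1.0 - frac_positive)
```

Drawing each resampled trial as a "right" response with probability k/n is equivalent to sampling with replacement from that bin's n original trials, so every resampled data point still represents the same number of trials as in the original data.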

Eye tracking.

In a set of control runs, we tracked subjects' eye positions using a Tobii X120 eye tracker sampling at 120 Hz. Calibration was performed between each testing run to ensure the accuracy of gaze estimates over the course of the experiment. Gaze recordings were analyzed in MATLAB R2009a: we measured the distribution of x (horizontal) gaze coordinates and plotted horizontal gaze position as a function of time for visualization (Fig. 4D). The two subjects were experienced psychophysical observers, and their gaze never departed >2.5° from fixation (stimuli were at 16.5°).

Fig. 4. — Inverted and scrambled controls with single-face discrimination equated. In 2 subjects, we repeated the inverted and scrambled control experiments (as well as the main experiment with upright central faces), this time separately computing THR₇₅, the threshold emotional separation, for the inverted and scrambled faces when presented in isolation. A: for upright central faces, the results from these subjects replicated those of the main experiment: the expressions of the crowded central faces exerted a significant influence on the perceived expressions of the surrounding sets, as indicated by a horizontal displacement between the black and gray solid curves (JF: P = 0.001; JS: P = 0.002). B: even with the central inverted faces separated on the emotional continuum (Fig. 1A) such that they were as discriminable as upright faces, they nonetheless failed to influence subjects' perception of the average set expressions (JF: P = 0.64; JS: P = 0.29). C: after equating the discriminability of isolated scrambled faces with that of upright faces, the scrambled faces still failed to influence the perceived expressions of the surrounding sets (JF: P = 0.84; JS: P = 0.15). D: we also tracked subjects' eye movements during these control runs. Example plots are shown for horizontal gaze position over the course of 1 run for each subject. The subjects were experienced psychophysical observers, and their gaze did not deviate >2.5° from fixation at any time during the runs (the stimulus arrays were at 16.5° eccentricity).

Monte Carlo simulation.

In a series of control analyses, we simulated subjects' performance in the scenario where the influence of the central faces was due to incomplete or failed crowding. We used the trials from the main experiment in which the central faces were identical (dashed curves in Fig. 3) and modified a subset of the trials on each iteration of a Monte Carlo simulation. On each iteration, we randomly sampled a subset of the trials (varying in size from 1% of total trials up to the percentage of total trials in which crowding failed, estimated separately for each subject) and replaced the subject's responses in the subsampled trials with responses corresponding to a comparison of the two central faces. The estimated proportion of trials on which crowding might have failed is given by [2(% correct − 50)]/100, since chance is 50%. Subject NS was at chance at discriminating the crowded faces (49.4% correct), suggesting no failed crowding on the whole, but, at the other extreme, subject SM's performance was 63.3% correct, suggesting crowding may have failed on up to 26.6% of trials.
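The estimate follows from assuming that a subject responds correctly on every trial where crowding fails and guesses (50% correct) on the rest: if crowding fails on a fraction f of trials, percent correct is 100f + 50(1 − f), which rearranges to f = [2(% correct − 50)]/100. A small Python sketch of this calculation is given below; the function name is hypothetical, and clipping below-chance performance to 0 is an assumption made here to match the interpretation given for subject NS.

```python
def max_failed_crowding(percent_correct):
    """Upper bound on the proportion of trials on which crowding failed,
    given percent correct on the crowded central-face task (chance = 50%)."""
    return max(0.0, 2.0 * (percent_correct - 50.0) / 100.0)
```

For the two cases reported above, `max_failed_crowding(63.3)` gives approximately 0.266 (26.6% of trials), and `max_failed_crowding(49.4)` gives 0.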

We separately simulated trials in which the left central face was more disgusted than the right and vice versa, replacing subjects' responses with left responses or right responses, respectively. These altered trials reflect the response that the subject would have made on a trial in which crowding broke down. Taking all of the data, including the altered trials, we fit a psychometric curve using a standard logistic model, as shown in Fig. 5A, acquiring a slope (a) parameter and a threshold (b) parameter. These parameters characterize a possible psychometric curve that we might have measured if the subject was simply exploiting failed crowding to perform the task. We plotted the pair of parameters as a point in 2-dimensional parameter space in Fig. 5B, with the simulation for the case where the right central face was more disgusted than the left plotted in dark gray, and the reverse case (left more disgusted than right) plotted in light gray. We repeated this subsampling and curve fitting process ∼25,000 times for each subject, each time sampling and altering anywhere from 1% of the total trials up to the percentage of total trials in which crowding might have failed. In this way, for each subject, we obtained the 2 clusters of curve estimates (Fig. 5B), representing the psychometric curves that might result if the subject's responses were based solely on individual access to the central faces on some trials. In a separate simulation, we held the number of altered trials constant, fixed at the proportion of trials in which crowding might have failed, and varied the weighting on the central faces between 0 and 100%. A larger weighting on the central faces produced a larger effective difference between the average expressions of the 2 sets. Altered trials were assigned a left or right response based on the likelihood of such a response given the effective difference between the set means, as determined from the subject's baseline curve.
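One iteration of the trial-replacement step in the first simulation might look like the following Python sketch. It assumes that trials are stored as (flanker mean difference, responded right) pairs taken from the identical-central-face condition and that the number of altered trials is drawn uniformly between 1% of trials and the estimated maximum; the text states the range but not the sampling distribution, and the function name is hypothetical.

```python
import random

def simulate_failed_crowding(trials, max_failed, favored_right, rng):
    """Return a copy of the trials in which a random subset of responses is
    replaced by the response expected from comparing the central faces.

    trials: list of (flanker_mean_difference, responded_right) pairs
    max_failed: estimated maximum proportion of trials with failed crowding
    favored_right: True when simulating 'right central face more disgusted'
    """
    n = len(trials)
    fewest = max(1, round(0.01 * n))
    most = max(fewest, round(max_failed * n))
    n_altered = rng.randint(fewest, most)        # uniform draw (an assumption)
    altered = set(rng.sample(range(n), n_altered))
    return [(x, favored_right if i in altered else response)
            for i, (x, response) in enumerate(trials)]
```

The altered trial list would then be binned by flanker mean difference and fit with the logistic model to obtain one (a, b) point; repeating the call with fresh random draws builds up one cluster of simulated parameter estimates.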

Fig. 5. — Simulated influence of incomplete crowding in the main experiment. On each iteration of a Monte Carlo simulation, we produced simulated data based on the hypothesis that subjects responded according to a comparison of the central faces when crowding was incomplete and had no information about the central faces on the remaining trials (see methods). We fit the data from each iteration with a standard logistic function (A) and plotted the resulting parameter estimates in a 2-dimensional parameter space (B). We compared the parameter estimates from subjects' empirical data (solid black dots in B) with those from the simulated data by fitting a 2-dimensional Gaussian distribution to the simulated parameter estimates (C). C, *bottom*, shows a slice running through the peak of the 2-dimensional Gaussian and the subject's empirical data point. Subjects' empirical data differed significantly from the simulated distributions, indicating that failed crowding on a portion of trials cannot explain the effect we found in the main experiment.

Our goal was to test whether subjects' actual empirical data from the main experiment differed significantly from what would be expected based on any possible degree of failed crowding alone. To do so, we plotted subjects' psychometric curve fit parameters from the main experiment (the filled black points in Fig. 5B) on the same plots as the simulated data, and, after fitting a two-dimensional Gaussian distribution to each cluster of simulated points, evaluated how extreme the empirical data points were relative to the simulated distributions. We examined the standard deviation of the cluster of simulated values along the line passing through its centroid and the empirical point (Fig. 5C), determining the portion of the area under the Gaussian curve that fell beyond the subject's empirical data.
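The comparison of an empirical (a, b) point with a cluster of simulated parameter estimates can be sketched in Python as below. The sketch adopts one reading of the procedure, which is an assumption: a two-dimensional Gaussian is fit via the sample mean and covariance, the standard deviation is taken along the slice of that Gaussian through its peak and the empirical point, and the reported value is the one-tailed area of the resulting one-dimensional Gaussian beyond the empirical point. Under this reading the distance in standard deviations equals the Mahalanobis distance. The function name is hypothetical.

```python
import math

def tail_beyond_empirical(simulated, empirical):
    """simulated: list of (a, b) parameter pairs; empirical: one (a, b) pair.
    Returns (tail_area, z), where z is the distance of the empirical point
    from the centroid in SD units along the line joining them."""
    n = len(simulated)
    mean_a = sum(a for a, _ in simulated) / n
    mean_b = sum(b for _, b in simulated) / n
    # Sample covariance matrix of the simulated cluster.
    var_a = sum((a - mean_a) ** 2 for a, _ in simulated) / (n - 1)
    var_b = sum((b - mean_b) ** 2 for _, b in simulated) / (n - 1)
    cov_ab = sum((a - mean_a) * (b - mean_b) for a, b in simulated) / (n - 1)
    det = var_a * var_b - cov_ab ** 2
    da, db = empirical[0] - mean_a, empirical[1] - mean_b
    # Squared Mahalanobis distance d^T S^-1 d for the 2 x 2 covariance S.
    z_squared = (var_b * da * da - 2.0 * cov_ab * da * db + var_a * db * db) / det
    z = math.sqrt(z_squared)
    # Area under a standard normal curve beyond z (one tail).
    return 0.5 * math.erfc(z / math.sqrt(2.0)), z
```

Because the distance is expressed in units of the fitted covariance, the result does not depend on the differing scales of the slope and threshold parameters. An empirical point at the centroid gives a tail area of 0.5, and the area shrinks as the point moves away from the cluster.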

RESULTS

The goal of the study was to test whether an intact upright face, although crowded such that its expression could not be individually identified, could nonetheless influence the perceived ensemble expression in a group of faces. Face processing relies on holistic mechanisms that are sensitive to the configuration of facial features (Maurer et al. 2002; Moscovitch et al. 1997; Yin 1969), making faces an ideal stimulus for investigating the fate of crowded object-level information.

Single-face discrimination baseline.

We first established each subject's baseline threshold for discriminating between the expressions of two faces presented without flankers (see methods). Figure 2 shows the results from this baseline experiment. For each subject, the threshold emotional separation (THR₇₅) was defined as the smallest emotional separation at which the subject correctly discriminated ≥75% of the face pairs. Each subject's individual THR₇₅ was used in the main experiment as the emotional separation between the two crowded central faces.

Main experiment: discriminating the ensemble expressions of face sets.

In the main experiment, we presented two groups of seven faces each, again centered at 16.5° to the left and right of fixation (Fig. 1C; see methods). In separate, interleaved runs, subjects compared either the expressions of the two central faces, reporting which was more disgusted, or the average emotions of the left and right sets as a whole [recent results show that observers can accurately and rapidly judge the average emotion of a set of faces (de Fockert and Wolfenstein 2009; Haberman and Whitney 2007; Sweeny et al. 2009)].

When discriminating the two central faces, each subject's performance dropped significantly compared with his/her performance in the absence of flankers (depicted in the shaded bars in Fig. 2; all subjects, P < 0.001). This reduction in discrimination of the target faces with the addition of flankers is due to crowding and has been demonstrated to result from interactions at the level of holistic face information (Farzin et al. 2009; Louie et al. 2007). Crowding resulted in a drop to chance-level performance in one subject (NS) and slightly above-chance performance in the other three subjects.

Figure 3A shows subjects' performance when viewing the same stimulus configuration but comparing the average expressions of the two sets as wholes. On the x-axis is the emotional separation between the right and left flanker means (mean_right − mean_left), not including the central faces; positive values indicate that the right set was more disgusted on average, and negative values indicate that the left set was more disgusted. On the y-axis is the proportion of trials in which the subject responded that the right group of faces was more disgusted. The dashed curve in each plot shows performance when the two central faces were identical, whereas the black and gray solid curves show performance for trials in which the right or left central face was more disgusted, respectively. The fact that these solid curves are shifted away from each other in every subject indicates an influence of the crowded central faces on subjects' perception of the average group expressions, despite the fact that the central faces were crowded. For example, at the zero point on the abscissa, when the left and right flanking sets had the same average expression, subjects were more likely to perceive the right set as more disgusted when the right central face was more disgusted (the solid black curve) and vice versa when the left central face was more disgusted (the solid gray curve). A bootstrapping analysis, as detailed in methods, established that these shifts were significant in the psychometric functions of every subject (P < 0.001 for JF, NS, and SM; P = 0.002 for TH).

Inverted and scrambled face controls.

Each subject participated in two additional control experiments to test whether the influence of the crowded central faces on the perceived expressions of the surrounding sets took place at the level of holistic, object-level information rather than at the level of basic visual features. We repeated the main experiment, this time inverting the central faces or scrambling the central faces as depicted at the top of Fig. 3. Each is a common method for disrupting configural processing of objects while leaving low-level stimulus features intact (Moscovitch et al. 1997; Yin 1969). If the influence of the crowded central faces that we found in the main experiment was based on low-level stimulus properties such as luminance or orientation rather than face recognition, we would expect to find a similar effect after inverting or scrambling the central faces. In fact, Fig. 3B shows that the influence of the central faces on the perceived expressions of the surrounding sets was eliminated when the central faces were inverted (JF: P = 0.73; NS: P = 0.86; SM: P = 0.32; TH: P = 0.41). Likewise, disrupting the configural arrangement of the facial features by scrambling their positions eliminated the effect of the central faces (Fig. 3C; JF: P = 0.24; NS: P = 0.66; SM: P = 0.22; TH: P = 0.16).

We performed an additional control in two subjects in which we repeated all three versions of the experiment [upright, inverted (invert), and scrambled (scramb) central faces], this time separately determining THR₇₅ for inverted and scrambled faces such that they were presented at an emotional separation that was just as discriminable as upright faces when viewed in isolation. Data from the upright central faces replicated the results of the main experiment, showing a significant influence on the perceived set expressions (Fig. 4A; JF: P = 0.001, JS: P = 0.002). However, despite being readily discriminable when viewed in isolation, the inverted and scrambled central faces still failed to influence the perceived expressions of the surrounding sets (Fig. 4, B and C; JF_invert: P = 0.64, JF_scramb: P = 0.84, JS_invert: P = 0.29, JS_scramb: P = 0.15). Thus the influence of the crowded central faces hinges on them containing upright features that are arranged appropriately, such that configural or holistic face processing is possible. We conducted an additional control experiment in one subject in which both the flankers and central faces were inverted and again found no influence of the inverted central faces on the perceived set expressions (P = 0.22).

During these control runs, we also tracked subjects' eye positions to ensure that eye movements were not responsible for the results. The subjects maintained good fixation throughout all runs (example eye traces are shown in Fig. 4D) and showed the same pattern of results as in the main experiment.

Accounting for incomplete crowding in the main experiment.

A potential concern is that if crowding was incomplete on some trials in the main experiment, individual access to the central faces on those trials might have influenced subjects' responses regarding the average set expressions. Whereas subject NS was at chance performance when discriminating the crowded central faces (Fig. 2), the other three subjects performed slightly but significantly above chance, indicating that they had individual access to the central faces on some trials. We tested whether these cases in which crowding was incomplete could have been sufficient to explain the influence of the central faces on the perceived set expression that we found.

We simulated the scenario in which subjects averaged the central faces with the surrounding flankers when crowding was incomplete but did not incorporate the expressions of the central faces on the remaining trials (see methods for details). We used subjects' performance in discriminating the crowded central faces to estimate the maximum number of trials in which crowding might have broken down during the set mean discrimination runs. We found that the resulting shifts in subjects' psychometric functions would be much smaller than those that we actually measured in the main experiment (all P values ≥ 0.34); that is, crowding did not fail on enough trials to produce an effect of the size that we found. We extended this analysis to include the possibility that subjects weighted the central faces more heavily than the surrounding faces on trials where crowding failed. A Monte Carlo simulation depicted in Fig. 5 and detailed in methods showed that the outcome of such a strategy is inconsistent with our results. If subjects based their judgments solely on the central faces whenever crowding failed, such a strategy could produce shifts in the psychometric functions of the size that we found in the main experiment, but this strategy would also yield substantially flattened psychometric functions as a result of ignoring the flanking faces on some trials (Fig. 5B; recall that in Fig. 3, the abscissa represents the relative means of the flanking faces, not including the central faces). All subjects' actual performance fell significantly outside the range of expected performance based on such a weighting strategy (all P values ≤ 0.0037). 
Similarly, over a range of possible weightings of the central faces from 0 to 100%, we found that no weighting would have produced psychometric functions that were simultaneously as steep and shifted as those we measured in the main experiment (actual psychometric functions fell outside the clouds of simulated psychometric functions; all P values ≤ 0.0022).

We also considered the possibility of partial crowding: perhaps some features from the target faces consistently leaked through the influence of crowding but not always to a degree that would allow for correct discrimination of the central faces. It has been shown that some performance variability in crowding tasks can be attributed to idiosyncratic but consistent differences in the difficulty of crowding with different target-distractor configurations (Dakin et al. 2009). If there was any consistency to the kinds of features that might leak through crowding, then the trials that subjects got correct most often in the crowded target discrimination task should have been more likely to show an effect of the central faces in the ensemble discrimination task. Because the stimuli were the same for the crowded face discrimination and ensemble discrimination tasks, we were able to test whether the stimuli that subjects tended to get correct when discriminating the central faces also tended to drive the effect of the central faces in the ensemble discrimination experiment. That is, did variability in the crowding effect correlate with variability in the ensemble discrimination performance across the same stimuli? For each ensemble discrimination trial, we searched within the crowded face discrimination trials to find matching trials: those trials with the same flanker means (the emotional separation between the central faces was always the same, set at THR₇₅ for each subject). For each trial in the ensemble discrimination task, we computed percent correct for the matching trials in the crowded face discrimination task and also recorded whether the subject's response when comparing the ensemble expressions was consistent with the relative emotions of the central faces. 
We correlated these two measures with each other across trials to test whether a trial's difficulty in the target discrimination task was predictive of whether the subject was likely to show a bias toward using the central face in the ensemble discrimination task. This correlation was not significant in any subject (all P values ≥ 0.14); crowding difficulty did not predict whether the central faces influenced the perceived set expressions on a trial-by-trial basis.
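The trial-matching analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes each trial is stored as a (flanker means, outcome) pair. It uses a Pearson correlation between percent correct and the binary consistency score, which is equivalent to a point-biserial correlation. The source does not name the correlation statistic, so that choice is an assumption, and all function names, variable names, and toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation; with binary ys this equals the point-biserial r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def trialwise_measures(crowded_trials, ensemble_trials):
    """Pair each ensemble trial with its matching crowded-discrimination trials.

    crowded_trials:  iterable of (flanker_means, correct) with correct in {0, 1}
    ensemble_trials: iterable of (flanker_means, consistent), where consistent
                     is 1 if the ensemble response agreed with the relative
                     emotions of the central faces, else 0
    """
    # Group crowded-task outcomes by flanker configuration (the matching key).
    outcomes_by_config = defaultdict(list)
    for flanker_means, correct in crowded_trials:
        outcomes_by_config[flanker_means].append(correct)

    pct_correct, consistent = [], []
    for flanker_means, is_consistent in ensemble_trials:
        matches = outcomes_by_config.get(flanker_means)
        if not matches:
            continue  # no matching crowded trials for this configuration
        pct_correct.append(100.0 * sum(matches) / len(matches))
        consistent.append(1 if is_consistent else 0)
    return pct_correct, consistent


# Hypothetical toy data: flanker means are (left set, right set) in morph units.
crowded = [((10, 20), 1), ((10, 20), 0),
           ((30, 40), 1), ((30, 40), 1),
           ((15, 35), 0), ((15, 35), 0)]
ensemble = [((10, 20), 1), ((30, 40), 0), ((15, 35), 1), ((50, 60), 1)]

pct, cons = trialwise_measures(crowded, ensemble)
r = pearson_r(pct, cons)
print(pct, cons, round(r, 3))
```

A correlation near zero across trials, as reported above, would indicate that the stimuli that were easiest in the crowded discrimination task were not the ones driving the central-face effect in the ensemble task. A significance test for r is omitted here because the source does not say how the P values were computed.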

The results of the Monte Carlo simulations and partial crowding simulation suggest that partial crowding would not explain the results in Figs. 3 and 4. Nevertheless, we sought a more direct test of this possibility by employing a dual task in which we measured crowding and ensemble perception within the same trials.

Dual-task experiment.

We conducted a dual-task control experiment to test for an influence of the crowded central faces on the perceived set expressions on a within-trial basis (see methods). Subjects judged both the relative set expressions and the orientations of the central faces on each trial. We retained for further analysis only the trials in which subjects misreported the orientation of the central faces.

Note that the orientation task provided a stringent test of whether the central faces were successfully crowded. In a separate control experiment, we verified that failure at the upright/inverted task was a good indicator that the expressions of the central faces were crowded as well: when subjects judged both the orientation and the relative expressions of the central faces, performance on the expression discrimination was at chance within trials in which the upright/inverted judgment was incorrect (JF: 47.2% correct, GM: 47.1% correct, TS: 43.1% correct; none significantly different from chance). That is, when subjects incorrectly identified target face orientation, they were unable to report the target expression above chance.
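One way to check whether a percent-correct value of this kind differs from chance is an exact two-sided binomial test. The sketch below illustrates that idea only. The source reports the percentages but not the number of trials or the test used, so the trial counts here are hypothetical and the choice of test is an assumption.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial P value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the null success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


# Hypothetical counts: 34 correct out of 72 trials (about 47.2% correct).
n_trials, n_correct = 72, 34
p_value = binom_two_sided_p(n_correct, n_trials)
print(f"{100 * n_correct / n_trials:.1f}% correct, two-sided P = {p_value:.3f}")
```

With these illustrative counts the P value is well above 0.05, which is the pattern one would expect for performance that is indistinguishable from chance.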

Figure 6 shows the results of the dual-task experiment. On trials in which the subject incorrectly reported upright central faces as inverted, the central faces nonetheless had a significant influence on the perceived set expressions (Fig. 6A; GM: z = 2.04, P = 0.042; JF: z = 3.10, P = 0.002; TS: z = 3.30, P = 0.002). On the other hand, when inverted central faces were misreported as upright, they had no effect on the perceived average set expressions (Fig. 6B; all P values ≥ 0.38). Thus, even within the same trials in which subjects could not identify the orientation of the central faces, those faces nonetheless contributed holistic information to the perceived expression of the surrounding sets.

Fig. 6. — Dual-task experiment. Subjects viewed stimuli similar to those in the main experiment, this time judging both the relative average expressions of the 2 sets and whether the central faces were upright or inverted. We selected only the trials in which the subject responded incorrectly to the upright/inverted question for further analysis; on these trials, subjects could not individually access the central faces sufficiently to identify their orientations. The left and right flanking sets always had the same average expression, so any overall difference in the reported relative expressions of the sets was due to an influence of the central faces. A: when central faces were upright but incorrectly reported as inverted, they nonetheless had a significant influence on the perceived average expressions of the surrounding sets (GM: z = 2.04, P = 0.042; JF: z = 3.10, P = 0.002; TS: z = 3.30, P = 0.002). B: on the other hand, inverted faces that were misreported as upright had no effect on the perceived set expressions (all P values ≥ 0.38). Error bars are ±1 SE. *Significant at α = 0.05. n.s., Not significant.

DISCUSSION

Our results show that crowding does not irreversibly dismantle or destroy the “objectness” of an object. Despite the fact that subjects could not explicitly identify the central faces, the visual system had access to precise information about the expressions of the crowded faces when generating average representations of the sets. Crowding may be an essential bottleneck for object recognition, but it does not break down object processing itself.

The present results suggest that the purpose of object processing is not solely for the sake of perceiving individual objects. Indeed, emerging literature on ensemble coding is revealing that the brain can efficiently compute summary information about groups of visual objects at all levels of complexity (Alvarez and Oliva 2009; Ariely 2001; Chong and Treisman 2003; Haberman and Whitney 2007; Oliva 2005). These high-level textures may provide a rapid sense of the objects that surround us, constituting the initial perception of a scene before we selectively attend to or perceive the individual constituent objects (Hochstein and Ahissar 2002).

An example where such a tradeoff between the perception of individual objects and the perception of object textures may be particularly apparent is in the phenomenon of “mindsight” (Rensink 2004). Mindsight refers to the ability of observers to detect accurately a change in a scene without being able to locate where the change occurred. Our results provide a reasonable mechanism for mindsight: if many of the particular objects in the scene are crowded from recognition, an observer may be unable to report which object among many crowded ones has changed. However, since those crowded objects nonetheless contribute to the scene gist, the presence of a change in an individual object could be detected simply by being aware that some ensemble information has changed.

Our study is similar in experimental design to an important study conducted by Parkes et al. (2001) but addresses a very different question. They presented arrays of oriented gratings in the periphery and found that subjects perceived the average orientation of the arrays in a manner consistent with a compulsory averaging of the crowded orientation information, thus uncovering a link between crowding and texture processing. Greenwood et al. (2009) reported similar findings with averaging of crowded positional information. Critically, both of these studies found averaging of basic visual features during crowding. Although the results of neither study make a direct prediction about the fate of higher-level objects (e.g., faces) during crowding, based on the compulsory integration of basic visual features that occurs at a crowded location, it is reasonable to think that crowding would render further object processing impossible. This is consistent with the commonly held view that crowding dismantles objects to their component features (Levi 2008) and that those features are then available as a texture (Parkes et al. 2001). Our results, based on the presentation of high-level stimuli rather than basic visual features, show that object-level information in fact still exists in neural representations of the crowded location.

Our results demonstrate that although crowding is the fundamental bottleneck on conscious object recognition (Levi 2008), high-level holistic or configural information about objects is not destroyed or broken down to low-level features. The visual system maintains precise high-level object representations even in the crowd.

GRANTS

This work was supported by National Eye Institute Grants T32-EY-015387 (J. Fischer) and EY-018216 (D. Whitney) and National Science Foundation Grant 0748689 (D. Whitney).

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the author(s).

ACKNOWLEDGMENTS

We thank Jason Haberman for providing the morphed face images used in the main experiment.

REFERENCES

Alvarez GA, Oliva A. Spatial ensemble statistics are efficient codes that can be represented with reduced attention. Proc Natl Acad Sci USA 106: 7345–7350, 2009
Ariely D. Seeing sets: representation by statistical properties. Psychol Sci 12: 157–162, 2001
Bouma H. Interaction effects in parafoveal letter recognition. Nature 226: 177–178, 1970
Chong SC, Treisman A. Representation of statistical properties. Vision Res 43: 393–404, 2003
Dakin SC, Bex PJ, Cass JR, Watt RJ. Dissociable effects of attention and crowding on orientation averaging. J Vis 9: 28.1–16, 2009
de Fockert J, Wolfenstein C. Rapid extraction of mean identity from sets of faces. Q J Exp Psychol (Colchester) 62: 1716–1722, 2009
Ekman P, Friesen WV. Pictures of Facial Affect. Palo Alto, CA: Consulting Psychologists Press, 1976
Farzin F, Rivera SM, Whitney D. Holistic crowding of Mooney faces. J Vis 9: 18.1–15, 2009
Flom MC, Heath GG, Takahashi E. Contour interaction and visual resolution: contralateral effects. Science 142: 979–980, 1963
Greenwood JA, Bex PJ, Dakin SC. Positional averaging explains crowding with letter-like stimuli. Proc Natl Acad Sci USA 106: 13130–13135, 2009
Haberman J, Whitney D. Rapid extraction of mean emotion and gender from sets of faces. Curr Biol 17: R751–R753, 2007
He S, Cavanagh P, Intriligator J. Attentional resolution and the locus of visual awareness. Nature 383: 334–337, 1996
Hochstein S, Ahissar M. View from the top: hierarchies and reverse hierarchies in the visual system. Neuron 36: 791–804, 2002
Intriligator J, Cavanagh P. The spatial resolution of visual attention. Cogn Psychol 43: 171–216, 2001
Korte W. Über die Gestaltauffassung im indirekten Sehen. Zeitschrift für Psychologie 93: 17–82, 1923
Levi DM. Crowding–an essential bottleneck for object recognition: a mini-review. Vision Res 48: 635–654, 2008
Levi DM, Waugh SJ. Spatial scale shifts in peripheral vernier acuity. Vision Res 34: 2215–2238, 1994
Louie EG, Bressler DW, Whitney D. Holistic crowding: selective interference between configural representations of faces in crowded scenes. J Vis 7: 24.1–11, 2007
Maurer D, Grand RL, Mondloch CJ. The many faces of configural processing. Trends Cogn Sci 6: 255–260, 2002
McKone E. Isolating the special component of face recognition: peripheral identification and a Mooney face. J Exp Psychol Learn Mem Cogn 30: 181–197, 2004
Moscovitch M, Winocur G, Behrmann M. What is special about face recognition? Nineteen experiments on a person with visual object agnosia and dyslexia but normal face recognition. J Cogn Neurosci 9: 555–604, 1997
Oliva A. Gist of the scene. In: The Encyclopedia of Neurobiology of Attention, edited by Itti L, Rees G, Tsotsos JK. San Diego, CA: Elsevier, 2005, p. 251–256
Parkes L, Lund J, Angelucci A, Solomon JA, Morgan M. Compulsory averaging of crowded orientation signals in human vision. Nat Neurosci 4: 739–744, 2001
Pelli DG, Palomares M, Majaj NJ. Crowding is unlike ordinary masking: distinguishing feature integration from detection. J Vis 4: 1136–1169, 2004
Pelli DG, Tillman KA. The uncrowded window of object recognition. Nat Neurosci 11: 1129–1135, 2008
Rensink RA. Visual sensing without seeing. Psychol Sci 15: 27–32, 2004
Sweeny TD, Grabowecky M, Paller K, Suzuki S. Within-hemifield perceptual averaging of facial expressions predicted by neural averaging. J Vis 9: 1–11, 2009
Whitney D, Levi DM. Visual crowding: a fundamental limit on conscious perception and object recognition. Trends Cogn Sci 15: 160–168, 2011
Wichmann FA, Hill NJ. The psychometric function: II. Bootstrap-based confidence intervals and sampling. Percept Psychophys 63: 1314–1329, 2001
Yin RK. Looking at upside-down faces. J Exp Psychol 81: 141–145, 1969

