Author manuscript; available in PMC: 2010 Mar 1.
Published in final edited form as: J Exp Psychol Hum Percept Perform. 2009 Jun;35(3):718–734. doi: 10.1037/a0013899

Seeing the mean: Ensemble coding for sets of faces

Jason Haberman 1,2,*, David Whitney 1,2
PMCID: PMC2696629  NIHMSID: NIHMS93131  PMID: 19485687

Abstract

We frequently encounter groups of similar objects in our visual environment: a bed of flowers, a basket of oranges, a crowd of people. How does the visual system process such redundancy? Research shows that, rather than coding every element comprising a texture, the visual system favors a statistical summary representation of all the elements. Although this ‘ensemble coding’ may facilitate low-level texture perception, we demonstrate that ensemble coding also occurs for faces, a level of processing well beyond that of textures. Observers viewed sets of (4 to 16) faces varying in emotionality (e.g. happy to sad) and made judgments about the mean emotion of each set. Although observers retained little information about the emotionality of the individual set members, they had a remarkably precise representation of the mean emotion. Observers continued to accurately discriminate the mean emotion even when they viewed sets of 16 faces for 500 ms or less. Modeling revealed that observers’ ability to perceive ensemble facial expression in groups of faces is not due to noisy representation or noisy discrimination. These findings support the hypothesis that ensemble coding occurs extremely rapidly at multiple levels of visual analysis.

Keywords: ensemble, set perception, visual search, face recognition, summary statistics, perception


Our seamless interaction with our surroundings gives us the impression that we have a complete and accurate representation of the visual world. Well-controlled laboratory experiments, however, have revealed that the visual system samples only sparsely and has limited attentional and short-term memory capacity (Luck & Vogel, 1997; Potter, 1976; Rensink, O’Regan, & Clark, 1997; Scholl & Pylyshyn, 1999; Simons & Levin, 1998). What gives us the impression that we have such a complete representation of the visual world? One possible contribution may lie in the natural design of the environment – it is highly redundant. A field of grass, for example, contains repeating and overlapping features. While we may be able to distinguish one blade of grass from another, coding every blade of grass would be computationally overwhelming and may serve little utility. Rather, what we tend to perceive by default is the whole field, a single texture composed of many blades of grass. This kind of ‘ensemble coding’ reflects an adaptive mechanism that allows for the efficient representation of a large amount of information – so efficient that it has been suggested that this process may be responsible for the ‘illusion of completeness,’ filling in gaps of a visual scene where detailed representations are lacking (Chong & Treisman, 2003).

Ensemble coding, whereby summary statistics are derived from a set of similar items, has been examined for low-level features such as size (Ariely, 2001; Chong & Treisman, 2003, 2005) and orientation (Parkes, Lund, Angelucci, Solomon, & Morgan, 2001). Ariely (2001) demonstrated, for example, that observers precisely extract the mean size from a set of dots varying in size while losing the representation of the individual set constituents. The precision of mean extraction is not significantly compromised by changing the distribution of dot sizes within the set (Chong & Treisman, 2003), suggesting a robust and flexible averaging mechanism. In orientation perception, Parkes et al. (2001) showed that individuals perceive a mean orientation in a set of crowded Gabor patches presented in the periphery, despite being unable to individuate the central target. Observers’ inability to correctly identify the orientation of the target is not due to interference from the crowding flankers. Rather, observers’ responses reflect an implicit pooling of all the elements in the set.

Given the overwhelming influx of information, it is not entirely surprising that the visual system employs an ensemble coding heuristic. Uniform patterns such as dots or lines possess minimal amounts of variance, making it both easy and reasonable to use a single statistic to represent the whole set. By favoring a single summary statistic over a discrete representation for each set constituent, the system dramatically reduces computational load. Such ease of coding may explain why the dominant (and more relevant) percept when viewing a surface is that of a single texture and not a jumble of low-level features. In fact, ensemble coding may actually drive texture perception (Cavanagh, 2001). This does not mean, however, that ensemble coding operates only at mid-level vision (beyond V1 but before higher level object representation; Marr, 1982; Nakayama, He, & Shimojo, 1995). In a previous study, we showed that observers precisely represented the mean emotion of a set of emotionally varying faces, a level of processing well beyond that of surface perception (Haberman & Whitney, 2007).

Our initial findings revealed that ensemble coding is precise, flexible, and occurs for high-level objects like faces. We now further characterize the mechanisms driving ensemble coding. The first two experiments demonstrate that this process occurs implicitly. Using a paradigm similar to Ariely’s (2001), we show that observers unknowingly represent a set of faces using the mean emotion despite unrelated task instructions, and do so even at brief stimulus durations. Modeling confirms that performance cannot be explained by observer discrimination ability, and thus points to an explicit averaging process. The third experiment replicates and extends previous work, demonstrating the precision with which the mean emotion of a set of faces may be represented. We show that observers can discriminate a mean from an array of heterogeneous faces as well as they can discriminate any two individual faces, a surprising level of precision. Experiment 4 shows that, despite a precise mean representation of a set of faces, observers have almost no persistent representation of the individual faces comprising that set. The final two experiments are control experiments showing that this emotion averaging is indeed the result of the high-level properties of face stimuli, and does not simply operate on low-level features. Observers are unable to extract a mean from a set of inverted or scrambled faces as well as they can from upright faces. These experiments converge to suggest that ensemble coding is implicit, fast, and occurs across multiple levels of object complexity.

Experiment 1A

The first experiment tested observers’ knowledge of the individual set members. Despite the instruction to attend to the individual members of the set, we hypothesized that performance on this task would reflect a bias to represent sets of faces using the mean emotion.

Method

Participants

Four individuals (one woman, mean age = 25 yrs) affiliated with the University of California, Davis participated. Informed consent was obtained for all volunteers, who were compensated for their time and had normal or corrected-to-normal vision.

Stimuli

We generated a set of 50 faces by ‘morphing’ (Morph 2.5, 1998) between two emotionally extreme faces of the same person, taken from the Ekman gallery (Ekman & Friesen, 1976). The emotional expression among the faces ranged from happy to sad (or neutral to disgusted), with face number one being the happiest. Morphed faces were nominally separated from one another by emotional units (e.g. face one was one emotional unit happier than face two). The larger the separation between any two faces, the easier they should be to discriminate (we tested this in Experiment 1B).

To create the range of morphs, the starting points of several features (e.g. the corner of the mouth, the bridge of the nose, the center of the eye) on one face are matched to their corresponding end points on the other face. For happy-sad stimuli, 75 points of interest were specified. The program then linearly interpolated the two original Ekman faces, creating 50 separate morphed images (see Figure 1). All face images were gray-scaled (the average face had a 98% max Michelson contrast) and occupied 3.04 × 4.34 degrees of visual angle. The background relative to the average face had a 29% max Michelson contrast.
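The geometric warping described above is specific to the morphing software, but the intensity interpolation it performs between two endpoint images can be illustrated with a minimal sketch. This is an assumption-laden simplification (real morphing also warps the 75 control points; the function and variable names here are ours), treating faces as grayscale NumPy arrays:

```python
import numpy as np

def make_morph_sequence(face_a, face_b, n_steps=50):
    """Linearly interpolate between two grayscale face images.

    Simplified sketch: morphing software such as Morph 2.5 also warps
    geometry via matched control points; here we only blend intensities."""
    faces = []
    for i in range(n_steps):
        w = i / (n_steps - 1)          # 0.0 (pure A) ... 1.0 (pure B)
        faces.append((1 - w) * face_a + w * face_b)
    return faces

# toy 2x2 "images" standing in for the two emotionally extreme faces
a = np.zeros((2, 2))
b = np.ones((2, 2))
seq = make_morph_sequence(a, b, n_steps=50)
```

Each successive image in `seq` is one nominal emotional unit away from its neighbors, which is the sense in which face one is "one emotional unit happier" than face two.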

Figure 1.


The spectrum of face morphs for both happy-sad and neutral-disgusted emotions. There were 50 faces for each emotional range. Observers saw only happy-sad or neutral-disgusted stimuli during a given experiment.

We varied set size among 4, 8, 12, and 16 items, determined randomly on each trial. The faces were presented on the screen in a grid pattern in the following format: a 2 × 2 matrix for set size 4, a 2 × 4 matrix for set size 8 (14.68 × 9.53 deg), a 3 × 4 matrix for set size 12 (14.68 × 14.68 deg), and a 4 × 4 matrix for set size 16 (14.68 × 19.77 deg). Each face was assigned to a random position in the matrix at the start of every trial.

Procedure

On every trial there were four unique emotions displayed in the set, each of which was separated by at least six emotional units, a distance well above observers’ discrimination thresholds (results discussed in Experiment 1B). In a set size of eight there were two instances of each unique emotion, in a set size of twelve there were three instances of each emotion, and in a set size of sixteen, there were four instances of each emotion. The mean emotion of each set was randomly selected at the start of every trial. Once the mean was selected, the four unique emotions comprising the set were selected surrounding the mean: two happier and two sadder. The two happier faces were 3 and 9 units away from the mean, as were the two sadder faces (Figure 2). The mean changed on every trial, but was never a constituent of the set.
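The trial-construction rule just described can be sketched as follows. The sampling margin that keeps every member and test face inside the 50-face morph range is our inference from the design (test faces extend ±15 units from the mean), and the function and variable names are hypothetical:

```python
import random

def build_trial_set(set_size, n_morphs=50):
    """Pick a random mean, place the four unique member emotions at
    -9, -3, +3, and +9 units from it, and replicate them to fill the set.

    Sketch of the paper's design; the margin below is our assumption,
    chosen so members (mean +/- 9) and test faces (mean +/- 15) stay
    within the 1..n_morphs morph range."""
    mean = random.randint(16, n_morphs - 15)
    members = [mean - 9, mean - 3, mean + 3, mean + 9]
    faces = members * (set_size // 4)   # 1-4 instances of each emotion
    random.shuffle(faces)
    return mean, faces
```

Note that the mean of the four members is exactly the selected mean, yet the mean face itself is never a constituent of the set.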

Figure 2.


Task design for Experiment 1A. Observers saw four unique faces, selected based on their emotional distance from the mean emotion, for 2000 ms. Set size varied among 4, 8, 12, and 16 items. Observers had to indicate whether the test face was a member of the previously displayed set. The test face could be any of the distances indicated by the circles (numbers were not seen by participants).

The set was displayed for 2000 ms and was immediately followed by a single test face (0 ISI), which could either be a member or a non-member of the preceding set. Each non-member test face was at least three units away from a member face (Figure 2). The full range of potential test faces was from 15 units below the mean to 15 units above the mean. Observers were instructed to indicate with a key press whether the test face was a member of the preceding set (a yes-no task; Figure 2). The test face remained on the screen until a response was received.

For each of the 4 possible set sizes, there were 11 possible test faces (four of which were members of the preceding set) and 5 trials for each of these test faces, for a total of 220 trials per run. Observers performed 4 runs for 880 trials total.

Results and Discussion

Figure 3A depicts the percentage of “yes” responses for each observer, collapsed across set size. A “yes” response indicates that the observer thought that the test face was a member of the previously presented set. The x-axis depicts the separation of the test face from the mean of the set in emotional units (i.e. the mean changed from trial to trial, but this graph represents performance collapsed across all means). For all four observers, the probability of a “yes” response was substantially lower when the test face fell near the edge or outside of the set range, demonstrating sensitivity to the emotional range of the set. More importantly, the probability of responding “yes” increased as the test face approached the emotional mean of the set, even though the emotional mean was never actually presented in the original set. Despite the instruction to attend to the individual set members, observers’ responses revealed a bias toward the mean emotion of each set. This is consistent with the findings of Ariely (2001), who demonstrated that observers unintentionally represented the average size of a set of dots. Our results suggest that observers implicitly extracted the mean of a set of faces on a trial-by-trial basis.

Figure 3.


Probability of making a ‘yes’ response for each subject collapsed across set size (A) and broken down by set size (B). A ‘yes’ response indicates that the observer believed the test face was a member of the preceding set. Probability of making a ‘yes’ response peaked when the test face corresponded to the mean emotion of the set. (B) There were no systematic differences in probability of making a ‘yes’ response as a function of set size.

Observers did not behave as ideal observers. An ideal observer’s probability of ‘yes’ responses would follow a saw-toothed function (perfect accuracy: responding ‘yes’ only to the four member faces and ‘no’ everywhere else). Figure 3A clearly demonstrates that this was not the case.

Experiment 1B

Is it possible that participants were simply noisy observers? Specifically, could noise at the perceptual, decision, or response stage produce something that looks like the data in Figure 3? To test this, we ran multiple simulations in which we convolved the expected performance of an ideal observer (i.e. a saw-toothed function) with observer discrimination ability. If discrimination ability determined performance on the yes/no set membership task, the resulting convolution should resemble observer performance. To create this convolution, however, we first had to determine each observer’s discrimination performance, nominally referred to as homogeneous discrimination.

Procedure

Each trial consisted of two intervals: a set of four identical faces simultaneously displayed for 2000 ms in a grid pattern (6.94 × 9.53 degrees), immediately followed by a single test face displayed in the center of the screen (Figure 4A). The test face remained on the screen until a response was received. The emotionality of the set was randomly selected from the gallery of morphed faces. The subsequent test face was happier or sadder than the set by ±1–6 emotional units. In a two-alternative forced-choice (2AFC) task using the method of constant stimuli, observers were asked to indicate with a key press whether the test face was happier or sadder than the set of identical faces. Each run consisted of 20 trials at each of the six levels of separation, for a total of 120 trials. Observers performed eight runs over two testing sessions for a total of 960 trials. Thus, 160 judgments were made at each of the six levels of separation. A logistic psychometric function was fit to the data using the Psignifit toolbox version 2.5.6 for Matlab (see http://bootstrap-software.org/psignifit/). Confidence intervals were derived using the bias-corrected accelerated bootstrap method based on 5,000 simulations, also implemented by Psignifit (Wichmann & Hill, 2001a, 2001b).
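The logic of fitting a logistic psychometric function and reading off a 75% threshold can be sketched without the Psignifit toolbox. The crude grid-search fit below is only a stand-in for Psignifit's maximum-likelihood machinery, and the synthetic observer is noiseless for illustration; parameter names are ours:

```python
import numpy as np

def logistic(x, alpha, beta):
    """P('test happier') as a function of signed separation x (emotional
    units); alpha is the 50% point, beta the slope."""
    return 1.0 / (1.0 + np.exp(-beta * (x - alpha)))

def threshold_75(alpha, beta):
    """Separation giving 75% correct: solve 0.75 = logistic(x), which
    gives x = alpha + ln(3)/beta."""
    return alpha + np.log(3.0) / beta

def fit_logistic(x, p_obs):
    """Least-squares grid search over (alpha, beta) - a crude stand-in
    for the toolbox fit used in the paper."""
    best = None
    for alpha in np.linspace(-2.0, 2.0, 81):
        for beta in np.linspace(0.1, 2.0, 96):
            err = np.sum((logistic(x, alpha, beta) - p_obs) ** 2)
            if best is None or err < best[0]:
                best = (err, alpha, beta)
    return best[1], best[2]

x = np.array([-6, -4, -2, 2, 4, 6], dtype=float)   # tested separations
p_obs = logistic(x, 0.0, 0.4)                      # noiseless synthetic data
alpha_hat, beta_hat = fit_logistic(x, p_obs)
```

With real response proportions in place of `p_obs`, `threshold_75(alpha_hat, beta_hat)` corresponds to the 75% correct thresholds reported in Figure 5C.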

Figure 4.


Task design for Experiment 1B. (A) Observers saw four instances of a randomly selected face displayed on the screen for 2000 ms. This was immediately followed by a single test face. Observers had to determine whether the set or the test face was happier. (B) Same task as in Figure 4A, except the stimuli were neutral-disgusted morphs. Separations between set and test were doubled relative to the happy-sad condition.

We also derived threshold estimates for neutral-disgusted morphs. Three observers (two women, mean age = 20.33 yrs) performed four runs of the same task described above using the alternate emotional stimuli (Figure 4B). The separation between set emotion and test emotion was increased to ± 2–12 units, with increments of two units between test conditions.

Results and Discussion

For each observer, we identified 75% correct discrimination in terms of units of emotional separation between set and test. Figure 5 shows the psychometric functions for all observers. 75% correct thresholds were comparable for all observers in the happy-sad morph condition (KS: 5.3, FF: 3.8, JH: 2.6, DH: 4.4) as well as in the neutral-disgusted morph condition (KS: 4.4, JSH: 7.4, AC: 5.8). The results here reveal the precision with which observers could discriminate any two of the morphed faces (homogeneous discrimination ability). This information is critical for our modeling procedures, described below.

Figure 5.


Results of Experiment 1B. (A-B) Psychometric functions for all observers. (C) 75% thresholds for each observer in the happy-sad condition and the neutral-disgusted condition. Error bars in (A–C) are 95% confidence intervals derived from 5000 bootstrap simulations (Wichmann & Hill, 2001a, 2001b). For fitting purposes, we included a point at 0 separation between set and test (chance performance), which does not appear in the graph.

Modeling Procedure and Results

To test whether discrimination performance could predict yes/no set membership performance (i.e. whether performance was simply due to observer noise), we convolved the performance of an ideal observer with that of the poorest discriminator of happy-sad morphs (Figure 5A, observer KS) as well as with that of the most sensitive discriminator of happy-sad morphs (Figure 5A, observer JH). If the convolutions for these two observers mimic their yes/no set membership data, then performance on the task may be attributed to observer noise. If, however, this convolution does not replicate their yes/no set membership results, then performance may be attributed to an implicit representation of the mean emotion of each set.

As observer noise increases (i.e. discrimination ability worsens), the expected performance (i.e. the convolution) on the yes/no membership task should begin to look more like what we observed in Figure 3A – a Gaussian-shaped response probability distribution, albeit wider than the observed data. As noise decreases, the expected performance should begin to resemble a saw-toothed function. Figure 6 shows KS’s and JH’s actual yes/no membership data and the modeled data (the ideal observer convolved with each observer’s psychometric homogeneous discrimination function from Figure 5A).
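The noisy-observer convolution can be sketched concretely. Here the discrimination function is approximated by a Gaussian kernel whose width stands in for discrimination ability (an assumption; the paper convolves with each observer's measured psychometric function), and member positions follow the ±3/±9-unit design:

```python
import numpy as np

# Ideal observer on the yes/no membership task: responds "yes" only when
# the test face exactly matches one of the four set members, which sat at
# -9, -3, +3, and +9 emotional units from the set mean (a saw-toothed profile).
offsets = np.arange(-15, 16)                      # test-face distance from mean
ideal = np.isin(offsets, [-9, -3, 3, 9]).astype(float)

def noisy_observer(ideal_curve, sigma):
    """Convolve the ideal saw-tooth with a Gaussian kernel whose width
    (sigma, in emotional units) stands in for discrimination noise."""
    kernel = np.exp(-0.5 * (np.arange(-15, 16) / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(ideal_curve, kernel, mode="same")

sharp = noisy_observer(ideal, sigma=0.5)    # near-ideal discriminator
blurred = noisy_observer(ideal, sigma=6.0)  # poor discriminator
```

With large sigma the four member peaks merge into one broad bump centered on the (never-shown) mean; with small sigma the saw-tooth survives. Neither regime reproduces the narrow, mean-centered curves observers actually produced, which is the core of the argument against the noisy-observer account.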

Figure 6.


(A) Comparison of expected performance for KS on the yes/no membership task (happy-sad stimuli) to her actual performance. The triangles are her discrimination performance (Experiment 1B) convolved with the performance of an ideal observer in the yes/no membership task (noisy observer model); this reflects expected performance on the yes/no membership task. The circles indicate her actual performance. The width of the Gaussian curve fit to the actual yes/no membership data reveals a greater level of mean precision than expected based on her discrimination performance alone (narrower fitted Gaussian curve). This suggests that KS’s relatively poor face discrimination ability (from Figure 5A) cannot account for her actual sensitivity to the mean emotion of a set of faces. (B) Comparison of expected performance for JH on the yes/no membership task (happy-sad stimuli) to his actual performance. The triangles represent JH’s discrimination performance (Figure 5A) convolved with the performance of an ideal observer in the yes/no membership task. A boxcar function approximates the simulated data better than a Gaussian. However, JH’s actual yes/no membership data are better captured by a Gaussian. Therefore, JH’s relatively precise face discrimination (Figure 5A) cannot account for his actual sensitivity to the mean emotion of a set of faces.

We fit Gaussian curves to both the modeled (convolved) data and to KS’s and JH’s observed yes/no membership data (Figure 6). We selected a Gaussian because the data looked roughly normally distributed, but this particular function is not critical for our conclusions. Any symmetrical function with a central peak would have adequately fit these data. The Gaussian equation was formalized as a·exp(−[(x − b)/c]²), where a is the amplitude, b is the phase, and c is the full-width at 75% maximum (nominally referred to as ‘width’). The parameter of most interest was the width of the Gaussian fit, as this parameter reveals the precision of mean representation—the narrower the width, the more precise the representation. We used a conservative approach and fixed the amplitude of each Gaussian to the average amplitude of KS’s observed yes/no membership data (Figure 3A) and her convolved data. With amplitude fixed, only two parameters were free to vary: curve width and phase. Figure 6 clearly shows that the Gaussian curve fit to KS’s probability of yes responses (Experiment 1A data, solid line) was substantially narrower than the noisy observer model (dashed curve), suggesting her representation of the set mean was more precise than would be predicted by her discrimination ability. To statistically test the difference between the width parameter estimates of the two curves, we fixed the width parameter of KS’s convolved data to the width of her observed data and refit the Gaussian (now with two freely varying parameters, amplitude and phase, and fixed width). If there were a substantial decline in the goodness-of-fit between the original convolved Gaussian and the two-parameter Gaussian, then the two width parameters from the observed and convolved curves would be considered significantly different from one another (i.e. one would be significantly wider than the other).
There was a statistically significant difference in the quality-of-fit between these two models (F(1,8) = 31.01, p < .005), suggesting that KS’s face discrimination ability (Experiment 1B) could not account for the observed pattern of her responses in the set membership experiment (Experiment 1A).
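The comparison of the fixed-width and free-width fits is an extra-sum-of-squares F-test between nested least-squares models. A generic sketch follows; the residual sums of squares and degrees of freedom below are illustrative numbers, not the paper's values:

```python
def nested_f_test(ss_restricted, ss_full, df_restricted, df_full):
    """F statistic for comparing two nested least-squares fits.

    The restricted model (e.g. width fixed to the other curve's width) has
    fewer free parameters, hence more residual degrees of freedom. A large
    F means that fixing the parameter significantly worsened the fit."""
    numerator = (ss_restricted - ss_full) / (df_restricted - df_full)
    denominator = ss_full / df_full
    return numerator / denominator

# illustrative values only: restricting one parameter doubles the residual SS
f_stat = nested_f_test(ss_restricted=2.0, ss_full=1.0,
                       df_restricted=9, df_full=8)
```

The resulting F is referred to an F distribution with (df_restricted − df_full, df_full) degrees of freedom, matching the F(1, 8) form reported in the text.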

JH was the most precise discriminator of happy-sad face expression (Figure 5A). Convolving his discrimination function with an ideal observer reveals a pattern resembling a saw-toothed function (triangles in Figure 6B). As opposed to KS’s data, a Gaussian may not even be the appropriate function to use, making parameter comparisons between convolved and observed data moot. We therefore compared the quality-of-fit of a Gaussian distribution (a three-parameter model) to the quality-of-fit of a boxcar (a two-parameter model). A boxcar distribution would suggest insensitivity to the mean, since probability of responding that a test face is a set member would not vary as a function of distance from the mean. In this case, the less complicated model (i.e. the boxcar model with two parameters) fitted the data better (sum-of-squares = 0.08 for Boxcar vs. sum-of-squares = 0.13 for Gaussian). Since the same is not true of JH’s observed yes/no data (first panel from Figure 3A; i.e. a Gaussian describes the data better than a boxcar, F(1,8) = 76.61, p < .0001), we can conclude that JH’s discrimination ability (Experiment 1B) cannot account for his pattern of yes/no membership data (Experiment 1A).
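The boxcar alternative can be made concrete with a short sketch. A boxcar predicts an equal 'yes' rate at every distance within the set range and zero outside it, which is why a better boxcar fit implies insensitivity to the mean; parameter names here are ours:

```python
import numpy as np

def boxcar(x, height, half_width):
    """Two-parameter boxcar: flat 'yes' probability (height) within
    +/- half_width emotional units of the set mean, zero outside -
    i.e. no graded sensitivity to distance from the mean."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= half_width, height, 0.0)

def sum_of_squares(y_obs, y_pred):
    """Residual sum of squares, the quality-of-fit measure compared
    between the boxcar and Gaussian models."""
    return float(np.sum((np.asarray(y_obs) - np.asarray(y_pred)) ** 2))

y = boxcar([-12, -5, 0, 5, 12], height=0.6, half_width=9.0)
```

Fitting both functions to the same response curve and comparing their residual sums of squares reproduces the model comparison described above.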

By systematically varying the slope and threshold of simulated psychometric discrimination functions (like those in Figure 5), we created a family of hypothetical noisy observers. The convolution technique described above was iteratively applied to each of the hypothetical noisy discrimination functions (Figure 7). Interestingly, no realistic level of modeled noise was able to match the performance curve seen in the actual task. As the noise increased, the curves tended to become flatter and wider. While still resembling Gaussian distributions, the precision of predicted mean representation (as reflected by width) did not approach the level of precision in the actual data. As the noise decreased (i.e. the simulated observer became more like an ideal observer), the distribution became less like a Gaussian and more like a saw-toothed function. However, the saw-toothed shape does not resemble the membership data. As was the case for JH’s convolved data (Figure 6B), the saw-toothed function (Figure 7, inverted triangles) is not well modeled by a Gaussian—a boxcar function is a better fit. Expressed another way, the yes/no membership data reveal a very narrow distribution of responses centered at the average facial expression; to produce such a narrow distribution of responses based solely on noisy discrimination would have required a saw-toothed yes/no distribution. Clearly, this did not happen. Therefore, no level of noise in discrimination ability, whether substantial (a very poor discriminator – worse than observer KS) or minimal (close to an ideal observer – better than observer JH), could explain the precision of the mean representation found in observers’ data. These simulations suggest that the ‘noisy observer’ hypothesis cannot account for the yes/no membership data.

Figure 7.


A family of simulated data sets based on various levels of face discrimination ability. Each curve was generated by convolving ideal observer performance with some degree of discrimination performance (starting with KS’s discrimination data in Figure 5A). The filled-in circles represent KS’s actual yes/no membership performance. To generate the family of models, noise was increased or decreased by parametrically multiplying the x-axis of KS’s discrimination data (Figure 5A) by one of several different gain values. Increasing noise (making the simulated observer a less precise face discriminator) increased curve width and flattened its overall appearance (diamonds), while decreasing noise (making the simulated observer a more precise face discriminator) created a curve that looks more like that of an ideal observer (a saw-toothed function; triangles). This simulation demonstrates that KS’s actual yes/no membership performance (solid circles) cannot be generated through the direct manipulation of discrimination noise. The legend shows 75% thresholds (cf. Figure 5A) in order of increasing noise; the threshold of 5.3 (open circles) was KS’s discrimination threshold from Figure 5A.

We also examined performance as a function of set size, depicted in Figure 3B. We should note that viewing 4 iterations of 4 emotionally varying faces (set size 16) is not the same as viewing only 4 faces. Increasing the number of items effectively increases processing load. The greater the number of faces in a set (even if there are duplicates), the greater the number of faces one must track to obtain the same level of performance seen for set size of 4. That said, it is clear that there are no systematic differences in the yes/no membership data as a function of set size, consistent with the findings of Ariely (2001). Therefore, within at least the range of 4 to 16 faces, set size does not appear to influence the ability of observers to extract the mean of the set.

The fact that the membership data (e.g., Figures 3, 6, 7) followed a Gaussian rather than a saw-toothed distribution suggests that observers had a poor representation of the individual emotions present in the set of faces, and hints at the possibility that individuals are implicitly extracting a mean representation of emotion. This occurs rapidly and flexibly, since observers perceived a different set mean on every trial, and could do so even with 16 faces on the screen.

Experiment 2

The previous experiment demonstrated that individuals implicitly perceived a mean emotion in a set of heterogeneous faces. Observers were exposed to the sets for 2000 ms. To examine the speed and time course of mean emotion perception, we repeated the yes/no membership experiment, above, and manipulated stimulus duration.

Methods

Participants

Five individuals (three women, mean age = 20.67 yrs) affiliated with the University of California, Davis participated in this experiment. Two of these observers did not participate in the prior experiment. Observer KS viewed both happy-sad morphs and neutral-disgusted morphs, observer FF viewed only happy-sad morphs, and observers AC, TH, and JSH viewed only neutral-disgusted morphs. Informed consent was obtained for all volunteers, who were compensated for their time and had normal or corrected-to-normal vision.

Procedure

Observers performed the same task as described in Experiment 1A, but were exposed to the sets for 2000 ms (as before), 500 ms, or 50 ms. We used a block design for stimulus presentation, such that observers were tested at a single duration for the entirety of a run. For this follow-up study, the five observers ran three runs (660 trials) at each duration. Set size was evenly divided between 4 and 16 faces, presented in random order.

Results and Discussion

As in Experiment 1A, we examined proportion of ‘yes’ responses, an indication that the observer thought the test face was a member of the preceding set. To quantify the effect of set duration on mean representation, we fit a Gaussian curve to the probability of yes responses for each condition, independently for each observer (Figure 8A shows one representative observer). As described above, curve fitting provides information regarding the precision of observers’ mean extraction ability. For example, a narrower curve and higher amplitude reflect greater precision. Thus, we report full-width at 75% maximum (as before), as well as curve amplitude, as a function of stimulus duration (Table 1). Nearly all Gaussian fits were significant (as indicated by the goodness-of-fit statistic, R2; Table 1).

Figure 8.


Results from Experiment 2 (duration manipulation). (A) Gaussian fits for one representative observer at three durations. The quality of the fit was comparable for each set duration (see Table 1), although there is an effect of set duration (lower amplitude and greater curve width at shorter durations). (B–C) The general trend of curve parameters width and amplitude as a function of duration. Each circle represents one observer’s parameter estimates. A power function best captures these data. (B) As set duration decreased, curve width increased. For set durations of 50 ms, observers had a coarser representation of the mean than at 2000 ms. Similar results were seen for amplitude (C). As set duration decreased, curve amplitude decreased.

Table 1.

Curve parameters for observers at three set durations

Observer R² Width Amplitude
50 ms
Happy-Sad
KS 0.90** 4.90 0.37
FF 0.91** 4.85 0.67
Neutral-Disgusted
KS 0.88** 5.29 0.28
AC 0.74** 6.76 0.42
JSH 0.22 12.21 0.35
TH 0.88** 7.82 0.34

500 ms
Happy-Sad
KS 0.92** 3.91 0.38
FF 0.96** 5.17 0.45
Neutral-Disgusted
KS 0.90** 3.84 0.33
AC 0.54* 8.18 0.43
JSH 0.82** 6.62 0.42
TH 0.87** 5.35 0.39

2000 ms
Happy-Sad
KS 0.97** 3.66 0.39
FF 0.98** 4.62 1.01
Neutral-Disgusted
KS 0.90** 3.55 0.73
AC 0.70** 5.56 0.44
JSH 0.87** 5.23 0.53
TH 0.92** 4.12 0.53
Note. ** indicates significance at p < .005; * indicates significance at p < .05.

The width of the Gaussian curves should approach infinity as set duration approaches zero. Similarly, the amplitude of the curves must approach zero as set duration approaches zero (i.e. response trends become a flat line). Based on these limits, we used a power function [f(x) = ax^b + c] to examine the trends in the parameter estimates as a function of set duration. Figures 8B and 8C show the width and amplitude parameters of the Gaussian curve fits for each observer as a function of set duration. As expected, with decreasing presentation time, there was a significant increase in Gaussian curve width, F(2, 10) = 4.47, p = 0.04, and a significant decrease in curve amplitude, F(2, 10) = 5.33, p = 0.03. This suggests that the precision of the mean emotion representation depends upon set exposure time.
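The limit argument above can be illustrated with the power function itself. The parameter values below are hypothetical, chosen only to show the qualitative trend (with b < 0, width diverges as duration shrinks toward zero and flattens toward the asymptote c at long durations):

```python
import numpy as np

def power_fn(x, a, b, c):
    """f(x) = a * x**b + c. With b < 0, f grows without bound as x -> 0,
    matching the expectation that curve width diverges at zero duration."""
    return a * np.asarray(x, dtype=float) ** b + c

# hypothetical parameters for the width-vs-duration trend (illustrative only)
durations = np.array([50.0, 500.0, 2000.0])        # ms, as in Experiment 2
widths = power_fn(durations, a=40.0, b=-0.5, c=3.0)
```

The same functional form, with the sign of the trend reversed, describes amplitude rising toward an asymptote as duration increases.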

The data in Figure 8A demonstrate that curve width increases and curve amplitude decreases with decreasing set duration. However, the quality of each Gaussian fit (Table 1), reflected by the R2 for each curve, does not substantially decline. The fact that a Gaussian distribution represents these data well at all levels suggests that observers still represented mean facial expression even at 50 ms, albeit more coarsely. A trend of increasing width or decreasing amplitude across set durations simply implies a reduction in the precision of the mean representation with decreasing set duration, not a complete lack of mean representation.

To illustrate this principle more concretely, we examined the one case in which an observer was unable to represent the mean emotion at the 50 ms set exposure. Unlike the other observers, a Gaussian does not adequately capture observer JSH's performance at 50 ms (Table 1). There may be an alternative function that better represents his pattern of performance. If JSH was simply unable to extract anything meaningful from the set in such a short amount of time, a linear function (i.e. a flat line) would fit his data better than a Gaussian. This would suggest that his responses were essentially random, no longer dependent upon the distance of the test face from the set mean. We used Akaike's Information Criterion (AIC) to compare the likelihood of the two models: Gaussian versus linear. In this method of model comparison, a lower AIC indicates a better model fit (Motulsky & Christopoulos, 2003). The AIC value for one model by itself, however, is meaningless without the AIC value for a comparison model. The difference in AIC between the two models is computed, and from this an information ratio (IR) is derived, which reflects the likelihood that one of the two models is correct. For observer JSH at 50 ms, a flat line was more likely the correct model (difference in AIC = 14.37, IR = 26.23; the linear fit was 26.23 times more likely to be correct). This suggests that at 50 ms, JSH was unable to derive a mean from the set of faces. However, this was not the case for the other five observers at the same short duration, whose AICs indicated that a Gaussian curve was likely the better fitting model (KS happy-sad: difference in AIC = 15.68, IR = 2536.04; FF happy-sad: difference in AIC = 17.38, IR = 5952.20; KS neutral-disgusted: difference in AIC = 14.37, IR = 1318.46; AC neutral-disgusted: difference in AIC = 5.46, IR = 15.33; TH neutral-disgusted: difference in AIC = 14.30, IR = 1273.19). Therefore, all subjects but JSH were able to extract a mean representation even at 50 ms, albeit a coarser representation than in the 2000 ms condition.
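The AIC comparison can be made concrete with a short sketch. For least-squares fits, one common form is AIC = n·ln(RSS/n) + 2k, and the information ratio follows from the AIC difference; the residual sums of squares and parameter counts below are illustrative values, not numbers from the paper.

```python
import math

def aic(rss, n_points, k_params):
    # AIC for a least-squares fit: n * ln(RSS / n) + 2k
    return n_points * math.log(rss / n_points) + 2 * k_params

# Hypothetical residual sums of squares for one observer's membership data
n = 21                                    # assumed number of data points
aic_gaussian = aic(0.10, n, k_params=3)   # e.g. amplitude, width, baseline
aic_flat = aic(0.45, n, k_params=1)       # flat line: a single mean level

delta_aic = aic_flat - aic_gaussian
information_ratio = math.exp(delta_aic / 2)
# information_ratio > 1 means the Gaussian is the more likely model;
# e.g. IR = 26.23 would mean it is 26.23 times more likely to be correct.
```

Note that the formula implies IR = exp(ΔAIC / 2), so even modest AIC differences translate into large likelihood ratios, which is why ΔAIC values above ~14 in the text correspond to ratios in the hundreds or thousands.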

Although a Gaussian distribution fits the data better than a flat line for five out of six observers, this does not rule out the possibility that some other function fits better than a Gaussian – perhaps one that does not exhibit a peak at the center of the distribution. Examination of the curves from this experiment, as well as those from Experiment 1A, reveals a precipitous drop in the probability of responding ‘yes’ when the test face falls beyond the emotional range of the set. Is it possible that observers possessed an implicit knowledge of the range of the set rather than the mean? If this were the case, a boxcar function should fit these data better than a Gaussian distribution: the probability of ‘yes’ responses would be minimized and equal for test faces beyond the emotional range of the set, and maximized and equal for test faces within that range (Figure 9). To test this alternative, we computed the information ratio (IR) comparing the fit of a boxcar function to the fit of a Gaussian distribution for each observer at the 50 ms set duration. For five out of six observers, the Gaussian was more likely the better fit than the boxcar function (KS happy-sad: difference in AIC = 5.22, IR = 13.62; FF happy-sad: difference in AIC = 10.17, IR = 161.81; KS neutral-disgusted: difference in AIC = 4.82, IR = 11.14; AC neutral-disgusted: difference in AIC = 1.49, IR = 2.11; TH neutral-disgusted: difference in AIC = 13.06, IR = 686.48). These results indicate that observers were extracting more than just the emotional range of the set of faces.
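The two candidate membership models can be written as simple response-probability functions; the parameter values used below are illustrative, not fitted values from the paper.

```python
import math

def gaussian_model(distance, amplitude, width, baseline=0.0):
    # P('yes') falls off smoothly with the test face's distance
    # from the set mean (distance = 0 at the mean)
    return baseline + amplitude * math.exp(-(distance ** 2) / (2 * width ** 2))

def boxcar_model(distance, p_inside, p_outside, half_range):
    # P('yes') is uniformly high inside the set's emotional range
    # and uniformly low outside it
    return p_inside if abs(distance) <= half_range else p_outside
```

Under the Gaussian model, a test face even slightly off the mean is already less likely to be accepted, whereas under the boxcar model acceptance stays constant until the test face leaves the set's emotional range; the AIC comparison asks which shape better matches observers' yes/no data.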

Figure 9.

Depiction of two possible models describing the membership data for 50 ms duration (from Figure 8). The Gaussian distribution depicts the hypothesis that observers’ yes/no responses are dependent upon the proximity of the test face to the mean emotion of the set. The boxcar alternative suggests that the range of the set influences yes/no responses, where observers are most likely to reject a test face as a set member when it falls beyond the emotional range of the set and most likely to accept a test face as a set member when it falls within the emotional range of the set. The boxcar was the better fitting model for only one observer (JSH).

We may conclude from these results that individuals implicitly perceive a mean emotion across multiple set durations, although this representation becomes noisier as set duration decreases. The fact that the width of the fitted Gaussian curves increased (and amplitude decreased) as set duration decreased implies only a reduction in the overall precision of mean emotion representation, and this is expected (i.e. one cannot represent anything when the set duration is 0). Therefore, observers still extract a coarse representation of the mean in a short amount of time.

Experiment 3

In the previous experiments, observers implicitly perceived the mean emotion of a set of heterogeneous faces even though they were instructed to attend to the individual constituents. We were able to measure the precision of mean face representation by fitting a Gaussian curve to the data at multiple stimulus durations. However, that method does not provide specific thresholds for mean extraction ability. In the following experiment, we explicitly asked observers whether a test face was happier or sadder (or more neutral or more disgusted, depending on task condition) than the mean emotion of the preceding set, thus deriving a concrete assessment of observer precision. This replicates and extends the work of Haberman & Whitney (2007).

Method

Participants

All observers had participated in some or all of the previously described experiments. Four observers viewed happy-sad morphs, three observers viewed neutral-disgusted morphs, and one observer viewed both (in separate runs).

Procedure

We presented sets of 4 and 16 faces. The sequence of events was nearly identical to that of Experiment 2; only the task instructions changed. The test face could be ±5, ±4, ±2, or ±1 units happier or sadder than the mean (or ±10, ±8, ±4, or ±2 units for neutral-disgusted morphs). There were 208 trials per run, with an equal number of trials at each of the two set sizes. Each observer performed at least three runs for a minimum total of 624 trials (78 presentations of each of the eight possible test face separations). Some observers performed four runs. The mean emotion of the set of faces was randomly selected on every trial.

For each observer, separate logistic functions were fit to the homogeneous discrimination data (Experiment 1B) and the mean discrimination data. These two curves were then compared using Monte Carlo simulations in the psignifit toolbox for Matlab (Wichmann & Hill, 2001b), testing the null hypothesis that both curves came from the same underlying distribution. Psignifit created a density plot containing 5000 simulated values corresponding to slope and threshold differences between our curves of interest. Significance regions were derived from this plot, and a p-value was calculated reflecting how aberrant the observed difference was relative to the simulated differences.
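The logic of this null-hypothesis simulation can be sketched without the psignifit toolbox: simulate binomial data from one pooled curve, recompute the threshold difference each time, and ask how extreme the observed difference is. The interpolation-based threshold estimator and the pooled proportions below are our own simplifications (the paper fits logistic functions instead).

```python
import random

def threshold_75(separations, p_correct):
    # crude 75%-correct threshold by linear interpolation (a simplification)
    pts = sorted(zip(separations, p_correct))
    for (x0, p0), (x1, p1) in zip(pts, pts[1:]):
        if p0 <= 0.75 <= p1:
            return x0 + (0.75 - p0) * (x1 - x0) / (p1 - p0)
    return float('nan')

def binomial_proportion(p, n_trials):
    # simulate an observed proportion correct from n_trials Bernoulli trials
    return sum(random.random() < p for _ in range(n_trials)) / n_trials

def null_p_value(separations, pooled_p, n_trials, observed_diff, n_sims=1000):
    # Under the null, both conditions share one underlying curve, so any
    # threshold difference arises from binomial sampling noise alone.
    diffs = []
    for _ in range(n_sims):
        t_a = threshold_75(separations,
                           [binomial_proportion(p, n_trials) for p in pooled_p])
        t_b = threshold_75(separations,
                           [binomial_proportion(p, n_trials) for p in pooled_p])
        if t_a == t_a and t_b == t_b:      # skip failed interpolations (NaN)
            diffs.append(t_a - t_b)
    return sum(abs(d) >= abs(observed_diff) for d in diffs) / len(diffs)

# Illustrative pooled curve over the ±1, ±2, ±4, ±5 unit separations,
# with 78 trials per separation as in the experiment; the observed
# threshold difference of 0.8 units is hypothetical.
p_val = null_p_value([1, 2, 4, 5], [0.55, 0.65, 0.85, 0.92],
                     n_trials=78, observed_diff=0.8)
```

A small p-value would indicate that the observed threshold difference is too large to be explained by sampling noise from a single shared psychometric function.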

In a 2AFC task using the method of constant stimuli, observers indicated with a button press whether the test face was happier or sadder (or more neutral or more disgusted) than the mean emotion of the previously presented set (Figure 10).

Figure 10.

Task design for Experiment 3. Observers indicated whether the test face was happier or sadder (or more neutral or more disgusted) than the mean emotion of the set. The test face could be at any of the distances indicated by the circles (the numbers were not seen by participants).

Results and Discussion

Observers were explicitly asked to indicate whether a test face was happier or sadder (or more neutral or more disgusted) than the mean emotion of the previously displayed set of faces. Judgments were made for set sizes 4 and 16. As Figure 11 indicates, observers were remarkably precise in their assessment of the mean emotion of a set (i.e. mean discrimination), performing just as well as when they were asked to discriminate between two faces (homogeneous discrimination; Experiment 1B). Just how good are individuals at representing the mean emotion of a set? The Monte Carlo simulations revealed that, for six out of seven observers, thresholds for mean discrimination and regular discrimination did not differ, suggesting that the two psychometric functions conceivably came from the same underlying distribution. Observers were as good at representing the mean emotion of a set of heterogeneous faces as they were at indicating which of two faces was happier. This is particularly striking when one considers that Experiment 1A indicated that observers were unable to explicitly represent the individual set members. It appears that the visual system favors a summary representation of a set of faces over a representation of each set constituent.

Figure 11.

Results for Experiment 3. (A) and (B) show individual psychometric functions for mean discrimination for happy-sad morphs and neutral-disgusted morphs, respectively. Superimposed on each graph is that observer’s discrimination performance (from Figure 5). Mean discrimination performance was as precise as regular discrimination performance for nearly all observers. (C) depicts the 75% thresholds for each observer. Error bars in (A–C) are 95% confidence intervals derived from 5000 bootstrap simulations (Wichmann & Hill, 2001a, 2001b). For fitting purposes, we included a point at 0 separation between set and test (chance performance), which does not appear in the graph.

Experiment 4A

Experiment 3 demonstrated that observers were able to extract a precise representation of the mean emotion of a set of faces. Further, Experiment 1A suggested that observers disregarded the individual set members in favor of this mean representation. In the following experiments we tested to what extent observers represented the individual members of the set.

Method

Participants

Three observers (two females, mean age = 23.33) affiliated with the University of California, Davis participated in this experiment.

Procedure

Observers were instructed to identify the location in which a test face had appeared within the previously displayed set. Set size varied from one to four items. The sets of faces were created in exactly the same way as in Experiment 1A, which ensured a minimum separation of six emotional units among set members (a separation above discrimination threshold). Each face was randomly assigned to one of four locations on the screen. A set size of four looked as in Figure 10, while smaller set sizes left one or more locations empty. Here, we used neutral-disgusted morphs, although we have reported a different version of this task using happy-sad morphs (Haberman & Whitney, 2007). Sets were presented for two seconds, followed by a single test face in the center of the screen that remained until a response was received. The test face was surrounded by four letters (A, B, C, and D) that corresponded to the possible locations of the faces in the previous set. Observers had to indicate where in the set the test face had appeared. Within a given run there were 160 trials, and observers performed three runs for a total of 480 trials.

Results and Discussion

As expected, location identification declined as a function of set size (Figure 12A). For a set size of 4 items, the group average was only 50% correct. Observers derived some information from the set, but how much? For comparison, we estimated the expected performance of a hypothetical observer who explicitly remembered only one face from the set (solid line in Figure 12). Two of the actual observers (JH and PL) performed at or below this level, suggesting that they could remember only one face (or less) from the sets, and this was consistent across all set sizes (Figure 12A). Observer AD performed at a level of accuracy that would be expected if she were able to remember between one and two faces. This amount of information cannot explain the level of precision seen in mean discrimination (Figure 11). Despite explicitly remembering only one of the faces in the set, observers were still able to precisely represent the mean emotion of an array of faces containing up to 16 items.
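The one-face benchmark can be formalized as follows. This is our reading of the model (the hypothetical observer remembers one face plus its location and excludes that location when guessing), so the exact guessing rule should be treated as an assumption rather than the authors' stated formula.

```python
def p_correct_location(set_size):
    # Hypothetical observer remembers exactly one face and its location.
    # If the test face is that face (prob 1/set_size), respond correctly;
    # otherwise guess among the remaining (non-remembered) locations.
    if set_size == 1:
        return 1.0
    p_remembered = 1.0 / set_size
    p_guess = 1.0 / (set_size - 1)   # the known location can be excluded
    return p_remembered + (1.0 - p_remembered) * p_guess
```

At a set size of four this benchmark yields 0.50 correct, in line with the roughly 50% group accuracy reported above.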

Figure 12.

Results for Experiments 4A and 4B. The solid line represents expected performance if observers could remember only one face from each set. Actual performance, represented by the dotted lines, was at or below this solid line for all but one of the observers (AD at set size of three and four). Error bars are ± 1 SEM.

Experiment 4B

In the previous task, observers were asked to identify the location of a face within the set. It is possible that observers had a high-fidelity representation of each set member, but simply lost its corresponding location information and therefore performed poorly on the task. To test this, observers performed a 2AFC judgment to identify which of two test faces was a member of the previously viewed set.

Participants

The same three observers from the previous experiment participated.

Procedure

We used an unbiased, 2AFC paradigm to examine how well observers represented the set members. We varied set size from one to three faces and asked participants to identify which of two subsequently presented test faces had appeared in the set. The sets of faces were the same as in Experiment 4A (with the exception that the largest set size was three faces, which kept the minimum separation between any test and/or set faces at six or more units). Sets were presented for two seconds (as before), followed by two simultaneously presented test faces, one of which was the target. The target was randomly selected from among the set items. The lure was at least six units away from the target and from every member of the set (often this separation was larger). The position (top or bottom) of the target face was randomized on every trial. Observers indicated with a key press which of the two test faces was a member of the preceding set. Within a given run there were 160 trials, and observers performed three runs for a total of 480 trials.

Results and Discussion

Observers performed at a high level for a set size of one (Figure 12B), as expected. Performance was not at 100%, however, reflecting limitations in discrimination ability. Importantly, performance declined with the introduction of just one additional face (Figure 12B), and this trend continued through a set size of three faces. The solid line in Figure 12B indicates the expected performance if subjects were able to code one face from the set. As in Experiment 4A, observers remembered only one face (or less) from the set. Explicitly remembering one face from the set cannot explain the level of precision observed for mean discrimination (Figure 11). This suggests that individuals lacked high-fidelity representations of the set members themselves, and not simply of the members' locations. Despite this, observers had enough coarse information about the set members to derive a precise estimate of the mean emotion (Experiment 3).
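By the same logic, a one-face benchmark for the 2AFC membership task can be sketched; again, this is our formalization of the benchmark, not necessarily the authors' exact model.

```python
def p_correct_2afc(set_size):
    # Hypothetical observer codes one face from the set. If the target is
    # that face (prob 1/set_size), the match is recognized; otherwise the
    # observer guesses between the two test faces (chance = 0.5).
    p_remembered = 1.0 / set_size
    return p_remembered + (1.0 - p_remembered) * 0.5
```

This predicts perfect performance at a set size of one (before discrimination noise), 75% at two faces, and about 67% at three faces, mirroring the decline described above.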

When set size was greater than just one item, participants were unable to code and retain explicit information about individual identities of the set members. Numerous studies on visual working memory have demonstrated the striking limitations in attentional capacity (Luck & Vogel, 1997; Simons & Levin, 1998). Taken in conjunction with research suggesting that searching for faces within an array is a slow and difficult process (Brown, Huey, & Findlay, 1997; Kuehn & Jolicoeur, 1994; Nothdurft, 1993), the poor performance seen in this experiment may not be entirely surprising. What is surprising, however, is that despite poor performance on set membership identification, there was still precise mean discrimination of average facial expression (Experiment 3).

Experiments 5 & 6

It is widely accepted that whole upright faces, such as those used in the experiments above, are processed in a configural or holistic manner (Farah, Wilson, Drain, & Tanaka, 1998; Kanwisher, McDermott, & Chun, 1997). Inverted and scrambled faces, on the other hand, are processed in a more part-based manner (Maurer, Le Grand, & Mondloch, 2002; Moscovitch, Winocur, & Behrmann, 1997; Robbins & McKone, 2003). If the summary statistical representation found above for sets of faces is specific to faces, and not the low-level features within the faces, then we should find more precise mean extraction for whole upright faces than for inverted or scrambled faces. The purpose of Experiments 5 and 6 was to test this.

Method

Participants

Three observers (KS, TH, JH) from previous experiments participated.

Stimuli

The same happy-sad and neutral-disgusted morphs from prior experiments were used here, except that they were either inverted (Experiment 5, Figure 13A) or scrambled (Experiment 6, Figure 13B). For scrambling, we used Matlab to divide up each of the original morphs into a 4 × 3 matrix of randomly arranged squares. The same scrambling algorithm was applied to each morph, such that the face pieces were rearranged in the same order across faces.
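The scrambling step can be sketched with numpy. The 4 × 3 block grid follows the description above; the fixed seed (so that every morph receives the identical rearrangement) is our own device for achieving the "same scrambling algorithm across faces" property.

```python
import numpy as np

def scramble(image, rows=4, cols=3, rng=None):
    # Cut the image into a rows x cols grid of blocks and reassemble them
    # in a random order. Using a fixed seed means every face morph gets
    # the identical rearrangement, as described in the text.
    if rng is None:
        rng = np.random.default_rng(0)   # fixed seed: an assumption
    h, w = image.shape[0], image.shape[1]
    bh, bw = h // rows, w // cols
    blocks = [image[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
              for r in range(rows) for c in range(cols)]
    order = rng.permutation(len(blocks))
    out = np.empty_like(image[:bh * rows, :bw * cols])
    for i, src in enumerate(order):
        r, c = divmod(i, cols)
        out[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw] = blocks[src]
    return out
```

Because the same permutation is applied to every morph, the scrambled stimuli preserve all local feature information while disrupting the configural arrangement of the face.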

Figure 13.

Task design and stimuli for Experiments 5 and 6. Procedure was identical to Experiment 3. Observers saw either inverted (A) or scrambled (B) faces.

Procedure

With the exception of the stimuli, the procedures for Experiments 5 and 6 were identical to those used in the mean discrimination experiment (Experiment 3). Set duration was fixed at 500 ms. The recognition of inverted and scrambled faces is thought to require feature-based strategies that differ from the configural strategy associated with upright face processing (Farah et al., 1998; Kanwisher et al., 1997; Maurer et al., 2002; Moscovitch et al., 1997; Robbins & McKone, 2003). If the mean face extraction were based on configural or holistic facial information, then we would expect that observers’ mean discrimination of inverted and scrambled faces would be poorer than for upright faces.

Results and Discussion

Figure 14 shows fitted psychometric curves for each observer on mean discrimination tasks when viewing inverted/scrambled stimuli compared to when viewing upright stimuli. Mean discrimination performance for both inverted and scrambled faces suffered relative to upright faces. The inverted and scrambled faces had all of the same feature information available that upright faces had, and yet five of the six observers were significantly worse at extracting the mean emotion. Based on 5000 Monte Carlo simulations run using Psignifit (described in Experiment 3), the only non-significant result had p = .07 (observer KS, upright versus inverted; Figure 14A). For all other observers, the difference between the curves was significant, p < .05. This supports the conclusion that the perception of mean facial expression in sets of faces is distinct from low-level feature averaging. The dissociation in mean extraction performance (Figure 14) suggests that the information used to perceive mean upright facial expression was not available when the faces were inverted or scrambled. Therefore, perceiving sets of upright faces relies on a distinct and more precise facial representation. Evidently, ensemble coding not only occurs for low-level features (dots and gratings), but also for high-level, complex objects.

Figure 14.

Psychometric curves for (A) upright mean discrimination versus inverted mean discrimination and (B) upright mean discrimination versus scrambled mean discrimination. Upright mean discrimination was significantly better (p < 0.05) than either inverted or scrambled mean discrimination for five out of six observers. Error bars are 95% confidence intervals derived from 5000 bootstrap simulations (Wichmann & Hill, 2001a, 2001b). For fitting purposes, we included a point at 0 separation between set and test (chance performance), which does not appear in the graph.

General Discussion

We have demonstrated that observers quickly recognize the mean expression of a set of faces with remarkable precision, despite lacking a representation of the set constituents. In fact, observers were able to discriminate the mean emotion of a set at least as well as they were able to discriminate the expression of two faces (Figure 11). The results cannot be explained simply by observers’ perceptual or decision noise. Further, the summary statistic was not simply the range of the set, but approximated the set mean. Feature-based processing strategies cannot account for our findings, as observers were able to extract the mean emotion of a set of upright faces with significantly more precision than they were able to extract the mean emotion from a set of inverted or fractured faces.

Ensemble coding and scene perception

Statistical representation for low-level features makes sense – an array of dots or a collection of gratings naturally combines to create a single texture. Some speculate that statistical set representation may serve, then, to promote texture perception (Cavanagh, 2001). However, we show statistical representation for face-specific processing, a level of processing well beyond that required for dots, bars, or gratings. While it is conceivable and even probable that statistical set representation plays some role in texture perception, this cannot be its only function given that it is operating on high-level, complex objects. It is likely that set representation serves a more general role in deriving gist from a complex scene. This point is particularly compelling when one considers the speed with which individuals perceived the statistical representation – noisy mean extraction occurred in as little as 50 ms for sets of up to 16 faces. Previous studies that have reported gist perception in extremely brief displays (Biederman, Glass, & Stacy, 1973; Navon, 1977; Potter, 1976) may have tapped the set representation mechanism found here. We speculate that the impression we get of a complete and wholly accurate representation of the visual world is not actually a “Grand Illusion” (Noe, Pessoa, & Thompson, 2000). Rather, a great deal of condensed information arrives in the form of summary statistics. This information, while not necessarily high fidelity, is useful and may drive the impression that we “see” everything in our environment.

Groups of faces are special

The fact that we perceive the mean emotion in a set of faces may not seem intuitive at first. Whereas deriving an average texture directly benefits surface perception, it is not clear whether similar mechanisms are at work for average emotion perception. For example, in our effort to derive a mean, do we perceive a texture of faces the same way we perceive a texture of Gabor patches? Despite this quandary, high-level ensemble coding makes sense from an evolutionary perspective. A rapid and precise assessment of a crowd of faces is useful for determining the intent of the mob. Summary statistical face processing may therefore be a unique phenomenon, at a unique level of processing.

Taken with the body of work showing ensemble coding for low-level objects such as dots (Ariely, 2001; Chong & Treisman, 2003) and gratings (Parkes et al., 2001), we can conclude that some form of averaging occurs across multiple visual domains at different levels of analysis. Unlike average size, orientation, and motion, perceiving average facial expression is not mediated by low-level features, luminance cues, or other non-configural cues (Experiments 5 and 6). Further, the sensitivity to average facial expression is remarkably precise: Subjects were able to discriminate mean facial expression at least as well as they were able to discriminate an individual face. This degree of sensitivity is not found for average size (Ariely, 2001), orientation (Parkes, et al., 2001), or other low-level features.

Parallel or serial

An important open question is whether mean extraction is a parallel or serial process. Do observers automatically extract the mean from large arrays of items, as they do for global motion, global orientation, and some other texture segmentation tasks (Landy & Graham, 2004; Movshon & Newsome, 1996; Newsome & Pare, 1988; Parkes et al., 2001; Regan, 2000; Watamaniuk & Duchon, 1992; Williams & Sekuler, 1984)? This remains an ongoing debate, and there is currently support for both sides. Chong and Treisman (2005) suggested that mean extraction is automatic and parallel, because neither the number of items nor cueing seemed to affect the accuracy of mean size discrimination. Alvarez and Oliva (2008) reached a similar conclusion, showing that summary statistical representation for the location of a group of objects (average or centroid position) occurred even when observers did not attend to the objects. Our data from Experiments 1 and 2 suggest that ensemble coding of facial expression occurs implicitly, as observers unknowingly possessed knowledge of the mean while disregarding the task instructions to attend to the set members. However, demonstrating an implicit representation is not the same as demonstrating automaticity. Even though observers derived a mean emotion in brief stimulus displays, they showed a reduction in precision as set exposure decreased. Therefore, we cannot make a strong claim regarding the automaticity of ensemble coding, only that it can occur implicitly.

In response to reports of average size perception (Ariely, 2001; Chong & Treisman, 2003), Myczek and Simons (in press) demonstrated that sparse sampling of set items is sufficient for accurate discrimination of average object size. Their model tested the precision with which observers could represent average size assuming they track some number of items in the set. In other words, the authors investigated whether existing models of directed attention could explain ensemble coding performance, eliminating the need for a separate averaging module that operates in parallel. Their model is theoretically capable of explaining much of the existing set representation data, at least when the stimuli are dots (Ariely, 2001; Chong & Treisman, 2003). However, our data depart from the average size data in one key respect: compared to baseline discrimination performance in the two respective tasks, mean discrimination of emotion is more precise than mean discrimination of size. Whereas average size perception is worse than discrimination of two dots, average emotion perception is at least as good as discrimination of two faces (homogeneous discrimination, Experiment 1B). This distinction would boost the number of items necessary for Myczek & Simons' (in press) model to achieve behavioral levels of performance in our task.

While Myczek & Simons’ model supports a directed attention strategy for ensemble coding, it does not preclude the existence of a parallel mechanism as well. For example, a directed attentional mechanism cannot explain the results of Parkes and colleagues (2001), where crowded central targets (i.e., the targets were impossible to individuate because of the surrounding flankers) influenced the perceived average orientation of the entire set of Gabor patches. Because the crowded Gabor patches could not be individuated, attention could not be directed to those single items. This indicates that attention may not be necessary to derive the mean orientation (i.e., a parallel process may be at work). Thus, the jury is still out on whether ensemble coding is primarily subserved by serial or parallel processes. However, both mechanisms are capable of achieving the same end – a summary statistical representation.

Conclusions

The experiments here demonstrate the existence of summary statistical representations for groups of high-level objects: observers perceived the mean facial expression in a group of heterogeneous faces. This reveals an efficient ensemble coding mechanism that processes and represents large crowds, but is distinct from the mechanism responsible for low-level ensemble coding. The results further demonstrate that ensemble coding operates at multiple levels in the visual system.

References

  1. Alvarez GA, Oliva A. The representation of simple ensemble visual features outside the focus of attention. Psychological Science (in press). doi: 10.1111/j.1467-9280.2008.02098.x.
  2. Ariely D. Seeing sets: Representation by statistical properties. Psychological Science. 2001;12(2):157–162. doi: 10.1111/1467-9280.00327.
  3. Biederman I, Glass AL, Stacy EW. Searching for objects in real-world scenes. Journal of Experimental Psychology. 1973;97(1):22–27. doi: 10.1037/h0033776.
  4. Brown V, Huey D, Findlay JM. Face detection in peripheral vision: do faces pop out? Perception. 1997;26(12):1555–1570. doi: 10.1068/p261555.
  5. Cavanagh P. Seeing the forest but not the trees. Nature Neuroscience. 2001;4(7):673–674. doi: 10.1038/89436.
  6. Chong SC, Treisman A. Representation of statistical properties. Vision Research. 2003;43(4):393–404. doi: 10.1016/s0042-6989(02)00596-5.
  7. Chong SC, Treisman A. Statistical processing: computing the average size in perceptual groups. Vision Research. 2005;45(7):891–900. doi: 10.1016/j.visres.2004.10.004.
  8. Ekman P, Friesen WV. Pictures of facial affect. Palo Alto, CA: Consulting Psychologists Press; 1976.
  9. Farah MJ, Wilson KD, Drain M, Tanaka JN. What is "special" about face perception? Psychological Review. 1998;105(3):482–498. doi: 10.1037/0033-295x.105.3.482.
  10. Haberman J, Whitney D. Rapid extraction of mean emotion and gender from sets of faces. Current Biology. 2007;17(17):R751–R753. doi: 10.1016/j.cub.2007.06.039.
  11. Kanwisher N, McDermott J, Chun MM. The fusiform face area: A module in human extrastriate cortex specialized for face perception. Journal of Neuroscience. 1997;17(11):4302–4311. doi: 10.1523/JNEUROSCI.17-11-04302.1997.
  12. Kuehn SM, Jolicoeur P. Impact of quality of the image, orientation, and similarity of the stimuli on visual search for faces. Perception. 1994;23(1):95–122. doi: 10.1068/p230095.
  13. Landy M, Graham N. Visual perception of texture. In: Chalupa LM, Werner JS, editors. The Visual Neurosciences. Vol. 2. Cambridge, MA: MIT Press; 2004. pp. 1106–1118.
  14. Luck SJ, Vogel EK. The capacity of visual working memory for features and conjunctions. Nature. 1997;390(6657):279–281. doi: 10.1038/36846.
  15. Marr D. Vision: a computational investigation into the human representation and processing of visual information. San Francisco: W.H. Freeman; 1982.
  16. Maurer D, Le Grand R, Mondloch CJ. The many faces of configural processing. Trends in Cognitive Sciences. 2002;6(6):255–260. doi: 10.1016/s1364-6613(02)01903-4.
  17. Moscovitch M, Winocur G, Behrmann M. What is special about face recognition? Nineteen experiments on a person with visual object agnosia and dyslexia but normal face recognition. Journal of Cognitive Neuroscience. 1997;9(5):555–604. doi: 10.1162/jocn.1997.9.5.555.
  18. Motulsky HJ, Christopoulos A. Fitting models to biological data using linear and nonlinear regression: A practical guide to curve fitting. San Diego, CA: GraphPad Software Inc; 2003.
  19. Movshon JA, Newsome WT. Visual response properties of striate cortical neurons projecting to area MT in macaque monkeys. Journal of Neuroscience. 1996;16(23):7733–7741. doi: 10.1523/JNEUROSCI.16-23-07733.1996.
  20. Myczek K, Simons DJ. Better than average: Alternatives to statistical summary representations for rapid judgments of average size. Perception & Psychophysics (in press). doi: 10.3758/pp.70.5.772.
  21. Nakayama K, He ZJ, Shimojo S. Visual surface representation: A critical link between lower-level and higher-level vision. In: Kosslyn SM, Osherson DN, editors. An Invitation to Cognitive Science. 2nd ed. Vol. 2. Cambridge, MA: MIT Press; 1995. pp. 1–70.
  22. Navon D. Forest before trees: The precedence of global features in visual perception. Cognitive Psychology. 1977;9(3):353–383.
  23. Newsome WT, Pare EB. A selective impairment of motion perception following lesions of the middle temporal visual area (MT). Journal of Neuroscience. 1988;8(6):2201–2211. doi: 10.1523/JNEUROSCI.08-06-02201.1988.
  24. Noe A, Pessoa L, Thompson E. Beyond the grand illusion: What change blindness really teaches us about vision. Visual Cognition. 2000;7(1–3):93–106.
  25. Nothdurft HC. Faces and facial expressions do not pop out. Perception. 1993;22(11):1287–1298. doi: 10.1068/p221287.
  26. Parkes L, Lund J, Angelucci A, Solomon JA, Morgan M. Compulsory averaging of crowded orientation signals in human vision. Nature Neuroscience. 2001;4(7):739–744. doi: 10.1038/89532.
  27. Potter MC. Short-term conceptual memory for pictures. Journal of Experimental Psychology: Human Learning and Memory. 1976;2(5):509–522.
  28. Regan D. Human perception of objects: early visual processing of spatial form defined by luminance, color, texture, motion, and binocular disparity. Sunderland, MA: Sinauer Associates; 2000.
  29. Rensink RA, ORegan JK, Clark JJ. To see or not to see: The need for attention to perceive changes in scenes. Psychological Science. 1997;8(5):368–373. [Google Scholar]
  30. Robbins R, McKone E. Can holistic processing be learned for inverted faces? Cognition. 2003;88(1):79–107. doi: 10.1016/s0010-0277(03)00020-9. [DOI] [PubMed] [Google Scholar]
  31. Scholl BJ, Pylyshyn ZW. Tracking multiple items through occlusion: Clues to visual objecthood. Cognitive Psychology. 1999;38(2):259–290. doi: 10.1006/cogp.1998.0698. [DOI] [PubMed] [Google Scholar]
  32. Simons DJ, Levin DT. Failure to detect changes to people during a real-world interaction. Psychonomic Bulletin & Review. 1998;5(4):644–649. [Google Scholar]
  33. Watamaniuk SNJ, Duchon A. The Human Visual-System Averages Speed Information. Vision Research. 1992;32(5):931–941. doi: 10.1016/0042-6989(92)90036-i. [DOI] [PubMed] [Google Scholar]
  34. Wichmann FA, Hill NJ. The psychometric function: I. Fitting, sampling, and goodness of fit. Perception & Psychophysics. 2001a;63(8):1293–1313. doi: 10.3758/bf03194544. [DOI] [PubMed] [Google Scholar]
  35. Wichmann FA, Hill NJ. The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics. 2001b;63(8):1314–1329. doi: 10.3758/bf03194545. [DOI] [PubMed] [Google Scholar]
  36. Williams DW, Sekuler R. Coherent Global Motion Percepts from Stochastic Local Motions. Vision Research. 1984;24(1):55–62. doi: 10.1016/0042-6989(84)90144-5. [DOI] [PubMed] [Google Scholar]