Abstract
The ability to estimate proportions informs our immediate impressions of social environments (e.g., of the diversity of races or genders within a crowded room). This study examines how the distribution of attention during brief glances shapes estimates of group gender proportions. Behaviorally, subjects exhibit a canonical pattern of judgment errors: small proportions are overestimated while large proportions are underestimated. Subjects’ eye movements at sub-second timescales reveal that these biases follow from a tendency to visually oversample members of the gender minority. Rates of oversampling track average error magnitudes, response variability, and response times. Visual biases are thus associated with the inherent difficulty of estimating particular proportions. All results are replicated within subjects with non-human ensembles using natural scene stimuli; the observed attentional patterns and judgment biases are thus not exclusively guided by face-specific visual properties. Our results reveal the biased distribution of attention underlying typical judgment errors of group proportions.
Keywords: Ensemble perception, Social perception, Proportion estimation, Overestimation
1. Introduction
The ability to assess proportions underlies a wide range of judgments regarding our social environment. By making a quick visual scan (e.g., across an on-screen quilt of video feeds, or a busy street), we can obtain a rough idea of the social groups in attendance. As a further example, a glance around an academic conference room might reveal a stark gender disparity across attendees. Snap judgments of arrays of faces (following 500 ms of viewing time) have recently been studied in the laboratory (Whitney & Yamanashi Leib, 2018). Individuals are adept at extracting gist information—e.g., average emotions (Haberman & Whitney, 2009), gaze direction (Sweeny & Whitney, 2014), and diversity (Phillips, Slepian, & Hughes, 2018)—following brief glances at people. However, we do not yet understand the kinds of information acquired during these glances and how they inform typical errors in estimation. Here we gather data on eye movements as individuals make snap judgments of visual ensembles. Across two studies, we investigate judgments of gender proportions and proportions of indoor/outdoor scenes.
In prior studies of basic visual ensembles (e.g., clusters of shapes), errors in proportional judgments often display a form of conservatism (Edwards, 1968): estimates tend to lie closer to 50% than their true values. These errors peak in a sinusoidal pattern at intermediate values, with minimal errors made at extreme and central proportions (Brooke & MacRae, 1977; Erlick, 1964; Varey, Mellers, & Birnbaum, 1990). Inaccurate mental representations of people carry added significance, as estimates of crowd compositions inform attitudinal judgments, such as one’s sense of belonging to an in-group, or the level of threat posed by identified out-group members (Alt, Goodale, Lick, & Johnson, 2019; Goodale, Alt, Lick, & Johnson, 2018).
Despite similarities with basic proportion and numerosity estimation, the cognitive mechanisms underlying snap judgments of groups are largely unknown. Recently, subsampling (relying on a small number of elements) has been shown to be critical in informing summary impressions of multiple facial expressions (Ji, Pourtois, & Sweeny, 2020). In contrast, immediate social inferences on single targets such as those based on physical expressions (Ambady & Rosenthal, 1992) and facial features (Todorov, Olivola, Dotsch, & Mende-Siedlecki, 2015) are well-established within the thin-slice literature (Weisbuch & Ambady, 2011). Proposed cognitive mechanisms behind misestimates of proportions include power law transformations of sensory information (Hollands & Dyre, 2000; Spence, 1990; Stevens, 1957) and the influence of prior beliefs when making an uncertain judgment (Landy, Guay, & Marghetis, 2018). Here we examine another potential source of errors: whether attention itself is overtly biased when assessing the composition of visual ensembles. We combine methods from decision process tracing as well as ensemble perception; these methods respectively shed light on (i) typical information gathered during the assessment of group compositions and (ii) the accuracy and uncertainty associated with ensuing perceptual judgments.
We examine how visual samples—the collection of faces encompassed by subjects’ gaze—inform their impression of the entire crowd. In other words, we ask: do biases in where people look alter their judgments about whole group compositions? Understanding this relationship between gaze and judgment will be critical for explaining key anomalies in judgments of groups. For example, conservative errors can sometimes reverse, resulting in exaggerative biases (e.g., underestimation of small proportions and vice versa). In addition, multiple cycles of over- and underestimation have been reported (Hollands & Dyre, 2000), suggesting that peak errors are not restricted to intermediate proportions within a scale. Explanations for variation in the directions of these errors involve repulsive or attractor effects from category boundaries, implicit reference points (Spence & Krizel, 1994), and prototypical exemplars from memory (Huttenlocher, Hedges, & Duncan, 1991) that cause estimates to assimilate toward extreme values. We hypothesize that directional errors, independent of cognitive distortions during recall, might also occur during information acquisition.
We collect eye-tracking data to examine the process of information sampling that informs subjects’ estimates. Overt attention is selective toward visual features relevant to task performance, with the detection of grouped features happening pre-attentively (Treisman, 1982). Thus, examining viewing patterns across time might reveal features that contribute to the rapid judgments of groups and crowds. Several forms of biased information processing have already been inferred from judgment data: when estimating race-based compositions, other-race individuals appear to be weighted more heavily than same-race individuals (Thornton, Srismith, Oxner, & Hayward, 2019). Similar effects have been reported for strong emotional faces, resulting in the amplification of perceived aggregate expressions (Goldenberg, Weisz, Sweeny, Cikara, & Gross, 2021). Estimation performance has been characterized as being positively and linearly related to true proportions (Goodale et al., 2018; Yang & Dunham, 2019). However, how eye movements relate to the resulting pattern of judgment errors remains unknown.
Here we examine the eye movement patterns of individuals attempting to infer categorical proportions within visual ensembles. Proportion estimates form the bedrock on which characterizations of social groups are formed: our sense of belonging toward visible members and their likely motivations. We used a within-subjects design with two study treatments. Across both studies, subjects were incentivized to provide accurate estimates of the categorical proportions on display. Subjects’ eye movements were recorded during the brief (1 s) presentations of each ensemble. Study 1 features face ensembles with varying gender ratios. Study 2 tests for analogous phenomena using different, non-biological perceptual categories: indoor and outdoor scenes.
2. Study 1: group proportion estimation with face ensembles
2.1. Methods
2.1.1. Participants
Forty-five naïve observers (age range: 18–28; M = 21.93, SD = 3.34) were recruited from the Duke University community. All subjects had normal or corrected-to-normal vision and provided informed consent prior to participation. The research protocol was approved by the Duke Institutional Review Board (IRB# 2019–0166).
2.1.2. Procedure
Subjects were incentivized to accurately report their estimate of the gender composition of face images shown on a 4 × 3 grid (Fig. 1; see Supplementary Material A for details on stimuli display). Estimates were informed by brief (1 s) exposure trials, where faces from the Chicago Face Database (CFD; Ma, Correll, & Wittenbrink, 2015) were drawn to reflect a random proportion on this grid. We note that we do not possess information about CFD subjects’ self-identified gender beyond the database’s ‘male’ and ‘female’ category labels. Post-hoc surveys performed by the CFD authors (using a specialist sample of social psychologists) confirmed that individual faces were suitable for research use within the categories of race and gender. Mean suitability ratings of target images were consistently in the 3.8–3.9 range on a 5-point scale (Ma et al., 2015). Furthermore, previous studies have elicited summary impressions of gender using the CFD to construct face ensembles (Goodale et al., 2018; Yang & Dunham, 2019) comparable to those used in the present study.
Fig. 1.
General sequence of estimation and exposure trials performed by participants. Reports were made on a slider bar, with both visual and verbal representations of the current slider setting. Exposure trials feature a fixed composition of genders per block with randomized individual identities. Estimation trials followed the 1 s duration exposure trials.
Subjects were briefed on the block structure of the task and told that a randomly chosen proportion would be the target of their estimates on each block. Before every block, the computer determined a single group composition (e.g., 3 men and 9 women) drawn with equal likelihood from the set of all possible proportions (0–12 men or women). This composition was then held constant for 6 consecutive estimation and exposure trial pairs. At the beginning of each block, the first estimation trial was essentially a guess about the composition of upcoming exposures; these probe trials followed a reminder that the group composition had changed and allowed us to assess whether expectations carried over from previous blocks. All pre-exposure trials were excluded from further analyses. After each exposure trial, the participant provided an estimate of the composition that was currently on display (Fig. 1). The task was made incentive-compatible by introducing a monetary prize based on the magnitude of subjects’ errors on each estimation trial (see Supplementary Material A for details on payoff rules).
The block structure enables us to measure subjects’ estimates of specific ensemble properties while being clear to subjects about what is actually changing across trials and blocks (Khaw, Nichols, & Freedberg, 2021). This is especially important for stochastic ensemble stimuli, as compositions, the inherent variance across elements, individual elements (e.g., face identities), and subjects’ percepts are all liable to vary across trials. Our measures produce three types of outcome variables (Fig. 2). Behaviorally, we inspect the estimates declared by the subjects and the preceding gaze patterns indicated by measured eye movements. On the basis of these two behaviors, we characterize the typical attentional biases that have taken place during exposure trials (e.g., disproportionate gaze allocated to specific category or image types).
Fig. 2.
Outcome variables measured during the ensemble estimation process. We investigate attentional biases elicited by ensemble stimuli; these visual properties are reflected by gaze behavior measured during the exposure trials. We relate the number of men and women (Study 1) or indoor and outdoor scenes (Study 2) gazed at by subjects on each trial to estimates of stimulus proportions declared by subjects (who were incentivized to produce accurate judgments).
2.2. Results
2.2.1. Overall estimation and sampling performance
We found that subjects’ estimates were significantly and positively associated with the true group composition across all trials (r(5398) = 0.90, p < .001; Fig. 3A). Examining the distribution of errors across all unique proportions, we further found that signed errors were not randomly distributed around zero (F(12, 5387) = 51.50, p < .001). Rather, the sign and magnitude of errors reflect distinct ranges of over- and underestimation (Fig. 3B, top panel), consistent with the sinusoidal error patterns found in the psychophysical estimation literature (Erlick, 1964; Varey et al., 1990).
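As a minimal sketch of the error analysis above, the Python below groups signed errors (estimate minus true composition) by true composition and computes a one-way ANOVA F statistic across those groups. The trial format (pairs of true and estimated counts on the 0–12 scale) and the function names are hypothetical; with 13 compositions and 5,400 trials this layout yields the degrees of freedom (12, 5387) reported above. A p-value would additionally require the survival function of the F distribution (e.g., from SciPy), which is omitted here.

```python
from collections import defaultdict


def signed_errors_by_composition(trials):
    """Group signed errors (estimate minus truth) by true composition.

    `trials` is a hypothetical list of (true_men, estimated_men) pairs,
    both expressed as counts on the task's 0-12 scale.
    """
    groups = defaultdict(list)
    for true_men, est_men in trials:
        groups[true_men].append(est_men - true_men)
    return dict(groups)


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA across groups."""
    values = [v for g in groups.values() for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {key: sum(g) / len(g) for key, g in groups.items()}
    # Between-composition and within-composition sums of squares.
    ss_between = sum(len(g) * (means[key] - grand_mean) ** 2 for key, g in groups.items())
    ss_within = sum((v - means[key]) ** 2 for key, g in groups.items() for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A conservative pattern shows up as positive mean signed errors for small compositions and negative ones for large compositions.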
Fig. 3.
(A) Estimates of group compositions were significantly related to the true underlying gender composition (top panel) as well as the sample of faces seen by subjects on each trial (bottom panel). Solid red lines indicate the line of best fit. (B) Both estimates (top panel) and gaze behavior (bottom panel) exhibited similar patterns of a known bias: sinusoidal levels of over- and underestimation, of small and large proportions respectively. Error bars denote the standard errors of each mean. For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.
In order to test for associations between gaze behavior and resulting estimates, we defined a subject’s visual sample as the percentage of men or women stimuli gazed at by the subject during exposure trials. We first used this eye movement variable (i.e., percentage men gazed at on the array) to test the null hypothesis that information sampling is unbiased relative to the true probabilities of sampling either gender in each state. After having computed the gender percentages encompassed by subjects’ gaze coordinates, we found that the percentages witnessed by subjects were significantly biased across group composition types (i.e., not matching the true percentages presented). This pattern of visual errors (Fig. 3B, bottom panel) was also not randomly distributed around zero (F(12, 6465) = 24.12, p < .001) and qualitatively resembled the trend shown by the errors in estimation. We verified that the visual samples obtained on exposure trials were significantly and positively related to the subsequent judgments (r(5398) = 0.73, p < .001). Overall, we found that faces belonging to the gender minority were oversampled visually and their proportions were overreported in subjects’ estimates. Subsidiary analyses support the specificity of gaze being directed by the apparent gender of face stimuli, favoring faces characteristic of the minority gender (Supplementary Material B). Furthermore, a series of regression analyses on the trial-to-trial magnitude of visual errors (Supplementary Material C, Table C1) rules out attentional effects stemming from racial variation within ensembles, such as gaze effects related to the varying degrees of visual salience afforded by different races (e.g., Trawalter, Todd, Baird, & Richeson, 2008) or one’s own race (e.g., Thornton et al., 2019).
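The visual sample defined above can be illustrated with a small sketch. It assumes, hypothetically, that gaze samples have already been mapped to grid-cell indices on the 4 × 3 array and that each cell’s category label is known; it returns the fraction of men among the unique faces gazed at (the definition given with Eq. 1) and the deviation of that fraction from the true fraction on display. All names are illustrative.

```python
def visual_sample(gazed_cells, cell_gender):
    """Fraction of men among the unique faces hit by gaze on one exposure trial.

    gazed_cells: grid-cell indices (0-11 on the 4 x 3 array) hit by gaze samples.
    cell_gender: hypothetical mapping from grid-cell index to "male" or "female".
    Returns None when no face was gazed at.
    """
    unique = set(gazed_cells)
    if not unique:
        return None
    n_men = sum(1 for cell in unique if cell_gender[cell] == "male")
    return n_men / len(unique)


def visual_error(gazed_cells, cell_gender):
    """Visual sample minus the true fraction of men on the display."""
    sample = visual_sample(gazed_cells, cell_gender)
    if sample is None:
        return None
    true_frac = sum(g == "male" for g in cell_gender.values()) / len(cell_gender)
    return sample - true_frac
```

For a display with 3 men (cells 0 to 2), gaze landing on cells 0, 0, 5, 1, and 7 covers four unique faces, two of them men, so the visual sample is 0.5 against a true fraction of 0.25: a positive visual error, i.e., oversampling of the minority.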
Finally, we verified in further detail the relation between subjects’ individual glances and final estimates of gender proportions. We performed a linear regression with subjects’ proportion estimates as the dependent variable, along with prior gaze data entering as independent variables. Each previous exposure trial provided time-lagged regressors similar to sequence effects induced by previous trials’ stimulus intensity and elicited responses (King & Lockhead, 1981). Given that the true gender proportions persisted throughout a given block, this regression analysis allows us to verify the explanatory power of the visual information collected by our subjects, as well as the individual weights given to each prior exposure trial. Our dependent variable comprises the final estimate on each block so that we are able to examine the weights of each visual sample reaching back to the first exposure.
Specifically, we fit the model:
$$E_t = \beta_0 + \sum_{k=1}^{5} \beta_k V_{t-k} \qquad (1)$$
where $E_t$ refers to the estimate (the composition of men and women on the display) declared by the subject on trial t, $\beta_0$ is the fitted intercept term, and $V_{t-k}$ is the gender fraction that was gazed at on the k-th preceding exposure trial (the number of unique men divided by the total number of unique faces gazed at), with $\beta_k$ its fitted weight. This model explains 83.8% of the variance in judgments, providing further support for the hypothesis that visual information gathered by subjects’ gaze meaningfully informs their estimates. Furthermore, each visual sample was a significant regressor within the model, with the first exposure of each block (lag t−5) weighted most heavily toward subjects’ estimates (Table 1). Collectively, the results in this section provide evidence for (i) a systematic pattern of errors across the range tested in this study; and (ii) a similar pattern of visual sampling errors that guides estimation behavior throughout the task.
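A minimal sketch of fitting Eq. (1) by ordinary least squares, written in plain Python via the normal equations. The row layout (V_{t−1} through V_{t−5} per block) and all function names are assumptions. The simulated coefficients are chosen near Table 1 purely to check that the fit recovers known values; standard errors and confidence intervals, which a statistics package would normally supply, are not computed.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_lagged_model(final_estimates, lagged_samples):
    """OLS fit of E_t = b0 + sum_k b_k * V_{t-k} via the normal equations.

    lagged_samples: one row per block, [V_{t-1}, ..., V_{t-5}] (hypothetical layout).
    Returns [b0, b1, ..., b5].
    """
    X = [[1.0] + list(row) for row in lagged_samples]  # prepend intercept column
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, final_estimates)) for i in range(p)]
    return solve_linear(xtx, xty)


def simulate_blocks(coefs, n_blocks, seed=0):
    """Noise-free synthetic blocks generated from known coefficients (for checking the fit)."""
    rng = random.Random(seed)
    samples = [[rng.random() for _ in coefs[1:]] for _ in range(n_blocks)]
    estimates = [coefs[0] + sum(b * v for b, v in zip(coefs[1:], row)) for row in samples]
    return estimates, samples
```

Because the simulated estimates contain no noise, the fitted coefficients should match the generating ones up to floating-point error; with real data, the residual error would correspond to the RMSE reported under Table 1.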
Table 1.
Linear regression analysis: effects of prior visual samples on current estimate.
| Time-lagged visual sample | Estimated coefficient | SE | 95% CI (LL) | 95% CI (UL) |
|---|---|---|---|---|
| t−1 | 0.15 | 0.022 | 0.11 | 0.20 |
| t−2 | 0.17 | 0.021 | 0.13 | 0.22 |
| t−3 | 0.17 | 0.023 | 0.13 | 0.21 |
| t−4 | 0.20 | 0.022 | 0.16 | 0.25 |
| t−5 | 0.25 | 0.021 | 0.21 | 0.29 |
| Intercept | 0.043 | 0.0074 | 0.027 | 0.058 |
Number of observations = 1055; error DOF = 1049; root mean squared error = 0.123; R² = 0.838; adjusted R² = 0.837. CI = confidence interval; LL = lower limit; UL = upper limit. Each regressor is significant at the p < .001 level.
2.2.2. Visual sampling and its relation to perceiver uncertainty
Next, we test the hypothesis that the visual sampling bias coincides with the perceptual uncertainty associated with each composition type. A relation with subjects’ apparent uncertainty would indicate that oversampling is also linked to the apparent difficulty of estimating particular ratios. Similar to Khaw et al. (2021), we operationalized perceiver uncertainty with three measures: (i) the variability of responses within a block of trials, (ii) the response times exhibited by subjects prior to declaring a judgment, and (iii) the persistence of errors following each individual exposure trial for each gender proportion. Each of these measures can be interpreted as representing average levels of uncertainty faced by the viewer, as certain group composition types ought to be easier (e.g., completely homogeneous groups) or more difficult to summarily discern. The relatively greater difficulty of estimating particular compositions, such as intermediate proportions (e.g., 25% and 75%), might elicit increased rates of biased sampling. Alternatively, the distribution of attention to minority categories might itself exacerbate the task difficulty experienced by subjects, in which case elevated response variability, response times, and errors would occur with ensemble types that elicit greater rates of visual oversampling. Both accounts predict a positive relation between the magnitude of the visual sampling bias and our measures of observer uncertainty. We found that each of these measures of uncertainty (variability, response time, and error persistence) covaries with the overall trend found in the sampling and estimation errors (Fig. 4, Fig. 3B).
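The three uncertainty measures can be sketched per composition as follows. The block format is hypothetical, and the error-persistence proxy used here (mean absolute error across a block’s exposures) is an assumption for illustration rather than the exact definition used in the analysis. Within-block variability is computed as a population SD so that blocks of any length are handled.

```python
from collections import defaultdict
from statistics import mean, pstdev


def uncertainty_by_composition(blocks):
    """Average uncertainty proxies for each true composition.

    blocks: hypothetical list of dicts with keys
      "true"      true proportion of men in the block,
      "estimates" post-exposure estimates in trial order (same scale),
      "rts"       response times (s) for those estimates.
    Returns {true: (variability, mean_rt, persistence)} where
      variability = mean within-block SD of estimates,
      mean_rt     = mean of within-block mean RTs,
      persistence = mean absolute error across the block's exposures
                    (an assumed proxy, not the paper's exact definition).
    """
    per_comp = defaultdict(lambda: ([], [], []))
    for blk in blocks:
        sd_list, rt_list, err_list = per_comp[blk["true"]]
        sd_list.append(pstdev(blk["estimates"]))
        rt_list.append(mean(blk["rts"]))
        err_list.append(mean(abs(e - blk["true"]) for e in blk["estimates"]))
    return {comp: (mean(sds), mean(rts), mean(errs))
            for comp, (sds, rts, errs) in per_comp.items()}
```

Plotting each proxy against composition would correspond to the curves in Fig. 4.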
Fig. 4.
Composition-specific values of average perceptual uncertainty and estimation difficulty. In each of these measures, values peak at intermediate proportions, minimizing at central and extreme proportions. (A) The average variability associated with estimates recorded on each block and the average response time following exposure to each type of ensemble composition. (B) Intermediate proportions also feature the most persistent degree of errors, as reflected by the magnitude of errors observed across repeated exposures.
To test oversampling’s relation with uncertainty more formally, we examined aggregate and subject-level correlations between the relevant variables. First, at the aggregate level, significant rank-wise correlations were found between absolute levels of visual sampling errors and both response variability (r(11) = 0.58, p < .05) and average response times (r(11) = 0.73, p < .01), as well as between absolute estimation errors and both variability (r(11) = 0.69, p < .05) and response times (r(11) = 0.80, p < .01). Second, we computed the corresponding correlation coefficients at the subject level for each of these variable pairs. The resulting positively shifted distributions are visualized in the Supplementary Materials (Supplementary Fig. D1). One-sample t-tests were performed on the Fisher-Z transformed correlation coefficients for each relation. Each distribution differed significantly from zero in the positive direction for the correlations between: (i) visual errors and RT (median coefficient = 0.18, t(44) = 5.24, p < .001), (ii) visual errors and variability (median coefficient = 0.14, t(44) = 2.88, p < .01), (iii) estimation errors and RT (median coefficient = 0.26, t(44) = 4.97, p < .001), and (iv) estimation errors and variability (median coefficient = 0.50, t(44) = 10.48, p < .001). Overall, the covarying patterns of visual bias, judgment errors, and estimation difficulty suggest that the perceptual uncertainty experienced by the observer co-occurs with the primary attentional and judgment biases.
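The correlation analyses above can be sketched with the standard library: a Spearman (rank-wise) correlation for the aggregate, composition-level relations, and a one-sample t statistic on Fisher-z transformed subject-level coefficients. Function names are hypothetical, the sketch assumes |r| < 1 (atanh is undefined at ±1), and p-values, which require the t distribution, are omitted.

```python
import math
from statistics import mean, stdev


def rank(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Rank-wise (Spearman) correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def fisher_z_ttest(rs):
    """One-sample t statistic (vs. 0) on Fisher-z transformed correlations."""
    z = [math.atanh(r) for r in rs]
    t = mean(z) / (stdev(z) / math.sqrt(len(z)))
    return t, len(z) - 1
```

With 13 compositions, an aggregate correlation has 11 degrees of freedom, and 45 subject-level coefficients give t(44), matching the statistics reported above.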
2.2.3. Within-trial sampling dynamics
If the oversampling effect is the result of attributing overt attention to specific category members, then we should observe evidence for goal-directed eye movements following a brief pre-attentive phase, as seen in visual search tasks (Töllner, Zehetleitner, Gramann, & Müller, 2011), or as in the case of item individuation (Pagano & Mazza, 2012) before the enumeration of identified ensemble elements.
To test for the presence of this kind of visual processing trajectory, we computed within-trial differences between the observed probability of gazing at a particular gender category and the expected probability based on the true group composition of each trial (e.g., under unbiased sampling, men would be gazed at 80% of the time when 80% of the displayed faces were men). These differences in sampling probabilities were computed for each moment of recorded gaze data and averaged within 0.1 s time bins, each representing a 10% increment of the exposure duration following trial onset (Fig. 5). Chi-square proportion tests comparing the visually sampled gender proportions with those expected from the true proportions on display first revealed a significant difference at the 0.3 s bin (χ²(1, N = 1720) = 8.18, p < .01). This divergence indicates that gaze behavior began to deviate meaningfully from random sampling, toward a specific kind of oversampling, well within the 1 s duration of exposure trials.
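A minimal sketch of the within-trial analysis, assuming gaze samples arrive as hypothetical (time in ms, gazed-at-man flag, true fraction of men) records: deviations between observed and expected sampling probabilities are averaged within 100 ms bins, and a chi-square goodness-of-fit test with 1 degree of freedom compares gazed and expected category counts within a bin. Homogeneous displays are assumed to be excluded, since an expected count of zero leaves the statistic undefined.

```python
import math


def chi_square_proportion(observed_men, n_total, expected_frac_men):
    """Chi-square goodness-of-fit test (df = 1) of gazed vs. expected category counts."""
    observed = [observed_men, n_total - observed_men]
    expected = [n_total * expected_frac_men, n_total * (1 - expected_frac_men)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def binned_deviation(gaze_samples, bin_ms=100, duration_ms=1000):
    """Mean (observed - expected) probability of gazing at a man, per time bin.

    gaze_samples: hypothetical list of (time_ms, gazed_at_man, true_frac_men),
    one entry per recorded gaze sample that landed on a face.
    Bins without samples are returned as None.
    """
    n_bins = duration_ms // bin_ms
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for time_ms, gazed_at_man, true_frac in gaze_samples:
        b = min(int(time_ms) // bin_ms, n_bins - 1)
        sums[b] += (1.0 if gazed_at_man else 0.0) - true_frac
        counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Timestamps are kept in integer milliseconds so that bin boundaries (e.g., 300 ms) are not shifted by floating-point division.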
Fig. 5.
State-specific divergences in within-trial gaze dynamics. The sampled proportion of faces of either gender was significantly biased: participants began to excessively sample the minority category of face ensembles after an initial period of around 0.3 s.
As an additional benchmark that controls for the number of observations in our dataset, we compared sampling deviations to a simulated unbiased observer that sampled each face from the ensemble with uniform probability. We generated a sample for every moment of recorded eye movement data and computed the simulated observer’s deviations from expected frequencies. Subjects’ deviations begin to fall outside of the maximum range of simulated errors around 300 ms following stimuli onset (Supplementary Fig. E1A). In addition, we found that the rate of switches between individual ensemble elements increases from trial onset (Supplementary Fig. E1B), suggesting that meaningful departures from random sampling are also indicative of an increase in general information-seeking behavior.
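The unbiased-observer benchmark can be sketched as a simple Monte Carlo simulation. The sketch assumes a 12-face array and a fixed number of gaze samples per simulated time bin, and it uses the minimum and maximum simulated deviations as the benchmark range; these choices and all names are illustrative assumptions rather than the exact simulation settings.

```python
import random


def unbiased_deviation(true_frac_men, n_samples, rng, n_faces=12):
    """Observed minus expected fraction of men for a simulated observer
    whose gaze lands on each of the n_faces faces with equal probability."""
    n_men = round(true_frac_men * n_faces)
    # Faces 0..n_men-1 are treated as men; layout is irrelevant under uniform sampling.
    hits = sum(1 for _ in range(n_samples) if rng.randrange(n_faces) < n_men)
    return hits / n_samples - n_men / n_faces


def null_range(true_frac_men, n_samples, n_sims=1000, seed=0):
    """Min and max simulated deviations; observed deviations outside this
    range are treated as exceeding the unbiased benchmark."""
    rng = random.Random(seed)
    devs = [unbiased_deviation(true_frac_men, n_samples, rng) for _ in range(n_sims)]
    return min(devs), max(devs)
```

Matching `n_samples` to the number of recorded gaze samples in each time bin keeps the benchmark’s sampling noise comparable to that of the observed data.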
2.3. Discussion
We find that the estimates of group compositions exhibit conservatism, similar to that seen in psychophysical experiments of categorical proportion estimation (e.g., Brooke & MacRae, 1977; Erlick, 1964); the reported prevalences of a particular gender were overestimated when their true percentages were below 50%, and underestimated when their true percentages were above 50%. Consistent with the conjecture of Shuford (1961), subjects’ gaze data confirm a sampling process, albeit one that is biased in a particular manner. Within visual samples encompassed by eye movements, faces belonging to the gender minority were consistently overrepresented, matching the modal errors in reported judgments. Furthermore, estimation and visual errors peaked in tandem at intermediate proportions (e.g., 25% men, 75% women) in a periodic manner, a trend matched by levels of response times and variability. The results of Study 1 suggest that overt attention toward gender minorities, emerging consistently within 1 s exposures, ramifies into judgment errors and perceiver uncertainty. The main observations are summarized in Fig. 6. We focus next on testing the specificity of these phenomena beyond human-face images in Study 2.
Fig. 6.
Summary schematic of the overall influences on group composition estimates. Attentional biases elicited by ensemble stimuli direct gaze behavior; bias magnitudes match the levels of composition- or ratio-specific perceptual uncertainty. Resulting gaze behavior during brief exposures provides enumerated information (visual samples) that informs subjects’ estimates.
3. Study 2: group proportion estimation with natural scene ensembles
The cognitive ability to quickly form summary impressions of groups has been established using human stimuli (e.g., faces and bodies) as well as non-human stimuli (e.g., shapes and objects; see Whitney & Yamanashi Leib, 2018 for a review). In Study 2, we test the replicability and generalizability of the attentional and judgment patterns seen in Study 1 using alternate categories of non-human images: natural scenes of indoor and outdoor locations.
3.1. Method
All methods, materials, and procedures, with the exception of the visual stimuli, were identical to those used in Study 1. In Study 2, a set of natural scene images comprising indoor and outdoor locations replaced the face ensembles in each trial type (Supplementary Figs. F1, F2). The same subjects participated in both Study 1 and Study 2 in a random counterbalanced order.
3.2. Results
3.2.1. Overall estimation and sampling performance
Similar to Study 1, we first test for relations between subjects’ judgments in each trial with the true underlying ensemble compositions as well as subjects’ gaze data on the preceding exposure trials. Estimates of scene compositions were significantly and positively associated with their true underlying values (r(5398) = 0.89, p < .001) as well as the sample of scene images gazed at by subjects on each trial (r(5398) = 0.72, p < .001; Fig. 7A). Estimates also exhibited varying levels of over- and underestimation, of small and large proportions respectively. Both judgment errors (F(12, 5387) = 62.23, p < .001) and visual errors (F(12, 6466) = 45.48, p < .001) were not distributed randomly around the zero error benchmark. Similar to Study 1, average errors tended to be positive for proportions below 50% and negative for proportions above 50% (Fig. 7B).
Fig. 7.
Judgment and viewing patterns with natural scene stimuli. (A) Estimates of the prevalence of scene categories were positively associated with (i) the true underlying percentages (top panel) and (ii) the visual sample of scenes encompassed by subjects’ gaze (bottom panel). Lines of best fit are shown in solid red. (B) Subjective estimates (top panel) as well as gaze behavior (bottom panel) exhibit similar patterns of bias previously documented with the face stimuli. For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.
Further supporting the role of visual samples in informing subjects’ estimates, the linear model specified in Eq. 1 (replacing men/women variables with their indoor/outdoor equivalents) provides a good fit to the estimation performance of subjects, accounting for 80% of the variance in estimates. The first exposure trial is again weighted most heavily among the successive visual samples included within the model (Table 2).
Table 2.
Linear regression analysis: effects of prior visual samples on current estimate in Study 2.
| Time-lagged visual sample | Estimated coefficient | SE | 95% CI (LL) | 95% CI (UL) |
|---|---|---|---|---|
| t−1 | 0.15 | 0.025 | 0.10 | 0.20 |
| t−2 | 0.16 | 0.025 | 0.11 | 0.22 |
| t−3 | 0.18 | 0.027 | 0.13 | 0.24 |
| t−4 | 0.20 | 0.025 | 0.15 | 0.24 |
| t−5 | 0.25 | 0.024 | 0.21 | 0.30 |
| Intercept | −0.023 | 0.0086 | −0.040 | −0.0056 |
Number of observations = 1066; error DOF = 1060; root mean squared error = 0.137; R² = 0.805; adjusted R² = 0.804. CI = confidence interval; LL = lower limit; UL = upper limit. Each regressor is significant at the p < .01 level.
3.2.2. Visual sampling and its relation to perceiver uncertainty
Here we again test for a positive association between viewer uncertainty (as reflected by response variability across blocks and average response times) and two of our main outcome variables: judgment errors and rates of oversampling. Both variability (r(11) = 0.79, p < .01) and response times (r(11) = 0.81, p < .01) were positively associated with the absolute level of judgment errors at each scene composition. Aggregate levels of visual sampling bias (Fig. 7B) were also correlated with each measure of perceptual uncertainty (Fig. 8A): response variability (r(11) = 0.59, p < .05) and response times (r(11) = 0.83, p < .05). Mirroring the subject-level analysis in Study 1, one-sample t-tests were performed on the Fisher-Z transformed correlation coefficients for each distribution; the distributions of subject-wise correlations are plotted in Supplementary Fig. D1. Each distribution differed significantly from zero in the positive direction for every variable pair: (i) visual errors and RT (median coefficient = 0.22, t(44) = 5.98, p < .001), (ii) visual errors and variability (median coefficient = 0.17, t(44) = 3.00, p < .01), (iii) estimation errors and RT (median coefficient = 0.38, t(44) = 6.73, p < .001), and (iv) estimation errors and variability (median coefficient = 0.58, t(44) = 10.79, p < .001).
Fig. 8.
Measures of observer uncertainty and within-trial gaze dynamics in Study 2. (A) Levels of response variability and response times (operationalized measures of perceptual uncertainty) align with the magnitude of the judgment and sampling errors observed, replicating the aggregate and subject-level correlations observed in Study 1. (B) Within-trial eye movement dynamics with natural scene stimuli; the majority-specific divergences in sampling rates again occur at approximately 0.3 s following stimulus onset.
3.2.3. Within-trial sampling dynamics
Similar to Study 1, we found a divergence between displayed and viewed category proportions that occurs approximately 0.3 s following stimulus onset (Fig. 8B); the significance of this difference was confirmed with a chi-square proportion test between the expected and viewed proportions within the corresponding time bin (χ²(1, N = 3057) = 5.08, p < .05). This separation in gaze trajectories supports the hypothesis that eye movements are incrementally directed toward salient (minority) category members following an initial pre-attentive phase, prior to the parsing of category-specific features within each ensemble.
3.3. Discussion
The results from Study 2 collectively replicate each of the main phenomena (Fig. 6) observed with face ensembles in Study 1. The disproportionate influence of minority category members generalizes to natural scene images; minority elements were again overrepresented in subjects’ gaze patterns and proportion estimates. The similarity in effects across domains suggests that the holistic impressions reported by our subjects were not contingent on face-specific familiarity or expertise (see Gauthier (2020) for a treatise on the importance of contrasting face and non-face treatments due to expertise-based differences). Gaze patterns were thus also not exclusively guided by the configural properties of faces that facilitate summary representations (Haberman & Whitney, 2009) or by secondary social cues specific to faces (e.g., racial/ethnic groupings, as analyzed further in the Supplementary Materials). The recurrence of the attentional bias also parallels existing results in ensemble and proportion estimation. The visual grouping of human or non-human stimuli can elicit judgment biases common to both types of ensembles (Carragher, Thomas, Gwinn, & Nicholls, 2019). In addition, our two studies jointly extend the range of results on proportion estimation, which was originally tested using shapes, symbols, and letters (e.g., Erlick, 1964; Shuford, 1961).
4. General discussion
We conclude that errors in perceptual judgments about the composition of groups are driven by an attentional bias that favors members of the minority category. This conclusion follows from converging evidence across four primary results, as summarized in Fig. 6. First, we show that people overestimate the quantity of faces and scenes belonging to the minority category, a pattern previously seen in other forms of proportion estimation. Second, we show that minority category members are visually oversampled (i.e., are more likely to be the target of eye gaze), consistent with the observed pattern of judgment biases. The trial-by-trial relation between subsampled faces and reported biases was corroborated by a linear regression analysis (Table 1). Third, we demonstrate that perceptual uncertainty, as indexed by average levels of response variability and response times, tracks these overlapping patterns of visual and reported errors. Fourth, at the within-trial level, we show that these goal-directed eye movements toward minority elements arise as rapidly as 0.3 s after stimulus onset, suggesting that subjects’ capacity to selectively identify and process samples of interest emerges immediately after a pre-attentive phase.
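The claim that minority oversampling can by itself produce the canonical judgment pattern can be illustrated with a toy sampling rule. This is a hypothetical sketch, not the regression reported in Table 1 and not a model fitted by the authors. It assumes that each minority-category element is w times as likely to be fixated as a majority-category element, with w = 1 + boost·|2p − 1| so that the advantage vanishes when the groups are balanced, and that the reported proportion simply equals the viewed proportion. The function name `viewed_proportion` and the parameter `boost` are illustrative inventions.

```python
def viewed_proportion(p: float, boost: float) -> float:
    """Expected share of fixations on category A when A makes up p of the display.

    Hypothetical rule: a minority-category element is w times as likely to be
    fixated as a majority-category element, with w = 1 + boost * |2p - 1|, so
    the minority advantage vanishes for balanced groups (p = 0.5).
    """
    if not 0.0 <= p <= 1.0 or boost < 0.0:
        raise ValueError("need 0 <= p <= 1 and boost >= 0")
    w = 1.0 + boost * abs(2.0 * p - 1.0)
    if p <= 0.5:
        # Category A is the minority (at p = 0.5, w == 1 and nothing changes).
        return w * p / (w * p + (1.0 - p))
    # Category B is the minority, so B members carry the extra weight.
    return p / (p + w * (1.0 - p))


for p in (0.1, 0.25, 0.4, 0.5, 0.6, 0.75, 0.9):
    v = viewed_proportion(p, boost=1.0)
    print(f"displayed {p:.2f} -> viewed {v:.3f} (error {v - p:+.3f})")
```

With boost = 1, a display that is 25% category A yields an expected viewed share of one third, while a 75% display yields two thirds. Errors are therefore positive below 0.5, negative above it, and zero at 0, 0.5, and 1.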
The oversampling of minority category members can be interpreted as a form of information-seeking behavior exhibited by the majority of our subjects when assessing the composition of groups. For example, subjects’ gaze might be distributed reflexively on the basis of low-level properties (akin to outlier detection), or it might be consciously directed at countable elements in a goal-directed manner (potentially linked to enumeration or numerosity processing, discussed below). The influence of subjects’ visual samples on their judgments dovetails with the finding that subsampling informs average impressions of emotion ensembles (Ji et al., 2020). Although the tendency to identify minority category members can lead to errors in judgment, it might be especially effective for disambiguating specific types of group compositions. For instance, detecting categorical outliers might have enabled our subjects to quickly determine whether the current group was completely homogeneous or comprised a clear majority. Furthermore, this allocation of attention might be adaptive for other visual arrangements of ensemble information, particularly in natural environments where members of a category tend to be spatially clustered (e.g., homophily in social networks; see McPherson, Smith-Lovin, & Cook, 2001). Future experiments might manipulate the visual clustering within ensemble displays to introduce spatial correlations between the locations of each social category on display (e.g., clusters of same-race individuals).
Accurate estimation in our task requires the maintenance and retrieval of information gained during exposure trials. Oversampled elements might offer exemplars that are more easily retrieved from memory (e.g., because of a recency effect, their encoding as salient categorical outliers, or their status as visually distinct episodes). The cognitive overweighting of categories within ensembles has been observed in other experimental contexts, e.g., preferential weighting toward faces with strong emotions (Goldenberg et al., 2021), other-race faces (Thornton et al., 2019), and spatially and temporally salient elements (Kanaya, Hayashi, & Whitney, 2018). The relative ease of recall might then alter proportion judgments through decision rules such as the representativeness heuristic (Kahneman & Tversky, 1972). Modifications to the current task could examine recall-based effects downstream of attentional biases by introducing delays between exposure and estimation trials, or by adding probe trials that measure the ease of recall for specific faces.
Our study also sheds light on how ensemble elements are identified and integrated over short timescales, a question studied in the field of numerosity perception. The interpretation of subjects’ eye movements over time (Figs. 5, 8B) is consistent with two general forms of multiple-object processing: individuation and identification (Xu, 2009). Individuation allows for the simultaneous detection of around four objects, along with their spatial relations, at sub-second timescales (Pagano & Mazza, 2012). Performance in detecting visual targets within this range is independent of stimulus complexity (Mazza & Caramazza, 2011), making individuation a candidate mechanism for minority element detection. Identification follows this pre-attentive phase and allows further visual detail to be extracted from individuated elements. Eye movements might then be guided toward specific category members (Reijnen, Wolfe, & Krummenacher, 2013) or category attributes, after which the fixated elements are weighted to produce an overall estimate of proportion.
The sinusoidal patterns of errors (in both judgments and eye movements) mirror those found in higher-level estimates of social attitudes and factual knowledge, e.g., estimates of the proportion of individuals in a society who hold particular political and religious beliefs (Landy et al., 2018). The computational model proposed by Landy et al. (2018) predicts sinusoidal errors arising from logarithmic encoding of proportions and uncertainty-weighted averaging. Although their analysis assumes accurate input-level information, our results show that brief presentations yield similar biases at the level of information acquisition. Biases introduced by the overweighting of prior beliefs might then exacerbate these viewing errors, or indeed guide them, as in the case of visual confirmatory biases (Eberhardt, Goff, Purdie, & Davies, 2004). Although it is beyond the scope of our study to establish links between attentional mechanisms and societal beliefs, we note that our subjects’ biases are analogous to errors observed in large-scale surveys of demographic proportions.
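The model attributed to Landy et al. (2018) can be sketched in a few lines. The sketch assumes one common way of formalizing it rather than reproducing their fitted model: each proportion is encoded as log-odds, and the estimate is a weighted average of the encoded true value and a prior, with the weight `gamma` standing in for uncertainty. The function names, the prior centred at 0.5, and the value gamma = 0.5 are illustrative assumptions. Because the averaging happens on the log-odds scale, small proportions are pulled up, large ones are pulled down, and errors shrink toward zero at 0.5 and at the extremes.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Map log-odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def estimate(p: float, gamma: float, prior: float = 0.5) -> float:
    """Weighted average, on the log-odds scale, of the true proportion and a prior.

    gamma -- hypothetical weight on the encoded true value (0 < gamma <= 1);
             smaller values stand in for greater uncertainty.
    prior -- hypothetical proportion at which the prior is centred.
    """
    if not 0.0 < p < 1.0 or not 0.0 < prior < 1.0:
        raise ValueError("proportions must lie strictly between 0 and 1")
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    return inv_logit(gamma * logit(p) + (1.0 - gamma) * logit(prior))


for p in (0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95):
    e = estimate(p, gamma=0.5)
    print(f"true {p:.2f} -> estimate {e:.3f} (error {e - p:+.3f})")
```

With gamma = 0.5 and a prior centred at 0.5, a true proportion of 0.10 maps to an estimate of 0.25 and 0.90 maps to 0.75; setting gamma = 1 removes the bias entirely.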
The distribution of attention employed by our subjects has implications for the presentation of groups in applied settings. The attentional bias seen here might contribute to the efficacy of minority representation in various media, such as magazine advertisements (Bowen & Schmid, 1997) and marketing materials depicting inclusivity (Pippert, Essenburg, & Matchett, 2013). The generalizability of the phenomena we observe would depend on whether the sampling bias is scale-invariant across group sizes, presentation durations, and more naturalistic presentation styles. More broadly, the phenomenon of visual attention being drawn to statistical outliers has been documented in several forms. Related phenomena include eye gaze being drawn toward surprising events that unfold over time (Itti & Baldi, 2009), salient low-level visual properties (Parkhurst, Law, & Niebur, 2002), and regions featuring high amounts of semantic information (Peacock, Hayes, & Henderson, 2019).
Our results reveal that the presence of minority category members shapes viewers’ attention and their subsequent overall impressions. The tendency to detect categorical outliers drives an initial overrepresentation of minority elements, accompanied by a parallel pattern of estimation errors. These results suggest that erroneous impressions of our environment form very rapidly: during initial glances at the people or places that constitute our surroundings.
Acknowledgements
The authors gratefully acknowledge financial support from the National Institute of Mental Health (NIMH R01-108627; S.H.). The funder had no role in the design, execution, interpretation, or writing of the study. The authors would like to thank M. Cotet for help with data collection and B. De Oliveira for help with manuscript preparation.
Appendix A. Supplementary material
Supplementary material to this article can be found online at https://doi.org/10.1016/j.cognition.2021.104756.
Data availability
Data from both studies can be accessed at https://osf.io/y6q82/.
References
- Alt NP, Goodale B, Lick DJ, & Johnson KL (2019). Threat in the company of men: Ensemble perception and threat evaluations of groups varying in sex ratio. Social Psychological and Personality Science, 147(11), 1660.
- Ambady N, & Rosenthal R (1992). Thin slices of expressive behavior as predictors of interpersonal consequences: A meta-analysis. Psychological Bulletin, 111(2), 256.
- Bowen L, & Schmid J (1997). Minority presence and portrayal in mainstream magazine advertising: An update. Journalism & Mass Communication Quarterly, 74(1), 134–146.
- Brooke JB, & MacRae AW (1977). Error patterns in the judgment and production of numerical proportions. Perception & Psychophysics, 21(4), 336–340.
- Carragher DJ, Thomas NA, Gwinn OS, & Nicholls ME (2019). Limited evidence of hierarchical encoding in the cheerleader effect. Scientific Reports, 9(1), 1–13.
- Eberhardt JL, Goff PA, Purdie VJ, & Davies PG (2004). Seeing black: Race, crime, and visual processing. Journal of Personality and Social Psychology, 87(6), 876.
- Edwards W (1968). Conservatism in human information processing. In Formal representation of human judgment.
- Erlick DE (1964). Absolute judgments of discrete quantities randomly distributed over time. Journal of Experimental Psychology, 67(5), 475.
- Gauthier I (2020). What we could learn about holistic face processing only from nonface objects. Current Directions in Psychological Science, 29(4), 419–425.
- Goldenberg A, Weisz E, Sweeny T, Cikara M, & Gross J (2021). The crowd emotion amplification effect. Psychological Science, 32(3), 437–450.
- Goodale BM, Alt NP, Lick DJ, & Johnson KL (2018). Groups at a glance: Perceivers infer social belonging in a group based on perceptual summaries of sex ratio. Journal of Experimental Psychology: General, 147(11), 1660.
- Haberman J, & Whitney D (2009). Seeing the mean: Ensemble coding for sets of faces. Journal of Experimental Psychology: Human Perception and Performance, 35(3), 718.
- Hollands JG, & Dyre BP (2000). Bias in proportion judgments: The cyclical power model. Psychological Review, 107(3), 500.
- Huttenlocher J, Hedges LV, & Duncan S (1991). Categories and particulars: Prototype effects in estimating spatial location. Psychological Review, 98(3), 352.
- Itti L, & Baldi P (2009). Bayesian surprise attracts human attention. Vision Research, 49(10), 1295–1306.
- Ji L, Pourtois G, & Sweeny TD (2020). Averaging multiple facial expressions through subsampling. Visual Cognition, 28(1), 41–58.
- Kahneman D, & Tversky A (1972). Subjective probability: A judgment of representativeness. Cognitive Psychology, 3(3), 430–454.
- Kanaya S, Hayashi MJ, & Whitney D (2018). Exaggerated groups: Amplification in ensemble coding of temporal and spatial features. Proceedings of the Royal Society B: Biological Sciences, 285(1879), 20172770.
- Khaw MW, Nichols P, & Freedberg D (2021). Uncertainty-based overestimation of group actions. Vision Research, 179, 42–52.
- King MC, & Lockhead GR (1981). Response scales and sequential effects in judgment. Perception & Psychophysics, 30(6), 599–603.
- Landy D, Guay B, & Marghetis T (2018). Bias and ignorance in demographic perception. Psychonomic Bulletin & Review, 25(5), 1606–1618.
- Ma DS, Correll J, & Wittenbrink B (2015). The Chicago face database: A free stimulus set of faces and norming data. Behavior Research Methods, 47(4), 1122–1135.
- Mazza V, & Caramazza A (2011). Temporal brain dynamics of multiple object processing: The flexibility of individuation. PLoS One, 6(2), Article e17453.
- McPherson M, Smith-Lovin L, & Cook JM (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1), 415–444.
- Pagano S, & Mazza V (2012). Individuation of multiple targets during visual enumeration: New insights from electrophysiology. Neuropsychologia, 50(5), 754–761.
- Parkhurst D, Law K, & Niebur E (2002). Modeling the role of salience in the allocation of overt visual attention. Vision Research, 42(1), 107–123.
- Peacock CE, Hayes TR, & Henderson JM (2019). Meaning guides attention during scene viewing, even when it is irrelevant. Attention, Perception, & Psychophysics, 81(1), 20–34.
- Phillips LT, Slepian ML, & Hughes BL (2018). Perceiving groups: The people perception of diversity and hierarchy. Journal of Personality and Social Psychology, 114(5), 766.
- Pippert TD, Essenburg LJ, & Matchett EJ (2013). We’ve got minorities, yes we do: Visual representations of racial and ethnic diversity in college recruitment materials. Journal of Marketing for Higher Education, 23(2), 258–282.
- Reijnen E, Wolfe JM, & Krummenacher J (2013). Coarse guidance by numerosity in visual search. Attention, Perception, & Psychophysics, 75(1), 16–28.
- Shuford EH (1961). Percentage estimation of proportion as a function of element type, exposure time, and task. Journal of Experimental Psychology, 61(5), 336–340.
- Spence I (1990). Visual psychophysics of simple graphical elements. Journal of Experimental Psychology: Human Perception and Performance, 16(4), 683.
- Spence I, & Krizel P (1994). Children’s perception of proportion in graphs. Child Development, 65(4), 1193–1213.
- Stevens SS (1957). On the psychophysical law. Psychological Review, 64(3), 153.
- Sweeny TD, & Whitney D (2014). Perceiving crowd attention: Ensemble perception of a crowd’s gaze. Psychological Science, 25(10), 1903–1913.
- Thornton IM, Srismith D, Oxner M, & Hayward WG (2019). Other-race faces are given more weight than own-race faces when assessing the composition of crowds. Vision Research, 157, 159–168.
- Todorov A, Olivola CY, Dotsch R, & Mende-Siedlecki P (2015). Social attributions from faces: Determinants, consequences, accuracy, and functional significance. Annual Review of Psychology, 66.
- Töllner T, Zehetleitner M, Gramann K, & Müller HJ (2011). Stimulus saliency modulates pre-attentive processing speed in human visual cortex. PLoS One, 6(1), Article e16276.
- Trawalter S, Todd AR, Baird AA, & Richeson JA (2008). Attending to threat: Race-based patterns of selective attention. Journal of Experimental Social Psychology, 44(5), 1322–1327.
- Treisman AM (1982). Perceptual grouping and attention in visual search for features and for objects. Journal of Experimental Psychology: Human Perception and Performance, 8(2), 194.
- Varey CA, Mellers BA, & Birnbaum MH (1990). Judgments of proportions. Journal of Experimental Psychology: Human Perception and Performance, 16(3), 613.
- Weisbuch M, & Ambady N (2011). Thin-slice vision. In The science of social vision (pp. 228–247).
- Whitney D, & Yamanashi Leib A (2018). Ensemble perception. Annual Review of Psychology, 69, 105–129.
- Xu Y (2009). Distinctive neural mechanisms supporting visual object individuation and identification. Journal of Cognitive Neuroscience, 21(3), 511–518.
- Yang X, & Dunham Y (2019). Hard to disrupt: Categorization and enumeration by gender and race from mixed displays. Journal of Experimental Social Psychology, 85, 103893.