Stereotypes shape response competition when forming impressions

Neil Hester; Sally Y Xie; Jeannine Alana Bertin; Eric Hehman

2022 Nov 3;26(8):1706–1725. doi:10.1177/13684302221129429

PMCID: PMC10665134 PMID: 38021317

Abstract

Dynamic models of impression formation posit that bottom-up factors (e.g., a target’s facial features) and top-down factors (e.g., perceiver knowledge of stereotypes) continuously interact over time until a stable categorization or impression emerges. Most previous work on the dynamic resolution of judgments over time has focused on either categorization (e.g., “is this person male/female?”) or specific trait impressions (e.g., “is this person trustworthy?”). In two mousetracking studies—exploratory (N = 226) and confirmatory (N = 300)—we test a domain-general effect of cultural stereotypes shaping the process underlying impressions of targets. We find that the trajectories of participants’ mouse movements gravitate toward impressions congruent with their stereotype knowledge. For example, to the extent that a participant reports knowledge of a “Black men are less [trait]” stereotype, their mouse trajectory initially gravitates toward categorizing individual Black male faces as “less [trait],” regardless of their final judgment of the target.

Keywords: dynamic interactive models, person perception, stereotyping, mousetracking

In North America, Black men are often stereotyped as less intelligent than White men (Devine & Elliot, 1995; Williams & Mohammed, 2013), even by individuals who do not explicitly endorse such a viewpoint. Imagine two separate situations: in one, a perceiver encounters a Black man; in the other, a White man. Both men are evaluated as equally intelligent. Although the ultimate conclusion about each man is the same—that they are “very intelligent”—do perceivers reach this conclusion through the same underlying cognitive process? The route to “very intelligent” may differ between the two scenarios because of perceivers’ awareness of cultural stereotypes. Stereotypic knowledge about gender, race, and other characteristics shapes our expectations, influencing which people are regarded as intelligent (or trustworthy, or any other trait). Thus, North American perceivers may initially be inclined to perceive the very intelligent-looking Black man as “less intelligent” given their stereotypical expectations, even if this initial tendency is corrected or updated milliseconds later in the impression formation process as more information is integrated. Here, we investigate whether stereotype–feature congruence—the degree of agreement between stereotype knowledge and a target’s overall features—has an impact on the way in which the psychological process of impression formation unfolds over time. This research extends existing work on impression formation by examining the interplay between perceiver characteristics and target characteristics throughout the process of impression formation (i.e., as it unfolds over time), rather than only examining how these effects influence its final product.

Background

Impressions formed from others’ appearance are multiply determined. Theoretical models have generally decomposed these contributions into (a) the target’s characteristics, (b) the perceiver’s characteristics, and (c) the interplay between target and perceiver characteristics (Funder, 1995; Hehman et al., 2019; Hester et al., 2021; Hönekopp, 2006; Kenny et al., 2006). The first part is intuitive—what targets look like influences perceivers’ impressions of them. The second part includes ways in which perceiver differences influence their impressions of targets (Kawakami et al., 2017; Kunda & Thagard, 1996; Macrae & Bodenhausen, 2000). For example, perceivers’ moods could have an impact on their impressions, or they might have a stable tendency to give more positive or negative impressions overall. The third part of variability in impression formation—the interaction between perceiver and target—is explained by perceiver and target characteristics together. For example, many stereotype findings involve the combination of perceiver-level endorsements of stereotypes and target-level social identity. Research using variance decomposition techniques has consistently revealed that this interaction is the largest source of variance in impression formation (Hehman et al., 2017; Hester et al., 2021; Hönekopp, 2006; Xie et al., 2019). Whereas target and perceiver characteristics alone each seem to contribute roughly 10%–20% of the variance in a given impression, perceiver × target interactions account for 30%–40% (Hehman et al., 2017; Xie et al., 2019).

Perceiver × target interactions are often at the heart of social cognitive questions and research, encompassing topics such as how perceivers’ cognitions influence behaviors differently when they interact with members of different social groups. For example, prejudiced people rate own- and other-group individuals differently, whereas unprejudiced people do so to a lesser extent (Hugenberg & Bodenhausen, 2003; Hutchings & Haddock, 2008). Impression formation theories present models for how perceiver × target interactions occur (Freeman & Ambady, 2011; Rhodes, 2006; Todorov et al., 2015; Webster & MacLeod, 2011; Zebrowitz & Montepare, 2005) throughout the impression formation process, resulting in a final impression. A modern example is the dynamic interactive model of impression formation, which suggests that the processing of bottom-up facial features is constrained by top-down cognition (e.g., emotion, stereotypes, goals) (Freeman et al., 2020).

Although models of person perception commonly implicate perceiver × target interactions in the impression formation process (i.e., how impressions unfold over time), formal tests of these interactions are still uncommon, as most studies of perceiver × target interactions address the final impression rather than the underlying process. This is partially due to methodological challenges. However, understanding the nature of these interactions is central to such models and their predictions. In this article, we examine the role of perceiver × target interactions in the impression formation process using a novel mousetracking approach.

Tapping Into the Process of Impression Formation

Researchers face a difficult challenge when measuring the specific processes underlying impression formation: impression formation unfolds very quickly and involves psychological factors that are difficult to measure. Much of the work examining the process underlying impression formation relies on indirect inferences from outcomes such as reaction time or error rate. For example, when people are asked to accurately classify threatening versus non-threatening objects (e.g., guns versus tools), racial information affects both reaction times and error rates (Payne, 2001; Todd et al., 2016). Such findings allow for indirect inferences about the underlying process, but not for direct measurement of it. Other measures, such as event-related potentials, can provide more direct insight by recording fluctuations in brain activity (Correll et al., 2015; Kubota & Ito, 2014); however, these techniques can be difficult to use and interpret.

The limitations of these techniques are at least partly addressed by mousetracking, which provides high-resolution, millisecond-level information about people’s decision process as they choose between two or more competing response options (Dale et al., 2007; Freeman & Ambady, 2010). Mousetracking has been employed in multiple domains to test and refine cognitive theories (Stillman et al., 2018). Outside of the social perception domain, mousetracking research has explored the real-time resolution of response competition in the domain of self-control (Gillebaart et al., 2016; Schneider et al., 2015; Stillman et al., 2017) and moral decision-making (Koop, 2013). With regard to person perception, mousetracking research has provided evidence of cognitive competition in the process of gender and race categorization of atypical faces (Freeman et al., 2010; Freeman & Ambady, 2011; Hehman et al., 2014). Finally, mousetracking research has broadly illustrated the dynamic role of stereotypes in social cognition. For instance, race and gender cues in the face interact with stereotype information to drive the categorization of targets’ race, gender, and emotions (Brooks & Freeman, 2018; Johnson et al., 2012).

The vast majority of studies that have used mousetracking to test person perception models have focused on the process of forming social categorizations (e.g., “is this person male or female?”) rather than the process of forming trait impressions (e.g., “how attractive is this person?”), which is the focus of the present research. Social categorization and trait impressions share some similarities (Freeman et al., 2020), and one might assume that they involve the same processes, such that research on the process of forming social categorizations accurately generalizes to the process of forming trait impressions. However, there are reasons to doubt this assumption, necessitating research specifically on the process of forming trait impressions. In the next section, we outline differences between social categorization and trait impressions. Our goal here is not to set up a direct experimental comparison between social categorization and trait impressions, but rather to justify the importance of specifically examining the process underlying trait impressions.

Differences Between Forming Social Categorizations and Trait Impressions

There is evidence suggesting that the processes underlying social categorization are distinct from those underlying trait impressions. To start, categories are usually discrete (it is intuitive to say that someone is male or female), whereas trait impressions are usually continuous (someone can be more or less attractive). Although processes underlying discrete categorizations are continuous, as ongoing response competition between multiple choices resolves itself over time to arrive at a categorization (Dale et al., 2007; Freeman et al., 2008), the nature of the judgment itself is nevertheless different. The determinants of these patterns of fluctuation might differ depending on whether choices reflect discrete categories (e.g., “male” versus “female”) or continuous trait impressions (e.g., “more” versus “less” attractive).

In addition, impressions (e.g., trustworthy, immoral) are typically more valenced than categorizations (e.g., male, female), just as perceiver knowledge about trait concepts is more valenced than knowledge about social categories. As a result, categorization is more likely to leverage perceiver knowledge that is grounded in base rates (e.g., “on average, women have longer hair than men”), whereas impressions are more likely to leverage perceiver knowledge or endorsement of valenced stereotypes (e.g., “women are less competent” or “Black people are lazier”). These differences suggest that idiosyncratic perceiver variability might play a larger role in forming impressions than in categorization processes. Indeed, past work has found that observers have a high degree of consensus when inferring the sex or race category of faces, but a much lower degree of consensus when inferring traits (e.g., attractive) from faces (Bjornsdottir et al., 2021; Hehman et al., 2017). Given our focus on how top-down perceiver characteristics interact with bottom-up target features to influence impressions, a clear understanding of this perceiver variability is important.

Another critical distinction between social categorization and first impressions is temporal precedence. If we assume that race and gender have an impact on trait impressions of faces via stereotypes, then this necessitates the categorization of faces into different races and genders prior to impression formation (Cloutier et al., 2005; Funder, 1995; Macrae & Martin, 2007). This categorization of race (occurring within 100–120 ms) and gender (occurring within 150–200 ms) emerges early in the perceptual process (Ito & Urland, 2003, 2005; Zhang et al., 2018). Because the morphological facial features necessary for an individualized trait impression are processed later in the impression formation pipeline than race and gender categorization (Ambrus et al., 2019; Cloutier et al., 2005; Dobs et al., 2019), evidence for the integration of race and gender stereotypes with morphological features (as captured by mousetracking) would be gravitation toward stereotypical responses early in the impression formation process, followed by subsequent adjustment as individual facial features are incorporated.

Together, these distinctions support the usefulness of research specifically examining the process of forming trait impressions. Next, we review existing work on the process of forming trustworthiness impressions specifically, then describe how our research approach attempts to form more generalizable conclusions about the process of forming trait impressions.

Uncovering the Process of Forming Trait Impressions

Although little research has explored the process of forming trait impressions, there have been some notable exceptions studying impressions of trustworthiness specifically. Some research has examined these perceptions across the lifespan (Cassidy et al., 2019), finding that older adults exhibited a tendency to rate faces as more trustworthy. Other research focusing on face–context integration asked participants to view faces in the presence of threatening, negative but unthreatening, or neutral contexts (Brambilla et al., 2018). Results were consistent with the predictions of the authors’ model in that contextual information was integrated into both the process and the final trustworthiness ratings of the faces.

Building on this previous research, we conducted a broader examination of the process of forming impressions across numerous traits. Trustworthiness is an important impression and the focus of much research, but—relative to many other trait impressions—it is particularly informed by the facial appearance of a target (Hehman et al., 2017), especially the extent to which the target appears to be smiling or frowning (presumably due to overgeneralization processes) (Zebrowitz et al., 2010). Consistent with the dynamic interactive model (Freeman et al., 2020), we posit that impressions are formed through the integration of top-down perceiver information and bottom-up stimuli appearance. To support this broad hypothesis, we examine impression formation processes across (a) traits, (b) target race, and (c) target gender. This approach, which emphasizes generalizability and replicability across varied conditions, is critical for achieving a fuller understanding of impression formation.

“Top-down perceiver information” could refer to any individual difference, but given the close relationship between stereotypes and trait impressions, we focus on perceivers’ knowledge of cultural stereotypes, regardless of whether they personally endorse those stereotypes. For example, take the finding that the association of “Asian” with “femininity” leads to a North American cultural stereotype of Asian women as attractive and Asian men as less attractive (gendered race hypothesis) (Galinsky et al., 2013; Johnson et al., 2012; Schug et al., 2015). When there is high congruence between the stereotypes activated by a face’s race and gender category and a face’s morphological features (e.g., an Asian female face with attractive facial features), we predicted that participants’ evolving impression—reflected by a mouse trajectory toward their choice—would take a relatively straight path, indicating little attraction toward the alternate choice (i.e., lower response competition). However, when there is low congruence between the stereotypes activated by a face and the face’s overall features (e.g., an Asian male face with attractive facial features), we predicted that participants’ impressions would initially exhibit greater attraction toward the unselected choice (i.e., higher response competition). We refer to this predicted pattern of outcomes as stereotype–feature congruence.

Although our example specifically focuses on attractiveness, we expect stereotype–feature congruence to be a trait-general phenomenon, as predicted by the dynamic interactive model of person perception (Freeman et al., 2020). In other words, the model does not predict the pattern of effects to be moderated by the specific trait according to which perceivers form impressions. Traits do differ in their characteristics in several ways, for example, the morphological characteristics they are based upon (Hehman et al., 2015; Oosterhof & Todorov, 2008), how they correlate with other traits (Oh et al., 2020; Sutherland et al., 2013; Xie et al., 2021), and the amount of agreement between perceivers with regard to a trait impression (Hehman et al., 2017; Xie et al., 2019). However, to the extent that an individual has stereotype knowledge about a given trait and its association with a target’s category (e.g., race and gender), we predict this stereotype information will influence the impression formed no matter the trait, because these aspects of specific traits should not be relevant to the process by which impressions unfold.

Sharp Versus Graded Response Curves

A secondary aim in our analyses is to examine the trajectories of perceivers’ trait impressions over time. There are several possible ways in which impressions can be formed, and the high-resolution data that mousetracking provides can help distinguish between mechanisms underlying impressions. These can reflect either sequential processes or the simultaneous integration of information. Past research has attempted to demarcate the perceptual pipelines leading to an impression (Macrae & Bodenhausen, 2000). From a target’s appearance, category information informs a social categorization (e.g., White); then, stereotypes associated with this categorization (e.g., trustworthy) are activated. The classic sequential view of this process is that social categorization must be completed prior to the stereotype information being accessed. In contrast, modern work informed by dynamic models (Freeman et al., 2008, 2020; McKinstry et al., 2008) has argued that these processing stages can be continuous and simultaneous rather than discrete and sequential. In this model, partially processed information about a target’s appearance continuously updates a category impression (e.g., White), which in turn continually updates partially active impressions (e.g., trustworthy) that are informed by stereotypes. This relationship is bidirectional, as the activation of stereotype information (e.g., trustworthy) further constrains categorization (e.g., White).

The patterns of trajectories in a mousetracking paradigm can reveal which of these processes is occurring. The sequential processing possibility would predict a mouse trajectory in which participants are initially attracted to a response, but then abruptly “correct” this tendency mid-flight as they integrate additional stereotype information sequentially. Patterns indicative of this response often have a straight line toward one response that abruptly shifts at a sharp angle toward the other response, resulting in a bimodal distribution of area under the curve (AUC). On the other hand, top-down stereotypes and bottom-up morphological information might instead be simultaneously and continuously integrated throughout the response. Participants demonstrating such a pattern would show smooth competition and a slow shift from one response category toward the other, with the mouse trajectory indicating a curve instead of sharp angles (Dale et al., 2007). In this case, the distribution of AUC scores would be unimodal and relatively normal. These two patterns are indicative of distinct underlying processes of impressions.
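One statistic often used in the mousetracking literature to distinguish these two patterns is the bimodality coefficient (BC) of the AUC distribution, where values above 5/9 (about .555, the BC of a uniform distribution) are conventionally read as evidence of bimodality. The sketch below computes BC from bias-corrected sample skewness and excess kurtosis in plain Python. It illustrates the general technique only and is not necessarily the distributional test used in these studies; the function name and threshold constant are illustrative.

```python
import math
from typing import Sequence

# Conventional cut-off: the BC of a uniform distribution (5/9, about .555).
BC_THRESHOLD = 5 / 9


def bimodality_coefficient(values: Sequence[float]) -> float:
    """Bimodality coefficient from bias-corrected skewness and excess kurtosis."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least four observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("values have zero variance")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5            # population skewness
    g2 = m4 / m2 ** 2 - 3.0        # population excess kurtosis
    # Small-sample (bias-corrected) estimators of skewness and excess kurtosis.
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return (skew ** 2 + 1.0) / (kurt + 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
```

Applied to a list of trial-level AUC values, the function returns a single number: values near 1/3 are typical of a normal distribution, whereas values approaching 1 arise when scores cluster at two separated points, as expected if some trials run straight and others make abrupt mid-flight corrections.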

Previous research has found more evidence for graded responses (Freeman et al., 2008, 2010; Johnson et al., 2012) in categorization processes. We test whether these results generalize to trait impressions, which would provide further evidence for a dynamic interactionist perspective.

The Present Studies

We conducted two studies. The first was an in-lab exploratory study using MouseTracker (Freeman & Ambady, 2010). The second was a preregistered direct replication and extension using an online sample and a mousetracking task programmed using MinnoJS (Zlotnick et al., 2015). Prior to COVID-19, we had preregistered a second in-lab direct replication, but changed our plan when in-person data collection became unfeasible.

Open Data, Open Materials, and Preregistration

Our complete data and the analysis script, as well as an anonymized copy of our preregistration, are available at https://osf.io/mgsza. In both studies, we report all measures, manipulations, and exclusions.

Study 1

Participants and Design

We recruited 238 participants through a participant pool. Each participant completed 300 main trials (150 faces, two ratings each) in the mousetracking task in a 3 (Target race: East Asian, Black, White) × 2 (Target gender: female, male) × 6 (Trait: attractive, dominant, friendly, intelligent, physically strong, trustworthy) mixed design, with repeated measures on the first two factors. These six traits were chosen based on previous work that identifies trustworthiness, dominance, and youthfulness/attractiveness as primary dimensions of face perception (Sutherland et al., 2013). Fourteen of these participants were excluded during the data cleaning process (details below) and four asked that their data not be included, leaving us with 220 participants for analysis (81% women, 18% men, 0.5% other gender identity, 0.5% no response, M_age = 20.34 years, SD_age = 2.47 years).

Using simr (Green & MacLeod, 2016), we conducted a sensitivity power analysis to estimate the minimum effect size for the key effect that we could reasonably detect using this sample. The estimate suggested that this sample is sufficient to detect an effect size of b = –0.040 with 80% power.

Materials and Procedure

Stimuli

Stimuli were Asian, Black, and White male and female faces with neutral expressions, pseudo-randomly selected from the Chicago Face Database (Ma et al., 2015). Twenty-five stimuli for each group were presented in an order randomized for each participant. Participants rated each face twice for a single trait.

Mousetracking task

Participants first received instructions for a mousetracking task programmed in MouseTracker (Freeman & Ambady, 2010). Instructions were as follows: “We are interested in how people evaluate faces. Today, you will be evaluating faces on their [TRAIT]. For every face that appears, your task is to indicate whether they are more or less [TRAIT]. You will do so by clicking the selections at the top corners of the screen with your mouse. It is important to respond AS QUICKLY AS POSSIBLE. If you don’t respond quickly enough, you will receive a warning.”

Following practice trials to familiarize themselves with the task, participants completed the main task. They were encouraged to take a short break at 50-trial intervals, for a total of five breaks. For every trial, the cursor was re-centered at the bottom of the screen. Target faces appeared slightly above this starting position. If participants took longer than 500 ms to begin moving their mouse at the beginning of a trial, they received a warning message after the trial asking them to move more quickly.

The two response choices—“Less [TRAIT] than average” and “More [TRAIT] than average”—were displayed in the upper left and upper right corners of the screen, respectively. We anticipated variation in what perceivers considered “average,” and interpret these responses as above or below participants’ idiosyncratic perceptions of average. Given our theoretical interest in top-down and bottom-up integration of information, we considered it ideal that this design naturally adjusted for participants’ individual perceptions.

Participants’ responses to the MouseTracker task were recorded as raw mouse coordinates, which capture the x and y position of the mouse every ~15 ms (the exact interval varies with the refresh rate of the screen). These coordinates were time-normalized so that each trajectory had 101 equally spaced time steps (Freeman & Ambady, 2010), and then used to derive the AUC, which measures the degree of spatial attraction of the mouse cursor toward the unselected choice. AUC is calculated as the area between the actual trajectory and an idealized straight-line trajectory from the origin to the final choice. AUC is the outcome on which we focused our analyses, although we also report analyses using spatial entropy (Hehman et al., 2015) in the Markdown files provided on the OSF page. The spatial entropy analyses produced similar patterns of results.
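Time normalization to 101 steps and the AUC computation can be sketched as follows. This is a minimal illustration, not MouseTracker’s own code: it assumes timestamps in milliseconds, a y-axis that increases upward, and trajectories already mirrored so that the selected option lies to the upper right. Under those assumptions, area on the side of the straight line facing the unselected option counts as positive and area on the opposite side as negative. The function names `time_normalize` and `auc` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def time_normalize(ts: Sequence[float], xs: Sequence[float],
                   ys: Sequence[float], n_steps: int = 101) -> List[Point]:
    """Linearly interpolate a raw trajectory onto n_steps equally spaced times."""
    if len(ts) < 2:
        raise ValueError("need at least two samples")
    t0, t1 = ts[0], ts[-1]
    out: List[Point] = []
    j = 0
    for step in range(n_steps):
        t = t0 + (t1 - t0) * step / (n_steps - 1)
        # Advance to the raw segment [ts[j], ts[j + 1]] that contains t.
        while j < len(ts) - 2 and ts[j + 1] < t:
            j += 1
        span = ts[j + 1] - ts[j]
        w = 0.0 if span == 0 else (t - ts[j]) / span
        out.append((xs[j] + w * (xs[j + 1] - xs[j]),
                    ys[j] + w * (ys[j + 1] - ys[j])))
    return out


def auc(points: Sequence[Point]) -> float:
    """Signed area between a trajectory and the straight line from its start to its end."""
    (x0, y0), (xn, yn) = points[0], points[-1]
    theta = math.atan2(yn - y0, xn - x0)
    c, s = math.cos(theta), math.sin(theta)
    # Rotate about the start point so the ideal straight line lies on the +x axis;
    # points to the left of the direction of travel then have positive y.
    rotated = [((x - x0) * c + (y - y0) * s, -(x - x0) * s + (y - y0) * c)
               for x, y in points]
    # Trapezoidal integration of the perpendicular deviation along the line.
    area = 0.0
    for (xa, ya), (xb, yb) in zip(rotated, rotated[1:]):
        area += (xb - xa) * (ya + yb) / 2.0
    return area
```

For instance, a trial sampled at 0, 10, and 20 ms that moves straight up to (0, 1) and then right to (1, 1) yields an AUC of 0.5 after normalization: the area of the triangle between that path and the diagonal.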

Stereotype knowledge task

After completing the mousetracking task, participants completed a set of questionnaires asking them about North American stereotypes with regard to various race × gender groups. Participants indicated the extent to which they felt that each of the six race × gender groups was stereotypically perceived as each of the six traits, for a total of 36 questions. The wording was as follows: “To what extent are [RACE] [GENDER] stereotyped as [TRAIT]?” Participants gave their responses on a seven-point Likert-type scale ranging from 1 = “not at all” to 7 = “completely”. Previous research has measured stereotype knowledge in a similar manner (e.g., Stolier & Freeman, 2016; Xie et al., 2021) and found it corresponds with impressions both behaviorally and at the neural level. Furthermore, work measuring stereotype knowledge in other ways supports its validity as a construct (e.g., Devine & Elliot, 1995). In Study 2, we further examine the construct validity evidence for this measure by comparing it with stereotype endorsement.

Data Cleaning

Data were processed and cleaned consistent with previous research (Freeman et al., 2010). To compare mouse trajectories across all participant response choices, we time-normalized trajectories to 101 time steps and horizontally flipped trajectories so that both response options (e.g., “less intelligent,” “more intelligent”) were in the same direction regardless of response selection. Any trials with an initiation time outside the range of 1–500 ms and any trials with a response time outside the range of 100–3000 ms were removed. On visual inspection of individual trajectories, we removed outlier mouse trajectories that either (a) showed initial horizontal deviation from the cursor origin or (b) contained looping. Additionally, data from 14 participants were excluded from analysis because their trials consisted entirely or largely of trajectories with frequent looping, indicating that these participants were not engaging with the task. In total, 94% of trials were retained for analysis.
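The automated parts of these cleaning rules can be expressed as a simple filter. In this sketch, the `Trial` record, its field names, and the choice of inclusive bounds are assumptions. “Flipping” is interpreted here as mirroring the x coordinates of trials that ended on the left so that every trajectory ends on the right, while the response itself is retained for analysis. The visual-inspection exclusions (initial horizontal deviation, looping) are not automated here.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Bounds taken from the text; treating them as inclusive is an assumption.
INIT_RANGE_MS = (1.0, 500.0)
RT_RANGE_MS = (100.0, 3000.0)


@dataclass(frozen=True)
class Trial:
    init_time_ms: float              # onset-to-first-movement latency
    rt_ms: float                     # total response time
    chose_left: bool                 # True if the upper-left ("less") option was clicked
    trajectory: Tuple[Point, ...]    # time-normalized (x, y) samples, origin at start


def _within(value: float, bounds: Tuple[float, float]) -> bool:
    low, high = bounds
    return low <= value <= high


def clean_trials(trials: Sequence[Trial]) -> List[Trial]:
    """Drop trials outside the timing windows and mirror left-ending trajectories."""
    kept: List[Trial] = []
    for trial in trials:
        if not (_within(trial.init_time_ms, INIT_RANGE_MS)
                and _within(trial.rt_ms, RT_RANGE_MS)):
            continue
        if trial.chose_left:
            # Flip horizontally so every trajectory ends on the right; the
            # response itself (chose_left) is kept unchanged for the analysis.
            mirrored = tuple((-x, y) for x, y in trial.trajectory)
            trial = replace(trial, trajectory=mirrored)
        kept.append(trial)
    return kept
```

Keeping the response field separate from the mirrored coordinates matters because the response (“less” versus “more”) is itself a predictor in the model that follows.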

Results

Data and code are available at https://osf.io/mgsza.

Analysis Strategy

For each participant, each trial was assigned a stereotype knowledge score, which was the participant’s stereotype knowledge for the target’s race-by-gender category for the trait currently being rated. For example, if a participant answered the question “To what extent are Black women stereotyped as intelligent?” with a “3,” in all mousetracking trials rating the intelligence of Black women, this participant’s stereotype knowledge score would be 3. The use of this score incorporates race, gender, and stereotype knowledge ratings in a single variable, improving the interpretability of our findings and simplifying the statistical model. Put differently, this stereotype knowledge score allows us to test our primary hypothesis via a two-way interaction between response and stereotype knowledge score, rather than a five-way interaction between response, stereotype knowledge, trait, race, and gender.
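A minimal sketch of how such a trial-level score could be attached, assuming the 36 stereotype knowledge ratings are stored in a dictionary keyed by participant, race, gender, and trait. The field names and the `attach_scores` helper are hypothetical.

```python
from typing import Dict, List, Mapping, Tuple

# (participant, race, gender, trait) -> stereotype knowledge rating (1-7)
KnowledgeTable = Mapping[Tuple[str, str, str, str], int]


def attach_scores(trials: List[Dict[str, str]],
                  knowledge: KnowledgeTable) -> List[Dict[str, object]]:
    """Return copies of the trials, each with a 'stereotype' score added.

    The score is the participant's rating for the target's race-by-gender
    category on the trait being judged in that trial.
    """
    scored: List[Dict[str, object]] = []
    for trial in trials:
        key = (trial["participant"], trial["race"], trial["gender"], trial["trait"])
        # Raises KeyError if the participant lacks a rating for this category.
        scored.append({**trial, "stereotype": knowledge[key]})
    return scored
```

With the example from the text, a participant who answered “3” for Black women on intelligence would receive a score of 3 on every trial rating a Black woman’s intelligence. Because each participant rates faces on a single trait, only six of their 36 ratings (one per race-by-gender group) enter the model.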

By combining this stereotype knowledge score with participants’ responses to a given trial, we captured the extent to which their response trajectories were congruent with stereotype knowledge. To account for the cross-classified structure of the data, we nested responses within (a) participant, (b) stimulus, (c) social category,¹ and (d) participant × social category in a multilevel framework. This specification is necessary because the stereotype knowledge score is a function of both participant and the specific race-by-gender social category being referenced. Participant response is a trial-level variable, making the interaction between participant response and stereotype a cross-level interaction. For cross-level interactions, the slope of the lower-level predictor must be specified as random at the level of the higher-level predictor to avoid severe Type I error (Heisig & Schaeffer, 2019). Accordingly, we allowed the slope of participant response to vary randomly across levels of participant × social category. Thus, the final model is as follows:²

\begin{array}{l}
\mathrm{AUC}_{i} \sim N(\mu, \sigma^{2}) \\
\mu = \alpha_{j[i],\,k[i],\,l[i],\,m[i]} + \beta_{1j[i]}\,(\text{response}) + \beta_{2}\,(\text{stereotype}) + \beta_{3}\,(\text{stereotype} \times \text{response}) \\
\begin{pmatrix} \alpha_{j} \\ \beta_{1j} \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_{\alpha_{j}} \\ \mu_{\beta_{1j}} \end{pmatrix}, \begin{pmatrix} \sigma_{\alpha_{j}}^{2} & \rho_{\alpha_{j}\beta_{1j}} \\ \rho_{\alpha_{j}\beta_{1j}} & \sigma_{\beta_{1j}}^{2} \end{pmatrix} \right), \text{ for participant} \times \text{social category } j = 1, \ldots, J \\
\alpha_{k} \sim N(\mu_{\alpha_{k}}, \sigma_{\alpha_{k}}^{2}), \text{ for participant } k = 1, \ldots, K \\
\alpha_{l} \sim N(\mu_{\alpha_{l}}, \sigma_{\alpha_{l}}^{2}), \text{ for stimulus } l = 1, \ldots, L \\
\alpha_{m} \sim N(\mu_{\alpha_{m}}, \sigma_{\alpha_{m}}^{2}), \text{ for social category } m = 1, \ldots, M
\end{array}

For any models that did not converge in lme4, confirmation of the parameter estimates was done using a Bayesian approach in brms (Bürkner, 2017).

AUC

In line with our stereotype–feature congruence hypothesis, the interaction between response and stereotype knowledge score was significant, b = –0.10, t(1414) = –6.85, p < .001, 95% CI [–0.13, –0.07]. When participants responded that a target was “less” of a trait, a higher stereotype knowledge score was associated with a larger AUC. When participants responded that a target was “more” of a trait, a higher stereotype knowledge score was associated with a smaller AUC. Figure 1 visualizes this effect by depicting time-normalized trajectories for “less” and “more” responses at the lower and upper tertiles of stereotype knowledge score.

Figure 1. — Time-Normalized Trajectories by Stereotype Knowledge and Response

*Note*. Red and blue lines represent the average trajectories for trials in the upper and lower tertiles of stereotype scores. Tertiles are for visualization purposes only. Points along the red and blue lines represent the 101 time-normalized steps. Gray lines represent individual trajectories.

*Note*. Please refer to the online version of the article to view this figure in colour.

The main effect of stereotype knowledge score was not significant, b = –0.01, t(695) = –0.99, p = .32, 95% CI [–0.02, 0.001]. Unrelated to our hypotheses, we did observe a main effect of response, such that “more” responses showed higher AUC scores than “less” responses, b = 0.17, t(1252) = 11.10, p < .001, 95% CI [0.14, 0.20], which was qualified by a response × trait interaction, detailed below. The full model (with model statistics and intraclass correlations) is available in the Markdown file on the OSF page under “Main Analyses.”
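Because the model contains both a stereotype main effect and a stereotype × response interaction, the association between stereotype knowledge and AUC within each response type combines two coefficients. The worked derivation below uses the model’s β₂ and β₃ with the point estimates reported above; coding response as −1 for “less” and +1 for “more” is an assumption made only for illustration, since the coding is not stated in this section.

```latex
\begin{aligned}
\frac{\partial \mu}{\partial\,\text{stereotype}} &= \beta_{2} + \beta_{3}\,(\text{response}) \\
\text{slope}_{\text{less}} &= \beta_{2} + \beta_{3}(-1) = -0.01 + 0.10 = 0.09 \\
\text{slope}_{\text{more}} &= \beta_{2} + \beta_{3}(+1) = -0.01 - 0.10 = -0.11
\end{aligned}
```

Under this assumed coding, a near-zero main effect combined with a negative interaction yields opposite-signed slopes, matching the pattern described above; with ±0.5 coding the slopes would instead be 0.04 and −0.06, with the same signs.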

Generalizability of Effects Across Traits

We sampled six different traits between subjects in order to improve the generalizability of our findings—for any single trait, it could be the case that any effect was specific to that trait, rather than applicable to all traits more broadly. Accordingly, we tested whether effects were dependent on trait. We found no significant interaction between trait and our key interaction of response × stereotype knowledge score according to AUC, F(5, 1358) = 1.20, p = .31, suggesting that evidence for stereotype–feature congruence is consistent across different traits.

We did observe a significant interaction between trait and response, F(5, 1458) = 61.32, p < .001: the association between response choice and AUC varied considerably across traits. For attractive, b = 0.58 (95% CI [0.52, 0.64]); for dominant, b = 0.21 (95% CI [0.13, 0.30]); for friendly, b = 0.19 (95% CI [0.10, 0.28]); for intelligent, b = –0.05 (95% CI [–0.14, 0.03]); for physically strong, b = 0.05 (95% CI [–0.04, 0.14]); for trustworthy, b = –0.16 (95% CI [–0.26, –0.06]). See Figure 2. This heterogeneity also suggests that people’s initial tendency to begin evaluating someone as “more” or “less” of a trait, independent of stereotype knowledge scores or race × gender, varies by trait. For example, these results suggest that people initially tend to move toward “less” for attractiveness impressions but toward “more” for trustworthiness impressions. This pattern is visible in the “overall” trajectories for “less” and “more” by trait in Figure 2. See the Markdown file for full statistics by trait.

Figure 2. — Time-Normalized Trajectories by Stereotype Knowledge and Response

*Note*. Red and blue lines represent the trajectories for trials in the upper and lower tertiles of stereotype scores. Tertiles are for visualization purposes only. Points along the red and blue lines represent the 101 normalized time steps. Gray lines represent individual trajectories.


Distributional Analyses

Next, we examined the patterns of mouse trajectories to determine whether they reflect sequential processing of appearance and stereotype information or simultaneous integration of this information. The mouse trajectories in Figures 1 and 2 appear smooth and graded. As described in the introduction, such a pattern is evidence of continuous and simultaneous processing of top-down and bottom-up information. However, this conclusion would be premature. Apparently smooth average trajectories could result from averaging two kinds of trials: those showing very little attraction to the unselected response and those showing discrete mid-flight corrections (i.e., evidence of sequential processing).

To examine this possibility, and as in previous research (Freeman et al., 2008), we plotted the distributions of AUC across levels of stereotype knowledge score (Figure 3). This let us assess whether there were two distinct subpopulations of trajectories (i.e., bimodality) or a single distribution. The distributions showed no bimodality at any level of stereotype knowledge. Accordingly, we interpreted these graded responses as evidence of simultaneous integration of stereotype knowledge and target appearance throughout the impression formation process, consistent with a dynamic interactionist perspective (Freeman et al., 2020).
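Here bimodality is judged by inspecting density plots. A common numeric complement in the mousetracking literature is the bimodality coefficient (BC), which combines sample skewness and excess kurtosis. Values above about 5/9 (≈ .555) are conventionally read as suggesting bimodality. This section reports no BC values, so the sketch below only illustrates the check and is not the authors' analysis. The names `bimodality_coefficient`, `bimodality_by_level`, and `BC_THRESHOLD` are hypothetical, and AUC values are assumed to be grouped by stereotype knowledge level beforehand.

```python
import math
from typing import Dict, Sequence

BC_THRESHOLD = 5 / 9  # conventional cutoff (about .555); higher values suggest bimodality


def bimodality_coefficient(values: Sequence[float]) -> float:
    """Bimodality coefficient from bias-corrected skewness and excess kurtosis."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least four observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("values have zero variance")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5        # moment-based (biased) skewness
    g2 = m4 / m2 ** 2 - 3.0    # moment-based (biased) excess kurtosis
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)                 # bias-corrected skewness
    kurt = ((n + 1) * g2 + 6.0) * (n - 1) / ((n - 2) * (n - 3))  # bias-corrected excess kurtosis
    return (skew ** 2 + 1.0) / (kurt + 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def bimodality_by_level(auc_by_level: Dict[str, Sequence[float]]) -> Dict[str, bool]:
    """Flag which (hypothetical) stereotype-knowledge levels exceed the BC cutoff."""
    return {level: bimodality_coefficient(aucs) > BC_THRESHOLD
            for level, aucs in auc_by_level.items()}
```

Two tight clusters of AUC values push BC well above the cutoff, whereas a single peaked distribution stays below it. Visual inspection, as used above, and BC can disagree for small or heavily skewed samples, so the two checks complement each other rather than substitute for one another.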

Figure 3. — Density Plots for Area Under the Curve Across Stereotype Knowledge Scores

*Note*. Distributions of area under the curve scores were approximately the same across levels of stereotype knowledge score (i.e., the participants’ reported stereotype knowledge for a given race-by-gender category).

Participant Stereotype Knowledge

Although stereotype knowledge scores parsimoniously model our hypothesized effect, they obscure the specific content of participants’ stereotypes. Therefore, we modeled participants’ reported stereotype knowledge as a function of race and gender to illustrate the stereotype content underlying the scores used in the stereotype–feature congruence analyses. Figure 4 indicates that participants’ stereotype knowledge reflects interactions between race and gender, in line with work on intersectional stereotyping (Galinsky et al., 2013; Ghavami & Peplau, 2013; Johnson et al., 2012). All six traits showed a significant race × gender interaction, although the specific patterns varied by trait. The observed patterns align with previous work sampling from North American populations. Asian men and Black women are stereotyped as especially low in attractiveness, consistent with the gendered race hypothesis (Galinsky et al., 2013; Johnson et al., 2012; Schug et al., 2015); Black people are stereotyped as high in physical strength and low in intelligence and trustworthiness (Devine & Elliot, 1995; Wilson et al., 2017); Asian people are stereotyped as low in physical strength and high in intelligence (Wong et al., 2012; Yee, 1992); and White people are stereotyped as high in attractiveness, friendliness, and trustworthiness (Fiske et al., 2002).

Figure 4. — Stereotype Knowledge by Race and Gender

*Note*. Scores show the extent to which participants indicated that each race × gender group was stereotypically perceived as each of the six traits. Black error bars represent 95% confidence intervals.

Discussion

This study offers clear support for our hypothesized stereotype–feature congruence effect. Perceivers’ knowledge of race × gender stereotypes predicted their spatial attraction to the stereotype-congruent response option, regardless of their final response. This finding suggests that top-down knowledge of stereotypes guides participants’ impressions of faces within milliseconds of the start of the decision process. This result supports previous theory and research implicating top-down stereotype knowledge in impressions. Critically, it also provides novel evidence that this knowledge interacts with the content of impressions (i.e., the specific trait being rated), shaping how the process underlying these impressions unfolds.

Study 2

We conducted Study 2 as a direct preregistered replication: https://osf.io/jmfz2. For Study 2, we collected data online to obtain a more diverse sample. In addition to participants’ stereotype knowledge, we collected information about their personal endorsement of stereotypes. We aimed to contrast the role of stereotype endorsement with that of stereotype knowledge in the process of forming face impressions. To address power and participant fatigue concerns arising from our use of an online sample, we focused on three of the six traits from Study 1: attractiveness, intelligence, and trustworthiness. Again, these reflect the primary dimensions of face perception and stereotype content identified in previous work (Sutherland et al., 2013).