Abstract
An important development in the study of face impressions was the introduction of dominance and trustworthiness as the primary and potentially orthogonal traits judged from faces. We test competing predictions of recent accounts that address evidence against the independence of these judgements. To this end, we develop a version of recent ‘deep models of face impressions’ better suited for data‐efficient experimental manipulation. In Study 1 (N = 128), we build impression models using 15 times fewer ratings per dimension than previously assumed necessary. In Study 2 (N = 234), we show how our method can precisely manipulate dominance and trustworthiness impressions of face photographs and observe that the pattern of effects of the cues of one trait on impressions of the other differs from what previous accounts predict. We propose an altered account that stresses how a successful execution of the two judgements' functional roles requires impressions of trustworthiness and dominance to be based on cues of both traits. Finally, we show that our manipulation resulted in larger effect sizes using a broader array of features than previous methods. Our approach lets researchers manipulate face stimuli for various face perception studies and investigate new dimensions with minimal data collection.
Keywords: dominance‐trustworthiness model, face perception, first impressions, neural networks, social cognition
BACKGROUND
Oosterhof and Todorov's (2008) pioneering study of face judgements, which revealed a stable perceptual basis for traits such as dominance and trustworthiness, consisted of two phases. First they identified 13 traits with the most stable ratings across individuals that were spontaneously judged when forming impressions of faces. Then hoping to uncover a structure underlying these trait judgements, they identified two principal components in the ratings of unfamiliar adult faces that correlated with judgements of trustworthiness and dominance, respectively. This result was recently shown to be robust in a global replication effort (Jones et al., 2021), and has since been interpreted to mean that trustworthiness and dominance themselves constitute orthogonal dimensions of face evaluation (e.g. Getov et al., 2015; Todorov et al., 2008). Importantly dominance and trustworthiness have since been theorized to have functional roles, with trustworthiness and dominance serving as signals of others' intent and ability to cause harm, respectively (Todorov et al., 2008).
The second part of Oosterhof and Todorov's study was to find the shape differences that drive trait judgements on 3D models of faces. This data‐driven method called ‘reverse‐correlation’ (Dotsch & Todorov, 2012; Todorov et al., 2008) amounts to finding in the space of physical face variation, where faces are encoded as vectors, a direction that best explains the difference between faces with different trait ratings. Having found this direction they were able to generate faces with different levels of perceptual dominance and trustworthiness and show they succeeded in changing trait impressions.
The first study may leave the reader with the impression that one can increase one's perceptual trustworthiness in order to be judged more trustworthy, while leaving judgements of dominance unchanged. However it is important not to confuse correlational data with causal experimental data. Face impressions vary not only with morphology but also with other perceptual variables that influence the face's appearance, which we can call identity‐related factors, such as age or gender (Todorov et al., 2011), factors that change the person's perceived social category (Mileva et al., 2019; Sutherland et al., 2013), and these are not the factors that one is looking to change when asking intervention‐based questions. This was also not the goal of the original authors, whose second study was also not designed to test the possibility that changing perceptual trustworthiness – the complex of facial features associated with judgements of trustworthiness regardless of the social category of the face – could orthogonally increase impressions of trustworthiness of real faces. This is partly due to the limitations that the use of 3D models imposes on the range of possible questions. Critics of this approach point out that 3D models are not realistic and lack the distinctiveness of real faces (Freeman & Ambady, 2011), which limits their generalizability to real faces. More importantly if 3D models lack discriminative features that could anchor the identity of the face, we cannot control for other relevant factors that determine the face's appearance and the direct effect of an intervention on perceptual trustworthiness or dominance on impressions.
The process of reverse‐correlation involves determining the common perceptual factors responsible for certain judgements based on psychological ratings. It is these factors that influence such judgements. However, in order to understand the effect of perceptual trustworthiness on these judgements, it is essential to experimentally alter its associated properties. At the same time, to avoid confounding the observed effect, we would want the manipulation not to change the perceived social categorization of the face (e.g. not to change perceived gender when manipulating dominance; Balas & Pacella, 2015). What is needed to test the orthogonality of dominance and trustworthiness is a method that has the representational capacity both to leave most of the face's features unchanged when manipulating dominance and to produce faces that appear realistic. Methods have been proposed that avoid some of the unrealistic nature of 3D models, such as morphing approaches (Tiddeman et al., 2001, used in e.g. Oh, Buck, & Todorov, 2019), but while they produce more realistic faces, these methods still rely mostly on the manipulations found for 3D models, which can expose them to the same criticisms (Crookes et al., 2015).
Consequently, current methods have not yet conclusively determined whether the lack of a linear correlation observed at the level of trustworthiness and dominance ratings is due to the fact that cues signalling dominance do not have a causal influence on trustworthiness impressions and trustworthiness cues do not have a causal influence on dominance impressions. Theoretically, if these judgements do indeed serve functional roles, as recent evidence on children's faces supports (Collova et al., 2019), this limitation leaves unanswered the question of the relationship between these roles, specifically whether the statistical results for ratings can be extended to claim that the functional judgements of benevolence and power underlying these impressions are themselves independent. This intervention‐based question is also important for understanding the evolutionary role of these cues, as exploring the social side effects associated with an increase in trustworthiness cues helps to answer why all humans haven't converged on a high expression of such advantageous traits (Mercier, 2020). Several researchers have already suggested that the relationship between these judgements may be more complex. Proposed complications include the role of valence, typicality (Dotsch et al., 2017; Sutherland et al., 2015), overlapping features (Oh, Buck, & Todorov, 2019), correlated models (Dotsch & Todorov, 2012) and the interaction with social categories (Oh et al., 2020; Sutherland et al., 2015). These accounts make different predictions about how an intervention on one trait would affect impressions of the other. The goal of the present study is to put these predictions to multiple tests by introducing a new method that allows us to directly and more effectively test intervention‐based effects.
Evidence for and against dominance‐trustworthiness orthogonality
First, the commonly held orthogonality account of these judgements actually has limited support in the data. Although the PCA performed by Oosterhof and Todorov (2008) looks for orthogonal factors, it is not a way to prove dominance‐trustworthiness orthogonality: it is simply a redescription of a dataset into uncorrelated dimensions rather than a type of factor analysis, so the orthogonality of its components follows from the method itself. Recognizing the limitations of PCA, Sutherland et al. (2013) instead used factor analysis, a modelling framework that allows latent factors to correlate. This approach was also used by Jones et al. (2021), who reported mixed results regarding the orthogonality of these factors.
A more direct test is to look at the correlation of their ratings. However the lack of correlation does not equal lack of causation, as (1) these dependencies can be obscured by differences in the relative propensity of combinations of levels of perceptual trustworthiness and dominance in real faces (Collova et al., 2019), (2) the effect can have any non‐linear shape, (3) ratings are also influenced by what we called identity‐related variables. Moreover direct examination of the correlations between perceived trustworthiness and dominance may be insufficient, as these constructs are manifest variables that, according to factor analysis theory, are derived from linear combinations of underlying latent components. The relationship between these judgements has to be thus determined experimentally. The closest evidence for the orthogonality account is Sutherland et al.'s (2015) demonstration that ambient image ratings did not correlate for dominance and trustworthiness judgements.
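Point (2) can be made concrete with a small numerical sketch. It assumes a purely hypothetical inverted‐U effect of a manipulation level on ratings (the variable names, the rating scale and the quadratic form are illustrative and are not taken from any dataset). Under that assumption the rating is fully determined by the manipulation, yet the linear correlation between the two is essentially zero.

```python
import numpy as np

# Hypothetical manipulation levels, symmetric around the unmanipulated face (0).
levels = np.linspace(-2.0, 2.0, 41)

# Assumed inverted-U effect: ratings drop as the face moves away from 0
# in either direction. The rating is a deterministic function of the level.
ratings = 5.0 - levels ** 2

# Pearson correlation between manipulation level and rating.
r = np.corrcoef(levels, ratings)[0, 1]
print(f"linear correlation: {r:.3f}")
```

A correlational analysis of such data would report no linear relationship, whereas an experiment that varies the level and inspects the shape of the response would reveal the dependency.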
The first evidence of non‐orthogonality is that reverse‐correlation models of these traits found with 3D faces are correlated (Oosterhof & Todorov, 2008). This means that the dominance and trustworthiness vectors in the space of 50 features of 3D models shared some of their manipulated features. There are three reasons why this may happen. First, the 50 features that could be manipulated may not have been a fine‐grained enough representation of face variation to capture independent dominance and trustworthiness manipulations. Second, trustworthiness and dominance may share some of their perceptual features. Third, trustworthiness and dominance impressions may not be orthogonal, which creates a reverse‐correlated manipulation that captures some of the other trait's features. An overlapping features account would predict that a manipulation of dominance would linearly change trustworthiness ratings, although less than it would change dominance ratings. For example, a model correlated with competence impressions increases both competence and attractiveness ratings, but the former effect is stronger than the latter (Oh, Buck, & Todorov, 2019).
Motivated by the orthogonality account, researchers have forced these models to be orthogonal by subtracting the trustworthiness vector from the dominance vector (Oh, Buck, & Todorov, 2019; Oosterhof & Todorov, 2008; Todorov et al., 2011). A better way to test the hypothesis that dominance and trustworthiness share features would be to use a face representation that is high‐dimensional enough to potentially allow orthogonal manipulations of dominance and trustworthiness. To this end, the present study uses a neural network‐based method similar to that proposed by Peterson et al. (2022). Although the Peterson et al. (2022) method is promising for such investigations, we make two critical observations that motivate our method: (1) the manipulations significantly alter the identities of the individuals depicted in the generated images; (2) the computational model used forces the manipulation to focus on a very specific subset of features (e.g. the trustworthiness manipulation almost exclusively changes how much the person smiles), which may be highly transient features of the face, making the manipulation less ecologically valid. We will discuss these points in more detail. For now, the tentative conclusion is that without building on this method, when analysing impressions for manipulated faces, we cannot separate the influence of traits closely related to, say, trustworthiness from that of the identity‐related traits and temporary facial features, such as position and expression, that are changed by the manipulation. We present ways to overcome these issues and improve this method in the neural network‐based methodology section.
The case for a dependency between these judgements could also be made theoretically on the grounds that it could be beneficial in the execution of their functional roles. For example, people could benefit from judging a highly dominant individual as less trustworthy, as it manifests hesitance in attributing goodwill to individuals who have the capacity to harm them (Oh, Buck, & Todorov, 2019). Conversely, it could be that a lack of dominance cues does not constitute a reason to increase one's trust. A related proposal stresses the role of valence. It has been shown that valence‐laden social judgements, such as trustworthiness, competence and attractiveness, are influenced by external factors perceived as positive or negative. Sutherland et al. (2015) showed that high dominance is associated with negative valence in female faces (see also Radke et al., 2018), which leads to decreased impressions of trustworthiness. The dependency was also stronger in perceivers who more strongly endorsed gender stereotypes (Oh et al., 2020; for conceptually related results for race, see Livingston & Pearce, 2009; Mercier & Sperber, 2017).
Moreover, valence‐laden impressions have a qualitatively different pattern, observed by Todorov et al. (2008) (as well as many others, e.g. Nakamura & Watanabe, 2019; Oh, Dotsch, & Todorov, 2019), in which a linear manipulation of facial features produces a quadratic shape of change in trustworthiness ratings, with the manipulation to increase impressions being less effective than the manipulation to decrease impressions. One explanation for this effect is that an additional factor in valence‐laden impressions is the typicality of faces (Dotsch et al., 2017; Sofer et al., 2015). Manipulating faces could make them more atypical, which has been shown to be associated with negative valence judgements and thus reduced trustworthiness. Such a typicality account would predict that a manipulation to either increase or decrease dominance could also lead to a decrease in trustworthiness judgements because it decreases typicality, and that dominance impressions would be unaffected by the trustworthiness manipulation because these impressions are not affected by typicality.
A neural network‐based methodology
The development of machine learning has recently begun to gradually permeate psychological research. The need to create more accurate and precise experimental stimuli has begun to be addressed through deep learning – a subfield of machine learning that involves the use of multi‐layer neural networks to create models that can learn complex data representations (Goodfellow et al., 2016). One of the ways in which such models have shown great promise is in the generation of photorealistic human faces that can differ along several psychologically important dimensions, as demonstrated by Peterson et al. (2022). They used face representations from a type of generative neural network called a Generative Adversarial Network (GAN; Goodfellow et al., 2014) to build a predictive model for many psychological traits and showed that such computational models can be used to manipulate features associated with that trait on newly generated faces.
Generative Adversarial Networks are models which learn to generate new stimuli similar to a given set of training data. The network learns a multi‐dimensional numerical representation of the data, called the latent space, where each point corresponds to an output stimulus that it can generate (in this case, to a face). One of the goals in designing GANs is to make a latent space that allows for each feature that varies in the training data to be manipulated independently of other features. This property, called disentanglement, can be used for manipulation by finding in the latent space a direction associated with some trait, similar to Todorov et al.'s (2011) reverse‐correlation.
To further adopt latent space manipulation in GANs for experimental manipulation in psychological research the method of manipulation should follow principles of validity and specificity (the manipulation influences the psychological impression while controlling for extraneous variables). To maximize validity we can find in a data‐driven way the direction in face space that will yield the biggest change in trait impressions with the smallest change in overall appearance. While traditional methods necessitate the explicit selection of controlled factors to maximize specificity, the disentangled representations learned by GANs independently model different factors of variation in the data, such that when a factor is selected for manipulation, the others are automatically controlled as they are associated with vectors that are orthogonal to it in the latent space.
Overcoming limitations of existing models used to manipulate facial features
Two major limitations we observe in current manipulation methods are their need to collect tens of thousands of ratings per manipulated dimension of interest and their tendency to significantly alter the appearance of the manipulated face (limiting the range within which impressions can be altered without breaking specificity or making the obtained faces unrealistic). The latter limits the practical utility of deep impression models for experimental manipulation, which could be remedied by increasing the validity of the manipulation (more change in impression for less change in appearance). Peterson et al. (2022), who collected over 30,000 ratings for each trait, argue that future studies could increase manipulation validity by collecting even more ratings (reiterated as a direction for the field in Todorov et al., 2023). This is motivated by their observation that the predictive accuracy of their linear trait model increases with the number of ratings. In the present study we show how a change in perspective on the construction of such computational models can allow us to overcome both limitations. In this section we argue that this is possible because the parameter estimates of these models are fungible.
Let's start by understanding the styleGAN2 model (Karras et al., 2020), which is able to generate diverse, high‐resolution images of faces that are largely indistinguishable from real ones (Tucciarelli et al., 2022), and which is used in both the approaches we discuss. StyleGAN represents faces as points in a 512‐dimensional latent space (which we'll call the face space) in which we are able to find the direction most associated with either perceptual dominance or trustworthiness. Each particular dimension of the 512‐dimensional face space does not refer to a specific trait of the output face, as it does not consist of discrete feature dimensions – these surface‐level dimensions are not meaningful. However numerous meaningful dimensions can be found in styleGAN's vast face space due to its high dimensionality and its training objective aimed at having all directions in this space consistently correspond to a set of visual features.
Thanks to this we can build a trait model by running a linear regression that predicts the rating from the face coordinates. The parameter estimates from this regression form the direction most associated with a given trait, which we can use to manipulate it by shifting the face coordinates up and down along that direction. While increasing the dataset reduces our uncertainty about the direction associated with the least squares solution of the regression, fungibility demonstrates the existence of alternative, similarly fitting (non‐least squares) solutions. As shown by Waller (2008) in a multiple regression with many predictors, there are many alternative solutions, all of which yield only slightly higher squared errors, but correspond to possibly very different directions. Regardless of how large we make our dataset, fungibility ultimately persists because more than one input dimension covaries with the trait ratings (a fact used in methods such as partial least squares). Thus even with an infinite set of observations, we still have a choice of many possible trait directions.
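The fungibility argument can be illustrated with a toy simulation. The sketch below is not Waller's (2008) procedure for generating fungible weights and does not use StyleGAN latents. It assumes a five‐dimensional stand‐in for face space with two nearly collinear coordinates and simulated ratings, and all names are hypothetical. It fits the least squares direction and then constructs an alternative set of weights whose squared error is 1% higher by design, so that the angle between the two directions can be inspected.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Toy "face space": five coordinates, the last two nearly collinear.
X = rng.normal(size=(n, 5))
X[:, 4] = X[:, 3] + 0.05 * rng.normal(size=n)

# Simulated ratings driven by the coordinates plus rating noise.
beta_true = np.array([1.0, 0.5, -0.5, 1.0, 0.0])
y = X @ beta_true + rng.normal(size=n)

def mse(weights):
    return np.mean((y - X @ weights) ** 2)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Least-squares trait direction.
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Direction along which the predictors vary least (smallest eigenvalue).
eigvals, eigvecs = np.linalg.eigh(X.T @ X / n)
lam_min, v = eigvals[0], eigvecs[:, 0]

# Move along v just far enough to raise the squared error by 1%,
# choosing the sign that rotates the weights away from b_ols.
step = np.sqrt(0.01 * mse(b_ols) / lam_min)
sign = -1.0 if b_ols @ v >= 0 else 1.0
b_alt = b_ols + sign * step * v

rel_increase = mse(b_alt) / mse(b_ols) - 1.0
angle_cos = cosine(b_ols, b_alt)
print(f"relative increase in squared error: {rel_increase:.4f}")
print(f"cosine between the two directions:  {angle_cos:.2f}")
```

Because least squares residuals are orthogonal to the predictors, moving the weights by a distance c along an eigenvector with eigenvalue λ raises the mean squared error by exactly c²λ. A small eigenvalue therefore permits a large change of direction at almost no cost in fit, which is the situation described above.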
Thus, we argue that better manipulation methods should not be informed by marginal increases in predictive accuracy, but by prioritizing effective manipulation. This involves, as we discussed, finding a direction that maximizes the change in impressions while minimizing the change in appearance. Regression with LASSO regularization, as applied by Peterson et al. (2022), aims at the opposite goal. LASSO regularization biases the solution towards directions that require us to move the most in latent space to observe a change in impressions, whereas in creating stimuli for two experimental conditions we wish the stimuli to be as similar as possible while eliciting a different impression. Additionally, linear regression in a multi‐dimensional space may bias the discovered features towards those strictly linearly dependent on assessments, neglecting other significant attributes. This is a recent issue, as in older models there were not enough dimensions for over‐fitting to occur. As a result, manipulations of valence‐related features may only modify, for example, the smile, which can be gradually increased or decreased. We may thus use other priors to inform our trait vector selection, so as to find directions that are correlated with the trait, though not maximally so, and that are more effective for manipulation.
Since the StyleGAN architecture's design situates the most average‐looking faces at the midpoint of the face space, we can construct computational models for a trait with only the examples that are high and low on a given trait. By identifying the direction associated with the fastest changes from the set of faces that were high on this dimension to those that were low, we can find a manipulation that targets primarily the features that have an outsized effect on being perceived for example trustworthy or untrustworthy. To allow us to find this direction with very few impressions per face, we use logistic regression, a linear model used for classification. Here the impressions that we want to model are encoded as either 0 (low) or 1 (high), eliminating a lot of noise.
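A minimal sketch of this idea on simulated data follows. It assumes a 16‐dimensional stand‐in for face space, noisy ratings generated along a hidden direction, and fixed cutoffs at the bottom and top thirds of the ratings. These choices, the hand‐written gradient descent fit and all names are illustrative assumptions rather than the settings used in the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n_faces, n_dims = 1500, 16  # toy stand-in for 512-dimensional face space

# Simulated latent coordinates and noisy ratings along a hidden trait direction.
true_dir = rng.normal(size=n_dims)
true_dir /= np.linalg.norm(true_dir)
coords = rng.normal(size=(n_faces, n_dims))
ratings = coords @ true_dir + rng.normal(size=n_faces)

# Keep only the extremes: bottom third coded 0, top third coded 1.
low_cut, high_cut = np.quantile(ratings, [1 / 3, 2 / 3])
keep = (ratings <= low_cut) | (ratings >= high_cut)
Z = coords[keep]
labels = (ratings[keep] >= high_cut).astype(float)

def fit_logistic_direction(Z, labels, lr=0.5, n_iter=500, l2=1e-3):
    """Logistic regression by gradient descent; returns a unit-length direction."""
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # predicted P(high)
        w -= lr * (Z.T @ (p - labels) / len(labels) + l2 * w)
        b -= lr * np.mean(p - labels)
    return w / np.linalg.norm(w)

trait_vector = fit_logistic_direction(Z, labels)
recovery = float(trait_vector @ true_dir)
print(f"cosine with the simulated trait direction: {recovery:.2f}")
```

In practice a library implementation such as scikit-learn's `LogisticRegression` could replace the hand‐written loop. The loop is used here only to keep the sketch dependent on NumPy alone.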
Motivation and aim
In summary, the approach developed in this study is motivated by the recognition that the problem of finding the best direction to change a face impression in the space of face features (sometimes called reverse‐correlation) and the problem of building the best predictive model of that impression are not equivalent. In high‐dimensional spaces, predictive models have ‘fungible solutions’, meaning simply that many nearly equally good solutions exist. Among these, some are better suited for prediction, while others are more effective for manipulation. For instance, LASSO regression, while useful for prediction, is suboptimal for manipulation as it favours directions requiring large changes in appearance to affect impressions. Our goal in experimental manipulation is to create face stimuli for two conditions that are as visually similar as possible (control) while eliciting different impressions (manipulated variable). To this end, we propose using a different statistical model (logistic regression), which prioritizes directions that yield significant changes in trait impressions with minimal changes to overall appearance. The proposed method aims to elicit larger impression changes while requiring significantly less impression data to construct, addressing the limitations of previous approaches in creating effective experimental stimuli.
In the following study, we will find trait vectors in face space that allow us to manipulate perceptual dominance and trustworthiness. Such a manipulation will always be a trade‐off between the strength of trait manipulation and the amount of (visual) face identity preservation. The last puzzle piece towards performing a successful manipulation is thus to extend the range in which we may manipulate a trait while preserving identity. For this we developed a special procedure alternating manipulated and unmanipulated face vectors across the styleGAN architecture's ‘style layers’. This and every other step necessary to use our method is explained and implemented in a Supplementary Online Codebook in Data S1 provided at the beginning of the Methods section.
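The exact alternation scheme is given in the Codebook and is not reproduced here. The sketch below shows only one possible reading of the idea, under loudly stated assumptions: a generator that accepts a separate 512‐dimensional latent for each of 18 style inputs, and a hypothetical rule that shifts every second layer while the remaining layers keep the original latent. The function name, layer count and layer selection are illustrative and should not be read as the procedure used in the study.

```python
import numpy as np

N_LAYERS, N_DIMS = 18, 512  # assumed: 18 style inputs, 512-dimensional latents

def mix_style_layers(w, trait_vector, strength, manipulated_layers):
    """Return an (N_LAYERS, N_DIMS) array of per-layer latents.

    Layers listed in `manipulated_layers` receive the shifted latent,
    all other layers keep the original latent `w`.
    """
    per_layer = np.tile(w, (N_LAYERS, 1))
    shifted = w + strength * trait_vector
    per_layer[list(manipulated_layers)] = shifted
    return per_layer

rng = np.random.default_rng(2)
w = rng.normal(size=N_DIMS)             # hypothetical face latent
trait_vector = rng.normal(size=N_DIMS)  # placeholder trait direction
trait_vector /= np.linalg.norm(trait_vector)

# Hypothetical alternation: manipulate every second layer.
w_plus = mix_style_layers(w, trait_vector, strength=2.0,
                          manipulated_layers=range(0, N_LAYERS, 2))
```

The design intuition is that layers left at the original latent continue to carry the unmanipulated face, which is one way a stronger shift could be applied while keeping more of the identity intact.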
STUDY 1: FINDING TRAIT VECTORS IN FACE SPACE
The goal of this study was to find directions in face space (a 512‐dimensional vector) that will allow us to manipulate the intensity of perceptual cues associated with our traits of interest. We can treat coordinates of the face in the 512‐dimensional face space as the independent variable and ratings on dominance and trustworthiness scales as dependent variables. As coordinates in face space determine how the face looks, manipulating this variable will allow us to detect a correspondence between the numerical representation of the face in face space and the intensity of each trait in psychological ratings.
Method
Data availability
Stimulus materials and data from all studies are available on the Open Science Framework (https://osf.io/du5m8/). The scripts used to generate and manipulate faces are available for use in a Supplementary Codebook (Data S1) hosted online on Google Colaboratory (https://colab.research.google.com/drive/1G34eiDEnSF8tHqaq9egfy6puPXWmBmlC?usp=sharing), which can also be downloaded from the Open Science Framework.
Participants
One hundred and twenty‐eight individuals took part in this study (72.1% women; Age: M = 27.8, SD = 10.9). Participants were recruited through Facebook, took part voluntarily and were not compensated. Determining a stopping rule for the number of participants was not important in this experiment as the result of this experiment was not a statistical test.
Design and materials
We generated 5000 random faces using the styleGAN2 model ‘ffhq‐f’. The generation can at times produce unrealistic‐looking faces; we thus created a set of exclusion criteria (outlined in the Codebook), that were followed to narrow down the set to only realistic photos and to exclude faces of children whose faces have been found to be judged primarily on functionally different dimensions (Collova et al., 2019). We picked the first 2000 valid faces as experimental stimuli.
To ensure cross‐cultural understanding of the traits a brief description for each was prepared based on the theory that dominance ratings reflect a judgement of strength, and trustworthiness ratings reflect a judgement of valence, particularly an approach‐avoidance tendency (Slepian et al., 2017; Todorov et al., 2008). Descriptions included statements like ‘A trustworthy face belongs to a person who you feel safe to be around’ and ‘A dominant face belongs to a person who has the strength to pursue his/her goals’. Ratings were measured on a continuous 9‐point slider labelled from 1 (Completely not dominant/untrustworthy) to 9 (Extremely dominant/trustworthy). The stimuli generation can be reproduced by following the Supplementary Codebook in Data S1.
Procedure
The survey was hosted online on Qualtrics. Participants provided informed consent and read a short description of the trait they were instructed to rate. Participants rated either trustworthiness (54 participants) or dominance (74 participants). Next participants proceeded to rate a random subset of 50 styleGAN‐generated faces, from a pool of 2000. Only at the end of the survey were participants informed that the images were computer‐generated.
Results
In total, of the 2000 stimuli, 1430 faces were rated on dominance and 1641 on trustworthiness. These subsets arose at random, due to the random draw of stimuli for each participant. In contrast to previous approaches using linear regression, we begin by selecting for each trait two subsets: the faces with the lowest and the faces with the highest trait ratings. The coordinates of faces in face space, together with the subset to which each face belonged, were subjected to logistic regression, separately for dominance ( = .55) and trustworthiness ( = .46). The resulting coefficients form a vector that describes the direction of the shortest path in face space ‘from’ the subset low in a trait ‘to’ the subset high in that trait.
To choose the hyperparameters of the cutoff values for low and high sets we maximized the cross‐validated accuracy. The final accuracy was 0.65 for the dominance classifier and 0.62 for the trustworthiness classifier. However we will argue that these values should not be interpreted in terms of real‐world predictive accuracy. A point worth first addressing is whether using logistic regression instead of linear regression deteriorates the quality of the trait vector found, with the different leverage it gives to data points. On the contrary, for our goals when using data like the one from this study, in which we collect few ratings per face, the wide confidence intervals (CIs) of each data point demand a different analytical treatment. The argument goes as follows: (1) wide CIs imply only global not local differences in ratings have a satisfactory signal‐to‐noise ratio as to the direction of change, (2) the points with ratings near the mean are the most ambiguous whether they should be associated with a decrease or increase in ratings, (3) dropping these medium faces diminishes leverage of noise vs. signal on the result, (4) pooling of extremely and slightly highly rated faces into one ‘subset with high‐ratings’ similarly decreases the leverage of the mostly noisy local differences. As a result we capture a more holistic bundle of features associated with a trait, while discarding only the least informative third of faces. Most importantly these changes are what allow us to overcome the need for large datasets, but at the same time decouple the statistical accuracy of the model from real‐world predictive accuracy, whose maximization makes the task data‐intensive. A visual comparison of face feature manipulation using both the proposed method and the one developed by Peterson et al. (2022) is presented in Figure 1.
FIGURE 1.

Example manipulations of a face used by Peterson et al. (2022), comparing their manipulation method with the revised method proposed in this article. In the less trustworthy condition, our method better preserved the face's emotional expression, perceived gender and hair colour. In the more trustworthy condition, more of the changes are associated with face morphology (e.g. neck width, eye size) than with a difference in facial expression.
To assess whether we obtained independent dominance and trustworthiness manipulations, we use the normed dot product, a measure equal to 0 for orthogonal, 1 for parallel and −1 for antiparallel vectors. We found it to be equal to −0.088 for the dominance and trustworthiness vectors, a level of orthogonality in face space previously reached only for the most unrelated of traits (Todorov & Oosterhof, 2011). Orthogonality of the vectors is, however, only necessary to measure the effect; it is not sufficient to guarantee that a manipulation of perceptual dominance and trustworthiness is orthogonal on the level of psychological ratings (e.g. for perceptual dominance to have no direct effect on judgements of trustworthiness independent of an indirect effect through influencing perceptual trustworthiness). This has to be checked experimentally using the directions we have just found (Figure 2).
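For readers who wish to compute the same measure on their own trait vectors, the normed dot product can be sketched as follows. The function name and the toy two‐dimensional inputs are illustrative; in the study the vectors are 512‐dimensional.

```python
import numpy as np

def normed_dot(a, b):
    """Dot product of two vectors after scaling each to unit length."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

orthogonal = normed_dot([1, 0], [0, 3])
parallel = normed_dot([1, 2], [2, 4])
antiparallel = normed_dot([1, 2], [-1, -2])
```

The three toy calls correspond to the reference values named in the text (0, 1 and −1), so a value such as −0.088 indicates vectors that are close to orthogonal.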
FIGURE 2.

Sample manipulations of faces used in Study 2 with the obtained vectors. The middle, neutral faces are unmanipulated and randomly generated. Previously proposed perceptual features associated with these traits are visible, for example (a) trustworthiness – a change in eye colour (Kleisner et al., 2013), (b) dominance – a change in head tilt (Witkower & Tracy, 2019).
STUDY 2: EXPERIMENTAL EVALUATION OF THE EFFECT OF FACE MANIPULATION ON JUDGEMENTS
The goal of this experiment was two‐fold: (1) to validate our method by showing the expected pattern of dominance ratings when manipulating perceptual dominance, and of trustworthiness ratings when manipulating trustworthiness, and (2) to quantify the size of the impact that a change in perceptual dominance has on trustworthiness ratings, and that a change in perceptual trustworthiness has on dominance ratings. This experimental demonstration of the direct effects of these traits on impressions will allow us to evaluate the relationship between the factors of the dominance‐trustworthiness model, that is, whether dominance cues influence judgements of trustworthiness and whether trustworthiness cues influence judgements of dominance.
Method
Participants
Three hundred and eighty‐three participants took part in this study (70.6% women; Age: M = 30.0, SD = 12.7). Individuals were recruited through Facebook and took part voluntarily. Of these, 234 individuals (73.2% women; Age: M = 27.9, SD = 11.6) correctly answered an attention check and were included in the analysis. Participants were informed that after finishing the survey they could choose to take part in a draw of 10 compensation payments of 50 PLN that would be assigned at random.
In choosing the participant count for this study we relied on estimates of the number of ratings required to stabilize the mean rating of a face (Hehman et al., 2018) and set the goal of recruiting participants until every face in every condition had been rated more than 20 times. From 234 individuals we obtained between 21 and 34 ratings for each face‐condition combination.
Design and materials
To observe the shape of the effects of perceptual traits on ratings we chose five levels of manipulation for each dimension and obtained a total of nine conditions for each face: the unmanipulated face (control condition), four manipulations on trustworthiness (the face shifted by −2, −1, +1 or +2 along the trustworthiness vector) and four manipulations on dominance (the face shifted by the same amounts along the dominance vector, Figures 3 and 4).1
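The nine conditions can be sketched as shifts of a face's latent code along each trait vector. This is only an illustration under assumptions: the function names, the condition labels (`t+2`, `d-1` and so on) and the two‐dimensional toy vectors are hypothetical, and in practice each shifted code would be passed to the generator to render an image.

```python
def shift(latent, vector, coefficient):
    """Move a latent code along a trait vector by the given coefficient."""
    return [z + coefficient * v for z, v in zip(latent, vector)]


def nine_conditions(latent, trust_vec, dom_vec, levels=(-2, -1, 1, 2)):
    """Return the control code plus four shifts along each trait vector."""
    conditions = {"0": list(latent)}  # unmanipulated control
    for c in levels:
        conditions[f"t{c:+d}"] = shift(latent, trust_vec, c)
        conditions[f"d{c:+d}"] = shift(latent, dom_vec, c)
    return conditions


# Toy example with a 2-dimensional latent space and axis-aligned trait vectors.
conds = nine_conditions([0.0, 1.0], trust_vec=[1.0, 0.0], dom_vec=[0.0, 1.0])
print(sorted(conds))
```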
FIGURE 3.

The nine experimental conditions. An example face randomly generated for this study (labelled 0) was manipulated using the two vectors either to increase (+1, +2) or decrease (−1, −2) dominance (d) or trustworthiness (t). The text in the upper right corner helps identify conditions in Figure 4.
FIGURE 4.

Crossover design of Study 2. A schematic representation of how experimental conditions vary between participants and faces – even though each individual sees each face in only one condition, the marginal distributions of conditions per person (rows) and conditions per face (columns) stay fixed.
We set ourselves a high standard of showing between‐subject differences in ratings. Due to the influence of previously seen faces on ratings (Dotsch et al., 2017) one participant could not be assigned one manipulation condition for all faces. Therefore we utilized a crossover design with Latin squares for randomization, ensuring each subject saw each face only once in a random condition, while maintaining constant marginal distributions across participants. Each subject saw 45 faces, five in each of the nine conditions.
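One way to obtain such a crossover assignment is a cyclic Latin square, sketched below. The construction, the function name and the idea of nine participant groups are assumptions made for illustration; the exact randomization used in the study is described in its Codebook, not here.

```python
import random
from collections import Counter


def crossover_assignment(n_faces=45, n_conditions=9, seed=0):
    """Return table[group][face] = condition index.

    Each participant group sees every face exactly once, sees every
    condition n_faces / n_conditions times, and across groups every face
    appears in every condition exactly once.
    """
    rng = random.Random(seed)
    face_order = list(range(n_faces))
    rng.shuffle(face_order)            # randomize which faces share a cycle
    cond_labels = list(range(n_conditions))
    rng.shuffle(cond_labels)           # randomize condition labels

    table = []
    for group in range(n_conditions):
        row = [None] * n_faces
        for position, face in enumerate(face_order):
            row[face] = cond_labels[(position + group) % n_conditions]
        table.append(row)
    return table


table = crossover_assignment()
print(Counter(table[0]))  # conditions seen by the first participant group
```

Participants would then be assigned to groups in rotation, so that the marginal distributions of conditions per person and per face stay fixed as the sample grows.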
We generated 100 random StyleGAN faces and applied the same exclusion criteria as in Study 1 to select 45 faces that would be used in this experiment. To decrease the effect of typicality on trustworthiness judgements we picked more typical faces than in Study 1 by sampling faces closer to the centre of their distribution. Each face was generated in the eight manipulated versions and the first 45 faces that satisfied the exclusion criteria were picked for a total of 405 experimental stimuli. A detailed overview of the process of generating manipulated images is provided in the Supplementary Codebook (Data S1). Ratings were measured on a continuous 9‐point slider labelled from 1 (Completely not dominant/untrustworthy) to 9 (Extremely dominant/trustworthy). On both viewings, after rating 22 images, participants read the trait description again. Only at the end of the survey were participants informed that the images were computer‐generated.
Procedure
The study was hosted online on Qualtrics. Participants provided informed consent and, as in Study 1, proceeded to read a short description of the trait they were instructed to rate. This time every individual rated both traits. They were informed that they would see the same 45 faces twice and rate them first on one trait and then on the second. They read the description of the first trait at the start of the study, and the description of the second when they finished rating all 45 faces on the first trait. Whether the first trait rated was dominance or trustworthiness was randomized. Every image was rated separately and images were shown in a different order on the second viewing.
To ensure the validity of the ratings we included an attention check – a test of whether the participant had read the trait description. At the end of the study participants were asked to recall the description of a face with a low level of dominance or trustworthiness and to report it in a short sentence. The question always asked about the trait that was rated second.
Results
To test whether perceptual manipulation produced the expected change in ratings we fitted regression models to the ratings data. The ratings were averaged across participants to obtain the faces' mean ratings. Both manipulations produced changes in trait ratings. The dominance manipulation changed dominance ratings in a linear fashion (F(1, 223) = 170.8, p < .001, R² = .43). Trustworthiness ratings tracked the trustworthiness manipulation (F(1, 223) = 111.0, p < .001, R² = .33); however, the higher the manipulation, the less the ratings were changed, producing a curvilinear (quadratic) relationship (F quadratic(1, 222) = 61.24, p < .001, R² = .36; see Figure 5).
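Trend tests of this kind can be sketched with ordinary least squares on the per‐face mean ratings. The code below is an illustration on synthetic data, not a reanalysis: the function names and the simulated ratings are hypothetical. It reports the F for the linear model, the F for adding a quadratic term to it and the F for the whole quadratic model, together with both R² values.

```python
import random


def ols_residuals(x, y):
    """Residuals of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return [b - mean_y - slope * (a - mean_x) for a, b in zip(x, y)]


def trend_tests(x, y):
    """Linear and quadratic trend statistics for ratings y at levels x."""
    n = len(x)
    mean_y = sum(y) / n
    tss = sum((b - mean_y) ** 2 for b in y)

    res_y = ols_residuals(x, y)
    rss_linear = sum(r * r for r in res_y)

    # Partial the intercept and linear term out of x**2 (Frisch-Waugh step);
    # `gain` is the drop in residual sum of squares from the quadratic term.
    res_sq = ols_residuals(x, [a * a for a in x])
    gain = sum(a * b for a, b in zip(res_y, res_sq)) ** 2 / sum(b * b for b in res_sq)
    rss_quadratic = rss_linear - gain

    return {
        "F_linear": (tss - rss_linear) / (rss_linear / (n - 2)),
        "df_linear": (1, n - 2),
        "F_quadratic_term": gain / (rss_quadratic / (n - 3)),
        "df_quadratic_term": (1, n - 3),
        "F_quadratic_model": ((tss - rss_quadratic) / 2) / (rss_quadratic / (n - 3)),
        "R2_linear": 1 - rss_linear / tss,
        "R2_quadratic": 1 - rss_quadratic / tss,
    }


# Synthetic example: 45 faces at five manipulation levels (225 mean ratings).
rng = random.Random(1)
levels = [level for _ in range(45) for level in (-2, -1, 0, 1, 2)]
ratings = [5 + 0.5 * x - 0.2 * x * x + rng.gauss(0, 0.5) for x in levels]
result = trend_tests(levels, ratings)
print(result["df_linear"], result["df_quadratic_term"])
```

With 45 faces at five levels there are 225 mean ratings, which gives the residual degrees of freedom of 223 and 222 that appear in the reported tests.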
FIGURE 5.

Mean face ratings in Study 2 under different manipulation conditions measured on a 9‐point scale. Points indicate mean dominance ratings (blue) and trustworthiness ratings (red). The x‐axis shows the amount of manipulation by either the trustworthiness vector (right) or dominance vector (left) expressed in terms of the coefficient by which the vector was multiplied to obtain manipulated faces. Error bars show standard error of the mean.
To explain the quadratic pattern seen in trustworthiness ratings we conducted additional analyses to determine which faces drove the shape of the changes in ratings. We hypothesized that a face's initial (control) rating affected the change in mean rating it incurred under manipulation. Indeed there was a linear trend (F(1, 43) = 14.8, p < .001, R² = .26) between each face's control‐group rating and the change in its rating when manipulated to increase trustworthiness (+2 condition). The higher the face was judged on trustworthiness in the control group, the less it gained from the manipulation, to the point that a very highly trustworthy face's rating decreased when manipulated. We found no such effect for the −2 condition of decreasing trustworthiness (F(1, 43) = 0.002, p = .97, R² < .001), indicating that the former linear trend was not due to regression to the mean. Additionally in this comparison no face's trustworthiness rating increased contrary to the decrease predicted by the manipulation.
From the results of the bootstrap (Table 1) we observed that the manipulation of one trait had an influence on judgements of the other in three of the four possible manipulations: dominance ratings changed both when decreasing and increasing perceptual trustworthiness, but trustworthiness ratings changed only when increasing dominance. We hypothesized that this fourfold pattern can be explained by the shape of the effect on each judgement being consistently either quadratic or linear under both manipulations. Concretely we tested whether the manipulation of trustworthiness affected dominance judgements linearly (F(1, 223) = 43.1, p < .001, R² = .16), and whether the dominance manipulation affected trustworthiness judgements in a quadratic fashion (F quadratic(1, 222) = 15.10, p < .001, R² = .12).
TABLE 1.
The results of the bootstrap of the mean change for 5 faces.
| | Manipulation of perceptual dominance: decrease | Manipulation of perceptual dominance: increase | Manipulation of perceptual trustworthiness: decrease | Manipulation of perceptual trustworthiness: increase |
|---|---|---|---|---|
| Change of dominance judgements | | | | |
| In ratings (95% CI) | −0.91 (−1.39, −0.39) | +0.98 (+0.51, +1.45) | +0.48 (−0.04, +0.95) | −0.49 (−1.02, +0.11) |
| In control group SDs (95% CI) | −1.18 SD (−1.79, −0.51) | +1.26 SD (+0.66, +1.87) | +0.61 SD (−0.05, +1.22) | −0.63 SD (−1.31, +0.14) |
| Change of trustworthiness judgements | | | | |
| In ratings (95% CI) | +0.02 (−0.41, +0.48) | −0.74 (−1.28, −0.19) | −1.22 (−1.73, −0.72) | +0.43 (+0.04, +0.82) |
| In control group SDs (95% CI) | +0.03 SD (−0.54, +0.63) | −0.97 SD (−1.68, −0.25) | −1.6 SD (−2.27, −0.94) | +0.57 SD (+0.06, +1.08) |
Note: The columns indicate which condition the control group is being compared to; the rows show what trait was being judged expressed in either a change in ratings or how big such a change is in terms of the standard deviation of this trait's ratings in the control group.
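A percentile bootstrap of a mean change, expressed both in rating points and in control‐group SDs, can be sketched as follows. The resampling unit (faces), the number of resamples and the function name are assumptions made for illustration; the exact scheme behind Table 1 is not specified in this section.

```python
import random
import statistics


def bootstrap_mean_change(control, manipulated, n_boot=2000, seed=0):
    """Mean change (manipulated minus control) with a 95% percentile CI.

    `control` and `manipulated` hold mean ratings of the same faces,
    in the same order. Faces are resampled with replacement.
    """
    rng = random.Random(seed)
    diffs = [m - c for c, m in zip(control, manipulated)]
    boot = sorted(
        statistics.fmean(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot)
    )
    low = boot[int(0.025 * n_boot)]
    high = boot[min(n_boot - 1, int(0.975 * n_boot))]
    sd_control = statistics.stdev(control)  # unit for the "control group SDs" rows
    mean_change = statistics.fmean(diffs)
    return {
        "mean_change": mean_change,
        "ci": (low, high),
        "mean_change_sd": mean_change / sd_control,
        "ci_sd": (low / sd_control, high / sd_control),
    }


# Hypothetical mean ratings for eight faces (not data from the study).
control = [4.0, 5.0, 6.0, 5.5, 4.5, 5.2, 6.1, 4.8]
manipulated = [4.5, 6.2, 5.7, 6.4, 6.0, 5.3, 6.9, 5.9]
print(bootstrap_mean_change(control, manipulated))
```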
To assess the manipulation effectiveness achievable with our method we compared the manipulation effect sizes on both dimensions with those calculated from the results obtained by Peterson et al. (2022). These comparisons were conducted for each dimension in both datasets using t‐tests between the control group (non‐manipulated faces) and the groups with the maximum decrease or increase in manipulation strength on a given dimension. Because the scales in the two studies had different ranges, we standardized them to have values between 0 and 1. This approach enabled us to calculate and compare Cohen's d values derived from each comparison. These results are presented in Table 2. For the dominance manipulation our method yielded effect sizes more than twice as large. We obtained a similar result for decreasing trustworthiness. For increasing trustworthiness the effect size for our method was also higher than that calculated for the data collected by Peterson et al. (2022), although less than twice as high. Note that for the Peterson et al. (2022) manipulation the t‐test comparing the increased‐trustworthiness condition with the control group indicated no statistically significant difference between their means.
TABLE 2.
Results of the comparison of the two methods' effectiveness.
| Dimension | Method | M (low) | M (0) | M (high) | Cohen's d (0 to low) | Cohen's d (0 to high) | t (0 to low) | df (0 to low) | p (0 to low) | t (0 to high) | df (0 to high) | p (0 to high) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dominance | Proposed method | 0.38 | 0.5 | 0.62 | −1.24 | 1.22 | −5.88 | 86.87 | <.001 | 5.79 | 87.73 | <.001 |
| Dominance | Peterson et al. (2022) | 0.37 | 0.44 | 0.51 | −0.56 | 0.59 | −2.8 | 97.7 | .006 | 2.97 | 96.29 | .004 |
| Trustworthiness | Proposed method | 0.43 | 0.58 | 0.64 | −1.4 | 0.61 | −6.63 | 83.35 | <.001 | 2.88 | 86.27 | .005 |
| Trustworthiness | Peterson et al. (2022) | 0.49 | 0.56 | 0.61 | −0.54 | 0.37 | −2.69 | 97.1 | .008 | 1.86 | 97.4 | .066 |
Note: In the case of the means, low and high are the manipulated conditions and ‘0’ the control one. The effect sizes and statistics concern the difference between a given manipulation condition and the control one (faces not manipulated).
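The quantities in Table 2 can be sketched as follows. The fractional degrees of freedom in the table suggest Welch's unequal‐variance t‐test, which is what this sketch assumes; the pooled‐SD form of Cohen's d, the function names and the example numbers are likewise assumptions rather than the authors' code.

```python
import math
import statistics


def rescale(ratings, scale_min, scale_max):
    """Map ratings from [scale_min, scale_max] onto [0, 1]."""
    return [(r - scale_min) / (scale_max - scale_min) for r in ratings]


def cohens_d(a, b):
    """Standardized mean difference (a minus b) using the pooled SD."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_sd


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    sa = statistics.variance(a) / len(a)
    sb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (len(a) - 1) + sb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical ratings on a 1-9 scale (not data from either study).
low_condition = rescale([1, 2, 3, 4, 5], 1, 9)
control_group = rescale([2, 3, 4, 5, 6], 1, 9)
print(cohens_d(low_condition, control_group), welch_t(low_condition, control_group))
```

Cohen's d and t are unchanged by rescaling both groups with the same linear map, so in this sketch the rescaling affects only the reported means.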
Discussion
We shall start by noting that while we have successfully obtained a clear shift of the distribution of face ratings, the changes in individual faces' ratings seen in the bootstrap results still vary noticeably. This can be partially attributed to a complex interaction between identity variables and the manipulated traits, which highlights how crucial a good process of identity randomization is for uncovering the main effect of perceptual traits (Sutherland et al., 2013). We may expand our previous statement that the existence of an effect of face identity on ratings precludes inferring a lack of a causal effect of perceptual dominance on trustworthiness ratings: even if only perceptual dominance and trustworthiness varied in a rated sample, a lack of correlation would still not indicate a lack of a causal effect, as such causation could be an effect of any shape, including those with no linear correlation.
We observed one such non‐linear effect for trustworthiness ratings, which, together with the linear effect of perceptual dominance on dominance ratings, closely replicated the shapes of the effects found by Oosterhof and Todorov (2008). However the quadratic relationship calls for an explanation. Oosterhof and Todorov speculated that it arises because people are better at discriminating trustworthiness at low levels. We can now shed new light on this result. Since faces judged to be more trustworthy in the control group gained less from the manipulation, so that highly trustworthy faces lost in mean ratings, the quadratic relationship may reflect the fact that above a certain level of perceptual trustworthiness, raising it further will decrease trustworthiness judgements, indicating that there is a limit to how much trustworthiness can be increased. A possible explanation is that this is an effect of typicality. The faces that lost in ratings may already have had more pronounced versions of the manipulated features, so increasing them further may make them appear atypical, which in turn decreases their perceived valence and their trustworthiness ratings.
This explanation entails the prediction that the trustworthy faces that lost in trustworthiness ratings as the result of manipulation to increase trustworthiness would not be as affected by the decrease of trustworthiness, as they would turn more typical and the effects would partially cancel out. This, alas, does not seem to be the case, as we have already noted that all faces were affected equally by the decrease, regardless of their rating in the control group, with R² < .001. Moreover recall that we made sure to generate more typical faces for the validation study, so as to be able to manipulate them without making them atypical. Neither does typicality seem to explain the change in trustworthiness impressions when manipulating dominance. Based on the richer per‐face information on this effect available in the present study, it may be possible to postulate a novel explanation of the observed quadratic pattern that has a theoretical basis in the functional interpretation of these traits and the evolutionary origin of their cues as signals. If trustworthiness signals the intention, and dominance the ability, to cause harm, we would expect there to be different evolutionary pressures associated with these impressions. While unreliable displays of strength are unlikely, judgements of trustworthiness may be subject to a kind of impostor effect, in which exaggerated displays of trustworthiness are discounted.
A useful parallel is drawn by Mercier (2020) between the human task of allocating trust and costly signalling in animals. Mercier notes that if it were advantageous to signal one's trustworthiness regardless of whether one was actually a trustworthy person, then a selection pressure would begin to increase trustworthiness cues in everyone, quickly rendering trustworthiness cues useless. To keep trustworthiness signalling reliable, there must be a cost associated with unreliable signalling. An analogue to the observed effect for faces that serves such a role can be found in the bowerbirds studied by Madden (2002), which attract mates by building decorated bowers. These bowers are not costly signals in the sense of being costly to build (like looking trustworthy, building them is not costly); rather, as Madden has shown, they are kept reliable by other bowerbirds monitoring each other and punishing males that build exaggerated bowers relative to their social standing.
Some of the research establishing the relationship between typicality and trustworthiness may perhaps be interpreted as corroborating this alternative account. Sofer et al. (2015) have shown that when faces are manipulated towards an attractive face, judgements of trustworthiness peak and begin to decline, while judgements of attractiveness continue to rise. Since attractiveness is another valence‐laden dimension, and the finding suggests that very attractive faces can be judged as untrustworthy, it seems that the described effect can occur even when the faces are not extremely atypical and thus negatively judged. If future studies confirm the existence of such an effect, it would be a major step forward in explaining the reliability of trustworthiness cues, although for the time being it remains a hypothesis because the effect of typicality is better established.
Having validated our manipulation we turn to how an intervention on one trait changes the ratings of the other (summarized in Table 3). We note that contrary to what the orthogonality account would predict, two of the four possible manipulations produce similar changes in both judgements, while their orthogonality holds only when decreasing dominance. The fact that in two manipulations the change in both judgements is similar may be explained by the quadratic shape of trustworthiness judgements, which peak when decreasing dominance and start decreasing sharply when increasing it. This quadratic relationship also explains the other changes in the relative strength of these effects, as there exists a discrepancy in the relative ease of change in judgements of trustworthiness when decreasing or increasing the intensity of perceptual cues. This demonstrates that the orthogonality, which holds only when decreasing a face's dominance, should be attributed to people's reluctance to increase trustworthiness judgements, instead of an actual independence of these traits (the dependence of which is highlighted in all three other manipulations). Crucially for face perception research, such a non‐linear pattern cannot be done away with using approaches proposed previously, such as subtracting the trustworthiness vector from the dominance vector forcing them to be negatively correlated (Oh, Buck, & Todorov, 2019; Oosterhof & Todorov, 2008; Todorov et al., 2011), as no rotation of the described effects would produce an orthogonal manipulation in all four manipulations. Thus this new pattern has to be considered when interpreting causally what the results of experiments should be attributed to. This is because if dominant faces are perceived as untrustworthy, regardless of their perceptual trustworthiness, experimental studies manipulating dominance should consider the indirect effect that a lack of trust has on behaviour.
TABLE 3.
The fourfold pattern of the expected changes in judgements under an intervention to increase or decrease perceptual dominance or trustworthiness.
| Direction | Trustworthiness manipulation | Dominance manipulation |
|---|---|---|
| Increase | Small increase in trustworthiness judgements; small decrease in dominance judgements | Large decrease in trustworthiness judgements; large increase in dominance judgements |
| Decrease | Large decrease in trustworthiness judgements; small increase in dominance judgements | No change in trustworthiness judgements; large decrease in dominance judgements |
This last point underscores how an overlapping features account may also be inadequate to explain the pattern. While some of the features associated with trustworthiness and dominance still appear to overlap in our neural network‐based method (i.e. the change in apparent expression, as first reported in Todorov et al., 2008), these features do not account for the observed pattern, especially the effects of the dominance manipulation on trustworthiness judgements. A better account for this effect is the role of valence, as it is similar to the effects obtained for female faces in Sutherland et al. (2015). We may then see this as evidence that the successful execution of the functional roles of the two impressions demands evaluating cues associated with both of them, rather than their cues overlapping by chance.
In light of this result alterations to the dominance‐trustworthiness model can be proposed. First we propose that dominance and trustworthiness are not independent dimensions, but that rather their relationship can be described by a fourfold pattern summarized in Table 3. However this challenge to orthogonality does not undermine the importance of the functional roles associated with these judgements. Rather it suggests that successful execution of these roles requires adaptation of each judgement based on both trustworthiness and dominance cues (which should not be the case if dominance and trustworthiness were orthogonal dimensions). Finally we suggest that the qualitative difference in the form of these judgements may be partially explained by different pressures to keep the signals associated with their functional roles somewhat reliable (for trustworthiness cues to signal intent, and for dominance cues to signal power, reliably), exemplified by the proposed impostor effect – the discounting of exaggerated trustworthiness cues.
GENERAL DISCUSSION
We have employed GANs for a controlled intervention on complex perceptual traits of faces and shown how recent AI‐based approaches to experimental manipulation of faces can be made effective even with the higher standard of between‐subject effects. We observed the effects that changing one perceptual trait has on ratings of the other, showing their supposed orthogonality holds only when decreasing dominance – a fourfold pattern inconsistent with existing accounts. We showed how this pattern may be explained by the qualitative difference between these traits, specifically the quadratic shape of trustworthiness judgements. Lastly we proposed an alternative, evolutionarily plausible account of why the faces judged most trustworthy may not be the ones with the most perceptual trustworthiness.
The method presented in the current paper combines the benefits of computer‐generated faces with those of using ecologically valid, real face‐stimuli, extending the work of Peterson et al. (2022). The method allows for more precise controlled manipulation and is perfectly suited for use in experiments with between‐subject designs, concerned with isolating the effect that traits such as dominance and trustworthiness have on behaviour. This ability enhances the ecological validity of the findings, making them more applicable and relatable to real‐world scenarios. Furthermore the results of the comparisons between the proposed method and the one used by Peterson et al. (2022) clearly indicated that our method yields substantially larger effect sizes for both the dominance and trustworthiness dimensions. These findings indicate that our approach is more effective in achieving the desired psychological manipulations, offering a potent tool for future research in psychology.
Moreover our method makes the GAN methodology accessible to a broader audience, facilitating a wider range of scientific inquiries and broadening the scope of potential studies. This includes quick, exploratory experiments that require minimal data as well as more in‐depth, comprehensive research. Additionally our method can incorporate new dimensions by collecting ratings (as in Study 1) and applying the steps described in the Codebook. This efficiency and adaptability allow researchers to conduct studies without the constraints of excessive data requirements or computational overhead, encouraging richer, more complex explorations of human perception. By requiring less data than recent methods (e.g. Peterson et al., 2022), our approach may also facilitate studies in situations where data collection is difficult or where rapid testing is necessary.
Moreover it delegates the hard task of abstracting factors of variation to a state‐of‐the‐art neural network. Perceptual complexes underlying trait judgements can thus be found in the disentangled representation of faces in face space, without making a priori assumptions about which features drive them. This is the major advantage of AI‐based methods over programs such as FaceGen used by Oosterhof and Todorov (2008) and many others (e.g. Getov et al., 2015; Ho et al., 2018; Nakamura & Watanabe, 2019; Thorstenson et al., 2019), which limit their manipulation to the vertices and reflectance of the 3D model, leaving out important sources of variation in judgements such as face position (Witkower & Tracy, 2019). However therein lie the limitations of this approach. First, because the features manipulated in the GAN are not explicitly visible, we cannot say for certain which changes in the face drove the changes in judgements. Second, we cannot precisely eliminate variation in some contingent features, such as head position, to study only face morphology, to which our results cannot be generalized. Our study took a different approach by treating the features as perceptual rather than physical complexes, which may also be its advantage, as it allows for more actionable results.
The last important limitation to discuss is the generalizability of our findings based on our study sample, particularly across cultures. However there are reasons to believe that the result could be replicated across cultures. The dimensional structure of impressions of faces has been shown to show considerable cross‐cultural agreement (Sutherland et al., 2018; Walker et al., 2011). In particular the results of Oosterhof and Todorov (2008) have been shown to hold across nearly all world regions (Jones et al., 2021). It should be noted that while the perceptual features that determine the dominance or trustworthiness stereotype may be subject to cultural differences, it is the salience and structure in relation to other judgements of dominance and trustworthiness that appear to be nearly universal (Martínez et al., 2020; Todorov & Oh, 2021). Such consistency would be difficult to achieve were it not for the universality of their functional roles, strongly suggesting a degree of universality of the relationships shown in this study. Note also that while the experiments presented in the current paper may warrant future replication in other regions, the method presented remains an asset in an investigation of cross‐cultural differences, as it could be used not only to produce culture‐specific manipulations of these dimensions but also to compare which perceptual features account for the changes in judgements across regions.
Summarizing the contribution of the method, we first show how to build more effective manipulation models using far less data, which challenges the conjecture that progress in using deep models for manipulation requires larger datasets (Peterson et al., 2022; Todorov et al., 2023). Second we show how going beyond linear regression lets us manipulate a broader range of facial features, which induces much larger changes in impressions than the less holistic manipulations from previous methods. Finally we added a way to better preserve the original features of the face, increasing specificity and allowing for a wider range of manipulations without altering aspects of face identity (e.g. changing the perceived gender), which could threaten internal validity. Together these greatly improve the utility of deep models of face impressions for practical experimental manipulation in future studies.
AUTHOR CONTRIBUTIONS
Kamil K. Imbir: Methodology; supervision; conceptualization; writing – review and editing. Maciej Siemiątkowski: Resources; investigation; writing – review and editing. Adam Sobieszek: Conceptualization; software; methodology; investigation; writing – original draft; writing – review and editing.
CONFLICT OF INTEREST STATEMENT
The authors report no conflict of interest.
Supporting information
Data S1.
ACKNOWLEDGEMENTS
We are grateful to Małgorzata Piskorska and Barbara Urbańczyk for their help in data collection and stimuli selection for Study 1.
Footnotes
Using data from a pilot study, we scaled trait vectors, so that a single shift on either vector would correspond to a similar change in ratings.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available on the Open Science Framework at https://osf.io/du5m8/.
REFERENCES
- Balas, B. , & Pacella, J. (2015). Artificial faces are harder to remember. Computers in Human Behavior, 52, 331–337. 10.1016/j.chb.2015.06.018 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Collova, J. R. , Sutherland, C. A. M. , & Rhodes, G. (2019). Testing the functional basis of first impressions: Dimensions for children's faces are not the same as for adults' faces. Journal of Personality and Social Psychology, 117(5), 900–924. 10.1037/pspa0000167 [DOI] [PubMed] [Google Scholar]
- Crookes, K. , Ewing, L. , Gildenhuys, J. D. , Kloth, N. , Hayward, W. G. , Oxner, M. , Pond, S. , & Rhodes, G. (2015). How well do Computer‐generated faces tap face expertise? PLoS One, 10(11), e0141353. 10.1371/journal.pone.0141353 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dotsch, R. , Hassin, R. R. , & Todorov, A. (2017). Statistical learning shapes face evaluation. Nature Human Behaviour, 1(1), 1. 10.1038/s41562-016-0001 [DOI] [Google Scholar]
- Dotsch, R. , & Todorov, A. (2012). Reverse correlating social face perception. Social Psychological and Personality Science, 3(5), 562–571. 10.1177/1948550611430272 [DOI] [Google Scholar]
- Freeman, J. B. , & Ambady, N. (2011). A dynamic interactive theory of person construal. Psychological Review, 118(2), 247–279. 10.1037/a0022327 [DOI] [PubMed] [Google Scholar]
- Getov, S., Kanai, R., Bahrami, B., & Rees, G. (2015). Human brain structure predicts individual differences in preconscious evaluation of facial dominance and trustworthiness. Social Cognitive and Affective Neuroscience, 10(5), 690–699. https://doi.org/10.1093/scan/nsu103
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
- Goodfellow, I., Pouget‐Abadie, J., Mirza, M., Xu, B., Warde‐Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, & K. Q. Weinberger (Eds.), Advances in neural information processing systems (Vol. 27). Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf
- Hehman, E., Xie, S. Y., Ofosu, E. K., & Nespoli, G. (2018). Assessing the point at which averages are stable: A tool illustrated in the context of person perception. https://doi.org/10.31234/osf.io/2n6jq
- Ho, P. K., Woods, A., & Newell, F. N. (2018). Temporal shifts in eye gaze and facial expressions independently contribute to the perceived attractiveness of unfamiliar faces. Visual Cognition, 26(10), 831–852. https://doi.org/10.1080/13506285.2018.1564807
- Jones, B. C., DeBruine, L. M., Flake, J. K., Liuzza, M. T., Antfolk, J., Arinze, N. C., Ndukaihe, I. L. G., Bloxsom, N. G., Lewis, S., Foroni, F., Willis, M. L., Cubillas, C. P., Vadillo, M. A., Turiégano, E., Gilead, M., Simchon, A., Sarıbay, S. A., Owsley, N., Jang, C., … Baskin, E. (2021). To which world regions does the valence–dominance model of social perception apply? Nature Human Behaviour, 5(1), 159–169. https://doi.org/10.1038/s41562-020-01007-2
- Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., & Aila, T. (2020). Analyzing and improving the image quality of StyleGAN. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr42600.2020.00813
- Kleisner, K., Příplatová, L., Fröst, P., & Flegr, J. (2013). Trustworthy‐looking face meets brown eyes. PLoS One, 8(1), e53285. https://doi.org/10.1371/journal.pone.0053285
- Livingston, R. W., & Pearce, N. (2009). The teddy‐bear effect. Psychological Science, 20(10), 1229–1236. https://doi.org/10.1111/j.1467-9280.2009.02431.x
- Madden, J. R. (2002). Bower decorations attract females but provoke other male spotted bowerbirds: Bower owners resolve this trade‐off. Proceedings of the Royal Society B: Biological Sciences, 269(1498), 1347–1351. https://doi.org/10.1098/rspb.2002.1988
- Martínez, J. E., Funk, F., & Todorov, A. (2020). Quantifying idiosyncratic and shared contributions to judgment. Behavior Research Methods, 52(4), 1428–1444. https://doi.org/10.3758/s13428-019-01323-0
- Mercier, H. (2020). Not born yesterday: The science of who we trust and what we believe. Princeton University Press.
- Mercier, H., & Sperber, D. (2017). The enigma of reason. Harvard University Press.
- Mileva, M., Young, A. W., Kramer, R. S. S., & Burton, A. M. (2019). Understanding facial impressions between and within identities. Cognition, 190, 184–198. https://doi.org/10.1016/j.cognition.2019.04.027
- Nakamura, K., & Watanabe, K. (2019). Data‐driven mathematical model of East‐Asian facial attractiveness: The relative contributions of shape and reflectance to attractiveness judgements. Royal Society Open Science, 6(5), 182189. https://doi.org/10.1098/rsos.182189
- Oh, D., Buck, E. A., & Todorov, A. (2019). Revealing hidden gender biases in competence impressions of faces. Psychological Science, 30(1), 65–79. https://doi.org/10.1177/0956797618813092
- Oh, D., Dotsch, R., & Todorov, A. (2019). Contributions of shape and reflectance information to social judgments from faces. Vision Research, 165, 131–142. https://doi.org/10.1016/j.visres.2019.10.010
- Oh, D., Grant‐Villegas, N., & Todorov, A. (2020). The eye wants what the heart wants: Female face preferences are related to partner personality preferences. Journal of Experimental Psychology: Human Perception and Performance, 46(11), 1328–1343. https://doi.org/10.1037/xhp0000858
- Oosterhof, N. N., & Todorov, A. (2008). The functional basis of face evaluation. Proceedings of the National Academy of Sciences of the United States of America, 105(32), 11087–11092. https://doi.org/10.1073/pnas.0805664105
- Peterson, J. C., Uddenberg, S., Griffiths, T. L., Todorov, A., & Suchow, J. W. (2022). Deep models of superficial face judgments. Proceedings of the National Academy of Sciences of the United States of America, 119(17), e2115228119. https://doi.org/10.1073/pnas.2115228119
- Radke, S., Kalt, T., Wagels, L., & Derntl, B. (2018). Implicit and explicit motivational tendencies to faces varying in trustworthiness and dominance in men. Frontiers in Behavioral Neuroscience, 12, 8. https://doi.org/10.3389/fnbeh.2018.00008
- Slepian, M. L., Young, S. G., & Harmon‐Jones, E. (2017). An approach‐avoidance motivational model of trustworthiness judgments. Motivation Science, 3(1), 91–97. https://doi.org/10.1037/mot0000046
- Sofer, C., Dotsch, R., Wigboldus, D., & Todorov, A. (2015). What is typical is good. Psychological Science, 26(1), 39–47. https://doi.org/10.1177/0956797614554955
- Sutherland, C. A. M., Liu, X., Zhang, L., Chu, Y., Oldmeadow, J. A., & Young, A. W. (2018). Facial first impressions across culture: Data‐driven modeling of Chinese and British perceivers' unconstrained facial impressions. Personality and Social Psychology Bulletin, 44(4), 521–537. https://doi.org/10.1177/0146167217744194
- Sutherland, C. A. M., Oldmeadow, J. A., Santos, I. M., Towler, J., Burt, D. M., & Young, A. W. (2013). Social inferences from faces: Ambient images generate a three‐dimensional model. Cognition, 127(1), 105–118. https://doi.org/10.1016/j.cognition.2012.12.001
- Sutherland, C. A. M., Young, A. W., Mootz, C. A., & Oldmeadow, J. A. (2015). Face gender and stereotypicality influence facial trait evaluation: Counter‐stereotypical female faces are negatively evaluated. British Journal of Psychology, 106(2), 186–208. https://doi.org/10.1111/bjop.12085
- Thorstenson, C. A., Pazda, A. D., Young, S. G., & Elliot, A. J. (2019). Face color facilitates the disambiguation of confusing emotion expressions: Toward a social functional account of face color in emotion communication. Emotion, 19(5), 799–807. https://doi.org/10.1037/emo0000485
- Tiddeman, B., Burt, D. M., & Perrett, D. I. (2001). Prototyping and transforming facial textures for perception research. IEEE Computer Graphics and Applications, 21(4), 42–50. https://doi.org/10.1109/38.946630
- Todorov, A., Dotsch, R., Wigboldus, D., & Said, C. P. (2011). Data‐driven methods for modeling social perception. Social and Personality Psychology Compass, 5(10), 775–791. https://doi.org/10.1111/j.1751-9004.2011.00389.x
- Todorov, A., & Oh, D. (2021). The structure and perceptual basis of social judgments from faces. Advances in Experimental Social Psychology, 63, 189–245. https://doi.org/10.1016/bs.aesp.2020.11.004
- Todorov, A., & Oosterhof, N. N. (2011). Modeling social perception of faces. IEEE Signal Processing Magazine, 28(2), 117–122. https://doi.org/10.1109/msp.2010.940006
- Todorov, A., Said, C. P., Engell, A. D., & Oosterhof, N. N. (2008). Understanding evaluation of faces on social dimensions. Trends in Cognitive Sciences, 12(12), 455–460. https://doi.org/10.1016/j.tics.2008.10.001
- Todorov, A., Uddenberg, S., & Albohn, D. (2023). Generative models for visualizing idiosyncratic impressions. British Journal of Psychology, 114(2), 511–514. https://doi.org/10.1111/bjop.12622
- Tucciarelli, R., Vehar, N., Chandaria, S., & Tsakiris, M. (2022). On the realness of people who do not exist: The social processing of artificial faces. iScience, 25(12), 105441. https://doi.org/10.1016/j.isci.2022.105441
- Walker, M., Jiang, F., Vetter, T., & Sczesny, S. (2011). Universals and cultural differences in forming personality trait judgments from faces. Social Psychological and Personality Science, 2(6), 609–617. https://doi.org/10.1177/1948550611402519
- Waller, N. G. (2008). Fungible weights in multiple regression. Psychometrika, 73(4), 691–703. https://doi.org/10.1007/s11336-008-9066-z
- Witkower, Z., & Tracy, J. L. (2019). A facial‐action imposter: How head tilt influences perceptions of dominance from a neutral face. Psychological Science, 30(6), 893–906. https://doi.org/10.1177/0956797619838762
Supplementary Materials
Data S1.
Data Availability Statement
Stimulus materials and the data that support the findings of all studies are openly available on the Open Science Framework (https://osf.io/du5m8/). The scripts used to generate and manipulate faces are provided in a Supplementary Codebook (Data S1) hosted on Google Colaboratory (https://colab.research.google.com/drive/1G34eiDEnSF8tHqaq9egfy6puPXWmBmlC?usp=sharing), which can also be downloaded from the Open Science Framework.
