Proceedings of the National Academy of Sciences of the United States of America. 2014 Jul 28;111(32):E3353–E3361. doi: 10.1073/pnas.1409860111

Modeling first impressions from highly variable facial images

Richard J W Vernon 1, Clare A M Sutherland 1, Andrew W Young 1, Tom Hartley 1,1
PMCID: PMC4136614  PMID: 25071197

Significance

Understanding how first impressions are formed to faces is a topic of major theoretical and practical interest that has been given added importance through the widespread use of images of faces in social media. We create a quantitative model that can predict first impressions of previously unseen ambient images of faces (photographs reflecting the variability encountered in everyday life) from a linear combination of facial attributes, explaining 58% of the variance in raters’ impressions despite the considerable variability of the photographs. Reversing this process, we then demonstrate that face-like images can be generated that yield predictable social trait impressions in naive raters because they capture key aspects of the systematic variation in the relevant physical features of real faces.

Keywords: face perception, social cognition, person perception, impression formation

Abstract

First impressions of social traits, such as trustworthiness or dominance, are reliably perceived in faces, and despite their questionable validity they can have considerable real-world consequences. We sought to uncover the information driving such judgments, using an attribute-based approach. Attributes (physical facial features) were objectively measured from feature positions and colors in a database of highly variable “ambient” face photographs, and then used as input for a neural network to model factor dimensions (approachability, youthful-attractiveness, and dominance) thought to underlie social attributions. A linear model based on this approach was able to account for 58% of the variance in raters’ impressions of previously unseen faces, and factor-attribute correlations could be used to rank attributes by their importance to each factor. Reversing this process, neural networks were then used to predict facial attributes and corresponding image properties from specific combinations of factor scores. In this way, the factors driving social trait impressions could be visualized as a series of computer-generated cartoon face-like images, depicting how attributes change along each dimension. This study shows that despite enormous variation in ambient images of faces, a substantial proportion of the variance in first impressions can be accounted for through linear changes in objectively defined features.


A variety of relatively objective assessments can be made upon perceiving an individual’s face. Their age, sex, and often their emotional state can be accurately judged (1). However, inferences are also made about social traits; for example, certain people may appear more trustworthy or dominant than others. These traits can be “read” from a glimpse as brief as 100 ms (2) or less (3), and brain activity appears to track social traits, such as trustworthiness, even when no explicit evaluation is required (4). This finding suggests that trait judgments are first impressions that are made automatically, likely outside of conscious control. Such phenomena link facial first impressions to a wider body of research and theory concerned with interpersonal perception (5).

For many reasons, including the increasingly pervasive use of images of faces in social media, it is important to understand how first impressions arise (6). This is particularly necessary because although first impressions are formed rapidly to faces, they are by no means fleeting in their consequences. Instead, many studies show how facial appearance can affect our behavior, changing the way we interpret social encounters and influencing their outcomes. For example, the same behavior can be interpreted as assertive or unconfident depending on the perceived dominance of an accompanying face (7), and inferences of competence based on facial cues have even been shown to predict election results (8). Nonetheless, support for the validity of these first impressions of faces is inconclusive (9–15), raising the question of why we form them so readily.

One prominent theory, the overgeneralization hypothesis, suggests that trait inferences are overgeneralized responses to underlying cues. For example, a person may be considered to have other immature characteristics based on a “babyfaced” appearance (16). Exploring this general approach of seeking the factors that might underlie facial first impressions, Oosterhof and Todorov (17) found that a range of trait ratings actually seem to reflect judgments along two near-orthogonal dimensions: trustworthiness (valence) and dominance. The trustworthiness dimension appeared to rely heavily on angriness-to-happiness cues, whereas dominance appeared to reflect facial maturity or masculinity. The use of such cues by human perceivers can be very subtle; even supposedly neutral faces can have a structural resemblance to emotional expressions that can guide trait judgments (18).

Oosterhof and Todorov’s (17) findings imply that trait judgments are likely to be based upon both stable (e.g., masculinity) and more transient (e.g., smiling) physical properties (“attributes”) of an individual’s face. However, the trait ratings they analyzed were derived from constrained sets of photographs or computer-generated neutral facial images. Although this approach allows careful control over experimental stimuli, such image sets do not fully reflect the wide range of variability between real faces and images of faces encountered in everyday life.

Jenkins et al. (19) introduced the concept of ambient images to encompass this considerable natural variability. The term “ambient images” refers to images typical of those we see every day. The variability between ambient images includes face-irrelevant differences in angle of view and lighting, as well as the range of expressions, ages, hairstyles, and so forth. Such variability is important: Jenkins et al. (19) and Todorov and Porter (20) found that first impressions can show as much variability between photographs of the same individual as between photographs of different individuals. Hence, the normally discounted range of everyday image variability can play a substantial role in facial judgments.

An important demonstration of the potential value of the ambient images approach is that Sutherland et al. (21) found that a third factor (youthful-attractiveness) emerged when analyzing first impressions of highly variable, ambient images, in addition to Oosterhof and Todorov’s (17) trustworthiness and dominance dimensions. This finding does not of course undermine the value of Oosterhof and Todorov’s model, but shows how it can be extended to encompass the range of natural variability.

The critical test for any model of facial first impressions is therefore to capture these impressions from images that are as variable as those encountered in daily life. Because human perceivers do this so easily and with good agreement, it seems likely that the underlying cues will be closely coupled to image properties, but previous studies have not been able to capture the physical cues underlying a wide range of trait impressions across a wide range of images.

Here, we offer a novel perspective, modeling the relationships between the physical features of faces and trait factor dimensions for a sample of 1,000 ambient facial images. The models take the form of artificial neural networks, in which simple processing elements represent both facial features and social trait dimensions and compute the translation between these representations by means of weighted connections.

With this general technique we were able to show that a linear network can model dimensions of approachability, youthful-attractiveness, and dominance thought to underlie social attributions (21), and critically that its performance generalizes well to previously untrained faces, accounting on average for 58% of the variance in raters’ impressions. To validate this method, we then reversed the approach by using a neural network to generate facial feature attributes and corresponding image properties expected to produce specific trait impressions. In this way, the factors driving first impressions of faces could be visualized as a series of computer-generated cartoon images, depicting how attributes change along each dimension, and we were able to test the predictions by asking a new group of human raters to judge the social traits in the synthesized images.

Our approach offers a powerful demonstration of how apparently complex trait inferences can be based on a relatively simple extracted set of underlying image properties. Furthermore, by grounding the approach in such physical properties, it is possible to directly relate the explained variance back to the features that are driving the model, allowing us to quantify the ways in which physical features relate to underlying trait dimensions.

Procedure and Results

Our overall aim was to characterize the relationship between physical features of faces and the perception of social traits. Fig. 1 provides an overview of the methods. To quantify first impressions, we used a database of 1,000 highly variable ambient face photographs of adults of overall Caucasian appearance, which had been previously rated for a variety of social traits by observers with a Western cultural background (21–24). We focused on first impressions of Caucasian-looking faces by participants with a Western upbringing to avoid possible confounding influences of “other-race” effects (25, 26). As in previous work (21), we used factor analysis to reduce these ratings of perceived traits to three factors, which together account for the majority of the variance in the underlying trait judgments: approachability [corresponding closely to trustworthiness/valence in Oosterhof and Todorov’s model (17)], youthful-attractiveness, and dominance (Fig. 1A). In this way, each of the 1,000 ambient face photographs could be given a score reflecting its loading on each of the three underlying factors.

Fig. 1.

Summary of methods. (A) For each of 1,000 highly variable face photographs, judgments for first impressions of 16 social traits (“traits”) were acquired from human raters. These 16 trait scores were reduced to scores on three underlying dimensions (“factors”) by means of factor analysis [see Sutherland et al. (21) and Methods for further details]. (B) Faces were delineated by the placement of 179 fiducial points outlining the locations of key features. The fiducial points were used to calculate 65 numerical attributes (“attributes”) summarizing local or global shape and color information (e.g., lower lip curvature, nose area). Neural networks were trained to predict factor scores based on these 65 attributes (Table S1 describes the fiducial points and attributes in detail). The performance of the trained networks (i.e., the proportion of variance in factor scores that could then be explained by applying the network to untrained images) was evaluated using a 10-fold cross-validation procedure. (C) Using the full set of images, a separate network was trained to reproduce 393 geometric properties (e.g., fiducial point coordinates) from the 65 image attributes (pixel colors were recovered directly from the attribute code). This process permits novel faces to be reconstructed accurately on the basis of the attributes, illustrating the information available to the model (see Fig. S2 for additional examples). (D) A cascade of networks was used to synthesize cartoon face-like images corresponding to specified factor scores. This process entailed training a new network to translate three factor scores into 65 attributes that could then be used as the input to the network shown in C, and so generate the face-like image. The social trait impressions evoked by these synthesized face-like images were then assessed by naive raters using methods equivalent to those of Sutherland et al. (21).

To identify the corresponding objective features of the same faces, we calculated a range of attributes reflecting physical properties (Fig. 1B). Consistent with our approach of using unconstrained ambient images, we sought to include a wide range of attributes without predetermining which might prove useful on any theoretical grounds. We first defined the locations of 179 fiducial points used in the Psychomorph program (27) on each face image (Fig. S1), thereby outlining the shapes and positions of internal and external facial features. Then, using the coordinates of these fiducials and the brightness and color properties of pixels within regions defined by them, we calculated a range of image-based measures summarizing physical characteristics of each face (see Methods for further details). These included measures related to the size and shape of the face as a whole (e.g., head width), individual facial features (e.g., bottom lip curvature), their spatial arrangement (e.g., mouth-to-chin distance), the presence/absence of glasses and facial hair, and information about the texture and color of specified regions (e.g., average hue of pixels within polygons defined by fiducial points around the irises). Where applicable, size and area values were first standardized by dividing by a proxy for apparent head size.

To characterize the relationship between these physical measures and the social trait ratings, we used a subset of the face images to train artificial neural networks that were then used to predict the trait factor scores of “novel” (untrained) faces from their physical attributes alone. A variety of different network architectures were evaluated. These included a simple linear architecture (with factor score predictions being modeled as a weighted combination of attributes plus constant) and a range of nonlinear architectures (i.e., neural networks including a nonlinear hidden layer containing varying numbers of hidden units). The former (linear architecture) approach is equivalent to multiple linear regression and is capable of characterizing univariate linear relationships between individual attributes and factor scores. The latter (nonlinear, with hidden units) approach is in principle capable of exploiting nonlinear and multivariate relationships.

The neural network approach is beneficial in that it allows us to evaluate both linear and nonlinear models using a closely matched 10-fold cross-validation procedure. For this procedure the set of 1,000 faces was divided into 10 subsamples, each containing the physical attributes derived from 100 images together with the corresponding factor scores. For each “fold” one set of 100 image attributes was reserved for use as test cases, and the remaining 900 were used to train (800) and then validate (100) a freshly initialized network. The network was trained to fit the physical measures to the factor scores from the training cases. After training, the predicted factor scores for the reserved 100 untrained faces were computed. The predictions were then set aside and the process repeated until all 1,000 images had been used as test cases. The correlation between predicted and observed factor scores for the full set of untrained images was then calculated. The procedure was repeated 100 times (using random initialization of the networks and sampling of training and test images), with performance calculated as the average correlation over all iterations.
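As an illustrative sketch of this cross-validation scheme, the following Python code uses scikit-learn rather than the MATLAB Neural Network toolbox actually employed, omits the 800/100 training/validation split (a plain linear regression needs no early stopping), and uses random placeholder arrays in place of the measured attributes and factor scores:

```python
# Minimal sketch of 10-fold cross-validation of a linear attribute-to-factor model.
# "attributes" (1,000 x 65) and "factor_scores" (1,000 x 3) are random placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
attributes = rng.normal(size=(1000, 65))
factor_scores = rng.normal(size=(1000, 3))

predicted = np.zeros_like(factor_scores)
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(attributes):
    model = LinearRegression().fit(attributes[train_idx], factor_scores[train_idx])
    predicted[test_idx] = model.predict(attributes[test_idx])  # predictions for untrained faces

for i, name in enumerate(["approachability", "youthful-attractiveness", "dominance"]):
    r = np.corrcoef(predicted[:, i], factor_scores[:, i])[0, 1]
    print(f"r({name}) = {r:.2f}")
```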

The average correlations between the network predictions and actual factor scores were r(approachability) = 0.90, r(youthful-attractiveness) = 0.70, and r(dominance) = 0.67 (all P < 0.001) (Fig. 2). Across iterations of training, the SDs of these correlations were all less than 0.01, showing consistent network performance regardless of the particular selection of images chosen for training/validation and testing. Overall, the linear model accounted for a substantial proportion of the variance for all three trait factors (58% on average). These findings therefore demonstrate that linear properties of objective facial attributes can be used to predict trait judgments with considerable accuracy. Indeed, predicted factor scores based on unseen test cases were only marginally less accurate than the fit obtained for training cases [r(approachability) = 0.92; r(youthful-attractiveness) = 0.75; r(dominance) = 0.73], indicating that the linear model generalizes very well to novel cases.

Fig. 2.

Scatterplots indicating the correlations between experimentally derived factor scores (from human raters) and the corresponding predictions for untrained images (see Methods for details), derived from a linear neural network (as illustrated in Fig. 1B). Each point (n = 1,000, for all axes) represents the observed and predicted ratings for a distinct face image in our database. Both experimental and predicted scores have been scaled into the range (−1:1). (A) Approachability, (B) youthful-attractiveness, and (C) dominance.

Because of their greater ability to capture nonlinear and multivariate relationships, we might intuitively have expected nonlinear architectures to outperform linear models. Perhaps surprisingly, however, we found no additional benefit of a nonlinear approach. For example, using our standard training and validation procedures, a network with five nonlinear hidden units generated correlations of r(approachability) = 0.88, r(youthful-attractiveness) = 0.65, and r(dominance) = 0.62 (all P < 0.001). Furthermore, there were significant negative correlations between the number of hidden units and performance for all three factors [r(approachability) = −0.96, r(youthful-attractiveness) = −0.98, r(dominance) = −0.97, all P < 0.001]. It seems that any nonlinear or multivariate relationships that the more complex architectures are able to exploit in fitting the training data do not generalize. Instead, nonlinear network performance for untrained test cases suffers from overfitting. Importantly, then, the critical relationships in the data are largely captured by the simple linear model, which generalizes well to new cases.
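A rough way to reproduce this linear-versus-nonlinear comparison is sketched below; scikit-learn's MLPRegressor with logistic hidden units is used only as a stand-in for the three-layer networks, and the data are random placeholders:

```python
# Sketch: compare a linear model with nonlinear networks of increasing size under
# the same cross-validation; placeholder data stand in for attributes/factor scores.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 65))                             # placeholder attribute matrix
y = X @ rng.normal(size=65) * 0.1 + rng.normal(size=1000)   # placeholder factor score

def cv_r(model):
    pred = cross_val_predict(model, X, y, cv=10)            # predictions for held-out folds
    return np.corrcoef(pred, y)[0, 1]

print("linear:", round(cv_r(LinearRegression()), 2))
for n_hidden in (5, 10, 20):
    mlp = MLPRegressor(hidden_layer_sizes=(n_hidden,), activation="logistic",
                       max_iter=2000, random_state=0)
    print(f"{n_hidden} hidden units:", round(cv_r(mlp), 2))
```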

The fact that the linear model works so well allows us to quantify which physical features are correlated with each social trait dimension. Table 1 summarizes statistically significant associations (Pearson correlations surviving Bonferroni correction for multiple comparisons) between physical attributes and factor scores (a full description and numerical key to all 65 attributes is provided in Table S1).
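The correlation screen behind the associations reported in Table 1 (below) can be sketched as follows; the attribute and factor-score arrays are random placeholders:

```python
# Sketch of the attribute-by-factor screen: 65 attributes x 3 factors = 195 Pearson
# correlations, thresholded with a Bonferroni-corrected alpha of 0.05/195.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
attributes = rng.normal(size=(1000, 65))    # placeholder attribute matrix
factor_scores = rng.normal(size=(1000, 3))  # placeholder factor scores

alpha = 0.05 / (65 * 3)
for a in range(attributes.shape[1]):
    for f, factor in enumerate(["App", "Yo-At", "Dom"]):
        r, p = pearsonr(attributes[:, a], factor_scores[:, f])
        if p < alpha:
            print(f"attribute {a + 1:02d} vs {factor}: r = {r:+.2f}")
```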

Table 1.

Significant associations between objective attributes and social trait impressions in 1,000 ambient face photographs

Attribute type Attribute App Yo-At Dom
Head size and posture 01. Head area 0.14
03. Head width 0.14 0.18 −0.20
04. Orientation (front-profile) 0.12
05. Orientation (up-down) 0.17 0.28
06. Head tilt 0.19 0.20
Eyebrows 07. Eyebrow area −0.16 −0.21 0.23
08. Eyebrow height −0.15 −0.33 0.27
09. Eyebrow width 0.22 0.12
10. Eyebrow gradient 0.31 −0.15
Eyes 11. Eye area −0.26 0.40 −0.22
12. Iris area −0.20 0.41 −0.31
13. Eye height −0.30 0.39 −0.23
14. Eye width 0.13 0.34 −0.19
15. % Iris −0.31 0.24
Nose 16. Nose area 0.26 0.14
17. Nose height 0.24
18. Nose width 0.45 0.16
19. Nose curve 0.37
20. Nose flare −0.37
Jawline 21. Jaw height 0.17 0.35
22. Jaw gradient 0.18 0.33
23. Jaw deviation 0.25 0.14
24. Chin curve 0.18 0.31
Mouth 25. Mouth area 0.69 0.14 −0.15
26. Mouth height 0.51 0.15 −0.22
27. Top Lip height −0.24 0.24 −0.25
28. Bottom lip height −0.35 0.34 −0.15
29. Mouth width 0.73
30. Mouth gap 0.71
31. Top lip curve 0.36 0.12
32. Bottom lip curve 0.75
Other structural features 33. Noseline separation 0.22
34. Cheekbone position 0.16
35. Cheek gradient −0.17 0.37
36. Eye gradient −0.23 −0.21 0.32
Feature positions 38. Eyebrows position −0.27
39. Mouth position 0.38 −0.28
40. Nose position −0.22
Feature spacing 41. Eye separation 0.23 −0.21
42. Eyes-to-mouth distance −0.39 0.19
43. Eyes-to-eyebrows distance −0.44
46. Mouth-to-chin distance −0.38 0.13
47. Mouth-to-nose distance −0.60 0.12
Color and texture 49. Skin saturation 0.28
50. Skin value (brightness) 0.13 −0.23
51. Eyebrow hue
52. Eyebrow saturation 0.13 0.15
53. Eyebrow value (brightness) 0.13 −0.22
55. Lip saturation 0.12 0.19
59. Iris value (brightness) −0.24
60. Skin hue variation −0.21
61. Skin saturation variation −0.22 0.21
62. Skin value variation −0.24 0.25
Other features 63. Glasses −0.26
64. Beard or moustache −0.20 0.24
65. Stubble −0.15 0.24

App, Approachability; Dom, Dominance; Yo-At, Youthful-attractiveness. Significant attribute-factor correlations, after Bonferroni correction (P < 0.050/195). Highly significant results (P < 0.001/195) are highlighted in bold. See Table S1 for attribute descriptions.

The most striking thing in Table 1 is that almost all attributes we considered are significantly correlated with one or more of the dimensions of evaluation. It is clear that social traits can be signaled through multiple covarying cues, and this is consistent with findings that no particular region of the face is absolutely essential to making reliable social inferences (24).

That said, the substantial roles of certain types of attribute for each dimension also emerge clearly from closer inspection of Table 1. The five features that are most strongly positively correlated with the approachability dimension are all linked to the mouth and mouth shape (feature #25 mouth area, #26 mouth height, #29 mouth width, #30 mouth gap, #32 bottom lip curve), and this is consistent with Oosterhof and Todorov’s (17) observation that a smiling expression is a key component of an impression of approachability. Four of the five features that are most strongly positively correlated with the youthful-attractiveness dimension relate to the eyes (feature #11 eye area, #12 iris area, #13 eye height, #14 eye width), in line with Zebrowitz et al.’s (16) views linking relatively large eyes to a youthful appearance. In Oosterhof and Todorov’s (17) model the dominance dimension is linked to stereotypically masculine appearance, and here we find it to be most closely correlated with structural features linked to masculine face shape (feature #8 eyebrow height, #35 cheek gradient, #36 eye gradient) and to color and texture differences that may also relate to masculinity (28) or a healthy or tanned overall appearance (feature #49 skin saturation, #62 skin value variation).

Although this agreement between the features we found to be most closely linked to each dimension and theoretical approaches to face evaluation is reassuring, it is nonetheless based on correlations, and correlational data are notoriously susceptible to alternative interpretations. We therefore sought to validate our approach with a strong test. The largely linear character of the mapping we have identified implies that it might be possible to reverse-engineer the process, using trait-factor scores as inputs (instead of outputs) to a neural network that generates 65 predicted attributes from the input combination of factor scores. From these 65 attributes, the requisite image properties can then be recovered and used to reconstruct a face-like image (Fig. 1). The critical test is then whether the reconstructed image exemplifies the intended social traits. This process provides us with a way to visualize the patterns of physical change that are expected to drive the perception of specific social traits, and to test the validity of these predictions with naive human raters.

We carried out this process in three steps. We first created a linear model, allowing us to generate physical attribute scores (including pixel colors) characteristic of specific combinations of social trait judgments. We then created a linear model relating attribute scores to normalized image coordinates, allowing us to reconstruct face-like images from specified attribute scores (see Methods and Fig. 1C for details). We then combined these models to reconstruct faces expected to elicit a range of social trait judgments along each dimension (see Fig. 3 for examples), and obtained a new set of ratings of these images, which we compared with the model’s predictions (Fig. 1D).
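The first of these steps can be sketched, under the simplifying assumption that a single multivariate linear regression stands in for the linear network, and with placeholder data:

```python
# Sketch of the "reversed" model: regress the 65 attributes on the three factor
# scores, then read off the attribute pattern expected for a chosen factor profile.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
factor_scores = rng.uniform(-1, 1, size=(1000, 3))  # placeholder scaled factor scores
attributes = rng.normal(size=(1000, 65))            # placeholder attribute matrix

reverse_model = LinearRegression().fit(factor_scores, attributes)

# Attributes expected for a face high on approachability, neutral on the other dimensions
target = np.array([[0.8, 0.0, 0.0]])
predicted_attributes = reverse_model.predict(target)  # shape (1, 65)
```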

Fig. 3.

Synthesized face-like images illustrating the changes in facial features that typify each of the three social trait dimensions. The images were generated using the methods described in the text and in Fig. 1 C and D, based on the same attributes as those used to derive social trait predictions in Fig. 2. The “low” end of each dimension is shown at the left end of each row and the “high” end is at the right. Faces are shown in upright orientation for easy comparison, but the model also suggests some systematic variation in the tilt of the face (indicated below each image). A sample of such synthesized images was used to validate the predicted trait impressions in a group of naive human raters (see Table 2 and the text for details). See Movies S1–S3 for video animations of these changes.

In all cases the predicted scores on a given dimension correlated significantly with the newly obtained ratings on that dimension (Table 2), showing that the intended manipulation of each dimension was effective. However, it is also evident from Table 2 that the dimensional manipulations were not fully independent of each other, as would be expected given that many features are correlated with more than one dimension (evident in Table 1). Nonetheless, for both approachability and dominance, the magnitude of the correlation of ratings with the corresponding predicted trait dimension was significantly greater than the correlation with either of the remaining dimensions, showing clear discriminant validity (all P < 0.011) (see Methods for details). For the youthful-attractiveness dimension, the magnitude of the correlation between predicted youthful-attractiveness and ratings of youthful-attractiveness was greater than that of its correlation with ratings of approachability (P < 0.001), and the corresponding comparison with ratings of dominance approached statistical significance (P = 0.081).

Table 2.

Spearman's correlations between expected and rated factor scores for synthesized face-like images

Rated factor scores (rows) vs. expected factor scores (columns): Approachability, Youthful-attractiveness, Dominance
Approachability: 0.93**, 0.03, −0.09
Youthful-attractiveness: 0.78**, 0.56*, 0.10
Dominance: 0.19, −0.53*, 0.74**

**P < 0.001, *P < 0.050.

In sum, the generated faces were successful in capturing the key features associated with each dimension. Furthermore, the faces strikingly bear out the conclusions reached from the pattern of feature-to-dimension correlations reported in Table 1. The multiple cues for each dimension are clearly captured in the cartoon faces, and the importance of the mouth to approachability, the eyes to youthful-attractiveness, and the masculine face shape and change in skin tone for dominance are all evident. Increased youthful-attractiveness is also apparently linked to a more “feminized” face shape in Fig. 3. This result was in fact also apparent in Table 1, where the “jaw height” feature (no. 21 in Table 1) was among the “top 5” positive correlations with youthful-attractiveness (together with the four eye-related features we listed previously), and the other jawline features (nos. 22–24) were all in the “top 11.”

General Discussion

To our knowledge, we have shown for the first time that trait dimensions underlying facial first impressions can be recovered from hugely variable ambient images based only on objective measures of physical attributes. We were able to quantify the relationships between those physical attributes and each dimension, and we were able to generate new face-like representations that could depict the physical changes associated with each dimension.

The fact that our results are consistent with a number of previous studies based on ratings rather than physical measures suggests that this approach has successfully captured true changes in each dimension. To validate this claim, we first trained a separate model to reconstruct face-like images capturing the relevant featural variation in our ambient image database. We then generated images whose features varied systematically along each dimension and demonstrated that these yield predictable social trait impressions in naive raters.

Critically, our approach has been based entirely on highly variable ambient images. The results did not depend in any way on imposing significant constraints on the images selected, or on any preconceived experimental manipulation of the data. Faces were delineated, and a wide range of attributes chosen and calculated, with no a priori knowledge of how individual faces were judged, or which attributes might be important. This approach minimizes any subjectivity introduced by the researcher, and is as close to a double-blind design as can be achieved. The fact that the findings are based on such a diverse set of images also lends considerable support to both the replicated and novel findings described above. Furthermore, the approach used allowed us to explore attribute-factor relationships.

Oosterhof and Todorov’s (17) demonstration that first impressions of many traits can be encompassed within an overarching dimensional model offered an important and elegant simplification of a previously disparate set of studies, and helped demystify how we as humans seem capable of reliably making such an extraordinary range of social inferences. Our findings take this demystification a significant step further by showing that these dimensions of evaluation can be based on simple linear combinations of features. Previous studies with computer-generated stimuli had also shown that specific linear changes in a morphable computer model can capture such dimensions (17, 29). Achieving an equivalent demonstration here is noteworthy because the features present in the ambient images were not predetermined or manipulated on theoretical grounds and because their highly varied nature will clearly affect the reliability of individual feature-trait associations; unnatural lighting can compromise skin-tone calculations, angle of view affects the perceived size and the retinal shape of many features, and so on. It is therefore particularly impressive that our approach was able to capture the majority of the variance in ratings despite these potential limitations in individual attribute calculations. Part of the reason for this result surely lies in the point emphasized by Bruce (30) and by Burton (31) for the related field of face recognition: that naturally occurring image variations that are usually dismissed as “noise” can actually be a useful pointer that helps in extracting any underlying stability.

A question for future research concerns the extent to which the model’s success in accounting for social trait impressions is dependent on the particular selection of attributes we used. Our 65 input features were intended to provide a reasonably complete description of the facial features depicted in each image. The success of these features is demonstrated by our capacity to reconstruct distinctive facsimiles of individual faces based only on the attributes (e.g., Fig. 1C; see Fig. S2 for further examples). In generating these attributes, our approach was to establish numerical scores that would, where possible, relate to a list of characteristics we could label verbally based on the Psychomorph template (Fig. S1 and Table S1). In line with the overarching ambient image approach, our strategy was to be guided by the data as to the role of different attributes, and thus we included the full set in our analyses. Given the linearity we found, though, we broadly expect that any similarly complete description of the relevant facial features should yield similar results. However, it is also likely that our attribute description is overcomplete in the sense that there is substantial redundancy between the attributes. Because of intercorrelations between features, it might well be possible to achieve comparable performance with fewer, orthogonal, components, but these would be much harder to interpret in terms of individual verbally labeled cues.

Although the current model already includes a fairly detailed representation of the geometry of shape cues, the textural information we incorporate is less fine-grained, and an obvious development of our approach will be to increase the resolution of this aspect of the model, which may yield some improvement in its performance and will potentially allow for much more realistic reconstructions. For the present purposes, though, we consider it a strength that a relatively simple representation can capture much of the underlying process of human trait attribution.

Another important issue that our approach could address in future work concerns the extent to which social trait impressions may depend on specific image characteristics, such as lighting and camera position; changeable properties of an individual’s face, such as pose and expression; or, alternatively, more stable characteristics determined by the face’s underlying structure. It is important to note that the former, variable features are potentially specific to each image, and therefore even to different images of the same individual. This critical distinction between individual identities and specific photographs has a long history in work on face perception (32, 33), and indeed recent work (19, 20) has demonstrated clearly that intraindividual and image cues play a role in determining social trait judgments alongside any interindividual cues. Our results are consistent with this in demonstrating that changeable features of the face (such as head tilt and bottom lip curvature) covary reliably with social trait impressions, but our approach could also be extended to allow estimation of the relative contributions of these different contributory factors.

Our methods provide a means to estimate first-impression dimensions objectively from any face image and to generate face-like images varying on each dimension. These are significant steps that offer substantial benefits for future research. Our results are also of practical significance (e.g., in the context of social media) because we have shown how images of a given individual can be selected on the basis of quantifiable physical features of the face to selectively convey desirable social trait impressions.

Methods

Social Trait Judgments.

Each of the ambient faces had been rated for 16 social trait impressions, as reported in previous work (21–24). Briefly (Fig. 1A), each trait was rated for the full set of 1,000 faces by a minimum of six independent judges using a seven-point Likert scale (for example, attractiveness: 1 = very unattractive, 7 = very attractive). All traits had acceptable interrater reliability (Cronbach’s α > 0.70). The means of the different raters’ scores for each face and each trait were subjected to principal axis factor analysis with orthogonal rotation, yielding factor scores (Anderson–Rubin method, to ensure orthogonality) for each of the previously identified dimensions (approachability, youthful-attractiveness, and dominance) (21).
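A rough stand-in for this reduction step is sketched below; note that it uses scikit-learn's maximum-likelihood FactorAnalysis with varimax rotation rather than the principal axis factoring with Anderson–Rubin scores actually used, and the rating matrix is a random placeholder:

```python
# Sketch: reduce a faces-by-traits matrix of mean ratings to three orthogonal factors.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(4)
mean_ratings = rng.uniform(1, 7, size=(1000, 16))   # placeholder: 1,000 faces x 16 traits

fa = FactorAnalysis(n_components=3, rotation="varimax")
factor_scores = fa.fit_transform(mean_ratings)      # 1,000 x 3 factor scores
loadings = fa.components_.T                         # 16 traits x 3 factor loadings
```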

Delineation and Quantification of Facial Features.

The shape and internal features of each face were identified by manually placing 179 fiducial points onto each image using PsychoMorph (27). The initial delineation was checked visually by two experimenters, with further checking of the organization of fine-scale features using a custom Matlab script that identified errors that would not be found by visual inspection (e.g., incorrect sequencing of the fiducials). To facilitate the modeling of image shapes, the resulting 2D fiducials were then rotated to a vertical orientation such that the centroids of left- and right-sided points were level.
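The upright-alignment step amounts to a 2D rotation; a minimal sketch follows, with hypothetical index lists standing in for the left- and right-sided fiducial points:

```python
# Sketch: rotate the 179 fiducial points so that the centroids of the left- and
# right-sided points end up at the same height (i.e., the face is made upright).
import numpy as np

def level_rotation(points, left_idx, right_idx):
    """points: (179, 2) array of x, y coordinates; left_idx/right_idx: hypothetical index lists."""
    left_c, right_c = points[left_idx].mean(axis=0), points[right_idx].mean(axis=0)
    dx, dy = right_c - left_c
    angle = -np.arctan2(dy, dx)                      # rotation that levels the two centroids
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
    centre = points.mean(axis=0)
    return (points - centre) @ rot.T + centre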

Sixty-five attributes were derived using these coordinates. These attributes corresponded to a range of structural, configurational, and featural measurements (see Table S1 for full details). For example, “head width” was calculated as the mean horizontal separation between the three leftmost and three rightmost points; “bottom lip curvature” was calculated by fitting a quadratic curve through the points representing the edge of the bottom lip (curvature being indicated by the coefficient of the squared term); “mouth-to-chin distance” was the vertical separation between the lowest point on the lower lip and the lowest point on the chin. Area measurements were calculated using polygons derived from subsets of the fiducial points. Overall colors were calculated for specified regions (i.e., lips, iris, skin, eyebrows) by averaging the RGB values of pixels within polygons defined by sets of fiducial points, converting these to hue, saturation, value (HSV) measures, and (for skin pixels) additionally calculating a measure of dispersion, entropy, for each of the H, S, and V channels (15 texture attributes in total). An HSV description of color was chosen on the basis that hue and saturation would be relatively insensitive to the large overall luminance variations in the ambient images. Three Boolean attributes described the presence of glasses, facial hair (beards and moustaches), or stubble.
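To make the attribute definitions concrete, a few of them can be sketched as follows; the point indices, polygons, and region masks are hypothetical and only illustrate the kinds of calculation described above:

```python
# Sketches of representative attribute calculations from fiducial points and pixels.
import numpy as np
from matplotlib.colors import rgb_to_hsv

def head_width(points, leftmost_idx, rightmost_idx):
    # mean horizontal separation between the three leftmost and three rightmost points
    return points[rightmost_idx, 0].mean() - points[leftmost_idx, 0].mean()

def bottom_lip_curvature(lip_points):
    # fit a quadratic through the bottom-lip points; the squared-term coefficient is the curvature
    a, _, _ = np.polyfit(lip_points[:, 0], lip_points[:, 1], deg=2)
    return a

def polygon_area(polygon):
    # shoelace formula for the area enclosed by a polygon of fiducial points
    x, y = polygon[:, 0], polygon[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def mean_region_hsv(image_rgb, region_mask):
    # average the RGB values inside a region mask, then convert to hue/saturation/value
    mean_rgb = image_rgb[region_mask].mean(axis=0) / 255.0
    return rgb_to_hsv(mean_rgb.reshape(1, 1, 3)).ravel()
```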

The raw attribute values were separately normalized to place them into a common range, as required for neural network training. These normalization steps also helped to reduce the impact of image-level nonlinearities in the raw attributes of the highly varied images.

First, a square-root transform was applied to attributes measuring area. The overall scale of all geometric measures was then normalized by dividing by the average distance between all possible pairs of points outlining the head (a robust measure of the size of the head in the 2D image). Finally, the resulting values were scaled linearly into the range (−1:1).

The HSV color values were similarly scaled into the range (−1:1). As hue is organized circularly, it was necessary to apply a rotation to the raw hue values such that the median, reddish, hues associated with typical Caucasian skin tones were represented by middling values.
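These normalization steps can be sketched as simple helpers; this is a rough illustration rather than the exact implementation:

```python
# Sketches of the normalization steps: a head-size proxy, per-attribute rescaling
# into (-1, 1), and a circular shift that centres typical skin hues.
import numpy as np
from scipy.spatial.distance import pdist

def head_size(head_outline_points):
    # robust 2D head-size proxy: mean distance between all pairs of outline points
    return pdist(head_outline_points).mean()

def scale_to_unit(column):
    # linear rescaling of one attribute (across all faces) into the range (-1, 1)
    lo, hi = column.min(), column.max()
    return 2 * (column - lo) / (hi - lo) - 1

def recentre_hue(hue, median_skin_hue):
    # hue is circular (0-1), so rotate it so that typical reddish skin hues sit mid-scale
    return (hue - median_skin_hue + 0.5) % 1.0
```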

Neural Network Training, Validation, and Cross-Validation.

Neural networks were implemented using the MatLab Neural Network toolbox (MathWorks). For initial modeling of the determinants of social trait judgments, input units represented physical attributes as described above, with output units representing social trait factor scores. For the two-layer (linear) networks, both input and output units used a linear activation function, and these were connected in a fully feed-forward manner with each input unit being connected to each output unit (Fig. 1B). For the three-layer (potentially nonlinear) networks an additional intervening layer of hidden units (with sigmoid activation function) was placed between input and output layers, such that each input unit was connected to each hidden unit, which was in turn connected to each output unit.

During training weights were adjusted using the MatLab toolbox’s default Levenberg–Marquardt algorithm. In essence, the weighted connections between input and output units were gradually and iteratively varied so as to minimize (on a least-squares measure) the discrepancy between the model’s output and the social trait judgments obtained from human raters.

Training, validation, and 10-fold cross-validation were carried out as described above. The 1,000 images were randomly partitioned into 10 discrete sets of 100, with 8 sets used to train the network, a further set used to determine when training had converged, and the remaining set used to evaluate the performance of the trained network. The predicted social trait judgment outputs were noted, and the whole process repeated until each of the 10 image sets had served as the test set. This entire cross-validation procedure was then repeated 100 times, using a new random partitioning of the data each time, to ensure that the results did not depend on a specific partitioning.

Statistical Analysis.

The Pearson correlation between the model’s outputs for unseen (i.e., untrained) test-case images and the corresponding factor scores provides a measure of how well the model predicts variation in the human ratings. We calculated these correlations for each of the three previously identified social trait dimensions (Fig. 2). We also report the correlations (Bonferroni-corrected) between individual attribute scores and each social trait dimension (Table 1). Note that our analysis suggests that social trait judgments are determined by multiple small contributions from different physical attributes, which means that it is impossible to unambiguously determine the contribution of each attribute (multicollinearity), although the correlations may serve to indicate relationships worthy of further investigation.

Generating Face-Like Images.

To address interpretational difficulties arising from multicollinearity, we reversed the modeling process to generate face-like images with attributes expected to convey certain social trait impressions. To solve this engineering problem, we used a cascade of linear networks trained on the full set of 1,000 images. First (Fig. 1C), a network was trained to convert physical attributes (as described above) into the coordinates of fiducial points, which could be rendered using custom MatLab scripts (for this purpose we subdivided the output units, representing the full set of fiducial points (Fig. S1), feature centroids, and global rotation, into 19 subnetworks that were each trained separately for memory efficiency). Then (Fig. 1D), a new network was trained to generate the attributes corresponding to a specified set of social trait factor scores [each having been scaled into the range (−1:1)], which then acted as input to the face-rendering network. The resulting cascade generates synthetic face-like images that are expected to yield specific social percepts. For example, a face-like image generated using a trait factor score of (1, 0, 0) is expected to be perceived as highly approachable (first dimension) but neutral with respect to youthful-attractiveness (second dimension) and dominance (third dimension).
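A condensed sketch of the cascade follows, with ordinary linear regressions standing in for the cascade of linear networks, placeholder data, and the final cartoon rendering omitted:

```python
# Sketch of the generation cascade: factor scores -> 65 attributes -> fiducial coordinates.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
factor_scores = rng.uniform(-1, 1, size=(1000, 3))  # placeholder scaled factor scores
attributes = rng.normal(size=(1000, 65))            # placeholder attributes
fiducials = rng.normal(size=(1000, 179 * 2))        # placeholder flattened point coordinates

factors_to_attributes = LinearRegression().fit(factor_scores, attributes)
attributes_to_points = LinearRegression().fit(attributes, fiducials)

# A face expected to look highly approachable but neutral on the other two dimensions
target = np.array([[1.0, 0.0, 0.0]])
synth_attributes = factors_to_attributes.predict(target)
synth_points = attributes_to_points.predict(synth_attributes).reshape(179, 2)
```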

Validating Synthesized Faces.

We tested the validity of the generated face-like images by having new participants assess them in an online questionnaire, closely following the procedure in Sutherland et al. (21). Trait ratings (as a proxy for scores on the three dimensions) were collected for 19 generated faces that covered the range of potential factor scores [each scaled into the range (−1:1)]. One face represented the neutral point (0, 0, 0), 6 faces represented the extremes (high and low) of each dimension [e.g., (0.8, 0, 0), approachable], and 12 faces represented all possible pairs of those extremes [e.g., (−0.8, 0, 0.8), unapproachable and dominant]. We chose a value of 0.8 to represent the extreme of each dimension because this was typical of the scaled social trait factor scores of the most extreme 30 of the 1,000 faces in our ambient image database.
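The 19 factor-score combinations can be enumerated directly; this is a sketch, with the ±0.8 extreme taken from the value given above:

```python
# Sketch: the neutral point, the 6 single-dimension extremes, and the 12 pairs of extremes.
from itertools import combinations, product

EXTREME = 0.8
points = [(0.0, 0.0, 0.0)]                               # neutral face
for dim in range(3):                                     # high and low end of each dimension
    for sign in (+1, -1):
        p = [0.0, 0.0, 0.0]
        p[dim] = sign * EXTREME
        points.append(tuple(p))
for d1, d2 in combinations(range(3), 2):                 # all pairs of extremes
    for s1, s2 in product((+1, -1), repeat=2):
        p = [0.0, 0.0, 0.0]
        p[d1], p[d2] = s1 * EXTREME, s2 * EXTREME
        points.append(tuple(p))

assert len(points) == 19
```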

To evaluate these images, we solicited social trait judgments from 30 naive participants (15 male, mean age 23.93 y) who took part after consenting to procedures approved by the ethics committee of the Psychology Department, University of York. All spoke fluent English and were from a culturally Western background.

To generate proxy factor scores for each trait dimension identified in our earlier factor analysis, we selected the most heavily loading traits and asked raters to evaluate those traits for each image in the set of synthesized face images. These ratings were then combined in an average, weighted by each trait’s loading on that dimension. For the approachability dimension, raters assessed each face for “smiling,” “pleasantness,” and “approachability.” For youthful-attractiveness they rated images for “attractiveness,” “health,” and “age.” For dominance they rated “dominance,” “sexual dimorphism,” and “confidence.”

The participants were randomly allocated into three sex-balanced groups, one per dimension. Each group rated only the three traits making up one dimension, to avoid the risk of judgments on one factor biasing another. Each trait was rated in a block of consecutive trials (with random block order), and within each block the 19 generated faces were presented in random order. Before rating the actual stimuli, the participants saw six practice images (created using random factor scores and indistinguishable from the experimental stimuli). All faces were rated on a 1–7 Likert scale with endpoints anchored as in previous research (21–24).

To assess the correspondence between raters’ judgments and the predicted factor scores (i.e., those used to synthesize the faces), we determined the correlation for each pairing of judged and predicted dimensions (Table 2) and compared these correlations at the level of individual raters using independent t tests. Specifically, we first calculated a Spearman correlation for each pairing of rated and synthesized trait dimensions, and then used independent t tests to test the hypothesis that the absolute value of the Fisher-transformed correlations was significantly greater for the predicted trait dimension than for each of the other dimensions.
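The statistical machinery involved can be sketched as follows; the rater scores and predicted scores are hypothetical inputs:

```python
# Sketch: per-rater Spearman correlations between proxy scores and the scores used to
# synthesize the 19 faces, followed by a t test on their absolute Fisher-z values.
import numpy as np
from scipy.stats import spearmanr, ttest_ind

def rater_correlation(proxy_scores, predicted_scores):
    # one rater's proxy factor scores for the 19 faces vs. the synthesized scores
    rho, _ = spearmanr(proxy_scores, predicted_scores)
    return rho

def compare_correlations(rs_intended, rs_other):
    # compare absolute Fisher-transformed correlations for the intended dimension
    # against those for another dimension, using an independent-samples t test
    z_intended = np.abs(np.arctanh(rs_intended))
    z_other = np.abs(np.arctanh(rs_other))
    return ttest_ind(z_intended, z_other)
```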

Animations.

By generating a series of images with incremental changes along each dimension, we were also able to create short movies to encapsulate each dimension in the model (Movies S1–S3). As well as the changes already noted, these movies show that the neural network has also captured superordinate variation resulting from synchronized changes across combinations of features; for example, increased dominance involves raising the head (as if to “look down” upon the viewer).

Supplementary Material

Supporting Information

Acknowledgments

The research was funded in part by an Economic and Social Research Council Studentship ES/I900748/1 (to C.A.M.S.).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1409860111/-/DCSupplemental.

References

  • 1.Bruce V, Young AW. Face Perception. London: Psychology Press; 2012. [Google Scholar]
  • 2.Willis J, Todorov A. First impressions: Making up your mind after a 100-ms exposure to a face. Psychol Sci. 2006;17(7):592–598. doi: 10.1111/j.1467-9280.2006.01750.x. [DOI] [PubMed] [Google Scholar]
  • 3.Todorov A, Pakrashi M, Oosterhof NN. Evaluating faces on trustworthiness after minimal time exposure. Soc Cogn. 2009;27(6):813–833. [Google Scholar]
  • 4.Engell AD, Haxby JV, Todorov A. Implicit trustworthiness decisions: Automatic coding of face properties in the human amygdala. J Cogn Neurosci. 2007;19(9):1508–1519. doi: 10.1162/jocn.2007.19.9.1508. [DOI] [PubMed] [Google Scholar]
  • 5.Fiske ST, Cuddy AJC, Glick P. Universal dimensions of social cognition: Warmth and competence. Trends Cogn Sci. 2007;11(2):77–83. doi: 10.1016/j.tics.2006.11.005. [DOI] [PubMed] [Google Scholar]
  • 6.Perrett DI. In Your Face: The New Science of Human Attraction. London: Palgrave Macmillan; 2010. [Google Scholar]
  • 7.Hassin R, Trope Y. Facing faces: Studies on the cognitive aspects of physiognomy. J Pers Soc Psychol. 2000;78(5):837–852. doi: 10.1037//0022-3514.78.5.837. [DOI] [PubMed] [Google Scholar]
  • 8.Todorov A, Mandisodza AN, Goren A, Hall CC. Inferences of competence from faces predict election outcomes. Science. 2005;308(5728):1623–1626. doi: 10.1126/science.1110589. [DOI] [PubMed] [Google Scholar]
  • 9.Snyder M, Tanke ED, Berscheid E. Social perception and interpersonal behaviour: On the self-fulfilling nature of social stereotypes. J Pers Soc Psychol. 1977;35(9):656–666. [Google Scholar]
  • 10.Bond JCF, Berry DS, Omar A. The kernel of truth in judgments of deceptiveness. Basic Appl Soc Psych. 1994;15(4):523–534. [Google Scholar]
  • 11.Zebrowitz LA, Andreoletti C, Collins MA, Lee SY, Blumenthal J. Bright, bad, babyfaced boys: Appearance stereotypes do not always yield self-fulfilling prophecy effects. J Pers Soc Psychol. 1998;75(5):1300–1320. doi: 10.1037//0022-3514.75.5.1300. [DOI] [PubMed] [Google Scholar]
  • 12.Efferson C, Vogt S. Viewing men’s faces does not lead to accurate predictions of trustworthiness. Sci Rep. 2013;3:1047. doi: 10.1038/srep01047. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Carré JM, McCormick CM. In your face: Facial metrics predict aggressive behaviour in the laboratory and in varsity and professional hockey players. Proc Biol Sci. 2008;275(1651):2651–2656. doi: 10.1098/rspb.2008.0873. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Stirrat M, Perrett DI. Valid facial cues to cooperation and trust: Male facial width and trustworthiness. Psychol Sci. 2010;21(3):349–354. doi: 10.1177/0956797610362647. [DOI] [PubMed] [Google Scholar]
  • 15.Gómez-Valdés J, et al. Lack of support for the association between facial shape and aggression: A reappraisal based on a worldwide population genetics perspective. PLoS ONE. 2013;8(1):e52317. doi: 10.1371/journal.pone.0052317. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Zebrowitz LA, Fellous JM, Mignault A, Andreoletti C. Trait impressions as overgeneralized responses to adaptively significant facial qualities: Evidence from connectionist modeling. Pers Soc Psychol Rev. 2003;7(3):194–215. doi: 10.1207/S15327957PSPR0703_01. [DOI] [PubMed] [Google Scholar]
  • 17.Oosterhof NN, Todorov A. The functional basis of face evaluation. Proc Natl Acad Sci USA. 2008;105(32):11087–11092. doi: 10.1073/pnas.0805664105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Said CP, Sebe N, Todorov A. Structural resemblance to emotional expressions predicts evaluation of emotionally neutral faces. Emotion. 2009;9(2):260–264. doi: 10.1037/a0014681. [DOI] [PubMed] [Google Scholar]
  • 19.Jenkins R, White D, Van Montfort X, Burton AM. Variability in photos of the same face. Cognition. 2011;121(3):313–323. doi: 10.1016/j.cognition.2011.08.001. [DOI] [PubMed] [Google Scholar]
  • 20.Todorov A, Porter JM. Misleading first impressions: Different for different facial images of the same person. Psychol Sci. 2014;25(7):1404–1417. doi: 10.1177/0956797614532474. [DOI] [PubMed] [Google Scholar]
  • 21.Sutherland CAM, et al. Social inferences from faces: Ambient images generate a three-dimensional model. Cognition. 2013;127(1):105–118. doi: 10.1016/j.cognition.2012.12.001. [DOI] [PubMed] [Google Scholar]
  • 22.Santos IM, Young AW. Exploring the perception of social characteristics in faces using the isolation effect. Vis Cogn. 2005;12:213–247. [Google Scholar]
  • 23.Santos IM, Young AW. Effects of inversion and negation on social inferences from faces. Perception. 2008;37(7):1061–1078. doi: 10.1068/p5278. [DOI] [PubMed] [Google Scholar]
  • 24.Santos IM, Young AW. Inferring social attributes from different face regions: Evidence for holistic processing. Q J Exp Psychol (Hove) 2011;64(4):751–766. doi: 10.1080/17470218.2010.519779. [DOI] [PubMed] [Google Scholar]
  • 25.Malpass RS, Kravitz J. Recognition for faces of own and other race. J Pers Soc Psychol. 1969;13(4):330–334. doi: 10.1037/h0028434. [DOI] [PubMed] [Google Scholar]
  • 26.Meissner CA, Brigham JC. Thirty years of investigating the own-race bias in memory for faces—A meta-analytic review. Psychol Public Policy Law. 2001;7(1):3–35. [Google Scholar]
  • 27.Tiddeman B, Burt M, Perrett D. Prototyping and transforming facial textures for perception research. IEEE Comput Graph Appl. 2001;21(5):42–50. [Google Scholar]
  • 28.Russell R. A sex difference in facial contrast and its exaggeration by cosmetics. Perception. 2009;38(8):1211–1219. doi: 10.1068/p6331. [DOI] [PubMed] [Google Scholar]
  • 29.Walker M, Vetter T. Portraits made to measure: Manipulating social judgments about individuals with a statistical face model. J Vis. 2009;9(11):1–13. doi: 10.1167/9.11.12. [DOI] [PubMed] [Google Scholar]
  • 30.Bruce V. Stability from variation: The case of face recognition. Q J Exp Psychol A. 1994;47(1):5–28. doi: 10.1080/14640749408401141. [DOI] [PubMed] [Google Scholar]
  • 31.Burton AM. Why has research in face recognition progressed so slowly? The importance of variability. Q J Exp Psychol (Hove) 2013;66(8):1467–1485. doi: 10.1080/17470218.2013.800125. [DOI] [PubMed] [Google Scholar]
  • 32.Hay DC, Young AW. The human face. In: Ellis AW, editor. Normality and Pathology in Cognitive Functions. London: Academic; 1982. pp. 173–202. [Google Scholar]
  • 33.Bruce V, Young A. Understanding face recognition. Br J Psychol. 1986;77(Pt 3):305–327. doi: 10.1111/j.2044-8295.1986.tb02199.x. [DOI] [PubMed] [Google Scholar]


