Proceedings of the National Academy of Sciences of the United States of America
2018 Mar 19;115(14):3581–3586. doi: 10.1073/pnas.1716084115

Facial color is an efficient mechanism to visually transmit emotion

Carlos F Benitez-Quiroz a,b, Ramprakash Srinivasan a,b, Aleix M Martinez a,b,1
PMCID: PMC5889636  PMID: 29555780

Significance

Emotions correspond to the execution of a number of computations by the central nervous system. Previous research has studied the hypothesis that some of these computations yield visually identifiable facial muscle movements. Here, we study the supplemental hypothesis that some of these computations yield facial blood flow changes unique to the category and valence of each emotion. These blood flow changes are visible as specific facial color patterns to observers, who can then successfully decode the emotion. We present converging computational and behavioral evidence in favor of this hypothesis. Our studies demonstrate that people identify the correct emotion category and valence from these facial colors, even in the absence of any facial muscle movement.

Keywords: face perception, categorization, affect, computer vision

Abstract

Facial expressions of emotion in humans are believed to be produced by contracting one’s facial muscles, generally called action units. However, the surface of the face is also innervated with a large network of blood vessels. Blood flow variations in these vessels yield visible color changes on the face. Here, we study the hypothesis that these visible facial colors allow observers to successfully transmit and visually interpret emotion even in the absence of facial muscle activation. To study this hypothesis, we address the following two questions. Are observable facial colors consistent within and differential between emotion categories and positive vs. negative valence? And does the human visual system use these facial colors to decode emotion from faces? These questions suggest the existence of an important, unexplored mechanism of the production of facial expressions of emotion by a sender and their visual interpretation by an observer. The results of our studies provide evidence in favor of our hypothesis. We show that people successfully decode emotion using these color features, even in the absence of any facial muscle activation. We also demonstrate that this color signal is independent from that provided by facial muscle movements. These results support a revised model of the production and perception of facial expressions of emotion where facial color is an effective mechanism to visually transmit and decode emotion.


For over 2,300 y (1), philosophers and scientists have argued that people convey emotions through facial expressions by contracting and relaxing their facial muscles (2). The resulting, observable muscle articulations are typically called action units (AUs) (3).

It has been shown that different combinations of AUs yield facial expressions associated with distinct emotion categories (2, 4, 5), although cultural and individual differences exist (6, 7) and context may influence their production and perception (8, 9). These facial expressions are easily interpreted by our visual system, even in noisy environments (10–12).

Here, we test the supplemental hypothesis that facial color successfully transmits emotion independently of facial movements. Our hypothesis is that a face can send emotion information to observers by changing the blood flow or blood composition on the network of blood vessels closest to the surface of the skin (Fig. 1). Consider, for instance, the redness that is sometimes associated with anger or the paleness in fear. These facial color changes can be caused by variations in blood flow, blood pressure, glucose levels, and other changes that occur during emotive experiences (1315).

Fig. 1.

Emotions are the execution of a number of computations by the nervous system. Previous research has studied the hypothesis that some of these computations yield muscle articulations unique to each emotion category. However, the human face is also innervated with a large number of blood vessels, as illustrated (veins in blue, arteries in red). These blood vessels are very close to the surface of the skin. This makes variations in blood flow visible as color patterns on the surface of the skin. We hypothesize that the computations executed by the nervous system during an emotion yield color patterns unique to each emotion category. (Copyright The Ohio State University.)

Can facial color accurately and robustly communicate emotion categories and valence to observers? That is, when experiencing an emotion, are the skin color changes consistent within expressions of the same emotion category/valence and differential between categories/valences (even across individuals, ethnicity, gender, and skin color)? And are these changes successfully visually interpreted by observers?

The present paper provides supporting evidence for this hypothesis.

First, we demonstrate that specific color features in distinct facial areas are indeed consistent within an emotion category (or valence) but differential between them. Second, we use a machine-learning algorithm to identify the most descriptive color features associated with each emotion category. This allows us to change the color of neutral face images (i.e., faces without any facial muscle movement) to match those of specific emotions. Showing these images to human subjects demonstrates that people do perceive the correct emotion category (and valence) on the face even in the absence of muscle movements. Finally, we demonstrate that the emotion information transmitted by color is additive to that of AUs. That is, the contributions of color and AUs to emotion perception are (at least partially) independent.

These results provide evidence for an efficient and robust communication of emotion through modulations of facial color. They call for a revised model of the production and visual perception of facial expressions, one in which facial color is successfully used to transmit and visually identify emotion.

Results

Our first goal is to determine whether the production of distinct facial expressions of emotion (plus neutral ones) can be discriminated based on color features alone, having eliminated all shading information. By discriminant, we mean that there are facial color features that are consistent within an emotion category (regardless of identity, gender, ethnicity, and skin color) and differential between them. We test this in experiment 1. Experiment 2 identifies the color changes coding each emotion category. Experiment 3 incorporates these color patterns to neutral faces to show that people visually recognize emotion based on color features alone. And experiment 4 shows that this emotive color information is independent from that transmitted by AUs.

Experiment 1.

We use images of 184 individuals producing 18 facial expressions of emotions (5) (Materials and Methods, Images). These include both genders and many ethnicities and races (SI Appendix, Fig. S1A and Extended Methods, Databases). Previous work has demonstrated that the pattern of AU activation in these facial expressions is consistent within an emotion category and differential between them (5, 7, 16). But are facial color patterns also consistent within each emotion and differential between emotions?

To answer this question, we built a linear machine-learning classifier using linear discriminant analysis (LDA) (17, 18) (Materials and Methods, Classification). First, each face is divided into 126 local regions defined by 87 anatomical landmark points (SI Appendix, Fig. S2). These facial landmark points define the contours of the face (jaw and crest line) and internal facial components (brows, eyes, nose, and mouth). A Delaunay triangulation of these points yields the local areas of the face shown in SI Appendix, Fig. S2. These local regions correspond to local areas of the network of blood vessels illustrated in Fig. 1.
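The regioning step described above can be sketched with SciPy's Delaunay triangulation. This is a minimal illustration on synthetic 2D points standing in for the paper's anatomical landmarks; the function name is ours:

```python
import numpy as np
from scipy.spatial import Delaunay

def facial_regions(landmarks):
    """Triangulate 2D landmark points into triangular local face regions.

    landmarks: (r, 2) array of 2D landmark coordinates.
    Returns an (n_triangles, 3) array of landmark indices, one row per region.
    """
    tri = Delaunay(landmarks)
    return tri.simplices

# Toy example with synthetic "landmarks" (the paper uses real anatomical points).
rng = np.random.default_rng(0)
pts = rng.random((20, 2))
triangles = facial_regions(pts)
```

In the paper, regions falling inside the mouth and eyes would then be discarded before computing color features.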

Let the feature vector 𝐱̂ij define the isoluminant opponent color (19) features of the jth emotion category as produced by the ith individual (Materials and Methods, Color Space).

If indeed local facial color is consistent within each emotion category and differential between them, then a linear classifier should discriminate these feature vectors as a function of their emotion category. We use a simple linear model and disjoint training and testing sets to avoid overfitting.

Testing was done using 10-fold cross-validation (Materials and Methods, Experiment 1). This means that we randomly split the dataset of all our images into 10 disjoint subsets of equal size. Nine of these subsets are used to train our linear classifier, while the 10th is used for testing, i.e., to determine how accurate this classifier is on unobserved data. This process is repeated 10 times, each time leaving a different subset out for testing. The average classification accuracy and SD on the left-out subsets is computed, yielding the confusion table results shown in Fig. 2A. Columns in this confusion table specify the true emotion category, and rows correspond to the classification given by our classifier. Thus, diagonal entries correspond to correct classifications and off-diagonal elements to confusions.
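This cross-validation protocol can be sketched in a few lines. The example below is a toy illustration: synthetic data and a nearest-class-mean classifier stand in for the paper's 504-dimensional color features and LDA subspace:

```python
import numpy as np

def ten_fold_cv(X, y, n_folds=10, seed=0):
    """Disjoint-fold cross-validation with a nearest-class-mean classifier.

    Returns the mean and SD of the per-fold classification accuracies,
    mirroring the protocol described in the text."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    accs = []
    for t in range(n_folds):
        test = folds[t]
        train = np.concatenate([folds[k] for k in range(n_folds) if k != t])
        # Class means estimated from the 9 training folds only.
        means = {c: X[train][y[train] == c].mean(axis=0) for c in np.unique(y[train])}
        classes = np.array(sorted(means))
        M = np.stack([means[c] for c in classes])
        # Assign each test sample to its nearest class mean (Euclidean distance).
        pred = classes[np.argmin(((X[test, None, :] - M) ** 2).sum(-1), axis=1)]
        accs.append((pred == y[test]).mean())
    return np.mean(accs), np.std(accs)

# Two well-separated synthetic "emotion color" classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(6, 1, (100, 5))])
y = np.repeat([0, 1], 100)
mean_acc, sd_acc = ten_fold_cv(X, y)
```

The key property preserved here is that the test fold is never used to fit the classifier, so accuracy reflects performance on unobserved data.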

Fig. 2.

Consistent with our hypothesis, specific color features remain unchanged for images of the same emotion category but are differential between images of distinct categories. (A) Confusion table of the results of a k-way classification with LDA (experiment 1). Rows indicate the predicted category and columns the true category. The size of the circle specifies the accuracy of recognition (i.e., larger circles equal higher accuracy) and the color the SD across identities. (B) Results of the two-way classifications using LDA (experiment 2). The emotion category indicated in the x axis corresponds to the target emotion. Blue bars indicate accuracy of classification between the target and nontarget emotions, and error bars specify SE. All results are statistically significant (chance = 50%); ****P < 10⁻⁵⁰. (C) Confusion table of a k-way classification on the four emotion categories in DISFA. A, angry; AD, angrily disgusted; AS, angrily surprised; D, disgusted; DS, disgustedly surprised; F, fearful; FA, fearfully angry; FD, fearfully disgusted; FS, fearfully surprised; H, happy; HD, happily disgusted; HS, happily surprised; N, neutral; S, sad; SA, sadly angry; SD, sadly disgusted; SF, sadly fearful; SS, sadly surprised; Su, surprised.

The correct classification accuracy, averaged over all emotion categories, is 50.15% (chance = 5.5%), and all emotion categories yield results significantly above chance (P < 0.001). A power analysis yielded equally supportive results (SI Appendix, Table S1). The large effect size and low P value, combined with the small number of confusions made by this linear classifier, provide our first evidence for the existence of color features that are consistent within each emotion category and differential across emotions.

These results are not dependent on the skin color as demonstrated in SI Appendix, Fig. S3A and Results Are Not Dependent on Skin Color.

Classification of positive vs. negative valence by this machine-learning algorithm was equally supportive of our hypothesis. Classification accuracy was 92.93% (chance = 50%), P < 10⁻⁹⁰.

Experiment 2.

To identify the most discriminant (diagnostic) color features, we repeated the above computational analysis using a one-vs.-all approach (Materials and Methods, Experiment 2). This means that the images of the target emotion are used as the samples of one class, while the images of all other (nontarget) emotions are assigned to the second class. Specifically, LDA is used to yield the color model 𝐱j of each target emotion j and the color model 𝐱−j of the corresponding nontarget emotions, j = 1, …, 18. That is, 𝐱j and 𝐱−j are defined by the most discriminant color features given by LDA.
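For two classes, LDA reduces to a single discriminant direction. A minimal numpy sketch of the one-vs.-all step, on synthetic data (the regularization constant and function name are illustrative assumptions):

```python
import numpy as np

def one_vs_all_direction(X, y, target, reg=0.01):
    """Most discriminant direction separating one target class from the rest.

    For C = 2, LDA yields a single vector proportional to Sw^-1 (mu_pos - mu_neg),
    where Sw is the (regularized) within-class scatter matrix."""
    pos, neg = X[y == target], X[y != target]
    mu_p, mu_n = pos.mean(0), neg.mean(0)
    Xc = np.vstack([pos - mu_p, neg - mu_n])
    Sw = Xc.T @ Xc + reg * np.eye(X.shape[1])  # regularized within-class scatter
    v = np.linalg.solve(Sw, mu_p - mu_n)       # two-class LDA direction
    return v / np.linalg.norm(v)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y = np.repeat([0, 1], 50)
v = one_vs_all_direction(X, y, target=1)
```

Projections onto v separate the target class from the rest; the entries of v indicate which color features are most diagnostic.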

Using this LDA classifier and a 10-fold cross-validation procedure yields the accuracies and standard errors shown in Fig. 2B. As we can see in Fig. 2B, the classification accuracies of all emotion categories are statistically better than chance (P < 10⁻⁵⁰, chance = 50%; SI Appendix, Table S2). These results are not dependent on skin color, as demonstrated in SI Appendix, Fig. S3B.

Importantly, the relevance of the color channels in each local face area varies across emotion categories, as expected (SI Appendix, Fig. S4A). This result thus supports our hypothesis that experiencing distinct emotions yields differential color patterns on the face. That is, when producing facial expressions of distinct emotions, the observable facial color differs. But when distinct people produce facial expressions of the same emotion, these discriminant color features remain constant. Furthermore, the most discriminant color areas are mostly different from the most discriminant shape changes caused by AUs (SI Appendix, Fig. S4B). Thus, color features that transmit emotion are generally independent of AUs.

The identified color features can now be added to images of facial expressions, as shown in Fig. 3A. In addition, the dimensionality of the color space is 18, with each dimension defining an emotion category (SI Appendix, Fig. S5 and Dimensionality of the Color Space), further supporting our hypothesis.

Fig. 3.

(A) The first pair of images corresponds to the expression of happiness, the second pair to the expression of anger, and the last pair to the expression of surprise. For each pair of images, the Left image includes the diagnostic facial colors identified in experiment 2 (SI Appendix, Fig. S7). (B) Sample images 𝐈ij of experiment 3. Left to Right: angry, disgusted, happy, happily disgusted, sad, fearfully surprised. (C and D) Sample trials of the two- and six-alternative forced-choice methods of experiment 3.

Since the production of spontaneous expressions generally differs from that of posed expressions (16, 20), we also assessed the ability of isoluminant color to discriminate spontaneous expressions. Specifically, the DISFA (Denver Intensity of Spontaneous Facial Action) database (21) includes a large number of samples of four emotion categories plus neutral. Using these images and the above machine-learning approach yields an average classification accuracy of 95.53% (chance = 20%; SI Appendix, Table S3) (Fig. 2C). We note that spontaneous expressions of distinct emotions are more clearly discriminated than posed expressions, as expected.

Experiment 3.

The color models of experiment 2 are now used to modify the neutral faces of the 184 individuals in our database. That is, we change the color of neutral faces using the discriminant models 𝐱j and 𝐱−j identified above.

Specifically, the feature vector 𝐱in associated with image 𝐈in (of a neutral face) is modified using the equation 𝐲ij = 𝐱in + α(𝐱j − 𝐱−j), where α = 1 (Materials and Methods, Experiment 3). Let 𝐈ij denote the image associated with the feature vector 𝐲ij. We used this approach to add the color of the following six emotion categories to neutral faces: angry, disgusted, happy, happily disgusted, sad, and fearfully surprised (Fig. 3B). This yields a new set of neutral faces, 𝐈ij, j = 1, …, 6.
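This color-transfer step amounts to a single vector operation on the feature vectors. A minimal numpy sketch with toy low-dimensional vectors (the paper's feature vectors have 504 dimensions; the values below are invented for illustration):

```python
import numpy as np

def add_emotion_color(x_neutral, x_target, x_nontarget, alpha=1.0):
    """Shift a neutral face's color feature vector toward the diagnostic
    colors of an emotion: y = x_neutral + alpha * (x_target - x_nontarget)."""
    return x_neutral + alpha * (x_target - x_nontarget)

# Toy feature vectors (illustrative values only).
x_n = np.zeros(6)                           # neutral-face color features
x_j = np.array([1.0, 0.5, 0, 0, 0, 0])      # target-emotion color model
x_nj = np.array([0.2, 0.1, 0, 0, 0, 0])     # nontarget (rest) color model
y_ij = add_emotion_color(x_n, x_j, x_nj)
```

Larger values of alpha would exaggerate the diagnostic color pattern; the paper uses alpha = 1.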

These images were then shown in pairs to participants (Fig. 3C and SI Appendix, Fig. S6). Each pair had an image corresponding to the target emotion, 𝐈ij, and another corresponding to one of the nontarget emotions, 𝐈ik, k ≠ j. All target–nontarget emotion pairs were tested. Participants were asked to identify which of the two images on the screen appears to express emotion j, 𝐈ij or 𝐈ik. Half of the time 𝐈ij was the left image and half of the time it was the right one. This order was randomized across participants. Participants responded by keypress.

The percentage of times participants selected 𝐈ij over 𝐈ik is given in Fig. 4A, with j = 1, …, 6. As can be seen in Fig. 4A, participants consistently preferred the neutral face with the target emotion color, 𝐈ij, over the image with the colors of a nontarget emotion, 𝐈ik. These results were statistically significant, P < 0.01 (SI Appendix, Table S4). The recognition of positive vs. negative valence was also significant; accuracy = 82.65% (chance = 50%), P < 10⁻⁴.

Fig. 4.

Results of experiments 3 and 4. (A) The y axis shows the proportion of times participants indicated that the neutral image with the diagnostic color features of the target emotion specified in the x axis (𝐈ij) expresses that emotion better than the neutral face with the colors of a nontarget emotion (𝐈ik). Images are in isoluminant color space; i.e., images do not have any shading information. (B) Confusion table of a six-alternative forced choice experiment. Subjects are shown a single image at a time and asked to classify it into a set of six emotion categories. The size of the circles specifies classification accuracies, and colors are SD. Accuracies are better than chance except for happily disgusted, which is confused for happy. (C) The y axis shows the proportion of times participants indicated that a face with the AUs and diagnostic color features of the target emotion specified in the x axis (𝐈ij) expresses that emotion better than a face with the AUs of the target emotion and the colors of a nontarget emotion (𝐈ik). ***P<0.0001, **P<0.001, *P<0.01.

Next, we tested the ability of subjects to classify each 𝐈ij image into one of six possible choices, i.e., a six-alternative forced-choice experiment (Materials and Methods, Six-alternative forced-choice experiment) (Fig. 3D). The results are shown in Fig. 4B; average classification = 32.92% (chance = 16.67%). All results are statistically significant, P < 0.01, except for happily disgusted, which is confused with happy. Nonetheless, this confusion yields the correct perception of valence. In fact, the average accuracy of valence recognition in this challenging experiment is 72.9% (chance = 50%, P < 10⁻⁵) (SI Appendix, Table S5).

Experiment 4.

One may wonder whether the emotion perception effect demonstrated in the previous experiment disappears when the facial expression also includes AUs. That is, is the emotion information visually transmitted by facial muscle movements independent from that transmitted by the diagnostic color features? If the color features do indeed transmit (at least some) information independent of the AUs, then having the color of the target emotion j should yield a clearer perception of the target emotion than having the colors of a nontarget emotion k; i.e., congruent colors make categorization easier, and incongruent colors make it more difficult.

To test this prediction, we created two additional image sets. In the first image set, we added the color model of emotion category j to the images of the 184 individuals expressing the same emotion with AUs. Formally, image 𝐈ij+ corresponds to the modified feature vector 𝐳ij+ = 𝐱ij + α(𝐱j − 𝐱−j). Here 𝐱ij is the feature vector of the facial expression of emotion j as expressed by individual i (i.e., the feature vector of image 𝐈ij), and α = 1 (Materials and Methods, Experiment 4; Fig. 3A; and SI Appendix, Fig. S7). In the second set, we added the color features of a nontarget emotion k to an image of a facial expression of emotion j, k ≠ j. The resulting images, 𝐈ijk, are given by the feature vectors 𝐳ijk = 𝐱ij + α(𝐱k − 𝐱−k), α = 1 (SI Appendix, Fig. S7). Thus, 𝐈ij+ is an image with the facial color hypothesized to transmit emotion j, while 𝐈ijk is an image with the diagnostic colors of emotion k instead, k ≠ j.

Participants completed a two-alternative forced-choice experiment in which they had to indicate which of two images of facial expressions conveys emotion j more clearly, 𝐈ij+ or 𝐈ijk, k ≠ j (Materials and Methods, Experiment 4 and SI Appendix, Fig. S7). The location of 𝐈ij+ and 𝐈ijk (left or right) was randomized. The proportion of times subjects selected 𝐈ij+ as more clearly expressive of emotion j than 𝐈ijk is shown in Fig. 4C. As predicted, the selection of 𝐈ij+ was significantly above chance (chance = 50%). These results are statistically significant, P < 0.01 (SI Appendix, Table S6).

The recognition of valence was even stronger; average classification = 85.01% (chance = 50%, P < 10⁻⁵).

These results support our hypothesis of an (at least partially) independent transmission of emotion signal by color features.

Discussion

Emotions are the execution of a number of computations by the nervous system. For over two millennia, studies have evaluated the hypothesis that some of these computations yield visible facial muscle changes (called AUs) specific to each emotion (1, 2, 4, 5, 7, 22).

The present paper studied the supplemental hypothesis that these computations result in visible facial color changes caused by emotion-dependent variations in facial blood flow. The studies reported in the present paper provide evidence favoring this hypothesis. Crucially, this visual signal is correctly interpreted even in the absence of facial muscle movement. Additionally, the emotion signal transmitted by color is additive to that encoded in the facial muscle movements. Thus, the emotion information transmitted by color is at least partially independent from that by facial movements.

Supporting our results, the human retina (23) and later visual areas (24) have cells specialized for different types of stimuli. Some cells are tuned to detect motion, such as those caused by facial movements. Other cells are specialized for color perception. The visual analysis of emotion using these two separate systems (motion and color) adds robustness to the emotion signal, allowing people to more readily interpret it in noisy environments. Moreover, the existence of two separate neural mechanisms for the analysis of motion and color provides a plausible explanation for the independence of emotion information transmitted by facial movements and facial color.

Also in support of our results is the observation that human faces are mostly bare, whereas those of other primates are covered by facial hair (25). This bareness facilitates the transmission of emotion through the modulation of facial color in humans. This could mean that the transmission of an emotion signal through facial color is a mechanism not available to all primates and may be a result of recent evolutionary forces.

These findings call for a revision of current emotion models. For instance, the present paper has demonstrated that color can successfully communicate 18 distinct emotion categories to an observer, as well as positive and negative valence, but more categories or degrees of valence are likely (7, 26). For example, it is possible that new combinations of AUs and colors, not tested in the present study, yield previously unidentified facial expressions of emotion. This is a fundamental question in emotion research, likely to influence the definition of emotion and the role of emotions in high-level cognitive tasks (26, 27).

The results of the present study also call for the study of the brain pathways associated with emotion perception using color alone and in conjunction with facial movements. Previous studies have suggested the existence of a variety of pathways in the interpretation of facial expressions of emotion, but the specificity of some of these pathways is poorly understood (2830). Also, understanding independent domain-specific mechanisms such as the ones supported by our results is a topic of high value in neuroscience (31, 32).

Current computational models of the perception of facial expressions of emotion (7) will also need to be extended or revised to include the contribution of facial color and color perception. Similarly, algorithms in artificial intelligence (e.g., computer vision and human–robot interaction) will need to be able to interpret emotion through these two independent mechanisms if they are to be indistinguishable from humans, i.e., to pass a visual Turing test (33).

Also consistent with our hypothesis, studies in computer graphics demonstrate the importance that color plays in the expression of emotion (3437).

Our findings are also significant in the study of psychopathologies. A recurring characteristic in psychopathologies is an atypical perception of facial expressions of emotion (38). But is this caused by limitations in interpreting facial movements or color? Or is this dependent on the disorder?

Finally, the results of the present study support the view that the perception of color can have an emotional interpretation, as painters have been exploiting for years (39). Of note are the paintings of Mark Rothko, consisting of blurred blocks of colors combined to yield the perception of emotion. The studies reported in the present paper could help understand the perception of emotion so eloquently conveyed by this and other painters.

Materials and Methods

Approval and Consent.

The experiment design was approved by the Office of Responsible Research Practices at The Ohio State University. Subjects provided written consent.

Images.

We used images of 184 individuals expressing 18 emotion categories, plus neutral. The images are from refs. 5 and 21. These images have been extensively validated for consistency of production and recognition (7). The emotion categories are happy, sad, angry, disgusted, surprised, fearful, happily surprised, happily disgusted, sadly fearful, sadly angry, sadly surprised, sadly disgusted, fearfully angry, fearfully surprised, fearfully disgusted, angrily surprised, angrily disgusted, and disgustedly surprised. The dataset also includes the neutral face (SI Appendix, Extended Methods, Databases).

Areas of the Face.

Denote each face color image of p × q pixels as 𝐈ij ∈ ℝ^(p×q×3) and the r landmark points of the facial components as 𝐬ij = (𝐬ij1, …, 𝐬ijr), with 𝐬ijk ∈ ℝ² the 2D coordinates of a landmark point on the image. Here, i specifies the subject, j the emotion category, and we used r = 66 (SI Appendix, Fig. S2). Delaunay triangulation (40) is used to create the triangular local areas shown in SI Appendix, Fig. S2. This triangulation yields a total of 148 local areas. Six of these triangles define the interior of the mouth and 16 the inside of the eyes (sclera and iris). Since blood changes in these areas are not generally visible, they are removed from further consideration, leaving a total of 126 triangular local areas. Let D = {d1, …, d126} be a set of functions that return the pixels of each of these local regions; i.e., dk(𝐈ij) is a vector of the l pixels within the kth Delaunay triangle of image 𝐈ij, dk(𝐈ij) = (𝐝ijk1, …, 𝐝ijkl)ᵀ, where 𝐝ijks = (dijks1, dijks2, dijks3)ᵀ ∈ ℝ³ defines the values of the LMS channels of each pixel.
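The functions dk can be sketched by assigning every pixel to its containing Delaunay triangle. The example below uses SciPy on a toy 10 × 10 image with 5 synthetic landmarks (the function name and landmark values are illustrative):

```python
import numpy as np
from scipy.spatial import Delaunay

def region_pixels(landmarks, height, width):
    """Assign every pixel to the Delaunay triangle (local region) containing it.

    Returns a (height, width) array of triangle indices; -1 marks pixels
    outside the convex hull of the landmarks."""
    tri = Delaunay(landmarks)
    ys, xs = np.mgrid[0:height, 0:width]
    pts = np.column_stack([xs.ravel(), ys.ravel()])  # pixel centers as (x, y)
    return tri.find_simplex(pts).reshape(height, width)

# Toy landmarks: four corners plus one interior point.
lm = np.array([[0, 0], [9, 0], [0, 9], [9, 9], [5, 4]], dtype=float)
labels = region_pixels(lm, 10, 10)
```

Gathering the pixels with a given label then yields the vector dk(𝐈ij) for that region.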

Color Space.

The above-defined feature vectors are mapped onto the isoluminant color space given by the opponent color channels yellow–blue and red–green. Formally, 𝐡ijks = (dijks1 + dijks2 − dijks3, dijks1 − dijks2)ᵀ ∈ ℝ². We compute the first and second moments (i.e., mean and variance) of the data in this color space: μijk = l⁻¹ Σ_{s=1}^{l} 𝐡ijks and σijk = l⁻¹ Σ_{s=1}^{l} (𝐡ijks − μijk)². Every image 𝐈ij is now represented by the following feature vector of color statistics, 𝐱ij = (μij1ᵀ, σij1ᵀ, …, μij126ᵀ, σij126ᵀ)ᵀ ∈ ℝ⁵⁰⁴. Using the same modeling, we define the color feature vector of every neutral face as 𝐱in = (μin1ᵀ, σin1ᵀ, …, μin126ᵀ, σin126ᵀ)ᵀ. We also use the normalization proposed by Bratkova et al. (41), which maintains the properties of saturation and hue in color space. This yields the vectors 𝐱̂ij and 𝐱̂in, respectively.
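A minimal sketch of the per-region color statistics, assuming the pixels are already in LMS coordinates (the normalization of ref. 41 is omitted, and the function name is ours):

```python
import numpy as np

def opponent_features(lms_pixels):
    """Map the LMS pixels of one local region to isoluminant opponent channels
    (yellow-blue = L + M - S, red-green = L - M) and return their mean and
    variance, i.e., the 4 numbers this region contributes to the feature vector."""
    L, M, S = lms_pixels[:, 0], lms_pixels[:, 1], lms_pixels[:, 2]
    h = np.column_stack([L + M - S, L - M])  # (l, 2) opponent coordinates
    return h.mean(axis=0), h.var(axis=0)

rng = np.random.default_rng(3)
pix = rng.random((50, 3))  # 50 synthetic LMS pixels in one triangle
mu, var = opponent_features(pix)
```

Concatenating the (mean, variance) pairs over all 126 regions gives the 504-dimensional feature vector described in the text.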

Classification.

LDA is computed on the above-defined color space. Formally, the discriminant space is defined by the eigenvectors associated with nonzero eigenvalues of the matrix 𝚺x⁻¹𝐒B (17), where 𝚺x = Σ_{i=1}^{m} Σ_{j=1}^{C} (𝐱̂ij − μ)(𝐱̂ij − μ)ᵀ + δ𝐈 is the (regularized) covariance matrix, 𝐒B = Σ_{j=1}^{C} (𝐱̄j − μ)(𝐱̄j − μ)ᵀ is the between-class scatter matrix, 𝐱̄j = m⁻¹ Σ_{i=1}^{m} 𝐱̂ij are the class means, μ = (Cm)⁻¹ Σ_{i=1}^{m} Σ_{j=1}^{C} 𝐱̂ij is the global mean, 𝐈 is the identity matrix, δ = 0.01, and C is the number of classes.
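Under these definitions, the discriminant subspace can be sketched with numpy: compute the eigenvectors of 𝚺x⁻¹𝐒B and keep the top C − 1 directions (a toy illustration on synthetic data; the function name is ours):

```python
import numpy as np

def lda_subspace(X, y, delta=0.01):
    """Discriminant subspace: eigenvectors of (Sigma_x)^-1 S_B associated with
    the largest eigenvalues. S_B has rank at most C - 1, so C - 1 directions
    are kept."""
    mu = X.mean(axis=0)
    Sigma = (X - mu).T @ (X - mu) + delta * np.eye(X.shape[1])  # regularized
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    Sb = (means - mu).T @ (means - mu)                          # between-class scatter
    evals, evecs = np.linalg.eig(np.linalg.solve(Sigma, Sb))
    order = np.argsort(-evals.real)[:len(classes) - 1]          # top C-1 directions
    return evecs.real[:, order]

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(m, 1, (40, 5)) for m in (0, 4, 8)])
y = np.repeat([0, 1, 2], 40)
F = lda_subspace(X, y)
```

Projecting samples onto the columns of F gives the low-dimensional space in which the nearest-mean classification of the experiments is performed.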

Experiment 1.

The LDA isoluminant color space is computed with C = 18 and C = 5 when using the datasets of refs. 5 and 21, respectively; i.e., each emotion and the neutral expression are a class. For 10-fold cross-validation, we divide the samples into 10 disjoint subsets S = {S1, …, S10}, each subset St having the same number of samples. This division is done in a way that the number of samples in each emotion category (plus neutral) is equal in every subset. To compute the results reported in experiment 1, we repeat the following procedure for t = 1, …, 10: All of the subsets, except St, are used to compute 𝚺x and 𝐒B. The samples in subset St, which were not used to compute the LDA subspace 𝔉, are projected onto 𝔉. Each test sample feature vector 𝐭r ∈ St is assigned to the emotion category of the nearest category mean in 𝔉, given by the Euclidean distance, er = argminₑ (𝐭r − 𝐱̄e)ᵀ(𝐭r − 𝐱̄e), where 𝐱̄e is the mean feature vector of the training samples of category e. The classification accuracy over all test samples 𝐭r ∈ St is given by μt-fold = nt⁻¹ Σ_{𝐭r ∈ St} 𝟏{er = y(𝐭r)}, where nt is the number of samples in St, y(𝐭r) is the oracle function that returns the true emotion category of sample 𝐭r, and 𝟏{er = y(𝐭r)} is the zero–one loss, which equals one when er = y(𝐭r) and zero otherwise. Repeating this procedure for t = 1, …, 10, we compute the mean classification accuracy μ10-fold = 0.1 Σ_{t=1}^{10} μt-fold. The SD of the cross-validated classification accuracies is σ10-fold = (0.1 Σ_{t=1}^{10} (μt-fold − μ10-fold)²)^{1/2}.

Experiment 2.

We identify the most discriminant color features of each emotion category by repeating the approach described above (Materials and Methods, Experiment 1) 18 times, each time assigning the samples of one emotion category to class 1 (i.e., the target emotion) and the samples of all other emotion categories to class 2 (i.e., the nontarget emotions). Thus, C = 2. Formally, Sc = {𝐭r | y(𝐭r) = c} and S−c = {𝐭r | y(𝐭r) ≠ c}, with c = 1, …, 18. We use the same 10-fold cross-validation procedure and nearest-mean classifier described in experiment 1. To avoid biases due to the sample imbalance, we apply downsampling on S−c. Specifically, we repeated this procedure 1,000 times, each time drawing a random sample from S−c to match the number of samples in Sc, and then computed the average. Let 𝚺x⁻¹𝐒B𝐯1 = λ1𝐯1. This yields the discriminant vector 𝐯1 = (v1,1, …, v1,504)ᵀ, with λ1 > 0. The color model of emotion j is thus given by xj = 𝐯1ᵀ𝐱̄j. Similarly, x−j = 𝐯1ᵀ𝐱̄−j, where 𝐱̄−j is the mean feature vector of all feature vectors not in Sj (i.e., in S−j). The reprojection of these scalars into the original space is called reconstruction and is given by 𝐱j = xj𝐯1 and 𝐱−j = x−j𝐯1.
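The downsampling step can be sketched as follows. This illustrative numpy code performs a single balanced draw, whereas the paper averages over 1,000 such draws (the function name and toy data are ours):

```python
import numpy as np

def downsample_balance(X, y, target, seed=0):
    """Randomly downsample the nontarget class so both classes have the same
    number of samples. Returns the balanced data and binary labels
    (1 = target, 0 = nontarget)."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == target)
    neg = np.flatnonzero(y != target)
    neg = rng.choice(neg, size=len(pos), replace=False)  # match class sizes
    idx = np.concatenate([pos, neg])
    return X[idx], (y[idx] == target).astype(int)

# Toy imbalanced data: 3 target samples vs. 12 nontarget samples.
X = np.arange(30, dtype=float).reshape(15, 2)
y = np.array([1] * 3 + [0] * 12)
Xb, yb = downsample_balance(X, y, target=1)
```

Averaging the classifier's accuracy over many such draws reduces the variance introduced by any single random subsample.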

Experiment 3.

Subjects.

Twenty human subjects (7 women; mean age, 23.92 y) with normal or corrected-to-normal vision participated in this experiment. Subjects were tested for color blindness.

Stimuli.

We modify the original neutral faces, 𝐈in, i = 1, …, 184. The resulting images 𝐈ij are given by the color feature vectors 𝐲ij = 𝐱̆in + α(𝐱j − 𝐱−j), where α = 1 and 𝐱̆in is the feature vector with the average color and luminance of all of the pixels of 𝐱in. This equation adds the diagnostic color features of emotion j to a neutral expression. The resulting images have the same luminance in every pixel (Fig. 3B). We selected the six emotion categories shown in Fig. 3B to keep the behavioral experiment shorter than 35 min.

Design and procedure.

Subjects were comfortably seated 50 cm from a 21-inch cathode ray tube (CRT) monitor. We used a block design. In each of the six blocks, we tested one of the six emotion categories. At the beginning of each block, a screen indicated the target emotion, followed by a screen with a sample image of that emotion, i.e., an image of a facial expression of emotion j with the prototypical AUs present. Each of the six blocks consists of 20 trials. Each trial is a two-alternative forced-choice behavioral experiment. Fig. 3C and SI Appendix, Fig. S5 illustrate a typical timeline of a trial. Participants first viewed a blank screen for 500 ms, followed by a white fixation cross for 500 ms, followed by the image pair 𝐈ij and 𝐈ik, k ≠ j (Fig. 3C). Responses were recorded by keypress. Statistical significance is given by a one-tailed t test.
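The significance test can be sketched as follows, with hypothetical per-subject choice proportions (we halve SciPy's two-tailed one-sample t-test p value to obtain the one-tailed test against chance):

```python
import numpy as np
from scipy import stats

def above_chance_p(proportions, chance=0.5):
    """One-tailed one-sample t test: are per-subject two-alternative
    forced-choice proportions greater than chance?"""
    t, p_two = stats.ttest_1samp(proportions, chance)
    # One-tailed p value in the predicted (greater-than-chance) direction.
    return p_two / 2 if t > 0 else 1 - p_two / 2

# Hypothetical per-subject proportions of target-color choices (not real data).
props = np.array([0.70, 0.65, 0.80, 0.72, 0.68, 0.75])
p = above_chance_p(props)
```

A small p value indicates the group reliably chose the target-emotion color above the 50% chance level.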

Six-alternative forced-choice experiment.

In the six-alternative forced-choice (6AFC) experiment, participants saw a single image per trial, and all images were presented in a single randomized block. Participants chose one of six emotion category labels displayed to the right of the image (Fig. 3D). Selections were made with the computer mouse.

Experiment 4.

Subjects.

Twenty human subjects (8 women; mean age, 24.45 y) participated. All subjects passed a color blindness test.

Stimuli.

We modified the images of six facial expressions of emotion with the prototypical AUs active, $\mathbf{I}_{ij}$, $i = 1, \dots, 184$, $j = 1, \dots, 6$ (Fig. 3A). We modified these images to have the colors of a target or nontarget emotion. The images with the target emotion colors added, $\mathbf{I}_{ij}^{+}$, are given by the modified feature vectors $\mathbf{z}_{ij} = \mathbf{x}_{ij} + \alpha(\hat{\mathbf{x}}_j - \hat{\mathbf{x}}_{\bar{j}})$, $i = 1, \dots, 184$, $\alpha = 1$. Hence, $\mathbf{I}_{ij}^{+}$ are images of facial expressions with the AUs of emotion category $j$ plus the colors of emotion $j$ (Fig. 3 and SI Appendix, Fig. S7). The images with the colors of a nontarget emotion added, $\mathbf{I}_{ijk}^{-}$, are given by the feature vectors $\mathbf{z}_{ijk} = \mathbf{x}_{ij} + \alpha(\hat{\mathbf{x}}_k - \hat{\mathbf{x}}_{\bar{k}})$, with $k \neq j$, $\alpha = 1$ (SI Appendix, Fig. S7).

Design and procedure.

Participants were comfortably seated 50 cm from a 21-inch CRT monitor. We used the same block design as in experiment 3. Each trial was a two-alternative forced-choice task. SI Appendix, Fig. S8 illustrates a typical timeline of a trial. In each trial, participants viewed a blank screen for 500 ms, followed by a white fixation cross for 500 ms, followed by the image pair $\mathbf{I}_{ij}^{+}$, $\mathbf{I}_{ijk}^{-}$. Participants were instructed to indicate whether the left or right image expressed emotion $j$ more clearly. Responses were recorded by keypress. The positions (left, right) of $\mathbf{I}_{ij}^{+}$ and $\mathbf{I}_{ijk}^{-}$ were randomized, and the presentation order of the trials was randomized across participants. Statistical significance was assessed with a one-tailed t test, which yielded the p values 0.0000081 (H), 0.0005 (S), 0.0051 (A), 0.00015 (D), 0.0016 (HD), and 0.0006 (FS) (abbreviations H, S, A, D, HD, and FS are defined in Fig. 2).
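The six reported p values are small enough to survive a Bonferroni correction for the six per-category tests, a check we add purely for illustration (the article reports the uncorrected values):

```python
# Reported one-tailed p values per emotion category (from the text above).
ps = {"H": 0.0000081, "S": 0.0005, "A": 0.0051,
      "D": 0.00015, "HD": 0.0016, "FS": 0.0006}
alpha_corrected = 0.05 / len(ps)   # Bonferroni threshold, ~0.00833
significant = {k: p < alpha_corrected for k, p in ps.items()}
# Every category remains significant after correction.
```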

Supplementary Material

Supplementary File

Acknowledgments

We thank the reviewers and editor for constructive feedback. We also thank Ralph Adolphs and Lisa Feldman Barrett for comments and Delwin Lindsey and Angela Brown for pointing us to ref. 25. This research was supported by the National Institutes of Health, Grant R01-DC-014498, and the Human Frontier Science Program, Grant RGP0036/2016.

Footnotes

Conflict of interest statement: The Ohio State University has submitted a patent application that includes the recognition of emotion from facial color.

This article is a PNAS Direct Submission. R.A. is a guest editor invited by the Editorial Board.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1716084115/-/DCSupplemental.

References

  • 1.Aristotle . Aristotle: Minor Works. Harvard Univ Press; Cambridge, MA: 1939. [Google Scholar]
  • 2.Duchenne GB. The Mechanism of Human Facial Expression. Cambridge Univ Press; Paris: 1990. [Google Scholar]
  • 3.Ekman P, Rosenberg EL. What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System (FACS) Oxford Univ Press; New York: 1997. [Google Scholar]
  • 4.Ekman P, Sorenson ER, Friesen WV. Pan-cultural elements in facial displays of emotion. Science. 1969;164:86–88. doi: 10.1126/science.164.3875.86. [DOI] [PubMed] [Google Scholar]
  • 5.Du S, Tao Y, Martinez AM. Compound facial expressions of emotion. Proc Natl Acad Sci USA. 2014;111:E1454–E1462. doi: 10.1073/pnas.1322355111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Jack RE, Garrod OG, Yu H, Caldara R, Schyns PG. Facial expressions of emotion are not culturally universal. Proc Natl Acad Sci USA. 2012;109:7241–7244. doi: 10.1073/pnas.1200155109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Martinez AM. Visual perception of facial expressions of emotion. Curr Opin Psychol. 2017;17:27–33. doi: 10.1016/j.copsyc.2017.06.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Barrett LF, Kensinger EA. Context is routinely encoded during emotion perception. Psychol Sci. 2010;21:595–599. doi: 10.1177/0956797610363547. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Etkin A, Büchel C, Gross JJ. The neural bases of emotion regulation. Nat Rev Neurosci. 2015;16:693–700. doi: 10.1038/nrn4044. [DOI] [PubMed] [Google Scholar]
  • 10.Smith FW, Schyns PG. Smile through your fear and sadness: Transmitting and identifying facial expression signals over a range of viewing distances. Psychol Sci. 2009;20:1202–1208. doi: 10.1111/j.1467-9280.2009.02427.x. [DOI] [PubMed] [Google Scholar]
  • 11.Du S, Martinez AM. The resolution of facial expressions of emotion. J Vis. 2011;11:24. doi: 10.1167/11.13.24. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Srinivasan R, Golomb JD, Martinez AM. A neural basis of facial action recognition in humans. J Neurosci. 2016;36:4434–4442. doi: 10.1523/JNEUROSCI.1704-15.2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Blake TM, Varnhagen CK, Parent MB. Emotionally arousing pictures increase blood glucose levels and enhance recall. Neurobiol Learn Mem. 2001;75:262–273. doi: 10.1006/nlme.2000.3973. [DOI] [PubMed] [Google Scholar]
  • 14.Cahill L, Prins B, Weber M, McGaugh JL. β-adrenergic activation and memory for emotional events. Nature. 1994;371:702–704. doi: 10.1038/371702a0. [DOI] [PubMed] [Google Scholar]
  • 15.Garfinkel SN, Critchley HD. Threat and the body: How the heart supports fear processing. Trends Cognit Sci. 2016;20:34–46. doi: 10.1016/j.tics.2015.10.005. [DOI] [PubMed] [Google Scholar]
  • 16.Fabian Benitez-Quiroz C, Srinivasan R, Martinez AM. Emotionet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild. Proc IEEE Conf Comput Vis Pattern Recognit. 2016 doi: 10.1109/CVPR.2016.600. [DOI] [Google Scholar]
  • 17.Martinez AM, Zhu M. Where are linear feature extraction methods applicable? IEEE Trans Pattern Anal Mach Intell. 2005;27:1934–1944. doi: 10.1109/TPAMI.2005.250. [DOI] [PubMed] [Google Scholar]
  • 18.Martínez AM, Kak AC. PCA versus LDA. IEEE Trans pattern Anal Mach Intell. 2001;23:228–233. [Google Scholar]
  • 19.Gegenfurtner KR. Cortical mechanisms of colour vision. Nat Rev Neurosci. 2003;4:563–572. doi: 10.1038/nrn1138. [DOI] [PubMed] [Google Scholar]
  • 20.Fernández-Dols JM, Crivelli C. Emotion and expression: Naturalistic studies. Emot Rev. 2013;5:24–29. [Google Scholar]
  • 21.Mavadati SM, Mahoor MH, Bartlett K, Trinh P, Cohn JF. DISFA: A spontaneous facial action intensity database. IEEE Trans Affect Comput. 2013;4:151–160. [Google Scholar]
  • 22.Hjortsjö CH. Man’s Face and Mimic Language. Studen Litteratur; Lund, Sweden: 1969. [Google Scholar]
  • 23.Euler T, Haverkamp S, Schubert T, Baden T. Retinal bipolar cells: Elementary building blocks of vision. Nat Rev Neurosci. 2014;15:507–519. doi: 10.1038/nrn3783. [DOI] [PubMed] [Google Scholar]
  • 24.Bach M, Hoffmann MB. Visual motion detection in man is governed by non-retinal mechanisms. Vis Res. 2000;40:2379–2385. doi: 10.1016/s0042-6989(00)00106-1. [DOI] [PubMed] [Google Scholar]
  • 25.Changizi MA, Zhang Q, Shimojo S. Bare skin, blood and the evolution of primate colour vision. Biol Lett. 2006;2:217–221. doi: 10.1098/rsbl.2006.0440. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Spunt RP, Adolphs R. 2017. The neuroscience of understanding the emotions of others. Neurosci Lett, in press.
  • 27.Barrett LF. Categories and their role in the science of emotion. Psychol Inq. 2017;28:20–26. doi: 10.1080/1047840X.2017.1261581. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Méndez-Bértolo C, et al. A fast pathway for fear in human amygdala. Nat Neurosci. 2016;19:1041–1049. doi: 10.1038/nn.4324. [DOI] [PubMed] [Google Scholar]
  • 29.Wang S, et al. Neurons in the human amygdala selective for perceived emotion. Proc Natl Acad Sci USA. 2014;111:E3110–E3119. doi: 10.1073/pnas.1323342111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Lindquist KA, Wager TD, Kober H, Bliss-Moreau E, Barrett LF. The brain basis of emotion: A meta-analytic review. Behav Brain Sci. 2012;35:121–143. doi: 10.1017/S0140525X11000446. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Spunt RP, Adolphs R. A new look at domain specificity: Insights from social neuroscience. Nat Rev Neurosci. 2017;18:559–567. doi: 10.1038/nrn.2017.76. [DOI] [PubMed] [Google Scholar]
  • 32.Kanwisher N. Functional specificity in the human brain: A window into the functional architecture of the mind. Proc Natl Acad Sci USA. 2010;107:11163–11170. doi: 10.1073/pnas.1005062107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Geman D, Geman S, Hallonquist N, Younes L. Visual Turing test for computer vision systems. Proc Natl Acad Sci USA. 2015;112:3618–3623. doi: 10.1073/pnas.1422953112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Mathur MB, Reichling DB. Navigating a social world with robot partners: A quantitative cartography of the uncanny valley. Cognition. 2016;146:22–32. doi: 10.1016/j.cognition.2015.09.008. [DOI] [PubMed] [Google Scholar]
  • 35.Jimenez J, et al. A practical appearance model for dynamic facial color. ACM Trans Graphics. 2010;29:141. [Google Scholar]
  • 36.Alkawaz MH, Basori AH. Proceedings of the 11th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and Its Applications in Industry. ACM; New York: 2012. The effect of emotional colour on creating realistic expression of avatar; pp. 143–152. [Google Scholar]
  • 37.de Melo CM, Gratch J. Expression of emotions using wrinkles, blushing, sweating and tears. In: Ruttkay Z, Kipp M, Nijholt A, Vilhjálmsso HH, editors. International Workshop on Intelligent Virtual Agents. Springer; Amsterdam: 2009. pp. 188–200. [Google Scholar]
  • 38.Bruyer R. The Neuropsychology of Face Perception and Facial Expression. Psychology Press; New York: 2014. [Google Scholar]
  • 39.Kandel ER. Reductionism in Art and Brain Science: Bridging the Two Cultures. Columbia Univ Press; New York: 2016. [Google Scholar]
  • 40.Delaunay B. Bull Acad Sciences USSR VII: Class Sci Math. 1934. Sur la sphere vide; pp. 793–800. [Google Scholar]
  • 41.Bratkova M, Boulos S, Shirley P. oRGB: A practical opponent color space for computer graphics. IEEE Comput Graphics Appl. 2009;29:42–55. doi: 10.1109/mcg.2009.13. [DOI] [PubMed] [Google Scholar]
