Author manuscript; available in PMC: 2015 Jul 1.
Published in final edited form as: Atten Percept Psychophys. 2014 Jul;76(5):1381–1392. doi: 10.3758/s13414-014-0669-4

Emotion recognition (sometimes) depends on horizontal orientations

Carol M Huynh 1, Benjamin Balas 1
PMCID: PMC4096061  NIHMSID: NIHMS579162  PMID: 24664854

Abstract

Face recognition depends critically on horizontal orientations (Goffaux & Dakin, 2010). Face images that lack horizontal features are harder to recognize than those in which that information is preserved. Here, we asked whether facial emotion recognition exhibits the same dependency by having observers categorize orientation-filtered happy and sad expressions. Furthermore, we aimed to dissociate image-based orientation energy from object-based orientation by rotating images 90 degrees in the picture plane. In our first experiment, we showed that the perception of emotional expression does depend on horizontal orientations and that object-based orientation constrained performance more than image-based orientation. In Experiment 2, we showed that mouth openness (i.e., open versus closed mouths) also influenced the emotion-dependent reliance on horizontal information. Lastly, we describe a simple computational analysis demonstrating that the impact of mouth openness was not predicted by variation in the distribution of orientation energy across horizontal and vertical orientation bands. Overall, our results suggest that emotion recognition does largely depend on horizontal information defined relative to the face, but that this bias is modulated by multiple factors that introduce variation in appearance across and within distinct emotions.

Keywords: Face recognition, emotion recognition, orientation

Introduction

Face recognition depends on a restricted range of low-level image features, including specific spatial frequency (SF) bands (Gold, Bennett, & Sekuler, 1999; Yue, Tjan, & Biederman, 2006) and orientation bands. As is the case for all object categories, different spatial frequencies carry different kinds of visual information about face stimuli (Vuilleumier, Armony, Driver, & Dolan, 2003; Goffaux, Hault, Michel, Vuong, & Rossion, 2005; Goffaux & Rossion, 2006), with lower spatial frequencies carrying more information about coarser features (e.g., the face outline) and higher spatial frequencies carrying information about finer details (e.g., texture features or the appearance of the eyes). Mid-range spatial frequencies (~8-16 cycles per face), however, appear to contribute to face recognition disproportionately, whereas for other object classes there do not appear to be sub-bands that contribute disproportionately to recognition (Biederman & Kalocsai, 1997; Collin, 2006). With regard to how orientation sub-bands contribute to face recognition, Dakin & Watt (2009) demonstrated that horizontal orientations appear to contribute disproportionately to famous face identification. The authors asked observers to identify famous faces (celebrities) and found that observers were about 35% accurate when the orientation was near the vertical axis, which differed significantly from the 56% accuracy for orientations near the horizontal axis. Additionally, in a separate computational analysis, the authors discuss the possibility that horizontal structures in face images may be a robust cue for face detection. That is, the typical pattern of horizontally-oriented features in the face may be a cue that is not disrupted by changes in view or illumination and may also reliably distinguish faces from non-faces. The robustness of these structures following typical environmental manipulations may support invariant recognition in many settings. For example, observers are moderately robust to variation in face illumination and viewpoint (Sinha, Balas, Ostrovsky, & Russell, 2006), possibly because neither of these manipulations typically disrupts the structure of the horizontal "bar code" posited by Dakin & Watt. In contrast, both contrast negation and face inversion produce stripes that are highly dissimilar to those of the original image. The disruption of this structure in both circumstances may be the reason contrast negation and face inversion disrupt face recognition so profoundly (Galper, 1970; Yin, 1969).

A number of face processing phenomena depend critically on horizontal orientations within face images. For example, Goffaux & Dakin (2010) showed that the face inversion effect (i.e., the failure to recognize a familiar face when it is presented upside down) was preserved for faces containing only horizontal information, but did not obtain for vertically-filtered faces. They presented upright and inverted pairs of faces, cars, and scenes that contained horizontal information, vertical information, or both. When presented upright, faces containing horizontal information were better processed than faces containing only vertical information. However, when presented upside down, the horizontal advantage was greatly disrupted while faces containing vertical information remained largely unaffected. Identity after-effects are also driven by horizontal information: adapting to one of two faces containing horizontal information (i.e., staring at a face for an extended period of time) affected responses to morphed versions of those same two faces (a shift of the psychometric curve towards the adapting face). Finally, the authors showed that masking horizontal information with visual noise disrupted the ability to match faces across different viewpoints. All three manipulations led the authors to conclude that horizontal structure provides the most useful information about face identity (Goffaux & Dakin, 2010; Dakin & Watt, 2009).

The amount of information available for identification has also been shown to be greatest within the horizontal orientation band. Pachai, Sekuler, and Bennett (2013) found that masking of face images was strongest for noise with orientations at or near 0° (i.e., horizontal) and weakest for orientations at or near 90° (i.e., vertical). Furthermore, the authors calculated absolute efficiency scores and proposed that if observers were using masked face information across all orientation bands equally, then no differences should be observed for faces embedded in noise fields with different orientation energy distributions. This was not the case; rather, Pachai et al. (2013) found that observers were more sensitive to horizontal information when faces were upright than when they were inverted, suggesting that observers were more efficient at utilizing orientations in the horizontal band and less efficient at using information in the vertical band. Finally, the authors showed that sensitivity to horizontal information in upright faces correlated significantly with the size of the face inversion effect, suggesting that horizontal face information was used efficiently, but only for faces presented upright rather than inverted.

However, there is preliminary evidence that horizontal information is not completely dominant over all other orientations. Goffaux & Okamoto-Barth (2013) demonstrated that vertical orientations assist in the processing of gaze information. The authors compared direct with averted gaze by presenting arrays of faces filtered to include horizontal, vertical, or a combination of both types of orientation information. By having subjects search for a target face displaying either type of gaze, the authors found that detection was better for direct than for averted gaze, but only when arrays were comprised of vertically-filtered faces. This suggests that specific facial regions can be useful for communicating relevant social information, and that these orientation bands carry social cues that are distinct from those that carry useful information for individuation. Indeed, the eyes in particular (relative to the nose and mouth) carry important horizontally-oriented information for individuation (Pachai, Sekuler, & Bennett, 2013a), but clearly also carry important vertically-oriented information for gaze perception. These reports suggest that different subsets of orientations may be more or less critical depending on the region of focus within the face and the specific cues that observers require to complete different perceptual tasks. However, for the recognition of whole faces, horizontal information appears to be most important.

Since identification critically depends on horizontal information, but some social cues may depend on a broader or different range of orientations, we chose to investigate how facial emotion recognition depends on orientation information. Bruce and Young's classic model of face perception (1986) proposes dissociable processes for identity and facial expressions (Winston, Henson, Fine-Goulden, & Dolan, 2004; Young, McWeeny, Hay, & Ellis, 1986). Bruce & Young (1986) identified several distinct types of information that can be derived from viewing faces. This perceptual process is broken up into stages, the first being the encoding of structural information, in which abstract descriptions of features are obtained. Following this initial stage, they proposed that expression and identity are analyzed independently from one another by separate systems (the expression analysis and face recognition units). This model has received support from clinical studies of prosopagnosic patients (Palermo, Willis, Rivolta, McKone, Wilson, & Calder, 2011; Duchaine, Parker, & Nakayama, 2003) and behavioral studies of neurotypical individuals. For example, Young, McWeeny, Hay, and Ellis (1986) provided evidence for separate processing of identity and emotional expressions by measuring reaction time in a matching task. They presented pairs of familiar or unfamiliar faces simultaneously and had subjects decide whether the faces showed the same person (identity matching) or the same emotion (expression matching). According to the Bruce and Young model, recognizing expression does not depend on face recognition units, so performance should be similar across familiar and unfamiliar faces. In contrast, identity matching should result in faster responses to familiar than to unfamiliar faces due to the rapid and automatic operation of face recognition units. The results of Young et al.'s task revealed that for identity matching, reaction time was indeed faster for familiar than for unfamiliar faces, while no differences were observed for matching emotional expressions.

Evidence from neuroimaging studies and visual adaptation paradigms also supports the possibility that identity and emotion are processed independently. For example, Winston et al. (2004) were able to distinguish between neural representations for emotion and identity processing using an fMRI adaptation paradigm. Behavioral face adaptation paradigms similarly reveal that identity adaptation depends on both an expression-dependent mechanism and an expression-independent mechanism (Fox, Oruc, & Barton, 2008), the latter providing evidence of independent neural processing of facial emotion. Different emotions (happy vs. sad) also appear to be dissociated neurally (Morris, DeGelder, Weiskrantz, & Dolan, 2001; Calder, Lawrence, & Young, 2001), suggesting that not only is emotion processing neurally distinct from identity processing, but that distinct emotions may be processed by distinct mechanisms.

Altogether, different emotional expressions appear to be processed by distinct neuroanatomical structures (Johnson, 2005; Calder, Lawrence, & Young, 2001). Additionally, they are largely dissociable from identity. Therefore, we hypothesized that the observed bias for horizontal information in identity recognition may not obtain in a facial emotion recognition task, and that orientation biases may also depend on variability in how those emotions are expressed. Indeed, prior reports suggest that not all emotion categories are equally dependent on the same spatial frequencies or orientations. Happy and sad emotion recognition appear to be supported by low (<8 cycles per face) and high (>32 cycles per face) spatial frequencies, respectively (Kumar & Srinivasan, 2011). Yu, Chai, & Chung (2011) measured performance on the categorization of four facial expressions (anger, fear, happiness, and sadness) using multiple orientation filters (i.e., -60°, -30°, 0°, 30°, 60°, 90°) and concluded that horizontal information is critical for the recognition of most emotions, with the exception of fear expressions. When the filter orientation approached vertical, there was a bias towards labeling faces as "fearful," suggesting that diagnostic cues for recognizing fear may be embedded within the vertical rather than the horizontal component, or at least more equally distributed between the two. This result was also borne out by computational modeling simulations that help to explain the differing orientation biases observed by Yu et al. using a model of visual processing based on multi-scale, oriented Gabor filters (Li & Cottrell, 2012).

In the current study, we asked participants to categorize happy and sad faces that were filtered to include information that was predominantly vertical, predominantly horizontal, or both. Furthermore, we used picture-plane rotation (0° or 90°) to dissociate image-based from object-based orientation. For instance, when a horizontally-filtered face image (a stimulus containing predominantly horizontal information) is rotated by 90 degrees, the raw visual orientation becomes vertical although the information defined along the horizontal structure of the face remains present. This manipulation thus allowed us to determine the relative contribution of a putative bottom-up bias for horizontal orientations and higher-level biases for particular facial features. We conducted two experiments using faces expressing genuine emotions (Experiment 1) and faces expressing posed emotions (Experiment 2). This allowed us to examine an ecologically valid set of emotional faces in one task and to complement this analysis with a controlled set of images that made it possible to control for confounds between emotional expression and specific features (e.g., open mouths) that are present in naturally-evoked expressions. We hypothesized that the reliance on horizontal orientations for emotion recognition may depend on the appearance of distinct emotional faces, since particular diagnostic features vary substantially by emotion category. Overall, our results are consistent with this hypothesis, insofar as we found that emotion recognition does largely depend on horizontal orientation information, but that this bias is modulated by factors influencing the appearance of specific emotions, including mouth openness (i.e., open vs. closed). Furthermore, we found that structural orientation relative to the face image, as opposed to raw orientation, drives performance in our tasks. Lastly, we submitted our face images to an analysis of the energy content within the horizontal and vertical orientation bands to determine if our behavioral effects were driven by the relative amounts of orientation energy in the target bands. These results revealed that horizontal orientation energy was consistently larger than vertical orientation energy, but that this difference was not significantly affected by emotional expression or mouth openness. We conclude that the extent to which emotional expressions are recognized with a horizontal orientation bias depends on a number of stimulus factors, suggesting that observers are capable of adopting a flexible strategy for recognition that is not constrained by a front-end horizontal bias.

Experiment 1

In Experiment 1, we investigated whether genuine emotion recognition depended on the horizontal structure of the human face. We also wanted to determine whether image-based orientation or object-based orientation was the more relevant factor constraining performance across orientation conditions.

Method

Participants

Seventeen undergraduate students (11 females/6 males) from North Dakota State University participated in this experiment. All participants reported normal or corrected-to-normal vision. Students provided written informed consent and received course credit for their participation.

Stimuli

Face images of 29 individuals (12 males/17 females) expressing genuine happy and sad emotions were taken from the Tarrlab Face Place database (www.face-place.org) and were 250 × 250 pixels in size. Faces containing certain artifacts (e.g., extensive facial hair) were not chosen, hence the unequal sample of male and female stimuli. We normalized images by subtracting the mean luminance value from each image. We filtered these faces in Matlab 2010A by applying a Fourier transform to each image and multiplying the Fourier energy by either a horizontal or a vertical orientation filter (a Gaussian with a standard deviation of 20 degrees). Our stimuli were then created by applying the inverse Fourier transform to bring the filtered images back into the spatial domain (Figure 1). Following this inverse transformation, all images were readjusted so that mean luminance and contrast were matched (Dakin & Watt, 2009). In addition to the vertically (V) and horizontally (H) filtered images, we also included a third condition (Broadband) comprised of faces that contain broadband orientation information (Figure 1). We had intended this set of images to include only the combination of horizontal and vertical orientation energy from our first two filtered image conditions, but due to an error in our image filtering code, these images instead contain information from all orientations at low spatial frequencies and are restricted to horizontal and vertical orientations only at high spatial frequencies. As a result, these control images are effectively comprised of energy at all orientations, and therefore primarily allow us to compare performance with largely unfiltered images to performance in the horizontal and vertical conditions.
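For concreteness, the sketch below illustrates this kind of orientation filtering in MATLAB. It is only an approximation under stated assumptions: the original filter code (provided by Steve Dakin) was not published, so the input file name, the handling of the DC term, and the exact angular windowing here are illustrative rather than a reproduction of the authors' implementation.

```matlab
% Minimal sketch of the orientation filtering described above (MATLAB).
% Assumptions: the input file name, the treatment of the DC term, and the
% exact angular windowing are illustrative only.
img = double(imread('face.png'));          % 250 x 250 grayscale face
img = img - mean(img(:));                  % subtract mean luminance

F = fftshift(fft2(img));                   % centered Fourier spectrum

% Orientation (in degrees) of each spatial-frequency component
[n, m]   = size(img);
[fx, fy] = meshgrid((1:m) - floor(m/2) - 1, (1:n) - floor(n/2) - 1);
theta    = atan2(fy, fx) * 180 / pi;

% Angular Gaussian window (SD = 20 deg). Horizontal image structure
% corresponds to energy near the vertical spectral axis (theta ~ +/-90 deg);
% set target = 0 for the vertical-structure filter instead.
target  = 90;
sigma   = 20;
angDiff = mod(theta - target + 90, 180) - 90;   % wrap to [-90, 90)
mask    = exp(-angDiff.^2 / (2 * sigma^2));
mask(floor(n/2) + 1, floor(m/2) + 1) = 1;       % keep the DC (mean) term

horizFiltered = real(ifft2(ifftshift(F .* mask)));
% horizFiltered would then be rescaled so that mean luminance and contrast
% match across conditions, as described in the text.
```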

Fig. 1.

Fig. 1

Examples of Experiment 1 faces depicting the same individual filtered to contain horizontal, vertical, or broadband orientation information.

Design

We used a 2 × 2 × 3 within-subjects design with the factors of Emotion (happy, sad), Image Orientation (upright, sideways), and Filter Orientation (vertical, horizontal, broadband). Image orientation was varied in separate blocks, while emotion and filter orientation were pseudo-randomized within each block. Participants completed a total of 348 trials (174 trials in the upright condition and 174 trials in the rotated condition). Block order was counterbalanced so that half of the participants began with the upright-image condition and the remaining half began with the rotated-image condition.

Procedure

Participants viewed the stimuli on a 13-inch MacBook with a 2.4 GHz Intel Core 2 Duo Processor. We recorded participants’ responses using an 8-bit USB controller. Stimuli were presented using PsychToolbox 3.0.10 on a MacOS 10.7.4 system. The participants’ task was to label each face according to the expressed emotion (happy/sad). Participants responded by pressing the “B” button on our controller for “sad” and the “A” button for “happy.” We asked participants to respond as quickly and as accurately as possible. Each trial began with a fixation cross at the center of a gray screen for 500ms, followed by a face stimulus that replaced the fixation cross. The face stimulus remained on the screen until participants either made a response or 2000ms elapsed. Short breaks were offered in between blocks and the experiment resumed only when participants indicated they were ready.
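The sketch below illustrates the timing of a single trial as described above (a 500 ms fixation cross, then a face displayed until a response or a 2000 ms deadline). It assumes an open Psychtoolbox window and a texture made from a filtered face; keyboard polling stands in for the USB controller used in the actual experiment, so the response-collection details are illustrative only.

```matlab
% Minimal sketch of one trial's timing, assuming a Psychtoolbox window
% ('win') is already open on a gray background and 'faceTex' is a texture
% made from one filtered face image. Keyboard responses replace the USB
% gamepad used in the actual study.
fixDur = 0.5;    % fixation cross duration (s)
maxDur = 2.0;    % response deadline (s)

DrawFormattedText(win, '+', 'center', 'center', 0);   % fixation cross
Screen('Flip', win);
WaitSecs(fixDur);

Screen('DrawTexture', win, faceTex);                  % face replaces fixation
onset = Screen('Flip', win);

response = NaN; rt = NaN;
while GetSecs - onset < maxDur
    [keyIsDown, secs, keyCode] = KbCheck;
    if keyIsDown
        rt = secs - onset;                            % latency from face onset
        response = KbName(find(keyCode, 1));          % e.g., 'a' = happy, 'b' = sad
        break;
    end
end
Screen('Flip', win);                                  % clear the display
```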

Results

Sensitivity

We computed estimates of sensitivity (d') using hits (correctly classifying a happy face as happy) and false alarms (incorrectly classifying a sad face as happy) in each condition. We submitted these sensitivity measures to a 2 (image rotation) × 3 (filter orientation) Repeated Measures ANOVA and observed a main effect of image rotation (F(1,16) = 24.62, p < .0001, η2 = 0.11) such that discrimination was better for upright faces (M = 3.26) than for faces rotated sideways by 90 degrees (M = 2.87). We also observed a main effect of filter orientation (F(2,32) = 73.40, p < .0001, η2 = 0.68) such that discrimination was poorest for vertically-filtered faces (M = 2.43), followed by horizontally-filtered faces (M = 3.17) and broadband faces (M = 3.59). Bonferroni post-hoc pairwise comparisons revealed that all three of these values differed from one another (p < .001). These two factors also interacted (F(2,32) = 7.57, p = .002, η2 = 0.04) such that image rotation significantly impacted discrimination for vertically-filtered faces but had no effect on horizontally-filtered faces or faces containing broadband orientation information (Figure 2).
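For reference, the following sketch shows one conventional way to compute d' and the response bias C (analyzed below) from hit and false-alarm counts, treating "happy" responses to happy faces as hits. The trial counts and the log-linear correction are illustrative assumptions; the paper does not report which correction, if any, was applied.

```matlab
% Minimal sketch of the signal-detection measures, assuming MATLAB with the
% Statistics Toolbox (for norminv). Counts and the log-linear correction are
% illustrative assumptions, not values from the original analysis.
nHappy = 29;  nSad = 29;          % example trial counts per condition
hits = 27;    falseAlarms = 3;    % "happy" responses to happy / sad faces

% Log-linear correction keeps rates away from exactly 0 or 1
hitRate = (hits + 0.5) / (nHappy + 1);
faRate  = (falseAlarms + 0.5) / (nSad + 1);

dprime    = norminv(hitRate) - norminv(faRate);           % sensitivity d'
criterion = -0.5 * (norminv(hitRate) + norminv(faRate));  % response bias C
```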

Fig. 2.

Fig. 2

Sensitivity measures for responses in Experiment 1. We find that image rotation significantly impacted discrimination, especially for vertically-filtered faces. Error bars represent +/- 1 s.e.m.

Response Time

A 2 (emotion) × 2 (image rotation) × 3 (filter orientation) Repeated Measures ANOVA of median correct response latencies revealed significant main effects of emotion (F(1,16) = 27.9, p < .001) and filter orientation (F(2,32) = 42.74, p < .001). These effects were driven by longer response latencies for sad faces (774ms) than for happy faces (665ms), and by slower response latencies for vertically-filtered faces (793ms) relative to horizontally-filtered faces (700ms) and faces with broadband orientation information (665ms). There was also a significant interaction between these two factors (F(2,32) = 11.60, p < .001), such that vertical filtering greatly impaired the recognition of sad faces (Figure 3), whereas happy face latencies did not differ between the horizontal and vertical filtering conditions. Unlike our analysis of sensitivity, we observed no interaction between image orientation and orientation filter (F(2,32) = 1.29, p = 0.29).

Fig. 3.

Fig. 3

Average response latency for correct responses in Experiment 1. For both upright and sideways faces, we find that vertically-filtered faces are recognized more slowly than our other filtered images, but only for sad faces. Error bars represent +/- 1 s.e.m.

Criterion

We also ran a 2 (image rotation) × 3 (filter orientation) Repeated Measures ANOVA of response bias, C, and found no significant biases in the way participants were responding in any of our conditions, (F < 1).

Discussion

Our results demonstrate that the facial cues important for emotion recognition were preferentially carried by horizontal orientations. Discrimination ability (as indexed by d' values) was poorer for vertically-filtered images relative to horizontally-filtered images and images with broadband orientation information. We also found that discrimination of emotional faces was worse when images were rotated 90 degrees in the picture plane, but that sideways rotation did not lead to a "flipped" orientation bias. That is, horizontal orientations relative to the object (not the image) yielded better performance than vertical orientations. This suggests that the horizontal bias we have observed is not solely driven by a front-end bias for horizontally-tuned neurons in early vision. Were this the case, we would have expected horizontal information relative to the image to be a better predictor of superior performance. Instead, our effect of planar rotation is largely consistent with previous reports on the effect of inversion on face processing (Freire, Lee, & Symons, 2000; Maurer, Grand, & Mondloch, 2002). A change in orientation disrupts the efficiency of face processing, in our case impacting vertically-filtered faces significantly more than faces with horizontal or broadband orientation energy. This greater disruption for vertically-filtered than for horizontally-filtered faces rotated sideways is inconsistent with previous results showing that rotation impacts both types of information (Yin, 1969; Goffaux & Rossion, 2011; Jacques, d'Arripe, & Rossion). However, one major difference between our results and those of previous studies is that our picture-plane rotation does not involve a complete 180-degree rotation; our face images are not fully inverted but are instead presented sideways. Furthermore, according to Goffaux and Rossion (2007), face inversion does not disrupt vertical and horizontal facial relations to the same degree. Although Goffaux & Rossion (2007) did not investigate orientation bands per se, they did find differences in the extraction of different kinds of relational information within faces. Specifically, they observed the poorest performance for recognizing vertical rather than horizontal facial relations when faces were rotated sideways by 90 degrees. Again, orientation information was not the target of their investigation, but the report that performance differences can arise for different kinds of facial information rotated in the picture plane may help explain the discrepancies between our results and those of previous studies.

In terms of response latency, our main effect of emotion category is consistent with previous reports that happy faces are categorized faster than sad faces (Elfenbein & Ambady, 2003; Kirita & Endo, 1995). With regard to our initial hypotheses about the potential for different emotions to exhibit a differential horizontal bias, we also found that the preference for horizontal information depended on emotion category, since happy face response latencies revealed a reduced horizontal bias relative to sad faces.

Together, our sensitivity and RT data suggest that (a) orientation biases for emotion recognition may manifest in an emotion-dependent manner and that (b) structural orientation relative to the face image (not raw orientation on the retina) drives differential performance in our task.

One important limitation of our first experiment, however, is that our use of genuine emotional expressions may have introduced confounding factors that underlie the interaction we observed between emotion category and filter orientation. Specifically, mouth openness varies substantially in genuine happy and sad faces, and the prevalence of open mouths in happy faces may be the basis of the interaction we observed here. To examine the emotion-dependence of the horizontal orientation bias in more depth, we continued in Experiment 2 by using stimuli from a database of posed emotional expressions that permitted systematic control of mouth openness and emotional expression.

Experiment 2

In Experiment 2, we wished to replicate and extend the results of Experiment 1 using a controlled set of face stimuli where mouth openness could be manipulated. Specifically, we chose to use a set of posed emotions from the NimStim Face Set (Tottenham, Tanaka, Leon, McCarry, Nurse, Hare, Marcus, Westerlund, Casey, & Nelson, 2009) in which the position of the mouth (open vs. closed) was systematically varied in happy and sad emotional expressions.

Methods

Participants

Twenty-one undergraduate students (11 females/10 males) from North Dakota State University participated in this experiment; all reported normal or corrected-to-normal vision. Students provided written informed consent and received course credit for their participation.

Stimuli

Twenty-six individual face images (13 males/13 females) expressing posed emotions (26 happy/26 sad) were taken from the NimStim Face Set (Tottenham et al., 2009) and were 650 × 506 pixels in size. There were two versions of each emotion (Figure 4): one expressed each emotion with a closed mouth and the other with an open mouth (156 closed-mouth/156 open-mouth). The process of filtering our stimuli to obtain horizontally-filtered (H) and vertically-filtered (V) images was identical to that described in Experiment 1. We note, however, that our third condition in this task (HV) was comprised of images containing the combination of the horizontal and vertical orientation sub-bands from the H and V conditions. Compared to Experiment 1 (in which energy at all orientations was included at low spatial frequencies), this third set of images contains restricted orientation information at all spatial frequencies, allowing us to more closely examine the additive effects of horizontal and vertical orientation information. To distinguish between the two types of control images, we refer to the third orientation condition in this task as "HV," in contrast to the "Broadband" stimuli employed in Experiment 1.

Fig. 4.

Fig. 4

Examples of happy and sad faces with closed or open mouths used in Experiment 2. Filtering operations were applied to these images in the same manner as described in Experiment 1.

Design

We used a within-subjects design with the factors of Emotion (happy, sad), Mouth Openness (open, closed), Image Orientation (upright, sideways), and Filter Orientation (vertical, horizontal, horizontal+vertical). Participants completed a total of 624 trials broken up into four blocks: 156 upright with closed mouths, 156 upright with open mouths, 156 rotated with closed mouths, and 156 rotated with open mouths. Emotion and Filter Orientation were randomized within each block. Block order was counterbalanced across participants.

Procedure

All stimulus display parameters and response collection routines were identical to those described in Experiment 1.

Results

Sensitivity

We computed estimates of sensitivity (d') using hits (correctly classifying a happy face as happy) and false alarms (incorrectly classifying a sad face as happy) in each condition. We submitted these sensitivity measures to a 2 (planar rotation) × 2 (mouth openness) × 3 (filter orientation) Repeated Measures ANOVA and observed significant main effects of image rotation (F(1,20) = 22.99, p < .001, η2 = 0.09), mouth openness (F(1,20) = 12.62, p = .002, η2 = 0.016), and filter orientation (F(2,40) = 86.75, p < .001, η2 = 0.40). Discrimination was better for upright faces (M = 2.98) than for sideways faces (M = 2.46) and for open-mouthed (M = 2.81) than for closed-mouthed (M = 2.62) faces. Discrimination was worse for vertically-filtered faces (M = 1.94) than for horizontally-filtered faces (M = 3.04) and faces containing both types of orientation information (M = 3.17). Bonferroni-corrected pairwise comparisons revealed that horizontally-filtered faces did not differ from faces containing both types of orientation information (p = 0.58) – a difference we did obtain in Experiment 1, possibly due to the difference in appearance between the "Horizontal + Vertical" stimuli used in these two tasks. We also observed significant two-way interactions between image rotation and filter orientation (F(2,40) = 3.88, p = .029, η2 = 0.01) and between mouth openness and filter orientation (F(2,40) = 9.12, p = .001, η2 = 0.04). Similar to the results of Experiment 1, vertically-filtered faces were greatly impacted by image rotation such that discrimination was worst for vertically-filtered faces rotated sideways by 90 degrees. Furthermore, discrimination of vertically-filtered faces was significantly worse when mouths were closed than when they were open (Figure 5).

Fig. 5.

Fig. 5

Sensitivity measures for responses in Experiment 2. We find that vertically-filtered faces were greatly impacted by sideways rotation, especially when mouths were closed. Furthermore, we find worse discrimination for vertically-filtered faces with closed than with open mouths. Error bars represent +/- 1 s.e.m.

Response Time

We analyzed the median response latency for correct responses in all experimental conditions using a 2 (emotion) × 2 (image rotation) × 3 (filter orientation) × 2 (mouth openness) Repeated Measures ANOVA. This analysis revealed significant main effects of all factors: emotion (F(1,20) = 29.03, p < .001), image orientation (F(1,20) = 46.08, p < .001), mouth openness (F(1,20) = 6.67, p = .018), and orientation filter (F(2,40) = 140.36, p < .001). Participants were faster to respond to happy than to sad faces (688ms vs. 735ms), to upright than to sideways faces (679ms vs. 743ms), and to faces displaying open rather than closed mouths (695ms vs. 728ms). Participants were also slowest to respond to faces containing only vertical information (M = 794ms), which differed significantly from their response latencies to both horizontally-filtered faces (M = 675ms) and faces containing both orientations (M = 665ms). We also observed a significant two-way interaction between emotion and mouth openness (F(1,20) = 18.71, p < .001), as well as an interaction between emotion and orientation filter (F(2,40) = 13.21, p < .001). These two-way interactions were qualified by a three-way interaction between emotion, mouth openness, and orientation filter (F(2,40) = 10.10, p < .001). In this case, we observed a difference between happy and sad response latencies for open-mouthed, vertically-filtered faces that did not obtain for other combinations of mouth openness and filter orientation (Figure 6).

Fig. 6.

Fig. 6

Average response latency for correct responses in Experiment 2 for upright and sideways faces with open or closed mouths. Error bars represent +/- 1 s.e.m.

Criterion

We ran a 2 (image rotation) × 2 (mouth openness) × 3 (filter orientation) Repeated Measures ANOVA of response bias and found significant main effects of mouth openness (F(1,20) = 16.83, p = .001) and filter orientation (F(2,40) = 55.56, p < .001). These main effects were qualified by a two-way interaction between mouth openness and filter orientation (F(2,40) = 8.58, p < .001), indicating that for vertically-filtered faces, responses were significantly more biased towards "happy" when mouths were open than when they were closed (Table 1). We suggest that this criterion shift may result from participants' tendency to infer that the individuals depicted are expressing happiness when teeth (which contain vertical edges) are visible.

Table 1.

Response Bias, C, for Experiment 2

                              Upright                       Sideways
                              Closed mouth   Open mouth     Closed mouth   Open mouth
Vertical Filter                   −.29          −.48            −.25          −.57
Horizontal Filter                  .03          −.01            −.04           .04
Horizontal+Vertical Filter         .06           .00             .04          −.07

Discussion

Overall, our results from Experiment 2 largely replicated the main results of Experiment 1 with regard to the roles of image orientation, filter orientation, and emotion category in emotion recognition. We found that a 90-degree rotation negatively impacted performance, but did not induce the "flip" between horizontal and vertical orientations that one would expect if image-based information were the relevant factor constraining performance. We also found that horizontally-filtered images were recognized more accurately than vertically-filtered images, but additionally demonstrated that this depends on the openness of the mouth. Specifically, expressions with open mouths are more robustly recognized after vertical filtering than expressions with closed mouths. This suggests that the effect of emotion we observed in the response latency data from Experiment 1 may have largely been driven by the confound between mouth openness and emotion category in genuine emotional expressions. With this confound removed in Experiment 2, we see that the openness of the mouth modulates the magnitude of the horizontal bias. The larger point, however, is that the horizontal bias is affected by stimulus appearance – observers do not exhibit an unwavering bias for horizontal orientations relative to vertical orientations, but exhibit varying levels of bias as a function of stimulus appearance. Finally, participants were significantly biased towards responding "happy" when faces displayed open rather than closed mouths. This bias may reflect the association between visible teeth and happy expressions (e.g., smiles): sad expressions that exposed even a small amount of teeth may have led observers to assume the expression was happy and to respond accordingly. We conclude below by discussing the results of both experiments in more detail, the role of low-level image statistics in determining performance in both of our tasks, and important avenues for further research.

Experiment 3

Given that we have found that the horizontal bias for emotion recognition is affected by stimulus manipulations, a natural question to ask is whether our results are largely driven by the amount of horizontal vs. vertical information in our different categories of images. That is, are observers simply using more horizontal orientation energy because there is more of it in some images than others? In principle, more orientation energy in a particular band does not necessarily mean more diagnostic information in that band, but it is possible that our behavioral effects may simply reflect the low-level statistics of our emotional faces. To examine this, we computed the power spectrum of face stimuli in all conditions in order to make comparisons of the summed energy within horizontal and vertical orientation bands.

Methods

Stimuli

Face stimuli were the same as those used in Experiments 1 and 2.

Design

For the Experiment 1 stimuli, we used a 2 (emotion: happy, sad) × 2 (orientation band: horizontal, vertical) design. For the Experiment 2 stimuli, we used a 2 (emotion: happy, sad) × 2 (mouth openness: open, closed) × 2 (orientation band: horizontal, vertical) design.

Procedure

For the stimuli in each experiment, we computed the power spectrum of each image and summed the energy within the horizontal and vertical orientation bands. We then submitted these summed energies to repeated-measures ANOVAs with orientation band and emotion as factors for our Experiment 1 images, and with orientation band, emotion, and mouth openness as factors for our Experiment 2 images.
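A minimal sketch of this computation is shown below, assuming MATLAB. The ±20-degree half-width of each summation band is an assumption carried over from the filter bandwidth used in Experiments 1 and 2; the paper does not state the exact window used when summing energy.

```matlab
% Minimal sketch of summing power within the horizontal and vertical
% orientation bands of a face image, assuming MATLAB. The +/-20 deg band
% half-width is an assumption, not a value reported in the text.
img = double(imread('face.png'));
img = img - mean(img(:));

P = abs(fftshift(fft2(img))).^2;               % power spectrum

[n, m]   = size(img);
[fx, fy] = meshgrid((1:m) - floor(m/2) - 1, (1:n) - floor(n/2) - 1);
theta    = atan2(fy, fx) * 180 / pi;           % spectral angle, in degrees

% Horizontal image structure corresponds to energy near the vertical
% spectral axis (theta ~ +/-90 deg), and vice versa for vertical structure.
inHorizBand = abs(mod(theta, 180) - 90) <= 20;
inVertBand  = abs(mod(theta + 90, 180) - 90) <= 20;

horizEnergy = sum(P(inHorizBand));             % summed horizontal-band energy
vertEnergy  = sum(P(inVertBand));              % summed vertical-band energy
```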

Results

We found that the horizontal band contained significantly more energy than the vertical band for both the Experiment 1 stimuli (F(1,28) = 19.87, p < .001) and the Experiment 2 stimuli (F(1,26) = 64, p < .001). We did not observe any other main effects or interactions.

Discussion

By itself, our main effect of orientation band is not surprising, given that faces inherently possess more features that are defined by horizontal content, such as the eyes, eyebrows, and mouth (Dakin & Watt, 2009). Critically, however, we found that this main effect did not interact with any of our other factors, nor did these factors significantly affect the summed energies. Our results thus suggest that the variation in the horizontal bias we observed behaviorally was not driven solely by the low-level statistics of our stimulus categories (emotion category, mouth openness). We therefore suggest that our observers were not simply making use of horizontal information to the extent that it dominated our images, but were potentially using a flexible perceptual strategy adapted to the diagnosticity of information in different orientation bands.

General Discussion

Overall, our results demonstrate that emotion processing generally depends on horizontal orientations. We observed better discrimination and faster responses to horizontally-filtered faces (compared to vertically-filtered faces) in both of our experiments. However, other factors, such as the emotion category displayed by our stimuli (Experiments 1 and 2) and the position of the mouth (Experiment 2), also play a critical role in determining the magnitude of this bias. Overall, we suggest that our results demonstrate that the orientation biases affecting emotion recognition are relatively flexible – the extent to which horizontal orientation is favored is not "locked down" by preferential connections to early vision, and it is malleable in response to variation in stimulus appearance.

To speak to the impact of early orientation processing on emotion recognition, we demonstrated that rotating the image 90 degrees in the image plane did not substantially impact the horizontal bias. Instead, the pattern we observed for vertical and horizontal orientations defined relative to the object was not reversed even though their image-based orientations were reversed in the sideways condition (i.e., vertical orientation became horizontal and horizontal became vertical). Orientation biases for emotion recognition are thus defined relative to the face, suggesting that critical features in the face image (rather than raw orientation) dictate performance, but that the specific features that are diagnostic for emotion recognition vary subject to multiple sources of appearance variation. Encoding of face viewpoint, by contrast, has recently been shown to have a substantial image-based orientation component (Balas & Valente, 2012). Balas and Valente (2012) observed adaptation along the image axis, rather than the object axis, when subjects adapted to upright or sideways (90° in the picture plane) faces rotated in depth. Our results are inconsistent with this possibility for mechanisms of emotion recognition. Instead, we suggest that the orientation biases observed for face recognition likely do not result from a purely feed-forward bias for horizontal orientations that propagates downstream to face-sensitive neural loci.

To the best of our knowledge, ours is also the first study to take into account the possible variation of expressions within each emotion category and examine the impact of this variation on information biases for recognition. Here, we have made the distinction between open and closed mouths (Experiment 2) to see if mouth openness affected recognition across orientation bands, based on our observation in Experiment 1 that these factors were typically confounded in genuine emotional expressions. Our results indicated that mouth openness did play a key role, especially in happy faces. This suggests that the presence of specific diagnostic features in particular variants of some emotional expressions modulates the overall dependence on basic features like spatial frequency and/or orientation. In our case, the visibility of the teeth may have provided additional insight into the type of emotion being presented, particularly in the absence of other facial cues. In this study, observers may have taken the presence of vertical lines located within the mouth region as an indication of a happy expression. Therefore, different orientation information becomes more or less important depending on the kind of features that are available for viewing. In general, this suggests that emotion recognition (and potentially other tasks) may in fact be supported by flexible processes that are not completely constrained by global biases affecting low-level feature extraction, but may instead be constrained by specific diagnostic micro-patterns.

We note, however, that discrimination between emotional faces was consistently best for faces containing both horizontal and vertical orientations (Exp. 2) or broadband orientation information (Exp. 1). This sensitivity advantage for faces with broader orientation information was also coupled with the fastest responses during categorization. Thus, having access to a wide range of orientation information is certainly useful for face recognition, indicating that vertical and oblique orientations do contribute (although to a lesser degree) to recognition (Pachai et al., 2013). However, for faces that are limited to a narrow band of orientations, there is clearly an advantage for the presence of horizontal over vertical information in the categorization of facial expressions.

Our results also raise a number of additional intriguing issues for further study. For example, though a 90-degree planar rotation did not affect the orientation biases we observed here, face inversion (a 180-degree rotation) is known to disrupt the processing of facial information in general (Yin, 1969). In addition, it has been shown that observers encode information in the horizontal band differently for upright than inverted faces, indicating differences in observer sensitivity to faces rotated in the picture-plane (Pachai, Sekuler, & Bennett, 2013b) as a function of the orientations that comprise the image. Extending these results to include emotion recognition (and possibly other face recognition tasks) would further clarify how information biases are affected by task demands and the impact of image transformations that are known to disrupt face processing. Determining how the neural response to emotional faces is modulated by orientation filtering would also complement recent results demonstrating that the behavioral effects observed by Dakin & Watt (2009) are also manifested in the responses measured from putative face-sensitive cortical loci (Jacques, Schlitz, & Goffaux, 2011). Our demonstration that the position of the mouth affects the relative bias for horizontal orientations over vertical orientations is potentially interesting to examine in this context, especially given the face-sensitive N170 component's robust responses to isolated eyes (Itier, Alain, Sedore, & McIntosh, 2007). Finally, to our knowledge there are no results describing how biases for orientation bands in face patterns emerge during typical development. Understanding how emotion recognition proceeds as a function of specific emotion categories (and confusions between these) and the biases for spatial frequency and orientation bands could yield important insights into how statistical regularities that define diagnostic image features are learned and applied as the visual system gains experience with complex and socially relevant patterns.

Acknowledgements

This research was supported by NIGMS grant GM103505. Special thanks to Dan Gu for technical support and to Steve Dakin for providing code to filter the images used in this study. Thanks also to Alyson Saville and Christopher Tonsager for their assistance with subject recruitment and data collection.

References

  1. Balas B, Valente N. View-adaptation reveals coding of face pose along image, not object, axes. Vision Research. 2012;67:22–27. doi: 10.1016/j.visres.2012.07.002.
  2. Biederman I, Kalocsai P. Neurocomputational bases of object and face recognition. Philosophical Transactions of the Royal Society of London, Series B. 1997;352:1203–1219. doi: 10.1098/rstb.1997.0103.
  3. Bruce V, Young A. Understanding face recognition. British Journal of Psychology. 1986;77:305–327. doi: 10.1111/j.2044-8295.1986.tb02199.x.
  4. Calder AJ, Lawrence AD, Young AW. Neuropsychology of fear and loathing. Nature Reviews Neuroscience. 2001;2:252–363. doi: 10.1038/35072584.
  5. Collin CA. Spatial-frequency thresholds for object categorization at basic and subordinate levels. Perception. 2006;35:41–52. doi: 10.1068/p5445.
  6. Dakin S, Watt RJ. Biological "bar codes" in human faces. Journal of Vision. 2009;9:1–10. doi: 10.1167/9.4.2.
  7. Duchaine B, Parker H, Nakayama K. Normal recognition of emotion in a prosopagnosic. Perception. 2003;32:827–838. doi: 10.1068/p5067.
  8. Elfenbein H, Ambady N. When familiarity breeds accuracy: Cultural exposure and facial emotion recognition. Journal of Personality and Social Psychology. 2003;85:276–290. doi: 10.1037/0022-3514.85.2.276.
  9. Fox CJ, Oruç I, Barton JJ. It doesn't matter how you feel. The facial identity aftereffect is invariant to changes in facial expression. Journal of Vision. 2008;8(3):11. doi: 10.1167/8.3.11.
  10. Freire A, Lee K, Symons LA. The face-inversion effect as a deficit in the encoding of configural information: Direct evidence. Perception. 2000;29:159–170. doi: 10.1068/p3012.
  11. Galper RE. Recognition of faces in photographic negative. Psychonomic Science. 1970;19(4):207–208.
  12. Goffaux V, Dakin S. Horizontal information drives the behavioral signatures of face processing. Frontiers in Psychology. 2010;1:1–14. doi: 10.3389/fpsyg.2010.00143.
  13. Goffaux V, Hault B, Michel C, Vuong QC, Rossion B. The respective role of low and high spatial frequencies in supporting configural and featural processing of faces. Perception. 2005;34(1):77–86. doi: 10.1068/p5370.
  14. Goffaux V, Okamoto-Barth S. Contribution of cardinal orientations to the "stare-in-the-crowd" effect. Poster presented at the Vision Sciences Society 2013 Annual Meeting; Naples, Florida. 2013.
  15. Goffaux V, Rossion B. Faces are "spatial": Holistic face perception is supported by low spatial frequencies. Journal of Experimental Psychology: Human Perception and Performance. 2006;32(4):1023. doi: 10.1037/0096-1523.32.4.1023.
  16. Goffaux V, Rossion B. Face inversion disproportionately impairs the perception of vertical but not horizontal relations between features. Journal of Experimental Psychology: Human Perception and Performance. 2007;33:995–1001. doi: 10.1037/0096-1523.33.4.995.
  17. Gold J, Bennett PJ, Sekuler AB. Identification of band-pass filtered letters and faces by human and ideal observers. Vision Research. 1999;39(21):3537–3560. doi: 10.1016/s0042-6989(99)00080-2.
  18. Itier RJ, Alain C, Sedore K, McIntosh AR. Early face processing specificity: It's in the eyes. Journal of Cognitive Neuroscience. 2007;19(11):1815–1826. doi: 10.1162/jocn.2007.19.11.1815.
  19. Jacques C, Schlitz C, Collet K, Oever S, Goffaux V. Scrambling horizontal face structure: Behavioral and electrophysiological evidence for a tuning of visual face processing to horizontal information. Poster presented at the Vision Sciences Society 2011 Annual Meeting; Naples, Florida. 2011.
  20. Johnson MH. Subcortical face processing. Nature Reviews Neuroscience. 2005;6:766–774. doi: 10.1038/nrn1766.
  21. Kirita T, Endo M. Happy face advantage in recognizing facial expressions. Acta Psychologica. 1995;89:149–163.
  22. Kumar D, Srinivasan N. Emotion perception is mediated by spatial frequency content. Emotion. 2011;11(5):114–1151. doi: 10.1037/a0025453.
  23. Li R, Cottrell G. A new angle on the EMPATH model: Spatial frequency orientation in recognition of facial expressions. In: Miyake N, Peebles D, Cooper RP, editors. Proceedings of the 34th Annual Conference of the Cognitive Science Society. Cognitive Science Society; Austin, TX: 2012.
  24. Maurer D, Grand RL, Mondloch CJ. The many faces of configural processing. Trends in Cognitive Sciences. 2002;6(6):255–260. doi: 10.1016/s1364-6613(02)01903-4.
  25. Morris JS, DeGelder B, Weiskrantz L, Dolan RJ. Differential extrageniculostriate and amygdala responses to presentation of emotional faces in a cortically blind field. Brain. 2001;124:1241–1252. doi: 10.1093/brain/124.6.1241.
  26. Pachai M, Sekuler A, Bennett P. Masking of individual facial features reveals the use of horizontal structure in the eyes. Poster presented at the Vision Sciences Society 2013 Annual Meeting; Naples, Florida. 2013a.
  27. Pachai M, Sekuler A, Bennett P. Sensitivity to information conveyed by horizontal contours is correlated with face identification accuracy. Frontiers in Psychology. 2013b;4:1–8. doi: 10.3389/fpsyg.2013.00074.
  28. Palermo R, Willis ML, Rivolta D, McKone E, Wilson CE, Calder AJ. Impaired holistic coding of facial expression and facial identity in congenital prosopagnosia. Neuropsychologia. 2011;49:1226–1235. doi: 10.1016/j.neuropsychologia.2011.02.021.
  29. Sinha P, Balas B, Ostrovsky Y, Russell R. Face recognition by humans: Nineteen results all computer vision researchers should know about. Proceedings of the IEEE. 2006;94:1948–1962.
  30. Tottenham N, Tanaka JW, Leon AC, McCarry T, Nurse M, Hare TA, Marcus DJ, Westerlund A, Casey BJ, Nelson C. The NimStim set of facial expressions: Judgments from untrained research participants. Psychiatry Research. 2009;168(3):242–249. doi: 10.1016/j.psychres.2008.05.006.
  31. Vuilleumier P, Armony JL, Driver J, Dolan RJ. Distinct spatial frequency sensitivities for processing faces and emotional expressions. Nature Neuroscience. 2003;6(6):624–631. doi: 10.1038/nn1057.
  32. Winston JS, Henson RNA, Fine-Goulden MR, Dolan RJ. fMRI-adaptation reveals dissociable neural representations of identity and expression in face perception. Journal of Neurophysiology. 2004;92:1830–1839. doi: 10.1152/jn.00155.2004.
  33. Yin RK. Looking at upside-down faces. Journal of Experimental Psychology. 1969;81(1):141–145.
  34. Young AW, McWeeny KH, Hay DC, Ellis AW. Matching familiar and unfamiliar faces on identity and expression. Psychological Research. 1986;48:63–68. doi: 10.1007/BF00309318.
  35. Yu D, Chai A, Chung STL. Orientation information in encoding facial expressions. Poster presented at the Vision Sciences Society 2011 Annual Meeting; Naples, Florida. 2011.
  36. Yue X, Tjan BS, Biederman I. What makes faces special? Vision Research. 2006;46(22):3802–3811. doi: 10.1016/j.visres.2006.06.017.
