GMS Psycho-Social-Medicine. 2013 Jun 17;10:Doc04. doi: 10.3205/psm000094

The effect of forced choice on facial emotion recognition: a comparison to open verbal classification of emotion labels

Der Effekt eines geschlossenen Antwortformats auf die mimische Emotionserkennung: ein Vergleich mit der freien verbale Zuordnung von Emotionswörtern

Kerstin Limbrecht-Ecklundt 1,*, Andreas Scheck 1, Lucia Jerg-Bretzke 1, Steffen Walter 1, Holger Hoffmann 1, Harald C Traue 1
PMCID: PMC3687244  PMID: 23798981

Abstract

Objective: This article examines potential methodological problems in applying a forced-choice response format in facial emotion recognition.

Methodology: 33 subjects were presented with validated facial stimuli. The task was to make a decision about which emotion was shown. In addition, the subjective certainty concerning the decision was recorded.

Results: The detection rates are 68% for fear, 81% for sadness, 85% for anger, 87% for surprise, 88% for disgust, and 94% for happiness, and are thus well above the random probability.

Conclusion: This study refutes the concern that the use of forced choice formats may not adequately reflect actual recognition performance. The use of standardized tests to examine emotion recognition ability leads to valid results and can be used in different contexts. For example, the images presented here appear suitable for diagnosing deficits in emotion recognition in the context of psychological disorders and for mapping treatment progress.

Keywords: basic emotions, facial emotion recognition, open-response format, decision making

Introduction

The ability to recognize emotions is a key criterion for social interaction. In interpersonal communication, emotions are exchanged not only through complex language semantics but also through emotional facial expressions [1], [2]. Partners in an interpersonal interaction analyze each other’s facial expressions to gain information about one another’s emotional states. This can provide important information for predicting the course of the conversation and the initiation of action tendencies. The foundation of the ability to recognize emotions is the capacity to decipher emotional facial expressions, which can be estimated in standardized contexts. A common approach to testing emotion recognition ability is to present subjects with emotional stimuli, usually emotional facial expressions based on the six basic emotions according to Ekman [3], [4], [5] at the time of their greatest intensity, and to let them choose from a series of alternatives which emotion is being expressed. Such a specification of alternatives corresponds to a closed answer format, also called forced-choice format. The advantage of this approach is a quicker and easier evaluation of the answers. Immediately after the assessment, for example, the sum of correctly recognized emotions and mistakes can be calculated [6]. Unfortunately, the use of this response format also has some potential disadvantages. In 1993, James A. Russell [7] pointed out that the conclusions drawn from the use of a closed-answer format are not clear. The presence of several options leads to a relative judgment; i.e., the subject chooses the answer that appears most likely. According to Russell, the most likely answer alternative does not necessarily reflect the answer that was actually intended by the subject. In his study [7], only 12.5% chose the “anger” option for an anger stimulus from the JACFEE photo set [8] when the subjects were free to choose from different emotion words.
A similar effect has also been observed in the identification of suspects, e.g., [9], [10]. Significantly more false accusations resulted when several potential suspects were presented together in a lineup that did not include the perpetrator. This is because the person identified is often the one who most resembles the perpetrator – that is, a relative judgment is made. The number of misidentifications decreases if each person is shown individually and the subject must make a direct judgment as to whether the person is the perpetrator or not (absolute judgment). Wagner [11] also points out that the use of categories always results in a change in the selection probability. If response alternatives are presented, it is conceivable that not only the actual recognition performance but also some “lucky strikes” are recorded due to the random selection of the correct answer. The recognition rate would then not accurately reflect the actual performance. In addition, the possible answers already represent a selection. Adolphs [12], for example, critically discusses whether “fear” and (positive) “surprise” are distinct emotions or subcategories of the (neutral) “surprise.” Tracy and Robins [13] note that only one positive category (“happiness”) is available for selection in most cases, although other positive basal emotional expressions are conceivable (e.g., “pride”). It is also possible, however, that the recognition of basic emotions is so little prone to error that a closed-response format provides no real assistance to the “average healthy adult” and that a type of ceiling effect is present anyway.

The present study investigates the effect of using an open-response format. In addition, it examines the differences that result from using an open- compared to a closed-response format. For this purpose, the recognition rates of another study based on a closed-response format were used, in which the subjects evaluated the same stimuli (for detailed information see [14]).

This results in the following research questions:

  1. Are there certain emotion words that are particularly frequently named by the study participants and can provide an indication of a certain emotional expression being reliably detected?

  2. Can the mentioned emotion words be assigned to specific categories?

  3. Do the recognition rates and error rates differ between the open- and closed-response formats?

  4. Can a correlation be found between recognition rates and the subjects’ subjective certainty in evaluating the emotion?

It was hypothesized that the processing of facial stimuli is such a basal task in interacting with others that correct classification occurs even when no predefined response categories are provided. If this hypothesis is confirmed, it follows that the use of standardized methods for measuring emotion recognition in research and clinical practice is unproblematic. The application of such standardized procedures is conceivable in many areas. One potential area of application is the clinical context. Patients with mental (e.g., anxiety disorder, depression) and neurological (e.g., stroke, lesions) disorders often exhibit specific deficits in the area of emotion recognition, which can lead to problems in social interaction, e.g., [15], [16], [17], [18]. In this context, some mental disorders are associated with the increased occurrence of one or several emotions (depression, anxiety), while others are coupled with deficits and a lack of coherence in this area (e.g., psychopathy, schizophrenia). The goal of many therapeutic treatment approaches, for example for depression or social phobia, is the optimization of social contacts [19]. It is known that experiencing anger can lead to an intrapersonal sense of injustice. For the interaction partner, the perception of anger often leads to a fear response [20]. Numerous studies show specific deficits in emotion recognition ability, which can be seen as a part of emotion regulation and processing. One prominent example is panic disorder: Patients with this disorder show general deficits in recognizing emotions, coupled with a tendency to favor anger in their selection [21].

It thus makes sense to use tests for assessing the ability to recognize emotions to evaluate treatment progress insofar as it can be assumed that an improvement in the underlying pathology is associated with a less pronounced deficit in the recognition of certain emotions [22], [23]. Moreover, it is conceivable to use the stimulus material for training purposes, for example to render social interactions more satisfactory by practicing empathic reactions to match certain cues. Initial successes have already been achieved in the treatment of autistic patients [24]. The validity of the images used and the response behavior must be ensured, however. The latter is the focus of this study.

Methodology

Sample

The study was conducted among N=33 subjects. All participants were recruited through postings at the University of Ulm and gave written informed consent to voluntarily participate in the study (ethics committee decision 245/08_UBB/se). The average age of the participants was M=26.39 years (SD=3.65). Only subjects who were no older than 35 were included in the study. This age range was chosen to match that of the people shown in the photographs to be assessed and to control for any potential own-age bias [25], [26], [27]. The proportion of women was 69%. The overall education level of the sample is high; all subjects completed their Abitur (general qualification for university entrance). Furthermore, only native German speakers were included in the study. This approach was chosen to maximize the number of mentioned emotion words. The subjects took no medications for the treatment of mental disorders and were not currently undergoing psychiatric, neurological, or psychotherapeutic treatment.

Study design

The subjects were asked to complete a demographic questionnaire (age, gender, education). This was followed by a briefing about the experiment. The subjects were presented with 96 emotional facial expressions – from a total of 48 persons – from the “Pictures of Facial Affect-Ulm” image set [14], including expressions of fear, anger, disgust, happiness, sadness, and surprise. Each person was represented by two facial expressions. This FACS-based set of images [5] is highly standardized; according to an initial study, the recognition rates are at least 75% for all six basic emotions [14]. In addition to the recognition rate, the genuineness and intensity of the emotional expression shown, as well as the attractiveness of the actors, were taken into account [28] to minimize systematic differences with regard to these variables. For this purpose, these three factors were assessed on a separate scale for each emotional stimulus. Every effort was made to select images that were judged to be as similar as possible with regard to these three factors. A total of 16 emotional expressions per basic emotion were used. The gender of the persons shown was balanced. The images were pseudo-randomized, i.e., arranged in a random order that was the same for all subjects, and presented as paper prints. The size of the images was 5.9 cm × 4.6 cm. The subjects were asked to judge the presented emotions verbally – without predefined categories (first assessment). Furthermore, they were asked to indicate their subjective certainty on a five-point Likert scale from 1 = very uncertain (in their assessment) to 5 = very certain (in their assessment). If the subjects were unable to decide on an emotion, they had the opportunity to note alternative emotion words in another field (second assessment).
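The stimulus layout described above (six basic emotions, 16 expressions each, one shuffled sequence shown to every subject) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the seed value, the function name, and the (emotion, index) labels are hypothetical and do not reproduce the actual order or image identifiers used in the study.

```python
import random

EMOTIONS = ["fear", "anger", "disgust", "happiness", "sadness", "surprise"]
IMAGES_PER_EMOTION = 16  # 6 emotions x 16 expressions = 96 stimuli


def build_stimulus_order(seed=0):
    """Return one shuffled list of (emotion, index) pairs.

    The seed is a hypothetical choice; fixing it means every call,
    and therefore every subject, gets the identical sequence.
    """
    stimuli = [
        (emotion, index)
        for emotion in EMOTIONS
        for index in range(1, IMAGES_PER_EMOTION + 1)
    ]
    rng = random.Random(seed)  # local generator, leaves global state untouched
    rng.shuffle(stimuli)
    return stimuli


order = build_stimulus_order()
```

Because a single fixed sequence is reused, order effects cannot be separated from stimulus effects, which is the concern raised in the Limitations section.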

All emotion words were then presented for assessment to two naïve judges blind to the hypothesis, who assigned the data obtained to the six basic emotion categories. In addition, a category was introduced for words that were not attributable to the basic emotions. Here, the judges had to decide whether the word pertained to a positive or a negative emotion. The purpose of these additional categories is to provide an overview of the emotion words that cannot be assigned to the basic emotions. The recognition performance for each individual basic emotion was determined and relative judgments were minimized through the use of these two additional categories.

Results

The following analyses were used to answer the above questions: In the first step, the frequencies of each emotion word were calculated and compared with each other (question 1). These emotion words were then assigned to the existing emotion categories by two independent judges who were blind to the hypothesis. The results of this classification and the inter-rater agreement were calculated (question 2). Subsequently, the results of the free judgment were compared with the recognition rates in a paradigm with a closed-response format by means of logistic regression, as the individual recognition rates were used for the calculation [14] (question 3). In the next step, the error rates (error matrices) were examined (question 3). In the last step, correlation analyses were used to assess whether a correlation between the subjective certainty and recognition rate could be found (question 4).

The analysis of the emotion classification yielded a total of 245 different emotion words, including variations of a root word (e.g., “anger” and “angry” were counted as two different emotion words). Of these, 190 emotion words arose from the first assessment; another 55 words arose from the second assessment. The latter were chosen by the subjects if they thought that an additional emotion word applied or if they were unable to commit to just one emotion. All emotion words were included in the assessment process for review.

The statistical analysis was performed using IBM SPSS Statistics 20©. When assigning categories, the raters achieved an overall inter-rater agreement of 72.25% (κ=.655, p<.001). This can be considered substantial agreement [29]. The words on which the raters initially disagreed were rated again using a “forced-choice” method, this time by both raters together. This allowed all 245 emotion words to be assigned to one of the eight possible categories. The distribution can be found in Table 1 (Tab. 1); an overview of all emotion words is available in Attachment 1.
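To make the two agreement figures concrete, the following minimal Python sketch computes percent agreement and Cohen's kappa for two raters. The rating lists are invented toy data, not the study's 245 words, and the function names are illustrative; the original analysis was run in SPSS.

```python
from collections import Counter


def percent_agreement(rater_1, rater_2):
    """Share of items on which both raters chose the same category."""
    matches = sum(a == b for a, b in zip(rater_1, rater_2))
    return matches / len(rater_1)


def cohens_kappa(rater_1, rater_2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_1)
    p_observed = percent_agreement(rater_1, rater_2)
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    categories = set(rater_1) | set(rater_2)
    # Chance agreement from each rater's marginal category frequencies.
    p_expected = sum(counts_1[c] * counts_2[c] for c in categories) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical category assignments for eight emotion words.
rater_a = ["anger", "anger", "fear", "fear",
           "positive", "negative", "disgust", "disgust"]
rater_b = ["anger", "negative", "fear", "surprise",
           "positive", "negative", "disgust", "disgust"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
```

Kappa is lower than raw agreement because some matches are expected by chance alone, which is why both values are reported.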

Table 1. Assignment of the 245 emotional labels to the eight predefined categories.


It turns out that, overall, offering the option of “positive or negative emotion” contributed to a reduction in misidentifications, since the two raters were not forced to use one of the six predefined categories. As can be seen in Table 2 (Tab. 2), these two additional categories were used as an alternative response option for all six emotions.

Table 2. Recognition rates derived from the assignments in the rating procedure (in percent).


The next step was to count how often each emotion word was named by the 33 subjects. Only the first judgment was included in this analysis, since it is understood as the first preference in the assignment of an emotional expression to an emotion word. Moreover, the additional response option was not used equally by all participants. The analysis of the responses shows that there are a few emotion words that were named by many subjects. The greatest agreement is found for “Ekel” (English: “disgust”). This term was mentioned 458 times. Common words also included “Freude” (428x; English: “happiness”), “Trauer” (304x; English: “sadness”), “Überraschung” (303x; English: “surprise”), “Angst” (297x; English: “fear”), and “Wut” (227x; English: “rage”). “Ärger” (English: “anger”) was mentioned 98 times. Also common were “überrascht” (82x; English: “surprised”), “Zorn” (70x; English: “enragement”), “traurig” (53x; English: “sad”), “Erstaunen” (52x; English: “astonishment”), and “Traurigkeit” (44x; English: “unhappiness”). All other emotion words were mentioned between 1 and 33 times. A preference for a few emotion words is thus apparent.

After the rating procedure in which the two independent raters blind to the hypothesis assigned the emotion words to the existing categories, the recognition rates for the six basic emotions were calculated. For this, the recognition rates per person were determined and then averaged. This resulted in the following average recognition performance: 94% for happiness (SD=13.71), 88% for disgust (SD=15.31), 87% for surprise (SD=14.90), 85% for anger (SD=17.41), 81% for sadness (SD=18.91), and 68% for fear (SD=21.82). For example, in 94% of all cases, an emotion word that fit the category of “happiness” was assigned to a joyful expression. The statistical analysis using a multifactorial general linear model with the subject variable of “subject” and the intra-subject factors of “emotion” and “stimulus image” resulted in a significant main effect for “emotion” (Wald χ²(5, 33)=41.012; p<.001). Post-hoc analyses were performed to compare the differences in recognition performance for each emotional category. The recognition rate for fear was significantly lower than those for all other emotions (p<.05). In addition, there was a significant difference between happiness and sadness (p=.011). All other comparisons were not significant (p>.05).
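The averaging step (one recognition rate per person, then mean and standard deviation across persons) can be illustrated as follows. The three subjects and their category assignments are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which formula was applied.

```python
from statistics import mean, stdev


def recognition_rate(assigned_categories, shown_emotion):
    """Percentage of one subject's judgments that match the shown emotion."""
    hits = sum(category == shown_emotion for category in assigned_categories)
    return 100.0 * hits / len(assigned_categories)


# Hypothetical rater-assigned categories for the 16 fear images per subject.
fear_judgments = {
    "s01": ["fear"] * 12 + ["surprise"] * 4,
    "s02": ["fear"] * 10 + ["surprise"] * 3 + ["disgust"] * 3,
    "s03": ["fear"] * 14 + ["sadness"] * 2,
}

# Step 1: one rate per subject. Step 2: average across subjects.
rates = [recognition_rate(judged, "fear") for judged in fear_judgments.values()]
mean_rate = mean(rates)
sd_rate = stdev(rates)  # sample SD (n - 1 denominator), an assumption here
```

Averaging per-person rates gives each subject equal weight, and the resulting SD describes variation between subjects rather than between images.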

This raises the question of whether these recognition rates are comparable to those that result when using a closed-response format. To answer this question, a sample was used, which had assessed the same images in a previous study [14] with the difference that the six basic emotions were specified as answer choices (N=63; 63% women, 37% men; average age = 27.48 years). The mentioned study also contains a detailed description of the sample. The following recognition rates from the previous “forced-choice” study can be used for comparison: 98.81% for happiness (SD=4.33), 93.84% for surprise (SD=10.07), 93.57% for anger (SD=11.32), 92.12% for disgust (SD=11.22), 83.74% for sadness (SD=15.23), and 75.89% for fear (SD=24.30). The raw data from this study were available. Subsequently, the two samples were compared using logistic regression with the aim to reveal potential differences in the recognition rates caused by the use of a closed- versus open-response format. No significant differences were found for fear (Wald χ²(1, 96)=1.984; p=.159), disgust (Wald χ²(1, 96)=0.697; p=.404), and sadness (Wald χ²(1, 96)=0.454; p=.501). The closed-answer format resulted in significantly better recognition rates only for anger (Wald χ²(1, 96)=11.063; p=.001), happiness (Wald χ²(1, 96)=17.504; p<.001), and surprise (Wald χ²(1, 96)=8.034; p=.005).

Descriptively, it can be stated that fear, which in the present study has a recognition rate of 68%, was most often confused with “surprise” (almost 10%). It can also be observed that, in the study described here, surprise was for the most part confused with fear (also approx. 10%), thus indicating a tendency for confusion in both directions. In addition, fear was also relatively frequently mistaken for disgust (almost 9%) and sadness (almost 5%). Happiness had a recognition rate of 94%. Disgust was mistaken for anger 6% of the time; sadness for fear 4% of the time.
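Confusion percentages of this kind come from an error matrix that cross-tabulates the emotion shown against the category assigned. A minimal Python sketch, using invented trials rather than the study data and a hypothetical function name, might look like this:

```python
from collections import Counter, defaultdict


def confusion_percentages(trials):
    """Map each shown emotion to {assigned category: percent of its trials}.

    `trials` is an iterable of (shown_emotion, assigned_category) pairs.
    """
    counts = defaultdict(Counter)
    for shown, assigned in trials:
        counts[shown][assigned] += 1
    return {
        shown: {
            assigned: 100.0 * n / sum(row.values())
            for assigned, n in row.items()
        }
        for shown, row in counts.items()
    }


# Hypothetical trials: ten fear images and ten surprise images.
trials = (
    [("fear", "fear")] * 7
    + [("fear", "surprise"), ("fear", "disgust"), ("fear", "sadness")]
    + [("surprise", "surprise")] * 9
    + [("surprise", "fear")]
)

matrix = confusion_percentages(trials)
```

Reading a row gives the recognition rate on its diagonal entry and the misidentifications in the remaining entries, so a reciprocal fear and surprise confusion shows up as matching off-diagonal cells in both rows.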

In addition to assessing the stimulus material, the 33 subjects were also asked to specify how certain they felt about their assessment. Certainty averaged 4.21 (on a 5-point scale). Only minor differences could be found with regard to the individual emotion categories: The certainty rating was lowest for fear (3.79) – the emotion with the lowest recognition rates – and highest for happiness (4.56) – the emotion with the highest recognition rates (see also Table 3 (Tab. 3)).

Table 3. Mean and standard deviation of the subjective certainty for identifying facial expressions.


There was a moderate positive Pearson correlation between the stated certainty of the judgment and the correctness of the emotion named for anger (r=.566, p=.001), disgust (r=.500, p=.004), and surprise (r=.530, p=.003). This means that a higher subjective certainty score for these three emotions was associated with higher accuracy in the assessment of the emotions shown. Only weak, non-significant correlations were found for fear (r=.183, p=.352), happiness (r=.201, p=.277), and sadness (r=.182, p=.336). The correlations were calculated from each subject’s average recognition rate and average subjective certainty for the facial expressions within one emotion category. A Pearson correlation between subjective certainty and the number of alternative emotions mentioned in the second judgment revealed no significant relationships (all p>.08).
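For readers unfamiliar with the statistic, Pearson's r between per-subject certainty and per-subject recognition rate can be computed as sketched below. The four data points are made up for illustration, the function name is hypothetical, and the sketch does not compute the accompanying p-values.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (spread_x * spread_y)


# Hypothetical per-subject means for one emotion category.
certainty = [3.5, 4.0, 4.5, 5.0]         # mean certainty on the 1-5 scale
recognition = [60.0, 70.0, 90.0, 80.0]   # recognition rate in percent

r = pearson_r(certainty, recognition)
```

One pair of values enters per subject, so with N=33 each reported coefficient rests on 33 points per emotion category.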

Discussion

The assumption of Russell [7] or Wagner [11], that the basic emotions will not be clearly recognized without a closed-response format, cannot be definitely confirmed in this study. On the contrary, this study shows that the basic emotions are reliably – i.e. significantly better than occurring randomly – recognized even without the specification of answer choices. The results can be interpreted as a possible indication of the universality and particular relevance of basic emotions in interpersonal communication. In the present paradigm, the subjects reliably identified the emotion expressions even without specified alternatives. The resulting recognition rates are comparable to our own results and also to the results of other research groups, obtained on the basis of a forced-choice response format (see Table 4 (Tab. 4)).

Table 4. Overview of recognition rates in picture sets (with forced choice, in percent).


It is striking that, among the six basic emotions, the greatest number of different words was found to describe “surprise,” the fewest to describe “disgust.” The reasons for this are not clear. Perhaps the divergence is due to the causal attribution: While the triggering stimulus for the feeling of “disgust” seems relatively unambiguous, surprise may occur due to an unexpected, dangerous, positive, negative, etc. situation, resulting in the necessity for more words for differentiation. Furthermore, the subjects exhibit a preference for those emotion words that are used to describe the six basic emotions in a research context. Only the term “Wut” (English: “rage”) appears to be more common in everyday situations than the word “anger,” as it was mentioned 2.3 times more frequently.

No consistent result can be derived from previous studies regarding the order of recognition performance pertaining to the various emotions [30], [31], [32], although there is a trend toward fear frequently being the least and happiness being the best recognized emotion. This can be explained by the fact that differences in the sequence of correctly recognized emotions can be attributed to various factors: implementation of the presentation of emotional stimuli, ethnicity of subjects and of the individuals depicted in the photographs, gender, education, sample size, and the age of the raters are just some examples of aspects that may affect recognition performance. Overall, the results upon visual inspection show good agreement with previous research results, and the recognition rates appear to be comparable. A comparison with a study that used the same images but a forced-choice response format [14] shows that predefined answer choices yielded higher recognition rates only for anger, happiness, and surprise. Fear, disgust, and sadness were recognized equally well in the open- and closed-response formats.

Regarding the misidentification of certain emotional expressions, there was a reciprocal tendency to confuse fear and surprise. This result is consistent with the results of Palermo and Coltheart [33], who examined the recognition performance concerning 336 emotional images from various image sets. It can be assumed that fear and surprise are mistaken for each other due to their great similarities in facial expressions.

The subjective certainty only partly correlated with the actual recognition performance, namely for the emotions of anger, disgust, and surprise. The reference system of humans, namely the process of self-assessment, is thus not perfect. Whether people tend to over- or underestimate themselves or whether there are correlations with certain personality characteristics was not systematically examined in this context. This would certainly be an interesting research question for follow-up studies, however.

Conclusion for clinical practice

The present study provides initial evidence that the use of closed-response formats in the research on emotion recognition ability can be regarded as unproblematic. Further systematic studies must follow, for example a within-sample comparison of performance under closed- and open-response formats. The results found here are encouraging as far as the use of closed-response formats for the testing of emotion recognition ability is concerned. The advantages of a closed-response format include the simple and quick evaluation of the results. If the results found here can be confirmed, use in the treatment of mental disorders such as anxiety disorders, depression, or borderline personality disorder can be considered. Application in training to identify specific emotional reactions of others is also conceivable, so that social interactions may become more satisfactory. Which emotion words best characterize the emotional stimuli should be thoroughly researched in advance. This study showed, for example, that in this (German) study population “rage” (“Wut”) was used far more often than “anger” (“Ärger”), the label frequently used in research.

Limitations

The study has N=33 participants. The willingness to participate was rather low due to the high expenditure of time necessary to complete the tasks (1 to 2 hours have been reported). Approximately 70 people were made aware of the study, meaning that the response rate was only about 50%. Nevertheless, the sample size is comparable with that of other studies, e.g. [34].

Using the same image sequence for all subjects may be problematic. Potential order effects could not be analyzed with the existing data set.

This study only included subjects with a high level of education. This limits the generalizability of the present results.

Notes

Competing interests

The authors declare that they have no competing interests.

Acknowledgments

This project was supported by the Transregional Collaborative Research Center SFB/TRR 62 “Companion-Technology for Cognitive Technical Systems,” funded by the German Research Foundation, DFG.

Supplementary Material

Appendix A: Emotional labels
PSM-10-04-s-001.pdf (94.7KB, pdf)

References


Articles from GMS Psycho-Social-Medicine are provided here courtesy of German Medical Science
