Author manuscript; available in PMC: 2020 Jul 1.

Published in final edited form as: Psychol Sci Public Interest. 2019 Jul;20(1):1–68. doi: 10.1177/1529100619832930

Table 2:

Criteria used to evaluate the empirical evidence

	Expression Production	Emotion Perception
Reliability	When a person is sad, the proposed expression (a frowning facial configuration) should be observed more frequently than would be expected by chance. Likewise, for every other emotion category that is subject to a commonsense belief. Reliability is related to a forward inference: given that someone is happy, what is the likelihood of observing a smile, p[set of facial muscle movements ∣ emotion category].	When a person makes a scowling facial configuration, perceivers should consistently infer that the person is angry. Likewise, for every facial configuration that has been proposed as the expression of a specific emotion category. That is, perceivers must consistently make a reverse inference: given that someone is scowling, what is the likelihood that he is angry, p[emotion category ∣ set of facial muscle movements].
	Chance means that facial configurations occur randomly with no predictable relationship to a given emotional state. This would mean that the facial configuration in question carries no information about the presence or absence of an emotion category. For example, in an experiment that observes the facial configurations associated with instances of happiness and anger, chance levels of scowling or smiling would be 50%.	Chance means that emotional states occur randomly with no predictable relationship to a given facial configuration. This would mean that the presence or absence of an emotion category cannot be inferred from the presence or absence of the facial configuration. For example, in an experiment that observes how people perceive 51 different facial configurations, chance levels for correctly labeling a scowling face as anger would be 2%.
	Reliability also depends on the base rate: how frequently people make a particular facial configuration overall. For example, if a person frequently makes a scowling facial configuration during an experiment examining the expressions of anger, sadness and fear, he will seem to be consistently scowling in anger when in fact he is scowling indiscriminately.	Reliability also depends on the base rate: how frequently people use a particular emotion label or make a particular emotional inference. For example, if a person frequently labels facial configurations as “angry” during an experiment examining scowling, smiling and frowning faces, she will seem to be consistently perceiving anger when in fact she is labeling indiscriminately.
	Reliability rates between 70% and 90% provide strong evidence for the commonsense view, between 40% and 69% provide moderate support for the commonsense view, and between 20% and 39% provide weak support (Ekman, 1994; Haidt & Keltner, 1999; Russell, 1994).	Reliability rates between 70% and 90% provide strong evidence for the commonsense view, between 40% and 69% provide moderate support for the commonsense view, and between 20% and 39% provide weak support (Ekman, 1994; Haidt & Keltner, 1999; Russell, 1994).
Specificity	If a facial configuration is diagnostic of a specific emotion category, then the facial configuration should express instances of one and only one emotion category better than chance; it should not consistently express instances of any other mental event (emotion or otherwise) at better than chance levels. For example, to be considered the expression of anger, a scowling facial configuration must not express sadness, confusion, indigestion, an attempt to socially influence, etc. at better than chance levels.	If a frowning facial configuration is perceived as the diagnostic expression of sadness, then a frowning facial configuration should only be labeled as sadness (or sadness should only be inferred from a frowning facial configuration) at above chance levels. And it should not be consistently perceived as expressions of any mental states other than sadness at better than chance levels.
	Estimates of specificity, like reliability, depend on base rates and on how chance levels are defined.	Estimates of specificity, like reliability, depend on base rates and on how chance levels are defined.
Generalizability	Patterns of reliability and specificity should replicate across studies, particularly when different populations are sampled, such as infants, congenitally blind individuals and individuals sampled from diverse cultural contexts, including small-scale, remote cultures. High generalizability across different circumstances ensures that scientific findings are generalizable.	Patterns of reliability and specificity should replicate across studies, particularly when different populations are sampled, such as infants, congenitally blind individuals and individuals sampled from diverse cultural contexts, including small-scale, remote cultures. High generalizability across different circumstances ensures that scientific findings are generalizable.
Validity	Even if a facial configuration is consistently and uniquely observed in relation to a specific emotion category across many studies (strong generalizability), it is necessary to demonstrate that the person in question is really in the expected emotional state. This is the only way that a given facial configuration leads to accurate inferences about a person’s emotional state. A facial configuration is valid as a display or a signal for emotion if and only if it is strongly associated with other measures of emotion, preferably those that are objective and do not rely on anyone’s subjective report (i.e., a facial configuration should be strongly and consistently related to perceiver-independent evidence about the emotional state of the expresser).	Even if a facial configuration is consistently and uniquely labeled with a specific emotion word across many studies (strong generalizability), it is necessary to demonstrate that the person making the facial configuration is really in the expected emotional state. This is the only way that a given perception or inference of emotion is accurate. A perceiver can be said to be recognizing an emotional expression if and only if the person being perceived is verifiably in the expected emotional state.

Note: Reliability is also related to sensitivity, consistency, informational value, and the true positive rate (for further description, see Figure 3). Specificity is related to uniqueness, discreteness, the true negative rate and referential specificity. In principle, we can also ask more parametrically whether there is a link between the intensity of an emotional instance and the intensity of facial muscle contractions, but scientists rarely do.
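
The quantities in Table 2 can be made concrete with a short sketch. The Python below is illustrative only and is not taken from the article: the function names (chance_level, support_band, reverse_inference), the assumption that every response option is equally likely under chance, the rounding of rates to whole percentages, and the example probabilities are all assumptions introduced here. It reproduces the chance levels from the table's examples, maps a reliability rate onto the strong, moderate and weak bands, and uses Bayes' rule to turn forward inferences p[facial configuration ∣ emotion category] into reverse inferences p[emotion category ∣ facial configuration].

```python
from typing import Dict


def chance_level(n_options: int) -> float:
    """Chance level when each of n_options is equally likely (an assumption)."""
    if n_options < 1:
        raise ValueError("n_options must be at least 1")
    return 1.0 / n_options


def support_band(rate: float) -> str:
    """Map a reliability rate in [0, 1] onto the bands quoted in Table 2.

    The table states bands only for 20% to 90%; rates outside that range
    are reported as such rather than extrapolated. Rounding to a whole
    percentage is an assumption of this sketch.
    """
    pct = round(rate * 100)
    if 70 <= pct <= 90:
        return "strong"
    if 40 <= pct <= 69:
        return "moderate"
    if 20 <= pct <= 39:
        return "weak"
    return "outside the stated bands"


def reverse_inference(
    p_face_given_emotion: Dict[str, float],
    p_emotion: Dict[str, float],
) -> Dict[str, float]:
    """Bayes' rule: p[emotion | face] from p[face | emotion] and p[emotion].

    The denominator p_face is the overall base rate of the facial
    configuration, which is why an indiscriminate expresser carries no
    information about any one emotion category.
    """
    joint = {e: p_face_given_emotion[e] * p_emotion[e] for e in p_emotion}
    p_face = sum(joint.values())
    if p_face == 0:
        raise ValueError("the facial configuration never occurs")
    return {e: j / p_face for e, j in joint.items()}


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the base-rate point.
    equal_priors = {"anger": 1 / 3, "sadness": 1 / 3, "fear": 1 / 3}
    indiscriminate = {"anger": 0.8, "sadness": 0.8, "fear": 0.8}
    specific = {"anger": 0.8, "sadness": 0.1, "fear": 0.1}

    print(chance_level(2))                # two emotion categories
    print(round(chance_level(51) * 100))  # 51 options, as a whole percentage
    print(support_band(0.8))
    print(reverse_inference(indiscriminate, equal_priors)["anger"])
    print(reverse_inference(specific, equal_priors)["anger"])
```

In this sketch, a person who scowls in 80% of instances of every category looks highly reliable for anger in the forward direction, yet the reverse inference to anger stays at 1 in 3, the chance level for three categories. This is the indiscriminate scowling described in the base-rate row of the table.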