Abstract
The ability to infer others’ emotions is important for social communication. This study examines three key aspects of emotion perception about which relatively little is known: (1) the evaluation of the intensity of portrayed emotion, (2) the role of contextual information in the perception of facial configurations, and (3) developmental differences in how children perceive co-occurring facial and contextual information. Two experiments examined developmental effects on the influence of congruent, incongruent, and neutral situational contexts on participants’ reasoning about others’ emotions, both with and without emotion labels. Experiment 1 revealed that participants interpreted others’ emotions to be of higher intensity when facial movements were congruent with contextual information. This effect was greater for children than for adolescents and adults. Experiment 2 showed that without verbal emotion category labels, adults relied less on context to scale their intensity judgments, but children showed the opposite pattern: in the absence of labels, children relied more on contextual information than on facial information. Making accurate inferences about others’ internal states is a complex learning task given high variability within and across individuals and contexts. These data suggest that attention to perceptual information shifts as such learning occurs.
Supplementary Information
The online version contains supplementary material available at 10.1007/s42761-024-00279-5.
Keywords: Emotion recognition, Emotion labeling, Emotion intensity, Development, Context
To make accurate inferences about another person’s emotions, children must learn to construct meaning by integrating myriad signals. These signals include facial musculature, bodily movements, situational contexts, vocal cues, and memories and experiences of past events (Ruba & Pollak, 2020; Sauter et al., 2010; Van den Stock & de Gelder, 2012). Most research on emotion perception focuses on peak prototypes of facial movements with little attention to emotion intensity. Yet full-blown facial displays are not often encountered in the real world (Calvo & Nummenmaa, 2016). Rather, real-world emotion perception requires more nuanced judgments, such as whether someone might be merely annoyed, increasingly irritated, or enraged (Calder et al., 2000). Adding to this complexity, facial movements must be interpreted within the situational contexts in which they are embedded (Aviezer et al., 2012; Mesquita, 2022; Martinez, 2019), and this interpretation depends on experience (Pollak et al., 2009). Here, we examine age-related differences in the effect of context, operationalized as visual scenes depicting situations, on individuals’ interpretations of others’ emotional intensity.
To accurately predict someone else’s behavior, one must often gauge the intensity of the other person’s emotions. While we often focus on facial configurations to make these assessments, facial muscular activation exists on continua that afford meaningful distinctions beyond categorical labeling (Martinez, 2017), and even with precise attention to facial movements, interpretive judgments about emotion are dependent upon social context (Barrett et al., 2019; Chen & Whitney, 2019). Such contextual influence on face perception has been found in both laboratory and observational studies and has been reported across development (Aviezer et al., 2012; Noh & Isaacowitz, 2013; Rajhans et al., 2016; Reschke & Walle, 2021). While older children and adults tend to allocate the majority of their attention to faces relative to contextual information, younger children divide their attention equally between the two sources of information, suggesting a developmental trend of increasing prioritization of facial over contextual information with age (Leitzke & Pollak, 2016). Yet contextual influences on perceptions of intensity remain less understood.
Context and Ratings of Emotion Intensity
Studies examining the effects of context on emotion judgments predominantly rely on emotion categorization paradigms. High-intensity faces are more accurately categorized than low-intensity faces (Calder et al., 2000; Gao & Maurer, 2009; Hess et al., 1997). Yet, context plays an influential role in how these stimuli are categorized (Aviezer et al., 2012), and may play a role in perceptions of emotional intensity as well. Conceptually, this makes sense: perceivers may use situational contexts to gauge how strongly they think an emoter ought to feel given the circumstances. We are likely to respond differently if we think another person is startled versus terrified, or if we interpret them as disappointed versus devastated. However, the extent to which context influences ratings of emotion intensity has yet to be directly examined.
Current Study
There is little research examining age-related differences in the effect of context on intensity ratings of perceived emotion. In two experiments, we tested children, adolescents, and young adults as they viewed facial configurations presented with emotion-congruent, incongruent, and neutral contexts. Based upon extant literature, we predicted that ratings of intensity would be greater when faces were presented in a congruent, relative to an incongruent or neutral context and that this effect would be more prominent in younger, compared to older, participants. In the second experiment, we replicated the first experiment without providing participants with any emotion labels. Based upon the findings from Experiment 1, we expected to replicate the intensity-congruency effect but also predicted that the absence of labels would further increase children’s reliance on contextual information.
Experiment 1: Does Congruency Between Faces and Contexts Influence the Intensity of Perceived Emotion?
Method
Participants
One hundred sixty-two individuals from three age groups were recruited for this study. Children (N = 56; Mage = 7 years, 11 months, SD = 7 months; range 7.02–9.12; 48% Female; 7% Asian or Asian-American, 6% Black or African-American, 7% Hispanic, 73% White, and 7% Other Racial/Ethnic identification) and pre-adolescents (N = 54; Mage = 13 years, 1 month, SD = 7 months; range 11.86–13.93; 44% Female; 8% Asian or Asian-American, 6% Black or African-American, 4% Hispanic, 81% White, and 1% Other Racial/Ethnic identification) were recruited from the local community via television commercials, radio ads, and posted flyers as well as through a local school district registry where parents volunteered their contact information for research study recruitment. These ages were selected to be consistent with those used in Leitzke and Pollak (2016). Young adults (N = 52; Mage = 19 years, 7 months, SD = 11 months; range 17.97–21.99; 37% Female; 26% Asian or Asian-American, 6% Black or African-American, 8% Hispanic, 58% White, and 2% Other Racial/Ethnic identification) were recruited from an introductory psychology course or from the campus community via posted flyers. Power analyses are reported below in the Data Analytic Plan.
Stimuli
Facial stimuli
Facial stimuli were drawn from the Interdisciplinary Affective Science Laboratory (IASLab) Facial Stimuli Set.1 We included models with averted gaze to direct participants’ attention toward the contextual information displayed in each image. We selected prototypical categories of anger, disgust, fear, and sadness from four models (two females: models F17 and F19, and two males: models M01 and M07). This stimulus set did not require models to self-identify their race, though a pilot study demonstrated that all selected models were unanimously identified as White in a free-response format. We selected all White models to reduce biases that may be due to race (Elfenbein & Ambady, 2002). While biases by gender also exist (Adams et al., 2015), we included both male and female models to ensure some extent of natural variability in the presented stimulus set. To create variation in emotion intensity, we morphed the facial displays of each model with that same model’s neutral expression. We used j.psychomorph (see Tiddeman et al., 2005) to morph each image in 10% increments of intensity and selected the 20%, 50%, and 80% images to represent low-, intermediate-, and high-intensity stimuli, respectively (Fig. 1).
Fig. 1.
Example of low, intermediate, and high intensity for facial configurations associated with fear
Contextual stimuli
One hundred contextual images were downloaded from the Internet and rated by 301 adult participants via Mechanical Turk (see Buhrmester et al., 2011). Participants viewed each image and assigned emotion labels to each one, choosing from 27 highly used emotion terms from different cultures (Srinivasan & Martinez, 2018). To avoid a forced-choice bias, participants were able to assign as many options as they felt were appropriate in response to how they believed someone might feel if they viewed each scenario in real life. We selected the four top-rated images for each prototypical emotion category (anger, disgust, fear, happiness, sadness, neutral), for a total of 24 images; all selected images achieved at least 50% endorsement of the target emotion category, whereas chance performance would be 3.7%. For each category, two contextual images included people and two did not. The faces of all people in the situational contexts were blurred.
Composites
Facial images were superimposed on top of each context to create composites of each face appearing to respond to a situation. Four facial categories (fear, sad, disgust, angry) were fully crossed with the three conditions (congruent face-context, incongruent face-context, neutral), resulting in 12 pairings. The congruent condition paired each face with a context of the same emotion (e.g., an anger face within an anger context). Incongruent pairings were chosen based on the confusability of anger with disgust (Aviezer et al., 2012) and of sadness with fear (Mondloch, 2012): anger faces were paired with disgust contexts, and vice versa, and sad faces were paired with fear contexts, and vice versa. All 12 face/context pairings were fully crossed with three intensities (low, intermediate, high), creating 36 face emotion/congruence/intensity composites (Fig. 2).
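The full crossing described above (4 face emotions × 3 congruence conditions × 3 intensities = 36 composites) can be sketched as follows. The category labels and function names here are illustrative, not the study’s actual stimulus codes:

```python
from itertools import product

FACES = ["anger", "disgust", "fear", "sad"]
CONDITIONS = ["congruent", "incongruent", "neutral"]
INTENSITIES = ["low", "intermediate", "high"]

# Incongruent pairings exploit anger/disgust and sadness/fear confusability.
INCONGRUENT_CONTEXT = {"anger": "disgust", "disgust": "anger",
                       "fear": "sad", "sad": "fear"}

def context_for(face, condition):
    """Return the context category paired with a face in a given condition."""
    if condition == "congruent":
        return face
    if condition == "incongruent":
        return INCONGRUENT_CONTEXT[face]
    return "neutral"

composites = [(face, condition, context_for(face, condition), intensity)
              for face, condition, intensity
              in product(FACES, CONDITIONS, INTENSITIES)]
# The crossing yields the 36 face emotion/congruence/intensity composites.
```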
Fig. 2.
Exemplars of face emotion/context composites. All four emotions (anger, disgust, sadness, fear) are depicted in congruent, incongruent, and neutral contexts. The model depicted is model F17 from the International Affective Science Laboratory (IASLab) Face Set expressing high (80%) intensity for each facial configuration
Each face emotion/congruence/intensity composite appeared evenly with the four different models (two males, two females) and context exemplars (two with people, two without). One hundred forty-four trials comprised four presentations of each of the 36 face emotion/congruence/intensity composites, each with a different model and context exemplar. Sixteen additional trials consisted of smiling faces and contexts to provide variation in stimulus valence. Specifically, smiling faces were morphed to create low- (20%) and high- (80%) intensity smiles in congruent and neutral contexts, paired with each of the four models. In total, the experiment consisted of 160 trials, with each of the four models and contexts appearing together ten times across the experiment.
Judgment Task
Participants viewed each image for 1,000 ms before being asked to rate it. Participants responded to the question “How is this person feeling?” by using a computer mouse to move a cursor along a visual analog scale that ranged from “not at all [emotion label]” to “extremely [emotion label]” to indicate the intensity of emotion they believed each person was experiencing. Participants were randomly assigned to rate increasing intensity from left to right or from right to left to reduce any directional bias. The resulting scores ranged from 0 (not at all displaying that emotion) to 100 (displaying that emotion very intensely). Participants were instructed to focus explicitly on the face in each image when making their ratings to ensure that ratings reflected the intensity of emotion in different contexts. The experiment was divided into four 40-trial blocks, which participants viewed in a random order; all participants saw all stimuli, and block order was randomized to control for order effects. At the end of the task, participants also completed an additional block of 48 trials presenting all face emotion/intensity/model combinations in the absence of any contextual information. The task was created and presented with E-Prime 3.0 software (Psychology Software Tools, Pittsburgh, PA).
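The counterbalanced scale direction implies that raw cursor positions must be mapped onto a common 0–100 scale before analysis. A minimal sketch of that mapping; the function name, parameters, and position encoding are hypothetical, not taken from the study’s E-Prime script:

```python
def score_rating(cursor_pos, increasing_left_to_right, scale_max=100):
    """Map a cursor position on the visual analog scale (0..scale_max,
    measured from the left edge) onto the 0-100 intensity score, flipping
    the value for participants assigned the reversed scale direction."""
    if increasing_left_to_right:
        return cursor_pos
    # Reversed scale: a cursor near the left edge means high intensity.
    return scale_max - cursor_pos
```

With this mapping, a cursor 30% of the way from the left scores 30 under the left-to-right assignment and 70 under the reversed assignment.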
Procedure
The University of Wisconsin–Madison Institutional Review Board approved all procedures, and all participants provided informed written consent/assent. Participants and parents completed (as appropriate for the participants’ ages) consent/assent procedures in a waiting room before moving to an individual testing room where they completed the study tasks. Participants completed all four blocks of the judgment task before completing the no-context block. Stimuli were presented on a 21-in. Dell computer monitor at a resolution of 1920 by 1080 pixels and displayed at 75% of the width and height of the screen. A research assistant remained with each participant throughout the study to encourage and remind them to pay attention to the change in emotion label and direction of the scale and to use the face of each individual to make their determination of emotion intensity. Adult participants were compensated with either course credit, if recruited from the psychology course, or a $10 cash payment if recruited from the campus community. Children received a prize for completing this experiment, and their parents were compensated with a $20 cash payment.
Data Analytic Plan
We excluded trials with response times under 200 ms, as these would have occurred prior to the time required to initiate perceptual and motor processes following stimulus presentation (Whelan, 2008; 0.1% of all responses). We also excluded all trials involving the two contexts depicting anger in a non-social setting, based on results from a post hoc validation study conducted with children (n = 129) and adolescents (n = 51) on Lookit (Scott & Schulz, 2017). This study showed that while the stimuli were effective for adults, children did not consistently perceive these two contexts as anger-inducing; less than 50% of children endorsed the expected label (“mad”) for these stimuli (ratings available online on OSF).
Data were analyzed in R (R Core Team, 2018) using the brms package (Bürkner, 2017) to implement a Bayesian fitting approach. Specifically, we fit a Bayesian hierarchical beta regression with monotonic coefficients (model code available online on OSF). Beta regression, in which fitted values may range from zero to one, includes a dispersion parameter, phi, which we modeled as a by-age-group random effect. Additional random intercepts were included for ratings by participant and by context emotion. Random intercepts and linear emotion-intensity slopes were estimated for each combination of face model and emotion. Default priors were used in all cases. No aggregation was performed prior to model fitting (i.e., models were fit to raw by-trial data). Monotonic coefficients estimate the overall fixed effect of an ordered (monotonic) variable when the relative distances between its levels are unknown (Bürkner & Charpentier, 2020). The overall effect estimated and reported here represents half of the difference between low-intensity and high-intensity faces, or between incongruent and congruent contexts. The monotonic regression allowed the overall effect to incorporate a set of simplex parameters that accounted for variation in the distances between levels (e.g., low, intermediate, and high intensity). Monotonic coefficients were estimated for face intensity as well as context congruency.
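The monotonic-effect parameterization can be unpacked with a small sketch. This illustrates the general formulation described by Bürkner & Charpentier (2020), not the authors’ actual model code (which is available on OSF); the function name is illustrative:

```python
import numpy as np

def monotonic_term(level, b, zeta):
    """Contribution of an ordinal predictor at `level` (0, 1, ..., D) under
    a monotonic-effects parameterization: b * D * (sum of the first `level`
    simplex weights), where zeta is a simplex (non-negative, sums to 1) over
    the D level-to-level transitions. With this scaling, b is the average
    difference between adjacent levels."""
    zeta = np.asarray(zeta, dtype=float)
    D = len(zeta)
    return b * D * zeta[:level].sum()
```

For the three intensity levels there are D = 2 transitions, so the full low-to-high difference is `monotonic_term(2, b, zeta) = 2b`; this is why the overall effect b reported in the text equals half of the low/high (or incongruent/congruent) difference, while the simplex weights absorb unequal spacing between levels.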
Models were fit using two chains run for 5,000 iterations each, discarding the first 2,500 samples as warmup. From these fits, we extracted values for specific groups and trial types for each sample of parameters, corresponding to 5,000 separate estimates of these values. We then tested for differences in ratings by examining the overlaps in the distributions of parameter samples (Kruschke, 2013). By subtracting values in paired samples, we calculated distributions corresponding to predicted differences between two groups (e.g., children vs. adults) or trial types (e.g., low vs. high intensity). If the distribution of differences was largely positive or negative, we considered the difference reliable (using conventional two-tailed 95% quantiles of parameter differences). We report these comparisons in terms of the median, as mean values are subject to bias with potentially skewed distributions, and in terms of 95% credible intervals of the differences. We also calculated the proportion of posterior parameter samples that fell on the opposite side of zero from the median parameter value (i.e., evidence for an effect in the opposite direction than the median). Finally, we multiplied this proportion by two to place it on the same scale as typical frequentist p-values, and report the value as Bayesian-p.
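The posterior-comparison procedure just described can be sketched as follows. The samples here are synthetic stand-ins for draws extracted from the fitted model, and all numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior samples (one value per retained MCMC draw) of a
# congruency-effect parameter for two groups.
children = rng.normal(0.30, 0.05, size=5000)
adults = rng.normal(0.15, 0.05, size=5000)

# Paired subtraction yields the posterior distribution of the group difference.
diff = children - adults

median = np.median(diff)
ci_lower, ci_upper = np.quantile(diff, [0.025, 0.975])

# Proportion of samples on the opposite side of zero from the median,
# doubled to place it on the scale of a two-tailed frequentist p-value.
opposite = np.mean(diff < 0) if median > 0 else np.mean(diff > 0)
bayesian_p = 2 * opposite
```

A difference is treated as reliable when the 95% credible interval excludes zero; `bayesian_p` summarizes how much of the posterior mass points the other way.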
To calculate our power to detect fixed effects, we used a maximum-likelihood simulation-based approach (Green & MacLeod, 2016). While our primary models include monotonic effects (e.g., of congruency), to reduce the computational burden, we used linear effects of congruency and emotion intensity. For the simulation-based power analyses, we fit a generalized linear mixed-effect model to the original data. Then, using that fit to the real data, one fixed-effect point estimate at a time was replaced with a coefficient of a prespecified size (e.g., the interaction between face intensity and background congruency was set to be 0.05 in one simulation). From this model, in which one coefficient was specified and many aspects had been empirically estimated (e.g., error variance, random-effects estimates), data were repeatedly simulated and the same model refit. This allowed for an estimate of the probability of null hypothesis rejection for that coefficient at that magnitude. By repeating the process for various magnitudes and for different coefficients, we estimated the size of the fixed-effect coefficients at which we would have 80% power to detect a true effect. The results of power analyses are reported in Supplemental Table 3.
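The simulate-and-refit loop can be illustrated in miniature. For tractability this sketch uses a simple linear model with by-subject random intercepts and a per-subject slope summary rather than the full hierarchical regression; it mirrors the spirit of the Green & MacLeod (2016) approach, and all parameter values and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_power(effect_size, n_subjects=150, n_trials=100, n_sims=200,
                   subject_sd=0.5, resid_sd=1.0):
    """Estimate power for one fixed effect by repeatedly simulating data
    with that effect fixed at `effect_size` and counting rejections."""
    rejections = 0
    for _ in range(n_sims):
        # Centered binary predictor (e.g., incongruent vs. congruent context).
        x = rng.choice([-0.5, 0.5], size=(n_subjects, n_trials))
        intercepts = rng.normal(0.0, subject_sd, size=(n_subjects, 1))
        y = effect_size * x + intercepts + rng.normal(0.0, resid_sd, size=x.shape)
        # Per-subject least-squares slope, then a one-sample t-test across subjects.
        slopes = (y * x).sum(axis=1) / (x ** 2).sum(axis=1)
        t = slopes.mean() / (slopes.std(ddof=1) / np.sqrt(n_subjects))
        if abs(t) > 1.96:  # approximate two-tailed .05 criterion
            rejections += 1
    return rejections / n_sims
```

Sweeping `effect_size` over a grid and recording the rejection rate locates the coefficient magnitude at which power reaches 80%.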
Results
We first excluded all participants who had an average rating for low-intensity faces that was higher than their average rating for high-intensity faces, which we interpreted as non-compliance with task instructions (remaining participants: children: n = 40, adolescents: n = 54, adults: n = 52). The two model chains, each run for 5,000 iterations, demonstrated convergent parameter distributions (max fixed-effect r-hat < 1.01). Summary statistics are reported in Table 1.
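The two exclusion rules (fast trials dropped in the Data Analytic Plan, non-compliant participants dropped here) can be sketched together; the data layout and field names are hypothetical:

```python
from statistics import mean

def apply_exclusions(trials):
    """Drop trials with response times under 200 ms, then drop participants
    whose mean rating for low-intensity faces exceeds their mean rating for
    high-intensity faces. `trials` is a list of per-trial dicts."""
    kept = [t for t in trials if t["rt_ms"] >= 200]
    ratings = {}
    for t in kept:
        bucket = ratings.setdefault(t["subject"], {"low": [], "high": []})
        if t["intensity"] in ("low", "high"):
            bucket[t["intensity"]].append(t["rating"])
    # Compliant participants rate high-intensity faces at least as intensely
    # as low-intensity faces, on average.
    compliant = {s for s, r in ratings.items()
                 if r["low"] and r["high"] and mean(r["low"]) <= mean(r["high"])}
    return [t for t in kept if t["subject"] in compliant]
```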
Table 1.
Summary statistics for model-fitted intensity ratings in congruent, incongruent, and neutral contexts by face intensity and age group in experiment 1
| Congruence | Intensity | Age Group | Median | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|
| Congruent | Low | Adolescents | 0.438 | 0.241 | 0.615 |
| Congruent | Low | Adults | 0.482 | 0.283 | 0.659 |
| Congruent | Low | Children | 0.587 | 0.374 | 0.748 |
| Congruent | Intermediate | Adolescents | 0.675 | 0.484 | 0.802 |
| Congruent | Intermediate | Adults | 0.687 | 0.493 | 0.811 |
| Congruent | Intermediate | Children | 0.744 | 0.573 | 0.849 |
| Congruent | High | Adolescents | 0.759 | 0.567 | 0.871 |
| Congruent | High | Adults | 0.757 | 0.567 | 0.868 |
| Congruent | High | Children | 0.770 | 0.586 | 0.875 |
| Neutral | Low | Adolescents | 0.395 | 0.207 | 0.577 |
| Neutral | Low | Adults | 0.436 | 0.247 | 0.622 |
| Neutral | Low | Children | 0.459 | 0.260 | 0.641 |
| Neutral | Intermediate | Adolescents | 0.641 | 0.444 | 0.779 |
| Neutral | Intermediate | Adults | 0.646 | 0.450 | 0.781 |
| Neutral | Intermediate | Children | 0.654 | 0.460 | 0.789 |
| Neutral | High | Adolescents | 0.735 | 0.535 | 0.854 |
| Neutral | High | Adults | 0.725 | 0.523 | 0.847 |
| Neutral | High | Children | 0.693 | 0.484 | 0.827 |
| Incongruent | Low | Adolescents | 0.375 | 0.197 | 0.551 |
| Incongruent | Low | Adults | 0.406 | 0.225 | 0.595 |
| Incongruent | Low | Children | 0.434 | 0.241 | 0.616 |
| Incongruent | Intermediate | Adolescents | 0.620 | 0.422 | 0.760 |
| Incongruent | Intermediate | Adults | 0.615 | 0.416 | 0.759 |
| Incongruent | Intermediate | Children | 0.684 | 0.498 | 0.809 |
| Incongruent | High | Adolescents | 0.720 | 0.517 | 0.844 |
| Incongruent | High | Adults | 0.698 | 0.495 | 0.830 |
| Incongruent | High | Children | 0.733 | 0.538 | 0.854 |
Values represent log odds. Intensity levels represent 20% (low), 50% (intermediate), and 80% (high) morphs
Hypothesis #1: Contextual Congruency Will Result in More Intense Perceived Emotion
Across all age groups, faces in congruent contexts were rated as conveying more intense feelings than faces presented with incongruent contexts (md = 0.249; CI95 = [0.200, 0.294], Bayesian-p < .001). This was true for all three age groups (children: md = 0.309; CI95 = [0.252, 0.364], Bayesian-p < .001; adolescents: md = 0.129; CI95 = [0.076, 0.174], Bayesian-p < .001; adults: md = 0.155; CI95 = [0.096, 0.212], Bayesian-p < .001). When we examined ratings across levels of facial intensity, we found that this contextual congruency effect was stronger for low-intensity faces than for high-intensity faces (md = 0.095; CI95 = [0.056, 0.141], Bayesian-p < .001; see Fig. 3).
Fig. 3.
Predicted intensity ratings in Experiment 1 for congruent, neutral, and incongruent contexts by face intensity (− 0.5 = low, 0 = intermediate, 0.5 = high) for adolescents, adults, and children with emotion labels provided. Predicted ratings reflect posterior samples calculated from the generalized, multilevel Bayesian regression. Error bars represent 95% credible intervals (meaning that for each interval, there is a 95% probability that the true mean intensity rating for the corresponding age group and stimulus intensity level falls within the depicted range). Note that error bars cannot be used to determine reliability because they reflect within- and between-subject variation, whereas the reliability of fixed-effect coefficients controls for between-subject variation
Hypothesis #2: Younger Children’s Perceived Intensity of Emotions Will Be More Influenced by Context Than Adolescents or Adults
As noted above, all three age groups rated faces as more intense in congruent contexts than in incongruent contexts. However, children demonstrated a stronger contextual congruency effect than adolescents (md = 0.180; CI95 = [0.123, 0.242], Bayesian-p < .001) or adults (md = 0.154; CI95 = [0.072, 0.234], Bayesian-p < .001). Adolescents and adults, on the other hand, showed comparable congruency-related differences in intensity ratings (md = 0.026; CI95 = [− 0.051, 0.095], Bayesian-p = .472; see Fig. 3).
Summary of Experiment 1
Participants interpreted others’ emotions to be of higher intensity when facial movements were congruent with contextual information. This effect was greater for children compared to adolescents and adults, suggesting that context exerts an especially strong influence on the way children perceive emotions.
Experiment 2: Does the Presence of Verbal Labels Influence the Intensity of Perceived Emotion?
Much research activity in affective science concerns the role of language and labeling in emotion reasoning. In Experiment 1, participants’ intensity ratings were anchored to a specific emotion label, as they rated intensity on a scale that ranged from “not at all angry/disgusted/sad/scared” to “extremely angry/disgusted/sad/scared.” Thus, participants’ intensity ratings were influenced by the categorical labels we supplied, even if the participant did not feel the individual in the image was actually experiencing that emotion. To inform our understanding of the impact of emotion labels on this task, Experiment 2 was conducted without the use of any specific emotion words. At the completion of the task, we asked participants to provide their own labels for the stimuli to get a sense of how similarly participants construed the stimuli (these analyses were not part of our a priori hypotheses but are reported in the Supplemental Material, see Supplementary Tables 1 and 2).
Method
All stimuli, measures, and procedures were identical to Experiment 1 except that no emotion labels were provided in the task, as described in the “Emotion Judgment Task” section.
Participants
One hundred sixty-four individuals from three age groups were recruited for this study. Children (N = 55; Mage = 7 years, 10 months, SD = 7 months; 31% Female; 4% Asian or Asian-American, 11% Black or African-American, 9% Hispanic, 73% White, and 4% Other Racial/Ethnic identification) and pre-adolescents (N = 56; Mage = 12 years, 10 months, SD = 6 months; 64% Female; 5% Asian or Asian-American, 9% Black or African-American, 4% Hispanic, 80% White, and 2% Other Racial/Ethnic identification) were recruited from the local community via television commercials, radio ads, and posted flyers as well as through a local school district registry where parents volunteered their contact information for research study recruitment. Young adults (N = 53; Mage = 19 years, 6 months, SD = 11 months; 53% Female; 11% Asian or Asian-American, 11% Black or African-American, 13% Hispanic, and 64% White) were recruited from an introductory psychology course or from the campus community via posted flyers.
Emotion Judgment Task
Participants viewed and rated images in the same manner as in Experiment 1; however, they responded to the question “How much is this person feeling right now?” displayed above a visual analog scale ranging from “not at all” to “extremely.”
Results
We processed and analyzed these data with the same procedures as in Experiment 1. As in Experiment 1, we excluded all participants who had an average response to low-intensity faces that was higher than their average rating of high-intensity faces (final ns: children: n = 48, adolescents: n = 52, adults: n = 49). The two model chains, each run for 5,000 iterations, demonstrated convergent parameter distributions (max fixed-effect r-hat = 1.006). Summary statistics are reported in Table 2.
Table 2.
Summary statistics for model-fitted intensity ratings in congruent, incongruent, and neutral contexts by face intensity and age group in experiment 2
| Congruence | Intensity | Age Group | Median | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|
| Congruent | Low | Adolescents | 0.428 | 0.245 | 0.608 |
| Congruent | Low | Adults | 0.430 | 0.249 | 0.609 |
| Congruent | Low | Children | 0.492 | 0.291 | 0.665 |
| Congruent | Intermediate | Adolescents | 0.601 | 0.421 | 0.751 |
| Congruent | Intermediate | Adults | 0.614 | 0.435 | 0.760 |
| Congruent | Intermediate | Children | 0.629 | 0.447 | 0.769 |
| Congruent | High | Adolescents | 0.654 | 0.468 | 0.805 |
| Congruent | High | Adults | 0.673 | 0.491 | 0.818 |
| Congruent | High | Children | 0.665 | 0.475 | 0.810 |
| Neutral | Low | Adolescents | 0.408 | 0.234 | 0.587 |
| Neutral | Low | Adults | 0.422 | 0.245 | 0.602 |
| Neutral | Low | Children | 0.476 | 0.283 | 0.652 |
| Neutral | Intermediate | Adolescents | 0.587 | 0.407 | 0.739 |
| Neutral | Intermediate | Adults | 0.583 | 0.408 | 0.733 |
| Neutral | Intermediate | Children | 0.591 | 0.407 | 0.738 |
| Neutral | High | Adolescents | 0.661 | 0.478 | 0.808 |
| Neutral | High | Adults | 0.657 | 0.472 | 0.807 |
| Neutral | High | Children | 0.631 | 0.441 | 0.789 |
| Incongruent | Low | Adolescents | 0.394 | 0.222 | 0.573 |
| Incongruent | Low | Adults | 0.422 | 0.243 | 0.602 |
| Incongruent | Low | Children | 0.520 | 0.316 | 0.691 |
| Incongruent | Intermediate | Adolescents | 0.575 | 0.392 | 0.730 |
| Incongruent | Intermediate | Adults | 0.566 | 0.389 | 0.720 |
| Incongruent | Intermediate | Children | 0.627 | 0.445 | 0.766 |
| Incongruent | High | Adolescents | 0.656 | 0.469 | 0.805 |
| Incongruent | High | Adults | 0.643 | 0.458 | 0.797 |
| Incongruent | High | Children | 0.666 | 0.481 | 0.812 |
Values represent log odds. Intensity levels represent 20% (low), 50% (intermediate), and 80% (high) morphs
Hypothesis #1: Context Congruency Will Result in More Intense Perceived Emotion, Even in the Absence of Emotion Labels
Adolescents rated faces paired with congruent contexts as more intense than those paired with incongruent contexts (md = 0.069; CI95 = [0.036, 0.105], Bayesian-p < .001). However, children demonstrated the opposite effect, rating faces in congruent contexts as less intense than faces in incongruent contexts (md = − 0.054; CI95 = [− 0.103, − 0.008], Bayesian-p = .015). Adults rated faces in congruent and incongruent contexts as comparably intense (md = 0.015; CI95 = [− 0.033, 0.062], Bayesian-p = .539). As in Experiment 1, the difference in intensity ratings of faces between congruent and incongruent contexts was greater for low-intensity faces than high-intensity faces (md = 0.072; CI95 = [0.033, 0.112], Bayesian-p < .001). These data are shown in Fig. 4.
Fig. 4.
Predicted intensity ratings in Experiment 2 for congruent, neutral, and incongruent contexts by face intensity (− 0.5 = low, 0 = intermediate, 0.5 = high) for adolescents, adults, and children when no emotion labels were provided. Predicted ratings reflect posterior samples calculated from the generalized, multilevel Bayesian regression. Error bars represent 95% credible intervals (meaning that for each interval, there is a 95% probability that the true mean intensity rating for the corresponding age group and stimulus intensity level falls within the depicted range). Note that error bars cannot be used to determine reliability because they reflect within- and between-subject variation, whereas the reliability of fixed-effect coefficients controls for between-subject variation
Hypothesis #2: Younger Children’s Perceived Intensity of Emotions Will Be More Influenced by Context Than Adolescents or Adults, Regardless of the Presence of Emotion Labels
We next examined age-related differences in the impact of contextual congruence on intensity ratings when no emotion labels were provided. As noted above, contextual congruence reliably affected intensity ratings for both children and adolescents. Comparing the magnitude of congruency effects across age groups, children showed a congruency effect comparable in size to, but opposite in direction from, that of adolescents (difference: md = 0.124; CI95 = [0.073, 0.175], Bayesian-p < .001) and adults (difference: md = 0.069; CI95 = [0.004, 0.135], Bayesian-p = .038). Meanwhile, as in Experiment 1, adolescents and adults showed comparable congruency-related differences in intensity ratings (md = 0.054; CI95 = [− 0.003, 0.110], Bayesian-p = .065).
Summary of Experiment 2
Consistent with Experiment 1, findings from Experiment 2 indicate developmental differences in how sensitive individuals are to the situational context when evaluating the intensity of others’ emotions. Across both experiments, contextual congruency had a stronger effect on children’s intensity ratings than either adolescents’ or adults’ ratings. The most salient difference in results between Experiment 1 and Experiment 2 was in the direction of this contextual congruency effect among children. When emotion labels were present (Experiment 1), children rated faces paired with congruent contexts as more intense than faces paired with incongruent contexts. Without labels present (Experiment 2), this pattern was reversed. Thus, while situational context consistently influenced children’s judgments of emotion intensity, the ways in which children used this information depended upon the linguistic information available to them.
General Discussion
Here we report that contextual information influences how people perceive the intensity of others’ emotions. Participants generally interpreted others’ emotions to be of higher intensity when facial movements were congruent with contextual information. Although contextual information influenced the interpretations made by adolescents and adults, context exerted an especially strong influence on the way children perceived emotions. This developmental difference was replicated across two experiments.
Consistent with prior research, the present experiment demonstrated the influence of contextual information on the evaluation of others’ facial configurations (Aviezer et al., 2012; Barrett et al., 2019; Chen & Whitney, 2019; Mesquita, 2022; Martinez, 2019). Extending prior research, the current study indicates that context influences not only categorical interpretations of others’ emotions but also perception of the intensity of others’ emotions.
An unexpected and potentially important finding emerged from these data and warrants further investigation. Between the two studies, the provision of verbal emotion labels affected how context moderated the perception of emotional intensity among children. When emotion labels were used in Experiment 1, children oriented their judgments around the contextual scenes. When those labels were removed in Experiment 2, the effect was reversed. One possibility is that in the absence of verbal labels, the incongruence between the faces and the scenes led children to conclude that something intense must be occurring to produce the mismatch they perceived. That is, they perceived the incongruence as inconsistent with their observations of the world and reasoned that something strongly evocative might be causing it. Another possibility is that verbal labels help children cohere their thinking about or attention to emotions, but when children do not have access to labels, and when the context is at odds with the face, their attention becomes diffused. The current data cannot adjudicate between these possibilities, but future research in this area can test the mechanisms that guide changes in emotion processing early in development.
Most of the existing research in the area of emotion perception has not investigated perceptions of intensity or the developmental emergence of the role of context in emotion perception. The present data suggest that emotion judgments are a developmental process whereby individuals learn to track competing signals and change the way they prioritize information. Early emotion learners have access to a relatively small set of emoters and limited contexts in their environment (Jayaraman et al., 2015). As children age, they gain experience with a greater number of emoters and are exposed to more diverse situations; this increasing variability likely drives emotion learning (Ruba et al., 2022). Consistent with this view, as children age they begin devoting more attention to faces than contextual information when making emotion judgments (Leitzke & Pollak, 2016). Relative to young children, adults appear to readily make interpretations about others’ emotions based on facial configurations (Bar, 2004; de Gelder et al., 2006; Schyns & Oliva, 2010).
Limitations and Future Directions
The goal of this study was to determine whether and how context influences judgments about the intensity of emotion. In naturally occurring social interactions, changes in both bodily and contextual information are dynamic. Therefore, a next step in understanding how people draw inferences about the relative intensity of others’ emotions might include attempts to measure these processes “in the wild” of social life. For the purposes of this study, we grouped contextual stimuli into emotion categories. But there are myriad ways in which people might interpret or attach meaning to the contexts within which they are interacting in naturalistic settings. We did ask participants to evaluate the contextual stimuli used in these experiments, and each context was evaluated similarly by most participants. But people also indicated that multiple emotion labels could conceivably be associated with each of these scenes. Therefore, a next step in this program of research will be to delve more deeply into the various ways that people might attach meaning to these scenes. Post hoc analyses suggested additional nuances that can be explored with planned analyses that have the power afforded by larger sample sizes.
Conclusion
Real-world judgments about what we think other people might be feeling involve inferences about the scale or relative intensity of emotions. To successfully navigate the social world, our behavioral choices are fine-tuned to the degree of emotion we believe others are experiencing. We are likely to respond differently to someone who seems displeased versus someone who we believe is enraged, and to someone we infer is disappointed versus devastated. Much has been written about the important role of context in the perception of emotion, and this appears to be true of dimensional judgments of intensity as well as categorical judgments. Making inferences about another person’s internal states is a complex learning task given the high variability within and across individuals and contexts. These data suggest that how people attend to perceptual information changes across development.
Supplementary Information
Below is the link to the electronic supplementary material.
Acknowledgements
We thank Kristin Shutts, Michael Koenigs, and Linda Camras for their valuable contributions to the development of this project. We also thank the individuals and families who participated in this study and the research assistants who helped with data collection.
Additional Information
Funding
Funding for this project was provided by the National Institute of Mental Health (MH61285) to S. Pollak and a core grant to the Waisman Center from the National Institute of Child Health and Human Development (U54 HD090256). B. Leitzke was supported by F31-MH106179. A. Stein was supported by the Training Program in Emotion Research grant from the National Institute of Mental Health (T32 MH018931).
Conflict of Interest
The authors declare no competing interests.
Data Availability
The data and analysis scripts are available at: https://osf.io/nfp7x/?view_only=5e2787eee75b465faf958511eb74f382.
Code Availability
Not applicable.
Authors’ Contributions
Not applicable.
Ethics Approval
This research was approved by the Institutional Review Board at the University of Wisconsin–Madison and was performed in line with the principles of the Declaration of Helsinki.
Informed Consent
All parents and children gave informed consent/assent for the study and the university IRB approved all procedures.
Consent to Participate
Informed written consent/assent was obtained from all participants and legal guardians of minors.
Consent for Publication
Not applicable.
Footnotes
Development of the Interdisciplinary Affective Science Laboratory (IASLab) Face Set was supported by the National Institutes of Health Director’s Pioneer Award (DP1OD003312) to Lisa Feldman Barrett. More information is available online at www.affective-science.org.
References
- Adams, R. B., Jr., Hess, U., & Kleck, R. E. (2015). The intersection of gender-related facial appearance and facial displays of emotion. Emotion Review, 7, 5–13. 10.1177/1754073914544407
- Aviezer, H., Trope, Y., & Todorov, A. (2012). Body cues, not facial expressions, discriminate between intense positive and negative emotions. Science, 338(6111), 1225–1229. 10.1126/science.1224313
- Bar, M. (2004). Visual objects in context. Nature Reviews Neuroscience, 5, 617–629. 10.1038/nrn1476
- Barrett, L. F., Adolphs, R., Marsella, S., Martinez, A. M., & Pollak, S. D. (2019). Emotional expressions reconsidered: Challenges to inferring emotion from human facial movements. Psychological Science in the Public Interest, 20(1), 1–68. 10.1177/1529100619832930
- Bürkner, P. C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. 10.18637/jss.v080.i01
- Bürkner, P. C., & Charpentier, E. (2020). Modelling monotonic effects of ordinal predictors in Bayesian regression models. British Journal of Mathematical and Statistical Psychology. 10.1111/bmsp.12195
- Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6(1), 3–5. 10.1177/1745691610393980
- Calder, A. J., Young, A. W., Keane, J., & Dean, M. (2000). Configural information in facial expression perception. Journal of Experimental Psychology: Human Perception and Performance, 26(2), 527. 10.1037/0096-1523.26.2.527
- Calvo, M. G., & Nummenmaa, L. (2016). Perceptual and affective mechanisms in facial expression recognition: An integrative review. Cognition and Emotion, 30(6), 1081–1106. 10.1080/02699931.2015.1049124
- Chen, Z., & Whitney, D. (2019). Tracking the affective state of unseen persons. Proceedings of the National Academy of Sciences, 201812250. 10.1073/pnas.1812250116
- de Gelder, B., Meeren, H. K., Righart, R., Van den Stock, J., Van de Riet, W. A., & Tamietto, M. (2006). Beyond the face: Exploring rapid influences of context on face processing. Progress in Brain Research, 155, 37–48.
- Elfenbein, H. A., & Ambady, N. (2002). On the universality and cultural specificity of emotion recognition: A meta-analysis. Psychological Bulletin, 128(2), 203–235. 10.1037/0033-2909.128.2.203
- Gao, X. Q., & Maurer, D. (2009). Influence of intensity on children’s sensitivity to happy, sad, and fearful facial expressions. Journal of Experimental Child Psychology, 102(4), 503–521. 10.1016/j.jecp.2008.11.002
- Green, P., & MacLeod, C. J. (2016). simr: An R package for power analysis of generalised linear mixed models by simulation. Methods in Ecology and Evolution, 7(4), 493–498. 10.1111/2041-210X.12504. https://CRAN.R-project.org/package=simr
- Hess, U., Blairy, S., & Kleck, R. E. (1997). The intensity of emotional facial expressions and decoding accuracy. Journal of Nonverbal Behavior, 21(4), 241–257. 10.1023/A:1024952730333
- Jayaraman, S., Fausey, C. M., & Smith, L. B. (2015). The faces in infant-perspective scenes change over the first year of life. PLoS ONE, 10(5), e0123780. 10.1371/journal.pone.0123780
- Kruschke, J. K. (2013). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General, 142(2), 573. 10.1037/a0029146
- Leitzke, B. T., & Pollak, S. D. (2016). Developmental changes in the primacy of facial cues for emotion recognition. Developmental Psychology, 52(4), 572. 10.1037/a0040067
- Martinez, A. M. (2017). Computational models of face perception. Current Directions in Psychological Science, 26(3), 263–269. 10.1177/0963721417698535
- Martinez, A. M. (2019). Context may reveal how you feel. Proceedings of the National Academy of Sciences, 201902661. 10.1073/pnas.1902661116
- Mesquita, B. (2022). Between us: How cultures create emotions. WW Norton & Company.
- Mondloch, C. J. (2012). Sad or fearful? The influence of body posture on adults’ and children’s perception of facial displays of emotion. Journal of Experimental Child Psychology, 111(2), 180–196. 10.1016/j.jecp.2011.08.003
- Noh, S. R., & Isaacowitz, D. M. (2013). Emotional faces in context: Age differences in recognition accuracy and scanning patterns. Emotion, 13(2), 238. 10.1037/a0030234
- Pollak, S. D., Messner, M., Kistler, D. J., & Cohn, J. F. (2009). Development of perceptual expertise in emotion recognition. Cognition, 110(2), 242–247. 10.1016/j.cognition.2008.10.010
- Psychology Software Tools, Inc. [E-Prime 3.0]. (2016). Retrieved from http://www.pstnet.com
- R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/
- Rajhans, P., Jessen, S., Missana, M., & Grossmann, T. (2016). Putting the face in context: Body expressions impact facial emotion processing in human infants. Developmental Cognitive Neuroscience, 19, 115–121. 10.1016/j.dcn.2016.01.004
- Reschke, P. J., & Walle, E. A. (2021). The unique and interactive effects of faces, postures, and scenes on emotion categorization. Affective Science, 2, 468–483.
- Ruba, A. L., & Pollak, S. D. (2020). The development of emotion reasoning in infancy and early childhood. Annual Review of Developmental Psychology, 2, 503–531.
- Ruba, A. L., Pollak, S. D., & Saffran, J. R. (2022). Acquiring complex communicative systems: Statistical learning of language and emotion. Topics in Cognitive Science, 14(3), 432–450.
- Sauter, D. A., Eisner, F., Ekman, P., & Scott, S. K. (2010). Cross-cultural recognition of basic emotions through nonverbal emotional vocalizations. Proceedings of the National Academy of Sciences, 107(6), 2408–2412. 10.1073/pnas.0908239106
- Schyns, P. G., & Oliva, A. (2010). From blobs to boundary edges: Evidence for time- and spatial-scale-dependent scene recognition. Psychological Science, 5, 195–200. 10.1111/j.1467-9280.1994.tb00500.x
- Scott, K., & Schulz, L. (2017). Lookit (Part 1): A new online platform for developmental research. Open Mind, 1(1), 4–14. 10.1162/OPMI_a_00002
- Srinivasan, R., & Martinez, A. M. (2018). Cross-cultural and cultural-specific production and perception of facial expressions of emotion in the wild. IEEE Transactions on Affective Computing, 14, 1–12. 10.1109/TAFFC.2018.2887267
- Tiddeman, B. P., Stirrat, M. R., & Perrett, D. I. (2005). Towards realism in facial image transformation: Results of a wavelet MRF method. Computer Graphics Forum, 24(3), 449–456. 10.1111/j.1467-8659.2005.00870.x
- Van den Stock, J., & de Gelder, B. (2012). Emotional information in body and background hampers recognition memory for faces. Neurobiology of Learning and Memory, 97(3), 321–325. 10.1016/j.nlm.2012.01.007
- Whelan, R. (2008). Effective analysis of reaction time data. Psychological Record, 58(3), 475–482. 10.1007/BF03395630