iScience. 2025 Nov 27;29(1):114269. doi: 10.1016/j.isci.2025.114269

Behavioral and neural mechanisms underlying contextual processing in naturalistic short video viewing

Zhengcao Cao 1,2, Xiang Xiao 3,∗, Yashu Wang 1, Ran Li 4, Yapei Xie 2, Liangyu Wu 1, Suyu Bi 5, Fengyu Yang 1, Yiwen Wang 1,6,7,∗∗
PMCID: PMC12805293  PMID: 41550737

Summary

Contextual processing enables the brain to integrate environmental cues for adaptive emotional perception. Understanding how this ability operates in the dynamic environment of short video viewing, and how the brain supports it, is crucial. In this study, we combined behavioral and neuroimaging experiments to examine contextual processing during short video viewing. Compared with single-face clips, face-context-face sequences elicited emotional perceptions of neutral faces that were coherent with the preceding emotional contexts. A distributed brain network was involved in the top-down modulation of contextual processing. The temporoparietal junction (TPJ) and precuneus showed sustained engagement during context integration. Activity in the TPJ and insula was associated with valence and arousal ratings, respectively. Under negative contexts, global functional segregation facilitated contextual processing, and weaker TPJ-insula connectivity corresponded to stronger contextual effects. This study advances the understanding of contextual processing in the digital media age and may inform future investigations into the neural correlates of contextual processing dysfunction in psychiatric conditions.

Subject areas: Biological sciences, Behavioral neuroscience, Cognitive neuroscience

Graphical abstract


Highlights

  • Contextual processing affects emotional perception during short video viewing

  • TPJ and insula support top-down contextual modulation in short video viewing

  • TPJ and insula regional and FC dynamics correlate with contextualized emotion



Introduction

Information and context are never presented in isolation.1 Through associative learning of context-information relationships,2 animals and humans develop the ability to integrate contextual cues and make adaptive interpretations within environmental contexts,3,4 a cognitive function known as contextual processing.2,5,6 This function helps individuals conserve cognitive resources and facilitates rapid decision-making.7,8 Although contextual processing has been demonstrated to be a fundamental ability across various cognitive domains,9,10 its underlying mechanisms remain insufficiently explored,6,9,10,11 particularly within emerging media environments.3,12

With the growing dominance of short videos and their widespread adoption for information acquisition by billions of users worldwide,13,14,15 the way humans engage with and process contextual information has undergone a significant shift. Short videos—characterized by rapid scene transitions and condensed storytelling—present unique cognitive demands that may influence contextual processing mechanisms.16 However, despite the prevalence of short video viewing, it remains unclear whether contextual processing functions similarly in this dynamic video setting and how the brain adapts to support this process. More specifically, a focal aspect of contextual processing during short video viewing involves emotional perception,17,18,19 particularly the interpretation of facial expressions within rapidly shifting visual narratives, raising the further question of how contextual processing contributes to emotional perception. Moreover, deficits in contextual processing abilities may lead to rigid or inappropriate behavioral responses, contributing to various psychiatric symptoms; thus, investigating this ability in clinical populations may inform studies on neural correlates of contextual processing dysfunction.2,20

Although naturalistic short video materials are essential for studying contextual processing,21,22,23 it remains unclear whether existing research methodologies can effectively capture this process in ecologically valid short video environments. Studies using face-context-face sequences,24,25 which ask participants to evaluate the valence and arousal of neutral faces and categorize expressions with several emotional labels, have found that emotional context shapes emotion perception. However, these traditional studies,24,25,26 which primarily rely on static images or artificial stimuli, cannot fully address how contextual processing functions in naturalistic short videos. Most neuroimaging experiments have also used static images6,27 to investigate the neural mechanisms of contextual processing. For instance, functional magnetic resonance imaging (fMRI) and electrophysiological studies using static face-context pairs have implicated regions such as the amygdala, hippocampus, orbitofrontal cortex, temporal pole, and superior temporal sulcus in the contextual processing of emotional perception.6,17,27 However, their reliance on static stimuli that omit facial dynamics and contextual continuity limits ecological validity, highlighting the need to investigate the neural dynamics of contextual processing within immersive short video environments.

The key to revealing the behavioral and neural mechanisms underlying contextual processing in naturalistic short video viewing lies in ecologically valid stimuli, careful experimental design, and advanced analytical methods. Firstly, the integration of video material production and neuroscience has attracted increasing attention,28,29,30 reflecting the growing trend toward naturalistic fMRI paradigms.22,23,31,32 While current studies on contextual processing primarily rely on static stimuli from databases such as KDEF25 and IAPS,6 this interdisciplinary approach offers an ecologically valid means of investigating the mechanisms of contextual processing. Secondly, as contextual processing is a top-down process that modulates the emotional perception of a neutral face by integrating contextual cues,17,27,33,34,35 a well-designed experimental framework is crucial for revealing the behavioral mechanisms of contextual processing. When a neutral face is presented in isolation, viewers lack contextual cues to infer its emotional expression; however, when a neutral face is paired with an emotional context, top-down cognitive processes influence its perception, making the face appear more consistent with the contextual emotion. In our design, participants therefore evaluate neutral faces under two conditions—in isolation and within a face-context-face sequence—allowing within-subject comparisons of emotional ratings that minimize the participant-related variability that often limits between-group comparisons in previous studies.6,24,25 Thirdly, advanced neuroimaging analysis methods, such as finite impulse response (FIR) analysis36 and the sliding window approach,37,38 enable a more thorough investigation of dynamic contextual processing mechanisms at both the regional and network levels. Although these three aspects are essential for revealing the mechanisms underlying contextual processing, to the best of our knowledge, no studies have yet integrated them into a unified framework. As a result, substantial knowledge gaps remain.

In this study, we aim to investigate the behavioral and neural mechanisms underlying contextual processing during naturalistic short video viewing, including how contextual processing contributes to emotional perception in short videos and how the brain adapts to support this process. Prior to the experiment, we produced dynamic short videos presented both in face-context-face sequences and as single-face clips. From a behavioral perspective, using the colored video set and a within-subject design, we assess whether contextual processing occurs during short video viewing by examining how emotional context alters valence ratings of neutral faces. From a neural perspective, we address three key questions: first, which brain regions are involved in top-down modulation during colored short video viewing; second, how these regions dynamically and independently process contextual information to generate coherent emotional perception, as revealed by FIR analysis and brain-behavior correlations; and third, how these regions interact dynamically through patterns of network-level segregation and integration, as demonstrated by sliding window analysis and brain-behavior correlations, to support contextual processing. Furthermore, we replicate our findings using black-and-white videos to test whether contextual processing generalizes across color conditions. Lastly, we conduct an additional behavioral replication experiment with an independent cohort to further validate contextual processing during colored short video viewing.

Results

Short video examples produced for this study are shown in the supplementary videos (Videos S1 and S2). The results are structured as follows: first, to assess whether contextual processing occurs during short video viewing, we use a within-subject design to compare participants’ valence and arousal ratings of neutral faces presented in single-face clips versus face-context-face sequences (Video S3 or YouTube: https://youtu.be/FD1fl2Np1E0). Emotional context significantly modulates these ratings, demonstrating an evident contextual influence on emotional perception during colored short video viewing. The strength of this contextual effect on valence ratings is influenced by several factors, including the type of emotional context (biological vs. non-biological), actor gender (male vs. female), and participant gender (male vs. female). Next, we identify the neural correlates of top-down contextual processing, highlighting the involvement of specific brain regions, such as the temporoparietal junction (TPJ) and insula, in integrating contextual information during short video viewing. Following this, we apply FIR modeling to examine how these regions dynamically process contextual information over time, supporting their role in generating coherent emotional perceptions. Additionally, we examine the relationship between the peak activation of these neural correlates and behavioral ratings of contextual processing, supporting their role in context-dependent emotional modulation. Furthermore, we apply a sliding-window analysis to examine whole-brain and cluster-wise dynamic functional connectivity (FC), defined as BOLD signal correlations among different brain regions, in relation to behavioral ratings of contextual processing. The results reveal that a decrease in global FC contributes to contextual processing; in particular, a reduction in FC between the TPJ cluster and the insula cluster is strongly associated with contextual processing. Moreover, we replicate these findings in black-and-white short video viewing, confirming that both behavioral and neural contextual processing persist even in the absence of color, suggesting that contextual processing is a fundamental cognitive mechanism independent of color cues and reinforcing its robustness in emotional perception. Lastly, we replicate the behavioral contextual effect in an independent cohort, confirming that it represents a robust and generalizable cognitive mechanism across individuals.

Video S1. Colored short video demos
Video S2. Black-and-white short video demos
Video S3. Contextual processing demos

Contextual influence on emotional perception in colored short video viewing

To explore contextual processing in colored short video viewing, 29 participants completed the single-face rating experiment (Figure 1A) and 36 participants (including the same 29 participants) completed the face-context-face sequence rating experiment (Figure 1B). Comparing valence and arousal ratings between single-face clips and faces presented in the face-context-face sequence allows us to examine how contextual processing contributes to emotional perception in colored short video viewing.

Figure 1. Procedure and behavioral results using colored short videos

(A) In the single-face clip rating experiment, conducted on a laptop, participants rated the valence and arousal of isolated face clips. Each trial began with a 0.5-s fixation cross, followed by a 2-s presentation of a neutral face clip. After a 0.65-s inter-stimulus interval (ISI), participants provided valence and arousal ratings for the neutral face. Valence ratings ranged from −4 (negative) to 4 (positive), while arousal ratings ranged from 1 (low) to 9 (high), with a 5-s response window per rating. The neutral faces were categorized into three emotional conditions (negative, neutral, and positive), with each condition containing 10 trials.

(B) In the face-context-face sequence rating experiment, conducted in an fMRI scanner, each trial began with a 0.5-s fixation cross, followed by a 2-s presentation of a neutral face clip and a jitter period of 4–6 s. Participants then viewed a 4-s emotional context clip (negative, neutral, or positive) followed by another jitter period. The trial concluded with a 2-s presentation of a second neutral face clip, which was similar to the first. After a 0.65-s ISI, participants rated the valence and arousal of the final neutral face. Valence ratings ranged from −4 (negative) to 4 (positive), while arousal ratings ranged from 1 (low) to 9 (high), with a 5-s response window per rating. Each trial ended with an inter-trial interval (ITI) of 1–1.5 s. The fMRI experiment consisted of 30 trials, evenly distributed across the three emotional conditions.

(C) Averaged valence scores of neutral faces in the single-face clip rating experiment. A significant main effect of emotional condition on valence was observed (F(1.5, 42.1) = 18.862, p = 9 × 10⁻⁶, ηp² = 0.402, Greenhouse-Geisser correction).

(D) Averaged valence scores of neutral faces in the face-context-face sequence rating experiment. A significant main effect of emotional condition on valence was found (F(1.3, 45.2) = 114.842, p = 1.4 × 10⁻¹⁵, ηp² = 0.766, Greenhouse-Geisser correction).

(E) Comparison of valence ratings between the single-face clip and face-context-face sequence rating experiments for the same participants. The delta score is calculated by subtracting the valence ratings of the single-face clip experiment from those of the face-context-face sequence rating experiment for the same participants. A significant difference was detected using a one-sample t-test on the delta scores in the negative condition (t = −7.282, df = 28, p = 6.3 × 10⁻⁸) and positive condition (t = 6.540, df = 28, p = 4.3 × 10⁻⁷).

(F) Averaged arousal scores of neutral faces in the single-face clip rating experiment. No significant main effect of emotional condition on arousal was found (F(1.6, 45.2) = 0.067, p = 0.901, ηp² = 0.002, Greenhouse-Geisser correction).

(G) Averaged arousal scores of neutral faces in the face-context-face sequence rating experiment. A significant main effect of emotional condition on arousal was observed (F(1.6, 57.0) = 19.090, p = 2 × 10⁻⁶, ηp² = 0.353, Greenhouse-Geisser correction).

(H) Comparison of arousal ratings between the single-face clip and face-context-face sequence rating experiments for the same participants. The delta score is calculated by subtracting the arousal ratings of the single-face clip experiment from those of the face-context-face sequence rating experiment for the same participants. A significant difference was detected using a one-sample t-test on the delta scores in the negative (t = 3.108, df = 28, p = 0.004), neutral (t = −4.128, df = 28, p = 3 × 10⁻⁴), and positive (t = 2.969, df = 28, p = 0.006) conditions.

(I) Effect of context type on contextual processing. A significant main effect of context type (biological vs. non-biological) was observed (F(1, 35) = 25.209, p = 1.5 × 10⁻⁵, ηp² = 0.419).

(J) Effect of actor gender on contextual processing. A significant main effect of actor gender was found (F(1, 35) = 32.349, p = 2 × 10⁻⁶, ηp² = 0.480).

(K) Effect of participant gender on contextual processing. A significant main effect of participant gender was detected (F(1, 34) = 4.672, p = 0.038, ηp² = 0.121).

Error bars represent the SEM. Asterisks denote significant differences between conditions (∗∗p < 0.01, ∗∗∗p < 0.001).

In the single-face clip rating experiment, valence ratings for neutral faces varied by condition: −0.50 ± 0.12 for the negative condition, 0.18 ± 0.11 for the neutral condition, and 0.22 ± 0.10 for the positive condition. A repeated-measures ANOVA revealed a significant main effect of emotional condition on valence (F(1.5, 42.1) = 18.862, p = 9 × 10⁻⁶, ηp² = 0.402, Greenhouse-Geisser correction; Figure 1C). Post hoc Bonferroni tests showed no significant difference between the positive and neutral conditions (p > 0.99) but found a significant difference between the negative and neutral conditions (p < 0.001). However, when neutral face clips were combined with an emotional context in the colored face-context-face short video sequence, participants perceived the neutral faces differently. Negative contexts elicited a negative valence rating (−1.72 ± 0.15), while positive contexts induced a positive valence rating (1.24 ± 0.15), indicating that contextual processing leads to a coherent emotional perception. In contrast, neutral contexts did not significantly alter valence ratings (0.04 ± 0.08). A repeated-measures ANOVA confirmed a significant main effect of emotional condition on valence (F(1.3, 45.2) = 114.842, p = 1.4 × 10⁻¹⁵, ηp² = 0.766, Greenhouse-Geisser correction; Figure 1D), and post hoc Bonferroni tests showed significant differences across all conditions (negative vs. neutral, negative vs. positive, and neutral vs. positive; p < 0.001). For participants who completed both the single-face clip rating experiment and the face-context-face sequence rating experiment, valence ratings for neutral faces in the sequence rating experiment decreased in the negative condition and increased in the positive condition (Figure 1E). This shift was statistically significant, with delta scores confirming differences between the two experiments for the negative condition (t = −7.282, df = 28, p = 6.3 × 10⁻⁸) and the positive condition (t = 6.540, df = 28, p = 4.3 × 10⁻⁷), demonstrating that emotional context significantly influences facial emotion perception through contextual processing.
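To make this behavioral pipeline concrete, a minimal Python sketch follows. The data layout, the column names, and the use of the third-party pingouin library are illustrative assumptions, not the authors' actual code; the hypothetical input file stands in for the collected ratings.

```python
# Minimal sketch of the behavioral analysis (assumed data layout):
# `ratings` is a long-format DataFrame with columns: subject, condition
# (negative/neutral/positive), experiment (single_face/sequence), valence.
import pandas as pd
import pingouin as pg          # third-party stats library (assumed choice)
from scipy import stats

ratings = pd.read_csv("valence_ratings.csv")  # hypothetical file

# Repeated-measures ANOVA on sequence-experiment valence; correction=True
# reports the Greenhouse-Geisser-corrected p value alongside the uncorrected one.
seq = ratings[ratings["experiment"] == "sequence"]
aov = pg.rm_anova(data=seq, dv="valence", within="condition",
                  subject="subject", correction=True)
print(aov)

# Delta scores: sequence minus single-face ratings per subject and condition,
# tested against zero with a one-sample t-test in each condition.
wide = ratings.pivot_table(index=["subject", "condition"],
                           columns="experiment", values="valence").reset_index()
wide["delta"] = wide["sequence"] - wide["single_face"]
for cond, grp in wide.groupby("condition"):
    t, p = stats.ttest_1samp(grp["delta"], popmean=0.0)
    print(f"{cond}: t = {t:.3f}, p = {p:.2g}")
```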

In the single-face clip rating experiment, arousal ratings for neutral faces were 4.41 ± 0.24 for the negative condition, 4.38 ± 0.26 for the neutral condition, and 4.44 ± 0.28 for the positive condition. A repeated-measures ANOVA showed no significant differences in arousal across conditions (F(1.6, 45.2) = 0.067, p = 0.901, ηp² = 0.002, Greenhouse-Geisser correction; Figure 1F). However, after viewing the colored face-context-face short video sequence, arousal levels increased across all three conditions compared with the single-face clip rating experiment (Figure 1G). Negative contexts elicited the highest arousal levels (5.55 ± 0.26), followed by positive contexts (5.12 ± 0.22), while neutral contexts resulted in the lowest arousal levels (3.88 ± 0.22). A repeated-measures ANOVA confirmed a significant main effect of emotional condition on arousal (F(1.6, 57.0) = 19.090, p = 2 × 10⁻⁶, ηp² = 0.353, Greenhouse-Geisser correction), and post hoc Bonferroni tests showed significant differences between negative and neutral contexts (p < 0.001) and between positive and neutral contexts (p < 0.001). For participants who completed both the single-face clip rating experiment and the face-context-face sequence rating experiment, arousal ratings for neutral faces in the face-context-face sequence rating experiment increased in the negative and positive conditions but decreased in the neutral condition (Figure 1H). This shift was statistically confirmed, with significant delta scores observed between the two experiments in the negative condition (t = 3.108, df = 28, p = 0.004), neutral condition (t = −4.128, df = 28, p = 3 × 10⁻⁴), and positive condition (t = 2.969, df = 28, p = 0.006).

Several factors influenced contextual processing, including the type of emotional context (biological vs. non-biological), actor gender (male vs. female), and participant gender (male vs. female). Repeated-measures ANOVAs revealed a significant main effect of context type, with biological contexts eliciting stronger contextual processing than non-biological contexts (F(1, 35) = 25.209, p = 1.5 × 10⁻⁵, ηp² = 0.419; Figure 1I). Additionally, actor gender had a significant impact, as participants’ valence ratings differed based on whether the actor was male or female (F(1, 35) = 32.349, p = 2 × 10⁻⁶, ηp² = 0.480; Figure 1J). Participant gender also influenced contextual processing, with male and female participants showing different valence rating patterns (F(1, 34) = 4.672, p = 0.038, ηp² = 0.121; Figure 1K). Together, these findings indicate that contextual processing in colored short video viewing is modulated by both the emotional context and individual characteristics, including actor and participant gender.

Neural correlates of top-down contextual processing during short video viewing

Previous research has shown that individuals tend to perceive emotions from a neutral face after viewing a preceding emotional context,6,27 and the brain regions activated during the processing of “Face_2” are considered to reflect top-down modulation in contextual processing.17 To specifically examine the neural activity associated with “Face_2” while accounting for contextual influences, we employed a subtraction approach (Figure 2A): we first subtracted the activity related to “Face_1” from “Face_2” to remove baseline face-processing effects and then controlled for the contribution of the preceding “Emotional Context.” This analysis revealed three distinct clusters of brain activation (Figure 2B and Table S1).
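The subtraction logic can be illustrated with a short nilearn sketch. The regressor names, the TR, the input file path, and the single-formula contrast weighting below are assumptions for illustration, not the authors' actual model specification; only the FDR threshold and cluster extent follow the text.

```python
# Minimal sketch of the contrast logic in nilearn (assumed regressor names:
# "face_1", "context", "face_2"; bold_img is a hypothetical input file).
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel
from nilearn.glm import threshold_stats_img

bold_img = "sub-01_task-context_bold.nii.gz"        # hypothetical path
events = pd.DataFrame({"onset": [10.0, 16.5, 24.0],
                       "duration": [2.0, 4.0, 2.0],
                       "trial_type": ["face_1", "context", "face_2"]})

model = FirstLevelModel(t_r=2.0, hrf_model="spm")   # TR is an assumption
model = model.fit(bold_img, events=events)

# Face_2 minus Face_1 removes baseline face processing; subtracting the
# context regressor then discounts the direct influence of the context clip.
zmap = model.compute_contrast("face_2 - face_1 - context",
                              output_type="z_score")

# FDR p < 0.05 with a 50-voxel cluster extent, as reported in Figure 2B.
thresholded, thr = threshold_stats_img(zmap, alpha=0.05,
                                       height_control="fdr",
                                       cluster_threshold=50)
```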

Figure 2. Whole-brain activation for contextual processing

(A) Brain regions involved in contextual processing were identified by contrasting neural activation across different phases of the face-context-face sequence. First, neural activity during Face_2 was contrasted with Face_1 to isolate brain regions associated with the contextual modulation of face perception. Next, this contrast was further compared with the Emotional Context to account for and exclude the direct influence of emotional context processing.

(B) The contrast results revealed significant whole-brain activation related to contextual processing, aggregated across all emotional conditions. Statistical significance was determined using a threshold of p < 0.05 (false discovery rate corrected), with a cluster size >50 voxels. The identified brain regions are visualized as three distinct clusters.

The first cluster was primarily localized in the right parietal lobe, including the inferior parietal lobule (Parietal_Inf_R), angular gyrus (Angular_R), superior parietal lobule (Parietal_Sup_R), and postcentral gyrus (Postcentral_R). The second cluster included regions in the left insula (Insula_L), superior temporal gyrus (Temporal_Sup_L), superior temporal pole (Temporal_Pole_Sup_L), and Heschl gyrus (Heschl_L). The third cluster encompassed key subcortical regions, including the left and right hippocampus (Hippocampus_L, Hippocampus_R), caudate nucleus (Caudate_L, Caudate_R), precuneus (Precuneus_L, Precuneus_R), calcarine cortex (Calcarine_L), lingual gyrus (Lingual_L), and posterior cingulate cortex (Cingulate_Post_L). These regions collectively contribute to the top-down modulation of contextual information during short video viewing. Among the cortical regions, the largest activation cluster was observed in the TPJ, which includes the right inferior parietal lobule, angular gyrus, and superior parietal lobule. Given its established role in mentalizing and perspective-taking,39,40 TPJ activation suggests that contextual framing engages social cognitive mechanisms during emotion perception tasks. Within the second cluster, the left insula exhibited the strongest activation, underscoring its role in integrating emotional context into perceptual judgment. This finding aligns with prior research indicating that the insula plays a crucial role in generating coherent emotions from neutral faces.41 Additionally, the hippocampus and superior temporal pole showed significant activation, suggesting their complementary contributions to contextual processing. Specifically, the hippocampus may facilitate the retrieval of emotional memories associated with prior context, while the superior temporal pole likely supports the integration of contextual cues into emotional perception.6,42 In summary, these findings highlight a distributed neural network that underlies top-down contextual processing during short video viewing. To further refine the spatial localization of these activations, we identified the peak coordinates of each region using the DPABI toolbox (Table S1).43

Multi-region temporal dynamics in contextual processing

Building on the whole-brain activation findings, we applied FIR modeling to investigate the temporal dynamics of contextual processing within regions implicated in top-down modulation (as listed in Table S1). This method enabled us to capture activation changes at specific time points (Figures 3A and S1), providing insights into the evolving engagement of these regions across different emotional conditions. The FIR results from this analysis are presented in Table S2.
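As an illustration of how an FIR model can be specified, the sketch below builds an FIR design matrix in nilearn. The TR, run length, event table, and number of delays are assumptions chosen to span a 0-22 s trial window consistent with the time points referenced in the figures.

```python
# Minimal FIR design-matrix sketch (assumed TR = 2 s; 12 delays span 0-22 s,
# covering the Face_1, Emotional Context, and Face_2 periods of a trial).
import numpy as np
import pandas as pd
from nilearn.glm.first_level import make_first_level_design_matrix

t_r = 2.0
frame_times = np.arange(300) * t_r                  # assumed run length

# events: one row per stimulus with onset (s), duration (s), and trial_type
events = pd.DataFrame({"onset": [10.0, 16.5, 24.0],
                       "duration": [2.0, 4.0, 2.0],
                       "trial_type": ["face_1", "context", "face_2"]})

design = make_first_level_design_matrix(
    frame_times, events,
    hrf_model="fir",                                # one regressor per delay
    fir_delays=np.arange(12),                       # delays 0..11 TRs (0-22 s)
    drift_model="cosine",
)
# Each condition now has 12 columns (e.g., "face_1_delay_0", ...), whose fitted
# betas trace the signal change over time without assuming a canonical HRF.
```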

Figure 3. FIR analysis and behavioral association results using colored short videos

(A) The grand average time course of percentage signal change in ROIs (peak MNI coordinates of Cluster 1: [52, −42, 54], as reported in Table S1) is shown across three emotional conditions (Negative, Neutral, Positive). Time points at 6 s, 14 s, and 20 s were selected to approximate the offset times of “Face_1,” “Emotional Context,” and “Face_2,” based on the typical hemodynamic response function (HRF) latency.

(B) The percentage signal change is plotted across three time points (“Face_1,” “Emotional Context,” “Face_2”) and three emotional conditions (Negative, Neutral, Positive) for the peak coordinates of each cluster identified in the contrast analysis. Clusters 1 and 3 exhibited a statistically significant main effect of time.

(C) The percentage signal change (PSC) during “Face_2” was used to correlate with valence and arousal scores.

(D) Behavioral association results based on PSC values. Significant correlations were observed between PSC and valence scores in Cluster 1 (i) (r = −0.218, p = 0.012, one-tailed). Additionally, significant correlations were found between PSC and arousal scores in Cluster 2 (v) (r = 0.317, p = 4 × 10⁻⁴, one-tailed).

Error bars represent the SEM.

First, the FIR analysis showed a significant main effect of time on neural contextual processing, with multiple brain regions demonstrating time-dependent changes. For the peak coordinates of the clusters, a significant main effect of time was observed for neural responses in Cluster 1 (F(2, 34) = 7.208, p = 0.002, ηp² = 0.298) and Cluster 3 (F(1.6, 57.1) = 19.148, p = 2 × 10⁻⁶, ηp² = 0.354, Greenhouse-Geisser correction), indicating time-dependent modulation in these clusters during contextual processing (Figure 3B). For the specific regions of interest (ROIs) (Figure S2), significant time-dependent changes were observed in the TPJ (Parietal_Inf_R: F(2, 34) = 7.208, p = 0.002, ηp² = 0.298; Parietal_Sup_R: F(1.6, 57.7) = 9.149, p = 0.001, ηp² = 0.207, Greenhouse-Geisser correction; Angular_R: F(1.7, 57.8) = 10.146, p = 4 × 10⁻⁴, ηp² = 0.225, Greenhouse-Geisser correction), hippocampus (Hippocampus_L: F(1.6, 56.3) = 5.899, p = 0.008, ηp² = 0.144, Greenhouse-Geisser correction), posterior cingulate cortex (Cingulate_Post_L: F(1.5, 52.3) = 7.090, p = 0.004, ηp² = 0.168, Greenhouse-Geisser correction), precuneus (Precuneus_L: F(2, 34) = 9.259, p = 0.001, ηp² = 0.353; Precuneus_R: F(1.4, 50.7) = 14.391, p = 7 × 10⁻⁵, ηp² = 0.291, Greenhouse-Geisser correction), caudate (Caudate_L: F(2, 34) = 8.261, p = 0.001, ηp² = 0.327), and postcentral gyrus (Postcentral_R: F(2, 34) = 15.869, p = 1 × 10⁻⁵, ηp² = 0.483). Post hoc comparisons, using Face_1 as the baseline, revealed that percentage signal change (PSC) values during the Emotional Context period were significantly higher than in Face_1 across most regions (p < 0.05). This suggests that these regions exhibit increased engagement in processing contextual information, likely reflecting the integration of prior affective cues into current perception. However, not all regions exhibited increased activation during Face_2. In the TPJ and precuneus, PSC values for Face_2 remained significantly higher than for Face_1, suggesting that these regions continue to engage in top-down modulation, possibly integrating contextual information to influence later-stage emotional perceptual judgments.

The FIR results further revealed a main effect of condition on neural responses to emotional context, suggesting that specific brain regions exhibit preferential sensitivity to different emotional conditions. Significant condition effects were found in the temporal pole (Temporal_Pole_Sup_L: F(2, 34) = 3.390, p = 0.045, ηp² = 0.166) and right hippocampus (Hippocampus_R: F(2, 34) = 4.831, p = 0.014, ηp² = 0.221), with the temporal pole exhibiting greater activation in response to positive emotional contexts compared with neutral contexts, whereas the right hippocampus showed a stronger response to neutral contexts than to negative ones. Lastly, a 3 (Time: Face_1, Emotional Context, Face_2) × 3 (Condition: Negative, Neutral, Positive) ANOVA revealed a significant interaction effect in the left temporal pole (Temporal_Pole_Sup_L: F(4, 32) = 2.814, p = 0.042, ηp² = 0.260, Greenhouse-Geisser correction), indicating that this region plays a role in processing emotional context across different time points.

Association of TPJ and insula activation with behavioral ratings of contextual processing

To examine whether PSC values during Face_2 correlated with participants’ emotional ratings, we analyzed their relationship with valence and arousal ratings across conditions. This analysis further explores the roles of the TPJ and insula in contextual emotion processing (Figure 3C). The TPJ plays a critical role in social cognition,44,45 while the insula is central to emotional responses.46,47,48 Our contrast analysis further confirmed that the TPJ and the insula were, respectively, the most extensive and the most strongly activated regions associated with contextual processing.

For Peak Cluster 1 (centered in the TPJ), the correlation between PSC and valence scores across the three conditions was significant (r = −0.218, p = 0.012, one-tailed; Figure 3Di), suggesting that the TPJ is primarily involved in valence ratings. Higher PSC values were associated with lower valence scores, supporting its role in dynamically integrating social cues and inferring emotional valence during contextual processing. When analyzing PSC within each condition separately, no significant correlation was found between PSC and behavioral ratings (all p > 0.05; Figures S3A–S3C). For Peak Cluster 2 (centered in the insula), a significant correlation was found between PSC and arousal scores across the three conditions (r = 0.317, p = 4 × 10⁻⁴, one-tailed; Figure 3Dv). When examining PSC separately for each condition and behavioral rating (Figures S3D–S3F), significant positive correlations were observed. Specifically, in the neutral condition, PSC was significantly correlated with both valence (r = 0.417, p = 0.006, one-tailed) and arousal (r = 0.314, p = 0.031, one-tailed), and in the positive condition, PSC was significantly correlated with both valence (r = 0.399, p = 0.008, one-tailed) and arousal (r = 0.379, p = 0.011, one-tailed). These findings demonstrate that the insula plays a central role in processing both valence and arousal, with higher PSC values corresponding to increased ratings in these dimensions, reflecting its involvement in emotional and physiological integration. For Peak Cluster 3 (located near the caudate tail), no significant correlation was found between PSC and behavioral ratings across all conditions (Figures 3Diii and 3Dvi), as confirmed by separate condition analyses (all p > 0.05; Figures S3G–S3I).
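These brain-behavior tests are plain one-tailed Pearson correlations; a minimal sketch follows. The placeholder arrays stand in for the per-subject-per-condition PSC values and matching ratings, and the array size is an assumption for illustration only.

```python
# Minimal sketch of the one-tailed brain-behavior correlations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)                      # placeholder data only
psc_tpj, valence = rng.normal(size=108), rng.normal(size=108)
psc_insula, arousal = rng.normal(size=108), rng.normal(size=108)

# TPJ: hypothesized negative association (higher PSC, lower valence).
r1, p1 = stats.pearsonr(psc_tpj, valence, alternative="less")

# Insula: hypothesized positive association (higher PSC, higher arousal).
r2, p2 = stats.pearsonr(psc_insula, arousal, alternative="greater")

print(f"TPJ-valence r = {r1:.3f} (p = {p1:.3g}); "
      f"insula-arousal r = {r2:.3f} (p = {p2:.3g})")
```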

Association of whole-brain dynamic FC with behavioral ratings of contextual processing in the negative condition

To explore the dynamic interactions among multiple regions during contextual processing, we conducted a sliding window analysis based on ROIs (as listed in Table S1) for each condition (Figure 4A). This analysis generated a series of 17 × 17 symmetric matrices representing the whole-brain network within each time window. The full set of dynamic brain networks across different time windows for the three conditions is provided in Figure S4. While the averaged FC matrices gradually decreased across all three conditions (Figure 4B), a paired t-test on the averaged FC values revealed that the difference between the first (0–8s) and last (14–22s) time windows was significant only in the negative condition (t = 2.602, df = 70, p = 0.012; Figure 4C). In contrast, no significant differences were found in the neutral condition (t = 0.685, df = 70, p = 0.496) or positive condition (t = 1.065, df = 70, p = 0.291).
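The windowing scheme itself is simple to express; below is a minimal sketch under the assumption of an ROI-by-time array sampled at TR = 2 s, with the 8-s window and 2-s step described in Figure 4A. The placeholder time series and its length are illustrative, not the actual data.

```python
# Minimal sliding-window FC sketch. `ts` is an assumed (n_timepoints, 17)
# array of ROI time courses for one trial/condition, sampled at TR = 2 s.
import numpy as np

t_r, win_s, step_s = 2.0, 8.0, 2.0
win, step = int(win_s / t_r), int(step_s / t_r)     # 4 samples, 2-sample step

ts = np.random.default_rng(0).normal(size=(12, 17)) # placeholder time series
fc_windows = np.array([np.corrcoef(ts[s:s + win].T) # 17 x 17 matrix per window
                       for s in range(0, ts.shape[0] - win + 1, step)])

# Averaged FC per window: mean over unique ROI pairs (upper triangle).
iu = np.triu_indices(17, k=1)
avg_fc = fc_windows[:, iu[0], iu[1]].mean(axis=1)   # one value per window

# The first (0-8 s) and last (14-22 s) windows would then be compared across
# observations with a paired t-test (scipy.stats.ttest_rel).
```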

Figure 4. Sliding window analysis and behavioral association results using colored short videos

(A) Demonstration of the sliding window analysis. In the first row, the Pearson correlation coefficients between the time courses of each pair of ROIs (as listed in Table S1) were calculated for the 0-8s time window. These correlation coefficients were averaged across subjects (green box). A series of 17 × 17 symmetric matrices represented the whole-brain network for the negative condition within the 0-8s time window. The second row shows the dynamic brain network across different time windows, with a step size of 2 s.

(B) Averaged FC across three conditions and different time windows. Each color represents a condition. Averaged FC decreases for all three conditions in the later time windows.

(C) Comparison of the averaged FC matrix between the first (0-8s) and last (14-22s) time windows. Each color represents a condition. A significant difference between the first (0-8s) and last (14-22s) time windows was detected using a paired t-test on the averaged FC in the negative condition (t = 2.602, df = 70, p = 0.012).

(D) Association between whole-brain FC with valence ratings in each condition. (i) In the negative condition, significant correlations were observed between FC in the first time window (0-8s) and valence scores (r = 0.315, p = 0.031, one-tailed). The correlation between FC in the last time window (14-22s) and valence scores (r = 0.388, p = 0.010, one-tailed) was stronger than that in the first time window. However, no significant correlation was observed in the neutral condition (ii) or the positive condition (iii) for both time windows.

Error bars represent the SEM. Asterisks denote significant differences between time windows (∗p < 0.05).

To determine whether FC patterns during Face_1 and Face_2 processing correlated with participants’ emotional valence ratings, we examined the relationship between averaged FC values and valence ratings within each condition. In the negative condition (Figure 4Di), a significant correlation was found between FC in the first time window (0–8s) and valence scores (r = 0.315, p = 0.031, one-tailed). Moreover, the correlation between FC in the last time window (14–22s) and valence scores was even stronger (r = 0.388, p = 0.010, one-tailed), suggesting an increasing influence of dynamic FC changes on emotional perception over time. However, no significant correlations were observed in the neutral condition (Figure 4Dii) or the positive condition (Figure 4Diii) for either time window. The observed decrease in FC over time suggests a process of functional segregation, indicating a reorganization of network interactions specifically in the negative condition. Furthermore, this dynamic reorganization contributes to the perception of emotions from the neutral face, highlighting its role in contextual processing. Overall, these findings provide insight into the neural mechanisms underlying contextual processing during short video viewing.

As a complementary analysis, we also performed dynamic mutual information analysis to capture potential nonlinear dependencies among brain regions, and the corresponding results are presented in Figure S5.
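For reference, a nonlinear dependence between two ROI time courses can be estimated with a k-nearest-neighbor mutual information estimator, as sketched below; the scikit-learn estimator and its settings are illustrative assumptions and not necessarily those used for Figure S5.

```python
# Minimal mutual-information sketch between two ROI time courses
# (placeholder data; the estimator choice is an assumption).
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.normal(size=200)                  # ROI 1 time course (placeholder)
y = x ** 2 + 0.1 * rng.normal(size=200)   # nonlinearly coupled ROI 2

mi = mutual_info_regression(x.reshape(-1, 1), y, n_neighbors=3)[0]
print(f"MI(x, y) = {mi:.3f} nats")        # > 0 despite near-zero Pearson r
```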

Association of cluster-wise dynamic FC with behavioral ratings of contextual processing in the negative condition

Since the FC in the negative condition was associated with valence ratings, we further segmented the brain network into clusters based on contrast analysis to examine the specific contributions of distinct functional clusters to these ratings. This approach allowed for a more detailed investigation of FC both within and between clusters in the negative condition. To achieve this, we divided the 17 × 17 symmetric matrices into nine submatrices, each representing FC either within or between clusters (Figure 5A).
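Slicing the 17 × 17 matrix into these nine blocks reduces to index bookkeeping, as in the minimal sketch below; the ROI-to-cluster index lists are illustrative placeholders rather than the actual Table S1 ordering.

```python
# Minimal cluster-wise FC sketch. `fc` is one assumed 17 x 17 window matrix;
# the ROI index lists per cluster are placeholders, not the Table S1 order.
import numpy as np

fc = np.corrcoef(np.random.default_rng(0).normal(size=(17, 40)))
clusters = {1: [0, 1, 2, 3], 2: [4, 5, 6, 7], 3: list(range(8, 17))}

def block_fc(fc, rois_a, rois_b):
    """Mean FC of one submatrix: between two clusters, or within one."""
    if rois_a is rois_b:                          # within-cluster block:
        iu = np.triu_indices(len(rois_a), k=1)    # skip the diagonal
        return fc[np.ix_(rois_a, rois_a)][iu].mean()
    return fc[np.ix_(rois_a, rois_b)].mean()

# e.g., between-cluster FC of Cluster 1 (TPJ) and Cluster 2 (insula),
# the value correlated with valence ratings in Figure 5Bi.
fc_12 = block_fc(fc, clusters[1], clusters[2])
fc_within_1 = block_fc(fc, clusters[1], clusters[1])
```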

Figure 5. Cluster-wise FC and behavioral association results in the negative condition using colored short videos

(A) Cluster-wise FC of two representative subjects from sliding-window analysis. Significant correlations exist between average FC and valence scores in the negative condition; however, further examination of FC patterns both between and within clusters is warranted. We divided the 17 × 17 symmetric matrices into 9 submatrices representing the FC between clusters or within clusters. The FC between or within clusters is shown for the first time window (0-8s, i) and the last time window (14-22s, ii). The red lines represent FC > 0, while the blue lines represent FC < 0. The thickness of the lines represents the magnitude of FC.

(B) Association between cluster-wise FC and valence rating in the negative condition. Panels i-iii represent the correlation between clusters, and panels iv-vi represent the correlation within clusters. (i) Significant correlations were observed between FC of Cluster 1 and Cluster 2 and valence scores in both the first time window (r = 0.303, p = 0.036, one-tailed) and last time window (r = 0.344, p = 0.020, one-tailed). (iii) Significant correlations were observed between FC of Cluster 2 and Cluster 3 and valence scores in the last time window (r = 0.282, p = 0.048, one-tailed).

For between-cluster connectivity, significant correlations were observed between the FC of Cluster 1 (including TPJ) and Cluster 2 (including insula) and valence scores in both the first time window (r = 0.303, p = 0.036, one-tailed) and the last time window (r = 0.344, p = 0.020, one-tailed) (Figure 5Bi). Additionally, significant correlations were found between the FC of Cluster 2 (including insula) and Cluster 3 (including hippocampus) and valence scores in the last time window (r = 0.282, p = 0.048, one-tailed) (Figure 5Biii). Moreover, the correlation strength based on FC during the last time window was higher than that during the first time window, suggesting that increased functional segregation enhanced association strength during contextual processing. For within-cluster connectivity, no significant correlations were observed between FC and valence scores (Figures 5Biv–vi). These findings indicate that the significant association between FC and behavioral ratings in the negative condition arises primarily from connectivity between clusters rather than within individual clusters. Specifically, the results suggest that valence ratings are influenced by functional interactions between Cluster 1 (including TPJ) and Cluster 2 (including insula), as well as between Cluster 2 (including insula) and Cluster 3 (including hippocampus). This highlights the critical role of these inter-cluster connections in contextual processing during short video viewing.

Replicating behavioral contextual processing in black-and-white short video viewing

To explore contextual processing in black-and-white short video viewing, 29 participants completed the single-face rating experiment (Figure 6A) and 31 participants (including the same 29 participants) completed the face-context-face sequence rating experiment (Figure 6B). Comparing valence and arousal ratings between single-face clips and faces presented in the face-context-face sequence allows us to examine how contextual processing contributes to emotional perception in black-and-white short video viewing.

Figure 6. Procedure and behavioral results using black-and-white short videos

(A) In the single-face clip rating experiment, conducted on a laptop, participants rated the valence and arousal of isolated face clips. Each trial began with a 0.5-s fixation cross, followed by a 2-s presentation of a neutral face clip. After a 0.65-s inter-stimulus interval, participants provided valence and arousal ratings for the neutral face. Valence ratings ranged from −4 (negative) to 4 (positive), while arousal ratings ranged from 1 (low) to 9 (high), with a 5-s response window per rating. The neutral faces were categorized into three emotional conditions (negative, neutral, and positive), with each condition containing 10 trials.

(B) In the face-context-face sequence rating experiment, conducted in an fMRI scanner, each trial began with a 0.5-s fixation cross, followed by a 2-s presentation of a neutral face clip and a jitter period of 4–6 s. Participants then viewed a 4-s emotional context clip (negative, neutral, or positive) followed by another jitter period. The trial concluded with a 2-s presentation of a second neutral face clip, which was similar to the first. After a 0.65-s ISI, participants rated the valence and arousal of the final neutral face. Valence ratings ranged from −4 (negative) to 4 (positive), while arousal ratings ranged from 1 (low) to 9 (high), with a 5-s response window per rating. Each trial ended with an inter-trial interval of 1–1.5 s. The fMRI experiment consisted of 30 trials, evenly distributed across the three emotional conditions.

(C) Averaged valence scores of neutral faces in the single-face clip rating experiment. A significant main effect of emotional condition on valence was observed (F(1.5, 42.0) = 15.227, p = 5.3 × 10⁻⁵, ηp² = 0.352, Greenhouse-Geisser correction).

(D) Averaged valence scores of neutral faces in the face-context-face sequence rating experiment. A significant main effect of emotional condition on valence was found (F(1.5, 45.9) = 60.102, p = 4.6 × 10⁻¹², ηp² = 0.667, Greenhouse-Geisser correction).

(E) Comparison of valence ratings between the single-face clip and face-context-face sequence rating experiments for the same participants. The delta score is calculated by subtracting the valence ratings of the single-face clip experiment from those of the face-context-face sequence rating experiment for the same participants. A significant difference was detected using a one-sample t-test on the delta scores in the negative condition (t = −4.067, df = 28, p = 3.5 × 10⁻⁴), neutral condition (t = −2.885, df = 28, p = 0.007), and positive condition (t = 4.458, df = 28, p = 1.2 × 10⁻⁴).

(F) Averaged arousal scores of neutral faces in the single-face clip rating experiment. No significant main effect of emotional condition on arousal was found (F(2, 27) = 0.926, p = 0.408, ηp² = 0.064, Greenhouse-Geisser correction).

(G) Averaged arousal scores of neutral faces in the face-context-face sequence rating experiment. A significant main effect of emotional condition on arousal was observed (F(2, 29) = 9.915, p = 0.001, ηp² = 0.406, Greenhouse-Geisser correction).

(H) Comparison of arousal ratings between the single-face clip and face-context-face sequence rating experiments for the same participants. The delta score is calculated by subtracting the arousal ratings of the single-face clip experiment from those of the face-context-face sequence rating experiment for the same participants. A significant difference was detected using a one-sample t-test on the delta scores in the negative condition (t = 2.116, df = 28, p = 0.043).

(I) Effect of context type on contextual processing. A significant main effect of context type (biological vs. non-biological) was observed (F(1, 30) = 37.246, p = 1 × 10⁻⁶, ηp² = 0.554).

(J) Effect of actor gender on contextual processing. A significant main effect of actor gender was found (F(1, 30) = 51.012, p = 6 × 10⁻⁸, ηp² = 0.630).

(K) Effect of participant gender on contextual processing. No significant main effect of participant gender was observed (F(1, 29) = 0.352, p = 0.558, ηp² = 0.012).

Error bars represent the SEM. Asterisks denote significant differences between conditions (∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001).

In the single-face clip rating experiment, valence ratings for neutral faces varied by condition: −0.65 ± 0.13 for the negative condition, 0.05 ± 0.06 for the neutral condition, and 0.09 ± 0.13 for the positive condition. A repeated-measures ANOVA revealed a significant main effect of emotional condition on valence (F(1.5, 42.0) = 15.227, p = 5.3 × 10⁻⁵, ηp² = 0.352, Greenhouse-Geisser correction; Figure 6C). Post hoc Bonferroni tests showed no significant difference between the positive and neutral conditions (p > 0.99) but found a significant difference between the negative and neutral conditions (p < 0.001). However, when neutral face clips were combined with an emotional context in the black-and-white face-context-face short video sequence, participants perceived the neutral faces differently. Negative contexts elicited a negative valence rating (−1.48 ± 0.17), while positive contexts induced a positive valence rating (0.89 ± 0.16), indicating that contextual processing leads to a coherent emotional perception. In contrast, neutral contexts did not significantly alter valence ratings (−0.16 ± 0.07). A repeated-measures ANOVA confirmed a significant main effect of emotional condition on valence (F(1.5, 45.9) = 60.102, p = 4.6 × 10⁻¹², ηp² = 0.667, Greenhouse-Geisser correction; Figure 6D), and post hoc Bonferroni tests showed significant differences across all conditions (negative vs. neutral, negative vs. positive, and neutral vs. positive; p < 0.001). For participants who completed both the single-face clip rating experiment and the face-context-face sequence rating experiment, valence ratings for neutral faces in the sequence rating experiment decreased in the negative condition and increased in the positive condition (Figure 6E). This shift was statistically significant, with delta scores confirming differences between the two experiments for the negative condition (t = −4.067, df = 28, p = 3.5 × 10⁻⁴) and the positive condition (t = 4.458, df = 28, p = 1.2 × 10⁻⁴), demonstrating that emotional context significantly influences facial emotion perception through contextual processing.

In the single-face clip rating experiment, arousal ratings for neutral faces were 4.93 ± 0.19 for the negative condition, 4.74 ± 0.21 for the neutral condition, and 4.92 ± 0.20 for the positive condition. A repeated-measures ANOVA showed no significant differences in arousal across conditions (F(2, 27) = 0.926, p = 0.408, ηp² = 0.064; Figure 6F). However, after viewing the black-and-white face-context-face short video sequence, arousal levels increased across all three conditions compared with the single-face clip rating experiment (Figure 6G). Negative contexts elicited the highest arousal levels (5.27 ± 0.24), followed by positive contexts (4.94 ± 0.25), while neutral contexts resulted in the lowest arousal levels (4.20 ± 0.25). A repeated-measures ANOVA confirmed a significant main effect of emotional condition on arousal (F(2, 29) = 9.915, p = 0.001, ηp² = 0.406, Greenhouse-Geisser correction), and post hoc Bonferroni tests showed significant differences between negative and neutral contexts (p = 0.001) and between positive and neutral contexts (p = 0.001). For participants who completed both the single-face clip rating experiment and the face-context-face sequence rating experiment, arousal ratings for neutral faces in the face-context-face sequence rating experiment increased in the negative and positive conditions but decreased in the neutral condition (Figure 6H). This shift was statistically confirmed, with significant delta scores observed between the two experiments only in the negative condition (t = 2.116, df = 28, p = 0.043).

Several factors influenced contextual processing in black-and-white short video viewing, including the type of emotional context (biological vs. non-biological) and actor gender (male vs. female). Repeated-measures ANOVAs revealed a significant main effect of context type, with biological contexts eliciting stronger contextual processing than non-biological contexts (F(1, 30) = 37.246, p = 1 × 10⁻⁶, ηp² = 0.554; Figure 6I). Additionally, actor gender had a significant impact, as participants’ valence ratings differed based on whether the actor was male or female (F(1, 30) = 51.012, p = 6 × 10⁻⁸, ηp² = 0.630; Figure 6J). However, participant gender did not influence contextual processing, with male and female participants showing similar valence rating patterns (F(1, 29) = 0.352, p = 0.558, ηp² = 0.012; Figure 6K). These results suggest that, while the emotional context and actor gender significantly shape contextual processing, individual differences based on participant gender do not play a major role. Together, these findings indicate that contextual processing in black-and-white short video viewing is influenced by both the nature of the emotional context and specific individual factors, particularly the actor’s gender.

Replicating neural contextual processing in black-and-white short video viewing

To replicate the findings on neural contextual processing in black-and-white short video viewing, we applied FIR modeling to examine the temporal dynamics of contextual processing within previously identified regions associated with top-down modulation (as listed in Table S1). This approach allowed us to capture activation changes over time (Figures S6 and S7A), providing insights into the evolving engagement of these regions across different emotional conditions during black-and-white short video viewing. The FIR results are summarized in Table S3.

The FIR analysis showed a significant main effect of time on neural contextual processing in black-and-white short video viewing, with multiple brain regions demonstrating time-dependent changes. For the peak coordinates of the clusters, a significant main effect of time was observed for neural responses in Cluster 1 (F(1.5, 45.8) = 9.731, p = 0.001, ηp² = 0.245, Greenhouse-Geisser correction) and Cluster 3 (F(2, 29) = 39.941, p = 4.7 × 10⁻⁹, ηp² = 0.734), indicating time-dependent modulation in these clusters during contextual processing (Figure 7A). For the specific ROIs (Figure S7B), significant time-dependent changes were observed in the TPJ (Parietal_Inf_R: F(1.5, 45.8) = 9.731, p = 0.001, ηp² = 0.245, Greenhouse-Geisser correction; Parietal_Sup_R: F(1.6, 46.5) = 9.803, p = 0.001, ηp² = 0.246, Greenhouse-Geisser correction), hippocampus (Hippocampus_L: F(2, 29) = 6.921, p = 0.003, ηp² = 0.323; Hippocampus_R: F(1.5, 45.6) = 11.769, p = 2.7 × 10⁻⁴, ηp² = 0.282, Greenhouse-Geisser correction), posterior cingulate cortex (Cingulate_Post_L: F(1.5, 45.9) = 12.761, p = 1.5 × 10⁻⁴, ηp² = 0.298, Greenhouse-Geisser correction), precuneus (Precuneus_L: F(2, 29) = 6.273, p = 0.005, ηp² = 0.302; Precuneus_R: F(2, 29) = 8.943, p = 0.001, ηp² = 0.381), caudate (Caudate_L: F(2, 29) = 7.120, p = 0.003, ηp² = 0.329; Caudate_R: F(1.5, 46.4) = 3.905, p = 0.037, ηp² = 0.115, Greenhouse-Geisser correction), and postcentral gyrus (Postcentral_R: F(1.5, 46.5) = 12.631, p = 1.5 × 10⁻⁴, ηp² = 0.296). Post hoc comparisons, using Face_1 as the baseline, revealed that PSC values during the Emotional Context period were significantly higher than during Face_1 across most regions (p < 0.05). This suggests that these regions exhibit increased engagement in processing contextual information, likely reflecting the integration of prior affective cues into current perception. However, not all regions exhibited sustained activation during Face_2. In the TPJ and precuneus, PSC values for Face_2 remained significantly higher than for Face_1, suggesting that these regions continue to engage in top-down modulation, possibly integrating contextual information to influence later-stage perceptual judgments. The FIR results further revealed a main effect of condition on neural responses to emotional context, suggesting that specific brain regions exhibit preferential sensitivity to different emotional conditions. A significant condition effect was found in the precuneus (Precuneus_R: F(2, 29) = 3.686, p = 0.037, ηp² = 0.203), with greater activation in response to positive emotional contexts compared with negative contexts. Lastly, a 3 (Time: Face_1, Emotional Context, Face_2) × 3 (Condition: Negative, Neutral, Positive) ANOVA revealed a significant interaction effect in the left lingual gyrus (Lingual_L: F(3.1, 93.9) = 3.106, p = 0.028, ηp² = 0.094, Greenhouse-Geisser correction), indicating that this region plays a role in processing emotional context across different time points.

Figure 7. FIR analysis and sliding window analysis using black-and-white short videos

(A) The percentage signal change is plotted across three time points (“Face_1,” “Emotional Context,” “Face_2”) and three emotional conditions (Negative, Neutral, Positive) for the peak coordinates of each cluster identified in the contrast analysis. Clusters 1 and 3 exhibited a statistically significant main effect of time.

(B) Averaged FC across three conditions and different time windows. Each color represents a condition. Averaged FC decreases for all three conditions in the later time windows.

(C) Comparison of the averaged FC matrix between the first (0-8s) and last (14-22s) time windows. Each color represents a condition. A significant difference between the first (0-8s) and last (14-22s) time windows was detected using a paired t-test on the averaged FC in the negative condition (t = 3.147, df = 60, p = 0.003).

Error bars represent the SEM. Asterisks denote significant differences between time windows (∗∗p < 0.01).

To replicate the dynamic network interactions during contextual processing in black-and-white short video viewing, a sliding window analysis was performed using the previously identified ROIs (as listed in Table S1) for each condition. A series of 17 × 17 symmetric matrices represented whole-brain connectivity within a given time window. The full dynamic brain network across different time windows for the three conditions is shown in Figure S8. While the averaged FC matrices decreased across all three conditions (Figure 7B), a paired t-test on averaged FC revealed a significant difference between the first (0–8s) and last (14–22s) time windows only in the negative condition (t = 3.147, df = 60, p = 0.003; Figure 7C) but not in the neutral condition (t = 1.056, df = 60, p = 0.295) or positive condition (t = 1.359, df = 60, p = 0.179). These findings demonstrate that black-and-white short videos replicated the reorganization of network interactions observed in the negative condition when using color short videos. Taken together, the FIR and sliding window analyses suggest that contextual processing is a fundamental cognitive mechanism, independent of color cues, reinforcing its robustness in emotional perception.

Replicating behavioral contextual processing in colored short video viewing with an independent cohort

To replicate the behavioral findings of contextual processing observed in colored short video viewing, an independent cohort of 58 participants completed the face-context-face sequence rating experiment (Figure S9A). In this replication experiment, negative contexts elicited a negative valence rating (−0.58 ± 0.04), whereas positive contexts induced a positive valence rating (0.39 ± 0.04), indicating that contextual information consistently shaped emotional perception. In contrast, neutral contexts did not significantly alter valence ratings (0.05 ± 0.03). A repeated-measures ANOVA confirmed a significant main effect of emotional condition on valence (F(1.6, 89.2) = 194.340, p = 7.1 × 10⁻³⁰, ηp² = 0.773, Greenhouse-Geisser correction; Figure S9B), and post hoc Bonferroni tests showed significant differences across all conditions (negative vs. neutral, negative vs. positive, and neutral vs. positive; p < 0.001). Although emotional contexts influenced valence ratings, no corresponding effect was observed on emotional intensity ratings (F(1.7, 99.3) = 2.543, p = 0.091, ηp² = 0.043, Greenhouse-Geisser correction; Figure S9C).

Several factors influenced contextual processing, including the type of emotional context (biological vs. non-biological), actor gender (male vs. female), and participant gender (male vs. female). Repeated-measures ANOVAs revealed a significant main effect of context type, with biological contexts eliciting stronger contextual processing than non-biological contexts (F(1, 57) = 55.745, p = 5.3 × 10⁻¹⁰, ηp² = 0.494; Figure S9D). Additionally, actor gender had a significant impact, as participants’ valence ratings differed based on whether the actor was male or female (F(1, 57) = 36.384, p = 1.3 × 10⁻⁷, ηp² = 0.390; Figure S9E). However, participant gender did not influence contextual processing, with male and female participants showing similar valence rating patterns (F(1, 56) = 0.056, p = 0.814, ηp² = 0.001; Figure S9F). These results suggest that, while emotional context and actor gender significantly shape contextual processing, participant gender does not play a major role.

Discussion

Contextual processing is widely recognized as a fundamental cognitive mechanism enabling humans to make adaptive emotional interpretations based on environmental cues, facilitating affective understanding and meaning construction. With the global rise of short videos as a dominant media form, viewers are increasingly required to interpret characters’ emotional states within fast-paced, fragmented, and richly contextualized narratives. This shift raises important questions about how this cognitive function is involved in such settings, and how the brain adapts its processing strategies to meet the demands of emotional interpretation in dynamic media environments. Our findings demonstrate that contextual processing is not disrupted but adaptively sustained during short video viewing, enabling viewers to infer coherent emotional meaning from face-context-face sequences that simulate real-life emotional dynamics. Neuroimaging results further reveal that the TPJ and insula are critically involved in integrating contextual cues to modulate emotional perception, as shown by contrast analyses and brain-behavior correlations. Notably, a reduction in FC between TPJ and insula clusters reliably tracked the strength of contextual effects, indicating a flexible reconfiguration of network-level processes in response to the temporal and narrative demands of short video content. The following sections build upon these behavioral and neural findings to illustrate how contextual processing is both preserved and reorganized in the short video era.

First, to contextualize our findings within the broader theoretical landscape, it is essential to clarify the relationship between the foundations of classical theories and their applicability in contemporary media environments. Driven by advances in mobile internet infrastructure, the widespread adoption of smartphones,49 and the global rise of platforms such as TikTok, the dominant forms of media have shifted toward short videos,50 fundamentally reshaping how billions of users engage with emotional and informational content. As a result, existing theoretical models face growing pressure to account for the rapid, multimodal, and emotionally saturated nature of this form.23,30,51 Classical theories of contextual processing were primarily based on experiments using static and highly controlled stimuli, such as facial photographs from standardized image databases.6,27,52 While researchers attempted to extend these models into more dynamic contexts by combining static facial expressions with short video clips,24,25 their efforts were constrained by significant limitations in ecological validity. Specifically, the use of zoom-in techniques and simulated motion failed to reproduce natural expressive dynamics, such as facial micro-movements and spontaneous blinking, and retained visually static backgrounds.53,54,55 Moreover, earlier studies often failed to capture the layered structure of context, including both the environmental background that frames the scene and the inserted situational cues that carry emotional meaning.56 These limitations led to an incomplete understanding of how viewers construct emotional meaning in real-world media. In response, our study integrates principles from both filmmaking and neuroscience by embedding emotionally congruent background scenes to maintain spatial continuity and enhance narrative coherence. These material refinements allowed us to simulate emotional perception in everyday contexts with greater authenticity, thus offering a more ecologically valid framework for investigating contextual processing in short video environments. Importantly, our approach reflects the broader methodological shift in cognitive neuroscience toward naturalistic paradigms grounded in film.23,31 As film-based stimuli are increasingly recognized as powerful tools for examining brain function under realistic conditions,29 our paradigm contributes to this movement by demonstrating how carefully constructed short videos can not only enhance ecological validity but also extend theoretical models of emotion and cognition through immersive, context-rich experimental designs.

Second, building upon the emphasis on ecological validity above, it is necessary to clarify how our stimulus construction operationalizes these principles within the empirical landscape of short videos. Although the short videos used in this study were not directly sampled from social media platforms, they were carefully designed to approximate the core structural and affective characteristics of authentic short-form videos while maintaining methodological rigor. Content on platforms such as TikTok and YouTube Shorts is highly heterogeneous; however, despite this diversity, certain editing grammars recur across popular formats, particularly the face-context-face configuration that alternates between a character’s facial reaction and the surrounding context, resembling the perceptual logic of POV (point-of-view) editing. These structures are pervasive in micro-dramas, reaction clips, and POV meme videos, forming a shared visual syntax for emotional storytelling. Recent analyses indicate that the hashtag #POV alone has generated over 68 million videos on TikTok, underscoring its prominence as a narrative convention in contemporary media.57 Similarly, more than half of YouTube’s entertainment content now appears within its Shorts ecosystem, where short-form videos featuring rapid alternations between performer faces and contextual scenes dominate user engagement.15 Given the ubiquity of this editing grammar in short-form media, we sought to distill its perceptual essence rather than reproduce the uncontrolled variability of real online content. To achieve this, we extracted the face-context-face editing prototype as the structural foundation of our stimuli and systematically standardized low-level cinematic parameters, including acting style, shot length, shot scale, POV editing, and color grading.58,59 This approach preserved the key perceptual mechanisms underlying emotional inference in naturalistic short videos, as alternating visual perspectives between faces and contexts is known to facilitate viewers’ interpretation of emotional meaning.6,29,60 By emphasizing representativeness over direct replication, our design captured the essential grammar of emotional communication across short-form media while retaining experimental precision. In this way, the paradigm advances film-based experimental methodologies by explicitly situating contextual emotion perception within the formal structures of contemporary short video storytelling.

Third, to evaluate how contextual processing contributes to emotional perception in dynamic media environments, our behavioral findings provide robust evidence that such processing alters emotional perception during short video viewing. Prior to the main experiment, we conducted a preliminary rating task to establish an emotional baseline for neutral faces (Tables S4 and S5). In the main experiment, participants were asked to assess the valence and arousal of neutral faces after watching face-context-face video sequences. The ANOVA results revealed a significant difference in valence between emotional and neutral conditions (Figures 1D, 6D, and S9B), suggesting that emotional context systematically altered participants’ perception of affect in otherwise neutral faces. These findings align with previous studies showing that viewers often rely on external cues to resolve ambiguous facial expressions.24,25,27,56 Extending these insights, our study adopted a within-subject design, which reduced between-subject variability and allowed a more direct observation of contextual processing. In the single-face condition, neutral faces were presented in isolation and elicited minimal emotional perception (Figures 1A and 6A). In contrast, in the face-context-face condition, where the same faces were presented within emotional context sequences, participants inferred emotions that were consistent with the surrounding narrative (Figures 1B and 6B). Importantly, both valence and arousal ratings were significantly higher in the face-context-face condition (Figures 1E, 1H, 6E, and 6H), and this pattern persisted across both color and black-and-white formats, indicating that contextual processing is robust and independent of color cues. Moreover, the within-subject design enabled us to uncover the top-down nature of the contextual modulation,27,33 in which viewers relied on associative learning and prior experiences that link specific contexts with emotional outcomes to interpret otherwise ambiguous stimuli.2,61 In this framework, the “top” refers to the stored experiential associations,62,63 while the “down” refers to how those associations shape perceptual and affective judgments,64 reinforcing the view that contextual processing is not purely driven by sensory input but involves dynamic cognitive appraisal.4 Beyond this core mechanism, our findings further illuminate the influence of social and individual differences on contextual processing. Specifically, the type of emotional context (biological compared with non-biological) and the gender of the actor significantly affected valence ratings across both visual formats (Figures 1I, 1J, 6I, 6J, S9D, and S9E). Non-biological contexts were associated with lower valence ratings, possibly due to their reduced social relevance or greater emotional ambiguity, which may lead participants to adopt more negatively biased interpretations in the absence of clearly embodied cues.65,66 Additionally, male actors were rated as evoking more positive emotional perceptions than female actors, a finding that may reflect gender stereotypes associating male expressions with greater emotional stability or social acceptability.67 Participant gender also influenced valence ratings, although this effect was observed only in the color video condition within one dataset (Figure 1K). Female participants gave lower valence scores on average than male participants, which may reflect greater emotional sensitivity and accuracy in recognizing subtle or mixed emotional expressions, a trait that has been linked to more cautious emotional evaluations.68

Fourth, our findings indicate that the neural mechanisms underlying contextual processing during short video viewing differ from those previously identified in traditional paradigms. Earlier studies using static image-based paradigms have consistently implicated regions such as the bilateral temporal pole, anterior cingulate cortex, amygdala, and bilateral superior temporal sulcus in contextual processing.6,27 In contrast, our study, which employed ecologically valid short video stimuli, revealed three distinct clusters comprising regions including the TPJ, insula, hippocampus, and temporal pole (Figure 2B and Table S1). Importantly, the activation patterns from these clusters were isolated from both facial expressions and the emotional content of preceding contexts, thereby reflecting brain regions specifically involved in top-down modulation during contextual processing.27,33 The temporal pole, previously described as a repository of contextual schemas,69,70 is known to support the integration of emotionally meaningful background information, and damage to this region has been associated with impairments in recognizing familiar scenes and faces.71,72 However, the involvement of the temporoparietal junction observed in our study highlights its specialized role in interpreting socially and emotionally relevant contextual cues, particularly in immersive and dynamic environments such as short videos. This interpretation is supported by prior research linking the TPJ to mechanisms of top-down contextual updating, a cognitive process in which new stimuli are evaluated in light of internal models and expectations shaped by prior experience.73,74 In our experiment, participants inferred emotional meaning from a second neutral face (Face_2) that followed an emotional context, a process likely driven by internalized models rather than immediate perceptual input. This finding suggests that contextual processing in short video environments is not purely stimulus-driven but instead involves active inference modulated by stored knowledge. In addition, the TPJ is a central hub in the mentalizing network,75 which enables the understanding of others’ thoughts and emotions in socially complex settings,23,76 and its activation in our task indicates that participants may have construed short video narratives as socially meaningful episodes, prompting spontaneous attribution of mental states. This interpretation is further corroborated by findings from clinical populations; for example, individuals with autism spectrum conditions, who typically exhibit deficits in social cognition, show reduced TPJ activation when viewing socially awkward moments in naturalistic television clips.77 The developmental trajectory of TPJ function also supports its link with acquired, experience-based processes. Studies have demonstrated that the efficiency of contextual framing increases with age, mirroring the maturation of the mentalizing network.78,79 Further supporting the role of experience in contextual processing, our contrast analysis identified functionally relevant clusters, including the hippocampus, a region implicated in retrieving learned associations that inform emotional and social inferences.2,6,42 Prior work has shown that the ability to contextualize emotion is not innate but develops over time; for instance, first-time viewers often struggle to integrate contextual cues when interpreting neutral facial expressions, whereas experienced viewers succeed in doing so.80 Given that our participants were university students with considerable exposure to visual media (mean age = 22.64), their contextual evaluations likely reflected well-established emotional learning. Taken together, these findings indicate that the neural architecture supporting contextual processing is not static or universally fixed but is instead dynamically restructured in response to the demands of specific task contexts and ecologically realistic stimuli.

Fifth, contextual processing unfolds over time and contributes to emotional perception in ways that warrant close examination within dynamic media environments such as short videos. Building on the contrast results, we conducted an FIR analysis to examine whether neural engagement in relevant regions was transient or sustained throughout the task. This analysis revealed a significant main effect of time across brain clusters (Tables S2 and S3), indicating the dynamic involvement of these regions during contextual evaluation. Compared with a previous study that employed intracranial electrodes and static pictures to investigate dynamic contextual processing in the orbitofrontal cortex, amygdala, and hippocampus,6 our findings expand existing knowledge by demonstrating that multiple cortical and subcortical regions, including the TPJ, insula, and hippocampus, actively contribute to the temporal dynamics of contextual processing under naturalistic media conditions. At the level of individual regions of interest, the hippocampus, which is commonly associated with associative memory,42,81,82 showed a significant main effect of time, further reinforcing its role in top-down modulation.27,33 The temporal pole also exhibited a significant interaction effect during color video viewing, consistent with previous findings,27,70,83 highlighting its role in the integration of emotionally meaningful background information. Notably, the time series of the TPJ showed minimal activation during the initial neutral face phase (Face_1) but heightened activation during the emotional context phase; more importantly, it maintained elevated activity during the second neutral face phase (Face_2) (Figures 3A, S2 and S7), suggesting continued engagement in affective meaning construction during contextual processing. The involvement of the TPJ in dynamic emotional contextual processing is supported by prior electroencephalography evidence,17 which reported increased N170 and late positive potential amplitudes at the P10 site during the viewing of happy compared with neutral faces. Similarly, functional magnetic resonance imaging studies have shown that the right TPJ encodes the polarity, complexity, and intensity of the emotional experience,44 aligning with its established function in contextual updating, whereby new stimuli are interpreted in relation to internal models and expectations.73,74 In our study, TPJ activation was significantly correlated with participants’ behavioral contextual effects (Figure 3Di). This finding supports the view that the TPJ, a region extensively implicated in social cognition,23,76 facilitates the resolution of ambiguity in facial expressions by integrating emotionally relevant context. In parallel, activation in the insula was significantly correlated with arousal ratings (Figure 3Dv), consistent with its well-established function in emotional awareness.46,47,48 Together, these findings reveal the distinct but coordinated roles of the TPJ and insula in supporting emotional context processing in short video environments.

Sixth, contextual processing unfolds over time and contributes to emotional perception in ways that necessitate examination at the network level, particularly within dynamic media environments such as short video viewing. While our FIR analysis focused on region-specific temporal dynamics,36 a broader question concerns how these regions collectively support processing through network-level reorganization.84 The brain’s functional architecture is characterized by a dynamic balance between integration and segregation, where integration promotes interregional cooperation for complex tasks and segregation allows for region-specific specialization.85,86,87 This balance forms the foundation of cognitive flexibility,88 and recent research suggests that slow cortical dynamics enable neural circuits to encode temporal context and novelty through evolving population states.89 To examine how these principles manifest during contextual processing, we applied sliding window FC analysis to determine whether and how network configurations reorganize over time.37,90 Under negative emotional conditions, we observed a significant reduction in FC between the contextual processing regions identified in the contrast analysis (Figures 4B and 4C), and this decoupling was significantly associated with stronger behavioral contextual effects (Figure 4Di). Although network reconfiguration was also present in neutral and positive contexts, the average change in connectivity between the first and second face phases was not statistically significant, nor was it associated with behavioral changes, suggesting asymmetry in how emotional valence modulates network dynamics during contextual processing. Overall, the observed reduction in connectivity reflects increased functional segregation,91,92,93 indicating that distinct brain regions may assume more specialized roles in constructing affective meaning. This interpretation aligns with a growing body of research demonstrating that the organization of brain networks plays a critical role in cognitive performance. For example, studies have shown that neuromodulatory mechanisms facilitate flexible transitions between integrated and segregated network states, thereby supporting adaptive cognition across tasks.94 Complementing this, evidence from recent work suggests that greater functional segregation within neural systems is associated with enhanced crystallized intelligence and faster processing speed.95 At the cluster level, our results indicate that negative emotional contexts primarily prompted reorganization between clusters (Figures 5Bi–iii), rather than within clusters (Figures 5Biv–vi). In particular, reduced connectivity between Cluster 1 (including TPJ) and Cluster 2 (including insula) during the “Face_2” phase was significantly associated with higher valence ratings (Figure 5Bi). This association was stronger during “Face_2” than during “Face_1”. This pattern of network-level segregation may facilitate more modular and efficient processing, allowing the TPJ and insula to contribute distinct but complementary functions to affective interpretation. Although these regions are anatomically connected through white matter tracts, as demonstrated by diffusion imaging research,96 they nonetheless engage in functionally distinct roles. This distinction is further supported by resting-state fMRI studies showing that, while the TPJ and anterior insula co-activate within the salience and ventral attention networks, they exhibit divergent temporal profiles, suggesting specialized processing dynamics depending on task demands.97 In addition, findings from social discounting paradigms show that the TPJ and insula are differentially recruited depending on the motivational framing,98 reinforcing the idea that these regions flexibly decouple to support specialized processing, consistent with our finding that reduced connectivity between them is associated with stronger contextual effects. Additionally, cross-cluster decoupling between Cluster 2 (including insula) and Cluster 3 (including hippocampus and precuneus) was also associated with higher valence ratings (Figure 5Biii), further underscoring subcortical functional specialization that may enhance deeper emotional integration. Taken together, these findings align with intracranial evidence from Zheng et al.,6 who reported that hippocampal and orbitofrontal circuits jointly contribute to contextual processing, albeit within more spatially constrained paradigms. In summary, these results suggest that contextual processing in naturalistic media is not governed by a static or monolithic circuit but rather emerges through flexible, adaptive network restructuring that is finely attuned to the affective and narrative complexities of short video content.

Seventh, our findings carry broader implications for social cognitive theory and media design, with additional relevance to clinical neuroscience. At the theoretical level, emotional perception is not an isolated process but is shaped by dynamic contextual cues.6,56 The face-context-face structure employed in this study (Figure 1B), which simulates a POV narrative,58 provides a naturalistic paradigm for investigating contextual processing. This structure includes both a glance shot and an object shot, allowing viewers to adopt the actor’s perspective. Because it mirrors how individuals observe and interpret social cues in everyday life,24 it facilitates audience immersion into the actor’s viewpoint and enhances ecological validity, thereby bridging the gap between controlled experimental settings and real-world social cognition. With implications for short video production, the validated face-context-face structure mirrors a core narrative strategy commonly employed in formats like short dramas, vlogs, and news reports, characterized by alternating facial expressions and contextual scenes. Our findings show that contextual processing is not only preserved in this form but also modulated by both contextual and individual factors; specifically, biological contexts and male actors were associated with stronger contextual modulation (Figures 1I, 1J, 6I, 6J, S9D and S9E). These effects provide practical insights for content creators, highlighting which combinations of narrative elements are most likely to elicit coherent and emotionally resonant responses. This supports the use of modular emotional templates in short video production and informs the development of affective computing and automated editing systems. At the clinical level, the present study identified several brain regions, including the TPJ, insula, and hippocampus, that are involved in contextual processing. Although these findings are based on healthy participants and remain correlational in nature, it is noteworthy that these regions have also been implicated in previous research on psychiatric and neurological conditions characterized by deficits in social and emotional processing.2,20,99,100,101 Consistent with recent efforts to apply naturalistic fMRI paradigms to the study of psychiatric disorders,2,22,31,102,103,104 future research building upon the current findings could further explore contextual processing in clinical populations. Such work may ultimately contribute to the identification of candidate neural markers for disorders involving contextual misprocessing, provided that these findings are rigorously validated in clinical cohorts.

In summary, this study confirms that contextual processing is preserved in a naturalistic media environment. Specifically, the TPJ and insula, together with their related networks, play distinct yet coordinated roles in integrating environmental cues and guiding emotional perception during contextual processing. Our findings could extend classic cognitive theories, advance our understanding of emotional perception in short video formats, and may inform future investigations into neural correlates of contextual processing dysfunction in psychiatric conditions.

Limitations of the study

There are several limitations in the current study that warrant consideration for future research. To begin with, the limited temporal resolution of fMRI constrains its ability to capture fast-evolving neural dynamics inherent to contextual processing in short videos. Although our use of FIR modeling and sliding window analysis partially addressed this issue by capturing broader temporal trends, future studies may benefit from high-temporal-resolution techniques such as magnetoencephalography, which can offer finer insights into the timing of context-emotion interactions.105 In addition, while our paradigm focused exclusively on visual information, real-world short videos are inherently audiovisual. Auditory cues often provide essential emotional context and can modulate both perception and interpretation.106 Incorporating sound in future designs could enhance ecological validity and reveal how multimodal integration shapes contextual processing. Another concern involves the format of the video presentation. Current short video platforms predominantly use a 9:16 vertical format optimized for mobile screens,107 which contrasts with the 16:9 horizontal aspect ratio employed in our experimental stimuli. This mismatch may affect visual attention and emotional immersion, potentially modulating contextual processing outcomes. Future studies should explore whether this format difference alters neural engagement or behavioral responses. Furthermore, although we selected the fundamental face-context-face sequence to approximate the basic POV editing of short videos,57 the diversity of content on TikTok is far greater, encompassing wider shot angles, varied shot types, and differences in shot duration and overall video length. Importantly, our current design focuses exclusively on short sequences and does not directly compare short and long videos, limiting the generalizability of the conclusions across different types of videos on social media. Future studies could implement more complex structures to enhance ecological validity and examine emotional perception across longer video durations, beyond the constraints of the current face-context-face sequence. Moreover, emotional experience in the present study was assessed solely through self-reported valence and arousal ratings, which may be influenced by subjective bias and limited introspective accuracy. To overcome this limitation, future work could incorporate physiological indices of emotional arousal, such as electrodermal activity and heart rate variability, which reflect autonomic nervous system activation—a core component of emotional responding in many contemporary theories of emotion.108,109 These indices capture complementary sympathetic and parasympathetic dynamics, providing objective evidence of emotional intensity beyond self-report. Combining fMRI with these multimodal physiological measures would allow for a more accurate characterization of the emotional processes underlying contextual processing. Lastly, our sample size and demographic diversity were relatively limited, which may constrain the reliability and generalizability of our findings. Future studies should recruit larger and more culturally diverse cohorts to examine how factors such as cultural background, ethnicity, or social context modulate contextual processing of emotional perception, thereby providing a more comprehensive and reliable characterization of the neural mechanisms underlying contextual processing.110,111

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Yiwen Wang (wyiw@bnu.edu.cn).

Materials availability

This study did not generate new unique reagents.

Data and code availability

Acknowledgments

This study was financially supported by the Arts Project of 2023 National Social Science Fund of China, China (Grant No. 23ZD07); the National Natural Science Foundation of China, China (Grant No. 32400844); the China Postdoctoral Science Foundation, China (Grant No. 2025M773628); and an industry-university collaborative project between Zhengzhou Qinghe Measurement and Control Technology Co., LTD and Beijing Normal University, China (Grant No. 2252000030). The authors thank them for their financial support. We would also like to thank Professor Yong He from the IDG/McGovern Institute at Beijing Normal University, whose invaluable assistance greatly facilitated our data collection process. Special thanks are also extended to the crew at Sichuan Film and Television University for their support in producing the short videos for this study. The authors would also like to thank Keqing Cao, Yanlin Zhu, Zhichen Shi, and Yuan Wu for their support during the study.

Author contributions

Z.C.: conceptualization, investigation, methodology, software, data curation, formal analysis, visualization, funding acquisition, writing - original draft, writing - review & editing. X.X.: conceptualization, methodology, supervision, funding acquisition, writing - review & editing. Yashu Wang: investigation, resources. R.L.: supervision, writing - original draft. Y.X.: methodology. L.W.: resources. S.B.: investigation. F.Y.: investigation. Yiwen Wang: project administration, conceptualization, supervision, funding acquisition, writing - review & editing.

Declaration of interests

The authors declare no competing interests.

STAR★Methods

Key resources table

REAGENT or RESOURCE SOURCE IDENTIFIER
Deposited data

Behavioral and neuroimaging data This paper https://github.com/Michaelcao92/ContextualProcessing

Software and algorithms

MATLAB 2016b MATLAB software http://www.mathworks.com
PsychoPy 3.2 Psychopy software https://www.psychopy.org
G∗Power G∗Power http://www.gpower.hhu.de
SPM12 NITRC https://www.nitrc.org/projects/spm
XjView xjView toolbox https://www.alivelearn.net/xjview
BrainNet Viewer NITRC https://www.nitrc.org/projects/bnv
SPSS (v 26.0) IBM SPSS Statistics https://www.ibm.com/SPSS

Experimental model and study participant details

The sample size was calculated using G∗Power software, with an expected effect size (f) of 0.6, a significance level (α) of 0.05, and a statistical power (1−β) of 0.8. We used three conditions (negative, neutral, and positive) to determine the required number of participants, which resulted in a total sample size of 30. To ensure robustness, our sample size exceeded this threshold, surpassing those used in previous studies investigating contextual processing.6,21,35 We recruited 149 healthy individuals (75 females, aged 22.40 ± 3.52 years) with normal or corrected-to-normal vision from Beijing Normal University.28 Participants were screened through a psychological health questionnaire and reported no history of panic disorder. All participants provided written informed consent and received financial compensation; the study protocol was approved by the ethics committee of the State Key Laboratory of Cognitive Neuroscience and Learning at Beijing Normal University (approval no. IRB B 0030 2019001).
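For readers who wish to verify this a priori calculation without the G∗Power GUI, the following minimal sketch reproduces it with statsmodels; the inputs are taken from the text, and the exact rounded result may differ slightly from G∗Power's output.

```python
# Sketch: a priori sample-size calculation for a one-way ANOVA with three
# conditions, mirroring the G*Power settings reported above.
from statsmodels.stats.power import FTestAnovaPower

n_total = FTestAnovaPower().solve_power(
    effect_size=0.6,  # expected Cohen's f
    alpha=0.05,       # significance level
    power=0.8,        # desired power (1 - beta)
    k_groups=3,       # negative, neutral, positive
)
print(f"Required total N: {n_total:.1f}")  # should land near the reported N of 30
```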

Ethics approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. The study was approved by the Institutional Review Board of the State Key Laboratory of Cognitive Neuroscience and Learning at Beijing Normal University (Approval No. IRB B 0030 2019001).

Informed consent

Before the study began, written informed consent was obtained from all actors and human participants. Furthermore, consent was also secured for publishing any potentially identifiable images or data presented in this article, ensuring adherence to ethical guidelines for research transparency and participant privacy.

Method details

General design

This study aimed to investigate the behavioral and neural mechanisms of contextual processing in short video viewing, focusing specifically on how emotional perception emerges from the integration of facial expressions and contextual cues. In the main experiment, neutral faces were presented either within a face-context-face sequence while participants were scanned using fMRI, or as isolated stimuli on a laptop, following a within-subject design. This comparison was designed to observe how coherent emotional perception is shaped by contextual embedding and to assess the role of top-down modulation in this process. For the fMRI experiment, we identified specific brain regions involved in the top-down modulation of contextual processing and investigated their temporal dynamics during naturalistic video viewing. Following the experiment with colored short videos, the same procedure was applied to black-and-white short videos to replicate the behavioral and neural mechanisms of contextual processing. Lastly, behavioral replication was further conducted with an independent cohort to confirm the robustness of contextual processing effects in colored short video viewing.

The overall experimental design comprised three components: a materials rating experiment, a face-context-face sequence rating experiment, and a subsequent single-face rating experiment. For the materials rating experiment, 12 participants (6 females) rated colored and 12 participants (6 females) rated black-and-white short videos. In the face-context-face sequence rating experiment for colored short videos, 36 participants (19 females) were recruited, with 29 completing the subsequent single-face rating experiment, while 7 dropped out. For the black-and-white short video experiment, 31 participants (16 females) were recruited, with 29 completing the subsequent single-face rating experiment, while 2 dropped out. For the replicated face-context-face sequence rating experiment, 58 participants (28 females) were recruited.

Stimuli

In the current study, under the guidance of a professional director and crew, we produced face–context–face short video sequences to investigate contextual processing during short video viewing, as pre-existing short video clips with complex mise-en-scène and variable durations were not suitable for our specific research objectives. Moreover, the face–context–face structure represents a fundamental narrative unit commonly found in various forms of short videos, including short dramas, vlogs, and news reports. The video sequences included both colored and black-and-white types. In brief, the preparation involved three main steps: Video Clip Production, Clips Rating, and Video Sequence Assembly.28

In the Video Clip Production step, we adopted a standardized camera setup that adhered to the one-sided 180-degree axis rule, ensuring that both face shots and emotional contexts were viewed from the same vantage point. We recorded thirty 2-second face clips, featuring 15 female and 15 male actors, and thirty 4-second clips representing emotional contexts. The face clips featured actors maintaining a neutral expression, recorded against a blue screen to ensure a consistent background. Actors were instructed to maintain a neutral facial expression and direct their gaze at a fixed point near the camera. The proportion of biological (e.g., human-related) to non-biological (e.g., object-related) elements in emotional context clips varied across conditions: negative (50%:50%), neutral (40%:60%), and positive (60%:40%). The selected clip duration aligns with the standard length of short video shots and film scenes, which typically range from 3 to 4 seconds.112,113 The emotional context clips were divided into three categories—negative, neutral, and positive—with ten clips per category. For example, the emotional contexts included a bathtub with blood (negative), cleaning a blackboard (neutral), and humorous facial expressions (positive).

Description of producing short videos

(A) Description of shot angles used to create short videos. Two cameras, strategically positioned along a 180-degree axis, were used to capture the sequences in a shot-reverse-shot structure. One camera recorded neutral faces, while the other captured emotional contexts.

(B) On set photos of the camera recording neutral faces.

(C) Examples of emotional context used in short videos.

In the Clips Rating step, participants independently evaluated the valence and arousal of both face and emotional context clips. Each participant rated the stimuli using a scale from −4 (negative) to 4 (positive) for valence and a scale from 1 (low) to 9 (high) for arousal. Participants had a 5-second response window for each stimulus. The stimuli were presented using PsychoPy 3.2 software (https://www.psychopy.org) on a 14-inch laptop, positioned approximately 60 cm from the participants. Participants rated the stimuli using Visual Analogue Scales, with a mouse to select the corresponding values for valence and arousal. The rating results for colored clips are shown in Table S4. For the faces, no significant differences were found between the three groups in valence (F(2, 8) = 0.924, p = 0.435, ηp² = 0.188) or in arousal (F(2, 8) = 0.140, p = 0.872, ηp² = 0.034). For the contexts, significant differences were observed among the emotional contexts in both valence scores (F(2, 8) = 119.650, p = 1 × 10⁻⁶, ηp² = 0.968) and arousal scores (F(2, 8) = 43.771, p = 4.9 × 10⁻⁵, ηp² = 0.916). Post hoc Bonferroni tests revealed significant differences between the negative and neutral groups (p < 0.001) and between the positive and neutral groups (p < 0.001) for both valence and arousal. The rating results for black-and-white clips are shown in Table S5. For the faces, no significant differences were found between the three groups in valence (F(2, 8) = 0.402, p = 0.682, ηp² = 0.091) or in arousal (F(2, 8) = 1.089, p = 0.382, ηp² = 0.214). For the contexts, significant differences were observed among the emotional contexts in both valence scores (F(2, 8) = 107.922, p = 2 × 10⁻⁶, ηp² = 0.964) and arousal scores (F(2, 8) = 48.642, p = 3.3 × 10⁻⁵, ηp² = 0.924). Post hoc Bonferroni tests revealed significant differences between the negative and neutral groups (p < 0.001) and between the positive and neutral groups (p < 0.001) for both valence and arousal.

In the Video Sequence Assembly step, we combined a neutral face clip (Face_1), an emotional context clip, and another neutral face clip (Face_2), creating ten distinct short video sequences for each of the three emotional conditions (negative, neutral, and positive). The blue screen background of the face shots was replaced with images or videos captured during the filming of the emotional contexts. The sound was removed, and the completed video sequences were rendered in MP4 format (1920 × 1080 pixels, 25 frames per second). Examples of these video sequences are provided in the Supplementary Videos (Videos S1 and S2).

Procedure

Face-context-face sequence rating experiment

The face-context-face sequence rating experiment, conducted in an MRI scanner, began with two practice trials using distinct short videos, followed by T1-weighted image scans. After a 10-second instruction screen, each trial started with a 0.5-second fixation cross, followed by a 2-second neutral face clip. Random jitter intervals lasting between 4 and 6 seconds were inserted before and after the 4-second emotional context clip. A final 2-second neutral face clip was presented at the end of each sequence. This sequence was designed to capture the integration of face and context information in a dynamic viewing context. After each short video, participants experienced a 0.65-second inter-stimulus interval (ISI) before rating the valence and arousal of the neutral face. Valence ratings were provided on a scale from -4 (negative) to 4 (positive), and arousal ratings on a scale from 1 (low) to 9 (high), both with a 5-second response window. A 1 to 1.5-second inter-trial interval (ITI) followed each trial. This procedure was repeated for a total of 30 trials (Figures 1B and 6B).
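To make this trial structure concrete, the sketch below outlines one face-context-face trial in PsychoPy, the package used for stimulus presentation; the window settings and clip file names are hypothetical placeholders, and the actual experiment script additionally handled the rating screens and response logging.

```python
# Minimal sketch of one face-context-face trial timeline (PsychoPy 3.2).
# File names and window parameters are hypothetical.
import random
from psychopy import visual, core

win = visual.Window(fullscr=True, color='black')
fixation = visual.TextStim(win, text='+')

def play_clip(filename, duration):
    """Draw a movie clip frame by frame for the given duration."""
    clip = visual.MovieStim3(win, filename)
    timer = core.CountdownTimer(duration)
    while timer.getTime() > 0:
        clip.draw()
        win.flip()

fixation.draw(); win.flip(); core.wait(0.5)   # 0.5-s fixation cross
play_clip('face1.mp4', 2.0)                   # 2-s neutral face (Face_1)
win.flip(); core.wait(random.uniform(4, 6))   # 4-6-s jittered interval
play_clip('context.mp4', 4.0)                 # 4-s emotional context clip
win.flip(); core.wait(random.uniform(4, 6))   # second jittered interval
play_clip('face2.mp4', 2.0)                   # 2-s neutral face (Face_2)
core.wait(0.65)                               # inter-stimulus interval
# ...valence and arousal ratings (5-s window each), then a 1-1.5-s ITI
```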

During the fMRI experiment, participants lay inside the MRI scanner while viewing the videos projected onto a mirror. The computer controlling the projection was positioned approximately 50 cm from the mirror, with the distance from the mirror to the participants’ eyes ranging between 10 and 15 cm. Stimulus presentation and response recording were managed using PsychoPy 3.2 software. The order of video sequence presentations was randomized for each participant within each condition, and the sequence of the three conditions was also randomized. Ratings were recorded using MRI-compatible keyboards, with participants selecting their responses via Visual Analogue Scales displayed on the screen. Participants first used the two keys on the left-hand keyboard to select the value, and then they used the single key on the right-hand keyboard to confirm their selection.

Single-face clip rating experiment

Following the face-context-face sequence rating experiment, participants completed the single-face clip rating experiment. Each trial began with a 0.5-second fixation cross, followed by a 2-second presentation of a neutral face clip. A 0.65-second ISI was presented before participants rated the valence and arousal of the neutral face. Valence ratings ranged from −4 (negative) to 4 (positive), while arousal ratings ranged from 1 (low) to 9 (high), with a 5-second response window for each. Both the face-context-face and single-face clip rating experiments consisted of a total of 30 trials, with each experimental condition including 10 trials. Stimulus presentation and response recording during the experiment were managed using PsychoPy 3.2 software, displayed on a 14-inch laptop positioned 60 cm from the participants. Ratings were presented via Visual Analogue Scales, and participants used a mouse to select the corresponding scale values.

Replicated face-context-face sequence rating experiment

To examine whether the behavioral contextual processing observed in the fMRI experiment could be reproduced, a simplified behavioral replication of the face-context-face sequence rating task was conducted on a laptop with an independent cohort of participants. Each trial started with a 2-second neutral face clip, followed by a 4-second emotional context clip, and concluded with another 2-second neutral face clip. After each short video, participants rated the valence and emotional intensity of the neutral face. To simplify the behavioral replication of contextual processing and facilitate participants’ responses, the rating scales were slightly adjusted: valence was rated from −1 (negative) to 1 (positive), and emotional intensity from 1 (low) to 5 (high). Both ratings were completed within a 5-second response window. A 1.5-second inter-trial interval followed each trial. The procedure was repeated for a total of 30 trials, with each experimental condition including 10 trials (Figure S9A). Stimulus presentation and response recording were managed using PsychoPy 3.2 software, displayed on a 14-inch laptop positioned approximately 60 cm from participants. Ratings were presented via visual analogue scales, and participants used a mouse to select their responses.

Quantification and statistical analysis

Imaging data acquisition and preprocessing

A Siemens 3T Prisma MRI scanner was employed for the experiment. T1-weighted data were acquired using Magnetization Prepared Rapid Acquisition Gradient-echo (MPRAGE) imaging, with the following parameters: repetition time (TR) / echo time (TE) / inversion time (TI) = 2530 ms / 2.27 ms / 1100 ms, flip angle (FA) = 7°, field of view (FOV) = 256 × 256 mm², number of slices = 208, slice thickness = 1 mm, and voxel size = 1 × 1 × 1 mm³. Task-related fMRI scans were acquired using a T2∗-weighted echo planar imaging (EPI) sequence, with TR/TE = 2000 ms / 34 ms, FA = 70°, FOV = 200 × 200 mm², matrix size = 100 × 100, number of slices = 72, slice thickness = 2 mm, and slice gap = 0 mm, with a voxel size of 2 × 2 × 2 mm³, and 480 volumes per scan. Field map parameters were as follows: TR/TE1/TE2 = 720 ms / 4.92 ms / 7.38 ms, FA = 70°, FOV = 200 × 200 mm², matrix size = 100 × 100, number of slices = 72, slice thickness = 2 mm, slice gap = 0 mm, and voxel size = 2 × 2 × 2 mm³, with 1 volume recorded. The total imaging session lasted an average of 25 minutes and 12 seconds.

We preprocessed the brain imaging data from the 67 participants using SPM12 software (https://www.nitrc.org/projects/spm). The preprocessing steps included: 1) Converting scanned DICOM format data to NIfTI format; 2) Performing slice-timing correction to correct for acquisition-time differences between slices; 3) Conducting field map correction to address geometric distortions caused by magnetic field inhomogeneity, using two different echo times (TE) of 4.92 milliseconds and 7.38 milliseconds, with distortion correction executed using phase and magnitude images; 4) Implementing head motion and distortion correction by setting a motion correction quality parameter at 0.9 and a 4 mm separation, along with Gaussian kernel parameters (FWHM: 5 mm × 5 mm × 5 mm); 5) Registering structural and functional images using normalized mutual information (NMI) as the cost function and setting spatial separation parameters and Gaussian kernel parameters to FWHM 7 mm × 7 mm × 7 mm; 6) Segmenting T1-weighted images into different tissue types, including scalp, skull, cerebrospinal fluid, gray matter, and white matter; 7) Aligning functional images to the Montreal Neurological Institute (MNI) standard template and resampling them to a voxel size of 2 mm × 2 mm × 2 mm; 8) Applying spatial smoothing using Gaussian kernel parameters (FWHM: 4 mm × 4 mm × 4 mm) to enhance the signal-to-noise ratio for group-level analysis.
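For readers who script such pipelines, the sketch below illustrates how several of these SPM12 steps could be chained through nipype's SPM interfaces; the file names and slice order are placeholders, the field map step is omitted, and this is not our actual batch script.

```python
# Hypothetical outline of part of the SPM12 preprocessing chain via nipype.
# File names and slice order are placeholders; field map correction (step 3)
# is omitted here for brevity.
from nipype.interfaces import spm

st = spm.SliceTiming(in_files='func.nii', num_slices=72,
                     time_repetition=2.0, time_acquisition=2.0 - 2.0 / 72,
                     slice_order=list(range(1, 73)), ref_slice=1)
realign = spm.Realign(in_files='afunc.nii', quality=0.9, separation=4,
                      fwhm=5, register_to_mean=True)
coreg = spm.Coregister(target='meanafunc.nii', source='T1.nii',
                       cost_function='nmi', fwhm=[7, 7])
norm = spm.Normalize12(image_to_align='rT1.nii', apply_to_files='rafunc.nii',
                       write_voxel_sizes=[2, 2, 2])   # resample to 2-mm MNI
smooth = spm.Smooth(in_files='wrafunc.nii', fwhm=[4, 4, 4])  # 4-mm kernel
```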

Data analysis of colored short videos

Behavioral data

For the single-face clip rating experiment, a repeated-measures ANOVA was performed to analyze the valence ratings of neutral face clips using IBM SPSS 24 software (https://www.ibm.com/support/pages/downloading-ibm-spss-statistics-24). To determine whether the assumption of sphericity was met, Mauchly’s Test of Sphericity was conducted. If the test indicated a violation, the Greenhouse-Geisser correction was applied to adjust the degrees of freedom before calculating the corresponding p-value. To control for multiple comparisons, post hoc Bonferroni corrections were applied. The same statistical procedure was used to analyze the arousal ratings, ensuring methodological consistency.

A repeated-measures ANOVA was also conducted to assess the contextual influence on valence ratings in the face-context-face sequence rating experiment. As before, Mauchly’s Test of Sphericity was used to assess sphericity assumptions, and if violated, the Greenhouse-Geisser correction was applied. Post hoc Bonferroni corrections were performed to account for multiple comparisons. This analytical approach was similarly applied to arousal ratings, ensuring consistency in statistical methodology.

To quantify the influence of contextual information on facial emotion perception, a delta score was computed by subtracting each participant’s valence rating in the single-face clip rating experiment from their corresponding rating in the face-context-face sequence rating experiment. A single-sample t-test was conducted to determine whether these differences were statistically significant. The same approach was used to examine changes in arousal ratings.
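A minimal sketch of this delta-score test, with hypothetical rating arrays standing in for the two experiments' per-participant means:

```python
# Sketch: contextual-effect delta scores and a one-sample t-test.
# `seq_valence` and `single_valence` are hypothetical per-participant means.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
seq_valence = rng.normal(-0.6, 0.5, size=30)    # face-context-face (e.g., negative)
single_valence = rng.normal(0.0, 0.3, size=30)  # same faces presented in isolation

delta = seq_valence - single_valence            # contextual effect per participant
t, p = stats.ttest_1samp(delta, popmean=0.0)    # does the shift differ from zero?
print(f"t({delta.size - 1}) = {t:.3f}, p = {p:.2e}")
```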

Since multiple factors—such as the biological nature of the emotional context, actor gender, and participant gender—could modulate the contextual processing, additional subgroup analyses were performed. Separate repeated-measures ANOVAs were conducted to examine the interaction effects of biological type, actor gender, and participant gender on valence ratings. As in previous analyses, Mauchly’s Test of Sphericity was applied, and when necessary, the Greenhouse-Geisser correction was used to adjust the degrees of freedom before significance testing.

GLM analysis of fMRI data

In the fMRI data analysis, we aimed to examine the top-down neural processes involved in modulating emotional perception through contextual processing, particularly focusing on brain regions activated during the presentation of “Face_2.” Previous research has shown that individuals perceive emotions from a neutral face after viewing a preceding emotional context,6,27 and the brain regions activated during “Face_2” are considered to reflect top-down modulation of the contextual processing.17 Based on this evidence, we sought to identify brain regions in “Face_2” that are involved in the top-down modulation of emotional perception. The activation observed during “Face_2” reflects the combined influence of the preceding emotional context (“Emotional Context”) and the earlier face stimulus (“Face_1”). To isolate neural activity specific to contextual processing, we first subtracted the activity related to “Face_1” from “Face_2” to remove baseline face-processing effects. Next, we controlled for the contribution of the preceding “Emotional Context” to exclude direct effects related to prior emotional exposure.

To achieve this, we performed first-level SPM analyses on each participant using a general linear model (GLM) to obtain contrast maps of brain activation. This first-level analysis calculated the average activation differences across ten trials for each emotional condition (Negative, Neutral, Positive) per participant. Subsequently, the contrast images were entered into second-level SPM analyses using one-sample t-tests to evaluate the main effects of each contrast across participants. The results of the second-level analysis were obtained in the form of spmT maps, representing the statistical significance of activation differences. Further statistical analysis was conducted using xjView software (https://www.alivelearn.net/xjview). In xjView, we selected brain regions with positive activation based on our hypothesis that these regions would show heightened activation in response to contextual processing. A cluster size threshold of 50 voxels was set in xjView, and we applied false discovery rate (FDR) correction for multiple comparisons, with a significance level of p < 0.05. Finally, we reported the locations of brain regions that passed FDR correction for multiple comparisons and exhibited significant activation.
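To make the contrast logic concrete, the toy single-voxel GLM below builds HRF-convolved regressors for Face_1, Emotional Context, and Face_2 and evaluates a Face_2 > Face_1 contrast while the context regressor absorbs context-driven variance; the HRF shape, onsets, and time series are simplified stand-ins for what SPM12 estimates voxel-wise.

```python
# Toy single-voxel GLM illustrating the Face_2 > Face_1 contrast with the
# Emotional Context period modeled separately (all values hypothetical).
import numpy as np
from scipy.stats import gamma

tr, n_scans = 2.0, 480
t = np.arange(0, 32, tr)
hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)   # rough double-gamma HRF

def regressor(onsets_s, duration_s):
    box = np.zeros(n_scans)
    for onset in onsets_s:
        box[int(onset / tr):int((onset + duration_s) / tr)] = 1.0
    return np.convolve(box, hrf)[:n_scans]

face1_onsets = 20.0 + 90.0 * np.arange(10)   # hypothetical trial onsets (s)
context_onsets = face1_onsets + 7.0          # after the first jittered interval
face2_onsets = face1_onsets + 16.0           # after the second jittered interval

X = np.column_stack([regressor(face1_onsets, 2.0),
                     regressor(context_onsets, 4.0),
                     regressor(face2_onsets, 2.0),
                     np.ones(n_scans)])              # intercept
y = np.random.default_rng(1).normal(size=n_scans)   # stand-in voxel time series
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
effect = np.array([-1.0, 0.0, 1.0, 0.0]) @ beta     # Face_2 minus Face_1
```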

fMRI time series analysis of percentage signal change

To investigate dynamic neural processing in response to contextual processing, we conducted an fMRI time series analysis to examine neural responses in regions identified by contrast analysis across three conditions (Negative, Neutral, Positive), comparing percentage signal changes for “Face_1,” “Emotional Context,” and “Face_2.” This method facilitates the visualization of signal changes over time, allowing for detailed observation of neural responses to stimuli under varying conditions.36,114

First, we defined the ROIs based on regions identified as activated in the preceding contrast analysis. Each brain region’s functional mask was isolated using the automated anatomical labeling (AAL) atlas,115 and a logical AND operation was applied to ensure that only overlapping regions from the AAL atlas and the GLM analysis were included in the ROI definition. A 4 mm radius sphere was then created, centered at the peak voxel of each functional mask, following the previous methodology.116 This uniform sphere size was selected to balance the need for precise localization of functional activity with sufficient coverage of the surrounding region, ensuring consistency in ROI definition across participants. Anatomical reference was provided using structural templates, including AAL templates, which were applied via the DPABI toolbox.43
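A schematic of this ROI construction is sketched below, assuming nibabel for file I/O; the map and mask file names, the threshold, and the peak-finding step are illustrative rather than our exact pipeline.

```python
# Sketch: intersect a thresholded activation map with an AAL region mask,
# locate the peak voxel, and build a 4-mm-radius sphere around it.
import numpy as np
import nibabel as nib

tmap_img = nib.load('spmT_map.nii')            # hypothetical file names
tmap = tmap_img.get_fdata()
aal = nib.load('aal_region.nii').get_fdata() > 0

func_mask = (tmap > 3.0) & aal                 # logical AND, as described above
peak_ijk = np.array(np.unravel_index(
    np.argmax(np.where(func_mask, tmap, -np.inf)), tmap.shape))

affine = tmap_img.affine
peak_mm = affine[:3, :3] @ peak_ijk + affine[:3, 3]

ijk = np.indices(tmap.shape).reshape(3, -1)    # every voxel's indices...
xyz = affine[:3, :3] @ ijk + affine[:3, 3:4]   # ...in world (mm) coordinates
sphere = np.linalg.norm(xyz - peak_mm[:, None], axis=0) <= 4.0
roi = sphere.reshape(tmap.shape)

nib.save(nib.Nifti1Image(roi.astype(np.uint8), affine), 'roi_sphere.nii')
```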

Next, we performed the fMRI time series analysis using data from the first-level GLM analysis, modeling brain responses to experimental stimuli. A univariate GLM was employed with three separate regressors of interest (Negative, Neutral, Positive), each convolved with the canonical HRF to simulate signal changes in response to the stimuli. Stimulus onsets within each condition were averaged to generate a single HRF for a 24-s period, reflecting the average brain response to the condition. For each time point, the percentage signal changes were calculated by comparing the signal to a baseline (the average signal across all trials for the condition). The resulting PSC time course, spanning 24 s after the onset of “Face_1,” was divided into 12 time points, corresponding to the TR of 2 s. Grand average PSC time courses were computed across participants for each condition. Time points at 6s, 14s, and 20s were selected to represent the average offset times for “Face_1,” “Emotional Context,” and “Face_2,” respectively, based on the typical HRF response latency. Parameter estimates (percentage signal changes) associated with the conditions of interest were extracted from the predefined ROIs at the individual level using MarsBar (http://marsbar.sourceforge.net), a tool for ROI analysis in fMRI data.
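The core PSC computation can be sketched as follows; the ROI time series, onsets, and baseline definition are simplified assumptions (MarsBar's internal baseline handling may differ in detail).

```python
# Sketch: trial-locked percentage-signal-change time course for one ROI.
# `roi_ts` and the onsets are hypothetical stand-ins.
import numpy as np

tr = 2.0
n_points = int(24.0 / tr)                      # 12 time points over 24 s

def psc_timecourse(roi_ts, onsets_s):
    """Average trial-locked segments, expressed as % change from baseline."""
    segs = [roi_ts[int(o / tr):int(o / tr) + n_points] for o in onsets_s]
    mean_tc = np.mean(segs, axis=0)
    baseline = mean_tc.mean()                  # average signal for the condition
    return 100.0 * (mean_tc - baseline) / baseline

roi_ts = np.random.default_rng(5).normal(1000, 10, size=480)
psc = psc_timecourse(roi_ts, onsets_s=[20.0, 110.0, 200.0])
# indices 3, 7, and 10 (6 s, 14 s, 20 s) index Face_1, Context, and Face_2
```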

Finally, a repeated-measures ANOVA was conducted to examine neural processing differences in PSC across time and condition variables. When Mauchly’s Test of Sphericity indicated a violation, the Greenhouse-Geisser correction was applied to adjust the degrees of freedom and calculate the p-value. Post hoc comparisons of means were performed using Bonferroni corrections to control for multiple comparisons. Additionally, simple effects analyses were conducted when interactions were detected, with Bonferroni corrections applied to control for multiple comparisons.
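As an open-source sketch of this 3 × 3 test (SPSS was used in practice), pingouin's repeated-measures ANOVA applies the same Greenhouse-Geisser correction; the long-format data here are simulated placeholders.

```python
# Sketch: 3 (Time) x 3 (Condition) repeated-measures ANOVA on PSC values.
# Simulated long-format data; the study used SPSS for this step.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
df = pd.DataFrame({
    'subject': np.repeat(np.arange(30), 9),
    'time': np.tile(np.repeat(['Face_1', 'Context', 'Face_2'], 3), 30),
    'condition': np.tile(['Negative', 'Neutral', 'Positive'], 90),
    'psc': rng.normal(0.0, 0.2, size=270),
})

aov = pg.rm_anova(data=df, dv='psc', within=['time', 'condition'],
                  subject='subject', correction=True)
print(aov)   # F values, uncorrected and GG-corrected p values, effect sizes
```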

Behavioral association analysis with percentage signal change

Previous fMRI time series analysis revealed signal changes during contextual processing; however, it remains unclear how these neural changes contribute to behavioral ratings of contextual processing. We investigated whether the PSC values within the peak clusters correlated with the valence scores assigned by participants during the fMRI task.

In this analysis, Peak Cluster 1 corresponded to the TPJ and Peak Cluster 2 to the insula. Given that contextual processing in short video viewing involves the dynamic integration of emotional and social cues, these regions are plausible candidates for supporting it: the TPJ plays a critical role in social cognition,44,45 while the insula is primarily associated with emotional responses.46,47,48 By performing correlation analysis, we aimed to determine whether activation in these regions tracked variations in participants’ valence ratings, thereby linking neural activity to contextual emotion processing.

PSC values for the “Face_2” period were extracted from the activated regions (Cluster 1, Cluster 2, and Cluster 3) identified in the contrast analysis. Given our focus on top-down modulation of contextual processing, these values were then correlated with participants’ valence and arousal scores to examine how neural activation patterns related to subjective emotional ratings. This analysis was conducted in MATLAB 2016b (http://www.mathworks.com).
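
In MATLAB, this brain-behavior association reduces to Pearson correlations across subjects. The sketch below assumes pscFace2, valence, and arousal are matched nSubjects-by-1 vectors; these names are illustrative only.

```matlab
% Hedged sketch of the PSC-rating correlations for one cluster; pscFace2,
% valence, and arousal are assumed nSubjects x 1 vectors (illustrative names).
[rV, pV] = corr(pscFace2, valence, 'Type', 'Pearson');   % valence association
[rA, pA] = corr(pscFace2, arousal, 'Type', 'Pearson');   % arousal association
fprintf('valence: r = %.2f, p = %.3f | arousal: r = %.2f, p = %.3f\n', ...
        rV, pV, rA, pA);
```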

Dynamic FC analysis using a sliding window approach

Through the FIR analysis, we examined the temporal dynamics of neural responses in individual brain regions, identifying their specific contributions to contextual processing. However, brain activity does not occur in isolation; it arises from interactions among multiple regions, forming dynamic functional networks that may reconfigure over time.6,84 While the mechanisms underlying these temporal changes in neural interactions during contextual processing remain unclear, a more fundamental question is how FC between brain regions evolves as participants view Face_2 compared with Face_1.

To address this question, we focused on the top-down processing regions identified via contrast analysis and employed a sliding window analysis to examine dynamic FC changes over time.37,90 Using the 24-s PSC time courses derived from the preceding ROIs (chosen to maintain consistency with the FIR analysis), FC was quantified by computing Pearson correlation coefficients within successive 8-s time windows. To ensure comparability across participants and conditions, all FC values were Z score normalized using the grand mean and standard deviation across the entire time course. Each 8-s window was advanced with a 2-s step size, matching the TR and preserving the temporal resolution of the data.117 The 8-s window was chosen because it encompasses both the initial Face_1 and final Face_2 periods, balancing temporal resolution against the signal noise often associated with shorter windows. This approach tracks the temporal evolution of interregional interactions and enables a detailed analysis of changes in brain network coupling over time.
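
A minimal sketch of the sliding-window computation is given below, assuming roiTS is a 12 x nROI matrix of the 24-s PSC time courses; 4-TR windows advanced 1 TR at a time correspond to the 8-s window and 2-s step described above, and all variable names are illustrative.

```matlab
% Sketch of sliding-window FC: 8-s windows (4 TRs) advanced in 2-s (1-TR)
% steps over the 24-s ROI time courses. roiTS (12 x nROI) is an assumed input.
winLen = 4;  step = 1;
nTP    = size(roiTS, 1);
starts = 1 : step : nTP - winLen + 1;          % 9 windows for 12 time points
nROI   = size(roiTS, 2);
fcWin  = zeros(nROI, nROI, numel(starts));
for w = 1:numel(starts)
    seg = roiTS(starts(w) : starts(w) + winLen - 1, :);
    fcWin(:, :, w) = corr(seg);                % Pearson FC within this window
end
% Z score using the grand mean/SD across the entire time course, as above
fcZ = (fcWin - mean(fcWin(:))) ./ std(fcWin(:));
```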

For each experimental condition (Negative, Neutral, and Positive), the sliding window analysis generated a sequence of FC matrices capturing the evolving connectivity patterns across successive time windows. To assess overall network changes during contextual processing, we first computed the mean FC matrix at each time window for each subject within each condition and then averaged these matrices across subjects to obtain a group-level representation per condition. To examine condition-specific dynamic changes, we compared FC matrices from the initial (0-8 s) and final (14-22 s) time windows, representing Face_1 and Face_2, respectively. A paired t-test was performed within each condition to identify significant differences in FC between these two windows. An increase in FC over time suggests enhanced functional integration (stronger connectivity among brain regions), whereas a decrease signifies functional segregation, reflecting a reorganization of network interactions. This analysis provides insight into how the brain’s network structure dynamically adapts during contextual processing.
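
The window comparison itself amounts to a paired t-test across subjects. In the sketch below, fcFace1 and fcFace2 are assumed to be nSubjects-by-1 vectors of each subject’s mean FC in the initial (0-8 s) and final (14-22 s) windows for one condition; both names are hypothetical.

```matlab
% Illustrative paired comparison of early vs. late window FC for one
% condition; fcFace1 and fcFace2 are assumed nSubjects x 1 vectors.
[~, p, ~, stats] = ttest(fcFace1, fcFace2);    % paired t-test across subjects
deltaFC = mean(fcFace2 - fcFace1);             % > 0: integration; < 0: segregation
fprintf('deltaFC = %.3f, t(%d) = %.2f, p = %.3f\n', ...
        deltaFC, stats.df, stats.tstat, p);
```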

Behavioral association analysis with FC

The preceding dynamic FC analysis revealed changes in functional integration and segregation during contextual processing: as participants viewed the short videos, their brain networks exhibited dynamic shifts in connectivity patterns over time. Given that dynamic FC changes likely reflect neural mechanisms of emotional appraisal,32,118 we examined whether FC patterns during Face_1 and Face_2 correlated with participants’ emotional valence ratings, testing whether connectivity alterations corresponded to subjective emotional experience.

We first extracted whole-brain FC matrices for each subject from the initial (0-8 s) and final (14-22 s) time windows, representing Face_1 and Face_2, respectively. These FC values were then averaged within each condition and correlated with participants’ valence ratings. To examine the contributions of specific functional clusters, we further segmented the brain network into the clusters defined by the contrast analysis and computed within-cluster FC for each cluster separately and between-cluster FC for each cluster pair (Cluster 1–Cluster 2, Cluster 1–Cluster 3, and Cluster 2–Cluster 3). Finally, we tested whether FC patterns from the first (0-8 s) and last (14-22 s) time windows were significantly associated with valence ratings, allowing us to assess how functional reorganization within and between clusters contributes to emotional processing over time.
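
The within- and between-cluster summaries can be computed from a window’s FC matrix as sketched below; fc (an nROI x nROI matrix) and clusterId (a cluster label per ROI) are illustrative names, with the three clusters following the contrast analysis above.

```matlab
% Hedged sketch of within- and between-cluster FC for one subject/window;
% fc is an nROI x nROI matrix and clusterId labels each ROI 1-3 (assumed).
nC = 3;
withinFC  = zeros(nC, 1);
betweenFC = zeros(nC, nC);
for i = 1:nC
    for j = i:nC
        blockFC = fc(clusterId == i, clusterId == j);
        if i == j                              % within-cluster: unique pairs only
            m = triu(true(size(blockFC)), 1);
            withinFC(i) = mean(blockFC(m));
        else                                   % between-cluster: all ROI pairs
            betweenFC(i, j) = mean(blockFC(:));
        end
    end
end
% Each subject-level summary can then be correlated with valence ratings,
% e.g., corr(withinFC_allSubjects(:, 1), valence).
```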

Data analysis of black-and-white short videos

Behavioral data

To explore contextual processing in black-and-white short videos while ensuring methodological consistency, identical statistical analyses were applied independently to the behavioral data from these videos: repeated-measures ANOVAs for valence and arousal ratings, Mauchly’s Test of Sphericity to assess the sphericity assumption, and Greenhouse-Geisser corrections when necessary, with post hoc Bonferroni corrections for multiple comparisons. Delta scores were again computed to quantify the influence of contextual information on facial emotion perception, followed by one-sample t-tests to assess statistical significance. Subgroup analyses were also conducted to examine potential interaction effects of biological type, actor gender, and participant gender, following the same procedures as the primary analysis.

fMRI time series analysis of percentage signal change

To determine whether the ROIs identified in the contrast analysis of the color short videos exhibit comparable neural response patterns in black-and-white short videos, the same fMRI time series analysis was applied to the black-and-white neural data. PSC values were extracted from these ROIs at the key time points (“Face_1,” “Emotional Context,” and “Face_2”) and analyzed with a univariate GLM under the same analytical framework: repeated-measures ANOVAs across time and condition, Mauchly’s Test of Sphericity with Greenhouse-Geisser corrections when necessary, and Bonferroni corrections for multiple comparisons.

Dynamic FC analysis using a sliding window approach

To assess whether the ROIs identified from the contrast analysis of the color short videos exhibit comparable FC patterns in black-and-white short videos, the same sliding window analysis was applied to the black-and-white neural data. Time-series FC matrices were computed for the same ROIs using an 8-s sliding window with a 2-s step size to track dynamic connectivity changes across experimental conditions. The same analytical framework was applied, including group-level averaging of FC matrices, comparison of connectivity between the initial (0-8 s) and final (14-22 s) windows (representing “Face_1” and “Face_2”), and paired t-tests to identify significant connectivity changes.

Replication analysis of colored short videos with an independent cohort

Behavioral data

For the replicated face-context-face sequence rating experiment, a repeated-measures ANOVA was conducted in IBM SPSS 24 to examine the contextual influence on valence ratings of the neutral face clips. As in the primary experiment, Mauchly’s Test of Sphericity was performed to assess the sphericity assumption, and the Greenhouse-Geisser correction was applied to adjust the degrees of freedom when the assumption was violated. Post hoc Bonferroni corrections were performed to control for multiple comparisons. The same statistical procedure was applied to the emotional intensity ratings.

Because multiple factors (the biological nature of the emotional context, actor gender, and participant gender) could modulate contextual processing, additional subgroup analyses were performed. Separate repeated-measures ANOVAs examined the interaction effects of biological type, actor gender, and participant gender on valence ratings. As in the previous analyses, Mauchly’s Test of Sphericity was applied, with the Greenhouse-Geisser correction used to adjust the degrees of freedom when necessary.

Published: November 27, 2025

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.isci.2025.114269.

Contributor Information

Xiang Xiao, Email: xiang.xiao@bnu.edu.cn.

Yiwen Wang, Email: wyiw@bnu.edu.cn.

Supplemental information

Document S1. Figures S1–S9 and Tables S1–S5
mmc1.pdf (16.7MB, pdf)

References

1. Oliva A., Torralba A. The role of context in object recognition. Trends Cogn. Sci. 2007;11:520–527. doi: 10.1016/j.tics.2007.09.009.
2. Maren S., Phan K.L., Liberzon I. The contextual brain: Implications for fear conditioning, extinction and psychopathology. Nat. Rev. Neurosci. 2013;14:417–428. doi: 10.1038/nrn3492.
3. Nikolić D. The brain is a context machine. Rev. Psychol. 2010;17:33–38.
4. Lloyd K., Leslie D.S. Context-dependent decision-making: A simple Bayesian model. J. R. Soc. Interface. 2013;10. doi: 10.1098/rsif.2013.0069.
5. Tran K.H., McDonald A.P., D’arcy R.C.N., Song X. Contextual processing and the impacts of aging and neurodegeneration: A scoping review. Clin. Interv. Aging. 2021;16:345–361. doi: 10.2147/CIA.S287619.
6. Zheng J., Skelin I., Lin J.J. Neural computations underlying contextual processing in humans. Cell Rep. 2022;40. doi: 10.1016/j.celrep.2022.111395.
7. Wolfe J.M. Moving towards solutions to some enduring controversies in visual search. Trends Cogn. Sci. 2003;7:70–76. doi: 10.1016/S1364-6613(02)00024-4.
8. Treisman A.M., Gelade G. A Feature-Integration Theory of Attention. Cogn. Psychol. 1980;12:97–136. doi: 10.1016/0010-0285(80)90005-5.
9. Zhao S., Li C., Uono S., Yoshimura S., Toichi M. Human cortical activity evoked by contextual processing in attentional orienting. Sci. Rep. 2017;7. doi: 10.1038/s41598-017-03104-1.
10. Silveira S., Fehse K., Vedder A., Elvers K., Hennig-Fast K. Is it the picture or is it the frame? An fMRI study on the neurobiology of framing effects. Front. Hum. Neurosci. 2015;9. doi: 10.3389/fnhum.2015.00528.
11. Chiu Y.C., Egner T. Cortical and subcortical contributions to context-control learning. Neurosci. Biobehav. Rev. 2019;99:33–41. doi: 10.1016/j.neubiorev.2019.01.019.
12. Giedd J.N. The Digital Revolution and Adolescent Brain Evolution. J. Adolesc. Health. 2012;51:101–105. doi: 10.1016/j.jadohealth.2012.06.002.
13. Linlin W., Wanyu H., Yuting L., Huimin Q., Zhi L., Qinchen J., Tingting W., Fan W., Minghao P., Wei Z. Research on the mechanism of short video information interaction behavior of college students with psychological disorders based on grounded theory. BMC Public Health. 2023;23. doi: 10.1186/s12889-023-17211-4.
14. Wei T., Wang X. A Historical Review and Theoretical Mapping on Short Video Studies 2005–2021. Online Media Glob. Commun. 2022;1:247–286. doi: 10.1515/omgc-2022-0040.
15. Violot C., Elmas T., Bilogrevic I., Humbert M. Shorts vs. Regular Videos on YouTube: A Comparative Analysis of User Engagement and Content Creation Trends. Proc. 16th ACM Web Sci. Conf. (WebSci). 2024:213–223. doi: 10.1145/3614419.3644023.
16. Shutsko A. User-Generated Short Video Content in Social Media: A Case Study of TikTok. In: Meiselwitz G., editor. Social Computing and Social Media: Participation, User Experience, Consumer Experience, and Applications of Social Computing (HCII 2020). Springer International Publishing; 2020. pp. 108–125.
17. Calbi M., Siri F., Heimann K., Barratt D., Gallese V., Kolesnikov A., Umiltà M.A. How context influences the interpretation of facial expressions: a source localization high-density EEG study on the “Kuleshov effect.” Sci. Rep. 2019;9:2107–2116. doi: 10.1038/s41598-018-37786-y.
18. Chen Z., Whitney D. Inferential affective tracking reveals the remarkable speed of context-based emotion perception. Cognition. 2021;208. doi: 10.1016/j.cognition.2020.104549.
19. Li M., Huang H., Zhou K., Meng M. Unraveling the neural dichotomy of consensus and idiosyncratic experiences in short video viewing. Brain Cogn. 2025;184. doi: 10.1016/j.bandc.2024.106260.
20. Fogelson N. Neural correlates of local contextual processing across stimulus modalities and patient populations. Neurosci. Biobehav. Rev. 2015;52:207–220. doi: 10.1016/j.neubiorev.2015.02.016.
21. Lahner B., Dwivedi K., Iamshchinina P., Graumann M., Lascelles A., Roig G., Gifford A.T., Pan B., Jin S., Ratan Murty N.A., et al. Modeling short visual events through the BOLD moments video fMRI dataset and metadata. Nat. Commun. 2024;15. doi: 10.1038/s41467-024-50310-3.
22. van der Meer J.N., Breakspear M., Chang L.J., Sonkusare S., Cocchi L. Movie viewing elicits rich and reliable brain state dynamics. Nat. Commun. 2020;11:5004–5014. doi: 10.1038/s41467-020-18717-w.
23. Redcay E., Moraczewski D. Social cognition in context: A naturalistic imaging approach. Neuroimage. 2020;216. doi: 10.1016/j.neuroimage.2019.116392.
24. Barratt D., Rédei A.C., Innes-Ker Å., van de Weijer J. Does the Kuleshov Effect Really Exist? Revisiting a Classic Film Experiment on Facial Expressions and Emotional Contexts. Perception. 2016;45:847–874. doi: 10.1177/0301006616638595.
25. Calbi M., Heimann K., Barratt D., Siri F., Umiltà M.A., Gallese V. How context influences our perception of emotional faces: A behavioral study on the Kuleshov effect. Front. Psychol. 2017;8:1684. doi: 10.3389/fpsyg.2017.01684.
26. Mullennix J., Barber J., Cory T. An examination of the Kuleshov effect using still photographs. PLoS One. 2019;14. doi: 10.1371/journal.pone.0224623.
27. Mobbs D., Weiskopf N., Lau H.C., Featherstone E., Dolan R.J., Frith C.D. The Kuleshov Effect: the influence of contextual framing on emotional attributions. Soc. Cogn. Affect. Neurosci. 2006;1:95–106. doi: 10.1093/scan/nsl014.
28. Cao Z., Wang Y., Li R., Xiao X., Xie Y., Bi S., Wu L., Zhu Y., Wang Y. Exploring the combined impact of color and editing on emotional perception in authentic films: Insights from behavioral and neuroimaging experiments. Humanit. Soc. Sci. Commun. 2024;11. doi: 10.1057/s41599-024-03874-w.
29. Hasson U., Landesman O., Knappmeyer B., Vallines I., Rubin N., Heeger D.J. Neurocinematics: The Neuroscience of Film. Projections. 2008;2:1–26. doi: 10.3167/proj.2008.020102.
30. Qin S. Emotion representations in context: maturation and convergence pathways. Trends Cogn. Sci. 2023;27:883–885. doi: 10.1016/j.tics.2023.07.009.
31. Kringelbach M.L., Perl Y.S., Tagliazucchi E., Deco G. Toward naturalistic neuroscience: Mechanisms underlying the flattening of brain hierarchy in movie-watching compared to rest and task. Sci. Adv. 2023;9. doi: 10.1126/sciadv.ade6049.
32. Raz G., Touroutoglou A., Wilson-Mendenhall C., Gilam G., Lin T., Gonen T., Jacob Y., Atzil S., Admon R., Bleich-Cohen M., et al. Functional connectivity dynamics during film viewing reveal common networks for different emotional experiences. Cogn. Affect. Behav. Neurosci. 2016;16:709–723. doi: 10.3758/s13415-016-0425-4.
33. Bar M., Kassam K.S., Ghuman A.S., Boshyan J., Schmid A.M., Dale A.M., Hämäläinen M.S., Marinkovic K., Schacter D.L., Rosen B.R., Halgren E. Top-down facilitation of visual recognition. Proc. Natl. Acad. Sci. USA. 2006;103:449–454. doi: 10.1073/pnas.0507062103.
34. Galli G., Feurra M., Viggiano M.P. “Did you see him in the newspaper?” Electrophysiological correlates of context and valence in face processing. Brain Res. 2006;1119:190–202. doi: 10.1016/j.brainres.2006.08.076.
35. Kajal D.S., Fioravanti C., Elshahabi A., Ruiz S., Sitaram R., Braun C. Involvement of top-down networks in the perception of facial emotions: A magnetoencephalographic investigation. Neuroimage. 2020;222. doi: 10.1016/j.neuroimage.2020.117075.
36. Ren J., Huang F., Gao C., Gott J., Schoch S.F., Qin S., Dresler M., Luo J. Functional lateralization of the medial temporal lobe in novel associative processing during creativity evaluation. Cereb. Cortex. 2023;33:1186–1206. doi: 10.1093/cercor/bhac129.
37. Preti M.G., Bolton T.A., Van De Ville D. The dynamic functional connectome: State-of-the-art and perspectives. Neuroimage. 2017;160:41–54. doi: 10.1016/j.neuroimage.2016.12.061.
38. Song Z., Zhu Z., Zhang H., Wang S., Zou L. Extraction of brain function pattern with visual-capture-task fMRI using dynamic time-window method in ADHD children. Behav. Brain Res. 2024;460. doi: 10.1016/j.bbr.2023.114828.
39. Saxe R., Kanwisher N. People thinking about thinking people: The role of the temporo-parietal junction in “theory of mind.” Neuroimage. 2003;19:1835–1842. doi: 10.1016/S1053-8119(03)00230-1.
40. Morelli S.A., Rameson L.T., Lieberman M.D. The neural components of empathy: Predicting daily prosocial behavior. Soc. Cogn. Affect. Neurosci. 2014;9:39–47. doi: 10.1093/scan/nss088.
41. Duerden E.G., Arsalidou M., Lee M., Taylor M.J. Lateralization of affective processing in the insula. Neuroimage. 2013;78:159–175. doi: 10.1016/j.neuroimage.2013.04.014.
42. Davachi L., DuBrow S. How the hippocampus preserves order: The role of prediction and context. Trends Cogn. Sci. 2015;19:92–99. doi: 10.1016/j.tics.2014.12.004.
43. Yan C.G., Wang X.D., Zuo X.N., Zang Y.F. DPABI: Data Processing & Analysis for (Resting-State) Brain Imaging. Neuroinformatics. 2016;14:339–351. doi: 10.1007/s12021-016-9299-4.
44. Lettieri G., Handjaras G., Ricciardi E., Leo A., Papale P., Betta M., Pietrini P., Cecchetti L. Emotionotopy in the human right temporo-parietal cortex. Nat. Commun. 2019;10. doi: 10.1038/s41467-019-13599-z.
45. Powers J.P., Davis S.W., Neacsiu A.D., Beynel L., Appelbaum L.G., LaBar K.S. Examining the Role of Lateral Parietal Cortex in Emotional Distancing Using TMS. Cogn. Affect. Behav. Neurosci. 2020;20:1090–1102. doi: 10.3758/s13415-020-00821-5.
46. Berntson G.G., Norman G.J., Bechara A., Bruss J., Tranel D., Cacioppo J.T. The insula and evaluative processes. Psychol. Sci. 2011;22:80–86. doi: 10.1177/0956797610391097.
47. Jezzini A., Rozzi S., Borra E., Gallese V., Caruana F., Gerbella M. A shared neural network for emotional expression and perception: An anatomical study in the macaque monkey. Front. Behav. Neurosci. 2015;9:243. doi: 10.3389/fnbeh.2015.00243.
48. Menon V., Uddin L.Q. Saliency, switching, attention and control: a network model of insula function. Brain Struct. Funct. 2010;214:655–667. doi: 10.1007/s00429-010-0262-0.
49. Teodorescu C.A., Ciucu Durnoi A.-N., Vargas V.M. The Rise of the Mobile Internet: Tracing the Evolution of Portable Devices. Proc. Int. Conf. Bus. Excell. 2023;17:1645–1654. doi: 10.2478/picbe-2023-0147.
50. Koronaki N. The Evolution of Media Consumption Trends and Impacts. J. Mass Commun. Journal. 2024;14:3–4. doi: 10.37421/2165-7912.2024.14.571.
51. Krumhuber E.G., Skora L.I., Hill H.C.H., Lander K. The role of facial movements in emotion recognition. Nat. Rev. Psychol. 2023;2:283–296. doi: 10.1038/s44159-023-00172-1.
52. Gao Y., Lin W., Zhang M., Zheng L., Liu J., Zheng M., En Y., Chen Y., Mo L. Cognitive mechanisms of the face context effect: An event related potential study of the effects of emotional contexts on neutral face perception. Biol. Psychol. 2022;175. doi: 10.1016/j.biopsycho.2022.108430.
53. Ambadar Z., Schooler J.W., Cohn J.F. Deciphering the enigmatic face: the importance of facial dynamics in interpreting subtle facial expressions. Psychol. Sci. 2005;16:403–410. doi: 10.1111/j.0956-7976.2005.01548.x.
54. Ponech T. Visual Perception and Motion Picture Spectatorship. Cine. J. 1997;37:85–100. doi: 10.2307/1225691.
55. Tan E.S. A psychology of the film. Palgrave Commun. 2018;4. doi: 10.1057/s41599-018-0111-y.
56. Wieser M.J., Brosch T. Faces in context: A review and systematization of contextual influences on affective face processing. Front. Psychol. 2012;3:471. doi: 10.3389/fpsyg.2012.00471.
57. Trillò T. “PoV: You are reading an academic article.” The memetic performance of affiliation in TikTok’s platform vernacular. New Media Soc. 2024.
58. Bordwell D., Thompson K. Film Art: An Introduction. McGraw-Hill; 2000. pp. 156–290.
59. Monaco J. How to Read a Film: Movies, Media, and Beyond. Oxford University Press; 2009.
60. Cutting J.E. The Framing of Characters in Popular Movies. Art Percept. 2015;3:191–212. doi: 10.1163/22134913-00002031.
61. Mobbs D., Hagan C.C., Dalgleish T., Silston B., Prévost C. The ecology of human fear: Survival optimization and the nervous system. Front. Neurosci. 2015;9:55. doi: 10.3389/fnins.2015.00055.
62. LaBar K.S., Phelps E.A. Reinstatement of Conditioned Fear in Humans Is Context Dependent and Impaired in Amnesia. American Psychological Association; 2005.
63. LaBar K.S., LeDoux J.E., Spencer D.D., Phelps E.A. Impaired fear conditioning following unilateral temporal lobectomy in humans. J. Neurosci. 1995;15:6846–6855. doi: 10.1523/jneurosci.15-10-06846.1995.
64. Gilbert C.D., Li W. Top-down influences on visual processing. Nat. Rev. Neurosci. 2013;14:350–363. doi: 10.1038/nrn3476.
65. Barrett L.F. The theory of constructed emotion: an active inference account of interoception and categorization. Soc. Cogn. Affect. Neurosci. 2017;12:1–23. doi: 10.1093/scan/nsw154.
66. Ferrey A.E., Burleigh T.J., Fenske M.J. Stimulus-category competition, inhibition, and affective devaluation: A novel account of the uncanny valley. Front. Psychol. 2015;6:249. doi: 10.3389/fpsyg.2015.00249.
67. Plant E.A., Hyde J.S., Keltner D., Devine P.G. The gender stereotyping of emotions. Psychol. Women Q. 2000;24:81–92. doi: 10.1111/j.1471-6402.2000.tb01024.x.
68. McClure E.B. A Meta-Analytic Review of Sex Differences in Facial Expression Processing and Their Development in Infants, Children, and Adolescents. Psychol. Bull. 2000;126:424–453. doi: 10.1016/S0022-3476(99)70079-X.
69. Ganis G., Kutas M. An electrophysiological study of scene effects on object identification. Cogn. Brain Res. 2003;16:123–144. doi: 10.1016/S0926-6410(02)00244-6.
70. Smith A.P.R., Henson R.N.A., Dolan R.J., Rugg M.D. fMRI correlates of the episodic retrieval of emotional contexts. Neuroimage. 2004;22:868–878. doi: 10.1016/j.neuroimage.2004.01.049.
71. Lough S., Kipps C.M., Treise C., Watson P., Blair J.R., Hodges J.R. Social reasoning, emotion and empathy in frontotemporal dementia. Neuropsychologia. 2006;44:950–958. doi: 10.1016/j.neuropsychologia.2005.08.009.
72. Tranel D., Damasio H., Damasio A.R. A neural basis for the retrieval of conceptual knowledge. Neuropsychologia. 1997;35:1319–1327. doi: 10.1016/S0028-3932(97)00085-7.
73. Geng J.J., Vossel S. Re-evaluating the role of TPJ in attentional control: Contextual updating? Neurosci. Biobehav. Rev. 2013;37:2608–2620. doi: 10.1016/j.neubiorev.2013.08.010.
74. Doricchi F., Lasaponara S., Pazzaglia M., Silvetti M. Left and right temporal-parietal junctions (TPJs) as “match/mismatch” hedonic machines: A unifying account of TPJ function. Phys. Life Rev. 2022;42:56–92. doi: 10.1016/j.plrev.2022.07.001.
75. Atique B., Erb M., Gharabaghi A., Grodd W., Anders S. Task-specific activity and connectivity within the mentalizing network during emotion and intention mentalizing. Neuroimage. 2011;55:1899–1911. doi: 10.1016/j.neuroimage.2010.12.036.
76. Wittmann M.K., Lockwood P.L., Rushworth M.F.S. Neural mechanisms of social cognition in primates. Annu. Rev. Neurosci. 2018;41:99–118. doi: 10.1146/annurev-neuro-080317-061450.
77. Pantelis P.C., Byrge L., Tyszka J.M., Adolphs R., Kennedy D.P. A specific hypoactivation of right temporo-parietal junction/posterior superior temporal sulcus in response to socially awkward situations in autism. Soc. Cogn. Affect. Neurosci. 2015;10:1348–1356. doi: 10.1093/scan/nsv021.
78. Moraczewski D., Chen G., Redcay E. Inter-subject synchrony as an index of functional specialization in early childhood. Sci. Rep. 2018;8. doi: 10.1038/s41598-018-20600-0.
79. Moraczewski D., Nketia J., Redcay E. Cortical temporal hierarchy is immature in middle childhood. Neuroimage. 2020;216. doi: 10.1016/j.neuroimage.2020.116616.
80. Ildirar S., Ewing L. Revisiting the Kuleshov effect with first-time viewers. Projections. 2018;12:19–38. doi: 10.3167/proj.2018.120103.
81. Chandra S., Sharma S., Chaudhuri R., Fiete I. Episodic and associative memory from spatial scaffolds in the hippocampus. Nature. 2025;638:739–751. doi: 10.1038/s41586-024-08392-y.
82. Borders A.A., Aly M., Parks C.M., Yonelinas A.P. The hippocampus is particularly important for building associations across stimulus domains. Neuropsychologia. 2017;99:335–342. doi: 10.1016/j.neuropsychologia.2017.03.032.
83. Cox D., Meyers E., Sinha P. Contextually Evoked Object-Specific Responses in Human Visual Cortex. Science. 2004;304:115–117. doi: 10.1126/science.1093110.
84. Bassett D.S., Wymbs N.F., Porter M.A., Mucha P.J., Carlson J.M., Grafton S.T. Dynamic reconfiguration of human brain networks during learning. Proc. Natl. Acad. Sci. USA. 2011;108:7641–7646. doi: 10.1073/pnas.1018985108.
85. Wang R., Liu M., Cheng X., Wu Y., Hildebrandt A., Zhou C. Segregation, integration, and balance of large-scale resting brain networks configure different cognitive abilities. Proc. Natl. Acad. Sci. USA. 2021;118:12083–12094. doi: 10.1073/pnas.2022288118.
86. Cocchi L., Gollo L.L., Zalesky A., Breakspear M. Criticality in the brain: A synthesis of neurobiology, models and cognition. Prog. Neurobiol. 2017;158:132–152. doi: 10.1016/j.pneurobio.2017.07.002.
87. Parlatini V., Radua J., Dell’Acqua F., Leslie A., Simmons A., Murphy D.G., Catani M., Thiebaut de Schotten M. Functional segregation and integration within fronto-parietal networks. Neuroimage. 2017;146:367–375. doi: 10.1016/j.neuroimage.2016.08.031.
88. Deco G., Tononi G., Boly M., Kringelbach M.L. Rethinking segregation and integration: Contributions of whole-brain modelling. Nat. Rev. Neurosci. 2015;16:430–439. doi: 10.1038/nrn3963.
89. Shymkiv Y., Hamm J.P., Escola S., Yuste R. Slow cortical dynamics generate context processing and novelty detection. Neuron. 2025;113:1–11. doi: 10.1016/j.neuron.2025.01.011.
90. Hutchison R.M., Womelsdorf T., Allen E.A., Bandettini P.A., Calhoun V.D., Corbetta M., Della Penna S., Duyn J.H., Glover G.H., Gonzalez-Castillo J., et al. Dynamic functional connectivity: Promise, issues, and interpretations. Neuroimage. 2013;80:360–378. doi: 10.1016/j.neuroimage.2013.05.079.
91. Li Q., Xia M., Zeng D., Xu Y., Sun L., Liang X., Xu Z., Zhao T., Liao X., Yuan H., et al. Development of segregation and integration of functional connectomes during the first 1,000 days. Cell Rep. 2024;43. doi: 10.1016/j.celrep.2024.114168.
92. Park H.J., Friston K. Structural and functional brain networks: From connections to cognition. Science. 2013;342. doi: 10.1126/science.1238411.
93. Sun L., Zhao T., Liang X., Xia M., Li Q., Liao X., Gong G., Wang Q., Pang C., Yu Q., et al. Human lifespan changes in the brain’s functional connectome. Nat. Neurosci. 2025;28:891–901. doi: 10.1038/s41593-025-01907-4.
94. Shine J.M., Aburn M.J., Breakspear M., Poldrack R.A. The modulation of neural gain facilitates a transition between functional segregation and integration in the brain. eLife. 2018;7. doi: 10.7554/eLife.31130.
95. Chen S., Tang Q., Toyoizumi T., Sommer W., Yu L. Stiff-sloppy analysis of brain networks to reveal individual differences in task performance. Preprint at arXiv. 2025. doi: 10.48550/arXiv.2501.19106.
96. Kucyi A., Moayedi M., Weissman-Fogel I., Hodaie M., Davis K.D. Hemispheric asymmetry in white matter connectivity of the temporoparietal junction with the insula and prefrontal cortex. PLoS One. 2012;7. doi: 10.1371/journal.pone.0035589.
97. Kucyi A., Hodaie M., Davis K.D. Lateralization in intrinsic functional connectivity of the temporoparietal junction with salience- and attention-related brain networks. J. Neurophysiol. 2012;108:3382–3392. doi: 10.1152/jn.00674.2012.
98. Sellitto M., Neufang S., Schweda A., Weber B., Kalenscher T. Arbitration between insula and temporoparietal junction subserves framing-induced boosts in generosity during social discounting. Neuroimage. 2021;238. doi: 10.1016/j.neuroimage.2021.118211.
99. Assaf M., Hyatt C.J., Wong C.G., Johnson M.R., Schultz R.T., Hendler T., Pearlson G.D. Mentalizing and motivation neural function during social interactions in autism spectrum disorders. Neuroimage Clin. 2013;3:321–331. doi: 10.1016/j.nicl.2013.09.005.
100. Nord C.L., Lawson R.P., Dalgleish T. Disrupted Dorsal Mid-Insula Activation During Interoception Across Psychiatric Disorders. Am. J. Psychiatry. 2021;178:761–770. doi: 10.1176/appi.ajp.2020.20091340.
101. Acheson D.T., Gresack J.E., Risbrough V.B. Hippocampal dysfunction effects on context memory: Possible etiology for posttraumatic stress disorder. Neuropharmacology. 2012;62:674–685. doi: 10.1016/j.neuropharm.2011.04.029.
102. Kirk P.A., Robinson O.J., Skipper J.I. Anxiety and amygdala connectivity during movie-watching. Neuropsychologia. 2022;169. doi: 10.1016/j.neuropsychologia.2022.108194.
103. Zhang Q., Li B., Jin S., Liu W., Liu J., Xie S., Zhang L., Kang Y., Ding Y., Zhang X., et al. Comparing the Effectiveness of Brain Structural Imaging, Resting-state fMRI, and Naturalistic fMRI in Recognizing Social Anxiety Disorder in Children and Adolescents. Psychiatry Res. Neuroimaging. 2022;323. doi: 10.1016/j.pscychresns.2022.111485.
104. Mosley P.E., van der Meer J.N., Hamilton L.H.W., Fripp J., Parker S., Jeganathan J., Breakspear M., Parker R., Holland R., Mitchell B.L., et al. Markers of positive affect and brain state synchrony discriminate melancholic from non-melancholic depression using naturalistic stimuli. Mol. Psychiatry. 2025;30:848–860. doi: 10.1038/s41380-024-02699-y.
105. Kveraga K., Ghuman A.S., Kassam K.S., Aminoff E.A., Hämäläinen M.S., Chaumon M., Bar M. Early onset of neural synchronization in the contextual associations network. Proc. Natl. Acad. Sci. USA. 2011;108:3389–3394. doi: 10.1073/pnas.1013760108.
106. Müller V.I., Cieslik E.C., Turetsky B.I., Eickhoff S.B. Crossmodal interactions in audiovisual emotion processing. Neuroimage. 2012;60:553–561. doi: 10.1016/j.neuroimage.2011.12.007.
107. Mulier L., Slabbinck H., Vermeir I. This Way Up: The Effectiveness of Mobile Vertical Video Marketing. J. Interact. Market. 2021;55:1–15. doi: 10.1016/j.intmar.2020.12.002.
108. Kreibig S.D. Autonomic nervous system activity in emotion: A review. Biol. Psychol. 2010;84:394–421. doi: 10.1016/j.biopsycho.2010.03.010.
109. Mauss I.B., Robinson M.D. Measures of emotion: A review. Cogn. Emot. 2009;23:209–237. doi: 10.1080/02699930802204677.
110. Duan Z., Wang F., Hong J. Culture shapes how we look: Comparison between Chinese and African university students. J. Eye Mov. Res. 2016;9:1–10.
111. Engelmann J.B., Pogosyan M. Emotion perception across cultures: The role of cognitive mechanisms. Front. Psychol. 2013;4:118. doi: 10.3389/fpsyg.2013.00118.
112. Salt B. Statistical Style Analysis of Motion Pictures. Film Q. 1974;28:13–22. doi: 10.1525/fq.1974.28.1.04a00050.
113. Cutting J.E., Brunick K.L., DeLong J.E., Iricinschi C., Candan A. Quicker, faster, darker: Changes in Hollywood film over 75 years. Iperception. 2011;2:569–576. doi: 10.1068/i0441aap.
114. Lee K.H., Siegle G.J. Different brain activity in response to emotional faces alone and augmented by contextual information. Psychophysiology. 2014;51:1147–1157. doi: 10.1111/psyp.12254.
115. Rolls E.T., Huang C.C., Lin C.P., Feng J., Joliot M. Automated anatomical labelling atlas 3. Neuroimage. 2020;206. doi: 10.1016/j.neuroimage.2019.116189.
116. Ren J., Huang F., Zhou Y., Zhuang L., Xu J., Gao C., Qin S., Luo J. The function of the hippocampus and middle temporal gyrus in forming new associations and concepts during the processing of novelty and usefulness features in creative designs. Neuroimage. 2020;214. doi: 10.1016/j.neuroimage.2020.116751.
117. Patil A.U., Ghate S., Madathil D., Tzeng O.J.L., Huang H.W., Huang C.M. Static and dynamic functional connectivity supports the configuration of brain networks associated with creative cognition. Sci. Rep. 2021;11:165. doi: 10.1038/s41598-020-80293-2.
118. Tobia M.J., Hayashi K., Ballard G., Gotlib I.H., Waugh C.E. Dynamic functional connectivity and individual differences in emotions during social stress. Hum. Brain Mapp. 2017;38:6185–6205. doi: 10.1002/hbm.23821.

Associated Data


Supplementary Materials

Video S1. Colored short video Demos
Download video file (4.7MB, mp4)
Video S2. Black-and-white short video Demos
Download video file (4MB, mp4)
Video S3. Contextual processing Demos
Download video file (7.7MB, mp4)

Data Availability Statement

