NPJ Science of Learning. 2025 Nov 26;10:86. doi: 10.1038/s41539-025-00378-3

Behavioral and eye-tracking investigation of event segmentation following short video watching

Hongxiao Li 1,2, Jiashen Li 1,2, Xin Hao 1,2, Wei Liu 1,2
PMCID: PMC12657513  PMID: 41298532

Abstract

The proliferation of short-video platforms prompts critical investigation of their effects on human cognitive functions. We hypothesized that the frequent, user-driven content shifts inherent to short-video watching impair event segmentation, a cognitive process critical for continuous memory encoding. Combining behavioral, eye-tracking, and self-report data, we revealed that acute exposure to randomly selected short videos was associated with poorer memory for continuous movies, particularly in participants with more frequent daily short-video viewing. This effect was absent after viewing personalized short videos and did not apply to static image encoding tasks. Intersubject correlation analysis of eye movements revealed that random short video watching attenuated eye synchronization at event boundaries. Furthermore, Hidden Markov Model analysis indicated that personalized and random short videos induced qualitatively different latent event structures. These findings indicate that the algorithmic curation of content, not merely the short-video format, is a crucial factor shaping event segmentation and subsequent memory.

Subject terms: Neuroscience, Psychology

Introduction

The exponential growth of short-video platforms—known internationally as TikTok, with its Chinese counterpart Douyin—has established them as dominant forces in social media, with a user base reaching 1.6 billion in 2024, approximately 21% of whom are adolescents (retrieved from https://www.businessofapps.com/data/tik-tok-statistics/ in March 2025). While enhancing user immersion through personalized recommendation algorithms, short-video platforms have also raised concerns about problematic use and potential addiction1, significantly transforming the information consumption habits of users, particularly adolescents. Given the widespread use of these platforms by populations undergoing critical cognitive development, it is essential to understand their consequences for the foundational mechanisms of learning and memory. This study provides empirical evidence of how this media format influences the ability to segment and encode continuous events, aiming to inform scientifically grounded recommendations for its use.

The defining characteristic of short-video watching is its fragmented format, marked by rapid, user-controlled context shifts. This viewing pattern contrasts sharply with the cognitive processes required to encode continuous, structured experiences. Previous research has shown that exposure to the fragmented information common on these platforms increases users’ cognitive load2 and impairs cognitive functions such as sustained attention3, working memory2, prospective memory4, time perception5, and body satisfaction in women6. Here, we propose that this fragmented format also disrupts a core mechanism of human learning and memory: event segmentation.

Event segmentation is the fundamental process of parsing continuous experience into discrete, meaningful units79. This process is crucial for encoding continuous stimuli such as commercial movies1012, sports footage13, and auditory narratives14. Critically, effective segmentation during encoding is a strong predictor of subsequent memory recall1518. Existing research has demonstrated that gradual changes between events can provide individuals with event segmentation cues, which serve as a framework for subsequent memory recall19,20. This raises the question: What is the effect of rapid switching between short video clips on subsequent event segmentation? As our brain perceives a continuous stream of information, it builds an “event model” based on the ongoing situation and uses this model to actively predict what will happen next. When incoming information aligns with the model’s predictions, the current model is maintained21. However, when the situation an individual experiences changes, a “prediction error” occurs, leading to the construction of a new event model. The perception of an “event boundary,” triggered by this prediction error, is the core of event segmentation22. The fragmented pattern of short videos induces high-intensity, high-frequency event segmentation. From a cognitive science perspective, each switch to a new video artificially manufactures a “prediction error.” This frequent switching compels the brain to repeatedly cycle through the process of “building a model - prediction failure - abandoning the model - building a new model” at an extremely high frequency, resulting in a significant consumption of cognitive resources22.

We hypothesize that acute or chronic exposure to this high-frequency updating habituates the cognitive system to a “segment-and-refresh” mode. This habituation has two key consequences for subsequent processing of continuous information. First, it may lower the threshold for perceiving event boundaries, causing individuals to “over-segment” continuous streams by misinterpreting minor changes as major prediction errors. Second, a brain accustomed to the constant stimulation of model updating may struggle to maintain focus on a stable information stream, leading to deficits in sustained attention as a downstream consequence. Therefore, we predict that exposure to short-video content impairs subsequent event segmentation, leading to poorer memory for continuous events.

We further hypothesize that this disruptive effect is amplified by a second characteristic: personalized recommendation algorithms. These algorithms predict user interests from browsing history and content tags to curate a continuous stream of tailored videos23,24. By tailoring content to users’ preferences, algorithms create highly immersive and pleasurable viewing experiences that reinforce compulsive engagement1. Supporting this, neuroimaging evidence indicates that personalized video exposure modulates functional connectivity across large-scale brain networks, suppressing regions involved in cognitive control while activating reward-related pathways25,26, which may underlie the habitual nature of short-video consumption. By hijacking the brain’s reward system, personalization may make it harder for viewers to disengage from the rapid, fragmented flow of content, thereby magnifying the disruptive impact of constant context-switching on event segmentation. This leads to our second hypothesis: that the tailored, reward-driven nature of personalized content will have a more pronounced disruptive impact on the cognitive mechanisms underlying event segmentation than a random, non-personalized sequence of videos.

Previous research on the effects of short-video exposure on human behavior has relied primarily on self-report studies27. These have indicated that, consistent with findings on smartphone and social media usage, TikTok exposure correlates positively with symptoms of depression and anxiety2831. However, self-reported measures are susceptible to memory biases and social desirability effects. For instance, individuals may experience subjective time distortions when estimating the duration of their online activities32, an issue that could similarly affect estimates of time spent on short-video platforms. An alternative approach is to expose participants acutely to short videos in a controlled laboratory setting and then observe the carry-over effects on their behavior. To overcome these limitations and investigate the cognitive consequences of short-video watching more objectively, we employed an integrative methodology that combined self-reported measures with objective assessments of cognitive functions and memory retrieval.

To test the specificity of this proposed mechanism, we designed two studies that contrasted memory encoding for continuous versus discrete information by incorporating both self-reported and objective memory retrieval measures. Study 1 used a continuous movie-watching task, which heavily relies on event segmentation for successful memory formation11,12,15,33. In contrast, Study 2 employed a trial-based static image encoding task, which engages visual memory but does not depend on the same temporal segmentation mechanism34. We predicted that if short-video exposure selectively impairs event segmentation, memory performance would be degraded for the continuous movie (Study 1) but would remain intact for the discrete images (Study 2).

Since memory performance is a relatively indirect measure of the underlying segmentation process, we used eye-tracking, leveraging advanced computational analyses typically applied to neuroimaging data like fMRI3540 and EEG17. Prior work from our lab applied eye-tracking methods to identify signatures of event segmentation in continuous memory encoding41. This approach leveraged the high temporal resolution and cost-efficiency of eye tracking42,43 and built upon findings that pupil dilation reflects changes in event structure44. We tracked pupil size and eye movement patterns, integrating advanced computational approaches (i.e., Hidden Markov Models (HMM)45 and inter-subject correlation analysis (ISC)46) —typically used in neuroimaging—into our eye-tracking data analysis.

More specifically, to capture potentially subtle disruptions in event segmentation during continuous viewing, we applied two advanced eye-tracking analyses: ISC and HMM. We selected Intersubject Correlation (ISC) analysis because it provides a robust measure of shared attentional engagement across viewers; lower ISC at narrative event boundaries indicates a divergence in viewing patterns, reflecting disruptions in shared segmentation processes47. By applying the HMM—a data-driven event segmentation technique—to eye-tracking data, comprising pupil size and the coordinates of fixation points over time, we determined the model-generated optimal event number, the distinctive eye-tracking patterns specific to each event, and the precise timing of event transitions. HMMs are particularly well suited to identifying latent cognitive states from sequential eye-tracking data, allowing us to quantify the fragmentation of a viewer’s segmentation process.

To investigate the effects of acute short-video exposure on subsequent memory encoding, we conducted two studies (Study 1, N = 113; Study 2, N = 60) using a two-phase (manipulation-encoding) paradigm. Study 1 employed a four-group, between-subjects design to systematically disentangle the core components of the short-video experience and their impact on continuous memory encoding (Fig. 1a and Table 1). During the manipulation phase, we established two short-video conditions to isolate the hallmark feature of rapid context-switching. The Short-Random (Short-R) group viewed randomly selected videos, whereas the Short-Personalized (Short-P) group watched algorithmically recommended videos from their personal accounts to examine the potential moderating role of engagement. These conditions were compared against two controls: a Long group that watched a continuous documentary to provide a baseline, and a Schema group that viewed the first 25 minutes of the same Sherlock episode used in the encoding phase. This latter group served as a positive control to verify that our memory measures were sensitive to the known facilitative effects of prior knowledge48. In the subsequent encoding phase, participants’ eye movements were monitored while they viewed a ~20-min “BBC-Sherlock” episode, a well-established stimulus for event segmentation research, followed by a free recall test. Study 2 (N = 60) paired a similar manipulation phase with a trial-based static image encoding task (Fig. 1b). This comparative approach allowed us to determine if any observed cognitive impact was specific to the disruption of continuous memory processes, such as event segmentation, or indicative of a more general encoding deficit.

Fig. 1. Experimental design of Study 1 and Study 2.

Fig. 1

a Study 1 Outline. In the manipulation phase, participants were exposed to one of four conditions: randomly selected short videos (Short-R group), personalized short videos (Short-P group), an unrelated segment from the “BBC-Ocean” documentary (Long group), or a related segment from the Sherlock movie (Schema group). During the encoding phase, all participants viewed the same segment of the Sherlock movie while their eye movements were monitored, and memory was assessed through a subsequent free recall task. b Study 2 Outline. In the manipulation phase, participants watched either short videos (Short group) or a documentary from National Geographic (Long group). During the encoding phase, participants rated the approachability of each image (faces or houses) in a self-paced task following its presentation. In the retrieval phase, memory recognition was evaluated using a Remember/Know (RK) paradigm.

Table 1.

Overview of experimental groups, procedures, and hypotheses in Study 1

Group | Rationale and purpose | Manipulation-phase stimulus | Encoding task (main stimulus) | Key hypothesis tested | Key literature
Short-R | To test the effect of rapid context-switching | Random short videos (from a new account) | Continuous movie (BBC Sherlock, Section B) | Disrupts event segmentation and impairs subsequent memory | Zheng (2021)
Short-P | To test the effect of algorithmically personalized content | Personalized short videos (from user’s own account) | Continuous movie (BBC Sherlock, Section B) | Different (potentially less detrimental) effect on memory encoding | Su et al. (2021a; 2021b)
Schema | To control for the effect of prior knowledge on encoding | Continuous movie (BBC Sherlock, Section A) | Continuous movie (BBC Sherlock, Section B) | Facilitates event segmentation and memory encoding | Van Kesteren et al. (2010a; 2010b)
Long | To establish a baseline for continuous encoding | Continuous nature documentary (BBC Ocean) | Continuous movie (BBC Sherlock, Section B) | Serves as the primary control group | Chen et al. (2017)

Results

Random short video watching links daily viewing habits to memory impairment

In Study 1, we first assessed whether self-reported daily short-video usage, as measured by the TikTok score, differed among the four randomly assigned experimental groups. No significant difference was found (F(3,109) = 1.63, p = 0.187; MLong = 58.43 (SD = 12.32); MSchema = 51.26 (SD = 11.90); MShort−P = 55.67 (SD = 10.92); MShort−R = 52.14 (SD = 18.13)), indicating that any subsequent group differences in memory were not attributable to pre-existing differences in short-video watching habits. We then examined the relationship between TikTok scores and memory performance across all participants. Higher TikTok scores were marginally and negatively correlated with the number of recalled events (r = −0.17, p = 0.07; Fig. 2a) but were not significantly correlated with either raw detail recall (r = −0.14, p = 0.12; Fig. 2b) or corrected detail scores, which account for variability in available details per event (r = −0.07, p = 0.41; Fig. 2c). A one-way ANOVA revealed a significant main effect of Group on the number of recalled events (F(3,109) = 5.83, p < 0.001, pHolm = 0.001, ηp² = 0.138; Fig. 2d). Significant effects were also observed for raw detail scores (F(3,109) = 3.28, p = 0.024, pHolm = 0.048, ηp² = 0.083; Fig. 2e) and corrected detail scores (F(3,109) = 3.00, p = 0.034, pHolm = 0.034, ηp² = 0.076; Fig. 2f). All three omnibus effects remained significant after Holm correction across the three primary outcomes.
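The omnibus tests reported above follow a standard one-way ANOVA plus partial eta squared. A minimal sketch of that computation with SciPy, run on simulated recall data (group names follow the study, but all sample sizes and values here are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical recalled-event proportions for the four groups (simulated data)
groups = {
    "Long": rng.normal(0.45, 0.10, 28),
    "Schema": rng.normal(0.60, 0.10, 27),   # Schema group assumed to recall more
    "Short-P": rng.normal(0.44, 0.10, 30),
    "Short-R": rng.normal(0.46, 0.10, 28),
}

# Omnibus test across the four groups
f_val, p_val = stats.f_oneway(*groups.values())

# Partial eta squared: SS_between / (SS_between + SS_within)
grand_mean = np.concatenate(list(groups.values())).mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())
eta_p2 = ss_between / (ss_between + ss_within)
```

In a between-subjects one-way design, partial eta squared coincides with eta squared, which is why it can be computed directly from the two sums of squares.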

Fig. 2. Daily short video viewing predicts memory deficits selectively after random short video watching.

Fig. 2

a–c Across all participants, higher TikTok scores were marginally correlated with recalling fewer events (r = -0.17, p = 0.07) but showed no correlation with detail recall (detail scores: p = 0.12; corrected detail scores: p = 0.41). d–f Group-level analysis of memory performance revealed that the Schema group significantly outperformed all other groups in the number of recalled events. Neither the Short-R nor the Short-P group demonstrated inferior memory performance relative to the Long group across the measures. g–i Critically, group-specific correlation analyses showed a significant negative relationship between TikTok scores and the number of recalled events exclusively for the Short-R group (r = -0.42, p = 0.02). This correlation was not significant for the Long, Schema, or Short-P groups. Scatter plots show the line of best fit with a 95% confidence interval.

Tukey-adjusted pairwise comparisons indicated that the Schema group recalled significantly more events than the Short-P group (mean difference = 0.180, 95% CI [0.053, 0.307], pTukey = 0.002, Hedges’ g = 0.98, 95% CI [0.25, 1.72]), the Long group (mean difference = 0.171, 95% CI [0.042, 0.300], pTukey = 0.004, g = 0.94, 95% CI [0.19, 1.68]), and the Short-R group (mean difference = 0.148, 95% CI [0.019, 0.277], pTukey = 0.017, g = 0.81, 95% CI [0.07, 1.55]). For raw detail scores, the Schema group also outperformed the Short-P group (mean difference = 0.71, 95% CI [0.05, 1.36], pTukey = 0.028, Cohen’s d = 0.75, 95% CI [0.02, 1.48]) and the Short-R group (mean difference = 0.67, 95% CI [0.00, 1.33], pTukey = 0.049, d = 0.71, 95% CI [−0.03, 1.44]). For corrected detail scores, the Schema group showed significantly higher performance than the Short-P group (mean difference = 0.062, 95% CI [0.001, 0.123], pTukey = 0.047, Cohen’s d = 0.70, 95% CI [−0.03, 1.42]) and marginally higher performance than the Short-R group (mean difference = 0.060, 95% CI [−0.002, 0.123], pTukey = 0.060, d = 0.68, 95% CI [−0.05, 1.42]). No significant differences were observed among the Long, Short-P, and Short-R groups (all pTukey ≥ 0.16, |d| < 0.20).
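The Hedges’ g values above are bias-corrected standardized mean differences. A minimal sketch of that computation, using made-up recall scores rather than the study’s data:

```python
import numpy as np

def hedges_g(x, y):
    """Bias-corrected standardized mean difference between two samples."""
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1))
                        / (nx + ny - 2))
    d = (np.mean(x) - np.mean(y)) / pooled_sd   # Cohen's d
    j = 1 - 3 / (4 * (nx + ny) - 9)             # small-sample correction factor
    return d * j

# Hypothetical recall scores for two groups (illustration only)
schema = np.array([0.62, 0.58, 0.66, 0.55, 0.60, 0.64])
short_p = np.array([0.44, 0.50, 0.41, 0.47, 0.43, 0.49])
g = hedges_g(schema, short_p)
```

The correction factor J shrinks Cohen’s d slightly, which matters for the modest group sizes (n ≈ 27–30) used in Study 1.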

Finally, we explored correlations between TikTok scores and memory within each group. Critically, we found a significant negative correlation between TikTok scores and the number of recalled events exclusively in the Short-R group (r = −0.42, p = 0.02; Fig. 2g). This relationship was absent in the Long (r = −0.04, p = 0.81), Schema (r = −0.04, p = 0.81), and Short-P (r = 0.27, p = 0.13) groups. To test whether the association between TikTok scores and memory performance differed across groups, an OLS regression model with interaction terms was conducted, using the Short-R group as the reference category. The overall model was significant (F(7, 105) = 3.94, p < 0.001), explaining 20.8% of the variance in performance (adjusted R² = 0.16). A significant main effect of TikTok Score was observed (β = –0.005, SE = 0.002, t(105) = –2.63, p = 0.010), indicating that in the Short-R group, higher TikTok scores were associated with lower performance. In addition, a significant interaction was found between TikTok Score and the Short-P group (β = 0.010, SE = 0.004, t(105) = 2.66, p = 0.009), suggesting that the slope of TikTok Score on performance was more positive in the Short-P group compared to the Short-R group. No significant interactions were observed for the Schema or Long groups (all p > 0.20). No other group-specific correlations between TikTok scores and other memory metrics were significant (all ps > 0.10; Fig. 2h, i).
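The interaction analysis above amounts to an OLS fit with a dummy-coded group term and a TikTok × group product term, with Short-R as the reference category. A self-contained sketch on simulated data (only two of the four groups are modeled, and the slopes, sample sizes, and noise level are invented) recovers a negative TikTok slope in the reference group and a positive interaction for the Short-P dummy:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
# Hypothetical data: TikTok scores and recall for two groups
tiktok = rng.uniform(30, 80, 2 * n)
group_p = np.repeat([0, 1], n)                   # 0 = Short-R (reference), 1 = Short-P
slope = np.where(group_p == 1, 0.005, -0.005)    # opposite slopes by construction
recall = 0.6 + slope * (tiktok - 55) + rng.normal(0, 0.02, 2 * n)

# Design matrix: intercept, TikTok score, group dummy, TikTok x group interaction
X = np.column_stack([np.ones(2 * n), tiktok, group_p, tiktok * group_p])
beta, *_ = np.linalg.lstsq(X, recall, rcond=None)
b0, b_tiktok, b_group, b_interact = beta
```

Here b_tiktok estimates the slope in the reference (Short-R) group, and b_interact estimates how much the Short-P slope differs from it, mirroring the β = −0.005 and β = 0.010 pattern reported above.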

Reduced boundary responses after exposure to random short videos indexed by inter-subject correlations

We employed eye-tracking-based inter-subject correlation (ISC) analysis to investigate the impact of acute short video viewing on subsequent event segmentation during continuous video watching. This ISC calculation incorporated both vertical and horizontal gaze coordinates, alongside pupil sizes, for each participant. ISC was examined under three conditions (Fig. 3a): 10 s preceding the event boundary (pre-boundary condition), 5 s before and after the boundary (boundary condition), and 10 s after the boundary (post-boundary condition).
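A leave-one-out version of this windowed ISC can be sketched as follows. The simulation (a shared response triggered at the boundary plus subject-specific noise) and the 1 Hz sampling rate are purely illustrative, and the real analysis averaged ISC over gaze x, gaze y, and pupil size rather than a single signal:

```python
import numpy as np

def windowed_isc(signals, boundary, sr=1):
    """Leave-one-out intersubject correlation of one signal (rows = subjects)
    in three windows: 10 s pre-boundary, 5 s either side of the boundary,
    and 10 s post-boundary."""
    windows = {
        "pre": slice(boundary - 10 * sr, boundary),
        "boundary": slice(boundary - 5 * sr, boundary + 5 * sr),
        "post": slice(boundary, boundary + 10 * sr),
    }
    isc = {}
    for name, win in windows.items():
        seg = signals[:, win]
        # Correlate each subject with the mean of all remaining subjects
        rs = [np.corrcoef(seg[i], np.delete(seg, i, axis=0).mean(axis=0))[0, 1]
              for i in range(seg.shape[0])]
        isc[name] = float(np.mean(rs))
    return isc

# Illustrative simulation: a shared response triggered at the boundary (t = 30 s)
rng = np.random.default_rng(2)
t = np.arange(60)
shared = np.where(t >= 30, np.exp(-(t - 30) / 4.0), 0.0)
signals = shared + rng.normal(0, 0.4, (12, 60))
isc = windowed_isc(signals, boundary=30)
```

In this toy example the boundary window captures the shared evoked response, so boundary ISC exceeds pre-boundary ISC, which is the “boundary response” pattern examined in the groups below.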

Fig. 3. Intersubject correlation (ISC) analysis and event segmentation during memory encoding.

Fig. 3

a Displays synchronized eye movements—positions (X and Y coordinates) and pupil sizes—of participants S1 and S2 while they observed two distinct events separated by an event boundary. ISC is quantified as the mean correlation between both vertical and horizontal gaze positions and pupil sizes across participants. Three ISC metrics were analyzed: pre-boundary ISC, calculated during the final 10 s before the event boundary; boundary ISC, derived from 5 s of eye-tracking data at the boundary; and post-boundary ISC, measured during the initial 10 s after the event boundary. b Participants in the Short-R group, who watched randomly selected short videos, had diminished synchronization of eye movements around event boundaries compared to other groups. c All groups, except the Short-R group, exhibited higher boundary ISC relative to pre-boundary ISC, a phenomenon defined as the Boundary Response. d Boundary Response during encoding was predictive of subsequent detailed recall (i.e., detail scores), and that all three ISC metrics were predictive of the ability to recall the number of events (i.e., recalled events (%)).

Group-level differences in boundary ISC were assessed using a one-way ANOVA with Group as the between-subjects factor. The results revealed a significant main effect of Group for the boundary ISC measure (F = 4.145, p = 0.008, ηp2 = 0.111; Fig. 3b), indicating significant differences in performance across groups. Post hoc Tukey tests showed that the Short-P group exhibited significantly higher boundary ISC than the Short-R group (M = 0.020, pTukey = 0.008, Cohen’s d = 0.911, 95% CI [0.145, 1.677]). Other pairwise comparisons did not reach statistical significance (all pTukey > 0.05).

Additionally, within-group dynamic changes from pre-boundary to boundary were analyzed (Fig. 3c). A mixed repeated-measures ANOVA with time (pre-boundary vs. boundary) as the within-subjects factor and Group as the between-subjects factor showed a significant main effect of time (F = 134.90, p < 0.001, ηp² = 0.054), indicating an increase in ISC from pre-boundary to boundary. Importantly, the time × Group interaction was significant (F(3, 100) = 88.06, p < 0.001, ηp² = 0.106), showing that the extent of increase varied across groups. After Holm–Bonferroni correction for four comparisons, the Long, Schema, and Short-P groups exhibited significant increases in ISC from pre-boundary to boundary (Long: t = 8.29, pHolm < 0.001, Cohen’s d = 1.65; Schema: t = 5.26, pHolm < 0.001, Cohen’s d = 1.03; Short-P: t = 13.52, pHolm < 0.001, Cohen’s d = 2.82), whereas the Short-R group showed a significant decrease (t = −8.13, pHolm < 0.001, Cohen’s d = −1.48).

We characterized the increase in ISC from pre-boundary to boundary as a “boundary response” and assessed its association with memory performance. We observed significant positive correlations between the boundary response and the detail score, which measures how accurately participants could recall movie details (r = 0.24, p = 0.01). This finding suggests that the boundary response, absent in the Short-R group, plays a crucial role in linking event segmentation to successful memory encoding. Additionally, the number of events participants recalled (i.e., remember score) demonstrated positive correlations with pre-boundary ISC (r = 0.27, p = 0.006), boundary ISC (r = 0.25, p = 0.009), and post-boundary ISC (r = 0.25, p = 0.008). We depicted all possible correlations between ISC values and memory performance within the correlation matrix (Fig. 3d).

Altered event segmentation after random versus personalized short video watching revealed by HMM

To objectively characterize event segmentation during naturalistic memory encoding, we applied a Hidden Markov Model (HMM) to participants’ eye-tracking data. We used a leave-one-subject-out cross-validation procedure to determine the optimal number of latent events (K) for each individual. Specifically, for each participant, an HMM was trained on the data from all other participants (N-1) and then used to parse the left-out individual’s time-series data, which comprised pupil diameter and horizontal (x) and vertical (y) gaze coordinates (Fig. 4a). The optimal K value was identified from a candidate range of 15–25. To validate this approach, we constructed a temporal similarity matrix for each participant by correlating their eye-tracking data at every time point with all other time points. Aligning the HMM-identified event boundaries with this matrix confirmed that the model successfully demarcated segments of high within-event pattern similarity, indicating that it captured coherent perceptual events (Fig. 4b).
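The similarity-matrix validation step can be illustrated with a small NumPy sketch. Here the event boundaries are assumed rather than estimated (the actual pipeline fits an HMM to infer them), and the three-feature “eye-tracking” time series is simulated:

```python
import numpy as np

rng = np.random.default_rng(3)
n_t = 120
bounds = [0, 40, 80, 120]                 # three assumed latent events

# Simulated features (pupil, gaze x, gaze y): a stable pattern per event + noise
data = np.empty((n_t, 3))
for start, end in zip(bounds[:-1], bounds[1:]):
    data[start:end] = rng.normal(0, 1, 3) + rng.normal(0, 0.3, (end - start, 3))

# Temporal similarity matrix: correlate the feature vector at each time point
# with the feature vector at every other time point
z = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)
sim = z @ z.T / z.shape[1]

# Boundaries that match the latent structure should give high within-event
# similarity relative to between-event similarity
labels = np.repeat(np.arange(3), 40)
same_event = labels[:, None] == labels[None, :]
off_diag = ~np.eye(n_t, dtype=bool)
within = sim[same_event & off_diag].mean()
between = sim[~same_event].mean()
```

The within-versus-between contrast is the intuition behind checking HMM-identified boundaries against the similarity matrix: a well-fitting segmentation encloses blocks of high similarity along the diagonal.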

Fig. 4. Hidden markov model (HMM) analysis of eye-tracking data reveals that disrupted event segmentation is linked to impaired memory recall.

Fig. 4

a Schematic of the inputs for the HMM, which include time-series data of pupil diameter and horizontal (x) and vertical (y) gaze coordinates. Data are simulated for illustrative purposes. b Validation of the HMM approach. HMM-identified event boundaries (red lines) are overlaid on a temporal similarity matrix of eye-tracking pattern similarity. The alignment demonstrates that the model successfully segmented the continuous data into segments of high within-event pattern similarity. The color bar indicates pattern similarity, from low (dark blue) to high (light yellow). c The optimal number of latent events (K) negatively correlates with memory performance. A higher K, indicating more fragmented event segmentation, is associated with recalling fewer details about the movie. d Group differences in the optimal number of events (K). The Short-R group exhibited significantly higher K values than the Long group, suggesting more fragmented perception. e Group differences in HMM model fit, as indexed by p-values. A lower p-value indicates a better fit between the HMM-identified boundaries and the structure of the eye-tracking data. The Schema and Long groups showed a significantly better model fit than both short-video groups.

Having established the validity of this analytical framework, we evaluated the cognitive relevance of the HMM-derived optimal K value. Across participants, although the optimal K value was not linked to the number of events recalled (i.e., “remember” score; F = 0.23, p = 0.87), it was associated with the number of details recalled (“detail” score; F = 7.36, p < 0.001; Fig. 4c). A linear correlation analysis revealed that the optimal K value negatively predicted the detail score (r = −0.42, p < 0.001). Subsequent between-group analyses centered on the optimal K value and the p value, assessing the fit between eye-tracking data and model predictions based on HMM-generated event boundaries. A one-way ANOVA revealed a significant main effect of Group on optimal K value (F = 7.26, p < 0.001, ηp² = 0.179; Fig. 4d). Tukey-adjusted pairwise comparisons indicated that the Short-P group exhibited significantly lower optimal K values than the Short-R group (mean difference = −1.93, 95% CI [−3.00, −0.85], p < 0.001, Cohen’s d = −1.29, 95% CI [−2.08, −0.51]). Other group differences did not reach statistical significance after adjustment. However, comparisons between the Short-P and Long groups (mean difference = −1.11, 95% CI [−0.01, 2.24], p = 0.054, d = 0.75, 95% CI [−0.04, 1.54]) and between the Short-P and Schema groups (mean difference = −1.08, 95% CI [−0.03, 2.20], p = 0.059, d = 0.73, 95% CI [−0.06, 1.51]) showed marginal effects. No significant differences were found among the Long, Schema, and Short-R groups (all p ≥ 0.157, |d| ≤ 0.57).

A one-way ANOVA also revealed a significant main effect of Group on p-value (F = 4.14, p = 0.008, ηp² = 0.111; Fig. 4e). Tukey-adjusted pairwise comparisons indicated that the Short-P group exhibited significantly higher p-values compared to the Schema group (mean difference = 0.165, 95% CI [0.017, 0.312], p = 0.022, Cohen’s d = 0.84, 95% CI [0.05, 1.62]). Other pairwise differences did not survive multiplicity correction. Comparisons between the Short-P and Long groups (mean difference = 0.153, 95% CI [0.004, 0.302], p = 0.042, d = 0.78, 95% CI [0.02, 1.57]) approached significance but did not remain reliable after adjustment. No significant differences were found among the Long, Schema, and Short-R groups (all p ≥ 0.208, |d| ≤ 0.53).

Unaltered pupil size and moving speed responses to event boundaries across groups

To complement the ISC and HMM analyses, we investigated whether basic eye-tracking metrics (i.e., pupil size and gaze moving speed) also reflected event segmentation and whether they were modulated by prior short-video watching. First, we analyzed pupil size time-locked to pre-defined event boundaries. Across all experimental conditions, we observed a robust, boundary-related modulation of pupil size, which peaked at the boundary and subsequently decreased. A repeated-measures ANOVA across five time points (two pre-boundary, the boundary, and two post-boundary) confirmed a significant main effect of time (F(4,448) = 11.69, p < 0.001, ω2 = 0.095; Fig. 5a). Post-hoc comparisons revealed a significant decrease in pupil size one second after the boundary relative to the boundary itself (t(112) = 6.58, p < 0.001, Cohen’s d = 0.61, 95% CI [0.04,0.09]). We used this post-boundary decrease as a pupillary index of event segmentation. However, a mixed ANOVA testing for a group difference in this index (time points: boundary vs. post-boundary) revealed no significant group-by-time interaction (F(3,109) = 0.42, p = 0.737, ω2 < 0.001; Fig. 5b), indicating that this pupillary response did not differ across experimental groups.
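The boundary-locked pupil analysis reduces to epoching a z-scored trace around each boundary and averaging across boundaries. A minimal sketch on a synthetic trace (the 1 Hz sampling and the transient dilation at each boundary are invented for illustration):

```python
import numpy as np

def boundary_locked_average(pupil, boundaries, n_pre=2, n_post=2):
    """Average the z-scored pupil trace in a window of n_pre samples before
    to n_post samples after each event boundary (one sample per second here)."""
    z = (pupil - pupil.mean()) / pupil.std()
    epochs = [z[b - n_pre:b + n_post + 1] for b in boundaries
              if b - n_pre >= 0 and b + n_post < len(z)]
    return np.mean(epochs, axis=0)   # shape: (n_pre + 1 + n_post,)

# Purely illustrative trace: pupil dilates transiently at three boundaries
rng = np.random.default_rng(4)
bounds = [60, 150, 240]
pupil = rng.normal(0, 0.2, 300)
for b in bounds:
    pupil[b] += 1.0                  # transient dilation at the boundary
avg = boundary_locked_average(pupil, bounds)
```

The resulting five-point average peaks at the boundary sample and falls off afterwards, the same qualitative shape tested by the five-time-point ANOVA above.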

Fig. 5. Pupil size and gaze moving speed modulation at event boundaries.

Fig. 5

a The time course of mean z-scored pupil size, time-locked to the onset of event boundaries (t = 0), averaged across all participants. The plot shows a characteristic peak in pupil size at the boundary, followed by a significant decrease in the subsequent seconds. b Group comparison of the pupillary responses to event boundaries. No significant differences in this pupillary response were observed across the four experimental groups. c The time course of mean z-scored gaze speed, time-locked to event boundaries (t = 0), averaged across all participants. A prominent and sharp peak in gaze speed is evident precisely at the event boundary. d Group comparison of the gaze moving speed at event boundaries. The magnitude of the gaze speed increase at event boundaries did not differ significantly across the experimental groups.

A parallel analysis of gaze moving speed across the same five time points also revealed a significant main effect of time, with gaze speed peaking at event boundaries (F(4,448) = 221.33, p < 0.001, ω2 = 0.458; Fig. 5c). Again, the magnitude of this response did not differ between groups, as shown by a non-significant group-by-time interaction in a mixed ANOVA across three key time points (pre-boundary, boundary, post-boundary) (F(6,218) = 0.408, p = 0.873, ω2 < 0.001; Fig. 5d).
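Gaze moving speed is simply the frame-to-frame displacement of the gaze point scaled by the sampling rate. An illustrative sketch (the 60 Hz rate and normalized screen coordinates are assumptions, not the study’s recording parameters):

```python
import numpy as np

def gaze_speed(x, y, sr=60.0):
    """Instantaneous gaze speed (screen units per second) from successive
    gaze coordinates sampled at sr Hz."""
    dx, dy = np.diff(x), np.diff(y)
    return np.hypot(dx, dy) * sr

# Illustrative: stable fixation, then a large saccade-like jump (as might
# occur at an event boundary), then stable fixation again
x = np.array([0.50, 0.50, 0.51, 0.50, 0.80, 0.81, 0.80])
y = np.array([0.50, 0.51, 0.50, 0.50, 0.20, 0.20, 0.21])
speed = gaze_speed(x, y)
```

The speed series is one sample shorter than the coordinate series (one value per transition) and spikes at the large gaze shift, matching the sharp boundary peak in Fig. 5c.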

Robustness of eye-tracking results to low-level confounds

To ensure our primary eye-tracking findings were not attributable to low-level stimulus features, we repeated our key analyses after correcting pupil-size data for gaze location, screen luminance, motion energy, and audio volume. A one-way ANOVA revealed a significant main effect of Group on boundary ISC (F = 5.04, p = 0.003, ηp² = 0.131). Tukey-adjusted pairwise comparisons indicated that the Short-R group exhibited significantly lower boundary ISC compared to the Short-P group (mean difference = −0.025, 95% CI [−0.045, −0.006], p = 0.006, Cohen’s d = −0.93, 95% CI [−1.69, −0.16]). Similarly, the Short-R group showed lower boundary ISC than the Long group (mean difference = −0.021, 95% CI [−0.040, −0.001], p = 0.033, d = −0.75, 95% CI [ − 1.50, −0.01]) and the Schema group (mean difference = −0.023, 95% CI [−0.042, −0.003], p = 0.014, d = −0.82, 95% CI [−1.56, −0.09]). No significant differences were observed among the Long, Schema, and Short-P groups (all p ≥ 0.930, |d| ≤ 0.18).
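Correcting the pupil signal for low-level confounds amounts to a nuisance regression: fit the confound time courses with OLS and keep the residuals. A sketch on simulated data (the confound weights and a single pair of confounds are invented; the study regressed out gaze location, luminance, motion energy, and audio volume):

```python
import numpy as np

def residualize(pupil, confounds):
    """Regress low-level confound time courses out of the pupil trace
    and return the residuals."""
    X = np.column_stack([np.ones(len(pupil)), confounds])
    beta, *_ = np.linalg.lstsq(X, pupil, rcond=None)
    return pupil - X @ beta

rng = np.random.default_rng(5)
n = 500
luminance = rng.normal(0, 1, n)
motion = rng.normal(0, 1, n)
signal = rng.normal(0, 1, n)                      # "cognitive" pupil component
pupil = signal - 0.8 * luminance + 0.3 * motion   # pupil constricts with luminance
clean = residualize(pupil, np.column_stack([luminance, motion]))
```

Because OLS residuals are orthogonal to the regressors, the cleaned trace carries no linear trace of the confounds while preserving the component of interest, which is why the ISC and HMM results can be recomputed on it unchanged.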

Similarly, the Hidden Markov Model (HMM) results remained robust. We found a significant main effect of group on the optimal number of hidden states (K-value) (F = 3.49, p = 0.018, ηp² = 0.095), driven by the Short-R group having significantly higher K values than the Short-P group (mean difference = −2.61, 95% CI [−4.84, −0.38], p = 0.015, Cohen’s d = −0.85, 95% CI [−1.61, −0.09]), which indicates more fragmented event segmentation. No other pairwise comparisons reached significance, including contrasts between the Short-R group and the Long or Schema groups, as well as among the Long, Schema, and Short-P groups (all p ≥ 0.124, |d| ≤ 0.83).

No evidence that watching short videos disrupts memory encoding of static pictures

We conducted Study 2 with an independent sample of 60 healthy young adults to examine the influence of short video watching on memory encoding and its dependency on the type of stimuli. Our hypothesis was that the picture-based memory encoding task, requiring only brief attention spans and no need for spontaneous event segmentation, would show reduced sensitivity to short video exposure. All participants completed a picture-based encoding-retrieval paradigm using faces and houses as stimuli, along with a self-reported questionnaire assessing daily-life short video usage (i.e., TikTok score). During the manipulation phase, participants were randomly assigned to either watch 15 min of short videos or a long video, followed by the encoding phase. The retrieval phase began immediately afterward, and performance was evaluated through separate measures of recognition, familiarity, and recollection for different stimulus types, and collectively for all pictures. No significant differences were found in any of the memory metrics investigated (all p > 0.10; Fig. 6, left panels). Furthermore, across all participants, no significant correlations were observed between TikTok scores and memory metrics (all p > 0.10; Fig. 6, right panels). Complete statistical results for each group comparison and correlation analysis are available in Table 2.

Fig. 6. No evidence of impact from acute or daily life short video watching on memory encoding of static pictures.

Fig. 6

Memory encoding was assessed using a picture-based encoding-retrieval paradigm involving human faces and houses as stimuli. Memory performance was quantitatively assessed through measures of recognition, familiarity, and recollection. The frequency of daily-life short video consumption was quantified via a self-reported questionnaire designed to capture usage intensity (i.e., “TikTok Score”). Left panels: Between-group analysis of memory performance across different stimuli (faces and houses) and memory assessment metrics (recognition, familiarity, and recollection). Right panels: Correlation analysis between TikTok scores and memory performance, encompassing all types of stimuli and memory measures.

Table 2.

Statistical analysis of memory performance in study 2

Columns report group means (SD) for the Short and Long groups, between-group T-testa statistics (t, p), and Pearson correlationsb with TikTok scores (r, p).

| Metricc | Short | Long | t | p | r | p |
| --- | --- | --- | --- | --- | --- | --- |
| Recollection (face) | 0.45 (0.13) | 0.50 (0.14) | 1.316 | 0.19 | 0.18 | 0.14 |
| Recollection (house) | 0.33 (0.14) | 0.35 (0.14) | 0.519 | 0.60 | −0.005 | 0.96 |
| Recollection (all) | 0.39 (0.13) | 0.42 (0.12) | 1.013 | 0.31 | 0.10 | 0.44 |
| Familiarity (face) | 0.18 (0.17) | 0.19 (0.18) | 0.242 | 0.80 | 0.07 | 0.54 |
| Familiarity (house) | 0.34 (0.79) | 0.21 (0.22) | −0.818 | 0.41 | 0.14 | 0.25 |
| Familiarity (all) | 0.26 (0.41) | 0.20 (0.16) | −0.680 | 0.49 | 0.15 | 0.22 |
| Overall recognition (face) | 0.40 (0.18) | 0.44 (0.16) | 0.850 | 0.39 | 0.19 | 0.17 |
| Overall recognition (house) | 0.27 (0.16) | 0.30 (0.15) | 0.548 | 0.315 | −0.06 | 0.61 |
| Overall recognition (all) | 0.34 (0.15) | 0.37 (0.13) | 0.814 | 0.419 | 0.07 | 0.58 |

aBetween-group t-tests (t and p values) were conducted to explore potential differences in memory metrics between the Short and Long groups.

bPearson correlations (r and p values) were performed to identify any associations between memory metrics and questionnaire-based measures of daily short video usage (i.e., TikTok score).

cOverall recognition memory = hit rate (remember + know) − false alarm rate (remember + know). Recollection = hit rate (remember) − false alarm rate (remember). Familiarity = (hit rate know/(1 − hit rate remember)) − (false alarm rate know/(1 − false alarm rate remember)). Subscript indicates that the memory metric was calculated based on pictures of faces, houses, or all pictures used.
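The metric definitions in footnote c can be expressed directly in code. The hit and false-alarm rates below are illustrative inputs, not values from the study.

```python
def overall_recognition(hit_r, hit_k, fa_r, fa_k):
    """Hit rate (remember + know) minus false alarm rate (remember + know)."""
    return (hit_r + hit_k) - (fa_r + fa_k)

def recollection(hit_r, fa_r):
    """Hit rate (remember) minus false alarm rate (remember)."""
    return hit_r - fa_r

def familiarity(hit_r, hit_k, fa_r, fa_k):
    """Independence-corrected 'know' rate, as in footnote c."""
    return hit_k / (1 - hit_r) - fa_k / (1 - fa_r)

# Example: 50% "remember" hits, 20% "know" hits, 10%/5% false alarms.
print(recollection(0.5, 0.1))                       # 0.4
print(round(familiarity(0.5, 0.2, 0.1, 0.05), 3))   # 0.344
print(overall_recognition(0.5, 0.2, 0.1, 0.05))     # 0.55
```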

Discussion

Our findings provide behavioral and eye-tracking evidence that the nature of short-video watching (i.e., random or personalized) differentially affects event segmentation during subsequent memory encoding. This cognitive disruption manifested behaviorally as impaired memory for narrative films, an effect specific to continuous encoding that did not extend to discrete, trial-based memory tasks. Advanced eye-tracking analyses further delineated the underlying mechanisms. Intersubject correlation (ISC) analysis revealed that exposure to random short videos, but not personalized ones, diminished the synchronization of eye movements at event boundaries. Furthermore, a Hidden Markov Model (HMM) indicated that personalized short videos induced an altered and less accurate parsing of the event structure. Collectively, these results suggest that the algorithmic curation of short-video content is a critical factor that selectively disrupts the process of event segmentation, thereby altering how continuous experiences are encoded into memory.

Our behavioral analyses from Study 1 found that acute exposure to short videos impaired event memory encoding only when coupled with frequent habitual daily usage; neither self-reported daily use nor acute exposure to short videos independently influenced memory encoding. Although previous studies have examined the cognitive impacts of short video watching36, none have explored its relationship with human episodic memory encoding. A related study demonstrated that using search engines reduces recall rates of the information itself (i.e., the Google effect)49. With the rise of short video platforms as a dominant form of social media, there is growing interest in their effects on memory. In the parallel Study 2, we found that neither routine daily viewing nor acute exposure to short videos prior to picture encoding affected memory performance. This outcome supports the resilience of picture-based memory encoding to interference from prior short video watching and highlights the advantages of using more ecologically valid stimuli and paradigms in studies of human cognition50,51, specifically pointing to the disruption of event segmentation as the core underlying mechanism. Our results underscore the necessity of using ecologically valid paradigms and investigating the interaction between acute exposure and habitual short video watching to fully characterize its cognitive consequences.

Our eye-tracking analyses provide direct evidence for this proposed mechanism. Using Hidden Markov Models (HMM) and intersubject correlation (ISC) analysis, we found that eye-tracking-based indices of event segmentation (greater fragmentation and lower synchrony) were associated with poorer recall and, critically, were altered after short-video watching. This finding contributes a novel dimension to Event Segmentation Theory (EST)79 by applying it to understand the carry-over effects of a prevalent real-world behavior, short-video watching, on subsequent cognitive processing. While EST traditionally focuses on event boundaries triggered by the immediate perceptual stream and internal prediction errors9,52, our results demonstrate that segmentation is highly susceptible to the residual cognitive states induced by prior activities. We propose that the frequent, rapid context shifts inherent to short-video watching habituate the brain to a high-frequency “segment-and-refresh” processing mode, lowering the threshold for what constitutes a “prediction error.” When subsequently watching a continuous narrative, this habituated state persists, causing the brain to over-interpret minor fluctuations as event boundaries. This aberrant segmentation disrupts the formation of coherent, durable event models, leading to the impaired memory performance we observed.

Our comprehensive eye-tracking analyses, integrating advanced models (i.e., ISC and HMM) with basic metrics (i.e., pupil size and gaze moving speed), suggest the cognitive mechanisms underlying the disruption in event segmentation following short-video watching. The results indicate that this disruption originates at a high-level conceptual stage rather than from alterations in low-level perceptual processing or arousal, consistent with prior work showing that HMM-based event segmentation models are robust to low-level visual features13,41. Specifically, our HMM analysis identified a greater number of latent eye-movement states post-exposure, suggesting that participants formed more fragmented, internally generated event representations. In contrast, we observed no significant changes in pupil size or gaze moving speed surrounding predefined event boundaries, indicating that arousal and basic perceptual responses to major event transitions remained stable. This dissociation (an increase in latent event boundaries without a change in physiological responses at explicit boundaries) provides novel experimental evidence for a specific mechanism of altered event segmentation. Whereas other manipulations might shift boundary perception (e.g., anticipation35) or alter event duration (e.g., in developmental stages18,37), short-video exposure appears to uniquely promote the fragmentation of continuous experience. This interpretation of fragmented segmentation supports our central argument of disrupted sustained attention and aligns with broader findings on the cognitive consequences of short video watching.

The divergent results of Study 1 and Study 2 provide compelling evidence that short-video exposure does not cause a global memory impairment, but rather selectively disrupts the process of event segmentation. The continuous movie task in Study 1 required participants to actively segment a temporal stream of information, a process our results show is vulnerable to disruption following exposure to fragmented content. We propose that this stems from the viewing pattern fragmenting sustained attention and imposing cognitive load, thereby disrupting the event segmentation crucial for encoding temporally unfolding information. In contrast, the discrete image-encoding task in Study 2 relied more on item-specific visual processing and selective attention and less on the temporal integration provided by event segmentation; its null result thus serves as a crucial control, indicating that the impairment observed in Study 1 is tied specifically to the demands of processing continuous, unfolding events. The detrimental impact of short-video viewing therefore appears specific to cognitive processes dependent on segmenting continuous experience, highlighting the vulnerability of attentional continuity and temporal integration mechanisms to viewing patterns involving rapid context shifts.

Our findings reveal an intriguing dissociation between the effects of random and personalized short videos. This dissociation likely stems from the distinct cognitive processes captured by our measures: whereas intersubject correlation (ISC) of eye movements reflects synchronized processing of normative, stimulus-driven event boundaries, the Hidden Markov Model (HMM) analysis uncovers the viewer’s idiosyncratic, latent cognitive state. Accordingly, we found that behavioral memory impairments and the corresponding reduction in ISC at event boundaries were observed exclusively following exposure to random short videos, suggesting a disruption in processing shared event structure. In contrast, alterations in the latent event structure were unique to the personalized short-video condition. Contrary to our initial hypothesis, exposure to personalized videos resulted in the identification of fewer latent events and lower accuracy in predicting boundaries. This suggests that the cognitive state induced by personalized content—characterized by a coarser segmentation style with less defined event representations—carries over to subsequent, unrelated tasks. This result establishes the content curation algorithm as a key moderator of the cognitive impact of short video watching. While previous research has predominantly focused on the addictive potential of personalized algorithms25,26, our study provides evidence that engaging (personalized) and non-engaging (random) short videos distinctly modulate continuous memory processes. A plausible mechanism, grounded in predictive processing theories of event cognition, is that random videos, often misaligned with user interests, generate frequent prediction errors that disrupt ongoing event model updating. 
Conversely, the high engagement or ‘flow’ state induced by personalized content may reduce the cognitive resources allocated to parsing the environment, thereby blurring the salience of discrete event boundaries and promoting a more integrated, holistic encoding style for subsequent experiences.

Our findings have significant implications for educational contexts. Effective learning from extended materials, such as a lecture or textbook, relies on segmenting a continuous flow of information into a coherent structure of concepts. The segmentation impairment we observed suggests that habitual exposure to short-form media may hinder a student’s ability to build these structured mental representations, potentially impacting comprehension and long-term retention. While our study focused on university students, these findings may also be relevant to other heavily engaged age groups, such as adolescents. Furthermore, the observed impairments could extend to other cognitive tasks requiring sustained attention and temporal integration, including complex problem-solving and in-depth reading comprehension.

Our findings, derived from both behavioral and eye-tracking data, are limited to the acute effects of short-video exposure on event segmentation, as the memory encoding task was administered immediately after the manipulation phase. Consequently, the persistence of these cognitive alterations remains an open question. One possibility is that the observed impairment reflects a transient cognitive state that dissipates over a short timescale, analogous to the temporal window in which emotional arousal modulates subsequent memory encoding53. Alternatively, the interaction we found with pre-existing viewing habits suggests a more durable phenomenon. Habitual exposure may induce chronic changes in cognitive processing, rendering individuals more susceptible to the disruptive effects of certain viewing patterns. This interpretation remains untested in previous studies, which have also focused on acute cognitive effects36. Disentangling this transient versus chronic account is a crucial next step for research, which could be addressed by systematically varying the delay between short-video exposure and cognitive assessment. However, this approach presents considerable methodological challenges, given the difficulty of isolating acute experimental effects from the cumulative impact of widespread and frequent short-video consumption among young adults.

Our study has several limitations that warrant acknowledgment. Firstly, no eye-tracking data were collected while participants engaged in the picture-based memory encoding task (i.e., Study 2), because the methods we used for analyzing eye movements are suitable only for capturing cognitive processes during naturalistic memory encoding tasks. Consequently, it was not possible to compare eye-tracking metrics between Studies 1 and 2. Secondly, we recruited participants who are typical daily users of short videos; we did not recruit those with a severe addiction to short videos. Future studies could include a cohort of heavy users in a similar study design. Thirdly, our study did not include direct assessments of individual cognitive abilities, such as attentional control and working memory capacity. While we assumed that random assignment to experimental groups would mitigate systematic group differences, an important avenue for future research will be to explore these individual differences. For instance, individuals with higher baseline working memory capacity may be more resilient to the fragmenting effects of short-video exposure. Fourthly, the use of foreign film material (Sherlock) with Chinese participants introduces a potential issue of cultural familiarity. However, the between-subjects design, in which all groups viewed the same encoding material, helps to minimize this factor’s impact on our main conclusions. Future research should still explore how short-video exposure might differently affect participants depending on their familiarity with the cultural context of the stimuli. Fifthly, our study involves a deliberate trade-off between strict experimental control and ecological validity, a choice reflected in our stimulus design.
For instance, rather than artificially segmenting a single long film to control for content, we used authentic short-video streams, embracing the thematic diversity and unpredictability that define the real-world viewing experience. Similarly, in comparing the Random and Personalized short-video conditions, we prioritized matching total exposure time. This necessarily confounded the degree of user agency with the raw number of content shifts, making it difficult to isolate their unique cognitive impacts. While these factors represent limitations, we argue they are intrinsic features of the phenomenon under investigation. Future research could aim to disentangle these variables to build upon our ecologically-grounded findings. Finally, we used commercially available movies as stimuli in Study 1, which may have masked memory impairments in participants who viewed short videos prior to memory encoding, due to the high attractiveness and emotional content of the movies. Future research could employ more emotionally neutral and educationally relevant video clips, such as recordings from real-life educational lectures47,54, as memory encoding material.

In conclusion, this study establishes a direct link between short-video watching and event segmentation for continuous memories. Importantly, these effects are absent in discrete memory tasks, underscoring the unique challenges posed by processing continuous information. Given that self-report studies have predominated in short video research27, we advocate for interdisciplinary efforts, particularly involving cognitive psychologists and neuroscientists, to further investigate and elucidate the changes in event segmentation caused by short video watching across varied demographics, with a special focus on children and adolescents.

Methods

Participants

Participants were recruited from the CCNU university community. Eligibility was determined through a pre-study screening process. Prospective participants were required to meet the following criteria: (1) no self-reported history of neurological or psychiatric conditions; (2) a score below 11 on the Beck Depression Inventory-II (BDI-II); (3) self-reported prior experience with using short-video applications (e.g., TikTok); and (4) no prior familiarity with the experimental stimuli (i.e., the ‘Sherlock’ episode in Study 1 or the specific face set in Study 2). The BDI-II cutoff was established to minimize potential confounds from depressive symptomatology, which can affect memory and attention: a substantial body of research has demonstrated that mood disorders, such as depression, are often associated with impairments in cognitive functions55, including episodic memory56. This exclusion criterion allowed us to examine more cleanly the specific impact of short-form video watching on event segmentation and memory encoding. Individuals who did not meet all criteria were not invited to the formal testing session. Crucially, all participants who passed this screening and completed the full experimental procedure were included in the final analyses for both studies. No participants were excluded from the final sample after data collection finished. This study was conducted in accordance with the Declaration of Helsinki. The research protocol was reviewed and approved by the Central China Normal University Institutional Review Board (Approval No. CCNU-IRB-202212008b). Written informed consent to participate was obtained from all participants prior to their inclusion in the study. No participants included in this study were under the age of 16.

In Study 1, 113 right-handed, healthy undergraduates (73 females, 40 males; average age: 20.42 years, standard deviation: 2.04) were tested. All had normal or corrected-to-normal vision and no hearing deficits. Recruitment was via advertisements, with compensation ($10 USD) provided for participation in the full session. Each participant gave written informed consent before the experiments. Participants were randomly distributed into four groups to receive the various experimental manipulations (see the experimental design section for details). The demographics of the groups were: Short-R (28 participants; 11 male, average age = 20.32, SD = 2.08), Short-P (30 participants; 8 male, average age = 20.58, SD = 1.99), Long (28 participants; 9 male, average age = 20.18, SD = 2.11), and Schema (27 participants; 12 male, average age = 20.77, SD = 1.96). No significant differences in age (F = 1.145, p = 0.33) or gender distribution (χ² = 2.28, p = 0.51) were observed between groups. The study protocols were approved by the University Institutional Review Board for Human Subjects.

In Study 2, we recruited 60 right-handed, healthy undergraduate students (52 females, 8 males; mean age = 21.7, SD = 1.7), who completed all experimental procedures and were included in the final data analysis. During the manipulation phase, participants were instructed to use their phones to play videos (short or long) within the app. Participants were divided into two age- and gender-matched groups: a Short group (26 females, 4 males; mean age = 21.36, SD = 1.65) and a Long group (26 females, 4 males; mean age = 22.2, SD = 1.71). Recruitment was via advertisements, with compensation ($10 USD) provided for participation in the full session. The study protocols received approval from the University’s Institutional Review Board for Human Subjects.

Given that no previous study had investigated the effect of short video watching on memory encoding, estimating the precise effect size before collecting data was not possible. The sample size was guided by earlier research on event boundaries that typically engaged 20–30 subjects10,15,37, often supplemented by other techniques such as EEG and fMRI. A relevant previous study, employing pupil dilation measurements to examine event segmentation, involved 34 participants and posited a need for at least 25 subjects to identify a substantial effect size (d = 0.8) at an α of 0.05 and 80% power44. For Study 1, we targeted a minimum of 25 evaluable subjects per experimental condition, ensuring complete datasets of memory and eye movement metrics. All primary and secondary analyses exceeded this count, with the smallest group (Schema) including 27 participants. In Study 2, we achieved a final sample of 30 participants per group. A post-hoc sensitivity power analysis was conducted using G*Power (v. 3.1) to confirm that our sample provided adequate statistical power for the primary analyses. For the one-way ANOVA on intersubject correlation (ISC) values, our sample (N = 113) yielded an achieved power of 0.89 to detect the observed effect size (ηp² = 0.111) at an alpha of 0.05. For the analysis of the Hidden Markov Model (HMM) metrics, the achieved power was 0.99 for the observed effect size (ηp² = 0.179). These results confirm that our study was sufficiently powered to detect the key effects of interest.
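G*Power specifies ANOVA effect sizes as Cohen's f, which relates to partial eta squared as f = sqrt(ηp² / (1 − ηp²)). The short check below applies this standard conversion to the two reported effect sizes; it is an arithmetic illustration, not a reproduction of the power analysis itself.

```python
import math

def eta2_to_f(eta_p2):
    """Convert partial eta squared to Cohen's f (G*Power's ANOVA metric)."""
    return math.sqrt(eta_p2 / (1 - eta_p2))

print(round(eta2_to_f(0.111), 3))  # ISC analysis -> f = 0.353
print(round(eta2_to_f(0.179), 3))  # HMM analysis -> f = 0.467
```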

Materials and design

Study 1 adopted a between-subjects design to investigate the effect of short-video watching on continuous memory encoding. Participants were randomly assigned to one of four experimental groups: (1) the Short-Random (R) group, (2) the Short-Personalized (P) group, (3) the Long group, and (4) the Schema group. The primary independent variable was the type of video content viewed during the manipulation phase. The key dependent variables were (1) memory performance, assessed using a free recall task for a subsequent continuous movie, and (2) eye-movement dynamics during movie encoding, including intersubject correlation (ISC) at event boundaries, latent state patterns derived from a Hidden Markov Model (HMM), and pupil size/gaze moving speed around boundaries.
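Intersubject correlation of eye movements is often computed in a leave-one-out fashion: each subject's trace is correlated with the average trace of all other subjects. The sketch below shows one common implementation on toy data; the array shapes and names are assumptions, not the authors' exact code.

```python
import numpy as np

def leave_one_out_isc(traces):
    """traces: (n_subjects, n_timepoints) array; returns per-subject ISC."""
    traces = np.asarray(traces, dtype=float)
    iscs = []
    for i in range(traces.shape[0]):
        # average trace of everyone except subject i
        others = np.delete(traces, i, axis=0).mean(axis=0)
        iscs.append(np.corrcoef(traces[i], others)[0, 1])
    return np.array(iscs)

# Toy data: four subjects with identical traces yield ISC of 1 for each.
t = np.linspace(0, 2 * np.pi, 50)
vals = leave_one_out_isc(np.stack([np.sin(t)] * 4))
print(vals)
```

In the boundary-ISC analysis, this computation would be restricted to time windows around the predefined event boundaries before group comparison.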

The manipulation phase aimed to influence event segmentation during subsequent video viewing. We randomly divided the participants into four groups: (1) the Short-Random (R) group, (2) the Short-Personalized (P) group, (3) the Long group (watching “Ocean”), and (4) the Schema group (watching the first 25 min of “Sherlock”). The viewing duration for the participants in all four groups was 25 min. During the manipulation phase, participants in the Short-Personalized group had control over the video playback; they were instructed that they could skip the current video and move to the next one in the downloaded sequence at any time by pressing the designated key. This condition was designed to incorporate an element of user control over the content flow, distinct from the passive viewing in the random condition, where videos played consecutively without participant intervention.

Immediately following the completion of the manipulation phase, participants proceeded to the memory encoding phase without any delay. During this phase, participants from all groups watched the 21-min segment from “BBC Sherlock”.

In the free recall phase, participants recounted in detail the content from the encoding phase, emphasizing the depth of recall. The manipulation phase content, while not directly evaluated, was included in assessments to maintain participant focus throughout the experiment. All recording sessions were conducted in a controlled laboratory setting, and participants’ verbal recalls were captured via smartphone and later transcribed for detailed memory analysis. A smartphone was chosen because the goal was to capture the semantic content of the recall for manual transcription, and the analyses did not require acoustic features. The recording quality was confirmed to be sufficient for clear and unambiguous transcription, ensuring the reliability of our memory scoring.

All experimental tasks in Study 1 were presented on a 23-inch LCD monitor with a screen resolution of 1280 × 720 pixels. Stimuli were presented, and behavioral responses were recorded using Experiment Builder software (SR Research Ltd., Ottawa, ON, Canada). The core memory encoding stimulus, viewed by all participants, was a 21-min continuous segment from the latter half of the television episode Sherlock (Season 1, Episode 1, “A Study in Pink”). We selected this episode as it allowed us to adopt previously established event boundaries identified by Chen et al. (2017), ensuring consistency with prior event segmentation research, and because prior studies indicated its capability to evoke distinct, event-specific neural representations11,12,15,17. Before this encoding phase, participants were assigned to one of four manipulation groups. All four acute exposure conditions were matched for total duration (i.e., 25 min). However, the conditions inherently differed in the number of discrete video clips presented and the degree of user agency over the content stream. Participants in the Short-Random (R) group were exposed to 400 short TikTok clips across themes such as humans, animals, objects, and scenes, each theme contributing 100 clips of 1 to 5 s, curated from a neutral account to avoid bias from personalized algorithms. Participants in the Short-Personalized (P) group viewed short videos downloaded from their own TikTok accounts. Participants in the Long group watched a segment of the BBC documentary “Ocean”. Participants in the Schema group watched the initial 25-min segment of the same Sherlock episode to facilitate the formation of a narrative schema.

Participants’ verbal free recall responses were audio-recorded for later scoring. The scoring procedure was adapted from Chen et al.12, who used the same Sherlock episode stimulus. The original coding scheme segmented the full episode into 50 distinct events, each containing multiple details (approximately 1000 details in total). For the current study, which used a segment corresponding to 20 events from the original study, we created our scoring sheet by extracting these 20 events and their associated details (range: 1–24 details per event) from the Chen et al.12 system. This list was translated into Chinese. To enhance scoring accuracy, keywords (e.g., character names, specific actions, dialogue elements) were manually added to the description of each detail by one author (H.X.L.). The original scoring system document from Chen et al.12 is available in our OSF repository, along with our adapted and translated version used for scoring.

Two independent raters, blind to experimental conditions, coded the audio-recorded recall protocols. Using the scoring sheet, raters marked each detail as ‘remembered’ if the participant’s recall accurately mentioned its core content or associated keywords. An event was coded as ‘remembered’ if one or more of its constituent details were recalled. If no details from an event were recalled, the event was coded as ‘forgotten’. Inter-rater reliability was assessed on the initial, independent ratings from both coders before any discussion or consensus-building. For the binary measure of whether an event was recalled, agreement was excellent, as indicated by a Cohen’s Kappa of 0.91. For the quantitative measure of the number of details recalled per event, reliability was also high, with an Intraclass Correlation Coefficient (ICC) of 0.86 (calculated using a two-way random effects model for absolute agreement, ICC(2,k)). After this initial reliability assessment, the two raters discussed any scoring discrepancies to achieve 100% consensus. The final, consensus-based scores were used for all subsequent statistical analyses of the free recall data.
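Cohen's kappa for the binary remembered/forgotten codes is κ = (po − pe) / (1 − pe), where po is observed agreement and pe is chance agreement from each rater's marginal frequencies. The toy ratings below are illustrative, not the study data.

```python
def cohens_kappa(r1, r2):
    """Cohen's kappa for two equal-length lists of categorical codes."""
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n          # observed agreement
    labels = set(r1) | set(r2)
    pe = sum((r1.count(l) / n) * (r2.count(l) / n) for l in labels)
    return (po - pe) / (1 - pe)

# Toy binary codes (1 = remembered, 0 = forgotten) from two raters.
rater1 = [1, 1, 0, 1, 0, 0, 1, 1]
rater2 = [1, 1, 0, 1, 0, 1, 1, 1]
print(round(cohens_kappa(rater1, rater2), 3))  # 0.714
```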

Audio recordings from the free recall task were transcribed and assessed by two independent raters. The evaluation of each participant’s recall was based on three metrics: (1) the remember score, reflecting the number of events recalled; (2) the raw detail score, representing the average number of details recalled per event; and (3) the corrected detail score, calculated as the ratio of recalled details to the total details per event, averaged across all events. Additionally, we investigated the serial position effect by calculating the recall probability for each event, considering its position in the original movie’s storyline.
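The three recall metrics described above can be sketched as follows. The nested input format, a (details recalled, details total) pair per event, is a hypothetical representation, and the raw detail score is averaged over all events here, which is one plausible reading of the definition.

```python
def recall_metrics(events):
    """events: list of (details_recalled, details_total) pairs, one per event."""
    n = len(events)
    # remember score: number of events with at least one detail recalled
    remember_score = sum(1 for recalled, total in events if recalled > 0)
    # raw detail score: average number of details recalled per event
    raw_detail = sum(recalled for recalled, total in events) / n
    # corrected detail score: mean ratio of recalled to total details
    corrected_detail = sum(recalled / total for recalled, total in events) / n
    return remember_score, raw_detail, corrected_detail

# Toy example: three events with 2/4, 0/5, and 3/6 details recalled.
print(recall_metrics([(2, 4), (0, 5), (3, 6)]))
```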

In Study 2, we also organized the research into three distinct phases: the manipulation phase, the memory encoding phase, and the retrieval phase (i.e., picture recognition task). Prior to beginning the main experimental tasks, all participants received instructions and completed a brief practice session for the approachability rating task using practice stimuli distinct from those used in the encoding phase.

Following the practice, participants entered the manipulation phase, in which they watched 15 min of either short videos or the long-video documentary. Participants in the short video group were instructed to use the Douyin app on their own smartphones as they normally would, interacting naturalistically with the feed by actively scrolling, selecting videos to watch, and skipping content at will. This approach was chosen to enhance ecological validity, ensuring that the exposure involved the user control typical of real-world short-video consumption before its impact on the subsequent memory task was assessed. Participants in both the short and long video groups were instructed not to use the comment or like functions of the app and to disable notifications from other apps to avoid interruptions.

Following the manipulation phase, participants proceeded immediately to the memory encoding phase. In this initial encoding session, they rated the approachability of 60 faces and 60 non-social images. The session was structured into two segments separated by a 20-s interval; each image was displayed for 3500 ms, followed by a self-paced approachability rating on a scale from 1 (not approachable) to 5 (very approachable), with an interstimulus interval of 500 ms. This incidental task masked the true purpose of memory encoding while requiring detailed attention to each stimulus.

Memory retention was then assessed with a surprise test using the remember/know (RK) paradigm57. Participants were presented with the previously encoded images alongside 30 new faces and 30 new non-social images across two blocks, totaling 180 stimuli, and were asked to classify each image as remembered, known, or new. Each image was shown for 3500 ms, followed by a self-paced response, with trials separated by a 500-ms interval. A brief practice session using unrelated cartoon faces and buildings preceded the test to clarify the distinction between “remember” and “know” responses.

Participants watched the short/long videos on their own smartphones using their personal Douyin (the Chinese version of TikTok) accounts, which increased the ecological validity of the short-video watching experience. The memory task was performed on a laboratory computer and was programmed and administered using E-Prime 3.0 (Psychology Software Tools, Pittsburgh, PA). No eye-tracking was performed in Study 2. Stimuli for the manipulation phase consisted of either short videos viewed via the Douyin/TikTok app on participants’ own smartphones or a long video, specifically the documentary “The Enchanting Creatures of the Motuo Forest—Inside the World’s Largest Canyon” from National Geographic. The duration of this phase was 15 min. Stimuli for the memory encoding and retrieval phases consisted of a total of 180 images. Ninety images depicted human faces and were sourced from the Chinese Academy of Sciences’ Sinicized Face Emotion Image System. From a pool of 600 grayscale images, we selected 90 faces (45 males, 45 females) displaying positive, neutral, and negative emotions—30 faces per emotion category. Sixty of these faces were used for memory encoding, and 30 served as distractors during the recognition task. For non-social stimuli, we curated 90 images (30 each of houses, castles, and offices) from pixabay.com under Creative Commons licenses permitting reuse for research purposes. These were similarly divided, with 60 used for encoding and 30 serving as distractors, all presented within an oval mask against a black backdrop. Distinct practice stimuli (e.g., cartoon images) were used for the initial practice session of the approachability rating task, and cartoon faces and buildings were used for the practice session before the retrieval phase.

Overall recognition memory accuracy was quantified by the net difference between the overall hit rate and false alarm rate [i.e., Overall recognition memory = hit rate (remember + know) − false alarm rate (remember + know)]. Separate metrics for recollection and familiarity were calculated: recollection was defined as the net hit rate for remembered items minus the false alarm rate for those classified as remembered [i.e., hit rate (remember) − false alarm rate (remember)], and familiarity was calculated as the net rate of known responses, adjusted for false alarms [i.e., (hit rate know/(1 - hit rate remember)) - (false alarm rate know/(1 - false alarm rate remember))]. These measures were used to evaluate the impact of experimental variables on long-term memory retention. Given that the images originated from two principal categories—faces and houses—we first computed the Overall Recognition, Recollection, and Familiarity scores separately for each category. This analysis aimed to determine if the act of watching short videos exerted a selective impact on certain categories. Subsequently, we computed all memory metrics without regard to image category. In summary, we derived nine memory metrics for comparative analyses between groups of participants who viewed either short or long videos prior to the encoding phase, and for correlational studies with questionnaire-based assessments of daily short video usage.
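The three formulas above (overall recognition, recollection, and the independence-corrected familiarity) can be expressed compactly; the function below is an illustrative sketch (not the authors’ code) operating on hypothetical hit and false-alarm rates.

```python
def rk_memory_scores(hit_rem, hit_know, fa_rem, fa_know):
    """Recognition metrics from remember/know hit and false-alarm rates.

    Returns (overall recognition, recollection, familiarity).
    """
    # Overall recognition: combined hit rate minus combined false-alarm rate.
    overall = (hit_rem + hit_know) - (fa_rem + fa_know)
    # Recollection: net 'remember' hit rate.
    recollection = hit_rem - fa_rem
    # Familiarity: 'know' responses conditioned on the item not being remembered
    # (independence correction), net of the corresponding false-alarm rate.
    familiarity = hit_know / (1 - hit_rem) - fa_know / (1 - fa_rem)
    return overall, recollection, familiarity
```

With hit rates of 0.5 (remember) and 0.2 (know) against false-alarm rates of 0.1 each, this yields overall recognition of 0.5, recollection of 0.4, and familiarity of 0.2/0.5 − 0.1/0.9 ≈ 0.29.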

Questionnaire measure of short video use

Participants from both Study 1 and Study 2 completed a 20-item Short Video Use Scale at the conclusion of their respective experiments, which assessed the degree of participants’ short video usage. This questionnaire was administered after the memory encoding and retrieval phases to prevent participants from inferring the study’s objectives, which could have influenced their memory performance. Problematic short-video usage was assessed using a scale adapted from the validated Chinese version of the Internet Addiction Test (IAT)58. The primary modification involved replacing the term “the Internet” with “short videos”59. To ensure clarity, the beginning of the questionnaire defined “short videos” for participants as encompassing content from platforms such as TikTok (international version), Douyin (the Chinese version of TikTok), KuaiShou, and the Tencent Short Video Platform. While this adapted scale was used effectively in previous research on short-video usage by Su et al.25,26, it has not undergone separate, formal psychometric validation. The scale consisted of 20 items rated on a 5-point Likert scale, and participants typically completed it in approximately 2–3 min.

Eye-tracking data acquisition

To control for environmental factors, all experimental sessions were conducted under constant artificial illumination within a sound-attenuated, light-controlled room; curtains blocked natural sunlight so that indoor luminance remained consistent throughout the day, minimizing potential pupillary changes due to ambient light. All participants, regardless of group allocation, viewed the stimuli on the same model of monitor, and their eye movements were recorded under identical conditions. Eye movements were monitored with a desk-mounted EyeLink 1000 system (SR Research Ltd., Mississauga, Ontario, Canada), using a 35-mm lens at a sampling rate of 1000 Hz. A head and chin rest provided by SR Research minimized head movements. Participants were not given specific instructions regarding blinking, allowing for natural viewing behavior during the extended task duration (approximately 45 min for the movie encoding). Before the experiment, participants were briefed on the calibration procedure and the eye tracker setup, which included adjusting the camera for optimal pupil visibility and modifying the infrared sensitivity threshold. Calibration used a 9-point grid, with the calibrated area adapted to the screen region occupied by the stimulus task: participants fixated, in random order, nine black dots placed at predetermined locations spanning the display, holding each fixation until the dot disappeared and without anticipating its trajectory.
Post-calibration, accuracy was verified by having participants re-fixate the same nine points, this time assessing the consistency between recorded and actual positions to quantify visual discrepancies. Participants were required to pass the system’s default calibration validation threshold before beginning the experiment; if the initial calibration was unsuccessful, the procedure was repeated until the criterion was met. Specific quantitative precision and accuracy metrics for each participant’s calibration were not recorded. Eye-tracking data were collected throughout the experiment, including the initial manipulation phase and the subsequent encoding phase (i.e., watching the “Sherlock” movie). Critically, all reported eye-tracking analyses were performed exclusively on the data from the encoding phase, because all participants viewed the identical “Sherlock” movie, providing a common baseline to assess group differences induced by the preceding video manipulation. In contrast, eye movements during the short-video phase were not directly comparable across participants due to the randomized nature of the video content.

Eye-tracking data preprocessing and analysis

We used MATLAB to preprocess the eye-tracking data. Since all videos presented had a fixed, predetermined duration, we segmented the continuous eye-tracking data by extracting the time series corresponding to the exact duration of video presentation during memory encoding. This time-locking procedure yielded data sequences of standardized length for each participant, ensuring the complete temporal information within each segment was retained for subsequent analysis. We also corrected for artefacts due to blinks by linearly interpolating the eye-tracking signal during detected blink periods (typically lasting 100 to 300 ms), culminating in a complete dataset for each participant that included coordinates of screen fixations (X, Y) and pupil diameters.
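The blink-correction step can be sketched with linear interpolation over flagged samples; the numpy version below is illustrative (the actual preprocessing was done in MATLAB) and assumes a hypothetical boolean blink mask aligned sample-by-sample with the signal.

```python
import numpy as np

def interpolate_blinks(signal, blink_mask):
    """Linearly interpolate samples flagged as blinks (True in blink_mask).

    Works for any 1-D eye-tracking channel (x, y, or pupil diameter).
    """
    signal = np.asarray(signal, dtype=float).copy()
    mask = np.asarray(blink_mask, dtype=bool)
    valid = ~mask
    # Replace blink samples with values interpolated from surrounding valid samples.
    signal[mask] = np.interp(np.flatnonzero(mask), np.flatnonzero(valid), signal[valid])
    return signal
```

Applied to each channel in turn, this yields gap-free (X, Y, pupil) time series of standardized length per participant.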

To investigate how event boundaries influence the synchronization of eye movement patterns and how prior exposure to short videos modulates it, we calculated the intersubject correlation (ISC) of eye movements during specific temporal segments around event boundaries for each participant. We selected ISC analysis because it provides a robust measure of shared attentional engagement across participants viewing dynamic, continuous stimuli like movies46,60. Lower ISC, particularly at narrative transition points (event boundaries), indicates divergence in viewing patterns, potentially reflecting disruptions in shared event segmentation processes47. For each participant, we computed Pearson’s correlation coefficients of vertical gaze position with every other viewer and averaged these values to derive an individual ISC; the same procedure was applied to horizontal gaze position and pupil size. A composite ISC was then calculated as the mean of these values: ISC = (ISCx + ISCy + ISCpupil)/3. To assess synchronized eye movements around event boundaries, we calculated ISC within three time windows relative to each boundary: (1) pre-boundary ISC, using data from the 10 s immediately preceding the boundary (−10 s to 0 s); (2) boundary ISC, using data spanning 5 s before to 5 s after the boundary (−5 s to +5 s); and (3) post-boundary ISC, using data from the 10 s immediately following the boundary (0 s to +10 s). Within each window, ISC was quantified as the mean correlation of vertical gaze position, horizontal gaze position, and pupil size between pairs of participants.
We analyzed ISC variability for each participant across these conditions using repeated-measures ANOVA and conducted between-group comparisons using ANOVA to determine if ISC scores varied among different groups.
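The leave-one-out ISC computation and the three boundary windows can be sketched as follows; this is an illustrative numpy version assuming a hypothetical subjects-by-timepoints array for a single channel (e.g., vertical gaze) and the 1000-Hz sampling rate.

```python
import numpy as np

def leave_one_out_isc(traces):
    """Per-participant ISC: mean Pearson correlation of each participant's trace
    with every other participant's trace.

    traces: array-like of shape (n_subjects, n_timepoints), one channel.
    """
    traces = np.asarray(traces, dtype=float)
    n = traces.shape[0]
    r = np.corrcoef(traces)                 # n x n correlation matrix
    # Average each row excluding the diagonal (self-correlation of 1).
    return (r.sum(axis=1) - 1.0) / (n - 1)

def boundary_windows(trace, boundary, fs=1000):
    """Slice pre- (-10 to 0 s), boundary- (-5 to +5 s), and post- (0 to +10 s)
    windows around a boundary sample index, given sampling rate fs (Hz)."""
    w, h = 10 * fs, 5 * fs
    return (trace[boundary - w:boundary],
            trace[boundary - h:boundary + h],
            trace[boundary:boundary + w])
```

Computing `leave_one_out_isc` within each window for x, y, and pupil, then averaging the three, gives the composite window-wise ISC entered into the ANOVAs.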

We adapted Hidden Markov Models, previously used in fMRI11 and EEG17 studies, to analyze eye-tracking data. To model the underlying structure of event segmentation from the continuous eye-tracking data, we employed Hidden Markov Models (HMMs). HMMs are particularly well-suited for identifying latent cognitive states (i.e., distinct ‘viewing patterns’ potentially corresponding to perceived events) from sequential data like gaze coordinates11,37,41. By analyzing the transitions and properties of these states, HMMs allow us to quantify characteristics such as the fragmentation or coherence of event segmentation during viewing. This method processed three time series: X and Y coordinates of screen fixations and pupil size. Our data-driven event segmentation model could discern specific eye movement patterns and the structural dynamics of events, including the identification of event boundaries. Initially, we explored individual differences in event structures without presetting the number of events (K). We employed the HMM across K values ranging from 15 to 25, allowing the model to select the optimal K based on eye movement data. Previous fMRI studies, using the same movie clip, established a ground truth of 20 events12. Consequently, K values below 20 might indicate overlooked responses near annotated boundaries, whereas values above 20 could suggest an over-segmentation of event cognition, with additional boundaries emerging from the eye movement patterns. We further examined the correlation between individual K values and memory performance to affirm the relationship between event segmentation and event memory. As hypothesized, K values exceeding the established ground truth correlated with poorer memory performance. Subsequently, we set the model to a predefined k of 20 to elucidate latent states and the structure of observed events. 
Using a leave-one-out approach, we initially applied this model with a fixed k to ascertain optimal event boundaries for each participant. We evaluated the model’s fit by comparing the HMM-determined boundaries against those from a null model using within-event similarity metrics. This comparison was statistically analyzed against a distribution generated from 1000 randomized event sequences to assess the significance of the model’s fit. A significant p-value indicated a robust alignment between the observed eye movement patterns and the HMM-derived event boundaries.
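The model-fit check above (within-event similarity versus a permutation null) can be sketched in numpy. This simplified version is not the authors’ HMM pipeline: it assumes a hypothetical timepoints-by-features array of gaze and pupil values and a candidate set of boundary indices (e.g., HMM-derived), and compares their within-event similarity against randomly placed boundaries.

```python
import numpy as np

rng = np.random.default_rng(0)

def within_event_similarity(data, boundaries):
    """Mean pairwise Pearson correlation of samples within each event.

    data: (n_timepoints, n_features) array; boundaries: sorted event-start indices.
    """
    edges = [0, *boundaries, len(data)]
    sims = []
    for a, b in zip(edges[:-1], edges[1:]):
        seg = data[a:b]
        if len(seg) < 2:
            continue
        r = np.corrcoef(seg)                       # timepoint-by-timepoint correlations
        iu = np.triu_indices(len(seg), k=1)        # upper triangle: unique pairs
        sims.append(r[iu].mean())
    return float(np.mean(sims))

def boundary_permutation_p(data, boundaries, n_perm=1000):
    """P-value: proportion of random boundary placements whose within-event
    similarity matches or exceeds the observed value."""
    obs = within_event_similarity(data, boundaries)
    null = [within_event_similarity(
                data,
                np.sort(rng.choice(np.arange(1, len(data)),
                                   size=len(boundaries), replace=False)))
            for _ in range(n_perm)]
    return float(np.mean([s >= obs for s in null]))
```

A small p-value indicates that the candidate boundaries carve the time series into internally coherent segments better than chance placements, mirroring the 1000-permutation test described above.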

To isolate pupil size changes associated with cognitive engagement, specifically event segmentation, from low-level confounds, we implemented a two-step correction procedure13. First, to correct for artifacts related to gaze position, the screen was divided into a grid of spatial bins based on x- and y-coordinates, and the raw pupil area was z-scored within each bin. Second, we employed a multiple linear regression model to partial out variance attributable to low-level sensory features for each participant’s time series. This model included three regressors: (1) Screen Luminance, quantified as the mean grayscale value of each video frame and downsampled to the pupil data’s sampling rate; (2) Motion Energy, calculated as the frame-by-frame absolute difference in luminance to account for visual transients; and (3) Audio Volume, represented by the root-mean-square (RMS) of the audio signal’s amplitude in 1-s windows. The residuals from this regression—representing pupil variance after accounting for these sensory confounds—were defined as the “corrected pupil size.” To ensure the robustness of our results, all group-level analyses were performed on both the raw and the corrected pupil size data.
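The two-step correction can be sketched as follows; this is an illustrative numpy version assuming hypothetical per-sample arrays for gaze, pupil, and the three sensory regressors (bin counts and variable names are assumptions, not the authors’ exact implementation).

```python
import numpy as np

def corrected_pupil(pupil, gaze_x, gaze_y, luminance, motion, audio_rms, n_bins=5):
    """Two-step pupil correction:
    (1) z-score pupil within spatial gaze bins to remove gaze-position artifacts;
    (2) regress out luminance, motion energy, and audio RMS; return residuals.
    All inputs are 1-D arrays sampled at the same rate.
    """
    pupil = np.asarray(pupil, dtype=float)
    # Step 1: assign each sample to an (x, y) bin and z-score within that bin.
    x_bin = np.digitize(gaze_x, np.linspace(min(gaze_x), max(gaze_x), n_bins + 1)[1:-1])
    y_bin = np.digitize(gaze_y, np.linspace(min(gaze_y), max(gaze_y), n_bins + 1)[1:-1])
    z = np.empty_like(pupil)
    for b in set(zip(x_bin, y_bin)):
        idx = (x_bin == b[0]) & (y_bin == b[1])
        mu, sd = pupil[idx].mean(), pupil[idx].std()
        z[idx] = (pupil[idx] - mu) / sd if sd > 0 else 0.0
    # Step 2: OLS regression on sensory regressors; residuals = corrected pupil.
    X = np.column_stack([np.ones_like(z), luminance, motion, audio_rms])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return z - X @ beta
```

By construction, the returned residuals are orthogonal to the luminance, motion, and audio regressors, so remaining pupil variance can be attributed to non-sensory (e.g., cognitive) sources.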

Interpretation of behavioral and eye-tracking measures

To clarify the link between our quantitative measures and the underlying cognitive processes, we specified the following interpretive framework.
Free recall (Study 1): The number of recalled events (“remember score”) was interpreted as a direct behavioral index of successful event segmentation. The associated detail scores were used to measure the qualitative richness of the resulting episodic memory traces.
Intersubject correlation (ISC) (Study 1): ISC quantifies shared attentional engagement. We interpret lower ISC at narrative event boundaries as a marker of divergent processing, reflecting a failure of the collective attentional reorientation necessary to update event models. This provides a group-level, eye-tracking-based index of impaired event segmentation.
Hidden Markov models (HMM) (Study 1): The HMM analysis models an individual’s latent event states. We interpret a higher number of inferred states (optimal K) as evidence for more fragmented and unstable internal event models during viewing. This provides an individualized quantification of segmentation coherence, where higher fragmentation indicates disruption.
Recognition memory (Study 2): We used the remember/know paradigm to distinguish recollection from familiarity. We interpret recollection (“remember” responses) as the retrieval of contextual details, a process reliant on high-quality event segmentation during encoding, and familiarity (“know” responses) as a context-free sense of prior exposure, which we predicted would be less sensitive to segmentation quality.

Statistical analyses

All statistical analyses were performed using Python and JASP. Group-level differences in memory performance and eye-tracking metrics were assessed using one-way ANOVAs for each primary outcome measure, with Group as the between-subjects factor. To control for Type I error due to multiple comparisons, Holm-Bonferroni corrections were applied to the omnibus ANOVA p-values, and significant omnibus results were followed by Tukey’s HSD post hoc comparisons, providing multiplicity-adjusted confidence intervals and effect sizes. Effect sizes were reported as partial eta squared (ηp²) for ANOVA results, and Hedges’ g or Cohen’s d with 95% confidence intervals for pairwise comparisons. For individual difference analyses, we first calculated within-group correlations. To evaluate more formally whether these correlations differed across groups, we then fitted an ordinary least squares (OLS) regression model that included the short-video use score, Group, and their interaction terms, with the Short-R group as the reference category; this allowed an evaluation of the differential effect of short-video use scores across the other groups. The model’s fit was assessed using F statistics, adjusted R², and regression coefficients with standard errors and confidence intervals. This combination of ANOVA and regression models controlled for Type I error while evaluating multiple factors and their interactions within a single framework.
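The interaction model can be sketched with a hand-built design matrix; this illustrative version (the study itself used Python/JASP tooling) assumes hypothetical group labels and treats the reference group as the baseline via dummy coding.

```python
import numpy as np

def interaction_ols(score, group, y, reference="Short-R"):
    """OLS of y on score, group dummies, and score x group interactions,
    with `reference` as the baseline group. Returns a coefficient dict."""
    score = np.asarray(score, dtype=float)
    groups = [g for g in dict.fromkeys(group) if g != reference]
    cols, names = [np.ones(len(y)), score], ["intercept", "score"]
    for g in groups:
        d = np.asarray([1.0 if gi == g else 0.0 for gi in group])
        cols += [d, d * score]                       # main effect + interaction
        names += [f"group[{g}]", f"score:group[{g}]"]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return dict(zip(names, beta))
```

In this coding, `score` is the slope within the reference group, and each `score:group[...]` coefficient tests whether another group’s slope differs from it, which is the quantity of interest when asking whether the score-memory correlation varies across groups.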

Acknowledgements

W.L. was supported by the National Natural Science Foundation of China (grant No. 32300879 and No. W2421004), Humanities and Social Sciences Fund, Ministry of Education (grant No. 22YJCZH109), and Fundamental Research Funds for the Central Universities (CCNU25ai035). X.H. was supported by the Knowledge Innovation Program of Wuhan-Shuguang Project (2023020201020384).

Author contributions

W.L. and X.H. conceived the study; J.S.L. and H.X.L. collected the data; W.L. and J.S.L. analyzed the data; W.L. and H.X.L. prepared the first draft. W.L. and X.H. reviewed and edited the manuscript, provided supervision, and obtained funding.

Data availability

Processed behavioral and eye-tracking data have been made available on the OSF platform (https://osf.io/n7k45/?view_only=673d89b23720410799b5daec1f5d511e).

Code availability

The analytical scripts and models used in the study are available in the same repository where the data is hosted (https://osf.io/n7k45/?view_only=673d89b23720410799b5daec1f5d511e).

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Xin Hao, Email: psyhaoxin@ccnu.edu.cn.

Wei Liu, Email: weiliu1991@ccnu.edu.cn.

References

1. Liao, M. Analysis of the causes, psychological mechanisms, and coping strategies of short video addiction in China. Front. Psychol. 15, 1391204 (2024).
2. Feng, J. et al. Identifying fragmented reading and evaluating its influence on cognition based on single trial electroencephalogram. Front. Hum. Neurosci. 15, 753735 (2021).
3. Chen, Y., Li, M., Guo, F. & Wang, X. The effect of short-form video addiction on users’ attention. Behav. Inf. Technol. 42, 2893–2910 (2023).
4. Chiossi, F., Haliburton, L., Ou, C., Butz, A. M. & Schmidt, A. Short-form videos degrade our capacity to retain intentions: effect of context switching on prospective memory. In Proc. 2023 CHI Conference on Human Factors in Computing Systems 1–15 (ACM, 2023). 10.1145/3544548.3580778.
5. Yang, Y. et al. Time distortion for short-form video users. Comput. Hum. Behav. 151, 108009 (2024).
6. Joiner, R. et al. The effect of different types of TikTok dance challenge videos on young women’s body satisfaction. Comput. Hum. Behav. 147, 107856 (2023).
7. Kurby, C. A. & Zacks, J. M. Segmentation in the perception and memory of events. Trends Cogn. Sci. 12, 72–79 (2008).
8. Zacks, J. M. & Tversky, B. Event structure in perception and conception. Psychol. Bull. 127, 3–21 (2001).
9. Zacks, J. M. Event perception and memory. Annu. Rev. Psychol. 71, 165–191 (2020).
10. Hasson, U., Furman, O., Clark, D., Dudai, Y. & Davachi, L. Enhanced intersubject correlations during movie viewing correlate with successful episodic encoding. Neuron 57, 452–462 (2008).
11. Baldassano, C. et al. Discovering event structure in continuous narrative perception and memory. Neuron 95, 709–721.e5 (2017).
12. Chen, J. et al. Shared memories reveal shared structure in neural activity across individuals. Nat. Neurosci. 20, 115–125 (2017).
13. Antony, J. W. et al. Behavioral, physiological, and neural signatures of surprise during naturalistic sports viewing. Neuron 109, 377–390.e7 (2021).
14. Pérez, P. et al. Conscious processing of narrative stimuli synchronizes heart rate between individuals. Cell Rep. 36, 109692 (2021).
15. Liu, W., Shi, Y., Cousins, J. N., Kohn, N. & Fernández, G. Hippocampal-medial prefrontal event segmentation and integration contribute to episodic memory formation. Cereb. Cortex 32, 949–969 (2022).
16. Cohn-Sheehy, B. I. et al. The hippocampus constructs narrative memories across distant events. Curr. Biol. 31, 4935–4945.e7 (2021).
17. Silva, M., Baldassano, C. & Fuentemilla, L. Rapid memory reactivation at movie event boundaries promotes episodic encoding. J. Neurosci. 39, 8538–8548 (2019).
18. Reagh, Z. M., Delarazan, A. I., Garber, A. & Ranganath, C. Aging alters neural activity at event boundaries in the hippocampus and Posterior Medial network. Nat. Commun. 11, 3980 (2020).
19. Polyn, S. M., Norman, K. A. & Kahana, M. J. Task context and organization in free recall. Neuropsychologia 47, 2158–2163 (2009).
20. Heusser, A. C., Ezzyat, Y., Shiff, I. & Davachi, L. Perceptual boundaries cause mnemonic trade-offs between local boundary processing and across-trial associative binding. J. Exp. Psychol. Learn. Mem. Cogn. 44, 1075–1090 (2018).
21. Radvansky, G. A. & Zacks, J. M. Event boundaries in memory and cognition. Curr. Opin. Behav. Sci. 17, 133–140 (2017).
22. Radvansky, G. A. & Zacks, J. M. Event perception. WIREs Cogn. Sci. 2, 608–620 (2011).
23. Zhang, X., Wu, Y. & Liu, S. Exploring short-form video application addiction: socio-technical and attachment perspectives. Telemat. Inform. 42, 101243 (2019).
24. Bobadilla, J., Ortega, F., Hernando, A. & Gutiérrez, A. Recommender systems survey. Knowl. Based Syst. 46, 109–132 (2013).
25. Su, C., Zhou, H., Wang, C., Geng, F. & Hu, Y. Individualized video recommendation modulates functional connectivity between large scale networks. Hum. Brain Mapp. 42, 5288–5299 (2021).
26. Su, C. et al. Viewing personalized video clips recommended by TikTok activates default mode network and ventral tegmental area. Neuroimage 237, 118136 (2021).
27. Montag, C., Marciano, L., Schulz, P. J. & Becker, B. Unlocking the brain secrets of social media through neuroscience. Trends Cogn. Sci. 27, 1102–1104 (2023).
28. Yao, N., Chen, J., Huang, S., Montag, C. & Elhai, J. D. Depression and social anxiety in relation to problematic TikTok use severity: the mediating role of boredom proneness and distress intolerance. Comput. Hum. Behav. 145, 107751 (2023).
29. McCashin, D. & Murphy, C. M. Using TikTok for public and youth mental health – a systematic review and content analysis. Clin. Child Psychol. Psychiatry 28, 279–306 (2023).
30. Chao, M., Lei, J., He, R., Jiang, Y. & Yang, H. TikTok use and psychosocial factors among adolescents: comparisons of non-users, moderate users, and addictive users. Psychiatry Res. 325, 115247 (2023).
31. Qu, D. et al. The longitudinal relationships between short video addiction and depressive symptoms: a cross-lagged panel network analysis. Comput. Hum. Behav. 152, 108059 (2024).
32. Parry, D. A. et al. A systematic review and meta-analysis of discrepancies between logged and self-reported digital media use. Nat. Hum. Behav. 5, 1535–1547 (2021).
33. Sonkusare, S., Breakspear, M. & Guo, C. Naturalistic stimuli in neuroscience: critically acclaimed. Trends Cogn. Sci. 23, 699–714 (2019).
34. Chun, M. M. & Turk-Browne, N. B. Interactions between attention and memory. Curr. Opin. Neurobiol. 17, 177–184 (2007).
35. Lee, C. S., Aly, M. & Baldassano, C. Anticipation of temporally structured events in the brain. Elife 10, e64972 (2021).
36. Geerligs, L., van Gerven, M. & Güçlü, U. Detecting neural state transitions underlying event segmentation. Neuroimage 236, 118085 (2021).
37. Yates, T. S. et al. Neural event segmentation of continuous experience in human infants. Proc. Natl. Acad. Sci. 119, e2200257119 (2022).
38. Zacks, J. M. et al. Human brain activity time-locked to perceptual event boundaries. Nat. Neurosci. 4, 651–655 (2001).
39. Ben-Yakov, A. & Henson, R. N. The hippocampal film editor: sensitivity and specificity to event boundaries in continuous experience. J. Neurosci. 38, 10057–10068 (2018).
40. Hahamy, A., Dubossarsky, H. & Behrens, T. E. J. The human brain reactivates context-specific past information at event boundaries of naturalistic experiences. Nat. Neurosci. (2023). 10.1038/s41593-023-01331-6.
41. Li, J., Chen, Z., Hao, X. & Liu, W. Boundaries in the eyes: measure event segmentation during naturalistic video watching using eye tracking. Behav. Res. Methods 57, 255 (2025).
42. Ryan, J. D. & Shen, K. The eyes are a window into memory. Curr. Opin. Behav. Sci. 32, 1–6 (2020).
43. Kragel, J. E. & Voss, J. L. Looking for the neural basis of memory. Trends Cogn. Sci. 26, 53–65 (2022).
44. Clewett, D., Gasser, C. & Davachi, L. Pupil-linked arousal signals track the temporal organization of events in memory. Nat. Commun. 11, 4007 (2020).
45. Eddy, S. R. What is a hidden Markov model? Nat. Biotechnol. 22, 1315–1316 (2004).
46. Nastase, S. A., Gazzola, V., Hasson, U. & Keysers, C. Measuring shared responses across subjects using intersubject correlation. Soc. Cogn. Affect. Neurosci. (2019). 10.1093/scan/nsz037.
47. Madsen, J., Júlio, S. U., Gucik, P. J., Steinberg, R. & Parra, L. C. Synchronized eye movements predict test scores in online video education. Proc. Natl. Acad. Sci. 118, e2016980118 (2021).
48. Coutanche, M. N., Koch, G. E. & Paulus, J. P. Influences on memory for naturalistic visual episodes: sleep, familiarity, and traits differentially affect forms of recall. Learn. Mem. 27, 284–291 (2020).
49. Sparrow, B., Liu, J. & Wegner, D. M. Google effects on memory: cognitive consequences of having information at our fingertips. Science 333, 776–778 (2011).
50. Holleman, G. A., Hooge, I. T. C., Kemner, C. & Hessels, R. S. The ‘real-world approach’ and its problems: a critique of the term ecological validity. Front. Psychol. 11, 529490 (2020).
51. Liu, W., Guo, J. & Li, H. Using artworks to understand human memory and its neural mechanisms. N. Ideas Psychol. 74, 101095 (2024).
52. Zacks, J. M., Speer, N. K., Swallow, K. M., Braver, T. S. & Reynolds, J. R. Event perception: a mind-brain perspective. Psychol. Bull. 133, 273–293 (2007).
53. Tambini, A., Rimmele, U., Phelps, E. A. & Davachi, L. Emotional brain states carry over and enhance future memory formation. Nat. Neurosci. 20, 271–278 (2017).
54. Amalric, M., Roveyaz, P. & Dehaene, S. Evaluating the impact of short educational videos on the cortical networks for mathematics. Proc. Natl. Acad. Sci. 120, e2213430120 (2023).
55. Rock, P. L., Roiser, J. P., Riedel, W. J. & Blackwell, A. D. Cognitive impairment in depression: a systematic review and meta-analysis. Psychol. Med. 44, 2029–2040 (2014).
56. James, T. A. et al. Depression and episodic memory across the adult lifespan: a meta-analytic review. Psychol. Bull. 147, 1184–1214 (2021).
57. Rajaram, S. Remembering and knowing: two means of access to the personal past. Mem. Cogn. 21, 89–102 (1993).
58. Černja, I., Vejmelka, L. & Rajter, M. Internet addiction test: Croatian preliminary study. BMC Psychiatry 19, 388 (2019).
59. Widyanto, L. & McMurran, M. The psychometric properties of the internet addiction test. CyberPsychology Behav. 7, 443–450 (2004).
60. Hasson, U., Nir, Y., Levy, I., Fuhrmann, G. & Malach, R. Intersubject synchronization of cortical activity during natural vision. Science 303, 1634–1640 (2004).
