Abstract
Remembering the temporal dynamics of past experiences helps people plan for the future. Previous studies using discrete pictorial stimuli showed that people are better at remembering the temporal order of items occurring within the same perceptual context than items spanning across a contextual boundary, suggesting that event segmentation can structure temporal order memory by resetting item-level binding mechanisms. However, in meaningful everyday scenarios, other mechanisms may play equal or greater roles: Two potentially powerful candidates are hierarchical event structure and knowledge about typical temporal order. In a pair of experiments testing order memory with both short (2.5-minute) and longer (20-minute) delays, we presented narratives that described everyday activities with semantic constraints on order for fine-grained actions or for coarse-grained activity units. Constraints on either level improved order memory, at both delays. In some cases, this reversed the typical finding that temporal order memory within events is better than across events. An additional experiment revealed that serial recall was chunked based on coarse-level event membership and that semantic order constraints helped organize recall order. A final experiment showed that even in the absence of semantic constraints on coarse-grained activity, participants could use episodic memory for coarse-grained order to constrain memory for fine-grained order, given accurate source memory. Collectively, these results provide evidence for important roles played by hierarchical event structure and prior knowledge in scaffolding reconstructive memory, indicating that reconstruction processes use multiple sources of information in addition to simple episodic associations between fine-grained units.
Keywords: episodic memory, temporal order memory, associative memory, event segmentation, event schema
Memory for time is a fundamental aspect of human cognition. As people go through everyday life, it is adaptive to have a chronological sense of the past: Remembering the temporal order of past events allows people to communicate experiences, to anticipate the structure of future activities, and to plan their actions accordingly. For instance, dancers need to remember the sequence of dance choreography to perform on the stage, and travel writers need to reflect on the order of their sightseeing when drafting a travel report. In forensic and medical settings, accurately recounting the progression of events is crucial for making correct inferences and resolving conflicts; for example, a witness’s memory for whether a suspect left a building before or after a bomb went off could be critical to solving a crime.
Consider an example: One morning, you cleaned up in the kitchen after breakfast and then immediately drove to work. Prior findings from lab-based paradigms have provided evidence that experiences within the same event form stronger temporal associations in long-term memory than those separated by contextual changes (DuBrow & Davachi, 2013; Ezzyat & Davachi, 2011; Heusser et al., 2018). This would seem to imply that remembering the order of two instances happening within the same “cleaning up” event (e.g., whether you wiped the table first or dried a dish first) should always be easier than retrieving the order of two instances separated by the change in larger events (e.g., whether you wiped the table first or adjusted the car’s air conditioning first). However, it seems plausible that the second memory problem might actually be the easier of the two. This could be because one might remember the temporal order of the larger events of cleaning and driving, and be able to use them to answer the question about the sub-events of wiping the table and adjusting the car’s air conditioning. Moreover, if one has semantic knowledge regarding the typical order of these events on a larger or smaller temporal scale, this can also be used to reconstruct the order. In this example, you might know that one typically cleans up before driving to work on the larger event scale. In another instance, you might be able to deploy knowledge that one typically washes dishes before drying them, which is order information on the smaller sub-event scale. In sum, people may use hierarchical organization, as well as semantic knowledge about order on coarser or finer levels in a hierarchical event structure, to scaffold memory for temporal order. Our aim with this research was to integrate these factors into accounts of temporal memory, to better extend them to everyday experiences.
Perceived Event Structure Influences Memory Organization
People spontaneously segment continuous experience into distinct events (Zacks et al., 2007; Shin & DuBrow, 2021). There is substantial behavioral and neuroimaging evidence suggesting that people agree on when transitions between events in everyday scenarios occur, and that they are sensitive to hierarchical structures in events (Zacks & Tversky, 2001; Zacks et al., 2006; Sasmita & Swallow, 2023; Baldassano et al., 2017; Geerligs et al., 2022). Event perception shows evidence of partonomic hierarchy—smaller event units grouping into larger ones—and this hierarchical organization has been shown to increase as participants became more familiar with an event sequence (Hard et al., 2006; Zacks, 2020). In both reading and film viewing, it has been demonstrated that people are able to track multiple dimensions in the story, including characters, spatial locations, goals, and their causal and temporal relations; these dimensions and relations inform how comprehenders segment and update their model for the current event at adaptive moments (Gernsbacher, 1991; Zwaan & Radvansky, 1998; Zacks et al., 2009). For example, when a reader encounters a sentence in a story saying that the main character goes from their home to school, the change in spatial location may suggest that the reader needs to update their previous event model about home activities to accommodate new events that happen at the school. One behavioral signature of event model updating is that reading time increases when the reader of a narrative encounters an event boundary (Zacks et al., 2009), and this can be partially explained by modeling effects of situational changes, including changes in space (Radvansky et al., 2001; Zwaan & Radvansky, 1998), goal (Radvansky et al., 2001), and character (McNerney et al., 2011).
Event structure influences how events are encoded, stored, and later reconstructed from long-term memory (Radvansky, 2012; Radvansky & Zacks, 2017; Rait & Hutchinson, 2024; Rubin & Umanath, 2015; Zacks, 2020). In one study conducted by Ezzyat and Davachi (2011), participants read narratives that contained event boundary sentences, operationalized as describing temporal shifts such as “an hour later.” When participants were cued with a sentence from the narrative and were asked to recall the next sentence, cued recall performance was worse if an event boundary separated the cue and the to-be-recalled sentence. This effect was later replicated in both younger and older adults (Davis & Campbell, 2023), and it suggests that event boundaries help discretize complex experience into distinct episodes in long-term memory, reducing interference across different event models during retrieval.
Contextual Boundaries and Temporal Order Memory Impairment
Consistent with the finding that boundaries impair cued-recall performance, many studies have reported that contextual boundaries disrupt people’s ability to remember the temporal order among items. Unlike previous studies using narratives as stimuli, these tasks (known collectively as the “Ezzyat-Dubrow-Davachi Event Memory paradigm,” or the “EDD paradigm,” see Buonomano et al., 2023) aim to simplify naturalistic event structure (Heusser et al., 2018). During encoding, participants are presented with a series of discrete pictures, and the “context” (for example, background color) in which these items appear changes periodically to create perceptible boundaries. During retrieval, participants are given two pictures and are asked to judge their relative recency (“Which of these two stimuli were seen first?”), as well as to rate the subjective temporal distance between them during encoding (“How far apart in time were the two stimuli presented?”). A consistent finding is that people are less able to remember the order of the two probed pictures when the pictures are encoded in different contexts compared to when they are encoded in the same context. This effect holds true when contextual changes are operationalized through a change in background color (Heusser et al., 2018; Pu et al., 2022), stimulus category (DuBrow & Davachi, 2013, 2014; Sols et al., 2017), spatial location (Horner et al., 2016; Gurguryan et al., 2021), background sound (Clewett et al., 2020, 2025; McClay et al., 2023), encoding task (Wang & Egner, 2022), goal state (Cowan et al., 2024), or reward structure (Rouhani et al., 2020). 
This boundary-related disruption in temporal order memory is often accompanied by an inflation in temporal distance judgment, meaning that participants are more likely to rate two stimuli separated by an event boundary as temporally further away from each other than two stimuli that occurred within an event, despite actual equal temporal lags (Clewett et al., 2020; Ezzyat & Davachi, 2014; Wen & Egner, 2022).
Together, these findings have been interpreted in terms of associative mechanisms that emphasize episodic associations formed on a single temporal scale (DuBrow & Davachi, 2013; Heusser et al., 2018). One type of associative mechanism is the chaining of direct item-item associations (Lewandowsky & Murdock, 1989; Murdock, 1983). According to chaining theories, memory for the serial order of items is supported by encoding and retrieving the pairwise associations between sequential items (Lewandowsky & Murdock, 1989). This predicts that event boundaries induced by contextual changes will disrupt the formation of associative links between items during encoding, thereby making recency information across events more difficult to retrieve (Heusser et al., 2018). Another associative mechanism, which is more frequently discussed in the event cognition literature (DuBrow et al., 2017), is the retrieval of context associated with items at encoding; this forms the basis of the temporal context model (TCM) and the related context maintenance and retrieval (CMR) model. These models emphasize an indirect temporal linking mechanism among items through their shared, gradually drifting temporal context (Howard & Kahana, 2002; Polyn et al., 2009). According to this account, event boundaries may cause an abrupt shift in the slowly drifting temporal context representation, which makes it easier to retrieve the temporal association among items within the same event than across different events (Clewett & Davachi, 2017; DuBrow et al., 2017; DuBrow & Davachi, 2013).
Multiple Processes May Contribute to Real-life Temporal Order Memory, Including Hierarchical Structure and Semantic Knowledge
However, consider again the example of wiping the table and later adjusting the air conditioning in the car. The richer structure and available knowledge in this situation illustrate a challenge for theories: Can explanations of how the mind and brain remember the order of “nano-events” in artificial materials handle the structure inherent in real-life scenarios? Recent studies that attempt to generalize the canonical temporal order memory finding to more ecologically valid settings have posed challenges to these accounts, particularly when the stimuli have meaningful hierarchical structures. Wen and Egner (2022) noticed one key difference between the stimuli used in the EDD paradigm and real life events: Items (e.g., a picture of a ball) are frequently orthogonalized with their encoding context (e.g., a purple background) in the EDD paradigm, whereas in real life people sometimes have knowledge about the hierarchical relationship between finer-grained activities (e.g., wiping the table) and the coarser-grained events they belong to (e.g., cleaning up after breakfast) (Zacks & Tversky, 2001). Using a modified version of the EDD paradigm, they found that if the encoding context of items was salient enough and available during retrieval, the canonical result of boundary-induced impairment could be flipped: Recency judgments for items spanning across two events would be more accurate than items within the same event, and this effect co-occurred with an inflated temporal distance rating for across-event items. In addition, Yousif and colleagues (2024) found that when asking people to reconstruct the timing of events from popular multi-season TV shows on a timeline, they were more likely to confuse the order of events within a meaningful larger grouping unit (i.e., season) than across different seasons. 
These findings clearly contradict the pattern predicted by previously discussed frameworks centered on item-level associative mechanisms (Clewett & Davachi, 2017; Howard & Kahana, 2002; Lewandowsky & Murdock, 1989; Polyn et al., 2009) and suggest hierarchical event structure may play important roles in real-life memory of temporal order. However, the exact mechanism to reconcile these views has not been specified.
In addition, as Friedman (1993, 2004) argued, memory for the time of autobiographical experiences can be primarily viewed as a reconstructive process. Prior studies have revealed that, apart from drawing information from episodic memory, there are multiple additional processes that may contribute to this reconstruction: Rememberers may utilize semantic knowledge about recurring temporal sequences to reconstruct episodic temporal order (Bower et al., 1979; Lichtenstein & Brewer, 1980). They may engage in causal reasoning to infer which potential temporal orders are likely (Lehn et al., 2009; St. Jacques et al., 2008). They may perform mental simulations using retrieved episodic information, thereby filling in missing information (Strickland & Keil, 2011). They may use chronological event order to infer narrated event order (Xu & Kwok, 2019). Mechanisms involving these processes have largely been overlooked by studies using the EDD paradigm, because the order of both item and context changes in these paradigms is arbitrary, and the relationship between items and contexts is often orthogonalized. This prevents participants from using existing semantic knowledge to facilitate remembering the temporal order of the information, from reasoning about potential causal interactions, or from simulating likely outcomes.
To address this gap in understanding the mechanisms of temporal memory, here we investigate comprehenders’ ability to use semantic knowledge and the hierarchical structure of events to help with order reconstruction. There is strong evidence that people have semantic knowledge about stereotyped event sequences in long-term memory, which are referred to as “scripts” (Abelson, 1981) or “event schemas” (Mandler, 1984; Rumelhart, 1980). Key aspects of script knowledge include information about hierarchical organization—which fine-level events are likely to happen during a coarse-level event—and the typical order of events at a given level (Mandler, 1984). When people describe what typically happens during familiar activities (e.g., eating in a restaurant), they have good agreement on the fine-level events that constitute the coarse-level event (e.g., entering, ordering, eating, etc.), as well as specific characters (e.g., customer, waiter, cashier, etc.) and actions (e.g., pick up menu, look at menu, etc.) (Bower et al., 1979). Within a script’s hierarchical structure, it may specify constraints on temporal order at a given level, but the degree to which scripts do so varies across activities (Abelson, 1981; McRae et al., 2021). At one end of the spectrum, some scripts have strong constraints on the ordering of their constituent events. These constraints can result from both causal relations (for example, eating a meal can only happen after ordering the meal) and socio-cultural conventions (for example, in western culture, the main course is typically served after the appetizer). At the other end of the spectrum, some scripts have constituent activities that are generally agreed-upon, but with weak constraints on the way they are sequenced (for example, an event such as cleaning a room contains typical subevents like vacuuming the floor and cleaning the table, but they can happen in any order). 
Together, hierarchical organization and typical event order may both serve as important sources of information for how people encode and reconstruct temporal dynamics in everyday life.
Current Study and Hypotheses
In the current study, we propose that in scenarios that involve both meaningful hierarchical event structure and semantic knowledge about typical event order, processes in addition to boundary-induced contextual change may influence the reconstruction of order memory. We created narrative stimuli about everyday activities with a two-level hierarchical structure, which paralleled the stimulus structure in the well-established EDD paradigm. This is illustrated in Figure 1: Each narrative had different coarse-level events, and each coarse-level event consisted of different fine-level events. Instead of pairing items randomly with contexts (e.g., a picture of a ball shown on a purple background) as a new episodic association to learn, as in the EDD paradigm, in the current stimuli the part-whole relationship between fine-level and coarse-level events was designed to rely on participants’ existing knowledge. For example, the fine-level event peeling some potatoes belongs to the coarse-level event help with cooking dinner in the kitchen. In addition, instead of having completely random ordering among “items” and “contexts,” the narratives were constructed to have strong order constraints at one of the two levels: In Coarse-level Semantic (CS) narratives, there were strong semantic order constraints only among coarse-level events, but there were no semantic order constraints among each set of fine-level events within each coarse-level event. For example, in the “visiting aunt” CS narrative (see Figure 1A), most people would agree that the protagonist would likely perform the first three coarse-level steps in this order: prepare at home, then drive the car to his aunt’s house, and then get greeted by his aunt in her living room. In contrast, the fine-level events within the preparing at home coarse-level event, such as taking out hoodie and checking the address, could plausibly occur in any order.
Fine-level Semantic (FS) narratives were written to have a structure that was the complement of CS narratives: There were strong semantic order constraints on each set of fine-level events within each coarse-level event, but there were no semantic order constraints among coarse-level events. For example, in the “visiting the zoo” FS narrative (see Figure 1B), most people would agree that the coarse-level events of getting a free souvenir tattoo and watching a sea lion show could plausibly occur in any order. But within the visit the snack cart coarse-level event, most people would agree that the protagonist will first wait in the line, then tell the owner what he wants, and then pay and get food. We conducted a norming study to verify people’s agreement on the existence and absence of semantic order constraints in both fine-level and coarse-level event sets we used for stimuli development. Details of the norming study are reported in Supplemental Materials.
Figure 1. Example Stimuli Used in Experiments 1-3.

Note. The two types of narrative stimuli. Each narrative had 5 coarse-level event labels and 27 sentences, including an opening sentence, an ending sentence, and 25 fine-level event sentences in between. Panel A. Coarse-level Semantic (CS) Narrative: There were semantic order constraints on the order of the five coarse-level event labels, but there were no semantic order constraints among the five fine-level event sentences within each coarse-level event. Panel B. Fine-level Semantic (FS) Narrative: There were no semantic order constraints on the order of the five coarse-level event labels, but there were semantic order constraints on the five fine-level event sentences within each coarse-level event. Black arrows in the figure indicate the location and the direction of semantic order constraints. Sentence pairs highlighted in orange indicate across-event pairs, and sentence pairs highlighted in blue indicate sample within-event pairs that were probed in the test phase.
With the stimulus structure illustrated in Figure 1, several different processes could be used for reconstructing the order of two fine-level events at retrieval time, depending on (a) whether the two fine-level events occurred within the same coarse-level event, and (b) whether fine-level or coarse-level event order in the narrative was supported by semantic knowledge. These are illustrated in Figure 2. As was shown in the EDD paradigm, one direct route to judge the order of fine-level events at retrieval is to recover the episodic associative links among them (dotted arrows between the small ellipses in Figure 2). These links can be more reliable for fine-level events belonging to the same coarse-level event (i.e., within-event pairs), compared to fine-level events spanning across an event boundary (i.e., across-event pairs) (DuBrow & Davachi, 2013; Heusser et al., 2018). If this were the only mechanism, we should always observe better order memory for within-event pairs compared to across-event pairs. However, we suggest that an additional mechanism can be used for recovering the order of across-event pairs, especially when the relationship between fine- and coarse-level events is not orthogonalized: If participants have source memory about which coarse-level event each fine-level event belonged to, they can make a recency judgment by recovering associative links among coarse-level events (dotted arrows between the large ellipses in Figure 2). We hypothesized that this indirect route to recovering fine-grained temporal order would have a strong influence in meaningful narratives, similar to how explicitly reinstating encoding context at retrieval improves across-event pair recency judgment in Wen & Egner (2022). 
Thus, manipulating whether readers have previous knowledge about fine- or coarse-level event order in the narrative should determine whether within- or across-event judgment is more accurate: When semantic knowledge reinforces associative links among fine-level events (in FS narratives; solid arrows between the small ellipses in Figure 2), recency judgment for within-event pairs should be more accurate than across-event pairs. In contrast, when semantic knowledge reinforces associative links among coarse-level events (in CS narrative; solid arrows between the large ellipses in Figure 2), recency judgment for across-event pairs should be more accurate than within-event pairs. To preview our results, we found strong evidence supporting the existence of this additional mechanism.
Figure 2. Schematic Diagram of Hypothesized Processes.

Note. Schematic diagram showing how different processes could be used to reconstruct the order of different types of pairs during retrieval. Both CS and FS narratives have a two-level hierarchical structure, with ellipses representing coarse-level events and circles representing fine-level events that belong to each coarse-level event. Circles in blue represent within-event pairs and circles in orange represent across-event pairs. Arrows represent different types of associative links, and arrows marked in red represent the associative links that can be used for recency judgment of a given pair during retrieval. Note that each narrative used in the experiment consisted of five coarse-level events, with five fine-level events nested within each coarse-level event. Here we show only three coarse-level events per narrative to save space.
These central hypotheses were tested in the first two experiments, with a relatively short delay between encoding and retrieval (about 2.5 minutes, Experiment 1a) and with a longer delay (about 20 minutes, Experiment 1b). In addition, we hypothesized that when reading each narrative, changes in coarse-level events would lead to coarse-level event model updating and induce typical boundary-related effects. These effects would include increased reading time for boundary sentences during encoding (Radvansky et al., 2001; Zwaan & Radvansky, 1998) and inflated temporal distance rating for fine-level event pairs spanning across event boundaries during retrieval (Clewett et al., 2020; Ezzyat & Davachi, 2014; Wen & Egner, 2022).
In line with the mechanisms proposed above, we further hypothesized that semantic order knowledge and coarse-level event membership would influence the temporal organization of events in long-term memory. This was tested in Experiment 2 using a serial recall task. Finally, in Experiment 3, we explicitly tested different components of the indirect route we proposed for across-event recency judgment, including source memory and coarse-level order memory. Taken together, these studies supplement previous theoretical accounts to present a comprehensive picture of how order memory of naturalistic events can be reconstructed using multiple sources of information, including hierarchical event structure and semantic order knowledge.
Experiments 1a and 1b
In Experiment 1, we tested the effect of event boundaries on temporal order memory in a narrative reading paradigm, where either coarse-level or fine-level semantic order knowledge could be utilized to facilitate temporal order reconstruction. We hypothesized that during encoding, people would perceive coarse-level event changes as event boundaries, which would lead to increased reading time when spatial changes occur (Radvansky et al., 2001; Zwaan & Radvansky, 1998). As a result of event model updating, during retrieval people would judge fine-level event pairs that spanned across event boundaries as farther apart in time than fine-level event pairs in the same coarse-level event, even though they were separated by the same number of sentences in between (Clewett et al., 2020; Ezzyat & Davachi, 2014). Critically, we hypothesized that the effect of event boundaries on recency judgment would depend on how semantic order knowledge could be utilized: In FS narratives, where fine-level events within the same coarse-level event were governed by semantic order constraints, it should be easier to discriminate the temporal order of two fine-level events from the same coarse-level event (FS_within pairs). However, in CS narratives, where coarse-level events were governed by semantic order constraints, it should be easier to discriminate the temporal order of two fine-level events from two different coarse-level events (CS_across pairs), which would reverse the canonical result from the EDD paradigm. We predicted that the recency judgment accuracy for these two conditions would be better than for the two conditions without semantic order knowledge facilitation (CS_within and FS_across pairs). In addition, we hypothesized that because event models are formed hierarchically during reading comprehension, the order of coarse-level events would serve as important information for recency judgment between fine-level events.
In the pilot experiment of Experiment 1a, we observed that recency judgment confidence was higher for FS_across pairs compared to CS_within pairs, despite the fact that recency judgment accuracy did not differ significantly between the two conditions. We therefore hypothesized that for the two conditions lacking semantic order knowledge facilitation, participants would have higher confidence when judging the order for FS_across pairs compared to CS_within pairs. This increased confidence would stem from the potential use of remembered coarse-level event order as an additional source of information.
We tested these hypotheses in two experiments with different delay intervals between the encoding and the test phase: In Experiment 1a, participants were tested about each narrative they read after a short distraction task (approximately 2.5 minutes), a delay similar to that used in previous studies using the EDD paradigm (DuBrow & Davachi, 2013; Heusser et al., 2018; Wen & Egner, 2022). However, in Experiment 1b, participants encoded all ten narratives at once and then received tests regarding these narratives in the order in which they were encoded. This created a natural delay of about 20 minutes between the encoding and retrieval of each narrative and filled the delay period with learning and testing on other stimuli. Thus, we sought to test whether the patterns of temporal order memory we observed would hold after more opportunities for interference and memory decay, which more closely resemble memory retention in the real world.
Another procedural difference between the two experiments was the modalities engaged during encoding. In Experiment 1a, participants read the narrative texts sentence by sentence on each screen in a self-paced format. But in Experiment 1b, in order to increase participants’ engagement during the longer encoding period, we additionally played audio accompanying each sentence during reading, and participants could only proceed to the next screen after the audio of each sentence finished playing. As a result, we only tested the hypotheses regarding reading time in Experiment 1a, because the length of audio for each sentence might constrain self-paced reading time in Experiment 1b. All the other hypotheses regarding recency judgment accuracy, recency judgment confidence, and temporal distance rating as described above were tested in both experiments.
Method
Transparency and Openness
The Institutional Review Board at Washington University in St. Louis approved all studies. We preregistered the design, hypotheses, and analysis plans of all four experiments reported in this manuscript on Open Science Framework (https://osf.io), except as explicitly noted. The preregistrations are available at the following links: Experiment 1a (https://osf.io/42d6p), Experiment 1b (https://osf.io/6j4k9), Experiment 2 (https://osf.io/8wxhf), and Experiment 3 (https://osf.io/nghwt). Data, materials, and code for all four experiments have been made publicly available at Open Science Framework (OSF) and can be accessed at https://osf.io/gm2c8/.
Participants
In Experiment 1a, we recruited 74 undergraduate students at Washington University in St. Louis to complete this online experiment through the university participant pool as partial fulfillment of course requirements. The mean age of the participants was 20.12 years (min = 18, max = 23, SD = 1.15). Fifty-two participants identified as female, 21 identified as male, and 1 identified as intersex, nonbinary, or other. More demographic information for all the experiments is reported in Supplemental Materials (Table S3). Informed consent was obtained from all participants prior to the start of data collection. We determined the sample size by performing a simulation-based, bootstrapped power analysis using a pilot sample (n = 25, 8 dropped based on the exclusion criteria, totaling n = 17) collected using the Washington University participant pool. We randomly sampled test data from the pilot sample with replacement to create 1000 new datasets with sample sizes ranging from 15 to 45 in steps of 5. We then ran a mixed effects logistic regression model to predict the accuracy of each recency judgment question (correct/incorrect) as a function of narrative type (CS/FS), pair type (within/across), and their interaction. We calculated the proportion of simulations at each sample size that yielded a difference estimate in the hypothesized direction and a p-value < 0.05 for the smallest effect we hypothesized (i.e., the effect of narrative type on within-event pairs recency judgment accuracy, FS_within > CS_within). The power analysis showed that we needed at least 30 participants to achieve a statistical power of 80%.
In Experiment 1b, we recruited 38 undergraduate students at Washington University in St. Louis to complete this online experiment through the university participant pool as partial fulfillment of course requirements. The mean age of the participants was 19.47 years (min = 18, max = 22, SD = 1.20). Twenty-six participants identified as female, and 12 identified as male. Informed consent was obtained from all participants prior to the start of data collection. We determined the sample size by performing a bootstrapped power analysis, as in Experiment 1a, using a pilot sample (n = 15, 5 dropped based on the exclusion criteria, totaling n = 10) collected using Prolific (https://www.prolific.com/). The power analysis showed that we needed at least 25 participants to achieve a statistical power of 90%. In both experiments, we recruited more participants than the power analysis indicated, because we anticipated excluding some participants based on the preregistered criteria.
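The resampling logic of the power analysis described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes that each pilot participant is summarized by a single difference score (accuracy for FS_within minus accuracy for CS_within) and that 1000 resamples are drawn at each candidate sample size. It also swaps in a z-approximation on the mean difference for the mixed-effects logistic regression used in the actual analysis. The function name `bootstrap_power` and the pilot values are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def bootstrap_power(pilot_diffs, sample_size, n_sims=1000, alpha=0.05, seed=0):
    """Share of bootstrap resamples with an effect in the hypothesized direction and p < alpha.

    ASSUMPTION: a z-test on per-participant difference scores stands in for
    the mixed-effects logistic regression reported in the text.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Resample pilot participants with replacement
        sample = [rng.choice(pilot_diffs) for _ in range(sample_size)]
        sd = stdev(sample)
        if sd == 0:
            continue  # test statistic undefined; counted as a miss
        z = mean(sample) / (sd / sample_size ** 0.5)
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        if z > 0 and p_value < alpha:
            hits += 1
    return hits / n_sims


# Invented difference scores, for illustration only (not the pilot data)
pilot_diffs = [0.25, 0.0, 0.5, 0.25, -0.25, 0.5, 0.0, 0.25, 0.75, 0.0]
for n in range(15, 50, 5):  # sample sizes 15, 20, ..., 45
    print(n, bootstrap_power(pilot_diffs, n))
```

The smallest sample size whose estimated power reaches the target (80% in Experiment 1a, 90% in Experiment 1b) would then be taken as the minimum sample size.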
Materials
The design of the stimuli is depicted in Figure 1. To construct the stimulus set, we wrote ten narratives about everyday activities. Each narrative consisted of 27 sentences, including one opening sentence at the beginning, one ending sentence at the end, and 25 fine-level event sentences in the middle. Each fine-level sentence was accompanied by a coarse-level event label that described its context. The label changed after every five fine-level event sentences, making up five coarse-level event labels in each narrative. To effectively induce event model updating at coarse-level event transitions, whenever a coarse-level event label changed to a new one, there was a spatial shift in the narrative (e.g., from “at home” to “in the car”).
The ten narratives were divided into two conditions: For Coarse-level Semantic (CS) narratives, all the coarse-level event labels in the narrative had a common-knowledge temporal order constraint (e.g., preparing at home, then driving the car to his aunt’s house, and then getting greeted in his aunt’s living room, etc.), whereas each set of fine-level events contained in each coarse-level event had no semantic order constraint (e.g., take out hoodie, check aunt’s address, make sure the gift is wrapped, etc.). For Fine-level Semantic (FS) narratives, all the coarse-level events had no semantic order constraint (e.g., get a souvenir tattoo, watch a sea lion show, visit a snack cart), whereas each set of fine-level events contained in each coarse-level event had a common-knowledge temporal order constraint (e.g., wait in the line, then order the food, and then pay with cash). In addition, we generated audio for each event sentence in all the narratives using an online AI-based text-to-speech tool, ElevenLabs (https://elevenlabs.io/).
The narrative stimuli used in this study are included in the “Stimuli” section of the Supplemental Materials. The themes of Coarse-level Semantic (CS) narratives included visiting an aunt (CS_1), going swimming (CS_2), doing a morning routine (CS_3), going to a cafeteria (CS_4), and going to a health examination (CS_5). The themes of Fine-level Semantic (FS) narratives included going shopping (FS_1), cleaning home (FS_2), going to a zoo (FS_3), visiting campus (FS_4), and visiting a farm (FS_5).
Procedure and Design
All the experiments reported in this manuscript were programmed using jsPsych (https://www.jspsych.org/7.3/) and were hosted online using Cognition (https://www.cognition.run/).
In Experiment 1a, each participant completed 10 task runs, each corresponding to one of the 10 narrative stimuli (5 CS narratives and 5 FS narratives). Each run consisted of an encoding phase, a delay phase, and a test phase.
In the encoding phase, participants were instructed to read each narrative sentence by sentence in a self-paced format. On each screen, they first saw a label describing the broader context of the current activity (“coarse-level event label,” e.g., “Prepare at home”). After 1 second, they read one sentence describing an activity (“fine-level event sentence,” e.g., “He took out his hoodie from the closet.”) below the coarse-level event label. They were instructed to click on the “Next” button to proceed to the next screen once they finished reading the current sentence. Figure 3A illustrates the encoding phase.
Figure 3. Schematic of Experimental Procedure.

Note. Panel A. Encoding phase in Experiments 1-3. Participants read each narrative sentence-by-sentence in a self-paced format. Fine-level event sentences changed on every screen, and coarse-level event labels changed every five screens. In Experiments 1b, 2, and 3, in addition to reading the contents on the screen, participants also heard audio of each fine-level event sentence. Panel B. Test phase in Experiments 1a and 1b. For each fine-level event sentence pair, participants performed a recency judgment task, rated their confidence in the recency judgment, and then rated the perceived temporal distance between the two sentences. Panel C. Test phase in Experiment 2. Participants performed a serial recall task for each of the narratives they read before the delay phase. Panel D. Test phase in Experiment 3. For each fine-level event sentence pair, participants performed a recency judgment task and rated their confidence. In the source memory task, participants selected the corresponding coarse-level event label for each fine-level event sentence being prompted. For each coarse-level event label pair, participants performed a recency judgment task and rated their confidence. Note that the narrative used for illustration here was one of the Coarse-level Semantic (CS) narratives used in the study. The presentation format was the same for Fine-level Semantic (FS) narratives in both encoding and test.
After finishing each encoding phase, participants entered a delay phase in which they solved 40 math questions for approximately 150 seconds. They were asked to choose whether a given math question (e.g., 3*5-7) produced an odd or even result, and feedback (“Correct!” or “Wrong!”) was given immediately after each answer. Participants were asked not to rush through the questions and to try their best to maximize accuracy.
After the delay phase, participants entered a test phase in which they were told to answer questions based on the story. Each test phase consisted of eight trials. On each trial, they were presented with two fine-level event sentences selected from the narrative, and were asked to (1) make a recency judgment (“Please select the event that occurred first in the story”), (2) indicate their confidence for the recency judgment (“On a scale of 1-100, what’s your confidence for the previous order judgment?”), and (3) give their rating of perceived temporal distance between these two sentences (“On a scale of 1-10, how far apart in time were the two events presented in the story?”). Each pair of fine-level sentences being probed was either a within-event pair (i.e., two sentences studied at the second and the fifth positions of the same coarse-level event), or an across-event pair (i.e., one sentence studied at the third position of a given coarse-level event, and another sentence studied at the first position in the next adjacent coarse-level event). Critically, both within-event sentence pairs and across-event sentence pairs were separated by two sentences during encoding. Depending on whether the narrative was a Coarse-level Semantic (CS) or a Fine-level Semantic (FS) narrative, a test pair could be one of four types: within-event pairs from Coarse-level Semantic narratives (CS_within), across-event pairs from Coarse-level Semantic narratives (CS_across), within-event pairs from Fine-level Semantic narratives (FS_within), or across-event pairs from Fine-level Semantic narratives (FS_across). (We conducted a separate semantic similarity analysis to make sure test pairs in different conditions did not differ significantly in terms of the semantic similarity between two sentences, in order to rule out semantic similarity as a confounding variable for recency judgment accuracy or distance rating. Details are reported in the Supplemental Materials.)
All the test questions were self-paced, presented one at a time. Figure 3B shows an illustration of the testing phase.
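The position rule for test pairs can be checked with a short sketch. It enumerates every pair that satisfies the stated positions in a narrative of 25 fine-level sentences (five per coarse-level event) and confirms that both pair types are separated by exactly two intervening sentences. The function names are hypothetical, and which eight of the candidate pairs were actually tested in each narrative is not specified here, so the sketch lists all candidates.

```python
SENTENCES_PER_EVENT = 5
N_EVENTS = 5


def serial_position(event, position):
    """1-based index of a sentence among the 25 fine-level sentences."""
    return (event - 1) * SENTENCES_PER_EVENT + position


def candidate_pairs():
    """All (pair_type, first_index, second_index) tuples matching the position rule."""
    pairs = []
    for event in range(1, N_EVENTS + 1):
        # Within-event pair: 2nd and 5th sentences of the same coarse-level event
        pairs.append(("within", serial_position(event, 2), serial_position(event, 5)))
        if event < N_EVENTS:
            # Across-event pair: 3rd sentence of this event, 1st sentence of the next
            pairs.append(("across", serial_position(event, 3), serial_position(event + 1, 1)))
    return pairs


for pair_type, first, second in candidate_pairs():
    intervening = second - first - 1
    print(pair_type, first, second, intervening)
```

Every candidate has two intervening sentences, so any difference between pair types cannot be attributed to the number of sentences between the probed items.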
In Experiment 1b, in order to create a 20-minute delay between encoding and test for each narrative, we presented all ten narratives in randomized order and then tested all narratives in the same order. As in Experiment 1a, participants were instructed to read each narrative sentence by sentence in a self-paced format. The on-screen display of fine-level event sentences and coarse-level event labels was the same as in Experiment 1a, except that participants also heard the audio of each fine-level event sentence as they read, and that the event label appeared at the same time as the event sentence in each trial. They could only click on the “Next” button to proceed to the next screen after the audio for the current sentence finished playing.
Another change from Experiment 1a was that participants answered two reading comprehension questions immediately after reading each narrative. This change was also made to increase participants’ engagement in the task, and we used accuracy for comprehension questions as an additional exclusion criterion. Each of these reading comprehension questions probed one specific detail in the narrative, and participants chose one of four options provided. Here is a sample reading comprehension question: “What did Jim notice outside of his car window?” (Correct option: “The trees turning yellow.”) To avoid interference with the later test block, we carefully constructed these questions so that they never probed contents in the sentences being tested in the recency judgment and distance rating tasks.
After encoding all ten narratives in the encoding block, participants entered the test block. Before entering each test phase, they received an instruction specifying the narrative being probed (e.g., “The following questions are for the story about Jim visiting his aunt.”). Each test phase tested the same eight fine-level event sentence pairs as in Experiment 1a. For each pair, they were asked to (1) make a recency judgment, (2) indicate their confidence for the recency judgment, and (3) give their rating of perceived temporal distance between these two sentences. Figure 3B shows an illustration of the testing phase.
Experiments 1a and 1b used a 2 (Narrative Type: CS vs. FS) × 2 (Pair Type: Within vs. Across) within-subject design; both narrative type and pair type were within-subject variables. The presentation order of the ten narratives (same as their test order) was randomized for each participant, and the order of event pairs being tested for a given narrative was randomized for each run.
Data Preparation
For Experiment 1a, the final sample included responses from 40 participants. Thirty-four participants were excluded from the data analysis based on the preregistered criteria: Two were excluded because they reported experiencing technical problems, 8 were excluded for having less than 75% accuracy for math questions during the delay phase, 14 for having response time greater than 40000 ms for more than five encoding or test trials, and 10 for having response time less than 300 ms for more than five encoding or test trials.
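The participant-level criteria for Experiment 1a can be expressed as a simple screening function. This is a sketch under stated assumptions: the field names are hypothetical, the counts of slow and fast trials are taken over encoding and test trials combined, and the order in which criteria are checked (relevant only when a participant meets several) is an assumption rather than something stated in the text.

```python
def exclusion_reason(participant, max_flagged_trials=5):
    """Return the first matching exclusion reason, or None if the participant is retained.

    ASSUMPTION: `participant` is a dict with hypothetical keys
    'reported_technical_problem', 'math_accuracy' (proportion correct),
    and 'trial_rts_ms' (response times for encoding and test trials).
    """
    if participant["reported_technical_problem"]:
        return "technical problem"
    if participant["math_accuracy"] < 0.75:
        return "math accuracy below 75%"
    # Count trials beyond each response-time bound
    slow = sum(rt > 40000 for rt in participant["trial_rts_ms"])
    fast = sum(rt < 300 for rt in participant["trial_rts_ms"])
    if slow > max_flagged_trials:
        return "more than five trials above 40000 ms"
    if fast > max_flagged_trials:
        return "more than five trials below 300 ms"
    return None


# Invented example: six very fast trials trigger the fast-response criterion
example = {
    "reported_technical_problem": False,
    "math_accuracy": 0.9,
    "trial_rts_ms": [2000] * 50 + [150] * 6,
}
print(exclusion_reason(example))
```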
For Experiment 1b, the final sample included responses from 27 participants. Eleven participants were excluded from the data analysis. Four exclusions were based on our preregistered criteria: One was excluded for reporting technical problems, and 3 for having response time greater than 40000 ms for more than five encoding or test trials. We adjusted our criterion for reading comprehension accuracy before performing hypothesis tests based on the observation that accuracy was lower than in the pilot sample, leading us to adopt a cutoff of 60% rather than 75% and leading to 4 exclusions. In a second deviation from our preregistration, for recency judgment accuracy, we had preregistered using 3 SDs below the mean as the cutoff; this excluded no participant, but we excluded 3 participants for having less-than-chance mean accuracy.
In addition, we excluded outlier trials based on the preregistered criteria. For encoding trials, we excluded all trials with response time less than 300 ms, or more than 3SDs above the mean response time of sentences that were not the opening or ending sentences of each narrative (0.2% of the data in Experiment 1a). For test trials, we excluded all trials that had response time less than 300 ms or greater than 3SDs above the mean reaction time for recency judgment trials (1.8% of the data in Experiment 1a, and 1.6% in Experiment 1b), recency judgment confidence trials (1.3% of the data in Experiment 1a, and 1.1% in Experiment 1b), and temporal distance rating trials (0.9% of the data in Experiment 1a, and 1.3% in Experiment 1b).
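The trial-level trimming rule can be sketched as below. The text does not state here whether the mean and SD were computed per participant or over the pooled data, or whether they were computed before or after removing very fast trials. This sketch assumes they are computed once over whatever list of response times is supplied, using the sample standard deviation. The function name is hypothetical.

```python
from statistics import mean, stdev


def trim_outlier_trials(rts_ms, floor_ms=300, n_sd=3):
    """Keep trials with floor_ms <= RT <= mean + n_sd * SD.

    ASSUMPTION: mean and SD are computed over all supplied trials before trimming.
    """
    upper = mean(rts_ms) + n_sd * stdev(rts_ms)
    return [rt for rt in rts_ms if floor_ms <= rt <= upper]


# Invented response times: one very fast and one very slow trial are removed
rts = [2000] * 20 + [100, 60000]
kept = trim_outlier_trials(rts)
print(len(rts), len(kept))
```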
Reading time data in Experiment 1a were log-transformed to correct for skewness. We excluded the reading time data for the opening and ending sentence for each narrative. In addition, for both Experiment 1a and 1b, we transformed recency judgment confidence (on a scale of 1-100) into a binary confidence group variable by coding confidence scores greater than 90 as in the “High Confidence” group, and coding confidence scores less than 90 as in the “Low Confidence” group. This was determined based on the distribution of confidence data in the pilot dataset, in which confidence = 100 was the mode that occurred in 40% of all confidence trials.
Analyses
We conducted data analyses using R. We estimated Linear Mixed-Effects Models using the lmer function, and Generalized Linear Mixed-Effects Models using the glmer function from the lme4 package (Bates et al., 2015). Unless otherwise specified, regression models in all experiments using “narrative type” and “pair type” as fixed effect predictors were effect coded (narrative type: CS = +1, FS = −1; pair type: across = +1, within = −1). For all the models reported across the four experiments, we initially fit a “maximal model” (Barr et al., 2013) that included random slope effects of all predictors. We removed one random effect at a time, and used a likelihood ratio test to compare the reduced model with the more complex model. We retained the most parsimonious model that did not differ significantly from the more complex models (Bates et al., 2018; Matuschek et al., 2017). For the fixed effects of interest that were not statistically significant, we calculated their Bayes factors using the lmBF function from the BayesFactor package (Morey & Rouder, 2012) by comparing the marginal likelihood of a model including the fixed effect of interest with that of a model excluding the effect. The comparison used the default Jeffreys-Zellner-Siow (JZS) priors on regression coefficients as specified by lmBF. We created plots from regression models using the sjPlot package (Lüdecke, 2021). In addition, we created the visualization of serial position curves and lag-CRP curves using the psifr package in Python (Morton, 2020).
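To make the effect coding concrete, the sketch below shows what the fixed-effect coefficients mean in a balanced 2 × 2 design when both predictors are coded +1/−1. It works on four cell values (which could be cell means, or log-odds in the logistic case) and ignores random effects, so it illustrates the coding scheme rather than reproducing the lmer or glmer fits. The cell values and function names are hypothetical.

```python
CODES = {"CS": 1, "FS": -1, "across": 1, "within": -1}


def effect_coded_coefficients(cells):
    """Coefficients of the saturated model for a 2 x 2 table of cell values.

    `cells` maps (narrative_type, pair_type) -> value. With +1/-1 coding the
    design columns are orthogonal, so each coefficient is a signed average.
    """
    n = len(cells)  # four cells
    intercept = sum(cells.values()) / n
    b_narrative = sum(CODES[a] * v for (a, b), v in cells.items()) / n
    b_pair = sum(CODES[b] * v for (a, b), v in cells.items()) / n
    b_interaction = sum(CODES[a] * CODES[b] * v for (a, b), v in cells.items()) / n
    return intercept, b_narrative, b_pair, b_interaction


def predict(coefs, narrative, pair):
    """Rebuild a cell value from the effect-coded coefficients."""
    b0, b1, b2, b3 = coefs
    x1, x2 = CODES[narrative], CODES[pair]
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


# Invented cell values showing a crossover pattern (not the reported data)
cells = {
    ("CS", "across"): 0.9, ("CS", "within"): 0.8,
    ("FS", "across"): 0.8, ("FS", "within"): 0.9,
}
coefs = effect_coded_coefficients(cells)
print(coefs)
```

Under this coding the intercept is the grand mean of the four cells, each main-effect coefficient is half the difference between the two levels of that factor, and a pure crossover pattern appears as an interaction coefficient with main-effect coefficients near zero.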
Results
Reading Time
Based on previous studies showing that reading time was longer at event boundaries (Radvansky et al., 2001; Zwaan & Radvansky, 1998), we predicted that reading time for boundary sentences (i.e., the first sentence in each coarse-level event) would be longer than for non-boundary sentences (i.e., the second to fifth sentences in each coarse-level event), when controlling for whether the sentence came from the first coarse-level event of each narrative, and controlling for narrative type (CS vs. FS). In addition, we predicted that in the first coarse-level event in a narrative, the extent to which boundary sentences required longer encoding time than other sentences would be larger than in other coarse-level events. The rationale behind this prediction was that as the narrative unfolded, readers should become more familiar with the narrative structure, such that the degree of prediction quality change induced by changes in coarse-level event should gradually decrease. In Experiment 1a, mean reading time for sentences during the encoding phase (excluding opening and ending sentences) was 2131 ms (SD = 3986 ms). Figure S2 in Supplemental Materials shows the average reading time of sentences at each position (1-25). To satisfy model assumptions, we log-transformed reading time before entering it into the regression model. We predicted the log-transformed reading time of each sentence using a linear mixed-effects model, with fixed effects of fine-level position type (boundary vs. non-boundary sentence), coarse-level position type (first coarse event vs. other coarse event), narrative type (CS vs. FS narrative), and the interaction between coarse-level position type and fine-level position type. We effect coded both predictors (fine-level position type: boundary = +1, non-boundary = −1; coarse-level position type: first coarse event = +1, other coarse event = −1).
After model selection, we retained the random slope of narrative type on participant and random intercept of narrative as random effects. In Wilkinson notation, the model can be described as follows: Log-transformed Reading Time ~ Fine-level Position Type + Coarse-level Position Type + Narrative Type + Fine-level Position Type:Coarse-level Position Type + (Narrative Type | Participant) + (1 | Narrative).
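The two position predictors in this model follow directly from a sentence's serial position among the 25 fine-level sentences. The sketch below derives them and applies the log transform. The natural logarithm is an assumption, since the base is not stated, and the function and key names are hypothetical.

```python
import math


def code_sentence(position, reading_time_ms, sentences_per_event=5):
    """Effect-coded predictors for one fine-level sentence (position is 1-25)."""
    is_boundary = (position - 1) % sentences_per_event == 0  # first sentence of a coarse event
    in_first_event = position <= sentences_per_event         # sentences 1-5
    return {
        "log_rt": math.log(reading_time_ms),  # ASSUMPTION: natural log
        "fine_position_type": 1 if is_boundary else -1,
        "coarse_position_type": 1 if in_first_event else -1,
    }


for pos in (1, 3, 6, 25):
    print(pos, code_sentence(pos, 2131))
```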
As shown in Figure 4A, reading times were longer for boundary sentences than non-boundary sentences, as predicted, leading to a main effect of fine-level position type, F(1, 9891.01) = 155.14, p < .001. Reading times were also longer for the first coarse-grained event, leading to a significant main effect of coarse-level position type, F(1, 9891.03) = 116.28, p < .001, and the difference between boundary and non-boundary sentences was particularly large for the first coarse event, leading to a significant interaction between fine-level position type and coarse-level position type, F(1, 9891.03) = 19.40, p < .001. There was no significant difference between reading times for CS and FS narratives, F(1, 11.62) = .003, p = .96. We probed the fine-level position type × coarse-level position type interaction with planned contrasts and found that the effect of being a boundary sentence on reading time was stronger in the first coarse-level event in a narrative (B = .31, SE = .03, p < .001) than in other coarse-level events (B = .15, SE = .02, p < .001), when controlling for narrative type. These results suggested that changes in coarse-level event labels increased readers’ processing effort, which could be interpreted as a signature of event model updating and event boundary processing.
Figure 4. Experiment 1a and 1b Results.

Note. Panel A: Participants spent longer time reading boundary sentences than non-boundary sentences, and this effect was particularly large in the first coarse-level event in each narrative. Panel B: Participants had higher recency judgment accuracy for within-event pairs than across-event pairs when there were fine-level semantic order constraints (in FS narratives), and higher recency judgment accuracy for across-event pairs than within-event pairs when there were coarse-level semantic order constraints (in CS narratives). Panel C: Participants were more likely to have high recency judgment confidence (> 90) for across-event pairs in FS narratives than for within-event pairs in CS narratives. Panel D: Participants rated across-event pairs as farther from each other than within-event pairs, regardless of narrative type. Error bars represent 95% confidence intervals.
We also conducted this analysis with the total number of syllables in each sentence as an additional predictor, in order to control for its potential influence on reading time. There was a significant main effect of the total number of syllables in each sentence, such that sentences with more syllables required a longer reading time. After controlling for this factor, all the above findings involving the preregistered hypotheses remained statistically significant (see Supplemental Materials).
Recency Judgment Accuracy
We hypothesized that whether the presence of an event boundary impaired or facilitated temporal order memory would depend on whether semantic order knowledge could be used to facilitate coarse-level or fine-level temporal order judgment. We predicted that when semantic knowledge could help infer the order of coarse-level events, temporal order memory of across-event pairs (CS_across pairs) would be better than when there was no semantic knowledge facilitation (FS_across and CS_within pairs). Conversely, we predicted that when semantic knowledge could help infer the order of fine-level events, temporal order memory of within-event pairs (FS_within pairs) would be better than when there was no semantic knowledge facilitation (CS_within and FS_across pairs).
Across all trials, mean recency judgment accuracy was .86 (SD = .35) in Experiment 1a, and .76 (SD = .43) in Experiment 1b. We predicted whether a given recency judgment trial was correct or not using a logistic mixed-effects model with the fixed effects of narrative type (CS vs. FS narrative), fine-level event pair type (across vs. within), and the interaction between narrative type and fine-level event pair type. For both experiments, after model selection, we retained the random intercept of subject and event pairs as random effects: Recency Judgment Result (0/1) ~ Narrative Type + Fine-level Event Pair Type + Narrative Type:Fine-level Event Pair Type + (1 | Participant) + (1| Event Pair).
As shown in Figure 4B, the presence of semantic order constraints determined whether across-event or within-event order memory was better, leading to a significant interaction between narrative type and fine-level event pair type (Experiment 1a, χ²(1) = 36.45, p < .001; Experiment 1b, χ²(1) = 18.13, p < .001). There was no significant main effect of narrative type (Experiment 1a, χ²(1) = .20, p = .65; Experiment 1b, χ²(1) = .13, p = .72) or fine-level event pair type (Experiment 1a, χ²(1) = .32, p = .57; Experiment 1b, χ²(1) = 1.50, p = .22). We probed the interaction with planned contrasts and found that FS_within pairs had better accuracy than both CS_within pairs (Experiment 1a, B = 1.19, SE = .30, p < .001; Experiment 1b, B = 1.02, SE = .32, p = .001) and FS_across pairs (Experiment 1a, B = 1.16, SE = .30, p < .001; Experiment 1b, B = 1.21, SE = .31, p < .001), suggesting that fine-level semantic order constraints could facilitate within-event recency judgment. In addition, CS_across pairs had better accuracy than both FS_across pairs (Experiment 1a, B = 1.38, SE = .30, p < .001; Experiment 1b, B = .86, SE = .31, p = .005) and CS_within pairs (Experiment 1a, B = 1.40, SE = .30, p < .001; Experiment 1b, B = .67, SE = .31, p = .03), suggesting that coarse-level semantic order constraints could facilitate across-event recency judgment. This supported the hypothesis that semantic order knowledge could be used on either level to improve the accuracy of recency judgments, at both delays.
We did not preregister tests of the following contrasts in recency judgment accuracy, but the results are reported here for completeness: Across the two experiments, recency judgment accuracy did not differ significantly between CS_within and FS_across pairs (Experiment 1a, B = .02, SE = .28, p = .93; Experiment 1b, B = .19, SE = .30, p = .53) or between CS_across and FS_within pairs (Experiment 1a, B = .21, SE = .32, p = .50; Experiment 1b, B = .35, SE = .32, p = .28).
Recency Judgment Confidence
We hypothesized that, due to the hierarchical event structure in the narratives, using episodic memory of coarse-level order as a source of information would improve people’s confidence in inferring the temporal order between fine-level event pairs. Specifically, when there was no influence of semantic knowledge on temporal order, temporal order memory confidence for across-event pairs in fine-level semantic narratives (FS_across pairs) would be higher than for within-event pairs in coarse-level semantic narratives (CS_within pairs), controlling for whether the judgment was correct.
Across all trials, the mean recency judgment confidence was 76.53 (SD = 28.16) in Experiment 1a and 61.55 (SD = 33.17) in Experiment 1b on a scale of 1 to 100. Condition-specific means and standard deviations are reported in the Supplemental Materials. The confidence rating distribution was highly left-skewed, with 48% and 28% of the scores higher than 90 in Experiment 1a and 1b respectively. Therefore, we binarized the confidence variable using 90 as a cutoff, coding confidence scores greater than 90 as “High Confidence” and confidence scores less than 90 as “Low Confidence.” In addition, because the key comparison was between the recency judgment confidence of FS_across pairs and CS_within pairs, we dummy coded the fine-level event pair type variable (with FS_across as the reference group, and three dummy variables for FS_within, CS_across, and CS_within). We predicted recency judgment confidence (high/low) of a given sentence pair using a logistic mixed-effects model, with the fixed effects of recency judgment result (0/1), and three dummy variables FS_within, CS_across, and CS_within. The recency judgment accuracy predictor was effect coded (correct = +1, incorrect = −1). After model selection in Experiment 1a, we retained the random slopes of the three dummy variables (FS_within, CS_across, and CS_within) on subject and random intercepts of event pairs as random effects: Recency Judgment Confidence (High/Low) ~ Recency Judgment Result (0/1) + FS_within + CS_across + CS_within + (FS_within + CS_across + CS_within | Participant) + (1 | Event Pair). After model selection in Experiment 1b, we retained the random intercepts of subject and event pairs as random effects: Recency Judgment Confidence (High/Low) ~ Recency Judgment Result (0/1) + FS_within + CS_across + CS_within + (1 | Participant) + (1 | Event Pair).
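The variable coding used for the confidence model can be sketched per trial as follows. The names are hypothetical, and one detail is an assumption: the text codes scores greater than 90 as high and scores less than 90 as low, without stating how a score of exactly 90 was handled, so this sketch treats it as low.

```python
REFERENCE = "FS_across"
DUMMIES = ("FS_within", "CS_across", "CS_within")


def code_confidence_trial(condition, confidence, correct, cutoff=90):
    """Build the model variables for one recency judgment trial."""
    if condition != REFERENCE and condition not in DUMMIES:
        raise ValueError(f"unknown condition: {condition}")
    # Dummy variables are all 0 for the reference condition (FS_across)
    row = {name: int(condition == name) for name in DUMMIES}
    # ASSUMPTION: a score of exactly `cutoff` counts as low confidence
    row["high_confidence"] = int(confidence > cutoff)
    # Accuracy is effect coded: correct = +1, incorrect = -1
    row["accuracy"] = 1 if correct else -1
    return row


print(code_confidence_trial("FS_across", 95, True))
print(code_confidence_trial("CS_within", 60, False))
```

With FS_across as the reference group, the coefficient on each dummy variable compares that condition against FS_across, which is why the CS_within coefficient carries the preregistered comparison.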
As shown in Figure 4C, high-confidence recency judgments were more frequent in the FS_across condition than in the CS_within condition, leading to a significant effect of the CS_within variable (Experiment 1a, χ²(1) = 12.93, p < .001; Experiment 1b, χ²(1) = 10.63, p = .001). In addition, high-confidence judgments were more frequent for the FS_within condition than the FS_across condition, leading to a significant effect of the FS_within variable (Experiment 1a, χ²(1) = 12.79, p < .001; Experiment 1b, χ²(1) = 13.80, p < .001). There was no significant effect of the variable CS_across (Experiment 1a, χ²(1) = .92, p = .34; Experiment 1b, χ²(1) = .13, p = .72). For the reference condition FS_across, high-confidence judgments were more frequent for correct than incorrect trials, leading to a significant main effect of recency judgment result (Experiment 1a, χ²(1) = 73.11, p < .001; Experiment 1b, χ²(1) = 52.73, p < .001). The significant main effect of CS_within supports the hypothesis that participants had higher confidence for recency judgments for FS_across pairs than for CS_within pairs, after controlling for accuracy. This suggests that when there was no semantic order knowledge facilitation, participants had higher confidence for the relative recency of event pairs coming from two different coarse-level events (FS_across) than for event pairs coming from the same coarse-level events (CS_within), controlling for the accuracy of the judgment.
Temporal Distance Ratings
Based on previous studies showing temporal distance rating inflation due to the presence of event boundaries (Clewett et al., 2020; Ezzyat & Davachi, 2014; Wen & Egner, 2022), we hypothesized that the temporal distance between two fine-level events would be rated as farther if they spanned an event boundary, compared to when they belonged to the same coarse-level event, even though the actual distance was the same.
Across all trials, the mean temporal distance rating was 3.61 (SD = 2.00) in Experiment 1a and 3.77 (SD = 2.12) in Experiment 1b on a scale of 1 to 10. We predicted the temporal distance rating of a given sentence pair using a linear mixed-effects model, with the fixed effects of narrative type (CS vs. FS narrative), fine-level event pair type (across vs. within), and the interaction between narrative type and fine-level event pair type. After model selection in Experiment 1a, we retained the random slopes of narrative type, fine-level event pair, and their interaction on subject, and random intercepts of event pairs as random effects: Temporal Distance Rating ~ Narrative Type + Fine-level Event Pair Type + Narrative Type:Fine-level Event Pair Type + (Narrative Type + Fine-level Event Pair Type + Narrative Type:Fine-level Event Pair Type | Participant) + (1| Event Pair). After model selection in Experiment 1b, we retained the random slopes of fine-level event pair type on subject and random intercepts of event pairs as random effects: Temporal Distance Rating ~ Narrative Type + Fine-level Event Pair Type + Narrative Type:Fine-level Event Pair Type + (Fine-level Event Pair Type | Participant) + (1| Event Pair).
As shown in Figure 4D, across-event pairs were rated as farther away from each other than within-event pairs, leading to a significant main effect of fine-level event pair type (Experiment 1a, F(1, 92.19) = 168.68, p < .001; Experiment 1b, F(1, 62.11) = 84.38, p < .001). There was a significant main effect of narrative type in Experiment 1a, F(1, 84.29) = 13.79, p < .001, but not in Experiment 1b, F(1, 75.97) = 2.60, p = .11. There was no significant interaction between narrative type and fine-level event pair type (Experiment 1a, F(1, 79.09) = 1.89, p = .17; Experiment 1b, F(1, 75.97) = .75, p = .39). By testing the four hypothesized pairwise contrasts, we found that CS_across pairs were perceived as more temporally distant than both CS_within pairs (Experiment 1a, B = 1.80, SE = .19, p < .001; Experiment 1b, B = 1.83, SE = .26, p < .001) and FS_within pairs (Experiment 1a, B = 1.50, SE = .20, p < .001; Experiment 1b, B = 1.71, SE = .26, p < .001), and FS_across pairs were perceived as more temporally distant than both CS_within pairs (Experiment 1a, B = 2.46, SE = .20, p < .001; Experiment 1b, B = 2.21, SE = .26, p < .001) and FS_within pairs (Experiment 1a, B = 2.15, SE = .20, p < .001; Experiment 1b, B = 2.10, SE = .26, p < .001).
Note that because we used a self-paced reading paradigm during encoding, the absolute time elapsed between encountering two sentences in each tested pair was not strictly the same across trials and participants, even though all the tested pairs we selected were separated by two sentences. To address this potential confound, we conducted an additional analysis (reported in the Supplemental Materials) and added time elapsed during encoding as a predictor to the two models predicting temporal distance rating in Experiment 1a and 1b. For both models, the main effect of time elapsed was not statistically significant, and all the above findings involving preregistered hypotheses did not change. This strongly suggests that the inflated distance rating for across-event pairs was not driven by more time passing during encoding. Instead, this distortion in subjective time perception was more likely to be caused by the event model updating process at event boundaries and its downstream effect in event-based memory chunking.
Discussion
Across Experiments 1a and 1b, which used different delay lengths, we found that semantic order knowledge interacts with hierarchical event structure to facilitate temporal order memory. When semantic knowledge supported the reconstruction of fine-level temporal order, participants were better at judging the order of within-event pairs; when semantic knowledge supported reconstruction of coarse-level temporal order, participants were better at judging the order of across-event pairs. Notably, this pattern is different from the canonical finding in most studies using the EDD paradigm, where recency judgment results for within-event pairs are almost always more accurate than across-event pairs (Clewett et al., 2020; DuBrow & Davachi, 2013; Heusser et al., 2018). We showed that the role event boundaries play in “disrupting” temporal order memory can be influenced by other factors that help scaffold real-life event memory, including knowledge of stereotypical event order and event structure. In our narrative reading paradigm, the assumption that across-event sentence pairs were perceived and remembered as belonging to separate coarse-level events was supported by two findings: During encoding, readers slowed down at the moments of coarse-level event updating, and their retrospective judgment of temporal distance was farther when an event boundary occurred between two sentences, compared to when the two sentences belonged to the same coarse-level event.
The importance of hierarchical event structure for readers’ comprehension was also evident in participants’ confidence judgments. Here, we focused on those conditions in which temporal order could not be reconstructed based on semantic knowledge: FS_across (in which one needs to remember the order of two sentences from different coarse events, but the coarse order was unconstrained by knowledge) and CS_within (in which one needs to remember the order of two sentences from the same event, but the fine order was unconstrained by knowledge). Controlling for accuracy, participants were more confident in memory for FS_across pairs; this suggests that readers encode hierarchical relationships among events during reading comprehension and are confident that they can remember and use coarse-level order information to make recency judgments about pairs of fine-level sub-events.
It was notable that all the patterns we found in recency judgment, recency judgment confidence, and distance rating remained the same in both experiments. Following the design of most EDD paradigms, we used a relatively short delay task (~2.5 minutes) between encoding and retrieval in Experiment 1a. At this short delay, memory systems such as the episodic buffer (Baddeley, 2000) or long-term working memory (Ericsson & Kintsch, 1995) might contribute to performance, whereas in real-world memory for past events the delays are often longer and thus the contribution of such systems is likely negligible. To rule out the possibility that the effects observed here depended on rapidly fading memory systems, we replicated the effects in Experiment 1b after increasing the length of the delay period between encoding and retrieval to about 20 minutes. Together, these results suggested that hierarchical structure and semantic order constraints in meaningful narratives influence how fine-level events are temporally reconstructed from long-term memory. Moreover, the results indicated that the mechanisms we proposed are suitable for explaining real-life memory reconstruction that operates on a relatively long timescale and that they do not depend on a short-lived memory system.
Experiment 2
Experiments 1a and 1b used recency judgment tasks to examine the influence of semantic order knowledge and event hierarchy on temporal order memory. One limitation of the recency judgment task is that explicit judgments may not reflect how temporal information is typically used in real life. In many situations, rather than reflecting deliberately on temporal order, people may use temporal order information to structure memory search. Therefore, in Experiment 2 we adopted a serial recall task as an indirect approach to probe order memory. By asking people to retell the narratives they read, we could assay how people reconstruct order information to help communicate the content and the broad structure of the narratives. We asked participants to type out what they remembered about the narrative in the order of occurrence, after a short delay (as in Experiment 1a). We then analyzed both the order and the number of events recalled. First, we hypothesized that the recall order of fine-level events would be influenced by semantic order constraints, based on previous studies suggesting that the temporal order of recalled goal-directed events was strongly influenced by underlying schemas (Lichtenstein & Brewer, 1980; Bower et al., 1979; Brewer & Dupree, 1983). If semantic order constraints scaffold temporal memory, then FS narratives should facilitate the recall of within-event transitions and CS narratives should facilitate across-event transitions. This led us to predict that (1) fine-level events within each coarse-level event would be recalled in better temporal order for FS narratives than for CS narratives, (2) more within-event transitions would be made in the correct temporal order for FS narratives than for CS narratives, and (3) more across-event transitions than within-event transitions would be made in the correct temporal order in CS narratives.
Note that (2) and (3) made different predictions from canonical recall results in the EDD paradigm, where participants always make more accurate within-event than across-event transitions in serial recall (DuBrow & Davachi, 2013, 2016). Second, we hypothesized that participants would recall more fine-level events from FS narratives than from CS narratives. Previous literature suggests that people recall more information if it is directly related to an underlying script or linked by causal structure (Lee & Chen, 2022; Myers et al., 1987; Radvansky & Zacks, 2017; Trabasso & van den Broek, 1985). Because FS narratives contained more semantic order constraints on the finer level, this should scaffold retrieval of fine-level events and lead to higher levels of recall. However, we also acknowledge that another type of link that might help with retrieval is the link between each coarse-level event and its constituent fine-level events. Since both CS and FS narratives were designed to utilize readers’ prior knowledge about such part-of relationships, this might overshadow the facilitation effect of fine-level semantic order constraints on boosting recall.
Method
Participants
Thirty-eight undergraduate students at Washington University in St. Louis participated in this online experiment through the university participant pool as partial fulfillment of a course requirement. The mean age of the participants was 19.79 years (min = 18, max = 23, SD = 1.23). Twenty-nine participants identified as female, and 9 identified as male. Informed consent was obtained from all participants prior to the start of data collection.
We determined the sample size by performing a bootstrapped power analysis using a pilot sample (n = 9, 1 dropped based on exclusion criteria, final n = 8). All of the hypothesized effects were supported by the pilot data, and we decided to power our study based on the smallest effect we observed by running a simulation-based power analysis using the “mixedpower” package in R (Kumle et al., 2021). The power analysis showed that we needed at least 30 participants to achieve 90% power to detect the smallest effect we hypothesized (i.e., the effect of narrative type on the number of fine-level events recalled per narrative).
Materials
We used six of the ten narratives about everyday activities used in the previous experiments, in order to reduce fatigue in the recall task. Out of these six narratives there were three Coarse-level Semantic (CS) and three Fine-level Semantic (FS) narratives: visiting an aunt (CS_1), going swimming (CS_2), doing morning routine (CS_3), going shopping (FS_1), cleaning home (FS_2), and going to a zoo (FS_3).
Procedure and Design
Each participant completed six task runs, each consisting of encoding a narrative, a delay, and a test phase. We used the same encoding procedure with audio as in Experiment 1b and the same shorter delay procedure as in Experiment 1a, except that before reading each story, we instructed participants to remember each activity in the story in as much detail as possible, as if they were preparing to retell the story to a friend later. In each test phase, they were told to type out everything they could remember from the previous story in the order it was presented, in as much detail as possible. They were required to spend at least two minutes typing their response before proceeding to the next run using the “Next” button. There was no upper limit on recall time. The presentation and test order of the six narratives was randomized for each participant. Figure 3C shows an illustration of the testing phase.
Data Preparation
The final sample included responses from 32 participants. Six participants were excluded from the data analysis based on the exclusion criteria that we preregistered: Three were excluded for reporting technical problems during the online experiment, two for having less than 75% accuracy for math questions during the delay phase, and one for giving an empty response for at least one recall trial.
Additionally, we excluded outlier recall trials based on the exclusion criteria that we preregistered: all trials with a reaction time more than 3 SDs above the mean reaction time of all recall trials (1% of the data).
Recall Scoring.
To score recall responses, we compared each participant’s typed recall with the original script for each narrative. For each fine-level event sentence in the original script of a narrative, two coders identified the key verbs and key objects in the sentence based on the situation it described. For example, for the fine-level event sentence “He took out his hoodie from the closet,” “took out” was identified as the key verb phrase, and “hoodie” and “closet” were identified as the key objects. We counted a fine-level event as being recalled if at least one of the key verbs or the key objects in the original sentence or their synonyms was mentioned in the recall protocol, and if the recalled event corresponded to the original event on a situational level (van Dijk & Kintsch, 1983).
For each of the fine-level events in the script that was mentioned in the recall, we recorded the event’s ordinal position in the script following the order it was mentioned in the recall. Therefore, for each typed recall, we derived a vector of ordinal positions representing the fine-level events being recalled (Diamond & Levine, 2020) for the whole narrative (fine-level order vector; maximum length = 25 after excluding beginning and ending), as well as the sub-vectors of ordinal positions representing the fine-level events recalled within each coarse-level event (sub-fine-level order vector; maximum length = 5). From each recall response made by each participant, we extracted one fine-level order vector and up to five sub-fine-level order vectors (depending on the number of coarse-level events recalled).
For each coarse-level event in the script, we counted it as being recalled if the coarse-level label was directly mentioned in the recall or if at least one fine-level event from it was coded as recalled (Tulving & Pearlstone, 1966). We recorded the ordinal position of each coarse-level event in the script following the order it was mentioned in the recall and derived a vector of ordinal positions representing the coarse-level events being recalled (coarse-level order vector; maximum length = 5). For each participant’s recall of one story, there was one coarse-level order vector extracted.
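The derivation of the sub-fine-level and coarse-level order vectors from a fine-level order vector can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in the scoring procedure: the 25 scored fine-level events are numbered 1 to 25 in script order, and each consecutive block of five belongs to one coarse-level event. The function names and the example recall are hypothetical. The sketch covers only the criterion that a coarse-level event counts as recalled when at least one of its fine-level events was recalled; a coarse-level label mentioned on its own would have to be added by the coder.

```python
EVENTS_PER_COARSE = 5  # assumption: 25 scored fine-level events, 5 per coarse-level event


def coarse_index(position):
    """Map a 1-based fine-level script position to its 1-based coarse-level event."""
    return (position - 1) // EVENTS_PER_COARSE + 1


def within_event_position(position):
    """Map a 1-based fine-level script position to its position (1-5) inside its coarse event."""
    return (position - 1) % EVENTS_PER_COARSE + 1


def split_order_vector(fine_order):
    """Return (sub-fine-level order vectors, coarse-level order vector).

    Sub-vectors are keyed by coarse-level event and hold within-event positions
    in the order they were recalled. The coarse-level order vector lists the
    coarse-level events in order of first mention.
    """
    sub_vectors = {}
    for position in fine_order:
        sub_vectors.setdefault(coarse_index(position), []).append(
            within_event_position(position)
        )
    return sub_vectors, list(sub_vectors)


# Hypothetical recall: script positions in the order they were typed.
fine_order = [1, 3, 2, 11, 12, 7, 25]
sub_vectors, coarse_order = split_order_vector(fine_order)
print(sub_vectors)   # {1: [1, 3, 2], 3: [1, 2], 2: [2], 5: [5]}
print(coarse_order)  # [1, 3, 2, 5]
```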
Based on the three types of recall vectors, we computed the number of fine-level events recalled per narrative (i.e., the length of fine-level order vector), the number of fine-level events recalled per coarse-level event (i.e., the length of sub-fine-level order vector), and the number of coarse-level events recalled per narrative (i.e., the length of coarse-level order vector). To quantify how much the fine-level events recalled within each coarse-level event deviated from the correct order, we computed a deviance score by comparing each coarse-level event’s observed fine-level recall order to the optimal recall order that could be produced given that some of the events might not have been recalled. To compute the deviance score, for each adjacent transition, we subtracted the transition lag in the sorted vector from the transition lag in the unsorted vector, took the absolute value of the difference, and summed across all adjacent transitions. For example, suppose a participant recalled the fifth fine-level event in the coarse-level event, then the first, then the third, and then the second. We would construct the vector [5, 1, 3, 2] and compare it with the vector [1, 2, 3, 5], which arranged all the events recalled in the correct order, to yield a deviance score of 9 (9 = |(2-1)-(1-5)| + |(3-2)-(3-1)| + |(5-3)-(2-3)|). A recall in the correct order would yield a deviance score of 0, and a larger departure from the correct order would yield a numerically larger deviance score. We then divided the deviance score by the number of transitions made in each recall vector (number of transitions = recall vector length – 1) to create a normalized deviance score that was independent of the length of the recall and used this normalized deviance score as the dependent variable in the regression model.
To quantify the direction of recall transitions, we categorized all immediate (lag = 0) transitions among fine-level events in each recall as within-event transitions (i.e., two adjacent fine-level events belonging to the same coarse-level event) or across-event transitions (i.e., two adjacent fine-level events belonging to two different coarse-level events), and determined whether each transition was in the same order as in the narratives presented during encoding. For example, if the 3rd fine-level event was followed immediately by the 5th fine-level event, and the 4th fine-level event was never recalled, then the transition from the 3rd to the 5th would be counted as correct.
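The deviance score computation described above can be checked with a short Python sketch. The function names are hypothetical, and the handling of recall vectors with fewer than two events (returning None because no transition exists) is an assumption rather than something the scoring procedure specifies.

```python
def deviance_score(recall_order):
    """Sum of absolute differences between observed and ideal transition lags.

    The ideal order is the recalled positions sorted ascending, so events that
    were never recalled do not count against the participant.
    """
    sorted_order = sorted(recall_order)
    observed_lags = [b - a for a, b in zip(recall_order, recall_order[1:])]
    ideal_lags = [b - a for a, b in zip(sorted_order, sorted_order[1:])]
    return sum(abs(obs - ideal) for obs, ideal in zip(observed_lags, ideal_lags))


def normalized_deviance(recall_order):
    """Deviance score divided by the number of transitions (vector length - 1)."""
    n_transitions = len(recall_order) - 1
    if n_transitions < 1:
        return None  # assumption: undefined when no transition was made
    return deviance_score(recall_order) / n_transitions


print(deviance_score([5, 1, 3, 2]))       # 9, matching the worked example
print(normalized_deviance([5, 1, 3, 2]))  # 3.0 (9 divided by 3 transitions)
print(deviance_score([1, 2, 3, 5]))       # 0 for a recall in the correct order
```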
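The transition coding can be sketched in Python as follows. As an assumption for illustration only, fine-level events are numbered 1 to 25 in script order with five consecutive events per coarse-level event; the function name and the example recall are hypothetical.

```python
EVENTS_PER_COARSE = 5  # assumption: five consecutive fine-level events per coarse-level event


def classify_transitions(fine_order):
    """Label each adjacent recall transition as within/across and forward/backward."""
    labeled = []
    for first, second in zip(fine_order, fine_order[1:]):
        same_event = (
            (first - 1) // EVENTS_PER_COARSE == (second - 1) // EVENTS_PER_COARSE
        )
        labeled.append(
            {
                "from": first,
                "to": second,
                "type": "within" if same_event else "across",
                # Forward means the later-recalled event also came later in the
                # script, even if intervening events were never recalled.
                "forward": second > first,
            }
        )
    return labeled


# Hypothetical recall: event 3, then 5 (event 4 never recalled), then 8, then 6.
for transition in classify_transitions([3, 5, 8, 6]):
    print(transition)
# 3 -> 5: within, forward; 5 -> 8: across, forward; 8 -> 6: within, backward
```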
Two raters (YD and DA) were trained on the scoring criteria described above using data from the pilot experiment (n = 8). For the data collected in Experiment 2, YD and DA each scored recall data from half of the participants. To calculate interrater reliability, the two coders scored recall data from the same five participants. Interrater reliability was high (mean Cohen’s Kappa = 0.87).
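For readers unfamiliar with the statistic, Cohen’s kappa corrects the observed agreement between two coders for the agreement expected by chance. The Python sketch below assumes, purely for illustration, that each coder’s output is a list of binary codes (1 = fine-level event scored as recalled, 0 = not recalled) over the same set of events; the unit of coding actually used for the reported value is not specified here, and the data are made up.

```python
import math


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders' categorical codes over the same items."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must rate the same non-empty set of items.")
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    categories = set(codes_a) | set(codes_b)
    # Chance agreement: product of the two coders' marginal proportions per category.
    expected = sum(
        (codes_a.count(c) / n) * (codes_b.count(c) / n) for c in categories
    )
    if expected == 1:
        return math.nan  # kappa is undefined when chance agreement is perfect
    return (observed - expected) / (1 - expected)


# Made-up codes for ten fine-level events from one recall protocol.
coder_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
coder_2 = [1, 1, 0, 1, 0, 1, 1, 1, 0, 0]
print(round(cohens_kappa(coder_1, coder_2), 3))  # 0.583
```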
Results
Order of Fine-level Events Recalled within Each Coarse-level Event
We hypothesized that participants would recall the order of fine-level events within each coarse-level event better for FS than for CS narratives, because FS narratives provide semantic order constraints on fine-level events. The mean normalized order deviance score for fine-level events recalled within each coarse-level event was .31 (SD = .85). We predicted the magnitude of the normalized recall order deviance score using a linear mixed-effects model, with a fixed effect of narrative type (CS vs. FS narrative). After model selection, we retained the random intercepts of subject and narrative as random effects: Normalized Deviance Score ~ Narrative Type + (1 | Participant) + (1 | Narrative). As shown in Figure 5A, fine-level events within each coarse-level event were recalled more in order for FS than for CS narratives, leading to a significant main effect of narrative type, F(1, 4.16) = 19.81, p = .01.
Figure 5. Experiment 2 Results.

Notes. Panel A. Participants recalled fine-level events in each coarse-level event in better temporal order in FS narratives than in CS narratives. Panel B. For immediate recall transitions, participants were more likely to make correct across-event transitions in CS narratives or correct within-event transitions in FS narratives, compared to within-event transitions in CS narratives. Panel C. Participants recalled similar numbers of fine-level events per narrative across CS and FS narratives. Panel D. Participants recalled similar numbers of fine-level events per coarse-level event across CS and FS narratives. Error bars represent 95% confidence intervals.
We noticed that the distribution of normalized recall order deviance scores was highly skewed (with 83% of the scores = 0), so the preregistered linear mixed-effects model reported above might not be the most appropriate. Therefore, we recoded each deviance score as a binary variable and predicted whether the recall order was correct for all fine-level events within a coarse-level event using a logistic mixed-effects model with a fixed effect of narrative type (CS vs. FS narrative) and random intercepts of subject and narrative. Again, the main effect of narrative type was significant, χ²(1) = 43.03, p < .001, providing converging statistical evidence that fine-level events within each coarse-level event were recalled in a better order in the FS than in the CS condition.
Forward Transition Probability between Adjacent Fine-level Events Recalled
We hypothesized that having semantic order constraints on the coarse level would make across-event transitions in recall more likely to be in the correct direction, and that semantic order constraints on the fine level would make within-event transitions in recall more likely to be in the correct direction. Specifically, we hypothesized that correct forward across-event transitions in CS narratives (CS_across) would be more likely than within-event transitions in CS narratives (CS_within), and that correct forward within-event transitions in FS narratives (FS_within) would be more likely than within-event transitions in CS narratives (CS_within). We predicted the probability of making a correct forward recall transition using a logistic mixed-effects model, with the fixed effects of narrative type (CS vs. FS narrative), transition type (within- vs. across-event transition), and their interaction. The transition type predictor was effect coded (across-event transition = +1, within-event transition = −1). After model selection, we retained the random intercepts of subject and narrative as random effects: Accuracy of Transition Order (0/1) ~ Narrative Type + Transition Type + Narrative Type:Transition Type + (1 | Participant) + (1 | Narrative).
As shown in Figure 5B, there was a significant interaction between transition type and narrative type, χ²(1) = 36.17, p < .001, a significant main effect of narrative type, χ²(1) = 6.37, p = .01, and a non-significant main effect of transition type, χ²(1) = 1.04, p = .31. We further probed the interaction and tested the two hypothesized pairwise contrasts: We found that across-event transitions in CS narratives had a higher probability of being in the correct order than within-event transitions in CS narratives (B = 2.12, SE = .37, p < .001), and that within-event transitions in FS narratives had a higher probability of being in the correct order than within-event transitions in CS narratives (B = 3.09, SE = .56, p < .001).
Number of Fine-level Events Recalled
Next, we hypothesized that participants would recall more fine-level events per narrative and per coarse-level event for FS narratives than for CS narratives. We first analyzed the total number of fine-level events recalled. Participants recalled a mean of 12.01 (SD = 5.54) out of 25 fine-level events in each narrative. We predicted the number of fine-level events recalled in each narrative using a linear mixed-effects model, with a fixed effect of narrative type (CS vs. FS narrative). After model selection, we retained the random intercepts of subject and narrative as random effects: Number of Fine-level Events Recalled Per Narrative ~ Narrative Type + (1 | Participant) + (1 | Narrative). Contrary to our hypothesis, and as shown in Figure 5C, the total number of fine-level events recalled in each narrative did not differ significantly between CS and FS narratives, F(1, 4.00) = .05, p = .84. The Bayes factor indicated weak evidence in support of the null hypothesis, BF10 = .48.
Next, we tested whether the number of fine-level events recalled per coarse-level event differed across narrative types. Each participant recalled a mean of 2.85 (SD = 1.25) out of 5 fine-level events per coarse-level event. We predicted the number of fine-level events recalled in each coarse-level event using a linear mixed-effects model, with a fixed effect of narrative type (CS vs. FS narrative). After model selection, we retained the random intercepts of subject and narrative as random effects: Number of Fine-level Events Recalled Per Coarse-level Event ~ Narrative Type + (1 | Participant) + (1 | Narrative). Again, contrary to our hypothesis, and as shown in Figure 5D, the number of fine-level events recalled per coarse-level event did not differ significantly between CS and FS narratives, F(1, 4.00) = .89, p = .40. The Bayes factor indicated weak evidence in support of the null hypothesis, BF10 = .45. In summary, the data did not provide strong evidence for or against the hypothesis that fine-grained semantic order constraints facilitate recall of more fine-grained events.
Serial Position Curves and Lag-Conditional Response Probability Curves
Graphical methods provide alternative ways of examining temporal memory that can be highly illuminating. One of these is the serial position curve, which plots the probability of event recall as a function of position in the story. Another is the lag-conditional response probability curve, which plots the probability of a transition k steps forward or backward as a function of k. We performed analyses using these methods, which led to conclusions congruent with those reported above; see Supplemental Materials (Figures S3 and S4).
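As an illustration of these two measures, the Python sketch below computes a serial position curve and a lag-conditional response probability curve from recall order vectors. It follows the convention common in the free-recall literature of dividing the number of transitions observed at each lag by the number of times a transition at that lag was still possible (i.e., the target event had not yet been recalled). Whether the supplemental analyses used exactly this conditioning is an assumption, and the function names and data are hypothetical.

```python
from collections import Counter


def serial_position_curve(recalls, n_events):
    """Proportion of recall protocols that include each script position."""
    counts = Counter(pos for seq in recalls for pos in set(seq))
    return [counts[pos] / len(recalls) for pos in range(1, n_events + 1)]


def lag_crp(recalls, n_events):
    """Probability of each transition lag, conditional on that lag being available.

    Each sequence holds 1-based script positions in recall order without repeats.
    """
    actual, possible = Counter(), Counter()
    for seq in recalls:
        already_recalled = set()
        for current, following in zip(seq, seq[1:]):
            already_recalled.add(current)
            actual[following - current] += 1
            # Every not-yet-recalled event was a possible next response.
            for candidate in range(1, n_events + 1):
                if candidate not in already_recalled:
                    possible[candidate - current] += 1
    return {lag: actual[lag] / possible[lag] for lag in sorted(possible)}


# Two made-up recall protocols for a three-event script.
recalls = [[1, 2, 3], [2, 1]]
print(serial_position_curve(recalls, 3))  # [1.0, 1.0, 0.5]
print(lag_crp(recalls, 3))                # lags -1, +1, +2
```

In this toy example, a lag of +1 was available three times and taken twice, so its conditional probability is 2/3.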
Discussion
In Experiment 2, we examined how semantic order knowledge and hierarchical event structure influenced the temporal organization of events in long-term memory using a serial recall task. The results suggested that fine-level semantic order knowledge exerted a strong influence on the order of fine-level events recalled within each coarse-level event: When there were semantic order constraints on the order of fine-level events, they were more likely to be recalled in order than when there were no semantic order constraints. This confirmed previous findings that event schemata serve as an important factor biasing the order of events recalled (Lichtenstein & Brewer, 1980; Bower et al., 1979; Brewer & Dupree, 1983).
By analyzing the order of adjacent recall transitions and examining the influence of event boundaries, we found evidence that narrative recall was organized hierarchically, with coarse-level event membership serving as an important grouping factor for the recall of fine-level events. First, for pairs of events that were adjacent in participants’ recall protocols, the order of within-event transitions was more likely to be correct in FS narratives than in CS narratives. This finding offers additional support for the above conclusion that fine-level semantic order constraints helped organize the order of recall. Second, if adjacent fine-level events recalled in CS narratives belonged to different coarse-level events, they were almost always recalled in the correct order. As in Experiments 1a and 1b, this pattern differed from the canonical pattern observed in the EDD paradigm, where across-event recall transitions are less likely to be in the correct order than within-event transitions. This suggests a temporal order mechanism making use of both hierarchical event structure and semantic knowledge about coarse-level event order: Participants may have formed separate event models for each coarse-level event and used semantic constraints to organize the order of those coarse-level events. Then, during recall they could follow the hierarchical organization, recalling all fine-level events they could remember from the first coarse-level event they retrieved, then moving on to the next coarse-level event and so forth. Such a mechanism is consistent with previous findings that event boundaries serve as “entry points” for accessing and scanning subevents in memory retrieval (Antony et al., 2024; Michelmann et al., 2023).
In addition, readers’ strategy of traversing coarse-level events using semantic order constraints is consistent with one of the principles outlined in the Event Horizon Model: Causal connectivity is a dominant factor organizing the relationship among event models in long-term memory (Radvansky, 2012; Radvansky & Zacks, 2017).
We also observed the surprising result that across-event transitions in FS narratives had a very high probability (mean > 95%) of being in the correct direction, despite these narratives’ lack of coarse-level semantic order constraints. At first glance, this might seem to contradict the results of Experiments 1a and 1b, in which recency judgments for FS_across pairs were consistently less accurate than those for CS_across and FS_within pairs, which benefited from semantic order knowledge. However, the recall transition analysis in Experiment 2 was based only on events that were successfully recalled by the participants, which did not include everything in the narrative (on average, 12.01 out of 25 fine-level events were recalled per narrative). It is possible that for those coarse-level events that participants successfully remembered in FS narratives, participants had very accurate episodic memory of their relative order. This suggests that accurate episodic memory of coarse-level event order in FS narratives could also be leveraged to judge the order of fine-level events spanning an event boundary. We explicitly tested this mechanism in Experiment 3.
The hypothesis that more fine-level events would be recalled per narrative (or per coarse-level event) in FS narratives than in CS narratives was not supported. One possibility is that even though CS narratives do not have semantic order constraints on the fine level, they have similarly strong semantic relationships between each fine-level event and its containing coarse-level event. This additional mechanism might overshadow the effect of fine-level order constraints on boosting the number of units recalled.
Experiment 3
In Experiment 2, we found that when performing serial recall of the narratives, participants chunked fine-level events based on their coarse-level event membership, such that they tended to recall all fine-level events they could remember from the same coarse-level event before accessing the next coarse-level event. In our analysis of adjacent recall transitions, we discovered that if two fine-level events belonged to different coarse-level events, they were almost always recalled in the correct order in both CS and FS narratives. This suggested that when participants were judging the recency of across-event pairs, they could potentially adopt a strategy that did not rely on episodic links between the two individual fine-level event sentences: If they knew which coarse-level event each fine-level event belonged to, they could answer the question based on information about the order of the corresponding coarse-level events. In Experiments 1a and 1b, we saw evidence that semantic knowledge of coarse-level event order could facilitate across-event fine-level recency judgments, such that accuracy was higher for across-event pairs than for within-event pairs in Coarse-level Semantic (CS) narratives. We also saw indirect evidence that episodic memory of coarse-level event order might serve as a source of information in fine-level recency judgment: Between the two conditions without semantic facilitation, recency judgment confidence was higher for across-event pairs in FS narratives than for within-event pairs in CS narratives. Thus, in addition to semantic knowledge, accurate episodic memory about coarse-level event order could also potentially support recency judgment for across-event pairs. This use of coarse-level event order would require accurate source memory, namely associating both fine-level event sentences with the coarse-level events to which they belonged.
Experiment 3 explicitly tested this novel mechanism by adapting the recency judgment task from Experiment 1. In addition to testing recency judgments and confidence for fine-level event sentence pairs, we also measured recency judgments and confidence for consecutive coarse-level event pairs. Further, for each fine-level event sentence that appeared in across-event pairs, we measured participants’ source memory by asking them to choose its corresponding coarse-level event label out of the five possible options. Our key hypotheses were as follows: If participants could use episodic memory about coarse-level event order to perform recency judgments for fine-level events spanning event boundaries, then the response for each across-event pair in FS narratives (i.e., FS_across pairs) should depend on (1) coarse-level recency judgment outcome (i.e., whether participants correctly judged the order of the two corresponding coarse-level events), and (2) source memory outcome (i.e., whether participants could match both fine-level event sentences to their corresponding coarse-level event labels). In addition, we hypothesized that participants’ confidence for fine-level recency judgments would be predicted by their confidence for the corresponding coarse-level recency judgment trial, controlling for their accuracy on both recency judgment trials and source memory trials.
Method
Participants
Sixty-four undergraduate students at Washington University in St. Louis participated in this online experiment through the university participant pool as partial fulfillment of a course requirement. The mean age of the participants was 19.72 years (min = 18, max = 23, SD = 1.17). Forty-six participants identified as female and 18 identified as male. Informed consent was obtained from all participants prior to the start of data collection.
We determined the sample size by performing a simulation-based power analysis using the simr package in R (Green & MacLeod, 2016). Using data from a pilot sample (n = 20, 6 dropped based on the exclusion criteria, final n = 14) collected on Prolific (https://www.prolific.com/), we discovered that at least 45 participants were needed to achieve a statistical power of 80% for the smallest effect we hypothesized (i.e., the effect of narrative type on recency judgment accuracy for across-event pairs).
Materials
The stimuli were the same ten narratives about everyday activities used in Experiments 1a and 1b. We also used the audio recordings that were generated for all the narratives in Experiment 1b.
Procedure and Design
As in Experiment 1b, Experiment 3 had an approximately 20-minute delay between encoding and test for each narrative. Participants encoded all ten narratives and then their memory was tested in the same order as the narratives were read. We selected this delay, rather than the shorter delay of Experiment 1a, to induce more variability in fine-level recency, coarse-level recency, and source memory performance.
In the encoding block, participants completed the encoding phases of all ten narratives consecutively in a randomized order. As in Experiments 1b and 2, they were instructed to read each narrative sentence by sentence in a self-paced format and heard the audio of each fine-level event sentence as they read. As in Experiment 1b, participants also answered two reading comprehension questions immediately after reading each narrative. We used accuracy on the comprehension questions as an additional exclusion criterion.
After the encoding block, participants entered the test block, illustrated in Figure 3D. Narratives were tested in the same order as they had been presented. Before each test, the participant received an instruction specifying the narrative being probed (e.g., “The following questions are for the story about Jim visiting his aunt.”). The test phase for each narrative was composed of three parts. First, participants made recency judgments for fine-level event sentences: For the same eight sentence pairs (four across-event, four within-event) as in Experiments 1a and 1b, participants were asked to (1) make a recency judgment, and then (2) indicate their confidence in the recency judgment. This was followed by a source memory test, in which each of the eight sentences from the four across-event pairs was presented on the screen, and participants were asked to select its corresponding coarse-level event label out of the five labels in the narrative. Finally, participants made coarse-level recency judgments, in which they were given adjacent pairs of coarse-level event labels and asked to make a recency judgment and then indicate their confidence in that judgment. Because there were five coarse events in each story, there were four adjacent pairs for recency judgment.
As in Experiments 1a and 1b, Experiment 3 was conducted using a 2 (Narrative Type: CS vs. FS) × 2 (Pair Type: Within vs. Across) within-subject design. In the encoding phase, the presentation order of the ten narratives (which was the same as the order in which they were tested) was randomized for each participant. In the retrieval phase, we randomized for each run the order of testing event pairs in fine-level recency, source memory, and coarse-level recency trials.
Data Preparation
The final sample included responses from 49 participants. Fifteen participants were excluded from the data analysis based on our preregistered exclusion criteria. Twelve were excluded for having reaction times greater than 40,000 ms on more than five encoding or test trials, one was excluded for having less than 60% accuracy on the reading comprehension questions after encoding each narrative, one was excluded for having less than 50% accuracy on the source memory questions, and one was excluded for having a delay longer than 35 minutes between the last encoding trial of the first narrative they read and the first test trial on that narrative. Additionally, we excluded outlier trials based on our preregistered criteria: We excluded all test trials that had a reaction time less than 300 ms or greater than 3 SDs above the mean reaction time for source memory trials (0.5% of the data), fine-level recency judgment trials (0.6% of the data), and coarse-level recency judgment trials (1.1% of the data). As in Experiments 1a and 1b, we transformed both recency judgment confidence and coarse-level recency judgment confidence (on a scale of 1-100) into binary confidence group variables. Previously, we used 90 as a cutoff when there were only recency judgment confidence trials. Because the distribution of coarse-level recency judgment confidence was more left-skewed, we changed our cutoff criterion to 80, coding confidence scores greater than 80 as “High Confidence” and less than 80 as “Low Confidence.” This ensured a relatively equal distribution of data in both groups for both variables (42.0% and 46.5% coded as “High Confidence” for recency judgment confidence and coarse-level recency judgment confidence, respectively).
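The trial-level exclusion rule and the confidence binarization described above can be illustrated with a minimal Python sketch. The function names and the example reaction times are hypothetical. The sketch also makes two assumptions that the text does not settle: the mean and SD are computed over whatever list of trials is passed in, and a rating exactly equal to the cutoff is coded as low confidence.

```python
from statistics import mean, stdev


def exclude_outlier_trials(rts_ms, lower_ms=300.0, n_sd=3.0):
    """Drop trials faster than lower_ms or slower than mean + n_sd * SD.

    Assumption: the mean and SD are computed over the list passed in
    (e.g., all trials of one trial type); the text does not say whether
    they were pooled across participants or computed per participant.
    """
    upper_ms = mean(rts_ms) + n_sd * stdev(rts_ms)  # sample SD (assumed)
    return [rt for rt in rts_ms if lower_ms <= rt <= upper_ms]


def binarize_confidence(score, cutoff=80):
    """Code a 1-100 confidence rating as high or low confidence.

    Assumption: a rating exactly equal to the cutoff is coded as low;
    the text only specifies "greater than" and "less than" 80.
    """
    return "High Confidence" if score > cutoff else "Low Confidence"


# Hypothetical reaction times (ms): twenty typical trials,
# one implausibly fast trial, and one extremely slow trial.
example_rts = [1000.0] * 20 + [100.0, 50000.0]
kept = exclude_outlier_trials(example_rts)
```

In this toy example both extreme trials are removed and the twenty typical trials are kept.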
Results
Recency Judgment Accuracy
For recency judgment accuracy, the pattern of results from Experiments 1a and 1b was replicated in Experiment 3. Across all trials, mean accuracy on recency judgment was .77 (SD = .42). We predicted whether a given recency judgment trial was correct or not using a logistic mixed-effects model, with the fixed effects of narrative type (CS vs. FS narrative), event pair type (across vs. within), and the interaction between narrative type and pair type. After model selection, we retained the random intercepts of subject and event pairs as random effects: Recency Judgment Result (0/1) ~ Narrative Type + Pair Type + Narrative Type:Pair Type + (1 | Participant) + (1| Event Pair).
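For readers who prefer an equation to the model formula, the same model can be written out as follows. The notation is ours and is an assumption rather than the authors' own: i indexes participants, j indexes event pairs, N_j and T_j are the codes for the narrative type and pair type of event pair j (the coding scheme for these two predictors is not stated in the text), and u_i and v_j are the random intercepts for participant and event pair.

```latex
\[
\operatorname{logit} P(y_{ij} = 1)
  = \beta_0 + \beta_1 N_j + \beta_2 T_j + \beta_3 N_j T_j + u_i + v_j,
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
v_j \sim \mathcal{N}(0, \sigma_v^2)
\]
```

Here y_ij is 1 when participant i's recency judgment on event pair j is correct and 0 otherwise, and the coefficient on the product term corresponds to the Narrative Type:Pair Type interaction.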
As shown in Figure 6A, we replicated the results of Experiments 1a and 1b that the presence of semantic order constraints determined whether across-event or within-event order memory was enhanced, leading to a significant interaction between narrative type and fine-level event pair type, χ²(1) = 28.01, p < .001. There was no significant main effect of narrative type, χ²(1) = 3.00, p = .08, and no significant main effect of pair type, χ²(1) = 1.94, p = .16. We probed the interaction with planned contrasts and found that FS_within pairs had better accuracy than both CS_within pairs (B = 1.51, SE = .31, p < .001) and FS_across pairs (B = 1.43, SE = .31, p < .001), suggesting that fine-level semantic order constraints could facilitate within-event recency judgment. In addition, CS_across pairs had better accuracy than both FS_across pairs (B = .76, SE = .30, p = .01) and CS_within pairs (B = .84, SE = .30, p = .005), suggesting that coarse-level semantic order constraints could facilitate across-event recency judgment. This again replicated the finding in Experiments 1a and 1b that semantic order knowledge at either the coarse level or the fine level could be used to facilitate fine-level recency judgment accuracy, even after a 20-minute delay and a greater variety of intervening tasks.
Figure 6. Experiment 3 Results.

Note. Panel A. Replicating findings from Experiments 1a and 1b, participants had higher recency judgment accuracy for within-event pairs than across-event pairs when there were fine-level semantic order constraints (in FS narratives), and higher recency judgment accuracy for across-event pairs than within-event pairs when there were coarse-level semantic order constraints (in CS narratives). Panel B. Participants had higher coarse-level recency judgment accuracy when there were coarse-level semantic order constraints (in CS narratives) compared to when there were not (in FS narratives). Panel C. Recency judgment accuracy for FS_across pairs was significantly improved if participants had accurate recency judgment for corresponding coarse-level events, but this happened only when source memory was intact for both fine-level events. Panel D. Participants were more likely to have high recency judgment confidence (>80) when they also had high recency judgment confidence (>80) for corresponding coarse-level events, after controlling for all other variables. Error bars represent 95% confidence intervals.
Coarse-level Recency Judgment Accuracy
When interpreting results from previous experiments, we assumed based on norming study results that coarse-level semantic order knowledge supported recency judgment between adjacent coarse-level events, thereby facilitating recency judgment for fine-level event pairs coming from two adjacent coarse-level events (CS_across pairs). The setup of the current experiment allowed us to empirically test this assumption that coarse-level semantic order knowledge supported recency judgment for adjacent events on the corresponding level. We hypothesized that coarse-level recency judgment accuracy would be higher for CS narratives than for FS narratives, because coarse-level events in CS narratives were designed to unfold according to a stereotypical order. Across all trials, mean accuracy on coarse-level recency judgment was .85 (SD = .36). We predicted whether a given recency judgment trial was correct or not using a logistic mixed-effects model, with the fixed effect of narrative type (CS vs. FS narrative). After model selection, we retained the random slope of narrative type on subject and the random intercept of event pair as random effects: Coarse-level Recency Judgment Result (0/1) ~ Narrative Type + (Narrative Type | Participant) + (1 | Event Pair). As shown in Figure 6B, coarse-level recency judgment accuracy was higher in CS narratives than in FS narratives, leading to a significant main effect of narrative type, χ²(1) = 46.86, p < .001. This directly supported the hypothesis that coarse-level semantic order knowledge could be used to facilitate order reconstruction for coarse-level events.
Source Memory Accuracy
Previous paradigms using picture lists and orthogonalizing the relationship between “item” and “context” found that the associative memory between “item” and “context” was enhanced for trials happening at perceptual boundaries (Heusser et al., 2018). However, in this paradigm, we intentionally designed the narrative stimuli such that the association between fine-level events and their corresponding coarse-level event was largely determined by participants’ existing knowledge (e.g., the fine-level event “Peeling some potatoes” belongs to the coarse-level event “Help with cooking dinner in the kitchen”). Therefore, we expected source memory to be highly accurate even after a 20-minute delay, and we did not expect boundary sentences to receive a source memory benefit compared to non-boundary sentences.
Across all trials, average source memory accuracy was .88 (SD = .33). Because we tested source memory for all the fine-level event sentences that appeared in across-event pairs in recency judgment, these sentences were studied either at the first location (boundary sentence) or the third location (middle sentence) within a given coarse-level event during encoding. We predicted whether a given source memory trial was correct or not using a logistic mixed-effects model, with the fixed effects of narrative type (CS/FS) and sentence location (boundary/middle). Sentence location was effect coded, such that boundary = +1, middle = −1. Because we did not have a hypothesis regarding the interaction between the two predictors, we used a likelihood ratio test to compare the model with the interaction term and the model without the interaction term. We found that the model without interaction yielded a better fit.
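As an illustration of the likelihood ratio test used for this kind of model comparison, the sketch below computes the test statistic and its p value from the log-likelihoods of two nested models. It assumes that the models differ by a single parameter (one interaction term between two two-level predictors), so that the statistic is referred to a chi-square distribution with 1 degree of freedom. The function name and the log-likelihood values are hypothetical and are not taken from the reported analyses.

```python
import math


def likelihood_ratio_test_df1(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p_value). With 1 degree of freedom, the chi-square
    survival function has the closed form erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    statistic = max(statistic, 0.0)  # guard against tiny negative values
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for a model without and with the interaction.
stat, p = likelihood_ratio_test_df1(loglik_reduced=-500.0, loglik_full=-498.5)
```

With these made-up values the statistic is 3.00 and the p value is about .08, so the interaction term would not significantly improve fit and the simpler model would be retained.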
After model selection, we retained the random slope of sentence location on event pair and the random intercept of subject as random effects: Source Memory Accuracy (0/1) ~ Narrative Type + Sentence Location + (1 | Participant) + (Sentence Location | Event Pair). As expected, source memory accuracy did not differ between boundary and middle sentences, leading to a non-significant main effect of sentence location, χ²(1) = .37, p = .54. There was also no significant main effect of narrative type, χ²(1) = 1.21, p = .27. This supported the proposal that event boundary processing does not significantly influence source memory accuracy in this paradigm. It also showed that narrative type (CS vs. FS) did not influence source memory accuracy, providing some evidence for our speculation in Experiment 2 that good source memory enhanced the number of events recalled similarly in CS and FS narratives.
Relationship among fine-level recency judgment, coarse-level recency judgment, and source memory
For the most important mechanism tested in Experiment 3, we hypothesized that when people were judging the order of two fine-level events coming from two different coarse-level events (across-event pairs), they would use their correct episodic order memory of the two corresponding coarse-level events as a source of information. Controlling for source memory accuracy, coarse-level recency accuracy should predict fine-level recency accuracy. To test this hypothesis, we matched each recency judgment trial performed on FS_across pairs by each participant with the outcome of its corresponding coarse-level recency judgment trial and the outcome of two source memory trials for each fine-level event sentence in the recency judgment pair. We created a variable called source memory result, which was coded as 1 only if both source memory trials were correct, and 0 otherwise.
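The trial-matching step described above can be sketched in Python as follows. The data layout (dictionaries keyed by participant and item identifiers) and all field names are hypothetical. The sketch assumes that every fine-level trial has a matching coarse-level trial and two matching source memory trials, and it does not handle trials removed by the outlier exclusion.

```python
def match_trials(fine_trials, coarse_correct, source_correct):
    """Attach coarse-level and source memory outcomes to fine-level trials.

    fine_trials:    list of dicts describing FS_across recency trials
    coarse_correct: {(participant, coarse_pair): 0 or 1}
    source_correct: {(participant, sentence): 0 or 1}
    """
    rows = []
    for trial in fine_trials:
        pid = trial["participant"]
        first_ok = source_correct[(pid, trial["sentence_a"])]
        second_ok = source_correct[(pid, trial["sentence_b"])]
        rows.append({
            "participant": pid,
            "pair_id": trial["pair_id"],
            "fine_correct": trial["correct"],
            "coarse_correct": coarse_correct[(pid, trial["coarse_pair"])],
            # Source memory result: 1 only if BOTH sentences were sourced correctly.
            "source_result": 1 if (first_ok and second_ok) else 0,
        })
    return rows


# Hypothetical data for one participant and two across-event pairs.
fine = [
    {"participant": "p1", "pair_id": "A", "sentence_a": "s1",
     "sentence_b": "s2", "coarse_pair": "c1", "correct": 1},
    {"participant": "p1", "pair_id": "B", "sentence_a": "s3",
     "sentence_b": "s4", "coarse_pair": "c2", "correct": 0},
]
coarse = {("p1", "c1"): 1, ("p1", "c2"): 0}
source = {("p1", "s1"): 1, ("p1", "s2"): 1, ("p1", "s3"): 1, ("p1", "s4"): 0}
rows = match_trials(fine, coarse, source)
```

In this toy data set, pair A receives a source memory result of 1 and pair B receives 0, because one of pair B's two sentences was attributed to the wrong coarse-level event.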
We predicted whether a given fine-level recency judgment trial was correct or not using a logistic mixed-effects model, with the fixed effects of coarse-level recency judgment accuracy (0/1) and source memory result (0/1). Both predictors were effect coded, such that correct = +1, incorrect = −1. Because we did not have a hypothesis regarding the interaction between the two predictors, we used a likelihood ratio test to compare the model with the interaction term and the model without the interaction term. We found that the model with interaction yielded a better fit.
After model selection, we retained the random slope of coarse-level recency judgment accuracy on subject and the random intercept of event pair as random effects: Recency Judgment Result (0/1) ~ Coarse-level Recency Judgment Result + Source Memory Result + Coarse-level Recency Judgment Result:Source Memory Result + (Coarse-level Recency Judgment Result | Participant) + (1| Event Pair).
As shown in Figure 6C, recency judgment accuracy for FS_across pairs was jointly determined by coarse-level recency judgment accuracy and source memory, leading to a significant interaction between coarse-level recency judgment accuracy and source memory accuracy, χ²(1) = 42.48, p < .001. Controlling for source memory, accurate coarse-level recency judgment increased the probability of having accurate fine-level recency judgment, leading to a significant main effect of coarse-level recency accuracy, χ²(1) = 34.20, p < .001. Controlling for coarse-level recency judgment accuracy, there was no significant main effect of source memory accuracy, χ²(1) = 2.56, p = .11. Since we did not have a priori hypotheses regarding specific contrasts, we further probed the interaction and examined the simple effects with a Bonferroni correction: When source memory was not correct for at least one of the two sentences, there was no significant difference in fine-level recency judgment accuracy between having correct coarse-level recency result versus incorrect coarse-level recency result, B = .11, SE = .37, p = .76. However, when source memory was correct for both sentences, having correct coarse-level recency led to a much higher fine-level recency accuracy than having incorrect coarse-level recency, B = 2.79, SE = .27, p < .001. This supported the hypothesis that people’s correct episodic memory of coarse-level event order facilitated the correct reconstruction of fine-level event order. Critically, this facilitation depended on having accurate source memory for both fine-level events.
Relationship among fine-level recency judgment confidence, coarse-level recency judgment confidence, and memory accuracy
We hypothesized that when people were judging the order of two fine-level events that came from two different coarse-level events (across-event pairs), their confidence in each fine-level recency judgment would be related to their confidence in the corresponding coarse-level recency judgment, controlling for their accuracy for both fine-level and coarse-level recency judgment and source memory. As in the previous analysis, we matched each recency judgment confidence trial performed on FS_across pairs with its corresponding recency judgment trial, corresponding coarse-level recency judgment trial and accompanying confidence trial, and source memory result.
Across all trials, the mean fine-level recency judgment confidence was 69.26 (SD = 26.96) and the mean coarse-level recency judgment confidence was 72.55 (SD = 26.76), both on a scale of 1 to 100. Both confidence rating distributions were highly left-skewed, with 42.0% of the fine-level recency judgment confidence scores and 46.5% of the coarse-level recency judgment confidence scores higher than 80. Therefore, we decided to use 80 as a cutoff, coding confidence scores greater than 80 as “High Confidence,” and confidence scores less than 80 as “Low Confidence.” We predicted fine-level recency judgment confidence (high/low) for each sentence pair using a logistic mixed-effects model, with the fixed effects of fine-level recency judgment result (0/1), source memory accuracy (0/1), coarse-level recency judgment accuracy (0/1), and coarse-level recency judgment confidence (high/low). All of the predictors were effect coded (all the memory accuracy predictors: incorrect = −1, correct = +1; coarse-level recency judgment confidence: low = −1, high = +1). Because we did not have a hypothesis regarding the interactions among the four predictors, we started by fitting the most complex model, with a four-way interaction among the four predictors, and worked down toward the least complex model with no interactions, removing effects that did not improve fit according to likelihood ratio tests. In the end, we found that the model without any interaction term yielded the best fit.
After model selection, we retained the random intercepts of subject and event pairs as random effects: Fine-level Recency Judgment Confidence (High/Low) ~ Fine-level Recency Judgment Result + Source Memory Accuracy + Coarse-level Recency Judgment Result + Coarse-level Recency Judgment Confidence + (1 | Participant) + (1| Event Pair).
As shown in Figure 6D, high coarse-level recency judgment confidence increased the probability of having high fine-level recency judgment confidence, leading to a significant main effect of coarse-level recency confidence, χ²(1) = 90.16, p < .001, when controlling for all other variables. In addition, higher fine-level recency judgment accuracy also increased the probability of having high fine-level recency judgment confidence, leading to a significant main effect of fine-level recency accuracy, χ²(1) = 14.34, p < .001, when controlling for all other variables. There was no significant main effect of source memory result, χ²(1) = .04, p = .85, and no significant main effect of coarse-level recency accuracy, χ²(1) = .13, p = .72. This confirmed our hypothesis that participants’ confidence in coarse-level recency judgments was related to their confidence in fine-level recency judgments, which provided another piece of evidence that participants might use coarse-level order information when reconstructing the order of fine-level events spanning an event boundary.
Discussion
In Experiment 3, we explicitly tested a specific mechanism behind the use of coarse-level event order to reconstruct fine-level event order in recency judgment tasks. We found that episodic memory for coarse-level event order could serve as an important basis for fine-level recency judgments spanning event boundaries, but only if source memory was intact: When participants could match both fine-level events to their corresponding coarse-level events, correctly remembering the order of coarse-level events was associated with greatly improved accuracy of retrieving the order of fine-level events. When participants failed to match both fine-level events to their corresponding coarse-level events, this was not the case. We also found that participants’ confidence in judging the order of fine-level events was related to their confidence in judging the order of their corresponding coarse-level events, again providing evidence for the use of hierarchical information in fine-level recency judgment tasks.
The fact that accurate source memory facilitated fine-level order memory only when coarse-level order memory also was accurate may explain why previous studies have failed to find a relationship between recency judgment accuracy and source memory. In Experiment 4 reported in Wen & Egner (2022), source memory was not found to be associated with recency judgment accuracy for across-event item pairs. One possibility is that because the color contexts changed randomly from one to another as in typical EDD paradigms, participants may sometimes have poor episodic memory for the order of context change, and this factor was not tested or controlled in the analysis.
The recency judgment results replicated the finding in Experiments 1a and 1b that semantic order knowledge on both coarse and fine levels facilitated fine-level order reconstruction. By measuring coarse-level recency judgment and comparing the performance across CS and FS narratives, we found empirical support for a previous assumption that semantic knowledge about coarse-level event order improved coarse-level recency judgment. This leads to a more comprehensive explanation of how semantic order knowledge scaffolds temporal order memory: Having semantic knowledge about how event order unfolds on a certain level helps with order reconstruction on the corresponding level; in addition, semantic knowledge of coarse-level order facilitates order reconstruction for fine-level events coming from different coarse-level events.
There was not a boundary-related source memory boost in the current paradigm, which differed from patterns found in previous studies using the EDD paradigm (Heusser et al., 2018). We expected this difference to occur because of a key difference between the design of our narrative stimuli and picture-list stimuli in previous paradigms: We designed the stories to make sure that source memory, or the “part-of” relationship between fine-level and coarse-level events, was primarily supported by knowledge about everyday activities (e.g., “take out a hoodie from the closet” is a part of the “prepare to go out at home” coarse-level event). Therefore, this type of source memory performance was less susceptible to modulation by the brief attentional boost caused by event boundary processing during encoding, whereas source memory that required more online encoding effort (e.g., remembering associations between random pictures and random color background) might be more susceptible. This suggests an important boundary condition on previous theories focusing on the mnemonic trade-off between source memory and across-event temporal order memory during encoding (Clewett et al., 2019; Heusser et al., 2018): This trade-off might not occur in everyday situations in which knowledge can compensate for both episodic source memory and associative links among events.
General Discussion
Here, we have proposed a novel mechanism for explaining temporal order memory in everyday scenarios: Semantic order knowledge and hierarchical event structure can be leveraged to facilitate the reconstruction of temporal order relationships among events. To test this proposal, we developed a narrative reading paradigm and tested it in a series of experiments. The results highlight and explain a discrepancy between a canonical finding in paradigms using simple pictorial stimuli and how the structure of everyday experience can additionally support temporal memory: Previous findings from the EDD paradigm have shown that order memory is generally impaired for stimuli spanning a contextual change compared to stimuli encoded in the same context (DuBrow & Davachi, 2013; Heusser et al., 2018; Pu et al., 2022). However, in the current experiments, adding just a little of the structure inherent in naturalistic activities led to a reversal of this effect in specific conditions that could be predicted based on well-grounded principles of knowledge and memory. Whereas previous theories have focused exclusively on the formation of associations between fine-level items and the influence of event boundaries on this binding, here we propose that the hierarchical organization of activity can provide additional support for temporal order memory. If someone is asked to remember whether they wiped the table or adjusted their car temperature first, they could use information about the hierarchical organization of the activity to respond: If they know that wiping the table was part of cleaning up at home, whereas adjusting car temperature was part of driving to work, and also know that cleaning up came before driving, then wiping the table must have happened before adjusting car temperature.
This shows that besides relying on direct associations between the fine-level events, rememberers can also utilize the hierarchical organization and the order of the coarse-level events for order reconstruction. Moreover, information about temporal order of both the fine-grained and coarse-grained events could come from semantic knowledge as well as from episodic memory.
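The inference described in the table-wiping example can be made concrete with a small Python sketch. This is only an illustration of the logic under simplifying assumptions (deterministic memory for membership and coarse-level order); it is not a cognitive model proposed in this article, and the function and variable names are hypothetical.

```python
def infer_fine_order(event_a, event_b, membership, coarse_order):
    """Return the fine-level event judged to be earlier, or None.

    membership:   {fine-level event: coarse-level event}, i.e., source memory
    coarse_order: list of coarse-level events in remembered or known order
    None means the hierarchy alone cannot settle the judgment.
    """
    parent_a = membership.get(event_a)
    parent_b = membership.get(event_b)
    if parent_a is None or parent_b is None:
        return None  # source memory is missing for at least one event
    if parent_a == parent_b:
        return None  # within-event pair: coarse-level order is uninformative
    if parent_a not in coarse_order or parent_b not in coarse_order:
        return None  # coarse-level order is unknown
    if coarse_order.index(parent_a) < coarse_order.index(parent_b):
        return event_a
    return event_b


membership = {
    "wipe the table": "clean up at home",
    "adjust car temperature": "drive to work",
}
coarse_order = ["clean up at home", "drive to work"]
earlier = infer_fine_order(
    "adjust car temperature", "wipe the table", membership, coarse_order
)
```

Here the function returns "wipe the table". If the membership of either event is missing, it returns None, mirroring the finding in Experiment 3 that coarse-level order helped only when source memory was intact for both fine-level events.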
In the current studies, when semantic knowledge only provided information about the order of fine-level events (FS narratives), recency judgments for within-event pairs were more accurate than those for across-event pairs. In contrast, when semantic knowledge gave information about the order of coarse-level events (CS narratives), recency judgments for across-event pairs were more accurate than those for within-event pairs. The data supported the central hypothesis that semantic order knowledge on both coarse and fine levels could be applied to facilitate recency judgment, and that this effect overrode the influence of event segmentation alone (Experiments 1a, 1b, and 3). This proposal was further supported by results from the recency judgment confidence measure: When there was no semantic order knowledge facilitation, having coarse-level event membership as an extra source of information increased confidence in recency judgment, after controlling for accuracy (Experiments 1a and 1b). Participants’ serial recall of the narratives provided converging evidence that semantic order constraints served as an important factor in organizing the order of recall on both the coarse and fine levels. The serial recall data also indicated that participants frequently chunked their recall of fine-level events based on coarse-level event membership (Experiment 2), which is consistent with the use of hierarchical structure. Building on these findings, we further tested a generalized mechanism of how coarse-level event order could be used to reconstruct the order of fine-level events separated by an event boundary: When participants could match both fine-level events to their corresponding coarse-level event, correct episodic memory of coarse-level event order helped with reconstructing fine-level event order (Experiment 3).
Although previous studies using picture lists found a consistent effect that event boundaries impaired recency judgment for items that spanned across a contextual boundary (DuBrow & Davachi, 2013; Heusser et al., 2018; Pu et al., 2022), the current results show that, in more naturalistic events such as narratives about everyday activities, the effect of an event boundary on recency judgment can depend on the reader’s ability to use information other than episodic memory, including semantic knowledge about stereotypical event order and the hierarchical structure of the event. These results suggest that an adequate mechanism to explain temporal memory in naturalistic events should consider how information about event structure can be utilized in reconstructing the order of events. Indeed, previous accounts tended to explain the influence of event boundaries on temporal order memory using mechanisms that emphasize associations formed in the experimental context on a single temporal level (DuBrow & Davachi, 2013; Howard & Kahana, 2002; Lewandowsky & Murdock, 1989). These accounts entail that event boundaries induce a trade-off between temporal memory and other types of memories: Shifting one’s attention to process novel contextual information can enhance recognition memory and item-context associative memory (source memory) at event boundaries, but at the same time disrupt the encoding of item-level associative links. Therefore, items spanning two different contexts suffer from impaired order memory and inflated distance ratings (Heusser et al., 2018; Clewett et al., 2019). However, we found that this trade-off need not occur in the current narrative stimuli: Whereas boundary processing did not seem to affect source memory performance, the extent to which across-event temporal order memory was enhanced or impaired depended on where semantic order constraints occurred. 
In addition, temporal distance ratings were always inflated for across-event sentence pairs, both in conditions in which temporal order memory was worse for across-event pairs (the standard previous result) and in conditions in which this pattern was reversed. Thus, these results cannot be explained solely by associative links formed at the fine level during encoding, and they indicate that the role event boundaries play in temporal order memory cannot be described simply as “disruptive.” Rather, the process of segmenting experiences into distinct, hierarchically organized events can also provide support for temporal memory by facilitating the reconstruction of the structural relationship among events on multiple timescales (Lichtenstein & Brewer, 1980).
The current results are consistent with certain principles outlined by the Event Horizon Model (Radvansky, 2012; Radvansky & Zacks, 2017) regarding how people create structured representations of events during perception and store them in long-term memory. This theoretical framework proposes that people segment an ongoing stream of activity into distinct event models and transform them into “episodes” in long-term memory. During this process, semantic knowledge not only plays an important role in combining with incoming sensory information to construct the working event model, but also serves as an important factor in organizing the relationship among event models in long-term memory (Radvansky & Zacks, 2017; Zacks, 2020). In addition, it is worth noting that event structure formed during perception can serve as an effective chunking mechanism. By grouping fine-level events into larger coarse-level events, it helps create a hierarchical and relational representation of events in long-term memory. This representation is likely to include a mechanism to represent the order of events on both the coarse and fine levels, and a way to represent the membership of fine-level events within certain coarse-level events. As a result, when asked to judge the recency between two fine-level events that come from two different coarse-level events, instead of relying only on the direct temporal relationship between the two fine-level events, an additional path is to consult the temporal relationship between the coarse-level events they each belong to.
It is worth noting differences between the mechanism we proposed here and several current computational models of temporal order memory in event cognition (Horner et al., 2016; Pu et al., 2022). Those models were largely based on the EDD paradigm that orthogonalized the relationship between items and contexts, and the only role that event boundaries play in these paradigms is to alter the stability of the encoding context. Both the Horner et al. (2016) and Pu et al. (2022) models are derived from temporal context models (Estes, 1950; Howard & Kahana, 2002), which associate discrete items to be encoded with a context signal that gradually drifts over time. Both models assumed a sharp change in the contextual representation at event boundaries, implemented as a faster random drifting rate in the context signal (Horner et al., 2016) or as a reinstatement of the pre-experimental context (Pu et al., 2022). Neither of these two models for recency judgment is able to account for the results from our study, because their unstructured context representations lack the ability to model multi-scale organization of events or pre-existing schematic order constraints among events. One existing model of short-term and episodic memory, introduced by Farrell (2012), allows for a hierarchical representation of temporal context, such that temporal order can be represented over different timescales. However, despite its ability to model episodic temporal relationships learned in the experimental context, it remains a challenge for this model to represent structured transition dynamics among event units influenced by knowledge and event schemas. In contrast, the Structured Event Memory (SEM) model (Franklin et al., 2020; Nguyen et al., 2024) has the capacity to learn about relations amongst features within an event, statically and across time.
However, a limitation of the current SEM model is that it only learns and represents relations on a single level, without a mechanism to explicitly represent multi-level temporal dynamics. Our findings strongly imply that, to build computational models that account for temporal order memory phenomena in naturalistic contexts, a promising future direction is to design architectures that combine a hierarchical representational scheme with the flexibility to represent both episodic and semantic relations amongst events.
Limitations and Future Directions
In these studies, we used shifts in spatial location to induce event boundaries (Radvansky et al., 2001; Zwaan & Radvansky, 1998). This manipulation was validated by the observation of increased reading time for boundary sentences during encoding (Experiment 1a) and inflated temporal distance ratings for fine-level event pairs that spanned an event boundary during retrieval (Experiments 1a and 1b), which replicate previous findings (Radvansky et al., 2001; Zwaan & Radvansky, 1998; Ezzyat & Davachi, 2014).
One limitation of the current study is that the stimuli we used contain pre-determined hierarchical and semantic structure created by the experimenter, such that a given narrative contained either coarse-level or fine-level semantic order constraints. In realistic scenarios, there are probably semantic order constraints operating on multiple levels simultaneously. In future work it would be interesting to include narratives that have semantic order constraints at neither the coarse nor fine levels, and narratives that have semantic order constraints at both levels. It would also be possible to study the effects of more naturalistic, covarying semantic constraints at multiple levels by using prose stories or movies (Lee et al., 2020); one could then estimate the effects of semantic order constraints at each level while controlling for the effects of other levels statistically. Another limitation is that there are potentially multiple bases for semantic constraints on temporal order, which were not distinguished here. For example, causal relations—such as that frying an egg necessarily comes later than cracking the egg—might have different effects from conventional relations—such as that when attending a sporting event, one sings the national anthem before the play commences. In the present research we did not attempt to distinguish causal relations from conventional relations, or from other types that may prove relevant.
A further limitation of the current study is that readers saw both coarse-level event labels and fine-level event sentences during the encoding phase, whereas in real-world reading they are likely to encounter only fine-level event sentences. The rationale for this design was to make the presentation of stimuli directly parallel the EDD paradigm and to enable the testing of coarse-level order memory and source memory in Experiment 3. Future studies could omit coarse-level event labels from stimulus presentation and test whether readers still spontaneously use hierarchical event structure to reason about temporal order.
Conclusion
In conclusion, the current findings highlight that memory for temporal structure is more than episodic memory for associations between individual instances on a single temporal scale. Other sources of information, including semantic knowledge and hierarchical event structure, can play important roles in scaffolding the reconstruction of temporal event sequences in memory. Thus, the findings support the view that using stimuli and task structures more representative of real-life situations can help us identify memory mechanisms that are crucial yet overlooked by previous theories (Parra & Radvansky, 2024; Pooja et al., 2024; Ranganath, 2022). The interaction of multiple memory processes is likely to be crucial in supporting the flexibility and efficiency of event memory in real-world scenarios.
Supplementary Material
Table 1.
Comparison of Methods in Experiments 1-3
| | Experiment 1a | Experiment 1b | Experiment 2 | Experiment 3 |
|---|---|---|---|---|
| Sample Size (after data exclusion) | N = 40 | N = 27 | N = 32 | N = 49 |
| Encoding Stimuli | 10 narratives (5 CS + 5 FS) | 10 narratives (5 CS + 5 FS) | 6 narratives (3 CS + 3 FS) | 10 narratives (5 CS + 5 FS) |
| Encoding Task | Self-paced reading | Self-paced reading with audio | Self-paced reading with audio | Self-paced reading with audio |
| Delay Length | ~ 2.5 minutes | ~ 20 minutes | ~ 2.5 minutes | ~ 20 minutes |
| Retrieval Task | (a) Recency Judgment (b) Recency Judgment Confidence (c) Temporal Distance Rating | (a) Recency Judgment (b) Recency Judgment Confidence (c) Temporal Distance Rating | (a) Serial Recall | (a) Recency Judgment (b) Recency Judgment Confidence (c) Source Memory (d) Coarse-level Recency Judgment (e) Coarse-level Recency Judgment Confidence |
Acknowledgments
This work was supported by grants from the James S. McDonnell Foundation, the National Institutes of Health (R01AG06243801), and the Office of Naval Research (N00014-17-1-2961). We would like to thank Devon Alperin for help with data collection and recall coding. We would also like to thank members of the Dynamic Cognition Lab and the Complex Memory Lab for helpful discussions and support. Results were reported as posters at the Psychonomic Society’s 64th and 65th Annual Conference and the Cognitive Neuroscience Society 2024 Annual Meeting. Results were presented as a talk at the 2024 APS Annual Convention.
Footnotes
We preregistered all the studies reported in this manuscript. The preregistrations are available at the following links: Experiment 1a (https://osf.io/42d6p), Experiment 1b (https://osf.io/6j4k9), Experiment 2 (https://osf.io/8wxhf), and Experiment 3 (https://osf.io/nghwt). Data, materials, and code for all four experiments have been made publicly available at Open Science Framework (OSF) and can be accessed at https://osf.io/gm2c8. We have no conflicts of interest to disclose.
References
- Abelson RP (1981). Psychological status of the script concept. American Psychologist, 36(7), 715–729. 10.1037/0003-066X.36.7.715
- Antony J, Lozano A, Dhoat P, Chen J, & Bennion K (2024). Causal and Chronological Relationships Predict Memory Organization for Nonlinear Narratives. Journal of Cognitive Neuroscience, 1–18. 10.1162/jocn_a_02216
- Baddeley A (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4(11), 417–423. 10.1016/S1364-6613(00)01538-2
- Baldassano C, Chen J, Zadbood A, Pillow JW, Hasson U, & Norman KA (2017). Discovering Event Structure in Continuous Narrative Perception and Memory. Neuron, 95(3), 709–721.e5. 10.1016/j.neuron.2017.06.041
- Barr DJ, Levy R, Scheepers C, & Tily HJ (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. 10.1016/j.jml.2012.11.001
- Bates D, Kliegl R, Vasishth S, & Baayen H (2018). Parsimonious Mixed Models (No. arXiv:1506.04967). arXiv. 10.48550/arXiv.1506.04967
- Bates D, Mächler M, Bolker B, & Walker S (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1–48. 10.18637/jss.v067.i01
- Bower GH, Black JB, & Turner TJ (1979). Scripts in memory for text. Cognitive Psychology, 11(2), 177–220. 10.1016/0010-0285(79)90009-4
- Brewer WF, & Dupree DA (1983). Use of plan schemata in the recall and recognition of goal-directed actions. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9(1), 117–129. 10.1037/0278-7393.9.1.117
- Buonomano DV, Buzsáki G, Davachi L, & Nobre AC (2023). Time for Memories. Journal of Neuroscience, 43(45), 7565–7574. 10.1523/JNEUROSCI.1430-23.2023
- Clewett D, & Davachi L (2017). The ebb and flow of experience determines the temporal structure of memory. Current Opinion in Behavioral Sciences, 17, 186–193. 10.1016/j.cobeha.2017.08.013
- Clewett D, DuBrow S, & Davachi L (2019). Transcending time in the brain: How event memories are constructed from experience. Hippocampus, 29(3), 162–183. 10.1002/hipo.23074
- Clewett D, Gasser C, & Davachi L (2020). Pupil-linked arousal signals track the temporal organization of events in memory. Nature Communications, 11(1), Article 1. 10.1038/s41467-020-17851-9
- Clewett D, Huang R, & Davachi L (2025). Locus coeruleus activation “resets” hippocampal event representations and separates adjacent memories. Neuron, 113(15), 2521–2535.e8. 10.1016/j.neuron.2025.05.013
- Cowan ET, Chanales AJ, Davachi L, & Clewett D (2024). Goal Shifts Structure Memories and Prioritize Event-defining Information in Memory. Journal of Cognitive Neuroscience, 1–17. 10.1162/jocn_a_02220
- Davis EE, & Campbell KL (2023). Event boundaries structure the contents of long-term memory in younger and older adults. Memory, 31(1), 47–60. 10.1080/09658211.2022.2122998
- Diamond NB, & Levine B (2020). Linking Detail to Temporal Structure in Naturalistic-Event Recall. Psychological Science, 31(12), 1557–1572. 10.1177/0956797620958651
- Ding Y, & Zacks J (2023, May 22). The Influence of Semantic Knowledge and Event Hierarchy on Temporal Order Memory in Naturalistic Events. 10.17605/OSF.IO/42D6P
- Ding Y, & Zacks J (2023, October 19). The Influence of Semantic Knowledge and Event Hierarchy on the Temporal Structure of Recall in Naturalistic Events. 10.17605/OSF.IO/8WXHF
- Ding Y, & Zacks J (2024, February 13). The Influence of Semantic Knowledge and Event Hierarchy on Temporal Order Memory in Naturalistic Events with a 20-minute Delay. 10.17605/OSF.IO/6J4K9
- Ding Y, & Zacks J (2024, March 18). The Influence of Semantic Knowledge and Event Hierarchy on Fine-level and Coarse-level Temporal Order Memory in Naturalistic Events with a 20-minute Delay. 10.17605/OSF.IO/NGHWT
- Ding Y, & Zacks JM (2025, June 28). The Influence of Semantic Knowledge and Event Hierarchy on Temporal Order Memory in Naturalistic Events. Retrieved from osf.io/gm2c8
- DuBrow S, & Davachi L (2013). The influence of context boundaries on memory for the sequential order of events. Journal of Experimental Psychology: General, 142(4), 1277–1286. 10.1037/a0034024
- DuBrow S, & Davachi L (2014). Temporal Memory Is Shaped by Encoding Stability and Intervening Item Reactivation. Journal of Neuroscience, 34(42), 13998–14005. 10.1523/JNEUROSCI.2535-14.2014
- DuBrow S, & Davachi L (2016). Temporal binding within and across events. Neurobiology of Learning and Memory, 134, 107–114. 10.1016/j.nlm.2016.07.011
- DuBrow S, Rouhani N, Niv Y, & Norman KA (2017). Does mental context drift or shift? Current Opinion in Behavioral Sciences, 17, 141–146. 10.1016/j.cobeha.2017.08.003
- Ericsson KA, & Kintsch W (1995). Long-term working memory. Psychological Review, 102(2), 211–245. 10.1037/0033-295X.102.2.211
- Estes WK (1950). Toward a statistical theory of learning. Psychological Review, 57(2), 94–107. 10.1037/h0058559
- Ezzyat Y, & Davachi L (2011). What constitutes an episode in episodic memory? Psychological Science, 22(2), 243–252. 10.1177/0956797610393742
- Ezzyat Y, & Davachi L (2014). Similarity Breeds Proximity: Pattern Similarity within and across Contexts Is Related to Later Mnemonic Judgments of Temporal Proximity. Neuron, 81(5), 1179–1189. 10.1016/j.neuron.2014.01.042
- Farrell S (2012). Temporal clustering and sequencing in short-term memory and episodic memory. Psychological Review, 119(2), 223–271. 10.1037/a0027371
- Franklin NT, Norman KA, Ranganath C, Zacks JM, & Gershman SJ (2020). Structured Event Memory: A neuro-symbolic model of event cognition. Psychological Review, 127(3), 327–361. 10.1037/rev0000177
- Friedman WJ (1993). Memory for the time of past events. Psychological Bulletin, 113(1), 44–66. 10.1037/0033-2909.113.1.44
- Friedman WJ (2004). Time in Autobiographical Memory. Social Cognition, 22(5), 591–605. 10.1521/soco.22.5.591.50766
- Geerligs L, Gözükara D, Oetringer D, Campbell KL, Van Gerven M, & Güçlü U (2022). A partially nested cortical hierarchy of neural states underlies event segmentation in the human brain. eLife, 11, e77430. 10.7554/eLife.77430
- Gernsbacher MA (1991). Cognitive Processes and Mechanisms in Language Comprehension: The Structure Building Framework. In Bower GH (Ed.), Psychology of Learning and Motivation (Vol. 27, pp. 217–263). Academic Press. 10.1016/S0079-7421(08)60125-5
- Green P, & MacLeod CJ (2016). SIMR: An R package for power analysis of generalized linear mixed models by simulation. Methods in Ecology and Evolution, 7(4), 493–498. 10.1111/2041-210X.12504
- Gurguryan L, Dutemple E, & Sheldon S (2021). Conceptual similarity alters the impact of context shifts on temporal memory. Memory, 29(1), 11–20. 10.1080/09658211.2020.1841240
- Hard BM, Tversky B, & Lang DS (2006). Making sense of abstract events: Building event schemas. Memory & Cognition, 34(6), 1221–1235. 10.3758/BF03193267
- Heusser AC, Ezzyat Y, Shiff I, & Davachi L (2018). Perceptual boundaries cause mnemonic trade-offs between local boundary processing and across-trial associative binding. Journal of Experimental Psychology: Learning, Memory, and Cognition, 44(7), 1075–1090. 10.1037/xlm0000503
- Horner AJ, Bisby JA, Wang A, Bogus K, & Burgess N (2016). The role of spatial boundaries in shaping long-term event representations. Cognition, 154, 151–164. 10.1016/j.cognition.2016.05.013
- Howard MW, & Kahana MJ (2002). A Distributed Representation of Temporal Context. Journal of Mathematical Psychology, 46(3), 269–299. 10.1006/jmps.2001.1388
- Lee H, & Chen J (2022). Predicting memory from the network structure of naturalistic events. Nature Communications, 13(1), Article 1. 10.1038/s41467-022-31965-2
- Lehn H, Steffenach H-A, van Strien NM, Veltman DJ, Witter MP, & Håberg AK (2009). A Specific Role of the Human Hippocampus in Recall of Temporal Sequences. Journal of Neuroscience, 29(11), 3475–3484. 10.1523/JNEUROSCI.5370-08.2009
- Lewandowsky S, & Murdock BB Jr. (1989). Memory for serial order. Psychological Review, 96(1), 25–57. 10.1037/0033-295X.96.1.25
- Lichtenstein EH, & Brewer WF (1980). Memory for goal-directed events. Cognitive Psychology, 12(3), 412–445. 10.1016/0010-0285(80)90015-8
- Lüdecke D (2021). sjPlot: Data visualization for statistics in social science. R Package Version, 2(7), 1–106.
- Mandler JM (1984). Stories, Scripts, and Scenes: Aspects of Schema Theory. Lawrence Erlbaum.
- Matuschek H, Kliegl R, Vasishth S, Baayen H, & Bates D (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315. 10.1016/j.jml.2017.01.001
- McClay M, Sachs ME, & Clewett D (2023). Dynamic emotional states shape the episodic structure of memory. Nature Communications, 14(1), 6533. 10.1038/s41467-023-42241-2
- McNerney MW, Goodwin KA, & Radvansky GA (2011). A Novel Study: A Situation Model Analysis of Reading Times. Discourse Processes, 48(7), 453–474. 10.1080/0163853X.2011.582348
- McRae K, Brown KS, & Elman JL (2021). Prediction-Based Learning and Processing of Event Knowledge. Topics in Cognitive Science, 13(1), 206–223. 10.1111/tops.12482
- Michelmann S, Hasson U, & Norman KA (2023). Evidence That Event Boundaries Are Access Points for Memory Retrieval. Psychological Science, 34(3), 326–344. 10.1177/09567976221128206
- Morey RD, & Rouder JN (2012). BayesFactor: Computation of Bayes Factors for Common Designs (p. 0.9.12–4.7) [Dataset]. 10.32614/CRAN.package.BayesFactor
- Morton NW (2020). Psifr: Analysis and visualization of free recall data. Journal of Open Source Software, 5(54), 2669. 10.21105/joss.02669
- Murdock BB (1983). A distributed memory model for serial-order information. Psychological Review, 90(4), 316–338. 10.1037/0033-295X.90.4.316
- Myers JL, Shinjo M, & Duffy SA (1987). Degree of causal relatedness and memory. Journal of Memory and Language, 26(4), 453–465. 10.1016/0749-596X(87)90101-X
- Nguyen TT, Bezdek MA, Gershman SJ, Bobick AF, Braver TS, & Zacks JM (2024). Modeling human activity comprehension at human scale: Prediction, segmentation, and categorization. PNAS Nexus, 3(10), pgae459. 10.1093/pnasnexus/pgae459
- Parra D, & Radvansky GA (2024). A novel study: Fragmented and holistic forgetting. Memory, 32(9), 1258–1266. 10.1080/09658211.2024.2401020
- Polyn SM, Norman KA, & Kahana MJ (2009). A context maintenance and retrieval model of organizational processes in free recall. Psychological Review, 116(1), 129–156. 10.1037/a0014420
- Pooja R, Ghosh P, & Sreekumar V (2024). Towards an ecologically valid naturalistic cognitive neuroscience of memory and event cognition. Neuropsychologia, 203, 108970. 10.1016/j.neuropsychologia.2024.108970
- Pu Y, Kong X-Z, Ranganath C, & Melloni L (2022). Event boundaries shape temporal organization of memory by resetting temporal context. Nature Communications, 13(1), Article 1. 10.1038/s41467-022-28216-9
- Radvansky GA (2012). Across the Event Horizon. Current Directions in Psychological Science, 21(4), 269–272. 10.1177/0963721412451274
- Radvansky GA, & Zacks JM (2017). Event boundaries in memory and cognition. Current Opinion in Behavioral Sciences, 17, 133–140. 10.1016/j.cobeha.2017.08.006
- Radvansky GA, Zwaan RA, Curiel JM, & Copeland DE (2001). Situation models and aging. Psychology and Aging, 16(1), 145–160. 10.1037/0882-7974.16.1.145
- Rait LI, & Hutchinson JB (2024). Recall as a Window into Hippocampally Defined Events. Journal of Cognitive Neuroscience, 36(11), 2386–2400. 10.1162/jocn_a_02198
- Ranganath C (2022). What is episodic memory and how do we use it? Trends in Cognitive Sciences, 26(12), 1059–1061. 10.1016/j.tics.2022.09.023
- Rouhani N, Norman KA, Niv Y, & Bornstein AM (2020). Reward prediction errors create event boundaries in memory. Cognition, 203, 104269. 10.1016/j.cognition.2020.104269
- Rubin DC, & Umanath S (2015). Event memory: A theory of memory for laboratory, autobiographical, and fictional events. Psychological Review, 122(1), 1–23. 10.1037/a0037907
- Rumelhart DE (1980). Schemata: The Building Blocks of Cognition. In Theoretical Issues in Reading Comprehension (pp. 33–58). Routledge.
- Sasmita K, & Swallow KM (2023). Measuring event segmentation: An investigation into the stability of event boundary agreement across groups. Behavior Research Methods, 55(1), 428–447. 10.3758/s13428-022-01832-5
- Shin YS, & DuBrow S (2021). Structuring memory through inference-based event segmentation. Topics in Cognitive Science, 13(1), 106–127. 10.1111/tops.12505
- Sols I, DuBrow S, Davachi L, & Fuentemilla L (2017). Event Boundaries Trigger Rapid Memory Reinstatement of the Prior Events to Promote Their Representation in Long-Term Memory. Current Biology, 27(22), 3499–3504.e4. 10.1016/j.cub.2017.09.057
- St. Jacques P, Rubin DC, LaBar KS, & Cabeza R (2008). The Short and Long of It: Neural Correlates of Temporal-order Memory for Autobiographical Events. Journal of Cognitive Neuroscience, 20(7), 1327–1341. 10.1162/jocn.2008.20091
- Strickland B, & Keil F (2011). Event completion: Event based inferences distort memory in a matter of seconds. Cognition, 121(3), 409–415. 10.1016/j.cognition.2011.04.007
- Trabasso T, & van den Broek P (1985). Causal thinking and the representation of narrative events. Journal of Memory and Language, 24(5), 612–630. 10.1016/0749-596X(85)90049-X
- Tulving E, & Pearlstone Z (1966). Availability versus accessibility of information in memory for words. Journal of Verbal Learning and Verbal Behavior, 5(4), 381–391. 10.1016/S0022-5371(66)80048-8
- Wang YC, & Egner T (2022). Switching task sets creates event boundaries in memory. Cognition, 221, 104992. 10.1016/j.cognition.2021.104992
- Wen T, & Egner T (2022). Retrieval context determines whether event boundaries impair or enhance temporal order memory. Cognition, 225, 105145. 10.1016/j.cognition.2022.105145
- Xu X, & Kwok SC (2019). Temporal-order iconicity bias in narrative event understanding and memory. Memory, 27(8), 1079–1090. 10.1080/09658211.2019.1622734
- Yousif SR, Lee SH, Sherman BE, & Papafragou A (2024). Event representation at the scale of ordinary experience. Cognition, 249, 105833. 10.1016/j.cognition.2024.105833
- Zacks JM (2020). Event Perception and Memory. Annual Review of Psychology, 71(1), 165–191. 10.1146/annurev-psych-010419-051101
- Zacks JM, Speer NK, & Reynolds JR (2009). Segmentation in reading and film comprehension. Journal of Experimental Psychology: General, 138(2), 307–327. 10.1037/a0015305
- Zacks JM, Speer NK, Swallow KM, Braver TS, & Reynolds JR (2007). Event perception: A mind-brain perspective. Psychological Bulletin, 133(2), 273–293. 10.1037/0033-2909.133.2.273
- Zacks JM, Speer NK, Vettel JM, & Jacoby LL (2006). Event understanding and memory in healthy aging and dementia of the Alzheimer type. Psychology and Aging, 21(3), 466–482. 10.1037/0882-7974.21.3.466
- Zacks JM, & Tversky B (2001). Event structure in perception and conception. Psychological Bulletin, 127(1), 3–21. 10.1037/0033-2909.127.1.3
- Zwaan RA, & Radvansky GA (1998). Situation models in language comprehension and memory. Psychological Bulletin, 123(2), 162–185. 10.1037/0033-2909.123.2.162
