Event Segmentation Improves Event Memory Up to One Month Later

Shaney Flores; Heather R Bailey; Michelle L Eisenberg; Jeffrey M Zacks

doi:10.1037/xlm0000367

Author manuscript; available in PMC: 2018 Aug 1.

Published in final edited form as: J Exp Psychol Learn Mem Cogn. 2017 Apr 6;43(8):1183–1202. doi: 10.1037/xlm0000367

PMCID: PMC5542882 NIHMSID: NIHMS842308 PMID: 28383955

Abstract

When people observe everyday activity, they spontaneously parse it into discrete meaningful events. Individuals who segment activity in a more normative fashion show better subsequent memory for the events. If segmenting events effectively leads to better memory, does asking people to attend to segmentation improve subsequent memory? To answer this question, participants viewed movies of naturalistic activity with instructions to remember the activity for a later test, and in some conditions additionally pressed a button to segment the movies into meaningful events or performed a control condition that required button-pressing but not attending to segmentation. In five experiments, memory for the movies was assessed at intervals ranging from immediately following viewing to one month later. Performing the event segmentation task led to superior memory at delays ranging from 10 min to one month. Further, individual differences in segmentation ability predicted individual differences in memory performance for up to a month following encoding. This study provides the first evidence that manipulating event segmentation affects memory over long delays and that individual differences in event segmentation are related to differences in memory over long delays. These effects suggest that attending to how an activity breaks down into meaningful events contributes to memory formation. Instructing people to more effectively segment events may serve as a potential intervention to alleviate everyday memory complaints in aging and clinical populations.

Keywords: event segmentation, delay, memory

Organizing information into comprehensible and meaningful structures is important for memory. If one can form a coherent representation of an experience that binds together features of the experience into a structured framework, this reduces the amount of information that needs to be stored while also providing a useful system of indexing for later retrieval, much like a catalog at a library. In the laboratory, imposing organization during encoding of verbal materials helps memory formation and facilitates recall by providing cues to access stored information at retrieval (e.g., Tulving, 1962; Tulving & Pearlstone, 1966). Participants can recall more items from structured word lists than from unstructured lists and, when asked to freely recall items from an unstructured list, will tend to cluster their responses by category similarity (Bousfield, 1953; Bousfield & Cohen, 1955; Jenkins & Russell, 1952). Semantic clustering can be reinforced or undermined by the temporal structure of a list, with robust effects on memory (Polyn, Norman, & Kahana, 2009). However, these sorts of materials place an important limit on the study of organizational processes because they are highly organized from the start. Each word or picture comes tightly packaged by the experimenter as an informational chunk.

For more complex information, such as narratives or life events, a wider range of organizational processes can come into play. Event memories are fundamentally structured by organization in time and space, and often exhibit covarying organization on other dimensions including characters, objects, causes, and goals (Radvansky & Zacks, 2014; Rubin & Umanath, 2015). Moreover, memories for particular events are often informed by knowledge acquired from repeated experience with similar events. For example, the structure of one’s memory of going to see a particular film with a friend likely depends not just on information encoded during that episode, but also on information acquired from previous trips to the theatre. Schank and Abelson (1977) proposed that this prior information is stored in scripts, which are generalized knowledge structures that capture the typical patterns in routine activities. Scripts affect memory both by organizing information at encoding and by enabling missing information to be inferred at retrieval. When asked to recall information from stories or movies of routine events, people are more likely to organize their responses to fit their pre-existing scripts, recall more information directly related to the script compared to unrelated information, and distort the remembered order of events to better conform to script norms (Abbott, Black, & Smith, 1985; Bower, Black, & Turner, 1979; Bower & Clark-Meyers, 1980; Brewer & Dupree, 1983; Lichtenstein & Brewer, 1980; Migueles & García-Bajos, 2012).

Script-based studies of memory for events have addressed one important question of how organizational processes operate in memory: how do the units of an activity relate to each other and to knowledge in memory? However, they leave unanswered a complementary question: how are these units formed? Outside the laboratory, our cognitive system is bombarded by a vast amount of continuous and dynamic perceptual information necessary to our functioning and survival. Without an experimenter to provide us with units in the form of discrete stimuli and trials, we are forced to segment continuous real-life experience into organized elements ourselves. This question has been addressed by studies of event segmentation (Newtson, 1973, 1976). The segmentation of activity into events is a consequence of the automatic and ongoing perceptual behavior that underlies everyday perception, meaning that people segment ongoing information even when not explicitly asked to do so (Hard, Recchia, & Tversky, 2011; Zacks et al., 2001). Even infants as young as nine to eleven months of age are capable of parsing dynamic action (e.g., Baldwin, Baird, Saylor, & Clark, 2001; Saylor, Baldwin, Baird, & LaBounty, 2007), suggesting that this process may be a fundamental component of human cognition.

One account for how people segment activity into organized elements is event segmentation theory (Zacks, Speer, Swallow, Braver, & Reynolds, 2007). According to this theory, a perceiver actively maintains a representation of the current environmental context called an event model. The current event model is useful for allowing one to make predictions about upcoming activity and thus for guiding proactive behavior. The perceptual system monitors prediction error, and when prediction error increases, the event model is updated based on current perceptual input, knowledge (including script knowledge), and memory representations of related previous events. This updating process leads to the formation of units in long-term memory. In addition to enabling better predictions, segmentation may have an additional benefit for memory in the form of cognitive economy: Segmentation temporally compresses related perceptual data into more robust long term memory units that can be accessed more quickly.

Studies of memory for filmed events suggest that how an event is segmented is related to how it is subsequently remembered (Boltz, 1992; Hanson & Hirst, 1989; Lassiter, 1988; Lassiter, Stone, & Rogers, 1988; Newtson & Engquist, 1976; Sargent et al., 2013; Schwan, Garsoffky, & Hesse, 2000; Zacks & Tversky, 2001). In addition, several studies have found that individual differences in segmentation predict individual differences in subsequent memory (Bailey et al., 2013; Kurby & Zacks, 2011; Sargent et al., 2013; Zacks, Speer, Vettel, & Jacoby, 2006). For example, a study by Sargent et al. (2013) investigated this relationship across the adult lifespan in a sample of adults aged 20 to 79 years who segmented activity into events and then completed several memory measures for those activities. Each participant’s segmentation was compared to that of the group as a whole by calculating a segmentation agreement score that measures how well the individual’s segmentation corresponds with the typical responses for the group (Kurby & Zacks, 2011). Participants also completed a battery of cognitive tests assessing working memory capacity, processing speed, general knowledge, and episodic memory. After controlling for these general cognitive factors, segmentation agreement remained a unique contributor to memory for the activities. This suggests that those who segment activity more normatively form representations that are better organized to support long-term retrieval. However, all studies to date have assessed the relationship between segmentation and memory using short delays, as event memory has always been tested within the same experimental session as the event segmentation task. In the present study, we asked whether these individual differences in segmentation predict memory over longer delays.

If event segmentation predicts memory, is this relationship causal or does it merely reflect some as-yet-unmeasured third variable that affects both segmentation and memory? If the relationship is causal, this opens up the possibility for improving memory for naturalistic activity by facilitating effective segmentation. This is an important potential application, because memory complaints are a prominent feature of healthy aging, age-related diseases such as Alzheimer’s disease, and brain injury (Gilewski, Zelinski, & Schaie, 1990; Jorm & Jacomb, 1989; Van Zomeren & Van den Burg, 1985). There are a few hints in the literature that experimentally manipulating stimuli to facilitate or impair segmentation affects subsequent memory. Boltz (1992) showed participants one of two episodes of the TV miniseries A Perfect Spy, in which commercial breaks were placed so as to coincide or conflict with event boundaries. Commercials that supported natural event boundaries facilitated subsequent memory, whereas commercials that conflicted with event boundaries impaired memory. A conceptually similar, but more subtle, manipulation was carried out by Schwan and colleagues (2000), who manipulated the presence of film edits rather than commercial breaks in brief movies. Like Boltz (1992), they found that placing interruptions at event boundaries facilitated subsequent memory. Zacks and Tversky (2003) taught participants to perform new everyday tasks, such as building a model, by using computer interfaces that either supported effective segmentation or did not. Learning was better when the interface supported effective segmentation.

Manipulating participants’ orientation to event segmentation during event encoding also may affect subsequent memory. Lassiter and colleagues found that instructing participants to segment events at a fine temporal grain led to better memory than instructions to segment at a coarse grain (Lassiter, 1988; Lassiter et al., 1988). However, the generality of this finding has been a matter of debate (see Hanson & Hirst, 1989; Lassiter & Slaw, 1991).

In nearly all previous studies investigating memory and segmentation, memory was assessed within a few minutes following encoding of the event. Sargent et al. (2013), Bailey et al. (2013), Kurby and Zacks (2011), and Lassiter (1988) asked participants to segment movies of everyday actions lasting less than 10 min, and then assessed memory for each movie directly after it was shown. Hanson and Hirst (1989) had participants watch a five-minute distractor video before assessing memory. In Boltz (1992), memory was tested immediately after movie presentation, but because the movies lasted 40–45 min the effective delay was a bit longer, at least for some items. To date, no studies have examined the effect segmentation has on memory over delays longer than at most 45 min, a crucial test if active segmentation is to be relevant to real-world memory.

Certainly, encoding manipulations can affect memory over a long delay. For example, in survival processing manipulations (Nairne, Thompson, & Pandeirada, 2007) participants perform an incidental encoding task of rating words on their relevance to surviving in the wilderness. Memory for the words is better following this encoding task as compared to other encoding tasks, such as judging the pleasantness of the words, at delays as long as four days (Abel & Bäuml, 2013; Clark & Bruno, 2015; Raymaekers, Otgaar, & Smeets, 2014). In some cases, the most effective level of an encoding variable depends on the delay between study and test. For example, research on the spacing effect shows that increasing the time between restudy sessions can improve memory, and this benefit can persist for up to eight years (Bahrick, Bahrick, Bahrick, & Bahrick, 1993; Bahrick & Phelps, 1987; Glenberg & Lehmann, 1980; Pavlik & Anderson, 2005). However, the optimal amount of spacing depends on the delay between the last study episode and the final test: For optimal memory, the lag between study sessions had to increase in parallel with the retention interval (Cepeda, Pashler, Vul, Wixted, & Rohrer, 2006). The effect of an encoding manipulation can even reverse as the retention interval increases. For example, Roediger and Karpicke (2006) asked participants to read expository passages and then either restudy them or attempt to recall them. Restudying the passages resulted in better memory performance at a five-minute delay; however, at delays of two days or one week the recall group performed better. Thus, it is of interest whether manipulations of event segmentation facilitate memory over longer delays, and whether the effects change with delay.

One hint that event segmentation effects on memory are durable comes from a recent study of infants’ event memory (Sonne, Kingo, & Krøjgaard, 2016). In this experiment, 16-month-old and 20-month-old infants viewed brief cartoons that were edited to occlude scene information either during an event boundary or during an event middle. When they returned to the laboratory two weeks later, they were shown the movies they had seen, each paired with a new movie. The infants showed a familiarity preference, looking more at the previously-seen cartoons than at the new ones, indicating that they remembered information from the movies. This memory effect was stronger for cartoons that had been presented with the middles occluded. Thus, memory after two weeks was better for videos whose event boundaries had been preserved during presentation.

The studies reported here were designed to answer three questions. First, does deliberate attention to event segmentation facilitate memory? Second, do individual differences in segmentation ability predict individual differences in memory? Finally, if these relationships between segmentation and memory are present, are they robust over retention intervals of days to weeks—the sorts of intervals that are relevant for real-world event memory? To answer these questions, we conducted a series of experiments in which participants viewed movies of everyday activities and either segmented them or engaged in a control task, and then attempted to remember them at delays ranging from a few moments to one month.

Experiment 1

In Experiment 1 we examined memory at delay intervals of 10 minutes, one day and one week. During encoding, participants segmented movies into meaningful events or performed one of two control tasks. Memory was tested using a recall test and a recognition test for still pictures from the movie. This allowed us, first, to ask whether segmentation improved memory relative to the control conditions, and second, whether segmentation performance predicted subsequent memory.

We conducted the study using an online pool of human participants from the Amazon Mechanical Turk (AMT) system. Several studies have shown a strong degree of reliability between data produced by AMT participants and participants tested in the laboratory for both perception-based and cognitive experiments (Crump, McDonnell, & Gureckis, 2013; Germine et al., 2012; Goodman, Cryder, & Cheema, 2013; Sprouse, 2011; for a review of AMT, see Mason & Suri, 2012).

Method

Participants

We recruited 482 participants (266 females; age range: 18–66 years; median age: 30 years) from AMT. The study required participants to complete three parts (one at each delay interval). Participants were paid $0.50 for completing session one, $1 for completing session two, and $2.50 for completing session three. All workers were from the United States and had a human intelligence task (HIT) approval rating of at least 95%. Before beginning the study, each participant was presented with a description of the procedure and indicated consent by clicking to continue.

Materials and Tasks

Participants watched three movies of actors engaging in everyday activities (see Figure 1), and completed two memory tests for each movie. The movies included a female actor making breakfast (329 s), a male actor decorating a room for a party (376 s), and a male actor gardening (354 s). A short practice movie of a man building a boat out of Duplo blocks (155 s) also was presented. During the initial encoding, each participant performed one of three tasks (event segmentation, intentional encoding, or timing) with all three movies. Participants in the event segmentation group were instructed to press the space bar whenever, in their judgment, one meaningful unit of activity ended and another began (e.g., Newtson, 1976). They were instructed to identify the smallest units that were meaningful to them, while trying to remember as much as possible. Those in the intentional encoding group were instructed simply to watch the movie and try to remember as much as possible. Those in the timing group were instructed to press the space bar every 15 s during the movie, again while trying to remember as much as possible. No external prompt was given; they were asked to estimate the 15-s intervals themselves. The timing task controlled for potential effects of the motor task on event memory. However, it should be noted that attending to counting may have artificially interfered with viewers’ event encoding. All three task groups were informed that their memory for the movies would be tested.

Figure 1. Still frames taken from three of the experimental movies: making breakfast, decorating for a party, and gardening.

After watching all three movies one time each while performing their assigned task, participants completed two memory tests for each film: recall and recognition memory. For the recall memory measure, participants were instructed to type what had occurred in the movie in as much detail as possible. They were given five minutes to respond. Testing procedures for recognition memory were similar to those described in Zacks et al. (2006). On each trial, participants were shown two still pictures: a target picture from the movie viewed and a lure from footage of the same actor performing similar activities in the same setting. (For example, for the breakfast video, foil pictures depicted the actor preparing different ingredients or assembling them in a different order.) Participants were instructed to select the picture that had come from the movie they had viewed. Twenty trials were presented for each movie, and recognition memory was scored as the proportion of trials answered correctly for each movie.

Finally, to assess the potential contribution of knowledge about everyday events to event memory, participants completed a measure of their script knowledge (Rosen, Caplan, Sheesley, Rodriguez, & Grafman, 2003). In this task, participants were given three minutes to type the sequence of events, line-by-line, associated with three everyday activities (getting ready for work, shopping for groceries, going out to dinner). An example response of a different activity (putting a child to bed) was provided to participants to help them structure their responses.

Design and Procedure

Testing occurred in three sessions over a period of one week. In session one, participants viewed the three movies while performing the intentional encoding, timing, or event segmentation task. Each movie was viewed only once, and task was manipulated between participants. Demographic information by group for all studies is presented in Table 1. The timing and event segmentation groups were given the opportunity to practice their assigned task on a separate practice movie before proceeding to the main experiment. Movie presentation order was counterbalanced. Following the presentation of the third movie, participants completed the recall and recognition tests for the first movie viewed. The delay period between the presentation of the first movie and its memory test was approximately 10 minutes.

Table 1.

Demographic information for each participant group.

	Condition	N	Number Female	Median Age (yrs)	Age Range (yrs)
Experiment 1	Segmentation	243	140	31	18 – 66
	Intentional Encoding	84	43	29	18 – 61
	Timing	155	83	31	20 – 63
Experiment 2a	Segmentation	260	172	31.5	18 – 65
	Intentional Encoding	196	123	32	19 – 77
Experiment 2b	Segmentation	287	191	31	18 – 75
	Intentional Encoding	288	178	30	18 – 77
Experiment 3	Segmentation	140	90	30	18 – 65
	Intentional Encoding	210	124	29	18 – 67
Experiment 4	Segmentation/Immediate	135	70	32	18 – 70
	Intentional Encoding/Immediate	106	63	30	18 – 68
	Timing/Immediate	136	83	31.5	18 – 69
	Segmentation/10 Minute Delay	142	93	32	20 – 66
	Intentional Encoding/10 Minute Delay	119	76	35	18 – 65
	Timing/10 Minute Delay	138	96	30	18 – 66

Session two occurred 24 – 48 hours later. During this session, participants completed memory tests for the second movie viewed during session one and completed the script knowledge task. Session three occurred seven to nine days following session one. During this session, participants completed the recall and recognition tests for the last movie viewed during session one.

In short, the study design included the between-participants variable of task (segmentation, intentional encoding, or timing) and the within-participants variable of delay (10 min, 1 day, 1 week).

Recall and Script Elicitation Scoring

The primary dependent measures were recall and recognition performance, and segmentation performance for the event segmentation group. Here, we describe how the recall protocols and segmentation agreement were scored, and also how the independent variable of script knowledge was scored.

Due to the large number of responses, hand scoring of the recall protocols was not practical. Therefore, we validated a simple alternative measure, starting with scored recall data from a previous study (Sargent et al., 2013). In that study, recall protocols from 208 participants were scored using an adaptation of the action coding system developed by Schwartz, Reed, Montgomery, Palmer, and Mayer (1991), which divides the actor’s stream of behavior into fine-grained action units; for example, “walks into kitchen”, “puts soap on hand”, “turns on water.” In that study, inter-rater reliability was found to be 0.84, p < .001, 95% CI [0.79, 0.90]. To create a recall measure that could be calculated automatically, we simply counted the number of words in each protocol and divided each count by the number of fine-grained actions in the scoring rubric to give a normalized word count. For the Sargent et al. (2013) study, the normalized word count measure correlated well with the hand-scored recall measure, r = 0.77, p < .001, 95% CI [0.71, 0.82]. To ensure the normalized word count measure captured the previously-observed relationship between event segmentation and memory, we computed linear mixed effects models with hand-scored recall or normalized word count as the dependent measures, a fixed effect of segmentation agreement, and random effects of movie and participant. For both analyses, there was a strong effect of segmentation agreement: For the hand-scoring, F(1, 579.55) = 30.2, p < .001; for normalized word count, F(1, 568.33) = 9.96, p = .002.

Therefore, for the present studies, we used the normalized word count as the criterial measure of recall memory. We adopted the same approach to score the script elicitation task: Responses were scored as the mean number of lines for the three everyday activities assessed.
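The two automatic scores described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring code: the function names are hypothetical, words are assumed to be whitespace-delimited tokens, and script lines are assumed to be non-empty lines of typed text. The number of fine-grained actions per movie comes from the scoring rubric and is passed in as a parameter.

```python
from typing import Sequence


def normalized_word_count(protocol: str, n_rubric_actions: int) -> float:
    """Word count of a typed recall protocol divided by the number of
    fine-grained actions in that movie's scoring rubric.

    Assumption: a "word" is any whitespace-delimited token.
    """
    if n_rubric_actions <= 0:
        raise ValueError("n_rubric_actions must be positive")
    return len(protocol.split()) / n_rubric_actions


def script_score(responses: Sequence[str]) -> float:
    """Mean number of lines across the script-elicitation responses.

    Assumption: blank lines are ignored when counting lines.
    """
    if not responses:
        raise ValueError("at least one response is required")
    line_counts = [
        sum(1 for line in response.splitlines() if line.strip())
        for response in responses
    ]
    return sum(line_counts) / len(line_counts)
```

Because the word count is divided by the number of rubric actions, scores are comparable across movies that contain different numbers of actions, and a score above 1 simply means the protocol contained more words than the rubric has actions.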

Segmentation Agreement

To assess how well each participant in the event segmentation condition segmented each movie, we calculated segmentation agreement scores for each viewing (Kurby & Zacks, 2011). Each movie was divided into 1-s bins. For each participant, each bin was coded as “1” if the participant segmented during that bin and “0” otherwise. These distributions were summed across participants to create a normative probability-of-segmentation time series. Segmentation agreement was calculated by computing the point-biserial correlation between an individual’s segmentation and the normative time series for the group, and then scaling each correlation based on the number of boundaries the person identified (see Kurby & Zacks, 2011). This scaling accounts for the fact that the minimum and maximum possible correlations for a particular viewing depend on the number of boundaries identified by the viewer, and on the particular values of the normative time series. Scaled values range from zero to one, with zero being the worst possible agreement and one being the best.
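The computation can be illustrated with a short Python sketch. It rests on several assumptions that are not spelled out in the text above: the scaled score is taken to be (r_obs - r_min) / (r_max - r_min); r_max and r_min are obtained by placing the viewer's number of boundaries in the bins with the highest and lowest normative values, respectively; and the normative series includes the viewer's own responses. All function names are hypothetical. The point-biserial correlation is computed as an ordinary Pearson correlation, to which it is numerically identical when one variable is binary.

```python
import math


def bin_presses(press_times_s, movie_len_s, bin_s=1.0):
    """Code each bin as 1 if the viewer pressed during it, else 0."""
    n_bins = math.ceil(movie_len_s / bin_s)
    bins = [0] * n_bins
    for t in press_times_s:
        bins[min(int(t // bin_s), n_bins - 1)] = 1
    return bins


def normative_series(binned_by_viewer):
    """Sum the 0/1 series across viewers, bin by bin."""
    return [sum(column) for column in zip(*binned_by_viewer)]


def pearson(x, y):
    """Pearson correlation; equals the point-biserial r when x is binary."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def segmentation_agreement(individual, norm):
    """Scale the observed correlation into [0, 1] given the viewer's
    number of boundaries (assumed min-max scaling, see lead-in)."""
    k = sum(individual)
    n = len(norm)
    if k == 0 or k == n:
        raise ValueError("viewer must mark some, but not all, bins")
    order = sorted(range(n), key=lambda i: norm[i])
    worst = [0] * n
    best = [0] * n
    for i in order[:k]:   # k bins with the lowest normative values
        worst[i] = 1
    for i in order[-k:]:  # k bins with the highest normative values
        best[i] = 1
    r_obs = pearson(individual, norm)
    r_min = pearson(worst, norm)
    r_max = pearson(best, norm)
    if r_max == r_min:
        raise ValueError("scaling is undefined when r_max equals r_min")
    return (r_obs - r_min) / (r_max - r_min)
```

In this sketch, a viewer whose boundaries all fall in the most frequently chosen bins scores 1, and a viewer whose boundaries all fall in the least frequently chosen bins scores 0, regardless of how many boundaries each viewer marked.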

Data Preparation

Of the participants enrolled, 157 failed to complete all sessions within the specified time windows and thus were excluded from analyses. An additional 29 participants were dropped because valid data were not recorded for one or more memory measures, and 89 participants from the segmentation and timing groups were dropped because valid button-pressing data were not recorded for at least one of the experimental movies. For the remaining participants in the segmentation and timing groups, we calculated the mean number of button presses per minute for each participant and excluded outlying participants whose measure was more than 2.5 SDs from the group mean; this resulted in the loss of 11 participants. For recall memory, participants whose word counts were fewer than 10 words for one or more of the recall responses were identified as outliers and excluded. Response times (RTs) for the recognition memory task were trimmed by excluding RTs from incorrect trials and RTs greater than 2.5 SDs from the within-subject mean. Participants whose mean trimmed RT was greater than 2.5 SDs from the mean across participants or was faster than 500 ms for at least one movie’s recognition test were excluded from further analyses. We did not exclude participants on the basis of recognition accuracy. In total, 14 participants were excluded for being outliers on the memory tasks. Our final sample consisted of n = 177 (71 segmentation, 52 intentional encoding, and 54 timing).
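The trimming and outlier rules can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the function names are hypothetical, the within-subject mean and SD are assumed to be computed over correct trials only, "greater than 2.5 SDs from the mean" is assumed to apply in both directions, and the sample SD is used.

```python
from statistics import mean, stdev


def trim_rts(trials, sd_cutoff=2.5):
    """Trim one participant's recognition RTs for one movie.

    `trials` is a list of (rt_ms, correct) pairs. Incorrect trials are
    dropped first; then RTs more than `sd_cutoff` SDs from the
    participant's own mean are dropped (assumed two-sided).
    """
    correct_rts = [rt for rt, correct in trials if correct]
    if len(correct_rts) < 2:
        return correct_rts
    m, s = mean(correct_rts), stdev(correct_rts)
    return [rt for rt in correct_rts if abs(rt - m) <= sd_cutoff * s]


def flag_outliers(values, sd_cutoff=2.5):
    """Return indices of values more than `sd_cutoff` SDs from the
    group mean, e.g. button presses per minute or mean trimmed RT."""
    m, s = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - m) > sd_cutoff * s]


def too_fast(mean_trimmed_rts_ms, floor_ms=500):
    """True if any movie's mean trimmed RT is faster than the floor."""
    return any(rt < floor_ms for rt in mean_trimmed_rts_ms)
```

A participant would then be excluded if flagged by `flag_outliers` on the relevant measure or by `too_fast` on any movie's recognition test.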

Results and Discussion

Analyses were conducted in the R statistical environment (version 3.0.2; R Core Team, 2013) using the lme4 package for linear mixed effects modeling (Bates et al., 2015). We used linear mixed effects models because they can simultaneously account for the influence of multiple random effects, in this case the effects of participants and of movies (for a review, see Baayen, Davidson, & Bates, 2008). For the current experiment, all linear mixed models treated both participant and movie as random effects, and treated encoding task and delay as fixed effects. In the analyses assessing the effects of segmentation agreement or script knowledge on memory, these variables were also treated as fixed effects. The lmerTest package (Kuznetsova, Brockhoff, & Christensen, 2014) was used to test the fixed main effects and interactions by comparing nested models, using the Satterthwaite approximation for degrees of freedom. To compare the different encoding conditions within each delay condition, we fit separate models for each delay and then used the lsmeans (Lenth, 2016) and multcomp (Hothorn, Bretz, & Westfall, 2008) packages to compute Tukey tests comparing the different encoding conditions within delay.

Recall

We first fit a linear mixed model to predict recall performance from the fixed effects of task group and delay along with their interaction.¹ As can be seen in Figure 2, more information was recalled by the segmentation group (M = 1.00, SD = 0.49) than either the intentional encoding (M = 0.91, SD = 0.46) or timing (M = 0.81, SD = 0.46) groups, resulting in a main effect of task, F(2,173.99) = 8.58, p < .001. Mean recall performance decreased across the delay intervals (10 min: M = 1.00, SD = 0.49; 1 day: M = 0.91, SD = 0.46; 1 week: M = 0.82, SD = 0.46), resulting in a main effect of delay, F(2,345.98) = 18.97, p < .001. The interaction of task and delay was not significant (F(4,345.93) = 0.57, p = .68), indicating that task group difference was relatively stable across the delay intervals. As is shown in Table 3, the segmentation group out-performed the intentional encoding group significantly at all three delays. This provides the first evidence that explicitly segmenting everyday activity into meaningful events can incidentally produce a memory benefit when memory for those events is accessed much later in time. The segmentation group also was significantly better than the timing group at each delay. The intentional encoding and timing groups did not differ significantly at any delay.

Table 3.

Pairwise Comparisons Between Tasks at Each Delay.

	Delay	Conditions compared	Recall			Recognition Accuracy			Recognition Response Time
			df	t	p	df	t	p	df	t	p
Experiment 1	10 Minute	Segmentation vs. Intentional	172	3.154	0.00525	172	2.69	0.0213	172	2.225	0.0697
		Segmentation vs. Timing	172	2.857	0.01333	172	2.129	0.0869	172	0.684	0.7729
		Timing vs. Passive	172	−0.322	0.94451	172	−0.561	0.8408	172	−1.464	0.3105
	1 Day	Segmentation vs. Intentional	172	2.43	0.0424	172	2.524	0.0333	172	2.742	0.0183
		Segmentation vs. Timing	172	2.717	0.0197	172	1.266	0.4162	172	1.527	0.2805
		Timing vs. Passive	172	0.236	0.9697	172	−1.198	0.4557	172	−1.161	0.478
	1 Week	Segmentation vs. Intentional	172	3.938	< 0.001	172	2.216	0.071263	172	2.131	0.0866
		Segmentation vs. Timing	172	3.166	0.00519	172	3.881	0.000417	172	0.511	0.8657
		Timing vs. Passive	172	−0.763	0.72583	172	1.518	0.284829	172	−1.528	0.2802
Experiment 2a	10 Minute	Segmentation vs. Intentional	118	1.495	0.138	117	1.172	0.244	118	−0.433	0.666
	1 Month	Segmentation vs. Intentional	118	1.22	0.225	117	1.517	0.132	118	−0.317	0.752
Experiment 2b	10 Minute	Segmentation vs. Intentional	180	0.587	0.558	180	2.046	0.0422	180	0.664	0.508
	1 Month	Segmentation vs. Intentional	180	0.413	0.68	180	1.898	0.0594	180	0.717	0.475
Experiment 3	Immediate	Segmentation vs. Intentional	300	−1.16	0.247	300	−1.083	0.28	300	−0.094	0.925
Experiment 4	Immediate	Segmentation vs. Intentional	288	−1.23	0.436	288	−1.15	0.4843	288	0.884	0.6507
		Segmentation vs. Timing	288	0.455	0.892	288	1.205	0.4509	288	2.66	0.0224
		Timing vs. Passive	288	1.704	0.205	288	2.383	0.0467	288	1.801	0.171
	10 Minute	Segmentation vs. Intentional	281	2.426	0.0419	281	1.17	0.472	281	0.456	0.892
		Segmentation vs. Timing	281	1.782	0.1776	281	1.742	0.191	281	1.609	0.244
		Timing vs. Passive	281	−0.613	0.813	281	0.58	0.831	281	1.149	0.485

Recognition

Separate analyses were conducted for recognition accuracy and response time. As can be seen in Figure 3, the segmentation group had higher recognition accuracy (M = 0.69, SD = 0.13) than either intentional encoding (M = 0.64, SD = 0.15) or timing (M = 0.64, SD = 0.14), resulting in a significant main effect of task, F(2,173.96) = 9.23, p < .001. The linear mixed model also detected a main effect of delay, F(2,345.82) = 21.81, p < .001, such that the recognition performance dropped significantly between 10 min (M = 0.71, SD = 0.12) and 1 day (M = 0.64, SD = 0.14), t(346) = 4.64, p < .001. Performance on recognition remained relatively stable between 1 day and 1 week (M = 0.63, SD = 0.15), t(346) = 1.72, p = .20. The interaction of task and delay was not significant, F(4,345.76) = 2.06, p = .08. As is indicated in Table 3, participants in the segmentation condition performed significantly better than the intentional encoding condition for the 10-minute and 1-day delays. Further, at a 1-week delay the segmentation group performed significantly better than the timing group and marginally better than the intentional encoding group. The intentional encoding and timing groups did not differ significantly at any delay.

The segmentation group responded more slowly (M = 4225 ms, SD = 1858 ms) to recognition probes than either the intentional encoding (M = 3480 ms, SD = 1833 ms) or timing (M = 3930 ms, SD = 1747 ms) groups (see Figure 4), resulting in a main effect of group on recognition response time, F(2,173.99) = 3.92, p = .02. On average, participants responded faster to recognition memory trials at the 1-week delay (M = 3590 ms, SD = 1810 ms) than at either the 10-min (M = 4113 ms, SD = 1748 ms) or 1-day (M = 4046 ms, SD = 1924 ms) delays, yielding a main effect of delay, F(2,346.04) = 10.28, p < .001. The task-by-delay interaction was not significant for response time, F(4,346.02) = 0.55, p = .70, suggesting that group differences in response times were relatively stable across the delay intervals. Only one pairwise difference across conditions was significant: the segmentation group was significantly slower than the intentional encoding group at the 1-day delay (see Table 3).

Figure 4. Mean trimmed recognition response times for task groups at each delay for all six experiments. In over half of the experiments, no significant differences in recognition response time were found among the task groups. In Experiment 1, the segmentation group took significantly longer to respond, whereas in Experiment 4 the timing group was significantly faster to respond. Error bars are 95% confidence intervals.

In sum, participants who segmented the ongoing flow of perceptual activity into meaningful events remembered those events better than participants who simply watched with the intention to remember or who pressed a button to mark 15 s intervals. This benefit was seen in both recall and recognition tests, and was stable for up to a week following encoding.

For recognition memory, participants who had encoded by segmenting responded more slowly than the other groups, which may indicate that their superior accuracy was in part due to a speed-accuracy trade-off. However, given that the test conditions were identical for all three encoding groups, this seems unlikely. A more plausible account is that when participants were tested on movies they had segmented, they retrieved more information and thus took longer to make their recognition judgments.

Segmentation Agreement

Descriptive statistics for the segmentation task in the boundary segmentation group are given in Table 2, with data for the timing task for comparison. To test the hypothesis that segmentation agreement predicted memory, we fit linear mixed models predicting recall word count from segmentation agreement. All measures were z-scored to control for differences between movies and delays; segmentation agreement was z-scored within movie, and memory measures were z-scored within movie and delay. The full model included fixed effects of segmentation agreement and delay (dummy coded as 3 binary indicator variables), plus their interaction (coded as 3 additional variables). Because we were interested in the relationship between segmentation and memory, we constrained the intercepts in the model to be zero. The random effect of participant was modeled as an effect on memory performance at the 10 min delay; in other words, the 10 min delay was treated as the reference level characterizing differences between people. (Models using the other two delays as the reference level produced equivalent results.) For recall word count, there was a strong effect of segmentation, F(1, 202.2) = 12.8, p < .001, as can be seen in Figure 5. To test whether the relationship varied with delay, we compared the full model to a reduced model with no interaction terms. The comparison was not significant, meaning there was no evidence that the strength of the relationship varied with delay, χ²(2) = 0.50, p = .78. For recognition accuracy, neither the effect of segmentation nor the interaction with delay was significant, F(1, 209.0) = 0.52, p = .47; χ²(2) = 1.52, p = .47 (see Figure 6).
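The normalization step described above can be illustrated with a minimal sketch. It is not the authors' analysis code: the data rows, the field names (`pid`, `movie`, `delay`, `agreement`, `words`), the helper `zscore_within`, and the use of the sample standard deviation are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_within(rows, value_key, group_keys):
    """Standardize rows[value_key] separately within each group.

    Groups are defined by the tuple of values under group_keys.
    Uses the sample SD (an assumption; the text does not specify),
    so each group needs at least two rows with non-identical values.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in group_keys)].append(row[value_key])
    stats = {g: (mean(v), stdev(v)) for g, v in groups.items()}
    z = []
    for row in rows:
        m, s = stats[tuple(row[k] for k in group_keys)]
        z.append((row[value_key] - m) / s)
    return z


# Hypothetical data: one row per participant and movie.
rows = [
    {"pid": 1, "movie": "A", "delay": "10 min", "agreement": 0.4, "words": 50},
    {"pid": 2, "movie": "A", "delay": "10 min", "agreement": 0.6, "words": 70},
    {"pid": 3, "movie": "A", "delay": "10 min", "agreement": 0.8, "words": 90},
    {"pid": 1, "movie": "B", "delay": "1 day", "agreement": 0.5, "words": 30},
    {"pid": 2, "movie": "B", "delay": "1 day", "agreement": 0.7, "words": 40},
    {"pid": 3, "movie": "B", "delay": "1 day", "agreement": 0.9, "words": 50},
]

# Segmentation agreement: z-scored within movie.
z_agreement = zscore_within(rows, "agreement", ("movie",))
# Memory measure (recall word count): z-scored within movie and delay.
z_words = zscore_within(rows, "words", ("movie", "delay"))
```

After this step, both measures have a mean of zero within each movie (and, for memory, within each delay), which is consistent with fitting the models without an intercept.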

Table 2.

Descriptive Statistics for Number of Button Presses Identified during Encoding Task by Group

	Segmentation	Timing
Exp. 1
Range	1 – 107	13 – 37
Mean Number of Boundaries	31.35	22.05
Median Number of Boundaries	24	22
Exp. 2A
Range	3 – 67
Mean Number of Boundaries	16.55
Median Number of Boundaries	14
Exp. 2B
Range	3 – 76
Mean Number of Boundaries	17.73
Median Number of Boundaries	14
Exp. 3
Range	6 – 275
Mean Number of Boundaries	32.53
Median Number of Boundaries	20
Exp. 4
Range	1 – 117	7 – 49
Mean Number of Boundaries	23.36	23.11
Median Number of Boundaries	16	23


Figure 5. Relationship between segmentation agreement and recall memory at each delay for all six experiments. In all experiments, segmentation agreement predicted recall performance. All plots reflect mean normalized word count for a participant at each delay with a 95% confidence interval for the regression line.

Figure 6. Relationship between segmentation agreement and recognition accuracy at each delay for all six experiments. The relationship between segmentation agreement and recognition accuracy is mixed. Experiments 2B and 4 showed a significant relationship between segmentation agreement and recognition performance, whereas Experiments 1, 2A, and 3 did not. All plots reflect mean recognition accuracy for a participant at each delay with a 95% confidence interval for the regression line.

In sum, we replicated previous findings that more normative segmentation predicts better subsequent memory (Bailey et al., 2013; Sargent et al., 2013; Zacks et al., 2006). Importantly, for the first time, we showed that this effect is stable for up to one week.

Script Knowledge

To investigate whether script knowledge could account for the relationship between segmentation and memory, we constructed two linear mixed models. The first model was constructed in a manner similar to the model used to test recall and recognition performance (the ‘null’ model); it included the fixed effects of task and delay, and their interaction, but not the effect of script knowledge. A second model included the effect of script knowledge but no interaction with either task group or delay (the ‘no interaction’ model). We first analyzed the influence of script knowledge as a fixed effect by comparing the null and no interaction models. For recall, the comparison was significant, meaning that pre-existing script knowledge was a significant predictor of recall performance, χ²(1) = 25.41, p < .001. In this model, the effects of both task group and delay were still significant [task: F(2,172.98) = 8.44, p < .001; delay: F(2,345.98) = 26.85, p < .001]. To test whether the effect of script knowledge varied with either delay or task, we constructed a third model that included the two-way interactions of script knowledge with task and delay, and the three-way interaction of task, delay, and script knowledge (the ‘full’ model) and compared this to the no interaction model. The addition of these interaction terms did not significantly improve model fit, χ²(8) = 4.76, p = .78. For recognition accuracy, the no interaction model did not provide any additional explanatory power beyond the null model, χ²(1) = 1.85, p = .17. These findings replicate those of Sargent et al. (2013), suggesting that script knowledge facilitates event memory, but does so through a mechanism that is independent of segmentation.
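As a consistency check, the p values for the χ² model comparisons reported in this section can be recomputed from each statistic and its degrees of freedom. The sketch below is illustrative only: the helper `chi2_sf` is a hypothetical name, it uses the closed-form upper-tail probability of the χ² distribution, and it handles only df = 1 and even df, which covers every comparison reported here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Closed forms are used for df = 1 and for even df only;
    other df values are outside the scope of this sketch.
    """
    if x < 0 or df < 1:
        raise ValueError("x must be >= 0 and df must be >= 1")
    half = x / 2.0
    if df == 1:
        return math.erfc(math.sqrt(half))
    if df % 2 == 0:
        # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise ValueError("only df = 1 or even df are handled in this sketch")


# (description, chi-square statistic, df) as reported in the text.
comparisons = [
    ("recall: agreement-by-delay interaction", 0.50, 2),
    ("recognition: agreement-by-delay interaction", 1.52, 2),
    ("recall: script knowledge, null vs. no interaction", 25.41, 1),
    ("recall: no interaction vs. full", 4.76, 8),
    ("recognition: script knowledge, null vs. no interaction", 1.85, 1),
]

p_values = {label: chi2_sf(stat, df) for label, stat, df in comparisons}
for label, p in p_values.items():
    print(f"{label}: p = {p:.3f}")
```

Rounded to two decimals, the recomputed values agree with the reported p values (.78, .47, < .001, .78, and .17, respectively).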

Together, the group differences and individual differences provide the first evidence that attending to everyday event structure produces an event memory benefit up to a week after encoding. This is important for theories of long-term memory, because it suggests that memory representations that undergo consolidation and support stable long-term memory reflect the event structure of the experiences that were encoded (Rubin & Umanath, 2015). These results are also of potential practical significance because they suggest that variables affecting event structure encoding can determine memory performance long past the encoding episode, allowing for the development of potential interventions to alleviate memory complaints for everyday activities.

As noted previously, the timing control has both advantages and disadvantages. We included it so as to have a comparison condition that, like the segmentation task, involves making motor decisions and executing actions during movie viewing. However, it could well have been that the requirement to count off time would interfere with attending to the events in the movie, imposing an artificial dual-task cost. In fact, the timing group performed similarly to the intentional encoding group; therefore, it was eliminated in Experiments 2A, 2B, and 3. (We brought this condition back in Experiment 4, for reasons we will discuss later.)

Experiments 2A and 2B

The primary aim of Experiment 2 was to ask whether the effect of segmentation on memory was retained at a delay of one month. In addition, we asked whether the relationship between individual differences in segmentation agreement and memory extended to this longer delay. Finally, in Experiment 2 we addressed a limitation of the design of Experiment 1. In that study, movies were always tested in the order in which they were presented. Thus, the movie tested in the encoding session was always the first movie presented, and the movie tested at the one week delay was always the last movie presented, confounding presentation order with delay. This leaves open the possibility that differences between the delays could be due to proactive interference rather than delay. In Experiment 2, the relationship between presentation order and delay was counterbalanced to control for this confound. We were interested in further exploring the possibility that the advantage of the segmentation group in recognition accuracy was due to a speed-accuracy trade-off, so we again measured recognition memory accuracy and response time.
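One common way to counterbalance presentation order against delay is a cyclic Latin square, sketched below. This is only an illustration of the general technique and not a description of the assignment scheme actually used in Experiment 2; the function name `latin_square` and the placeholder delay labels are assumptions.

```python
def latin_square(conditions):
    """Cyclic Latin square: row r is the condition list rotated by r.

    Each condition appears exactly once in every row and every column.
    """
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)]
            for row in range(n)]


# Placeholder labels; the actual delays and number of movies are not assumed.
delays = ["delay_1", "delay_2", "delay_3"]
schedule = latin_square(delays)

# Participants cycle through the rows. Column index = presentation position,
# cell value = the delay at which the movie in that position is tested.
for participant in range(6):
    row = schedule[participant % len(schedule)]
    print(participant, row)
```

Across every complete cycle of participants, each presentation position is paired with each delay equally often, so order effects such as proactive interference are not confounded with delay.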

When conducting Experiment 2, we found that the one month delay substantially increased our attrition rates. We therefore collected a second sample; the two samples are reported separately as Experiments 2A and 2B.