Abstract
The present study examined how the function relating continued retrieval practice (e.g., 1, 3, or 5 tests) and long-term memory retention is modulated by desirable difficulty (Bjork, 1994). Of particular interest was how retrieval difficulty differed across young and older adults and across manipulations of lag (Experiment 1) and spacing (Experiment 2). To extend previous studies, acquisition phase response latency was used as a proxy for retrieval difficulty, and analysis of final test performance was conditionalized on acquisition phase retrieval success to more directly examine the influence of desirable difficulty on retention. Results from Experiment 1 revealed that continued testing in the short lag condition led to consistent increases in retention, whereas continued testing in the long lag condition led to increasingly smaller benefits in retention for both age groups. Results from Experiment 2 revealed that repeated spaced testing enhanced retention relative to taking one spaced test for both age groups; however, repeated massed testing only enhanced retention over taking one test for young adults. Across both experiments, the response latency results were overall consistent with an influence of desirable difficulty on retention. Discussion focuses on the role of desirable difficulty during encoding in producing the benefits of lag, spacing, and testing.
Keywords: testing effect, spacing effect, retrieval practice, aging, refreshing
Introduction
Healthy aging is marked by broad declines in episodic memory (Arking, 1998; Balota, Dolan & Duchek, 2000; Salthouse, 1996). Given the substantial increase in our aging population, there is a clear need to identify ways of improving memory that are effective across diverse populations and variable aging trajectories (e.g., Hertzog, Kramer, Wilson & Lindenberger, 2009). One technique that is effective across varying contexts and populations, spaced retrieval practice, combines the mnemonic benefits of spacing and testing (see Balota, Duchek, & Logan, 2007, and Cepeda, Pashler, Vul, Wixted, & Rohrer, 2006 for reviews). Indeed, spaced retrieval practice has been shown to enhance performance in healthy older adults (e.g., Balota, Duchek, & Paullin, 1989), individuals with Alzheimer’s disease (e.g., Balota, Duchek, Sergent-Marshall, & Roediger, 2006; Camp, Foss, Stevens, & O’Hanlon, 1996), and individuals suffering from amnesia (Schacter, Rich & Stamp, 1985).
Given the effectiveness of spaced retrieval, it is important to better understand why this technique improves memory and how it can be used most effectively. Multiple mechanisms have been proposed to account for the benefits of spaced retrieval, including the combined study-phase retrieval and encoding variability account (e.g., Greene, 1989; Raaijmakers, 2003; see Cepeda et al., 2006 for a review). Here, we focus on one account of the benefits of spaced retrieval, desirable difficulty (Bjork, 1994), which suggests that more effortful retrieval will lead to greater strengthening of the underlying memory trace than less effortful retrieval, assuming that successful retrieval occurs in both situations. Specifically, spaced practice produces better performance than massed practice because the second retrieval event in the spaced condition is relatively more difficult, thereby producing better retention. The present study extends previous examinations of desirable difficulty in two ways. First, we operationally define desirable difficulty in a way that utilizes response latencies on retrieval trials during the acquisition phase. Second, we examine the influence of desirable difficulty on retention by accounting for differences across conditions in acquisition performance that may later influence final test performance.
With these methodological extensions in hand, we can more carefully address the following two questions: First, how does the efficiency of retrieval practice differ across healthy young and older adults given past research indicating age differences in optimal spacing schedules (e.g., Maddox, Balota, Coane, & Duchek, 2011)? Second, how can the efficiency of retrieval practice be maximized by varying the spacing interval (i.e., lag) that occurs between retrieval attempts? In other words, how does the relationship between long-term memory performance and additional retrieval practice (e.g., 1 vs. 3 vs. 5 tests) differ as a function of lag?
Before introducing the current experiments, we more fully consider the two methodological extensions in the present study. We will first examine how desirable difficulty may influence final test performance in a retrieval practice paradigm and will discuss the way desirable difficulty has often been operationally defined. We will then consider how spaced retrieval practice may differentially influence encoding versus retention of material.
Spaced Retrieval Practice and Desirable Difficulty
As noted earlier, the desirable difficulty account of the spacing effect (Bjork, 1994) suggests that performance in the spaced condition benefits more than the massed condition from successful retrieval, because retrieval events in the former condition are more difficult than retrieval events in the latter condition. Similarly, the desirable difficulty account can help explain the more general benefit of testing over re-studying (see Rowland, 2014 for a review) and the benefit of spaced study over massed study when incorporated with other proposed mechanistic accounts (e.g., the study-phase retrieval and reminding accounts; see Benjamin & Tullis, 2010 for a review).
One concern with considering past spaced retrieval studies as assessments of the desirable difficulty account is how retrieval difficulty during the encoding phase is operationally defined. The retrieval difficulty of a condition is typically inferred based on the lag separating study and test events (i.e., retrieval following a short lag is easier than retrieval following a long lag; e.g., Bjork, 2013; Bjork & Bjork, 2011; Clark & Bjork, 2014; Pyc & Rawson, 2009) or utilizing other experimental manipulations hypothesized to induce different levels of difficulty during the encoding process (e.g., Sungkhasettee, Friedman, & Castel, 2011; Yue, Castel & Bjork, 2013).
In studies that have examined the benefit of spaced retrieval practice, two studies have examined acquisition phase response latencies in addition to cued recall (Karpicke & Roediger, 2007; Logan & Balota, 2008). Results from both studies revealed slower mean response latency on the first retrieval attempt following a long lag relative to a short lag, which suggests that it may be particularly important to introduce desirable difficulty on the first retrieval attempt as a means for enhancing long-term memory performance. Thus, there is precedent for using response latencies as a proxy for retrieval difficulty, but it is not true that retrieval difficulty must always be correlated with the spacing interval separating retrieval attempts. In some instances, the relationship may actually be curvilinear. For example, the difference in retrieval difficulty between Lag 1 and Lag 5 conditions may be less than the difference between Lag 5 and Lag 9 conditions even though the objective change in lag size is constant. This might occur if Lag 1 and Lag 5 intervals are within an individual’s working memory capacity but Lag 9 is beyond their working memory capacity (cf. Bui, Maddox & Balota, 2013). Similarly, if multiple populations are examined in a single study (as in the current experiments), retrieval difficulty and the difference in difficulty between spacing conditions may shift across groups. In the current study, older adults may experience a larger difference in retrieval difficulty between lag conditions than young adults given age-related changes in episodic memory (see Balota, Dolan & Duchek, 2000). Hence, one might expect larger benefits of spacing in older adults than younger adults.
Thus, it is critical to jointly examine acquisition phase response latency and accuracy to more directly assess the influence of desirable difficulty on long-term memory for the specific lags and populations examined in a given study. Although the manipulation of lag as a proxy for changes in desirable difficulty appears a priori to be a reasonable assumption, it is also critical to have an independent measure of this construct. Therefore, in the present study, we use acquisition phase response latency as a proxy for retrieval difficulty such that longer response latencies will indicate greater retrieval difficulty than shorter response latencies.
The Influence of Spaced Retrieval Practice on Encoding versus Retention
Typically, past research has emphasized the extent to which spaced retrieval maximizes final test performance. However, spaced retrieval conditions typically differ in acquisition performance as well as in final test performance, which complicates the understanding of how spaced retrieval influences retention and final retrieval above and beyond its influence on the encoding process. In order to illustrate this issue, consider the hypothetical situation in which there are 10 items assigned to a short lag condition and 10 items assigned to a long lag condition. If six items are retrieved on the final test from the long lag condition and four items are retrieved from the short lag condition, one might argue that spaced retrieval practice produced a 20% benefit on final recall performance (60% vs. 40% correct). However, if during acquisition, participants correctly retrieved all 10 items in the short lag condition but only eight items in the long lag condition, then the benefit of retrieval practice on retention would be 35% (6/8 items, or 75% in the long lag condition vs. 4/10 items or 40% in the short lag condition). This of course assumes that there is little if any hypermnesia (see Erdelyi & Becker, 1974; Payne & Roediger, 1987). A similar concern may also arise when comparing age groups. If young adult memory is more intact than older adult memory, one would predict a larger difference in accuracy between spacing conditions for older adults compared to young adults during the acquisition phase due to age-related changes in episodic memory (see Balota, Dolan & Duchek, 2000). If one does not account for these differences during acquisition, it may appear that the benefit of spaced retrieval practice differs across age groups.
With these points in mind, the current study emphasized conditional final test performance. In doing so, we aimed to minimize the influence of acquisition phase differences (between spacing conditions and between age groups) on final test performance to better isolate the influence of spaced retrieval on retention.
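The hypothetical 10-item example above can be worked through in a few lines. This is only an illustrative sketch of the arithmetic; the function name `recall_rates` is a hypothetical helper, and the calculation assumes, as the text does, that every item recalled on the final test was also recalled during acquisition (i.e., no hypermnesia).

```python
from fractions import Fraction


def recall_rates(n_items, n_acquired, n_final):
    """Return (unconditional, conditional) final-test recall proportions.

    unconditional: items recalled at final test / all items in the condition
    conditional:   items recalled at final test / items recalled in acquisition
    Assumes every finally recalled item was also recalled during acquisition.
    """
    unconditional = Fraction(n_final, n_items)
    conditional = Fraction(n_final, n_acquired)
    return unconditional, conditional


# Values taken from the hypothetical example in the text.
short_uncond, short_cond = recall_rates(n_items=10, n_acquired=10, n_final=4)
long_uncond, long_cond = recall_rates(n_items=10, n_acquired=8, n_final=6)

unconditional_benefit = long_uncond - short_uncond  # 6/10 - 4/10
conditional_benefit = long_cond - short_cond        # 6/8 - 4/10

print(float(unconditional_benefit))  # 0.2
print(float(conditional_benefit))    # 0.35
```

The two differences are percentage-point differences between conditions (20 and 35 points, respectively), which is how the 20% and 35% figures in the example are obtained.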
Current Study
Armed with the two methodological extensions to past spaced retrieval studies (i.e., conditional final recall performance and measuring response latency during acquisition), Experiment 1 examined the benefits of retrieval practice for items that were retrieved once, three times or five times either at a short or long lag. Figure 1 displays a partial list structure in which both spacing and retrieval practice were within-participant manipulations.
Figure 1.
Partial schedule for items receiving three retrieval attempts in Lag 1 (e.g., APPLE – evil) and Lag 3 (e.g., HORSE-jumped) conditions.
Based on prior studies and the assumptions described earlier relating retrieval difficulty and spacing intervals, one would expect a priori that continued testing should produce a benefit in final test performance in the long lag condition but not the short lag condition. This assumes that repeated retrieval with a more difficult, longer lag should produce continued benefits, whereas little benefit should be gained from additional retrieval practice with the easier, shorter lag. Importantly, however, the current study will afford a direct measure of retrieval difficulty (i.e., response latency) instead of assuming differences in retrieval difficulty across conditions. As we will see, this additional measure provides important insights into the retrieval difficulty encountered during the acquisition phase in each lag condition.
Importantly, the present study also compared young and older adult memory performance as a function of lag and number of retrieval attempts. As a result of the well-established difference in episodic memory between young and older adults, one would expect the difference between the short and long lags in retrieval difficulty during acquisition to be greater for older adults than for younger adults. However, examining conditional final recall performance (along with response latency during encoding) will afford a more direct measure of retrieval difficulty on retention, without the potential confounding influence of differences in retrieval success during encoding.
Experiment 1
Method
Participants
Young adults were undergraduates at Washington University in St. Louis and received partial course credit or monetary remuneration ($15 or $20 for short and long retention intervals, respectively) for their participation. Older adults were healthy, community dwelling individuals who provided their own transportation to the testing site. For their participation, older adults received monetary remuneration ($20). Participants in each age group were equally divided between the short and long retention interval (RI) conditions (see Table 1 for demographics). Age, years of education, and Shipley vocabulary scores were significantly different between young and older adults (ps < .005). An additional group of young adults (n = 1 and n = 3 for short and long RIs, respectively) and older adults (n = 5 and n = 4 for short and long RIs, respectively) were excluded from analysis due to low performance on the final test (i.e., unconditional mean accuracy less than 5%), and two additional young adults in the long retention interval condition were excluded for not completing the second experimental session.
Table 1.
Mean (S.D.) age (in years), education (in years) and Shipley vocabulary score as a function of age, retention interval, and experiment.
| Experiment | Measure | Young, Short RI | Young, Long RI | Older, Short RI | Older, Long RI |
|---|---|---|---|---|---|
| Experiment 1 | N | 49 | 49 | 42 | 42 |
| | Age | 20.33 (2.46) | 20.76 (2.89) | 73.81 (5.20) | 75.66 (7.49) |
| | Education | 14.39 (1.92) | 14.48 (1.53) | 15.79 (2.58) | 15.21 (2.93) |
| | Shipley | 33.71 (2.64) | 32.96 (3.05) | 35.24 (3.55) | 35.95 (3.43) |
| Experiment 2 | N | 24 | -- | 24 | -- |
| | Age | 19.00 (1.02) | -- | 70.08 (6.19) | -- |
| | Education | 13.75 (1.70) | -- | 16.54 (2.21) | -- |
| | Shipley | 30.25 (3.49) | -- | 35.92 (2.55) | -- |
Design
A 2 (Age) × 2 (RI) × 2 (Lag, 1 vs. 3) × 3 (Number of Tests, 1 vs. 3 vs. 5) mixed-factor design was used with age and RI as between-participant factors and lag and number of tests as within-participant factors. The short RI was five minutes for both age groups. Because of large age-related differences in long-term retention and in an attempt to minimize differences due to scaling of final test performance (see Salthouse, 2000 for a discussion of Variable × Age interactions), the long RI was one hour for older adults and one day for younger adults. These RIs were selected based on pilot testing, and to foreshadow our results, the use of different durations successfully equated young and older adult retention following a long RI. The lag between study and test trials was either a single trial (Lag 1) or three trials (Lag 3), and items were tested one time (one-test), three times (three-tests), or five times (five-tests) without feedback.
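To make the within-participant schedule concrete, the following sketch lists the trial positions of one item's study trial and subsequent tests. It assumes that "lag" counts the number of intervening trials and that the same lag separates every successive retrieval attempt; the function `trial_positions` and the starting position are hypothetical and are not taken from the actual experimental lists, which also interleaved conditions and fillers.

```python
def trial_positions(study_position, lag, n_tests):
    """Positions of the study trial followed by each test trial.

    Assumption: `lag` intervening trials separate successive events,
    so consecutive events for an item are lag + 1 positions apart.
    """
    step = lag + 1
    return [study_position + k * step for k in range(n_tests + 1)]


# The six within-participant conditions: 2 lags x 3 numbers of tests.
for lag in (1, 3):
    for n_tests in (1, 3, 5):
        positions = trial_positions(study_position=1, lag=lag, n_tests=n_tests)
        print(f"Lag {lag}, {n_tests} test(s): {positions}")
```

Under these assumptions, an item studied at position 1 and tested three times at Lag 3 would occupy positions 1, 5, 9, and 13.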
Materials
A continuous paired associate task was used for the acquisition phase of the memory task (see Figure 1 for an example). Fifty-six low associate word pairs (e.g., APPLE - evil) were selected from the USF Free Association Norms (Nelson, McEvoy, & Schreiber, 1998) and have been used in prior spaced retrieval studies (e.g., Maddox et al., 2011). Word pairs shared some features that made them more easily associable (e.g., WHISKEY - water) or could be used to form a sentence (e.g., HORSE - jumped). These stimuli were divided into seven sets of eight pairs which were counterbalanced across lists with each pair occurring equally often in each of the within-participant conditions. Stimuli were statistically equated across sets for word length, frequency, orthographic neighborhood and phonological neighborhood (Balota et al., 2007), and pairs were equated on backward associative strength across stimulus sets (ps > .10).
The critical conditions were interleaved, and the average serial list position was equated across all conditions (ps > .70) as was the average serial list position for the first and the last tests across the six testing conditions (ps > .70). Thus, the average RI was constant for all conditions. In total, the acquisition phase included 218 trials consisting of 192 trials for the critical conditions, 18 filler trials, and eight trials that were equally split between primacy and recency buffer items. Of the 192 critical condition trials, 48 trials were encoding trials (e.g., HORSE - jumped) and 144 trials were retrieval practice trials (e.g., HORSE - ?????). Filler trials were included to ensure that average serial list position was equated across the critical conditions. The final cued recall test presented the cue for each critical pair (e.g., HORSE - ?????). Encoding trials were presented at a 4.5 second rate, whereas cues during acquisition and final test retrieval trials were presented until participants responded with an answer or by stating they did not know the answer. Thus, participants were not required to recall an item to complete the cued recall trial unless they were confident in their response. In both the acquisition and final test phases, participants were asked to speak their answers aloud. The experimenter indicated when the participant produced the response via keypress and then typed the participant’s response on a second screen.
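The trial counts reported above follow directly from the design, as the short check below illustrates. It assumes eight pairs per within-participant condition, which is inferred from the 48 encoding trials spread over six critical conditions; the variable names are illustrative only.

```python
PAIRS_PER_CONDITION = 8   # inferred: 48 encoding trials / 6 conditions
LAGS = (1, 3)
TESTS_PER_ITEM = (1, 3, 5)
FILLER_TRIALS = 18
BUFFER_TRIALS = 8         # split equally between primacy and recency

# Each critical pair is studied once.
encoding_trials = PAIRS_PER_CONDITION * len(LAGS) * len(TESTS_PER_ITEM)

# Each pair is tested 1, 3, or 5 times, in each lag condition.
retrieval_trials = PAIRS_PER_CONDITION * len(LAGS) * sum(TESTS_PER_ITEM)

critical_trials = encoding_trials + retrieval_trials
total_trials = critical_trials + FILLER_TRIALS + BUFFER_TRIALS

print(encoding_trials, retrieval_trials, critical_trials, total_trials)
# 48 144 192 218
```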
Procedure
Participants first completed a brief practice phase, which included encoding and retrieval practice trials, before the acquisition phase of the memory task. After the practice phase, participants were instructed to learn the word pairs for a final test and were also aware that they would be tested on the pairs throughout the acquisition phase. On retrieval trials, participants were asked to speak their answers aloud as quickly and accurately as possible, and the experimenter typed the response immediately upon vocalization by the participant¹. Following the acquisition phase, participants completed five minutes of a distractor trivia task in which trivia questions were presented at a rate of one question every 10 seconds. The procedure following the distractor task differed as a function of RI group. For participants in the short RI condition, the final cued recall test for the memory task occurred immediately following the trivia task. Again, participants were presented with the cue word (e.g., HORSE-????) for each critical pair one at a time and made an oral response that the experimenter entered into the computer. Participants then completed a battery of cognitive tasks, the Shipley vocabulary task, and a demographics questionnaire before being dismissed (see Maddox, 2013 for a full discussion and analysis of the cognitive battery). Participants in the long RI condition proceeded with the cognitive battery, Shipley vocabulary task and demographics questionnaire following the trivia task. After completing all other tasks, older adults completed the final cued recall test for the memory task before being dismissed. Young adults were dismissed following completion of the other tasks and were asked to return 24 hours later to complete the final cued recall test.
Results
Acquisition data were collapsed across RI groups, because the acquisition phase was the same for all participants and indeed there were no differences as a function of RI group (ps > .25).
Although there are numerous ways to examine acquisition performance, the current set of analyses focuses on the first and last retrieval attempts in each of the multiple retrieval attempt conditions. This approach allows for an assessment of stability in accuracy across retrieval attempts in each lag condition and the extent to which response latency is influenced by retrieval attempts and lag.
Accuracy During Acquisition
Mean proportion correct recall is shown in Figure 2 as a function of age, lag, number of tests, and test number. There are three observations to note in this figure. First, as expected, young adult performance was higher than older adult performance. Second, performance in the Lag 1 condition was higher than in the Lag 3 condition, and the difference in performance between lag conditions was greater for older adults than young adults. Third, performance remained relatively stable across retrieval attempts in both lag conditions and age groups.
Figure 2.
Mean proportion cued recall (top panel) and mean standardized response latency (bottom panel) during the acquisition phase in Experiment 1 as a function of age, lag, number of tests, and test number (e.g., T1 = first retrieval attempt, T2 = second retrieval attempt, and so on). Error bars represent ± 1 S.E.M.
The results from the 2 (Age) × 2 (Lag) × 2 (Number of Tests, 3 tests vs. 5 tests) × 2 (Test Number, First vs. Last) mixed-factor ANOVA yielded main effects of age, F(1, 180) = 89.19, p < .001, η2p = .33, and lag, F(1, 180) = 50.68, p < .001, η2p = .22, as well as significant Lag × Number of Tests, F(1, 180) = 21.35, p < .001, η2p = .11, and Lag × Test Number interactions, F(1, 180) = 4.72, p = .031, η2p = .03.
Importantly, the Lag × Test Number interaction was qualified by a significant Age × Lag × Test Number interaction, F(1, 180) = 11.72, p = .001, η2p = .06, which is displayed in Figure 2. Separate Lag × Test Number ANOVAs were conducted for each age group. Analysis of young adult accuracy only revealed a single main effect of lag, F(1, 97) = 21.01, p < .001, η2p = .18. Analysis of older adult accuracy revealed main effects of lag, F(1, 83) = 28.42, p < .001, η2p = .26, test number, F(1, 83) = 5.67, p = .02, η2p = .06, and a Lag × Test Number interaction, F(1, 83) = 12.27, p = .001, η2p = .13. Follow-up t tests revealed a single significant difference (Mdiff = .03) between the first and final test in the Lag 1 condition for older adults, t(83) = 3.48, p = .001. There was no difference in performance between the first and final tests in the Lag 3 condition for the older adults or in either lag condition for the young adults (ps > .20). Thus, it appears that older adults produced forgetting across repeated tests for items that were initially retrieved at a short lag. This may be due to the fact that initial retrieval success is not as strong an indicator of encoding quality following the short lag compared to the long lag for older adults. If one can maintain the item across the longer lag for the initial retrieval event, then the item is sufficiently well encoded to be produced across the remaining retrieval events for the older adults. The results from the younger adults suggest that they are not susceptible to this forgetting.
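For readers who wish to relate the reported effect sizes to the test statistics, partial eta squared for a single effect can be recovered from its F ratio and degrees of freedom using the standard identity η2p = (F × df_effect) / (F × df_effect + df_error). The sketch below applies this identity to values reported in the accuracy analyses above; it is offered as a consistency check, not as a description of how the original values were computed, and the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error) as reported in the acquisition accuracy analysis.
reported_effects = [
    ("Age", 89.19, 1, 180),
    ("Lag", 50.68, 1, 180),
    ("Lag x Number of Tests", 21.35, 1, 180),
    ("Lag x Test Number", 4.72, 1, 180),
    ("Lag (young adults)", 21.01, 1, 97),
    ("Lag (older adults)", 28.42, 1, 83),
]

for label, f_value, df_effect, df_error in reported_effects:
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: {eta:.2f}")
```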
Standardized Response Latency on Successful Retrieval During the Acquisition Phase
In the present and all subsequent analyses of response latency, only latencies from trials on which retrieval was successful were included. All latencies beyond three SDs from the mean were excluded from analysis (< 1%). Because older adults were overall slower than young adults, and this difference in speed can compromise the interpretation of interactions, the response latencies were converted to z-scores based on each participant’s mean and standard deviation of raw reaction times on correct trials (see Faust, Balota, Spieler & Ferraro, 1999). For purposes of the ANOVA, missing response latency data were estimated by a triangulation procedure in which the relationship in performance between conditions at the group level was used in relation to individual performance at the participant level to provide an estimate for the missing data. Specifically, the relevant conditional mean for participants who had at least one observation per cell was taken in proportion to the grand mean for those same participants. In turn, this proportion was used to estimate a given participant’s missing cell(s) by multiplying the [Conditional Mean/Grand Mean] proportion for all participants and the participant’s grand mean. Importantly, the pattern of results was similar when analyses were conducted only on participants who had observations for all cells (see Maddox, 2013, for details).
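The standardization and the estimation of missing cells described above can be sketched as follows. This is a minimal illustration under several assumptions that the text does not specify: the sample standard deviation is used for the z-scores, a participant's grand mean is taken over that participant's observed cells, and the three-SD trimming step is omitted. The function names and the example values are hypothetical.

```python
from statistics import mean, stdev


def zscore_latencies(correct_rts):
    """Standardize one participant's correct-trial latencies against that
    participant's own mean and (sample) standard deviation."""
    m, s = mean(correct_rts), stdev(correct_rts)
    return [(rt - m) / s for rt in correct_rts]


def impute_missing_cells(cell_means):
    """Fill missing cells with [Conditional Mean / Grand Mean] x participant mean.

    cell_means: {participant: {condition: mean latency, or None if missing}}
    The group-level ratios come only from participants with no missing cells.
    """
    complete = [cells for cells in cell_means.values()
                if all(v is not None for v in cells.values())]
    conditions = list(next(iter(cell_means.values())))
    grand_mean = mean(mean(list(cells.values())) for cells in complete)
    ratio = {c: mean(cells[c] for cells in complete) / grand_mean
             for c in conditions}

    filled = {}
    for pid, cells in cell_means.items():
        # Assumption: the participant's grand mean uses observed cells only.
        own_mean = mean(v for v in cells.values() if v is not None)
        filled[pid] = {c: (v if v is not None else ratio[c] * own_mean)
                       for c, v in cells.items()}
    return filled


# Hypothetical cell means (ms) for three participants; p3 is missing one cell.
example = {
    "p1": {"lag1": 800, "lag3": 1200},
    "p2": {"lag1": 900, "lag3": 1100},
    "p3": {"lag1": 600, "lag3": None},
}
filled = impute_missing_cells(example)
print(filled["p3"])
print(zscore_latencies([900, 1000, 1100]))
```

In this toy example the group-level ratio for the missing condition is 1150/1000, so the missing cell is estimated as roughly 1.15 times the participant's own mean of 600 ms.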
Standardized mean response latencies on correct trials are displayed in Figure 2 as a function of age, lag, number of tests, and test number. There are three observations to note in the figure. First, response latency decreased across retrieval attempts. Second, the decrease in response latency between the first and last tests was larger for the Lag 3 condition than the Lag 1 condition. Third, the difference between lag conditions in speeding across retrieval attempts was larger for older adults than young adults.
The 2 (Age) × 2 (Lag) × 2 (Number of Tests, 3 tests vs. 5 tests) × 2 (Test Number, First vs. Last) mixed-factor ANOVA revealed main effects of number of tests, F(1, 180) = 4.17, p = .043, η2p = .02, which reflected a small difference in response latency between the three- and five-test conditions (M = .03 vs. −.03, respectively), and test number, F(1, 180) = 473.19, p < .001, η2p = .72, which reflected the speeding of response latency between the first and last tests (M = .30 vs. −.31, respectively). Additionally, the Lag × Test Number interaction was significant, F(1, 180) = 6.94, p = .009, η2p = .04.
Importantly, the results from the ANOVA again yielded a significant Age × Lag × Test Number interaction, F(1, 180) = 4.65, p = .032, η2p = .03, which is displayed in the bottom half of Figure 2. To examine this interaction, separate 2 (Lag) × 2 (Test Number) ANOVAs were conducted for young and older adults. Analysis of young adult performance revealed significant effects of lag and test number (ps < .05) but no interaction, whereas older adult performance revealed a significant effect of test number (p < .001) and again a significant Lag × Test Number interaction, F(1, 83) = 8.32, p = .005, η2p = .09. The significant Lag × Test Number interaction reflected significantly slower response latency on the first test for the Lag 3 condition than the Lag 1 condition (M = .37 vs. .24, respectively, p = .034) and a numerical reversal of this pattern on the final test (M = −.33 vs. −.25, respectively, p = .154). This finding is consistent with Bjork’s (1994) desirable difficulty account in which items that are retrieved with more difficulty will be strengthened to a greater extent than items retrieved with less difficulty. In the current results, the time it took to initially retrieve items provides evidence that Lag 3 produced a relatively more difficult retrieval event than Lag 1. As a result, the trace may be strengthened to a greater extent and consequently retrieved faster on the final retrieval attempt in the Lag 3 condition compared to the Lag 1 condition.
Taken together, the three-way interactions among age, lag and test number observed in both accuracy and response latency suggest that the underlying memory trace continues to be strengthened by subsequent tests (as indicated by faster response latency) even when accuracy remains stable. Moreover, results suggest that the two lag conditions were more distinct for older adults than young adults as indicated by the significant Lag × Test Number interactions for older adults but not young adults in follow-up analyses. Again, this is consistent with the different forgetting functions for young and older adults.
Final Test Phase Performance
As noted earlier, because we were interested in the influence of testing, lag and age on retention, the present analyses emphasize conditional recall performance. Conditional recall was calculated for items that were correctly retrieved on their final retrieval attempt during the acquisition phase (see Maddox, 2013, for a complete analysis of unconditional performance, which generally accords with the current analyses).
Conditional Final Test Accuracy
Mean proportion conditional recall is shown in Figure 3 as a function of age, RI, lag, and number of tests. There are three observations to note in this figure. First, retention was greater for young adults than older adults after a short RI but was comparable across age groups following the long RI. This confirms we were successful in matching the young and older adults at the long RI by increasing the retention interval more for the former group than the latter group. Second, continued retrieval during the acquisition phase led to increased retention for young and older adults when tests were spaced by a single item regardless of RI. A similar increase in retention was observed across age groups and RIs in the Lag 3 condition when pairs were tested three times versus one time, but no additional benefit was observed in the five-test condition relative to the three-test condition. Third, older adults produced a larger lag effect than young adults following both RIs.
Figure 3.
Mean proportion conditional cued recall on the final test in Experiment 1 as a function of age, retention interval (RI), lag, and number of tests. Error bars represent ± 1 S.E.M.
These observations were supported by the results of a 2 (Age) × 2 (RI) × 2 (Lag) × 3 (Number of Tests) mixed-factor ANOVA. The main effects of RI, F(1, 178) = 13.86, p < .001, η2p = .26, lag, F(1, 178) = 65.78, p < .001, η2p = .27, and number of tests, F(2, 356) = 55.96, p < .001, η2p = .24, were significant, along with a marginal effect of age, F(1, 178) = 3.53, p = .062, η2p = .019. A reliable interaction between Age and RI, F(1, 178) = 17.69, p < .001, η2p = .09, indicated a significant age difference following a short RI (p < .001) but similar performance between groups following a long RI (p = .134). The Lag × Number of Tests interaction was also significant, F(2, 356) = 4.52, p = .012, η2p = .03. This interaction reflected significant increases in performance in the Lag 1 condition as the number of tests during acquisition increased (M = .34, .45, and .55 for the one-test, three-test, and five-test conditions, respectively; ps < .001), whereas performance in the Lag 3 condition increased from the one-test to three-test condition (M = .46 and M = .61, p < .001) but did not increase further with five tests (M = .61, p > .90). Finally, the Age × Lag interaction was significant, F(1, 178) = 12.65, p < .001, η2p = .07, which reflected a larger lag effect for older adults (M = .16) than for young adults (M = .06).
Discussion
The final recall results from Experiment 1 are inconsistent with the original predictions, which assumed differences in retrieval difficulty between lag conditions and age groups. Specifically, these assumptions led to the prediction that a benefit of continued retrieval across tests would be observed in the Lag 3 condition but not in the Lag 1 condition. However, results revealed significant increases in retention with each increase in testing in the Lag 1 condition (i.e., increased retention when increasing from 1 to 3 to 5 tests). In contrast, for the Lag 3 condition, retention increased from one test to three tests during acquisition, but there was no comparable increase in retention from three tests to five tests. Importantly, the present experiment afforded a measure of desirable difficulty during encoding, i.e., response latency, and hence can provide some direct evaluation of this prediction.
When considering the ways in which response latency may reflect the retrieval difficulty associated with each condition during encoding, one may expect difficulty on the first retrieval attempt to be particularly useful in predicting long-term retention. Specifically, past research suggests that a long initial lag produces increased long-term memory versus a short initial lag regardless of subsequent form of spacing (i.e., equal spaced vs. expanding retrieval; Karpicke & Roediger, 2007). Thus, one factor likely to influence final test performance is the response latency on the first retrieval attempt during acquisition (i.e., slower response latency indicates more difficult retrieval). As shown in the bottom of Figure 2, response latencies on the first test were slower in the three-test condition compared to the one-test condition (Mdiff = .11; t(182) = 2.61, p = .010), and there was no difference in response latencies between the three-test and five-test conditions (Mdiff = .06; t(181) = 1.31, p = .190). Hence, one would predict that conditional accuracy on the final test should be greater in the three- and five-test conditions than in the one-test condition, and indeed, the final recall results indicated that taking three tests produced a benefit over taking one test in both lag conditions (see Footnote 2). Moreover, response latencies were similar across lag conditions for young adults (Mdiff = .05; Lag, F(1, 97) = 1.41, p = .237, η2p = .014) but were faster in the Lag 1 condition than the Lag 3 condition for older adults (Mdiff = .13; Lag, F(1, 83) = 4.66, p = .034, η2p = .05). Thus, the lag effect should be larger for older adults than younger adults in conditional final test performance, which was observed.
In sum, measuring acquisition response latency and conditional final test recall provided better leverage in examining the influence of desirable difficulty on long-term memory performance than relying solely on assumptions about the retrieval difficulty of our various manipulations. Indeed, our results were largely consistent with Bjork’s (1994) desirable difficulty framework.
Experiment 2
Experiment 2 was designed to extend the results from Experiment 1 to an examination of massed versus spaced retrieval practice, which should produce a more extreme manipulation of spacing than the lag manipulation used in Experiment 1. Based on the extant literature, one would predict that continued testing with an ineffective lag, namely massed retrieval, should produce little benefit in final test performance relative to continued testing with a more effective lag.
It is possible, however, that continued massed retrieval may enhance long-term retention compared to a single massed retrieval attempt as a result of mechanisms other than desirable difficulty. Specifically, Experiment 2 was also motivated by an intriguing age-related difference in the benefit of refreshing. Johnson, Reeder, Raye, and Mitchell (2002) reported that young adults may benefit more from an immediate retrieval attempt than older adults, a process referred to as refreshing, despite older adults being slower to retrieve items on adjacent trials compared to young adults. Interestingly, Maddox et al. (2011) also found that younger adults appeared to benefit more from an immediate test than older adults. Provided that older adults are slower than younger adults to retrieve items on massed testing trials, it does not appear that the benefit of refreshing reflects desirable difficulty. Indeed, Johnson et al. (2002) have suggested that refreshing yields prolonged activation of the item, and in turn, this prolonged activation benefits young adult memory performance to a greater extent than older adult memory performance. Thus, one might expect young adults’ long-term memory performance to benefit more from continued massed retrieval than older adults’ performance.
Given our specific interest in addressing how desirable difficulty is operationally defined and in accounting for differences between conditions in acquisition phase accuracy, Experiment 2 included only two levels of testing (1 vs. 3 tests) and one RI (5 minutes). Because RI did not interact with spacing and testing in Experiment 1, we only tested a short RI in Experiment 2, which still allowed us to examine conditional final test performance and acquisition response latency as a proxy for retrieval difficulty. These changes in methodology had the additional benefit of reducing the overall list length and increasing older adult performance closer to younger adult performance than observed in Experiment 1.
Method
Participants
Young adults were undergraduates at Washington University in St. Louis and received partial course credit or monetary remuneration ($10) for their participation. Older adults were healthy, community dwelling adults and received monetary compensation ($15) for their participation. Demographics are displayed in Table 1.
Design
A 2 (Age) × 2 (Spacing, Lag 0 vs. 4) × 2 (Number of Tests, 1 vs. 3) mixed-factor design was used in Experiment 2. Age was a between-participant factor. Lag and Number of Tests were within-participant factors. RI was five minutes for both age groups.
Materials
A subset of thirty-two low associate word pairs was selected from Experiment 1. Stimuli were divided into four sets of eight pairs, and these sets were counterbalanced across lists such that each pair occurred once in each of the within-participant conditions. A continuous paired associate task was again used for the acquisition phase of the memory task. The average serial list position was equated across conditions (ps > .75). In total, the acquisition phase consisted of 139 trials of which 96 trials were critical condition trials, 35 were filler trials, and eight trials were equally split between primacy and recency buffer items. Of the 96 critical condition trials, 32 trials were encoding trials and 64 trials were retrieval practice trials. Filler trials were included to ensure that average serial list position was equated across the critical conditions. Thus, the average RI was constant for all conditions.
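The trial counts above follow directly from the design. The short sketch below reproduces the arithmetic, assuming eight pairs per within-participant cell and one retrieval practice trial per scheduled test; the variable names are illustrative only.

```python
# Within-participant cells: 2 lags (0 vs. 4) x 2 numbers of tests (1 vs. 3).
PAIRS_PER_CELL = 8
LAGS = (0, 4)
TESTS_PER_CONDITION = (1, 3)
FILLER_TRIALS = 35
BUFFER_TRIALS = 8  # split equally between primacy and recency buffers

cells = [(lag, n_tests) for lag in LAGS for n_tests in TESTS_PER_CONDITION]

# Each pair is studied once and then tested once or three times.
encoding_trials = PAIRS_PER_CELL * len(cells)
retrieval_trials = sum(PAIRS_PER_CELL * n_tests for _, n_tests in cells)
critical_trials = encoding_trials + retrieval_trials
total_trials = critical_trials + FILLER_TRIALS + BUFFER_TRIALS

print(encoding_trials, retrieval_trials, critical_trials, total_trials)
```

The four printed values correspond to the 32 encoding trials, 64 retrieval practice trials, 96 critical trials, and 139 total trials described above.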
Procedure
The procedure was the same as in Experiment 1 with two exceptions: (a) only single-test and three-test conditions were included; (b) a single five-minute RI was used.
Results
Acquisition Performance
The current set of analyses again emphasizes performance on the first and last retrieval attempts in each of the multiple retrieval attempt conditions as a way of assessing the stability of retrieval accuracy across testing events and the degree to which response latency decreases across test events as a function of lag.
Acquisition Memory Accuracy
Mean proportion correct recall for young and older adults is shown in Figure 4 as a function of lag, number of tests, and test number. Accuracy data from the first and last retrieval attempts in the three-test conditions were submitted to a 2 (Age) × 2 (Lag) × 2 (Test Number) mixed-factor ANOVA. Results revealed main effects of age and lag, ps < .005, that were further qualified by a significant Age × Lag interaction, F(1, 46) = 5.86, p = .020, η2p = .11. The interaction revealed statistically equivalent performance across age groups in the Lag 0 condition, p = .173, but a significant difference in Lag 4 performance between young (.55) and older adults (.39), p = .003, reflecting the greater forgetting rate in older adults across the longer lags, consistent with Experiment 1 results.
Figure 4.
Mean proportion cued recall (top panel) and mean standardized response latency (bottom panel) during the acquisition phase in Experiment 2 as a function of age, lag, number of tests, and test number (i.e., T1 = first retrieval attempt, T2 = second retrieval attempt, T3 = third retrieval attempt). Error bars represent ± 1 S.E.M.
Standardized Response Latency on Successful Retrieval Attempts During the Acquisition Phase
Mean standardized response latency for young and older adults is shown in the bottom panel of Figure 4 as a function of lag, number of tests, and test number. Again, response latency data from the first and last retrieval attempt in the three test condition were submitted to a 2 (Age) × 2 (Lag) × 2 (Test Number) mixed-factor ANOVA. All main effects were significant, ps < .05, in addition to a significant Lag × Test Number interaction, F(1, 46) = 10.87, p = .002, η2p = .19. Moreover, the three-way interaction was significant, F(1, 46) = 6.17, p = .017, η2p = .12. Separate analysis of young adult response latency revealed main effects of lag and test number, ps < .001, with no interaction, p > .50. Analysis of older adult response latency also revealed significant main effects of lag and test number, ps < .001, and a significant Lag × Test Number interaction, F(1, 23) = 14.12, p = .001, η2p = .38. As shown in Figure 4, response latency for older adults decreased from the first to second retrieval attempt in the Lag 0 condition (p = .002) but remained stable from the second to third retrieval attempt (p > .40). In contrast, response latency significantly decreased across all retrieval attempts in the Lag 4 condition, ps < .05.
Conditional Final Test Memory Accuracy
Figure 5 displays mean proportion conditional recall as a function of age, lag and number of tests. Performance was higher for young adults than older adults (M = .53 vs. M = .38, respectively), Lag 4 items were remembered better than Lag 0 items (M = .68 vs. M = .23, respectively), and taking three tests led to better retention than taking a single test (M = .50 vs. M = .41, respectively).
Figure 5.
Mean proportion conditional cued recall on the final test in Experiment 2 as a function of age, lag, and number of tests. Error bars represent ± 1 S.E.M.
The 2 (Age) × 2 (Lag) × 2 (Number of Tests) mixed-factor ANOVA on conditional final recall yielded main effects of age, F(1, 46) = 15.44, p < .001, η2p = .25, lag, F(1, 46) = 272.33, p < .001, η2p = .86, and number of tests, F(1, 46) = 8.03, p = .007, η2p = .15. Although the three-way interaction was not significant, separate analyses were conducted to examine the benefit of additional testing for each lag condition given a priori predictions based on age differences in refreshing discussed above. Analysis of Lag 0 performance revealed main effects of age and number of tests (ps < .05) which were further qualified by a significant Age × Number of Tests interaction, F(1, 46) = 4.20, p = .046, η2p = .08. This interaction reflected a significant increase in performance when testing was increased from one to three tests in the massed condition for young adults (p = .005) but not for older adults (p > .90). Regarding Lag 4 performance, the ANOVA revealed main effects of age and number of tests (ps < .05) but no interaction, p > .95. As predicted, these results indicate that young adults benefited from repeated testing when refreshing was engaged in the Lag 0 condition. Older adults did not produce this benefit.
Discussion
The results from Experiment 2 are clear. First, as predicted, both age groups benefited from spaced retrieval and continued testing in the Lag 4 condition. However, only young adults benefited in terms of conditional accuracy from continued testing in the Lag 0 condition, which is consistent with previously reported age differences in refreshing described above.
With respect to the influence of retrieval difficulty on long-term memory performance, there was a clear relationship between acquisition phase response latency and retention. A significant three-way interaction was observed in acquisition phase response latencies among age, lag and test number. A follow-up analysis of this interaction revealed main effects of lag and test number but no interaction in young adult performance, whereas analysis of older adult performance revealed effects of lag and test number as well as a significant Lag × Test Number interaction. First, consider how the young adult retrieval latencies are related to final recall performance. The lack of an interaction between lag and test number in acquisition response latency suggests that the benefit of continued retrieval practice over taking a single test should be equivalent across lag conditions, and indeed additive effects of lag and number of tests were observed in retention for the younger adults. Turning to the older adult data, the significant interaction between lag and test number in older adult response latency leads one to expect a larger benefit of repeated testing in the Lag 4 condition than the Lag 0 condition, which was also observed. In sum, the current emphasis on acquisition phase response latency as a proxy for retrieval difficulty provided a more precise way of examining the desirable difficulty account. Indeed, it appears that overall the results are consistent with the benefits of desirable difficulty.
General Discussion
The current experiments extended past studies investigating spacing and retrieval practice in two ways. First, response latency for correctly retrieved items during acquisition was used as a metric of retrieval difficulty to allow for a more precise assessment of Bjork’s (1994) desirable difficulty account. Second, final test performance was examined only for those items that were correctly retrieved (i.e., received the benefit of retrieval practice) during acquisition. In this way, the influence of spaced retrieval on retention was better isolated from the effects of spaced retrieval on encoding than in previous studies.
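To make the distinction between unconditional and conditional final recall concrete, the sketch below computes both from a handful of made-up items. It assumes that an item enters the conditional analysis only if every acquisition retrieval attempt was correct; the data, the class, and the function names are hypothetical and are not the authors' analysis code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    acquisition_correct: List[bool]  # outcome of each acquisition-phase test
    final_correct: bool              # outcome on the final test


def unconditional_recall(items: List[Item]) -> float:
    """Final recall over all items, regardless of acquisition performance."""
    return sum(item.final_correct for item in items) / len(items)


def conditional_recall(items: List[Item]) -> Optional[float]:
    """Final recall restricted to items retrieved successfully during acquisition."""
    retained = [item for item in items if all(item.acquisition_correct)]
    if not retained:
        return None  # no item met the inclusion criterion
    return sum(item.final_correct for item in retained) / len(retained)


# Hypothetical three-test items for one participant in one condition.
items = [
    Item([True, True, True], True),
    Item([True, True, True], False),
    Item([False, False, False], False),
    Item([True, True, True], True),
    Item([False, True, True], False),
]

print(unconditional_recall(items), conditional_recall(items))
```

In this toy example two of five items are recalled overall, but two of the three items that were successfully retrieved throughout acquisition are recalled, which illustrates how the conditional measure separates retention from initial acquisition success.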
Importantly, with these methodological extensions, the present experiments were able to more carefully examine two questions regarding the benefits of spaced retrieval practice across age groups and RIs. First, how does lag modulate the extent to which continued testing improves long-term memory? Second, how does the function relating lag and continued testing to final test performance differ across young and older adults? We will first address these two questions before considering the extent to which results are consistent with Bjork’s desirable difficulty account.
Retrieval Practice as a Function of Lag
The results from Experiment 1 provided information regarding the function relating continued testing and lag to final test performance. Similar to previous studies (e.g., Karpicke & Roediger, 2007; Wheeler & Roediger, 1992), the results revealed a long-term retention benefit with increased testing when comparing a single test condition to a three test condition in both the Lag 1 condition (11% benefit) and Lag 3 condition (15% benefit). More importantly, the inclusion of a third level of testing (i.e., 5 tests) extended previous studies and revealed a difference in the function relating continued testing and lag to final test performance. Specifically, retention continued to increase with additional retrieval practice in the short lag condition (10% from 3- to 5-tests) but did not increase in the long lag condition (0% from 3- to 5-tests). Thus, the benefits of additional retrieval practice appear to asymptote after three successful retrieval events in the long lag condition but not in the short lag condition.
It is important to note that a similar pattern of data has recently been observed in a study that used a paradigm utilizing feedback and learning to criterion with young adults (Rawson & Dunlosky, 2011). Specifically, Rawson and Dunlosky reported results from a cued recall paradigm that suggested the most efficient way of scheduling study events is to learn material to an initial criterion of three correct retrieval events and then schedule three relearning sessions in which material is retrieved to a criterion of one correct retrieval. This is a critical observation, because the current study emphasized the influence of spaced retrieval on retention (i.e., conditional accuracy) rather than unconditional final test performance. Thus, there is converging evidence across different methodologies that the benefits of spaced retrieval practice asymptote after an item has been tested a specific number of times (i.e., 3 times) following a relatively long lag. Continuing to test material beyond this optimal number provides relatively little additional benefit in terms of retention. It is also important to note that both methodologies assessed final test performance when differences in acquisition performance were minimized between conditions. Of course, ultimately, the precise number of tests needed to maximize the efficiency of retrieval practice is likely to be influenced by the precise lag used, the retention interval, the difficulty of the materials to be learned and the ability of the learner.
The current results also have implications for one of the leading accounts of the spacing effect, namely the combined study-phase retrieval and encoding variability account (e.g., Greene, 1989; Raaijmakers, 2003). This account proposes that the benefit of spaced study results from increased encoding variability for repetitions separated by time or intervening items relative to repetitions that are studied consecutively. Often, the account suggests that participants must retrieve the first presentation of an item when it is later re-studied to obtain the full mnemonic benefit of spaced study (e.g., Madigan, 1969; Melton, 1967; Thios & D’Agostino, 1976). Our results are generally consistent with this account when examining the single-test and three-test conditions (see Figure 3; Experiment 1). However, the five-test condition included in Experiment 1 suggests that the benefits of encoding variability may asymptote (i.e., continued retrieval after initial encoding variability may yield minimal increases in long-term performance), and that continued retrieval practice may help compensate for the use of a less variable encoding condition (i.e., Lag 1). Of course, some caution must be exercised when attributing the benefits of repeated retrieval to encoding variability versus retrieval difficulty, because these conditions are naturally confounded (i.e., more variable encoding is predicted to lead to increased retrieval difficulty). Thus, future work should attempt to isolate the contributions of these two mechanisms to long-term memory performance.
Age-related differences and the Benefits of Spaced Retrieval Practice
One particularly interesting aspect of Experiment 1 is the reliable Age × Lag interaction in accuracy which reflected a larger lag effect in conditional accuracy for older adults than for young adults. As seen in Figure 6, a much larger lag effect was observed for older adults than young adults following a long RI compared to the short RI. At this level, it appears that older adults actually benefit more from the Lag 3 condition in terms of conditional accuracy than young adults. This pattern is particularly important, because young and older adult performance was equated in the Lag 1 condition (as shown in Figure 6). Hence, it appears that the increased lag effect in older adults may be due to age-related differences in retrieval effort and desirable difficulty (Bjork, 1994). Specifically, because older adults have lower performance during the long lag condition during encoding, those items that did survive may have benefited more from desirable difficulty in the older adult group than the young adult group and, hence, may have produced a stronger long-term trace. Indeed, this interpretation is consistent with the age differences observed across lag conditions in acquisition response latency, discussed above.
Figure 6.
Proportion conditional recall as a function of age, retention interval, and lag. Error bars represent ± 1 S.E.M.
With respect to Experiment 2, results indicated that both age groups benefited from continued testing in the long lag condition. However, young and older adults differed in the benefit of continued massed retrieval practice. Namely, young adults benefited from continued testing but older adults failed to show a similar benefit. This benefit of repeated testing in young adults led to a reduction in the overall spacing effect, and hence contributed to the observation that older adults produced a larger lag effect than young adults, which is consistent with results observed at the long RI in Experiment 1. More importantly, the benefits of the immediate retrieval event in young but not older adults provide further support for age differences in a refreshing mechanism (Johnson et al., 2002; Maddox et al., 2011).
Although the current results are useful for examining the benefits of spaced retrieval for items that successfully incurred retrieval practice during encoding, an emphasis on conditional analyses overlooks overall differences between age groups and lag conditions in performance during acquisition. Indeed, the benefit of a long lag during acquisition is offset by reduced acquisition performance. Thus, future studies may wish to extend recent work reported by Rawson and Dunlosky (2011) to an older adult population as a means of examining the benefits of criterion level learning and the benefits of continued testing with feedback in this group (see Pyc & Balota, 2013, for such a study).
Continued Retrieval Practice and Desirable Difficulty
As discussed in the Introduction, the benefits of various spaced retrieval schedules have been tied to the degree to which a given schedule produces desirably difficult retrieval (e.g., Bjork, 1994) during the learning phase. Based on Bjork’s concept of desirable difficulty, it was originally predicted that continued testing would improve final recall performance in the long lag condition but would produce relatively little improvement in the short lag condition given that longer lags should lead to more difficult retrieval attempts than short lags. This pattern of data was not observed in the current experiments.
In our discussion of each experiment we more closely examined acquisition phase response latencies as a proxy for retrieval difficulty rather than relying on a priori assumptions about differences in difficulty across conditions and participant groups. This approach allowed for more precise assessment of retrieval difficulty and ultimately revealed general support for Bjork’s (1994) account. Indeed, this approach led to a different conclusion than would have been reached had we simply assumed retrieval difficulty differences as a function of lag condition and age group. Hence, these results emphasize the importance of measuring difficulty to examine the influences of desirable difficulty.
Jointly considering acquisition phase response latency and conditional final recall accuracy in the current study also provided evidence that an item continues to strengthen and increase in accessibility after it has been successfully retrieved on an initial test event despite little change in accuracy across subsequent test events during the acquisition phase. Specifically, the speeding of response latencies across later retrieval attempts suggests that items continue to strengthen with each additional test, which may be viewed as consistent with the bifurcation model of the testing effect (Halamish & Bjork, 2011; Kornell, Bjork, & Garcia, 2011; Storm, Friedman, Murayama, & Bjork, 2014). This model rests on two core assumptions. First, to-be-remembered items are distributed across a continuous “memory strength” dimension which influences the probability of successful retrieval. Second, restudying a list of items after initial encoding will shift the entire distribution along a memory strength dimension, whereas testing without feedback will bifurcate the distribution such that successfully retrieved items are substantially increased in memory strength and unsuccessfully retrieved items retain their original memory strength. Critically, the act of testing will increase the memory strength for successfully retrieved items to a greater extent than the increase in memory strength obtained by restudying all items. Once the distribution of items is bifurcated, it is not always clear whether items continue to be strengthened with additional retrieval practice, because all items have memory strength that surpasses the threshold for successful retrieval. However, the acquisition phase response latencies observed in the current study suggest that strengthening continues to occur even when it is otherwise undetectable in the accuracy measures.
Limitations of the Current Study
There are two noteworthy limitations of the current study that should be considered in future work. First, in the current study we used different retention intervals in Experiment 1 to equate overall performance. Although this was successful (see Salthouse, 2000 for a similar procedure), this manipulation may have also allowed for other mechanisms to contribute to the observed results (e.g., the one-day RI for young adults may have allowed for greater consolidation of material than the one-hour RI for older adults). Thus, future work may wish to address this possibility. Second, in order to minimize noise variance associated with age-related differences in motor control and computer use (e.g., Hickman et al., 2007; Nair et al., 2007) that could influence the response latency measure, the experimenter, rather than the participant, made the button press that recorded response latency, and this latency was used as a proxy for retrieval difficulty. Although the results were quite systematic in the current study, it is possible that the experimenters slowed their responses for older adults relative to younger adults, as has been observed when caregivers speak to older adults (e.g., Kemper, 1994). Of course, such an effect would have to be more subtle as a function of condition, since the current study used standardized response latencies, which control for overall age-related slowing.
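The point about standardized latencies can be illustrated with a small example. The sketch below assumes that standardization means a within-participant z-score transformation (in the spirit of Faust et al., 1999) and that general slowing acts as a linear rescaling of latencies; both are assumptions made for illustration, and the latency values are invented.

```python
from statistics import mean, stdev


def standardize(latencies):
    """Convert one participant's raw latencies to z-scores."""
    m = mean(latencies)
    sd = stdev(latencies)
    return [(x - m) / sd for x in latencies]


# Hypothetical raw latencies (in seconds) for one participant across four conditions.
faster_participant = [0.9, 1.1, 1.4, 1.0]

# A uniformly slower participant: every latency is stretched and shifted.
slower_participant = [1.5 * x + 0.3 for x in faster_participant]

z_faster = standardize(faster_participant)
z_slower = standardize(slower_participant)

# The condition profile is the same once overall speed is removed.
print([round(z, 3) for z in z_faster])
print([round(z, 3) for z in z_slower])
```

Because a uniform slowing of this kind leaves the z-scores unchanged, any experimenter-related slowing would have to vary by condition, not just by age group, to affect the standardized measure.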
Conclusions
The results from the current study underscore the importance of conditional analyses in understanding the differing effects of spaced retrieval practice on learning and retention of material. Moreover, the use of acquisition response latencies as a proxy for retrieval difficulty provided a more accurate assessment of the desirable difficulty account (Bjork, 1994) than assuming differences in retrieval difficulty across conditions and provided evidence that items continue to strengthen with additional retrieval practice even when those changes are not evident in cued recall performance. Overall, the present results provide support for an important role of desirable difficulty in accounting for the benefits of spacing and repeated testing in younger and older adults.
Acknowledgments
We thank Jan Duchek, Mark McDaniel, and Roddy Roediger for helpful comments at various stages of this work. Additionally, we thank Clare Misko and Zack Pinsky for help with data collection. This research was supported by NIA Training Grant AG00030.
Footnotes
Portions of this paper are based on the dissertation by Geoffrey Maddox.
1. Research assistants were trained by the first author and received extensive practice prior to data collection. To avoid experimenter bias, research assistants were unaware of the hypotheses related to response latency. Moreover, the procedure utilized in the current study emphasized an immediate button-press upon vocalization of participant’s response which triggered a second screen on which the response was entered. Thus, the experimenter completed the button-press to record response latency before the accuracy of the participant’s response was known.
2. Both three and five tests led to similar levels of retention in the Lag 3 condition, but this was not true in the Lag 1 condition. Specifically, taking five tests in the Lag 1 condition produced significantly better performance on the final test than taking three tests, which may reflect an additional benefit obtained from additional retrieval practice. Thus, the influence of repeated exposure to material via testing may compensate for less effective spacing intervals.
References
- Arking RA. Perspectives on aging. In: Arking RA, editor. Biology of Aging. 2nd ed. Sunderland, MA: Sinauer Associates; 1998. pp. 2–36.
- Balota DA, Dolan PO, Duchek JM. Memory changes in healthy young and older adults. In: Tulving E, Craik FIM, editors. Oxford Handbook of Memory. Oxford: Oxford University Press; 2000. pp. 395–410.
- Balota DA, Duchek JM, Logan JM. Is expanded retrieval practice a superior form of spaced retrieval? A critical review of the extant literature. In: Nairne JS, editor. The foundations of remembering: Essays in honor of Henry L. Roediger III. New York: Psychology Press; 2007. pp. 83–105.
- Balota DA, Duchek JM, Paullin R. Age-related differences in the impact of spacing, lag and retention interval. Psychology and Aging. 1989;4:3–9. doi: 10.1037//0882-7974.4.1.3.
- Balota DA, Duchek JM, Sergent-Marshall SD, Roediger HL III. Does expanded retrieval produce benefits over equal-interval spacing? Explorations of spacing effects in healthy aging and early stage Alzheimer’s disease. Psychology and Aging. 2006;21:19–31. doi: 10.1037/0882-7974.21.1.19.
- Balota DA, Yap MJ, Cortese MJ, Hutchison KA, Kessler B, Loftis B, Neely JH, Nelson DL, Simpson GB, Treiman R. The English Lexicon Project. Behavior Research Methods. 2007;39:445–459. doi: 10.3758/bf03193014.
- Bjork RA. Memory and metamemory considerations in the training of human beings. In: Metcalfe J, Shimamura AP, editors. Metacognition: Knowing about knowing. Cambridge, MA: MIT Press; 1994. pp. 185–205.
- Bjork RA. Desirable difficulties perspective on learning. In: Pashler H, editor. Encyclopedia of the mind. Thousand Oaks: Sage Reference; 2013.
- Bjork EL, Bjork RA. Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In: Gernsbacher MA, Pew RW, Hough LM, Pomerantz JR, editors. Psychology and the real world: Essays illustrating fundamental contributions to society. New York: Worth Publishers; 2011. pp. 56–64.
- Bui DC, Maddox GB, Balota DA. The roles of working memory and intervening task difficulty in determining the benefits of repetition. Psychonomic Bulletin & Review. 2013;20:341–347. doi: 10.3758/s13423-012-0352-5.
- Camp CJ, Foss JW, Stevens AB, O’Hanlon AM. Improving prospective memory task performance in persons with Alzheimer’s disease. In: Brandimonte M, Einstein GO, McDaniel MA, editors. Prospective memory: Theory and applications. Mahwah, NJ: Erlbaum Associates; 1996. pp. 351–367.
- Carpenter SK, DeLosh EL. Application of the testing and spacing effects to name-learning. Applied Cognitive Psychology. 2005;19:619–636.
- Cepeda NJ, Pashler H, Vul E, Wixted JT, Rohrer D. Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin. 2006;132:354–380. doi: 10.1037/0033-2909.132.3.354.
- Clark CM, Bjork RA. When and why introducing difficulties and errors can enhance instruction. In: Benassi VA, Overson CE, Hakala CM, editors. Applying the Science of Learning in Education: Infusing psychological science into the curriculum. 2014.
- Cull WL. Untangling the benefits of multiple study opportunities and repeated testing for cued recall. Applied Cognitive Psychology. 2000;14:215–235.
- Erdelyi MH, Becker J. Hypermnesia for pictures: Incremental memory for pictures but not for words in multiple recall trials. Cognitive Psychology. 1974;6:159–171.
- Faust ME, Balota DA, Spieler DH, Ferraro FR. Individual differences in information processing rate and amount: Implications for group differences in response latency. Psychological Bulletin. 1999;125:777–799. doi: 10.1037/0033-2909.125.6.777.
- Giambra LM, Arenberg D. Adult age differences in forgetting sentences. Psychology and Aging. 1993;8:451–462. doi: 10.1037//0882-7974.8.3.451.
- Glenberg AM. Monotonic and nonmonotonic lag effects in paired-associate and recognition memory paradigms. Journal of Verbal Learning and Verbal Behavior. 1976;15:1–16.
- Greene RL. Spacing effects in memory: Evidence for a two process account. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1989;15:371–377.
- Halamish V, Bjork RA. When does testing enhance retention? A distribution-based interpretation of retrieval as a memory modifier. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2011;37:801–812. doi: 10.1037/a0023219.
- Hertzog C, Kramer AF, Wilson RS, Lindenberger U. Enrichment effects on adult cognitive development. Can the functional capacity of older adults be preserved and enhanced? Psychological Science in the Public Interest. 2009;9:1–65. doi: 10.1111/j.1539-6053.2009.01034.x.
- Hickman JM, Rogers WA, Fisk AD. Training older adults to use new technology. Journal of Gerontology: Psychological Sciences. 2007;62B:77–84. doi: 10.1093/geronb/62.special_issue_1.77.
- Johnson MK, Reeder JA, Raye CL, Mitchell KJ. Second thoughts versus second looks: An age-related deficit in reflectively refreshing just-active information. Psychological Science. 2002;13:64–67. doi: 10.1111/1467-9280.00411.
- Karpicke JD, Roediger HL III. Expanding retrieval practice promotes short-term retention, but equally spaced retrieval promotes long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2007;33:704–719. doi: 10.1037/0278-7393.33.4.704.
- Kemper S. Elderspeak: Speech accommodations to older adults. Aging, Neuropsychology, and Cognition. 1994;1:17–28.
- Kornell N, Bjork RA, Garcia MA. Why tests appear to prevent forgetting: A distribution-based bifurcation model. Journal of Memory and Language. 2011;65:85–97.
- Landauer TK, Bjork RA. Optimum rehearsal patterns and name learning. In: Gruneberg M, Morris PE, Sykes RN, editors. Practical aspects of memory. London: Academic Press; 1978. pp. 625–632.
- Logan JM, Balota DA. Expanded vs. equal spaced retrieval practice in healthy young and older adults. Aging, Neuropsychology, and Cognition. 2008;15:257–280. doi: 10.1080/13825580701322171.
- Maddox GB. The efficiency of retrieval practice as a function of spacing and intrinsic value in young and older adults. St. Louis, MO: Washington University in St. Louis; 2013. (Unpublished doctoral dissertation).
- Maddox GB, Balota DA, Coane JH, Duchek JM. The role of forgetting rate in producing a benefit of expanded over equal spaced retrieval in young and older adults. Psychology and Aging. 2011;26:661–670. doi: 10.1037/a0022942.
- Madigan SA. Intraserial repetition and coding processes in free recall. Journal of Verbal Learning and Verbal Behavior. 1969;8:828–835.
- Melton AW. Repetition and retrieval from memory. Science. 1967;158:532. doi: 10.1126/science.158.3800.532-b.
- Nelson DL, McEvoy CL, Schreiber TA. The University of South Florida word association, rhyme, and word fragment norms. 1998. http://www.usf.edu/FreeAssociation/. doi: 10.3758/bf03195588.
- Payne DG, Roediger HL. Hypermnesia occurs in recall but not recognition. American Journal of Psychology. 1987;100:145–166.
- Pyc MA, Balota DA. Catastrophic interference? The influence of lag and testing on retention in young and older adults. Talk presented at the Annual Meeting of the Psychonomic Society; Toronto, ON, Canada. 2013 Nov.
- Pyc MA, Rawson KA. Testing the retrieval effort hypothesis: Does greater difficulty correctly recalling information lead to higher levels of memory? Journal of Memory and Language. 2009;60:437–447.
- Raaijmakers JGW. Spacing and repetition effects in human memory: Application of the SAM model. Cognitive Science. 2003;27:431–452.
- Rawson KA, Dunlosky J. Optimizing schedules of retrieval practice for durable and efficient learning: How much is enough? Journal of Experimental Psychology: General. 2011;140:283–302. doi: 10.1037/a0023956.
- Rowland CA. The effect of testing versus restudy on retention: A meta-analytic review of the testing effect. Psychological Bulletin. (in press). doi: 10.1037/a0037559.
- Salthouse TA. Mediation of adult age differences in cognition by reductions in working memory and speed of processing. Psychological Science. 1991;2:179–183.
- Salthouse TA. The Processing Speed Theory of Adult Age Differences in Cognition. Psychological Review. 1996;103:403–428. doi: 10.1037/0033-295x.103.3.403.
- Salthouse TA. Methodological assumptions in cognitive aging research. Handbook of Cognitive Aging. (2nd Ed.) 2000:467–498.
- Schacter DL, Rich SA, Stamp MS. Remediation of memory disorders: Experimental evaluation of the spaced-retrieval technique. Journal of Clinical and Experimental Neuropsychology. 1985;7:70–96. doi: 10.1080/01688638508401243.
- Storm BC, Friedman MC, Murayama K, Bjork RA. On the transfer of prior tests or study events to subsequent study. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2014;40:115–124. doi: 10.1037/a0034252.
- Sungkhasettee VW, Friedman MC, Castel AD. Memory and metamemory for inverted words: Illusions of competency and desirable difficulties. Psychonomic Bulletin & Review. 2011;18:973–978. doi: 10.3758/s13423-011-0114-9.
- Thios SJ, D’Agostino PR. Effects of repetition as a function of study-phase retrieval. Journal of Verbal Learning and Verbal Behavior. 1976;15:529–536.
- Wheeler MA, Roediger HL. Disparate effects of repeated testing: Reconciling Ballard’s (1913) and Bartlett’s (1932) results. Psychological Science. 1992;3:240–245.
- Yue CL, Bjork EL, Bjork RA. Reducing verbal redundancy in multimedia learning: An undesired desirable difficulty? Journal of Educational Psychology. 2013;105(2):266–277.






