Recall Latencies, Confidence, and Output Positions of True and False Memories: Implications for Recall and Metamemory Theories

Jerwen Jou

doi:10.1016/j.jml.2007.12.003

Author manuscript; available in PMC: 2012 May 9.

Published in final edited form as: J Mem Lang. 2008 Jan 28;58(4):1049–1064. doi: 10.1016/j.jml.2007.12.003

PMCID: PMC3348658 NIHMSID: NIHMS47000 PMID: 22582008

Abstract

Recall latency, recall accuracy rate, and recall confidence were examined in free recall as a function of recall output serial position using a modified Deese-Roediger-McDermott paradigm to test a strength-based theory against the dual-retrieval process theory of recall output sequence. The strength theory predicts the item output sequence to be in the descending order of memory strength. The dual-retrieval process theory postulates two phases in a free recall, a first direct access phase in which items are output verbatim in the weakest-to-strongest order (cognitive triage) and a second reconstructive phase in which reconstructed items are output in the strongest-to-weakest order. In three experiments, all three indicators of memory strength (latency, accuracy, and confidence) consistently showed a descending-strength order of recall both for true and false memories. Additionally, false memory was found to be output in two phases and subjects’ confidence judgment of their own memory to be unaccountable by retrieval fluency (recall latency).

Keywords: recall latencies, recall output position, false memory, metamemory

The main purpose of this study is to examine the relationship between memory strength of remembered items and their sequential output order in free recall. Research results have been inconsistent regarding the relationship between these two variables. Two theoretical perspectives make different predictions. The first view will be termed memory strength theories (Anderson, 1976, 2005; Dosher, 1984; Gillund & Shiffrin, 1984; Norman, 2002; Wixted, Ghadisha, & Vera, 1997). These theories hold that the output sequence of items in free recall follows the decreasing order of memory strength or activation of the items, i.e., recall output order is from the strongest to the weakest item. Also, according to these theories, the retrieval time for stronger memory is shorter than for weaker memory. The other perspective will be termed the dual-retrieval processes theory (Brainerd, 1995; Brainerd, Olney, & Reyna, 1993; Brainerd, Reyna, Howe, & Kevershan, 1991; Brainerd, Wright, Reyna, & Payne, 2002). This theory holds that there are two phases in the process of free recall, a first verbatim retrieval, or direct access, phase and a second gist-based, constructive or reconstructive phase. Also, according to this theory, the direct retrieval process is subject to output interference, whereas the reconstructive process is not. Moreover, in the first verbatim phase of recall, the item with the weakest memory trace is output first and the item with the strongest memory trace is output last. This strategic recall process of giving top priority to the weakest item is known as cognitive triage (Brainerd, 1995; Brainerd, Reyna, & Howe, 1990; Brainerd, Reyna, Howe, & Kevershan, 1991), and its function is to minimize the output interference for the weakest item.
Although the output order in the first phase is from the weakest to the strongest, in the second reconstructive phase, the output order follows the decreasing order of strength, i.e., the strongest constructed item is output first, and the weakest one last. Brainerd and his associates have demonstrated the cognitive triage phenomenon of the dual-retrieval recall process in many developmental studies (Brainerd, Olney, & Reyna, 1993; Brainerd, Reyna, & Howe, 1990; Brainerd, Reyna, Howe, & Kevershan, 1990, 1991). Similarly, Barnhardt, Choi, Gerkens, and Smith (2006) recently demonstrated the same phenomenon with adult subjects and found no evidence supporting the strength theory. Brainerd (1995; Brainerd et al., 1990) argued that cognitive triage in free recall is not just a memory strategy but rather a basic process that minimizes memory interference, because neither children as young as age 6 (before they start to use any memory strategy) nor adults have conscious awareness of exerting deliberate strategic control when cognitive triage can be observed in their recall.

On the other hand, in a free recall study using frequency of presentation as a strength manipulation, Wixted et al. (1997) found that items (in a strong and weak items mixed list) with greater memory strength yielded a shorter recall latency than items with weaker memory strength, providing strong evidence supporting the strength theory, and no evidence at all for the dual-retrieval processes theory. Therefore, this issue deserves further investigation. The primary goal of the present study is to further investigate this issue and test the two theories.

In previous studies, memory strength was either measured by the proportion of accurate recall (Brainerd, 1995) or defined by the number of times an item was studied (Dosher, 1984; Wixted et al., 1997) or by study time (Rohrer & Wixted, 1994). Also, confidence rating was found to be negatively correlated with latency but positively correlated with accuracy both in recognition (Jou, Matus, Aldridge, Rogers, & Zimmerman, 2004; Robinson, Johnson, & Herndon, 1997) and recall (Koriat, 1993; Nelson, Gerler, & Narens, 1984; Nelson & Narens, 1990). Hence, accuracy, recall latency, and confidence have been used as indices of memory strength by researchers. In the present study, all three measures were used as indices of memory strength and as converging evidence for testing the theories. In addition, the materials and a modified procedure in the Deese-Roediger-McDermott (DRM) (Deese, 1959; Roediger & McDermott, 1995) paradigm were used because the high rate of false memory this procedure generates ensured a high frequency of constructed memories in the recall output for testing the dual-retrieval processes theory.

Also to be examined in this study is the question of whether recall latency or output serial position (SP) can account for the confidence judgments subjects make of their memory. One possibility is that subjects rely heavily on retrieval fluency (Kelly & Rhodes, 2002), or the ease and quickness with which information comes to mind, as the basis of confidence judgment (Kelly & Lindsay, 1993; Lindsay & Kelly, 1996; Mazzoni & Nelson, 1995). Nelson and Narens (1990) referred to this idea as the confidence-determined-entirely-by-latency hypothesis. Another possibility is that the retrieval or recall latency cannot account for the confidence or lack of confidence subjects indicate for the recalled words. In that case, subjects may use other cues to evaluate the validity of their memory, and they may have conscious access to the difference in the sources of the correct and incorrect memories (Koriat, 1993, 2007).

Jou et al. (2004) showed that false memory as defined in the DRM paradigm produced a significantly longer recognition latency than true memory, and suggested that the activation level of false memory is lower than that of true memory. Therefore, still another purpose of this study is to determine whether such a latency difference between true and false memory also exists in recall. Recall latency was measured in this study as the time elapsed between the end of typing a word and the end of typing the next word in a self-paced sequential free recall test. It is assumed that the pause before typing the word contributes significantly to the word’s production time. If recall latency for false memory is indeed longer than for true memory, then the distinction in response time between true and false memories can be generalized across recognition and recall. If false memory can be shown to have longer recognition and recall latencies than true memory, then false memory can be considered a weaker form of memory than true memory from the strength point of view (Wixted et al., 1997).

Experiment 1

Experiment 1 used the DRM materials and a modified DRM procedure to measure the recall latencies and the output serial positions (SPs) of the recalled items. As converging evidence, the rate of correct recall was also examined as a function of output SP. The crucial question is whether the recall latency/output SP function first shows a negative slope (weak items output earlier have longer latencies and strong items output later have shorter latencies) prior to the middle point and then a positive slope past the middle point, as would be predicted by the dual-retrieval processes theory. The strength theory predicts a monotonically increasing latency/output SP function. A second question is whether false memory has a longer recall latency than true memory and also a different pattern of output SP distribution than true memory.
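The two predicted shapes of the latency/output SP function can be made concrete with a small descriptive check. The sketch below is only an illustration of the contrast drawn above; it is not the analysis reported in this study (which used ANOVA), and the function name `latency_shape` and its labels are hypothetical.

```python
def latency_shape(mean_latencies: list[float]) -> str:
    """Label the shape of a (non-empty) latency/output-SP function.

    "increasing":    latency never decreases across output SPs
                     (the strength-theory pattern).
    "dip-then-rise": latency falls to an interior minimum and then rises
                     (the pattern described for the dual-retrieval
                     processes theory).
    "other":         anything else.
    """
    # Successive differences between neighbouring output positions.
    diffs = [b - a for a, b in zip(mean_latencies, mean_latencies[1:])]
    lowest = mean_latencies.index(min(mean_latencies))
    falls = all(d <= 0 for d in diffs[:lowest])   # slope before the minimum
    rises = all(d >= 0 for d in diffs[lowest:])   # slope from the minimum on
    if falls and rises:
        if lowest == 0:
            return "increasing"
        if lowest < len(mean_latencies) - 1:
            return "dip-then-rise"
    return "other"
```

Real latency functions are noisy, so a strict check like this would only be a first look before a formal test of the slope.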

Method

Subjects

A total of 123 undergraduates at the University of Texas – Pan American participated in the experiments for extra course credit. They all met the criterion of English as their only language or their dominant language if they were bilingual.

Materials and design

The twenty-four lists of semantically associated words that produced the top 24 false memory rates were selected from Stadler, Roediger, and McDermott’s (1999) DRM associated word norms. The theme word or the critical word of each list was not presented (the critical nonpresented words, henceforth CNPW). These 24 lists of words were divided into three blocks, with lists 1 to 8 as block 1, 9 to 16 as block 2, and 17 to 24 as block 3. Each subject studied and was tested on two blocks (i.e., 16 lists), with the three blocks rotated across the subjects so that each block was used an equal number of times across subjects (i.e., one third of the subjects received blocks 1 and 2, one third received blocks 1 and 3, and one third blocks 2 and 3). This design encompassed a larger material sample to enhance the generality of the results while keeping the experiment from running too long. For the last 36 subjects, because of time constraints, each subject studied and was tested on only 12 of the 24 lists. Half of them received the even-numbered lists and the other half the odd-numbered lists.
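The block rotation described above can be sketched as follows. The assignment rule (cycling through the three block pairs by subject index) is an assumption made for illustration; the text states only that each pair was given to one third of the subjects. The names `BLOCKS`, `BLOCK_PAIRS`, and `lists_for_subject` are hypothetical.

```python
from itertools import combinations

# Block membership as described in the text: lists 1-8, 9-16, 17-24.
BLOCKS = {1: range(1, 9), 2: range(9, 17), 3: range(17, 25)}

# The three possible pairs of blocks: (1, 2), (1, 3), (2, 3).
BLOCK_PAIRS = list(combinations(sorted(BLOCKS), 2))


def lists_for_subject(subject_index: int) -> list[int]:
    """Return the 16 list numbers studied by a subject (0-based index).

    Assumption: block pairs are rotated by subject index, so that each
    pair is used equally often across every three consecutive subjects.
    """
    pair = BLOCK_PAIRS[subject_index % len(BLOCK_PAIRS)]
    return [n for block in pair for n in BLOCKS[block]]
```

Under this rule every block appears in two of the three pairs, so each block is used equally often whenever the number of subjects is a multiple of three.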

Procedure

A modified DRM procedure was used. Each subject was seated in a cubicle to perform the experimental task individually. The experiment was run by a computer. In the learning phase, the 16 lists of words were presented in a new random order for each individual subject, and so were the 15 words in each list. Each word was displayed for 2.5 s with a 1 s blank screen in between words. Subjects were asked to pay close attention to the words during presentation. At the end of presentation, subjects performed backward counting for 30 s by steps of 3 starting from a random 3-digit number generated by the computer program. They were asked to count at a reasonably fast pace. At the end of the counting, the recall input screen appeared. The recall prompt was a number (starting from 1) followed by a question mark and a blank space along with a blinking cursor displayed near the upper center location of the screen. Subjects typed the first recalled word into the space and followed it with a press of the Enter key. This simultaneously removed the question mark from the first entry (the number “1” and the entered word remained visible in their original positions) and brought on the second word prompt. The recall latency was measured from subjects’ pressing Enter (to start recalling a word) to pressing Enter again (to submit the input word and start the next recall). Subjects were told to submit a word by pressing Enter immediately after they completed typing it. They could skip a prompt by pressing Enter without entering a word. Recall was self-paced, and subjects were not told that their recall time was being measured by the computer. However, they were told not to take a break before completing the recall for a list. They were told that they should avoid guessing as much as possible and that they could recall the words in any order they liked.
At the end of the recall of each list of words, subjects could take a break and were told how many of the 16 lists they had completed. Because of time constraints, the last 36 subjects studied and were tested on only 12 lists of words (other subjects received 16 lists of words).¹
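The Enter-to-Enter latency measure can be illustrated with a minimal sketch. It assumes that timestamps of the Enter presses are available in milliseconds and that the first word is timed from the onset of the recall screen; the latter is an assumption, since the text does not say how the first word's latency was anchored. The function name `recall_latencies` is hypothetical.

```python
def recall_latencies(screen_onset_ms: float,
                     enter_times_ms: list[float]) -> list[float]:
    """Latency of each recalled word, in milliseconds.

    The latency of word k runs from the previous Enter press to the
    Enter press that submitted word k. The first word has no previous
    Enter press, so the onset of the recall screen is used instead
    (an assumption, not stated in the text).
    """
    # Start time of each word: screen onset, then every earlier Enter press.
    starts = [screen_onset_ms] + enter_times_ms[:-1]
    return [end - start for start, end in zip(starts, enter_times_ms)]
```

For example, Enter presses at 3,000, 7,500, and 15,000 ms after screen onset give latencies of 3,000, 4,500, and 7,500 ms. Each value includes both the pause before typing and the typing time itself, which is why word length is considered in the Results.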

Results and Discussion

Recalled list words whose plurality was changed from the presented form or which were spelled incorrectly (but could be clearly identified) were scored as wrong by the computer program. Before the data analysis, the author changed the scoring of these words from “wrong” to “correct”. This increased the correct recall rate by about 3.5%. The correct recall rate for the list words was .584. The false recall rate of the CNPW was .363, which was close to the median false recall rate of the CNPW in Stadler et al.’s (1999) norms.

Recall latency

The mean latencies as a function of experiment, word type (list words vs. CNPW), and recall output SP are presented in Figure 1.

Figure 1. Mean recall latencies as a function of experiment, word type, and output serial position of Experiments 1 and 2.

The recall latency functions showed that overall the word recall time increased as more words were recalled (i.e., as recall progressed through the output SPs). Furthermore, over and beyond this output SP main effect, at each output SP except for the first few, the recall was slower for the CNPW than for the list words. This recall latency difference could not have been caused by typing more letters on average for the CNPW than for the list words because on average the CNPW were shorter than the list words. The mean number of letters for the list words was 5.21, as compared with 4.97 for the CNPW. The difference was significant, F (1, 118) = 26.43, MSE = .123, p < .001. So, if other things were equal, the CNPW should on average have produced a shorter recall time if the recall latency merely measured typing time. Therefore, factors other than word length must have caused the difference in recall time. It is suggested that the differential recall onset latency (the pause before typing the word) was the source of the recall latency difference between these two types of words. This finding is consistent with prior studies showing that the recall latency of commission errors was longer than that of correct recall (Nelson et al., 1984; also see Nelson et al., 1990). In addition, this difference tended to increase with the output SP.

An ANOVA conducted on the part of the Experiment 1 data shown in Figure 1 indicated that the main effect of word type was significant, F (1, 118) = 33.17, MSE = 122,540,173, p < .001, with the mean latency for the list words being 7,622 ms, and that of the CNPW 10,900 ms. The output SP main effect was significant, F (14, 1429) = 73.66, MSE = 38,580,382, p < .001, indicating that overall, the recall time increased with the output SP. The word type by output SP interaction was also significant, F (14, 365) = 11.21, MSE = 36,842,289, p < .001. The significant word type by output SP interaction confirmed the visual observation that the difference in time between these two types of words widened as the output SP progressed.

According to the dual-retrieval processes theory, during the first phase, items are output according to the cognitive triage principle whereby more vulnerable items (i.e., weaker memory) are output before items of stronger memory and during the second phase, items are output in the reverse of the above order (i.e., decreasing order of strength) (Brainerd, 1995). Assuming that recall latency is indicative of memory strength (Wixted et al., 1997), the latency functions in Figure 1 show no sign of outputting weaker items before stronger items in the first phase (there was no indication that in the first several output SPs, the function was negatively sloped). The functions appeared to be monotonically increasing for the list words and the CNPW. Thus, the latency data contradicted the cognitive triage principle but supported the idea of a strength-based output order.

Recall probability

The mean output SP for the list words was 5.32, and that of the CNPW was 6.34. Thus, the mean of the output SP distribution of the list words was about 1 position earlier than that of the CNPW. The difference was significant, F (1, 118) = 20.88, MSE = 2.45, p < .001. In this experiment, each subject studied and recalled 16 lists of 15 words each. The recall probability at each output SP was the relative frequency of recall for that output SP, i.e., the number of words output at that position divided by 16 (i.e., the total number of chances for which words could be output at that position). The mean recall output probabilities as a function of experiment and the recall output SP for the list words are presented in Figure 2 and those of the CNPW presented in Figure 3.

Figure 2. Mean recall probabilities of list words as a function of experiment and output serial position of Experiments 1, 2, 3, and 4.

Figure 3. Mean recall probabilities of critical nonpresented words as a function of experiment and output serial position of Experiments 1, 2, 3, and 4.
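The recall probability at each output SP, defined above as the number of words output at that position divided by 16, can be sketched as follows for a single subject. The function name `output_probabilities` and the input layout (a pooled list of 1-based output positions) are assumptions made for illustration.

```python
from collections import Counter

N_LISTS = 16       # lists studied per subject in Experiment 1
LIST_LENGTH = 15   # words per list, hence 15 output positions


def output_probabilities(output_positions: list[int],
                         n_lists: int = N_LISTS) -> list[float]:
    """Recall probability at each output SP for one subject.

    `output_positions` holds the 1-based output SP of every word of one
    type (list words or CNPWs) that the subject recalled, pooled over
    all lists. Each count is divided by the number of lists, i.e., the
    number of chances a word could have been output at that position.
    """
    counts = Counter(output_positions)
    return [counts[sp] / n_lists for sp in range(1, LIST_LENGTH + 1)]
```

Because a CNPW can be recalled at most once per list, its 15 position probabilities sum to at most 1 for a subject, whereas those of the list words can sum to as much as 15.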

Figure 2 shows that the probability of list word recall declined continuously and monotonically over the output SPs. The output SP recall probability function of the CNPW looked very different. The SP output probabilities showed two modes, an early and a later one, although the first peak was a little higher than the second. Again, the probability of output at each output SP was calculated as the summed frequency of recalled CNPWs at that output SP across the 16 lists divided by 16. Because there was only one chance to recall a CNPW for each list (instead of 15 chances as for the list words), the probability of recall for the CNPW was much lower than for the list words, as can be seen in Figure 3. An ANOVA conducted on the output probability data of the list words and the CNPW of Experiment 1 showed that the main effect of word type (mean of list words = .569, mean of CNPWs = .024)² was significant, F (1, 122) = 3019.55, MSE = .091, p < .001, as was the main effect of output SP, F (14, 1708) = 1076.17, MSE = .008, p < .001. The crucial word type by output SP interaction was significant, F (14, 1708) = 750.25, MSE = .010, p < .001. The significant interaction confirmed that the two functions were not the same in shape. Although the mean output SP for the CNPW was 1 position later than that of the list words, the first (and higher) peak of the output SP distribution was actually at position 2 (i.e., a very early position), which contradicts the idea of the dual-retrieval processes theory that gist-based memory is output in the second constructive phase, as well as other prior reports that CNPWs are output relatively late in the recall sequence (Brainerd et al., 2002, 2003; Brainerd et al., 2005; Roediger & McDermott, 1995). Thus, the mean CNPW output SP fails to reveal some important details of the CNPW output distribution, such as whether the distribution is unimodal or bimodal and where the mode or modes of the distribution are located.
The present data showed that the CNPWs are output in recall in two phases, rather than just in the second phase, and that the mean output position typically reported in prior studies fails to reveal the bimodal nature of the distribution. It should be noted that the gist-based part of the output that occurred in the first phase produced the same recall latency as the list words (see Figure 1).

The trough in the output probability function (roughly position 5) of the CNPWs coincided with the point in the recall latency functions at which the false and the true memory time functions started to diverge from each other. This is consistent with the idea that false memory that is output within the first 4 or 5 positions in the recall sequence is as highly accessible as the list words and may have been generated during the encoding stage, whereas false memories that are output after position 5 are likely to have passed through constructive steps and been generated during the retrieval stage (Hicks & Starns, 2005; Brainerd et al., 2003).

Conditional correct recall rate

Besides latency, accuracy has been used as a measure of memory strength (Brainerd, 1995). Therefore, accuracy was examined as a function of output SP to see if it corroborated the latency results. The accuracy rate of recall for each output SP for each subject was calculated by dividing the number of correctly recalled words by the total number of words recalled at that output SP. The proportion was the relative frequency of correct recall conditional on recalled words (note that it is different from the probability of correct recall, which is the total number of correctly recalled words divided by 16). The CNPW was excluded from this calculation. The mean rates of correct recall as a function of experiment and output SP are presented in Figure 4.
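The conditional correct recall rate can be sketched as below for one subject. The input layout (pairs of output SP and a correctness flag, with CNPWs already removed) and the function name `conditional_accuracy` are assumptions made for illustration.

```python
def conditional_accuracy(recalls: list[tuple[int, bool]]) -> dict[int, float]:
    """Proportion correct among the words recalled at each output SP.

    `recalls` holds (output_sp, is_correct) for every word one subject
    produced, pooled over lists, with CNPWs excluded beforehand.
    Positions at which nothing was recalled are omitted, because the
    conditional rate is undefined there.
    """
    totals: dict[int, int] = {}
    correct: dict[int, int] = {}
    for sp, is_correct in recalls:
        totals[sp] = totals.get(sp, 0) + 1
        correct[sp] = correct.get(sp, 0) + int(is_correct)
    return {sp: correct[sp] / totals[sp] for sp in sorted(totals)}
```

The denominator here is the number of words actually recalled at a position, not the fixed 16 chances used for the recall probability, which is the distinction the paragraph above draws.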

As is evident in Figure 4, accuracy of recall decreased monotonically over the output SPs. There was no indication of accuracy starting lower, going higher, and then going lower, the nonmonotonic function found in some studies (see Brainerd, 1995). An ANOVA indicated that the decrease over the output SPs was significant, F (14, 1454) = 62.19, MSE = .022, p < .001. Thus, two indicators of memory strength, latency and accuracy, provided converging evidence that the recall output order follows the order of memory strength and not the sequence predicted by the cognitive triage principle.

If accuracy is an indicator of memory strength, then correctly recalled words should have a shorter latency than incorrectly recalled words (not counting the recalled CNPWs). Indeed, the overall mean latency for the correctly recalled words was 5,966 ms as compared with 13,606 ms for the incorrectly recalled words. The difference was significant, F (1, 113) = 114.12, MSE = 30,266,920, p < .001. Thus, recall accuracy and recall latency were consistent as indicators of a common underlying construct, namely, memory strength.

Several conclusions can be drawn from Experiment 1. First, consistent with recognition latency (Jou et al., 2004), false memory produces longer recall latency than true memory, strengthening the conclusion that it is a weaker form of memory than true memory, despite its vivid illusion-like characteristics. Second, both the latency and the accuracy data indicated that free recall output sequence follows the order of memory strength, and not the order in which the weakest item is output first and the strongest last during the first direct retrieval phase, and the strongest first and the weakest last during the second gist-based reconstructive phase as claimed by the cognitive triage principle of the dual-retrieval processes theory. Third, false or constructed memory was output in two phases, rather than only in the second phase as suggested by the dual-retrieval processes theory and other investigators (Brainerd et al., 2002, 2003, 2005; Barnhardt et al., 2006).

Experiment 2

The main purpose of Experiment 2 was to collect converging evidence for further testing the strength-based and dual-retrieval processes theories of recall. Confidence has been found to be correlated with memory accuracy (Jou et al., 2004; Nelson et al., 1990; Robinson, Johnson, & Herndon, 1997). Several questions were asked in this experiment. The first question is whether the confidence rating function will show any signs of cognitive triage in operation. Assuming that weaker memory is associated with lower confidence, the confidence rating function should show a positive slope in its first half if weaker memory is output before stronger memory in the first phase of recall. The second question is whether output SPs associated with indicators of greater memory strength are also associated with higher confidence ratings, and vice versa. The third question is whether subjects are conscious (as may be indicated by confidence rating) of the distinction between true and false memory as reflected in the recall latency difference. The fourth question is whether the confidence difference, if any, can be accounted for by the latency difference alone (the confidence-determined-entirely-by-latency hypothesis, see Nelson et al., 1990) or by output SP alone. This hypothesis is based on the idea of output or retrieval fluency determining confidence judgment, namely, that items that come to mind quickly or early in recall must be the items that were studied (Koriat, 1993; Kelly & Lindsay, 1993; Lindsay & Kelly, 1996; Mazzoni & Nelson, 1995).