Abstract
Two processes are postulated to underlie delayed judgments of learning (JOLs) -- cue familiarity and target retrievability. The two processes are distinguishable because the familiarity-based judgments are thought to be faster than the retrieval-based judgments, because only retrieval-based JOLs should enhance the relative accuracy of the correlations between the JOLs and criterion test performance, and because only retrieval-based judgments should enhance memory. To test these predictions, in three experiments, we either speeded people’s JOLs or allowed them to be unspeeded. The relative accuracy of the JOLs in predicting performance on the criterion test was higher for the unspeeded JOLs than for the speeded JOLs, as predicted. The unspeeded JOL conditions showed enhanced memory as compared to the speeded JOL conditions, as predicted. And finally, the unspeeded JOLs were sensitive to manipulations that modified recallability of the target, while the speeded JOLs were selectively sensitive to experimental variations in the familiarity of the cues. Thus, all three of the predictions about the consequences of the two processes potentially underlying delayed JOLs were borne out. A model of the processes underlying delayed JOLs, based on these and earlier results, is presented.
People’s judgments of learning (JOLs) have consequences for their subsequent study behavior (Finn, in press; Metcalfe & Finn, 2008). If JOLs are independently manipulated, say, by framing the JOL question to participants to ask about whether they will remember the answer (resulting in high JOLs) or whether they will forget it (resulting in low JOLs), their study choice behavior is altered. They choose fewer items to study in the former case than in the latter, even though their learning, at the time of making the judgment, is the same (Finn, in press). Other manipulations that have altered people’s JOLs in an illusory way also have been shown to have direct consequences for what they choose to study (Metcalfe & Finn, 2008). Given that people use these metacognitive judgments to control their subsequent behavior, it is important both that the judgments be accurate and that we understand the processes that underlie them. Delayed JOLs, in which the judgments are made using only the cue at some time after the study effort, appear to be among the most accurate ways of making a self-assessment of one’s own learning, both in terms of relative accuracy (Nelson & Dunlosky, 1991) and calibration (Finn & Metcalfe, 2007, 2008; Koriat & Bjork, 2005). For this reason, we were especially interested in understanding the mechanisms underlying delayed JOLs.
Research on delayed JOLs focuses on the postulate that the mechanism for making these judgments is an attempted retrieval of the target (Nelson, Narens & Dunlosky, 2004). Here, we test the idea that although some delayed JOLs may, indeed, be based on a retrieval attempt as most researchers have proposed, there is a second basis for these judgments--cue familiarity. We will investigate whether these two mechanisms that may underlie delayed JOLs are separable, and also whether they may have different consequences for the accuracy of the JOLs and for people’s subsequent memory.
The reason many researchers have thought that delayed JOLs may be based on retrieval is that the relative accuracy of people’s delayed judgments is substantially higher than when those judgments are made immediately after the study presentation (Begg, Duft, Lalonde, Melnick, & Sanvito, 1989; Benjamin & Bjork, 1996; Benjamin, Bjork & Schwartz, 1998; Kimball & Metcalfe, 2003; Koriat, 1997; Nelson & Dunlosky, 1991; 1992; Nelson, Dunlosky, Graf & Narens, 1994; Spellman & Bjork, 1992). There have been three main theories of why the delayed JOL accuracy advantage occurs, and each of the three implicates a retrieval attempt in the case of delayed JOLs. Indeed, only two studies (Benjamin, 2005; Son & Metcalfe, 2005) have suggested that something else may underlie some delayed JOLs.
The case for the postulate that people use an attempt to retrieve the target as the basis of their delayed JOLs comes primarily from studies and theories that have attempted to explain the difference in immediate and delayed JOL relative predictive accuracy with respect to the criterion test, that is, the ‘delayed JOL effect.’ The first proposal to explain this finding was the monitoring dual memories hypothesis given by Nelson and Dunlosky (1991) which states that immediate judgments are based on retrieval from both short-term memory (STM) and long-term memory (LTM). While making an immediate judgment the target item is still in STM and so judgments made immediately will not entail a retrieval attempt from LTM and hence will be poor at discriminating between what will be remembered and what will be forgotten when the test is delayed. By contrast, delayed JOLs rely only on retrieval from LTM, which is more diagnostic of what will happen at the final test.
The second explanation of the delayed JOL effect is the transfer appropriate processing view (Begg et al., 1989; Dunlosky & Nelson, 1992; Glenberg, Sanocki, Epstein & Morris, 1987; Roediger, Weldon, & Challis, 1989), which states that retrieval enacted at a delay is more similar to the retrieval that the person will use at test than are the processes that people use to make immediate JOLs. Therefore, the delayed retrieval will be more diagnostic of how people will do on the test. Although there are data militating against this theory (Dunlosky & Nelson, 1997; Dunlosky, Rawson, & Middleton, 2005; Weaver & Kelemen, 2003), our only point here is that it postulates that the reason for the delayed JOL to test accuracy is a retrieval attempt. By both of these views, if there were no target retrieval attempt the correlations between JOLs and later test performance would be low rather than high. We will make a similar assumption--that a target retrieval attempt should result in a high JOL to test correlation, but if no retrieval attempt is made that correlation will be lower. We will use this as a method to tease apart the hypothesized two processes in delayed JOLs.
The third view is the Self-Fulfilling Prophecy explanation. By this view, the improvement in the relative accuracy of the delayed JOLs comes about because those judgments themselves have an effect on later memory test performance: they involve retrieval, and retrieval, if successful, enhances memory (Kimball & Metcalfe, 2003; Spellman & Bjork, 1992). This theory, like the others, states that people make their delayed JOLs by attempting to retrieve the target. If they are successful, they give those items a high JOL; if unsuccessful, they assign a low JOL. The critical difference between this theory and the two others is that these authors note (and demonstrate, in the case of Kimball & Metcalfe, 2003) that the act of successful retrieval at a delay enhances memory for those items that are brought to mind (see Roediger & Karpicke, 2006). Those retrieved items are not only given high JOLs, but also get a memory boost. Thus, the JOL itself, insofar as it involves retrieval, should enhance memory. We will return to this point shortly, since we will not only look for higher relative accuracy if the learner is retrieving to make his or her JOLs, but we will also look for enhanced memory.
Despite the near consensus that delayed JOLs are based on an attempt at target retrieval, Son and Metcalfe (2005) have recently presented data that suggest that some delayed JOLs may not be based on target retrieval. Three experiments compared the reaction times of people when making JOLs without any instructions to when they were told to retrieve and then make the JOLs. According to a retrieval-only hypothesis, people should attempt to retrieve the target in both cases: telling them to do what they would do anyhow should not alter their behavior. If so, then the RT functions in these two cases should track one another. In both cases, the time needed to make the JOL should increase as the JOLs decrease and target retrieval becomes more difficult and time consuming.
However, Son and Metcalfe (2005) found that the reaction times for the lowest JOL items did not follow this pattern: some ‘don’t know’ judgments were made very quickly. The pattern of reaction time data followed the expectations of the retrieval hypothesis in the case where people were told to retrieve first and then make their JOLs: reaction times increased monotonically with the lowest JOLs showing the longest reaction times. But a different pattern was seen for the JOL alone condition. It showed a nonmonotonic reaction time function with the lowest JOLs being made very rapidly rather than very slowly. Indeed, a measure of the lowest JOLs in the JOL alone condition showed that they were made faster than the time needed to make a retrieval attempt. When making the lowest JOLs, people seemed to know that they did not know without having to take the time needed to attempt to retrieve the target.
To make these very fast, low JOLs, Son and Metcalfe (2005) suggested that people might be evaluating how familiar they were with the cue, assessing it as low, and making their judgment based on this evaluation. They suggested that both cue familiarity and target retrievability may play a role in making JOLs. Fast low JOLs arise because cue familiarity is assessed as low, and no attempt is made in these cases to retrieve the target. Thus, the judgment process can conclude rapidly. When cue familiarity is assessed as high and the target is retrieved very quickly, a high JOL is given--but it is a somewhat slower judgment.
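The two-route account just described can be sketched as a simple decision procedure. This is only an illustrative sketch, not the model the authors present: the function name `delayed_jol`, the numeric familiarity scale, and the `familiarity_threshold` value are all hypothetical, and intermediate JOL values are left out for brevity.

```python
def delayed_jol(cue_familiarity, retrieve, familiarity_threshold=0.5):
    """Return (jol, route) for one cue under a two-route sketch.

    cue_familiarity: hypothetical familiarity score in [0, 1].
    retrieve: zero-argument callable that returns the target or None;
        it stands in for the slower retrieval attempt.
    familiarity_threshold: hypothetical cutoff, not taken from the source.
    """
    # Fast route: an unfamiliar cue yields a quick low JOL,
    # and no retrieval attempt is made.
    if cue_familiarity < familiarity_threshold:
        return "low", "familiarity"
    # Slow route: a familiar cue triggers a retrieval attempt,
    # and the JOL follows the outcome of that attempt.
    target = retrieve()
    return ("high" if target is not None else "low"), "retrieval"


# A new cue is judged without calling retrieve at all.
print(delayed_jol(0.1, lambda: "honey"))
# A familiar cue whose target comes to mind gets a high JOL.
print(delayed_jol(0.9, lambda: "honey"))
```

The point of the sketch is the ordering of the two checks: the familiarity check can end the judgment before any retrieval is attempted, which is what would make some low JOLs faster than a retrieval attempt.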
If their explanation of the reaction time data is correct, there are three testable consequences. First, there should be a beneficial memory effect of retrieval, but only when the JOLs are based on target retrieval and not when they are based only on cue familiarity. A number of research reports have shown that testing and retrieval have beneficial effects on later memory (e.g., Butler & Roediger, 2007; Karpicke & Roediger, 2008; McDaniel & Fisher, 1991; McDaniel, Kowitz, & Dunay, 1989; McDaniel & Masson, 1985; Roediger & Karpicke, 2006; Pashler, Cepeda, Wixted, & Rohrer, 2005; Pashler, Zarow, & Triplett, 2003). Whitten and Bjork (1977) have found similar memory benefits for retrieval practice. This enhancement, presumably, occurs only on the items that are retrieved (and not on the ones that fail to be retrieved). Nevertheless, some items should get a memory boost from the JOL procedure itself, as long as that JOL process involves retrieval. The finding that successful retrieval enhances memory can be used as a dependent measure to see, retrospectively, whether one JOL condition was more likely to involve retrieval than another.
Second, all three dominant theories propose that delayed JOLs accurately predict performance because of the retrieval attempt. It follows that we would expect to see very high JOL relative accuracy in the case where those JOLs are made, primarily, on the basis of target retrieval. JOL relative accuracy should be less good were the JOLs to be based, mainly, on cue familiarity without a retrieval attempt.
Third, we should be able to experimentally manipulate the two kinds of judgments rather than just relying on correlational evidence. If the cue-familiarity-based JOLs are made quickly, whereas the target-retrieval-based JOLs are made more slowly, we should expect to see that variables that selectively affect cue familiarity should impact more on the speeded JOLs, whereas variables that affect retrieval should impact primarily on the unspeeded JOLs. Benjamin (2005), in a study that manipulated cue and target familiarity, found promising preliminary evidence in support of the second and third propositions. We shall explore this third prediction further, as well.
Experiment 1
In the first experiment, we manipulated target retrievability by using multiple pictorial cue exemplars of a particular category (bear1, bear2, bear3, bear4) and either paired each category cue with a single target word--resulting in high retrievability, or paired each category cue with multiple targets--resulting in low target retrievability. An example of the pictorial cues used in this experiment is given in Figure 1. Using the pictorial variants of the category allowed us to be explicit about which target was specified in the multiple target condition, while still keeping the cue familiarity the same in the two conditions. Our two primary conditions were, therefore, A-B, A’-B, A”-B, A”’-B (which, for simplicity, we will hereafter call A-B A-B), and A-B, A’-C, A”-D, A”’-E (which we will hereafter call A-B A-C). A-B A-B is, of course, a positive transfer situation, and should result in good recall of the target, whereas A-B A-C is a negative transfer situation, and should result in poorer recall of the target.
We also varied whether the JOL that people made at a delay was speeded or unspeeded. In the speeded condition, participants had to respond in less than 3/4 of a second, or else they heard a voice (in the computer program we used) say: “Hurry” and a “Too slow! Data lost!” written message appeared onscreen. In the unspeeded condition they were told to take their time in making the judgments, and no voice ever intruded. In the judgment phase we also included pictorial cues that had never been presented. We call this the ‘New’ condition.
Our predictions were that in the speeded conditions the JOLs would be lowest in the New cues condition (because of a lack of cue familiarity). They would be higher, but about the same in the A-B A-C condition and in the A-B A-B condition (because of greater, but equal, cue familiarity, and little ‘contamination’ from target retrieval). In the unspeeded condition we expected low JOLs in the New condition as well (because of a lack of both cue familiarity and target retrievability). But here we predicted higher JOLs in the A-B A-C condition than in the new condition (because of higher target retrievability) and still higher JOLs in the A-B A-B condition (because the target would be easiest to retrieve in this condition).
We also predicted that the JOL gammas indexing the relative accuracy would be higher in the unspeeded than in the speeded JOL condition. The difference in gamma correlations was expected on the grounds that the JOLs would be based much more on attempted retrieval in the unspeeded JOL condition than in the speeded condition. And, finally, we predicted that recall would be better in the unspeeded JOL condition than in the speeded JOL condition. The purported retrieval attempt, in the unspeeded JOL condition, was expected to improve recall of those items that were retrieved. In the speeded JOL condition a target retrieval was predicted much less frequently and thus less recall enhancement was expected.
Method
Participants
The participants were 32 undergraduates at Columbia University and Barnard College. They participated for course credit or were paid at a rate of $12 an hour for participating. Participants were treated in accordance with the ethical principles of the APA and the Columbia University IRB approved all of the experiments in this article.
Design and materials
The experiment was a 2 (Speeded or Unspeeded JOL) X 2 (Encoding Condition, A-B A-B, or A-B A-C) X 12 (within-list repetitions of the basic design, over which the data were collapsed) within-participant design. Participants also made JOLs, in both the speeded and unspeeded condition, on 12 new cues.
The picture cues were four distinct exemplars of a particular category, which shared a common name, as shown in Figure 1. These cues, each being slightly different from one another, allowed us to uniquely query a particular target in the JOL and memory tests.
Procedure
Participants were shown, one at a time, and instructed to remember, 48 picture-word pairs. The 48 cues represented 6 distinct categories with 4 exemplars per category in each of the A-B A-B and the A-B A-C conditions, randomly mixed into a single list of items. Each picture-word pair was presented for 3 s of study on each presentation, and the entire list was shown twice. Participants were then asked for their JOLs for 12 cues from that list, and 6 cues that were new. The 12 cues from the list were selected such that 6 were cues from the 6 categories in the just-studied A-B A-B condition and 6 were from the A-B A-C condition. The cue used for the JOL was randomly selected from one of the 4 exemplar pictures that had been studied for each category. The JOL cue was the same as was then given in the test phase. The 6 New cues were randomly selected from other categories of pictures that each had four exemplars. After making their JOLs, participants were then tested for recall on the 18 items on which they had made JOLs.
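The composition of the 48-pair study list can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `build_study_list`, the placeholder cue and target labels, and the seeded shuffle are hypothetical, and the sketch covers only one pass through the list, not the second presentation.

```python
import random


def build_study_list(n_categories=6, n_exemplars=4, seed=0):
    """Build one study list: 6 categories x 4 exemplars x 2 conditions."""
    pairs = []
    for c in range(n_categories):
        for e in range(n_exemplars):
            # A-B A-B: every exemplar of a category shares one target.
            pairs.append({"condition": "A-B A-B",
                          "cue": f"same{c}_exemplar{e}",
                          "target": f"same{c}_target"})
    for c in range(n_categories):
        for e in range(n_exemplars):
            # A-B A-C: each exemplar of a category gets its own target.
            pairs.append({"condition": "A-B A-C",
                          "cue": f"varied{c}_exemplar{e}",
                          "target": f"varied{c}_target{e}"})
    # The two conditions are randomly mixed into a single list.
    random.Random(seed).shuffle(pairs)
    return pairs


study_list = build_study_list()
print(len(study_list))  # 48
```

Laid out this way, the two conditions have the same number of cues per category (so cue exposure is matched), while the number of distinct targets per category differs: one in A-B A-B and four in A-B A-C.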
There were two trials. The second trial was the same as the first (with different materials, of course) except that if the judgments had been speeded on the first trial they were unspeeded on the second, and if they had been unspeeded on the first they were speeded on the second. The speed of the first trial judgments was counterbalanced over participants.
The procedure in making the JOLs was as follows. Participants were told, “After you are presented with the pairs you will have an opportunity to give a JOL. A JOL is a Judgment of Learning which indicates how confident you are that in about 10 minutes from now you will be able to recall the target when prompted with the picture.” They made their JOLs by pressing one of four keys that ranged in quarters from 0-100%. Keys were marked on the keyboard. In both conditions there was a practice trial in which the judgments were made at the speed at which they would be made during the experiment, and in which participants were told that for the upcoming trial they would be making either speeded or unspeeded judgments. This practice trial was especially important in the speeded conditions, because it gave participants the opportunity to practice with the JOL buttons as quickly as was necessary during the experiment, before we were collecting data. During the practice trial, as well as during the experiment, a prerecorded voice in the speeded conditions said ‘Hurry!’ and a ‘Too slow! Data lost!’ message appeared if the JOL response exceeded .75 s. This happened during the experiment on 15% of the speeded trials. We included all of the items in the analyses below, though, even those that exceeded .75 s.
Results
Latencies
The mean time to make the Speeded JOLs was .61 s as compared to 1.48 s in the Unspeeded condition, t(31) = 7.37, p <.05. (We also conducted a separate analysis that excluded items that exceeded .75 s in the speeded condition. The pattern of results was the same as shown below.)
Recall
As predicted, recall was better in the Unspeeded JOL condition than in the Speeded condition. Unspeeded judgments showed a recall advantage (M = .69, SE = .04) over the Speeded condition (M = .63, SE = .04). This main effect was significant, F(1,31) = 5.53, MSe = .02, p < .05, η2p = .15 (effect size is reported using partial eta squared, η2p ).
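Partial eta squared can be recovered from a reported F ratio and its degrees of freedom, which gives a quick consistency check on the statistics reported here. The sketch below uses the standard identity η2p = (F × df_effect) / (F × df_effect + df_error); the function name is ours, not the authors'.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of JOL speed on recall reported above: F(1,31) = 5.53.
print(round(partial_eta_squared(5.53, 1, 31), 2))  # 0.15
```

The recovered value matches the reported η2p = .15.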
As was expected, Encoding Condition A-B A-B showed better recall performance (M = .83) than Condition A-B A-C (M = .48), F(1,31) = 71.34, MSe = .06, p < .05, η2p = .70. The interaction between condition and judgment speed was not significant (F < 1). The recall means are shown in Figure 2.
JOLs
The JOLs for the new items were included in this analysis, in both the unspeeded and speeded JOL conditions. All of the relevant effects and interactions were still significant, however, when the data were reanalyzed with the new items eliminated. As predicted, when people made Speeded JOLs their judgments followed the familiarity of the cue, whereas when they made Unspeeded JOLs the judgments followed the retrievability of the target. The interaction between JOL Speed and Encoding Condition, F(2,62) = 16.82, MSe = .21, p < .05, η2p = .35, is shown in Figure 3. Both the speeded and the unspeeded JOLs showed low mean judgments on the new items. In the speeded condition, although both the A-B A-B and the A-B A-C condition showed higher JOLs than those given to the new cues (t(31) = 7.83, p < .05, t(31) = 8.74, p < .05, respectively), there was no significant difference between them, t(31) = 1.52, p > .05. There was, however, a difference between the JOLs in the A-B A-B condition and the A-B A-C condition in the unspeeded JOL condition, reflecting a similar difference in retrieval in these two conditions, t(31) = 5.56, p < .05.
There was also, of course, a main effect of Encoding Condition, F(2,62) = 119.14, MSe = .51, p < .05, η2p = .79. There was a main effect of JOL speed, F(1,31) = 11.76, MSe = .22, p <.05, η2p = .28. However, these main effects were qualified by the interaction of interest.
Gamma correlations relating JOLs to recall
Gamma correlations between JOLs and recall index relative metacognitive accuracy. We computed gamma correlations collapsed over all conditions (including the new items) within the unspeeded and speeded JOL conditions. As predicted, the gammas were higher for the unspeeded condition (M = .84, SE = .05) than for the speeded JOL condition (M = .61, SE = .08), t(30) = 2.60, p < .05. We also eliminated the new items and recomputed the gammas only on items that had been presented for study. Once again, they were higher for the unspeeded JOL condition (M = .58, SE = .11) than for the speeded JOL condition (M = .28, SE = .11), t(24) = 2.14, p < .05. (The change in degrees of freedom occurred because some subjects had either all answers wrong or all right, and so a gamma could not be computed for them.)
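For readers unfamiliar with the measure, a per-participant gamma can be computed as sketched below. This assumes the Goodman-Kruskal gamma conventionally used in this literature; the function name and the example data are hypothetical. The sketch also shows why gamma is undefined for a participant who recalled everything or nothing: no item pair differs on recall, so there are no concordant or discordant pairs.

```python
def goodman_kruskal_gamma(jols, recalled):
    """Gamma between JOLs and recall (1 = recalled, 0 = not recalled).

    Returns None when there are no concordant or discordant pairs,
    for example when every item was recalled or every item was missed.
    """
    concordant = discordant = 0
    n = len(jols)
    for i in range(n):
        for j in range(i + 1, n):
            # Pairs tied on either variable contribute nothing.
            product = (jols[i] - jols[j]) * (recalled[i] - recalled[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    if concordant + discordant == 0:
        return None
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical participant: JOL key (1-4) and recall outcome for six items.
jols = [4, 3, 1, 2, 4, 1]
recalled = [1, 0, 0, 1, 1, 0]
print(goodman_kruskal_gamma(jols, recalled))          # 8 concordant, 1 discordant
print(goodman_kruskal_gamma(jols, [1, 1, 1, 1, 1, 1]))  # None: all items recalled
```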
Additional analyses
Using data only from the Unspeeded JOL condition, we were able to investigate the reaction times (RTs) of participants making delayed JOLs when they were not constrained or subject to a time deadline. The data from this condition are comparable to the RT data of Son and Metcalfe (2005) when people were simply asked to make delayed JOLs without further constraints. In addition, because we had used a condition in which the cues were new, we were able to investigate whether under unspeeded conditions people would spontaneously give very fast low JOLs selectively in this condition, presumably, because of a lack of cue familiarity. The reaction time data for the three conditions, along with the proportion of responses in each condition at each of the four JOL levels, and the proportion correct at each of these four levels, are presented in Figure 4. As can be seen, most of the JOL responses in the New condition clustered into the lowest JOL category: people knew that they did not know. And they were very fast. In the A-B A-B condition, in contrast, most of the JOLs clustered into the highest JOL category. The proportion of responses in the highest JOL category was, appropriately, somewhat lower in the A-B A-C condition. They knew that they knew the answers more often in the A-B A-B condition than in the A-B A-C condition. The ‘know’, or highest JOL judgments, in both the A-B A-C and the A-B A-B conditions were made quickly but numerically less quickly than the ‘don’t know’ judgments in the New condition--consistent with the hypothesis. Medium valued JOLs in the A-B A-B and A-B A-C conditions were made more slowly, just as Son and Metcalfe (2005) had shown.
We were unable to conduct an ANOVA combining both Levels of JOLs and Encoding Conditions (New, A-B A-B and A-B A-C) on RTs, because there were many cases in which there were no responses at all in the New condition for the highest JOLs, and in the A-B A-B condition for the lowest JOL category. Indeed, there was not a single participant in this experiment who had data in every cell of the full design. Thus, we had to collapse. Accordingly, we conducted two separate one-way ANOVAs, the first comparing RTs across the 3 Encoding Conditions (collapsing over JOL levels), and the second comparing RTs over JOL levels (collapsing over Encoding Conditions). There was a significant effect of Encoding Condition with RT as the dependent variable, F(2, 62) = 9.84, MSe = .39, p < .05, η2p = .24. Although numerically the New condition (at 1.16 s) was faster than the A-B A-B condition (at 1.39 s), the post hoc test comparing these two conditions was not significant, t(31) = 1.41, p > .05. The post hoc tests comparing both the New condition to the A-B A-C condition (at 1.84 s) and the A-B A-B condition to the A-B A-C condition were both significant, t(31) = 3.86, p < .05, and t(31) = 3.69, p < .05, respectively.
There was a main effect for JOL level when RT was the dependent measure, F(3,57) = 6.56, MSe = .61, p < .05, η2p = .26. All differences among means except between JOL level 1 and JOL level 4 and between JOL level 2 and JOL level 3 were significant--indicating an inverted U-shaped curve as a function of JOL level, with the collapsed RT data. Accordingly, we tested for linear, quadratic and cubic trends. Only the quadratic coefficient was significant, t(19) = 2.90, p < .05. These distributional and RT results extend and provide further support for the dual process hypothesis.
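One common way to test such trends with four equally spaced levels is to compute a contrast score per participant from orthogonal polynomial weights and then test the scores against zero. The sketch below illustrates that approach; whether the trend tests reported here were computed in exactly this way is an assumption, and the function names and example RTs are hypothetical.

```python
import math

# Standard orthogonal polynomial weights for four equally spaced levels.
CONTRASTS = {
    "linear": (-3, -1, 1, 3),
    "quadratic": (1, -1, -1, 1),
    "cubic": (-1, 3, -3, 1),
}


def contrast_scores(rt_by_level, weights):
    """One contrast score per participant (rows = participants, 4 RTs each)."""
    return [sum(w * rt for w, rt in zip(weights, row)) for row in rt_by_level]


def one_sample_t(scores):
    """t statistic and degrees of freedom for testing mean(scores) against 0."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return mean / math.sqrt(variance / n), n - 1


# Hypothetical mean RTs (s) at JOL levels 1-4 for three participants.
rts = [[1.0, 1.9, 1.8, 1.2], [1.1, 1.6, 1.9, 1.3], [0.9, 1.7, 1.5, 1.1]]
for name, weights in CONTRASTS.items():
    t_value, df = one_sample_t(contrast_scores(rts, weights))
    print(name, round(t_value, 2), df)
```

With an inverted U-shaped RT function, the quadratic scores are negative (the middle levels are slower than the extremes), while the linear and cubic scores hover near zero. Only participants with RTs at all four JOL levels can contribute a score, which would account for degrees of freedom smaller than the full sample.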
Discussion
The predictions of the dual process model of delayed JOLs held up very well in the first experiment. The relative accuracy of the JOLs, as indexed by the gamma correlations, was higher with unspeeded than speeded JOLs. This pattern was consistent with the idea that the slow process that people use in making delayed JOLs involves a target retrieval attempt, but the fast process involves something else. Memory was better when the JOLs were slow rather than fast, suggesting a benefit from retrieval practice that was greater in the unspeeded condition. The manipulation that affected target retrieval had an impact only on the unspeeded JOLs and did not show up on the speeded JOLs. These three results suggest that the two processes are different and dissociable. They also suggest that the slow process may be an attempt at target retrieval. The low JOLs in evidence in the condition in which the cues were new suggest that the fast process was probably cue familiarity, but this suggestion is equivocal because both the cue and the target were completely unfamiliar in this case. Not only was the cue unfamiliar, but the target was also unretrievable, because no target had been presented.
Experiment 2
Although the results of the first experiment were supportive of our hypothesis, we had only included a measure of cue familiarity during the judgment process and retrieval but not during encoding. Thus, in the second experiment, we used the same basic design as had been used in the first experiment except that we added another condition in which the cue and target were presented only once. Thus, our three encoding conditions were A-B A-B, A-B A-C, and A-B-- the latter being a condition in which the cue was presented only once, and hence cue familiarity was expected to be lower than in the other two conditions.
Method
The participants were 42 undergraduates at Columbia University and Barnard College. They participated for course credit or were paid at a rate of $12 an hour for participating. The method was identical to that of Experiment 1 except that an A-B condition was also included. In the A-B condition pictorial cues were selected randomly from the same set as the other cues, and targets were drawn from the same set as the other targets and were presented only once during list presentation. People made speeded or unspeeded JOLs about four classes of cues in this experiment: those from the A-B A-B condition, from the A-B A-C, from the A-B condition, and cues that were new.
Results
Latencies
The mean time to make the speeded JOLs was .56 s. The mean time to make unspeeded JOLs was 1.24 s. This difference was significant, t(41) = 9.64, p <.05.
Recall
As predicted, recall was better in the unspeeded JOL condition (M = .59, SE = .03) than in the speeded JOL condition (M = .53, SE = .03), F(1,41) = 5.18, MSe = .03, p < .05, η2p = .11. In addition, the A-B A-B condition showed the best recall performance (M = .88); the A-B A-C condition was in the middle (M = .45), and the A-B condition was the worst (M = .35), F(2, 82) = 152.69, MSe = .06, p < .05, η2p = .79. The interaction between condition and speed was not significant. The means for recall are shown in Figure 5.
JOLs
Because of our manipulation, the familiarity of the cues was as follows: A-B A-B = A-B A-C > A-B > New. The pattern of JOLs in the speeded condition followed this ordering. In contrast, in the unspeeded JOL condition the JOLs tracked the memorability of the targets: A-B A-B > A-B A-C > A-B > New. The interaction between JOL speed and encoding condition was significant, F(3,123) = 12.62, MSe = .25, p < .05, η2p = .24, as is shown in Figure 6. The pattern of judgments in the speeded condition showed the A-B A-B condition and the A-B A-C condition both being high but not significantly different from one another, t(41) = 1.85, p > .05, the A-B condition being lower, and significantly different from both the A-B A-B condition, t(41) = 5.75, p < .05, and the A-B A-C condition, t(41) = 4.61, p < .05, and the New condition, in which the cue was not seen at all and hence was maximally unfamiliar, being lower yet, and significantly lower than the speeded JOLs in the A-B condition, t(41) = 3.61, p < .05.
The post hoc comparisons for the unspeeded JOL conditions showed that the A-B A-B condition was higher than the A-B A-C condition, t(41) = 6.49, p < .05; the A-B A-C condition was higher than the A-B condition, t(41) = 7.13, p < .05; and the A-B condition was higher than the new condition, t(41) = 8.83, p < .05. This interaction, as before, was our main prediction concerning JOLs.
There was a main effect of Encoding Condition, F(3, 123) = 129.49, MSe = .40, p < .05, η2p = .76. There was also a main effect of JOL speed, F(1,41) = 9.32, MSe = .36, p < .05, η2p = .19. But these main effects were qualified by the interaction of interest.
Gamma correlations relating JOLs to recall
As in Experiment 1, gammas were predicted to be higher in the unspeeded than the speeded condition. We computed gamma correlations collapsed over the three encoding conditions (excluding the new items) within the unspeeded and speeded JOL conditions. As predicted, they were higher for the unspeeded condition (M = .51, SE = .07) than for the speeded JOL condition (M = .19, SE = .06), t(41) =3.87, p < .05.
Additional analyses
The reaction time data for the four conditions in this experiment, and the proportion of responses in each condition at each of the four JOL levels, as well as the proportion correct at each of these four levels, are presented in Figure 7. Most JOL responses in the New condition were found to be in the lowest ‘don’t know’ JOL category. These responses were very fast. As in the first experiment, in the A-B A-B condition, most of the JOLs clustered into the highest JOL category, and they were also fast but not quite as fast as the ‘don’t know’ responses in the New condition. The proportion of responses in the highest JOL category was lower in the A-B A-C condition and the A-B condition. Medium JOLs in the conditions where targets had been presented were made more slowly than when high JOLs were given, again, as Son and Metcalfe (2005) had shown. These distributional and RT results are consistent with those of the first experiment, and provide further support for the dual-process hypothesis.
We were unable to conduct an ANOVA combining both Levels of JOLs and Conditions (New, A-B A-B, A-B A-C, and A-B) on RTs, because, again, there were no participants in this experiment who had data in every cell of the full design. Thus, we had to collapse into two separate one-way ANOVAs, the first comparing RTs across the 4 Encoding Conditions (collapsing over JOL levels), and the second comparing RTs over JOL levels (collapsing over Encoding Conditions). There was a significant effect of Encoding Condition with RT as the dependent variable, F(3, 123) = 6.44, MSe = .17, p < .05, η2p = .14. Although numerically the New condition (at 1.03 s) was faster than the A-B A-B condition (at 1.20 s), the post hoc test comparing these two conditions just missed being significant, t(41) = 1.94, p = .06. The post hoc tests comparing both the New condition to the A-B A-C condition (at 1.38 s) and to the A-B condition (at 1.36 s) were both significant, t(41) = 3.11, p < .05, and t(41) = 4.53, p < .05, respectively.
There was a main effect of JOL level when RT was the dependent measure, F(3, 63) = 5.13, MSe = .18, p < .05, η2p = .20. All differences among means, except those between JOL level 1 and JOL level 4 and between JOL level 2 and JOL level 3, were significant, indicating an inverted U-shaped curve as a function of JOL level. We tested for linear, quadratic, and cubic trends. Only the quadratic coefficient was significant, t(21) = 2.68, p < .05.
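The effect sizes reported with these ANOVAs can be checked against the F ratios and their degrees of freedom. The sketch below assumes the standard identity η2p = (F × df_effect) / (F × df_effect + df_error); the function name is hypothetical and is used for illustration only, not taken from the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    # F * df_effect equals SS_effect / MS_error, and df_error equals
    # SS_error / MS_error, so the ratio below is SS_effect / (SS_effect + SS_error).
    scaled_effect = f_value * df_effect
    return scaled_effect / (scaled_effect + df_error)


# Encoding Condition effect on RT: F(3, 123) = 6.44
print(round(partial_eta_squared(6.44, 3, 123), 2))  # 0.14
# JOL level effect on RT: F(3, 63) = 5.13
print(round(partial_eta_squared(5.13, 3, 63), 2))   # 0.2
```

Both values agree, to two decimal places, with the effect sizes reported in the text.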
Experiment 3
In Experiment 3 we again varied the target retrievability and the cue familiarity, as well as the speed of the JOLs, in a single crossed design. The results of the first two experiments were supportive of the idea that slow JOLs were based on retrieval and that cue familiarity was what drove the fast JOLs, especially fast ‘don’t know’ JOLs. However, the fact that the recall shown for the A-B condition in Experiment 2 was lower than in the other conditions in which the target had been presented (A-B A-B and A-B A-C) made our results equivocal. We had intended the A-B condition to vary only in terms of cue familiarity. We thought it unlikely, but it was nevertheless possible, that target recall, rather than only cue familiarity, could have been a factor in the difference in the fast JOLs between the A-B condition and the A-B A-C and the A-B A-B conditions. Here, we sought to devise a manipulation that would allow us to better isolate cue familiarity.
Specifically, we wanted to get rid of the possibility that target recall could be a contaminant of cue familiarity (or vice versa). To do so, we attempted to construct a zero retrieval condition, in which cue familiarity was still varied. If retrieval were zero in both high and low cue familiarity conditions, then the only thing that could affect JOLs, would be cue familiarity (if that were, in fact, what drove the fast JOLs). If the magnitude of the cue-familiarity JOL effect with fast JOLs was the same when retrieval was zero and when retrieval was higher, then we could be more confident in attributing the effect to cue familiarity itself. Thus, by negating the possibility of target retrieval, the effect of the cue familiarity variation could be isolated.
To vary target retrievability, from retrievable to not retrievable, we could, of course, simply either present a target or not. However, a no-target condition posed other problems in terms of the sensibility of the JOL question we were asking our participants. If no target were given following a cue, but just a blank space, the participant might remember that nothing at all was there following a particular cue. What would the correct answer be, then, to the JOL question of how likely is it that you will be able to recall what was paired with the cue, in a few minutes? If nothing had been presented and the participant knew that nothing had been presented, he or she might be justified in answering the question with a very high JOL, and then later, correctly, answering “nothing.” Would he or she be right or wrong to do this? We did not know, but this did not seem to be a good solution.
To get around this conundrum, we needed to present something, but something that would not be retrievable. So, in the no target condition we presented scrambled letters for 16 msec, followed immediately by a pattern mask. Participants saw something rather than simply nothing. But they were unable to retrieve anything from this presentation. No participant reported that there had not been words presented in the no target condition. It simply seemed to them that whatever had been presented had gone by too quickly for them to process--they could retrieve nothing. (In a pilot experiment, a word, rather than scrambled letters, had been presented in this condition followed by a pattern mask. We had presented the word for 21 msec--supposedly below the threshold of word recognition. However, our participants remembered the words presented in this manner about 10% of the time, and when we presented them three times--to vary target retrievability--their recall performance was at about 30%. For this reason we resorted to presenting scrambled letter strings rather than masked words.) This procedure of presenting something, however unable the participant was to process it, allowed us to ask the JOL question in a way that made sense.
Cue familiarity was varied by altering the duration of the cue. Cues were presented for either .5 s or 8 s. However, we wanted cues in which a difference in duration would make a large difference in their familiarity or fluency. Some stimulus items may be fully processed within a very short time interval, in which case even a large difference in cue duration might not be effective in altering cue familiarity. We also wanted cues for which, especially at the fast rate, we could be fairly sure that processing would be closely limited to the presentation time, since we did not want people to continue to process the cue even after it had been removed from the perceptual field. If we had used words, for example, so long as people could read the word in .5 s, they could have continued to elaborate and think about it in the seconds that followed, which would not have allowed us to construct a clean design. That is, people could steal time to encode the cue further, beyond its nominal presentation interval.
To get around this problem, we used materials that made this possibility unlikely--fractal patterns. These patterns, two of which are shown in Figure 8, are exceedingly difficult, if not impossible, to verbalize when presented for only .5 s. They could be fairly well encoded, and, for some participants, verbalized, when they were exposed for 8 s. Thus, these particular cues afforded a large difference in familiarity, usability, or fluency as a function of presentation duration, which is what we wanted.
The third factor we varied was JOL speed, either speeded or unspeeded, as in the previous experiments. We predicted that recall would be better in the unspeeded JOL condition than in the speeded JOL condition, as before (though, of course, only when a target word had actually been presented). We also predicted, as before, that the gammas relating JOLs to recall would be higher in the unspeeded condition than in the speeded condition. In addition here we predicted a three-way interaction. Cue familiarity alone was predicted to selectively affect the speeded JOLs, with the 8 s cues giving rise to higher speeded JOLs than the .5 s cues. We expected no effect of target retrievability on the speeded JOLs. Target retrievability was predicted to affect the unspeeded JOLs, with the retrievable targets giving rise to higher JOLs than the unretrievable targets. This three-way interaction would provide firmer evidence, not only that the slow process was attempted target retrieval, but that the fast process was an assessment of cue familiarity.
Method
Participants were 35 Columbia University or Barnard College students who received course credit or cash. The design was a 2 X 2 X 2 within-participant factorial design, where the factors were speed of judgments (either speeded, .75 s or less, or unspeeded, as long as they wanted), cue familiarity (fractal shown for either .5 s in the unfamiliar condition or 8 s in the familiar condition), and target retrievability (target or no target). In the target condition, the words were presented, following the cue, for 3 s. In the no-target condition, scrambled letter strings were presented for 16 msec, followed immediately by a pattern mask for 250 msec. The procedure was basically the same as that of Experiments 1 and 2. The dependent variables were recall performance, JOLs, and gammas between JOLs and recall performance.
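As a reading aid, the eight cells of this design can be enumerated directly. The snippet is only a sketch of the factor structure described above; the variable names and condition labels are ours and do not reflect the software used to run the experiment.

```python
from itertools import product

# Factor levels as described in the Method (labels are illustrative).
JOL_SPEED = ("speeded", "unspeeded")   # .75 s deadline vs. self-paced
CUE_DURATION_S = (0.5, 8.0)            # unfamiliar vs. familiar fractal cue
TARGET = ("target", "no target")       # word for 3 s vs. masked letter string

# Fully crossed 2 x 2 x 2 within-participant design.
cells = list(product(JOL_SPEED, CUE_DURATION_S, TARGET))

for speed, cue_s, target in cells:
    print(f"{speed:9s} | cue {cue_s} s | {target}")
```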
Results
Latencies
The mean time to make the speeded JOLs was .49 s. The mean time to make unspeeded JOLs was 1.74 s. This difference was significant, t(44) = 9.17, p <.05.
JOLs
As predicted, when the JOLs were manipulated to be speeded, cue familiarity had an effect on the JOLs (with high familiarity cues producing higher JOLs than low familiarity cues) and target retrievability had no effect. When judgments were unspeeded, target retrievability had an effect. The three-way interaction, shown in Figure 9, was significant, F(1, 34) = 9.26, MSe = .01, p < .05, η2p = .21. With the speeded JOLs, the high familiarity cues resulted in higher JOLs than did the low familiarity cues, t(34) = 2.54, p < .05. At the same time, target retrievability had no effect, t(34) = 1.68, p > .05. When the JOLs were unspeeded, target retrievability had an effect, such that having had a target presented resulted in much higher JOLs than having had no target, t(34) = 8.76, p < .05. Furthermore, in the unspeeded JOL condition, cue duration had an effect only when it could have an impact on retrieval, that is, when there was a target present, t(34) = 6.69, p < .05. When no target was present, the familiarity of the cue produced no difference in JOLs, t(34) = 1.05, p > .05, and the JOLs were close to the lowest possible value of 1. In summary, then, this significant three-way interaction indicates that the fast JOLs were driven by cue familiarity, with little or no influence of target retrievability, whereas the slow JOLs depended on retrieval of the target.
All of the other main effects and interactions in this experiment were significant, but they are all explained by (and qualified by) the pattern of data shown in the three-way interaction. There was a main effect of JOL speed, such that unspeeded JOLs were, on average, higher than speeded JOLs, F(1, 34) = 3.78, MSe = .06, one-tailed p < .05, η2p = .10. There was an effect of target condition, such that JOLs were higher when there was a target than when there was not, F(1, 34) = 73.86, MSe = .02, p < .05, η2p = .69. There was an effect of cue condition, such that JOLs were higher when the cue was presented for a long time rather than a short time, F(1, 34) = 55.92, MSe = .01, p < .05, η2p = .62. There was an interaction between JOL speed and whether or not a target was given, such that presentation of the target mattered much more for the unspeeded JOL conditions than for the speeded JOL condition, F(1, 34) = 25.76, MSe = .02, p < .05, η2p = .43. There was an interaction between JOL speed and cue condition, such that the duration of the cue mattered more in the unspeeded condition than in the speeded condition, F(1, 34) = 6.88, MSe = .01, p < .05, η2p = .17. This interaction perhaps deserves comment, since, at first blush, it would seem to go against the idea that cue familiarity matters for the speeded and not the unspeeded judgments. The significant two-way interaction collapses over both the cases where there was a target and where there was not. There was a large difference in unspeeded JOLs as a function of cue presentation when there was a target, and this large difference was responsible for this double interaction. This occurred because the duration of the cue was important only when a target followed, and when the cue was presented for a long enough time to allow the presented target to be retrieved. When there was no target presented (as shown in the figure for the three-way interaction) there was no effect of cue duration whatsoever at the fast speed. So, taking this two-way interaction at face value without considering the significant three-way interaction, which qualifies it, would be mistaken.
Finally, the interaction between cue duration and whether or not a target was presented was significant, F(1,34) = 11.33, MSe = .02, p<.05, η2p= .25, such that the change in duration of the cue mattered much more when a target had been presented, than when no target had been presented. All of these main effects and interactions are qualified by the significant three-way interaction, which really tells the whole story.
Recall
Because recall in the no-target conditions was necessarily zero, we dropped this condition from all of the analyses on recall. Performance was better in the unspeeded JOL conditions (.25) than in the speeded JOL conditions (.19), F(1, 34) = 6.54, MSe = .02, p < .05, η2p = .16. There was an effect of cue condition, F(1, 34) = 70.17, MSe = .02, p < .05, η2p = .67, such that recall was better with the long presentation of the cues than with the short presentation of the cues. This effect is important because it mirrors the effect of cue familiarity seen in the three-way interaction in which the JOLs are the dependent measure. The difference in recall underscores the idea that this effect in the JOLs is very likely due to differential retrievability. The main finding of interest in the recall data, however, was that making the JOLs slowly improved recall more than making the JOLs quickly, as predicted if the slow JOLs involved a memory-enhancing retrieval process whereas the fast JOLs did not use such a process.
Gamma correlations relating JOLs to recall
Gamma correlations were computed separately for speeded and unspeeded JOL conditions. Within these conditions, they were computed by taking the JOL values given for all cues compared to whether the person gave the word that had been presented with that cue. Thus, a 1 was assigned for recall of the word. A zero was assigned if there had been a word presented and it was not recalled or if no word had been presented (and, of course, it could not be recalled). Thus, if a person assigned a low JOL value to cues with which a target word had not been presented this would contribute to the goodness of the resultant gamma--increasing its positive value. We predicted higher gammas in the unspeeded than the speeded JOL condition, as before. True to prediction, the gammas were higher in the unspeeded (M = .87, SE = .08) than the speeded (M = .56, SE = .13) condition, but the effect was significant only by a one-tailed (though predicted, and therefore justified) test, t(26) = 1.92, p< .05.
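The scoring rule described above can be made concrete with a short sketch of the Goodman-Kruskal gamma, computed as (concordant pairs minus discordant pairs) divided by (concordant pairs plus discordant pairs), with tied pairs ignored. The function name and the data below are made up purely for illustration and are not the participants' data; recall is scored 1 for a recalled word and 0 otherwise, including cues for which no word had been presented.

```python
from itertools import combinations


def goodman_kruskal_gamma(jols, recalled):
    """Gamma between JOL ratings and recall scores (1 = recalled, 0 = not)."""
    concordant = discordant = 0
    for (j1, r1), (j2, r2) in combinations(zip(jols, recalled), 2):
        direction = (j1 - j2) * (r1 - r2)
        if direction > 0:
            concordant += 1      # higher JOL went with better recall
        elif direction < 0:
            discordant += 1      # higher JOL went with worse recall
        # pairs tied on either variable are ignored
    if concordant + discordant == 0:
        return None              # gamma is undefined when every pair is tied
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical participant: the first two cues had no target (scored 0).
jols = [1, 1, 2, 4, 3, 4]
recalled = [0, 0, 1, 1, 0, 1]
print(round(goodman_kruskal_gamma(jols, recalled), 2))  # 0.78
```

In this made-up case, the low JOLs given to the two no-target cues contribute concordant pairs, which is how such items raise the resulting gamma.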
General Discussion
These experiments provide support for the conclusion that there are two processes underlying people’s delayed judgments of learning. The first of these processes is the recognition of the cue. The second uses the recognized cue in an attempt to retrieve the target. The first process--the recognition of the cue--may, if the cue fails to be recognized, give rise to fast ‘don’t know’ judgments. In such a situation, where the individual does not even recognize the cue, he or she will not go onto the second process of trying to retrieve the target. Instead, a quick and decisive low JOL will be given and further processing stopped. Furthermore, because there has been no attempt to retrieve the target, no beneficial memory enhancement, attributable to retrieval, ensues.
But all low JOLs are not fast. It is also possible to obtain slow low JOLs. These, however, come about after a successful recognition of the cue coupled with an unsuccessful attempt at retrieval. Thus, at the low to middle end of the JOL scale there is a mix between fast ‘don’t know’ JOLs and fairly slow JOLs that come about because of retrieval failure.
A flow chart outlining the two processes that we propose underlie spontaneous delayed judgments of learning is given in Figure 10. As is shown in this figure, upon receiving the cue, the first process is to determine whether the cue is, itself, recognized. If it is recognized, then the way is clear to go on to the second stage. If not, then there is an endpoint fast ‘don’t know’ JOL. There are, of course, a number of compelling and well-elaborated models of recognition, and for purposes of determining this first stage of JOLs, differences among these are most likely inconsequential. However, for sake of illustration consider how the processes in random walk or diffusion models of recognition (see Ashby, 2000; Luce, 1986; Ratcliff, 1978; Ratcliff, Van Zandt, & McKoon, 1999) would map to fast JOLs. In such models, there are two criteria--a lower match boundary and an upper match boundary. The lower boundary results in a decision, in old/new recognition experiments, that the cue is new (i.e., ‘no’). In the case of JOLs, reaching this criterion results in the lowest JOL being output, and in further processing stopping. The upper boundary in old/new recognition tasks results in an ‘old’ or ‘yes’ decision. In the case of JOLs, this positive recognition triggers the next phase--an attempt at target retrieval. The amount of time it takes to reach these two boundaries generates the reaction time functions in recognition experiments. In the JOL situation, the time to reach the ‘no’ boundary generates the reaction time for the fast ‘don’t knows.’ RTs for fast high JOLs, in this simple model, are the sum of the time to reach the ‘yes’ cue-recognition boundary plus the time taken to retrieve the target, following cue recognition.
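For illustration, the two-boundary random walk described above can be sketched as a simple simulation. This is a minimal sketch under our own assumptions, not the fitted model of any of the cited authors: the function name, the unit step size, the boundary values, and the use of a step-up probability (`p_up`) as a stand-in for cue familiarity are all hypothetical.

```python
import random


def cue_recognition_walk(p_up, upper=10, lower=-10, max_steps=10_000, rng=None):
    """Simulate one cue-recognition random walk (all parameters hypothetical).

    Returns (outcome, steps): 'yes' if the upper boundary is reached, 'no' if
    the lower boundary is reached, 'undecided' if neither is reached in time.
    """
    rng = rng or random.Random()
    evidence = 0
    for step in range(1, max_steps + 1):
        # Each step moves one unit toward 'yes' with probability p_up.
        evidence += 1 if rng.random() < p_up else -1
        if evidence >= upper:
            return "yes", step   # cue recognized: go on to target retrieval
        if evidence <= lower:
            return "no", step    # cue not recognized: fast 'don't know' JOL
    return "undecided", max_steps


rng = random.Random(1)
print(cue_recognition_walk(p_up=0.7, rng=rng))  # a relatively familiar cue
print(cue_recognition_walk(p_up=0.3, rng=rng))  # a relatively unfamiliar cue
```

The number of steps returned plays the role of the reaction time: in this sketch, the step count at the ‘no’ boundary corresponds to the latency of a fast ‘don’t know’ JOL.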
Relative to a recall process, which may sometimes take seconds to complete, the recognition process is fast. Reber, Alvarez, and Squire (1997), for example, reported recognition reaction time functions with short retention intervals for correct ‘yes’ decisions that peaked at around .68 s, with about 2/3 of the responses being under .75 s. Nearly all ‘yes’ responses had been made within the first second of processing. ‘No’ responses are often a bit longer, but not much. Our own mean RTs for making ‘don’t know’ or the lowest possible JOLs in the unspeeded New condition--which may best reflect a relatively pure cue-recognition process in which the lower ‘no’ boundary is reached--were 1.16 s in Experiment 1 and 1.01 s in Experiment 2. The latencies are about right for this first process to be a cue recognition process.
This cue-recognition stage of processing accounts, in a natural way, for the fast ‘don’t know’ JOL responses seen in Son and Metcalfe’s (2005) data. We assume, in the model shown in Figure 10, that the recognition process will, normally, run to completion and the process will either result in a fast ‘don’t know’ judgment, or lead to stage 2, in which target retrieval is attempted. What about in our own deadline paradigm experiment, presented here, in which processing was truncated in the speeded conditions? It is straightforward to see that if the participant in the experiment is forced to give a very fast JOL—supposedly not to exceed .75 s—then the first stage of the JOL’s normal processing may not always run to completion. We assume that under these conditions, the person assesses the state of the recognition random walk itself at the time of the deadline. If they did that, then the cues that were more familiar would, on average, at time of the deadline, have shown greater drift toward the positive boundary than would the cues with less familiarity. This would result in JOLs that would be sensitive to the familiarity manipulation alone, as was shown in the experiments presented here.
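The deadline assumption can be illustrated with a small simulation in which the walk is simply read out at the deadline. This is a sketch under stated assumptions only: the function names, the step-up probabilities standing in for more and less familiar cues, and the eight-step deadline are hypothetical, and the sketch assumes that neither boundary is reached before the deadline.

```python
import random


def evidence_at_deadline(p_up, deadline_steps, rng):
    """Position of the recognition walk when the response deadline arrives."""
    position = 0
    for _ in range(deadline_steps):
        position += 1 if rng.random() < p_up else -1
    return position


def mean_evidence(p_up, deadline_steps=8, n_trials=2000, seed=0):
    """Average read-out over many simulated cues of the same familiarity."""
    rng = random.Random(seed)
    total = sum(evidence_at_deadline(p_up, deadline_steps, rng)
                for _ in range(n_trials))
    return total / n_trials


familiar = mean_evidence(p_up=0.6)     # e.g., cues shown for a long time
unfamiliar = mean_evidence(p_up=0.4)   # e.g., cues shown only briefly
print(familiar > unfamiliar)  # True
```

If the speeded JOL is some increasing function of this read-out, more familiar cues would receive higher speeded JOLs on average, with no contribution from target retrieval.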
The second stage of processing in the model is an attempt at target retrieval. This stage is predicated on successful recognition of the cue. Once the cue is recognized it is used to attempt to retrieve the target. How long will the person persist with this retrieval attempt, and how does the time to retrieve the target relate to the person’s JOL? We propose that the dynamic JOL values themselves are instrumental in determining how long the person will attempt retrieval before giving up and giving the metacognitive judgment that they do not know. As is shown in Figure 10, following successful cue recognition, the person starts the retrieval process with a very high setting on the JOL counter. This counter will remain high if the retrieval process is successful nearly immediately--resulting in fast high JOLs. Since the attempt at retrieval will take some time after successful recognition, these fast high JOLs might be slightly slower than the fast ‘don’t know’ JOLs. (Note, though, that the time to reach the ‘no’ boundary, giving rise to ‘don’t know’ JOLs, is often slower than the time to reach the ‘yes’ boundary, which would trigger the second stage of JOL processing. Accordingly, some fast ‘know’ responses, even though two processing stages are recruited, might be faster than some fast ‘don’t know’ responses.) This overall result was shown in both Experiments 1 and 2, in which the RTs for the ‘don’t know’ judgments in the New condition were the same as or slightly faster than those for the fast high ‘know’ judgments given in the conditions in which the cues and targets had been presented.
According to the model, whenever target retrieval is successful--no matter how long it takes--a memory strengthening process should be enacted. If retrieval is not successful on the first attempt, the retrieval attempts will continue, taking time, of course, with each try. On each successive attempt, the JOL counter decreases. (The model is neutral as to the exact nature of these attempts at retrieval, and to our knowledge there are no data on what happens during the time it takes someone to recall. Perhaps more and more features are retrieved in succession, eventually resulting in an interpretable item, or perhaps different memory ‘images’ or ‘echoes’ are successively retrieved as a whole, through different epochs of retrieval attempts. But however it occurs, we assume that there is a counter that is decrementing the JOL value as the process takes more and more time.) If retrieval is successful, at any point in this process, then the current JOL--whatever it is--will be given as the output. This loop results in decreases in JOLs with increases in retrieval time and maps well onto the findings, not only of Son and Metcalfe (2005) but also of Benjamin, Bjork, and Schwartz (1998). They showed that decreases in retrieval fluency, as indicated by increased retrieval times, resulted in lower and lower JOLs.
The stop rule, in this iterative retrieval process, is a predetermined value of the JOL (presumably, for most participants, the lowest JOL value which indicates that they do not know). So long as the JOL is above that lower criterion that the person has set as the value at which they say that their JOL is so low that they definitely do not know the item, they will continue to attempt to retrieve. Once the JOL becomes too low, that is, it hits the lower JOL criterion, no further retrieval attempts ensue and the model exits the cycle with a low slow JOL.
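The retrieval loop and its stop rule can be written out as a short procedure. The sketch below is one possible reading of the verbal model, not a fitted or published implementation: the function name, the four-point JOL scale, the one-point decrement per failed attempt, and the representation of retrievability as the attempt number on which retrieval would succeed are all assumptions made for illustration.

```python
def stage_two_jol(successful_attempt, highest_jol=4, stop_jol=1):
    """Return (jol, attempts_made, retrieved) for one recognized cue.

    successful_attempt: attempt number on which retrieval would succeed,
    or None if the target can never be retrieved (hypothetical encoding).
    """
    jol = highest_jol        # the counter starts at its highest setting
    attempts = 0
    while jol > stop_jol:    # stop rule: quit once the lower criterion is hit
        attempts += 1
        if successful_attempt is not None and attempts >= successful_attempt:
            return jol, attempts, True    # output whatever the JOL is now
        jol -= 1             # each failed attempt lowers the counter
    return stop_jol, attempts, False      # slow, low JOL after retrieval failure


for k in (1, 2, 3, None):
    print(k, stage_two_jol(k))
# 1 (4, 1, True)
# 2 (3, 2, True)
# 3 (2, 3, True)
# None (1, 3, False)
```

In this sketch, immediate retrieval yields a fast high JOL, later retrieval yields progressively lower and slower JOLs, and retrieval failure ends in a slow low JOL, in contrast to the fast low JOL that follows a failure to recognize the cue.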
What about the frequency distributions of delayed JOLs? As was shown by Kelemen and Weaver (1997), the frequency distribution of delayed JOLs over the range of possible JOL ratings is bimodal (and different from that of immediate JOLs, which is unimodal and centered in the midrange, but relatively flat). There is a large preponderance of very low and very high JOLs, but few observations in the midrange. Notice that in the frequency distribution data that we presented with Experiments 1 and 2, in Figures 4 and 7, the overall data are also bimodal. However, in our data they are bimodal in an analyzable way--the lowest JOLs are selectively attributable to the ‘new’ cues. The highest JOLs are attributable to presented materials that people subsequently recall with a very high probability. This bimodality, seen in delayed JOL data, falls out of the proposed model in a natural way. Many fast, low JOLs result simply because the participant fails to recognize the cue. If they do recognize the cue, however, they will then be automatically set to give the highest JOLs for those items that are retrieved. Insofar as most recall is fast, and only a few straggler items will be retrieved slowly, most of the retrievable items are likely to meet with success quickly, and be assigned high JOLs. But there will be a few stragglers. It is these that are expected to be produced increasingly slowly and with decreasing JOL values. Thus, the model makes the prediction, consistent with the quadratic RT functions of Son and Metcalfe (2005) and the data presented here, that the slow judgments should be those that are neither very high nor very low, but rather in the middle.
In summary, then, these experiments provide evidence that there are two successive processes that underlie delayed judgments of learning. The first process is recognition of the cue, and it occurs quickly. This process accounts for the observed very fast RTs given to some ‘don’t know’ responses. The third experiment showed that when the JOLs were made under a deadline procedure these fast JOLs were responsive only to variations in the familiarity of the cue, as would be expected if they were based on cue recognition. The relative accuracy of these fast JOLs is above chance: if the person does not recognize the cue they have virtually no chance of recalling the target, and this alone produces above-chance JOL-to-recall gammas. However, there is no discrimination among the cues that are recognized, so more fine-grained predictions about future recall performance are not possible from this first stage. The attempt at retrieval, as is postulated to occur in the second stage, should increase the JOL relative accuracy further. This stage indicates whether the recall process is successful or not. If it is, then presumably, it is likely to also be successful later, and hence the results of this second stage are highly diagnostic (and much more diagnostic than the results of the first stage alone, especially for the targets for which the cue is recognized) of whether the target item will be retrievable later. Thus, the second, attempted-retrieval stage results in higher relative accuracy than the first stage alone. This prediction was confirmed in the present experiments, and has previously been observed by Benjamin (2005). The second stage, which is an attempt at retrieval of the target, is sensitive to experimental variations in target retrievability, as was shown here in all three experiments.
All three experiments confirmed the prediction that memory enhancement should obtain primarily with slow JOLs--that presumably entail retrieval of the target, and not with fast JOLs which are less likely to entail target retrieval. This simple dual-process model, then, can account for these findings in the delayed JOL paradigm, and provides a foundation for further understanding of how people make such metacognitive judgments.
Decades ago, Kolers and Palef (1977) raised the question of ‘knowing not’: How could people know that they do not know? Furthermore, how could they know that they do not know quickly? At that time in the history of psychology, search models of memory retrieval were popular, though what puzzled Kolers and Palef (1977) may apply even without recourse to a search metaphor. Should the person not have to laboriously exhaust all of their memory/knowledge store, coming up with nothing, to reach the conclusion that the desired information is not there? And should that process, which allows the conclusion that they do not know, not take a long time? How could it be possible that a person could answer that they did not know very quickly--even more quickly, sometimes, than that they knew something? To use an analogy based on a search metaphor of memory, if a person is asked to say whether she knows where she left her iPhone, should she not have to search until she either finds it (to say she knows) or searches a long time, and perhaps exhaustively, and eventually gives up (to say that she does not know)? It should take less time to find than not find, since at the time the iPhone is found, there are still a large (maybe infinite) number of places where she could still look if it had not yet been found. Each will take some time to explore. By this rationale, knowing not should be a long and tedious process. And yet people are often quick to say they don’t know. The answer to this dilemma, given substance in the results of the present paper, is that there is another process that precedes the search. To revert to the analogy, she asks herself: “Hmmm, iPhone?” And if the answer is, “I don’t have an iPhone,” she gives a quick ‘don’t know’ response and does not search at all.
Acknowledgments
This research was supported by NIMH grant R01 MH60637. We thank Lisa Son and the MetaLab for their help and comments.
References
- Ashby FG. A stochastic version of general recognition theory. Journal of Mathematical Psychology. 2000;44:310–329. doi: 10.1006/jmps.1998.1249. [DOI] [PubMed] [Google Scholar]
- Begg I, Duft S, Lalonde P, Melnick R, Sanvito J. Memory predictions are based on ease of processing. Journal of Memory and Language. 1989;28:610–632. [Google Scholar]
- Benjamin AS. Recognition memory and introspective remember/know judgments: Evidence for the influence of distractor plausibility on “remembering” and a caution about purportedly nonparametric measures. Memory & Cognition. 2005;33:261–269. doi: 10.3758/bf03195315. [DOI] [PubMed] [Google Scholar]
- Benjamin AS, Bjork RA. Retrieval fluency as a metacognitive index. In: Reder LM, editor. Implicit memory and metacognition: The 27th Carnegie Symposium on Cognition. Erlbaum; Hillsdale, NJ: 1996. pp. 309–338. [Google Scholar]
- Benjamin AS, Bjork RA, Schwartz BL. The mismeasure of memory: When retrieval fluency is misleading as a metamnemonic index. Journal of Experimental Psychology: General. 1998;127:55–68. doi: 10.1037//0096-3445.127.1.55. [DOI] [PubMed] [Google Scholar]
- Butler AC, Roediger HL. Testing improves long-term retention in a simulated classroom setting. European Journal of Cognitive Psychology. 2007;19:514–527. [Google Scholar]
- Dunlosky J, Nelson TO. Importance of the kind of cue for judgment for learning (JOL) and the delayed JOL effect. Memory & Cognition. 1992;20:374–380. doi: 10.3758/bf03210921. [DOI] [PubMed] [Google Scholar]
- Dunlosky J, Nelson TO. Similarity between the cue for judgments of learning (JOL) and the cue for test is not the primary determinant of JOL accuracy. Journal of Memory and Language. 1997;36:34–49. [Google Scholar]
- Dunlosky J, Rawson KA, Middleton E. What constrains the accuracy of metacomprehension judgments? Testing the transfer-appropriate-monitoring and accessibility hypotheses. Journal of Memory and Language. 2005;52:551–565. [Google Scholar]
- Finn B. (in press). Framing effects on metacognitive monitoring and control Memory & Cognition [DOI] [PMC free article] [PubMed]
- Finn B, Metcalfe J. The role of memory for past test in the underconfidence with practice effect. Journal of Experimental Psychology: Learning Memory and Cognition. 2007;33:238–244. doi: 10.1037/0278-7393.33.1.238. [DOI] [PubMed] [Google Scholar]
- Finn B, Metcalfe J. Judgments of Learning are influenced by Memory for Past Test. Journal of Memory and Language. 2008;58:19–34. doi: 10.1016/j.jml.2007.03.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Glenberg AM, Sanocki T, Epstein W, Morris C. Enhancing calibration of comprehension. Journal of Experimental Psychology: General. 1987;116:119–136. [Google Scholar]
- Karpicke JD, Roediger HL. The critical importance of retrieval for learning. Science. 2008;319:966–968. doi: 10.1126/science.1152408. [DOI] [PubMed] [Google Scholar]
- Kelemen WL, Weaver CA. Enhanced metamemory at delays: why do judgments of learning improve over time. Journal of Experimental Psychology: Learning, Memory and Cognition. 1997;23:1394–1409. doi: 10.1037//0278-7393.23.6.1394. [DOI] [PubMed] [Google Scholar]
- Kimball DR, Metcalfe J. Delaying judgments of learning affects memory, not metamemory. Memory & Cognition. 2003;31:918–929. doi: 10.3758/bf03196445. [DOI] [PubMed] [Google Scholar]
- Kolers PA, Palef SR. Knowing not. Memory & Cognition. l977;5:553–558. doi: 10.3758/BF03213218. [DOI] [PubMed] [Google Scholar]
- Koriat A. Monitoring one’s knowledge during study: A cue-utilization approach to judgments of learning. Journal of Experimental Psychology: General. 1997;126:349–370.
- Koriat A, Bjork RA. Illusions of competence in monitoring one’s knowledge during study. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2005;31:187–194. doi: 10.1037/0278-7393.31.2.187.
- Luce RD. Response times. Oxford; New York: 1986.
- McDaniel MA, Fisher RP. Tests and test feedback as learning sources. Contemporary Educational Psychology. 1991;16:192–201.
- McDaniel MA, Kowitz MD, Dunay PK. Altering memory through recall: The effects of cue-guided retrieval processing. Memory & Cognition. 1989;17:423–434. doi: 10.3758/bf03202614.
- McDaniel MA, Masson MEJ. Altering memory representations through retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1985;11:371–385.
- Metcalfe J, Finn B. Evidence that judgments of learning are causally related to study choice. Psychonomic Bulletin and Review. 2008;15:174–179. doi: 10.3758/pbr.15.1.174.
- Nelson TO, Dunlosky J. When people’s judgments of learning (JOL) are extremely accurate at predicting subsequent recall: The delayed-JOL effect. Psychological Science. 1991;2:267–270.
- Nelson TO, Dunlosky J. How shall we explain the delayed-judgment-of-learning effect? Psychological Science. 1992;3:317–318.
- Nelson TO, Dunlosky J, Graf A, Narens L. Utilization of metacognitive judgments in the allocation of study during multi-trial learning. Psychological Science. 1994;5:207–213.
- Nelson TO, Narens L, Dunlosky J. A revised methodology for research on metamemory: Pre-judgment recall and monitoring (PRAM). Psychological Methods. 2004;9:53–69. doi: 10.1037/1082-989X.9.1.53.
- Pashler H, Cepeda NJ, Wixted JT, Rohrer D. When does feedback facilitate learning of words? Journal of Experimental Psychology: Learning, Memory, and Cognition. 2005;31:3–8. doi: 10.1037/0278-7393.31.1.3.
- Pashler H, Zarow G, Triplett B. Is temporal spacing of tests helpful even when it inflates error rates? Journal of Experimental Psychology: Learning, Memory, and Cognition. 2003;29:1051–1057. doi: 10.1037/0278-7393.29.6.1051.
- Ratcliff R. A theory of memory retrieval. Psychological Review. 1978;85:59–108.
- Ratcliff R, Van Zandt T, McKoon G. Connectionist and diffusion models of reaction time. Psychological Review. 1999;106:261–300. doi: 10.1037/0033-295x.106.2.261.
- Reber PJ, Alvarez P, Squire LR. Reaction time distributions across normal forgetting: Searching for markers of memory consolidation. Learning and Memory. 1997;4:284–290. doi: 10.1101/lm.4.3.284.
- Roediger HL, Karpicke JD. Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science. 2006;17:249–255. doi: 10.1111/j.1467-9280.2006.01693.x.
- Roediger HL, Weldon MS, Challis BH. Explaining dissociations between implicit and explicit measures of retention: A processing account. In: Roediger HL, Craik FIM, editors. Varieties of memory and consciousness: Essays in honour of Endel Tulving. Erlbaum; Hillsdale, NJ: 1989. pp. 3–39.
- Son LK, Metcalfe J. Judgments of learning: Evidence for a two-stage model. Memory & Cognition. 2005;33:1116–1129. doi: 10.3758/bf03193217.
- Spellman BA, Bjork RA. When predictions create reality: Judgments of learning may alter what they are intended to assess. Psychological Science. 1992;3:315–316.
- Weaver CA, III, Kelemen WL. Processing similarity does not improve metamemory: Evidence against transfer-appropriate monitoring. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2003;29:1058–1065. doi: 10.1037/0278-7393.29.6.1058.
- Whitten WB, Bjork RA. Learning from tests: Effects of spacing. Journal of Verbal Learning and Verbal Behavior. 1977;16:465–478.