PLOS ONE. 2020 Oct 23;15(10):e0240858. doi: 10.1371/journal.pone.0240858

The effect of metacognitive training on confidence and strategic reminder setting

Nicole C Engeler 1,2,*, Sam J Gilbert 1
Editor: Veronica Yan
PMCID: PMC7584199  PMID: 33095817

Abstract

Individuals often choose between remembering information using their own memory ability versus using external resources to reduce cognitive demand (i.e. ‘cognitive offloading’). For example, to remember a future appointment an individual could choose to set a smartphone reminder or depend on their unaided memory ability. Previous studies investigating strategic reminder setting found that participants set more reminders than would be optimal, and this bias towards reminder-setting was predicted by metacognitive underconfidence in unaided memory ability. Due to the link between underconfidence in memory ability and excessive reminder setting, the aim of the current study was to investigate whether metacognitive training is an effective intervention to a) improve metacognitive judgment accuracy, and b) reduce bias in strategic offloading behaviour. Participants either received metacognitive training which involved making performance predictions and receiving feedback on judgment accuracy, or were part of a control group. As predicted, metacognitive training increased judgment accuracy: participants in the control group were significantly underconfident in their memory ability, whereas the experimental group showed no significant metacognitive bias. However, contrary to predictions, both experimental and control groups were significantly biased toward reminder-setting, and did not differ significantly. Therefore, reducing metacognitive bias was not sufficient to eliminate the bias towards reminders. We suggest that the reminder bias likely results in part from erroneous metacognitive evaluations, but that other factors such as a preference to avoid cognitive effort may also be relevant. Finding interventions to mitigate these factors could improve the adaptive use of external resources.

Introduction

In our daily lives we must frequently remember to execute intentions such as buying ingredients for a meal or attending future appointments. However, our unaided memory abilities are limited [1]. Hence, we frequently choose to enhance our memory for delayed intentions with external tools, for instance by taking notes or by setting reminders in our smartphones [2, 3]. Reducing the cognitive demands of a task in this way has been termed cognitive offloading, and creating external triggers in order to remember delayed intentions is known as intention offloading [4]. With the development of technologies such as smartphones and other smart devices, the use of technology to lessen an individual’s memory load has become increasingly ingrained in the completion of everyday tasks [5–7]. However, using reminders involves both costs (e.g. the time and effort of creating them) and benefits (e.g. increased likelihood of remembering), and individuals often need to evaluate whether it would be more beneficial for them to create a reminder or not. Previous studies have suggested that an accurate estimation of our unaided memory abilities may be necessary to make optimal choices in reminder-setting [8–12]. Therefore, this study examines an intervention designed to a) improve the accuracy of participants’ self-judgments, and b) reduce bias in offloading decisions.

To study cognitive offloading in memory, Gilbert [9] developed an intention offloading task in which participants dragged numbered circles to the bottom of a box in sequential order. During this ongoing task, participants also completed delayed intentions in which they were required to drag target circles to alternative locations (left, right or top) of the box. Dragging a sequence of numbered circles out of the box completed a ‘trial’. As an alternative strategy to relying on their own memory and mentally rehearsing delayed intentions, participants could ‘offload’ the need to remember by dragging target circles to the alternative locations at the beginning of each trial, before they were reached in the sequence. The target circles could then act as reminders, akin to someone leaving an object by the front door so that they remember to take it when leaving the house the next day. This study found that setting external reminders (i.e. offloading) improved performance. Additionally, participants set these reminders adaptively, based on the internal cognitive demands of the task (i.e. the number of items to remember and the presence of distractions). These findings suggest that individuals decide whether to set reminders based on a metacognitive evaluation of the difficulty of the task. In line with this, findings from multiple studies point toward metacognitive judgments as a key factor in decisions about offloading. Using the same paradigm as detailed above, Gilbert [10] showed that choosing to set external reminders was predicted by participants’ metacognitive confidence, as people with lower confidence in their memory ability used more reminders, even when that confidence was unrelated to objective performance. The relationship between confidence and intention offloading was replicated by Boldt and Gilbert [8], both when reminder-setting was instructed and when it was spontaneously generated. Similarly, in another study involving the recall of word pairs, lower confidence in memory ability was linked to more frequent requests for hints, even when performance was controlled for [12]. In line with this, a survey study showed a negative correlation between self-reported internal memory ability and the use of memory offloading [13]. These results suggest that decisions of whether to set reminders are influenced by potentially erroneous metacognitive evaluations of internal memory abilities.

To investigate whether participants weigh costs and benefits of reminder-setting optimally or whether they show systematic bias, Gilbert et al. [11] adapted the paradigm by Gilbert [9]. Participants could either earn a maximum reward (10 points) for correctly remembered target circles when using their own memory, or earn a lesser reward (between 1–9 points) when setting reminders to increase the number of circles remembered. All experiments found a bias toward the use of reminders, predicted by participants’ inaccurate metacognitive underconfidence in their own internal memory abilities. Furthermore, metacognitive interventions have been shown to influence the reminder bias. In an experiment where one group of participants received metacognitive advice about whether they would be likely to score more points using their own memory or reminders, the reminder bias was eliminated [11, Experiment 2]. Another experiment [11, Experiment 3] manipulated feedback valence (positive or negative) and the difficulty of practice trials (easy or hard). This produced one group (difficult practice, negative feedback) which was significantly underconfident, and another group (easy practice, positive feedback) that was significantly overconfident. Both groups used reminders significantly more often than would have been optimal. Seeing as a bias towards reminders can be observed both in the context of under- and over-confidence, it seems likely that metacognitive bias can partially, but not fully, explain the reminder bias.

Metacognitive bias does not occur in the same way for all tasks, populations and individuals: some experiments find that people are overconfident in their memory ability, but other experiments find that people are underconfident [10, 14–18]. Therefore, we aim to create a metacognitive intervention which can potentially remedy biases in either direction. Asking participants to make predictions about their own performance, and providing feedback on the accuracy of those predictions, may be a suitable method of “training” metacognitive accuracy. Indeed, multiple studies examining the role of feedback on participants’ judgments have found a reduction in metacognitive bias and improvements in judgment accuracy [19–21].

Hence, the primary aim of this study is to investigate whether metacognitive training, i.e. providing participants with feedback on their metacognitive judgments, is an effective intervention to a) improve participants’ metacognitive judgment accuracy, and b) reduce bias in offloading behaviour. We predict that participants in the experimental condition will make more accurate judgments than participants in a control group. Moreover, we predict that an improvement in metacognitive accuracy may result in more optimal strategy choices, as measured by the reminder bias. As individuals rely on offloading to organise their behaviour, but do not always offload optimally, finding interventions to influence individuals’ offloading strategies could improve behavioural organisation in everyday life.

Methods

To view a demonstration of the experimental task, please visit [http://samgilbert.net/demos/NE1/start.html]. This demonstration version omits the information and consent pages, informs the visitor at the beginning whether they have been randomised to the feedback or no-feedback control condition, and does not record any participant data. Other than this, the demonstration version of the task is identical to the one undertaken by the actual participants in this study.

Design

The present study adapted the paradigm developed by Gilbert et al. [11] to investigate whether metacognitive training has an impact on a) metacognitive judgment accuracy, and b) strategic reminder setting. During ‘metacognitive training’ participants were asked to make pre-trial predictions about their performance, and then received feedback on their judgment accuracy post-trial. Using a between-subjects design, participants were randomly allocated to either an experimental condition with metacognitive feedback training or a control group without. The experimental manipulation occurred during ‘forced trials’ only (described below), with all else being identical between groups. Before commencing data collection, all hypotheses, experimental procedures, and analysis plans were pre-registered [https://osf.io/ebp4z/].

Optimal reminders task

See Fig 1 for a schematic illustration of the task. Participants were presented with six yellow circles randomly positioned within a box on their device screen. Each circle contained a number, and participants were asked to drag the circles sequentially (1, 2, 3, etc.) to the bottom of the box. When a circle was dragged out of the box, a new circle appeared in its vacated location, continuing the number sequence (i.e. if 1–6 were on screen, 7 would appear in the location of 1 after it was dragged out). Each trial contained 25 circles presented in sequence. Sometimes, new circles first appeared in another colour (blue, orange or pink) rather than yellow, but after 2 seconds those circles faded to yellow as well. These were target circles, and each colour corresponded to one of the alternative sides of the box (left, top and right). Hence, a circle initially appearing in an alternative colour meant that the circle should eventually be dragged to its corresponding side of the box once reached in the sequence. For instance, if after dragging number 1 to the bottom, 7 initially appeared in blue, the participant had to drag 2–6 to the bottom before dragging 7 to the left (blue) side. When a target circle was dragged to the correct side of the box it turned green before disappearing. Circles wrongly dragged to an alternative side of the box turned red before disappearing.

Fig 1. Schematic illustration of the optimal reminders task.

Fig 1

To remember to drag initially non-yellow circles to alternative locations when reached in the sequence, participants had to form a delayed intention. There were two strategies for remembering these intentions: participants could either rely on their internal (unaided) memory or create an external reminder. To create external reminders, target circles had to be dragged near the instructed alternative location as soon as the circle appeared on screen (because circles quickly faded back to yellow). Once the target circle was reached in the sequence, its location would remind participants of their intention. There were 10 target circles per trial, which always appeared between positions 7 and 25, distributed as evenly as possible. Seeing as participants had to remember multiple intentions at once, it was unlikely that they would remember all of them if they relied on their internal memory. However, the task was easier when external reminders were used.

In the optimal reminders paradigm participants completed trials in which they were forced to use their internal (unaided) memory, as well as trials in which they had to set external reminders. The paradigm also contained choice trials in which participants had to decide between receiving the maximum reward for each remembered target circle when using their own memory, or receiving a smaller reward when using reminders. Correct target circles were always worth 10 points when the internal strategy was chosen, and between 1–9 points when the external strategy was chosen. Assigned point values remained constant for an entire trial. During the nine choice trials, the reminder values (1–9) were presented in a randomised order. Participants were instructed to choose the strategy with which they believed they could score the most points. To do so, participants had to consider both the number of points they would receive per correct circle and the number of circles they thought they were likely to remember correctly with each strategy. Additionally, including forced internal and external trials allowed us to determine each participant’s optimal indifference point, i.e. the point at which they should be indifferent between the two strategies, based on their accuracy during the forced trials. This indifference point could then be compared to an individual’s actual indifference point as seen from the decisions made during choice trials. The difference between a participant’s optimal and actual indifference point is the reminder bias (see data analysis section).
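To make the expected-value reasoning behind a choice trial concrete, the following sketch works through one hypothetical example; the accuracies and reminder value are illustrative assumptions, not data from this study.

```r
# Hypothetical choice trial: expected points per target circle under each strategy.
internal_accuracy <- 0.50  # assumed proportion correct with unaided memory
external_accuracy <- 0.95  # assumed proportion correct with reminders
reminder_value    <- 6     # points per correct target when reminders are used

expected_internal <- 10 * internal_accuracy              # 5.00 points per target
expected_external <- reminder_value * external_accuracy  # 5.70 points per target

# Under these assumptions, an unbiased participant should choose reminders on
# this trial, because the external strategy yields more expected points.
expected_external > expected_internal  # TRUE
```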

Participants

As specified in our preregistration, we aimed for a final sample size of 116. Our power calculation was based on an experiment which, like the present study, tried to influence metacognitive judgments and strategy choices using Gilbert et al.’s [11] paradigm. In their experiment a group of participants received metacognitive advice about which strategy to use before choosing a strategy option [11, Experiment 2]. The reminder bias was eliminated in this group, and it was significantly reduced in comparison with a control group who did not receive advice (Cohen’s d = .55). Assuming that the influence of metacognitive feedback may be comparable to the influence of metacognitive advice, and based on a desired power of 90%, this yielded a required sample of 116 (58 participants in each group) to conduct the between-subject comparisons (G*Power 3.1). A total of 133 participants were tested to reach the planned sample size of 116, after applying the preregistered exclusion criteria. These criteria were designed to ensure that included participants engaged with the task as intended. Participants were excluded for a) having higher accuracy for forced internal (own memory) than forced external (with reminders) trials (n = 4), b) lower than 70% accuracy during external trials (n = 6), c) a negative correlation between target value and likelihood of choosing to set reminders, which suggests random or counter-rational strategy choices (n = 4) and d) a metacognitive bias score more than 2.5 standard deviations from the group mean (n = 3). The final sample had a mean self-reported age of 37 (SD = 11.13, range: 20–71), with 64 females, 41 males and one other. All participants provided informed consent before participating and the research was approved by the UCL Research Ethics Committee.
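For readers who wish to check the sample-size calculation, the sketch below reproduces it approximately using the R package 'pwr'; the study itself used G*Power 3.1, so this is an illustrative equivalent rather than the authors' actual procedure.

```r
# Approximate reproduction of the sample-size calculation (study used G*Power 3.1).
library(pwr)

# Independent-samples t-test, one-tailed, d = 0.55, alpha = .05, power = .90
pwr.t.test(d = 0.55, sig.level = 0.05, power = 0.90,
           type = "two.sample", alternative = "greater")
# Rounding the result up gives 58 participants per group, i.e. 116 in total.
```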

Procedure

Participants were recruited from the Amazon Mechanical Turk website and completed the experiment on their computer, accessing it via a provided weblink. Participation was restricted to individuals with a minimum of 90% Mechanical Turk approval rate, and to those reporting a location in the U.S. in order to reduce heterogeneity and to remain consistent with previous studies. The median duration to complete the experiment was 31 minutes. Participants were paid $7.50 for taking part.

See Fig 2 for a visualization of the task procedure. Participants first performed a practice session which required them to respond accurately to a target circle in order to proceed. This ensured that they understood the task instructions properly. Following this, participants in both groups completed 5 forced internal and 4 forced external trials in alternating order, beginning and ending with an internal trial. This served two purposes. First, accuracies in the two conditions could be used to calculate the optimal indifference point, i.e. the number of points offered for each target when using reminders at which an unbiased individual would be indifferent between the two strategies (see data analysis section). Second, this provided the opportunity for the metacognitive training group to give performance predictions and to receive feedback on those predictions. The reason for including more internal than external trials was so that the number of trials in this phase was matched to the subsequent choice phase. Furthermore, it was of more interest to train participants’ metacognitive accuracy during internal trials than external trials, as the reminder bias has been previously linked to erroneous underconfidence in unaided internal memory ability [11].

Prior to each trial, participants in both groups were informed of the number of points they had scored so far and were told which strategy they had to use in the upcoming trial. Participants in the experimental feedback group were additionally instructed to provide performance predictions before they began each trial. For this, participants had to use a moveable slider on their screen to indicate what percentage of target circles (0%–100%) they thought they would be able to correctly drag to the instructed side of the square during the next trial. After each trial, participants in the experimental group received feedback about their judgment accuracy: they were reminded of their predicted accuracy, informed of their actual accuracy, and told whether they had underestimated, overestimated or accurately estimated their memory ability. Participants in the control group did not make performance predictions or receive feedback.

After completing the forced internal and forced external trials, participants in both groups were asked to make metacognitive evaluations of their accuracy in the internal and external conditions: “Please use the scale below to indicate what percentage of the special circles you will correctly drag to the instructed side of the square, on average. 100% would mean that you always get every single one correct. 0% would mean that you can never get any of them correct”. For both global judgments, participants used a moveable slider to select any value between 0–100%. This allowed us to investigate whether metacognitive training improved the subsequent accuracy of metacognitive predictions.

Next, participants received instructions about choice trials and subsequently completed these. Before each trial participants were informed of the total number of points they had scored so far. Finally, participants in both groups were again asked to judge their internal memory ability, as well as their ability whilst using reminders, on a 0–100% scale, with the following wording: “You have now finished doing the task. But we would like you to make some more predictions. Suppose you had to do the task again using your own memory [or “using reminders”, depending on which judgement]. What percentage of the special circles do you think you would be able to correctly drag to the instructed side of the square, on average? 100% would mean that you always get every single one correct. 0% would mean that you could never get any of them correct. Please remember that you should just answer about your ability to do the task with your own memory [or: “with reminders”, depending on which condition].” This allowed us to examine whether any effect of metacognitive training on metacognitive evaluations was maintained at the end of the experiment.

Fig 2. Summary of the task procedure.

Fig 2

Data analysis

Data were analysed using R version 4.0.0 and RStudio version 1.3.959. T-tests did not assume equal variances and degrees of freedom were adjusted accordingly. Data and code to reproduce the analyses below can be downloaded from [https://osf.io/ebp4z/]. All statistical analyses were conducted as outlined in our preregistered plan. Measures and calculations are based on those by Gilbert et al. [11] and are described below; an illustrative sketch of how they can be computed follows the list:

1. Optimal Indifference Point (OIP): This is the reminder value (1–9) at which an unbiased individual should be indifferent between the internal and the external strategy option, based on their mean target accuracy (i.e. the mean number of correct target circles) on forced internal trials (ACC_FI), and the mean target accuracy on forced external trials (ACC_FE). The optimal indifference point is calculated as: OIP = (10 × ACC_FI) / ACC_FE.

2. Actual Indifference Point (AIP): This is the point at which participants are actually indifferent between the two strategy choices and are equally likely to choose either option. This was calculated by fitting a sigmoid curve to the strategy choices across the 9 reminder target values (1–9), using the R package ‘quickpsy’, with the fitted indifference point bounded to the range 1–9.

3. Reminder Bias: The reminder bias is the difference between the optimal and the actual indifference point (OIP–AIP). If participants were unbiased between the two strategy options, the actual and optimal indifference points would match. A positive value indicates that a participant is biased toward using more reminders than is optimal.

4. Metacognitive Judgments: Participants gave four global metacognitive judgments: two internal and two external judgment responses, given before the experimental choice trials began and after all trials were completed.

5. Metacognitive Bias: The metacognitive bias score indicates the difference between participants’ metacognitive judgements and their objective accuracy levels (i.e. the mean accuracy for the forced internal and forced external trials). A positive value indicates overconfidence, and a negative value indicates underconfidence.
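To make these measures concrete, the sketch below computes them for a single hypothetical participant. The authors' actual analysis code, which fits the sigmoid with the 'quickpsy' package, is available at the OSF link above; here the fit is approximated with a standard logistic regression via glm, and all data values are invented for illustration.

```r
# Illustrative computation of the measures for one hypothetical participant.

# Mean target accuracy on forced internal and forced external trials
acc_fi <- 0.52   # hypothetical accuracy with unaided memory
acc_fe <- 0.96   # hypothetical accuracy with reminders

# 1. Optimal Indifference Point: reminder value at which expected points match
#    under both strategies, i.e. 10 * acc_fi = OIP * acc_fe
oip <- (10 * acc_fi) / acc_fe

# 2. Actual Indifference Point: fit a sigmoid to the nine choice trials
#    (1 = chose reminders, 0 = chose own memory), take the 50% point,
#    and bound it to the range 1-9.
value          <- 1:9
chose_external <- c(0, 0, 0, 0, 1, 0, 1, 1, 1)         # hypothetical choices
fit <- glm(chose_external ~ value, family = binomial)  # logistic (sigmoid) fit
aip <- unname(-coef(fit)[1] / coef(fit)[2])            # value where P(choose external) = .5
aip <- min(max(aip, 1), 9)

# 3. Reminder bias: positive values indicate a bias towards reminders
reminder_bias <- oip - aip

# 4. Metacognitive judgments are the raw slider ratings themselves.
# 5. Metacognitive bias: judged accuracy minus objective accuracy (in percentage
#    points); negative values indicate underconfidence.
internal_judgment     <- 40                            # hypothetical global judgment (%)
internal_metacog_bias <- internal_judgment - 100 * acc_fi
```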

Results

Accuracy in the forced internal trials (feedback group: M = 56.17%, SD = 16.76; control group: M = 50.24%, SD = 14.11) was lower than accuracy in the forced external trials (feedback group: M = 95.91%, SD = 5.61; control group: M = 96.29%, SD = 4.92).

In order to assess the accuracy of participants’ metacognitive judgments and investigate whether metacognitive training improves judgment accuracy, two metacognitive bias scores (the internal bias and the external bias) were initially calculated by averaging the first and second global judgments. This was done in order to avoid type-1 error (see below for additional analyses investigating whether there was an effect of timepoint). One sample t-tests revealed that participants in the control group were significantly underconfident in both their internal (t(57) = -6.83, p < .0001, d = -.90) and external memory abilities (t(57) = -5.16, p < .0001, d = -.68), whereas participants in the feedback group did not display any significant metacognitive bias (internal bias: t(57) = -.34, p = .74, d = -.045; external bias: t(57) = -1.91, p = .061, d = -0.25). See Fig 3 for a visualisation of these results.

Direct comparisons showed that metacognitive bias was significantly different between the two groups (internal bias: t(111.55) = 4.2, p < .0001, d = .78; external bias: t(103.38) = 3.08, p = .0013, d = .57); note that these p values are for a one-tailed independent samples t test, in accordance with our preregistered plan. Group differences in judgments of both internal and external memory ability were also observed when raw metacognitive judgments rather than metacognitive bias scores were investigated (internal judgment: t(113.58) = 4.63, p < .0001, d = .86; external judgment: t(100.08) = 2.49, p = .0072, d = .46); note that these p values are for one-tailed tests, in accordance with our preregistered plan.

Furthermore, a mixed ANOVA was conducted to evaluate whether metacognitive bias scores differed according to timepoint (first judgment, second judgment) or condition (internal, external), and whether these effects were modulated by group. The test produced a main effect of group on metacognitive bias scores, F(1,114) = 23.74, p < .0001, ηp2 = .17. There was also a main effect of condition (F(1, 114) = 5.43, p = .022, ηp2 = .045), as metacognitive underconfidence was more pronounced in the internal (M = -7.74, SD = 18.91) than the external condition (M = -3.75, SD = 7.89), but there was no significant main effect of timepoint (F(1,114) = .038, p = .85, ηp2 = .00034). These main effects were qualified by a significant group x condition interaction (F(1,114) = 7.61, p = .007, ηp2 = .063) with a larger effect of metacognitive feedback on internal bias (feedback: M = -0.84, SD = 18.94; control: M = -14.63, SD = 16.31) than external bias (feedback: M = -1.58, SD = 6.28; control: M = -5.93, SD = 8.75). There was also a group x time interaction (F(1, 114) = 9.52, p = .003, ηp2 = .077), reflecting a negative shift in bias from time 1 (M = 0.35, SD = 13.8) to time 2 (M = -2.77, SD = 10.2) in the feedback group, but a positive shift from time 1 (M = -12.05, SD = 11.64) to time 2 (M = -8.51, SD = 10.32) in the control group. The three-way interaction was also significant (F(1,114) = 4.69, p = .032, ηp2 = .04); see Table 1 for means and standard deviations.

One-tailed independent samples t-tests showed that group differences in metacognitive bias scores were found regardless of timepoint or condition (internal condition, first judgement: p < .0001, d = .84; internal condition, second judgement: p = .0075, d = .46; external condition, first judgement: p = .0014, d = .57; external condition, second judgement: p = .014, d = .41).
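For readers who want to see how such comparisons can be set up, the sketch below illustrates the group tests and the mixed ANOVA in R. The data frame and column names are hypothetical and this is not the authors' analysis script (which is available at https://osf.io/ebp4z/); note also that the direction of the one-tailed test depends on the ordering of the group factor.

```r
# Assumed long-format data frame 'd' with one row per participant x condition x
# timepoint and (hypothetical) columns: id, group, condition, timepoint, metacog_bias.

# Average the two global judgments per participant and condition
d_avg <- aggregate(metacog_bias ~ id + group + condition, data = d, FUN = mean)

# One-sample t-test of bias against zero (e.g. control group, internal condition)
with(subset(d_avg, group == "control" & condition == "internal"),
     t.test(metacog_bias, mu = 0))

# Welch two-sample comparison of the groups (one-tailed, as preregistered);
# the appropriate 'alternative' depends on the factor level ordering of 'group'.
t.test(metacog_bias ~ group, data = subset(d_avg, condition == "internal"),
       alternative = "greater")

# Mixed ANOVA: between-subjects factor 'group', within-subjects factors
# 'condition' (internal/external) and 'timepoint' (first/second judgment)
library(afex)
aov_ez(id = "id", dv = "metacog_bias", data = d,
       between = "group", within = c("condition", "timepoint"))
```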

Fig 3. Internal and external metacognitive bias scores for the feedback and the control group.

Fig 3

Error bars represent 95% confidence intervals.

Table 1. Means and standard deviations for the three-way interaction of group x condition x time.

              Internal Bias           External Bias
              Time 1     Time 2       Time 1     Time 2
Feedback
  Mean          3.29      -4.98        -2.60      -0.56
  SD           26.47      17.30         5.91       8.31
Control
  Mean        -16.10     -13.20        -8.00      -3.86
  SD           19.11      18.30        12.00       7.76

Turning to the question of whether participants exhibit a reminder bias and whether this bias is influenced by metacognitive training, one-tailed one sample t-tests revealed that participants in both groups were significantly biased towards setting more external reminders than was optimal (feedback group: t(57) = 4.11, p < .0001, d = 0.54; control group: t(57) = 3.43, p = .0006, d = 0.45). See Fig 4 for a visualisation of these results. Contrary to our prediction, the reminder bias was numerically larger for the feedback group (M = 1.64, SD = 3.04) than the control group (M = 1.16, SD = 2.57). Seeing as our pre-registered plan was to investigate any difference in the opposite direction with a one-tailed test, it is not appropriate to conduct any further statistical analysis of the observed effect.

Fig 4. Reminder bias scores for the feedback and the control group.

Fig 4

Error bars represent 95% confidence intervals.

To further investigate whether the reminder bias is related to metacognitive bias, we calculated the Pearson correlation between the reminder bias and a) internal metacognitive bias, and b) external metacognitive bias, for each group separately. For participants in the feedback group there was no significant correlation between metacognitive biases and reminder bias scores (internal bias: r(56) = -0.004, p = .98; external bias: r(56) = .15, p = .26). Although there was no significant correlation between external bias and reminder bias (r(56) = -.095, p = .48) in the control group either, a significant correlation between participants’ internal metacognitive bias and their reminder bias was found (r(56) = -.31, p = .018): the more underconfident participants were, the stronger their bias towards reminders, as has been previously found [11]. This suggests that the relationship between internal metacognitive bias and reminder bias was specific to the no-feedback group. However, we note that a direct comparison between the correlation coefficients in the two groups (based on Fisher’s r-to-z transformation) did not yield a significant effect (z = 1.17, p = .24). Therefore, no strong conclusions about specificity can be drawn. See Fig 5 for scatterplots depicting the relationship between internal metacognitive bias and reminder bias scores in the feedback group and the control group, respectively.

Fig 5. Scatterplot depicting the correlation between internal metacognitive bias and reminder bias scores in the feedback and control groups, with a line of best fit.

Fig 5

Discussion

Enhancing our internal memory ability by using cognitive tools involves both costs and benefits. Previous studies [8–12] have suggested that we do not accurately assess our memory ability and thus do not make optimal decisions about whether or not to use external reminders. Using the optimal reminders paradigm, we investigated whether metacognitive training, i.e. providing participants with feedback on their metacognitive judgments, a) improves metacognitive judgment accuracy, and b) leads to more optimal offloading behaviour, as measured by the reminder bias.

In line with previous studies [11, 12, 22], using reminders improved task performance, demonstrating the benefit of using external tools to aid our memory. Further, participants in the control group were significantly underconfident in both their internal memory ability and their memory ability when setting reminders. However, they underestimated their internal memory ability to a larger degree than their memory ability with reminders. In comparison, participants who received metacognitive feedback displayed neither internal nor external metacognitive bias, as they made more accurate judgments about their memory performance for both strategies. Group differences were significant for both internal and external metacognitive bias. The same was found when raw metacognitive judgments rather than metacognitive bias scores were investigated. Moreover, group differences in metacognitive bias were observed not only immediately after the metacognitive training but also in a final judgement at the end of the experiment. Therefore, the effect of our metacognitive feedback training persisted beyond the initial manipulation phase. All group differences were especially pronounced for participants’ estimations of their internal memory abilities compared to their external memory abilities. This is consistent with the literature, as previous evidence suggests that individuals tend to be underconfident in their own, unaided memory abilities rather than underestimating the helpfulness of external tools [10, 18, 23, 24]. It is perhaps because metacognitive training appears to improve appraisals of memory, and underconfidence is especially pronounced for internal memory abilities, that the difference between groups is larger for the internal than the external bias.

Altogether, the present study shows that metacognitive training, in the form of providing participants with feedback on their metacognitive judgments, is effective in improving metacognitive judgment accuracy and in removing metacognitive bias. Similar effects of metacognitive training have been demonstrated by Carpenter et al. [20], in whose study participants receiving feedback on their metacognitive judgements showed improved metacognitive calibration relative to participants receiving feedback on task performance [see also: 19, 21, 25]. Further studies are needed to disentangle whether the benefit of metacognitive training arose from the act of making predictions, from receiving feedback on those predictions, or whether feedback alone can produce such an effect.

The second line of investigation asked whether improved metacognitive accuracy leads to more optimal reminder setting, i.e. a reduction in reminder bias. Contrary to our predictions, both groups were significantly biased towards using more external reminders than was optimal. Despite improved metacognitive accuracy, the reminder bias was actually numerically larger for the feedback group than the control group. This suggests that the reminder bias cannot be fully explained by metacognitive error, seeing as it can be observed even when metacognitive bias is eliminated. An additional factor that may explain the bias towards reminders is a preference to avoid the cognitive effort associated with use of internal memory [26–28]. According to the ‘minimal memory’ view, people have a general bias to use external information over internal memory representations [26]. This may be because cognitive effort is intrinsically costly [27, 28]. Consistent with this, recent evidence shows that the bias towards reminders is reduced (but not eliminated) when participants receive financial compensation based on the number of points they score, which is hypothesized to increase cognitive effort [29]. Seeing as the participants in the present study received a fixed payment, regardless of their performance, it remains to be seen whether metacognitive interventions might be effective under conditions of performance-based reward. We also note that participants underwent the forced trials before the choice trials in this experiment, unlike previous studies where the two types of trial have generally been intermixed [11, 29]. This could lead to inaccuracy in estimation of the reminder bias, if performance in the choice trials relative to forced trials was increased (e.g. due to practice) or reduced (e.g. due to fatigue). However, seeing as both feedback and control groups underwent the same procedure, this would affect both groups in the same manner. Therefore, this issue does not confound the direct comparisons between groups, which were the main focus of the present study.

Offloading to reduce cognitive effort would be in line with recent evidence on pre-crastination, in that individuals may want to complete a task sooner rather than later to reduce the effort of holding an intention in mind [30]. Another possibility is that participants may have preferred to use reminders in order to reduce variability in their performance, even if this resulted in a worse overall outcome when considering the mean level of performance [11]. Similarly, individuals may have chosen the external strategy to avoid “looking stupid” by making errors [31]. It is also possible that individuals chose a sub-optimal reminder strategy simply because they lack the arithmetic ability to weigh the two strategies properly, although it is not clear why this would cause systematic bias in one direction or the other. Despite the finding that reducing metacognitive bias did not reduce the reminder bias here, the present results do not rule out a metacognitive influence on cognitive offloading in other settings. Although the present experiment yielded a null effect, a previous study did demonstrate an effect of metacognitive interventions on the reminder bias [11, Experiment 3], showing that metacognitive interventions can influence reminder setting at least under certain circumstances. It is likely that any bias towards reminders is influenced by multiple factors, and the influence of factors such as cognitive effort does not rule out the influence of metacognitive factors as well. Indeed, a relationship between metacognitive bias and reminder bias was still observed in the control group of the present study, consistent with earlier findings [8–12].

Taking these results together, we suggest that metacognitive judgements play a role in the decision of whether to set external reminders, but other factors, such as avoidance of cognitive effort, may influence reminder setting too. Other cognitive offloading studies have also proposed that both metacognitive beliefs about expected performance as well as the effort required are critical in deciding whether to use an external resource [32].

To potentially reduce any preference for the avoidance of cognitive effort, future studies could provide a strong incentive, such as performance-based pay, for participants to behave optimally. Another method would be to amend the current study so that participants are merely asked which strategy option they would hypothetically choose in the choice trials, without having to expend any cognitive effort on doing the actual task. This could help to establish the extent to which reminder-setting is influenced by avoidance of cognitive effort versus metacognitive accuracy. Moreover, future work could explore item-by-item reminder selection, as this resembles how individuals offload memory in the real world: rather than offloading an entire block of information, they choose what information to offload on an item-by-item basis.

In conclusion, our feedback intervention was effective in improving participants’ metacognitive judgment accuracy. As this intervention improved judgment accuracy rather than increasing or decreasing confidence in memory abilities, it may be useful across multiple tasks or populations with different directions of bias. However, the extent to which metacognitive accuracy or other factors such as avoidance of cognitive effort influence the reminder bias remains uncertain. In order to promote the effective use of cognitive tools and to find interventions that improve offloading behaviour, the factors influencing reminder-setting choices must be understood further.

Data Availability

All data files and analysis codes will be held in a public repository, available at https://osf.io/ebp4z/.

Funding Statement

Funding for this study was received by SJG from the Economic and Social Research Council, ES/N018621/1, https://esrc.ukri.org. Funders played no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Cowan N. The Magical Mystery Four: How Is Working Memory Capacity Limited, and Why? Curr Dir Psychol Sci. 2010;19(1):51–7. doi: 10.1177/0963721409359277
2. Hall L, Johansson P, de Léon D. Recomposing the will: Distributed motivation and computer-mediated extrospection. In: Clark A, Kiverstein J, Vierkant T, editors. Philosophy of mind series. New York: Oxford University Press; 2013. p. 298–324.
3. Heersmink R. A Taxonomy of Cognitive Artifacts: Function, Information, and Categories. Rev Philos Psychol. 2013;4(3):465–481.
4. Risko E, Gilbert S. Cognitive Offloading. Trends Cogn Sci. 2016;20(9):676–688. doi: 10.1016/j.tics.2016.07.002
5. Dror I, Harnad S. Offloading Cognition onto Cognitive Technology. In: Dror I, Harnad S, editors. Cognition Distributed: How Cognitive Technology Extends Our Minds. Amsterdam: John Benjamins; 2009. p. 1–23.
6. Graus D, Bennett P, White R, Horvitz E. Analyzing and predicting task reminders. In: Proceedings of the 2016 Conference on User Modeling, Adaptation and Personalization (UMAP ’16), Halifax, Nova Scotia, Canada, 13–17 July 2016. New York: ACM. p. 7–15.
7. Svoboda E, Rowe G, Murphy K. From Science to Smartphones: Boosting Memory Function One Press at a Time. JCCC. 2012;2:15–27.
8. Boldt A, Gilbert SJ. Confidence guides spontaneous cognitive offloading. Cogn Res Princ Implic. 2019;4(1):45. doi: 10.1186/s41235-019-0195-y
9. Gilbert SJ. Strategic offloading of delayed intentions into the external environment. Q J Exp Psychol. 2015;68(5):971–992.
10. Gilbert SJ. Strategic use of reminders: influence of both domain-general and task-specific metacognitive confidence, independent of objective memory ability. Conscious Cogn. 2015;33:245–260. doi: 10.1016/j.concog.2015.01.006
11. Gilbert SJ, Bird A, Carpenter JM, Fleming SM, Sachdeva C, Tsai PC. Optimal use of reminders: Metacognition, effort, and cognitive offloading. J Exp Psychol Gen. 2020;149(3):501–517. doi: 10.1037/xge0000652
12. Hu X, Luo L, Fleming S. A role for metamemory in cognitive offloading. Cognition. 2019;193:104012. doi: 10.1016/j.cognition.2019.104012
13. Finley JR, Naaz F, Goh FW. Memory and technology. Cham: Springer International Publishing; 2018.
14. Cauvin S, Moulin C, Souchay C, Schnitzspahn K, Kliegel M. Laboratory vs. naturalistic prospective memory task predictions: young adults are overconfident outside of the laboratory. Memory. 2019;27(5):592–602. doi: 10.1080/09658211.2018.1540703
15. Cherkaoui M, Gilbert SJ. Strategic use of reminders in an ‘intention offloading’ task: Do individuals with autism spectrum conditions compensate for memory difficulties? Neuropsychologia. 2017;97:140–151. doi: 10.1016/j.neuropsychologia.2017.02.008
16. Devolder P, Brigham M, Pressley M. Memory performance awareness in younger and older adults. Psychol Aging. 1990;5(2):291–303. doi: 10.1037//0882-7974.5.2.291
17. Knight R, Harnett M, Titov N. The effects of traumatic brain injury on the predicted and actual performance of a test of prospective remembering. Brain Inj. 2005;19(1):19–27. doi: 10.1080/02699050410001720022
18. Schnitzspahn K, Ihle A, Henry J, Rendell P, Kliegel M. The age-prospective memory-paradox: An exploration of possible mechanisms. Int Psychogeriatr. 2011;23(4):583–592. doi: 10.1017/S1041610210001651
19. Callender A, Franco-Watkins A, Roberts A. Improving metacognition in the classroom through instruction, training, and feedback. Metacogn Learn. 2016;11(2):215–235.
20. Carpenter J, Sherman M, Kievit R, Seth A, Lau H, Fleming S. Domain-General Enhancements of Metacognitive Ability Through Adaptive Training. J Exp Psychol Gen. 2019;148(1):51–64. doi: 10.1037/xge0000505
21. Flannelly L. Using feedback to reduce students’ judgment bias on test questions. J Nurs Educ. 2001;40(1):10–16.
22. Storm B, Stone S. Saving-enhanced memory: the benefits of saving on the learning and remembering of new information. Psychol Sci. 2015;26(2):182–188. doi: 10.1177/0956797614559285
23. Meeks J, Hicks J, Marsh R. Metacognitive awareness of event-based prospective memory. Conscious Cogn. 2007;16(4):997–1004. doi: 10.1016/j.concog.2006.09.005
24. Virgo J, Pillon J, Navarro J, Reynaud E, Osiurak F. Are you sure you’re faster when using a cognitive tool? Am J Psychol. 2017;130(4):493–503.
25. Hirst M, Luckett P, Trotman K. Effects of Feedback and Task Predictability on Task Learning and Judgment Accuracy. Abacus. 1999;35(3):286–301.
26. Ballard D, Hayhoe M, Pook P, Rao R. Deictic codes for the embodiment of cognition. Behav Brain Sci. 1997;20(4):723–742. doi: 10.1017/s0140525x97001611
27. Baumeister RF, Vohs KD, Tice DM. The Strength Model of Self-Control. Curr Dir Psychol Sci. 2007;16(6):351–355.
28. Shenhav A, Musslick S, Lieder F, Kool W, Griffiths TL, Cohen JD, et al. Toward a Rational and Mechanistic Account of Mental Effort. Annu Rev Neurosci. 2017;40(1):99–124.
29. Sachdeva C, Gilbert SJ. Excessive use of reminders: Metacognition and effort-minimisation in cognitive offloading. Conscious Cogn. 2020;85:103024. doi: 10.1016/j.concog.2020.103024
30. Rosenbaum DA, Fournier LR, Levy-Tzedek S, McBride DM, Rosenthal R, Sauerberger K, et al. Sooner rather than later: Precrastination rather than procrastination. Curr Dir Psychol Sci. 2019;28(3):229–233.
31. Hawkins GE, Brown SD, Steyvers M, Wagenmakers EJ. An optimal adjustment procedure to minimize experiment time in decisions with multiple alternatives. Psychon Bull Rev. 2012;19(2):339–348. doi: 10.3758/s13423-012-0216-z
32. Dunn TL, Risko EF. Toward a metacognitive account of cognitive offloading. Cogn Sci. 2016;40(5):1080–1127. doi: 10.1111/cogs.12273

Decision Letter 0

Veronica Yan

9 Jul 2020

PONE-D-20-14992

The effect of metacognitive training on confidence and strategic reminder setting

PLOS ONE

Dear Dr. Engeler,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

I have now received three expert reviews and have also read the paper myself. As you can see, the comments are largely positive and I agree that there is much to like in this paper. To echo Reviewer 3, one laudable aspect is the preregistration of the analytic plan and the fact that the paradigm produces several metacognitive measures is also in itself a contribution to the literature.

Each reviewer has provided informative comments that I hope you will find useful. There are two types of key concerns that arose from the reviews. The first type of key concern is a relatively simple issue of clarifying the methods section, to provide more details about the online administration of the study and the task itself. I think part of the confusion might be the use of the word “trial” to refer to each block of 25 circle-drags (note that Reviewer 1’s comment 6 uses “trial” to refer to each circle drag; it was also a confusion that I had). Perhaps referring to each trial as a “block” might sidestep the issue.

The second type of key concern is about how exactly the data should be interpreted. For example, Reviewer 2 asked about the specific wording of the global metacognitive judgment questions. I suspect that one reason why they ask that question is to get at what, specifically, participants were being asked to make judgments about. For example, was the first global metacognitive judgment a prediction of how participants would perform on future trials or a postdiction of how well they did on the forced trials? Similarly, did participants interpret the second global judgment as a postdiction of how well they did in the choice trials? One reason this matters is that metacognitive bias (which is the DV on which effects are found) is based on the difference between the judgments and their actual performance on the forced trials. However, interpretation of this difference as over- or underconfidence is muddied if participants think they are making metacognitive judgments about anything other than performance on the forced trials. Another example is comment 5 by Reviewer 1 -- if participants are aware of flagging attention, this judgment could appear as a positive reminder bias. I do not expect that these are questions that can be definitively answered without a new study, but they should at least be raised and addressed in the general discussion.

Finally, I had another more minor suggestion for the analysis of the relationship between reminder bias and metacognitive bias: currently, four correlations are reported and an interaction is implied. I would recommend considering using regression analyses to test the implied interaction (e.g., is the relationship between reminder bias and internal metacognitive bias moderated by group?).

I look forward to seeing the next iteration of this manuscript.

Please submit your revised manuscript by Aug 23 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Veronica Yan, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

Additional Editor Comments (if provided):


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors examine whether a training period will shift metacognitive judgments about reminder-setting and adjust how many reminders learners choose to set. Learners move digital circles to different assigned sides of a box. Learners are randomly assigned to a metacognitive training condition, in which they predict how many they will correctly move and receive feedback about their performance and predictions during forced trials, OR to a baseline condition in which they do not practice metacognitive judgments. Practicing metacognitive judgments improves later metacognitive predictions but does not shift when and how they choose to use reminders.

I think cognitive offloading is an incredibly interesting and important area for investigation and understanding influences on it is worthwhile. This manuscript provides an incremental step towards greater understanding of cognitive offloading and reminder setting.

1. I am wondering if practice making metacognitive judgments is truly a kind of “training.” The authors consider training to be making metacognitive predictions and getting feedback about their predictions. This does not fit my ideal of what training entails. Training seems like it would entail some kind of instruction. It may be more accurate to think of practice predictions as experience with the task, which then improves later predictions. Those that get experience making predictions become better at making predictions (and this improvement at the task continues throughout the experimental trials, even without feedback about estimates – i.e. the interaction of judgment time with other variables). Practice with this unusual or difficult task improves later performance at the task.

2. The primary conclusion (practice making metacognitive predictions does not change reminder setting) is evidenced by a null effect. The authors find no differences between conditions in reminder selections. This concern is somewhat allayed by differences in metacognitive monitoring accuracy, but null effects are not as powerful as significant differences. The authors suggest reasons why there may be null effects but cannot test their hypotheses. Understanding how to shift reminder setting and showing how to do so would be interesting future directions.

3. To calculate optimal selections, participants must compute a math equation – there is no other way to select optimally. Therefore, this “metacognitive” task may be more of an algebra problem than what is typically considered as metacognition. Succeeding at a math problem seems very different than on-going metacognitive monitoring. Do optimal selections in this task relate to math ability? How can learners choose optimally if they do not know how to solve the math problem?

4. I think that future directions could explore item-by-item reminder selections. Item-by-item selection processes could better mimic how learners choose to offload memory in the real world. Learners do not choose to offload an entire block of information; rather, they choose what items and information to offload on an item-by-item basis.

5. To calculate optimal allocation, the authors use performance on the forced trials at the beginning of the experiment. This could be inaccurate if learning occurs during training and performance improves. Alternatively, this could be inaccurate if fatigue and proactive interference happen and performance declines. In other words, the temporal separation of the forced trials from the choice trials could produce biases in calculating optimal selection. The procedure assumes that performance is static across the entire experiment, which seems unlikely. Further, we know attention waxes and wanes. If learners are sensitive to changes in their attention (as could be suggested by prior research, e.g., Markant, DuBrow, Davachi, and Gureckis, 2014), then they may be exercising better metacognitive control over reminder setting than can be picked up by the rough measurements in this procedure.

6. I think the authors need to be clearer about the 1-9 point assignment phases. It was unclear whether individual circles were assigned point values or whether the entire block of trials was assigned those point values (I figured out that it was the latter possibility, but it took a bit of time).

7. The authors need to be clearer about where they got their degrees of freedom around line 279.

8. The authors sometimes average across the two judgments and sometimes include the time point as a variable in the analysis. Can they be clearer about the differences in accounting for time point? And given the interactions with time point and other variables, does averaging across time point in some analyses obscure important differences?

9. I do not see a link to the data file used for this project.

Reviewer #2: I find this manuscript interesting, informative, and clearly written. I appreciate the authors’ work and can see why it would be of interest to those examining cognitive offloading. However, there are some potential areas for improvement, particularly in the motivation of the current work and the specification of the sample. I outline my suggestions below, in the order I noticed them in the paper.

Overall, I found that the authors could improve their situating of the current work, and describing why it is compelling – why would improving metacognitive judgment accuracy/reducing bias in offloading behavior be helpful? I think this motivation could be improved in both the Introduction and Discussion sections, perhaps using practical examples of why these factors matter. The larger issues at play here are interesting and could be enhanced in this manuscript.

---

The sample was collected via Amazon Mechanical Turk. Given the rather sizeable existing literature on the proper screening of MTurk participants, I think it's important to have additional detail about the sample.

How was age confirmed? Did you use Amazon MTurk's pre-screening age qualifications? To confirm that people were in the US as you stated, did you use some kind of IP address locator? Or were these items self-reported? If participants did self-report, how did you control for lying? (MTurk participants commonly respond to demographic questions based on their perceptions of demand characteristics, and are known to share those demand characteristics with their fellow MTurkers once they are discovered).

Some studies have reported that American MTurkers are more highly educated on average than other samples. Did you collect educational data, and if so, was that a factor (given the extant literature on the relationship between education and metacognition)? How about across age, as the sample ranges quite widely from 20-71 years old, and some literature suggests that younger and older adults differ in both their memory and metacognitive capabilities?

Did you use “catch questions” or “bot checks” to ensure participants were paying attention? If so, what were these items? When did they appear? If not, how did you verify that your sample did not contain bots?

Were participants screened to be fluent in English, or to have English as a first language? Among those whose first language was not English, how did you ensure fluency/understanding of the instructions, especially given that the task has multiple components?

Do the authors have a measure of how often MTurk participants left the page (e.g., to click to another window) when they were doing this task to assess potential distraction/lack of attention?

The authors report that “the experiment took approximately 45 minutes.” What was the standard deviation in time spent? Were data from any especially quick or especially slow outliers discarded? Could participants have clicked through very quickly (e.g., skimming through instructions), or were there minimum requirements for time spent on each page?

----

For conciseness, some repetitiveness between the Optimal Reminders Task section and the Procedure section could be decreased or consolidated.

Lines 236-237, please provide exact wording of the final metacognitive questions – was it some version of “how accurately do you think you performed on this task?”.

Finally, this may or may not be possible, but I think it would be interesting if readers could be directed to a quick video of the task in action in supplemental materials. Reading about it and seeing photos help, but a screen recording of the task could be even clearer and more compelling. If this exists elsewhere, directing readers toward it would be helpful.

Reviewer #3: The authors investigate behavior in a recently-developed metacognitive task in which participants drag a set of numbered circles to the bottom of the screen in order. Certain circles are initially marked in a different color and must be dragged to different locations, for which participants may optionally set an "external reminder." By manipulating the rewards & costs associated with setting a reminder, the paradigm allows the researchers to assess how optimally people use external reminders. In the present study, the researchers additionally created an intervention in which experimental (but not control) participants made predictions and received feedback in the first block of trials. In the second block of trials, the experimental group showed more accurate performance predictions for both internal and external memory, but both groups still showed a behavioral bias towards setting more external reminders than optimal.

Overall, I found much to like about this manuscript.

One clear strength of the manuscript is the authors' commitment to open science. The authors have preregistered the participant exclusion criteria, experimental procedure and the analytic plan, and they have adhered to this closely. Among other laudable aspects of this plan, power analysis was used to set the target sample size a priori to avoid any risk of p-hacking.

Another strength is the data and analytic procedure. An exciting aspect of the paradigm is that it produces several measures, including actual performance, participants' bias in choosing external vs. internal memory, and metacognitive predictions, that each characterizes a separate aspect of behavior. The authors nicely outline each of these measures on p. 11 and clearly delineate the results and conclusions stemming from each one.

I did have some initial concerns about how strongly the results from this particular experimental paradigm support the claim of a general bias towards external reminders (with or without metacognitive training), but the authors generally acknowledge these concerns in their Discussion. First, participants might not be motivated to set an optimal indifference point simply because there is no real incentive to earn more points, but the authors acknowledge this in lines 389-394. An additional counter-explanation, which the authors may want to also discuss, is that participants use external reminders because they don't want to "look stupid" by making errors (Hawkins, Brown, Steyvers, & Wagenmakers, 2012), even if that results in a suboptimal payoff. Second, while it's also unclear whether the present intervention is effective because of the predictions, the feedback, or their combination, the authors similarly acknowledge this on lines 378-380 (although I think it would also be worthwhile to consider the possibility that feedback alone produces the effect, which the authors do not discuss). Thus, I find the authors' Discussion section generally effective.

But, there is the potential for one other major confound that was unclear to me when reading the manuscript. Lines 148-150 state that "when participants set an external reminder, target circles had to be dragged near the instructed alternative location." Am I to understand that offloading in this case involved moving the special-colored circles near or at their eventual target location? If so, I feel that referring to this operation as just an "external reminder" is a bit of a misnomer, because putting the circle in the alternative location also helps to complete the eventual task. That could explain participants' bias to set reminders, since in doing so they are not only creating a reminder but also helping to complete the task ("pre-crastinating"; Rosenbaum et al., 2019). (Compare that to, say, setting a reminder in one's phone to mow the lawn, in which case the phone alert serves as a reminder but does nothing to accomplish the lawn-mowing itself.) At the very least, the manuscript could make this issue more clear: Where, exactly, were the items moved when setting a reminder? And, if the items were indeed dragged to or near the target locations, I think the authors need to discuss how that may temper or alter their conclusions.

One other weakness of the current manuscript is that, while the empirical contributions are clear, the manuscript could do more to speak to the theoretical contributions. The theoretical motivation is a single paragraph in the introduction that briefly covers both the limits of working memory and the notion of offloading with technology. More contact with broader theory on external memory and metacognition could help contextualize the results. For example, can broader theoretical perspectives on metacognitive monitoring and control suggest anything about why people might have a reminder bias?

Nevertheless, I think this is a well-executed and well-analyzed study, and with some revision to the introduction and discussion, it will make a valuable contribution.

References

Hawkins, G. E., Brown, S. D., Steyvers, M., & Wagenmakers, E.-J. (2012). An optimal adjustment procedure to minimize experiment time in decisions with multiple alternatives. Psychonomic Bulletin & Review, 19, 339–348.

Rosenbaum, D. A., et al. (2019). Sooner rather than later: Precrastination rather than procrastination. Current Directions in Psychological Science, 28, 229–233.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Oct 23;15(10):e0240858. doi: 10.1371/journal.pone.0240858.r002

Author response to Decision Letter 0


26 Aug 2020

We are very grateful to the editor and all three reviewers for their careful reading of our manuscript and their insightful comments, which we believe have greatly improved our manuscript. We have carefully revised the manuscript according to your comments, as summarised below. As well as uploading a clean version of the document as the main manuscript, we have also uploaded the version with track changes as a supplementary file so that all changes are clearly indicated.

We look forward to hearing from you.

Sincerely,

Sam Gilbert and Nicole Engeler

Note: All line references refer to the “Manuscript with track changes” document.

Editor comments

I have now received three expert reviews and have also read the paper myself. As you can see, the comments are largely positive and I agree that there is much to like in this paper. To echo Reviewer 3, one laudable aspect is the preregistration of the analytic plan and the fact that the paradigm produces several metacognitive measures is also in itself a contribution to the literature.

Thank you for this evaluation.

Each reviewer has provided informative comments that I hope you will find useful. There are two types of key concerns that arose from the reviews. The first type of key concern is a relatively simple issue of clarifying the methods sections, to provide more details about the online administration of the study and the task itself. I think part of the confusion might be the use of the word “trial” to refer to each block of 25 circle-drags (note that Reviewer 1’s comment 6 uses “trial” to refer to each circle drag; it was also a confusion that I had). Perhaps referring to each trial as a “block” might sidestep the issue.

Thank you for this summary. We have now clarified the methods, as explained in more detail below. We have also included a weblink (see lines 124-129) which allows anyone to try the experiment, exactly as it was presented to the actual participants. We hope this will further clarify any methodological points. We agree that our use of the word “trial” was potentially confusing, therefore we added a description of exactly what is meant by this on lines 62-63. This wording was chosen to be in line with previous studies (e.g. Gilbert et al., 2019).

The second type of key concern is about how exactly the data should be interpreted. For example, Reviewer 2 asked about the specific wording of the global metacognitive judgment questions. I suspect that one reason why they ask that question is to get at what specifically participants were being asked to make judgments about. For example, was the first global metacognitive judgment a prediction of how participants would perform on future trials or a postdiction of how well they did on the forced trials? Similarly, did participants interpret the second global judgment as a postdiction of how well they did in the choice trials? One reason this matters is that metacognitive bias (which is the DV on which effects are found) is based on the difference between the judgments and their actual performance on the forced trials. However, interpretation of this difference as over- or underconfidence is muddied if participants think they are making metacognitive judgments about anything other than performance on the forced trials. Another example is comment 5 by Reviewer 1 -- if participants are aware of flagging attention and compensate for it, this could appear as a positive reminder bias. I do not expect that these are questions that can be definitively answered without a new study, but they should at least be raised and addressed in the general discussion.

We agree that these are important issues and have clarified the manuscript accordingly. As well as providing a weblink to the full experimental task [lines 124-129] we have also added the exact wording for the 2nd global metacognitive judgments on lines 271-277. For all global metacognitive judgments, the wording made it clear that participants were judging their average ability for future trials rather than making postdictions. It was also made clear, using bold text, whether they were judging their ability for internal or external trials.

Regarding reviewer 1’s comment about the temporal separation between forced and choice trials, we agree that this is an important point. In most of our experiments using this paradigm we have intermixed forced and choice trials in order to avoid this issue (e.g. Gilbert et al., 2020, Experiments 1 and 3). However, it was necessary to administer the forced trials first in this experiment seeing as this was where the metacognitive intervention took place. While we agree that this complicates interpretation of whether the reminder bias was positive or negative, we also note that this would apply equally to the feedback and no-feedback groups, therefore the direct comparison between the two groups (our main interest in this study) is not affected. We have added a discussion of this issue on lines 485-493.

Finally, I had another more minor suggestion for the analysis of the relationship between reminder bias and metacognitive bias: currently, four correlations are reported and an interaction is implied. I would recommend considering using regression analyses to test the implied interaction (e.g., is the relationship between reminder bias and internal metacognitive bias moderated by group?).

We agree that in order to draw the conclusion that the relationship between internal metacognitive bias and reminder bias was specific to the no-feedback group, it would be necessary to demonstrate a significant difference between the correlations for the feedback and no-feedback groups. We now report this analysis on lines 405-409 and, seeing as it was not significant, we note that no strong conclusions may be drawn about specificity.
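For illustration only (this is not our exact analysis code, which is available at https://osf.io/ebp4z/), a common way to test whether two independent correlations differ is a Fisher r-to-z comparison. A minimal Python sketch, using hypothetical correlation values and group sizes:

import numpy as np
from scipy import stats

def compare_independent_correlations(r1, n1, r2, n2):
    # Fisher r-to-z test for the difference between two independent correlations.
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2 * stats.norm.sf(abs(z))          # two-tailed p value
    return z, p

# Hypothetical values for illustration only (not the observed results).
z, p = compare_independent_correlations(r1=-0.10, n1=52, r2=-0.45, n2=51)
print(round(z, 2), round(p, 3))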

Reviewer 1

The authors examine whether a training period will shift metacognitive judgments about reminder-setting and adjust how many reminders learners choose to set. Learners move digital circles to different assigned sides of a box. Learners are randomly assigned to a metacognitive training condition in which they predict how many they will correctly move and receive feedback about their performance and predictions during forced trials OR to a baseline condition in which they do not practice metacognitive judgments. Practicing metacognitive judgments improves later metacognitive predictions but does not shift when and how they choose to use reminders.

I think cognitive offloading is an incredibly interesting and important area for investigation and understanding influences on it is worthwhile. This manuscript provides an incremental step towards greater understanding of cognitive offloading and reminder setting.

Thank you for this assessment.

1. I wonder whether practice making metacognitive judgments is truly a kind of “training.” The authors consider training to be making metacognitive predictions and getting feedback about their predictions. This does not fit my ideal of what training entails. Training seems like it would entail some kind of instruction. It may be more accurate to think of practice predictions as experience with the task, which then improves later predictions. Those who get experience making predictions become better at making predictions (and this improvement at the task continues throughout the experimental trials, even without feedback about estimates – i.e. the interaction of judgment time with other variables). Practice with this unusual or difficult task improves later performance at the task.

Thank you, that is a good point. We agree that the wording choice of “training” can be confusing. However, we used this term in order to be consistent with the terminology previous studies have used (Carpenter et al., 2019). Therefore, we clarified what we mean by this term on lines 106-107.

2. The primary conclusion (practice making metacognitive predictions does not change reminder setting) is evidenced by a null effect. The authors find no differences between conditions in reminder selections. This concern is somewhat allayed by differences in metacognitive monitoring accuracy, but null effects are not as powerful as significant differences. The authors suggest reasons why there may be null effects but cannot test their hypotheses. Understanding how to shift reminder setting and showing how to do so would be interesting future directions.

We agree with these points, and have now included an additional section in the discussion (lines 505-507) where we acknowledge the null effect and discuss the relationship between metacognitive interventions and reminder-setting (as well as noting that such a relationship has previously been found in an earlier study).

3. To calculate optimal selections, participants must compute a math equation – there is no other way to select optimally. Therefore, this “metacognitive” task may be more of an algebra problem than what is typically considered metacognition. Succeeding at a math problem seems very different from ongoing metacognitive monitoring. Do optimal selections in this task relate to math ability? How can learners choose optimally if they do not know how to solve the math problem?

Thank you for this suggestion. We agree that participants may choose sub-optimally simply because they lack the arithmetic ability to weigh the two strategies properly, although it is not clear why this would cause systematic bias in one direction or the other. We now note this point in the discussion (lines 500-502).
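To make the arithmetic concrete, the following is a minimal illustrative sketch (in Python) of the expected-value comparison underlying an optimal choice. The specific payoffs are assumptions for illustration: an unaided target is taken to be worth 10 points and a reminder-assisted target some value V between 1 and 9; the exact values used in the task are described in the Methods.

# Sketch of the expected-value comparison underlying an optimal strategy choice.
# Assumed payoffs (illustrative): 10 points per target with unaided memory,
# V points (1-9) per target when reminders are used.

def optimal_indifference_point(p_internal, p_external, unaided_value=10.0):
    # Expected points without reminders: unaided_value * p_internal
    # Expected points with reminders:    V * p_external
    # The two are equal at V* = unaided_value * p_internal / p_external;
    # reminders maximise expected points whenever V exceeds V*.
    return unaided_value * p_internal / p_external

# Example: 60% accuracy unaided vs. 90% with reminders gives indifference near 6.7,
# so only reminder values of 7 or above would be worth accepting.
print(optimal_indifference_point(0.6, 0.9))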

4. I think that future directions could explore item-by-item reminder selections. Item-by-item selection processes could better mimic how learners choose to offload memory in the real world. Learners do not choose to offload an entire block of information; rather, they choose what items and information to offload on an item-by-item basis.

Thank you for another good point. We included this suggestion for future work in the discussion (lines 535-536).

5. To calculate optimal allocation, the authors use performance on the forced trials at the beginning of the experiment. This could be inaccurate if learning occurs during training and performance improves. Alternatively, this could be inaccurate if fatigue and proactive interference happen and performance declines. In other words, the temporal separation of the forced trials from the choice trials could produce biases in calculating optimal selection. The procedure assumes that performance is static across the entire experiment, which seems unlikely. Further, we know attention waxes and wanes. If learners are sensitive to changes in their attention (as could be suggested by prior research, e.g., Markant, DuBrow, Davachi, and Gureckis, 2014), then they may be exercising better metacognitive control over reminder setting than can be picked up by the rough measurements in this procedure.

Thank you, this is a very valid concern. We now discuss this issue on lines 485-493. We note that previous studies have intermixed the forced and choice trials to avoid this problem. We also note that the issue would apply equally to both the feedback and the control groups, so should not confound any direct comparison between the groups.

6. I think the authors need to be clearer about the 1-9 point assignment phases. It was unclear whether individual circles were assigned point values or whether the entire block of trials was assigned those point values (I figured out that it was the latter possibility, but it took a bit of time).

We have clarified the point assignment (line 185).

7. The authors need to be clearer about where they got their degrees of freedom around line 279.

Consistent with previous articles (e.g. Gilbert et al., 2020), degrees of freedom were based on t-tests conducted using R, without assuming equal variance. We now clarify this on lines 283-285.
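As a brief illustration for readers less familiar with this approach (in Python rather than R, and with made-up data and hypothetical group sizes): when equal variances are not assumed, Welch's t-test estimates the degrees of freedom from the data via the Welch-Satterthwaite approximation, which is why they are typically non-integer.

import numpy as np
from scipy import stats

def welch_df(a, b):
    # Welch-Satterthwaite approximation of the degrees of freedom.
    va, vb = np.var(a, ddof=1) / len(a), np.var(b, ddof=1) / len(b)
    return (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))

rng = np.random.default_rng(1)
feedback = rng.normal(0.0, 1.0, 53)      # hypothetical group sizes and scores
no_feedback = rng.normal(0.3, 1.5, 50)

t, p = stats.ttest_ind(feedback, no_feedback, equal_var=False)  # Welch's t-test
print(t, p, welch_df(feedback, no_feedback))  # df is not a whole number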

8. The authors sometimes average across the two judgments and sometimes include the time point as a variable in the analysis. Can they be clearer about the differences in accounting for time point? And given the interactions with time point and other variables, does averaging across time point in some analyses obscure important differences?

As determined in our pre-registration we first collapsed the two timepoints in order to conduct our main analyses of the metacognitive bias scores without inflating the type-1 error rate. This was followed by additional analyses separately investigating whether there was an effect of timepoint. We have now clarified this on lines 319-320.

9. I do not see a link to the data file used for this project.

We have also uploaded data and code to https://osf.io/ebp4z/ so that all of the reported analyses can be reproduced. We now note this in lines 284-285. In reproducing all of our analyses using R, we noticed that there were some minor discrepancies and we also noticed an error in Table 1. All of these have been corrected. None of our conclusions were affected.

Reviewer 2

I find this manuscript interesting, informative, and clearly written. I appreciate the authors’ work and can see why it would be of interest to those examining cognitive offloading. However, there are some potential areas for improvement, particularly in the motivation of the current work and the specification of the sample. I outline my suggestions below, in the order I noticed them in the paper.

Overall, I found that the authors could improve their situating of the current work, and describing why it is compelling – why would improving metacognitive judgment accuracy/reducing bias in offloading behavior be helpful? I think this motivation could be improved in both the Introduction and Discussion sections, perhaps using practical examples of why these factors matter. The larger issues at play here are interesting and could be enhanced in this manuscript.

Thank you for this assessment. We expanded on the motivation of this manuscript on lines 117-120.

The sample was collected via Amazon Mechanical Turk. Given the rather sizeable existing literature on the proper screening of MTurk participants, I think it's important to have additional detail about the sample. How was age confirmed? Did you use Amazon MTurk's pre-screening age qualifications? To confirm that people were in the US as you stated, did you use some kind of IP address locator? Or were these items self-reported? If participants did self-report, how did you control for lying? (MTurk participants commonly respond to demographic questions based on their perceptions of demand characteristics, and are known to share those demand characteristics with their fellow MTurkers once they are discovered).

Age, gender and location were all self-reported, and we did not use MTurk’s pre-screening age qualifications, so we cannot exclude the possibility that some individuals gave false information. We now clarify this on lines 214 and 221.

Some studies have reported that American MTurkers are more highly educated on average than other samples. Did you collect educational data, and if so, was that a factor (given the extant literature on the relationship between education and metacognition)? How about across age, as the sample ranges quite widely from 20-71 years old, and some literature suggests that younger and older adults differ in both their memory and metacognitive capabilities?

We did not collect educational data, though we agree that it would be interesting to investigate the influence of this on offloading. We also agree that it is interesting to investigate potential age effects, but we did not do so here (although we have previously done so in Gilbert, 2015, Quarterly Journal of Experimental Psychology). This is because it is unclear how comparable younger and older mTurk participants are, and what the confounding variables might be. As an alternative, we have recently been investigating age effects in laboratory-based studies (Scarampi & Gilbert, under revision, Psychology and Aging: https://psyarxiv.com/vsa45/ ; Tsai, Kliegel, & Gilbert, in prep).

Did you use “catch questions” or “bot checks” to ensure participants were paying attention? If so, what were these items? When did they appear? If not, how did you verify that your sample did not contain bots?

We did not use catch questions or bot checks per se. However, seeing as participants needed to drag circles to specific parts of the screen in order to progress with the task, this would have excluded any attempts at automated responding. We also applied exclusion criteria to ensure that participants engaged with the task as intended, which we now note on lines 207-208.

Were participants screened to be fluent in English, or to have English as a first language? Among those whose first language was not English, how did you ensure fluency/understanding of the instructions, especially given that the task has multiple components?

We included a practice session which required participants to respond accurately to a target circle in order to proceed, ensuring that participants understood the task instructions. This is now noted on lines 207-208. The full practice session can be sampled by visiting the demonstration weblink: http://samgilbert.net/demos/NE1/start.html

Do the authors have a measure of how often MTurk participants left the page (e.g., to click to another window) when they were doing this task to assess potential distraction/lack of attention?

Unfortunately we did not collect this measure, but we agree it would be useful information to collect in future work.

The authors report that “the experiment took approximately 45 minutes.” What was the standard deviation in time spent? Were data from any especially quick or especially slow outliers discarded? Could participants have clicked through very quickly (e.g., skimming through instructions), or were there minimum requirements for time spent on each page?

We have now included a more precise description of the experiment duration on lines 222-223: the median was 31 minutes. The standard deviation was large (30 minutes), however this is hard to interpret because it could be caused by participants loading up the first page of instructions in their web browser, then working on other tasks until returning to the browser tab later. It is common for mTurk workers to queue up several experiments in this manner before starting work. We did not impose any minimum requirements for time spent on each page; instead, we applied exclusion criteria (line 210) to make sure that participants met a minimum level of accuracy when they performed the task. We also imposed a performance criterion on the practice session to ensure that participants understood the instructions (lines 224-226).

For conciseness, some repetitiveness between the Optimal Reminders Task section and the Procedure section could be decreased or consolidated.

Thank you for spotting the unnecessary repetition of information, we have consolidated this (lines 262-263).

Lines 236-237, please provide exact wording of the final metacognitive questions – was it some version of “how accurately do you think you performed on this task?”.

We have now added the exact wording of the final metacognitive judgments on lines 271-277.

Finally, this may or may not be possible, but I think it would be interesting if readers could be directed to a quick video of the task in action in supplemental materials. Reading about it and seeing photos help, but a screen recording of the task could be even clearer and more compelling. If this exists elsewhere, directing readers toward it would be helpful.

Thank you, this is a good idea. Rather than a video, we have provided a weblink (lines 124-129) so that anyone interested can complete the task themselves.

Reviewer 3

The authors investigate behavior in a recently-developed metacognitive task in which participants drag a set of numbered circles to the bottom of the screen in order. Certain circles are initially marked in a different color and must be dragged to different locations, for which participants may optionally set an "external reminder." By manipulating the rewards & costs associated with setting a reminder, the paradigm allows the researchers to assess how optimally people use external reminders. In the present study, the researchers additionally created an intervention in which experimental (but not control) participants made predictions and received feedback in the first block of trials. In the second block of trials, the experimental group showed more accurate performance predictions for both internal and external memory, but both groups still showed a behavioral bias towards setting more external reminders than optimal.

Overall, I found much to like about this manuscript.

One clear strength of the manuscript is the authors' commitment to open science. The authors have preregistered the participant exclusion criteria, experimental procedure and the analytic plan, and they have adhered to this closely. Among other laudable aspects of this plan, power analysis was used to set the target sample size a priori to avoid any risk of p-hacking.

Another strength is the data and analytic procedure. An exciting aspect of the paradigm is that it produces several measures, including actual performance, participants' bias in choosing external vs. internal memory, and metacognitive predictions, that each characterizes a separate aspect of behavior. The authors nicely outline each of these measures on p. 11 and clearly delineate the results and conclusions stemming from each one.

Thank you for this evaluation.

I did have some initial concerns about how strongly the results from this particular experimental paradigm support the claim of a general bias towards external reminders (with or without metacognitive training), but the authors generally acknowledge these concerns in their Discussion. First, participants might not be motivated to set an optimal indifference point simply because there is no real incentive to earn more points, but the authors acknowledge this in lines 389-394. An additional counter-explanation, which the authors may want to also discuss, is that participants use external reminders because they don't want to "look stupid" by making errors (Hawkins, Brown, Steyvers, & Wagenmakers, 2012), even if that results in a suboptimal payoff. Second, while it's also unclear whether the present intervention is effective because of the predictions, the feedback, or their combination, the authors similarly acknowledge this on lines 378-380 (although I think it would also be worthwhile to consider the possibility that feedback alone produces the effect, which the authors do not discuss). Thus, I find the authors' Discussion section generally effective.

Thank you for the interesting additional points. We included the suggested possibility that participants use external reminders because they do not want to “look stupid” (lines 499-500), and that feedback alone could make the intervention effective (line 469).

But, there is the potential for one other major confound that was unclear to me when reading the manuscript. Lines 148-150 state that "when participants set an external reminder, target circles had to be dragged near the instructed alternative location." Am I to understand that offloading in this case involved moving the special-colored circles near or at their eventual target location? If so, I feel that referring to this operation as just an "external reminder" is a bit of a misnomer, because putting the circle in the alternative location also helps to complete the eventual task. That could explain participants' bias to set reminders, since in doing so they are not only creating a reminder but also helping to complete the task ("pre-crastinating"; Rosenbaum et al., 2019). (Compare that to, say, setting a reminder in one's phone to mow the lawn, in which case the phone alert serves as a reminder but does nothing to accomplish the lawn-mowing itself.) At the very least, the manuscript could make this issue more clear: Where, exactly, were the items moved when setting a reminder? And, if the items were indeed dragged to or near the target locations, I think the authors need to discuss how that may temper or alter their conclusions.

That is correct: offloading involved dragging target circles near their target locations in external trials. However, we do not believe that dragging target circles near their target location meaningfully reduces work, seeing as in either case the participant needs to make a small mouse movement to move a circle from one screen position to another. The physical demands of the two actions seem similar, i.e. a small movement of the wrist. Nevertheless, we agree that the literature on “pre-crastination” is highly relevant in this context and we now discuss it on lines 494-496.

One other weakness of the current manuscript is that, while the empirical contributions are clear, the manuscript could do more to speak to the theoretical contributions. The theoretical motivation is a single paragraph in the introduction that briefly covers both the limits of working memory and the notion of offloading with technology. More contact with broader theory on external memory and metacognition could help contextualize the results. For example, can broader theoretical perspectives on metacognitive monitoring and control suggest anything about why people might have a reminder bias?

We have now included some broader theoretical context on why people might exhibit a reminder bias in the discussion (lines 478-480).

Nevertheless, I think this is a well-executed and well-analyzed study, and with some revision to the introduction and discussion, it will make a valuable contribution.

Thank you for this assessment.

References

Carpenter, J., Sherman, M. T., Kievit, R. A., Seth, A. K., Lau, H., & Fleming, S. M. (2019). Domain-general enhancements of metacognitive ability through adaptive training. Journal of Experimental Psychology: General, 148(1), 51.

Gilbert, S. J., Bird, A., Carpenter, J. M., Fleming, S. M., Sachdeva, C., & Tsai, P. C. (2019). Optimal use of reminders: Metacognition, effort, and cognitive offloading. Journal of Experimental Psychology: General.

Scarampi, C., & Gilbert, S. (2020, June 24). Age differences in strategic reminder setting and the compensatory role of metacognition. https://doi.org/10.31234/osf.io/vsa45

Scarampi, C., & Gilbert, S. J. (2020b). The effect of recent reminder setting on subsequent strategy and performance in a prospective memory task. Memory, 1-15.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Veronica Yan

5 Oct 2020

The effect of metacognitive training on confidence and strategic reminder setting

PONE-D-20-14992R1

Dear Dr. Engeler,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Veronica Yan, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Thank you for the submitted revision. As you can see, the reviewers felt that their comments were sufficiently addressed, and I agree. I do want to draw attention to one note that Reviewer 3 made about the consistency of the use of terms 'sequence' and 'trial' -- this is something that you may want to pay attention to for the final proofs.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: My comments were adequately addressed.

I think this manuscript will make an interesting, incremental addition to the literature on metacognition.

Reviewer #3: I was Reviewer #3 on the previous version of the manuscript. In my initial review, I expressed enthusiasm about the authors' experimental paradigm and its ability to yield multiple informative measures relevant to metacognition. I also appreciated the authors' commitment to open-science practices.

My concerns had been relatively small. I had expressed some concern that, if setting reminders involved moving the special-colored circles nearer to their eventual target location, participants might be doing so in order to partially solve the eventual task ("pre-crastination"), in addition to acting as a reminder per se. I also noted that another explanation of the reminder bias is that participants might want to avoid "looking stupid" by making errors, even if that results in a suboptimal payout. Lastly, I felt that the authors could do more to situate the work within broader theory.

I am pleased to report that the revision has addressed all of my concerns.

The authors have clarified the methodological details that I and the other reviewers have asked about; in particular, the addition of a link to a demo version of the experiment is a fantastic decision that I hope to see implemented in more papers in the future.

The authors also now use their Discussion section to acknowledge several potential limitations and counter-explanations that I and the other reviewers mentioned. What I think is particularly encouraging about these revisions is that the authors have tied these other factors back into their "minimal-memory" theoretical account; for instance, pre-crastination may actually be part of the explanation as to *why* there is a reminder bias insofar as removing an intention from mind saves mental effort. That is, not only have the authors acknowledged this possibility, but they have made a persuasive argument that it in fact advances their theoretical view. Fantastic.

One small comment (which should not preclude acceptance of the manuscript): Based on comments from the editor, the authors have clarified that one "trial" refers to the entire sequence of 25 circles. That is fine with me; however, I notice that the alternate term "sequence" is used in several places to refer to an entire trial (e.g., lines 146, 151, 157). In their final submission, the authors may want to change these usages to "trial" as well so that a reader is not misled into thinking that there is a difference between a "trial" and a "sequence".

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #3: No

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    All data files and analysis codes will be held in a public repository, available at https://osf.io/ebp4z/.

