Regulating recognition decisions through incremental reinforcement learning

Sanghoon Han; Ian G Dobbins

doi:10.3758/PBR.16.3.469

Author manuscript; available in PMC: 2010 Jun 1.

Published in final edited form as: Psychon Bull Rev. 2009 Jun;16(3):469–474. doi: 10.3758/PBR.16.3.469


PMCID: PMC2737388 NIHMSID: NIHMS122810 PMID: 19451370

Abstract

Does incremental reinforcement learning influence recognition memory judgments? We examined this question by subtly altering the relative validity or availability of feedback in order to differentially reinforce old or new recognition judgments. Experiment 1 probabilistically and incorrectly indicated that either misses or false alarms were correct in the context of feedback that was otherwise accurate. Experiment 2 selectively withheld feedback for either misses or false alarms in the context of feedback that was otherwise present. Both manipulations caused prominent shifts of recognition memory decision criteria that remained for considerable periods even after feedback was altogether removed. Overall, these data demonstrate that incremental reinforcement learning mechanisms influence the degree of caution participants exercise when evaluating explicit memories.

Keywords: recognition memory, incremental reinforcement learning, decision criterion, probabilistic feedback

Recognition criteria are hypothetical standards by which memory evidence is categorized as either sufficient or inadequate to warrant a judgment of prior encounter (viz. “old”) (Macmillan & Creelman, 1991) (see Figure 1). Although most memory researchers assume criteria are adaptive, there are few models of learning that might support such adaptability (however see Estes & Maddox 1995; Unkelbach, 2006) and to date, the vast majority of successful manipulations of memory decision criteria have involved explicit instructions given to observers about the relative preponderance of old and new items (Hirshman, 1998; Rotello et al., 2005; Strack & Foerster, 1995), or explicit warnings to avoid either errors of omission or commission (Azimian-Faridani & Wilding, 2006). These instructed criterion shifts are sometimes augmented with clear descriptions of monetary losses and gains attached to different response outcomes (payoff matrices) (Van Zandt, 2000) but in all of these cases observers consciously attempt to comply with instructions given their understanding of test list regularities or characteristics. What remains unclear is whether the decision criterion can adapt without an explicit or controlled strategy.

Figure 1. Example of a one-dimensional signal detection theory (1D-SDT) model of item-based memory decisions. The figure illustrates the model of old/new recognition with normal probability density distributions of familiarity values for old and new items. The accuracy estimate d′ is the distance between the means of the two distributions divided by the standard deviation. ‘C’ denotes the SDT estimate of bias, which is the position of the “Old/New” decision criterion relative to the intersection of the two distributions, and ‘HC’ denotes the high confidence criterion. The decision criterion is the hypothetical standard by which memory information is categorized as either sufficient or inadequate to warrant a categorical judgment of “old” (Macmillan & Creelman, 1991). Old items whose evidence falls above the criterion are correctly endorsed (“hits”), whereas new items with evidence above the criterion yield errors of commission (“false alarms”). The complementary proportions below the criterion are labeled “correct rejection” and “miss”, respectively.

One candidate mechanism that might enable adaptive positioning of the criterion is incremental reinforcement learning, which is central for learning category distinctions in other non-recognition domains (e.g., Gluck & Bower, 1988; Poldrack et al., 2005). Such learning requires integrating trial-by-trial feedback outcomes and gradually re-mapping different decisions onto different stimulus features or feature combinations as a function of probabilistic reward likelihood (for reviews see Ashby & Maddox, 2005). Two category learning paradigms having this characteristic are information integration and probabilistic classification tasks. During both, the relationship between key stimulus features and appropriate decisions cannot be reduced to a simple explicit, verbalizable strategy because observers must classify the items based on complex combinations of multiple feature dimensions (e.g., a nonlinear combination of thickness and orientation of sinusoidal gratings), or because feedback is rendered probabilistically such that making the same judgment for a given repeated stimulus does not guarantee receiving the same feedback outcome on every trial (see also Ashby & O’Brien, 2007). Neuropsychological findings suggest that learning during these tasks heavily relies upon the integrity of the striatum, a basal ganglia structure linked to implicit procedural and habit learning (Knowlton et al., 1996; Saint-Cyr et al., 1988).

Although feedback-based changes in criteria have been frequently examined in perceptual judgment tasks (e.g., Dorfman & Biderman, 1971; Thomas, 1973), there are fundamental differences between perceptual classification tasks and the regulation of episodic recognition judgments. More specifically, in feedback-based category learning tasks it is assumed that the mapping between object features and category decisions is incrementally altered via trial-by-trial feedback learning. However, during episodic recognition tests the perceptual and semantic features of the probes are not diagnostic of the required categorical distinction, since the memory status of the probes, and the types of features each possesses, are orthogonal. Instead, incremental reinforcement learning, if successful, must alter the mapping between levels of retrieved memory evidence and recognition decisions, and this represents a level of abstraction not found during perceptual classification learning. Additionally, observers cannot learn a reinforced response to each individual test item, because the items within a memory test are never repeated.

Perhaps consistent with the notable differences between episodic recognition and perceptual classification demands, evidence for the efficacy of feedback regulation of recognition criteria has been decidedly mixed. For example, studies by Estes and Maddox (1995), and Healy and Kubovy (1977) both failed to demonstrate criterion shifts during recognition paradigms that manipulated the base rates of studied items in conjunction with trial-based feedback procedures, although similar procedures easily produced shifts in their perceptual classification tasks. In contrast, Rhodes and Jacoby (2007) manipulated the relative item probabilities across two screen locations and found that in conjunction with trial-based feedback, subjects demonstrated different recognition criteria for the two locations. However, the criterion shifts were only prominent for observers explicitly aware of the location manipulation of target density, and the removal of the feedback greatly reduced the criterion difference. Critically, neither explicit awareness nor continued presence of the feedback reinforcement should be necessary if incremental reinforcement learning governed the effects. Using a different procedure, Verde and Rotello (2007) demonstrated an acquired criterion shift for separate halves of a test list containing well versus poorly encoded study items intermixed with lures. This design also used trial-based feedback, in addition to halting the test and providing performance summaries during testing. Although these researchers did not examine observer awareness of the test list characteristics, it is possible that the feedback and particularly the performance summaries may have explicitly alerted subjects to the fact that the well- and poorly-encoded study items were not distributed evenly across test halves. 
In total, although these designs importantly demonstrated criterion flexibility during the course of testing, they point towards a mechanism based on explicit awareness of test list regularities. Additionally, they confound a manipulation of the test list characteristics with the presence of trial-based feedback, which necessarily precludes assigning an exclusive role to the processing of feedback in the observed criterion shifts.

Despite the limited support for an incremental reinforcement learning mechanism governing episodic recognition, recognition decisions arguably share important similarities to feedback-based classification learning tasks. First, episodic information is often assumed to be multidimensional (Johnson et al., 1993; Yonelinas, 1994) such that the category “old” may depend upon complex combinations of different trace attributes in ways difficult to capture in a simple explicit response strategy. Second, under most measurement models of recognition, any decision criterion for responding will only yield positive feedback probabilistically because the evidence evoked by old and new items overlaps and cannot be fully separated by a simple criterion boundary (Macmillan & Creelman, 1991) (Figure 1). These two characteristics suggest that recognition judgments might be influenced by the same learning mechanisms shown to govern classification learning in information-integration and/or probabilistic classification tasks, provided such mechanisms are sensitive to abstract mnemonic evidence representations.
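The second point can be made concrete with a small calculation. The sketch below is illustrative only and assumes an equal-variance Gaussian model with equal numbers of old and new items; the function name `feedback_probabilities` and the example values of d′ and c are hypothetical and are not taken from the experiments. It shows that, for any criterion placement, both "old" and "new" responses are followed by accurate positive feedback only some of the time.

```python
from statistics import NormalDist


def feedback_probabilities(d_prime, criterion_c, p_old=0.5):
    """Return (P(correct | "old" response), P(correct | "new" response)).

    Assumed model: new items ~ N(-d'/2, 1) and old items ~ N(+d'/2, 1),
    so that c = 0 lies at the intersection of the two distributions.
    """
    z = NormalDist()
    hit = 1 - z.cdf(criterion_c - d_prime / 2)  # old items above the criterion
    fa = 1 - z.cdf(criterion_c + d_prime / 2)   # new items above the criterion
    p_say_old = p_old * hit + (1 - p_old) * fa
    p_say_new = 1 - p_say_old
    p_correct_given_old = p_old * hit / p_say_old
    p_correct_given_new = (1 - p_old) * (1 - fa) / p_say_new
    return p_correct_given_old, p_correct_given_new


# Hypothetical values: moderate accuracy, three criterion placements.
for c in (-0.5, 0.0, 0.5):
    p_old_ok, p_new_ok = feedback_probabilities(d_prime=1.5, criterion_c=c)
    print(f"c = {c:+.1f}: P(correct | old) = {p_old_ok:.2f}, "
          f"P(correct | new) = {p_new_ok:.2f}")
```

Under these assumptions a stricter criterion makes "old" responses more reliably correct and "new" responses less so, but neither probability reaches 1 while the distributions overlap.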

One study suggesting such a mechanism underlying learned criterion shifts was Han and Dobbins (2008), which used systematically misleading feedback in order to shift the relative criteria of two recognition groups. During the procedure, one group was given false positive feedback for errors of commission (false alarms) whilst the other was given false positive feedback for errors of omission (misses). All other feedback was correct. This design isolates any criterion shift solely to the nature of the feedback since the actual structure of the test lists remained equivalent across the groups. The manipulation shifted the relative criteria of the groups and this difference remained even when feedback became fully correct in the second test block of the design, suggesting a durable form of learning (cf. Rhodes & Jacoby 2007). Additionally, the majority of subjects did not report any perceived anomalies in the feedback itself post-test.

Although suggestive of incremental reinforcement learning, there were potential drawbacks to Han and Dobbins (2008). First, the feedback was fully deterministic in that every error of a particular kind received the false positive feedback. For example, in the condition designed to instill a lax criterion, all false alarms were incorrectly cued as correct responses. This meant that no “old” response ever received a negative feedback outcome for this group. Such deterministic feedback procedures are known to shift learning towards explicit rule use and away from incremental reinforcement learning, with probabilistic versus deterministic feedback conditions potentially engaging different neural learning systems (e.g., Frank & Kong, 2008; Mehta & Williams, 2002; Robinson et al., 1980). Second, the design relied exclusively upon false positive feedback in order to shift the criteria. This approach was chosen because it was assumed that subjects would be uncertain during the commission of errors and hence the manipulation of the feedback validity would be difficult to detect. Nonetheless, this also potentially weds the manipulation exclusively to surprising event outcomes. Although the reinforcement literature suggests this may be particularly useful for learning, as it should evoke considerable “positive prediction error” (Schultz, 2000), it may also increase the likelihood of explicit awareness of the manipulation. Finally, the design of Han and Dobbins (2008) failed to demonstrate that the criterion shifts survived complete removal of the feedback. Since a hallmark of successful incremental reinforcement learning is the perseverance of decision tendencies in the absence of any form of external reinforcement (e.g., Cincotta & Seger, 2007), it is critical to demonstrate that the acquired memory criterion shifts remain for some notable period absent feedback. 
Thus the goal of the current study was to examine whether feedback-based memory criterion shifts demonstrate three key properties consistent with incremental reinforcement learning processes, namely: (a) sensitivity to probabilistic feedback contingencies, (b) independence from surprising false positive outcomes as the sole driver of learning, and (c) persistence in the complete absence of supporting feedback.

GENERAL METHODS

Participants

Sixty-four Duke undergraduates (30 in Experiment 1; 34 in Experiment 2) participated in return for partial course credit. Informed consent was obtained as required by the human subjects review committee of Duke University. Experiment 1 administered a post-experiment questionnaire asking about the feedback procedures to assess participant awareness of the manipulation. One participant who correctly believed the feedback to be inaccurate or skewed was removed from Experiment 1.

Materials and Procedures

In Experiments 1 and 2, four lists of 200 word items (average 7.09 letters and 2.34 syllables, with a Kucera-Francis corpus frequency of 8.85; 100 studied and 100 lure items for each cycle) were constructed for use in sequential study/test cycles. List and condition assignment was randomized for each participant. During study, participants rated words on the computer screen for the number of syllables (“Counting syllables 1/2/3/more than 4”) within a limited amount of time (2 sec), immediately followed by a forewarned memory test. Participants were not forewarned that feedback would be present during testing. In each test, studied and lure items were randomly intermixed and presented serially for self-paced OLD/NEW recognition judgments. Following the old/new response, the participant rated confidence on a scale of 1–3 (“Confidence? Unsure =1 2 3= Certain”). Feedback, when given, immediately followed the confidence report. The key and only manipulation across experiments was the nature of the feedback.

Probabilistic Biased Feedback Manipulation

In Experiment 1, the validity of the feedback given to errors was probabilistically altered in order to tacitly encourage lax or strict responding. More specifically, a random portion of a particular type of error (misses or false-alarms) was incorrectly reported as “correct”. Participants were correctly informed during correct responses (hits and correct rejections). Consistent with incremental reinforcement learning principles, the general expectation was that participants would learn to favor the decision more often linked to a positive feedback outcome (“correct” feedback indications) and/or would learn to avoid the response option that more often led to negative outcomes (“incorrect” feedback indications). The false-feedback manipulation was restricted to errors since they are typically of low confidence and hence incorrect feedback should not raise suspicions on the part of the participants. In Experiment 2 the balance of positive/negative feedback was instead shifted by simply omitting correct, negative feedback for one or the other class of error (availability manipulation). The analyses employed the detection theoretic estimate of accuracy, A_z (Rotello et al., 2008), and criterion, c.
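For readers unfamiliar with these measures, the following is a minimal sketch of how a criterion estimate can be obtained from one hit rate and one false alarm rate under the equal-variance Gaussian model. It is not the analysis code used here: the function name `sdt_estimates` is hypothetical, and the `area` value is only the ROC area implied by equal variances, an assumption that stands in for A_z rather than reproducing how A_z was estimated.

```python
from math import sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal, used for z-transforms


def sdt_estimates(hit_rate, fa_rate):
    """Equal-variance SDT estimates from a single hit/false-alarm pair.

    Rates of exactly 0 or 1 have no finite z-score and must be corrected
    before calling this function.
    """
    z_hit = _Z.inv_cdf(hit_rate)
    z_fa = _Z.inv_cdf(fa_rate)
    d_prime = z_hit - z_fa
    c = -0.5 * (z_hit + z_fa)         # positive = conservative, negative = liberal
    area = _Z.cdf(d_prime / sqrt(2))  # ROC area implied by equal variances (assumption)
    return {"d_prime": d_prime, "c": c, "area": area}


# Illustrative call with a hit rate of .71 and a false alarm rate of .15.
print(sdt_estimates(0.71, 0.15))
```

Because group means of per-participant estimates are not the same as estimates computed from group mean rates, values obtained this way from tabled hit and false alarm rates need not match tabled criterion means exactly.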

EXPERIMENT 1 - Criterion learning based on the probabilistic false feedback

The goal of Experiment 1 was to determine if a probabilistic variant of the false-feedback procedure would induce criterion shifts. Half of the participants were given false positive feedback “That is CORRECT” for approximately 70 percent of their incorrect “Old” classifications of new items (false alarms). All other responses received correct feedback. We refer to this as the Lax condition (L). For the other half of participants, approximately 70 percent of incorrect “New” classifications of old items (misses) received false positive feedback (S - Strict condition). Each group received the same manipulation (L or S) on the first two successive study/test cycles. Following this, two additional study/test cycles were given with no feedback whatsoever during testing (N - no feedback). This allowed us to assess durability of criterion learning in the absence of any external reinforcement. Thus there were two groups, one receiving LLNN feedback conditions and the other SSNN.
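The feedback contingencies just described can be summarized as a simple rule. The sketch below is a hypothetical reconstruction for illustration, not the experiment software: the function name, its arguments, and the wording of the negative feedback string ("That is INCORRECT") are assumptions, and the 0.70 probability is the approximate value stated above.

```python
import random


def experiment1_feedback(is_old, said_old, condition, rng, p_false=0.70):
    """Return the feedback shown after one test trial (None = no feedback).

    condition: "L" (lax: false alarms may be called correct),
               "S" (strict: misses may be called correct),
               "N" (no feedback at all).
    """
    if condition == "N":
        return None
    if is_old == said_old:            # hit or correct rejection: accurate feedback
        return "That is CORRECT"
    false_alarm = said_old and not is_old
    # The response is an error here, so "not a false alarm" means a miss.
    manipulated_error = false_alarm if condition == "L" else not false_alarm
    if manipulated_error and rng.random() < p_false:
        return "That is CORRECT"      # false positive feedback
    return "That is INCORRECT"        # assumed wording for negative feedback


# Example: feedback for five false alarms in the Lax condition.
rng = random.Random(0)
print([experiment1_feedback(False, True, "L", rng) for _ in range(5)])
```

Under this rule an "old" response in the Lax condition still occasionally draws negative feedback, which is the feature that distinguishes the probabilistic procedure from a fully deterministic one.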

RESULTS & DISCUSSION

Accuracy (A_z)

A two-way ANOVA for A_z with factors of Group (LLNN or SSNN) and Test (First, Second, Third or Fourth) yielded no main effect of Group (p > .84) or Test (p > .09), and no evidence for an interaction between Group and Test (p > .32) suggesting that the groups displayed similar accuracy during each test (Table 1).

Table 1.

Experiment 1 accuracy and decision criterion estimates across groups and tests

	Strict-Strict-No FB-No FB				Lax-Lax-No FB-No FB

	TEST1	TEST2	TEST3	TEST4	TEST1	TEST2	TEST3	TEST4
Hit	.71(.10)	.68(.13)	.67(.17)	.63(.17)	.76(.07)	.78(.07)	.78(.08)	.74(.08)
False Alarm	.15(.08)	.17(.08)	.20(.11)	.20(.09)	.20(.07)	.26(.10)	.31(.17)	.29(.13)
A_z	.83(.10)	.83(.06)	.81(.10)	.79(.08)	.85(.04)	.81(.08)	.79(.10)	.82(.08)
c	.26(.31)	.26(.30)	.20(.36)	.26(.35)	.06(.21)	−0.07(.24)	−0.12(.32)	−0.02(.25)
Proportion of Manipulated Trials	20.5(6.76)	21.3(8.16)	N/A	N/A	14.8(4.23)	18.9(6.76)	N/A	N/A


Note: Values in parentheses indicate standard deviations. FB = feedback

Decision Criteria (c)

ANOVA for decision criteria c with factors of Group and Test revealed a main effect of Group (F(1,28) = 11.85, p < .01, η²_p = .30) with the SSNN group demonstrating a more conservative criterion (mean c = .24) than the LLNN group (−.03). There was no main effect of Test (p > .19) and no evidence for an interaction between Group and Test (p > .65) suggesting a persistent difference in criterion across the two groups regardless of test. Pair-wise comparisons of the groups’ criteria during each of the four separate tests were all significant (t(28) = 2.05, 3.37, 2.57, & 2.48 respectively), although the smallest numerical difference in criteria across the groups was during Test 1 (Table 1).

The probabilistic nature of the current feedback manipulation would have precluded the belief that a given type of response never resulted in errors, yet a relative shift was nonetheless induced. Furthermore, the no-feedback condition ruled out interpretations that necessarily rely on the continued presence of feedback. For example, if the criterion shift represented a trial-to-trial win-stay strategy (Frank & Kong, 2008) on the part of the participants, removing feedback should have eliminated the relative criterion differences. Finally, we parsed Tests 1 and 2 into cumulative sub-blocks of 40 trials (40, 80, 120, 160, and 200 trials) to examine the emergence of the relative shift of criterion c in a finer grained manner within each test. Test 1 yielded a significant interaction (F(4,92) = 2.86, p < .05, η²_p = .11) between Group and Cumulative Sub-block, reflecting an increasingly larger criterion group difference as the total amount of false feedback accumulated within the test. The same analysis during Test 2 yielded a main effect of Group (F(1,23) = 7.52, p < .05, η²_p = .25) and no evidence for the interaction between Group and Sub-block (p = .82), suggesting that the relative difference acquired during Test 1 had already reached asymptote and was carried largely intact into Test 2. Partially supporting this conclusion, when the criterion measures for each group were compared across the tests, the SSNN group showed no difference between Tests 1 and 2 (p > .96), although the LLNN group did show a more liberal criterion in Test 2 versus Test 1 (t(14) = 2.48, p < .05). Overall, these findings suggest that some continued learning may take place across Tests 1 and 2, but that the vast majority of criterion learning has occurred prior to the conclusion of Test 1, as indicated by a failure to find a Group by Cumulative Sub-block interaction in the second test. These findings are consistent with an incrementally learned recognition decision tendency¹.
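The cumulative sub-block analysis can be sketched as follows. This is an illustrative reconstruction only: the function name `cumulative_criteria`, the trial representation, and the log-linear correction for extreme rates are assumptions and are not described in the text.

```python
from statistics import NormalDist


def cumulative_criteria(trials, block_size=40):
    """Criterion c over the first 40, 80, 120, ... trials of one test.

    trials: sequence of (is_old, said_old) booleans in presentation order.
    """
    z = NormalDist()
    criteria = []
    for end in range(block_size, len(trials) + 1, block_size):
        window = trials[:end]
        n_old = sum(1 for is_old, _ in window if is_old)
        n_new = end - n_old
        hits = sum(1 for is_old, said_old in window if is_old and said_old)
        fas = sum(1 for is_old, said_old in window if not is_old and said_old)
        # Log-linear correction (an assumption) keeps the z-scores finite
        # when a rate would otherwise be exactly 0 or 1.
        hit_rate = (hits + 0.5) / (n_old + 1)
        fa_rate = (fas + 0.5) / (n_new + 1)
        criteria.append(-0.5 * (z.inv_cdf(hit_rate) + z.inv_cdf(fa_rate)))
    return criteria


# Hypothetical 200-trial test from an observer who always responds "old".
always_old = [(i % 2 == 0, True) for i in range(200)]
print(cumulative_criteria(always_old))
```

A Group by Cumulative Sub-block interaction then corresponds to the two groups' lists of criteria diverging as more trials are included.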

EXPERIMENT 2 - Criterion learning based on net feedback outcomes

Based on the prior findings it could be argued that it was the unexpectedly positive outcomes of the manipulated feedback trials that are particularly important for the learning (e.g., Butterfield & Metcalfe, 2001; Schmidt et al., 1989). While this would not preclude a core role for incremental reinforcement learning, it should nonetheless be possible to demonstrate adaptive criteria whenever the balance of positive to negative reinforcement systematically favors one decision. Experiment 2 differentially reinforced the judgments by withholding feedback for certain types of errors. Thus from the subject’s perspective some small portion of trials simply failed to elicit feedback. These neutral, uninformative feedback trials should not reflect unexpectedly positive (or negative) outcomes, but they nonetheless would serve to shift the balance of reinforcement for the two decision types. For half the participants, the first two tests selectively encouraged lax responding by eliminating the negative feedback for their false alarms. All other response types were correctly identified by the feedback (Lax condition). For the other half of participants, their miss responses received no feedback (Strict condition). Thus for each group, one response type was associated with positive and negative outcomes whereas the other was associated with positive and neutral (no feedback) outcomes. Again, all feedback was eliminated during tests 3 and 4 (LLNN or SSNN).
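The availability manipulation amounts to a small change in the feedback rule. As before, the sketch below is a hypothetical reconstruction for illustration; the function name and the feedback strings are assumptions, and `None` stands for a trial on which no feedback appeared.

```python
from collections import Counter


def experiment2_feedback(is_old, said_old, condition):
    """Return the feedback shown after one test trial (None = no feedback).

    condition: "L" (lax: feedback withheld for false alarms),
               "S" (strict: feedback withheld for misses),
               "N" (no feedback at all).
    """
    if condition == "N":
        return None
    if is_old == said_old:            # hit or correct rejection
        return "That is CORRECT"
    false_alarm = said_old and not is_old
    withheld = false_alarm if condition == "L" else not false_alarm
    return None if withheld else "That is INCORRECT"


# Outcomes attached to each response type in the Lax condition.
outcomes = Counter()
for is_old in (True, False):
    for said_old in (True, False):
        response = "old" if said_old else "new"
        outcomes[(response, experiment2_feedback(is_old, said_old, "L"))] += 1
print(outcomes)
```

In the Lax condition "old" responses are followed only by positive or absent feedback, whereas "new" responses are followed by positive or negative feedback, which is the asymmetry the manipulation relies on.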

RESULTS & DISCUSSION

Accuracy

ANOVA for A_z with factors of Group and Test yielded a significant main effect only of Test (F(3,96)=6.29, p<.001, η²_p = .16), with accuracy gradually declining across the entire experiment. Importantly, there was no interaction between Group and Test (p > .30) (Table 2).

Table 2.

Experiment 2 accuracy and decision criterion estimates across groups and tests

	Strict-Strict-No FB-No FB				Lax-Lax-No FB-No FB

	TEST1	TEST2	TEST3	TEST4	TEST1	TEST2	TEST3	TEST4
Hit	.63(.17)	.59(.17)	.59(.15)	.59(.16)	.73(.10)	.74(.13)	.77(.10)	.68(.12)
False Alarm	.16(.09)	.16(.09)	.21(.11)	.22(.13)	.26(.13)	.33(.16)	.34(.19)	.33(.16)
A_z	.78(.11)	.77(.09)	.75(.07)	.74(.09)	.82(.06)	.77(.07)	.79(.08)	.74(.08)
c	.33(.35)	.40(.38)	.31(.39)	.31(.41)	.04(.33)	−0.09(.40)	−0.15(.38)	0.00(.36)
Proportion of Manipulated Trials	36.8(16.67)	40.5(17.08)	N/A	N/A	25.6(12.88)	33.0(16.21)	N/A	N/A


Note: Values in parentheses indicate standard deviations. FB = feedback

Decision Criteria

ANOVA for c with factors of Group and Test revealed a main effect of Group (F(1,32) = 11.20, p < .01, η²_p = .25) (.34 vs. −.05 for SSNN vs. LLNN group, respectively). There was no main effect of Test (p > .11) or interaction between Group and Test (p > .07). Pair-wise comparisons of the groups’ criteria at each of the four separate tests were all significant (t(32) = 2.48, 3.61, 3.50, & 2.43 respectively), although again, the smallest numerical difference in criteria across the groups was during the first test (Table 2). To our knowledge, this is the first demonstration that the selective availability of feedback can be used to guide memory decision criterion placement, or criterion placement in general. Again, a finer grained analysis by cumulative test sub-blocks revealed a clear interaction within Test 1 between Group and Sub-Block (F(4,112) = 10.98, p < .01, η²_p = .28) suggesting a gradual acquisition of the learned criterion as withheld feedback accrued. The same analysis during Test 2 merely approached significance (F(4,116) = 2.40, p = .053, η²_p = .08) suggesting a small increase in criterion differences as further withheld feedback accrued. When Tests 1 and 2 criterion measures were directly compared for each Group, the differences were not significant across the tests for either the SSNN Group (p > .26) or the LLNN Group (p > .09). Similar to Experiment 1, this overall pattern suggests that the bulk of criterion learning occurred during the initial test, although some small degree of additional learning or relearning may have occurred during the second test.

General Discussion

A fundamental role for incremental reinforcement learning in episodic memory judgments has not been suggested in humans (cf. Wixted & Gaitan, 2002, for non-human animals). Although it is difficult if not impossible to completely rule out a role for explicitly maintained strategies in criterion shift experiments (Unkelbach, 2006), and awareness questionnaires potentially tap only a portion of subject awareness (e.g., Merikle & Reingold, 1991), it is noteworthy that none of the participants included here reported awareness of the biased nature of the feedback manipulations. Furthermore, the current findings are quite similar to other classification learning phenomena that do not require explicit awareness of reward contingencies for learning. In total, these considerations support the notion that the current effects do not require participants to formulate explicit, rule-based strategies in reaction to the biased feedback manipulations, and they clearly demonstrate that no alteration in the test materials themselves is necessary in order to induce a criterion shift (cf. Rhodes & Jacoby, 2007; Verde & Rotello, 2007). Instead, an incremental reinforcement learning framework suggests that the current manipulations led to shifted decision preferences based on the relation of positive/negative outcomes and levels of recognition evidence.
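To illustrate how such a framework could produce the observed pattern, the toy simulation below implements one possible error-driven learner. It is a sketch under strong assumptions and is not a model fitted or proposed here: the update rule, the step size, d′, the trial count, and the function name `simulate_criterion` are all hypothetical. The learner nudges its criterion only after negative feedback, raising it after a signaled false alarm and lowering it after a signaled miss, under the probabilistic false-feedback contingencies of Experiment 1.

```python
import random


def simulate_criterion(condition, n_trials=2000, d_prime=1.5, step=0.05,
                       p_false=0.70, seed=0):
    """Hypothetical error-driven learner; returns the criterion after each trial.

    condition: "L" (false alarms often called correct) or
               "S" (misses often called correct).
    """
    rng = random.Random(seed)
    c = 0.0                                   # start at the unbiased criterion
    history = []
    for _ in range(n_trials):
        is_old = rng.random() < 0.5
        mean = d_prime / 2 if is_old else -d_prime / 2
        evidence = rng.gauss(mean, 1.0)       # memory evidence for this probe
        said_old = evidence > c
        if said_old != is_old:                # an error was made
            false_alarm = said_old
            manipulated = false_alarm if condition == "L" else not false_alarm
            told_correct = manipulated and rng.random() < p_false
            if not told_correct:              # learn only from negative feedback
                c += step if false_alarm else -step
        history.append(c)
    return history


lax = simulate_criterion("L")
strict = simulate_criterion("S")
lax_mean = sum(lax[-1000:]) / 1000
strict_mean = sum(strict[-1000:]) / 1000
print(f"mean criterion, last 1000 trials: lax = {lax_mean:.2f}, strict = {strict_mean:.2f}")
```

In this sketch the two conditions settle on different criteria without any explicit rule, and because updates occur only after feedback, the learned criterion would simply persist once feedback is removed. Whether human observers use anything like this rule is an open question.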

The current data add to early evidence suggesting different routes to regulating episodic recognition decisions. The first, which has been extensively documented in the recognition literature, is an explicit strategy on the part of the observers typically formed following overt warnings or instructions. Furthermore, in those cases where feedback accompanied a detectable criterion shift for altered lists (e.g., Rhodes & Jacoby, 2007), the feedback likely alerted the subjects to the list manipulation, and thus likely represents a similar strategy to those adopted by subjects following explicit instructions or warnings about the riskiness of certain responses. In contrast, the current findings suggest that subjects also appear to develop, through reinforcement learning, incrementally acquired tendencies that durably change the mapping of memory evidence types or levels onto decisions. Similar to the acquisition of habits in other domains, these learned criterion shifts may not require subjects to maintain the intention of responding liberally or conservatively across the multiple trials of the test. Because current models of episodic recognition judgment typically do not assume two independent or partially independent decision influences, future work directly contrasting and attempting to doubly dissociate these putative mechanisms using various methods (e.g., behavioral, functional neuroimaging, special populations) holds promise for further elucidating the mechanisms that regulate the translation of memory content into judgments.

Footnotes

We declare that this manuscript is original, has not been published before, and is not currently being considered for publication elsewhere.

Although we do not present the data due to space considerations, individual variability in the number of false feedback trials modulated the size of induced criterion shifts. For example, a median split of observers receiving high versus low amounts of manipulated feedback demonstrated that the criterion shift was more prominent for subjects receiving high amounts of manipulated feedback during Experiments 1 and 2. It was not significant when the low subgroups were compared at each test level. Of course this outcome is expected if the feedback manipulation is the cause of the shift, and variability in the composition of the feedback across subjects is inherent in all designs that use individual performance feedback to modulate behavior (e.g., Rhodes & Jacoby, 2007).

References

Ashby F, Maddox W. Human category learning. Annual Review of Psychology. 2005;56:149–178. doi: 10.1146/annurev.psych.56.091103.070217. [DOI] [PubMed] [Google Scholar]
Ashby FG, O’Brien J. The effects of positive versus negative feedback on information-integration category learning. Perception & Psychophysics. 2007;69(6):865–878. doi: 10.3758/bf03193923. [DOI] [PubMed] [Google Scholar]
Azimian-Faridani N, Wilding E. The influence of criterion shifts on electrophysiological correlates of recognition memory. Journal of Cognitive Neuroscience Vol. 2006;18(7):1075–1086. doi: 10.1162/jocn.2006.18.7.1075. [DOI] [PubMed] [Google Scholar]
Butterfield B, Metcalfe J. Errors committed with high confidences are hypercorrected. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2001;27:1491–1494. doi: 10.1037//0278-7393.27.6.1491. [DOI] [PubMed] [Google Scholar]
Cincotta CM, Seger CA. Dissociation between striatal regions while learning to categorize via feedback and via observation. Journal of Cognitive Neuroscience. 2007;19(2):249–265. doi: 10.1162/jocn.2007.19.2.249. [DOI] [PubMed] [Google Scholar]
Dorfman DD, Biderman M. A learning model for a continuum of sensory states. Journal of Mathematical Psychology. 1971;8(2):264–284. [Google Scholar]
Estes WK, Maddox W. Interactions of stimulus attributes, base rates, and feedback in recognition. Journal of Experimental Psychology: Learning, Memory, & Cognition. 1995;21(5):1075–1095. doi: 10.1037//0278-7393.21.5.1075. [DOI] [PubMed] [Google Scholar]
Frank M, Kong L. Learning to avoid in older age. Psychology and Aging. 2008;23:392–398. doi: 10.1037/0882-7974.23.2.392. [DOI] [PubMed] [Google Scholar]
Gluck MA, Bower GH. Evaluating an adaptive network model of human learning. Journal of Memory and Language. 1988;27(2):166–195. [Google Scholar]
Han S, Dobbins IG. Examining recognition criterion rigidity during testing using a biased feedback technique: Evidence for a adaptive criterion learning. Memory & Cognition. 2008;36(4):703–715. doi: 10.3758/mc.36.4.703. [DOI] [PMC free article] [PubMed] [Google Scholar]
Healy AF, Kubovy M. A comparison of recognition memory to numerical decision: How prior probabilities affect cutoff location. Memory & Cognition. 1977;5(1):3–9. doi: 10.3758/BF03209184.
Hirshman E. On the utility of the signal detection model of the remember-know paradigm. Consciousness and Cognition: An International Journal. 1998;7(1):103–107. doi: 10.1006/ccog.1998.0330.
Johnson MK, Hashtroudi S, Lindsay D. Source monitoring. Psychological Bulletin. 1993;114(1):3–28. doi: 10.1037/0033-2909.114.1.3.
Knowlton BJ, Mangels JA, Squire LR. A neostriatal habit learning system in humans. Science. 1996;273(5280):1399–1402. doi: 10.1126/science.273.5280.1399.
Macmillan NA, Creelman C. Detection theory: A user’s guide. New York, NY: Cambridge University Press; 1991.
Mehta R, Williams D. Elemental and configural processing of novel cues in deterministic and probabilistic tasks. Learning and Motivation. 2002;33:456–484.
Merikle P, Reingold E. Comparing direct (explicit) and indirect (implicit) measures to study conscious memory. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1991;17(2):224–233.
Poldrack RA, Sabb FW, Foerde K, Tom SM, Asarnow RF, Bookheimer SY, et al. The neural correlates of motor skill automaticity. Journal of Neuroscience. 2005;25(22):5356–5364. doi: 10.1523/JNEUROSCI.3880-04.2005.
Rhodes MG, Jacoby LL. On the dynamic nature of response criterion in recognition memory: Effects of base rate, awareness, and feedback. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2007;33(2):305–320. doi: 10.1037/0278-7393.33.2.305.
Robinson AL, Heaton RK, Lehman RAW, Stilson DW. The utility of the Wisconsin Card Sorting Test in detecting and localizing frontal lobe lesions. Journal of Consulting and Clinical Psychology. 1980;48(5):605–614. doi: 10.1037//0022-006x.48.5.605.
Rotello CM, Macmillan NA, Reeder JA, Wong M. The remember response: Subject to bias, graded, and not a process-pure indicator of recollection. Psychonomic Bulletin & Review. 2005;12(5):865–873. doi: 10.3758/bf03196778.
Rotello CM, Masson MEJ, Verde M. Type I error rates and power analyses for single-point sensitivity measures. Perception & Psychophysics. 2008;70(2):389–401. doi: 10.3758/pp.70.2.389.
Saint-Cyr JA, Taylor AE, Lang AE. Procedural learning and neostriatal dysfunction in man. Brain. 1988;111:941–959. doi: 10.1093/brain/111.4.941.
Schmidt RA, Young DE, Swinnen S, Shapiro DC. Summary knowledge of results for skill acquisition: Support for the guidance hypothesis. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1989;15(2):352–359. doi: 10.1037//0278-7393.15.2.352.
Schultz W. Multiple reward signals in the brain. Nature Reviews Neuroscience. 2000;1:199–207. doi: 10.1038/35044563.
Strack F, Foerster J. Reporting recollective experiences: Direct access to memory systems? Psychological Science. 1995;6(6):352–358.
Thomas EA. On a class of additive learning models: Error-correcting and probability matching. Journal of Mathematical Psychology. 1973;10(3):241–264.
Unkelbach C. The learned interpretation of cognitive fluency. Psychological Science. 2006;17(4):339–345. doi: 10.1111/j.1467-9280.2006.01708.x.
Van Zandt T. ROC curves and confidence judgments in recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition. 2000;26(3):582–600. doi: 10.1037//0278-7393.26.3.582.
Verde MF, Rotello CM. Memory strength and the decision process in recognition memory. Memory & Cognition. 2007. doi: 10.3758/bf03193446.
Wixted JT, Gaitan SC. Cognitive theories as reinforcement history surrogates: The case of likelihood ratio models of human recognition memory. Learning & Behavior. 2002;30(4):289–305. doi: 10.3758/bf03195955.
Yonelinas AP. Receiver-operating characteristics in recognition memory: Evidence for a dual-process model. Journal of Experimental Psychology: Learning, Memory, & Cognition. 1994;20(6):1341–1354. doi: 10.1037//0278-7393.20.6.1341.