Author manuscript; available in PMC: 2009 Jun 9.
Published in final edited form as: Mem Cognit. 2009 Jan;37(1):52–64. doi: 10.3758/MC.37.1.52

Remembering Words Not Presented in Sentences: How Study Context Changes Patterns of False Memories

Laura E Matzen 1, Aaron S Benjamin 1
PMCID: PMC2694445  NIHMSID: NIHMS111796  PMID: 19103975

Abstract

People falsely endorse semantic associates and morpheme rearrangements of studied words at high rates in recognition testing. The coexistence of these results is paradoxical: Models of reading that presume automatic extraction of meaning cannot account for elevated false memory for foils that are related to studied stimuli only by their visual form, and models without such a process cannot account for false memory for semantic foils. Here we show how sentence and list study contexts encourage different encoding modes and consequently lead to different patterns of memory errors. Participants studied compound words such as “tailspin” and “floodgate” as single words or embedded in sentences. We show that sentence contexts led subjects to be better able to discriminate conjunction lures (“tailgate”) from old words than did list contexts. Conversely, list contexts led to superior discrimination of semantic lures (“nosedive”) from old words than did sentence contexts.

Keywords: recognition memory, false memory, context effects, conjunction errors, semantic errors, reading, encoding strategies


Memory errors provide a rich source of data for investigating the structure and organization of human memory. Examination of the factors that lead to the creation of false memories tells us about the underlying composition and arrangement of memory representations. A large body of research has investigated memory errors related to aspects of verbal stimuli, and these studies have found a surprising variety of errors for different kinds of materials. While some studies have found that participants make memory errors by endorsing words or sentences that are similar to studied items in terms of their meaning (Johnson, Bransford & Solomon, 1973; Roediger & McDermott, 1995), others have found that participants endorse words that are similar to the studied items only in visual appearance (Jones & Jacoby, 2001; Underwood & Zimmerman, 1973). These two patterns of results have very different implications in terms of what information about a studied word is encoded and retained in memory.

The strategy of using memory errors to infer something about the nature of memory representations is one that has been widely employed in social psychology (e.g., Hastie & Kumar, 1979; Payne, Jacoby, & Lambert, 2004), as well as in some subdomains of cognitive psychology, such as source memory (e.g., Bayen, Nakamura, DuPuis, & Yang, 2000; Hicks & Cockman, 2003). However, the recent rush in interest in false memory motivated by a report by Roediger & McDermott (1995; see also Deese, 1959) has spurred more theorizing about the retrieval, matching, or decision processes that yield such errors (e.g., Benjamin, 2001; Brainerd & Reyna, 1998; 2001; Israel & Schacter, 1997; Gallo & Roediger, 2002; Miller & Wolford, 1999; Payne, Elie, Blackwell, & Neuschatz, 1996) than about how memory errors can be exploited to better understand the types of representations that promote them. Here we examine the two most commonly employed experimental contexts for the study of verbal stimuli—lists and sentences—and show that those contexts modulate the types of memory errors that arise. By doing so, we show that study context influences the form and not just the strength of memory representations, and we provide a preliminary explanation of specifically how this takes place.

Our primary goal in this article is to demonstrate that different study contexts have predictable effects on patterns of false memory, effects that reveal something about the nature of encoding strategies in those contexts. The results also have several interrelated implications for understanding the nature of memory errors and how to study them. Principally, we show that overemphatic theorizing about the processes at test that promote false memory misses part of the picture: Processes at encoding set the stage for false memory by promoting representations that are biased towards the goals of the learner (see, e.g., Benjamin, 2008). Such bias determines which representations are confusable with one another, and thus what types of false memory are observed. Thus, a second implication is that it can be misleading to compare or collapse across measures of false memory that appear similar but follow different encoding regimens. We show that very different types of information are extracted from individual words embedded in meaningful sentence contexts and from those same words when they are studied in lists.

Types of memory errors

First, we review two of the most prominently studied types of memory errors and argue that the presumed bases for these errors are somewhat contradictory and their coexistence thus somewhat problematic for current theory. Conjunction errors occur when participants mistakenly endorse test words that are perceptually or phonetically recombined versions of actually studied words (Jones & Jacoby, 2001; Reinitz, Lammers, & Cochran, 1992; Underwood, Kapelak, & Malmi, 1976; Underwood & Zimmerman, 1973). The study stimuli in such experiments are usually individual compound words such as “blackmail” and “jailbird,” and the critical lures at test are rearrangements of those words, such as “blackbird.” An analogous pattern of errors is evident for lures that consist of recombined syllables of shorter study words such as “instruct” and “consult” (where “insult” is the lure; Underwood & Zimmerman, 1973). In these experiments, semantic relationships between the studied words and the conjunction lures are typically minimized in order to rule out the possibility that semantics might underlie the effect. Thus, the high rate of such conjunction errors suggests that the surface forms of words and syllables are maintained in memory for some time, leading to a misleading sense of familiarity when word components are recombined.

These results are surprising in light of other studies that show that participants remember the semantics of studied items but retain little information about their surface forms. Studies using sentences or stories as stimuli (Bransford & Franks, 1971; Brewer, 1977; Johnson, Bransford, & Solomon, 1973) have found that participants have little or no memory for the surface forms of the words or sentences that they have studied, thus leading to semantic errors. Participants seem to distill these longer stimuli down to their basic meaning, losing any information about the exact structure of the sentences (cf. Bock & Brewer, 1974). Potter and Lombardi (1990) showed that although readers who were engaged in sentence processing tasks were largely accurate in their recall of the sentences, they seemed to be reconstructing each sentence based on their memory for its message-level meaning rather than using stored information about the exact words and their order. This led participants to substitute semantically similar words for the original words in the sentences, even in immediate recall.

Analogous results have been found in numerous experiments using recognition tasks. For example, people are likely to endorse test items if they contain the same basic ideas as sentences on the study list, even if those ideas are in very different sentence structures and are combined with additional related sentences (Bransford & Franks, 1971). Similarly, when participants are presented with short stories describing an event that had a probable but unstated consequence, they incorrectly endorse test sentences about the implied event (Johnson, Bransford, & Solomon, 1973). For example, having heard “The boy hit the baseball and watched as it flew into the picture window in the house,” participants are likely to endorse as previously heard a statement about a baseball breaking a window. Although this statement was never actually heard, it did follow logically from the events of the heard story. Brewer (1977) found that participants were sometimes even more likely to remember the unstated implication of a sentence than the original sentence itself. After hearing the sentence “The hungry python caught the mouse,” participants were far more likely to recall “The hungry python ate the mouse” than they were to correctly recall the original sentence. These memory errors indicate that participants remember the gist of the studied sentences but little about their surface forms. When asked to recognize or recall the surface forms of the studied sentences, participants reconstruct the sentences using a combination of the gist information they have stored and their knowledge of likely events in the world.

Similar effects have been found in experiments using word lists containing semantically or associatively related words. Using categorized word lists of common semantic associates, Roediger and McDermott (1995) found that participants often falsely recalled and recognized the unstudied associate from which those stimuli were drawn. This effect occurs even when the relationship between words is purely semantic and not associative (e.g., Benjamin & Bawa, 2004; Shiffrin, Huber, & Marinelli, 1995).

In each of these experiments, the critical memory error—the false recognition of or recall for material semantically related to studied material—reveals that information about the form of the individual sentences, stories, or thematic lists was lost in memory, leaving only abstract representations of their basic meanings. At test, the participants relied on the gist of the items they had studied, often in combination with their own inferences and world knowledge. Without access to any information about the surface forms of the original words and sentences, participants were highly susceptible to memory errors based on similarity in meaning between the study and test items.

Influences of study context

On the face of it, the existence of these two types of errors seems paradoxical. If participants rapidly lose information about the surface characteristics of linguistic stimuli, then why do they false alarm to semantically dissimilar but physically related lures? Likewise, if they retain those surface characteristics, then why are they prone to falsely endorsing semantically related lures? In this paper, we consider the question of whether the context of study can modulate the type of encoding and consequently the form of memory for words. We start from the perspective of Benjamin (2008), who argued that encoding is always strategic, and that any evaluation of memory performance requires an assessment of the learner’s goals and the task affordances. Sentences imply a very different goal set for the learner than do lists of unrelated words. In almost every instance of the participants’ lives prior to entering this experiment, their memory for sentences has been “assessed” by their ability to recall the semantics of the material. Students are not instructed to repeat the text back verbatim on essay tests (in fact, they may encounter charges of plagiarism if they do); telling stories among friends requires the adequate reconstruction of events in a series rather than the reproduction of specific words. The rare instances in which verbatim reproduction is valued, such as reciting the Gettysburg Address or retelling a joke with peculiar syntax and specific words that are critical to the humor, are difficult and prone to error.

On the other hand, encountering lists of words provides a very different context and set of goals. Grocery lists, to-do lists, and vocabulary terms for a foreign-language test are all contexts that emphasize the need for verbatim retention: Remembering that I need to get “food” rather than a set of specific items when I reach the grocery store is useless; the burden is on me to remember the exact individual items rather than their gist.

Due to this accumulation of experience in day-to-day life, participants in an experimental setting are likely to take different approaches to words that are presented in different study contexts. Sentence or story contexts encourage participants to discard surface information, likely because the context implies its lack of future usefulness and because of the considerable demand on the systems underlying encoding and comprehension to retain the surface information for a number of sentences. Participants likely focus instead on the meaning of each item, or on associations between words in the sentence or in the thematic list. From the strategic-encoding perspective, then, we assume that sentences should elicit a relatively greater evaluation of the semantic content of individual words, whereas lists should encourage a lower-level retention strategy that promotes greater verbatim recall.

These ideas also relate to the transfer-appropriate processing account for memory performance. This account holds that memory performance is enhanced to the degree that the same kinds of processing are used during both study and test. For example, Morris, Bransford, and Franks (1977) showed that participants performed better on a semantic recognition task after doing semantic processing during encoding, but performed better on a rhyme recognition task after doing rhyme processing during encoding. A levels-of-processing account would predict that participants would do better on both tests after doing semantic processing during encoding, in which case the words should be more deeply encoded. However, the pattern found by Morris and colleagues showed that the match between the types of processing called for at study and test outweighed the effects of deeper processing at encoding.

With respect to false memory, the degree of match or mismatch between the type of processing that a participant uses during study and the types of lures presented at test should play a role in determining the participant’s susceptibility to the lures. A strategy of discarding surface information and encoding information at a deep, semantic level should give rise to the types of semantic memory errors seen in experiments in which participants remembered the general theme of the items but little about their exact form. At the same time, this study strategy should make participants less susceptible to conjunction errors of the type reported by Jones and Jacoby (2001). With less information about the visual forms of the words they had studied, participants would not experience such a high degree of match between physically similar lures and memory for the study list. This would lead to fewer false memories in response to the conjunction lures.

The opposite pattern of false memories would then obtain for participants who study lists of individual, decontextualized words. This study context signals to the comprehension system that extracting meaning is difficult and less useful, and perhaps that no clear “gist” is being formed across the study session. Thus, surface structure is retained to a greater degree and the information that participants retain makes them less susceptible to falsely endorsing semantic lures. The cost of this process is that lures that are composed of rearranged surface structures become more alluring by virtue of their relatively greater match with the contents of memory for the study episode. The benefit is that, without the context provided by a sentence or story, people may be more likely to remember specific details about the word rather than just the gist of its meaning within a larger unit. In this situation, semantically related test items may be less likely to lead to memory errors simply because people will have more specific memories about the studied words that could help them to reject the lures.

While some prior studies have compared memory for words studied out of context to those studied in sentences (e.g. Murnane & Shiffrin, 1991), their focus has been how the number of memory traces stored is affected by changes in context. Our focus in the present study is how changes in context influence the nature of the information that is encoded for studied words and how those changes can account for seemingly discrepant patterns of memory errors in the previous literature on false memory. To our knowledge, this is the first study to investigate the processing of compound words within sentence contexts. In addition, although there has been some investigation of false memories for sentences (cf. Reinitz et al., 1992), the sentences used typically have the same basic frame for all items (such as “The X saw the Y” with different nouns substituted for X and Y) and provide little meaningful context. The current study used much richer and more natural sentence contexts, more like those a reader would encounter in everyday life.

In summary, the nature of the information that people retain in memory when studying a list of words should influence a tradeoff between meaning-based and structure-based false memory. When words are placed in a rich semantic context, such as a sentence, the way in which they are processed and the information that is gleaned from them is likely to change. This change should influence the pattern of false memories, making people more susceptible to semantic lures but less susceptible to conjunction lures. In the present study, we conducted three experiments to test these predictions. In the first experiment, participants studied either a list of compound words or a list of sentences in which the same compound words were placed into sentence contexts. Both groups of participants were then given identical memory tests that included conjunction lures that were visually similar to the studied words. In the second experiment, participants studied the same lists of compound words or sentences but were given a memory test that included semantic lures that were similar in meaning to the studied words. In the third experiment, participants studied both single words and sentences and received a memory test that included both semantic and conjunction lures. We hypothesized that conjunction lures and old items would be less discriminable following word-list study than sentence-context study because those lures place a premium on memory for surface structure. Similarly, semantic lures should be less discriminable from old items following sentence-context study than word-list study.

Analytic techniques in the measurement of false recognition

Traditional studies of false memory evaluate false remembering in several ways. Most commonly, they examine mean false alarm rates between conditions. This strategy is appropriate when the response policy is equivalent between the relevant conditions, and a detection-theoretic interpretation of this analysis is depicted in the top panel of Figure 1. As long as the rememberer employs the same response criterion for endorsing an item across conditions, the false alarm rate reveals something about the relative proportion of items that surpass that criterion, or how compelling those items are to the rememberer. As can be seen, this strategy does not need to deal with the location of the criterion (the dotted line) nor with the location of the distribution for studied items because they remain constant across the conditions of interest.
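The logic of the top panel can be illustrated with a minimal sketch, assuming an equal-variance Gaussian model. The distribution means, the criterion value, and the function name `false_alarm_rate` are illustrative assumptions, not values or procedures from the experiments.

```python
from statistics import NormalDist


def false_alarm_rate(lure_mean: float, criterion: float, sd: float = 1.0) -> float:
    """Proportion of a Gaussian lure distribution that exceeds the criterion."""
    return 1.0 - NormalDist(mu=lure_mean, sigma=sd).cdf(criterion)


# Same criterion in both conditions (the assumption behind the top panel).
criterion = 1.0

# Hypothetical lure strengths: a more compelling lure has a higher mean.
weak_lure_fa = false_alarm_rate(lure_mean=0.0, criterion=criterion)
strong_lure_fa = false_alarm_rate(lure_mean=0.5, criterion=criterion)

print(f"weak lure FA rate:   {weak_lure_fa:.3f}")
print(f"strong lure FA rate: {strong_lure_fa:.3f}")
```

With the criterion held fixed, the ordering of the two false alarm rates follows directly from the ordering of the lure means, which is why the raw rates are interpretable in this case only.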

Figure 1.


Detection-theoretic representations of the assessment of false memory. Top panel: A direct comparison of false alarm rates is appropriate when a rememberer uses the same response criterion across conditions. Middle panel: Overall memory for the studied items does not differ across conditions but the rememberer uses different response criteria. In this case, the appropriate measurement of false memory is a measure of discriminability, such as a comparison of the distance between the distributions. Bottom panel: Both the overall memory for the studied items and the placement of response criteria differ across conditions. The appropriate measurement of false memory here is a comparison of the relative distances between distributions across the different conditions. The current experiments make the comparison shown in the lower panel by using Δ da as a measure of discriminability that can be compared across experimental conditions.

Alternatively, if the response policy is thought to differ between conditions but overall memory for the studied items does not, then the appropriate measure of false memory is not simply the false alarm rate, but rather an estimate of the discriminability of old and new items. This can be seen in the middle panel of Figure 1. Because the criteria differ with the conditions of interest, the false alarm rates reflect a confluence of false memory and different response policies. For example, if one were to compare false memory for different types of lures, one could be reasonably certain that memory for the actually studied old items would not vary with the manipulation, but that the response criterion might. In that case, a measure of discriminability or distance between the distributions circumvents the problems posed by different criteria.
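A short sketch of the middle-panel case, again assuming equal-variance Gaussian distributions: the false alarm rate moves with the criterion, while a distance measure such as d′ does not. The means and criterion placements are hypothetical, and the helper names are introduced here for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def rates(old_mean: float, lure_mean: float, criterion: float) -> tuple[float, float]:
    """Return (hit rate, false alarm rate) for unit-variance distributions."""
    hit = 1.0 - NormalDist(old_mean, 1.0).cdf(criterion)
    fa = 1.0 - NormalDist(lure_mean, 1.0).cdf(criterion)
    return hit, fa


def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Equal-variance discriminability: z(H) - z(F)."""
    return STANDARD_NORMAL.inv_cdf(hit_rate) - STANDARD_NORMAL.inv_cdf(fa_rate)


# Identical underlying memory (old mean 1.5, lure mean 0.0), different criteria.
lenient = rates(1.5, 0.0, criterion=0.5)
strict = rates(1.5, 0.0, criterion=1.0)

print("FA rates:", round(lenient[1], 3), round(strict[1], 3))  # these differ
print("d' values:", round(d_prime(*lenient), 3), round(d_prime(*strict), 3))  # these match
```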

The final case represents the current situation, in which one wishes to compare false memory for different types of items across different conditions. In the experiments in the present study, the participants studied words under experimental conditions that were likely to lead to different levels of overall memory as well as different response criteria across conditions. The items in the different experimental conditions differ in discriminability, as represented by the two distributions for old items in the lower panel of Figure 1. This makes direct comparison of distances, as shown in the middle panel, inappropriate. In addition, since it is likely that the different lure types promote different criteria, and that the differences in discriminability exacerbate these differences (e.g., Hirshman, 1995), the strategy shown in the top panel is inappropriate as well. Our strategy thus involves comparing the relative distances between distributions, between conditions, as we detail below.

Current analysis

The measure of relative discriminability used in this study is da, which is based on basic assumptions of the Theory of Signal Detection (Green & Swets, 1966), as applied to recognition memory (Egan, 1958), and effectively handles the evidence that, in recognition, the underlying probability distributions, unlike those shown in Figure 1, differ in variance as well as mean. Participants rated the test items based on whether they believed the words to be old or new and the rating data were used to generate isosensitivity functions1. da represents the shortest distance from the origin of a two-dimensional space to the isosensitivity function when plotted in normal-deviate coordinates (scaled by a constant), and is used here (see also Banks, 2000; Benjamin, Diaz, & Wee, 2008) as a measure of discrimination between old and unrelated new items, as well as between old items and lures. Its metric properties make it ideally suited for this novel analysis of false memory in which we compare susceptibility to memory errors across conditions with different overall levels of performance.
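The geometric definition of da can be sketched as follows. This is an illustration under stated assumptions rather than the fitting procedure used in the study: the isosensitivity function is fit here by ordinary least squares on the z-transformed points (published analyses often use maximum likelihood instead), and the example points come from a hypothetical unequal-variance model. With a fitted line z(H) = a + s·z(F), the shortest distance from the origin is |a| / sqrt(1 + s²), and da is that distance multiplied by sqrt(2).

```python
from math import sqrt
from statistics import NormalDist, mean

Z = NormalDist().inv_cdf  # probability -> normal deviate


def d_a(hit_rates: list[float], fa_rates: list[float]) -> float:
    """d_a from ROC points, via a least-squares line in z-coordinates."""
    zf = [Z(f) for f in fa_rates]
    zh = [Z(h) for h in hit_rates]
    mf, mh = mean(zf), mean(zh)
    slope = sum((x - mf) * (y - mh) for x, y in zip(zf, zh)) / sum(
        (x - mf) ** 2 for x in zf
    )
    intercept = mh - slope * mf
    # sqrt(2) times the perpendicular distance from the origin to the line.
    return sqrt(2.0) * intercept / sqrt(1.0 + slope**2)


# Hypothetical unequal-variance model: old items are more variable than new items.
old_items = NormalDist(mu=1.0, sigma=1.25)
new_items = NormalDist(mu=0.0, sigma=1.0)
criteria = [-0.5, 0.5, 1.5]  # three criteria, as a 4-point rating scale provides

hits = [1.0 - old_items.cdf(c) for c in criteria]
fas = [1.0 - new_items.cdf(c) for c in criteria]
example_da = d_a(hits, fas)
print(f"d_a = {example_da:.4f}")
```

For these noise-free points the result equals the distance between the means divided by the root-mean-square of the two standard deviations, which is the usual algebraic form of da.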

A particularly compelling lure leads to responses that are more similar to those seen for old items and less similar to those seen for unrelated new items. By comparing how well participants are able to discriminate lures from old items relative to how well they are able to discriminate new, unrelated items from old items, we are able to determine how compelling semantic and conjunction lures are relative to one another under conditions in which overall levels of performance and response bias are different. A high da value indicates that participants were largely successful at discriminating one group of items from another. For example, if the da value is higher for the old-new comparison than for the old-lure comparison, this indicates that participants were better at discriminating new items from old items than they were at discriminating lures from old items. In other words, that difference would indicate that participants were more likely to identify a lure as being old than they were to identify a new, unrelated word as being old, indicating that the lures were more compelling and led to more memory errors. Additionally, da values have metric qualities that allow us to make direct comparisons across conditions by subtracting the old-lure da value from the old-new da value for each participant to generate Δ da values (Green & Swets, 1966; Peterson, Birdsall, & Fox, 1954; Swets, 1986). The resulting Δ da values indicate how likely the participants are to correctly identify a lure as being a new item. If the Δ da value is small, it indicates that the participant was largely successful at identifying the lures as new words. A small Δ da shows that the participant typically responded to lures and to new, unrelated items in the same way. A high Δ da value indicates that the participant was often unsuccessful at identifying the lures as unstudied items and that he or she was more likely to respond to them as if they were old items. The Δ da values allow us to determine the relative discriminability of different types of lures from old items following different study conditions.
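The subtraction that defines Δ da can be written out in a few lines. The participant labels and da values below are made up for illustration; only the direction of the subtraction (old-new minus old-lure) comes from the text.

```python
def delta_d_a(old_new_da: float, old_lure_da: float) -> float:
    """Δ d_a: old-new discriminability minus old-lure discriminability."""
    return old_new_da - old_lure_da


# Hypothetical participants: (old-new d_a, old-lure d_a).
participants = {
    "p01": (1.8, 1.7),  # lures rejected almost as well as unrelated new items
    "p02": (1.8, 0.6),  # lures often treated like old items
}

deltas = {pid: delta_d_a(*pair) for pid, pair in participants.items()}
for pid, value in deltas.items():
    print(pid, round(value, 2))
```

A value near zero (p01) means lures behaved like unrelated new items; a large value (p02) means the lures were compelling.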

Experiment 1

Method

Participants

Sixty-one University of Illinois undergraduates participated in the experiment for credit in an introductory psychology course. Five participants were dropped because they were not native English speakers, leaving 56 participants (23 female) whose data were included in the analysis. The mean age of the participants was 19 (range 18-29).

Design

The critical variable was whether items were studied within the context of sentences or not (manipulated between-subjects). Item types at test were old, (unrelated) new, and conjunction lures (manipulated within-subjects). The remainder of the design variables were for counterbalancing purposes and will be described below. The dependent variable was confidence in the recognition judgment, used to generate individual isosensitivity functions within each condition.

Materials

There were a total of 384 compound words forming 128 triplets in which two parent words (such as “tailspin” and “floodgate”) were recombined to form a conjunction lure (“tailgate”). Eighty of the triplets were from the set used by Jones and Jacoby (2001), some of which were modified slightly.

The stimuli were divided into four counterbalanced lists. In each list there were 64 triplets for which both parent words were studied and the to-be-rejected conjunction lure was tested. This yielded 128 study items and 64 test items. For the remaining 64 triplets, one parent was studied and served as an old, to-be-endorsed, item on the test. The other parent was unstudied and served as a new (to-be-rejected) lure on the test. This yielded an additional 64 study items and 128 test items. Thus, both study and test lists were 192 words in length.
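The list arithmetic in this paragraph can be checked with a small sketch. The function name and dictionary keys are illustrative; the counts are those stated above.

```python
def list_sizes(n_triplets: int = 128, n_conjunction: int = 64) -> dict[str, int]:
    """Study and test list lengths implied by the triplet assignment."""
    n_old_new = n_triplets - n_conjunction
    return {
        # Conjunction triplets contribute two studied parents and one tested lure.
        # Old/new triplets contribute one studied parent and two tested parents.
        "study": 2 * n_conjunction + 1 * n_old_new,
        "test": 1 * n_conjunction + 2 * n_old_new,
        "total_words": 3 * n_triplets,
    }


sizes = list_sizes()
print(sizes)
```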

Table 1 depicts example items and illustrates the counterbalancing procedure. One counterbalancing variable reversed the sets of old and new items (compare conditions 1 and 2) and also reversed the study order of the parents for the conjunction lure. The second counterbalancing variable (compare conditions 1 and 3) swapped the triplets such that the items that had served as parents for conjunction lures in the other condition now served as the old/new item set, and vice-versa. The counterbalancing yielded four unique lists, each of which was assigned a unique study order. The old/new items were randomly placed within a subset of positions reserved for those items. The positions of the parents of the to-be-tested conjunction lure were maintained, but the assignment of Parent 1 (P1) and Parent 2 (P2) to those positions was counterbalanced. For example, “blackmail” appeared before “jailbird” on one list and vice versa on another. For each pair of parent compounds, P1 and P2 were separated on the study list by 1-5 intervening words, with an average separation of three intervening words. The variation in spacing was included so that it would be very difficult for the participants to notice that the parent words could be recombined to form other words.

Table 1. Examples of Items and Counterbalancing Procedure for Experiment 1

| Counterbalancing Condition | Study: Parent 1a | Study: Parent 1b | Study: Parent 2a | Test: Conjunction Lure | Test: Old (Parent 2a) | Test: New (Parent 2b) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | blackmail | jailbird | tailspin | blackbird | tailspin | floodgate |
| 2 | jailbird | blackmail | floodgate | blackbird | floodgate | tailspin |
| 3 | tailspin | floodgate | blackmail | tailgate | blackmail | jailbird |
| 4 | floodgate | tailspin | jailbird | tailgate | jailbird | blackmail |
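The two counterbalancing variables can be sketched as nested loops over a pair of triplets. The `Triplet` type and the `counterbalance` function are hypothetical names introduced here; the sketch only reproduces the four example rows of Table 1 and does not model study positions or spacing.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    parent_a: str
    parent_b: str
    lure: str


def counterbalance(first: Triplet, second: Triplet) -> list[dict]:
    """Generate the four counterbalancing conditions for a pair of triplets."""
    rows = []
    # Variable 2: which triplet supplies the conjunction lure vs. the old/new items.
    for swap in (False, True):
        conj, old_new = (second, first) if swap else (first, second)
        # Variable 1: reverse parent order and reverse the old/new assignment.
        for reverse in (False, True):
            parents = (conj.parent_a, conj.parent_b)
            old, new = old_new.parent_a, old_new.parent_b
            if reverse:
                parents = parents[::-1]
                old, new = new, old
            rows.append(
                {
                    "condition": len(rows) + 1,
                    "study": [*parents, old],
                    "lure": conj.lure,
                    "old": old,
                    "new": new,
                }
            )
    return rows


rows = counterbalance(
    Triplet("blackmail", "jailbird", "blackbird"),
    Triplet("tailspin", "floodgate", "tailgate"),
)
for row in rows:
    print(row)
```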

Four additional experimental lists were created by placing each of the parent compound words in a sentence context such as “The fighter plane went into a tailspin after it was hit by enemy fire.” This produced a total of 256 sentences that were placed into experimental lists using the same pseudorandom order that was created for the original word lists. The test lists that were used in the sentence study condition were identical to those that were used in the word list study condition.

Each of the eight experimental lists was divided into four study blocks containing 48 experimental items and 2 filler items, one at the beginning and one at the end of the block. In addition to the two fillers, each study block contained 32 parents of to-be-tested conjunction lures and 16 parents to be tested as old items. Each study block was followed by a test block containing 16 conjunction lures, 16 old items, and 16 unstudied parent items. These were intermixed in a pseudorandom order so that no more than four test items of the same type appeared consecutively. Because some morphemes appeared in more than one word, particularly in the sentence study condition, care was taken to ensure that the two morphemes in each of the lure words on the test block appeared the same number of times in the preceding study block. Additionally, the morphemes that formed the new items in the test block did not appear in any of the words in the preceding study block.

Procedure

Participants were seated in front of a computer monitor in a quiet room and were either instructed that they would be studying a list of words or that they would be studying a list of sentences for a subsequent memory test. They were instructed in advance that there would be four study blocks containing 50 items each and four test blocks that would test their memory for the preceding study block. Participants were given a chance to rest between blocks.

All of the words were presented in the center of the computer monitor in black 16-point Times New Roman font text on a white background. The compound words in the word lists were presented individually for two seconds and were followed by a 250 ms interstimulus interval. The sentences in the sentence lists were presented for 8 seconds with a 250 ms interstimulus interval. During the test phases, participants saw one compound word at a time and were asked to respond by pressing the keys 1-4 on the computer keyboard. A response of “1” indicated that the participants were sure that the word had not appeared on the study list, a response of “2” indicated that they thought the word was new, but were not sure, a response of “3” indicated that they thought they had studied the word, but were not sure, and a response of “4” indicated that they were sure that they had studied the word. Each word stayed on the screen along with a guide indicating what each response choice meant until the participant selected his or her response.
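The four-point scale yields three cumulative criteria, each of which gives one point on an isosensitivity function. The sketch below shows that conversion; the function name `roc_points` and the example ratings are hypothetical.

```python
def roc_points(
    old_ratings: list[int], new_ratings: list[int], scale_max: int = 4
) -> list[tuple[float, float]]:
    """Cumulative (false alarm rate, hit rate) pairs, strictest criterion first.

    Ratings follow the scale in the text: 1 = sure new ... 4 = sure old.
    The criterion "rating >= k" is evaluated for k = 4, 3, 2.
    """
    points = []
    for k in range(scale_max, 1, -1):
        hit = sum(r >= k for r in old_ratings) / len(old_ratings)
        fa = sum(r >= k for r in new_ratings) / len(new_ratings)
        points.append((fa, hit))
    return points


# Hypothetical responses from one participant in one test block.
old_ratings = [4, 4, 3, 3, 2, 4, 1, 3]
new_ratings = [1, 1, 2, 2, 3, 1, 2, 4]

points = roc_points(old_ratings, new_ratings)
print(points)
```

Rates of exactly 0 or 1 cannot be converted to normal deviates, so real data usually need a correction (for example, adding a small constant to the counts) before plotting in z-coordinates.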

The sentence lists had an additional test phase that followed each of the aforementioned test blocks and contained six yes-or-no questions about the content of various sentences in the preceding study block. For example, following a study block containing the sentences “After he discovered evidence of a crime, the butler threatened his employer with blackmail” and “The best player on the little league team was the young boy who played shortstop,” the content test posed questions such as “Did the butler find evidence of a crime?” and “Was the pitcher the best player on the little league team?” These comprehension questions varied in difficulty, and some made reference to the compound word in a studied sentence while others did not, as in the examples above. This test was included because, after seeing the first test phase, which tested memory only for compound words, the participants might have stopped reading the sentences and focused instead on the compound words embedded within them. The comprehension tests after each block were included in an effort to keep the participants reading the sentences as naturally as possible throughout the experiment. On average, the participants responded to these questions correctly 81% of the time and the percentages of correct responses were similar across all of the blocks (ranging from 76% to 87% correct), indicating that the comprehension test was successful at prompting the participants to read all of the sentences.

The experiment lasted approximately 20 minutes for the word lists and 45 minutes for the sentence lists.

Analysis

The goal of the current experiment was to examine the extent to which sentence contexts modulate the plausibility of conjunction lures and to do so independently of any effects that context manipulation may have on overall response bias or accuracy. The raw performance data are likely to reveal an expected but uninteresting advantage for the word-list study condition because of the smaller number of exposed words as well as the shorter interval between the study items and test, and this advantage may additionally encourage a more liberal response bias (Benjamin & Bawa, 2004). Thus, as discussed above, a direct comparison of false alarm rates across the two experimental conditions would not be meaningful. Instead of analyzing false alarm rates, da values were calculated for old-new and old-lure discrimination for each participant. These values were entered into a mixed-model ANOVA with discrimination type (old-new vs. old-lure) as a within-subjects variable and study context (sentence vs. word) as a between-subjects variable. The measure of interest is Δ da, the difference between da values for the two different comparisons.
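Because this design crosses one two-level between-subjects factor with one two-level within-subjects factor, the interaction term of the ANOVA is equivalent to an independent-samples t test (with pooled variance) on each participant's Δ da, with F = t². The Python sketch below illustrates that equivalence. The function name and the example values are hypothetical and are not data from the experiment.

```python
from statistics import mean, variance

def interaction_f(delta_a, delta_b):
    """F ratio and error df for the context x discrimination-type interaction.

    delta_a and delta_b hold one difference score (old-new da minus old-lure da)
    per participant in each study-context group.
    """
    n_a, n_b = len(delta_a), len(delta_b)
    df_error = n_a + n_b - 2
    # Pooled variance of the difference scores (equal-variance assumption).
    pooled = ((n_a - 1) * variance(delta_a) + (n_b - 1) * variance(delta_b)) / df_error
    t = (mean(delta_a) - mean(delta_b)) / (pooled * (1 / n_a + 1 / n_b)) ** 0.5
    return t * t, df_error

# Hypothetical difference scores for three participants per group.
f_value, df_error = interaction_f([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

With two groups totalling N participants, the error degrees of freedom are N - 2, which is the form of the interaction tests reported for Experiments 1 and 2.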

Results

Table 2 provides the mean proportions of each confidence rating for each item type and Figure 2 shows the Δ da values for Experiment 1. Individual ratings tables were used to generate isosensitivity functions for the discrimination of old from unrelated test items and conjunction lures for both the sentence-study and the word-study conditions. These functions were used to compute da values for each participant. All effects described below are significant at the α = .05 level unless otherwise noted. Overall discrimination differed between study-context conditions, as evidenced by the differences in da in old-new recognition (da = 1.77 for word list study and da = 1.33 for sentence study; t(54) = 3.13). As noted above, it is not surprising that participants were somewhat less accurate overall in the sentence-study condition, given the much larger amount of information presented to them.
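To make the computation of da concrete, the following Python sketch derives cumulative response rates from a four-point ratings table, z-transforms them, fits a line to the resulting isosensitivity points, and converts the intercept and slope to da using da = intercept × sqrt(2 / (1 + slope²)). The manuscript does not state how the functions were fit, so the least-squares fit and the function names are assumptions. The example uses the group-mean proportions for the sentence condition in Table 2. Because the reported da values were computed for each participant and then averaged, values obtained from group means need not match them exactly.

```python
from math import sqrt
from statistics import NormalDist

def cumulative_old_rates(props):
    """Cumulative proportion of responses at or above ratings 4, 3, and 2.

    props lists the proportions for ratings 1 (sure new) through 4 (sure old).
    """
    rates, running = [], 0.0
    for p in reversed(props[1:]):
        running += p
        rates.append(running)
    return rates

def d_a(target_props, foil_props):
    """Estimate da from a least-squares line through the z-transformed ROC points.

    The fitting method is an assumption; proportions of exactly 0 or 1 would need
    a correction before the z-transform and are not handled here.
    """
    z = NormalDist().inv_cdf
    z_hit = [z(p) for p in cumulative_old_rates(target_props)]
    z_fa = [z(p) for p in cumulative_old_rates(foil_props)]
    mean_fa = sum(z_fa) / len(z_fa)
    mean_hit = sum(z_hit) / len(z_hit)
    sxy = sum((x - mean_fa) * (y - mean_hit) for x, y in zip(z_fa, z_hit))
    sxx = sum((x - mean_fa) ** 2 for x in z_fa)
    slope = sxy / sxx
    intercept = mean_hit - slope * mean_fa
    return intercept * sqrt(2.0 / (1.0 + slope ** 2))

# Group-mean proportions for the sentence condition, copied from Table 2.
old = [0.07, 0.19, 0.17, 0.57]
new = [0.27, 0.51, 0.16, 0.07]
lure = [0.23, 0.45, 0.19, 0.12]
da_old_new = d_a(old, new)
da_old_lure = d_a(old, lure)
delta_da = da_old_new - da_old_lure
```

A maximum-likelihood fit of the unequal-variance signal detection model would be a common alternative to the least-squares line used here.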

Table 2.

Mean Proportions of Each Confidence Rating for Each Item Type in Experiment 1

Confidence Ratings
Study Context        Test Item Type     1 (Sure New)   2 (Unsure New)   3 (Unsure Old)   4 (Sure Old)
Sentence Condition   New                0.27           0.51             0.16             0.07
Sentence Condition   Conjunction Lure   0.23           0.45             0.19             0.12
Sentence Condition   Old                0.07           0.19             0.17             0.57
Word Condition       New                0.47           0.40             0.10             0.03
Word Condition       Conjunction Lure   0.35           0.37             0.16             0.12
Word Condition       Old                0.07           0.17             0.16             0.60

Figure 2.

Δ da values for all three experiments. Δ da is the difference between the da value for old-new discrimination and the da value for old-lure discrimination for each condition in each of the experiments.

The critical test concerns the discrimination of old items from conjunction lures, which was expected to be relatively superior in the sentence-study condition. This result obtained: discrimination was only slightly poorer for the old-lure (da = 1.16, Δ da = 0.17) than the old-new comparison in the sentence condition (t(54) = 1.33; ns) but was considerably lower (da = 1.34, Δ da = 0.43) in the word-study condition (t(54) = 3.05). This yielded a reliable interaction between discrimination type (old-new vs. old-lure) and study context (F [1, 54] = 16.25). This interaction confirms that, relative to baseline old-new discrimination, old items were more easily discriminated from conjunction lures in the sentence-study than the word-study condition.

Discussion

Experiment 1 showed that participants were more susceptible to incorrectly endorsing conjunction lures if they studied a list of words rather than a list of sentences. Although participants in the sentence-study condition were presented with much more information and had poorer memory for the words overall, participants in the word-study condition experienced relatively more difficulty in discriminating conjunction lures from old items than did participants in the sentence-study condition.

This result supports the hypothesis that the words in the study lists were encoded differently depending on their context. When the words were studied without sentence contexts, participants retained more information about the surface forms of the words and less information about their meaning. In this case the conjunction lures provided a better match to the contents of memory for the study episode, and participants were more likely to endorse the lures even though their meanings did not match with any of the original words on the study list. On the other hand, participants who studied a list of sentences retained little information about the surface form of each word, but they were likely to retain some information about the gist of each sentence. This strategy could help the participants in two ways. First, with little surface information encoded in memory, the conjunction lures are poor matches for the contents of memory and they are less likely to be endorsed as “old.” Second, the information about the gist of each sentence can be used to reject the conjunction lures (Odegard, Lampinen, & Toglia, 2005). A participant could easily reject a lure such as “blackbird” if he or she remembered that there were no sentences about birds on the study list.

This interpretation of differences in word processing produced by changes in study context has implications for other types of false memories as well. If participants in the sentence-study condition retained information about the gist of the sentences but not the surface forms of the words, they should be more susceptible to semantic lures that are related to the studied items in meaning but not in form. Similarly, if participants in the word-study condition retained more information about the surface forms of the words but relatively less information about their meaning, they should be relatively better at discriminating semantic lures from old items. We tested these predictions in Experiment 2.

Experiment 2

Method

Participants

Fifty-one University of Illinois undergraduates participated in the experiment for credit in an introductory psychology course. Three participants were dropped because they were not native English speakers, leaving 48 participants (21 female) whose data were included in the analysis. The mean age of the participants was 20 (range 18-33).

Design

As in Experiment 1, the critical variable was whether items were studied within the context of sentences or not (manipulated between-subjects). Item types at test were old, (unrelated) new, and semantic lures (manipulated within-subjects). The dependent variable was confidence in the recognition judgment.

Materials

Experiment 2 used the same compound words and sentences that were used in Experiment 1. In addition to the compound words, there were 128 words that were semantically related to one of the parent words. These words were selected so that they would be interchangeable with the original compound words in both the list and the sentence contexts. For example, the semantic associate for “tailspin” was “nosedive,” and the two words were both appropriate in the sentence context “The fighter plane went into a tailspin/nosedive after it was hit by enemy fire.” While some of the semantic associates were also compound words, most were not.

As in Experiment 1, the stimuli were divided into four counterbalanced study lists that were 192 items in length. In each list there were 64 items for which one of the words in the semantically associated pair was studied and the other member of the pair was presented at test as a to-be-rejected semantic lure. An additional 64 items contained compound words or their semantic associates that were presented in the same form at test and served as old, to-be-endorsed items. The remaining 64 items were filler items that were taken from Experiment 1. These items were included to make the study phases of Experiments 1 and 2 as similar as possible. The assignment of the pairs of semantic associates to the old or lure conditions was counterbalanced across lists. The assignment of the semantic associates within each pair to study or test was also counterbalanced across lists. The critical items, old items, and fillers were placed in a pseudorandom order and the experimental items were substituted into the appropriate slots to create four unique study lists.

On each test list there were 64 to-be-rejected semantic lures, 64 to-be-endorsed old words, and 64 new, unrelated words. Unlike in Experiment 1, the new items for a given list could not be taken from among the old or lure items from other lists because of the inherent semantic relationships among the critical items. Instead, the new words for each list were drawn from a pool of 107 words that had no semantic association with any of the words on the study lists. As with the lures and old items, slightly more than half of the new words were compound words. These compounds did not share syllables with any of the words on the study lists. The new words used for each test list were matched with the lures and the old words on that list in terms of length and frequency. Across all of the test lists, the average length of the words was 7.82 letters for old and lure items (as these were the same words appearing in different conditions on different lists) and 7.78 letters for the new items. The average frequency of the words was 15.79 for the old and lure items and 16.86 for the new items (based on the Kucera and Francis [1967] norms included in Balota, Cortese, Hutchison, Neely, Nelson, Simpson, & Treiman, 2002; a frequency value of zero was assumed for items not appearing in the database).

The lists for Experiment 2 were divided into study and test blocks in the same way as in Experiment 1. Each was divided into four study blocks containing 48 experimental items and 2 filler items, one at the beginning and one at the end of the block. In addition to the two fillers, each study block contained 16 semantic associates of to-be-tested semantic lures, 16 words to be tested as old items, and 16 filler items. Each study block was followed by a test block containing 16 semantic lures, 16 old items, and 16 unrelated new items. These were intermixed in a pseudorandom order so that no more than four test items of the same type appeared consecutively. Care was taken to ensure that the semantic lures on the test list were related to one and only one word in the preceding study list.

Four additional experimental lists were created by placing each of the critical items in the appropriate sentence context. The sentences were identical to those used in Experiment 1 except for a few minor modifications made in order to eliminate words that were semantically related to one of the critical items. The 256 sentences were placed into experimental lists using the same pseudorandom order that was created for the word lists. Each of the four sentence lists was divided into four study blocks containing 50 sentences each. The test blocks were identical to those that were used for the word lists. Again, care was taken to ensure that the semantic lures on the test list were related to one and only one word in the preceding study list.

As in Experiment 1, the sentence lists included sets of comprehension questions at the end of each test block. The questions were identical to those used in Experiment 1 and on average participants answered 80% of them correctly. The rate of correct responses was similar across all blocks (ranging from 78% to 83% correct), indicating that the participants continued to read the sentences throughout the experiment.

The procedure and analysis used in Experiment 2 were identical to those used in Experiment 1.

Results and Discussion

Table 3 provides the mean proportions of each confidence rating for each item type and Figure 2 shows the Δ da values for Experiment 2. As in Experiment 1, the overall discrimination differed between study-context conditions, as evidenced by differences in da in old-new recognition (da = 2.03 for word study and da = 1.56 for sentence study; t(46) = 2.44).

Table 3.

Mean Proportions of Each Confidence Rating for Each Item Type in Experiment 2

Confidence Ratings
Study Context        Test Item Type     1 (Sure New)   2 (Unsure New)   3 (Unsure Old)   4 (Sure Old)
Sentence Condition   New                0.31           0.52             0.14             0.03
Sentence Condition   Semantic Lure      0.33           0.40             0.17             0.10
Sentence Condition   Old                0.06           0.20             0.18             0.56
Word Condition       New                0.58           0.30             0.09             0.03
Word Condition       Semantic Lure      0.58           0.29             0.08             0.06
Word Condition       Old                0.09           0.11             0.11             0.70

The critical test concerns the discrimination of old items from semantic lures, which was expected to be relatively superior in the word-study condition. As expected, discrimination decreased only slightly for the old-lure comparison in the word-study condition (da = 1.95, Δ da = 0.08, t(48) = 0.37; ns), and decreased more substantially for the old-lure comparison in the sentence-study condition (da = 1.32, Δ da = 0.23, t(44) = 1.83; p < .05 one-tailed). Most importantly, this yielded a significant interaction between lure type and study context (F [1, 46] = 5.62).

As predicted, the pattern of results seen in Experiment 2 was the opposite of that seen in Experiment 1. When presented with semantic lures at test, the participants were better able to discriminate the lures from the old items if they had studied a list of words rather than a list of sentences. The kind of word processing involved in studying a list of sentences made participants relatively more susceptible to the semantic lures. In this case they encoded the gist of each sentence but less information about the specific words in the sentence. When presented with semantic lures that were consistent with the gist of a studied sentence, participants were likely to endorse them as old items. As in previous studies in which participants endorsed the unstated implications of studied sentences (e.g. Brewer, 1977), the gist information encoded from the sentences in Experiment 2 provided powerful cues that led the participants to endorse the semantic lures. For example, a participant might remember that he or she read a sentence about a fighter plane crashing. The test item “nosedive” fits very well with this general scenario, and with little information encoded about the surface forms of the words in the original sentence, the participant is unlikely to remember that the word in the original sentence was actually “tailspin.” With gist information supporting the semantic lure and little surface information to contradict it, the participant is likely to endorse the lure.

However, when the participants were presented with a list of individual words rather than a list of sentences, they encoded relatively less semantic information and relatively more of the details of the surface forms of the words. This information benefited the participants both by making the semantic lures less appealing and by providing them with details that could help to reject lures that were consistent with the original items in meaning but not in form.

It is possible that the between-subjects manipulation of lure type and the use of four separate study and test blocks led the participants to notice the relationship between the studied items and the lures at test. This could have prompted them to develop an unusual study strategy, such as ignoring the sentence contexts and searching for compound words. To eliminate this problem and other confounds that could stem from the between-subjects design used in Experiments 1 and 2, we conducted an additional experiment using an entirely within-subjects design. Experiment 3 combined both sentences and isolated words in the study list as well as both conjunction lures and semantic lures in the test phase. In addition, Experiment 3 used a single study list and a single test list. The single study-test phase design ensured (1) that participants could not tailor their study strategy across items in anticipation of seeing a particular type of lure, (2) that there could be no changes in encoding strategy as a function of test experience over multiple blocks, and (3) that participants would be forced to read and attend to the full sentences when they were presented.

Experiment 3

Method

Participants

Twenty-seven University of Illinois undergraduates participated in the experiment for credit in an introductory psychology course. Three participants were dropped because they were not monolingual English speakers, leaving 24 participants (two female) whose data were included in the analysis. The mean age of the participants was 20 (range 18-24).

Design

As in Experiments 1 and 2, one critical variable was whether the items were studied within the context of sentences or not, but in Experiment 3 this variable was manipulated within-subjects rather than between subjects. The second critical variable, lure type, was also manipulated within-subjects in Experiment 3. The item types at test were old, (unrelated) new, conjunction and semantic lures whose parent words appeared in sentence contexts at study, and conjunction and semantic lures whose parent words appeared as single out-of-context words at study.

Materials

Experiment 3 used a subset of the compound words and sentences that were used in the previous two experiments plus five new items that were created to avoid repetition of morphemes in the study lists. The stimuli were divided into eight counterbalanced study lists containing 160 items each. Ninety-six of these items were rotated through the same experimental conditions that were used in Experiment 1. On each list, 64 of the items from this subset (32 sentences and 32 single words) contained parent words that were recombined at test and presented as to-be-rejected conjunction lures. The other 32 items from this subset (16 sentences and 16 single words) contained compound words that were presented in the same form at test, serving as to-be-endorsed old items. Each study list also contained 64 items that were rotated through the same experimental conditions that were used in Experiment 2. Thirty-two of the items in this subset (16 sentences and 16 single words) contained one member of a pair of close semantic associates. The other member of this pair was presented at test as a to-be-rejected semantic lure. The order in which the two members of the pair appeared was counterbalanced across lists. The remaining 32 items in this subset (16 sentences and 16 words) contained one member of a pair of semantic associates that was presented in the same form at test, serving as a to-be-endorsed old item.

The 160 study items for each list were placed in a pseudorandom order with the appropriate versions of each item placed in each slot to create eight unique study lists. Each study list had an associated test list that contained 192 items. Of the test items, 32 were conjunction lures, 32 were semantic lures, 64 were old items, and 64 were new, unrelated items. All of the conjunction lures and approximately half of the semantic lures were compound words, so a similar pattern was created for the old and new items in which approximately three-fourths of the old and new items were compound words and one-fourth were not. The same 64 new items were used for all eight lists. The new items were matched as closely as possible to the old items and lures in terms of length and frequency. The average length of the words on the test list was 8.26 letters for old items, 8.04 letters for lures, and 8.27 letters for new items. The average frequency of the test items was 12.53 for the old items, 10.55 for the lures, and 6.30 for the new items (based on the Kucera and Francis [1967] norms included in Balota et al., 2002; a frequency value of zero was assumed for items not appearing in the database). The 192 test items for each list were placed in a pseudorandom order so that no more than three items of the same type appeared in a row. The same order was used for all eight test lists with the appropriate test items substituted into each slot.

Unlike Experiments 1 and 2, in which there were four separate study and test blocks, Experiment 3 used a single study phase followed by a single test list. For each test list, care was taken to ensure that the two morphemes in each conjunction lure appeared the same number of times (twice for one item, once for all other items) in the preceding study list. Additionally, none of the morphemes in any of the semantic lures or new items appeared anywhere in the preceding study list. There were no sentence comprehension questions in Experiment 3 because the participants did not know what the test phase would be like until they had completed the entire study block, so it was unlikely that they would adopt a study strategy in which they ignored the sentence contexts.

Procedure

Participants were instructed that they would be studying a list of intermixed words and sentences for a subsequent memory test. During the study phase, one item at a time (a single word or a sentence) was presented on the computer monitor in black 16-point Times New Roman font on a white background. Single words were presented for two seconds and sentences were presented for eight seconds with a 250 ms interstimulus interval. The words and sentences were quasi-randomly intermixed with no more than four single words or four sentences appearing in a row. The test phase was the same as in Experiments 1 and 2, with the participants rating each test word on a scale from 1-4.

Analysis

In the analysis of Experiment 3, da values were calculated for old-new, old-conjunction lure, and old-semantic lure discrimination for each type of study context for each participant. Four Δ da values were calculated for each participant by subtracting the old-lure da values for each study context condition from the old-new da values for each condition. The Δ da values for each participant were then entered into a within-subjects ANOVA with lure type (conjunction vs. semantic) and study context (sentence vs. word) as factors.

Unlike Experiments 1 and 2, the within-subjects design of Experiment 3 allows for a meaningful comparison of hit rates and false alarm rates for different conditions. High confidence responses were taken as the best indicator of the participants’ performance, so to analyze the false alarm rates, the number of high confidence “yes” responses for each lure condition for each participant was entered into a within-subjects ANOVA with lure type (conjunction vs. semantic) and study context (sentence vs. word) as factors.
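As an illustration of this scoring step, the Python sketch below tallies the proportion of highest-confidence ("4") responses in each combination of study context and test item type for a single participant. The tuple layout, the labels, and the example responses are hypothetical; they are not the format or data used in the experiment.

```python
from collections import defaultdict

def high_confidence_rates(trials, high=4):
    """Proportion of highest-confidence 'old' responses per (context, item type) cell.

    trials is an iterable of (study_context, item_type, rating) tuples for one
    participant, with ratings on the 1-4 scale described in the Procedure.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for context, item_type, rating in trials:
        cell = (context, item_type)
        totals[cell] += 1
        if rating == high:
            hits[cell] += 1
    return {cell: hits[cell] / totals[cell] for cell in totals}

# Hypothetical responses from one participant (not data from the experiment).
trials = [
    ("sentence", "semantic_lure", 4),
    ("sentence", "semantic_lure", 2),
    ("word", "semantic_lure", 1),
    ("word", "semantic_lure", 2),
    ("sentence", "conjunction_lure", 1),
    ("word", "conjunction_lure", 4),
    ("word", "conjunction_lure", 3),
    ("word", "old", 4),
]
rates = high_confidence_rates(trials)
```

If every lure cell contains the same number of test items, analyzing these proportions is equivalent, up to a constant scale factor, to analyzing the raw counts of high confidence responses.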

Results and Discussion

Table 4 provides the mean proportions of each confidence rating for each item type and Figure 2 shows the Δ da values for Experiment 3. The difference in overall discrimination between study-context conditions was marginally significant, as shown by the differences in da in old-new recognition (da = 1.25 for word study and da = 0.96 for sentence study, t(23) = 1.98, p = 0.06). This difference in discrimination replicates the first two experiments, but is of a smaller magnitude. The fact that this difference is small in the present experiment is to be expected because of the within-subjects design. In the first two experiments, participants saw only one type of study stimulus and studied more or less information overall depending on whether they studied sentences or words. Both of these factors make it likely that the participants who studied lists of sentences in the first two experiments would set very different response criteria than those who studied lists of words. In Experiment 3, where all of the participants studied both words and sentences and studied the same amount of information overall, it is very likely that their response criteria would be much more similar for the different study-context conditions.

Table 4.

Mean Proportions of Each Confidence Rating for Each Item Type in Experiment 3

Confidence Ratings
Study Context        Test Item Type     1 (Sure New)   2 (Unsure New)   3 (Unsure Old)   4 (Sure Old)
Sentence Condition   Conjunction Lure   0.35           0.41             0.16             0.09
Sentence Condition   Semantic Lure      0.35           0.40             0.14             0.11
Sentence Condition   Old                0.21           0.24             0.16             0.39
Word Condition       Conjunction Lure   0.32           0.39             0.18             0.11
Word Condition       Semantic Lure      0.38           0.40             0.17             0.05
Word Condition       Old                0.14           0.24             0.18             0.43
-                    New Items          0.41           0.42             0.13             0.04

Critically, the interaction between study context (word or sentence) and lure type (semantic lure or conjunction lure) was reliable (F [1, 23] = 5.04), replicating the effects seen in Experiments 1 and 2. There was a bigger decrease in performance for the conjunction lures whose parent items were studied as single words (da = 0.88, Δ da = 0.37, t(23) = 4.91) than there was for the conjunction lures whose parent items had been studied in sentences (da = 0.76, Δ da = 0.20, t(23) = 3.20). The opposite pattern obtained for the semantic lures, with a bigger decrease in discrimination for lures whose parent items were presented in sentences (da = 0.73, Δ da = 0.23, t(23) = 4.09) than there was for lures whose parent items were presented as single words (da = 1.10, Δ da = 0.15, t(23) = 2.34).
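The crossover can be checked directly from the group-mean da values reported in the text. The Python sketch below recomputes each Δ da as the old-new da minus the old-lure da for the same study context and then compares the word-minus-sentence difference for the two lure types. The variable and function names are illustrative, and the calculation uses only the reported means, so it describes the direction of the pattern and is not a substitute for the ANOVA on individual participants' values.

```python
# Group-mean da values reported in the text for Experiment 3.
old_new = {"word": 1.25, "sentence": 0.96}
old_lure = {
    ("conjunction", "word"): 0.88,
    ("conjunction", "sentence"): 0.76,
    ("semantic", "word"): 1.10,
    ("semantic", "sentence"): 0.73,
}

# Delta da = old-new da minus old-lure da for the same study context.
delta = {
    (lure, context): round(old_new[context] - value, 2)
    for (lure, context), value in old_lure.items()
}

def word_minus_sentence(lure):
    """How much larger the drop in discrimination is after word study than after sentence study."""
    return delta[(lure, "word")] - delta[(lure, "sentence")]

# Opposite signs for the two lure types indicate a crossover interaction.
crossover = word_minus_sentence("conjunction") - word_minus_sentence("semantic")
```

A positive value for conjunction lures and a negative value for semantic lures corresponds to the pattern described above: word study hurts more for conjunction lures, and sentence study hurts more for semantic lures.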

The same pattern holds for the hits and false alarms in Experiment 3. Unlike Experiments 1 and 2, the within-subjects design of Experiment 3 makes it possible for us to make a direct comparison of hit rates and false alarm rates across study conditions. The number of high confidence yes responses was taken to be the best measure of the participants’ susceptibility to the lures, so this number was used to calculate the hit rate and false alarm rates for each condition for each participant. The average hit rate for the old items did not differ significantly across study contexts (39% for old items that were originally studied in sentences and 43% for items that were originally studied as single words, t(23) = 0.84). As discussed above, the similar hit rates for the two conditions are to be expected in this experiment because of the within-subjects design. For the conjunction lures, the average percentage of high confidence false alarms was 9% in the sentence condition and 11% in the word list condition. For the semantic lures, the average false alarm rates were 11% in the sentence condition and 5% in the word list condition. The interaction between study context (word or sentence) and lure type (semantic lure or conjunction lure) was reliable (F [1, 23] = 6.71), just as it was for the da values.

General Discussion

The goal of these experiments was to examine the effects of study context on different types of false memories. The two very different patterns of memory errors seen in these experiments suggest that changes in study context alter the way in which people process words and encode them in memory. When presented with a list of sentences, participants were less susceptible to conjunction lures but more susceptible to semantic lures. The opposite was true following study of decontextualized words.

These findings indicate that participants engage different encoding strategies when studying words in or out of a larger meaningful context. These strategies are not necessarily a conscious decision, but rather are based on the participants’ previous experience with verbal materials in everyday life and the information that can be gleaned from the studied items. Before beginning the experiment, the participants have extensive experience with reading and remembering words in numerous contexts, including sentences and lists. Through this experience, they are likely to have developed expectations about what kinds of information about words are most useful in different contexts. When they encounter sentences or out-of-context words in an experimental setting, this previous experience is likely to guide their strategy for encoding the items. In the present experiments, when the participants encountered words in a sentence context, they encoded information about the gist of each sentence but relatively less information about the exact forms of the words they contained. When they studied isolated words, the participants encoded relatively less semantic information about the words but more detail about their surface features. Each type of processing had advantages and disadvantages. Retaining more semantic information through gist processing allowed the participants to reject lures that did not fit with the gist of any of the original sentences. However, semantic lures that were consistent with the meaning of one of the sentences were difficult to reject, especially with little information about the surface features that could help to distinguish one semantic associate from another. When the participants retained more information about the surface features of the words and less information about their semantics, they were better able to reject semantic lures that did not match the forms of the original words. Yet they were also more likely to false-alarm to conjunction lures that strongly resembled the studied words in form but not in meaning.

These results can be understood as an example of transfer-appropriate processing. When the participants studied words in sentence contexts, they used encoding processes that were well suited to the kinds of information that they would need to reject conjunction lures presented at test. However, these same encoding processes were poorly matched to the kind of information needed to reject semantic lures at test. The opposite was true for words that were presented out of context. This kind of stimulus presentation promoted encoding processes that were well matched with the kinds of processing needed for higher performance on a test using semantic lures, but poorly matched for a test using conjunction lures.

It is important to note that in the experiments described here, the way in which the participants processed the words is more constrained in the sentence study condition than in the word study condition. The participants were simply instructed to read and to try to remember the materials. The context provided by the sentences constrains the meaning of the critical words and the way in which they are processed while the out-of-context words remain unconstrained. Our focus in the present study is on naturalistic study strategies that participants might adopt when simply asked to read and remember different types of verbal materials. We feel that it was important to leave the participants’ choices about study strategies as unconstrained as possible in order to gain insight on how strategy choice may have affected the results of previous experiments on conjunction and semantic memory errors. We used the patterns of memory errors produced in the present experiments to infer what kinds of study strategies the participants were using during encoding. In future research, it would be beneficial to give participants specific instructions about how to encode the out-of-context words so that their processing of the words would be similarly constrained across both study context conditions. By manipulating the encoding instructions, it should be possible to alter the resulting patterns of memory errors in very specific ways, which would further strengthen the findings from the present study.

The pattern of errors found in this study can explain the larger pattern of results evident in previous research on false memories for words. The results of studies finding high rates of conjunction errors (Jones & Jacoby, 2001; Reinitz, Lammers, & Cochran, 1992; Underwood, Kapelak, & Malmi, 1976; Underwood & Zimmerman, 1973) suggest that the surface forms of words are maintained in memory while little semantic information is retained to contradict the sense of familiarity produced by lures that are visually similar to studied words. On the other hand, studies of semantic errors (Bransford & Franks, 1971; Brewer, 1977; Johnson, Bransford, & Solomon, 1973) suggest that only gist information is stored in memory while information about the surface forms of words is discarded. Although these results seem to be discrepant, this pattern can be explained by taking into account the types of study materials that have been used in these two different sets of experiments. The studies that found high rates of conjunction errors typically used lists of out-of-context words, while those that found semantic errors typically used sentence or story contexts. As we have shown in the present study, this difference in study contexts plays a crucial role in how the studied items are encoded. The encoding strategy that participants use when they encounter out-of-context words leads them to focus relatively more on the exact form of the word and less on its meaning, making the participants less able to reject conjunction lures at test. Conversely, the encoding strategy that participants adopt when studying sentences or stories leads them to encode the gist of the sentences and relatively little information about their forms, making the participants less able to reject semantic lures at test. These different encoding strategies, which arise from the participants’ day-to-day experience with language, lead them to exhibit different patterns of memory error in experiments with different study contexts.

An exception to this pattern comes from experiments using the Deese-Roediger-McDermott (DRM) paradigm (Roediger & McDermott, 1995), in which high numbers of semantically related false alarms are found in response to lists of out-of-context words. However, the strong semantic relationships between the words on the study list are likely to promote semantic processing for the words in a way that studying a list of unrelated words does not. Additionally, the inclusion of phonological associates in DRM lists has been found to greatly increase the number of false memories (Watson, Balota & Roediger, 2003), leading to errors similar to those seen in studies using conjunction lures. In light of these findings, it seems that the DRM paradigm creates a study context that is like a word list in some ways and like sentence processing in other ways. The close relationships among the words promote semantic processing, making semantically related lures difficult to reject (or easy to generate, in the case of recall tests). The absence of a larger context for the studied words, such as a sentence or story, also promotes attention to the word forms. While encoding this information might help the participants to reject semantic lures at test, the high number of related words in the DRM lists makes this difficult. Instead, the encoded word form information can make the participants susceptible to form-based memory errors in addition to semantic errors.

We do not wish to argue that retrieval processes are unimportant with respect to the production of memory errors. However, it is also important to take encoding strategies into account, as they affect what kinds of information are available for retrieval. For example, many previous studies have accounted for conjunction error data using dual process models (cf. Jones & Jacoby, 2001; Marsh, Hicks, & Davis, 2002). In this account, conjunction errors occur when familiarity is unopposed by recollection. If the lure at test is a recombined word, it may seem familiar because its syllables appeared in other words during the study phase. If the participant cannot remember the words that those syllables actually appeared in, this sense of familiarity could lead to an endorsement of the lure. Manipulations that decrease recollection, such as dividing attention at study (Jones & Jacoby, 2001; Odegard & Lampinen, 2005), imposing a response deadline (Jones & Jacoby, 2001), or placing studied items into very similar contexts (Marsh et al., 2002; Reinitz & Hannigan, 2004; Underwood et al., 1976) consistently increase conjunction error rates. Our present experiments indicate that manipulations of study context also influence error rates by changing what information is encoded and therefore what types of lures seem familiar. When words are presented in sentence contexts rather than in a list, participants engage in gist processing and encode less information about the specific words in the sentences. With this type of information encoded in memory, conjunction lures will not seem familiar at test and conjunction error rates will go down, as demonstrated in our experiments.

In addition to determining what sorts of information seem familiar, study strategies and the nature of the encoded information also affect recollection attempts. The process of recollection rejection, where participants are able to reject a lure by recalling its parent item, has been widely studied with respect to conjunction lures and other types of false memories (Brainerd & Reyna, 2002; Brainerd, Reyna, Wright, & Mojardin, 2003; Hintzman, Curran & Oppy, 1992; Lampinen, Odegard & Neuschatz, 2004). When participants are able to recall the originally studied words, they are also much more successful at rejecting the conjunction lures (Lampinen et al., 2004; Odegard & Lampinen, 2005; Odegard, Lampinen & Toglia, 2005). Thus, factors such as study strategies that influence encoding can also influence recollection rejection by making the relevant information about the studied items more or less difficult to recall. For example, Lloyd (in press) found that pairing compound words with pictures during study reduced the rate of conjunction errors. Including a picture of the object named by the word provided a richer context that made participants more likely to remember the word’s meaning. This in turn made them less susceptible to the conjunction lures. In the present study, placing compound words into sentences may have had much the same effect. If the participants were able to recall the gists of the studied sentences, they could have eliminated lures that did not seem to fit with the gist of any particular sentence. For example, when presented with the lure “blackbird,” a participant may be able to determine that there were no sentences on the study list about birds, which would enable him or her to reject the lure.

While building models of retrieval processes is clearly very important for understanding memory errors, more attention to factors that affect encoding is needed. Our experiments have shown that taking encoding differences into account can resolve some discrepancies in the memory error literature. A closer look at the interplay between encoding and retrieval could also serve to strengthen existing theories of false memory.

In summary, study context plays an important role in determining the motivational factors that influence what information people remember about studied words. Learners adopt different strategies when confronted with meaningful sentences as opposed to a list of unrelated words. When a word is presented in a larger context such as a sentence or story, people are unlikely to remember the exact features of the word. Instead, they encode the gist of the whole item, a strategy that is less demanding and also likely to be successful in most language processing settings. This strategy makes people more susceptible to errors when they are presented with lures that are semantically similar to words in the original sentences or stories. In a real-world language processing task such as listening to a conversation, this is unlikely to be problematic because the basic meaning of the message would be unchanged. Additionally, previous research has found that there are cues in normal language processing situations that can direct the comprehender’s attention to the surface features of the words when such attention is necessary. For example, Birch and Garnsey (1995; see also Watson & Benjamin, 2007) found that listeners had better memory for the exact forms of words that were prosodically focused than they had for words that were not focused. This indicates that listeners generally process the gist of the sentences they hear, but that they can change their processing strategy and shift their attention to the surface features of the words if the speaker indicates that those words are particularly important. This same sort of change in strategy can account for the high number of conjunction errors found in memory experiments. Our experiments demonstrated that people are highly flexible in their study strategies, and that when presented with out-of-context words, they change their processing strategy to fit the context. In a situation where extracting meaning is difficult, people focus more on the surface structures of the words. These structures are subsequently retained in memory and can influence performance at test. The change in study context leads to a change in strategy, and this in turn changes what people are able to encode and remember for later use.

Acknowledgments

This work was funded in part by grant R01 AG026263 to Aaron Benjamin from the National Institutes of Health. It was also supported by the Sandia National Laboratories Excellence in Engineering Fellowship provided to Laura Matzen and is a part of her doctoral dissertation. We thank Todd Jones for sending us the stimuli used in Jones & Jacoby (2001).

Footnotes

1. These figures are more typically called receiver- (or relative-) operating characteristics, or ROCs. We have chosen to use the transparent nomenclature of Luce (1963; see also Benjamin & Wee, 2007).

References

  1. Banks WP. Recognition and source memory as multivariate decision processes. Psychological Science. 2000;11:267–273. doi: 10.1111/1467-9280.00254.
  2. Balota DA, Cortese MJ, Hutchison KA, Neely JH, Nelson D, Simpson GB, Treiman R. The English Lexicon Project: A web-based repository of descriptive and behavioral measures for 40,481 English words and nonwords. Washington University; 2002. http://elexicon.wustl.edu/
  3. Bayen UJ, Nakamura GV, Dupuis SE, Yang CL. The use of schematic knowledge about sources in source monitoring. Memory & Cognition. 2000;28:480–500. doi: 10.3758/bf03198562.
  4. Benjamin AS. On the dual effects of repetition on false recognition. Journal of Experimental Psychology: Learning, Memory, & Cognition. 2001;27:941–947.
  5. Benjamin AS. Memory is more than just remembering: Strategic control of encoding, accessing memory, and making decisions. In: Benjamin AS, Ross BH, editors. The Psychology of Learning and Motivation: Skill and Strategy in Memory Use. Vol. 48. Academic Press; London: 2008. pp. 175–224.
  6. Benjamin AS, Bawa S. Distractor plausibility and criterion placement in recognition. Journal of Memory & Language. 2004;51:159–172.
  7. Benjamin AS, Diaz M. Measurement of relative metamnemonic accuracy. In: Dunlosky J, Bjork RA, editors. Memory and Metamemory. 2006. Chapter to appear in.
  8. Benjamin AS, Diaz M, Wee S. Signal detection with criterion variability: Applications to recognition memory. 2008. Manuscript under review.
  9. Birch SL, Garnsey SM. The effect of focus on memory for words in sentences. Journal of Memory & Language. 1995;34:232–267.
  10. Bock JK, Brewer WF. Reconstructive recall in sentences with alternative surface structures. Journal of Experimental Psychology. 1974;103(5):837–843.
  11. Brainerd CJ, Reyna VF. Fuzzy trace theory and children’s false memories. Journal of Experimental Child Psychology. 1998;71:81–129. doi: 10.1006/jecp.1998.2464.
  12. Brainerd CJ, Reyna VF. Fuzzy-trace theory: Dual processes in memory, reasoning, and cognitive neuroscience. Advances in Child Development & Behavior. 2001;6:359–364. doi: 10.1016/s0065-2407(02)80062-3.
  13. Brainerd CJ, Reyna VF. Recollection rejection: How children edit their false memories. Developmental Psychology. 2002;38:156–172.
  14. Brainerd CJ, Reyna VF, Wright R, Mojardin AH. Recollection rejection: False-memory editing in children and adults. Psychological Review. 2003;110:762–784. doi: 10.1037/0033-295X.110.4.762.
  15. Bransford JD, Franks JJ. The abstraction of linguistic ideas. Cognitive Psychology. 1971;2:331–350.
  16. Brewer WF. Memory for the pragmatic implications of sentences. Memory & Cognition. 1977;5(6):673–678. doi: 10.3758/BF03197414.
  17. Deese J. On the prediction of occurrence of particular verbal intrusions in immediate recall. Journal of Experimental Psychology. 1959;58:17–22. doi: 10.1037/h0046671.
  18. Egan JP. Recognition memory and the operating characteristic. Indiana University, Hearing and Communication Laboratory; Bloomington: 1958. (Tech. Note AFCRC-TN-58_51)
  19. Gallo DA, Roediger HL. Variability among word lists in eliciting memory illusions: Evidence for associative activation and monitoring. Journal of Memory & Language. 2002;47:469–497.
  20. Green DM, Swets JA. Signal detection theory and psychophysics. Wiley; New York: 1966.
  21. Hastie R, Kumar PA. Person memory: Personality traits as organizing principles in memory for behaviors. Journal of Personality & Social Psychology. 1979;37:25–38.
  22. Hicks JL, Cockman DW. The effect of general knowledge on source memory and decision processes. Journal of Memory & Language. 2003;48:489–501.
  23. Hintzman DL, Curran T, Oppy B. Effects of similarity and repetition on memory: Registration without learning? Journal of Experimental Psychology: Learning, Memory, & Cognition. 1992;18:667–680. doi: 10.1037//0278-7393.18.4.667.
  24. Israel L, Schacter DL. Pictorial encoding reduces false recognition of semantic associates. Psychonomic Bulletin & Review. 1997;4:577–581.
  25. Johnson MK, Bransford JD, Solomon SK. Memory for tacit implications in sentences. Journal of Experimental Psychology. 1973;98:203–205.
  26. Jones TC, Jacoby LL. Feature and conjunction errors in recognition memory: Evidence for dual-process theory. Journal of Memory & Language. 2001;45:82–102.
  27. Kucera H, Francis WN. Computational analysis of present-day American English. Brown University Press; Providence, RI: 1967.
  28. Lampinen JM, Odegard TN, Neuschatz JS. Robust recollection rejection in the memory conjunction paradigm. Journal of Experimental Psychology: Learning, Memory, & Cognition. 2004;30:332–342. doi: 10.1037/0278-7393.30.2.332.
  29. Lloyd ME. Metamemorial influences in recognition memory: Pictorial encoding reduces conjunction errors. Memory & Cognition. doi: 10.3758/bf03193478. (in press)
  30. Luce RD. Detection and recognition. In: Luce RD, Bush RR, Galanter E, editors. Handbook of mathematical psychology. Wiley; New York: 1963. pp. 103–189.
  31. Marsh RL, Hicks JL, Davis TT. Source monitoring does not alleviate (and may exacerbate) the occurrence of memory conjunction errors. Journal of Memory & Language. 2002;47:315–326.
  32. Masson MEJ, Loftus GR. Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology. 2003;57:203–220. doi: 10.1037/h0087426.
  33. Miller MB, Wolford GL. The role of criterion shift in false memory. Psychological Review. 1999;106:398–405.
  34. Morris CD, Bransford JD, Franks JJ. Levels of processing versus transfer appropriate processing. Journal of Verbal Learning & Verbal Behavior. 1977;16:519–533.
  35. Murnane K, Shiffrin RM. Word repetitions in sentence recognition. Memory & Cognition. 1991;19(2):119–130. doi: 10.3758/bf03197109.
  36. Odegard TN, Lampinen JM, Toglia MP. Meaning’s moderating effect on recollection rejection. Journal of Memory & Language. 2005;53:416–429.
  37. Payne DG, Elie CJ, Blackwell JM, Neuschatz JS. Memory illusions: Recalling, recognizing, and recollecting events that never occurred. Journal of Memory & Language. 1996;35:261–285.
  38. Payne BK, Jacoby LL, Lambert AJ. Memory monitoring and the control of stereotype distortion. Journal of Experimental Social Psychology. 2004;40:52–64.
  39. Peterson WW, Birdsall TG, Fox WC. The theory of signal detectability. Transactions of the IRE Professional Group on Information Theory. 1954;4:171–212.
  40. Potter MC, Lombardi L. Regeneration in the short-term recall of sentences. Journal of Memory & Language. 1990;29:633–654.
  41. Reinitz MT, Hannigan SL. False memories for compound words: Role of working memory. Memory & Cognition. 2004;32:463–473. doi: 10.3758/bf03195839.
  42. Reinitz MT, Lammers WJ, Cochran BP. Memory-conjunction errors: Miscombination of stored stimulus features can produce illusions of memory. Memory & Cognition. 1992;20(1):1–11. doi: 10.3758/bf03208247.
  43. Roediger HL, McDermott K. Creating false memories: Remembering words not presented in lists. Journal of Experimental Psychology: Learning, Memory, & Cognition. 1995;21:803–814.
  44. Roediger HL, Weldon MS, Challis BH. Explaining dissociations between implicit and explicit measures of retention: A processing account. In: Roediger HL, Craik FIM, editors. Varieties of memory and consciousness: Essays in honour of Endel Tulving. Erlbaum; Hillsdale, NJ: 1989. pp. 3–39. Chapter in.
  45. Shiffrin RM, Huber DE, Marinelli K. Effects of category length and strength on familiarity in recognition. Journal of Experimental Psychology: Learning, Memory, & Cognition. 1995;21(2):267–287. doi: 10.1037//0278-7393.21.2.267.
  46. Swets JA. Form of empirical ROCs in discrimination and diagnostic tasks: Implications for theory and measurement of performance. Psychological Bulletin. 1986;99:181–198.
  47. Underwood BJ, Kapelak SM, Malmi RA. Integration of discrete verbal units in recognition memory. Journal of Experimental Psychology: Human Learning & Memory. 1976;2:293–300.
  48. Underwood BJ, Zimmerman J. The syllable as a source of error in multisyllable word recognition. Journal of Verbal Learning & Verbal Behavior. 1973;12:701–706.
  49. Watson DG, Benjamin AS. The effect of intonational boundaries on memory for sentences. (under review)
  50. Watson JM, Balota DA, Roediger HL. Creating false memories with hybrid lists of semantic and phonological associates: Over-additive false memories produced by converging associative networks. Journal of Memory & Language. 2003;49:95–118.
