Abstract
New concepts can be learned from statistical associations as well as from relevant existing knowledge. We examined the interplay of these two processes by manipulating exemplar frequency and thematic knowledge and by interpreting their interaction through computational modeling. Exemplar frequency affects category learning, with high-frequency items learned faster than low-frequency items, and prior knowledge usually speeds category learning. In two experiments that manipulated both of these factors, we found that the effects of frequency are greatly reduced when stimulus features are linked by thematic prior knowledge, and that frequency effects on single stimulus features can actually be reversed by knowledge. We account for these results with the Knowledge Resonance (KRES) model of category learning (Rehder & Murphy, 2003) and conclude that prior knowledge may change representations so that empirical effects such as those caused by frequency manipulations are modulated.
Frequency has long been known to be an important property of category structure. Rosch and Mervis (1975) argued that the frequency of properties in a category determines how typical category members are: members whose properties occur frequently within the category are more typical than those with less frequent properties, and members whose properties also occur frequently in other categories are less typical than those whose properties are distinctive to the category. Although the frequency or familiarity of an object does not itself seem very strongly related to its typicality in natural concepts (Barsalou, 1985; Mervis, Catlin, & Rosch, 1976; Novick, 2003), when item frequency has been experimentally manipulated independently of other variables (such as similarity to category prototypes), it does influence category structure. For example, Nosofsky (1988) showed that repeating one item five times in each block of category learning made the item not only easier to learn but also more typical after learning. Furthermore, the effect spread beyond the frequent item itself, in that similar items in the same category were also rated as more typical.
Theories of concepts can explain such frequency effects easily (Barsalou, Huttenlocher, & Lamberts, 1998). If exemplar theories assume that each presentation of a stimulus is a stored instance, then frequent exemplars will have more stored instances, increasing the typicality of items similar to them. Likewise, if prototype theories assume that the category prototype is based on generalizing from instances, then the more an item is repeated, the more influence it will have on that generalization. That is, frequent items will pull the category prototype in their direction. Thus, the effect of frequency seems to be a straightforward example of how category structure influences learning and use of concepts.
One might expect basic variables such as frequency to have consistent effects across materials and tasks. However, there are a number of examples in which effects of category structure are altered when concepts make contact with other knowledge. For example, the standard learning advantage of conjunctive (and) over disjunctive (or) concepts can be overruled when the disjunctive concept is related to prior knowledge (Pazzani, 1991). Wattenmaker, Dewey, T. Murphy, and Medin (1986) examined effects of prior knowledge on learning linearly separable and nonlinearly separable categories. Linear separability is a structural variable that refers to whether correct categorizations can be made by independently weighting the category’s properties. Wattenmaker et al. showed that linearly separable categories could be made easier or harder to learn than nonlinearly separable categories by varying categories’ content (see also Murphy & Kaplan, 2000). They argued that some conceptual domains encourage summing of evidence, suitable for linearly separable categories, and that other domains encourage configural processing, suitable for nonlinearly separable categories. These content effects are one example of how people’s prior knowledge about a category can affect the processing they perform during learning and thereby alter the influence of structural variables.
The present research examined whether frequency effects are similarly sensitive to the content of the category being learned. Because frequency is such a basic variable, influencing cognitive processes from learning to lexical access, it is possible that its effects will not be so easily modified by the content of a category. We were particularly interested in this structural variable because it allowed us to investigate possible interactions of structural or formal aspects of a category with the more slippery variable of prior knowledge.
We also used this problem of the interaction between knowledge and exemplar frequency to test a model of category learning that attempts to incorporate both structure and knowledge, the Knowledge Resonance (KRES) model (Rehder & Murphy, 2003). Unlike most other models of category learning, KRES allows knowledge, represented by links among features and prior concept nodes, to influence the learning process. We carried out this modeling to try to provide an account of the effects of frequency and knowledge in this task, and more generally, to understand better how prior knowledge affects representations during learning. This work also continues our ongoing validation of the model’s general approach.
Earlier KRES modeling work (Harris & Rehder, 2006) compared two model variants on linearly and nonlinearly separable category learning tasks. One model represented prior knowledge by specific nodes that represented already-known categories, which could be associated to the to-be-learned categories. This model could base category responses on the similarity of stimuli to prior concepts. The other variant only allowed prior knowledge to influence the learning of associations by modifying representations of the stimuli. In this second model, knowledge could not directly and independently affect categorization, but instead had to affect responses by modulating the normal category learning and categorization system. The first variant fit Wattenmaker et al.’s data better than the second did. However, this may be because Wattenmaker et al. used categories that corresponded to known concepts (e.g., the personality trait of honesty). The present study will use what we call thematic feature relationships (Murphy & Allopenna, 1994), in which knowledge-related features are all consistent with a schema or theme but no known category actually exists. Since people often learn new categories that do not correspond to already known ones, it is important to study and attempt to model this form of knowledge and its influence on learning.
The present experiments, exploring the effects of prior knowledge on category learning with varying exemplar frequency, are a step towards understanding the circumstances under which prior knowledge can affect concept learning, and, when combined with computational modeling, will elucidate the mechanism underlying these effects. For expository purposes, we will postpone detailed description of our modeling efforts until after the experiments are described, so that we may then discuss the relationships among the data, the model, and the theory in detail.
Experiment 1
In Experiment 1, subjects saw descriptions of buildings and learned to classify the buildings into two categories. For half of the subjects, most features of the buildings could be linked together by themes like “aerial buildings” and “underwater buildings,” while for the other half of the subjects, the features were unrelated to each other. (Note that aerial and underwater buildings are not familiar concepts for most people, as confirmed by Murphy & Allopenna, 1994, and Spalding & Murphy, 1999.) Table 1 shows the stimulus features. To manipulate frequency, one item of each category was presented six times more often than the other items. Once during learning and twice after learning, subjects performed test trials in which they classified or rated several types of stimuli (trained items, novel prototype items, and individual features), and the effects of knowledge and stimulus frequency on their responses were examined.
Table 1.
Feature Pairs Used in the Experiments.
| Related Features | |
|---|---|
| divers live there | astronauts live there |
| get there by submarine | get there by airplane |
| deep-sea research is conducted there | atmospheric research is conducted there |
| has thick, heavy walls | has thin, light walls |
| fish are kept there as pets | birds are kept there as pets |

| Unrelated Features | |
|---|---|
| has a large kitchen | has a small kitchen |
| has area rugs | has wall-to-wall carpeting |
| has modern furniture | has colonial-style furniture |
| has a patio | has a porch |
| has rectangular doorways | has round doorways |
Based on the work of Nosofsky (1988) and Barsalou et al. (1998), we expected to find frequency effects when prior knowledge was absent. That is, the frequent exemplar and its features would both be more likely to be categorized into the appropriate category than would less frequent exemplars and their features. The category structure we tested is presented in Table 2. This structure follows a standard one-away design in which 11111 is the prototype of category A and 00000 is the prototype of category B, and each category member contains one exception feature (a feature characteristic of the other category). However, not all the items in Table 2 were presented an equal number of times. Specifically, 11110 was presented six times more often than the other category A members, and 00001 was presented six times more often than the other category B members. Note that this manipulation of exemplar token frequency changes which features are in fact associated with which categories. The 0 feature appears more frequently on the fifth dimension in category A members, whereas the 1 feature on that dimension appears more frequently in category B members.
Table 2.
Abstract Category Structure for Experiment 1
| Item | Features | Frequency | Item | Features | Frequency |
|---|---|---|---|---|---|
| Training Items | | | | | |
| A1 | 11110 | 6 | B1 | 00001 | 6 |
| A2 | 11101 | 1 | B2 | 00010 | 1 |
| A3 | 11011 | 1 | B3 | 00100 | 1 |
| A4 | 10111 | 1 | B4 | 01000 | 1 |
| A5 | 01111 | 1 | B5 | 10000 | 1 |
| Additional Test Items | | | | | |
| A0 | 11111 | | B0 | 00000 | |
| AF1 | ----1 | | BF1 | ----0 | |
| AF2 | ---1- | | BF2 | ---0- | |
| AF3 | --1-- | | BF3 | --0-- | |
| AF4 | -1--- | | BF4 | -0--- | |
| AF5 | 1---- | | BF5 | 0---- | |
Note. Item frequency is provided for training items. For test items, A0 and B0 were novel prototype items, while AF* and BF* were single-feature items; AF1 and BF1 were the exception features, the values on the dimension for which the HF items A1 and B1 were atypical.
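To make the frequency manipulation concrete, the following minimal sketch (ours, not the original analysis code; Python, with illustrative names) rebuilds the Table 2 training block and verifies the feature-category association rates discussed below: normal feature values occur in their correct category 90% of the time, whereas each exception feature value occurs in its nominal category only 40% of the time.

```python
# Rebuild the Experiment 1 training block from Table 2 (a hedged sketch;
# variable names are illustrative, not taken from the original materials).
A_ITEMS = ["11110", "11101", "11011", "10111", "01111"]  # A1-A5
B_ITEMS = ["00001", "00010", "00100", "01000", "10000"]  # B1-B5
HF = 6  # A1 and B1 appear six times per block; the other items appear once

block = ([("A", A_ITEMS[0])] * HF + [("A", item) for item in A_ITEMS[1:]] +
         [("B", B_ITEMS[0])] * HF + [("B", item) for item in B_ITEMS[1:]])
assert len(block) == 20  # 20 training trials per block, as in the Method

def prop_in_A(dim, value):
    """Proportion of tokens of feature `value` on dimension `dim` (0-based)
    that occur in category A items."""
    cats = [cat for cat, item in block if item[dim] == value]
    return cats.count("A") / len(cats)

print(prop_in_A(0, "1"))  # normal feature (AF5): 0.9
print(prop_in_A(4, "1"))  # exception feature (AF1): 0.4, i.e., 60% in B
```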
To understand this manipulation, imagine that you had a black squirrel living in your back yard. Frequent exposure to this squirrel, assuming that you do not recognize that it is just a single individual, would not only increase your knowledge of the normal shape, size, and behavior of squirrels but would also incorrectly increase the association between black fur and squirrels (given that the vast majority of squirrels are not black). Thus, high frequency of an individual exemplar may have a negative effect on learning some of a category’s features—those that are idiosyncratic to it. We expected that such high-frequency exception features (those presented many times in the “wrong” category) should have low classification accuracy compared with the other features. We also expected analogous effects in classification and typicality ratings of the test items.
What should be expected when categories are related to prior knowledge? It is possible that the effects of frequency will be moderated in this condition. In an experiment manipulating feature frequency rather than item frequency, Murphy and Allopenna (1994, Experiment 2; also, Spalding & Murphy, 1999, Experiment 3) found much smaller effects of frequency on classification and typicality when prior knowledge was relevant. (They used a somewhat unusual category structure without crossover features that did not allow the analysis we describe below.) Here, we predicted similar effects for several different reasons. Because learning is faster when prior knowledge is present, empirical manipulations may have fewer opportunities to change what is learned. Furthermore, prior knowledge might tend to counteract some effects of frequency. In particular, the atypical feature of the frequent exemplar might be less influenced by frequency as it is thematically inconsistent with the rest of the category. For example, perhaps when learning about underwater buildings (though not labeled as such), people might encounter a frequent example that had the exception feature “astronauts live there” (along with four typical features). Although frequency would associate this feature to its incorrect category, people may tend to ignore or downplay the feature because it does not fit with the category theme, thereby weakening the frequency manipulation (see Heit, 1994). Further analysis of the responses to learned items and novel stimuli may also address more subtle effects of prior knowledge.
Method
Subjects
Forty members of the New York University community received course credit for their participation. Nineteen subjects were assigned to the knowledge condition and twenty-one to the no-knowledge condition.
Stimuli
Each subject saw training examples composed of the written features in Table 1. The features were based on the integrated (knowledge) and nonintegrated (no knowledge) feature sets of Murphy and Kaplan (2000), but as we needed additional stimuli, we generated a set of potential additional knowledge-related and knowledge-unrelated dimensions and normed them. Fourteen additional subjects were given lists of features (two values for each dimension, as in Table 1) and were asked how likely each feature would be to be present in buildings that were either “underwater” or “in the air.” For each pair of items, the likelihood of being in the two types of buildings was calculated. Items that had similar likelihood ratings for underwater and aerial buildings were selected as knowledge-unrelated items, while items with a large effect of building type, but with relatively few “impossible” responses, were chosen as knowledge-related items. (Related items: mean effect of building type on 1–4 rating scale = 2.8, proportion of dimension responses deemed impossible = .25; unrelated items: building type = .13, impossible = .04.) Because the norming yielded only four knowledge-related dimensions, we added a fifth, type of research (deep-sea or atmospheric), which was strongly related to the category themes. The ratings subjects made at the end of the experiment (see Procedure) confirmed that this dimension was strongly thematic.
Each training example was a description of a building using all five dimensions, in random order, displayed centered on a computer screen. Table 2 shows the abstract category structure used. The assignment of abstract dimensions, and thus of frequency, to specific building features was rotated across subjects. The first items in each category, A1 and B1, were presented six times per block and so were considered high frequency (HF) items, in contrast to the normal low frequency (LF) items, which appeared once per block. The atypical features of A1 and B1 (the final dimension in the table) were called exception features: each is the feature value typical of the opposite category, and because of the high frequency of A1 and B1, each occurred in its atypical category 60% of the time. The other features were considered normal features and were associated with the correct category 90% of the time.
The abstract transfer stimuli are shown in Table 2. All transfer stimuli were presented once in each test phase. A1–A5 and B1–B5 were the trained items, A0 and B0 were novel prototype items, and AF1–AF5 and BF1–BF5 were single-feature tests.
Design
Half of the subjects were randomly assigned to the knowledge condition and saw items constructed from the features in the top half of Table 1, while the other half were assigned to the no-knowledge condition, and saw items constructed from features in the bottom half of Table 1. The assignment of concrete stimulus dimensions to the abstract category structure was a counterbalance factor with five levels.
Procedure
Subjects were informed that they would be learning new categories but were not told about the frequency or knowledge manipulations. In order to make sure that subjects had comparable experience with the categories, all subjects performed five blocks of training trials, with 20 trials per block. After each block of training, subjects were told their accuracy on that block.
On each trial, the subject pressed a key in response to a prompt, causing an exemplar to appear on the screen. Subjects had 15 s to decide if the item belonged to category Q or category P, pressing the respective keys to indicate their choice. A “Correct” or “Incorrect” message appeared for 1.5 s, followed by the exemplar again, with either a Q or P on the screen to indicate the correct answer. This feedback remained visible for 4 or 8 s to allow study, depending on whether the subject got the trial correct or incorrect, respectively.
There were three test phases. The first phase was performed following the first block of training. Subjects were instructed to categorize the transfer stimuli (Table 2) as quickly and accurately as possible. Subjects were told to expect some new and incomplete items, and to just respond as best they could. The procedure was similar to the training phase, with identical stimulus presentation. After the response, however, the prompt to begin the next trial was immediately displayed, without feedback. The 22 whole and single-feature items appeared in random order. Classification decision was the dependent measure for this first test phase. The second test phase was performed following the completion of the fifth and final block of training. Exactly the same procedure was used as for the first test phase, and RT measures were also collected. For the third test phase, which immediately followed the second test, the same stimuli were used, but following categorization of each item, subjects were asked to evaluate the typicality of the item with respect to the response category on a 1 (entirely atypical) to 7 (very typical) scale. An explanation and example of typicality was provided. Subjects were instructed to respond as accurately as possible for the third phase, without emphasizing speed, and RT measures were not collected. McDowell and Oden (1995; and see Friedman & Massaro, 1998) found no effect on categorical responses when confidence measures were also collected, so classification responses in the second and third phase should be directly comparable.
A final task was a feature rating survey. Subjects rated each of the 20 features (i.e., both their own and the alternative stimulus set; Table 1), indicating how predictive it was of the category themes. The instructions were, “Suppose you were trying to learn about underwater and aerial buildings in the real world (not in the context of this experiment). How useful would it be to be provided with each of the following features?” Possible responses were on a 1–5 scale, with labels “useless,” “not very useful,” “somewhat useful,” “very useful,” and “crucial.” As the results showed the expected effect of feature type (knowledge-related or not) and a small feature frequency effect, but no effect at all of the between-subjects knowledge condition, we do not discuss the survey further.
Results
Subjects in both experimental conditions learned to classify the items well. We defined a learning criterion of better than chance accuracy on LF items in the final block of training. Subjects in the knowledge condition were correct on 88% of their responses, with two subjects failing to reach criterion. Subjects in the no-knowledge condition were correct on 84% of their responses, with one subject failing to reach criterion. These three subjects were excluded from the analyses below.
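The exact statistical cutoff behind the “better than chance” criterion is not specified above; as one hedged illustration, it could be implemented as a one-tailed binomial test on the final block’s LF trials (a sketch under that assumption; the original cutoff may have differed).

```python
# Hedged sketch of a "better than chance" learning criterion; the paper
# does not state the exact test, so a one-tailed binomial test is assumed.
from scipy.stats import binomtest

def reached_criterion(n_correct, n_trials, alpha=0.05):
    # Experiment 1's final block contains 8 LF trials (4 per category)
    return binomtest(n_correct, n_trials, p=0.5,
                     alternative="greater").pvalue < alpha

print(reached_criterion(7, 8))  # True: 7/8 correct exceeds chance
print(reached_criterion(5, 8))  # False: 5/8 is consistent with guessing
```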
Learning phase
Figure 1 shows training accuracy, broken down by knowledge condition and item frequency. Subjects found it considerably easier to learn frequent items, confirming a well-known prior result (repeated-measures ANOVA with knowledge condition and counterbalance as between-subjects factors, and blocks and item frequency as within-subjects factors, F(1, 27) = 78.92 > 4.21, ηp2 = .75). Also as expected, accuracy increased with training block, F(4, 108) = 25.20 > 2.46, ηp2 = .48. There was no significant main effect of the knowledge manipulation, F(1, 27) < 1, but notably, the interaction between knowledge and frequency was significant, F(1, 27) = 6.99 > 4.21, ηp2 = .21. Knowledge helped learning of LF items, but it did not seem to help learning of HF items—or, alternatively, frequency had a greater effect in the no-knowledge condition.
Figure 1.

Experimental learning curves, Experiment 1.
Test phases
The transfer stimuli give insight into the concepts learned by the subjects in each group. As noted above, response preferences (tests 1–3), RT measures (test 2), and typicality ratings (test 3) were collected for each type of transfer item. Any RT more than 2 SDs above a subject’s mean was omitted, and for the whole-item RT tests, only correct responses were included. Table 3 gives means for each test item type, for each test, for subjects in each of the two knowledge conditions.
Table 3.
Test Results, Experiment 1.
Accuracy

| | LF Item | HF Item | Proto. | Norm. SF | Except. SF |
|---|---|---|---|---|---|
| Test 1 (during learning) | | | | | |
| Knowledge | .78 | .86 | .81 | .76 | .78* |
| No Knowledge | .71 | .84 | .89 | .77 | .34* |
| Test 2 (after learning) | | | | | |
| Knowledge | .92 | 1.0 | .97 | .85 | .89* |
| No Knowledge | .86 | 1.0 | .95 | .91 | .39* |
| Test 3 (with ratings) | | | | | |
| Knowledge | .90 | .92 | .94 | .89 | .94* |
| No Knowledge | .85 | .97 | .95 | .91 | .47* |

Reaction Time (ms), Test 2

| | LF Item | HF Item | Proto. | Norm. SF | Except. SF |
|---|---|---|---|---|---|
| Knowledge | 2683* | 2948 | 2438 | 1201 | 1196 |
| No Knowledge | 3640* | 3395 | 2771 | 1281 | 1444 |

Typicality Ratings (signed), Test 3

| | LF Item | HF Item | Proto. | Norm. SF | Except. SF |
|---|---|---|---|---|---|
| Knowledge | 4.08 | 4.67 | 5.89 | 4.74 | 4.89* |
| No Knowledge | 3.63 | 5.50 | 5.42 | 4.95 | −0.05* |
Note. LF = low frequency; HF = high frequency; Proto = prototype; Norm = normal; Except = exception. RTs of correct responses only are given for whole items; all responses for single features.
* Significant simple effect of knowledge (p < .05, Sidak test).
Figure 2 (middle) shows the accuracy of the training items during the three test phases. As these items were the same as those used in training, accuracy in test blocks 2 and 3 would be expected to be similar to the last block of training (Figure 1), and it was. An ANOVA with item frequency, knowledge condition, test block, and counterbalance as factors showed that HF items were responded to more accurately than LF items, F(1, 27) = 29.31 > 4.21, ηp2 = .52, but this effect was reduced when concept-relevant knowledge was present, F(1, 27) = 4.81 > 4.21, ηp2 = .15 for the interaction. Of course, part of this interaction may be due to ceiling effects, as performance on HF items is above 90%. Still, the interaction was numerically obtained on all three tests, and perhaps even crossed over on test 3, supporting a knowledge-driven reduction in the frequency effect. Finally, the ANOVA found a main effect of test number, F(2, 54) = 13.93 > 3.17, ηp2 = .34, reflecting the increase in accuracy with further learning. No other effects were significant, aside from a three-way interaction among test, frequency, and the counterbalance factor, F(8, 54) = 2.32 > 2.12, ηp2 = .26.
Figure 2.

Experimental accuracy on Experiment 1 prototypes, training items and single features during tests. Test 1 was after the 1st block of training, and tests 2 and 3 were following training. Normal items were AF2-AF5 and BF2-BF5, while exception items were AF1 and BF1. Error bars are 95% confidence intervals.
Figure 2 (right) shows the responses to the single-feature tests (see Table 2). For the purposes of analysis, correct responses were determined by type (ignoring frequency), not token, so exception features AF1 and BF1 were counted “correct” if labeled A and B, respectively. AF1 appeared in four A items and one B item, so A was deemed the correct response for the purposes of the analysis, despite AF1 being associated with category A responses only 40% of the time. Given this definition of accuracy, the results show that normal features were responded to more accurately (consistently with type frequency) than were exception features, F(1, 27) = 23.48 > 4.21, ηp2 = .47, and that accuracy was higher for subjects in the knowledge-related condition, F(1, 27) = 14.61 > 4.21, ηp2 = .35. Critically, there was an interaction between these factors, with knowledge eliminating the tendency of subjects to respond to exception features based on token frequency, F(1, 27) = 28.58 > 4.21, ηp2 = .51. Without knowledge, subjects’ choice proportions following training very nearly corresponded to the normal and exception features’ token frequencies of 0.9 (normal) and 0.4 (exception). With knowledge, subjects reversed their response preference and responded to both features consistently with their prior knowledge, apparently showing no sensitivity to frequency.
Finally, Figure 2 (left) shows responses to the category prototype items. Aside from a trend towards an effect of test number, F(2, 54) = 2.69 ≯ 3.17, ηp2 = .09, due to slightly lower accuracy on test 1, no other effects approached significance.
In addition to knowledge’s effects on response preferences, knowledge also affected how subjects made typicality judgments. For each test item in test 3, following collection of the response preference, we collected typicality ratings on a 1 to 7 scale. The raw rating was multiplied by −1 if the response preference was inconsistent with type frequency. For example, if a subject classified feature AF3 as a member of category B and gave it a typicality rating of 4, the signed typicality rating for that subject and item would be −4. Mean signed typicality ratings are shown in Table 3 and Figure 3. For the whole (training) items, HF items were viewed as more typical, F(1, 27) = 14.39 > 4.21, ηp2 = .35, but this frequency effect was marginally moderated with knowledge, F(1, 27) = 3.61 ≯ 4.21, ηp2 = .12 for the interaction. There was no main effect of knowledge condition on typicality ratings for training items, F(1, 27) < 1.
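As a concrete illustration of this signed typicality measure (a minimal sketch in our own notation, not the original scoring code):

```python
def signed_typicality(rating, response, type_correct_category):
    """rating: raw 1-7 typicality; response: chosen category ('A' or 'B');
    type_correct_category: the category implied by type (not token) frequency.
    The raw rating is negated when the response is inconsistent with type."""
    return rating if response == type_correct_category else -rating

# The example from the text: AF3 classified into category B (type-correct
# answer is A) with a raw rating of 4 yields a signed rating of -4.
print(signed_typicality(4, "B", "A"))  # -4
```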
Figure 3.

Typicality ratings (signed) to Experiment 1 test items (test 3). Scores were on a 1–7 scale, multiplied by the consistency of the responses with type frequency (see text). Error bars are 95% confidence intervals.
For the individual features, subjects viewed normal features as much more typical than exception features, F(1, 27) = 14.43 > 4.21, ηp2 = .35, but this effect essentially disappeared when the feature was related to prior knowledge, F(1, 27) = 15.47 > 4.21, ηp2 = .36 for the interaction. This interaction in typicality ratings closely parallels the interaction in response preferences discussed previously, supporting the idea that knowledge can substantially overwhelm the otherwise robust effects of frequency. There was a main effect of knowledge condition as well, in which subjects in the prior knowledge condition rated individual features as more typical than did subjects in the no knowledge condition, F(1, 27) = 7.70 > 4.21, ηp2 = .22.
The reaction time data showed a similar pattern to that of the accuracy data. However, because of the small number of trials and fairly large variance (unsurprising in making judgments of lists of verbal features in a partly between-subjects comparison), few of the effects reached standard levels of significance—here or in Experiment 2. This was particularly true in the analysis of individual features, as there were very few data points for those items. In Experiment 1, the only significant effect was a simple effect of knowledge on LF items, with knowledge speeding responses, F(1, 27) = 6.08 > 4.21, ηp2 = .18. Overall, although RTs showed similar (but non-significant) patterns of knowledge and frequency as did accuracy (Table 3), the amount of data did not allow us to draw strong conclusions based on RTs.
Discussion
The results of this experiment show that exemplar frequency interacts with the presence or absence of prior knowledge. When prior knowledge was associated with the features of a category such that it could be used to aid learning, the advantage of high-frequency items over low-frequency items was substantially reduced. Single-feature test results also shifted dramatically towards knowledge-consistent responses and away from frequency-consistent responses. The most striking result was that when frequency gave misleading evidence about a property (the exception features), learners in the no-knowledge condition classified it into the “wrong” category, but those in the knowledge condition did not. Thus, our results give clear evidence that prior knowledge can reduce or even eliminate the statistical effect of exemplar frequency. This pattern is consistent with our hypothesis that prior knowledge modulates empirical learning, reducing the otherwise robust effects of exemplar frequency.
The results are not as clear as could be desired, however, because the interaction of frequency and knowledge is possibly influenced by a ceiling effect. The critical reversal of subjects’ categorization of the exception features cannot be explained by ceiling effects, but some of the learning accuracy and other test results could be. The effects for these measures all take the form in which the frequency effect is reduced in the knowledge condition, where performance is very high. Although not every result seems susceptible to a ceiling effect explanation (e.g., typicality results shown in Figure 3), we carried out another experiment that was designed to avoid ceiling effects.
Experiment 2
Experiment 2 made a number of small changes in procedure from Experiment 1 in an attempt to reduce ceiling effects. One change was to reduce the exemplar frequency manipulation from 6:1 to 3:1. This should not only reduce the accuracy of the frequent items but should also confirm that the effects do not depend on the presence of exception features. As will be described in detail below, the 3:1 ratio means that exception features are now more associated with their correct category than with their incorrect category, unlike the structure used in Experiment 1. A second change was to the test blocks. As in Experiment 1, test blocks were given to subjects after their first and last (fifth) blocks of training. However, only one post-learning test block was used, and each test block was identical, collecting both response preferences and typicality ratings. Based on the results of Experiment 1, we believed that typicality ratings would provide fine-grained information without troublesome ceiling effects.
Method
Subjects
Forty members of the New York University community received course credit for their participation. Twenty subjects were assigned to the knowledge condition and twenty to the no-knowledge condition.
Stimuli
The stimuli for Experiment 2 were almost identical to the stimuli for Experiment 1 (Table 1). In one small change, the colonial-style and modern furniture were reversed, so that the presence of a hyphen in a feature was not a predictive cue. In another small change, the features “patio” and “porch” were changed to the more evocative “balcony” and “front porch.”
The category structure for Experiment 2 was modified so that the HF items (A1 and B1 in Table 2) were only three times more frequent than the other items. There were several consequences of this change. First, the number of trials per block was reduced from 20 to 14. The total number of training trials was thus reduced from 100 to 70, which might reduce the ceiling effect in the post-learning test. Second, the exception feature is no longer truly exceptional. Whereas in Experiment 1 the crossover features present in the HF items were more frequently associated with the opposite category, in Experiment 2 they are merely less predictive of the correct category. Crossover features are consistent with their category in 8/14 = 57.1% of cases, while normal features are consistent with their category in 12/14 = 85.7% of cases.
Design
The design of Experiment 2 was identical to the design of Experiment 1.
Procedure
The procedure for Experiment 2 was largely identical to that of Experiment 1, with the following small exceptions. As noted above, blocks were now 14 trials long. The 15-s response deadline was removed, so response time was unlimited. Feedback for correct responses was reduced from 4 s to 3 s. Both test phases 1 and 2 included the typicality rating task, while test phase 3 and the feature rating survey were omitted.
Results
We used the same learning criterion of better than chance accuracy on LF items in the final block of training. Subjects in the knowledge condition were correct on 96% of their responses, with all subjects reaching criterion. Subjects in the no-knowledge condition were correct on 89% of their responses, with one subject failing to reach criterion. This subject was excluded from the analyses below.
Learning phase
The accuracy for each block during training is shown in Figure 4. As in Experiment 1, subjects found it considerably easier to learn frequent items, F(1, 29) = 21.62 > 4.38, ηp2 = .43. Also as before, accuracy increased with training block, F(4, 116) = 32.47 > 2.45, ηp2 = .53. Unlike in Experiment 1, however, the knowledge effect was statistically reliable during learning, F(1, 29) = 5.93 > 4.38, ηp2 = .17, while the interaction with frequency was not, F(1, 29) = 1.97 ≯ 4.38. In this experiment, higher frequency and prior knowledge seemed to both independently increase the speed of learning.
Figure 4.

Experimental learning curves, Experiment 2.
Test phases
Response preferences, RT measures, and typicality ratings were collected for both test 1 (after 1 block of training) and test 2 (after all 5 blocks of training). RTs were trimmed as described above. Table 4 gives means for each test item type, for each test, for subjects in each of the two knowledge conditions. In contrast to Experiment 1, where the test blocks showed qualitatively similar results, the two test blocks differed substantially in Experiment 2. We will thus analyze the two blocks separately, starting with test 2 for reasons of exposition.
Table 4.
Test Results, Experiment 2.
Accuracy

| | LF Item | HF Item | Proto. | Norm. SF | Except. SF |
|---|---|---|---|---|---|
| Test 1 (during learning) | | | | | |
| Knowledge | .76 | .95* | .88 | .88 | .83 |
| No Knowledge | .75 | .74* | .92 | .77 | .66 |
| Test 2 (after learning) | | | | | |
| Knowledge | .97 | .98 | 1.00 | .95 | .90* |
| No Knowledge | .91 | 1.00 | 1.00 | .90 | .63* |

Reaction Time (ms)

| | LF Item | HF Item | Proto. | Norm. SF | Except. SF |
|---|---|---|---|---|---|
| Test 1 (during learning) | | | | | |
| Knowledge | 6733 | 5720 | 6212 | 2841 | 3347 |
| No Knowledge | 6481 | 6134 | 5544 | 3174 | 3868 |
| Test 2 (after learning) | | | | | |
| Knowledge | 6200 | 5648 | 5288 | 2235 | 2335 |
| No Knowledge | 6575 | 5152 | 6215 | 2531 | 2914 |

Typicality Ratings (signed)

| | LF Item | HF Item | Proto. | Norm. SF | Except. SF |
|---|---|---|---|---|---|
| Test 1 (during learning) | | | | | |
| Knowledge | 2.58 | 4.03 | 4.49 | 4.22 | 3.34 |
| No Knowledge | 2.35 | 2.78 | 5.00 | 3.28 | 1.55 |
| Test 2 (after learning) | | | | | |
| Knowledge | 4.65 | 4.70* | 6.20 | 5.32 | 5.03* |
| No Knowledge | 4.13 | 5.78* | 5.70 | 4.75 | 1.11* |
Note. LF = low frequency; HF = high frequency; Proto = prototype; Norm = normal; Except = exception. RTs of correct responses only are given for whole items; all responses for single features.
* Significant simple effect of knowledge (p < .05, Sidak test).
Figure 5 (HF and LF Items) shows the accuracy of the training items during the two test phases. As before, in the test phase following training (test 2), test accuracies were about the same as accuracies in the final block of training. HF items were responded to more accurately than LF items, F(1, 29) = 4.74 > 4.38, ηp2 = .14, but this effect was marginally reduced when concept-relevant knowledge was present, F(1, 29) = 3.66 ≯ 4.38, ηp2 = .11, for the interaction. There was no main effect of knowledge in test 2.
Figure 5.

Experimental accuracy on Experiment 2, training items and single features during tests. Test 1 was after the 1st block of training, and test 2 was following training. Error bars are 95% confidence intervals.
Analysis of the single-feature tests in Experiment 2 shows a similar interaction to that seen in Experiment 1 (Figure 5, Normal and Exception Features). In test 2, following learning, subjects in the no-knowledge condition responded to normal and exception features at essentially frequency-matching rates (Ms = .90 and .63, respectively). With knowledge, however, there was little difference between normal and exception features, and responses were consistent with the category (Ms = .95 and .90). An ANOVA found main effects of feature type, F(1, 29) = 6.74 > 4.38, ηp2 = .19, and knowledge, F(1, 29) = 8.16 > 4.38, ηp2 = .22, while the large interaction between the two effects was marginally significant, F(1, 29) = 3.20 ≯ 4.38, ηp2 = .10.
In Experiment 2, unlike in Experiment 1, typicality ratings were collected in both the early and final tests. Mean signed typicality ratings, representing a more continuous measure of response preference than the simple binomial accuracy measures shown above, are shown in Table 4 and Figure 6. In test 2, after training, for the whole-item tests, HF items were viewed as more typical than LF items, F(1, 29) = 8.37 > 4.38, ηp2 = .22, but this frequency effect essentially disappears with knowledge, F(1, 29) = 7.58 > 4.38, ηp2 = .21 for the interaction. Importantly, there is no ceiling effect in this comparison, and the effect of knowledge was to reduce the typicality of high-frequency items. Thus, the effect cannot be explained by knowledge subjects’ high level of responding across the board. There was no main effect of knowledge condition on typicality ratings for training items, F(1, 29) < 1.
Figure 6.

Typicality ratings (signed) to Experiment 2 test items. Scores were on a 1–7 scale, multiplied by the consistency of the responses with type frequency (see text). Error bars are 95% confidence intervals.
After five blocks of learning in Experiment 2, subjects rated individual features much as they did in Experiment 1. As shown in Figure 6, normal features were rated as more typical than exception features, F(1, 29) = 7.30 > 4.38, ηp2 = .20, but this effect essentially disappeared when the feature was related to prior knowledge, F(1, 29) = 5.29 > 4.38, ηp2 = .15. When prior knowledge was applicable, all features seemed more typical, F(1, 29) = 8.94 > 4.38, ηp2 = .24.
The above results from the post-learning test phase of Experiment 2 closely resemble the results from Experiment 1. However, the pattern following just one block of training was strikingly different. Rather than knowledge reducing the strength of the frequency effect, in test 1 of Experiment 2 knowledge dramatically increased the strength of the frequency effect (Figure 5, top-center). There was no reliable main effect of frequency, F(1, 29) = 2.88 ≯ 4.38, but there was a main effect of knowledge condition, F(1, 29) = 4.68 > 4.38, ηp2 = .14, and a marginal interaction between knowledge and frequency, F(1, 29) = 3.47 ≯ 4.38, ηp2 =.11. Comparing the HF items only, the accuracy without knowledge (M = .74) is significantly lower than the accuracy with knowledge (M = .95), F(1, 29) = 5.18 > 4.38, ηp2 = .15. As for accuracy on single-feature tests, while the post-learning test showed an interaction between feature type and knowledge, no such result was seen in test 1 (Figure 5, upper-right). There was a marginal effect of knowledge, F(1, 29) = 3.84 ≯ 4.38, ηp2 = .12, only a weak trend toward an effect of feature type, F(1, 29) = 2.04 ≯ 4.38, ηp2 = .07, and barely a hint of an interaction, F(1, 29) = .25. As will be discussed below, these results seem to suggest that frequency may not play the same role after just one block of training as it does after five blocks.
The typicality ratings likewise showed different patterns early and late in training. While late tests of the trained items found a reduced frequency effect with knowledge (Figure 6, bottom, LF and HF items), the early tests show no such pattern (Figure 6, top). There was a marginal effect of item frequency, F(1, 29) = 4.09 ≯ 4.38, ηp2 = .12, a very weak trend towards an effect of knowledge condition, F(1, 29) = 1.68 ≯ 4.38, and no interaction, F(1, 29) < 1. We observed a similar difference in the typicality ratings for the single features, with a large interaction following training (Figure 6, bottom, Normal and Exception features), but no interaction early in training (Figure 6, top). There were marginal effects of feature type, F(1, 29) = 3.86 ≯ 4.38, ηp2 = .12, and knowledge, F(1, 29) = 2.81 ≯ 4.38, ηp2 = .09, but no interaction between the two factors, F(1, 29) < 1. Once again, the early test results are inconsistent with the later test results.
Discussion
The post-learning results of Experiment 2 support our conclusions from Experiment 1. Prior knowledge reduces, and in some cases eliminates, effects of frequency on both well-trained items and on single features of those items. For whole item response accuracy, a substantial frequency effect was eliminated when knowledge could be used. Even more strikingly, when typicality ratings are used to avoid ceiling effects in accuracy, the same effect occurs, with a large difference in typicality ratings without knowledge becoming no difference at all with knowledge. In another substantial effect, a parallel to the reversal of response preference to exception features in Experiment 1 was observed here. Without knowledge, subjects seem to frequency-match single feature tests, but with knowledge, frequency has very little effect.
This consistency was not observed during early stages of learning. After a single block of training, most of the observed effects were not yet evident, and in fact seemed often to be reversed. For example, while well-trained item typicality ratings were sensitive to frequency without knowledge (change in ratings, d = 1.8), and insensitive to frequency with knowledge (d = .04), weakly-trained item typicality was insensitive to frequency without knowledge (d = 0.1), but quite sensitive to frequency with knowledge (d = 1.4). In Experiment 1, the patterns of results in the different test stages were broadly consistent. Why is the first test block different here?
One factor is that the first test in the present experiment occurred even earlier in training than the first test in Experiment 1. Recall that we reduced the frequency manipulation in Experiment 2 in order to reduce ceiling effects. This also reduced the number of trials in each block, and therefore test 1 occurred after only 14 items had been viewed (versus 20 in Experiment 1).
As the modeling results will reveal (see next section), one explanation of the ultimate pattern of results is that prior knowledge serves to reduce frequency effects by boosting the activation of thematically consistent features and reducing the activation of inconsistent features, such that infrequent features receive some activation even when they are not presented. Of course, such an effect can only take place once the thematic knowledge that relates the features is detected. Thus, if some of the subjects have not identified what the category themes are after 14 trials, then they cannot reveal the predicted interaction. Additionally, subjects must identify the varying dimensions of the stimuli and figure out the pairs of features for each dimension, a process that may be facilitated by prior knowledge. Any hypotheses about these early stages of category learning are necessarily quite speculative, however, as little is known about the processes by which representations of this sort of stimuli are initially formed. Further research will be necessary to understand how knowledge and frequency interact at the early stages of category learning.
Modeling and Analysis: KRES Model
The Knowledge Resonance (KRES) model of category learning was designed to account for a wide variety of category learning and categorization data. In particular, it is one of only a few computational models of category learning that can take into account the effects of prior knowledge (see also Heit & Bott, 2000). KRES is an interactive activation model of categorization (McClelland & Rumelhart, 1981), with error-driven training by Contrastive Hebbian Learning (O’Reilly, 1996) and prototype-like representations of categories (although see Harris & Rehder, 2006, for an exemplar-based variation of KRES). The model has been used to account for how knowledge affects learning rate, reaction times, the classification of features not related to knowledge, and the integration of conflicting prior and empirical knowledge (Rehder & Murphy, 2003).
The interactive activation properties of KRES give it very different dynamics from the typical connectionist networks used to model category learning. Each node is connected to other nodes by bidirectional connections, which slowly change activations over many time steps. The stimulus representation is added to the activation of the input nodes, so other influences can change or even override the presented values. For example, if no information about a dimension is provided, the features of that dimension (usually two) initially have equal, moderate activation. However, top-down influences from activated category response nodes can cause those feature nodes to become more or less activated, so that the network settles into the state most consistent with the experience encoded in the weights. Such mutual influences on activation are fundamental properties of KRES networks.
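A minimal sketch of these settling dynamics (our illustration, not the authors’ implementation; the update rule and parameter names are simplified assumptions):

```python
import numpy as np

def settle(W, external, alpha=1.0, rate=0.1, steps=200):
    """Settle a network of bidirectionally connected nodes.
    W: symmetric weight matrix (lateral, bottom-up, and top-down links);
    external: stimulus input per node (zero for unspecified dimensions);
    alpha: sigmoid response sharpness (varied in the simulations below)."""
    act = np.full(len(external), 0.5)      # unspecified features start neutral
    for _ in range(steps):
        net = W @ act + external           # recurrent influence plus stimulus
        target = 1.0 / (1.0 + np.exp(-alpha * net))
        act += rate * (target - act)       # drift slowly toward equilibrium
    return act
```

With knowledge, positive lateral weights in W among thematically related feature nodes let a presented feature raise the activation of its unpresented neighbors, which is the property exploited by the KRES/F variant described next.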
Importantly, KRES can implement prior knowledge in different ways. In one way (Figure 7, left), which we call KRES/F to indicate a path between features and category nodes, prior knowledge is represented by lateral excitatory connections among features of the input representation. Features that are consistent with prior knowledge spread their activation to other related features, increasing their activation. (These connections are assumed to represent previously learned conditional frequencies, or prior instruction, or other sorts of information in long-term knowledge.) The KRES/F model is a single route model, where knowledge affects representations and learning but does not have independent associations with responses.
Figure 7.

The KRES/F model used in the simulations (left) and the KRES/FK model (right). There are fixed inhibitory connections (open circles) between each pair of input nodes (I), between the two output nodes (O), and between the two prior knowledge nodes (P; KRES/FK only). With knowledge, there are fixed excitatory connections (filled circles) among all related elements of an input layer (KRES/F), or between input nodes and a prior knowledge node (KRES/FK). All input nodes are connected to both output nodes with trainable connections (dotted lines), as are the two prior knowledge nodes (KRES/FK). Without knowledge, the models are identical.
Alternatively, KRES can have prior concept nodes that are connected to the input features and that, with training, become associated directly with response nodes (Heit & Bott, 2000). In this model, which we call KRES/FK to indicate both feature and knowledge-node connections to response nodes, there are two routes to response selection (Figure 7, right). As our earlier work has suggested that the type of prior concept may determine the specific way that knowledge affects learning (Harris & Rehder, 2006), we have chosen for this project to focus on the KRES/F model. The knowledge of aerial and underwater buildings used in the experiment is not in the form of prior concepts, but instead seems to involve interrelations among stimulus features, reflecting knowledge that, for example, astronauts are more likely to perform atmospheric research than deep-sea research. In fact, there are no existing categories of underwater and aerial buildings with the features we have attributed to them.
Finally, we note that as a prototype model, with no representation of exemplars, the KRES/F model treats each input as an independent novel stimulus, and represents frequency only as learning-induced differential patterns in its weights.
Experiment 1
To examine the behavior of the model, we first set it up to learn the category structure of Experiment 1, under the same training regimen that the experimental subjects followed. The model has four free parameters: the learning rate, a measure of node response sharpness called alpha (fixed at 1.0 in Rehder and Murphy, 2003, but varied here), and the strengths of the fixed inhibitory and excitatory connections. Larger values of alpha force activations to be nearer 0 or 1, larger inhibitory connections push pairs of nodes toward more nearly opposite activations, and larger excitatory connections push knowledge-related nodes toward more similar activations. We performed a parameter space survey over a region of the parameter space that produced nondegenerate results. For each of the 720 parameter settings sampled, the average accuracy of the model (over five replications) was computed and compared with the empirical results (combining test 2 and test 3 data) using a root mean squared scaled deviation (RMSSD) goodness-of-fit metric (Schunn & Wallach, 2005). The RMSSD is the root mean squared deviation between the model and the data, with each deviation scaled in units of that data point’s standard error. RMSSD thus demands tighter fits to measures with small standard errors and tolerates weaker fits to measures with large standard errors.
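In symbols, for model predictions m_i, data means d_i, standard errors se_i, and k data points, RMSSD = sqrt((1/k) Σ ((m_i − d_i)/se_i)²). A small sketch of the metric as described above (our code, not the original):

```python
import numpy as np

def rmssd(model, data, se):
    """Root mean squared scaled deviation (Schunn & Wallach, 2005):
    each model-data deviation is scaled by that point's standard error."""
    model, data, se = map(np.asarray, (model, data, se))
    return float(np.sqrt(np.mean(((model - data) / se) ** 2)))

# A value below 1 (e.g., the 0.69 reported below) means the model misses
# the data points by less than one standard error on average.
```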
Figure 8 (left) shows the results of the best-fitting model, overlaid on top of the empirical accuracy data. The model was successful at fitting the 10 data points with four parameters, usually within the 67% confidence intervals of the empirical data (plotted), and in all but one case within the 95% confidence intervals. The RMSSD was 0.69, with l.r. = 0.05, alpha = 1.5, inhib. wt. = −1.5, and excit. wt. = 0.125. (As a grid search was used, rather than a parameter optimization approach, slightly better fits are likely possible.) In addition to the strong quantitative fit, KRES shows the following empirically-observed qualitative properties: (1) higher accuracy on HF items than LF items, which (2) is reduced with knowledge; (3) high accuracy on novel prototype items; and (4) a very large difference between normal and exception single features without knowledge, which (5) nearly disappears in the presence of prior knowledge.
Figure 8.

Best fit of the KRES model to the data from Experiments 1 (tests 2 and 3) and 2 (test 2). Error bars (standard error of the mean, or 67% confidence intervals) are shown for the empirical data.
The key empirical result was the interaction between knowledge and frequency (Figure 2). In the experiment, frequency increased performance without knowledge but not when knowledge was present. Likewise, the normal features showed a relatively small classification change with knowledge, while the exception features showed a very large change. The best fit of the model likewise showed an increase with knowledge in LF accuracy (.87 to .90) and a smaller change in HF accuracy (.96 to .97), as well as a relatively small change in normal feature classification (.86 to .92) compared with the huge knowledge effect on exception feature responses (.43 to .89).
Although a good fit such as this one is compelling support for a model, a model that can fit any arbitrary pattern of data by changes in parameter settings says relatively little about the underlying processes. Recent work has argued that a successful model should have, in addition to a good quantitative fit, a qualitative fit that is based on the model’s architecture rather than on carefully-tuned parameters (Pitt, Kim, Navarro, & Myung, 2006; Pitt & Myung, 2002).
Here, one of the most important qualitative patterns was the reduction in the magnitude of the frequency effect when knowledge was present. Figure 9 (left) shows a scatter graph of the frequency effect on whole items (HF accuracy minus LF accuracy), with and without knowledge, across the parameter space of the model, along with the empirical result. Across its parameter space, KRES shows an interaction in which the frequency effect is larger without knowledge than with knowledge. This pattern matches the result found in our experiments. Figure 9 (right) shows an analogous graph for the single feature tests. The model tends to show very large differences for different feature types (normal feature accuracy minus exception feature accuracy) without knowledge, but much less or no difference with knowledge, matching the empirical result.
Figure 9.

Scatter graph of the magnitude of the effect of item frequency and single feature type on Experiment 1 response preferences in the knowledge and no-knowledge conditions for the KRES model over a wide range of parameter settings. Error bars (standard error of the mean, or 67% confidence intervals) are shown for the empirical data.
Figure 9 shows that the model tends (in nondegenerate areas of the parameter space) to account for the qualitative empirical effects, and does not account for a number of possible empirical effects that could have been observed but were not. The model is not so flexible that it can account for any arbitrary pattern of empirical results simply by a change in parameters. Its good quantitative fit thus supports the model’s architecture as an explanation of the processes being investigated by the experiment, as discussed further below.
Experiment 2
To further investigate the performance of the KRES model on this task, we fit the model to the task of Experiment 2. For the purposes of modeling, the only difference between the two experiments was that the strength of the frequency variation changed from 6:1 to 3:1. Otherwise, the procedure was the same. The model was run over a wide variety of parameter settings to get a qualitative understanding of the model’s performance, and the RMSSD metric was used to find parameters that fit the test 2 results well.
The best result from this coarse fitting procedure was found with parameters l.r. = .03, alpha = 3, inhib. wt. = −0.7, and excit. wt. = 0.3. The RMSSD measure was 0.84, indicating that the average error was less than the SEM of the data. Following all five blocks of training, each of the major qualitative patterns seen in Table 4 was observed in the KRES model (see Figure 8, right).
The model showed a frequency effect on trained items that was reduced in the presence of prior knowledge (from .97 − .93 = .04 to .97 − .95 = .02), and likewise showed the reduced effect of feature type on the single-feature tests when prior knowledge was present (from .89 − .75 = .14 to .97 − .97 = 0). These results confirm that KRES can account for the interacting effects of frequency and prior knowledge over a range of frequency differences.
The results of simulating the first test, following the first block of training, were somewhat different. Recall from above (and Figure 5) that early in training, we found an effect of frequency on trained items only with knowledge, not without knowledge, contrary to the test 2 effects. The KRES model does not show this result, instead just showing the same pattern as the late test, but with lower accuracy overall. For example, while our participants responded equally accurately to LF and HF trained items without knowledge, KRES showed a frequency advantage. Additionally, knowledge increased both HF and LF response accuracies in KRES by roughly equal amounts. The model’s performance on the single feature tests also differed from our data. While we observed single-feature tests to be nearly as accurate as tests of the trained items, the model was considerably less accurate on single-feature tests.
This pattern of results, with the model fitting relatively well late in training but very poorly early in training, suggests that KRES does not capture all aspects of the learning process, especially the initial phases of learning. It may be, for example, that people’s representations of dimensions, features, and stimuli are not coherent early in learning, which KRES, with its hand-constructed representations, cannot capture. However, KRES’s success in quantitatively fitting Experiment 1’s test results and the late phase of Experiment 2 does suggest that an understanding of the model can provide some insight into the processes that may lead to the knowledge-frequency interactions.
Analysis
The KRES model was able to account for the data by using bidirectional connections, error-driven learning, and knowledge represented as lateral connections among features. The basic frequency effect—faster learning and more accurate responses to HF items—appears to be a result of error-driven learning: The weights between input features and category labels are more frequently updated by errors on HF items than by errors on LF items, pulling the prototype represented by the weights towards the HF items (Barsalou et al., 1998). The high test accuracy on the (untrained) prototype items is due to the prototype architecture of the model. The accuracy on single feature tests likewise follows from the prototype representations of the model, with the normal features having strong associations with the category labels but the exception feature having only weak associations.
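The frequency mechanism can be made concrete with a deliberately simplified delta-rule learner. This is not KRES’s learning rule (KRES learns over a recurrent network), only an illustration of the shared error-driven principle that more frequent items contribute more weight updates. The HF pattern 11110 is taken from Figure 10; the LF pattern below is illustrative.

```python
import numpy as np

lr = 0.03                              # learning rate (value from the fits above)
hf = np.array([1., 1., 1., 1., 0.])    # HF exemplar 11110
lf = np.array([1., 1., 1., 0., 1.])    # an LF exemplar (structure illustrative)
w = np.zeros(5)                        # feature-to-label association weights

for _ in range(20):                    # blocks of training
    for item, reps in ((hf, 6), (lf, 1)):   # 6:1 presentation ratio (Expt. 1)
        for _ in range(reps):
            pred = 1 / (1 + np.exp(-w @ item))  # predicted category probability
            w += lr * (1.0 - pred) * item       # delta rule; teacher signal = 1

# Features unique to the HF item receive many more updates than features
# unique to the LF item, so the weight vector (the learned "prototype")
# is pulled toward the HF item.
print(np.round(w, 2))
```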
When knowledge is added, the picture becomes more complicated. The knowledge-frequency interaction could be due to several aspects of the model, including changes in activation due to lateral connections, changes in activation due to top-down feedback, potential dynamic processes in activation, and changes in weights due to shifts in learning. It is important to understand what aspect of the model yields the observed behavior. To address this, we consider the activations of the input nodes, which (through recurrent connections and KRES’s constraint-satisfaction processes) represent the combined influence of all aspects of the model.
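To give a flavor of what settling to a steady state involves, the sketch below runs a two-node network with a single excitatory lateral connection to equilibrium. This is a schematic fixed-point iteration, not KRES’s actual update equations; the 0.3 connection weight and the update rate are illustrative.

```python
import numpy as np

def settle(ext_input, W, steps=200, rate=0.1):
    """Iterate activations toward a steady state under recurrent input.
    ext_input: external (stimulus) input to each node.
    W: matrix of lateral/recurrent connection weights."""
    a = np.zeros_like(ext_input)
    for _ in range(steps):
        net = W @ a + ext_input
        target = 1 / (1 + np.exp(-net))  # squash net input into (0, 1)
        a += rate * (target - a)         # move activations toward target
    return a

# Two input nodes joined by an excitatory lateral ("knowledge") link.
W = np.array([[0.0, 0.3],
              [0.3, 0.0]])
# Only the first node receives external input, but at equilibrium the
# second node's activation is raised above its 0.5 resting level by the
# lateral connection, as happens for knowledge-consistent features.
print(settle(np.array([2.0, 0.0]), W))
```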
Figure 10 illustrates the input nodes of a typical model with the best-fit parameters, following training. Each rectangle represents a node’s activation when presented with the specified input pattern. For example, the upper-left boxes represent the five “1” input nodes (I1-1 to I5-1 in Figure 7; the five “0” input nodes are not shown), when presented with pattern “11110” (the HF item). The rightmost box is the exception feature. The height of each box indicates the activation of that input node once the network has settled into a steady state. The width of each box represents the learned weight between that node and the correct category response node. The area of each box thus represents the signal that the input node provides to the response node. The total area, which is the total weighted input to the category node from these units, is in the column on the right.
Figure 10.
Activation of KRES input nodes, following learning, for a typical run of the model. The first three rows represent whole test items; the bottom two rows represent single-feature tests. Each rectangle represents a “1” input node, with the height proportional to the equilibrium activation of the node and the width proportional to the weight between the node and the correct response node. The area of each rectangle thus represents the contribution of the input node to the response node’s activation. The number to the right is the sum of the areas, which equals the input nodes’ total contribution to response-node activation and is related to response probability. Key comparisons are outlined.
The key result of this visualization is that the total input provided to the category node is determined mostly by the learned weights, and only minimally by changes in input-node activations. Consider the first two rows of the no-knowledge column. Since the exception feature is only weakly associated with the response, the value of that feature has only a weak effect on categorization. However, when knowledge is available (left column), the weight is considerably larger, and the change in total weighted activation is much less. The change in input-node activation (height) due to knowledge is almost imperceptible and has little effect on the response. A similar pattern occurs with the single-feature tests (bottom two rows). Without knowledge, the exception feature is weakly weighted and contributes little to the category-node activation. With knowledge, the exception feature is weighted more strongly, and there is little difference in total weighted activation. Note that here the effects of lateral and top-down feedback are more prominent. Without knowledge, top-down feedback yields higher activation of the missing features in the Normal feature case, which in turn yields higher overall weighted activation. With knowledge, this same effect occurs to some extent, but now both Exception and Normal features have higher activation, due to the lateral connections.
From this visualization and analysis, we can conclude that the effects of knowledge on test accuracy are due primarily to different patterns of learned weights and only secondarily to test-time effects of resonance and constraint satisfaction. But why are the exception features weighted more strongly when knowledge is present in the network? The answer does have to do with resonance and constraint satisfaction. In early stages of learning, the lateral excitatory knowledge connections tend to increase the activation of all input nodes when other input nodes are active. The exception feature in a stimulus like 11110 tends to be more strongly activated with knowledge (i.e., rather than 0, the last dimension takes on a positive value), which results in larger weight changes. The increase in activation can be seen in the top row of Figure 10, where the height of the rightmost rectangle is considerably greater with knowledge. The effect on overall weighted activation is minimal, but the effect on learning, accumulated across many trials, is far more substantial.
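The learning-side consequence is easy to see in any error-driven rule in which the weight update is gated by the input node’s activation. In the toy sketch below (a simplification; the activation value of 0.3 is illustrative), a feature whose activation is 0 never changes its weight, while even a modestly activated feature accumulates weight across trials.

```python
import numpy as np

lr, trials = 0.03, 100
for a_exc in (0.0, 0.3):   # exception-feature activation without vs. with
    w = 0.0                # lateral excitation from knowledge
    for _ in range(trials):
        pred = 1 / (1 + np.exp(-w * a_exc))  # predicted category probability
        w += lr * (1.0 - pred) * a_exc       # update is gated by activation
    print(a_exc, round(w, 3))
# With activation 0.0 the weight never changes; with activation 0.3 it
# grows steadily (to roughly 0.4 after 100 trials).
```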
General Discussion
The experimental results described here show a strong interaction between an important structural property of a category, exemplar frequency, and an important external factor in concept learning, the content of the categories. When thematic prior knowledge was relevant to the concept being learned, the learning advantage for HF items over LF items was greatly reduced. Likewise, in post-learning tests, knowledge improved classification of the LF items, while it had little effect on HF items. Knowledge never completely eliminated the frequency effects on the trained items, however, suggesting that categorization decisions are influenced by both sorts of information. An analogous pattern of results was seen with the single-feature tests. Without knowledge, subjects frequency-matched associations between features and category responses, but with prior knowledge, responses became consistent with that knowledge and inconsistent with frequency.
One important sign of progress in the field of category learning has been the creation of a number of explicit computational models that account for a wide variety of empirical data. However, until recently there have been few models that account for the effects of prior knowledge on category learning. The KRES class of models is one exception, and a notable result of the current study is KRES/F’s ability to account for the learning data presented here. This success arose because the KRES architecture allows both prior knowledge and empirical information to make their influence felt. While some other models of prior knowledge (e.g., Pazzani’s, 1991, PostHoc model) have viewed prior knowledge as biases over a set of candidate rules, the data clearly require a model that can represent probabilistic associations of features with category labels. KRES/F naturally represents this sort of frequency information while at the same time accounting for the profound effect that prior knowledge can have on category learning.
Further research is required to identify more precisely the representations of the prior knowledge involved in category learning and the nature of the interaction between that knowledge and the regularities inherent in observed category members. In this regard, we find Heit’s (1994) distinction between distortion and integration models helpful in delineating the space of possibilities. In a distortion model, prior knowledge works to alter, or distort, a learner’s representation of an input stimulus. In an integration model, prior knowledge and an empirical learning component each make an independent contribution to the categorization decision. In this light, KRES/F can be considered a kind of distortion model, because prior knowledge works (via constraint satisfaction) to re-represent the input in a manner that is more consistent with prior knowledge. Lateral knowledge connections in the model alter the stimulus representations during processing, strengthening the activation of knowledge-consistent feature nodes and weakening the activation of knowledge-inconsistent ones. These changed representations yield changed association weights, which strongly affect response patterns. In contrast, the other variant of KRES we described, KRES/FK (see Figure 7), acts more like an integration model, because the prior knowledge nodes and feature nodes each have their own mostly independent influence on the category labels. (Another example of an integration model is Heit and Bott’s, 2000, Baywatch model, a feedforward network that incorporates connections to category label nodes from both prior knowledge nodes and feature nodes.)
It may be, however, that neither a distortion model nor an integration model is correct in an absolute sense, but rather that the appropriate model depends on the type of prior knowledge involved. For example, our decision to model the current data with KRES/F was based on prior experimental work demonstrating that the category themes (underwater and aerial buildings) did not correspond to concepts that were already familiar to university students (Murphy & Allopenna, 1994). Although our modeling effort was successful, the category structure tested in the current study was not designed to discriminate between different classes of models (e.g., distortion vs. integration). Future tests could aim to show that only one particular way of combining prior knowledge with empirical information (integration, distortion, or some other possibility) adequately accounts for the observed learning performance; a complete test would require comparing different kinds of knowledge, such as thematic knowledge versus pre-existing concepts.
Progress is already being made in conducting more sensitive tests. For example, testing categories that most people were familiar with (shy person, frequent traveler, college graduate, etc.), Heit (1994) found that an integration model provided the best account (although also see Heit, 1998). Similarly, Harris and Rehder (2006) found that an integration model (specifically, a version of KRES elaborated with exemplar nodes) provided a better account of learning both linearly and nonlinearly separable categories that corresponded to familiar concepts (Wattenmaker et al., 1986). We expect that these and new studies, combined with model testing methodology that has been applied successfully in the past, will shed new light on the details of how prior knowledge influences category learning.
By showing how the structural properties of new categories interact with prior knowledge about the content of those categories, we have supported a view of category learning in which prior knowledge can be a complex and nontrivial factor. By using the KRES model of category learning to account for the experimental data, we have made progress in understanding how those complexities might be realized and how the old and new representations involved in categorization and category learning affect each other.
Acknowledgments
This work was supported by NIMH grant MH41704 to Gregory L. Murphy and NIH NRSA grant F32MH076452 to Harlan D. Harris. Thanks to May Bakir, David Levine, and Danielle Blinkoff for collection of the experimental data.
Footnotes
i. We report the results of statistical tests by comparing the statistic in question with the critical value of that statistic assuming p = .05, and by providing the partial eta-squared (ηp2) measure of effect size.
ii. Coding accuracy by token frequency would result in the same interaction, except that the knowledge group would then have lower accuracy on the exception items.
iii. Signed typicality ratings are more appropriate than typicality ratings that ignore the classification. A rating of 3 is very different depending on whether a subject has classified an item into the correct vs. incorrect category. Someone who rates an item as a 1 in the incorrect category is “more correct” than someone who rates it as a 7, which is reflected in −1 being a higher score than −7.
iv. A parallel analysis is to compare the effects of knowledge for each type of test. Like the data, KRES tends to show a positive knowledge effect on LF items, a weaker or nonexistent knowledge effect on HF items, a very strong knowledge effect on exception features, and a weaker or nonexistent knowledge effect on normal features.
References
- Barsalou LW. Ideals, central tendency, and frequency of instantiation as determinants of graded structure in categories. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1985;11:629–654. doi: 10.1037//0278-7393.11.1-4.629.
- Barsalou LW, Huttenlocher J, Lamberts K. Basing categorization on individuals and events. Cognitive Psychology. 1998;36:203–272. doi: 10.1006/cogp.1998.0687.
- Friedman D, Massaro DW. Understanding variability in binary and continuous choice. Psychonomic Bulletin & Review. 1998;5:370–389.
- Harris HD, Rehder B. Modeling category learning with exemplars and prior knowledge. In: Proceedings of the 28th Annual Conference of the Cognitive Science Society. Lawrence Erlbaum Associates; 2006.
- Heit E. Models of the effects of prior knowledge on category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1994;20:1264–1282. doi: 10.1037//0278-7393.20.6.1264.
- Heit E. Influences of prior knowledge on selective weighting of category members. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1998;24:712–731. doi: 10.1037//0278-7393.24.3.712.
- Heit E, Bott L. Knowledge selection in category learning. In: Medin D, editor. Psychology of learning and motivation. Vol. 39. Academic Press; 2000. pp. 163–199.
- McClelland JL, Rumelhart DE. An interactive activation model of context effects in letter perception: Part I. An account of basic findings. Psychological Review. 1981;88:375–407.
- McDowell BD, Oden GC. Categorical decision, rating judgments, and information preservation. University of Iowa; Iowa City: 1995. Unpublished manuscript.
- Mervis CB, Catlin J, Rosch E. Relationships among goodness-of-example, category norms, and word frequency. Bulletin of the Psychonomic Society. 1976;7:283–284.
- Murphy GL, Allopenna PD. The locus of knowledge effects in concept learning. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1994;20:904–919. doi: 10.1037//0278-7393.20.4.904.
- Murphy GL, Kaplan AS. Feature distribution and background knowledge in category learning. The Quarterly Journal of Experimental Psychology. 2000;53A:962–982. doi: 10.1080/713755932.
- Nosofsky RM. Similarity, frequency, and category representations. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1988;14:54–65.
- Novick LR. At the forefront of thought: The effect of media exposure on airplane typicality. Psychonomic Bulletin & Review. 2003;10:971–974. doi: 10.3758/bf03196560.
- O’Reilly RC. Biologically plausible error-driven learning using local activation differences: The generalized recirculation algorithm. Neural Computation. 1996;8:895–938.
- Pazzani MJ. Influence of prior knowledge on concept acquisition: Experimental and computational results. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1991;17:416–432.
- Pitt MA, Kim W, Navarro DJ, Myung JI. Global model analysis by parameter space partitioning. Psychological Review. 2006;113:57–83. doi: 10.1037/0033-295X.113.1.57.
- Pitt MA, Myung IJ. When a good fit can be bad. Trends in Cognitive Sciences. 2002;6:421–425. doi: 10.1016/s1364-6613(02)01964-2.
- Rehder B, Murphy GL. A knowledge-resonance (KRES) model of category learning. Psychonomic Bulletin & Review. 2003;10:759–784. doi: 10.3758/bf03196543.
- Rosch E, Mervis CB. Family resemblances: Studies in the internal structure of categories. Cognitive Psychology. 1975;7:573–605.
- Schunn CD, Wallach D. Evaluating goodness-of-fit in comparison of models to data. In: Tack W, editor. Psychologie der Kognition: Reden und Vorträge anlässlich der Emeritierung von Werner Tack. Saarbrücken, Germany: University of Saarland Press; 2005. pp. 115–154.
- Spalding TL, Murphy GL. What is learned in knowledge-related categories? Evidence from typicality and feature frequency judgments. Memory & Cognition. 1999;27:856–867. doi: 10.3758/bf03198538.
- Wattenmaker WD, Dewey GI, Murphy TD, Medin DL. Linear separability and concept learning: Context, relational properties, and concept naturalness. Cognitive Psychology. 1986;18:158–194. doi: 10.1016/0010-0285(86)90011-3.

