The Role of Linguistic Labels in Inductive Generalization

Wei (Sophia) Deng; Vladimir M Sloutsky

doi:10.1016/j.jecp.2012.10.011

Author manuscript; available in PMC: 2014 Mar 1.

Published in final edited form as: J Exp Child Psychol. 2012 Dec 25;114(3):432–455. doi: 10.1016/j.jecp.2012.10.011

PMCID: PMC3570606 NIHMSID: NIHMS431586 PMID: 23270793

Abstract

What is the role of linguistic labels in inductive generalization? According to one approach, labels denote categories and differ from object features, whereas according to another approach, labels start out as features and may become category markers in the course of development. This issue was addressed in four experiments with 4- to 5-year-olds and adults. In Experiments 1-3, we replicated Yamauchi & Markman’s findings (1998, 2000) with adults and extended the paradigm to young children. In Experiment 4, we compared effects of labels to those of highly salient visual features. Overall, results of experiments provide strong support for the idea that early in development, labels function the same way as other features, but they may become category markers in the course of development. A related finding is that whereas categorization and induction may be different processes in adults, they seem to be equivalent in young children. These results are discussed with respect to theories of development of inductive generalization.

Induction, or generalizing knowledge from known to novel, is a critical component of learning and cognition: induction enables us to apply learned knowledge to new situations. Some examples of inductive generalization include (a) inferring a property of a novel item given that a known item has this property or (b) inferring a category of a novel item given category membership of a known item. The former is referred to as projective induction and the latter as categorization. The term induction is often used to refer to both projective induction and categorization (Sloutsky & Fisher, 2004a).

Induction may have humble beginnings: it has been well established that induction appears early in development (Gelman & E. Markman, 1986; Mandler & McDonough, 1996; Sloutsky & Fisher, 2004a). There is also much evidence demonstrating that even early in development linguistic labels may affect inductive generalization (Gelman & E. Markman, 1986; Sloutsky & Fisher, 2004a; Sloutsky, Lo, & Fisher, 2001; Welder & Graham, 2001). However, the mechanism underlying the role of labels in early induction is hotly debated. Do labels start out as category markers (i.e., symbols denoting the category), or do they start as features and potentially become category markers in the course of development? In what follows, we consider both possibilities in greater detail.

Putative Mechanisms Underlying Effects of Labels on Generalization

Some researchers have argued that from early in development children expect linguistic labels (primarily in the form of count nouns) to mark categories (Waxman & Markow, 1995) and facilitate inductive generalization (e.g., Gelman, 2003; Welder & Graham, 2001). According to this view, a common label suggests common category (i.e., if two items are called “dog”, they are likely to belong to the same kind), whereas common category suggests that the items may share multiple properties. Therefore, when performing induction, people may first use a category label to identify the category the entity belongs to and then generalize properties of that entity to other members of the target category. For example, in a series of experiments, Gelman and E. Markman (1986) presented young children with triads consisting of a target and two test items. One test item shared the label with the target, but looked dissimilar from it, whereas the other test item looked similar to the target, but had a different label. Children were informed that one test item had a particular hidden property (e.g., “hollow bones”) and the other test item had a different hidden property (e.g., “solid bones”), and asked to decide which hidden property the target had. The results indicated that children were more likely to base their inference on the common label than on perceptual similarity (but see Sloutsky & Fisher, 2004a, Experiment 4, for diverging evidence and counterarguments). This and similar findings have been interpreted as evidence that children’s induction is based on category membership, which is denoted by a particular label.

There is also evidence that count nouns are more likely to guide induction than other word forms. For example, Gelman and Heyman (1999) reported that young children were more willing to generalize properties of a person from one context to another when the person was referred to by a count noun (e.g., “carrot-eater”) than when referred to by a descriptive sentence (e.g., “likes to eat carrots”).

These findings, however, do not lend unequivocal support to the idea that words are category markers. For example, some researchers suggested that the contribution of linguistic labels is driven by attentional rather than conceptual factors (Napolitano & Sloutsky, 2004; Sloutsky & Napolitano, 2003). There is also evidence that labels contribute to the overall similarity of compared entities (Sloutsky & Lo, 1999; Sloutsky & Fisher, 2004a) and thus to both categorization and induction. In one experiment using items that had been previously used by Gelman and E. Markman (1986), Sloutsky and Fisher (2004a) demonstrated that similarity computed over labels and appearances can accurately predict young children’s responses, whereas a model that assumes reliance only on labels fails to predict children’s performance. Proponents of this view have also argued that early in development, labels may function like other features (e.g., shape, color, size, etc.), although they may become category markers as a result of development (Deng & Sloutsky, 2012; Sloutsky & Fisher, 2004a; Sloutsky & Lo, 1999; Sloutsky, Lo, & Fisher, 2001; Sloutsky, 2010).

In short, according to one approach, labels start out as category markers: even early in development they denote categories, and as such, they differ from other features. According to the other approach, labels may become category markers as a result of development, whereas early in development labels do not qualitatively differ from other features.

Experimental Distinction between Labels-as-Features and Labels-as-Category-Markers

In an attempt to distinguish between labels being features and category markers, Yamauchi and A. Markman (1998, 2000) developed an innovative paradigm potentially capable of settling the issue. The paradigm is based on the following idea. Imagine two categories A (labeled “A”) and B (labeled “B”), each having four binary dimensions (e.g., Size: large vs. small, Color: black vs. white, Shape: square vs. circle, and Texture: smooth vs. rough). The prototype of Category A has all values denoted by “1” (i.e., “A”, 1, 1, 1, 1) and the prototype of Category B has all values denoted by “0” (i.e., “B”, 0, 0, 0, 0). There are two inter-related generalization tasks: categorization (referred to as “classification” by the authors) and projective induction (referred to as “inference”). The goal of classification is to predict category membership (and hence the label) on the basis of presented features. For example, participants are presented with all the values for an item (e.g., ?, 0, 1, 1, 1) and have to predict category label “A” or “B”. In contrast, the goal of inference is to predict a feature on the basis of category label and other presented features. For example, given an item (e.g., “A”, 1, ?, 1, 0), participants have to predict the value of the missing feature. A critical manipulation that could illuminate the role of labels is the “low-match” condition. For low-match inference, participants were presented with an item “A”, ?, 0, 1, 0, 0 (which had more features in common with the prototype of Category B, but label “A”) and asked to predict the missing feature. For low-match classification, participants were presented with an item “?”, 1, 0, 1, 0, 0 (which again had more features in common with the prototype of Category B) and asked to predict the missing label.

These researchers argued that if the label is just a feature then performance on the low-match classification and inference tasks should be symmetrical. However, if labels are more than features and are treated as category markers, then predicting a label when features are provided (i.e., a classification task) should elicit different performance from a task of predicting a feature when the label is provided (i.e., an inference task). Specifically, category-consistent responding should be more likely in low-match inference tasks (where participants could rely on the category label) than in low-match classification tasks (where participants had to infer the category label).
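The contrast between the two accounts can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the model used by Yamauchi and Markman or by the present authors: the function names, the unit weight given to the label, and the simple count of matching features are all hypothetical choices. The example item is the low-match inference item described above (“A”, ?, 0, 1, 0, 0), with the missing feature coded as None.

```python
from typing import Optional, Sequence, Tuple

# Prototype A has every feature set to 1; prototype B has every feature set to 0.

def count_matches(features: Sequence[Optional[int]]) -> Tuple[int, int]:
    """Return (matches with prototype A, matches with prototype B) over known features."""
    known = [f for f in features if f is not None]
    matches_a = sum(known)
    return matches_a, len(known) - matches_a

def infer_by_label(label: str) -> int:
    """Label as category marker: the missing feature takes the labeled category's value."""
    return 1 if label == "A" else 0

def infer_by_overlap(label: str,
                     features: Sequence[Optional[int]],
                     label_weight: float = 1.0) -> Optional[int]:
    """Label as feature: the label adds label_weight (an assumed value) to its category's overlap."""
    matches_a, matches_b = count_matches(features)
    score_a = matches_a + (label_weight if label == "A" else 0.0)
    score_b = matches_b + (label_weight if label == "B" else 0.0)
    if score_a == score_b:
        return None  # a tie: this sketch makes no prediction
    return 1 if score_a > score_b else 0

# Low-match inference item from the text: label "A", first feature missing.
low_match_item = [None, 0, 1, 0, 0]
marker_prediction = infer_by_label("A")                     # follows the label
feature_prediction = infer_by_overlap("A", low_match_item)  # follows overall overlap
```

Under these assumptions the two accounts diverge on the low-match item: the category-marker rule predicts the value of prototype A, whereas the overlap rule predicts the value of prototype B, because three of the four known features match B and the label contributes only one unit toward A.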

Upon finding predicted asymmetries between the two conditions, these researchers concluded that category labels differed from other features (see also Rehder et al., 2009, for supporting eye tracking evidence). These findings have been replicated in a series of follow-up studies (see A. Markman & Ross, 2003, for a review) and have been successfully modeled (see Love et al., 2004).

What is at Stake: Why is the Difference between Labels-as-Features and Labels-as-Category-Markers Important?

Why is understanding the role of labels early in development important? We believe that there are at least two reasons. First, this understanding is necessary for identifying the mechanism of early generalization and its change in the course of development, and this knowledge, in turn, may elucidate more general principles of cognitive development. In particular, if labels function as features, they contribute to generalization in a bottom-up manner (by contributing to the featural overlap among the compared items), whereas if they are category markers they may guide the process in a top-down manner (by triggering a search for overlapping features). Each of these possibilities has long-ranging consequences for our understanding of cognitive development. If from early in development language exerts top-down influences on category learning, then even early in development, the lower-level processes (such as discrimination and generalization) are subject to top-down control. Therefore, the ability to exert top-down control, as well as cognitive and neural mechanisms that sub-serve this ability, has to exhibit early onset. Alternatively, if words acquire the ability to guide cognition in the course of development, then top-down control does not have to exhibit early onset and could be itself a product of development.

And second, the role that labels play in generalization may elucidate relationships between categorization and induction. Note that some researchers have argued that the two tasks are functionally equivalent for adults (e.g., Anderson, 1991) and children (e.g., Sloutsky & Fisher, 2004a), whereas others have argued that the tasks are functionally different (see Markman & Ross, 2003, for a review). If the tasks are equivalent, then representations formed in the course of classification and inference training should be equivalent as well. In contrast, if the tasks are functionally different, then classification and inference training should result in different representations. For example, Markman and Ross (2003) presented an extensive argument regarding potential differences in representations between classification and inference training and presented evidence supporting this distinction in adults (see also Hoffman & Rehder, 2010, for eye tracking evidence; Love et al., 2004, for a computational model). However, if early in development labels function as features, then classification and inference tasks should be equivalent, which, in turn, suggests that extensive differences observed between classification and induction in adults are a product of development. We return to this issue in the General Discussion section.

Present Research

Yamauchi and Markman’s paradigm has been successfully applied for examining the role of labels in adults’ generalization and could be applied for examining possible developmental changes in the effect of labels on generalization. Does the asymmetry between low-match classification and low-match inference characterizing adults’ performance also characterize children’s performance? Finding such an asymmetry would suggest that labels play a similar role across development, indicating that even for young children labels are more than features. However, as argued above, it is possible that labels function differently across development: whereas labels may function as category markers in adults, they may function as perceptual features in young children. If this is the case, then unlike adults, children may exhibit symmetrical performance in low-match classification and low-match inference.

The reported experiments were designed to address these issues. In Experiments 1-3, we replicated Yamauchi & Markman’s findings with adults and extended the paradigm to young children. In Experiment 4, we compared effects of labels to those of highly salient visual features.

EXPERIMENT 1

The goal of Experiment 1 was to (1) replicate Yamauchi and A. Markman’s (2000) findings with adults and (2) examine the role of labels in early generalization by extending the paradigm to young children. Similar to Yamauchi and Markman (2000), participants learned two categories of creatures and then were given classification and inference trials, half of which were high-match and half were low-match. There were small procedural differences between the current procedure and the one used by Yamauchi and Markman (2000). Most importantly, in contrast to Yamauchi and Markman (2000), where labels were presented as a single written word, labels in Experiment 1 were presented auditorily in a carrier phrase (e.g., “This is a Flurp”).

Based on Yamauchi and A. Markman’s (2000) results, we expected that adult participants would make category-consistent responses in low-match inference, but not in low-match classification. This finding would be consistent with the idea that adults treat labels as category markers. Finding such an asymmetry in young children would support the idea that even for children labels are more than features.

EXPERIMENT 1A

Method

Participants

Participants were 12 adults (3 women) and 12 preschool children (M = 56.0 months, range 51.9-59.4 months; 7 girls). In this and all other experiments reported here, children were recruited from childcare centers, located in middle-class suburbs of Columbus, Ohio and tested in a quiet room in their preschool by a female experimenter. All adults were undergraduate students from the Ohio State University participating for course credit.

Materials

In all reported experiments, materials were colorful drawings of artificial creatures measuring 17.0 cm by 23.5 cm (see Figure 1). The items had five features varying in color and shape and formed two categories determined by feature values. Artificial labels (“Flurp” or “Jalet” printed above each creature) were used to refer to the categories.

Figure 1. Stimuli examples from two categories used in Experiments 1-3. F = Flurp; J = Jalet. F0 and J0 are prototypes of each category and F1/J1-F5/J5 are individual exemplars.

As shown in Tables 1-2, the two categories have a family-resemblance structure, which is derived from two prototypes (F0 and J0) by modifying the values of one of five features (see Figure 1). For example, stimulus F1 has four features consistent with the prototype F0 and one feature (i.e., antenna) consistent with the prototype J0. The degree of similarity between a test stimulus and the prototype is defined by the number of features of the test stimulus that match the prototype of the corresponding category (see Tables 1-2).

Table 1.

Category structure used in learning in Experiments 1-4.

Flurp							Jalet
Stimuli	Head	Body	Hands	Feet	Antenna	Label	Stimuli	Head	Body	Hands	Feet	Antenna	Label
F1	1	1	1	1	0	1	J1	0	0	0	0	1	0
F2	1	1	1	0	1	1	J2	0	0	0	1	0	0
F3	1	1	0	1	1	1	J3	0	0	1	0	0	0
F4	1	0	1	1	1	1	J4	0	1	0	0	0	0
F5	0	1	1	1	1	1	J5	1	0	0	0	0	0
F0	1	1	1	1	1	1	J0	0	0	0	0	0	0

Note. The value 1 = any of five dimensions identical to "Flurp" (see Figure 1). The value 0 = any of five dimensions identical to "Jalet" (see Figure 1). F = Flurp; J = Jalet. F0 and J0 are prototypes of each category.

Table 2.

A. Structure of testing stimuli in Classification used in Experiments 1-3.
Flurp								Jalet

Stimuli	Head	Body	Hand	Feet	Antenna	Target Label	Match	Stimuli	Head	Body	Hand	Feet	Antenna	Target Label
F11	1	1	1	1	0	?	High	J11	0	0	0	0	1	?
F12	1	1	1	0	1	?		J12	0	0	0	1	0	?
F13	1	1	0	1	1	?		J13	0	0	1	0	0	?
F14	1	0	1	1	1	?		J14	0	1	0	0	0	?
F15	0	1	1	1	1	?		J15	1	0	0	0	0	?

F21	1	0	1	0	0	?	Low	J21	0	1	0	1	1	?
F22	0	1	0	1	0	?		J22	1	0	1	0	1	?
F23	0	0	1	0	1	?		J23	1	1	0	1	0	?
F24	1	0	0	1	0	?		J24	0	1	1	0	1	?
F25	0	1	0	0	1	?		J25	1	0	1	1	0	?

B. Structure of testing stimuli in Induction in Experiments 1-3.
Flurp								Jalet

Stimuli	Head	Body	Hand	Feet	Antenna	Target Label	Match	Stimuli	Head	Body	Hand	Feet	Antenna	Target Label

F11	1	?	1	1	0	1	High	J11	0	?	0	0	1	0
F12	1	1	?	0	1	1		J12	0	0	?	1	0	0
F13	?	1	0	1	1	1		J13	?	0	1	0	0	0
F14	1	0	1	1	?	1		J14	0	1	0	0	?	0
F15	0	1	1	?	1	1		J15	1	0	0	?	0	0

F21	?	0	1	0	0	1	Low	J21	?	1	0	1	1	0
F22	0	?	0	1	0	1		J22	1	?	1	0	1	0
F23	0	0	?	0	1	1		J23	1	1	?	1	0	0
F24	1	0	0	?	0	1		J24	0	1	1	?	1	0
F25	0	1	0	0	?	1		J25	1	0	1	1	?	0

Note. High and low are two levels of feature match. F = Flurp; J = Jalet. Category-consistent responses were the ones consistent with the values indicated in the target features and target labels.

There were two levels of similarity: high-match and low-match. In the high-match condition, each test stimulus had four features in common with the prototype of the corresponding category and one feature in common with the prototype of the contrasting category. In the low-match condition, each test stimulus had two features in common with the prototype of the corresponding category and three features in common with the prototype of the contrasting category.
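The two levels of feature match can be checked directly against the Flurp test items listed in Table 2A. The following is a minimal sketch: the feature vectors are copied from the table, whereas the function and variable names are illustrative assumptions rather than anything used in the study.

```python
# Feature order follows Table 2: Head, Body, Hand, Feet, Antenna.
FLURP_PROTOTYPE = (1, 1, 1, 1, 1)  # F0
JALET_PROTOTYPE = (0, 0, 0, 0, 0)  # J0

def matching_features(stimulus, prototype):
    """Count the features on which a stimulus agrees with a prototype."""
    return sum(s == p for s, p in zip(stimulus, prototype))

# Flurp classification test items copied from Table 2A.
HIGH_MATCH_FLURPS = {
    "F11": (1, 1, 1, 1, 0),
    "F12": (1, 1, 1, 0, 1),
    "F13": (1, 1, 0, 1, 1),
    "F14": (1, 0, 1, 1, 1),
    "F15": (0, 1, 1, 1, 1),
}
LOW_MATCH_FLURPS = {
    "F21": (1, 0, 1, 0, 0),
    "F22": (0, 1, 0, 1, 0),
    "F23": (0, 0, 1, 0, 1),
    "F24": (1, 0, 0, 1, 0),
    "F25": (0, 1, 0, 0, 1),
}

# Matches with the item's own (Flurp) prototype and with the contrasting one.
high_own = {name: matching_features(s, FLURP_PROTOTYPE) for name, s in HIGH_MATCH_FLURPS.items()}
low_own = {name: matching_features(s, FLURP_PROTOTYPE) for name, s in LOW_MATCH_FLURPS.items()}
low_other = {name: matching_features(s, JALET_PROTOTYPE) for name, s in LOW_MATCH_FLURPS.items()}
```

Every high-match item shares four features with F0, and every low-match item shares two features with F0 and three with J0, so a response based purely on feature overlap would assign the low-match Flurp items to the Jalet category.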

Design and Procedure

All experiments reported here had a two (Test Condition: Classification vs. Inference) by two (Feature Match: High vs. Low) within-subjects design, and the procedure consisted of two phases, training and testing. The experiments were administered on a 17-inch computer monitor and controlled by E-prime 2.0 software. Classification and Inference test trials were presented in blocks and the order of the blocks was counterbalanced. The order of test trials within each block was randomized for each participant.

All experiments started with the training phase. At the beginning of training both adults and children were instructed that there were two groups of creatures, “Flurps” and “Jalets.” They were then presented with creatures (one at a time, with 4000 ms per item), each accompanied by a category label presented auditorily in a carrier phrase (e.g., “This is a Flurp”). The carrier phrase was pre-recorded and presented by a computer. The phrase had the same onset as the beginning of the trial, with the total duration of approximately 1800 ms. There were 36 training trials (i.e., 18 Flurps and 18 Jalets) and this part of the experiment lasted for approximately 3-4 minutes. No participant response was required during this phase.

The testing phase, consisting of 92 trials (12 with feedback and 80 without feedback), was administered immediately after the training phase (see Figure 2 for examples of testing trials). Half of these trials were Classification and half were Inference, with each trial presented in a self-paced manner. The Classification and Inference testing conditions differed in what participants had to predict. On Classification trials, participants predicted the label of an item, given information about all five features (they were instructed that they would be presented with creatures and they would need to decide whether the creature was a Flurp or a Jalet). On Inference trials, participants predicted a missing (i.e., covered) feature, given the other four features and the label (they were instructed that they would be presented with creatures with a covered body part and they would have to decide which body part was under the cover). For both children and adults, this part of the experiment lasted for approximately 14-15 minutes.

Figure 2. Examples of Classification and Induction test trials in Experiments 1-4. A. On classification trials, participants were presented with stimuli and asked whether the item was a Flurp or a Jalet. B. On induction trials, participants were presented with stimuli and asked which body part was under the cover.

The procedures were identical for both adult and child participants except for the way the instructions and test questions were presented and the data were recorded. Adults read instructions and the test questions on the computer screen and responded by pressing an appropriate key on the keyboard. For children, all instructions and questions were presented by a female experimenter, who recorded children’s verbal responses by pressing the corresponding key on the keyboard.

To familiarize participants with the testing task, yes/no feedback was given on 12 test trials: the first six test trials of each testing condition, Classification and Induction (all of these were high-match trials). In this and other experiments reported here, children were above 85.2% accuracy on these trials and adults were above 68.5%, all above chance, ps < .05. No feedback was given on the remaining 80 testing trials (40 in each testing condition, half high-match and half low-match) and only these trials were used in the reported analyses. The proportion of responses consistent with the category from which the exemplar was derived (called “category-accordance responses” by Yamauchi & A. Markman, 2000) was the dependent variable.
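The dependent variable can be illustrated with a short sketch that tallies category-consistent responses per design cell. This is an assumed scoring scheme written for illustration, not the authors' analysis script: the record layout, the field names, and the example responses are all hypothetical, and each response is coded by the category it is consistent with.

```python
from collections import defaultdict

def proportion_category_consistent(trials):
    """Map (condition, match) -> proportion of category-consistent responses."""
    tallies = defaultdict(lambda: [0, 0])  # [consistent count, trial count]
    for trial in trials:
        cell = (trial["condition"], trial["match"])
        # A response is category-consistent when it agrees with the category
        # from which the exemplar was derived.
        tallies[cell][0] += int(trial["response"] == trial["source_category"])
        tallies[cell][1] += 1
    return {cell: hits / total for cell, (hits, total) in tallies.items()}

# Hypothetical responses, invented for illustration only (not data from the study).
example_trials = [
    {"condition": "classification", "match": "low", "source_category": "Flurp", "response": "Flurp"},
    {"condition": "classification", "match": "low", "source_category": "Flurp", "response": "Jalet"},
    {"condition": "inference", "match": "high", "source_category": "Jalet", "response": "Jalet"},
    {"condition": "inference", "match": "high", "source_category": "Flurp", "response": "Flurp"},
]
proportions = proportion_category_consistent(example_trials)

# Trial counts stated in the text: 12 feedback trials plus 80 scored trials,
# i.e., 20 scored trials in each of the 2 x 2 (condition x match) cells.
FEEDBACK_TRIALS = 12
SCORED_TRIALS_PER_CELL = 20
TOTAL_TEST_TRIALS = FEEDBACK_TRIALS + 4 * SCORED_TRIALS_PER_CELL
```

With 20 scored trials per cell, each participant contributes four proportions, one for each combination of testing condition and feature match.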

In addition, a memory check was administered after the main experiment to examine whether participants remembered the two categories after completing all the tasks. There were five memory check trials, with participants being presented with stimuli randomly generated from the training structure (see Table 1). On each trial, participants were asked to recall the corresponding label of each stimulus. Children and adults exhibited memory accuracy of 91.7% and 78.0%, respectively. One adult answered fewer than three out of five memory check questions correctly and these data were excluded from the analysis.

Results and Discussion

The main results are presented in Figure 3. As can be seen in the figure, adults exhibited equivalent performance in the high-match condition (i.e., no differences between Classification and Inference), whereas there was a marked difference in the low-match condition. In particular, adults were more likely to produce category-consistent responding in the low-match Inference condition than in the low-match Classification condition. In contrast, young children produced high levels of category-consistent responding only in the high-match conditions, whereas this was not the case in the low-match conditions.

Figure 3. Proportion of category-consistent responses by feature match and testing condition in Experiment 1A. Note. Error bars represent standard error of the mean.

Note that all experiments that involved different age groups (Experiments 1-3) revealed a significant 3-way (i.e., Age × Testing Type × Feature Match) interaction (all Fs > 4.1, ps < .06, η_p² > .176). To interpret the interaction, we conducted separate 2 (Testing Type: Classification vs. Induction) by 2 (Feature Match: High vs. Low) within-subjects ANOVAs for each age level.

For adults, there was a significant testing type by feature match interaction, F(1, 10) = 24.63, MSE = 0.32, p = .001, η_p² = .711. A paired-samples t-test indicated that in the high-match condition there were no differences between Inference and Classification, t(10) = 0.30, p = .772, whereas in the low-match condition participants were more likely to make category-consistent responses in the Inference than in the Classification condition, t(10) = 3.89, p = .003, d = 1.71.

For children, there was a main effect of feature match, F(1, 11) = 43.56, MSE = 1.33, p = .001, η_p² = .798, with participants being more likely to provide category-consistent responses in the high-match than in the low-match condition. There was also a main effect of testing type (there were more category-consistent responses in the Classification than in the Inference condition), F(1, 11) = 14.77, MSE = 0.16, p = .003, η_p² = .573, which was different from adults. This main effect may reflect the advantage of Training-Testing correspondence (recall that participants in this experiment were trained by classification) and we will come back to this issue in Experiments 1B and 3.

There were several important differences between children and adults. First, in contrast to adults, for children there was no significant interaction between testing type and feature match, F(1, 11) = 0.02, MSE = 0.00, p = .90. Second, unlike adults who were above chance in relying on category information in low-match inference, one-sample t(10) = 2.66, p = .024, d = 0.80, young children performed significantly below chance in relying on the label to predict missing features; they relied instead on the overall similarity, one-sample t(11) = 4.47, p = .001, d = 1.29. And finally, in contrast to adults, children’s performance in low-match Inference did not exceed low-match Classification. In fact, the opposite was true – children were somewhat more likely to generate category-consistent responses in low-match Classification than in low-match Inference, paired-sample t(11) = 2.32, p = .041, d = 0.85.
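The one-sample comparisons against chance can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis code: chance is taken to be .50 (two response options), Cohen's d is assumed to be the mean difference from chance divided by the sample standard deviation, and the scores passed to the function are hypothetical. Under that definition d equals t divided by the square root of n, which is consistent with the one-sample values reported above.

```python
import math
import statistics

def one_sample_t_and_d(scores, chance=0.5):
    """Return (t, d) for a one-sample test of scores against a chance level."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    t = (mean - chance) / (sd / math.sqrt(n))
    d = (mean - chance) / sd       # assumed definition of Cohen's d
    return t, d

def d_from_t(t, n):
    """Recover the one-sample d from t and sample size under the assumed definition."""
    return t / math.sqrt(n)

# Hypothetical proportions of category-consistent responses (illustration only).
hypothetical_scores = [0.2, 0.3, 0.4, 0.3]
t_value, d_value = one_sample_t_and_d(hypothetical_scores)

# Consistency check against the reported one-sample statistics.
children_d = round(d_from_t(4.47, 12), 2)  # t(11) = 4.47, n = 12
adults_d = round(d_from_t(2.66, 11), 2)    # t(10) = 2.66, n = 11
```

A negative t in this sketch indicates responding below chance, that is, reliance on overall similarity rather than on the label in low-match trials.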

These results extended those of Yamauchi & Markman (2000), suggesting that labels may play a different role for adults and children. In particular, similar to Yamauchi & Markman (2000), adults treated labels differently from other features: in low-match inference they relied primarily on labels. In contrast, children relied on the overall similarity: the proportion of category-consistent responses in low-match Inference was below chance and did not exceed that in low-match Classification, which provided little evidence that for young children labels are category markers.

In sum, the asymmetry between low-match Inference and low-match Classification in adults suggests that for adults labels are processed differently from other features. In contrast, children’s tendency to rely on the overall similarity in both low-match Inference and Classification indicates that young children did not treat category labels differently from other features.

Note that Experiment 1A used only Classification training, whereas participants were tested in both classification and inference. To ensure that the observed effects are not specific to a particular training condition used in Experiment 1A, we conducted Experiment 1B, in which participants were given Inference training.