Influences of High and Low Variability on Infant Word Recognition

Leher Singh

doi:10.1016/j.cognition.2007.05.002

Author manuscript; available in PMC: 2009 Feb 1.

Published in final edited form as: Cognition. 2007 Jun 27;106(2):833–870. doi: 10.1016/j.cognition.2007.05.002

PMCID: PMC2213512 NIHMSID: NIHMS37566 PMID: 17586482

Abstract

Although infants begin to encode and track novel words in fluent speech by 7.5 months, their ability to recognize words is somewhat limited at this stage. In particular, when the surface form of a word is altered, by changing the gender or affective prosody of the speaker, infants begin to falter at spoken word recognition. Given that natural speech is replete with variability, only some of which determines the meaning of a word, it remains unclear how infants might ever overcome the effects of surface variability without appealing to meaning. In the current set of experiments, consequences of high and low variability are examined in preverbal infants. The source of variability, vocal affect, is a common property of infant-directed speech with which young learners have to contend. Across a series of four experiments, infants' abilities to recognize repeated encounters of words, as well as to reject similar-sounding words, are investigated in the context of high and low affective variation. Results point to positive consequences of affective variation, both in creating generalizable memory representations for words and in establishing phonologically precise memories for words. Conversely, low variability appears to degrade word recognition on both fronts, compromising infants' abilities to generalize across different affective forms of a word and to detect similar-sounding items. Findings are discussed in the context of principles of categorization, both of a linguistic and non-linguistic variety, which may potentiate the early growth of a lexicon.

Indisputably, a crucial component of language acquisition involves learning the meanings of words. In the simplest terms, this refers to the process by which learners equate words that they hear with conceptual knowledge. When we consider the potential processes that enable word learning, it may serve us well to introspect on how we learn words in a new language as adults. As adults embarking on learning a second language, we approach the process by storing word-concept associations in memory. However, the fact that we have already successfully acquired one human language confers certain privileges that are unavailable to comparatively naïve infants learning their first language. First, as adults, we can avail of certain assumptions about human languages in general. Specifically, we know that human languages are divisible into words, clauses and phrases, even though speech arrives at our perceptual receptors as a continuous stream of sound. This knowledge compels us to mentally punctuate the signal and generate smaller-sized units that may be easier to parse. Second, we know something about the purpose served by words in a language. Specifically, we know that a word is a subclausal unit that maps onto a concept in our native language, arguably forming the most fundamental unit of the language code employed to impart meaning. Finally, the equation of sound and meaning is made easier for us as we are explicitly aware of the ways in which a particular language expresses meaning, i.e. how changes in sound determine changes in meaning in the language we are attempting to learn. For example, if we opt to learn a tonal language, we know that we need to attend to certain types of pitch change while defining words. This knowledge allows us to pay close attention to changes in sound that affect meaning and less attention to those changes that are not meaningful. 
Therefore, at a very general level, we possess a few basic definitions of a word before even approaching the task of word learning in a second language. This affords us a comfortable degree of predictability about the status and possible form of words in the language we are trying to learn at the outset. By contrast, infants learning their first language lack knowledge of how and why words are relevant to language use. They do not know a priori that human language can and should be parsed into words nor do they know how their particular language chooses to alter the form of a word in order to alter its meaning. Therefore, prior to mapping sound to meaning, young infants must derive certain facts about human language as well as about their own particular language before they can ‘discover’ words in the speech stream.

The process of word discovery in infancy is complicated by two widely documented problems. First, infants must resolve the segmentation problem, whereby they have to separate large-scale units into individual components that correspond to words. This poses a problem because speech, unlike written language, does not unravel with convenient pauses inserted between words. Therefore, infants have to establish the location of word boundaries prior to knowing the meaning of most words. Given that parents produce a negligible percentage of total words in isolation to their children and that the vast majority of their speech consists of multiword utterances (Van der Weijer, 1998), the segmentation problem is one that infants must overcome without assistance from their caregivers. Second, infants must contend with inordinate variability in speech, caused by a ‘lack of invariance’ (Klatt, 1989). It is well known by scholars of human speech perception and architects of automatic speech recognition devices that human speech is a calculus of cues that interacts in notoriously complex and unpredictable ways with human language. While we all have a reliable and stable store of linguistic knowledge that we consult when we produce or perceive speech, the way in which we produce or perceive linguistic units varies in unreliable ways both between and within individual speakers. Within speakers, depending on factors such as a speaker's affective state or distance from the listener, linguistic units will sound physically different across varying contexts. Across speakers, different vocal tract characteristics lead to the units of speech being realized in physically distinct ways based on factors such as gender and voice quality. Even if we consider the smallest units of speech, a phonetic segment can change its form based on surrounding segments as a result of assimilation processes (Gaskell & Marslen-Wilson, 1996; Gow, 2001; 2002). 
As a result, it is incredibly difficult to capture invariant linguistic units amidst the mire of human speech.

Both the segmentation and variability problems conspire to produce a challenge that would seem intractable for young minds. Amidst profound variability and its ensuing indeterminacy, how do infants establish the way in which words should be represented in memory? Painting in broad strokes, this problem amounts to a task of reducing the dimensionality of the input to its linguistically relevant (phonemic) dimensions and categorizing incoming sounds accordingly. However, in order to categorize speech according to dimensions that are phonemic, infants must first know which dimensions of sound are phonemic in their native language. One potential way to arrive at this knowledge is to appeal to meaning: Changes in sound that necessarily accompany changes in meaning are linguistically relevant. However, paradoxically, infants learn to recognize words in fluent speech before they map these words onto meaning with any regularity. In a now seminal study by Jusczyk & Aslin (1995), it was revealed that infants learn to track, encode and recognize repetitions of novel words, with which meaning has not yet been associated, by 7.5 months. In this study, infants were familiarized with repetitions of two words using the Headturn Preference Procedure. They were then exposed to sentences containing those words interspersed with an unfamiliar set of sentences. Infants listened longer to sentences that contained the familiarized words than to unfamiliar sentences, providing the first empirical evidence of speech segmentation and word recognition in infancy. Therefore, months before they are able to understand the meanings of words, infants continuously archive memories of words they hear, perhaps heralding the true point of origin of lexical development.

Even though infants develop word knowledge as early as 7.5 months, there is considerable fragility in their word recognition abilities. Specifically, 7.5-month-old infants appear to encode words in fine phonetic and acoustic detail, which compromises their ability to detect novel instances that are acoustically dissimilar. Consequently, at this stage, infants do not recognize a word spoken by a female if they were trained on an instance of the word spoken by a male (Houston & Jusczyk, 2000) nor do they recognize a word spoken with positive affect if they were trained on the word in neutral affect and vice versa (Singh, Morgan & White, 2004). In a study by Singh et al. (2004), 7.5-month-old infants were familiarized with two words, one in a happy tone of voice and another in a neutral tone of voice. Infants were then tested on their recognition of both words in the context of sentences. As in Jusczyk & Aslin's (1995) study, during the recognition phase, some of the sentences contained the familiarized words and some did not. The difference in infants' listening times to the two types of passages yielded an index of word recognition. One distinction between the design of this experiment and that of Jusczyk & Aslin (1995) was that during the recognition phase, half of the infants heard all passages (groups of six sentences) in happy affect and half of the infants heard all the passages in neutral affect. Results demonstrated that at 7.5 months, infants only recognized happy familiarization words in happy passages and neutral familiarization words in neutral passages, revealing a matching effect. Essentially, infants failed to recognize the same word when it differed in affect between the two phases of the experiment. Over the succeeding three months, infants' ability to recognize dissimilar encounters of the same word appears to improve markedly. 
At 10.5 months, infants succeed in recognizing instances of a word that were mismatched in vocal affect (Singh et al., 2004) or speaker gender (Houston & Jusczyk, 2000) indicating the point at which infants could learn to appropriately generalize to novel, dissimilar tokens of the same word in a linguistically mature fashion (Houston & Jusczyk, 2000; Singh et al., 2004). ¹

The transition from 7.5 to 10.5 months marks a concurrent development in phonetic perception where infants graduate from a universalist to a language-specific frame of reference. While infants begin life with a perceptual apparatus designed to detect phonetic contrast regardless of whether it proves to be phonemic in the native language, this apparatus is pruned over the second half of infancy to selectively appreciate phonetic changes that are phonemic in the native language (Best, 1995; Kuhl, 1996; Werker & Tees, 1984; 1999) at the cost of attending to those contrasts which are not. A rich and detailed literature, inspired by an original study by Werker and Tees (1984) has allowed us to chronicle the development and elaboration of infants' phonological store over the first year of life. This literature has uncovered great linguistic strides made by early learners in the establishment of a native phonological repertoire, which is undoubtedly a necessary prerequisite to acquiring other formal aspects of language. While infants develop a language-specific orientation towards the end of their first year, the exact time course of this transition interacts with a number of other factors, such as the statistical frequency of the phonemes being tested (Anderson, Morgan & White, 2003; Maye, Werker & Gerken, 2002), the relationship of the native language to the contrasts being tested (Best, 1995), and the particular training conditions under which infants are familiarized with phonemes (Maye, et al, 2002).

While it remains unclear exactly how infants learn which contrasts are phonemic, one potential causal mechanism by which infants may establish phonemic boundaries is the statistical frequency with which particular segments occur in the input. In a study designed to assess infants' sensitivities to such frequencies, Maye, et al. (2002) demonstrated an impressive capacity on the part of 8 month old infants to draw phonemic distinctions by capitalizing on distributional cues in the input. In this study, when infants were exposed to phonetic continua that assumed a bimodal distribution, infants formed two phonetic categories; when exposed to phonetic continua assuming a unimodal distribution, they formed a single phonetic category. This provides a compelling causal mechanism by which infants may develop native phonetic categories amidst considerable natural variability in the production of phonemes in the linguistic environment. Therefore, it remains clear that infants undergo crucial developments between 6 and 12 months, during which they appear to develop a native perceptual filter through which incoming phonetic segments are classified. This process may be guided by the distributional profile of particular phonetic segments in the input, revealing the valuable contributions of early plasticity to the discovery of native phonological organization.

This emergence and refinement of native phonetic categories between 6 and 12 months seems to coincide in part with the development of robust word recognition skills between 7.5 and 10.5 months. Therefore, infants, by the end of their first year, have learned to ascribe relevance to the acoustic properties of phonemes and to the acoustic properties of words based on the underlying phonological organization of their language. A primary goal of the current set of studies is to determine what factors may facilitate or impede the process of ascribing relevance to the properties of words and in particular, how the natural variability encountered in typically infant-directed speech may guide such a progression.

Even though infants develop the ability to recognize familiar words in fluent speech at 7.5 months (Jusczyk & Aslin, 1995), they have a strong propensity to retain talker- or context-specific details in memory to their own detriment in these tasks. This tendency appears to result in narrow lexical categories, in which words are defined in terms of both phonetic and acoustic characteristics. Consequently, infants show matching effects and successfully detect only familiarized words that are both phonetically and acoustically similar. Such matching effects are evident regardless of whether similarity is realized by complex constructs such as talker gender and vocal affect, which are generally characterized by a constellation of spectral and temporal changes, or by simpler lexically irrelevant dimensions such as absolute pitch (Singh, White & Morgan, in press). Therefore, at early stages of word recognition, infants appear to fuse phonological and perceptual characteristics, storing in memory highly specific composites. In theory, it would behoove learners to adopt such a conservative approach to early word learning as languages differ in the kinds of acoustic cues they exploit to communicate meaning. Therefore, by design, infants cannot arrive with a prescribed set of rules mandating which acoustic details to consider and which to disregard in structuring a native lexicon. As a consequence, they may cautiously encode all surface details in the event that they may prove lexically relevant. Later, as infants gain more exposure to words in ever increasingly varying forms, they may learn which dimensions of sound co-vary with meaning and which dimensions vary orthogonally to meaning.

In other words, the transition from fragile to robust word recognition observed between 7.5 and 10.5 months may reflect infants' mastery of the interaction of acoustic-phonetic cues and meaning within a given language. However, implicit in this account is the notion that infants view words as lexical items at this later stage. In light of the fact that infants' word recognition skills appear to be robust at 10.5 months, when they may possess only a very modest comprehension vocabulary (Benedict, 1979), it seems unlikely that knowledge of word meanings, in and of itself, strengthens word recognition at this age. It is possible that older infants may develop mature word recognition skills without appealing to meaning at all. Instead, they may succeed at the task based on the type of experience they have accrued with words. Specifically, the diversity of experience they have had with words may assist them in generalizing appropriately across encounters of the same word. Accordingly, they may capitalize on variability along lexically irrelevant dimensions to identify the invariants, which are likely to be germane to lexical identity.

If it is true that older infants exploit variability in the speech stream to learn how to generalize across encounters of words obviating any need to appeal to meaning, why do young infants perform poorly at word recognition in the face of surface variability? One possible reason is that they have simply had less experience with words, and therefore, less opportunity to observe the infinitely varying forms that words can assume. Therefore, given a high degree of inexperience and uncertainty, young infants may take their cues from the conditions of the task. They may assume that all covarying properties are relevant to categorizing the word. Therefore, when they hear a single word repeated in a particular affect, they may assume that both the phonological and affective properties of the word contribute equivalently to its identity. Over the next few months, as they experience a greater diversity of word forms, they may identify the invariant properties of words for their language and focus only on those dimensions of sound when categorizing words. This account circumvents the role of meaning in the maturation of spoken word recognition, and therefore might help to explain how infants possess relatively mature word recognition skills at 10.5 months in the absence of a substantial vocabulary. By this account, just as limited experience with the same word (e.g. only hearing the word in a single type of affect) may lead to narrow categories for that word, it is equally likely that diverse experience with a word (e.g. hearing a word spoken in many different types of affect) may lead to broad categories for that word.

Just as we observe infants' tendencies to over-rely on surface form in early lexical categorization, they show similar tendencies in early cognitive categorization. Analogous studies on early object categorization reveal that infants initially attend to all surface form details, in the event that they may be meaningful (Oakes & Madole, 2003). Later, they learn to disregard those regularities that emerge adventitiously, and only consider those that reliably co-occur (Madole & Cohen, 1995; Needham, Dueker & Modi, 2002). This potential bias to attach relevance to details that reliably co-occur across encounters allows infants to progress from attending to all feature correlations to only those that are consistently supported by their interaction with the world (Madole & Oakes, 1999). Therefore, nascent categories are highly vulnerable to effects of surface form variation across exemplars because it is assumed that similarity in form across encounters is relevant.

Therefore, while the matching effects observed in word recognition tasks completed by younger infants may appear linguistically immature, they may simply reflect immature categories, rather than immature processes. The principles that infants may employ to form these categories may be relatively mature, in the sense that they are based on structural correlations in the input. When the opportunity to derive those correlations is limited, either by having had few encounters with a word and/or having a perceptually narrow range of encounters, infants may rely heavily on structural details reinforced in a particular task. Similarly, the actual categories that infants demonstrate evidence of may be task- and age-dependent, whereas the means by which infants construct categories may remain constant over time. By this account, categories would automatically stabilize as infants cull increasing amounts of information about which characteristics covary reliably and which do not. This invites an overarching question, which underlies one of the most clamorous debates in early cognitive and linguistic development: When forming categories, whether cognitive or linguistic (or both), does the structure reside in the input or the infant?

Turning this question to the challenges of early word recognition, it is possible that previous studies, by familiarizing infants with a word in a single affect (either happy or neutral) or spoken by a single gender (Houston & Jusczyk, 2000; Singh et al., 2004), reduced the scope of any lexical category that infants may have formed during familiarization. If infants had instead been familiarized with each word in a variety of affects, for example, would they have succeeded in forming more generalizable memory representations for words? If infants' emergent lexical categories are in fact dynamic and responsive to experience, one might expect them to update their representations based on the kind of information they receive during the task. Increased variability across instances of a word may indicate to infants that highly varying dimensions are irrelevant to lexical identity and may lead to the formation of more robust representations. This hypothesis is investigated in the following study to determine the extent to which infants' early lexical categories are malleable (yet opportunistically so) early in development.

Intuitively, one might expect increased stimulus variability to hinder categorization in infants by increasing processing load during the task. Given that their ability to categorize words at 7.5 months is already fragile, this increase in load may serve only to usurp valuable resources and disrupt word recognition. However, the current hypothesis is predicated on the notion that the observed fragility in young infants' word recognition abilities is in part ascribable to the fact that their lexical categories are in their inception. The most primitive categories in particular may benefit greatly from increased variability in surface form because naïve learners, more than experienced learners, may turn to cues provided by a task in their search for defining characteristics.

Furthermore, if we consider the privileges and constraints attached to high variability within the context of input to infants, infant-directed speech is replete with affective variation (Fernald, 1993; Trainor, Austin & Desjardins, 2000), suggesting that affectively variable speech may more closely approximate the kind of language infants actually hear than affectively uniform speech. It is highly improbable that children are ever introduced to instances of words only in a single affect, given the characteristic modulations that adults often incorporate into their speech to infants. Therefore, familiarization with a wide range of tokens more realistically simulates the prosodic undulations of a natural mother-infant dyad.

In Experiment 1, 7.5 month old infants were tested on their ability to recognize words in fluent speech amidst high variability in a procedure similar to that of Jusczyk & Aslin (1995) and Singh, et al. (2004). Infants were familiarized with one word in variable affect, another word in constant affect (either positive or neutral) and then tested on their recognition of these words in fluent speech. The recognition stimuli were spoken in constant affect (either positive or neutral), which served as a between-subjects variable. However, the affect of the recognition passages was always mismatched with the affect of the constant word. For example, a particular infant might hear one word in variable affect, one word in positive affect during familiarization and then all recognition passages in neutral affect. Based on the results of Singh et al. (2004), it was expected that the infants would recognize the word they heard in variable affect, owing to the hypothesized advantages of variable affect, yet that they would fail to recognize the word in constant affect as it was mismatched across familiarization and recognition.

Experiment 1

The goal of the present study is to determine whether infant word recognition is aided by the presence of increased variability in surface form. To investigate this, infants were familiarized with one word spoken in several different emotions (variable word) and another word in a single emotion (constant word). They then heard each word in a series of passages interspersed with unfamiliar passages (sentences containing words to which the infant was not familiarized). In this design, the affect of the passages (happy or neutral) always mismatched the affect of the constant word (happy or neutral). From previous studies, it has been established that infants at this age do not recognize words across encounters that contrast in surface form as a result of changes in factors such as affect, talker gender, and pitch (Houston & Jusczyk, 2000; Singh et al., 2004; Singh et al., under review). It was predicted that infants would recognize the word familiarized in variable affect in both types of passages. Furthermore, it was predicted that infants would fail to recognize the other word, given that it was mismatched in affect across the familiarization and recognition phases of the experiment.

Participants

Forty full-term, English-exposed 7.5 month olds participated in the study (22 males and 18 females), recruited from Rhode Island Health Department records. The mean age of participants was 231 days (range = 224 days to 248 days). Eight additional infants were tested and data were discarded because of inattention or technical difficulties.

Stimuli

Stimuli consisted of four monosyllabic words (“bike”, “hat”, “tree” and “pear”) and four passages, each containing six sentences. The test passages and the constant affect familiarization words were identical stimuli to the stimuli used in previous investigations (Singh et al., 2004). To produce the variable affect words, the speaker was asked to produce each word in happy, neutral, sad, angry and fearful affect. Five repetitions in each affect were produced, generating a total of 25 tokens per word. The three most clear and demonstrative tokens for each emotion were selected and formed the stimulus set of 15 variable affect words. All stimuli were produced by a trained theater actor so that the intended emotions were convincingly conveyed. In addition, the speaker was addressing her own infant while producing the stimuli to ensure that they were spoken in an infant-directed register. The stimuli for all studies were produced by the same speaker as in previous studies (Singh, et al., 2004). In addition, the test passages for every study were identical to those used in previous studies (Singh, et al., 2004).

Acoustic analyses were conducted on both the variable and single affect familiarization stimuli and are graphed in Figures 1a and 1b based on the most significant communicators of vocal affect: mean fundamental frequency and fundamental frequency range, which are particularly instrumental in distinguishing positive affect from neutral affect (Banse & Scherer, 1996; Scherer, 1986; Williams & Stevens, 1972). Mean fundamental frequency, fundamental frequency range, maxima and minima are presented in Table 1. As can be seen in Figures 1a and 1b, the variable affect words form an acoustically heterogeneous set relative to the constant affect words for all measures. Figure 1a shows the mean fundamental frequency for each familiarization word for variable, happy and neutral familiarization items and each point represents an individual familiarization token. As Figure 1a shows, the mean fundamental frequencies of the variable tokens are much more broadly distributed than those of each of the happy and neutral tokens. In fact, the range of values spanned by the variable tokens encompasses that spanned by both the happy and neutral groups combined. A comparison of variances using Levene's test for Equality of Variances revealed that the variances of each group of affective words were significantly different from that of the variable words, F(1, 118) = 55.7, p<.0001 (happy and variable) and F(1, 118) = 154.1, p<.0001 (neutral and variable). Similar results were found for fundamental frequency range (displayed in Figure 1b), F(1, 118) = 8.9, p<.01 (happy and variable) and F(1, 118) = 68.45, p<.0001 (neutral and variable). These findings corroborate the patterns depicted in Figures 1a and 1b, that along dimensions related to fundamental frequency, the variable words constituted a broader set of tokens than either of the single affect sets of words. All familiarization stimuli were equated for mean amplitude, duration and vowel length.
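For readers unfamiliar with the variance comparison reported above, the following is a minimal sketch of a mean-centered Levene statistic in plain Python. The function name `levene_f` and the group sizes are illustrative assumptions. The source does not state which variant of the test or which software was used, and the numbers generated in the usage lines are synthetic, not the study's measurements.

```python
import random
from statistics import mean


def levene_f(*groups):
    """Return (F, df1, df2) for a mean-centered Levene test.

    Each group is a sequence of numbers. The statistic is a one-way
    ANOVA F computed on absolute deviations from each group's mean.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of every observation from its own group mean.
    deviations = []
    for g in groups:
        centre = mean(g)
        deviations.append([abs(x - centre) for x in g])

    group_means = [mean(d) for d in deviations]
    grand_mean = mean([z for d in deviations for z in d])

    between = sum(len(d) * (m - grand_mean) ** 2
                  for d, m in zip(deviations, group_means))
    within = sum((z - m) ** 2
                 for d, m in zip(deviations, group_means) for z in d)

    df1, df2 = k - 1, n_total - k
    return (between / df1) / (within / df2), df1, df2


if __name__ == "__main__":
    # Synthetic stand-ins only: 60 values per group, loosely shaped after
    # the means and SDs in Table 1. These are NOT the recorded tokens.
    rng = random.Random(0)
    happy = [rng.gauss(320.45, 48.07) for _ in range(60)]
    variable = [rng.gauss(256.47, 94.47) for _ in range(60)]
    f_stat, df1, df2 = levene_f(happy, variable)
    print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

Two groups of 60 values give the degrees of freedom (1, 118) that appear in the reported statistics, which is why that group size was chosen for the sketch.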

Figure 1a — Mean fundamental frequency for variable, happy and neutral words (measured in Hertz)

Figure 1b — Fundamental frequency range for variable, happy and neutral words (measured in Hertz)

Table 1.

Acoustic Analyses of Words: Means and (Standard Deviations)

	Mean F₀	Min F₀	Max F₀	F₀ range
Happy	320.45 (48.07)	231.89 (43.66)	404.77 (39.21)	172.88 (52.14)
Neutral	158.44 (6.39)	141.64 (9.24)	184.20 (14.17)	42.56 (18.05)
Variable	256.47 (94.47)	153.86 (83.35)	473.45 (105.44)	319.57 (78.98)

Comparisons of the happy and neutral tokens revealed that minimum and maximum F₀ were higher in happy words relative to neutral words, t(59)=9.39, p<.0001 and t(59)=33.58, p<.0001 respectively. Mean F₀ was higher in happy words than in neutral words, t(59)=29.55, p<.0001. Pitch range of happy words also exceeded that of neutral words, t(59)=15.18, p<.0001.

For sentences (see Table 2), happy tokens embedded in sentences had higher F₀ minima and maxima compared with neutral tokens, t(23)=8.96, p<.0001 and t(23)=25.93, p<.0001 respectively. In addition, pitch was higher and more variable in happy sentences relative to neutral sentences, as indexed by F₀ mean and range, t(23)=25.47, p<.0001 and t(23)=21.88, p<.0001 respectively. Finally, analyses revealed a higher proportion of high-frequency energy in happy sentences than in neutral sentences, t(23)=7.40, p<.0001, consistent with previous acoustic profiles of happy and neutral speech at the level of the phrase (Banse & Scherer, 1996; Scherer, 1986). The duration of happy sentences did not differ significantly from the duration of neutral sentences, indicating that speech rate did not differ significantly across affect. In addition, the durations of the target words within the carrier sentences did not differ between happy and neutral passages.

Table 2.

Acoustic Analyses of Words in Sentences: Means and (Standard Deviations)

	Mean F₀	Min F₀	Max F₀	F₀ range
Happy	223.64 (21.52)	145.43 (43.15)	334.57 (18.54)	189.14 (45.2)
Neutral	164.75 (27.48)	104.55 (13.63)	185.75 (5.76)	81.2 (17.64)

Apparatus

Testing was conducted in a three-walled testing booth within a sound-treated testing room. Each wall of the booth was 120 cm wide. A chair was positioned at the open end of the booth where the parent sat with the infant on his/her lap. The infant sat approximately 110 cm from the front of the booth. Yamaha bi-amplified loudspeakers were located behind both side walls of the booth. At the infants' eye level, 86 cm above the floor, a white light was mounted on the front wall. Each of the side walls had a similar blue light at the same level. A Panasonic CCTV video camera (model WV-BP330) was mounted behind the testing booth 12.3 cm above the yellow light. In a separate control room, a Panasonic monitor (WV-5410) was connected to the video camera in the testing booth. The participants were displayed on the monitor in the control room, where the experimenter judged infants' looking, pressing buttons on the mouse of a Windows computer to control the customized experimental software. The computer was equipped with a Sound-Blaster compatible soundboard connected to the amplified speakers. Speech stimuli were set at conversation level (75 dB) using a Digital sound level meter.

Procedure

Infants were tested using the Headturn Preference Procedure (HPP) (Kemler Nelson, Jusczyk, Mandel, Myers, Turk, & Gerken, 1995). The infant was seated on the parent's lap facing the center light. The parent listened to instrumental music over Panasonic headphones to mask the stimuli. Each trial began with the center light flashing until the experimenter judged that the infant fixated on the flashing light. At that point, this light was turned off and one of the side lights began to flash to attract the infant's attention to the side. Side of presentation was randomized across trials, so that all stimuli occurred on both sides. After the infant turned to look at the flashing side light, the speech stimuli for that trial began to play. The sound continued to play and the side light remained on for the duration of the infant's fixation on the light. Each trial continued until the infant looked away for two seconds, or until 20 seconds of looking time had been accumulated during that trial. If the infant looked away, but then looked back within two seconds, the trial continued. If the infant's looking time was below 2 seconds, the trial was repeated with a new randomization of the trial stimuli; otherwise, the procedure advanced to the next trial.
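The trial rules described above can be sketched as a small scoring function. This is an illustrative reconstruction, not the customized experimental software used in the study: the function name, the segment representation, and the assumption that brief look-aways are excluded from accumulated looking time are all hypothetical.

```python
MAX_LOOK_S = 20.0    # trial ends once this much looking has accumulated
LOOK_AWAY_S = 2.0    # a look-away of at least this length ends the trial
MIN_LOOK_S = 2.0     # trials with less looking than this are repeated

def score_trial(segments):
    """Score one trial.

    segments: list of (duration_s, is_looking) tuples in temporal order
              (a hypothetical encoding of the experimenter's button presses).
    Returns (looking_time_s, outcome), where outcome is "advance" or "repeat".
    """
    looking = 0.0
    for duration, is_looking in segments:
        if is_looking:
            looking += duration
            if looking >= MAX_LOOK_S:
                looking = MAX_LOOK_S   # cap at the 20-second maximum
                break
        elif duration >= LOOK_AWAY_S:
            break                      # a long look-away terminates the trial
        # Shorter look-aways: the trial continues (assumed not to count as looking).
    outcome = "repeat" if looking < MIN_LOOK_S else "advance"
    return looking, outcome

# Example: a 1-second glance away does not end the trial; a 2.5-second one does.
print(score_trial([(5.0, True), (1.0, False), (4.0, True), (2.5, False), (3.0, True)]))
```

Under these assumptions, a trial in which the infant looks for only 1.2 seconds before turning away would be flagged for repetition with a new randomization.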

Familiarization began with trials alternating between the two target words. Once the infant had exceeded 30 seconds of looking time with one word, all subsequent familiarization trials presented the alternate word. This modification of the HPP was instituted to ensure that differences in looking times during recognition testing could not be due to different amounts of familiarization with the two target words. When the infant reached 30 seconds of looking time with the second word, the test phase began.
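The familiarization criterion can be illustrated with a short sketch of the word-selection rule. The function and variable names are hypothetical, and treating the criterion as "at least 30 seconds" (rather than strictly more than 30 seconds) is an assumption made for simplicity.

```python
CRITERION_S = 30.0  # accumulated looking time required for each target word

def next_familiarization_word(looking, last_word):
    """Choose the word for the next familiarization trial.

    looking:   dict mapping each of the two target words to accumulated
               looking time in seconds.
    last_word: the word presented on the previous trial (or None).
    Returns the next word, or None when both words have met criterion
    and the test phase should begin.
    """
    remaining = [word for word, total in looking.items() if total < CRITERION_S]
    if not remaining:
        return None               # both words at criterion: start testing
    if len(remaining) == 1:
        return remaining[0]       # only the under-criterion word is presented
    # Both words are still below criterion, so trials alternate.
    others = [word for word in remaining if word != last_word]
    return others[0]

print(next_familiarization_word({"bike": 12.0, "hat": 8.0}, "bike"))
print(next_familiarization_word({"bike": 31.0, "hat": 20.0}, "hat"))
```

The second call shows the modification described above: once one word has met criterion, the other word is presented on every remaining trial, even if it was also presented on the previous trial.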

Recognition testing consisted of four blocks of trials, each block containing one trial with each of the four passages. The order of passages within each block was randomized for each infant. In addition, the order of sentences within passages was also randomized on each trial. The test procedure was similar to the familiarization procedure, except that the side light continued to flash while infants were fixated on the light. As in the familiarization phase, if the infant continued to look at the light for 20 seconds, the trial ended automatically and the next trial began. Similarly, if the infant failed to look at the side light for at least 2 seconds, the trial was automatically repeated. A minimum criterion of 2 seconds was necessary to allow the infant to hear at least one token of the target word in a sentence.
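The block structure of the test phase can be sketched as follows. This is a minimal illustration under stated assumptions: the passage labels are placeholders for the four test passages, and the function name and `seed` parameter are hypothetical conveniences for reproducibility.

```python
import random

def build_test_order(passages, n_blocks=4, seed=None):
    """Return a list of blocks, each a fresh random ordering of all passages."""
    rng = random.Random(seed)
    order = []
    for _ in range(n_blocks):
        block = list(passages)   # copy so the caller's list is not modified
        rng.shuffle(block)       # randomize passage order within the block
        order.append(block)
    return order

# Placeholder labels standing in for the four test passages.
for block in build_test_order(["bike", "hat", "tree", "pear"], seed=1):
    print(block)
```

Each block contains every passage exactly once, so each infant hears each passage four times across the sixteen test trials.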

Target words for each infant were either “bike” and “hat” or “tree” and “pear.” The within-subjects manipulation was the affect of the target words: all infants heard one target word in variable affect and the other in either happy or neutral affect. The affect (variable/constant) of the target word and the particular pair of words infants were familiarized with were counterbalanced across subjects. In the interests of clarity, a summary of the factors manipulated in this and subsequent experiments is displayed in Table 4.

Table 4.

Stimulus Manipulations in Each Experiment

Experiment Number	Familiarization Stimulus 1	Familiarization Stimulus 2	Recognition of Stimulus 1	Recognition of Stimulus 2
Experiment 1	Variable Word	Mismatched Word	Yes	Yes
Experiment 2	Matched Non-Word	Mismatched Non-Word	Yes	No
Experiment 3 Condition 1	Matched Non-Word	Mismatched Word	Yes	No
Experiment 3 Condition 2	Mismatched Non-Word	Matched Word	No	Yes
Experiment 4	Variable Non-Word	Variable Word	No	Yes


During the test phase, infants heard four passages. Half the infants heard all passages in happy affect and half heard all passages in neutral affect. However, the affect of passages was always mismatched with the affect of the constant familiarization word. For example, an infant who heard “bike” in variable affect and “hat” in neutral affect heard all recognition passages in happy affect. Therefore, there was one between-subjects condition, passage affect during recognition (happy or neutral), and one within-subject condition, word affect during familiarization (variable or constant).
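The design constraint described above, that test passages always mismatch the affect of the constant familiarization word, can be made explicit by enumerating the design cells. The sketch assumes a full crossing of word pair, variable-word assignment, and constant affect; the text states only that these factors were counterbalanced, so the crossing and all names below are illustrative.

```python
from itertools import product

WORD_PAIRS = [("bike", "hat"), ("tree", "pear")]
CONSTANT_AFFECTS = ["happy", "neutral"]
MISMATCH = {"happy": "neutral", "neutral": "happy"}  # test affect is the opposite

def design_cells():
    """Enumerate familiarization/test assignments under an assumed full crossing."""
    cells = []
    for pair, variable_index, constant_affect in product(WORD_PAIRS, (0, 1), CONSTANT_AFFECTS):
        cells.append({
            "variable_word": pair[variable_index],       # heard in variable affect
            "constant_word": pair[1 - variable_index],   # heard in a single affect
            "constant_affect": constant_affect,
            "test_passage_affect": MISMATCH[constant_affect],
        })
    return cells

for cell in design_cells():
    print(cell)
```

For instance, the cell with "bike" in variable affect and "hat" in neutral affect is paired with happy test passages, matching the example given in the text.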

Results and Discussion

To quantify infants' recognition of familiarized words, recognition scores were calculated by subtracting infants' looking times to passages containing unfamiliar words from looking times to passages containing familiarized words. Evidence of word recognition was inferred from a recognition score that departed significantly from zero. Although analyses were conducted on recognition scores, raw data are available in Table 5 for each experiment and condition therein. Recognition scores with standard error values are plotted in Figure 2 for happy passages and Figure 3 for neutral passages. For infants who were familiarized with words in neutral and variable affect and tested on recognition of those words in happy passages (see Figure 2) there was a main effect of word affect, F(2,38)=10.55, p<.0001. In this condition, infants listened longer to passages containing words they originally heard in variable affect than to passages containing unfamiliar words, F(1,19)=4.31, p<.05. Eleven of twenty infants looked longer at sentences containing words familiarized in variable affect relative to unfamiliar passages. However, in this condition, infants also showed reduced looking times for words familiarized in neutral affect, even though they were embedded in happy test passages, F(1,19)=6.78, p<.05. Twelve of twenty infants showed this pattern of results. This tendency on the part of infants to inhibit their attention to stimuli presented in neutral affect was observed reliably in pilot testing and in previous studies investigating recognition of words spoken in neutral affect in both 7.5- and 10.5-month-old infants (Singh et al., 2004). Given the reliability with which this reverse pattern of looking time has been observed with stimuli spoken in neutral affect, it is hypothesized that infants inhibit their attention to neutral affect stimuli because infants typically disprefer neutral and negative affect stimuli in speech perception tasks (Fernald, 1993; Kitamura & Burnham, 1998; Singh, Morgan & Best, 2000). By contrast, infants listen selectively to speech spoken in positive affect, perhaps accounting for increased looking times relative to baseline for familiarization items spoken in positive affect. Both reduced and increased looking times relative to baseline are indications of word recognition, but again signify that infants express their recognition of a word in ways consonant with their listening preferences.
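The recognition score is a simple difference, which can be illustrated with the Experiment 1 group means reported in Table 5. The sketch below is illustrative only: the statistical tests were run on per-infant scores, which are not reproduced here, the dictionary keys are hypothetical labels, and the unit (milliseconds) is an assumption based on the 20-second trial maximum.

```python
# Experiment 1 group means from Table 5 (unit assumed to be milliseconds).
exp1_means = {
    "happy_passages":   {"variable": 9866.14, "constant": 6464.26, "unfamiliar": 8482.54},
    "neutral_passages": {"variable": 8154.54, "constant": 9598.72, "unfamiliar": 6257.02},
}

def recognition_score(familiar, unfamiliar):
    """Looking time to familiarized-word passages minus unfamiliar passages."""
    return round(familiar - unfamiliar, 2)

scores = {
    condition: {
        word_type: recognition_score(means[word_type], means["unfamiliar"])
        for word_type in ("variable", "constant")
    }
    for condition, means in exp1_means.items()
}

for condition, by_word in scores.items():
    print(condition, by_word)
```

A positive score indicates longer looking to passages containing the familiarized word, and a negative score indicates shorter looking; on the account given above, either direction is taken as evidence of recognition when the score departs reliably from zero.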

Table 5.

Mean Looking Times in Milliseconds (S.E.) for Each Experiment

Experiment Number	Type of Word in Passage	Happy Test Passages	Neutral Test Passages
Experiment 1	Variable Word	9866.14(911.2)	8154.54(1098.3)
	Mismatched Word	6464.26(610.25)	9598.72(1373.54)
	Unfamiliar Passages	8482.54(662.24)	6257.02(761.25)
Experiment 2	Matched Non-Word	8068.25(647.71)	5858.64(362.94)
	Mismatched Non-Word	6528.28(372.94)	6662.56(642.84)
	Unfamiliar Passages	5496.65(329.15)	7052.58(475.95)
Experiment 3 Condition 1	Matched Non-Word	11051.44(889.64)	7990.71(591.11)
	Mismatched Word	8019.34(1430.75)	9952.48(721.58)
	Unfamiliar Passages	8578.54(814.25)	9791.25(871.24)
Experiment 3 Condition 2	Mismatched Non-Word	7830.54(904.27)	7878.94(303.54)
	Matched Word	11658.54(712.97)	6246.54(410.17)
	Unfamiliar Passages	8472.9(755.44)	8389.84(414.67)
Experiment 4	Variable Non-Word	7802.18(651.88)	7524.46(597.52)
	Variable Word	9038.21(732.84)	8968.12(661.87)
	Unfamiliar Passages	7131.46(523.45)	7076.24(495.22)


Figure 2. Looking times (means and standard errors) for happy passages containing words familiarized in variable and constant (neutral) affect

Figure 3. Looking times (means and standard errors) for neutral passages containing words familiarized in variable and constant (happy) affect

In the neutral passage condition, infants were exposed to one target word in happy affect and the other in variable affect. They were then tested on recognition of these words by listening to a neutral set of passages containing both words as well as unfamiliar passages. In this condition, infants looked longer at passages containing words familiarized in variable affect relative to unfamiliar words, F(1,19)=7.23, p<.05 (see Figure 3). Eleven of twenty infants showed this pattern. However, in this condition, infants also showed increased looking times for words familiarized in happy affect, even though they were embedded in neutral test passages, F(1,19)=8.65, p<.01. Fourteen of twenty infants showed this pattern of results.

In summary, infants were able to recognize words familiarized in variable affect in the context of both happy and neutral test passages, suggesting that high variability in familiarization assists infant word recognition. However, the most surprising result of this experiment was that unlike in previous investigations, 7.5-month-old infants recognized words presented in a single affect during familiarization. This was inconsistent with earlier predictions because these single affect tokens were always embedded in passages that contrasted with the affect of familiarization stimuli. Therefore, the effects of high variability in one word appeared to extend to the other word presented in the session, even though that word was produced with minimal variability. The finding that the benefits of a high variability set propagated to a low variability set implies that the focus of infants' attention may have shifted from exemplar-specific details to category-based details for both words in the test session. In other words, infants appeared to be able to disregard exemplar-specific surface details in favor of category-level representation, resulting in the appearance of relatively mature word recognition skills.

Infants appear to show relatively abstract word recognition skills in this experiment, suggesting that while they are known to retain considerable episodic detail about words, they are able to focus on phonological characteristics of both words presented in the same experimental session under particular conditions. In light of the finding that high variability across tokens facilitates subsequent word recognition, these results raise the issue of whether low acoustic variability during familiarization degrades performance on word recognition tasks. Previously, it has been reported that low variability during familiarization leads infants to over-emphasize perceptual similarity along dimensions that are lexically irrelevant (Houston, 1999; Singh et al., 2004). To what extent does this over-reliance on surface detail compromise spoken word recognition? One possible cost is that infants may over-emphasize affective dimensions at the expense of attending to more subtle cues such as fine phonetic detail. The following experiment assesses the effects of low acoustic variability on infants' detection of phonological equivalence by seeking evidence of false recognition of similar-sounding words that are perceptually confusable (i.e. produced with the same affective prosody) yet lexically distinct (i.e. minimal pairs).

Previous investigations of false recognition in 7.5-month-old infants have revealed that they are highly sensitive to phonetic detail in spoken word recognition tasks and that they do not confuse minimal pairs. These studies showed that infants do not equate phonetic variants that differ by the onset phoneme, e.g. they do not perceive “tup” to be an instance of “cup” (Jusczyk & Aslin, 1995). Similarly, they are sensitive to final consonant quality, which is typically less perceptible, and do not show false recognition of variants that differ in the final consonant either (Tincoff & Jusczyk, 1996). However, in these studies conducted by Aslin, Jusczyk & Tincoff, the stimuli were spoken in typical infant-directed speech; each familiarization set therefore consisted of acoustically variable instances of a word and was possibly more similar to the variable stimulus set used in Experiment 1. The high variability inherent in the familiarization sets may have assisted infants in attending to invariant phonological details and disregarding lexically irrelevant details, leading to a level of performance as high as that observed in Experiment 1. The current investigation asks whether, in the context of low stimulus variability, infants over-represent surface details that reliably co-occur with phonological structure and, in particular, whether this over-reliance leads them to neglect more subtle phonemic detail. In the following experiment, infants were introduced to a phonetic variant of each target word produced in a single affect (e.g. “dike” consistently produced with happy affect and “gat” consistently produced with neutral affect). They were then tested on recognition of target words (e.g. “bike” and “hat”) in sentences spoken in either happy or neutral affect. Therefore, no word was introduced with high stimulus variability. This design was very similar to that of Singh et al. (2004) except that here, infants were familiarized with variants of the target word. However, the design was also similar to Experiment 3 in Jusczyk & Aslin (1995) where infants were familiarized with variants of the words on which they were later tested, although unlike in that experiment, in the present experiment the variability with which words were introduced was varied between words rather than within a word.

Experiment 2

The purpose of the following experiment was to investigate whether false recognition might be observed in 7.5-month-old infants when the recurrence of surface properties of the familiarization tokens directs their attention to non-phonemic surface properties, such as affect. If infants show evidence of false recognition, this would again suggest that they are uncertain about the determinants of lexical relevance at this age, and are influenced by non-phonemic features at the expense of phonemic features. In this case, it would appear that low stimulus variability in familiarization not only results in increased matching based on non-phonemic similarities but also actively compromises matching based on phonemic similarities.