Underspecification in Toddlers’ and Adults’ Lexical Representations

Jie Ren; Uriel Cohen-Priva; James L Morgan

doi:10.1016/j.cognition.2019.06.003

Author manuscript; available in PMC: 2020 Dec 1.

Published in final edited form as: Cognition. 2019 Sep 14;193:103991. doi: 10.1016/j.cognition.2019.06.003

PMCID: PMC7134210 NIHMSID: NIHMS1539934 PMID: 31525643

Abstract

Recent research has shown that toddlers’ lexical representations are phonologically detailed, quantitatively much like those of adults. Studies in this article explore whether toddlers’ and adults’ lexical representations are qualitatively similar. Psycholinguistic claims (Lahiri & Marslen-Wilson, 1991; Lahiri & Reetz, 2002, 2010) based on underspecification (Kiparsky, 1982 et seq.) predict asymmetrical judgments in lexical processing tasks; these have been supported in some psycholinguistic research showing that participants are more sensitive to noncoronal-to-coronal (pop → top) than to coronal-to-noncoronal (top → pop) changes or mispronunciations. Three experiments using on-line visual world procedures showed that 19-month-olds and adults displayed sensitivities to both noncoronal-to-coronal and coronal-to-noncoronal mispronunciations of familiar words. No hints of any asymmetries were observed for either age group. There thus appears to be considerable developmental continuity in the nature of early and mature lexical representations. Discrepancies between the current findings and those of previous studies appear to be due to methodological differences that cast doubt on the validity of claims of psycholinguistic support for underspecification.

Keywords: lexical representation, developmental continuity, mispronunciation processing, phonological details, underspecification

Adults exhibit impressive abilities in recognizing words in familiar languages. They can segment words instantaneously from continuous speech streams, abstract native phonemes effortlessly from speech signals characterized by lack of invariants, and distinguish known words from novel words with very high accuracy.

For learners to acquire adult-like language proficiency, experience with the phonology of the native language plays a central role. Studies have consistently shown that infants possess global perceptual sensitivities that can be adapted to learn the phonological categories of any language. For example, before six months, infants can distinguish phonetic categories that are not used in their native language, and it is not until the end of the first year of life that infants are fully tuned to language-specific speech structures (e.g., Kuhl, Williams, Lacerda, Stevens, & Lindblom, 1992; Werker, Gilbert, Humphrey, & Tees, 1981; Werker & Tees, 1984; Werker & Lalonde, 1988). Whether early lexical knowledge comprises native phonological details and whether the details are adult-like both in amount and in nature have been topics of debate.

Several early studies suggested that early lexical representations might be holistic and vague, with little or no detailed phonological information. For instance, children as old as eight years were found to fail to discriminate native phonemes across a variety of tasks, including picture selection and phonological similarity judgment (Barton, 1976, 1980; Eilers & Oller, 1976; Garnica, 1973; Kay-Raining Bird & Chapman, 1998; Schvachkin, 1973). However, recent studies using less demanding experimental paradigms have shown that at least by the middle of the second year, lexical representations are detailed in terms of phonological features. Swingley & Aslin (2000) found that 14-month-olds are sensitive to 1-feature mispronunciations (“vaby”) of familiar words (“baby”). White & Morgan (2008) demonstrated that 19-month-old infants display graded sensitivity to varying degrees of onset mispronunciations of familiar words: as mispronunciations increasingly deviated from a correct form (/dog/) by one (/gog/), two (/kog/), or three features (/sog/), infants’ proportional looking to a referent of the familiar target word decreased. The same pattern of response was replicated for coda mispronunciations (Ren & Morgan, 2011), vowel mispronunciations (Mani & Plunkett, 2011) and lexical tone mispronunciations (Ren & Morgan, 2013). Using a slightly different paradigm, Nazzi (2005) showed that 20-month-olds can apply their detailed representation of native speech categories in learning minimal pairs of novel words (“pize” vs. “tize”). Adults have also been found to be affected by the degree of acoustic-phonetic mismatch during semantic priming (Connine, Titone, Deelman, & Blasko, 1997; Milberg, Blumstein, & Dworetzky, 1988) and visual word recognition (Tin, White, & Morgan, 2014). Thus, early and mature lexical representations appear to be commensurate in phonological detail.

However, although these previous studies have found striking developmental continuity in the amount of phonological details between early and mature lexical representations, whether the nature of the represented detail is stable across development is still not known. In particular, phonological theories of underspecification (Archangeli, 1988; Avery & Rice, 1989; Kiparsky, 1982) have suggested that certain unmarked feature values, such as coronal place of articulation, may be left unspecified or empty in underlying lexical representations (Kean, 1975). Such arguments are primarily motivated by differences in phonological processes between coronal and noncoronal segments. For example, in Catalan (Mascaro, 1976), backwards place of articulation assimilation occurs only for coronal stops, as in (1),

(1). Stop assimilation in Catalan:

se[t]	‘seven’
se[m] mans	‘seven hands’
se[p] forcs	‘seven fires’
se[b] beus	‘seven voices’
se[d] dones	‘seven women’
se[k] cases	‘seven houses’

whereas labial and velar stops do not undergo such assimilation.

(2). Stop assimilation in Catalan:

ca[p]	‘no’
ca[p] signe	‘no sign’
po[k]	‘few’
po[k] pa	‘few bread’
po[k] sol	‘few sun’

Differences in phonological processes between coronal and noncoronal segments are also found in many other languages, such as English place assimilation (e.g., sweet [swi:t] girl is often pronounced as [swi:k] girl; Gaskell & Marslen-Wilson, 1996). Underspecification theory is thus in part an attempt to explain why pronunciations of nominally coronal segments in words are less faithful than are pronunciations of noncoronal segments.

Although underspecification was not advanced to explain speech perception, hypotheses predicting effects on spoken word recognition have been derived from this aspect of phonological theory (Lahiri & Marslen-Wilson, 1991; Lahiri & Reetz, 2002, 2010). For example, the Featurally Underspecified Lexicon (FUL) model (Lahiri & Reetz, 2002, 2010) predicts specific asymmetries in effects of mispronunciations. On this account, the place of articulation of the onset /d/ of the word duck is unspecified in lexical representation; consequently, mispronunciations in the onset of duck such as guck will not be incompatible with the underlying representation, and such mispronunciations should have minimal effects on lexical activation of duck. By contrast, the onset /g/ of the word goose is specified as [+velar] in the underlying lexical representation, so mispronunciations of the onset of goose such as doose will be incompatible with the underlying representation and thus will significantly disrupt lexical activation of goose.

Psycholinguistic studies using a variety of tasks have adduced evidence supporting predictions of the FUL model. For example, at temporally early stages of speech perception as measured by ERP, German-speaking adults display asymmetric discrimination for mispronunciations of familiar words with coronal or noncoronal onsets (Friedrich, Lahiri, & Eulitz, 2008), word-internal consonants (Friedrich, Eulitz, & Lahiri, 2006; Cornell, Lahiri, & Eulitz, 2013), codas (Lahiri & van Coillie, 1999), and vowels (Lahiri & Reetz, 2010; Cornell, Lahiri, & Eulitz, 2011). Similar effects have also been found with English-speaking adults (Roberts, Wetterlin, & Lahiri, 2013). Other studies have used gating and lexical decision procedures to demonstrate putative effects of underspecification in Bengali, English, and German (Gaskell & Marslen-Wilson, 1996; Lahiri & Marslen-Wilson, 1991; Lahiri & Reetz, 2010; Wheeldon & Waksler, 2004).

A recent series of studies by Mitterer (2011), however, failed to find the asymmetries predicted by the FUL model. Mitterer conducted a series of visual-world studies in which Dutch-speaking adults saw four printed words arrayed on a screen and heard targets that varied from onset competitors by one (Experiments 1-3) or two (Experiment 4) features (see also McQueen & Viebahn, 2007). Adults looked more towards competitor words than phonologically unrelated distractor words, but there were no differences depending on phonological relations between target and competitor onsets.

In this article, we present a series of studies examining possible asymmetries in toddlers’ processing of (mis)pronunciations of familiar words and compare toddlers’ processing with adults’ to examine the question of developmental continuity. That is, do toddlers’ lexical representations resemble adults’ mature lexical representations in phonological detail, specifically in terms of featural (under)specification? The other goal of our studies was to ensure that the tasks used clearly tapped lexical representations. Unlike Mitterer (2011), we used a simplified visual-world procedure in which subjects saw pictured referents of familiar words. Associations between pronunciations and referents by definition involve the mental lexicon. By contrast, in languages with sub-logographic writing systems such as English, pronunciations can be readily generated for novel or nonce forms, and thus relations between pronunciations and printed forms need not rely on lexical entries. Use of pictured referents also allowed us to use the same procedure, with only minor modifications, with both toddlers and adults. This best enables us to address our fundamental goal: to determine whether the nature of phonological specification in the mental lexicon is stable across development.

Based on the underspecification account (Kiparsky, 1982 et seq.), the unmarked values are assumed to be the default values. Therefore, certain considerations, both theoretical and empirical, suggest that effects of underspecification might be observed most strongly in younger populations. For example, Optimality Theory (OT) analyses of child grammar (Gnanadesikan, 2004) have suggested that markedness constraints, which dis-prefer marked forms, outrank faithfulness constraints, which would have preserved those marked forms, in toddlers’ early phonology.

Language experience may attenuate effects of underspecification. Certain accounts of early lexical representation have suggested that the degree of phonological specification correlates positively with the toddler’s vocabulary size (Charles-Luce & Luce, 1990, 1995; Storkel, 2002; Walley, 1993, 2005). According to the Lexical Restructuring Model (Metsala & Walley, 1998), for example, word learning can increase the specificity of infants’ speech categorization and concomitantly increase the specificity with which individual words are represented in the lexicon. As the child’s vocabulary expands, the lexicon comes to contain more and more sets of similar-sounding words. On such views, toddlers, whose vocabularies are markedly smaller, are more likely than adults to have an underspecified lexicon. Of course, adults know many more minimal pairs than do toddlers, many of which involve coronal vs. noncoronal contrasts (e.g., pop, top, cop); suppression of underspecification might help to avoid false-alarm over-recognition of items with coronal segments¹.

To date, studies with infants and toddlers have found mixed evidence for early appearance of asymmetries predicted by underspecification. Dijkstra & Fikkert (2011), for example, habituated 6-month-old Dutch-learning infants to either repeated taan or paan tokens and then tested their ability to discriminate trials in which one or the other stimulus repeated versus trials in which the two stimuli alternated. Whereas infants habituated to paan discriminated the two types of trials, infants habituated to taan did not. Similarly, Tsuji, Mazuka, Cristia & Fikkert (2015) habituated 4-month-old Dutch-learning infants and 4-month-old Japanese-learning infants to either repeated ompa or onta tokens. Then, they tested the infants’ ability to discriminate trials in which one or the other stimulus repeated versus trials in which the two stimuli alternated. For both language groups, whereas infants habituated to noncoronal stimuli (paan, ompa) discriminated the two types of trials, those habituated to coronal stimuli (taan, onta) did not. In both studies, the authors interpreted their findings in terms of underspecification: when the standard of comparison was coronal (taan or onta), the coronal feature for place of articulation was unspecified, and the noncoronal counterpart (paan or ompa) was compatible with the standard. However, when the standard of comparison was noncoronal (paan or ompa), the labial place of articulation was specified, the coronal counterpart (taan or onta) was not compatible with the standard, and paan vs. taan and ompa vs. onta were both discriminable.

Using a preferential looking mispronunciation task, van der Feest and Fikkert (2015) found that 20- and 24-month-old Dutch-learning toddlers showed significant differences in proportional looking times to referents of familiar words beginning with labials depending on whether they were correctly pronounced or mispronounced with coronal onsets. However, toddlers did not show differences in looking times when familiar words beginning with coronals were mispronounced with labial onsets. Similarly, Tsuji, Fikkert, Yamane & Mazuka (2016) tested one group of Dutch-learning and one group of Japanese-learning 18-month-olds on their sensitivities to coronal mispronunciations of novel words. They found a lack of sensitivity to such mispronunciations for infants from both language groups. The procedures of these two studies both involved considerable stimulus repetition. That is, each toddler was exposed to a single target item, which was tested in multiple pronunciation conditions: In van der Feest & Fikkert, six items were tested in correct pronunciations, and among the six items, four were tested with place of articulation mispronunciations, and four with voicing mispronunciations. Similarly, in Tsuji, Fikkert, Yamane & Mazuka, the two items were tested with correct pronunciations, mispronunciation changes to labial sounds, and mispronunciation changes to dorsal sounds.

Less clear support for underspecification comes from a series of studies by Fennell and colleagues testing 14-month-old English-learning infants on their sensitivities to different directions of novel word mispronunciations using the “switch task” (Stager & Werker, 1997). Consistent with the underspecification hypothesis, an initial study (Fennell, 2007) showed that infants detected a labial-to-coronal switch but failed to detect a coronal-to-labial switch. However, inconsistent with the underspecification hypothesis, a follow-up study (Fennell, van der Feest, & Spring, 2010) showed that 14-month-olds were better able to detect a coronal-to-velar switch than a velar-to-coronal switch. To explain such findings, the authors attributed the asymmetries to the ordering of acoustic variability in their experimental stimuli, /b/ < /d/ < /g/. Thus, the authors concluded that the asymmetries they observed might be better explained by acoustic properties of the segments than by their phonological status or (under)specification.

In this article, we examine possible asymmetries at the lexical level by comparing 19-month-old English-learning toddlers’ and English-speaking adults’ detection of correct and incorrect pronunciations of familiar words. We tested coronal-to-labial, labial-to-coronal, coronal-to-velar, and velar-to-coronal mispronunciations in an online word recognition task using a simplified version of the visual world paradigm. If toddlers’ lexical representations are unspecified for coronal place of articulation as predicted by the underspecification hypothesis, then we expect to see significant asymmetries: larger effects for labial-to-coronal and velar-to-coronal mispronunciations than for coronal-to-labial and coronal-to-velar mispronunciations. By using lexical specification as leverage, we hope to obtain a clearer picture of developmental (dis)continuity in the nature of lexical representations.

EXPERIMENT 1

White & Morgan (2008) introduced several refinements to the preferential looking mispronunciation task pioneered by Swingley & Aslin (2000). First, rather than displaying two referents with known labels in each trial, they paired a referent with a known label and a referent without a known label. The rationale for this modification was that known labels might reduce the sensitivity of the paradigm by repelling mispronunciations towards the correct forms. For example, if a toddler were to see pictures of a baby and a car and hear a mispronunciation vaby, because that pronunciation is not a possible label for the car, changes in looking time relative to the correct pronunciation baby might be limited. They also tested each item on only a single trial. This modification is of importance for studies of asymmetries in speech perception, because testing items repeatedly raises possibilities of carry-over effects and may change the perceived nature of the task. In particular, whereas hearing vaby once may interrogate whether it is an acceptable pronunciation of baby (a perceptual task), hearing these two pronunciations multiple times might be construed as asking whether vaby is a plausible mispronunciation of baby. To answer such a question, infants might need to draw on implicit metalinguistic knowledge. In this way, tasks with repetitions of stimuli may tap into different processing levels than on-line lexical processing.

Collectively, the modifications introduced by White & Morgan served to increase the power and sensitivity of the mispronunciation task: whereas earlier studies (Bailey & Plunkett, 2002; Swingley & Aslin, 2002) had failed to find graded effects depending on the numbers of features altered in mispronunciations, White & Morgan showed a strong linear effect of number of features altered in onset changes on toddlers’ looking times. Here, we use White & Morgan’s procedures to interrogate possible asymmetries in word recognition.

Experiments 1a and 1b were designed to establish whether 19-month-olds exhibit asymmetrical sensitivities to mispronunciations depending on whether the mispronunciations are from coronal to noncoronal segments or vice-versa. To address this question, we tested mispronunciations in consonantal onsets in Experiment 1a and codas in Experiment 1b, respectively.

Methods

Participants

Fifty-five 19-month-old English-learning infants were recruited from monolingual English-speaking families in Rhode Island and Massachusetts. Twenty-seven of these infants were tested on onset mispronunciations in Experiment 1a and twenty-eight were tested on coda mispronunciations in Experiment 1b. Nine participants were excluded from data analysis due to fussiness (4) or crying (3) or over 50% alternative language exposure (1). This left twenty-one 19-month-olds (mean age = 1;7;17) in Experiment 1a and twenty-six 19-month-olds (mean age = 1;7;23) in Experiment 1b, respectively.

Stimuli

Familiar labels comprised a set of words that are comprehended by the majority of infants by 14 months, according to the MacArthur CDI norms (Dale & Fenson, 1996). In each trial, infants saw two images, one depicting a referent of a familiar label, the other depicting a referent of an unfamiliar (to 19-month-olds) label. An example stimulus pair is depicted in Figure 1; a list of displayed objects is given in Appendix A for onset mispronunciations and in Appendix B for coda mispronunciations. Infants were tested in 18 trials. These included three trials with correctly pronounced coronal stops (e.g. dog or cat), three mispronounced coronal stops (e.g. gog or cak), three correctly pronounced noncoronal stops (e.g. cat or dog), three mispronounced noncoronal stops (e.g. tat or dod), three correctly pronounced familiar fillers (e.g. hand) and three novel fillers (e.g. wrench). Examples of (mis)pronunciations are given in Table 1 for onsets and Table 2 for codas. Familiarity of labels for target and distractor images was assessed via a parental questionnaire completed after the experiment session.

Figure 1. — Sample visual stimulus pair in Experiments 1a & 1b.

Table 1.

Sample stimuli for onset mispronunciations

Audio Stimuli Samples	(Mis)pronunciation	Place of articulation
Where’s the duck?	Correct	Coronal
Where’s the guck?	Mispronounced	Noncoronal
Where’s the cat?	Correct	Noncoronal
Where’s the tat?	Mispronounced	Coronal

Table 2.

Sample stimuli for coda mispronunciations

Audio Stimuli Samples	(Mis)pronunciation	Place of articulation
Where’s the cat?	Correct	Coronal
Where’s the cak?	Mispronounced	Noncoronal
Where’s the duck?	Correct	Noncoronal
Where’s the dut?	Mispronounced	Coronal

Mispronunciations involved only single-feature changes in place of articulation, and all mispronunciations resulted in non-words or in words judged unlikely to be familiar to infants at this age. Most of the words tested in the current study were monosyllabic, with only two exceptions (baby and table) for onsets in Experiment 1a due to the restricted vocabulary size of babies at 19 months. All stimuli were naturally produced by a trained female speaker of American English who produced the utterances with positive infant-directed affect. The mean length of target items across all conditions was 588.05 msec (SD = 78.58 msec); lengths of target items did not differ significantly across conditions, F(3, 92) = 0.036, p = 0.991. A complete list of words used in the current study and their accompanying images is provided in Appendix A for onsets and Appendix B for codas.

Pairings of familiar and unfamiliar objects remained constant across subjects. Assignment of stimulus pairs to pronunciation condition was counterbalanced across subjects (filler trials were constant across subjects). Half of the coronal items were mispronounced with a corresponding labial stop, and half were mispronounced with a velar stop. These were counterbalanced so that one sub-group heard two labial mispronunciations and one velar mispronunciation, while the other heard one labial mispronunciation and two velar mispronunciations. Similarly, the noncoronal items were evenly divided between labials and velars; one sub-group heard one correct labial pronunciation and two correct velar pronunciations, while the other heard two correct labial pronunciations and one correct velar pronunciation. Each item was presented to each infant one time only, i.e. either the correct form or the mispronounced form. Order of presentation was pseudo randomized online for each infant with the constraint that the first two trials always contained one correct filler trial and one novel filler trial.
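The ordering constraint described above can be illustrated with a short sketch. This is not the customized experimental software used in the study; the trial-type names, the function `make_trial_order`, and the seed are hypothetical, and only the counts (six trial types, three trials each) and the constraint on the first two trials come from the text.

```python
import random

# Trial types and counts as described in the text (18 trials in total).
# The key names are illustrative labels, not taken from the study.
TRIAL_COUNTS = {
    "coronal_correct": 3,
    "coronal_mispronounced": 3,
    "noncoronal_correct": 3,
    "noncoronal_mispronounced": 3,
    "filler_familiar": 3,
    "filler_novel": 3,
}


def make_trial_order(rng):
    """Return a shuffled list of 18 trial types that opens with both filler types."""
    opening = ["filler_familiar", "filler_novel"]
    rng.shuffle(opening)  # either filler may come first

    remaining = [t for t, n in TRIAL_COUNTS.items() for _ in range(n)]
    for trial_type in opening:
        remaining.remove(trial_type)  # these two trials are already placed
    rng.shuffle(remaining)
    return opening + remaining


order = make_trial_order(random.Random(19))  # seed is arbitrary
```

A fresh random generator per infant would give each participant a different order while always satisfying the filler constraint.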

Procedure

Testing was conducted in a sound-treated laboratory room. The parent sat with the child on his/her lap, while listening to instrumental music over noise-cancelation headphones to mask the audio stimuli. Approximately 90cm in front of the child were two 51cm flat-panel monitors mounted side-by-side, together subtending approximately 55 degrees of visual angle. A speaker was located centrally between the two monitors behind a pegboard panel. At the subjects’ eye level, a blue light was mounted on the panel between the two monitors. The subjects were observed over a closed-circuit video system and recorded on a digital camcorder at 30 fps for later off-line coding. Speech stimuli were played at conversational level (70 dB).

An intermodal preferential looking procedure (IPLP) similar to that used in White & Morgan (2008) was used during the experiment. Each trial began with the blue light flashing until the subject fixated at midline. At that point, the experimenter turned off the center light and initiated the salience phase. During the salience phase, one object with a known label and a second object with an unknown label were simultaneously displayed on the two monitors (see Figure 1). The two objects were displayed silently for 4 seconds to establish baseline looking preference. After the salience phase, the two monitors went dark, and infants’ attention was then recaptured to midline to avoid contingencies between side of fixation at the end of the salience period and at the beginning of the test period.

After recentralization, the experimenter initiated the test phase. During the test phase, the audio stimulus (Where’s the X?) was played, and immediately after that, the two visual stimuli were presented for 8 seconds. Following an interval of at least 1 second, the next trial commenced. Side of presentation of the familiar object was randomized between trials by the customized experimental software, but was consistent across salience and test phases within each single trial. The dependent measure was the change in proportional looking to the familiar object between the (silent) salience phase and the test phase. Of interest was whether looking behavior would differ as a function of the directions of mispronunciation.

Following the session, the parent completed the vocabulary questionnaire to verify his/her toddler’s comprehension and production of the stimulus items (familiar and unfamiliar).

Results and discussion

Results from the parental questionnaire indicated the items selected were appropriate for this sample of 19-month-olds. On a scale of 3 (1 = only visually familiar; 2 = visually familiar and label known; 3 = visually familiar, label known and produced), labels for target images (used as familiar words) received an average score of 2.462 for Experiment 1a (SD = .242) and 2.462 for Experiment 1b (SD = .273), indicating that they were highly familiar to toddlers. Labels of distractor images² received an average score of 0.24 for Experiment 1a (SD = .235) and 0.301 for Experiment 1b (SD = .208), indicating that they were highly unfamiliar to toddlers. No labels of target images in Experiment 1a or 1b were scored 1. Labels of distractor images were scored 3 in 24 trials (of 360 total, 6.67%) in Experiment 1a, and in 32 trials (of 468 total, 6.84%) in Experiment 1b. These trials were all removed from further analyses.

Looking behavior was coded off-line frame-by-frame (1 frame = 33 msec) using in-house coding software. For the salience phase, looking behavior was coded for the entire 4 s duration of the phase. For the test phase, looking behavior was coded only for the 3 s following the onset of the first occurrence of the target word. This was done in order to include only subjects’ initial response to the target word.

In the present case, two competing accounts make divergent predictions for the patterns of results that should be observed. On the underspecification (FUL) account, mispronunciations of segments that are underlyingly labial or velar should disrupt lexical recognition and access, whereas mispronunciations of underlyingly coronal segments should not; the data should follow the pattern shown in Figure 2. By contrast, on the account that segments are equivalently represented regardless of place of articulation, a main effect of (mis)pronunciation should obtain, but there should be no (mis)pronunciation-by-place of articulation interaction.

Figure 2. — Predicted results from the hypothesis of underspecification.

Overall summaries of the data for onsets in Experiment 1a and codas in Experiment 1b are shown in Figures 3 and 4, respectively. In both figures, proportional looking towards each of the objects in each trial was first computed over the total time the subject spent looking at both objects for that phase. Then, a difference score of proportional looking was computed using the following formula:

$$\%\,\mathrm{Looking}(\mathrm{Familiar})_{\mathrm{Test}} - \%\,\mathrm{Looking}(\mathrm{Familiar})_{\mathrm{Salience}}$$

Figure 3. — Experiment 1a: Toddlers’ sensitivities to different directions of onset mispronunciations. Error bars show two standard errors computed via subject-wise non-parametric bootstrap.

Figure 4. — Experiment 1b: Toddlers’ sensitivities to different directions of coda mispronunciations. Error bars show two standard errors computed via subject-wise non-parametric bootstrap.

This formula measures the change in looking toward the familiar object after the target was named. Such difference scores allowed us to use each stimulus pair as its own baseline, controlling for differences in visual salience or inherent preference for a particular stimulus in each pairing. It is clear from inspection that the interactions predicted by the underspecification account did not materialize in our data.
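As a minimal sketch of how such a difference score could be computed from frame-coded looking data, consider the following. The frame codes ("F", "U", "away"), the function names, and the example trial are hypothetical and are not taken from the in-house coding software; only the definition of the score (proportional looking to the familiar object in the test phase minus that in the salience phase, each computed over time spent looking at either object) follows the text.

```python
def proportion_familiar(frames):
    """Proportion of on-object frames spent on the familiar object.

    `frames` holds one code per video frame: "F" (familiar object),
    "U" (unfamiliar object), or anything else for looks away.
    """
    familiar = sum(1 for f in frames if f == "F")
    unfamiliar = sum(1 for f in frames if f == "U")
    total = familiar + unfamiliar
    if total == 0:
        raise ValueError("no looks to either object in this phase")
    return familiar / total


def difference_score(salience_frames, test_frames):
    """Change in proportional looking to the familiar object (test minus salience)."""
    return proportion_familiar(test_frames) - proportion_familiar(salience_frames)


# Hypothetical trial: 50% familiar looking at baseline, 75% after naming.
salience = ["F"] * 50 + ["U"] * 50 + ["away"] * 20
test = ["F"] * 60 + ["U"] * 20 + ["away"] * 10
score = difference_score(salience, test)
```

Frames coded as looks away drop out of both numerator and denominator, so the score reflects only time spent on one of the two objects.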

The dependent measure of our statistical analyses was participants’ success/failure to gaze at the target object at each time frame for each trial. In particular, if at a certain time frame of a trial the participant was looking at the target, we coded it as a 1 and otherwise we coded it as 0. The time course data can be found in Figure 5 for Experiment 1a and Figure 6 for Experiment 1b, respectively.

Figure 5. — Experiment 1a: Time course data of toddlers’ sensitivities to different directions of onset mispronunciations

Figure 6. — Experiment 1b: Time course data of toddlers’ sensitivities to different directions of coda mispronunciations

Given the binary data in each time frame for each trial, we conducted Bayesian Logistic Regressions with Mixed Effects Modeling. By specifying priors that are appropriate for each account, Bayesian data analyses (Kruschke, 2010; Gelman, Carlin, Stern & Rubin, 2004) can be computed that yield estimates of the posterior probabilities of the parameters, providing a principled quantitative evaluation of the credibility of the hypotheses under discussion. Fixed effects of the model included pronunciation and place of articulation as two discrete variables, time as a continuous variable, and their interactions. Random effects of the model included random intercepts and slopes for both subjects and items. Mathematical details of the model can be found in Supplementary Material II. The time window used for Experiment 1a was between 233.31 msec and 1899.81 msec after the initiation of each trial, and the time window used for Experiment 1b was between 233.31 msec and 1733.16 msec after the initiation of each trial. Following standard practice, data from the first 233.31 msec were excluded from analyses to allow for programming and execution of initial saccades. After 1899.81 msec (1733.16 msec for Experiment 1b), on average participants had reached the highest looking proportion to the target objects. This suggests that participants at this time point had achieved the highest degree of lexical activation for all the experimental conditions.
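The analysis windows can be related to video frames with a small sketch. It assumes a frame duration of 33.33 msec (30 fps video); under that assumption the reported bounds are whole numbers of frames (7, 57, and 52). The function names, the half-open slicing convention, and the example data are hypothetical, not the authors' code.

```python
FRAME_MS = 33.33  # assumed frame duration for 30 fps video


def window_frames(start_ms, end_ms, frame_ms=FRAME_MS):
    """Convert window bounds in msec to (start, end) frame indices."""
    return round(start_ms / frame_ms), round(end_ms / frame_ms)


def looks_in_window(frame_codes, start_ms, end_ms):
    """Keep the binary target-look codes (1/0) that fall inside the window.

    Index 0 of `frame_codes` is taken to be the first frame after the
    reference time point; the end bound is treated as exclusive (an assumption).
    """
    start, end = window_frames(start_ms, end_ms)
    return frame_codes[start:end]


window_1a = window_frames(233.31, 1899.81)  # Experiment 1a bounds from the text
window_1b = window_frames(233.31, 1733.16)  # Experiment 1b bounds from the text

# Hypothetical 3 s trial (90 frames) in which the child looks at the target from frame 30 on.
example_trial = [0] * 30 + [1] * 60
analysed = looks_in_window(example_trial, 233.31, 1899.81)
```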

We first ran Bayesian Logistic Mixed Effects Regressions with non-informative priors. That is, conjugate priors were used for the parameters of the model: the coefficients of (mis)pronunciation, place of articulation, time, and the interactions among these factors were all assigned normal priors centered on zero. Standard deviations of the normal distributions were set to follow a half gamma distribution with the rate parameter and the shape parameter both set to 0.01. This Bayesian method is similar to null hypothesis significance testing (NHST) with Logistic Mixed Effects Modeling, except that the means and variances of the regression coefficients were specified with conjugate prior distributions. In this way, we are able to obtain the posterior distributions of these parameters, which provide information about the observed effect sizes. JAGS code for Logistic Mixed Effects Modeling with non-informative priors is given in Appendix E. For readers unfamiliar with Bayesian data analyses, results of NHST Logistic Mixed Effects Modeling are given in Supplementary Material II. Posterior distributions for two main effects and the interaction are shown in Table 3 for Experiments 1a and 1b.
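For orientation, the structure of such a model can be sketched as follows. This is an illustrative reading of the verbal description, not the specification in Supplementary Material II or the JAGS code in Appendix E: the symbols are introduced here, random slopes are omitted for brevity, and the prior on the intercept and the coding of the predictors are left unspecified because the text does not state them. Here y is the binary target-look code for subject s, item i, and time frame t; T is time, P is (mis)pronunciation, and A is place of articulation.

```latex
\begin{aligned}
y_{sit} &\sim \mathrm{Bernoulli}(p_{sit}) \\
\operatorname{logit}(p_{sit}) &= \beta_0 + \beta_1 T_t + \beta_2 P_{si} + \beta_3 A_i
  + \beta_4 A_i P_{si} + \beta_5 P_{si} T_t + \beta_6 A_i T_t
  + \beta_7 A_i P_{si} T_t + u_s + w_i \\
\beta_k &\sim \mathcal{N}(0, \sigma_k^2), \qquad k = 1, \dots, 7
\end{aligned}
```

The terms u_s and w_i stand for the by-subject and by-item random intercepts, and the order of the coefficients follows the rows of Table 3. On the underspecification account the key term is the three-way coefficient, since it captures whether the effect of mispronunciation over time depends on place of articulation.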

Table 3.

Posterior distributions of coefficients of Experiment 1a and 1b

	Experiment 1a Infant Onset				Experiment 1b Infant Coda
Coefficient	Mean	SD	95% HDI lower	95% HDI upper	Mean	SD	95% HDI lower	95% HDI upper
Intercept	−0.01	0.35	−0.68	0.68	−0.50	0.18	−0.85	−0.15
Time	0.69	0.14	0.42	0.97	0.96	0.08	0.81	1.11
Pronunciation	0.11	0.37	−0.62	0.84	0.68	0.16	0.37	0.99
Place of Articulation (POA)	0.02	0.45	−0.85	0.90	−0.28	0.20	−0.67	0.12
POA*Pronunciation	0.16	0.70	−1.22	1.54	−0.06	0.14	−0.32	0.21
Pronunciation*Time	0.22	0.26	−0.28	0.72	0.38	0.11	0.17	0.59
POA*Time	−0.13	0.18	−0.49	0.22	0.06	0.10	−0.13	0.26
POA*Pronunciation*Time	−0.01	0.27	−0.51	0.53	−0.07	0.11	−0.29	0.15

For both experiments, zero was included in the 95% highest density intervals (HDIs) of the posterior distributions for the three-way interaction of (mis)pronunciation by place of articulation by time. These results suggest that during word activation, toddlers do not differentiate mispronunciation effects according to the place of articulation of the target word, for either onsets or codas. Moreover, the HDI of the posterior distribution for the effect of (mis)pronunciation over time (i.e., the pronunciation by time interaction) excludes zero for Experiment 1b, suggesting that 19-month-olds in this experiment were sensitive to 1-feature mispronunciations of coda consonants in word recognition. By contrast, the HDI for the effect of (mis)pronunciation over time does not exclude zero for Experiment 1a. Therefore, the lack of a three-way interaction in this experiment could also indicate that 19-month-olds were not sensitive to 1-feature mispronunciations of onset consonants.
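The interval checks in this paragraph can be reproduced directly from the bounds reported in Table 3. The following is a minimal Python sketch, not the authors' analysis code. The helper names `excludes_zero` and `hdi` are hypothetical, and `hdi` shows one common way of obtaining a highest density interval from posterior draws (the narrowest window containing the requested mass), which may differ from the procedure used to produce Table 3.

```python
import math

# 95% HDI bounds (lower, upper) copied from Table 3.
HDI_95 = {
    "Pronunciation*Time": {"1a": (-0.28, 0.72), "1b": (0.17, 0.59)},
    "POA*Pronunciation*Time": {"1a": (-0.51, 0.53), "1b": (-0.29, 0.15)},
}


def excludes_zero(interval):
    """Return True when zero lies outside the interval."""
    lower, upper = interval
    return lower > 0 or upper < 0


def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior draws (assumed method)."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of draws inside the interval
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


for term, by_experiment in HDI_95.items():
    for experiment, interval in by_experiment.items():
        verdict = "excludes" if excludes_zero(interval) else "includes"
        print(f"{term}, Experiment {experiment}: 95% HDI {interval} {verdict} zero")
```

Only the pronunciation by time interval for Experiment 1b excludes zero, which matches the pattern described in the text.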

To examine this possibility and to further evaluate the underspecification and equivalent-representation accounts, we conducted Bayesian Model Comparisons. In particular, two sets of priors were established with equal probability, where one set supports the equivalent representation account and the other set supports the underspecification hypothesis. By specifying priors that are appropriate for each account, posterior probabilities were then computed for the observed data given each of the alternative accounts to provide a quantitative estimate of the relative credibility of the two accounts.

The two informed sets of priors were obtained through the following steps. First, we obtained data from Tin & Morgan (2014), who tested English-speaking adults on their sensitivities to 1-, 2- and 3-feature mispronunciations in word onset consonants. Using the correct pronunciation and 1-feature mispronunciation data of this study, we created two separate datasets, with one dataset supporting the underspecification hypothesis and the other dataset supporting the equivalent representation account. In the underspecification data, participants’ responses to mispronunciations of coronal segments were made equal to the responses to their correct counterparts, so that all of the effect of mispronunciation is assumed to reside in the noncoronal items. In the equivalent-representation data, participants’ responses to mispronunciations of coronal segments were simply taken from their 1-feature mispronunciation counterparts in Tin & Morgan’s data, to model the absence of a (mis)pronunciation by place of articulation interaction. Then, the two datasets were fitted with the same logistic mixed effects model as the one used for non-informative priors, and parameters of the model were estimated using Maximum Likelihood Estimation (MLE). Thus, two sets of coefficients were obtained and then entered into the Bayesian Model Comparison as two sets of prior distributions, with one set of coefficients representing the underspecification model and the other set representing the equivalent representation account. Specific values for the prior means of the coefficients under each model are given in Table 4. JAGS code for Bayesian Model Comparison can be found in Appendix F.

Table 4.

Prior coefficients in Bayesian model comparisons

Coefficient	Model 1: Full-Specification	Model 2: Underspecification
Intercept	−3.440e-01	−5.306e-01
Time	7.344e-01	1.396e+00
Pronunciation	8.746e-01	3.036e-13
Place of Articulation (POA)	−1.447e-14	−8.746e-01
POA*Pronunciation	1.750e-14	8.746e-01
Pronunciation*Time	6.620e-01	1.055e-14
POA*Time	−8.674e-15	−6.620e-01
POA*Pronunciation*Time	1.016e-14	6.620e-01
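
To see how the two sets of prior means in Table 4 encode the competing accounts, the following Python sketch computes the size of the pronunciation effect on the logit scale under each model. It is an illustration under stated assumptions, not the authors' code: it assumes 0/1 dummy coding for pronunciation and place of articulation, treats Model 1 (labelled Full-Specification) as the equivalent representation account, rounds the near-zero estimates (on the order of 1e-13 or smaller) to zero, and uses illustrative time values, since the coding and scaling of the predictors are not stated in this section. The function names are hypothetical.

```python
# Prior means from Table 4; near-zero estimates are rounded to 0.0.
PRIORS = {
    "equivalent representation": {
        "pron": 0.8746, "poa_pron": 0.0,
        "pron_time": 0.6620, "poa_pron_time": 0.0,
    },
    "underspecification": {
        "pron": 0.0, "poa_pron": 0.8746,
        "pron_time": 0.0, "poa_pron_time": 0.6620,
    },
}


def pronunciation_effect(coefs, poa, time):
    """Logit difference between pronunciation = 1 and pronunciation = 0.

    Assumes 0/1 dummy coding; terms that do not involve pronunciation
    (intercept, time, POA, POA*time) cancel out of the difference.
    """
    return (coefs["pron"]
            + coefs["poa_pron"] * poa
            + coefs["pron_time"] * time
            + coefs["poa_pron_time"] * poa * time)


for model, coefs in PRIORS.items():
    for poa in (0, 1):
        for time in (0.0, 1.0):  # illustrative time values only
            effect = pronunciation_effect(coefs, poa, time)
            print(f"{model}: POA level {poa}, time {time}: effect = {effect:.4f}")
```

Under these assumptions, the equivalent representation priors give the same pronunciation effect at both levels of place of articulation, whereas the underspecification priors give no effect at one level and the full effect at the other, which is the asymmetry the underspecification dataset was constructed to contain.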


Given the data in Experiments 1a and 1b, the posterior probability of the model based on parameters derived from the underspecification account (P(M_U|D)) was 0 in both cases, whereas the posterior probability of the model based on parameters derived from the equivalent representation account (P(M_ER|D)) was 1 according to the MCMC estimation, which assumed the prior probabilities of the two sets of parameters to be equal. The ratio of these two probabilities yields a Bayes factor of infinity. A Bayes factor of this magnitude is typically interpreted as “very strong” (Kass & Raftery, 1995) or “decisive” (Jeffreys, 1961). Therefore, for the data in Experiment 1a and Experiment 1b, the equivalent representation parameters provide a credible account, whereas the underspecification parameters do not. Within the range of reasonable priors, no values lead to anything other than support for an interpretation of equivalent specification.
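
The step from posterior model probabilities to a Bayes factor can be sketched as follows. This is a minimal Python illustration, not the JAGS code of Appendix F: the function name `bayes_factor` is hypothetical, and the second call uses made-up probabilities purely for comparison. With equal prior probabilities the prior odds are 1, so the Bayes factor equals the posterior odds.

```python
import math


def bayes_factor(post_1, post_2, prior_1=0.5):
    """Bayes factor for model 1 over model 2: posterior odds / prior odds."""
    prior_odds = prior_1 / (1.0 - prior_1)
    if post_2 == 0:
        return math.inf  # model 2 received no posterior mass
    return (post_1 / post_2) / prior_odds


# Values reported in the text: P(M_ER|D) = 1, P(M_U|D) = 0, equal priors.
print(bayes_factor(1.0, 0.0))    # inf
# Hypothetical, less extreme posterior probabilities for comparison.
print(bayes_factor(0.75, 0.25))  # 3.0
```

If the posterior model probabilities are estimated as the share of MCMC draws assigned to each model (an assumption about the procedure), an estimate of exactly 0 means that no retained draw favored the underspecification model, so the infinite value reflects the resolution of the chain.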

Contrary to accounts for both infants (Dijkstra & Fikkert, 2011; Tsuji et al., 2015) and adults (Lahiri & Reetz, 2010), all our analyses indicate that 19-month-olds equivalently represent coronal and noncoronal consonants. This inconsistency with previous findings may in part be due to our use of on-line measurements of processing of correct and incorrect pronunciations in an intermodal preferential looking procedure. Previous studies of underspecification in adults have typically used other tasks, such as the oddball paradigm in EEG, gating or semantic priming, to examine effects of mispronunciations. It is thus unclear whether the reported results reflect procedural differences or developmental changes in lexical representation between infants’ early lexical representation and adults’ mature lexical representation. To disentangle this, in Experiments 2a and 2b, we employed a visual world paradigm mimicking the intermodal preferential looking procedure used for infants to test adults on their immediate responses to mispronunciations.

EXPERIMENT 2

The purpose of the two experiments was to test whether adults asymmetrically represent coronals and non-coronals using an on-line word recognition task. Experiments 2a and 2b addressed this question for consonantal onsets and codas, respectively.

As noted earlier, Mitterer (2011), using a procedure similar to ours, failed to find evidence for the predicted asymmetries. Our studies differ from those of Mitterer in several ways, most notably in using images rather than printed words and in examining effects of mispronunciations. We discuss ramifications of these differences later in Experiment 3.