Author manuscript; available in PMC: 2015 Sep 1.
Published in final edited form as: J Phon. 2014 Sep 1;46:147–160. doi: 10.1016/j.wocn.2014.07.001

The role of abstraction in non-native speech perception

Bozena Pajak a,*, Roger Levy a
PMCID: PMC4153394  NIHMSID: NIHMS618817  PMID: 25197153

Abstract

The end-result of perceptual reorganization in infancy is currently viewed as a reconfigured perceptual space, “warped” around native-language phonetic categories, which then acts as a direct perceptual filter on any non-native sounds: naïve-listener discrimination of non-native sounds is determined by their mapping onto native-language phonetic categories that are acoustically/articulatorily most similar. We report results that suggest another factor in non-native speech perception: some perceptual sensitivities cannot be attributed to listeners’ warped perceptual space alone, but rather to enhanced general sensitivity along phonetic dimensions that the listeners’ native language employs to distinguish between categories. Specifically, we show that the knowledge of a language with short and long vowel categories leads to enhanced discrimination of non-native consonant length contrasts. We argue that these results support a view of perceptual reorganization as the consequence of learners’ hierarchical inductive inferences about the structure of the language’s sound system: infants not only acquire the specific phonetic category inventory, but also draw higher-order generalizations over the set of those categories, such as the overall informativity of phonetic dimensions for sound categorization. Non-native sound perception is then also determined by sensitivities that emerge from these generalizations, rather than only by mappings of non-native sounds onto native-language phonetic categories.

Keywords: Non-native speech perception, sound discrimination, perceptual reorganization, naïve listeners, cross-linguistic influence, inductive inference

1. Introduction

The development of speech perception in the first year of life provides a critical foundation for future language learning. Infants undergo profound perceptual reorganization (Eimas, 1978): they transition from discriminating almost any speech sound distinction (including those absent from their ambient language) to a state of enhanced sensitivity to native-language (L1) distinctions, accompanied by a decline in sensitivity to many non-native distinctions (Werker & Tees, 1984; for reviews, see Werker, 1989; Kuhl, 2004). These results have led to the development of theories in which perceptual reorganization is understood as resulting from the acquisition of the specific inventory of native-language phonetic categories1, and the end-state is a reconfigured (“warped”) perceptual space, where innate perceptual sensitivity along natural auditory boundaries is replaced by sensitivity along boundaries of phonetic categories in the learner’s native language (Kuhl, 1991, 2000).

As a consequence, the long-held assumption underlying the research on non-native speech perception has been that non-native speech is necessarily “filtered” through listeners’ L1 phonetic category inventory. The “L1-category filter” metaphor can be traced back to Trubetzkoy (1939/1969), and the essence of this idea is present in current theories of non-native speech perception and learning: the Native Language Magnet model (NLM, Kuhl, 1992, 1994; Kuhl & Iverson, 1995; Kuhl, 2000; Kuhl, Conboy, Coffey-Corina, Padden, Rivera-Gaxiola, & Nelson, 2008), the Speech Learning Model (SLM, Flege, 1988, 1992, 1995), and the Perceptual Assimilation Model (PAM and PAM-L2, Best, 1993, 1994, 1995; Best & Tyler, 2007). These theories, while different in several respects, preserve the basic insight captured in the “L1-category filter” metaphor: that the perceptual space warped in accordance with the L1 phonetic category inventory – the end-result of perceptual reorganization in infancy – acts as a perceptual filter when processing non-native languages. Specifically, according to these theories, naïve-listener and second-language (L2) learner discrimination of non-native sounds is determined by their mapping onto specific L1 phonetic categories that are acoustically or articulatorily most similar, if such categories are available. Broadly speaking, discrimination of non-native contrasts is thought to be impaired when the stimuli are mapped (i.e., perceptually assimilated) onto the same L1 category (with varying performance depending on the goodness of fit to that category), relative to when they are mapped onto differing categories.

These classic theories have been very successful in explaining a wide range of perceptual difficulties in non-native speech perception and learning (Miyawaki, Strange, Verbrugge, Liberman, Jenkins, & Fujimura, 1975; Flege & Eefting, 1987; Best & Strange, 1992; Polka, 1991, 1992; Hallé, Best, & Levitt, 1999; Best, McRoberts, & Goodell, 2001; McAllister, Flege, & Piske, 2002; Best & Hallé, 2010, among others; for a review see Strange & Shafer, 2008), showing that the degree of similarity between native and non-native sounds – as assessed through acoustic and articulatory comparisons or direct measures of perceived similarity – can predict performance on discrimination of non-native sound pairs. That is, if two non-native sounds are both assessed as highly similar to a single L1 category, their discrimination is impaired. On the other hand, if each sound in the non-native pair is highly similar to a distinct L1 category, then their discrimination is facilitated. A widely cited example is the difficulty of L1-Japanese speakers in discriminating the English [ɹ]-[l] distinction, which is generally attributed to Japanese only having one phonetic category in the same acoustic-phonetic range (Goto, 1971; Strange & Dittmann, 1984; Miyawaki et al., 1975). This type of example has been used as evidence supporting the classic theories since perceptual difficulties can in this case be explained by L1-Japanese listeners’ assimilating both of the non-native sounds onto a single L1 category.

However, recent evidence suggests that the theories of non-native speech perception might need to be extended to accommodate discrimination patterns that cannot be explained by specific L1 phonetic category inventories. In particular, it has been shown that native French, Danish, and German listeners outperform English native speakers on discriminating the English [w]-[j] contrast (Hallé et al., 1999; Bohn & Best, 2012). These results are extremely surprising for at least two reasons: (1) native speakers performed more poorly on discriminating sounds from their native language than did non-native listeners, and (2) this non-native perceptual advantage was observed for speakers of languages that do not even have [w] in their inventory (Danish and German). Bohn and Best suggested that these results could be explained by the influence of more general characteristics of the L1 inventory on the listener’s perceptual system: French, Danish, and German have a relatively rich vowel inventory, and – unlike English – use lip rounding contrastively for vowels; since lip rounding is one of the cues to distinguish [w] from [j], the practice with lip rounding to distinguish between L1 vowels might boost French, Danish, and German listeners’ performance on discriminating the [w]-[j] contrast. In other words, phonological principles – such as whether or not L1 uses a given feature contrastively – may affect perception of non-native contrasts.

Bohn and Best’s proposal echoes prior suggestions that phonological distinctive features may affect non-native speech perception and learning (e.g., Hancin-Bhatt, 1994; Brown, 1997, 2000; McAllister et al., 2002). For example, McAllister et al. (2002) developed one of the components of the SLM into the “feature hypothesis” stating that “[phonological] features not used to signal phonological contrast in L1 will be difficult to perceive for the L2 learner and this difficulty will be reflected in the learner’s production of the contrast based on this feature” (p. 230). That is, under the feature hypothesis, the difficulty in learning a given L2 contrast based on feature X is determined by the role of feature X in the learner’s L1. In particular, forming an L2 phonetic category may be more difficult (or even blocked) if it requires attending to a feature that is not exploited in the learner’s L1. As support for their hypothesis, McAllister and colleagues showed that L2 learners of Swedish are more successful at acquiring the short vs. long vowel distinctions if their L1 also employs the length feature to distinguish between vowels (as for L1-Estonian learners), relative to the case when vowel contrasts in the learners’ L1 either only use duration as a secondary cue (as for L1-English learners) or do not make use of duration (as for L1-Spanish learners). In a similar vein, Brown (1997, 2000) argued for a model of non-native speech perception in which any phonological distinctive features used contrastively in L1 would be transferred to L2, which would in turn facilitate discrimination of any non-native sounds that are contrasted by those features.

Converging proposals can be found in the general literature on categorization and perceptual learning (e.g., Nosofsky, 1986; Goldstone, 1993, 1994), where native-language phonetic learning is viewed as shifting attention to relevant acoustic-phonetic cues (e.g., Jusczyk, 1985, 1986, 1994, 1997; Best, 1994; Nusbaum & Goodman, 1994; Pisoni, Lively, & Logan, 1994; Nittrouer & Miller, 1997). Furthermore, there is a growing body of work showing that the difficulty of non-native sound perception and learning is modulated by perceptual weighting of the relevant acoustic and articulatory cues in L1 (e.g., Francis & Nusbaum, 2002; Iverson, Kuhl, Akahane-Yamada, Diesch, Tohkura, Kettermann, & Siebert, 2003; Escudero & Boersma, 2004; Iverson, Hazan & Bannister, 2005; Kondaurova & Francis, 2008; Escudero, 2009; Escudero, Benders, & Lipski, 2009; Kondaurova & Francis, 2010; Lim & Holt, 2011). However, while these approaches emphasize the role of individual acoustic-phonetic cues in speech perception and learning, they largely focus on explaining the difficulties in discriminating and learning the contrasts that differ along unattended dimensions. It is in fact unclear whether they would predict any facilitatory effects on discriminating non-native sounds (e.g., [w]-[j]) that differ along dimensions attended in L1 but occurring in different contexts (such as the lip rounding cue distinguishing between the high front unrounded vowel [i] and the high front rounded vowel [y]). This is because selective attention is generally thought to operate on particular configurations of cues that are largely context-specific: that is, specific to particular segmental contexts, word positions, talkers, etc. (e.g., Logan, Lively, & Pisoni, 1991; Nusbaum & Goodman, 1994; Pisoni et al., 1994; Francis et al., 2000; Francis & Nusbaum, 2002; Reinisch, Wozny, Mitterer, & Holt, 2014). Therefore, these approaches do not immediately capture the potential facilitation in non-native speech perception coming from attending to cues that are relevant in L1, but occur in very different acoustic-phonetic contexts.

Some support for the idea that more general phonological principles might affect non-native speech perception comes from the literature on speech perception and learning by infant learners. In particular, it has been shown that exposing infants to particular sound categories leads not only to their enhanced sensitivity to those specific categories, but also to generally enhanced sensitivity to the underlying phonetic dimension that these categories are defined by (Maye, Weiss, & Aslin, 2008). More specifically, English-learning infants exposed to a non-native voice onset time (VOT) distinction (e.g., the prevoiced [d] vs. voiceless unaspirated [t]) showed enhanced perceptual sensitivity not only to that specific contrast, but also to an unfamiliar contrast along the same dimension (e.g., prevoiced [g] vs. voiceless unaspirated [k]).

It is noteworthy that the emergence of such general dimension-based perceptual sensitivity (for dimensions or features such as voicing, length, or lip rounding) would be a natural consequence of language development from a rational perspective on learning (e.g., Chater & Manning, 2006; Tenenbaum, Kemp, Griffiths, & Goodman, 2011). This is because languages re-use a limited set of phonetic dimensions or features for signaling multiple contrasts (Clements, 2003). Thus, enhancing sensitivity to relevant dimensions that encompass many potential contrasts is more adaptive for a learner than only enhancing sensitivity to specific categories that a learner has been directly exposed to. Note that learning need not be sequential (i.e., first learning [d]-[t], and then [g]-[k], as in the Maye et al. study) in order for learners to benefit from general enhanced sensitivity to phonetic dimensions or features. When learning all phonetic categories at once, as is the case in naturalistic language acquisition, noticing featural relationships between categories would make learning more efficient because less data may be needed to accurately identify and represent each individual category.

Taken together, recent evidence suggests that speech perception – both native and non-native – is affected not only by specific L1 phonetic category inventory, but also by more general phonological principles. But what exactly are those phonological principles? Bohn and Best (2012) suggested that contrastive lip rounding in L1 vowels translates into enhanced sensitivity to a non-native lip-rounding contrast between glides ([w]-[j]). This result is suggestive of a phonological principle that generates sensitivity to the lip-rounding cue. However, there are still many open questions. For example, it is unclear what kinds of cues are the building blocks of such principles (phonological features, acoustic and/or articulatory cues, etc.), or how such principles operate: Do they apply only in cases when the corresponding contrasts are phonetically similar (as is the case with vowels and glides)? Or are they more abstract principles that apply regardless of the degree of phonetic similarity between known and non-native contrasts?

In this paper we begin to address these questions by examining the phonetic dimension of length. Length is a relatively salient cue that tends to be easily accessible to learners, irrespective of their L1 background (see, e.g., Bohn & Flege, 1990; Bohn, 1995; Cebrian, 2006; Flege, Bohn, & Jang, 1997; Escudero, 2006; Escudero & Boersma, 2004; Kondaurova & Francis, 2008, 2010; Wang & Munro, 2004). For example, L2 learners tend to over-rely on length when distinguishing between non-native vowels for which length is only a secondary cue (such as the English tense-lax distinction), even when their L1 does not use length contrastively (e.g., native speakers of Spanish or Mandarin; Bohn, 1995; Escudero & Boersma, 2004). At the same time, however, the accessibility of length depends on the exact contrast (Wang, 2006), and L1 background does affect perception and learning of contrasts based on length (Hayes, 2002; McAllister et al., 2002; Goudbeek, Cutler, & Smits, 2008; Hayes-Harb & Masuda, 2008; Heeren & Schouten, 2008; Escudero et al., 2009; Pajak, 2013). In fact, differential perception of length contrasts can already be observed in 18-month-old infants learning a language with phonemic vs. non-phonemic length, such as Dutch (Dietrich, Swingley, & Werker, 2007) or Japanese (Mugitani, Pons, Fais, Werker, & Amano, 2009). Furthermore, even though speakers of languages with no length contrasts (e.g., Spanish) tend to rely on length in non-native speech perception, they do not display the kind of perceptual warping along the length dimension that is typical for speakers of languages with contrastive length (i.e., increased perceptual sensitivity at category boundaries; Heeren & Schouten, 2008; Kondaurova & Francis, 2010). Given this body of evidence, naïve-listener discrimination of length contrasts could well be modulated by the overall role of length in the listener’s L1.

The main advantage of using length as the dimension of investigation is that it can be applied contrastively to a wide range of segments, both vowels and consonants. This means that a length contrast in one place of the L1 phonetic category inventory might potentially reveal an influence on perception of sounds in a completely different part of the perceptual space. Therefore, studying length allows us to investigate the limits of phonological abstractness that might guide non-native speech perception. Specifically, we ask whether sensitivity to length as a cue for distinguishing between L1 vowels translates into enhanced sensitivity to length as a cue for consonants, even when length is not used to distinguish between consonants in the listener’s L1. This is of particular interest given that vowels and consonants as a group are acoustically and functionally very distinct. For example, vowels are perceived differently from consonants (e.g., vowel perception is less categorical than consonant perception; e.g., Schouten & van Hessen, 1992), and the two types of segments carry different kinds of information (Bonatti, Peña, Nespor, & Mehler, 2005).

Here we test discrimination of non-native consonant length contrasts by naïve listeners of different L1 backgrounds: Korean, where length is a highly informative contrastive cue for both vowels and consonants; Vietnamese and Cantonese, where length is informative, but more limited (only for vowels, and – for Cantonese – only as an additional cue together with changes in vowel quality); and Mandarin Chinese, where length is uninformative for segmental distinctions (see detailed language characteristics in section 2.2). Note that we view cue informativity as a continuum, where the most informative cues are phonologically contrastive, while the secondary or allophonic cues are less informative, but where all cues play some role in shaping learners’ perceptual sensitivities (see section 4.1 for further discussion).

We use an AX discrimination task, which can inform us about perceptual sensitivities of naïve listeners. If non-native speech perception is affected by a phonological principle that produces general enhanced sensitivity to those cues that are informative in L1, irrespective of the exact segments involved, then we expect all length-experienced participants (Korean, Vietnamese, Cantonese) to outperform Mandarin speakers on discrimination of short versus long consonants. In addition, if the degree of cue informativity plays a role, then we expect gradient discrimination performance that corresponds to length informativity in L1: best for L1-Korean, medium for L1-Vietnamese and L1-Cantonese, and worst for L1-Mandarin. On the other hand, if this kind of phonological principle is restricted to segments that are acoustically/articulatorily fairly similar, then we expect L1-Korean listeners to show relatively good discrimination of short and long consonants (since Korean uses length contrastively for some consonants), but the other three groups (Vietnamese, Cantonese, Mandarin) should pattern together and show relatively worse discrimination (given that none of these languages uses length contrastively for consonants).

As a control, we included sibilant2 place of articulation contrasts from Polish (alveolo-palatal vs. retroflex fricatives and affricates) distinguished by dimensions familiar to speakers of Mandarin (where similar alveolo-palatal and retroflex consonants exist as allophones), but not to speakers of the other languages. Here, we expected a reverse performance pattern: Mandarin speakers outperforming other participants on discriminating between sibilants.

2. Materials and methods

2.1 Participants

96 undergraduate students at the University of California, San Diego participated in the experiment for course credit. Each participant was from one of four language groups: Korean, Vietnamese, Cantonese (all three length-experienced), and Mandarin (sibilant-experienced). All Cantonese speakers also spoke some Mandarin, learned at school, which means that they had been exposed to Mandarin sibilants. If such exposure was sufficient to induce perceptual sensitivity to sibilant contrasts, then we might observe an advantage for Cantonese speakers, relative to speakers of Korean and Vietnamese, in discriminating the sibilant contrasts.

Participants learned the target language from birth, and were bilingual in English. While recruiting monolingual participants would have been more desirable, we hoped that the relatively uniform experience with English across all language groups would not significantly affect the results. In particular, any differences between groups should not be driven by English knowledge, but rather by the differing L1 background.

We collected detailed language background information through a questionnaire, and we recruited both L1-dominant and English-dominant participants to ensure that the results would not be driven by language dominance. Language dominance was self-reported: participants were instructed to list all known languages in order of dominance, starting with the language they know best; in rare cases of inconsistencies between the listed language dominance order and the proficiency ratings, the experimenter asked follow-up questions to resolve the inconsistency. The language that was listed first was taken as the participant’s dominant language. The obvious limitation of collecting self-reported language information is that it is entirely subjective. However, we made sure to only recruit participants that considered themselves fluent in their L1, and used their L1 on a regular basis.

The questionnaire data revealed no major differences between language groups with respect to language background (see Table 5 in the Appendix for detailed information), although the Vietnamese-speaker group was overall more biased toward English, as we were not able to recruit as many Vietnamese-dominant participants, and the participants within the Vietnamese-dominant group immigrated to the US at a younger age than the L1-dominant participants from other language groups. We come back to this point in section 3.1.

Participants reported no history of speech or hearing problems.

2.2 Language characteristics

Korean uses length to distinguish between all vowels (e.g., [pul] “fire” vs. [pu:l] “blow”; Lee, 1999). There are only a few lexical items with underlying monomorphemic long consonants (e.g., [pəl:e] “worm”; Kim, 2002). More common are morphologically derived long consonants ([l:], [n:], [m:]), which arise from phonological assimilation processes (Sohn, 1999). These consonant length differences are not simply phonetic, but rather phonological in nature, as many near-minimal pairs can be found (e.g., [namu] “tree” and [nam:i] “South America”, or [samil] “three-one / March 1” and [sim:un] “interrogation”). In addition, Korean tense obstruents ([p’], [t’], [k’], [s’], [tɕ’]) have sometimes been analyzed as long (Choi, 1995). Korean does not have an alveolo-palatal vs. retroflex distinction; the inventory includes the alveolo-palatal affricate [tɕ] and the alveolo-palatal fricative [ɕ] that is an allophone of /s/, but no retroflex sounds (Hahm, 2007; Sohn, 1999).

In Vietnamese, length is phonologically contrastive for two sets of vowels: [a]-[a:] and [ə]-[ə:] (e.g., [bang] “state” vs. [ba:ng] “ice”), although the latter have been argued to also differ in vowel quality (Winn, Blodgett, Bauman, Bowles, Charters, Rytting, & Shamoo, 2008). Consonants are always short. Vietnamese does not have alveolo-palatal or retroflex sounds.

Length is used in Cantonese to distinguish between vowel categories, but it is generally only one of the cues in addition to distinctions in vowel quality, and all short-long vowel pairs but one are in complementary distribution (Bauer & Benedict, 1997). However, the one pair occurring in the same contexts ([ɐ]-[a:]) is distinguished almost exclusively by length, with only minimal quality differences (Zhang, 2011), which makes length-based minimal pairs quite numerous. In addition, length has been shown to be the primary cue for distinguishing other vowel pairs as well (Kao, 1971; Bauer & Benedict, 1997). Consonants are always short. Cantonese does not have retroflex sounds, but alveolar sibilants ([ts], [tsh], [s]) can be palatalized to the alveolo-palatal place of articulation ([tɕ], [tɕh], [ɕ]), especially before high front vowels (Bauer & Benedict, 1997).

Mandarin does not have segmental length contrasts (Lin, 2001), although Mandarin tones vary in length, and some listeners have been reported to use length to distinguish between tones when the main cue – the F0 pattern – has been artificially manipulated to be ambiguous (Tseng, Massaro, & Cohen, 1986; Blicher, Diehl, & Cohen, 1990; Jongman, Wang, Moore, & Sereno, 2006). As for sibilants, Mandarin has voiceless alveolo-palatals ([ɕ], [tɕ]) and retroflexes ([ʂ], [tʂ]) as allophones of the same phoneme: alveolo-palatals occur before high front vowels and the palatal glide, and retroflexes occur elsewhere (Lin, 2001). In addition, the voiced retroflex fricative [ʐ] is a between-speaker variant of the retroflex approximant [ɻ]. Other voiced sibilants are assumed to be absent because Mandarin has obstruent distinctions in aspiration, not voicing (Lin, 2001).

American English does not use length contrastively. Vowel length varies, but it correlates with the tense-lax distinction (e.g., beat vs. bit) and the voicing of the following segment (e.g., cad vs. cat). The differences in vowel length alone never distinguish between words, and native speakers of English identify vowels relying predominantly on spectral properties, with length serving only as a secondary cue (e.g., Hillenbrand, Clark, & Houde, 2000). Long consonants are sometimes attested but only at morpheme boundaries (e.g., dissatisfied; Benus, Smorodinsky, & Gafos, 2003). Minimal pairs are extremely rare (e.g., unnamed vs. unaimed), and for most speakers the contrast is neutralized (Kaye, 2005). Furthermore, there is evidence that by 18 months of age English-learning infants process length contrasts differently from infants learning a language that uses length contrastively (e.g., Dutch or Japanese; Dietrich, Swingley, & Werker, 2007; Mugitani, Pons, Fais, Werker, & Amano, 2009). English has neither alveolo-palatal nor retroflex obstruents, although some speakers produce the alveolar approximant [ɹ] as retroflex (Ladefoged & Maddieson, 1996; Westbury, Hashi, & Lindstrom, 1998).

2.3 Materials

The materials consisted of nonce words modeled after Polish phonology, and were recorded in a soundproof booth by a phonetically-trained Polish native speaker. Polish was chosen as the target language because it has both consonant length contrasts and alveolo-palatal vs. retroflex sibilant contrasts.3 Note that the Polish and the Mandarin alveolo-palatals and retroflexes differ in the exact place of articulation, but they share similar spectral cues that distinguish between the two (Ladefoged & Maddieson, 1996).

A complete list of sound segments used is provided in Table 1. The critical length items included short and long consonants. The control sibilant items included alveolo-palatal and retroflex fricatives and affricates, which differ in the spectral shape of the frication noise. Each sound segment was recorded embedded in seven contexts: [pa_a], [pe_a], [po_a], [ta_a], [te_a], [ka_a], [ke_a],4 with five repetitions of each word. The words were produced in isolation in a random order. For each word type, the two tokens with clearest pronunciations were chosen (e.g., [pama]1 and [pama]2). Subsequently, the chosen tokens were manipulated through splicing to ensure that the minimal-pair words differed only in length or place of articulation, with no irrelevant differences present elsewhere in a word. That is, the minimal-pair tokens always had acoustically identical contexts, and differed only in the middle consonant. All words were spliced, even in the cases of splicing segments across the two tokens of a given word (e.g., [pama]1 and [pama]2), in order to avoid any potential facilitation in processing “unspliced” words relative to “spliced” words. The exact procedure was slightly different across item types, and so in what follows they are described separately.

Table 1.

Sound segments used as stimuli, and the occurrences of corresponding sounds in Korean, Vietnamese, Cantonese, and Mandarin

length stimuli sibilant stimuli filler stimuli
short long alveolo-palatal retroflex
m n l s j w f m: n: l: s: j: w: f: (Vs)c ɕ ʑ ʂ ʐ x χ ʁ ʝ
Kor a a a b (√)
Viet (√)
Cant (√) (√) (√)
Mand d
a Almost exclusively as a result of phonological assimilation.

b Korean tense [s] is sometimes analyzed as long.

c Vs=vowels; long vowels were not included in the stimuli.

d Alveolo-palatals and retroflexes are allophones in Mandarin.

For length items, short-segment words were chosen as frames (e.g., [pama]1), and their middle consonants were removed (yielding, e.g., [pa_a]m1). Then, words with short segments were created by splicing in the consonants that were taken from other tokens of the same type (e.g., [pama]2). Thus, this procedure yielded new tokens, for which the context was taken from one token, and the middle consonant was taken from another token (e.g., [pa(m)2a]m1). Words with long segments were created from the spliced short-segment words (e.g., [pa(m:)2a]m1) by either doubling the consonant’s length (for sonorants: [j], [w], [l], [m], [n]) or elongating it by half its length (for fricatives: [f], [s]). This difference was introduced to mimic natural production, reflecting the fact that intervocalic length contrasts are perceptually harder for sonorants than for fricatives due to more blurred segment boundaries (Kawahara, 2007). The duration of short segments was not manipulated. Instead, naturally produced short consonants from each word type were spliced in. The exact consonant duration values for the length items are listed in Table 2.

Table 2.

Consonant duration values (in ms) for the length items

Middle consonant durations are given as short/long for each context.

| Segment type | [pa_a] | [pe_a] | [po_a] | [ta_a] | [te_a] | [ka_a] | [ke_a] | Mean long/short duration ratio |
|---|---|---|---|---|---|---|---|---|
| j/j: | 73/153 | 81/160 | 78/154 | 82/164 | 77/153 | 87/180 | 76/149 | 2.01 |
| w/w: | 88/176 | 79/165 | 66/130 | 88/184 | 64/126 | 61/120 | 73/146 | 2.01 |
| l/l: | 71/142 | 76/151 | 78/154 | 70/142 | 72/143 | 73/146 | 95/193 | 2.00 |
| m/m: | 81/163 | 79/155 | 83/170 | 92/186 | 87/176 | 91/180 | 89/177 | 2.00 |
| n/n: | 74/148 | 74/157 | 73/156 | 71/147 | 86/172 | 82/166 | 74/145 | 2.04 |
| s/s: | 139/208 | 152/228 | 147/220 | 150/225 | 139/209 | 160/240 | 155/234 | 1.50 |
| f/f: | 128/192 | 142/213 | 127/191 | 130/195 | 128/192 | 135/202 | 149/223 | 1.50 |
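As an illustration of how the ratio column in Table 2 relates to the individual durations, the following minimal sketch recomputes it for one sonorant and one fricative. It assumes that the reported value is the mean of the per-context long/short ratios, which the text does not state explicitly; the names `DURATIONS` and `mean_ratio` are hypothetical.

```python
from statistics import mean

# (short, long) middle-consonant durations in ms, copied from Table 2
# for two segment types, listed in the table's context order.
DURATIONS = {
    "j": [(73, 153), (81, 160), (78, 154), (82, 164), (77, 153), (87, 180), (76, 149)],
    "s": [(139, 208), (152, 228), (147, 220), (150, 225), (139, 209), (160, 240), (155, 234)],
}


def mean_ratio(pairs):
    """Average the per-context long/short duration ratios.

    Assumption: the table's ratio column is a mean of ratios,
    not a ratio of mean durations.
    """
    return mean(long / short for short, long in pairs)


# Rounded to two decimals, as in the table.
ratios = {segment: round(mean_ratio(pairs), 2) for segment, pairs in DURATIONS.items()}
```

Under this assumption the sonorant comes out close to a doubling and the fricative close to a 50% increase, in line with the manipulation described for the two segment classes.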

For sibilant items, both alveolo-palatals and retroflexes were spliced in. In order to handle differences in formant transitions between alveolo-palatals and retroflexes, we first checked which frame type (an alveolo-palatal or a retroflex word) would produce more natural-sounding stimuli. After experimenting with multiple tokens, it was determined that retroflex-segment words produced better frames (e.g., [paʂa]1). As with length items, the middle consonants were removed from the frame tokens (yielding, e.g., [pa_a]ʂ1). Then, both retroflex consonants and their corresponding alveolo-palatal consonants were spliced into the frames (e.g., [pa(ʂ)2a]ʂ1 and [pa(ɕ)1a]ʂ1). Note that the formant transitions into and out of the medial consonants were not manipulated, which meant that the transitions were appropriate for the retroflex consonants, but not for the alveolo-palatal consonants. Consequently, one of the main cues to the sibilant contrast – formant transitions out of alveolo-palatals – was partially removed, thus making this contrast perceptually less salient than in natural speech. We come back to this point later in the discussion of the results.

The fillers were spliced using the same procedure as for sibilant items. The frames for each contrast were again chosen based on initial experimentation, with the goal of choosing the most natural-sounding stimuli. Ultimately, the frames for the [x]-[χ] contrast were taken from the [x]-words, the frames for the [χ]-[ʁ] contrast were taken from the [χ]-words, and the frames for the [j]-[ʝ] contrast were taken from the [ʝ]-words.

2.4 Procedure

The experiment consisted of a same-different AX discrimination task. On each trial, a pair of words was presented auditorily over headphones. The words were either “different” (e.g., [pama]-[pam:a]; all tested contrasts are listed in Table 3) or “same” (e.g., [pama]-[pama]). “Same” words in each pair were physically identical. “Different” words in each pair always shared a physically identical frame (i.e., the words were identical except for artificial lengthening for length contrasts and a spliced consonant for sibilant and filler contrasts) to ensure that “different” responses resulted only from the manipulation of interest, and not due to irrelevant differences present elsewhere in a word. The words in each pair were separated by a 750 ms interstimulus interval. Each pair was repeated twice throughout the experiment, which yielded 392 pairs (196 length pairs, 196 sibilant and filler pairs; half “different”, half “same”), divided into seven 56-trial blocks separated by self-terminated breaks. In each trial, a word-pair was played once without a replay option, and the response to one trial triggered presentation of the subsequent trial with a 500 ms delay. Trial order was randomized for every participant. The testing was preceded by a 16-trial no-feedback practice session.

Table 3.

Tested contrasts

Length Sibilant Filler
j-j: ɕ-ʂ x-χ
w-w: tɕ-tʂ χ-ʁ
l-l: ʑ-ʐ j-ʝ
m-m: dʑ-dʐ
n-n:
s-s:
f-f:

3. Results

For each tested contrast and each participant we calculated d-prime scores, a measure of contrast sensitivity based on the principles of Signal Detection Theory (Macmillan & Creelman, 2005).5 The main results are plotted in Figures 1 and 2. In addition, Table 4 provides the raw accuracy scores that correspond to the d-prime scores in Figure 1.
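
To make the sensitivity measure concrete, here is a minimal sketch of a d-prime computation using the standard Signal Detection Theory definition, d′ = z(H) − z(FA), where H is the proportion of "different" responses on different trials and FA is the proportion of "different" responses on same trials. The function name `d_prime`, the trial counts in the usage lines, and the clipping of rates to [0.01, 0.99] are assumptions made for illustration; the clipping is simply one common way to avoid infinite z-scores and is not described in the text.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections, floor=0.01):
    """d' = z(H) - z(FA) for one participant and one contrast.

    Rates are clipped to [floor, 1 - floor] so that perfect or empty
    cells do not produce infinite z-scores (an illustrative choice).
    """
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    hit_rate = min(max(hit_rate, floor), 1 - floor)
    fa_rate = min(max(fa_rate, floor), 1 - floor)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: 14 "different" and 14 "same" trials for one contrast.
print(round(d_prime(hits=14, misses=0, false_alarms=0, correct_rejections=14), 2))
print(round(d_prime(hits=7, misses=7, false_alarms=7, correct_rejections=7), 2))
```

With rates clipped at 0.99 and 0.01, a participant who answers every trial correctly receives a score of about 4.65, and a participant who responds "different" equally often on same and different trials receives a score of zero.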

Figure 1.

Overall performance on perception of three types of contrasts: length, sibilant, and filler (error bars are standard errors)

Figure 2.

Performance on perception of length contrasts split by segment (error bars are standard errors)

Table 4.

Overall performance: mean % accuracy

Length Sibilant Filler
Korean 82.1 57.7 71.8
Vietnamese 80.0 59.7 72.9
Cantonese 74.4 59.2 68.7
Mandarin 66.1 64.3 67.4

We analyzed the d-prime scores using repeated-measures ANOVAs. An overall 4×3 ANOVA with the factors LANGUAGE (Korean, Vietnamese, Cantonese, Mandarin) and CONTRAST (length, sibilant, filler) revealed significant effects of both LANGUAGE [F(3,92)=3.0; p<.05] and CONTRAST [F(2,184)=172.9; p<.001], as well as a significant interaction [F(6,184)=17.9; p<.001]. To investigate each of these effects, we proceeded with a series of comparisons between language groups.
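
The degrees of freedom reported for this analysis can be cross-checked against the group sizes in Table 5. The sketch below assumes a standard mixed design (one between-participant factor, one within-participant factor, no sphericity correction); the function name `mixed_anova_df` is hypothetical and the code only reproduces the degrees of freedom, not the F statistics.

```python
def mixed_anova_df(group_sizes, n_within_levels):
    """Degrees of freedom for a mixed ANOVA with one between-participant
    factor (the groups) and one within-participant factor."""
    n_total = sum(group_sizes)
    n_groups = len(group_sizes)
    df_between = n_groups - 1
    df_between_error = n_total - n_groups
    df_within = n_within_levels - 1
    df_within_error = df_between_error * df_within
    df_interaction = df_between * df_within
    return {
        "between": (df_between, df_between_error),
        "within": (df_within, df_within_error),
        "interaction": (df_interaction, df_within_error),
    }


# Participants per language group, summing the L1-dominant and
# English-dominant subgroups from Table 5
# (Korean, Vietnamese, Cantonese, Mandarin).
GROUP_SIZES = [12 + 12, 7 + 17, 12 + 12, 12 + 12]

print(mixed_anova_df(GROUP_SIZES, n_within_levels=3))
```

With 96 participants in four groups and three contrast types, this gives (3, 92) for Language, (2, 184) for Contrast, and (6, 184) for the interaction, matching the values reported above.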

First, we compared length-experienced participants (Korean, Vietnamese, Cantonese speakers) to sibilant-experienced participants (Mandarin speakers) using a 2×2 ANOVA with the factors LANGUAGE GROUP (length-experienced, sibilant-experienced) and CONTRAST (length, sibilant). There was a significant interaction between Language Group and Contrast [F(1,94)=51.7; p<.001]: length-experienced participants were more sensitive to length differences, and sibilant-experienced participants were more sensitive to sibilant contrasts.6 The results also revealed a main effect of Language Group [F(1,94)=6.5; p<.05]: Mandarin speakers performed overall worse than length-experienced participants as a group. Mandarin speakers’ diminished sensitivity to length contrasts cannot, however, be attributed to worse overall performance, since the result was reversed for sibilant contrasts7 (also supported by a significant interaction between Language Group and sibilant vs. filler Contrast [F(1,94)=24.0; p<.001]), and the differences between the two Language Groups on length and on filler contrasts were of different magnitudes (as indicated by a significant interaction between Language Group and length vs. filler Contrast [F(1,94)=33.4; p<.001]).

Crucially, the main result was not driven just by Korean performance, but also held for each relevant pairwise Language comparison, as indicated by significant interactions between Language and length vs. sibilant Contrast (Korean-Mandarin: [F(1,46)=63.7; p<.001]; Vietnamese-Mandarin: [F(1,46)=33.1; p<.001], Cantonese-Mandarin: [F(1,46)=18.7; p<.001]). The results reveal an extremely robust pattern: Korean, Vietnamese, and Cantonese speakers were consistently better at discriminating length contrasts than Mandarin speakers for each tested segment (Figure 2).

As for the comparisons within the length-experienced group, there was no significant difference on length contrasts for the Korean-Vietnamese pair [F<1], but there was a significant difference for both Korean-Cantonese [F(1,46)=11.0; p<.01] and Vietnamese-Cantonese [F(1,46)=5.1; p<.05], with Cantonese speakers performing worse. Therefore, the observed pattern of performance was: Korean, Vietnamese > Cantonese > Mandarin (although Vietnamese speakers performed numerically worse than Korean speakers). This result does not align exactly with our predictions, since we expected either (1) all length-experienced participants patterning together: Korean, Vietnamese, Cantonese > Mandarin, or (2) a gradient pattern: Korean > Vietnamese, Cantonese > Mandarin. However, this result is consistent with the idea that sensitivity to a given phonetic dimension is mediated by the degree of informativity that this dimension has in the learner’s native language. Recall that Korean uses length contrastively on both vowels and consonants, and Vietnamese uses it contrastively on vowels but not consonants. In Cantonese, on the other hand, length is only one of the cues to vowel distinctions (in addition to changes in vowel quality). Therefore, it appears that the major factor affecting perceptual sensitivities is the contrastiveness of a cue in L1, regardless of the exact segments it applies to (whether vowels or consonants). On the other hand, when a dimension is only one of the cues to a phonemic contrast (as length in Cantonese vowels), perceptual sensitivities to other contrasts along that dimension are also enhanced, but to a lesser degree.

3.1 Effects of language dominance

In this section we investigate more closely the potential effects of participants’ exact language background. Recall that all participants in our study were bilingual in English, which might have had an impact on their perceptual sensitivities (see, e.g., Antoniou, Tyler, & Best, 2012). Most importantly, discrimination of length contrasts in our experiment might have been facilitated due to length being informative for some contrasts in English (e.g., as a secondary cue for the tense-lax vowel contrasts). On the one hand, all participants in our study spoke English, and so any differences observed between language groups might seem unlikely to be driven by English knowledge. On the other hand, however, participants differed in their English exposure, and there was an imbalance in the Vietnamese speaker group in that – relative to other language groups – more participants were dominant in English, and even those dominant in their L1 arrived in the US at an earlier age.

In order to investigate the question of English influence more closely, we tested whether there were any differences in performance depending on the participant’s dominant language. As the first step, we reran the main analysis as a 4×3×2 ANOVA with the factors Language (Korean, Vietnamese, Cantonese, Mandarin), Contrast (length, sibilant, filler), and Language Dominance (L1, English). The results, illustrated in Figure 3, revealed no main effect of Language Dominance [F<1], but a significant interaction between Language Dominance and Contrast [F(2, 176)=3.5; p<.05] (in addition to the previously found significant effects of Language, Contrast, and their interaction). Separate 4×2 ANOVAs for each contrast type showed that the interaction was due to Language Dominance having a significant effect for sibilant items [F(1, 88)=1.0; p<.05], but not for length or filler items [Fs<1]. In particular, L1 dominance correlated with better performance on sibilant items, but not other items. There were no significant interactions between Language and Language Dominance in any of the models [Fs<1].

Figure 3.

Overall performance by language dominance (error bars are standard errors)

For sibilant items, better performance among L1-dominant participants was especially pronounced for Mandarin speakers, which is consistent with our hypothesis that the knowledge of Mandarin enhanced listeners’ discrimination of Polish alveolo-palatals and retroflexes. Similarly, a slight advantage for L1-dominant Cantonese speakers on sibilant items might be due to their knowledge of Mandarin, since L1-dominant participants within this group were likely to have had more Mandarin exposure than English-dominant participants. The source of an analogous advantage for L1-dominant Korean speakers is less clear, but perhaps it can be explained by more extensive exposure to Korean voiceless alveolo-palatals. Based on these results, it is possible that monolingual speakers would perform overall better on sibilant items than our bilingual participants, and this should be especially true for speakers of Mandarin.

For length items, the English-dominant length-familiar participants (Korean, Vietnamese, Cantonese) performed slightly better than L1-dominant participants, but this difference was not statistically significant. The pattern of responses was similar across all length-familiar participants, suggesting that the overall imbalance within the Vietnamese-speaker group (i.e., overall more extensive exposure to English relative to other language groups) did not impact the results. Furthermore, the pattern was reversed for Mandarin speakers: L1-dominant participants performed better than English-dominant participants, despite the fact that the latter reported the highest regular English exposure and the highest preferred English usage of all tested language groups (see Table 5 in the appendix).8 Therefore, there is no clear evidence that the knowledge of English significantly affected participants’ discrimination of length contrasts, and we might expect that monolingual listeners would perform similarly on the length items to our bilingual participants.

Critically, our main finding – that is, better discrimination of length contrasts by Korean/Vietnamese/Cantonese than by Mandarin speakers, but better discrimination of sibilant contrasts by Mandarin than by Korean/Vietnamese/Cantonese speakers – holds regardless of the participants’ language dominance. Even when we only consider L1-dominant speakers, we observe the same pattern of results: a significant interaction between Language and length/sibilant/filler Contrast in a 4×3 ANOVA [F(6, 78)=9.9; p<.001], and significant interactions between Language and length vs. sibilant Contrast for each relevant pairwise Language comparison (Korean-Mandarin: [F(1,22)=5.3; p<.001]; Vietnamese-Mandarin: [F(1,17)=11.2; p<.01]; Cantonese-Mandarin: [F(1,22)=5.6; p<.05]).

4. Discussion

In this paper we investigated the extent to which general phonological principles affect non-native speech perception. Hallé et al. (1999) and Bohn and Best (2012) showed that when the listeners’ L1 uses the lip-rounding cue to distinguish between vowels, their discrimination is also enhanced for a non-native lip-rounding contrast between glides ([w]-[j]). This particular result could be explained, as Bohn and Best suggested, by a simple phonological principle: contrastive lip rounding in L1 enhances perceptual sensitivity to the lip-rounding cue, which in turn leads to overall better discrimination of any non-native lip-rounding contrasts. Here we tested the extent of abstractness of such phonological principles: Do they apply only in cases when the corresponding contrasts are phonetically similar (as is the case with vowels and glides)? Or are they more abstract, applying regardless of the degree of phonetic similarity between known and non-native contrasts? We tested a different cue, length, which can span a wide range of segments, both vowels and consonants. We investigated whether sensitivity to L1 length differences on vowels will lead to enhanced perceptual sensitivity to non-native length differences on a range of consonants (glides, liquids, nasals, and voiceless fricatives). We recruited participants of different L1 backgrounds: Korean, where length is a highly informative contrastive cue for both vowels and consonants; Vietnamese and Cantonese, where length is informative, but more limited (only for vowels, and – for Cantonese – only as an additional cue together with changes in vowel quality); and Mandarin Chinese, where length is uninformative for segmental distinctions.

The results revealed differential discrimination of non-native length contrasts: all length-experienced participants (Korean, Vietnamese, Cantonese) outperformed participants not experienced with length (Mandarin). Furthermore, there were differences within the length-experienced group: performance was best for Korean speakers, slightly worse for Vietnamese speakers, and worst for Cantonese speakers. These results suggest that experience with a phonetic dimension on a limited subset of segments can lead to enhanced perceptual sensitivity to any distinctions along that dimension, even those that are acoustically and functionally very dissimilar from the previously learned categories. This result cannot be attributed to better overall task performance, as the pattern was reversed for sibilant contrasts, which are more familiar to Mandarin speakers.

Furthermore, we found some evidence of gradience: discrimination appears to be mediated by the degree of informativity that this dimension has in the learner’s native language. Note that while we did not expect a difference between Vietnamese and Cantonese speakers, it may be that length is in fact less informative in Cantonese than in Vietnamese due to the fact that it is only one of the cues to vowel contrasts.

These results add to the previous findings suggesting that existing models of non-native speech perception and learning need to be extended, incorporating the role of abstract phonological principles. The influence of such phonological principles might be captured in one existing framework: Processing Rich Information from Multidimensional Interactive Representations (PRIMIR; Werker & Curtin, 2005; Curtin, Byers-Heinlein, & Werker, 2011). The basic idea of PRIMIR is that linguistic knowledge is represented in non-hierarchically organized multidimensional spaces, including a General Perceptual space, a Word Form space, and a Phoneme space, with abstract phonemes emerging after the infant has acquired some of the lexicon, integrating information from the other two spaces. Werker, Curtin, and Byers-Heinlein do not explicitly discuss the influence of general phonological principles on non-native speech perception (such as which cues are generally informative or contrastive in L1). However, this influence might be captured in PRIMIR by assuming, for example, a separate learning mechanism available to infants that dynamically evaluates the general informativity of different phonetic cues in L1. This kind of mechanism would aid the acquisition of the L1 sound system because noticing featural relationships between categories would allow learners to pool the data from multiple categories along the same dimension, and thus require less data from each individual category to form accurate category representations. Crucially, once established for L1, the generalizations about cue informativity would affect perceptual sensitivities in a non-native language as well, thus providing an explanation for the results reported here, as well as those by Hallé et al. (1999) and Bohn and Best (2012).

These ideas could also be developed within a framework that builds on insights from recent rational approaches to learning (Tenenbaum et al., 2011), while accommodating the key insights of PRIMIR and related approaches. In particular, we propose to view perceptual reorganization as the consequence of learners’ hierarchical inductive inferences about the structure of the language’s sound system: infants not only acquire the specific phonetic category inventory, but also draw higher-order generalizations over the set of those categories, such as the overall informativity of phonetic dimensions for sound categorization. Non-native sound perception is then determined by sensitivities that emerge from these generalizations, in addition to mappings of non-native sounds onto native-language phonetic categories. Below we motivate and develop this account in more detail. We hope that by drawing on ideas from the general approaches to learning as rational inductive inference, we will contribute additional insights to our understanding of perceptual reorganization and non-native speech perception.

4.1 Hierarchical inductive inference in perceptual reorganization

Current theories view perceptual reorganization in infancy as resulting from the acquisition of the specific inventory of L1 phonetic categories, and yielding a “warped” perceptual space that then acts as a direct filter in naïve-listener perception of non-native sounds (Kuhl, 1991, 2000). Here we propose to adopt an approach in which perceptual reorganization is viewed as a consequence of hierarchical inductive inferences over the sound inventory of a language: in addition to learning individual phonetic categories, learners are posited also to make higher-order generalizations about the general properties of the set of those categories. On our proposal, then, perceptual reorganization leads not only to perceptual sensitivity to specific L1 phonetic categories, but also to sensitivity induced by these higher-order generalizations.

Our proposal is based on rational approaches to learning, and in particular recent work in the computational modeling of acquisition of abstract knowledge, suggesting that learners find regularities in the input through (implicit) inductive inferences and construct hierarchical representations with deep abstract structure (Chater & Manning, 2006; Tenenbaum et al., 2011). More specifically, learning has been shown to occur not only at a single, flat level of representation but rather hierarchically. This means that the learner makes simultaneous inductive inferences about both particular categories and higher-level category structure. To take one example from the language domain: when acquiring individual verbs in a language, learners also infer general properties of verbal classes (Perfors, Tenenbaum, & Wonnacott, 2010). In terms of recent work in phonetic category induction (de Boer & Kuhl, 2003; Vallabha, McClelland, Pons, & Werker, 2007; McMurray, Aslin, & Toscano, 2009; Feldman, Griffiths, & Morgan, 2009a; Feldman, Griffiths, Goldwater, & Morgan, 2013) – where native-speaker knowledge is represented as a set of distributions over perceptual space, one distribution for each phonetic category – our proposal can be viewed as adding a higher-level distribution responsible for generating specific category-level distributions.
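
One very simple way to illustrate the idea of a higher-level generalization over categories is a toy Beta-Bernoulli model. This is not the model proposed in this paper or in the work cited above; it is a deliberately minimal sketch in which a single language-level parameter (the probability that a category pair is distinguished along a given phonetic dimension) is inferred from the pairs a learner already knows and then used to predict a pair the learner has never encountered. The function name, the uniform Beta(1, 1) prior, and the counts for the two hypothetical languages are all assumptions made for illustration.

```python
def predictive_informativity(n_contrastive, n_pairs, prior_a=1.0, prior_b=1.0):
    """Posterior predictive probability that a NEW category pair is
    distinguished along a phonetic dimension, given that n_contrastive of
    the n_pairs known pairs are, under a Beta(prior_a, prior_b) prior on
    the language-level rate."""
    if not 0 <= n_contrastive <= n_pairs:
        raise ValueError("n_contrastive must lie between 0 and n_pairs")
    return (prior_a + n_contrastive) / (prior_a + prior_b + n_pairs)


# Hypothetical learners (illustrative counts, not data from the study):
# language A distinguishes 5 of 5 known vowel pairs by length,
# language B distinguishes 0 of 5.
language_a = predictive_informativity(n_contrastive=5, n_pairs=5)
language_b = predictive_informativity(n_contrastive=0, n_pairs=5)

print(f"Language A: P(length matters for a new pair) = {language_a:.3f}")
print(f"Language B: P(length matters for a new pair) = {language_b:.3f}")
```

In this sketch the learner of language A expects length to matter for an unfamiliar pair (probability 6/7), while the learner of language B does not (probability 1/7), even though neither learner has any direct experience with that pair. A fuller hierarchical model would generate category-level distributions over perceptual space from the higher-level parameter rather than binary outcomes.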

One type of higher-order generalization that learners might make about their native-language sound system involves the underlying set of informative phonetic dimensions from which the system is constructed.9 Positing dimension-based generalizations is motivated by evidence that infants encode speech sounds in terms of their subsegmental properties (e.g., Saffran & Thiessen, 2003; Maye et al., 2008; Cristià & Seidl, 2008; Cristià, Seidl, & Francis, 2011), suggesting that the information necessary to derive generalizations about phonetic dimension informativity in the native language is available to infant learners. This is also consistent with viewing phonetic learning as shifting selective attention to relevant acoustic-phonetic cues (e.g., Jusczyk, 1985, 1986, 1994, 1997; Best, 1994; Nusbaum & Goodman, 1994; Pisoni et al., 1994; Nittrouer & Miller, 1997). The dimensions that infants learn to attend to can include a range of phonetic cues with different degrees of informativity. The most informative dimensions are contrastive: they uniquely distinguish between phonemic categories (e.g., VOT differentiating English /b/-/p/, /d/-/t/, /g/-/k/) and are necessary to discriminate lexical items (e.g., bin-pin), but other properties may also be attended to, such as secondary cues to phonemic categories (e.g., vowel duration in bet vs. bed) or cues distinguishing between allophones (e.g., aspiration of the stop consonant in pool vs. spool). Thus, any higher-order generalizations about phonetic dimensions are likely modulated by their gradient native-language informativity. Crucially, these generalizations will differ depending on the person’s language background because the exact configuration of informative dimensions varies cross-linguistically (Ladefoged & Maddieson, 1996): for example, VOT distinguishes categories in English, but plays no role in Hawaiian; segmental length is contrastive for many Japanese categories (e.g., /p/-/p:/, /t/-/t:/), but is only a secondary cue to some English vowel distinctions (e.g., /ɪ/-/i/) and is uninformative in languages like Spanish.

Therefore, we propose that perceptual reorganization involves higher-order generalizations about phonetic dimension informativity. The crucial prediction of this account is that the end-state of perceptual reorganization is not just a perceptual space warped around L1 phonetic categories, as in the current theories. Instead, the end-state includes (1) the knowledge of the inventory of specific phonetic categories in the language (which in turn gives rise to perceptual warping effects around specific category boundaries for reasons such as those given by Feldman, Griffiths, & Morgan, 2009b), but also (2) heightened overall sensitivity and precision of encoding for those phonetic dimensions that determine category contrasts within the language as a whole. Thus, sensitivity is based on subsegmental properties abstracted away from individual categories.

Our proposal follows naturally from assuming a rational language learning and processing architecture, because for perceptual reorganization to proceed along these lines would be adaptive for the language learner. In particular, given that languages extensively re-use a limited set of phonetic dimensions (Clements, 2003), learning one distinction (e.g., [b]-[p]) might help learn analogous distinctions (e.g., [d]-[t], [g]-[k]). Hence, if the learner’s perceptual sensitivity to a given phonetic dimension is enhanced once the dimension is determined to be informative for one set of L1 phonetic categories, this would allow for perceptual bootstrapping and more efficient learning of L1 phonetic categories overall. Existing experimental evidence supports this view: both infants and adults generalize newly learned phonetic category distinctions to untrained sounds along the same dimension (McClaskey, Pisoni, & Carrell, 1983; Maye et al., 2008; Perfors & Dunbar, 2010; Pajak & Levy, 2011a, 2011b; see also Pajak, Bicknell, & Levy, 2013, for a computational implementation). Our proposal attributes these learning results to the mechanisms underlying perceptual reorganization, which – as we propose – induce sensitivity to whole phonetic dimensions, not just individual phonetic categories.

Let us now turn to the consequences of this view of perceptual reorganization for non-native speech perception. If perceptual reorganization in infancy yields general sensitivity to phonetic dimensions that are informative in L1, then listeners’ perception should be enhanced for any contrast along those informative dimensions. This means that naïve-listener perception of non-native sounds would be determined by perceptual sensitivities emerging from the higher-order generalizations made during perceptual reorganization, in addition to mappings of non-native sounds onto L1 phonetic categories. That is, learning a single VOT distinction in L1 (e.g., [b]-[p]) would lead to enhanced sensitivity not only to that particular distinction, but also to analogous VOT distinctions (e.g., [d]-[t], [g]-[k]), even if they do not in fact occur in L1. Similarly, learning lip rounding as a cue for distinguishing between L1 vowels would lead to enhanced sensitivity to other lip-rounding contrasts, such as [w]-[j], as reported by Hallé et al. (1999) and Bohn and Best (2012). Finally, learning that length is an informative cue to distinguish between some categories (e.g., vowels) would lead to enhanced sensitivity to other length contrasts (e.g., on consonants), again, even if they do not in fact occur in the learner’s L1. Therefore, the hierarchical inductive inference framework allows us to capture the present results, as well as the results of Hallé et al. (1999) and Bohn and Best (2012), thus incorporating the idea that non-native speech perception is affected by abstract phonological principles.

Although the results reported here only bear on one type of possible inference about learners’ native-language sound system, the hierarchical inductive inference framework predicts that learners make other types of higher-order generalizations about linguistic structures that should affect non-native language processing, both in the sound domain (e.g., likely phonotactic patterns) and in other aspects of language (e.g., likely word and morpheme orderings). Thus, this framework fits within the broader literature investigating generalization of linguistic knowledge in different language domains (e.g., Xu & Tenenbaum, 2007; Wonnacott, Newport, & Tanenhaus, 2008; Gerken, 2010), and it provides theoretical unification with many other domains in which inductive approaches to learning have proven fruitful.

5. Conclusion

In this paper we investigated the role of abstract phonological principles in shaping non-native speech perception. We reported results demonstrating that some perceptual sensitivities cannot be attributed to listeners’ warped perceptual space alone, as assumed in prior accounts, but rather to enhanced general sensitivity along phonetic dimensions that the listeners’ native language employs to distinguish between categories. Specifically, we showed that knowledge of a language with short and long vowel categories is associated with enhanced discrimination of non-native consonant length contrasts. These results suggest that abstract phonological principles – such as whether a phonetic dimension is overall informative in L1 – influence perception of non-native sounds. Critically, such principles operate at an abstract level given that they apply across sound groups that are acoustically and functionally very distinct. To account for these results we developed a novel approach within a hierarchical inductive inference framework. We proposed viewing perceptual reorganization in infancy as the consequence of learners’ hierarchical inductive inferences about the structure of the language’s sound system. That is, we argued that infants not only acquire the specific phonetic category inventory of their native language, but also draw higher-order generalizations over the set of those categories. One such generalization captures the overall informativity of phonetic dimensions for sound categorization in L1. We further argued that this re-conceptualization of perceptual reorganization has consequences for non-native speech perception: perceptual sensitivities of naïve listeners emerge from higher-order generalizations formed during L1 acquisition, rather than exclusively from mappings of non-native sounds onto native-language phonetic categories. We believe that this account contributes new insights to our understanding of perceptual reorganization and non-native speech perception by offering a rational perspective that has been successful in many other domains.

Acknowledgements

We thank Eric Bakovic, Klinton Bicknell, Sarah Creel, Tamar Gollan, Dennis Norris, Amy Perfors, and Arthur Samuel for useful feedback. We are also grateful to the associate editor, Ocke-Schwen Bohn, and four anonymous reviewers, whose comments and suggestions have been extremely helpful. This research was supported by NIH Training Grant T32-DC-000041 from the Center for Research in Language at UC San Diego to B.P. and NIH Training Grant T32-DC000035 from the Center for Language Sciences at University of Rochester to B.P.

Appendix

Table 5.

Mean and standard deviation of self-reported participant characteristics

Language Korean Korean Vietnamese Vietnamese Cantonese Cantonese Mandarin Mandarin
Dominance L1-dominant Eng-dominant L1-dominant Eng-dominant L1-dominant Eng-dominant L1-dominant Eng-dominant
N 12 12 7 17 12 12 12 12
Measure M (SD) M (SD) M (SD) M (SD) M (SD) M (SD) M (SD) M (SD)
Age 21 (3.0) 20 (1.0) 20 (1.2) 20 (1.4) 20 (0.9) 20 (1.5) 21 (1.4) 20 (1.3)
Age when immigrated to the US [a] 14 (5.3) 4 (5.3) 5 (4.5) 0 (0.3) 14 (4.4) 2 (4.4) 12 (4.4) 4 (4.3)
Self-rated L1 proficiency (0-none, 10-perfect) [b] 9.1 (1.0) 7.4 (2.1) 8.1 (0.9) 7.0 (0.9) 9.2 (1.3) 8.3 (1.2) 9.4 (1.0) 8.1 (1.0)
% time current L1 exposure 44 (13.8) 34 (17.1) 25 (14.7) 20 (12.2) 41 (24.6) 33 (8.9) 46 (14.7) 16 (11.8)
L1 use w/family (0-never, 10-always) 9.9 (0.3) 8.6 (1.5) 8.7 (1.8) 8.1 (1.8) 9.1 (1.9) 9.6 (0.8) 9.6 (0.9) 8.6 (1.7)
L1 use w/friends (0–10) 6.8 (1.7) 4.0 (2.3) 2.6 (2.5) 1.7 (1.8) 7.1 (3.1) 4.0 (2.1) 5.5 (3.2) 2.8 (2.3)
% time preferred L1 use [c] 61 (15.6) 35 (21.5) 46 (9.4) 26 (14.0) 58 (25.1) 33 (9.4) 56 (24.7) 26 (15.1)
Age when began regular Eng exposure [d] 10 (4.7) 4 (3.8) 6 (3.5) 3 (1.8) 5 (2.8) 4 (1.7) 8 (3.6) 5 (2.7)
Self-rated Eng proficiency (0–10) [b] 7.3 (1.1) 9.4 (0.8) 7.7 (0.49) 9.4 (0.6) 7.3 (1.0) 9.2 (0.9) 7.3 (0.7) 9.0 (1.1)
% time current Eng exposure 53 (13.6) 65 (16.0) 74 (15.1) 77 (13.5) 45 (22.9) 57 (11.4) 50 (14.7) 81 (18.6)
Eng use w/family (0–10) 1.0 (1.76) 4.3 (2.4) 3.6 (2.9) 6.1 (2.1) 1.1 (1.2) 4.0 (2.9) 2.3 (1.9) 3.3 (2.1)
Eng use w/friends (0–10) 5.8 (2.2) 9.0 (1.6) 9.1 (1.5) 9.7 (0.6) 7.3 (2.1) 8.6 (2.0) 7.4 (1.9) 9.6 (0.7)
% time preferred Eng use [c] 35 (16.2) 63 (20.9) 55 (10.3) 69 (15.6) 28 (20.3) 51 (19.6) 43 (24.2) 73 (15.3)

[a] If born in the US, coded as 0.

[b] Mean proficiency speaking and understanding.

[c] “If you could freely choose a language to speak, what percentage of time would you choose to speak each language?”

[d] Participants were instructed to provide the age when they started learning English. This typically reflects the onset of immersion in an English-speaking environment (e.g., US preschool) or the beginning of classroom exposure to English as a second language outside the US.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

1

Throughout the paper we use the term “phonetic categories”, not making an explicit distinction between phonemes and allophones. For discussion on the relationship between phonetic and phonological levels in perception and learning of sound inventories, see, for example, Best and Tyler (2007) and Dillon, Dunbar, and Idsardi (2013).

2

The term “sibilant” refers to the subset of fricatives and affricates that is characterized by high-pitched frication noise (e.g., [s], [z], [ʃ], [ʒ], [ʂ]).

3

As for the length contrasts, Polish has both “true” geminate consonants, which are underlyingly long (mostly borrowings from other languages, e.g., [get:ɔ] ‘ghetto’, [an:a] ‘Ann’), and “fake” geminate consonants, which are derived via morphological processes (e.g., [v:ɔʑitɕ] ‘to carry in’, [bes:trɔn:ɨ] ‘impartial’; for more examples and discussion see, e.g., Zajda, 1977; Rubach, 1986; Rubach & Booij, 1990; Sawicka, 1995; Thurgood, 2002; Pajak, 2010; Pajak & Baković, 2010). As for the sibilant contrasts, the retroflex consonants in Polish are considered by some to be postalveolar, but see Keating (1991), Ladefoged and Maddieson (1996), and Hamann (2004) for arguments regarding their analysis as slightly retroflex (or retracted). Importantly, Polish and Mandarin are analogous in having three sibilant places of articulation (e.g., [s], [ɕ], [ʂ]), a typologically rare distinction that makes use of similar cues (Ladefoged & Maddieson, 1996; see also Nowak, 2006, for an acoustic analysis of Polish sibilants).

4

The stimuli were originally prepared for use with Mandarin-English and Cantonese-English bilinguals, and the segmental contexts were chosen so that the particular sequences were legal in all three languages: Mandarin, Cantonese, and English.

5

D-prime scores are calculated from the standardized (z-transformed) proportions of Hits (correct “different” responses) and False Alarms (incorrect “different” responses), using the following formula: d′ = z(H) − z(FA). D-prime values near zero indicate chance performance. The maximum value of d-prime depends on the specific dataset, but the effective ceiling is generally considered to be 4.65.
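
As an illustration of the d-prime calculation described in this footnote, the following is a minimal sketch in Python. The function name `d_prime` and the clamping of proportions to the range [0.01, 0.99] are assumptions made for this example and are not a procedure reported in the paper; the clamp is one common convention for avoiding infinite z-scores, and it happens to reproduce the effective ceiling of about 4.65 mentioned above.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float, floor: float = 0.01) -> float:
    """Return the sensitivity index d' = z(H) - z(FA).

    hit_rate: proportion of "different" trials answered "different".
    false_alarm_rate: proportion of "same" trials answered "different".
    floor: hypothetical clamp value (an assumption of this sketch) that keeps
        proportions inside [floor, 1 - floor] so the z-transform stays finite.
    """
    # Clamp both proportions away from 0 and 1.
    h = min(max(hit_rate, floor), 1.0 - floor)
    fa = min(max(false_alarm_rate, floor), 1.0 - floor)

    # z() is the inverse cumulative distribution function of the standard normal.
    z = NormalDist().inv_cdf
    return z(h) - z(fa)


if __name__ == "__main__":
    # Equal hit and false-alarm rates mean no sensitivity (chance performance).
    print(round(d_prime(0.5, 0.5), 2))
    # Perfect performance is clamped to H = 0.99 and FA = 0.01.
    print(round(d_prime(1.0, 0.0), 2))
```

With these assumptions, a listener who responds “different” equally often on same and different trials receives a score of zero, and a listener with perfect performance receives the clamped maximum.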

6

We expected that Cantonese speakers might have had an advantage over Korean and Vietnamese speakers on discriminating alveolo-palatal and retroflex consonants, given their experience with Mandarin through school instruction. However, no such advantage was observed: Cantonese speakers patterned with Korean and Vietnamese speakers in their performance on sibilant trials, thus suggesting that their experience with Mandarin was not sufficient to enhance their sensitivity to alveolo-palatal vs. retroflex consonant contrasts.

7

Note that Mandarin speakers’ relatively low performance extended even to the sibilant contrasts, which were discriminated slightly worse than the filler contrasts. This result might be due to two main factors: (1) one of the cues to the contrast – the formant transition out of the consonant – was partially removed due to splicing, and (2) the tested Polish alveolo-palatal vs. retroflex sibilant contrast differed in the exact place of articulation from the analogous Mandarin contrast (Ladefoged & Maddieson, 1996). The first factor made the contrast overall perceptually less salient, thus making it hard for all participants, while the second factor possibly diminished Mandarin speakers’ perceptual advantage relative to other listeners on the sibilant stimuli. Indeed, when the formant transition cues are left intact, the discrimination of the Polish alveolo-palatal vs. retroflex sibilant contrast by Mandarin listeners is much higher (around 85–90% accuracy; Pajak & Levy, 2012; Pajak, Creel, & Levy, in prep.).

8

It is not obvious why L1-dominant Mandarin speakers would perform better on length items than English-dominant ones. One possibility is that there is an advantage coming from duration differences in Mandarin tones, and L1-dominant participants benefit from more extensive exposure to tones. Another possibility, however, is that the English-dominant group happened to include overall weaker performers. This is supported by the results on filler trials, where the Mandarin-speaker group differs from all other language groups in having numerically lower scores within the English-dominant participants than the L1-dominant participants.

9

Alternatively, it can be thought of as a set of phonological features that are active in the language. In this paper we do not make any theoretical commitment as to the exact content of subsegmental properties.

Reference List

  1. Antoniou M, Tyler MD, Best CT. Two ways to listen: Do L2-dominant bilinguals perceive stop voicing according to language mode? Journal of Phonetics. 2012;40:582–594. doi: 10.1016/j.wocn.2012.05.005.
  2. Bauer RS, Benedict PK. Modern Cantonese phonology. Berlin/New York: Mouton de Gruyter; 1997.
  3. Benus S, Smorodinsky I, Gafos A. Proceedings of the 27th Annual Penn Linguistics Colloquium. Philadelphia, PA: University of Pennsylvania; 2003. Gestural coordination and the distribution of English ‘geminates’; pp. 33–46.
  4. Best CT. Emergence of language-specific constraints in perception of non-native speech: a window on early phonological development. In: de Boysson-Bardies B, de Schonen S, Jusczyk P, MacNeilage P, Morton J, editors. Developmental neurocognition: speech and face processing in the first year of life. Dordrecht: Kluwer; 1993.
  5. Best CT. The emergence of native-language phonological influence in infants: A perceptual assimilation model. In: Goodman J, Nusbaum H, editors. The development of speech perception: The transition from speech sounds to spoken words. Cambridge: MIT Press; 1994. pp. 167–224.
  6. Best CT. A direct realist view of cross-language speech perception. In: Strange W, editor. Speech perception and linguistic experience: issues in cross-language research. Timonium, MD: York Press; 1995. pp. 171–204.
  7. Best CT, Hallé PA. Perception of initial obstruent voicing is influenced by gestural organization. Journal of Phonetics. 2010;38:109–126. doi: 10.1016/j.wocn.2009.09.001.
  8. Best CT, Strange W. Effects of phonological and phonetic factors on cross-language perception of approximants. Journal of Phonetics. 1992;20:305–330.
  9. Best CT, Tyler MD. Nonnative and second-language speech perception. In: Bohn O-S, Munro MJ, editors. Language experience in second language speech learning: in honor of James Emil Flege. Amsterdam/Philadelphia: John Benjamins; 2007. pp. 13–34.
  10. Best CT, McRoberts GW, Goodell E. Discrimination of nonnative consonant contrasts varying in perceptual assimilation to the listener’s native phonological system. Journal of the Acoustical Society of America. 2001;109:775–794. doi: 10.1121/1.1332378.
  11. Blicher DL, Diehl RL, Cohen LB. Effects of syllable duration on the perception of the Mandarin Tone 2/Tone 3 distinction: evidence of auditory enhancement. Journal of Phonetics. 1990;18:37–49.
  12. Bohn OS. Cross-language speech perception in adults: first language transfer doesn’t tell it all. In: Strange W, editor. Speech perception and linguistic experience: issues in cross-language research. Timonium, MD: York Press; 1995. pp. 275–300.
  13. Bohn OS, Best CT. Native-language phonetic and phonological influences on perception of American English approximants by Danish and German listeners. Journal of Phonetics. 2012;40:109–128.
  14. Bohn OS, Flege JE. Interlingual identification and the role of foreign language experience in L2 vowel perception. Applied Psycholinguistics. 1990;11:303–328.
  15. Bonatti LL, Peña M, Nespor M, Mehler J. Linguistic constraints on statistical computations: the role of consonants and vowels in continuous speech processing. Psychological Science. 2005;16(6):451–459. doi: 10.1111/j.0956-7976.2005.01556.x.
  16. Brown C. Unpublished doctoral dissertation. McGill University; 1997. Acquisition of segmental structure: Consequences for speech perception and second language acquisition.
  17. Brown C. The interrelation between speech perception and phonological acquisition from infant to adult. In: Archibald J, editor. Second language acquisition and linguistic theory. Malden, MA: Blackwell; 2000. pp. 4–63.
  18. Cebrian J. Experience and the use of duration in the categorization of L2 vowels. Journal of Phonetics. 2006;34:372–387.
  19. Chater N, Manning CD. Probabilistic models of language processing and acquisition. Trends in Cognitive Sciences. 2006;10(7):335–344. doi: 10.1016/j.tics.2006.05.006.
  20. Choi D-I. Korean “tense” consonants as geminates. Kansas Working Papers in Linguistics. 1995;20:25–38.
  21. Clements GN. Feature economy in sound systems. Phonology. 2003;20:287–333.
  22. Cristia A, Seidl A. Is infants’ learning of sound patterns constrained by phonological features? Language Learning and Development. 2008;4(3):203–227.
  23. Cristià A, Seidl A, Francis AL. Phonological features in infancy. In: Clements GN, Ridouane R, editors. Where do phonological features come from? Cognitive, physical and developmental bases of distinctive speech categories. Amsterdam/Philadelphia: John Benjamins; 2011. pp. 303–326.
  24. Curtin S, Byers-Heinlein K, Werker JF. Bilingual beginnings as a lens for theory development: PRIMIR in focus. Journal of Phonetics. 2011;39:492–504.
  25. de Boer B, Kuhl PK. Investigating the role of infant-directed speech with a computer model. Acoustic Research Letters Online. 2003;4(4):129–134.
  26. Dietrich C, Swingley D, Werker JF. Native language governs interpretation of salient speech sound differences at 18 months. Proceedings of the National Academy of Sciences. 2007;104:16027–16031. doi: 10.1073/pnas.0705270104.
  27. Dillon B, Dunbar E, Idsardi W. A single-stage approach to learning phonological categories: Insights from Inuktitut. Cognitive Science. 2013;37:344–377. doi: 10.1111/cogs.12008.
  28. Eimas PD. Developmental aspects of speech perception. In: Held R, Leibowitz HW, Teuber HL, editors. Handbook of sensory physiology. Vol. 8. Berlin: Springer-Verlag; 1978.
  29. Escudero P. The phonological and phonetic development of new vowel contrasts in Spanish learners of English. In: Baptista BO, Watkins MA, editors. English With a Latin Beat: Studies in Portuguese/Spanish-English Interphonology, Studies in Bilingualism. Vol. 31. Amsterdam: John Benjamins; 2006. pp. 149–161.
  30. Escudero P. Linguistic Perception of "similar" L2 sounds. In: Boersma P, Hamann S, editors. Phonology in Perception. Berlin: Mouton de Gruyter; 2009. pp. 151–190.
  31. Escudero P, Boersma P. Bridging the gap between L2 speech perception research and phonological theory. Studies in Second Language Acquisition. 2004;26:551–585.
  32. Escudero P, Benders T, Lipski SC. Native, non-native and L2 perceptual cue weighting for Dutch vowels: The case of Dutch, German, and Spanish listeners. Journal of Phonetics. 2009;37:452–465.
  33. Feldman NH, Griffiths TL, Morgan JL. Proceedings of the 31st Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society; 2009a. Learning phonetic categories by learning a lexicon; pp. 2208–2213.
  34. Feldman NH, Griffiths TL, Morgan JL. The influence of categories on perception: explaining the perceptual magnet effect as optimal statistical inference. Psychological Review. 2009b;116(4):752–782. doi: 10.1037/a0017196.
  35. Feldman NH, Griffiths TL, Goldwater S, Morgan JL. A role for the developing lexicon in phonetic category acquisition. Psychological Review. 2013;120(4):751–778. doi: 10.1037/a0034245.
  36. Flege JE. The production and perception of foreign language speech sounds. In: Winitz H, editor. Human communication and its disorders: a review. Vol. 1988. Norwood, NJ: Ablex Publishing; 1988. pp. 224–401.
  37. Flege JE. The intelligibility of English vowels spoken by British and Dutch talkers. In: Kent RD, editor. Intelligibility in speech disorders: theory, measurement, and management. Amsterdam: John Benjamins; 1992. pp. 157–232.
  38. Flege JE. Second-language speech learning: theory, findings and problems. In: Strange W, editor. Speech perception and linguistic experience: issues in cross-language research. Timonium, MD: York Press; 1995. pp. 229–273.
  39. Flege JE, Eefting W. Production and perception of English stops by native Spanish speakers. Journal of Phonetics. 1987;15:67–83.
  40. Flege JE, Bohn O-S, Jang S. Effects of experience on non-native speakers' production and perception of English vowels. Journal of Phonetics. 1997;25:437–470.
  41. Francis AL, Baldwin K, Nusbaum HC. Effects of training on attention to acoustic cues. Perception and Psychophysics. 2000;62:1668–1680. doi: 10.3758/bf03212164.
  42. Francis A, Nusbaum HC. Selective attention and the acquisition of new phonetic categories. Journal of Experimental Psychology: Human Perception and Performance. 2002;28(2):349–366. doi: 10.1037//0096-1523.28.2.349.
  43. Gerken L. Infants use rational decision criteria for choosing among models of their input. Cognition. 2010;115:362–366. doi: 10.1016/j.cognition.2010.01.006.
  44. Goldstone R. Feature distribution and biased estimation of visual displays. Journal of Experimental Psychology: Human Perception and Performance. 1993;19:564–579.
  45. Goldstone R. Influences of categorization on perceptual discrimination. Journal of Experimental Psychology: General. 1994;123:178–200. doi: 10.1037//0096-3445.123.2.178.
  46. Goto H. Auditory perception by normal Japanese adults of the sounds “L" and “R". Neuropsychologia. 1971;9:317–323. doi: 10.1016/0028-3932(71)90027-3.
  47. Goudbeek M, Cutler A, Smits R. Supervised and unsupervised learning of multidimensionally varying non-native speech categories. Speech Communication. 2008;50:109–125.
  48. Hahm H-J. The effects of following vowel on Korean fricatives. Linguistic Research. 2007;24(1):57–82.
  49. Hallé PA, Best CT, Levitt A. Phonetic vs. phonological influences on French listeners’ perception of American English approximants. Journal of Phonetics. 1999;27:281–306.
  50. Hancin-Bhatt BJ. Segment transfer: A consequence of a dynamic system. Second Language Research. 1994;10:241–269.
  51. Hayes-Harb R, Masuda K. Development of the ability to lexically encode novel second language phonemic contrasts. Second Language Research. 2008;24:5–33.
  52. Hayes R. The perception of novel phoneme contrasts in a second language: a developmental study of native speakers of English learning Japanese singleton and geminate consonant contrasts. Coyote Papers. 2002;12:28–41.
  53. Heeren WFL, Schouten MEH. Perceptual development of phoneme contrasts: how sensitivity changes along acoustic dimensions that contrast phoneme categories. Journal of the Acoustical Society of America. 2008;124(4):2291–2302. doi: 10.1121/1.2967472.
  54. Hillenbrand JM, Clark MJ, Houde RA. Some effects of duration on vowel recognition. Journal of the Acoustical Society of America. 2000;108(6):3013–3022. doi: 10.1121/1.1323463.
  55. Iverson P, Hazan V, Bannister K. Phonetic training with acoustic cue manipulations: A comparison of methods for teaching English /r/-/l/ to Japanese adults. Journal of the Acoustical Society of America. 2005;118(5):3267–3278. doi: 10.1121/1.2062307.
  56. Iverson P, Kuhl PK, Akahane-Yamada R, Diesch E, Tohkura Y, Kettermann A, Siebert C. A perceptual interference account of acquisition difficulties for non-native phonemes. Cognition. 2003;87:B47–B57. doi: 10.1016/s0010-0277(02)00198-1.
  57. Jongman A, Wang Y, Moore C, Sereno J. Perception and production of Mandarin tone. In: Li P, Tan LH, Bates E, Tzeng OJL, editors. Handbook of East Asian Psycholinguistics. 1: Chinese. Cambridge, UK: Cambridge University Press; 2006.
  58. Jusczyk P. On characterizing the development of speech perception. In: Mehler J, Fox R, editors. Neonate cognition: beyond the blooming buzzing confusion. Hillsdale, NJ: Erlbaum; 1985. pp. 199–229.
  59. Jusczyk P. Toward a model of the development of speech perception. In: Perkell J, Klatt D, editors. Invariance and variability in speech processes. Hillsdale, NJ: Erlbaum; 1986. pp. 1–19.
  60. Jusczyk P. Infant speech perception and the development of the mental lexicon. In: Goodman JC, Nusbaum HC, editors. The development of speech perception: The transition from speech sounds to spoken words. Cambridge, MA: MIT Press; 1994. pp. 227–270.
  61. Jusczyk P. The discovery of spoken language. Cambridge, MA: MIT Press; 1997.
  62. Kao DL. Structure of the syllable in Cantonese. The Hague: Mouton; 1971.
  63. Kawahara S. University of Massachusetts Occasional Papers in Linguistics 32: Papers in Optimality III. Amherst: GLSA; 2007. Sonorancy and geminacy; pp. 145–186.
  64. Keating PA. Coronal places of articulation. In: Paradis C, Prunet J-F, editors. Phonetics and phonology. The special status of coronals: internal and external evidence. San Diego: Academic Press; 1991. pp. 29–48.
  65. Kim Y-S. On non-moraic geminates. Studies in Phonetics. 2002;8(2):187–200.
  66. Kondaurova MV, Francis AL. The relationship between native allophonic experience with vowel duration and perception of the English tense/lax vowel contrast by Spanish and Russian listeners. Journal of the Acoustical Society of America. 2008;124(6):3959–3971. doi: 10.1121/1.2999341.
  67. Kondaurova MV, Francis AL. The role of selective attention in the acquisition of English tense and lax vowels by native Spanish listeners: comparison of three training methods. Journal of Phonetics. 2010;38:569–587. doi: 10.1016/j.wocn.2010.08.003.
  68. Kuhl PK. Human adults and human infants show a "perceptual magnet effect" for the prototypes of speech categories, monkeys do not. Perception and Psychophysics. 1991;50(2):93–107. doi: 10.3758/bf03212211.
  69. Kuhl PK. Psychoacoustics and speech perception: internal standards, perceptual anchors, and prototypes. In: Werner LA, Rubel EW, editors. Developmental psychoacoustics. Washington, DC: American Psychological Association; 1992. pp. 293–332.
  70. Kuhl PK. Learning and representation in speech and language. Current Opinion in Neurobiology. 1994;4(6):812–822. doi: 10.1016/0959-4388(94)90128-7.
  71. Kuhl PK. Language, mind, and brain: Experience alters perception. In: Gazzaniga MS, editor. The new cognitive neurosciences (2nd ed.) Cambridge, MA: MIT Press; 2000. pp. 99–115.
  72. Kuhl PK. Early language acquisition: cracking the speech code. Nature Reviews Neuroscience. 2004;5:831–843. doi: 10.1038/nrn1533.
  73. Kuhl PK, Iverson P. Linguistic experience and the "perceptual magnet effect". In: Strange W, editor. Speech perception and linguistic experience: issues in cross-language research. Timonium, MD: York Press; 1995. pp. 121–154.
  74. Kuhl PK, Conboy BT, Coffey-Corina S, Padden D, Rivera-Gaxiola M, Nelson T. Phonetic learning as a pathway to language: new data and native language magnet theory expanded (NLM-e). Philosophical Transactions of the Royal Society. 2008;363:979–1000. doi: 10.1098/rstb.2007.2154.
  75. Ladefoged P, Maddieson I. The sounds of the world’s languages. Oxford, UK; Cambridge, MA: Blackwell; 1996.
  76. Lee HB. Handbook of the International Phonetic Association. Cambridge: Cambridge University Press; 1999. Korean; pp. 120–123.
  77. Lim S-J, Holt L. Learning foreign sounds in an alien world: Videogame training improves non-native speech categorization. Cognitive Science. 2011;35(7):1390–1405. doi: 10.1111/j.1551-6709.2011.01192.x.
  78. Lin H. A grammar of Mandarin Chinese. Lincom Europa: München; 2001.
  79. Logan JS, Lively SE, Pisoni DB. Training Japanese listeners to identify English /r/ and /l/: a first report. Journal of the Acoustical Society of America. 1991;89(2):874–886. doi: 10.1121/1.1894649.
  80. Macmillan NA, Creelman CD. Detection theory: a user’s guide. 2nd ed. Mahwah, NJ: Lawrence Erlbaum Associates; 2005.
  81. Magen HS, Blumstein SE. Effects of speaking rate on the vowel length distinction in Korean. Journal of Phonetics. 1993;21:387–409.
  82. Maye J, Weiss D, Aslin RN. Statistical phonetic learning in infants: facilitation and feature generalization. Developmental Science. 2008;11(1):122–134. doi: 10.1111/j.1467-7687.2007.00653.x.
  83. McAllister R, Flege JE, Piske T. The influence of L1 on the acquisition of Swedish quantity by native speakers of Spanish, English, and Estonian. Journal of Phonetics. 2002;30:229–258.
  84. McClaskey CL, Pisoni DB, Carrell TD. Transfer of training of a new linguistic contrast in voicing. Perception and Psychophysics. 1983;34(4):323–330. doi: 10.3758/bf03203044.
  85. McMurray B, Aslin RN, Toscano JC. Statistical learning of phonetic categories: insights from a computational approach. Developmental Science. 2009;12(3):369–378. doi: 10.1111/j.1467-7687.2009.00822.x.
  86. Miyawaki K, Strange W, Verbrugge RR, Liberman AM, Jenkins JJ, Fujimura O. An effect of linguistic experience: The discrimination of [r] and [l] by native speakers of Japanese and English. Perception and Psychophysics. 1975;18:331–340.
  87. Mugitani R, Pons F, Fais L, Werker JF, Amano S. Perception of vowel length by Japanese- and English-learning infants. Developmental Psychology. 2008;45(1):236–247. doi: 10.1037/a0014043.
  88. Nittrouer S, Miller ME. Predicting developmental shifts in perceptual weighting schemes. Journal of the Acoustical Society of America. 1997;101:2253–2266. doi: 10.1121/1.418207.
  89. Nosofsky RM. Attention, similarity, and the identification–categorization relationship. Journal of Experimental Psychology: General. 1986;115(1):39–57. doi: 10.1037//0096-3445.115.1.39.
  90. Nowak P. The role of vowel transitions and frication noise in the perception of Polish sibilants. Journal of Phonetics. 2006;34:139–152.
  91. Nusbaum HC, Goodman J. Learning to hear speech as spoken language. In: Goodman J, Nusbaum HC, editors. The development of speech perception: The transition from speech sounds to spoken words. Cambridge, MA: MIT Press; 1994. pp. 299–338.
  92. Pajak B. Proceedings of the 35th Annual Meeting of the Berkeley Linguistics Society. Berkeley, CA: University of California, Berkeley; 2010. Contextual constraints on geminates: the case of Polish; pp. 269–280.
  93. Pajak B. Non-intervocalic geminates: typology, acoustics, perceptibility. In: Carroll L, Keffala B, Michel D, editors. San Diego Linguistics Papers. Vol. 4. San Diego, CA: University of California San Diego; 2013. pp. 2–27.
  94. Pajak B, Baković E. Assimilation, antigemination, and contingent optionality: the phonology of monoconsonantal proclitics in Polish. Natural Language and Linguistic Theory. 2010;28:643–680.
  95. Pajak B, Levy R. Proceedings of the 47th Annual Meeting of the Chicago Linguistic Society. Chicago, IL: University of Chicago; 2011a. How abstract are phonological representations? Evidence from distributional perceptual learning.
  96. Pajak B, Levy R. Phonological generalization from distributional evidence. In: Carlson L, Hölscher C, Shipley T, editors. Proceedings of the 33rd Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society; 2011b. pp. 2673–2678.
  97. Pajak B, Levy R. Distributional learning of L2 phonological categories by listeners with different language backgrounds. In: Biller AK, Chung EY, Kimball AE, editors. Proceedings of the 36th Boston University Conference on Language Development. Somerville, MA: Cascadilla Press; 2012. pp. 400–413.
  98. Pajak B, Bicknell K, Levy R. A model of generalization in distributional learning of phonetic categories. In: Demberg V, Levy R, editors. Proceedings of the 4th Workshop on Cognitive Modeling and Computational Linguistics. Sofia: Association for Computational Linguistics; 2013. pp. 11–20.
  99. Pajak B, Creel SC, Levy R. Difficulty in learning similar-sounding words: a developmental stage or a general property of learning? doi: 10.1037/xlm0000247. (in prep.). Manuscript in preparation for journal submission.
  100. Perfors A, Dunbar D. Phonetic training makes word learning easier. In: Ohlsson S, Catrambone R, editors. Proceedings of the 32nd Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society; 2010. pp. 1613–1618.
  101. Perfors A, Tenenbaum JB, Wonnacott E. Variability, negative evidence, and the acquisition of verb argument constructions. Journal of Child Language. 2010;37:607–642. doi: 10.1017/S0305000910000012.
  102. Pisoni DB, Lively SE, Logan JS. Perceptual learning of nonnative speech contrasts: Implications for theories of speech perception. In: Goodman JC, Nusbaum HC, editors. The development of speech perception: The transition from speech sounds to spoken words. Cambridge, MA: MIT Press; 1994. pp. 121–166.
  103. Polka L. Cross-language speech perception in adults: Phonemic, phonetic, and acoustic contributions. Journal of the Acoustical Society of America. 1991;89(6):2961–2977. doi: 10.1121/1.400734.
  104. Polka L. Characterizing the influence of native language experience on adult speech perception. Perception and Psychophysics. 1992;52(1):37–52. doi: 10.3758/bf03206758.
  105. Rubach J. Does the obligatory contour principle operate in Polish? Studies in the Linguistic Sciences. 1986;16:133–147.
  106. Rubach J, Booij GE. Edge of constituent effects in Polish. Natural Language and Linguistic Theory. 1990;8(3):427–463.
  107. Saffran JR, Thiessen ED. Pattern induction by infant language learners. Developmental Psychology. 2003;39(3):484–494. doi: 10.1037/0012-1649.39.3.484.
  108. Sawicka Irena. Fonologia. In: Wróbel H, editor. Gramatyka współczesnego języka polskiego. Fonetyka i fonologia. Kraków: Instytut Języka Polskiego PAN; 1995. pp. 105–195.
  109. Schouten MEH, van Hessen AJ. Modeling phoneme perception. I: Categorical perception. Journal of the Acoustical Society of America. 1992;92(4):1841–1855. doi: 10.1121/1.403841.
  110. Sohn H-M. The Korean language. Cambridge, UK: Cambridge University Press; 1999.
  111. Strange W, Dittmann S. Effects of discrimination training on the perception of /r-l/ by Japanese adults learning English. Perception and Psychophysics. 1984;36(2):131–145. doi: 10.3758/bf03202673.
  112. Strange W, Shafer VL. Speech perception in second language learners: the re-education of selective perception. In: Hansen Edwards JG, Zampini ML, editors. Phonology and second language acquisition. Amsterdam/Philadelphia: John Benjamins; 2008. pp. 153–191.
  113. Tenenbaum JB, Kemp C, Griffiths TL, Goodman N. How to grow a mind: statistics, structure, and abstraction. Science. 2011;331:1279–1285. doi: 10.1126/science.1192788.
  114. Thurgood E. The recognition of geminates in ambiguous contexts in Polish. Speech Prosody. 2002;2002:659–662.
  115. Trubetzkoy N. Principles of phonology [Grundzüge der Phonologie]. Berkeley, CA: University of California Press; 1939/1969.
  116. Tseng C-Y, Massaro DW, Cohen MM. Lexical tone perception in Mandarin Chinese: Evaluation and integration of acoustic features. In: Kao HSR, Hoosain R, editors. Linguistics, psychology, and the Chinese language. Hong Kong, China: Centre of Asian Studies, University of Hong Kong; 1986. pp. 91–104.
  117. Vallabha GK, McClelland JL, Pons F, Werker JF, Amano S. Unsupervised learning of vowel categories from infant-directed speech. Proceedings of the National Academy of Sciences. 2007;104(33):13273–13278. doi: 10.1073/pnas.0705369104.
  118. Wang X. Mandarin listeners' perception of English vowels: Problems and strategies. Canadian Acoustics. 2006;34(4):15–26.
  119. Wang X, Munro MJ. Computer-based training for learning English vowel contrasts. System. 2004;32:539–552.
  120. Werker JF. Becoming a native listener. American Scientist. 1989;77(1):54–59.
  121. Werker JF, Curtin S. PRIMIR: a developmental framework for infant speech processing. Language Learning and Development. 2005;1(2):197–234.
  122. Werker JF, Tees RC. Cross-language speech perception: evidence for perceptual reorganization during the first year of life. Infant Behavior and Development. 1984;7:49–63.
  123. Westbury JR, Hashi M, Lindstrom MJ. Differences among speakers in lingual articulation for American English /ɹ/. Speech Communication. 1998;26(3):203–226.
  124. Winn M, Blodgett A, Bauman J, Bowles A, Charters L, Rytting A, Shamoo J. Vietnamese monophthong vowel production by native speakers and American adult learners. Proceedings of Acoustics ’08. 2008:6125–6130.
  125. Wonnacott E, Newport EL, Tanenhaus MK. Acquiring and processing verb argument structure: distributional learning in a miniature language. Cognitive Psychology. 2008;56:165–209. doi: 10.1016/j.cogpsych.2007.04.002.
  126. Xu F, Tenenbaum JB. Word learning as Bayesian inference. Psychological Review. 2007;114(2):245–272. doi: 10.1037/0033-295X.114.2.245.
  127. Zajda A. Some problems of Polish pronunciation in the dictionary of Polish pronunciation. In: Karaś M, Madejowa M, editors. Słownik wymowy polskiej/the dictionary of Polish pronunciation, L–LXII. Warszawa: Państwowe Wydawnictwo Naukowe; 1977.
  128. Zhang L. Proceedings of the 17th International Congress of Phonetic Sciences. Hong Kong, China: 2011. Vowel length perception in Cantonese; pp. 2292–2295.
