Abstract
Listeners rapidly adjust to talkers' pronunciations, accommodating those pronunciations into the relevant phonemic category to improve subsequent perception. Previous work has suggested that such learning is restricted to pronunciations that are representative of how the speaker talks (Kraljic, Samuel, & Brennan, 2008). If an ambiguous pronunciation, for example, can be attributed to an external source (such as a pen in the speaker's mouth), or if it is preceded by normal pronunciations of the same sound, learning is blocked.
In three experiments, we explore this blocking effect in more detail. Our aim is to better understand the nature of the representations underlying the perceptual learning process. Experiment 1 replicates the blocking effect. Experiments 2 and 3 demonstrate that it can be eliminated when certain visual information occurs simultaneously with the auditory signal. The pattern of learning and non-learning is best accounted for by the view that speech perception is mediated by episodic representations that include potentially relevant visual information.
Keywords: Perceptual accommodation, Perceptual learning, Variation, Phonemic restructuring
1. Introduction
Listeners learn talkers' pronunciations: After even brief exposure to a talker with unfamiliar pronunciations, listeners correctly recognize more words from that talker (Bradlow & Bent, 2008; Clarke, 2000, 2002; Gass & Varonis, 1994; Maye, Aslin, & Tanenhaus, 2008; Weil, 2001), and they recognize them more quickly (Clarke & Garrett, 2004).
We are beginning to understand the cognitive changes underlying such learning. As listeners hear speech, they map each acoustic token onto its intended phonemic category. But the mapping is not always straightforward: pronunciations vary in their acoustic realization, so that two tokens of the same phoneme can be quite different. As listeners hear pronunciation variants, they learn the appropriate mapping from the sound to the intended phonemic category. Recent research suggests that as listeners learn this mapping, the category itself may be restructured to reflect the talker's pronunciations, improving subsequent perception of similar sounds (e.g., Dahan, Drucker, & Scarborough, 2008; Kraljic & Samuel, 2005, 2007; Norris, McQueen, & Cutler, 2003).
An optimal perceptual system should be both stable and flexible. Two recent findings demonstrate an ability to learn pronunciation variants while maintaining stable percepts, and suggest that such stability is grounded in a system that learns selectively. Kraljic, Samuel, and Brennan (2008) demonstrated that a particular variant (a sound midway between [s], as in “see”, and [∫], as in “she”), normally learned by listeners, is not learned in at least two scenarios: First, the ambiguous pronunciation was not learned when it was preceded by normal pronunciations of the same sound (i.e., when listeners first heard a speaker pronounce words such as “dinosaur” normally, and then heard the same speaker pronounce additional words (e.g., “Tenne?ee”) with a now-ambiguous [s]). Second, the ambiguous pronunciation was not learned when it could be attributed to a temporary, incidental factor (a pen in the speaker's mouth). Thus, at least two factors can “block” a pronunciation from being learned: previous experience with normal tokens produced by the speaker, and evidence that the current pronunciations might be temporary.
To explain this blocking effect, Kraljic et al. (2008) proposed that perceptual learning is a process by which listeners build a model of the speaker, learning those speech characteristics that could improve subsequent perception. Crucially, the learning curve is steep, so that (as in certain computational or ecological models; e.g., Elman, 1993; Gebhart, Aslin, & Newport, 2009; Gibson, 1966) all else being equal, (a) initial experience with a speaker is weighted more heavily than later experience (thus blocking or greatly reducing the effect of later variation on the speaker model), and (b) variation that can be attributed to incidental factors is discarded or ignored (thus evidence of an incidental factor, e.g., a pen in the speaker's mouth, also blocks or reduces the effect of the accompanying variant). The pen provides an “excuse” for the variation, suggesting that it is incidental, and thus no adjustment is made.
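The dynamics this account assumes can be illustrated with a minimal simulation (our own sketch, not a model from any of the cited papers; the learning rate, decay constant, and acoustic scale are arbitrary): the listener's estimate of a category boundary shifts toward each informative token, early tokens carry far more weight than later ones, and tokens with an external attribution are discarded outright.

```python
# Toy sketch of a speaker model with a steep learning curve.
# Scale: 0.0 = canonical [s], 1.0 = canonical [sh], 0.5 = ambiguous [?].

def update_speaker_model(boundary, tokens, base_rate=0.5, decay=0.5):
    """tokens: list of (acoustic_value, has_external_attribution)."""
    for trial, (value, attributed) in enumerate(tokens):
        if attributed:                 # incidental variation (e.g., a pen):
            continue                   # discard it entirely
        rate = base_rate * decay ** trial       # steep learning curve
        boundary += rate * (value - boundary)   # shift toward the token
    return boundary

# Ten normal [s] tokens (0.0) followed by ten ambiguous tokens (0.5):
print(update_speaker_model(0.25, [(0.0, False)] * 10 + [(0.5, False)] * 10))
# Stays near the [s] end: early normal tokens dominate, so the later
# ambiguous tokens produce almost no adjustment (the blocking effect).
```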
The purpose of the present research is to better understand how events like a pen in a speaker's mouth affect the learning process. Does interpreting spoken variation as a consequence of the pen essentially make an ambiguous pronunciation ‘normal’? That is certainly one way to think about discarding variation: Because the pen signals that the ambiguity is incidental, the variation is stripped away, allowing the ambiguous pronunciation to access the same underlying representation that a normal pronunciation would. If so, then an ambiguous pronunciation accompanied by an attribution presented while the speaker model is being built should behave as a normal pronunciation, and block subsequent learning.
There is a quite different possibility: The attribution + pronunciation may not access the same underlying representation as the pronunciation by itself. That is, the pen does not render the variation incidental, but instead the joint event of the pen and the pronunciation does not map onto the same representation as the pronunciation itself. This alternative draws on episodic views of word recognition (e.g., Goldinger, 1998), and explains the original pen result (no perceptual learning when the pen was in the speaker's mouth during an ambiguous token) by positing that those exposure events were not relevant to the non-pen testing stimuli.
We begin by replicating the finding that hearing normal pronunciations blocks subsequent learning of ambiguous pronunciations by the same talker. In Experiment 2, the initial normal pronunciations are replaced by tokens that are acoustically ambiguous, and that are paired with videos that provide visual evidence of an external attribution (a pen). If the pen serves as a signal to ‘strip’ the variation from the sound, these ambiguous + pen pronunciations should behave as normal pronunciations do, and block subsequent learning of ambiguous pronunciations (that are not paired with a pen). If instead the exposure tokens with the pen constitute different types of episodes than ones without a pen, they should not block subsequent learning. Experiment 3 provides a complementary test: This time, the initial pronunciations that are paired with a pen are normal. The “pen as excuse” view predicts that the normal exposures should block subsequent learning because the pen is irrelevant when there is nothing to excuse. If, however, speech events with the pen are different types of episodes than those without one, these exposure tokens are not relevant to the subsequent pen-less variants, and those variants should therefore produce perceptual learning.
2. Experiment 1
The purpose of Experiment 1 is to replicate the original Kraljic et al. (2008) finding that hearing normal pronunciations blocks subsequent learning of ambiguous pronunciations of the same phoneme, by the same speaker.
2.1. Method
2.1.1. Participants
Fifty-six students at the University of Pennsylvania participated for research credit. Two participants were replaced due to technical problems during the experiment; one participant was replaced because the experimenter discovered that he was not a native English speaker. All remaining participants were 18 years of age or older and identified themselves as native English speakers with normal hearing.
2.1.2. Design
The experiment consisted of two phases: Exposure and Test. Participants were randomly assigned to one of two exposure lists, depending on whether the ambiguous sound they heard was intended to be perceived as [s] or [∫] (?S or ?SH). After exposure, all listeners were tested on their perception of several tokens along a vowel–consonant–vowel (vCv) continuum, in which the consonant ranged from [s]-like to [∫]-like, with several ambiguous versions in between.
2.1.3. Materials
2.1.3.1. Phase I – Exposure (lexical decision)
The exposure phase consisted of an auditory lexical decision task. Two experimental lists (?S and ?SH) were created, each with 100 words and 100 nonwords. The lists were identical, with two exceptions: First (and critically), for participants in the ?S condition, the ambiguous [?] occurred in 10 words that normally contained [s] (e.g., hallucinate); for those in the ?SH condition, it was in 10 words that normally contained [∫] (e.g., ambition). Second, twenty normal tokens of the other sound (e.g., [∫] for the ?S group) were spread throughout the 200 items. All 200 items were recorded by a female speaker. The words that contained [s] and [∫] were recorded twice, once with the speaker pronouncing the word with [s] and once with [∫] (e.g., she recorded both hallucinate and hallushinate). These pairs were then used to create a token in which the critical fricative was ambiguous between [s] and [∫]. Detail regarding the selection and construction of these stimuli can be found in Kraljic and Samuel (2005).
Each of the 200 audio tokens was then spliced onto video of a female speaker saying the words. During recording, the speaker sat against a blank backdrop, with a pen in her right hand. The audio stimuli were played one by one, and she repeated each item, taking care to imitate the rate of speech. Throughout, she fidgeted with the pen in her hand, sometimes placing it in her mouth as she spoke. She recorded the entire list twice, once with the pen in her mouth on 50% of the items, and the second time with it in her mouth on the other 50% of the items. The video and audio were recorded onto digital videotape and edited using Adobe Premiere®. The onset of the audio for each item was identified, and the utterance was removed and replaced with the previously created auditory stimulus.
To test whether initial experience with normal pronunciations blocks subsequent learning of ambiguous pronunciations, the exposure lists were presented in a particular order. From the listener's perspective, the exposure phase was one continuous block of 200 items. However, ten normal tokens of the critical phoneme (either [s] or [∫]) were randomly embedded among the first 100 items the listener heard; the ten ambiguous tokens of the phoneme were subsequently embedded among the second 100 items the listener heard. Importantly, although the speaker had the pen in her mouth on 50% of the trials, in Experiment 1 the speaker never had the pen in her mouth during the critical tokens (regardless of whether the pronunciation was normal or ambiguous).
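For concreteness, the list logic just described can be sketched as follows (a hypothetical reconstruction: the item counts follow the text, but the shuffling and pen-assignment scheme is our assumption, not the authors' software):

```python
import random

def build_exposure_list(fillers, normal_critical, ambiguous_critical, seed=0):
    """Experiment 1 exposure order: normal critical tokens in the first
    100 trials, ambiguous ones in the last 100, pen on half the trials."""
    assert len(fillers) == 180                    # 200 items minus 20 critical
    assert len(normal_critical) == len(ambiguous_critical) == 10
    rng = random.Random(seed)
    first_half = fillers[:90] + normal_critical   # normal tokens heard first
    second_half = fillers[90:] + ambiguous_critical
    rng.shuffle(first_half)
    rng.shuffle(second_half)
    trials = [{"item": w, "pen": False} for w in first_half + second_half]
    # Pen in mouth on 50% of trials, but never on a critical token (Exp. 1):
    critical = set(normal_critical + ambiguous_critical)
    filler_slots = [i for i, t in enumerate(trials) if t["item"] not in critical]
    for i in rng.sample(filler_slots, 100):
        trials[i]["pen"] = True
    return trials
```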
2.1.3.2. Phase II – Category identification
The second phase was identical for all listeners. Participants heard six items on a continuum that ranged from [asi] to [a∫i], in the same voice as the lexical decision items. The procedure for creating the continuum was the same as that for creating the ambiguous items in the lexical decision task: Each endpoint ([asi] and [a∫i]) was recorded, and the [s] and [∫] were mixed together in proportions varying from 20% [s]/80% [∫] to the reverse. Out of these, six mixtures were chosen that ranged from relatively [s]-like to relatively [∫]-like, with four ambiguous points in between. Presentation was strictly auditory for all participants.
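The proportional-mixing idea behind these stimuli can be sketched as below (a simplification: the real stimuli required careful temporal alignment of the two recordings, and the six proportions shown are illustrative guesses within the 20%–80% range described above):

```python
import numpy as np

def make_continuum(asi, ashi, steps=6):
    """asi, ashi: time-aligned waveforms as 1-D float arrays.
    Returns six mixtures ranging from [s]-like to [sh]-like."""
    n = min(len(asi), len(ashi))
    weights = np.linspace(0.8, 0.2, steps)   # 80% [s] down to 20% [s]
    return [w * asi[:n] + (1 - w) * ashi[:n] for w in weights]
```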
2.1.4. Procedure
Participants were randomly assigned to one of the two exposure conditions (?S or ?SH) (28 per list). They were tested individually on a laptop in a quiet room. Stimuli were viewed on the computer screen and heard over headphones. Participants responded ‘Word’ or ‘Non-word’ to each item by pressing the correspondingly labeled button on the keyboard; responses and reaction times were recorded. The instructions stressed both speed and accuracy. Participants were not told anything about the speaker's pronunciation.
After the lexical decision phase, participants categorized sounds on the [asi]–[a∫i] continuum, with the six items randomly ordered and presented 10 times. As with the lexical decision task, the stimuli were presented over headphones (there was no visual component to the categorization task) and participants responded by pressing labeled buttons on the keyboard.
2.2. Results
2.2.1. Lexical decision
Mean accuracy for the tokens with the normal pronunciations (98.9%) was slightly higher than for those with ambiguous pronunciations (96.8%), and correct responses were made more quickly to normal pronunciations as well (1882 ms versus 1965 ms), though neither of these trends reached statistical significance (accuracy: F1(1,110) = 3.25, p = .074; F2(1,38) = 3.6, p = .065; RT: F1(1,110) = 2.93, p = .09; F2(1,38) = 0.48, p = .49). Thus, the tokens containing ambiguous pronunciations were rapidly and consistently accepted as words, but they did result in slightly greater comprehension difficulty than normal pronunciations of the same sounds.
2.2.2. Category identification
For each participant, the average percentage of test syllables identified as SH was calculated. The continuum was designed so that, at minimum, participants should be able to distinguish [s] from [∫] at the two endpoints. This provides a check that participants are actually doing the task (as opposed to hitting one button repeatedly, for example). To the extent that ambiguous pronunciations were learned (i.e., accommodated by the intended category), listeners exposed to [?] in SH-words should have identified more syllables as SH than listeners exposed to [?] in S-words. But this did not happen. Instead, as Fig. 1 illustrates, listeners in the ?S group categorized virtually the same percentage of items as SH (54.1%) as listeners in the ?SH group did (54.5%), t(54) = 0.08, p = .93. This replicates Kraljic et al.'s (2008) observation that learning is blocked when normal tokens precede ambiguous ones. In fact, across several projects, with a total N approaching 200, shifts under these conditions have averaged 2–3%, and have never approached significance. In contrast, in over a dozen experiments using these stimuli without initial exposure to good tokens, shifts are consistently between 10% and 15%, and significant.
Fig. 1.
Listeners who hear normal pronunciations followed by ambiguous ones do not show perceptual learning of the ambiguous tokens.
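The group comparison reported above can be expressed compactly (a minimal sketch of the analysis, not the authors' code; variable names are ours):

```python
from scipy import stats

def percent_sh(responses):
    """responses: list of 'S' / 'SH' labels from one participant
    (60 responses: 6 continuum steps x 10 repetitions)."""
    return 100.0 * sum(r == "SH" for r in responses) / len(responses)

def compare_exposure_groups(qs_participants, qsh_participants):
    """Each argument: list of per-participant response lists."""
    qs = [percent_sh(p) for p in qs_participants]    # exposed to [?] in S-words
    qsh = [percent_sh(p) for p in qsh_participants]  # exposed to [?] in SH-words
    return stats.ttest_ind(qsh, qs)  # learning predicts qsh mean > qs mean
```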
3. Experiment 2
Experiment 1 replicated the finding that normal pronunciations block learning of subsequent ambiguous pronunciations. Kraljic et al. (2008) also reported that learning was blocked when ambiguous pronunciations were presented with a pen in the speaker's mouth. If the ambiguous + pen pronunciations are not learned because the pen causes the acoustic variation (the ambiguity) to be discarded, then the percept essentially becomes that of a normal token. If these tokens are truly transformed to normal, then ambiguous + pen pronunciations should have the same effect as normal pronunciations: When they precede ambiguous pronunciations (with no pen), they should block learning. However, if the ambiguous + pen token is treated as a different type of episode than episodes without a pen, then exposure to ambiguous + pen tokens should be irrelevant to the subsequent ambiguous items, which should therefore produce their usual learning effect. Experiment 2 tests these competing predictions.
3.1. Method
3.1.1. Participants
One hundred and four students at the University of California at San Diego participated for research credit. All were 18 years of age or older and identified themselves as native English speakers with normal hearing.
3.1.2. Design
Experiment 2 included the standard two phases: Exposure and Test. Participants were randomly assigned to one of four conditions at exposure, depending on whether the ambiguous pronunciation was intended to be [s] or [∫] (?S or ?SH), and whether the second half of the exposure phase remained audiovisual or switched to purely audio (AV or AudioOnly, respectively). After exposure, all listeners categorized tokens from the same vCv continuum as in Experiment 1.
3.1.3. Materials
3.1.3.1. Phase I – Exposure (lexical decision)
The two experimental lists (?S and ?SH) were identical to Experiment 1, with two exceptions: First, the 10 normal tokens that were embedded among the first 100 items were replaced with 10 ambiguous tokens (of the same words). Second, the video accompanying these 10 new ambiguous tokens showed the speaker with a pen in her mouth as she spoke these words (10 of the initial filler items were replaced with video in which the speaker did not have a pen in her mouth, to keep the 50/50 ratio intact). The 10 ambiguous tokens within the final 100 exposure trials remained the same as in Experiment 1 and were not accompanied by a pen.
For half of the participants (those in the AudioOnly condition), there was one additional change: The video was turned off for the second half of the exposure phase (the last 100 items, which include the ambiguous tokens with no pen). Thus, AudioOnly participants heard the same audio, in the same order, as AV participants, but the presentation of the second 10 ambiguous items was purely auditory. From an attribution perspective, this tests whether participants assume that the tokens still have an external attribution (there is no visible attribution, but also no evidence that anything has changed about the speaker).
3.1.3.2. Phase II – Category identification
The second phase of the experiment was identical to Experiment 1. Presentation was strictly auditory for all participants.
3.1.4. Procedure
Participants were randomly assigned to one of the four exposure conditions (?S or ?SH × AudioOnly or AV) (26 per group). Testing was identical to Experiment 1.
3.2. Results
3.2.1. Lexical decision
Two participants were replaced due to poor performance (>3 standard deviations from the mean for accuracy or for reaction time). Across the four Exposure conditions, mean accuracy for the normally pronounced tokens was slightly higher (97.5%) than for ambiguous tokens (96.2%; F1(1,103) = 4.41, p = .038; F2(1,39) = xx, p = xx), and correct responses were made more quickly as well (1437 ms versus 1516 ms; F1(1,103) = 17.35, p < .001; F2(1,79) = 1.55, p = .22). As in Experiment 1, the ambiguous tokens were rapidly and consistently accepted as words (see Table 1).
Table 1.
Experiment 2. Mean accuracy and reaction times (for correct responses) for critical words pronounced with a normal fricative and for those pronounced with an ambiguous fricative. Standard deviations are in parentheses.
| Exposure block | Normal: % Correct | Normal: RT (ms) | [?]: % Correct | [?]: RT (ms) |
|---|---|---|---|---|
| AV 1st half, AV 2nd half | | | | |
| 1st half (AV) | 99.0% (2.9) | 1635 (168) | 95.0% (7.7) | 1627 (231) |
| 2nd half (AV) | 98.5% (3.6) | 1567 (162) | 98.1% (4.4) | 1722 (564) |
| AV 1st half, AudioOnly 2nd half | | | | |
| 1st half (AV) | 98.3% (5.1) | 1643 (219) | 95.4% (7.3) | 1733 (225) |
| 2nd half (AudioOnly) | 94.0% (5.7) | 916 (120) | 96.5% (5.9) | 983 (160) |
3.2.2. Category identification
For each participant, the average percentage of test syllables identified as SH was calculated. Five participants heard all the syllables as S while seven heard them all as SH; these participants were replaced. Importantly, all of the participants who heard only S and five of those who heard only SH were responding in the direction that was consistent with their exposure condition (i.e., ?S or ?SH). Thus, including these participants would have strengthened the following effect, not weakened it.
If ambiguous tokens were accommodated by the intended category, listeners exposed to [?] in SH-words should have identified more syllables as containing SH than listeners exposed to [?] in S-words. This is precisely the behavior observed: Participants in Experiment 2 showed a strong main effect of exposure sound (F(1,100) = 25.51, p < .001), demonstrating that their categories shifted to accommodate the ambiguous pronunciation. As Fig. 2 suggests, this effect did not interact with the presentation modality of the final 100 exposure items (F(1,100) = .21, p = .65): Learning occurred to the same extent when the second block continued to be audiovisual (a 15.9% shift; F(1,50) = 14.1, p < .001), as when it switched to AudioOnly (a 13.4% shift; F(1,50) = 11.38, p < .002). Thus, ambiguous pronunciations paired with a pen do not produce the same blocking effect that normal tokens do. These results are inconsistent with the attribution view of the pen's role. Instead, they indicate that a percept without a pen in the speaker's mouth is encoded as a different type of episode than a percept with the pen.
Fig. 2.
Experiment 2. Participants' phonemic categories shifted to accommodate the ambiguous pronunciation. Learning occurred to the same extent when the modality of the second block remained the same (AudioVisual, left panel) and when it changed (Audio Only, right panel).
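For readers who want the analysis logic explicit, the 2 × 2 between-subjects test reported above amounts to the following (our reconstruction, not the authors' script; the data-frame layout and column names are assumptions):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

def exposure_anova(df: pd.DataFrame) -> pd.DataFrame:
    """df: one row per participant, with columns pct_sh (percent SH
    responses), sound ('?S' or '?SH'), and modality ('AV' or 'AudioOnly')."""
    model = ols("pct_sh ~ C(sound) * C(modality)", data=df).fit()
    # The reported pattern is a main effect of sound with no
    # sound-by-modality interaction: learning in both groups.
    return sm.stats.anova_lm(model, typ=2)
```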
4. Experiment 3
The results from Experiment 2 show that the pen in the mouth does not serve to make the pronunciation ‘normal’. Instead, the results suggest that seeing the pen changes the encoding of the accompanying auditory information; the visual event is not extraneous information that merely signals that the accompanying acoustic ambiguity should be discounted.
Experiment 3 pursues this perspective by testing what happens when normal pronunciations are paired with the pen. If the pen only provides an “excuse” for a poor pronunciation, it should be irrelevant here because there is no acoustic ambiguity; the normal tokens should block learning of subsequent ambiguous tokens, as the (acoustically identical) normal tokens in Experiment 1 did. If, however, the pen changes the speech percept so that it is associated with an altogether different representation, the normal tokens paired with a pen should no longer block learning.
4.1. Method
4.1.1. Participants
Thirty-eight students at the University of Pennsylvania participated for research credit. All were 18 years of age or older and identified themselves as native English speakers with normal hearing.
4.1.2. Materials
The materials were identical to the audiovisual stimuli in Experiment 1, with a single exception: The video accompanying the critical normal tokens during the first half of the exposure phase now showed the speaker with a pen in her mouth as she spoke these tokens. The second half remained audiovisual, and the 10 ambiguous tokens in the second half were never accompanied by a pen in the speaker's mouth, as in Experiment 1.
4.1.3. Procedure
The procedure was identical to Experiment 1: Listeners were randomly assigned to either ?S or ?SH lists as they performed a lexical decision task. Afterwards, all participants categorized items on the (purely auditory) [asi]–[a∫i] continuum.
4.2. Results
4.2.1. Lexical decision
Mean accuracy for the normal tokens (98.9%) was slightly higher than for ambiguous ones (96.4%), F1(1,36) = 5.48, p = .025; F2(1,36) = 1.42, p = .24. There was no difference in how quickly listeners were able to correctly judge the two types of tokens (mean RT was 1962 ms for normal tokens, 1922 ms for ambiguous tokens; F1(1,37) = 1.37, p = .25; F2(1,38) = 1.85, p = .18).
4.2.2. Category identification
Normal tokens paired with video of a pen in the speaker's mouth do not prevent learning of subsequent ambiguous tokens (see Fig. 3). In contrast to Experiment 1, participants now showed robust evidence of learning the ambiguous pronunciations: Participants exposed to [?] in S-words categorized fewer items as SH than did participants who heard [?] in SH-words (44.5% versus 56.2%, t(36) = 2.17, p = .036).
Fig. 3.
When listeners hear normal pronunciations paired with a pen, subsequent ambiguous pronunciations are accommodated by their phonemic categories.
5. General discussion
These findings shed light on how listeners adapt to speakers' pronunciations. In previous work (Kraljic et al., 2008) we demonstrated that phonemic categories are restructured to accommodate ambiguous pronunciations that appear to be representative of how the speaker talks. We found two important limitations on restructuring: If an ambiguous pronunciation can be attributed to an external source (such as a pen in the speaker's mouth), or if it is preceded by standard pronunciations of the same sound, no restructuring occurs. We took these results as support for the view that listeners build models of talkers, discarding or ignoring acoustic deviations that can be attributed to sources external to the talker.
The current results provide further evidence that hearing initial normal tokens inoculates against adjustments triggered by subsequent variants, but they are problematic for the attribution perspective. In Experiment 2, seeing a pen in the speaker's mouth did not lead to the variation being stripped away, as the attribution view would suggest. Perhaps more telling, in Experiment 3, normal pronunciations paired with a pen failed to block learning of subsequent ambiguous pronunciations, indicating that the pen cannot simply be providing an excuse for variation, as there was no variation to excuse.
These results call for a quite different view of the representations that underlie perceptual learning. Perceptual learning is a process by which phonemic representations are restructured to accommodate experience, thereby improving subsequent perception. The assumption (by us and by others) has been that each time a listener hears a pronunciation, he or she accesses the associated phonemic representation, modifies it, and then later, when hearing a similar stimulus, accesses the same representation. Within this conception, we hypothesized that an external factor (such as a pen) could serve to normalize the sound: the same representation is accessed, but it is not modified because the pen causes the variation to be ignored or discarded. This approach can account for our original finding that no perceptual learning occurred when the deviant tokens were accompanied by the pen in the speaker's mouth.
However, the current results demonstrate that the pen must have a different role. Instead, we suggest that pairing a pronunciation with the pen results in an entirely different percept, one that does not access the same underlying representation. Simply put, speech events with a pen are different episodes than ones without the pen. This perspective is consistent with the results of Experiments 2 and 3, as well as the original pen-in-mouth finding (Kraljic et al., 2008). This framing is grounded in theories in which perceptual representations are more episodic and contextually-sensitive (e.g., Goldinger, 1998), and dovetails well with studies that demonstrate sensitivity to very specific phonetic information even within a phonemic category (Clayards, Tanenhaus, Aslin, & Jacobs, 2008). Other recent work in our laboratory (Samuel & Kraljic, submitted for publication) shows that blocking of perceptual learning depends on the visual identity of the speaker (with identical acoustic input), and also is best explained by the view that perceptual adjustments are made to multi-modal representations that are organized by their episodic properties. We believe that with this representational approach, a learning model based on shared patterns rather than abstractions (e.g., adaptive resonance theory (ART); see Carpenter & Grossberg, 2003) can best account for the pattern of learning (and non-learning) found in perceptual learning experiments.
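The episodic account can be made concrete with a toy exemplar model in the spirit of Goldinger (1998) (our own sketch, not a published implementation; the similarity kernel, context weight, and numeric values are all arbitrary): each exposure episode stores acoustic detail together with its visual context, and a test token is categorized by similarity-weighted votes over stored episodes, so that pen-in-mouth episodes contribute little to the categorization of pen-less test tokens.

```python
import math

def prob_sh(test_value, test_context, episodes, w_context=2.0):
    """episodes: list of (acoustic_value, context, label); returns P('SH').
    Context mismatch ('pen' vs 'none') lowers similarity, so episodes
    from a different context carry little weight."""
    votes = {"S": 0.0, "SH": 0.0}
    for value, context, label in episodes:
        dist = abs(test_value - value) + w_context * (context != test_context)
        votes[label] += math.exp(-5.0 * dist)   # nearer episodes count more
    return votes["SH"] / (votes["S"] + votes["SH"])

# Ambiguous tokens (0.5) heard as SH-words with a pen in the mouth are poor
# matches for a pen-less ambiguous test token, so they shift categorization
# far less than the same tokens heard without a pen:
pen = [(0.5, "pen", "SH"), (0.2, "none", "S")] * 10
no_pen = [(0.5, "none", "SH"), (0.2, "none", "S")] * 10
print(prob_sh(0.5, "none", pen), prob_sh(0.5, "none", no_pen))
```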
Acknowledgments
This material is based upon work supported by NIH Grant No. MH R01-051663, by NIH Grant No. HD 059787, NSF Grant No. 0325188, and by NIH postdoctoral training grant No. F32 HD052342. We thank Donna Kat for her invaluable assistance, and the anonymous reviewers for their very constructive suggestions.
References
- Bradlow AR, Bent T. Perceptual adaptation to non-native speech. Cognition. 2008;106:707–729. doi: 10.1016/j.cognition.2007.04.005.
- Carpenter GA, Grossberg S. Adaptive resonance theory. In: Arbib MA, editor. The handbook of brain theory and neural networks. 2nd ed. Cambridge, MA: MIT Press; 2003. pp. 87–90.
- Clarke CM. Perceptual adjustments to foreign-accented English. Research on Spoken Language Processing Progress Report No. 24. Indiana University; 2000.
- Clarke CM. Perceptual adjustment to foreign-accented English with short-term exposure. ICSLP-2002. 2002:253–256.
- Clarke CM, Garrett MF. Rapid adaptation to foreign-accented English. Journal of the Acoustical Society of America. 2004;116(6):3647–3658. doi: 10.1121/1.1815131.
- Clayards M, Tanenhaus MK, Aslin RN, Jacobs RA. Perception of speech reflects optimal use of probabilistic speech cues. Cognition. 2008;108(3):804–809. doi: 10.1016/j.cognition.2008.04.004.
- Dahan D, Drucker SJ, Scarborough RA. Talker adaptation in speech perception: Adjusting the signal or the representations? Cognition. 2008;108:710–718. doi: 10.1016/j.cognition.2008.06.003.
- Elman JL. Learning and development in neural networks: The importance of starting small. Cognition. 1993;48:71–99. doi: 10.1016/0010-0277(93)90058-4.
- Gass S, Varonis E. Input, interaction, and second language production. Studies in Second Language Acquisition. 1994;16(3):283–302.
- Gebhart AL, Aslin RN, Newport EL. Changing structures in midstream: Learning along the statistical garden path. Cognitive Science. 2009;33(6):1087–1116. doi: 10.1111/j.1551-6709.2009.01041.x.
- Gibson JJ. The problem of temporal order in stimulation and perception. Journal of Psychology. 1966;62:141–149. doi: 10.1080/00223980.1966.10543777.
- Goldinger SD. Echoes of echoes? An episodic theory of lexical access. Psychological Review. 1998;105(2):251–279. doi: 10.1037/0033-295x.105.2.251.
- Kraljic T, Samuel AG. Perceptual learning for speech: Is there a return to normal? Cognitive Psychology. 2005;51:141–178. doi: 10.1016/j.cogpsych.2005.05.001.
- Kraljic T, Samuel AG. Perceptual adjustments to multiple speakers. Journal of Memory and Language. 2007;56:1–15.
- Kraljic T, Samuel AG, Brennan SE. First impressions and last resorts: How listeners adjust to speaker variability. Psychological Science. 2008;19(4):332–338. doi: 10.1111/j.1467-9280.2008.02090.x.
- Maye J, Aslin R, Tanenhaus MK. The weckud wetch of the West: Rapid adaptation to a novel accent. Cognitive Science. 2008;32:543–562. doi: 10.1080/03640210802035357.
- Norris D, McQueen JM, Cutler A. Perceptual learning in speech. Cognitive Psychology. 2003;47:204–238. doi: 10.1016/s0010-0285(03)00006-9.
- Samuel AG, Kraljic T. Talking heads: Visual identity can dominate processing of spoken words. Submitted for publication.
- Weil SA. Foreign accented speech: Adaptation and generalization. Unpublished master's thesis. The Ohio State University; 2001.