Sleep and Native Language Interference Affect Non-Native Speech Sound Learning

F Sayako Earle; Emily B Myers

doi:10.1037/xhp0000113

Author manuscript; available in PMC: 2016 Dec 1.

Published in final edited form as: J Exp Psychol Hum Percept Perform. 2015 Aug 17;41(6):1680–1695. doi: 10.1037/xhp0000113


PMCID: PMC4666788 NIHMSID: NIHMS705810 PMID: 26280264

Abstract

Adults learning a new language are faced with a significant challenge: non-native speech sounds that are perceptually similar to sounds in one’s native language can be very difficult to acquire. Sleep and native language interference, two factors that may help to explain this difficulty in acquisition, are addressed in three studies. Results of Experiment 1 showed that participants trained on a non-native contrast at night improved in discrimination 24 hours after training, while those trained in the morning showed no such improvement. Experiments 2 and 3 addressed the possibility that incidental exposure to perceptually similar native language speech sounds during the day interfered with maintenance in the morning group. Taken together, results show that the ultimate success of non-native speech sound learning depends not only on the similarity of learned sounds to the native language repertoire, but also on interference from native language sounds before sleep.

Introduction

Non-native speech sounds are difficult for adults to perceptually disambiguate, particularly if these sounds are similar to sounds in the existing native language phonology (see Strange, 1995, for review). For example, the Hindi dental /d̪/ and retroflex /ɖ/ sounds are often perceived by native English speakers as variants of the English alveolar /d/ category (Werker, 1988). Previous accounts have focused on limitations in processing these sounds, suggesting that similarity to native-language perceptual or articulatory representations may prevent listeners from distinguishing novel non-native tokens from native speech sounds (Kuhl & Iverson, 1995; Flege, 1995; Best, 1995). However, little is known about difficulties that may arise due to failures in encoding learned variants into long-term memory following speech sound training. The formation of novel speech sound categories requires that listeners both encode details of these sounds in memory and abstract away from episodic details to recognize new instances of the sound (see Earle & Myers, 2014, for review). Given this, understanding the role of consolidation, that is, the memory process that facilitates these qualitative changes to the memory trace, not only contributes to accounts of speech sound learning, but also provides broader insight into the emergence of perceptual categories.

Sleep in Memory Consolidation

The contribution of sleep to memory consolidation is supported by a growing literature (see Rasch & Born, 2013, for review), but few studies have directly investigated how sleep affects perceptual learning as it relates to speech, and those that have arrive at different conclusions depending on the aspect of speech learning that is assessed (see Earle & Myers, 2014, for review; Eisner & McQueen, 2006; Fenn et al., 2003, 2013; Roth et al., 2005). In particular, some studies show no significant sleep-mediated influences on the maintenance/stability of learned phonetic information. For example, Eisner and McQueen (2006) found that shifts in category boundary to accommodate speaker idiosyncrasies emerged immediately after perceptual training, and remained stable over a post-training interval of 24 hours irrespective of when sleep occurred in relation to training. Similarly, Roth et al. (2005) found that a period of restful wake, as well as sleep, stabilized the training-induced performance gain on the identification of syllables in noise. It should be noted that the sleep group alone showed a trend toward higher performance at the delayed posttest, suggesting that a larger sample size may have yielded a statistically significant improvement as a function of sleep.

In contrast, a separate set of studies suggests that sleep facilitates the recovery of learned perceptual information, and assists in generalization to new instances. Fenn et al. (2003, 2013) trained individuals to identify synthetically generated words, a task which requires that these non-standard phonetic tokens be mapped onto the listener’s native phonology. In their case (Fenn et al., 2003), sleep exerted either a protective or restorative effect on post-training performance. In a further investigation, the variability of tokens during training determined whether sleep would promote improved performance on the trained tokens (limited token set) or facilitate generalization (expanded set of training tokens) (Fenn et al., 2013).

Thus, it appears that sleep does not ubiquitously improve performance on trained perceptual tasks when it comes to speech. Rather, sleep effects appear to be more pronounced when the task requires a reorganization of the preexisting phonological system (e.g., Fenn et al., 2003). Previous studies have tended to uncover patterns that suggest maintenance of perceptual task performance (Eisner & McQueen, 2006; Fenn et al., 2003; Roth et al., 2005) rather than overnight improvement, with one exception to our knowledge (Fenn et al., 2013). Importantly, these studies have all addressed how sleep affects perceptual adjustments made within one’s native language; thus, it is not yet clear how sleep might assist the acquisition of novel (non-native) acoustic-phonetic features.

For the formation of non-native speech sound categories, two bodies of work, the word learning and auditory skill learning literatures, suggest that sleep plays a crucial role in at least two qualitatively different ways (see Earle & Myers, 2014, for review). The collective literature on word learning shows that sleep facilitates the integration of learned verbal or orthographic forms into the existing lexicon (Bowers, Davis, & Hanley, 2005; Clay, Bowers, Davis, & Hanley, 2007; Davis et al., 2009; Dumay & Gaskell, 2007; Dumay, Gaskell, & Feng, 2004). Moreover, sleep appears to facilitate generalization to untrained items, particularly in online tasks (Tamminen, Davis, Merkx, & Rastle, 2012). Insights from this word learning literature lead to the prediction that phonetic information may undergo a similar sleep-induced change in status within the mental phonology, resulting in generalization away from the trained instances in order to recognize the contrast spoken by new talkers or in new vowel contexts. In contrast, the literature on auditory (non-speech) skill learning suggests that sleep enhances performance on tasks that assess learned skills (e.g., Atienza, Cantero, & Stickgold, 2004; Brawn, Nusbaum, & Margoliash, 2010). Therefore, sleep may also promote improved performance on perceptual tasks in which the assessment tokens are identical to those used in training.

The first of these two predictions is supported by a recent study in our lab, in which generalization of training to an untrained talker occurred after sleep, but not before (Earle & Myers, 2015). Of note, this sleep effect on talker generalization was observed only in the identification task, whereas performance on discrimination of the nonnative contrast, across trained and untrained conditions, remained stable over time. Furthermore, there was no significant improvement in identification on the trained talker, suggesting that sleep effects on performance in the identification task applied only to the generalization of training across talkers, and did not facilitate improved performance with the trained tokens.

A lack of sleep-related improvement in discrimination contradicted the prediction generated by the auditory skill learning literature. This discrepancy between our expectation and our findings motivated a more careful consideration of the demands of the phonetic identification and discrimination tasks, and in particular a consideration of how these demands recruit declarative and procedural memory systems, which are themselves differently affected by sleep (see Marshall & Born, 2007, for review).

Tasks Used to Assess Speech Perception: Differential Effects of Sleep

An individual’s performance on different perceptual tasks, such as identification and discrimination of non-native speech tokens, is often assumed to reflect the quality of common perceptual representations of the target contrast. However, within-individual performance on different perceptual tasks is often found to diverge (e.g., Earle & Myers, 2015; McKain, Best, & Strange, 1981); furthermore, it has been proposed that different sources of information contribute to task performance (e.g., Antoniou, Tyler, & Best, 2012; Antoniou, Best, & Tyler, 2013). For example, Antoniou et al. (2012) assessed a group of Greek-English bilinguals on category goodness ratings and discrimination along a voice onset time (VOT; /p/-/b/ and /d/-/t/) continuum of word-initial stops. The authors found that, while category goodness ratings given by the bilinguals were consistent with English and Greek monolinguals respective to the language mode of the target tokens, discrimination judgments aligned with the VOT boundaries common in the dominant language of the bilinguals’ linguistic environment. Similarly, Antoniou et al. (2013) found that Greek-English bilinguals’ categorization judgments on a non-native (Ma’di) contrast differed according to the language in which the instructions were given, but that language mode did not affect discrimination performance across subgroups. This set of studies suggests that performance on category goodness ratings and categorization tasks is more sensitive to language-specific phonetic knowledge than discrimination performance. We have argued similarly for the task-specific recruitment of different perceptual information following categorization training (Earle & Myers, 2014). Specifically, for the sake of generating predictions utilizing the wider memory consolidation literature, we have discussed this separation of task performance in terms of declarative and procedural knowledge.

Identification tasks, in which listeners map the acoustic input onto a visual or motoric label (such as choose ‘A’ vs. ‘B’, or click ‘left’ or ‘right’), require the explicit recall of cross-modal information. Therefore, changes to task performance across time may reflect the different stages of memory encoding in the declarative memory system (see Earle & Myers, 2014, for review). The benefit of sleep to declarative knowledge is associated with the hippocampal-cortical transfer of information thought to occur during slow-wave sleep (see Diekelmann & Born, 2010, for review; Wilson & McNaughton, 1994; Ji & Wilson, 2004), often referred to as ‘systems consolidation.’ Systems consolidation (Complementary Systems Account of Learning, McClelland, McNaughton, & O’Reilly, 1995) predicts the offline abstraction and integration of the episodic trace with preexisting information. This leads to the prediction that the effects of sleep-mediated abstraction of acoustic phonetic features from the training tokens will be more salient for tasks that directly assess declarative recall of token-label mapping. This is consistent with the sleep-mediated talker generalization effect that we observed in our previous work (Earle & Myers, 2015).

In contrast, perceptual discrimination may not require the explicit recall of category label, but is often observed to improve as a result of categorization training (McCandliss et al., 2002; Swan & Myers, 2013). We have therefore argued (Earle & Myers, 2014) that improvement on discrimination requires an implicitly acquired ability to attend selectively to the relevant acoustic-phonetic details of the signal (see Francis & Nusbaum, 2002, for an attention-based model of nonnative speech learning); in other words, training-induced changes to performance in this case may reflect procedural learning. For procedural learning, sleep effects have been more consistently observed in the improvement of an acquired skill as opposed to the generalization/abstraction of skill to new input. It has been suggested that the mechanism underlying such skill enhancement in perceptual tasks is the localized strengthening in the primary sensory cortex of selective synapses engaged during perceptual learning (Schwartz, Maquet, & Frith, 2002), which may behaviorally manifest as an increased automaticity (as might be measured by decreased reaction time or increased accuracy) in perceptual tuning (e.g., Atienza, Cantero, & Stickgold, 2004). This process is thought to reflect latent synaptic consolidation during rapid eye movement (REM) sleep, occurring as a complementary, but distinct, process to systems consolidation (see Diekelmann & Born, 2010, for review).

To reiterate, the perceptual skill that is proposed to be acquired implicitly through categorization training is the ability to attend selectively to features that disambiguate the target tokens. Whereas the effects of systems consolidation (which leads to the generalization of skill to new instances) may be limited to tasks that assess declarative recall (such as identification), synaptic consolidation might be expected to facilitate improved performance whenever the task uses familiar (trained) tokens. Therefore, sleep is predicted to facilitate improvement in discrimination, as well as identification, of the trained tokens. However, prior work failed to show any improvements in discrimination or identification on the trained tokens (Earle & Myers, 2015). Importantly, because that study was designed specifically to assess generalization to new instances, the stimulus test set included a large degree of variability. The token set used during assessment included trained and untrained vowels, and trained and untrained speakers; perhaps, as a result, the task undermined participants’ ability to retain consistent acoustic-phonetic features particular to the training tokens.

In the current investigation, two questions are examined. First, we ask whether, with reduced variability in the training and test set, sleep will facilitate improvements in both discrimination and identification of trained tokens following training (Exp 1). The current study therefore differs from the previous in two ways: the variability of the assessment tokens was reduced, and the number of assessment trials was increased. As a result, sleep is predicted to facilitate improvement on identification and discrimination of the trained tokens, but not in discrimination of the untrained tokens. Second, we ask whether exposure to similar native-like tokens may interfere with sleep-mediated improvements in consolidation (Exps 2 and 3).

Experiment 1

Changes in discrimination performance were tracked over the 24 hours following identification training on a non-native (Hindi dental vs. retroflex stop) contrast. Participants were trained in the morning or the evening, and maintenance was assessed at approximately 12-hour intervals. Based on analogy with the auditory skill learning literature, improvement in discrimination and identification was expected during the overnight interval.

Materials and Methods

Participants

Sixty-nine undergraduate students (48 female and 21 male) between the ages of 18 and 24 were recruited from the University of Connecticut community, and were given course credit in exchange for their participation. This experiment was advertised to monolingual speakers of American English only; upon enrollment, nine participants were excluded on the basis of reporting that they were bilingual, or had grown up in a multi-lingual household. Six participants did not finish the study. Data from the remaining fifty-four participants (40 female, 14 male) were processed for further analyses. Participants gave informed consent in accordance with the guidelines of the University of Connecticut IRB.

Stimuli

Five exemplars of each ‘word’ (minimal pairs /ɖug/ and /d̪ug/; /ɖig/ and /d̪ig/) were produced by an adult male native speaker of Hindi. Auditory stimuli were recorded using a digital recorder (Roland Corporation, Los Angeles, CA) in a sound-proof booth. Tokens were trimmed to the onset of the stop burst, and mean amplitude was normalized across stimuli using PRAAT (Boersma & Weenink, 2013). The same set of twenty tokens was used for all participants in the discrimination task. Participants were trained/assessed on a subset of 10 tokens (either /ɖug/ and /d̪ug/ OR /ɖig/ and /d̪ig/) for the identification task.

For the identification task, we employed two novel visual objects (‘fribbles’, Stimulus images courtesy of Michael J. Tarr, Center for the Neural Basis of Cognition and Department of Psychology, Carnegie Mellon University, http://www.tarrlab.org/), one for each word within the minimal pair on which participants were trained. Stimuli were presented such that each place of articulation (dental or retroflex) corresponded to a different fribble. The pairing between the minimal pair words and the two fribbles was counterbalanced across participants.
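The counterbalancing described here can be sketched in a few lines. This is only an illustration under stated assumptions: the labels `fribble_A` and `fribble_B`, the function name, and the rotation of participants through the four cells are hypothetical, and the trained vowel (/u/ or /i/, described under Identification Training and Test) is included as a second counterbalanced factor. The source does not say how assignments were actually made.

```python
import itertools

# Hypothetical labels; the source names neither the fribbles nor the mappings.
VOWELS = ("u", "i")
MAPPINGS = (
    {"dental": "fribble_A", "retroflex": "fribble_B"},
    {"dental": "fribble_B", "retroflex": "fribble_A"},
)


def assign_conditions(participant_ids):
    """Rotate participants through the 2 x 2 cells (trained vowel x label mapping)."""
    cells = list(itertools.product(VOWELS, MAPPINGS))
    assignments = {}
    for index, pid in enumerate(participant_ids):
        vowel, mapping = cells[index % len(cells)]
        assignments[pid] = {"trained_vowel": vowel, "label_mapping": mapping}
    return assignments


if __name__ == "__main__":
    for pid, condition in assign_conditions(range(1, 5)).items():
        print(pid, condition)
```

With a participant count that is a multiple of four, each vowel and each word-to-fribble mapping is used equally often.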

E-prime 2.0 software (Psychology Software Tools, Pittsburgh, PA) was used for stimulus presentation and recording participant response. Participants heard auditory stimuli through SONY MDR-7506 Hi-Fi digital Sound Monitor headphones, at an average listening level of 75 dB SPL (range: 44–80 dB SPL).

Task Schedule

Participants were randomly assigned to Morning and Evening groups, with Morning participants receiving training between 8–10 AM and Evening participants receiving training between 6–9 PM. Participants returned to the lab approximately 12 and 24 hours after training to assess maintenance of learning (see Figure 1). Identification (ID) of the learned contrast was assessed after training and at the two follow-up sessions. Discrimination ability (AX task) was measured at four time points: immediately before training (baseline), immediately following training during session one (posttest 1), and again in sessions two and three (posttests 2 and 3).

Figure 1. Overview of timing in the experimental protocol for Experiment 1

Identification Training and Test

Participants were trained to perceive the contrast in one vowel context: half of the sample were trained to identify /ɖug/ and /d̪ug/, and the other half were trained on /ɖig/ and /d̪ig/. During an initial familiarization sequence, each ‘fribble’ was presented in the center of the screen while the participant heard “this is a …” with the corresponding token repeated five times. The training itself consisted of 200 trials of a self-paced, forced-choice identification task with a 3-minute break after the first 100 trials. Within each trial, two fribbles remained visible on the screen while the participant heard “this is a …” followed by a dental or retroflex token. Participants indicated their choice with a mouse click, and written feedback (‘correct’ or ‘incorrect’) was given immediately following the response for every trial. During each identification posttest, 40 trials of identification without feedback were administered.

Discrimination Test

The discrimination task followed an AX design, with an inter-stimulus interval of one second between tokens. At each of the four time points, participants completed a total of 128 trials, such that half of the word pairs contained /u/ and half contained /i/. Note that for every participant, one vowel was the trained vowel, and the other was untrained, with the trained vowel counterbalanced across participants. Within each vowel set, 32 of the trials contained a pair of the ‘same’ words and 32 contained ‘different’ words. ‘Same’ trials used two acoustically distinct exemplars of /ɖ_g/ and /ɖ_g/ or /d̪_g/ and /d̪_g/ such that the measure tapped an individual’s recognition of the speech sound category rather than allowing participants to use low-level acoustic information (e.g., pitch) to discriminate tokens, and every ‘same’ trial was acoustically unique. Similarly, each ‘different’ trial contained either a unique pairing or a unique ordering of the dental and retroflex exemplars, such that no two ‘different’ trials were identical. Participants were instructed to decide if the sound at the beginning of each ‘word’ was the same type of speech sound, or belonged to different types of speech sounds. Participants completed 8 practice trials with feedback prior to each assessment.
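The constraints on the AX trial list for one vowel context can be made concrete with a small sketch. With five exemplars per word, there are 40 ordered ‘same’ pairs of distinct exemplars (20 per category) and 50 ordered ‘different’ pairs, so 32 unique trials of each type can be drawn. The token names, the random sampling, and the shuffling below are assumptions for illustration; the source does not describe how the specific pairs were selected or ordered.

```python
import itertools
import random


def build_ax_trials(vowel, n_same=32, n_diff=32, n_exemplars=5, seed=0):
    """Sketch of one vowel set: unique 'same' and 'different' AX pairs."""
    rng = random.Random(seed)
    # Hypothetical token names standing in for the recorded exemplars.
    dental = [f"dental_{vowel}g_{i}" for i in range(1, n_exemplars + 1)]
    retroflex = [f"retroflex_{vowel}g_{i}" for i in range(1, n_exemplars + 1)]

    # 'Same' trials: two distinct exemplars of one category, order counted.
    same_pool = list(itertools.permutations(dental, 2))
    same_pool += list(itertools.permutations(retroflex, 2))
    # 'Different' trials: one exemplar of each category, in either order.
    diff_pool = list(itertools.product(dental, retroflex))
    diff_pool += list(itertools.product(retroflex, dental))

    # Sampling without replacement guarantees that no trial is repeated.
    trials = [(a, x, "same") for a, x in rng.sample(same_pool, n_same)]
    trials += [(a, x, "different") for a, x in rng.sample(diff_pool, n_diff)]
    rng.shuffle(trials)
    return trials


if __name__ == "__main__":
    session = build_ax_trials("u") + build_ax_trials("i")
    print(len(session))  # 128 trials per time point
```

Because ‘same’ pairs are built from permutations of distinct exemplars, no ‘same’ trial ever presents the identical recording twice, which is what forces a category-level judgment.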

In order to ensure that only participants who were actively engaged in the task for the duration of the session were included, participants whose scores on either the identification or discrimination posttest were at or below chance (a d’ value of 0) were excluded. Data from three participants were excluded on this criterion. Data from the remaining 51 participants (n = 26 Morning, n = 25 Evening) are included in the analyses below.

Results

Preliminary analyses and data preparation

Percent accuracy in identification and discrimination was converted to d’ scores (Macmillan & Creelman, 2004). See Table 1 for mean percent accuracy and response bias. In order to rule out any pre-training differences in discrimination ability, we ran a 2×2 mixed models ANOVA on the baseline discrimination scores, with Vowel Context (trained or untrained) as the within-subjects measure and Group as the fixed factor. There were no main effects of Group or Vowel Context (F₁,₄₉ = .32, p = .425, η² = .013; F₁,₄₉ = .26, p = .610, η² = .005, respectively), and no interaction between Group and Vowel Context (F₁,₄₉ = .19, p = .665, η² < .004). This suggests that discrimination ability was comparable across Vowel Context and Group prior to training.
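As a rough illustration of the conversion to d’, the sketch below uses the basic yes/no formula, d’ = z(hit rate) − z(false alarm rate), treating a ‘different’ response on a ‘different’ trial as a hit and a ‘different’ response on a ‘same’ trial as a false alarm. Two caveats apply: the source does not state which signal detection model was applied to the AX data, and the log-linear correction used here to avoid rates of exactly 0 or 1 is an assumption, not a reported step.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Yes/no d' from raw response counts.

    The +0.5 / +1 log-linear correction is an assumption added here so that
    perfect or empty cells do not produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    # Illustrative counts only (32 'different' and 32 'same' trials).
    print(round(d_prime(24, 8, 8, 24), 2))
    # Equal hit and false alarm rates give d' = 0, i.e. chance performance.
    print(d_prime(16, 16, 16, 16))
```

A d’ of 0 corresponds to the chance-level exclusion criterion mentioned in the Discrimination Test section.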

Table 1.

Mean accuracy and response bias by Vowel Context by Group for Experiment 1

Morning Training Group
	Discrimination: Trained Vowel Context		Discrimination: Untrained Vowel Context		Identification: Trained Vowel Context
	Accuracy (% Correct)	Response Bias (% False Alarm)	Accuracy (% Correct)	Response Bias (% False Alarm)	Accuracy (% Correct)
baseline	.64 (.10)	.47 (.19)	.65 (.11)	.43 (.21)	n/a
posttest 1	.70 (.10)	.35 (.17)	.65 (.11)	.44 (.23)	.73 (.18)
posttest 2	.70 (.13)	.33 (.17)	.67 (.10)	.39 (.19)	.74 (.20)
posttest 3	.67 (.12)	.40 (.18)	.66 (.11)	.43 (.20)	.76 (.22)

Evening Training Group
	Discrimination: Trained Vowel Context		Discrimination: Untrained Vowel Context		Identification: Trained Vowel Context
	Accuracy (% Correct)	Response Bias (% False Alarm)	Accuracy (% Correct)	Response Bias (% False Alarm)	Accuracy (% Correct)
baseline	.63 (.11)	.40 (.16)	.64 (.08)	.44 (.17)	n/a
posttest 1	.68 (.11)	.38 (.20)	.66 (.11)	.44 (.22)	.75 (.16)
posttest 2	.71 (.13)	.39 (.24)	.67 (.10)	.48 (.22)	.80 (.17)
posttest 3	.72 (.12)	.39 (.24)	.66 (.11)	.47 (.20)	.80 (.17)


% False alarm is the percentage of trials incorrectly identified as ‘different’ when the tokens belong to the same category. Standard deviations of the mean are indicated in parentheses.

A baseline measure of identification performance was not obtained, because the decision over arbitrary token-label pairings would have been random prior to receiving instruction in the token-label assignments. Therefore, in order to ensure that participants performed above chance following training, a one-sample t-test was performed on the identification posttest immediately after training (ID Posttest 1). ID Posttest 1 scores differed significantly from 0 (t₅₀ = 7.13, p < .001, 95% CI: [1.65; 2.94]). Furthermore, in order to ensure that both groups achieved comparable levels of performance on the identification task, an independent samples t-test by Group on the ID Posttest 1 scores was performed. Differences in Group performance immediately following training were not statistically significant (t₄₉ = −.31, p = .757, 95% CI: [−1.54; 1.13]; confidence intervals were adjusted for family-wise error rate [FWER] using Holm-Bonferroni correction at p < 0.05). This suggests that both groups improved on the identification task as a result of training, and that the degree of improvement was comparable across groups. Learning rate, as measured by average accuracy per 50 trials during the training phase, is depicted in Figure 2a.
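For readers unfamiliar with the Holm-Bonferroni procedure, the step-down logic can be sketched as follows. This shows the standard rule applied to p-values; the source reports adjusted confidence intervals and does not give the computational details, so the function name and the example p-values below are purely hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a reject/retain decision per test, controlling FWER at alpha.

    Tests are visited from the smallest to the largest p-value; the k-th
    smallest (k = 0, 1, ...) is compared with alpha / (m - k), and the
    procedure stops at the first test that fails its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, index in enumerate(order):
        if p_values[index] <= alpha / (m - rank):
            reject[index] = True
        else:
            break  # all remaining (larger) p-values are retained
    return reject


if __name__ == "__main__":
    # Hypothetical p-values, not taken from the study.
    print(holm_bonferroni([0.01, 0.04, 0.03]))
    print(holm_bonferroni([0.001, 0.02, 0.04]))
```

The procedure is never less powerful than a plain Bonferroni correction, because only the smallest p-value faces the full alpha / m threshold.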

Identification

In order to determine if there were any changes in identification performance over the 24-hour experiment period in the absence of further training, a 2×3 mixed-model repeated measures analysis of variance (ANOVA) with Group (Morning or Evening training) as the between-subjects factor, and three levels of Time (ID Posttest 1, ID Posttest 2, ID Posttest 3) as the within-subjects factor was performed. Identification performance remained relatively stable over the 24-hour period for both groups (no main effects or interactions; Group: F₁,₄₉ = .06, p = .801, η² = .001; Time: F₂,₉₈ = 1.95, p = .148, η² = .038; Time by Group: F₂,₉₈ = .51, p = .605, η² = .010; see Figure 3).
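The effect sizes in this analysis appear consistent with partial eta squared, which can be recovered from an F ratio and its degrees of freedom as η²p = (F × df_effect) / (F × df_effect + df_error). The source does not state which variant of η² was reported, so the sketch below is a consistency check under that assumption rather than a description of the authors' computation.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # F ratios and degrees of freedom as reported for the identification ANOVA.
    reported = [
        ("Group", 0.06, 1, 49),
        ("Time", 1.95, 2, 98),
        ("Time by Group", 0.51, 2, 98),
    ]
    for label, f_value, df_effect, df_error in reported:
        print(f"{label}: {partial_eta_squared(f_value, df_effect, df_error):.3f}")
```

For the three effects above, the formula returns .001, .038, and .010, matching the reported values to three decimals.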

Training-related changes in discrimination performance

Discrimination performance improved in both groups following training, even though this task was not explicitly trained. Comparable gains across groups were confirmed via a 2×2×2 mixed models ANOVA on just the time points immediately before and after training, with Group as the between-subjects factor (Morning or Evening training), and two levels of Time (Baseline and Posttest 1) and Vowel Context (Trained/Untrained vowel: whether the vowel context was explicitly trained [Trained] or not [Untrained]) as within-subjects factors (see Figure 4). Participants in both groups improved from pretest to posttest, primarily on the Trained Vowel Context (significant main effect of Time: F₁,₄₉ = 15.50, p < .001, η² = .24; interaction between Time and Vowel Context: F₁,₄₉ = 1.30, p = .010, η² = .13). No other main effects or interactions emerged (main effect of Group: F₁,₄₉ = .59, p = .585, η² = .01; Vowel Context by Group: F₁,₄₉ = .44, p = .508, η² = .01; Time by Group: F₁,₄₉ < .01, p = .996, η² < .01; Time by Vowel Context by Group: F₁,₄₉ = 2.63, p = .111, η² = .05). The factors driving the Time by Vowel Context interaction were explored by performing two paired samples t-tests comparing Baseline and Posttest 1 scores for each Vowel Context, collapsed across Groups. For the Trained Vowel Context, the Posttest 1 score was significantly higher than at Baseline (t₅₀ = −5.68, p < .001, 95% CI: [−0.58; −0.28]), while for the Untrained Vowel Context, the difference was not statistically significant (t₅₀ = −0.11, p = .301, 95% CI: [−0.32; 0.10]). Taken together, this suggests that both groups improved in discrimination performance in the Trained Vowel Context, but not the Untrained Vowel Context, through identification training. The magnitude of gain furthermore appears to be comparable between groups.

Figure 4. Profile of changes in discrimination performance by training group and by vowel context (trained or untrained) for Experiment 1

Error bars indicate standard error of the mean. * indicates statistical significance at alpha = .05.

Sleep-mediated changes in discrimination maintenance

The influence of time of training relative to sleep on changes in discrimination ability over 24 hours was investigated using a 2×3×2 mixed models ANOVA with Group as the single between-subjects factor, and three levels of Time (Posttest 1, Posttest 2, Posttest 3) and Vowel Context as within-subjects factors. There was a significant main effect of Vowel Context (F₁,₉₈ = 5.79, p = .017, η² < .11) and a significant three-way interaction between Group, Time, and Vowel Context (F₂,₉₈ = 3.77, p = .027, η² < .07).

Visual inspection of the means suggested that this interaction likely resulted from significant differences between Groups and Time in the Trained contrast but not in the Untrained contrast. This was confirmed by performing two repeated measures ANOVAs on each Vowel Context (Trained/Untrained) separately. In the Trained Vowel condition, we observed a significant interaction between Time and Group (F₂,₉₈ = 4.52, p = .013, η² = .09), but neither Group nor Time main effects (F₁,₄₉ = .03, p = .857, η² < .01; F₂,₉₈ = .48, p = .618, η² = .01; respectively). In the Untrained Vowel condition, we observed no significant effects or interactions (Time: F₂,₉₈ = .37, p = .69, η² = .01; Group: F₁,₄₉ = .02, p = .886, η² < .01; Time by Group: F₁,₄₉ = .54, p = .466, η² = .011).

Visual inspection of the pattern within the Trained contrasts suggests that groups differ in the direction of change over time, with the Morning group losing sensitivity and the Evening group gaining sensitivity. This was confirmed by a 2×2 mixed models ANOVA with Group as the fixed factor and two levels of Time (Posttest 1 and Posttest 3) as the within-subjects factor. A significant interaction between Time and Group (F₁,₄₉ = 11.66, p = .001, η² = .09), but no main effects of Time or Group (F₁,₄₉ = .29, p = .590, η² = .01; F₁,₄₉ = .19, p = .668, η² < .01; respectively), were found. Paired samples t-tests between Posttests 1 and 3 separately by Group indicated that the Morning Group exhibited significantly lower scores at Posttest 3 than immediately after training (t₂₅ = 2.61, p = .015, 95% CI: [0.06; 0.54]). For the Evening Group, Posttest 3 scores were significantly higher than immediately after training (t₂₄ = −2.34, p = .028, 95% CI: [−0.78; −0.05]).

Discussion

Results of Experiment 1 support the view that sleep plays a role in enhancing discrimination of a trained non-native contrast. In contrast, the identification data show a gradual (non-significant) increase in mean performance per session. We reserve the discussion on the pattern of changes to identification performance over 24 hours until the discussion section of Experiment 2. Of interest, only individuals trained in the evening demonstrated significant improvement in discrimination following the overnight interval. No improvement in performance on an untrained vowel context was seen. The finding that the morning group shows no sleep-mediated improvement suggests that the effects of sleep may depend in part on the duration or quality of post-training wake state activity before sleep. While the specific effects on performance were different in their case, Fenn et al. (2003) similarly described different consequences of the overnight interval on performance for participants trained in the morning versus evening. This post-sleep discrepancy in performance between groups may reflect differences in the quality of nonnative phonetic representations that emerged overnight, though why this might be is unclear. One possibility is that differences in circadian rhythms contribute to diurnal differences in the learning that is taking place in the morning versus evening. A second possibility points to the amount of incidental exposure to native language sounds before sleep. That is, the Morning group is likely to be exposed to more English between training and sleep than the Evening group.

Several accounts of non-native speech sound learning in adulthood suggest that the presence of similar sounds in one’s native language interferes with the learning of the non-native sounds (Best, 1995; Flege, 1995). However, these accounts focus on the difficulty in distinguishing the non-native tokens from the existing representation of native speech sounds. Results of Experiment 1 raise the possibility that this difficulty may be compounded by active interference from exposure to native language tokens subsequent to training. Given that the dental and retroflex sounds perceptually resemble the English /d/ sound, exposure to alveolar /d/ may prevent the perceptual enhancement of the learned contrast overnight.

Similar interference effects have been previously reported in the procedural learning literature. For example, Walker, Brakefield, Hobson, and Stickgold (2003) trained three groups of participants on a motor (finger tapping) sequence. The first group only learned one sequence and was retested after 24 hours. The second group learned a second sequence immediately after the first, and was also retested after 24 hours. The third group learned a second sequence immediately after the first, and was retested immediately after training. While the first group showed an increase in speed and accuracy on the target (first) sequence, the second group only showed a performance increase on the second sequence. Performance immediately after training in the third group, however, indicated that the two sequences were comparably learned. Taken together, the authors concluded that, while learning the second sequence did not initially impede learning of the first, it interfered with the latent consolidation of the first. Similarly, our Morning group shows stable performance at the session 2 posttest, suggesting that the decline in discrimination performance does not occur until sleep during the overnight interval. Experiment 2 tests this interpretation directly.

Experiment 2

In order to isolate the effect of native language interference and to control for the potential confound introduced from diurnal effects, all participants were trained in the evening. The only substantial difference from Experiment 1 was that, immediately following training and Posttest 1, participants were randomly assigned to one of two interference conditions and exposed to a train of native-language syllables beginning with /d/ (D group) or /b/ (B group). We predicted that passive exposure to /d/s immediately following training would prevent sleep-mediated improvement on discrimination of the dental-retroflex contrast, whereas exposure to /b/s would not.