INTRODUCTION
Auditory training has been shown to improve cochlear implant (CI) users’ speech understanding in quiet (e.g., Fu et al., 2005; Stacey et al., 2010). Unfortunately, CI users rarely experience quiet, optimal listening conditions in everyday life. Background noise is problematic for CI users (e.g., Fu et al., 1998; Skinner et al., 1994), especially dynamic noise (e.g., Fu and Nogaki, 2005; Nelson et al., 2003). The relatively coarse spectral resolution of the CI is not sufficient to segregate speech from noise, or one talker from another. While training in quiet may direct listeners’ attention to subtle auditory cues, it is unclear whether listeners can make use of these cues in noisy listening conditions. Indeed, there are few reports regarding the benefit of auditory training for speech understanding in noise for CI users. A pilot study by Fu and Galvin (2008) compared two training protocols (recognition of monosyllabic words or of sentences in noise) in terms of CI users’ speech understanding in noise. While recognition performance improved with both protocols, the sentence-in-noise training provided the greater improvement in both test measures (phoneme and sentence recognition) and both types of noise (speech babble and steady noise).
Training outcomes for CI users and normal-hearing (NH) listeners attending to CI simulations can be affected by training frequency (Nogaki et al., 2007), as well as by the training protocol and materials (Davis et al., 2005; Fu et al., 2005; Li and Fu, 2007; Loebach and Pisoni, 2007; Stacey and Summerfield, 2007, 2008; Stacey et al., 2010). The time course and degree of adaptation can depend on the listening task or condition, and can differ across individuals (Fu et al., 2005). Nogaki et al. (2007) found that, in a CI simulation study with NH listeners, the frequency of training sessions mattered less than the total amount of training completed. Fu et al. (2005) and Stacey and Summerfield (2007) suggested that multi-talker training materials may be more effective than single-talker training stimuli. Stacey and Summerfield (2008) found that, in a CI simulation study with NH listeners, words and sentences were more effective training stimuli than phonemes. In other CI simulation studies with NH listeners, training outcomes were better with lexically meaningful training material and feedback than without (Davis et al., 2005; Li and Fu, 2007). While these factors may affect training outcomes for speech in quiet, it is unclear whether or how they affect speech understanding in noise. Most of these training studies focused listeners’ attention on acoustic differences between stimuli. As such, these training methods might have targeted peripheral, “bottom-up” processes. While bottom-up processes are certainly important, central “top-down” processes contribute strongly to speech understanding in noise. It is possible that top-down processes should be targeted to improve CI users’ performance in noise.
Auditory training for CI users has been receiving increased attention. Much research has been aimed at finding the most effective and efficient training methods. Computer-based auditory training has been used in CI studies (e.g., Fu et al., 2004, 2005; Galvin et al., 2007; Stacey et al., 2010) and in CI simulation studies with NH listeners (e.g., Loebach and Pisoni, 2008; Nogaki et al., 2007; Stacey and Summerfield, 2007, 2008). Computer-based auditory training allows for large numbers of training stimuli and listening tasks, with unlimited training time. Ideally, training for relatively short periods with a limited number of stimuli and a simple listening task can generalize to a larger number of stimuli and listening conditions. Some previous CI training studies have shown such generalizations. Fu et al. (2005) found that closed-set phonetic contrast training with monosyllabic words generalized to improved vowel, consonant and sentence recognition. Galvin et al. (2007) found that closed-set melodic contour identification (MCI) training generalized to improved MCI performance for untrained pitch ranges and improved familiar melody recognition. Loebach and Pisoni (2008) found that, in a CI simulation study with NH listeners, environmental sound recognition training significantly improved speech recognition; however, speech recognition training did not improve environmental sound recognition. All of the above training benefits and generalizations were observed for optimal listening environments (e.g., clean speech in quiet); it is unclear whether they might occur for noisy listening conditions. It is possible that a simple training task with simple stimuli (i.e., minimal acoustic variability) may target top-down processes that are useful in developing listening strategies to compensate for noise.
The aims of this study were: (1) To determine whether CI users’ speech recognition in noise would improve with auditory training, and (2) To determine whether training with familiar stimuli, in a closed-set task, would yield improvements in the more difficult task of open-set speech recognition in noise. In this study, ten CI users were trained to identify digits presented in competing speech babble using a closed-set listening task. Speech recognition was measured for digits, HINT sentences (Nilsson et al., 1994), and IEEE sentences (1969) presented in two types of noise (steady-state, speech-shaped noise and multi-talker speech babble). “Pre-training” baseline performance for all speech tests was repeatedly measured for a minimum of four test sessions, or until performance asymptoted. Participants were then trained only to identify digits in babble using custom training software loaded onto their home computers or loaner laptops. Participants trained for ~30 minutes per day, five days per week for four weeks, for a total of approximately 10 hours of training. During training, the signal-to-noise ratio (SNR) was adjusted from trial to trial according to the correctness of the listener’s response. Auditory and visual feedback was provided. Recognition of digits, HINT sentences, and IEEE sentences in steady noise and babble was re-measured two weeks and four weeks after training was begun (“post-training measures”), as well as four weeks after training was stopped (“follow-up measures”). We hypothesized that the digit-in-babble training would significantly improve speech understanding in both types of noise, and for all speech measures.
MATERIALS AND METHODS
Participants
Ten post-lingually deafened adult CI users participated in this experiment. Relevant demographic information is shown in Table 1. All participants were native speakers of American English. Seven participants had previous experience in speech recognition experiments; three participants (S3, S7 and S9) had no prior experience with CI research. All provided informed consent before participating (in compliance with the local Institutional Review Board protocol), and all were reimbursed for their time and expenses associated with testing in the lab and training at home (equivalent to approximately $15.00/hour).
Table 1.
CI participant demographics. HA = hearing aid. ADRO = Adaptive Dynamic Range Optimization. ASC = Auto Sensitivity Control.
| Subject | Gender | Age | Etiology | Age at HL Onset | Duration CI use (years) | Device | Processor | Pre-Processing |
|---|---|---|---|---|---|---|---|---|
| S1 | F | 62 | Ototoxic Drugs | 28 | 10 / 6 | Nucleus-24 (L) / Nucleus-24 (R) | Freedom / Freedom | ADRO & Whisper |
| S2 | F | 75 | Unknown | 37 | 8 | Nucleus-24 (R), HA (L) | Freedom | ADRO & ASC |
| S3 | M | 67 | Unknown | 15 | 5 | Nucleus-24 (R), HA (L) | Freedom | ADRO |
| S4 | F | 65 | Genetic | 4 | 5 | Nucleus-24 (R) | Freedom | ADRO |
| S5 | F | 70 | Genetic | 5 | 7 | Clarion-CII (L), HA (R) | CII BTE | N.A. |
| S6 | F | 75 | Otosclerosis | ~20 | 20 | Nucleus-22 (L) | Freedom | ADRO |
| S7 | F | 46 | Unknown | 39 | 2 | Nucleus Freedom (R) | Freedom | ADRO |
| S8 | M | 78 | Noise | 40 | 14 | Nucleus-22 (L) | Freedom | ADRO & ASC |
| S9 | M | 69 | Otosclerosis | 25 | 1 | Nucleus Freedom (L), HA (R) | Freedom | ADRO |
| S10 | M | 57 | Unknown | 37 | 15 / 1 | Nucleus-22 (L) / Nucleus Freedom (R) | Freedom / Freedom | ADRO |
Participants were tested using their clinical speech processors, set to their preferred “everyday” program and volume control settings. If a participant normally wore a hearing aid (HA) in combination with the CI, then testing and training were performed with the CI and HA set to the everyday settings. Participants were instructed to use the same CI (and HA) settings for all testing and training sessions. Participants with the Nucleus Freedom processor used preprocessing technology with their everyday programs for all testing and training. Some research (Gifford and Revit, 2010; Müller-Deile et al., 2008; Wolfe et al., 2009) has shown that preprocessing strategies may improve speech perception in noise. Note that while the preprocessing may have influenced absolute performance, we were primarily interested in the change in performance after training for each subject.
General Testing and Training Timeline
Because of the high variability in performance among CI users, which might overwhelm any training benefits, a “within-subject” control procedure (instead of an experimental control group) was used, with each participant serving as their own experimental control. The within-subject control procedure required extensive baseline performance measures (see below) before training was begun. The within-subject control procedure also allowed procedural learning effects to be estimated independently for each participant. Baseline performance was repeatedly measured in the lab once per week for four weeks, or until achieving asymptotic performance. For each subject, asymptotic performance was determined using a one-way analysis of variance (ANOVA), with test session as a factor. Data from the digit and HINT sentence recognition tests in both types of noise were pooled within the ANOVA to obtain a broad estimate of asymptotic performance. Post-hoc Bonferroni pair-wise comparisons were performed to determine the test session at which performance began to asymptote. Data from this test session forward were used to calculate baseline performance within each test and noise condition.
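The asymptote criterion described above can be sketched in code. The paper does not specify the exact decision rule, so the following is one plausible reading, with a hypothetical function name; independent-samples t-tests are used for simplicity, although the actual data were within-subject:

```python
from itertools import combinations
from scipy import stats

def find_asymptote_session(scores_by_session, alpha=0.05):
    """Sketch of the asymptote criterion: a one-way ANOVA over test
    sessions, followed by Bonferroni-corrected pairwise comparisons.
    The earliest session that no longer differs from any later session
    is treated as the start of asymptotic performance.

    `scores_by_session` is a list of per-session score lists (pooled
    digit and HINT results, as in the study).
    """
    f_stat, p = stats.f_oneway(*scores_by_session)
    if p >= alpha:
        return 0, p                       # no session effect: use all sessions
    n_sessions = len(scores_by_session)
    n_pairs = len(list(combinations(range(n_sessions), 2)))
    corrected_alpha = alpha / n_pairs     # Bonferroni correction
    for start in range(n_sessions):
        if all(stats.ttest_ind(scores_by_session[start],
                               scores_by_session[j]).pvalue >= corrected_alpha
               for j in range(start + 1, n_sessions)):
            return start, p
    return n_sessions - 1, p
```

Data from the returned session onward would then define the baseline for each test and noise condition.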
After completing baseline “pre-training” measures, participants trained at home on their personal computers for approximately 30 minutes per day, five days a week, for four weeks. “Post-training” performance for all speech tests and noise types was measured after completing the second and fourth week of training. Training was stopped after the fourth week, and participants returned to the lab one month later for “follow-up” performance measures for all speech tests and noise types. Post-training and follow-up performance was compared to baseline performance for each subject.
Test Materials and Methods
Speech performance was measured in steady speech-shaped noise (1000-Hz cutoff frequency, −12 dB/octave) and in six-talker speech babble. The signal-to-noise ratio (SNR) was calculated according to the long-term RMS of the speech and noise stimuli. Pre-training, post-training and follow-up performance was assessed using three speech measures: digits in noise, HINT sentences in noise, and IEEE sentences in noise.
For digit-in-noise testing, speech recognition thresholds (SRTs; Plomp and Mimpen, 1979) were measured using a closed-set, adaptive (1-up/1-down) procedure converging on the SNR that produced 50% correct (Levitt, 1971). SRTs were defined as the SNR needed to produce 50% correct recognition of a three-digit sequence. Stimuli included the digits “zero” through “nine” produced by a single male talker. During testing, three digits were randomly selected and presented in sequence (e.g., “three-zero-eight”) in the presence of background noise. The listener responded by clicking on response boxes (labeled “zero” through “nine”) shown on the computer screen or by typing the numbers on the keyboard. Listeners were allowed to repeat the stimulus up to three times. If the listener correctly identified the entire sequence, the SNR was reduced by 2 dB; if the entire sequence was not correctly identified, the SNR was increased by 2 dB. Each test run consisted of 25 trials; all reversals in SNR were used to calculate the SRT. Three runs each in steady noise and in speech babble were measured at each test session.
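The 1-up/1-down track and the reversal-based SRT computation can be sketched as follows. This is a minimal simulation, not the authors’ actual test software; `respond_correctly` is a hypothetical stand-in for the listener:

```python
def run_adaptive_srt(respond_correctly, start_snr=10.0, step_db=2.0, n_trials=25):
    """Simulate a 1-up/1-down adaptive track converging on 50% correct.

    `respond_correctly` is a callable taking the current SNR (dB) and
    returning True if the whole three-digit sequence was identified.
    Returns the SRT, defined (as in the digit test) as the mean SNR
    across all reversals in the track.
    """
    snr = start_snr
    direction = None              # +1 = SNR rising, -1 = SNR falling
    reversal_snrs = []
    for _ in range(n_trials):
        correct = respond_correctly(snr)
        new_direction = -1 if correct else +1   # harder after a hit, easier after a miss
        if direction is not None and new_direction != direction:
            reversal_snrs.append(snr)           # track changed direction: a reversal
        direction = new_direction
        snr += step_db * new_direction
    if not reversal_snrs:
        return None
    return sum(reversal_snrs) / len(reversal_snrs)

# Example with an idealized deterministic listener whose threshold is 0 dB SNR:
srt = run_adaptive_srt(lambda snr: snr >= 0.0, start_snr=10.0)
```

With a real listener the responses are probabilistic, and the track oscillates around the SNR yielding 50% correct.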
For HINT sentences in noise, SRTs were measured using an open-set, adaptive (1-up/1-down) procedure (Levitt, 1971). HINT stimuli included 260 sentences produced by a single male talker, of easy to moderate difficulty in terms of vocabulary and syntactic complexity (e.g., “Strawberry jam is sweet.”). During each test run, a sentence was randomly selected (without replacement) from the 260-sentence stimulus set and presented in noise. Listeners were asked to repeat what they heard as accurately as possible. If the entire sentence was repeated correctly, the SNR was reduced by 2 dB; if not, the SNR was increased by 2 dB. On average, 15 sentences were presented within each test run, and the mean of the final six reversals in SNR was recorded as the SRT; a minimum of 8 reversals in SNR was required for a valid test run. Results for HINT recognition in noise were reported in terms of the SRT, i.e., the SNR required to produce 50% correct whole sentence recognition. Three runs in steady noise and speech babble were measured at each test session. Note that sentences were not repeated within a test run. However, it is possible that some sentences were repeated across runs.
IEEE sentence recognition was measured at two SNRs that corresponded to moderately and extremely difficult listening conditions. The SNRs were determined for each subject according to performance at the first test session. The HINT SRT in steady noise was used to estimate the moderately difficult SNR (i.e., the SNR that would produce approximately 50% IEEE sentence recognition in steady noise). Typically, the moderately difficult SNR for IEEE sentences was 2 to 3 dB higher than the HINT SRT. The extremely difficult SNR was 5 dB less than the moderately difficult SNR. Thus, if a subject’s HINT SRT was 8 dB, IEEE sentence recognition was measured at +10 and +5 dB SNR. IEEE stimuli included 720 sentences divided into 72 lists (10 sentences per list), produced by one male and one female talker. IEEE sentences are much more difficult than HINT sentences in terms of semantic and syntactic complexity and word predictability (e.g., “Plead to the council to free the poor thief.”). During testing, a list was randomly chosen (without replacement) and a sentence was randomly chosen (without replacement) from that list and presented at the target SNR. Listeners were asked to repeat back what they heard as accurately as possible. Results were reported in terms of the percentage of correctly identified words. Within a test session, performance was measured independently for each talker, noise type, and SNR. The test order for the different talker, noise type and SNR conditions was randomized across test sessions and across subjects. In each test session, performance for each condition was evaluated using one sentence list. Thus, no IEEE sentences were repeated within or across test runs, as a different list was chosen for each condition at each test session.
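The per-subject SNR selection reduces to simple arithmetic; a sketch with hypothetical function and parameter names (the 2-dB offset below matches the worked example, though the paper reports a typical range of 2 to 3 dB):

```python
def ieee_test_snrs(hint_srt_db, offset_db=2.0, spread_db=5.0):
    """Pick per-subject IEEE test SNRs from the HINT SRT in steady noise.

    The moderately difficult SNR sits a few dB above the HINT SRT, and
    the extremely difficult SNR is 5 dB below the moderate one.
    """
    moderate = hint_srt_db + offset_db
    difficult = moderate - spread_db
    return moderate, difficult

# A subject with a HINT SRT of 8 dB would be tested at +10 and +5 dB SNR:
moderate, difficult = ieee_test_snrs(8.0)
```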
Both HINT and IEEE sentences were used because they provide somewhat different measures of performance in noise. HINT SRTs (corresponding to 50% correct whole sentence recognition in noise) were adaptively measured, i.e., the SNR was adjusted from trial to trial according to the correctness of the response, thereby avoiding floor and ceiling performance effects. IEEE sentence recognition was measured at two SNRs (moderate and difficult) that were fixed for each subject. While the target SNRs were selected to accommodate different performance levels across subjects, floor and ceiling effects may have influenced performance in some cases.
Testing was conducted in sound field in a sound-treated booth (IAC). Combined speech and noise were delivered via a single loudspeaker (Tannoy Reveal) at a fixed output (65 dBA) to avoid severe peak-clipping by the CI speech processor. The target SNR was calculated according to the long-term RMS of the speech and noise; the level of the combined speech and noise was then adjusted to achieve the target output (65 dBA). Listeners were seated directly facing the loudspeaker. No training or trial-by-trial feedback was provided during test sessions. The same performance measures (digit recognition, HINT sentences, IEEE sentences), noise types (steady-state, speech babble) and test methods were used for all test sessions (pre-training, post-training, and follow-up).
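The two-step level logic (mix speech and noise at the target SNR based on long-term RMS, then scale the mixture to a fixed presentation level) might look as follows. This is a sketch with hypothetical function names, and it matches a target digital RMS rather than the 65 dBA acoustic calibration used in the booth:

```python
import numpy as np

def rms(x):
    """Long-term root-mean-square level of a signal."""
    return np.sqrt(np.mean(np.square(x)))

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the long-term RMS speech-to-noise ratio equals
    `snr_db`, then return the speech + noise mixture."""
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20.0))
    return speech + gain * noise

def normalize_to_rms(signal, target_rms):
    """Scale the combined signal to a fixed overall level (a stand-in for
    calibrating the loudspeaker output to 65 dBA)."""
    return signal * (target_rms / rms(signal))
```

Fixing the overall output while varying only the SNR keeps the mixture within the processor’s input dynamic range, avoiding the peak-clipping noted above.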
Training Materials and Methods
After completing pre-training baseline measures, training was begun. Training was conducted at home, using participants’ personal computers or loaner laptops loaded with custom training software (Sound Express). Participants listened to stimuli played back via computer speakers. Participants were extensively trained on how to install and use the software, how to set up the computer speakers, and how to set the listening level. Participants were also instructed to immediately contact the experimenters if there was any problem with the home training. The software allowed for remote monitoring of training frequency and performance; it logged each participant’s training sessions, including performance and the total time spent training. Participants were instructed to train using the same CI (and HA) settings used for testing. Participants were asked to train for 30 minutes/day, 5 days/week, for 4 weeks, for a total of 10 training hours.
The same stimuli used for digit-in-noise testing were used for digit-in-noise training. However, participants trained only in competing speech babble; note that subjects were tested in both speech babble and steady noise. The training was computer-administered. As in testing, participants were trained using an adaptive (1-up/1-down) SRT procedure. In each trial, three randomly selected digits were presented in sequence in the presence of competing speech babble; participants were allowed to repeat each stimulus presentation up to three times. Participants responded by clicking on numbers shown on the computer screen or by typing the numbers on the keyboard. If the participant answered correctly, visual feedback was provided, a new three-digit sequence was presented, and the SNR was reduced by 2 dB. If the participant answered incorrectly, audio and visual feedback was provided, in which the incorrect response and the correct response were played back and repeated; the next digit sequence was then presented and the SNR was increased by 2 dB. Each training exercise consisted of 25 three-digit sequences. As in the digit-in-noise testing, the SRT for digit-in-noise training was calculated across all reversals in SNR at the end of each run.
RESULTS
All participants completed the specified training; the total time spent training ranged from 583 to 767 minutes, with a mean of 647 minutes (~10.8 hours). After training, performance improved for all subjects on some or all test measures; however, there was considerable variability in the amount of improvement.
Figure 1 shows individual and mean pre- and post-training (4 weeks) SRTs for digit recognition. In steady noise, mean SRTs significantly improved from −2.2 to −5.0 dB after training (t9=3.66, p=0.005). Post-training SRTs worsened by 0.6 dB for subject S10, but improved for subjects S1, S2, S3, S4, S5, S6, S7, S8 and S9 (range=0.5–6.6 dB). In speech babble, mean SRTs significantly improved from 2.7 to −1.3 dB after training (t9=7.50, p<0.001). Post-training SRTs improved for all ten subjects (range=1.9–7.5 dB).
Figure 1.
Speech reception thresholds (SRTs) for digit recognition in speech-shaped steady noise (top panel) and multi-talker speech babble (bottom panel). Mean data (across trials) is shown for each participant for pre-training performance and performance after four weeks of training. Mean data (across participants) is shown at the far right. The error bars show one standard deviation of the mean.
Figure 2 shows individual and mean pre- and post-training (4 weeks) SRTs for HINT sentence recognition. In steady noise, mean SRTs improved from 6.6 to 5.6 dB after training; however, the improvement was not significant (t9=1.83, p=0.093). Post-training SRTs worsened for subjects S4, S6 and S8, (range=0.5–1.4 dB), improved for subjects S1, S2, S3, S7, S9 and S10, (range=0.6–4.5 dB) and were unchanged for subject S5. In speech babble, mean SRTs significantly improved from 11.2 to 8.3 dB after training (t9=4.037, p=0.003). Post-training SRTs worsened by 0.5 dB for subject S8 and improved for subjects S1, S2, S3, S4, S5, S6, S7, S9 and S10 (range=1.4–7.6 dB).
Figure 2.
Similar to Fig. 1, but for HINT sentence recognition in speech-shaped steady noise (top panel) and multi-talker speech babble (bottom panel).
Figure 3 shows individual and mean pre- and post-training (4 weeks) IEEE sentence recognition scores at the moderate SNR. In steady noise, mean performance significantly improved from 58.5% to 66.0% correct after training (t9=−2.68, p=0.025). Post-training performance worsened by 10.1 points for subject S4, improved for subjects S1, S2, S6, S7, S8, S9 and S10 (range=6.1–23.5 points), and was unchanged for subjects S3 and S5. In speech babble, mean performance significantly improved from 32.5% to 41.7% correct after training (t9=−3.12, p=0.012). Post-training performance worsened for subjects S2 and S9 (range=5.0–5.8 points), improved for subjects S1, S3, S6, S7, S8 and S10 (range=11.4–26.0 points), and was unchanged for subjects S4 and S5.
Figure 3.
Similar to Fig. 1, but for IEEE sentence recognition in speech-shaped steady noise (top panel) and multi-talker speech babble (bottom panel). Data is shown for the moderate SNR condition.
Figure 4 shows individual and mean pre- and post-training (4 weeks) IEEE sentence recognition scores at the difficult SNR. In steady noise, mean performance significantly improved from 36.0% to 44.7% correct after training (t9=−2.54, p=0.032). Post-training performance worsened by 8.8 points for subject S9, improved for subjects S1, S2, S4, S5, S6, S8 and S10 (range=5.9–19.8 points), and was unchanged for subjects S3 and S7. In speech babble, mean performance improved from 14.3% to 17.7% correct after training; however, the improvement was not significant (t9=−1.88, p=0.093). Because of potential floor performance effects, the scores for the difficult SNR were also transformed into rationalized arcsine units (Studebaker, 1985). Even after this transformation, there was no significant difference between pre- and post-training performance (t9=−2.051, p=0.071). Post-training performance worsened by 6.3 points for subject S7, improved for subjects S3, S5, S6, S8, S9 and S10 (range=3.8–13.2 points), and was unchanged for subjects S1, S2 and S4.
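The rationalized arcsine transform (Studebaker, 1985) applied to the floor-limited scores can be written directly from its standard definition; a minimal sketch:

```python
import math

def rationalized_arcsine(num_correct, num_total):
    """Rationalized arcsine units (RAU; Studebaker, 1985).

    Maps a proportion-correct score onto a scale (roughly -23 to +123)
    on which variance is approximately uniform, which stabilizes
    comparisons near floor (and ceiling) performance.
    """
    theta = (math.asin(math.sqrt(num_correct / (num_total + 1.0)))
             + math.asin(math.sqrt((num_correct + 1.0) / (num_total + 1.0))))
    return (146.0 / math.pi) * theta - 23.0
```

Near 50% correct the transform is nearly linear (50/100 maps to about 50 RAU), while scores near 0% or 100% are stretched, which is why it is applied when floor effects are suspected.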
Figure 4.
Similar to Fig. 3, but for the difficult SNR condition.
Figure 5 shows mean performance (across participants) for digit recognition in noise (top left), HINT sentence recognition in noise (top right), IEEE sentence recognition at the moderate SNR (bottom left), and IEEE sentence recognition at the difficult SNR (bottom right), as a function of noise type. Mean data are shown for pre-training, 2 weeks post-training, 4 weeks post-training and follow-up measures.
Figure 5.
Mean performance (across participants) for digits recognition in noise (top left panel), HINT sentence recognition in noise (top right panel), IEEE sentence recognition at the moderate SNR (bottom left panel), and IEEE sentence recognition at the difficult SNR (bottom right panel), as a function of noise type. Mean performance is shown for pre-training baseline, two weeks post-training, four weeks post-training, and follow-up measures (four weeks after training was stopped). The error bars show one standard error of the mean.
A two-way repeated measures analysis of variance (RM ANOVA), with test session (pre-training, 2 weeks post-training, 4 weeks post-training, follow-up) and noise type (steady-state or speech babble) as factors, was performed for each of the speech measures shown in Fig. 5. The results are shown in Table 2. For all measures, performance in steady noise was significantly better than that in speech babble, as would be expected. Post-training performance (four weeks after training was begun) and follow-up performance (four weeks after training was stopped) were significantly better than pre-training performance for all test measures. There was no significant difference between post-training and follow-up measures, suggesting that the training benefits were largely retained one month after training was stopped.
Table 2.
Results of two-way RM ANOVAs performed for each speech measure. Factor levels for “training” included: Pre (pre-training), T2 (2 weeks post-training), T4 (4 weeks post-training) and F (follow-up 4 weeks after training was stopped). Factor levels for “noise” included: SS (steady-state speech-shaped) and babble (multi-talker speech babble). The far right column shows significant differences (p<0.05) between levels for post-hoc Bonferroni pair-wise comparisons.
| Measure | Factor | df, res | F-ratio | p-value | Bonferroni (p<0.05) |
|---|---|---|---|---|---|
| Digits | Training (Pre, T2, T4, F) | 3, 27 | 17.122 | <0.001 | T2, T4, F > Pre |
| | Noise (SS, babble) | 1, 27 | 86.476 | <0.001 | SS > babble |
| | Training × Noise | 3, 27 | 1.874 | 0.158 | |
| HINT | Training (Pre, T2, T4, F) | 3, 27 | 8.518 | <0.001 | T4, F > Pre |
| | Noise (SS, babble) | 1, 27 | 121.869 | <0.001 | SS > babble |
| | Training × Noise | 3, 27 | 2.697 | 0.066 | |
| IEEE moderate SNR | Training (Pre, T2, T4, F) | 3, 27 | 9.243 | <0.001 | T4 > Pre; F > Pre, T2 |
| | Noise (SS, babble) | 1, 27 | 143.519 | <0.001 | SS > babble |
| | Training × Noise | 3, 27 | 1.012 | 0.403 | |
| IEEE difficult SNR | Training (Pre, T2, T4, F) | 3, 27 | 16.602 | <0.001 | T2, T4 > Pre; F > Pre, T2 |
| | Noise (SS, babble) | 1, 27 | 101.515 | <0.001 | SS > babble |
| | Training × Noise | 3, 27 | 1.334 | 0.284 | |
Data were also analyzed to see whether there was a difference in post-training performance gains across test and noise conditions. Two-way RM ANOVAs were performed on the performance difference data (4 weeks post-train – pre-training baseline), with test type (Digit or HINT; IEEE moderate or difficult SNR) and noise type (steady-state or speech babble) as factors. The two-way ANOVAs were performed independently for the SRT measures (difference in SNR) and the IEEE measures (difference in percent correct). The results are shown in Table 3. There was no significant difference in post-training performance gains between the digit recognition in noise or HINT sentence recognition in noise tests, or between the IEEE sentence recognition in noise at the moderate or difficult SNRs. There was a significant difference in post-training performance gains between noise types only for the HINT sentence recognition in noise, with a greater improvement observed in babble than in steady noise.
Table 3.
Results of two-way RM ANOVAs performed on the post-training performance gain data (i.e., the difference in performance between 4 weeks post training and pre-training baseline). Because of the differences in performance measures between the adaptive SRT (dB SNR) and the fixed SNR tests (percent correct), ANOVAs were performed separately for the two groups of tests. For both tests, factor levels for “noise” included: SS (steady-state) and babble (multi-talker speech babble). Factor levels for “test” included: digits (digit-in-noise SRT) and HINT (HINT sentence-in-noise SRT). The far right column shows significant differences (p<0.05) between levels for post-hoc Bonferroni pair-wise comparisons.
| Comparison | Factor | df, res | F-ratio | p-value | Bonferroni (p<0.05) |
|---|---|---|---|---|---|
| T4 - Pre (SRT) | Test (digits, HINT) | 1, 9 | 2.980 | 0.118 | |
| | Noise (SS, babble) | 1, 9 | 12.510 | 0.006 | HINT: babble > SS |
| | Test × Noise | 1, 9 | 0.510 | 0.494 | |
| T4 - Pre (IEEE) | Test (moderate, difficult SNR) | 1, 9 | 0.342 | 0.573 | |
| | Noise (SS, babble) | 1, 9 | 1.154 | 0.311 | |
| | Test × Noise | 1, 9 | 1.216 | 0.299 | |
The digit-in-babble training was very similar to the digit-in-babble testing: the task was exactly the same, the stimuli were exactly the same, and SRTs were calculated in exactly the same way (the mean of all reversals in SNR across all 25 trials). The digit-in-babble training differed from the digit-in-babble testing only in terms of the trial-by-trial feedback and the listening environment. Participants trained at home on their personal computers or on loaner laptops; the sound quality and potential for distraction may have varied greatly among subjects. In contrast, all subjects were tested in the laboratory in a sound-treated room using high-quality audio equipment; thus, the sound quality and test environment were the same for all participants. Participants’ digit-in-babble SRTs were compared between training and testing; the results are shown in Figure 6. Testing SRTs were highly correlated with training SRTs, suggesting that improvements in the home training environment were reflected in the more controlled laboratory testing environment. In general, the digit-in-babble SRT measurements were robust to different listening conditions.
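The training-versus-testing comparison in Fig. 6 is a simple linear regression; a sketch using hypothetical illustrative SRT values (not the study’s data):

```python
from scipy import stats

# Hypothetical paired SRTs (dB SNR): one lab-test value and one home-training
# value per participant/session, mirroring the comparison in Fig. 6.
lab_srts  = [2.5, 1.0, -0.5, 3.2, 0.4, -1.3, 2.1, 0.8]
home_srts = [2.8, 1.4, -0.2, 3.0, 0.9, -1.0, 2.5, 0.6]

# Regress training-environment SRTs on test-environment SRTs; a slope near 1
# and a high r^2 indicate the home measure tracks the lab measure.
result = stats.linregress(lab_srts, home_srts)
print(f"slope={result.slope:.2f}, r^2={result.rvalue**2:.2f}, p={result.pvalue:.4f}")
```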
Figure 6.
Individual participants’ digit-in-babble SRTs (measured at home during training, with feedback) as a function of digit-in-babble SRTs (tested in the lab, no feedback). Performance is shown for pre-training baseline (black circles), 2 weeks post-train (gray triangles), and 4 weeks post train (white squares). The solid diagonal line shows unity performance. The dashed diagonal line shows a linear regression fit across all data; r2 value and p-value for the regression are shown at lower right.
DISCUSSION
The present results demonstrate that auditory training in noise can improve CI users’ speech perception in noise. Training with the closed-set digit identification task using simple single-talker digit stimuli significantly improved speech understanding in noise for the untrained listening tasks (i.e., single-talker HINT sentence recognition in noise, multi-talker IEEE sentence recognition in noise). Training with the more difficult noise type (speech babble) also improved performance with the untrained noise type (steady speech-shaped noise). Most importantly, performance gains were largely retained one month after training was stopped.
While mean performance significantly improved with training, the amount of improvement varied across participants. Some participants (e.g., S2) benefited most strongly in the trained task (digit recognition in noise), others (e.g., S1) benefited in all speech recognition tasks, and still others (e.g., S8) showed little difference in performance after training for any of the measures. Nevertheless, most participants’ performance improved after training, for nearly all speech measures. All participants had at least one year of experience with their CI, and the majority had at least five years of implant experience. Thus, participants had a long “passive learning” experience with their device. Longitudinal studies have shown that the greatest improvements in speech recognition (typically measured in quiet) occur within the first 3–9 months post-implantation, but that performance may continue to improve up to 5 years post-implantation (Spivak and Waltzman, 1990; Loeb and Kessler, 1995; Tyler et al., 1997). Prior to enrolling in this study, all participants reported that while they were pleased with their CI performance, background noise remained especially problematic, even after many years of implant use. The present results suggest that even experienced CI users can improve performance in noise with auditory training.
In this study, the mean age of participants was 66 years, and 8 of the 10 participants were older than 60 years. Background noise is especially difficult for older hearing-impaired listeners (Frisina and Frisina, 1997; Gordon-Salant and Fitzgibbons, 1995). In a CI simulation study with NH listeners, Schvartz et al. (2008) found significantly better phoneme recognition performance with younger than with older participants. Other studies have shown lower performance on some speech measures for elderly CI users, relative to younger CI users (Chatelin et al., 2004; Friedland et al., 2010; Vermeire et al., 2005). Given these deficits, auditory training may be especially important for older individuals. In this study, nearly all subjects improved performance on nearly all speech measures after the digit-in-noise training. Auditory training, especially for older adults, should not be overlooked as an important component of auditory rehabilitation.
While the digit-in-noise training may have been effective, what did participants learn? Fu and Galvin (2008) presented pilot data for training in noise using a phonetic contrast protocol or a keyword-in-sentence protocol. They found that both a “bottom-up” approach (targeting acoustic contrasts with phoneme recognition training) and a “top-down” approach (targeting contextual cues and source streaming available with sentence training) were effective for improving speech understanding in noise, although the sentence training seemed to provide a greater overall benefit. The present digit-in-noise training included aspects of each of these approaches. There may have been some bottom-up learning (a sensory component) in discriminating phonemic contrasts (e.g., the acoustic differences between the consonants in “five” and “nine”). The closed-set task and limited number of stimuli may have aided top-down processing, in which participants developed strategies for segregating the target from the noise. Because the digits were speech stimuli, central pattern processing (i.e., top-down processing) most likely played a very strong role.
Alternatively, participants’ general attention may have improved following training. Amitay et al. (2006) found that NH listeners’ frequency discrimination improved after training on an unrelated visual task, or after training with identical stimuli, suggesting that some benefits observed with auditory training may be due not to improved auditory processing, but to better overall attention. More recently, Moore et al. (2010) found that auditory training improved adult listeners’ ability to keep three stimuli in short-term memory in a 3AFC psychophysical task. In this study, we did not include a separate control group performing an unrelated training task, which might have shed light on the contribution of attention to the observed learning effects. Instead, we used a within-subject control that allowed relative improvements with training to be observed, regardless of pre-training baseline performance. There was a wide range (0.4–12.5 dB) in pre-training baseline HINT SRTs in steady noise; it would have been difficult to establish a comparable control group, and the range in baseline performance might have overwhelmed post-training changes in performance.
It is also difficult to completely rule out long-term procedural learning due to exposure to the stimuli, test environment, and methods. The extensive baseline testing was performed to reduce these procedural learning effects. However, because of the repeated testing (a minimum of seven test sessions throughout this study), there was some risk of familiarization and learning effects. This may be especially true for the HINT SRT measures, in which some sentences may have been repeated across test runs and sessions. However, increased familiarity with the test materials would tend to sharpen the slope of the performance-intensity function, such that the noise would provide either complete interference or none at all. Given the relatively low difficulty of the HINT sentences, familiarity with the materials was most likely a minor issue. In general, better experimental controls would help to clarify auditory learning and demonstrate any benefits of auditory training.
Anecdotal reports suggest that participants benefited from the training outside of the lab. One participant reported better awareness of background noise after training; she no longer “tuned out” when presented with background noise and instead tried to “listen beyond the noise” in more situations. Another participant commented that she was surprised how well she performed in seemingly large amounts of noise. A third participant commented that she used to be intimidated when listening to numbers over the telephone (an auditory-only listening task not unlike the present digit recognition testing); the digit recognition training had improved her confidence. CI users may gain confidence if they see that they are making progress with auditory training. Indeed, some participants commented that the auditory training was motivating, as they were able to see their progress over time. Motivation is certainly an important factor in training outcomes. The participants in this study were highly motivated (hence their willingness to participate in CI research), which may have influenced the training outcomes. With effective training tools and resources, CI users may receive the necessary “motivational boost” to get the most benefit from their implant.
Based on this study and previous work, most CI users would seem to benefit from structured auditory training, in terms of both improved performance and increased confidence. In this study, subjects improved regardless of pre-training performance levels. Speech performance improved with training for both elderly and younger CI users who had many years of experience with their device. Future research will determine whether auditory training can accelerate adaptation for newly implanted CI patients. In this study, there was great inter-subject variability in post-training performance gains. Future research may help to identify the CI users who may benefit most from training, as well as to select training tasks tailored to individual CI users and skill levels.
CONCLUSIONS
The results of this study demonstrate that auditory training with the digit-in-noise task significantly improved CI users’ ability to recognize digits in the presence of both steady noise and speech babble. These training benefits generalized to improved open-set HINT and IEEE sentence recognition in both steady noise and babble. The results suggest that auditory training with familiar stimuli in a simple, closed-set task may improve CI users’ speech understanding in difficult listening conditions.
Acknowledgments
We thank all of the CI users who participated in this study. We also thank three anonymous reviewers for their helpful comments. This work was supported by NIH-NIDCD grant 5R01DC004792-07.
References
- Amitay S, Irwin A, Moore DR. Discrimination learning induced by training with identical stimuli. Nat Neurosci. 2006;9:1446–1448. doi: 10.1038/nn1787.
- Chatelin V, Kim EJ, Driscoll C, et al. Cochlear implant outcomes in the elderly. Otol Neurotol. 2004;25:298–301. doi: 10.1097/00129492-200405000-00017.
- Davis MH, Hervais-Adelman A, Taylor K, et al. Lexical information drives perceptual learning of distorted speech: Evidence from the comprehension of noise-vocoded sentences. J Exp Psychol Gen. 2005;134:222–241. doi: 10.1037/0096-3445.134.2.222.
- Friedland DR, Runge-Samuelson C, Baig H, et al. Case-control analysis of cochlear implant performance in elderly patients. Arch Otolaryngol Head Neck Surg. 2010;136:432–438. doi: 10.1001/archoto.2010.57.
- Frisina DR, Frisina RD. Speech recognition in noise and presbycusis: relations to possible neural mechanisms. Hear Res. 1997;106:95–104. doi: 10.1016/s0378-5955(97)00006-3.
- Fu QJ, Galvin JJ III. Maximizing cochlear implant patients’ performance with advanced speech training procedures. Hear Res. 2008;242:198–208. doi: 10.1016/j.heares.2007.11.010.
- Fu QJ, Nogaki G. Noise susceptibility of cochlear implant users: the role of spectral resolution and smearing. J Assoc Res Otolaryngol. 2005;6:19–27. doi: 10.1007/s10162-004-5024-3.
- Fu QJ, Galvin JJ III, Wang X, et al. Effects of auditory training on adult cochlear implant patients: a preliminary report. Cochlear Implants Int. 2004;5:84–90. doi: 10.1179/cim.2004.5.Supplement-1.84.
- Fu QJ, Galvin JJ III, Wang X, et al. Moderate auditory training can improve speech performance of adult cochlear implant users. Acoust Res Lett Online. 2005;6:106–111.
- Fu QJ, Shannon RV, Wang X. Effects of noise and number of channels on vowel and consonant recognition: acoustic and electric hearing. J Acoust Soc Am. 1998;104:3586–3596. doi: 10.1121/1.423941.
- Galvin JJ III, Fu QJ, Nogaki G. Melodic contour identification by cochlear implant listeners. Ear Hear. 2007;28:302–319. doi: 10.1097/01.aud.0000261689.35445.20.
- Gifford RH, Revit LJ. Speech perception for adult cochlear implant recipients in a realistic background noise: effectiveness of preprocessing strategies and external options for improving speech recognition in noise. J Am Acad Audiol. 2010;21:441–451. doi: 10.3766/jaaa.21.7.3.
- Gordon-Salant S, Fitzgibbons PJ. Recognition of multiply degraded speech by young and elderly listeners. J Speech Hear Res. 1995;38:1150–1156. doi: 10.1044/jshr.3805.1150.
- IEEE Subcommittee. IEEE recommended practice for speech quality measurements. IEEE Trans Audio Electroacoust. 1969;AU-17:225–246.
- Levitt H. Transformed up-down methods in psychoacoustics. J Acoust Soc Am. 1971;49:467–476.
- Li TH, Fu QJ. Perceptual adaptation to spectrally shifted vowels: the effects of lexical and non-lexical labeling. J Assoc Res Otolaryngol. 2007;8:32–41. doi: 10.1007/s10162-006-0059-2.
- Loeb GE, Kessler DK. Speech recognition performance over time with the Clarion cochlear prosthesis. Ann Otol Rhinol Laryngol Suppl. 1995;166:290–292.
- Loebach JL, Pisoni DB. Perceptual learning of spectrally degraded speech and environmental sounds. J Acoust Soc Am. 2008;123:1126–1139. doi: 10.1121/1.2823453.
- Moore D, Amitay S, Halliday L. Changes in accuracy with interval order during auditory development and learning. Abstracts of the Association for Research in Otolaryngology 33rd Midwinter Meeting; 2010. p. 304.
- Müller-Deile J, Kiefer J, Wyss J, et al. Performance benefits for adults using a cochlear implant with adaptive dynamic range optimization (ADRO): a comparative study. Cochlear Implants Int. 2008;9:8–26. doi: 10.1179/cim.2008.9.1.8.
- Nelson PB, Jin SH, Carney AE, et al. Understanding speech in modulated interference: cochlear implant users and normal-hearing listeners. J Acoust Soc Am. 2003;113:961–968. doi: 10.1121/1.1531983.
- Nilsson M, Soli SD, Sullivan JA. Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise. J Acoust Soc Am. 1994;95:1085–1099. doi: 10.1121/1.408469.
- Nogaki G, Fu QJ, Galvin JJ III. The effect of training rate on recognition of spectrally shifted speech. Ear Hear. 2007;28:132–140. doi: 10.1097/AUD.0b013e3180312669.
- Plomp R, Mimpen AM. Speech-reception threshold for sentences as a function of age and noise level. J Acoust Soc Am. 1979;66:1333–1342. doi: 10.1121/1.383554.
- Schvartz KC, Chatterjee M, Gordon-Salant S. Recognition of spectrally degraded phonemes by younger, middle-aged, and older normal-hearing listeners. J Acoust Soc Am. 2008;124:3972–3988. doi: 10.1121/1.2997434.
- Skinner MW, Clark GM, Whitford LA, et al. Evaluation of a new spectral peak coding strategy for the Nucleus-22 channel cochlear implant system. Am J Otol. 1994;15:15–27.
- Spivak LG, Waltzman SB. Performance of cochlear implant patients as a function of time. J Speech Hear Res. 1990;33:511–519. doi: 10.1044/jshr.3303.511.
- Stacey PC, Raine CH, O’Donoghue GM, et al. Effectiveness of computer-based auditory training for adult users of cochlear implants. Int J Audiol. 2010;49:347–356. doi: 10.3109/14992020903397838.
- Stacey PC, Summerfield AQ. Effectiveness of computer-based auditory training in improving the perception of noise-vocoded speech. J Acoust Soc Am. 2007;121:2923–2935. doi: 10.1121/1.2713668.
- Stacey PC, Summerfield AQ. Comparison of word-, sentence-, and phoneme-based training strategies in improving the perception of spectrally distorted speech. J Speech Lang Hear Res. 2008;51:526–538. doi: 10.1044/1092-4388(2008/038).
- Studebaker GA. A “rationalized” arcsine transform. J Speech Hear Res. 1985;28:455–462. doi: 10.1044/jshr.2803.455.
- Tyler RS, Parkinson AJ, Woodworth GG, et al. Performance over time of adult patients using the Ineraid or Nucleus cochlear implant. J Acoust Soc Am. 1997;102:508–522. doi: 10.1121/1.419724.
- Vermeire K, Brokx JPL, Wuyts FL, et al. Quality-of-life benefit from cochlear implantation in the elderly. Otol Neurotol. 2005;26:188–195. doi: 10.1097/00129492-200503000-00010.
- Wolfe J, Schafer EC, Heldner B, et al. Evaluation of speech recognition in noise with cochlear implants and dynamic FM. J Am Acad Audiol. 2009;20:409–421. doi: 10.3766/jaaa.20.7.3.