Abstract
Objectives
This study (a) examined speech recognition abilities of cochlear implant (CI) recipients in the spectrally complex listening condition of three contrasting types of background music, and (b) compared performance based upon listener groups: CI recipients using conventional long-electrode (LE) devices, Hybrid CI recipients (acoustic plus electric stimulation), and normal-hearing (NH) adults.
Methods
We tested 154 LE CI recipients using varied devices and strategies, 21 Hybrid CI recipients, and 49 NH adults on closed-set recognition of spondees presented in three contrasting forms of background music (piano solo, large symphony orchestra, vocal solo with small combo accompaniment) in an adaptive test.
Outcomes
Signal-to-noise thresholds for speech in music (SRTM) were examined in relation to measures of speech recognition in background noise and multi-talker babble, pitch perception, and music experience.
Results
SRTM thresholds varied as a function of category of background music, group membership (LE, Hybrid, NH), and age. Thresholds for speech in background music were significantly correlated with measures of pitch perception and speech in background noise thresholds; auditory status was an important predictor.
Conclusions
Evidence suggests that speech reception thresholds in background music change as a function of listener age (with more advanced age being detrimental), structural characteristics of different types of music, and hearing status (residual hearing). These findings have implications for everyday listening conditions such as communicating in social or commercial situations in which there is background music.
Keywords: cochlear implant, music, pitch, speech perception, preserved hearing
Cochlear implants (CIs) have been designed to support speech recognition by persons with profound hearing loss, and are quite effective at doing so in quiet listening environments1. Most current-generation implants transmit only broad features of the spectral envelope in terms of the place-frequency mapping of the multiple-electrode array; the speech processor stimulation rates are unrelated to the precise frequency components of the input signals. While this processing approach is effective in transmitting salient features of speech in quiet, the lack of fine structure and poor frequency resolution have negative implications for speech perception in noisy listening conditions2–5. Turner et al.4 and Stickney, Zeng, Litovsky, and Assmann6 provided evidence that patients with cochlear implants tend to do quite poorly in recognizing speech in background noise, in particular when the background signal is not steady state. One explanation is implant patients’ inability to use pitch cues to separate the target from the background, as supported by the studies of Qin and Oxenham3.
Frequency selectivity and related abilities such as pitch perception have been shown to be important factors in the ability to recognize speech in backgrounds of noise or competing talkers3. Henry et al.7 showed that frequency resolution tends to be poorest in profoundly deaf cochlear implant recipients using electric stimulation, somewhat better in listeners with various degrees of sensorineural hearing loss, and best in normal hearing (NH) listeners. This corresponds with the abilities of these subject groups to understand speech in background noises3, 8–10. CI patients with preserved low-frequency acoustic hearing can use the better frequency resolution of the acoustic hearing to help them understand speech in background noise better than implant patients reliant upon the electrical stimulation transmitted via the traditional long electrode (LE) implant4.
While considerable research has been conducted regarding speech perception of CI recipients in conditions of broadband noise or multiple talkers, less is known about another common acoustic signal that often competes with speech: background music. Background music often comprises a broader frequency range (fundamentals and harmonics) and greater fluctuations in amplitude and timbre than human speech. CI recipients are quite likely to encounter background music in a variety of everyday situations. For example, background music is frequently played to create the appropriate “ambience” at social gatherings and in places of business (e.g., MUZAK). Music scores are also common in movies and television, played in conjunction with theatrical dialog in order to establish the mood of a scene or to signal characters or events11. Even though these uses of background music are intended to enhance a mood or proffer additional communication regarding the situation, it is nevertheless a competing acoustic signal that can mask or interfere with reception of the target speaker. To date, however, few studies have examined the challenges of speech perception in background music; some have focused on NH listeners or hearing-impaired listeners who use conventional hearing aids, and several consist of self-report survey data12–14. Additional data on perceptual accuracy with real-world background music would help illuminate the auditory capabilities of CI recipients in this common everyday listening environment.
This study examines the following questions regarding speech recognition of CI recipients in the listening condition of competing real-world music: (a) Does usable low frequency acoustic hearing assist cochlear implant recipients in recognizing speech in different types of competing background music? (b) What participant characteristics (e.g., age, speech perception, musical experience) are influential in recognizing speech in complex listening situations, such as speech in background music? (c) Is better speech recognition in competing sound related to more accurate pitch perception? (d) How does speech in background music relate to speech recognition in other complex listening conditions (noise, multi-talker babble)?
Methods
Participants
Participants included 175 adult CI recipients and 49 adults with normal hearing (NH). The implant recipients used a variety of internal devices and speech processing strategies: (a) a long electrode (LE) group (n = 154), who used a variety of devices (all with 22 mm internal arrays) and strategiesa; and (b) a Hybrid group (n = 21), who received acoustic plus electric stimulation. Preliminary analyses indicated that none of the devices or strategies within the LE group (as listed in footnote a) was superior with regard to accuracy on pitch ranking or the primary dependent variable. Consequently, the data were collapsed into one group (the LE group) for subsequent analyses.
The CI recipients in the Hybrid group were implanted with a 10 mm internal electrode array (technical features of the Hybrid device and specific selection criteria are described in references 15–17). Briefly, candidates for the Hybrid device are drawn from a different population, with regard to auditory profile, than typical long electrode candidates in that they should have good residual acoustic hearing for low frequencies; however, like long electrode candidates, they have extremely poor hearing for high frequenciesb. The Hybrid stimulates 6–10 channels in the basal end of the cochlea with high-frequency information using a CIS processing strategy; low-frequency information is perceived via the patient’s residual hearing. Eighteen of the Hybrid group also used hearing aids in conjunction with their CI, which amplified their preserved acoustic hearing in the implanted and/or contralateral ear.
While the LE and Hybrid groups differed with regard to length of internal array and number of electrodes, these two groups also differed considerably with regard to hearing history (e.g., length of profound deafness; see Table 1) and residual hearing. The long electrode users had, for the most part, essentially no residual hearing in the implanted ear after the implantation surgery. Preoperative pure-tone thresholds for this group averaged 80 dB HL and 94 dB HL at 250 and 500 Hz, respectively. Because implantation surgery usually destroys residual hearing, post-operative threshold testing for these patients rarely produces any responses. In contrast, the Hybrid patients had substantial residual hearing in the low frequencies. Mean unaided thresholds for this group were 32.5 dB HL and 45.25 dB HL at 250 and 500 Hz, respectively. When hearing aids were used, these thresholds improved further, averaging 17.5 dB HL at 250 Hz and 25 dB HL at 500 Hz.
Table 1.
Means and Standard Deviations for Demographic Variables and Speech in Quiet Scores

| Group | Age at testing (years) | Months of CI use | LPD (years) | PTA 250 Hz (dB HL) | PTA 500 Hz (dB HL) | HINT in quiet (% correct) |
|---|---|---|---|---|---|---|
| LE (n = 154) | M = 62.1 (SD = 14.5) (21, 84) | M = 80.5 (SD = 56.4) (11, 265) | M = 10.4 (SD = 12.1) (0, 50) | M = 79.9 (SD = 21.2) (20, 105) | M = 93.7 (SD = 16.8) (35, 110) | M = 87.6 (SD = 19.1) (0, 100) |
| Hybrid (n = 21) | M = 60.1 (SD = 11.2) (33, 81) | M = 29.6 (SD = 26.2) (2, 99) | NA | M = 32.5 (SD = 11.6) (10, 50) | M = 45.3 (SD = 16.7) (15, 95) | NA |
| NH (n = 49) | M = 26.2 (SD = 5.6) (19, 42) | NA | NA | NA | NA | NA |

LPD = length of profound deafness; values in parentheses following each SD are the (minimum, maximum).
NA = not available or not applicable.
A normal-hearing (NH) group, which served as a normative reference for performance on the primary dependent variable (the Speech Reception Threshold in Music, SRTM), was recruited through advertisements. These participants were required to pass a hearing screening in order to participate in the study. None had extensive formal music training (e.g., extensive high school or college-level instruction), as determined through a music background questionnaire11. Table 1 provides the means and standard deviations for the comparison groups on the demographic variables: age at time of testing, months of CI use, length of profound deafness (LPD), pure-tone thresholds, and scores (LE group) for speech perception (HINT) in quiet. A precise length of profound deafness was not available for 23 of those in the LE group, and profound deafness was not a relevant variable for the Hybrid group, given that criteria for implantation with that device require that the individual possess greater residual hearing. All participants provided informed consent in compliance with ethical standards for human subjects.
Measures
Speech Reception Threshold in Music Test (SRTM)
The primary dependent variable in this study was the Speech Reception Threshold in Music test (SRTM), designed to examine the ability of the listener to accurately recognize spondees against 3 contrasting types of background music. This test was modeled after a speech reception in noise task (SRT) reported in Turner et al.4 and described later in this paper, which examined spondee recognition in broadband noise or multi-talker babble. The SRTM requires identification of a spondee wordc, spoken by a female talker in the presence of background music. The fundamental frequency of the spondee items ranged from 212–250 Hz, and the spondees ranged in duration from 1.12 to 1.63 s. The spondees were presented at 65 dBC SPL in sound field at 45 degrees azimuth, one foot to the listener’s left, against the background music. All spondees were equalized to the same RMS level.
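Equalizing items to a common RMS level is straightforward; below is a minimal sketch, assuming each spondee recording is available as a NumPy array of samples (the function name and target value are hypothetical, not levels from the study).

```python
# Minimal sketch of RMS equalization for the spondee recordings.
# `target_rms` is an arbitrary placeholder, not a level from the study.
import numpy as np

def equalize_rms(signal: np.ndarray, target_rms: float = 0.05) -> np.ndarray:
    rms = np.sqrt(np.mean(signal ** 2))     # current RMS of the recording
    return signal * (target_rms / rms)      # scale so all items share one RMS
```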
Spondee recognition was used as the target speech for several reasons: Prior studies4 indicate that nearly all CI recipients easily recognized each of these spondee words in quiet, so this task primarily measured the ability of the listeners to perceive speech in noise, rather than their underlying speech recognition ability. The use of spondees rather than sentences can minimize potentially influential variables such as contextual cues and cognitive abilities. In addition, the use of spondees permits a more direct comparison with speech perception in noise (SRT) data from our laboratories, reported later in this paper.
The word “music” represents a vast universe of structural combinations of pitches, rhythms, and timbres that vary with regard to masking properties and structural complexity in relation to cognitive processing. Clearly it is not feasible to examine all possible structural (pitch, timbre, rhythm, intensity) combinations comprising real-world music, or the specific interactions between those features and spoken communication. In this experiment, we selected 3 real-world excerpts that contrast with one another on parameters of timbral blend, presence or absence of linguistic information (lyrics), size of musical ensemble (one instrument, 3-person instrumental-vocal combo, full symphony orchestra), and genre. Excerpts of approximately 2 seconds in length included (a) a classical piano solo by Beethoven, with a structurally predictable and repetitive arpeggiated bass line against a simple solo melody line in the right hand (piano); (b) an orchestral composition by the 20th-century composer Igor Stravinsky (representing the full range of frequencies heard in contemporary orchestras), made up of large dissonant chords played in an irregular rhythmic pattern (orchestra); and (c) popular music with a linguistic component, that is, a solo male vocalist (Billy Joel) singing lyrics (which, hypothetically, could be confused with words in the spondee list) against a small ensemble of drum set (bass drum beat with cymbal) and muted guitar (vocal). The small ensemble played a predictable rhythmic pattern in duple meter. All musical excerpts were calibrated to the same loudness for presentation during testing. Spectrograms for the 3 contrasting background music excerpts appear in Figure 1.
Figure 1.
Spectrograms of the 3 contrasting background music stimuli. The vertical axes display frequency in increments of 2000 Hz; the horizontal axes represent time in hundredths of milliseconds.
Following each SRTM test item (spondee plus background music), the listeners responded on a touch screen with the spondee that they thought had been presented. Listeners were required to respond on each trial and were instructed to guess if they were not sure of the correct answer. The level of the background music was increased 2 dB following a correct response and decreased by 2 dB following an incorrect response, thus converging on the 50%-correct level of performance. The adaptive procedure continued until 14 reversals occurred, and the average signal-to-noise ratio (SNR) of the final 10 reversals was taken as the threshold SNR (a lower SNR means that the listener could understand speech in more adverse conditions). At least 4 experimental runs of this procedure were obtained from each subject; the final value was taken as the average of the final three experimental runs.
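A minimal sketch of this adaptive track is given below, assuming a caller-supplied `run_trial(snr_db)` that presents one spondee-plus-music item at the given SNR and returns whether the touch-screen response was correct; the starting SNR is likewise an assumption, not a detail from the study.

```python
# Sketch of the 2-dB, one-up/one-down staircase described above:
# 14 reversals, threshold = mean SNR over the final 10 reversals.
def measure_srtm(run_trial, start_snr_db=10.0, step_db=2.0,
                 n_reversals=14, n_average=10):
    snr = start_snr_db
    reversals = []          # SNR values at each change of direction
    prev_direction = None   # +1 = SNR rising (errors), -1 = SNR falling
    while len(reversals) < n_reversals:
        correct = run_trial(snr)
        # Correct response: raise the music level, i.e., lower the SNR.
        direction = -1 if correct else +1
        if prev_direction is not None and direction != prev_direction:
            reversals.append(snr)
        prev_direction = direction
        snr += direction * step_db
    last = reversals[-n_average:]
    return sum(last) / len(last)   # converges on the 50%-correct SNR
```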
Speech in Noise Tasks (SRT)
The two SRT tasks are described in detail in Turner et al.4. Briefly, the speech reception tasks required the participant to identify a spondee word spoken by a female talker in the presence of either steady-state broadband sound (white noise that had been low-pass filtered at −12 dB/octave above 400 Hz to approximate the long-term speech spectrum) or babble consisting of sentences by a male and a female talker mixed together at equal RMS amplitudes. The spondees were presented at 65 dB SPL in sound field, one foot to the listener’s left, against the competing noise or babble. The same sample of noise background was presented on each trial. The level of the background was increased 2 dB following a correct response and decreased by 2 dB following an incorrect response, thus converging on the 50%-correct level of performance. The speech recognition task is similar to that used in the SRTM.
Pitch Ranking
The pitch ranking task (PRT) is described in detail in Gfeller et al.18. Briefly, this test measured how accurately the participant could determine the direction of a pitch change (higher or lower). Each trial consisted of three tones presented sequentially. The first two tones were the same frequency and the third tone was a different frequency. The response task was to indicate via touch screen whether the final tone was higher or lower in pitch than the first two. No feedback was given on accuracy.
The stimuli in this task were pure tones ranging from 131 Hz to 1048 Hz, presented in a total of 540 F0 pairs with interval sizes of 1, 2, 3, or 4 semitones. Tones were generated digitally at a 44.1 kHz sampling rate and presented through the output of a DigiDesign Audiomedia III sound card. The tones were 500 ms in duration, with 25 ms rise-fall times. The time interval between tones of each trial sequence was 300 ms. Tone levels were roved across an 8 dB range around the average presentation level to minimize loudness cues.
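A sketch of how one three-tone trial could be synthesized under these parameters follows; the raised-cosine ramp shape and the per-tone uniform draw for the 8 dB rove are assumptions not specified in the text.

```python
# Sketch of trial-stimulus generation for the pitch ranking task
# (44.1 kHz rate, 500 ms tones, 25 ms rise/fall, 300 ms gaps, 8 dB rove).
import numpy as np

FS = 44100  # sampling rate (Hz)

def pure_tone(freq_hz, dur_s=0.5, ramp_s=0.025):
    t = np.arange(int(FS * dur_s)) / FS
    tone = np.sin(2 * np.pi * freq_hz * t)
    n = int(FS * ramp_s)
    ramp = 0.5 * (1 - np.cos(np.pi * np.arange(n) / n))  # 25 ms on/off ramps
    tone[:n] *= ramp
    tone[-n:] *= ramp[::-1]
    return tone

def trial(base_hz, interval_semitones, direction, rng):
    """Two tones at base_hz, then a third shifted by +/- the interval."""
    third_hz = base_hz * 2 ** (direction * interval_semitones / 12)
    gap = np.zeros(int(FS * 0.3))  # 300 ms silent interval between tones
    tones = [pure_tone(base_hz), pure_tone(base_hz), pure_tone(third_hz)]
    # Rove each tone's level within an 8 dB range to limit loudness cues.
    tones = [t * 10 ** (rng.uniform(-4.0, 4.0) / 20) for t in tones]
    return np.concatenate([tones[0], gap, tones[1], gap, tones[2]])
```

For example, `trial(131, 2, 1, np.random.default_rng(0))` would build a trial whose third tone lies two semitones above the 131 Hz standard.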
Pitch ranking was calculated by dividing the number of correct responses by the total number of trials (six) at each combination of base frequency by interval size. The pitch ranking measure is modeled as a function of the size of the interval (difference in frequency between two sequential F0s) and the base frequency class of the two sequential F0s. The stimuli were presented in sound field at suprathreshold levels (average level of 87 dB SPL).
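This scoring rule reduces to a grouped proportion correct, as in the sketch below; the DataFrame layout and column names are hypothetical.

```python
# Sketch of the scoring described above: proportion correct out of the
# six trials at each base-frequency-by-interval-size combination.
import pandas as pd

def score_pitch_ranking(trials: pd.DataFrame) -> pd.DataFrame:
    # `trials` columns: base_hz, interval_semitones, correct (bool per trial)
    return (trials.groupby(["base_hz", "interval_semitones"])["correct"]
                  .mean()  # fraction of the six trials answered correctly
                  .reset_index(name="prop_correct"))
```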
The LE group was tested with only their CIs. Thirty-one bilateral LE users were tested using both devices; the contralateral ears of monaural LE recipients were plugged during testing. All of the Hybrid participants were tested in the Hybrid CI “condition” (Hybrid plus hearing aid in the ipsilateral ear), except for four who, due to loss of residual hearing in the implanted ear, used either no hearing aid or only a contralateral hearing aid and were tested with only the CI17. Those who used the Hybrid device had the contralateral ear plugged during testing; the Hybrid patients had symmetrical hearing losses. The contralateral earplug made the conveyed signal at least 35 dB, and up to 70 dB, more intense in the implanted ear than in the plugged ear, reducing the sensation level in the opposite ear so that pitch discrimination from that ear would offer minimal, if any, assistance in the task. Because these patients had significant hearing loss in the opposite ear, hearing from the plugged ear is unlikely to be a serious concern.
Auditory Profile and Musical Experience Variables
In order to characterize the two CI groups on potentially influential variables, we gathered information on age at time of testing, length of profound deafness (LPD), auditory thresholds at 250 and 500 Hz (with/without hearing aids), months of implant use (MOU), HINT in background noise (see Table 1), and CNC words. We also gathered information on amount of music training and listening habits prior to and following implantation using the Iowa Music Background Questionnaire (IMBQ)12. Two values for music training were documented: a score for music training during elementary school (MT1), consisting of length of participation in music lessons, classes, and ensembles; and a music training score for classes and lessons during high school and beyond (e.g., college, community involvement) (MT2). Music listening habits (MLH), or the typical amount of music listening time per week, were documented for the periods prior to (MLH-pre) and following (MLH-post) cochlear implantation.
Correlations among Speech in Background Music, Speech in Noise, and Pitch
Correlational analyses were used to examine the relations among the primary dependent variable (SRTM), pitch ranking (PRT), and speech in background noise (SRT) for the CI participants. The relations between SRTM, PRT, and speech in background noise or babble were of interest because prior studies indicate the benefit of frequency information in segregating the target speaker from background noise; thus, more accurate pitch perception, which is better conveyed through acoustic than through electric hearing, may improve speech recognition in various noisy listening environments3,8–10. Scores and thresholds were not available for all CI recipients on each of the measures used in the correlational analyses; thus, the sample size is noted for each measure.
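A sketch of this pairwise-complete correlation analysis (reporting n alongside each r and p) is shown below; the DataFrame, column names, and synthetic values are hypothetical stand-ins for the laboratory data.

```python
# Sketch of pairwise-complete Pearson correlations with per-pair sample sizes.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
df = pd.DataFrame({                       # hypothetical stand-in data
    "prt_overall": rng.uniform(0.5, 1.0, 175),
    "srtm_piano": rng.normal(-5, 6, 175),
    "srtm_vocal": rng.normal(-3, 6, 175),
    "srtm_orchestra": rng.normal(2, 6, 175),
})
df.loc[rng.choice(175, 20, replace=False), "srtm_piano"] = np.nan  # missing scores

def pairwise_corr(data, x, y):
    pair = data[[x, y]].dropna()          # pairwise-complete cases only
    r, p = stats.pearsonr(pair[x], pair[y])
    return r, p, len(pair)                # report n alongside r and p

for music in ["srtm_piano", "srtm_vocal", "srtm_orchestra"]:
    r, p, n = pairwise_corr(df, "prt_overall", music)
    print(f"PRT vs {music}: r={r:.2f}, p={p:.4f}, n={n}")
```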
Statistical Methods
The first research question examined whether low-frequency acoustic hearing assists cochlear implant recipients in recognizing speech in different types of competing background music. Because each subject was measured on each musical stimulus, a repeated measures ANOVA was performed using PROC MIXED in SAS v9.2. Accuracy on speech in background music (SRTM) was investigated as a function of the following predictors: group membership (between-subjects factor), type of musical stimulus, age at time of testing, and two-way interactions between the predictors. Additional hearing covariates (e.g., months of CI use, speech perception scores) were not included in this analysis because no background hearing information was available for the NH subjects. The predictor variables were included as fixed effects in the model. Adjusting for hearing covariates is nonetheless an important consideration in this study; therefore, a follow-up repeated measures ANOVA using only the Hybrid and LE groups was conducted with the additional hearing covariates to better understand potentially influential factors of auditory profile and musical background (research question b).
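The analysis itself was run in SAS PROC MIXED; as a conceptual analogue only, the sketch below fits the same fixed effects in Python with a per-subject random intercept, which induces the compound symmetry correlation structure reported in the Results. The data frame, column names, and synthetic values are hypothetical.

```python
# Conceptual Python analogue of the repeated measures model:
# SRTM ~ music * group + age, with repeated measures within subject.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subj = 30                               # hypothetical long-format data
df = pd.DataFrame({
    "subject_id": np.repeat(np.arange(n_subj), 3),
    "music": np.tile(["piano", "vocal", "orchestra"], n_subj),
    "group": np.repeat(rng.choice(["LE", "Hybrid", "NH"], n_subj), 3),
    "age": np.repeat(rng.uniform(20, 80, n_subj), 3),
})
df["srtm"] = rng.normal(0, 6, len(df))    # placeholder thresholds

# Random intercept per subject = compound symmetry within-subject correlation.
model = smf.mixedlm("srtm ~ C(music) * C(group) + age",
                    data=df, groups=df["subject_id"])
print(model.fit().summary())
```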
Results
The dataset contained 224 individuals, but only 202 were included in the SRTM analysis; 22 individuals were excluded because of missing SRTM scores or missing predictor values. The final ANOVA model to assess the influence of low-frequency acoustic hearing on recognition of speech in differing background music included group membership, type of musical stimulus, age at time of testing, and an interaction between group and music. Type of music (p<.0001), group (p<.0003), age at time of testing (p<.0001), and the interaction between group and music (p<.0001) were all significant predictors of SRTM thresholds.
A compound symmetry correlation structure was used to account for within-subject measurements. The estimated total variance in SRTM threshold was 43.75 and the estimated variance from the compound symmetry was 30.17; thus, the within-subject correlation between a subject’s scores on the three types of music (piano, orchestra, vocal) accounted for 68% of the overall variability in the sample. The parameter estimate for age was 0.1941: comparing two individuals in the same group and on the same music stimulus who differ in age by 1 year, on average the older individual will have an SRTM threshold 0.1941 dB higher than the younger individual. Therefore, as age increases, thresholds are on average larger (poorer). Similarly, someone 10 years older would be expected to have thresholds about 2 dB larger (1.941), conditional on music stimulus and group. No significant difference was detected in the relationship between SRTM and age across groups (i.e., no significant age*group interaction), despite the differing age ranges in each group.
Speech recognition for all groups, as well as the extent to which acoustic hearing was beneficial, varied depending upon the particular type of background music. Because the interaction between group and music was statistically significant, the difference in speech recognition by music type needed to be assessed one group at a time. Note that follow-up tests are reported using the Tukey-Kramer adjusted p-value for multiple comparisons. For the LE, Hybrid, and NH groups, all three types of music were significantly different from one another in their effect on SRTM thresholds (p<.0001) (see Figure 2). A negative signal-to-noise ratio (SNR) threshold indicates that the listener could recognize speech even when it was presented below the average background level. All three groups had the best signal-to-noise thresholds for piano, the second best for vocal, and the poorest for orchestra. All three groups had negative mean SRTM thresholds for both the vocal and piano music types; only the NH group had a negative mean threshold for the orchestra music. However, even the NH group experienced difficulty with speech reception in the orchestral listening condition.
Figure 2.
Box plots for Raw Signal-to-noise thresholds in the SRTM Test: music by group
There was variation in the results when comparing groups within each music type. The NH group had the best thresholds for all three music types. Although the Hybrid group had lower mean thresholds than the LE group in the vocal and orchestra conditions, the differences were not significant (using an a priori significance level of .05). None of the groups were significantly different from one another in the orchestra condition. In the piano condition, the LE and Hybrid groups had significantly poorer thresholds than the NH group (p<.0001); there was no significant difference between the LE and Hybrid groups (p=.9943) (see Figure 2). In the vocal music condition, thresholds for the LE and Hybrid groups were poorer than thresholds for the NH group, but only the LE difference was significant (p<.0001 and p=.0903, respectively). Although the thresholds for the Hybrid group were not statistically different from the LE group (p=.5651), the distributions (Figure 2) show a potential difference in SRTM thresholds between the two groups (covariate-adjusted mean thresholds: −5.66 and −2.62 dB for the Hybrid and LE groups, respectively).
The unaccounted-for variability in both groups, coupled with the small sample size in the Hybrid group, may have contributed to the inability to detect a significant difference; therefore, a secondary analysis was conducted between the LE and Hybrid groups with the inclusion of additional hearing predictors, not available for the NH group, to help explain more of the variability in SRTM thresholds. Additional factors considered were MOU, CNC words, MT1, MT2, MLH-pre, and MLH-post, and all reasonable two-way interactions of the predictors. The final repeated measures ANOVA model included the predictors music type, group, age, CNC, and MLH-pre, and the interactions CNC*music and music*group.
For the two CI groups, the significant predictors of speech recognition in background music (SRTM) thresholds were music type (p<0.0001), group (p=0.0098), age (p<0.0001), and CNC (p<0.0001). MLH-pre was not significant (p=0.0958). Significant interactions were CNC*music (p<0.0001) and music*group (p=0.0042). On average, older individuals had higher SRTM thresholds. The effects of music, group, and CNC cannot be assessed without evaluating the interactions. A significant difference in SRTM thresholds between the LE and Hybrid groups was now found for the vocal (p=0.0170) and orchestra (p=.0383) music conditions, but the piano condition was still not significant (p=.9984) (Figure 3). In general, the better the CNC words score, the smaller (better) the SRTM threshold. However, the CNC score was most influential in predicting thresholds in the piano condition, and least influential in predicting thresholds in the orchestra condition.
Figure 3.
Covariate Adjusted Signal-to-Noise Mean Thresholds in SRTM test for CI Users with 95% confidence intervals.
Correlations between Speech Recognition in Complex Listening Conditions and Pitch Ranking
Additional research questions involved the relations among speech recognition in competing music, speech in background noise, and accurate pitch perception. Data for the LE and Hybrid groups were consolidated for correlation analyses between the SRTM, the SRT for noise or babble, and pitch ranking (PRT). As Table 2 indicates, there were statistically significant correlations between pitch ranking and all measures except speech in steady-state background noise. The strongest relations were between pitch ranking and speech in babble, and between pitch ranking and the SRTM in the orchestra condition, which was the most challenging of the music backgrounds.
Table 2.
Correlations between pitch ranking (PRT) and speech recognition in noise or music for CI recipients
| | SRT in noise (n = 132) | SRT in babble (n = 127) | SRTM Piano (n = 168) | SRTM Vocal (n = 166) | SRTM Orchestra (n = 167) |
|---|---|---|---|---|---|
| PRT Overall | r = −.16, p = .071 | r = −.30, p = .0018 | r = −.21, p = .008 | r = −.23, p = .003 | r = −.31, p < .0001 |
| PRT LE | r = −.24, p = .0013 (n = 120) | r = −.28, p = .0013 (n = 116) | r = −.26, p = .0013 (n = 151) | r = −.24, p = .0093 (n = 150) | r = −.30, p = .0002 (n = 151) |
| PRT Hybrid | r = .35, p = .2579 (n = 12) | r = −.08, p = .8054 (n = 11) | r = .03, p = .9218 (n = 16) | r = .027, p = .9200 (n = 16) | r = .09, p = .9749 (n = 16) |
Correlations between SRTM and Speech in Noise (SRT)
In considering overall means for the SRT, the mean threshold for noise was slightly lower than the mean threshold for babble. The overall SRTM (all types of music) and SRT mean thresholds were highly correlated (r=.70, p<.0001). This correlation differed by group. Within the LE group, SRTM and SRT were highly correlated (r=.73, p<.0001), whereas there was not a significant correlation within the Hybrid group (r=.41, p=.19). The lack of a significant correlation in the Hybrid group could be related to the small sample size; that is, only a small number of Hybrid participants had both SRTM and SRT thresholds available for analysis (n=12). Where the correlation was significant, as SRT thresholds decreased (indicating more accurate recognition in adverse listening conditions), SRTM thresholds decreased as well.
The pairwise correlations within the subgroups as a function of music type were all relatively strong and in a positive direction. Within the LE group, SRT in noise was most highly correlated with SRTM thresholds for piano, followed by orchestra and then vocal (all p<.0001). For the Hybrid group, the only statistically significant correlation was between the orchestra condition and SRT in noise (p=.031); the correlation between the vocal condition and SRT in noise approached significance (p=.054). The small number of observations was an issue in analyses within the Hybrid group.
Discussion
As these data indicate, the extent to which background music is problematic with regard to speech recognition varies as a function of the structural properties of the music as well as the type of auditory input (NH, acoustic plus electric, electric) and auditory status (residual hearing, speech perception abilities). This study used three structurally different excerpts of real-world music as competing stimuli. For example, even the NH listeners experienced considerable difficulty in the orchestra condition. These results emphasize that persons with a healthy hearing mechanism as well as those with compromised hearing can experience communicative difficulties in commercial or social situations (e.g., restaurants, parties) in which background music is played, depending upon the characteristics of the music.
Future studies that systematically examine specific acoustic variables of background music could help determine which acoustic elements create more challenging listening conditions for CI recipients than others. For example, Ekstrom and Borg14 found that music had a weaker masking effect for both NH listeners and hearing-impaired listeners (hearing aid users) when the music was played at a slower tempo and in a higher frequency range. The clinical application of such information, unfortunately, is tempered by the reality that real-world music as heard in commercial or social settings is naturally quite variable and includes ongoing and rapid changes in structural features (e.g., dynamic range, instrumental combinations, variations in tempo19). Furthermore, the musical selections and sound systems in public places are rarely under the listener’s control. Thus, practically speaking, the findings of the present study emphasize the importance of effective counseling for CI recipients regarding proactive management of differing and ever-changing listening environments (e.g., direct requests for quieter locations in restaurants, or for the loudness of the music to be turned down).
With regard to group differences (LE and Hybrid), after adjusting for hearing covariates, the Hybrid group did differ significantly from the LE group at the 0.05 level. This helps to confirm that the pattern of the distributions was different. The mean thresholds were −1.91 dB for the LE group and −5.30 dB for the Hybrid group, a difference with possible clinical implications. It is also important to note that the LE thresholds ranged from −21.6 to 31.56 dB, while the Hybrid thresholds showed much less variation, ranging from −18.68 to 9.28 dB. Thus, on average, participants in the bottom (better-performing) tail of the distribution had lower thresholds with the Hybrid device than those in the bottom tail of the LE distribution. Examining the worst-performing individuals (i.e., the top tail), only 5% scored worse than 7.24 dB using the Hybrid, while the 5% cutoff for the LE group was 9.04 dB. Furthermore, the 25th, 50th, and 75th percentiles for the Hybrid group were −12.6, −5.56, and 1.14 dB, respectively, while the same percentiles were −7.24, −1.76, and 2.16 dB for the LE group. The worst-performing participants were considerably worse in the LE group than in the Hybrid group. Thus, the lack of significance in the initial model, which included the NH group, appears to be an artifact of the small Hybrid sample size and the large variability exhibited by LE participants.
In general, CI recipients were hampered in word recognition by background music more than NH listeners, but those CI recipients who, as a group, had better residual hearing (and thus access to acoustic stimulation, i.e., the Hybrid group) had better mean thresholds than those in the LE group for the two more challenging music conditions. There was a significant advantage for the Hybrid users in the vocal music condition.
These results, while less straightforward than those of some experiments reviewed by McDermott20, suggest possible real-world listening advantages of preserving low-frequency residual acoustic hearing, particularly in supporting some complex listening situations. The benefits of preserved acoustic hearing are of particular importance given the current challenges in providing better pitch discrimination through electrical stimulation20. This was emphasized by the lack of difference (as noted in the preliminary analyses) among unilateral or bilateral CI users of different device types or strategies (e.g., ACE, SPEAK, CIS).
The correlations between the SRTM and the speech in noise measures (SRT) (r=.70), as well as the correlations of the SRTM and SRT with pitch ranking (see Table 2), provide additional support for prior studies indicating that the ability of patients to separate voices in multi-talker situations and in complex background music is related to their ability to accurately perceive pitch, and that preservation of low-frequency acoustic hearing can be quite helpful4, 21. Segregation of voices using voice pitch has been shown to be important by a number of researchers22, 23. However, residual hearing (and the better pitch sensations it affords) may be only one tool that listeners use to separate speech from music. There may be other cues or abilities that CI listeners can use to help in this task, which would account for the high levels of performance seen by some LE users. Several recent studies have shown that listeners with acoustic plus electric stimulation are less effective at identifying voices or gender than LE users24.
This study did not examine directly the specific contributions of residual hearing (e.g., PTAs, testing the hearing aid condition only) to perceptual acuity. Some preliminary studies of bimodal conditions (LE CI plus hearing aids) suggest that there may be a synergistic effect of bimodal stimulation25–27; thus, tests which compare various conditions (LE plus hearing aid, short electrode alone, hearing aid alone, and short electrode plus hearing aids in the ipsilateral and/or contralateral side) would help to further elucidate the relative contributions of electrical and acoustic stimulation and implant technology.
Consistent with prior studies13, 28, age at time of testing was a significant predictor of poorer performance on the SRTM. More advanced age is associated with greater difficulty in spectrally complex listening tasks29, 30. These findings suggest a particular need for accommodations in acoustic environments (e.g., eliminating or turning down the volume of background music) to support optimal communication for CI recipients of more advanced age, particularly in real-world listening tasks such as attending to a conversational partner in the presence of MUZAK or background music in a movie or at a party.
Acknowledgements
This study was supported by grant 2 P50 DC00242 from the NIDCD, NIH; grant 1R01 DC000377 from the NIDCD, grant RR00059 from the General Clinical Research Centers Program, NCRR, NIH; and the Iowa Lions Foundation.
Contributor Information
Kate Gfeller, School of Music; Department of Communication Sciences and Disorders; Iowa Cochlear Implant Clinical Research Center.
Christopher Turner, Department of Communication Sciences and Disorders; Department of Otolaryngology, Head and Neck Surgery, Iowa Cochlear Implant Clinical Research Center.
Jacob Oleson, Department of Biostatistics, Iowa Cochlear Implant Clinical Research Center, The University of Iowa Hospitals and Clinics, The University of Iowa.
Stephanie Kliethermes, Department of Biostatistics, Iowa Cochlear Implant Clinical Research Center, The University of Iowa Hospitals and Clinics, The University of Iowa.
Virginia Driscoll, Iowa Cochlear Implant Clinical Research Center, The University of Iowa Hospitals and Clinics, The University of Iowa.
References
1. Wilson B. Cochlear implant technology. In: Niparko JK, Kirk KI, Mellon NK, Robbins AM, Tucci DL, Wilson BS, editors. Cochlear implants: Principles & practices. New York: Lippincott, Williams & Wilkins; 2000. pp. 109–118.
2. Gantz B, Turner C, Gfeller K, Lowder M. Preservation of hearing in cochlear implant surgery: Advantages of combined electrical and acoustical speech processing. Laryngoscope. 2005;115:796–802. doi: 10.1097/01.MLG.0000157695.07536.D2.
3. Qin MK, Oxenham AJ. Effects of simulated cochlear-implant processing on speech reception in fluctuating maskers. J Acoust Soc Am. 2003;114:446–454. doi: 10.1121/1.1579009.
4. Turner CW, Gantz BJ, Vidal C, Behrens A, Henry BA. Speech recognition in noise for cochlear implant listeners: Benefits of residual acoustic hearing. J Acoust Soc Am. 2004;115:1729–1735. doi: 10.1121/1.1687425.
5. Nelson P, Jin S-H. Factors affecting speech understanding in gated interference: Cochlear implant users and normal-hearing listeners. J Acoust Soc Am. 2004;115:2286. doi: 10.1121/1.1703538.
6. Stickney GS, Zeng F-G, Litovsky R, Assmann PF. Cochlear implant speech recognition with speech maskers. J Acoust Soc Am. 2004;116:1081–1091. doi: 10.1121/1.1772399.
7. Henry BA, Turner CW, Behrens A. Spectral peak resolution and speech recognition in quiet: Normal hearing, hearing impaired and cochlear implant listeners. J Acoust Soc Am. 2005;118:1111–1121. doi: 10.1121/1.1944567.
8. Turner CW. Hearing loss and the limits of amplification. Audiol Neurootol. 2006;11(suppl 1):2–5. doi: 10.1159/000095606.
9. Fu QJ, Shannon RV, Wang X. Effects of noise and spectral resolution on vowel and consonant recognition. J Acoust Soc Am. 1998;104:3586. doi: 10.1121/1.423941.
10. Friesen LM, Shannon RV, Baskent D, Wang X. Speech recognition in noise as a function of the number of spectral channels: Comparison of acoustic hearing and cochlear implants. J Acoust Soc Am. 2001;110:1150–1163. doi: 10.1121/1.1381538.
11. Gfeller KE. Music: A human phenomenon and therapeutic tool. In: Davis WB, Gfeller KE, Thaut MH, editors. An introduction to music therapy theory and practice. 3rd ed. Silver Spring, MD: American Music Therapy Association; 2008. pp. 41–75.
12. Gfeller KE, Christ A, Knutson JF, Witt SA, Murray KT, Tyler RS. The musical backgrounds, listening habits, and aesthetic enjoyment of adult cochlear implant recipients. J Am Acad Audiol. 2000;11:390–406.
13. Gfeller K, Jiang D, Oleson JJ, Driscoll V, Knutson J. Temporal stability of music perception and appraisal scores of adult cochlear implant recipients. J Am Acad Audiol. 2010;21:28–34. doi: 10.3766/jaaa.21.1.4.
14. Ekstrom S-R, Borg E. Hearing speech in music. Noise Health. 2011;13:277–285. doi: 10.4103/1463-1741.82960.
15. Gantz BJ, Turner CW. Combining acoustic and electric hearing. Laryngoscope. 2003;113:1726–1730. doi: 10.1097/00005537-200310000-00012.
16. Woodson EA, Dempewolf RD, Gubbels SP, Porter AT, Oleson JJ, Hansen MR, Gantz BJ. Long-term hearing preservation after microsurgical excision of vestibular schwannoma. Otol Neurotol. 2010;31(7):1144–1152. doi: 10.1097/MAO.0b013e3181edb8b2.
17. Gantz BJ, Hansen MR, Turner CW, Oleson JJ, Reiss LA, Parkinson AJ. Hybrid 10 clinical trial: preliminary results. Audiol Neurootol. 2009;14(Suppl 1):32–38. doi: 10.1159/000206493.
18. Gfeller K, Turner C, Oleson J, Zhang X, Gantz B, Froman R, Olszewski C. Accuracy of cochlear implant recipients on pitch perception, melody recognition, and speech reception in noise. Ear Hear. 2007;28:412–423. doi: 10.1097/AUD.0b013e3180479318.
19. Chasin M. Music and hearing aids. The Hearing Journal. 2003;56(7):36–41.
20. McDermott H. Benefits of combined acoustic and electric hearing for music and pitch perception. Seminars in Hearing. 2011;32:103–114.
21. Qin MK, Oxenham AJ. Effects of introducing unprocessed low-frequency information on the reception of envelope-vocoder processed speech. J Acoust Soc Am. 2006;119:2417–2426. doi: 10.1121/1.2178719.
22. Assmann PF, Summerfield Q. Modeling the perception of concurrent vowels. J Acoust Soc Am. 1990;88:680–697. doi: 10.1121/1.399772.
23. Brokx JPL, Nooteboom SG. Intonation and the perceptual separation of simultaneous voices. J Phonetics. 1982;10:23–36.
24. Dorman MF, Gifford RH, Spahr AJ, McKarns SA. The benefits of combining acoustic and electric stimulation for the recognition of speech, voice and melodies. Audiol Neurotol. 2008;13:105–112. doi: 10.1159/000111782.
25. Dillier N. Combining cochlear implants and hearing instruments. In: Proceedings of the Third International Pediatric Conference; Chicago, IL; 2004 Nov. pp. 163–172.
26. Kong Y, Stickney G, Zeng F. Speech and melody recognition in binaurally combined acoustic and electric hearing. J Acoust Soc Am. 2005;117:1351–1361. doi: 10.1121/1.1857526.
27. El Fata F, James C, Laborde M, Fraysse B. How much residual hearing is 'useful' for music perception with cochlear implants? Audiol Neurootol. 2009;14:14–21. doi: 10.1159/000206491.
28. Gfeller K, Turner C, Woodworth G, Mehr M, Fearn R, Witt S, Stordahl J. Recognition of familiar melodies by adult cochlear implant recipients and normal hearing adults. Cochlear Implants Int. 2002;3:29–53. doi: 10.1179/cim.2002.3.1.29.
29. Gordon-Salant S, Fitzgibbons PJ. Recognition of multiply degraded speech by young and elderly listeners. J Speech Hear Res. 1995;38:1150–1156. doi: 10.1044/jshr.3805.1150.
30. Martin JS, Jerger JF. Some effects of aging on central auditory processing. J Rehabil Res Dev. 2005;42(suppl 2):25–43. doi: 10.1682/jrrd.2004.12.0164.