Abstract
Melodic contour identification was measured in cochlear implant (CI) and normal-hearing (NH) subjects for piano samples processed by four bandpass filters: low (310–620 Hz), middle (620–2480 Hz), high (2480–4960 Hz), and full (310–4960 Hz). NH performance was near-perfect for all filter ranges and much higher than CI performance. The best mean CI performance was with the middle frequency range; performance was much better for some CI subjects with the middle rather than the full filter. These results suggest that acoustic filtering may reduce potential mismatches between fundamental frequencies and harmonic components, thereby improving CI users’ melodic pitch perception.
Introduction
Because of the relatively poor spectral resolution provided by cochlear implants (CIs), CI users generally have great difficulty with music perception and/or appreciation. The 12–22 implanted electrodes typically available are not sufficient to support complex pitch perception, which is essential for good melodic pitch perception and music timbre perception (Shannon et al., 2004). In clinical speech processors, the frequency analysis filters are too broadly spaced to directly convey fundamental frequency (F0) place cues; CI listeners must use within-channel temporal envelope cues to discriminate small changes in F0. These temporal pitch cues are relatively weak compared to place pitch cues. Previous studies have tried to optimize CI processing for F0 representation, with only minor success (e.g., Geurts and Wouters, 2004; Laneau et al., 2006; Kasturi and Loizou, 2007).
Due to processor- and patient-related factors (e.g., number and location of healthy neurons), individual CI users may perceive somewhat different musical patterns given the same acoustic input. Galvin et al. (2008) found that CI users’ melodic contour identification (MCI) varied across instruments. Individual subjects performed best (or worst) with one particular instrument, with no predictable effect of instrument timbre. In a related study, Galvin et al. (2009) found that CI users’ MCI performance deteriorated sharply in the presence of a competing instrument. CI subjects differed in their attention to the different frequency components, with some attending to only the highest or the lowest frequency components. These studies suggest that individual CI users may attend to different frequency components when extracting melodic pitch. It is possible that some frequency components may interfere with melodic pitch perception, depending on the acoustic input, the CI signal processing, and individual CI users’ stimulation patterns.
Previous work with normal-hearing (NH) listeners by Oxenham et al. (2004) suggests that harmonic information should be tonotopically correct to extract accurate F0 information. This is rarely the case for complex stimulation patterns with CIs. Because of the limited insertion depth, there is a tonotopic mismatch between the acoustic F0 (and/or lower harmonics) and the electrode location. This “absolute mismatch” between the acoustic frequency and electrode place is reduced as the frequency increases. Accordingly, there is also a “relative mismatch” between F0 and harmonic information. When patient-related factors such as exact electrode position and nerve survival are considered, there can be significant “spectral warping” to the acoustic input. In both scenarios, CI users’ melodic pitch perception may benefit from optimal filtering that reduces the absolute and/or relative mismatch. In this study, MCI was measured in CI and NH subjects listening to piano samples that were bandpass-filtered (low, middle, high, and full frequency ranges) to preserve different amounts of F0 and harmonic information and to see how different frequency regions contributed to melodic pitch perception.
Methods
Eight adult, post-lingually deafened CI users participated in this study; all were unilateral CI users. All CI subjects had participated in a similar previous experiment (Galvin et al., 2008) and thus were familiar with the MCI task. CI subject demographics are shown in Table 1. Five adult NH listeners (one male and four females, aged 22–29 yr) also were tested. All subjects were paid for their participation, and all provided informed consent in accordance with the local Institutional Review Board.
Table 1.
CI subject demographics. HL = hearing loss, N-24 = Nucleus 24, N-22 = Nucleus 22, HiRes 90k = HiResolution 90k, ACE = Advanced combination encoder, SPEAK = Spectral peak encoder, pps/ch = pulses per second per channel.
| Subject | Gender | Age at profound HL (yr) | Age at implantation (yr) | Age at testing (yr) | Device | Strategy | Input frequency (kHz) | Stimulation rate (pps/ch) |
|---|---|---|---|---|---|---|---|---|
| S1 | M | 54 | 55 | 57 | Freedom | ACE | 0.19–7.89 | 1200 |
| S2 | F | 40 | 60 | 64 | N-24 | ACE | 0.19–7.89 | 1200 |
| S3 | M | 45 | 49 | 68 | N-22 | SPEAK | 0.15–10.1 | 250 |
| S4 | M | 35 | 35 | 52 | N-22 | SPEAK | 0.15–10.1 | 250 |
| S5 | M | 65 | 13 | 79 | N-22 | SPEAK | 0.15–10.1 | 250 |
| S6 | M | 44 | 52 | 65 | N-22 | SPEAK | 0.15–10.1 | 250 |
| S7 | F | 55 | 67 | 76 | N-24 | ACE | 0.19–7.89 | 900 |
| S8 | M | 65 | 65 | 75 | HiRes 90k | Fidelity 120 | 0.25–8.70 | 2750 |
Stimuli consisted of nine five-note melodic contours (rising, flat, falling, flat–rising, falling–rising, rising–flat, falling–flat, rising–falling, and flat–falling), similar to those used in Galvin et al. (2007, 2008, 2009). Melodic contours were generated in relation to a “root note,” the lowest note in the contour. The root note was D#4 (312 Hz), selected to accommodate the acoustic input range of Advanced Bionics devices (see Table 1 for the input frequency range for each subject). This root note frequency approached listeners’ limits of temporal processing (∼300 Hz, according to Shannon, 1983), requiring greater attention to spectral patterns for melodic pitch perception. All notes in the contours were generated according to f = fref × 2^(x/12), where f is the frequency of the target note, x is the number of semitones relative to the root note, and fref is the frequency of the root note. The spacing between successive notes within each contour was 1, 2, or 3 semitones. Given the different contours and semitone spacings, the F0 range was 312–624 Hz. Each note was 250 ms in duration, and the interval between notes was 50 ms. All melodic contours were played by simulations of a piano (the most difficult instrument in Galvin et al., 2008) generated using sampling and MIDI synthesis.
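The semitone formula above can be sketched in a few lines of Python. This is only an illustrative reconstruction: the function and dictionary names are ours, and only the formula, root note, and spacings come from the text.

```python
# Sketch of note-frequency generation from the semitone formula.
# Names (note_freq, SHAPES, contour_freqs) are ours, not the study's software.
ROOT_F0 = 312.0   # D#4 root note, in Hz

def note_freq(x, f_ref=ROOT_F0):
    """f = f_ref * 2^(x/12), with x in semitones above the root note."""
    return f_ref * 2.0 ** (x / 12.0)

# Three of the nine five-note contour shapes, as semitone-step multipliers;
# the remaining six shapes follow the same pattern.
SHAPES = {
    "rising":  [0, 1, 2, 3, 4],
    "flat":    [0, 0, 0, 0, 0],
    "falling": [4, 3, 2, 1, 0],
}

def contour_freqs(shape, spacing):
    """Note frequencies for a contour with 1-, 2-, or 3-semitone spacing."""
    return [note_freq(step * spacing) for step in SHAPES[shape]]

# "Rising" with 3-semitone spacing spans 312-624 Hz, the study's full F0 range.
rising_3 = contour_freqs("rising", 3)
```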
Each contour was processed by four bandpass filters: low (310–620 Hz), middle (620–2480 Hz), high (2480–4960 Hz), and full (310–4960 Hz). The bandpass filter ranges were selected to represent F0-only cues (low), lower-order harmonics (middle), higher-order harmonics (high), or all frequency components (full). Note that the low and high filters were only one octave wide, while the middle filter was two octaves wide and the full filter was four octaves wide. Bandpass filters were 20th-order Butterworth filters (240 dB/octave). After filtering, stimuli were normalized to have the same long-term root-mean-square (rms) amplitude as the input. Figure 1 shows electrodograms for the “rising” contour processed by the different bandpass filters. Electrodograms were generated using custom software that simulated the default processing parameters for the Advanced Combination Encoder (ACE) strategy (Vandali et al., 2000) used in Nucleus-24 and Freedom CI devices, i.e., 900 pulses per second per channel (pps/ch), frequency allocation Table 6, and 8 maxima. With the low filter, the pattern spans ∼5 electrodes, with some shift in the “weight” from apex to base. With the middle filter, the pattern is more diffuse, spanning ∼12 electrodes. With the high filter, the pattern is more diffuse than with the low filter, spanning ∼9 electrodes. The stimulation pattern was most diffuse with the full filter, spanning ∼18 electrodes; pitch changes are signaled by strong changes in the electrode firing pattern. Note that the electrodograms would be similar for the Spectral Peak (SPEAK) strategy, except for slight differences due to frequency allocation (Table 9 vs Table 6), number of spectral maxima (6 vs 8), and stimulation rate (250 vs 900 pps/ch).
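The filtering and rms-normalization steps might be sketched as follows with SciPy. This is an assumption-laden reconstruction: the original implementation is not specified beyond the filter order, slope, and band edges, and note that SciPy's `butter` doubles the prototype order for bandpass designs, so the realized slope may differ from the reported 240 dB/octave.

```python
# Illustrative reconstruction of the bandpass-filtering step; band edges and
# the "20th-order Butterworth" spec come from the text, everything else is ours.
import numpy as np
from scipy.signal import butter, sosfiltfilt

BANDS = {                        # passband edges in Hz, from the text
    "low":    (310.0, 620.0),
    "middle": (620.0, 2480.0),
    "high":   (2480.0, 4960.0),
    "full":   (310.0, 4960.0),
}

def bandpass(x, band, fs, order=20):
    """Bandpass-filter x, then renormalize to the input's long-term rms."""
    lo, hi = BANDS[band]
    # Second-order sections keep a high-order design numerically stable.
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    y = sosfiltfilt(sos, x)      # zero-phase filtering
    return y * (np.sqrt(np.mean(x ** 2)) / np.sqrt(np.mean(y ** 2)))
```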
Figure 1.
(Color online) Electrodograms for the D#4 rising contour (2-semitone spacing) played by the piano with the different bandpass filters. The electrodograms were generated using the clinical defaults for the ACE strategy used in Nucleus-24 and Freedom devices: 900 pps/ch, frequency allocation Table 6, and 8 maxima.
Procedures were generally similar to those in Galvin et al. (2007, 2008, 2009). CI subjects were tested with their clinically assigned speech processors and settings. Once set, these settings were not changed during testing. CI and NH subjects were tested while sitting in a sound-treated booth (IAC) and directly facing a single loudspeaker (Tannoy Reveal). All stimuli were presented acoustically at 70 dBA. The different bandpass filter conditions were tested in separate blocks, and the test block order was randomized across subjects. During each test block, a contour was randomly selected (without replacement) from among the 54 stimuli (nine contours × three semitone spacings × two repeats) and presented to the subject, who responded by clicking on one of the nine response boxes shown onscreen. Subjects were allowed to repeat each stimulus up to three times. No preview or trial-by-trial feedback was provided. A minimum of three test blocks was completed for each bandpass filter condition.
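The per-block trial randomization described above (54 stimuli drawn without replacement) can be sketched as follows; this is our reconstruction for illustration, not the authors' test software.

```python
# Reconstruction of one MCI test block: 9 contours x 3 spacings x 2 repeats,
# presented in random order without replacement. Names are ours.
import random

CONTOURS = ["rising", "flat", "falling", "flat-rising", "falling-rising",
            "rising-flat", "falling-flat", "rising-falling", "flat-falling"]
SPACINGS = [1, 2, 3]    # semitone spacing between contour notes
REPEATS = 2             # each contour/spacing pair occurs twice per block

def make_block(seed=None):
    """Return the 54 stimuli of one test block in random order."""
    trials = [(c, s) for c in CONTOURS for s in SPACINGS] * REPEATS
    random.Random(seed).shuffle(trials)
    return trials
```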
Results
Figure 2 shows individual and mean MCI performance for CI and NH listeners with the different bandpass filters. Mean NH performance was nearly perfect across all filter conditions, and significantly better than that of CI subjects (Mann–Whitney rank sum test: p < 0.001). Mean CI performance was best with the middle filter (63.2% correct) and poorest with the high filter (29.7% correct); mean performance was similar for the low (49.1% correct) and full filters (50.2% correct). There was considerable inter-subject variability. For some subjects (S1, S2, S3, and S6), there was little difference between the middle and full bands. For others (S2, S6, and S7), there was little difference between the low and middle bands. For some subjects (S4, S5, S7, and S8), performance with the middle band was much better than that with the full band (nearly 40 percentage points better for S8).
Figure 2.
(Color online) Individual and mean MCI performance for CI and NH subjects with the different bandpass filters. The error bars show one standard error of the mean, and the dashed line shows chance performance level (11.1% correct).
Figure 3 shows mean MCI performance for the different bandpass filters, as a function of semitone spacing in the contours. For 1- or 2-semitone spacing, the middle filter provided the best performance. For 3-semitone spacing, there was little difference between the low, middle, and full filters. Performance was poorest with the high filter and differed little across the semitone spacing conditions. A two-way repeated-measures analysis of variance (RM ANOVA) showed significant main effects for bandpass filter [F(3,42) = 9.1, p < 0.001] and semitone spacing [F(2,42) = 22.2, p < 0.001], as well as a significant interaction [F(6,42) = 8.8, p < 0.001].
Figure 3.
(Color online) Mean MCI performance (across subjects) for the different bandpass filters, as a function of semitone spacing between notes in the contour. Overall mean performance is shown at far right. The error bars show one standard error of the mean, and the dashed line shows chance performance level (11.1% correct).
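The two-way RM ANOVA reported above can be set up with statsmodels for readers who wish to run the same style of analysis on their own data; the factor structure (8 subjects × 4 filters × 3 spacings) matches the study, but the scores below are simulated placeholders, not the study's data.

```python
# Sketch of a two-way repeated-measures ANOVA with the study's factor
# structure, using statsmodels on simulated placeholder scores -- NOT the
# study's data. One observation per subject/filter/spacing cell, as AnovaRM
# requires.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
rows = [
    {"subject": s, "filter": f, "spacing": sp,
     "score": rng.uniform(10, 90)}      # placeholder percent-correct score
    for s in range(1, 9)                # 8 CI subjects
    for f in ["low", "middle", "high", "full"]
    for sp in [1, 2, 3]
]
df = pd.DataFrame(rows)

res = AnovaRM(df, depvar="score", subject="subject",
              within=["filter", "spacing"]).fit()
print(res.anova_table)  # F Value, Num DF, Den DF, Pr > F for each effect
```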
As shown in Table 2, post-hoc Bonferroni t-tests showed a significant effect of bandpass filter within the 2- and 3-semitone spacings (p < 0.05), but not within the 1-semitone spacing. Post-hoc Bonferroni t-tests also showed a significant effect of semitone spacing within the low, middle, and full bands (p < 0.05), but not within the high band.
Table 2.
Results of post-hoc Bonferroni pair-wise comparisons from the two-way RM ANOVA. Significant differences are indicated in bold italics.
Effect of semitone spacing within each bandpass filter:

| Comparison | Low t | Low p | Middle t | Middle p | High t | High p | Full t | Full p |
|---|---|---|---|---|---|---|---|---|
| 3 semi vs 1 semi | ***8.137*** | ***<0.001*** | 3.059 | 0.228 | 1.065 | 1.000 | ***4.098*** | ***0.009*** |
| 3 semi vs 2 semi | 2.966 | 0.297 | 1.274 | 1.000 | 0.983 | 1.000 | 1.333 | 1.000 |
| 2 semi vs 1 semi | ***5.171*** | ***<0.001*** | ***4.334*** | ***0.004*** | 0.083 | 1.000 | 2.764 | 0.515 |

Effect of bandpass filter within each semitone spacing:

| Comparison | 3 semi t | 3 semi p | 2 semi t | 2 semi p | 1 semi t | 1 semi p |
|---|---|---|---|---|---|---|
| Full vs low | 1.078 | 1.000 | 0.025 | 1.000 | 1.651 | 1.000 |
| Full vs middle | 0.838 | 1.000 | 2.600 | 0.880 | 1.540 | 1.000 |
| Full vs high | ***4.452*** | ***0.005*** | 2.832 | 0.493 | 1.020 | 1.000 |
| Low vs middle | 0.240 | 1.000 | 2.625 | 0.828 | 3.190 | 0.192 |
| Low vs high | ***5.530*** | ***<0.001*** | 2.807 | 0.525 | 0.631 | 1.000 |
| Middle vs high | ***5.291*** | ***<0.001*** | ***5.432*** | ***<0.001*** | 2.560 | 0.971 |
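The many p-values of exactly 1.000 in Table 2 are what a Bonferroni adjustment produces when the corrected value is capped at 1; a minimal generic sketch (an illustration of the adjustment, not the authors' statistics package):

```python
# Minimal sketch of the Bonferroni adjustment behind the capped p-values
# in Table 2 (generic illustration, not the authors' statistics software).
def bonferroni(p_values):
    """Multiply each raw p-value by the number of tests; cap at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# With six pairwise filter comparisons per spacing, a raw p of 0.2 becomes
# min(1, 6 * 0.2) = 1.0, which is why many table entries read 1.000.
```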
Discussion
The results showed that, at least for these stimuli (melodic contours with a D#4 root note, 1–3 semitone spacing, played by simulations of a piano), the middle filter band provided MCI performance as good as, if not better than, that with the full band. For some CI subjects, performance improved by 30–40 percentage points when the full signal was filtered by the middle band. However, even with bandpass filtering, mean CI performance remained much poorer than that of NH listeners. Below we discuss the results in greater detail.
For CI subjects, mean performance with the middle band was approximately 11 and 19 percentage points better than that with the full band for 1- and 2-semitone spacing, respectively. With the 1- and 2-semitone spacings, performance with the low band was similar to that with the full band, suggesting that F0 spectral cues may have dominated the full-band percept. This may explain why performance was poorer with the full than with the middle band for some subjects. Low-frequency acoustic information was upwardly shifted by CI subjects’ clinical frequency allocations, resulting in some degree of absolute mismatch between acoustic and electric frequency information. As suggested by Oxenham et al. (2004), pitch perception is weakened when temporal envelope cues are delivered to the wrong tonotopic location. The degree of absolute mismatch may have been reduced for harmonic cues in the middle band. There may also have been a relative mismatch between F0 and harmonic cues in the full band, due to the increased absolute mismatch of low-frequency F0 cues. As such, F0 and harmonic information is subject to spectral warping. Removing low-frequency information may reduce this relative mismatch and the associated spectral warping. Alternatively, the improved performance with the middle band may have been due to removing low-frequency spectral cues that dominated the full-band melodic pitch percept for some CI users (e.g., S4 and S7).
It should be noted that the low, middle, and high frequency bands were not equal in terms of octave-width. The low and high bands were each one octave wide, the middle band was two octaves wide, and the full band was four octaves wide. The filter bandwidths were chosen to preserve different amounts of F0 and harmonic information. Given most CI subjects’ frequency allocation, more channels would have been available in the middle and high bands than in the low band, with the most available in the full band. While the full band was widest and comprised nearly all the electrodes in the array, the middle band provided the best mean performance, with much smaller bandwidth and fewer electrodes included in the stimulation pattern. In clinical CI processors, the frequency allocation is generally optimized for speech recognition, with better frequency resolution around ∼1500 Hz. This distribution of analysis bands may have been better optimized for the experimental contours with the middle band than with the low or high bands. While the number of stimulated electrodes may have contributed to differences between the low, middle, and high bands, the pattern of results suggests that, for some CI subjects, the poorer performance with the full band may have been due to the dominant but limited F0 spectral cues in the low band.
There was great inter-subject variability, both in terms of overall MCI performance and in terms of the effect of the bandpass filters. While some subjects’ MCI performance greatly improved with bandpass filtering, the most musically experienced subjects (S1 and S2) performed equally well with the middle and full bands (subject S2 performed equally well with the low, middle, and full bands). Extensive musical experience before and after implantation may have allowed these listeners to better extract melodic pitch from all of the available cues in the full band. Poorer performing subjects may similarly learn to extract melodic pitch cues from the full band with experience or training. Previous studies (Galvin et al., 2007) have shown that MCI training can dramatically improve MCI performance. It is unclear whether training with the full band or gradually broadening the middle band (which provided the best acute performance) would provide the best benefit.
The effects of bandpass filtering were evaluated using very simple stimuli that do not reflect the dynamics and complexities of “real-world” music and listening conditions. For example, polyphonic music perception, timbre perception, or sound quality may not be improved by bandpass filtering. Bandpass filtering may need to be optimized for individual CI users or different pitch ranges. However, the present findings are a beneficial first step toward improving CI users’ melodic pitch perception using simple filtering of the acoustic input. It is possible that similar bandpass filtering may help to remove unwanted low-frequency noise (e.g., road noise in a car) that may interfere with speech signals. Ultimately, CI users may benefit from adjusting the acoustic frequency input “on-the-fly” to address different listening needs and conditions.
ACKNOWLEDGMENTS
The authors thank all the subjects who participated in this study, as well as Sandy Oba and Meimei Zhu for help with data collection. The authors also thank Dr. Robert V. Shannon, Dr. Diana Deutsch, and two anonymous reviewers for helpful comments. This work was supported by the National Institutes of Health, Grant No. DC004993.
References and Links
- Galvin, J., Fu, Q.-J., and Nogaki, G. (2007). “Melodic contour identification by cochlear implant listeners,” Ear Hear. 28, 302–319. doi:10.1097/01.aud.0000261689.35445.20
- Galvin, J., Fu, Q.-J., and Oba, S. (2008). “Effect of instrument timbre on melodic contour identification by cochlear implant users,” J. Acoust. Soc. Am. 124, EL189–EL195. doi:10.1121/1.2961171
- Galvin, J., Fu, Q.-J., and Oba, S. (2009). “Effect of a competing instrument on melodic contour identification by cochlear implant users,” J. Acoust. Soc. Am. 125, EL98–EL103.
- Geurts, L., and Wouters, J. (2004). “Better place coding of the fundamental frequency in cochlear implants,” J. Acoust. Soc. Am. 115, 844–852. doi:10.1121/1.1642623
- Kasturi, K., and Loizou, P. (2007). “Effect of filter spacing on melody recognition: Acoustic and electric hearing,” J. Acoust. Soc. Am. 122, EL29–EL34. doi:10.1121/1.2749078
- Laneau, J., Wouters, J., and Moonen, M. (2006). “Improved music perception with explicit pitch coding in cochlear implants,” Audiol. Neurootol. 11, 38–52. doi:10.1159/000088853
- Oxenham, A., Bernstein, J., and Penagos, H. (2004). “Correct tonotopic representation is necessary for complex pitch perception,” Proc. Natl. Acad. Sci. U.S.A. 101, 1421–1425. doi:10.1073/pnas.0306958101
- Shannon, R. V. (1983). “Multichannel electrical stimulation of the auditory nerve in man. I. Basic psychophysics,” Hear. Res. 11, 157–189. doi:10.1016/0378-5955(83)90077-1
- Shannon, R., Fu, Q.-J., and Galvin, J. (2004). “The number of spectral channels required for speech recognition depends on the difficulty of the listening situation,” Acta Oto-Laryngol., Suppl. 552, 50–54. doi:10.1080/03655230410017562
- Vandali, A., Whitford, L., Plant, K., and Clark, G. (2000). “Speech perception as a function of electrical stimulation rate: Using the Nucleus 24 cochlear implant system,” Ear Hear. 21, 608–624. doi:10.1097/00003446-200012000-00008