Speech recognition and temporal amplitude modulation processing by Mandarin-speaking cochlear implant users

Xin Luo; Qian-Jie Fu; Chao-Gang Wei; Ke-Li Cao

doi:10.1097/AUD.0b013e3181888f61

Author manuscript; available in PMC: 2009 Jul 1.

Published in final edited form as: Ear Hear. 2008 Dec;29(6):957–970. doi: 10.1097/AUD.0b013e3181888f61

PMCID: PMC2704892 NIHMSID: NIHMS104804 PMID: 18818548

Abstract

OBJECTIVE

Fundamental frequency (F0) information is important to Chinese tone and speech recognition. Cochlear implant (CI) speech processors typically provide limited F0 information via temporal envelopes delivered to stimulating electrodes. Previous studies have shown that English-speaking CI users’ speech performance is correlated with amplitude modulation detection thresholds (AMDTs). The present study investigated whether Chinese-speaking CI users’ speech performance (especially tone recognition) is correlated with temporal processing capabilities.

DESIGN

Chinese tone, vowel, consonant, and sentence recognition were measured in 10 native Mandarin-speaking CI users via clinically assigned speech processors. AMDTs were measured in the same subjects for 20- and 100-Hz AM presented to a middle electrode at 5 stimulation levels that spanned the dynamic range (DR). To further investigate the CI users’ sensitivity to temporal envelope cues, AM frequency discrimination thresholds (AMFDTs) were measured for 2 standard AM frequencies (50 and 100 Hz), presented to the same middle electrode at 30% and 70% DR with a fixed modulation depth (50%).

RESULTS

Results showed that AMDTs significantly improved with increasing stimulation level, and that individual subjects exhibited markedly different AMDT functions. AMFDTs also improved with increasing stimulation level, and were better with the 100-Hz standard AM frequency than with the 50-Hz standard AM frequency. Statistical analyses revealed that both mean AMDTs (averaged for 20- or 100-Hz AM across all stimulation levels) and mean AMFDTs (averaged for the 50-Hz standard AM frequency across both stimulation levels) were significantly correlated with tone, consonant, and sentence recognition scores, but not with vowel recognition scores. Mean AMDTs were also significantly correlated with mean AMFDTs.

CONCLUSIONS

These preliminary results, obtained from a limited number of subjects, demonstrate the importance of temporal processing to CI speech recognition. The results further suggest that CI users’ Chinese tone and speech recognition may be improved by enhancing temporal envelope cues delivered by speech processing algorithms.

I. INTRODUCTION

During the last two decades, cochlear implant (CI) users’ overall speech performance has steadily improved alongside advances in implant and speech processor design. Today, many CI users are capable of good speech understanding in quiet, using auditory-only speech cues. However, there remains considerable variability in CI patient outcomes, even among patients implanted with the same device and fit with the same speech processing strategy. It is important to understand factors that may underlie this variability in CI patient outcomes, as such understanding may help to optimize speech processors for individual CI patients and guide further developments in implant design and signal processing.

Individual CI patient factors may contribute to differences in patient outcomes. These factors include etiology of deafness, age at onset of profound hearing loss, duration of hearing loss before implantation, experience with the implant device, the location and health of surviving auditory neurons, the location and insertion depth of the electrode array, etc. Psychophysical measures of spectral and temporal resolution with simple stimuli may partly reflect the extent to which speech information can be transmitted and received by CI users via clinically assigned speech processors (e.g., Shannon, 1983; Shannon, 1989; Shannon, 1992; Busby et al., 1993; Cazals et al., 1994; Zwolan et al., 1997; Busby and Clark, 1999; Donaldson and Nelson, 2000; Fu, 2002). However, correlations between CI patients’ psychophysical limits and speech performance have often been weak, inconsistent, or non-existent.

Previous studies have investigated the relationship between speech performance and sensitivity to place cues in CI subjects, but have yielded somewhat contradictory and confusing results. For example, Zwolan et al. (1997) found significant inter-subject variability in CI users’ single-electrode discrimination. When poorly discriminated electrodes were removed from the clinically assigned speech processor, monosyllabic word and sentence recognition scores were better than those with the clinically assigned speech processor (which contained all available electrodes). However, CI subjects’ electrode discrimination performance was not significantly correlated with speech recognition scores, or with the improvement in speech performance when speech processors were optimized by removing electrodes that were not discriminable. Donaldson and Nelson (2000) observed a significant correlation between place-pitch sensitivity and consonant place-cue perception in Nucleus-22 implant patients who used the SPEAK strategy, but not in patients who used the MPEAK strategy. These contradictory results may have been due to the enhanced representations of spectral envelope cues in the SPEAK strategy compared to those provided by the MPEAK strategy (Skinner et al., 1994). Thus, CI patients’ spectral resolution, as measured by simple single-electrode discrimination, may not predict speech performance, especially when differences in speech processing strategies are considered.

Measures of CI users’ temporal processing have also produced mixed results, in relation to speech performance. For example, Shannon (1989) found that CI users’ gap detection thresholds were 20–50 ms for low stimulation levels and 2–5 ms for high stimulation levels, similar to those of normal-hearing (NH) subjects listening to acoustic stimuli presented at comparable loudness levels. While relatively little inter-subject variability was observed in CI subjects’ gap detection thresholds, great inter-subject variability was observed in subjects’ speech performance, indicating that CI speech performance was not well predicted by gap detection thresholds. In contrast, Busby and Clark (1999) examined gap detection in early-deafened CI subjects who were implanted later in life (the mean duration of deafness before implantation was ~8 years). Some subjects exhibited very large gap detection thresholds that ultimately limited access to gross temporal envelope cues in speech signals. For these subjects, gap detection thresholds were significantly correlated with age at onset of deafness, as well as with audio-visual sentence recognition scores and corresponding lip-reading enhancements. Busby and Clark (1999) argued that better gap detection may help CI subjects to better perceive stop consonants and word boundaries, and in turn, obtain better audio-visual sentence recognition scores.

While temporal gap detection might reflect listeners’ abilities to detect timing cues associated with stop consonants and/or boundaries of words in sentences, sensitivity to amplitude variations over time (within or across different frequency bands) may provide a more direct and comprehensive comparison to speech perception. As described by Rosen (1992), low-frequency amplitude envelopes (<50 Hz) are important for perceiving prosodic and segmental speech cues, while periodicity fluctuations (50–500 Hz) contain important temporal envelope pitch cues. Studies (e.g., Smith et al., 2002) have also shown that temporal fine structure (>500 Hz) is important for more challenging listening tasks, e.g., speech understanding in noise, sound localization, and music perception. Contemporary multi-channel CI speech processors typically transmit temporal envelope information extracted from individual frequency bands to corresponding implanted electrodes (e.g., Wilson et al., 1991). Because different frequency bands contain different temporal envelopes at different amplitudes, amplitude modulation detection thresholds (AMDTs) may be a relevant measure of temporal processing in relation to speech perception.

Shannon (1992) found that CI users’ temporal modulation transfer functions (TMTFs; AMDTs as a function of modulation frequency) showed low-pass filter characteristics similar to those of NH listeners (e.g., Viemeister, 1979), but with relatively higher low-pass cut-off frequencies (140 Hz for CI users versus 70 Hz for NH listeners). While modulation sensitivity is relatively homogeneous across NH listeners, there is significant variability in modulation sensitivity among CI users. For example, Busby et al. (1993) found that pre-lingually deafened but late-implanted CI users had elevated AMDTs, relative to post-lingually deafened CI users. They also found greater inter-subject variability in modulation sensitivity for pre-lingually deafened, late-implanted CI users than for post-lingually deafened CI users. CI users may also differ in terms of modulation sensitivity at different modulation frequencies, which may contribute to the variability in speech performance. For example, Cazals et al. (1994) found that post-lingually deafened CI users’ consonant and vowel recognition scores were significantly correlated with the deficit in modulation sensitivity at higher modulation frequencies (i.e., >400 Hz), rather than overall modulation sensitivity across all modulation frequencies. Modulation sensitivity is also known to vary according to loudness (e.g., Shannon, 1992), and CI users may differ in terms of modulation sensitivity at different stimulation levels. For example, Fu (2002) found that post-lingually deafened CI users’ AMDTs were significantly correlated with phoneme recognition scores when averaged across each subject’s entire electric dynamic range (DR), rather than at a single stimulation level. Modulation sensitivity has also been shown to vary with electrode location (e.g., Pfingst et al., 2008) and carrier pulse rate (e.g., Galvin and Fu, 2005; Pfingst et al., 2007).

Correlation analyses from previous studies indicate the importance of temporal speech features and CI users’ temporal processing capabilities for speech recognition. The contribution of temporal speech cues is especially strong for tonal languages such as Mandarin Chinese. In general, Mandarin-speaking CI users’ Chinese tone recognition performance is moderate, ranging from 50% to 70% correct (e.g., Fu et al., 2004; Wei et al., 2004). Fundamental frequency (F0) cues are important for Chinese tone recognition (e.g., Liang, 1963; Lin, 1988). However, the coarse spectral resolution provided by most CIs does not adequately encode F0 information. Thus, CI users’ tone recognition relies more strongly on temporal envelope and periodicity cues found within frequency channels (e.g., Fu et al., 1998; Fu and Zeng, 2000; Xu et al., 2002; Luo and Fu, 2004). Wei et al. (2004) found that CI users’ pulse rate discrimination thresholds were significantly correlated with tone recognition, and only marginally correlated with sentence recognition, possibly because of the small number of subjects in the study (5 CI users). In a more recent study, Wei et al. (2007) compared Chinese tone recognition performance to gap detection and frequency discrimination thresholds in 17 Mandarin-speaking CI users, and found that Chinese tone recognition in quiet was correlated with frequency discrimination thresholds only at 1000 Hz. Note that in Wei et al. (2007), gap detection and frequency discrimination thresholds were both measured acoustically via CI subjects’ clinically assigned processors, and the speech processors’ automatic gain control, acoustic frequency-to-electrode allocation, and stimulation rate may have influenced measures of psychophysical sensitivity. Also note that CI subjects’ acoustic frequency discrimination via clinical speech processors is likely to involve both spectral and temporal processing. 
When an acoustic sine wave is input to a multi-channel clinical speech processor, more than one electrode is typically stimulated. CI subjects may use changes in the cross-channel stimulation pattern to discriminate acoustic input frequencies. Because of the limits of CI temporal processing, these spectral envelope cues may be more strongly used for discrimination of relatively high acoustic frequencies (e.g., >1000 Hz). In contrast, CI subjects may be able to better use within-channel temporal cues for discrimination of relatively low acoustic frequencies (e.g., <300 Hz). In the present study, CI users’ modulation sensitivity was measured using a research interface, thereby bypassing the clinically assigned speech processor. CI research interfaces have been used extensively for psychophysical measurements. By bypassing the clinical speech processor and its associated processing of the acoustic input, psychophysical measurements made via a research interface may better reflect general subject factors such as neural survival conditions, which are independent of speech processing strategies.

In the present study, Chinese tone, vowel, consonant, and sentence recognition were measured in 10 native Mandarin-speaking post-lingually deafened CI users. To recognize speech patterns (especially tonal patterns), CI users must not only detect the presence of AM, but also detect changes in AM frequency. Thus, AMDTs were measured in the same subjects for low-frequency amplitude envelopes (20-Hz AM) and periodicity fluctuations (100-Hz AM); AMDTs were measured at 5 stimulation levels (10, 30, 50, 70, and 90% DR). AM frequency discrimination thresholds (AMFDTs) were also measured in the same subjects; AMFDTs were measured for 2 standard AM frequencies (50 and 100 Hz) at 2 stimulation levels (30% and 70% DR). To evaluate the contribution of temporal amplitude modulation processing to speech perception, in each subject, AMDTs and AMFDTs were compared to performance in each of the speech recognition tasks. AMDTs and AMFDTs were also directly compared in each subject to see whether better overall modulation sensitivity provided better sensitivity to changes in modulation rate.

II. METHODS

A. Subjects

Ten post-lingually deafened native Mandarin-speaking Nucleus-24 CI users (6 males and 4 females) participated in the present study. All subjects were implanted at the Cochlear Implant Center, Peking Union Medical College Hospital. Table 1 shows the relevant demographic details for these subjects. Subjects were paid for their participation and informed consent was obtained from all subjects.

Table 1.

Relevant demographic details for the cochlear implant subjects. Note that ACE stands for Advanced Combination Encoding, and stimulation rate is in pulses per second per electrode.

Subject	Age (years)	Gender	Etiology	Duration of deafness (years)	Processing strategy (stimulation rate)	Years with prosthesis
S1	53	M	Ototoxicity	15	ACE (900)	0.33
S2	23	F	Unknown	12	ACE (900)	2.75
S3	51	M	Unknown	3	ACE (900)	0.75
S4	32	F	Unknown	4	ACE (900)	1.66
S5	48	M	Sudden Hearing Loss	0.8	ACE (900)	0.33
S6	28	F	Parotitis	18	ACE (900)	4.25
S7	29	M	Ototoxicity	11	ACE (1200)	2.00
S8	39	F	Unknown	4	ACE (1200)	1.50
S9	26	M	Large Vestibular Aqueduct Syndrome	6	ACE (900)	0.83
S10	41	M	Cochlear Ossification	27	ACE (1200)	5.00

B. Chinese Speech Recognition Tests

Chinese tone, vowel, and consonant recognition were measured using speech stimuli derived from the ‘Chinese Standard Database’ (Wang, 1993). For Chinese tone recognition, 6 Mandarin Chinese single-vowel syllables (/a/, /o/, /e/, /i/, /u/, /ü/ in Pinyin; [A], [o], [ɣ], [i], [u], [y] in International Phonetic Alphabet, IPA) were produced by 2 male and 2 female talkers according to 4 tones (Tone 1 - flat, Tone 2 - rising, Tone 3 - falling-rising, Tone 4 - falling), resulting in a total of 96 tokens. For vowel recognition, 12 Mandarin Chinese single- and combined-vowel syllables (/a/, /o/, /e/, /i/, /u/, /ü/, /ai/, /ao/, /ou/, /an/, /ang/, /en/ in Pinyin; [A], [o], [ɣ], [i], [u], [y], [ai], [ɑu], [ou], [an], [ɑŋ], [ən] in IPA) were produced twice by 2 male and 2 female talkers according to Tone 1, resulting in a total of 96 tokens. For consonant recognition, 16 Mandarin Chinese initial consonants (/b/, /c/, /d/, /f/, /g/, /h/, /k/, /l/, /m/, /n/, /p/, /s/, /t/, /w/, /y/, /z/ in Pinyin; [b], [ts‘], [t], [f‘], [k], [x], [k‘], [l], [m], [n], [b‘], [s], [t‘], [w], [j], [ts] in IPA) were produced by 2 male and 2 female talkers in a consonant-/a/ context (/a/ was produced according to Tone 1), resulting in a total of 64 tokens. All phoneme stimuli were lexically meaningful in Mandarin Chinese. A 16-bit A/D converter at a 16-kHz sampling rate (without high frequency pre-emphasis) was used to digitize the stimuli. Chinese sentence recognition was measured using the Mandarin Hearing in Noise Test (HINT; Soli, 2003). One male talker produced 240 everyday Chinese sentences of easy to moderate difficulty (10 key words per sentence). The sentence database was divided into 12 lists (20 sentences per list). All sentence stimuli were digitized using a 16-bit A/D converter at a 24-kHz sampling rate (without high frequency pre-emphasis).

During all speech recognition tests, CI subjects were tested while listening with their clinically assigned speech processors and microphone/sensitivity settings (set for conversational speech levels). Subjects were instructed to not change any settings during the experiments. Subjects were seated in a double-walled sound-treated booth and listened to the stimuli presented in sound field over a single loudspeaker. The presentation level was fixed at 65 dBA. Closed-set identification tasks were used to measure tone (4-choices), vowel (12-choices), and consonant (16-choices) recognition. The corresponding stimulus set was presented to each subject only once for each of the tone, vowel, and consonant recognition tasks. In each trial, a stimulus token was randomly selected (without replacement) from within the stimulus set and presented to the subject; subjects responded by clicking on one of the response choices shown on the screen. Responses were collected and scored in terms of percent correct. For open-set Chinese sentence recognition, a sentence list with 20 sentences was randomly selected from the Mandarin HINT for each subject. In each trial, a sentence was randomly selected (without replacement) from within the list and presented to the subject. Subjects were instructed to repeat the sentence as accurately as possible. Responses were collected and scored in terms of key word percent correct. No preview, training, or feedback was provided for any speech recognition test.

C. Amplitude Modulation Detection and Modulation Frequency Discrimination Tests

All stimuli were delivered via a custom research interface (HEINRI; Shannon et al., 1990; Wygonski and Robert, 2001). Stimuli were 300-ms, biphasic pulse trains. The stimulation rate was fixed at 2000 pulses per second (pps). This relatively high carrier rate (i.e., higher than those typically used in clinical processors) was used to avoid possible aliasing effects with amplitude modulation. For each pulse, the phase duration was 50 µs, and the inter-phase gap was 8 µs. Before beginning the AMDT and AMFDT experiments, the electric DR was estimated in each subject for electrode 10 (MP1+2 stimulation mode), using steady-state unmodulated stimuli. A counting method was used to estimate absolute detection thresholds, similar to clinical fitting procedures. Beginning at a sub-threshold level, the current amplitude was increased until subjects were able to correctly count the number of stimuli; the amplitude was then reduced until subjects could no longer correctly count the number of stimuli. These ascending and descending sequences were repeated several times, and absolute detection thresholds were calculated as the average amplitude across these reversals. A method of limits was used to measure maximum comfortable levels (MCLs), defined as the maximum stimulation level that subjects could comfortably listen to for an extended period of time (e.g., during an experiment). Subjects pressed a mouse button to slowly increase the current amplitude until achieving MCL; MCLs were averaged across several measures to ensure reliable levels. The estimated DR was calculated as the difference in current level (in linear µA) between threshold and MCL.
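
As an illustration of how a reference level expressed in percent DR maps onto a current amplitude, the following minimal Python sketch interpolates linearly between threshold and MCL in µA, as described above. The function name and the example threshold and MCL values are hypothetical and are not taken from the study.

```python
def level_at_percent_dr(threshold_ua, mcl_ua, percent_dr):
    """Return the current (in linear microamps) at a given percent of the DR.

    The DR is taken as MCL minus threshold, both in linear microamps.
    """
    if mcl_ua <= threshold_ua:
        raise ValueError("MCL must exceed threshold")
    if not 0.0 <= percent_dr <= 100.0:
        raise ValueError("percent_dr must lie between 0 and 100")
    dynamic_range = mcl_ua - threshold_ua
    return threshold_ua + (percent_dr / 100.0) * dynamic_range


# Hypothetical subject: threshold 100 uA, MCL 500 uA (illustrative values only).
for percent in (10, 30, 50, 70, 90):
    print(percent, level_at_percent_dr(100.0, 500.0, percent))
```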

1. AMDT

AMDTs were measured for electrode 10 for 2 modulation frequencies (20 and 100 Hz) at 5 stimulation levels (10, 30, 50, 70, and 90% of the estimated DR, calculated in linear µA), resulting in a total of 10 experimental conditions. For AM stimuli, the reference pulse amplitude was modulated by a 20- or 100-Hz sine wave, with the starting phase fixed at 180°. AMDTs were measured using an adaptive three-alternative forced-choice (3AFC) procedure (3-down/1-up), converging on the modulation depth that produced 79.4% correct responses (Levitt, 1971). In each trial, there were 3 stimulation intervals, 2 of which (randomly selected) contained steady-state, unmodulated stimuli and the other contained the AM stimulus. Subjects were asked to choose which interval was different, and the modulation depth of the AM stimulus was adjusted according to subject response. No feedback was provided. The starting value of the modulation depth was adjusted for the different reference stimulation levels, and was set to ensure that subject responses for the first 3 trials were always correct. The step size of the modulation depth was also adjusted for the different experimental conditions and for individual subjects, and was reduced to half of the initial step size after the first 4 reversals. The adaptive run terminated after 12 reversals, or after 60 trials with a minimum of 8 reversals. AMDTs were calculated as the average modulation depth across the final 8 reversals. The test order of experimental conditions was randomized across subjects.
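
The 79.4% convergence point follows from the 3-down/1-up rule: the track is stationary where three consecutive correct responses are as likely as not, i.e., p³ = 0.5, so p ≈ 0.794 (Levitt, 1971). The Python sketch below illustrates the tracking and stopping rules described above (step size halved after 4 reversals, termination after 12 reversals or after 60 trials with at least 8 reversals, threshold taken as the mean of the final 8 reversals). It is not the software used in the study; the function names, the deterministic simulated listener, and all numerical values are assumptions made for illustration only.

```python
def run_staircase(respond, start, step, n_down=3, max_reversals=12,
                  max_trials=60, min_reversals=8, n_average=8, halve_after=4):
    """Run an n_down-down/1-up adaptive track and return the threshold estimate.

    respond(value) must return True for a correct response at that value.
    Returns None if fewer than min_reversals reversals occur within max_trials.
    """
    value = start
    correct_run = 0
    direction = 0  # 0 = not yet moved, -1 = moving down, +1 = moving up
    reversals = []
    for _ in range(max_trials):
        if respond(value):
            correct_run += 1
            if correct_run < n_down:
                continue  # stay at this value until n_down correct in a row
            correct_run = 0
            new_direction = -1
        else:
            correct_run = 0
            new_direction = +1
        if direction != 0 and new_direction != direction:
            reversals.append(value)  # the track changed direction here
            if len(reversals) == halve_after:
                step /= 2.0
        direction = new_direction
        value += new_direction * step
        if len(reversals) >= max_reversals:
            break
    if len(reversals) < min_reversals:
        return None
    return sum(reversals[-n_average:]) / n_average


def ideal_listener(value, true_threshold=10.0):
    """Hypothetical listener: always correct above threshold, wrong otherwise."""
    return value > true_threshold


estimate = run_staircase(ideal_listener, start=20.0, step=4.0)
print(estimate)
```

A real listener responds probabilistically, so the track wanders around the 79.4% point instead of alternating between two fixed values as it does for this idealized listener.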

2. AMFDT

AMFDTs were measured for 2 standard AM frequencies (50 and 100 Hz) at 2 stimulation levels (30% and 70% DR), resulting in a total of 4 experimental conditions. For AM stimuli, the reference pulse amplitude was modulated by a sine wave, with the starting phase fixed at 180°. The standard AM frequency was varied according to the different experimental conditions, and the modulation depth was fixed at 50%, which was well above the AMDTs for the 2 standard AM frequencies at the 2 stimulation levels (see the AMDT results). All other stimulation parameters were the same as for the AMDT experiments. Before beginning the AMFDT experiments, the AM stimuli with the 2 standard frequencies at both stimulation levels were loudness-balanced to steady-state, unmodulated stimuli presented at the corresponding stimulation level; this loudness-balancing procedure helped to equate loudness across the different standard AM frequencies. A 2AFC, double-staircase procedure (Jesteadt, 1980; Zeng and Turner, 1991) was used; depending on the sequence, the adaptation rule was 2-down/1-up or 2-up/1-down. The standard stimulus was a steady-state, unmodulated pulse train, and the reference amplitude of the AM stimulus was adjusted according to subject response (0.8 dB step size for the first 4 reversals and 0.4 dB step size thereafter). The sequences terminated after 12 reversals, or after 60 trials with a minimum of 8 reversals. The reference amplitudes of the final 8 reversals were averaged for each sequence; the mean values from both sequences were then averaged to obtain the loudness-balanced reference amplitudes for the AM stimuli used in the AMFDT experiments. When measuring AMFDTs, it is possible that the loudness of the probe AM stimulus may change with its modulation frequency; this loudness cue may be used to discriminate AM stimuli if the modulation frequency difference is sufficiently large. One way to reduce such loudness cues is to apply amplitude roving to AM stimuli. 
However, Chatterjee and Peng (2008) found that while amplitude roving of ±0.5 dB made the modulation frequency discrimination task more difficult, AMFDTs for 50- and 100-Hz standard frequencies were not significantly affected by amplitude roving (or the lack thereof). Therefore, no amplitude roving was used in our AMFDT measurements.
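
For illustration, the per-pulse amplitudes of a sinusoidally amplitude-modulated pulse train with the parameters given above (2000 pps, 300 ms, 180° starting phase, 50% modulation depth) might be computed as in the following Python sketch. The manuscript does not spell out the modulation formula; the form A·(1 + m·sin(2πft + φ)), the function name, and the 200-µA reference amplitude are assumptions.

```python
import math


def am_pulse_amplitudes(reference_ua, am_hz, depth, rate_pps=2000,
                        duration_s=0.3, phase_deg=180.0):
    """Return one amplitude per pulse for a sinusoidally modulated pulse train.

    depth is the modulation index m (0.5 for 50% modulation), so amplitudes
    range from reference * (1 - m) to reference * (1 + m).
    """
    n_pulses = round(rate_pps * duration_s)
    phase = math.radians(phase_deg)
    amplitudes = []
    for k in range(n_pulses):
        t = k / rate_pps  # onset time of pulse k in seconds
        modulator = math.sin(2.0 * math.pi * am_hz * t + phase)
        amplitudes.append(reference_ua * (1.0 + depth * modulator))
    return amplitudes


# Hypothetical 200-uA reference amplitude with a 50-Hz modulator at 50% depth.
amps = am_pulse_amplitudes(200.0, am_hz=50, depth=0.5)
print(len(amps), min(amps), max(amps))
```

At a 2000-pps carrier rate, a 50-Hz modulator is sampled by 40 pulses per cycle and a 100-Hz modulator by 20 pulses per cycle.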

AMFDTs were measured using an adaptive 3AFC procedure (3-down/1-up), converging on the modulation frequency that produced 79.4% correct responses (Levitt, 1971). In each trial, there were 3 stimulation intervals, 2 of which (randomly selected) contained the reference AM stimuli and the other contained the probe AM stimulus. Note that the probe AM frequency was always higher than the reference AM frequency. Subjects were asked to choose which interval was different, and the probe AM frequency was adjusted according to subject response. No feedback was provided. The starting value of the probe AM frequency was adjusted for the different reference AM frequencies and stimulation levels, and was set to ensure that subject responses for the first 3 trials were always correct. The linear step size of the probe AM frequency was also adjusted according to the experimental conditions, and was reduced after the first 4 reversals. The adaptive run terminated after 12 reversals, or after 60 trials with a minimum of 8 reversals. The probe AM frequencies across the final 8 reversals were averaged. AMFDTs were calculated as the ratio between the probe AM frequency difference limen (DL) and the reference AM frequency (i.e., the Weber fraction). The test order of experimental conditions was randomized across subjects.
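
Written out, the threshold computation described above amounts to the following, where the symbol names are introduced here for illustration and the 60-Hz mean probe frequency is a hypothetical value, not a result from the study:

```latex
\[
\mathrm{AMFDT} \;=\; \frac{\Delta F}{F_{\mathrm{ref}}}
\;=\; \frac{\bar{F}_{\mathrm{probe}} - F_{\mathrm{ref}}}{F_{\mathrm{ref}}},
\qquad \text{e.g.,}\quad
\frac{60\ \mathrm{Hz} - 50\ \mathrm{Hz}}{50\ \mathrm{Hz}} = 0.2 .
\]
```

Here \(\bar{F}_{\mathrm{probe}}\) is the probe AM frequency averaged across the final 8 reversals and \(F_{\mathrm{ref}}\) is the reference (standard) AM frequency, so a smaller value indicates finer modulation frequency discrimination.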

III. RESULTS

A. Chinese Speech Recognition

Figure 1 shows tone, vowel, consonant, and sentence recognition scores for individual subjects, as well as averaged across all subjects. Note that for all speech tests, there was great inter-subject variability in recognition performance. Mean tone recognition was 60.6% correct, while mean vowel, consonant, and sentence recognition were 68.9%, 52.8%, and 79.2% correct, respectively.

Figure 1.

Individual subject and mean Chinese speech recognition scores. The error bars represent one standard deviation.

Table 2 shows the multiple pair-wise Pearson correlation coefficients and significance levels (with Bonferroni correction) between the different speech recognition tests. All pair-wise correlations were significant or approached significance, except that between tone and vowel recognition.

Table 2.

Multiple pair-wise Pearson correlation coefficients and significance levels (with Bonferroni correction) between the different speech recognition tests.

	Vowel Tests	Consonant Tests	Sentence Tests
Tone Tests	r²=0.51, p=0.123	r²=0.68, p=0.019	r²=0.68, p=0.019
Vowel Tests		r²=0.59, p=0.054	r²=0.86, p=0.001
Consonant Tests			r²=0.77, p=0.006
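
The p values in Table 2 can be approximately reproduced from the tabulated r² values. The Python sketch below is an illustration under assumptions that are not stated in the text: a two-tailed test of the Pearson correlation with n = 10 subjects (8 degrees of freedom) and a Bonferroni factor of 6, one for each pair-wise comparison. The function names are hypothetical, and the numerical integration is a simple stand-in for a statistics library.

```python
import math


def pearson_p_two_tailed(r_squared, n, steps=2000):
    """Two-tailed p value for a Pearson correlation from n paired observations.

    Uses p = I_x(df/2, 1/2) with x = 1 - r^2 and df = n - 2, where I_x is the
    regularized incomplete beta function, evaluated with Simpson's rule.
    """
    if n < 4:
        raise ValueError("this sketch assumes n >= 4")
    if not 0.0 < r_squared < 1.0:
        raise ValueError("r_squared must lie strictly between 0 and 1")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    a = (n - 2) / 2.0
    x = 1.0 - r_squared

    def integrand(u):
        return u ** (a - 1.0) / math.sqrt(1.0 - u)

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    integral = total * h / 3.0
    beta = math.gamma(a) * math.gamma(0.5) / math.gamma(a + 0.5)
    return integral / beta


def bonferroni(p_value, n_comparisons):
    """Bonferroni-corrected p value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


# r^2 values copied from Table 2; n = 10 and 6 comparisons are assumptions.
for r2 in (0.51, 0.68, 0.59, 0.86, 0.77):
    print(r2, round(bonferroni(pearson_p_two_tailed(r2, 10), 6), 3))
```

Under these assumptions, r² = 0.68 gives a corrected p of about 0.02 and r² = 0.51 gives about 0.12, close to the tabulated 0.019 and 0.123; small differences are expected because the tabulated r² values are rounded.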

B. AMDT

Figure 2 shows individual subjects’ AMDTs for 20-Hz AM (left panel) and 100-Hz AM (right panel), as a function of the reference stimulation level (in percent DR). There was large inter-subject variability in overall modulation sensitivity. For example, for 20-Hz modulation, subject S6 had the best modulation detection threshold among all subjects at 10% DR; however, AMDTs only slightly improved with stimulation level, saturating at 50% DR. In contrast, subject S10’s AMDTs continuously improved with stimulation level; thus, overall modulation sensitivity was much better for S10 than for S6. A two-way repeated measures analysis of variance (RM ANOVA) showed significant effects for both AM frequency [F(1,36)=5.8, p=0.039] and stimulation level [F(4,36)=62.4, p<0.001]; there was no significant interaction between AM frequency and stimulation level [F(4,36)=1.7, p=0.175, power of analysis: 0.20]. Post hoc Bonferroni t-tests showed that AMDTs were significantly better for 20-Hz AM than for 100-Hz AM. Also, AMDTs significantly improved as the stimulation level was increased (p<0.03), except from 70% to 90% DR.

Figure 2.

Individual subjects’ AM detection thresholds for 20-Hz AM (left panel) and 100-Hz AM (right panel), as a function of the stimulation level (in percent DR).

C. AMFDT

Figure 3 shows individual subjects’ AMFDTs (in ΔF / F) obtained at 30% DR (left panel) and 70% DR (right panel), as a function of the standard AM frequency. Similar to AMDTs, there was great inter-subject variability in AMFDTs, in terms of overall sensitivity and the slope of the functions. At both stimulation levels, AMFDTs were generally lower with the 100-Hz standard frequency than with the 50-Hz standard frequency. Note that at 30% DR, only 6 of the 10 subjects were able to perform the modulation frequency discrimination task with the 100-Hz standard frequency, while at 70% DR, all subjects except S8 were able to perform the task with the 100-Hz standard frequency. Therefore, AMFDTs were successfully obtained in only 6 of the 10 subjects for the 50- and 100-Hz standard frequencies at 30% and 70% DR. These data were analyzed using a two-way RM ANOVA, which showed significant effects for stimulation level [F(1,5)=6.8, p=0.048], but not for standard frequency [F(1,5)=6.1, p=0.056, power of analysis: 0.45]. There was no significant interaction between stimulation level and standard frequency [F(1,5)=1.7, p=0.250, power of analysis: 0.10].

Figure 3.

Individual subjects’ AM frequency discrimination thresholds obtained at 30% DR (left panel) and 70% DR (right panel), as a function of the standard AM frequency.

D. Speech Recognition vs. AMDT

CI subjects’ Chinese tone, vowel, consonant, and sentence recognition scores were correlated with their mean 20- and 100-Hz AMDTs (averaged across all stimulation levels). Figure 4 shows individual subjects’ speech performance as a function of their mean 20-Hz AMDTs; the solid lines show the linear regressions between the different speech recognition scores and the mean 20-Hz AMDTs. Mean 20-Hz AMDTs were significantly correlated with Chinese tone [r²=0.628, p=0.006], consonant [r²=0.506, p=0.021], and sentence recognition scores [r²=0.571, p=0.012], but not with vowel recognition scores [r²=0.332, p=0.081, power of analysis: 0.41]. Figure 5 shows individual subjects’ speech performance as a function of their mean 100-Hz AMDTs; the solid lines show the linear regressions between the different speech recognition scores and the mean 100-Hz AMDTs. Mean 100-Hz AMDTs were significantly correlated with Chinese tone [r²=0.528, p=0.017], consonant [r²=0.465, p=0.030], and sentence recognition scores [r²=0.545, p=0.015], but not with vowel recognition scores [r²=0.260, p=0.132, power of analysis: 0.32]. This pattern of correlation with speech performance was very similar to that of 20-Hz AMDTs. If anything, speech performance for these subjects was slightly better correlated with 20-Hz AMDTs than with 100-Hz AMDTs.
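
The regression lines and r² values reported here relate each speech score to a per-subject mean threshold. A minimal Python sketch of that kind of computation follows, assuming an ordinary least-squares fit (the fitting method is not named in the text). The function names are hypothetical and the example numbers are invented for illustration; they are not data from the study.

```python
def mean_threshold(thresholds_by_level):
    """Average one subject's thresholds across the tested stimulation levels."""
    return sum(thresholds_by_level) / len(thresholds_by_level)


def linear_fit(xs, ys):
    """Return (slope, intercept, r_squared) of a least-squares line y = a*x + b."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("x and y must each vary across subjects")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, (sxy * sxy) / (sxx * syy)


# Invented example: thresholds at 5 levels for 4 hypothetical subjects,
# paired with invented percent-correct scores.
thresholds = [[-10, -14, -18, -20, -21], [-6, -9, -12, -15, -16],
              [-12, -18, -22, -25, -26], [-4, -6, -9, -11, -12]]
scores = [62.0, 51.0, 74.0, 45.0]
mean_values = [mean_threshold(t) for t in thresholds]
print(linear_fit(mean_values, scores))
```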

Figure 4.

Chinese speech recognition scores as a function of the mean 20-Hz AMDTs (averaged across the entire DR). The solid lines show the linear regressions between the different speech recognition scores and the mean 20-Hz AMDTs.

Figure 5.

Chinese speech recognition scores as a function of the mean 100-Hz AMDTs (averaged across the entire DR). The solid lines show the linear regressions between the different speech recognition scores and the mean 100-Hz AMDTs.

E. Speech Recognition vs. AMFDT

CI subjects’ Chinese tone, vowel, consonant, and sentence recognition scores were also correlated with their mean 50- and 100-Hz AMFDTs. AMFDTs with the 50-Hz standard frequency were successfully obtained in all 10 CI subjects at both stimulation levels (30% and 70% DR). Figure 6 shows individual subjects’ speech performance as a function of their mean 50-Hz AMFDTs (averaged across both stimulation levels); the solid lines show the linear regressions between the different speech recognition scores and the mean 50-Hz AMFDTs. Similar to mean AMDTs, mean 50-Hz AMFDTs were significantly correlated with Chinese tone [r²=0.396, p=0.050], consonant [r²=0.461, p=0.031], and sentence recognition scores [r²=0.455, p=0.032], but not with vowel recognition scores [r²=0.342, p=0.076, power of analysis: 0.43]. AMFDTs with the 100-Hz standard frequency were successfully obtained in only 9 subjects at 70% DR, and in only 6 subjects at 30% DR. Therefore, Figure 7 shows only 9 subjects’ speech performance as a function of their 100-Hz AMFDTs (obtained at 70% DR); the solid lines show the linear regressions between the different speech recognition scores and the 100-Hz AMFDTs. Different from the mean 50-Hz AMFDTs, 100-Hz AMFDTs (obtained at 70% DR) were not significantly correlated with Chinese tone [r²=0.184, p=0.249, power of analysis: 0.20], vowel [r²=0.033, p=0.639, power of analysis: 0.07], consonant [r²=0.226, p=0.195, power of analysis: 0.24], or sentence recognition scores [r²=0.071, p=0.487, power of analysis: 0.10].

Figure 6. Chinese speech recognition scores as a function of the mean 50-Hz AMFDTs (averaged across 30% and 70% DR). The solid lines show the linear regressions between the different speech recognition scores and the mean 50-Hz AMFDTs.

Figure 7. Chinese speech recognition scores as a function of the 100-Hz AMFDTs (obtained at 70% DR). The solid lines show the linear regressions between the different speech recognition scores and the 100-Hz AMFDTs.
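As a rough consistency check on the correlation statistics reported in this section, the p value of a simple linear regression can be recovered from r² and the number of subjects. The sketch below assumes a two-sided t test on the regression slope with n − 2 degrees of freedom; the source does not state the exact test used, and the function names are illustrative only.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def regression_p_value(r_squared, n, steps=2000):
    """Return (t, p) for a simple linear regression with n data points.

    Assumption: two-sided t test with n - 2 degrees of freedom.
    The tail probability is obtained by integrating the t density
    from 0 to |t| with Simpson's rule (steps must be even).
    """
    df = n - 2
    t = math.sqrt(r_squared * df / (1 - r_squared))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= t)
    return t, 1 - 2 * central_area


# Consonant recognition vs. mean 50-Hz AMFDTs (10 subjects)
t_cons, p_cons = regression_p_value(0.461, 10)
# Tone recognition vs. 100-Hz AMFDTs at 70% DR (9 subjects)
t_tone, p_tone = regression_p_value(0.184, 9)
print(f"consonant: t={t_cons:.3f}, p={p_cons:.3f}")
print(f"tone (100 Hz): t={t_tone:.3f}, p={p_tone:.3f}")
```

Under this assumption, the first case gives a p value close to the reported 0.031 and the second a value close to the reported 0.249, which illustrates how the smaller sample (9 rather than 10 subjects) and the lower r² combine to produce a non-significant result.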

F. AMDT vs. AMFDT

Figure 8 shows individual subjects’ mean AMDTs (averaged across the entire DR and across the 20- and 100-Hz modulation frequencies) as a function of their mean AMFDTs (averaged across both stimulation levels for the 50-Hz standard frequency); the solid line shows the linear regression between the two psychophysical measures. There was a significant correlation between mean AMFDTs and mean AMDTs [r²=0.474, p=0.028]. Note that subject S7 had relatively good modulation frequency discrimination, but relatively poor modulation detection. In contrast, subject S10 had relatively good modulation detection, but relatively poor modulation frequency discrimination.

Figure 8. Mean AMDTs (averaged across 20- and 100-Hz AM, and across the entire DR) as a function of the mean 50-Hz AMFDTs (averaged across 30% and 70% DR). The solid line shows the linear regression between the two psychophysical measures.

IV. GENERAL DISCUSSION

In the present study, Chinese speech recognition and modulation sensitivity were measured in 10 Mandarin-speaking CI subjects; correlations between these measures suggest that variable speech performance is at least partly due to limitations in temporal processing. Chinese speech recognition scores were comparable to those reported in previous studies (e.g., Fu et al., 2004; Wei et al., 2004). Although current CI speech processors are not specifically designed for tonal languages, CI subjects achieved moderate levels of tone recognition (42% to 75% correct); Chinese phoneme and sentence recognition scores were similar to those of English-speaking CI users (e.g., Fu, 2002).

Consistent with the results from Fu (2002) and Pfingst et al. (2007), AMDTs for the present study ranged from −5 to −40 dB re 100%, and improved with increasing stimulation level. Interestingly, the shapes of the AMDT-versus-level functions were slightly different among the three studies. In the Fu (2002) study, AMDTs for 100-Hz AM did not significantly improve for stimulation levels beyond ~50% DR, similar to the 40-Hz AMDTs obtained with the 250 Hz carrier rate in Pfingst et al. (2007); in the present study, the 100-Hz AMDTs generally continued to improve with increasing stimulation level, even for levels greater than 50% DR (see the right panel in Figure 2), similar to the 40-Hz AMDTs obtained with the 4000 Hz carrier rate in Pfingst et al. (2007). This relatively small difference among these studies may have been due to different choices of stimulation parameters (e.g., different carrier rates and modulation frequencies), as well as relative differences in subjects’ experience with the AM detection task. Compared to the more experienced subjects in the Fu (2002) study, the subjects in the present study had little to no previous experience in psychophysical tests. It is possible that, with more testing experience, with feedback, or with explicit training, these subjects might have performed better and reached the ceiling performance at lower stimulation levels (e.g., around 50% DR). In the present study, AMDTs for 20-Hz AM were better than those for 100-Hz AM, in agreement with the low-pass filter characteristics of CI users’ typical TMTFs (e.g., Shannon, 1992).
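To give a sense of scale for thresholds expressed in dB re 100%, the sketch below converts between dB and linear modulation depth. It assumes the conventional definition, 20·log10(m), where m is the modulation index and m = 1 corresponds to 100% modulation; this section does not restate the definition, and the function names are illustrative.

```python
import math


def db_to_depth(threshold_db):
    """Convert a threshold in dB re 100% modulation to a linear modulation index.

    Assumption: dB value = 20 * log10(m), with m = 1.0 meaning 100% modulation.
    """
    return 10 ** (threshold_db / 20)


def depth_to_db(modulation_index):
    """Convert a linear modulation index (0 < m <= 1) to dB re 100% modulation."""
    return 20 * math.log10(modulation_index)


# The range of AMDTs reported in the present study
for threshold_db in (-5, -40):
    m = db_to_depth(threshold_db)
    print(f"{threshold_db} dB re 100% -> modulation depth of about {100 * m:.1f}%")
```

Under this convention, the poorest thresholds (−5 dB) correspond to modulation depths above 50%, while the best thresholds (−40 dB) correspond to a depth of about 1%.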

In contrast to the Fu (2002) study, the present study found that mean 20- or 100-Hz AMDTs across the entire DR were significantly correlated with consonant recognition scores, but not with vowel recognition scores. This result might be expected, as vowel recognition depends more on spectral envelope cues (e.g., formant frequencies), and less on temporal envelope cues. More interestingly, tone recognition was significantly correlated with mean 20- or 100-Hz AMDTs, which indirectly demonstrates the importance of temporal envelope cues to tone recognition in CIs. In contrast, Chatterjee and Peng (2008) did not find a significant correlation between CI subjects’ AMDTs and their speech intonation recognition performance (which also strongly relies on pitch information). However, they measured AMDTs only at 50% DR for several modulation frequencies from 50 to 300 Hz. Taken together, it is not surprising that Chinese sentence recognition, as a combination of phoneme and tone recognition, was also significantly correlated with mean AMDTs.

AMFDTs in the present study were higher than previously reported data. Chatterjee and Peng (2008) reported that AMFDTs (in ΔF / F) for 9 CI subjects with the 50- and 100-Hz standard AM frequencies were lower than 1.0 (measured at 50% DR and 20% modulation depth). In the present study, some subjects exhibited AMFDTs higher than 1.0 with the 50-Hz standard AM frequency, even at 70% DR. Again, subjects’ relative inexperience in psychophysical testing may have contributed to the elevated AMFDTs and performance may have improved with experience, feedback, or training. Similar to AMDTs, AMFDTs were also level dependent. Increasing loudness slightly improved CI subjects’ sensitivity to changes in AM frequency. In terms of the effect of standard AM frequency, AMFDTs were better with the 100-Hz standard frequency than with the 50-Hz standard frequency, consistent with the observations in Chatterjee and Peng (2008). The AMFDT-versus-frequency functions in Chatterjee and Peng (2008) have band-pass filter characteristics, with AMFDTs increasing for standard frequencies higher or lower than 100 Hz, which are slightly different from the low-pass filter characteristics of typical TMTFs (Viemeister, 1979; Shannon, 1992). Subject S3 in the present study also exhibited atypical modulation frequency sensitivity, as he was able to discriminate between 100-Hz and 150-Hz AM stimuli, but could not discriminate between 50-Hz and 150-Hz AM stimuli. Taken together, these results suggest that AMFDTs with the 100-Hz standard frequency may be a more pitch-related measure, compared to those with the 50-Hz standard frequency. Chatterjee and Peng (2008) have suggested that the higher AMFDTs with the 50-Hz standard AM frequency may be related to the smaller number of modulation periods available within the 300-ms stimulation interval.
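The quantities discussed in this paragraph can be illustrated with simple arithmetic. The sketch below computes the number of modulation periods within the 300-ms stimulation interval and the relative frequency difference (ΔF / F) for the stimulus pairs mentioned for subject S3. The function names are illustrative, and the code describes only the stimuli, not any subject's threshold.

```python
def modulation_periods(am_frequency_hz, duration_ms=300):
    """Number of AM periods that fit within a stimulation interval."""
    return am_frequency_hz * duration_ms / 1000


def relative_difference(standard_hz, comparison_hz):
    """Frequency difference expressed as a fraction of the standard (delta F / F)."""
    return abs(comparison_hz - standard_hz) / standard_hz


# Periods available for each standard AM frequency in the 300-ms interval
for standard_hz in (50, 100):
    periods = modulation_periods(standard_hz)
    print(f"{standard_hz}-Hz AM: {periods:.0f} periods in 300 ms")

# Stimulus pairs mentioned for subject S3
print("100 vs. 150 Hz: delta F / F =", relative_difference(100, 150))
print("50 vs. 150 Hz: delta F / F =", relative_difference(50, 150))
```

The 50-Hz standard provides 15 modulation periods within the interval, compared with 30 for the 100-Hz standard. For subject S3, the pair that could be discriminated (100 vs. 150 Hz) has a relative difference of 0.5, whereas the pair that could not (50 vs. 150 Hz) has a relative difference of 2.0, which is why this pattern is described as atypical.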

In the present study, mean AMFDTs with the 50-Hz standard AM frequency were significantly correlated with tone, consonant, and sentence recognition scores, similar to results from Chatterjee and Peng (2008), who found that CI subjects’ mean AMFDTs were significantly correlated with their intonation recognition results, albeit with exponential correlation functions. The significant correlation between speech performance and mean 50-Hz AMFDTs sheds further light on the role of temporal processing in speech recognition. It is clear that tone recognition would be enhanced by temporally cued pitch or tonal information, and Chinese sentence recognition would be improved with better tone recognition (e.g., Fu et al., 1998). The correlation between consonant recognition and mean 50-Hz AMFDTs may reflect some general aspects of temporal modulation that are important for perceiving consonants, rather than for pitch. Surprisingly, 100-Hz AMFDTs (obtained at 70% DR) were not significantly correlated with any of the speech measures. Note that in the present study, AMFDTs could not be measured for many subjects with the 100-Hz standard frequency at 30% DR, resulting in a limited data set to be correlated with speech performance. For the 9 subjects who were able to perform the modulation frequency discrimination task with the 100-Hz standard frequency at 70% DR, 100-Hz AMFDTs were relatively good, with relatively little inter-subject variability, which may have also contributed to the weaker correlation with speech performance. Interestingly, mean AMDTs were significantly correlated with mean 50-Hz AMFDTs, suggesting that CI subjects with better modulation detection sensitivity were better at detecting changes in AM frequency.

Wei et al. (2004) also showed that CI users’ stimulation rate discrimination was significantly correlated with tone recognition performance. Modulation rate DLs (as measured in the present study) and stimulation rate DLs (as measured in Wei et al., 2004) are generally similar, given that both are measures of CI users’ temporal resolution and tend to sharply elevate for standard rates above 300 Hz (e.g., Zeng, 2002; Baumann and Nobbe, 2004). However, the pitch percept elicited from amplitude modulation rate (temporal envelope pitch) is perceptually distinct from that elicited from carrier stimulation rate (temporal rate pitch); the salience of the temporal envelope pitch increases, while that of the temporal rate pitch decreases, with increasing modulation depth (McKay and Carlyon, 1999). AMFDTs have also been found to be much higher than stimulation rate DLs for standard rates between 200 and 600 Hz (Baumann and Nobbe, 2004).

It should be noted that AMDTs and AMFDTs may be affected by experimental parameters such as the carrier stimulation rate, stimulation mode, and electrode location (e.g., Galvin and Fu, 2005; Pfingst et al., 2006, 2007, 2008; Chatterjee and Peng, 2008). For example, Pfingst et al. (2008) found that AMDTs were highly variable across electrode locations for most CI subjects, suggesting that AMDTs should be measured at more than one electrode location to provide accurate estimates of CI users’ modulation sensitivity. Note also that the single-electrode psychophysical measures in the present study were independent of CI speech processing parameters that are known to significantly affect speech performance (e.g., the number and insertion depth of implanted electrodes, acoustic frequency-to-electrode allocation, etc.). In the future, the present study may be extended by testing a larger sample of subjects and evaluating the effects of feedback and/or training on the AM-related psychophysical tasks. Nonetheless, the preliminary results of the present study (in agreement with previous studies) showed that CI users’ Chinese speech recognition (especially tone recognition) performance was significantly correlated with their temporal processing capabilities, as measured on a single electrode. Based on these observations, it may be possible to enhance Chinese tone and speech recognition by training CI users to perceive relatively subtle changes in AM electrical stimulation patterns using psychophysical tasks. Another approach (albeit with only limited success to date) would be to develop CI signal processing strategies to enhance temporal envelope cues (e.g., Geurts and Wouters, 2001; Green et al., 2004, 2005; Luo and Fu, 2004; Vandali et al., 2005, 2007; Laneau et al., 2006; Hamilton et al., 2007) or to improve the neural response to temporal envelope fluctuations (e.g., Rubinstein et al., 1999).

V. CONCLUSIONS

In the present study, Chinese speech recognition, AMDTs, and AMFDTs were measured in 10 Mandarin-speaking CI subjects. AMDTs significantly improved with increasing stimulation level, and individual subjects exhibited markedly different AMDT functions. AMFDTs also improved with increasing stimulation level, and were better with the 100-Hz standard frequency than with the 50-Hz standard frequency. Mean AMDTs (averaged for 20- or 100-Hz AM across the entire DR) and mean AMFDTs (averaged for the 50-Hz standard AM frequency across 30% and 70% DR) were significantly correlated with tone, consonant, and sentence recognition scores, but not with vowel recognition scores. These preliminary results further confirm the importance of temporal envelope cues to CI users’ Chinese speech recognition (especially tone recognition).

ACKNOWLEDGMENTS

We are grateful to all subjects for their participation in these experiments. We thank John J. Galvin III for editorial assistance. We would also like to thank Dr. Gail S. Donaldson and two anonymous reviewers for their constructive comments on an earlier version of this paper. Research was supported in part by NIH (R01-DC004993 and R03-DC008192).

REFERENCES

Baumann U, Nobbe A. Pulse rate discrimination with deeply inserted electrode arrays. Hear. Res. 2004;vol. 196:49–57. doi: 10.1016/j.heares.2004.06.008.
Busby PA, Clark GM. Gap detection by early-deafened cochlear-implant subjects. J. Acoust. Soc. Am. 1999;vol. 105:1841–1852. doi: 10.1121/1.426721.
Busby PA, Tong YC, Clark GM. The perception of temporal modulations by cochlear implant patients. J. Acoust. Soc. Am. 1993;vol. 94:124–131. doi: 10.1121/1.408212.
Cazals Y, Pelizzone M, Saudan O, Boex C. Low-pass filtering in amplitude modulation detection associated with vowel and consonant identification in subjects with cochlear implants. J. Acoust. Soc. Am. 1994;vol. 96:2048–2054. doi: 10.1121/1.410146.
Chatterjee M, Peng S-C. Processing F0 with cochlear implants: Modulation frequency discrimination and speech intonation recognition. Hear. Res. 2008;vol. 235:143–156. doi: 10.1016/j.heares.2007.11.004.
Donaldson GS, Nelson DA. Place-pitch sensitivity and its relation to consonant recognition by cochlear implant listeners using the MPEAK and SPEAK speech processing strategies. J. Acoust. Soc. Am. 2000;vol. 107:1645–1658. doi: 10.1121/1.428449.
Fu Q-J. Temporal processing and speech recognition in cochlear implant users. Neuroreport. 2002;vol. 13:1–5. doi: 10.1097/00001756-200209160-00013.
Fu Q-J, Hsu C-J, Horng M-J. Effects of speech processing strategy on Chinese tone recognition by Nucleus-24 cochlear implant users. Ear Hear. 2004;vol. 25(no 5):501–508. doi: 10.1097/01.aud.0000145125.50433.19.
Fu Q-J, Zeng F-G. Identification of temporal envelope cues in Chinese tone recognition. Asia Pac. J. Speech, Lang. Hear. 2000;vol. 5:45–57.
Fu Q-J, Zeng F-G, Shannon RV, Soli SD. Importance of tonal envelope cues in Chinese speech recognition. J. Acoust. Soc. Am. 1998;vol. 104:505–510. doi: 10.1121/1.423251.
Galvin JJ, Fu Q-J. Effects of stimulation rate, mode and level on modulation detection by cochlear implant users. J. Assoc. Res. Otolaryngol. 2005;vol. 6:269–279. doi: 10.1007/s10162-005-0007-6.
Geurts L, Wouters J. Coding of the fundamental frequency in continuous interleaved sampling processors for cochlear implants. J. Acoust. Soc. Am. 2001;vol. 109:713–726. doi: 10.1121/1.1340650.
Green T, Faulkner A, Rosen S. Enhancing temporal cues to voice pitch in continuous interleaved sampling cochlear implants. J. Acoust. Soc. Am. 2004;vol. 116:2298–2310. doi: 10.1121/1.1785611.
Green T, Faulkner A, Rosen S, Macherey O. Enhancement of temporal periodicity cues in cochlear implants: Effects on prosodic perception and vowel identification. J. Acoust. Soc. Am. 2005;vol. 118:375–385. doi: 10.1121/1.1925827.
Hamilton N, Green T, Faulkner A. Use of a single channel dedicated to conveying enhanced temporal periodicity cues in cochlear implants: Effects on prosodic perception and vowel identification. Int. J. Audiol. 2007;vol. 46:244–253. doi: 10.1080/14992020601053340.
Jesteadt W. An adaptive procedure for subjective judgements. Percept. Psychophys. 1980;vol. 28:85–88. doi: 10.3758/bf03204321.
Laneau J, Wouters J, Moonen M. Improved music perception with explicit pitch coding in cochlear implants. Audiol. Neurootol. 2006;vol. 11:38–52. doi: 10.1159/000088853.
Levitt H. Transformed up-down methods in psychoacoustics. J. Acoust. Soc. Am. 1971;vol. 49:467–477.
Liang Z-A. The auditory perception of Mandarin tones. Acta Phys. Sin. 1963;vol. 26:85–91.
Lin M-C. The acoustic characteristics and perceptual cues of tones in Standard Chinese. Chinese Lang. Writings. 1988;vol. 204:182–193.
Luo X, Fu Q-J. Enhancing Chinese tone recognition by manipulating amplitude envelope: Implications for cochlear implants. J. Acoust. Soc. Am. 2004;vol. 116:3659–3667. doi: 10.1121/1.1783352.
McKay CM, Carlyon RP. Dual temporal pitch percepts from acoustic and electric amplitude-modulated pulse trains. J. Acoust. Soc. Am. 1999;vol. 105:347–357. doi: 10.1121/1.424553.
Pfingst BE, Burkholder RA, Thompson CS, Xu L. Modulation detection thresholds for cochlear implants: Dependence on stimulation site and stimulus level. J. Acoust. Soc. Am. 2006;vol. 120:3342.
Pfingst BE, Burkholder-Juhasz RA, Xu L, Thompson CS. Across-site patterns of modulation detection in listeners with cochlear implants. J. Acoust. Soc. Am. 2008;vol. 123:1054–1062. doi: 10.1121/1.2828051.
Pfingst BE, Xu L, Thompson CS. Effects of carrier pulse rate and stimulation site on modulation detection by subjects with cochlear implants. J. Acoust. Soc. Am. 2007;vol. 121:2236–2246. doi: 10.1121/1.2537501.
Rosen S. Temporal information in speech: acoustic, auditory and linguistic aspects. Philos. Trans. R. Soc. Ser. B. 1992;vol. 336:367–373. doi: 10.1098/rstb.1992.0070.
Rubinstein JT, Wilson BS, Finley CC, Abbas PJ. Pseudospontaneous activity: stochastic independence of auditory nerve fibers with electrical stimulation. Hear. Res. 1999;vol. 127:108–118. doi: 10.1016/s0378-5955(98)00185-3.
Shannon RV. Multichannel electrical stimulation of the auditory nerve in man. I. Basic psychophysics. Hear. Res. 1983;vol. 11:157–189. doi: 10.1016/0378-5955(83)90077-1.
Shannon RV. Detection of gaps in sinusoids and pulse trains by patients with cochlear implants. J. Acoust. Soc. Am. 1989;vol. 85:2587–2592. doi: 10.1121/1.397753.
Shannon RV. Temporal modulation transfer functions in patients with cochlear implants. J. Acoust. Soc. Am. 1992;vol. 91:2156–2164. doi: 10.1121/1.403807.
Shannon RV, Adams DD, Ferrel RL, Palumbo RL, Grandgenett M. A computer interface for psychophysical and speech research with the Nucleus cochlear implant. J. Acoust. Soc. Am. 1990;vol. 87:905–907. doi: 10.1121/1.398902.
Skinner MW, Clark GM, Whitford LA. Evaluation of a new Spectral Peak coding strategy for the Nucleus 22 Channel Cochlear Implant System. Am. J. Otol. 1994;vol. 15(suppl 2):15–27.
Smith ZM, Delgutte B, Oxenham AJ. Chimaeric sounds reveal dichotomies in auditory perception. Nature. 2002;vol. 416:87–90. doi: 10.1038/416087a.
Soli SD. House Ear Institute; 2003. Hearing in Noise Test for Mandarin Chinese.
Vandali AE, Sucher C, Tsang DJ, McKay CM, Chew JW, McDermott HJ. Pitch ranking ability of cochlear implant recipients: a comparison of sound-processing strategies. J. Acoust. Soc. Am. 2005;vol. 117:3126–3138. doi: 10.1121/1.1874632.
Vandali AE, Ciocca V, Wong LLN, Luk B, Ip VWK, Murray B, Yu HC, Chung I, Ng E, Yuen K. Pitch and tonal language perception in cochlear implant users. Conference on Implantable Auditory Prostheses. 2007:204.
Viemeister NF. Temporal modulation transfer functions based upon modulation thresholds. J. Acoust. Soc. Am. 1979;vol. 66:1364–1380. doi: 10.1121/1.383531.
Wang R-H. University of Science and Technology of China, internal materials; 1993. The standard Chinese database.
Wei C-G, Cao K-L, Jin X, Chen X-W, Zeng F-G. Psychophysical performance and Mandarin tone recognition in noise by cochlear implant users. Ear Hear. 2007;vol. 28:62s, 65s. doi: 10.1097/AUD.0b013e318031512c.
Wei C-G, Cao K-L, Zeng F-G. Mandarin tone recognition in cochlear-implant subjects. Hear. Res. 2004;vol. 197:87–95. doi: 10.1016/j.heares.2004.06.002.
Wilson BS, Finley CC, Lawson DT, Wolford RD, Eddington DK, Rabinowitz WM. Better speech recognition with cochlear implants. Nature. 1991;vol. 352:236–238. doi: 10.1038/352236a0.
Wygonski J, Robert ME. House Ear Institute; 2001. HEI Nucleus research interface specification.
Xu L, Tsai Y, Pfingst BE. Features of stimulation affecting tonal-speech perception: Implications for cochlear prostheses. J. Acoust. Soc. Am. 2002;vol. 112:247–258. doi: 10.1121/1.1487843.
Zeng F-G. Temporal pitch in electric hearing. Hear. Res. 2002;vol. 174:101–106. doi: 10.1016/s0378-5955(02)00644-5.
Zeng F-G, Turner CW. Binaural loudness matches in unilaterally impaired listeners. Q. J. Exp. Psychol. (A) 1991;vol. 43:565–583. doi: 10.1080/14640749108400987.
Zwolan TA, Collins LM, Wakefield GH. Electrode discrimination and speech recognition in postlingually deafened adult cochlear implant subjects. J. Acoust. Soc. Am. 1997;vol. 102:3673–3685. doi: 10.1121/1.420401.

