Speech Perception with Spectrally Non-overlapping Maskers as Measure of Spectral Resolution in Cochlear Implant Users

Erin R O’Neill; Heather A Kreft; Andrew J Oxenham

doi:10.1007/s10162-018-00702-2

2018 Nov 19;20(2):151–167.

PMCID: PMC6453996 PMID: 30456730

Abstract

Poor spectral resolution contributes to the difficulties experienced by cochlear implant (CI) users when listening to speech in noise. However, correlations between measures of spectral resolution and speech perception in noise have not always been found to be robust. It may be that the relationship between spectral resolution and speech perception in noise becomes clearer in conditions where the speech and noise are not spectrally matched, so that improved spectral resolution can assist in separating the speech from the masker. To test this prediction, speech intelligibility was measured with noise or tone maskers that were presented either in the same spectral channels as the speech or in interleaved spectral channels. Spectral resolution was estimated via a spectral ripple discrimination task. Results from vocoder simulations in normal-hearing listeners showed increasing differences in speech intelligibility between spectrally overlapped and interleaved maskers as well as improved spectral ripple discrimination with increasing spectral resolution. However, no clear differences were observed in CI users between performance with spectrally interleaved and overlapped maskers, or between tone and noise maskers. The results suggest that spectral resolution in current CIs is too poor to take advantage of the spectral separation produced by spectrally interleaved speech and maskers. Overall, the spectrally interleaved and tonal maskers produce a much larger difference in performance between normal-hearing listeners and CI users than do traditional speech-in-noise measures, and thus provide a more sensitive test of speech perception abilities for current and future implantable devices.

Keywords: masking release, current spread, spectral separation, speech in noise

INTRODUCTION

The cochlear implant (CI) has been an extremely effective auditory solution for many individuals with severe to profound hearing loss (e.g., Zeng et al. 2008). Despite the success of the device, a major challenge for CI users remains the difficulty of understanding speech in the presence of background noise. One factor believed to be critical in limiting CI performance is poor spectral resolution, produced by the limited number of electrodes in the CI array, and by the extensive overlap in the electrical fields produced by neighboring electrodes. The overlap or spread of current means that increasing the number of electrodes does not necessarily increase the effective number of independent frequency channels, and so performance does not generally improve with increasing number of electrodes beyond about eight or ten (Friesen et al. 2001). Studies with normal-hearing (NH) listeners have successfully simulated the effects of current spread by implementing different forms of spectral smearing within noise- or tone-excited envelope vocoder schemes (Fu and Nogaki 2005; Bingabr et al. 2008; Crew et al. 2012; Oxenham and Kreft 2014; Mesnildrey and Macherey 2015; Grange et al. 2017).

Despite the intuitively obvious connection between speech perception in noise and spectral resolution, the relationship between the two measures has not always been clear at the level of individual listeners. Although a number of studies have reported correlations between spectral ripple discrimination thresholds (i.e., the highest spectral ripple rate, in ripples per octave, at which a phase reversal can be detected) and speech perception in quiet (Henry et al. 2005; Anderson et al. 2011; Won et al. 2011; Drennan et al. 2014), results have been more mixed for speech perception in noise, with some studies finding a significant correlation (Won et al. 2011; Jeon et al. 2015; Holden et al. 2016; Zhou 2017) and others not (Anderson et al. 2011). Other measures, involving spectral ripple detection, have often found correlations between the minimum detectable ripple depth and speech perception in noise (Litvak et al. 2007; Saoji et al. 2009; Anderson et al. 2012). However, these correlations tend to be significant at low ripple rates (such as 0.25 or 0.5 ripples per octave, rpo) but not at higher ripple rates (such as 1 or 2 rpo), which is the opposite of what would be expected if spectral resolution, rather than intensity resolution, were limiting speech perception in noise (Anderson et al. 2012). A more recent measure of spectral and/or intensity resolution, most similar to spectral ripple detection (Azadpour and McKay 2012), was also found not to be significantly correlated with speech perception in noise. Interestingly, a recent publication by Gifford et al. (2018) found correlations between spectral modulation detection thresholds at 0.5 and 1 rpo and speech perception in both quiet and noise for adult but not pediatric CI users.
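To make the ripple measures concrete, the following Python sketch builds the spectral envelope (in dB) of a sinusoidal ripple on a log-frequency axis, together with its phase-reversed counterpart: the two spectra a listener must tell apart in spectral ripple discrimination. The frequency range, ripple density, and depth are hypothetical values chosen only for illustration, and only the envelope is computed, not a full noise stimulus.

```python
import numpy as np

def ripple_envelope_db(freqs_hz, rpo, depth_db, f_low_hz, phase_rad=0.0):
    """Spectral envelope (dB) of a sinusoidal ripple on a log-frequency axis.

    rpo      -- ripple density in ripples per octave
    depth_db -- peak-to-valley ripple depth in dB
    """
    octaves = np.log2(np.asarray(freqs_hz, dtype=float) / f_low_hz)
    return 0.5 * depth_db * np.sin(2.0 * np.pi * rpo * octaves + phase_rad)

# Hypothetical stimulus parameters, for illustration only.
freqs = np.geomspace(100.0, 5000.0, 512)
standard = ripple_envelope_db(freqs, rpo=2.0, depth_db=30.0, f_low_hz=100.0)
inverted = ripple_envelope_db(freqs, rpo=2.0, depth_db=30.0, f_low_hz=100.0,
                              phase_rad=np.pi)

# A phase reversal swaps spectral peaks and valleys without changing rpo or depth.
print(np.allclose(standard, -inverted))
```

Discrimination thresholds raise `rpo` until the two envelopes can no longer be told apart; ripple detection instead fixes `rpo` and lowers `depth_db` to find the smallest detectable depth.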

One reason for the lack of robust correlations between measures of spectral resolution and speech perception in noise may be the use of noise that is spectrally matched to the speech. Spectral resolution is likely to be most important when the speech and noise are not spectrally matched, so that better spectral resolution can help segregate speech from noise. Recent studies have shown that spectrally separating speech and noise leads to increased speech intelligibility for NH listeners, compared to performance when speech and noise overlap in the frequency domain (Kidd et al. 2005; Apoux and Healy 2010). It is already known that CI users are not able to take as much advantage of spectral gaps in a masker as NH listeners. For instance, Oxenham and Kreft (2014) found that a spectrally sparse masker, consisting of 16 logarithmically spaced pure tones, produced as much speech masking in CI users as a speech-shaped noise with the same overall level and spectral envelope, whereas NH listeners exhibited a large release from masking. These findings suggest that conditions in which the masker and speech do not completely spectrally overlap may provide a more sensitive test of the effects of spectral resolution, and so may provide measures of speech recognition that correlate more closely with measures of spectral resolution than more typical measures of speech perception in spectrally overlapping noise.

The aim of this study was to test the prediction that measures of speech intelligibility in spectrally unmatched noise should be more sensitive to differences in spectral resolution in CI users than traditional speech-in-noise tests. Conditions in which the speech and masker were presented to the same CI electrodes were compared with conditions in which the speech and masker were presented to different (interleaved) electrodes. This same paradigm was also implemented using the virtual channels that were used in the processing schemes of the CI users. Both noise and tones were used as maskers (experiments 1 and 2), and the results were compared to more direct measures of spectral resolution using spectral ripple discrimination (experiment 3). In all cases, the results from CI users were compared with results from NH listeners using tone-excited envelope vocoders to simulate various degrees of current spread (Crew et al. 2012; Oxenham and Kreft 2014).

EXPERIMENT 1: SPEECH PERCEPTION IN SPECTRALLY INTERLEAVED OR OVERLAPPED MASKERS

Methods

Listeners

A total of 13 post-lingually deafened CI users and 24 NH listeners (3 groups of 8 participants) were tested. All participants were native speakers of American English. Individual details for the CI users are provided in Table 1. To take part in the study, CI users were required to obtain at least 40 % of keywords correct in sentences from the IEEE corpus (IEEE 1969) in quiet. The eight CI users who met this criterion are indicated in Table 2. Among the NH listeners tested, 15 were male and 9 were female, with ages ranging from 18 to 32 years. Normal hearing was defined as having pure-tone audiometric thresholds less than 20 dB hearing level (HL) at all octave frequencies between 250 and 8000 Hz with no reported history of hearing disorders. All experimental protocols were approved by the Institutional Review Board of the University of Minnesota, and all listeners provided written informed consent prior to participating.

Table 1.

Individual subject information for CI users

Subject code	Gender	Age (years)	CI use (years)	Etiology	HL prior to implant (years)	Speech processing strategy
C16	F	63	16	Unknown	1	MPS
D02	F	67	15	Unknown	1	HiRes Optima-P; ClearVoice MED
D10	F	63	14	Unknown	8	HiRes-S w/Fidelity 120; ClearVoice HIGH
D25	F	53	10	Meniere’s disease	33	HiRes Optima-S; ClearVoice LOW
D26	F	57	8	Unknown	11	HiRes Optima-S; ClearVoice OFF
D27	F	65	7	Otosclerosis	13	HiRes-S w/Fidelity 120; ClearVoice OFF
D28	F	68	14	Familial progressive SNHL	7	HiRes Optima-S; ClearVoice MED
D35	F	57	6	High fever	?	HiRes Optima-S; ClearVoice MED
D39	M	69	8	Unknown	7	HiRes Optima-S; ClearVoice MED
D41	F	68	4	Familial progressive SNHL	41	HiRes-S w/Fidelity 120; ClearVoice MED
D42	M	61	3	Familial progressive SNHL	2	HiRes Optima-S; ClearVoice MED
D44	F	70	9	Familial progressive SNHL	18	HiRes Optima-P; ClearVoice HIGH
D46	F	60	4	Meniere’s disease	13	HiRes-S w/Fidelity 120; ClearVoice MED
D47	F	59	4	Unknown	< 1	HiRes Optima-P; ClearVoice MED
D52	F	60	14	High fever	4	HiRes Optima-S; ClearVoice MED
D55	F	58	7	Familial progressive SNHL	?	HiRes Optima-S; ClearVoice OFF

Table 2.

Experiments completed by each CI user included in the study

Subject code	Experiment 1	Experiment 2	Experiment 3
C16	*		X
D02	*		X
D10	X	X	X
D25		X
D26	X	X	X
D27	*		X
D28	X		X
D35	*		X
D39	X	X	X
D41	X	X	X
D42	*		X
D44	X		X
D46	X		X
D47	X	X	X
D52		X
D55		X

* = attempted but did not pass screening

X = successfully completed

Stimuli

The speech materials consisted of sentences from the IEEE speech corpus (IEEE 1969), recorded by a single female talker. The sentences were presented in a noise masker, in a tone masker, or in quiet. The Gaussian noise was spectrally shaped to match the long-term spectrum of the IEEE speech corpus. The tone frequencies were selected to match the center frequencies of the CI electrodes of each individual CI user tested, and the tone amplitudes were equated in rms to produce the same output level from each channel of the CI as the noise masker. The 16 center frequencies from the standard clinical map for Advanced Bionics CIs were used to generate the stimuli for the NH listeners. The center frequencies for the individual CI users and for the standard clinical map are shown in Table 3.
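The rms equating of the tone masker can be illustrated with a short NumPy sketch: for each channel, a pure tone at the channel center frequency is given the same rms as the noise within that channel, using the fact that a sinusoid of amplitude A has an rms of A/√2. The ideal FFT band-pass filter, the white (rather than speech-shaped) noise, and the channel edges in the usage lines are simplifying assumptions; Table 3 lists only center frequencies.

```python
import numpy as np

def band_rms(x, fs, lo_hz, hi_hz):
    """RMS of x within [lo_hz, hi_hz), using an ideal FFT-domain band-pass."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= lo_hz) & (freqs < hi_hz)
    band = np.fft.irfft(spectrum * in_band, n=len(x))
    return np.sqrt(np.mean(band ** 2))

def rms_matched_tone_masker(noise, fs, edges_hz, cfs_hz):
    """One pure tone per channel, each with the same rms as the noise in that channel.

    edges_hz has one more entry than cfs_hz: channel k spans edges_hz[k]..edges_hz[k+1].
    """
    t = np.arange(len(noise)) / fs
    masker = np.zeros(len(noise))
    for lo, hi, cf in zip(edges_hz[:-1], edges_hz[1:], cfs_hz):
        amp = np.sqrt(2.0) * band_rms(noise, fs, lo, hi)  # sinusoid rms = amp / sqrt(2)
        masker += amp * np.sin(2.0 * np.pi * cf * t)
    return masker

rng = np.random.default_rng(0)
fs = 22050
noise = rng.standard_normal(fs)                 # 1 s of white noise (stand-in only)
edges = [280.0, 400.0, 500.0, 600.0, 700.0]     # hypothetical channel edges (Hz)
cfs = [333.0, 455.0, 540.0, 642.0]              # first four NH CFs from Table 3
tones = rms_matched_tone_masker(noise, fs, edges, cfs)
```

Matching is done per channel, not on the overall waveform, so each CI channel receives the same level from either masker while only the noise carries inherent envelope fluctuations.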

Table 3.

Center frequencies (CFs, in Hz) of each CI user’s clinical map and of the standard map used to implement the vocoder for the NH listeners. Columns 1–16 are channel numbers

Subject code	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16
All NH	333	455	540	642	762	906	1076	1278	1518	1803	2142	2544	3022	3590	4264	6665
C16	434	644	954	1414	2096	3108	6206	OFF	NA	NA	NA	NA	NA	NA	NA	NA
D02	333	455	540	642	762	906	1076	1278	1518	1803	2142	2544	3022	3590	4264	6665
D10	333	455	540	642	762	906	1076	1278	1518	1803	2142	2544	3022	3590	4264	6665
D25	333	455	540	642	762	906	1076	1278	1518	1803	2142	2544	3022	3590	4264	6665
D26	OFF	386	463	556	668	804	965	1160	1394	1674	2012	2417	2904	3490	4193	6638
D27	386	463	556	668	804	965	1160	1394	1674	2012	2417	2904	3490	4193	6638	OFF
D28	333	455	540	642	762	906	1076	1278	1518	1803	2142	2544	3022	3590	4264	6665
D35	333	455	540	642	762	906	1076	1278	1518	1803	2142	2544	3022	3590	4264	6665
D39	386	463	556	668	804	965	1160	1394	1674	2012	2417	2904	3490	4193	6638	OFF
D41	333	455	540	642	762	906	1076	1278	1518	1803	2142	2544	3022	3590	4264	6665
D42	338	472	576	700	852	1038	1264	1538	1872	2280	2776	3380	4114	6609	OFF	OFF
D44	333	455	540	642	762	906	1076	1278	1518	1803	2142	2544	3022	3590	4264	6665
D46	333	455	540	642	762	906	1076	1278	1518	1803	2142	2544	3022	3590	4264	6665
D47	333	455	540	642	762	906	1076	1278	1518	1803	2142	2544	3022	3590	4264	6665
D52	333	455	540	642	762	906	1076	1278	1518	1803	2142	2544	3022	3590	4264	6665
D55	333	455	540	642	762	906	1076	1278	1518	1803	2142	2544	3022	3590	4264	6665

The speech and masker were then passed through a tone-excited envelope vocoder (Dorman et al. 1998; Whitmal et al. 2007). The stimulus was divided into 16 frequency subbands (fewer for CI users with fewer than 16 active channels), with the cutoff and center frequencies of each subband matched to those in the clinical map of the individual CI user. For the NH listeners, the standard Advanced Bionics clinical map was used to set the subband frequencies (see Table 3). The bandpass filters used to generate the subbands were high-order (947th-order) FIR filters, generated with MATLAB’s fir1 function, producing very little overlap between the spectral content of adjacent subbands and a flat frequency response (± 0.05 dB) across the entire passband. The impulse responses of the filters were time-aligned, reaching their peaks at a delay of approximately 20 ms, independent of filter center frequency. Two conditions were generated, termed “interleaved” and “overlapping.” In the overlapping condition, the speech and the masker were mixed at the appropriate signal-to-masker ratio (SMR) and then passed through the even-numbered vocoder channels (i.e., 2, 4, … 16), resulting in eight equally spaced frequency subbands (Fig. 1, top panel). In the interleaved condition, the speech was passed through the even-numbered channels (as before), but the masker was passed through the odd-numbered channels, resulting in spectral separation between the speech and the masker (Fig. 1, bottom panel). In both cases, the temporal envelope of each subband was extracted using a Hilbert transform, and the resulting envelope was lowpass filtered using a fourth-order Butterworth filter with a cutoff frequency of 50 Hz. This cutoff frequency was chosen to reduce possible voicing periodicity cues and to reduce the possibility (for NH listeners) that the vocoder produced spectrally resolved components via the amplitude modulation of the tonal carriers.
The resulting temporal envelopes were used to modulate pure-tone carriers with frequencies corresponding to the center frequencies of each channel, which were then presented to the CI users and to the group of eight NH listeners assigned to the “no spread” conditions. Electrodograms (generated with software supplied by Advanced Bionics) showing the stimulation applied to each electrode in the overlapping and interleaved conditions are shown in Fig. 2, for the Optima stimulation strategy and the tone maskers. Because all but one of the CI participants used processing strategies that included some form of current steering (Fidelity 120 or Optima), there was some degree of cross-talk between channels in both conditions. The most cross-talk is observed with the Optima strategy, which assigns at most 75 % of the current to one electrode (with the remaining 25 % going to the other member of the electrode pair that constitutes a virtual channel). The effects of cross-talk (as well as the limited filter resolution) can be seen most clearly in the overlapping condition, where some stimulation can be observed on the odd electrodes, despite no intended stimulation of these electrodes. However, despite some interaction between channels, the electrodograms show that the two configurations resulted in very different stimulation patterns (compare top and bottom panels of Fig. 2). In particular, in the interleaved condition (bottom panel), the speech envelope is observed only in the even channels. In the odd channels, the current level representing the tone maskers is suppressed during the high-amplitude portions of the speech, presumably due to a combination of automatic gain control, compression, and current steering. No attempt was made to simulate this effect in the vocoder.
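The vocoder chain and the even/odd channel routing described above can be sketched in Python with SciPy. This is a minimal sketch under stated assumptions, not the authors’ MATLAB code: `scipy.signal.firwin` (Hamming window by default, as is MATLAB’s fir1) stands in for fir1, the channel edges in the usage lines are hypothetical (Table 3 gives only center frequencies), carriers are sine phase, and no CI-processor behavior (current steering, automatic gain control, compression) is modeled. A 948-tap (order-947) linear-phase FIR delays every band by 947/2 = 473.5 samples, about 21.5 ms at 22,050 Hz, which is in line with the roughly 20-ms, frequency-independent peak delay described in the text.

```python
import numpy as np
from scipy.signal import butter, firwin, hilbert, lfilter

def vocode_band(x, fs, lo_hz, hi_hz, cf_hz, numtaps=948, env_cutoff_hz=50.0):
    """One tone-vocoder channel: band-pass, Hilbert envelope, 50-Hz low-pass, tone carrier."""
    taps = firwin(numtaps, [lo_hz, hi_hz], pass_zero=False, fs=fs)  # order-947 FIR
    band = lfilter(taps, 1.0, x)
    env = np.abs(hilbert(band))                          # Hilbert envelope
    b, a = butter(4, env_cutoff_hz, btype="low", fs=fs)  # 4th-order Butterworth
    env = np.maximum(lfilter(b, a, env), 0.0)            # clip tiny negative undershoots
    t = np.arange(len(x)) / fs
    return env * np.sin(2.0 * np.pi * cf_hz * t)

def vocode(speech, masker, fs, edges_hz, cfs_hz, interleaved):
    """Channels are numbered 1..N and speech always uses the even channels.

    Overlapping: speech + masker are mixed and sent to the even channels only.
    Interleaved: speech goes to the even channels, masker to the odd channels.
    """
    out = np.zeros(len(speech))
    for ch in range(1, len(cfs_hz) + 1):
        lo, hi, cf = edges_hz[ch - 1], edges_hz[ch], cfs_hz[ch - 1]
        if ch % 2 == 0:
            src = speech if interleaved else speech + masker
        elif interleaved:
            src = masker
        else:
            continue  # odd channels carry nothing in the overlapping condition
        out += vocode_band(src, fs, lo, hi, cf)
    return out

# Usage with hypothetical 4-channel edges and placeholder (non-speech) signals.
fs = 22050
rng = np.random.default_rng(0)
speech_like = rng.standard_normal(fs // 2)
masker = rng.standard_normal(fs // 2)
edges = [280.0, 400.0, 500.0, 600.0, 700.0]
cfs = [333.0, 455.0, 540.0, 642.0]
out_interleaved = vocode(speech_like, masker, fs, edges, cfs, interleaved=True)
out_overlapping = vocode(speech_like, masker, fs, edges, cfs, interleaved=False)
```

Because every band uses the same filter length, the band delays are automatically aligned, which is one simple way to obtain the time alignment the text describes.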
For the other two groups of NH listeners, the effects of current spread were simulated via the vocoder in the same way as in Oxenham and Kreft (2014): each carrier was modulated by the weighted sum of the intensity envelopes from all 16 channels. The weights used in this sum were selected to produce slopes of either 24 dB/oct or 12 dB/oct to simulate different degrees of spectral smearing or current spread.
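The current-spread simulation can be sketched as a matrix of intensity weights that fall off by a fixed number of dB per octave of separation between channels. Two details are assumptions here rather than taken from the text: the octave distance is measured between channel center frequencies, and the weighted sum of intensity envelopes is converted back to an amplitude envelope with a square root before it modulates a carrier.

```python
import numpy as np

def spread_weights(cfs_hz, slope_db_per_oct):
    """Intensity weight of channel j's envelope on carrier i.

    Falls off by slope_db_per_oct for every octave between the two center frequencies.
    """
    cfs = np.asarray(cfs_hz, dtype=float)
    octaves = np.abs(np.log2(cfs[:, None] / cfs[None, :]))
    return 10.0 ** (-slope_db_per_oct * octaves / 10.0)

def smear_envelopes(envs, cfs_hz, slope_db_per_oct):
    """envs: (n_channels, n_samples) amplitude envelopes.

    Each output envelope is the square root of the weighted sum of the
    intensity (squared) envelopes from all channels.
    """
    intensity = np.asarray(envs, dtype=float) ** 2
    return np.sqrt(spread_weights(cfs_hz, slope_db_per_oct) @ intensity)

# Standard NH map from Table 3.
nh_cfs = [333, 455, 540, 642, 762, 906, 1076, 1278,
          1518, 1803, 2142, 2544, 3022, 3590, 4264, 6665]
w12 = spread_weights(nh_cfs, 12.0)
w24 = spread_weights(nh_cfs, 24.0)
```

Under this reading, a carrier whose own channel receives no input (for example, an odd channel in the overlapping condition) still receives energy from its neighbors once spread is applied, and the shallower 12 dB/oct slope spreads that energy further than 24 dB/oct.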

Fig. 1 — Schematic diagram of the two masker configurations used in experiment 1. The top panel shows the condition in which speech and masker overlap in the even channels, and the bottom panel shows the condition in which speech and masker are interleaved, with speech in the even channels and masker in the odd channels

Fig. 2 — Electrodograms of the two masker configurations used in the present study. The top panel shows the condition in which speech and masker overlap in the even channels, and the bottom panel shows the condition in which speech and masker are interleaved, with speech in the even channels and masker in the odd channels. Both panels show the first sentence of the IEEE corpus with a tone masker at an SMR of 5 dB, processed by the Optima strategy

The level of the speech after filtering, measured 1 m from the loudspeaker at the position of the participant’s head, was 51 dB SPL for the CI users and between 51 and 56 dB SPL for the NH listeners, depending on the degree of spectral smearing. The masker level was adjusted to produce the desired SMR, defined in terms of the levels of the speech and masker before filtering. The masker was gated on 1 s before the beginning of each sentence and gated off 1 s after the end of each sentence. The SMRs were selected in advance, based on pilot data, to span a range of performance between 0 and 100 % word recognition for each condition. The resulting range was −15 to 10 dB SMR for the no spread and 24 dB/oct spread NH groups, and 0 to 20 dB SMR for the 12 dB/oct spread NH group and the CI group.
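The SMR definition (levels of speech and masker before filtering) and the 1-s masker fringe can be sketched as follows. Defining level as rms, computing the masker rms over its whole gated duration, and omitting onset and offset ramps are assumptions; the text does not specify them. The function returns the padded speech and the scaled masker separately, because the interleaved condition routes them to different vocoder channels.

```python
import numpy as np

def rms(x):
    return np.sqrt(np.mean(np.square(x)))

def scale_masker_to_smr(speech, masker, smr_db):
    """Scale masker so that 20*log10(rms(speech) / rms(masker)) equals smr_db."""
    return masker * rms(speech) / (rms(masker) * 10.0 ** (smr_db / 20.0))

def add_masker_fringe(speech, masker, fs, smr_db, fringe_s=1.0):
    """Pad speech with fringe_s of silence on each side; cut and scale a masker to match.

    Returns (padded_speech, scaled_masker).
    """
    pad = int(round(fringe_s * fs))
    n_total = len(speech) + 2 * pad
    if len(masker) < n_total:
        raise ValueError("masker is shorter than the sentence plus both fringes")
    scaled = scale_masker_to_smr(speech, masker[:n_total], smr_db)
    padded = np.concatenate([np.zeros(pad), speech, np.zeros(pad)])
    return padded, scaled

fs = 22050
rng = np.random.default_rng(0)
sentence = rng.standard_normal(2 * fs)   # placeholder for a 2-s sentence
noise = rng.standard_normal(5 * fs)
s, m = add_masker_fringe(sentence, noise, fs, smr_db=-5.0)
```

In the overlapping condition the two returned signals would simply be summed before vocoding.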

Procedure

The stimuli were generated and processed using MATLAB (The Mathworks, Natick, MA). The sounds were converted via a 24-bit digital-to-analog converter (L22, LynxStudio, Costa Mesa, CA) at a sampling rate of 22,050 Hz, and were presented via an amplifier and a single loudspeaker, placed approximately 1 m from the listener at 0° azimuth and level with the listener’s head. The listeners were seated individually in a double-walled, sound-attenuating booth with approximate interior dimensions of 6′8″ × 7′10″ × 6′6″. Bilateral CI users were instructed to use whichever processor they thought gave them better speech perception and to remove the other processor. One participant with a hearing aid in the ear contralateral to her CI was also instructed to remove it before beginning the experiment. The hearing provided by this contralateral ear without the hearing aid was deemed negligible, as a recent audiogram showed a flat severe sensorineural hearing loss with audiometric thresholds at octave frequencies between 250 and 8000 Hz of between 65 and 75 dB HL. Thus, the speech presented at an overall level of 51 dB SPL was inaudible without the hearing aid.

Listeners responded to sentences by typing what they heard on a computer keyboard. They were encouraged to guess individual words, even if they had not heard or understood the entire sentence. Sentences were scored for keywords correct as a proportion of the total number of keywords presented. Initial scoring was automatic, with each error then checked manually for potential spelling errors. Before the actual experiment took place, listeners were presented with two sentence lists (of ten sentences each) of the HINT speech corpus (Nilsson et al. 1994) to acclimate them to the stimuli before the scored sentences were presented. This procedure was repeated for each new masker type (noise or tone) and configuration (spectrally overlapping or interleaved).

In the actual experiment, two sentence lists of ten sentences each were completed for each combination of masker type, SMR, and configuration (overlapping and interleaved). Each NH listener completed the experiment with one of three simulated spread conditions (no spread, 24 dB/oct spread, or 12 dB/oct spread). The stimuli for CI users were processed using a vocoder with no spread, with the center frequencies of each subband matching the center frequencies of the active channels in their map. Before statistical analysis, the proportions of keywords correct were converted to rationalized arcsine units (RAU) (Studebaker 1985) to compensate for possible floor or ceiling effects.
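The RAU transform (Studebaker 1985) maps X items correct out of N onto a scale that tracks percent correct in the middle of the range but is stretched near 0 % and 100 %. A minimal Python version of the standard formula follows; the usage lines assume five keywords per IEEE sentence, so that two ten-sentence lists give 100 keywords per condition.

```python
import math

def rau(n_correct, n_total):
    """Rationalized arcsine units (Studebaker 1985) for n_correct out of n_total items."""
    theta = (math.asin(math.sqrt(n_correct / (n_total + 1)))
             + math.asin(math.sqrt((n_correct + 1) / (n_total + 1))))
    return (146.0 / math.pi) * theta - 23.0

# Assumed: 2 lists x 10 sentences x 5 keywords = 100 keywords per condition.
for correct in (0, 50, 100):
    print(correct, round(rau(correct, 100), 1))
```

With N = 100 the scale runs from about −18 to about 118 RAU, and scores are symmetric about 50 RAU, so differences such as the spectral masking release are not compressed near floor or ceiling.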

Results

The mean results from the three NH groups (no spread, 24 dB/oct spread, and 12 dB/oct spread) and the CI group are shown in the separate panels of Fig. 3. The results from tone and noise maskers are denoted by circles and triangles, respectively. The filled symbols represent data from the conditions in which the speech and masker overlapped in the same frequency bands, and the open symbols represent data from conditions in which the speech and masker were presented to different interleaved frequency bands.

As expected, increasing the amount of simulated spread in the NH listeners led to poorer speech perception. With increasing spread, the difference between the tone and noise maskers also became less pronounced. This effect was expected from the results of Oxenham and Kreft (2014), who reasoned that the lack of difference at large spread values was due to an effective smoothing of the temporal envelope of the noise masker, caused by the overlap between adjacent channels, which made the noise masker more tone-like. The NH listeners also benefited less from the spectral separation of the speech and masker in the interleaved condition (i.e., the difference between the overlapped and interleaved conditions) with increasing spread. For instance, in the no spread NH group, there was a clear separation in performance between the tone and noise maskers (compare circles and triangles) at all but the highest SMRs, and the difference in performance between the spectrally interleaved and overlapped maskers was very large, reaching a mean of about 60 RAU at the lowest SMRs. In contrast, in the 12 dB/oct spread group, performance was much poorer overall (note the different SMRs tested) and was very similar regardless of masker type or spectral overlap. Given the large differences between groups in the pattern of results, the SMRs tested, and overall performance, separate repeated-measures ANOVAs were performed for each group.

For the no spread NH group, a three-way repeated-measures ANOVA on the RAU-transformed proportion of words correctly reported confirmed significant main effects of masker type (tone or noise) [F(1, 7) = 7.8, P = 0.027, partial η² = 0.528], condition (overlapping vs interleaved) [F(1, 7) = 186, P < 0.001, partial η² = 0.964], and SMR [F(3, 21) = 290, P < 0.001, partial η² = 0.997]. There were also interactions between masker type and SMR [F(3, 21) = 5.5, P = 0.006, partial η² = 0.715], and between condition and SMR [F(3, 21) = 46, P < 0.001, partial η² = 0.966].

The NH group listening through the vocoder with 24 dB/oct spread showed a similar pattern of statistical outcomes, with significant effects of masker type [F(1, 7) = 61.0, P < 0.001, partial η² = 0.897], condition [F(1, 7) = 174, P < 0.001, partial η² = 0.961], and SMR [F(3, 21) = 357, P < 0.001, partial η² = 0.981]. Results from this group also showed an interaction between condition and SMR [F(3, 21) = 10, P < 0.001, partial η² = 0.877]. Although the effect of spectral condition was significant, it was smaller than in the no-spread group, especially at the lower SMRs. For instance, at −5 dB SMR with the noise masker, the average increase in score from the overlapped to the interleaved condition was about 51 RAU in the no-spread group but only 17 RAU in the 24 dB/oct spread group, despite similar levels of performance of both groups in the overlapped condition.

For the NH group listening through the vocoder with 12 dB/oct spread, there were only significant effects of condition [F(1, 7) = 8.8, P = 0.021, partial η² = 0.556] and SMR [F(4, 28) = 66.6, P < 0.001, partial η² = 0.982]; the effect of masker type was not significant [F(1, 7) = 0.62, P = 0.456, partial η² = 0.082]. Similar to the other two NH groups, there was an interaction between condition and SMR [F(4, 28) = 3.7, P = 0.016, partial η² = 0.690] but in contrast to the other two NH groups, there was also a three-way interaction between masker type, condition, and SMR [F(4, 28) = 3.5, P = 0.019, partial η² = 0.751]. Again, although the effect of condition reached significance, it was small, with the mean difference rarely exceeding 10 RAU for either the noise or tone maskers.

The results from the eight CI users most closely resemble those of the NH group listening through the vocoder with 12 dB/oct spread. Despite this apparent similarity, the statistical analysis yielded slightly different outcomes for the two groups. The same three-way repeated-measures ANOVA performed on the data from the CI users confirmed a significant effect of SMR [F(4, 28) = 76.1, P < 0.001, partial η² = 0.989], but no significant main effect of masker type [F(1, 7) = 3.9, P = 0.089, partial η² = 0.358] and no main effect of condition (interleaved vs. overlapped) [F(1, 7) = 0.8, P = 0.401, partial η² = 0.102]. There were significant interactions between masker type and SMR [F(4, 28) = 5.1, P = 0.003, partial η² = 0.903] and between masker type and condition [F(1, 7) = 6.0, P = 0.044, partial η² = 0.463]. The interactions with masker type presumably reflect the fact that scores with the tone masker appear higher than scores with the noise masker in the interleaved condition at SMRs of 0 and 5 dB, but in the overlapped condition only at an SMR of 0 dB.

The effects of the interleaved and overlapped conditions in the different groups can be seen more clearly in Fig. 4, which replots the data from Fig. 3 with the data from the different groups shown in the same panel. Results with the tone masker are presented in the top panel and results with the noise masker in the bottom panel. Mean data for the no spread, 24 dB/oct spread, 12 dB/oct spread, and CI groups are represented by circles, triangles, squares, and diamonds, respectively. As before, the filled symbols represent data from the overlapped conditions and the open symbols represent data from the interleaved conditions. As noted above, the benefit gained by listeners in the spectrally interleaved conditions decreased with increasing spread in the NH groups and was essentially absent in the CI group.

The results in quiet are shown at the right of each panel. Each participant completed two to four lists of ten sentences in the quiet condition to establish a baseline, and each participant’s average across lists was used to calculate the group means. A between-subjects one-way ANOVA on performance in quiet showed a significant effect of group [F(3, 28) = 13.4, P < 0.001]. Post-hoc contrasts (with Bonferroni correction for six possible contrasts, yielding a criterion of α = 0.05/6 = 0.0083) revealed a significant difference between the 24 dB/oct spread group and the 12 dB/oct spread group (P = 0.001), but not between the no spread and 24 dB/oct spread groups (P = 0.836). The results from the CI group were significantly poorer than those of the no-spread and 24 dB/oct spread groups (P = 0.002 and P = 0.001, respectively) but were not significantly different from those of the 12 dB/oct spread group (P = 0.154).
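The Bonferroni criterion follows directly from the four groups: there are C(4, 2) = 6 pairwise contrasts, so α = 0.05/6 ≈ 0.0083. The short Python check below applies that criterion to the five P values reported in the paragraph; it reproduces only the decision rule, not the underlying contrasts.

```python
from math import comb

n_groups = 4                       # no spread, 24 dB/oct, 12 dB/oct, CI
n_contrasts = comb(n_groups, 2)    # 6 pairwise contrasts
alpha = 0.05 / n_contrasts         # Bonferroni criterion, about 0.0083

# P values of the post-hoc contrasts reported in the text.
reported_p = {
    ("24 dB/oct", "12 dB/oct"): 0.001,
    ("no spread", "24 dB/oct"): 0.836,
    ("CI", "no spread"): 0.002,
    ("CI", "24 dB/oct"): 0.001,
    ("CI", "12 dB/oct"): 0.154,
}
significant = {pair for pair, p in reported_p.items() if p < alpha}
for pair, p in reported_p.items():
    verdict = "significant" if pair in significant else "not significant"
    print(f"{pair[0]} vs {pair[1]}: P = {p} -> {verdict}")
```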

Discussion

Benefit of Spectrally Interleaved Maskers

The difference in speech understanding between conditions with overlapped and interleaved maskers can be viewed as spectral masking release. To investigate the relationship between spectral resolution (as manipulated through varying amounts of spread) and spectral masking release, we subtracted the RAU scores in the spectrally overlapped condition from the RAU scores in the interleaved condition. The upper panels of Fig. 5 show this masking release for each group as a function of SMR. In the mean data from the NH groups, the expected trend is clear: decreasing spectral resolution by increasing spread, from no spread to 24 dB/oct to 12 dB/oct (open circles, triangles, and squares, respectively), leads to less masking release. This is especially apparent at the lower SMRs, where performance is well below ceiling. As already observed in the raw data, the difference scores for the CI group are generally close to zero, indicating little or no spectral masking release.

Fig. 5 — Benefits in speech perception for the CI listeners and the three groups of NH listeners. The benefit, i.e., the difference in the RAU-transformed proportion of keywords reported correctly, is plotted as a function of speech-to-masker ratio. The top two panels show the benefit of the interleaved over the overlapping condition for the tone and noise maskers, respectively. The bottom two panels show the benefit of the tone masker over the noise masker for the interleaved and overlapped conditions, respectively. Results from the no spread, 24 dB/oct spread, 12 dB/oct spread, and CI groups are represented by open circles, triangles, squares, and diamonds, respectively. Note the different ranges of speech-to-masker ratio across groups. Error bars represent ±1 standard error of the mean across listeners

Benefit of Tone over Noise Maskers

Earlier studies have found that inherent temporal-envelope fluctuations in steady-state noise can account for a substantial proportion of the masking of speech in NH listeners (Stone et al. 2011, 2012; Stone and Moore 2014). A more recent study found that the same was not true for CI users, who exhibited as much masking with tone maskers that had no inherent fluctuations as they did with noise maskers (Oxenham and Kreft 2014). The difference appeared to be due to the loss of spectral resolution leading to interactions between the temporal envelopes from neighboring channels, which in turn resulted in an effective smoothing of the temporal-envelope fluctuations. If spectral resolution limits the effect of inherent masker fluctuations, then we may expect a relationship between spectral spread and the benefit of tone over noise maskers. This relationship is shown in the lower panels of Fig. 5, where speech scores with the noise masker are subtracted from speech scores with the tone masker.
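The envelope-smoothing argument above can be illustrated numerically. The Python sketch below compares the envelope fluctuation (coefficient of variation) of a pure tone, of a single narrowband Gaussian noise, and of the square root of the summed intensity envelopes of eight independent noise bands, a crude stand-in for channels whose envelopes mix through spread. Ideal FFT filters, independent bands, and intensity summation as a proxy for current spread are simplifying assumptions; this is an illustration, not the model of Oxenham and Kreft (2014).

```python
import numpy as np

def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed with the FFT."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))

def noise_band(rng, fs, n, lo_hz, hi_hz):
    """Gaussian noise restricted to [lo_hz, hi_hz) with an ideal FFT filter."""
    spec = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n, d=1.0 / fs)
    spec[(f < lo_hz) | (f >= hi_hz)] = 0.0
    return np.fft.irfft(spec, n=n)

def fluctuation(env):
    """Coefficient of variation of an envelope (0 for a flat envelope)."""
    return np.std(env) / np.mean(env)

fs, n = 22050, 22050
rng = np.random.default_rng(1)
t = np.arange(n) / fs
tone_cv = fluctuation(hilbert_envelope(np.sin(2 * np.pi * 1000 * t)))
noise_cv = fluctuation(hilbert_envelope(noise_band(rng, fs, n, 900, 1100)))

# Sum the intensity envelopes of 8 independent 200-Hz-wide bands, then take the root.
bands = [noise_band(rng, fs, n, lo, lo + 200) for lo in range(500, 2100, 200)]
summed_cv = fluctuation(np.sqrt(sum(hilbert_envelope(b) ** 2 for b in bands)))
print(f"tone {tone_cv:.3f}  noise band {noise_cv:.3f}  summed {summed_cv:.3f}")
```

A single noise band has a Rayleigh-distributed envelope with a coefficient of variation near √((4 − π)/π) ≈ 0.52, while the tone’s envelope is flat; pooling independent envelopes pulls the fluctuation well below 0.52, which is the sense in which spread makes a noise masker more tone-like.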

The results from the no spread and 24 dB/oct NH groups (circles and triangles) show modest benefits of the tone over the noise masker, particularly at the lower SMRs, where performance is well below ceiling. The no-spread group also showed a larger benefit than the 24 dB/oct and 12 dB/oct spread groups when speech and masker overlapped. The CI group showed a modest benefit of the tone over the noise masker at lower SMRs in the interleaved condition, but across all conditions, no benefit of tone over noise maskers was observed, consistent with results from an earlier study that used all channels and only spectrally overlapped maskers (Oxenham and Kreft 2014).

EXPERIMENT 2: SPEECH PERCEPTION IN SPECTRALLY INTERLEAVED OR OVERLAPPED MASKERS WITH VIRTUAL CHANNELS

Rationale

All eight CI users from experiment 1 used processing strategies that involved current steering (see Table 1). This implies that although each electrode is assigned to the frequencies listed in Table 3, the virtual channels produced by simultaneous stimulation of adjacent electrodes actually have center frequencies at the midpoints between the listed frequencies. In other words, by stimulating at the electrode frequencies, we were actually stimulating at the corner frequencies of the virtual channels. Theoretically and empirically, our approach seemed the most appropriate, in that it stimulated the physical electrodes present in the device. Nevertheless, by bypassing the virtual-channel design of the CIs, we may have inadvertently lost some of the spectral resolution capabilities of the device. For this reason, we repeated critical conditions from experiment 1 with a map based on the center frequencies of the virtual channels rather than the electrodes.
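The text places each virtual channel’s center frequency at the midpoint between adjacent electrode frequencies but does not say whether that midpoint is taken on a linear or a logarithmic frequency axis. The sketch below computes both for the standard map in Table 3; treating the geometric (log-frequency) midpoint as the default is an assumption.

```python
import math

# Electrode center frequencies (Hz) of the standard map used for the NH listeners (Table 3).
ELECTRODE_CFS = [333, 455, 540, 642, 762, 906, 1076, 1278,
                 1518, 1803, 2142, 2544, 3022, 3590, 4264, 6665]

def virtual_channel_cfs(electrode_cfs, geometric=True):
    """Midpoints between adjacent electrode CFs: geometric (log-frequency) or arithmetic."""
    pairs = list(zip(electrode_cfs[:-1], electrode_cfs[1:]))
    if geometric:
        return [math.sqrt(lo * hi) for lo, hi in pairs]
    return [(lo + hi) / 2 for lo, hi in pairs]

vc_geo = virtual_channel_cfs(ELECTRODE_CFS)
vc_lin = virtual_channel_cfs(ELECTRODE_CFS, geometric=False)
print(len(vc_geo), [round(f) for f in vc_geo[:3]])
```

Sixteen electrodes yield fifteen adjacent-pair midpoints. The two definitions differ little where electrodes are closely spaced but diverge for the widely spaced top pair (4264 and 6665 Hz).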