Assessing the role of spectral and intensity cues in spectral ripple detection and discrimination in cochlear-implant users

Elizabeth S Anderson; Andrew J Oxenham; Peggy B Nelson; David A Nelson

doi:10.1121/1.4763999

J Acoust Soc Am. 2012 Dec;132(6):3925–3934. doi: 10.1121/1.4763999


PMCID: PMC3529540 PMID: 23231122

Abstract

Measures of spectral ripple resolution have become widely used psychophysical tools for assessing spectral resolution in cochlear-implant (CI) listeners. The objective of this study was to compare spectral ripple discrimination and detection in the same group of CI listeners. Ripple detection thresholds were measured over a range of ripple frequencies and were compared to spectral ripple discrimination thresholds previously obtained from the same CI listeners. The data showed that performance on the two measures was correlated, but that individual subjects’ thresholds (at a constant spectral modulation depth) for the two tasks were not equivalent. In addition, spectral ripple detection was often found to be possible at higher rates than expected based on the available spectral cues, making it likely that temporal-envelope cues played a role at higher ripple rates. Finally, spectral ripple detection thresholds were compared to previously obtained speech-perception measures. Results confirmed earlier reports of a robust relationship between detection of widely spaced ripples and measures of speech recognition. In contrast, intensity difference limens for broadband noise did not correlate with spectral ripple detection measures, suggesting a dissociation between the ability to detect small changes in intensity across frequency and across time.

INTRODUCTION

The ability to discriminate spectral shapes has relevance to the auditory processing of complex acoustic signals such as speech, as many speech cues are contained in the spectral envelope (Peterson and Barney, 1952; Pickett, 1999). One psychophysical task that is believed to measure spectral resolution is spectral ripple discrimination (Supin et al., 1994; Henry et al., 2005; Won et al., 2007; Anderson et al., 2011). For this procedure, the listener is required to discriminate between a spectrally rippled noise stimulus (i.e., broadband noise containing regular variations in amplitude along the frequency axis) and another stimulus in which the positions of spectral peaks and valleys are reversed. Typically, the spectral modulation depth is held constant and the ripple rate is adaptively varied to find the listener's ripple discrimination threshold. Spectral ripple discrimination has been shown to correlate with other measures of spectral resolution in cochlear-implant (CI) users, such as spatial tuning curves (Anderson et al., 2011).

Spectral ripple detection is a different psychoacoustic measure that also involves the use of rippled noise (Bernstein and Green, 1987; Eddins and Bero, 2007; Litvak et al., 2007). In this task, the listener must distinguish a spectrally rippled noise from an unrippled (spectrally flat) noise. Unlike the spectral ripple reversal task, where the spectral ripple rate is varied and the modulation depth held constant, the spectral ripple detection paradigm typically keeps the ripple rate constant and adaptively varies the modulation depth of the spectral ripples. The spectral ripple detection threshold, or spectral modulation threshold, represents the smallest modulation depth, or spectral contrast, in a rippled noise signal (in dB) that can be discriminated from an unmodulated standard noise stimulus at a given ripple rate (Litvak et al., 2007).

When spectral ripple detection thresholds are measured individually at different spectral modulation rates, the resulting pattern of thresholds as a function of modulation rate is referred to as the spectral modulation transfer function (SMTF) (Saoji and Eddins, 2007; Saoji et al., 2009). Eddins and Bero (2007) measured the SMTF in normal-hearing listeners for rates between 0.25 and 10 ripples per octave (rpo), and found that the general form of the SMTF in that range is bandpass, with best modulation detection in the region between 2 and 4 rpo, and poorer detection at lower and higher modulation frequencies. Neither carrier bandwidth (1–6 octaves) nor carrier frequency region (200–12 800 Hz) was found to influence spectral ripple detection thresholds, with the exception of carrier bands restricted to very low audio frequencies (e.g., 200–400 Hz), where ripple detection was poorer. Although poorer performance at very low frequencies is qualitatively consistent with expectations based on broader auditory filters at low frequencies (Glasberg and Moore, 1990), the ripple thresholds at these very low frequencies were quantitatively poorer than predicted by excitation patterns based on expected auditory filter bandwidths.

Poor spectral resolution results in a smoothing of the spectral envelope of the internal representation of the acoustic spectrum of a complex stimulus, such as rippled noise or speech (Horst, 1987). In the case of spectral ripple discrimination, a smoothed spectral envelope will result in a lower maximum ripple rate at which ripple phase reversals are detectable. In the case of spectral ripple detection, poorer spectral resolution should lead to poorer (higher) detection thresholds at higher rates, where the smoothing effect of broader filters is most prominent. In general, spectral ripple detection is thought to be limited by poor spectral resolution, as measured by notched-noise estimates of auditory filter bandwidth (e.g., Summers and Leek, 1994). If poorer spectral resolution is thought of as a low-pass filter in the spectral modulation frequency domain (Summers and Leek, 1994; Eddins and Bero, 2007; Saoji and Eddins, 2007), then the effect of poorer spectral resolution (i.e., broader filters) will be to reduce the upper cutoff frequency of the SMTF.
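The low-pass interpretation can be illustrated with a toy model. The sketch below is an assumption made purely for illustration and is not a model used in the studies cited. It treats limited spectral resolution as a Gaussian smoothing kernel, with a hypothetical s.d. `sigma_oct` in octaves, applied to the level (dB) spectrum on a log-frequency axis, and it reports how much of a sinusoidal ripple survives. A broader kernel makes the residual depth collapse at a lower ripple rate, which corresponds to a lower upper cutoff of the modeled SMTF.

```python
import numpy as np

def smoothed_ripple_depth(depth_db, rate_rpo, sigma_oct, n_oct=4.0, n=4096):
    """Peak-to-valley depth (dB) left after a sinusoidal dB ripple is smoothed
    by a Gaussian kernel with s.d. sigma_oct octaves (hypothetical stand-in for
    broad filters). The grid spans n_oct octaves and is treated as periodic,
    so rate_rpo * n_oct should be an integer."""
    x = np.arange(n) * n_oct / n                       # log2-frequency axis (octaves)
    ripple = (depth_db / 2) * np.sin(2 * np.pi * rate_rpo * x)
    dist = np.minimum(x, n_oct - x)                    # circular distance from x = 0
    kernel = np.exp(-0.5 * (dist / sigma_oct) ** 2)
    kernel /= kernel.sum()                             # unit gain for a flat spectrum
    # circular convolution of ripple and kernel via the FFT
    smoothed = np.real(np.fft.ifft(np.fft.fft(ripple) * np.fft.fft(kernel)))
    return smoothed.max() - smoothed.min()

def predicted_depth(depth_db, rate_rpo, sigma_oct):
    """Closed form: Gaussian smoothing scales a ripple of rate f (cycles/octave)
    by exp(-2 * pi**2 * sigma**2 * f**2), a low-pass transfer function."""
    return depth_db * np.exp(-2 * np.pi ** 2 * sigma_oct ** 2 * rate_rpo ** 2)

for sigma in (0.1, 0.2):
    print(sigma, [round(smoothed_ripple_depth(30, r, sigma), 1) for r in (0.5, 1, 2, 3)])
```

If one further assumes that detection requires a fixed internal ripple depth, the predicted detection threshold would scale with the reciprocal of this attenuation, so it rises steeply at high ripple rates and does so sooner for broader kernels.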

Directly reducing spectral contrasts has been shown to result in decreased speech recognition performance (Bacon and Brandt, 1982; Van Tasell et al., 1987; Baer and Moore, 1994), although the research literature has not consistently shown a significant correlation between measures of spectral resolution and speech perception in normal acoustic hearing (e.g., Dubno and Dirks, 1990; Surprenant and Watson, 2001). In general, speech in quiet remains intelligible even with very poor spectral resolution (e.g., Shannon et al., 1995), although somewhat better spectral resolution is required to understand speech in a background of noise (e.g., Dorman et al., 1998; Friesen et al., 2001).

Litvak et al. (2007) used the spectral ripple detection paradigm to explore the role of spectral resolution as a possible source of variance in word recognition by CI users. Litvak et al. (2007) also introduced varying amounts of spectral smearing to vocoder-processed spectrally rippled noise stimuli by varying the vocoder's carrier filter slopes, in order to match performance of normal-hearing (NH) listeners on the spectral ripple task to that of CI listeners. Their goal was to then compare performance on measures of vowel and consonant identification by CI listeners to performance of NH listeners with the vocoder parameters that produced the same spectral ripple detection thresholds. Average spectral ripple detection thresholds for ripple rates of 0.25 and 0.5 rpo were found to correlate strongly with vowel recognition across both groups of listeners, with vowel scores for NH listeners decreasing as the filter slopes of the noise-excited vocoder became shallower, simulating poorer spectral resolution. Spectral ripple detection thresholds also correlated with consonant recognition for both groups, but consonant performance decreased less than vowel performance for a given decrease in spectral resolution, perhaps reflecting less dependence of consonant recognition on fine-grained spectral information.

The finding of a relationship between spectral ripple detection for low ripple frequencies and speech recognition is somewhat counterintuitive, given that detection thresholds for higher, not lower, ripple frequencies should be more sensitive to reduced spectral resolution (Eddins and Bero, 2007; Saoji and Eddins, 2007). Saoji et al. (2009) investigated this discrepancy by comparing spectral ripple detection thresholds for an extended range of spectral ripple rates (0.25, 0.5, 1.0, and 2.0 rpo) for the same 25 subjects included in the earlier study. They found that the strongest predictors of vowel and consonant identification were detection thresholds at 0.25 and 0.5 rpo, respectively, consistent with the findings of Litvak et al. (2007). The authors concluded that the correlation between detection of low spectral modulation frequencies and vowel and consonant identification was not likely related to spectral resolution per se, but rather to differences in listeners’ ability to compare the amplitudes of widely spaced spectral maxima and minima in the spectral envelope spanning a broad frequency range (cf. Bernstein and Green, 1987; Eddins and Bero, 2007). Reduced spectral resolution may prevent the use of fine spectral detail carried by higher spectral modulation frequencies, requiring CI listeners to rely on features in the broad spectral envelope (i.e., low modulation frequencies).

The aim of the present study was to compare performance on spectral ripple discrimination and detection tasks in the same CI listeners. Spectral ripple detection thresholds were measured over a range of ripple rates for a group of CI subjects and compared to the spectral ripple discrimination thresholds previously obtained from the same subjects (Anderson et al., 2011). It was predicted that if both measures reflect spectral resolution, then the individual results in the two tasks should be related in a lawful manner. In addition, spectral ripple detection thresholds were compared to speech-perception measures from the same subjects, reported in Anderson et al. (2011), to test whether the earlier findings showing a relationship between detection of broadly spaced ripples and measures of speech recognition (Litvak et al., 2007; Saoji et al., 2009) could be replicated.

EXPERIMENT 1: SPECTRAL RIPPLE DETECTION

Subjects

Fifteen CI subjects (5 Clarion I, 5 Clarion II, and 5 Nucleus-22) participated in this study. See Table I for individual subject characteristics.

TABLE I.

Individual subject and device characteristics. Gender, age when tested, duration of implant use prior to the study, etiology of deafness, duration of bilateral, severe-to-profound hearing loss prior to implantation, implanted device, and sound processing strategy. The processing strategies include Continuous Interleaved Sampling (CIS), Multiple Pulsatile Stimulation (MPS), HiResolution-Paired (HiRes-P) and Sequential (HiRes-S), and Spectral Peak (SPEAK).

Subject code	M/F	Age (yr)	CI use (yr)	Etiology	Duration of deafness (yr)	Device	Strategy
C03	F	58.8	9.7	Familial Progressive SNHL	27	Clarion I	CIS
C05	M	52.5	10.2	Unknown	<1	Clarion I	CIS
C16	F	54.2	6.7	Progressive SNHL	13	Clarion I	MPS
C18	M	74.0	7.2	Otosclerosis	33	Clarion I	MPS
C23	F	48.1	6.4	Progressive SNHL; Mondini's	27	Clarion I	CIS
D02	F	58.2	6.4	Unknown	1	Clarion II	HiRes-P
D05	F	78.2	6.6	Unknown	3	Clarion II	HiRes-S
D08	F	55.9	5.0	Otosclerosis	13	Clarion II	HiRes-S
D10	F	53.8	5.2	Unknown	8	Clarion II	HiRes-S
D19	F	48.2	3.5	Unknown	7	Clarion II	HiRes-S
N13	M	69.9	17.5	Hereditary; Progressive SNHL	4	Nucleus 22	SPEAK
N14	M	63.5	13.9	Progressive SNHL	1	Nucleus 22	SPEAK
N28	M	68.8	11.8	Meningitis	<1	Nucleus 22	SPEAK
N32	M	40.1	10.3	Maternal Rubella	<1	Nucleus 22	SPEAK
N34	F	62.0	8.4	Mumps; Progressive SNHL	9	Nucleus 22	SPEAK


Stimuli

The stimuli were patterned after those used by Litvak et al. (2007) and were generated in MATLAB (The MathWorks, Natick, MA), using the AFC software package developed by Stephan Ewert (University of Oldenburg, Germany). Spectral modulations were applied to a broadband (350–5600 Hz) Gaussian noise, producing sinusoidal variations in level (dB) on a log-frequency axis using

$$X(f) = 10^{\frac{D}{2}\sin\left(2\pi f_s \log_2(f/L) + \theta\right)/20}, \tag{1}$$

where X is the expected amplitude at frequency f (a linear scale factor corresponding to a level of (D/2) sin(·) dB), D is the peak-to-valley modulation depth in dB, L is the low cutoff frequency of the noise passband, f_s is the spectral modulation frequency (in rpo), and θ is the starting phase of the ripple function. Noises were generated at seven fixed ripple rates: 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, and 3.0 rpo. The modulation depth was varied adaptively for this task. The signal duration was 400 ms, which included 20 ms raised-cosine onset and offset ramps. The stimuli were played through a Lynx L22 sound card (Lynx Studio Technology, Costa Mesa, CA) at a sampling rate of 48 kHz and presented in a double-walled, sound-attenuating booth through a single loudspeaker (Infinity RS1000, Harman International, Stamford, CT) located 1 m from the subject's seated position at 0° azimuth, at approximately head height. The average (root-mean-square) sound level of the noise was 60 dBA when measured at a point corresponding to the position of the subject's head. The level was roved across intervals within each trial by ±3 dB, and the starting phase of the spectral modulation was randomized for each trial.
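The sketch below shows one way Eq. (1) could be realized in Python with NumPy. It is not the authors' implementation (they used MATLAB and the AFC package). The frequency-domain shaping of a white-noise spectrum, the hypothetical function name `rippled_noise`, the unit-RMS normalization, and the uniform distribution of the ±3 dB rove are all assumptions. Absolute presentation level (60 dBA) depends on calibration and is not modeled.

```python
import numpy as np

def rippled_noise(depth_db, rate_rpo, phase=0.0, fs=48000, dur=0.4,
                  lo=350.0, hi=5600.0, ramp=0.02, rng=None):
    """Band-limited Gaussian noise with a sinusoidal dB ripple on a log2-frequency
    axis, following Eq. (1). depth_db = 0 gives the flat (unrippled) standard."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(fs * dur))
    spec = np.fft.rfft(rng.standard_normal(n))             # white Gaussian noise
    f = np.fft.rfftfreq(n, 1 / fs)
    band = (f >= lo) & (f <= hi)
    gain = np.zeros_like(f)                                 # zero outside the passband
    # Eq. (1): level = (D/2) sin(2 pi f_s log2(f/L) + theta) dB, amplitude = 10**(level/20)
    level_db = (depth_db / 2) * np.sin(2 * np.pi * rate_rpo * np.log2(f[band] / lo) + phase)
    gain[band] = 10 ** (level_db / 20)
    x = np.fft.irfft(spec * gain, n)
    nr = int(round(fs * ramp))                              # 20-ms raised-cosine ramps
    win = 0.5 * (1 - np.cos(np.pi * np.arange(nr) / nr))
    x[:nr] *= win
    x[-nr:] *= win[::-1]
    return x / np.sqrt(np.mean(x ** 2))                     # unit RMS; calibrate level elsewhere

# One hypothetical 3AFC trial: random ripple phase per trial, random target
# position, and an assumed uniform +/-3 dB level rove per interval.
rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi)
intervals = ([rippled_noise(20.0, 1.0, theta, rng=rng)]
             + [rippled_noise(0.0, 1.0, rng=rng) for _ in range(2)])
order = rng.permutation(3)
trial = [intervals[i] * 10 ** (rng.uniform(-3, 3) / 20) for i in order]
```

Shaping the spectrum of a single noise token keeps the ripple exactly sinusoidal in dB on the log-frequency axis; the random fluctuations of the noise spectrum around that expected level are what make X the *expected* amplitude in Eq. (1).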

Procedure

Subjects wore their own speech processors at typical use settings. The same settings were used as with the previously obtained measures of spectral ripple discrimination in the same listeners (Anderson et al., 2011). A three-interval, three-alternative forced-choice (3I-3AFC) adaptive procedure was employed. During each trial, subjects heard two intervals of unrippled broadband noise and one interval of rippled noise. The interval containing the rippled noise was selected at random with equal a priori probability on each trial. Subjects indicated which interval they judged as sounding different (corresponding to the rippled signal) by selecting the appropriate button on a computer screen offset to the right of the speaker. Correct-answer feedback was provided after each trial. Each test run began with a peak-to-valley ratio for the rippled stimulus of 20 dB; the modulation depth was then varied adaptively in a two-down, one-up adaptive procedure to track the peak-to-valley ratio that could be detected with an accuracy of about 71% correct (Levitt, 1971). The initial step size was 4 dB, changing to 2 dB after the first two reversals, and to 0.5 dB following two more reversals. Termination of the run occurred after a total of ten reversals. Spectral ripple detection threshold was defined for each run as the geometric mean of the peak-to-valley ratios (PVR) at the final six reversal points.
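The adaptive track can be sketched as follows. `run_staircase` and the simulated listener `observer` are hypothetical; the 3AFC psychometric function, its parameters, the 0.5-dB floor on the PVR, and the trial cap are assumptions for illustration. The rule itself follows the text: two consecutive correct responses lower the PVR, one error raises it, the step shrinks from 4 to 2 to 0.5 dB after the second and fourth reversals, the run ends at ten reversals, and the threshold is the geometric mean of the last six reversal PVRs.

```python
import math
import random

def run_staircase(p_correct, start_db=20.0, max_reversals=10, n_avg=6,
                  floor_db=0.5, max_trials=1000, rng=None):
    """One 2-down/1-up adaptive run on the peak-to-valley ratio (dB).
    p_correct(depth_db) gives the probability of a correct 3AFC response.
    Returns the geometric mean of the PVRs at the last n_avg reversals."""
    rng = random.Random() if rng is None else rng
    depth, n_correct, direction = start_db, 0, 0     # direction: -1 down, +1 up
    reversals = []
    for _ in range(max_trials):
        if rng.random() < p_correct(depth):
            n_correct += 1
            if n_correct < 2:
                continue                             # need two in a row to step down
            n_correct, new_dir = 0, -1
        else:
            n_correct, new_dir = 0, +1
        if direction != 0 and new_dir != direction:
            reversals.append(depth)                  # track changed direction here
            if len(reversals) == max_reversals:
                break
        direction = new_dir
        # 4 dB steps, 2 dB after two reversals, 0.5 dB after two more
        step = 4.0 if len(reversals) < 2 else (2.0 if len(reversals) < 4 else 0.5)
        depth = max(floor_db, depth + new_dir * step)
    else:
        raise RuntimeError("run ended before reaching the reversal count")
    last = reversals[-n_avg:]
    return math.exp(sum(math.log(r) for r in last) / len(last))

def observer(depth_db, thr=10.0, slope=1.5):
    """Hypothetical 3AFC listener: 1/3 chance floor, logistic rise with depth."""
    return 1 / 3 + (2 / 3) / (1 + math.exp(-(depth_db - thr) / slope))

print(run_staircase(observer, rng=random.Random(1)))
```

A 2-down/1-up rule converges near the depth giving about 70.7% correct, which for this hypothetical observer is roughly 10.4 dB PVR; single runs scatter around that value.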

Each subject completed six runs for each of the seven ripple rates, for a total of 42 experimental runs. Thresholds from the first run for each ripple-rate condition were discarded as practice runs, and any thresholds that were more than 3 standard deviations (s.d.) away from the mean of the remaining measurements for that condition were excluded as outliers. Typically, thresholds from five runs for each ripple rate were used to calculate a geometric mean threshold for each subject.
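The per-condition averaging can be sketched as below, with a hypothetical function `condition_threshold`. One interpretive assumption is flagged in the code: each run is compared with the mean and s.d. of the *other* retained runs (leave-one-out). With only five runs, a value can never lie more than (n − 1)/√n ≈ 1.79 sample s.d. from a mean that includes that value itself (Samuelson's inequality), so a reading that includes the tested value could never exclude anything.

```python
import math
import statistics

def condition_threshold(run_thresholds, n_sd=3.0):
    """Combine per-run thresholds (dB PVR) for one ripple rate.
    Drops the first run as practice, excludes any remaining run lying more than
    n_sd sample s.d. from the mean of the *other* retained runs (leave-one-out
    reading of the criterion; an assumption), and returns the geometric mean
    of the runs that are kept."""
    runs = list(run_thresholds[1:])          # first run = practice, discarded
    kept = []
    for i, r in enumerate(runs):
        others = runs[:i] + runs[i + 1:]
        if len(others) >= 2:
            m, s = statistics.mean(others), statistics.stdev(others)
            if abs(r - m) > n_sd * s:
                continue                     # outlier: exclude this run
        kept.append(r)
    return math.exp(statistics.mean([math.log(r) for r in kept]))

# Six hypothetical runs: practice run first, last run is an outlier.
print(condition_threshold([30.0, 10.0, 12.0, 11.0, 10.0, 40.0]))
```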

Results

Spectral ripple detection thresholds varied widely across subjects and across ripple frequencies. Individual thresholds ranged from about 5 to 45 dB PVR. The pattern of results is in line with that reported by Saoji et al. (2009). In general, greater spectral modulation was required for detection as the ripple frequency increased, although some subjects showed non-monotonicities in their SMTFs, as illustrated in Fig. 1. In addition, a range of different shapes of SMTFs can be observed, similar to the findings of Saoji et al. (2009), with some individual functions steeper than others. As expected from Fig. 1, a repeated-measures analysis of variance (ANOVA) indicated a main effect of ripple frequency [F(2.29, 29.73) = 26.04, p < 0.001]. (Here and elsewhere, a Greenhouse–Geisser correction was applied where appropriate to account for lack of sphericity.)

FIG. 1. Spectral modulation transfer functions (SMTFs) for 15 CI subjects. Ripple detection thresholds (in dB) at each of seven spectral ripple frequencies (0.25, 0.5, 0.75, 1.0, 1.5, 2.0, and 3.0 rpo) are shown.

The spectral ripple detection thresholds were compared to spectral ripple discrimination thresholds from Anderson et al. (2011, Table II). The discrimination thresholds were measured with stimuli using a 30 dB spectral PVR; this value represents an approximation of "maximum" depth, close to 95% modulation on a linear modulation scale.
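The correspondence between a dB peak-to-valley ratio and a linear modulation depth can be checked directly. The sketch assumes that PVR is expressed as 20 log10 of the peak/valley amplitude ratio and that linear modulation depth is m = (A_peak − A_valley)/(A_peak + A_valley); the function names are hypothetical. Under these assumptions a 30 dB PVR corresponds to m ≈ 0.94.

```python
import math

def pvr_to_modulation(pvr_db):
    """Linear modulation depth m = (A_peak - A_valley) / (A_peak + A_valley)
    for a peak-to-valley ratio in dB (20*log10 of the amplitude ratio)."""
    ratio = 10 ** (pvr_db / 20)
    return (ratio - 1) / (ratio + 1)

def modulation_to_pvr(m):
    """Inverse mapping: PVR in dB for a linear modulation depth 0 <= m < 1."""
    return 20 * math.log10((1 + m) / (1 - m))

print(round(pvr_to_modulation(30), 3))    # 0.939
print(round(modulation_to_pvr(0.95), 1))  # 31.8
```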

TABLE II.

Correlations and corresponding p values (with no post-hoc Bonferroni corrections for multiple comparisons) for three summary measures of ripple detection (the average threshold at ripple rates of 0.25 and 0.5 rpo, the average threshold at ripple rates of 2 and 3 rpo, and the interpolated ripple rate at which the threshold peak-to-valley ratio was 30 dB), and speech recognition measures.

Speech measure	Avg. threshold, 0.25 and 0.5 rpo	Avg. threshold, 2.0 and 3.0 rpo	SMT_30 dB (interpolated rate)
Sentence recognition in quiet (RAU)	r² = 0.68, p < 0.001	r² = 0.37, p = 0.02	r² = 0.33, p = 0.02
Vowel identification in quiet (RAU)	r² = 0.63, p < 0.001	r² = 0.36, p = 0.02	r² = 0.16, p = 0.15
SNR_50%, sentences	r² = 0.26, p = 0.09	r² = 0.14, p = 0.24	r² = 0.09, p = 0.35
SNR_50%, vowels	r² = 0.30, p = 0.04	r² = 0.09, p = 0.31	r² = 0.04, p = 0.51


Comparing spectral ripple detection and discrimination

If spectral ripple detection and discrimination are both mediated by a common underlying mechanism, then performance in the two tasks should be related. For instance, if it is necessary to detect spectral modulation in order to discriminate changes in its phase, then we might expect the ripple frequency at which a detection threshold of 30 dB is obtained to correspond to the spectral ripple discrimination threshold, since the modulation depth used in the ripple discrimination task was 30 dB.¹ However, the measured spectral ripple detection thresholds were better (lower) than 30 dB for some subjects even at the highest ripple rate originally tested (3 rpo). Consequently, the spectral ripple detection task was conducted at additional higher ripple frequencies (6, 9, 12, 15, 18, and 21 rpo) for each subject, as needed, to obtain a ripple detection threshold at or above 30 dB. From this extended SMTF, the spectral ripple frequency corresponding to a ripple detection threshold of 30 dB was estimated by linear interpolation between existing points on the SMTF. Simple linear interpolation, rather than a best-fit function (e.g., Saoji et al., 2009), was chosen because of the irregularities and non-monotonicities present in some of the SMTFs. Figure 2 displays the extended SMTFs for each subject, grouped by implant device. Note the apparent differences in results as a function of device type. A between-subjects ANOVA on the ripple detection thresholds at 3 rpo revealed a marginally significant effect of device [F(2,13) = 4.313, p = 0.04], presumably reflecting the generally lower limits found in the Nucleus users; however, the individual differences are so large as to preclude any general statement. The same analysis using the interpolated 30 dB threshold in the detection task showed no significant differences between devices [F(2,14) = 1.935; p = 0.19].
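The interpolation step can be sketched as below with a hypothetical helper `rate_at_threshold`. Two details are assumptions, because the text does not specify them: interpolation is linear on a linear ripple-rate (rpo) axis, and for a non-monotonic SMTF the first upward crossing of the 30 dB criterion is used. The example SMTF values are invented for illustration and are not data from this study.

```python
def rate_at_threshold(rates, thresholds, criterion=30.0):
    """Ripple rate (rpo) at which an SMTF first reaches `criterion` dB PVR.
    Straight-line interpolation between the two measured points that bracket
    the first upward crossing; returns None if no crossing is found."""
    pts = sorted(zip(rates, thresholds))
    for (r0, t0), (r1, t1) in zip(pts, pts[1:]):
        if t0 < criterion <= t1:
            return r0 + (criterion - t0) * (r1 - r0) / (t1 - t0)
    return None

# Hypothetical extended SMTF (not data from this study)
rates = [0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0, 6.0, 9.0]
thresholds = [8.0, 9.0, 11.0, 12.0, 15.0, 18.0, 22.0, 27.0, 33.0]
print(rate_at_threshold(rates, thresholds))  # 7.5 (between 6 and 9 rpo)
```

For SMTFs with a dip like the one in Fig. 5, the choice between the first and last crossing can change the estimate considerably, which is one reason the interpolated value should be read with the individual functions in view.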

FIG. 2. SMTFs for Clarion-I, Clarion-II, and Nucleus subjects (top, middle, and lower panels, respectively) over an extended range of spectral ripple rates. The dotted horizontal lines correspond to a ripple detection threshold of 30 dB.

Figure 3 illustrates the relationship between the interpolated spectral ripple detection threshold and spectral ripple discrimination thresholds obtained from Anderson et al. (2011). Linear regression analysis revealed a significant correlation between the two variables (r² = 0.42, p = 0.01). Despite the significant correlation, the correspondence is surprisingly poor. For instance, listeners with very similar ripple discrimination thresholds (between 2 and 3 rpo) had 30 dB detection thresholds at rates ranging from about 3 to more than 17 rpo. For most (but not all) listeners, the spectral ripple rate at which a 30 dB PVR was detected was substantially higher than the corresponding ripple discrimination threshold, and the function relating the ripple rate corresponding to a detection threshold of 30 dB to the ripple discrimination threshold had a slope of approximately 5. This pattern of results is difficult to explain in terms of the detectability of the spectral contrasts within each stimulus.² Potential reasons for this apparent discrepancy are addressed in Sec. 2E.

FIG. 3. Interpolated ripple frequency (in rpo) at which the ripple detection threshold, or spectral modulation threshold (SMT), equals 30 dB, as a function of spectral ripple discrimination threshold (in rpo).

Spectral ripple detection and speech recognition

Spectral ripple detection thresholds were examined in relation to sentence and vowel recognition performance [see Anderson et al. (2011) for additional details on speech recognition testing]. Two lists of IEEE sentences (IEEE, 1969), recorded by one male and one female talker, were administered for each experimental condition. Each list contained ten sentences with five key words each, for a total of 100 key words per condition. After each presentation, subjects repeated aloud the words they heard. The Hillenbrand vowels (Hillenbrand et al., 1995) comprised 11 vowels in an /hVd/ context spoken by six male talkers. Subjects identified the token they heard by selecting the corresponding item from the full set of possible items on a computer screen. Stationary Gaussian background noise was spectrally shaped to match the long-term average spectrum of each type of speech material (sentences and vowels). Speech and noise were mixed to produce the appropriate speech-to-noise ratios (SNRs), and acoustic verification was performed with a sound level meter. Speech materials were presented in the sound field at 65 dBA, in quiet and at fixed SNRs of 0, 5, 10, 15, and 20 dB. Performance in quiet was quantified in rationalized arcsine units, or RAU scores (Studebaker, 1985), representing correct key word recognition for the sentence materials and correct vowel identification. Performance in noise is reported as the SNR corresponding to 50% correct recognition, derived from logistic functions fitted to the speech psychometric functions (percent correct as a function of SNR).
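Both speech scores can be sketched in a few lines. `rau` implements the standard rationalized arcsine transform of Studebaker (1985) for X correct out of N items. `snr_50` is a hypothetical, simplified stand-in for the logistic fit: it assumes the upper asymptote is fixed at the listener's percent correct in quiet and fits the remaining two parameters by least squares on the logit scale rather than by maximum likelihood. Under that assumption the fitted function can only reach 50% if performance in quiet exceeds 50%, which is consistent with the exclusion criterion described below.

```python
import math
import numpy as np

def rau(n_correct, n_items):
    """Rationalized arcsine units (Studebaker, 1985)."""
    theta = (math.asin(math.sqrt(n_correct / (n_items + 1)))
             + math.asin(math.sqrt((n_correct + 1) / (n_items + 1))))
    return (146 / math.pi) * theta - 23

def snr_50(snrs_db, pct_correct, pct_quiet, eps=1e-3):
    """SNR (dB) at which a logistic psychometric function reaches 50% correct.
    Simplified sketch: upper asymptote fixed at the score in quiet (an
    assumption); slope and midpoint fitted by least squares on the logit scale."""
    if pct_quiet <= 50:
        return None                                   # function never reaches 50%
    q = np.clip(np.asarray(pct_correct, float) / pct_quiet, eps, 1 - eps)
    slope, intercept = np.polyfit(np.asarray(snrs_db, float), np.log(q / (1 - q)), 1)
    q50 = 50 / pct_quiet                              # 50% absolute, as a fraction of the asymptote
    return (math.log(q50 / (1 - q50)) - intercept) / slope

print(rau(50, 100))                                   # 50 key words correct out of 100
```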

To analyze the relationship between speech recognition and spectral ripple detection, three summary measures of ripple detection were derived. The first was the average threshold at ripple rates of 0.25 and 0.5 rpo, representing performance at low ripple rates. The second was the average threshold at ripple rates of 2 and 3 rpo, representing performance at higher ripple rates. The third measure was the interpolated ripple rate at which the threshold PVR was 30 dB, representing the upper limit of spectral ripple detection.

The average of spectral ripple detection thresholds at 0.25 and 0.5 rpo showed a strong relationship with sentence recognition in quiet (r² = 0.68, p < 0.001), as shown in Fig. 4A, and vowel recognition in quiet (r² = 0.63, p < 0.001), shown in Fig. 4B. Data from all 15 subjects are included.

FIG. 4. RAU scores for sentence recognition (A) and vowel recognition (B) in quiet as a function of the average SMT at 0.25 and 0.5 rpo. (C), (D) SNR_50% for sentences and vowels, respectively, as a function of the average of the ripple detection thresholds at 0.25 and 0.5 rpo.

The relationships between spectral ripple detection and sentence and vowel recognition in noise are depicted in Figs. 4C and 4D, respectively. Plotted are the SNRs corresponding to 50% correct performance as a function of ripple detection threshold. Three subjects (C18, C23, N28) were excluded from the sentence analysis and one (C23) from the vowel analysis because their asymptotic performance in quiet was below 50%. The SNR required for 50% word recognition in sentences for the remaining 12 subjects showed a trend, but the relationship failed to reach significance (r² = 0.26, p = 0.09); the corresponding SNR for vowel recognition in noise for 14 subjects showed a borderline significant correlation with ripple detection (r² = 0.30, p = 0.04). The poorer correlations found for speech in noise may be due, at least in part, to the smaller number of subjects in the noise conditions.

Table II displays results from all of the regression analyses for each of the three summary ripple detection measures, with each of the speech measures obtained, uncorrected for multiple comparisons. The average threshold at ripple rates of 0.25 and 0.5 rpo shows consistently strong correlations with sentence and vowel recognition in quiet, whereas correlations with speech measures at higher ripple rates are less robust, in agreement with other studies (Litvak et al., 2007; Saoji et al., 2009).

Discussion

Comparing spectral ripple detection and discrimination as measures of spectral resolution

One aim of this study was to test whether spectral ripple detection and discrimination were both measures of the same underlying mechanisms involving spectral resolution. Although the two measures were correlated, the relationship was not clear-cut and did not conform to expectations based on predictions from simple models of detection theory. The results fail to support the suggestion of Saoji et al. (2009) that thresholds from the spectral ripple discrimination task represent points on the high-modulation-frequency end of the SMTF. In fact, spectral ripple detection for a modulation depth of 30 dB generally fell at a higher ripple rate than the corresponding ripple discrimination thresholds.

The better performance in the detection task might reflect differences in the listening strategies that individuals use for the two tasks. For instance, a strategy for the spectral ripple detection task might be to compare each interval to an internal template of the standard stimulus (unmodulated noise), which remains constant across all trials (e.g., Sabin et al., 2012). In the discrimination task, by contrast, a listener would not have a constant internal template, because of the randomized starting phase used in all tasks, so a direct comparison of all three intervals with each other is necessary for successful performance. The demands on working memory may therefore be greater in the discrimination task than in the detection task.

However, the extended-frequency ripple detection results raise a further question about the validity of the ripple detection technique as a measure of spectral resolution. Some CI users were able to detect spectral ripples as closely spaced as 20 rpo, well beyond their spectral ripple discrimination thresholds and beyond the capabilities predicted from the bandwidths of the CI analysis filters. The analysis filters used by the individual subjects were, on average, never narrower than 1/4 octave, meaning that spectral ripple rates greater than 4 rpo (i.e., more than one ripple cycle per filter bandwidth) should not have been reliably detected. Figure 5 shows an example of a non-monotonicity in the function, with a dip representing better performance (smaller modulation depth at threshold) at higher ripple densities. The non-monotonicity may indicate a transition from one cue (e.g., spectral contrast) at low ripple rates to another cue at higher rates.
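The 4-rpo limit follows from simple arithmetic: at 4 rpo, one full ripple cycle fits inside a 1/4-octave filter. The sketch below makes this concrete under a deliberately idealized, hypothetical model: each channel is rectangular on a log-frequency axis, exactly `width_oct` octaves wide, and sums power, with channel centers sliding continuously across the band. Real CI filters are not rectangular, and current spread adds further smoothing, so this is only a rough upper bound on resolvable rates.

```python
import numpy as np

def channel_contrast_db(depth_db, rate_rpo, width_oct, n_oct=4.0, n=8000):
    """Peak-to-valley contrast (dB) across the outputs of idealized channels:
    rectangular in log2 frequency, width_oct octaves wide, summing power.
    Channel centers slide continuously across an n_oct-octave band."""
    x = np.arange(n) * n_oct / n                              # octaves above the band edge
    power = 10 ** ((depth_db / 2) * np.sin(2 * np.pi * rate_rpo * x) / 10)
    k = int(round(width_oct * n / n_oct))                     # samples per channel
    out_db = 10 * np.log10(np.convolve(power, np.ones(k) / k, mode="valid"))
    return out_db.max() - out_db.min()

for rate in (1, 2, 3, 4, 6):
    print(rate, round(channel_contrast_db(30, rate, 0.25), 2))
```

At 4 rpo every channel integrates exactly one ripple cycle, so all channel outputs are equal and the contrast vanishes; above that rate only small residual contrasts remain.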

FIG. 5. Individual SMTF (ripple detection threshold as a function of spectral ripple frequency) for subject D10. The vertical line indicates the spectral ripple discrimination threshold for this listener.

To further investigate factors underlying spectral ripple detection, three NH subjects (ages 34–52 with audiometric thresholds no greater than 20 dB HL at octave frequencies between 250 and 8000 Hz) were tested on an identical spectral ripple detection procedure in the sound field, except that an extended range of spectral ripple rates was used: 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0, 6.0, 9.0, 12.0, 15.0, 21.0, 30.0, 45.0, and 60.0 rpo. Note that according to estimates of human auditory filter bandwidths (e.g., Glasberg and Moore, 1990; Oxenham and Shera, 2003), filters are typically no narrower than one-sixth to one-tenth of an octave (depending on the center frequency and the method of measurement). Therefore, the detection of spectral ripples should be limited to ripple rates below about 10 rpo. (This expected limit may explain why earlier studies in NH listeners have typically been limited to rates below about 10 rpo.) The results from the three NH listeners are shown in Fig. 6. Measurable thresholds were obtained for ripple rates well beyond the expected theoretical limit, and all three subjects were able to detect ripples out to the highest ripple rate tested of 60 rpo. Although not as prominent as in Fig. 5, there is a hint of a non-monotonicity in the function, with a local peak just below 10 rpo, which is found in all three NH listeners.

FIG. 6. SMTFs for three normal-hearing subjects. Ripple detection thresholds (in dB) at each of 15 spectral ripple frequencies (0.25–60 rpo) are displayed. Spectral ripple discrimination thresholds are indicated by vertical lines.

It seems highly unlikely that listeners are able to resolve spectral peaks presented at 60 rpo; instead, some other cue must be available. One possibility is that interactions between adjacent peaks in the spectrally rippled noise provide usable temporal cues. Specifically, closely spaced spectral peaks might produce regular temporal fluctuations, or beats, in the temporal envelope, providing a salient cue for discriminating the rippled signal from the flat-spectrum noise standard. Viewed in this light, the non-monotonicity illustrated in Fig. 5, and to a lesser extent in Fig. 6, may reflect two regions in which different cues are used to perform the task. At low ripple rates, subjects detect the ripples via a spectral-contrast mechanism, because the temporal cues are too weak, both because widely spaced peaks produce fluctuations at a high rate (set by the frequency difference between adjacent peaks) and because of the attenuating effects of the auditory filters. At higher ripple rates, the spectral cues become weaker, but the temporal cues become more salient, as the denser spectral peaks produce slower and more detectable temporal fluctuations. To our knowledge, no published reports have addressed this possibility systematically, but work using iterated rippled noise shows that temporal periodicity in noise waveforms can be observed as spectral peaks, and vice versa (e.g., Yost, 1996; Yost et al., 1998; Wiegrebe and Patterson, 1999).
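The link between ripple density and fluctuation rate can be made concrete with a short calculation. Adjacent peaks are 1/f_s octaves apart, so near a frequency f they differ by f(2^(1/f_s) − 1) Hz. The sketch below (hypothetical helper `peak_spacing_hz`) assumes, as the hypothesis above does, that the dominant envelope fluctuation rate equals this spacing; it does not model auditory filtering. Near 1 kHz the spacing is 1000 Hz at 1 rpo, about 260 Hz at 3 rpo, and only about 12 Hz at 60 rpo.

```python
def peak_spacing_hz(freq_hz, rate_rpo):
    """Frequency difference (Hz) between a ripple peak at freq_hz and the next
    peak above it; peaks are 1/rate_rpo octaves apart on a log2 axis."""
    return freq_hz * (2 ** (1 / rate_rpo) - 1)

# Candidate envelope-fluctuation (beat) rates near 1 kHz, assuming the dominant
# fluctuation rate equals the spacing of adjacent peaks.
for rate in (0.5, 1, 3, 10, 30, 60):
    print(f"{rate:>5} rpo: {peak_spacing_hz(1000, rate):8.1f} Hz")
```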

In summary, spectral ripple detection at ripple rates higher than 2–3 rpo in CI users (or 8–10 rpo in NH listeners) may be influenced by temporal-envelope cues and therefore may not solely reflect spectral resolution. This problem may not apply to spectral ripple discrimination, because the temporal-envelope fluctuations would be present in both the target and reference stimuli and therefore would not provide a reliable cue for performing the task.

Returning to spectral ripple detection at very low ripple rates, the extent to which the results reflect spectral resolution, as opposed to spectral profile discrimination (e.g., Bernstein and Green, 1988) or intensity resolution, is unclear. At low ripple rates, if the spectral ripples of the stimulus are much broader than the bandwidths of the effective analysis filters (either in the CI processor or in terms of current spread within the cochlea), then changes in the effective filter bandwidths are unlikely to affect spectral ripple detection thresholds. Instead, thresholds are likely to be determined primarily by the subject's ability to detect across-frequency (or across-channel) differences in intensity. If so, then this ability may be related to intensity difference limens, i.e., the just-noticeable difference in the intensity of a sound. This hypothesis is tested in experiment 2.

Relationship between ripple detection and speech perception

Three measures of spectral ripple detection were compared to speech recognition performance: (1) detection threshold for broadly spaced spectral ripples (the average of 0.25 and 0.5 rpo); (2) detection threshold for more closely spaced ripples (the average of 2.0 and 3.0 rpo); (3) the interpolated ripple density from each subject's SMTF corresponding to a 30 dB peak-to-valley modulation depth. It might be predicted that the ability to detect more closely spaced ripples would be most predictive of speech perception, given that this measure is most likely to reflect spectral resolution, rather than intensity resolution (low rates) or temporal processing (interpolated 30 dB threshold). However, the measure of ripple detection that correlated most strongly with speech recognition in quiet was the threshold for the low ripple rates of 0.25 and 0.5 rpo, corroborating the results reported by Litvak et al. (2007) and Saoji et al. (2009). Correlations were less robust for all of the speech measures in noise, perhaps partly because fewer subjects were included in that comparison.
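The three summary measures can be derived from a single listener's SMTF as sketched below. The threshold values are made-up placeholders, not data from this study, and the dictionary representation and function names are hypothetical. Interpolating linearly in log2(ripple density) is also an assumption; the interpolation scale is not specified in this section.

```python
import math


def mean_threshold(thresholds_db, densities):
    """Average detection threshold (dB peak-to-valley) over the listed ripple densities."""
    return sum(thresholds_db[d] for d in densities) / len(densities)


def density_at_criterion(thresholds_db, criterion_db=30.0):
    """Ripple density (rpo) at which the SMTF first rises through criterion_db.

    Interpolates linearly in log2(ripple density). Returns None if the
    thresholds never reach the criterion.
    """
    points = sorted(thresholds_db.items())
    for (d0, t0), (d1, t1) in zip(points, points[1:]):
        if t0 <= criterion_db <= t1 and t1 != t0:
            frac = (criterion_db - t0) / (t1 - t0)
            log_d = math.log2(d0) + frac * (math.log2(d1) - math.log2(d0))
            return 2.0 ** log_d
    return None


if __name__ == "__main__":
    # Placeholder SMTF: ripple density (rpo) -> detection threshold (dB).
    smtf = {0.25: 9.0, 0.5: 11.0, 1.0: 15.0, 2.0: 22.0, 3.0: 26.0, 4.0: 34.0}
    print("measure 1 (0.25 and 0.5 rpo):", mean_threshold(smtf, [0.25, 0.5]))
    print("measure 2 (2.0 and 3.0 rpo):", mean_threshold(smtf, [2.0, 3.0]))
    print("measure 3 (rpo at 30 dB):", round(density_at_criterion(smtf), 3))
```

Measures (1) and (2) are thresholds in dB, where lower is better, whereas measure (3) is a ripple density in rpo, where higher is better; the sign of any correlation with speech scores should be read with that difference in mind.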

In general, the relationship between detection of low spectral ripple rates and speech perception is consistent with a study by Liu and Eddins (2008), who measured vowel identification by NH listeners using vowel stimuli that had been progressively high-pass filtered in the spectral modulation domain. Their data suggested that the spectral cues most important for vowel identification are represented by spectral modulation frequencies below 2 cycles/octave.
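High-pass filtering in the spectral modulation domain means treating the spectral envelope (in dB, on a log-frequency axis) as a signal and removing its slow variations. The sketch below uses an idealized FFT brick-wall filter on an envelope sampled uniformly in log2 frequency; it is a generic illustration under that assumption, not the filtering procedure used by Liu and Eddins (2008), and the function name and test envelope are hypothetical.

```python
import numpy as np


def highpass_spectral_modulation(envelope_db, octave_span, cutoff_cpo):
    """Remove spectral-modulation components below cutoff_cpo (cycles/octave).

    envelope_db is a spectral envelope sampled uniformly on a log2-frequency axis
    covering octave_span octaves. The FFT treats the envelope as periodic, so this
    is an idealized brick-wall filter.
    """
    n = len(envelope_db)
    spectrum = np.fft.rfft(envelope_db)
    mod_freqs = np.fft.rfftfreq(n, d=octave_span / n)  # cycles per octave
    spectrum[mod_freqs < cutoff_cpo] = 0.0  # also removes the mean level (DC)
    return np.fft.irfft(spectrum, n=n)


if __name__ == "__main__":
    span = 4.0  # hypothetical 4-octave envelope
    n = 512
    x = np.arange(n) * span / n
    broad = 10.0 * np.sin(2 * np.pi * 0.5 * x)  # 0.5 cycles/octave
    fine = 3.0 * np.sin(2 * np.pi * 3.0 * x)    # 3 cycles/octave
    filtered = highpass_spectral_modulation(60.0 + broad + fine, span, 2.0)
    print(np.allclose(filtered, fine))  # only the 3-cycle/octave detail survives
```

With a 2-cycle/octave cutoff, the broad 0.5-cycle/octave shape disappears and only the fine detail remains; the Liu and Eddins result implies that an envelope reduced in this way carries little of the information needed for vowel identification.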

If spectral resolution strongly influences speech perception, then these results are somewhat counterintuitive, given that detection thresholds for higher ripple frequencies, not lower ones, should be more sensitive to reduced spectral resolution and therefore might be expected to correlate more strongly with speech recognition. Saoji et al. (2009) suggested that the correlation between detection of low spectral modulation frequencies and vowel and consonant identification might not be related to spectral resolution per se, but rather to differences in listeners’ ability to compare widely spaced spectral maxima and minima in the spectral envelope spanning a broad frequency range, i.e., profile analysis (Green et al., 1984; Bernstein and Green, 1988). In other words, adequate speech perception may not require access to high spectral modulation frequencies, so long as the intensity contrasts at lower spectral modulation frequencies are well resolved. Another possibility is that subjects were basing their judgments on small changes in overall loudness, rather than spectral contrasts. Although this possibility is rendered less likely by our use of level roving and randomized starting modulation phase, the ability of subjects to make use of any overall loudness cue would again depend on their ability to resolve small differences in intensity. This possibility is explored further in experiment 2.
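The logic of level roving can be illustrated with a small simulation of a listener who ignores spectral shape and always picks the louder interval. Everything here is hypothetical: a two-interval task is assumed for simplicity, and the size of any overall-level cue and the rove range are placeholders, since neither is given in this section. The simulation shows that a rove of several dB pushes such a loudness-only strategy toward chance, while a residual cue still yields above-chance performance that would depend on intensity resolution.

```python
import numpy as np


def loudness_only_percent_correct(level_cue_db, rove_range_db, n_trials=200_000, seed=1):
    """Monte Carlo proportion correct for a listener who always picks the louder interval.

    Hypothetical two-interval task: the rippled interval carries an overall-level
    cue of level_cue_db, and each interval's level is independently roved by a
    uniform amount spanning rove_range_db (e.g. 10 means +/-5 dB).
    """
    rng = np.random.default_rng(seed)
    half = rove_range_db / 2.0
    target = level_cue_db + rng.uniform(-half, half, n_trials)
    standard = rng.uniform(-half, half, n_trials)
    return float(np.mean(target > standard))


def analytic_percent_correct(level_cue_db, rove_range_db):
    """Closed form for the same rule (difference of two uniforms is triangular).

    Valid for level_cue_db >= 0 and rove_range_db > 0.
    """
    c, r = level_cue_db, rove_range_db
    if c >= r:
        return 1.0
    return 1.0 - (r - c) ** 2 / (2.0 * r ** 2)


if __name__ == "__main__":
    for rove in (2.0, 6.0, 10.0):  # placeholder rove ranges in dB
        print(rove, round(loudness_only_percent_correct(1.0, rove), 3),
              round(analytic_percent_correct(1.0, rove), 3))
```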

EXPERIMENT 2: INTENSITY DISCRIMINATION

In this experiment, intensity difference limens were measured using broadband, spectrally flat noise stimuli, similar in bandwidth to the stimuli from experiment 1. The rationale was based on the possibility that performance in the spectral ripple detection task, particularly at low ripple rates, was governed primarily by intensity resolution, rather than spectral resolution. This was most likely to be in the form of spectral profile analysis, with across-frequency (or across-channel) comparisons of intensity, but might also be in the form of comparisons of the overall perceived intensity (loudness) of the spectrally modulated and unmodulated stimuli. In either case, we might expect to see a relationship between spectral ripple detection thresholds at low ripple rates and intensity difference limens.
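Intensity difference limens are commonly expressed in several equivalent forms: the level difference ΔL in dB, the Weber fraction ΔI/I, or the Weber fraction in dB, 10log10(ΔI/I). These are related by ΔL = 10log10(1 + ΔI/I). The conversions below are standard; which form experiment 2 uses is not stated in this paragraph, and the function names and example values are illustrative.

```python
import math


def weber_fraction(delta_l_db):
    """Convert a level difference limen dL (dB) to the Weber fraction dI/I."""
    return 10.0 ** (delta_l_db / 10.0) - 1.0


def delta_l_db(weber):
    """Convert a Weber fraction dI/I back to a level difference dL in dB."""
    return 10.0 * math.log10(1.0 + weber)


def weber_fraction_db(delta_l):
    """Express the Weber fraction in dB, i.e. 10*log10(dI/I)."""
    return 10.0 * math.log10(weber_fraction(delta_l))


if __name__ == "__main__":
    for dl in (0.5, 1.0, 2.0):  # illustrative dL values, not study data
        print(f"dL = {dl} dB -> dI/I = {weber_fraction(dl):.3f}, "
              f"10log(dI/I) = {weber_fraction_db(dl):.2f} dB")
```

For example, a 1-dB difference limen corresponds to ΔI/I of about 0.26, or about -5.9 dB on the 10log10(ΔI/I) scale. Because the mapping is nonlinear, correlations with ripple-detection thresholds can differ slightly depending on which scale is used.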