Abstract
Objectives:
The objective of our study is to understand how listeners with and without sensorineural hearing loss (SNHL) use energy and temporal envelope cues to detect tones in noise. Previous studies of low-frequency tone-in-noise detection have shown that when energy cues are made less reliable using a roving-level paradigm, thresholds of listeners with normal hearing (NH) are only slightly increased. This result is consistent with studies demonstrating the importance of temporal envelope cues for masked detection. In contrast, roving-level detection thresholds are more elevated in listeners with SNHL at the test frequency, suggesting stronger weighting of energy cues. The present study extended these tests to a wide range of frequencies and stimulus levels. The authors hypothesized that individual listeners with SNHL use energy and temporal envelope cues differently for masked detection at different frequencies and levels, depending on the degree of hearing loss.
Design:
Twelve listeners with mild to moderate SNHL and 12 NH listeners participated. Tone-in-noise detection thresholds at 0.5, 1, 2, and 4 kHz in 1/3 octave bands of simultaneously gated Gaussian noise were obtained using a novel, two-part tracking paradigm. A track refers to the sequence of trials in an adaptive test procedure; the signal to noise ratio was the tracked variable. Each part of the track consisted of a two-alternative, two-interval, forced-choice procedure. The initial portion of the track estimated detection threshold using a fixed masker level. When the track continued, stimulus levels were randomly varied over a 20-dB rove range (±10 dB with respect to mean masker level), and a second threshold was estimated. Rove effect (RE) was defined as the difference between thresholds for the fixed- and roving-level tests. The size of the RE indicated how strongly a listener weighted energy-based cues for masked detection. Participants were tested at one to three masker levels per frequency, depending on audibility.
Results:
Across all stimulus frequencies and levels, NH listeners had small REs (≈1 dB), whereas listeners with SNHL typically had larger REs. Some listeners with SNHL had larger REs at higher frequencies, where pure-tone audiometric thresholds were typically elevated. RE did not vary significantly with masker level for either group. Increased RE for the SNHL group was consistent with simulations in which energy cues were more heavily weighted than envelope cues.
Conclusions:
Tone-in-noise detection thresholds in NH listeners were typically elevated only slightly by the roving-level paradigm at any frequency or level tested, consistent with the primary use of level-independent cues, such as temporal envelope cues that are conveyed by fluctuations in neural responses. In comparison, thresholds of listeners with SNHL were more affected by the roving-level paradigm, suggesting stronger weighting of energy cues. For listeners with SNHL, the largest RE was observed at 4000 Hz, for which pure-tone audiometric thresholds were most elevated. Specifically, RE size at 4000 Hz was significantly correlated with higher pure-tone audiometric thresholds at the same frequency, after controlling for the effect of age. Future studies will explore strategies for restoring or enhancing neural fluctuation cues that may lead to improved hearing in noise for listeners with SNHL.
Keywords: masked detection, Sensorineural hearing loss
INTRODUCTION
Listening in background noise is challenging for everyone, regardless of hearing status. In fact, one of the most common reasons for individuals to seek audiologic help is difficulty understanding speech in background noise. The overall prevalence of hearing loss among adults 20 to 69 years of age is 14.1% in the United States (Hoffman et al. 2017). Although assistive listening devices can improve access to sounds for individuals with hearing loss, listeners with decreased hearing continue to struggle in background noise, even when audibility is restored or maximized.
Studies of the mechanisms for detecting signals in noise provide the basis for fundamental models of hearing. For example, the concept of critical bands (Fletcher 1940) and estimates of psychophysical tuning curves (review: Patterson & Moore 1986) are based on performance in masked-detection paradigms. These data are generally interpreted based on the power spectrum model of masking, with the assumption that detection depends upon changes in the energy at the output of a peripheral filter tuned to the target frequency (Fletcher 1940; Patterson 1976). Energy-based models have been challenged by studies showing that detection thresholds of listeners with normal hearing (NH) are minimally affected when the energy-based cues are made unreliable by a roving-level paradigm (Fig. 1) (Kidd et al. 1989; Richards 1992; Lentz et al. 1999) or when energy cues are removed by normalization (Richards & Nekrich 1993).
The roving-level paradigm, used here to study masked detection, can be used to test the importance of a given cue in a psychophysical task. The strategy is to randomly vary or rove the value of a parameter of interest to determine its importance to the listener for a given task. Analyses of the expected effect on performance of a given rove range have been reported for deterministic stimuli (Green 1988; Dai 2008; Dai & Kidd 2009). If listeners based their decisions on the roved parameter, the roving paradigm would affect the correctness of their responses on a subset of the trials (by chance, sometimes the rove makes a trial “harder,” and sometimes it makes a trial “easier”). For deterministic stimuli, the shift in threshold of an energy-based model due to a roving-level paradigm can be directly computed. In contrast, for tasks with nondeterministic stimuli, such as Gaussian noise maskers, the energy in each stimulus interval is affected by both the randomized masker level and by the interaction between the added tone and the random-noise masker. However, it is straightforward to estimate the expected effect of a roving-level paradigm for an energy-based model using simulation. These simulations inform the interpretation of the results of the roving-level paradigm and also help tailor the size of the rove range to ensure that a measurable shift in threshold is introduced, while accommodating a comfortable dynamic range for the listeners (Dai & Kidd 2009). Rove effects (REs) that approach the predicted change in threshold for an energy-based model indicate that listeners base their decisions primarily on energy. Smaller, but significant, REs indicate that listeners are influenced by the parameter, but do not base their decisions solely on it. Insignificant REs would indicate that the roved parameter does not influence participants’ decisions.
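For the deterministic case described above, the threshold shift of an energy-only observer can be worked out in closed form. The sketch below is an illustration, not the authors' code. It assumes a two-interval task in which each interval receives an independent, uniformly distributed rove, and an observer that simply picks the more intense interval, with no other source of variability. The difference between the two roves then has a triangular distribution, and the level increment needed for a given percent correct follows directly. The function names are hypothetical. The conversion from level increment to SNR assumes that tone and masker powers add.

```python
import math


def required_level_increment_db(rove_range_db, p_correct):
    """Level increment (dB) that an energy-only observer needs to reach
    p_correct in a two-interval task with an independent uniform rove in
    each interval.

    With R = rove_range_db, the difference of the two roves is triangular
    on [-R, R], so P(correct) = 1 - (R - dL)^2 / (2 R^2) for 0 <= dL <= R.
    """
    if not 0.5 <= p_correct <= 1.0:
        raise ValueError("p_correct must lie between 0.5 and 1.0")
    return rove_range_db * (1.0 - math.sqrt(2.0 * (1.0 - p_correct)))


def snr_for_increment_db(increment_db):
    """SNR (dB) at which adding a tone raises the overall level by
    increment_db, assuming tone and masker powers add."""
    return 10.0 * math.log10(10.0 ** (increment_db / 10.0) - 1.0)


# 70.7% correct is the point tracked by a two-down, one-up rule (Levitt 1971).
increment = required_level_increment_db(20.0, 2.0 ** -0.5)
snr = snr_for_increment_db(increment)
print(round(increment, 2), round(snr, 2))
```

Under these assumptions, a 20-dB rove range requires a level increment of about 4.7 dB at the 70.7% correct point. For Gaussian noise maskers the masker energy itself varies from token to token, which is why the text turns to simulation for that case.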
The experiments in the present study roved the overall level of the stimuli to test whether individual listeners depended on energy-based cues for masked detection in different stimulus conditions. The lack of a RE would suggest that a listener was able to use alternative cues, such as temporal cues that result from changes upon addition of a tone to a noise masker in either the stimulus fine structure or envelope fluctuations (e.g., Kidd et al. 1989; Richards 1992; Kohlrausch et al. 1997; Jepsen et al. 2008). For example, Kidd et al. (1989) reported tone detection in noise with varying bandwidths in a roving-level paradigm. In conditions for which bandwidths were narrower than the critical band, performance of young listeners with NH in the roving-level conditions was better than an energy-based detection strategy would predict. Their results suggested that temporal fluctuations in the amplitude envelope of stimuli were used in roving-level masked detection. In Richards (1992), information from the energy and the temporal cues of the tone-in-noise stimuli with varying durations and masker bandwidths was used to predict detection performance of listeners with NH. The results suggested that performance could be explained by changes in temporal cues, but not by changes in energy. In masked detection using “low-noise” noise, which is a masking noise designed to have a flat temporal envelope, the addition of a tone increases temporal envelope fluctuations. This change, rather than an increase in overall energy, was reported to be the primary detection cue (Kohlrausch et al. 1997).
Studies of tone detection in reproducible noise have further tested the hypothesis that temporal envelope cues could explain behavioral results of NH listeners. These studies tested models against detailed detection results for an ensemble of known masker waveforms and took advantage of the fact that detection performance varies significantly and reliably from one masker waveform to another (e.g., Gilkey & Robinson 1986; Evilsizer et al. 2002). Davidson et al. (2009) compared predictions of several psychophysical models for masked detection, using both energy and temporal cues. The temporal envelope cue was normalized by stimulus energy, making it robust to variations in stimulus level. Both temporal envelope and fine structure cues were significantly correlated to behavioral results in NH listeners. A subsequent study showed that a nonlinear combination of energy and normalized temporal envelope cues yielded better predictions of detection performance when compared with either individual cue (Mao et al. 2013). The role of envelope cues in both diotic and dichotic masked detection was examined in Mao and Carney (2015), using a stimulus-based signal-processing model and a physiologic model with neural mechanisms that included sensitivity to the stimulus envelope. Both models predicted a significant amount of the variance in performance across different reproducible masker waveforms.
While masked-detection thresholds of NH listeners are minimally affected in roving-level tasks with rove ranges of up to 32 dB (Kidd et al. 1989; Richards 1992), a recent study of listeners with sensorineural hearing loss (SNHL) showed that roving-level detection thresholds for a 500-Hz tone in both narrowband and wideband reproducible noise maskers were elevated with respect to fixed-level thresholds (Mao et al. 2015). In addition, analysis of responses to reproducible noise waveforms, which allows identification of the cues used by individual listeners, indicated that, for listeners with substantial SNHL at the tone frequency, the pattern of correct detection (hit) and false-alarm rates across the ensemble of reproducible noise waveforms was strongly influenced by energy-based cues.
In contrast with energy-based cues, which are typically associated with an increase in the average firing rate of auditory nerve (AN) fibers tuned to the tone frequency, the temporal envelope of a tone-plus-noise stimulus is represented in the time-varying responses of the cochlea, inner hair cells, and AN fibers. The physiologic transformation of these cues, from the envelope of the mechanical response of the cochlea to amplitude fluctuations in neural signals, is affected by SNHL. Figure 2 illustrates neural fluctuation cues at the level of the AN for noise-alone and tone-plus-noise stimuli. The purpose of this figure is simply to show the nature of the cues available at the input to the ascending auditory pathway. Neurons in higher centers, which are sensitive to the amplitudes and frequencies of low-frequency fluctuations on their inputs, transform these temporal cues into average-rate profiles (Carney 2018).
Figure 2 shows responses to a 1/3 octave Gaussian noise at 65 dB SPL, with (right) or without (left) a 1-kHz tone with signal to noise ratio (SNR) of 3 dB for high-spontaneous-rate model AN fibers (Zilany et al. 2014) tuned to frequencies straddling the tone frequency. A 50-msec segment of each response is shown to illustrate both the temporal fine structure in the responses (detailed black curves) and the envelope of the model AN response (red), which highlights the amplitude fluctuations in the neural response. In the responses of the model with NH (top), large low-frequency fluctuations are observed in all responses to the noise-alone stimulus (upper left) and to channels tuned away from the tone frequency in response to the tone-plus-noise stimulus (upper right). In contrast, the AN channel tuned to the 1-kHz tone (upper right) has a relatively flat response because this fiber’s response is dominated (“captured”) by the tone. Capture is caused by a combination of cochlear amplification and saturation of the inner-hair-cell (IHC) response (Zilany & Bruce 2007; reviewed in Carney 2018). Saturation in the IHC transduction input/output function results in reduced amplitude of low-frequency fluctuations in the IHC voltage. The line plot behind each panel shows mean rates (green) and fluctuation amplitudes (blue) based on a normalized envelope slope cue (the mean of the absolute value of the slope of the envelope, normalized by energy, Mao et al. 2013). For the NH high-spontaneous-rate AN model fibers, the rates are near saturation for all channels, with or without the added tone. However, the dip in the fluctuation profile at 1 kHz provides a cue for the presence of the tone (Carney 2018). This neural fluctuation cue is relatively robust to the roving-level paradigm (Carney et al. 2015).
The responses of AN model fibers with SNHL are shown in Figure 2 (bottom) for a model with thresholds for tones in quiet of approximately 40 dB HL, for which 2/3 of the loss is due to reduction of cochlear amplification associated with outer hair cells (OHCs), and 1/3 is due to reduced sensitivity of the IHCs. The dip in the fluctuation profile at 1 kHz is reduced in the responses of model fibers with SNHL due to the decrease in capture by the tone in the model with SNHL (Miller et al. 1997; Zilany & Bruce 2007). Thus, the temporal envelope information carried by neural fluctuations is less salient as a cue for the presence of the tone in the responses of the model with SNHL. In contrast with the NH model, the decreased sensitivity in the model with SNHL results in response rates that are not saturated, and thus an increase in average rate does occur in the AN channel tuned to the tone (bottom right, green), consistent with an energy cue being available to listeners with SNHL. Such a cue could support detection of a tone with a sufficient SNR, but this energy-based cue would be vulnerable to changes in stimulus level, such as those that occur in a roving-level paradigm.
In the present study, the use of level-dependent versus level-independent cues in listeners with NH or SNHL was investigated using masked detection of tones in the roving-level paradigm over a wide range of target frequencies and masker levels. A rove range of 20 dB was chosen to be large enough to provide a measurable effect of the rove on threshold, for listeners who used energy-based cues, but small enough to accommodate the limited dynamic range in listeners with SNHL. Results of the SNHL group were compared with results for listeners with NH. A key question was whether an individual listener uses different cues for masked detection across different frequencies and levels, and if so, whether the degree of hearing loss at each frequency predicted which cue is used. That is, does a listener with sloping hearing loss use different cues, or different combinations of temporal envelope and energy cues, at different frequencies? Also, how does the role of these cues vary with mean masker level? The role of IHC saturation in the creation of neural fluctuation cues might suggest that this cue is level dependent; however, IHC saturation interacts with cochlear amplification, which is also level dependent in the NH ear. Cochlear amplification is decreased, and thus gain is less level dependent, in the ear affected by SNHL. Furthermore, cochlear amplification is influenced by the efferent system, which could potentially be influenced by descending signals from the midbrain that are driven by neural fluctuation cues (Carney 2018). The interactions of these mechanisms could potentially reduce the level dependence of neural fluctuation cues, particularly if amplification was controlled, so as to preserve these cues across a range of sound levels. At the same time, loudness recruitment may also affect performance in individuals with SNHL. 
Therefore, it was of interest to study the level dependence of neural fluctuation cues for masked detection in listeners with both NH and SNHL. The answers to these questions may guide novel strategies for customized signal processing across both frequency and sound level, for example, by designing stimuli to manipulate neural fluctuations.
MATERIALS AND METHODS
Participants
Data were collected from 24 listeners (15 female and 9 male) ranging from 19 to 79 years of age. Participants were recruited from the local community. The test protocol consisted of tympanometry; air- and bone-conduction pure-tone testing from 0.25 to 8 kHz, including inter-octave frequencies of 1.5, 3, and 6 kHz; word-recognition testing in quiet (Central Institute for the Deaf, Test W-22; Auditec, Inc.); and speech-in-noise testing (QuickSIN; Etymotic Research, Inc.). All listeners had symmetric hearing, defined as <15 dB difference at two adjacent frequencies between the ears, and type A tympanograms, suggesting normal Eustachian tube function. Participants with an air–bone gap >15 dB at any test frequency were excluded.
On the basis of the average pure-tone audiometric threshold at 0.5, 1, 2, and 4 kHz in both ears (4PTA), half of the listeners were classified as NH with 4PTA <15 dB HL (n = 12), and the remaining group had 4PTA >25 dB HL (n = 12). The two groups of participants were not age matched; the mean age of NH listeners was 37.5 years and that of listeners with SNHL was 66 years. Among the participants with SNHL, 9 had 4PTA <40 dB HL and the remaining 3 had 4PTA between 40 and 60 dB HL. The audiometric data of the 24 participants are shown in Table 1.
TABLE 1.
Threshold columns (500 Hz to 4PTA): mean pure-tone thresholds across two ears (dB HL).
Subject ID | Age (yr) | Sex | 500 Hz | 1000 Hz | 2000 Hz | 4000 Hz | 4PTA | QuickSIN SNR Loss (dB) |
---|---|---|---|---|---|---|---|---|
S112 | 20 | F | 5 | −3 | 0 | −5 | −1 | −0.5 |
S127 | 19 | F | 3 | 0 | 0 | −3 | 0 | 0 |
S122 | 41 | F | −3 | 3 | 0 | 8 | 2 | −0.5 |
S109 | 38 | F | 0 | 0 | 3 | 5 | 2 | 2.5* |
S201 | 20 | F | 10 | 3 | 0 | 0 | 3 | −2 |
S116 | 29 | M | 8 | 5 | 5 | 5 | 6 | 2.5* |
S123 | 31 | M | 10 | 8 | 0 | 8 | 6 | 0 |
S120 | 30 | M | 8 | 8 | 8 | 8 | 8 | 0.5 |
S118 | 64 | F | 3 | 8 | 10 | 13 | 8 | 0.5 |
S130 | 23 | F | 8 | 5 | 10 | 13 | 9 | 3* |
S105 | 72 | M | 10 | 3 | 13 | 18 | 11 | 2.5 |
S115 | 63 | F | 13 | 3 | 10 | 25 | 13 | 1.5 |
S131 | 72 | F | 13 | 13 | 30 | 48 | 26 | 3 |
S128 | 68 | M | 18 | 25 | 33 | 28 | 26 | 0.5 |
S119 | 63 | F | 20 | 25 | 28 | 33 | 26 | 1.5 |
S114 | 64 | F | 23 | 25 | 30 | 28 | 26 | 2.5 |
S36 | 68 | M | 13 | 20 | 30 | 53 | 29 | 0.5 |
S138 | 30 | F | 13 | 23 | 30 | 55 | 30 | 1 |
S134 | 69 | M | 8 | 20 | 40 | 58 | 31 | 4 |
S63 | 66 | F | 23 | 33 | 33 | 55 | 36 | 1 |
S129 | 71 | M | 30 | 33 | 40 | 43 | 36 | 3 |
S61 | 79 | F | 43 | 38 | 35 | 53 | 42 | 4 |
S40 | 71 | M | 35 | 48 | 55 | 68 | 51 | 6 |
S133 | 71 | F | 50 | 55 | 58 | 60 | 56 | 8.5 |
*QuickSIN scores of participants who speak English as a second language.
4PTA, average pure-tone audiometric threshold at 0.5, 1, 2, and 4 kHz in both ears; F, female; M, male; SNR, signal-to-noise ratio.
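The 4PTA grouping rule can be written out as a short sketch. The function names are hypothetical. The example thresholds are the Table 1 rows for S112 and S131. Because Table 1 lists rounded means across the two ears, values recomputed this way may differ slightly from the tabulated 4PTA.

```python
def four_pta(thresholds_db_hl):
    """Average of the 0.5-, 1-, 2-, and 4-kHz thresholds (dB HL)."""
    if len(thresholds_db_hl) != 4:
        raise ValueError("expected thresholds at 0.5, 1, 2, and 4 kHz")
    return sum(thresholds_db_hl) / 4.0


def classify(pta_db_hl):
    """Group assignment described in the text: NH below 15 dB HL,
    SNHL above 25 dB HL. Values in between match neither group."""
    if pta_db_hl < 15.0:
        return "NH"
    if pta_db_hl > 25.0:
        return "SNHL"
    return "unclassified"


s112 = four_pta([5, -3, 0, -5])    # Table 1 row for S112
s131 = four_pta([13, 13, 30, 48])  # Table 1 row for S131
print(s112, classify(s112))
print(s131, classify(s131))
```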
For QuickSIN testing, sentences were presented at 70 dB HL for NH listeners and at least 20 dB above the average pure-tone audiometric threshold at 2000 Hz for individual listeners with SNHL. Stimuli were presented diotically under headphones. The scores of four lists were obtained after each participant went through a practice list. The lowest and the highest scores of the four lists were dropped, and the mean score for the two remaining lists is shown in Table 1.
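The scoring rule described above (drop the lowest and highest of four list scores, then average the remaining two) amounts to a trimmed mean. A minimal sketch follows. The function name and the example scores are hypothetical and are not data from the study.

```python
def quicksin_summary(list_scores_db):
    """Mean of the two middle scores after dropping the lowest and the
    highest of four QuickSIN list scores (SNR loss, dB)."""
    if len(list_scores_db) != 4:
        raise ValueError("expected scores for exactly four lists")
    middle_two = sorted(list_scores_db)[1:3]
    return sum(middle_two) / 2.0


# Hypothetical scores for one participant.
print(quicksin_summary([1.5, 0.5, 3.5, 2.5]))
```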
Combined Fixed/Roving-Level Tracking Paradigm
All stimulus generation and data collection were completed using custom programs in MATLAB (version R2016b). Statistical analyses were performed in SPSS (version 24; IBM, Inc.). The SNR at tone-in-noise detection threshold was obtained using a two-interval forced-choice tracking procedure. Four tone frequencies were tested (0.5, 1, 2, and 4 kHz); all stimuli were diotic. Maskers were 1/3-octave bands of Gaussian noise logarithmically centered at the tone frequency; maskers were created with steeply sloping (5000-order) finite impulse response filters. Both masker and tone were 300 msec in duration, gated simultaneously with 50-msec raised-cosine ramps. The initial tone level was set at 10 dB above the overall masker level, and SNR was varied using a two-down, one-up paradigm (Levitt 1971). The step size of the SNR was initially 2 dB, reduced to 1 dB after two reversals. Feedback was provided after each trial.
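The tracking rule described above can be sketched as a small state machine. This is an illustration in Python, not the authors' MATLAB code, and the class and attribute names are hypothetical. It assumes that a reversal is recorded at the SNR at which the track changes direction and that the 1-dB step applies from the second reversal onward.

```python
class TwoDownOneUp:
    """Two-down, one-up SNR track (Levitt 1971): the SNR drops after two
    consecutive correct responses and rises after each incorrect one."""

    def __init__(self, start_snr_db=10.0, big_step_db=2.0,
                 small_step_db=1.0, reversals_before_small_step=2):
        self.snr_db = start_snr_db
        self.big_step_db = big_step_db
        self.small_step_db = small_step_db
        self.reversals_before_small_step = reversals_before_small_step
        self.correct_run = 0
        self.last_direction = 0      # +1 = up, -1 = down, 0 = no move yet
        self.reversal_snrs = []

    def step_size(self):
        if len(self.reversal_snrs) < self.reversals_before_small_step:
            return self.big_step_db
        return self.small_step_db

    def update(self, correct):
        """Register one trial outcome and return the SNR for the next trial."""
        if correct:
            self.correct_run += 1
            if self.correct_run < 2:
                return self.snr_db   # first correct response: no change
            self.correct_run = 0
            direction = -1
        else:
            self.correct_run = 0
            direction = +1
        if self.last_direction and direction != self.last_direction:
            self.reversal_snrs.append(self.snr_db)
        self.last_direction = direction
        self.snr_db += direction * self.step_size()
        return self.snr_db


track = TwoDownOneUp()
for response in [True, True, True, True, False, True, True]:
    track.update(response)
print(track.snr_db, track.reversal_snrs)
```

A two-down, one-up rule converges on the SNR that yields about 70.7% correct responses.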
All tracks began with a fixed-level masker and then transitioned to a roving-level paradigm, as follows: After 16 reversals had been collected, the SD of the SNR at the final 10 reversals was computed. If the SD was >3 dB, the fixed-level track was extended until the SD of the SNR at the final 10 reversals was <3 dB. The mean SNR at the final 10 reversals in the fixed-level portion of the track was recorded as the fixed-level threshold. The track then continued into the roving-level portion, with the SNR for the first trial of the roving-level portion matched to the SNR from the final trial of the fixed-level portion of the track. In the roving-level paradigm, SNR tracking continued, starting with a step size of 2 dB that was reduced to 1 dB after two reversals. During the roving-level portion of the track, each interval of each two-interval trial had an overall level that varied uniformly over a 20-dB range (±10 dB) centered on the fixed masker level. The SNR threshold and SD computation for the roving-level portion of the track were based on the same criteria as for the fixed-level condition.
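The stopping rule and the rove can be sketched as two helper functions. The names are hypothetical. The sketch assumes a sample standard deviation (the text does not say which estimator was used), treats an SD of exactly 3 dB as not yet stable, and assumes the two intervals of a trial are roved independently.

```python
import random
from statistics import mean, stdev


def threshold_if_stable(reversal_snrs_db, min_reversals=16, n_final=10,
                        max_sd_db=3.0):
    """Return the mean SNR over the final reversals once at least
    min_reversals have been collected and their SD is below max_sd_db.
    Return None if the track should be extended."""
    if len(reversal_snrs_db) < min_reversals:
        return None
    final = reversal_snrs_db[-n_final:]
    if stdev(final) >= max_sd_db:
        return None
    return mean(final)


def roved_interval_levels(mean_masker_db, rng, rove_range_db=20.0):
    """Overall levels (dB SPL) for the two intervals of one roving-level
    trial. Each interval is drawn independently and uniformly within
    +/- rove_range_db / 2 of the mean masker level. The rove shifts tone
    and masker together, so the SNR within an interval is unchanged."""
    half = rove_range_db / 2.0
    return tuple(mean_masker_db + rng.uniform(-half, half) for _ in range(2))


# Hypothetical reversal SNRs for one fixed-level track.
reversals = [6.0, 8.0, 2.0, 4.0, 0.0, 2.0] + [-1.0, 1.0] * 5
print(threshold_if_stable(reversals))
print(roved_interval_levels(65.0, random.Random(0)))
```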
To check the stability of the threshold estimate for a given combination of frequency and level, at least four two-part tracks were collected in each stimulus condition, such that four pairs of fixed- and roving-level thresholds were estimated. Testing was concluded when the SD of the fixed-level thresholds and that of the roving-level thresholds were both <3 dB. The effect of the roving-level paradigm on the detection threshold, referred to as the RE, was estimated within each track as the difference between the fixed- and roving-level thresholds. Examples of combined fixed-/roving-level tracks for 2 participants (Fig. 3) illustrate either the lack (Fig. 3A) or presence (Fig. 3B) of a substantial RE for one test condition. Large REs suggest that listeners weight energy-based cues relatively strongly, and small REs suggest a greater importance of temporal envelope cues (Kidd et al. 1989; Richards 1992; Mao et al. 2015).
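The per-track RE and the stability check across tracks can be sketched as follows. The function names and threshold values are hypothetical. The sign convention (roving minus fixed, so that a positive RE means the rove raised the threshold) is an assumption consistent with the positive REs reported in the Results, and the SD is taken as a sample SD.

```python
from statistics import mean, stdev


def rove_effects_db(fixed_thresholds_db, roving_thresholds_db):
    """Per-track RE: roving-level threshold minus fixed-level threshold."""
    return [r - f for f, r in zip(fixed_thresholds_db, roving_thresholds_db)]


def condition_is_stable(fixed_thresholds_db, roving_thresholds_db,
                        min_tracks=4, max_sd_db=3.0):
    """True when at least min_tracks two-part tracks exist and the SDs of
    the fixed-level and of the roving-level thresholds are both below
    max_sd_db."""
    return (len(fixed_thresholds_db) >= min_tracks
            and len(roving_thresholds_db) >= min_tracks
            and stdev(fixed_thresholds_db) < max_sd_db
            and stdev(roving_thresholds_db) < max_sd_db)


# Hypothetical thresholds (dB SNR) from four two-part tracks.
fixed = [-4.0, -3.5, -4.5, -4.0]
roving = [-1.0, -0.5, -1.5, -1.0]
print(mean(rove_effects_db(fixed, roving)), condition_is_stable(fixed, roving))
```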
To ensure audibility of the stimuli, the threshold of the masker at each tone frequency was obtained before testing. The lowest masker level tested for each listener was always 10 dB above masker threshold for each test frequency. NH listeners were tested at three mean masker-level ranges: “Low”: 25 to 40 dB SPL (depending on masker threshold at each test frequency), “Mid”: 65 dB SPL, and “High”: 80 dB SPL. Listeners with SNHL were tested at one to three mean masker-level ranges, depending on hearing loss: “Low”: ≤60 dB SPL, “Mid”: 65 to 75 dB SPL, and “High”: 80 to 90 dB SPL.
All participants were introduced to the task using 500-Hz stimuli with a mean masker level of 65 or 80 dB SPL. For the remaining conditions, the order of the frequency and level combinations was varied across participants. Each track required approximately 8 to 15 min, and participants usually completed 3 to 6 tracks in an hour-long listening session. Participants typically returned for 6 to 12 sessions, depending on the stability of their data (extra sessions were required for some participants), number of testable conditions based on hearing status, and their speed.
In each of the NH and SNHL groups, repeated-measures analyses of variance (ANOVAs) performed in SPSS were used to examine how detection thresholds varied across frequency and mean masker-level range. Effect sizes are reported as partial eta squared (ηp2). For tests of within-subject effects, the Mauchly test of sphericity was used to determine whether modification to the degrees of freedom was necessary to avoid type I errors. The Greenhouse–Geisser correction method was applied in cases where sphericity was violated. Bonferroni adjustment for multiple pairwise comparisons of least-squares means was used in post hoc analyses.
Repeated-measures ANOVAs on RE for both groups were performed to test the hypothesis that the RE in listeners with SNHL was significantly greater than in listeners with NH. The effects of frequency and mean masker level on RE were also examined. The relationship between the size of the RE and hearing status was studied using a nonparametric, rank-order correlation coefficient (Spearman rho, rs).
Simulation of RE With Different Weighting of Energy and Envelope Cues
Simulations of basic energy and envelope-based models were performed to quantify the anticipated effect of the roving-level paradigm on thresholds and to test the hypothesis that combined energy and envelope-related cues could explain the observed trends in RE. Adaptive tracks were simulated using a two-alternative, two-down, one-up paradigm, with or without a 20-dB across-interval rove of the stimulus level. Simulations of tracks were made for 65-dB SPL mean masker levels, for comparison to the measured tracks shown in Fig. 3. Simulations of the RE across frequency for models with combined cues were done for 80-dB SPL mean masker levels because results were available for all listeners in both groups for this condition (see later). Simulated thresholds were based on averages of 100 tracks for each condition.
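A stripped-down Monte Carlo version of such a track simulation is sketched below. It is an illustration only and does not reproduce the authors' models: the observer is a bare energy detector with no peripheral filter, the masker's random energy fluctuation is replaced by Gaussian jitter with a hypothetical SD (`sigma_db`), and fixed- and roving-level tracks are run separately rather than as two parts of one track. The tracking rule and the averaging over 100 tracks follow the text.

```python
import math
import random
import statistics


def level_increment_db(snr_db):
    """Average level increase (dB) when a tone is added to noise."""
    return 10.0 * math.log10(1.0 + 10.0 ** (snr_db / 10.0))


def run_track(rove_range_db, rng, sigma_db=0.5, start_snr_db=10.0,
              n_reversals=16, n_final=10):
    """One two-down, one-up track for an observer that picks the more
    intense interval. Returns the mean SNR at the final reversals."""
    snr, correct_run, last_dir, reversals = start_snr_db, 0, 0, []
    half = rove_range_db / 2.0
    while len(reversals) < n_reversals:
        # Independent rove per interval, plus hypothetical energy jitter.
        tone_dv = (rng.uniform(-half, half) + level_increment_db(snr)
                   + rng.gauss(0.0, sigma_db))
        noise_dv = rng.uniform(-half, half) + rng.gauss(0.0, sigma_db)
        if tone_dv > noise_dv:
            correct_run += 1
            if correct_run < 2:
                continue
            correct_run, direction = 0, -1
        else:
            correct_run, direction = 0, +1
        if last_dir and direction != last_dir:
            reversals.append(snr)
        last_dir = direction
        snr += direction * (2.0 if len(reversals) < 2 else 1.0)
    return statistics.mean(reversals[-n_final:])


def mean_threshold(rove_range_db, n_tracks=100, seed=1):
    rng = random.Random(seed)
    return statistics.mean([run_track(rove_range_db, rng)
                            for _ in range(n_tracks)])


fixed = mean_threshold(0.0)     # fixed-level condition
roving = mean_threshold(20.0)   # 20-dB across-interval rove
rove_effect = roving - fixed
print(round(fixed, 1), round(roving, 1), round(rove_effect, 1))
```

The simulated RE depends on the assumed `sigma_db`, so the printed values illustrate the method rather than predict the thresholds of any listener.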
The energy model was a gammatone filter centered at the tone frequency, and the decision variable was the root mean square energy (in dB) of the filter response to the noise-alone or tone-plus-noise stimulus. An increase in energy at the filter output indicated tone presence. The envelope model used the envelope of the gammatone filter response; the decision variable was the normalized envelope slope, computed by taking the mean of the absolute value of the derivative of the envelope, normalized by stimulus energy (Davidson et al. 2009; Mao et al. 2013). A reduction of the envelope slope cue (i.e., a “flattening” of the stimulus envelope, Fig. 2) indicated the presence of a tone. Models based on combined cues used decision variables that were linearly weighted sums of standardized cues, where standardization was based on distributions of cues calculated for 250 tokens of noise-alone and tone-plus-noise stimuli.
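The two decision variables can be sketched for a waveform that has already been filtered and whose envelope has already been extracted (the gammatone filtering and envelope extraction are omitted here). The function names are hypothetical. The normalization divides the mean absolute envelope slope by the RMS amplitude of the filter output; this is an assumption chosen so that the cue is unchanged by an overall level change, and the cited papers should be consulted for the exact form.

```python
import math


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def energy_cue_db(filter_output):
    """Root mean square level of the filter output, in dB (arbitrary reference)."""
    return 20.0 * math.log10(rms(filter_output))


def normalized_envelope_slope(envelope, filter_output, fs_hz):
    """Mean absolute slope of the envelope (per second), divided by the
    RMS amplitude of the filter output (assumed normalization)."""
    slopes = [abs(b - a) * fs_hz for a, b in zip(envelope, envelope[1:])]
    return (sum(slopes) / len(slopes)) / rms(filter_output)


# Synthetic 300-msec example: a 1-kHz carrier with a 20-Hz envelope fluctuation.
fs = 20000.0
t = [n / fs for n in range(6000)]
env = [1.0 + 0.5 * math.sin(2.0 * math.pi * 20.0 * ti) for ti in t]
x = [e * math.sin(2.0 * math.pi * 1000.0 * ti) for e, ti in zip(env, t)]

# Raising the overall level by 20 dB changes the energy cue but not the
# normalized envelope cue.
loud_env = [10.0 * e for e in env]
loud_x = [10.0 * s for s in x]
print(energy_cue_db(loud_x) - energy_cue_db(x))
print(normalized_envelope_slope(env, x, fs),
      normalized_envelope_slope(loud_env, loud_x, fs))
```

This level invariance is why an observer relying on the envelope cue is expected to show little or no RE, whereas the energy cue shifts with every rove.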
The bandwidths of the gammatone filters used for both the energy and envelope models were based on the equivalent rectangular bandwidth for each frequency channel, from Moore and Glasberg (1983). The filter bandwidth affects the detection threshold but has only a small effect on the RE in comparison to the effect of using energy versus envelope cues. Because detection thresholds for the fixed-level condition were similar for the NH and SNHL groups (Fig. 4), the same model gammatone filter bandwidths were used for both groups.
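For reference, the equivalent rectangular bandwidth from Moore and Glasberg (1983) is commonly quoted as a quadratic in center frequency. The sketch below uses that commonly quoted form (an assumption about the exact expression used in the simulations) and compares it with the nominal width of a 1/3-octave masker band whose edges lie 1/6 octave on either side of the tone frequency. The function names are hypothetical.

```python
def erb_hz(center_khz):
    """ERB (Hz) as commonly quoted from Moore and Glasberg (1983),
    with center frequency in kHz."""
    return 6.23 * center_khz ** 2 + 93.39 * center_khz + 28.52


def third_octave_bandwidth_hz(center_hz):
    """Width (Hz) of a 1/3-octave band logarithmically centered on center_hz."""
    return center_hz * (2.0 ** (1.0 / 6.0) - 2.0 ** (-1.0 / 6.0))


for f_khz in (0.5, 1.0, 2.0, 4.0):
    print(f_khz, round(erb_hz(f_khz), 1),
          round(third_octave_bandwidth_hz(1000.0 * f_khz), 1))
```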
RESULTS
All 12 listeners with NH completed the tone-in-noise detection task with combinations of four frequencies and three mean masker-level ranges (Low: 25 to 40, Mid: 65, and High: 80 dB SPL). The remaining 12 participants with SNHL were tested at one to three masker-level ranges (Low: ≤60, Mid: 65 to 75, High: 80 to 90 dB SPL) after audibility of masker-only stimuli at each test frequency was verified. The mean detection threshold in SNR across mean masker levels is plotted from 500 to 4000 Hz in Figure 4. Roving-level thresholds (solid symbols) were consistently slightly higher than fixed-level thresholds (open symbols) for the NH group (Fig. 4, left column). A clear separation between fixed- and roving-level thresholds was observed at all frequencies in the SNHL group (Fig. 4, red, right column).
Trends in Detection in NH Group
The mean fixed-level detection thresholds of listeners with NH across frequency and masker-level range are shown in Figure 4 (black open symbols, left column). The absolute thresholds, approximately −3 to −5 dB SNR across fixed-level conditions, for the listeners with NH were consistent with thresholds reported previously for similar masker bandwidths (van de Par & Kohlrausch 1999). Performance was stable across frequency [F(1.614,17.755) = 2.206; p = 0.106]. There was a small but statistically significant difference in fixed-level detection thresholds across masker level [F(2,22) = 10.372; p = 0.001; ηp2 = 0.485]. Mean fixed-level detection thresholds at the high masker level (80 dB SPL) were 1.1 dB lower than those obtained at low and mid masker-level ranges (p < 0.05). There was no significant difference in RE across mean masker level [F(2,22) = 1.620; p = 0.221]. The size of the RE also remained stable across frequency [F(3,33) = 2.473; p = 0.079].
Trends in SNHL Group
In the SNHL group, thresholds for four test frequencies were obtained at 80 to 90 dB SPL from all participants. Two participants (S40 and S133) were tested at only one masker level at two to three test frequencies due to their degree of hearing loss (Fig. 5). To investigate the effect of masker-level range, detection thresholds of the 10 remaining participants for 80 to 90 dB SPL and for maskers 10 dB above masker threshold at each frequency (10 dB SL) were further analyzed. The masker levels corresponding to 10 dB SL varied from 35 to 75 dB SPL across conditions and listeners.
Detection thresholds in the fixed-level conditions were similar across masker-level range [F(1,9) = 0.965; p = 0.352]. No significant difference in the size of RE across masker-level ranges was noted [F(1,9) = 0.2; p = 0.665]. While fixed-level detection thresholds across frequency were similar [F(1.545,13.905) = 3.998; p = 0.051], the mean RE varied across frequency [F(3,27) = 3.846; p = 0.021; ηp2 = 0.299]. Mean REs at 500, 1000, and 2000 Hz were all 2.9 dB. Post hoc analysis showed that the mean RE of 4.4 dB at 4000 Hz was significantly greater than that at 2000 Hz (p < 0.05).
To compare the performance between NH and SNHL groups, fixed-level detection thresholds and RE at the high masker level (80 to 90 dB SPL) were analyzed (Fig. 6) because this condition was completed by all 24 participants in the study. Repeated-measures ANOVA showed a significant frequency effect regardless of hearing-loss status [F(2.090,45.982) = 4.512; p = 0.015; ηp2 = 0.17]. Post hoc analysis showed that fixed-level detection threshold at 500 Hz was 1.5 dB higher than that of 1000 Hz (p < 0.05). Performance was similar across NH and SNHL groups [F(1,22) = 1.791; p = 0.194]. The slightly higher threshold at 500 Hz is consistent with trends in the diotic masked thresholds for comparable masker bandwidths reported by van de Par and Kohlrausch (1999).
RE also varied significantly across frequency and hearing-loss status [F(3,66) = 7.391; p < 0.001; ηp2 = 0.251]. Whereas mean RE in listeners with NH ranged from 0.8 to 1.4 dB and did not vary significantly across frequency (p > 0.05), mean RE in listeners with SNHL increased from 2.2 dB at 500 Hz, to 3.3 dB at 1000 and 2000 Hz, to 5.7 dB at 4000 Hz (p < 0.05).
Simulations
Figure 7 shows simulations of four tracks for the 1000-Hz, 65-dB SPL masker condition to illustrate the effect of cue weighting for simple models that linearly combine these energy and envelope cues. The model that depends most strongly on the reduction of envelope fluctuations to detect the tone has a small RE (left); the model that depends more strongly on energy-based cues has a larger RE (right). These cue combinations were selected to approximately match the tracks that are illustrated in Figure 3.
The differences between fixed- and roving-level thresholds in Figure 6 are replotted as REs in Figure 8, superimposed with the REs predicted by simulations with models based on energy, envelope, and combined cues (dashed lines). As expected, the RE for a model that relies completely on a normalized envelope cue is close to 0 dB (green line), a smaller RE than was observed for either group of listeners. The RE for a model that relies only on energy cues ranges from 10 to 15 dB (blue line), higher than was ever observed for listeners in either group. The small but significant RE for the NH group was best described by a model that linearly combined the two cues, weighting the standardized energy cue by 0.25 and the standardized envelope cue by 0.75 (purple). The models that weighted the energy cue more strongly yielded better estimates of the RE for listeners with SNHL (yellow and red lines), consistent with the hypothesis that a reduction in the quality of envelope or neural fluctuation cues resulted in greater reliance on energy-based cues for this group. As expected, the weighting of the energy cue was strongest (0.75) in the model that best explained the RE for 4000 Hz for the SNHL group.
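The effect of the cue weights on a single roving-level trial can be illustrated with a minimal Python sketch. It assumes that each cue is expressed as a z-score relative to the masker-alone distribution, that the two weights sum to 1, and that the listener picks the interval with the larger weighted sum; the function names and the cue values for the example trial are hypothetical and are not taken from the study's simulations.

```python
def decision_variable(z_energy, z_env_slope, w_energy):
    """Weighted sum of standardized cues; a flatter envelope (lower slope) favors 'tone'."""
    w_envelope = 1.0 - w_energy
    return w_energy * z_energy + w_envelope * (-z_env_slope)


def choose_interval(intervals, w_energy):
    """Pick the interval (0 or 1) with the larger decision variable."""
    dvs = [decision_variable(z_e, z_s, w_energy) for z_e, z_s in intervals]
    return 0 if dvs[0] > dvs[1] else 1


# Hypothetical roving-level trial, cues given as (energy z, envelope-slope z).
# Interval 0: tone plus noise, roved down (low energy, flattened envelope).
# Interval 1: noise alone, roved up (high energy, typical envelope).
trial = [(-2.0, -1.5), (2.0, 0.0)]

envelope_weighted = choose_interval(trial, w_energy=0.25)  # picks interval 0 (tone)
energy_weighted = choose_interval(trial, w_energy=0.75)    # picks interval 1 (misled by rove)
```

With the energy weight at 0.25, the flattened envelope outweighs the misleading level difference and the tone interval is chosen; with the weight at 0.75, the same trial is answered incorrectly, which is the kind of error that raises roving-level thresholds.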
One-tailed Spearman correlations were used to investigate whether elevated pure-tone audiometric thresholds were associated with larger RE at the test frequency (Table 2). Correlations were observed between RE and pure-tone audiometric thresholds at test frequencies of 500, 2000, and 4000 Hz. After controlling for the relationship between age and audiometric threshold elevation, correlations were significant only at 4 kHz [rs(21) = 0.4; p = 0.029].
TABLE 2.
Frequency | 500 Hz | 1000 Hz | 2000 Hz | 4000 Hz |
---|---|---|---|---|
Spearman correlation (p) | **0.548 (0.003)** | 0.272 (0.099) | **0.454 (0.013)** | **0.746 (<0.001)** |
Partial Spearman correlation corrected for age (p) | 0.224 (0.152) | −0.019 (0.466) | 0.290 (0.09) | **0.400 (0.029)** |
Bold values indicate significant correlations at p < 0.05.
After controlling for age of subjects, only the pure-tone threshold at 4000 Hz was significantly correlated with RE.
RE, rove effect.
As expected, listeners with greater 4PTA tended to have higher QuickSIN scores after the effect of age was controlled [rs(21) = 0.421; p = 0.023]. There was no significant correlation between size of RE at any test frequency and QuickSIN scores, after controlling for the effects of pure-tone audiometric threshold and age (p > 0.05).
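For readers unfamiliar with the statistic, a first-order partial Spearman correlation of the kind reported above can be sketched in Python as follows. This is a minimal sketch, assuming the standard partial-correlation formula applied to rank-transformed data with a single covariate; the statistical software used in the study is not stated here, the function names are hypothetical, the data are made up for illustration, and no p value is computed.

```python
import math


def ranks(values):
    """Rank the data from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, covariate):
    """Rank correlation of x and y after removing the rank relation to one covariate."""
    r_xy = spearman(x, y)
    r_xz = spearman(x, covariate)
    r_yz = spearman(y, covariate)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up values for five hypothetical listeners (not data from the study).
threshold_db_hl = [10, 20, 30, 40, 50]
rove_effect_db = [0.5, 2.0, 1.0, 5.0, 3.0]
age_years = [40, 25, 70, 30, 60]

r_partial = partial_spearman(threshold_db_hl, rove_effect_db, age_years)
```

With one covariate the degrees of freedom are n - 3, which matches the rs(21) reported for the 24 participants.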
DISCUSSION
In the classic fixed-level, tone-in-noise detection task, both energy and temporal cues are present in the stimuli. On average, the addition of a tone at or above detection threshold increases the overall energy of the stimulus and also reduces the slopes of envelope fluctuations with respect to the masker-alone stimulus (Fig. 1A, e.g., Richards 1992; Mao et al. 2013). To study the role of temporal envelope cues in masked detection, previous studies used either energy normalization or a roving-level paradigm to reduce the effectiveness of the energy cue as a detection strategy. Whereas temporal envelope cues are resistant to level changes and thus could explain minimal changes in threshold across fixed- and roving-level detection, reliance on an energy cue would lead to elevated roving-level detection thresholds (Green 1988; Kidd et al. 1989).
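The contrast between a level-dependent energy cue and a level-invariant envelope cue can be checked numerically. The Python sketch below is illustrative only and rests on several assumptions: the masker is approximated by 24 equal-amplitude, random-phase sinusoids spanning roughly a 1/3-octave band around 1000 Hz, the envelope is taken as the magnitude of the corresponding analytic signal, and the envelope cue is a normalized envelope-slope statistic (mean absolute envelope change divided by mean envelope). The function names and parameter values are hypothetical and are not the stimuli or metrics of the study.

```python
import cmath
import math
import random


def narrowband_analytic(rng, tone_amp=0.0, fc=1000.0, bw=232.0,
                        n_comp=24, fs=8000.0, dur=0.1):
    """Analytic signal of random-phase narrowband noise, optionally with a tone at fc."""
    freqs = [fc - bw / 2 + bw * (k + 0.5) / n_comp for k in range(n_comp)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
    signal = []
    for i in range(int(fs * dur)):
        t = i / fs
        value = sum(cmath.exp(1j * (2 * math.pi * f * t + p))
                    for f, p in zip(freqs, phases))
        value += tone_amp * cmath.exp(1j * 2 * math.pi * fc * t)
        signal.append(value)
    return signal


def energy_db(signal):
    """Energy cue: mean squared waveform (real part) in dB, arbitrary reference."""
    return 10.0 * math.log10(sum(v.real ** 2 for v in signal) / len(signal))


def envelope_slope(signal):
    """Normalized envelope-slope cue: mean |change in envelope| / mean envelope."""
    env = [abs(v) for v in signal]
    mean_change = sum(abs(b - a) for a, b in zip(env, env[1:])) / (len(env) - 1)
    return mean_change / (sum(env) / len(env))


rng = random.Random(1)
noise = narrowband_analytic(rng)
roved = [v * 10 ** (10 / 20) for v in noise]  # same noise token, 10 dB higher

energy_shift = energy_db(roved) - energy_db(noise)            # about 10 dB
slope_shift = envelope_slope(roved) - envelope_slope(noise)   # about 0
```

A 10-dB level change shifts the energy cue by 10 dB, far more than the increment produced by a near-threshold tone, whereas the normalized envelope statistic is unchanged because the scale factor cancels in the ratio.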
Individuals with substantial SNHL at the target tone frequency have elevated thresholds when energy cues are rendered less reliable in the roving-level task (Mao et al. 2015). To expand upon the findings of Mao et al. (2015), in which only 500-Hz tones in reproducible noises were used, the present study examined roving-level tone-in-noise detection across a wider range of frequencies and levels, in listeners with and without SNHL. We found that performance of NH listeners was surprisingly consistent across frequency and masker-level range. Their mean RE was about 1 dB across all test conditions. The fact that detection thresholds were minimally affected by roving may be explained by normal cochlear amplification and IHC saturation in the healthy ear leading to capture, that is, flattening of the neural fluctuations near the tone frequency observed in the AN response (Zilany & Bruce 2007; Carney 2018). The size of RE in NH listeners was consistent with simulation results in which envelope cues were more strongly weighted than energy cues.
In contrast, the size of RE in listeners with SNHL differed across test frequencies, suggesting different weighting of cues. Their mean RE at the 80 to 90 dB SPL masker level increased with frequency, ranging from 2.2 to 5.7 dB. This finding was consistent with simulation results for which energy cues were weighted more strongly than temporal envelope cues. The largest RE was at 4000 Hz, the test frequency at which our study participants had the highest pure-tone thresholds. Because our NH and SNHL groups were not age matched, it is difficult to delineate the effect of hearing loss alone on RE. Although it is unknown whether aging itself affects the coding and weighting of the cues, our analyses showed that the size of RE was significantly correlated with pure-tone audiometric threshold at 4000 Hz after controlling for age. This finding suggests that the increased weighting of the energy cue was associated with a greater degree of hearing loss.
One other key finding is that listeners with SNHL were affected differently by the roving-level paradigm across test frequencies and levels. Eight out of 12 participants in the SNHL group had RE >2 SD of the NH group in at least one test condition (Fig. 5). The reliance on and effective use of temporal cues, indicated by minimal changes in threshold between the fixed- and roving-level conditions across a range of levels and frequencies, could be a hallmark for NH function. Examination of the track data (see examples in Fig. 3) provided evidence that listeners, regardless of hearing status, used consistent weighting of the two cues across fixed- and roving-level test conditions. Even when the task was abruptly switched from fixed- to roving-level detection, the transition did not lead to a sudden worsening of performance that improved after an apparent change in strategy. The tracks of listeners with no RE were essentially flat after the initial descent and across the transition from fixed- to roving-level testing, whereas listeners with larger RE showed an increase in detection threshold immediately after the roving-level paradigm was initiated, followed by a plateau in their tracks.
The potential roles of temporal and energy cues in tone-in-noise detection and coding of speech in noise are illustrated by the responses of model AN fibers (Fig. 2). In a healthy ear, the addition of a tonal stimulus at or above threshold in the presence of noise flattens the amplitude fluctuation of the instantaneous firing rate of the fiber tuned near the tone frequency. The capture of the AN responses tuned near the tone frequency and the associated flattening of the neural fluctuations in those responses depends on cochlear amplification and the saturation of the IHC response (Zilany & Bruce 2007; Carney 2018). The same pattern of neural fluctuations is observed in responses to vowels (Carney et al. 2015, 2016). The fluctuation pattern across frequency channels is resilient to changes in the overall stimulus level and in the presence of background noise, and the pattern is enhanced along the ascending pathway.
In the ear with SNHL, the interplay of changes in average discharge rates and in temporal information, such as capture, is potentially complex. For SNHL that is primarily attributed to loss of OHC function, IHC saturation would not occur at the same sound levels as in a healthy ear, due to reduction of cochlear amplification (Fig. 2B). Reduced sensitivity of the IHC, which may also contribute to SNHL, would also reduce saturation of the IHC response and therefore further reduce capture. In an ear with normal or near-normal thresholds, an increase in tone level may lead to capture by the tone without a change in average rate, whereas in an ear with reduced sensitivity, increasing the tone level may increase firing rate and provide an energy-based cue. Capture is reduced in AN fibers in cats with SNHL (Miller et al. 1997), leading to a degradation in the fluctuation pattern across frequency channels (Fig. 2B, and Carney et al. 2016). Average discharge rates of AN fibers in the ear with SNHL would be affected by the slope of the rate-level functions, which have been assumed to explain loudness recruitment (Harrison 1981). However, Heinz and Young (2004) reported that steeper rate-level functions in AN fibers of cats with SNHL only occurred in response to complex sounds, such as speech, or in response to tones for fibers with thresholds of approximately 80 dB SPL or higher. On average, they found that rate-level functions in the impaired ear were slightly less steep than in the normal ear (Heinz & Young 2004). The relative contributions, and the interaction, of changes in temporal response properties, such as capture, and changes in the average-rate profile across AN fibers are important topics for future studies of SNHL.
It is well known that older adults with NH as well as individuals with hearing loss require a more favorable SNR to obtain equal performance in speech-in-noise testing when compared with young NH peers, even when audibility is maximized (Dubno et al. 1984; Festen & Plomp 1990; Pichora-Fuller et al. 1995; McArdle & Wilson 2006). In the present study, because of the small number of participants, the NH group and the SNHL group were not age matched. A larger sample size would be needed to further study the effect of age and hearing loss, and potentially musicianship (Zendel & Alain 2012), on the size of RE and how RE may be used to predict speech-in-noise performance.
Future studies of roving-level tone-in-noise detection of individuals with normal versus poor speech-in-noise performance in particular may shed light on subtle changes in the cues that are used by the auditory system. Because the present study has shown that listeners with SNHL may use different cues at different test frequencies in tone-in-noise detection, modeling of these behavioral responses using different cues may lead to better understanding of encoding in the healthy auditory system, as well as the effect of OHC and/or IHC dysfunction on the use of temporal envelope cues. Alternative analyses, such as decision variable correlations (Sebastian & Geisler 2018), may facilitate determination of an individual listener’s weighting of energy versus envelope cues, potentially even in real time. Intervention strategies focused on enhancing temporal cues that are resilient to level changes may guide novel strategies for signal processing that currently focus on maximizing audibility for people with SNHL. A better understanding of the relationship between speech intelligibility and the cues used by individual listeners across level and frequency may lead to further customization and greater improvement in aural rehabilitation.
ACKNOWLEDGMENTS
This work was supported by NIH-DC-010813.
Footnotes
The authors have no conflicts of interest to disclose.
REFERENCES
- Carney LH (2018). Supra-threshold hearing and fluctuation profiles: Implications for sensorineural and hidden hearing loss. J Assoc Res Otolaryngol, 19, 1–22.
- Carney LH, Kim DO, Kuwada S (2016). Speech coding in the midbrain: Effects of sensorineural hearing loss. In van Dijk P, Başkent D, Gaudrain E, de Kleine E, Wagner A, Lanting C (Eds.) Physiology, Psychoacoustics and Cognition in Normal and Impaired Hearing (pp. 427–435). Cham, Switzerland: Springer.
- Carney LH, Li T, McDonough JM (2015). Speech coding in the brain: Representation of vowel formants by midbrain neurons tuned to sound fluctuations. Eneuro, 2, 1–12.
- Dai H (2008). On suppressing unwanted cues via randomization. Percept Psychophys, 70, 1379–1382.
- Dai H, & Kidd G (2009). Limiting unwanted cues via random rove applied to the yes-no and multiple-alternative forced choice paradigms. J Acoust Soc Am, 126, EL62–EL67.
- Davidson SA, Gilkey RH, Colburn HS, et al. (2009). An evaluation of models for diotic and dichotic detection in reproducible noises. J Acoust Soc Am, 126, 1906–1925.
- Dubno JR, Dirks DD, Morgan DE (1984). Effects of age and mild hearing loss on speech recognition in noise. J Acoust Soc Am, 76, 87–96.
- Evilsizer ME, Gilkey RH, Mason CR, et al. (2002). Binaural detection with narrowband and wideband reproducible noise maskers: I. Results for human. J Acoust Soc Am, 111(1 Pt 1), 336–345.
- Festen JM, & Plomp R (1990). Effects of fluctuating noise and interfering speech on the speech-reception threshold for impaired and normal hearing. J Acoust Soc Am, 88, 1725–1736.
- Fletcher H (1940). Auditory patterns. Reviews of modern physics, 12, 47.
- Gilkey RH, & Robinson DE (1986). Models of auditory masking: A molecular psychophysical approach. J Acoust Soc Am, 79, 1499–1510.
- Green DM (1988). Profile Analysis: Auditory Intensity Discrimination. Oxford, United Kingdom: Oxford Science.
- Harrison RV (1981). Rate-versus-intensity functions and related AP responses in normal and pathological guinea pig and human cochleas. J Acoust Soc Am, 70, 1036–1044.
- Heinz MG, & Young ED (2004). Response growth with sound level in auditory-nerve fibers after noise-induced hearing loss. J Neurophysiol, 91, 784–795.
- Hoffman HJ, Dobie RA, Losonczy KG, et al. (2017). Declining prevalence of hearing loss in US adults aged 20 to 69 years. JAMA Otolaryngol Head Neck Surg, 143, 274–285.
- Jepsen ML, Ewert SD, Dau T (2008). A computational model of human auditory signal processing and perception. J Acoust Soc Am, 124, 422–438.
- Kidd G Jr, Mason CR, Brantley MA, et al. (1989). Roving-level tone-in-noise detection. J Acoust Soc Am, 86, 1310–1317.
- Kohlrausch A, Fassel R, van der Heijden M, et al. (1997). Detection of tones in low-noise noise: Further evidence for the role of envelope fluctuations. Acta Acust united Ac, 83, 659–669.
- Lentz JJ, Richards VM, Matiasek MR (1999). Different auditory filter bandwidth estimates based on profile analysis, notched noise, and hybrid tasks. J Acoust Soc Am, 106, 2779–2792.
- Levitt H (1971). Transformed up-down methods in psychoacoustics. J Acoust Soc Am, 49(2B), 467–477.
- Mao J, & Carney LH (2015). Tone-in-noise detection using envelope cues: Comparison of signal-processing-based and physiological models. J Assoc Res Otolaryngol, 16, 121–133.
- Mao J, Koch KJ, Doherty KA, et al. (2015). Cues for diotic and dichotic detection of a 500-Hz tone in noise vary with hearing loss. J Assoc Res Otolaryngol, 16, 507–521.
- Mao J, Vosoughi A, Carney LH (2013). Predictions of diotic tone-in-noise detection based on a nonlinear optimal combination of energy, envelope, and fine-structure cues. J Acoust Soc Am, 134, 396–406.
- McArdle RA, & Wilson RH (2006). Homogeneity of the 18 QuickSIN lists. J Am Acad Audiol, 17, 157–167.
- Miller RL, Schilling JR, Franck KR, et al. (1997). Effects of acoustic trauma on the representation of the vowel “eh” in cat auditory nerve fibers. J Acoust Soc Am, 101, 3602–3616.
- Moore BC, & Glasberg BR (1983). Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. J Acoust Soc Am, 74, 750–753.
- Patterson RD (1976). Auditory filter shapes derived with noise stimuli. J Acoust Soc Am, 59, 640–654.
- Patterson RD, & Moore BCJ (1986). Auditory filters and excitation patterns as representations of frequency resolution. In Moore BCJ (Ed.), Frequency Selectivity in Hearing (pp. 123–177). London, United Kingdom: Academic Press.
- Pichora-Fuller MK, Schneider BA, Daneman M (1995). How young and old adults listen to and remember speech in noise. J Acoust Soc Am, 97, 593–608.
- Richards VM (1992). The effects of level uncertainty on the detection of a tone added to narrow bands of noise. In Cazals Y, Horner K, Demany L (Eds.) Auditory Physiology and Perception (pp. 337–343). Oxford, United Kingdom: Pergamon Press.
- Richards VM, & Nekrich RD (1993). The incorporation of level and level-invariant cues for the detection of a tone added to noise. J Acoust Soc Am, 94, 2560–2574.
- Sebastian S, & Geisler WS (2018). Decision-variable correlation. J Vis, 18, 3.
- van de Par S, & Kohlrausch A (1999). Dependence of binaural masking level differences on center frequency, masker bandwidth, and interaural parameters. J Acoust Soc Am, 106, 1940–1947.
- Zendel BR, & Alain C (2012). Musicians experience less age-related decline in central auditory processing. Psychol Aging, 27, 410–417.
- Zilany MS, & Bruce IC (2007). Representation of the vowel /epsilon/ in normal and impaired auditory nerve fibers: Model predictions of responses in cats. J Acoust Soc Am, 122, 402–417.
- Zilany MS, Bruce IC, & Carney LH (2014). Updated parameters and expanded simulation options for a model of the auditory periphery. J Acoust Soc Am, 135, 283–286.