Abstract
Results of simultaneous notched-noise masking are commonly interpreted as reflecting the bandwidth of underlying auditory filters. This interpretation assumes that listeners detect a tone added to notched-noise based on an increase in energy at the output of an auditory filter. Previous work challenged this assumption by showing that randomly and independently varying (roving) the levels of each stimulus interval does not substantially worsen listener thresholds [Lentz, Richards, and Matiasek (1999). J. Acoust. Soc. Am. 106, 2779–2792]. Lentz et al. further challenged this assumption by showing that filter bandwidths based on notched-noise results were different from those based on a profile-analysis task [Green (1983). Am. Psychol. 38, 133–142; (1988). (Oxford University Press, New York)], although these estimates were later reconciled by emphasizing spectral peaks of the profile-analysis stimulus [Lentz (2006). J. Acoust. Soc. Am. 120, 945–956]. Here, a single physiological model is shown to account for performance in fixed- and roving-level notched-noise tasks and the Lentz et al. profile-analysis task. This model depends on peripheral neural fluctuation cues that are transformed into the average rates of model inferior colliculus neurons. Neural fluctuations are influenced by peripheral filters, synaptic adaptation, cochlear amplification, and saturation of inner hair cells, an element not included in previous theories of envelope-based cues for these tasks. Results suggest reevaluation of the interpretation of performance in these paradigms.
I. INTRODUCTION
The power-spectrum model of masking [Patterson (1976), Moore (1986), Glasberg and Moore (1990), and Moore (2007); see also the related work of Fletcher (1938, 1940)] provides an elegant basis for explaining a wide variety of masking data and is one of the most widely used models in psychoacoustics. Some perspectives on the power-spectrum model suggest that it has a basis in peripheral physiology (Sumner et al., 2018). Others view the power-spectrum model more conceptually, simply as a predictor of behavioral results. The central principle of the model is that masking results can be predicted by representing stimuli in terms of power within a single frequency channel (Moore, 2007).
Despite its usefulness, the power-spectrum model of masking has limitations. Previous evidence from Lentz et al. (1999) suggested that the power-spectrum representation of stimuli may not be sufficient for predicting behavioral results of the classical simultaneous notched-noise masking paradigm (Patterson, 1976; Patterson and Nimmo-Smith, 1980; Patterson et al., 1982; Glasberg and Moore, 1990; Rosen and Stock, 1992; Heinz et al., 2002; Baker and Rosen, 2006; Alves-Pinto et al., 2016; Burton et al., 2018). Lentz et al. (1999) randomly varied (roved) the level of both stimuli in a two-interval notched-noise task over a wide range (30 dB) to make energy cues less reliable, but listener thresholds were only slightly elevated. This result implied that listener performance (and auditory-filter estimation using this paradigm) did not depend on the detection of an increase in power within a single frequency channel. The current study tested a physiological model that can explain simultaneous notched-noise masking results with and without roving levels, thus accounting for notched-noise performance in a way the conceptual power-spectrum model cannot.
The results of Lentz et al. (1999), as well as results for tone-in-noise detection in paradigms with level variation [e.g., Kidd et al. (1989) and Richards (1992)] and low-noise noise (Kohlrausch et al., 1997), suggested that temporal envelope cues may play a role in detection with simultaneous maskers. These studies interpreted psychophysical results based on an analysis of cues available in the envelope of the stimulus waveform or on the outputs of band-pass filters representing the auditory periphery. The current study focused not on envelope cues in the stimulus waveform, but on temporal cues that are carried in the envelope-driven fluctuations of neural responses. These fluctuations include the effects of both peripheral tuning and nonlinearities and are referred to as “neural fluctuations” (Carney, 2018).
Figure 1 schematically illustrates neural fluctuations in two example post-stimulus-time histograms (PSTHs) of an auditory-nerve (AN) fiber model. The two examples have strong [Fig. 1(a)] and weak [Fig. 1(b)] fluctuations in instantaneous rate as a function of time. The amplitudes of the fluctuations (i.e., size of amplitude variations in the red lines) differ substantially between the two examples, whereas the average rates (dashed blue lines) do not. These examples illustrate that fluctuations in instantaneous rate, which are shaped over a relatively wide range of levels by the saturating transduction nonlinearity of the inner hair cell (IHC), can differ even when average rates are the same (Carney, 2018). For most AN fibers, saturation of average discharge rates, which occurs due to saturation of the synapse between the IHC and AN, occurs at a lower sound level than the more gradual saturation of transduction in the IHC (Patuzzi and Robertson, 1988). The distinction between neural fluctuations and temporal fine structure (TFS) of the neural responses is also important. Although very low-frequency TFS may be indistinguishable from neural fluctuations, the term neural fluctuations generally refers to “slow” changes with respect to those in the TFS, i.e., neural fluctuations correspond to the fluctuations in the envelope of the PSTH. The three elements of the AN response labeled in Fig. 1 (average rate, TFS, and neural fluctuations) are notably different from the analogous elements of the signal at the output of a peripheral filter (e.g., energy, TFS, and envelope of the filtered stimulus) because the neural responses are strongly shaped by peripheral nonlinearities (Carney, 2018).
FIG. 1.
(Color online) Neural fluctuations: (a) AN post-stimulus-time histogram (PSTH) schematic showing strong fluctuations. (b) AN PSTH schematic showing weak fluctuations (flat AN response). Elements of the AN response include neural fluctuations (thick red line), average rate (dashed blue line), and temporal fine structure (thin black line). Neural fluctuations are schematically highlighted here by the upper envelope of the PSTH. In both (a) and (b), the AN fiber has a characteristic frequency (CF) of approximately 1 kHz. AN responses were to a 1/3 octave narrowband noise (spectrum level of 30 dB re 20 μPa) centered on 1 kHz (a) to which a 1 kHz tone was added (b) at an SNR of +2 dB.
An important phenomenon that shapes neural fluctuations is synchrony capture (Deng and Geisler, 1987; Miller et al., 1997). Capture occurs when a single frequency component dominates the response of the inner hair cell (IHC) due to saturation of the IHC and nonlinear gain in the cochlea (Zilany and Bruce, 2007). Because a captured response resembles the response to a single tone, capture reduces low-frequency fluctuations in the auditory-nerve (AN) response [e.g., relatively flat response in Fig. 1(b)]. For stimuli with relatively low-density frequency components, neural channels centered on single frequency components are captured and fluctuate less. Neural channels not centered on single frequency components are not captured by single components and therefore have larger fluctuations because of beating between components [similar to the larger fluctuations in Fig. 1(a)]. Capture depends on the relative level of the single-frequency components near a fiber's characteristic frequency (CF).
Several other peripheral components also shape fluctuations in AN responses. For example, the narrowband filtering in the cochlea emphasizes stimulus components near CF, such that fluctuations in the envelope of the cochlear response are reduced for channels tuned near spectral peaks (e.g., Mao et al., 2013). Nonlinear cochlear amplification interacts with the sensitivity of IHC transduction to modulate capture related to IHC saturation (Zilany and Bruce, 2007). Physiologically realistic IHC-AN synaptic adaptation also tends to reduce fluctuations in the AN response when a spectral peak falls near a fiber's CF. The current study used a model of the AN (Zilany et al., 2014) that includes all of these features to accurately simulate temporal fluctuations in the responses of peripheral neurons (Joris and Yin, 1992). Other models with realistic basilar-membrane compression and saturating IHC transduction may also reliably simulate neural fluctuations [e.g., Sumner et al. (2002), Verhulst et al. (2018), and Bruce et al. (2018)]. It is important to emphasize that the contrast between neural fluctuation amplitudes across frequency channels is not simply explained by rate-saturation. Fluctuation amplitudes in the instantaneous rate of AN fibers can vary even when the average rates of all channels are saturated (Carney, 2018).
To our knowledge, previous modeling studies of envelope-related cues in the notched-noise task or related simultaneous-masking tasks (Derleth and Dau, 2000; Jepsen et al., 2008) have not simulated responses to notched-noise with roving levels, as used in Lentz et al. (1999). Also, these previous modeling studies incorporated none or only some of the nonlinearities that are important for accurately representing neural fluctuations and capture. Differences between the current model and modulation-filter-based models for masked detection such as Dau et al. (1997), used in Derleth and Dau (2000) and with updates in Jepsen et al. (2008), are mentioned in Sec. II B below.
Neural fluctuations are of particular interest for studies of auditory coding because cells at higher levels of the auditory pathway, particularly in the auditory midbrain [inferior colliculus (IC)], can be strongly excited or suppressed by low-frequency sinusoidal modulations of stimulus amplitude. Changes in the average rates of IC neurons in response to stimulus modulations are characterized by modulation transfer functions (MTFs) based on responses to sinusoidally amplitude-modulated (AM) sounds [for a review, see Joris et al. (2004)]. Many IC neurons have MTFs with average rates that are increased or suppressed over a band of modulation frequencies, referred to as band-enhanced or band-suppressed MTFs, respectively [Kim et al. (2015) and Carney et al. (2016); previously referred to as band-pass or band-reject MTFs, respectively: Krishna and Semple (2000) and Nelson and Carney (2007)]. In response to complex sounds, the fluctuations that are set up in AN responses ascend the auditory pathway to excite or suppress the responses of AM-sensitive neurons in the midbrain, depending upon the fluctuation amplitudes and frequencies. Therefore, temporal fluctuations in peripheral neurons are recast as changes in average rate across the population of IC cells.
In the current study, neurons from the central nucleus of the IC (ICC, hereafter abbreviated IC) with band-enhanced MTFs were simulated using a physiological model of IC responses to AM sounds based on excitatory-inhibitory interactions (Nelson and Carney, 2004; Carney et al., 2015). Best modulation frequencies (BMFs) of the model neurons were chosen based on distributions of BMFs found in the mammalian IC (Langner and Schreiner, 1988; Krishna and Semple, 2000; Nelson and Carney, 2007). The first goal of this study was to use this midbrain model to test the hypothesis that peripheral neural fluctuations, manifest as changes in discharge rates of AM-sensitive cells in the IC, can explain performance for the detection of a tone added to a notched noise, with and without roving levels. Implicit in this hypothesis is the suggestion that temporal features of the stimulus, transformed by peripheral nonlinearities, may account for listener performance. This explanation, potentially demonstrable using any computational model that accurately represents peripheral nonlinearities, is fundamentally different from explanations of human performance based on increases in energy at the output of one filter due to the added tone, such as the power-spectrum model, and from models based on cross-channel comparisons of energy, such as the profile-analysis model (Green, 1988). In the current study, for comparison to previous energy-based models, performance based on physiologically realistic energy cues was also evaluated, using AN average rates as the neural representation of energy at the output of physiological (nonlinear) peripheral models.
The main objective of Lentz et al. (1999) was to evaluate the consistency of filter bandwidths estimated using different stimuli and paradigms. In addition to the detection of a tone added to notched noises in both fixed-level (without random level variation) and roving-level paradigms, Lentz et al. (1999) also examined a version of the classic profile-analysis task (Green, 1983; Green et al., 1984; Green and Mason, 1985; Green, 1988; Dai and Green, 1993; Richards and Lentz, 1998). The profile-analysis task in their study required discrimination between a set of logarithmically spaced tones with equal amplitude (the standard stimulus) and that same set of tones with alternate components amplified/attenuated in an up-down pattern (the signal-plus-standard stimulus). Critically, Lentz et al. (1999) tested the same subjects on both the profile-analysis and detection in notched-noise tasks and observed substantial differences in auditory-filter bandwidths estimated for individual subjects based on the two datasets, as interpreted using the power-spectrum model of masking. These differing estimates were reconciled by Lentz (2006) by adding to the conceptual power-spectrum model explanation, suggesting that listeners rely primarily on the peaks in the spectra of these stimuli and adjusting their modeling procedure to select those peaks. The nonlinear peripheral response properties described above may emphasize spectral peaks for the central auditory system in a way similar to the selection of peaks in Lentz (2006). A secondary goal of the current study was to test the hypothesis that a single neural-fluctuation-based model could account for the psychophysical results for both the detection of a tone added to notched noise and the profile-analysis task used by Lentz et al. (1999). The nature of the changes in neural fluctuations across channels that potentially support performance on each task is illustrated and discussed below.
II. METHODS
A. Stimuli
Stimuli were generated following Lentz et al. (1999), experiments 1 and 2. Stimuli were scaled in pascals for use as inputs to the AN model, because the nonlinear AN model responses depend strongly on sound level.
1. Notched noise
The notched-noise stimuli are schematically illustrated in Fig. 2(a). The stimuli were presented using a simulated two-interval, forced-choice procedure. The stimulus in each interval was 100 ms in duration, including 5-ms raised cosine on/off ramps. Two intervals, one with the added tone (target) and one without the tone (standard), were separated by a silent interstimulus interval of 330 ms. Each notched-noise masker was produced by adding tones spaced at 10 Hz to create two noise bands. Noise bandwidths tested were 50, 100, and 300 Hz. A wideband condition, in which the masker extended from 200 to 5000 Hz without energy in the notch region, was also tested. For each noise bandwidth condition, notch bandwidths of 100, 200, 300, 400, and 800 Hz were tested. Gaussian noises were generated as follows: the magnitude of each masker tone was drawn from a Rayleigh distribution, and phases were drawn at random from a uniform distribution with a range of 2π rad (Hartmann, 1998, p. 605). The masker was scaled based on the average level of a large set of stimuli with a spectrum level of 35 dB re 20 μPa. Target tones at 1 kHz ranged in level from 10 to 90 dB sound pressure level (SPL); several tone levels were tested to estimate the threshold at which the models achieve 79% correct, as described in Sec. II C. In the roving-level condition, the overall level of each interval was randomly and independently drawn from a 30-dB range [±15 dB, vertical gray arrows, Fig. 2(a)].
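The masker construction described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: the function names, the 20-kHz sample rate, the symmetric placement of the notch around 1 kHz on a linear frequency axis, and the inclusion of a component at each band edge are all hypothetical choices, and the calibration to a 35 dB spectrum level in pascals is omitted.

```python
import math
import random

def notched_noise_components(noise_bw_hz, notch_bw_hz, rng,
                             center_hz=1000.0, spacing_hz=10.0):
    """Return (frequency, magnitude, phase) triples for the two noise bands."""
    # Assumption: the notch is centered on the tone frequency (linear axis).
    lower_edge = center_hz - notch_bw_hz / 2.0   # inner edge of lower band
    upper_edge = center_hz + notch_bw_hz / 2.0   # inner edge of upper band
    # Assumption: a component sits on both edges of each band.
    n_per_band = int(round(noise_bw_hz / spacing_hz)) + 1
    freqs = [lower_edge - k * spacing_hz for k in range(n_per_band)]
    freqs += [upper_edge + k * spacing_hz for k in range(n_per_band)]
    comps = []
    for f in sorted(freqs):
        # Rayleigh-distributed magnitude (unit scale) by inverse-CDF sampling
        mag = math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        # Phase uniform over a 2*pi range
        phase = rng.uniform(0.0, 2.0 * math.pi)
        comps.append((f, mag, phase))
    return comps

def synthesize(comps, dur_s=0.100, ramp_s=0.005, fs=20000, gain_db=0.0):
    """Sum the components and apply raised-cosine on/off ramps (uncalibrated)."""
    n = int(round(dur_s * fs))
    n_ramp = int(round(ramp_s * fs))
    gain = 10.0 ** (gain_db / 20.0)
    wave = []
    for i in range(n):
        t = i / fs
        x = sum(m * math.cos(2.0 * math.pi * f * t + p) for f, m, p in comps)
        if i < n_ramp:                       # onset ramp
            env = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        elif i >= n - n_ramp:                # offset ramp
            env = 0.5 * (1.0 - math.cos(math.pi * (n - 1 - i) / n_ramp))
        else:
            env = 1.0
        wave.append(gain * env * x)
    return wave

rng = random.Random(0)
comps = notched_noise_components(noise_bw_hz=50, notch_bw_hz=200, rng=rng)
rove_db = rng.uniform(-15.0, 15.0)   # roving-level condition: +/-15 dB per interval
standard = synthesize(comps, gain_db=rove_db)
```

In a roving-level trial, a separate `rove_db` would be drawn for each of the two intervals, so that overall level alone does not identify the target.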
FIG. 2.
Notched-noise and profile-analysis stimuli, based on Lentz et al. (1999). (a) Notched-noise stimuli: standard and target. (b) Profile-analysis stimuli. Vertical gray bars indicate roving-level ranges.
2. Profile analysis
Standard and target profile-analysis stimuli are illustrated in Fig. 2(b). The stimuli were 100 ms in duration, including 5-ms raised cosine on/off ramps, and the two “listening” intervals were separated by a silent interstimulus interval of 330 ms. Stimuli for both intervals were produced by adding components with frequencies logarithmically equally spaced from 200 to 5000 Hz. The numbers of components tested were 4, 6, 10, 16, 20, 30, 40, and 50. The phases of the individual components were chosen for each interval at random from a uniform distribution with a range of 2π rad. For the standard stimulus (flat spectrum), all components were individually scaled to 50 dB SPL per component. For the target stimulus (“up-down” spectrum), each component was scaled to 50 dB SPL and then additionally scaled by $1 \pm 10^{\mathrm{SRS}/20}$, with odd-numbered components amplified (plus sign) and even-numbered components attenuated (minus sign); SRS refers to the Signal re Standard level, i.e., the magnitude of the signal relative to the standard, in dB. An example of this scaling is as follows: an SRS value of 0 dB fully removes the attenuated components and doubles the amplitude of the amplified components [Lentz et al. (1999), see endnote 3]. Tested values of SRS ranged from −40 to 0 dB; this range of values was tested to estimate the model threshold (see Sec. II C). The intervals were then randomly and independently roved over a 30-dB range [±15 dB, vertical gray arrows, Fig. 2(b)].
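As a rough illustration of the up-down scaling, the sketch below computes component frequencies and amplitudes for the standard and target stimuli. It assumes that the amplitude scale factor is 1 ± 10^(SRS/20), which is consistent with the SRS = 0 dB example in the text, that components are numbered from 1 starting at the lowest frequency, and that 50 dB SPL refers to the RMS level of each sinusoid. The function name and arguments are hypothetical.

```python
import math

def profile_components(n_components, srs_db=None, f_lo=200.0, f_hi=5000.0,
                       level_db_spl=50.0):
    """Return (frequency_hz, amplitude_pa) pairs; srs_db=None gives the standard."""
    p_ref = 20e-6                                    # 20 uPa reference pressure
    # Peak amplitude of a sinusoid whose RMS level is level_db_spl
    base = math.sqrt(2.0) * p_ref * 10.0 ** (level_db_spl / 20.0)
    # Constant frequency ratio between neighbors (log-equal spacing)
    ratio = (f_hi / f_lo) ** (1.0 / (n_components - 1))
    # Assumed scale increment: 10^(SRS/20); zero for the flat standard
    delta = 0.0 if srs_db is None else 10.0 ** (srs_db / 20.0)
    comps = []
    for k in range(n_components):
        freq = f_lo * ratio ** k
        # Components 1, 3, 5, ... (k = 0, 2, 4, ...) go up; the others go down
        sign = 1.0 if k % 2 == 0 else -1.0
        comps.append((freq, base * (1.0 + sign * delta)))
    return comps

standard = profile_components(10)
target = profile_components(10, srs_db=-20.0)   # +/-10% amplitude change
```

Under these assumptions, an SRS of −20 dB corresponds to amplitude factors of 1.1 and 0.9, and an SRS of 0 dB gives factors of 2 and 0.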
B. Models
Responses were simulated for populations of neurons in both the AN and the inferior colliculus (IC). Simulated AN responses were generated from the Zilany et al. (2014) AN model, which incorporates cochlear compression and IHC saturation, as well as synaptic adaptation and saturation.
Human parameters for the Zilany et al. (2014) model were based on those in Ibrahim and Bruce (2010), including basilar membrane tuning based on Shera et al. (2002). The AN parameters incorporate healthy outer and inner hair cell function and spontaneous activity that varies over time and from trial to trial. The implementation of spontaneous activity was the same as in Zilany et al. (2014) [mostly unaltered from Zilany et al. (2009): see Table I in Zilany et al. (2014)]. Only high-spontaneous-rate (HSR) AN fibers were simulated. The wide dynamic ranges of low-spontaneous-rate (LSR) fibers are only observed above about 1500 Hz [Winter and Palmer (1991); see also Fig. 16 from Liberman (1978)] and so would not substantially affect threshold estimates in the notched-noise task in which the stimulus was centered at 1 kHz. Even if LSR fibers with wide dynamic ranges were available at relevant CFs, LSR fibers would not be more effective than HSR fibers at coding stimuli with roving levels using an energy-based metric such as average rate.
For all notched-noise and profile-analysis tests, 99 characteristic frequencies (CFs) were logarithmically spaced between 200 and 5000 Hz, including one CF at 1000 Hz. One AN fiber was simulated per CF. All simulations used the instantaneous rate functions provided by the output of the AN model, rather than spike times. This AN model output was the average arrival rate for a non-homogeneous Poisson process in spikes per second (sp/s), as could be used to drive a spike generator. This output may alternatively be conceptualized as an estimate of a PSTH based on a large number of repetitions of a model with spike generation.
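The text does not state exactly how the CF grid was constructed. A minimal sketch, assuming uniform spacing on a logarithmic axis with both endpoints included, shows why one CF lands on 1000 Hz: 1000 Hz is the geometric mean of 200 and 5000 Hz, so with an odd number of CFs the middle channel falls at 1 kHz. Variable names are illustrative.

```python
n_cf = 99
f_lo, f_hi = 200.0, 5000.0

# Log-equal spacing: each CF is a constant ratio above its lower neighbor
cfs = [f_lo * (f_hi / f_lo) ** (k / (n_cf - 1)) for k in range(n_cf)]

# The middle (50th) channel sits at the geometric mean of the endpoints
mid_cf = cfs[(n_cf - 1) // 2]
```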
Note that this AN model is substantially different from peripheral models such as Dau et al. (1997), used in simulations of the notched-noise task in Derleth and Dau (2000) and similar masking tasks (with some changes to the model) in Jepsen et al. (2008). The Dau et al. (1997) model, and later modifications, did not include IHC saturation and did not aim to simulate physiological neural fluctuations and capture.
Model IC cells were band-enhanced cells, which are excited by fluctuations in a band of frequencies, as mentioned above (Carney et al., 2015; Carney et al., 2016). The AN model responses provided the input to the brainstem stage of the same-frequency inhibitory-excitatory (SFIE) model (Nelson and Carney, 2004; Carney et al., 2015; Carney and McDonough, 2019). The parameters of the SFIE model (Table I) were set to produce IC modulation filters centered on neural fluctuation frequencies near 50, 100, and 150 Hz (model BMFs). The bandwidths of these modulation filters were approximately equal to their center frequencies (i.e., quality factor, Q ≈ 1; Nelson and Carney, 2004). These model BMFs fall near the center of the distribution of BMFs found in the mammalian IC (Langner and Schreiner, 1988; Krishna and Semple, 2000; Nelson and Carney, 2007).
TABLE I.
SFIE model parameters. Columns represent the CN (brainstem) and IC stages (50, 100, and 150 Hz BMFs) of the SFIE model. For further description of the parameters, see Nelson and Carney (2004).
Parameter | CN | IC 50 Hz BMF | IC 100 Hz BMF | IC 150 Hz BMF |
---|---|---|---|---|
Excitatory time constant (ms) | 0.5 | 2 | 1 | 0.667 |
Inhibitory time constant (ms) | 2 | 3 | 1.5 | 1 |
Delay of inhibition (ms) | 1 | 4 | 2 | 1.33 |
Inhibitory strength relative to excitatory strength | 0.6 | 0.9 | 0.9 | 0.9 |
Average rate scalar | 1.5 | 1 | 1 | 1 |
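To make the roles of the Table I parameters concrete, the following sketch implements one generic excitatory-inhibitory stage and chains two of them (CN, then IC). It is an illustration only, not the published SFIE code: it assumes that excitation and inhibition are the input rate function smoothed by unit-area alpha functions, t·exp(−t/τ), that the delayed and weighted inhibition is subtracted from the excitation, and that the result is half-wave rectified and multiplied by the rate scalar. The function names, the 10-kHz sample rate, and the constant 100 sp/s input are hypothetical.

```python
import math

def alpha_kernel(tau_s, fs):
    """Discrete alpha function t*exp(-t/tau), truncated at 10*tau, unit sum."""
    n = int(round(10.0 * tau_s * fs))
    k = [(i / fs) * math.exp(-(i / fs) / tau_s) for i in range(n)]
    total = sum(k)
    return [v / total for v in k]

def causal_conv(x, kernel):
    """Causal convolution; output has the same length as x."""
    return [sum(kernel[j] * x[i - j] for j in range(min(i + 1, len(kernel))))
            for i in range(len(x))]

def sfie_stage(rate_in, fs, tau_exc_ms, tau_inh_ms, delay_ms, inh_strength, scalar):
    """One excitatory-inhibitory stage (assumed form; see lead-in)."""
    exc = causal_conv(rate_in, alpha_kernel(tau_exc_ms * 1e-3, fs))
    inh = causal_conv(rate_in, alpha_kernel(tau_inh_ms * 1e-3, fs))
    d = int(round(delay_ms * 1e-3 * fs))     # inhibitory delay in samples
    out = []
    for i in range(len(rate_in)):
        delayed_inh = inh[i - d] if i >= d else 0.0
        # Half-wave rectify the difference, then apply the rate scalar
        out.append(scalar * max(0.0, exc[i] - inh_strength * delayed_inh))
    return out

fs = 10000
an_rate = [100.0] * 1000     # hypothetical constant 100 sp/s AN rate, 100 ms
cn = sfie_stage(an_rate, fs, 0.5, 2.0, 1.0, 0.6, 1.5)    # CN column of Table I
ic100 = sfie_stage(cn, fs, 1.0, 1.5, 2.0, 0.9, 1.0)      # IC 100 Hz BMF column
```

With an unmodulated input, the strong inhibition at the IC stage (0.9 relative to excitation) nearly cancels the excitation in the steady state, which is consistent with a cell that responds mainly when its input fluctuates.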
C. Model threshold estimation
For each experimental condition (e.g., different stimulus parameters and signal levels), population model responses were simulated for 100 two-interval trials. The AN and IC population model responses were simulated beginning with the first interval and continuing through the interstimulus interval to the end of the second interval. Half of the 100 trials had the signal in the first interval and half had the signal in the second interval to control for any effects of ordering on the model response. Average rates for each CF channel during each interval were collected by averaging the output rate functions over an analysis window with a length matched to the duration of each interval (the interstimulus interval was excluded from analysis).
Signal levels at model detection threshold based on AN and IC rates were estimated for each experimental condition as follows. First, a template depicting the average response to the standard interval was generated by averaging the response rate curves (consisting of average rates for each CF on each trial) across 1500 standard intervals (see Fig. 3). [For a review of some other approaches to templates in auditory contexts, see Appendix E of Osses Vecchi (2018). For additional background see Dau et al. (1996) and Green and Swets (1966).] These 1500 intervals were separate from intervals used in the experiment, and responses were collected with the standard intervals in pairs, separated by the same interstimulus interval as in the experiment. This large number of intervals was used so that no single standard interval would have an inordinate effect on the template. The standard interval was used for the template rather than the target because responses to the standard did not change with tone level (for the tone-in-notched-noise task) or SRS (for the profile-analysis task). Distinct templates were generated for each condition in the tone-in-notched-noise task (e.g., different noise bandwidths and different notch bandwidths) and for each condition in the profile-analysis task (e.g., different numbers of components), separately for fixed-level and roving-level conditions. Then, for each trial, the Mahalanobis distance [Mahalanobis, 1936; Duda et al., 2001, p. 35, Eq. (45); see footnote 1] between the response to each interval of the trial and the template was calculated, and the interval with the response that was more distant from the template was selected as the target interval. The Mahalanobis distance takes into account the covariance across model channels, avoiding the assumption that the channels are independent; the covariance can be significant in responses to complex sounds.
The covariance used for the Mahalanobis distance was the covariance of the population responses to the standard intervals used in the template. In this way, percent correct across 100 trials was determined for each signal level. A logistic function (scaled to extend from 0.5 to 1) was fit to the percent-correct data to estimate the tone level (for notched noise) or SRS value (for profile analysis) at which the model achieved 79% correct, matching the criterion used for human testing (Lentz et al., 1999).
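The decision rule and the 79%-correct criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-channel template values are invented, the helper names are hypothetical, and the logistic is assumed to have the form p(x) = 0.5 + 0.5 / (1 + exp(−(x − mid)/slope)), with `mid` and `slope` obtained from a separate fit that is not shown.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def mahalanobis(resp, mean, cov):
    """sqrt((resp - mean)^T cov^-1 (resp - mean)) for one population response."""
    d = [r - mu for r, mu in zip(resp, mean)]
    return math.sqrt(sum(di * yi for di, yi in zip(d, solve(cov, d))))

def pick_target(interval_1, interval_2, mean, cov):
    """Return 1 or 2: the interval farther from the standard template."""
    d1 = mahalanobis(interval_1, mean, cov)
    d2 = mahalanobis(interval_2, mean, cov)
    return 1 if d1 > d2 else 2

def threshold_79(mid, slope):
    """Signal level at which the assumed 0.5-to-1 logistic equals 0.79."""
    # 0.79 = 0.5 + 0.5 * L  ->  L = 0.58  ->  (x - mid) / slope = ln(0.58 / 0.42)
    return mid + slope * math.log(0.58 / 0.42)

template_mean = [50.0, 60.0]                  # hypothetical two-channel template
template_cov = [[4.0, 1.0], [1.0, 9.0]]       # hypothetical channel covariance
choice = pick_target([51.0, 61.0], [56.0, 58.0], template_mean, template_cov)
```

In this toy example the second interval is selected, because its response lies farther from the template once the channel variances and covariance are taken into account.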
FIG. 3.
(Color online) Model method for selecting the target interval on each trial. Left diagram shows responses to both standard and target intervals over time (abscissa) for the IC model population (ordinate) for a single trial. Right column shows average IC population rate responses for both intervals (top and bottom) as compared to the means and standard deviations (shaded areas) associated with the template for the standard stimulus (middle). Mahalanobis distance (DM) was calculated between each interval and the template, and the interval more different from the template was selected as the target interval.
To measure the performance based on a combination of IC neurons with multiple BMFs (the “multi-BMF model”), the template included every CF channel for each BMF, and the Mahalanobis distance took into account the covariance between every pair of CF channels (pairs within and between populations with different BMFs). Although the multi-BMF model involved a population with three times as many model IC neurons as the individual BMF models, each BMF model population received inputs from the same group of 99 AN CFs (on each trial). Therefore, the multi-BMF model used the same number of distinct AN CFs as the AN model and the individual IC BMF models and operated with exactly the same model AN inputs. In all other respects the thresholds for the multi-BMF model were calculated using the same method as for single populations.
In addition to the simulations used to identify model thresholds, model responses were also simulated at human thresholds (from Lentz et al., 1999) in order to illustrate the nature and availability of fluctuation cues for threshold stimuli (see Figs. 4, 5, 6, and 8, below).
FIG. 4.
(Color online) Changes in AN and IC population response across conditions. Black indicates the average model population response to the standard (top: AN; bottom: IC, 100 Hz BMF). Red indicates the average response to the target at the average human thresholds from Lentz et al. (1999). Responses are the model average rates of 100 trials in the fixed-level conditions. Note that the model IC response to the standard had a peak at 1000 Hz for the 100-Hz notch bandwidth for narrowband maskers and dips at the tone frequency for the 800-Hz notch bandwidth.
FIG. 5.
(Color online) Normalized differences in rates for fixed-level notched noise. The difference between mean responses of each model channel to standard and target stimuli across 100 trials, normalized by the channel standard deviation drawn from the template, as a function of CF (top, AN; bottom, IC). This difference value is indicated by both the relative height of the line and the color of the line, as displayed in the color bar (height of bar is to scale). Stimuli were at average human thresholds from Lentz et al. (1999). Red (blue) indicates responses for which the addition of the tone caused an increase (decrease) in normalized activity. Results are shown for notch bandwidths of 100, 200, 300, 400, and 800 Hz, masker bandwidths of 50, 100, 300 Hz and a wideband condition (200–5000 Hz), and three model cell BMFs.
FIG. 6.
(Color online) Normalized differences in rates for roving-level notched noise. The difference between mean responses of each model channel to standard and target stimuli across 100 trials, normalized by the channel standard deviation drawn from the template, as a function of CF (top, AN; bottom, IC). This difference value is indicated by both the relative height of the line and the color of the line, as displayed in the color bar (height of bar is to scale). Stimuli were at average human thresholds from Lentz et al. (1999). Red (blue) indicates responses for which the addition of the tone caused an increase (decrease) in normalized activity. Results are shown for notch bandwidths of 100, 200, 300, 400, and 800 Hz, masker bandwidths of 50, 100, 300 Hz and a wideband condition (200–5000 Hz), and three model cell BMFs.
FIG. 8.
(Color online) Population responses for profile analysis. Average responses to standard (black) and target (red) profile-analysis stimuli are shown for the AN model (left) and the IC model (BMF = 100 Hz; right) at −7 dB Sig re Standard for various numbers of components (shown on right). Horizontal axis indicates characteristic frequencies of model cells. Vertical axis shows average response rate in spikes per second. Error bars show standard deviation across 100 trials. Red dots above (below) black line indicate frequencies of components amplified (attenuated) in the target stimulus.
III. RESULTS AND DISCUSSION
A. Detection of a tone added to notched noise
Figure 4 shows population responses to the target (red) and standard (black) stimuli in the fixed-level condition. The top panel shows rates for the AN population in various notch-bandwidth and noise-bandwidth conditions. The bottom panel shows rates for the 100-Hz BMF IC population in the same conditions. For the model responses shown, target tone levels were set to the human psychophysical thresholds measured by Lentz et al. (1999). Most results in Fig. 4 involve tone levels close to or higher than the multi-BMF model thresholds; that is, combining the BMF populations enabled the models to reach threshold at lower signal levels than those depicted here. Human thresholds are used rather than model thresholds so that, by depicting the responses of different models at a single level for each condition, the relative efficacy of different models (different IC BMFs and IC vs AN) can be assessed. Note that the AN responses (Fig. 4, top panel) are not substantially changed by the addition of the tone, except at the widest notch bandwidth. As expected based on the power-spectrum model, increases in rates due to level become more apparent as the bandwidth of the notch increases (left to right).
The bottom panel of Fig. 4 shows changes in IC responses due to the addition of the tone. The shape of the population responses changes depending on both the masker bandwidth and the notch bandwidth (the right and top axes, respectively). First consider the dip apparent for the standard stimulus with the widest notch bandwidth (far right column). This dip reflects the absence of energy in the standard at those frequencies and is also present in the AN population response. Second, consider the small peak in the response to the standard (i.e., without tone) at narrow notch bandwidths for narrowband maskers. This peak does not occur for the AN population and thus suggests an increase in fluctuation amplitudes rather than an increase in energy at those CFs. This peak also does not occur for the IC population in response to stimuli with wider notches, suggesting that the fluctuations change as the notch bandwidth increases. Together, these observations suggest that beating between the two noise bands, which is closer in frequency to BMF at narrow notch bandwidths and farther from BMF at wider notch bandwidths, causes this peak. Changes in the IC population responses (bottom panel) generally appear more substantial than those for the AN (top panel), although it is difficult to gauge how significant these differences are without taking into account the standard deviation of the rates across trials for each CF. This issue is addressed in the following figures.
In contrast to the average population responses for both standard (black) and target (red) shown in Fig. 4, Fig. 5 shows just one measure: the difference in rates between the average responses to the target and standard stimuli, normalized by the standard deviation of rates for each CF [drawn from the distribution of average rates across all (standard) intervals in the template discussed in the methods above]. This normalized difference provides a measure of the discriminability between the two distributions of average rates for each CF, as shown in the formula above the color bar. The formula includes the average rate (R) in response to the target stimulus (TAR) and standard stimulus (STD), and the standard deviation (σ) drawn from the template (T). The indices i and j indicate that these values were different for each CF and each IC model BMF population. As in Fig. 4, the top and bottom panels show AN and IC results, respectively, with the same layout of conditions as Fig. 4. All responses shown are for stimuli at average human detection thresholds from Lentz et al. (1999). Figure 5 also depicts population responses for 50- and 150-Hz BMFs. When the standard deviations of rates in each channel are taken into account, the information provided by the AN rates (top panel) remains minimal, whereas the information provided by the IC rates remains substantial. It is important to reiterate that the average rates of the AN are distinct from the temporal information carried by the AN. It is the neural fluctuations from the AN responses that are recast here as IC rates.
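The normalized difference described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function name, array shapes, and toy rates are hypothetical, and the sign convention (target minus standard) is assumed from the description of red peaks as increases caused by the tone.

```python
import numpy as np

def normalized_rate_difference(target_rates, standard_rates, template_rates):
    """Per-CF value of (R_TAR - R_STD) / sigma_T.

    target_rates, standard_rates: arrays of shape (n_trials, n_cfs) holding
        the average rate on each trial for each CF channel.
    template_rates: array of shape (n_template_trials, n_cfs) of responses to
        standard intervals, used only to estimate the per-CF standard deviation.
    """
    mean_target = np.mean(target_rates, axis=0)
    mean_standard = np.mean(standard_rates, axis=0)
    sigma_template = np.std(template_rates, axis=0, ddof=1)
    return (mean_target - mean_standard) / sigma_template

# Toy data (hypothetical numbers, not model output).
rng = np.random.default_rng(0)
n_trials, n_cfs = 200, 5
standard = rng.normal(100.0, 5.0, size=(n_trials, n_cfs))
template = rng.normal(100.0, 5.0, size=(n_trials, n_cfs))
target = rng.normal(100.0, 5.0, size=(n_trials, n_cfs))
target[:, 2] += 10.0  # raise the rate only in the middle channel

nd = normalized_rate_difference(target, standard, template)
print(np.round(nd, 2))
```

In this toy case only the middle channel yields a large normalized difference; for the IC model the same calculation would be repeated for each BMF population (index j).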
The way in which the midbrain responses indicate the presence of the tone changes across the various conditions shown in Fig. 5. Peaks (red) in the graphs of Fig. 5 indicate increased fluctuations due to the added signal tone, suggesting beating between the tone and the noise bands. Dips (blue) suggest that the tone is dominating the AN response, reducing fluctuation amplitudes in the temporal envelope of the AN response (Zilany and Bruce, 2007; Carney, 2018). Near the detection thresholds of listeners, as shown here, most of the informative aspects of the model IC population responses are increases in the responses of CFs near the tone frequency. Beating between the tone and the noise bands, at frequencies that result in increased IC discharge rates, appears to be the primary neural fluctuation mechanism encoding the presence of the tone. However, slight decreases in IC rate relative to the standard interval (blue) provide additional information for the conditions with the narrowest and widest notch bandwidths. The 50-Hz BMF model cell population responses also show slight decreases in IC rate for CFs near the tone frequency in the 300- and 400-Hz notch bandwidth conditions.
Most of the useful information is present in responses of model cells with CFs near the 1-kHz tone frequency, but the tone also affects interactions for more distant CFs, particularly for narrow notch bandwidths in the 100-Hz noise band condition. This spread of information across a wide range of CFs did not occur for the 50-Hz noise bands or for noise bands wider than 100 Hz. In the 50-Hz band condition, it is possible that not enough energy from the noise bands passed through distant CFs to allow for significant interactions with the tone. In the wider band conditions, the response rates of CFs distant from the tone frequency were elevated relative to the narrower band conditions and similar between the standard and target conditions (Fig. 4, bottom panel). This pattern of responses suggests that the lack of informative contributions of CFs away from the tone frequency was likely the result of beating among the components of the maskers that obscured the beating between the tone and noise.
Figure 6 shows that differences in midbrain fluctuation between the target and standard notched-noise stimuli are robust to the roving-level paradigm. Note that in every condition the polarity of the most substantial differences (i.e., whether they are increases or decreases) remains the same as in the fixed-level responses (Fig. 5). The magnitude of the differences was generally reduced in responses to the roving-level conditions, as expected due to the higher standard deviation of the responses within each channel caused by the roving-level paradigm. Note that AN responses were omitted from Fig. 6 because AN rate changes were minimal in the roving-level condition.
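The effect of a level rove on an energy-like cue can be illustrated with a toy calculation. This is a sketch under stated assumptions rather than a description of the model: the rove is assumed to be uniformly distributed over a 30-dB range, and the fixed-level variability (1 dB) and the tone-induced level change (3 dB) are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_intervals = 10000
rove_range_db = 30.0    # total rove range (30-dB roving-level condition)
internal_sd_db = 1.0    # hypothetical variability without a rove
increment_db = 3.0      # hypothetical level change produced by the tone

# Level-like cue in standard intervals, without and with the rove.
fixed_levels = rng.normal(0.0, internal_sd_db, n_intervals)
rove = rng.uniform(-rove_range_db / 2, rove_range_db / 2, n_intervals)
roved_levels = fixed_levels + rove

sd_fixed = np.std(fixed_levels, ddof=1)
sd_roved = np.std(roved_levels, ddof=1)

# Normalized difference for a purely level-based cue.
nd_fixed = increment_db / sd_fixed
nd_roved = increment_db / sd_roved
print(f"SD fixed = {sd_fixed:.2f} dB, SD roved = {sd_roved:.2f} dB")
print(f"normalized difference: fixed = {nd_fixed:.2f}, roved = {nd_roved:.2f}")
```

Under these assumptions, the rove alone contributes a standard deviation of 30/√12 ≈ 8.7 dB, so a cue that scales directly with level is strongly degraded, whereas a cue that is less dependent on overall level would be expected to lose less.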
Figure 7 compares model thresholds for various notched-noise conditions with human thresholds for the same conditions from Lentz et al. (1999). In contrast to Figs. 5 and 6, which showed differences in responses for each individual channel, the model thresholds shown here were based on a decision variable that took into account the entire population response, including covariance between channels (see Sec. II C). In addition to the performance of the three separate IC model cell populations with BMFs of 50, 100, and 150 Hz (dotted and dashed gray lines), Fig. 7 shows performance based on the combination of all three populations of model cells (the multi-BMF model, solid blue lines). Note the substantial upward shift of the thresholds based on AN average rates in the roving-level condition, whereas the IC model thresholds are only slightly elevated by the roving level, similar to human thresholds.
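A population decision variable of this kind can be sketched using the Mahalanobis distance defined in the footnote. The example below is only an illustration with hypothetical toy data: the number of channels is reduced from 99 to 8, the rates and the shared noise source are invented, and the procedure for converting distances into thresholds (Sec. II C) is not reproduced.

```python
import numpy as np

def mahalanobis_distance(response, template_mean, template_cov):
    """Distance of one population response from the template,
    sqrt((x - mu)^t Sigma^-1 (x - mu))."""
    diff = response - template_mean
    # Solve Sigma z = diff instead of forming the inverse explicitly.
    z = np.linalg.solve(template_cov, diff)
    return float(np.sqrt(diff @ z))

rng = np.random.default_rng(2)
n_template, n_cfs = 500, 8  # toy size; the study used 99 CF channels

# Hypothetical template trials with noise that is partly shared across
# channels, so that the covariance matrix has off-diagonal structure.
shared = rng.normal(0.0, 3.0, size=(n_template, 1))
template_trials = 100.0 + shared + rng.normal(0.0, 2.0, size=(n_template, n_cfs))
mu = template_trials.mean(axis=0)
cov = np.cov(template_trials, rowvar=False)

# One standard-like trial, and the same trial with one channel changed.
standard_trial = 100.0 + rng.normal(0.0, 3.0) + rng.normal(0.0, 2.0, n_cfs)
target_trial = standard_trial.copy()
target_trial[3] += 20.0  # hypothetical tone-induced rate change

d_standard = mahalanobis_distance(standard_trial, mu, cov)
d_target = mahalanobis_distance(target_trial, mu, cov)
print(f"standard: {d_standard:.2f}, target: {d_target:.2f}")
```

Because the covariance matrix enters the distance, variability that is shared across channels (as a level rove would produce) is discounted relative to a change confined to a few channels.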
FIG. 7.
(Color online) Model thresholds for notched noise. Thresholds for detecting a 1-kHz tone are shown as a function of notched-noise masker notch bandwidth, for three masker bandwidths and for the wideband condition. Results are for fixed-level (left column) and 30-dB roving-level (center column) conditions. The right column shows the amount by which thresholds increased in each roving-level condition (the difference between the center and left columns). Line colors and styles indicate average human thresholds from Lentz et al. (1999; black crosses), thresholds for the AN model (red squares), and thresholds for different IC models. Human thresholds for the 50-Hz masker bandwidth were from only Observer 1 in Lentz et al. (1999) (see their Fig. 3). The IC multi-BMF model (blue circles) includes 50-, 100-, and 150-Hz BMFs. Triangles indicate conditions and parameters for which a model did not reach threshold at a tone level of 90 dB SPL. Mean spectrum level was 35 dB re 20 μPa. Dotted lines in bottom center and right panels show results for the AN (red squares) and IC (blue circles) models with a restricted range of CFs, as described in the text.
The AN average rates did not explain human thresholds in any of the fixed- or roving-level conditions, suggesting that, according to this model, physiologically based energy cues cannot account for human performance. This result supports modeling work from Lentz et al. (1999) that used a completely different approach: fitting auditory-filter bandwidths to the fixed-level human data, roving the levels of stimuli at the input to single auditory filters with those bandwidths, and predicting thresholds based on the output of the single auditory filter using the power-spectrum model. Averaged across all conditions, the thresholds of their model were almost 10 dB above the human psychophysical thresholds with roving.
Across the 50-, 100-, and 300-Hz-bandwidth conditions, the average increase in human thresholds from the fixed-level to the roving-level paradigm was ∼3 dB. The average increase in threshold for the multi-BMF model due to the roving-level paradigm was similar for these conditions, at ∼4 dB. No fitting procedure was used to align the IC model thresholds with the human data. Across these conditions, the model thresholds were, on average, ∼3 dB below the listener thresholds, were close to them in several conditions, and had the same general trends across stimulus conditions. At the widest notch bandwidths, the multi-BMF model thresholds were higher and the threshold elevation due to the rove increased. These trends are consistent with neural fluctuation cues involving interactions between the tone and the noise bands; as the noise bands become more distant from the tone, the interactions decrease and thus the robustness of the model IC response in the roving-level paradigm decreases slightly.
The roving-level paradigm affected the multi-BMF IC model most strongly in the wideband condition. The roving-level multi-BMF IC model thresholds were ∼20 dB higher on average than human thresholds in this condition, although the same model predicted wideband fixed-level human thresholds to within 1 dB on average. It is worth noting that although spot checks suggested that moderate changes in the range of CFs did not substantially affect AN or IC model thresholds for the band-limited masker conditions, the range of CFs did strongly affect the wideband roving-level model thresholds. Dotted red and blue lines in the panel of Fig. 7 representing this condition show the performance of a model with a restricted subset of the 99 CFs described in Sec. II B (35 CFs from 502 to 1533 Hz). The removal of uninformative CFs at frequencies far from the tone frequency improved model performance. The multi-BMF IC model thresholds using this restricted subset of CFs were, on average, ∼4 dB higher than human thresholds.
The increase in multi-BMF IC thresholds in the wideband condition is inconsistent with human psychophysical data, for which differences between fixed- and roving-level thresholds tend to be smaller for wide than for narrow noise bandwidths (Lentz et al., 1999). Lentz et al. suggested that this trend might be expected if profile cues were used to perform the notched-noise task. The current models did not take advantage of response profiles, i.e., differences across channels, and thus our results are consistent with their explanation. Although the IC model results offer support for a model based on cross-interval, within-channel differences in fluctuations, this particular aspect of the results suggests that listeners may also use cross-channel comparisons.
B. Profile analysis
Figure 8 shows model population responses for the AN (left column) and IC (right column) to the profile-analysis stimuli. Average responses for several conditions are shown for stimuli with the same Sig re Standard (SRS) [−7 dB, which was the average human threshold from Lentz et al. (1999) for the 50-component condition]. Results are shown at a single SRS value, rather than at model thresholds, to depict how the model response changes (and the task becomes easier or more difficult) with changes in the number of components. All results in Fig. 8 involve SRS values higher than the multi-BMF model thresholds. An alternating up-down rate pattern is present in AN rates (Fig. 8, left column); however, the differences between the target and standard are small in relation to the standard deviation of rates across trials for this roving-level paradigm, and are thus less informative. Note that the up-down pattern in the AN response, which is driven by stimulus energy, is inverted with respect to that in the IC response pattern, which is driven by low-frequency fluctuations (Fig. 8, right column).
The IC model population response to the standard (Fig. 8, black lines, right panels) alternates up-down across the population, with dips in the population response for cells with CFs near each component frequency. Channels tuned near each component in the profile stimulus have relatively weak fluctuations in the AN model response, which provide relatively ineffective inputs to the band-enhanced IC model neurons, resulting in the dips in the model IC population response. In contrast, peripheral channels tuned between components have fluctuating responses due to beating of nearby components. These channels provide relatively effective inputs to the band-enhanced model IC neurons, which are excited by the fluctuating inputs. Thus, peaks occur in the IC model population response at CFs between tone components.
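The contrast between channels tuned at a component and channels tuned between components can be illustrated at the level of peripheral filtering alone. The sketch below is a simplified illustration, not the AN model: it assumes equal-amplitude components, a hypothetical Gaussian filter weighting with a hypothetical 40-Hz width parameter, and it considers only the two strongest components in each channel, ignoring IHC saturation and synaptic adaptation.

```python
import numpy as np

def envelope_depth(cf_hz, component_freqs_hz, filter_width_hz):
    """Envelope modulation depth produced by the two strongest components
    that pass a hypothetical Gaussian-shaped filter centered on cf_hz."""
    gains = np.exp(-0.5 * ((component_freqs_hz - cf_hz) / filter_width_hz) ** 2)
    a_small, a_large = np.sort(gains)[-2:]
    # Two tones with amplitudes a_large >= a_small beat with an envelope that
    # swings between (a_large - a_small) and (a_large + a_small), so
    # (max - min) / (max + min) reduces to a_small / a_large.
    return a_small / a_large

components = np.array([500.0, 600.0, 700.0])  # hypothetical component frequencies
filter_width = 40.0                            # hypothetical width parameter (Hz)

depth_on = envelope_depth(600.0, components, filter_width)       # CF at a component
depth_between = envelope_depth(650.0, components, filter_width)  # CF between components
print(f"CF at component: {depth_on:.3f}, CF between components: {depth_between:.3f}")
```

In this simplified picture, the channel centered on a component is dominated by that component and has a nearly flat envelope, whereas the channel between components passes two components of equal strength and beats fully at their difference frequency (100 Hz in this example).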
In response to the target stimulus (Fig. 8, red lines, right panels), the spacing of dips and peaks is qualitatively different from that for the standard stimulus, with half as many dips and peaks across the IC population. This pattern occurs because the alternate incremented components are more dominant in reducing fluctuations in AN responses than those in the standard stimulus, and the attenuated components are relatively less dominant. This pattern of neural fluctuation information across the population of model IC neurons can be further understood by observing differences in fluctuation amplitudes at each stage of the Zilany et al. (2014) AN model in response to a profile-analysis target stimulus (Fig. 9). For the relatively sparse spectrum of this stimulus condition, differences in fluctuation amplitudes are established in the peripheral filter responses (Fig. 9, top), and these differences are then enhanced by IHC saturation (middle) and by the power-law model synapse (bottom).
FIG. 9.
Example AN model responses for profile analysis. Time-varying AN model responses for two CFs (columns, 554 and 652 Hz) near amplified and attenuated components of a profile-analysis target stimulus with 20 components, −7 dB Sig re Standard, referenced to a standard stimulus with 50 dB SPL per component. Responses are shown for a 40 ms window during the stimulus. Rows are responses of three stages of the AN model: basilar-membrane filter (top), IHC (middle), and synapse (bottom). The contrast in fluctuation amplitudes between these two CF channels emerges as the signal progresses through the AN model.
As the number of components increases, the broad peak in the IC population response shifts to higher CFs, where the spacing of the log-spaced components matches the BMFs of the model IC neurons. That is, as more components are added to the stimulus, the frequency range containing components separated by approximately 100 Hz shifts to higher frequencies (from top to bottom of Fig. 8, right panel). Overall, the population response to the target involves lower rates than the response to the standard, suggesting that the decrease in IC response at the spectral peaks is the most important change in the IC model responses between standard and target stimuli. This pivotal effect of the peaks in the spectrum on the neural fluctuations, and thus the model IC responses, aligns with the results from Lentz (2006) that suggest human performance relies on spectral peaks.
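The shift of the roughly 100-Hz-spaced region toward higher frequencies follows directly from logarithmic spacing and can be checked with a short calculation. The sketch below assumes components spaced logarithmically from 200 to 5000 Hz; this range is an assumption (it is not stated in this section), although for 20 components it places components near 553 and 655 Hz, close to the CFs shown in Fig. 9. The function name and the 100-Hz target spacing (matching the 100-Hz BMF) are illustrative choices.

```python
import numpy as np

def frequency_where_spacing_matches(n_components, target_spacing_hz=100.0,
                                    f_low_hz=200.0, f_high_hz=5000.0):
    """Frequency at which adjacent log-spaced components are separated by
    target_spacing_hz (assumed frequency range, see lead-in)."""
    ratio = (f_high_hz / f_low_hz) ** (1.0 / (n_components - 1))
    # Adjacent components at f and f * ratio are separated by f * (ratio - 1).
    return target_spacing_hz / (ratio - 1.0)

matching_freqs = {n: frequency_where_spacing_matches(n) for n in (10, 20, 30, 40, 50)}
for n, f in matching_freqs.items():
    print(f"{n:2d} components: spacing is 100 Hz near {f:.0f} Hz")

# Component frequencies for the 20-component case under the same assumption.
components_20 = np.geomspace(200.0, 5000.0, 20)
print(np.round(components_20[5:9], 1))
```

Under the same assumption, the matching frequency for 4 or 6 components falls below the lowest component, meaning that no adjacent pair of components is as close as 100 Hz.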
Figure 10 compares human (black, with crosses) and model thresholds (blue circles) for the profile-analysis task. Each of the IC models and the multi-BMF model (blue line) reflects the U-shape of the human thresholds across varying numbers of components. Thresholds are higher for stimuli with fewer than 20 components because the wide spacing of the components for those stimuli results in fluctuations that are outside the range of BMFs used for the model IC neurons. The increase in threshold for conditions with more than 20 components is explained by the decreasing size of dips in the IC population response (Fig. 8). The energy-based AN model (red) performs poorly in this task, as expected for this roving-level paradigm. As an aside, note that the decision variable used for the AN and IC responses did not include cross-channel differences and therefore is not equivalent to a profile-analysis model. The decision variable used here simply combines within-channel differences in rates for all channels. The thresholds of the multi-BMF IC model are lower than the human thresholds; as in the notched-noise results, this model was not fit to the human thresholds. Additional internal noise could elevate these thresholds. Although the model thresholds do not indicate the small increase in human thresholds for stimuli with 16 components, this increase is not consistently observed across listeners (Richards and Lentz, 1998; Lentz, 2006).
FIG. 10.
(Color online) Model thresholds for profile analysis. Thresholds are shown for 4, 6, 10, 16, 20, 30, 40, and 50 components. Vertical axis shows threshold as Sig re Standard (dB) (see Lentz et al., 1999). Line colors and styles indicate average human thresholds from Lentz et al. (1999; black crosses), thresholds for the auditory nerve model (red squares), and thresholds for different IC models. The IC multi-BMF model (blue circles) includes 50-, 100-, and 150-Hz BMFs. Triangles indicate conditions and parameters for which models did not reach threshold at Sig re Standard = 0 dB.
The fluctuation-based explanation for the U-shaped threshold curve contrasts with previous explanations (Lentz et al., 1999). Previously it was suggested that thresholds are worse for small numbers of components because less information is available, and thresholds are worse for high numbers of components as the components become unresolved (Bernstein and Green, 1987). The current account differs fundamentally for low numbers of components: it is not just the small number of components but also the distance between them, and the resulting shift in beat frequencies away from common BMFs, that limits performance. For high numbers of components the issue is still related to resolvability, but it is the resolvability of the effects of spectral peaks on neural fluctuations that becomes blurred when the components are spaced too closely in frequency.
Although profile-cue-based models (variants of Durlach et al., 1986) were successful in explaining both the notched-noise and profile-analysis data in Lentz et al. (1999), substantially different auditory-filter bandwidths were necessary to explain results for the same listeners. This problem was rectified in Lentz (2006) by assuming that listeners primarily use spectral peaks, and Lentz (2006) provided human data supporting this hypothesis. It is notable that a single IC model, using neural fluctuations primarily shaped by these peaks, aligns with both notched-noise and profile-analysis results. Moreover, the IC model used here did not take advantage of cross-channel differences, but rather used only within-channel differences.
IV. CONCLUSIONS
This modeling work demonstrated that neural fluctuations that drive midbrain neurons provide a plausible underlying mechanism to explain the results of Lentz et al. (1999) for two different psychophysical tasks. A neural-fluctuation model based on available physiological data can explain the robustness of human thresholds to a roving-level paradigm in the notched-noise task. Additionally, the same fluctuation-based model is consistent with the essential role of spectral peaks in a profile-analysis task, as suggested by Lentz (2006).
The implications of the current study for the measurement of auditory-filter bandwidths are similar to those outlined by Lentz et al. (1999). Human thresholds in the roving-level paradigm led those authors to conclude that the power-spectrum model did not fully describe the detection process for a tone added to notched-noise maskers. The current results show that a model based on AN average rates was not able to explain human thresholds, not only in the roving-level notched-noise task but also in profile analysis and even for the detection of a tone added to notched noise in the absence of roving levels. Even though information was combined across channels, energy cues coded as average rates at the level of the AN were not sufficient. This result presents an obstacle to interpretations of notched-noise thresholds that assume physiological representations of level cues, but is not problematic for more conceptual views of the power-spectrum model. However, fluctuation cues can explain human performance in both fixed-level and roving-level notched-noise tasks, while the conceptual power-spectrum model (using solely the power within a single frequency channel) cannot.
Replication of these results with other midbrain models and continual updates to models of the auditory periphery are necessary to further assess the robustness of neural fluctuation cues. Models other than Zilany et al. (2014) may yield similar conclusions if they feature compression, saturation of the inner hair cell, and physiologically realistic synaptic adaptation between the IHC and AN [e.g., Sumner et al. (2002), Verhulst et al. (2018), and Bruce et al. (2018)]. Further modeling work should also investigate the nature of these cues as represented by the band-suppressed cells of the IC. These cells are hypothesized to receive inhibitory inputs from the band-enhanced cells and excitatory inputs from the lower brainstem, so they have the capacity to more clearly distinguish between lack of energy at a given frequency and reduced fluctuations (Carney et al., 2015; Carney, 2018). Future physiological work should more closely examine the prevalence of fluctuation cues in IC responses, and behavioral studies may provide an additional assessment of whether or not performance on the roving-level and fixed-level tasks involves the same mechanisms. Another question is whether animals that have behavioral performance in the notched-noise task that is consistent with rate-based measures of peripheral frequency selectivity [e.g., Evans et al. (1992)] use the same cues as humans. Some evidence suggests that rabbits, for example, do not use available temporal information as efficiently as humans (Carney et al., 2014).
Although the effectiveness of fluctuation cues is depicted in the current study through the lens of midbrain rates, it is important to reiterate that the progenitor of these cues is low-frequency timing information in the neural fluctuations of AN responses, as shaped by peripheral nonlinearities. The factors that produce modulation tuning in the midbrain convert the peripheral fluctuation-amplitude information into midbrain rate information. These modeling results should be considered indicative of the usefulness of low-frequency AN fluctuation information to the same extent as they are indicative of midbrain contributions to the performance of these tasks.
ACKNOWLEDGMENTS
The authors gratefully acknowledge helpful comments from Dr. Skyler Jennings and two anonymous reviewers. This work was supported by NIH-NIDCD Grant No. DC010813.
Portions of this work were presented at the “41st Annual Midwinter Meeting of the Association for Research in Otolaryngology,” San Diego, CA, USA, 2018.
Footnotes
The Mahalanobis distance was calculated as $D = \sqrt{(x-\mu)^{t}\,\Sigma^{-1}\,(x-\mu)}$, where $x$ is the model population response (a vector with AN or IC average discharge rates for 99 CF channels) on a given trial, $\mu$ is the template (also consisting of values for 99 CFs), $t$ indicates transposition, and $\Sigma^{-1}$ is the inverse of the covariance matrix.
Contributor Information
Braden N. Maxwell
Laurel H. Carney
References
- 1. Alves-Pinto, A., Sollini, J., Wells, T., and Sumner, C. J. (2016). “Behavioural estimates of auditory filter widths in ferrets using notched-noise maskers,” J. Acoust. Soc. Am. 139, EL19–EL24. doi:10.1121/1.4941772
- 2. Baker, R. J., and Rosen, S. (2006). “Auditory filter nonlinearity across frequency using simultaneous notched-noise masking,” J. Acoust. Soc. Am. 119, 454–462. doi:10.1121/1.2139100
- 3. Bernstein, L. R., and Green, D. M. (1987). “The profile-analysis bandwidth,” J. Acoust. Soc. Am. 81, 1888–1895. doi:10.1121/1.394753
- 4. Bruce, I. C., Erfani, Y., and Zilany, M. S. (2018). “A phenomenological model of the synapse between the inner hair cell and auditory nerve: Implications of limited neurotransmitter release sites,” Hear. Res. 360, 40–54. doi:10.1016/j.heares.2017.12.016
- 5. Burton, J. A., Dylla, M. E., and Ramachandran, R. (2018). “Frequency selectivity in macaque monkeys measured using a notched-noise method,” Hear. Res. 357, 73–80. doi:10.1016/j.heares.2017.11.012
- 6. Carney, L. H. (2018). “Supra-threshold hearing and fluctuation profiles: Implications for sensorineural and hidden hearing loss,” J. Assoc. Res. Otolaryngol. 19, 331–352. doi:10.1007/s10162-018-0669-5
- 7. Carney, L. H., Kim, D. O., and Kuwada, S. (2016). “Speech coding in the midbrain: Effects of sensorineural hearing loss,” in Physiology, Psychoacoustics, and Cognition in Normal and Impaired Hearing, edited by van Dijk P., Baskent D., Gaudrain E., de Kleine E., Wagner A., and Lanting C. (Springer, Cham), pp. 427–435.
- 8. Carney, L. H., Li, T., and McDonough, J. M. (2015). “Speech coding in the brain: Representation of vowel formants by midbrain neurons tuned to sound fluctuations,” Eneuro. 2(4), 1–12.
- 9. Carney, L. H., and McDonough, J. M. (2019). “Nonlinear auditory models yield new insights into representations of vowels,” Atten. Percept. Psychophys. 81(4), 1034–1046. doi:10.3758/s13414-018-01644-w
- 10. Carney, L. H., Zilany, M. S., Huang, N. J., Abrams, K. S., and Idrobo, F. (2014). “Suboptimal use of neural information in a mammalian auditory system,” J. Neurosci. 34, 1306–1313. doi:10.1523/JNEUROSCI.3031-13.2014
- 11. Dai, H., and Green, D. M. (1993). “Discrimination of spectral shape as a function of stimulus duration,” J. Acoust. Soc. Am. 93, 957–965. doi:10.1121/1.405456
- 12. Dau, T., Kollmeier, B., and Kohlrausch, A. (1997). “Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers,” J. Acoust. Soc. Am. 102, 2892–2905. doi:10.1121/1.420344
- 13. Dau, T., Püschel, D., and Kohlrausch, A. (1996). “A quantitative model of the ‘effective’ signal processing in the auditory system. I. Model structure,” J. Acoust. Soc. Am. 99, 3615–3622. doi:10.1121/1.414959
- 14. Deng, L., and Geisler, C. D. (1987). “Responses of auditory-nerve fibers to nasal consonant-vowel syllables,” J. Acoust. Soc. Am. 82, 1977–1988. doi:10.1121/1.395642
- 15. Derleth, R. P., and Dau, T. (2000). “On the role of envelope fluctuation processing in spectral masking,” J. Acoust. Soc. Am. 108, 285–296. doi:10.1121/1.429464
- 16. Duda, R. O., Hart, P. E., and Stork, D. G. (2001). Pattern Classification, 2nd ed. (Wiley, New York), p. 35.
- 17. Durlach, N. I., Braida, L. D., and Ito, Y. (1986). “Towards a model for discrimination of broadband signals,” J. Acoust. Soc. Am. 80, 63–72. doi:10.1121/1.394084
- 18. Evans, E. F., Pratt, S. R., Spenner, H., and Cooper, N. P. (1992). “Comparisons of physiological and behavioural properties: Auditory frequency selectivity,” in Auditory Physiology and Perception, edited by Cazals Y., Horner K., and Demany L. (Pergamon, Oxford), pp. 159–169.
- 19. Fletcher, H. (1938). “The mechanism of hearing as revealed through experiment on the masking effect of thermal noise,” Proc. Natl. Acad. Sci. U.S.A. 24, 265–274. doi:10.1073/pnas.24.7.265
- 20. Fletcher, H. (1940). “Auditory patterns,” Rev. Mod. Phys. 12, 47–65. doi:10.1103/RevModPhys.12.47
- 21. Glasberg, B. R., and Moore, B. C. (1990). “Derivation of auditory filter shapes from notched-noise data,” Hear. Res. 47, 103–138. doi:10.1016/0378-5955(90)90170-T
- 22. Green, D. M. (1983). “Profile analysis: A different view of auditory intensity discrimination,” Am. Psychol. 38, 133–142. doi:10.1037/0003-066X.38.2.133
- 23. Green, D. M. (1988). Profile Analysis: Auditory Intensity Discrimination (Oxford University Press, New York).
- 24. Green, D. M., and Mason, C. R. (1985). “Auditory profile analysis: Frequency, phase, and Weber's law,” J. Acoust. Soc. Am. 77, 1155–1161. doi:10.1121/1.392179
- 25. Green, D. M., Mason, C. R., and Kidd, G., Jr. (1984). “Profile analysis: Critical bands and duration,” J. Acoust. Soc. Am. 75, 1163–1167. doi:10.1121/1.390765
- 26. Green, D. M., and Swets, J. A. (1966). Signal Detection Theory and Psychophysics (Wiley, New York).
- 27. Hartmann, W. M. (1998). Signals, Sound, and Sensation (Springer Science and Business Media, New York), p. 605.
- 28. Heinz, M. G., Colburn, H. S., and Carney, L. H. (2002). “Quantifying the implications of nonlinear cochlear tuning for auditory-filter estimates,” J. Acoust. Soc. Am. 111, 996–1011. doi:10.1121/1.1436071
- 29. Ibrahim, R. A., and Bruce, I. C. (2010). “Effects of peripheral tuning on the auditory nerve's representation of speech envelope and temporal fine structure cues,” in The Neurophysiological Bases of Auditory Perception, edited by Lopez-Poveda E., Palmer A., and Meddis R. (Springer, New York), pp. 429–438.
- 30. Jepsen, M. L., Ewert, S. D., and Dau, T. (2008). “A computational model of human auditory signal processing and perception,” J. Acoust. Soc. Am. 124, 422–438. doi:10.1121/1.2924135
- 31. Joris, P. X., Schreiner, C. E., and Rees, A. (2004). “Neural processing of amplitude-modulated sounds,” Physiol. Rev. 84, 541–577. doi:10.1152/physrev.00029.2003
- 32. Joris, P. X., and Yin, T. C. (1992). “Responses to amplitude-modulated tones in the auditory nerve of the cat,” J. Acoust. Soc. Am. 91, 215–232. doi:10.1121/1.402757
- 33. Kidd, G., Jr., Mason, C. R., Brantley, M. A., and Owen, G. A. (1989). “Roving-level tone-in-noise detection,” J. Acoust. Soc. Am. 86, 1310–1317. doi:10.1121/1.398745
- 34. Kim, D. O., Zahorik, P., Carney, L. H., Bishop, B. B., and Kuwada, S. (2015). “Auditory distance coding in rabbit midbrain neurons and human perception: Monaural amplitude modulation depth as a cue,” J. Neurosci. 35, 5360–5372. doi:10.1523/JNEUROSCI.3798-14.2015
- 35. Kohlrausch, A., Fassel, R., van der Heijden, M., Kortekaas, R., van de Par, S., Oxenham, A. J., and Püschel, D. (1997). “Detection of tones in low-noise noise: Further evidence for the role of envelope fluctuations,” Acta Acust. Acust. 83, 659–669.
- 36. Krishna, B. S., and Semple, M. N. (2000). “Auditory temporal processing: Responses to sinusoidally amplitude-modulated tones in the inferior colliculus,” J. Neurophys. 84, 255–273. doi:10.1152/jn.2000.84.1.255
- 37. Langner, G., and Schreiner, C. E. (1988). “Periodicity coding in the inferior colliculus of the cat. I. Neuronal mechanisms,” J. Neurophys. 60, 1799–1822. doi:10.1152/jn.1988.60.6.1799
- 38. Lentz, J. J. (2006). “Spectral-peak selection in spectral-shape discrimination by normal-hearing and hearing-impaired listeners,” J. Acoust. Soc. Am. 120, 945–956. doi:10.1121/1.2216564
- 39. Lentz, J. J., Richards, V. M., and Matiasek, M. R. (1999). “Different auditory filter bandwidth estimates based on profile analysis, notched noise, and hybrid tasks,” J. Acoust. Soc. Am. 106, 2779–2792. doi:10.1121/1.428137
- 40. Liberman, M. C. (1978). “Auditory-nerve response from cats raised in a low-noise chamber,” J. Acoust. Soc. Am. 63, 442–455. doi:10.1121/1.381736
- 41. Mahalanobis, P. C. (1936). “On the generalized distance in statistics,” Proc. Natl. Inst. Sci. India 2, 49–55.
- 42. Mao, J., Vosoughi, A., and Carney, L. H. (2013). “Predictions of diotic tone-in-noise detection based on a nonlinear optimal combination of energy, envelope, and fine-structure cues,” J. Acoust. Soc. Am. 134, 396–406. doi:10.1121/1.4807815
- 43. Miller, R. L., Schilling, J. R., Franck, K. R., and Young, E. D. (1997). “Effects of acoustic trauma on the representation of the vowel /ε/ in cat auditory nerve fibers,” J. Acoust. Soc. Am. 101, 3602–3616. doi:10.1121/1.418321
- 44. Moore, B. C. J. (1986). “Parallels between frequency selectivity measured psychophysically and in cochlear mechanics,” Scand. Audiol. Suppl. 25, 139–152.
- 45. Moore, B. C. J. (2007). “Basic auditory processes involved in the analysis of speech sounds,” Philos. Trans. R. Soc. London, Ser. B 363, 947–963. doi:10.1098/rstb.2007.2152
- 46. Nelson, P. C., and Carney, L. H. (2004). “A phenomenological model of peripheral and central neural responses to amplitude-modulated tones,” J. Acoust. Soc. Am. 116, 2173–2186. doi:10.1121/1.1784442
- 47. Nelson, P. C., and Carney, L. H. (2007). “Neural rate and timing cues for detection and discrimination of amplitude-modulated tones in the awake rabbit inferior colliculus,” J. Neurophys. 97, 522–539. doi:10.1152/jn.00776.2006
- 48. Osses Vecchi, A. (2018). “Prediction of perceptual similarity based on time domain models of auditory perception,” Ph.D. thesis, Technische Universiteit Eindhoven, Eindhoven, the Netherlands.
- 49. Patterson, R. D. (1976). “Auditory filter shapes derived with noise stimuli,” J. Acoust. Soc. Am. 59, 640–654. doi:10.1121/1.380914
- 50. Patterson, R. D., and Nimmo-Smith, I. (1980). “Off-frequency listening and auditory-filter asymmetry,” J. Acoust. Soc. Am. 67, 229–245. doi:10.1121/1.383732
- 51. Patterson, R. D., Nimmo-Smith, I., Weber, D. L., and Milroy, R. (1982). “The deterioration of hearing with age: Frequency selectivity, the critical ratio, the audiogram, and speech threshold,” J. Acoust. Soc. Am. 72, 1788–1803. doi:10.1121/1.388652
- 52. Patuzzi, R., and Robertson, D. (1988). “Tuning in the mammalian cochlea,” Physiol. Rev. 68, 1009–1082. doi:10.1152/physrev.1988.68.4.1009
- 53. Richards, V. M. (1992). “The detectability of a tone added to narrow bands of equal-energy noise,” J. Acoust. Soc. Am. 91, 3424–3435. doi:10.1121/1.402831
- 54. Richards, V. M., and Lentz, J. J. (1998). “Sensitivity to changes in level and envelope patterns across frequency,” J. Acoust. Soc. Am. 104, 3019–3029. doi:10.1121/1.423883
- 55. Rosen, S., and Stock, D. (1992). “Auditory filter bandwidths as a function of level at low frequencies (125 Hz–1 kHz),” J. Acoust. Soc. Am. 92, 773–781. doi:10.1121/1.403946
- 56. Shera, C. A., Guinan, J. J., and Oxenham, A. J. (2002). “Revised estimates of human cochlear tuning from otoacoustic and behavioral measurements,” Proc. Natl. Acad. Sci. U.S.A. 99, 3318–3323. doi:10.1073/pnas.032675099
- 57. Sumner, C. J., Lopez-Poveda, E. A., O'Mard, L. P., and Meddis, R. (2002). “A revised model of the inner-hair cell and auditory-nerve complex,” J. Acoust. Soc. Am. 111, 2178–2188. doi:10.1121/1.1453451
- 58. Sumner, C. J., Wells, T. T., Bergevin, C., Sollini, J., Kreft, H. A., Palmer, A. R., Oxenham, A. J., and Shera, C. A. (2018). “Mammalian behavior and physiology converge to confirm sharper cochlear tuning in humans,” Proc. Natl. Acad. Sci. U.S.A. 115, 11322–11326. doi:10.1073/pnas.1810766115
- 59. Verhulst, S., Altoe, A., and Vasilkov, V. (2018). “Computational modeling of the human auditory periphery: Auditory-nerve responses, evoked potentials and hearing loss,” Hear. Res. 360, 55–75. doi:10.1016/j.heares.2017.12.018
- 60. Winter, I. M., and Palmer, A. R. (1991). “Intensity coding in low-frequency auditory-nerve fibers of the guinea pig,” J. Acoust. Soc. Am. 90, 1958–1967. doi:10.1121/1.401675
- 61. Zilany, M. S. A., Bruce, I. C., and Carney, L. H. (2014). “Updated parameters and expanded simulation options for a model of the auditory periphery,” J. Acoust. Soc. Am. 135, 283–286. doi:10.1121/1.4837815
- 62. Zilany, M. S. A., Bruce, I. C., Nelson, P. C., and Carney, L. H. (2009). “A phenomenological model of the synapse between the inner hair cell and auditory nerve: Long-term adaptation with power-law dynamics,” J. Acoust. Soc. Am. 126, 2390–2412. doi:10.1121/1.3238250
- 63. Zilany, M. S., and Bruce, I. C. (2007). “Representation of the vowel /ε/ in normal and impaired auditory nerve fibers: Model predictions of responses in cats,” J. Acoust. Soc. Am. 122, 402–417. doi:10.1121/1.2735117