Abstract
The nonlinearities of the inner ear are often considered to be obstacles that the central nervous system has to overcome to decode neural responses to sounds. This review describes how peripheral nonlinearities, such as saturation of the inner-hair-cell (IHC) response and of the IHC-auditory-nerve synapse, are instead beneficial to the neural encoding of complex sounds such as speech. These nonlinearities set up contrast in the depth of neural fluctuations in auditory-nerve responses along the tonotopic axis, referred to here as neural fluctuation contrast (NFC). Physiological support for the NFC coding hypothesis is reviewed, and predictions of several psychophysical phenomena, including masked detection and speech intelligibility, are presented. Lastly, a framework based on the NFC code for understanding how the medial olivocochlear (MOC) efferent system contributes to the coding of complex sounds is presented. By modulating cochlear gain control in response to both sound energy and fluctuations in neural responses, the MOC system is hypothesized to function not as a simple feedback gain-control device, but rather as a mechanism for enhancing NFC along the tonotopic axis, enabling robust encoding of complex sounds across a wide range of sound levels and in the presence of background noise. Effects of sensorineural hearing loss on the NFC code and on the MOC feedback system are presented and discussed.
Keywords: Neural code, Saturation, Computational model, Medial Olivocochlear Efferent System, Sensorineural Hearing Loss
1. A Code based on Neural-Fluctuation Contrast
This review describes a theory for the neural coding and decoding of complex sounds based on the contrast in depth along the tonotopic axis of low-frequency fluctuations in auditory-nerve (AN) responses, referred to as neural-fluctuation contrast (NFC). A defining aspect of any nonlinear system is that it responds differently to inputs with different amplitudes. For the auditory system, which is often tasked with encoding spectral peaks, this feature of nonlinear systems is arguably beneficial. Fluctuations in the responses of inner hair cells (IHCs) and AN fibers differ qualitatively between channels tuned near spectral peaks and those tuned in spectral valleys, in a way that can be readily decoded by the central nervous system. The NFC code provides an alternative to earlier hypotheses for neural codes based on AN average discharge rates (i.e., excitation patterns) or phase-locking to temporal fine structure (TFS), as discussed below.
Figure 1 introduces the NFC code by illustrating several stages of peripheral responses for two frequency channels, one channel tuned near a formant peak in a vowel stimulus (orange), and the other to a spectral valley (violet) (Fig. 1A). The spectra of the peripheral-filter responses (Fig. 1B, inner panels) illustrate the initial shaping of response profiles by cochlear tuning: for the channel near the formant peak (orange), the sharpness of the spectral peak is enhanced in the response of the channel, whereas for the channel centered in the valley (violet), the spectrum of the filter response is relatively flat. The peripheral-filter response waveforms differ not only in amplitude, but also in modulation depth: the response of the channel tuned near the peak (orange) is larger and more weakly modulated at the fundamental frequency (F0) of the vowel than the response of the channel tuned in the valley (violet) (Fig. 1B, time waveforms). This difference in modulation depth is enhanced by IHC transduction of the filter responses into voltage waveforms (Fig. 1C). The channel tuned near the peak is pushed into saturation on every cycle of the input, resulting in more uniform response amplitudes across cycles, which further reduces the depth of modulation (Fig. 1C, orange). In contrast, the IHC response to the lower-amplitude signal for the channel tuned in the valley is not pushed into IHC saturation, and thus retains strong modulations (Fig. 1C, violet). The IHC responses of the two channels thus differ substantially in both amplitude and modulation depth. The nature of these differences is further enhanced by the IHC-AN synapse (Fig. 1D). The stimulus energy at both the spectral peak and the valley, for this utterance at a conversational speech level, is above the sound level at which high-spontaneous-rate (HSR) AN fibers have saturated average rates. Thus, the two synaptic responses are similar in terms of average rate (Fig. 1D, horizontal lines), but the fluctuations locked to the fundamental period (1/F0) in these responses are strikingly different. The fluctuations in the AN response are referred to here as neural fluctuations (NFs) (Fig. 1D), to distinguish them from the modulations in the stimulus envelope or in peripheral-filter responses (Fig. 1B). Differences between responses to spectral peaks and valleys are generally exaggerated in NFCs, especially in healthy ears, for which the cochlear sensitivity of places tuned near spectral peaks helps to push the IHCs into saturation (Bruce et al., 2003; Zilany and Bruce, 2007).
Figure 1.
Depths of Neural Fluctuations (NFs) in AN responses differ between channels tuned near spectral peaks (orange, CF = 700 Hz) and valleys (violet, CF = 1250 Hz). A) Stimulus spectrum, steady state, 75 dB SPL, vowel /ae/ (speaker m06 from Hillenbrand et al., 1995). Formant peaks are indicated by F1–3. B) Cochlear responses: time waveforms (outer panels) and spectra (inner panels) of the nonlinear basilar-membrane response of the Zilany et al. (2014) model. IHC transduction is described by a saturating nonlinearity (Zhang et al., 2001). Stimulus components near a spectral peak push the near-CF IHC towards saturation (orange). C) IHC voltage, and D) AN response. For a channel tuned near a spectral peak (orange), NF depth is increasingly reduced at each stage of the model, whereas NF depth is enhanced for a channel tuned in a spectral valley (violet).
Figure 1 illustrates how peripheral nonlinearities shape NFCs in AN responses to complex sounds. Historically, this qualitative feature of the responses has not been a focus in studies of AN coding, perhaps because NFCs are not “interesting” in responses to pure tones in quiet, for which, by default, a single frequency dominates the responses at all peripheral stages: cochlea, IHCs, and AN fibers. For complex sounds, which are defined as having multiple frequency components, beating between frequency components creates envelope fluctuations in the stimulus, and beating between components that pass through the same peripheral filter shapes the envelope of the basilar-membrane response at each place along the cochlea. For voiced sounds, this beating is dominated by the fundamental frequency (F0), which is the most common difference in frequency between components across the spectrum. For all sounds, the beating is altered by the magnitude and phase properties of peripheral filters, and shaped by both IHC saturation and the IHC-AN synapse.
If the stimulus components that pass through a peripheral filter have similar amplitudes, large envelope fluctuations in the filter response are created by this beating. But the IHC response is affected by a saturating nonlinearity. Therefore, if there is a stimulus component near the characteristic frequency (CF) that is larger in amplitude than the neighboring components, that component will push the IHC into saturation and thus dominate, or capture, the AN response (Sachs and Young, 1980; Deng and Geisler, 1987; Deng et al., 1987; Miller et al., 1997; Sachs et al., 2002; Bruce et al., 2003; Zilany and Bruce, 2007; Carney, 2018). The captured response is similar to a response to a single-component stimulus. At low frequencies, this response resembles the phase-locked response to a pure tone, and early descriptions of capture quantified the effect using the synchronization coefficient to the stimulus component that captured the response (e.g., “synchrony capture”, or alternatively, the average localized synchronized rate; Young and Sachs, 1979). More generally, capture, or dominance of a response by energy near a spectral peak, can occur across a wide frequency range, with a resulting reduction in the NF depths of AN responses when IHCs are pushed into saturation. As a result of capture by spectral peaks, IHCs and AN fibers tuned near a spectral peak have qualitatively different responses than those tuned in a spectral valley. In response to complex sounds such as speech or music, which contain both spectral peaks and valleys, variation in the amount of capture along the tonotopic axis creates contrast in fluctuation depth that encodes the spectral envelope.
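The capture mechanism described above can be sketched numerically. The following toy example is not part of the published models: a tanh stands in for the IHC transducer function, a "peak" channel carries a dominant component plus a weaker neighbor at high drive, a "valley" channel carries two weak, equal-amplitude components, and modulation depth is measured from a smoothed Hilbert envelope. All amplitudes and the tanh gain are arbitrary illustration choices.

```python
import numpy as np

fs = 20000
t = np.arange(0, 0.2, 1/fs)   # 200 ms; n = 4000 samples (even)

# Channel tuned near a spectral peak: a dominant component plus a weaker
# neighbor, driven hard enough to saturate the toy "IHC" nonlinearity.
x_peak = 1.0*np.sin(2*np.pi*600*t) + 0.3*np.sin(2*np.pi*700*t)
# Channel tuned in a valley: weak, equal-amplitude components that beat
# deeply at the 100-Hz difference frequency and stay in the linear range.
x_valley = 0.05*np.sin(2*np.pi*600*t) + 0.05*np.sin(2*np.pi*700*t)

def smoothed_envelope(x, fs, win_s=0.002):
    """Hilbert envelope (FFT-based analytic signal), smoothed with a short
    moving average to keep only the slow (beat-rate) fluctuations."""
    n = len(x)                         # assumes even n
    h = np.zeros(n)
    h[0] = 1.0; h[1:n//2] = 2.0; h[n//2] = 1.0
    env = np.abs(np.fft.ifft(np.fft.fft(x)*h))
    w = int(win_s*fs)
    return np.convolve(env, np.ones(w)/w, mode='same')

def mod_depth(x, fs):
    env = smoothed_envelope(x, fs)
    env = env[len(env)//4: -len(env)//4]   # trim smoothing edge effects
    return (env.max() - env.min())/(env.max() + env.min())

def ihc(v):
    return np.tanh(10*v)   # toy saturating transducer (gain chosen for demo)

depth_peak = mod_depth(ihc(x_peak), fs)       # saturation flattens the beat
depth_valley = mod_depth(ihc(x_valley), fs)   # weak drive preserves the beat
print(depth_peak, depth_valley)
```

The saturated "peak" channel yields a nearly flat envelope, whereas the quasi-linear "valley" channel retains deep fluctuations at the 100-Hz beat frequency, mirroring the contrast illustrated in Fig. 1.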
Figure 1 illustrates the NFC in AN responses for two “extremes” – one channel tuned at a spectral peak and the other in a valley. More generally, the fluctuation depths vary along the tonotopic axis, as illustrated in Fig. 2 for the response to the vowel /ae/ (speaker m06, Hillenbrand et al., 1995). The NF-depth profile for the normal-hearing AN model is illustrated in the background of Fig. 2B (magenta) and in more detail in Fig. 2E (solid). These profiles show how NF depth varies systematically along the tonotopic axis, such that NF-depth versus place encodes the spectral envelope.
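One plausible reading of the NF-depth metric plotted in Fig. 2 (the standard deviation of the low-frequency fluctuations of the instantaneous rate, normalized by the mean rate; the published analysis may differ in detail) can be written as a short function. The toy rate waveforms below have equal average rates but different fluctuation depths, as in the peak and valley channels of Fig. 1.

```python
import numpy as np

def nf_depth(rate, fs, f_cut=500.0):
    """Normalized neural-fluctuation depth of an instantaneous-rate waveform:
    std of the slow (< f_cut) fluctuations divided by the mean rate.
    This is one plausible reading of the 'std of envelope' metric in Fig. 2,
    not a reproduction of the published analysis code."""
    R = np.fft.rfft(rate)
    f = np.fft.rfftfreq(len(rate), 1/fs)
    R[f > f_cut] = 0                       # keep only the slow fluctuations
    slow = np.fft.irfft(R, n=len(rate))
    return slow.std() / rate.mean()

fs = 10000
t = np.arange(0, 0.5, 1/fs)
f0 = 100.0
# Toy instantaneous rates (spikes/s) for two channels with equal mean rate:
rate_valley = 100*(1 + 0.8*np.sin(2*np.pi*f0*t))   # deep F0 fluctuations
rate_peak   = 100*(1 + 0.1*np.sin(2*np.pi*f0*t))   # captured, shallow

print(nf_depth(rate_valley, fs), nf_depth(rate_peak, fs))
```

Both channels have the same average rate (100 spikes/s), so a rate profile would not distinguish them, but their NF depths differ by a factor of eight.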
Figure 2.
Model AN post-stimulus time histograms (PSTHs) and response profiles for the vowel /ae/ at 75 dB SPL. A) Model audiograms for the three examples of SNHL illustrated, based on categories in Bisgaard et al. (2010). B) Responses of AN model with normal hearing; stimulus spectrum (foreground, black), fine structure (brown), and envelope (blue) are shown for each HSR AN CF. Normalized NF depth (standard deviation (std) of envelope) for each channel is shown in background (magenta). Vertical lines indicate spectral peaks associated with the first three formants (F1=730, F2=1850, F3=2685 Hz). C, D) Responses of models with mild and moderate sensorineural hearing loss, respectively. E, F, H, I) Smoothed profiles, based on averages of 10 model fibers per channel, across 50 CFs for models with NH (solid), mild SNHL (dashed), and moderate SNHL (dotted). Vertical lines indicate F1 and F2. E) Normalized neural fluctuation (NF) depth, F) model IC BE rate, H) model HSR AN rate, I) model IC BS rate. IC BE neurons are excited by fluctuations; thus, IC BE rates (F) mirror the NF depths (E). Model IC BS neurons (I) are suppressed by fluctuating inputs, and their rates are thus inversely related to NF depths. NFC cues, especially at F2, are substantially degraded by SNHL. Note that contrast in the AN rate profile is sharpened by mild SNHL. G, J) Examples of modulation transfer functions for IC BE (G) and IC BS (J) neurons, based on data from Carney et al. (2015).
The NFC code was first suggested not by examination of AN responses per se, but rather by the response properties of a likely decoding stage, the inferior colliculus (IC) (Carney et al., 2015). The strong sensitivity of IC neurons to input fluctuations, as evidenced by tuning to amplitude modulation (AM) frequencies in the range 20–250 Hz (Joris et al., 2004; Kim et al., 2020), motivated the detailed investigation of NFC in the input to the IC. This investigation revealed a simple code: cochlear filtering and saturating nonlinearities in the auditory periphery result in changes in NF depths, or NFC along the tonotopic axis, in the responses of AN fibers, with smaller amplitude fluctuations occurring in responses of fibers tuned near spectral peaks compared to those tuned near spectral valleys. This mapping from spectral features to NFC is robust across sound levels and in background noise and, due to the sensitivity of IC neurons to input fluctuations, is readily decoded by IC neurons and transformed into average-rate (Carney et al., 2015; Carney, 2018) and synchronized-rate (Henry et al., 2017) codes at the level of the IC.
NFC across AN responses along the tonotopic axis is reflected in different rate profiles for two of the most populous types of IC neuron. Neurons that are excited over a band of modulation frequencies (Fig. 2G), the band-enhanced (BE) neurons, are a hallmark of the IC, the first level of the ascending pathway where strong rate-based tuning to AM arises (Langner and Schreiner, 1988; Krishna and Semple, 2000; Joris et al., 2004; Nelson and Carney, 2007; Kim et al., 2020), providing inspiration for modulation-filter models (e.g., Kay, 1982; Dau et al., 1997a,b). The rate profile for IC BE neurons in response to complex sounds is counter-intuitive: because these neurons are excited by deep fluctuations, IC BE neurons have increased rates when CF is in a spectral valley and reduced rates when CF is near a spectral peak (Fig. 2F) (Carney et al., 2015).
IC neurons that are suppressed by a band of modulation frequencies, the band-suppressed (BS) neurons (Fig. 2J), have a rate profile that is inversely related to the NF-depth profile. This population of IC neurons is the most numerous in the IC (BS are roughly 40%, BE are roughly 30%, and hybrid neurons, which have both BE and BS response features, are roughly 20% of IC neurons, depending upon categorization criteria, Kim et al., 2020). Responses of BS neurons are “opponent” to those of BE neurons (Carney, 2018; Kim et al., 2020), in the sense that BS neurons are excited in channels where the NF depths are smallest, and suppressed where NF depths are largest (Fig. 2I).
While the BS rate profile might appear to simply reflect spectral energy, comparison to the AN rate profile (Fig. 2H) makes it clear that the IC BS response profile is enhanced by the sensitivity of these neurons to fluctuations. Even though many of the HSR AN fibers have average rates that are saturated, or nearly saturated, by the 75 dB sound pressure level (SPL) vowel illustrated in Fig. 2, the NFC in the HSR AN instantaneous-rate responses influences both IC BE and BS neurons. (Note that the AN rate profile shown in Fig. 2H is for HSR model AN fibers, which were the inputs to the model IC neurons.) A combination of the opponent BE and BS responses, potentially at higher levels of the ascending pathway, could further enhance sensitivity to the spectral features (c.f., coding strategies in visual processing, including the combination of opponent populations of retinal ganglion cells to enhance image features such as luminance contrast or chromatic edges) (Kim et al., 2020).
2. Effects of Sensorineural Hearing Loss on Neural-Fluctuation Contrast
An important aspect of the NFC model is that sensorineural hearing loss (SNHL) affects the NFC in a manner that is consistent with changes observed in psychophysical performance (see below). Figure 2 illustrates responses of AN models with sensorineural hearing loss (Fig. 2A), implemented as a combined reduction in cochlear gain, accounting for 2/3 of the loss, and in reduced IHC sensitivity, accounting for 1/3 of the loss (Zilany and Bruce, 2007). Mild SNHL has a relatively small impact on NFC at low CFs, but a strong effect near the second formant (F2). A quantitative study of the degraded NFC near F2 is described below. Qualitatively, NFC is decreased because the NF depths across mid to high frequencies are generally increased, even by mild SNHL (Fig. 2C). The increased NF depths are expected because of broadened peripheral tuning, reduced cochlear gain, and thus reduced saturation of the IHCs.
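The mixed-loss implementation described above can be written down directly. The helper below is a minimal sketch assuming a fixed 2/3 cochlear-gain (OHC) to 1/3 IHC split at each audiometric frequency; the published model fits OHC and IHC parameters per frequency, so this is a simplification.

```python
def split_hearing_loss(total_db):
    """Split an audiometric threshold shift (dB) into outer-hair-cell
    (cochlear-gain) and inner-hair-cell components, using the 2/3 : 1/3
    convention described in the text (a simplification of the per-frequency
    fitting procedure in the published model)."""
    ohc_db = 2.0*total_db/3.0
    ihc_db = total_db - ohc_db
    return ohc_db, ihc_db

# e.g., a 30-dB loss at one audiometric frequency:
print(split_hearing_loss(30.0))   # → (20.0, 10.0)
```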
The general increase in fluctuation depth of AN responses at the F0 of the stimulus for listeners with mild and moderate SNHL is consistent with the stronger envelope following, as reported physiologically for animals with elevated thresholds (Kale and Heinz, 2010; Heeringa et al., 2023) and in EEG recordings for listeners with SNHL (e.g., Millman et al., 2017). An interesting aspect of the SNHL model responses is that while NFC is reduced, the contrast in the HSR average-rate profile is actually improved by mild SNHL (dashed line, Fig. 2H); this improvement is explained by the elevated thresholds that pull the responses away from saturation. Thus, if either AN average-rate-based excitation patterns or the strength of phase-locking to F0 were essential for coding complex sounds such as vowels, vowel identification and formant-frequency discrimination performance would be expected to improve with mild hearing loss. For more substantial hearing loss, the AN rate profile deteriorates (Fig. 2H, dotted line), and the NF profile and IC rate profiles are inverted for frequencies near F2, due to weakened capture (Miller et al., 1997) (Fig. 2E, F, H, I, dotted lines). With moderate SNHL, the F2 spectral peak results in a peak in IC BE rate, and a dip in BS rates, due to the resulting NFs that persist in the F2 region (Fig. 2D,F, I, dotted lines). The changes in the model NFCs near the first formant (F1) and F2 are consistent with behavioral discrimination of F1 and F2 in listeners with SNHL (Carney et al., 2023; see below).
3. Classical Theories for Neural Coding
The pure tone is the most commonly used stimulus in auditory science (and in the clinic), but the shaping of the envelopes of AN responses by peripheral nonlinearities is relatively unimportant for pure tones, which clearly capture IHC and AN responses. Saturation of the IHC-AN synapse, however, limits the dynamic range of AN fiber discharge rates. This nonlinearity has received considerable attention because it significantly limits coding schemes of sound level that are based on average discharge rates (e.g., Viemeister, 1988; Bharadwaj et al., 2014). An earlier “solution” to this limitation has been to emphasize the role of low-spontaneous-rate (LSR) AN fibers, which have wide dynamic ranges for coding pure tones at CF (Liberman, 1978), at least for CFs > 1500 Hz (Winter & Palmer, 1991), or for coding the envelopes of AM tones with carrier frequencies near CF (Joris & Yin, 1992). Rate-based models can explain thresholds for discrimination of level differences of pure tones by heavily weighting the contribution of LSR fibers in a model AN population (e.g., Delgutte, 1987; Viemeister, 1988; Winter et al., 1991; see Colburn et al., 2003). However, because the slopes of rate-level functions do not increase as level increases, rate-based codes limited to on-CF fibers still worsen as sound level increases, contrary to the wealth of psychophysical data indicating that performance in most listening tasks improves over a range of levels for which rate-based codes deteriorate (e.g., Schacknow and Raab, 1973).
This so-called dynamic-range problem motivated considerable modeling and experimental work focused on explaining perception of the level of pure tones. Performance in discrimination of pure tones or AM tones in quiet can be readily explained by the large population of AN fibers tuned away from the tone frequency (“spread of excitation”) (e.g., Siebert, 1965; Florentine and Buus, 1981; Heinz et al., 2001a,b; Encina-Llamas et al., 2019). Nevertheless, an influential psychophysical study by Viemeister (1983) tested level discrimination of a band of noise in the presence of a notched-noise masker and argued that a limited band of AN fibers with a range of thresholds is sufficient to encode sound level. However, it was more recently shown that this task could be achieved even when the level is randomized over a wide range, suggesting that it is not in fact a level-discrimination task, but likely depends upon relative spectral cues, such as edge-pitch (Richards and Carney, 2019). Edge-pitch refers to the pitch perceived at a frequency near a steep spectral edge of a band-limited noise (Hartmann et al., 2019), and may be a consequence of cochlear suppression. Other tasks using notched-noise maskers designed to limit rate-based information to an on-CF channel have been shown to actually provide NFC-based cues (Maxwell et al., 2020b; see below).
In general, although the dynamic-range problem for rate-based coding of pure tones in quiet may be solved by spread of excitation, the problem remains for complex sounds, especially in noise. The NFC code provides an attractive alternative. In fact, it is argued here that the saturation of the IHC-AN synapse, which equalizes the average discharge rates for the majority of AN fibers across a wide range of sound levels, is beneficial for the NFC code. Saturation of AN average rates effectively normalizes the average rates across stimuli and frequency channels at mid-to-high sound levels, removing the potential confound of changes in average rate, due to changes in sound level, from changes in fluctuation depth. It is important to note that equalizing the average discharge rate does not obscure the time-varying instantaneous-rates of HSR (Figs. 1, 2), and also LSR, fibers. It is the change in the depth of fluctuations of instantaneous rates of AN fibers along the tonotopic axis that is referred to here as NFC.
The classical alternative to the average-rate-based code for tones is TFS information in phase-locked responses of AN fibers or combined rate-based and TFS coding (Siebert, 1965; Heinz et al., 2001a,b). However, phase-locking to TFS steadily decreases for frequencies above approximately 1 kHz (Rose et al., 1967; Johnson, 1980), and TFS-based coding strategies are further challenged by complex sounds due to interactions between spectral peaks that influence TFS phase-locking (Bandyopadhyay & Young, 2004; Young, 2008). Moreover, decoding TFS information would require neural algorithms to analyze the temporal spiking patterns of AN fibers, and although many such algorithms have been proposed (e.g., auto-correlation, Meddis and Hewitt, 1991a,b; phase-opponency, Carney et al., 2002; cancellation, de Cheveigne, 1993, 2023; cross-frequency spatio-temporal patterns, Shamma, 1985; Cedolin and Delgutte, 2010; Heinz et al., 2010; Shamma and Klein, 2000), there is no physiological evidence to support any of these decryption mechanisms within the auditory central nervous system (Young, 2008; Li and Joris, 2023).
The NFC code provides an alternative to the earlier hypotheses for codes based on AN average discharge rates (i.e., excitation patterns) or phase-locking to TFS. NFCs associated with spectral features occur across a wide range of sound levels and frequencies. The maximal possible contrast in NFCs is determined by the bandwidth of peripheral tuning, because fluctuations are caused by beating between the components that pass through a filter: when more components pass through broader filters, the depth of the low-frequency fluctuations is reduced (Lawson and Uhlenbeck, 1950; see Dau et al., 1997a). Thus, NFCs are moderated at high frequencies by the increase in peripheral bandwidths, but they are still sufficient for representing broad spectral cues at high frequencies (see below). Importantly, NF depth versus place along the tonotopic axis is readily converted into a rate profile across the population of IC neurons by the well-established sensitivity of these neurons to the depth and frequency of fluctuations of their inputs (Joris et al., 2004).
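The bandwidth dependence noted above can be illustrated with band-limited gaussian noise: a broader ideal filter (a crude stand-in for a cochlear filter) spreads envelope power over a wider range of modulation frequencies, reducing the normalized depth of the slow (< 250 Hz) fluctuations. The filter shapes, bandwidths, and modulation cutoff below are arbitrary illustration choices.

```python
import numpy as np

fs, dur = 20000, 1.0
rng = np.random.default_rng(0)
noise = rng.standard_normal(int(fs*dur))

def band_noise(x, fs, f_lo, f_hi):
    """Ideal band-pass filter via FFT (stand-in for a cochlear filter)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1/fs)
    X[(f < f_lo) | (f > f_hi)] = 0
    return np.fft.irfft(X, n=len(x))

def slow_fluct_depth(x, fs, f_cut=250.0):
    """std of envelope fluctuations below f_cut, normalized by mean envelope."""
    n = len(x)                         # assumes even n
    h = np.zeros(n)
    h[0] = 1.0; h[1:n//2] = 2.0; h[n//2] = 1.0
    env = np.abs(np.fft.ifft(np.fft.fft(x)*h))   # Hilbert envelope
    E = np.fft.rfft(env - env.mean())
    f = np.fft.rfftfreq(n, 1/fs)
    E[f > f_cut] = 0                             # keep only slow modulations
    slow = np.fft.irfft(E, n=n)
    return slow.std() / env.mean()

narrow = band_noise(noise, fs, 2950, 3050)   # 100-Hz-wide "sharp" filter
broad  = band_noise(noise, fs, 2500, 3500)   # 1000-Hz-wide "broad" filter

d_narrow = slow_fluct_depth(narrow, fs)
d_broad  = slow_fluct_depth(broad, fs)
print(d_narrow, d_broad)
```

The narrow band concentrates all of its envelope power at low modulation frequencies, so its slow-fluctuation depth is larger; the broad band's envelope power extends to its full bandwidth, leaving less below 250 Hz.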
The idea that envelope fluctuations, as opposed to energy, contribute to coding sounds is not new. Psychophysical models based on the envelopes of stimuli have resolved important limitations of the power-spectrum model of masking (Fletcher, 1940; Moore, 1995). For example, the addition of a tone to a narrowband-noise masker reduces the modulation depth of the stimulus envelope, yielding a cue for detection of the tone (Richards, 1992a,b). Studies comparing masking by low-noise noise, which is constructed to have a flat envelope, and gaussian noise highlight the importance of masker envelope properties (Kohlrausch et al., 1997; Svec et al., 2015; Brennan et al., 2023). These studies often focused on cues in the stimulus itself, using narrowband stimuli that are presumed to pass relatively unchanged through a single critical-band filter. Consideration of modulation cues in complex stimuli led to the development of modulation-filterbank models, which have been applied to a wide range of psychophysical tasks (Dau et al., 1997a,b) and to predicting speech intelligibility (e.g., Jepsen et al., 2008; Jørgensen and Dau, 2011; Relaño-Iborra et al., 2019; Relaño-Iborra and Dau, 2022). These studies analyzed responses to wideband sounds using relatively simple peripheral models that lack strongly saturating nonlinearities. As a result, the detection cues used in modulation-filterbank models differ from those in NFC theory: modulation-filterbank models emphasize the “modulation excitation pattern”, or peaks in the modulation power at the filter outputs (e.g., Jørgensen and Dau, 2011), whereas in NFC theory, spectral peaks are encoded by contrast in fluctuation depth resulting from reduced fluctuation depths created by saturating peripheral nonlinearities (Figs. 1, 2) (Carney et al., 2015; Carney, 2018).
In the following sections, the role of the interactions of peripheral nonlinearities in establishing the NFC code is illustrated. Physiological support for the NFC code at the level of the midbrain, based on recordings in IC, is reviewed. Following that, psychophysical modeling studies that illustrate the ability of NFC cues to explain a wide range of perceptual tasks are presented. Most of these studies used stimuli at moderate sound levels, corresponding to the levels of conversational speech, for which NFCs in a basic model of AN responses are particularly robust. However, it is important to challenge any hypothesized neural code across a wide range of sound levels. Maintaining NFCs at low or high sound levels could be achieved by controlling cochlear gain based on both sound level and NFCs (Carney, 2018). The hypothesis that a primary aspect of auditory efferent function is to maintain, and in fact enhance, NFCs is described below, including an overview of a new computational model that includes efferent pathways (Farhadi et al., 2023).
Lastly, this review considers the implications of the NFC code for understanding hearing loss. Limitations of the current iteration of the models used to investigate NFC, particularly regarding well-known and emerging physiological properties of midbrain neurons, are presented. Although the NFC-coding hypothesis is consistent with a broad set of physiological and psychophysical results, future investigatory paths at different levels of the auditory pathway are proposed.
4. Physiological Support for the NFC Code
As mentioned above, NFC has not been a traditional focus of physiological studies of AN fibers (but see Li and Joris, 2021; Heeringa and Köppl, 2022). Nevertheless, changes in NFC along the tonotopic axis are evident in the classic studies of AN responses to vowels and harmonic complexes. For example, an illustration of the dominant components that governed the temporal responses of AN fibers (Fig. 7 in Delgutte and Kiang, 1984) shows that synchrony to F0 in response to vowels changes as a function of CF in a manner consistent with Fig. 2. Young and Sachs (1979) studied the temporal properties of AN responses to vowels and focused on quantifying TFS using the averaged localized synchronized rate, but again synchrony to F0 varied across CFs, consistent with the model responses shown in Fig. 2.
At the level of the IC, physiological responses to vowels were compared to predictions of the NFC model and an energy-based model (Carney et al., 2015; reviewed in Carney, 2018). IC rate responses to a set of English vowels were better predicted by BE and BS models for IC neurons than by a model based on stimulus energy at CF. The NF-driven IC-rate profiles were also robust in additive gaussian noise over a wide range of signal-to-noise ratios (SNRs). A physiological and behavioral study of formant-frequency coding in the budgerigar (Henry et al., 2017) showed that IC average-rate coding of single-formant vowel-like sounds could explain behavioral discrimination for stimuli in quiet, but that IC phase-locking to the F0 was important to explain operant behavioral results for peak-frequency discrimination in noise. Figure 2 focuses on the IC rate profiles, but future work should include both rate and temporal properties of physiological and model IC responses.
NFC cues contribute to the coding not only of speech, but also of any complex sound. Several physiological studies have examined the NFC cues that have been proposed to explain performance in psychophysical tasks, such as detection of tones in noise. Increased NFC across channels tuned near a tone added to a noise masker provides a cue for detection of a tone in noise (Fig. 3A, C), and the place along the tonotopic axis of the dip in NFs indicates the frequency of the added tone. Physiological studies of IC responses to tones and narrowband (1/3-octave) noise maskers, using stimulus levels matched to those used in psychophysical studies (see below), were carried out to test several specific predictions of the NFC model. For example, BE neurons were predicted to have average rates that decreased when a CF tone was added to a narrowband gaussian noise centered at CF, because the tone reduces the depth of NFs that excite BE neurons. In contrast, BS neurons were predicted to have rates that increase upon addition of a tone to a noise, because the tone reduces the NFs that suppress BS neurons. These predicted changes in rate as a function of the level of the added tone were observed in the majority of investigated IC neurons (Fan et al., 2021). Opposite predictions were made for responses of IC neurons to CF tones added to low-noise noise, a narrowband noise manipulated such that the temporal envelope is flat (Kohlrausch et al., 1997). Adding a tone to a low-noise noise increases the depth of envelope fluctuations in the stimulus, and thus would be expected to increase the response rates of BE neurons and decrease the rates of BS neurons. Again, these predicted rate changes in response to tones added to low-noise noise stimuli were observed in the majority of the physiological IC responses reported in Fan et al. (2021).
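The direction of these predictions for gaussian-noise maskers can be sketched with a stimulus-level toy model (not a model of IC physiology): adding a CF tone to a narrowband gaussian noise reduces the normalized depth of the envelope fluctuations, and toy monotone rate mappings then yield a decreased "BE" rate and an increased "BS" rate. The tone level and the rate mappings are arbitrary illustration choices.

```python
import numpy as np

fs = 20000
n = fs                       # 1-s stimuli (even length for the Hilbert step)
t = np.arange(n)/fs
rng = np.random.default_rng(1)

def band_noise(n, fs, f_lo, f_hi, rng):
    """Gaussian noise band-limited by an ideal FFT filter."""
    X = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n, 1/fs)
    X[(f < f_lo) | (f > f_hi)] = 0
    return np.fft.irfft(X, n=n)

def env_depth(x):
    """Normalized envelope fluctuation: std/mean of the Hilbert envelope."""
    m = len(x)
    h = np.zeros(m)
    h[0] = 1.0; h[1:m//2] = 2.0; h[m//2] = 1.0
    env = np.abs(np.fft.ifft(np.fft.fft(x)*h))
    return env.std()/env.mean()

noise = band_noise(n, fs, 900, 1100, rng)                 # narrowband noise at 1 kHz
tone = 2*np.sqrt(2)*noise.std()*np.sin(2*np.pi*1000*t)    # CF tone, ~6 dB above noise

d_alone = env_depth(noise)          # Rayleigh-like envelope: deep fluctuations
d_tone  = env_depth(noise + tone)   # the tone "fills in" the envelope: shallower

# Toy monotone rate mappings (illustrative only, not fitted to IC data):
def be_rate(d):
    return 100.0*d          # "BE": rate grows with fluctuation depth
def bs_rate(d):
    return 60.0*(1.0 - d)   # "BS": rate suppressed by fluctuation depth

print(d_alone, d_tone)
```

With these mappings, the added tone lowers the toy BE rate and raises the toy BS rate, matching the sign of the predictions tested by Fan et al. (2021); the opposite sign would be expected for a flat-envelope low-noise noise masker.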
Figure 3.
AN model responses to stimuli used in detection studies. (A, B, C) Model responses to tone-plus-noise stimuli, and (D, E, F) responses to noise-alone stimuli, for (left; A, D) normal hearing, (middle; B, E) mild, and (right; C, F) moderate sensorineural hearing loss. Stimulus spectra are shown in the foreground, and the profile of NF depths in the background (magenta). Stimuli were 65 dB SPL, 1000-Hz tones in 1/3-octave gaussian noise at 65 dB SPL, matching stimuli in Leong et al. (2020). Model audiograms were the same as in Fig. 2A. The NFC for the normal-hearing noise-alone response (D) is strong because fluctuations were deeper for frequencies below and above the center frequency, where the cochlear filters straddle the spectral edges of the 1/3-octave noise. (This profile is consistent with the pitch perceived in response to a narrowband noise, which is slightly offset from the center of the noise band – unpublished observation.) The added tone increased NFC by deepening the dip in the NF profile. The added tone also centered the dip at the tone frequency (vertical dashed line). The models with SNHL had larger NF depths in response to both stimulus conditions, and thus a smaller change in NF depth, or NFC, upon addition of the tone. The reduced contrast near the center frequency in the noise-alone NF profiles with SNHL (compare D to E, F) is consistent with weaker pitch strength for narrowband noise in listeners with SNHL (Horbach et al., 2018).
Ongoing physiological studies are exploring responses of IC neurons to speech-like stimuli that match those used in psychophysical and modeling studies (Carney et al., 2023). IC responses to wideband tone-in-noise stimuli reveal complexities that cannot be explained by a simple IC model with AM tuning, likely due to the influence of off-CF mechanisms, such as broad-band inhibition, that would not have been engaged by the narrowband tone-in-noise stimuli used by Fan et al. (2021). These wideband mechanisms may also shape responses to speech sounds, with or without noise backgrounds. Finally, recent studies of IC neurons reveal other sensitivities, in addition to the frequency tuning and fluctuation sensitivity included in the model responses shown above. These include sensitivity to binaural differences and sensitivity to fast frequency sweeps (“chirps”) in complex sounds, such as Schroeder-phase harmonic complexes (Steenken et al., 2022; Henry et al., 2023; Mitchell et al., 2023). Frequency sweeps occur in complex sounds in which the phase varies across frequency components, and they are thus potentially important for understanding responses to complex sounds such as voice and music, for which vocal-tract and instrumental resonances shape the phase spectrum.
5. Psychophysical Modeling Support for the NFC Code
Extensive support for the NFC model is provided by predictions of psychophysical performance, which have taken advantage of a large literature describing performance of listeners on a diverse set of tasks. To make such predictions, a quantity, referred to as a decision variable (DV), is computed based on each stimulus waveform or on the model response to each waveform. For a multiple-interval task, the target interval is selected based on a comparison of the DVs computed for each interval using a decision rule. For example, for a task requiring detection of a tone in noise, a classical decision rule could be to select the interval with the highest energy in the filter response. Alternatively, an NFC-inspired decision rule would select the interval that elicited the largest NFC, or the IC BE or BS rate profile with the largest across-frequency differences, as the interval most likely to contain the tone. DVs and decision rules are tested by using them to estimate thresholds for direct comparison to psychophysical thresholds. A DV can be further tested by perturbing it using stimulus manipulations. For example, an energy-based DV can be challenged by randomly varying (i.e., “roving”) the overall sound level of the tone-plus-noise stimulus across intervals, which significantly increases the threshold estimated using an energy-based DV (e.g., Appendix A in Green, 1988). Experimentally, listener thresholds for detection of a tone in noise are affected much less by a roving-level paradigm than would be predicted by the energy-based DV, especially for large rove ranges (e.g., 32 dB; Kidd et al., 1989; Richards, 1992b), suggesting that listeners have access to more reliable DVs than energy. Generally, stimulus- or model-based DVs can be used to estimate a threshold, using either an adaptive track or the method of constant stimuli, or hit- and false-alarm rates for specific waveforms in a given stimulus condition.
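The vulnerability of an energy-based DV to a level rove can be illustrated with a minimal stimulus-domain simulation (a Python sketch for illustration only; the tone level, durations, trial counts, and names such as `energy_dv` are arbitrary choices, not those of the cited studies):

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 10000
t = np.arange(int(0.3 * fs)) / fs

def energy_dv(x):
    # Energy-based decision variable: mean squared amplitude
    return np.mean(x ** 2)

def trial(rove_db=0.0):
    # Two-interval trial: the correct choice is the tone-plus-noise
    # interval; the model picks the interval with the larger energy DV.
    # Each interval's overall level is roved independently.
    noise_t = rng.standard_normal(t.size)
    noise_s = rng.standard_normal(t.size)
    tone = 0.5 * np.sin(2 * np.pi * 1000 * t)
    g_t = 10 ** (rng.uniform(-rove_db / 2, rove_db / 2) / 20)
    g_s = 10 ** (rng.uniform(-rove_db / 2, rove_db / 2) / 20)
    return energy_dv(g_t * (noise_t + tone)) > energy_dv(g_s * noise_s)

pc_fixed = np.mean([trial(0.0) for _ in range(500)])
pc_roved = np.mean([trial(32.0) for _ in range(500)])
print(pc_fixed, pc_roved)  # the 32-dB rove pushes the energy DV toward chance
```

With fixed levels the energy DV detects the tone almost perfectly; with a 32-dB rove its percent correct collapses toward chance, whereas listeners' thresholds change little.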
These model-based threshold estimates can then be directly compared to listeners’ performance in psychophysical discrimination or detection tasks.
For either single-channel or population-response models, decision rules can be based on direct comparison of DVs for each stimulus interval. Alternatively, the DV for each interval can be compared to a template created by averaging responses to many repetitions of a standard stimulus (e.g., Dau et al., 1997a,b; Maxwell et al., 2020b; Osses and Kohlrausch, 2021). The template can be based on temporal responses to a stimulus waveform or on an across-channel profile of time-averaged responses. Thresholds based on template comparisons can take advantage of any change in the response that affects the comparison of the single-interval responses to the template. Examination of differences across intervals that contribute to successful decisions can then identify potential cues used by listeners.
5.1. Detection of Tones in Diotic or Monaural Gaussian Noise
As mentioned above, earlier studies explored the role of fluctuations in the stimulus envelope in masking the detection of tones. Several of these studies used narrowband maskers, under the assumption that the stimulus envelopes would be minimally affected by the peripheral filter tuned to the tone frequency, and the further assumption that detection is based on the response of the frequency channel centered on the tone frequency. Richards (1992a,b) showed that a DV that was calculated based on the envelope slope (i.e., the mean of the absolute slope of the stimulus envelope) predicted listeners’ responses for both roving-level and equal-energy paradigms. Kohlrausch et al. (1997) manipulated the stimulus envelope by using low-noise noise (LNN), in which envelope fluctuations were reduced, and showed that detection thresholds are lower for LNN maskers than for gaussian noise maskers. Indeed, they reported that the nature of the task differs between the two masker types: for gaussian noise maskers, listeners detect the tone by selecting the stimulus interval that fluctuates less, whereas for LNN they select the stimulus interval that fluctuates more (Kohlrausch et al., 1997). Moore (1975) mentions the possibility of envelope cues and a related “roughness” percept as a cue for tone-in-noise detection, even in wideband noise maskers. Although the results of these studies were not interpreted in terms of peripheral nonlinearities, the results are consistent with the NFCs that would be elicited by the stimuli used in these studies (Fig. 3), as supported by the physiological studies of tone-in-noise responses described above and further modeling studies reviewed below.
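A normalized version of the envelope-slope statistic can be sketched as follows (a Python illustration; a Butterworth band-pass stands in for the auditory filter, and the 0-dB SNR is an arbitrary choice):

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfilt

rng = np.random.default_rng(1)
fs = 20000
t = np.arange(int(0.5 * fs)) / fs

# Narrowband gaussian masker centered at 500 Hz (Butterworth band-pass
# stands in for an auditory filter)
sos = butter(4, [450, 550], btype="bandpass", fs=fs, output="sos")
noise = sosfilt(sos, rng.standard_normal(t.size))
tone = np.sqrt(2 * np.mean(noise ** 2)) * np.sin(2 * np.pi * 500 * t)  # 0 dB SNR

def norm_env_slope(x):
    # Mean absolute slope of the Hilbert envelope, normalized by the
    # mean envelope so that the DV is unaffected by overall level
    env = np.abs(hilbert(x))
    return np.mean(np.abs(np.diff(env))) / np.mean(env)

dv_noise = norm_env_slope(noise)
dv_tone = norm_env_slope(noise + tone)
print(dv_noise, dv_tone, norm_env_slope(10 * noise))
```

Adding the tone flattens the envelope and lowers the DV, while scaling the stimulus leaves the DV unchanged; the latter property is what makes envelope-slope cues resistant to a roving-level paradigm.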
Davidson et al. (2006, 2009) extended previous work on the role of fluctuations in tone-in-noise detection by predicting hit- and false-alarm rate responses of listeners for the detection of a 500-Hz tone in an ensemble of 25 reproducible (“frozen”) gaussian-noise waveforms from Evilsizer et al. (2002). Unlike the studies that focused on narrowband maskers, this dataset included responses for either wideband (100–3000 Hz) or narrowband maskers (452–552 Hz, roughly a critical bandwidth at 500 Hz). All noise stimuli were passed through a 4th-order gammatone filter centered at 500 Hz with a bandwidth equal to the equivalent rectangular bandwidth (ERBN, an estimate of the bandwidth of the auditory filter in listeners with normal hearing; Moore, 2003) at 500 Hz. Davidson et al. (2006, 2009) tested energy, TFS, and normalized envelope-slope DVs and argued that, in general, the cues based on envelope fluctuations predicted significant variance in listeners’ responses while offering a decision strategy that would be resistant to a roving-level paradigm.
This work was extended to study cues for masked detection, including envelope-fluctuation cues, in three ways. First, a modulation filter was applied to the response of the gammatone filter, to limit the envelope-based cues to those that are most informative about the presence of the 500-Hz target tone in gaussian noise (Mao et al., 2013). The modulation filter was tuned to 120 Hz with a quality factor, Q = center frequency/bandwidth = 1; this modulation frequency lies within the distributions of peak or trough modulation frequencies for IC BE and BS modulation transfer functions (MTFs) (Kim et al., 2020). The inclusion of the modulation filter contributed to the increased amount of variance explained in the Evilsizer et al. (2002) dataset, based on an optimal nonlinear combination of envelope, energy and TFS cues, to approximately the limit of predictable variance (Mao et al., 2013). Second, Mao and Carney (2015) provided a bridge between stimulus-based and physiologically realistic envelope-related cues by using a nonlinear peripheral model for AN responses (Zilany et al., 2014), instead of a linear gammatone filter. The AN model further shaped the fluctuation cues due to cochlear compression, IHC transduction, and adaptation at the IHC-AN synapse (see Fig. 1). The AN model response was then processed by model cochlear nucleus and IC neurons, which introduced modulation tuning. The fluctuation-related DVs based on responses of the physiological model were as successful as stimulus-based DVs in predicting listeners’ hit and false-alarm rates for detection of the 500-Hz tone in reproducible maskers. 
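The stimulus-domain modulation filtering described above can be sketched as follows (a Python illustration; a Butterworth band-pass stands in for the published modulation filter, and the carrier and AM parameters are arbitrary). The filter passes envelope components near 120 Hz and attenuates slower modulations:

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

fs = 20000
t = np.arange(int(0.5 * fs)) / fs

# Modulation filter centered at 120 Hz with Q = 1, i.e., a
# 60-180 Hz passband applied to the envelope
fc, q = 120.0, 1.0
sos_mod = butter(2, [fc - fc / (2 * q), fc + fc / (2 * q)],
                 btype="bandpass", fs=fs, output="sos")

def mod_power(x):
    # Power of the Hilbert envelope within the modulation passband
    env = np.abs(hilbert(x))
    return np.mean(sosfiltfilt(sos_mod, env - env.mean()) ** 2)

carrier = np.sin(2 * np.pi * 500 * t)
am120 = (1 + 0.8 * np.sin(2 * np.pi * 120 * t)) * carrier  # within passband
am20 = (1 + 0.8 * np.sin(2 * np.pi * 20 * t)) * carrier    # below passband
print(mod_power(am120), mod_power(am20))
```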
This physiological modeling approach was more recently extended to predictions of tone detection in several classic diotic paradigms (Carney et al., 2022), including varied bandwidths, a range of tone frequencies, a roving-level paradigm, and co-modulated noise, which refers to a gaussian noise that is multiplied by a low-pass-filtered noise to impose a common envelope across a wide range of frequencies (e.g., Hall et al., 1984; Moore, 1990).
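The construction of a co-modulated masker, and the common across-frequency envelope it imposes, can be sketched in a few lines (a Python illustration; the band edges and modulator cutoff are arbitrary choices):

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

rng = np.random.default_rng(3)
fs = 20000
n = fs  # 1 s of signal

# Co-modulated noise: wideband gaussian noise multiplied by a
# low-pass-filtered noise, imposing a common envelope on all
# frequency regions of the masker
wideband = rng.standard_normal(n)
sos_lp = butter(4, 40, btype="lowpass", fs=fs, output="sos")
modulator = sosfiltfilt(sos_lp, rng.standard_normal(n))
comod = wideband * modulator

# Envelopes in two widely separated bands are correlated for the
# co-modulated masker but not for the unmodulated wideband noise
sos1 = butter(4, [500, 700], btype="bandpass", fs=fs, output="sos")
sos2 = butter(4, [2000, 2200], btype="bandpass", fs=fs, output="sos")

def band_env_corr(x):
    e1 = np.abs(hilbert(sosfiltfilt(sos1, x)))
    e2 = np.abs(hilbert(sosfiltfilt(sos2, x)))
    return np.corrcoef(e1, e2)[0, 1]

print(band_env_corr(comod), band_env_corr(wideband))
```

The shared envelope across remote frequency channels is the cue that underlies co-modulation masking release.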
Third, Mao et al. (2015) showed that a combination of energy- and envelope-fluctuation-based cues provided by the physiological model predicted 500-Hz tone-in-noise detection in reproducible noises for listeners with normal hearing or mild hearing loss. In contrast, listeners with substantial SNHL at the target frequency depended mainly on energy cues, and were adversely affected by the roving-level paradigm. More recently, Leong et al. (2020) studied diotic detection of tones in 1/3-octave gaussian maskers for listeners with SNHL across a range of tone frequencies, with and without a roving-level paradigm. As expected, listeners with SNHL were more affected by the roving-level paradigm, suggesting that they depended more on energy-based cues, consistent with SNHL impairing the quality of NFC cues (Fig. 3).
5.2. Detection of Tones in Other Diotic Maskers
Masked thresholds are often interpreted based on the power spectrum model (Moore, 1995), which focuses on the energy at the output of the auditory filter tuned to the target frequency. As such, many studies use maskers designed to constrain the listeners to use information within a specific frequency range. For example, notched-noise maskers have been used to measure detection thresholds as a function of the width of a spectral notch centered at the frequency of the target tone (e.g., Patterson, 1976; Patterson and Moore, 1986) in order to estimate the shapes of auditory filters. However, Lentz et al. (1999) showed that this task, too, is not affected by a roving-level paradigm, suggesting that cues other than energy in the output of a single filter must be involved. Maxwell et al. (2020b) showed that NFC cues and a model for a population of fluctuation-sensitive IC neurons could explain the notched-noise data of Lentz et al. (1999), for both fixed- and roving-level paradigms.
Lentz et al. (1999) also showed that estimates of auditory-filter bandwidths differed for the same listeners when using a different estimation strategy based on a profile-analysis (PA) task. In PA tasks, listeners detect a change in spectral shape, typically in the presence of level variation (Green, 1988). In Lentz et al. (1999), the target tones were added in-phase with alternate frequency components of the equal-amplitude, log-spaced complex tone masker, and could thus be thought of as increments in the amplitude of those masker components. Maxwell et al. (2020b) showed that the trends in threshold for this PA task could be explained by the same NFC-based population IC model that was used for the notched-noise task. Increments in masker components result in small spectral peaks, and thus capture of IHC responses, providing NFC cues for the task. A key parameter manipulation in PA tasks is the number of masker components, which typically affects their frequency spacing. Because the beat frequencies resulting from interactions between masker components depend on their frequency spacing, manipulating the number of masker components varies the correspondence between NF frequencies and the range of modulation tuning in the IC, and thus how well IC rates code the target increment(s). Additionally, because cochlear filter bandwidths are broader (in Hz) at higher frequencies, components that are more widely spaced can create beats in the high-CF filter responses. Thus, the modulation rates at higher frequencies span a wider range, which can again extend beyond the range of modulation sensitivity of IC neurons. Ongoing work is exploring predictions of PA thresholds based on NFC cues in individual listeners with SNHL.
5.3. NFC Cues for Tasks Related to Pitch and Timbre
An NFC cue for the F0 of a harmonic tone is provided by differences in NF depth along the tonotopic axis between AN fibers tuned near or between harmonics. A model for F0 discrimination of equal-amplitude harmonic-tone complexes was able to explain trends in the performance of listeners with and without mild SNHL, using a template DV based on a population of model IC neurons driven by model HSR AN fibers (Bianchi et al., 2019). Impressively, listeners with normal hearing can discriminate F0 in harmonic complexes at very low SNRs (Gockel et al., 2006), challenging most pitch models. Performance for discrimination of F0 near 250-Hz for a range of levels of pink noise can be predicted by a population model of fluctuation-sensitive IC neurons (Carney, 2022), suggesting a role for NFC cues in explaining a challenging F0-discrimination task. The field of pitch perception provides a rich source for phenomena that are being used to challenge the NF model in ongoing studies.
Timbre is a quality of complex sounds essential for speech and music perception. One important aspect of timbre is related to the shape of the spectral envelope, often quantified by the spectral centroid. Studies of timbre discrimination have used harmonic tone complexes with a triangular spectral envelope (Allen and Oxenham, 2014). The spectral peak results in capture of IHC and AN responses tuned near the highest amplitude component. Capture creates NFC between channels tuned near the peak and those on the spectral slopes (Figs. 1, 2). Maxwell et al. (2020a, 2021) showed that timbre-discrimination thresholds based on a population of fluctuation-sensitive model IC neurons could explain the sensitivity of human listeners, as well as trends in performance when the F0 of the harmonic complex was randomly varied across stimulus intervals. Model responses also predicted trends in brightness estimates for a set of instrumental sounds (Maxwell et al., 2021). Ongoing work in this area is focusing on physiological responses in the IC for stimuli with simple triangular spectral envelopes or with the more complex envelopes created by musical instruments.
5.4. Speech Responses and Intelligibility Predictions Based on the NFC Model
As mentioned above, the NFC model arose from modeling studies of responses to vowel sounds, which were pursued in physiological and modeling studies of the IC (Carney et al., 2015, 2016; Carney & McDonough, 2019). A psychophysical study of formant-frequency discrimination in synthetic vowel-like sounds was carried out in listeners with and without SNHL to test the hypothesis that NFC cues could explain results for this task (Carney et al., 2023). Formant bandwidths were manipulated to vary the NFC and thus vary the difficulty of the discrimination task. Thresholds changed with bandwidth, as expected. Interestingly, the thresholds for discrimination at 600 Hz, a frequency in the F1 region, were not affected by SNHL (consistent with Fig. 2). In contrast, thresholds at 2 kHz, in the F2 region, were strongly affected by SNHL. An NFC model based on a population of IC neurons driven by model HSR AN fibers, with individual audiometric thresholds included in the AN model, had discrimination thresholds, and trends, that were similar to those of the listeners. Ongoing work is testing the hypothesis that the NFC code can explain formant-frequency discrimination by recording physiological responses in the IC.
The phenomenon of capture and its effect on speech perception is described above using examples of vowels, for which NFC is created by capture in frequency channels near spectral peaks, against a background of fluctuations at the fundamental of the harmonic stimulus. However, as demonstrated by the masked-detection studies (Fig. 3), the NFC model does not require harmonic stimuli. AN responses tuned near significant spectral peaks in noisy stimuli are also captured. The spectral peaks in the stimulus are enhanced by peripheral filtering, similar to Fig. 1B (orange). The concentrated energy near CF in the cochlear response drives IHCs into saturation, reducing NF depths in channels tuned near spectral peaks. Hamza et al. (2023) showed that NFCs provide a potential cue for both voiced and unvoiced fricative consonants, which are characterized by spectral envelopes with broad, relatively high-frequency peaks. Fricatives were categorized using a machine-learning classifier based on their spectra or on the average-rate response profiles of model HSR AN fibers or model IC BE or BS fluctuation-sensitive neurons. The model IC responses provided the best prediction of behavioral performance in a consonant-identification task.
A challenging, but potentially valuable, application for models of speech responses is the task of predicting speech intelligibility in different listening environments and with SNHL. The NFC model framework was applied to this problem by estimating changes in the correlation between time-frequency responses to noise alone and speech-plus-noise for a population of model BE neurons driven by model AN fibers (Zaar & Carney, 2022). The basis for these predictions is the assumption that larger differences between the speech-plus-noise and the noise-alone responses indicate more intelligible (or less masked) speech. Intelligibility predictions were made in several types of background noise: speech-shaped gaussian noise, AM gaussian noise, and the international speech test signal (Holube et al., 2010). Masking release provided by fluctuations in the maskers was estimated for a group of listeners with either normal hearing or SNHL by including individual audiogram thresholds in the AN model. The predictions were generally successful in describing the trends in masking release across noise backgrounds for both groups of listeners, and for individual listeners with SNHL. Ongoing work will explore whether model updates (see below) will further improve these predictions.
5.5. Binaural Detection and NFC
The psychophysical predictions described above are all for monaural or diotic stimuli. More limited work has applied the NFC modeling framework to dichotic stimuli. Mao and Carney (2014) showed that fluctuation-based cues were also successful in predicting hit and false-alarm rates for listeners in a binaural-detection task, in which a large advantage in detection is provided by a 180-degree phase difference between the tones delivered to the two ears in the presence of a diotic gaussian noise masker (Hirsh, 1948). The difference between the diotic and dichotic thresholds is referred to as a binaural masking level difference, a phenomenon that has received considerable attention from psychophysical modelers (e.g., Colburn and Durlach, 1978; Gilkey and Robinson, 1986; Isabelle and Colburn, 1991). A challenge in this work is to identify the cues that listeners use to gain the binaural advantage in detection. Mao and Carney (2014) showed that the classical binaural cues, interaural time and level differences, or even an optimal linear combination of interaural time (ITD) and level (ILD) differences, are not successful in predicting much of the predictable variance in dichotic detection, using the reproducible noise data of Evilsizer et al. (2002) and Isabelle (1995). These classical cues or their combinations were out-performed by a cue based on the slope of the interaural envelope difference (IED), which was computed based on the difference between responses of modulation filters that were driven by gammatone-filtered stimulus waveforms from each ear. The IED is a straightforward cue in the NF framework, but analysis of this cue using a binaural modulation strategy (van der Heijden and Joris, 2010) showed that it is equivalent to a nonlinear combination of ITD and ILD cues (Mao and Carney, 2014). Specifically, ITDs dominate the IED cue at low modulation depths, and ILDs dominate at higher modulation depths.
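The IED slope cue can be sketched in the stimulus domain (a simplified Python illustration that omits the modulation filters of the published analysis; the tone level and the Butterworth stand-in for a gammatone filter are arbitrary):

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

rng = np.random.default_rng(4)
fs = 20000
t = np.arange(int(0.3 * fs)) / fs

# Diotic narrowband gaussian masker and a 500-Hz target tone
sos = butter(4, [400, 600], btype="bandpass", fs=fs, output="sos")
noise = sosfiltfilt(sos, rng.standard_normal(t.size))
tone = 0.3 * np.sin(2 * np.pi * 500 * t)

def ied_slope(left, right):
    # Mean absolute slope of the interaural envelope difference
    ied = np.abs(hilbert(left)) - np.abs(hilbert(right))
    return np.mean(np.abs(np.diff(ied)))

# N0S0: tone in phase at both ears; N0Spi: tone inverted in one ear
dv_n0s0 = ied_slope(noise + tone, noise + tone)
dv_n0spi = ied_slope(noise + tone, noise - tone)
print(dv_n0s0, dv_n0spi)
```

For the diotic (N0S0) condition the two ears' envelopes are identical and the IED cue is zero; the 180-degree interaural tone phase (N0Spi) creates a fluctuating IED, capturing the binaural detection advantage.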
A further study of binaural detection and the IED cue used a physiological AN model followed by modulation-tuned model neurons and excitatory/inhibitory interactions to estimate IEDs (Mao and Carney, 2015). This study showed that a simple physiologically based IED model could predict the listeners’ hit and false alarm rates for the dichotic conditions in the Evilsizer et al. (2002) reproducible-noise dataset. Note that the model of Mao and Carney (2015) focused on binaural detection based on interaural differences between fluctuation-related cues, and did not test the model IC neurons for sensitivity to classic interaural cues. A follow-up study showed that a simple IC model that performed coincidence detection on excitatory inputs from both ears, combined with an additional inhibitory input driven by the contralateral ear, could explain physiologically realistic ITD and ILD curves, as well as IED sensitivity that predicted the binaural-detection data (Carney, 2019).
6. Efferent Control of Cochlear Gain and the NFC model
The NFC cues along the tonotopic axis that were explored in the physiological and psychophysical studies described above are shaped by peripheral nonlinearities and their interactions (Fig. 1). The amplitude of the cochlear response is determined by nonlinear cochlear gain at a given place in the cochlea, and the cochlear response determines the extent to which IHCs are saturated, or captured, by stimulus components near CF (Zilany and Bruce, 2007). The amplitude and time course of the IHC voltage then determines the state of adaptation of the IHC-AN synapse. Thus, cochlear gain control plays an important part in shaping NF cues, and gain control can modulate NFC. We are using computational models to explore the potential for the MOC efferent system to maintain and enhance NFC by regulating cochlear gain.
The MOC is an important control system within the auditory system. In engineering parlance, control systems are designed to maintain desired properties in an output signal. As such, the inputs to a control system must include signals that carry information relevant to the desired output property. For example, basic “automatic-gain-control” systems are designed to keep the amplitude of an output signal within a certain range, perhaps to avoid saturation of following subsystems. Such control systems must receive inputs that carry amplitude information. This example was chosen because it is the historically assumed framework for the MOC reflex (reviewed in Guinan, 2018; Jennings, 2021).
Maintenance of NFC is a different conceptual framework for the purpose of the MOC efferent system. In this case, the goal is to prevent NFCs from deteriorating in response to either low-level sounds, for which the IHC and IHC-AN synapse would not be saturated in any channels, or high-level sounds, for which saturation might occur across all channels. A control system that maintains NFC would require input signals that carry fluctuation information. Such inputs to the MOC are provided by the large and powerful projections from the central nucleus of the IC to MOC neurons (Brown et al., 2013; Romero and Trussell, 2021, 2022), which are located in the ventral nucleus of the trapezoid body (Warr, 1992; Brown, 2011; Schofield, 2011). Recordings from MOC axons in the cochlea reveal band-pass MTFs, which could be inherited from IC BE inputs (Gummer et al., 1988), or derived de novo by convergent inhibitory and excitatory inputs to MOC neurons (e.g., convergent inhibitory-excitatory interactions can explain AM tuning in the IC, Nelson and Carney, 2004).
We are using computational models to explore the maintenance and enhancement of NFCs by a combination of level-driven inputs from wide-dynamic range neurons that project to MOC neurons (e.g., Ghoshal and Kim, 1997; Ye et al., 2000) and fluctuation-tuned inputs from the IC to MOC neurons (Fig. 4; Farhadi et al., 2023). Wide-dynamic-range inputs to MOC neurons that project to a range of frequency channels in the periphery are critical to maintain the “regional” operating point of the peripheral nonlinearities near the saturation knee points (Farhadi, 2023). By doing so, for a wide range of overall sound levels, the spectral peaks would saturate channels tuned near the peaks, whereas those tuned away from the peaks would not be saturated. (Note that more frequency-specific gain control has the problem of equalizing, or “flattening” the spectrum, a classical problem in designing hearing-aid amplifiers with compression (e.g., Plomp, 1988).) The wide-dynamic-range inputs to the MOC system act as a classical negative-feedback loop: higher-level stimuli result in reduction of the cochlear gain, bringing down the amplitude of the output signal (Farhadi et al., 2023).
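The negative-feedback action of the wide-dynamic-range loop can be caricatured with a sample-by-sample gain control (a deliberately simplified Python sketch; the time constant and gain rule are illustrative and are not those of the published model):

```python
import numpy as np

rng = np.random.default_rng(5)
fs = 10000
# Stimulus: a low-level noise segment followed by a high-level segment
x = np.concatenate([0.1 * rng.standard_normal(fs),
                    2.0 * rng.standard_normal(fs)])

tau = 0.05                       # level-smoothing time constant (s)
alpha = np.exp(-1 / (tau * fs))
level, gain = 0.0, 1.0
y = np.zeros_like(x)
for i, s in enumerate(x):
    y[i] = gain * s
    level = alpha * level + (1 - alpha) * abs(y[i])  # smoothed output level
    gain = 1.0 / (1.0 + 5.0 * level)                 # higher level -> lower gain

rms_low = np.sqrt(np.mean(y[:fs] ** 2))
rms_high = np.sqrt(np.mean(y[fs:] ** 2))
print(rms_low, rms_high)  # output range is compressed relative to the input
```

The loop reduces gain as the smoothed output level rises, compressing a 26-dB input range into a much smaller output range, analogous to keeping peripheral operating points near the saturation knee across overall sound levels.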
Figure 4.
Schematic diagram of computational model with efferent pathways, as described in Farhadi (2023). MOC-driven cochlear gain control combines two feedback signals. Blue: Energy-driven, wide-dynamic-range signals were simulated with LSR AN fibers that project via the cochlear nucleus (CN) to MOC neurons which are hypothesized to innervate a relatively wide range of cochlear locations. Red: Fluctuation-driven signals are initiated by HSR AN fibers that project via the CN and superior olivary complex (SOC) to IC BE cells, which project to MOC neurons that are hypothesized to innervate a narrower range of cochlear locations. Cochlear gain is controlled on a sample-by-sample basis. Responses shown are for model neurons with CF = 2 kHz in response to a 400-ms, 65-dB SPL, 1/3-ERBN gaussian noise centered at 2 kHz. The gain factor modulates the value of the AN model parameter COHC over the time course of the stimulus.
We hypothesize a very different role for the fluctuation-driven inputs to the MOC system (Carney, 2018; Farhadi, 2023; Farhadi et al., 2023). IHCs in frequency channels with deep fluctuations are, in general, not saturated (see Fig. 1). Yet strong fluctuations will ultimately excite MOC neurons that have bandpass modulation tuning, resulting in cochlear gain reduction. Thus, the fluctuating signals to the MOC create a positive feedback loop: fluctuating inputs reduce cochlear gain, resulting in deeper fluctuations. On the other hand, frequency channels tuned near spectral peaks that are captured (and have low NF depths) would provide less excitation to the MOC system, decreasing the reduction of cochlear gain (i.e., restoring cochlear gain towards higher values) and thus increasing the degree of saturation of IHCs. Again, this is positive feedback: reduced fluctuation depths lead to further reduction of fluctuation depths. The combination of fluctuating channels leading to deeper fluctuations, and saturated channels leading to further saturation, results in enhancement of NFC across the population. For this positive feedback signal to be effective in sharpening NFC, it must be more frequency-specific than the wide-dynamic-range-driven negative feedback signal described above (Farhadi, 2023). The existence of both broad and narrow MOC projections to the cochlea has been described (Brown, 2014). Future work is required to elucidate whether these projections are associated with the roles proposed here. Additionally, more measurements of efferent function are required to better determine the functional frequency spans of the feedback projections from the MOC to the cochlea.
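The logic of the positive-feedback loop can be caricatured with a two-channel toy model (a speculative Python sketch, not the published model; the tanh nonlinearity, gain-update rule, and all constants are illustrative):

```python
import numpy as np

# Two channels share a saturating nonlinearity. The fluctuation-driven
# signal reduces gain in the strongly fluctuating (valley) channel,
# deepening its fluctuations, while the captured (peak) channel keeps
# high gain and stays saturated, so across-channel contrast grows.
t = np.linspace(0, 1, 1000, endpoint=False)
env_valley = 1.0 + 0.8 * np.sin(2 * np.pi * 10 * t)  # fluctuating channel
env_peak = 1.0 + 0.1 * np.sin(2 * np.pi * 10 * t)    # captured channel

def fluct_depth(env, gain):
    out = np.tanh(3.0 * gain * env)  # saturating IHC-like nonlinearity
    return (out.max() - out.min()) / (out.max() + out.min())

gains = np.array([1.0, 1.0])
depths0 = np.array([fluct_depth(env_valley, gains[0]),
                    fluct_depth(env_peak, gains[1])])
for _ in range(20):
    d = np.array([fluct_depth(env_valley, gains[0]),
                  fluct_depth(env_peak, gains[1])])
    gains *= np.exp(-0.2 * (d - d.mean()))  # deeper fluctuation -> lower gain
    gains = np.clip(gains, 0.2, 2.0)
depths1 = np.array([fluct_depth(env_valley, gains[0]),
                    fluct_depth(env_peak, gains[1])])
print(depths0, depths1)  # the difference in depths (the contrast) increases
```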
Because the peripheral nonlinearities interact with each other, and the two feedback loops described above also interact, computer simulations are helpful in visualizing the overall effect of the MOC control system. The MOC stage of the model illustrated in Fig. 4 was parametrized based on the dynamics of IC responses, which have rates that increase over the time course of fluctuating AM-noise stimuli (Farhadi et al., 2023). This framework for MOC control of NFC effectively sharpens the representation of complex sounds, such as the NF profile in response to formant peaks in synthetic vowel-like sounds, even in moderate levels of background noise (Farhadi, 2023).
The proposed gain-control framework also explains interesting forward-masking phenomena. For example, a model that includes these efferent pathways, with dynamics that are appropriate for the MOC system, can explain trends in gaussian-noise disruption, which refers to stronger forward masking by fluctuating narrowband noise than by low-fluctuation noise (Svec et al., 2015; Brennan et al., 2023). Additionally, ongoing work is testing the hypothesis that the model with efferents can explain the growth of masking and the time course of recovery from forward masking by tones, even in a roving-level paradigm (Jesteadt et al., 2005). The role of the efferent system in forward masking is consistent with that proposed in previous work (e.g., Jennings et al., 2009; Jennings, 2021). The overshoot effect, which describes increased thresholds for brief tones presented near the onset of a simultaneous masker as compared to delayed tones, is another phenomenon hypothesized to involve the MOC system (Strickland, 2004; Jennings et al., 2011); this effect is also predicted by the computational model with efferents (Farhadi and Carney, 2023). It is interesting to compare the proposed role of the MOC efferent system in maintaining and enhancing NFC to control systems in vision for enhancing contrast in the neural representation of images (Carney, 2018; Frank et al., 2023).
7. Summary and Future Directions
This review introduced the NFC code for complex sounds. Effects of SNHL on NFC were also illustrated. The implications of these effects are of interest for a better understanding of how even mild hearing loss can interfere with this neural code, possibly explaining the difficulties of these listeners in understanding complex sounds, especially in noise. NFC theory provides a new framework for addressing SNHL. Amplification per se will not fully correct NFC cues; instead, stimulus spectra and temporal envelope properties must be manipulated to restore NFC and NF-versus-place cues. The NFC code may also provide a strategy for understanding the physiological mechanisms underlying new tests that have been developed for predicting speech intelligibility in listeners with SNHL, using stimuli with interesting spectral and temporal contrasts (Zaar et al., 2023, 2024). In addition to SNHL, the influence of aging on modulation tuning in the IC (Palombi et al., 2001) would be expected to affect the NFC code and is an important topic for future studies.
The role of efferent gain control in maintaining and enhancing NFC also has implications for understanding and addressing hearing loss. The reduction of NFC (e.g., Figs. 2, 3) and generally deeper fluctuations in responses of the models with SNHL result in a descending signal from the IC to the MOC that would reduce cochlear gain. Thus, the efferent gain control system that would enhance NFC in the healthy ear may actually work to reduce NFC in the ear with SNHL. However, if contrast can be restored in the peripheral NF profile, then the same descending system would have a chance to enhance it. Ongoing physiological work will test predictions for the effects of the efferent feedback on responses at the level of the IC. Pursuing this work in awake animals is critical, as anesthesia is known to suppress efferent function (e.g., Guitton et al., 2004). Another approach to testing the role of the MOC efferent system in processing complex sounds focuses on evoked emissions (Marshall et al., 2014; Goodman et al., 2021).
It is important to note that the computational models used here are not complete. In particular, the IC models used here focused on model neurons that have frequency tuning and sensitivity to AM; however, IC neurons have other important sensitivities, such as to interaural differences and the direction and velocity of frequency sweeps (Steenken et al., 2022; Mitchell et al., 2023). These sensitivities, as well as more accurate binaural and collateral pathways for efferent projections (Brown, 2011; Schofield, 2011), should be included in future efforts. Additionally, the influence of the lateral olivocochlear system on NFC has not been explored and should be included in future studies.
Future tests of the NFC coding hypothesis should include additional predictions of psychophysical tasks that cannot be explained by rate-place coding or by phase-locking to temporal fine structure or to the stimulus envelope. For example, Jackson and Moore (2014) designed several psychophysical tasks involving discrimination of harmonic and inharmonic complex tones to explicitly challenge these classical coding theories, such as by randomizing the phases of stimulus components to disrupt envelope cues. Quantitative predictions of their results based on NFC cues have not yet been made. However, the randomization of component phase designed to make envelope-related cues unreliable has little effect on capture of AN responses, and thus on NFC cues or rate profiles across the population of model IC neurons, suggesting that the NFC code may successfully predict the observed lack of effect of phase randomization on listeners’ performance. Additional physiological recordings at the level of the midbrain are required to determine whether the responses to complex sounds agree with model predictions. Recordings at the level of cortex would be interesting, although the NFC model would generally predict cortical encoding of the spectrum in terms of average rate, largely consistent with the known properties of cortical responses to complex sounds (e.g., Pasley et al., 2012; Steinschneider et al., 2013).
Highlights.
Peripheral nonlinearities have potentially beneficial properties for neural coding of complex sounds.
Interactions of cochlear gain and saturating nonlinearities shape the contrasts in fluctuation depths of auditory-nerve responses along the tonotopic axis, referred to as neural-fluctuation contrast (NFC).
NFC provides a code for the spectrum of complex sounds that is readily decoded by fluctuation-sensitive neurons in the inferior colliculus.
The control of cochlear gain by the medial olivocochlear efferent system, which is driven both by wide-dynamic-range inputs and by fluctuation-sensitive inputs, could maintain and enhance NFC along the tonotopic axis.
Sensorineural hearing loss reduces cochlear gain and thus reduces the effects of saturation in the periphery, impairing the NFC code for complex sounds and interfering with the feedback mechanisms for maintaining this code.
Acknowledgements
Funding: This work was supported by the National Institutes of Health grants R01DC001641 and R01DC010813, and by a Fellowship from the Hanse-Wissenschaftskolleg in Delmenhorst, Germany. The manuscript benefited from thoughtful comments from Emanuela Assenza, Daniel Guest, Swapna Agarwalla, David Cameron, Elizabeth Strickland, and Afagh Farhadi, as well as from Brian Moore and an anonymous reviewer. MATLAB code used for simulations is available at https://osf.io/6bsnt/ and https://urhear.urmc.rochester.edu/webapps/home/.
Glossary – List of Abbreviations:
- AN: auditory nerve
- AM: amplitude modulation
- BE: band-enhanced
- BS: band-suppressed
- CF: characteristic frequency
- DV: decision variable
- F0: fundamental frequency
- F1, F2, F3: first, second, third formant
- HSR: high spontaneous rate
- IHC: inner hair cell
- IED: interaural envelope difference
- ILD: interaural level difference
- ITD: interaural time difference
- LSR: low spontaneous rate
- MOC: medial olivocochlear
- MTF: modulation transfer function
- NF: neural fluctuation
- NFC: neural fluctuation contrast
- PA: profile analysis
- SNHL: sensorineural hearing loss
- SPL: sound pressure level
- TFS: temporal fine structure
- WDR: wide dynamic range
References
- Allen EJ, Oxenham AJ, 2014. Symmetric interactions and interference between pitch and timbre. J. Acoust. Soc. Am 135, 1371–1379.
- Bandyopadhyay S, Young ED, 2004. Discrimination of voiced stop consonants based on auditory nerve discharges. J. Neurosci 24, 531–541. DOI: 10.1523/JNEUROSCI.4234-03.2004
- Bharadwaj H, Verhulst S, Shaheen K, Liberman MC, Shinn-Cunningham BG, 2014. Cochlear neuropathy and the coding of supra-threshold sound. Frontiers Sys. Neurosci 26, 1–18.
- Bianchi F, Carney LH, Dau T, Santurette S, 2019. Effects of musical training and hearing loss on fundamental frequency discrimination and temporal fine structure processing: Psychophysics and modeling. JARO 20, 263–277.
- Bisgaard N, Vlaming MS, Dahlquist M, 2010. Standard audiograms for the IEC 60118–15 measurement procedure. Trends in Ampl. 14, 113–120. DOI: 10.1177/1084713810379609
- Brennan MA, Svec A, Farhadi A, Maxwell BN, Carney LH, 2023. Inherent envelope fluctuations in forward masking: Effects of age and hearing loss. J. Acoust. Soc. Am 153, 1994–1994.
- Brown MC, 2011. Anatomy of Olivocochlear Neurons. In: Ryugo DK, Fay RR, Popper AN (eds) Auditory and Vestibular Efferents. Springer, New York, pp. 17–37.
- Brown MC, 2014. Single-unit labeling of medial olivocochlear neurons: the cochlear frequency map for efferent axons. J. Neurophys 111, 2177–2186.
- Brown MC, Mukerji S, Drottar M, Windsor AM, Lee DJ, 2013. Identification of inputs to olivocochlear neurons using transneuronal labeling with pseudorabies virus (PRV). JARO 14, 703–717.
- Bruce IC, Sachs MB, Young ED, 2003. An auditory-periphery model of the effects of acoustic trauma on auditory nerve responses. J. Acoust. Soc. Am 113, 369–388.
- Carney LH, 2018. Supra-threshold hearing and fluctuation profiles: implications for sensorineural and hidden hearing loss. JARO 19, 331–352.
- Carney LH, 2019. Reconsidering binaural phenomena in terms of interaural neural fluctuation differences. Proc. Int’l Congress on Acoustics, Aachen, Germany.
- Carney LH, 2022. Challenging the neural-fluctuation model for pitch using harmonic complexes in background noise. Proc. Int’l Symposium on Hearing, Lyon, France, June, 2022.
- Carney LH, Cameron DA, Kinast KB, Feld CE, Schwarz DM, Leong UC, McDonough JM, 2023. Effects of sensorineural hearing loss on formant-frequency discrimination: Measurements and models. Hear. Res 435, 108788.
- Carney LH, Heinz MG, Evilsizer ME, Gilkey RH, Colburn HS, 2002. Auditory phase opponency: A temporal model for masked detection at low frequencies. Acta Acustica United with Acustica 88, 334–347.
- Carney LH, Kim DO, Kuwada S, 2016. Speech Coding in the Midbrain: Effects of Sensorineural Hearing Loss, in Physiology, Psychoacoustics and Cognition in Normal and Impaired Hearing, Springer. Advances in Exp. Med and Biol 894:427–435.
- Carney LH, Maxwell BN, Richards VM, 2022. Nonlinearity in Hearing: The Role of Inner-Hair-Cell Saturation in Neural Coding. Proc. Mechanics of Hearing Workshop.
- Carney LH, Li T, McDonough JM, 2015. Speech coding in the brain: representation of vowel formants by midbrain neurons tuned to sound fluctuations. Eneuro 2: ENEURO-0004.
- Carney LH, McDonough JM, 2019. Nonlinear auditory models yield new insights into representations of vowels. Atten. Percept. Psychophys 81, 1034–1046.
- Cedolin L, Delgutte B, 2010. Spatiotemporal representation of the pitch of harmonic complex tones in the auditory nerve. J. Neurosci 30, 12712–12724.
- Colburn HS, Carney LH, Heinz MG, 2003. Quantifying the information in auditory-nerve responses for level discrimination. JARO 4, 294–311.
- Colburn HS, Durlach NI, 1978. Models of binaural interaction. Handbook of Perception 4, 467–518.
- Dau T, Kollmeier B, Kohlrausch A, 1997a. Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers. J. Acoust. Soc. Am 102, 2892–2905.
- Dau T, Kollmeier B, Kohlrausch A, 1997b. Modeling auditory processing of amplitude modulation. II. Spectral and temporal integration. J. Acoust. Soc. Am 102, 2906–2919.
- Davidson SA, Gilkey RH, Colburn HS, Carney LH, 2006. Binaural detection with narrowband and wideband reproducible noise maskers: III. Models for monaural and diotic detection. J. Acoust. Soc. Am 119, 2258–2275.
- Davidson SA, Gilkey RH, Colburn HS, Carney LH, 2009. An evaluation of models for diotic and dichotic detection in reproducible noises. J. Acoust. Soc. Am 126, 1906–1925.
- de Cheveigné A, 1993. Separation of concurrent harmonic sounds: Fundamental frequency estimation and a time-domain cancellation model of auditory processing. J. Acoust. Soc. Am 93, 3271–3290.
- de Cheveigné A, 2023. In-channel cancellation: A model of early auditory processing. J. Acoust. Soc. Am 153, 3350–3350.
- Delgutte B, 1987. Peripheral auditory processing of speech information: implications from a physiological study of intensity discrimination. In: Schouten ME (ed) The Psychophysics of Speech Perception. Springer, Netherlands, pp. 333–353.
- Delgutte B, Kiang NY, 1984. Speech coding in the auditory nerve: I. Vowel-like sounds. J. Acoust. Soc. Am 75, 866–878.
- Deng L, Geisler CD, 1987. Responses of auditory-nerve fibers to nasal consonant–vowel syllables. J. Acoust. Soc. Am 82, 1977–1988.
- Deng L, Geisler CD, Greenberg S, 1987. Responses of auditory-nerve fibers to multiple-tone complexes. J. Acoust. Soc. Am 82, 1989–2000.
- Encina-Llamas G, Harte JM, Dau T, Shinn-Cunningham B, Epp B, 2019. Investigating the effect of cochlear synaptopathy on envelope following responses using a model of the auditory nerve. JARO 20, 363–382.
- Evilsizer ME, Gilkey RH, Mason CR, Colburn HS, Carney LH, 2002. Binaural detection with narrowband and wideband reproducible noise maskers: I. Results for human. J. Acoust. Soc. Am 111, 336–345.
- Fan L, Henry KS, Carney LH, 2021. Responses to diotic tone-in-noise stimuli in the inferior colliculus: Stimulus envelope and neural fluctuation cues. Hear. Res 409, 108328.
- Farhadi A, 2023. Modeling the Medial Olivocochlear Efferent in the Descending Auditory Pathway with a Dynamic Gain Control Feedback System. PhD Dissertation, University of Rochester, Rochester, NY. https://www.urmc.rochester.edu/MediaLibraries/URMCMedia/labs/carney-lab/documents/afagh-farhadi-thesis.pdf
- Farhadi A, Carney LH, 2023. Predicting thresholds in an auditory overshoot paradigm using a computational subcortical model with efferent feedback. IEEE-WASPAA Proceedings.
- Farhadi A, Jennings SG, Strickland EA, Carney LH, 2023. Subcortical auditory model including efferent dynamic gain control with inputs from cochlear nucleus and inferior colliculus. J. Acoust. Soc. Am 154, 3644–3659.
- Fletcher H, 1940. Auditory patterns. Reviews of Modern Physics 12, 47.
- Florentine M, Buus SR, 1981. An excitation-pattern model for intensity discrimination. J. Acoust. Soc. Am 70, 1646–1654.
- Frank MM, Sitko AA, Suthakar K, Cadenas LT, Hunt M, Yuk MC, Weisz CJC, Goodrich LV, 2023. Experience-dependent flexibility in a molecularly diverse central-to-peripheral auditory feedback system. Elife 12, e83855.
- Ghoshal S, Kim DO, 1997. Marginal shell of the anteroventral cochlear nucleus: single-unit response properties in the unanesthetized decerebrate cat. J. Neurophys 77, 2083–2097.
- Gilkey RH, Robinson DE, 1986. Models of auditory masking: A molecular psychophysical approach. J. Acoust. Soc. Am 79, 1499–1510.
- Gockel H, Moore BC, Plack CJ, Carlyon RP, 2006. Effect of noise on the detectability and fundamental frequency discrimination of complex tones. J. Acoust. Soc. Am 120, 957–965.
- Goodman SS, Boothalingam S, Lichtenhan JT, 2021. Medial olivocochlear reflex effects on amplitude growth functions of long- and short-latency components of click-evoked otoacoustic emissions in humans. J. Neurophys 125, 1938–1953.
- Green DM, 1988. Profile analysis: Auditory intensity discrimination (No. 13). Oxford University Press, Oxford.
- Guinan JJ Jr., 2018. Olivocochlear efferents: Their action, effects, measurement and uses, and the impact of the new conception of cochlear mechanical responses. Hear. Res 362, 38–47.
- Guitton MJ, Avan P, Puel JL, Bonfils P, 2004. Medial olivocochlear efferent activity in awake guinea pigs. Neuroreport 15, 1379–1382.
- Gummer M, Yates GK, Johnstone BM, 1988. Modulation transfer function of efferent neurones in the guinea pig cochlea. Hear. Res 36, 41–51.
- Hall JW, Haggard MP, Fernandes MA, 1984. Detection in noise by spectro-temporal pattern analysis. J. Acoust. Soc. Am 76, 50–56.
- Hamza Y, Farhadi A, Schwarz DM, McDonough JM, Carney LH, 2023. Representations of fricatives in subcortical model responses: Comparisons with human consonant perception. J. Acoust. Soc. Am 154, 602–618. DOI: 10.1121/10.0020536
- Hartmann WM, Cariani PA, Colburn HS, 2019. Noise edge pitch and models of pitch perception. J. Acoust. Soc. Am 145, 1993–2008.
- Heeringa AN, Jüchter C, Beutelmann R, Klump GM, Köppl C, 2023. Altered neural encoding of vowels in noise does not affect behavioral vowel discrimination in gerbils with age-related hearing loss. Frontiers in Neurosci. 17. DOI: 10.3389/fnins.2023.1238941
- Heeringa AN, Köppl C, 2022. Auditory nerve fiber discrimination and representation of naturally-spoken vowels in noise. Eneuro 9(1).
- Heinz MG, Colburn HS, Carney LH, 2001a. Evaluating auditory performance limits: I. One-parameter discrimination using a computational model for the auditory nerve. Neural Comp. 13, 2273–2316.
- Heinz MG, Colburn HS, Carney LH, 2001b. Rate and timing cues associated with the cochlear amplifier: level discrimination based on monaural cross-frequency coincidence detection. J. Acoust. Soc. Am 110, 2065–2084.
- Heinz MG, Swaminathan J, Boley JD, Kale S, 2010. Across-fiber coding of temporal fine-structure: Effects of noise-induced hearing loss on auditory-nerve responses. In: The Neurophysiological Bases of Auditory Perception. Springer, New York, pp. 621–630.
- Henry KS, Abrams KS, Forst J, Mender M, Neilans EG, Idrobo F, Carney LH, 2017. Midbrain synchrony to envelope structure supports behavioral sensitivity to single-formant vowel-like sounds in noise. JARO 18, 165–181.
- Henry KS, Wang Y, Abrams KS, Carney LH, 2023. Mechanisms of masking by Schroeder-phase harmonic tone complexes in the budgerigar (Melopsittacus undulatus). Hear. Res 108812.
- Hillenbrand J, Getty LA, Clark MJ, Wheeler K, 1995. Acoustic characteristics of American English vowels. J. Acoust. Soc. Am 97, 3099–3111.
- Hirsh IJ, 1948. The influence of interaural phase on interaural summation and inhibition. J. Acoust. Soc. Am 20, 536–544.
- Holube I, Fredelake S, Vlaming MSMG, Kollmeier B, 2010. Development and analysis of an International Speech Test Signal (ISTS). Int. J. Audiol 49, 891–903.
- Horbach M, Verhey JL, Hots J, 2018. On the pitch strength of bandpass noise in normal-hearing and hearing-impaired listeners. Trends in Hear. 22, 2331216518787067.
- Isabelle SK, 1995. Binaural detection performance using reproducible stimuli. Ph.D. thesis, Boston University, Boston, MA.
- Isabelle SK, Colburn HS, 1991. Detection of tones in reproducible narrow-band noise. J. Acoust. Soc. Am 89, 352–359.
- Jackson HM, Moore BCJ, 2014. The role of excitation-pattern and temporal-fine-structure cues in the discrimination of harmonic and frequency-shifted complex tones. J. Acoust. Soc. Am 135, 1356–1570.
- Jesteadt W, Schairer KS, Neff DL, 2005. Effect of variability in level on forward masking and on increment detection. J. Acoust. Soc. Am 118, 325–337.
- Jennings SG, 2021. The role of the medial olivocochlear reflex in psychophysical masking and intensity resolution in humans: A review. J. Neurophys 125, 2279–2308.
- Jennings SG, Heinz MG, Strickland EA, 2011. Evaluating adaptation and olivocochlear efferent feedback as potential explanations of psychophysical overshoot. JARO 12, 345–360.
- Jennings SG, Strickland EA, Heinz MG, 2009. Precursor effects on behavioral estimates of frequency selectivity and gain in forward masking. J. Acoust. Soc. Am 125, 2172–2181.
- Jepsen ML, Ewert SD, Dau T, 2008. A computational model of human auditory signal processing and perception. J. Acoust. Soc. Am 124, 422–438.
- Johnson DH, 1980. The relationship between spike rate and synchrony in responses of auditory-nerve fibers to single tones. J. Acoust. Soc. Am 68, 1115–1122.
- Jørgensen S, Dau T, 2011. Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing. J. Acoust. Soc. Am 130, 1475–1487.
- Joris PX, Schreiner CE, Rees A, 2004. Neural processing of amplitude-modulated sounds. Physiol. Rev 84, 541–577.
- Joris PX, Yin TC, 1992. Responses to amplitude-modulated tones in the auditory nerve of the cat. J. Acoust. Soc. Am 91, 215–232.
- Kale S, Heinz MG, 2010. Envelope coding in auditory nerve fibers following noise-induced hearing loss. JARO 11, 657–673.
- Kay RH, 1982. Hearing of modulation in sounds. Physiol. Rev 62, 894–975.
- Kidd G Jr., Mason CR, Brantley MA, Owen GA, 1989. Roving-level tone-in-noise detection. J. Acoust. Soc. Am 86, 1310–1317.
- Kim DO, Carney LH, Kuwada S, 2020. Amplitude modulation transfer functions reveal opposing populations within both the inferior colliculus and medial geniculate body. J. Neurophys 124, 1198–1215.
- Kohlrausch A, Fassel R, Van Der Heijden M, Kortekaas R, Van De Par S, Oxenham AJ, Püschel D, 1997. Detection of tones in low-noise noise: Further evidence for the role of envelope fluctuations. Acta Acustica united with Acustica 83, 659–669.
- Krishna BS, Semple MN, 2000. Auditory temporal processing: responses to sinusoidally amplitude-modulated tones in the inferior colliculus. J. Neurophys 84, 255–273.
- Langner G, Schreiner CE, 1988. Periodicity coding in the inferior colliculus of the cat. I. Neuronal mechanisms. J. Neurophys 60, 1799–1822.
- Lawson JL, Uhlenbeck GE, 1950. Threshold signals. Volume 24 of Radiation Laboratory Series, McGraw-Hill, New York.
- Lentz JJ, Richards VM, Matiasek MR, 1999. Different auditory filter bandwidth estimates based on profile analysis, notched noise, and hybrid tasks. J. Acoust. Soc. Am 106, 2779–2792.
- Leong U, Schwarz DM, Carney LH, 2020. Sensorineural hearing loss diminishes use of temporal envelope cues: Evidence from roving-level tone-in-noise detection. Ear & Hear. 41, 1009–1019.
- Li YH, Joris PX, 2021. Temporal correlates to monaural edge pitch in the distribution of interspike interval statistics in the auditory nerve. Eneuro 8(4).
- Li YH, Joris PX, 2023. Case reopened: A temporal basis for harmonic pitch templates in the early auditory system? J. Acoust. Soc. Am 154, 3986–4003.
- Liberman MC, 1978. Auditory-nerve response from cats raised in a low-noise chamber. J. Acoust. Soc. Am 63, 442–455.
- Mao J, Carney LH, 2014. Binaural detection with narrowband and wideband reproducible noise maskers: IV. Models using time, level, and envelope differences. J. Acoust. Soc. Am 135, 824–837.
- Mao J, Carney LH, 2015. Tone-in-noise detection using envelope cues: Comparison of signal-processing-based and physiological models. JARO 16, 121–133.
- Mao J, Koch K-J, Doherty KA, Carney LH, 2015. Cues for diotic and dichotic detection of a 500-Hz tone in noise vary with hearing loss. JARO 16, 507–521.
- Mao J, Vosoughi A, Carney LH, 2013. Predictions of diotic tone-in-noise detection based on a nonlinear optimal combination of energy, envelope, and fine-structure cues. J. Acoust. Soc. Am 134, 396–406.
- Marshall L, Lapsley Miller JA, Guinan JJ, Shera CA, Reed CM, Perez ZD, Delhorne LA, Boege P, 2014. Otoacoustic-emission-based medial-olivocochlear reflex assays for humans. J. Acoust. Soc. Am 136, 2697–2713.
- Maxwell BN, Fritzinger J, Carney LH, 2020a. Neural mechanisms for timbre: Spectral-centroid discrimination based on a model of midbrain neurons. Timbre2020 Proc.
- Maxwell BN, Fritzinger J, Carney LH, 2021. A new auditory theory and its implications for the study of timbre. Future Directions of Music Cognition Proc.
- Maxwell BN, Richards VM, Carney LH, 2020b. Neural fluctuation cues for simultaneous notched-noise masking and profile-analysis tasks: Insights from model midbrain responses. J. Acoust. Soc. Am 147, 3523–3537.
- Meddis R, Hewitt MJ, 1991a. Virtual pitch and phase sensitivity of a computer model of the auditory periphery. I: Pitch identification. J. Acoust. Soc. Am 89, 2866–2882.
- Meddis R, Hewitt MJ, 1991b. Virtual pitch and phase sensitivity of a computer model of the auditory periphery. II: Phase sensitivity. J. Acoust. Soc. Am 89, 2883–2894.
- Miller RL, Schilling JR, Franck KR, Young ED, 1997. Effects of acoustic trauma on the representation of the vowel /ε/ in cat auditory nerve fibers. J. Acoust. Soc. Am 101, 3602–3616.
- Millman RE, Mattys SL, Gouws AD, Prendergast G, 2017. Magnified neural envelope coding predicts deficits in speech perception in noise. J. Neurosci 37, 7727–7736.
- Mitchell PW, Henry KS, Carney LH, 2023. Sensitivity to direction and velocity of fast frequency chirps in the inferior colliculus of awake rabbit. Hear. Res 440, 108915.
- Moore BCJ, 1975. Mechanisms of masking. J. Acoust. Soc. Am 57, 391–399.
- Moore BCJ, 1990. Co-modulation masking release: spectro-temporal pattern analysis in hearing. Brit. J. Audiol 24, 131–137.
- Moore BCJ, 1995. Frequency analysis and masking. In: Moore BCJ (ed) Hearing. Academic Press, Orlando, FL, pp. 161–205.
- Moore BCJ, 2003. An Introduction to the Psychology of Hearing, 5th Ed. Academic Press, San Diego.
- Nelson PC, Carney LH, 2004. A phenomenological model of peripheral and central neural responses to amplitude-modulated tones. J. Acoust. Soc. Am 116, 2173–2186.
- Nelson PC, Carney LH, 2007. Neural rate and timing cues for detection and discrimination of amplitude-modulated tones in the awake rabbit inferior colliculus. J. Neurophys 97, 522–539.
- Osses Vecchi A, Kohlrausch A, 2021. Perceptual similarity between piano notes: Simulations with a template-based perception model. J. Acoust. Soc. Am 149, 3534–3552.
- Palombi PS, Backoff PM, Caspary DM, 2001. Responses of young and aged rat inferior colliculus neurons to sinusoidally amplitude modulated stimuli. Hear. Res 153, 174–180.
- Pasley BN, David SV, Mesgarani N, Flinker A, Shamma SA, Crone NE, Knight RT, Chang EF, 2012. Reconstructing speech from human auditory cortex. PLoS Biol. 10, e1001251.
- Patterson RD, Moore BCJ, 1986. Auditory filters and excitation patterns as representations of frequency resolution. In: Moore BCJ (ed) Frequency Selectivity in Hearing. Academic Press, London, pp. 123–177.
- Patterson RD, 1976. Auditory filter shapes derived with noise stimuli. J. Acoust. Soc. Am 59, 640–654.
- Plomp R, 1988. The negative effect of amplitude compression in multichannel hearing aids in the light of the modulation-transfer function. J. Acoust. Soc. Am 83, 2322–2327.
- Relaño-Iborra H, Dau T, 2022. Speech intelligibility prediction based on modulation frequency-selective processing. Hear. Res 108610.
- Relaño-Iborra H, Zaar J, Dau T, 2019. A speech-based computational auditory signal processing and perception model. J. Acoust. Soc. Am 146, 3306–3317.
- Richards VM, 1992a. The detectability of a tone added to narrow bands of equal-energy noise. J. Acoust. Soc. Am 91, 3424–3435.
- Richards VM, 1992b. The effects of level uncertainty on the detection of a tone added to narrow bands of noise. In: Auditory Physiology and Perception. Pergamon, pp. 337–343.
- Richards VM, Carney LH, 2019. Potential cues for the “level discrimination” of a noise band in the presence of flanking bands. J. Acoust. Soc. Am 145, EL442–EL448.
- Romero GE, Trussell LO, 2021. Distinct forms of synaptic plasticity during ascending vs descending control of medial olivocochlear efferent neurons. Elife 10, e66396.
- Romero GE, Trussell LO, 2022. Central circuitry and function of the cochlear efferent systems. Hear. Res 425, 108516.
- Rose JE, Brugge JF, Anderson DJ, Hind JE, 1967. Phase-locked response to low-frequency tones in single auditory nerve fibers of the squirrel monkey. J. Neurophys 30, 769–793.
- Sachs MB, Bruce IC, Miller RL, Young ED, 2002. Biological basis of hearing-aid design. Annals of Biomedical Engineering 30, 157–168.
- Sachs MB, Young ED, 1980. Effects of nonlinearities on speech encoding in the auditory nerve. J. Acoust. Soc. Am 68, 858–875.
- Schacknow PN, Raab DH, 1973. Intensity discrimination of tone bursts and the form of the Weber function. Percept. & Psycho 14, 449–450.
- Schofield BR, 2011. Central descending auditory pathways. In: Ryugo DK, Fay RR, Popper AN (eds) Auditory and Vestibular Efferents. Springer, New York, pp. 261–290.
- Shamma SA, 1985. Speech processing in the auditory system. I: The representation of speech sounds in the responses of the auditory nerve. J. Acoust. Soc. Am 78, 1612–1621.
- Shamma S, Klein D, 2000. The case of the missing pitch templates: How harmonic templates emerge in the early auditory system. J. Acoust. Soc. Am 107, 2631–2644.
- Siebert WM, 1965. Some implications of the stochastic behavior of primary auditory neurons. Kybernetik 2, 206–215.
- Steenken F, Oetjen H, Beutelmann R, Carney LH, Köppl C, Klump GM, 2022. Neural processing and perception of Schroeder-phase harmonic tone complexes in the gerbil: Relating single-unit neurophysiology to behavior. European J. Neurosci 56, 4060–4085.
- Steinschneider M, Nourski KV, Fishman YI, 2013. Representation of speech in human auditory cortex: is it special? Hear. Res 305, 57–73.
- Strickland EA, 2004. The temporal effect with notched-noise maskers: analysis in terms of input–output functions. J. Acoust. Soc. Am 115, 2234–2245.
- Svec A, Dubno JR, Nelson PB, 2015. Effects of inherent envelope fluctuations in forward maskers for listeners with normal and impaired hearing. J. Acoust. Soc. Am 137, 1336–1343.
- van der Heijden M, Joris PX, 2010. Interaural correlation fails to account for detection in a classic binaural task: Dynamic ITDs dominate N0Spi detection. JARO 11, 113–131.
- Viemeister NF, 1983. Auditory intensity discrimination at high frequencies in the presence of noise. Science 221, 1206–1208.
- Viemeister NF, 1988. Intensity coding and the dynamic range problem. Hear. Res 34, 267–274.
- Warr WB, 1992. Organization of olivocochlear efferent systems in mammals. In: Webster DB, Popper AN, Fay RR (eds) The Mammalian Auditory Pathway: Neuroanatomy. Springer, New York, pp. 410–448.
- Winter IM, Palmer AR, 1991. Intensity coding in low-frequency auditory-nerve fibers of the guinea pig. J. Acoust. Soc. Am 90, 1958–1967.
- Ye Y, Machado DG, Kim DO, 2000. Projection of the marginal shell of the anteroventral cochlear nucleus to olivocochlear neurons in the cat. J. Comp. Neurol 420, 127–138.
- Young ED, 2008. Neural representation of spectral and temporal information in speech. Phil. Trans. R. Soc. B 363, 923–945.
- Young ED, Sachs MB, 1979. Representation of steady-state vowels in the temporal aspects of the discharge patterns of populations of auditory-nerve fibers. J. Acoust. Soc. Am 66, 1381–1403.
- Zaar J, Carney LH, 2022. Predicting speech intelligibility in hearing-impaired listeners using a physiologically inspired auditory model. Hear. Res 426, 108553.
- Zaar J, Simonsen LB, Dau T, Laugesen S, 2023. Toward a clinically viable spectro-temporal modulation test for predicting supra-threshold speech reception in hearing-impaired listeners. Hear. Res 427, 108650.
- Zaar J, Simonsen LB, Laugesen S, 2024. A spectro-temporal modulation test for predicting speech reception in hearing-impaired listeners with hearing aids. Hear. Res DOI: 10.1016/j.heares.2024.108949
- Zhang X, Heinz MG, Bruce IC, Carney LH, 2001. A phenomenological model for the responses of auditory-nerve fibers: I. Nonlinear tuning with compression and suppression. J. Acoust. Soc. Am 109, 648–670.
- Zilany MSA, Bruce IC, 2007. Representation of the vowel /ε/ in normal and impaired auditory nerve fibers: model predictions of responses in cats. J. Acoust. Soc. Am 122, 402–417.
- Zilany MSA, Bruce IC, Carney LH, 2014. Updated parameters and expanded simulation options for a model of the auditory periphery. J. Acoust. Soc. Am 135, 283–286.