The Journal of the Acoustical Society of America. 2020 Dec 10;148(6):3581–3597. doi: 10.1121/10.0002879

Effects of noise precursors on the detection of amplitude and frequency modulation for tones in noise

Juraj Mesik 1,a), Magdalena Wojtczak 1,b)
PMCID: PMC8097715  PMID: 33379905

Abstract

Recent studies on amplitude modulation (AM) detection for tones in noise reported that AM-detection thresholds improve when the AM stimulus is preceded by a noise precursor. The physiological mechanisms underlying this AM unmasking are unknown. One possibility is that adaptation to the level of the noise precursor facilitates AM encoding by causing a shift in neural rate-level functions to optimize level encoding around the precursor level. The aims of this study were to investigate whether such a dynamic-range adaptation is a plausible mechanism for the AM unmasking and whether frequency modulation (FM), thought to be encoded via AM, also exhibits the unmasking effect. Detection thresholds for AM and FM of tones in noise were measured with and without a fixed-level precursor. Listeners showing the unmasking effect were then tested with the precursor level roved over a wide range to modulate the effect of adaptation to the precursor level on the detection of the subsequent AM. It was found that FM detection benefits from a precursor and the magnitude of FM unmasking correlates with that of AM unmasking. Moreover, consistent with dynamic-range adaptation, the unmasking magnitude weakens as the level difference between the precursor and simultaneous masker of the tone increases.

I. INTRODUCTION

Extraction of spectrotemporal modulation patterns present within natural acoustic inputs is a key function of the auditory system as these signals carry the meaningful information necessary for communication, object identification, and other auditory tasks. Importantly, in complex natural environments, signals of interest are often embedded in a rich mixture of acoustic signals that hinder the extraction of target information via energetic masking in the auditory periphery, as well as modulation masking (e.g., Bacon and Grantham, 1989; Dau et al., 1997a,b; Wojtczak, 2011) and informational masking in more central neural loci (e.g., Kidd et al., 2003; Roverud et al., 2016; Rennies et al., 2019). The auditory system needs to flexibly adjust response properties throughout its processing hierarchy to both enhance the representation fidelity of the target signals and suppress the representation of the interfering noise sources. Elucidation of the mechanisms involved in these functions is key for both understanding how acoustic signals are converted into perception and how their representation is altered by the presence of hearing loss or by bypassing the cochlea in cochlear-implant users.

Recently, a few studies have reported that amplitude modulation (AM) detection for tonal and narrowband-noise carriers presented in a simultaneous noise masker can be improved by the presentation of a noise precursor (Almishaal et al., 2017; Jennings et al., 2018; Marrufo-Pérez et al., 2018a; Wojtczak et al., 2019). Consistent with the idea that an improved AM representation in the target leads to better speech understanding (Jørgensen and Dau, 2011; Jørgensen et al., 2013; Jørgensen et al., 2015), it has been shown that delaying words from the onset of a background noise presented ipsilaterally (Ben-David et al., 2012; Ben-David et al., 2016; Marrufo-Pérez et al., 2018b; Marrufo-Pérez et al., 2020) and contralaterally (Marrufo-Pérez et al., 2018b) results in better word recognition than when onsets of words and noise maskers coincide.

Although frequency glides are ubiquitous in most environmental acoustic stimuli, including speech, the effect of precursors on frequency modulation (FM) detection has received much less attention. FM detection is thought to be based on detecting AM at the output of cochlear filters through which the stimulus sweeps during an FM cycle. This FM coding strategy has been referred to as FM-to-AM conversion (Zwicker, 1970; Moore and Sek, 1992, 1994, 1995; Whiteford and Oxenham, 2015; Moore et al., 2019). For low carrier frequencies, the FM-to-AM conversion is thought to be the dominant neural code for detecting fast-rate (≳10 Hz) FM because of a presumed sluggishness of temporal fine-structure processing (Grantham and Wightman, 1978; Moore and Sek, 1995, 1996). For high carrier frequencies, FM is thought to be detected via FM-to-AM conversion for all FM rates because temporal fine-structure cues are unreliable above the upper frequency limit of neural phase locking (Joris and Verschooten, 2013; Verschooten et al., 2019). Byrne et al. (2013) measured FM-detection thresholds for a 4-kHz tonal carrier and found that delaying the onset of FM relative to the onset of the carrier improved performance for FM delays up to a few hundred milliseconds. Effects of a noise precursor on FM detection for tones in noise, i.e., in conditions comparable to those used in the studies of the effect of a precursor on AM detection, have not been measured. Since the detection of FM is thought to be based on FM-to-AM conversion, FM detection should benefit from mechanisms facilitating AM detection at least for relatively high (≳10 Hz) FM rates.

The mechanisms involved in the facilitation of AM detection (referred to hereafter as AM unmasking) due to wideband precursors remain unknown. Wojtczak et al. (2019) provided evidence that auditory grouping (Bregman, 1990) based on the similarity of acoustic features between the simultaneous masker and the precursor could not explain this phenomenon. In that study, the unmasking effect was comparable in size for a precursor that consisted of a different sample of the same band of Gaussian noise as that used for the masker and for a precursor that was distinct from the masker (a seven-tone complex with different perceptual quality from the noise masker).

A frequently invoked mechanism of AM unmasking due to precursors is the activation of medial olivocochlear (MOC) efferent feedback. MOC efferents, which synapse directly onto outer hair cells in the cochlea, turn down cochlear gain after a sufficient amount of time is allowed for the medial olivocochlear reflex (MOCR) to build up (Backus and Guinan, 2006; Guinan, 2006; Lopez-Poveda, 2018). One of the main effects of the MOCR is to improve the representation of transient sounds presented in noise by adjusting the cochlear gain and restoring the dynamic range of auditory-nerve fibers (Winslow and Sachs, 1988; Kawase et al., 1993). The MOCR is hypothesized to play an important role for enhancing speech representation in noise as speech is rich in transients and envelope fluctuations. Indeed, the build-up time for MOCR strength appears to correspond to the time course of improvement in word recognition with an increased delay of words from the noise onset (Ben-David et al., 2016). In general, however, experimental support for the MOCR-based hypothesis is weak. While some studies show that greater MOCR suppression of otoacoustic emissions (indicating more effective cochlear gain control by the MOC efferent feedback) is related to better speech recognition in noise (Kumar and Vanaja, 2004; Mishra and Lutman, 2014; Bidelman and Bhagat, 2015), others report the opposite relationship (de Boer et al., 2012). Marrufo-Pérez et al. (2018b) and Marrufo-Pérez et al. (2019) showed that cochlear-implant users exhibit improvements in both AM detection and word recognition in noise due to a noise precursor, although MOCR effects on cochlear responses are absent in electric hearing. For words in noise, the size of the precursor effect was similar to that observed for normal-hearing listeners presented with tone-vocoded stimuli (Marrufo-Pérez et al., 2018b). 
For AM detection, the improvement in cochlear-implant users (Marrufo-Pérez et al., 2019) was comparable to or somewhat smaller than that in normal-hearing listeners (Almishaal et al., 2017; Jennings et al., 2018; Wojtczak et al., 2019). For AM tones presented in noise, Wojtczak et al. (2019) found no significant MOCR effects on stimulus-frequency otoacoustic emissions (SFOAEs) for stimuli comparable to those for which robust behavioral AM unmasking was observed in that study. The lack of a relationship between the presence of unmasking and the presence of MOCR effects suggests that other mechanisms must play a role.

An alternative explanation often considered in the context of AM unmasking is that neural adaptation to stimulus-level statistics, typically referred to as dynamic-range adaptation, facilitates AM coding when the AM stimulus is preceded by a precursor. Dynamic-range adaptation is characterized by a shift of neural rate-level functions toward the most common level in a dynamically varying acoustic stimulus. The shift allows an adapted neural fiber to encode a change in stimulus intensity that would otherwise be undetected due to saturation of the fiber's firing rate (e.g., see Fig. 1 in Dean et al., 2005). Studies using animal models have shown direct evidence of dynamic-range adaptation from neural recordings at different sites throughout the auditory pathway from the auditory nerve (Wen et al., 2009, 2012) through the inferior colliculus (IC; Dean et al., 2005; Dean et al., 2008; Robinson et al., 2016; Rocchi and Ramachandran, 2018) and the auditory cortex (Watkins and Barbour, 2008). Two recent studies investigated dynamic-range adaptation of cortical responses in humans using magnetoencephalographic (MEG; Herrmann et al., 2018) and electroencephalographic (EEG; Herrmann et al., 2020) measures. The latter study also incorporated a behavioral measure of AM detection, but because of differences in stimulus design, their results cannot be directly related to the AM unmasking by a fixed-level precursor immediately preceding the target AM sound.
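The consequence of such a shift for AM coding can be illustrated with a toy rate-level function. The sketch below is only a conceptual illustration, not a model fitted to any data: the sigmoidal shape, the 5-dB slope constant, the 100 spikes/s maximum rate, and the three midpoint values are all assumptions introduced here. It compares the firing-rate swing between the AM peak and the AM trough when the midpoint of the function lies well below, at, or well above the stimulus level.

```python
import math


def rate(level_db, midpoint_db, slope_db=5.0, max_rate=100.0):
    """Sigmoidal rate-level function in spikes/s (all parameters are illustrative)."""
    return max_rate / (1.0 + math.exp(-(level_db - midpoint_db) / slope_db))


def am_rate_swing(mean_level_db, m, midpoint_db):
    """Firing-rate difference between the AM peak and the AM trough."""
    peak_db = mean_level_db + 20.0 * math.log10(1.0 + m)
    trough_db = mean_level_db + 20.0 * math.log10(1.0 - m)
    return rate(peak_db, midpoint_db) - rate(trough_db, midpoint_db)


stimulus_level_db = 55.0  # carrier level used in the study (dB SPL)
m = 0.3                   # hypothetical modulation index

# Midpoint far below the stimulus level: the fiber is close to saturation.
unadapted = am_rate_swing(stimulus_level_db, m, midpoint_db=30.0)
# Midpoint shifted to the stimulus level: the steep part of the function is used.
adapted = am_rate_swing(stimulus_level_db, m, midpoint_db=55.0)
# Midpoint shifted too far: the stimulus falls near the floor of the function.
mismatched = am_rate_swing(stimulus_level_db, m, midpoint_db=75.0)

print(f"unadapted {unadapted:.1f}, adapted {adapted:.1f}, mismatched {mismatched:.1f} spikes/s")
```

In this toy example, the rate swing is largest when the midpoint coincides with the stimulus level and shrinks when the midpoint is displaced in either direction.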

This study was conducted with two aims. One aim was to investigate whether the unmasking by a noise precursor extends to relatively brief bursts of FM. Our hypothesis was that for an FM tone presented with simultaneous noise, a noise precursor should facilitate AM detection in auditory filters swept by the stimulus frequency during each FM cycle and, thus, should have an unmasking effect on FM coded via FM-to-AM conversion. The second aim was to investigate the role of dynamic-range adaptation in AM unmasking by implementing a level-rove paradigm to manipulate the mismatch between the level of the precursor and the level of the simultaneous noise masker of an AM tone. Our hypothesis was that due to dynamic-range adaptation, a precursor level most similar to the level of the simultaneous noise masker should result in a shift in neural rate-level functions that optimizes coding of the AM of tones embedded in the masker. In contrast, precursor levels that are most mismatched with the level of the masker should lead to suboptimal shifts of neural rate-level functions and, consequently, a diminished ability of the fibers to encode the AM. These mismatched shifts should yield poorer AM-detection performance. The level-rove paradigm was designed to use different precursor levels that were fixed throughout the precursor duration. The underlying assumption was that each fixed-level precursor represents an adapting stimulus with an infinitesimally narrow level distribution that induces dynamic-range adaptation within each trial. In this design, trials with a precursor level matching the level of the masker mimicked conditions with a precursor used in previous behavioral studies on AM unmasking (Marrufo-Pérez et al., 2018a; Marrufo-Pérez et al., 2019; Wojtczak et al., 2019).

II. EXPERIMENT 1: DETECTION OF AM AND FM OF A TONE IN NOISE WITH AND WITHOUT A PRECURSOR

A. Rationale

For tones modulated in frequency, auditory filters swept through by the stimulus frequency over the course of the modulation cycle convert the frequency sweep into AM at the filters' outputs. It has been shown that FM-to-AM conversion is likely used as a code for FM detection for modulation rates above ∼10 Hz (e.g., Moore and Sek, 1994, 1995, 1996; Moore et al., 2019). This experiment was designed to test the hypothesis that for conditions in which detection of AM and FM is based on a shared code, i.e., AM at the output of cochlear filters, detection of both types of modulation can be improved by a noise precursor, and these improvements should be correlated with each other across listeners.
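The FM-to-AM conversion can be sketched numerically under a quasi-static approximation, in which the output level of a filter simply follows the filter gain at the instantaneous frequency of the tone. The sketch below rests on several assumptions that are not part of the study: a symmetric roex(p) filter shape, bandwidths taken from the Glasberg and Moore (1990) ERB formula, an illustrative frequency deviation of 30 Hz, and a hypothetical off-frequency filter centered at 1.1 kHz.

```python
import math


def erb_hz(fc_hz):
    """ERB of the auditory filter (Glasberg and Moore, 1990), with fc in Hz."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def roex_gain_db(f_hz, cf_hz):
    """Power gain (dB) of a symmetric roex(p) filter whose ERB equals 4*cf/p."""
    p = 4.0 * cf_hz / erb_hz(cf_hz)
    g = abs(f_hz - cf_hz) / cf_hz
    return 10.0 * math.log10((1.0 + p * g) * math.exp(-p * g))


def output_level_db(carrier_hz, delta_f_hz, fm_rate_hz, cf_hz, n=200):
    """Quasi-static filter output level sampled over one FM cycle."""
    levels = []
    for k in range(n):
        t = k / (n * fm_rate_hz)
        f_inst = carrier_hz + delta_f_hz * math.sin(2.0 * math.pi * fm_rate_hz * t)
        levels.append(roex_gain_db(f_inst, cf_hz))
    return levels


# 1-kHz carrier, 20-Hz FM, illustrative deviation of 30 Hz.
on_freq = output_level_db(1000.0, 30.0, 20.0, cf_hz=1000.0)
off_freq = output_level_db(1000.0, 30.0, 20.0, cf_hz=1100.0)

on_swing = max(on_freq) - min(on_freq)
off_swing = max(off_freq) - min(off_freq)
print(f"on-frequency swing {on_swing:.1f} dB, off-frequency swing {off_swing:.1f} dB")
```

With these assumed parameters, the filter centered above the carrier shows a larger level fluctuation than the on-frequency filter because the tone sweeps along the steeper part of the filter skirt.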

B. Methods

1. Listeners

A total of 70 normal-hearing listeners (22 male, 48 female), aged between 18 and 64 years old [mean = 22.69 yr, standard deviation (SD) = 8.17 yr, median = 20 yr], participated in the study. AM-detection tasks were completed by 61 of the recruited participants while 30 participants completed FM-detection tasks. Among these listeners, a smaller number (21) were able to commit to participation over a longer time period and completed both the AM and FM tasks. Participants were recruited from the population of students and employees at the University of Minnesota and were screened for normal hearing by administering pure-tone audiometry at octave frequencies between 250 and 8000 Hz prior to the beginning of the experiment. Normal hearing was defined by air-conduction thresholds ≤20 dB hearing level (HL) at all the octave frequencies. In this, and the subsequent experiment, all participants provided written informed consent, and the experimental procedures were approved by the Institutional Review Board of the University of Minnesota. Listeners received monetary compensation for their participation.

2. Stimuli

Threshold modulation depths needed for the detection of 20-Hz AM and 20-Hz FM were measured for pure-tone carriers with frequencies of 1 and 4 kHz. Because the temporal fine structure could contribute to FM coding at low to medium but not at high carrier frequencies, the 1-kHz carrier was chosen because it falls within the range of frequencies for which auditory-nerve fibers exhibit phase-locked responses, and the 4-kHz carrier was chosen because it likely falls above the limit of phase locking in humans (Joris and Verschooten, 2013; Verschooten et al., 2019). Based on studies by Moore and Sek (1992, 1995, 1996), the 20-Hz FM should be detected via FM-to-AM conversion for both carrier frequencies despite robust phase-locked neural responses at 1 kHz. The tonal carriers had a duration of 100 ms, including 5-ms Hanning onset and offset ramps, and were modulated throughout their duration, thus comprising two modulation cycles. The starting phases of the modulations (AM and FM) were chosen randomly from the range of 0 to 2π radians. The carriers were presented at 55 dB sound pressure level (SPL) and were temporally centered in a 110-ms simultaneous two-octave-wide threshold equalizing noise (TEN; Moore et al., 2000). For both tonal carriers, the TEN was centered on the respective carrier frequency. In the precursor condition, the TEN masker followed a 250-ms TEN precursor (different noise sample) after a 15-ms silent interval. The masker and the precursor durations included 5-ms Hanning onset/offset ramps. For the AM detection, the level of the TEN was set individually for each participant to produce relatively poor performance that remained below ceiling (threshold <100% AM depth).
This was done to optimize the AM unmasking effect due to a precursor as our pilot data indicated that the effect tends to decrease or disappear when the AM-detection threshold measured in noise approaches that for the AM of the tones presented in quiet, leaving little or no room for AM unmasking. For many listeners, the signal-to-noise ratio (SNR) was set to 25 dB and remained at this level. However, some listeners (32 of 61) needed a lower SNR if their AM thresholds in the no-precursor condition were low (approaching the thresholds in quiet), and no reliable AM unmasking was observed at the SNR of 25 dB. A few listeners required a slightly higher SNR as they could not detect the full (100%) AM for the SNR of 25 dB. For each listener, the same SNR was used for conditions with and without the noise precursor. For FM detection, carriers were presented 15 dB above the level of the TEN (SNR = 15 dB) because a ceiling performance was not a concern as all listeners could detect the FM for frequency deviations that were a fraction of the carrier frequency.
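The modulated carriers described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, the carrier starting phase is fixed at zero, the amplitude is left in arbitrary units (no calibration to 55 dB SPL and no level normalization of the AM tone), and the TEN masker and precursor are omitted.

```python
import math
import random

FS = 48000  # sampling rate in Hz, as reported for the playback system


def hann_ramped(samples, ramp_ms=5.0, fs=FS):
    """Apply raised-cosine (Hanning) onset and offset ramps to a list of samples."""
    n_ramp = int(round(ramp_ms * 1e-3 * fs))
    out = list(samples)
    for i in range(n_ramp):
        w = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        out[i] *= w
        out[-1 - i] *= w
    return out


def am_tone(fc, m, fmod=20.0, dur=0.1, phase=0.0, fs=FS):
    """Sinusoidally amplitude-modulated tone with modulation index m (0 to 1)."""
    n = int(round(dur * fs))
    x = [(1.0 + m * math.sin(2.0 * math.pi * fmod * k / fs + phase))
         * math.sin(2.0 * math.pi * fc * k / fs) for k in range(n)]
    return hann_ramped(x, fs=fs)


def fm_tone(fc, delta_f, fmod=20.0, dur=0.1, phase=0.0, fs=FS):
    """Tone whose instantaneous frequency is fc + delta_f*sin(2*pi*fmod*t + phase)."""
    n = int(round(dur * fs))
    beta = delta_f / fmod  # modulation index of the FM
    x = [math.sin(2.0 * math.pi * fc * k / fs
                  - beta * math.cos(2.0 * math.pi * fmod * k / fs + phase))
         for k in range(n)]
    return hann_ramped(x, fs=fs)


# Random modulation starting phase, two modulation cycles in 100 ms.
start_phase = random.uniform(0.0, 2.0 * math.pi)
am = am_tone(1000.0, m=0.5, phase=start_phase)
fm = fm_tone(4000.0, delta_f=90.0, phase=start_phase)
```

The modulation depth and frequency deviation used in the last two lines are arbitrary illustrative values, not thresholds from the experiment.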

All stimuli were generated in MATLAB (The MathWorks, Natick, MA) on a personal computer (PC) and played using an E22 soundcard (LynxStudio, Costa Mesa, CA) with 24-bit resolution and a sampling rate of 48 kHz. The stimuli were delivered to both ears (diotically) through Sennheiser HD650 headphones (Sennheiser, Old Lyme, CT).

3. Procedure

The AM- and FM-detection thresholds were measured using a two-interval two-alternative forced-choice (2AFC) procedure with a three-down one-up adaptive tracking technique, estimating the 79.4% correct point on the psychometric function (Levitt, 1971). The procedure was implemented in the AFC toolbox for MATLAB (The MathWorks, Natick, MA; Ewert, 2013; Ewert and Dau, 2017). In each trial, participants heard two tones embedded in TEN, separated by a 500-ms silent interstimulus interval. One interval was randomly selected to contain an unmodulated tone while the other interval contained an identical tone except it was modulated in amplitude (AM detection) or frequency (FM detection). Listeners were asked to indicate which of the two intervals contained modulation by a key press on the computer keyboard or via a mouse click. The observation intervals were marked by flashing colored boxes, and visual feedback, indicating the correct response, was provided after each trial.
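The 79.4% figure follows from the equilibrium condition of the transformed up-down rule (Levitt, 1971): the track stops drifting when a downward step (n consecutive correct responses) is as likely as an upward step, i.e., when p^n = 0.5. A short check of this arithmetic, with a hypothetical function name:

```python
def up_down_target(n_down):
    """Proportion correct tracked by an n-down one-up rule: solves p**n_down == 0.5."""
    return 0.5 ** (1.0 / n_down)


three_down = up_down_target(3)  # rule used in this study
two_down = up_down_target(2)    # shown for comparison only

print(f"3-down 1-up tracks {100 * three_down:.1f}% correct")
print(f"2-down 1-up tracks {100 * two_down:.1f}% correct")
```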

For AM detection, the modulation depth was tracked in decibel units, 20 log10(m), where m is the modulation index with a value between 0 and 1. The adaptive track used steps of 6 dB for the first two reversals, 4 dB for the subsequent two reversals, and 2 dB for the remaining eight reversals. Modulation depths at the final eight reversals were averaged to compute the threshold from a single run. When the adaptive procedure called for a modulation depth greater than 0 dB (100% AM), the modulation depth was set to 0 dB and the track continued. Adaptive tracks that exceeded 100 trials in length were terminated and no threshold was recorded. The final AM-detection threshold was obtained by averaging thresholds from three runs. If the SD of the mean from the three runs exceeded 3 dB, three more runs were performed, and the final threshold estimate was obtained by averaging all single-run thresholds that fell within 3 median absolute deviations (MADs) of the median threshold (i.e., excluding outlier thresholds). Using the MAD for outlier detection is preferable to using the SD because the MAD is more robust to outliers (Leys et al., 2013).
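The threshold bookkeeping described above can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than statements from the text: the variability criterion is implemented as the sample SD across the three runs, and the MAD is scaled by 1.4826 (the convention in Leys et al., 2013, and the default of MATLAB's isoutlier).

```python
import statistics


def am_step_size_db(n_reversals_so_far):
    """Step size of the AM track: 6 dB, then 4 dB, then 2 dB."""
    if n_reversals_so_far < 2:
        return 6.0
    if n_reversals_so_far < 4:
        return 4.0
    return 2.0


def clamp_depth_db(depth_db):
    """Modulation depth cannot exceed 0 dB (100% AM)."""
    return min(depth_db, 0.0)


def run_threshold_db(reversal_depths_db, n_final=8):
    """Single-run threshold: mean depth at the final eight reversals."""
    return statistics.mean(reversal_depths_db[-n_final:])


def needs_more_runs(first_three_db, criterion_db=3.0):
    """Assumption: the criterion is the sample SD across the three runs."""
    return statistics.stdev(first_three_db) > criterion_db


def final_threshold_db(run_thresholds_db, n_mads=3.0, scale=1.4826):
    """Mean of the runs lying within n_mads scaled MADs of the median."""
    med = statistics.median(run_thresholds_db)
    mad = scale * statistics.median([abs(x - med) for x in run_thresholds_db])
    kept = [x for x in run_thresholds_db if abs(x - med) <= n_mads * mad]
    return statistics.mean(kept)


# Hypothetical single-run thresholds (dB); the last value is an outlier.
runs = [-10.0, -11.0, -9.0, -10.5, -9.5, -25.0]
estimate = final_threshold_db(runs)
print(f"final threshold: {estimate:.2f} dB")
```

In this made-up example, the run at -25 dB lies more than three scaled MADs from the median and is excluded, so the estimate is the mean of the remaining five runs.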

For FM detection, the procedure was similar to that of the AM-detection task except the adaptive tracking was applied to a deviation from the carrier frequency defined in dB units as 10 log10(Δf), where Δf indicates a deviation in Hz from the carrier frequency. The adaptive tracking used an initial step size of 3 dB, which was reduced to 1.5 dB after the first two reversals, and to 0.75 dB after another two reversals. The 0.75-dB step was used for the remaining eight reversals, and the values at these reversals were averaged to estimate the FM-detection threshold from each run. Thresholds from three runs were averaged to obtain the final estimate, and no exclusion criterion was applied as the thresholds were generally less variable across runs than those for AM detection. Both AM- and FM-detection thresholds were measured with and without the noise precursor at the two carrier frequencies, 1 and 4 kHz. The four conditions (two carrier frequencies × precursor/no-precursor) were presented in a random order for a given modulation type. Listeners who performed the tasks for both AM and FM completed the AM portion of the experiment prior to starting the FM conditions. During the experiment, listeners were seated comfortably in a single-walled sound-attenuating booth.
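Because the FM track operates on 10 log10(Δf), each step multiplies or divides Δf by a fixed ratio, and averaging in these units corresponds to a geometric mean in Hz. The sketch below illustrates the conversion; the function names and the three single-run values are hypothetical.

```python
import math
import statistics


def delta_f_to_db(delta_f_hz):
    """Tracking variable of the FM task: 10*log10 of the deviation in Hz."""
    return 10.0 * math.log10(delta_f_hz)


def db_to_delta_f(value_db):
    """Inverse conversion back to a frequency deviation in Hz."""
    return 10.0 ** (value_db / 10.0)


# Multiplicative change in delta-f produced by each step size of the track.
step_ratios = {step: db_to_delta_f(step) for step in (3.0, 1.5, 0.75)}

# Averaging in dB units is equivalent to taking a geometric mean in Hz.
runs_db = [14.0, 14.5, 15.0]  # hypothetical single-run thresholds
geometric_mean_hz = db_to_delta_f(statistics.mean(runs_db))

print(f"0.75-dB step changes delta-f by a factor of {step_ratios[0.75]:.3f}")
print(f"6.9 dB corresponds to {db_to_delta_f(6.9):.1f} Hz")
```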

C. Results and discussion

Individual shifts in AM-detection thresholds due to the noise precursor are shown in Fig. 1 for the 1-kHz carrier [Fig. 1(a)] and the 4-kHz carrier [Fig. 1(b)]. Shifts in the negative direction indicate improvement in AM detection due to the precursor (AM unmasking), whereas shifts in the positive direction indicate a worse performance with than without the precursor. Note that a greater negative shift (lower negative number) indicates a greater unmasking effect. The majority of listeners exhibited AM unmasking with an average improvement in AM detection of ∼4 dB for the 1-kHz carrier and ∼2 dB for the 4-kHz carrier. These shifts in threshold are comparable to those observed by Marrufo-Pérez et al. (2018a) and Jennings et al. (2018) but are generally smaller than those reported by Almishaal et al. (2017) and Wojtczak et al. (2019) especially for the 4-kHz carrier frequency. A two-way repeated-measures analysis of variance (ANOVA) with main factors of condition (precursor vs no-precursor) and carrier frequency showed a significant effect of precursor [F(1,60) = 121.3, p < 0.001], indicating that the precursor facilitated AM detection. The effect of the carrier frequency was also significant [F(1,60) = 4.4, p = 0.04], indicating higher overall thresholds for the 4-kHz carrier. Finally, there was a significant interaction between the effects of the noise precursor and carrier frequency [F(1,60) = 30.4, p < 0.001]. A post hoc t-test showed that the interaction reflected a significantly greater unmasking for the 1-kHz carrier than for the 4-kHz carrier [t(60) = 5.5, p < 0.001].

FIG. 1.

Individual shifts in AM-detection threshold due to a noise precursor for a 1-kHz (a) and 4-kHz (b) carrier. Negative values indicate lower thresholds (better performance) with the precursor. The horizontal dashed lines indicate the mean AM unmasking and the shaded areas between dotted lines denote ±1 standard error (SE) of the mean.

The effect of carrier frequency on AM unmasking was not observed by Wojtczak et al. (2019). Different parameters of the stimuli may have contributed to this discrepancy. Specifically, both studies used two modulation cycles, but the modulation rate was lower in this study (20 Hz vs 40 Hz), resulting in a longer stimulus duration (100 ms vs 50 ms). A frequency-dependent decay time of the physiological effect underlying the AM unmasking could lead to these different outcomes. Dean et al. (2008) showed that the buildup time of dynamic-range adaptation was inversely related to the characteristic frequency (CF) of neurons in the IC. It is feasible that the decay time is similarly related to the CF. The dependence on frequency might be more easily observed for the longer stimuli used here as more recovery would occur over the course of their duration. In addition, in all of the previous studies of AM unmasking (Almishaal et al., 2017; Marrufo-Pérez et al., 2018a; Jennings et al., 2018; Wojtczak et al., 2019), there was no silent gap between the noise precursor and the masker with the modulated/unmodulated carrier. The longer carrier duration and the 15-ms silent gap used in this experiment might have facilitated a greater recovery from the precursor effect. The gap was introduced to ensure that AM unmasking can be observed for the temporal parameters of the stimuli used in experiment 2 in which the role of dynamic-range adaptation was investigated. This issue will be discussed in Sec. III B 2.

Although most listeners exhibited AM unmasking, a few exhibited no precursor effect or an effect in the opposite direction (a worse performance with a precursor). This pattern is consistent with previous studies, which reported significant effects of noise precursors at the group level, but a lack of effects or effects in the opposite direction in a subset of listeners (see Jennings et al., 2018; Marrufo-Pérez et al., 2018a). One reason for the lack of the effects could be a suboptimal choice of the SNR. In the study by Wojtczak et al. (2019), the SNR was adjusted to produce poor performance in the no-precursor condition because good performance resulted in little or no effect of the precursor. Similarly, for words in noise, Marrufo-Pérez et al. (2018b) showed that good performance in the no-precursor condition was associated with a smaller precursor effect. In this study, up to three SNR adjustments were performed for listeners whose thresholds in the no-precursor condition were low (indicating good performance), and no AM unmasking due to a precursor was observed. However, when thresholds in the no-precursor condition were high (in the range of −6 to −1.5 dB) and no AM unmasking or worsening of AM detection was observed, no attempts to optimize the SNR were made.

Figure 2 shows the relationship between AM-detection thresholds in the no-precursor condition and shifts in threshold due to the noise precursor. Note that to improve sensitivity to a possible linear relationship between the two measures, we identified outlier data points (ones differing from the median of each measure in either direction by more than three MADs) using MATLAB's “isoutlier” function and excluded them from analysis. This led to the exclusion of two data points from both the 1-kHz and 4-kHz data. In this and subsequent correlation plots, outlier points are shown as light red circles.

FIG. 2.

(Color online) The relationship between AM-detection thresholds without a noise precursor and threshold shifts due to a precursor for 1-kHz (a) and 4-kHz (b) carriers. The dashed lines denote the best fitting linear function relating the two measures. Faint red circles indicate outlier data points.

For the 1-kHz carrier, there was a significant negative correlation between the size of AM unmasking and the baseline (no-precursor) performance (r = −0.28, p = 0.029). However, despite a comparable range of thresholds in the baseline condition for the 4-kHz carrier, the effect of the precursor was independent of the baseline performance (r = −0.09, p = 0.49).

Overall, for the relatively long (100-ms) bursts of AM with the 20-Hz rate, the unmasking effect was observed both at 1 and 4 kHz, but the effect appeared less robust at the higher carrier frequency.

Individual shifts in FM-detection thresholds for the 1- and 4-kHz carriers due to a noise precursor are shown in Figs. 3(a) and 3(b), respectively. For FM, it was hard to gauge what constitutes poor performance when choosing the SNR because the worst performance in an FM-detection task is limited by Δf that is equal to the carrier frequency. Pilot data from three listeners for tones presented in quiet showed average FM-detection thresholds [in units of 10 log10(Δf)] of 6.9 (SD = 1.2) dB for the 1-kHz carrier and 14.2 (SD = 1.1) dB for the 4-kHz carrier. These thresholds translate to geometric-mean frequency deviations of Δf = 4.9 Hz and Δf = 26.1 Hz for the 1- and 4-kHz carriers, respectively. It was decided that the SNR producing FM-detection thresholds that are substantially increased by the simultaneous noise masker would provide room for FM unmasking. For the tones presented in a noise masker without a precursor, FM-detection thresholds averaged across listeners were 14.4 (SD = 1.7) dB for the 1-kHz carrier and 19.5 (SD = 1.8) dB for the 4-kHz carrier. When converted to frequency deviations from the carrier, the geometric-mean thresholds were 27.6 Hz and 89.2 Hz for the 1- and 4-kHz carriers, respectively. In both cases, the thresholds in noise were also higher than those observed in quiet for comparable carrier and modulation frequencies in previous studies. Moore and Sek (1992) showed FM-detection thresholds around Δf of ∼5 Hz for a 1-kHz carrier, and Byrne et al. (2013) showed the thresholds to be around 20 Hz for a 4-kHz carrier, in agreement with our limited pilot data. Higher FM-detection thresholds result in a greater Δf/f_mod and, consequently, in a long-term spectrum of the FM tone with multiple relatively intense sidebands. The sidebands occur at frequencies of f_c ± n·f_mod, where n is a positive integer, and f_c and f_mod denote the carrier frequency and the modulation rate, respectively.
To investigate the possibility that listeners detected FM in the tones in noise by detecting the sidebands in the long-term spectrum, we calculated the spectra for the FM tones corresponding to the average thresholds and the equivalent rectangular bandwidths (ERBs; Glasberg and Moore, 1990) of auditory filters at the two carrier frequencies. The ERBs were 133 and 456 Hz, for the 1- and 4-kHz tones, respectively. The first sidebands in the spectra of the FM tones that fell outside of the ERB were −36 dB and −84 dB, relative to the level of the 1- and 4-kHz carriers, respectively. Sidebands that were further away from the carriers decayed rapidly in level. Because the FM tones were presented at 15 dB SNR, the sidebands outside the auditory filters tuned to the carrier frequencies that fell within the band of the two-octave noise masker were completely masked by the noise. The more remote sidebands, which fell outside of the noise bandwidth, had levels below hearing thresholds. Thus, it is unlikely that the FM was detected using differences between the long-term spectra of the modulated and unmodulated carriers. Most listeners showed FM unmasking due to a precursor for both carrier frequencies. The mean shifts in threshold in the log-transformed tracking units were −1.19 (SD = 1.31) dB and −2.04 (SD = 1.51) dB for the 1- and 4-kHz carriers, respectively. A two-way repeated-measures ANOVA performed on the log-transformed shifts in frequency deviation at threshold with the main factors of condition (precursor vs no-precursor) and carrier frequency showed a significant effect of the precursor [F(1,29) = 58.9, p < 0.001]. There was also a significant effect of the carrier frequency [F(1,29) = 230.0, p < 0.001], indicating a significantly larger Δf value at FM-detection thresholds for the 4-kHz than for the 1-kHz carrier. Finally, there was a significant interaction between the carrier frequency and noise precursor [F(1,29) = 8.1, p = 0.008]. 
A post hoc t-test revealed that the magnitude of unmasking was greater for the 4-kHz carrier than for the 1-kHz carrier [t(29) = −2.8, p = 0.008].
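The ERB values quoted above can be reproduced from the Glasberg and Moore (1990) formula, ERB = 24.7(4.37F + 1) with F in kHz. The sketch below also locates the lowest-order sideband that falls outside the ERB, under the assumption (introduced here, not stated in the text) that the ERB is treated as a rectangular band centered symmetrically on the carrier. It does not compute sideband levels, which depend on the duration and gating of the stimulus.

```python
import math


def erb_hz(fc_hz):
    """ERB of the auditory filter (Glasberg and Moore, 1990), with fc in Hz."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def first_sideband_outside_erb(fc_hz, fmod_hz=20.0):
    """Order n and offset (Hz) of the first sideband beyond half an ERB from fc.

    Assumes a rectangular, ERB-wide band centered symmetrically on the carrier.
    """
    half_band_hz = erb_hz(fc_hz) / 2.0
    n = math.floor(half_band_hz / fmod_hz) + 1
    return n, n * fmod_hz


for fc, delta_f in ((1000.0, 27.6), (4000.0, 89.2)):
    n, offset = first_sideband_outside_erb(fc)
    beta = delta_f / 20.0  # modulation index at the mean threshold in noise
    print(f"fc = {fc:.0f} Hz: ERB = {erb_hz(fc):.0f} Hz, beta = {beta:.2f}, "
          f"first sideband outside ERB: n = {n} (fc ± {offset:.0f} Hz)")
```

Under these assumptions, the first sidebands outside the ERB lie 80 Hz from the 1-kHz carrier and 240 Hz from the 4-kHz carrier.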

FIG. 3.

(Color online) Individual shifts in FM-detection threshold for a 1-kHz (a) and a 4-kHz (b) carrier presented in noise due to a noise precursor. The horizontal dashed lines and the shaded areas between dotted lines represent the mean and ±1 SE of the mean, respectively.

Unmasking, expressed as a change in Δf in Hz between the no-precursor and precursor conditions, was statistically greater for the 4-kHz than for the 1-kHz carrier. However, this increase reflects increased absolute bandwidths of auditory filters with an increasing center frequency (Glasberg and Moore, 1990). Although the relative bandwidth decreases somewhat with an increasing center frequency (Shera et al., 2002; Oxenham and Shera, 2003), the tuning of cochlear filters estimated based on psychophysical simultaneous-masking data is characterized by constant relative bandwidths (i.e., constant Q_ERB = ERB/CF) for CFs above ∼1 kHz (Glasberg and Moore, 1990). Because of that, we compared the magnitudes of FM unmasking expressed as a fraction of the corresponding carrier frequencies. For this relative unmasking, we no longer found a reliable difference in FM unmasking between the two carrier frequencies [t(29) = −1.7, p = 0.11].

To investigate whether the size of FM unmasking depends on the threshold in the no-precursor condition, we plotted the relationships between FM-detection thresholds without the precursor and the unmasking magnitude, which is shown in Fig. 4. The MAD metric identified no outliers and, therefore, all data were included in the correlations for both carrier frequencies. Unlike for AM unmasking, no significant correlation with performance in the no-precursor condition was found for the 1-kHz carrier (r = −0.06, p = 0.74), but there was a significant correlation between the two variables for the 4-kHz carrier (r = −0.45, p = 0.014).

FIG. 4.

(Color online) The relationship between FM-detection thresholds without a noise precursor and threshold shifts with a precursor for the 1-kHz (a) and 4-kHz (b) carriers. The dashed lines denote the best fitting linear function relating the two measures.

Across the two types of modulation, we found that a noise precursor provides behavioral benefits in detection of both 20-Hz AM and FM for both the 1- and 4-kHz carriers. For AM, the results are in line with prior studies that demonstrated AM unmasking for higher modulation rates of 40 and 50 Hz (Jennings et al., 2018; Marrufo-Pérez et al., 2018a; Wojtczak et al., 2019) and replicate the effect for a 20-Hz modulation rate (Almishaal et al., 2017). Our results also extend these previous observations of AM unmasking to FM, likely due to FM being detected via AM at the output of the cochlear filters swept by the fluctuating frequency.

A direct comparison between the magnitudes of AM and FM unmasking would require an estimation of cochlear filter bandwidths for each listener and making some assumptions about how the fluctuations in the filter outputs combine over the FM cycle (Henry et al., 2017). Since auditory filters were not measured in this study, using an auditory model to make such predictions for individual listeners would not be informative without using reasonable constraints. It has been suggested that when detecting FM, listeners may weigh changes on the low-frequency side of the excitation pattern more heavily because they are larger due to a steeper slope on that side (Zwicker, 1956; Moore and Sek, 1992). The fact that the AM unmasking effect was larger at 1 kHz than at 4 kHz while the opposite was true for FM does not indicate that the results are inconsistent across the two modulation types. Cochlear filters increase in absolute bandwidth (in Hz) between 1 and 4 kHz while their relative bandwidths (related to the center frequency) decrease with increasing frequency (Shera et al., 2002; Oxenham and Shera, 2003). The dependence of cochlear filter bandwidths on frequency could produce the apparently inconsistent findings. However, if FM and AM unmasking are mediated by the same mechanism, the shifts in thresholds for both types of modulation would be expected to be positively correlated at both carrier frequencies. Figure 5 shows the relationship between AM and FM unmasking for the subset of listeners (n = 21) who provided data for both types of modulation. As in the other correlation analyses, we identified outlier data points and excluded them from analysis to prevent outliers from driving correlations or lack thereof. This led to the exclusion of no data points for the 1-kHz carrier and two data points for the 4-kHz carrier.

FIG. 5.

(Color online) The relationship between AM and FM unmasking magnitudes for the 1-kHz (a) and 4-kHz (b) carriers. The dashed lines denote the best fitting linear function relating the two measures. Faint red circles indicate outlier data points. Note that different tracking variables were used in the two experiments and, thus, the unmasking dB values cannot be directly compared.

The correlations between AM and FM unmasking magnitudes were significant for both carrier frequencies (1 kHz, r = 0.41, p = 0.033, right-tailed; 4 kHz, r = 0.58, p = 0.004, right-tailed). This is consistent with the notion that the same mechanism underlies AM and FM unmasking. The fact that the correspondence in AM and FM unmasking was found at both carrier frequencies supports the notion that for the 20-Hz modulation rate, FM was detected via FM-to-AM conversion despite robust phase locking of auditory-nerve responses at 1 kHz (Joris and Verschooten, 2013). This result is consistent with conclusions from the studies by Moore and Sek (1995, 1996) and Moore et al. (2019) that detection of FM for higher modulation rates (≳10 Hz) does not utilize fine-structure cues even when the carrier frequency is below the limit of phase locking.
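The outlier screening and correlation analysis can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: the criterion of three scaled median absolute deviations (MADs), the 1.4826 scaling factor, and all function names are assumptions, and the data in the usage lines are made up. A right-tailed p value would then be obtained from the t distribution with n − 2 degrees of freedom, which is left to a statistics library.

```python
import math
import statistics


def mad_outlier_mask(values, criterion=3.0):
    """Flag values farther than `criterion` scaled MADs from the median.

    The 1.4826 factor makes the MAD comparable to a standard deviation for
    normally distributed data; the criterion of 3 is an assumption.
    """
    med = statistics.median(values)
    mad = 1.4826 * statistics.median([abs(v - med) for v in values])
    return [abs(v - med) > criterion * mad for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screened_correlation(am, fm, criterion=3.0):
    """Drop listeners flagged on either measure, then correlate the rest."""
    flagged = zip(mad_outlier_mask(am, criterion), mad_outlier_mask(fm, criterion))
    keep = [not (a or f) for a, f in flagged]
    am_kept = [v for v, k in zip(am, keep) if k]
    fm_kept = [v for v, k in zip(fm, keep) if k]
    n = len(am_kept)
    r = pearson_r(am_kept, fm_kept)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))  # t statistic with n - 2 df
    return r, t, n


# Made-up unmasking values; the last listener is an outlier on the AM measure.
am_unmasking = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 50.0]
fm_unmasking = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3, 2.0]
r, t, n = screened_correlation(am_unmasking, fm_unmasking)
```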

In summary, our results replicate previous findings of AM unmasking and demonstrate that a similar phenomenon also affects FM detection, at least at modulation rates for which FM is encoded via FM-to-AM conversion. Indeed, in our data, the magnitudes of the two effects were correlated, which is consistent with a shared underlying mechanism.

III. EXPERIMENT 2: THE EFFECT OF LEVEL ROVE ON PRECURSOR EFFECTS

A. Rationale

This experiment was conducted to test the hypothesis that dynamic-range adaptation contributes to the modulation unmasking effects shown in experiment 1 and those reported in previous studies (Almishaal et al., 2017; Jennings et al., 2018; Marrufo-Pérez et al., 2018a; Marrufo-Pérez et al., 2019; Wojtczak et al., 2019). The unmasking has been observed with the target AM tone or speech sound (words) presented in noise that followed a fixed-level precursor with a duration of a few hundred milliseconds. The precursors in these human studies, which have been hypothesized to induce dynamic-range adaptation, differed from the stimuli typically used in animal physiological studies of this type of adaptation (Dean et al., 2005; Dean et al., 2008; Watkins and Barbour, 2008, 2011; Wen et al., 2009, 2012; Robinson et al., 2016; Rocchi and Ramachandran, 2018).

In the animal studies, shifts in rate-level functions were observed using adapting stimuli that consisted of sequences of short-burst (∼50 ms) stimuli with no silent gaps between them. The intensities for the sound bursts within the adapting sequence were drawn from carefully titrated distributions that contained narrow high-probability regions centered on different (typically, one low and one high) intensities. After exposure to the adapting stimuli with high-probability regions centered on high intensities, neurons showed a shift in rate-intensity function that allowed them to encode target intensities above that of the adapting sound without reaching their saturation rate. In contrast, after exposure to the adapting stimulus with the high-probability range centered on a low intensity, the same neurons would respond with a saturated rate to the same high-intensity targets. Based on these observations, it has been suggested that the perceptual role of dynamic-range adaptation is to optimize intensity coding in a dynamically varying context. The greatest benefit for neural coding of changes in intensity has been shown to occur for levels just above the most common level in the adapting sequence (Dean et al., 2005). Although the adapting sequences in the animal studies often had a total duration on the order of seconds, a duration of an adapting stimulus of a few hundred milliseconds was sufficient to observe the full benefits of dynamic-range adaptation (Dean et al., 2008; Wen et al., 2012).

In the present experiment, it was assumed that a fixed-level precursor with a duration of a few hundred milliseconds can induce dynamic-range adaptation that should affect encoding of a subsequent AM tone presented in a burst of noise within a single trial. The fixed level was assumed to be equivalent to an adapting “sequence” with an infinitesimally narrow distribution of intensities throughout the precursor duration. The assumption is supported by evidence from auditory-nerve data (Costalupes et al., 1984; Gibson et al., 1985) and model simulations of auditory-nerve responses (Zilany and Carney, 2010), showing that a fixed-level adaptor can induce similar dynamic-range adaptation as can an adaptor with random level variations drawn from a narrow level distribution (Wen et al., 2009). With these assumptions, we tested the hypothesis that dynamic-range adaptation plays a role in AM unmasking by performing an AM-detection task using a masker (with an AM tone) with a level fixed throughout the experiment and a precursor with a level roved over a wide range across trials. In the context of the adaptation hypothesis, a precursor with a level close to that of the masker with the AM tone should be the most effective at facilitating AM detection as the precursor should shift neural rate-level functions to optimize coding of subsequent short-term fluctuations in intensity (Dean et al., 2005). On the other hand, for trials with precursor levels much lower or much higher than the level of the masker with the AM tone, the adapted neural responses would be either at least partially saturated in response to the target or shifted to optimally code levels away from that of the target stimulus. Both cases would result in worse AM-detection ability.
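The prediction of the adaptation hypothesis can be illustrated with a toy model. The sketch below is not a physiological model and is not taken from the cited studies: it assumes a logistic rate-level function whose midpoint fully re-centers a fixed offset above the precursor level, and it uses the local slope of that function at the operating level of the target as a proxy for sensitivity to intensity fluctuations. The function names, the 5-dB offset, the 5-dB scale, and the maximum rate are all hypothetical choices.

```python
import math


def rate_level_slope(level_db, midpoint_db, max_rate=100.0, scale_db=5.0):
    """Slope (spikes/s per dB) of a logistic rate-level function at level_db."""
    p = 1.0 / (1.0 + math.exp(-(level_db - midpoint_db) / scale_db))
    return max_rate * p * (1.0 - p) / scale_db


def sensitivity_by_pmld(masker_db, pmlds_db, offset_db=5.0):
    """Slope at the target's operating level for each precursor-masker level difference.

    Assumption: after the precursor, the midpoint of the rate-level function
    sits offset_db above the precursor level (complete adaptation), and the
    target operates offset_db above the masker level.
    """
    operating_db = masker_db + offset_db
    return {
        pmld: rate_level_slope(operating_db, masker_db + pmld + offset_db)
        for pmld in pmlds_db
    }


slopes = sensitivity_by_pmld(masker_db=30.0, pmlds_db=(-25, -15, 0, 15, 30))
best_pmld = max(slopes, key=slopes.get)  # PMLD giving the steepest slope
```

In this toy model the slope is steepest when the precursor and masker levels match and falls off on both sides, which is the qualitative pattern the hypothesis predicts. With incomplete adaptation or a different offset, the peak would shift or broaden.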

Because in this experiment the percentage of correct responses to an AM tone in a noise masker was measured for different precursor levels, the dynamic-range-adaptation hypothesis would be supported by a pattern with a peak in AM-detection performance (the highest percentage of correct responses) around the precursor level close to that of the noise masker [Fig. 6(a)]. If, on the other hand, the precursor affected AM detection via activation of the MOCR, the percentage of correct responses could exhibit one of the three patterns shown in Fig. 6(b). The pattern shown by the solid line is based on the assumption that MOCR-related unmasking effects should grow monotonically with an increasing precursor level for the masker and precursor levels used in experiment 2 (Kawase et al., 1993; Backus and Guinan, 2006; Wojtczak et al., 2019). This assumption is based on neural data from animals, showing that activation of the MOCR results in unmasking (sometimes referred to as “antimasking”) of a tone in noise (Winslow and Sachs, 1987, 1988; Kawase et al., 1993). The unmasking occurs because of the steepening of neural rate-level functions for a tone in noise when the MOCR is activated compared to the rate-level functions measured without MOCR activation [e.g., see Fig. 2(a) in Kawase et al., 1993]. The steepening is caused by the combination of a lowered rate in response to the noise masker due to a reduction in cochlear gain and an increased saturation rate due to a decreased rate adaptation. The unmasking effect has been demonstrated using electrical (Winslow and Sachs, 1987, 1988) and acoustic (Kawase et al., 1993) stimulation of MOC efferents. The pattern shown by the dashed line would be observed if the MOCR effect saturated for higher precursor levels yielding constant unmasking beyond the point of saturation. 
The pattern shown by the dotted line would be observed if MOCR-induced gain reductions were sufficiently large for the TEN masker to become inaudible for high precursor levels. If such large gain reductions happened for a precursor level lower than the highest level used, the pattern would exhibit a peak because performance would deteriorate with increasing gain reduction once the troughs of the AM tone became limited by the hearing threshold rather than by the TEN masker. Based on animal and human data for levels roughly comparable to those used in experiment 2, the peak performance would have to correspond to a precursor level equal to or higher than 60 dB SPL, i.e., a level much higher than that of our TEN masker (Backus and Guinan, 2006; Lilaonitkul and Guinan, 2012; Wojtczak et al., 2019). This latter scenario appears unlikely based on the published literature on acoustically elicited efferent effects (reviewed by Lopez-Poveda, 2018) as discussed in Sec. III C.

FIG. 6.

(Color online) A schematic illustration of expected patterns of dependence of the AM-detection task performance on the precursor level for three mechanisms mediating precursor effects: dynamic-range adaptation (a), MOCR activation (b), and forward masking (c). In (b), the solid line depicts a scenario in which the MOCR effect grows monotonically for the range of precursor levels used in this study. The dashed and dotted lines illustrate scenarios where, at high precursor levels, the MOCR effect either saturates (dashed line) or becomes so large that the troughs of the AM tone fall below the hearing threshold, effectively reducing its modulation depth (dotted line). Note that the inflection points of these two lines coincide for illustrative purposes only. The gray horizontal dashed-dotted lines schematically represent the percentage of correct responses in the no-precursor condition.

Finally, although our stimuli in the roved-level paradigm were designed to minimize the effect of forward masking by the precursor (as described below), if forward masking mediated the precursor effect, AM-detection performance should remain at the level of that in the no-precursor condition for lower precursor levels and should gradually deteriorate with increasing precursor level once the precursor is intense enough to produce significant forward masking of the target [Fig. 6(c)]. Although the exact mechanism underlying forward masking has not yet been elucidated (Oxenham, 2001), a temporal-window model that is often used to simulate effects of forward masking (Oxenham and Moore, 1994) would predict that the troughs of an AM tone presented after the precursor become increasingly “filled” by persisting excitation from the precursor as the precursor level increases. Consequently, a forward-masking effect should result in a reduction of the effective modulation depth and worse performance compared to that without the precursor or with a relatively low-level precursor. Based on the (still limited) current knowledge of the magnitude of MOCR-induced changes in cochlear gain in humans, neither the precursor-elicited MOCR nor forward masking should result in a pattern with a peak around the masker level expected from the dynamic-range adaptation.
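The "filling" of the AM troughs by persisting excitation can be illustrated numerically. The sketch below is a simplification and not the temporal-window model itself: it assumes that the residual excitation from the precursor adds to the intensity envelope of the AM tone as a constant pedestal, whereas the excitation predicted by a temporal-window model would decay over time. The function name and the residual levels are hypothetical.

```python
import math


def peak_to_trough_db(mod_depth, residual_re_carrier_db=None):
    """Peak-to-trough ratio (dB) of an AM intensity envelope plus a constant pedestal.

    mod_depth is the AM index m (0 < m < 1) of the amplitude envelope, so the
    normalized intensity ranges from (1 - m)^2 to (1 + m)^2. The pedestal is
    given in dB relative to the unmodulated carrier intensity; None means no
    residual excitation.
    """
    peak = (1.0 + mod_depth) ** 2
    trough = (1.0 - mod_depth) ** 2
    if residual_re_carrier_db is None:
        residual = 0.0
    else:
        residual = 10.0 ** (residual_re_carrier_db / 10.0)
    return 10.0 * math.log10((peak + residual) / (trough + residual))


# The effective envelope contrast shrinks as the residual excitation grows.
contrasts = [peak_to_trough_db(0.3, r) for r in (None, -10.0, 0.0, 10.0)]
```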

B. Methods

1. Listeners

Thirty-one listeners (9 male, 22 female), aged between 18 and 55 years old (mean = 23.2 yr, SD = 8.1 yr; median = 20 yr), who completed the AM-detection task in experiment 1 and showed some degree of unmasking for at least one of the carrier frequencies, participated in experiment 2. Of these listeners, 25 completed experiment 2 for both carrier frequencies of 1 and 4 kHz, five listeners completed the experiment for the 1-kHz carrier only, and one listener completed the experiment for 4 kHz only. Because of the very small number of listeners who only completed the experiment at one carrier frequency, and because their data showed the pattern consistent with other listeners in the corresponding conditions, we opted to limit the data analyses to the 25 listeners who completed the experiment with both carrier frequencies.

2. Stimuli

The 55-dB SPL tonal carriers (1 and 4 kHz) were presented with simultaneous two-octave TEN maskers, and their temporal characteristics and spectral placements within the masker were the same as in experiment 1. For each listener, the TEN masker levels were set to values that produced the individual SNRs used to measure the 20-Hz AM-detection thresholds in experiment 1. The modulation depths at each carrier frequency were fixed at the values corresponding to each listener's thresholds obtained with the precursors. A TEN precursor had the same duration as in experiment 1, but its level was selected at random for each trial from a wide range of levels that were −25, −15, 0, 15, and 30 dB relative to the masker level. These relative levels will be referred to hereafter as the precursor-masker level difference (PMLD). A PMLD of 0 dB represents the condition with the precursor in experiment 1. Trials with no precursor (baseline condition in experiment 1) were also included within each experimental block. A silent gap of 15 ms was used between the offset of the precursor and the onset of the masker with the tonal carrier. This gap duration was chosen as a compromise between an attempt to limit the role of forward masking, especially for high-level precursors, while still being able to observe the precursor unmasking effect on AM detection for a tone in noise. Forward masking has been shown to decay rapidly over the first 20 ms after the offset of a masker (Wojtczak and Oxenham, 2009). The TEN and AM tones embedded in it were clearly audible in all precursor conditions, indicating negligible forward masking. The methods for the generation of stimuli and the equipment were the same as those in experiment 1.
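The level relationships described above reduce to simple arithmetic, sketched below. The function and constant names are hypothetical. The sketch assumes that the masker level equals the tone level minus the individual SNR and that the precursor level equals the masker level plus the PMLD.

```python
TONE_LEVEL_DB_SPL = 55.0
PMLDS_DB = (-25, -15, 0, 15, 30)


def precursor_levels(snr_db, tone_level_db=TONE_LEVEL_DB_SPL, pmlds_db=PMLDS_DB):
    """Return the masker level and the precursor level for each PMLD (dB SPL)."""
    masker_db = tone_level_db - snr_db
    return masker_db, {pmld: masker_db + pmld for pmld in pmlds_db}


# Example for a listener tested at an SNR of 25 dB.
masker_db, levels_by_pmld = precursor_levels(snr_db=25.0)
```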

3. Procedure

A 2AFC procedure combined with a method of constant stimuli was used to measure AM-detection performance for different levels of the noise precursor. The precursor level was selected at random for each trial but was the same in the two observation intervals. The masker containing the 55-dB SPL modulated and unmodulated tonal carriers in the two randomly ordered observation intervals was presented at a fixed (individualized) level throughout the experiment. Listeners were asked to decide which observation interval contained the AM and provide their response via a keypress or a mouse click. Correct-response feedback was provided after each trial. Each block consisted of 120 trials during which each of the 5 precursor levels occurred a total of 20 times. Interleaved with the precursor trials were 20 trials containing the no-precursor condition (a silent interval with a duration corresponding to that of the precursor preceded the masker with the modulated and unmodulated tone). The no-precursor condition was included to ensure that the precursor-induced unmasking effect was observed when using this modified experimental design. Blocks for the 1- and 4-kHz carriers were presented in quasi-random order such that no more than two blocks for the same carrier frequency could occur in sequence. Each listener completed 10 blocks (5 blocks at each carrier frequency), resulting in a total of 100 responses per precursor condition, and the total proportions of correct responses were calculated separately for each carrier frequency and each precursor level, as well as for the no-precursor conditions.
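The block structure described above can be sketched as follows. This is an illustration, not the experimental software: the function names are hypothetical, and the rejection sampling used to enforce the constraint on block order is one of several ways to produce a quasi-random order.

```python
import random

PMLDS_DB = (-25, -15, 0, 15, 30)


def make_block(rng, trials_per_condition=20):
    """Build one shuffled block: 5 PMLDs plus the no-precursor condition (None)."""
    conditions = list(PMLDS_DB) + [None]
    trials = []
    for condition in conditions:
        for _ in range(trials_per_condition):
            # The AM tone is assigned at random to observation interval 1 or 2.
            trials.append({"pmld_db": condition, "am_interval": rng.choice((1, 2))})
    rng.shuffle(trials)
    return trials


def block_order(rng, carriers_hz=(1000, 4000), blocks_per_carrier=5, max_run=2):
    """Shuffle until no carrier occurs in more than max_run consecutive blocks."""
    order = [c for c in carriers_hz for _ in range(blocks_per_carrier)]
    while True:
        rng.shuffle(order)
        windows = (order[i:i + max_run + 1] for i in range(len(order) - max_run))
        if all(len(set(window)) > 1 for window in windows):
            return order


rng = random.Random(0)
block = make_block(rng)      # 120 trials, 20 per condition
order = block_order(rng)     # 10 blocks, 5 per carrier frequency
```

With 5 blocks per carrier and 20 trials per condition in each block, every precursor condition accumulates 100 responses per carrier frequency.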

C. Results and discussion

Figure 7 shows the mean results for the 25 listeners who completed the task at both carrier frequencies. The PMLD of 0 dB on the x axis indicates equal levels of the precursor and the masker (as in experiment 1), negative PMLD values indicate that the precursor level was below that of the masker, and positive values indicate that the precursor was more intense than the masker. Overall, for the conditions replicating those in experiment 1 (i.e., 0-dB PMLD vs no-precursor; see the green bars in Fig. 7), a significant amount of AM unmasking was observed as reflected by the higher percentage of correct responses for the 0-dB PMLD than for the no-precursor (np) condition [one-tailed t-test, t(24) = 5.1, p < 0.001 for the 1-kHz carrier and t(24) = 3.0, p = 0.003 for the 4-kHz carrier].

FIG. 7.

(Color online) The mean percentage of correct AM detections for different levels of the precursor relative to the level of the masker for 1-kHz (a) and 4-kHz (b) carriers. The no-precursor condition is also included (leftmost bars, denoted by “np”). Light green bars denote conditions present in experiment 1. Error bars indicate ±1 SE.

A repeated-measures ANOVA with within-subject factors of the precursor level and carrier frequency showed a significant effect of the precursor level [F(3.07,73.67) = 6.1, p < 0.001] but not the carrier frequency [F(1,24) = 0.3, p = 0.612]. Additionally, we found a significant interaction between the precursor level and carrier frequency [F(5,120) = 3.4, p = 0.006]. While for the 1-kHz carrier, the pattern of results in Fig. 7 shows a clear peak in performance for the 0-dB PMLD and worse performance for precursor levels above and below the masker, the pattern is less clearly defined for the 4-kHz carrier. A contrast analysis testing for a quadratic trend across PMLDs showed a significant quadratic contrast overall [F(1,24) = 16.2, p < 0.001] and a significant interaction between the quadratic trend across levels and the carrier frequency [F(1,24) = 4.5, p = 0.044]. The interaction reflects the observation that the peaky pattern of AM unmasking was more defined for the 1-kHz carrier frequency than for the 4-kHz carrier frequency. Indeed, the quadratic trend at 4 kHz alone approached but did not reach significance [F(1,24) = 4.0, p = 0.058].
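A quadratic trend test of this kind can be sketched as a single-degree-of-freedom contrast. The sketch below is an illustration and not the analysis code used in the study. It derives quadratic weights that are orthogonal to the constant and linear trends for the actual (unequally spaced) PMLDs, whereas statistical packages often default to the equal-spacing weights (2, −1, −2, −1, 2); which weights were used here is not stated, so this choice is an assumption. The function names and the percent-correct values are hypothetical. For a single contrast tested across listeners, F(1, n − 1) equals the square of the one-sample t statistic on the contrast scores.

```python
import math
import statistics


def quadratic_contrast_weights(levels):
    """Quadratic weights orthogonal to the constant and linear trends."""
    n = len(levels)
    mean_level = sum(levels) / n
    linear = [x - mean_level for x in levels]
    squared = [v * v for v in linear]
    mean_squared = sum(squared) / n
    slope = sum(q * l for q, l in zip(squared, linear)) / sum(l * l for l in linear)
    return [q - mean_squared - slope * l for q, l in zip(squared, linear)]


def contrast_f(scores_by_listener, weights):
    """Return (F, error df, mean contrast score) for a within-subject contrast."""
    scores = [sum(w * y for w, y in zip(weights, row)) for row in scores_by_listener]
    n = len(scores)
    mean_score = statistics.fmean(scores)
    t = mean_score / (statistics.stdev(scores) / math.sqrt(n))
    return t * t, n - 1, mean_score


pmlds_db = (-25, -15, 0, 15, 30)
weights = quadratic_contrast_weights(pmlds_db)

# Made-up percent-correct scores for three listeners, one row per listener.
percent_correct = [
    [60.0, 70.0, 85.0, 72.0, 62.0],
    [55.0, 68.0, 80.0, 70.0, 58.0],
    [65.0, 66.0, 78.0, 75.0, 60.0],
]
f_value, error_df, mean_score = contrast_f(percent_correct, weights)
```

Because the weights are positive at the extreme PMLDs, a pattern with a peak near the 0-dB PMLD yields a negative mean contrast score.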

Although the average data replicated the AM-unmasking effect observed in the same listeners in experiment 1, some individuals did not exhibit a difference between the 0-dB PMLD and no-precursor conditions despite showing unmasking for the same stimuli in experiment 1. There were no systematic differences in the amount of unmasking in experiment 1 between listeners who did not exhibit unmasking in experiment 2 and those who did. Individual differences in percentage-correct responses between the 0-dB PMLD and no-precursor conditions are shown in Fig. 8. Note that in this plot, bars representing positive values indicate AM unmasking, whereas bars representing negative values show worse performance with the precursor. It is not clear why AM unmasking was not present in a subset of listeners in the experimental design with the roved precursor level. One possibility is that training during the course of experiment 1 improved performance for these listeners in the no-precursor condition but did less so or not at all in the 0-dB PMLD condition. This was the case for some but not all of the listeners in this subset. Only three out of seven listeners who showed no AM unmasking in experiment 2 in the 1-kHz carrier condition and four out of nine with no unmasking in the 4-kHz carrier condition achieved a no-precursor performance that matched or even exceeded the 79.4% score expected for the 0-dB PMLD condition. However, for other listeners who did not exhibit unmasking with the roved precursor paradigm, performance in the 0-dB PMLD condition was below the expected 79.4% and matched or fell below that in the no-precursor condition. It is not clear why the roved-level paradigm should affect the unmasking phenomenon that was present in experiment 1, but one explanation is in terms of selective adaptation to “oddball” sounds, akin to that demonstrated by Simpson et al. (2014). 
According to this explanation, the stimulus with no precursor would be an “oddball” event, occurring on ∼17% of the 120 trials within a block. This oddball event could lead to an improvement in the processing of the intensity fluctuations in the target because the temporal characteristics of the no-precursor trials were different from all the trials with the precursor. In the no-precursor trials, the precursor was replaced by a 250-ms silent gap before the onset of the masker with the target. Simpson et al. (2014) showed that for stimuli occurring with low probability that were most dissimilar (oddball) to other stimuli within a block of 100 trials, intensity discrimination improved, sometimes surpassing performance in the baseline condition measured using an adaptive tracking procedure for the same stimuli. The selective adaptation leading to the enhancement of oddball events was shown to have a much longer buildup time, on the order of 1–2 min, than dynamic-range adaptation. Adaptation to temporal features may be explained in terms of adaptation of cortical modulation channels that process different low modulation rates independently (Xiang et al., 2013). Simpson et al. (2014) suggested that adaptation in modulation channels combined with contrast gain found in cortical neurons (Rabinowitz et al., 2011) may account for enhancements of intensity changes in stimuli with oddball (low-probability) temporal characteristics. Indeed, some listeners reported that the AM in the no-precursor trials appeared more salient than that in trials with the precursor, an effect they did not experience when the same time course for all stimuli was used throughout a block of trials in experiment 1.

FIG. 8.

(Color online) Individual differences between the scores from the 0-dB precursor-masker difference and the no-precursor trials for 1-kHz (a) and 4-kHz (b) carriers. Dashed lines and shaded regions between dotted lines reflect the mean and ±1 SE, respectively.

Because the goal of the experiment was to determine how the unmasking effect changes as a function of the precursor-masker level mismatch, the statistical analyses were re-run on data from the 13 participants who showed more than 5% performance benefit with the 0-dB PMLD for both of the carrier frequencies. The 5% criterion was chosen arbitrarily to select listeners with a robust AM unmasking effect. The ANOVA showed a pattern of results that was similar to that with all 25 listeners included with the exception that the interaction between the quadratic trend across the precursor level and carrier frequency was nonsignificant [F(1,12) = 1.5, p = 0.237], reflecting a similar peaky pattern for the two carrier frequencies.

The pattern with a peak around the 0-dB PMLD, seen more prominently for the 1-kHz carrier in Fig. 7, is consistent with the expected effect of dynamic-range adaptation. Dean et al. (2005) estimated that due to dynamic-range adaptation, intensity coding should improve the most for stimulus levels that fall into the range just above the level of the adapting stimulus (in our case, the precursor). A reason for a weaker quadratic trend for the 4-kHz carrier seems to be the generally weaker AM unmasking effect for the 4-kHz carrier. With the roved precursor level, the weaker unmasking at 4 kHz was not always observed in the individual data (Fig. 8) but it was present, albeit small, in average percentage-correct responses (an average unmasking of 8.1% at 1 kHz and 6.1% at 4 kHz, shown by horizontal dashed lines in Fig. 8).

In theory, a peaky pattern could be observed if precursor effects mediated AM detection for a tone in noise via MOCR activation. However, given the levels of the stimuli in experiment 2, obtaining a peaky pattern would require very large gain reductions that could render the TEN masker inaudible. Once the troughs of the AM tone became limited by the hearing threshold at the carrier frequency rather than by the noise masker, performance would decline with further increases in MOCR-induced gain reduction. The SNRs for tones presented in TEN were 25 dB for a majority of listeners (19/25 for the 1-kHz carrier and 16/25 for the 4-kHz carrier). For the remaining listeners, the SNRs were lower than 25 dB except for one listener in the 4-kHz condition for whom the SNR was 27 dB. Because for all listeners the AM tones were presented at a fixed level of 55 dB SPL, the TEN had a level of 30 dB or higher (between 30 and 42 dB with the exception of one listener mentioned above for whom the TEN level was 28 dB). Because our listeners were all young and had normal hearing, these TEN masker levels were equivalent to a 30 dB (or higher) sensation level (SL). Gain reductions of 30 dB due to MOCR activation have only been observed with electric stimulation for tones in quiet presented at very high frequencies in studies on animals (Russell and Murugasu, 1997). Studies of efferent effects in humans have often used SFOAEs to probe MOCR-induced cochlear gain reduction because the magnitude of a SFOAE is known to depend on the local cochlear gain (Shera and Guinan, 1999). For a stimulus (pure-tone) level of 40 dB SPL, noise elicitors of the MOCR presented at 40 dB SPL were found to produce undetectable or very small effects on the SFOAE magnitude (a small fraction of a dB), suggesting little or no MOCR-induced reduction of gain applied by the cochlea to the 40-dB SPL tone (Backus and Guinan, 2006; Wojtczak et al., 2019). 
Both of these studies also showed that as MOCR-elicitor levels increased up to 60 dB SPL, the MOCR-induced gain reduction increased monotonically. However, for the highest level of the noise elicitor (60 dB SPL), the reduction of the SFOAE magnitude was no more than a few decibels in frequency regions of 1 and 4 kHz (Backus and Guinan, 2006; Lilaonitkul and Guinan, 2012). Although it is uncertain whether there is a one-to-one correspondence between the change in SFOAE magnitude and the change in cochlear gain, MOCR-induced changes in the compound action potential show similar or only slightly larger gain reductions (for a review, see Lopez-Poveda, 2018). For most listeners, the highest precursor level used in experiment 2 was approximately 60 dB SPL. Based on existing studies on MOCR effects, it appears highly unlikely that a peaky pattern like the one depicted by the dotted line in Fig. 6(b) would be observed for masker and precursor levels used in this study because gain reductions of 30 dB or larger have not been observed with acoustic stimulation in humans or animals.

The most intense precursors could elicit the middle-ear-muscle reflex (MEMR). The MEMR activation could affect the effective level of the masker with the AM tone in a different way at the two frequencies tested. The MEMR activation changes the middle-ear acoustic impedance, thereby affecting sound transmission through the middle ear in a frequency-dependent manner (Feeney and Keefe, 2001; Schairer et al., 2007; Wojtczak et al., 2017). For both the 1- and 4-kHz tones, the MEMR activation should reduce the upward spread of masking of the AM carrier by lower frequency components in the two-octave noise bands. The frequency-dependent pattern of changes in the middle-ear transmission suggests that the reduction in the upward spread of masking should be greater for the 1-kHz carrier than for the 4-kHz carrier (see the left panel in Fig. 2 in Wojtczak et al., 2017). This is because the MEMR affects the transmission of lower frequencies more than those above ∼2 kHz. Note that the MEMR effects should result in an improvement in AM detection at the higher precursor levels (through a reduction of upward spread of masking), contrary to the worsening of performance that results in the peaky pattern in Fig. 7.

Overall, the results from experiment 2 are broadly consistent with the outcome expected based on the dynamic-range adaptation. Although we performed the roved-level experiment only for AM tones in noise, the implicit assumption was that FM detection would exhibit a similar pattern of dependence on the precursor level. However, the inability to observe AM unmasking in all of the listeners who exhibited the effect with a fixed-level precursor in the first experiment and the relatively small effects in the experiment with the roved precursor level (on average <10 percentage points) preclude strong conclusions. The use of a relatively long (100-ms) AM target may have contributed to the small effects if substantial recovery from dynamic-range adaptation occurred over the course of the 15-ms silent precursor-masker gap and the initial portion of the target sound. In addition, although the outcome is generally consistent with the adaptation to sound level statistics, other possible explanations cannot be ruled out as will be discussed below.

IV. GENERAL DISCUSSION

Adaptive mechanisms play a crucial role in adjusting neural responses to a dynamically varying acoustic environment to facilitate robust coding of stimuli over a wide range of sound intensities. To gain understanding of the full extent of their functional role, adaptive mechanisms have been studied using animal physiological models, human behavioral data, and noninvasive MEG and EEG techniques. Accumulating evidence has shown that in addition to the well-documented rate adaptation (Smith and Zwislocki, 1975; Smith, 1977), neurons at different stages along the auditory pathways exhibit the ability to shift their response functions to adjust to the level statistics of the acoustic environment. Such shifts expand the auditory system's effective dynamic range for intensity coding (Dean et al., 2005; Dean et al., 2008; Watkins and Barbour, 2008, 2011; Wen et al., 2009, 2012; Robinson et al., 2016; Rocchi and Ramachandran, 2018). At the level of the auditory nerve, dynamic-range adaptation has been modeled by the power-law function, combined with exponential functions that simulate rate adaptation (Zilany and Carney, 2010). Although the exact source of the process with power-law dynamics has not been identified, Zilany and Carney (2010) suggested that adaptive shifts have synaptic and/or postsynaptic origin. Verhulst et al. (2018) simulated the power-law dynamics by including a model of the basolateral inner hair cell potassium (K+) currents. Dynamic-range adaptation becomes even stronger in IC neurons (Dean et al., 2005) and, possibly, at higher-level auditory nuclei.

Dynamic-range adaptation has recently been suggested as the mechanism that underlies or at least contributes to improvements in the detection of AM in tones and narrowband noises presented in simultaneous noise maskers due to precursors that immediately precede the target stimuli (Almishaal et al., 2017; Jennings et al., 2018; Marrufo-Pérez et al., 2018a; Wojtczak et al., 2019). In this study, we replicated the previous findings and showed that thresholds for detecting a 20-Hz AM in a tonal carrier embedded in a TEN masker are lower when the masker is preceded by a 250-ms TEN precursor than when it is preceded by silence. We also extended this finding to FM and showed that detection of a 20-Hz FM in TEN also improves after a TEN precursor. This result was expected based on previous evidence that for modulation rates above ∼10 Hz, FM and AM share a common coding mechanism in that both are coded by fluctuations of response magnitudes at the output of cochlear filters (Zwicker, 1970; Moore and Sek, 1992, 1994, 1995; Whiteford and Oxenham, 2015). Consistent with the notion of a common coding mechanism for AM and FM, we found that the magnitudes of the unmasking effects for AM and FM were significantly correlated (see Fig. 5). The mechanism based on FM-to-AM conversion is also used in the detection of low modulation rates (<10 Hz), but previous studies have argued that this holds only at high carrier frequencies for which fine-structure cues based on phase locking are not available (Moore and Sek, 1994, 1995; Joris and Verschooten, 2013; Moore et al., 2019; but cf. Whiteford et al., 2020). Low modulation rates were not used in this study because they would require longer target durations for which the effect of a precursor, hypothesized to be due to dynamic-range adaptation, would be diminished by recovery over the course of the AM and FM stimuli.
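
The idea of FM-to-AM conversion at the output of a cochlear filter can be illustrated with a minimal sketch. It assumes a quasi-static approximation (the instantaneous frequency changes slowly relative to the filter's response time, so the modulation rate does not enter the calculation) and uses a Gaussian-shaped magnitude response as a hypothetical stand-in for a cochlear filter. The function names and all numerical values (a 1000-Hz carrier, a filter centered at 1100 Hz, a 100-Hz width parameter) are illustrative assumptions and are not parameters from this study.

```python
import math


def filter_gain(freq_hz, cf_hz, bw_hz):
    """Gaussian-shaped magnitude response (hypothetical stand-in for a cochlear filter)."""
    return math.exp(-0.5 * ((freq_hz - cf_hz) / bw_hz) ** 2)


def induced_am_depth(carrier_hz, excursion_hz, cf_hz, bw_hz, n_samples=1000):
    """Envelope modulation depth at the filter output over one FM period.

    Quasi-static assumption: the output envelope at each instant equals the
    filter gain at the instantaneous frequency of the FM tone.
    """
    envelope = []
    for i in range(n_samples):
        phase = 2.0 * math.pi * i / n_samples
        inst_freq = carrier_hz + excursion_hz * math.sin(phase)
        envelope.append(filter_gain(inst_freq, cf_hz, bw_hz))
    hi, lo = max(envelope), min(envelope)
    return (hi - lo) / (hi + lo)


# Illustrative values only: the filter sits above the carrier, so the carrier
# falls on the sloping skirt of the filter's magnitude response.
depth_none = induced_am_depth(1000.0, 0.0, 1100.0, 100.0)
depth_small = induced_am_depth(1000.0, 5.0, 1100.0, 100.0)
depth_large = induced_am_depth(1000.0, 20.0, 1100.0, 100.0)
print(depth_none, depth_small, depth_large)
```

Under these assumptions, an unmodulated tone produces a flat envelope, and a larger frequency excursion produces a deeper envelope fluctuation at the output of the off-frequency filter, which is the sense in which FM could be detected via an AM cue.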

A. Comparison with other studies

AM detection measured with precursor levels roved from trial to trial over a wide level range resulted in the best performance for the precursor that matched the level of the masker presented with the target. Since the SNR of the target was chosen to be relatively low, this result is broadly consistent with the predictions based on the dynamic-range adaptation (Dean et al., 2005). Dean et al. (2005) reported that Fisher information calculated from shifted rate-level functions in the IC of the guinea pig is the largest for target levels just above the most common level of the adapting stimulus, thus predicting the best coding of changes in stimulus intensity at these target levels. The adapting stimulus (the precursor) in each trial of the current study had a constant level, but its duration was sufficient to elicit adaptation (Dean et al., 2008; Wen et al., 2009, 2012). Physiological and modeling studies of adaptation in the auditory nerve have shown that a fixed precursor produces equally strong shifts in rate-level functions as a precursor with a level dynamically varying over a narrow range (Gibson et al., 1985; Zilany and Carney, 2010).
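
The link between a shifted rate-level function and the level range that is coded best can be sketched with a toy calculation. The sketch below assumes a single model neuron with a sigmoidal rate-level function and Poisson spike-count variability, for which the Fisher information about level is f'(L)^2 / f(L). The sigmoid shape, the parameter values, and the function names are hypothetical and are not taken from Dean et al. (2005); the sketch only shows that shifting the rate-level function moves the region of best level coding by the same amount, and it does not attempt to reproduce where that region falls relative to the adapting level.

```python
import math


def rate(level_db, midpoint_db, slope_db=4.0, spont=5.0, max_driven=100.0):
    """Sigmoidal rate-level function in spikes/s (illustrative parameters)."""
    sigma = 1.0 / (1.0 + math.exp(-(level_db - midpoint_db) / slope_db))
    return spont + max_driven * sigma


def rate_slope(level_db, midpoint_db, slope_db=4.0, max_driven=100.0):
    """Analytic derivative of the rate-level function with respect to level."""
    sigma = 1.0 / (1.0 + math.exp(-(level_db - midpoint_db) / slope_db))
    return max_driven * sigma * (1.0 - sigma) / slope_db


def fisher_information(level_db, midpoint_db):
    """Fisher information for a Poisson neuron: f'(L)^2 / f(L)."""
    return rate_slope(level_db, midpoint_db) ** 2 / rate(level_db, midpoint_db)


def best_coded_level(midpoint_db, levels):
    """Level on the grid at which the Fisher information is largest."""
    return max(levels, key=lambda level: fisher_information(level, midpoint_db))


levels = [0.5 * i for i in range(201)]  # 0 to 100 dB in 0.5-dB steps

# Hypothetical adaptation: the midpoint of the rate-level function moves
# from 40 dB to 60 dB after exposure to a higher-level adapting stimulus.
peak_before = best_coded_level(40.0, levels)
peak_after = best_coded_level(60.0, levels)
print(peak_before, peak_after)
```

In this toy model, the level with the highest Fisher information follows the midpoint of the rate-level function, so a target presented near the adapted operating point would be coded with the greatest precision.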

Despite the general agreement of our data with the predicted benefits from the dynamic-range adaptation, strong conclusions regarding the role of this adaptation must be moderated due to inconsistent findings from earlier studies. Rocchi and Ramachandran (2018) did not find any benefit of dynamic-range adaptation for the detection of a tone in noise in behaving macaques despite demonstrating clear shifts in neural responses to noise bursts in their IC, which depended on the most common level in the adapting stimulus. One reason for the disagreement could be that detecting a tone in noise does not require the detection of a change in the overall intensity. In fact, performing this task likely involves analysis of the spectral (e.g., Spiegel et al., 1981; Spiegel and Green, 1982; Green, 1983; Green et al., 1984) or modulation profile (Carney, 2018) of neural population responses across CFs.

A recent study by Herrmann et al. (2020) investigated dynamic-range adaptation in humans by examining P1-N1 EEG responses. They also measured detection thresholds for the AM of noise bursts presented at two levels differing by about 30 dB. AM targets were interleaved with sequences of noise bursts with levels drawn from two distributions with narrow ranges of high-probability regions with mean levels similar to the levels of the two targets. The P1-N1 responses were affected by whether a low- or higher-level distribution was used for the noise bursts in the adapting sequence in a way that was consistent with dynamic-range adaptation. However, contrary to predictions based on Fisher information by Dean et al. (2005), the behavioral performance was worse when the mean level of the adapting sequence was similar to that of the target than when the mean level differed from the target level by about 30 dB. The behavioral data in Herrmann et al. (2020) were also inconsistent with the effect of the precursor level shown in Fig. 7 of this study. Herrmann et al. (2020) used adapting sequences that consisted of noise bursts separated by 400-ms silent intervals. This stimulus design may have been more conducive to effects of selective adaptation to “oddballs” such as those shown by Simpson et al. (2014). Simpson et al. (2014) suggested that this type of adaptation operates over a substantially longer time scale than dynamic-range adaptation (on the order of minutes), and it facilitates the detection of changes in intensity for targets that differ substantially in their intensity or temporal characteristics from the adapting stimulus. Recordings of neural activity in the midbrain and auditory cortex of animals have shown that different adaptive processes operate concurrently on different time scales (Ulanovsky et al., 2004; Dean et al., 2008; Yaron et al., 2012). 
Shorter-term adaptation emphasizes changes in intensity in high-probability stimuli, such as those that evoke dynamic-range adaptation, while long-term adaptation emphasizes oddball stimuli. To determine if the different outcomes represent different types of adaptation, it would be desirable to perform EEG (or MEG) recordings combined with AM detection for stimuli with much shorter silent gaps, such as those used in experimental paradigms shown to elicit robust dynamic-range adaptation.

B. Other potential mechanisms

Although the data in experiment 2 were broadly consistent with the effects of dynamic-range adaptation predicted by Dean et al. (2005), other mechanisms cannot be ruled out. Perceptual similarity between the noise masker and precursor may facilitate grouping of the two stimuli, effectively enhancing the salience of the tonal carrier and making the detection of AM or FM in the carrier an easier task. Auditory grouping based on the similarities of acoustic features of the stimuli likely emerges in cortical areas of the brain (e.g., Nelken et al., 2014). It could be argued that as the level difference between the precursor and masker noise bursts increases, the perceptual similarity of the two stimuli decreases, making their grouping into one perceptual object less effective and, consequently, the detection of the AM of the tone harder. This explanation, although still possible, is weakened by findings from a previous study from our laboratory that reported sizeable AM unmasking with precursors that were perceptually distinct from a noise masker with an AM tone (Wojtczak et al., 2019).

The improvement in AM detection after a precursor fits well within the hypothesized role of the MOCR (Guinan, 2006; Lopez-Poveda, 2018), whereby the cochlear gain reduction and neural firing rate adaptation due to a precursor would result in enhanced coding of the AM in a tone masked by noise. However, the data from experiment 2 are inconsistent with the known effects of MOCR activation. Backus and Guinan (2006) showed that the effect of efferent activation on the magnitude of a SFOAE evoked by a probe presented at a fixed level increases with an increasing level of the MOCR elicitor, indicating progressively greater reduction of the cochlear gain. For tones presented in noise, the gain reduction due to MOCR activation interacts with “two-tone” suppression occurring between the target tone and the simultaneous masking noise on the basilar membrane. The result of the interaction is not a simple rightward shift in the auditory-nerve rate-level function but a steepening of the rate-level function and an expansion of its dynamic range (Winslow and Sachs, 1987, 1988; Kawase et al., 1993). Based on neural data from animal models, higher precursor (presumed MOCR elicitor) levels would expand the dynamic range of auditory-nerve fibers responding to the AM tones progressively more as the strength of the reflex increased, at least until the maximum gain reduction via efferent feedback was reached, at which point the effect would saturate. Inconsistent with this prediction, the best AM-detection performance was observed for the precursor level that was similar to the level of the masker presented with the AM tone. Performance decayed for precursor levels that were lower and higher than the masker level, resulting in the peaky pattern shown in Fig. 7. Previous studies also reported inconsistencies between their data and the MOCR-based explanation for AM unmasking effects.
Specifically, MOCR effects are known to be greater for bilateral elicitors while the reported unmasking effects are independent of the precursor laterality (Marrufo-Pérez et al., 2018a; Marrufo-Pérez et al., 2018b). Wojtczak et al. (2019) measured effects of efferent activation on SFOAEs for noise and tonal stimuli with levels set to values for which significant AM unmasking was observed in a behavioral AM-detection task performed in that study. For those levels, no significant MOCR-related changes in emission magnitude due to the noise elicitor were observed, thus suggesting that post-cochlear mechanisms must be at play. Notably, activation of the MEMR, if present, would have been observed in the emission measurements as it would have resulted in a change in ear-canal sound pressure due to the noise elicitor (Guinan et al., 2003; Wojtczak et al., 2019). As stated above, no significant change was observed for the stimuli that produced AM unmasking. Recently, Marrufo-Pérez et al. (2020) showed that a stimulus consisting of short (50-ms) noise segments with levels varying randomly over a wide range reduced click-evoked otoacoustic emissions in the contralateral ear more than a sequence of noise bursts with a narrow level distribution, but only the latter stimulus improved word recognition in noise when used as a precursor. In addition, Marrufo-Pérez et al. (2018a) and Marrufo-Pérez et al. (2019) showed significant improvements in AM detection and word recognition in noise due to a precursor in cochlear-implant users. Their data indicate that neither the MOCR nor the MEMR is needed to produce the unmasking effects. Therefore, given the evidence from this and the previous studies, the role of the MOCR and MEMR in AM and FM unmasking appears limited at best.

Although neither the MOCR nor forward masking alone would predict the pattern of data in Fig. 7, the combined effect of the two mechanisms could, in principle, result in the peaky pattern shown by our data. For lower precursor levels, the MOCR could dominate the effect of the precursor on performance and lead to improvement with an increasing precursor level. Once forward masking became dominant at higher precursor levels, it could obliterate the benefits of MOCR-related unmasking and lead to increasingly worse performance with increasing precursor levels. However, the lack of consistent support for the role of the MOCR in the psychophysical AM unmasking makes this explanation less likely than a more parsimonious account in terms of dynamic-range adaptation. In addition, because of the 15-ms delay between the precursor and the TEN with an AM tone, forward masking is unlikely to have contributed to the pattern of results shown in Fig. 7.

The MOC efferent system could still act to improve encoding of transient sounds in an ongoing acoustic background via collateral projections to excitatory and inhibitory stellate cells in the ventral cochlear nucleus, as was suggested based on a mouse model (Fujino and Oertel, 2001). Fujino and Oertel (2001) showed that this local neuronal circuit acts to enhance responses to narrowband sounds in noise and operates over a time course similar to that for the MOCR. The authors state that these enhancements of neural representations of narrowband sounds in noise can occur even in the absence of MOCR effects on cochlear gain. It is currently unknown how the operation of this neuronal circuit depends on the parameters of the acoustic stimuli, making it impossible to test hypotheses about the contribution of this mechanism to perceptual context effects such as those shown in this study. Until further research is performed, the role of such efferent-based neural circuits at the level of the MOC for AM unmasking cannot be evaluated.

C. Final remarks

As reported in this and the previous study from our laboratory (Wojtczak et al., 2019), context effects that involve improvements in AM- and FM-detection thresholds for tones partially masked by noise occur over a limited range of SNRs and are most pronounced when the prior stimulus providing the context has a level similar to that of the masker. However, the limited parameter space over which the unmasking effects are observed does not indicate a limited benefit for real-life performance. Even a small improvement in detecting envelope fluctuations in the target sound in the presence of a masker is beneficial for understanding speech in complex acoustic backgrounds (e.g., Jørgensen et al., 2015). Although acoustic environments may vary over a wide range of levels, relatively short-term variations are typically limited to a narrow level range (Kirk and Smith, 2003). These small variations in the dynamically varying acoustic background could facilitate the unmasking effects reported here, possibly through dynamic-range adaptation, particularly for relatively low target-to-masker ratios. Establishing the underlying mechanisms is challenging but necessary for uncovering sources of abnormal processing in impaired auditory systems. Although findings from this study broadly support the role of dynamic-range adaptation in context-related improvement in AM and FM processing, it is desirable to follow up on these findings by using an experimental design that combines behavioral and electrophysiological measures for the same set of stimuli.

ACKNOWLEDGMENTS

We thank Nathan Torunsky for help with the data collection. We also thank the Associate Editor and two anonymous reviewers for their helpful comments. This work was supported by National Institutes of Health Grant No. R01DC015462.

References

  • 1. Almishaal, A., Bidelman, G. M., and Jennings, S. G. (2017). “Notched-noise precursors improve detection of low-frequency amplitude modulation,” J. Acoust. Soc. Am. 141, 324–333. doi: 10.1121/1.4973912
  • 2. Backus, B. C., and Guinan, J. J. (2006). “Time-course of the human medial olivocochlear reflex,” J. Acoust. Soc. Am. 119, 2889–2904. doi: 10.1121/1.2169918
  • 3. Bacon, S. P., and Grantham, D. W. (1989). “Modulation masking: Effects of modulation frequency, depth, and phase,” J. Acoust. Soc. Am. 85, 2575–2580. doi: 10.1121/1.397751
  • 4. Ben-David, B. M., Avivi-Reich, M., and Schneider, B. A. (2016). “Does the degree of linguistic experience (native versus nonnative) modulate the degree to which listeners can benefit from a delay between the onset of the maskers and the onset of the target speech?,” Hear. Res. 341, 9–18. doi: 10.1016/j.heares.2016.07.016
  • 5. Ben-David, B. M., Tse, V. Y. Y., and Schneider, B. A. (2012). “Does it take older adults longer than younger adults to perceptually segregate a speech target from a background masker?,” Hear. Res. 290, 55–63. doi: 10.1016/j.heares.2012.04.022
  • 6. Bidelman, G. M., and Bhagat, S. P. (2015). “Right-ear advantage drives the link between olivocochlear efferent ‘antimasking’ and speech-in-noise listening benefits,” Neuroreport 26, 483–487. doi: 10.1097/WNR.0000000000000376
  • 7. Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound (MIT Press, Cambridge, MA).
  • 8. Byrne, A. J., Viemeister, N. F., and Stellmack, M. A. (2013). “The effects of unmodulated carrier fringes on the detection of frequency modulation,” J. Acoust. Soc. Am. 133, 998–1003. doi: 10.1121/1.4773353
  • 9. Carney, L. H. (2018). “Supra-threshold hearing and fluctuation profiles: Implications for sensorineural and hidden hearing loss,” J. Assoc. Res. Otolaryngol. 19, 331–352. doi: 10.1007/s10162-018-0669-5
  • 10. Costalupes, J. A., Young, E. D., and Gibson, D. J. (1984). “Effects of continuous noise backgrounds on rate response of auditory nerve fibers in cat,” J. Neurophysiol. 51, 1326–1344. doi: 10.1152/jn.1984.51.6.1326
  • 11. Dau, T., Kollmeier, B., and Kohlrausch, A. (1997a). “Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers,” J. Acoust. Soc. Am. 102, 2892–2905. doi: 10.1121/1.420344
  • 12. Dau, T., Kollmeier, B., and Kohlrausch, A. (1997b). “Modeling auditory processing of amplitude modulation. II. Spectral and temporal integration,” J. Acoust. Soc. Am. 102, 2906–2919. doi: 10.1121/1.420345
  • 13. Dean, I., Harper, N. S., and McAlpine, D. (2005). “Neural population coding of sound level adapts to stimulus statistics,” Nat. Neurosci. 8, 1684–1689. doi: 10.1038/nn1541
  • 14. Dean, I., Robinson, B. L., Harper, N. S., and McAlpine, D. (2008). “Rapid neural adaptation to sound level statistics,” J. Neurosci. 28, 6430–6438. doi: 10.1523/JNEUROSCI.0470-08.2008
  • 15. de Boer, J., Thornton, A. R. D., and Krumbholz, K. (2012). “What is the role of the medial olivocochlear system in speech-in-noise processing?,” J. Neurophysiol. 107, 1301–1312. doi: 10.1152/jn.00222.2011
  • 16. Ewert, S. D. (2013). “AFC—A modular framework for running psychoacoustic experiments and computational perception models,” in Proc. Int. Conf. Acoust. AIA-DAGA, Merano, Italy, pp. 1326–1329.
  • 17. Ewert, S. D., and Dau, T. (2017). “Reproducible psychoacoustic experiments and computational perception models in a modular software framework,” J. Acoust. Soc. Am. 141, 3630. doi: 10.1121/1.4987809
  • 18. Feeney, M. P., and Keefe, D. H. (2001). “Estimating the acoustic reflex threshold from wideband measures of reflectance, admittance, and power,” Ear Hear. 22, 316–332. doi: 10.1097/00003446-200108000-00006
  • 19. Fujino, K., and Oertel, D. (2001). “Cholinergic modulation of stellate cells in the mammalian ventral cochlear nucleus,” J. Neurosci. 21, 7372–7383. doi: 10.1523/JNEUROSCI.21-18-07372.2001
  • 20. Gibson, D. J., Young, E. D., and Costalupes, J. A. (1985). “Similarity of dynamic range adjustment in auditory nerve and cochlear nuclei,” J. Neurophysiol. 53, 940–958. doi: 10.1152/jn.1985.53.4.940
  • 21. Glasberg, B. R., and Moore, B. C. (1990). “Derivation of auditory filter shapes from notched-noise data,” Hear. Res. 47, 103–138. doi: 10.1016/0378-5955(90)90170-T
  • 22. Grantham, D. W., and Wightman, F. L. (1978). “Detectability of varying interaural temporal differences,” J. Acoust. Soc. Am. 63, 511–523. doi: 10.1121/1.381751
  • 23. Green, D. M. (1983). “Profile analysis: A different view of auditory intensity discrimination,” Am. Psychol. 38, 133–142. doi: 10.1037/0003-066X.38.2.133
  • 24. Green, D. M., Mason, C. R., and Kidd, G. (1984). “Profile analysis: Critical bands and duration,” J. Acoust. Soc. Am. 75, 1163–1167. doi: 10.1121/1.390765
  • 25. Guinan, J. J. (2006). “Olivocochlear efferents: Anatomy, physiology, function, and the measurement of efferent effects in humans,” Ear Hear. 27, 589–607. doi: 10.1097/01.aud.0000240507.83072.e7
  • 26. Guinan, J. J., Backus, B. C., Lilaonitkul, W., and Aharonson, V. (2003). “Medial olivocochlear efferent reflex in humans: Otoacoustic emission (OAE) measurement issues and the advantages of stimulus frequency OAEs,” J. Assoc. Res. Otolaryngol. 4, 521–540. doi: 10.1007/s10162-002-3037-3
  • 27. Henry, K. S., Abrams, K. S., Forst, J., Mender, M. J., Neilans, E. G., Idrobo, F., and Carney, L. H. (2017). “Midbrain synchrony to envelope structure supports behavioral sensitivity to single-formant vowel-like sounds in noise,” J. Assoc. Res. Otolaryngol. 18, 165–181. doi: 10.1007/s10162-016-0594-4
  • 28. Herrmann, B., Augereau, T., and Johnsrude, I. S. (2020). “Neural responses and perceptual sensitivity to sound depend on sound-level statistics,” Sci. Rep. 10, 9571. doi: 10.1038/s41598-020-66715-1
  • 29. Herrmann, B., Maess, B., and Johnsrude, I. S. (2018). “Aging affects adaptation to sound-level statistics in human auditory cortex,” J. Neurosci. 38, 1989–1999. doi: 10.1523/JNEUROSCI.1489-17.2018
  • 30. Jennings, S. G., Chen, J., Fultz, S. E., Ahlstrom, J. B., and Dubno, J. R. (2018). “Amplitude modulation detection with a short-duration carrier: Effects of a precursor and hearing loss,” J. Acoust. Soc. Am. 143, 2232–2243. doi: 10.1121/1.5031122
  • 31. Jørgensen, S., and Dau, T. (2011). “Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing,” J. Acoust. Soc. Am. 130, 1475–1487. doi: 10.1121/1.3621502
  • 32. Jørgensen, S., Decorsière, R., and Dau, T. (2015). “Effects of manipulating the signal-to-noise envelope power ratio on speech intelligibility,” J. Acoust. Soc. Am. 137, 1401–1410. doi: 10.1121/1.4908240
  • 33. Jørgensen, S., Ewert, S. D., and Dau, T. (2013). “A multi-resolution envelope-power based model for speech intelligibility,” J. Acoust. Soc. Am. 134, 436–446. doi: 10.1121/1.4807563
  • 34. Joris, P. X., and Verschooten, E. (2013). “On the limit of neural phase locking to fine structure in humans,” Adv. Exp. Med. Biol. 787, 101–108. doi: 10.1007/978-1-4614-1590-9
  • 35. Kawase, T., Delgutte, B., and Liberman, M. C. (1993). “Antimasking effects of the olivocochlear reflex. II. Enhancement of auditory-nerve response to masked tones,” J. Neurophysiol. 70, 2533–2549. doi: 10.1152/jn.1993.70.6.2533
  • 36. Kidd, G., Mason, C. R., Arbogast, T. L., Brungart, D. S., and Simpson, B. D. (2003). “Informational masking caused by contralateral stimulation,” J. Acoust. Soc. Am. 113, 1594–1603. doi: 10.1121/1.1547440
  • 37. Kirk, C. E., and Smith, D. W. (2003). “Protection from acoustic trauma is not a primary function of the medial olivocochlear efferent system,” J. Assoc. Res. Otolaryngol. 4, 445–465. doi: 10.1007/s10162-002-3013-y
  • 38. Kumar, U. A., and Vanaja, C. S. (2004). “Functioning of olivocochlear bundle and speech perception in noise,” Ear Hear. 25, 142–146. doi: 10.1097/01.AUD.0000120363.56591.E6
  • 39. Levitt, H. (1971). “Transformed up-down methods in psychoacoustics,” J. Acoust. Soc. Am. 49, 467–477. doi: 10.1121/1.1912375
  • 40. Leys, C., Ley, C., Klein, O., Bernard, P., and Licata, L. (2013). “Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median,” J. Exp. Soc. Psychol. 49, 764–766. doi: 10.1016/j.jesp.2013.03.013
  • 41. Lilaonitkul, W., and Guinan, J. J. (2012). “Frequency tuning of medial-olivocochlear-efferent acoustic reflexes in humans as functions of probe frequency,” J. Neurophysiol. 107, 1598–1611. doi: 10.1152/jn.00549.2011
  • 42. Lopez-Poveda, E. A. (2018). “Olivocochlear efferents in animals and humans: From anatomy to clinical relevance,” Front. Neurol. 9, 1–18. doi: 10.3389/fneur.2018.00197
  • 43. Marrufo-Pérez, M. I., del Pilar Sturla-Carreto, D., Eustaquio-Martín, A., and Lopez-Poveda, E. A. (2020). “Adaptation to noise in human speech recognition depends on noise-level statistics and fast dynamic-range compression,” J. Neurosci. 40, 6613–6623. doi: 10.1523/JNEUROSCI.0469-20.2020
  • 44. Marrufo-Pérez, M. I., Eustaquio-Martín, A., Fumero, M. J., Gorospe, J. M., Polo, R., Gutiérrez Revilla, A., and Lopez-Poveda, E. A. (2019). “Adaptation to noise in amplitude modulation detection without the medial olivocochlear reflex,” Hear. Res. 377, 133–141. doi: 10.1016/j.heares.2019.03.017
  • 45. Marrufo-Pérez, M. I., Eustaquio-Martín, A., López-Bascuas, L. E., and Lopez-Poveda, E. A. (2018a). “Temporal effects on monaural amplitude-modulation sensitivity in ipsilateral, contralateral and bilateral noise,” J. Assoc. Res. Otolaryngol. 19, 147–161. doi: 10.1007/s10162-018-0656-x
  • 46. Marrufo-Pérez, M. I., Eustaquio-Martín, A., and Lopez-Poveda, E. A. (2018b). “Adaptation to noise in human speech recognition unrelated to the medial olivocochlear reflex,” J. Neurosci. 38, 4138–4145. doi: 10.1523/JNEUROSCI.0024-18.2018
  • 47. Mishra, S. K., and Lutman, M. E. (2014). “Top-down influences of the medial olivocochlear efferent system in speech perception in noise,” PLoS One 9, e85756. doi: 10.1371/journal.pone.0085756
  • 48. Moore, B. C. J., Huss, M., Vickers, D. A., Glasberg, B. R., and Alcantara, J. I. (2000). “A test for the diagnosis of dead regions in the cochlea,” Br. J. Audiol. 34, 205–224. doi: 10.3109/03005364000000131
  • 49. Moore, B. C. J., Mariathasan, S., and Sęk, A. P. (2019). “Effects of age and hearing loss on the discrimination of amplitude and frequency modulation for 2- and 10-Hz rates,” Trends Hear. 23, 1–12. doi: 10.1177/2331216519853963
  • 50. Moore, B. C. J., and Sek, A. (1992). “Detection of combined frequency and amplitude modulation,” J. Acoust. Soc. Am. 92, 3119–3131. doi: 10.1121/1.404208
  • 51. Moore, B. C. J., and Sek, A. (1994). “Effects of carrier frequency and background noise on the detection of mixed modulation,” J. Acoust. Soc. Am. 96, 741–751. doi: 10.1121/1.410312
  • 52. Moore, B. C. J., and Sek, A. (1995). “Effects of carrier frequency, modulation rate, and modulation waveform on the detection of modulation and the discrimination of modulation type (amplitude modulation versus frequency modulation),” J. Acoust. Soc. Am. 97, 2468–2478. doi: 10.1121/1.411967
  • 53. Moore, B. C. J., and Sek, A. (1996). “Detection of frequency modulation at low modulation rates: Evidence for a mechanism based on phase locking,” J. Acoust. Soc. Am. 100, 2320–2331. doi: 10.1121/1.417941
  • 54. Nelken, I., Bizley, J. K., Shamma, S. A., and Wang, X. (2014). “Auditory cortical processing in real-world listening: The auditory system going real,” J. Neurosci. 34, 15135–15138. doi: 10.1523/JNEUROSCI.2989-14.2014
  • 55. Oxenham, A. J. (2001). “Forward masking: Adaptation or integration?,” J. Acoust. Soc. Am. 109, 732–741. doi: 10.1121/1.1336501
  • 56. Oxenham, A. J., and Moore, B. C. J. (1994). “Modeling the additivity of nonsimultaneous masking,” Hear. Res. 80, 105–118. doi: 10.1016/0378-5955(94)90014-0
  • 57. Oxenham, A. J., and Shera, C. A. (2003). “Estimates of human cochlear tuning at low levels using forward and simultaneous masking,” J. Assoc. Res. Otolaryngol. 4, 541–554. doi: 10.1007/s10162-002-3058-y
  • 58. Rabinowitz, N. C., Willmore, B. D. B., Schnupp, J. W. H., and King, A. J. (2011). “Contrast gain control in auditory cortex,” Neuron 70, 1178–1191. doi: 10.1016/j.neuron.2011.04.030
  • 59. Rennies, J., Best, V., Roverud, E., and Kidd, G. (2019). “Energetic and informational components of speech-on-speech masking in binaural speech intelligibility and perceived listening effort,” Trends Hear. 23, 1–21. doi: 10.1177/2331216519854597
  • 60. Robinson, B. L., Harper, N. S., and McAlpine, D. (2016). “Meta-adaptation in the auditory midbrain under cortical influence,” Nat. Commun. 7, 1–8. doi: 10.1038/ncomms13442
  • 61. Rocchi, F., and Ramachandran, R. (2018). “Neuronal adaptation to sound statistics in the inferior colliculus of behaving macaques does not reduce the effectiveness of the masking noise,” J. Neurophysiol. 120, 2819–2833. doi: 10.1152/jn.00875.2017
  • 62. Roverud, E., Best, V., Mason, C. R., Swaminathan, J., and Kidd, G. (2016). “Informational masking in normal-hearing and hearing-impaired listeners measured in a nonspeech pattern identification task,” Trends Hear. 20, 1–17. doi: 10.1177/2331216516638516
  • 63. Russell, I. J., and Murugasu, E. (1997). “Medial efferent inhibition suppresses basilar membrane responses to near characteristic frequency tones of moderate to high intensities,” J. Acoust. Soc. Am. 102, 1734–1738. doi: 10.1121/1.420083
  • 64. Schairer, K. S., Ellison, J. C., Fitzpatrick, D., and Keefe, D. H. (2007). “Wideband ipsilateral measurements of middle-ear muscle reflex thresholds in children and adults,” J. Acoust. Soc. Am. 121, 3607–3616. doi: 10.1121/1.2722213
  • 65. Shera, C. A., and Guinan, J. J. (1999). “Evoked otoacoustic emissions arise by two fundamentally different mechanisms: A taxonomy for mammalian OAEs,” J. Acoust. Soc. Am. 105, 782–798. doi: 10.1121/1.426948
  • 66. Shera, C. A., Guinan, J. J., and Oxenham, A. J. (2002). “Revised estimates of human cochlear tuning from otoacoustic and behavioral measurements,” Proc. Natl. Acad. Sci. U.S.A. 99, 3318–3323. doi: 10.1073/pnas.032675099
  • 67. Simpson, A. J. R., Harper, N. S., Reiss, J. D., and McAlpine, D. (2014). “Selective adaptation to ‘oddball’ sounds by the human auditory system,” J. Neurosci. 34, 1963–1969. doi: 10.1523/JNEUROSCI.4274-13.2013
  • 68. Smith, R. L. (1977). “Short-term adaptation in single auditory nerve fibers: Some poststimulatory effects,” J. Neurophysiol. 40, 1098–1112. doi: 10.1152/jn.1977.40.5.1098
  • 69. Smith, R. L., and Zwislocki, J. J. (1975). “Short-term adaptation and incremental responses of single auditory-nerve fibers,” Biol. Cybern. 17, 169–182. doi: 10.1007/BF00364166
  • 70. Spiegel, M. F., and Green, D. M. (1982). “Signal and masker uncertainty with noise maskers of varying duration, bandwidth, and center frequency,” J. Acoust. Soc. Am. 71, 1204–1210. doi: 10.1121/1.387769
  • 71. Spiegel, M. F., Picardi, M. C., and Green, D. M. (1981). “Signal and masker uncertainty in intensity discrimination,” J. Acoust. Soc. Am. 70, 1015–1019. doi: 10.1121/1.386951
  • 72. Ulanovsky, N., Las, L., Farkas, D., and Nelken, I. (2004). “Multiple time scales of adaptation in auditory cortex neurons,” J. Neurosci. 24, 10440–10453. doi: 10.1523/JNEUROSCI.1905-04.2004
  • 73. Verhulst, S., Altoè, A., and Vasilkov, V. (2018). “Computational modeling of the human auditory periphery: Auditory-nerve responses, evoked potentials and hearing loss,” Hear. Res. 360, 55–75. doi: 10.1016/j.heares.2017.12.018
  • 74. Verschooten, E., Shamma, S., Oxenham, A. J., Moore, B. C. J., Joris, P. X., Heinz, M. G., and Plack, C. J. (2019). “The upper frequency limit for the use of phase locking to code temporal fine structure in humans: A compilation of viewpoints,” Hear. Res. 377, 109–121. doi: 10.1016/j.heares.2019.03.011
  • 75. Watkins, P. V., and Barbour, D. L. (2008). “Specialized neuronal adaptation for preserving input sensitivity,” Nat. Neurosci. 11, 1259–1261. doi: 10.1038/nn.2201
  • 76. Watkins, P. V., and Barbour, D. L. (2011). “Level-tuned neurons in primary auditory cortex adapt differently to loud versus soft sounds,” Cereb. Cortex 21, 178–190. doi: 10.1093/cercor/bhq079
  • 77. Wen, B., Wang, G. I., Dean, I., and Delgutte, B. (2009). “Dynamic range adaptation to sound level statistics in the auditory nerve,” J. Neurosci. 29, 13797–13808. doi: 10.1523/JNEUROSCI.5610-08.2009
  • 78. Wen, B., Wang, G. I., Dean, I., and Delgutte, B. (2012). “Time course of dynamic range adaptation in the auditory nerve,” J. Neurophysiol. 108, 69–82. doi: 10.1152/jn.00055.2012
  • 79. Whiteford, K. L., Kreft, H. A., and Oxenham, A. J. (2020). “The role of cochlear place coding in the perception of frequency modulation,” Elife 9, 1–26. doi: 10.7554/eLife.58468
  • 80. Whiteford, K. L., and Oxenham, A. J. (2015). “Using individual differences to test the role of temporal and place cues in coding frequency modulation,” J. Acoust. Soc. Am. 138, 3093–3104. doi: 10.1121/1.4935018
  • 81. Winslow, R. L., and Sachs, M. B. (1987). “Effect of electrical stimulation of the crossed olivocochlear bundle on auditory nerve response to tones in noise,” J. Neurophysiol. 57, 1002–1021. doi: 10.1152/jn.1987.57.4.1002
  • 82. Winslow, R. L., and Sachs, M. B. (1988). “Single-tone intensity discrimination based on auditory-nerve rate responses in backgrounds of quiet, noise, and with stimulation of the crossed olivocochlear bundle,” Hear. Res. 35, 165–189. doi: 10.1016/0378-5955(88)90116-5
  • 83. Wojtczak, M. (2011). “The effect of carrier level on tuning in amplitude-modulation masking,” J. Acoust. Soc. Am. 130, 3916–3925. doi: 10.1121/1.3658475
  • 84. Wojtczak, M., Beim, J. A., and Oxenham, A. J. (2017). “Weak middle-ear-muscle reflex in humans with noise-induced tinnitus and normal hearing may reflect cochlear synaptopathy,” eNeuro 4, ENEURO.0363-17.2017. doi: 10.1523/ENEURO.0363-17.2017
  • 85. Wojtczak, M., Klang, A. M., and Torunsky, N. T. (2019). “Exploring the role of medial olivocochlear efferents on the detection of amplitude modulation for tones presented in noise,” J. Assoc. Res. Otolaryngol. 20, 395–413. doi: 10.1007/s10162-019-00722-6
  • 86. Wojtczak, M., and Oxenham, A. J. (2009). “Pitfalls in behavioral estimates of basilar-membrane compression in humans,” J. Acoust. Soc. Am. 125, 270–281. doi: 10.1121/1.3023063
  • 87. Xiang, J., Poeppel, D., and Simon, J. Z. (2013). “Physiological evidence for auditory modulation filterbanks: Cortical responses to concurrent modulations,” J. Acoust. Soc. Am. 133, EL7–EL12. doi: 10.1121/1.4769400
  • 88. Yaron, A., Hershenhoren, I., and Nelken, I. (2012). “Sensitivity to complex statistical regularities in rat auditory cortex,” Neuron 76, 603–615. doi: 10.1016/j.neuron.2012.08.025
  • 89. Zilany, M. S. A., and Carney, L. H. (2010). “Power-law dynamics in an auditory-nerve model can account for neural adaptation to sound-level statistics,” J. Neurosci. 30, 10380–10390. doi: 10.1523/JNEUROSCI.0647-10.2010
  • 90. Zwicker, E. (1956). “Die elementaren Grundlagen zur Bestimmung der Informationskapazität des Gehörs” (“The elemental foundations for determining the information capacity of the auditory system”), Acustica 6, 365–381.
  • 91. Zwicker, E. (1970). “Masking and psychological excitation as consequences of the ear's frequency analysis,” in Frequency Analysis and Periodicity Detection in Hearing, edited by R. Plomp and G. F. Smoorenburg (Sijthoff, Leiden), pp. 376–396.

Articles from The Journal of the Acoustical Society of America are provided here courtesy of Acoustical Society of America