PLOS Biology. 2021 Oct 20;19(10):e3001439. doi: 10.1371/journal.pbio.3001439

Understanding degraded speech leads to perceptual gating of a brainstem reflex in human listeners

Heivet Hernández-Pérez 1,*, Jason Mikiel-Hunter 1, David McAlpine 1, Sumitrajit Dhar 2, Sriram Boothalingam 3, Jessica J M Monaghan 1,4,#, Catherine M McMahon 1,#
Editor: Manuel S Malmierca5
PMCID: PMC8559948  PMID: 34669696

Abstract

The ability to navigate “cocktail party” situations by focusing on sounds of interest over irrelevant, background sounds is often considered in terms of cortical mechanisms. However, subcortical circuits such as the pathway underlying the medial olivocochlear (MOC) reflex modulate the activity of the inner ear itself, supporting the extraction of salient features from the auditory scene prior to any cortical processing. To understand the contribution of auditory subcortical nuclei and the cochlea in complex listening tasks, we made physiological recordings along the auditory pathway while listeners engaged in detecting non(sense) words in lists of words. Both naturally spoken and intrinsically noisy, vocoded speech—filtering that mimics processing by a cochlear implant (CI)—significantly activated the MOC reflex, but this was not the case for speech in background noise, which instead engaged midbrain and cortical resources. A model of the initial stages of auditory processing reproduced specific effects of each form of speech degradation, providing a rationale for goal-directed gating of the MOC reflex based on enhancing the representation of the energy envelope of the acoustic waveform. Our data reveal the coexistence of 2 strategies in the auditory system that may facilitate speech understanding in situations where the signal is either intrinsically degraded or masked by extrinsic acoustic energy. Whereas intrinsically degraded streams recruit the MOC reflex to improve representation of speech cues peripherally, extrinsically masked streams rely more on higher auditory centres to denoise signals.


Understanding speech in background noise is critical to human communication. This study highlights a key role for neural feedback circuits that modulate the activity of the inner ear, enabling effective listening to degraded speech.

Introduction

Robust cocktail party listening, the ability to focus on a single talker in a background of simultaneous, overlapping conversations, is critical to human communication and a long-sought goal of hearing technologies [1,2]. Problems listening in background noise are a key complaint of many listeners with even mild hearing loss and a stated factor in the non-use and non-uptake of hearing devices [3–5]. However, despite their importance in everyday listening tasks and relevance to hearing impairment, physiological mechanisms that enhance attended speech remain poorly understood. In addition to local circuits in the auditory periphery and brainstem that have evolved to automatically enhance the neural representation of ecologically relevant sounds [6–8], it is likely that such a critical goal-directed behaviour as cocktail party listening also relies on top-down, cortically driven processes to emphasise perceptually relevant sounds and suppress those that are irrelevant [9,10]. Nevertheless, the specific roles of bottom-up and top-down mechanisms in complex listening tasks remain to be determined.

A potential mechanistic pathway supporting cocktail party listening is the auditory efferent system, whose multisynaptic connections extend from the auditory cortex (AC) to the inner ear [11–13]. In particular, sound-evoked, reflexive activation of fibres in the medial olivocochlear (MOC) reflex innervating the outer hair cells (OHCs—electromotile elements responsible for the cochlea’s active amplifier) is known to reduce cochlear gain [14], increasing the overall dynamic range of the inner ear and facilitating sound encoding in high levels of background noise [15].

MOC fibres (ipsilateral and contralateral to each ear) originate in medial divisions of the superior olivary complex in the brainstem and synapse on the basal aspects of OHCs, directly modulating mechanical [16,17] and indirectly modulating neural [18,19] sensitivity to sound. MOC neurons are also innervated by descending fibres from AC and midbrain neurons [11,20,21], providing a potential means by which the MOC reflex might be gated perceptually, either by directly exciting/inhibiting MOC fibres or by modulating their stimulus-evoked reflexive activity [22–25]. Although it has been speculated that changes in cochlear gain mediated by the MOC reflex might enhance speech coding in background noise [26–28], its role in reducing cochlear gain during goal-directed listening in normal-hearing human listeners (i.e., those with physiologically normal OHCs) remains unclear. In particular, it is unknown under which conditions the MOC reflex is active, including whether listeners must actively be engaged in a listening task for this to occur [27,29,30].

MOC reflex–mediated changes in cochlear gain can be assessed by measuring otoacoustic emissions (OAEs), energy generated by the active OHCs and measured noninvasively as sound from the ear canal [31]. When transient sounds such as clicks are delivered to one ear in the presence of noise in the opposite ear, OAE amplitudes are expected to be reduced, reflecting increased MOC reflex activity [32]. However, the extent to which OAEs are suppressed has been reported as positively [29,33,34] or negatively [27,35] correlated, or even uncorrelated [36,37], with performance in speech-in-noise tasks. Modulation of cochlear gain through the MOC reflex could depend on factors such as task difficulty or relevance (e.g., speech versus nonspeech tasks) and even methodological differences in the way in which inner ear signatures such as OAEs are recorded and analysed [29,38].

Here, we sought to determine whether cochlear gain is modulated in a task-dependent manner by selective recruitment of the MOC reflex. If the MOC reflex is sensitive to goal-directed control and improves understanding of degraded speech, then increases in task difficulty should be accompanied by reduced cochlear gain. We therefore assessed the extent to which the MOC reflex controls cochlear gain in active versus passive listening, i.e., when participants were required to attend to speech stimuli in order to complete a listening task compared to when they were not required to attend and instead watched a silent, non-subtitled film. In order to manipulate task difficulty, we employed speech sounds in background noise—stimuli traditionally used to evoke the MOC reflex [39–41]—and noise vocoding of “natural,” clean speech—filtering that mimics processing by a cochlear implant (CI) [42]. Unlike speech in noise, noise-vocoded speech allows for manipulation of intelligibility without the addition of spectrally broadband acoustic energy that intrinsically activates the MOC reflex [43–46]. Noise-vocoded speech should therefore enable better detection of any modulation of the MOC reflex by perceptual gating as task difficulty increases.
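As a concrete illustration of the noise-vocoding procedure described above (band-pass analysis, envelope extraction, and envelope modulation of a noise carrier), a minimal sketch is given below. Channel edges, filter orders, and the envelope cutoff are illustrative assumptions rather than the parameters used in this study.

```python
# Minimal noise-vocoder sketch (illustrative only; channel spacing, filter
# orders, and the 30-Hz envelope cutoff are assumptions, not study parameters).
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocode(speech, fs, n_channels=8, f_lo=100.0, f_hi=4000.0):
    """Replace the fine structure in each band with envelope-modulated noise."""
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_channels + 1)
    carrier = np.random.randn(len(speech))          # broadband noise carrier
    vocoded = np.zeros_like(speech, dtype=float)
    env_lp = butter(2, 30.0, btype="low", fs=fs, output="sos")
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band_speech = sosfiltfilt(band, speech)
        band_noise = sosfiltfilt(band, carrier)
        envelope = np.clip(sosfiltfilt(env_lp, np.abs(band_speech)), 0.0, None)
        modulated = band_noise * envelope
        # Match the RMS of the original band so the overall level is preserved
        rms_in = np.sqrt(np.mean(band_speech ** 2)) + 1e-12
        rms_out = np.sqrt(np.mean(modulated ** 2)) + 1e-12
        vocoded += modulated * (rms_in / rms_out)
    return vocoded
```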

Physiological recordings in the central auditory pathway, including brainstem, midbrain, and cortical responses, were made while listeners performed an active listening task (detecting non-words in a string of Australian-English words and non-words). Importantly, our experimental paradigm was designed to maintain fixed levels of task difficulty. This allowed us to preserve comparable task relevance across different speech manipulations and avoid confounding effects of task difficulty on attention-gated activation of the MOC reflex. Additionally, visual and auditory scenes were identical across conditions to control for differences in alertness between active and passive listening.

We found that when task difficulty was maintained across speech manipulations, measures of hearing function at the level of the cochlea, brainstem, midbrain, and cortex were modulated depending on the type of degradation applied to speech sounds and on whether speech was actively attended. Specifically, the MOC reflex, assessed in terms of the suppression of click-evoked otoacoustic emissions (CEOAEs), was activated by noise-vocoded speech—an intrinsically degraded speech signal—but not by otherwise “natural” speech presented in either babble noise (BN; i.e., noise consisting of 8 talkers) or speech-shaped noise (SSN; i.e., noise sharing the long-term average spectrum of our speech corpus). Further, neural activity in the auditory midbrain was significantly increased in active versus passive listening for speech in BN and SSN, but not for noise-vocoded speech. This increase was associated with elevated cortical markers of listening effort for the speech-in-noise conditions. A model of the peripheral auditory system and its processes, including the MOC reflex, confirmed the stimulus-dependent role of the MOC reflex in enhancing neural coding of the speech envelope—a feature strongly correlated with the decoding and understanding of speech [47–51]. Our data suggest that otherwise identical performance in active listening tasks may invoke quite different efferent circuits, requiring not only different levels, but also different kinds, of listening effort, depending on the type of stimulus degradation experienced.

Results

Maintaining task relevance across speech manipulations requires isoperformance

We assessed speech intelligibility—specifically the ability to discriminate between Australian-English words and non-words—when speech was degraded by 3 different manipulations: noise vocoding the entire speech waveform; adding 8-talker BN to “natural” speech in quiet; or adding SSN to “natural” speech. Participants were asked to make this lexical decision (by means of a button press) when they heard a non-word in a string of words and non-words (Fig 1A and 1C).

Fig 1. Behavioural and physiological measurements during active and passive listening.


(A) Schematic representation of the experimental paradigm. Clicks were continuously presented for 12 minutes (grey-striped rectangles) in each experimental condition (e.g., active listening of natural speech). A 1-minute recording of baseline CEOAE magnitudes (black rectangles) was made at the beginning and at the end of each experimental condition. However, due to artefacts in CEOAE recordings, only the initial CEOAE baseline proved to be of sufficient quality for analysis. Speech tokens (blue and black-striped rectangles) were presented for 10 minutes to the ear contralateral to the ear receiving clicks. ABRs and ERPs were also recorded during this time frame. Following presentation of a word or non-word, participants had 3 seconds to make a lexical decision in the active listening condition (i.e., to determine whether an utterance was a word or a non-word) by pressing a button, or they remained silent in the passive condition by ignoring all auditory stimuli while watching a movie. (B) Schematic shows how click stimuli generated both CEOAEs (analysis window enclosed in rectangle) as well as brainstem activity (ABRs). (C) Corresponds to the time course of natural and degraded speech stimuli in relation to cortical activity (ERPs). (D) Performance during the lexical decision task. Mean d’ (a measure of accuracy calculated as Z(correct responses) − Z(false alarms), where Z is the inverse cumulative normal function [NORMSINV]), denoted as white circles (n = 27 in the noise-vocoded condition and n = 29 in the 2 masked conditions). The horizontal line denotes the median. Upper and lower limits of the boxplot represent first (q1) and third (q3) quartiles, respectively, while whiskers denote the upper (q3+1.5*IQR) and lower bounds (q1-1.5*IQR) of the data, where IQR is the interquartile range calculated as (q3-q1). Post hoc pairwise comparisons showed that highest performance was always achieved for natural speech compared to noise-vocoded, BN, and SSN manipulations (Bonferroni corrected p = 0.001). Performance was moderately high (statistically lower than natural speech but higher than Voc8, BN5, and SSN3) for Voc16/Voc12 (combined due to n.s. differences, Bonferroni corrected p = 0.11) as well as for BN10 and SSN8, respectively. The lowest level of performance was predictably observed for Voc8, BN5, and SSN3 (p = 0.001). (E) CEOAEs and ABRs collapsed (rANOVA main effect of condition: active versus passive). CEOAEs and ABRs are reported as means ± SEM. (**Bonferroni corrected p < 0.01; *Bonferroni corrected p < 0.05). (F) CEOAE suppression. The figure shows mean CEOAE magnitude (dB SPL) changes relative to the baseline for all conditions, where negative values represent an increase in suppression (activation of the MOC reflex). The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. ABR, auditory brainstem response; BN, babble noise; CEOAE, click-evoked otoacoustic emission; ERP, event-related potential; IQR, interquartile range; n.s., nonsignificant; rANOVA, repeated measures ANOVA; SSN, speech-shaped noise.

Distinct levels of task difficulty were achieved by altering either the number of noise-vocoded channels—16 (Voc16), 12 (Voc12), and 8 (Voc8) channels—or by altering the signal-to-noise ratio (SNR) when speech was masked by BN—+10 (BN10) and +5 (BN5) dB SNR—or SSN—+8 (SSN8) and +3 (SSN3) dB SNR—(Fig 1D). This modulation of task difficulty was statistically confirmed in all 56 listeners (n = 27 in the vocoded condition and n = 29 in the 2 masked conditions), who showed consistently better performance—a higher rate of detecting non-words—in the less degraded conditions, i.e., more vocoded channels or higher SNRs in the masked manipulations: repeated measures ANOVA (rANOVA) [vocoded: F (3, 78) = 70.92, p = 0.0001, η2 = 0.73; BN: F (3, 78) = 70.92, p = 0.0001, η2 = 0.80; and SSN: F (2, 56) = 86.23, p = 0.0001, η2 = 0.75]; see Fig 1D for post hoc analysis with Bonferroni corrections (6 multiple comparisons for the noise-vocoded experiment and 3 for BN and SSN manipulations).
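The accuracy measure reported in Fig 1D (d', the difference between the z-transformed correct-response and false-alarm rates) can be computed as sketched below. The correction applied to proportions of exactly 0 or 1 is a common convention and an assumption here, not necessarily the authors' exact procedure.

```python
# Illustrative d-prime calculation (the clipping rule for perfect scores is a
# common convention, not necessarily the one used in the study).
from scipy.stats import norm

def d_prime(n_hits, n_targets, n_false_alarms, n_nontargets):
    """d' = Z(hit rate) - Z(false-alarm rate), Z being the inverse normal CDF."""
    hit_rate = n_hits / n_targets
    fa_rate = n_false_alarms / n_nontargets
    # Avoid infinite Z values when rates are exactly 0 or 1
    eps_hit = 1.0 / (2 * n_targets)
    eps_fa = 1.0 / (2 * n_nontargets)
    hit_rate = min(max(hit_rate, eps_hit), 1 - eps_hit)
    fa_rate = min(max(fa_rate, eps_fa), 1 - eps_fa)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical example: 45/50 non-words detected, 5/50 words incorrectly flagged
print(d_prime(45, 50, 5, 50))   # ~2.6
```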

Isoperformance was achieved across speech manipulations, with best performance observed in the 2 natural speech conditions—one employed during the noise vocoding experiment and the other during the masking experiments: one-way ANOVA [F (1, 54) = 0.43, p = 0.84, η2 = 0.001]. A moderate and similar level of performance (significantly lower than performance for natural speech) was achieved across Voc16/Voc12 (Voc16 versus Voc12: nonsignificant [n.s.] post hoc t test: [t (26) = 2.53, p = 0.11, d = 0.34]), BN10, and SSN8 conditions: one-way ANOVA: [F (3, 108) = 0.67, p = 0.57, η2 = 0.018]. The poorest performance, significantly lower than the high and moderate performance levels, was observed for Voc8, BN5, and SSN3: one-way ANOVA: [F (3, 81) = 0.07, p = 0.72, η2 = 0.008].

Increasing task difficulty has been linked to the allocation of auditory attention and cognitive resources towards the task itself [52]. We, therefore, employed the discrete and matching levels of task difficulty across speech manipulations as a proxy for the required auditory attention.

MOC reflex is modulated by task engagement in a stimulus-dependent manner

To determine whether auditory attention modulates cochlear gain via the auditory efferent system in a task-dependent manner, we assessed the effect of active versus passive listening and speech manipulation on the activity of the inner ear. CEOAEs were recorded continuously while participants either actively performed the lexical task or passively listened to the same corpus of speech tokens (Fig 1A). We first confirmed that CEOAE amplitudes were significantly reduced relative to a baseline measure (obtained in the absence of speech, Fig 1A) within each stimulus manipulation (planned t test comparisons, S1 Table). CEOAEs were significantly reduced in magnitude when actively listening to natural speech and all noise-vocoded stimuli (natural: [t (24) = 2.33, p = 0.03, d = 0.50]; Voc16: [t (23) = 3.40, p = 0.002, d = 0.69]; Voc12: [t (24) = 3.98, p = 0.001, d = 0.80] and Voc8: [t (25) = 5.14, p = 0.001, d = 1.00]). Conversely, during passive listening, CEOAEs obtained during natural, but not noise-vocoded speech were significantly smaller than baseline: [t (25) = 2.29, p = 0.03, d = 0.44] (S1 Table). This was also true of CEOAEs recorded during the 2 masked conditions at all SNRs (natural: [t (26) = 2.17, p = 0.04, d = 0.42]; BN10: [t (28) = 2.80, p = 0.009, d = 0.52] and BN5: [t (28) = 2.36, p = 0.02, d = 0.44]; SSN8: [t (28) = 3.37, p = 0.002, d = 0.63] and SSN3: [t (28) = 3.50, p = 0.002, d = 0.65]). This suggests that the MOC reflex is gated differently in active and passive listening, and by the different types of speech manipulation, despite listeners achieving isoperformance across experimental conditions (i.e., comparable levels of lexical discrimination).

We calculated the reduction in CEOAEs between baseline and experimental conditions (CEOAE suppression, a proxy for activation of the MOC reflex) to quantify efferent control of cochlear gain in active and passive listening. For noise-vocoded speech, suppression of CEOAEs was significantly greater when participants were actively engaged in the lexical task compared to when they were asked to ignore the auditory stimuli: rANOVA: [F (1, 22) = 8.49, p = 0.008, η2 = 0.28] (Fig 1E and 1F). Moreover, we observed a significant interaction between conditions and stimulus type: [F (3, 66) = 2.80, p = 0.046, η2 = 0.12], indicating that the suppression of CEOAEs was stronger for all vocoded conditions in which listeners were required to make lexical decisions, compared to when they were not (Fig 1F)—Voc16: [t (23) = −2.16, p = 0.04, d = 0.44]; Voc12: [t (24) = −2.19, p = 0.038, d = 0.44] and Voc8: [t (25) = 3.51, p = 0.002, d = 0.69]. Engagement in the task did not modulate CEOAE suppression for the natural speech condition: [t (24) = 0.62, p = 0.54, d = 0.12].

By contrast, speech embedded in SSN elicited the opposite pattern of results to noise-vocoded speech (Fig 1E and 1F). The suppression of CEOAEs was significantly stronger during passive, compared to active, listening (Fig 1F): rANOVA: [F (1, 24) = 4.44, p = 0.046, η2 = 0.16], and we observed a significant interaction between condition and stimulus type: [F (2, 48) = 4.67, p = 0.014, η2 = 0.16] for both SNRs: SSN8 [t (27) = 2.71, p = 0.01, d = 0.51] and SSN3 [t (28) = 2.67, p = 0.012, d = 0.50]. We also observed a mild suppression of CEOAEs for speech masked by BN, with CEOAEs significantly smaller than their own baseline measures only during passive listening (shown in the planned t test, S1 Table), but not when active and passive conditions were compared (Fig 1E and 1F) (rANOVA n.s.: F (1, 25) = 1.21, p = 0.28, η2 = 0.05). Cochlear gain was, therefore, suppressed during active listening of noise-vocoded speech, slightly but significantly suppressed during passive listening in BN, and strongly suppressed during passive listening in SSN. Together, our data suggest that the MOC reflex is modulated by task engagement and strongly depends on the way in which signals are degraded, including the type of noise used to mask speech.
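In practical terms, the suppression metric plotted in Fig 1F is the change in CEOAE level relative to the speech-free baseline, with more negative values indicating stronger MOC reflex activity. A minimal sketch is shown below; the averaging and level calculation are assumptions, not the study's exact analysis pipeline.

```python
# Sketch of the CEOAE suppression metric: change in emission level (dB SPL)
# relative to the speech-free baseline; negative values = more suppression
# (stronger MOC-reflex activity). Windowing/averaging choices are assumptions.
import numpy as np

P_REF = 20e-6  # reference pressure, 20 micropascals

def ceoae_level_db_spl(pressure_waveforms):
    """RMS level of the CEOAE waveform averaged across click presentations."""
    mean_waveform = np.mean(pressure_waveforms, axis=0)   # average over clicks
    rms = np.sqrt(np.mean(mean_waveform ** 2))
    return 20 * np.log10(rms / P_REF)

def ceoae_suppression(condition_waveforms, baseline_waveforms):
    """Negative values indicate suppression relative to baseline (cf. Fig 1F)."""
    return (ceoae_level_db_spl(condition_waveforms)
            - ceoae_level_db_spl(baseline_waveforms))
```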

Auditory brainstem activity reflects changes in cochlear gain when listening to speech in noise

The effects of active versus passive listening on cochlear gain were evident in the activity of subcortical auditory centres when we simultaneously measured auditory brainstem responses (ABRs) to the same clicks used to evoke CEOAEs. Click-evoked ABRs largely reflect summed activity of higher-frequency regions of the cochlea (3 to 8 kHz [53,54]). However, as CEOAE suppression in the 1 to 2 kHz band is used here as a marker for MOC reflex activity across the entire length of the cochlea (see Materials and methods), we can relate observed changes in cochlear gain to amplitudes of ABR waves.

Click-evoked ABRs—measured during presentation of speech in noise—showed similar effects to those observed for CEOAE measurements. Specifically, in both masked conditions, wave V of the ABR—corresponding to neural activity generated in the midbrain nucleus of the inferior colliculus (IC)—was significantly enhanced in the active, compared to the passive, listening condition (Fig 1E) (speech in BN: [F (1, 26) = 5.66, p = 0.025, η2 = 0.20] and SSN: [F (1, 26) = 9.22, p = 0.005, η2 = 0.26]). No changes in brainstem or midbrain activity were observed between active and passive listening of noise-vocoded speech.

To exclude the possibility that this stimulus-dependent pattern of inner ear and brainstem responses arose from intrinsic differences between the 2 populations of listeners tested (noise-vocoded versus masked speech experiments), we compared CEOAE suppression as well as the amplitude of ABR waves between the 2 groups for active and passive listening of natural speech. No statistical differences were observed for either active or passive listening between the 2 groups (active natural condition: suppression of CEOAEs [t (23) = −0.21, p = 0.83, d = −0.04]; wave III [t (23) = −0.45, p = 0.65, d = 0.09]; wave V [t (23) = 0.09, p = 0.93, d = 0.02]; passive natural condition: suppression of CEOAEs [t (24) = −0.36, p = 0.72, d = 0.07]; wave III [t (24) = −0.16, p = 0.88, d = 0.03]; wave V [t (26) = 0.40, p = 0.69, d = 0.05]). We conclude from this that the differences observed in cochlear gain and auditory brainstem/midbrain activity can be attributed to the specific form of speech degradation.

Together with the effect on CEOAEs, these data suggest that the magnitude of auditory midbrain activity for the different speech manipulations reflects cochlear output. While this is evident for both listening conditions in masked speech, the similarity of ABR magnitudes in the midbrain for active and passive listening of noise-vocoded stimuli is indicative of feed-forward amplification that compensates for reduced cochlear gain during active listening. This highlights an increased emphasis on peripheral processing for noise-vocoded, compared to noise-masked, speech and suggests that processing by higher-order auditory centres may be involved in decoding masked speech.

Simulated MOC reflex improves the neural representation of noise-vocoded speech, but not speech in noise

Previous modelling studies have supported the ability of the MOC reflex to “unmask” signals in noise in the auditory nerve (AN) [55–62] and, therefore, do not provide a suitable rationale for the absence of CEOAE suppression (i.e., a lack of MOC reflex activity) when participants actively listened to speech in BN or SSN (Fig 1E and 1F). To determine how the neural representation of degraded speech differs in the AN with and without the MOC reflex, and whether this might explain the stimulus dependence of CEOAE suppression in Fig 1E and 1F, we implemented a model of the initial auditory stages (outer, middle, and inner ear with the AN) that includes an MOC reflex [58,63–65]. We focused on how the MOC reflex affects the encoding of the energy envelope of acoustic waveforms: considered critical to speech understanding (especially noise-vocoded speech [42,50]) [42,50,66,67], strongly correlated with the cortical tracking and decoding of speech [47–49], and the basis of several successful speech intelligibility models [51,68,69].

Here, we tested the hypothesis that efferent suppression of cochlear gain differentially impacts neural encoding of masked and noise-vocoded stimulus envelopes in AN fibres. Natural and degraded speech tokens (those generating the lowest level of isoperformance in the active task: Voc8, BN5, and SSN3; Fig 1D) were presented to the model at 75 dBA, with and without a fixed 15 dB attenuation generated by the MOC reflex (Fig 2A). Responses of 400 AN fibres with low spontaneous rate (LSR) and high thresholds were simulated in each of 30 frequency channels, logarithmically spaced between 0.1 and 4.5 kHz, forming the model’s output (Fig 2A and 2B). We chose this type of AN fibre for our model because of their apparently critical role in processing sounds in high levels of background noise [70–72].
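The sketch below illustrates the simulation configuration described above (30 log-spaced channels, a nominal 75 dB presentation level, paired stimulus polarities, and a fixed 15 dB efferent attenuation). It is not the MAP_BS interface; the function names, the dB SPL scaling used as a stand-in for dBA, and the way the attenuation is applied are simplifying assumptions.

```python
# Illustrative setup for the simulations described above (not the MAP_BS API).
import numpy as np

def log_spaced_cfs(f_lo=100.0, f_hi=4500.0, n_channels=30):
    """30 characteristic frequencies, log-spaced between 0.1 and 4.5 kHz."""
    return np.logspace(np.log10(f_lo), np.log10(f_hi), n_channels)

def scale_to_db_spl(x, target_db=75.0, p_ref=20e-6):
    """Scale a waveform (in pascals) to the target RMS level in dB SPL
    (used here as a stand-in for the 75 dBA presentation level)."""
    target_rms = p_ref * 10 ** (target_db / 20)
    return x * (target_rms / np.sqrt(np.mean(x ** 2)))

def polarity_pair(x):
    """Normal and inverted copies, as used for the sumcor analysis below."""
    return x, -x

def apply_fixed_moc_attenuation(x, attenuation_db=15.0):
    """Crude stand-in for the simulated MOC reflex: in the real model the
    fixed 15-dB attenuation acts on cochlear gain inside the DRNL filter
    bank, not on the input waveform as done here for illustration."""
    return x * 10 ** (-attenuation_db / 20)
```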

Fig 2. Output of model of the initial auditory stages with and without inclusion of simulated MOC reflex.


(A) Schematic of model showing input stimulus and neural output. Normal and inverted polarity waveforms of words (here, the word “Like”) were presented to the MAP_BS model that incorporates a cascade of stages from outer and middle ear to AN output. Attenuation of cochlear gain by the simulated MOC reflex was implemented at the “modified DRNL filter bank” stage [58,63]. Responses of 400 LSR fibres (shown here in the form of raster plots for normal and polarity-inverted waveforms of the word “Like”) constituted the model output at the AN stage. (B) Presentation of natural and degraded versions of the word “Like” with and without simulated MOC reflex. Normal “Like” waveforms for natural (dark grey, far left), Voc8 (pink, second left), BN5 (light blue, second right), and SSN3 (green, far right) conditions are shown in the top row. Post-stimulus time histograms (average response of 400 fibres binned at 1 ms) were calculated for LSR AN fibres (characteristic frequency: 2.33 kHz) with (bottom row) and without (middle row) simulated MOC reflex. Including the simulated MOC reflex reduced activity during quiet for the natural condition (and Voc8, but less so) while maintaining high spiking rates at peak sound levels (e.g., at 0.075, 0.3, and 0.45 s). No changes in neural representation of the signal were visually evident for BN5 and SSN3 “Like”. (C and D) Quantifying ρENV for Voc8 “Like” without simulated MOC reflex in the 2.33-kHz channel. Sumcor plots (bottom row, C) were generated by adding shuffled autocorrelograms (thick lines, top left/middle panels, C) or shuffled cross-correlograms (thick line, top right panel, C) to shuffled cross-polarity correlograms (thin lines, top row, C) to compare neural envelopes of the control condition, naturally spoken “Like” with simulated MOC reflex (left/right columns, C), and the test condition, Voc8 “Like” without simulated MOC reflex (middle/right columns, C). ρENV for Voc8 “Like” without simulated MOC reflex (AN, solid-pink bar, D) was calculated from sumcor peaks in C [73]. Value of ρENV with simulated MOC reflex (AN+MOC, speckled pink bar, D) is also displayed. (E and F) Comparing ΔρENVs for 100 words after introduction of simulated MOC reflex. Mean percentage changes in ρENVs (calculated in 2 frequency bands: below and above 1.5 kHz) after adding simulated MOC reflex were plotted as a function of ρENV without simulated MOC reflex for degraded versions of 100 words (each symbol represents one word). ΔρENVs were positive for all Voc8 words except 1 (pink circles, E) (Max-Min ΔρENV for Voc8 for <1.5 kHz: +17.62 to −0.78%; Max-Min ΔρENV for Voc8 for >1.5 kHz: +16.14 to −0.62%), appearing largest for words with lowest ρENVs without simulated MOC reflex. This relationship was absent for BN5 (light blue squares, E) and SSN3 (green diamonds, E) words, whose ΔρENV ranges spanned the baseline (Max-Min ΔρENV for BN5 for <1.5 kHz: +3.06 to −4.09%; Max-Min ΔρENV for BN5 for >1.5 kHz: +6.08 to −13.56%; Max-Min ΔρENV for SSN3 for <1.5 kHz: +4.52 to −3.58%; Max-Min ΔρENV for SSN3 for >1.5 kHz: +0.50 to −11.39%). Progression of mean ΔρENVs (± SEM) for model data >1.5 kHz (checkerboard bars, right, F) mirrored that of the active listening task, CEOAE data (mean ± SEM) (solid colour bars, left, F). The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. AN, auditory nerve; CEOAE, click-evoked otoacoustic emission; DRNL, dual resonance nonlinear; LSR, low spontaneous rate; MAP_BS, Matlab Auditory Periphery and Brainstem; MOC, medial olivocochlear.

To assess how the energy envelope of degraded speech tokens was represented in the output of the population of AN fibres, both normal and polarity-inverted copies of each token were presented to the model for processing (Fig 2A and 2B). The polarity-tolerant component, associated with the stimulus envelope [50,73–77], was extracted from the responses of AN fibres using sumcor analysis (Fig 2C). We compared the similarity of neural envelopes between conditions by computing the neural cross-correlation coefficient, ρENV, in each frequency channel (Fig 2D) [50,73]. Values of ρENV, ranging from 0 to 1 for independent to identical neural envelopes, respectively, were calculated for the 3 speech manipulations without (ρENVAN) and with (ρENVMOC) the MOC reflex included in the model. In each case, the neural envelope for natural speech acted as the “control” template for comparison. “Control” simulations of natural speech were performed with the MOC reflex included, based on our observations that a steady CEOAE suppression—indicative of an active MOC reflex—occurred for natural speech experimentally (Fig 1F) and that neural envelopes for natural sentence stimuli were enhanced in model AN fibres with an MOC reflex present (S1 Fig).
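At a sketch level, the polarity-tolerant (envelope-dominated) component can be approximated by averaging the responses to the two stimulus polarities, after which test and control neural envelopes are correlated. The code below follows that logic with simple peri-stimulus time histograms; it is a simplification of the shuffled-correlogram and sumcor analysis of [50,73], not a reimplementation of it.

```python
# Simplified stand-in for the sumcor / rho_ENV analysis (cf. refs [50,73]).
# Assumes spike times (in seconds) from many repetitions of each polarity.
import numpy as np

def psth(spike_trains, duration, bin_s=1e-3):
    """Average firing pattern across repetitions, binned at bin_s."""
    edges = np.arange(0.0, duration + bin_s, bin_s)
    counts = np.zeros(len(edges) - 1)
    for spikes in spike_trains:
        counts += np.histogram(spikes, bins=edges)[0]
    return counts / len(spike_trains)

def envelope_response(trains_pos, trains_neg, duration, bin_s=1e-3):
    """Polarity-tolerant (envelope-dominated) component: the mean of the PSTHs
    to normal and inverted stimuli (fine-structure components largely cancel)."""
    return 0.5 * (psth(trains_pos, duration, bin_s)
                  + psth(trains_neg, duration, bin_s))

def rho_env(test_env, control_env):
    """Normalised similarity of neural envelopes, between 0 and 1
    (a simplified analogue of the sumcor-peak-based rho_ENV)."""
    a = test_env - test_env.mean()
    b = control_env - control_env.mean()
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)) + 1e-12
    return np.abs(np.sum(a * b)) / denom
```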

Given the range of acoustic waveforms in our speech corpus, we included 100 words (50 stop/nonstop consonants) in our analysis (Fig 2E). Despite the diversity of speech tokens, the effects of including the MOC reflex (on ρENV) were consistently dependent on the form of stimulus manipulation. Neural encoding of speech envelopes improved significantly with simulated MOC reflex for noise-vocoded words (pink circles, Fig 2E) (mean ΔρENV for Voc8 for freqs <1.5 kHz = +4.22 ± 0.30%, [Z(99) = 8.66, p < 0.0001, r = 0.86]; mean ΔρENV for Voc8 for freqs >1.5 kHz = +6.88 ± 0.28%, [Z (99) = 8.67, p < 0.0001, r = 0.87]), with the largest enhancements observed for words with the lowest ρENV values in the absence of the MOC reflex. By contrast, no such relationship was observed for words presented in BN (BN5, light blue squares, Fig 2E) or SSN (SSN3, green diamonds, Fig 2E). Moreover, envelope coding in both speech-in-noise conditions was significantly impaired, on average, when the MOC reflex was included (mean ΔρENV for BN5 for freqs <1.5 kHz = −0.89 ± 0.11%, [Z(99) = −6.67, p < 0.001, r = 0.67]; mean ΔρENV for BN5 for freqs >1.5 kHz = −1.91 ± 0.31%, [Z(99) = −5.704, p < 0.001, r = 0.57]; mean ΔρENV for SSN3 for freqs <1.5 kHz = −2.62 ± 0.12%, [Z (99) = −4.10, p = 0.001, r = 0.41]; mean ΔρENV for SSN3 >1.5 kHz = −3.90 ± 0.26%, [Z (99) = −8.66, p < 0.0001, r = 0.87]). Given the apparent similarities between single-polarity responses for corresponding noise-vocoded and natural stimuli in Fig 2B (i.e., Voc8ANonly versus NatANonly and Voc8AN+MOC versus NatAN+MOC, Fig 2B), we tested how values of ΔρENV were affected when the neural envelopes for degraded speech tokens were compared with their corresponding natural conditions (i.e., Voc8/BN5/SSN3ANonly versus NatANonly and Voc8/BN5/SSN3AN+MOC versus NatAN+MOC) (S2 Fig). The stimulus-specific effects we observed in Fig 2 not only remained with this new “control” template for ρENVAN (S2A Fig) but were also enhanced for all 3 speech manipulations (S2B Fig). The inclusion of the MOC reflex had similar stimulus-dependent effects in low (<1.5 kHz) and high (>1.5 kHz) frequency bands. However, given the increased importance carried by stimulus envelope for acoustic stimuli at high frequencies [66,78,79], only neural envelope encoding in the high-frequency band was considered in subsequent simulations and analysis.

The stimulus-specific changes in envelope encoding we observed were also evident, but with reduced magnitudes, when we lowered the fixed attenuation of the MOC from 15 dB to 10 dB (S3A Fig), demonstrating that manipulating the strength of the MOC reflex may provide a means of selectively enhancing or impairing neural encoding of the stimulus envelope. Increasing the SNR of the masked speech [i.e., from +5 dB SNR (BN5) to +10 dB SNR (BN10) for BN and from +3 dB SNR (SSN3) to +8 dB SNR (SSN8) for SSN] not only diminished the detrimental effects of the MOC reflex on envelope encoding at the lower SNRs (S3B Fig), but also led to an overall improvement in envelope encoding with the MOC reflex for BN10 stimuli. This suggests that efferent feedback through the MOC reflex may be unable to enhance the neural representation of speech in background noise at low SNRs. By contrast, introducing the MOC reflex to the neural coding of stimuli with more noise-vocoded channels generated enhanced benefit (S3B Fig).

Although we assessed high-threshold fibres due to their importance at high sound levels [70–72], the majority of AN fibres possess high spontaneous rates (HSR) and low thresholds (i.e., fibres that respond preferentially to low-intensity sounds but saturate at higher intensities [80–82]). These low-threshold fibres may not only play an important role in envelope processing of speech at low intensities but also contribute at high intensities thanks to their dynamic range adaptation and response fluctuations [78,83–87]. We therefore also assessed how these low-threshold, HSR fibres processed the most difficult stimulus degradations (Voc8, BN5, and SSN3; Fig 1D) in the presence and absence of the MOC reflex (S4 Fig). Similar to AN fibres with high thresholds, including the MOC reflex improved envelope encoding by low-threshold AN fibres for noise-vocoded speech and impaired it for speech masked by BN or SSN (S4A and S4B Fig). This is despite the poorer dynamic range of low-threshold fibres at 75 dBA (the normalised sound presentation level across manipulations) likely impacting their overall ability to encode the stimulus envelope.

We also examined the effects of efferent feedback on the encoding of temporal fine structure (TFS, the instantaneous pressure waveform of a sound)—a stimulus cue at low frequencies associated with speech understanding [50,88–92]—and observed a small mean improvement in the model with the MOC reflex at low frequencies (<1.5 kHz) for the masked conditions with the lowest SNRs (i.e., BN5 and SSN3; S5 Fig). Although this improvement in TFS encoding is consistent with other studies whose simulations support a role for efferent unmasking in speech-in-noise processing [57–59,61], it cannot explain the lack of CEOAE suppression we observed experimentally for these, most difficult, masked speech tasks (Fig 1F).

Overall, the pattern of neural envelope enhancement observed in our model AN fibres (both low and high threshold) when the MOC reflex was introduced to the different stimulus degradations (right, Fig 2F) mirrored the observed suppression of CEOAEs for corresponding active listening conditions (left, Fig 2F). Where activation of the MOC reflex was evident experimentally for noise-vocoded speech, enhancement of neural envelopes was observed when the same degraded stimuli were presented to the model with the MOC reflex present. This was not the case for active listening to masked speech. Here, we found no evidence that MOC activity was modulated by active listening to masked speech, and this was consistent with the poorer neural representations of the stimulus envelope when the MOC reflex was included in the model for these stimulus conditions.

Cortical evoked potentials are enhanced when actively listening to speech in noise

The seeming lack of any contribution from the MOC reflex during active listening to speech masked by speech-like sounds (i.e., BN and SSN) compared to noise-vocoded speech suggests that other compensatory brain mechanisms must contribute to listening tasks if isoperformance is maintained across conditions. We therefore explored whether higher brain centres—providing top-down, perhaps attention-driven, enhancement of speech processing in background noise—contribute to maintaining isoperformance across the different speech degradations. In particular, the significant increase observed in wave V of the ABR for active speech-in-noise conditions suggests greater activity in the IC—the principal midbrain nucleus receiving efferent feedback from auditory cortical areas. Levels of cortical engagement might therefore be expected to differ depending on the form of speech manipulation, despite similar task performance.

To determine the degree of cortical engagement in the active listening task, we recorded cortical evoked potentials from all 56 participants—simultaneously with CEOAE and ABR measurements—using a 64-channel, EEG-recording system. Grand averages of event-related potentials (ERPs) to speech onset (Fig 3A, S6 Fig) for the most demanding speech manipulations [Voc8 (S6A Fig), BN5 (S6B Fig), and SSN3 (S6C Fig)] were analysed to test the hypothesis that greater cortical engagement occurred when listening to speech in background noise compared to noise-vocoded speech, despite their being matched in task difficulty.

Fig 3. Cortical activity and proposed mechanisms for active listening to noise-vocoded and masked speech.


(A) ERP components (from electrodes: FZ, F3, F4, CZ, C3, C4, TP7, TP8, T7, T8, PZ, P3, and P4) during the active listening of Voc8, BN5, and SSN3. Electrode selection was based on their relevance to attentional and language brain networks [93–97]. Thick lines and shaded areas represent the mean and SEM, respectively. Boxplots on the right show statistical comparisons between speech conditions for P2, N400, and LPC components. (B) Proposed auditory efferent mechanisms for speech processing. The “single stream” mechanism shows how degraded tokens such as noise-vocoded speech are processed in a mostly feed-forward manner (thick black arrows) (as should be the case for natural speech). The activation of the MOC reflex (dark green arrow) improves the AN representation of the speech envelope (black arrow, from the cochlea shown as a spiral). This information passes up the auditory centres without much need to “denoise” the signal (represented as black arrows from the brainstem to midbrain to cortex). Given our observation that cochlear gain suppression increased with task difficulty, we included the possibility for enhanced MOC reflex drive from higher auditory regions via corticofugal connections (dark red arrows). By contrast, “multiple streams” such as speech in BN or SSN do not recruit the MOC reflex (light green arrow) because it negatively affects envelope encoding of speech signals (light grey arrows from cochlea–brainstem–midbrain). We therefore propose that corticofugal drive to the MOC reflex is suppressed (shaded red arrow), resulting in weaker MOC reflex activation (light green arrow). This leaves greater responsibility for speech signal extraction to the midbrain, cortex, and the efferent loop therein (corticofugal connections from AC to midbrain: dark red arrow). Both mechanisms ultimately lead to equal behavioural performance across speech conditions. The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. AC, auditory cortex; AN, auditory nerve; ERP, event-related potential; LPC, late positivity complex; MOC, medial olivocochlear.

We analysed early auditory cortical responses (P1 and N1 components, Fig 3A) that are largely influenced by acoustic features of the stimulus such as intensity and spectral content [98,99]. Noise-vocoded words elicited well-defined P1 and N1 components compared to the speech-in-noise conditions (Fig 3A), despite the masked words and noises having onsets similar to those of the noise-vocoded tokens. This likely reflects the relatively high precision of the envelope components of noise-vocoded speech at stimulus onset compared to the masked conditions, in which the competing noises interfere with the speech envelope, producing less precise neural responses [100–102].

Later ERP components, such as P2, N400, and the late positivity complex (LPC), are associated with speech- and task-specific, top-down (context-dependent) processes [103,104]. Speech masked by BN or SSN elicited significantly larger P2 components during active listening compared to the noise-vocoded condition, but the 2 masked conditions did not differ significantly from each other [F (2,79) = 5.08, p = 0.008, η2 = 0.11], post hoc with Bonferroni corrections for 3 multiple comparisons: [BN5 versus Voc8 (p = 0.012, d = 0.78); SSN3 versus Voc8 (p = 0.041, d = 0.69); BN5 versus SSN3 (p = 1.00, d = 0.13)]. Similarly, the magnitude of the LPC—thought to reflect the involvement of cognitive resources including memorisation, understanding [105], and post-decision closure [106] during speech processing—differed significantly across conditions: [F (2,79) = 4.24, p = 0.018, η2 = 0.10]. Specifically, LPCs were greater during active listening to speech in BN compared to noise-vocoded speech (p = 0.02, d = 0.85), with LPCs generated during active listening to speech in SSN intermediate to both, but not significantly different from either (Fig 3A). Consistent with effortful listening varying across speech manipulations even when isoperformance was maintained [107,108], the speech manipulation generating the clearest signature of cortical engagement was speech in a background of BN. This was considered the most difficult of the masked conditions (Fig 1D), as isoperformance required a more favourable SNR for speech in BN (+5 dB, BN5) than for speech in SSN (+3 dB, SSN3). In contrast to P2 and the LPC, the N400 component of the ERP—associated with the processing of meaning [93]—did not differ between conditions [F (2,81) = 0.22, p = 0.81, η2 = 0.005]. This is unsurprising: given that isoperformance had been achieved, participants were equally able to differentiate non-words in the Voc8, BN5, and SSN3 conditions.

Our ERP data are consistent with differential cortical contributions to the processing of noise-vocoded and masked speech, being larger in magnitude for speech manipulations in which the MOC reflex was less efficiently recruited, i.e., BN and SSN, and largest overall for the manipulation requiring most listening effort—speech in a background of multitalker babble. From the auditory periphery to the cortex, our data suggest that 2 different strategies coexist to achieve similar levels of performance when listening to single undegraded/degraded streams of speech compared to speech masked by additional noise (Fig 3B). The first involves enhanced sensitivity to energy fluctuations through recruitment of the MOC reflex to generate a central representation of the stimulus sufficient and necessary for speech intelligibility of single streams (Fig 3B, left panel). The second, implemented when processing speech in background noise (Fig 3B, right panel), preserves cochlear gain by gating the MOC reflex off; suppression of cochlear gain by an active MOC reflex does not provide any added benefit to the encoding of the stimulus envelope in the periphery. This places the onus of “denoising” on midbrain and cortical auditory structures and processes, including the loops between them, to maximise speech understanding for masked speech.

Discussion

We assessed the role of attention in modulating the contribution of the cochlear efferent system in a lexical task—detecting non-words in a stream of words and non-words. Employing 3 speech manipulations to modulate task difficulty—noise vocoding of words, words masked by multitalker BN, or SSN (i.e., noise with the same long-term spectrum as speech)—we find that these manipulations differentially activate the MOC reflex to modulate cochlear gain. Activation of the cochlear efferent system also depends on whether listeners are performing the lexical task (active condition) or are not required to engage in the task and instead watch a silent, stop-motion film (passive condition). Specifically, with increasing task difficulty (i.e., fewer noise-vocoded channels), noise vocoding increasingly activates the MOC reflex in active, compared to passive, listening. The opposite is true for the 2 masked conditions, where words presented at increasingly lower SNRs more strongly activate the MOC reflex during passive, compared to active, listening. By adjusting parameters of the 3 speech manipulations—the number of noise-vocoded channels or the SNR for the speech-in-noise conditions—we find that lower MOC reflexive activity is accompanied by heightened cortical activation, possibly to maintain isoperformance in the task. A computational model incorporating efferent feedback to the inner ear demonstrates that improvements in the neural representation of the amplitude envelope of sounds provide a rationale for either suppressing or maintaining cochlear gain during the perception of noise-vocoded speech or speech in noise, respectively. Our data suggest that a network of brainstem and higher brain circuits maintains performance in active listening tasks and that different components of this network, including reflexive circuits in the lower brainstem and the relative allocation of attentional resources, are differentially invoked depending on specific features of the listening environment.

Attentional demands reveal differential recruitment of the MOC reflex

Our data highlight a categorical distinction between active and passive processing of single, degraded auditory streams (e.g., noise-vocoded speech) and parsing a complex acoustic scene to hear out a stream from multiple competing, spectrally similar sounds (multitalker babble and SSN). Specifically, task difficulty during active listening appears to modulate cochlear gain in a stimulus-specific manner. The reduction in cochlear gain with increasing task difficulty for noise-vocoded speech and, conversely, the preservation of cochlear gain when listening to speech in background noise, suggest that attentional resources might gate the MOC reflex differently depending on how speech is degraded. In contrast to active listening, when participants were asked to ignore the auditory stimuli and direct their attention to a silent film, the MOC reflex was gated in a direction consistent with the auditory system suppressing irrelevant and expected auditory information while (presumably) attending to visual streams [24,25,109].

Nevertheless, auditory stimuli may capture attention in a different manner depending on how easy they are to detect, i.e., via salience-based (bottom-up) processes [110]. For example, the pitch conveyed in the fundamental frequency, F0 (here, the voice pitch), is a highly salient cue that plays an important role in the perceptual segregation of speech sources [111–113]. When speech is noise vocoded, the salience carried by the envelope periodicity of the speech is diminished [114–117]. In the context of our passive conditions, then, it is possible that noise-vocoded speech was not salient (or distracting) enough to elicit a reduction in cochlear gain sufficient to suppress irrelevant auditory information when attention was presumably focused elsewhere (i.e., on watching a silent film).

Interestingly, activation of the MOC reflex was observed for natural speech—further evidence that activation is not limited to tones and broadband noise [118–120]—and did not depend on whether participants were required to attend in a lexical decision task. This is consistent with natural speech being particularly salient as an ethologically relevant and nondegraded stimulus, as well as with the low attentional load required when passively watching a film, which permits continued monitoring of unattended speech [121].

To explain the variable reduction in cochlear gain between noise-vocoded and noise-masked speech across active and passive conditions (Fig 1E), top-down and bottom-up mechanisms can be posited as candidates to modify the activity of the MOC reflex. Bottom-up mechanisms, such as increased reflexive activation of the MOC reflex by wideband stimuli with flat power spectra [43], may explain why CEOAEs were suppressed during passive listening of speech masked by SSN (Fig 1E). The weaker activation of the reflex observed for speech in BN during passive listening may arise from BN more poorly activating the MOC reflex than “stationary” noises such as white noise or SSN [122,123]. However, if only bottom-up mechanisms were involved, then noise-vocoded speech with relatively fewer channels (i.e., Voc8 stimuli) might have also been expected to activate the MOC reflex more effectively due to their more “noise-like” spectra. The lack of suppression of CEOAEs in the passive noise-vocoded conditions, as well as the stimulus-specific pattern of MOC reflex activity in active listening conditions (i.e., enhanced suppression of cochlear gain for noise-vocoded versus preservation of cochlear gain for masked speech), suggests that a perceptual, top-down categorisation of speech sounds is necessary to appropriately engage or disengage the MOC reflex. Top-down mechanisms may include direct descending control of MOC fibre activity (either excitation or inhibition) or descending modulation of their sound-driven reflexive activity [22–24].

Descending control of the MOC reflex for speech stimuli is likely bilateral

A central premise of our study, and of those exploring the effects of attention on the MOC reflex, is that OAEs recorded in one ear can provide a direct measure of top-down modulation of cochlear gain in the opposite ear. However, it has also been suggested that activation of the MOC reflex may differ between ears to expand interaural cues associated with sound localisation (while also enhancing amplitude modulations and suppressing noise in the ear with the better acoustic SNR) [124,125]. This process could be independently modulated by the largely ipsilateral corticofugal pathways evident anatomically (see [23] for review). Had the activation of the MOC reflex been independently controlled at either ear—for example, to suppress irrelevant clicks in one ear while preserving cochlear gain in the ear stimulated by speech—we would have expected similar suppression of CEOAEs across active and passive conditions for all speech manipulations—since click stimuli were always irrelevant to the task. Instead, however, suppression of CEOAEs—a biomarker for activation of the MOC reflex—was both stimulus and task dependent, reducing the likelihood that dichotic presentation of sounds engaged top-down modulation of cochlear gain differentially at either ear.

Anatomical evidence of purely ipsilateral corticofugal pathways does not preclude the possibility that, even when presented monaurally, descending control of the MOC reflex for speech stimuli may actually be bilateral. Unlike pure tones, speech activates both left and right auditory cortices even when presented monaurally [126]. In addition, cortical gating of the MOC reflex in humans does not appear restricted to direct, ipsilateral descending processes that impact cochlear gain control in the opposite ear [127]. Rather, cortical gating of the MOC reflex likely incorporates polysynaptic, decussating processes that influence/modulate cochlear gain in both ears.

Stimulus-specific enhancement of the neural representation of the speech envelope explains the pattern of CEOAE suppression

Beyond enhancing spatial listening [124] and protecting from noise exposure [128–130], the function most commonly attributed to activation of the MOC reflex is “unmasking” of transient acoustic signals in the presence of background noise [19,118,124,131]. By increasing the dynamic range of mechanical [16,17] and neural [18,19] responses to amplitude-modulated (AM) components in the cochleae [15,87] (see [28] for review), suppression of cochlear gain by the MOC reflex might preferentially favour the neural representation of syllables and phonemes in speech over masking noises.

While models of the auditory periphery that include efferent feedback typically demonstrate improved word recognition for a range of masking noises [pink noise [58–60], SSN [61], and BN [59]], experimental studies only sporadically report positive correlations between increased activation of the MOC reflex and improved speech-in-noise perception [29,33,34], with some even reporting negative correlations [27,35,132] or no effect at all [36,37]. Although this variability has generally been attributed to methodological differences in the measurement of OAEs [29,38], the majority of these studies have assessed contralateral activation of the MOC reflex and performance in a speech-in-noise task in separate sessions. As a result, the possibility that the MOC reflex’s automatic activation is modulated by top-down processes to maximise its functional relevance in auditory tasks has not been fully tested.

Here, we simultaneously employed degraded speech tokens as both contralateral activators of the MOC reflex and as targets in a lexical decision task and, therefore, were able to ascertain directly the involvement of the MOC reflex in both active and passive tasks. The stimulus- and task-dependent suppression of cochlear gain we observed suggests that automatic activation of the MOC reflex is indeed gated on or off by top-down modulation, with the direction of this modulation dependent on whether or not the reflex functionally benefits performance in either task, i.e., facilitating a lexical decision in an active listening task or ignoring the auditory stimulus in a passive one.

Our modelling data support this conclusion by accounting for the stimulus-dependent suppression of CEOAEs through enhancement of the neural representation of the speech envelope in the AN. The apparent benefit of suppressing cochlear gain in response to envelopes of noise-vocoded words, compared to a disbenefit for words masked by background noise, is consistent with noise-vocoded words retaining relatively strong envelope modulations and with these modulations being extracted effectively through expansion of the dynamic range as cochlear gain is reduced. For both noise maskers, however, any improvement to envelope coding due to dynamic expansion of the mechanical (and neural) range applies to both speech and masker since these signals overlap spectrotemporally. Thus, the reduction of cochlear gain results in a poorer representation of the speech envelope.

Dependence of MOC reflex on SNR

The results of our simulations are based on average changes in the neural correlation coefficient with efferent feedback calculated for 100 words. For individual words, however, the effect of suppressing cochlear gain was highly dependent on both the word itself and the SNR at which it was presented (Fig 2E, S3B Fig). If top-down modulation gates activation of the MOC reflex, it must account for the statistical likelihood that suppression of cochlear gain can improve the neural coding of the stimulus envelope throughout the task.

A potential criterion for predicting whether activation of the MOC reflex can improve speech intelligibility is the extent to which target speech is “glimpsed” in spectrotemporal regions least affected by a background noise [133–135]. If computing SNR of the speech envelope in short-time windows underpins speech intelligibility—as proposed by several models [51,68,69]—then this could be a suitable metric by which top-down modulation of the MOC reflex is adjusted. This metric could explain why previous studies of CEOAE suppression, which share identical paradigms aside from the stimuli with different spectrotemporal content (i.e., consonant–vowel pairs), can generate opposing correlations between discrimination in noise and the strength of the MOC reflex [27,34]. Additionally, presenting words at different SNRs should impact any benefit on speech intelligibility of activating the MOC reflex and can explain reported correlations between the strength of the MOC reflex and task performance at a range of SNRs [30,38].
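A short-time envelope-SNR criterion of this kind could be estimated along the lines of the sketch below. The window length, envelope cutoff, and glimpse threshold are illustrative assumptions; published glimpsing models [133–135] differ in their details.

```python
# Sketch of a short-time envelope-SNR ("glimpsing") metric. All parameter
# choices (30-Hz envelope, 20-ms windows, 0-dB glimpse threshold) are assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def envelope(x, fs, cutoff=30.0):
    """Crude broadband envelope: rectify, then low-pass filter."""
    sos = butter(2, cutoff, btype="low", fs=fs, output="sos")
    return np.maximum(sosfiltfilt(sos, np.abs(x)), 1e-12)

def glimpse_proportion(speech, noise, fs, win_s=0.02, thresh_db=0.0):
    """Fraction of short-time windows in which the speech envelope exceeds
    the noise envelope by at least thresh_db (a crude proxy for 'glimpses').
    Requires the clean speech and the masker separately, before mixing."""
    env_s, env_n = envelope(speech, fs), envelope(noise, fs)
    n = int(win_s * fs)
    n_win = len(speech) // n
    snr_db = np.array([
        10 * np.log10(np.mean(env_s[i * n:(i + 1) * n] ** 2)
                      / np.mean(env_n[i * n:(i + 1) * n] ** 2))
        for i in range(n_win)
    ])
    return np.mean(snr_db > thresh_db)
```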

In our model, switching from lowest to intermediate stimulus SNRs for both noise maskers led to an increase in the average ΔρENV when efferent feedback was applied (S3B Fig). This contributed to a small improvement in envelope encoding for +10 dB SNR speech in BN. While the absence of CEOAE suppression experimentally in the active task for BN at +10 dB SNR did not reflect the modelling results (Fig 1F), only a slight majority (59/100) of ΔρENV values were positive for speech in BN at +10 dB SNR, and this may have changed were different words selected. The modelling results for the intermediate SNR in the SSN condition (+8 dB SNR) were, however, consistent with the lack of CEOAE suppression observed experimentally in the active task (Fig 1F). Where an active MOC reflex does not functionally benefit the neural representation of the speech envelope (e.g., at negative SNRs), it remains possible that it is activated in another capacity, for example, to prevent damage by prolonged exposure to loud sounds [128–130].

Reconciling stimulus-specific effects of the MOC reflex with previous studies using MAP model and efferent feedback

Resolving the differences between our stimulus-dependent observations in the Matlab Auditory Periphery and Brainstem (MAP_BS) model and those of previous MAP models incorporating efferent feedback and an automatic speech recogniser (ASR) (i.e., consistent improvements to speech-in-noise recognition, irrespective of the type of noise masker) requires understanding the models’ different outputs and analyses. While MAP and MAP_BS models share many similarities [58–60,62–64,136,137], they differ in their AN fibre outputs. The MAP model generates estimated spike probabilities across populations of simulated AN fibres [137], whereas the spike train output of the MAP_BS model is more stochastic in nature and incorporates the effects of internal noise in individual AN fibres [136,138].

Here, we took advantage of the MAP_BS’s output to present stimuli of opposing polarity and compare coincident events in the resulting stochastic spike trains. Consequently, by extracting and quantifying these envelope-sensitive components with sumcor analysis (Fig 2C and 2E), we could identify how envelope encoding was affected by the suppression of cochlear gain for differently degraded words [73–76] and provide a coherent rationale for our experimental observations (Fig 2E and 2F). The ASR, on the other hand, has been previously used as a proxy for human hearing, predicting the intelligibility of test digit sequences from their AN spike probabilities using a hidden Markov model [58–60,62,139]. While both stimulus envelope and TFS cues are available to the ASR during digit classification, their respective weighting is not transparent as the ASR exploits any available cue to correctly identify the digit sequence. Therefore, a small enhancement of TFS encoding by the MOC reflex (as observed in our simulations; S5 Fig) could potentially outweigh any disbenefits to envelope encoding. This would result in improved digit identification by the ASR without explaining experimental CEOAE results for masked speech, such as those we present here.

Even if envelope encoding had been weighted strongly by the ASR in previous studies, its representation of the neural envelope was likely very different to the sumcor analysis. This is particularly evident when comparing the effects of MOC reflex on “natural,” clean speech in this study and that of Brown and colleagues who also used fixed attenuation of the MOC reflex [58]. Whereas we observe improved encoding of the stimulus envelope in the 4 to 8 Hz range of modulation frequencies for natural sentences (S1B Fig), the ASR’s recognition of digits in silence was reduced when the MOC reflex was active (Fig 7A in [58]). This decrease in classification accuracy arises from the ASR interpreting the effects of the MOC reflex as an overall reduction in AN output (due to suppression of the cochlear gain) rather than the generation of sparser yet more precise neural envelopes as suggested by sumcor analysis and calculation of ρENV (Fig 2E and 2F). In addition, were we to present noise-vocoded speech—a single stream like “natural,” clean speech—to the ASR, its AN output would also appear reduced with the MOC reflex, impacting the ASR’s word recognition and further highlighting its inability to explain our experimental CEOAE data.

Considerations when modelling the MOC reflex

Many aspects of the MOC reflex and its circuitry remain poorly understood. For example, while we represent the active MOC reflex here as a fixed attenuation of cochlear gain (allowing a more controlled examination of the reflex’s effects on envelope encoding), it is in fact a dynamic process with multiple time courses (spanning fifty milliseconds to tens of seconds) whose purposes remain unknown [45,140–142]. Previous MAP models incorporating dynamic efferent feedback have gravitated towards slow time constants to optimise speech recognition by their ASR [59,60,62]. However, since digit recognition by the ASR always improved with the MOC reflex, whatever its chosen time constant [60], we are confident that introducing a dynamic MOC reflex would not alter the stimulus dependence of any effects we observed in our model.

An added complexity of modelling MOC reflex activity is considering how top-down gating modulates the reflex and over what timescale. It remains unknown whether corticofugal inputs to the MOC reflex produce tonic activation/inhibition or only modulate the reflex’s strength as has been shown for awake versus anaesthetised animals [20,23,143–145]. Given anatomical evidence that MOC neurons sparsely innervate broad regions of the cochlea [146,147], it is possible that, through a combination of tonic top-down modulation and poor frequency tuning of efferent innervation, the fixed attenuation we have implemented across all frequency channels may closely reflect actual recruitment of the MOC reflex under active listening conditions. Finally, our model does not include neural adaptation to stimulus statistics as a potential contributing factor to speech-in-noise discrimination [86,87]. We therefore cannot discount its involvement in the lexical decision task nor show that it is sufficient for robust recognition of degraded speech without the MOC reflex as has been previously suggested [86,87,148].

Higher auditory centre activity supports coexistence of multiple strategies to achieve similar levels of performance

The impact of modulating the MOC reflex was observed in the activity of the auditory midbrain and cortex. Increased midbrain activity for active noise-masked conditions was consistent with changes in magnitudes of ABRs previously reported during unattended versus attended listening to speech [149] or clicks [150,151]. This highlights the potential for subcortical levels either to enhance attended signals or filter out distracting auditory information. At the cortical level, recorded potentials were larger for all attended, compared to ignored, speech (S6A–S6C Fig), consistent with previous reports [152,153]. However, later cortical components were larger for masked, compared to noise-vocoded, speech while attending to the most extreme manipulations. Late cortical components have been associated with the evaluation and classification of complex stimuli [104] as well as the degree of mental allocation during a task [103]. Therefore, differing cortical activity likely reflects greater reliance on, or at least increased contribution from, context-dependent processes for speech masked by noise than for noise-vocoded speech.

Together, differences in physiological measurements from higher auditory centres and the auditory periphery highlight the possibility of diverging pathways to process noise-vocoded and masked speech. Evidence for systemic differences in processing single, degraded streams of speech, compared to masked speech, has been reported in the autonomic nervous system [107]: despite similar task difficulty across conditions, masked speech elicits stronger physiological reactions than single, unmasked streams. Here, we propose that 2 strategies enable isoperformance to be maintained even when stimuli are categorically different. The processing of single and intrinsically degraded streams selectively recruits auditory efferent pathways from the AC to the inner ear which, in turn, “denoise” the representation of the stimuli in the periphery (Fig 3B). By contrast, multiple streams, represented by speech in BN or SSN, appear to rely much more on higher auditory centres such as the midbrain and AC for the extraction of foreground, relevant signals (Fig 3B). Given evidence of denoised auditory signals in the cortex [47,154–156], the extensive loops and chains of information between the cortex, subcortical regions, and the auditory periphery in everyday listening environments [20,157–159] have yet to be fully acknowledged, as has their candidacy as targets for hearing technologies.

Implications for hearing-impaired listeners

Although normal-hearing listeners appear to benefit from an MOC reflex that modulates cochlear gain and is amenable to top-down, attentional control, it is important to note that users of CIs—for whom normal-hearing listeners processing noise-vocoded speech are often considered a proxy—have no access to the MOC reflex. CIs bypass the mechanical processes of the inner ear, including the OHCs—which are nonfunctioning or absent completely in individuals with severe to profound hearing loss—to stimulate directly the AN fibres themselves. Efforts have been made to incorporate MOC-like properties into CI processing, providing expanded auditory spatial cues and “denoised” electrode output to improve listening in bilaterally implanted CI users [124,125], but the capacity to exploit efferent feedback to aid speech understanding in CI listeners is yet to be realised in any device.

Most recently, Lopez-Poveda and colleagues [125] highlighted the benefits of introducing an MOC strategy to CI users’ understanding of speech in both quiet and noise. Consistent with this, our observed activation of the MOC reflex for noise-vocoded speech supports the notion that enhancing the neural representation of acoustic envelopes is key to understanding intrinsically degraded speech for both normal-hearing listeners and CI users (Fig 1F). However, masked speech generates different outcomes between the 2 groups; the improved speech understanding of CI users using an MOC strategy may not be solely related to envelope expansion but may also derive from increased neural “unmasking” due to a reduction of CI stimulation by the MOC strategy [125]. In normal-hearing individuals, however, who showed no MOC reflex activation when listening to masked speech (Fig 1F), the “unmasking” rationale may not be as critical given the far larger dynamic ranges of their AN fibres compared with those of their CI-using counterparts [160–162]. This suggests that any future implementation of MOC strategies in CIs might not necessarily reflect the fundamental role of the reflex in a healthy auditory system.

For other hearing-impaired listeners, aided or unaided, the contribution of MOC reflex feedback to speech processing is likely limited compared with that in normal-hearing listeners because, in most cases, their hearing loss arises from damage to the OHCs, which receive direct synaptic input from MOC fibres. In hearing loss generally, the degradation or loss of peripheral mechanisms contributing to effective speech processing in complex listening environments may mean that listeners rely more heavily on attentional and other cortically mediated processes, contributing to widely reported increases in the listening effort required to achieve adequate levels of listening performance [108]. This increase in listening effort—likely manifesting over time—may not be reflected in performance in relatively short, laboratory- or clinic-based assessments of hearing function.

Materials and methods

This study was approved by the Human Research Ethics Committee of Macquarie University (ref: 5201500235) and was performed according to the Australian Code for the Responsible Conduct of Research. Each participant signed a written informed consent form and was provided with a small financial remuneration for their time.

Hearing assessment

A total of 56 participants (36 females, aged between 18 and 35 [mean: 24 ± 7 years old]) were recruited in this study; however, not all subjects completed every experimental measurement (S2 Table). All subjects included in this study had normal pure tone thresholds (<20 dB HL); normal middle ear function (standard 226 Hz tympanometry); and normal OHC function (assessed with distortion product otoacoustic emissions (DPOAEs) between 0.5 and 10 kHz).

MEMR assessment and stimuli calibration

Controlling the stimulus level is a critical step when recording any type of OAE due to the potential activation of the middle ear muscle reflex (MEMR). High-intensity sounds can evoke contractions of both the stapedius and the tensor tympani muscles, causing the ossicular chain to stiffen and the impedance of middle ear sound transmission to increase. As a result, OAE magnitude can be reduced by decreased retrograde middle ear transmission due to MEMR activation rather than MOC reflex activation [163]. It has been shown that even sounds 10 to 15 dB SPL below the clinical MEMR threshold can cause contractions of the middle ear muscles [164–166]. Therefore, a modified version of the clinical protocol (Titan, Interacoustics, Middelfart, Denmark) was used for threshold estimation of the MEMR. Due to the broadband nature of our experimental stimuli (i.e., clicks and speech), wideband (0.25 to 8 kHz) stimuli, instead of tones (the typical clinical paradigm), were used as activators of both the contralateral and ipsilateral MEMR. MEMR activation was monitored in a modified range (60 to 80 dB HL) with a 5-dB step size and a very sensitive threshold criterion (0.02 ml). All participants had thresholds >75 dB HL. Therefore, the presentation level for all natural, noise-vocoded, and speech-in-noise tokens was set at 75 dBA (root–mean–square normalised) and the click stimulus at 75 dB p-p. According to ANSI S3.6–1996 standards for the conversion of dB SPL to dB HL, a minimum difference of 10 dB SPL across frequencies was kept between our participants’ MEMR thresholds (>75 dB HL) and stimulus levels. Therefore, no significant impact of the MEMR was expected in our experimental paradigm.

Experimental protocol

Participants were seated comfortably inside an electrically shielded, sound-proof booth (ISO 8253–1:2010) while wearing an EEG cap (Neuroscan 64 channels, SynAmps2 amplifier, Compumedics, Melbourne, Australia). Two attentional conditions (passive and active) were counterbalanced across participants. In the passive listening condition, subjects were asked to ignore the auditory stimuli and to watch a non-subtitled, stop motion movie. To ensure participants’ attention during this condition, they were monitored with a video camera and were asked questions at the end of the session (e.g., What happened in the movie? How many characters were present?). The aim of a passive or auditory-ignoring condition is to shift attentional resources away from the auditory scene and towards the visual scene. During active listening, participants performed an auditory lexical decision task, in which they were asked to press the keyboard’s space bar each time they heard a non-word in strings of 300 speech tokens. D prime (d′) was used as a measure of accuracy and was calculated as d′ = Z(correct responses) − Z(false alarms), where Z(p) = NORMSINV(p), the inverse of the cumulative standard normal distribution. Simultaneously with the presentation of words/non-words in one ear, CEOAEs were recorded continuously in the contralateral ear (Fig 1A). The ear receiving either the clicks or the speech stimuli was randomised across participants.
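For reference, d′ can be computed directly from the hit and false-alarm rates with the inverse cumulative normal distribution (the spreadsheet function NORMSINV corresponds to scipy.stats.norm.ppf). The sketch below is illustrative; the clipping of perfect rates is a common convention and an assumption here, not necessarily the one used in the study.

    from scipy.stats import norm

    def d_prime(hit_rate, fa_rate, eps=1e-3):
        """d' = Z(hit rate) - Z(false-alarm rate), with Z the inverse cumulative normal."""
        hit_rate = min(max(hit_rate, eps), 1 - eps)   # clip 0/1 rates so Z stays finite
        fa_rate = min(max(fa_rate, eps), 1 - eps)
        return norm.ppf(hit_rate) - norm.ppf(fa_rate)

    # Hypothetical example: 85 of 100 non-words detected, 10 false alarms on 200 words.
    print(d_prime(85 / 100, 10 / 200))                # ~2.68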

Speech stimuli

A total of 423 word items were acquired from Australian-English-adapted versions of monosyllabic consonant–nucleus–consonant (CNC) word lists and were spoken by a female, native Australian-English speaker. The duration of words ranged between 420 and 650 ms. Moreover, 328 monosyllabic CNC non-word tokens were selected from the Australian Research Council non-word database. Speech stimuli were delivered using ER-3C insert earphones (Etymotic Research, Elk Grove Village, Illinois, USA) and Presentation software (Neurobehavioral Systems, Berkeley, California, USA, version 18.1.03.31.15) at 44.1 kHz, 16 bits. All tokens were root–mean–square normalised, and the calibration system (sound level metre (B&K G4) and microphone IEC 60711 Ear Simulator RA 0045 563 (BS EN 60645–3:2007); see CEOAEs acquisition and analysis section) was set to 75 dB(A); A-weighting approximates the frequency sensitivity of human hearing.

Each experimental condition, a combination of attentional and stimulus manipulations (see below for details of speech manipulations), was tested using 200 words and 100 non-words (randomly selected from the speech corpus). Speech tokens were counterbalanced in each condition based on the presence of stop and nonstop initial consonants: 100 stop/nonstop consonant words and 50 stop/nonstop consonant non-words, with a maximum of 3 repeats per participant allowed. Each experimental condition had a duration of 12 minutes (Fig 1A), and participants could take short breaks between them if needed. The order of the experimental conditions was always randomised to prevent presentation order bias or training effects.

Noise-vocoded speech

A total of 27 native speakers of Australian-English (17 females; 25 right-handed and 2 left-handed) were recruited, aged between 18 and 35 (mean: 23 ± 5 years old). Based on the noise-vocoding method and behavioural results of Shannon and colleagues [42], 3 noise-vocoded conditions (16, 12, and 8 channels: Voc16, Voc12, and Voc8, respectively) were tested to represent 3 degrees of speech intelligibility (i.e., task difficulty). Four stimulus conditions were assessed in both active and passive listening conditions: Stimulus condition 1: natural speech; Stimulus condition 2: Voc16; Stimulus condition 3: Voc12; and Stimulus condition 4: Voc8. Each experimental condition lasted 12 minutes (Fig 1A). The total of 8 experimental conditions (i.e., an active and a passive condition for each of natural, Voc16, Voc12, and Voc8) lasted approximately 2.6 hours (including hearing assessment and EEG cap setup).
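A minimal noise-vocoding sketch in the spirit of Shannon and colleagues [42] is shown below: the signal is split into contiguous frequency bands, each band’s envelope modulates band-limited noise, and the modulated bands are summed. The logarithmic band spacing, 300-Hz envelope cutoff, and filter orders are illustrative assumptions and not the exact parameters used to generate the stimuli.

    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def noise_vocode(x, fs, n_channels=8, f_lo=100.0, f_hi=8000.0, env_cutoff=300.0):
        """Noise-vocode `x` using `n_channels` logarithmically spaced bands (schematic)."""
        edges = np.geomspace(f_lo, min(f_hi, 0.45 * fs), n_channels + 1)
        noise = np.random.default_rng(0).standard_normal(len(x))
        env_sos = butter(2, env_cutoff, btype='low', fs=fs, output='sos')
        out = np.zeros(len(x))
        for lo, hi in zip(edges[:-1], edges[1:]):
            sos = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
            band = sosfiltfilt(sos, x)
            env = np.clip(sosfiltfilt(env_sos, np.abs(hilbert(band))), 0, None)  # smoothed envelope
            out += env * sosfiltfilt(sos, noise)          # envelope modulates band-limited noise
        return out * np.sqrt(np.mean(np.square(x)) / (np.mean(np.square(out)) + 1e-12))  # RMS match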

Speech in BN

A total of 29 native speakers of Australian-English (19 females; 28 right-handed, 1 left-handed) were recruited, aged between 20 and 35 (mean: 26 ± 9 years old). The BN used here consisted of 4 female and 4 male talkers and was filtered to match the long-term average spectrum of the speech corpus (S7 Fig). Random segments from a 60-second BN recording were temporally matched to the speech tokens, with no ramps applied to the stimuli (Fig 1C). Three stimulus conditions were presented in the active and passive listening conditions: Stimulus condition 1: natural speech; Stimulus condition 2: speech in BN at +10 dB SNR (BN10); and Stimulus condition 3: speech in BN at +5 dB SNR (BN5).
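Mixing a word with a randomly positioned segment of the 60-second masker at a nominal broadband SNR can be sketched as below; the RMS-based scaling is a standard approach and is assumed here rather than taken from the study’s code.

    import numpy as np

    def mix_at_snr(speech, masker, snr_db, rng=None):
        """Add a randomly positioned masker segment to `speech` at `snr_db` (broadband RMS SNR)."""
        rng = rng or np.random.default_rng()
        start = rng.integers(0, len(masker) - len(speech))   # random segment of the long masker
        seg = masker[start:start + len(speech)].astype(float)
        rms_s = np.sqrt(np.mean(speech ** 2))
        rms_m = np.sqrt(np.mean(seg ** 2))
        seg *= rms_s / (rms_m * 10 ** (snr_db / 20))          # scale masker to reach the target SNR
        return speech + seg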

Speech in SSN

The SSN was generated to match the long-term average spectrum of the speech corpus (S7 Fig). Random segments from a 60-second SSN were selected to temporally match the speech tokens (Fig 1C); no ramps were applied to the stimuli. Both BN and SSN manipulations were presented in the same session; therefore, Stimulus condition 1 was the same for both manipulations, Stimulus condition 2 was speech in SSN at +8 dB SNR (SSN8), and Stimulus condition 3 was speech in SSN at +3 dB SNR (SSN3). BN and SSN were combined into a single session of 3 hours (including hearing assessment and EEG cap setup). The 29 subjects experienced a total of 10 experimental conditions, each of 12-minute duration (i.e., an active and a passive condition for each of natural, BN10, BN5, SSN8, and SSN3).
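One common way to generate SSN matched to the long-term average spectrum of a corpus is to impose the corpus’ magnitude spectrum on random-phase noise; the sketch below follows that approach and is an illustrative assumption, not the procedure used to create the stimuli.

    import numpy as np

    def speech_shaped_noise(corpus, rng=None):
        """Random-phase noise with the magnitude spectrum of the concatenated `corpus` (schematic)."""
        rng = rng or np.random.default_rng(1)
        mag = np.abs(np.fft.rfft(corpus))                   # long-term magnitude spectrum
        phase = rng.uniform(0, 2 * np.pi, len(mag))         # randomised phases
        ssn = np.fft.irfft(mag * np.exp(1j * phase), n=len(corpus))
        return ssn / (np.sqrt(np.mean(ssn ** 2)) + 1e-12)   # unit-RMS output; rescale as needed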

CEOAEs acquisition and analysis

Nonfiltered click stimuli, with a positive polarity and 83-μs duration, were digitally generated using RecordAppX (Advanced Medical Diagnostic Systems, Oxford, Mississippi, USA) software. The presentation rate was 32 Hz in all conditions, which helped minimise ipsilateral MOC reflex activation [167]. Maintaining this fixed click rate also ensured that any residual ipsilateral MOC reflex activation was constant across participants and experimental manipulations.

Both the generation of clicks and OAE recordings were controlled via an RME UCX soundcard (RME, Haimhausen, Germany) and delivered to/collected from the ear canal through an Etymotic ER-10B probe connected to ER-2 insert earphones, with the microphone preamplifier gain set at 20 dB. Calibration of clicks was performed using a sound level metre (B&K G4) and microphone IEC 60711 Ear Simulator RA 0045 (BS EN 60645–3:2007). This setup was also used to calibrate the speech stimuli. In addition, clicks were calibrated in-ear using forward equivalent pressure level (FPL), ensuring accurate stimulus levels [168,169]. The OAE probe was repositioned and recalibrated, and the block restarted, if participants moved or touched it.

CEOAE data were analysed offline using custom Matlab scripts (available upon request). The averaged RMS magnitudes of CEOAE signals (Fig 1B) were analysed between 1 and 2 kHz given maximal MOC effects in this frequency band [46,170]. The energy in the 1 to 2 kHz CEOAE band does not necessarily originate solely from the equivalent tonotopic region in the cochlea, especially at high stimulus intensities (such as the 75 dB p-p used here), where significant energy from nonlinear distortions distant to the 1 to 2 kHz region will likely contribute [171–173]. Given the broadband nature of the click stimuli and the sparse but non-frequency-specific nature of MOC innervation of the cochlea [146,147], the MOC reflex will be acting along the length of the cochlea when suppression is observed in the 1 to 2 kHz band. We therefore consider suppression of the cochlear gain in the 1 to 2 kHz band a suitable and consistent marker for MOC reflex activity across the entire cochlea.

Only binned data for averaged CEOAEs displaying an SNR ≥ 6 dB (shown to reduce intra- and interindividual variability [29,170]) and with >80% of epochs retained (i.e., RMS levels within 2 standard deviations of the mean) were selected as valid signals for further analysis; see example individual data (S8 Fig). Although 2 minutes of baseline CEOAEs were recorded, at the beginning and end of each block, in the absence of speech tokens (Fig 1A), only the first minute was used as baseline due to low SNR and a high number of artefacts (participants swallowing and jaw movements) in the last minute of CEOAE recordings. As no significant differences were observed between CEOAE baseline magnitudes within participants (p > 0.05) across experimental conditions, all baselines were pooled within participants. This allowed for an increase in both the SNR and reliability of the individual CEOAE recordings. After the baseline recording, CEOAEs were continuously obtained for 10 minutes during the contralateral presentation of the speech tokens (Fig 1A). The suppression of CEOAE magnitude (dB SPL) relative to the baseline was calculated as follows and reported as means and SEM:

\[ \mathrm{CEOAE}_{\text{suppression}} = \mathrm{CEOAE}_{\text{speech presentation (average across minutes)}} - \mathrm{CEOAE}_{\text{baseline (first 60 s)}}. \]
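In other words, suppression is simply the difference between the average CEOAE magnitude during speech presentation and the baseline magnitude, both in dB SPL. A minimal sketch, assuming magnitudes are already averaged into one-minute bins, is:

    import numpy as np

    def ceoae_suppression(speech_minute_bins_db, baseline_first_min_db):
        """Mean CEOAE magnitude across minute bins during speech (dB SPL) minus the
        baseline magnitude from the first 60 s (dB SPL); negative values = suppression."""
        return np.mean(speech_minute_bins_db) - baseline_first_min_db

    # Hypothetical values (dB SPL):
    print(ceoae_suppression([11.2, 10.9, 11.0, 10.8], 12.0))   # ~ -1.0 dB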

EEG: Event Related Potentials (ERPs)

EEG measurements and the CEOAE setup were synchronised using a StimTracker (https://cedrus.com/) (Fig 1A and 1C). EEG data were acquired according to the 10 to 20 system (internationally standardised scalp electrode placement [174]). Impedance levels were kept below 5 kΩ for all electrodes. Signals were sampled at a rate of 20 kHz in AC mode with a gain of 20,000 and an accuracy of 0.15 nV/least significant bit (LSB). Early and late ERP components were analysed offline using FieldTrip-based scripts. Data were re-referenced to the average of the mastoid electrodes. Trials started 200 ms before and ended 1.2 seconds after speech onset. Components visually identified as eye blinks and horizontal eye movements were excluded from the data, as were trials with amplitudes >75 μV. The accepted trials (60% to 80% per condition) were band-pass filtered between 0.5 and 30 Hz with a transition band roll-off of 12 dB/octave. Trials were baseline-corrected using the mean amplitude between −200 and 0 ms before speech onset. Baseline-corrected trials were averaged to obtain ERP waveforms (Fig 1C). Analysis windows centred on the grand average ERP component maxima were selected: P1 (100 to 110 ms), N1 (145 to 155 ms), P2 (235 to 265 ms), N400 (575 to 605 ms), and LPP (945 to 975 ms) (Fig 1C). The mean amplitude for each component within the analysis window was calculated for each participant and experimental condition.
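A compact sketch of this ERP pipeline (artefact rejection, band-pass filtering, baseline correction, averaging, and mean amplitudes in fixed windows) is given below; the single-channel array layout, filter design, and processing order are simplifications and assumptions, not the FieldTrip code used in the study.

    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    COMPONENT_WINDOWS = {'P1': (0.100, 0.110), 'N1': (0.145, 0.155), 'P2': (0.235, 0.265),
                         'N400': (0.575, 0.605), 'LPP': (0.945, 0.975)}   # seconds re: word onset

    def erp_mean_amplitudes(epochs, fs, t0=-0.2, reject_uv=75.0):
        """`epochs`: (n_trials, n_samples) single-channel data in volts, already re-referenced,
        each epoch starting `t0` s before word onset. Returns mean amplitude per component."""
        sos = butter(2, [0.5, 30.0], btype='bandpass', fs=fs, output='sos')
        epochs = sosfiltfilt(sos, epochs, axis=1)                            # 0.5-30 Hz band-pass
        epochs = epochs[np.max(np.abs(epochs), axis=1) <= reject_uv * 1e-6]  # reject trials > 75 uV
        baseline = epochs[:, :int(-t0 * fs)].mean(axis=1, keepdims=True)
        erp = (epochs - baseline).mean(axis=0)                               # baseline-corrected average
        t = t0 + np.arange(erp.size) / fs
        return {c: erp[(t >= a) & (t <= b)].mean() for c, (a, b) in COMPONENT_WINDOWS.items()}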

EEG: Auditory Brainstem Responses (ABRs)

ABR signals were extracted from central electrodes (FZ, FCZ, and CZ). ABR analysis windows of 16-ms duration (2 ms before and 14 ms after click onset; the stimulus artefact between 0 and 1 ms was excluded) were selected (Fig 1B). A total of 19,200 trials (click rate of 32 Hz across 10 minutes per condition) were band-pass filtered between 200 and 3,000 Hz. Averaged ABR waveforms were obtained using a weighted-averaging method [175,176]. Amplitudes of waves III (peak at approximately 4 ms) and V (peak at approximately 6 ms) (Fig 1B) were visually determined by the first author and 2 lab members (nonauthors) for each subject across blocks and conditions when appropriate (i.e., wave amplitudes above the residual noise and therefore a positive SNR; Fig 1B). Due to stimulus level restrictions (≤75 dB p-p SPL, to avoid MEMR activation), wave I could not be extracted from the EEG residual noise.
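Weighted averaging [175,176] generally down-weights blocks with higher residual noise; a minimal sketch of that idea, using inverse-variance weights estimated from across-trial variability, is shown below. The specific weighting scheme is an assumption for illustration and may differ from the cited method.

    import numpy as np

    def weighted_abr_average(blocks):
        """`blocks`: list of (n_trials, n_samples) arrays. Each block average is weighted by
        the inverse of its estimated residual-noise variance."""
        avgs, weights = [], []
        for b in blocks:
            avgs.append(b.mean(axis=0))
            noise_var = b.var(axis=0).mean() / b.shape[0]    # variance of the block average
            weights.append(1.0 / (noise_var + 1e-30))
        avgs, weights = np.array(avgs), np.array(weights)
        return (weights[:, None] * avgs).sum(axis=0) / weights.sum()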

Statistical analysis

Sample sizes were estimated according to the statistical tests employed using G*Power (effect size f = 0.4; α = 0.05; power (1 − β) = 0.8). All variables were tested for normality (Shapiro–Wilk test); outlier residual values preventing normal distribution were removed from the data set (S2 Table). One-way ANOVAs for the behavioural and ERP data, rANOVAs for the CEOAE and ABR data, and t tests (alpha = 0.05, with Bonferroni corrections for multiple comparisons) were performed. One-way ANOVAs had stimulus type (i.e., Natural, Voc16, BN10) as a factor, whereas rANOVAs had both attentional condition (active and passive) and stimulus type as factors. The interaction between factors was also explored. Effect sizes were calculated for all statistical analyses (eta-squared (η2) for ANOVAs and Cohen’s d for t tests) [177].

AN simulations

The MAP_BS [64,65] computational model was used to simulate AN responses with and without efferent feedback (MOC reflex) in 30 frequency channels, logarithmically spaced between 0.1 kHz and 4.5 kHz. As in previous versions of the model [58–60,62,63,139], MAP_BS uses a dual-resonance nonlinear (DRNL) filter bank to translate the input of the “outer/middle ear” stage into “basilar membrane velocity” in each frequency channel (Fig 2A). Both linear and nonlinear paths of the DRNL consist of a sequence of band-pass (Gammatone) and low-pass (Butterworth) filters; however, the nonlinear path also includes a compressive nonlinearity that acts when the stimulus exceeds a threshold level. For the current simulations, MAP_BS was run in the “AN only” mode at 100 kHz, i.e., no brainstem neurons were simulated, and stochastic spike trains were generated as the AN output (as opposed to the spiking probability of previous MAP models) [58–60,62–64,136,139]. A total of 400 AN fibres (200/ear) were simulated for each natural and degraded word token. Although we simulated both low-threshold, high spontaneous rate (HSR) and high-threshold, low spontaneous rate (LSR) AN fibres (modelled by setting the calcium clearance time constant to 2.4 × 10−4 s for HSR and 0.8 × 10−4 s for LSR fibres), the latter were considered the main fibre type given their suggested importance for speech in noise at high stimulus intensities [70–72].

Implementation of efferent feedback in simulations

MOC attenuation of the cochlear gain in the MAP_BS model was implemented at the first stage of the DRNL filterbank’s nonlinear path. Given that the purpose of the modelling was to observe the qualitative effect that suppression of cochlear gain had on the neural envelope of differentially degraded word tokens (as opposed to matching behavioural data or optimising the effect quantitatively as in previous studies [58–60,62,139]), we chose to represent the active MOC reflex condition as a fixed, nontemporally varying attenuation of the cochlear gain. This not only imposed a suppression of the cochlear gain that was consistent across all word token variants, thereby avoiding any stimulus-dependent differences in the time course or mean activation of the MOC reflex for noise-vocoded and masked versions of the same word, but also accelerated simulations, as we could present degraded and natural word tokens to the model individually without concern for the MOC reflex’s initial strength at stimulus onset. Attenuations of 10 and 15 dB were applied for the fixed efferent feedback, given their effective action on stimuli of similar nature and intensity in previous versions of the model [58,59]. The MEMR was not active in the model, as the “ARatt” parameter was set to 0.
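To make the fixed-attenuation idea concrete, the schematic single-channel sketch below applies a dB attenuation to the input of a compressive (“nonlinear-path”) stage while leaving the linear path untouched. Ordinary Butterworth filters and a broken-stick compressor stand in for the DRNL’s gammatone/low-pass cascade, and all parameter values are illustrative, so the sketch demonstrates the principle rather than reproducing MAP_BS.

    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def drnl_like_channel(x, fs, cf, moc_attenuation_db=0.0,
                          comp_threshold=1e-4, comp_exponent=0.25):
        """Very simplified DRNL-style channel: linear path + compressive nonlinear path.
        `moc_attenuation_db` attenuates the nonlinear-path input, mimicking a fixed,
        non-time-varying MOC suppression of cochlear gain (schematic values only)."""
        band = butter(2, [0.7 * cf, 1.3 * cf], btype='bandpass', fs=fs, output='sos')
        lp = butter(2, 1.5 * cf, btype='low', fs=fs, output='sos')

        linear = sosfiltfilt(lp, sosfiltfilt(band, x))                 # uncompressed path

        x_att = x * 10 ** (-moc_attenuation_db / 20)                   # fixed MOC attenuation
        y = sosfiltfilt(band, x_att)
        mag = np.abs(y)
        compressed = np.where(mag > comp_threshold,                    # broken-stick compression
                              np.sign(y) * comp_threshold ** (1 - comp_exponent) * mag ** comp_exponent,
                              y)
        nonlinear = sosfiltfilt(lp, compressed)
        return linear + nonlinear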

Word presentation

A total of 100 words (50 stop/50 nonstop consonant words) were chosen at random from the speech corpus and were degraded using the most demanding speech manipulations (Voc8, BN5, and SSN3; Fig 1D). Normal (Test+) and polarity-inverted (Test−) versions of each manipulation were presented to the MAP_BS model at 75 dB SPL, both with and without efferent feedback (Fig 2A). Natural words (both normal, Nat+, and polarity-inverted, Nat−) were also presented to the MAP_BS model with and without efferent feedback; however, the AN output with active efferent feedback was selected as the main reference condition against which neural responses to degraded speech tokens were compared (for the exception, see S2 Fig).

Shuffled autocorrelogram analysis

Comparative analysis of AN coding of the AM envelope between the Voc8/BN5/SSN3 conditions and the reference natural condition (with the MOC reflex) was performed using shuffled auto- and cross-correlograms (SACs and SCCs, respectively) [50,73,74]. Normalised all-order histograms were calculated using the spike trains of 400 high-threshold AN fibres with a coincidence window of 50 μs and a delay window of ±25 ms centred on zero [73]. No correction for triangular shape was required given the brevity of the delay window relative to stimulus length (between 420 and 650 ms) [74,77]. A neural cross-correlation coefficient, ρENV, quantifying the similarity of AM envelope encoding between conditions was generated as follows [50,73]:

\[ \rho_{\mathrm{ENV}} = \frac{\mathrm{sumcor}_{\mathrm{Test\,Nat}} - 1}{\sqrt{(\mathrm{sumcor}_{\mathrm{Test}} - 1) \times (\mathrm{sumcor}_{\mathrm{Nat}} - 1)}}, \]

where sumcorNat (natural word reference) and sumcorTest (Voc/BN/SSN conditions) are the averages of SACs (Normalised all-order histograms for Nat+ versus Nat+/Test+ versus Test+ for sumcorNat /sumcorTest, respectively) and cross-polarity histograms (Normalised all-order histograms for Nat+ versus Nat-/Test+ versus Test- for sumcorNat /sumcorTest, respectively). SumcorTest Nat is the average of the SCC (Average of normalised all-order histograms for Nat+ versus Test+ and Nat- versus Test-) and the cross-polarity correlogram (Average of normalised all-order histograms for Nat- versus Test+ and Nat+ versus Test-) between natural and Voc8/BN5/SSN3 conditions. All high-frequency oscillations (> characteristic frequency of AN fibre), associated with fine-structure leakage, were removed from sumcors [73,76]. ρENV values ranged from 0 to 1 where 0 represents completely dissimilar spike trains and 1 represents identical spike patterns [50,73]. The neural cross-correlation coefficient, ρTFS, was also calculated to quantify the similarity of 2 conditions’ TFSs as follows [50,73]:

\[ \rho_{\mathrm{TFS}} = \frac{\mathrm{diffcor}_{\mathrm{Test\,Nat}}}{\sqrt{\mathrm{diffcor}_{\mathrm{Test}} \times \mathrm{diffcor}_{\mathrm{Nat}}}}, \]

where diffcorNat (natural word reference) and diffcorTest (Voc/BN/SSN conditions) are the difference between SACs (Normalised all-order histograms for Nat+ versus Nat+/Test+ versus Test+ for diffcorNat /diffcorTest, respectively) and cross-polarity histograms (Normalised all-order histograms for Nat+ versus Nat-/Test+ versus Test- for diffcorNat /diffcorTest, respectively). DifcorTest Nat is the difference between the SCC (Average of normalised all-order histograms for Nat+ versus Test+ and Nat- versus Test-) and the cross-polarity correlogram (Average of normalised all-order histograms for Nat- versus Test+ and Nat+ versus Test-) between natural and Voc8/BN5/SSN3 conditions. ρTFS was calculated only for masked speech stimuli (BN5 and SSN3) given that noise vocoding scrambles the TFS cues and, therefore, would limit the utility of any comparison of TFS encoding for noise-vocoded and natural speech [50,178].
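Assuming the zero-delay (peak) values of the sumcor and diffcor functions are used, as in the neural cross-correlation framework of [50,73], the two coefficients reduce to a few lines; the histogram inputs below are taken as already computed and normalised, and the zero-delay convention is an assumption for illustration.

    import numpy as np

    def _peak(h):
        """Zero-delay value of a correlogram whose delay axis is centred on 0."""
        h = np.asarray(h)
        return h[len(h) // 2]

    def rho_env(sac_nat, xpac_nat, sac_test, xpac_test, scc, xpcc):
        """Envelope coefficient from sumcors (sumcor = mean of SAC and cross-polarity histogram)."""
        s_nat = _peak(0.5 * (np.asarray(sac_nat) + np.asarray(xpac_nat)))
        s_test = _peak(0.5 * (np.asarray(sac_test) + np.asarray(xpac_test)))
        s_cross = _peak(0.5 * (np.asarray(scc) + np.asarray(xpcc)))
        return (s_cross - 1.0) / np.sqrt((s_test - 1.0) * (s_nat - 1.0))

    def rho_tfs(sac_nat, xpac_nat, sac_test, xpac_test, scc, xpcc):
        """TFS coefficient from diffcors (diffcor = SAC minus cross-polarity histogram)."""
        d_nat = _peak(np.asarray(sac_nat) - np.asarray(xpac_nat))
        d_test = _peak(np.asarray(sac_test) - np.asarray(xpac_test))
        d_cross = _peak(np.asarray(scc) - np.asarray(xpcc))
        return d_cross / np.sqrt(d_test * d_nat)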

Analysis of modelling and statistics

Percentage changes in ρENV due to efferent feedback inclusion in MAP_BS were calculated for each test frequency and Voc8/BN5/SSN3 condition as follows:

\[ \Delta\rho_{\mathrm{ENV}}^{\mathrm{freq}} = \left( \frac{\rho_{\mathrm{ENV}}^{\mathrm{eff}} - \rho_{\mathrm{ENV}}^{\mathrm{no\,eff}}}{\rho_{\mathrm{ENV}}^{\mathrm{no\,eff}}} \right) \times 100, \]

where ΔρENVfreq is the percentage change in ρENV at a test frequency for a manipulated word, and ρENVeff and ρENVno eff are measures of ρENV with and without efferent feedback enabled, respectively. An average ΔρENVfreq was calculated across test frequencies for each word and manipulation. Similar calculations were performed for ΔρTFSfreq by replacing ρENV in the equation with ρTFS. Data are reported as means and SEM. A one-sample Wilcoxon signed rank test (used because the data were not normally distributed) was performed to confirm whether the average ΔρENVfreq across all words differed from zero for each speech manipulation. Paired Wilcoxon signed rank tests were performed between experimental conditions. The Wilcoxon effect size (r) was calculated for all statistical tests.
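The percentage change and the accompanying one-sample Wilcoxon test could be computed as follows; using scipy.stats.wilcoxon against a zero median and the effect-size convention r = |Z|/√N are assumptions about the implementation rather than a description of the original scripts.

    import numpy as np
    from scipy.stats import wilcoxon, norm

    def delta_rho_env(rho_eff, rho_noeff):
        """Percentage change in rho_ENV when efferent feedback is enabled."""
        return (np.asarray(rho_eff) - np.asarray(rho_noeff)) / np.asarray(rho_noeff) * 100.0

    def wilcoxon_vs_zero(deltas):
        """One-sample Wilcoxon signed-rank test of the changes against zero, returning the
        p-value and an approximate effect size r = |Z| / sqrt(N) (standard convention)."""
        deltas = np.asarray(deltas)
        _, p = wilcoxon(deltas)
        z = norm.isf(p / 2)                     # approximate |Z| recovered from the two-sided p
        return p, z / np.sqrt(len(deltas))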

Supporting information

S1 Table. Planned t test comparisons between CEOAEs baseline measures and CEOAEs magnitude obtained during the presentation of noise-vocoded speech and masked speech.

CEOAE, click-evoked otoacoustic emission.

(XLSX)

S2 Table. Subjects removed for CEOAEs suppression and ABR analysis.

ABR, auditory brainstem response; CEOAE, click-evoked otoacoustic emission.

(XLSX)

S1 Fig. Efferent feedback improves envelope encoding for naturally spoken sentences.

(A) Shuffled-Correlogram Sumcors (upper panel) were calculated for the naturally spoken sentence, “the steady drip is worse than the drenching rain” (s86; The MAVA corpus, [179]), using LSR AN fibre output in the 2.334 kHz channel with (red line) and without (black line) efferent feedback (MOC reflex). A longer, 1-second delay window was used compared to the single word presentation; in addition, inverted triangular compensation was implemented to compensate for large delays relative to signal length [74,77]. The envelope power spectral density (lower panel) was computed both with (red line) and without (black line) efferent feedback by computing Fourier transforms of the above Sumcors with a <1 Hz spectral resolution. Efferent feedback was conducive to larger envelope responses, especially at low modulation frequencies associated with words and syllables. (B) Envelope power spectra computed with and without MOC reflex in the 4 to 8 Hz modulation range for 6 sentences (s7, s26, s37, s42, s86, and s164; The MAVA corpus, [179]). In all instances, adding MOC reflex improved envelope encoding across most modulation frequencies. The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. AN, auditory nerve; LSR, low spontaneous rate; MOC, medial olivocochlear.

(EPS)

S2 Fig. Using a different control template condition (NatANonly) to calculate ρENVAN does not alter stimulus-specific changes to envelope encoding when (15-dB attenuation) efferent feedback is added to LSR fibres.

(A) ΔρENVs for 100 words (in their 3 degraded forms and for low- and high-frequency bands: <1.5 kHz, left, and >1.5 kHz, right) were calculated using AN responses to “Natural” speech (i.e., in quiet) presented without the MOC reflex as the control template to compute values of ρENVAN, i.e., NatANonly versus DegradedANonly. ΔρENVs for all manipulations in both low- and high-frequency bands followed the same stimulus-dependent trends as in Fig 2 (mean ΔρENV for Voc8 for freqs <1.5 kHz = +3.69 ± 0.30%, [Z(99) = 5.68, p < 0.001, r = 0.57]; mean ΔρENV for Voc8 for freqs >1.5 kHz = +8.08 ± 0.36%, [Z(99) = 8.6, p < 0.0001, r = 0.87]; mean ΔρENV for BN5 for freqs <1.5 kHz = −9.20 ± 0.66, [Z(99) = −8.68, p < 0.0001, r = 0.87]; mean ΔρENV for BN5 for freqs >1.5 kHz = −3.24 ± 0.30%, [Z(99) = −7.84, p < 0.001, r = 0.57]; mean ΔρENV for SSN3 for freqs <1.5 kHz = −9.09 ± 0.70, [Z(99) = −8.68, p < 0.0001, r = 0.87]; mean ΔρENV for SSN3 >1.5 kHz = −5.60 ± 0.34, [Z(99) = −8.68, p < 0.0001, r = 0.87]). ΔρENVs for Voc8 stimuli (pink circles, left, A) were exclusively positive in the high-frequency band, with the largest benefits observed for noise-vocoded tokens with the lowest ρENVAN values, as observed in Fig 2E. In addition, the most negative ΔρENVs for BN5 and SSN3 stimuli were observed for the lowest values of ρENVAN. (B) Comparing ΔρENVs calculated using NatANonly and NatAN+MOC as a control template for ρENVAN at high frequencies (>1.5 kHz). The mean improvement in envelope encoding for Voc8 stimuli was larger after calculating ρENVAN with the new NatANonly control template ([Z(99) = −4.6, p < 0.001, r = 0.47]) (left column, B). Similarly, for masked stimuli (BN5 (middle, B) and SSN3 (right, B)), the new control template for ρENVAN led to an increase in the impairment to envelope encoding with the MOC reflex (BN5: [Z(99) = −6.50, p < 0.001, r = 0.65]; SSN3: [Z(99) = −7.09, p < 0.01, r = 0.71]) (middle and right columns, B). The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. AN, auditory nerve; BN, babble noise; LSR, low spontaneous rate; MOC, medial olivocochlear; SSN, speech-shaped noise.

(EPS)

S3 Fig. Comparing mean changes in ΔρENVs (in >1.5kHz frequency band) for control conditions.

(A) Using a smaller fixed attenuation for the active MOC reflex (10-dB attenuation) than in the main simulations (15-dB attenuation) reduced both the positive mean ΔρENV for Voc8 and the negative mean ΔρENVs for BN5/SSN3 (Voc8: [Z(99) = −7.94, p < 0.001, r = 0.79]; BN5: [Z(99) = −5.53, p < 0.001, r = 0.55]; SSN3: [Z(99) = −7.85, p < 0.001, r = 0.78]). Nevertheless, the benefits (Voc8) and disbenefits (BN5/SSN3) of adding the MOC reflex to envelope encoding remained for all 3 stimulus manipulations (Voc8: [Z(99) = 6.756, p < 0.0001, r = 0.79]; BN5: [Z(99) = −5.16, p < 0.001, r = 0.52]; SSN3: [Z(99) = −8.44, p < 0.0001, r = 0.84]). (B) Presenting degraded speech tokens with more channels for Voc stimuli, i.e., Voc16, generated significantly larger ΔρENVs with a fixed 15-dB MOC reflex attenuation ([Z(99) = −3.66, p < 0.001, r = 0.4]). Increasing the SNRs (10-dB SNR for BN and 8-dB SNR for SSN) significantly reduced ΔρENVs for speech-in-noise conditions when the same MOC reflex attenuation was implemented (BN: [Z(99) = −8.20, p < 0.001, r = 0.82]; SSN: [Z(99) = −8.67, p < 0.001, r = 0.87]). For BN10, the new mean ΔρENV was in fact positive ([Z(99) = 2.025, p = 0.043, r = 0.2]). The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. BN, babble noise; MOC, medial olivocochlear; SNR, signal-to-noise ratio; SSN, speech-shaped noise.

(EPS)

S4 Fig. Percentage change in envelope encoding after introduction of (15-dB attenuation) efferent feedback to low-threshold, HSR AN fibres (in > 1.5kHz frequency band).

(A and B) ΔρENVs for 100 words (in their 3 degraded forms) were calculated as in Fig 2E; however, HSR AN fibre output for frequencies >1.5 kHz was used here. ΔρENVs for Voc8 words (pink circles, A) varied greatly (Max-Min ΔρENV for Voc8 > 1.5kHz = +32.84 to −0.43%) but the mean ΔρENV was significantly positive (mean ΔρENV for Voc8 >1.5 kHz = +12.06 ± 0.57%, [Z (99) = 8.68, p < 0.0001, r = 0.87]). Note that values of ρENVAN for Voc8 were smaller here compared to values for low SR (LSR) AN fibres (mean ρENVAN-HSR for Voc8 >1.5 kHz = 0.55 ± 0.01 versus mean ρENVAN-LSR for Voc8 >1.5 kHz = 0.64 ± 0.01). By contrast, the distributions of ΔρENVs for BN5 (light blue squares, A) and SSN3 (green diamonds, A) appeared more compact (Max-Min Range ΔρENV for BN5 words = +2.062,27 to −9.79%; Max-Min ΔρENV for SSN3 = +1.07 to −10.57%); however, as for LSR AN fibre results (Fig 2E and 2F), both mean ΔρENVs for HSR AN fibres were significantly negative overall (mean ΔρENV for BN5 = −3.47 ± 0.27, [Z(99) = −8.18, p < 0.001, r = 0.82]; mean ΔρENV for SSN3 = −4.36 ± 0.22, [Z(99) = −8.65, p < 0.0001, r = 0.87]). Progression of mean ΔρENVs (± SEM) for model data > 1.5kHz (checkerboard bars, right, B) mirrored that of active-task, CEOAE data (mean ± SEM) (solid colour bars, left, B). The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. AN, auditory nerve; BN, babble noise; CEOAE, click-evoked otoacoustic emission; HSR, high spontaneous rate; LSR, low spontaneous rate; SSN, speech-shaped noise.

(EPS)

S5 Fig. Percentage change in TFS encoding for masked speech conditions (BN5/SSN3) after introduction of efferent feedback (15-dB attenuation) to LSR AN fibres (in >1.5kHz frequency band).

Changes to TFS encoding were calculated for masked speech (not for noise-vocoded stimuli given their scrambled fine structure [50,73,76,178]) using Natural conditions with the MOC reflex as control templates to calculate both ρTFSAN and ρTFSMOC. Adding the MOC reflex produced a mean improvement in TFS encoding for both BN5 and SSN3 (mean ΔρTFS for BN5 for freqs >1.5 kHz = 0.31 ± 0.16%, [Z(99) = 2.61, p = 0.009, r = 0.26]; mean ΔρTFS for SSN3 for freqs >1.5 kHz = 1.06 ± 0.15, [Z(99) = 5.95, p < 0.001, r = 0.59]). The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. AN, auditory nerve; BN, babble noise; LSR, low spontaneous rate; MOC, medial olivocochlear; SSN, speech-shaped noise; TFS, temporal fine structure.

(EPS)

S6 Fig. Cortical evoked potentials during active and passive speech perception.

ERP components during active and passive listening from electrodes FZ, F3, F4, CZ, C3, C4, TP7, TP8, T7, T8, PZ, P3, and P4 are shown in panels A, B, and C. Electrode selection was based on their relevance to attention- and language-related brain networks [93–97]. Thick lines and shaded areas represent means and SEM, respectively. Within-condition analyses showed that, for all speech manipulations, the magnitudes of the P1, P2, and N400 potentials were enhanced during active (coloured lines) compared to passive (grey lines) listening conditions, while N1 tended to be less negative in the active task. LPC magnitude was significantly enhanced only during active listening to speech in noise. (A) ERP components in natural and all noise-vocoded manipulations: P1: [F(1,24) = 6.36, p = 0.02, η2 = 0.21], N1: [F(1,24) = 16.03, p = 0.001, η2 = 0.40], P2: [F(1,24) = 12.30, p = 0.002, η2 = 0.34], N400: [F(1,24) = 31.82, p = 0.0001, η2 = 0.57], LPC: [F(1,24) = 5.29, p = 0.03, η2 = 0.18]. (B) ERPs during natural (different population than the noise-vocoded experiment) and all BN manipulations (n = 29): P1: [F(1,28) = 24.47, p = 0.0001, η2 = 0.47], N1: [F(1,28) = 10.46, p = 0.003, η2 = 0.27], P2: [F(1,28) = 10.65, p = 0.003, η2 = 0.28], N400: [F(1,28) = 62.16, p = 0.0001, η2 = 0.69], LPC: [F(1,28) = 10.55, p = 0.003, η2 = 0.27]. (C) ERP components during SSN manipulations (n = 29): P1: [F(1,28) = 22.98, p = 0.0001, η2 = 0.45], N1: [F(1,28) = 6.07, p = 0.02, η2 = 0.18], P2: [F(1,28) = 18.10, p = 0.001, η2 = 0.39], N400: [F(1,28) = 60.75, p = 0.0001, η2 = 0.68], LPC: [F(1,28) = 10.76, p = 0.003, η2 = 0.28]. The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. BN, babble noise; ERP, event-related potential; LPC, late positivity complex; SSN, speech-shaped noise.

(EPS)

S7 Fig. Comparison of LTAS for natural speech, BN, and SSNs.

Power spectrum density estimates were calculated for 300 concatenated natural speech tokens and 60 seconds of 8-talker BN and SSN; all acoustic stimuli had been normalised to 65 dB for the purpose of this figure. The upper root-mean square envelopes, generated using 300-point sliding windows, are shown for the different conditions. The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. BN, babble noise; LTAS, long-term average spectra; SSN, speech-shaped noise.

(EPS)

S8 Fig. Example of subject’s CEOAE data management from Fig 1F.

Boxes and whiskers represent the distribution of the data in quartiles. Whiskers indicate the variability outside the upper and lower quartiles. Star symbols represent outliers; data points labelled SNR correspond to CEOAE data with SNR < 6 dB, while data points labelled ID correspond to incomplete data acquisition. These data points were not considered for statistical analysis. The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. CEOAE, click-evoked otoacoustic emission; SNR, signal-to-noise ratio.

(EPS)

Acknowledgments

The authors thank Prof. David Poeppel for his contributions during experimental design. We thank Prof. David Ryugo for his comments on the manuscript. In addition, we thank Ronny Ibrahim, Jaime Undurraga, Lindsey Van Yper, and Greg Stewart for their technical support and EEG analysis and Nicholas Clark for his assistance with MAP_BS. The authors thank Ray Meddis for bringing the MAP_BS model to our attention.

Abbreviations

ABR

auditory brainstem response

AC

auditory cortex

AM

amplitude-modulated

AN

auditory nerve

BN

babble noise

CEOAE

click-evoked otoacoustic emission

CI

cochlear implant

CNC

consonant–nucleus–consonant

DPOAE

distortion product otoacoustic emission

DRNL

dual-resonance nonlinear

ERP

event-related potential

FPL

forward equivalent pressure level

HSR

high spontaneous rate

IC

inferior colliculus

LPC

late positivity complex

LSB

least significant bit

LSR

low spontaneous rate

MAP_BS

Matlab Auditory Periphery and Brainstem

MEMR

middle ear muscle reflex

MOC

medial olivocochlear

n.s.

nonsignificant

OAE

otoacoustic emission

OHC

outer hair cell

rANOVA

repeated measures ANOVA

SNR

signal-to-noise ratio

SSN

speech-shaped noise

TFS

temporal fine structure

Data Availability

The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw.

Funding Statement

H.H.P. was supported in this study by an International Macquarie University Excellence Scholarship (https://www.mq.edu.au/research/phd-and-research-degrees/scholarships/scholarship-search/data/international-hdr-main-scholarship-round) and The HEARing Cooperative Research Centre (https://www.hearingcrc.org/). J.M.H. was supported in this study by an Australian Research Council Laureate Fellowship (FL 160100108) awarded to D.M. (https://www.arc.gov.au/grants/discovery-program/australian-laureate-fellowships). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Lesica NA. Why do hearing aids fail to restore normal auditory perception? Trends Neurosci. 2018;41:174–85. doi: 10.1016/j.tins.2018.01.008 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Lehmann J, Christen N, Barilan YM, Gannot I. Age-related hearing loss, speech understanding and cognitive technologies. Int J Speech Technol. 2021. doi: 10.1007/s10772-021-09817-z [DOI] [Google Scholar]
  • 3.Chien W, Lin FR. Prevalence of hearing aid use among older adults in the United States. Arch Intern Med. 2012;172:292–3. doi: 10.1001/archinternmed.2011.1408 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Simpson AN, Matthews LJ, Cassarly C, Dubno JR. Time From Hearing Aid Candidacy to Hearing Aid Adoption: A Longitudinal Cohort Study. Ear Hear. 2019;40:468–76. doi: 10.1097/AUD.0000000000000641 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Irace AL, Sharma RK, Reed NS, Golub JS. Smartphone-Based Applications to Detect Hearing Loss: A Review of Current Technology. J Am Geriatr Soc. 2021;69:307–16. doi: 10.1111/jgs.16985 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Darrow KN, Maison SF, Liberman MC. Cochlear efferent feedback balances interaural sensitivity. Nat Neurosci. 2006;9:1474–6. doi: 10.1038/nn1807 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Grothe B, Pecka M. The natural history of sound localization in mammals—a story of neuronal inhibition. Front Neural Circuits. 2014;8:116. doi: 10.3389/fncir.2014.00116 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Smith DW, Keil A. The biological role of the medial olivocochlear efferents in hearing: separating evolved function from exaptation. Front Syst Neurosci. 2015;9:12. doi: 10.3389/fnsys.2015.00012 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Ahveninen J, Jääskeläinen IP, Raij T, Bonmassar G, Devore S, Hämäläinen M, et al. Task-modulated “what” and “where” pathways in human auditory cortex. Proc Natl Acad Sci U S A. 2006;103:14608–13. doi: 10.1073/pnas.0510480103 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Knudsen EI. Fundamental components of attention. Annu Rev Neurosci. 2007;30:57–78. doi: 10.1146/annurev.neuro.30.051606.094256 [DOI] [PubMed] [Google Scholar]
  • 11.Mulders WH, Robertson D. Evidence for direct cortical innervation of medial olivocochlear neurones in rats. Hear Res. 2000;144:65–72. doi: 10.1016/s0378-5955(00)00046-0 [DOI] [PubMed] [Google Scholar]
  • 12.Xiao Z, Suga N. Modulation of cochlear hair cells by the auditory cortex in the mustached bat. Nat Neurosci. 2002;5:57–63. doi: 10.1038/nn786 [DOI] [PubMed] [Google Scholar]
  • 13.Dragicevic CD, Aedo C, León A, Bowen M, Jara N, Terreros G, et al. The olivocochlear reflex strength and cochlear sensitivity are independently modulated by auditory cortex microstimulation. J Assoc Res Otolaryngol. 2015;16:223–40. doi: 10.1007/s10162-015-0509-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Ashmore J, Avan P, Brownell WE, Dallos P, Dierkes K, Fettiplace R, et al. The remarkable cochlear amplifier. Hear Res. 2010;266:1–17. doi: 10.1016/j.heares.2010.05.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Almishaal A, Jennings SG. Effects of a precursor on amplitude modulation detection are consistent with efferent feedback. J Acoust Soc Am. 2016;139:2155–5. [Google Scholar]
  • 16.Cooper NP, Guinan JJ Jr. Efferent-mediated control of basilar membrane motion. J Physiol. 2006;576:49–54. doi: 10.1113/jphysiol.2006.114991 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Murugasu E, Russell IJ. The effect of efferent stimulation on basilar membrane displacement in the basal turn of the guinea pig cochlea. J Neurosci. 1996;16:325–32. doi: 10.1523/JNEUROSCI.16-01-00325.1996 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Winslow RL, Sachs MB. Effect of electrical stimulation of the crossed olivocochlear bundle on auditory nerve response to tones in noise. J Neurophysiol. 1987;57:1002–21. doi: 10.1152/jn.1987.57.4.1002 [DOI] [PubMed] [Google Scholar]
  • 19.Guinan JJ Jr, Gifford ML. Effects of electrical stimulation of efferent olivocochlear neurons on cat auditory-nerve fibers. I Rate-level functions Hear Res. 1988;33:97–113. doi: 10.1016/0378-5955(88)90023-8 [DOI] [PubMed] [Google Scholar]
  • 20.Suthakar K, Ryugo DK. Descending projections from the inferior colliculus to medial olivocochlear efferents: Mice with normal hearing, early onset hearing loss, and congenital deafness. Hear Res. 2017;343:34–49. doi: 10.1016/j.heares.2016.06.014 [DOI] [PubMed] [Google Scholar]
  • 21.Faye-Lund H. Projection from the inferior colliculus to the superior olivary complex in the albino rat. Anat Embryol. 1986;175: 35–52. doi: 10.1007/BF00315454 [DOI] [PubMed] [Google Scholar]
  • 22.Perrot X, Ryvlin P, Isnard J, Guénot M, Catenoix H, Fischer C, et al. Evidence for corticofugal modulation of peripheral auditory activity in humans. Cereb Cortex. 2006;16:941–8. doi: 10.1093/cercor/bhj035 [DOI] [PubMed] [Google Scholar]
  • 23.Terreros G, Delano PH. Corticofugal modulation of peripheral auditory responses. Front Syst Neurosci. 2015;9:134. doi: 10.3389/fnsys.2015.00134 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Bowen M, Terreros G, Moreno-Gómez FN, Ipinza M, Vicencio S, Robles L, et al. The olivocochlear reflex strength in awake chinchillas is relevant for behavioural performance during visual selective attention with auditory distractors. Sci Rep. 2020;10:14894. doi: 10.1038/s41598-020-71399-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Wittekindt A, Kaiser J, Abel C. Attentional modulation of the inner ear: a combined otoacoustic emission and EEG study. J Neurosci. 2014;34:9995–10002. doi: 10.1523/JNEUROSCI.4861-13.2014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Guinan JJ Jr. Olivocochlear efferents: Their action, effects, measurement and uses, and the impact of the new conception of cochlear mechanical responses. Hear Res. 2018;362:38–47. doi: 10.1016/j.heares.2017.12.012 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.de Boer J, Thornton ARD, Krumbholz K. What is the role of the medial olivocochlear system in speech-in-noise processing? J Neurophysiol. 2012;107:1301–12. doi: 10.1152/jn.00222.2011 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Lopez-Poveda EA. Olivocochlear Efferents in Animals and Humans: From Anatomy to Clinical Relevance. Front Neurol. 2018;9:197. doi: 10.3389/fneur.2018.00197 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Mishra SK, Lutman ME. Top-down influences of the medial olivocochlear efferent system in speech perception in noise. PLoS ONE. 2014;9:e85756. doi: 10.1371/journal.pone.0085756 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Mertes IB, Johnson KM, Dinger ZA. Olivocochlear efferent contributions to speech-in-noise recognition across signal-to-noise ratios. J Acoust Soc Am. 2019;145:1529. doi: 10.1121/1.5094766 [DOI] [PubMed] [Google Scholar]
  • 31.Kemp DT. Stimulated acoustic emissions from within the human auditory system. J Acoust Soc Am. 1978;64:1386–91. doi: 10.1121/1.382104 [DOI] [PubMed] [Google Scholar]
  • 32.Collet L, Kemp DT, Veuillet E, Duclaux R, Moulin A, Morgon A. Effect of contralateral auditory stimuli on active cochlear micro-mechanical properties in human subjects. Hear Res. 1990;43:251–61. doi: 10.1016/0378-5955(90)90232-e [DOI] [PubMed] [Google Scholar]
  • 33.Giraud AL, Garnier S, Micheyl C, Lina G, Chays A, Chéry-Croze S. Auditory efferents involved in speech-in-noise intelligibility. Neuroreport. 1997;8:1779–83. doi: 10.1097/00001756-199705060-00042 [DOI] [PubMed] [Google Scholar]
  • 34.de Boer J, Thornton ARD. Neural correlates of perceptual learning in the auditory brainstem: efferent activity predicts and reflects improvement at a speech-in-noise discrimination task. J Neurosci. 2008;28:4929–37. doi: 10.1523/JNEUROSCI.0902-08.2008 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Harkrider AW, Bowers CD. Evidence for a cortically mediated release from inhibition in the human cochlea. J Am Acad Audiol. 2009;20:208–15. doi: 10.3766/jaaa.20.3.7 [DOI] [PubMed] [Google Scholar]
  • 36.Wagner W, Frey K, Heppelmann G, Plontke SK, Zenner H-P. Speech-in-noise intelligibility does not correlate with efferent olivocochlear reflex in humans with normal hearing. Acta Otolaryngol. 2008;128:53–60. doi: 10.1080/00016480701361954 [DOI] [PubMed] [Google Scholar]
  • 37.Stuart A, Butler AK. Contralateral suppression of transient otoacoustic emissions and sentence recognition in noise in young adults. J Am Acad Audiol. 2012;23:686–96. doi: 10.3766/jaaa.23.9.3 [DOI] [PubMed] [Google Scholar]
  • 38.Mertes IB, Wilbanks EC, Leek MR. Olivocochlear Efferent Activity Is Associated With the Slope of the Psychometric Function of Speech Recognition in Noise. Ear Hear. 2018;39:583–93. doi: 10.1097/AUD.0000000000000514 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Berlin CI, Hood LJ, Wen H, Szabo P, Cecola RP, Rigby P, et al. Contralateral suppression of non-linear click-evoked otoacoustic emissions. Hear Res. 1993;71:1–11. doi: 10.1016/0378-5955(93)90015-s [DOI] [PubMed] [Google Scholar]
  • 40.Norman M, Thornton AR. Frequency analysis of the contralateral suppression of evoked otoacoustic emissions by narrow-band noise. Br J Audiol. 1993;27:281–9. doi: 10.3109/03005369309076705 [DOI] [PubMed] [Google Scholar]
  • 41.Kalaiah MK, Theruvan NB, Kumar K, Bhat JS. Role of Active Listening and Listening Effort on Contralateral Suppression of Transient Evoked Otoacousic Emissions. J Audiol Otol. 2017;21:1–8. doi: 10.7874/jao.2017.21.1.1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Shannon RV, Zeng FG, Kamath V, Wygonski J, Ekelid M. Speech recognition with primarily temporal cues. Science. 1995;270:303–4. doi: 10.1126/science.270.5234.303 [DOI] [PubMed] [Google Scholar]
  • 43.Lilaonitkul W, Guinan JJ Jr. Human medial olivocochlear reflex: effects as functions of contralateral, ipsilateral, and bilateral elicitor bandwidths. J Assoc Res Otolaryngol. 2009;10:459–70. doi: 10.1007/s10162-009-0163-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Maison S, Micheyl C, Andéol G, Gallégo S, Collet L. Activation of medial olivocochlear efferent system in humans: influence of stimulus bandwidth. Hear Res. 2000;140:111–25. doi: 10.1016/s0378-5955(99)00196-3 [DOI] [PubMed] [Google Scholar]
  • 45.Backus BC, Guinan JJ Jr. Time-course of the human medial olivocochlear reflex. J Acoust Soc Am. 2006;119:2889–904. doi: 10.1121/1.2169918 [DOI] [PubMed] [Google Scholar]
  • 46.Hood LJ, Berlin CI, Hurley A, Cecola RP, Bell B. Contralateral suppression of transient-evoked otoacoustic emissions in humans: intensity effects. Hear Res. 1996;101:113–8. doi: 10.1016/s0378-5955(96)00138-4 [DOI] [PubMed] [Google Scholar]
  • 47.Ding N, Simon JZ. Adaptive temporal encoding leads to a background-insensitive cortical representation of speech. J Neurosci. 2013;33:5728–35. doi: 10.1523/JNEUROSCI.5297-12.2013 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Vanthornhout J, Decruy L, Wouters J, Simon JZ, Francart T. Speech Intelligibility Predicted from Neural Entrainment of the Speech Envelope. J Assoc Res Otolaryngol. 2018;19:181–91. doi: 10.1007/s10162-018-0654-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Brodbeck C, Jiao A, Hong LE, Simon JZ. Neural speech restoration at the cocktail party: Auditory cortex recovers masked speech of both attended and ignored speakers. PLoS Biol. 2020;18:e3000883. doi: 10.1371/journal.pbio.3000883 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Moon IJ, Won JH, Park M-H, Ives DT, Nie K, Heinz MG, et al. Optimal combination of neural temporal envelope and fine structure cues to explain speech identification in background noise. J Neurosci. 2014;34:12145–54. doi: 10.1523/JNEUROSCI.1025-14.2014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Relaño-Iborra H, May T, Zaar J, Scheidiger C, Dau T. Predicting speech intelligibility based on a correlation metric in the envelope power spectrum domain. J Acoust Soc Am. 2016;140:2670. doi: 10.1121/1.4964505 [DOI] [PubMed] [Google Scholar]
  • 52.Kahneman D. Attention and effort. Prentice-Hall; 1973. [Google Scholar]
  • 53.Abdala C, Folsom RC. Frequency contribution to the click-evoked auditory brain-stem response in human adults and infants. J Acoust Soc Am. 1995;97:2394–404. doi: 10.1121/1.411961 [DOI] [PubMed] [Google Scholar]
  • 54.Don M, Eggermont JJ. Analysis of the click-evoked brainstem potentials in man using high-pass noise masking. J Acoust Soc Am. 1978;63:1084–92. doi: 10.1121/1.381816 [DOI] [PubMed] [Google Scholar]
  • 55.Chintanpalli A, Jennings SG, Heinz MG, Strickland EA. Modeling the anti-masking effects of the olivocochlear reflex in auditory nerve responses to tones in sustained noise. J Assoc Res Otolaryngol. 2012;13:219–35. doi: 10.1007/s10162-011-0310-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Jennings SG, Heinz MG, Strickland EA. Evaluating adaptation and olivocochlear efferent feedback as potential explanations of psychophysical overshoot. J Assoc Res Otolaryngol. 2011;12:345–60. doi: 10.1007/s10162-011-0256-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Smalt CJ, Heinz MG, Strickland EA. Modeling the time-varying and level-dependent effects of the medial olivocochlear reflex in auditory nerve responses. J Assoc Res Otolaryngol. 2014;15:159–73. doi: 10.1007/s10162-013-0430-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Brown GJ, Ferry RT, Meddis R. A computer model of auditory efferent suppression: implications for the recognition of speech in noise. J Acoust Soc Am. 2010;127:943–54. doi: 10.1121/1.3273893 [DOI] [PubMed] [Google Scholar]
  • 59.Clark NR, Brown GJ, Jürgens T, Meddis R. A frequency-selective feedback model of auditory efferent suppression and its implications for the recognition of speech in noise. J Acoust Soc Am. 2012;132:1535–41. doi: 10.1121/1.4742745 [DOI] [PubMed] [Google Scholar]
  • 60.Yasin I, Drga V, Liu F, Demosthenous A, Meddis R. Optimizing Speech Recognition Using a Computational Model of Human Hearing: Effect of Noise Type and Efferent Time Constants. IEEE Access. 2020;8:56711–9. [Google Scholar]
  • 61.Messing DP, Delhorne L, Bruckert E, Braida LD, Ghitza O. A non-linear efferent-inspired model of the auditory system; matching human confusions in stationary noise. Speech Commun. 2009;51:668–83. [Google Scholar]
  • 62.Yasin I, Liu F, Drga V, Demosthenous A, Meddis R. Effect of auditory efferent time-constant duration on speech recognition in noise. J Acoust Soc Am. 2018;143:EL112. doi: 10.1121/1.5023502 [DOI] [PubMed] [Google Scholar]
  • 63.Ferry RT, Meddis R. A computer model of medial efferent suppression in the mammalian auditory system. J Acoust Soc Am. 2007;122:3519–26. doi: 10.1121/1.2799914 [DOI] [PubMed] [Google Scholar]
  • 64.Meddis R. MAP-BS a Matlab Auditory Processing software platform for studying Auditory BrainStem activity. 2016. [cited 3 Jun 2020]. doi: 10.13140/RG.2.2.30627.45603 [DOI] [Google Scholar]
  • 65.Moezzi B, Iannella N, McDonnell MD. Modeling the influence of short term depression in vesicle release and stochastic calcium channel gating on auditory nerve spontaneous firing statistics. Front Comput Neurosci. 2014;8. doi: 10.3389/fncom.2014.00008 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Apoux F, Yoho SE, Youngdahl CL, Healy EW. Role and relative contribution of temporal envelope and fine structure cues in sentence recognition by normal-hearing listeners. J Acoust Soc Am. 2013;134:2205–12. doi: 10.1121/1.4816413 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Fogerty D. Perceptual weighting of individual and concurrent cues for sentence intelligibility: frequency, envelope, and fine structure. J Acoust Soc Am. 2011;129:977–88. doi: 10.1121/1.3531954 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Jørgensen S, Ewert SD, Dau T. A multi-resolution envelope-power based model for speech intelligibility. J Acoust Soc Am. 2013;134:436–46. doi: 10.1121/1.4807563 [DOI] [PubMed] [Google Scholar]
  • 69.Scheidiger C, Carney LH, Dau T, Zaar J. Predicting Speech Intelligibility Based on Across-Frequency Contrast in Simulated Auditory-Nerve Fluctuations. Acta Acust United Acust. 2018;104:914–7. doi: 10.3813/aaa.919245 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Costalupes JA, Young ED, Gibson DJ. Effects of continuous noise backgrounds on rate response of auditory nerve fibers in cat. J Neurophysiol. 1984;51:1326–44. doi: 10.1152/jn.1984.51.6.1326 [DOI] [PubMed] [Google Scholar]
  • 71.Kujawa SG, Liberman MC. Synaptopathy in the noise-exposed and aging cochlea: Primary neural degeneration in acquired sensorineural hearing loss. Hear Res. 2015;330:191–9. doi: 10.1016/j.heares.2015.02.009 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Huet A, Desmadryl G, Justal T, Nouvian R, Puel J-L, Bourien J. The Interplay Between Spike-Time and Spike-Rate Modes in the Auditory Nerve Encodes Tone-In-Noise Threshold. J Neurosci. 2018;38:5727–38. doi: 10.1523/JNEUROSCI.3103-17.2018 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Heinz MG, Swaminathan J. Quantifying envelope and fine-structure coding in auditory nerve responses to chimaeric speech. J Assoc Res Otolaryngol. 2009;10:407–23. doi: 10.1007/s10162-009-0169-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Joris PX, Louage DH, Cardoen L, van der Heijden M. Correlation index: a new metric to quantify temporal coding. Hear Res. 2006;216–217:19–30. doi: 10.1016/j.heares.2006.03.010 [DOI] [PubMed] [Google Scholar]
  • 75.Louage DHG, van der Heijden M, Joris PX. Temporal properties of responses to broadband noise in the auditory nerve. J Neurophysiol. 2004;91:2051–65. doi: 10.1152/jn.00816.2003 [DOI] [PubMed] [Google Scholar]
  • 76.Parida S, Bharadwaj H, Heinz MG. Spectrally specific temporal analyses of spike-train responses to complex sounds: A unifying framework. PLoS Comput Biol. 2021;17:e1008155. doi: 10.1371/journal.pcbi.1008155 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Rallapalli VH, Heinz MG. Neural Spike-Train Analyses of the Speech-Based Envelope Power Spectrum Model: Application to Predicting Individual Differences with Sensorineural Hearing Loss. Trends Hear. 2016;20:2331216516667319. [Google Scholar]
  • 78.Joris PX, Schreiner CE, Rees A. Neural processing of amplitude-modulated sounds. Physiol Rev. 2004;84:541–77. doi: 10.1152/physrev.00029.2003 [DOI] [PubMed] [Google Scholar]
  • 79.Kale S, Heinz MG. Envelope coding in auditory nerve fibers following noise-induced hearing loss. J Assoc Res Otolaryngol. 2010;11:657–73. doi: 10.1007/s10162-010-0223-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Liberman MC. Auditory-nerve response from cats raised in a low-noise chamber. J Acoust Soc Am. 1978;63:442–55. doi: 10.1121/1.381736 [DOI] [PubMed] [Google Scholar]
  • 81.Winter IM, Palmer AR. Intensity coding in low-frequency auditory-nerve fibers of the guinea pig. J Acoust Soc Am. 1991;90:1958–67. doi: 10.1121/1.401675 [DOI] [PubMed] [Google Scholar]
  • 82.Heinz MG, Young ED. Response growth with sound level in auditory-nerve fibers after noise-induced hearing loss. J Neurophysiol. 2004;91:784–95. doi: 10.1152/jn.00776.2003 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Dean I, Robinson BL, Harper NS, McAlpine D. Rapid neural adaptation to sound level statistics. J Neurosci. 2008;28:6430–8. doi: 10.1523/JNEUROSCI.0470-08.2008 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Wen B, Wang GI, Dean I, Delgutte B. Dynamic range adaptation to sound level statistics in the auditory nerve. J Neurosci. 2009;29:13797–808. doi: 10.1523/JNEUROSCI.5610-08.2009 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Carney LH. Supra-Threshold Hearing and Fluctuation Profiles: Implications for Sensorineural and Hidden Hearing Loss. J Assoc Res Otolaryngol. 2018. doi: 10.1007/s10162-018-0669-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Marrufo-Pérez MI, Sturla-Carreto DDP, Eustaquio-Martín A, Lopez-Poveda EA. Adaptation to Noise in Human Speech Recognition Depends on Noise-Level Statistics and Fast Dynamic-Range Compression. J Neurosci. 2020;40:6613–23. doi: 10.1523/JNEUROSCI.0469-20.2020 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Marrufo-Pérez MI, Eustaquio-Martín A, Lopez-Poveda EA. Adaptation to Noise in Human Speech Recognition Unrelated to the Medial Olivocochlear Reflex. J Neurosci. 2018;38:4138–45. doi: 10.1523/JNEUROSCI.0024-18.2018 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Lopez-Poveda EA, Barrios P. Perception of stochastically undersampled sound waveforms: a model of auditory deafferentation. Front Neurosci. 2013;7:124. doi: 10.3389/fnins.2013.00124 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Lorenzi C, Gilbert G, Carn H, Garnier S, Moore BCJ. Speech perception problems of the hearing impaired reflect inability to use temporal fine structure. Proc Natl Acad Sci U S A. 2006;103:18866–9. doi: 10.1073/pnas.0607364103 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Apoux F, Bacon SP. Relative importance of temporal information in various frequency regions for consonant identification in quiet and in noise. J Acoust Soc Am. 2004;116:1671–80. doi: 10.1121/1.1781329 [DOI] [PubMed] [Google Scholar]
  • 91.Moore BCJ. The role of temporal fine structure processing in pitch perception, masking, and speech perception for normal-hearing and hearing-impaired people. J Assoc Res Otolaryngol. 2008;9:399–406. doi: 10.1007/s10162-008-0143-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Steinmetzger K, Rosen S. The role of periodicity in perceiving speech in quiet and in background noise. J Acoust Soc Am. 2015;138:3586–99. doi: 10.1121/1.4936945 [DOI] [PubMed] [Google Scholar]
  • 93.Kutas M, Federmeier KD. Thirty years and counting: finding meaning in the N400 component of the event-related brain potential (ERP). Annu Rev Psychol. 2011;62:621–47. doi: 10.1146/annurev.psych.093008.131123 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Trimmel K, Sachsenweger J, Lindinger G, Auff E, Zimprich F, Pataraia E. Lateralization of language function in epilepsy patients: A high-density scalp-derived event-related potentials (ERP) study. Clin Neurophysiol. 2017;128:472–9. doi: 10.1016/j.clinph.2016.12.025 [DOI] [PubMed] [Google Scholar]
  • 95.Kutas M, Federmeier KD. Electrophysiology reveals semantic memory use in language comprehension. Trends Cogn Sci. 2000;4:463–70. doi: 10.1016/s1364-6613(00)01560-6 [DOI] [PubMed] [Google Scholar]
  • 96.Davenport T, Coulson S. Hemispheric asymmetry in interpreting novel literal language: an event-related potential study. Neuropsychologia. 2013;51:907–21. doi: 10.1016/j.neuropsychologia.2013.01.018 [DOI] [PubMed] [Google Scholar]
  • 97.Broadway JM, Franklin MS, Schooler JW. Early event-related brain potentials and hemispheric asymmetries reveal mind-wandering while reading and predict comprehension. Biol Psychol. 2015;107:31–43. doi: 10.1016/j.biopsycho.2015.02.009 [DOI] [PubMed] [Google Scholar]
  • 98.Näätänen R, Picton T. The N1 wave of the human electric and magnetic response to sound: a review and an analysis of the component structure. Psychophysiology. 1987;24:375–425. doi: 10.1111/j.1469-8986.1987.tb00311.x [DOI] [PubMed] [Google Scholar]
  • 99.Eggermont JJ, Ponton CW. The neurophysiology of auditory perception: from single units to evoked potentials. Audiol Neurootol. 2002;7:71–99. doi: 10.1159/000057656 [DOI] [PubMed] [Google Scholar]
  • 100.Whiting KA, Martin BA, Stapells DR. The effects of broadband noise masking on cortical event-related potentials to speech sounds /ba/ and /da/. Ear Hear. 1998;19:218–31. doi: 10.1097/00003446-199806000-00005 [DOI] [PubMed] [Google Scholar]
  • 101.Billings CJ, McMillan GP, Penman TM, Gille SM. Predicting perception in noise using cortical auditory evoked potentials. J Assoc Res Otolaryngol. 2013;14:891–903. doi: 10.1007/s10162-013-0415-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Bidelman GM, Bush LC, Boudreaux AM. Effects of Noise on the Behavioral and Neural Categorization of Speech. Front Neurosci. 2020;14:153. doi: 10.3389/fnins.2020.00153 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Getzmann S, Falkenstein M, Wascher E. ERP correlates of auditory goal-directed behavior of younger and older adults in a dynamic speech perception task. Behav Brain Res. 2015;278:435–45. doi: 10.1016/j.bbr.2014.10.026 [DOI] [PubMed] [Google Scholar]
  • 104.Potts GF. An ERP index of task relevance evaluation of visual stimuli. Brain Cogn. 2004;56:5–13. doi: 10.1016/j.bandc.2004.03.006 [DOI] [PubMed] [Google Scholar]
  • 105.Juottonen K, Revonsuo A, Lang H. Dissimilar age influences on two ERP waveforms (LPC and N400) reflecting semantic context effect. Brain Res Cogn Brain Res. 1996;4:99–107. [PubMed] [Google Scholar]
  • 106.Stuss DT, Picton TW, Cerri AM, Leech EE, Stethem LL. Perceptual closure and object identification: electrophysiological responses to incomplete pictures. Brain Cogn. 1992;19:253–66. doi: 10.1016/0278-2626(92)90047-p [DOI] [PubMed] [Google Scholar]
  • 107.Francis AL, MacPherson MK, Chandrasekaran B, Alvar AM. Autonomic Nervous System Responses During Perception of Masked Speech may Reflect Constructs other than Subjective Listening Effort. Front Psychol. 2016;7:263. doi: 10.3389/fpsyg.2016.00263 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108.Pichora-Fuller MK, Kramer SE, Eckert MA, Edwards B, Hornsby BWY, Humes LE, et al. Hearing Impairment and Cognitive Energy: The Framework for Understanding Effortful Listening (FUEL). Ear Hear. 2016;37(Suppl 1):5S–27S. [DOI] [PubMed] [Google Scholar]
  • 109.Puel JL, Bonfils P, Pujol R. Selective attention modifies the active micromechanical properties of the cochlea. Brain Res. 1988;447:380–3. doi: 10.1016/0006-8993(88)91144-4 [DOI] [PubMed] [Google Scholar]
  • 110.Kayser C, Petkov CI, Lippert M, Logothetis NK. Mechanisms for allocating auditory attention: an auditory saliency map. Curr Biol. 2005;15:1943–7. doi: 10.1016/j.cub.2005.09.040 [DOI] [PubMed] [Google Scholar]
  • 111.Terhardt E, Stoll G, Seewann M. Algorithm for extraction of pitch and pitch salience from complex tonal signals. J Acoust Soc Am. 1982;71:679–88. [DOI] [PubMed] [Google Scholar]
  • 112.Bregman AS. Auditory scene analysis: The perceptual organization of sound. Cambridge, MA: MIT Press; 1990. Available from: https://psycnet.apa.org/fulltext/1990-98046-000.pdf [Google Scholar]
  • 113.Darwin CJ, Carlyon RP. Auditory grouping. Hearing. 1995;468:387–424. doi: 10.3758/bf03206505 [DOI] [PubMed] [Google Scholar]
  • 114.Burns EM, Viemeister NF. Nonspectral pitch. J Acoust Soc Am. 1976;60:863–9. [Google Scholar]
  • 115.Burns EM, Viemeister NF. Played-again SAM: Further observations on the pitch of amplitude-modulated noise. J Acoust Soc Am. 1981;70:1655–60. [Google Scholar]
  • 116.Shackleton TM, Carlyon RP. The role of resolved and unresolved harmonics in pitch perception and frequency modulation discrimination. J Acoust Soc Am. 1994;95:3529–40. doi: 10.1121/1.409970 [DOI] [PubMed] [Google Scholar]
  • 117.Bernstein JG, Oxenham AJ. Pitch discrimination of diotic and dichotic tone complexes: harmonic resolvability or harmonic number? J Acoust Soc Am. 2003;113:3323–34. doi: 10.1121/1.1572146 [DOI] [PubMed] [Google Scholar]
  • 118.Kawase T, Delgutte B, Liberman MC. Antimasking effects of the olivocochlear reflex. II. Enhancement of auditory-nerve response to masked tones. J Neurophysiol. 1993;70:2533–49. doi: 10.1152/jn.1993.70.6.2533 [DOI] [PubMed] [Google Scholar]
  • 119.Marian V, Lam TQ, Hayakawa S, Dhar S. Spontaneous Otoacoustic Emissions Reveal an Efficient Auditory Efferent Network. J Speech Lang Hear Res. 2018;61:2827–32. doi: 10.1044/2018_JSLHR-H-18-0025 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 120.Saiz-Alía M, Miller P, Reichenbach T. Otoacoustic emissions evoked by the time-varying harmonic structure of speech. eNeuro. 2021. doi: 10.1523/ENEURO.0428-20.2021 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 121.Lavie N. Distracted and confused?: selective attention under load. Trends Cogn Sci. 2005;9:75–82. doi: 10.1016/j.tics.2004.12.004 [DOI] [PubMed] [Google Scholar]
  • 122.Boothalingam S, Purcell D, Scollie S. Influence of 100Hz amplitude modulation on the human medial olivocochlear reflex. Neurosci Lett. 2014;580:56–61. doi: 10.1016/j.neulet.2014.07.048 [DOI] [PubMed] [Google Scholar]
  • 123.Kalaiah MK, Nanchirakal JF, Kharmawphlang L, Noronah SC. Contralateral suppression of transient evoked otoacoustic emissions for various noise signals. Hearing, Balance and Communication. 2017;15:84–90. [Google Scholar]
  • 124.Lopez-Poveda EA, Eustaquio-Martín A, Stohl JS, Wolford RD, Schatzer R, Wilson BS. A Binaural Cochlear Implant Sound Coding Strategy Inspired by the Contralateral Medial Olivocochlear Reflex. Ear Hear. 2016;37:e138–48. doi: 10.1097/AUD.0000000000000273 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 125.Lopez-Poveda EA, Eustaquio-Martín A, Fumero MJ, Gorospe JM, Polo López R, Gutiérrez Revilla MA, et al. Speech-in-Noise Recognition With More Realistic Implementations of a Binaural Cochlear-Implant Sound Coding Strategy Inspired by the Medial Olivocochlear Reflex. Ear Hear. 2020;41:1492–510. doi: 10.1097/AUD.0000000000000880 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 126.Heggdal POL, Aarstad HJ, Brännström J, Vassbotn FS, Specht K. An fMRI-study on single-sided deafness: Spectral-temporal properties and side of stimulation modulates hemispheric dominance. Neuroimage Clin. 2019;24:101969. doi: 10.1016/j.nicl.2019.101969 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 127.Khalfa S, Bougeard R, Morand N, Veuillet E, Isnard J, Guenot M, et al. Evidence of peripheral auditory activity modulation by the auditory cortex in humans. Neuroscience. 2001;104:347–58. doi: 10.1016/s0306-4522(01)00072-0 [DOI] [PubMed] [Google Scholar]
  • 128.Maison SF, Liberman MC. Predicting vulnerability to acoustic injury with a noninvasive assay of olivocochlear reflex strength. J Neurosci. 2000;20:4701–7. doi: 10.1523/JNEUROSCI.20-12-04701.2000 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 129.Taranda J, Maison SF, Ballestero JA, Katz E, Savino J, Vetter DE, et al. A point mutation in the hair cell nicotinic cholinergic receptor prolongs cochlear inhibition and enhances noise protection. PLoS Biol. 2009;7:e18. doi: 10.1371/journal.pbio.1000018 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 130.Boero LE, Castagna VC, Terreros G, Moglie MJ, Silva S, Maass JC, et al. Preventing presbycusis in mice with enhanced medial olivocochlear feedback. Proc Natl Acad Sci U S A. 2020;117:11811–9. doi: 10.1073/pnas.2000760117 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 131.Wiederhold ML, Kiang NY. Effects of electric stimulation of the crossed olivocochlear bundle on single auditory-nerve fibers in the cat. J Acoust Soc Am. 1970;48:950–65. doi: 10.1121/1.1912234 [DOI] [PubMed] [Google Scholar]
  • 132.Marrufo-Pérez MI, Johannesen PT, Lopez-Poveda EA. Correlation and Reliability of Behavioral and Otoacoustic-Emission Estimates of Contralateral Medial Olivocochlear Reflex Strength in Humans. Front Neurosci. 2021;15:640127. doi: 10.3389/fnins.2021.640127 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 133.Miller GA, Licklider JCR. The intelligibility of interrupted speech. J Acoust Soc Am. 1950;22:167–73. [Google Scholar]
  • 134.Cooke M. A glimpsing model of speech perception in noise. J Acoust Soc Am. 2006;119:1562–73. doi: 10.1121/1.2166600 [DOI] [PubMed] [Google Scholar]
  • 135.Li N, Loizou PC. Factors influencing glimpsing of speech in noise. J Acoust Soc Am. 2007;122:1165–72. doi: 10.1121/1.2749454 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 136.Sumner CJ, Lopez-Poveda EA, O’Mard LP, Meddis R. A revised model of the inner-hair cell and auditory-nerve complex. J Acoust Soc Am. 2002;111:2178–88. doi: 10.1121/1.1453451 [DOI] [PubMed] [Google Scholar]
  • 137.Meddis R. Simulation of mechanical to neural transduction in the auditory receptor. J Acoust Soc Am. 1986;79:702–11. doi: 10.1121/1.393460 [DOI] [PubMed] [Google Scholar]
  • 138.Javel E, Viemeister NF. Stochastic properties of cat auditory nerve responses to electric and acoustic stimuli and application to intensity discrimination. J Acoust Soc Am. 2000;107:908–21. doi: 10.1121/1.428269 [DOI] [PubMed] [Google Scholar]
  • 139.Jürgens T, Brand T, Clark NR, Meddis R, Brown GJ. The robustness of speech representations obtained from simulated auditory nerve fibers under different noise conditions. J Acoust Soc Am. 2013;134:EL282–8. doi: 10.1121/1.4817912 [DOI] [PubMed] [Google Scholar]
  • 140.James AL, Mount RJ, Harrison RV. Contralateral suppression of DPOAE measured in real time. Clin Otolaryngol Allied Sci. 2002;27:106–12. doi: 10.1046/j.1365-2273.2002.00541.x [DOI] [PubMed] [Google Scholar]
  • 141.Zhao W, Dhar S. Fast and slow effects of medial olivocochlear efferent activity in humans. PLoS ONE. 2011;6:e18725. doi: 10.1371/journal.pone.0018725 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 142.Cooper NP, Guinan JJ Jr. Separate mechanical processes underlie fast and slow effects of medial olivocochlear efferent activity. J Physiol. 2003;548:307–12. doi: 10.1113/jphysiol.2003.039081 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 143.Aedo C, Tapia E, Pavez E, Elgueda D, Delano PH, Robles L. Stronger efferent suppression of cochlear neural potentials by contralateral acoustic stimulation in awake than in anesthetized chinchilla. Front Syst Neurosci. 2015;9:21. doi: 10.3389/fnsys.2015.00021 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 144.Chambers AR, Hancock KE, Maison SF, Liberman MC, Polley DB. Sound-evoked olivocochlear activation in unanesthetized mice. J Assoc Res Otolaryngol. 2012;13:209–17. doi: 10.1007/s10162-011-0306-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 145.Guitton MJ, Avan P, Puel J-L, Bonfils P. Medial olivocochlear efferent activity in awake guinea pigs. Neuroreport. 2004;15:1379–82. doi: 10.1097/01.wnr.0000131672.15566.64 [DOI] [PubMed] [Google Scholar]
  • 146.Liberman LD, Liberman MC. Cochlear Efferent Innervation Is Sparse in Humans and Decreases with Age. J Neurosci. 2019;39:9560–9. doi: 10.1523/JNEUROSCI.3004-18.2019 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 147.Brown MC. Single-unit labeling of medial olivocochlear neurons: the cochlear frequency map for efferent axons. J Neurophysiol. 2014;111:2177–86. doi: 10.1152/jn.00045.2014 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 148.Wojtczak M, Klang AM, Torunsky NT. Exploring the Role of Medial Olivocochlear Efferents on the Detection of Amplitude Modulation for Tones Presented in Noise. J Assoc Res Otolaryngol. 2019;20:395–413. doi: 10.1007/s10162-019-00722-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 149.Forte AE, Etard O, Reichenbach T. The human auditory brainstem response to running speech reveals a subcortical mechanism for selective attention. elife. 2017;6:e27203. doi: 10.7554/eLife.27203 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 150.Brix R. The influence of attention on the auditory brain stem evoked responses. Preliminary report. Acta Otolaryngol. 1984;98:89–92. doi: 10.3109/00016488409107538 [DOI] [PubMed] [Google Scholar]
  • 151.Lukas JH. Human auditory attention: the olivocochlear bundle may function as a peripheral filter. Psychophysiology. 1980;17:444–52. doi: 10.1111/j.1469-8986.1980.tb00181.x [DOI] [PubMed] [Google Scholar]
  • 152.Fujiwara N, Nagamine T, Imai M, Tanaka T, Shibasaki H. Role of the primary auditory cortex in auditory selective attention studied by whole-head neuromagnetometer. Brain Res Cogn Brain Res. 1998;7:99–109. doi: 10.1016/s0926-6410(98)00014-7 [DOI] [PubMed] [Google Scholar]
  • 153.Hillyard SA, Hink RF, Schwent VL, Picton TW. Electrical signs of selective attention in the human brain. Science. 1973;182:177–80. doi: 10.1126/science.182.4108.177 [DOI] [PubMed] [Google Scholar]
  • 154.Mesgarani N, David SV, Fritz JB, Shamma SA. Mechanisms of noise robust representation of speech in primary auditory cortex. Proc Natl Acad Sci U S A. 2014;111:6792–7. doi: 10.1073/pnas.1318017111 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 155.Rabinowitz NC, Willmore BDB, King AJ, Schnupp JWH. Constructing noise-invariant representations of sound in the auditory pathway. PLoS Biol. 2013;11:e1001710. doi: 10.1371/journal.pbio.1001710 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 156.Kell AJE, McDermott JH. Invariance to background noise as a signature of non-primary auditory cortex. Nat Commun. 2019;10. doi: 10.1038/s41467-018-07709-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 157.Robinson BL, Harper NS, McAlpine D. Meta-adaptation in the auditory midbrain under cortical influence. Nat Commun. 2016;7:13442. doi: 10.1038/ncomms13442 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 158.Shaheen LA, Slee SJ, David SV. Task Engagement Improves Neural Discriminability in the Auditory Midbrain of the Marmoset Monkey. J Neurosci. 2021;41:284–97. doi: 10.1523/JNEUROSCI.1112-20.2020 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 159.Asokan MM, Williamson RS, Hancock KE, Polley DB. Sensory overamplification in layer 5 auditory corticofugal projection neurons following cochlear nerve synaptic damage. Nat Commun. 2018;9:2468. doi: 10.1038/s41467-018-04852-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 160.Ballestero J, Recugnat M, Laudanski J, Smith KE, Jagger DJ, Gnansia D, et al. Reducing Current Spread by Use of a Novel Pulse Shape for Electrical Stimulation of the Auditory Nerve. Trends Hear. 2015;19. doi: 10.1177/2331216515619763 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 161.Shannon RV. Multichannel electrical stimulation of the auditory nerve in man. I. Basic psychophysics. Hear Res. 1983;11:157–89. doi: 10.1016/0378-5955(83)90077-1 [DOI] [PubMed] [Google Scholar]
  • 162.Zeng F-G. Trends in Cochlear Implants. Trends Amplif. 2004;8:1–34. doi: 10.1177/108471380400800102 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 163.Lee DJ, de Venecia RK, Guinan JJ Jr, Brown MC. Central auditory pathways mediating the rat middle ear muscle reflexes. Anat Rec A Discov Mol Cell Evol Biol. 2006;288:358–69. doi: 10.1002/ar.a.20296 [DOI] [PubMed] [Google Scholar]
  • 164.Feeney MP, Keefe DH, Hunter LL, Fitzpatrick DF, Garinis AC, Putterman DB, et al. Normative Wideband Reflectance, Equivalent Admittance at the Tympanic Membrane, and Acoustic Stapedius Reflex Threshold in Adults. Ear Hear. 2017;38:e142. doi: 10.1097/AUD.0000000000000399 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 165.Feeney MP, Grant IL, Marryott LP. Wideband energy reflectance measurements in adults with middle-ear disorders. J Speech Lang Hear Res. 2003;46:901–11. doi: 10.1044/1092-4388(2003/070) [DOI] [PubMed] [Google Scholar]
  • 166.Feeney MP, Keefe DH. Estimating the acoustic reflex threshold from wideband measures of reflectance, admittance, and power. Ear Hear. 2001;22:316–32. doi: 10.1097/00003446-200108000-00006 [DOI] [PubMed] [Google Scholar]
  • 167.Boothalingam S, Purcell DW. Influence of the stimulus presentation rate on medial olivocochlear system assays. J Acoust Soc Am. 2015;137:724–32. doi: 10.1121/1.4906250 [DOI] [PubMed] [Google Scholar]
  • 168.Souza NN, Dhar S, Neely ST, Siegel JH. Comparison of nine methods to estimate ear-canal stimulus levels. J Acoust Soc Am. 2014;136:1768–87. doi: 10.1121/1.4894787 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 169.Scheperle RA, Neely ST, Kopun JG, Gorga MP. Influence of in situ, sound-level calibration on distortion-product otoacoustic emission variability. J Acoust Soc Am. 2008;124:288–300. doi: 10.1121/1.2931953 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 170.Mertes IB, Goodman SS. Within- and Across-Subject Variability of Repeated Measurements of Medial Olivocochlear-Induced Changes in Transient-Evoked Otoacoustic Emissions. Ear Hear. 2016;37:e72–84. doi: 10.1097/AUD.0000000000000244 [DOI] [PubMed] [Google Scholar]
  • 171.Shera CA, Guinan JJ Jr. Evoked otoacoustic emissions arise by two fundamentally different mechanisms: a taxonomy for mammalian OAEs. J Acoust Soc Am. 1999;105:782–98. doi: 10.1121/1.426948 [DOI] [PubMed] [Google Scholar]
  • 172.Yates GK, Withnell RH. The role of intermodulation distortion in transient-evoked otoacoustic emissions. Hear Res. 1999;136:49–64. doi: 10.1016/s0378-5955(99)00108-2 [DOI] [PubMed] [Google Scholar]
  • 173.Withnell RH, Yates GK. Onset of basilar membrane non-linearity reflected in cubic distortion tone input-output functions. Hear Res. 1998;123:87–96. doi: 10.1016/s0378-5955(98)00100-2 [DOI] [PubMed] [Google Scholar]
  • 174.Homan RW, Herman J, Purdy P. Cerebral location of international 10–20 system electrode placement. Electroencephalogr Clin Neurophysiol. 1987;66:376–82. doi: 10.1016/0013-4694(87)90206-9 [DOI] [PubMed] [Google Scholar]
  • 175.Silva I. Estimation of postaverage SNR from evoked responses under nonstationary noise. IEEE Trans Biomed Eng. 2009;56:2123–30. doi: 10.1109/TBME.2009.2021400 [DOI] [PubMed] [Google Scholar]
  • 176.Don M, Elberling C. Evaluating residual background noise in human auditory brain-stem responses. J Acoust Soc Am. 1994;96:2746–57. doi: 10.1121/1.411281 [DOI] [PubMed] [Google Scholar]
  • 177.Lakens D. Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Front Psychol. 2013;4:863. doi: 10.3389/fpsyg.2013.00863 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 178.Shamma S, Lorenzi C. On the balance of envelope and temporal fine structure in the encoding of speech in the early auditory system. J Acoust Soc Am. 2013;133:2818–33. doi: 10.1121/1.4795783 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 179.Aubanel V, Davis C, Kim J. The MAVA corpus. 2017. doi: 10.1121/1.4983826 [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

Gabriel Gasque

18 Mar 2021

Dear Dr Hernandez Perez,

Thank you for submitting your manuscript entitled "Perceptual gating of a brainstem reflex facilitates speech understanding in human listeners" for consideration as a Research Article by PLOS Biology.

Your manuscript has now been evaluated by the PLOS Biology editorial staff, as well as by an academic editor with relevant expertise, and I am writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Please re-submit your manuscript within two working days, i.e. by Mar 22 2021 11:59PM.

Login to Editorial Manager here: https://www.editorialmanager.com/pbiology

During resubmission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF when you re-submit.

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. Once your manuscript has passed all checks it will be sent out for review.

Given the disruptions resulting from the ongoing COVID-19 pandemic, please expect delays in the editorial process. We apologise in advance for any inconvenience caused and will do our best to minimize impact as far as possible.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Kind regards,

Gabriel Gasque, Ph.D.,

Senior Editor

PLOS Biology

Decision Letter 1

Gabriel Gasque

27 Apr 2021

Dear Dr Hernandez Perez,

Thank you very much for submitting your manuscript "Perceptual gating of a brainstem reflex facilitates speech understanding in human listeners" for consideration as a Research Article at PLOS Biology. Your manuscript has been evaluated by the PLOS Biology editors, by an Academic Editor with relevant expertise, and by four independent reviewers. Please accept my apologies for the delay in sending the decision below to you.

In light of the reviews (below), we will not be able to accept the current version of the manuscript, but we would welcome re-submission of a much-revised version that takes into account the reviewers' comments. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent for further evaluation by the reviewers.

We expect to receive your revised manuscript within 3 months.

Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension. At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may end consideration of the manuscript at PLOS Biology.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Having discussed these with the Academic Editor, we think reviewer 1 makes a good point about your title. You should be more specific unless you can make a clear demonstration otherwise. You will also note that reviewer 3 suggests you could split this submission into several manuscripts, to decompress the message. We think you should keep all the information within this paper, as its impact relies on the comprehensive collection of data and analyses.

Please submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point by point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually, point by point.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Related" file type.

*Re-submission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this re-submission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Gabriel Gasque, Ph.D.,

Senior Editor,

ggasque@plos.org,

PLOS Biology

*****************************************************

REVIEWS:

Reviewer's Responses to Questions

Reviewer #1: This study investigates cochlear, brainstem and cortical responses during passive and active listening monaurally. The active listening task consisted of detecting non-words in a single stream of words and non-words, and the passive task involved presenting similar speech streams while participants were watching and attending to a stop-motion movie. Cochlear and brainstem responses during the two tasks were monitored by recording otoacoustic emissions (OAEs) and auditory brainstem responses (ABRs) to clicks presented to the ear contralateral to the speech ear. Cortical responses were obtained by analyzing the P1, N1, N2, N400 and the late positivity component (LPC) after each word. The speech was degraded by noise-vocoding and by adding babble (BN) and speech-shaped noise (SSN).

A main finding is that cochlear responses are inhibited (OAEs suppressed) during active but not passive listening to vocoded speech, while they are inhibited during passive but not active listening to masked speech. This finding is interpreted as indicating that medial olivo-cochlear (MOC) efferents are gated selectively when their activation helps to perform the task, i.e., they are activated when actively listening to vocoded speech because their activation enhances the amplitude fluctuations in vocoded speech, which facilitates understanding vocoded speech; conversely, they are activated in passive listening (during the active visual task) because MOC efferent activation suppresses the speech+noise stimulus that is irrelevant to perform the active visual task. The authors reason that MOC efferents are not active during active listening to masked speech because their activation would not improve the representation of speech in simultaneous noise. To support this interpretation, the authors use a computational model of the auditory brainstem with MOC efferent control. The model predicts better speech-envelope coding with than without MOC efferent activation for vocoded speech but not for masked speech.
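
The envelope-coding rationale summarised above can be made concrete with a toy calculation. The sketch below is not the authors' MAP-BS model: it assumes only a generic saturating cochlear input/output function whose gain can be reduced (standing in for the MOC effect), a low-pass-filtered Hilbert envelope, and a Pearson correlation between clean-speech and processed-signal envelopes, loosely analogous to the Delta_rho_ENV metric discussed later in this review; all function names and parameter values are illustrative.

```python
# Minimal sketch (not the authors' MAP-BS implementation): a toy saturating
# cochlear stage whose gain can be reduced, followed by an envelope-correlation
# score. Parameter values and signal names are illustrative placeholders.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def slow_envelope(x, fs, cutoff_hz=32.0):
    """Hilbert envelope, low-pass filtered to keep the slow speech modulations."""
    b, a = butter(2, cutoff_hz / (fs / 2.0))
    return filtfilt(b, a, np.abs(hilbert(x)))

def cochlear_out(x, gain=30.0):
    """Toy saturating input/output function: a high gain drives the response
    into saturation and flattens envelope peaks, whereas a reduced gain (an
    MOC-like effect) keeps it in the more linear region, preserving envelope
    contrast."""
    return np.tanh(gain * x / np.max(np.abs(x)))

def rho_env(clean, degraded, fs, gain):
    """Pearson correlation between the clean-speech envelope and the envelope
    of the degraded signal after the toy cochlear stage."""
    e_ref = slow_envelope(clean, fs)
    e_out = slow_envelope(cochlear_out(degraded, gain), fs)
    return np.corrcoef(e_ref, e_out)[0, 1]

# Hypothetical usage, with `clean` and `vocoded` waveforms of equal length
# sampled at `fs`:
#   delta_rho = rho_env(clean, vocoded, fs, gain=5.0) - rho_env(clean, vocoded, fs, gain=30.0)
# A positive delta_rho would mirror the reported benefit of MOC-like gain
# reduction for intrinsically degraded (vocoded) speech.
```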

Another main finding is that brainstem (wave V) responses are stronger in active than in passive listening to masked speech, while they are not different in the active and passive listening tasks when listening to vocoded speech. Furthermore, in active listening, cortical responses (P1, P2 and LPC) are stronger when listening to masked speech than when listening to vocoded speech. These findings are interpreted as reflecting an engagement of the central auditory system in de-noising masked speech but not when listening to vocoded speech in quiet.

Combined, the findings are interpreted as reflecting that central/cortical auditory areas are involved in de-noising when listening to speech in noise, while MOC efferents are gated during tasks where participants benefit from the effects of MOC efferent activation.

The data are interesting and the interpretations thought-provoking. There are several issues, however, that prevent me from recommending this study for publication in its present form. I would be happy to reconsider my recommendation once the authors ponder (and hopefully address) the following concerns and comments.

Major issues

1.- The authors should be more careful in using the term MOC reflex. The term reflex is conventionally used to refer to an activation of MOC efferents by sound. Therefore, it is confusing to see the same term used to refer to an activation of MOC efferents by active listening (i.e., by attention). Noting the difference in terminology might help reconcile present and past findings. It is conceivable, for example, that MOC-efferent activation by sound (i.e., the MOC reflex) facilitates the neural representation of speech in noise, but attention-gated activation of MOC-efferents facilitates task performance. Therefore, I suggest carefully revising the terminology throughout the manuscript for accuracy and consistency with previous studies.

2.- The manuscript title does not accurately reflect the content and scope of the study and is misleading. As it stands, it implies that the MOC reflex facilitates the understanding of any speech (in noise), when in fact the data do not support this idea. The data support that activation of MOC efferents facilitates understanding noise-vocoded speech in quiet, but not necessarily speech in general, and certainly not speech embedded in noise. I advise revising the title for accuracy.

3.- The description of the stimuli is not clear. It is unclear if words were presented with or without silent gaps between them. Also unclear is whether the BN and SSN were continuous during the 12-minute experimental blocks, including any silent gaps between words, or if they were gated with each word (as implied in L643). Activation of the MOC efferents (and their effects on CEOAEs) would have been very different for continuous and gated noises. Because of the relatively slow time-course of MOC efferent activation (~280-300 ms), if noises were gated with words, and there were silent gaps between words, then MOC efferents would have been active only during the last portion of each word. If my understanding is correct, this was certainly the case for noise-vocoded speech but it is unclear if it was also the case for noisy speech. The temporal course of the various stimuli should be explained more clearly for this reviewer to be able to interpret the results. I suggest including a supplementary figure illustrating more clearly the time course of words, BN or SSN, clicks, and recording windows of OAEs, ABRs, and cortical responses.
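
The timing concern can be made concrete with a first-order approximation: if the MOC effect builds exponentially with the roughly 280–300 ms onset time constant cited above, its fractional strength t seconds after elicitor onset is about 1 − exp(−t/τ). The snippet below simply evaluates that expression; it illustrates the reviewer's point and is not a description of the study's actual stimulus timing.

```python
# First-order illustration of MOC build-up after elicitor onset, using the
# roughly 280-ms onset time constant cited above. Purely an approximation of
# the reviewer's timing argument, not the study's actual stimulus schedule.
import numpy as np

tau = 0.28  # s, assumed MOC onset time constant
for t in (0.1, 0.3, 0.6, 1.0):  # s after noise/word onset
    frac = 1.0 - np.exp(-t / tau)
    print(f"{t * 1000:4.0f} ms: ~{frac * 100:2.0f}% of the steady-state MOC effect")

# With maskers gated on with each word, much of a short word would be heard
# before the reflex nears steady state; a continuous masker would keep the
# reflex close to steady state throughout the block.
```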

4.- The writing needs to be improved. I appreciate the direct and concise writing style, but the manuscript contains a lot of information, some of which is hard to follow without further details. Specific instances where the text can be improved are given below.

5.- The rationale for using the various stimulus configurations is not explained. For example, what was the purpose of (or even better, the hypothesis for) using vocoded speech? Without further details, it seems as though they are building a story backwards from the findings. Please explain upfront the rationale for your stimuli, in particular, the purpose/hypothesis of using vocoded speech in quiet.

6.- The interpretation of the findings is not rounded and not 100% warranted by the data. Why would MOC efferent activation not occur during passive listening to vocoded speech? In this condition, the task was to visually attend to the silent movie and the vocoded speech was irrelevant. Therefore, participants would have also benefited from the MOC system inhibiting the irrelevant auditory stimulus as they benefited (according to your interpretation) from the MOC inhibiting the masked speech in the passive auditory task.

7.- The model support for your findings is not 100% convincing for three reasons. First, it is based on the assumption that speech understanding is based solely on the neural representation of the speech envelope. This assumption may or may not be true, but people can understand speech reasonably well with only fine structure (Lorenzi et al., 2006, PNAS). Second, if I understood correctly, the correlation between neural and stimulus envelope representations is based on neural responses at one CF only (?), which is unlikely to be the case. Third, when the same model (or a version of it) has been used in combination with an automatic speech recognizer, the model predicts MOC efferent activation to improve the recognition of speech in noise (Brown et al., 2010, JASA; Yasin et al., 2018, JASA; Yasin et al., 2020, IEEE Access). This latter aspect is discussed in the manuscript but is dismissed by the authors. Altogether, I find that the model and the chosen metric used to predict speech recognition (Delta_rho_ENV) are used in a limited (and somewhat biased) way.

Minor/specific comments

L18. The reflex does not really extract anything, though it might help extract the salient features in question. Please rephrase for accuracy.

L23. The author(s) of the model reported exactly the opposite, i.e., they reported that MOC efferent activation improved the recognition of speech in noise by an automatic speech recognizer.

L50-51. The end of this sentence appears to be incomplete.

L55. Here and elsewhere, it would be appropriate to cite the review of Lopez-Poveda (2018) Front Neurosci.

L62. Replace the semicolon with a comma.

L63. Remove the space in "OAE s".

L79. I am not sure what mean by "homologous visual and auditory scenes". In what sense were they homologous?

L82. For clarity, I suggest starting this paragraph with "We found that…".

L85. It would be clearer to say "assessed in terms of the suppression of CEOAEs"?

L94. "levels of listening effort". Are you saying (in this paragraph) that listening to vocoded speech involves spending different levels of effort than listening to speech in noise? I would say that it is not only "different levels of" but also "different kinds of" effort?

L101. "to 'natural' speech (i.e., speech-in-quiet)". This is confusing. Consider rephrasing as "to 'natural' speech in quiet".

L105. "Three levels of task difficulty". This description is confusing because, actually, I see more than three levels. Considering rephrasing for clarity/accuracy.

L108. "This was statistically confirmed". What was confirmed?

L113-115. The last statement appears to be incomplete. Please revise.

L120. What test was applied?

L122-123. I am not sure what the statistical test indicated at the end of the sentence refers to.

L132. It would be useful to describe what d' refers to, so that the reader can understand the legend and figure without having to read the Methods. Also, the "white circles" referred to are not visible (or present?) in the figure. Perhaps, this is because of the poor quality (low resolution) of the figures provided to the reviewers.

L160. "auditory efferent activity". Do you mean efferent to the cochlea or more generally?

L162. I am not sure (at this point in the text) how iso-performance was achieved. Please explain.

L168. "(Figure 2A)" I think you probably mean "(Figure 1C)". Please double check and correct it if appropriate.

L193. Delete "and".

L194. "(Figure 1C)" I think you mean "(Figure 1D)". Please double check and correct it if appropriate.

L198. "differing results" to what results are your referring to?

L208. "gain changes are sufficient"; sufficient for what? Please explain.

L215-217. Alright, but other authors have shown this not to be only effect of MOC efferent activation. See my major comments. Why not use, for example, the model in combination with a speech recognizer?

L220-222. Speech recognition is unlikely to be based on information carried by LSR units? Why not apply the same analyses to all fiber types?

L244. Do you really mean 400 fibers/1ms bin-width or 400 stimulus presentations per fiber? It is not the same thing.

L279-281. How does this compare to the findings of past studies that used the same model? See my general comments.

L282-291 and more generally. Did the masking noise in the model and the experiments start sufficiently well before the words to guarantee full activation of MOC efferents at the time when words started and during the full duration of the words?

L310-316 and more generally. Were the BN and SSN such that the MOC *reflex* (i.e., an activation of MOC efferents by sound) be active during passive listening in noise (Fig. 1C)?

L320-321. Why are so many electrodes needed to record the responses shown in Fig. 3? Why are responses shown only for the active listening task? What were responses like (i.e., did they show any significant features) in the passive auditory task (active visual task)?

L356. Delete "2016)".

L373. "preserves cochlear gain to prevent deterioration of envelope encoding". Why would activation of MOC efferents deteriorate envelope encoding in quiet or in noise?

L381. Close the brackets after "as speech".

L391. "to maintain iso-performance" � "possibly to maintain iso-performance".

L407. "where participants" � "when participants". (?)

L412-413. "required to attention". Please revise this text.

L417-428. This whole paragraph is hard to follow. Ideas are thrown without indicating which aspects of the data support the ideas in question.

L435. The cited study [62] not only suggests that the MOC reflex expands interaural cues associated with sound localization; it also suggests that the MOC enhances amplitude modulations and suppresses the noise (and thus enhances the SNR) in the ear with the better acoustic SNR (see, for example, Fig. 2 in Lopez-Poveda et al., 2020, Ear Hear).

L469. It would be appropriate to also cite Marrufo-Pérez et al. (2021). Front Neurosci.

L532, "that this analysis" what analysis exactly?

L614. Please define Z.

L643. Was the BN (and SSN) gated with each word? If so, this may have been insufficient for a full activation of the MOC reflex by the noise. Or were the noises continuous? See my general comments.

L682. In what units was CEOAE magnitude expressed? In dB SPL or in Pascal?

L682. CEOAE_baseline (first 60 sec). I am confused. You just said that baseline CEOAE magnitude was calculated over the first and last 60 sec.

L689-694. This (long) sentence appears to be incomplete (missing a verb ?). The MEMR could have been activated by the speech for a significant proportion of participants (see Feeney et al., 2017, Ear Hear).

L696. I am not sure what the 10-20 system is. Please explain.

L698. I am not sure what LSB in nV/LSB refers to. Please define LSB.

Reviewer #2: The present manuscript presents a large data set to suggest that top-down modulation of the cochlea efferent system depends on the stimulus material, and in particular on how the envelope of speech stimuli is modulated by cochlear gain. This is interesting, as a number of previous studies have suggested top-down control of the efferent system, but the direction of the modulation has been variable, such that the role of this mechanism is not yet well understood. The present manuscript might therefore help to resolve some of the variance between previous studies. The experiments are overall well done and the manuscript is well written. I have only a few questions, which, however, I think are important to interpret the data. As I could not find a certain answer to these in the manuscript, I will discuss them below:

(1) Based on Figure suppl. 1B, there is a 3-s interstimulus interval (ISI) between speech tokens (elsewhere I read that the speech tokens were concatenated?). In the natural speech and NV conditions, I assume the ISI is silent. However, what is the temporal shape of the masker in the other conditions? In the methods section on page 26, it is mentioned that maskers were 60-s long (l. 642), but then also that the speech token and masker segments were matched in duration (l. 643), but no details about onset/offset ramps are provided. Thus, I wasn't sure if the masker lasts only as long as each speech token, or if it is continuous through a 3-s ISI. In the latter case, this would have an important influence on the CEOAE, because the ongoing masker would lead to stronger suppression than the silence in the other condition, which would in fact match the data. First, the stimulus setup needs to be explained in more detail (methods refer to Fig Supp 6, but this only includes the spectra). If there are really ISIs that are silent versus filled with a masker, this would need to be considered in the explanation of the data and in the model calculations.

(2) The model suggests that the presence of the MOC reflex enhances envelope coding, measured by pENV, in the range of 3.5%. How can this effect be compared with CEOAE suppression and modulation of wave V? At first glance the effect size appears relatively small. Could the authors provide some arguments for how the effect size could be compared between these measures?
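
One rough way to relate these quantities, offered purely as an order-of-magnitude illustration rather than as the comparison the reviewer asks for, is to express the 3.5% change on the decibel scale in which CEOAE suppression is usually reported; the measures index different processing stages, so the numbers are not directly commensurate.

```python
# Order-of-magnitude exercise only: express a 3.5% (amplitude-like) change in
# the envelope-coding metric on the dB scale used for CEOAE suppression.
import numpy as np

penv_gain = 0.035                              # 3.5% improvement
equivalent_db = 20 * np.log10(1 + penv_gain)   # about 0.3 dB
print(f"3.5% amplitude-like change ~ {equivalent_db:.2f} dB")

# Contralateral CEOAE suppression is commonly reported in the 1-2 dB range,
# so the modelled envelope-coding change is numerically small by comparison,
# although the two measures index different stages of processing.
```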

(3) For the early part of the cortical response, the waveform appears to start right at 0 ms latency, in particular for the conditions where a masker was used. Again, it would be helpful to know more details about the timing of the masker to interpret these waveforms. Is there a common onset of masker and speech token, or is the masker ongoing? (The authors write that "the speech and noise tokens overlap", but this would be the case both with continuous as well as gated maskers). In the case of gated maskers, the onset would represent a mixture of masker and speech token.

Reviewer #3: Review of "Perceptual gating of a brainstem reflex facilitates speech understanding in human listeners" by Hernandez-Perez et al.

This manuscript describes a study aimed at determining the extent to which Medial Olivocochlear (MOC) reflex activity depends on active/passive listening during the presentation of speech that was intrinsically degraded (i.e., vocoded speech) or degraded by the addition of background noise. The effect of presenting vocoded or noise-degraded speech on the amplitudes of click-evoked otoacoustic emissions (OAEs) and the auditory brainstem response (ABR) was measured by presenting a continuous click train to the non-test ear. Late-latency evoked potentials were also measured in response to the vocoded or noise-degraded speech presented in the test ear. All measurements (OAEs, ABR, late-latency potentials) were obtained simultaneously while the subject actively detected the presence of non-sense words or passively watched a stop-motion movie. The primary findings are 1) suppression of OAEs in response to vocoded speech was significantly larger during active versus passive listening, 2) suppression of OAEs in response to speech degraded by steady-state noise was significantly larger during passive versus active listening, 3) ABR-wave V amplitudes obtained during the presentation of speech-shaped noise were significantly larger during active versus passive listening, and 4) cortical potential amplitudes measured during active listening were larger for noise-degraded speech compared to vocoded speech. The authors interpret a subset of these findings in the context of a computational auditory model where emphasis was placed on analyzing changes in envelope coding for simulations that did versus did not simulate the MOC reflex.

This study is well designed and appears to produce results that are consistent and interpretable in the context of previous literature. I appreciate the approach of including simultaneous measurement of perception, OAEs, and event-related potentials, and I feel this approach is a great strength of the study's design. Although I am unable to identify any fatal flaws with the study, I have some general comments that I wish to express to the authors - some of which I give as a matter of opinion and not as changes I require before recommending the manuscript for publication.

GENERAL COMMENTS

The rationale for the methods surrounding the model simulations is underdeveloped. The manuscript refers to the "Matlab Auditory Periphery and Brainstem" (MAP-BS) computational model; however, a citation is not provided for this model. I was unable to determine 1) whether the authors were proposing a new (unpublished) model, 2) why currently published computational models with efferent feedback were not considered (e.g., Messing et al., 2009; Brown, Ferry, and Meddis, 2010; Smalt et al. 2014), 3) why the MAP-BS model is an improvement over previous models, 4) how the predictions from the MAP-BS model may be similar to or different from previously published models.

The level of the vocoded and noise-degraded speech was likely intense enough to elicit the middle ear muscle (MEM) reflex. Although the authors measured clinical thresholds for the MEM reflex, there is evidence that clinically-based thresholds are, on average, 18.5 dB higher than more sensitive measures of the MEM reflex based on power reflectance (Feeney and Keefe, 2001). The discussion does not address the likelihood that the stimuli used in the experiment stimulated the MEM reflex; nor does the discussion address how this stimulation modifies the interpretation of the results.

The manuscript introduces an appreciable amount of new information and findings in a relatively restricted length. I feel this may be overwhelming for the reader. I consider myself an expert in psychophysics, auditory modeling, and MOC efferents, and despite this expertise I found the manuscript overwhelming to read and digest. I feel there are two and maybe three papers stuffed into this one manuscript. For example, if the model simulations are the result of a new (unpublished) model, I think it would be appropriate to dedicate a separate paper to more fully describing the model and the interpretation of the results in the context of the model. Similarly, I am unaware of previous studies that evaluated contralateral suppression of OAEs using vocoded stimuli in passive versus active listening tasks. The discussion of such findings in the context of temporal envelope coding is very dense and may be more digestible if the experimental results were presented in a separate manuscript. The discussion of such a manuscript could more carefully relate these results to suppression of OAEs measured with other stimuli with modulated envelopes and in tasks with/without active listening. I give this final comment as an option for the authors to consider rather than a requirement before I recommend the manuscript for publication.

Messing, D. P., Delhorne, L., Bruckert, E., Braida, L. D., & Ghitza, O. (2009). A non-linear efferent-inspired model of the auditory system; matching human confusions in stationary noise. Speech communication, 51(8), 668-683.

Brown, G. J., Ferry, R. T., & Meddis, R. (2010). A computer model of auditory efferent suppression: implications for the recognition of speech in noise. The Journal of the Acoustical Society of America, 127(2), 943-954.

Smalt, C. J., Heinz, M. G., & Strickland, E. A. (2014). Modeling the time-varying and level-dependent effects of the medial olivocochlear reflex in auditory nerve responses. Journal of the Association for Research in Otolaryngology, 15(2), 159-173.

Feeney, M. P., & Keefe, D. H. (2001). Estimating the acoustic reflex threshold from wideband measures of reflectance, admittance, and power. Ear and hearing, 22(4), 316-332.

OTHER COMMENTS

Line 28 (abstract): it would help to quickly state/summarize these strategies.

Line 31: I think the goal is not "cocktail-party" listening per se, but effective/robust "cocktail-party" listening.

Lines 50-51: something is wrong with this sentence

Line 57: consider softening the statement by using "unclear" instead of "controversial"

Line 62: Be more explicit. OAE amplitudes are not always reduced by contralateral acoustic stimulation. In some cases, as with DPOAEs, the amplitude may increase.

Line 63: There is an extra space between the "E" and "s" of OAEs.

Line 63-64: This sentence should state that the magnitude of OAE suppression may relate to speech perception performance. (not just OAE magnitude).

Line 64-65: Efferent modulation of cochlear gain could depend…? (I'm not sure what is meant by "confounding effects.")

Line 108: State what "this" is for clarity. (e.g., This "modulation of task difficulty" was statistically…)

Lines 155-156: Why did the direction of the t-statistic change? Shouldn't all t-statistics be in the same direction if CEOAE suppression is present?

Lines 179-180: "Effects" is vague. Do you mean intermediate magnitude of suppression of OAEs?

Lines 207-210: Or alternatively, the changes in ABR are inherited from the cochlear reduction in gain in the noise-degraded speech conditions (brainstem passively transmits signal from lower auditory centers), and for vocoded stimuli there is a compensatory central amplification to offset the reduction in cochlear gain.

Lines 485-488: This conclusion may depend on SNR. For favorable SNRs the expansion would enhance the effective speech envelope while releasing AN fibers from adaptation/saturation produced by the noise.

Lines 496-501: This sentence is long and hard to read. Consider breaking it apart.

Line 597: State that all subjects did not complete all experimental measurements.

Line 599: Close parentheses after "tympanometry."

Line 600: Close parentheses that started before "assessed" on line 599.

Line 600-601: Something is wrong with this sentence.

Line 620: Why was the speech material only from one talker? Please address whether the results of the experiment would be different if a different talker were used. Can we be confident that the results would be similar with a different talker?

Line 729: Why were the model simulations limited to these frequencies? I expected the simulations would include a broad range of CFs given that the stimulus of interest (speech) is broadband.

Line 750: The model is not fully described in the text. For example, does the model simulate basilar membrane nonlinearity (i.e., suppression, compression, level-dependent tuning)? I assume it does, but I could not find the description in the text.

Reviewer #4: Hernandez-Perez et al review

This study uses a clever experimental design with simultaneous OAE, ABR and speech-EEG recordings to investigate how cochlear gain changes when (modified) speech is presented in attended vs unattended listening. The topic is very relevant and timely, because little is known about how bottom-up and top-down mechanisms alter the MOC reflex when listening to (degraded) speech. This study is important in that it evaluates cochlear, brainstem and late-potential changes simultaneously, which is a rare but necessary approach to understanding the cascade of peripheral and neural adjustments the MOC reflex can effect. In the study, CEOAE gain changes are interpreted as MOC reflex changes that appear to be modulated by the speech modifications and listening state. Model simulations with the MOC reflex on or off capture how the stimulus modifications affect envelope similarity relative to the natural-MOC condition. The paper ends with a theoretical account of how the trends in the data can be interpreted in a model of (attention-driven) top-down and bottom-up MOC modifications that depend on the acoustics.

Major comments:

The analysis of the results is of very high quality. However, I think that the connection with the ABR results can be made stronger and that the model/interpretation proposed is not necessarily the "sole" or "conclusive" answer as to how these results could have arisen. At the same time, there is a methodological reference issue (the baseline for the CEOAEs was different from that used in the modelling) that may have had an impact on the interpretation of the results. This would have to be sorted out before we can fully grasp the meaning of the results and evaluate whether the proposed models are appropriate. It is rather a collection of aspects that make me doubt whether the data observations warrant the final conclusions, but hopefully these can be addressed/mitigated/clarified in a revision.

The comments below are not ordered by importance:

Minor spelling:

L33: its => their

L46: check em-dash and bracket positions

L51: check bracket text and references

L58: "the/a" listening task

L63: OAEs

L120: "three-way" ANOVA?

L168: should be figure 1, no?

L180: CEOAEs

L193: "and was", change into "- was"

L194: Figure 1D?

L199: Vs. => vs.

L202-205: Question related to testing intrinsic differences in the two populations, and the inference drawn: "the differences observed in auditory brainstem/midbrain activity and cochlear gain can be attributed to the presented speech degradation when there is no relationship between CEOAE suppression and ABR amplitudes in the natural speech condition". L202-205 show results of a t-test where wave-III/V amplitudes [in microV] are compared to CEOAE suppression [in dB] to support this statement statistically. Does it make sense to compare dB values to microV amplitudes? They have different units, so why not characterise ABR reduction rather than amplitude to obtain a meaningful comparison? At the moment, the null hypothesis of your t-test is: "there is no difference in the distributions: dB suppression vs ABR amplitude in microV". To me, such a comparison makes no sense at all; there might never be a theoretical relationship between these two units. Have I misunderstood something about the purpose of the test you conducted, or can you use the same units?

Figure 1C-D: It is confusing to see CEOAE suppression in negative dBs. Are these just small values, or are you observing a "release from suppression", i.e., more gain in the system when more background noise is presented?

L206-210: It is hard to follow your conclusion here; you would need to motivate this better. When I look at the passive conditions only for CEOAE suppression (Figure 1C) and compare the two natural conditions, there is quite some variability in the degree of CEOAE suppression. What is the reason for this, and is it meaningful? My worry is that the BN10/BN5 conditions show more suppression than the corresponding natural condition, but not more than the natural condition that came with the vocoded manipulations. Hence it may be difficult to conclude the "reduction of the cochlear gain for masked stimuli" part of your sentence. You then go on to write that the "ABR magnitudes appear only inherited from [this cochlear gain reduction]". Again, looking only at the passive conditions (Figure 1D), I can see that wave V is reduced for the BN and SSN conditions compared to the vocoded conditions (significance unclear), but you do not compare against the "Natural" condition. This collapsing in Figure 1D is confusing to me; at the least, you would have to isolate the "Natural" conditions. L208-210 are even more difficult to follow; it is not clear how exactly the results lead you to this conclusion.

L228: Here, logically you use the "natural speech" as the template for comparison with the iso-response conditions, why not do the same for the CEOAE and ABR conditions? There you are referring to the control as the condition without speech. It would be more consistent to use the same reference throughout the paper. In consequence, CEOAE suppression might become positive in some cases, but it would make it easier to judge the effect of speech manipulations, attention and masking if you used the natural speech as reference in all cases.

L236: It is unclear to me what you mean by "to their natural speech controls" in this section, because earlier on you write "Natural speech control simulations were always performed with the MOC reflex". From a visual inspection of Fig 2A, where this distinction does not matter, VOC8 resembles the corresponding Nat condition well (compare MOC with MOC and AN-only with AN-only). The BN/SSN conditions match the corresponding natural conditions less well (compare MOC with MOC, and AN with AN). Turning the MOC on or off does not change these trends much, as long as you use the corresponding Nat condition. So yes, VOC shows greater similarity to Nat than BN/SSN, but this appears MOC independent. Adding the MOC does have an effect on all conditions, but from a visual inspection the trends in similarity relative to the Nat condition appear the same. Perhaps you did refer to the corresponding controls for these comparisons, but this is not clear to me from the description, and the reference used may have a big influence on the interpretation of the results.

Figure 1D and text: Also here I am confused as to whether ρENVAN on the x-axis uses (a) the corresponding natural condition or (b) always the MOC natural condition as reference. In case of (a), you can safely conclude that adding the MOC has an effect on envelope similarity. In case of (b), you cannot, because clearly the MOC simulations will be more similar to the MOC natural condition than the no-MOC simulations, especially for the VOC conditions. So if you use (b), the sentence "with largest enhancements observed for words with the lowest ρENV values in the absence of MOC reflex" appears trivial and a consequence of always using MOC natural as the reference for both MOC and no-MOC manipulated speech conditions.

Model Simulations: To better integrate the ABR results into the final proposed mechanistic MOC reflex model, it would be worthwhile also to simulate a proxy for ABRs with the Meddis model. You could sum the simulated AN responses across fibres and CFs to evaluate how the click ABR would be affected with and without the MOC, and see how these alterations correspond to the amplitude changes you observed experimentally. Of course, you would only have a reliable wave-I simulation from that model, but you could add a basic, non-gain CN/IC model to simulate ABR waves III and V as well (e.g., Nelson and Carney, 2005; Verhulst et al., 2018). It is not problematic to me if you decide not to pursue ABR simulations all the way to wave V, but even simulating wave I would already be informative for better understanding the bottom-up changes in MOC strength and their effects on the expected click-ABR amplitude changes (which you also measured).
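[For illustration only: a minimal Python sketch of the population-summation analysis suggested above. The array layout, sampling rate, and unitary-response kernel are assumptions made for the sketch; they are not part of the authors' MAP-BS pipeline or the Meddis model itself.]

```python
import numpy as np

def abr_wave_i_proxy(an_rates, fs, unitary_response=None):
    """Crude ABR wave-I proxy: sum simulated AN firing rates across CFs and fibres,
    then optionally convolve with a unitary response to mimic far-field summation.

    an_rates : array of shape (n_cf, n_fibres, n_samples), instantaneous rates (spikes/s)
    fs       : sampling rate of the model output in Hz
    """
    population = an_rates.sum(axis=(0, 1))        # summed population response over time
    if unitary_response is None:
        # placeholder unitary response: a brief damped sinusoid (illustrative only)
        t = np.arange(0, 0.002, 1.0 / fs)
        unitary_response = np.sin(2 * np.pi * 1000 * t) * np.exp(-t / 0.0005)
    return np.convolve(population, unitary_response, mode="full")[: population.size]

# Hypothetical comparison of peak proxy amplitude with vs without efferent feedback:
# wave_i_moc = abr_wave_i_proxy(an_rates_with_moc, fs=100e3)
# wave_i_no_moc = abr_wave_i_proxy(an_rates_without_moc, fs=100e3)
# change_pct = 100 * (wave_i_moc.max() - wave_i_no_moc.max()) / wave_i_no_moc.max()
```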

Model Simulations/Data interpretations: On the one hand, you state in L671 that you analysed the rms suppression of the CEOAE only in the 1-2 kHz region because that is where the MOC reflex is strongest. On the other hand, you consider envelope tracking as a measure of speech coding and study effects of on-CF MOC simulations. Does this not bias your interpretation? That is, you simulate MOC effects for speech/noise frequencies outside the 1-2 kHz region as well, where no MOC innervation is expected based on the CEOAEs. I am assuming that the speech envelope tracking parameters mostly get their contributions from high-frequency coding (i.e., from ANFs that do envelope tracking, with CFs above the phase-locking limit); how do you reconcile these two aspects and minimize bias in the interpretation of your results?

Figure 2E "CEOAE inhibition" => use a similar nomenclature in Fig 1 and 2 when referring to the CEOAEs.

Methods:

When recording simultaneous EEG and OAEs, it is possible to generate stimulus artifacts in the EEG traces caused by the ER-10B OAE mic. Did you observe such artifacts, and how did you ensure they had no confounding effect on your ABR amplitudes?

Did you use only one observer to peak-pick your ABR traces (unusual in audiological studies), and did these amplitudes correspond to wave-III and wave-V peak-to-noise-floor amplitudes or positive-peak-to-negative-peak amplitudes?

How did you minimize the effects of individual CEOAE/MOC strength on your recordings? In our own recordings, we tend to observe that people with strong CEOAEs show the most suppression. This is not unusual, because they have the most gain in their CEOAEs to begin with. Could individual differences in CEOAE amplitude and MOC strength have had an impact on your results?

Decision Letter 2

Gabriel Gasque

28 Sep 2021

Dear Dr Hernandez Perez,

Thank you for submitting your revised Research Article entitled "Understanding degraded speech leads to perceptual gating of a brainstem reflex in human listeners" for publication in PLOS Biology. I have now obtained advice from original reviewers 1 and 4 and have discussed their comments with the Academic Editor. 

Based on the reviews, we will probably accept this manuscript for publication, provided you satisfactorily address the remaining points raised by the reviewers. Please also make sure to address the following data and other policy-related requests.

(1) Please provide a blurb, which will be included in our weekly and monthly Electronic Table of Contents, sent out to readers of PLOS Biology, and may be used to promote your article in social media. The blurb should be about 30-40 words long and is subject to editorial changes. It should, without exaggeration, entice people to read your manuscript. It should not be redundant with the title and should not contain acronyms or abbreviations. For examples, view our author guidelines: https://journals.plos.org/plosbiology/s/revising-your-manuscript

(2) Please indicate within your manuscript if your experiments were conducted according to the principles expressed in the Declaration of Helsinki or any other (specific) national or international ethical guidelines.

(3.a) We thank you for providing a Reviewer Dryad URL with your data. If you are going to use Dryad as your data repository, please make sure the link goes live before acceptance.

(3.b) Please include in your data folder/Dryad URL a README file that explains how the source data were analyzed to generate the graphs displayed in the figures.

(3.c) Please double check your Excel file for S2 Fig and S5 Fig because the “tabs” seem to be mislabeled.

(3.d) Please include in each figure legend a sentence indicating where the underlying data can be found. For example, you can write, "The underlying data can be found in [Dryad URL]".

As you address these items, please take this last chance to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.

We expect to receive your revised manuscript within two weeks.

To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include the following:

-  a cover letter that should detail your responses to any editorial requests, if applicable, and whether changes have been made to the reference list

-  a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable)

-  a track-changes file indicating any changes that you have made to the manuscript. 

NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:

https://journals.plos.org/plosbiology/s/supporting-information  

*Published Peer Review History*

Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*Early Version*

Please note that an uncorrected proof of your manuscript will be published online ahead of the final version, unless you opted out when submitting your manuscript. If, for any reason, you do not want an earlier version of your manuscript published online, uncheck the box. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please do not hesitate to contact me should you have any questions.

Sincerely,

Gabriel Gasque, Ph.D.,

Senior Editor,

ggasque@plos.org,

PLOS Biology

------------------------------------------------------------------------

Reviewer remarks:

Reviewer #1: The authors have considered each and every one of my comments and have addressed all of them adequately. The revised manuscript is much clearer and stronger. The study is very interesting and timely. It will be of interest to hearing scientists and audiologists. It should be published pending attention to the following comments.

General comment

One general comment is that the notion that the MOC reflex can help with understanding vocoded speech because it enhances the amplitude modulations in speech was suggested by Lopez-Poveda et al. (2016). Indeed, it is explicitly shown in the insets of Fig. 2 of a more recent study by the same group (Lopez-Poveda et al., 2020, Ear and Hearing). Of course, their conclusions are based on the behavior of a MOC-inspired sound-coding strategy for cochlear implants. It is nice that the present study provides experimental support for that notion also for normal-hearing listeners. What I find puzzling is that Lopez-Poveda et al. (2020) found that the effect and benefit apply to (vocoded) speech in quiet as well as in noise, while the present study finds the effect and benefits to apply only in quiet. I would ask the authors to consider highlighting and discussing this issue briefly in their manuscript.

Minor comments

L107. Remove the comma after "different kinds,".

L152-153. Do whiskers really represent interquartile range (IQR=q3-q1)? Please double check because this does not seem to be the case for some boxplots, e.g., Natural and Voc16 in the left panel.

L751. Delete the blank space in "to increase ."

L775. Delete the comma in "each time, they heard".

L793. Delete the blank space in "stop/ non-stop".

L812-813. Replace the full stop with a comma in "applied to stimuli. Figure 1C".

Figure 2D. Please double check the titles for the panels; they seem incorrect. Specifically it makes no sense to write "Nat_MOC vs Nat_MOC" or "VOC_AN vs VOC_AN".

Figure 3A. Correct the ordinate label in the right-top panel; it reads "nplitude" where it should read "Amplitude".

Reviewer #4: Revision:

Understanding degraded speech leads to perceptual gating of a brainstem reflex in human listeners.

Hernandez-Perez, Mikiel-Hunter et al.

I would like to congratulate the authors on their revised manuscript. It follows a unique whole-systems approach to unravel the functional role of the MOC reflex in (degraded) speech perception, a topic that is both timely and important. I appreciate the careful consideration of the reviewer comments/suggestions and the subsequent additional analyses, which substantially improved the clarity of the manuscript and led to a stronger scientific argument. The editor asked me to also evaluate the rebuttal comments to R2, and I can confirm that you addressed both of the points we raised satisfactorily. I have one comment left that you may wish to consider in your final submitted version (no need for another rebuttal on my behalf).

L233: "We conclude from this that the differences observed in cochlear gain and auditory brainstem/midbrain activity can be attributed to the specific form of speech degradation. Together with the effect on CEOAEs, our data suggest that the magnitude of auditory midbrain activity for the different speech manipulations reflects cochlear output. While this is evident for both listening conditions in masked speech, the similarity of ABR amplitudes in the midbrain for active and passive listening of noise-vocoded stimuli is indicative of feed-forward amplification that compensates for reduced cochlear gain during active listening"

I understand how you arrive at your conclusion, but I find the direct comparison between CEOAE and ABR suppression as a motivation for the "feed-forward amplification" explanation a bit tricky. It is a possibility, but you also need to consider that ABRs and CEOAEs reflect activity from somewhat different cochlear regions: dominant CEOAE energy lies in the 1-2 kHz region, whereas standard click ABRs mostly reflect summed activity from higher-frequency regions (3-8 kHz; Abdala & Folsom 1995; Don & Eggermont 1978). Can you be sure that this latter aspect had no influence on your interpretation of the results?

L372-374: consider rephrasing for clarity

References

Don, M., & Eggermont, J. J. (1978). Analysis of the click-evoked brainstem potentials in man using high-pass noise masking. The Journal of the Acoustical Society of America, 63, 1084. doi: 10.1121/1.381816

Abdala, C., & Folsom, R. C. (1995). Frequency contribution to the click-evoked auditory brain-stem response in human adults and infants. The Journal of the Acoustical Society of America, 97, 2394. doi: 10.1121/1.411961

Decision Letter 3

Gabriel Gasque

7 Oct 2021

Dear Heivet,

On behalf of my colleagues and the Academic Editor, Manuel S. Malmierca, I am pleased to say that we can in principle offer to publish your Research Article "Understanding degraded speech leads to perceptual gating of a brainstem reflex in human listeners" in PLOS Biology, provided you address any remaining formatting and reporting issues. These will be detailed in an email that will follow this letter and that you will usually receive within 2-3 business days, during which time no action is required from you. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have made the required changes.

Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.

PRESS

We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have not yet opted out of the early version process, we ask that you notify us immediately of any press plans so that we may do so on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for choosing PLOS Biology for publication and supporting Open Access publishing. We look forward to publishing your study. 

Sincerely, 

Gabriel Gasque, Ph.D. 

Senior Editor 

PLOS Biology

ggasque@plos.org

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Planned t test comparisons between baseline CEOAE measures and CEOAE magnitudes obtained during the presentation of noise-vocoded speech and masked speech.

    CEOAE, click-evoked otoacoustic emission.

    (XLSX)

    S2 Table. Subjects removed from CEOAE suppression and ABR analyses.

    ABR, auditory brainstem response; CEOAE, click-evoked otoacoustic emission.

    (XLSX)

    S1 Fig. Efferent feedback improves envelope encoding for naturally spoken sentences.

    (A) Shuffled-Correlogram Sumcors (upper panel) were calculated for the naturally spoken sentence, “the steady drip is worse than the drenching rain” (s86; The MAVA corpus, [179]), using LSR AN fibre output in the 2.334 kHz channel with (red line) and without (black line) efferent feedback (MOC reflex). A longer, 1-second delay window was used compared to the single word presentation; in addition, inverted triangular compensation was implemented to compensate for large delays relative to signal length [74,77]. The envelope power spectral density (lower panel) was computed both with (red line) and without (black line) efferent feedback by computing Fourier transforms of the above Sumcors with a <1 Hz spectral resolution. Efferent feedback was conducive to larger envelope responses, especially at low modulation frequencies associated with words and syllables. (B) Envelope power spectra computed with and without MOC reflex in the 4 to 8 Hz modulation range for 6 sentences (s7, s26, s37, s42, s86, and s164; The MAVA corpus, [179]). In all instances, adding MOC reflex improved envelope encoding across most modulation frequencies. The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. AN, auditory nerve; LSR, low spontaneous rate; MOC, medial olivocochlear.

    (EPS)
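    [Illustrative sketch only: the envelope-spectrum step described in the S1 Fig legend (Fourier transform of a Sumcor at <1 Hz resolution, then band-limited envelope power) could be approximated as below. Function names, the sampling rate, and the normalisation are assumptions and do not reproduce the authors' exact implementation.]

```python
import numpy as np

def envelope_power_spectrum(sumcor, fs, resolution_hz=1.0):
    """Envelope power spectrum of a Sumcor waveform via zero-padded FFT.

    sumcor        : 1-D shuffled-correlogram Sumcor sampled at fs (Hz)
    resolution_hz : requested spectral bin spacing (<1 Hz in the legend above)
    """
    n_fft = int(max(len(sumcor), np.ceil(fs / resolution_hz)))
    spectrum = np.fft.rfft(sumcor - np.mean(sumcor), n=n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    psd = np.abs(spectrum) ** 2 / (fs * len(sumcor))   # simple periodogram scaling
    return freqs, psd

def band_power(freqs, psd, f_lo=4.0, f_hi=8.0):
    """Integrated envelope power in a modulation band (e.g., 4-8 Hz)."""
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return np.sum(psd[band]) * (freqs[1] - freqs[0])

# Hypothetical comparison of 4-8 Hz envelope power with and without the MOC reflex:
# freqs, psd_moc = envelope_power_spectrum(sumcor_with_moc, fs=10e3)
# _, psd_an = envelope_power_spectrum(sumcor_without_moc, fs=10e3)
# gain_4_8hz = band_power(freqs, psd_moc) / band_power(freqs, psd_an)
```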

    S2 Fig. Using a different control template condition (NatANonly) to calculate ρENVAN does not alter stimulus-specific changes to envelope encoding when (15-dB attenuation) efferent feedback is added to LSR fibres.

    (A) ΔρENVs for 100 words [in their 3 degraded forms and for high- and low-frequency bands (<1.5 kHz (left, A) and >1.5 kHz (right, A)] were calculated using AN responses to “Natural” speech (i.e., in quiet) presented without the MOC reflex as the control template to compute values of ρENVAN, i.e., NatANonly versus DegradedANonly. ΔρENVs for all manipulations in both low and high-frequency bands followed the same stimulus-dependent trends as in Fig 2. (mean ΔρENV for Voc8 for freqs <1.5 kHz = +3.69 ± 0.30%, [Z(99) = 5.68, p < 0.001, r = 0.57]; mean ΔρENV for Voc8 for freqs >1.5 kHz = + 8.08 ± 0.36%, [Z (99) = 8.6, p < 0.0001, r = 0.87]; mean ΔρENV for BN5 for freqs <1.5 kHz = −9.20 ± 0.66, [Z(99) = −8.68, p < 0.0001, r = 0.87]; mean ΔρENV for BN5 for freqs >1.5 kHz = −3.24 ± 0.30%, [Z(99) = −7.84, p < 0.001, r = 0.57]; mean ΔρENV for SSN3 for freqs <1.5 kHz = −9.09 ± 0.70, [Z (99) = −8.68, p < 0.0001, r = 0.87]; mean ΔρENV for SSN3 >1.5 kHz = −5.60 ± 0.34, [Z (99) = −8.68, p < 0.0001, r = 0.87]). ΔρENVs for Voc8 stimuli (pink circles, left, A) were exclusively positive in the high-frequency band with the largest benefits observed for noise-vocoded tokens with the lowest ρENVAN values, as observed in Fig 2E. In addition, the most negative ΔρENVs for BN5 and SSN3 stimuli were observed for the lowest values of ρENVAN. (B) Comparing ΔρENVs calculated using NatANonly and NatAN+MOC as a control template for ρENVAN at high frequencies (>1.5 kHz). The mean improvement in envelope encoding for Voc8 stimuli was larger after calculating ρENVAN with the new NatANonly control template ([Z(99) = −4.6, p < 0.001, r = 0.47]) (left column, B). Similarly for masked stimuli (BN5 (middle, B) and SSN3 (right, B)), the new control template for ρENVAN led to an increase in the impairment to envelope encoding with the MOC reflex (BN5:[Z(99) = −6.50, p < 0.001, r = 0.65]; SSN3:[Z(99) = −7.09, p < 0.01, r = 0.71]) (middle and right columns, B). The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. AN, auditory nerve; BN, babble noise; LSR, low spontaneous rate; MOC, medial olivocochlear; SSN, speech-shaped noise.

    (EPS)

    S3 Fig. Comparing mean changes in ΔρENVs (in >1.5kHz frequency band) for control conditions.

    (A) Using a smaller fixed attenuation for the active MOC reflex (10-dB attenuation) than in the main simulations (15-dB attenuation) reduced both the positive mean ΔρENV for Voc8 and the negative mean ΔρENVs for BN5/SSN3 (Voc8: [Z(99) = −7.94, p < 0.001, r = 0.79]; BN5: [Z(99) = −5.53, p < 0.001, r = 0.55]; SSN3: [Z(99) = −7.85, p < 0.001, r = 0.78]). Nevertheless, the benefits (Voc8) and disbenefits (BN5/SSN3) of adding the MOC reflex to envelope encoding remained for all 3 stimulus manipulations (Voc8: [Z(99) = 6.756, p < 0.0001, r = 0.79]; BN5: [Z(99) = −5.16, p < 0.001, r = 0.52]; SSN3: [Z(99) = −8.44, p < 0.0001, r = 0.84]). (B) Presenting degraded speech tokens with more channels for Voc stimuli, i.e., Voc16, generated significantly larger ΔρENVs with a fixed 15-dB MOC reflex attenuation ([Z(99) = −3.66, p < 0.001, r = 0.4]). Increasing the SNRs (10-dB SNR for BN and 8-dB SNR for SSN) significantly reduced ΔρENVs for speech-in-noise conditions when the same MOC reflex attenuation was implemented (BN: [Z(99) = −8.20, p < 0.001, r = 0.82]; SSN: [Z(99) = −8.67, p < 0.001, r = 0.87]). For BN10, the new mean ΔρENV was in fact positive ([Z(99) = 2.025, p = 0.043, r = 0.2]). The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. BN, babble noise; MOC, medial olivocochlear; SNR, signal-to-noise ratio; SSN, speech-shaped noise.

    (EPS)

    S4 Fig. Percentage change in envelope encoding after introduction of (15-dB attenuation) efferent feedback to low-threshold, HSR AN fibres (in > 1.5kHz frequency band).

    (A and B) ΔρENVs for 100 words (in their 3 degraded forms) were calculated as in Fig 2E; however, HSR AN fibre output for frequencies >1.5 kHz was used here. ΔρENVs for Voc8 words (pink circles, A) varied greatly (Max-Min ΔρENV for Voc8 > 1.5kHz = +32.84 to −0.43%) but the mean ΔρENV was significantly positive (mean ΔρENV for Voc8 >1.5 kHz = +12.06 ± 0.57%, [Z (99) = 8.68, p < 0.0001, r = 0.87]). Note that values of ρENVAN for Voc8 were smaller here compared to values for low SR (LSR) AN fibres (mean ρENVAN-HSR for Voc8 >1.5 kHz = 0.55 ± 0.01 versus mean ρENVAN-LSR for Voc8 >1.5 kHz = 0.64 ± 0.01). By contrast, the distributions of ΔρENVs for BN5 (light blue squares, A) and SSN3 (green diamonds, A) appeared more compact (Max-Min Range ΔρENV for BN5 words = +2.062,27 to −9.79%; Max-Min ΔρENV for SSN3 = +1.07 to −10.57%); however, as for LSR AN fibre results (Fig 2E and 2F), both mean ΔρENVs for HSR AN fibres were significantly negative overall (mean ΔρENV for BN5 = −3.47 ± 0.27, [Z(99) = −8.18, p < 0.001, r = 0.82]; mean ΔρENV for SSN3 = −4.36 ± 0.22, [Z(99) = −8.65, p < 0.0001, r = 0.87]). Progression of mean ΔρENVs (± SEM) for model data > 1.5kHz (checkerboard bars, right, B) mirrored that of active-task, CEOAE data (mean ± SEM) (solid colour bars, left, B). The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. AN, auditory nerve; BN, babble noise; CEOAE, click-evoked otoacoustic emission; HSR, high spontaneous rate; LSR, low spontaneous rate; SSN, speech-shaped noise.

    (EPS)

    S5 Fig. Percentage change in TFS encoding for masked speech conditions (BN5/SSN3) after introduction of efferent feedback (15-dB attenuation) to LSR AN fibres (in >1.5kHz frequency band).

    Changes to TFS encoding were calculated for masked speech (not for noise-vocoded stimuli, given their scrambled fine structure [50,73,76,178]) using Natural conditions with the MOC reflex as control templates to calculate both ρTFSAN and ρTFSMOC. Adding the MOC reflex produced a mean improvement in TFS encoding for both BN5 and SSN3 (mean ΔρTFS for BN5 for freqs >1.5 kHz = 0.31 ± 0.16%, [Z(99) = 2.61, p = 0.009, r = 0.26]; mean ΔρTFS for SSN3 >1.5 kHz = 1.06 ± 0.15, [Z(99) = 5.95, p < 0.001, r = 0.59]). The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. AN, auditory nerve; BN, babble noise; LSR, low spontaneous rate; MOC, medial olivocochlear; SSN, speech-shaped noise; TFS, temporal fine structure.

    (EPS)

    S6 Fig. Cortical evoked potentials during active and passive speech perception.

    ERP components during the active and passive listening from electrodes: FZ, F3, F4, CZ, C3, C4, TP7, TP8, T7, T8, PZ, P3, and P4 are shown in panels A, B, and C. Electrode’s selection was based on their relevance in attentional and language brain activity related networks [9397]. Thick lines and shaded areas represent means and SEM, respectively. Within conditions analysis showed that, for all speech manipulations, the magnitude of P1, P2, and N400 potentials were enhanced during active (colour lines) when compared to the passive (grey lines) listening conditions, while N1 tended to be less negative in the active task. LPC magnitude was only significantly enhanced during the active listening of speech in noise. (A) ERP components in natural and all noise-vocoded manipulations: P1: [F (1,24) = 6.36, p = 0.02, η2 = 0.21], N1: [F (1, 24) = 16.03, p = 0.001, η2 = 0.40], P2: [F (1, 24) = 12.30, p = 0.002, η2 = 0.34], N400: [F (1, 24) = 31.82, p = 0.0001, η2 = 0.57], LPC: [F(1,24) = 5.29, p = 0.03, η2 = 0.18]. (B) ERPs during natural (different population than noise-vocoded experiment) and all BN manipulations (n = 29): P1: [F (1, 28) = 24.47, p = 0.0001, η2 = 0.47], N1: [F (1, 28) = 10.46, p = 0.003, η2 = 0.27], P2: [F (1, 28) = 10.65, p = 0.003, η2 = 0.28], N400: [F (1, 28) = 62.16, p = 0.0001, η2 = 0.69], LPC: [F(1,28) = 10.55, p = 0.003, η2 = 0.27]. (C) ERP components during SSN manipulations (n = 29): P1: [F (1, 28) = 22.98, p = 0.0001, η2 = 0.45], N1: [F (1, 28) = 6.07, p = 0.02, η2 = 0.18], P2: [F (1, 28) = 18.10, p = 0.001, η2 = 0.39] and N400: [F (1, 28) = 60.75, p = 0.0001, η2 = 0.68], LPC: [F(1,28) = 10.76, p = 0.003, η2 = 0.28]. The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. BN, babble noise; ERP, event-related potential; LPC, late positivity complex; SSN, speech-shaped noise.

    (EPS)

    S7 Fig. Comparison of LTAS for natural speech, BN, and SSNs.

    Power spectrum density estimates were calculated for 300 concatenated natural speech tokens and 60 seconds of 8-talker BN and SSN; all acoustic stimuli had been normalised to 65 dB for the purpose of this figure. The upper root-mean square envelopes, generated using 300-point sliding windows, are shown for the different conditions. The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. BN, babble noise; LTAS, long-term average spectra; SSN, speech-shaped noise.

    (EPS)
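    [Illustrative sketch only: the LTAS and sliding root-mean-square envelope described in the S7 Fig legend could be approximated as below, assuming a level-normalised waveform `x` sampled at rate `fs`; the Welch segment length is an arbitrary choice, not the authors' setting.]

```python
import numpy as np
from scipy.signal import welch

def long_term_average_spectrum(x, fs, nperseg=4096):
    """Welch power spectral density estimate of a concatenated speech or noise signal."""
    return welch(x, fs=fs, nperseg=nperseg)

def sliding_rms_envelope(x, window=300):
    """Root-mean-square envelope computed with a sliding window (e.g., 300 samples)."""
    squared = np.pad(np.asarray(x, dtype=float) ** 2,
                     (window // 2, window - window // 2 - 1), mode="edge")
    kernel = np.ones(window) / window
    return np.sqrt(np.convolve(squared, kernel, mode="valid"))

# Hypothetical usage on a stimulus waveform `x` sampled at `fs`:
# freqs, ltas = long_term_average_spectrum(x, fs)
# env = sliding_rms_envelope(x, window=300)
```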

    S8 Fig. Example of subject’s CEOAE data management from Fig 1F.

    Boxes and whiskers represent the distribution of the data in quartiles; whiskers indicate the variability outside the upper and lower quartiles. Star symbols represent outliers. Data points labelled SNR correspond to CEOAE data with SNR <6 dB, while data points labelled ID correspond to incomplete data acquisition. These data points were not considered for statistical analysis. The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw. CEOAE, click-evoked otoacoustic emission; SNR, signal-to-noise ratio.

    (EPS)

    Attachment

    Submitted filename: response-to-reviewers.docx

    Attachment

    Submitted filename: Response-to-Reviewers.docx

    Data Availability Statement

    The underlying data can be found in https://doi.org/10.5061/dryad.3ffbg79fw.

