Author manuscript; available in PMC: 2024 Jul 1.
Published in final edited form as: Hear Res. 2023 Apr 19;434:108771. doi: 10.1016/j.heares.2023.108771

Effects of Age on Brainstem Coding of Speech Glimpses in Interrupted Noise

William J Bologna a,b,c, Michelle R Molis b,c, Brandon M Madsen c,1, Curtis J Billings c,2
PMCID: PMC10213136  NIHMSID: NIHMS1896242  PMID: 37119674

Abstract

Difficulty understanding speech in fluctuating backgrounds is common among older adults. Whereas younger adults are adept at interpreting speech based on brief moments when the signal-to-noise ratio is favorable, older adults use these glimpses of speech less effectively. Age-related declines in auditory brainstem function may degrade the fidelity of speech cues in fluctuating noise for older adults, such that brief glimpses of speech interrupted by noise segments are not faithfully represented in the neural code that reaches the cortex. This hypothesis was tested using electrophysiological recordings of the envelope following response (EFR) elicited by glimpses of speech-like stimuli varying in duration (42, 70, 210 ms) and interrupted by silence or intervening noise. Responses from adults aged 23–73 years indicated that both age and hearing sensitivity were associated with EFR temporal coherence and response magnitude. Age was better than hearing sensitivity for predicting temporal coherence, whereas hearing sensitivity was better than age for predicting response magnitude. Poorer-fidelity EFRs were observed with shorter glimpses and with the addition of intervening noise. However, losses of fidelity with glimpse duration and noise were not associated with participant age or hearing sensitivity. These results suggest that the EFR is sensitive to factors commonly associated with glimpsing but do not entirely account for age-related changes in speech recognition in fluctuating backgrounds.

Keywords: Aging, Envelope Following Response, Frequency Following Response, Glimpsing, Forward Masking, Interrupted Speech

1. Introduction

Speech communication in background noise is particularly difficult for older adults, even those with normal hearing thresholds. Poor speech recognition is thought to be a consequence of age-related declines in temporal resolution (Frisina and Frisina, 1997; Gordon-Salant and Fitzgibbons, 1993), as well as declines in cognition and other higher-level processes (Akeroyd, 2008; Humes et al., 2012). Poor temporal resolution may impair speech recognition by limiting the listener’s ability to use brief cues in speech, which are particularly important in fluctuating noise. In fluctuating backgrounds, listeners reconstruct partially masked messages based on brief moments when the signal-to-noise ratio (SNR) is favorable (Cooke, 2006; Best et al., 2017). This perceptual ability, known as “glimpsing,” is more difficult for older adults, even those with minimal hearing loss (Bologna et al., 2018). The purpose of this study was to test the hypothesis that age-related declines in temporal resolution reduce the neural fidelity (as determined by measures of brainstem coding) of speech cues during brief moments with favorable SNR.

Forward masking—the reduced detectability of an acoustic signal occurring after the offset of a masker—may explain the relationship between poor temporal resolution and the age-related decline in glimpsing. Dubno et al. (2003) measured detection thresholds for tonal signals occurring after the offset of a masker (forward-masked thresholds) as well as speech recognition in interrupted noise (a common paradigm for studying glimpsing) in the same sample of participants. They observed negative correlations between forward-masked thresholds and “masking release;” adults with poorer forward-masked thresholds received less benefit on the speech-recognition task when interruptions in the noise created momentary improvements in SNR. These results were interpreted in terms of the “recovery from prior stimulation” that is common to both the forward-masking and speech-recognition tasks. Masking release was dependent on the recovery of the response to a suprathreshold signal (tone or speech) following prior stimulation by a masker (Dubno et al., 2003). The strength of the association increased at higher rates of interruption, suggesting that forward masking from the preceding noise bursts was more disruptive to the recognition of shorter glimpses of speech. Similar results were reported by Fogerty et al. (2017), showing that older adults require longer durations after the offset of a masker to recognize consonant-vowel signals than younger adults. Physiological evidence that shorter glimpses are more susceptible to forward masking, particularly in older adults, would strengthen this theoretical relationship between forward masking, glimpsing, and age. Given that many older adults report difficulty understanding speech in noise despite relatively normal auditory thresholds (Beck et al., 2018), the ability to measure the underlying mechanism driving this relationship would offer considerable diagnostic utility.

Electrophysiological studies in humans and animals have provided insight into the potential mechanisms leading to the observed effects of age on glimpsing and forward masking. Animal models of aging suggest that neural activity in the brainstem becomes less temporally precise in older animals, even in the absence of peripheral hearing loss (Willott et al., 1988; Caspary et al., 1990). Precise neural firing allows the auditory system to capture fast changes in an acoustic environment, such as a momentary dip in the level of the background noise that would provide a glimpse of the speech signal. Electrophysiological work with human subjects has linked the age-related decline in temporal resolution to poor brainstem coding of speech and speech-like signals (Anderson et al., 2012; Clinard and Tremblay, 2013; Mamo et al., 2016; Parthasarathy et al., 2018; Vander Werff and Burns, 2011). These studies recorded auditory evoked potentials from younger and older adults in response to speech-like stimuli in quiet or with various maskers. Stimulus-locked responses can be processed to extract either the envelope or fine-structure component of the neural signal. These two components of the response, known as the envelope following response (EFR) and the frequency following response (FFR), respectively, reflect the neural representation of the acoustic envelope or fine structure of the evoking stimulus (Aiken and Picton, 2008). These techniques allow for quantitative measurement of the fidelity of brainstem coding (or at least certain aspects of it) in humans. Several independent groups have demonstrated that increasing age, poorer peripheral hearing sensitivity, or a combination of the two is associated with poorer fidelity of the EFR and FFR, particularly in noise (Hao et al., 2018; McClasky et al., 2019; Schoof and Rosen, 2016). 
The current study used the EFR to investigate the extent to which physiological declines in brainstem coding may explain effects of age on glimpsing and forward masking of speech glimpses.

In this study, EFRs to glimpses of speech-like stimuli interrupted by silence or noise were elicited from adults spanning a wide range of ages. The effect of glimpse duration was evaluated with stimuli consisting of either a single long glimpse, three medium glimpses, or five short glimpses, with equivalent stimulus energy across the three conditions. The effect of forward masking on the EFR was determined by comparing responses to glimpses interrupted by silence to glimpses interrupted by non-simultaneous masking noise. We expected that older adults would demonstrate poorer overall EFR strength than younger adults, as seen in several previous studies (e.g., Anderson et al., 2012; Clinard and Tremblay, 2013; Vander Werff and Burns, 2011). We hypothesized that age-related declines in glimpsing were driven by poor brainstem coding of speech glimpses and predicted that older adults would demonstrate particularly poor responses to shorter glimpses of speech (i.e., interaction between age and glimpse duration, as seen in Bologna et al., 2018). We also hypothesized that poor brainstem coding contributed to prolonged susceptibility to forward masking in older adults and predicted that poor responses would be observed when glimpses were interrupted by noise compared to silence, particularly for older adults and with shorter glimpses of speech (as shown behaviorally in Dubno et al., 2003 and Fogerty et al., 2017). Evidence supporting these hypotheses would suggest that age-related declines in glimpsing may be driven by poor brainstem coding of speech glimpses.

2. Material and methods

2.1. Participants

Thirty-one adults (15 males) aged 23–73 years (mean = 43.3, SD = 18.3) participated in this study. Physiological data from one participant (age 33) were excessively noisy and could not be analyzed; data from the remaining 30 adults are reported below. All participants had hearing thresholds ≤ 25 dB HL at all octave frequencies from 250 to 4000 Hz in the right ear. Figure 1 displays pure-tone hearing thresholds (right ear) from each individual in gray and the average hearing thresholds in black (error bars represent standard deviation). The inset scatterplot in Fig. 1 shows a significant correlation between age and pure-tone average (0.5, 1, and 2 kHz) in this sample [F(1,28) = 7.66, p < 0.01, r² = 0.21]. No participant reported taking medication for sleep, seizures, or mood. All were paid for their participation and provided informed consent.

Figure 1: Mean and individual audiograms with inset correlation with age.

Individual pure-tone hearing thresholds (dB HL) are plotted from 0.25 to 8.0 kHz for all 30 participants (gray) as well as the mean across participants (black). Error bars represent standard deviation. Inset scatterplot shows a significant correlation (r² = 0.21) between age and pure-tone average (0.5, 1, and 2 kHz).

2.2. Stimuli

Six conditions were tested: three glimpse durations (long, medium, or short), with glimpses interrupted by either silence or noise. Stimulus waveforms and the spectra of the signal and noise are plotted in Figures 2A and 2B, respectively. The signal was a harmonic complex generated with a Klatt synthesizer implemented in Praat (Boersma and Weenink, 2018) with a fundamental frequency (f0) of 112.3 Hz. Harmonics were spectrally shaped using the KlattGrid function to create a single resonance peak at 500 Hz, such that the signal resembled the first formant of a neutral vowel. In each condition, a total of 210 ms of signal was presented, either in a single “long” glimpse (210 ms), three “medium” glimpses (70 ms each), or five “short” glimpses (42 ms each). The beginning and ending of each glimpse were pitch-pulse synchronized and included 5-ms cosine-shaped on- and off-ramps to reduce spectral splatter. Glimpses were evenly distributed across the 315-ms stimulus epoch and flanked by either silence (silent interruption) or noise shaped to approximate the spectral envelope of the signal (noise interruption). In the latter condition, the root-mean-square amplitude of the noise was set 10 dB above that of the signal. Noise segments also contained 5-ms cosine transition ramps to reduce spectral splatter. A phase-inverted copy of each stimulus was generated by multiplying the amplitude at each sample point by −1 to allow for alternating-polarity presentations. Stimuli were digitally stored as 44.1-kHz WAV files with 16-bit LPCM encoding.
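The timing of the three glimpse patterns can be sketched as follows. The text states only that glimpses were evenly distributed across the 315-ms epoch; the equal-gap layout used here (identical gaps before, between, and after the glimpses) is an assumption, and the function name `glimpse_spans` is hypothetical. The sketch also checks that every condition has the same duty cycle.

```python
def glimpse_spans(n_glimpses, glimpse_ms, epoch_ms=315.0):
    """Return (onset, offset) times in ms for equally spaced glimpses.

    Assumption (not stated in the source): the non-glimpse time is split
    into n_glimpses + 1 equal gaps, before, between, and after the glimpses.
    """
    gap_ms = (epoch_ms - n_glimpses * glimpse_ms) / (n_glimpses + 1)
    spans = []
    for k in range(n_glimpses):
        onset = gap_ms + k * (glimpse_ms + gap_ms)
        spans.append((onset, onset + glimpse_ms))
    return spans


# Number of glimpses and glimpse duration (ms) for each condition in the text.
conditions = {"long": (1, 210.0), "medium": (3, 70.0), "short": (5, 42.0)}
for name, (n, dur) in conditions.items():
    spans = glimpse_spans(n, dur)
    duty = sum(off - on for on, off in spans) / 315.0
    print(name, [round(on, 2) for on, _ in spans], f"duty cycle = {duty:.1%}")
```

Under this assumed layout the single long glimpse begins 52.5 ms into the epoch, and all three conditions share a duty cycle of 210/315, or about 66.7%.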

Figure 2: Stimulus waveforms and spectra.

(A) Stimulus waveforms for the three glimpse durations interrupted by silence (left column) and by noise (right column). Total stimulus duration (315 ms) and duty cycles (66.7%) were fixed to facilitate comparison of response strength across conditions. (B) Frequency spectra of the signal (red) and noise (gray). Signal was a harmonic complex with 112.3-Hz fundamental frequency and a resonance peak at 500 Hz. Noise was shaped to approximate the spectral envelope of the signal.

2.3. Procedures

Participants were seated in a reclined position inside a sound-treated and electrically shielded test booth. Participants were instructed to relax and minimize any unnecessary movement during testing; sleeping was encouraged. Stimuli were presented monaurally to the right ear at 65 dB SPL (noise at 75 dB SPL) using an ER-3A insert earphone (Etymotic Research, Elk Grove Village, IL) with mu-metal magnetic shielding and double-length tubing. Presentation was controlled by Stim2 software and digital-to-analog conversion hardware (Compumedics Neuroscan, Charlotte, NC). Each run consisted of 1502 alternating-polarity sweeps of a single stimulus condition with randomized inter-stimulus intervals (68, 85, or 102 ms) to prevent neural entrainment. Each of the six stimulus conditions was tested twice. The order of the conditions was randomized, with the restriction that each condition be tested once before being repeated. Each run lasted about 10 minutes, and participants were instructed to request breaks between conditions as often as necessary for stretching, water, or to use the restroom. If no break was requested, the time between runs was approximately 1 minute of silence. The study visit lasted no more than 4 hours, including consenting, audiometric testing, and setup.

EFRs were recorded at a sampling rate of 20 kHz with an online analog filter passband of 0.05–3000 Hz using SynAmps RT amplifiers and Scan Acquire 4.5 software (Compumedics Neuroscan, Charlotte, NC). Recordings were made using reusable Ag/AgCl electrodes in a vertical montage of Cz (vertex) referenced to M2 (right mastoid), with a common ground electrode at Fpz, and additional noninverting (active) electrodes at C7 (the 7th cervical vertebra), M1 (left mastoid), and Fz (halfway between Cz and the nasion).

2.4. Processing and Quantification

The continuous electroencephalogram from each run was divided into epochs extending from −40 ms to 355 ms re: stimulus onset, such that each epoch contained a 40-ms pre- and post-stimulus interval. Baseline correction was applied to each epoch by vertically aligning the sweep so that the mean amplitude in the 40-ms prestimulus interval was zero. Epochs containing absolute amplitude values above 20 μV were flagged as having excessive amounts of artifact; such epochs were rejected, as were the first and last epoch from each run. The remaining epochs were pooled across the two repeated runs of that condition for each participant, and then epochs of each polarity were averaged separately. The envelope component of the response (i.e., the EFR) was extracted by summing the two polarity-specific average responses and dividing by two. The EFR was subsequently filtered in MATLAB (MathWorks, Natick, MA) with a digital FIR filter, which had a passband of 80‒800 Hz and an order of 400. The filter was applied in both forward and reverse directions to eliminate phase delays.
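The epoch-level steps described above (baseline correction, artifact rejection, polarity-specific averaging, and summing the two averages) can be illustrated with a minimal pure-Python sketch. It is not the authors' MATLAB pipeline: the helper names are hypothetical, the toy epochs are synthetic, and the pooling across runs, the removal of first and last epochs, and the FIR filtering are omitted.

```python
import math

FS = 20000  # recording sampling rate in Hz (from the text)


def baseline_correct(epoch, n_pre):
    """Subtract the mean of the first n_pre (prestimulus) samples."""
    offset = sum(epoch[:n_pre]) / n_pre
    return [v - offset for v in epoch]


def reject_artifacts(epochs, limit_uv=20.0):
    """Keep only epochs whose absolute amplitude never exceeds limit_uv."""
    return [ep for ep in epochs if max(abs(v) for v in ep) <= limit_uv]


def average(epochs):
    """Sample-by-sample mean across epochs."""
    return [sum(col) / len(epochs) for col in zip(*epochs)]


def extract_efr(pos_epochs, neg_epochs, n_pre):
    """Average each polarity separately, then sum and divide by two."""
    pos = average(reject_artifacts([baseline_correct(e, n_pre) for e in pos_epochs]))
    neg = average(reject_artifacts([baseline_correct(e, n_pre) for e in neg_epochs]))
    return [(p + n) / 2 for p, n in zip(pos, neg)]


# Toy data: an envelope-like component that keeps its sign across polarities
# and a fine-structure-like component that inverts with stimulus polarity.
n_pre, n_post = 800, 1000  # 40 ms and 50 ms at 20 kHz
env = [0.5 * math.sin(2 * math.pi * 112.3 * i / FS) for i in range(n_post)]
fine = [0.3 * math.sin(2 * math.pi * 500.0 * i / FS) for i in range(n_post)]
pre = [0.0] * n_pre
pos_epoch = pre + [e + f for e, f in zip(env, fine)]
neg_epoch = pre + [e - f for e, f in zip(env, fine)]
artifact = pre + [50.0] * n_post  # exceeds 20 microvolts, so it is rejected

efr = extract_efr([pos_epoch, artifact], [neg_epoch], n_pre)
residual = max(abs(a - b) for a, b in zip(efr[n_pre:], env))
print(f"max deviation from the envelope component: {residual:.2e}")
```

Because the polarity-inverting component cancels in the sum, the result retains only the component that follows the envelope, which is why this average is taken as the EFR.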

Prior to quantification of response strength, the filtered EFR was aligned with the stimulus for each participant in each condition to estimate the delay in the neural response, relative to the stimulus onset trigger. The stimuli were downsampled from 44.1 kHz to 20 kHz and then filtered using the same digital FIR filter used on the EFR. The prestimulus interval in the EFR was removed, and the trimmed EFR was cross-correlated with the filtered version of the evoking stimulus using the xcorr function in MATLAB. This function generates a vector of correlation values corresponding to every possible sample offset between the stimulus and the response. Visual inspection of the cross-correlation vectors indicated that the greatest correlation between stimulus and response fell between 10 and 18 ms post stimulus onset for most participants and conditions. These bounds are also physiologically plausible for brainstem generator sites thought to contribute to the EFR (King et al., 2016). The sample offset corresponding to the maximum absolute correlation within these bounds was translated into a millisecond value re: stimulus onset and recorded as the lag for that participant and condition. This method allowed the lag estimate to vary across participants and conditions (mean = 14.37 ms, SD = 1.48 ms), such that the most accurate temporal alignment could be used in all cases. The portion of the response preceding the lag point was trimmed away, as was the portion occurring more than 315 ms after the lag estimate. Thus, the final EFR for each participant and condition was the 315-ms segment of neural activity that best correlated with the evoking stimulus, given the a priori restrictions on allowable lag values.
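The lag search can be sketched in pure Python as below. This is an illustration under stated assumptions rather than the authors' code: instead of computing the full cross-correlation vector with `xcorr`, it evaluates only lags inside the 10 to 18 ms window, and the function name `best_lag_ms` and the synthetic test signal are hypothetical.

```python
import math

FS = 20000  # sampling rate in Hz after downsampling the stimulus


def best_lag_ms(stimulus, response, lo_ms=10.0, hi_ms=18.0, fs=FS):
    """Return the lag (ms) in [lo_ms, hi_ms] with the largest |correlation|."""
    best_lag, best_val = None, -1.0
    lo, hi = round(lo_ms * fs / 1000), round(hi_ms * fs / 1000)
    for lag in range(lo, hi + 1):
        segment = response[lag:lag + len(stimulus)]
        val = abs(sum(s * r for s, r in zip(stimulus, segment)))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag * 1000.0 / fs


# Synthetic check: a windowed 112.3-Hz tone delayed by 280 samples (14 ms).
n = 1000
stim = [math.sin(2 * math.pi * 112.3 * i / FS) * math.sin(math.pi * i / n) ** 2
        for i in range(n)]
true_lag = 280
resp = [0.0] * true_lag + [0.4 * s for s in stim] + [0.0] * 400
print(best_lag_ms(stim, resp))
```

Restricting the search to physiologically plausible lags prevents a spurious peak one stimulus period away from being selected when the response is noisy.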

In the interest of comparing the neural response to the glimpse portion of the stimulus in the two interrupter conditions, the EFRs were passed through gating functions that isolated the portion corresponding to the glimpses and removed the neural activity during the silence or noise portions of the stimulus. Three gating functions mirrored the pattern of glimpses in the three stimuli, with 5-ms cosine transitions between the glimpse and non-glimpse portions. Each trimmed 315-ms EFR was multiplied by the gating function corresponding to its evoking stimulus, such that the portions of the EFR reflecting neural coding of glimpses were multiplied by 1 and the portions between glimpses were multiplied by 0, with transition portions ramped by the 5-ms cosine. This was an essential step for evaluating forward masking effects on brainstem coding of glimpses, as it allowed direct comparison of the neural responses to the glimpses interrupted by silence and noise, without contamination by the neural response to the noise.
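A gating function of the kind described can be sketched as follows. The glimpse span passed in (52.5 to 262.5 ms for the long glimpse) assumes equal gaps around the glimpse, and placing the 5-ms raised-cosine ramps inside each glimpse span is also an assumption; the source specifies neither detail. Function names are hypothetical.

```python
import math

FS = 20000  # EFR sampling rate in Hz


def ramp_value(k, ramp):
    """Raised-cosine weight for a sample k samples in from a glimpse edge."""
    return 0.5 * (1 - math.cos(math.pi * k / ramp)) if k < ramp else 1.0


def gating_function(spans_ms, epoch_ms=315.0, ramp_ms=5.0, fs=FS):
    """Build a gate that is 1 during glimpses, 0 elsewhere, with cosine ramps."""
    n = round(epoch_ms * fs / 1000)
    ramp = round(ramp_ms * fs / 1000)
    gate = [0.0] * n
    for on_ms, off_ms in spans_ms:
        on, off = round(on_ms * fs / 1000), round(off_ms * fs / 1000)
        for i in range(on, off):
            gate[i] = min(ramp_value(i - on, ramp), ramp_value(off - i, ramp))
    return gate


def apply_gate(efr, gate):
    """Multiply the trimmed EFR by the gate, sample by sample."""
    return [v * g for v, g in zip(efr, gate)]


gate = gating_function([(52.5, 262.5)])  # assumed timing of the long glimpse
toy_efr = [1.0] * len(gate)              # placeholder response for illustration
gated = apply_gate(toy_efr, gate)
print(len(gate), gate[0], gate[1100], gate[3000])
```

Applying the same gate to the silent-interruption and noise-interruption responses means both are compared over identical time samples.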

The gated EFRs were quantified in both the spectral and temporal domains. In the spectral domain, response strength was calculated using a fast Fourier transform (FFT) applied in a single 315-ms window. The magnitude portion of the FFT in the frequency bin corresponding to the f0 of the stimulus (112.3 Hz) was recorded as the response strength for each participant in each condition. Despite the fact that each condition contained an equal duration of signal “on” time (210 ms), the differences in glimpse duration and number of glimpses resulted in small differences in f0 amplitude across the stimuli. To account for potential effects of this stimulus characteristic on the results, EFRs were also analyzed in the temporal domain using the stimulus-to-response correlation coefficient (SRCC, see Billings et al., 2019 for review). SRCC strongly correlates with response magnitude but is less sensitive to subtle differences in f0 amplitude. Declines in response magnitude and SRCC as a function of glimpse duration can be interpreted as poorer brainstem coding of f0 for shorter glimpses of speech.
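The two metrics can be illustrated with a short pure-Python sketch. Two assumptions are made that the source does not confirm: the spectral magnitude is computed with a one-bin discrete Fourier transform evaluated directly at f0 (the authors read a bin from an FFT, and their amplitude scaling is not stated), and SRCC is taken to be a Pearson correlation between the aligned stimulus and response. Function names are hypothetical.

```python
import cmath
import math

FS = 20000  # Hz
F0 = 112.3  # stimulus fundamental frequency in Hz


def magnitude_at(signal, freq_hz, fs=FS):
    """Amplitude estimate at one frequency from a single-bin DFT."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * i / fs)
              for i, v in enumerate(signal))
    return 2 * abs(acc) / len(signal)


def srcc(stimulus, response):
    """Pearson correlation between equal-length stimulus and response."""
    n = len(stimulus)
    mean_s, mean_r = sum(stimulus) / n, sum(response) / n
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(stimulus, response))
    var_s = sum((s - mean_s) ** 2 for s in stimulus)
    var_r = sum((r - mean_r) ** 2 for r in response)
    return cov / math.sqrt(var_s * var_r)


# Synthetic 315-ms example: a 0.03-unit sinusoid at f0 as the "response".
n = 6300
stim = [math.sin(2 * math.pi * F0 * i / FS) for i in range(n)]
resp = [0.03 * s for s in stim]
print(f"magnitude at f0: {magnitude_at(resp, F0):.4f}")
print(f"SRCC: {srcc(stim, resp):.3f}")
```

Scaling the response changes the magnitude estimate but leaves the correlation unchanged, which illustrates why SRCC is less sensitive to amplitude differences across stimuli.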

3. Results

Grand-average response waveforms and spectrograms (prior to response gating and quantification) are shown in Figure 3A and 3B, respectively. Responses in the silent interruption condition illustrate bursts of neural activity that qualitatively match the pattern of glimpses in the corresponding stimuli; for example, note that the response to the long glimpse (top left panel of Fig. 3A) contains a visible onset of neural activity roughly 50 ms after stimulus onset, which corresponds to the onset of the long glimpse in the stimulus waveform (top left panel of Fig. 2A). In contrast, the pattern of glimpses in the noise interruption response waveforms is visually obscured by neural activity occurring during the noise segments of the stimuli. Neural activity in the noise interruption response waveforms (right panels of Fig. 3A) begins near 0 ms, which corresponds to the onset of the initial noise bursts in the noise interruption conditions (see right panels of Fig. 2A). Response spectrograms in Figure 3B indicate that the neural activity is spectrally concentrated at f0 (112.3 Hz), with a faint representation of the second harmonic (224.6 Hz), particularly for glimpses interrupted by silence (left column of Fig. 3B). The overall pattern observed in these grand-average waveforms and spectrograms is consistent with expectations for EFRs with these stimuli.

Figure 3: Grand-average response waveforms and spectrograms.

(A) Grand-average response waveforms from 30 adults. Responses to long, medium, and short glimpses are shown in the top, middle, and bottom rows, respectively, with silent interruption responses in the left column and noise interruption in the right column. Dotted lines in waveforms indicate stimulus onset and offset. (B) Spectrograms of the grand-average response waveforms. Warmer colors indicate greater response magnitude.

Mean response magnitudes at f0 (left) and stimulus-to-response correlation coefficients (SRCC; right) are plotted in Figure 4 as a function of glimpse duration with interrupter (silent vs. noise) as the parameter. Magnitude and SRCC data follow similar patterns—stronger responses were observed for a single long glimpse than for medium or short glimpses (lines tilt downward), and responses were better when glimpses were interrupted by silence rather than noise (separation between lines). Magnitude and SRCC data were initially analyzed using two separate 2×3 repeated-measures ANOVAs with factors of interrupter (silent or noise) and glimpse duration (long, medium, or short). Magnitude results indicated that main effects of interrupter [F(1,29) = 86.52, p < 0.001] and glimpse duration [F(2,58) = 15.82, p < 0.001] were statistically significant, while the interaction between them [F(1.44,41.8) = 0.972, p = 0.36] was not. Similar results were found with SRCC data; main effects of interrupter [F(1,29) = 30.60, p < 0.001] and glimpse duration [F(2,58) = 6.07, p < 0.01] were found to be statistically significant, while the interaction between them [F(1.44,45.8) = 2.24, p = 0.13] was not. Taken together, these results suggest that shorter glimpses produce less robust EFRs, and that the EFR is disrupted by non-simultaneous masking noise between glimpses.

Figure 4: Average EFR magnitude and stimulus-to-response correlation coefficients.

Mean response magnitude at f0 (left) and mean stimulus-to-response correlation coefficient (SRCC; right) are plotted as a function of glimpse duration for silent interruption (dashed line with open symbols) and noise interruption stimuli (solid line with filled symbols). Error bars represent standard error. Response magnitude and SRCC decrease as glimpse duration shortens and with the addition of noise between glimpses.

Effects of age and hearing sensitivity were evaluated with two separate linear regressions implemented in R (R Core Team, 2021) using linear mixed-effects models (LMM; R-package: lme4; Bates et al., 2015). SRCC values were transformed into Z-scores using a Fisher transform to facilitate modeling the data with a Gaussian distribution. The LMMs specified individual Magnitude (μV) or SRCC (Z-score) values as the dependent variable and estimated separate β coefficients for each independent variable included in the model. The same modeling approach was used to generate the final two models. First, simple models were generated with Subject as a random effect and the significant factors identified with ANOVA as fixed effects: Interrupter (coded as 0 for silent interruption and 1 for noise interruption) and Duration (coded as the relative glimpse duration compared to the long glimpse; 1.0 for long, 0.33 for medium, 0.2 for short). Next, potential effects of Age (participant’s age in years), PTA (average hearing threshold at 500, 1000, and 2000 Hz), and interactions between Age, PTA, and the fixed effects were added to the model one at a time and tested for significance using model testing with likelihood ratio tests (Hofmann, 1997). The factor or interaction that most improved model fit was retained, and the process began again; remaining factors and interactions were added one at a time and tested for significance. This iterative process ended when no additional factors or interactions significantly improved the fit of the model. The goal of this process was to optimize the model to retain only the fixed effects and interactions that significantly improved model fit, such that the final models described below represent the best fit to the data with the fewest number of factors.
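As a small illustration of two details in this paragraph, the sketch below applies the Fisher z-transform (the inverse hyperbolic tangent of the correlation) and computes the relative-duration coding of the Duration predictor. It is written in Python for illustration only; the analysis itself was run in R, and the function names are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher z-transform of a correlation coefficient."""
    return math.atanh(r)


def duration_code(glimpse_ms, long_ms=210.0):
    """Glimpse duration relative to the long (210-ms) glimpse."""
    return glimpse_ms / long_ms


for r in (0.2, 0.5, 0.8):
    print(r, round(fisher_z(r), 3))
print([round(duration_code(d), 2) for d in (210, 70, 42)])
```

The transform stretches values near the ends of the correlation scale, so the transformed scores are better suited to a model that assumes normally distributed residuals.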

3.1. Magnitude Model

A simple model was constructed to predict Magnitude based on the stimulus condition variables (Interrupter and Duration) and a random effects term (Subject). Model testing confirmed significant contributions of Interrupter and Duration to model fit (Interrupter: χ2 = 108.7, p < .001; Duration: χ2 = 14.71, p < .001). Following our stepwise process, PTA was identified as the next most predictive factor and was added to the model (χ2 = 6.65, p < .01). During this initial step, Age was also noted to significantly improve model fit (χ2 = 4.54, p < .05) but its contribution was weaker than PTA, so it was held out of the model until the next iteration of model testing. In the next iteration, factors of Age and all two-way interactions between Interrupter, Duration, PTA, and Age, were added to the model with PTA and tested for significance with model testing. None of these factors or interactions significantly improved model fit compared to the model with PTA (all χ2 < 2.42, nonsignificant in all cases). As such, the final model can be expressed as: Magnitude ~ Interrupter + Duration + PTA + (Subject). Standard estimates, standard error, and t values for each factor can be found in Table 1. These results indicated three primary factors affected the magnitude of the response: (1) the presence of non-simultaneous noise between glimpses reduced the magnitude of the EFR (βInterrupter = −0.009; t = −12.55), (2) longer duration glimpses increased the magnitude of the EFR (βDuration = 0.004; t = 3.91), and (3) worse hearing sensitivity resulted in lower magnitude of the EFR (βPTA = −0.001; t = −2.64).
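The reported significance levels can be checked against the chi-square statistics with a short sketch. It assumes each likelihood ratio test compares models that differ by a single parameter (one degree of freedom), which the source does not state explicitly; the function name is hypothetical.

```python
import math


def lrt_p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Chi-square values reported in the text for the magnitude model.
for label, chi2 in [("PTA", 6.65), ("Age", 4.54), ("largest remaining", 2.42)]:
    print(f"{label}: chi2 = {chi2}, p = {lrt_p_value_1df(chi2):.3f}")
```

Under the one-degree-of-freedom assumption, the values for PTA and Age fall below .01 and .05 respectively, while the largest remaining statistic does not reach significance, in agreement with the text.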

Table 1:

Final linear mixed-effects model of EFR magnitude with coding scheme, standard estimates, standard error, and t values for each factor in the model.

| Factor | Coding Scheme | Standard Estimate (β) | Standard Error | t Value |
|---|---|---|---|---|
| (Intercept) | NA | 0.0364 | 0.0039 | 9.293 |
| Interrupter | 0 = Silent; 1 = Noise | −0.0095 | 0.0008 | −12.549 |
| Duration | 1.0 = Long; 0.33 = Medium; 0.2 = Short | 0.0042 | 0.0011 | 3.905 |
| PTA | Average threshold at 500, 1000, and 2000 Hz (dB HL) | −0.0009 | 0.0003 | −2.636 |
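To make the fixed-effect estimates concrete, the following sketch evaluates the final magnitude model for a hypothetical listener. The 10 dB HL pure-tone average is an illustrative value, not a participant from the study, and the per-subject random intercept is omitted, so the output is a population-level prediction only.

```python
# Fixed-effect estimates copied from Table 1 (magnitude in microvolts).
COEF = {"intercept": 0.0364, "interrupter": -0.0095,
        "duration": 0.0042, "pta": -0.0009}


def predicted_magnitude(noise, duration_code, pta_db_hl):
    """Population-level EFR magnitude; the subject random effect is omitted."""
    return (COEF["intercept"]
            + COEF["interrupter"] * (1 if noise else 0)
            + COEF["duration"] * duration_code
            + COEF["pta"] * pta_db_hl)


pta = 10.0  # hypothetical pure-tone average in dB HL
long_silent = predicted_magnitude(noise=False, duration_code=1.0, pta_db_hl=pta)
short_noise = predicted_magnitude(noise=True, duration_code=0.2, pta_db_hl=pta)
print(f"long glimpse, silent interruption: {long_silent:.4f} uV")
print(f"short glimpses, noise interruption: {short_noise:.4f} uV")
```

Because the final model contains no interaction terms, the predicted costs of noise and of shorter glimpses are the same for every value of PTA.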

3.2. Stimulus-to-Response Correlation Coefficient Model

A simple model was constructed to predict SRCC Z-scores based on the stimulus condition variables (Interrupter and Duration) and a random effects term (Subject). Model testing confirmed significant contributions of Interrupter and Duration to model fit (Interrupter: χ2 = 30.97, p < .001; Duration: χ2 = 8.99, p < .01). Following our stepwise process, Age was identified as the next most predictive factor and was added to the model (χ2 = 11.37, p < .001). During this initial step, PTA was also noted to significantly improve model fit (χ2 = 6.77, p < .01) but its contribution was weaker than Age, so it was held out of the model until the next iteration of model testing. In the next iteration, the factor PTA and all two-way interactions between Interrupter, Duration, PTA, and Age were added to the model with Age and tested for significance with model testing. None of these factors or interactions significantly improved model fit (all χ2 < 3.38, nonsignificant in all cases). As such, the final model can be expressed as: SRCC ~ Interrupter + Duration + Age + (Subject). Standard estimates, standard error, and t values for each factor can be found in Table 2. These results indicated three primary factors affected the temporal coherence of the response: (1) the presence of non-simultaneous noise between glimpses reduced temporal coherence of the EFR (βInterrupter = −0.050; t = −5.83), (2) longer duration glimpses increased temporal coherence of the EFR (βDuration = 0.037; t = 3.02), and (3) greater age resulted in reduced temporal coherence of the EFR (βAge = −0.004; t = −3.59).

Table 2:

Final linear mixed-effects model of EFR stimulus-to-response correlation coefficient with coding scheme, standard estimates, standard error, and t values for each factor in the model.

| Factor | Coding Scheme | Standard Estimate (β) | Standard Error | t Value |
|---|---|---|---|---|
| (Intercept) | NA | 0.5154 | 0.0535 | 9.643 |
| Interrupter | 0 = Silent; 1 = Noise | −0.0499 | 0.0086 | −5.825 |
| Duration | 1.0 = Long; 0.33 = Medium; 0.2 = Short | 0.0370 | 0.0122 | 3.023 |
| Age | Participant age in years | −0.0040 | 0.0011 | −3.591 |
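Since the SRCC model is fit on Fisher-transformed values, a prediction has to be passed through the hyperbolic tangent to return to the correlation scale. The sketch below does this for the youngest and oldest ages in the sample (23 and 73 years) using the Table 2 estimates; the subject random effect is omitted and the function name is hypothetical, so the numbers are illustrative population-level values rather than results reported by the authors.

```python
import math

# Fixed-effect estimates copied from Table 2 (Fisher z units).
COEF = {"intercept": 0.5154, "interrupter": -0.0499,
        "duration": 0.0370, "age": -0.0040}


def predicted_srcc(age_years, noise, duration_code):
    """Predict z from the fixed effects, then back-transform to r with tanh."""
    z = (COEF["intercept"]
         + COEF["interrupter"] * (1 if noise else 0)
         + COEF["duration"] * duration_code
         + COEF["age"] * age_years)
    return math.tanh(z)


for age in (23, 73):
    r = predicted_srcc(age, noise=False, duration_code=1.0)
    print(f"age {age}: predicted SRCC for the long glimpse in silence = {r:.3f}")
```

The comparison shows how the age coefficient, which looks small per year, accumulates over the 50-year span of the sample.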

4. Discussion

This study explored a potential physiological mechanism for the effects of age on glimpsing and forward masking. Age-related declines in hearing sensitivity and temporal resolution result in poorer brainstem-level representations of acoustic stimuli in older adults (Anderson et al., 2012; Clinard and Tremblay, 2013; Mamo et al., 2016; Parthasarathy et al., 2018; Vander Werff and Burns, 2011), and these declines may be particularly detrimental to the representation of short glimpses interrupted by noise. The neural representation of speech-like glimpses was measured using the EFR in a group of adults with normal hearing spanning a wide range of ages. Stimulus manipulations designed to assess the effects of glimpse duration and forward masking indicated that the neural representation of f0 declined in both magnitude and temporal coherence when a single long glimpse was separated into a series of shorter glimpses and when glimpses were interrupted by noise rather than silence. Effects of age and hearing sensitivity on the EFR were observed in different aspects of the response — greater age of the participant was associated with poorer temporal coherence, and poorer hearing sensitivity was associated with lower EFR magnitude at f0. Hypotheses based on behavioral studies of glimpsing and forward masking predicted interactions between the effects of age and glimpse duration and between age and noise interruption, but these interactions were not observed in the study. Thus, effects of glimpse duration and non-simultaneous masking on the EFR were observed but did not support the hypothesis that the mechanisms captured by the EFR may account for the age-related decline in glimpsing.

The correlated factors of age and hearing sensitivity both contributed to poorer electrophysiological responses and likely account for a shared portion of variance in the EFR across participants. In our models of EFR magnitude and temporal coherence, factors of age and PTA were both significant predictors when included in isolation. However, when both factors were included in the same model, their correlation resulted in neither factor significantly improving model fit. Thus, a modeling decision was made to include the single factor (age or PTA) that best predicted the outcome variable. At this point, our models of the two EFR metrics diverged: magnitude was best predicted by PTA, and SRCC was best predicted by age. This outcome is in line with theoretical effects of age and hearing sensitivity on these aspects of the EFR. The effects of age are characterized by declines in temporal resolution (Frisina and Frisina, 1997; Gordon-Salant and Fitzgibbons, 1993), and age effects on the EFR and FFR have been reported by studies using temporal metrics of the response (e.g., Anderson et al., 2012; Clinard and Tremblay, 2013). In contrast, studies that have used magnitude estimates of the EFR and FFR have reported effects of hearing sensitivity or subclinical peripheral declines (e.g., Ananthakrishnan et al., 2016; Parthasarathy et al., 2019). Thus, while temporal coherence and response magnitude are correlated metrics (Billings et al., 2019), they are differentially sensitive to the effects of age and hearing sensitivity on the EFR.

Results of the current study support recent work exploring the potential relationship between the EFR and speech recognition in modulated noise. Schoof and Rosen (2016) measured speech recognition, as well as the EFR, in steady-state and 10-Hz amplitude-modulated noise. By comparing speech recognition measured in steady-state and modulated noise, they calculated the improvement in speech recognition due to glimpsing speech during the trough of the modulated masker (i.e., “masking release”). Groups of younger and older adults both showed behavioral release from masking, but associations between the behavioral and electrophysiological measures were not observed. Similarly, they estimated a “neural masking release” based on differences in EFR response strength at the masker peak and trough. However, this measure was inconsistent across individuals, with responses at the peak of the masker being better than responses at the trough in some participants. Their decision to use sinusoidal modulation and a relatively favorable SNR facilitated comparison between the behavioral and electrophysiological measures but may have contributed to the inconsistency of their measure of “neural masking release.” In the present study, noise modulated with rectangular waves was presented at a negative SNR in order to evaluate effects of glimpse duration and forward masking on the EFR. We observed stable and consistent effects of both factors on the strength of the response across participants.

The hypothesis that shorter glimpses of speech would be poorly coded in the brainstem of older adults was guided by behavioral experiments on recognition of interrupted speech. Bologna et al. (2018) showed that older adults were poorer than younger adults at recognizing interrupted sentences, and this age-related decline was greatest for sentences with the shortest speech glimpses. Similar effects of aging on recognition of interrupted speech have been reported previously (Gordon-Salant and Fitzgibbons, 1993; Krull et al., 2013; Shafiro et al., 2015). However, effects of glimpse duration on recognition of interrupted speech have not been found consistently in behavioral studies. Using only younger listeners, Wang and Humes (2010) tested recognition of interrupted sentences that varied in glimpse duration but retained equivalent proportions of the sentence (i.e., equivalent duty cycle of interruptions). They found that changes in glimpse duration did not affect sentence recognition when the duty cycle of the interruption was kept constant. The stimuli used in this study were analogous to those used by Wang and Humes (2010); the short glimpse stimulus contained five 42-ms glimpses, whose durations sum to equal the duration of the single long glimpse (5 × 42 ms = 210 ms).
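The equal-proportion design described above can be sketched as a simple timing calculation. The example below is an illustration under stated assumptions rather than the stimulus-generation code used in this study: the function names are hypothetical, the 50% duty cycle is assumed for the example (the interruption duration is not specified in this section), and the use of three glimpses for the 70-ms condition follows from the arithmetic (3 × 70 ms = 210 ms) rather than from a statement in the text.

```python
def glimpse_schedule(glimpse_ms, total_speech_ms=210, duty_cycle=0.5):
    """Return (onset, offset) times in ms for a train of equal-length glimpses.

    The total glimpsed duration is held constant, so shorter glimpses
    simply produce more of them at the same duty cycle.
    """
    if total_speech_ms % glimpse_ms != 0:
        raise ValueError("glimpse duration must divide the total speech duration")
    n_glimpses = total_speech_ms // glimpse_ms
    period_ms = glimpse_ms / duty_cycle  # one glimpse plus one interruption
    return [(i * period_ms, i * period_ms + glimpse_ms) for i in range(n_glimpses)]


def total_glimpsed_ms(schedule):
    """Sum of glimpse durations in a schedule."""
    return sum(offset - onset for onset, offset in schedule)


# Glimpse durations named in the study: 42, 70, and 210 ms.
schedules = {d: glimpse_schedule(d) for d in (42, 70, 210)}
for duration, schedule in schedules.items():
    print(duration, len(schedule), total_glimpsed_ms(schedule))
```

Under these assumptions, every condition delivers the same 210 ms of signal, so any difference in the response across conditions reflects how that signal is divided in time rather than how much of it is presented.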

Electrophysiological responses to these stimuli indicated subtle changes in the magnitude and temporal coherence of f0 coding with shorter duration glimpses. It is likely that these subtle changes in signal quality can be overcome by younger listeners on a behavioral task where sentence context and top-down processing can be used to facilitate speech recognition. Older adults are less able to use available glimpses of speech to fill in missing information (Krull et al., 2013), which may account for the association between glimpse duration and effects of age seen in other behavioral studies (e.g., Bologna et al., 2018; Shafiro et al., 2015). Similar conclusions were drawn by Grose et al. (2009) in their evaluation of behavioral masking release and auditory steady-state response data.

The addition of non-simultaneous masking noise between glimpses decreased f0 magnitude and temporal coherence for each glimpse duration. Similar results were recently reported for the FFR; the addition of a noise burst 5 ms prior to a synthetic /da/ syllable resulted in prolonged latency of FFR waves (Griz et al., 2020). As in this study, the effect was observed in participants across a wide range of ages, but an association between forward masking and age was not found. The authors concluded that an association with age may have been obscured by an overall effect of age on the latency of FFR waves. Similar results were observed in this study, where age and noise interruption were associated with poorer SRCC, but no interaction was observed between these factors. Thus, while forward masking affects the latency, magnitude, and temporal coherence of brainstem responses to speech glimpses, there is little evidence that this effect is exacerbated by advancing age. This suggests that the age-related increase in susceptibility to forward masking in a behavioral glimpsing context (e.g., Dubno et al., 2003) may be driven by mechanisms higher in the auditory pathway than those targeted by the EFR and FFR. Support for this interpretation can be drawn from literature on the auditory steady-state response, which has shown that neural responses in the traditional modulation range for glimpsing (less than 20 Hz) localize to the auditory cortex (Luke et al., 2017).

The specific stimulus set used in this study may have limited our ability to observe the hypothesized interactions with age. The synthetic formant has speech-like qualities and harmonic structure but is considerably less complex than speech and does not contain dynamic spectral cues that may be more sensitive to age effects. For example, Anderson et al. (2012) showed that effects of age were most pronounced on the transition between phonemes in the synthetic /da/ stimulus. In a glimpsing context, these complex spectrotemporal cues may be revealed during moments with positive SNR but are used less effectively by older adults for speech recognition (Venezia et al., 2019). Additionally, our signal contained limited acoustic energy above 800 Hz. A limited bandwidth stimulus was selected to minimize the potential effects of hearing sensitivity on the EFR, as age was expected to correlate with PTA. Within the low-frequency region of stimulus energy, participants in our sample had more comparable hearing thresholds across ages. However, an interaction between age and glimpse duration or noise may be more pronounced with stimuli that contain high-frequency energy. The fidelity of high-frequency temporal fine structure cues may play an important role in glimpsing (Hopkins and Moore, 2009), and these cues are degraded in recordings of the FFR in older adults (Ananthakrishnan et al., 2016). Similarly, recordings of the auditory steady-state response have shown effects of age on the encoding of envelope modulations, but only at faster modulation rates (Gaskins et al., 2019; Grose et al., 2009). Evaluating the neural responses using a signal with high-frequency temporal cues and faster modulation rates may have increased the likelihood of observing an interaction between age and glimpse duration or noise.

5. Conclusions

This study evaluated the hypothesis that age-related declines in temporal resolution disrupt brainstem coding of speech glimpses. In a sample of 30 adults spanning a wide range of ages, both age and subtle differences in hearing sensitivity were associated with different aspects of the neural response; the magnitude of the response was better predicted by hearing sensitivity, whereas the temporal coherence of the response was better predicted by age. For both response magnitude and temporal coherence, brainstem coding was negatively affected by shortening the duration of glimpses and by the addition of noise bursts between glimpses. However, no interactions between glimpsing variables and age or PTA were observed. Rather, declines in brainstem coding associated with glimpse duration and noise interruption were consistent across participants in the study sample. Whereas declines in temporal resolution with age were shown to disrupt the fidelity of neural signals traveling through the brainstem, this physiological decline did not mirror the pattern of age effects observed in behavioral studies of speech recognition. Additional sources of variance in speech recognition, such as cognitive factors, are likely mediating the relationship between subcortical measures of speech coding (EFR) and behavioral speech recognition abilities.

Highlights:

  • Temporal coherence of brainstem coding of glimpses is less precise in older adults than younger adults

  • Magnitude of the brainstem responses is lower in adults with poorer hearing sensitivity than those with better hearing sensitivity

  • Shorter duration glimpses are coded less robustly than longer glimpses

  • Glimpses are coded less robustly when interrupted by noise than by silence

Acknowledgements

The authors would like to thank Ramesh Muralimanohar and Alice Scarlet for their assistance with data collection and management. Portions of this work were presented at the 2019 MidWinter Meeting of the Association for Research in Otolaryngology and the 2022 Spring Meeting of the Acoustical Society of America.

Funding

This work was supported by the Oregon Clinical and Translational Research Institute (OCTRI), grant number (TL1 TR 002371) from the National Center for Advancing Translational Sciences (NCATS), as well as grant numbers (R01 DC 015240, PI: Billings; R01 DC 012314, PI: Molis) from the National Institute on Deafness and Other Communication Disorders (NIDCD) at the National Institutes of Health (NIH), and grant number (I01 RX 002139) from the Veterans Affairs (VA) Rehabilitation Research and Development Service. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The contents do not represent the views of the U.S. Department of Veterans Affairs or the United States Government.

Abbreviations

SNR: signal-to-noise ratio

EFR: envelope following response

FFR: frequency following response

f0: fundamental frequency

FFT: fast Fourier transform

SRCC: stimulus-to-response correlation coefficient

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

CRediT author statement

William J. Bologna: Conceptualization, Methodology, Software, Formal Analysis, Investigation, Data Curation, Writing – Original Draft, Visualization

Michelle R. Molis: Conceptualization, Writing – Reviewing and Editing, Supervision, Funding Acquisition

Brandon Madsen: Methodology, Software, Data Curation, Writing – Reviewing and Editing

Curtis Billings: Conceptualization, Methodology, Resources, Writing – Reviewing and Editing, Supervision, Funding Acquisition

References

  1. Aiken SJ, & Picton TW (2008). Envelope and spectral frequency-following responses to vowel sounds. Hearing research, 245(1–2), 35–47. 10.1016/j.heares.2008.08.004
  2. Akeroyd MA (2008). Are individual differences in speech reception related to individual differences in cognitive ability? A survey of twenty experimental studies with normal and hearing-impaired adults. International journal of audiology, 47 Suppl 2, S53–S71. 10.1080/14992020802301142
  3. Ananthakrishnan S, Krishnan A, & Bartlett E (2016). Human Frequency Following Response. Ear & Hearing, 37(2), e91–e103. 10.1097/aud.0000000000000247
  4. Anderson S, Parbery-Clark A, White-Schwoch T, & Kraus N (2012). Aging Affects Neural Precision of Speech Encoding. The Journal of Neuroscience, 32(41), 14156–14164. 10.1523/jneurosci.2176-12.2012
  5. Beck DL, Danhauer JL, Abrams HB, Atcherson SR, Brown DK, Chasin M, Clark JG, De Placido C, Edwards B, Fabry DA, Flexer C, Fligor B, Frazer G, Galster JA, Gifford L, Johnson CE, Madell J, Moore DR, Roeser RJ, Saunders GH, Searchfield GD, Spankovich C, Valente M, and Wolfe J. (2018). Audiologic considerations for people with normal hearing sensitivity yet hearing difficulty and/or speech-in-noise problems. Hear Rev, 25, 28–38.
  6. Best V, Mason CR, Swaminathan J, Roverud E, & Kidd G Jr (2017). Use of a glimpsing model to understand the performance of listeners with and without hearing loss in spatialized speech mixtures. The Journal of the Acoustical Society of America, 141(1), 81. 10.1121/1.4973620
  7. Billings CJ, Bologna WJ, Muralimanohar RK, Madsen BM, & Molis MR (2019). Frequency following responses to tone glides: Effects of frequency extent, direction, and electrode montage. Hearing research, 375, 25–33. 10.1016/j.heares.2019.01.012
  8. Boersma P, Weenink D (2018). Praat: doing phonetics by computer [computer program].
  9. Bologna WJ, Vaden KI, Ahlstrom JB, & Dubno JR (2018). Age effects on perceptual organization of speech: Contributions of glimpsing, phonemic restoration, and speech segregation. The Journal of the Acoustical Society of America, 144(1), 267. 10.1121/1.5044397
  10. Caspary DM, Raza A, Lawhorn Armour BA, Pippin J, & Arnerić SP (1990). Immunocytochemical and neurochemical evidence for age-related loss of GABA in the inferior colliculus: implications for neural presbycusis. The Journal of Neuroscience, 10(7), 2363–2372. 10.1523/JNEUROSCI.10-07-02363.1990
  11. Clinard CG, & Tremblay KL (2013). Aging degrades the neural encoding of simple and complex sounds in the human brainstem. Journal of the American Academy of Audiology, 24(7), 590–644. 10.3766/jaaa.24.7.7
  12. Cooke M (2006). A glimpsing model of speech perception in noise. The Journal of the Acoustical Society of America, 119(3), 1562–1573. 10.1121/1.2166600
  13. Dubno JR, Horwitz AR, & Ahlstrom JB (2003). Recovery from prior stimulation: masking of speech by interrupted noise for younger and older adults with normal hearing. The Journal of the Acoustical Society of America, 113(4 Pt 1), 2084–2094. 10.1121/1.1555611
  14. Fogerty D, Bologna WJ, Ahlstrom JB, & Dubno JR (2017). Simultaneous and forward masking of vowels and stop consonants: Effects of age, hearing loss, and spectral shaping. The Journal of the Acoustical Society of America, 141(2), 1133–1143. 10.1121/1.4976082
  15. Frisina DR, & Frisina RD (1997). Speech recognition in noise and presbycusis: relations to possible neural mechanisms. Hearing research, 106(1–2), 95–104. 10.1016/s0378-5955(97)00006-3
  16. Gaskins C, Jaekel BN, Gordon-Salant S, Goupell MJ, & Anderson S (2019). Effects of aging on perceptual and electrophysiological responses to acoustic pulse trains as a function of rate. Journal of Speech, Language, and Hearing Research, 62(4S), 1087–1098. 10.1044/2018_JSLHR-H-ASCC7-18-0133
  17. Gordon-Salant S, & Fitzgibbons PJ (1993). Temporal factors and speech recognition performance in young and elderly listeners. Journal of speech and hearing research, 36(6), 1276–1285. 10.1044/jshr.3606.1276
  18. Griz S, Menezes DC, Angelo Venâncio LG, Holanda da Fonsêca N, do Nascimento TO, de Araújo A, Andrade K, & Menezes PL (2020). Effect of Forward Masking on Frequency Following Response as a Function of Age. Journal of the American Academy of Audiology, 31(5), 317–323. 10.3766/jaaa.18104
  19. Grose JH, Mamo SK, & Hall JW 3rd (2009). Age effects in temporal envelope processing: speech unmasking and auditory steady state responses. Ear and Hearing, 30(5), 568–575. 10.1097/AUD.0b013e3181ac128f
  20. Hao W, Wang Q, Li L, Qiao Y, Gao Z, Ni D, & Shang Y (2018). Effects of Phase-Locking Deficits on Speech Recognition in Older Adults With Presbycusis. Frontiers in aging neuroscience, 10, 397. 10.3389/fnagi.2018.00397
  21. Hopkins K, & Moore BC (2009). The contribution of temporal fine structure to the intelligibility of speech in steady and modulated noise. The Journal of the Acoustical Society of America, 125(1), 442–446. 10.1121/1.3037233
  22. Hofmann DA (1997). An overview of the logic and rationale of hierarchical linear models. Journal of Management, 23(6), 723–744. 10.1016/S0149-063(97)90026-X
  23. Humes LE, Dubno JR, Gordon-Salant S, Lister JJ, Cacace AT, Cruickshanks KJ, Gates GA, Wilson RH, & Wingfield A (2012). Central presbycusis: a review and evaluation of the evidence. Journal of the American Academy of Audiology, 23(8), 635–666. 10.3766/jaaa.23.8.5
  24. King A, Hopkins K, & Plack CJ (2016). Differential Group Delay of the Frequency Following Response Measured Vertically and Horizontally. Journal of the Association for Research in Otolaryngology, 17(2), 133–143. 10.1007/s10162-016-0556-x
  25. Krull V, Humes LE, & Kidd GR (2013). Reconstructing wholes from parts: effects of modality, age, and hearing loss on word recognition. Ear and hearing, 34(2), e14–e23. 10.1097/AUD.0b013e31826d0c27
  26. Luke R, De Vos A, & Wouters J (2017). Source analysis of auditory steady-state responses in acoustic and electric hearing. Neuroimage, 147, 568–576. 10.1016/j.neuroimage.2016.11.023
  27. Mamo SK, Grose JH, & Buss E (2016). Speech-evoked ABR: Effects of age and simulated neural temporal jitter. Hearing research, 333, 201–209. 10.1016/j.heares.2015.09.005
  28. McClaskey CM, Dias JW, & Harris KC (2019). Sustained envelope periodicity representations are associated with speech-in-noise performance in difficult listening conditions for younger and older adults. Journal of neurophysiology, 122(4), 1685–1696. 10.1152/jn.00845.2018
  29. Parthasarathy A, Bartlett EL, & Kujawa SG (2019). Age-related Changes in Neural Coding of Envelope Cues: Peripheral Declines and Central Compensation. Neuroscience, 407, 21–31. 10.1016/j.neuroscience.2018.12.007
  30. Schoof T, & Rosen S (2016). The Role of Age-Related Declines in Subcortical Auditory Processing in Speech Perception in Noise. Journal of the Association for Research in Otolaryngology, 17(5), 441–460. 10.1007/s10162-016-0564-x
  31. Shafiro V, Sheft S, Risley R, & Gygi B (2015). Effects of age and hearing loss on the intelligibility of interrupted speech. The Journal of the Acoustical Society of America, 137(2), 745–756. 10.1121/1.4906275
  32. Venezia JH, Martin AG, Hickok G, & Richards VM (2019). Identification of the spectrotemporal modulations that support speech intelligibility in hearing-impaired and normal-hearing listeners. Journal of Speech, Language, and Hearing Research, 62(4), 1051–1067. 10.1044/2018_JSLHR-H-18-0045
  33. Vander Werff KR, & Burns KS (2011). Brain stem responses to speech in younger and older adults. Ear and hearing, 32(2), 168–180. 10.1097/AUD.0b013e3181f534b5
  34. Wang X, & Humes LE (2010). Factors influencing recognition of interrupted speech. The Journal of the Acoustical Society of America, 128(4), 2100–2111. 10.1121/1.3483733
  35. Willott JF, Parham K, & Hunter KP (1988). Response properties of inferior colliculus neurons in young and very old CBA/J mice. Hearing research, 37(1), 1–14. 10.1016/0378-5955(88)90073-1
