Journal of Neurophysiology. 2019 Jul 31;122(4):1685–1696. doi: 10.1152/jn.00845.2018

Sustained envelope periodicity representations are associated with speech-in-noise performance in difficult listening conditions for younger and older adults

Carolyn M McClaskey 1, James W Dias 1, Kelly C Harris 1
PMCID: PMC6843096  PMID: 31365323

Abstract

Temporal modulations are an important part of speech signals. An accurate perception of these time-varying qualities of sound is necessary for successful communication. The current study investigates the relationship between sustained envelope encoding and speech-in-noise perception in a cohort of normal-hearing younger (ages 18–30 yr, n = 22) and older adults (ages 55–90+ yr, n = 35) using the subcortical auditory steady-state response (ASSR). ASSRs were measured in response to the envelope of 400-ms amplitude-modulated (AM) tones with 3,000-Hz carrier frequencies and 80-Hz modulation frequencies. AM tones had modulation depths of 0, −4, and −8 dB relative to m = 1 (m = 1, 0.631, and 0.398, respectively). The robustness, strength at modulation frequency, and synchrony of subcortical envelope encoding were quantified via time-domain correlations, spectral amplitude, and phase-locking value, respectively. Speech-in-noise ability was quantified via the QuickSIN test in the 0- and 5-dB signal-to-noise ratio (SNR) conditions. All ASSR metrics increased with increasing modulation depth and there were no effects of age group. ASSR metrics in response to shallow modulation depths predicted 0-dB speech scores. Results demonstrate that sustained amplitude envelope processing in the brainstem relates to speech-in-noise abilities, but primarily in difficult listening conditions at low SNRs. These findings furthermore highlight the utility of shallow modulation depths for studying temporal processing. The absence of age effects in these data demonstrates that individual differences in the robustness, strength, and specificity of subcortical envelope processing, and not age, predict speech-in-noise performance in the most difficult listening conditions.

NEW & NOTEWORTHY Failure to correctly understand speech in the presence of background noise is a significant problem for many normal-hearing adults and may impede healthy communication. The relationship between sustained envelope encoding in the brainstem and speech-in-noise perception remains to be clarified. The present study demonstrates that the strength, specificity, and robustness of the brainstem’s representations of sustained stimulus periodicity relate to speech-in-noise perception in older and younger normal-hearing adults, but only in highly challenging listening environments.

Keywords: aging, auditory steady-state response, envelope following response, speech perception, temporal processing

INTRODUCTION

Temporal modulations are a fundamental part of auditory signals. An accurate perception of the time-varying qualities of sounds, also known as temporal processing, is important for many aspects of hearing, particularly speech recognition (Shannon et al. 1995). In humans, subcortical temporal processing can be studied via the frequency-following response (FFR), a noninvasive scalp-measured extracellular potential that reflects the phase-locked temporal qualities of the stimulus (Worden and Marsh 1968) and, when elicited by stimuli with periodicities above ~70 Hz, is generated by the brainstem (Wang and Li 2018). By measuring the FFR in response to simple speech stimuli such as consonant-vowel (CV) syllables, studies have demonstrated that the timing of peaks in the response to both the syllable onset and the formant transition predicts speech-in-noise (SIN) perception, particularly when the FFR is elicited by CV syllables in noise (Parbery-Clark et al. 2009a; Song et al. 2011). The introduction of background noise to the stimulus acts as an energetic masker and obscures the envelope by filling in the troughs of the modulated signal (Assmann and Summerfield 2004). This attenuates amplitude modulations and degrades the signal, providing a challenge for the auditory system’s temporal processing mechanisms, resulting in an FFR that has less temporal regularity, diminished amplitude, and an envelope that less closely resembles that of the eliciting stimulus.

The relationship between SIN perception and the steady-state portion of the speech FFR is less straightforward. The speech FFR’s steady-state segment is elicited by the sustained vowel portion of a CV syllable and reflects subcortical encoding of sustained waveform periodicity (Picton 2013). A limited number of speech-FFR studies have found that the brainstem’s representation of sustained vowel periodicity relates to SIN perception in that it negatively correlates with the amount of degradation (measured via time-domain correlations) induced in the FFR by the addition of background noise to the stimulus. In other words, higher correlations between the stimulus and the response, and between the response in quiet and the response in noise, relate to better SIN perception (Parbery-Clark et al. 2009a, 2012b; Song et al. 2011). The FFR magnitude of the entire CV response was found to relate to SIN scores in one study of normal-hearing listeners separated into two groups on the basis of SIN abilities (Anderson et al. 2011), but more commonly, subcortical neural representations of sustained waveform periodicity do not relate to SIN (Anderson et al. 2010, 2012; Hornickel et al. 2009; Parbery-Clark et al. 2009a, 2012b; Presacco et al. 2016; Schoof and Rosen 2016; Vander Werff and Burns 2011; for a review, see Anderson and Kraus 2010b). Taken together, these results provide evidence for the importance of the brainstem’s representation of the timing of speech sounds, particularly in the onset and formant transition periods, but provide conflicting results about the relationship between sustained envelope periodicity encoding and SIN.

The sustained portion of the FFR is functionally and anatomically distinct from the FFR’s onset responses (Bidelman 2015; Wang and Li 2018), which may explain the differential effects of the steady-state period and the onset and transition periods. As a result, the neural mechanisms underlying the representation of sustained periodicity in the brainstem may contribute to SIN perception differently than those responsible for FFR onset and formant transition encoding. To that end, stimuli other than speech or CV syllables may be more appropriate for evaluating how the subcortical neural encoding of sustained waveform periodicity relates to SIN perception. One such class of stimuli, amplitude-modulated (AM) sounds, excludes the word onsets and syllable transitions of natural speech and allows for a direct manipulation of envelope properties such as modulation depth and frequency in a way that is not possible with speech stimuli, making AM sounds highly suitable for studies of sustained temporal envelope encoding. AM stimuli have also been proposed as a means to study bottom-up influences on auditory temporal processing and SIN (Bharadwaj et al. 2014; Shaheen et al. 2015).

Fewer studies have investigated the link between sustained amplitude envelope encoding in the brainstem and SIN perception. Brainstem responses elicited by the amplitude envelopes of AM stimuli are often called envelope following responses (EFRs), auditory steady-state responses (ASSRs), steady-state auditory evoked potentials (SSAEPs), or amplitude-modulation-following responses (AMFRs). Whereas “envelope following response” refers to any physiological response to envelope modulations regardless of the eliciting stimuli, ASSRs refer to those EFRs that are elicited by stationary stimuli, such as AM sounds, and that have constant amplitude and phase across time (Rance 2008). The terms ASSR and EFR are often used interchangeably, but for clarity and to reflect our focus on sustained envelope modulations, the current report uses the term ASSR. ASSRs measured in response to 80-Hz modulation at moderately shallow modulation depths and quantified by the number of responses that exceeded the noise floor have demonstrated that the number of significant responses and their amplitudes significantly predict word recognition scores in quiet and in the presence of speech-shaped maskers (Dimitrijevic et al. 2001, 2004). This was the case for both older and younger normal-hearing listeners and for older hearing-impaired listeners. ASSRs to AM tones of a similar modulation frequency, measured across a wider range of modulation depths, show a similar relationship for low-probability sentences but not for high-probability sentences, and for carrier frequencies of 500 Hz but not for 2,000 Hz (Leigh-Paffenroth and Fowler 2006).

In contrast, the absence of a relationship between sustained envelope processing and SIN perception has also been reported. Guest et al. (2018) quantified both ASSR amplitude and the change in amplitude with modulation depth and found that neither predicted a difference between listeners with and without SIN difficulties. In two additional studies with both normal-hearing and hearing-impaired listeners, subcortical ASSRs from normal-hearing listeners were not associated with the speech-reception thresholds of masked sentences (Goossens et al. 2018) or with word identification in noise and in quiet (Leigh-Paffenroth and Murnane 2011), but ASSRs from hearing-impaired listeners demonstrated a mixed pattern of results: stronger subcortical responses were associated with poorer SIN in Goossens et al. (2018), but with better SIN in Leigh-Paffenroth and Murnane (2011). Taken together, these studies extend the speech-evoked FFR literature to show that subcortical neural temporal processing relates to SIN perception but provide conflicting findings regarding the precise relationship between sustained envelope encoding and SIN perception.

The primary goal of the present study is thus to clarify the role of AM coding in SIN perception by normal-hearing listeners. Sustained waveform periodicity is important for extracting the voice fundamental in continuous speech, which aids in talker identification and sound segregation and is thus an important part of hearing in noise (Anderson and Kraus 2010a; Picton 2013). This may be especially important in difficult listening conditions where overlapping talkers and background noise degrade the modulation envelope of a talker’s voice and thereby make it more difficult to follow the talker in conversation. Therefore, we hypothesize that the fidelity of sustained envelope representations in normal-hearing listeners should relate to SIN perception in difficult listening conditions where other speech cues are less informative. Understanding how subcortical processing of waveform periodicity relates to SIN abilities is important for understanding the neural basis of SIN deficits.

The present study measures ASSRs in response to AM tones of varying modulation depths and relates this to SIN recognition. To obtain a comprehensive measure of neural encoding, we quantify brainstem encoding in terms of robustness to degradation (via time-domain correlations), the strength and specificity of the response at the modulation frequency, and synchrony. Similar to the masking effect of background noise, decreasing the modulation depth of AM signals diminishes the amplitude envelope and degrades the signal. We therefore test a range of modulation depths, including shallow ones, to evaluate the subcortical encoding of waveforms that better approximate real-world speech envelopes in background noise. We also calculate the slope of ASSR strength and specificity across modulation depths (Bharadwaj et al. 2014). Whereas the slope of the ASSR as a function of modulation depth acts as a normalized measure of ASSR encoding, low modulation depths such as those tested in Dimitrijevic et al. (2004) may also be optimal for describing this relationship because they more closely approximate speech envelopes in adverse listening conditions. A secondary goal of the present study is to evaluate the viability of low-modulation-depth stimuli for investigating temporal processing and its relationship to SIN perception. We predict that the neural encoding of AM sounds with shallow modulation depths will best predict SIN perception for both older and younger listeners.

Although not universal, certain aspects of temporal processing decline with age and may affect speech understanding in difficult listening conditions (Anderson et al. 2012; Harris and Dubno 2017; Strouse et al. 1998). We include both older and younger normal-hearing listeners in the present study to evaluate the effect of age on the subcortical periodicity encoding of AM sounds. Age differences in subcortical envelope encoding may depend on modulation frequency: age effects in the FFR have been documented at higher modulation frequencies close to 1,000 Hz but not at lower ones (Clinard and Tremblay 2013; Purcell et al. 2004), although some have been found in response to stimuli with periodicities as low as 150 Hz (Grose et al. 2009; Leigh-Paffenroth and Fowler 2006). Age effects are rarely documented below 80 Hz (Boettcher et al. 2001; Purcell et al. 2004). It is possible that age does not result in overall changes to ASSR metrics for frequencies below 100 Hz, but instead affects how the envelope representations change with modulation depth (Dimitrijevic et al. 2016). Therefore, we do not expect to see a dramatic age group difference in subcortical envelope encoding for the modulation frequency employed in the present study and instead predict that modulation depth affects older and younger listeners differently, resulting in different slopes of ASSR metrics by modulation depth in younger and older listeners.

METHODS

Participants

Fifty-seven adults with clinically normal audiograms from 250 to 3,000 Hz were recruited from the greater Charleston community: 35 older adults [ages 55–90+ yr, mean age (M) = 66.7 yr, SD = 7.3 yr; 25 women] and 22 younger adults (ages 18–30 yr, M = 24.7 yr, SD = 3.6 yr; 12 women). Audiometric thresholds were measured at 250, 500, 1,000, 2,000, 3,000, 4,000, 6,000, and 8,000 Hz (Fig. 1). All participants had pure-tone air conduction thresholds at or better than 25 dB hearing level (HL) from 250 to 3,000 Hz in the right ear, between-ear threshold differences ≤15 dB, and normal otoscopic and neurological assessments, evaluated via the Mini-Mental State Exam (MMSE). All participants provided written informed consent in accordance with procedures approved by the Medical University of South Carolina’s Institutional Review Board.

Fig. 1.

Mean pure-tone air conduction thresholds at audiometric test frequencies for the right ear. Closed triangles indicate younger participants and open circles indicate older participants. Vertical bars indicate 95% confidence intervals. dB HL, decibels in hearing level.

Neurophysiology Measurements

Stimuli and equipment.

ASSRs were measured in response to AM transposed tones with carrier frequencies of 3,000 Hz and modulation frequencies of 80 Hz. Transposed tones were generated by modulating a 3,000-Hz carrier tone with a half-wave-rectified sinusoid with a modulation frequency of 80 Hz (Bernstein and Trahiotis 2004; Dreyer and Delgutte 2006). The resultant tone is an AM signal that generates better phase-locking than sinusoidal AM tones. Transposed tones had modulation depths of 0, −4, or −8 dB relative to m = 1, calculated using the equation dB = 20 log10(m) (Bharadwaj et al. 2015; Guest et al. 2018). These correspond to modulation depths of m = 1, 0.631, and 0.398, respectively. All tones included 15-ms on- and off-ramps, and the total duration was 400 ms. Stimuli were generated offline in MATLAB (The MathWorks, Natick, MA) with a sampling rate of 50 kHz and were stored for later playback. One thousand trials of each modulation depth were played in random order through ER-3C headphones (Etymotic Research, Elk Grove Village, IL) to the right ear at 75 dB using MATLAB in conjunction with a Tucker-Davis RZ6 processor (Tucker-Davis Technologies, Alachua, FL). The mean interstimulus interval was 1,000 ms (±100 ms) and was jittered from trial to trial. Stimuli were generated with alternating polarity, and responses from both polarities were averaged together to enhance envelope responses and exclude the cochlear microphonic and stimulus artifacts (Aiken and Picton 2008).
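The depth conversion and stimulus construction described above can be sketched in Python. This is an illustration only, not the authors' MATLAB code: the way the depth m is applied to the half-wave-rectified envelope, the raised-cosine ramp shape, and all function names are assumptions, and the envelope low-pass filtering often applied to transposed tones is omitted.

```python
import numpy as np

def depth_db_to_m(depth_db):
    """Invert dB = 20*log10(m) to recover the linear modulation depth m."""
    return 10.0 ** (depth_db / 20.0)

def transposed_tone(depth_db, fc=3000.0, fm=80.0, dur=0.4, fs=50000, ramp=0.015):
    """Sketch of an AM transposed tone (carrier fc, modulator fm, depth in dB re m = 1)."""
    t = np.arange(int(round(dur * fs))) / fs
    m = depth_db_to_m(depth_db)
    # Half-wave-rectified sinusoid used as the modulator.
    hwr = np.maximum(np.sin(2.0 * np.pi * fm * t), 0.0)
    # ASSUMPTION: depth m scales the modulator around a DC floor of (1 - m).
    envelope = (1.0 - m) + m * hwr
    tone = envelope * np.sin(2.0 * np.pi * fc * t)
    # ASSUMPTION: raised-cosine on/off ramps (the ramp shape is not stated in the text).
    n_ramp = int(round(ramp * fs))
    ramp_up = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    tone[:n_ramp] *= ramp_up
    tone[-n_ramp:] *= ramp_up[::-1]
    return t, tone

for depth in (0.0, -4.0, -8.0):
    print(depth, round(depth_db_to_m(depth), 3))
```

With these definitions, depths of 0, −4, and −8 dB map to m ≈ 1, 0.631, and 0.398, matching the values reported in the text.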

EEG recording procedure.

EEG was recorded with a 64-channel Neuroscan QuikCap with electrodes in the international 10–20 coordinate system, connected to a SynAmps RT 64-channel amplifier (Compumedics USA, Charlotte, NC). Recordings were collected using Compumedics Curry 7.0 software at a sampling rate of 5,000 Hz in an electrically and acoustically shielded room. Vertical eye movements were monitored using bipolar electrodes placed on the skin above and below the left eye. Participants reclined in a comfortable chair and were encouraged to rest quietly for the duration of testing to minimize muscle artifacts.

Data were preprocessed offline in MATLAB via custom scripts in conjunction with EEGlab (Delorme et al. 2015). Recordings were referenced to the mastoids and high-pass filtered at 70 Hz. Recordings were epoched from −50 to +450 ms relative to stimulus onset and baseline-corrected to a baseline window from −50 to 0 ms prestimulus. Epochs with voltages exceeding ±100 µV were rejected and visually inspected. Artifact-free trials were averaged to yield the event-related potential. One younger participant was excluded from the analysis due to excessive voltage measurements >1,000 µV.
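The baseline correction, amplitude-based rejection, and averaging steps can be illustrated with a short Python sketch. It is a simplified stand-in for the authors' MATLAB/EEGlab pipeline: the function name and array layout (trials × samples, in µV) are assumptions, and referencing, filtering, and visual inspection are not reproduced.

```python
import numpy as np

def average_clean_epochs(epochs_uv, times_ms, threshold_uv=100.0):
    """Baseline-correct epochs, drop those exceeding +/- threshold, and average.

    epochs_uv: array of shape (n_trials, n_samples), in microvolts (assumed layout).
    times_ms:  sample times relative to stimulus onset, in milliseconds.
    """
    epochs = np.asarray(epochs_uv, dtype=float)
    times = np.asarray(times_ms, dtype=float)
    # Subtract the mean of the -50 to 0 ms prestimulus window from each epoch.
    baseline = (times >= -50.0) & (times < 0.0)
    epochs = epochs - epochs[:, baseline].mean(axis=1, keepdims=True)
    # Keep only epochs whose absolute voltage never exceeds the threshold.
    keep = np.abs(epochs).max(axis=1) <= threshold_uv
    return epochs[keep].mean(axis=0), keep
```

Returning the boolean `keep` mask alongside the average makes it easy to report how many trials survived rejection for each participant and condition.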

Data Analysis

ASSRs were analyzed via correlational analysis in the time domain (Anderson et al. 2011; Bidelman and Krishnan 2010; Clinard and Cotter 2015) and via Fourier analysis in the frequency domain (Clinard et al. 2010). Whole-head scalp topographies of the 80-Hz ASSR amplitude response (see below) for both groups were calculated as a preliminary analysis to determine the channels maximally contributing to the ASSR response (see Fig. 2). Together with previous research showing that the brainstem-generated FFRs are maximally observed over frontal-central electrodes (Bidelman 2015), we selected a cluster of 17 frontal-central channels to include in subsequent analyses: AF3, AF4, F3, F1, FZ, F2, F4, FC3, FC1, FCZ, FC2, FC4, C3, C1, CZ, C2, C4. Metrics were calculated on each channel separately and then averaged across channels to yield data points for each person and each condition.

Fig. 2.

Scalp topography of the auditory steady-state response (ASSR) at 80 Hz in response to an 80-Hz amplitude-modulated signal. Maps show the group-averaged spectral amplitude of the 80-Hz ASSR (in signal-to-noise ratio, SNR) for younger and older listeners. Maximal spectral amplitude is observed over frontal regions on the scalp. Circles indicate channels used in the cluster analysis.

Figure 3 shows representative ASSR responses at a single electrode (CZ) from a younger subject in the three modulation depth conditions. Metrics were calculated at all modulation depths, and slope was calculated as a function of modulation depth (Bharadwaj et al. 2014; Guest et al. 2018). The amplitude of the ASSR has been shown to decrease linearly with logarithmic changes in modulation depth (Ross et al. 2000), and thus we calculated a linear change in ASSR as a function of logarithmic steps (dB) in modulation depth. Pilot analyses show that the choice of logarithmic vs. linear steps in modulation did not affect the slope results (data not shown).
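The slope calculation amounts to a straight-line fit of each metric against modulation depth in dB. A minimal Python sketch, assuming an ordinary least-squares fit over the three depths (the fitting method and the function name are assumptions, not details given in the text):

```python
import numpy as np

def depth_slope(metric_values, depths_db=(-8.0, -4.0, 0.0)):
    """Least-squares slope of an ASSR metric per dB of modulation depth."""
    slope, _intercept = np.polyfit(depths_db, metric_values, 1)
    return float(slope)

# Hypothetical metric values at -8, -4, and 0 dB, for illustration only.
print(depth_slope([1.0, 2.0, 3.0]))
```

A steeper positive slope indicates a metric that grows more quickly as the modulation deepens.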

Fig. 3.

Sample auditory steady-state responses (ASSRs) from a representative subject at the 3 experimental conditions of 0-dB modulation (top), −4-dB modulation (middle), and −8-dB modulation (bottom). Left: ASSRs in the time domain. Insets show a 50-ms sample of the amplitude-modulated stimuli (not to scale). Right: power in the frequency domain calculated from the output of a fast Fourier transform, demonstrating the peak at the modulation frequency.

ASSR metrics.

Stimulus-to-response correlation coefficients.

To determine how accurately the ASSR waveform represented the stimulus periodicity at each modulation depth, stimulus-to-response (Pearson’s r) correlations were performed between the ASSR and the half-wave-rectified sinusoid that is the envelope of the transposed tones. Higher stimulus-to-response correlations indicate a more accurate representation of the stimulus envelope. For each participant and each condition, the trial-averaged ASSR from each electrode was bandpass filtered via a 4th-order infinite impulse response (IIR) filter with cutoff frequencies of 65 and 1,000 Hz (Clinard et al. 2010), and the first 50 ms of the waveform were then discarded to avoid confounds introduced by the response onset. Stimulus-to-response correlation coefficients were then calculated over all positive time lags. Negative lags (in which the response waveform led the stimulus waveform) were not included because they are not physiologically plausible. The maximum coefficient was taken as the measure of stimulus-to-response correlation (Anderson et al. 2011; Bidelman and Krishnan 2010; Clinard and Cotter 2015), and the lag time (in ms) of the maximum correlation coefficient was taken as the stimulus-to-response correlation latency. Greater stimulus-to-response correlational latencies indicate a greater delay in the ASSR waveform relative to the eliciting AM stimulus.
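The lagged correlation search can be sketched as follows. This is a hedged Python illustration of the general technique: the function name and the maximum lag searched (`max_lag_ms`) are assumptions, since the text does not state the lag range, and the bandpass filtering and 50-ms onset trimming are assumed to have been applied beforehand.

```python
import numpy as np

def max_lagged_correlation(stimulus_env, response, fs, max_lag_ms=20.0):
    """Return (max Pearson r, lag in ms) over non-negative lags only.

    A positive lag means the response trails the stimulus envelope.
    """
    stimulus_env = np.asarray(stimulus_env, dtype=float)
    response = np.asarray(response, dtype=float)
    max_lag = int(round(max_lag_ms * fs / 1000.0))
    best_r, best_lag = -np.inf, 0
    for lag in range(max_lag + 1):
        # Overlapping segment when the response is shifted back by `lag` samples.
        n = min(len(stimulus_env), len(response) - lag)
        r = np.corrcoef(stimulus_env[:n], response[lag:lag + n])[0, 1]
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, 1000.0 * best_lag / fs
```

Because an 80-Hz envelope repeats every 12.5 ms, lags that differ by a whole period give nearly identical coefficients, so the search range influences which latency is reported.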

Response-to-response correlation coefficients.

To examine how robust the ASSR is to decreases in modulation depth, Pearson correlations were calculated between the ASSRs from the highest (m = 0 dB) and lowest (m = −8 dB) modulation depth conditions for each participant (Anderson et al. 2011; Bidelman and Krishnan 2010; Clinard and Cotter 2015). Like the stimulus-to-response correlational analysis, trial-averaged ASSRs from each electrode were first bandpass filtered, and then the initial 50 ms of the response were discarded. The maximum correlation coefficient across all time lags (without being restricted to positive lags only) was taken as the cross-correlation value for each person (Bidelman and Krishnan 2010). A higher correlation coefficient indicates greater correspondence between the responses to the two conditions, i.e., that the response was less adversely affected by a change in modulation depth.

ASSR amplitude.

To measure the neural strength and specificity of the brainstem’s representation of modulation frequency, the amplitude of the ASSR at the modulation frequency was quantified as signal-to-noise ratio (SNR). A fast Fourier transform (FFT) was performed on the ASSR waveform from each electrode using Hanning windowing and without zero padding. The power in the band corresponding to the modulation frequency (i.e., 80 Hz) was taken as the signal. The mean power in 10 frequency bands surrounding the modulation frequency (±5 bands around 80 Hz, corresponding to ±12.5 Hz) was taken as the noise estimate (Clinard et al. 2017; Dobie and Wilson 1996). SNR was then calculated as the square root of the ratio of signal power (Spower) to noise power (Snoise), using the following equation (Clinard et al. 2010; Dobie and Wilson 1996):

$$\mathrm{SNR} = \sqrt{\frac{S_{\mathrm{power}}}{S_{\mathrm{noise}}}}.$$

Higher SNR values indicate greater strength and specificity of the envelope representation in the brainstem at the modulation frequency.
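The spectral SNR described above can be sketched in Python with NumPy. The function name is hypothetical, and the sketch assumes a 400-ms analysis window, which is consistent with the 2.5-Hz bin spacing implied by ±5 bands spanning ±12.5 Hz. With a Hann window, some signal energy leaks into the bins immediately adjacent to 80 Hz; this sketch includes those bins in the noise estimate, and whether the authors handled them differently is not stated.

```python
import numpy as np

def assr_snr(waveform, fs, fm=80.0, n_side=5):
    """Amplitude SNR at fm: sqrt(signal power / mean power of surrounding bins)."""
    x = np.asarray(waveform, dtype=float)
    n = len(x)
    # Hann-windowed FFT without zero padding.
    power = np.abs(np.fft.rfft(x * np.hanning(n))) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    k = int(np.argmin(np.abs(freqs - fm)))  # bin closest to the modulation frequency
    # n_side bins below and n_side bins above the signal bin.
    noise_bins = np.concatenate([np.arange(k - n_side, k),
                                 np.arange(k + 1, k + n_side + 1)])
    return float(np.sqrt(power[k] / power[noise_bins].mean()))
```

For a 400-ms waveform sampled at 5,000 Hz, `np.fft.rfftfreq` gives bins every 2.5 Hz, so the 80-Hz component falls exactly on a bin.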

Phase-locking value.

To estimate the neural phase-locking of the ASSR, a phase-locking value (PLV) was also calculated for each condition, performed on the non-trial-averaged ASSR waveforms (Clinard and Tremblay 2013; Tallon et al. 1996). Because phase delay does not change as a function of modulation depth and remains fixed for a given modulation frequency (Picton et al. 1987), we were able to calculate PLV from the output of the FFT rather than a time-frequency decomposition. PLV is the length of the vector formed by averaging the complex phase angles of each trial, which are calculated from the complex output of the FFT (Lachaux et al. 1999). An FFT was performed on each individual trial using Hanning windowing and no zero padding, and PLV was calculated using the following equation (Cohen 2014; Lachaux et al. 1999):

$$\mathrm{PLV}_f = \left| \frac{1}{n} \sum_{r=1}^{n} e^{i k_{fr}} \right|,$$

where n is the number of trials and kfr is the phase angle k on trial r for frequency f. The PLV at the modulation frequency was then measured and is a normalized value between 0 and 1. Higher PLV indicates greater phase coherence across all trials and is used as an indicator of neural synchrony (Lachaux et al. 1999).
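The PLV definition above translates directly into a few lines of NumPy. This is a minimal sketch under assumptions: the function name and the trials × samples array layout are hypothetical, and the single-trial preprocessing is assumed to have been done already.

```python
import numpy as np

def plv_at_frequency(trials, fs, fm=80.0):
    """Phase-locking value at fm from single-trial FFTs (Hann window, no zero padding)."""
    trials = np.asarray(trials, dtype=float)      # shape (n_trials, n_samples)
    n_samples = trials.shape[1]
    spectra = np.fft.rfft(trials * np.hanning(n_samples), axis=1)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    k = int(np.argmin(np.abs(freqs - fm)))        # bin closest to fm
    phases = np.angle(spectra[:, k])              # phase angle per trial
    # Length of the mean unit phase vector across trials: 0 (random) to 1 (identical).
    return float(np.abs(np.mean(np.exp(1j * phases))))
```

Because only the phase angles enter the average, trial-to-trial amplitude differences do not affect the result.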

Behavioral speech tests.

Forty-nine of the 56 participants in the ASSR experiment additionally completed the behavioral speech tests. All were native English speakers and had no prior experience with the speech tests used in the current study. SIN perception was evaluated using the Quick Speech-in-Noise Test (QuickSIN; Etymotic Research; Killion et al. 2004). QuickSIN is a nonadaptive speech test in which listeners hear semantically and syntactically simple sentences spoken by a woman in the presence of four-talker babble, which is composed of three women and one man, and must repeat what they heard. Five lists were presented to each participant, each with six sentences that progressively decreased in SNR from 25 to 0 dB in 5-dB steps. The test was administered in an acoustically shielded room using TDH-39 headphones at 70 dB SPL. Similar to previous studies in which performance in only the most difficult condition was of interest, we restricted our analysis to the SNR = 0 dB and SNR = 5 dB conditions and used the percent correct as a metric of SIN perception (Song et al. 2011). The metric of SNR loss was also calculated to determine if age significantly affected SIN scores.

The QuickSIN test is advantageous to the current investigation for several reasons. The QuickSIN test uses less semantically predictable words than other speech tests and is more difficult than other tests of SIN ability (Wilson et al. 2007), and is therefore appropriate for investigating SIN perception in difficult listening environments. The QuickSIN test’s progressive decreases in speech SNR relative to noise also parallel the decreasing modulation depth of our AM stimuli. QuickSIN has also been used previously to relate individual differences in neural data to SIN performance (Song et al. 2011).

Experimental design and statistical analyses.

Statistical analyses were done in SPSS version 24 and R version 1.1.423 (R Core Team 2017). Mixed analyses of covariance (ANCOVA) were used to test whether ASSR stimulus-to-response correlation coefficients and latencies, ASSR amplitude, and PLV were affected by modulation depth while controlling for pure-tone audiometric thresholds. These were done with a within-subjects factor of modulation depth and a between-subjects factor of age group, with average pure-tone thresholds included as a covariate. The pure-tone thresholds included in all analyses were calculated as an average of thresholds from 2, 3, and 4 kHz. Response-to-response correlations examine ASSR responses across modulation depths, and the factor of modulation depth was thus omitted from the relevant ANCOVA. The Greenhouse-Geisser correction was used when Mauchly’s test indicated that the assumption of sphericity was violated.

The predictive relationships between ASSR metrics and QuickSIN scores at 0 and 5 dB were evaluated via regression in R (Rigby and Stasinopoulos 2005). QuickSIN responses are proportions bounded on a scale from 0 to 1, and thus a standard Gaussian linear regression model is inappropriate. Instead, we used a nonlinear model with a beta distribution (Ferrari and Cribari-Neto 2004). Because the QuickSIN responses in the current study furthermore include values at 0 and 1, we used zero- and one-inflated beta regressions (Ospina and Ferrari 2010). Inflated beta regressions are three-parameter models that model the data using an additive combination of a continuous beta distribution, Beta(μ,σ), for responses between 0 and 1 and a Bernoulli distribution that assigns probabilities to responses at either 0 or 1 (Ferrari and Cribari-Neto 2004; Ospina and Ferrari 2010, 2012). The mean and variance of the QuickSIN response variable y can be written as

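$$E(y) = \alpha c + (1 - \alpha)\mu$$
$$\mathrm{Var}(y) = (1 - \alpha)\frac{\mu(1 - \mu)}{\sigma + 1} + \alpha(1 - \alpha)(c - \mu)^2$$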

where c is either 0 or 1 for zero- and one-inflated beta distributions, respectively, α is the probability of a response at c, and μ and σ are the mean and precision parameters of the beta distribution Beta(μ, σ). Note that when 0 < y < 1, the expected value of the QuickSIN response variable becomes (Ospina and Ferrari 2012)

$$E(y) = \mu.$$
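As a quick numerical check of the mean and variance expressions for the inflated beta distribution, the following Python sketch evaluates them for illustrative parameter values. The function name and the example values are hypothetical and are not estimates from the study.

```python
def inflated_beta_moments(alpha, mu, sigma, c):
    """Mean and variance of a zero- (c = 0) or one-inflated (c = 1) beta variable.

    alpha: probability of a response exactly at c
    mu:    mean of the continuous beta component
    sigma: precision of the continuous beta component
    """
    mean = alpha * c + (1.0 - alpha) * mu
    var = ((1.0 - alpha) * mu * (1.0 - mu) / (sigma + 1.0)
           + alpha * (1.0 - alpha) * (c - mu) ** 2)
    return mean, var

# Illustrative values only: 25% of scores exactly at 0, beta mean 0.4, precision 9.
print(inflated_beta_moments(alpha=0.25, mu=0.4, sigma=9.0, c=0))
```

Setting `alpha` to 0 removes the point mass, so the mean reduces to μ and the variance to μ(1 − μ)/(σ + 1), the moments of the continuous beta component alone.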

The parameters are linked to responses by link functions and linear predictors. The values for the precision (σ) and mixture (α) parameters were assumed to be constant for all observations. The inflated beta regression is thus specified as (Ferrari and Cribari-Neto 2004; Pereira and Cribari-Neto 2014)

$$g(\mu) = \sum_{i=1}^{m} \beta_i x_{ti}$$
$$b(\sigma) = \gamma_0$$
$$h(\alpha) = \nu_0$$

where g(.), b(.), and h(.) are link functions, γ0 and ν0 are constants, and β = β1, …, βm is a vector of regression coefficients for covariate terms xt1, …, xtm. A logit link function was used for the mean (μ) and precision (σ) parameters, and an identity link function was used for the mixture (α) parameter (Ospina and Ferrari 2010, 2012).

Regressions were modeled in a generalized additive modeling framework implemented in the GAMLSS package (Rigby and Stasinopoulos 2005; Stasinopoulos and Rigby 2007; reference manual available at https://cran.r-project.org/web/packages/gamlss/index.html), and the distribution families “BEINF0” and “BEINF1” were used to indicate zero- and one-inflated beta regressions, respectively. Zero-inflated beta regressions were used to model QuickSIN 0-dB SNR responses, and one-inflated beta regressions were used to model QuickSIN 5-dB SNR responses.

All regressions were bootstrapped with 1,000 iterations, and 95% bias-corrected and accelerated confidence intervals (CI) were calculated for each regression coefficient estimate. To correct for inflated type I error rates arising from multiple regressions per response variable, we adjusted critical P values (Pcritical) according to the Benjamini-Hochberg (BH) false discovery rate method (Benjamini and Hochberg 1995).
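The Benjamini-Hochberg step-up procedure can be sketched in Python as follows. This illustrates the standard method rather than the authors' code: the function name is hypothetical, and the false discovery rate of q = 0.05 is an assumption, since the level used is not stated here.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return (reject flags in input order, rank-wise critical P values)."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)                        # indices of P values, ascending
    critical = q * np.arange(1, m + 1) / m       # critical value for rank i is q * i / m
    passed = p[order] <= critical
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = int(np.max(np.nonzero(passed)[0]))   # largest rank meeting its criterion
        reject[order[:k + 1]] = True             # reject that rank and all smaller ranks
    return reject, critical

# Hypothetical P values from four regressions on one response variable.
reject, critical = benjamini_hochberg([0.001, 0.04, 0.03, 0.5])
print(reject, critical)
```

In this example only the smallest P value falls below its rank-specific critical value, so only that test is declared significant.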

RESULTS

Audiometric Thresholds

Pure-tone audiometric thresholds from 2, 3, and 4 kHz were averaged together to yield a measure of pure-tone hearing near the stimulus carrier frequency. Independent t tests reveal that pure-tone thresholds were significantly higher (poorer) for older listeners than for younger listeners [t(54) = −8.002, P < 0.001]. To account for differences in audibility, all further analyses control for pure-tone thresholds using the averaged thresholds of 2, 3, and 4 kHz as a covariate.

Effects of Modulation Depth and Age on the ASSR

We examined how the ASSR metrics of correlation coefficients and latencies, ASSR amplitude, and PLV varied with modulation depth and age while controlling for audibility. ASSR metrics at each modulation depth for both age groups are shown in Fig. 4.

Fig. 4.

Auditory steady-state response (ASSR) metrics of stimulus-to-response correlation coefficients (top left) and latencies (top right), ASSR amplitude (bottom left), and phase-locking value (PLV; bottom right) as a function of modulation depth for younger (solid lines, closed triangles) and older listeners (dashed lines, open circles). Shaded symbols indicate data from each participant. Lines indicate the mean for each group. Error bars are 95% confidence intervals.

Stimulus-to-response correlations.

An ANCOVA with modulation depth as a within-subjects factor, age group as a between-subjects factor, and pure-tone thresholds as a covariate showed a significant main effect of modulation depth on stimulus-to-response correlation coefficients [F(1.776,94.142) = 30.111, P < 0.001], with coefficients increasing with increasing modulation depth. Post hoc contrast analyses revealed that correlation coefficients at 0-dB modulation were significantly greater than those at −4-dB modulation [F(1,53) = 21.324, P < 0.001], which were in turn significantly greater than correlation coefficients at −8-dB modulation [F(1,53) = 14.323, P < 0.001]. There was no significant effect of age group or pure-tone thresholds and no interaction between age group and modulation depth or between pure-tone thresholds and modulation depth. An ANCOVA with stimulus-to-response correlational latencies shows no significant effects.

Response-to-response correlations.

An ANCOVA with response-to-response correlation coefficients as the dependent variable, age group as a between-subjects factor, and pure-tone thresholds as a covariate showed no significant main effect of either age group or pure-tone thresholds.

ASSR amplitude.

ANCOVA showed a significant effect of modulation depth on ASSR amplitude [F(1.794,95.098) = 19.339, P < 0.001], with ASSR amplitude increasing with increasing modulation depth. Post hoc contrast analyses revealed that amplitude at 0-dB modulation was significantly greater than that at −4-dB modulation [F(1,53) = 13.245, P = 0.001] and that ASSR amplitude at −4-dB modulation was significantly greater than amplitude at −8-dB modulation [F(1,53) = 9.466, P = 0.003]. There was no significant effect of age group or pure-tone thresholds and no interaction between age group and modulation depth or between pure-tone thresholds and modulation depth.

Phase-locking value.

An ANCOVA showed that modulation depth significantly affected PLV [F(1.407,74.581) = 50.245, P < 0.001], with PLV increasing as modulation depth increased. Post hoc contrast analyses revealed a significant increase in PLV as modulation depth increased from −8 to −4 dB [F(1,53) = 27.938, P < 0.001] and from −4 to 0 dB [F(1,53) = 43.801, P < 0.001]. There was no significant effect of age group or pure-tone thresholds and no significant interactions. Taken together, these results agree with previous findings on the effect of modulation depth on ASSRs in that AM signals with greater modulation depths elicit stronger ASSR responses (Dimitrijevic et al. 2004, 2016).
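The modulation depths above are expressed in dB relative to full modulation (m = 1). As a quick check on the linear values given in the abstract (m = 1, 0.631, and 0.398), the conversion m = 10^(dB/20) can be sketched in Python as follows. The function name is illustrative and is not taken from the study's analysis code.

```python
def depth_db_to_index(depth_db):
    """Convert a modulation depth in dB (re: m = 1) to a linear modulation index m."""
    return 10.0 ** (depth_db / 20.0)

# The three modulation depths used in the study.
for depth_db in (0, -4, -8):
    print(depth_db, round(depth_db_to_index(depth_db), 3))
```

Each 4-dB step therefore scales the modulation index by a factor of about 0.63, so the −8-dB stimulus carries roughly 40% of the envelope fluctuation of the fully modulated tone.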

Behavioral Results

In the QuickSIN 0 dB condition, independent t tests showed that younger adults (0.069 ± 0.087, mean ± SD) and older adults (0.080 ± 0.099) did not significantly differ [t(47) = −0.397, P = 0.693]. QuickSIN 5 dB scores from younger adults (0.906 ± 0.073) also did not significantly differ [t(47) = 1.80, P = 0.078] from those of older adults (0.847 ± 0.121). There was also no significant difference between younger (1.00 ± 0.99) and older listeners (1.46 ± 1.68) when QuickSIN scores were quantified via the traditional metric of SNR loss [t(47) = −1.014, P = 0.316] (Killion et al. 2004). Our results agree with previous studies that have similarly considered only difficult listening conditions (Song et al. 2011), as well as studies that failed to find an age effect in QuickSIN tasks when SIN was quantified using SNR loss in normal-hearing listeners (Sheft et al. 2012).

Relationship Between ASSR Metrics and SIN Scores

Neither ASSR metrics nor QuickSIN scores differed between younger and older listeners, indicating that age may play a smaller role than individual differences in this task. Therefore, age was not included in any subsequent analyses. The regression submodels for μ thus take the general form (Ferrari and Cribari-Neto 2004)

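$$g(\mu) = \beta_0 + \beta_{\mathrm{ASSR}}\, x_{\mathrm{ASSR}} + \beta_{\mathrm{PT}}\, x_{\mathrm{PT}},$$

where $g(\cdot)$ indicates the logit link function, $\mu$ indicates the mean (expected value) of the response $y$ for $0 < y < 1$, $\beta_0$ indicates the intercept, $\beta_{\mathrm{ASSR}}$ indicates the linear predictor for ASSR term $x_{\mathrm{ASSR}}$, and $\beta_{\mathrm{PT}}$ indicates the linear predictor for pure-tone threshold term $x_{\mathrm{PT}}$.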
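To make the role of the logit link concrete, the following minimal Python sketch maps a linear predictor onto the (0, 1) scale of μ. The coefficient and predictor values are hypothetical placeholders chosen only for illustration; they are not estimates from this study, and the function names are likewise illustrative.

```python
import math

def logit(mu):
    """Logit link g(mu) = log(mu / (1 - mu)), defined for 0 < mu < 1."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse link: maps a linear predictor eta back to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def predicted_mean(b0, b_assr, x_assr, b_pt, x_pt):
    """Mean of the beta submodel: mu = g^{-1}(b0 + b_assr*x_assr + b_pt*x_pt)."""
    return inv_logit(b0 + b_assr * x_assr + b_pt * x_pt)

# Hypothetical coefficients and predictor values (NOT taken from the study).
mu_low = predicted_mean(b0=-2.0, b_assr=1.5, x_assr=0.0, b_pt=0.0, x_pt=10.0)
mu_high = predicted_mean(b0=-2.0, b_assr=1.5, x_assr=1.0, b_pt=0.0, x_pt=10.0)
print(round(mu_low, 3), round(mu_high, 3))
```

Because the link is nonlinear, a positive coefficient raises μ monotonically, but not by a fixed amount per unit change in the predictor.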

Speech scores in the QuickSIN 5-dB condition were not significantly predicted by any of the ASSR metrics measured (data not shown). The following section presents only the results for regressions with performance in the QuickSIN 0 dB condition as the dependent variable. Pure-tone thresholds were not a significant predictor in any of the models tested [t(45) < 1.774, P > 0.083], and thus only the linear predictors for the ASSR metrics are reported below.

Stimulus-to-response correlations.

For QuickSIN scores at 0 dB, stimulus-to-response correlation coefficients significantly predicted speech scores at modulation depths of −8 dB [b = 1.589, t(45) = 3.689, P < 0.001, 95% CI = (0.259, 2.345), Pcritical = 0.0036; see Fig. 5] and −4 dB [b = 1.626, t(45) = 3.305, P = 0.002, 95% CI = (0.334, 2.707), Pcritical = 0.0038], but not at 0 dB [b = 1.392, t(45) = 2.550, P = 0.014, 95% CI = (−0.074, 2.510), Pcritical = 0.0056]. The slope of the stimulus-to-response correlation coefficients across modulation depths was also not a significant predictor of SIN scores [b = −16.700, t(45) = −1.852, P = 0.071, 95% CI = (−37.83, 2.34), Pcritical = 0.0071]. Stimulus-to-response correlation latencies were also not significant predictors of QuickSIN scores.

Fig. 5.

Scatterplot of 0-dB QuickSIN scores by standardized residuals of stimulus-to-response correlation coefficients for all 3 modulation depths. The x-axes are the standardized residuals of stimulus-to-response correlation coefficients after controlling for pure-tone thresholds from 2 to 4 kHz. Each symbol represents data from a single participant. Closed triangles indicate younger participants, and open circles indicate older participants. Zero-inflated beta regressions were implemented within a GAMLSS framework that separately models QuickSIN scores equal to 0 and QuickSIN scores between 0 and 1. P values indicate significance of regression coefficient estimates. *0-dB QuickSIN scores are significantly predicted by stimulus-to-response correlation coefficients at amplitude modulation depths of −8 and −4 dB, but not at 0 dB, after controlling for multiple comparisons and calculation of 95% confidence intervals.

Response-to-response correlations.

Response-to-response correlation coefficients significantly predicted QuickSIN scores [b = 1.599, t(45) = 3.266, P = 0.0021, 95% CI = (0.026, 2.377), Pcritical = 0.0038; Fig. 6].

Fig. 6.

Scatterplots showing 0-dB QuickSIN scores by the standardized residuals of response-to-response correlation coefficients. The x-axis is the standardized residuals of response-to-response correlation coefficients after controlling for pure-tone thresholds from 2 to 4 kHz. Each symbol represents data from a single participant. Closed triangles indicate younger participants, and open circles indicate older participants. Zero-inflated beta regressions were implemented within a GAMLSS framework that separately models QuickSIN scores equal to 0 and QuickSIN scores between 0 and 1. P values indicate significance of regression coefficient estimates. *QuickSIN scores are significantly predicted by response-to-response correlation coefficients in zero-inflated beta regressions, after controlling for multiple comparisons and calculation of 95% confidence intervals.

ASSR amplitude.

ASSR amplitude significantly predicted QuickSIN scores at 0 dB only at the shallowest modulation depth of −8 dB [b = 0.278, t(45) = 3.006, P = 0.0044, 95% CI = (0.026, 0.478), Pcritical = 0.0045; Fig. 7]. ASSR amplitude was not a significant predictor of SIN scores at a modulation depth of either −4 dB [b = 0.233, t(45) = 2.000, P = 0.0517, 95% CI = (−0.0594, 0.5359), Pcritical = 0.0063] or 0 dB [b = 0.189, t(45) = 1.630, P = 0.1100, 95% CI = (−0.0256, 0.5584), Pcritical = 0.0083]. The slope of ASSR amplitude across modulation depths was not a significant predictor of SIN scores [b = −1.99, t(45) = −1.596, P = 0.1180, 95% CI = (−4.192, 1.001), Pcritical = 0.0100].

Fig. 7.

Scatterplot of 0-dB QuickSIN scores by standardized residuals of auditory steady-state response (ASSR) amplitude at all 3 modulation depths. The x-axis is the standardized residuals of ASSR amplitudes after controlling for pure-tone thresholds from 2 to 4 kHz. Each symbol represents data from a single participant. Closed triangles indicate younger participants, and open circles indicate older participants. Zero-inflated beta regressions were implemented within a GAMLSS framework that separately models QuickSIN scores equal to 0 and QuickSIN scores between 0 and 1. P values indicate significance of regression coefficient estimates. *0-dB QuickSIN scores are significantly predicted by ASSR amplitude only at an amplitude modulation depth of −8 dB, after controlling for multiple comparisons and calculation of 95% confidence intervals.

Phase-locking value.

Figure 8 shows scatterplots of QuickSIN by PLV for all modulation depths. PLV significantly predicted 0-dB QuickSIN scores at −8-dB modulation [b = 3.593, t(45) = 4.238, P < 0.001, 95% CI = (1.587, 5.029), Pcritical = 0.0031] and at −4-dB modulation [b = 2.825, t(45) = 3.786, P < 0.001, 95% CI = (0.515, 4.060), Pcritical = 0.0033]. The regression of QuickSIN by PLV at 0-dB modulation was significant but did not survive BH correction for multiple comparisons [b = 1.823, t(45) = 2.762, P = 0.0084, 95% CI = (0.331, 3.128), Pcritical = 0.0050], and thus a type 1 error cannot be ruled out. The slope of PLV as a function of modulation depth was not a significant predictor of SIN scores [b = 7.981, t(43) = 0.671, P = 0.506, 95% CI = (−18.413, 32.754), Pcritical = 0.0500].
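For readers unfamiliar with the correction, the standard Benjamini-Hochberg step-up procedure (Benjamini and Hochberg 1995) can be sketched in Python as follows. This is a generic illustration, not the study's analysis code: the function name and the P values in the example are illustrative only, and the Pcritical values reported in the text depend on the full family of tests and are not recomputed here.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where a test is rejected under BH at level q."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose P value satisfies p_(k) <= (k / m) * q.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    # Reject every test whose rank is at or below that largest rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected

# Illustrative P values only (not from the study).
p_demo = [0.001, 0.008, 0.035, 0.039, 0.20]
print(benjamini_hochberg(p_demo))
```

In this example the third P value (0.035) exceeds its own rank-specific threshold (0.03) but is still rejected because the fourth (0.039) falls below its threshold (0.04); this step-up behavior is what distinguishes the procedure from a simple test-by-test comparison.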

Fig. 8.

Scatterplot of 0-dB QuickSIN scores by standardized residuals of auditory steady-state response (ASSR) phase-locking value (PLV) for all 3 modulation depths. The x-axis is the standardized residuals of phase-locking value after controlling for pure-tone thresholds from 2 to 4 kHz. Each symbol represents data from a single participant. Closed triangles indicate younger participants, and open circles indicate older participants. Zero-inflated beta regressions were implemented within a GAMLSS framework that separately models QuickSIN scores equal to 0 and QuickSIN scores between 0 and 1. P values indicate significance of regression coefficient estimates. *QuickSIN scores are significantly predicted by PLV at −8 and at −4 dB modulation, after controlling for multiple comparisons and calculation of 95% confidence intervals.

Taken together, these results demonstrate that ASSR metrics measured at the shallowest modulation depth of −8 dB are consistent predictors of QuickSIN scores at the lowest speech SNR of 0 dB, in line with previous findings demonstrating that ASSR metrics predict speech scores during challenging listening conditions (Leigh-Paffenroth and Fowler 2006).

DISCUSSION

The current study examined the relationship between subcortical encoding of sustained waveform periodicity and SIN perception in a cohort of younger and older normal-hearing listeners. Because temporal processing is an important element of speech recognition, particularly in the presence of competing talkers or background noise, we hypothesized that individuals with stronger and more temporally precise subcortical encoding of sustained waveform periodicity would exhibit better SIN perception. These predictions held true, but only for the most difficult SNR = 0 dB speech condition. Using a cohort of nearly 60 listeners, larger than has been used previously, we quantified the robustness (via time-domain correlations), strength at the modulation frequency, and synchrony of the brainstem’s response to the periodicity of sustained amplitude modulation envelopes and showed that individual variation in SIN ability is driven by individual differences in subcortical envelope encoding and not by age. These relationships are especially apparent for ASSRs evoked by AM stimuli with shallow modulation depths. Previous studies have found conflicting evidence regarding the importance of sustained periodicity in the brainstem, and here we clarify the relationship between sustained periodicity encoding and SIN: stronger and more robust subcortical encoding of shallow envelope modulations by normal-hearing listeners is associated with better SIN perception in difficult listening conditions, but not in less challenging ones, and primarily for shallow envelope cues. To our knowledge this is the first study to provide a cohesive explanation for conflicting previous results. Previous studies incorporating less challenging tests of SIN ability (Leigh-Paffenroth and Murnane 2011; Parbery-Clark et al. 2009a), focusing on stimuli with higher modulation depths (Goossens et al. 2018; Leigh-Paffenroth and Murnane 2011), or only looking at differences between groups (Guest et al. 2018) may have been unable to elucidate these effects.

A secondary goal of the current study was to evaluate the feasibility of using AM tones at low modulation depths to evaluate periodicity encoding. We found substantial evidence for the utility of AM tones with low modulation depths. With the exception of stimulus-to-response correlation lags, all of the ASSR metrics that we calculated predicted SIN perception when measured from AM tones with a modulation depth of −8 dB. ASSR robustness and synchrony, measured via phase-locking and stimulus-to-response correlations, additionally predicted SIN at a moderate modulation depth of −4 dB. Notably, ASSR amplitude, which is a common method of quantifying ASSR, was the only ASSR metric that predicted SIN only at a modulation depth of −8 dB and not at −4 dB; previous studies using ASSR amplitude as a metric of temporal envelope processing may not have shown relationships between ASSR and SIN if stimulus modulation envelopes were not shallow enough (Guest et al. 2018; Leigh-Paffenroth and Murnane 2011). No ASSR metric predicted SIN scores at full modulation (0 dB, m = 1). Stimulus-to-response correlation latencies were not significantly affected by modulation depth and age and were not significant predictors of SIN abilities. This may indicate that although the strength of the correlation between the stimulus and response is relevant, the time at which this correlation is established is less meaningful. The fact that shallow modulation depths relate to SIN may reflect the fact that these degraded stimuli closely align with stimuli in real-world environments (Assmann and Summerfield 2004; Picton 2013). Bharadwaj et al. (2015) and others have suggested an additional use for the AM tones at moderate sound levels and low modulation depths as a potential marker for suprathreshold processing. However, the AM tones we utilized did not incorporate off-frequency maskers and thus cannot fully disentangle the bottom-up contributions of auditory nerve fibers of different compositions that have been proposed as contributors to suprathreshold processing deficits. Regardless, these data highlight the utility of low-modulation depths for examinations of temporal processing.

We did not find that the slope of ASSR metrics across modulation depths predicted SIN performance. Although calculating slope across stimulus factors is a method to normalize measurements within an individual, it is possible that a linear slope across modulation depths may inappropriately represent changes in metrics across modulation depths. Instead of a purely linear change of ASSR with modulation depth (Bharadwaj et al. 2015; Guest et al. 2017; Rees et al. 1986), the degree of linearity may change with age (Dimitrijevic et al. 2016), complicating the use of this metric. Although we found that quantifying slope across linear vs. logarithmic changes in modulation depth did not affect the results, we limited our investigation to three modulation depths and thus were unable to test for nonlinearity across modulation depth for the two groups in the present study, as has been done previously (Dimitrijevic et al. 2016).

Contrary to our expectations, SIN performance did not differ between older and younger listeners. This was particularly true after we accounted for differences in audiometric thresholds. Although previous investigations of speech recognition in normal-hearing adults have found that older adults typically demonstrate poorer speech recognition in the presence of noise or background maskers, we did not find this in the present results. QuickSIN scores comparing older and younger listeners with audiometric thresholds that fall within a clinically normal range have failed to consistently demonstrate an age group effect (Sheft et al. 2012; Song et al. 2011), suggesting that this test may not be as sensitive as others in determining age differences in normal-hearing listeners. The absence of an age effect in the current findings may result from several factors. Long-term auditory experience such as musical training or tonal language proficiency is known to enhance SIN abilities and ameliorate age-related declines in SIN perception (Parbery-Clark et al. 2009a, 2009b, 2012a, 2012b) and may have played a role in the SIN abilities documented in this report. Although it is unlikely that a high number of our older participants were highly trained musicians or spoke a tonal language, and we did not collect data on the musical and auditory experience of our participants, the inevitable individual variation that exists in participants’ auditory life experiences may nevertheless be contributing to variability in our responses. We find that subcortical envelope periodicity strongly relates to SIN perception in difficult conditions but not to age or audibility in our ASSR data, suggesting that subject-specific factors such as auditory experience (Parbery-Clark et al. 2012b) or inhibition (Caspary et al. 1995) may play a larger role in SIN in these difficult conditions than age or hearing health does. Taken together, these findings highlight a potential area where auditory training or other experience-dependent factors may affect perception and provide avenues for future studies and/or interventions to improve communication.

We found that subcortical encoding of amplitude envelope did not differ between our older and younger participants and, furthermore, that age did not affect how the ASSR metrics changed with modulation depth. Previous studies of age-related changes in brainstem encoding of the temporal qualities of sounds have found that age effects between young and old normal-hearing listeners depend on the specific stimulus factors: age effects are present primarily at higher frequencies (Clinard et al. 2010) or are apparent only in the onset and formant transitions of the response (Anderson and Kraus 2010b; Clinard et al. 2010). Age differences have been previously documented in ASSRs measured in response to AM tones with low modulation depths but included both hearing-impaired and normal-hearing older adults in the older cohort, and therefore the age effects may have been driven by hearing loss rather than age (Dimitrijevic et al. 2004). Similar studies investigating envelope encoding of AM tones did not find a difference between younger and older listeners (Dimitrijevic et al. 2016). Furthermore, the use of transposed tones, which are known to elicit stronger responses and greater phase-locking than sinusoidal AM tones, may have allowed our older normal-hearing listeners to produce ASSRs that were more similar to those of younger adults.

That we see no differences between the ASSRs of younger and older adults may also reflect central gain, wherein diminished function at the auditory periphery, whether through natural aging or injury, is progressively compensated for at higher levels of the auditory system, resulting in evoked potentials from the midbrain and cortex that match or even exceed those of normal controls. Enhanced physiological responses are found primarily in response to simpler stimuli such as AM sounds with low modulation rates or stimuli presented in quiet, but these recovered potentials do not necessarily accompany complete recovery in complex temporal processing or full return to normal function (Chambers et al. 2016). This may be one reason why age differences in FFRs are seen primarily at higher modulation frequencies but not at lower ones (Clinard and Tremblay 2013; Grose et al. 2009; Leigh-Paffenroth and Fowler 2006; Purcell et al. 2004; for a review, see Parthasarathy et al. 2019). Taken together, these results show that age does not have an effect on ASSRs at 80 Hz and provide support for an absence of age effects in subcortical waveform periodicity encoding for normal-hearing listeners at modulation frequencies as low as 80 Hz.

In summary, these results clarify previous conflicting findings about the role of sustained periodicity encoding in the perception of speech in noise. With a relatively large cohort of older and younger listeners with clinically normal audiograms, we conclusively demonstrate that the strength of the ASSR response, the degree of intertrial phase coherence, and the response regularity all predict SIN abilities, but only in the most difficult listening conditions. These effects are independent of age and pure-tone thresholds, emphasizing how individual variability in sustained temporal envelope processing drives variation in SIN ability. These findings highlight the value of using AM tones with shallow modulation depths as an effective way of examining the neural encoding of waveform periodicity, an important factor for SIN perception.

GRANTS

This work was supported in part by National Institutes of Health (NIH) Grants R01 DC014467, R01 DC017619, P50 DC00422, and T32 DC014435. The project also received support from the South Carolina Clinical and Translational Research Institute with an academic home at the Medical University of South Carolina, supported by NIH Grant UL1RR029882. This investigation was conducted in a facility constructed with support from NIH Research Facilities Improvement Program Grant C06 RR14516.

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the authors.

AUTHOR CONTRIBUTIONS

K.C.H. conceived and designed research; C.M.M., J.W.D., and K.C.H. performed experiments; C.M.M. analyzed data; C.M.M., J.W.D., and K.C.H. interpreted results of experiments; C.M.M. prepared figures; C.M.M. drafted manuscript; C.M.M., J.W.D., and K.C.H. edited and revised manuscript; C.M.M., J.W.D., and K.C.H. approved final version of manuscript.

ACKNOWLEDGMENTS

We thank the participants of this study. We also thank Hari Bharadwaj and Barbara Shinn-Cunningham for assistance with stimulus generation.

REFERENCES

  1. Aiken SJ, Picton TW. Envelope and spectral frequency-following responses to vowel sounds. Hear Res 245: 35–47, 2008. doi: 10.1016/j.heares.2008.08.004.
  2. Anderson S, Kraus N. Objective neural indices of speech-in-noise perception. Trends Amplif 14: 73–83, 2010a. doi: 10.1177/1084713810380227.
  3. Anderson S, Kraus N. Sensory-cognitive interaction in the neural encoding of speech in noise: a review. J Am Acad Audiol 21: 575–585, 2010b. doi: 10.3766/jaaa.21.9.3.
  4. Anderson S, Parbery-Clark A, White-Schwoch T, Kraus N. Aging affects neural precision of speech encoding. J Neurosci 32: 14156–14164, 2012. doi: 10.1523/JNEUROSCI.2176-12.2012.
  5. Anderson S, Parbery-Clark A, Yi HG, Kraus N. A neural basis of speech-in-noise perception in older adults. Ear Hear 32: 750–757, 2011. doi: 10.1097/AUD.0b013e31822229d3.
  6. Anderson S, Skoe E, Chandrasekaran B, Kraus N. Neural timing is linked to speech perception in noise. J Neurosci 30: 4922–4926, 2010. doi: 10.1523/JNEUROSCI.0107-10.2010.
  7. Assmann P, Summerfield Q. The perception of speech under adverse conditions. In: Speech Processing in the Auditory System, edited by Greenberg S, Ainsworth WA, Fay RR. New York: Springer, 2004, p. 231–308.
  8. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodol) 57: 289–300, 1995. doi: 10.1111/j.2517-6161.1995.tb02031.x.
  9. Bernstein LR, Trahiotis C. The apparent immunity of high-frequency “transposed” stimuli to low-frequency binaural interference. J Acoust Soc Am 116: 3062–3069, 2004. doi: 10.1121/1.1791892.
  10. Bharadwaj HM, Masud S, Mehraei G, Verhulst S, Shinn-Cunningham BG. Individual differences reveal correlates of hidden hearing deficits. J Neurosci 35: 2161–2172, 2015. doi: 10.1523/JNEUROSCI.3915-14.2015.
  11. Bharadwaj HM, Verhulst S, Shaheen L, Liberman MC, Shinn-Cunningham BG. Cochlear neuropathy and the coding of supra-threshold sound. Front Syst Neurosci 8: 26, 2014. doi: 10.3389/fnsys.2014.00026.
  12. Bidelman GM. Multichannel recordings of the human brainstem frequency-following response: scalp topography, source generators, and distinctions from the transient ABR. Hear Res 323: 68–80, 2015. doi: 10.1016/j.heares.2015.01.011.
  13. Bidelman GM, Krishnan A. Effects of reverberation on brainstem representation of speech in musicians and non-musicians. Brain Res 1355: 112–125, 2010. doi: 10.1016/j.brainres.2010.07.100.
  14. Boettcher FA, Poth EA, Mills JH, Dubno JR. The amplitude-modulation following response in young and aged human subjects. Hear Res 153: 32–42, 2001. doi: 10.1016/S0378-5955(00)00255-0.
  15. Caspary DM, Milbrandt JC, Helfert RH. Central auditory aging: GABA changes in the inferior colliculus. Exp Gerontol 30: 349–360, 1995. doi: 10.1016/0531-5565(94)00052-5.
  16. Chambers AR, Resnik J, Yuan Y, Whitton JP, Edge AS, Liberman MC, Polley DB. Central gain restores auditory processing following near-complete cochlear denervation. Neuron 89: 867–879, 2016. doi: 10.1016/j.neuron.2015.12.041.
  17. Clinard CG, Cotter CM. Neural representation of dynamic frequency is degraded in older adults. Hear Res 323: 91–98, 2015. doi: 10.1016/j.heares.2015.02.002.
  18. Clinard CG, Hodgson SL, Scherer ME. Neural correlates of the binaural masking level difference in human frequency-following responses. J Assoc Res Otolaryngol 18: 355–369, 2017. doi: 10.1007/s10162-016-0603-7.
  19. Clinard CG, Tremblay KL. Aging degrades the neural encoding of simple and complex sounds in the human brainstem. J Am Acad Audiol 24: 590–599, 2013. doi: 10.3766/jaaa.24.7.7.
  20. Clinard CG, Tremblay KL, Krishnan AR. Aging alters the perception and physiological representation of frequency: evidence from human frequency-following response recordings. Hear Res 264: 48–55, 2010. doi: 10.1016/j.heares.2009.11.010.
  21. Cohen MX. Analyzing Neural Time Series. Cambridge, MA: The MIT Press, 2014.
  22. Delorme A, Miyakoshi M, Jung TP, Makeig S. Grand average ERP-image plotting and statistics: a method for comparing variability in event-related single-trial EEG activities across subjects and conditions. J Neurosci Methods 250: 3–6, 2015. doi: 10.1016/j.jneumeth.2014.10.003.
  23. Dimitrijevic A, Alsamri J, John MS, Purcell D, George S, Zeng F-G. Human envelope following responses to amplitude modulation: effects of aging and modulation depth. Ear Hear 37: e322–e335, 2016. doi: 10.1097/AUD.0000000000000324.
  24. Dimitrijevic A, John MS, Picton TW. Auditory steady-state responses and word recognition scores in normal-hearing and hearing-impaired adults. Ear Hear 25: 68–84, 2004. doi: 10.1097/01.AUD.0000111545.71693.48.
  25. Dimitrijevic A, John MS, van Roon P, Picton TW. Human auditory steady-state responses to tones independently modulated in both frequency and amplitude. Ear Hear 22: 100–111, 2001. doi: 10.1097/00003446-200104000-00003.
  26. Dobie RA, Wilson MJ. A comparison of t test, F test, and coherence methods of detecting steady-state auditory-evoked potentials, distortion-product otoacoustic emissions, or other sinusoids. J Acoust Soc Am 100: 2236–2246, 1996. doi: 10.1121/1.417933.
  27. Dreyer A, Delgutte B. Phase locking of auditory-nerve fibers to the envelopes of high-frequency sounds: implications for sound localization. J Neurophysiol 96: 2327–2341, 2006. doi: 10.1152/jn.00326.2006.
  28. Ferrari SL, Cribari-Neto F. Beta regression for modelling rates and proportions. J Appl Stat 31: 799–815, 2004. doi: 10.1080/0266476042000214501.
  29. Goossens T, Vercammen C, Wouters J, van Wieringen A. Neural envelope encoding predicts speech perception performance for normal-hearing and hearing-impaired adults. Hear Res 370: 189–200, 2018. doi: 10.1016/j.heares.2018.07.012.
  30. Grose JH, Mamo SK, Hall JW 3rd. Age effects in temporal envelope processing: speech unmasking and auditory steady state responses. Ear Hear 30: 568–575, 2009. doi: 10.1097/AUD.0b013e3181ac128f.
  31. Guest H, Munro KJ, Prendergast G, Howe S, Plack CJ. Tinnitus with a normal audiogram: Relation to noise exposure but no evidence for cochlear synaptopathy. Hear Res 344: 265–274, 2017. doi: 10.1016/j.heares.2016.12.002.
  32. Guest H, Munro KJ, Prendergast G, Millman RE, Plack CJ. Impaired speech perception in noise with a normal audiogram: no evidence for cochlear synaptopathy and no relation to lifetime noise exposure. Hear Res 364: 142–151, 2018. doi: 10.1016/j.heares.2018.03.008.
  33. Harris KC, Dubno JR. Age-related deficits in auditory temporal processing: unique contributions of neural dyssynchrony and slowed neuronal processing. Neurobiol Aging 53: 150–158, 2017. doi: 10.1016/j.neurobiolaging.2017.01.008.
  34. Hornickel J, Skoe E, Nicol T, Zecker S, Kraus N. Subcortical differentiation of stop consonants relates to reading and speech-in-noise perception. Proc Natl Acad Sci USA 106: 13022–13027, 2009. doi: 10.1073/pnas.0901123106.
  35. Killion MC, Niquette PA, Gudmundsen GI, Revit LJ, Banerjee S. Development of a quick speech-in-noise test for measuring signal-to-noise ratio loss in normal-hearing and hearing-impaired listeners. J Acoust Soc Am 116: 2395–2405, 2004. [Erratum in J Acoust Soc Am 119: 1888, 2006.] doi: 10.1121/1.1784440.
  36. Lachaux JP, Rodriguez E, Martinerie J, Varela FJ. Measuring phase synchrony in brain signals. Hum Brain Mapp 8: 194–208, 1999.
  37. Leigh-Paffenroth ED, Fowler CG. Amplitude-modulated auditory steady-state responses in younger and older listeners. J Am Acad Audiol 17: 582–597, 2006. doi: 10.3766/jaaa.17.8.5.
  38. Leigh-Paffenroth ED, Murnane OD. Auditory steady state responses recorded in multitalker babble. Int J Audiol 50: 86–97, 2011. doi: 10.3109/14992027.2010.532512.
  39. Ospina R, Ferrari SL. Inflated beta distributions. Stat Papers 51: 111–126, 2010. doi: 10.1007/s00362-008-0125-4.
  40. Ospina R, Ferrari SL. A general class of zero-or-one inflated beta regression models. Comput Stat Data Anal 56: 1609–1623, 2012. doi: 10.1016/j.csda.2011.10.005.
  41. Parbery-Clark A, Anderson S, Hittner E, Kraus N. Musical experience offsets age-related delays in neural timing. Neurobiol Aging 33: 1483.e1–1483.e4, 2012a. doi: 10.1016/j.neurobiolaging.2011.12.015.
  42. Parbery-Clark A, Anderson S, Hittner E, Kraus N. Musical experience strengthens the neural representation of sounds important for communication in middle-aged adults. Front Aging Neurosci 4: 30, 2012b. doi: 10.3389/fnagi.2012.00030.
  43. Parbery-Clark A, Skoe E, Kraus N. Musical experience limits the degradative effects of background noise on the neural processing of sound. J Neurosci 29: 14100–14107, 2009a. doi: 10.1523/JNEUROSCI.3256-09.2009.
  44. Parbery-Clark A, Skoe E, Lam C, Kraus N. Musician enhancement for speech-in-noise. Ear Hear 30: 653–661, 2009b. doi: 10.1097/AUD.0b013e3181b412e9.
  45. Parthasarathy A, Bartlett EL, Kujawa SG. Age-related changes in neural coding of envelope cues: peripheral declines and central compensation. Neuroscience 407: 21–31, 2019. doi: 10.1016/j.neuroscience.2018.12.007.
  46. Pereira TL, Cribari-Neto F. Detecting model misspecification in inflated beta regressions. Commun Stat Simul Comput 43: 631–656, 2014. doi: 10.1080/03610918.2012.712183.
  47. Picton T. Hearing in time: evoked potential studies of temporal processing. Ear Hear 34: 385–401, 2013. doi: 10.1097/AUD.0b013e31827ada02.
  48. Picton TW, Skinner CR, Champagne SC, Kellett AJC, Maiste AC. Potentials evoked by the sinusoidal modulation of the amplitude or frequency of a tone. J Acoust Soc Am 82: 165–178, 1987. doi: 10.1121/1.395560.
  49. Presacco A, Simon JZ, Anderson S. Evidence of degraded representation of speech in noise, in the aging midbrain and cortex. J Neurophysiol 116: 2346–2355, 2016. doi: 10.1152/jn.00372.2016.
  50. Purcell DW, John SM, Schneider BA, Picton TW. Human temporal auditory acuity as assessed by envelope following responses. J Acoust Soc Am 116: 3581–3593, 2004. doi: 10.1121/1.1798354.
  51. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, 2017. https://www.r-project.org/.
  52. Rance G. The Auditory Steady-State Response: Generation, Recording, and Clinical Application. San Diego, CA: Plural, 2008.
  53. Rees A, Green GGR, Kay RH. Steady-state evoked responses to sinusoidally amplitude-modulated sounds recorded in man. Hear Res 23: 123–133, 1986. doi: 10.1016/0378-5955(86)90009-2.
  54. Rigby RA, Stasinopoulos M. Generalized additive models for location, scale and shape. J R Stat Soc Ser C Appl Stat 54: 507–554, 2005. doi: 10.1111/j.1467-9876.2005.00510.x.
  55. Ross B, Borgmann C, Draganova R, Roberts LE, Pantev C. A high-precision magnetoencephalographic study of human auditory steady-state responses to amplitude-modulated tones. J Acoust Soc Am 108: 679–691, 2000. doi: 10.1121/1.429600.
  56. Schoof T, Rosen S. The role of age-related declines in subcortical auditory processing in speech perception in noise. J Assoc Res Otolaryngol 17: 441–460, 2016. doi: 10.1007/s10162-016-0564-x.
  57. Shaheen LA, Valero MD, Liberman MC. Towards a diagnosis of cochlear neuropathy with envelope following responses. J Assoc Res Otolaryngol 16: 727–745, 2015. doi: 10.1007/s10162-015-0539-3.
  58. Shannon RV, Zeng FG, Kamath V, Wygonski J, Ekelid M. Speech recognition with primarily temporal cues. Science 270: 303–304, 1995. doi: 10.1126/science.270.5234.303.
  59. Sheft S, Shafiro V, Lorenzi C, McMullen R, Farrell C. Effects of age and hearing loss on the relationship between discrimination of stochastic frequency modulation and speech perception. Ear Hear 33: 709–720, 2012. doi: 10.1097/AUD.0b013e31825aab15.
  60. Song JH, Skoe E, Banai K, Kraus N. Perception of speech in noise: neural correlates. J Cogn Neurosci 23: 2268–2279, 2011. doi: 10.1162/jocn.2010.21556.
  61. Stasinopoulos DM, Rigby RA. Generalized additive models for location scale and shape (GAMLSS) in R. J Stat Softw 23: 1–46, 2007. doi: 10.18637/jss.v023.i07.
  62. Stasinopoulos M, Rigby B. gamlss: Generalized Additive Models for Location Scale and Shape, 2017. https://cran.r-project.org/web/packages/gamlss/index.html [10 Oct. 2018].
  63. Strouse A, Ashmead DH, Ohde RN, Grantham DW. Temporal processing in the aging auditory system. J Acoust Soc Am 104: 2385–2399, 1998. doi: 10.1121/1.423748.
  64. Tallon-Baudry C, Bertrand O, Delpuech C, Pernier J. Stimulus specificity of phase-locked and non-phase-locked 40 Hz visual responses in human. J Neurosci 16: 4240–4249, 1996. doi: 10.1523/JNEUROSCI.16-13-04240.1996.
  65. Vander Werff KR, Burns KS. Brain stem responses to speech in younger and older adults. Ear Hear 32: 168–180, 2011. doi: 10.1097/AUD.0b013e3181f534b5.
  66. Wang Q, Li L. Differences between auditory frequency-following responses and onset responses: intracranial evidence from rat inferior colliculus. Hear Res 357: 25–32, 2018. doi: 10.1016/j.heares.2017.10.014.
  67. Wilson RH, McArdle RA, Smith SL. An evaluation of the BKB-SIN, HINT, QuickSIN, and WIN materials on listeners with normal hearing and listeners with hearing loss. J Speech Lang Hear Res 50: 844–856, 2007. doi: 10.1044/1092-4388(2007/059).
  68. Worden FG, Marsh JT. Frequency-following (microphonic-like) neural responses evoked by sound. Electroencephalogr Clin Neurophysiol 25: 42–52, 1968. doi: 10.1016/0013-4694(68)90085-0.

Articles from Journal of Neurophysiology are provided here courtesy of American Physiological Society