Predictors for estimating subcortical EEG responses to continuous speech

Joshua P Kulasingham; Florine L Bachmann; Kasper Eskelund; Martin Enqvist; Hamish Innes-Brown; Emina Alickovic

doi:10.1371/journal.pone.0297826

PLoS One. 2024 Feb 8;19(2):e0297826.



Editor: Diego A Forero⁵

PMCID: PMC10852227 PMID: 38330068

Abstract

Perception of sounds and speech involves structures in the auditory brainstem that rapidly process ongoing auditory stimuli. The role of these structures in speech processing can be investigated by measuring their electrical activity using scalp-mounted electrodes. However, typical analysis methods involve averaging neural responses to many short repetitive stimuli that bear little relevance to daily listening environments. Recently, subcortical responses to more ecologically relevant continuous speech were detected using linear encoding models. These methods estimate the temporal response function (TRF), which is a regression model that minimises the error between the measured neural signal and a predictor derived from the stimulus. Using predictors that model the highly non-linear peripheral auditory system may improve linear TRF estimation accuracy and peak detection. Here, we compare predictors from both simple and complex peripheral auditory models for estimating brainstem TRFs on electroencephalography (EEG) data from 24 participants listening to continuous speech. We also investigate the data length required for estimating subcortical TRFs, and find that around 12 minutes of data is sufficient for clear wave V peaks (>3 dB SNR) to be seen in nearly all participants. Interestingly, predictors derived from simple filterbank-based models of the peripheral auditory system yield TRF wave V peak SNRs that are not significantly different from those estimated using a complex model of the auditory nerve, provided that the nonlinear effects of adaptation in the auditory system are appropriately modelled. Crucially, computing predictors from these simpler models is more than 50 times faster compared to the complex model. This work paves the way for efficient modelling and detection of subcortical processing of continuous speech, which may lead to improved diagnosis metrics for hearing impairment and assistive hearing technology.

Introduction

The human auditory system consists of several subcortical and cortical structures that rapidly process incoming sound signals such as speech. Electroencephalography (EEG) measurements of the aggregate activity of these neural structures have been instrumental in understanding the mechanisms underlying normal hearing and hearing impairments [1, 2]. One important measure is the morphology of the auditory brainstem response (ABR), and the amplitude and latency of ABR peaks have been widely used in many clinical settings such as neonatal hearing screening [3]. Conventional methods to detect the ABR rely on averaging responses over multiple trials of non-natural, short stimuli such as clicks, chirps or speech syllables [4].

Recently, ABR-like responses to continuous, ongoing speech were detected [5, 6], allowing for the exploration of subcortical processing of ecologically relevant speech stimuli. One method to estimate these subcortical responses is the temporal response function (TRF), a linear encoding model of time-locked neural responses to continuous stimuli [7]. TRFs have been widely used for estimating cortical responses to speech [8–12], but fewer studies have investigated subcortical TRFs [5, 13–16].

Several factors complicate the direct application of TRF models to detect subcortical responses to continuous speech. Electrical responses that are generated in the brainstem and measured at the scalp are small compared to the amplitude of the ongoing EEG. They are therefore difficult to detect, and a large amount of data is required for reliable TRF estimation [5, 13]. Additionally, subcortical neural processes rapidly time-lock to fast stimulus fluctuations, and a measurement system with precise (sub-millisecond) synchronization between the stimulus and the EEG is essential in order to extract these responses. Another concern is that linear models may ignore several highly non-linear and adaptive processing stages in the auditory periphery and brainstem [17–19]. The TRF is a linear model that relates the EEG signal to a stimulus-derived predictor, and therefore cannot capture the non-linear processing stages of the auditory system. However, the predictor, serving as the input to the TRF model, can be constructed to be a feature (or transformation) of the speech stimulus relevant to the auditory system. Accounting for peripheral non-linearities in the predictor could help 'linearize' the TRF estimation problem and lead to improved TRF models that reflect the activity of later neural processes.

Previous work has used the rectified speech waveform as a predictor, which is a coarse approximation of the initial rectifying non-linearity in the cochlea [5]. However, a recent study has shown that predictors derived from a complex model of the auditory periphery [20] that incorporates non-linear stages can lead to improved subcortical TRFs [21]. Another recent study showed that auditory-model-derived predictors outperform previously used envelope predictors even for cortical TRFs [22].

It is essential to determine computationally efficient predictors that result in clear TRF peaks for clinical applications involving realistic speech stimuli, or for future assistive hearing technology. Previous work has compared different methods to compute envelope predictors for investigating cortical responses to continuous speech [23]. In this work, we compared the rectified speech waveform with predictors derived from various auditory models in terms of their suitability for estimating subcortical TRFs. We computed predictors from filterbank models [17], with or without adaptation [24], and compared them to a more complex auditory nerve model [20, 25] that has been previously used to fit subcortical TRFs [21]. TRFs were estimated from EEG data recorded from 24 participants listening to continuous speech. Prior work indicates that the most prominent feature of subcortical TRFs is the wave V peak [5, 15], which was used as a performance measure in our study. Additional measures such as the computational time taken to generate predictors and the amount of data required for fitting TRFs for each predictor type are also reported.

We corroborate recent findings [21] by confirming that the predictor derived from a complex model [20] of the auditory nerve outperforms the rectified speech predictor. Interestingly, our results indicate that predictors from simpler models [24] can reach similar performance for estimating wave V peaks as complex models, with the added advantage of being more than 50 times faster to compute. These simpler models, combined with TRF analysis, could lead to efficient algorithms for future assistive hearing technology [26], and encourage the use of more ecologically relevant continuous speech stimuli in clinical applications.

Materials and methods

Experimental setup

EEG data was collected from 24 participants with clinically normal hearing thresholds (14 males; mean age 37.07 years, SD 10.02 years). All participants provided written informed consent, and the study was approved by the ethics committee for the Capital Region of Denmark (journal number 22010204). The data collection period was from 1 June to 11 October 2022. EEG data was recorded while participants were seated listening to continuous segments from a Danish audiobook of H.C. Andersen adventures, read by Jens Okking. The participants were instructed to relax and listen to the story. Four audiobook stories, each divided into two segments (mean duration 6 minutes 0 seconds, SD 55 seconds), were presented in randomized order, resulting in a total of 8 trials.

The 2-channel audio was averaged to form a mono audio channel, which was then highpass filtered at 1 kHz using a first-order Butterworth filter to enhance the relative contribution of high frequencies, since the brainstem response is more strongly driven by high frequencies [27]. This gentle highpass filter resulted in natural-sounding speech in which substantial power between 125 and 1000 Hz, as well as the pitch information, was clearly preserved. This method was also used in prior studies to detect clear subcortical TRFs [5]. The single-channel speech segments were calibrated to 72 dB SPL using the following procedure. Speech-shaped noise was generated by transforming white noise to have the long-term spectrum of the speech. This signal was then calibrated to 72 dB SPL by recording the audio signals using a measurement amplifier (Brüel & Kjær Type 2636) and a head-and-torso simulator (HATS, Brüel & Kjær Type 4128-C) containing two ear simulators (Brüel & Kjær Type 4158). The setup was calibrated using a sound source (Brüel & Kjær Type 4231). Each speech segment was then scaled digitally to have the same root mean square (r.m.s.) value as the 72 dB SPL speech-shaped noise. These speech signals were presented binaurally using an RME Fireface UCX soundcard (RME Audio, Haimhausen, Germany) and Etymotic ER-2 insert earphones (Etymotic Research, Illinois, USA), which were shielded using a grounded metal box to avoid direct stimulus artifacts on the EEG.

Stimulus artifacts occur when electromagnetic activity related to stimulus presentation is recorded in the EEG, and are largely caused by electromagnetic leakage from the headphone transducers and cables [28]. Here, we employed several methods to reduce stimulus artifacts: 1) Air-tube insert earphones were used, creating distance between the headphone transducers and the EEG electrodes. 2) The headphone transducers were shielded with grounded metal boxes, which have been shown to reduce stimulus presentation artifacts [28, 29]. The audio signal cables were also shielded, with the cable shield connected to the same ground as the metal box. 3) Model predictors were computed once for the original and once for the sign-inverted speech stimuli. TRFs were computed for both predictors and then averaged, following prior work [5]. This approach is inspired by the traditional approach of presenting repeated short stimuli of alternating polarity and averaging the neural responses. Further details on predictor and TRF estimation are provided below. Later visual inspection confirmed that stimulus artifacts were not present in the estimated TRFs.
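As an illustration only, the digital part of the stimulus preparation (mono mixdown, first-order Butterworth highpass at 1 kHz, and scaling to the r.m.s. of the calibrated noise) could be sketched in Python with NumPy and SciPy as follows. The helper name `prepare_stimulus` and the `reference_rms` argument, which stands in for the r.m.s. of the 72 dB SPL speech-shaped noise, are hypothetical, and causal filtering with `sosfilt` is an assumption, since the text does not state whether the filter was applied causally or with zero phase.

```python
import numpy as np
from scipy.signal import butter, sosfilt


def prepare_stimulus(stereo, fs, reference_rms):
    """Hypothetical helper: mono mixdown, 1 kHz highpass, r.m.s. calibration.

    stereo: array of shape (n_samples, 2); fs: sampling rate in Hz;
    reference_rms: r.m.s. of the calibrated speech-shaped noise (assumed known).
    """
    # Average the two audio channels into a single mono channel
    mono = stereo.mean(axis=1)
    # Gentle first-order Butterworth highpass at 1 kHz (causal filtering assumed)
    sos = butter(1, 1000.0, btype="highpass", fs=fs, output="sos")
    filtered = sosfilt(sos, mono)
    # Scale digitally so the segment has the same r.m.s. as the reference noise
    rms = np.sqrt(np.mean(filtered ** 2))
    calibrated = filtered * (reference_rms / rms)
    # Return the stimulus and its sign-inverted copy for the predictor pair
    return calibrated, -calibrated
```

The acoustic calibration against the HATS recording cannot be reproduced in code and is represented only by `reference_rms`.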

EEG data collection and preprocessing

A BioSemi 32-channel EEG system was used with a sampling frequency of 16,384 Hz and a fifth-order cascaded integrator-comb anti-aliasing filter with a -3 dB point at 3276.8 Hz. Scalp electrodes were placed according to the 10-20 system, and additional electrodes were placed on the mastoids and earlobes, as well as above and below the right eye. Data analysis was conducted in MATLAB (version R2021a) and the Eelbrain Python toolbox (version 0.38.1) [30] using only the Cz electrode referenced to the average of the two mastoid electrodes. The EEG data was highpass filtered using a first-order Butterworth filter with a cutoff frequency of 1 Hz. To remove power line noise, the signal was passed through FIR notch filters at all multiples of 50 Hz up to 1000 Hz, with widths of 5 Hz. The data was then downsampled to 4096 Hz (after passing through an anti-aliasing lowpass filter) to speed up computation. Simple artifact removal was performed by zeroing out 1-second segments around parts of the EEG data with amplitudes larger than 5 standard deviations above the mean, similar to prior work [5]. Finally, only the data from 2 to 242 seconds of each trial was used for further analysis, to avoid onset effects and to have the same amount of data (4 minutes) in each trial.
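The preprocessing chain can be sketched in Python as below. This is not the authors' MATLAB/Eelbrain code, and the function names are hypothetical. Several details are assumptions: the FIR notch filters are replaced by zero-phase IIR notches (`scipy.signal.iirnotch` with a 5 Hz bandwidth) for brevity, the outlier criterion is taken as |x - mean| > 5 SD, and each zeroed 1 s segment is assumed to be centered on the offending sample.

```python
import numpy as np
from scipy.signal import butter, decimate, filtfilt, iirnotch, sosfilt


def preprocess_eeg(cz, mastoid_l, mastoid_r, fs=16384, fs_out=4096):
    """Hypothetical sketch: re-reference, highpass, notch, downsample."""
    # Cz referenced to the average of the two mastoid electrodes
    x = cz - 0.5 * (mastoid_l + mastoid_r)
    # First-order Butterworth highpass at 1 Hz
    x = sosfilt(butter(1, 1.0, btype="highpass", fs=fs, output="sos"), x)
    # Line noise at 50, 100, ..., 1000 Hz. NOTE: zero-phase IIR notches with a
    # 5 Hz bandwidth (Q = f0 / 5) stand in for the FIR notches used in the study.
    for f0 in np.arange(50.0, 1001.0, 50.0):
        b, a = iirnotch(f0, f0 / 5.0, fs=fs)
        x = filtfilt(b, a, x)
    # Integer-factor downsampling; decimate() applies an anti-aliasing lowpass first
    return decimate(x, fs // fs_out, ftype="fir", zero_phase=True)


def zero_artifacts(x, fs, n_sd=5.0, win_s=1.0):
    """Zero a 1 s segment centered on every sample more than n_sd SDs from the mean."""
    bad = (np.abs(x - x.mean()) > n_sd * x.std()).astype(int)
    counts = np.concatenate(([0], np.cumsum(bad)))  # counts[k] = bad samples in x[:k]
    half = int(round(win_s * fs / 2))
    idx = np.arange(x.size)
    lo = np.clip(idx - half, 0, x.size)
    hi = np.clip(idx + half + 1, 0, x.size)
    near_bad = (counts[hi] - counts[lo]) > 0       # any bad sample within +/- half
    out = x.copy()
    out[near_bad] = 0.0
    return out


def crop_trial(x, fs, start_s=2.0, stop_s=242.0):
    """Keep 2-242 s of a trial (240 s, i.e. 4 minutes) to avoid onset effects."""
    return x[int(start_s * fs):int(stop_s * fs)]
```

The cumulative-sum trick keeps the artifact search linear in the number of samples, which matters for trials of several minutes at 4096 Hz.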

Detecting subcortical responses requires precise synchronization between the EEG and the audio stimuli. Hence, to avoid trigger jitter and clock drift, the output of the audio interface was also fed to the BioSemi Erg1 channel via an optical isolator (StimTrak, Brain Products GmbH, Gilching, Germany), which maintains electrical separation between the mains power and the data collection system. The signal recorded through the StimTrak was used to generate predictors for the TRF analysis.

Auditory models

Predictors were computed using several auditory models, described below in order of increasing complexity. For all models, the input was the audio stimulus as recorded by the StimTrak system. The lags inherent in the output of each model were accounted for by shifting the generated predictors to maximize their correlation with the rectified speech predictor. Since brainstem responses are largely insensitive to stimulus polarity, a pair of predictors was generated for each model from an input stimulus pair: the original stimulus and the sign-inverted stimulus. In line with prior work [5, 21], TRFs were fit to both predictors separately and then averaged together. Although some auditory models account for peripheral rectification, predictor pairs were generated for all models to keep the preprocessing identical across models, and because averaging the two TRFs fit to the predictor pair led to cleaner estimates. The generated predictor waveforms are shown in Fig 1 for a short speech segment, and the execution times to generate each predictor and the Pearson correlations between predictors are reported in Table 1.
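A minimal sketch of the lag compensation step, assuming the lag is read off the cross-correlation between a model predictor and the RS predictor and then applied as a zero-padded shift. The helper name and the ±20 ms search range are assumptions, not values from the study.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags


def align_to_reference(pred, ref, fs, max_lag_s=0.02):
    """Shift `pred` (same length as `ref`) to maximize its correlation with `ref`.

    Only lags up to max_lag_s (an assumed search range) are considered, and the
    shift is applied with zero padding rather than wrapping around.
    """
    a = pred - pred.mean()
    b = ref - ref.mean()
    xc = correlate(b, a, mode="full")
    lags = correlation_lags(b.size, a.size, mode="full")
    keep = np.abs(lags) <= int(max_lag_s * fs)
    lag = lags[keep][np.argmax(xc[keep])]
    shifted = np.zeros_like(pred)
    if lag >= 0:
        # ref lags pred: delay pred by `lag` samples
        shifted[lag:] = pred[:pred.size - lag]
    else:
        # pred lags ref (typical model delay): advance pred by -lag samples
        shifted[:lag] = pred[-lag:]
    return shifted, lag
```

A model delay of d samples shows up as a lag of -d, and advancing the predictor by d samples lines it up with RS.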

Fig 1 — The predictor waveforms are shown for a 1-second speech segment (also shown in the top row) to illustrate the differences between the models. Note that the OSSA and ZIL predictors are the most different from the speech waveform, since they incorporate more peripheral non-linearities and adaptation effects.

Table 1. Predictor comparison.

The ZIL model is more than 50 times slower to compute than the other models.

| Predictor | Computation time per 1 s of input | Correlation with RS | Correlation with ZIL |
|---|---|---|---|
| RS | - | - | 0.316 |
| GT | 0.0521 s | 0.461 | 0.550 |
| OSS | 0.0563 s | 0.438 | 0.496 |
| OSSA | 0.0680 s | 0.262 | 0.577 |
| ZIL | 4.1208 s | 0.316 | - |


Rectified speech (RS)

Previous studies have shown that the rectified speech signal can be used to estimate subcortical TRFs to continuous speech [5]. The method used in previous work [5] was followed to generate the first predictor pair, termed RS, which was formed by rectifying the speech stimulus (and the stimulus with opposite sign).
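Because a pair of full-wave rectified signals would be identical, the RS pair is read here as half-wave rectification of the stimulus and of its sign-inverted copy; this is an assumption, since the text only says "rectifying". A minimal NumPy sketch with a hypothetical function name:

```python
import numpy as np


def rectified_speech_pair(stimulus):
    """RS predictor pair under a half-wave rectification assumption."""
    x = np.asarray(stimulus, dtype=float)
    pos = np.maximum(x, 0.0)   # positive half of the original stimulus
    neg = np.maximum(-x, 0.0)  # positive half of the sign-inverted stimulus
    return pos, neg
```

The two halves are complementary: their difference restores the original waveform and their sum is its absolute value.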

Gammatone spectrogram predictor (GT)

Incoming sounds undergo several stages of non-linear processing in the human ear and cochlea. The gammatone filterbank is a simple approximation of this system [31]. A gammatone filterbank consisting of 31 filters from 80-8000 Hz with 1 equivalent rectangular bandwidth (ERB) spacing was applied to the stimulus pair. The resulting amplitude spectra were averaged over all bands to generate the second predictor pair, which was termed GT. The Auditory Modeling Toolbox (AMT) version 1.1.0 [32] (function auditoryfilterbank with default parameters) was used.
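The count of 31 filters follows from the ERB scale: with the Glasberg and Moore (1990) ERB-number formula, 80 Hz and 8000 Hz correspond to ERB numbers of about 2.8 and 33.3, a span of about 30.5, so 1-ERB spacing fits 31 filters. The sketch below checks this and builds a GT-like predictor with SciPy's IIR gammatone filters. It is an approximation under stated assumptions (the ERB formula, symmetric placement of center frequencies within the range, and taking the magnitude of each band), not a reproduction of AMT's `auditoryfilterbank`; all function names are hypothetical.

```python
import numpy as np
from scipy.signal import gammatone, lfilter


def erb_number(f):
    """Glasberg & Moore (1990) ERB-number scale (assumed; AMT uses a close variant)."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f, dtype=float))


def inverse_erb_number(e):
    """Frequency in Hz for a given ERB number."""
    return (10.0 ** (np.asarray(e, dtype=float) / 21.4) - 1.0) / 0.00437


def erb_spaced_cfs(f_low=80.0, f_high=8000.0, spacing=1.0):
    """Center frequencies at 1-ERB spacing, centered within [f_low, f_high]."""
    e_low, e_high = erb_number([f_low, f_high])
    n = int(np.floor((e_high - e_low) / spacing))
    offset = ((e_high - e_low) - n * spacing) / 2.0
    return inverse_erb_number(e_low + offset + spacing * np.arange(n + 1))


def gt_predictor(x, fs):
    """Average of band magnitudes across a 4th-order IIR gammatone filterbank."""
    bands = []
    for cf in erb_spaced_cfs():
        b, a = gammatone(cf, "iir", fs=fs)
        bands.append(np.abs(lfilter(b, a, x)))
    return np.mean(bands, axis=0)
```

The sampling rate must exceed twice the highest center frequency (about 7.8 kHz here).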

Simple model without adaptation (OSS)

The next predictor pair, termed OSS, was generated using the auditory model provided in [24], which is based on the model in [17]. The implementation in AMT (function osses2021) was used, and the generated predictors are henceforth referred to as OSS after the first author of the relevant publication [24]. This model consists of an initial headphone and outer ear pre-filter (stage 1), a gammatone filterbank (stage 2), and an approximation of inner hair cell transduction that includes rectification followed by lowpass filtering (stage 3). The next stage of the model consists of adaptation loops (stage 4), which approximate the adaptation properties of the auditory nerve. The initial prefilter was omitted since it is not required for stimuli presented with insert earphones. The adaptation stage was also omitted for this version of the model. Therefore, only stages 2 and 3 were used, and the resulting signals with 31 center frequencies (similar to GT) were averaged together to form the predictor pair.
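Stages 2 and 3 of an OSS-like predictor can be sketched as a gammatone filterbank followed, in each channel, by half-wave rectification and lowpass filtering. The first-order 1 kHz lowpass is an assumption (a common choice in Dau-type models), and the AMT `osses2021` implementation may use different inner hair cell parameters; the function name is hypothetical. The half-wave rectification is what makes the predictors for the original and sign-inverted stimuli differ.

```python
import numpy as np
from scipy.signal import butter, gammatone, lfilter, sosfilt


def oss_like_predictor(x, fs, cfs):
    """Stages 2-3 only: gammatone filterbank, then an inner hair cell stage."""
    # Assumed inner hair cell lowpass: first-order Butterworth at 1 kHz
    ihc_lp = butter(1, 1000.0, btype="lowpass", fs=fs, output="sos")
    channels = []
    for cf in cfs:
        b, a = gammatone(cf, "iir", fs=fs)
        band = lfilter(b, a, x)                         # stage 2: cochlear filtering
        ihc = sosfilt(ihc_lp, np.maximum(band, 0.0))    # stage 3: rectify + lowpass
        channels.append(ihc)
    # Average the channels into a single predictor waveform
    return np.mean(channels, axis=0)
```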

Simple model with adaptation (OSSA)

The adaptation loops (stage 4) of the previous auditory model [24] were now included (i.e., stages 2, 3 and 4 were used). The 31-channel output of the adaptation loops was averaged to generate the pair of predictors. These predictors are henceforth referred to as OSSA (OSS + Adaptation).
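The adaptation stage can be illustrated with a simplified version of the divisive feedback loops of Dau et al. (1996): each loop divides its input by a lowpass-filtered copy of its own output, so a constant input c settles at c^(1/2) after one loop and c^(1/32) after five. The time constants (5, 50, 129, 253 and 500 ms) and the 1e-5 floor are assumed values from that classic model; the overshoot limitation and scaling of the AMT implementation are omitted, so this sketch is not equivalent to `osses2021`. The function name is hypothetical.

```python
import numpy as np


def adaptation_loops(x, fs, taus=(0.005, 0.05, 0.129, 0.253, 0.5), floor=1e-5):
    """Simplified chain of divisive adaptation loops (Dau et al., 1996 style).

    Each loop divides its input by a lowpass-filtered copy of its own output.
    Overshoot limitation and output scaling are omitted.
    """
    x = np.maximum(np.asarray(x, dtype=float), floor)  # keep inputs positive
    coeffs = np.exp(-1.0 / (np.asarray(taus) * fs))     # one-pole lowpass coefficients
    # Start each loop at its steady state for an input equal to `floor`
    state = floor ** (1.0 / 2.0 ** np.arange(1, len(taus) + 1))
    y = np.empty_like(x)
    for n, v in enumerate(x):
        for i, a in enumerate(coeffs):
            v = v / state[i]                             # divide by the loop's state
            state[i] = a * state[i] + (1.0 - a) * v      # lowpass the loop output
        y[n] = v
    return y
```

The per-sample loop is slow for long recordings; it is written this way to keep the recursion explicit. Applied per channel after stage 3 and averaged, it would give an OSSA-like predictor under the same assumptions.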

Complex model (ZIL)

Finally, a more complex auditory model [20] was used to generate predictors, which are henceforth referred to as ZIL after the first author of the relevant publication [20]. This model has recently been used to estimate subcortical TRFs [21] and consists of several stages approximating non-linear cochlear filters, inner and outer hair cell properties, auditory nerve synapses, and adaptation. The implementation in the Python cochlea package [33] was used with 43 auditory nerve fibers with high spontaneous firing rates and center frequencies logarithmically spaced between 125 Hz and 16 kHz, in line with previous work [21]. To speed up computation, an approximation of the power-law adaptation was used [21]. The outputs of this model are the mean firing rates of the auditory nerve fibers, which were averaged to form the final predictor pair.
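As a quick worked check of the ZIL channel layout: 16 kHz / 125 Hz = 128 = 2^7, so 43 logarithmically spaced center frequencies divide 7 octaves into 42 equal steps of 1/6 octave. In NumPy (the cochlea package itself is not called here):

```python
import numpy as np

# 43 center frequencies, logarithmically spaced from 125 Hz to 16 kHz
cfs = np.geomspace(125.0, 16000.0, 43)
# Spacing between neighboring channels, in octaves
step_octaves = np.log2(cfs[1:] / cfs[:-1])
# 16000 / 125 = 128 = 2**7, so 7 octaves / 42 steps = 1/6 octave per step
```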

Temporal response function estimation

TRFs were fit for each predictor using the frequency domain method outlined in previous studies [5, 13] and shown in Eq (1).

$$\mathrm{TRF} = F^{-1}\left\{ \frac{\sum_{i=1}^{N} w_i \, F\{x_i\}^{*} \, F\{y_i\}}{\sum_{i=1}^{N} \frac{1}{N} \, F\{x_i\}^{*} \, F\{x_i\}} \right\} \tag{1}$$

Here, $F$ denotes the Fourier transform, $N$ is the number of trials, $x_i$, $y_i$ and $w_i$ are the predictor, EEG signal and weight for trial $i$, and $^*$ denotes the complex conjugate. The trial weights $w_i$ were set to the reciprocal of the variance of the EEG data of trial $i$, normalized to sum to 1 across trials. In line with prior work [13], this was done to down-weight noisy (high-variance) EEG trials. This frequency domain method results in TRFs with lags from −T/2 to T/2, where T is the data length.
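Eq (1) can be written directly with NumPy FFTs. The sketch below is an illustrative implementation under simplifying assumptions (equal-length trials, and a single predictor per trial; the predictor pair would be fit separately and the two TRFs averaged, as described above); `fit_trf` is a hypothetical name. For a single noise-free trial in which the EEG is a circular convolution of the predictor with a kernel, the estimate reduces to F{y}/F{x} and returns that kernel exactly.

```python
import numpy as np


def fit_trf(predictors, eegs, fs):
    """Frequency-domain TRF following Eq (1), with inverse-variance trial weights.

    predictors, eegs: lists of equal-length 1-D arrays, one pair per trial.
    Returns (lags_in_seconds, trf), with lags running from -T/2 to T/2.
    """
    n = len(predictors)
    w = np.array([1.0 / np.var(y) for y in eegs])
    w /= w.sum()                                  # weights sum to 1 across trials
    X = [np.fft.fft(x) for x in predictors]
    Y = [np.fft.fft(y) for y in eegs]
    num = sum(wi * np.conj(Xi) * Yi for wi, Xi, Yi in zip(w, X, Y))
    den = sum(np.conj(Xi) * Xi for Xi in X) / n   # mean predictor power spectrum
    trf = np.real(np.fft.ifft(num / den))
    trf = np.fft.fftshift(trf)                    # move lag 0 to the center
    lags = (np.arange(trf.size) - trf.size // 2) / fs
    return lags, trf
```

Because the TRF is computed over the full data length, negative lags come for free, which is what makes the -500 to -20 ms noise estimate possible later on.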

Two TRFs were estimated separately for each predictor pair, and then averaged together. These TRFs were then bandpass filtered between 30-1000 Hz using a delay compensated FIR filter and then smoothed using a Hamming window of width 2 ms. The smoothing step was necessary since this unregularized TRF approach resulted in noisy estimates for the OSS and OSSA models (see Discussion). Although smoothing could obscure early subcortical peaks, there were no clear early peaks detected visually in the TRFs without smoothing. Given that incorporating smoothing led to more distinct wave V peaks and cleaner TRFs (less noise in the baseline period), it was used for all further TRF analysis. The TRF segment from -10 to 30 ms was extracted for further analysis. Finally, the baseline activity (mean of the TRF segment from -10 to 0 ms) was subtracted from each TRF.
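The post-processing steps can be sketched as follows, assuming a linear-phase FIR bandpass with 1025 taps (an assumed length) whose group delay is removed by centered convolution, a 2 ms Hamming window normalized to unit sum, and a lag axis in seconds such as the one produced by the frequency-domain fit. The helper name is hypothetical.

```python
import numpy as np
from scipy.signal import fftconvolve, firwin
from scipy.signal.windows import hamming


def postprocess_trf(lags, trf, fs, numtaps=1025):
    """Bandpass, smooth, crop and baseline-correct a TRF (hypothetical helper).

    Convolving with mode="same" and an odd-length symmetric kernel removes the
    (numtaps - 1) / 2 sample group delay of the linear-phase filter.
    """
    bp = firwin(numtaps, [30.0, 1000.0], pass_zero=False, fs=fs)
    out = fftconvolve(trf, bp, mode="same")
    # 2 ms Hamming window (forced to odd length), normalized to unit sum
    win = hamming(int(round(0.002 * fs)) | 1)
    out = fftconvolve(out, win / win.sum(), mode="same")
    # Keep lags from -10 to 30 ms, then subtract the -10 to 0 ms baseline mean
    keep = (lags >= -0.010) & (lags <= 0.030)
    lags_c, out_c = lags[keep], out[keep]
    out_c = out_c - out_c[lags_c < 0].mean()
    return lags_c, out_c
```

The wave V SNR described below uses a noise range of -500 to -20 ms, which lies outside the cropped segment, so that computation needs the filtered TRF before cropping.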

To investigate the effect of data length, TRFs were estimated on a consecutively increasing number of trials (i.e., 2, 3, …, 8 trials, corresponding to 8, 12, …, 32 minutes of data) in the order in which they were presented in the experiment. This simulates TRF estimation as if the experiment had been terminated after a few trials. For each data length, a leave-one-out cross-validation approach was followed, with one trial used as test data to estimate model fits and the remaining trials used to fit the TRF. The TRFs from all cross-validation folds were averaged to form the final TRF for that data length. This resulted in 7 TRFs for each predictor, which allowed the improvement of TRF estimation with increasing data length to be quantified.
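The nested data-length and leave-one-out scheme can be enumerated with a small generator, assuming trials are indexed 0 to 7 in presentation order (each 4 minutes after cropping). The function name is hypothetical.

```python
def data_length_folds(n_total=8, min_trials=2):
    """Yield (n_trials, test_trial, train_trials) in presentation order.

    For each data length the first n_trials trials are used, and each of
    them is held out once (leave-one-out).
    """
    for n_trials in range(min_trials, n_total + 1):
        used = list(range(n_trials))
        for test in used:
            train = [t for t in used if t != test]
            yield n_trials, test, train
```

For 2 to 8 trials this gives 2 + 3 + … + 8 = 35 folds; the TRFs of the folds that share a data length are averaged into one TRF, giving the 7 TRFs per predictor.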

Performance metrics and statistical tests

The goodness of fit of the TRF models was evaluated using prediction correlations. The average TRF across positive and negative predictors, fit on the training data, was used to predict the EEG signal of the test trial by convolving it with the appropriate predictors, and the Pearson correlation between the predicted and the actual EEG signal was then calculated. The correlations across all cross-validation folds were averaged together to form an estimate of the model fit. To estimate the noise floor, a null model was formed by averaging the prediction correlations from TRFs that were fit on circularly shifted predictors (shifts of 30, 60 and 90 seconds), similar to typical null models used in prior work with cortical TRFs [12]. This method preserves the temporal structure of the stimulus while destroying the alignment between the stimulus and the EEG, resulting in an estimate of the noise floor. The same leave-one-out cross-validation approach at each data length was followed for the null models.
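A sketch of the prediction-correlation and null-model computations. Assumptions: the TRF has the same length as the test trial and is stored with lag 0 at index 0 (the layout of the inverse FFT before any shift), so prediction is a circular convolution; and the predictions from the positive and negative predictors are averaged before correlating, which is one reading of "convolving it with the appropriate predictors". All names are hypothetical.

```python
import numpy as np


def predict_eeg(trf_unshifted, predictor):
    """Circularly convolve a TRF (lag 0 at index 0) with a predictor."""
    return np.real(np.fft.ifft(np.fft.fft(trf_unshifted) * np.fft.fft(predictor)))


def prediction_r(trf_unshifted, pred_pos, pred_neg, eeg):
    """Pearson r between the test EEG and the mean prediction from both predictors."""
    yhat = 0.5 * (predict_eeg(trf_unshifted, pred_pos)
                  + predict_eeg(trf_unshifted, pred_neg))
    return np.corrcoef(yhat, eeg)[0, 1]


def shifted_predictors(predictor, fs, shifts_s=(30, 60, 90)):
    """Circularly shifted copies of a predictor, used to fit the null models."""
    return [np.roll(predictor, int(s * fs)) for s in shifts_s]
```

A circular shift keeps the predictor's spectrum and temporal statistics intact while breaking its alignment with the EEG, which is why it yields a noise floor rather than a degraded predictor.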

The most prominent feature of ABR TRFs is the wave V peak, which occurs around 5-10 ms [5, 14, 15]. The amplitude of this wave V peak was used as the primary metric for comparing TRFs from each predictor type. The SNR of the wave V peak was computed similarly to prior work [5]. First, the TRF peak between 5 and 10 ms was automatically detected, and the power in a 5 ms window around the peak was computed as a measure of the signal power S. Next, the noise power N was estimated as the average TRF power in 5 ms windows in the range -500 to -20 ms. Finally, the wave V SNR was computed as SNR = 10 log₁₀(S/N). Since the signal power cannot theoretically be lower than the noise floor (i.e., 0 dB SNR), negative SNRs were considered meaningless and were set to 0 dB. The threshold for detecting meaningful wave V peaks was set at 3 dB. This threshold, though arbitrary, has the intuitive meaning that the signal power is twice the noise power (10 log₁₀ 2 ≈ 3 dB). Indeed, individual TRFs with more than 3 dB SNR showed visually distinct wave V peaks, confirming that this value was a reasonable threshold for wave V peak detection.
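The wave V SNR can be sketched as below. The peak is taken as the largest positive TRF value between 5 and 10 ms, the 5 ms signal window is centered on it, and power is the mean squared TRF value; averaging power over contiguous 5 ms windows in the noise range is then equivalent to the mean squared value over the whole range, since 96 windows of 5 ms tile -500 to -20 ms exactly. These are assumptions about details the text does not spell out, and the function name is hypothetical.

```python
import numpy as np


def wave_v_snr(lags, trf, fs):
    """Wave V SNR in dB; lags in seconds, trf uncropped. Negative SNRs clip to 0 dB."""
    # Peak: largest positive TRF value between 5 and 10 ms
    search = np.flatnonzero((lags >= 0.005) & (lags <= 0.010))
    peak = search[np.argmax(trf[search])]
    # Signal: mean power in a 5 ms window centered on the peak
    half = int(round(0.0025 * fs))
    signal_power = np.mean(trf[peak - half:peak + half + 1] ** 2)
    # Noise: mean power between -500 and -20 ms
    noise = (lags >= -0.500) & (lags <= -0.020)
    noise_power = np.mean(trf[noise] ** 2)
    if signal_power <= 0.0:
        return 0.0
    return max(10.0 * np.log10(signal_power / noise_power), 0.0)
```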

The amplitudes and latencies of the TRF wave V for each predictor and each participant were also extracted. The consistency of individual wave V peaks was investigated using correlations of wave V amplitudes and latencies across the different predictors.

Statistical analysis was performed using non-parametric tests since the wave V SNRs have a skewed distribution with some TRFs having 0 dB SNRs (i.e., no clear wave V peaks for RS predictor). Non-parametric small sample two-tailed Wilcoxon signed rank tests with Holm Bonferroni multiple comparisons correction were used to test pairwise differences in wave V SNR across predictors. Two participants were excluded from the statistical tests since they did not have data for the full 32 minutes. The group medians, test statistics (rank sums above zero) and p-values are reported.
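A sketch of the pairwise tests with `scipy.stats.wilcoxon` and a hand-written Holm-Bonferroni step-down correction (implemented directly so the block needs nothing beyond NumPy and SciPy). One caveat: for a two-sided test SciPy reports the smaller of the two signed-rank sums as its statistic, which may not match the "rank sums above zero" reported in the text. Function names are hypothetical.

```python
import itertools

import numpy as np
from scipy.stats import wilcoxon


def holm(pvals):
    """Holm-Bonferroni step-down adjusted p-values, returned in input order."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    m = p.size
    # Multiply the k-th smallest p-value by (m - k), then enforce monotonicity
    adj_sorted = np.maximum.accumulate((m - np.arange(m)) * p[order])
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adj_sorted, 1.0)
    return adjusted


def pairwise_wilcoxon(snr):
    """Two-sided Wilcoxon signed-rank tests for every pair of predictors.

    snr: dict mapping predictor name -> per-participant wave V SNRs (same order).
    """
    pairs = list(itertools.combinations(snr, 2))
    results = [wilcoxon(snr[a], snr[b]) for a, b in pairs]
    p_adj = holm([r.pvalue for r in results])
    return {pair: (r.statistic, p) for pair, r, p in zip(pairs, results, p_adj)}
```

With five predictors this runs 10 paired tests, and the Holm step scales the smallest p-value by 10, the next by 9, and so on.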

Results

Subcortical TRFs for predictors derived from auditory models

A comparison of the computational time required to generate each predictor and their correlations with the simplest (RS) and the most complex (ZIL) models are provided in Table 1. The computations were performed on an AMD Ryzen 7 PRO 5850U 1.9 GHz CPU with 32 GB RAM. Note that even the approximate ZIL model is more than 50 times slower than the others.
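The speed-ups implied by Table 1 can be checked directly from the per-second computation times (values copied from the table):

```python
# Computation time per 1 s of input, taken from Table 1
times = {"GT": 0.0521, "OSS": 0.0563, "OSSA": 0.0680, "ZIL": 4.1208}
ratios = {name: times["ZIL"] / t for name, t in times.items() if name != "ZIL"}
print({name: round(r, 1) for name, r in ratios.items()})
# {'GT': 79.1, 'OSS': 73.2, 'OSSA': 60.6}
```

This gives roughly 60 to 80 times, consistent with "more than 50 times slower". Assuming run time scales linearly with input length (an assumption), one ZIL predictor for 32 minutes of audio would take about 1920 × 4.12 s ≈ 2.2 hours, versus about 2 minutes for OSSA.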

The grand average TRFs for the five predictors over all 24 participants are shown in the left panel of Fig 2. The TRFs for all predictors show clear wave V peaks. The wave V peak latency varies slightly across predictor types, even after removing lags arising from the models themselves by shifting each predictor to have the maximum correlation with RS (see Discussion). The right panel shows the model fits for each predictor as well as the corresponding null model fits. Both OSSA and ZIL show an improvement in model fits over the other three models. All individual TRFs are shown in Fig 3, highlighting the consistent wave V peak across all participants.

Fig 3 — For visual clarity, only TRFs for the RS (gray), OSSA (magenta) and ZIL (red) predictors are shown. The model fit prediction correlation and wave V SNR for the ZIL TRF are shown above each subplot. Wave V peaks can be seen for all participants.

Interaction of data length and predictor type on subcortical TRFs

The amount of data required for estimating subcortical TRFs was investigated by fitting TRFs on an increasing number of 4-minute trials. Two metrics, the model fit and the wave V SNR, were used to compare TRFs across predictors and data lengths, as shown in Fig 4. Almost all participants reached above-zero prediction correlations and wave V SNRs above 3 dB with 12 minutes of data for the OSSA and ZIL models. Two trends can be observed in Fig 4: 1) models with filterbanks (GT, OSS) produce wave V estimates with higher SNR than RS, and 2) models with adaptation and level dependency (OSSA, ZIL) have higher wave V SNR than models with filterbanks but without adaptation. Interestingly, the wave V SNR and prediction correlation of the simpler OSSA model were comparable to those of the more complex ZIL model. Statistical tests were performed on the wave V SNR for 32 minutes of data using pairwise non-parametric two-tailed Wilcoxon signed rank tests with Holm-Bonferroni correction. RS had significantly lower wave V SNRs (median 6.55 dB) than all other predictors (GT vs. RS T = 32, p = 0.005; OSS vs. RS T = 49, p = 0.031; OSSA vs. RS T = 0, p < 0.001; ZIL vs. RS T = 0, p < 0.001). Wave V SNRs for GT (median 9.38 dB) were not significantly different from those for OSS (median 9.14 dB) (T = 59, p = 0.055). The wave V SNRs for OSSA (median 13.58 dB) and ZIL (median 13.99 dB) were larger than those for OSS (OSSA vs. OSS T = 0, p < 0.001; ZIL vs. OSS T = 2, p < 0.001) or GT (OSSA vs. GT T = 0, p < 0.001; ZIL vs. GT T = 2, p < 0.001). Critically, there was no significant difference in wave V SNRs between OSSA and ZIL (T = 126, p > 0.5), indicating that the simpler OSSA model provided wave V peak amplitudes comparable to those of the complex ZIL model.

Fig 4 — **Top**: Change in model fit prediction correlation with data length. Prediction correlations are shown after subtracting the corresponding null models. **Bottom**: Change in wave V SNR with data length. The threshold of 3 dB SNR is shown as a dashed line. All boxplots are shown across participants. Model fits are above zero for OSSA and ZIL for all participants after 12 minutes of data. Wave V SNRs are above 3 dB for almost all participants for OSSA and ZIL after 12 minutes of data. The wave V SNRs at 32 minutes of data for OSSA and ZIL were significantly larger than those for RS, GT and OSS (see Results). Crucially, OSSA wave V SNRs were not significantly different from those of ZIL.

Individual amplitudes and latencies of wave V

Finally, the TRFs for the OSSA and ZIL predictors were compared, as shown in Fig 5, to further investigate their similarity. The OSSA model showed a high degree of correlation with the ZIL model at the single-participant level (Pearson correlation of OSSA vs. ZIL: r = 0.865 for wave V SNR, r = 0.913 for peak latencies, r = 0.934 for peak amplitudes, and r = 0.852 for model fits). This confirms that both models provide TRF wave V estimates that are consistent for each participant. However, the ZIL model has a shorter mean latency, also seen in Figs 2 and 3 (see Discussion), and the OSSA model appears to have slightly smaller wave V peaks than the ZIL model. Nevertheless, this correlation analysis indicates that the simpler OSSA model may provide a good trade-off between computational efficiency and reliable wave V peaks. In addition, individual wave V SNRs and model fits are only moderately correlated, as seen in Fig 5 (OSSA r = 0.485, ZIL r = 0.538), indicating that higher model fits may not always lead to higher wave V peaks. Therefore, the appropriate metric should be chosen based on whether the goal is to detect wave V or to evaluate model estimation quality.

Discussion

In this work, we compared the suitability of several predictors for estimating subcortical TRFs to continuous speech. We replicated prior work and showed that including non-linearities in the predictor using auditory models leads to improved linear TRF estimates. Our results indicate that the addition of filterbanks and adaptation stages to the predictor models greatly improves estimation of wave V in the TRFs over the rectified speech predictor. Critically, we show that even simpler models may allow for robust model fits and wave V peaks using around 12 minutes of data. These simple models give TRFs that are comparable to a more complex model, even though the complex model is more than 50 times slower to compute. However, it must be noted that OSSA wave V SNRs were comparable to ZIL only after smoothing the TRFs using a 2 ms Hamming window (see Methods), perhaps because the OSSA TRFs were noisier. Other methods such as regularized regression, which is widely used for cortical TRFs [9, 34, 35], or direct estimation of TRF peaks [36] may be able to overcome this issue. Nevertheless, our correlation analysis revealed that these smoothed TRFs resulted in wave V peak amplitudes and latencies for OSSA and ZIL that were consistent across participants.

The auditory models considered in this work can be categorized into three groups: rectification only (RS), models with filterbanks but without adaptation (GT and OSS), and models with adaptation (OSSA, ZIL). Models with filterbanks provided an improvement in TRFs over rectification alone, and models with adaptation provided the best TRFs. These results are expected, since including these non-linearities of the peripheral auditory system in the predictor should lead to better linear TRF models.

However, it is surprising that the simpler OSSA model performs as well as the more complex ZIL model. The OSSA model is a functional model that simulates behavioral results, while the ZIL model is a phenomenological model that simulates biophysical properties of the neural system [37]. Indeed, the ZIL model has several stages that are absent in the OSSA model, such as adaptive filterbanks that simulate inner and outer hair cell activity, power-law adaptation, and models of the auditory synapse. However, our results suggest that such complex simulations may not be necessary for estimating reliable TRF wave V peaks. This does not imply that the OSSA model simulates the auditory system as accurately as the ZIL model for other types of metrics (see [37] for a more detailed comparison of each model), only that it may suffice for accounting for peripheral non-linearities in TRF estimation in a computationally efficient manner.

This work does not provide an exhaustive list of auditory models or predictors for estimating subcortical TRFs. We also do not directly compare the performance of the auditory models themselves (see [37]), but only evaluate their suitability to generate predictors for subcortical TRFs. Several other models (e.g., [38, 39]) could be utilized to generate predictors, although our work suggests that simple models are reliable enough to fit TRFs with clear wave V peaks.

It must be noted that although the wave V peak was used as the primary metric of performance, the conventional click ABR consists of several other morphological features [1]. The wave V peak was selected here both to be consistent with prior work [5, 15, 21] and because it was the only consistent feature that was visually detected in all participants (see Fig 3). Conventional click-ABR studies show that the early peaks of the ABR weaken with increasing stimulus rate, and that wave V is the most consistently detected peak across different stimulus rates and amplitudes [2]. Therefore, these early peaks may be more difficult to detect using a continuous stimulus like speech, although one study has shown that it may be possible for some participants [21]. Future work could explore whether improved predictors or TRF methods could help detect these early subcortical peaks.

TRFs using the ZIL predictor had shorter wave V peak latencies (see Figs 2 and 5), even after accounting for modelling delays by shifting the ZIL predictor to have the maximum correlation with the RS predictor. One possibility is that the ZIL wave V is earlier because the ZIL model better incorporates peripheral non-linearities. The ZIL predictor may then resemble intermediate signal representations in the auditory pathway closer to the wave V generators, which could in turn result in an earlier estimated wave V. Further investigation is needed to disentangle the effects of lags introduced by the auditory peripheral models and to ascertain whether these latency differences are meaningful properties of the ABR.
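The alignment step described here, shifting the ZIL predictor to the lag of maximum correlation with the RS predictor, can be sketched as follows. The lag search range, the use of Pearson correlation over the overlapping samples and zero-padding at the edge are assumptions, and `best_lag` and `shift_predictor` are hypothetical helper names.

```python
import numpy as np


def best_lag(reference, predictor, max_lag):
    """Lag L (samples) maximizing corr(reference[n], predictor[n + L]) over |L| <= max_lag."""
    n = len(reference)
    lags = np.arange(-max_lag, max_lag + 1)
    corrs = []
    for lag in lags:
        # Keep only the samples where both signals overlap at this lag.
        r = reference[max(0, -lag): n - max(0, lag)]
        p = predictor[max(0, lag): n - max(0, -lag)]
        corrs.append(np.corrcoef(r, p)[0, 1])
    return int(lags[np.argmax(corrs)])


def shift_predictor(predictor, lag):
    """Advance the predictor by lag samples (delay if negative), zero-padding the exposed edge."""
    n = len(predictor)
    aligned = np.zeros_like(predictor)
    if lag >= 0:
        aligned[:n - lag] = predictor[lag:]
    else:
        aligned[-lag:] = predictor[:n + lag]
    return aligned


# Usage: a synthetic "ZIL-like" predictor that lags the reference by 25 samples.
rng = np.random.default_rng(0)
rs = rng.standard_normal(5000)
zil_like = np.concatenate([np.zeros(25), rs[:-25]])
lag = best_lag(rs, zil_like, max_lag=50)
aligned = shift_predictor(zil_like, lag)
```

Because a positive lag removes a constant modelling delay, any remaining latency difference in the TRF is not explained by a simple shift, which is why the text treats it as a question for further investigation.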

Finally, this work only analyses subcortical responses to speech stimuli. Recent work indicates that complex auditory model predictors (ZIL) provide significant advantages over rectified speech when estimating subcortical TRFs for music [21]. Future work could investigate the suitability of simpler auditory model predictors for estimating TRFs for non-speech stimuli.

Conclusion

This work provides a systematic comparison of predictors derived from auditory peripheral models for estimating subcortical TRFs to continuous speech. Our results indicate that simple models with filterbanks and adaptation loops may suffice to estimate reliable subcortical TRFs. Such efficient algorithms may pave the way toward the use of more ecologically relevant natural speech for investigating hearing impairment and for future assistive hearing technology.
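For readers unfamiliar with how a predictor is turned into a subcortical TRF, the sketch below shows regularized frequency-domain deconvolution, one common way to estimate such responses. It is not necessarily the estimator used in this study; the regularization scale `reg`, the lag window and the function name `estimate_trf_fft` are assumptions, and convolution is treated as circular, which is only an approximation for finite recordings.

```python
import numpy as np


def estimate_trf_fft(predictor, eeg, fs, tmin=-0.010, tmax=0.030, reg=1e-3):
    """Regularized frequency-domain deconvolution of EEG by a predictor.

    predictor, eeg: arrays of shape (n_epochs, n_samples) at the same sampling rate.
    Returns lag times (s) and the TRF on [tmin, tmax).
    """
    predictor = np.atleast_2d(predictor)
    eeg = np.atleast_2d(eeg)
    n = predictor.shape[-1]
    X = np.fft.rfft(predictor, axis=-1)
    Y = np.fft.rfft(eeg, axis=-1)
    numerator = np.sum(np.conj(X) * Y, axis=0)            # cross-spectrum summed over epochs
    denominator = np.sum(np.abs(X) ** 2, axis=0)          # predictor power summed over epochs
    denominator = denominator + reg * denominator.mean()  # ridge-like term (assumed scale)
    kernel = np.fft.irfft(numerator / denominator, n=n)   # circular impulse response
    lags = np.arange(int(round(tmin * fs)), int(round(tmax * fs)))
    return lags / fs, kernel[lags % n]                    # negative lags wrap to the end


# Usage: recover a known kernel from noise-free synthetic data.
fs = 1000
n = 4096
rng = np.random.default_rng(0)
pred = rng.standard_normal((4, n))
true_kernel = np.zeros(n)
true_kernel[5], true_kernel[8] = 1.0, -0.5  # peaks at 5 ms and 8 ms
eeg = np.fft.irfft(np.fft.rfft(pred, axis=-1) * np.fft.rfft(true_kernel), n=n, axis=-1)
times, trf = estimate_trf_fft(pred, eeg, fs)
```

Summing spectra across epochs before dividing stabilizes the estimate at frequencies where any single epoch has little predictor power.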

Acknowledgments

The authors are grateful to all participants for their participation in this study.

Data Availability

There are ethical restrictions on sharing the data set. The consent given by participants at the outset of this study did not explicitly detail sharing of the data in any format; this limitation is in keeping with the EU General Data Protection Regulation and is imposed by the Research Ethics Committees of the Capital Region of Denmark. Due to this regulation, and because the data were collected from a small number of participants, the dataset cannot be fully anonymized and hence cannot be shared. As a non-author contact point, data requests can be sent to Claus Nielsen, Eriksholm research operations manager, at clni@eriksholm.com.

Funding Statement

JPK has received funding from the William Demant Foundation (Case no. 20-0480). Oticon A/S provided support in the form of salaries for authors FLB, KE, HI, EA, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Picton TW, Hillyard SA, Krausz HI, Galambos R. Human auditory evoked potentials. I: Evaluation of components. Electroencephalography and Clinical Neurophysiology. 1974;36:179–190. doi: 10.1016/0013-4694(74)90156-4
2. Picton TW. Human Auditory Evoked Potentials. Plural Publishing; 2010.
3. Warren MP. The Auditory Brainstem Response in Pediatrics. Otolaryngologic Clinics of N America. 1989;22(3):473–500. doi: 10.1016/S0030-6665(20)31412-2
4. Skoe E, Kraus N. Auditory brainstem response to complex sounds: a tutorial. Ear and Hearing. 2010;31(3):302–324. doi: 10.1097/AUD.0b013e3181cdb272
5. Maddox RK, Lee AKC. Auditory Brainstem Responses to Continuous Natural Speech in Human Listeners. eNeuro. 2018;5(1). doi: 10.1523/ENEURO.0441-17.2018
6. Etard O, Kegler M, Braiman C, Forte AE, Reichenbach T. Decoding of selective attention to continuous speech from the human auditory brainstem response. NeuroImage. 2019;200:1–11. doi: 10.1016/j.neuroimage.2019.06.029
7. Lalor EC, Power AJ, Reilly RB, Foxe JJ. Resolving Precise Temporal Processing Properties of the Auditory System Using Continuous Stimuli. Journal of Neurophysiology. 2009;102(1):349–359. doi: 10.1152/jn.90896.2008
8. Di Liberto GM, O’Sullivan JA, Lalor EC. Low-Frequency Cortical Entrainment to Speech Reflects Phoneme-Level Processing. Current Biology. 2015;25(19):2457–2465. doi: 10.1016/j.cub.2015.08.030
9. Alickovic E, Lunner T, Gustafsson F, Ljung L. A tutorial on auditory attention identification methods. Frontiers in Neuroscience. 2019; p. 153. doi: 10.3389/fnins.2019.00153
10. Brodbeck C, Presacco A, Simon JZ. Neural source dynamics of brain responses to continuous stimuli: Speech processing from acoustics to comprehension. NeuroImage. 2018;172:162–174. doi: 10.1016/j.neuroimage.2018.01.042
11. Kulasingham JP, Joshi NH, Rezaeizadeh M, Simon JZ. Cortical Processing of Arithmetic and Simple Sentences in an Auditory Attention Task. Journal of Neuroscience. 2021;41(38):8023–8039. doi: 10.1523/JNEUROSCI.0269-21.2021
12. Kulasingham JP, Brodbeck C, Presacco A, Kuchinsky SE, Anderson S, Simon JZ. High gamma cortical processing of continuous speech in younger and older listeners. NeuroImage. 2020;222:117291. doi: 10.1016/j.neuroimage.2020.117291
13. Polonenko MJ, Maddox RK. Exposing distinct subcortical components of the auditory brainstem response evoked by continuous naturalistic speech. eLife. 2021;10:e62329. doi: 10.7554/eLife.62329
14. Bachmann FL, MacDonald E, Hjortkjær J. A comparison of two measures of subcortical responses to ongoing speech: Preliminary results. Proceedings of the International Symposium on Auditory and Audiological Research. 2019;7:461–468.
15. Bachmann FL, MacDonald EN, Hjortkjær J. Neural Measures of Pitch Processing in EEG Responses to Running Speech. Frontiers in Neuroscience. 2021;15:738408. doi: 10.3389/fnins.2021.738408
16. Kegler M, Weissbart H, Reichenbach T. The neural response at the fundamental frequency of speech is modulated by word-level acoustic and linguistic information. Frontiers in Neuroscience. 2022;16. doi: 10.3389/fnins.2022.915744
17. Dau T, Püschel D, Kohlrausch A. A quantitative model of the “effective” signal processing in the auditory system. I. Model structure. The Journal of the Acoustical Society of America. 1996;99(6):3615–3622. doi: 10.1121/1.414959
18. Dau T, Püschel D, Kohlrausch A. A quantitative model of the “effective” signal processing in the auditory system. II. Simulations and measurements. The Journal of the Acoustical Society of America. 1996;99(6):3623–3631. doi: 10.1121/1.414959
19. Saiz-Alía M, Reichenbach T. Computational modeling of the auditory brainstem response to continuous speech. Journal of Neural Engineering. 2020;17(3):036035. doi: 10.1088/1741-2552/ab970d
20. Zilany MSA, Bruce IC, Carney LH. Updated parameters and expanded simulation options for a model of the auditory periphery. The Journal of the Acoustical Society of America. 2014;135(1):283–286. doi: 10.1121/1.4837815
21. Shan T, Cappelloni MS, Maddox RK. Subcortical responses to music and speech are alike while cortical responses diverge. Scientific Reports. 2024;14(1):789. doi: 10.1038/s41598-023-50438-0
22. Lindboom E, Nidiffer A, Carney LH, Lalor EC. Incorporating models of subcortical processing improves the ability to predict EEG responses to natural speech. Hearing Research. 2023;433:108767. doi: 10.1016/j.heares.2023.108767
23. Biesmans W, Das N, Francart T, Bertrand A. Auditory-Inspired Speech Envelope Extraction Methods for Improved EEG-Based Auditory Attention Detection in a Cocktail Party Scenario. IEEE transactions on neural systems and rehabilitation engineering. 2017;25(5):402–412. doi: 10.1109/TNSRE.2016.2571900
24. Osses Vecchi A, Kohlrausch A. Perceptual similarity between piano notes: Simulations with a template-based perception model. The Journal of the Acoustical Society of America. 2021;149(5):3534. doi: 10.1121/10.0004818
25. Zilany MSA, Bruce IC, Nelson PC, Carney LH. A phenomenological model of the synapse between the inner hair cell and auditory nerve: long-term adaptation with power-law dynamics. The Journal of the Acoustical Society of America. 2009;126(5). doi: 10.1121/1.3238250
26. Geirnaert S, Vandecappelle S, Alickovic E, de Cheveigne A, Lalor E, Meyer BT, et al. Electroencephalography-Based Auditory Attention Decoding: Toward Neurosteered Hearing Devices. IEEE Signal Processing Magazine. 2021;38(4):89–102. doi: 10.1109/MSP.2021.3075932
27. Abdala C, Folsom RC. The development of frequency resolution in humans as revealed by the auditory brain-stem response recorded with notched-noise masking. The Journal of the Acoustical Society of America. 1995;98(2):921–930. doi: 10.1121/1.414350
28. Akhoun I, Moulin A, Jeanvoine A, Ménard M, Buret F, Vollaire C, et al. Speech auditory brainstem response (speech ABR) characteristics depending on recording conditions, and hearing status: an experimental parametric study. Journal of Neuroscience Methods. 2008;175(2):196–205. doi: 10.1016/j.jneumeth.2008.07.026
29. Riazi M, Ferraro JA. Observations on mastoid versus ear canal recorded cochlear microphonic in newborns and adults. Journal of the American Academy of Audiology. 2008;19(1):46–55. doi: 10.3766/jaaa.19.1.5
30. Brodbeck C, Das P, Gillis M, Kulasingham JP, Bhattasali S, Gaston P, et al. Eelbrain, a Python toolkit for time-continuous analysis with temporal response functions. eLife. 2023;12:e85012. doi: 10.7554/eLife.85012
31. Patterson RD, Nimmo-Smith I, Holdsworth J, Rice P. An efficient auditory filterbank based on the gammatone function. In: a meeting of the IOC Speech Group on Auditory Modelling at RSRE. vol. 2; 1987.
32. Majdak P, Hollomey C, Baumgartner R. AMT 1.x: A toolbox for reproducible research in auditory modeling. Acta Acustica. 2022;6:19. doi: 10.1051/aacus/2022011
33. Rudnicki M, Schoppe O, Isik M, Völk F, Hemmert W. Modeling auditory coding: from sound to spikes. Cell and Tissue Research. 2015;361(1):159–175. doi: 10.1007/s00441-015-2202-z
34. Wong DD, Fuglsang SA, Hjortkjær J, Ceolini E, Slaney M, De Cheveigne A. A comparison of regularization methods in forward and backward models for auditory attention decoding. Frontiers in Neuroscience. 2018;12:531. doi: 10.3389/fnins.2018.00531
35. Crosse MJ, Zuk NJ, Di Liberto GM, Nidiffer AR, Molholm S, Lalor EC. Linear modeling of neurophysiological responses to speech and other continuous stimuli: methodological considerations for applied research. Frontiers in Neuroscience. 2021; p. 1350. doi: 10.3389/fnins.2021.705621
36. Kulasingham JP, Simon JZ. Algorithms for Estimating Time-Locked Neural Response Components in Cortical Processing of Continuous Speech. IEEE Transactions on Biomedical Engineering. 2023;70(1):88–96. doi: 10.1109/TBME.2022.3185005
37. Vecchi AO, Varnet L, Carney LH, Dau T, Bruce IC, Verhulst S, et al. A comparative study of eight human auditory models of monaural processing. Acta Acustica. 2022;6:17. doi: 10.1051/aacus/2022008
38. Relaño-Iborra H, Zaar J, Dau T. A speech-based computational auditory signal processing and perception model. The Journal of the Acoustical Society of America. 2019;146(5):3306–3317. doi: 10.1121/1.5129114
39. Verhulst S, Bharadwaj HM, Mehraei G, Shera CA, Shinn-Cunningham BG. Functional modeling of the human auditory brainstem response to broadband stimulation. The Journal of the Acoustical Society of America. 2015;138(3):1637–1659. doi: 10.1121/1.4928305

PLoS One. doi: 10.1371/journal.pone.0297826.r001

Decision Letter 0

Diego A Forero

8 Nov 2023

PONE-D-23-31966

Predictors for estimating subcortical EEG responses to continuous speech

PLOS ONE

Dear Dr. Kulasingham,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

As Academic Editor, I agree with both reviewers and their suggestions for addressing several points in the manuscript.

Please submit your revised manuscript by Dec 23 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Diego A. Forero, MD; PhD

Academic Editor

PLOS ONE

Journal Requirements:

Additional Editor Comments:

I agree with both reviewers in their suggestions of several points to be addressed.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this paper, the authors investigated the predictors for estimating the subcortical EEG response to continuous speech. This study provides a method to model and detect the subcortical processing of continuous speech. Even though this paper has some merits and is interesting, I still feel confused about several points that need to be addressed by the authors in details.

1. The abstract has not provided numerical results that make the audiences clear after glancing the abstract.

2. Some important concepts are not clear and confusing. “Predictors” is what I concern most. TRF is recognized as predictor derived from the stimulus, but the authors also “compare predictors … for estimating TRFs”. Moreover, “rectified speech waveform as a predictor”. Why the waveform can be predictor also? It is really confusing me. It is strongly suggested to clearly define the “predictor”.

3. Some details are missing. Which ear was given stimuli? How to prove the speech segments convey the same energy, or power (72 dB SPL), as the compared methods? From the spectrogram? It is said that the electrodes were placed on the mastoids and earlobes. What’s the usage of the electrodes on earlobes? There is no information for this. How the speech stimuli were given to the subjects? In what manner? What are the parameters? What is the difference between Rectified Speech and the one in [5]? What does they look like? It is suggested to provide some figures to show the difference between RS, GT, OSS, OSSA, and ZIL.

4. More details should be provided for the model fit. Since this is an important metric for evaluation. Also, how these aforementioned methods derived from the auditory model should be clarified.

5. I am wondering the cut-off frequency of highpass filter is appropriate or not. First, after filtering, is the speech sound naturally? Second, in PTA, we know that the 500 Hz was also tested for the human being. Also, in tone-burst ABR, 500 Hz could also induce highly recognized ABR signal. It is appropriate or not to use 1kHz highpass filter to process the audio should be investigated.

6. Grounded metal box can eliminate the electromagnetic noise. But how can this avoid stimulus artifact? Please provide more details about this. For click, stimulus polarity alternation can avoid the stimulus artifact, how can speech stimulus make it, since from the results I can hardly see the stimulus artifacts.

7. The abbreviation should be rigorous. Simple model without adaptation (OSS). I don’t think OSS could be the abbr. of this phrase. So do OSSA, ZIL.

8. For figure 1, the authors should provide an explanation for the markers in the figures, like what the circles mean? Also, they should give the SD of the TRFs, and so does figure 2.

9. For figure 3, the statistical analyses should be marked out in the figure.

10. For the whole manuscript, there are discussing wave V, I think indeed they are studying ABR? If so, why the put the results for 30ms? If not, what other indexes they used?

This works made some efforts on this field but need to provide more comprehensive and detailed results for supporting their statements.

Reviewer #2: The submitted article compares 5 computational methods for estimating the subcortical TRF based on Wave V of the ABR from EEG during a passive listening task. Each model is sufficiently described in detail, and the results/conclusions indicate that while the most complex method based on a well-known auditory-nerve model (Zilany et al. 2009) was the best predictor of the TRF, it was computationally much higher than a simpler model with adaptation (OSSA), which also performed well. This leads the author to suggest that practical consideration, for example in assistive listening devices, would perhaps benefit from this computational-performance trade off.

The article is free of any glaring issues, though i have a few questions/comments that could be helpful to the reader.

1) the choice of 3 dB threshold for detecting a meaningful wave V is seemingly arbitrary and not discussed. how might the results change (if at all) if this choice were more liberal or conservative?

2) it looks like a few subjects (P01, P07, and P08) have very strong TRFs relative to the others in Figure 2. I may have missed it, but how was this taken into account when performing a grand average? That is, were their relative reponses between computational methods in any way over represented in the grand average, or was their a normalization process used?

3) In figure 3, the rightmost panel of the top row (prediction correlations) should be identical to the boxplots in Figure 1, both of which considered the full 32 minutes? It appears that there are slight differences, especially in the outliers, so this was confusing to me.

4) were tests of statistical significance run on the correlations presented in Fig 4?

5) check the references - some do not contain the publication year, like Picton and Zilany et al.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Feb 8;19(2):e0297826. doi: 10.1371/journal.pone.0297826.r002

Author response to Decision Letter 0

12 Dec 2023

We have responded to all reviewer and editor comments in our response to reviewers pdf document.

Attachment

Submitted filename: response_to_reviewers.pdf

PLoS One. doi: 10.1371/journal.pone.0297826.r003

Decision Letter 1

Diego A Forero

15 Jan 2024

Predictors for estimating subcortical EEG responses to continuous speech

PONE-D-23-31966R1

Dear Dr. Kulasingham,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Diego A. Forero, MD; PhD

Academic Editor

PLOS ONE

Additional Editor Comments:

Both reviewers recommend the acceptance of the revised manuscript.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Reviewer #1: The authors have addressed the question i am confused. And i think the manuscript is now ready to be accepted.

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

PLoS One. doi: 10.1371/journal.pone.0297826.r004

Acceptance letter

Diego A Forero

30 Jan 2024

PONE-D-23-31966R1

PLOS ONE

Dear Dr. Kulasingham,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Diego A. Forero

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Attachment

Submitted filename: response_to_reviewers.pdf

Data Availability Statement

[pone.0297826.ref001] 1. Picton TW, Hillyard SA, Krausz HI, Galambos R. Human auditory evoked potentials. I: Evaluation of components. Electroencephalography and Clinical Neurophysiology. 1974;36:179–190. doi: 10.1016/0013-4694(74)90156-4 [DOI] [PubMed] [Google Scholar]

[pone.0297826.ref002] 2. Picton TW. Human Auditory Evoked Potentials. Plural Publishing; 2010. [Google Scholar]

[pone.0297826.ref003] 3. Warren MP. The Auditory Brainstem Response in Pediatrics. Otolaryngologic Clinics of N America. 1989;22(3):473–500. doi: 10.1016/S0030-6665(20)31412-2 [DOI] [PubMed] [Google Scholar]

[pone.0297826.ref004] 4. Skoe E, Kraus N. Auditory brainstem response to complex sounds: a tutorial. Ear and Hearing. 2010;31(3):302–324. doi: 10.1097/AUD.0b013e3181cdb272 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0297826.ref005] 5. Maddox RK, Lee AKC. Auditory Brainstem Responses to Continuous Natural Speech in Human Listeners. eNeuro. 2018;5(1). doi: 10.1523/ENEURO.0441-17.2018 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0297826.ref006] 6. Etard O, Kegler M, Braiman C, Forte AE, Reichenbach T. Decoding of selective attention to continuous speech from the human auditory brainstem response. NeuroImage. 2019;200:1–11. doi: 10.1016/j.neuroimage.2019.06.029 [DOI] [PubMed] [Google Scholar]

[pone.0297826.ref007] 7. Lalor EC, Power AJ, Reilly RB, Foxe JJ. Resolving Precise Temporal Processing Properties of the Auditory System Using Continuous Stimuli. Journal of Neurophysiology. 2009;102(1):349–359. doi: 10.1152/jn.90896.2008 [DOI] [PubMed] [Google Scholar]

[pone.0297826.ref008] 8. Di Liberto GM, O’Sullivan JA, Lalor EC. Low-Frequency Cortical Entrainment to Speech Reflects Phoneme-Level Processing. Current Biology. 2015;25(19):2457–2465. doi: 10.1016/j.cub.2015.08.030 [DOI] [PubMed] [Google Scholar]

[pone.0297826.ref009] 9. Alickovic E, Lunner T, Gustafsson F, Ljung L. A tutorial on auditory attention identification methods. Frontiers in Neuroscience. 2019; p. 153. doi: 10.3389/fnins.2019.00153 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0297826.ref010] 10. Brodbeck C, Presacco A, Simon JZ. Neural source dynamics of brain responses to continuous stimuli: Speech processing from acoustics to comprehension. NeuroImage. 2018;172:162–174. doi: 10.1016/j.neuroimage.2018.01.042 [DOI] [PMC free article] [PubMed] [Google Scholar]

11. Kulasingham JP, Joshi NH, Rezaeizadeh M, Simon JZ. Cortical Processing of Arithmetic and Simple Sentences in an Auditory Attention Task. Journal of Neuroscience. 2021;41(38):8023–8039. doi: 10.1523/JNEUROSCI.0269-21.2021

12. Kulasingham JP, Brodbeck C, Presacco A, Kuchinsky SE, Anderson S, Simon JZ. High gamma cortical processing of continuous speech in younger and older listeners. NeuroImage. 2020;222:117291. doi: 10.1016/j.neuroimage.2020.117291

13. Polonenko MJ, Maddox RK. Exposing distinct subcortical components of the auditory brainstem response evoked by continuous naturalistic speech. eLife. 2021;10:e62329. doi: 10.7554/eLife.62329

14. Bachmann FL, MacDonald E, Hjortkjær J. A comparison of two measures of subcortical responses to ongoing speech: Preliminary results. Proceedings of the International Symposium on Auditory and Audiological Research. 2019;7:461–468.

15. Bachmann FL, MacDonald EN, Hjortkjær J. Neural Measures of Pitch Processing in EEG Responses to Running Speech. Frontiers in Neuroscience. 2021;15:738408. doi: 10.3389/fnins.2021.738408

16. Kegler M, Weissbart H, Reichenbach T. The neural response at the fundamental frequency of speech is modulated by word-level acoustic and linguistic information. Frontiers in Neuroscience. 2022;16. doi: 10.3389/fnins.2022.915744

17. Dau T, Püschel D, Kohlrausch A. A quantitative model of the “effective” signal processing in the auditory system. I. Model structure. The Journal of the Acoustical Society of America. 1996;99(6):3615–3622. doi: 10.1121/1.414959

18. Dau T, Püschel D, Kohlrausch A. A quantitative model of the “effective” signal processing in the auditory system. II. Simulations and measurements. The Journal of the Acoustical Society of America. 1996;99(6):3623–3631. doi: 10.1121/1.414960

19. Saiz-Alía M, Reichenbach T. Computational modeling of the auditory brainstem response to continuous speech. Journal of Neural Engineering. 2020;17(3):036035. doi: 10.1088/1741-2552/ab970d

20. Zilany MSA, Bruce IC, Carney LH. Updated parameters and expanded simulation options for a model of the auditory periphery. The Journal of the Acoustical Society of America. 2014;135(1):283–286. doi: 10.1121/1.4837815

21. Shan T, Cappelloni MS, Maddox RK. Subcortical responses to music and speech are alike while cortical responses diverge. Scientific Reports. 2024;14(1):789. doi: 10.1038/s41598-023-50438-0

22. Lindboom E, Nidiffer A, Carney LH, Lalor EC. Incorporating models of subcortical processing improves the ability to predict EEG responses to natural speech. Hearing Research. 2023;433:108767. doi: 10.1016/j.heares.2023.108767

23. Biesmans W, Das N, Francart T, Bertrand A. Auditory-Inspired Speech Envelope Extraction Methods for Improved EEG-Based Auditory Attention Detection in a Cocktail Party Scenario. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2017;25(5):402–412. doi: 10.1109/TNSRE.2016.2571900

24. Osses Vecchi A, Kohlrausch A. Perceptual similarity between piano notes: Simulations with a template-based perception model. The Journal of the Acoustical Society of America. 2021;149(5):3534. doi: 10.1121/10.0004818

25. Zilany MSA, Bruce IC, Nelson PC, Carney LH. A phenomenological model of the synapse between the inner hair cell and auditory nerve: long-term adaptation with power-law dynamics. The Journal of the Acoustical Society of America. 2009;126(5). doi: 10.1121/1.3238250

26. Geirnaert S, Vandecappelle S, Alickovic E, de Cheveigne A, Lalor E, Meyer BT, et al. Electroencephalography-Based Auditory Attention Decoding: Toward Neurosteered Hearing Devices. IEEE Signal Processing Magazine. 2021;38(4):89–102. doi: 10.1109/MSP.2021.3075932

27. Abdala C, Folsom RC. The development of frequency resolution in humans as revealed by the auditory brain-stem response recorded with notched-noise masking. The Journal of the Acoustical Society of America. 1995;98(2):921–930. doi: 10.1121/1.414350

28. Akhoun I, Moulin A, Jeanvoine A, Ménard M, Buret F, Vollaire C, et al. Speech auditory brainstem response (speech ABR) characteristics depending on recording conditions, and hearing status: an experimental parametric study. Journal of Neuroscience Methods. 2008;175(2):196–205. doi: 10.1016/j.jneumeth.2008.07.026

29. Riazi M, Ferraro JA. Observations on mastoid versus ear canal recorded cochlear microphonic in newborns and adults. Journal of the American Academy of Audiology. 2008;19(1):46–55. doi: 10.3766/jaaa.19.1.5

30. Brodbeck C, Das P, Gillis M, Kulasingham JP, Bhattasali S, Gaston P, et al. Eelbrain, a Python toolkit for time-continuous analysis with temporal response functions. eLife. 2023;12:e85012. doi: 10.7554/eLife.85012

31. Patterson RD, Nimmo-Smith I, Holdsworth J, Rice P. An efficient auditory filterbank based on the gammatone function. In: a meeting of the IOC Speech Group on Auditory Modelling at RSRE. vol. 2; 1987.

32. Majdak P, Hollomey C, Baumgartner R. AMT 1.x: A toolbox for reproducible research in auditory modeling. Acta Acustica. 2022;6:19. doi: 10.1051/aacus/2022011

33. Rudnicki M, Schoppe O, Isik M, Völk F, Hemmert W. Modeling auditory coding: from sound to spikes. Cell and Tissue Research. 2015;361(1):159–175. doi: 10.1007/s00441-015-2202-z

34. Wong DD, Fuglsang SA, Hjortkjær J, Ceolini E, Slaney M, De Cheveigne A. A comparison of regularization methods in forward and backward models for auditory attention decoding. Frontiers in Neuroscience. 2018;12:531. doi: 10.3389/fnins.2018.00531

35. Crosse MJ, Zuk NJ, Di Liberto GM, Nidiffer AR, Molholm S, Lalor EC. Linear modeling of neurophysiological responses to speech and other continuous stimuli: methodological considerations for applied research. Frontiers in Neuroscience. 2021; p. 1350. doi: 10.3389/fnins.2021.705621

36. Kulasingham JP, Simon JZ. Algorithms for Estimating Time-Locked Neural Response Components in Cortical Processing of Continuous Speech. IEEE Transactions on Biomedical Engineering. 2023;70(1):88–96. doi: 10.1109/TBME.2022.3185005

37. Vecchi AO, Varnet L, Carney LH, Dau T, Bruce IC, Verhulst S, et al. A comparative study of eight human auditory models of monaural processing. Acta Acustica. 2022;6:17. doi: 10.1051/aacus/2022008

38. Relaño-Iborra H, Zaar J, Dau T. A speech-based computational auditory signal processing and perception model. The Journal of the Acoustical Society of America. 2019;146(5):3306–3317. doi: 10.1121/1.5129114

39. Verhulst S, Bharadwaj HM, Mehraei G, Shera CA, Shinn-Cunningham BG. Functional modeling of the human auditory brainstem response to broadband stimulation. The Journal of the Acoustical Society of America. 2015;138(3):1637–1659. doi: 10.1121/1.4928305
