Abstract
The multivariate temporal response function (mTRF) is an effective tool for investigating the neural encoding of acoustic and complex linguistic features in natural continuous speech. In this study, we investigated how neural representations of speech features derived from natural stimuli are related to early signs of cognitive decline in older adults, taking into account the effects of hearing. Participants without (n = 25) and with (n = 19) early signs of cognitive decline listened to an audiobook while their electroencephalography responses were recorded. Using the mTRF framework, we modeled the relationship between speech input and neural response via different acoustic, segmentation, and linguistic encoding models and examined the response functions in terms of encoding accuracy, signal power, peak amplitudes and latencies. Our results showed no significant main effect of cognitive decline or hearing ability on the neural encoding of acoustic and linguistic speech features. However, we found a significant interaction between hearing ability and the word-level segmentation model, suggesting that hearing impairment specifically affects encoding accuracy for this model, while the other features were not affected by hearing ability. These results suggest that while most speech processing markers remain unaffected by cognitive decline and hearing loss per se, the neural encoding of word-level segmented speech features in older adults is affected by hearing loss but not by cognitive decline. This study emphasizes the effectiveness of mTRF analysis for studying the neural encoding of speech and argues for extending this research to investigate its clinical relevance for hearing loss and cognition.
Keywords: Auditory speech processing, Linguistic speech processing, Cognitive decline, Natural continuous speech, Electroencephalography, Temporal response function
Subject terms: Neuroscience, Psychology, Biomarkers, Risk factors
Introduction
The increasing number of dementia patients in our ageing population underlines the urgent need for research that focuses on methods for early detection. Early stages of cognitive decline pose a major challenge for diagnosis and intervention, as early detection is central to effective treatment, making the identification of subtle changes prior to diagnosis particularly important1. Language performance, encompassing both comprehension and production, is intricately linked to cognitive functions and often deteriorates in dementia and other forms of cognitive decline2–5. The processing of language in the brain involves complex neural networks across various brain regions, which are vulnerable to early neuropathological changes, especially in conditions like Alzheimer’s disease (AD)6. Furthermore, the relationship between cognition and hearing is critically important7. Age-related hearing loss is closely linked with cognitive decline and is considered the most common modifiable risk factor for dementia1. A significant proportion of older adults with cognitive decline also experience hearing loss8,9. The manner in which the aging brain processes speech—particularly natural continuous speech that involves ongoing top-down and bottom-up processing10—through the auditory system may be key in the early detection of cognitive decline. Behavioral measures have indicated that individuals with early signs of cognitive decline show impaired auditory processing capabilities11. Neurophysiological studies have revealed that these early stages of decline are associated with significant changes in neural processing12,13. Specifically, altered encoding of syllable sounds at both cortical and subcortical levels has been observed in individuals with early cognitive decline, suggesting potential predictive value for such decline12. However, in our earlier work where natural speech was used instead of syllable sounds, we were unable to confirm these results14.
In our previous work, we used the temporal response functions (TRF) framework to focus on auditory speech encoding as reflected in both subcortical and cortical responses15. The TRF, a linearized stimulus-response model, delineates the relationship between speech input and neuronal response, as typically measured by electroencephalography (EEG). These models are particularly adept at quantifying the brain’s response to natural speech over extended listening periods and offer a broader temporal integration range than conventional evoked potentials. Natural speech, which has higher ecological validity than the syllable and click sounds typically used in conventional evoked-potential paradigms, forms the basis of our study, increasing the applicability of the results in the real world. The ecological validity of natural speech is underscored by studies showing that cognitive factors, such as focal attention, significantly impact the cortical tracking of speech16,17. The concept of “neural tracking of speech” refers to the brain’s ability to follow the dynamic properties of the speech signal, which can be captured using TRF models. TRF models are critical for two reasons: they provide an encoding accuracy for the speech feature of interest and a time-delayed neural response function that enables neurophysiological interpretation18. Initially, our study focused on the encoding of an acoustic feature present in natural speech, specifically auditory nerve rates derived from the speech wave—akin to a temporal envelope14. However, the flexibility of TRF models allows them to also fit a range of lexical (at the word-level) and sublexical (at the phoneme-level) speech features derived from the speech signal and its temporally aligned transcript, as has been done in several studies19–22. These models can be calculated using impulse vectors that code for, e.g., word or phoneme onsets, or scaled impulse vectors that reflect the surprisal value of a word or phoneme. Furthermore, a TRF model can be fitted simultaneously to multiple speech features, producing multivariate neural response functions—each corresponding to a different speech feature—known as a multivariate TRF (mTRF)15. The mTRF models extend beyond acoustic features to encompass linguistic features represented in natural speech. This framework has proven valuable in exploring differential acoustic and linguistic speech tracking in patients with post-stroke aphasia22 and demonstrating that linguistic representations diminish with increasing age23. Inspired by the opportunities offered by the mTRF framework, we have delved deeper into the investigation of linguistic processing.
In this work, we drew on the dataset from our previous study14 to investigate the neural encoding of linguistic features in natural speech in older adults, focusing on participants with and without putative cognitive decline. Participants were categorized into two groups based on their scores from the Montreal Cognitive Assessment (MoCA)24: the normal MoCA group, showing no early signs of cognitive decline, and the low MoCA group, where participants scored below 26 points, the clinical threshold for mild cognitive impairment (MCI). The distribution of MoCA scores and the corresponding group allocation are shown in Fig. 1A.
Figure 1.
Montreal Cognitive Assessment (MoCA) scores, audiogram, and age as a function of four-frequency pure-tone average (PTA). (A) Distribution of MoCA scores. The dashed line indicates the cutoff score of 26. (B) Individual hearing thresholds for each frequency, averaged by ear and colored by MoCA group. PTA values calculated from the audiogram did not differ between groups. (C) Age as a function of PTA. The shaded area represents the 95% confidence interval. Age correlated with PTA across all participants. This plot is adapted from the original study14.
Drawing inspiration from the framework established by Kries et al.22, we conducted our analysis using five distinct mTRF models. These models covered a spectrum of linguistic features, from basic segmentation of words and phonemes to more complex analyses such as surprisal, frequency, and entropy of (sub)lexical items. Specifically, our models were designed as follows: the acoustic model incorporated the speech envelope and envelope onsets; the word- and phoneme-level segmentation models included the word and phoneme onsets; the linguistic word-level model included word surprisal and word frequency; and the linguistic phoneme-level model included phoneme surprisal and phoneme entropy (see Fig. 2). Given the collinearity between features originating from the same speech signal25, we regressed the features not of interest from the EEG signal before fitting the mTRF models to isolate the specific feature of interest for each model. Our exploratory approach aimed to determine whether the neural encoding of these linguistic features in natural speech varied between the two participant groups and to understand how these differences might be influenced by hearing loss.
Figure 2.
Overview of the multivariate temporal response function (mTRF) models with speech features. The features are demonstrated using an example sentence: “Botanik gefiel mir, weil ich gern Blätter zerschnitt.” (I liked botany because I liked cutting up leaves) from one of the audiobook segments used in the study. The acoustic model included the envelope and envelope onsets derived from the speech wave (teal colored). The word- and phoneme-level segmentation models included the word and phoneme onsets (black), the linguistic word-level model included the word surprisal and word frequency (black), and the linguistic phoneme-level model included the phoneme surprisal and phoneme entropy (gray), all derived from word- and phoneme-level time-aligned transcriptions. The graphical representation was inspired by the one created by Kries et al.22.
Our analysis began with an evaluation of the encoding accuracy across the mTRF models to assess whether overall neural speech tracking performance was influenced by cognitive decline. We further explored the signal of the time-delayed neural response functions for the two speech features embedded in each mTRF model. This exploration aimed to investigate how the brain processes these features differently in participants with and without early signs of cognitive decline. Additionally, we integrated an assessment of hearing ability by quantifying the four-frequency pure tone average (PTA)26 to investigate potential interactions between auditory encoding, cognitive decline, and hearing ability. Individual hearing thresholds for each frequency, averaged by ear, are shown in Fig. 1B, and the interaction between age and PTA is shown in Fig. 1C. Considering the well-established link between cognitive decline and language processing2–5, we hypothesized that participants exhibiting early signs of cognitive decline would demonstrate altered neural encoding of linguistic features in natural speech, as evidenced by variations in encoding accuracy, compared to those without early signs of cognitive decline. We also anticipated that these differences would manifest in distinct response function signals and that hearing loss would modulate these effects, as seen in previous studies on neural speech processing (see, e.g.,27–29).
Results
Effect of MoCA group and PTA on encoding accuracy
To investigate how cognitive decline and hearing ability affect the encoding accuracy of the five different mTRF models—the acoustic, segmentation at the word- and phoneme-level, and linguistic at the word- and phoneme-level models, illustrated in Fig. 2—we implemented a linear mixed model (LMM) using orthogonal sum contrast coding for the categorical predictors. The results are detailed in Table 1.
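As an illustration of this model specification, a minimal sketch in Python with statsmodels is shown below; the column names and synthetic data are purely illustrative and not the study data.

```python
# Hedged sketch: an LMM for encoding accuracy with sum-contrast-coded categorical
# predictors, a random intercept per participant, and purely synthetic data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
models = ["acoustic", "seg_word", "seg_phoneme", "lin_word", "lin_phoneme"]
df = pd.DataFrame({
    "participant": np.repeat(np.arange(44), len(models)),
    "mtrf_model": np.tile(models, 44),
    "moca_group": np.repeat(rng.choice(["normal", "low"], 44), len(models)),
    "pta_z": np.repeat(rng.normal(size=44), len(models)),
})
df["accuracy"] = 0.03 + rng.normal(scale=0.01, size=len(df))  # stand-in encoding accuracies

# accuracy ~ MoCA group * PTA (z) * mTRF model, random intercept for participant ID
lmm = smf.mixedlm("accuracy ~ C(moca_group, Sum) * pta_z * C(mtrf_model, Sum)",
                  data=df, groups=df["participant"]).fit()
print(lmm.summary())
```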
Table 1.
Results of the linear mixed model (LMM) for the encoding accuracy of each multivariate temporal response function (mTRF) model.
| Term | Coefficient | CI (LL) | CI (UL) | df | t | p | Sig. |
|---|---|---|---|---|---|---|---|
| Intercept | 0.033 | 0.028 | 0.038 | 40 | 12.6 | *** | |
| MoCA group (low) | 0.004 | 40 | 0.559 | ||||
| PTA (z) | 0.003 | 40 | 0.475 | ||||
| mTRF model (Seg. word-level) | 0.018 | 0.016 | 0.020 | 160 | 19.9 | *** | |
| mTRF model (Seg. phoneme-level) | 160 | *** | |||||
| mTRF model (Lin. word-level) | 0.003 | 0.001 | 0.005 | 160 | 3.1 | 0.003 | ** |
| mTRF model (Lin. phoneme-level) | 160 | *** | |||||
| MoCA group × PTA | 0.003 | 0.008 | 40 | 1.3 | 0.205 | ||
| MoCA group × mTRF model (Seg. word-level) | 0.002 | 160 | 0.800 | ||||
| MoCA group × mTRF model (Seg. phoneme-level) | 0.001 | 0.003 | 160 | 1.3 | 0.211 | ||
| MoCA group × mTRF model (Lin. word-level) | 0.001 | 160 | 0.162 | ||||
| MoCA group × mTRF model (Lin. phoneme-level) | 0.001 | 0.002 | 160 | 0.6 | 0.522 | ||
| PTA × mTRF model (Seg. word-level) | 0.003 | 0.002 | 0.005 | 160 | 3.7 | *** | |
| PTA × mTRF model (Seg. phoneme-level) | 0.001 | 160 | 0.357 | ||||
| PTA × mTRF model (Lin. word-level) | 0.001 | 160 | 0.679 | ||||
| PTA × mTRF model (Lin. phoneme-level) | 0.001 | 160 | 0.205 | ||||
| MoCA group × PTA × mTRF model (Seg. word-level) | 0.001 | 0.003 | 160 | 1.2 | 0.237 | ||
| MoCA group × PTA × mTRF model (Seg. phoneme-level) | 0.001 | 160 | 0.570 | ||||
| MoCA group × PTA × mTRF model (Lin. word-level) | 0.001 | 0.002 | 160 | 0.7 | 0.464 | ||
| MoCA group × PTA× mTRF model (Lin. phoneme-level) | 0.001 | 0.002 | 160 | 0.7 | 0.500 | ||
The LMM included Montreal Cognitive Assessment (MoCA) group, four-frequency pure-tone average (PTA), mTRF model and the interaction between MoCA group, PTA and mTRF model as fixed effects, and participant ID as a random effect. The reference levels for MoCA group and mTRF model were low and the acoustic model, respectively. Seg., segmentation; Lin., linguistic; CI, confidence interval; LL, lower limit; UL, upper limit; df, degrees of freedom. Orthogonal contrasts were used to test the interaction effects. Significance levels are indicated as: (***), (**), (*).
This statistical model accounted for individual differences and the nesting of the five encoding accuracies within participants, yielding the following insights: First, no significant main effect was observed for the MoCA group, indicating that encoding accuracies were comparable between participants with and without early signs of cognitive decline. Second, no significant main effect was detected for PTA, suggesting that hearing ability did not significantly influence encoding accuracy. Third, a significant main effect was found for the mTRF models: the acoustic model, serving as the reference level, exhibited higher encoding accuracy on average compared to the other models (Fig. 3A). Post-hoc tests confirmed that the encoding accuracies of all other models were significantly lower than that of the acoustic model (see Table S3 for all comparisons). Neither the interaction between MoCA group and PTA nor the interactions between MoCA group and the mTRF models were significant. A significant interaction was observed between PTA and the segmentation word-level model, whereas no other interactions between PTA and the mTRF models were significant, indicating that hearing impairment affected encoding accuracy specifically for this model. Additionally, no significant three-way interactions were found between MoCA group, PTA, and the mTRF models, indicating that the combined effect of cognitive decline and hearing impairment did not differentially affect encoding accuracy across the different mTRF models.
Figure 3.
Encoding accuracy by multivariate temporal response function (mTRF) model and estimated marginal means (EMMs) of the interactions between four-frequency pure-tone average (PTA) and the mTRF models. (A) Encoding accuracies (Pearson’s r, averaged across all 32 electrodes) for each mTRF model. Violin plots show the distribution of the individual data points colored by Montreal Cognitive Assessment (MoCA) group, with the dashed line indicating the median and the dotted lines indicating the interquartile range. All mTRF models exhibited significantly lower encoding accuracies compared to the acoustic model. (B) EMMs of the interaction between PTA and the mTRF models. The acoustic model served as the reference level. The interaction was significant for the segmentation word-level model, with a greater difference in encoding accuracy observed as hearing ability decreased. The error bars represent the standard error of the mean. Significance levels are indicated as: (***). Seg., segmentation; Lin., linguistic.
To further explore the significant interaction between PTA and the segmentation word-level model, we conducted post-hoc tests. Specifically, we examined the estimated marginal means (EMMs) of the interaction at −1, 0, and +1 standard deviations of PTA, revealing the following: at one standard deviation below the mean PTA, the estimated difference in encoding accuracy between the acoustic and the segmentation word-level model was 0.019; at the mean PTA, the difference was 0.024; and at one standard deviation above the mean, the difference was 0.028. The EMMs for each mTRF model are visualized in Fig. 3B. These results indicate that, compared to the acoustic model, hearing impairment significantly affects encoding accuracy in the segmentation word-level model, with greater differences observed as hearing ability decreases.
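Continuing the illustrative statsmodels sketch from above, the interaction can be probed in a similar spirit by predicting encoding accuracy at selected PTA values and contrasting two mTRF models; using fixed-effects predictions only and holding the MoCA group at one level are simplifications relative to the EMM contrasts reported here.

```python
# Rough stand-in for the EMM contrasts: predicted accuracy difference between the
# acoustic and the word-level segmentation model at PTA = -1, 0, +1 SD.
import numpy as np
import pandas as pd
from patsy import build_design_matrices

grid = pd.DataFrame({
    "moca_group": ["normal"] * 6,
    "pta_z": [-1, -1, 0, 0, 1, 1],
    "mtrf_model": ["acoustic", "seg_word"] * 3,
})
design_info = lmm.model.data.design_info             # patsy design saved by statsmodels
X = np.asarray(build_design_matrices([design_info], grid)[0])
pred = X @ np.asarray(lmm.fe_params)                 # fixed-effects predictions only
diff = pred[0::2] - pred[1::2]                       # acoustic minus word-level segmentation
print(dict(zip([-1, 0, 1], np.round(diff, 3))))
```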
Overall, our analysis suggests that while there is no overarching difference in encoding accuracy between participants with and without cognitive decline, hearing loss impacts encoding accuracy in the segmentation word-level model. Please note that the supplementary analysis treating MoCA as a continuous variable yielded similar conclusions (see Supplementary Analysis 1 and Table S1).
mTRF-model based effects of MoCA group and PTA on response function signal power
The response functions to each speech feature modeled in the mTRF models are depicted in Fig. 4. We evaluated the effects of cognitive decline on the signal power of these response functions by comparing the root mean square (RMS) values between participants with and without early signs of cognitive decline, using a LMM to account for individual differences and the nesting of RMS values within participants. Specifically, we ran the LMM separately for each mTRF model, with the RMS of the response functions to the different speech features nested within three electrode clusters (F, frontal; C, central; P, parietal) and participants. The results are summarized in Table 2.
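As a simple illustration of the RMS measure, the following sketch computes signal power over the time lags of a response function and averages it within hypothetical electrode clusters (array shapes and cluster indices are illustrative).

```python
# Minimal sketch: RMS signal power of a response function over its time lags,
# averaged within hypothetical electrode clusters.
import numpy as np

rng = np.random.default_rng(1)
trf_weights = rng.normal(size=(32, 68))                 # 32 electrodes x 68 time lags (toy data)

rms_per_electrode = np.sqrt(np.mean(trf_weights ** 2, axis=-1))

clusters = {"F": [0, 1, 2], "C": [10, 11, 12], "P": [20, 21, 22]}   # hypothetical cluster indices
cluster_rms = {name: rms_per_electrode[idx].mean() for name, idx in clusters.items()}
print(cluster_rms)
```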
Figure 4.
Mean weights of response functions for each speech feature in the multivariate temporal response function (mTRF) models by Montreal Cognitive Assessment (MoCA) group, with the low MoCA group indicating early signs of cognitive decline, and topographical maps. Response functions shown in the plot were globally z-transformed for visualization purposes. Topographical maps were derived from the largest average peak in the response functions, displaying the mean activity across all participants in a 50 ms window centered around the peak latency, as indicated by the vertical dashed line in the response functions. Displayed response functions are averaged from three key electrodes, marked in yellow on the topographical maps. The response functions showed distinct peaks that differed in amplitude and latency between the speech features. However, no significant differences in peak amplitudes or latencies were observed between the groups, with the exception of response peak latency in phoneme onsets in the parietal cluster during the early time window, as indicated by the shaded gray area. Significance levels are indicated as: (**).
Table 2.
Results of the linear mixed models (LMM) for the root mean square (RMS) of the temporal response function (TRF) patterns.
| Term | Coefficient | CI (LL) | CI (UL) | df | t | p | Sig. |
|---|---|---|---|---|---|---|---|
| Acoustic model | |||||||
| Intercept | 0.001 | 0.001 | 0.001 | 49.7 | 11.9 | *** | |
| MoCA group (low) | 40.0 | 0.4 | 0.667 | ||||
| PTA (z) | 40.0 | 1.0 | 0.303 | ||||
| Speech feature (envelope onsets) | 745.0 | 0.2 | 0.809 | ||||
| Cluster (C) | 745.0 | 0.135 | |||||
| Cluster (P) | 745.0 | *** | |||||
| MoCA group × PTA | 40.0 | 0.682 | |||||
| Segmentation word-level model | |||||||
| Intercept | 0.001 | 0.001 | 0.001 | 43.6 | 11.1 | *** | |
| MoCA group (low) | 40.0 | 0.3 | 0.751 | ||||
| PTA (z) | 40.0 | 0.943 | |||||
| Cluster (C) | 350.0 | 3.7 | *** | ||||
| Cluster (P) | 350.0 | *** | |||||
| MoCA group × PTA | 40.0 | 0.404 | |||||
| Segmentation phoneme-level model | |||||||
| Intercept | 0.001 | 0.001 | 0.001 | 43.2 | 9.6 | *** | |
| MoCA group (low) | 0.001 | 40.0 | 1.1 | 0.269 | |||
| PTA (z) | 40.0 | 0.746 | |||||
| Cluster (C) | 350.0 | 0.9 | 0.350 | ||||
| Cluster (P) | 350.0 | *** | |||||
| MoCA group × PTA | 40.0 | 0.870 | |||||
| Linguistic word-level model | |||||||
| Intercept | 0.001 | 0.001 | 0.001 | 54.9 | 15.2 | *** | |
| MoCA group (low) | 40.0 | 0.5 | 0.627 | ||||
| PTA (z) | 40.0 | 0.2 | 0.865 | ||||
| Speech feature (word frequency) | 745.0 | *** | |||||
| Cluster (C) | 745.0 | 2.0 | 0.041 | * | |||
| Cluster (P) | 745.0 | *** | |||||
| MoCA group × PTA | 40.0 | 0.208 | |||||
| Linguistic phoneme-level model | |||||||
| Intercept | 0.001 | 0.001 | 0.001 | 47.1 | 11.3 | *** | |
| MoCA group (low) | 40.0 | 0.9 | 0.349 | ||||
| PTA (z) | 40.0 | 0.496 | |||||
| Speech feature (phoneme entropy) | 745.0 | *** | |||||
| Cluster (C) | 745.0 | 2.4 | 0.016 | * | |||
| Cluster (P) | 745.0 | *** | |||||
| MoCA group × PTA | 40.0 | 0.2 | 0.844 | ||||
The models included Montreal Cognitive Assessment (MoCA) group, four-frequency pure-tone average (PTA), speech feature (in the acoustic and word- and phoneme-level linguistic models), electrode cluster (F, frontal; C, central; P, parietal), and the interaction between MoCA group and PTA as fixed effects, and participant ID as a random effect. The reference level for cluster was F. The reference levels for the mTRF models with two nested speech features were envelope (acoustic model), word surprisal (linguistic word-level model) and phoneme surprisal (linguistic phoneme-level model). CI, confidence interval; LL, lower limit; UL, upper limit; df, degrees of freedom. Significance levels are indicated as: (***), (**), (*).
Overall, the results largely paralleled those of the encoding accuracy analysis. First, no significant main effect was found for the MoCA group across any of the mTRF models, indicating that the signal power of the response functions did not differ between participants with and without cognitive decline. Second, PTA did not significantly affect the signal power for any of the mTRF models. Additionally, no significant interactions were found between MoCA group and PTA across any of the mTRF models. Third, the cluster variable was significant in all five LMMs, as reflected in the topographical maps of the response functions in Fig. 4. Parietal clusters exhibited lower RMS values compared to frontal clusters for all mTRF models, indicating that the signal power was more prominent in the frontal cluster. For the word- and phoneme-based models, central clusters showed significantly higher RMS values than frontal clusters, suggesting that in these higher-order linguistic models, the signal power was more distinct in central clusters. Finally, the speech feature variable also significantly influenced signal power, depending on the mTRF model. For the acoustic model, RMS for the envelope onsets was comparable to the envelope. Across the segmentation models, signal power for phoneme onsets was significantly higher than for word onsets. In the word-based model, signal power for word frequency was significantly lower than for word surprisal. In the phoneme-based model, signal power for phoneme entropy was significantly lower than for phoneme surprisal.
Taken together, these results suggest that the signal power of response functions to natural speech is not influenced by early signs of cognitive decline or hearing ability. However, the analysis indicates that the signal power of response functions to higher-order linguistic features is significantly influenced by electrode clusters and speech features. As with the encoding accuracy analysis, the supplementary analysis treating MoCA as a continuous variable yielded similar conclusions (see Supplementary Analysis 2 and Table S2).
Group comparisons of peak amplitudes and latencies
Our analysis of response functions to different speech features revealed distinct peaks that vary in amplitude and latency across responses, as illustrated in Fig. 4. We performed a detailed analysis of these peaks to investigate possible differences in peak amplitudes and latencies between participants with and without cognitive decline, in addition to the RMS analysis, which focuses primarily on signal power and neglects variations in peaks and their latencies. Our focus was on responses for which a sufficient proportion of participants exhibited a peak (see summary in Table S4). The results of these comparisons, detailed in Table 3, predominantly showed no significant differences in peak amplitudes or latencies between the two groups during an earlier time window. However, an exception was noted for phoneme onsets in this time window. Specifically, participants with early signs of cognitive decline exhibited earlier peak latencies compared to those without cognitive decline in the parietal cluster (two-tailed Mann–Whitney U test, U = 82.5, Holm-Bonferroni corrected p = 0.008). On average, the peak latency for phoneme onsets was later in the normal MoCA group than in the low MoCA group; the descriptive statistics for all peak latencies are summarized in Table S5.
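A hedged sketch of this type of group comparison—Mann–Whitney U tests per electrode cluster with Holm–Bonferroni correction—using synthetic latencies is shown below.

```python
# Hedged sketch: per-cluster group comparisons of peak latencies with Mann-Whitney U
# tests and Holm-Bonferroni correction, using synthetic latencies (in ms).
import numpy as np
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2)
latencies_normal = {"F": rng.normal(90, 15, 25), "C": rng.normal(92, 15, 24), "P": rng.normal(95, 15, 21)}
latencies_low = {"F": rng.normal(88, 15, 19), "C": rng.normal(90, 15, 18), "P": rng.normal(80, 15, 16)}

results = {c: mannwhitneyu(latencies_normal[c], latencies_low[c], alternative="two-sided")
           for c in ("F", "C", "P")}
p_raw = [res.pvalue for res in results.values()]
reject, p_holm, _, _ = multipletests(p_raw, method="holm")
for cluster, p0, p1 in zip(results, p_raw, p_holm):
    print(cluster, round(p0, 3), round(p1, 3))
```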
Table 3.
Results of the Mann–Whitney U tests for peak amplitudes and latencies in early and late time windows. The table includes U-statistics, Holm-Bonferroni corrected p-values, effect sizes (rank-biserial correlation coefficient r), and the number of participants in each group. This analysis focused on peaks observed in a sufficiently large proportion of participants for each response and electrode cluster within each time window. Significance levels are indicated as: (**).
| Speech feature | Cluster | U (amp.) | p (amp.) | r (amp.) | U (lat.) | p (lat.) | r (lat.) | n (normal) | n (low) | |
|---|---|---|---|---|---|---|---|---|---|---|
| Early window | ||||||||||
| Envelope | F | 289.0 | 1.000 | 281.5 | 0.658 | 25 | 18 | |||
| C | 253.0 | 0.844 | 264.5 | 0.973 | 23 | 18 | ||||
| P | 255.0 | 1.000 | 220.0 | 0.911 | 22 | 16 | ||||
| Envelope onsets | F | 176.0 | 0.568 | 122.0 | 0.230 | 21 | 15 | |||
| Word onsets | F | 182.0 | 1.000 | 164.5 | 0.525 | 22 | 17 | |||
| C | 195.0 | 1.000 | 180.0 | 0.475 | 23 | 18 | ||||
| P | 137.0 | 1.000 | 160.5 | 0.974 | 17 | 17 | ||||
| Phoneme onsets | F | 260.0 | 0.606 | 228.0 | 0.821 | 25 | 19 | |||
| C | 247.0 | 0.885 | 213.5 | 0.958 | 24 | 18 | ||||
| P | 218.0 | 0.391 | 82.5 | 0.008 | ** | 21 | 16 | |||
| Word surprisal | F | 125.0 | 1.000 | 122.5 | 0.655 | 18 | 15 | |||
| C | 186.0 | 1.000 | 176.0 | 0.551 | 22 | 18 | ||||
| P | 176.0 | 1.000 | 125.5 | 0.255 | 19 | 16 | ||||
| Phoneme surprisal | F | 212.0 | 0.298 | 151.5 | 0.467 | 21 | 15 | |||
| C | 230.0 | 0.393 | 160.5 | 0.502 | 22 | 15 | ||||
| Phoneme entropy | F | 212.0 | 0.893 | 179.0 | 0.939 | 21 | 16 | |||
| C | 196.0 | 0.808 | 155.0 | 0.693 | 21 | 16 | ||||
| P | 178.0 | 0.526 | 90.5 | 0.031 | 21 | 15 | ||||
| Late window | ||||||||||
| Word onsets | F | 221.0 | 0.960 | 179.5 | 0.330 | 22 | 17 | |||
| C | 162.0 | 1.000 | 152.0 | 0.633 | 20 | 16 | ||||
| P | 233.0 | 1.000 | 208.0 | 0.735 | 22 | 15 | ||||
| Word surprisal | F | 267.0 | 1.000 | 193.0 | 0.398 | 22 | 18 | |||
| C | 131.0 | 0.901 | 138.0 | 0.928 | 18 | 15 | ||||
| P | 190.0 | 1.000 | 206.5 | 0.419 | 19 | 15 | ||||
| Word frequency | F | 187.0 | 0.829 | 164.5 | 0.403 | 20 | 15 | |||
| P | 150.0 | 0.610 | 141.5 | 0.202 | 18 | 15 | ||||
In the later time window, our analysis also revealed no significant differences in peak amplitudes or latencies across the different clusters for both groups. Overall, these findings suggest that response function peaks in the neural encoding of natural speech do not differ significantly between participants with and without early signs of cognitive decline, except for phoneme onsets in the early time window.
Discussion
The primary aim of our study was to explore how older adults, both with and without early signs of cognitive decline, encode acoustic, segmentation, and linguistic cues in natural continuous speech, while also considering the impact of hearing ability. We used the mTRF framework to analyze neural responses to various acoustic, lexical, and sublexical features in an audiobook’s speech stream. This analysis serves as a proxy for understanding how the brain processes these features, using EEG data from our previous study14. Our analysis focused on encoding accuracy, signal power, and response peak amplitudes and latencies across five models: acoustic, segmentation at the word- and phoneme-level, and linguistic at the word- and phoneme-level.
Contrary to our hypothesis, our findings revealed no significant impact of cognitive decline on neural encoding of speech as measured by our metrics, despite existing literature suggesting a link between cognitive decline and deteriorating language performance2–5. Similarly, we found no significant main effect of hearing ability on neural encoding of speech features, though hearing loss—a common comorbidity in cognitively declining older adults—typically alters speech processing8,9,27,28,30. This is surprising, especially considering the common pool model of cognitive processing resources31, which suggests that cognitive decline and hearing loss may compete for limited cognitive and perceptual resources, thereby reducing neural encoding of speech features in affected individuals. This theory suggests that if participants with advanced cognitive decline or hearing loss were included, differences in neural encoding might be more pronounced, particularly in tasks requiring more cognitive resources, such as speech-in-noise perception or in individuals with diagnosed MCI or AD. For a related discussion on natural speech tracking with and without visual enhancement, which treats resources differently, see Frei et al.32. In our data, we did not observe such an association between cognitive decline and neural encoding, suggesting that early cognitive decline did not significantly affect the neural processing of speech in our participants.
Cognitive factors are known to influence neural tracking of speech. Studies have shown that neural speech tracking, particularly the phase-locking of the neural response to the amplitude envelope, is crucial for successful speech comprehension33–35. Studies also demonstrated that neural speech tracking underlies successful speech comprehension29, with a positive relationship observed between neural tracking and speech comprehension in older adults with both normal hearing and hearing impairment27,36. Furthermore, research has indicated that the older brain recruits additional higher-level auditory regions during the early stages of speech processing to maintain speech comprehension19,37. Thus, both cognitive decline and hearing loss are known to affect neural encoding of speech, with potential implications for speech comprehension. The lack of a significant effect of cognitive decline on neural encoding in our study may be due to the relatively early stage of cognitive decline in our participants, as indicated by the MoCA scores, see also the limitations discussed below.
What we did observe was a significant interaction between hearing ability and the segmentation word-level model, indicating that increasing hearing impairment led to a decrease in the brain’s ability to track word segmentation in natural speech. This is noteworthy since recognizing words in continuous speech is a complex task requiring substantial cognitive resources, which can become even more challenging in the presence of hearing loss38,39. It is intriguing that we did not see these interactions in the higher-order linguistic models, nor did we observe a significant main effect of PTA on neural encoding accuracy, which is commonly reported in other studies27,28,30. We also found that while investigating the word-level segmentation mTRF model at the response signal level, hearing ability did not affect the response signal power, contrasting with the encoding accuracy results.
Another noteworthy result was the significantly earlier peak latency for phoneme onsets in participants with early signs of cognitive decline compared to those without, observed in a parietal cluster during the early time window. Given that phoneme processing is more demanding than word processing40, it is possible that the earlier latencies reflect a compensatory mechanism or a heightened sensitivity to processing demands in individuals with cognitive decline. The emergence of this result in a parietal cluster is also consistent with the literature, as the parietal cortex is known to be involved in phonological processing41. Additionally, a previous study demonstrated a decrease in encoding accuracy in linguistic neural tracking with age, particularly in a comparable parietal region21. This could suggest that cognitive decline impacts the timing of more demanding phonological processes, leading to these earlier neural responses.
Overall, our study suggests that while there is no overarching difference in speech encoding between participants with and without cognitive decline, hearing loss specifically impacts word segmentation in speech processing.
Implications for the study of cognitive decline and hearing loss
Drawing on our previous research14, we anticipated no significant differences in the acoustic encoding of speech in this dataset. However, we did expect to observe differences in the encoding of linguistic features, particularly in participants with early signs of cognitive decline, which we did not find. Our interest was in exploring potential variations in neural encoding of linguistic features, given the established link between language performance and cognitive functions, which may alter under cognitive decline. In previous work, in which we applied machine learning methods to voice parameters extracted from a semi-spontaneous speech task recorded with the same participants, we were able to classify the low MoCA group with above-chance cross-validated accuracy42. In our current study, in contrast, we found no significant differences in neural encoding of linguistic features between participants with and without early signs of cognitive decline. The neural responses appeared relatively homogeneous across participants, regardless of their cognitive status, suggesting that neural encoding of speech in natural continuous settings may not effectively indicate early cognitive decline in older adults. Regarding hearing ability, our findings diverged from existing studies, where hearing loss often correlated with enhanced cortical speech tracking measures in older adults27,28,30. The relatively good hearing ability in our sample, with only seven individuals exhibiting a PTA in the range considered moderate hearing loss43, might explain the lack of a significant association between hearing ability and neural encoding measures. We hypothesize that a sample with a higher prevalence of hearing loss might have yielded different results. The lack of a significant main effect of cognitive decline on neural encoding suggests that early cognitive decline might not uniformly disrupt neural processing of speech at the acoustic and linguistic levels. In addition, the specific impact of hearing loss on word-level segmentation underscores the importance of auditory cues in successful word recognition and highlights the additional cognitive load imposed by hearing impairment39.
Dual role of encoding accuracies in mTRF modeling
Our study found differences in encoding accuracies across the five mTRF models: acoustic, segmentation at the word- and phoneme-level, and linguistic at the word- and phoneme-level. The acoustic model consistently showed higher encoding accuracy compared to the other models. In contrast, the segmentation models, particularly at the phoneme-level, exhibited lower encoding accuracies. There is a high correlation between speech features like the ones we used25, which we accounted for as detailed in the Methods section. Specifically, the envelope of the speech signal, which is the basis for the acoustic model, contains information about the speech rate, rhythm, and prosody44, while the segmentation and linguistic models are more isolated to boundaries and feature-engineered linguistic cues. Furthermore, the speech envelope is a continuous signal directly derived from the speech wave heard by participants, while the segmentation and linguistic models are based on discrete, derived features. This distinction likely accounts for the differences in encoding accuracies, as the acoustic signal contains more direct, continuous auditory information compared to the derived linguistic features.
Initially, encoding accuracy in mTRF models was introduced as a measure to assess the quality of the model, involving statistical testing18. Specifically, one approach is to use a permutation procedure to define a null distribution against which the accuracy score is tested18. Similarly, some researchers establish a noise floor by phase scrambling the target regressors and then comparing the encoding accuracies from the original data to the noise floor28,45. However, other researchers have started using encoding accuracy as a measure of neural tracking, reflecting how well the brain follows specific features of the speech signal22,25, using it as a proxy for how well the speech representation is reflected in the EEG signal. The review by Gillis et al.25 discusses the use of encoding models as a diagnostic tool to assess the auditory pathway, highlighting this dual role. This dual role of encoding accuracy is a crucial methodological consideration when using the mTRF framework to study neural speech processing. In our data, we found that encoding accuracy is highly correlated with the signal power of the response signal (see Figure S2 for the correlation between encoding accuracy and signal power). This observation aligns with the notion that both the quality of the model and the neural tracking of speech can be inferred from the encoding accuracy.
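As an illustration of the noise-floor approach mentioned above, the following sketch phase-scrambles a regressor to build a null distribution of correlations; the toy signals and the simple correlation stand in for the full mTRF pipeline.

```python
# Hedged sketch: a phase-scrambling noise floor for encoding accuracy.
import numpy as np

def phase_scramble(x, rng):
    """Randomize Fourier phases while keeping the amplitude spectrum."""
    spec = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spec.shape)
    phases[0] = 0.0                                   # keep the DC component real
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=len(x))

rng = np.random.default_rng(3)
fs = 64                                               # illustrative sampling rate
envelope = rng.normal(size=fs * 60)                   # stand-in for a speech regressor
eeg = 0.2 * envelope + rng.normal(size=envelope.size) # toy "neural" signal

observed_r = np.corrcoef(envelope, eeg)[0, 1]
null_r = np.array([np.corrcoef(phase_scramble(envelope, rng), eeg)[0, 1] for _ in range(200)])
print(observed_r, np.quantile(null_r, 0.95))          # observed accuracy vs. noise floor
```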
The methods used to establish encoding accuracies also vary. Some scholars use a nested cross-validation approach to estimate encoding accuracies, as we did in the current manuscript, where the test set is also rotated (e.g.,28,46), while others use a leave-one-trial-out cross-validation approach (e.g.,47), as recommended in the sample procedure outlined by Crosse et al.18 and which we did in our previous study14. Depending on this procedure, the encoding accuracies can vary, which is an important consideration when comparing results across studies. It is noteworthy that there are no established benchmark values for expected encoding accuracies when comparing different acoustic, linguistic, and segmentation models.
Given the strengths of the mTRF framework, which provides both a signal response that allows for neurophysiological interpretation15 and a general encoding accuracy25, it is important to have clear research recommendations. Future studies should aim to delineate when to interpret response signals in terms of their power and shape, including time lags, and when it is more appropriate to focus on encoding accuracies. Establishing guidelines and normative values for these interpretations would enhance the reliability and consistency of findings in neural speech processing research.
Limitations of the study and future directions
Neuropsychological assessment: A major limitation of our study was that we relied on the MoCA as a proxy for detecting putative MCI, i.e., early signs of cognitive decline. We chose this approach for two main reasons: First, ideally, we would have conducted this study with patients diagnosed with MCI or AD, but it proved difficult to recruit participants with a clear neurocognitive profile—a common issue in early cognitive decline studies. Therefore, we used the MoCA, a screening instrument for MCI, to provide a rough classification of our participants. Second, we preferred using the MoCA over a neurocognitive test battery, as the MoCA was specifically developed for screening evaluations. Although the MoCA is suitable for initial screening, it may not be sensitive enough to detect subtle cognitive changes in our participants. Furthermore, the stage of cognitive decline detected with the MoCA may be too early to significantly affect neural speech encoding. Therefore, in replicating the study, we would rely on more comprehensive neuropsychological assessments that allow for a more accurate classification of cognitive status and a deeper understanding of the relationship between cognitive decline and neural speech encoding. In addition to the MoCA, it would have been beneficial to administer a more comprehensive, age-appropriate IQ test. An example for our case would be the LPS 50+, which assesses cognitive status and intellectual profiles in individuals aged 50–90 years and aids in diagnosing brain function disorders (e.g., early detection of degenerative diseases)48. This would allow us to relate the MoCA cut-off scores to the IQ test results, providing a more comprehensive picture of the cognitive status of our participants. We suggest that future studies incorporate such assessments to more accurately describe these relationships.
Stimulus complexity and cognitive demand: The complexity of the auditory stimulus in our study, involving the presentation of an audiobook in silence, might not have been sufficiently challenging to reveal differences in neural encoding between participants with and without early signs of cognitive decline. Prior research from our lab has shown that neural tracking of speech in older adults is significantly affected by cognitive load, particularly in speech-in-noise conditions29. Thus, embedding natural speech stimuli in more demanding listening conditions could provide a more robust assessment of the impacts of cognitive decline on speech processing, representing a promising direction for future research.
Methodological considerations in neural encoding: The methodology we used to investigate neural speech processing, particularly the use of the mTRF framework, faced limitations worth noting. The interdependence of the speech features used in our study posed a challenge in accurately isolating the effects of individual features on neural encoding. We chose to address this by regressing out the shared variance between features, which is one of several approaches, see review by Gillis et al.25. While this method helps mitigate some confounding effects, it may still have influenced the outcomes of our models. Furthermore, the linguistic models used—specifically, the word- and phoneme-based models—were constrained by the available features. Unlike previous studies that used 5-gram language models to estimate word surprisal, such as, e.g., Kries et al.22 or Gillis et al.49, we adopted a different approach due to our resource constraints. We used German BERT59, a pretrained large language model, to estimate word surprisal, potentially yielding different estimates compared to those derived from 5-gram language models. For phoneme surprisal, we designed a custom phonetic lexicon based on the Montreal Forced Aligner’s (MFA) pronunciation dictionary, and used the DeReKoGram frequency dataset50, which is an important source for behavioral or neurophysiological studies that require a large-scale corpus, for word frequencies. Additionally, see Weissbart et al.20 for a customized approach to n-gram models for estimating word surprisal. While our method allowed for bespoke estimations, it may have introduced inaccuracies in word- and phoneme-based speech feature calculations, potentially affecting the results of our mTRF models. Future research should strive for a unified methodological approach in constructing auditory encoding models. This would facilitate direct comparisons between studies and ensure more consistent estimations of linguistic surprisals, enhancing the reliability and interpretability of findings in cognitive neuroscience.
Conclusion and outlook
Our study revealed a significant interaction between hearing ability and the segmentation word-level model. Participants with reduced hearing ability showed lower encoding accuracy for the word segmentation model, suggesting that hearing impairment specifically affects the neural encoding of word boundaries in natural speech. This finding underscores the importance of auditory cues in successful word recognition and highlights the additional cognitive load imposed by hearing impairment. While we did not observe significant differences in the neural encoding of linguistic features between older adults with and without early signs of cognitive decline, our findings emphasize the need to consider the interplay between cognitive and auditory functions, particularly in the context of hearing loss. Future studies should incorporate more comprehensive neuropsychological assessments and introduce more challenging listening conditions to better understand these relationships. These enhancements would improve our understanding of the relationship between cognitive decline, hearing impairment, and neural encoding of speech in older adults. Additionally, recruiting individuals with more severe cognitive decline in future research could provide insights into the effects at different stages of cognitive impairment. Our work contributes to the expanding literature on the intersection of cognitive decline and hearing loss, highlighting the complex interactions between auditory and cognitive functions in older adulthood.
Methods
Participants and cognitive grouping
This reanalysis included 44 native Swiss-German speakers from the first study14. Participants were monolingual up to the age of seven, right-handed and retired. Exclusion criteria included a professional music career, dyslexia, significant neurological disorders, severe or asymmetrical hearing loss and the use of hearing aids. The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Canton of Zurich (BASEC No. 2019-01400), with all research methods conducted in adherence to the relevant guidelines and regulations. The sessions were conducted at the Linguistic Research Infrastructure (LiRI, liri.uzh.ch). Written informed consent was obtained from all participants and they received compensation for their participation.
Participants were divided into groups based on the MoCA12,24. MoCA scores range from 0 to 30, with a cutoff of 26 for normal cognitive function. Participants who scored 26 or more were assigned to the normal MoCA group, and participants who scored below 26 to the low MoCA group. The normal MoCA group included 25 participants (14 women; age range 60–83 years), and the low MoCA group included 19 participants (12 women; age range 60–82 years). The age difference between the groups was not statistically significant (two-tailed Mann–Whitney U test: U = 311.5). The distribution of MoCA scores is shown in Fig. 1A.
Audiometry
We measured the hearing thresholds for pure tones at frequencies of 125, 250, 500, 1000, 2000, 4000, and 8000 Hz in both ears, using the Affinity Compact audiometer (Interacoustics, Middelfart, Denmark), equipped with a DD450 circumaural headset (Radioear, New Eagle, PA, USA). Individual frequency thresholds are shown in Fig. 1B. Overall hearing ability was determined using the four-frequency PTA, the average of the thresholds at 500, 1000, 2000, and 4000 Hz26. Interaural differences were small, indicating symmetrical hearing ability. The PTA, averaged over both ears, was comparable between the normal and the low MoCA group. Most participants showed no to mild hearing impairment, while seven showed moderate impairment: two in the low MoCA group and five in the normal MoCA group. Furthermore, PTA was correlated with age, see Fig. 1C. No significant group differences in hearing ability were found (two-tailed Mann–Whitney U test: U = 262.5, p = .561, r = 0.11). However, as hearing ability is relevant and has an influence on auditory encoding21,27,28, we included PTA as a control variable in the analyses.
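A minimal sketch of the four-frequency PTA computation, using invented threshold values, is given below.

```python
# Minimal sketch: four-frequency PTA from audiogram thresholds (invented values, in dB HL).
import numpy as np

thresholds = {
    "left":  {500: 15, 1000: 20, 2000: 25, 4000: 35},
    "right": {500: 10, 1000: 20, 2000: 30, 4000: 40},
}
pta_per_ear = {ear: np.mean([t[f] for f in (500, 1000, 2000, 4000)]) for ear, t in thresholds.items()}
pta = float(np.mean(list(pta_per_ear.values())))   # averaged over both ears, as used in the analyses
print(pta_per_ear, pta)
```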
EEG recording
We recorded the EEG using the Biosemi ActiveTwo system (Biosemi, Amsterdam, The Netherlands) with 32 electrodes (10–20 system), four external electrodes and a sampling rate of . We positioned two external electrodes above and below the right eye to measure the electrooculogram (EOG), and two on the left and right mastoids. Participants sat in a soundproof, electromagnetically shielded booth during the recording. During the listening tasks, we instructed participants to focus on a fixation cross displayed on a screen and to minimize movement, especially when the cross was visible. Throughout the experiment, we monitored and maintained electrode offsets below .
Audiobook
Participants listened to 25 segments from the German audiobook version of Sylvia Plath’s novel, “The Bell Jar,” read by a professional female speaker36. Segments lasted on average , totalling a listening time of approximately . Silent gaps were limited to and the average speech wave intensity was scaled to SPL. Each segment commenced with a short silence of approximately , which we retained to introduce stimulus onset jitter. We calibrated the sound level such that segments were consistently played at peSPL. Audiobook segments were presented bilaterally through electromagnetically shielded insert ER3 earphones (Etymotic Research, Elk Grove Village, IL, USA)
Signal processing
We performed all data processing in Python 3.11.6 and used the MNE-Python package51 for all signal preprocessing steps. Unless otherwise specified, all filters applied to the EEG and speech wave signals were non-causal Infinite Impulse Response (IIR) Butterworth filters with an effective order twice the specified filter order, which was always set to 3. Anti-alias filters were consistently applied at of the target rate.
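As an illustration of this filtering setup, the following sketch applies a non-causal third-order Butterworth band-pass with MNE-Python; the data and the upper cutoff frequency are illustrative, and forward-backward application is what doubles the effective order.

```python
# Hedged sketch: a non-causal 3rd-order Butterworth band-pass applied with MNE-Python
# (toy data; the upper cutoff of 25 Hz is illustrative, not taken from the paper).
import numpy as np
import mne

rng = np.random.default_rng(4)
sfreq = 512.0
data = rng.normal(size=(32, int(sfreq * 10)))       # 32 channels, 10 s of toy data

iir_params = dict(order=3, ftype="butter")          # forward-backward filtering doubles the order
filtered = mne.filter.filter_data(data, sfreq, l_freq=0.5, h_freq=25.0,
                                  method="iir", iir_params=iir_params, verbose="error")
print(filtered.shape)
```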
EEG
First, we removed bad electrodes—on average electrodes per participant—and then referenced the EEG signals to the mean of the two mastoid channels. For six participants with at least one noisy mastoid channel, we used cap electrodes T7 and T8 as reference. We then segmented the continuous EEG from to relative to audiobook onset and downsampled the epochs to after applying an anti-alias filter at . We used Independent Component Analysis (ICA) to remove artifacts from the EEG signals. We created a copy of the epoch instance for ICA and filtered the epoch copy with a high-pass filter at (zero-phase, non-causal Hamming window Finite Impulse Response (FIR) filter, transition bandwidth: , filter length: 1691 samples), a process reported to facilitate ICA decomposition52. We performed ICA using the Picard algorithm53 with 1000 iterations, aiming to obtain of the variance of the signal. Furthermore, we improved the ICA performance by using five iterations within the FastICA algorithm54. After ICA fitting, the components associated with eye-related artifacts were automatically labeled using the external EOG electrodes as references, and the components associated with muscle activity or singular artifacts were manually labeled based on topography, temporal occurrence, and frequency spectrum. On average, we excluded components per participant. We zeroed out the components in the original epoch instance and then performed electrode interpolation for the epochs. The EEG was then downsampled to after applying an anti-alias filter at . Finally, we band-pass filtered the EEG between 0.5 and . To facilitate matrix storage, the epochs were cut to 0 to relative to audiobook onset.
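A hedged sketch of the ICA step with MNE-Python is shown below; it uses a small synthetic Epochs object, assumes the python-picard package is installed, and mapping the five FastICA iterations onto picard's fastica_it argument is our assumption.

```python
# Hedged sketch of the ICA step with MNE-Python on a small synthetic Epochs object.
import numpy as np
import mne

rng = np.random.default_rng(5)
sfreq = 512.0
info = mne.create_info([f"EEG{i:02d}" for i in range(32)] + ["EOG_up", "EOG_down"],
                       sfreq, ch_types=["eeg"] * 32 + ["eog"] * 2)
epochs = mne.EpochsArray(rng.normal(size=(3, 34, int(sfreq * 10))), info, verbose="error")

epochs_for_ica = epochs.copy().filter(l_freq=1.0, h_freq=None, verbose="error")
ica = mne.preprocessing.ICA(method="picard", max_iter=1000,
                            fit_params=dict(fastica_it=5),   # five FastICA warm-up iterations (assumed mapping)
                            random_state=97)
ica.fit(epochs_for_ica, verbose="error")
eog_inds, _ = ica.find_bads_eog(epochs_for_ica, verbose="error")   # label eye components via EOG
ica.exclude = eog_inds                # manually labeled muscle/artifact components would be added here
epochs_clean = ica.apply(epochs.copy(), verbose="error")
```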
Speech wave
The speech waves from each segment were processed to extract acoustic features. We first downsampled the speech waves to using an anti-alias filter at . Speech waves were then passed through a Gammatone filterbank55 with 28 channels with center frequencies from 50 to spaced equally on the equivalent rectangular bandwidth (ERB) scale. Each Gammatone frequency channel output was half-wave rectified and raised to the power of 0.3, before we averaged the filter outputs across channels to obtain a univariate temporal envelope. In line with the EEG preprocessing, we downsampled the envelopes to after applying an anti-alias filter at , and eventually band-pass filtered them between 0.5 and . We then truncated the envelopes to a uniform length or padded them with zeros (depending on the segment length).
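A compact sketch of this envelope extraction is given below; the gammatone filterbank itself is left as a placeholder callable (e.g., from a dedicated package), and the upper center frequency is supplied by the caller rather than restated here.

```python
# Hedged sketch of the envelope extraction; `gammatone_filterbank` is a placeholder
# callable that returns an array of shape (n_channels, n_samples).
import numpy as np

def envelope_from_wave(wave, fs, gammatone_filterbank, fmax, n_channels=28, fmin=50.0):
    bands = gammatone_filterbank(wave, fs, n_channels, fmin, fmax)   # ERB-spaced channels
    bands = np.maximum(bands, 0.0) ** 0.3       # half-wave rectification + power-law compression
    return bands.mean(axis=0)                   # average across channels -> univariate envelope
```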
Acoustic and linguistic speech features extraction
Our goal was to model neural responses not only to acoustic features of the speech signal, but also to a range of lexical (word-based) and sublexical (phoneme-based) representations in the signal. To this end, we generated a range of time-aligned linguistic speech representations as impulse features from the audiobook transcription. Drawing inspiration from Kries et al.22, we constructed five models, each with distinct speech features, to quantify the tracking of different levels of linguistic information in the speech signal. These models included (1) an acoustic model, (2) a segmentation model at the word-level, (3) a segmentation model at the phoneme-level, (4) a linguistic model at the word-level, and (5) a linguistic model at the phoneme-level. We generated the speech features for each segmentation and linguistic mTRF model as vectorized time series of zeros with the same sampling rate and length as the EEG epochs. Features were modeled as impulses at word and phoneme onsets, respectively. An example of the speech features is shown in Fig. 2.
Acoustic features
The speech feature pair for the acoustic model were the envelope and the envelope onsets, both of which were extracted from the speech wave. The extraction of the envelope is described in the previous section. Since the brain is sensitive to contrast and changes and information is often encoded in acoustic onsets in particular56, the model also included the envelope onsets. We constructed the envelope onsets as the half-wave rectified derivative of the envelope.
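A minimal sketch of this derivative-based onset feature:

```python
# Minimal sketch: envelope onsets as the half-wave rectified first derivative of the envelope.
import numpy as np

def envelope_onsets(envelope):
    derivative = np.diff(envelope, prepend=envelope[0])   # first difference, same length as input
    return np.maximum(derivative, 0.0)                    # keep increases only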
Segmentation features at word- and phoneme-level
Speech features of segmentation consisted of word onsets and phoneme onsets. For extracting the onsets, we determined their boundaries using the MFA (version 2.2.14)57. We created a transcript in Praat and then used the pre-trained German MFA acoustic model and the German MFA pronunciation dictionary58, first to create a phonetic transcription of the audio file and second to determine the timing of word and phoneme boundaries for each segment. The accuracy of word and phoneme boundaries, along with their time-aligned transcriptions, was manually verified and corrected as necessary. Word and phoneme onset timepoints were then extracted from the MFA output and modeled as impulses of the value one in the vectorized time series of zeros.
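The following sketch illustrates how such onset times could be turned into impulse regressors at the feature sampling rate; the onset times, rate, and segment length are illustrative.

```python
# Hedged sketch: turning forced-alignment onset times into impulse regressors.
import numpy as np

def impulse_vector(onset_times_s, values, sfreq, n_samples):
    """Place `values` (ones, or e.g. surprisal values) at the samples nearest to each onset."""
    vec = np.zeros(n_samples)
    idx = np.clip(np.round(np.asarray(onset_times_s) * sfreq).astype(int), 0, n_samples - 1)
    vec[idx] = values
    return vec

sfreq, n_samples = 64, 64 * 60                      # hypothetical feature rate and segment length
word_onsets = impulse_vector([0.31, 0.84, 1.22], 1.0, sfreq, n_samples)
```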
Linguistic word-level features
The linguistic word-level model included word surprisal and word frequency as speech features. Word surprisal is an approximation of how unexpected a word is in a given context. We used the pre-trained German BERT model59 for this calculation. BERT, short for Bidirectional Encoder Representations from Transformers, is a state-of-the-art language model designed for contextual language analysis. We adapted BERT to simulate unidirectional (left-to-right) context processing, reflecting natural listening comprehension. For each word in the audiobook segments, we created a sequence with the target word masked. The model then predicted the probability of the masked word based on its preceding context. We calculated word surprisal as the negative logarithm of this predicted probability for each word.
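A hedged sketch of this surprisal computation with the Hugging Face transformers library is shown below; the checkpoint name is an assumption (the paper only specifies a pretrained German BERT), and scoring only the first word piece of the target word is a simplification.

```python
# Hedged sketch: word surprisal as -log P(masked word | left context) under German BERT.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-german-cased")    # assumed checkpoint
model = AutoModelForMaskedLM.from_pretrained("bert-base-german-cased").eval()

def word_surprisal(left_context, target_word):
    text = f"{left_context} {tokenizer.mask_token}"                    # left-to-right context only
    inputs = tokenizer(text, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    log_probs = torch.log_softmax(logits, dim=-1)
    target_id = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(target_word)[0])
    return -log_probs[target_id].item()                                # negative log probability

print(word_surprisal("Botanik gefiel mir, weil ich gern Blätter", "zerschnitt"))
```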
Word frequency describes the frequency of a word out of context. We used DeReKoGram, a novel frequency dataset for 1-, 2- and 3-grams from the German Reference Corpus50, using the unigram frequencies to calculate the frequency of each word20. Relative unigram frequencies can be viewed as an estimate of the unconditional probability of occurrence of a word. First, we determined the relative frequency of each word in the audiobook segment-based transcript, and then we used the negative logarithm of this value as the word frequency. This procedure results in words with a high frequency producing a low value and vice versa22,49.
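A minimal sketch of this word-frequency feature, with toy counts standing in for the DeReKoGram unigram table:

```python
# Minimal sketch: word frequency as the negative log relative unigram frequency.
import numpy as np

unigram_counts = {"botanik": 120, "gefiel": 950, "mir": 500000, "blätter": 4300}   # illustrative
total = sum(unigram_counts.values())

def word_frequency_feature(word):
    rel_freq = unigram_counts.get(word.lower(), 1) / total   # unseen words get a count of one
    return -np.log(rel_freq)                                 # frequent words yield low values

print(word_frequency_feature("mir"), word_frequency_feature("Botanik"))
```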
Linguistic phoneme-level features
The speech features in the linguistic phoneme-level model were phoneme surprisal and phoneme entropy. Phoneme surprisal reflects how surprising a phoneme is given the preceding phonemes. We calculated phoneme surprisal as the negative logarithm of the phoneme probability given an activated cohort of phonemes, in line with prior work22,49,56. We generated a phonetic lexicon with lexical statistics by combining pronunciations from the MFA pronunciation dictionary and word frequency derived as absolute unigram frequencies from the DeReKoGram dataset, in which missing pronunciations were manually added, and words occurring in the stimuli but missing from DeReKoGram were assigned a frequency of one56. We then used the lexicon to calculate the probability of each phoneme given the preceding phonemes in the activated cohort and thus derived surprisal values for each phoneme in the audiobook segments.
Phoneme entropy is an indicator of the degree of competition between words congruent with the current phonemic input49,56. At the beginning of a word utterance, a large number of potential words form the activated cohort, leading to a high level of competition. This competition decreases as the utterance progresses and the cohort becomes smaller. We calculated phoneme entropy using the Shannon entropy formula applied to the words within the activated cohort; specifically, at the initial phoneme of each word, the activated cohort comprised all words in the lexicon. We used the same phonetic lexicon as for phoneme surprisal to calculate phoneme entropy.
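The cohort computations for both phoneme-level features can be sketched as follows, assuming a hypothetical lexicon that maps each pronunciation (a tuple of phonemes) to its unigram frequency; the base-2 logarithm and the choice to compute entropy over the cohort active before the current phoneme are illustrative assumptions rather than a definitive implementation.

```python
import numpy as np

def cohort_stats(pronunciation, lexicon):
    """Phoneme surprisal and cohort entropy for each phoneme of one word.

    lexicon: dict mapping a word's pronunciation (tuple of phonemes) to its
    unigram frequency; the full lexicon forms the cohort at the first phoneme.
    """
    surprisal, entropy = [], []
    cohort = dict(lexicon)                       # all words are active at word onset
    for i, phoneme in enumerate(pronunciation):
        prior = sum(cohort.values())
        # entropy over the currently active cohort (before hearing this phoneme)
        p_words = np.array(list(cohort.values())) / prior
        entropy.append(-np.sum(p_words * np.log2(p_words)))
        # probability of this phoneme given the active cohort
        next_cohort = {pron: f for pron, f in cohort.items()
                       if len(pron) > i and pron[i] == phoneme}
        surprisal.append(-np.log2(sum(next_cohort.values()) / prior))
        cohort = next_cohort
    return surprisal, entropy

# Hypothetical toy lexicon: pronunciation tuple -> absolute unigram frequency
lexicon = {("h", "a", "l", "o"): 120, ("h", "a", "n", "d"): 300, ("b", "a", "l", "d"): 150}
s, h = cohort_stats(("h", "a", "n", "d"), lexicon)
```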
Regressing out speech features not of interest
Following the methodology of Kries et al.22, we addressed collinearity between features before fitting the mTRF models. The features are interdependent: for example, envelope onsets contain information about word onsets, and word onsets contain information about phoneme onsets, but also vice versa, sublexical features can reveal information about lexical features25. To isolate the specific feature of interest for each model, we therefore regressed out the features not of interest from the EEG signal and used the EEG residuals as the target for the mTRF models. Specifically, we fitted a linear regression model with the EEG signal as the dependent variable and the features not of interest as independent variables, and used the residuals, i.e., the difference between the observed EEG and the EEG predicted by the linear model, as the target for the mTRF models. For the acoustic model, we regressed out the features of the word- and phoneme-level segmentation and linguistic models. For the segmentation models at the word and phoneme levels, we regressed out the features of the acoustic model and of the word- and phoneme-level linguistic models, but not those of each other22. For the word-level linguistic model, we regressed out the regressors of the acoustic, segmentation, and phoneme-level linguistic models. Similarly, for the phoneme-level linguistic model, we regressed out the regressors of the acoustic, segmentation, and word-level linguistic models.
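A minimal sketch of this residualization step, using scikit-learn's LinearRegression on illustrative array shapes (variable names are placeholders, not taken from the published code):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def regress_out(eeg, nuisance):
    """EEG residuals after linearly regressing out the features not of interest.

    eeg      : array of shape (n_samples, n_channels)
    nuisance : array of shape (n_samples, n_features), features to remove
    """
    model = LinearRegression().fit(nuisance, eeg)
    return eeg - model.predict(nuisance)

# e.g., residuals used as the target for the acoustic mTRF model (names illustrative)
# eeg_residuals = regress_out(eeg, np.column_stack([word_onsets, phoneme_onsets,
#                                                   word_surprisal, word_frequency,
#                                                   phoneme_surprisal, phoneme_entropy]))
```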
mTRF modeling
We quantified the cortical encoding of the speech features using mTRF models computed with the Eelbrain toolbox60. Eelbrain applies the Boosting algorithm to fit the mTRF models while mitigating the overfitting that can arise with correlated features such as those in our data61,62. Boosting is a coordinate descent algorithm that iteratively updates a sparse filter across time lags (the mTRF model) by changing a single filter weight at a time based on training data, in contrast to ridge regression, a conventional alternative in TRF modeling that adjusts all filter weights simultaneously15. After each weight update, the model is evaluated against validation data, and training stops when the error no longer decreases; this prevents overfitting and ensures that irrelevant filter weights remain at zero, increasing the parsimony of the model. A detailed explanation of the Boosting algorithm in the Eelbrain toolbox can be found in Brodbeck et al.60.
For each participant, we fitted five mTRF models. To prevent overfitting the models to the onset effects of the speech wave, we truncated the first second of each speech feature and EEG time series before fitting the mTRF models18. When fitting the models with the Boosting algorithm, we adjusted the basis function in Eelbrain to and normalized the feature-target pairs by z-transformation, setting the scale_data parameter to True. The models were fitted for time lags ranging from to 600 ms. We used a six-fold cross-validation approach in which the feature-target pairs based on 25 audiobook segments were systematically rotated through the training, validation, and testing phases. Each segment was used four times for training, once for validation, and once for testing across all folds. The models were calibrated using the training segments, optimized using the validation segments, and evaluated using the test segments, with encoding accuracy assessed as the Pearson correlation coefficient between the observed EEG residuals y and the predicted EEG residuals ŷ. The mean correlation coefficient across all folds was calculated for each electrode, resulting in a single encoding accuracy metric per electrode. Consequently, each mTRF model yielded one correlation coefficient per electrode and one response function per speech feature, i.e., two response functions for the acoustic and linguistic models and one for the segmentation models. Note that only the acoustic and linguistic models are truly multivariate (mTRF) models; the segmentation models each contain a single feature and are thus univariate TRF models, but for consistency we refer to all models as mTRF models. We averaged the encoding accuracies across all electrodes to obtain a single, comprehensive value of encoding accuracy, which we stored together with the response functions for further analysis.
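As an illustrative sketch, a single acoustic-model fit with Eelbrain's boosting function could look as follows; the inputs are assumed to be Eelbrain NDVar objects on a shared time axis, parameter values not stated in the text (lower lag bound, basis width) are placeholders, and the cross-validation arguments follow the Eelbrain boosting API and should be checked against the version used.

```python
from eelbrain import boosting

# `eeg_residuals`, `envelope`, and `envelope_onsets` are assumed to be NDVars
# concatenated across the 25 audiobook segments (hypothetical variable names).
res = boosting(
    y=eeg_residuals,                # EEG residuals after regressing out nuisance features
    x=[envelope, envelope_onsets],  # speech features of the acoustic model
    tstart=-0.1,                    # illustrative lower lag bound (s); see text
    tstop=0.6,                      # upper lag bound of 600 ms
    basis=0.05,                     # illustrative basis-window width (s)
    scale_data=True,                # z-transform feature-target pairs
    partitions=6,                   # six-fold cross-validation
    test=1,                         # hold out one partition per fold for testing
)
trfs = res.h          # response functions per feature and electrode
accuracy = res.r      # encoding accuracy (Pearson r) per electrode
```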
Spatial and temporal clustering of response functions
Electrode clusters
We selected nine a priori midline electrodes for further analysis: F3, Fz, F4, C3, Cz, C4, P3, Pz, and P4, which were categorized into three cluster regions: frontal (F), central (C), and parietal (P). This approach allowed us to include the activity of individual electrodes in the analysis while performing hierarchical modeling at the cluster level. The electrodes were selected based on their proximity to the auditory cortex and their relevance to neural speech processing, while covering a broad area of the scalp without making prior assumptions about the exact location of the neural generators of the response functions.
Temporal clusters
Previous studies have shown that peaks in the response functions occur in different time windows depending on the speech feature of interest21,22. To determine these time windows, we took a data-driven approach. First, for each participant and each of the nine a priori selected electrodes, we identified the two largest peaks in each response function. We extracted peaks within time lags of to 600 ms, omitting most of the negative lags but accounting for potential peaks occurring close to time lag zero. We used the find_peaks function from SciPy to identify the peaks, setting the prominence parameter to 0.5, i.e., requiring each peak to rise at least 0.5 (in the amplitude units of the response function) above its surrounding baseline. Finally, we excluded peaks with latencies below 0 ms, since neurophysiological responses are not expected to occur before the stimulus. Second, using the latencies of the two largest peaks across all electrodes, we performed K-Means clustering to group the time lags into two clusters, using the KMeans function from scikit-learn63 with two clusters and 100 initializations. We then calculated the mean of the two cluster centers to determine the boundary between the two peaks. This process was repeated for each speech feature for the later peak amplitude and latency analysis, and the resulting boundaries were used to define an “early” and a “late” time window for each speech feature. Figure S1 shows the identified peaks and the time boundaries estimated by the K-Means clustering for all response functions; the resulting time windows are reported in Table S4.
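A sketch of the peak extraction and clustering, assuming z-scored response functions in `all_trfs` and a shared lag axis `times` in seconds (both hypothetical); ranking peaks by absolute amplitude is an illustrative choice, not necessarily the one used here.

```python
import numpy as np
from scipy.signal import find_peaks
from sklearn.cluster import KMeans

def two_largest_peaks(trf, times, prominence=0.5):
    """Latencies of the two largest peaks in one response function (lags >= 0)."""
    idx, _ = find_peaks(np.abs(trf), prominence=prominence)
    idx = idx[times[idx] >= 0]                         # discard pre-stimulus peaks
    if idx.size == 0:
        return np.array([])
    largest = idx[np.argsort(np.abs(trf)[idx])[-2:]]   # keep the two largest peaks
    return times[largest]

# Pool peak latencies across participants and electrodes, then split them into
# an early and a late window with K-Means (two clusters, 100 initializations).
latencies = np.concatenate([two_largest_peaks(t, times) for t in all_trfs])
km = KMeans(n_clusters=2, n_init=100).fit(latencies.reshape(-1, 1))
boundary = km.cluster_centers_.mean()   # boundary between early and late window
```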
Statistical analyses
We ran all statistical analyses reported in this study in R (version 4.3.2). To fit the LMMs described in the following sections, we used the lme4 package64 and reported the estimates with confidence intervals (CI) calculated from bootstrap resamples. The reports include the CI limits (lower and upper), t-values, and p-values. We normalized the continuous predictor (PTA) prior to model fitting, so the coefficients for this predictor are standardized coefficients. For the categorical predictors (mTRF models), we used sum contrast coding; these coefficients represent deviations from the overall mean and are not standardized in the same way as the continuous predictor.
Effect of MoCA group and PTA on encoding accuracy
We used an LMM to examine the encoding accuracies across mTRF models, MoCA groups, PTA, and their interactions. The LMM was formulated as:
encoding accuracy ~ MoCA group × PTA × mTRF model + (1 | participant ID)    (1)
In this model, encoding accuracy, averaged across all electrodes per participant and mTRF model, served as the dependent variable. MoCA group was coded as a binary factor (0: normal, 1: low), PTA values were normalized using z-transformation, and the mTRF model variable was classified into five levels (acoustic, word-level segmentation, phoneme-level segmentation, word-level linguistic, and phoneme-level linguistic). We included the interaction between MoCA group and PTA to examine the effect of cognitive group on encoding accuracy while controlling for hearing ability. Additionally, we included the interaction between MoCA group and mTRF model to investigate the effect of cognitive group on encoding accuracy across different models. To examine the effect of hearing ability on encoding accuracy across different models, we included the interaction between PTA and mTRF model. Furthermore, we incorporated the three-way interaction between MoCA group, PTA, and mTRF model to investigate the combined effect of cognitive group and hearing ability on encoding accuracy across different models. By including this three-way interaction term, we aimed to assess how cognitive decline influences speech processing at various levels. We modified the default contrast coding scheme from treatment contrasts to an orthogonal sum-to-zero coding system to account for potential interactions65. This adjustment allowed us to estimate the main effects at the level of the grand mean, ensuring more accurate interpretation of these effects. The model also included a random intercept for the factor participant ID to account for the nested data structure.
mTRF-model based effects of MoCA group and PTA on response function signal power
In addition to encoding accuracy, we examined the signal power of the response functions for each mTRF model. We calculated the signal power as the root mean square (RMS) over time lags from 0 to 500 ms, separately for each electrode and each speech feature-based response function. The RMS was chosen for two reasons. First, it efficiently quantifies the total energy of the response function: it is robust to signal fluctuations and sensitive to the magnitude and consistency of neural responses because it captures peaks regardless of their polarity. Second, while acoustic response functions typically exhibit a P1–N1–P2 pattern, response functions for other speech features are less stereotyped. For example, previous studies21,49 have identified a response to word surprisal similar to the N400 effect, characterized by a negative deflection around 400 ms after the stimulus. Using the RMS within the 0–500 ms window therefore allows us to measure the total energy of the response function regardless of its specific shape.
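For reference, the RMS over the 0–500 ms lag window reduces to a short computation (variable names are illustrative):

```python
import numpy as np

def response_power(trf, times):
    """Root mean square of one response function over lags from 0 to 500 ms."""
    window = (times >= 0.0) & (times <= 0.5)   # times in seconds
    return np.sqrt(np.mean(trf[window] ** 2))
```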
An LMM with the following formula was used to examine the RMS values across mTRF models, MoCA groups, and their interaction with PTA:
RMS ~ MoCA group × PTA + cluster + speech feature + (1 | participant ID)    (2)
We implemented the LMM separately for each of the mTRF models. RMS was the dependent continuous variable. Again, MoCA group was coded as a binary factor (0: normal, 1: low) and PTA values were normalized using z-transformation. Cluster was a factor with three levels (F, C, P, with F as reference). In line with the previous statistical model, we included the interaction between MoCA group and PTA and a random intercept for the factor participant ID to account for the nested structure of the data. In the acoustic, linguistic word-level, and linguistic phoneme-level models, speech feature was a factor with two levels (one per speech feature in the mTRF model), with the first level serving as reference level: envelope for the acoustic model, word surprisal for the linguistic word-level model, and phoneme surprisal for the linguistic phoneme-level model. Again, we used the sum-to-zero coding system to estimate the main effects at the level of the grand mean.
For the word-level and phoneme-level segmentation models, we used a slightly different formula:
RMS ~ MoCA group × PTA + cluster + (1 | participant ID)    (3)
In these models, only one speech feature was included, i.e., word onsets for the word-level model and phoneme onsets for the phoneme-level model; thus, there was no speech feature factor.
We conducted post-hoc comparisons to examine the interaction between PTA and the word-level segmentation mTRF model on encoding accuracy using the emmeans package, with significance levels adjusted using Tukey’s method.
Group comparisons of peak amplitudes and latencies
Table S4 shows how frequently (as a percentage of participants) peaks occurred in the three electrode clusters during the early and late time windows. To examine group differences in peak amplitudes and latencies, we focused on windows and clusters in which peaks occurred in at least of participants (an arbitrary threshold). Consequently, our analyses included peaks in the early window for the envelope, envelope onset, word onset, and word surprisal responses, and in the late window for the word onset, phoneme onset, and word surprisal responses, with different clusters for each speech feature-based response. We used two-tailed Mann–Whitney U tests to assess differences in peak amplitudes and latencies between the normal and low MoCA groups in these time windows. Since we performed multiple comparisons over the same peaks within different clusters, we applied the Holm-Bonferroni correction66 to the p-values. This method adjusts the significance level of each test based on its rank among the other tests within the same response, thereby controlling the familywise error rate. Our report includes the U-statistic, corrected p-value, and effect size r for each comparison.
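A sketch of one such comparison, with the effect size r derived from the normal approximation of the U statistic (one common convention, which may differ in detail from the computation used here) and Holm-Bonferroni correction via statsmodels; `cluster_pairs` is a hypothetical list of group-value pairs:

```python
import numpy as np
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests

def compare_groups(values_normal, values_low):
    """Two-tailed Mann-Whitney U test with effect size r = |Z| / sqrt(N)."""
    n1, n2 = len(values_normal), len(values_low)
    u, p = mannwhitneyu(values_normal, values_low, alternative="two-sided")
    mu = n1 * n2 / 2                                   # expected U under H0
    sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)      # SD of U (no tie correction)
    r = abs((u - mu) / sigma) / np.sqrt(n1 + n2)
    return u, p, r

# Holm-Bonferroni correction across clusters tested for the same response
results = [compare_groups(a, b) for a, b in cluster_pairs]   # cluster_pairs: hypothetical
p_values = [p for _, p, _ in results]
reject, p_corrected, _, _ = multipletests(p_values, method="holm")
```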
Acknowledgements
We would like to express our gratitude to Elainne Vibal and Katarina Kliestenec for their tremendous assistance with data collection, and to Andrew Clark from LiRI for his support in setting up the EEG experiment. In addition, we are very grateful to Eleanor Chodroff for providing the MFA tutorial, which greatly aided our understanding of how to apply the MFA for forced alignment to our audiobook segments. Finally, we would like to thank the two reviewers for their insightful comments and suggestions, which helped us to improve the quality of this paper.
Author contributions
EB and NG designed research, EB collected data, analyzed data and wrote the paper, NG revised the manuscript.
Funding sources
This study was funded by the Swiss National Science Foundation (SNSF, snf.ch, grant number PR00P1_185715 to NG). EB is a pre-doctoral fellow at International Max Planck Research School on the Life Course (IMPRS LIFE).
Data availability
The data used in this study are available upon request from the corresponding author.
Code availability
The code for signal processing, including the extraction of acoustic and linguistic speech features, the Boosting pipeline, and the statistical analysis described in this paper, is available at github.com/elbolt/acuLin-speech.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-024-69602-1.
References
- 1. Livingston, G. et al. Dementia prevention, intervention, and care: 2020 Report of the Lancet Commission. The Lancet 396, 413–446. 10.1016/S0140-6736(20)30367-6 (2020).
- 2. Vuorinen, E., Laine, M. & Rinne, J. Common pattern of language impairment in vascular dementia and in Alzheimer disease. Alzheimer Dis. Assoc. Disord. 14, 81–86. 10.1097/00002093-200004000-00005 (2000).
- 3. Kempler, D. & Goral, M. Language and dementia: Neuropsychological aspects. Annu. Rev. Appl. Linguist. 28, 73–90. 10.1017/S0267190508080045 (2008).
- 4. Mueller, K. D., Hermann, B., Mecollari, J. & Turkstra, L. S. Connected speech and language in mild cognitive impairment and Alzheimer’s disease: A review of picture description tasks. J. Clin. Exp. Neuropsychol. 40, 917–939. 10.1080/13803395.2018.1446513 (2018).
- 5. Taler, V. & Phillips, N. A. Language performance in Alzheimer’s disease and mild cognitive impairment: A comparative review. J. Clin. Exp. Neuropsychol. 30, 501–556. 10.1080/13803390701550128 (2008).
- 6. Keller, J. N. Age-related neuropathology, cognitive decline, and Alzheimer’s disease. Ageing Res. Rev. 5, 1–13. 10.1016/j.arr.2005.06.002 (2006).
- 7. Lindenberger, U. & Baltes, P. B. Sensory functioning and intelligence in old age: A strong connection. Psychol. Aging 9, 339–355. 10.1037/0882-7974.9.3.339 (1994).
- 8. Lin, F. R. & Albert, M. Hearing loss and dementia—Who is listening? Aging Mental Health 18, 671–673. 10.1080/13607863.2014.915924 (2014).
- 9. Thomson, R. S., Auduong, P., Miller, A. T. & Gurgel, R. K. Hearing loss as a risk factor for dementia: A systematic review. Laryngoscope Investig. Otolaryngol. 2, 69–79. 10.1002/lio2.65 (2017).
- 10. Zion Golumbic, E. M., Poeppel, D. & Schroeder, C. E. Temporal context in speech processing and attentional stream selection: A behavioral and neural perspective. Brain Lang. 122, 151–161. 10.1016/j.bandl.2011.12.010 (2012).
- 11. Edwards, J. D. et al. Auditory processing of older adults with probable mild cognitive impairment. J. Speech Lang. Hear. Res. 60, 1427–1435. 10.1044/2016_JSLHR-H-16-0066 (2017).
- 12. Bidelman, G. M., Lowther, J. E., Tak, S. H. & Alain, C. Mild cognitive impairment is characterized by deficient brainstem and cortical representations of speech. J. Neurosci. 37, 3610–3620. 10.1523/JNEUROSCI.3700-16.2017 (2017).
- 13. Morrison, C., Rabipour, S., Knoefel, F., Sheppard, C. & Taler, V. Auditory event-related potentials in mild cognitive impairment and Alzheimer’s disease. Curr. Alzheimer Res. 15, 702–715. 10.2174/1567205015666180123123209 (2018).
- 14. Bolt, E. & Giroud, N. Auditory encoding of natural speech at subcortical and cortical levels is not indicative of cognitive decline. eNeuro. 10.1523/ENEURO.0545-23.2024 (2024).
- 15. Crosse, M. J., Di Liberto, G. M., Bednar, A. & Lalor, E. C. The multivariate temporal response function (mTRF) toolbox: A MATLAB toolbox for relating neural signals to continuous stimuli. Front. Hum. Neurosci. 10, 604. 10.3389/fnhum.2016.00604 (2016).
- 16. Vanthornhout, J., Decruy, L. & Francart, T. Effect of task and attention on neural tracking of speech. Front. Neurosci. 10.3389/fnins.2019.00977 (2019).
- 17. Lesenfants, D. & Francart, T. The interplay of top–down focal attention and the cortical tracking of speech. Sci. Rep. 10, 6922. 10.1038/s41598-020-63587-3 (2020).
- 18. Crosse, M. J. et al. Linear modeling of neurophysiological responses to speech and other continuous stimuli: Methodological considerations for applied research. Front. Neurosci. 15, 705621. 10.3389/fnins.2021.705621 (2021).
- 19. Brodbeck, C., Presacco, A., Anderson, S. & Simon, J. Z. Over-representation of speech in older adults originates from early response in higher order auditory cortex. Acta Acustica United Acustica 104, 774–777. 10.3813/AAA.919221 (2018).
- 20. Weissbart, H., Kandylaki, K. D. & Reichenbach, T. Cortical tracking of surprisal during continuous speech comprehension. J. Cogn. Neurosci. 32, 155–166. 10.1162/jocn_a_01467 (2020).
- 21. Gillis, M., Kries, J., Vandermosten, M. & Francart, T. Neural tracking of linguistic and acoustic speech representations decreases with advancing age. NeuroImage 267, 119841. 10.1016/j.neuroimage.2022.119841 (2023).
- 22. Kries, J. et al. Exploring neural tracking of acoustic and linguistic speech representations in individuals with post-stroke aphasia. Hum. Brain Mapp. 45, e26676. 10.1002/hbm.26676 (2024).
- 23. Gillis, M., Decruy, L., Vanthornhout, J. & Francart, T. Hearing loss is associated with delayed neural responses to continuous speech. Eur. J. Neurosci. 55, 1671–1690. 10.1111/ejn.15644 (2022).
- 24. Nasreddine, Z. S. et al. The Montreal Cognitive Assessment, MoCA: A brief screening tool for mild cognitive impairment. J. Am. Geriatr. Soc. 53, 695–699. 10.1111/j.1532-5415.2005.53221.x (2005).
- 25. Gillis, M., Van Canneyt, J., Francart, T. & Vanthornhout, J. Neural tracking as a diagnostic tool to assess the auditory pathway. Hear. Res. 426, 108607. 10.1016/j.heares.2022.108607 (2022).
- 26. Lin, F. R. & Reed, N. S. The pure-tone average as a universal metric-knowing your hearing. JAMA Otolaryngol. Head Neck Surg. 147, 230–231. 10.1001/jamaoto.2020.4862 (2021).
- 27. Decruy, L., Vanthornhout, J. & Francart, T. Hearing impairment is associated with enhanced neural tracking of the speech envelope. Hear. Res. 393, 107961. 10.1016/j.heares.2020.107961 (2020).
- 28. Fuglsang, S. A., Märcher-Rørsted, J., Dau, T. & Hjortkjær, J. Effects of sensorineural hearing loss on cortical synchronization to competing speech during selective attention. J. Neurosci. 40, 2562–2572. 10.1523/JNEUROSCI.1936-19.2020 (2020).
- 29. Schmitt, R., Meyer, M. & Giroud, N. Better speech-in-noise comprehension is associated with enhanced neural speech tracking in older adults with hearing impairment. Cortex 151, 133–146. 10.1016/j.cortex.2022.02.017 (2022).
- 30. Presacco, A., Simon, J. Z. & Anderson, S. Evidence of degraded representation of speech in noise, in the aging midbrain and cortex. J. Neurophysiol. 116, 2346–2355. 10.1152/jn.00372.2016 (2016).
- 31. Schneider, B. A. & Pichora-Fuller, M. K. Implications of perceptual deterioration for cognitive aging research. In The Handbook of Aging and Cognition, 2nd ed., 155–219 (Lawrence Erlbaum Associates Publishers, Mahwah, NJ, US, 2000).
- 32. Frei, V., Schmitt, R., Meyer, M. & Giroud, N. Visual speech cues enhance neural speech tracking in right auditory cluster leading to improvement in speech in noise comprehension in older adults with hearing impairment. Authorea Preprints. 10.22541/au.167769544.47033512/v1 (2023).
- 33. Luo, H. & Poeppel, D. Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron 54, 1001–1010. 10.1016/j.neuron.2007.06.004 (2007).
- 34. Giraud, A.-L. & Poeppel, D. Cortical oscillations and speech processing: Emerging computational principles and operations. Nat. Neurosci. 15, 511–517. 10.1038/nn.3063 (2012).
- 35. Poeppel, D. & Assaneo, M. F. Speech rhythms and their neural foundations. Nat. Rev. Neurosci. 21, 322–334. 10.1038/s41583-020-0304-4 (2020).
- 36. Kurthen, I. et al. Selective attention modulates neural envelope tracking of informationally masked speech in healthy older adults. Hum. Brain Mapp. 42, 3042–3057. 10.1002/hbm.25415 (2021).
- 37. Giroud, N., Keller, M., Hirsiger, S., Dellwo, V. & Meyer, M. Bridging the brain structure-brain function gap in prosodic speech processing in older adults. Neurobiol. Aging 80, 116–126. 10.1016/j.neurobiolaging.2019.04.017 (2019).
- 38. McClelland, J. L., Mirman, D. & Holt, L. L. Are there interactive processes in speech perception? Trends Cogn. Sci. 10, 363–369. 10.1016/j.tics.2006.06.007 (2006).
- 39. Mattys, S. L., Davis, M. H., Bradlow, A. R. & Scott, S. K. Speech recognition in adverse conditions: A review. Lang. Cogn. Proc. 27, 953–978. 10.1080/01690965.2012.705006 (2012).
- 40. Poeppel, D. & Hackl, M. The functional architecture of speech perception. In Topics in Integrative Neuroscience: From Cells to Cognition, 154–180 (2008).
- 41. Hickok, G. & Poeppel, D. The cortical organization of speech processing. Nat. Rev. Neurosci. 8, 393–402. 10.1038/nrn2113 (2007).
- 42. Santos Revilla, A. E., Bolt, E., Kodrasi, I., Pellegrino, E. & Giroud, N. Classifying subjects with MCI and hearing loss from speech signals using machine learning. In preparation (2024).
- 43. Humes, L. E. The World Health Organization’s hearing-impairment grading system: An evaluation for unaided communication in age-related hearing loss. Int. J. Audiol. 58, 12–20. 10.1080/14992027.2018.1518598 (2019).
- 44. Peelle, J. E. & Davis, M. H. Neural oscillations carry speech rhythm through to comprehension. Front. Psychol. 10.3389/fpsyg.2012.00320 (2012).
- 45. Wong, D. D. E. et al. A comparison of regularization methods in forward and backward models for auditory attention decoding. Front. Neurosci. 12, 531. 10.3389/fnins.2018.00531 (2018).
- 46. Bachmann, F. L., MacDonald, E. N. & Hjortkjær, J. Neural measures of pitch processing in EEG responses to running speech. Front. Neurosci. 15, 738408. 10.3389/fnins.2021.738408 (2021).
- 47. Hjortkjær, J., Märcher-Rørsted, J., Fuglsang, S. A. & Dau, T. Cortical oscillations and entrainment in speech processing during working memory load. Eur. J. Neurosci. 51, 1279–1289. 10.1111/ejn.13855 (2020).
- 48. Kiese-Himmel, C. Neue Intelligenztests [New intelligence tests]. Sprache Stimme Gehör 40, 34–36. 10.1055/s-0041-103300 (2016).
- 49. Gillis, M., Vanthornhout, J., Simon, J. Z., Francart, T. & Brodbeck, C. Neural markers of speech comprehension: Measuring EEG tracking of linguistic speech representations, controlling the speech acoustics. J. Neurosci. 41, 10316–10329. 10.1523/JNEUROSCI.0812-21.2021 (2021).
- 50. Wolfer, S., Koplenig, A., Kupietz, M. & Müller-Spitzer, C. Introducing DeReKoGram: A novel frequency dataset with lemma and part-of-speech information for German. Data 8, 170. 10.3390/data8110170 (2023).
- 51. Gramfort, A. et al. MEG and EEG data analysis with MNE-Python. Front. Neurosci. 7, 267. 10.3389/fnins.2013.00267 (2013).
- 52. Klug, M. & Gramann, K. Identifying key factors for improving ICA-based decomposition of EEG data in mobile and stationary experiments. Eur. J. Neurosci. 54, 8406–8420. 10.1111/ejn.14992 (2020).
- 53. Ablin, P., Cardoso, J.-F. & Gramfort, A. Faster independent component analysis by preconditioning with Hessian approximations. IEEE Trans. Signal Process. 66, 4040–4049. 10.1109/TSP.2018.2844203 (2018).
- 54. Hyvarinen, A. Fast ICA for noisy data using Gaussian moments. In 1999 IEEE International Symposium on Circuits and Systems (ISCAS), Vol. 5, 57–61. 10.1109/ISCAS.1999.777510 (1999).
- 55. Glasberg, B. R. & Moore, B. C. Derivation of auditory filter shapes from notched-noise data. Hear. Res. 47, 103–138. 10.1016/0378-5955(90)90170-T (1990).
- 56. Brodbeck, C., Hong, L. E. & Simon, J. Z. Rapid transformation from auditory to linguistic representations of continuous speech. Curr. Biol. 28, 3976–3983.e5. 10.1016/j.cub.2018.10.042 (2018).
- 57. McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M. & Sonderegger, M. Montreal Forced Aligner: Trainable text-speech alignment using Kaldi. In Proc. Interspeech 2017, 498–502. 10.21437/Interspeech.2017-1386 (2017).
- 58. McAuliffe, M. & Sonderegger, M. German MFA Dictionary v2.0.0. https://mfa-models.readthedocs.io/en/latest/dictionary/German/German%20MFA%20dictionary%20v2_0_0.html (2022).
- 59. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186. 10.18653/v1/N19-1423 (Association for Computational Linguistics, Minneapolis, Minnesota, 2019).
- 60. Brodbeck, C. et al. Eelbrain, a Python toolkit for time-continuous analysis with temporal response functions. eLife 12, e85012. 10.7554/eLife.85012 (2023).
- 61. David, S. V., Mesgarani, N. & Shamma, S. A. Estimating sparse spectro–temporal receptive fields with natural stimuli. Netw. Comput. Neural Syst. 18, 191–212. 10.1080/09548980701609235 (2007).
- 62. David, S. V. & Shamma, S. A. Integration over multiple timescales in primary auditory cortex. J. Neurosci. 33, 19154–19166. 10.1523/JNEUROSCI.2270-13.2013 (2013).
- 63. Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830. 10.5555/1953048.2078195 (2011).
- 64. Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48. 10.18637/jss.v067.i01 (2015).
- 65. Singmann, H. & Kellen, D. An introduction to mixed models for experimental psychology. In New Methods in Cognitive Psychology (Routledge, 2019).
- 66. Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6, 65–70 (1979).