Ann N Y Acad Sci. 2026 Feb 4;1556(1):e70212. doi: 10.1111/nyas.70212

Explaining the Musical Advantage in Speech Perception Through Beat Perception and Working Memory

Maxime Perron 1, Emily A Wood 1, Frank A Russo 1,2
PMCID: PMC12873458  PMID: 41640046

ABSTRACT

Although musical experience has been linked to enhanced speech‐in‐noise (SIN) perception, the mechanisms underlying this relationship remain unclear. While previous studies have identified contributions from both auditory and cognitive skills, few have evaluated these contributions within an integrated framework. Furthermore, most studies have relied on binary comparisons between musicians and nonmusicians. Here, we assessed 62 young adults with normal hearing using a continuous measure of musical engagement (Goldsmiths Musical Sophistication Index) alongside tests of beat perception (Beat Alignment Test), pitch discrimination (frequency difference limen), auditory working memory (WAIS digit span), and subcortical pitch encoding (frequency‐following response, FFR). SIN perception was measured with a spatialized two‐talker masker task. Greater musical sophistication was associated with better SIN performance, stronger working memory, finer beat perception, and sharper pitch discrimination. Regression analyses identified working memory and beat perception as the strongest predictors, and mediation analyses indicated that these skills contributed to the association between musical sophistication and SIN performance, with working memory accounting for the most variance. In contrast, pitch discrimination and FFR precision were not significant predictors. Our findings clarify the cognitive and temporal foundations of the musician advantage and highlight the value of considering musical engagement as a continuous variable rather than categorical.

Keywords: auditory cognition, beat perception, music, speech perception, working memory


Musical experience enhances speech‐in‐noise (SIN) perception, yet the mechanisms remain unclear. We tested 62 young adults using continuous measures of musical engagement, auditory and cognitive skills, and subcortical pitch encoding. Greater musical sophistication predicted better SIN performance, stronger working memory, finer beat perception, and better pitch discrimination. Regression and mediation analyses identified working memory and beat perception as key mediators. Our findings clarify the cognitive foundations of the musician advantage.


1. Introduction

Music challenges the brain more than almost any other form of auditory input, engaging perception, memory, and emotion all at once. Its harmonic, melodic, and rhythmic structures are layered, rich, and dynamic. Listeners must process precise acoustic information at multiple time scales while maintaining attentional control. Learning to play an instrument or sing presents additional challenges. Musicians must refine their ability to distinguish pitch differences of a fraction of a semitone, synchronize their sensorimotor timing to the millisecond, distinguish subtle differences in timbre between instruments or voices, and integrate sensory information from auditory, visual, and proprioceptive sources. These skills are often perfected and maintained through years of practice, often in emotionally stimulating and socially interactive conditions. It is believed that such intensive training triggers neuroplastic changes related to cortical and subcortical auditory processing [1, 2]. Research indicates that musicians outperform nonmusicians in detecting subtle differences in pitch [3, 4], timing [5], and timbre [6, 7]. These advantages may extend to the perception of complex nonmusical sounds, such as speech [8].

The potential of musical training to improve speech processing is attributed to cross‐domain auditory plasticity. This concept suggests that improvements in one domain can transfer to another if the two domains have shared perceptual and cognitive requirements [8–12]. Patel [9, 10, 11] proposed the OPERA (Overlap, Precision, Emotion, Repetition, Attention) hypothesis, according to which such transfer is more likely to occur if music and speech involve overlapping neural circuits; when musical training requires greater precision than typical speech processing; when practice is emotionally rewarding; when practice is repeated over long periods of time; and when practice requires sustained attention. Empirical evidence supports each of these conditions. For instance, speech and music processing overlap in several cortical and subcortical areas [13–16], and musical training has been associated with structural and functional alterations in speech‐related networks [17–19]. Music often requires detecting pitch changes as small as a semitone, demanding a level of auditory precision beyond what is typically needed for speech [20].

Speech‐in‐noise (SIN) perception is a particularly relevant test for OPERA. Understanding SIN is a demanding skill that varies considerably across populations, even among young adults with clinically normal hearing [21, 22]. SIN tasks require listeners to isolate a target voice from competing background noise, integrate degraded information, and maintain comprehension despite interference. This “cocktail party problem” involves both bottom‐up analysis of the auditory scene, such as separating streams using spectral, temporal, and spatial cues [23–25], and top‐down processes, including selective attention [26], sensorimotor integration [27, 28], working memory [29], and linguistic prediction [30]. Musical training mobilizes and refines many of these same mechanisms, for example, by focusing on individual lines in an ensemble, anticipating the beat in a rhythmic pattern, and following melodies in polyphonic textures. In addition, engagement with music is generally rewarding [31], which may confer advantages in terms of achieving consistent and prolonged training. If musical experience can improve perception of SIN, it could be a valuable means of improving communication, particularly for individuals who have difficulty understanding SIN.

Converging evidence from behavioral studies supports the link between musical experience and improved SIN perception. Cross‐sectional research frequently reports that both professional and amateur musicians outperform nonmusicians in SIN tasks [18, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42], although other studies have reported null effects [43–45]. Meta‐analytic work finds a small‐to‐moderate musician advantage that is largest under difficult listening conditions [46–48]. Furthermore, intervention studies suggest that musical training can improve SIN performance [39, 49, 50, 51].

Several mechanisms have been proposed to explain how musical experience improves SIN perception. Pitch processing is often considered central and can be indexed both neurally and behaviorally. The frequency‐following response (FFR), a neurophysiological marker of phase‐locked activity to periodic sounds, reflects pitch perception and auditory fidelity [52, 53]. Musicians frequently show stronger FFRs, consistent with experience‐dependent plasticity [54–56], although contributions from inherent auditory abilities have also been noted [44], and a large multisite study found no clear association between FFR strength and musical training [57]. Behavioral studies similarly suggest an advantage in pitch acuity. Musicians tend to exhibit lower frequency discrimination limens (FDL), allowing them to detect subtle differences in fundamental frequency (F0) [3, 58, 59]. Intervention studies provide causal evidence for this pathway. For instance, Dubinsky et al. [49] showed that short‐term choir training enhanced both pitch acuity and FFR strength in older adults, leading to improved SIN performance.

Rhythm skills, particularly beat perception, may provide an additional pathway. Although speech is not perfectly regular, its prosodic, syllabic, and phonological patterns provide temporal cues that help listeners anticipate important elements, direct attention to moments of high informational value, and take advantage of brief reductions in background noise. Such timing patterns help segment the speech stream into meaningful units, since listeners can use durational information to detect word boundaries even under noisy conditions [60]. Percussionists, who typically exhibit refined rhythm skills, have been shown to outperform both vocalists and nonmusicians in SIN tasks [41].

Beyond auditory mechanisms, higher‐order cognitive mechanisms may further support the musician advantage. Working memory supports SIN performance by enabling listeners to maintain and integrate degraded speech until sufficient context becomes available for comprehension [61, 62]. Musicians demonstrate better auditory working memory compared to nonmusicians [37, 63, 64], and evidence suggests that this advantage can mitigate age‐related decline [65]. Together, these findings indicate that musical experience may benefit SIN perception through multiple pathways.

A key limitation of the existing literature is that these mechanisms are rarely examined within a single framework, but instead tested in isolation or separate statistical models. This makes it difficult to determine whether musicians’ advantages in SIN reflect a dominant pathway, multiple interacting mechanisms, or the broader influence of auditory and cognitive abilities. Even when advantages are observed, meta‐analyses suggest that the effects are small to moderate [46–48]. Methodological choices further compound this issue. Self‐selection may inflate the observed benefits, since individuals with strong innate auditory or cognitive abilities are more likely to pursue music. More importantly, many studies dichotomize participants into musicians and nonmusicians using arbitrary cut‐offs (e.g., ≥10 years of training), which overlook informal engagement, intermediate levels of experience, and high aptitude in individuals with limited training (“musical sleepers” [44]). This could mask potential dose–response effects.

Some studies have adopted continuous measures of musical experience, such as the Goldsmiths Musical Sophistication Index [66] (Gold‐MSI). The Gold‐MSI is a dimensional measure of musical engagement that accounts for formal and informal experience, perceptual abilities, singing abilities, and emotional connection to music. Higher Gold‐MSI scores have been linked to greater sensory‐motor synchronization [67] and to neural markers of musical expertise, including increased gray matter volume and connectivity in regions associated with cognitive control, memory, language, and emotion [68–70]. Using this measure, Yates et al. [71] reported that in young adults, greater musical training and enhanced sensitivity to rhythm, beat, and melody were associated with better SIN performance. Even after controlling for frequency discrimination and working memory, beat perception was the strongest predictor of SIN performance. While this study provided important evidence for a rhythm–speech link in musicians, it did not test whether beat perception helped to account for the observed association between musical sophistication and SIN. Additionally, it relied on a relatively small sample (N = 24) and did not incorporate measures of FFR, which may also help explain the musician advantage.

Building on this work, the present study applies an integrated framework to young adults with normal hearing. We hypothesized that greater musical sophistication would be associated with better SIN performance. We also explored whether this statistical association might be reflected in shared variance with beat perception, pitch discrimination, working memory, and FFR precision. We did not have any specific a priori predictions about which of these skills would contribute the most. However, based on Yates et al.’s findings [71], we expected beat perception to emerge as an important factor. We also aimed to estimate the direct statistical association between musical sophistication and SIN, as well as the joint contribution of these candidate variables, without assuming a causal ordering. If musical sophistication continues to show a robust statistical link with SIN after including beat perception, pitch discrimination, working memory, and FFR precision, this would suggest that the association is not fully explained by them. Conversely, if the strength of the relationship is reduced, it would suggest that these skills collectively account for a substantial portion of the variance shared by musical sophistication and SIN. The broader aim of this study was to identify correlates that could clarify the basis of the musician advantage in SIN perception in order to inform future work addressing causal mechanisms more directly and to improve the potential efficacy of intervention studies.

2. Methods

2.1. Participants

Sixty‐four young adults aged 18–33 years with a broad range of musical experience were recruited from the psychology research pool (SONA) at Toronto Metropolitan University. No selection criteria were applied with respect to musical background, yielding a sample with typical variability in musical experience for this age group. All individuals who met the eligibility criteria were included.

Eligibility criteria included normal hearing and learning English before the age of 6. Only participants whose better ear average thresholds were within the normal‐hearing range (<20 dB HL between 0.5 and 4 kHz) and whose interaural difference was less than 15 dB HL were included [72]. Two participants were excluded due to thresholds meeting the criteria for hearing loss, resulting in a final sample of 62 participants (mean age = 21.0 years, SD = 3.6; 53 female).

The study was approved by the Toronto Metropolitan University Research Ethics Board (ID: REB 2020–202). All participants provided informed consent and received either course credit or monetary compensation for their time.

2.2. Procedure

All testing took place in an Industrial Acoustics Company (IAC) double‐walled soundproof booth during a 2.5‐h session. Pure‐tone thresholds were measured at the beginning of the session to confirm eligibility. Participants then completed a battery of established experimental tasks commonly used in prior research examining relationships between musical experience, SIN perception, and cognitive and auditory processing [49, 51, 73].

The task battery included a spatial SIN task, an auditory working memory task, a beat perception task, and a frequency discrimination task, followed by completion of the Gold‐MSI questionnaire and an FFR recording. The order of the tasks was fixed for all participants. Stimuli were presented at 80 dB sound pressure level (SPL) through ER‐3C insert earphones (Etymotic Research). Breaks were allowed as needed, and participants were monitored throughout the session.

2.3. Pure‐Tone Audiometry

Pure‐tone thresholds in dB hearing loss (HL) were obtained for each ear at 0.25, 0.5, 1, 2, 4, and 6 kHz with a calibrated clinical audiometer (GSI 61, Grason‐Stadler, United States). A pure‐tone average (PTA) was calculated separately for each ear using thresholds at 0.5, 1, 2, and 4 kHz. The mean PTA for the left ear was 7.52 dB HL (SD = 4.27, range = −1.25 to 20 dB HL) and for the right ear was 7.76 dB HL (SD = 3.77, range = −3.75 to 18.75 dB HL). The average interaural difference was 2.45 dB HL (SD = 1.80, range = 0–6.25). The better ear had a mean threshold of 6.41 dB HL (SD = 3.82, range = −3.75 to 18.75 dB HL).

2.4. Musical Sophistication

Participants completed the Gold‐MSI (v1.0) [66], which consists of 31 items rated on a 7‐point Likert scale from 1 (completely disagree) to 7 (completely agree) and eight categorical items (e.g., “I engaged in regular, daily practice of a musical instrument [including voice] for: 0 / 1 / 2 / 3 / 4–5 / 6–9 / 10+ years”). Scores were calculated using the authors’ scoring template, yielding five subscales. The Active Engagement subscale reflects the amount of time, effort, and resources devoted to musical activities such as listening, practicing, or attending concerts. The Perceptual Abilities subscale captures self‐reported skill in perceiving and discriminating musical features, including pitch, rhythm, and timbre. The Musical Training subscale indexes the extent of formal and informal instruction, practice history, and self‐rated proficiency on an instrument or voice. The Singing Abilities subscale measures perceived ability to sing in tune, maintain pitch, and reproduce melodies accurately. The Emotions subscale reflects the emotional responses to music and its role in mood regulation and engagement. A General Sophistication composite score, calculated from all subscales, ranges from 18 to 126 points, with higher scores reflecting greater musical sophistication. This score served as the primary variable of interest in our analyses. Exploratory analyses were also conducted on the subscales. Histograms illustrating the score distributions are provided in Figure S1.

2.5. SIN Task

We adapted the spatialized SIN task from Swaminathan et al. [40]. The original study included four conditions that manipulated masker intelligibility (forward vs. reversed speech) and spatial configuration (collocated vs. separated). Here, we focused exclusively on the spatially separated, forward masker condition, which produced the largest musician benefit in the original work.

In this task, participants listened to five‐word target sentences spoken by a female voice in the presence of an intelligible two‐speaker babble of different female voices. All sentences were semantically correct but unpredictable, following a fixed structure: noun, verb, number, adjective, object (e.g., “Jane bought four old socks”). The target sentence was announced by its initial name (e.g., “Jane”), leaving four target words to be identified.

The target speech was presented from an azimuth of 0°, and on every trial, the two masker talkers were presented simultaneously from fixed azimuths of −15° and +15°, with one masker to the left and one to the right of the listener. Maskers were presented at a fixed level of 55 dB SPL, and the target's level varied adaptively using a one‐down one‐up procedure. The initial step size was 6 dB, which was reduced to 3 dB after three reversals. Participants selected the target words on a matrix displaying eight possible options for each word category. Feedback was provided after each trial. A trial was scored as correct only if all four target words were identified accurately.

Consistent with the original implementation described by Swaminathan et al. [40], each participant completed blocks of 25 trials, with each block containing at least nine reversals. Compared to the original version, which included six blocks, the number of blocks was reduced to four to limit task duration. The outcome was the speech reception threshold (SRT) in dB, representing the signal‐to‐noise ratio (SNR) at which participants could correctly identify the target sentence 50% of the time. The final SRT score was calculated as the mean across the four blocks, with lower thresholds indicating better SIN performance.
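For illustration, the adaptive track described above can be sketched in code. This is a minimal simulation under stated assumptions, not the study's software: `respond` is a hypothetical stand-in for a full trial (stimulus presentation and identification of all four target words), and the block threshold is estimated here as the mean of the reversal points.

```python
import statistics

def run_adaptive_block(respond, start_snr=0.0, n_trials=25,
                       big_step=6.0, small_step=3.0, switch_after=3):
    """One-down one-up staircase targeting 50% correct.

    `respond(snr)` is a placeholder for a trial and must return True
    if all four target words were identified at the given SNR (dB).
    """
    snr = start_snr
    step = big_step
    reversals = []
    last_direction = None
    for _ in range(n_trials):
        direction = "down" if respond(snr) else "up"  # harder after a hit
        if last_direction is not None and direction != last_direction:
            reversals.append(snr)
            if len(reversals) == switch_after:
                step = small_step  # 6 dB -> 3 dB after three reversals
        last_direction = direction
        snr += -step if direction == "down" else step
    # Block threshold: mean SNR at the reversal points
    return statistics.mean(reversals) if reversals else snr

# Idealized listener who is correct whenever the SNR exceeds -2 dB
srt = run_adaptive_block(lambda snr: snr > -2.0)
```

Averaging such block thresholds across four blocks, as described above, would yield the final SRT.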

2.6. Pitch Perception

Pitch discrimination was assessed using a computerized FDL task with a three‐alternative forced‐choice design, following Amitay et al. [74]. On each trial, participants heard three 100‐ms pure tones (20‐ms rise/fall). Two were standard tones at 1000 Hz, and one was a slightly higher‐frequency target tone. Participants were asked to identify which tone was highest in pitch by pressing the corresponding number key (1, 2, or 3).

An adaptive staircase tracked individual thresholds: for the first five reversals, the frequency difference was halved after three correct responses or doubled after one incorrect response; thereafter, step sizes followed a √2 rule (division or multiplication by 1.414). Participants completed two blocks, each ending after 12 reversals. The threshold for each block was calculated as the mean frequency difference of the last 10 reversals. The final FDL score was obtained by averaging thresholds across the two blocks, with lower FDL values indicating better pitch discrimination performance.
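The FDL staircase differs from the SIN track in its decision rule (three correct responses before a step down) and its geometric step sizes. A hedged sketch follows, with `respond` again a hypothetical stand-in for a single three-interval trial:

```python
def run_fdl_block(respond, start_diff=100.0, stop_reversals=12):
    """Adaptive frequency-difference staircase: halve the difference
    after 3 correct / double it after 1 error for the first five
    reversals, then divide or multiply by sqrt(2) (1.414).

    `respond(diff)` is a placeholder trial returning True if the
    participant picked the target tone (1000 + diff Hz) as highest.
    Assumes stop_reversals >= 10 so the last 10 reversals exist.
    """
    diff = start_diff
    streak = 0
    reversals = []
    last_direction = None
    while len(reversals) < stop_reversals:
        if respond(diff):
            streak += 1
            if streak < 3:
                continue  # need three correct before making it harder
            streak = 0
            direction = "down"
            factor = 0.5 if len(reversals) < 5 else 1 / 1.414
        else:
            streak = 0
            direction = "up"
            factor = 2.0 if len(reversals) < 5 else 1.414
        if last_direction is not None and direction != last_direction:
            reversals.append(diff)
        last_direction = direction
        diff *= factor
    # Block threshold: mean difference over the last 10 reversals
    return sum(reversals[-10:]) / 10

thr = run_fdl_block(lambda d: d > 10.0)  # idealized 10 Hz threshold
```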

2.7. Beat Perception

Musical beat perception was assessed using the auditory subscale of the Beat Alignment Test (BAT) [75]. Twelve musical excerpts, drawn from three genres (four jazz, four rock, and four pop orchestral), were presented once in each of three conditions: on‐beat, phase‐shift (beats 30% early or late), and tempo‐shift (beats 10% faster or slower than the musical beat), resulting in 36 trials in total. On each trial, participants judged whether the superimposed beats were aligned with the music by pressing Y for “yes” or N for “no” on the keyboard after the excerpt ended, and rated their confidence (1 = guessing, 2 = somewhat sure, 3 = completely certain). Three practice trials were provided before testing began. Participants were instructed not to tap or move along with the music. Performance was calculated as the percentage of correct responses across all trials, with higher scores reflecting better beat perception.

2.8. Working Memory

Working memory was assessed using the forward, backward, and sequencing subtests of the digit span of the Wechsler Adult Intelligence Scale‐Fourth Edition (WAIS) [76]. In the forward span test, participants heard sequences of digits, starting with two and increasing by one per trial up to a maximum of nine. The digits were presented at a rate of 1 per second. Two trials were conducted for each sequence length. The test ended when both trials of a given length were unsuccessful, or the maximum length was reached. The backward recall subtest followed the same procedure, but participants had to recall the digits in reverse order. In the sequencing test, participants repeated the digits in ascending numerical order.

Consistent with standard WAIS‐IV scoring procedures, a Digit Span composite score was computed as the sum of correctly recalled trials across the Forward, Backward, and Sequencing subtests (maximum score = 48 points). The Digit Span task contributes to the Working Memory Index and reflects a combination of short‐term storage and attentional processes (Forward) as well as working memory manipulation and reordering processes (Backward and Sequencing), with higher scores indicating better performance.

2.9. FFR Acquisition and Preprocessing

Following Skoe and Kraus [53] and Krizman and Kraus [52], the FFR was elicited using a repeated “da” syllable (F0 = 100 Hz), presented binaurally with alternating polarities. Each stimulus was 170 ms in duration. Participants heard a total of 6000 trials over a recording time of 25 min. The “da” was presented in the presence of multitalker babble at an SNR of +10 dB.

Data were recorded using a vertical, one‐channel montage with three electrodes. The active electrode was placed on the forehead at the FPz site, the reference electrode was placed on the right earlobe, and the ground electrode was placed lower on the forehead. Prior to electrode placement, the skin at each site was cleaned with alcohol. One‐inch square cloth gel BIOPAC electrodes were connected to a BIOPAC MP150 system with an Evoked Response Amplifier. AcqKnowledge software acquired the data at a 20 kHz sampling rate with a 1 kHz high‐pass filter and a 10 kHz low‐pass filter. During the recording, participants watched a silent movie of their choice.

The data were preprocessed using PHZLAB and custom MATLAB scripts. First, a notch filter was applied to remove 60 Hz power line interference. Then, a high‐pass filter with a cut‐off frequency of 75 Hz was used to attenuate low‐frequency components, such as motion artifacts. Next, the continuous signal was segmented into 253‐ms epochs using a window from −40 to 213 ms relative to stimulus onset. Trials containing myogenic artifacts were excluded if their amplitude exceeded 50 µV. The number of trials retained per participant ranged from 3840 to 5994 out of 6000 trials (mean = 5627, SD = 415; median = 5772). Specifically, 49 out of 61 participants (80%) retained more than 5400 trials (>90% of trials). The remaining epochs were equalized to ensure an equal number of trials for positive and negative stimulus polarities. The FFRs from these epochs were averaged across both polarities. We applied a fast Fourier transform with 8192 samples to the averaged waveform to generate the frequency spectra. The spectral magnitude at the fundamental frequency (100 Hz) was extracted as the primary measure reflecting the strength of the FFR response, with larger values indicating stronger neural encoding of pitch.
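The final analysis step (averaging the retained epochs and reading out the spectral magnitude at 100 Hz from an 8192-sample FFT) can be sketched with NumPy. This is an illustrative reconstruction, not the authors' MATLAB pipeline, and the epochs below are synthetic:

```python
import numpy as np

FS = 20_000  # acquisition sampling rate (Hz)

def ffr_f0_strength(epochs, f0=100.0, n_fft=8192, fs=FS):
    """Average epochs (assumed artifact-free and polarity-balanced),
    then return the spectral magnitude at the fundamental frequency.
    `epochs` has shape (n_trials, n_samples)."""
    avg = epochs.mean(axis=0)
    spectrum = np.abs(np.fft.rfft(avg, n=n_fft)) / len(avg)
    freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
    return spectrum[np.argmin(np.abs(freqs - f0))]

# Synthetic epochs: a 100 Hz component buried in noise, 253 ms long
t = np.arange(int(0.253 * FS)) / FS
rng = np.random.default_rng(0)
epochs = np.sin(2 * np.pi * 100.0 * t) + 0.1 * rng.standard_normal((500, t.size))
f0_mag = ffr_f0_strength(epochs)             # magnitude at 100 Hz
ctrl_mag = ffr_f0_strength(epochs, f0=317.0) # off-frequency control
```

A response dominated by a 100 Hz periodicity yields a far larger magnitude at the fundamental than at an off-frequency control bin.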

2.10. Statistical Analyses

All analyses were conducted in RStudio. First, the normality of the variables was screened. Outliers, defined as values more than 2.5 times the interquartile range above or below the mean, were removed. Three cases were excluded from the SIN task, five from the FDL task, and three from the FFR task. The FDL variable showed a non‐normal distribution. Missing values were replaced using predictive mean matching with m = 5 imputations (mice package [77]). This approach ensured that all participants were included in the analysis since mediation models require complete data and otherwise exclude cases with missing values.
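The outlier rule above (values more than 2.5 interquartile ranges above or below the mean) can be expressed as a small screening function. The data here are hypothetical, chosen only to show the rule flagging an extreme value:

```python
import numpy as np

def screen_outliers(x, k=2.5):
    """Return a boolean keep-mask: True for values within k * IQR of
    the mean, per the screening rule described above."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    center = x.mean()
    return (x >= center - k * iqr) & (x <= center + k * iqr)

scores = np.array(list(range(20)) + [100], dtype=float)
keep = screen_outliers(scores)  # flags the extreme value 100
```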

Pearson correlations were computed among all key variables to examine bivariate relationships. Stepwise multiple regression analyses were then conducted to identify the strongest predictors of Gold‐MSI and SIN performance. For Gold‐MSI, BAT, Digit Span, FDL, FFR strength, and SIN thresholds were entered as predictors. For SIN performance, BAT, Digit Span, FDL, and FFR strength were entered as predictors.
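As an illustration of the selection logic, here is a simplified forward-stepwise sketch using AIC as the entry criterion. This is a stand-in, not the study's R procedure (whose exact stepwise criteria are not specified here), and the predictor names are just labels for the four candidate measures:

```python
import numpy as np

def aic(X, y):
    """AIC of an OLS fit with intercept (X may have zero columns)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = float(((y - Xd @ beta) ** 2).sum())
    return n * np.log(rss / n) + 2 * (Xd.shape[1] + 1)

def forward_stepwise(X, y, names):
    """Greedily add the predictor that most lowers AIC; stop when no
    addition improves the model."""
    chosen, remaining = [], list(range(X.shape[1]))
    best = aic(X[:, []], y)  # intercept-only model
    while remaining:
        scores = {j: aic(X[:, chosen + [j]], y) for j in remaining}
        j = min(scores, key=scores.get)
        if scores[j] >= best:
            break
        best = scores[j]
        chosen.append(j)
        remaining.remove(j)
    return [names[j] for j in chosen]

# Synthetic data in which only the first and third predictors matter
rng = np.random.default_rng(2)
X = rng.normal(size=(120, 4))
y = 1.5 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.8, size=120)
selected = forward_stepwise(X, y, ["BAT", "DigitSpan", "FDL", "FFR"])
```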

The main analysis was a parallel multiple mediation model (Model 4 in the PROCESS macro for R) [78], in which general Gold‐MSI served as the predictor, and SRT served as the dependent variable. BAT, FDL, Digit Span, and FFR precision served as mediators. This model estimated the direct statistical association between musical sophistication and SIN performance, as well as the indirect statistical pathways through each mediator. Indirect effects were computed using 20,000 bias‐corrected bootstrap samples to generate 95% confidence intervals (BCa CI). Effects were considered significant if the confidence intervals did not include zero.

The total effect represents the overall association between musical sophistication and SIN performance without accounting for the mediators. The direct effect represents the association after controlling for the mediators. Specific indirect effects quantify how much each mediator contributes on its own to the relationship between musical sophistication and SIN performance. Each indirect pathway, or ab coefficient, is calculated as the product of path a (predictor → mediator) and path b (mediator → outcome). Summed together, these indirect effects reflect the combined impact of all mediators. We also ran models with fewer mediators and models testing each mediator separately (with the other variables entered as covariates). These models yielded the same pattern of results as the main analysis. Therefore, we present results from the main analysis.
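The product-of-paths logic described above can be sketched as follows. This is a simplified stand-in for PROCESS, assuming ordinary least squares for the two path models and a percentile (rather than bias-corrected) bootstrap, run on synthetic data:

```python
import numpy as np

def indirect_effect(x, m, y):
    """ab = a * b: path a (x -> m) times path b (m -> y, controlling x)."""
    a = np.polyfit(x, m, 1)[0]                    # slope of m ~ x
    X = np.column_stack([np.ones_like(x), x, m])  # design for y ~ x + m
    b = np.linalg.lstsq(X, y, rcond=None)[0][2]   # coefficient on m
    return a * b

def bootstrap_ci(x, m, y, n_boot=2000, seed=0):
    """Percentile bootstrap 95% CI for the indirect effect ab."""
    rng = np.random.default_rng(seed)
    n = len(x)
    ab = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample participants with replacement
        ab.append(indirect_effect(x[idx], m[idx], y[idx]))
    return np.percentile(ab, [2.5, 97.5])

# Synthetic data with a true indirect path: x raises m, m lowers y
rng = np.random.default_rng(1)
x = rng.normal(size=200)
m = 0.6 * x + rng.normal(scale=0.5, size=200)
y = -0.5 * m + rng.normal(scale=0.5, size=200)
lo, hi = bootstrap_ci(x, m, y)  # CI for ab (true value: -0.30)
```

A confidence interval excluding zero marks a significant indirect effect, mirroring the criterion stated above.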

3. Results

Descriptive statistics summarizing all measures are available in Table 1.

TABLE 1.

Descriptive statistics (mean, standard deviation, minimum, maximum) for all measures included in the study.

Measures                        Mean    SD      Min      Max
Gold‐MSI (/126)                 70      18      38       111
SRT (dB)                        −1.6    2.4     −8.2     5.8
Digit Span (/48 points)         27      3.5     19       37
BAT (%)                         79      11      56       100
FDL (Hz)                        17      16      2.8      63
FFR strength (f0 at 100 Hz)     0.05    0.022   0.0059   0.11

Note: Digit Span = working memory score from the Wechsler Adult Intelligence Scale. FFR strength = frequency‐following response amplitude at the fundamental frequency of 100 Hz.

Abbreviations: BAT, Beat Alignment Test (percentage correct); FDL, frequency discrimination limen (in Hz, lower values indicate better sensitivity); FFR, frequency‐following response; Gold‐MSI, Goldsmiths Musical Sophistication Index (score range: 18–126); SRT, speech reception threshold (lower values indicate better performance).

3.1. Relationships Among Measures

The correlation matrix between all measures is shown in Figure 1. Gold‐MSI scores were negatively correlated with SIN performance and FDL, and positively correlated with BAT and Digit Span, indicating that higher musical sophistication was associated with better auditory and cognitive abilities. No significant correlation was found with FFR strength. Subscale analyses of Digit Span revealed that SIN performance was significantly correlated with the Forward and Sequencing subtests, but not the Backward subtest (Figure S2). To characterize which measures uniquely accounted for variance in musical sophistication, a stepwise multiple regression was conducted with BAT, Digit Span, FDL, FFR strength, and SIN as predictors. The final model retained BAT, FDL, and SIN, explaining 36.8% of the variance in Gold‐MSI scores, F(3, 58) = 11.25, p < 0.001. Higher BAT scores (β = 0.47, p = 0.010), lower FDL thresholds (β = −0.30, p = 0.018), and lower SIN thresholds (i.e., SRT) (β = −2.17, p = 0.009) were associated with higher musical sophistication.

FIGURE 1.


Pairwise correlations among musical sophistication (Gold‐MSI), beat perception (BAT), working memory (Digit Span), pitch discrimination (FDL), speech‐in‐noise thresholds (SIN), and subcortical pitch encoding (FFR). Diagonal panels show variable distributions. Upper panels display Pearson correlation coefficients (r) with associated p‐values. Lower panels show scatterplots, with regression lines and 95% confidence intervals plotted only for significant correlations. Abbreviations: BAT, beat alignment test; FDL, frequency discrimination limens; FFR, frequency‐following response; Gold‐MSI, Goldsmiths Musical Sophistication Index; SIN, speech‐in‐noise.

SIN performance was negatively correlated with Gold‐MSI, BAT, and Digit Span. No significant correlation was found between SIN performance and FDL or FFR strength. SIN thresholds were entered as the outcome variable in a stepwise regression including BAT, Digit Span, FDL, and FFR strength as predictors. The final model retained Digit Span and BAT, explaining 23.9% of the variance in SIN performance, F(2, 59) = 9.27, p < 0.001. Better Digit Span performance (β = −0.26, p = 0.002) and higher BAT scores (β = −0.06, p = 0.017) were associated with lower (i.e., better) SIN thresholds.

3.2. Parallel Mediation Analysis

We examined whether variance shared between musical sophistication and SIN performance could be statistically accounted for by Digit Span, BAT, FDL, and FFR precision (Figure 2A). As expected based on the correlational analysis, the total effect showed that musical sophistication significantly predicted SIN performance, B = −0.058, SE = 0.016, p < 0.001, 95% CI [−0.091, −0.025], with R² = 0.177 and adjusted R² = 0.164, indicating that 17.7% of the variance in SIN thresholds was explained by musical sophistication alone.

FIGURE 2.


Panel A shows the results of a mediation model testing the contribution of auditory (FDL, BAT, FFR) and cognitive (Digit Span) factors to the relationship between musical sophistication (Gold‐MSI) and speech‐in‐noise perception (SIN). Standardized regression coefficients are shown for each path. Green paths indicate significant mediation effects. The total effect reflects the overall association between Gold‐MSI and SIN, including both direct and indirect influences through mediators. The direct effect reflects the remaining association once mediators are accounted for. The bottom panels show fitted regression planes for SIN as a function of Gold‐MSI with (B) Digit Span and (C) BAT. Points are observed data. The dark gray plane shows values predicted by the multiple regression. Abbreviations: BAT, beat alignment test; FDL, frequency discrimination limens; FFR, frequency‐following response; Gold‐MSI, Goldsmiths Musical Sophistication Index; SIN, speech‐in‐noise. *p < 0.05, **p < 0.01, ***p < 0.001.

When all four mediators were entered simultaneously, the direct effect of musical sophistication on SIN performance was reduced and was no longer statistically significant, B = −0.037, SE = 0.021, p = 0.069, 95% CI [−0.078, 0.003]. The full mediation model explained 30.8% of the variance in SIN performance (adjusted R² = 0.246). Within this model, Digit Span accounted for 42.9% of the explained variance, BAT for 14.3%, FDL for 13.9%, FFR precision for 1.0%, and the residual direct effect of musical sophistication for 17.9% of the explained variance.

Among the indirect pathways, the path through Digit Span was significant, ab = −0.016, SE = 0.007, 95% BCa CI [−0.034, −0.005]. Higher musical sophistication was related to higher Digit Span scores, which in turn were associated with better SIN thresholds (i.e., lower SRT) (Figure 2B). The pathway through BAT accuracy was also significant, ab = −0.013, SE = 0.008, 95% BCa CI [−0.033, −0.001], such that greater musical sophistication was associated with higher BAT accuracy, which in turn was linked with better SIN thresholds (Figure 2C). In contrast, neither FDL, ab = 0.010, SE = 0.007, 95% BCa CI [−0.000, 0.028], nor FFR precision, ab = −0.001, SE = 0.003, 95% BCa CI [−0.011, 0.003], showed significant indirect effects.
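The logic of bootstrapping an indirect effect ab can be illustrated with a minimal sketch. The paper used bias-corrected accelerated (BCa) intervals; the plain percentile version below is the simplest variant, and the variable names and effect sizes on the synthetic data are illustrative assumptions only.

```python
import numpy as np

def indirect_effect(x, m, y):
    """a*b for a simple mediation model: a is the slope of the
    mediator on the predictor; b is the mediator's coefficient
    in the outcome model controlling for the predictor."""
    a = np.polyfit(x, m, 1)[0]
    X = np.column_stack([np.ones(len(x)), x, m])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a * beta[2]

def boot_ci(x, m, y, n_boot=2000, seed=0):
    """Percentile bootstrap CI for the indirect effect: resample
    cases with replacement, recompute a*b, take 2.5/97.5 points."""
    rng = np.random.default_rng(seed)
    n = len(x)
    ab = np.array([indirect_effect(x[idx], m[idx], y[idx])
                   for idx in (rng.integers(0, n, n)
                               for _ in range(n_boot))])
    return np.percentile(ab, [2.5, 97.5])

# Synthetic mediation: sophistication -> working memory -> SRT,
# with a negative b path (higher memory, lower/better threshold).
rng = np.random.default_rng(7)
n = 62
soph = rng.normal(size=n)
wm = 0.7 * soph + 0.5 * rng.normal(size=n)    # a path
srt = -0.7 * wm + 0.5 * rng.normal(size=n)    # b path
lo, hi = boot_ci(soph, wm, srt)
print(lo, hi)
```

When the mediation is genuine, as in this synthetic example, the interval lies entirely below zero, which is the criterion used to declare the Digit Span and BAT pathways significant.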

Given that SIN performance was significantly correlated with the Forward and Sequencing subtests of Digit Span, but not the Backward subtest, additional simple mediation analyses were conducted to examine these subcomponents separately. Both the Forward, ab = −0.017, SE = 0.011, 95% BCa CI [−0.043, −0.000], and Sequencing subtests, ab = −0.014, SE = 0.008, 95% BCa CI [−0.033, −0.001], showed significant indirect effects between Gold‐MSI and SIN thresholds.

3.3. Analyses of Gold‐MSI Subscales

We conducted additional correlation analyses between each Gold‐MSI subscale and the auditory and cognitive measures to determine which aspects of musical sophistication contributed most strongly to the observed effects, with particular emphasis on SIN performance. Results are presented in Figure S3. Among the five subscales, Musical Training, Perceptual Abilities, and Singing Abilities were each significantly associated with better SIN thresholds. In contrast, the Active Engagement and Emotions subscales showed no significant relationships with SIN thresholds.

Consistent with the main mediation analysis, follow‐up mediation analyses revealed the same pattern of results at the subscale level (Figure S4). The associations between SIN thresholds and the Musical Training, Perceptual Abilities, and Singing Abilities subscales were statistically mediated by Digit Span and BAT. For the two remaining subscales, Active Engagement and Emotions, the direct effects on SIN thresholds were not significant. However, significant indirect effects were observed through BAT.

4. Discussion

The present study examined whether and how individual differences in musical sophistication relate to SIN perception in young adults with normal hearing, with potential predictors including working memory, frequency discrimination, beat perception, and FFR precision. Consistent with previous research, higher musical engagement was associated with better SIN performance, as well as with finer pitch discrimination, stronger working memory, and more accurate beat perception. However, when these predictors were modeled simultaneously, only auditory working memory and beat perception statistically accounted for the association between musical sophistication and SIN performance, whereas pitch discrimination and FFR precision did not. These findings suggest that music‐related advantages for SIN perception in young adults are best understood in relation to specific cognitive and temporal processing skills, rather than as a generalized enhancement of auditory ability.

4.1. Positive Relationship Between Gold‐MSI and SIN

Research on the musician advantage in perceiving SIN has yielded mixed results. While some studies report clear advantages for amateur and professional musicians [18, 32–42], others find only a minimal effect or none at all [43–45]. One possible reason for this variability is that the effect may depend on the demands of the listening task and the type of masker used. Indeed, a growing body of evidence suggests that musicians’ advantages are more likely to appear under difficult conditions, particularly when there is informational masking and listeners must isolate a target voice from other speakers. A recent meta‐analysis supports this idea, showing that the magnitude of the musician advantage is strongest in the most challenging listening conditions [47]. Simpler assessments such as QuickSIN often yield smaller or inconsistent differences, typically around 1 dB [36, 79]. Although QuickSIN includes babble from multiple speakers, it lacks features such as spatial separation and distinct vocal identities that would place greater demands on selective attention and working memory.

This distinction has been discussed in detail by Baskent and Gaudrain [32], who argued that small group differences may result from tasks that do not sufficiently challenge cognitive resources. They emphasized that informational masking (i.e., speech‐on‐speech), where the masker closely resembles the target voice, places greater demands on cognitive control. In contrast, energetic masking reflects more peripheral signal degradation. Since musical training is associated with enhanced cognitive abilities, musicians are expected to perform better under conditions involving informational masking. Supporting this, they found a musician advantage when the masker was a single competing talker. In a related study, Swaminathan et al. [40] compared musicians and nonmusicians in a task in which target sentences were masked by other sentences presented from different spatial locations. Musicians showed a significant advantage of approximately 6 dB in terms of SRT compared to nonmusicians. The authors further showed that this advantage was highly dependent on the characteristics of the masker. When the maskers were intelligible and spatially separated, musicians significantly outperformed nonmusicians. In contrast, when the maskers were unintelligible and, therefore, poor in informational content, the performance of the two groups was comparable. Altogether, these results highlight that the extent of the musician advantage is highly dependent on the degree of informational masking.

In our study, we used an adapted version of the spatialized two‐talker paradigm used by Swaminathan et al. [40]. As expected, higher musical sophistication was associated with better SIN performance. Musical sophistication was measured with the Gold‐MSI General Sophistication score (range: 18–126). The regression coefficient of the total effect of the mediation analysis (B = –0.058) indicates that each 1‐point increase on the Gold‐MSI corresponded to a 0.058 dB improvement in SRT, yielding an estimated 6.2 dB advantage across the full scale. This effect is also evident in the correlation plot, where individuals at the high and low ends of the scale differ by roughly 5 dB. These findings closely align with the 6 dB musician–nonmusician difference reported by Swaminathan et al.
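The scaling behind this estimate can be checked directly. The slight discrepancy with the reported 6.2 dB most likely reflects rounding of the published coefficient (−0.058 is itself a rounded value).

```python
B_total = -0.058         # total-effect slope: dB SRT change per Gold-MSI point
scale_range = 126 - 18   # Gold-MSI General Sophistication score range
advantage_db = abs(B_total) * scale_range
print(f"{advantage_db:.2f} dB")  # ~6.26 dB across the full scale
```

The same arithmetic applied to roughly half the scale reproduces the ~5 dB spread visible between low and high scorers in the correlation plot.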

Another potential reason for the inconsistencies in the literature may lie in how musical experience is defined. Many studies rely on binary classifications based on the number of years of formal training. This classification may mask significant variations in musical engagement. For example, individuals with limited formal training may nevertheless be involved in music through singing, informal practice, or frequent listening, while others with extensive training may no longer be actively involved. Therefore, training background alone may not capture key aspects of musical experience and may overlook “musical sleepers,” that is, individuals who do not meet conventional criteria but nevertheless exhibit enhanced auditory or cognitive abilities due to informal experience and long‐term exposure [44]. In addition, some individuals may possess innate auditory or cognitive abilities that promote both musical performance and listening, regardless of their training.

To address this limitation, we deliberately measured musical experience using the Gold‐MSI, which offers a more nuanced assessment by capturing various forms of musical engagement. Analyses of the individual subscales revealed a selective pattern, such that musical training, perceptual abilities, and singing abilities were associated with better SIN performance, whereas subscales indexing engagement or affective responses to music were not. This pattern suggests that the relationship between musical sophistication and SIN perception is more closely related to facets of musical experience that reflect auditory skills and performance abilities. Because the present study is cross‐sectional, these findings do not permit conclusions about causality or the direction of effects. Rather, they indicate that individual differences in SIN performance covary most strongly with behaviorally and perceptually grounded dimensions of musical sophistication, potentially reflecting a combination of training‐related influences and pre‐existing auditory or cognitive predispositions.

Consistent with this approach, Perron et al. [18] found that when musical experience was considered dimensionally, variables such as frequency and duration of practice, multilingual singing repertoire, and formal training predicted SIN ability, whereas total years of training did not. No difference in SIN accuracy was found when considering only group membership (i.e., musicians vs. nonmusicians). Similarly, Ruggles et al. [79] found no group differences in SIN performance. However, within the musician group, years of training positively correlated with performance, highlighting the added value of continuous training measures. Yates et al. [71] also reported that self‐reported musical training on the Gold‐MSI predicted sentence recognition in noise. Our results, therefore, extend this work by showing that specific aspects of musical sophistication can also account for significant individual differences in SIN ability. These results underscore the importance of measuring musical engagement on a continuum. Future research could use the Gold‐MSI to compare groups on specific subscales or to better match participants, thereby improving reproducibility across studies.

4.2. Working Memory as the Main Contributing Factor

In our data, working memory emerged as the strongest contributor to the relationship between musical sophistication and SIN perception. It was also retained as the best predictor in the stepwise regression model, and its association with SIN performance remained significant even after all other auditory and cognitive factors were taken into account.

Experimental studies have shown that both older and younger adults with greater working memory capacity are better able to recognize speech in difficult acoustic environments [80–82]. A systematic review of 20 studies by Akeroyd [61] concluded that working memory was the most consistent cognitive predictor of individual differences in SIN perception. A more recent meta‐analysis by Dryden et al. [83] confirmed a weak but significant overall correlation across studies (R = 0.28). Together, these findings identify working memory as a particularly reliable predictor of SIN performance.

To clarify which working memory operations support SIN perception, working memory was assessed using the Digit Span subtests of the WAIS, which index partially distinct processes. Digit Span Forward primarily reflects short‐term storage and attentional capacity, whereas Sequencing places additional demands on maintaining items while dynamically updating their serial order. Digit Span Backward, in contrast, emphasizes executive manipulation through mental reversal of order. We found that SIN perception was significantly associated with Forward and Sequencing performance, but not with Backward performance.

Functionally, SIN perception requires listeners to preserve degraded speech representations while continuously integrating incoming input as speech unfolds over time. Forward span likely reflects the capacity to hold speech fragments online long enough for subsequent processing, whereas Sequencing captures the ability to impose and update temporal structure as new information becomes available. Accordingly, cognitive operations that require reversing information are not directly engaged by the demands of SIN perception. This pattern aligns with theoretical models of speech perception under adverse conditions. The ELU (ease of language understanding) model [62, 84] proposes that comprehension is generally rapid and automatic, as incoming speech aligns directly with phonological representations stored in long‐term memory. In noisy or degraded conditions, this automatic matching often fails, creating inconsistencies that require listeners to engage in explicit processing pathways. These pathways rely heavily on working memory to maintain and update incoming speech information until a coherent interpretation can be formed. Reverse hierarchy theory [85] provides a complementary account by suggesting that when high‐level representations are insufficient, listeners must fall back on fine‐grained acoustic detail through top‐down control, which is also cognitively costly. Despite emphasizing different mechanisms, both frameworks converge on the idea that working memory supports SIN perception primarily through maintenance and organization of auditory information, consistent with the observed involvement of Forward and Sequencing spans.

Our findings also complement the literature establishing a link between musical experience and working memory. Musicians have been shown to outperform nonmusicians in a range of working memory tasks across different modalities [64], and longitudinal studies suggest that music training can strengthen working memory capacity [86–88]. A recent multilaboratory study involving 1200 participants (600 musicians and 600 nonmusicians) confirmed that musicians have a reliable advantage in short‐term memory, particularly for musical tasks, but also for verbal and visuospatial tasks [89]. As mentioned previously, masking studies also support these findings. Musicians’ advantages are more pronounced under informational masking, where competing signals resemble the target speech in language or structure. In contrast, the advantages are smaller or absent under energetic masking, which mainly involves acoustic interference and places little demand on higher‐level processes. This asymmetry suggests that music‐related advantages emerge most clearly when working memory and executive resources are taxed, providing evidence that cognitive mechanisms, rather than peripheral auditory enhancements alone, underpin the musician advantage in SIN perception.

Importantly, Zhang et al. [65] showed that working memory moderates the relationship between SIN perception and aging, suggesting a protective role for working memory throughout life. Although our study focused on young adults, our results demonstrate that working memory also mediates the relationship between musical sophistication and SIN performance in this cohort. Converging evidence for a mediating role of working memory has also been reported outside the speech domain. Using a nonlinguistic music‐in‐noise stream segregation task, a recent study showed that auditory working memory mediated the relationship between musical training and performance, with the effect replicated across two independent samples of young adults [90]. Although these studies address different populations and tasks, both point to working memory as a mechanism that shapes how experiential and lifespan‐related factors relate to auditory performance under challenging conditions.

4.3. Beat Perception as a Complementary Pathway

We also found that beat perception, as measured by the BAT, accounted for a modest yet statistically significant proportion of the variance in the association between Gold‐MSI and SRT. Furthermore, BAT performance partially mediated the relationship between general musical sophistication and SIN ability. The stepwise regression model revealed that, when both BAT scores and working memory capacity were entered into the model, each emerged as a unique predictor of SIN perception. Together, these results suggest that rhythm sensitivity and working memory support SIN processing via distinct pathways.

Previous research has linked musical rhythm sensitivity to a range of language and speech‐related outcomes. In adults, rhythm skills support temporal prediction and segmentation during speech perception [91, 92], while in children, they have additionally been associated with phonological awareness, grammar development, and reading fluency [93–95]. The role of rhythm in SIN perception, however, has received less attention. Slater and Kraus [96] reported that participants with stronger rhythm competence also tended to perform better on the QuickSIN test. They further observed that percussionists outperformed both singers and nonmusicians on the Words‐in‐Noise task. Relatedly, Yates et al. [71] showed that, in young adults, greater musical training and enhanced sensitivity to rhythm, beat, and melody were associated with better SIN performance. Notably, even after controlling for frequency discrimination and working memory, beat perception emerged as the strongest predictor. Our data extend this literature by demonstrating that BAT performance predicts SRT in a relatively large sample of young adults. Interestingly, our group has also recently found that working memory and beat perception jointly predict SIN performance in older adults with hearing loss who have little or no musical training [73].

One leading explanation for the speech–rhythm link is temporal prediction. Speech contains quasi‐regular rhythmic features, such as syllable timing and prosodic stress patterns, even though it lacks the strict periodicity of music. Neural oscillations in the delta (1–4 Hz) and theta (4–8 Hz) bands have been shown to entrain to these modulations, aligning the phase of ongoing activity with expected acoustic events [97–99]. This alignment is thought to improve the timing of attentional selection and encoding, particularly when speech is masked or degraded. Dynamic Attending Theory [100, 101] emphasizes this role of rhythm in structuring attentional fluctuations, suggesting that entrainment allows listeners to anticipate when important events will occur. In parallel, the Temporal Sampling Framework [102] proposes that individual differences in rhythm sensitivity modulate the precision of neural entrainment, thereby shaping speech parsing.

These mechanisms are further integrated within the PRISM model [103], which outlines how rhythmic expertise transfers to speech processing. According to PRISM, rhythm‐related benefits arise from three interacting components: enhanced auditory precision, entrainment‐based prediction, and sensorimotor coupling. First, music requires precise tracking of temporal features such as amplitude modulations and onset timing, which may generalize to improved perception of temporal cues in speech. Second, exposure to regular rhythmic structures in music may strengthen the brain's ability to entrain to the temporal dynamics of speech, thereby supporting segmentation and prediction. Third, rhythmic training enhances auditory–motor integration, facilitating predictive timing through top‐down modulation of auditory processing. Together, these mechanisms provide a coherent explanation for how beat perception supports stream segregation and target tracking in noisy environments. In addition to the role of working memory, our findings suggest that rhythm‐focused training is a promising method for strengthening temporal prediction and auditory–motor integration. This, in turn, could improve SIN perception. To our knowledge, this has not yet been examined using a longitudinal design.

4.4. Null Effect of Pitch Discrimination and FFR Precision

In contrast to working memory and beat perception, neither pitch discrimination nor FFR precision uniquely predicted SIN outcomes. Participants with higher musical sophistication did show finer frequency discrimination thresholds, replicating the well‐established link between musicianship and pitch acuity [3, 58, 59]. However, this sensory advantage did not translate into differences in SIN thresholds. In the literature, the evidence for pitch as a cue for enhanced SIN perception is mixed, with some studies finding relationships between frequency discrimination and SIN thresholds [36], while other studies do not find this relationship [43]. Fuller et al. [104] suggested that the influence of frequency discrimination on speech‐related tasks depends on the extent to which pitch cues are informative. In our paradigm, target and masker voices were closely matched in pitch and timbre (all female voices), substantially limiting the usefulness of spectral cues for stream segregation. Under these conditions, spectral cues may have been insufficient, forcing listeners to rely on other cues and mechanisms such as spatial separation, stream segregation, and executive control. Similarly, Baskent and Gaudrain [32] reported that musicians did not benefit more than nonmusicians from differences in fundamental frequency or vocal tract length between concurrent voices. The largest advantage for musicians occurred when the target and masker voices had small differences in their average vocal characteristics. This suggests that the benefit is unlikely to arise from static pitch cues. The authors argued that this advantage may reflect musicians’ enhanced ability to process rapid fluctuations in F0 contours or, more broadly, superior stream segregation and auditory–cognitive abilities. This is consistent with our findings.

In our models, FFR precision was neither a significant predictor of SIN performance nor associated with musical sophistication. Exploratory Bayesian analyses provided further support for these null effects (BF10 = 0.26 for SIN → FFR strength; BF10 = 0.34 for MSI → FFR strength), indicating that the observed data were more likely under the null hypothesis of no effect. This pattern aligns with a recent six‐site replication study [57], which also revealed no consistent musicianship advantage in FFR strength across samples. Cross‐sectional studies have produced variable results, with some reporting links between musicianship and FFR enhancement [56] and others finding no such association [105]. In contrast, longitudinal studies more consistently demonstrate that music training is associated with FFR enhancements. For instance, our group's studies have demonstrated significant increases in FFR strength after 12 weeks of choral singing in older adults with hearing loss [49] and in hearing aid users [51].

One plausible interpretation is that cross‐sectional and longitudinal models are sensitive to partially distinct aspects of auditory plasticity. Longitudinal designs likely benefit from within‐subject control and are well suited to detecting experience‐dependent changes over time, whereas cross‐sectional comparisons are inherently limited by large interindividual variability in auditory physiology, musical experience, and other unmeasured factors. This variability may limit the ability of cross‐sectional designs to resolve relationships between musical experience and auditory neural measures.

The interpretation of the null FFR findings is also constrained by the way the response was quantified in the present study. Here, FFR strength reflects the summed activity of multiple neural generators, including both subcortical and cortical sources [106]. If musical training selectively affects specific aspects of this response, such effects may not be apparent in a global measure of response strength. Consistent with this possibility, studies reporting musicianship‐related differences have often focused on alternative FFR metrics, including stimulus tracking accuracy, phase locking, or response timing. Future work that applies a broader analytic framework or explicitly models contributions from distinct neural generators may help clarify how musical training and sophistication influence auditory neural encoding.

4.5. Limitations

There are several limitations that should be noted. First, although mediation analysis is useful for examining relationships between variables, it cannot establish causality in a cross‐sectional design. Indirect effects are particularly susceptible to omitted‐variable bias and confounding by unmeasured variables. Our modest sample size exacerbates these concerns by increasing the risk of overfitting and parameter instability in complex models. To address these issues, we ran simplified models that included only one predictor as a mediator; the results were consistent with those of the full analyses, suggesting that the findings are not artifacts of model complexity. Nevertheless, stronger mechanistic conclusions will require longitudinal or training studies in which mediators can be manipulated or tracked over time. The present results align with prior research identifying working memory and beat processing as critical components of SIN perception, which bolsters confidence in the reliability of the observed pathways.

Second, although the Gold‐MSI is a validated measure of musical sophistication, it reflects a broad concept of musical sophistication that encompasses factors such as formal training, informal engagement, and enduring personal traits, including auditory sensitivity, attentional style, and motivation [66]. Consequently, our findings cannot determine the extent to which associations reflect the effects of training versus pre‐existing individual differences (i.e., the nature vs. nurture debate).

Third, our models did not consider other cognitive and linguistic abilities known to influence SIN perception, such as inhibitory control and attentional switching. These abilities may account for additional variance in the relationship between musical sophistication and SIN perception. Additionally, although we treated musical sophistication as a continuous measure, we did not consider participants’ primary instruments or how they engage with music. Previous studies have shown that instrumentalists and singers have different cognitive and neural profiles [107, 108]. Because our sample represented a continuum of engagement, such practice‐related differences may have been reduced or obscured. It is plausible that different musical profiles support SIN perception through partly distinct pathways.

Taken together, these limitations underscore the necessity of future studies that incorporate broader cognitive assessments and longitudinal designs. At the same time, the present findings provide evidence that domain‐general and music‐related skills work together to support SIN perception.

5. Conclusion

This study clarifies the mechanisms underlying the relationship between musical sophistication and SIN perception by modeling several auditory and cognitive abilities simultaneously. Although musical sophistication was associated with enhanced pitch discrimination, beat perception, and working memory, only the latter two were identified as reliable predictors of SIN performance. These results suggest that the music‐related advantage in SIN perception reflects specific cognitive and temporal abilities relevant to real‐world listening. The findings have important potential applications: rhythmic processing and working memory are promising, trainable targets for interventions designed to improve communication in challenging auditory environments, which could be particularly relevant for older adults, who often report difficulty understanding SIN. However, future longitudinal and intervention studies are required not only to determine the causal impact of music training on auditory and cognitive skills, but also to identify the most effective ways to translate these advantages into practice.

Author Contributions

Maxime Perron: Data curation, formal analysis, visualization, writing – original draft preparation. Emily Wood: Conceptualization, investigation, writing – review and editing. Frank Russo: Conceptualization, funding acquisition, methodology, project administration, resources, supervision, writing – review and editing.

Conflicts of Interest

The authors have declared no conflict of interest.

Supporting information

Supplementary Information: nyas70212‐sup‐0001‐SuppMat.docx


Acknowledgments

We thank all the participants for their contribution. This work was funded by a Discovery Grant (2017‐06969) and an Industrial‐Research Chair (IRC 537355‐18) from the Natural Sciences and Engineering Research Council (NSERC) of Canada awarded to F.A.R. M.P. was supported by a Postdoctoral Fellowship from the Canadian Institutes of Health Research (CIHR).

Perron M., Wood E. A., and Russo F. A., “Explaining the Musical Advantage in Speech Perception Through Beat Perception and Working Memory.” Annals of the New York Academy of Sciences 1556, no. 1 (2026): e70212. 10.1111/nyas.70212

References

  1. Herholz S. C. and Zatorre R. J., “Musical Training as a Framework for Brain Plasticity: Behavior, Function, and Structure,” Neuron 76 (2012): 486–502.
  2. Kraus N. and Chandrasekaran B., “Music Training for the Development of Auditory Skills,” Nature Reviews Neuroscience 11 (2010): 599–605.
  3. Micheyl C., Delhommeau K., Perrot X., et al., “Influence of Musical and Psychoacoustical Training on Pitch Discrimination,” Hearing Research 219 (2006): 36–47.
  4. Kishon‐Rabin L., Amir O., Vexler Y., et al., “Pitch Discrimination: Are Professional Musicians Better Than Non‐Musicians?,” Journal of Basic and Clinical Physiology and Pharmacology 12 (2001): 125–143.
  5. Rammsayer T. and Altenmüller E., “Temporal Information Processing in Musicians and Nonmusicians,” Music Perception 24 (2006): 37–48.
  6. Chartrand J.‐P. and Belin P., “Superior Voice Timbre Processing in Musicians,” Neuroscience Letters 405 (2006): 164–167.
  7. Samson S. and Zatorre R. J., “Contribution of the Right Temporal Lobe to Musical Timbre Discrimination,” Neuropsychologia 32 (1994): 231–240.
  8. Besson M., Chobert J., and Marie C., “Transfer of Training Between Music and Speech: Common Processing, Attention, and Memory,” Frontiers in Psychology 2 (2011): 94.
  9. Patel A. D., “Can Nonlinguistic Musical Training Change the Way the Brain Processes Speech? The Expanded OPERA Hypothesis,” Hearing Research 308 (2014): 98–108.
  10. Patel A. D., “The OPERA Hypothesis: Assumptions and Clarifications,” Annals of the New York Academy of Sciences 1252 (2012): 124–128.
  11. Patel A. D., “Why Would Musical Training Benefit the Neural Encoding of Speech? The OPERA Hypothesis,” Frontiers in Psychology 2 (2011): 142.
  12. Moreno S. and Bidelman G. M., “Examining Neural Plasticity and Cognitive Benefit Through the Unique Lens of Musical Training,” Hearing Research 308 (2014): 84–97.
  13. Peretz I., Vuvan D., Lagrois M. E., et al., “Neural Overlap in Processing Music and Speech,” Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 370 (2015): 20140090.
  14. Schön D., Gordon R., Campagne A., et al., “Similar Cerebral Networks in Language, Music and Song Perception,” Neuroimage 51 (2010): 450–461.
  15. te Rietmolen N., Mercier M. R., Trébuchon A., et al., “Speech and Music Recruit Frequency‐Specific Distributed and Overlapping Cortical Networks,” eLife 13 (2024): RP94509.
  16. Abrams D. A., Bhatara A., Ryali S., et al., “Decoding Temporal Structure in Music and Speech Relies on Shared Brain Resources But Elicits Different Fine‐Scale Spatial Patterns,” Cerebral Cortex 21 (2011): 1507–1518.
  17. Perron M., Theaud G., Descoteaux M., et al., “The Frontotemporal Organization of the Arcuate Fasciculus and Its Relationship With Speech Perception in Young and Older Amateur Singers and Non‐Singers,” Human Brain Mapping 42 (2021): 1–19.
  18. Perron M., Vaillancourt J., and Tremblay P., “Amateur Singing Benefits Speech Perception in Aging Under Certain Conditions of Practice: Behavioural and Neurobiological Mechanisms,” Brain Structure and Function 227 (2022): 943–962.
  19. Fleming D., Belleville S., Peretz I., et al., “The Effects of Short‐Term Musical Training on the Neural Processing of Speech‐In‐Noise in Older Adults,” Brain and Cognition 136 (2019): 103592.
  20. Zatorre R. J. and Baum S. R., “Musical Melody and Speech Intonation: Singing a Different Tune,” PLoS Biology 10 (2012): e1001372.
  21. Ruggles D., Bharadwaj H., and Shinn‐Cunningham B. G., “Normal Hearing Is Not Enough to Guarantee Robust Encoding of Suprathreshold Features Important in Everyday Communication,” Proceedings of the National Academy of Sciences 108 (2011): 15516–15521.
  22. Bharadwaj H. M., Masud S., Mehraei G., et al., “Individual Differences Reveal Correlates of Hidden Hearing Deficits,” Journal of Neuroscience 35 (2015): 2161–2172.
  • 23. Alain C. and Arnott S. R., “Selectively Attending to Auditory Objects,” Frontiers in Bioscience 5 (2000): D202–212. [DOI] [PubMed] [Google Scholar]
  • 24. Bregman A. S., Auditory Scene Analysis: The Perceptual Organization of Sound (MIT Press, 1990). [Google Scholar]
  • 25. Hake R., Bürgel M., Nguyen N. K., et al., “Development of an Adaptive Test of Musical Scene Analysis Abilities for Normal‐Hearing and Hearing‐Impaired Listeners,” Behavior Research Methods 56 (2024): 5456–5481. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Rimmele J. M., Zion Golumbic E., Schröger E., et al., “The Effects of Selective Attention and Speech Acoustics on Neural Speech‐Tracking in a Multi‐Talker Scene,” Cortex 68 (2015): 144–154. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Perron M., Ross B., and Alain C., “Left Motor Cortex Contributes to Auditory Phonological Discrimination,” Cerebral Cortex 34 (2024): bhae369. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Du Y., Buchsbaum B. R., Grady C. L., et al., “Noise Differentially Impacts Phoneme Representations in the Auditory and Speech Motor Systems,” Proceedings of the National Academy of Sciences of the United States of America 111 (2014): 7126–7131. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Kim S., Choi I., Schwalje A. T., et al., “Auditory Working Memory Explains Variance in Speech Recognition in Older Listeners Under Adverse Listening Conditions,” Clinical Interventions in Aging 15 (2020): 395–406. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Obleser J. and Kotz S. A., “Expectancy Constraints in Degraded Speech Modulate the Language Comprehension Network,” Cerebral Cortex 20 (2010): 633–640. [DOI] [PubMed] [Google Scholar]
  • 31. Salimpoor V. N., Zald D. H., Zatorre R. J., et al., “Predictions and the Brain: How Musical Sounds Become Rewarding,” Trends in Cognitive Sciences 19 (2015): 86–91. [DOI] [PubMed] [Google Scholar]
  • 32. Baskent D. and Gaudrain E., “Musician Advantage for Speech‐on‐Speech Perception,” Journal of the Acoustical Society of America 139 (2016): EL51–56. [DOI] [PubMed] [Google Scholar]
  • 33. Du Y. and Zatorre R. J., “Musical Training Sharpens and Bonds Ears and Tongue to Hear Speech Better,” Proceedings of the National Academy of Sciences of the United States of America 114 (2017): 13579–13584. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Parbery‐Clark A., Anderson S., Hittner E., et al., “Musical Experience Offsets Age‐Related Delays in Neural Timing,” Neurobiology of Aging 33, (2012): 1483 e1481–1484. [DOI] [PubMed] [Google Scholar]
  • 35. Parbery‐Clark A., Skoe E., and Kraus N., “Musical Experience Limits the Degradative Effects of Background Noise on the Neural Processing of Sound,” Journal of Neuroscience 29 (2009): 14100–14107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Parbery‐Clark A., Skoe E., Lam C., et al., “Musician Enhancement for Speech‐In‐Noise,” Ear and Hearing 30 (2009): 653–661. [DOI] [PubMed] [Google Scholar]
  • 37. Parbery‐Clark A., Strait D. L., Anderson S., et al., “Musical Experience and the Aging Auditory System: Implications for Cognitive Abilities and Hearing Speech in Noise,” PLoS ONE 6 (2011): e18082. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Zendel B. R. and Alain C., “Musicians Experience Less Age‐Related Decline in Central Auditory Processing,” Psychology and Aging 27 (2012): 410–417. [DOI] [PubMed] [Google Scholar]
  • 39. Zendel B. R., West G. L., Belleville S., et al., “Musical Training Improves the Ability to Understand Speech‐In‐Noise in Older Adults,” Neurobiology of Aging 81 (2019): 102–115. [DOI] [PubMed] [Google Scholar]
  • 40. Swaminathan J., Mason C. R., Streeter T. M., et al., “Musical Training, Individual Differences and the Cocktail Party Problem,” Scientific Reports 5 (2015): 11628. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Slater J. and Kraus N., “The Role of Rhythm in Perceiving Speech in Noise: A Comparison of Percussionists, Vocalists and Non‐Musicians,” Cognitive Processing 17 (2016): 79–87. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Zendel B. R., Tremblay C.‐D., Belleville S., et al., “The Impact of Musicianship on the Cortical Mechanisms Related to Separating Speech From Background Noise,” Journal of Cognitive Neuroscience 27 (2015): 1044–1059. [DOI] [PubMed] [Google Scholar]
  • 43. Boebinger D., Evans S., Rosen S., et al., “Musicians and Non‐Musicians Are Equally Adept at Perceiving Masked Speech,” Journal of the Acoustical Society of America 137 (2015): 378–387. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Mankel K. and Bidelman G. M., “Inherent Auditory Skills Rather Than Formal Music Training Shape the Neural Encoding of Speech,” Proceedings of the National Academy of Sciences of the United States of America 115 (2018): 13129–13134. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Madsen S. M. K., Marschall M., Dau T., et al., “Speech Perception Is Similar for Musicians and Non‐Musicians Across a Wide Range of Conditions,” Scientific Reports 9 (2019): 10404. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Coffey E. B. J., Mogilever N. B., and Zatorre R. J., “Speech‐In‐Noise Perception in Musicians: A Review,” Hearing Research 352 (2017): 49–69. [DOI] [PubMed] [Google Scholar]
  • 47. Maillard E., Joyal M., Murray M. M., et al., “Are Musical Activities Associated With Enhanced Speech Perception in Noise in Adults? A Systematic Review and Meta‐Analysis,” Current Neurology and Neuroscience Reports 4 (2023): 100083. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Hennessy S., Mack W. J., and Habibi A., “Speech‐In‐Noise Perception in Musicians and Non‐Musicians: A Multi‐Level Meta‐Analysis,” Hearing Research 416 (2022): 108442. [DOI] [PubMed] [Google Scholar]
  • 49. Dubinsky E., Wood E. A., Nespoli G., et al., “Short‐Term Choir Singing Supports Speech‐In‐Noise Perception and Neural Pitch Strength in Older Adults With Age‐Related Hearing Loss,” Frontiers in Neuroscience 13 (2019): 1153. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Fuller C. D., Galvin J. J., Maat B., et al., “Comparison of Two Music Training Approaches on Music and Speech Perception in Cochlear Implant Users,” Trends in Hearing 22 (2018): 2331216518765379. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Lo C. Y., Dubinsky E., Gilmore S. A., et al., “Choir Singing and Music Appreciation Training Enhances Unaided Speech‐In‐Noise Perception and Frequency Following Responses for Older Adult Hearing Aid Users: A Randomized Controlled Trial,” Seminars in Hearing 46 (2025): 125–140. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Krizman J. and Kraus N., “Analyzing the FFR: A Tutorial for Decoding the Richness of Auditory Function,” Hearing Research 382 (2019): 107779. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Skoe E. and Kraus N., “Auditory Brain Stem Response to Complex Sounds: A Tutorial,” Ear and Hearing 31 (2010): 302–324. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Bidelman G. M. and Alain C., “Musical Training Orchestrates Coordinated Neuroplasticity in Auditory Brainstem and Cortex to Counteract Age‐Related Declines in Categorical Vowel Perception,” Journal of Neuroscience 35 (2015): 1240–1249. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55. Bidelman G. M. and Krishnan A., “Effects of Reverberation on Brainstem Representation of Speech in Musicians and Non‐Musicians,” Brain Research 1355 (2010): 112–125. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Musacchia G., Sams M., Skoe E., et al., “Musicians Have Enhanced Subcortical Auditory and Audiovisual Processing of Speech and Music,” Proceedings of the National Academy of Sciences of the United States of America 104 (2007): 15894–15898. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57. Whiteford K. L., Baltzell L. S., Chiu M., et al., “Large‐Scale Multi‐Site Study Shows No Association Between Musical Training and Early Auditory Neural Sound Encoding,” Nature Communications 16 (2025): 7152. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58. Bidelman G. M., Gandour J. T., and Krishnan A., “Musicians and Tone‐Language Speakers Share Enhanced Brainstem Encoding But Not Perceptual Benefits for Musical Pitch,” Brain and Cognition 77 (2011): 1–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59. Hsieh I.‐H. and Guo Y.‐J., “No Musician Advantage in the Perception of Degraded–Fundamental Frequency Speech in Noisy Environments,” Journal of Speech, Language, and Hearing Research 66 (2023): 2643–2655. [DOI] [PubMed] [Google Scholar]
  • 60. Smith M. R., Cutler A., Butterfield S., et al., “The Perception of Rhythm and Word Boundaries in Noise‐Masked Speech,” Journal of Speech and Hearing Research 32 (1989): 912–920. [DOI] [PubMed] [Google Scholar]
  • 61. Akeroyd M. A., “Are Individual Differences in Speech Reception Related to Individual Differences in Cognitive Ability? A Survey of Twenty Experimental Studies With Normal and Hearing‐Impaired Adults,” International Journal of Audiology 47: no. Suppl 2 (2008): S53–71. [DOI] [PubMed] [Google Scholar]
  • 62. Rönnberg J., Lunner T., Zekveld A., et al., “The Ease of Language Understanding (ELU) Model: Theoretical, Empirical, and Clinical Advances,” Frontiers in Systems Neuroscience 7 (2013): 31. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63. Kraus N., Strait D. L., and Parbery‐Clark A., “Cognitive Factors Shape Brain Networks for Auditory Skills: Spotlight on Auditory Working Memory,” Annals of the New York Academy of Sciences 1252 (2012): 100–107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64. Talamini F., Altoè G., Carretti B., et al., “Musicians Have Better Memory Than Nonmusicians: A Meta‐Analysis,” PLoS ONE 12 (2017): e0186773. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65. Zhang L., Fu X., Luo D., et al., “Musical Experience Offsets Age‐Related Decline in Understanding Speech‐In‐Noise: Type of Training Does Not Matter, Working Memory Is the Key,” Ear and Hearing 42 (2021): 258–270. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66. Müllensiefen D., Gingras B., Musil J., et al., “The Musicality of Non‐Musicians: An Index for Assessing Musical Sophistication in the General Population,” PLoS ONE 9 (2014): e89642. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67. Whitton S. A. and Jiang F., “Sensorimotor Synchronization With Visual, Auditory, and Tactile Modalities,” Psychological Research 87 (2023): 2204–2217. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68. Ai M., Loui P., Morris T. P., et al., “Musical Experience Relates to Insula‐Based Functional Connectivity in Older Adults,” Brain Sciences 12 (2022): 1157. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Chaddock‐Heyman L., Loui P., Weng T. B., et al., “Musical Training and Brain Volume in Older Adults,” Brain Sciences 11 (2021): 50. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70. Mehrabinejad M. M., Rafei P., Sanjari Moghaddam H., et al., “Sex Differences Are Reflected in Microstructural White Matter Alterations of Musical Sophistication: A Diffusion MRI Study,” Frontiers in Neuroscience 15 (2021): 622053. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71. Yates K. M., Moore D. R., Amitay S., et al., “Sensitivity to Melody, Rhythm, and Beat in Supporting Speech‐In‐Noise Perception in Young Adults,” Ear and Hearing 40 (2019): 358–367. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72. Humes L. E., “The World Health Organization's Hearing‐Impairment Grading System: An Evaluation for Unaided Communication in Age‐Related Hearing Loss,” International Journal of Audiology 58 (2019): 12–20. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73. Lo C. Y., Dubinsky E., Wright‐Whyte K., et al., “On‐Beat Rhythm and Working Memory Are Associated With Better Speech‐In‐Noise Perception for Older Adults With Hearing Loss,” Quarterly Journal of Experimental Psychology (Hove) (2025): 17470218241311204. [DOI] [PubMed] [Google Scholar]
  • 74. Amitay S., Irwin A., Hawkey D. J., et al., “A Comparison of Adaptive Procedures for Rapid and Reliable Threshold Assessment and Training in Naive Listeners,” Journal of the Acoustical Society of America 119 (2006): 1616–1625. [DOI] [PubMed] [Google Scholar]
  • 75. Iversen J. and Patel A., “The Beat Alignment Test (BAT): Surveying Beat Processing Abilities in the General Population,” in Proceedings of the 10th International Conference on Music Perception and Cognition (ICMPC10) (2008).
  • 76. Wechsler D., WISC‐V: Technical and Interpretive Manual (NCS Pearson, Incorporated, 2014). [Google Scholar]
  • 77. van Buuren S. and Groothuis‐Oudshoorn K., “mice: Multivariate Imputation by Chained Equations in R,” Journal of Statistical Software 45 (2011): 1–67. [Google Scholar]
  • 78. Hayes A. F., Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression‐Based Approach (Guilford Press, 2013). [Google Scholar]
  • 79. Ruggles D. R., Freyman R. L., and Oxenham A. J., “Influence of Musical Training on Understanding Voiced and Whispered Speech in Noise,” PLoS ONE 9 (2014): e86980. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80. Gordon‐Salant S. and Cole S. S., “Effects of Age and Working Memory Capacity on Speech Recognition Performance in Noise Among Listeners With Normal Hearing,” Ear and Hearing 37 (2016): 593–602. [DOI] [PubMed] [Google Scholar]
  • 81. Shokuhifar G., Javanbakht M., Vahedi M., et al., “The Relationship Between Speech in Noise Perception and Auditory Working Memory Capacity in Monolingual and Bilingual Adults,” International Journal of Audiology 64 (2025): 131–138. [DOI] [PubMed] [Google Scholar]
  • 82. Stenbäck V., Marsja E., Hällgren M., et al., “The Contribution of Age, Working Memory Capacity, and Inhibitory Control on Speech Recognition in Noise in Young and Older Adult Listeners,” Journal of Speech, Language, and Hearing Research 64 (2021): 4513–4523. [DOI] [PubMed] [Google Scholar]
  • 83. Dryden A., Allen H. A., Henshaw H., et al., “The Association Between Cognitive Performance and Speech‐In‐Noise Perception for Adult Listeners: A Systematic Literature Review and Meta‐Analysis,” Trends in Hearing 21 (2017): 2331216517744675. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84. Rönnberg J., Rudner M., Foo C., et al., “Cognition Counts: A Working Memory System for Ease of Language Understanding (ELU),” International Journal of Audiology 47, no. Suppl 2 (2008): S99–105. [DOI] [PubMed] [Google Scholar]
  • 85. Nahum M., Nelken I., and Ahissar M., “Low‐Level Information and High‐Level Perception: The Case of Speech in Noise,” PLoS Biology 6 (2008): e126. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86. Bugos J. A., Perlstein W. M., McCrae C. S., et al., “Individualized Piano Instruction Enhances Executive Functioning and Working Memory in Older Adults,” Aging and Mental Health 11 (2007): 464–471. [DOI] [PubMed] [Google Scholar]
  • 87. Marie D., Müller C. A. H., Altenmüller E., et al., “Music Interventions in 132 Healthy Older Adults Enhance Cerebellar Grey Matter and Auditory Working Memory, Despite General Brain Atrophy,” Neuroimage: Reports 3 (2023): 100166. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88. Wang X., Soshi T., Yamashita M., et al., “Effects of a 10‐Week Musical Instrument Training on Cognitive Function in Healthy Older Adults: Implications for Desirable Tests and Period of Training,” Frontiers in Aging Neuroscience 15 (2023): 1180259. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89. Grassi M., Talamini F., Altoè G., et al., “Do Musicians Have Better Short‐Term Memory Than Nonmusicians? A Multilab Study,” Advances in Methods and Practices in Psychological Science 8 (2025): 25152459251379432. [Google Scholar]
  • 90. Liu M., Arseneau‐Bruneau I., Franch M. F., et al., “Auditory Working Memory Mechanisms Mediating the Relationship Between Musicianship and Auditory Stream Segregation,” Frontiers in Psychology 16 (2025): 1538511. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91. Beier E. J. and Ferreira F., “The Temporal Prediction of Stress in Speech and Its Relation to Musical Beat Perception,” Frontiers in Psychology 9 (2018): 431. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92. Cutler A. and Butterfield S., “Rhythmic Cues to Speech Segmentation: Evidence From Juncture Misperception,” Journal of Memory and Language 31 (1992): 218–236. [Google Scholar]
  • 93. Ozernov‐Palchik O. and Patel A. D., “Musical Rhythm and Reading Development: Does Beat Processing Matter?,” Annals of the New York Academy of Sciences 1423 (2018): 166–175. [DOI] [PubMed] [Google Scholar]
  • 94. Rimmer C., Dahary H., and Quintin E. M., “Links Between Musical Beat Perception and Phonological Skills for Autistic Children,” Child Neuropsychology 30 (2024): 361–380. [DOI] [PubMed] [Google Scholar]
  • 95. Nitin R., Gustavson D. E., Aaron A. S., et al., “Exploring Individual Differences in Musical Rhythm and Grammar Skills in School‐Aged Children With Typically Developing Language,” Scientific Reports 13 (2023): 2201. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96. Slater J. and Kraus N., “The Role of Rhythm in Perceiving Speech in Noise: A Comparison of Percussionists, Vocalists and Non‐Musicians,” Cognitive Processing 17 (2016): 79–87. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97. Giraud A. L. and Poeppel D., “Cortical Oscillations and Speech Processing: Emerging Computational Principles and Operations,” Nature Neuroscience 15 (2012): 511–517. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98. Arnal L. H. and Giraud A.‐L., “Cortical Oscillations and Sensory Predictions,” Trends in Cognitive Sciences 16 (2012): 390–398. [DOI] [PubMed] [Google Scholar]
  • 99. Lakatos P., Karmos G., Mehta A. D., et al., “Entrainment of Neuronal Oscillations as a Mechanism of Attentional Selection,” Science 320 (2008): 110–113. [DOI] [PubMed] [Google Scholar]
  • 100. Jones M. R. and Boltz M., “Dynamic Attending and Responses to Time,” Psychological Review 96 (1989): 459–491. [DOI] [PubMed] [Google Scholar]
  • 101. Large E. W. and Jones M. R., “The Dynamics of Attending: How People Track Time‐Varying Events,” Psychological Review 106 (1999): 119–159. [Google Scholar]
  • 102. Goswami U., “A Temporal Sampling Framework for Developmental Dyslexia,” Trends in Cognitive Sciences 15 (2011): 3–10. [DOI] [PubMed] [Google Scholar]
  • 103. Fiveash A., Bedoin N., Gordon R. L., et al., “Processing Rhythm in Speech and Music: Shared Mechanisms and Implications for Developmental Speech and Language Disorders,” Neuropsychology 35 (2021): 771–791. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104. Fuller C. D., Galvin J. J. 3rd, Maat B., et al., “The Musician Effect: Does It Persist Under Degraded Pitch Conditions of Cochlear Implant Simulations?,” Frontiers in Neuroscience 8 (2014): 179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105. Riegel J., Schüller A., and Reichenbach T., “No Evidence of Musical Training Influencing the Cortical Contribution to the Speech‐Frequency‐Following Response and Its Modulation Through Selective Attention,” eNeuro 11, (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106. Coffey E. B. J., Herholz S. C., Chepesiuk A. M. P., et al., “Cortical Contributions to the Auditory Frequency‐Following Response Revealed by MEG,” Nature Communications 7 (2016): 11070. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107. Joyal M., Sicard A., Penhune V., et al., “Attention, Working Memory, and Inhibitory Control in Aging: Comparing Amateur Singers, Instrumentalists, and Active Controls,” Annals of the New York Academy of Sciences 1541 (2024): 163–180. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108. Halwani G. F., Loui P., Rueber T., et al., “Effects of Practice and Experience on the Arcuate Fasciculus: Comparing Singers, Instrumentalists, and Non‐Musicians,” Frontiers in Psychology 2 (2011): 156. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Information: nyas70212‐sup‐0001‐SuppMat.docx

