Abstract
Purpose:
Perceptual learning—the ability to improve understanding of degraded speech with exposure—has been shown to occur across languages when speech is masked by stationary noise. However, it is unknown whether this holds for more complex maskers like babble noise, which introduces both energetic and informational masking. Compared to stationary noise, babble has greater rhythmic complexity, which may interfere with listeners' ability to use rhythmic cues for perceptual learning, particularly for stress-timed languages like English. In a previous study, rhythm perception predicted perceptual learning for English-speaking but not Spanish-speaking listeners, suggesting that the role of rhythm in perceptual learning may be language specific; whether this relationship holds in the presence of rhythmically complex babble remains an open question.
Method:
Native English-speaking and Spanish-speaking listeners (N = 77) completed a perceptual learning paradigm (pretest, training, posttest) with low-predictability phrases masked by four-talker babble noise, followed by a rhythm perception task.
Results:
Listeners in both groups showed significant perceptual learning, but Spanish-speaking listeners demonstrated greater intelligibility improvement. Rhythm perception did not predict learning in either group, suggesting that babble noise disrupts rhythm-based perceptual learning strategies.
Conclusion:
These findings highlight that perceptual learning in babble is influenced by language-specific rhythmic properties and that rhythm perception may play a reduced role in more complex masking conditions.
Spoken communication rarely occurs in quiet conditions; listeners must often extract target speech in adverse listening environments such as background noise. Different types of background noise, however, vary in complexity. For example, stationary noise (e.g., white noise) provides a relatively steady masker, whereas babble noise (e.g., competing talkers) contains fluctuating energy and linguistic content, introducing both energetic and informational masking. These differences can affect listeners' ability to successfully extract the target speech signal.
One way listeners cope with degraded input like speech masked by various types of noise is through perceptual learning—the ability to improve understanding of degraded speech following exposure. Recently, Tetzloff et al. (2025) showed that perceptual learning was evident in both English- and Spanish-speaking listeners, indicating cross-linguistic generality across languages with different rhythmic classifications (stress- vs. syllable-timed), at least for these two specific languages. In that study, both native English-speaking and native Spanish-speaking listeners were able to perceptually adapt to speech in their respective native language that was masked by a speech-shaped noise derived from stationary white noise; that is, they exhibited intelligibility improvement following perceptual training.
Although these findings demonstrate that perceptual learning can occur across languages under stationary noise conditions, it is less clear whether this cross-linguistic generality extends to more ecologically complex forms of masking. As previously mentioned, different types of noise have different acoustic properties. For example, babble noise, compared to stationary noise, has complex temporal properties that may interfere more with the target speech signal (Sperry et al., 1997). At the same time, babble noise may also afford listeners opportunities for glimpsing, or listening during the dips in the noise, to facilitate perceptual learning (Rosen et al., 2013). Evaluating perceptual learning of speech in babble noise therefore asks whether the masker's rhythmic structure disrupts perceptual learning or whether listeners compensate by exploiting glimpses. By testing English and Spanish in babble noise, relative to prior stationary noise findings, we further ask whether language-specific rhythmic properties modulate perceptual learning across different types of noise maskers.
The present study specifically examined differences in perceptual learning of speech in babble noise in English versus Spanish, as language structure and rhythmic properties may shape how listeners are able to adapt. Stress-timed languages such as English are characterized by unequal syllable durations: Stressed syllables occur at relatively regular intervals, but the number of unstressed syllables between stressed syllables can vary greatly, leading to some syllables being lengthened (stressed) and others being reduced (unstressed). This creates high rhythmic variability, with fluctuations in syllable duration and prominence across an utterance. English-speaking listeners, therefore, make extensive use of rhythmic and prosodic cues, such as stress placement and vowel reduction, to locate word boundaries and segment the speech stream (Cutler & Butterfield, 1992). In contrast, syllable-timed languages such as Spanish exhibit more uniform syllable duration, with each syllable produced at a relatively consistent rate regardless of stress. This results in a more predictable and regular rhythm, where listeners rely less on variable prosodic cues for lexical segmentation (Cutler & Butterfield, 1992; Cutler et al., 1983, 1986).
From a theoretical perspective, rhythm may contribute to speech-in-noise perception at multiple levels: as temporal predictability, by entraining attention to expected moments in the signal, and as a decoding aid, by guiding lexical segmentation strategies (Cutler & Butterfield, 1992; Cutler & Norris, 1988). Because English-speaking listeners rely more heavily on rhythm for speech perception, they may be particularly vulnerable when babble noise masks those rhythmic cues. Specifically, babble introduces competing rhythmic fluctuations that may obscure the stress patterns of the target signal, thereby interfering with perceptual learning for English-speaking listeners. Spanish-speaking listeners, by contrast, may be less disrupted by babble masking, as their segmentation strategies are less dependent on rhythmic variability. Although Spanish rhythm is more predictable, this predictability reduces its informational value: Uniform syllabic timing provides fewer contrasts in prominence, making rhythm a less diagnostic cue for segmentation; Spanish-speaking listeners thus rely more on alternative cues (e.g., phonotactics), and therefore rhythm may play a smaller role in perceptual learning compared to English. The current study tests this hypothesis that English-speaking listeners will adapt less effectively to English target speech in babble noise than Spanish-speaking listeners will to Spanish target speech. By testing perceptual learning in babble noise rather than stationary noise, this study further explores the role of linguistic structure and rhythm perception in perceptual learning. We note that, in this study, the term (perceptual) learning refers to short-term changes or improvements in intelligibility that occur following exposure to the degraded speech.
Rhythm perception, or the ability to discriminate between similar rhythmic patterns, has been linked to better (i.e., more accurate) speech perception in noise in tasks involving English speakers (Slater & Kraus, 2016; Yates et al., 2019). It has also been associated with better perceptual learning, as individuals with stronger rhythm perception skills tend to show greater improvements in adverse listening conditions. For example, Borrie et al. (2017, 2018) found that English-speaking listeners with better rhythm perception abilities benefited more from a perceptual learning task, showing greater magnitude of intelligibility improvements following familiarization with neurologically degraded speech (dysarthria). Tetzloff et al. (2025) extended this line of research by demonstrating that rhythm perception abilities predicted the magnitude of intelligibility improvements of speech in noise for English-speaking listeners; however, this relationship was not present for Spanish-speaking listeners. The authors speculated that this difference arose from language-specific rhythmic properties: Because Spanish is syllable timed and has more predictable rhythmic cues, rhythm may be less informative for perceptual learning in Spanish compared to English. However, Tetzloff et al. (2025) used stationary Gaussian speech-shaped noise as the masker. Since babble noise is more rhythmically complex, it is unclear whether the relationship between rhythm perception and perceptual learning benefits will persist under these conditions.
Accordingly, our first research question asked whether perceptual learning is evident in both English- and Spanish-speaking listeners when speech is masked by babble noise. Second, if rhythmic interference from babble noise disrupts the use of rhythmic cues in the perceptual learning process, the relationship between rhythm perception and perceptual learning may be diminished. As such, our ancillary research question examined whether rhythm perception abilities predict the magnitude of perceptual learning of speech in babble noise.
Method
Speech Stimuli
The target speech for the pretest and posttest (intelligibility testing stimuli) consisted of 80 two-word phrases that had low interword predictability based on linguistic judgments in both English (e.g., “whisper galaxy,” “nostalgic tomato”) and Spanish (“burro mortal” [mortal donkey], “cliente taza” [client cup]), which were previously used in Tetzloff et al. (2025). These phrases were chosen to have relatively low semantic predictability and ranged from four to six syllables. As in the Tetzloff et al. (2025) study, the training stimuli came from the “Caterpillar” passage (Patel et al., 2013) in English, which was also translated into Spanish. The stimuli were recorded in both languages by a 33-year-old male native bilingual speaker of English and Spanish.
The target speech stimuli were equated to within ±1 dB based on the total root-mean-square amplitude. An English four-talker babble noise (Auditec) was mixed with the English speech signal at a +5 dB signal-to-noise ratio (SNR), with ~1.5–2 s of babble padding before and after each target stimulus. An equivalent Spanish four-talker babble noise (Auditec) was mixed with the Spanish speech signal at the same +5 dB SNR. This SNR was chosen based on pilot data indicating that baseline scores would be approximately 50% correct, thus allowing room for improvement posttraining. A single long babble track was used as the noise source, and each stimulus was thus mixed with a different excerpt from that track.
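As an illustrative sketch of the mixing procedure described above (not the study's actual processing script), a target phrase can be combined with a babble excerpt at a target SNR with babble-only padding as follows; the function name, default padding, and sampling rate are assumptions:

```python
import numpy as np

def mix_at_snr(speech, babble, snr_db=5.0, pad_s=1.5, fs=44100):
    """Mix a target phrase with a babble excerpt at a target SNR,
    with babble-only padding before and after the target.
    Illustrative sketch; pad_s and fs are assumed values."""
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    pad = int(pad_s * fs)
    need = len(speech) + 2 * pad
    # Draw a fresh excerpt from one long babble track per stimulus
    start = np.random.randint(0, len(babble) - need)
    noise = babble[start:start + need].astype(float)
    # Scale babble so speech RMS exceeds babble RMS by snr_db decibels
    noise *= rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    mixed = noise.copy()
    mixed[pad:pad + len(speech)] += speech
    return mixed
```

The same routine would be applied per stimulus in each language, drawing a different excerpt from the single babble track each time.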
Procedure
This study was approved by the institutional review board at Utah State University (Protocol 14741), and all participants gave their informed consent before participating in the study. The tasks were hosted on Gorilla (gorilla.sc) and administered through Prolific (prolific.com).
Comprehensive details of the procedure can be found in Tetzloff et al. (2025); a brief description is provided here. After completing a short demographic survey, participants were instructed to wear headphones for the remainder of the experiment, although no formal headphone check was implemented; prior to beginning the listening tasks, participants were able to adjust the volume to a comfortable level. They then completed a perceptual learning task consisting of pretest, training, and posttest phases. In the pretest, participants heard 20 two-word testing phrases in their native language masked by the babble noise. They were told that the speech would be masked by noise and thus might be hard to understand but that they should try their best to transcribe what they thought the speaker was saying. For each trial, participants clicked to initiate playback, and the stimulus could be played only once. The next trial became available as soon as participants submitted their previous response; the task did not advance automatically, and no fixed delay was imposed. Following the pretest intelligibility task, listeners completed the training phase, in which they heard the same speaker, masked by the same noise, reading the "Caterpillar" passage accompanied by subtitles. They were asked to listen carefully to the speaker and use the subtitles to help them understand what was being said; we manually verified that all participants played the training audio files rather than skipping through this important step. Finally, they completed the posttest, in which they transcribed 60 novel testing phrases produced by the same speaker and masked by the same noise.
The stimulus presentation order was randomized across listeners, meaning that different listeners did not necessarily receive the same phrases at pretest and posttest (although no phrase was ever repeated across phases), and listeners only heard speech in their respective native language. After the perceptual learning task was completed, participants could take a break if needed; they then completed the Rhythm Subtest of the Musical Ear Test (Wallentin et al., 2010). This task measures the rhythm perception abilities of musicians and nonmusicians alike; it consists of 52 pairs of rhythmic sequences, and participants must decide whether the two members of each pair are the same or different. The total duration of all tasks was approximately 44 min (M = 43.5 min, SD = 17.1, range: 26.6–106.1 min). As noted above, the tasks were programmed using the Gorilla experimental platform and administered via Prolific, a website designed to crowdsource suitable participants for behavioral studies and surveys. Participants were required to use either a desktop or laptop computer (i.e., no tablets or phones) to complete the tasks and were compensated $20 through the Prolific platform.
Listener Participants
English-Speaking Listeners
Thirty-nine listeners participated in the English portion of the study, all of whom were monolingual speakers of American English with self-reported normal hearing. The mean age of these participants was 36.7 years (SD = 12.3). Fifteen (38.5%) were female, 23 (59.0%) were male, and one (2.6%) preferred not to say. Three participants were excluded from analyses because they failed to respond to 20% or more of the intelligibility testing stimuli (total n = 36). The participants in the current study were comparable to the 37 English-speaking listener participants reported by Tetzloff et al. (2025).
Spanish-Speaking Listeners
Thirty-eight listeners participated in the Spanish portion of the study, all of whom were raised as monolingual Spanish speakers and did not have self-reported proficiency in another language; they, too, all had self-reported normal hearing. The mean age was 30.3 years (SD = 5.5), 10 (26%) were female, and 28 were male (74%). There was one participant from Venezuela, seven from Chile, 14 from Spain, and 16 from Mexico. Two participants were excluded from the analyses because they did not respond to 20% or more of the intelligibility testing stimuli (total n = 36). These participants were also comparable to the 38 Spanish-speaking listener participants reported by Tetzloff et al. (2025).
Transcript Analysis for Perceptual Learning
The data from the perceptual learning task consisted of orthographic transcripts of what the listeners believed the speaker was saying. The transcripts were scored for percent words correct (PWC) using Autoscore (Borrie et al., 2019), an R package designed to automatically score orthographic transcripts. A PWC score was calculated for each phrase per participant for both pretest and posttest stimuli. An intelligibility improvement score (reflecting perceptual learning) was also calculated for each listener by subtracting their mean pretest PWC intelligibility score from their mean posttest PWC intelligibility score.
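The scoring arithmetic can be sketched as follows; this is an illustrative Python sketch (actual scoring used the Autoscore R package), and the function names are hypothetical:

```python
def pwc(correct_words, total_words):
    """Percent words correct (PWC) for a single phrase,
    e.g., 0, 50, or 100 for a two-word phrase."""
    return 100.0 * correct_words / total_words

def improvement_score(pretest_pwc, posttest_pwc):
    """Perceptual learning score: mean posttest PWC minus
    mean pretest PWC. Positive values indicate improvement."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(posttest_pwc) - mean(pretest_pwc)
```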
Rhythm Perception Analysis
The data from the rhythm perception task consisted of “same/iguales” or “different/diferentes” responses for each stimulus pair for each listener. Following Tetzloff et al. (2025), these data were analyzed using signal detection theory (Green & Swets, 1966; Hautus et al., 2021; Pastore & Scheirer, 1974), which is a way to evaluate discrimination abilities while taking into account response bias. A-prime (A′) scores were calculated for each participant's performance on the rhythm perception task; A′ is a nonparametric version of the more commonly known d-prime score that can handle extreme values without correction. A more detailed description of A′ calculation can be found in Tetzloff et al. (2025).
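As a sketch of the computation, one common formulation of A′ uses a listener's hit rate (responding "different" to different pairs) and false-alarm rate (responding "different" to same pairs); this Python version is illustrative, not the authors' analysis code:

```python
def aprime(hit_rate, fa_rate):
    """Nonparametric sensitivity index A' from hit and false-alarm
    rates (one common formulation); 0.5 indicates chance-level
    discrimination, 1.0 perfect discrimination."""
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5  # no discrimination
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Symmetric case when false alarms exceed hits
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))
```

Unlike d′, this formula tolerates extreme proportions (e.g., hit rates of 1.0) without a correction step.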
Statistics
All statistical analyses were run using R Statistical Software (R Core Team, 2020) with the packages "dplyr" (Wickham et al., 2019), "lme4" (Bates et al., 2015), and "emmeans" (Lenth, 2025). Because PWC at the trial level was bounded and discrete (0, 50, or 100, corresponding to 0/2, 1/2, or 2/2 words correct), we analyzed trial-level counts (correct vs. incorrect words) using a generalized linear mixed-effects model with a binomial likelihood (logit link). The model included test phase (pretest vs. posttest), language (English vs. Spanish), and their interaction as fixed effects, with age as a covariate; random intercepts for participant and stimulus were specified to account for repeated measures.1 An additional model was run with the increase in PWC as the dependent variable and the interaction of A′ score and language (English vs. Spanish) as the predictors, with age included as a covariate, to investigate the relationship between rhythm perception abilities and perceptual learning across the two languages.2 Estimated marginal means and trends were extracted for planned within-language comparisons and to explore language-specific effects of rhythm perception on perceptual learning.
Results
Perceptual Learning
In both English and Spanish, there was significant perceptual learning of the speech in babble noise after the training paradigm. For English, the mean pretest intelligibility across listeners was 56.6%, which increased to 66.3% at posttest (β = −.54, z = −7.34, standard error [SE] = 0.07, p < .0001), corresponding to a gain of 9.7 percentage points (interquartile range: 0.65–17.18, odds ratio ≈ 1.5, Cohen's h = .20). For Spanish, the mean pretest intelligibility was 53.1%, which increased to 70.7% at posttest (β = −.93, z = −12.38, SE = 0.08, p < .0001), a gain of 17.6 percentage points (interquartile range: 9.58–21.67, odds ratio ≈ 2.1, Cohen's h = .36). When comparing the magnitude of learning across the two languages, Spanish showed significantly more learning than English (β = .39, z = 3.71, SE = 0.10, p = .0002). These data are illustrated in Figure 1.
Figure 1.
Perceptual learning intelligibility gains from pretest to posttest with English- and Spanish-speaking listeners.
Effect of Rhythm Perception on Perceptual Learning
The mean (SD) A′ score for the English-speaking listeners in this study was 0.81 (0.10), and for Spanish-speaking listeners, 0.82 (0.11); there was no significant difference in rhythm perception scores between listeners of the two languages (β = .02, t = 0.69, SE = 0.02, p = .49). Furthermore, rhythm perception scores were not predictive of perceptual learning in English (β = −33.7, t = −1.48, SE = 22.7, p = .14) or Spanish (β = −10.7, t = −0.55, SE = 19.5, p = .59). There was no significant interaction between rhythm perception scores and language (β = 22.98, t = 0.76, SE = 20.32, p = .76). These data are illustrated in Figure 2.
Figure 2.
Nonsignificant interaction between perceptual learning gains and rhythm perception scores in English- and Spanish-speaking listeners.
Discussion
This study extends earlier work by Tetzloff et al. (2025), which demonstrated that both English- and Spanish-speaking listeners can adapt to speech masked by stationary noise in a perceptual learning paradigm, by examining whether these results hold with a more spectrotemporally complex masker. Our hypothesis, that English-speaking listeners would adapt less effectively to target speech in babble noise than Spanish-speaking listeners, was supported. Although English-speaking listeners were successful in perceptual learning of speech in babble noise, as mean PWC significantly increased from pretest (i.e., before training) to posttest (i.e., after training), Spanish-speaking listeners were significantly more successful, as determined by greater intelligibility gains from pretest to posttest. Nevertheless, in both languages, these improvements exceed the ~8 percentage-point benchmark that is often considered clinically meaningful (Stipancic et al., 2025), suggesting that the observed effects, particularly in Spanish, are both statistically reliable and practically important. In the current study, the intelligibility gain from pretest to posttest was on average 7.5 percentage points greater in Spanish than in English. This descriptively contrasts with the results of Tetzloff et al. (2025), in which intelligibility improvement was statistically comparable across the two language conditions, with a cross-language difference in gains of only 3.2 percentage points. The larger difference between the two languages in the present study suggests that the Spanish-speaking listeners may have taken greater advantage of the dips in the babble noise than the English-speaking listeners, facilitating better perceptual learning.
This may be because Spanish, as a syllable-timed language, has more predictable stress and timing patterns, which could help listeners more effectively align their attention with the speech signal during temporal dips in the babble noise. The regular rhythm may facilitate detection of acoustic regularities, allowing Spanish-speaking listeners to extract useful speech cues when the masker momentarily recedes. By contrast, in English, which is stress timed, the glimpsing effect may be less advantageous because the portions of the speech signal present in the dips of the masker may be more temporally unpredictable. Furthermore, due to the nature of Spanish phonotactics, the Spanish training passage contained more multisyllabic words than the English passage (≈ 2.4 syllables per content word in Spanish vs. 1.5 in English), implying denser syllabic/rhythmic cues. Accordingly, any Spanish advantage in learning speech masked by babble could reflect, in part, greater redundancy of syllabic information, which could have operated in addition to the proposed glimpsing advantage.
Another possible contributor to the group difference in the magnitude of perceptual learning is accent familiarity. The Spanish-speaking listeners in our study varied in their regional background, and the target talker in the Spanish condition spoke Puerto Rican Spanish, a dialect to which our listeners may not have been previously exposed. Although characteristic features of Puerto Rican Spanish (e.g., /s/ aspiration, /r/ realized as [l] in coda position) were not present in the read speech used for the stimuli, listeners may nevertheless have been adapting not only to speech masked by babble noise but also to unfamiliar acoustic features associated with Puerto Rican Spanish. In contrast, the English-speaking listeners were more likely to be familiar with the target talker's American English accent and thus faced less dialect-related variability. While we were unable to control for dialect in the Spanish portion of the present study, as a post hoc analysis, we ran the second perceptual learning model with just the Spanish-speaking listeners, with and without controlling for Spanish dialect; adding dialect did not improve the model fit.3 Nevertheless, future studies could examine the role of listener–talker dialect match in noise as a moderator of perceptual learning in both English and Spanish.
It is important to also consider the role of the spectral properties of the maskers in relation to the target speech when interpreting these results. We quantified spectral overlap between each language's target speech and the corresponding babble. For each file, we computed 1/3 octave long-term spectra (160 Hz–8 kHz); then we derived the Pearson correlation between the target and the babble spectra in dB, as well as a normalized overlap score (sum of bandwise minima after linear-power normalization; range: 0–1). Spectral overlap was higher for Spanish than English. The target-babble spectral correlation was .87 for Spanish versus .73 for English, and the overlap score (0–1) was .88 for Spanish versus .82 for English. These values indicate that the Spanish target speech places energy more similarly to Spanish babble than English target speech does to English babble; in other words, there was greater spectral crowding of speech-relevant bands in the Spanish condition. This provides a concrete acoustic basis for stronger masking in the Spanish condition. Nevertheless, Spanish-speaking listeners exhibited larger perceptual learning gains. This pattern suggests that perceptual learning is not bounded by energetic masking, but rather language-internal regularities in Spanish (more predictable syllabic rhythm) likely provided stable anchors in both the target and masker speech that listeners could learn to exploit, yielding greater improvement despite a more crowded spectrum.
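The overlap metrics described above can be approximated as follows; this numpy-based sketch is illustrative rather than the authors' analysis code, and the band edges at fc·2^(±1/6) are an assumed convention:

```python
import numpy as np

def third_octave_levels(x, fs, f_lo=160.0, f_hi=8000.0):
    """Long-term 1/3-octave band levels (dB) from the power spectrum.
    Band edges are taken at fc * 2**(+-1/6) (assumed convention)."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    levels = []
    fc = f_lo
    while fc <= f_hi:
        lo, hi = fc / 2 ** (1 / 6), fc * 2 ** (1 / 6)
        band = spec[(freqs >= lo) & (freqs < hi)].sum()
        levels.append(10 * np.log10(band + 1e-12))
        fc *= 2 ** (1 / 3)  # step to the next 1/3-octave center
    return np.array(levels)

def spectral_overlap(target, masker, fs):
    """Pearson r between the dB spectra, plus a 0-1 overlap score:
    sum of bandwise minima after normalizing linear power to sum 1."""
    t_db = third_octave_levels(target, fs)
    m_db = third_octave_levels(masker, fs)
    r = np.corrcoef(t_db, m_db)[0, 1]
    t_lin = 10 ** (t_db / 10); t_lin /= t_lin.sum()
    m_lin = 10 ** (m_db / 10); m_lin /= m_lin.sum()
    overlap = np.minimum(t_lin, m_lin).sum()
    return r, overlap
```

Identical spectra yield r = 1 and an overlap score of 1; spectrally disjoint signals drive the overlap score toward 0.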
As a second research question, we evaluated if rhythm perception abilities corresponded to the magnitude of perceptual learning. Tetzloff et al. (2025) showed that rhythm perception abilities were predictive of perceptual learning of speech in white noise for English-speaking but not Spanish-speaking listeners. In the present study, however, we found no relationship between rhythm perception abilities and perceptual learning of speech in babble noise in either language. A likely reason is that babble imposes its own rhythmic structure, which competes with and masks the target speech's rhythmic structure, making rhythm-based strategies harder to exploit. This interpretation is consistent with two pieces of prior evidence. First, McAuley et al. (2020) showed that target-rhythm variability impairs performance in babble and that making the background rhythm irregular improves target understanding; this demonstrates that masker rhythmic regularity contributes to interference. Second, Pearson et al. (2025) similarly found that reducing the temporal regularity of background speech diminishes masking. Together, these results suggest that in babble, masker-driven rhythmic structure limits the extent to which individual rhythm perception abilities can facilitate perceptual learning, even when those abilities may predict learning in simpler, stationary maskers (e.g., Tetzloff et al., 2025).
Since babble noise closely resembles real-world conditions (e.g., crowded spaces, conversations in noisy environments), the present findings highlight potential challenges for speakers of English and other stress-timed languages in adapting to speech in multitalker settings compared to speakers of Spanish and other syllable-timed languages. These results thus underscore the need to consider linguistic background when designing auditory training programs, assistive listening devices, or speech perception models for noisy environments.
In summary, this study builds on previous research by showing that language-specific rhythmic properties influence how listeners adapt to speech in babble noise. Listeners of both stress-timed and syllable-timed languages can succeed in the perceptual learning of speech in babble noise, but the listeners of the syllable-timed language were able to do this more effectively. While rhythm perception may be useful for adapting to speech degraded by the presence of simpler noise types (e.g., stationary noise), it appears to be less effective in more complex noise like babble regardless of the listener's native language. These findings have important implications for understanding speech perception in noisy environments in English versus Spanish.
Data Availability Statement
The data and statistical code for this study are available at https://osf.io/vu5q7/.
Acknowledgments
This work was supported by National Institute on Deafness and Other Communication Disorders Grants R01 DC020713 and R01 DC020930 (awarded to Stephanie A. Borrie).
Funding Statement
This work was supported by National Institute on Deafness and Other Communication Disorders Grants R01 DC020713 and R01 DC020930 (awarded to Stephanie A. Borrie).
Footnotes
1. glmer(cbind(correct, total − correct) ~ test × Language + Age + (1|id) + (1|target), data = datapl, family = binomial)
2. lm(intelligibility improvement ~ rhythm perception A′ × Language + Age, data = data)
3. χ2(3) = 2.90, p = .41 (Akaike information criterion [AIC] without dialect = 4,416.8 vs. AIC with dialect = 4,419.9).
References
- Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. 10.18637/jss.v067.i01
- Borrie, S. A., Barrett, T. S., & Yoho, S. E. (2019). Autoscore: An open-source automated tool for scoring listener perception of speech. The Journal of the Acoustical Society of America, 145(1), 392–399. 10.1121/1.5087276
- Borrie, S. A., Lansford, K. L., & Barrett, T. S. (2017). Rhythm perception and its role in perception and learning of dysrhythmic speech. Journal of Speech, Language, and Hearing Research, 60(3), 561–570. 10.1044/2016_JSLHR-S-16-0094
- Borrie, S. A., Lansford, K. L., & Barrett, T. S. (2018). Understanding dysrhythmic speech: When rhythm does not matter and learning does not happen. The Journal of the Acoustical Society of America, 143(5), EL379–EL385. 10.1121/1.5037620
- Cutler, A., & Butterfield, S. (1992). Rhythmic cues to speech segmentation: Evidence from juncture misperception. Journal of Memory and Language, 31(2), 218–236. 10.1016/0749-596X(92)90012-M
- Cutler, A., Mehler, J., Norris, D., & Segui, J. (1983). A language-specific comprehension strategy. Nature, 304(5922), 159–160. 10.1038/304159a0
- Cutler, A., Mehler, J., Norris, D., & Segui, J. (1986). The syllable's differing role in the segmentation of French and English. Journal of Memory and Language, 25(4), 385–400. 10.1016/0749-596X(86)90033-1
- Cutler, A., & Norris, D. (1988). The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance, 14(1), Article 113. 10.1037//0096-1523.14.1.113
- Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics (Vol. 1). Wiley.
- Hautus, M. J., Macmillan, N. A., & Creelman, C. D. (2021). Detection theory: A user's guide. Routledge. 10.4324/9781003203636
- Lenth, R. V. (2025). emmeans: Estimated marginal means, aka least squares means (R package Version 1.10.7) [Computer software]. https://rvlenth.github.io/emmeans/
- McAuley, J. D., Shen, Y., Dec, S., & Kidd, G. R. (2020). Altering the rhythm of target and background talkers differentially affects speech understanding. Attention, Perception, & Psychophysics, 82(6), 3222–3233. 10.3758/s13414-020-02064-5
- Pastore, R. E., & Scheirer, C. J. (1974). Signal detection theory: Considerations for general application. Psychological Bulletin, 81(12), 945–958. 10.1037/h0037357
- Patel, R., Connaghan, K., Franco, D., Edsall, E., Forgit, D., Olsen, L., Ramage, L., Tyler, E., & Russell, S. (2013). "The Caterpillar": A novel reading passage for assessment of motor speech disorders. American Journal of Speech-Language Pathology, 22(1), 1–9. 10.1044/1058-0360(2012/11-0134)
- Pearson, D. V., Shen, Y., McAuley, J. D., & Kidd, G. R. (2025). Aging and the effect of background rhythm on selective listening in multiple-source environments. Hearing Research, 467, Article 109389. 10.1016/j.heares.2025.109389
- R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
- Rosen, S., Souza, P., Ekelund, C., & Majeed, A. A. (2013). Listening to speech in a background of other talkers: Effects of talker number and noise vocoding. The Journal of the Acoustical Society of America, 133(4), 2431–2443. 10.1121/1.4794379
- Slater, J., & Kraus, N. (2016). The role of rhythm in perceiving speech in noise: A comparison of percussionists, vocalists and non-musicians. Cognitive Processing, 17(1), 79–87. 10.1007/s10339-015-0740-7
- Sperry, J. L., Wiley, T. L., & Chial, M. R. (1997). Word recognition performance in various background competitors. Journal of the American Academy of Audiology, 8, 71–80. https://pubmed.ncbi.nlm.nih.gov/9101453/
- Stipancic, K. L., van Brenk, F., Qiu, M., & Tjaden, K. (2025). Progress toward estimating the minimal clinically important difference of intelligibility: A crowdsourced perceptual experiment. Journal of Speech, Language, and Hearing Research, 68(7S), 3480–3494. 10.1044/2024_JSLHR-24-00354
- Tetzloff, K. A., Yoho, S. E., & Borrie, S. A. (2025). The relationship between rhythm perception abilities and perceptual learning in syllable- versus stress-timed languages. The Journal of the Acoustical Society of America, 157(4), 2847–2856. 10.1121/10.0036455
- Wallentin, M., Nielsen, A. H., Friis-Olivarius, M., Vuust, C., & Vuust, P. (2010). The Musical Ear Test, a new reliable test for measuring musical competence. Learning and Individual Differences, 20(3), 188–196. 10.1016/j.lindif.2010.02.004
- Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. The Journal of Open Source Software, 4(43), Article 1686. 10.21105/joss.01686
- Yates, K. M., Moore, D. R., Amitay, S., & Barry, J. G. (2019). Sensitivity to melody, rhythm, and beat in supporting speech-in-noise perception in young adults. Ear and Hearing, 40(2), 358–367. 10.1097/AUD.0000000000000621