The Journal of the Acoustical Society of America. 2019 Mar 1;145(3):EL173–EL178. doi: 10.1121/1.5091664

Age effects on the contributions of envelope and periodicity cues to recognition of interrupted speech in quiet and with a competing talker

William J Bologna, Kenneth I Vaden Jr, Jayne B Ahlstrom, Judy R Dubno
PMCID: PMC7112707  PMID: 31067962

Abstract

Envelope and periodicity cues may provide redundant, additive, or synergistic benefits to speech recognition. The contributions of these cues may change under different listening conditions and may differ for younger and older adults. To address these questions, younger and older adults with normal hearing listened to interrupted sentences containing different combinations of envelope and periodicity cues in quiet and with a competing talker. Envelope and periodicity cues improved speech recognition for both groups, and their benefits were additive when both cues were available. Envelope cues were particularly important for older adults and for sentences with a competing talker.

1. Introduction

Speech contains multiple sources of information across a range of temporal fluctuation rates. The framework developed by Rosen (1992) distinguishes envelope cues (2–50 Hz), which code syllabic and segmental rates, from periodicity cues (50–500 Hz), which code fundamental frequency (F0) and intonation. These cues may each contribute to speech recognition and their combined effects may be redundant, additive, or synergistic. Recent work demonstrated that filling silent intervals of interrupted speech with missing envelope cues improved speech recognition for younger and older adults in quiet and with a competing talker (Bologna et al., 2018). The current report expands on this work with a set of conditions in which missing periodicity cues were provided, with or without missing envelope cues. The independent manipulation of envelope and periodicity cues during sentence interruptions provided a means to determine their relative and combined contributions to speech recognition.

A background of competing speech may alter the relative contributions of envelope and periodicity cues to speech recognition. Periodicity cues are thought to play a key role in speech segregation, as listeners can exploit differences in F0 to facilitate segregation of target speech from competing talkers (Brokx and Nooteboom, 1982). In contrast, vocoded speech is characterized by an absence of periodicity cues, which substantially reduces intelligibility in a background of competing talkers (Stickney et al., 2007). Periodicity cues also convey intonation, which varies dynamically over the course of a sentence and may provide a predictable pitch pattern to facilitate the tracking of target speech in a competing talker background (Woods and McDermott, 2015). Thus, periodicity cues may have a larger effect on speech recognition when measured in a background that contains competing speech. This hypothesis was tested in the current study by comparing recognition of interrupted speech in quiet and with a competing talker.

Several lines of evidence suggest that periodicity cues have a diminished effect on speech recognition for older adults compared to younger adults. For example, age-related declines have been observed in neural representations of periodicity cues in the brainstem (Clinard and Tremblay, 2013; Snyder and Alain, 2005). Behaviorally, older adults perform more poorly than younger adults on speech recognition tasks with competing talkers (e.g., Helfer and Freyman, 2008; Lee and Humes, 2012) as well as on psychophysical tasks involving periodicity cues (Vongpaisal and Pichora-Fuller, 2007; Arehart et al., 2011). The prevailing hypothesis across these studies is that periodicity cues help facilitate speech segregation, and older adults are less able than younger adults to realize this benefit. However, older adults are more sensitive to intermittent distortions of the envelope, which may also account for age-related declines in speech recognition with competing talkers (Bologna et al., 2018). Here, these aging effects were assessed by comparing performance of younger and older adults on an interrupted speech task with different combinations of envelope and periodicity cues in quiet and with a competing talker.

2. Methods

The same participants described in Bologna et al. (2018) were tested: 20 younger adults [18–29 years; mean, 24.7; standard deviation (SD), 2.8] and 20 older adults (63–84 years; mean, 69.9; SD, 5.7). All participants had air-conduction thresholds ≤30 dB hearing level for 250–6000 Hz and were native speakers of American English.

Example waveforms and spectrograms for the four types of interrupted sentences in a 2 × 2 design are displayed in Fig. 1(A). PRESTO sentences were used because the diversity of talkers and dialects in the sentence recordings improves generalizability to real-world listening (Gilbert et al., 2013). Sentences were interrupted at regular intervals, and interrupted segments were either left silent (−Env, −Prd), filled with noise that carried the missing envelope cues (+Env, −Prd), filled with pulse trains that carried the missing periodicity cues (−Env, +Prd), or filled with modulated pulse trains that carried both envelope and periodicity cues (+Env, +Prd). Sentences were interrupted to retain speech proportions of 0.4, 0.5, 0.55, 0.6, and 0.7; this manipulation primarily affected the duration of speech glimpses, with larger speech proportions producing longer glimpses of speech and more intelligible sentences (see Bologna et al., 2018). Competing talker sentences were selected from the same sentence corpus and mixed with target sentences at a +3-dB signal-to-noise ratio (SNR) for the competing talker conditions. Although masking effects differ for younger and older adults (Fogerty et al., 2017), both groups were tested at the same SNR so that effects of background and age could be modeled independently.
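As a rough illustration of the interruption and mixing steps described above, the following Python sketch builds a rectangular gate that retains a given proportion of speech in each interruption cycle and scales a competing signal to a requested SNR. The interruption rate (rate_hz), the sampling rate, and all function names are assumptions made for illustration; this section does not state the interruption rate, and the sketch is not the authors' implementation.

```python
import math

def gate_mask(n_samples, fs, rate_hz, speech_proportion):
    """Rectangular gate: 1.0 where speech is retained, 0.0 where it is interrupted.

    rate_hz is a hypothetical interruption rate (cycles per second).
    """
    period = round(fs / rate_hz)              # samples per interruption cycle
    on = round(period * speech_proportion)    # retained samples per cycle
    return [1.0 if (n % period) < on else 0.0 for n in range(n_samples)]

def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def competitor_gain(target, competitor, snr_db):
    """Gain applied to the competitor so that target RMS exceeds it by snr_db."""
    return rms(target) / (rms(competitor) * 10 ** (snr_db / 20))

def mix_at_snr(target, competitor, snr_db):
    """Sum the target and the scaled competitor sample by sample."""
    g = competitor_gain(target, competitor, snr_db)
    return [t + g * c for t, c in zip(target, competitor)]
```

With a fixed interruption rate, raising speech_proportion lengthens each retained glimpse and shortens each gap, which matches the description that larger speech proportions produce longer glimpses.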

Fig. 1.

(A) Example waveforms and spectrograms for the four types of sentences. Interrupted segments contain envelope cues in the bottom panels (+Env) but not in the top panels (−Env). Interrupted segments contain periodicity cues in the right panels (+Prd) but not in the left panels (−Prd). (B) Keyword recognition (rau) plotted as a function of proportion of speech with age (filled vs open symbols) and background (circles vs triangles) as parameters.

Pulse trains designed to carry periodicity cues were generated using the Praat software package (Boersma and Weenink, 2014) and consisted of a series of periodic pitch pulses with a time-varying F0 contour matching the original sentence (Tinnemore and Chatterjee, 2015). Pitch contours were extracted from each sentence with a sampling rate of 100 Hz. Aberrant pitch points were removed by hand and replaced with actual pitch values as needed. Pitch contours were interpolated through unvoiced segments and periods of silence to generate continuous periodic pulse trains with a flat spectrum that followed the pitch contour of the original sentence. Pulse trains were subsequently filtered to match the PRESTO long-term average speech spectrum and the RMS of the original sentence. To generate periodicity-filled sentences, continuous pulse trains were interrupted by a rectangular gating function and combined with silence-interrupted sentences such that the pulse trains were gated on when the speech was gated off (see Fig. 2 in Bologna et al., 2018). For envelope-plus-periodicity-filled sentences, pulse trains were modulated by the envelope of the original sentence, obtained via full-wave rectification and low-pass filtering at 50 Hz with a fifth-order Butterworth filter.
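The pulse-train construction can be sketched in Python as follows. This is a simplified stand-in for the Praat processing, not the authors' procedure: unvoiced frames are assumed to be marked None and are bridged by linear interpolation (the interpolation method is an assumption), pulses are placed by phase accumulation, and the spectral shaping, RMS matching, and Butterworth envelope extraction are omitted. All function names are hypothetical.

```python
def interpolate_f0(f0_frames):
    """Fill unvoiced frames (None) by linear interpolation; hold values at the edges."""
    voiced = [i for i, f in enumerate(f0_frames) if f is not None]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    out = list(f0_frames)
    for i in range(voiced[0]):                      # before the first voiced frame
        out[i] = f0_frames[voiced[0]]
    for i in range(voiced[-1] + 1, len(out)):       # after the last voiced frame
        out[i] = f0_frames[voiced[-1]]
    for a, b in zip(voiced, voiced[1:]):            # gaps between voiced frames
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            out[i] = (1 - w) * f0_frames[a] + w * f0_frames[b]
    return out

def pulse_train(f0_frames, frame_rate, fs):
    """Unit pulses emitted each time the accumulated F0 phase completes a cycle."""
    n_samples = int(len(f0_frames) * fs / frame_rate)
    out = [0.0] * n_samples
    phase = 0.0
    for n in range(n_samples):
        frame = min(int(n * frame_rate / fs), len(f0_frames) - 1)
        phase += f0_frames[frame] / fs              # cycles advanced this sample
        if phase >= 1.0:
            phase -= 1.0
            out[n] = 1.0
    return out

def fill_interruptions(speech, filler, mask):
    """Keep speech where mask is 1 and insert the filler where mask is 0."""
    return [s * m + f * (1 - m) for s, f, m in zip(speech, filler, mask)]
```

For the envelope-plus-periodicity condition, the filler passed to fill_interruptions would be the pulse train multiplied sample by sample with the sentence envelope; the periodicity-only condition uses the unmodulated pulse train.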

Keyword recognition in all conditions was measured by instructing participants to repeat each sentence, guessing whenever possible. Responses were scored online by the experimenter using a strict scoring rule (i.e., no additional or missing suffixes). Testing was completed over two 2-h sessions. One session contained silence-interrupted and envelope-filled conditions described in Bologna et al. (2018), and the other session included periodicity-filled and envelope-plus-periodicity conditions. The order of these two sessions was randomized across subjects to minimize any systematic effects of practice or familiarity with the sentences. Testing was blocked by background (quiet/competing talker) and counterbalanced across subjects. A break was offered between test blocks.

Data were analyzed using an item-level logistic regression analysis of keyword recognition implemented in R using a generalized linear mixed model (GLMM; R package lme4; R Development Core Team, 2016). Two binary factors were included in the model that reflected the 2 × 2 design, with present or absent envelope cues (+/− Env) and periodicity cues (+/− Prd) during sentence interruptions. Additional factors of interest included the background, age group, and proportion of speech. Several other factors (see footnote 1) describing item-level and subject-level effects were also included in the model; a comprehensive examination of these factors can be found in Bologna et al. (2018). A combination of stepwise factor addition and elimination was performed to optimize model fit, which eliminated some terms that did not account for variance despite relevance to the study hypotheses (e.g., Background × Prd, and Age × Prd). When interaction terms significantly improved model fit, post hoc models were constructed to characterize the effects (e.g., Background × Env, and Age × Env).

3. Results

Keyword recognition (rau; Studebaker, 1985) is plotted for the four stimulus conditions in Fig. 1(B). Similar to results for the silence-interrupted and envelope-filled conditions reported in Bologna et al. (2018), model testing confirmed significant improvements in model fit with the addition of each factor in the study design: envelope cues, periodicity cues, speech proportion, background, and age group (all χ² ≥ 22.08, p < 0.001). Specifically, the results indicated that keyword recognition improved significantly when envelope cues were available during sentence interruptions (βEnv = 0.24, z = 20.82, p < 0.001) and when periodicity cues were available (βPrd = 0.16, z = 10.81, p < 0.001). Keyword recognition improved significantly with increasing proportion of speech (βProp = 1.00, z = 77.60, p < 0.001) and declined significantly with a competing talker (βBg = −0.75, z = −53.05, p < 0.001). Younger adults significantly outperformed older adults (βAge = −0.27, z = −5.42, p < 0.001).
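For readers unfamiliar with the plotting unit, the rationalized arcsine transform of Studebaker (1985) can be computed as in the Python sketch below. The function and argument names are illustrative, and how keyword counts were pooled before transformation is not specified here.

```python
import math

def rau(correct, total):
    """Rationalized arcsine units for `correct` items out of `total` (Studebaker, 1985)."""
    theta = (math.asin(math.sqrt(correct / (total + 1)))
             + math.asin(math.sqrt((correct + 1) / (total + 1))))
    return (146.0 / math.pi) * theta - 23.0
```

The transform is close to percent correct in the middle of the range but extends below 0 and above 100 at the extremes, which stabilizes variance near floor and ceiling.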

An interaction between envelope cues and periodicity cues was tested for significance to determine the extent to which the combined effects of these two cues were redundant, additive, or synergistic. This interaction term did not improve model fit (χ² for Env × Prd = 0.21, ns), indicating that the combined effects were not redundant or synergistic. Rather, each cue increased the likelihood of correct keyword recognition and these benefits were additive when both cues were available. This result is illustrated in Fig. 2. Keyword recognition was poorest when neither cue was available (white bars). When both cues were available (gray hatched bars), the improvement in keyword recognition, relative to the silence-interrupted condition, was roughly equal to the sum of the improvements observed in the envelope and periodicity conditions (hatched bars and gray bars, respectively).
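What additivity means in a logistic model can be illustrated with a short Python sketch. Without an interaction term, the two coefficients sum on the log-odds scale, so the predicted gain with both cues equals the sum of the single-cue gains. The 0/1 coding of the factors and the baseline log-odds below are assumptions chosen for illustration only; the reported coefficients may rest on a different predictor coding, and the fitted model contains further terms not shown here.

```python
import math

BETA_ENV = 0.24          # reported envelope coefficient
BETA_PRD = 0.16          # reported periodicity coefficient
BASELINE_LOGIT = -0.5    # hypothetical log-odds with neither cue (not from the paper)

def inv_logit(x):
    """Convert log-odds to probability."""
    return 1.0 / (1.0 + math.exp(-x))

def predicted_logit(env, prd, baseline=BASELINE_LOGIT):
    """Log-odds of a correct keyword with env and prd coded 0 or 1 (assumed coding)."""
    return baseline + BETA_ENV * env + BETA_PRD * prd

# Gains over the no-cue baseline on the log-odds scale
gain_env = predicted_logit(1, 0) - predicted_logit(0, 0)
gain_prd = predicted_logit(0, 1) - predicted_logit(0, 0)
gain_both = predicted_logit(1, 1) - predicted_logit(0, 0)

for env, prd in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    p = inv_logit(predicted_logit(env, prd))
    print(f"Env={env} Prd={prd}: predicted proportion correct = {p:.3f}")
```

Additivity is exact only in log-odds; after conversion to proportions or rau the combined gain is approximately, not exactly, the sum of the single-cue gains, consistent with the "roughly equal" pattern described for Fig. 2.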

Fig. 2.

Keyword recognition (rau) by younger (left) and older adults (right) for the four types of sentences, collapsed across proportion of speech. The 0.40 proportion was only tested in quiet due to floor effects with a competing talker and was omitted from these averages to facilitate comparisons across the two backgrounds. White/gray bars indicate −/+ Prd and solid/hatched bars indicate −/+ Env.

Interactions with background were tested to determine the extent to which envelope cues or periodicity cues provided an additional benefit for speech segregation with a competing talker. Only the interaction between envelope and background was significant (βEnv×Bg = 0.05, z = 5.63, p < 0.001). Post hoc modeling revealed that envelope cues were more beneficial with a competing talker (βEnv_CT = 0.22) than in quiet (βEnv_Q = 0.19). In Fig. 2, the improvement when envelope cues were available (hatched bars and gray hatched bars) was larger with a competing talker than in quiet. This suggests that envelope cues (not periodicity cues) provided a benefit to speech segregation with a competing talker.

Interactions were also tested between age and envelope cues and between age and periodicity cues to determine the extent to which age affected the speech recognition benefit from these cues. Again, only the interaction with envelope reached significance (βEnv×Age = 0.06, z = 5.91, p < 0.001). Post hoc modeling revealed that envelope cues were more beneficial for older adults (βEnv_O = 0.26) than younger adults (βEnv_Y = 0.15). This effect can be observed by comparing left and right panels of Fig. 2; older adults (right) realized a greater improvement than younger adults (left) from signals containing envelope cues (hatched bars) and signals containing both envelope and periodicity cues (gray hatched bars). It should also be noted that older adults performed more poorly than younger adults on the baseline condition (white bars). Thus, the interaction between age and envelope cues is driven, in part, by poorer performance of older adults for recognition of silence-interrupted speech (see Bologna et al., 2018 for discussion). Together, the results suggest that older adults relied more on continuous envelope cues to facilitate speech recognition than younger adults, but younger and older adults did not differ in the modest benefit from periodicity cues.

4. Discussion

The purpose of this study was to determine (1) the relative contributions of envelope and periodicity cues for speech recognition, (2) the extent to which periodicity cues provide additional benefit with a competing talker, and (3) the extent to which younger and older adults differ in the benefit they receive from these cues in either background. Results from conditions with periodicity-filled and envelope-plus-periodicity-filled sentences were combined with silence-interrupted and envelope-filled results from Bologna et al. (2018) to test study hypotheses. These new results provide additional evidence of the importance of envelope cues (relative to periodicity) for speech recognition in a background with competing talkers.

For both younger and older adults, envelope and periodicity cues provided additive contributions to recognition of interrupted speech. Consistent with recent work by Oh et al. (2016), keyword recognition in sentences was best when both cues were available. Here, results suggested a larger effect of envelope cues than periodicity cues both in quiet and with a competing talker. The envelope is naturally continuous in an uninterrupted sentence, and so the additional envelope cues likely restore the continuity of the envelope for interrupted sentences. In contrast, periodicity is naturally intermittent in an uninterrupted sentence, occurring only during voiced speech segments and vowels. As such, the additional periodicity cues may have created an unnatural continuity of periodicity information through segments of speech that would otherwise be aperiodic, such as unvoiced consonants. These spurious periodic segments can result in poorer speech recognition compared to an analogous aperiodic stimulus (Steinmetzger and Rosen, 2015).

With a competing talker, periodicity cues were hypothesized to provide an additional benefit to speech segregation, thereby promoting better recognition of target speech. Contrary to this hypothesis, the benefit associated with continuous periodicity cues was equivalent in quiet and with a competing talker. Rather, envelope cues provided additional benefit with a competing talker. This result is consistent with work by Apoux et al. (2013), which demonstrated that speech recognition with a single competing talker is determined mostly by envelope cues. They interpreted the role of periodicity cues to be a means of identifying glimpses of target speech and segregating them from the background (see also, Hopkins and Moore, 2009). With the current design, the periodicity cues were added between glimpses of an interrupted sentence to facilitate continuity of speech cues across the sentence. As these cues were carried by non-speech signals, they did not provide a means of identifying glimpses of speech. Thus, periodicity cues provided no additional benefit with a competing talker. They were either redundant with existing periodicity in the speech glimpses or were otherwise unnecessary for speech segregation.

Whereas older adults were hypothesized to benefit less from periodicity cues than younger adults, this study demonstrated an equivalent benefit for both age groups. This may be explained by the relatively minor contributions of periodicity cues in interrupted segments to overall sentence recognition. Recent work by Clarke et al. (2017) demonstrated that manipulations of the F0 contour did not affect top-down repair of interrupted speech in younger adults. In their study, periodicity cues in speech glimpses were either flattened to remove intonation cues or exaggerated to provide a more salient F0 cue over the course of the sentence. They hypothesized that manipulations of the F0 contour would affect across-time linkage of successive glimpses of speech and disrupt recognition of an interrupted sentence. They found (similar to results here) that the F0 contour provided only a small benefit for recognition of interrupted speech. Thus, any age-related declines in periodicity coding may have a negligible effect on older adults' ability to benefit from a continuous F0 contour, as this benefit is generally minimal even in younger adults.

In summary, this study assessed the contributions of envelope and periodicity cues to recognition of interrupted speech by younger and older adults in quiet and with a competing talker. Envelope and periodicity cues each facilitated better keyword recognition, and the improvements were additive when both cues were available. Envelope cues were more beneficial than periodicity cues and were particularly beneficial when the background contained a competing talker. Although older adults were expected to benefit less than younger adults from periodicity cues, the current results indicate that older adults benefit at least as much from these cues as younger adults. Thus, the effects of age on speech recognition with competing talkers may be driven more by intermittent distortions of the envelope than reduced benefit of periodicity cues.

Acknowledgments

This work was supported (in part) by research Grants Nos. R01 DC000184 and P50 DC000422 from NIH/NIDCD, and the South Carolina Clinical & Translational Research (SCTR) Institute, with an academic home at the Medical University of South Carolina, through NIH Grant No. UL1 TR000062. This investigation was conducted in a facility constructed with support from Research Facilities Improvement Program Grant No. C06 RR014516 from the National Center for Research Resources, National Institutes of Health. This work was completed as part of the first author's dissertation to fulfill the requirement for Ph.D. at the University of Maryland. The authors thank Monita Chatterjee, Sandra Gordon-Salant, Rochelle Newman, and Jonathan Simon for their advice and support on this work, and Lois Matthews for her assistance with participant recruitment.

Footnotes

1. A binary session order factor (Ord) was added to the model to determine the extent to which keyword recognition improved in the second testing session due to practice or familiarity effects. Model testing revealed that session order significantly improved the fit of the model, with a modest, but significant, improvement in keyword recognition on the second test session compared to the first (βOrd = 0.05, z = 3.92, p < 0.001). Interactions were explored between session order and the remaining factors, but none of these interactions significantly improved the fit of the model (all χ² for Ord interactions < 1.73, ns). The session order effect on keyword recognition on the second session compared to the first was small, corresponding to an average improvement of 3% across subjects. The session order factor was retained in the model to account for this variance in performance.

References and links

1. Apoux, F., Yoho, S. E., Youngdahl, C. L., and Healy, E. W. (2013). “Role and relative contributions of temporal envelope and fine structure cues in sentence recognition by normal-hearing listeners,” J. Acoust. Soc. Am. 134, 2205–2212. doi: 10.1121/1.4816413
2. Arehart, K. H., Souza, P. E., Muralimanohar, R. K., and Miller, C. W. (2011). “Effects of age on concurrent vowel perception in acoustic and simulated electroacoustic hearing,” J. Speech Lang. Hear. Res. 54, 190–210. doi: 10.1044/1092-4388(2010/09-0145)
3. Boersma, P., and Weenink, D. (2014). “Praat: Doing phonetics by computer” [Computer program], version 5.4, http://www.praat.org/ (Last viewed 24 November 2014).
4. Bologna, W. J., Vaden, K. I., Jr., Ahlstrom, J. B., and Dubno, J. R. (2018). “Age effects on perceptual organization of speech: Contributions of glimpsing, phonemic restoration, and speech segregation,” J. Acoust. Soc. Am. 144, 267–281. doi: 10.1121/1.5044397
5. Brokx, J. P. L., and Nooteboom, S. G. (1982). “Intonation and the perceptual separation of simultaneous voices,” J. Phon. 10, 23–36.
6. Clarke, J., Kazanoğlu, D., Bașkent, D., and Gaudrain, E. (2017). “Effect of F0 contours on top-down repair of interrupted speech,” J. Acoust. Soc. Am. 142, EL7–EL12. doi: 10.1121/1.4990398
7. Clinard, C. G., and Tremblay, K. L. (2013). “Aging degrades the neural encoding of simple and complex sounds in the human brainstem,” J. Am. Acad. Audiol. 24, 590–599. doi: 10.3766/jaaa.24.7.7
8. Fogerty, D., Bologna, W. J., Ahlstrom, J. B., and Dubno, J. R. (2017). “Simultaneous and forward masking of vowels and stop consonants: Effects of age, hearing loss, and spectral shaping,” J. Acoust. Soc. Am. 141, 1133–1143. doi: 10.1121/1.4976082
9. Gilbert, J. L., Tamati, T. N., and Pisoni, D. B. (2013). “Development, reliability, and validity of PRESTO: A new high-variability sentence recognition test,” J. Am. Acad. Audiol. 24, 26–36. doi: 10.3766/jaaa.24.1.4
10. Helfer, K. S., and Freyman, R. L. (2008). “Aging and speech-on-speech masking,” Ear Hear. 29, 87–98. doi: 10.1097/AUD.0b013e31815d638b
11. Hopkins, K., and Moore, B. C. J. (2009). “The contribution of temporal fine structure to the intelligibility of speech in steady and modulated noise,” J. Acoust. Soc. Am. 125, 442–446. doi: 10.1121/1.3037233
12. Lee, J. H., and Humes, L. E. (2012). “Effect of fundamental-frequency and sentence-onset differences on speech-identification performance of young and older adults in a competing-talker background,” J. Acoust. Soc. Am. 132, 1700–1717. doi: 10.1121/1.4740482
13. Oh, S. H., Donaldson, G. S., and Kong, Y. Y. (2016). “The role of continuous low-frequency harmonicity cues for interrupted speech perception in bimodal hearing,” J. Acoust. Soc. Am. 139, 1747–1755. doi: 10.1121/1.4945747
14. R Development Core Team (2016). “R: A language and environment for statistical computing,” R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org (Last viewed 21 June 2018).
15. Rosen, S. (1992). “Temporal information in speech: Acoustic, auditory and linguistic aspects,” Philos. Trans. R. Soc. London B. Biol. Sci. 336, 367–373. doi: 10.1098/rstb.1992.0070
16. Snyder, J. S., and Alain, C. (2005). “Age-related changes in neural activity associated with concurrent vowel segregation,” Brain Res. Cogn. Brain Res. 24, 492–499. doi: 10.1016/j.cogbrainres.2005.03.002
17. Steinmetzger, K., and Rosen, S. (2015). “The role of periodicity in perceiving speech in quiet and in background noise,” J. Acoust. Soc. Am. 138, 3586–3599. doi: 10.1121/1.4936945
18. Stickney, G. S., Assmann, P. F., Chang, J., and Zeng, F. G. (2007). “Effects of cochlear implant processing and fundamental frequency on the intelligibility of competing sentences,” J. Acoust. Soc. Am. 122, 1069–1078. doi: 10.1121/1.2750159
19. Studebaker, G. A. (1985). “A ‘rationalized’ arcsine transform,” J. Speech Hear. Res. 28, 455–462. doi: 10.1044/jshr.2803.455
20. Tinnemore, A., and Chatterjee, M. (2015). “The incomplete role of the pitch contour in voice emotion transmission,” poster presentation at the 2015 MidWinter Meeting of the Association for Research in Otolaryngology, Baltimore, MD.
21. Vongpaisal, T., and Pichora-Fuller, M. K. (2007). “Effect of age on F0 difference limen and concurrent vowel identification,” J. Speech Lang. Hear. Res. 50, 1139–1156. doi: 10.1044/1092-4388(2007/079)
22. Woods, K. J. P., and McDermott, J. (2015). “Attentive tracking of sound sources,” Curr. Biol. 25, 2238–2246. doi: 10.1016/j.cub.2015.07.043
