Abstract
Although dynamic pitch aids speech perception in temporally-modulated noise, the ability to benefit from this cue varies substantially among older listeners. To examine the perceptual factors that contribute to this variability, this study characterized individuals' ability to perceive dynamic pitch in temporally-modulated noise, using dynamic pitch segments extracted from real speech. Data from younger and older listeners showed that stronger pitch contours were more easily perceived than weaker pitch contours. The resulting metric significantly predicted speech-in-noise ability in older listeners. Potential implications of this work are discussed.
1. Introduction
As one of the most powerful cues for speech recognition under adverse conditions, pitch enhances continuity of speech streams and improves speech recognition in the presence of background talkers.1–4 Dynamic pitch (i.e., the pitch variation in natural speech) plays an important role in facilitating speech comprehension,5,6 as well as conveying emotion.7 Previous studies have also found that dynamic pitch cues facilitate speech recognition in background noise for both younger listeners with normal hearing8–10 and older listeners with normal hearing or hearing loss.11–13 This effect has also been demonstrated in languages other than English.14
The improvement in speech recognition in noise with dynamic pitch cues stems from perceptual and psycholinguistic mechanisms. Dynamic pitch can enhance perceptual continuity and saliency under perceptual interruption15,16 and provide prosodic cues for speech perception.5,6 When target speech is interrupted by temporally-modulated noise (as in many real-life scenarios such as conversation and traffic), dynamic pitch cues can improve speech perception as compared with target speech in steady-state noise.11 Recent data show that, although older individuals as a group benefit from dynamic pitch cues and perceive speech better in temporally-modulated noise, listeners vary substantially in whether they benefit from stronger dynamic pitch cues.12 A better understanding of this inter-listener variability is critical to the goal of enhancing dynamic pitch cues to improve speech perception in noise for individual listeners.
One perceptual factor that could contribute to an individual's relative benefit from dynamic pitch for speech recognition in temporally-modulated noise is the ability to perceive the difference between strong and mild dynamic pitch cues. Without noise, there is behavioral and electrophysiological evidence of individual differences in responses to pitch contours, which are shaped by musical and linguistic experience.17,18 However, even with musical and linguistic experience controlled, there is still individual variability in responses to pitch contours among older listeners with good hearing.19 It is possible that many individuals cannot use strengthened dynamic pitch to improve speech recognition in noise because they cannot perceive the difference in pitch contour between strong and mild dynamic pitch. Further, even good perception of dynamic pitch can be disrupted by interrupting noise that has temporal modulation; difficulty perceiving across such interruptions is often described as a poor ability to “glimpse.”20 As many older listeners have difficulty glimpsing in temporally-modulated noise,21 it is hypothesized that those who cannot perceive pitch contour differences across noise interruptions are not able to benefit from the strengthened pitch cues. To test this hypothesis, a critical first step is to develop a perceptual metric that can be used to characterize individuals' ability to glimpse dynamic pitch in noise.
Although some test paradigms have been developed to measure an individual's ability to perceive dynamic pitch/spectral cues, either in quiet or in noise,19,22–24 those tests do not measure the perception of dynamic pitch across phonemes/words. They typically include pitch patterns that last less than a second and are modeled after random or linear curves, whereas the effect of dynamic pitch can extend across and connect multiple words in a sentence.5,6 Therefore, owing to these limitations of the pitch stimuli, traditional spectral measures cannot capture the ability to glimpse and benefit from natural dynamic pitch patterns. The present study aimed to characterize older individuals' ability to glimpse dynamic pitch in temporally-modulated noise by using a newly developed metric that employs stimuli that are longer and modeled directly on the pitch patterns of continuous speech.
2. Methods
2.1. Participants
Two groups of adults were recruited in the greater Chicago area to participate in the study. The older group consisted of 17 adults (12 women and 5 men) aged 57 to 79 yr [mean age = 66.2 yr, standard deviation (SD) = 5.76 yr], and the younger group consisted of 19 adults (13 women and 6 men) aged 18 to 31 yr (mean age = 22.3 yr, SD = 3.16 yr). To control for musical training and tone language experience, none of the participants had more than 3 yr of instrument practice (or vocal training), and none of them spoke a tone language. All of the younger participants had normal hearing [pure tone thresholds of 20 dB hearing level (HL) or better at octave frequencies from 250 to 8000 Hz]. All of the older participants had near-normal hearing (pure tone thresholds of 30 dB HL or better at frequencies from 250 to 3000 Hz; see Table 1 for their thresholds). Participants were tested monaurally in the ear with better hearing (as defined by a lower pure-tone average threshold). The study protocol was approved by the Institutional Review Board of Northwestern University, and all participants were paid for their time.
Table 1.
Pure tone thresholds of older group (dB HL).
| Participant | 250 Hz | 500 Hz | 1000 Hz | 2000 Hz | 3000 Hz | 4000 Hz | 6000 Hz | 8000 Hz |
|---|---|---|---|---|---|---|---|---|
| 1 | 20 | 10 | 10 | 15 | 10 | 30 | 45 | 70 |
| 2 | 25 | 5 | 10 | 25 | 30 | 40 | 55 | 60 |
| 3 | 15 | 5 | 10 | 0 | −5 | 10 | 10 | 25 |
| 4 | 10 | 10 | 5 | 10 | 10 | 20 | 10 | 10 |
| 5 | 20 | 5 | 5 | 0 | 5 | 20 | 15 | 15 |
| 6 | 10 | 15 | 10 | 30 | 20 | 35 | 45 | 45 |
| 7 | 20 | 15 | 20 | 20 | 30 | 40 | 60 | 65 |
| 8 | 10 | 5 | 15 | 5 | 25 | 30 | 40 | 35 |
| 9 | 20 | 15 | 15 | 20 | 30 | 25 | 35 | 45 |
| 10 | 15 | 15 | 5 | 15 | 15 | 20 | 15 | 15 |
| 11 | 15 | 15 | 15 | 30 | 30 | 45 | 50 | 50 |
| 12 | 10 | 15 | 10 | 25 | 30 | 30 | 40 | 30 |
| 13 | 10 | 10 | 5 | 0 | 10 | 15 | 15 | 20 |
| 14 | 15 | 15 | 10 | 0 | 5 | 10 | 30 | 25 |
| 15 | 20 | 25 | 15 | 30 | 30 | 55 | 70 | 65 |
| 16 | 15 | 20 | 25 | 10 | 30 | 60 | 60 | 75 |
| 17 | 15 | 15 | 10 | 10 | 15 | 20 | 25 | 20 |
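The near-normal criterion and the better-ear rule described in Sec. 2.1 can be illustrated with a short Python sketch. The three rows below are copied from Table 1. The choice of 500, 1000, and 2000 Hz for the pure-tone average is an assumption (the text does not state which frequencies were averaged), and all function names are hypothetical.

```python
# Frequencies (Hz) in the column order of Table 1.
FREQS_HZ = [250, 500, 1000, 2000, 3000, 4000, 6000, 8000]

# Thresholds (dB HL) for three older participants, copied from Table 1.
THRESHOLDS = {
    1: [20, 10, 10, 15, 10, 30, 45, 70],
    3: [15, 5, 10, 0, -5, 10, 10, 25],
    16: [15, 20, 25, 10, 30, 60, 60, 75],
}

# ASSUMPTION: a three-frequency pure-tone average; the source does not
# specify which frequencies enter the average.
PTA_FREQS_HZ = (500, 1000, 2000)


def pure_tone_average(thresholds):
    """Mean threshold (dB HL) over the assumed PTA frequencies."""
    values = [thresholds[FREQS_HZ.index(f)] for f in PTA_FREQS_HZ]
    return sum(values) / len(values)


def meets_near_normal(thresholds, limit_db=30, max_freq_hz=3000):
    """True if every threshold from 250 Hz up to max_freq_hz is <= limit_db."""
    return all(
        t <= limit_db for f, t in zip(FREQS_HZ, thresholds) if f <= max_freq_hz
    )


def better_ear(left, right):
    """Pick the ear with the lower PTA (ties go to the left ear, arbitrarily)."""
    return "left" if pure_tone_average(left) <= pure_tone_average(right) else "right"
```

Applied to the rows above, `meets_near_normal` holds for all three participants even though participants 1 and 16 exceed 30 dB HL above 3000 Hz, which is why the criterion is stated only up to 3000 Hz.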
2.2. Stimuli and procedure
To create the dynamic pitch stimuli, the pitch trajectories of IEEE sentences25 were extracted and resynthesized into continuous pure-tone glides using the PRAAT speech analysis and resynthesis package.26 The gaps between segments of pitch contours were interpolated with a quadratic spline function.27 A segment of tone glide was randomly selected from each exemplar, each of which was generated from a 3-s sentence. The duration of the tone glide was set to 1.5 s to avoid overloading auditory short-term memory in the discrimination task.28 Using the following formula for resynthesizing the speech stimuli,11 three levels of dynamic pitch strength were generated with different pitch factors: 1.0 for the natural level, 1.4 for the mild level, and 1.75 for the strong level.
| (1) |
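To illustrate how a pitch factor can strengthen a contour, the Python sketch below scales each point's deviation from the contour mean on a log-frequency scale. This particular form is an assumption made for illustration and may differ from Eq. (1) and from the procedure in Ref. 11; the function name and the example contour are hypothetical.

```python
from math import exp, log


def scale_contour(f0_hz, pitch_factor):
    """Scale a pitch contour about its mean log-frequency.

    ASSUMPTION: deviations from the mean are multiplied by pitch_factor on a
    log-frequency scale. A factor of 1.0 leaves the contour unchanged, and
    larger factors exaggerate the pitch movement.
    """
    log_f0 = [log(f) for f in f0_hz]
    mean_log = sum(log_f0) / len(log_f0)
    return [exp(mean_log + pitch_factor * (v - mean_log)) for v in log_f0]


# Hypothetical contour (Hz), scaled with the three pitch factors from the text.
contour = [180.0, 210.0, 240.0, 200.0, 170.0]
natural = scale_contour(contour, 1.0)
mild = scale_contour(contour, 1.4)
strong = scale_contour(contour, 1.75)
```

Under this assumed form, the ratio between the highest and lowest points of the contour grows with the pitch factor while the geometric mean frequency stays fixed.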
Five dynamic pitch segments were included in the test as multiple items. For each pitch segment, three conditions contained two identical stimuli drawn from one of the three pitch strength levels (natural, mild, and strong) with equal probability. Four other conditions contained a baseline stimulus (natural) and one of the stronger dynamic pitch stimuli (mild or strong), with presentation order counterbalanced. There were 2 trials for each condition/item combination (5 items × 7 conditions × 2 trials = 70 trials in total), with test order randomized across participants. The test used a two-interval forced-choice task to measure individuals' ability to discriminate pitch contours that differ in dynamic pitch strength. The listeners were asked to judge whether the two stimuli in each trial sounded the same or different. A customized Matlab program was used to present the stimuli and record the responses. In the test, pitch contour stimuli were first presented without any background noise to familiarize the listeners with the stimuli. Following that, the same pitch stimuli were presented in temporally-modulated background noise (ICRA 2-talker noise29), a non-speech noise that preserves the temporal envelope of 2-talker babble. The signal-to-noise ratio (SNR) was set at 10 dB based on piloting. Stimuli were presented over an Etymotic Research ER-2 insert earphone in the test ear at a presentation level of 65 dB sound pressure level. In addition to the dynamic pitch glimpsing test, QuickSIN (Ref. 34) was administered to both age groups to measure individuals' ability to perceive speech in noise, scored as dB SNR loss. This measure is defined as the SNR a listener requires for 50% accuracy, relative to the performance of normal-hearing listeners.
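The trial structure described above can be sketched in Python as follows. This is one reading of the design, assuming one same-pair condition per pitch strength level and both presentation orders for each different pair; the names and the dictionary layout are hypothetical, and the original test was implemented in Matlab.

```python
import random
from itertools import product

ITEMS = range(1, 6)  # five dynamic pitch segments
SAME_PAIRS = [("natural", "natural"), ("mild", "mild"), ("strong", "strong")]
DIFFERENT_PAIRS = [
    ("natural", "mild"), ("mild", "natural"),      # both presentation orders
    ("natural", "strong"), ("strong", "natural"),
]
REPETITIONS = 2  # trials per condition/item combination


def build_trials(seed=None):
    """Return the shuffled trial list for one participant."""
    trials = [
        {
            "item": item,
            "first": first,
            "second": second,
            "correct": "same" if first == second else "different",
        }
        for item, (first, second) in product(ITEMS, SAME_PAIRS + DIFFERENT_PAIRS)
        for _ in range(REPETITIONS)
    ]
    random.Random(seed).shuffle(trials)  # order randomized per participant
    return trials


trials = build_trials(seed=1)
```

Under this reading, the 70 trials split into 30 "same" trials and 40 "different" trials, half of the latter involving the mild level and half the strong level.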
3. Results
As the responses from the discrimination paradigm inherently reflect listener bias in making the judgment,30 the performance data were first analyzed to extract signal detection parameters (d′ and beta) using R (R Core Team, Version 3.2.1) with the psycho package.31 The sensitivity parameter d′ was used to indicate each individual's dynamic pitch glimpsing ability. The goal of the first analysis was to reveal the effect of stimulus and listener factors (i.e., dynamic pitch strength and age group) on dynamic pitch glimpsing ability. Figure 1 presents the group mean values of d′ for each pitch condition. Data analysis was conducted using mixed effects linear regressions with R's lme4 (Ref. 32) and lmerTest libraries.33 The models included fixed effects of pitch strength (Mild vs Strong) and group (Younger vs Older), as well as random by-participant intercepts. Results demonstrated significant effects of pitch strength [b = 0.73, t(35) = 8.80, p < 0.001] and group [b = 0.54, t(36) = 2.08, p < 0.05], while the interaction was not significant [b = −0.12, t(35) = −0.72, p > 0.1]. Specifically, trials with stronger pitch contours were better perceived than those with mild contours, and younger listeners performed better than older listeners.
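For readers unfamiliar with the signal detection parameters, the Python sketch below computes d′ and beta from response counts using the standard yes/no formulas. It is an illustration rather than the analysis code used in the study, which relied on R. Treating a "different" response on a different trial as a hit, and applying a log-linear correction to avoid rates of 0 or 1, are both assumptions; the function name is hypothetical.

```python
from math import exp
from statistics import NormalDist


def dprime_beta(n_hit, n_miss, n_fa, n_cr):
    """Return (d', beta) from hit, miss, false-alarm, and correct-rejection counts.

    ASSUMPTION: a log-linear correction (add 0.5 to each count) keeps the
    hit and false-alarm rates strictly between 0 and 1.
    """
    hit_rate = (n_hit + 0.5) / (n_hit + n_miss + 1.0)
    fa_rate = (n_fa + 0.5) / (n_fa + n_cr + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    d_prime = z_hit - z_fa                       # sensitivity
    beta = exp((z_fa ** 2 - z_hit ** 2) / 2.0)   # likelihood-ratio bias
    return d_prime, beta


# Hypothetical counts for one listener: 20 "different" and 30 "same" trials.
d_prime, beta = dprime_beta(n_hit=15, n_miss=5, n_fa=6, n_cr=24)
```

A listener who responds at chance has d′ near 0, and an unbiased listener has beta equal to 1, so d′ separates sensitivity from the tendency to favor one response.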
Fig. 1.
Group means of dynamic pitch glimpsing ability (as indicated by d′ values) for older and younger groups with two dynamic pitch strength conditions (error bars indicate ± standard error).
To determine the range of difficulty across the five dynamic pitch items, a second analysis extracted d′ for each of the five items (averaged across pitch strengths and participants) for both the younger and older groups. Item difficulty (as indicated by d′) had a comparable distribution across the two groups of listeners, with a correlation coefficient r of 0.94.
The third analysis asked whether dynamic pitch glimpsing ability can predict listeners' speech-in-noise performance as measured by QuickSIN. For the older group, linear regression models were initially built with the variables of age, pure tone average, and dynamic pitch glimpsing ability (as indicated by d′ in the mild contour condition). Model comparison was performed using the stepwise selection function in R. The results showed that, among all the variables, only dynamic pitch glimpsing ability significantly predicted QuickSIN performance of older listeners [b = −1.93, t(15) = −2.75, p = 0.01; see Fig. 2 for a scatter plot]. For the younger group, dynamic pitch glimpsing ability was only marginally significant in predicting QuickSIN performance [b = −0.97, t(17) = −1.85, p = 0.08].
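The reported degrees of freedom are what a single-predictor regression would give: n − 2 = 15 for the 17 older listeners and n − 2 = 17 for the 19 younger listeners. The Python sketch below shows how the slope b, its t statistic, and the degrees of freedom are obtained for such a model. The function name is hypothetical and the example values are made up for illustration; they are not the study data.

```python
from math import sqrt


def simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, t_statistic_for_slope, degrees_of_freedom).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    df = n - 2  # two estimated parameters: intercept and slope
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = sqrt(rss / df / sxx)  # standard error of the slope
    return slope, intercept, slope / se_slope, df


# Made-up values: predictor (e.g., d') and outcome (e.g., SNR loss in dB).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [5.1, 2.9, 1.2, -1.1, -2.9]
slope, intercept, t_stat, df = simple_regression(x, y)
```

A negative slope, as in the reported results, means that listeners with higher d′ tend to have lower SNR loss, that is, better speech-in-noise performance.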
Fig. 2.
Scatter plot showing the relationship between dynamic pitch glimpsing ability and speech-in-noise performance in the older group.
4. Discussion
With the goal of characterizing individual listeners' dynamic pitch glimpsing ability, this work serves as a first step in devising a new metric that measures perception of continuous dynamic pitch cues on a supra-segmental level. To the best of our knowledge, this is the first test that employs pitch stimuli that are modeled after real speech and extend across multiple words. This approach has two implications. First, pitch's role as a supra-segmental cue in speech perception has been documented.35,36 As this cue can extend across multiple phonemes and words, the ability to perceive it should be measured with stimuli that have durations of a few words. To this end, this test was designed to provide unique information about an individual's ability to perceive the natural dynamic pitch patterns in speech, which contain rich prosodic information that can facilitate speech processing. Second, longer-duration stimuli can reflect the perceptual ability of glimpsing in temporally modulated noise that simulates the temporal characteristics of speech. To measure the ability to glimpse across temporal modulation, the noise should contain multiple cycles of modulation so that glimpsing is engaged as it is in the perception of continuous speech.
The study has yielded a few new findings that are useful for the next steps of research. First, the group data were consistent with previous findings in showing that strong pitch contours are easier to perceive than mild ones.19 However, there was individual variability in the relative performance between the strong and mild pitch strength levels. In other words, some individuals did much better with strong contours than with mild ones, while others performed comparably across the two conditions. As a next step, we plan to examine the connection between this variability and the relative benefit from different pitch strength levels. Second, while it was not unexpected to find an age effect on performance,19,22 the high correlation between the younger and older groups' performance on the individual items indicates that the current paradigm provides a reliable measure across listener groups with a range of item difficulty. Last, dynamic pitch glimpsing ability was a significant predictor of speech-in-noise performance (i.e., QuickSIN) in older listeners. Considering that the QuickSIN stimuli do not have exaggerated dynamic pitch, this finding suggests that our metric effectively captures one of the perceptual factors that contribute to individuals' speech perception ability in temporally-modulated noise (e.g., glimpsing ability).
In addition, it is worth noting that this dynamic pitch glimpsing metric is clinically appealing because it measures pitch perception without involving linguistic content, which makes it suitable for older individuals who may have difficulty with language processing. Building on the findings from the present study, it is envisioned that further work may lead to a clinical version of this test to measure one of the perceptual abilities that influence older individuals' benefit from dynamic pitch cues for speech perception in noise.
Acknowledgments
The authors thank Alexandra Brockner and Melissa Sherman for assistance with data collection. Work supported by NIH Grant Nos. R21DC017560 and R01DC012289.
Contributor Information
Jing Shen
Pamela E. Souza
References and links
- 1. Brokx J. P. L. and Nooteboom S. G., “Intonation and the perceptual separation of simultaneous voices,” J. Phonetics 10, 23–26 (1982).
- 2. Assmann P. F., “Fundamental frequency and the intelligibility of competing voices,” in 14th International Congress of Phonetic Sciences, edited by Ohala J. J., Hasegawa Y., Ohala M., Granville D., and Bailey A. C. (The Regents of the University of California, Berkeley, CA, 1999), pp. 179–182.
- 3. Bird J. and Darwin C. J., “Effects of a difference in fundamental frequency in separating two sentences,” in Psychophysical and Physiological Advances in Hearing, edited by Palmer A. R., Rees A., Summerfield A. Q., and Meddis R. (Whurr, London, United Kingdom, 1998), pp. 263–269.
- 4. Summers V. and Leek M. R., “F0 processing and the separation of competing speech signals by listeners with normal hearing and with hearing loss,” J. Speech, Lang., Hear. Res. 41, 1294–1306 (1998). 10.1044/jslhr.4106.1294
- 5. Brown M., Salverda A. P., Dilley L. C., and Tanenhaus M. K., “Expectations from preceding prosody influence segmentation in online sentence processing,” Psychonomic Bull. Rev. 18, 1189–1196 (2011). 10.3758/s13423-011-0167-9
- 6. Cutler A., “Phoneme-monitoring reaction time as a function of preceding intonation contour,” Percept. Psychophys. 20(1), 55–60 (1976). 10.3758/BF03198706
- 7. Fairbanks G., “Recent experimental investigations of vocal pitch in speech,” J. Acoust. Soc. Am. 11, 457–466 (1940). 10.1121/1.1916060
- 8. Binns C. and Culling J. F., “The role of fundamental frequency contours in the perception of speech against interfering speech,” J. Acoust. Soc. Am. 122, 1765–1776 (2007). 10.1121/1.2751394
- 9. Laures J. S. and Bunton K., “Perceptual effects of a flattened fundamental frequency at the sentence level under different listening conditions,” J. Commun. Disorders 36, 449–464 (2003). 10.1016/S0021-9924(03)00032-7
- 10. Miller S. E., Schlauch R. S., and Watson P. J., “The effects of fundamental frequency contour manipulations on speech intelligibility in background noise,” J. Acoust. Soc. Am. 128, 435–443 (2010). 10.1121/1.3397384
- 11. Shen J. and Souza P., “The effect of dynamic pitch on speech recognition in temporally modulated noise,” J. Speech, Lang., Hear. Res. 60(9), 2725–2739 (2017). 10.1044/2017_JSLHR-H-16-0389
- 12. Shen J. and Souza P., “Do older listeners with hearing loss benefit from dynamic pitch for speech recognition in noise?,” Am. J. Audiol. 26, 462–466 (2017). 10.1044/2017_AJA-16-0137
- 13. Shen J. and Souza P., “On dynamic pitch benefit for speech recognition in speech masker,” Front. Psychol. 9, 1967 (2018). 10.3389/fpsyg.2018.01967
- 14. Wu M., “Effect of F0 contour on perception of Mandarin Chinese speech against masking,” PLoS One 14(1), e0209976 (2019). 10.1371/journal.pone.0209976
- 15. Bregman A. S., Auditory Scene Analysis: The Perceptual Organization of Sound (MIT Press, Cambridge, MA, 1994).
- 16. Nooteboom S. G., Brokx J. P., and de Rooij J. J., “Contributions of prosody to speech perception,” in Studies in the Perception of Language, edited by Levelt W. J. M. and Flores d'Arcais G. B. (Wiley, New York, 1978), pp. 75–107.
- 17. Coffey E. B., Colagrosso E. M., Lehmann A., Schönwiesner M., and Zatorre R., “Individual differences in the frequency-following response: Relation to pitch perception,” PLoS One 11(3), e0152374 (2016). 10.1371/journal.pone.0152374
- 18. Chandrasekaran B., Krishnan A., and Gandour J. T., “Relative influence of musical and linguistic experience on early cortical processing of pitch contours,” Brain Lang. 108(1), 1–9 (2009). 10.1016/j.bandl.2008.02.001
- 19. Shen J., Wright R., and Souza P., “On older listeners' ability to perceive dynamic pitch,” J. Speech, Lang., Hear. Res. 59(3), 572–582 (2016). 10.1044/2015_JSLHR-H-15-0228
- 20. Miller G. A. and Licklider J. C. R., “The intelligibility of interrupted speech,” J. Acoust. Soc. Am. 22(2), 167–173 (1950). 10.1121/1.1906584
- 21. Takahashi G. A. and Bacon S. P., “Modulation detection, modulation masking, and speech understanding in noise in the elderly,” J. Speech, Lang., Hear. Res. 35(6), 1410–1421 (1992). 10.1044/jshr.3506.1410
- 22. Sheft S., Shafiro V., Lorenzi C., McMullen R., and Farrell C., “Effects of age and hearing loss on the relationship between discrimination of stochastic frequency modulation and speech perception,” Ear Hear. 33(6), 709 (2012). 10.1097/AUD.0b013e31825aab15
- 23. Sheft S., Shafiro V., Wang E., Barnes L. L., and Shah R. C., “Relationship between auditory and cognitive abilities in older adults,” PLoS One 10(8), e0134330 (2015). 10.1371/journal.pone.0134330
- 24. Green T., Faulkner A., and Rosen S., “Spectral and temporal cues to pitch in noise-excited vocoder simulations of continuous interleaved-sampling cochlear implants,” J. Acoust. Soc. Am. 112(5), 2155–2164 (2002). 10.1121/1.1506688
- 25. Rothauser E. H., Chapman W. D., Guttman N., Nordby K. S., Silbiger H. R., Urbanek G. E., and Weinstock M., “I.E.E.E. recommended practice for speech quality measurements,” IEEE Trans. Audio Electroacoust. 17(3), 225–246 (1969). 10.1109/TAU.1969.1162058
- 26. Boersma P. and Weenink D., “Praat: Doing phonetics by computer” [Computer program]. Version 5.3.51. Online: http://www.praat.org/ (Last viewed 30 October 2018).
- 27. Hirst D., “ProZed: A speech prosody analysis-by-synthesis tool for linguists,” in Proceedings of the 6th International Conference on Speech Prosody, Shanghai, 2012, pp. 15–18.
- 28. Baddeley A. D., Thomson N., and Buchanan M., “Word length and the structure of short-term memory,” J. Verbal Learn. Verbal Behav. 14(6), 575–589 (1975). 10.1016/S0022-5371(75)80045-4
- 29. Dreschler W. A., Verschuure H., Ludvigsen C., and Westermann S., “ICRA noises: Artificial noise signals with speech-like spectral and temporal properties for hearing instrument assessment (Ruidos ICRA: Señales de ruido artificial con espectro similar al habla y propiedades temporales para pruebas de instrumentos auditivos),” Audiol. 40(3), 148–157 (2001). 10.3109/00206090109073110
- 30. Swets J. A., Tanner W. P., Jr., and Birdsall T. G., “Decision processes in perception,” Psychol. Rev. 68(5), 301–340 (1961). 10.1037/h0040547
- 31. Makowski D., “The psycho package: An efficient and publishing-oriented workflow for psychological science,” J. Open Source Software 3(22), 470 (2018). 10.21105/joss.00470
- 32. Bates D., Maechler M., Bolker B., and Walker S., “lme4: Linear mixed-effects models using Eigen and S4,” R package version Vol. 1(7), pp. 1–23 (2014).
- 33. Kuznetsova A., Brockhoff P. B., and Christensen R. H. B., “lmerTest: Tests for random and fixed effects for linear mixed effect models (lmer objects of lme4 package),” R package Version 2.0-3 [Computer software].
- 34. Killion M. C., Niquette P. A., Revit L. J., and Skinner M. W., “Quick SIN and BKB-SIN, two new speech-in-noise tests permitting SNR-50 estimates in 1 to 2 min,” J. Acoust. Soc. Am. 109(5), 2502–2502 (2001). 10.1121/1.4744912
- 35. Ladd D. R., Intonational Phonology (Cambridge University Press, Cambridge, 1996).
- 36. Lehiste I., “Suprasegmental features of speech,” in Contemporary Issues in Experimental Phonetics, edited by Lass N. (Academic Press, New York, 1976), pp. 225–239.


