Abstract
Mandarin sentence recognition using natural-tone and flat-tone sentences was tested in 22 subjects with sensorineural hearing loss (SNHL) and 25 listeners with normal hearing (NH) in quiet, speech-shaped noise, and two-talker-babble conditions. While flat tones had little effect on sentence recognition in the NH listeners when the signal-to-noise ratio (SNR) was ≥0 dB, the SNHL listeners showed decreases in flat-tone-sentence recognition in quiet and at +5-dB SNR. This decline in performance was correlated with their degree of hearing loss. Lexical tone contributes greatly to sentence recognition in hearing-impaired listeners in both quiet and noisy listening conditions.
1. Introduction
Lexical tone distinguishes the lexical meaning of syllables in tonal languages such as Mandarin Chinese (Howie, 1976). Acoustically, lexical tone is represented by the voice pitch or fundamental frequency (F0). There are four main Mandarin tones, each of which has a distinct pitch pattern (see Xu and Zhou, 2011). The pitch patterns of tone 1 through tone 4 are (1) high-flat, (2) rising, (3) falling-rising, and (4) falling (Lee and Hung, 2008). In addition, a tone 0 that usually appears at the end of Mandarin adjectives or certain nouns has a flat pitch pattern with a very short duration (approximately 100 ms in connected speech) (Yang et al., 2017). While lexical tones are essential for identification of monosyllabic words in Mandarin, the importance of lexical tones in connected speech has been shown to be reduced (Surendran and Levow, 2004).
Are lexical tones important for comprehension at the sentence level in tonal languages? Xu (2006) showed that the acoustic distinction among lexical tones is diminished in natural running speech. Feng et al. (2012) examined Mandarin tone recognition and sentence recognition using sine-wave speech processing, in which F0 information was removed. Results showed that sine-wave tone recognition was only slightly above chance but sine-wave sentence recognition was nearly perfect. The authors indicated that the importance of lexical tones for sentence recognition was limited, and that top-down processes contributed to the high level of recognition of sine-wave sentences. In a series of studies using pitch-flattened sentences, researchers showed that for adult listeners with normal hearing (NH) the flat-tone Mandarin sentences were just as intelligible as the natural-tone sentences in a quiet background (Patel et al., 2010; Wang et al., 2013; Xu et al., 2013; Chen et al., 2014). However, in a listening condition with moderate levels of noise [e.g., at 0-dB signal-to-noise ratio (SNR)], the intelligibility of the flat-tone sentences dropped approximately 30 percentage points below that of the natural-tone sentences (Chen et al., 2014). Therefore, whereas lexical tones are probably dispensable for Mandarin sentence comprehension in quiet because of information redundancy and top-down compensation, they are important for the recognition of Mandarin sentences in adverse listening conditions.
In listeners with sensorineural hearing loss (SNHL), lexical tone recognition is fairly robust in quiet (Wang et al., 2015; Wang et al., 2016). In Wang et al. (2016), 42 patients with SNHL were tested for tone recognition in quiet. Results showed that tone recognition scores were 99.4%, 95.0%, and 86.9% correct for mild, moderate, and severe degrees of hearing loss, respectively. However, limited data are available on tone recognition in noise in listeners with SNHL, and the contributions of lexical tone to Mandarin sentence recognition in hearing-impaired listeners under noisy conditions are not known. In a recent study, Chen et al. (2019) examined sentence recognition using flat-tone and random-tone Mandarin sentences in 32 listeners with moderate to severe SNHL. They found that sentence recognition decreased by approximately 40 percentage points in the presence of mismatched lexical tone information both in quiet and in noise. These results suggested that lexical tone is of great importance for hearing-impaired speakers of tonal languages in perceiving sentences in both quiet and noise. However, their subjects were all tested with bilateral hearing aids. Hearing aid processing typically involves amplitude compression and noise-reduction algorithms. Therefore, their results might have been confounded by the effects of hearing aid processing on tone and sentence recognition in noise.
In the present study, to avoid the effects of hearing aid processing on tone and sentence recognition in noise, we recruited subjects whose hearing loss was moderate so that they could hear the speech stimuli without the need for a hearing aid. We attempted to address the contributions of lexical tone to Mandarin sentence recognition in hearing-impaired listeners without the use of any hearing devices. In addition, hearing-impaired listeners often show more susceptibility to fluctuating noise than to steady-state noise (e.g., Bernstein and Grant, 2009; Ozmeral et al., 2016; Hu et al., 2018). Thus, two types of noise [i.e., speech-shaped noise (SSN) and two-talker babble (TTB)] were used in the present study to examine the different masking effects when tone information is removed from the target sentences.
2. Methods
2.1. Subjects
Twenty-two Mandarin-speaking listeners with sensorineural hearing loss (11 males and 11 females) were recruited to participate in the present study. Their ages ranged between 18 and 55 years with an average age of 34.81 ± 10.47 years. Figure 1 shows the audiograms of the subjects. The group mean pure-tone average of 500 to 4000 Hz (PTA500 to 4000 Hz) was 54.41 ± 8.70 dB hearing level (HL). One subject had mild hearing loss (as defined by PTA of 20–40 dB HL), 20 subjects had moderate hearing loss (as defined by PTA of 41–70 dB HL), and one subject had severe hearing loss (as defined by PTA of 71–95 dB HL). The hearing loss was symmetric in both ears with an interaural difference of PTA500 to 4000 Hz ≤ 10 dB except for one subject whose interaural difference of PTA500 to 4000 Hz was 16.25 dB. As controls, 25 NH adults were also recruited to participate in the study. The NH group had a mean age of 22.25 ± 1.71 years. Their hearing thresholds were all ≤20 dB HL from 250 to 8000 Hz. The use of human subjects was reviewed and approved by the Institutional Review Boards of Beijing Tongren Hospital and Ohio University.
Fig. 1.
(Color online) Audiograms of all hearing-impaired subjects (N = 22). Each thin line represents the audiogram of one subject. The thick gray line represents the group mean data.
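The PTA-based grouping and the symmetry criterion described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the PTA is the arithmetic mean of thresholds at 500, 1000, 2000, and 4000 Hz (the text gives only the range), it assigns fractional values that fall between the stated bands to the higher band, and the function names and the example thresholds are hypothetical rather than data from the study.

```python
# Assumption: PTA is the mean threshold at these four frequencies (Hz).
PTA_FREQS_HZ = (500, 1000, 2000, 4000)


def pure_tone_average(thresholds_db_hl):
    """Mean threshold (dB HL) over PTA_FREQS_HZ for one ear."""
    return sum(thresholds_db_hl[f] for f in PTA_FREQS_HZ) / len(PTA_FREQS_HZ)


def classify_hearing_loss(pta_db_hl):
    """Map a PTA to the bands quoted in the text (mild 20-40, moderate 41-70, severe 71-95 dB HL)."""
    if pta_db_hl < 20:
        return "normal"      # assumption: label for values below the mild band
    if pta_db_hl <= 40:
        return "mild"
    if pta_db_hl <= 70:
        return "moderate"
    if pta_db_hl <= 95:
        return "severe"
    return "profound"        # assumption: label for values above the severe band


def is_symmetric(pta_left, pta_right, max_diff_db=10.0):
    """Symmetry criterion from the text: interaural PTA difference <= 10 dB."""
    return abs(pta_left - pta_right) <= max_diff_db


# Hypothetical audiogram (dB HL per frequency), not a subject from the study.
left = {500: 45, 1000: 50, 2000: 60, 4000: 65}
right = {500: 40, 1000: 50, 2000: 55, 4000: 60}
pta_left, pta_right = pure_tone_average(left), pure_tone_average(right)
print(pta_left, pta_right, classify_hearing_loss(min(pta_left, pta_right)),
      is_symmetric(pta_left, pta_right))
```

With thresholds measured in 5-dB steps, a four-frequency average moves in 1.25-dB steps, which is consistent with the 16.25-dB interaural difference reported for one subject.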
2.2. Test materials
The Mandarin Hearing in Noise Test (MHINT) (Wong et al., 2007) was used for the sentence recognition tests. The original MHINT materials were recorded by a male speaker. These materials were used as the natural-tone sentences for the recognition test in the present study. To generate flat-tone sentence materials, a male Mandarin-speaking professional announcer was recorded producing all the words in the MHINT sentences with only the flat tone (i.e., tone 1). The recordings were made at a sampling rate of 48 000 Hz and a resolution of 16 bits and were then down-sampled to 22 050 Hz for presentation. Figure 2 shows an example of the waveforms, spectrograms, and F0 contours of a sentence with natural tones and flat tones. In this example, the F0 range of the original sentence was 97–279 Hz (mean ± SD: 194 ± 49 Hz), whereas the F0 range of the flat-tone sentence was 163–245 Hz (mean ± SD: 210 ± 12 Hz).
Fig. 2.
(Color online) The waveforms, spectrograms, and F0 contours of an MHINT sentence in natural tone (left) and flat tone (right) conditions. The sentence was “最近宿舍里有很多蚊子 (There are lots of mosquitos in the dorms these days).”
Two types of maskers were used for sentence recognition in noise: SSN and TTB. The SSN was from the original MHINT materials, which was generated by deriving the long-term spectrum of all MHINT sentences and then using it to filter a white noise. The TTB was generated by mixing the root-mean-square-equalized recordings of speech samples produced by one male and one female Mandarin-speaking talker. The amplitude of the noise was adjusted to create different signal-to-noise ratios (SNRs) (i.e., +5, 0, and –5 dB).
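The level adjustment described above can be sketched as follows. This is a minimal illustration under stated assumptions: the SNR is taken to be the ratio of the overall root-mean-square levels of target and masker (the text says only that the noise amplitude was adjusted), and the sine wave and Gaussian noise are synthetic stand-ins for an MHINT sentence and a masker. All function names are hypothetical.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def scale_masker_to_snr(target, masker, snr_db):
    """Scale the masker so that 20*log10(rms(target)/rms(masker)) equals snr_db."""
    gain = rms(target) / (rms(masker) * 10 ** (snr_db / 20))
    return [gain * m for m in masker]


def achieved_snr_db(target, masker):
    """SNR in dB computed from overall RMS levels (an assumed definition)."""
    return 20 * math.log10(rms(target) / rms(masker))


# Synthetic stand-ins: one second at 22 050 Hz.
rng = random.Random(0)
fs = 22050
target = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]
masker = [rng.gauss(0.0, 1.0) for _ in range(fs)]

for snr_db in (5, 0, -5):
    scaled = scale_masker_to_snr(target, masker, snr_db)
    mixture = [t + m for t, m in zip(target, scaled)]  # stimulus that would be presented
    print(snr_db, round(achieved_snr_db(target, scaled), 2))
```

Keeping the target fixed and scaling only the masker mirrors the description in the text, where the noise amplitude, not the speech level, sets the SNR.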
2.3. Procedures
NH listeners were tested in quiet and in noisy conditions with natural-tone and flat-tone MHINT sentences. For each test condition, a list of 10 MHINT sentences was used. Therefore, each NH participant listened to a total of 140 sentences for the 14 test conditions (2 noise types × 3 SNRs, plus Quiet, for both natural-tone and flat-tone sentences). The listeners with SNHL were also tested in quiet and in noisy conditions with natural-tone and flat-tone MHINT sentences. However, to avoid floor effects in sentence recognition at the less favorable SNRs, they were only tested at an SNR of +5 dB for the noisy conditions. Thus, each participant with SNHL listened to a total of 60 sentences (2 noise types × 1 SNR, plus Quiet, for both natural-tone and flat-tone sentences).
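The condition counts above follow directly from the factorial design. The short sketch below enumerates them; the tuple layout and function name are illustrative choices, not part of the original test software.

```python
from itertools import product

SENTENCES_PER_LIST = 10
TONE_TYPES = ("natural", "flat")
MASKERS = ("SSN", "TTB")


def build_conditions(snrs_db):
    """Return (tone, masker, snr) tuples: quiet plus every masker x SNR, per tone type."""
    conditions = []
    for tone in TONE_TYPES:
        conditions.append((tone, "quiet", None))
        conditions.extend((tone, masker, snr) for masker, snr in product(MASKERS, snrs_db))
    return conditions


nh_conditions = build_conditions(snrs_db=(5, 0, -5))
snhl_conditions = build_conditions(snrs_db=(5,))

print(len(nh_conditions), len(nh_conditions) * SENTENCES_PER_LIST)      # 14 140
print(len(snhl_conditions), len(snhl_conditions) * SENTENCES_PER_LIST)  # 6 60
```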
The tests were administered through a custom program written in MATLAB. The stimuli were delivered bilaterally via Sennheiser HD 280 Pro headphones, and the presentation level was adjusted to ensure audibility at the most comfortable level for each subject. The subjects were required to repeat the sentences they heard. Since each list of 10 MHINT sentences contained 100 Chinese words, a percent correct score was obtained for each condition by counting all words correctly repeated by the subject. The order of the MHINT sentence lists and the order of the sentences within each list were randomized. Each sentence was presented only once.
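The word-level scoring described above can be sketched as follows. This is an illustration only: in the study the responses were scored from the subjects' spoken repetitions, and the exact matching rule is not stated. The sketch assumes a target word counts as correct if it appears in the response, with each response word usable once; the tokens and function names are hypothetical placeholders.

```python
from collections import Counter


def words_correct(target_words, response_words):
    """Number of target words found in the response (each response word used once)."""
    overlap = Counter(target_words) & Counter(response_words)
    return sum(overlap.values())


def percent_correct(targets, responses):
    """Percent of all target words in a list that were correctly repeated."""
    total = sum(len(t) for t in targets)
    correct = sum(words_correct(t, r) for t, r in zip(targets, responses))
    return 100.0 * correct / total


# Placeholder list: 10 sentences of 10 words each (100 words), as in one MHINT list.
targets = [[f"s{i}w{j}" for j in range(10)] for i in range(10)]
# Hypothetical responses: the last word is missed in the first five sentences.
responses = [t[:-1] if i < 5 else list(t) for i, t in enumerate(targets)]

print(percent_correct(targets, responses))
```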
A short practice session preceded the real test and included one sentence in each of the test conditions (i.e., 14 sentences for the NH subjects and 6 sentences for the subjects with SNHL). The sentences used in the practice session were not the same sentences used in the real tests. The purpose of the practice session was to familiarize the subjects with the testing procedure.
3. Results and discussion
Figure 3 shows the results of speech recognition with natural-tone sentences and flat-tone sentences by listeners with NH and listeners with SNHL. For the NH listeners, both natural-tone sentences and flat-tone sentences were recognized nearly 100% correct in quiet and in +5-dB-SNR noises. Recognition only slightly decreased (∼95% correct) with the flat-tone sentences in both types of maskers at 0-dB SNR. At the SNR of −5 dB, the recognition dropped for both natural-tone and flat-tone sentences. In the SSN condition at −5-dB SNR, natural-tone and flat-tone sentences were recognized 81.0% and 62.3% correct, respectively. In the TTB condition at −5-dB SNR, natural-tone and flat-tone sentences were recognized 86.4% and 78.3% correct, respectively. A generalized linear model (GLM) was constructed to evaluate the main effects of (1) SNR, (2) type of sentences, and (3) type of maskers on sentence recognition of the NH listeners. The GLM analysis showed that the main effects of SNR and type of sentences were statistically significant (both p < 0.0001) but the main effect of type of maskers was not statistically significant (p = 0.5567). The differences in recognition at −5-dB SNR between the two types of maskers were due to the strong interaction between the type of sentences and the type of maskers (p < 0.001). This interaction indicated that for the NH listeners, (1) the SSN exerted more masking than the TTB and (2) the SSN exerted more masking for the flat-tone sentences than for the natural-tone sentences.
Fig. 3.
(Color online) Group mean sentence recognition scores in quiet and noise. The dashed and solid lines are data from listeners with NH and SNHL, respectively. The square and circle symbols represent data with natural- and flat-tone sentences, respectively. The filled and open symbols represent listening conditions with SSN and TTB, respectively.
Our results showed that the natural-tone sentences were more resistant to the masking effects of noise than the flat-tone sentences. The patterns of reduced recognition with flat-tone sentences were similar to those in previous studies (Patel et al., 2010; Wang et al., 2013; Chen et al., 2014). However, there were certain minor differences between the present study and previous studies. For example, in a study of NH listeners (Patel et al., 2010), the natural-tone and flat-tone Mandarin sentences at an SNR of 0 dB were recognized approximately 80% and 60% correct, respectively. On the other hand, Chen et al. (2014) found that those sentences were recognized approximately 100% and 70% correct at the same 0-dB SNR. The sentences in Patel et al. (2010) were fairly long (with an average length of 18 syllables per sentence) and considered to be more difficult than the MHINT sentences (all 10 syllables per sentence) that were used in Chen et al. (2014) and the present study. The present study showed only slightly reduced recognition (∼95% correct) with flat-tone sentences at 0-dB SNR. The more apparent decrease in recognition occurred at an SNR of −5 dB in the present study. The differences between the present study and Chen et al. (2014) were probably due to the way that the flat-tone sentences were generated. Chen et al. (2014) used a text-to-speech program to synthesize the flat-tone sentences, whereas the present study used human-produced flat-tone sentences.
The listeners with SNHL showed markedly lower recognition scores in both quiet and noisy conditions than the NH listeners (Fig. 3). In quiet, listeners with SNHL obtained scores of 94.9% and 83.0% correct using natural-tone and flat-tone sentences, respectively. At +5-dB SNR, while the NH listeners achieved nearly perfect recognition with natural-tone and flat-tone sentences, sentence recognition in the listeners with SNHL declined dramatically. Sentence recognition with the natural-tone sentences was 82.5% and 69.4% correct in SSN and TTB masking conditions, respectively. With the flat-tone sentences, however, sentence recognition dropped further to 61.2% and 44.9% correct in SSN and TTB masking conditions, respectively. A GLM was constructed to test the statistical significance of the main effects of the following four factors for both the NH and SNHL listeners: (1) hearing status (i.e., NH vs SNHL), (2) type of sentences, (3) type of maskers, and (4) SNR. The GLM analysis indicated that all four main effects on sentence recognition were statistically significant (all p < 0.0001).
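To make the size of the flat-tone deficit easier to compare across groups and maskers, the group means quoted above can be reduced to a natural-minus-flat difference in percentage points. The sketch below is simple arithmetic on the reported means, not a reanalysis of the data; the dictionary layout and function name are illustrative.

```python
# Group mean scores (percent correct) quoted in the text: (natural-tone, flat-tone).
SCORES = {
    ("NH", "SSN", -5): (81.0, 62.3),
    ("NH", "TTB", -5): (86.4, 78.3),
    ("SNHL", "quiet", None): (94.9, 83.0),
    ("SNHL", "SSN", 5): (82.5, 61.2),
    ("SNHL", "TTB", 5): (69.4, 44.9),
}


def tone_benefit(natural, flat):
    """Natural-tone minus flat-tone score, in percentage points (rounded to 0.1)."""
    return round(natural - flat, 1)


benefits = {cond: tone_benefit(*pair) for cond, pair in SCORES.items()}
for (group, masker, snr_db), benefit in benefits.items():
    print(group, masker, snr_db, benefit)
```

For the NH listeners at −5-dB SNR the difference is 18.7 points in SSN versus 8.1 points in TTB, which is the sentence-type by masker-type interaction described earlier. For the listeners with SNHL the difference is 11.9 points in quiet and grows to 21.3 (SSN) and 24.5 (TTB) points at +5-dB SNR.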
In a recent study of Mandarin-speaking, hearing-impaired listeners with bilateral hearing aids, Chen et al. (2019) found that sentence recognition with flat-tone sentences was approximately 40 percentage points lower than that with natural-tone sentences in quiet and in noisy conditions. The noise used in Chen et al. (2019) was SSN at +8-dB SNR. Several other design differences make it difficult to directly compare the results of the present study with those of Chen et al. (2019). Different speakers were used to record the test materials. The flat-tone sentences were generated in different ways in the two studies, one with human production and one using a text-to-speech synthesizer. The degrees of hearing loss of the listeners in Chen et al. (2019) were more severe than in the present study. More importantly, all of their listeners wore bilateral hearing aids whereas the present study did not involve the use of hearing aids. We attempted to examine the contributions of Mandarin tones to sentence recognition in the hearing-impaired listeners without the confounding effects of hearing aids. Regardless of the differences in study design between the present study and Chen et al. (2019), the main findings of the two studies were consistent with each other: lexical tone is important for sentence recognition in noise for listeners with SNHL.
Sentence recognition with the flat-tone sentences in both types of maskers showed large individual variability among the subjects with SNHL. Figure 4 plots the average scores using flat-tone sentences under both SSN and TTB conditions for the 22 listeners with SNHL as a function of their better-ear PTA500 to 4000 Hz. The mean flat-tone sentence recognition score in noise was moderately correlated with the degree of hearing loss (r = −0.523, p = 0.0114). The least-squares fit of the data shown in Fig. 4 had a slope of approximately −18 percentage points/10 dB hearing loss. The mean natural-tone sentence recognition score in the two types of noise was also moderately correlated with the degree of hearing loss (r = −0.466, p = 0.0279). The slope of the linear fit of the natural-tone sentence recognition versus degree of hearing loss was approximately −14 percentage points/10 dB hearing loss. The steeper slope in the flat-tone condition than in the natural-tone condition (−18 versus −14 percentage points/10 dB hearing loss) suggested that removing lexical tone information had a greater detrimental effect on Mandarin sentence recognition for listeners with a greater degree of hearing loss. However, when we further analyzed the relationship between the recognition difference (natural tone − flat tone) and the degree of hearing loss, the correlation was not significant (r = 0.1999, p = 0.3791), so this trend should be interpreted with caution.
Fig. 4.
Individual scores of sentence recognition in noise as a function of better-ear PTA500 to 4000 Hz in listeners with SNHL (N = 22). The sentence recognition score was the average score using flat-tone sentences under both SSN and TTB conditions for each listener.
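The correlation and linear fit reported for Fig. 4 can be sketched as follows. This is a minimal illustration, assuming an ordinary Pearson correlation and a simple least-squares line; the individual data are not listed in the text, so the four (PTA, score) pairs below are invented solely to exercise the functions. The last function converts a reported r and sample size into the t statistic (n − 2 degrees of freedom) that underlies the usual significance test for a correlation.

```python
import math


def _centered_sums(x, y):
    """Return (Sxy, Sxx, Syy) about the means of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    sxy, sxx, syy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def least_squares_slope(x, y):
    """Slope of the least-squares line of y on x (units of y per unit of x)."""
    sxy, sxx, _ = _centered_sums(x, y)
    return sxy / sxx


def t_statistic(r, n):
    """t statistic with n - 2 degrees of freedom for testing r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Invented example values (NOT data from the study): better-ear PTA in dB HL, score in %.
pta = [40, 50, 60, 70]
score = [80, 60, 45, 20]
print(pearson_r(pta, score), 10 * least_squares_slope(pta, score))

# t statistic implied by the flat-tone correlation reported in the text (r = -0.523, N = 22).
print(round(t_statistic(-0.523, 22), 2))
```

Multiplying the per-dB slope by 10 gives the "percentage points/10 dB hearing loss" unit used in the text.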
It is worth noting that while the SSN exerted more masking than the TTB for recognition of both the natural-tone and the flat-tone sentences in the NH listeners, the reverse was true in the listeners with SNHL (Fig. 3). The TTB had more fluctuation in amplitude than the steady-state SSN. This might have allowed listening in the dips, or glimpsing, to facilitate speech recognition in the NH listeners (Füllgrabe et al., 2006; Ewert et al., 2017). On the other hand, hearing-impaired listeners have been shown to have reduced frequency resolution (Wang et al., 2015) and impaired temporal fine structure processing (Hopkins and Moore, 2009). As a result, they may not have been able to take advantage of the fluctuations in amplitude in the TTB maskers to glimpse speech information that could improve speech recognition (Ozmeral et al., 2016; Hu et al., 2018). Our results indicated that the hearing-impaired listeners performed even worse in the TTB than in the SSN conditions for both natural-tone and flat-tone sentences.
In summary, the present study showed that lexical tone was not essential for tonal-language sentence recognition for the NH listeners in quiet or in a small amount of noise (e.g., SNR ≥ 0 dB). The importance of lexical tone was manifested in more adverse noisy conditions as Mandarin sentence recognition was better with tonal information preserved than removed in those listening conditions. On the other hand, for the listeners with SNHL, lexical tone was important even in quiet listening conditions. Its importance was greater in noisy conditions. Fluctuating maskers, such as TTB, exerted stronger masking effects on sentence recognition in listeners with SNHL. The recognition with the flat-tone sentences was negatively correlated with the degree of hearing loss.
Acknowledgments
The authors are grateful to Fei Chen who kindly provided the sentence recordings. Jing Yang and Lexi Neltner provided technical and editorial assistance in the preparation of the manuscript. This study was partially supported by grants from the NIH/NIDCD (Grant No. R15-DC014587), the National Natural Science Foundation of China (Grants Nos. 81870715 and 81200754), the Promotion Grant for High-Level Scientific and Technological Elites in Medical Science from the Beijing Municipal Health Bureau (Grant No. 2015-3-012), and the 2017 Excellent Talents in Medical Science from the Dongcheng District of Beijing.
Contributor Information
Nan Li
Shuo Wang
Xianhui Wang
Li Xu
References and links
- 1. Bernstein, J. G., and Grant, K. W. (2009). “Auditory and auditory-visual intelligibility of speech in fluctuating maskers for normal-hearing and hearing-impaired listeners,” J. Acoust. Soc. Am. 125(5), 3358–3372. doi:10.1121/1.3110132
- 2. Chen, F., Wong, L. L. N., and Hu, Y. (2014). “Effects of lexical tone contour on Mandarin sentence intelligibility,” J. Speech Lang. Hear. Res. 57(1), 338–345. doi:10.1044/1092-4388(2013/12-0324)
- 3. Chen, Y., Wong, L. L. N., Qian, J., Kuehnel, V., Voss, S. C., and Chen, F. (2019). “The role of lexical tone information in the recognition of Mandarin sentences in listeners with hearing aids,” Ear Hear., in press.
- 4. Ewert, S. D., Schubotz, W., Brand, T., and Kollmeier, B. (2017). “Binaural masking release in symmetric listening conditions with spectro-temporally modulated maskers,” J. Acoust. Soc. Am. 142, 12–28. doi:10.1121/1.4990019
- 5. Feng, Y. M., Xu, L., Zhou, N., Yang, G., and Yin, S. (2012). “Sine-wave speech recognition in a tonal language,” J. Acoust. Soc. Am. 131(2), EL133–EL138. doi:10.1121/1.3670594
- 6. Füllgrabe, C., Berthommier, F., and Lorenzi, C. (2006). “Masking release for consonant features in temporally fluctuating background noise,” Hear. Res. 211(1-2), 74–84. doi:10.1016/j.heares.2005.09.001
- 7. Hopkins, K., and Moore, B. C. (2009). “The contribution of temporal fine structure to the intelligibility of speech in steady and modulated noise,” J. Acoust. Soc. Am. 125(1), 442–446. doi:10.1121/1.3037233
- 8. Howie, J. M. (1976). Acoustical Studies of Mandarin Vowels and Tones (Cambridge University Press, Cambridge, England).
- 9. Hu, H., Dietz, M., Williges, B., and Ewert, S. D. (2018). “Better-ear glimpsing with symmetrically-placed interferers in bilateral cochlear implant users,” J. Acoust. Soc. Am. 143, 2128–2141. doi:10.1121/1.5030918
- 10. Lee, C. Y., and Hung, T. H. (2008). “Identification of Mandarin tones by English-speaking musicians and nonmusicians,” J. Acoust. Soc. Am. 124(5), 3235–3248. doi:10.1121/1.2990713
- 11. Ozmeral, E. J., Buss, E., and Hall, J. W. III (2016). “The effects of sensorineural hearing impairment on asynchronous glimpsing of speech,” PLoS ONE 11(5), e0154920. doi:10.1371/journal.pone.0154920
- 12. Patel, A. D., Xu, Y., and Wang, B. (2010). “The role of F0 variation in the intelligibility of Mandarin sentences,” in Proceedings of Speech Prosody 2010, Chicago.
- 13. Surendran, D., and Levow, G. A. (2004). “The functional load of tone in Mandarin is as high as that of vowels,” in International Conference on Speech Prosody, pp. 99–102.
- 14. Wang, J., Shu, H., Zhang, L., Liu, Z., and Zhang, Y. (2013). “The roles of fundamental frequency contours and sentence context in Mandarin Chinese speech intelligibility,” J. Acoust. Soc. Am. 134(1), EL91–EL97. doi:10.1121/1.4811159
- 15. Wang, S., Dong, R., Liu, D., Wang, Y., Liu, B., Zhang, L., and Xu, L. (2015). “The role of temporal envelope and fine structure in Mandarin lexical tone perception in auditory neuropathy spectrum disorder,” PLoS ONE 10(6), e0129710. doi:10.1371/journal.pone.0129710
- 16. Wang, S., Dong, R., Liu, D., Wang, Y., Mao, Y., Zhang, H., Zhang, L., and Xu, L. (2016). “Perceptual separation of sensorineural hearing loss and auditory neuropathy spectrum disorder,” Laryngoscope 126, 1420–1425. doi:10.1002/lary.25595
- 17. Wong, L. L. N., Soli, S. D., Liu, S., Han, N., and Huang, M. W. (2007). “Development of the Mandarin Hearing in Noise Test (MHINT),” Ear Hear. 28(2), 70s–74s. doi:10.1097/AUD.0b013e31803154d0
- 18. Xu, G., Zhang, L., Shu, H., Wang, X., and Li, P. (2013). “Access to lexical meaning in pitch-flattened Chinese sentences: An fMRI study,” Neuropsychologia 51(3), 550–556. doi:10.1016/j.neuropsychologia.2012.12.006
- 19. Xu, L., and Zhou, N. (2011). “Tonal languages and cochlear implants,” in Auditory Prostheses: New Horizons, edited by F. G. Zeng, A. N. Popper, and R. R. Fay (Springer, New York), pp. 341–364.
- 20. Xu, Y. (2006). “Tone in connected discourse,” in Encyclopedia of Language and Linguistics, 2nd ed., edited by K. Brown (Elsevier, Oxford), Vol. 12, pp. 742–750.
- 21. Yang, J., Zhang, Y., Li, A., and Xu, L. (2017). “On the duration of Mandarin tones,” in Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech 2017), Stockholm, Sweden, pp. 1407–1411.




