Abstract
The conversion of text to speech is viewed as an analysis of the input text to obtain a common underlying linguistic description, followed by a synthesis of the output speech waveform from this fundamental specification. Hence, the comprehensive linguistic structure that underlies an utterance must be discovered by analysis of the text. The pronunciation of individual words in unrestricted text is determined by morphological analysis or letter-to-sound conversion, followed by specification of the word-level stress contour. In addition, many character strings in text, such as titles, numbers, and acronyms, are abbreviations for normal words, which must be derived. To further refine these pronunciations and to discover the prosodic structure of the utterance, the part of speech of each word must be computed, followed by phrase-level parsing. From this parse the prosodic structure of the utterance can be determined, which is needed to specify the durational framework and fundamental frequency contour of the utterance. In discourse contexts, several factors such as the specification of new and old information, contrast, and pronominal reference can be used to further modify the prosodic specification. Once the prosodic correlates have been computed and the segmental sequence has been assembled, a complete input suitable for speech synthesis has been determined. Lastly, multilingual systems that use rule frameworks are mentioned, and future directions are characterized.
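As an illustration of the step in which titles, numbers, and acronyms are expanded into normal words, the following is a minimal Python sketch. It is not the method of any system discussed in the article: the `TITLES` table, the 0 to 99 number range, the rule that an all-uppercase token is an acronym to be spelled letter by letter, and all function names are assumptions made for this example.

```python
# Toy text normalization: expand titles, small numbers, and acronyms.
# All tables and rules here are illustrative assumptions, not a real lexicon.

TITLES = {"Dr.": "Doctor", "Mr.": "Mister", "Prof.": "Professor"}

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0..99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def normalize_token(token: str) -> str:
    """Map one whitespace-delimited token to its spoken-word form."""
    if token in TITLES:
        return TITLES[token]
    if token.isascii() and token.isdigit() and int(token) <= 99:
        return number_to_words(int(token))
    # Assumed rule: an all-uppercase word of 2+ letters is an acronym.
    if len(token) >= 2 and token.isalpha() and token.isupper():
        return " ".join(token)
    return token


def normalize(text: str) -> str:
    """Normalize a sentence token by token."""
    return " ".join(normalize_token(t) for t in text.split())


if __name__ == "__main__":
    print(normalize("Dr. Smith has 42 patents at MIT"))
```

In a full system this stage would feed the later steps named in the abstract (letter-to-sound conversion, part-of-speech computation, and phrase-level parsing), and the expansion of a token such as "Dr." would depend on context rather than on a fixed table.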