Proceedings of the National Academy of Sciences of the United States of America. 1995 Oct 24;92(22):9946–9952. doi: 10.1073/pnas.92.22.9946

Linguistic aspects of speech synthesis.

J Allen
PMCID: PMC40716  PMID: 7479807

Abstract

The conversion of text to speech is seen as an analysis of the input text to obtain a common underlying linguistic description, followed by a synthesis of the output speech waveform from this fundamental specification. Hence, the comprehensive linguistic structure serving as the substrate for an utterance must be discovered by analysis from the text. The pronunciation of individual words in unrestricted text is determined by morphological analysis or letter-to-sound conversion, followed by specification of the word-level stress contour. In addition, many text character strings, such as titles, numbers, and acronyms, are abbreviations for normal words, which must be derived. To further refine these pronunciations and to discover the prosodic structure of the utterance, the part of speech of each word must be computed, followed by a phrase-level parse. From this parse the prosodic structure of the utterance can be determined, which is needed in order to specify the durational framework and fundamental frequency contour of the utterance. In discourse contexts, several factors such as the specification of new and old information, contrast, and pronominal reference can be used to further modify the prosodic specification. When the prosodic correlates have been computed and the segmental sequence is assembled, a complete input suitable for speech synthesis has been determined. Lastly, multilingual systems utilizing rule frameworks are mentioned, and future directions are characterized.
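
The step in which titles, numbers, and acronyms are expanded into normal words can be illustrated with a minimal Python sketch. It is not the system described in the article: the title lexicon, the 0 to 999 number range, the rule that any all-capitals token is spelled letter by letter, and the function names `number_to_words`, `expand_token`, and `normalize` are all assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicon of title abbreviations (illustrative only).
TITLES = {"dr.": "doctor", "mr.": "mister", "prof.": "professor"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999 (an assumed limit)."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    head = ONES[hundreds] + " hundred"
    return head + (" " + number_to_words(rest) if rest else "")


def expand_token(token: str) -> str:
    """Map one whitespace-delimited token to ordinary words."""
    if token.lower() in TITLES:
        return TITLES[token.lower()]
    if re.fullmatch(r"\d{1,3}", token):
        return number_to_words(int(token))
    if re.fullmatch(r"[A-Z]{2,}", token):
        # Assumed rule: spell an all-capitals acronym letter by letter.
        return " ".join(token)
    return token


def normalize(text: str) -> str:
    """Expand every token of the input and rejoin with single spaces."""
    return " ".join(expand_token(t) for t in text.split())
```

Under these assumptions, `normalize("Dr. Smith read 42 pages at MIT")` returns `"doctor Smith read forty-two pages at M I T"`. A real system would need context to choose among readings (for example, "Dr." as "doctor" or "drive"), which is one reason the later part-of-speech and parsing stages matter.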



