Proceedings of the National Academy of Sciences of the United States of America. 1995 Oct 24;92(22):10040–10045. doi: 10.1073/pnas.92.22.10040

Toward the ultimate synthesis/recognition system.

S Furui
PMCID: PMC40732  PMID: 7479723

Abstract

This paper predicts speech synthesis, speech recognition, and speaker recognition technology for the year 2001, and it describes the most important research problems to be solved in order to arrive at these ultimate synthesis and recognition systems. The problems for speech synthesis include natural and intelligible voice production, prosody control based on meaning, capability of controlling synthesized voice quality and choosing individual speaking style, multilingual and multidialectal synthesis, choice of application-oriented speaking styles, capability of adding emotion, and synthesis from concepts. The problems for speech recognition include robust recognition against speech variations, adaptation/normalization to variations due to environmental conditions and speakers, automatic knowledge acquisition for acoustic and linguistic modeling, spontaneous speech recognition, naturalness and ease of human-machine interaction, and recognition of emotion. The problems for speaker recognition are similar to those for speech recognition. The research topics related to all these techniques include the use of articulatory and perceptual constraints and evaluation methods for measuring the quality of technology and systems.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
