Abstract
This paper predicts the state of speech synthesis, speech recognition, and speaker recognition technology in the year 2001, and it describes the most important research problems to be solved in order to arrive at these ultimate synthesis and recognition systems. The problems for speech synthesis include natural and intelligible voice production, prosody control based on meaning, the capability of controlling synthesized voice quality and choosing individual speaking style, multilingual and multidialectal synthesis, choice of application-oriented speaking styles, the capability of adding emotion, and synthesis from concepts. The problems for speech recognition include recognition that is robust to speech variations, adaptation/normalization to variations due to environmental conditions and speakers, automatic knowledge acquisition for acoustic and linguistic modeling, spontaneous speech recognition, naturalness and ease of human-machine interaction, and recognition of emotion. The problems for speaker recognition are similar to those for speech recognition. The research topics related to all these techniques include the use of articulatory and perceptual constraints, and evaluation methods for measuring the quality of technology and systems.