Toward the ultimate synthesis/recognition system

S Furui

Proc Natl Acad Sci U S A. 1995 Oct 24;92(22):10040–10045. doi: 10.1073/pnas.92.22.10040

PMCID: PMC40732 PMID: 7479723

Abstract

This paper predicts speech synthesis, speech recognition, and speaker recognition technology for the year 2001, and it describes the most important research problems to be solved in order to arrive at these ultimate synthesis and recognition systems. The problems for speech synthesis include natural and intelligible voice production, prosody control based on meaning, capability of controlling synthesized voice quality and choosing individual speaking style, multilingual and multidialectal synthesis, choice of application-oriented speaking styles, capability of adding emotion, and synthesis from concepts. The problems for speech recognition include robust recognition against speech variations, adaptation/normalization to variations due to environmental conditions and speakers, automatic knowledge acquisition for acoustic and linguistic modeling, spontaneous speech recognition, naturalness and ease of human-machine interaction, and recognition of emotion. The problems for speaker recognition are similar to those for speech recognition. The research topics related to all these techniques include the use of articulatory and perceptual constraints and evaluation methods for measuring the quality of technology and systems.

Images in this article

Fig. 3 (p. 10042)


