Indian Journal of Otolaryngology and Head & Neck Surgery
. 2021 Jul 21;74(Suppl 3):5082–5090. doi: 10.1007/s12070-021-02765-9

Future Solutions for Voice Rehabilitation in Laryngectomees: A Review of Technologies Based on Electrophysiological Signals

Nithin Prakasan Nair 1, Vidhu Sharma 1, Abhinav Dixit 2, Darwin Kaushal 3, Kapil Soni 1, Bikram Choudhury 1, Amit Goyal 1
PMCID: PMC9895460  PMID: 36742837

Abstract

Loss of voice is a serious concern for a laryngectomee and should be addressed before planning the procedure. Patients must be educated about voice rehabilitation options before surgery. Even though many devices are in use, each has its limitations. We searched for probable future technologies for voice rehabilitation in laryngectomees, with the aim of familiarising the ENT fraternity with them. We performed a bibliographic search of Medline, CINAHL, EMBASE, Web of Science and Google Scholar, using title/abstract searches and Medical Subject Headings (MeSH) where appropriate, for publications from January 1985 to January 2020. Results with scope for the development of a device for speech rehabilitation were included in the review. A total of 1036 articles were identified and screened; after careful scrutiny, 40 articles were included in this study. The silent speech interface is one topic being studied extensively. It is based on various electrophysiological biosignals, such as non-audible murmur, electromyography, ultrasound characteristics of the vocal folds and optical imaging of the lips and tongue, electromagnetic articulography and electroencephalography. Electromyographic signals have been studied in laryngectomised patients. Silent speech interfaces may be the answer for the future of voice rehabilitation in laryngectomees. However, all these technologies are at a primitive stage, with the potential to be developed into a speech device.

Keywords: Rehabilitation, Laryngectomy, Voice, Laryngeal cancer, Communication aids

Introduction

Total laryngectomy is the surgical management of resectable advanced laryngeal cancers. As voice forms part of a person's identity, loss of voice is a serious concern for laryngeal cancer patients considering such a procedure [1, 2]. Voice rehabilitation is an important consideration for a patient after laryngectomy. Hence, appropriate voice rehabilitation techniques have to be explained to patients during counselling for laryngectomy. To date, there are many voice rehabilitation techniques for laryngectomees [3–6].

The various rehabilitation techniques routinely practised are:

  1. Pseudo whispering, whistling, and gestures

  2. Esophageal Speech

  3. Electrolarynx

  4. Tracheoesophageal prosthesis

Even though these techniques are available, some patients still hesitate to undergo the procedure because of the prospect of voice loss, and the existing technologies have their own deficiencies. Patients need to be trained for esophageal speech, yet only about one-third master the art. Tracheoesophageal prosthesis users are dependent on the prosthesis and have to occlude their stoma while speaking; these patients are at risk of stomal granulations or stomal infections. People who use an electrolarynx have a monotonous, mechanical (robotic) voice. Hence, there is an ongoing search for a more compatible and acceptable device or technology.

Newer devices and technologies have been developed for voice rehabilitation, and we performed a literature search to identify these newer modalities and the technologies that enable new modes of communication. Silent speech interfaces are systems that enable speech communication through inaudible media. They are under extensive research, especially for space communication, and aim mainly to establish a communication medium without the process of actual speech. Many electrophysiological biosignals exist that can be used for voice generation, and many studies on these biosignals are ongoing, especially attempts to generate voice. We believe this technology may aid voice rehabilitation for laryngectomees in the future [7].

Necessity is the mother of invention. The unaffordability of the tracheoesophageal prosthesis was a major limiting factor among laryngectomees, especially in low-income countries; hence, the one-dollar prosthesis for laryngectomees was developed in India as a matter of necessity. We believe that, considering the existing problems in speech rehabilitation, it is important for us to be familiar with the latest technologies in speech rehabilitation to enable further invention. Here, we aim to explore the literature on the development of such technologies and to bring it to the attention of the medical fraternity.

Aim

To identify newer technologies and devices that can be utilised for voice rehabilitation and may aid the rehabilitation of laryngectomees.

Methodology

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement was used to conduct a systematic review of the current literature. We performed a bibliographic search of Medline, CINAHL, Web of Science and Google Scholar, using title/abstract searches and Medical Subject Headings (MeSH) where appropriate, for publications from January 1985 to January 2020. Because a large variety of terminology defines voice rehabilitation, speech rehabilitation and speech interfaces, a comprehensive search was required to capture all relevant articles. We used terms such as Voice (MeSH), Voice Production (MeSH), Speech Rehabilitation (MeSH), Alaryngeal Speech (MeSH), Rehabilitation of Language Disorder (MeSH), Laryngectomee, Voice Training (MeSH), Communication Aids for Disabled (MeSH), Speech Devices (MeSH), Silent Speech Interfaces and Voice Rehabilitation. All articles describing a technology or device that aided speech production were included in the review. Exclusion criteria were review articles without a description of a device or assessment of a technology, publications in languages other than English, and studies that did not report outcomes or device utility. Conference papers were also included, and the reference lists of all included articles were browsed to obtain additional studies. Three independent researchers reviewed the articles; only articles with full text were included, and disagreements over eligibility were resolved by consensus. Figure 1 shows the methodology of the literature search.

Fig. 1 PRISMA flowchart of search strategy and included studies

Newer Technologies

We found many articles citing devices for rehabilitation of people with a speech disability. Voice output communication aids, or speech-generating devices, are devices that can generate speech on demand [8, 9]. Such devices output either previously recorded human voice or digitally synthesized speech. Fabric-based speech-generating devices have been tried [10]; these attach a few pre-recorded messages to buttons on fabric, and when patients want to convey a message, they press the corresponding button. Similarly, wrist-worn, highly compatible devices like TalktracTM have also been used [10]. The portability of such devices has improved with the use of mobile phones and communication apps, and voice-generation mobile phone applications have also been used by people with a speech disability [11]. Even though such applications are compatible, portable and simple to use, they have not achieved popularity among people with a speech disability.
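The core logic of such a button-based speech-generating device can be sketched as a simple lookup from a button press to a pre-recorded message. This is an illustrative sketch only; the button identifiers, messages and file names below are invented, not taken from any actual device.

```python
# Minimal sketch of a button-based speech-generating device: each button is
# bound to a pre-recorded message, and a press selects the corresponding
# audio clip for playback. All names and messages here are hypothetical.

PRERECORDED = {
    "btn_water":  ("I would like some water, please.", "water.wav"),
    "btn_help":   ("I need help.", "help.wav"),
    "btn_thanks": ("Thank you.", "thanks.wav"),
}

def on_button_press(button_id):
    """Return the (text, audio file) pair to play for a given button."""
    if button_id not in PRERECORDED:
        return ("", None)  # unmapped button: stay silent
    return PRERECORDED[button_id]

text, clip = on_button_press("btn_water")
print(text)  # the message a real device would speak from its audio clip
```

A real device differs only in scale and hardware: more buttons, an audio codec, and a speaker instead of `print`.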

Silent Speech Interfaces

Silent speech interface systems enable speech production even when an audible acoustic signal is not available. The principle of silent speech interfaces is to acquire sensor data from the various elements of the human speech production process, or from other biosignals [7]. The biosignals under research are:

  1. Non-audible murmur

  2. Ultrasound-guided characteristics of vocal folds and optical imaging of tongue and lips

  3. Capturing the motion of fixed points on articulators, otherwise known as electromagnetic articulography

  4. Glottic activity analysis using electromagnetic waves

  5. Electromyography of articulatory or laryngeal muscles

  6. Electroencephalography signals

  7. Signals from implants in the speech centres of the cortex

  8. Miscellaneous

We have included all biosignals used in silent speech interfaces for completeness. However, signals based on vocal fold motion, or on any structure that may be removed at laryngectomy, are of less relevance to laryngectomised patients.

Non-Audible Murmur

Non-audible murmur is one of the earliest biosignals studied, investigated since the early 2000s. The technology works on the principle of capturing the non-audible vibrations produced inside the human body during speech and converting them to text or audible speech. Non-audible murmur is the vibration perceived during the act of speaking that is inaudible to a bystander; it is similar to a whisper but differs acoustically. The vibrations arise from the motion of the various articulators, including the tongue, palate and lips [11]. The vibrations made without voicing are the same as those made when voice is produced. Hence, attempted phonation with motion of the articulators will still produce vibration in laryngectomised patients, which can be used for voice production. Vibrations may be collected from the mastoid tip or any bony prominence.
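As an illustration of the first processing step such a pipeline would need (not taken from the cited studies), the weak bone-conducted signal is typically split into short frames, and only frames with enough energy to carry speech information are kept. Frame length and threshold below are arbitrary assumptions.

```python
# Hypothetical sketch: energy-based selection of murmur frames from a
# bone-conduction vibration signal, the usual front end before any
# murmur-to-speech conversion. Frame size and threshold are assumptions.

def frame_energy(samples, frame_len=4):
    """Split a sampled vibration signal into frames and compute mean energy."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    return [sum(x * x for x in f) / len(f) for f in frames]

def murmur_frames(samples, threshold=0.01):
    """Indices of frames energetic enough to be treated as murmur, not silence."""
    return [i for i, e in enumerate(frame_energy(samples)) if e > threshold]

signal = [0.0, 0.0, 0.0, 0.0, 0.3, -0.2, 0.25, -0.3, 0.01, 0.0, -0.01, 0.0]
print(murmur_frames(signal))  # → [1]: only the middle, energetic frame survives
```

The surviving frames would then be passed to a conversion model (e.g. the statistical conversion of Tajiri et al.) to produce text or audible speech.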

Over the years, many studies have been carried out on this input interface technology. It initially aimed to build an inaudible mode of communication to make work environments quieter; its potential use by vocally handicapped people, in the form of a body-worn input device, has been promoted more recently. Nevertheless, practical issues such as attaching the device to the body to collect input signals, and the very requirement of such a body-worn device, are flaws yet to be addressed. Interference from external and internal noise, and output in the form of non-natural communication, are other disadvantages.

The various studies found to date are summarised in Table 1 [12–17]. Even though we could not find any device in the literature, a new device may well be developed from this technology in the future.

Table 1.

Studies on non-audible murmur and related articles

1. Non-audible murmur (NAM): Nakajima Y et al., Japan, 2003 [12]. Advantage: first to suggest the possibility of hands-free voice rehabilitation. Disadvantage: weak signal.

   Markov models for NAM: Heracleous P et al., Japan, 2003 [13]. Disadvantages: need for wider adaptation data; early-stage performance; practical body attachment; sensitivity to internal noise; unnatural communication.

2. Blind extraction for NAM speech with speaker movement noise: Itoi M et al., Japan, 2012 [14]. Advantages: introduction of a noise filter with a microphone attached to the throat; better signal acquisition with less noise. Disadvantages: practical body attachment; still requires improvement.

3. NAM enhancement based on statistical conversion using air- and bone-conduction microphones: Tajiri et al., Japan, 2013 [15]. Advantages: air microphone added to the bone-conduction microphone; better signal acquisition. Disadvantages: practical body attachment; unnatural communication; sensitivity to internal noise.

4. NAM conversion based on a full-rank Gaussian model: Kumar et al., India, 2018 [16]. Advantage: improves the quality of speech produced by the speech processor by 6%. Disadvantages: practical body attachment; sensitivity to internal noise; unnatural communication.

5. Application of L-NAM speech in a voice analyser: Kumaresan et al., India, 2016 [17]. Advantage: lip movements combined with non-audible murmur, giving better signal and accuracy. Disadvantages: practical body attachment; sensitivity to internal noise; unnatural communication.

Ultrasound-Guided Characteristics of Vocal Folds and Optical Imaging of Tongue and Lips

Lip and tongue movements can be read from camera images of the speaker's articulators, and ultrasound images of internal structures during phonation can be mapped to vocal tract parameters and combined with pre-recorded sound to produce a speech signal. Over time, the development of compatible ultrasound machines has allowed more robust signals to be acquired for speech production [18]. We hope the same can be extended to the rehabilitation of laryngectomees and other voice-handicapped people.
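The mapping idea can be sketched as a codebook lookup: each ultrasound/lip frame is reduced to a small feature vector and matched to the nearest stored entry pairing articulator features with an acoustic unit. Real systems use eigentongue features and, more recently, deep neural networks; the three-dimensional vectors and phoneme labels below are toy values for illustration only.

```python
# Hedged sketch of ultrasound/optical-to-speech mapping via a codebook:
# an articulator feature vector is matched to its nearest stored entry,
# which carries the acoustic unit to synthesise. Values are invented.
import math

CODEBOOK = [
    ([0.9, 0.1, 0.2], "/a/"),   # tongue low, lips open
    ([0.2, 0.8, 0.1], "/i/"),   # tongue high and front
    ([0.1, 0.2, 0.9], "/u/"),   # lips rounded
]

def nearest_unit(features):
    """Map an articulator feature vector to the closest acoustic unit."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(CODEBOOK, key=lambda entry: dist(entry[0], features))[1]

# A frame whose features resemble a rounded-lip configuration maps to /u/.
print(nearest_unit([0.15, 0.25, 0.8]))  # → /u/
```

Chaining the selected units over successive frames, and smoothing the transitions, yields the continuous speech signal the cited systems aim for.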

Studies based on this approach are summarised in Table 2 [19–23].

Table 2.

Studies based on ultrasonic and optical imaging of tongue and lips

1. Speech synthesis from real-time ultrasound images of the tongue: Denby B, France, 2004 [19]. Novelty: earliest study to produce sounds from real-time ultrasound images of the tongue. Disadvantages: poor sound quality; needs improvement.

2. Prospects for a silent speech interface using ultrasound imaging: Denby B, France, 2006 [20]. Novelty: reassessment of the same technology to identify its pros and cons. Disadvantages: insufficient information from the sagittal tongue contour and lip profile for a silent speech interface; larger training sets and improved imaging and processing are needed to extend its scope.

3. Eigentongue feature extraction for an ultrasound-based silent speech interface: Hueber T, France, 2007 [21]. Novelty: conversion to eigentongue images (visual acoustic models) gives better ultrasound image capture and hence more robust signals. Disadvantages: poor sound quality; error of 11–16% in the described model.

4. Acquisition of ultrasound, video and acoustic data for a silent speech interface application, device "UltraspeechTM": Hueber T, France, 2010 [22]. Novelty: a laptop-based ultrasound machine with an industrial camera and data-acquisition device acquires better input signals and gives better voice output. Disadvantages: small database; requires improvement; a large database requires multi-session acquisition with inter-session recalibration.

5. SottoVoceTM, an ultrasound-based silent speech interface using deep neural networks: Kimura N, Japan, 2019 [23]. Novelty: a device developed on similar principles. Disadvantage: needs to be more comfortable to wear.

Capturing the Motion of Fixed Points on Articulators, Otherwise Known as Electromagnetic Articulography

Biosignals captured by devices that sense changes in the orientation and position of defined marker points on the speech articulators have been under extensive research for the purpose of voice rehabilitation [24].

Steiner et al. studied electromagnetic articulography data and presented an articulatory animation system featuring a lightweight implementation with probable integration into speech synthesis frameworks. This study highlights the possibility of using this biosignal for voice rehabilitation [25]. Such devices have the potential to capture a greater number of input signals and may aid hands-free communication.

Narayanan et al. studied real-time MRI images of tongue, lip and palate movements along with electromagnetic articulography. They developed a multimodal speech database from the images, providing dynamic information from the entire mid-sagittal plane of the subject's upper airway. The temporal resolution of real-time MRI was not as good as that of electromagnetic articulography. Real-time magnetic resonance imaging of the upper airway and speech apparatus is an active research area, and the authors conclude with insights for further research into speech production [26].

The idea of building a database from real-time MRI is indeed very clever. However, more detailed research is needed to establish its utility for laryngectomised patients.

Glottic Activity Analysis with Electromagnetic Waves and Laryngeal Electromyographic Signals

Glottic activity analysis with the aid of electromagnetic waves and laryngeal electromyographic signals has been studied in the field of silent speech interfaces [27, 28]. However, the utility of such biosignals in laryngectomees may not be possible due to the absence of the laryngeal apparatus.

Electromyography Signals of the Articulatory Apparatus

The electrical signals generated during muscle contraction in the process of speech production can be converted directly to an audible speech waveform. This approach has advantages over recognition-based conversion: it is not tied to any language or vocabulary set, and speaker mood and emotion can be preserved. Direct mapping also allows EMG-based conversion to run in real time [29]. This modality has been studied in laryngectomees too.
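The "direct mapping" idea above can be sketched frame by frame: each frame of EMG features is transformed straight into acoustic features with no intermediate word recognition, so no vocabulary is imposed and the signal's timing is preserved. The 2×2 weight matrix below is a hypothetical stand-in for a trained regression or neural network, not any published model.

```python
# Sketch of frame-wise direct EMG-to-speech mapping: acoustic features are a
# linear transform of EMG features, applied per frame. W is an assumed,
# "pre-trained" matrix; real systems learn far larger mappings from data.

W = [[0.5, 0.2],
     [0.1, 0.9]]

def emg_frame_to_acoustic(frame):
    """Linear frame-wise mapping: acoustic[i] = sum_j W[i][j] * emg[j]."""
    return [sum(w * x for w, x in zip(row, frame)) for row in W]

def emg_to_speech(emg_frames):
    """Convert a whole EMG sequence frame by frame, preserving its timing."""
    return [emg_frame_to_acoustic(f) for f in emg_frames]

frames = [[1.0, 0.0], [0.0, 1.0]]
print(emg_to_speech(frames))  # → [[0.5, 0.1], [0.2, 0.9]]
```

Because each output frame depends only on the current input frame, the conversion has no inherent vocabulary and can run with low latency, which is exactly the advantage the paragraph describes.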

Studies based on EMG signals are summarised in Table 3 [30–33].

Table 3.

Studies on devices based on electromyographic signals

1. Synthesizing speech from electromyography using voice transformation techniques: Toth AR, Germany, 2009 [30]. Advantage: incorporated a voice transformation technique, hence better voice quality. Disadvantages: inadequate signal; differences between the EMG produced during audible and silent speech.

2. Estimation of fundamental frequency from surface electromyography data: Nakamura K, Japan, 2011 [31]. Advantages: measured the fundamental frequency of electromyographic signals using Gaussian mixture models with voice conversion; voiced/unvoiced decision accuracy up to 84%. Disadvantages: poor-quality, unnatural speech; latency.

3. EMG-to-speech, direct generation from facial electromyographic signals: Janke M, Germany, 2012 [32]. Advantages: generates language-independent speech; not dependent on a vocabulary. Disadvantages: poor-quality, unnatural speech; latency.

4. Silent speech recognition as an alternative communication device for persons with laryngectomy: Meltzner GS et al., USA, 2018 [33]. Novelty: eight EMG sensors attached to the face and neck, studied in eight laryngectomised patients with a phoneme-based recognition model trained on 39 commonly used English phonemes. Result: word error rate of 10.3% with a 2500-word vocabulary and the set of eight sensors.

Electroencephalography Signals

Electroencephalography signals have shown promise for new communication methods. Initial research was based on electroencephalography and magnetoencephalography studies [34]. The true advent of silent speech interfaces began when unspoken speech in the brain was converted to EEG signals. A study by DaSalla and colleagues recorded EEG signals in three healthy volunteers during unspoken speech of the English vowels /a/ and /u/. Each subject performed fifty trials of each task, with each trial lasting 2 seconds. Twenty randomly selected trials from each set were decomposed into spatial patterns, and spatial filters were applied for all pairwise combinations of the tasks. The authors then trained a non-linear support vector machine and classified the remaining 30 trials, demonstrating that the motor cortex activation associated with vowels can be classified. The authors foresee the possibility of a device in the future [35]. Table 4 summarises past research based on electroencephalography signals [36–39].
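A greatly simplified stand-in for the DaSalla pipeline (spatial patterns plus a support vector machine) can show the shape of the classification step: each trial is reduced to per-channel variance features and assigned to the nearest class centroid. The two-channel "trials" below are synthetic toy data, not EEG recordings, and nearest-centroid is a deliberate simplification of the SVM.

```python
# Toy sketch of EEG vowel classification: variance features per channel,
# nearest-centroid classifier. A simplification of CSP + SVM, on fake data.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def features(trial):
    """One variance feature per EEG channel."""
    return [variance(ch) for ch in trial]

def train_centroids(labelled_trials):
    """Mean feature vector per class, from (label, trial) training pairs."""
    sums, counts = {}, {}
    for label, trial in labelled_trials:
        f = features(trial)
        acc = sums.setdefault(label, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def classify(trial, centroids):
    f = features(trial)
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(f, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))

# Invented data: /a/ trials vary strongly on channel 1, /u/ on channel 2.
train = [
    ("/a/", [[1.0, -1.0, 1.0, -1.0], [0.1, 0.0, 0.1, 0.0]]),
    ("/u/", [[0.1, 0.0, 0.1, 0.0], [1.0, -1.0, 1.0, -1.0]]),
]
centroids = train_centroids(train)
print(classify([[0.9, -0.8, 0.9, -0.9], [0.0, 0.1, 0.0, 0.1]], centroids))  # → /a/
```

The real study's spatial filters serve the same purpose as the variance features here: compressing multichannel EEG into a few numbers that separate the two imagined vowels.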

Table 4.

Studies based on electroencephalography signals

1. Thought translation device: Birbaumer et al., 2000 [36]. Novelty: a letter-spelling device studied in healthy and disabled subjects. Accuracy up to 75%, producing 0.5 letters per minute. Disadvantages: slow; complex.

2. P300 speller: Farwell and Donchin, 1988 [37]. Novelty: a letter-spelling device studied in healthy and disabled subjects. Accuracy up to 62% in disabled and 80–90% in healthy subjects; 1.99 letters per minute in healthy and 1 letter per minute in disabled subjects. Disadvantages: slow; complex.

3. Graz BCI: Pfurtscheller et al., 2001 [38]. Novelty: a two-class motor imagery strategy studied in healthy and disabled subjects. Accuracy >71% in healthy and 70% in disabled subjects; 1.99 letters per minute in healthy and 1 letter per minute in disabled subjects. Disadvantages: slow; complex.

4. Berlin BCI: Blankertz B et al., 2008 [39]. Novelty: studied neurological activity related to imagined actions using a spatial filtering technique; 2.3–7.6 letters per minute. Disadvantages: slow; complex.

Brain-Computer Interfaces (Invasive)

This technology focuses on producing silent speech from signals obtained by intracortical microelectrodes implanted in the ventral sensorimotor cortex, superior temporal gyrus and inferior frontal gyrus. Initial results show the ability to produce continuous speech, but with slow communication rates. Brumberg JS et al. proposed a novel approach using a neurotrophic implant in the speech cortex [40]. Cortical signals were decoded using complex algorithms; the device worked like a spelling device with a reproducibility rate of 40–70% and a speed of 0.5–7.97 bits/min. Anumanchipalli GK et al. devised a brain-computer interface technology for voice rehabilitation [41]. Their team was able to synthesise speech by neural decoding of spoken sentences, again based on intracortical implants in the speech motor cortex. For this device, the patient's cortical processes of articulation and cognitive function must be intact for learning. Studies along similar lines are ongoing.

Miscellaneous

Smartphone-based apps have been used for rehabilitation of people who cannot speak, and the utility of hand gestures in voice rehabilitation is being studied. SignAloudTM is a device that converts sign language to audible speech with the help of a glove that senses the motion of the hand and converts these signals to pre-recorded speech signals [42]. Many more devices have been developed along similar lines; these are summarised in Table 5 [43–47].

Table 5.

Studies with prototype devices for speech rehabilitation

1. Intraoral voice recording, a new smartphone-based voice rehabilitation: Schultdt T et al., Germany, 2018 [43]. Novelty: smartphone-based device for future voice rehabilitation. Disadvantages: requires dental splints with voice recorders; complexity of the device.

2. Speech generation from hand gestures based on space mapping: Kunikoshi et al., Japan, 2009 [44]. Novelty: device based on hand gestures for speech rehabilitation. Disadvantages: based on the five Japanese vowels; limited signals; camera-based (motion capture).

3. Glove-Talk, a neural network interface between a data glove and a speech synthesizer: Fels SS, Canada, 1993 [45]. Novelty: glove-based, using hand gestures; fairly rapid, intelligible speech possible with low error rates. Disadvantages: limited signal output; slow speaking rate; unnatural sounds; needs extensive training.

4. Glove-TalkII, a neural network interface mapping gestures to parallel formant speech synthesizers: Fels SS, Canada, 1997 [46]. Novelty: glove-based, using hand gestures; sounds with more natural variation than a text-to-speech converter. Disadvantages: slow speaking rate; limited signal output; unnatural sound; needs training to learn the hand gestures.

5. Arabic glove-talk: Tolba AS, Kuwait, 1998 [47]. Novelty: uses hand gestures. Disadvantages: single language; slow speaking rate; limited signal output; unnatural sound; needs training.

The glove-based devices made to date, as listed in the table, are language-dependent. A device based on flex sensors that is not language-dependent has been developed by the authors; it utilises phonetics and hence can give output in multiple languages [48].
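The phonetics-based approach can be sketched as follows: each gesture template (one bend value per finger) maps to a phoneme rather than a fixed word, so output strings can be assembled in any language. The templates, readings and phoneme assignments below are invented for illustration and do not describe the authors' actual device.

```python
# Hypothetical sketch of a flex-sensor glove mapping hand shapes to
# phonemes: nearest-template matching over five bend values per finger.
# All templates and readings are made-up values.

GESTURES = [
    ([0.9, 0.9, 0.9, 0.9, 0.9], "m"),  # fist: all fingers flexed
    ([0.1, 0.1, 0.1, 0.1, 0.1], "a"),  # open hand
    ([0.1, 0.9, 0.9, 0.9, 0.9], "o"),  # thumb extended only
]

def decode_gesture(reading):
    """Nearest-template match from five flex-sensor values to a phoneme."""
    def dist(t):
        return sum((a - b) ** 2 for a, b in zip(t, reading))
    return min(GESTURES, key=lambda g: dist(g[0]))[1]

def decode_sequence(readings):
    """A sequence of hand shapes becomes a phoneme string to synthesise."""
    return "".join(decode_gesture(r) for r in readings)

word = decode_sequence([
    [0.85, 0.9, 0.95, 0.9, 0.9],   # near-fist
    [0.15, 0.1, 0.05, 0.1, 0.1],   # near-open
])
print(word)  # → "ma"
```

Because the output is a phoneme string rather than a word from a fixed dictionary, the same gesture inventory can drive synthesis in any language, which is the design choice that distinguishes this device from the glove systems in Table 5.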

Conclusion

Silent speech interfaces could be a valid option for the future rehabilitation of laryngectomees. We have summarised some possible directions for future exploration in voice rehabilitation. The field is expanding day by day, and the development of a low-cost, easy-to-use device based on silent speech interfaces may not be far off.

The challenges currently faced by this technology are the positioning of sensors and the robustness of signals. Each technology has its own sensor position, depending on the robustness of the signals available: technologies utilising electromagnetic articulography, electromyography or electroencephalography have specific sensor positions, whereas an appropriate position still needs to be found for technology utilising non-audible murmur. There is no systematic way of ensuring accurate sensor positioning in each silent speech interface [7].

Most of the devices are speaker-dependent, relying on the anatomy and movements of the patient's articulatory muscles; hence, the robustness of the signal depends on the speaker. Producing continuous speech is also considered difficult in real-time interactive systems. Present-day research has made limited-vocabulary voice rehabilitation feasible.

A device incorporating a combination of these technologies, or multiple biosignals, may be the ultimate voice rehabilitation option for laryngectomised patients in the future. Most of the ongoing research is still at a preliminary stage, works only under artificial conditions, and requires considerable technology and hardware development. Even though the current literature describes a few devices, their utility has been found to be limited.

Authors contribution

NPN (a) Conceptualization, (b) Drafting the article, (c) Screening articles for inclusion in review article, (d) Writing- Review and editing, (e) Validation. VS (a) Writing–original drafting, (b) Data Curation- Helped in literature search and collecting articles, (c) Helped in screening articles for inclusion in article, (d) Validation. AD (a) Data Curation- Helped in literature search and collecting articles, (b) Helped in screening articles for inclusion in article, (c) Writing- Review and Editing, (d)Validation. DK (a) Helped in literature search and collecting articles, (b) Helped in screening articles for inclusion in article, (c) Validation. KS (a) Helped in literature search and collecting articles, (b) Helped in screening articles for inclusion in article, (c) Validation. BC (a) Helped in screening articles for inclusion in article, (b) Validation. AG (a) Conceptualization, (b) Drafting the article, (c) Screening articles for inclusion in review article, (d) Writing- Review and editing, (e) Validation.

Funding

There was no funding involved in this study.

Data Availability

Any relevant data or information regarding the proposal will be available on request.

Declarations

Conflict of interest

There is no conflict of interest among the authors.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Nithin Prakasan Nair, Email: nithinprakasannair.2008@gmail.com.

Vidhu Sharma, Email: vidhusharma88@gmail.com.

Abhinav Dixit, Email: abhinavdr@gmail.com.

Darwin Kaushal, Email: drdarwin.aiims@gmail.com.

Kapil Soni, Email: ksoni805@gmail.com.

Bikram Choudhury, Email: comatosebuddha@gmail.com.

Amit Goyal, Email: meetugoyal@yahoo.com.

References

  • 1.Ţiple C, Drugan T, Dinescu FV, Mureşan R, Chirilă M, Cosgarea M. The impact of vocal rehabilitation on quality of life and voice handicap in patients with total laryngectomy: J Res. Med Sci. 2016;21:127. doi: 10.4103/1735-1995.196609. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.McQuellon RP, Hurt GJ. The psychosocial impact of the diagnosis and treatment of laryngeal cancer. Otolaryngol Clin North Am. 1997;30:231–241. doi: 10.1016/S0030-6665(20)30242-5. [DOI] [PubMed] [Google Scholar]
  • 3.Kapila M, Deore N, Palav RS, Kazi RA, Shah RP, Jagade MV. A brief review of voice restoration following total laryngectomy. Indian J Cancer. 2011;48:99–104. doi: 10.4103/0019-509X.75841. [DOI] [PubMed] [Google Scholar]
  • 4.Tang CG, Sinclair CF. Voice Restoration After Total Laryngectomy. Otolaryngol Clin North Am. 2015;48:687–702. doi: 10.1016/j.otc.2015.04.013. [DOI] [PubMed] [Google Scholar]
  • 5.van Sluis KE, van der Molen L, van Son RJJH, Hilgers FJM, Bhairosing PA, van den Brekel MWM. Objective and subjective voice outcomes after total laryngectomy: a systematic review. Eur Arch Otorhinolaryngol. 2018;275:11–26. doi: 10.1007/s00405-017-4790-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Pawar PV, Sayed SI, Kazi R, Jagade MV. Current status and future prospects in prosthetic voice rehabilitation following laryngectomy. J Cancer Res Ther. 2008;4:186–91. doi: 10.4103/0973-1482.44289. [DOI] [PubMed] [Google Scholar]
  • 7.Denby B, Schultz T, Honda K, Hueber T, Gilbert JM, Brumberg JS. Silent Speech Interfaces: Speech Commun. 2010;52:270–87. [Google Scholar]
  • 8.Hawley M, Cunningham S, Green P, Enderby P, Palmer R, Sehgal S, et al. A Voice-Input Voice-Output Communication Aid for People With Severe Speech Impairment: IEEE transactions on neural systems and rehabilitation engineering : a publication of the IEEE Engineering in Medicine and Biology Society 2012;21:23-31 [DOI] [PubMed]
  • 9.Judge S, Townend G. Perceptions of the design of voice output communication aids: Int J Lang Commun Disord 2013 Jul-Aug;48(4):366-81 [DOI] [PubMed]
  • 10.Fleury A, Wu G, Chau T. A wearable fabric-based speech-generating device: system design and case demonstration. Disabil Rehabil Assist Technol. 2019;14:434–444. doi: 10.1080/17483107.2018.1462860. [DOI] [PubMed] [Google Scholar]
  • 11.Furlong LM, Morris ME, Erickson S, Serry TA. Quality of Mobile Phone and Tablet Mobile Apps for Speech Sound Disorders: Protocol for an Evidence-Based Appraisal:JMIR Res Protoc 2016;5:e233 [DOI] [PMC free article] [PubMed]
  • 12.Nakajima Y, Kashioka H, Shikano K, Campbell N. Non-Audible Murmur Recognition: Interspeech 2003;4
  • 13.Heracleous, Panikos et al. Accurate hidden Markov models for non-audible murmur (NAM) recognition based on iterative supervised adaptation: IEEE Workshop on Automatic Speech Recognition and Understanding 2003: 73-76
  • 14.Tajiri Y, Tanaka K, Toda T, Neubig G, Sakti S, Nakamura S. Non-Audible Murmur Enhancement Based on Statistical Conversion Using Air- and Body-Conductive Microphones in Noisy Environments: Interspeech 2015 :5
  • 15.Itoi M, Miyazaki R, Toda T, Saruwatari H, Shikano K. Blind speech extraction for Non-Audible Murmur speech with speaker’s movement noise: IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) 2012: 320-325.
  • 16.Kumar TR, Suresh GR, Raja S. Conversion of Non-Audible murmur to normal speech based on full-rank gaussian mixture model. J Comput Theor Nanosci. 2018;15:185–190. doi: 10.1166/jctn.2018.7072. [DOI] [Google Scholar]
  • 17.Kumaresan A, Selvaraj P, Mohanraj S, Mohankumar N, Anand SM. Application of L-NAM speech in voice analyser: Advances in Natural and Applied Sciences 2016; 10:172
  • 18.Csapó TG, Grósz T, Gosztolya G, Tóth L, Markó A. DNN-Based Ultrasound-to-Speech Conversion for a Silent Speech Interface: Interspeech 2017 (ISCA) 2017:3672–6
  • 19.Denby B, Stone M. Speech synthesis from real time ultrasound images of the tongue: IEEE International Conference on Acoustics, Speech, and Signal Processing 2004:685–8.
  • 20.Denby B, Oussar Y, Dreyfus G, Stone M. Prospects for a Silent Speech Interface using Ultrasound Imaging: IEEE International Conference on Acoustics Speed and Signal Processing Proceedings 2006;365-368
  • 21.Hueber T, Aversano G, Cholle G, Denby B, Dreyfus G, Oussar Y, et al. Eigentongue Feature Extraction for an Ultrasound-Based Silent Speech Interface: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2007;1245-1248
  • 22.Hueber T, Benaroya EL, Chollet G, Denby B, Dreyfus G, Stone M. Development of a silent speech interface driven by ultrasound and optical images of the tongue and lips. Speech Commun. 2010;52:288–300. doi: 10.1016/j.specom.2009.11.004. [DOI] [Google Scholar]
  • 23.Kimura N, Kono M, Rekimoto J. SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems 2019;1–11
  • 24.Harper S, Lee S, Goldstein L, Byrd D. Simultaneous electromagnetic articulography and electroglottography data acquisition of natural speech. J Acoust Soc Am. 2018;144:380–5. doi:10.1121/1.5066349.
  • 25.Steiner I, Richmond K, Ouni S. Speech animation using electromagnetic articulography as motion capture data. In: AVSP - 12th International Conference on Auditory-Visual Speech Processing; 2013:55–60.
  • 26.Narayanan S, Toutios A, Ramanarayanan V, Lammert A, Kim J, Lee S, et al. Real-time magnetic resonance imaging and electromagnetic articulography database for speech production research. J Acoust Soc Am. 2014;136:1307–11. doi:10.1121/1.4890284.
  • 27.Chen F, Li S, Zhang Y, Wang J. Detection of the vibration signal from human vocal folds using a 94-GHz millimeter-wave radar. Sensors. 2017;17:543.
  • 28.Svec JG, Schutte HK, Miller DG. A subharmonic vibratory pattern in normal vocal folds. J Speech Hear Res. 1996;39:135–43. doi:10.1044/jshr.3901.135.
  • 29.Janke M, Diener L. EMG-to-speech: direct generation of speech from facial electromyographic signals. IEEE/ACM Trans Audio Speech Lang Process. 2017;25:2375–85.
  • 30.Toth AR, Wand M, Schultz T. Synthesizing speech from electromyography using voice transformation techniques. In: Interspeech; 2009.
  • 31.Nakamura K, Janke M, Wand M, Schultz T. Estimation of fundamental frequency from surface electromyographic data: EMG-to-F0. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2011:573–6.
  • 32.Janke M, Wand M, Nakamura K, Schultz T. Further investigations on EMG-to-speech conversion. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2012:365–8.
  • 33.Meltzner GS, Heaton JT, Deng Y, De Luca G, Roy SH, Kline JC. Silent speech recognition as an alternative communication device for persons with laryngectomy. IEEE/ACM Trans Audio Speech Lang Process. 2017;25:2386–98.
  • 34.Porbadnigk A, Wester M, Calliess J-P, Schultz T. EEG-based speech recognition: impact of temporal effects. In: Biosignals - Proceedings of the International Conference on Bio-inspired Systems and Signal Processing; 2009;1:376–81.
  • 35.DaSalla C, Kambara H, Koike Y, Sato M. Spatial filtering and single-trial classification of EEG during vowel speech imagery. In: ICREATE '09 - International Convention on Rehabilitation Engineering and Assistive Technology; 2009.
  • 36.Birbaumer N, Kübler A, Ghanayim N, Hinterberger T, Perelmouter J, Kaiser J, et al. The thought translation device (TTD) for completely paralyzed patients. IEEE Trans Rehabil Eng. 2000;8:190–3.
  • 37.Farwell LA, Donchin E. Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials. Electroencephalogr Clin Neurophysiol. 1988;70:510–23. doi:10.1016/0013-4694(88)90149-6.
  • 38.Pfurtscheller G, Neuper C. Motor imagery and direct brain-computer communication. Proc IEEE. 2001;89:1123–34.
  • 39.Blankertz B, Losch F, Krauledat M, Dornhege G, Curio G, Müller K-R. The Berlin brain-computer interface: accurate performance from first-session in BCI-naïve subjects. IEEE Trans Biomed Eng. 2008;55:2452–62. doi:10.1109/TBME.2008.923152.
  • 40.Brumberg JS, Nieto-Castanon A, Kennedy PR, Guenther FH. Brain-computer interfaces for speech communication. Speech Commun. 2010;52:367–79.
  • 41.Anumanchipalli GK, Chartier J, Chang EF. Speech synthesis from neural decoding of spoken sentences. Nature. 2019;568:493–8. doi:10.1038/s41586-019-1119-1.
  • 42.O'Connor TF, Fach ME, Miller R, Root SE, Mercier PP, Lipomi DJ. The Language of Glove: wireless gesture decoder with low-power and stretchable hybrid electronics. PLoS One. 2017;12:e0179766.
  • 43.Schuldt T, Kramp B, Ovari A, Timmermann D, Dommerich S, Mlynski R, et al. Intraoral voice recording: towards a new smartphone-based method for vocal rehabilitation. HNO. 2018;66:63–70. doi:10.1007/s00106-018-0549-7.
  • 44.Kunikoshi A, Qiao Y, Minematsu N, Hirose K. Speech generation from hand gestures based on space mapping. In: Interspeech; 2009.
  • 45.Fels SS, Hinton GE. Glove-Talk: a neural network interface between a data-glove and a speech synthesizer. IEEE Trans Neural Netw. 1993;4:2–8.
  • 46.Fels SS, Hinton GE. Glove-Talk II: a neural-network interface which maps gestures to parallel formant speech synthesizer controls. IEEE Trans Neural Netw. 1997;8:977–84. doi:10.1109/72.623199.
  • 47.Tolba AS, Abu-Rezq AN. Arabic glove-talk (AGT): a communication aid for vocally impaired. Pattern Anal Appl. 1998;1:218–30.
  • 48.Goyal A, Dixit A, Kalra S, Khandelwal A, Nair NP. Automatic speech generation. Indian Patent Application 201911035856A; 2019.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

Any relevant data or information regarding the proposal will be made available on request.


Articles from Indian Journal of Otolaryngology and Head & Neck Surgery are provided here courtesy of Springer