Chaos Behavior Analysis of Alaryngeal Voices Including Esophageal (SE) and Tracheoesophageal (TE) Voices

Boquan Liu; Fan Zhang; Ling Chen; Matthew A Silverman; Hengxin Liu; Dehui Fu; Yongwang Huang; Jing Pan; Jack J Jiang

doi:10.1159/000521222

Author manuscript; available in PMC: 2023 Jan 20.

Published in final edited form as: Folia Phoniatr Logop. 2022 Jan 20;74(6):431–440. doi: 10.1159/000521222

PMCID: PMC9296702 NIHMSID: NIHMS1779797 PMID: 35051938

Abstract

Hypothesis/Objectives

This study’s objective was to develop a method to evaluate the chaotic characteristics of alaryngeal speech. The proposed method is intended to distinguish between normal and alaryngeal voices, including esophageal (SE) and tracheoesophageal (TE) voices. It has been previously shown that alaryngeal voices exhibit chaotic characteristics due to the aperiodicity of their signals. The proposed method is intended for future use in quantifying both chaos behavior and the difference between SE and TE voices.

Study Design

A total of 74 voice recordings including 34 normal and 40 alaryngeal (26 esophageal (SE) and 14 tracheoesophageal (TE)) were used in the study. Voice samples were analyzed to distinguish alaryngeal voices from normal voices and investigate different chaotic characteristics of SE and TE speech.

Methods

A chaotic distribution detection-based method was used to investigate the chaos behavior of alaryngeal voices. This chaos behavior was used to detect the difference between SE and TE voice types.

Quantification of the chaos behavior (CB) parameter was performed. Statistical analyses were used to compare the results of the CB analysis for both the SE and TE voices.

Results

Statistical analysis revealed that CB effectively differentiated between all normal and alaryngeal voice types (P<0.01). Subsequent multiclass receiver operating characteristic (ROC) analysis demonstrated that CB had a larger area under the curve (AUC), and therefore greater classification accuracy, than the correlation dimension (D₂).

Conclusions

The CB metric shows strong promise as an accurate, useful metric for objective differentiation between normal and alaryngeal voices and between SE and TE voice types. The CB calculations showed expected results, as SE voices have significantly more chaos behavior than TE voices. This constitutes a substantial improvement over previous methods and provides the first SE and TE classification method. This metric can help clinicians obtain additional acoustical information when monitoring the efficacy of treatment for patients undergoing total laryngectomies.

INTRODUCTION

A laryngectomy is a procedure involving the removal of the larynx, or voice box, of a patient. Clinicians perform laryngectomies for many reasons, including laryngeal cancer, necrosis due to radiation, and severe injury or trauma to the neck[1,2]. However, this procedure causes the loss of phonation and requires various changes in respiratory techniques. Patients must use an alternative speaking method to regain verbal communication. Several techniques are currently in use. External alaryngeal speech includes speaking with an electrolarynx (EL) or a pneumatic artificial (PA) larynx, and internal alaryngeal speech includes esophageal (SE) and tracheoesophageal (TE) speech.

These four methods differ in their sound source and in voice production. An electrolarynx uses an artificial mechanical vibrator, which is placed against the neck, and the speaker is taught to articulate using the sound of the vibrator. In PA speech, sound is generated by vibrating an elastic reed inside a hand-held pneumatic device powered by pulmonary air. Sound is then propagated into the vocal tract along a plastic tube that is placed orally for resonance and articulation of various speech sounds[3,4]. Because of their mechanical-sounding voice and high cost, these external methods are often not preferred by patients[2,18]. The internal alaryngeal speech types, SE and TE speech, share the same vibratory source but differ in driving force[8]. SE voice comes from swallowed air being expelled from the esophagus into the pharynx, while TE voice comes from pulmonary expiration driven into the pharynx through the prosthetic valve between the trachea and esophagus[5,7]. Once the air is expelled, the mucosa in the pharyngoesophageal (PE) segment vibrates, creating sound[8–10].

The aerodynamic measurements of subglottal air pressure, average flow rate, and laryngeal resistance have been used to explore the sound characteristics of different alaryngeal phonations by attributing alterations in aerodynamic parameters to anatomical changes in the patients[11]. However, the most frequently examined parameters are the average fundamental frequency (F0), F0 range and perturbation, phonation intensity, vowel duration, and voice onset time[12]. Research has shown that TE speech results in higher intensity, longer duration, and greater intelligibility than SE speech[13–17]. However, SE and TE voices are aperiodic and often lack a fundamental frequency. Because of this aperiodicity, acoustic analysis of alaryngeal voice is less straightforward than that of normal voice[18]. An acoustic visual typing system was developed to evaluate TE voices and to serve as a more precise acoustic analysis method than selected acoustic measurements (median fundamental frequency, standard deviation of fundamental frequency, jitter, percentage of voiced (%Voiced), harmonics-to-noise ratio (HNR), glottal-to-noise excitation (GNE) ratio, and band energy difference (BED)). Furthermore, this system has shown that the traditional acoustic measurement methods fail to evaluate the entire range of TE voices[19]. Visual typing can be used for multidimensional functional voice assessment, but signal typing in its current form provides limited predictive information on the voice quality of TE speech[20]. Perturbation measurements have demonstrated highly questionable reliability in quantifying SE voices. Nonlinear dynamical analysis has shown that, because of the PE segment’s complexity and the lack of control over the new vibratory structure, TE and SE voices are significantly more chaotic than normal voices[21]. However, previous research failed to detect a significant difference in dynamical parameters between SE and TE voices[22]. Moreover, few studies have examined the full range of voice dynamic behavior for SE and TE voices. Understanding methods to quantify the chaos of these commonly studied parameters is of paramount importance in restoring post-laryngectomy verbal communication[12].

Acoustic analysis has proven unreliable for evaluating SE and TE voices[21–23]. Chaos is a concept referring to the ostensibly random yet deterministic behavior of dynamic systems[24]. Chaotic behavior has been observed in a multitude of physical systems, ranging from gravitational effects between celestial bodies in the solar system to the flux of ions across biological membranes[25]. Voice production involves many physiological sources of nonlinearity, including nonlinear stress-strain vocal fold collisions, chaotic biomechanical interactions between tissues, and turbulent airflow stemming from filtering in the vocal tract[25–27]. Additionally, the existence of chaotic behavior has been established in normal and pathological human voices[25,28]. These voices have been shown to exhibit chaos through the aperiodicity of their signals, but the quantification of chaotic characteristics in SE and TE voices still requires exploration. To investigate this, we created a chaotic distribution detection-based method capable of evaluating SE and TE voices. It distinguishes between SE and TE voices by calculating their chaos behavior. This study’s objective is to examine the chaos behavior in, and the difference between, SE and TE voices and to compare the results to normal voice. We chose the correlation dimension (D₂) as the only comparison method because previous studies have thoroughly tested it on normal and alaryngeal voices. Performance analysis involved analyzing the computational output of the proposed method and D₂.

METHODS

Voice Sample Collection

Approval for voice sample use was granted by the Institutional Review Board at the University of Wisconsin-Madison. The voice samples used in this study were collected at the Shanghai EENT Hospital of Fudan University and the 2nd Hospital of Tianjin Medical University for clinical purposes and sent to our lab for research purposes. A total of 74 recordings (34 normal, 26 SE, and 14 TE) were used. The voice samples were recorded with a microphone (PG48, Shure, Niles, IL) located 15 cm away from the subjects’ mouths in a sound-attenuated booth. The subjects phonated a sustained /a/ vowel into the microphone and were recorded with the Computerized Speech Lab (CSL 4500, PENTAX Medical) at a sampling frequency of 44.1 kHz. Summary information for the samples is displayed in Table 1.

Table 1.

Demographic Information of Normal and Alaryngeal Speakers

Type	Number of samples	Mean age (years)	Duration of alaryngeal speech use (months)	Gender
Normal	34	36.7	0	16 men, 18 women
SE	26	58.4	20.3 (1–124)	24 men, 2 women
TE	14	57.7	22.6 (1–60)	14 men

Correlation Dimension (D₂)

The correlation dimension (D₂) is a measure that describes the number of degrees of freedom required to describe a system. In systems with a higher degree of dimensionality, a higher correlation dimension is expected. The time delay technique was used to create a phase space[29]:

X_i = \{ x(t_i),\ x(t_i - \tau),\ \dots,\ x(t_i - (m - 1)\tau) \}

(1)

where m is the minimum embedding dimension, x(t) is the sample, and τ is the time delay. The correlation integral C(L, r) is:

C(L, r) = \frac{1}{L(L - 1)} \sum_{i=1}^{L} \sum_{\substack{j=1 \\ j \neq i}}^{L} H(r - \| X_i - X_j \|)

(2)

where r is the radius around X_i, H is the Heaviside function, and the correlation integral C(L, r) is the probability that the distance between two vectors on the attractor is smaller than the radius r. If the value of r is too low, random noise becomes a dominant factor and causes estimates of D₂ to increase continuously with increasing embedding dimension (m)[30]. If the value of r is too high (that is, approximately the size of the reconstructed phase space), estimates of D₂ approach zero because the distances between all pairs of points in the reconstructed phase space are smaller than r. The appropriate range of r, designated as the scaling region, lies between these two extremes and can be determined manually.

H(u) = \begin{cases} 1, & u > 0 \\ 0, & u \leq 0 \end{cases}

(3)

The D₂ is defined as:

D_2 = \lim_{r \to 0} \lim_{L \to \infty} \frac{\log C(L, r)}{\log r}

(4)

The D₂ was found by calculating the slope of the most linear part of the log C(L, r) vs log r plot[31].
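The D₂ steps above (delay embedding, correlation integral, and slope of the log-log plot) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the implementation used in the study. The function names are hypothetical, the scaling region is passed in by the caller as a list of radii (the text states it can be determined manually), and the slope is taken by an ordinary least-squares fit, which is an assumption.

```python
import math

def delay_embed(x, m, tau):
    """Eq. (1): X_i = (x[i], x[i - tau], ..., x[i - (m - 1) * tau])."""
    first = (m - 1) * tau  # earliest index with a full delay history
    return [[x[i - k * tau] for k in range(m)] for i in range(first, len(x))]

def pair_distances(vectors):
    """Euclidean distance for every unordered pair of embedded vectors."""
    n = len(vectors)
    return [math.dist(vectors[i], vectors[j])
            for i in range(n) for j in range(i + 1, n)]

def correlation_integral(distances, n_vectors, r):
    """Eq. (2): fraction of ordered pairs (i != j) with distance below r."""
    close = sum(1 for d in distances if d < r)  # H(r - d) = 1 only when d < r
    # Each unordered pair stands for two ordered pairs, hence the factor 2.
    return 2.0 * close / (n_vectors * (n_vectors - 1))

def estimate_d2(x, m, tau, radii):
    """Least-squares slope of log C(L, r) against log r over the given radii."""
    vectors = delay_embed(x, m, tau)
    distances = pair_distances(vectors)
    points = []
    for r in radii:
        c = correlation_integral(distances, len(vectors), r)
        if c > 0:  # log is undefined when no pair falls inside the radius
            points.append((math.log(r), math.log(c)))
    mean_x = sum(px for px, _ in points) / len(points)
    mean_y = sum(py for _, py in points) / len(points)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    return sxy / sxx
```

For a pure sinusoid such as `[math.sin(0.1 * j) for j in range(600)]` embedded with `m=2` and `tau=16`, the reconstructed attractor is a closed curve, so the estimate should come out close to 1. The pairwise loop is quadratic in the number of vectors, so a full 44.1 kHz recording would need downsampling or a vectorized implementation.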

Chaos Distribution Measurement

A 2-dimensional system is constructed to represent the dynamic characteristic of a signal:

p_c(n) = \sum_{j=1}^{n} x(j) \cos(j\,c(e)), \qquad q_c(n) = \sum_{j=1}^{n} x(j) \sin(j\,c(e))

(1)

where x(j) is the voice signal, j = 1, 2, …, N, and n = 1, 2, …, N. The choice of c influences the computation of p_c and q_c. Resonance and Brownian motion of p_c and q_c occur when c = π, which causes the computed values to grow regardless of whether the signal is periodic or chaotic. To avoid resonance distortions around π, c is restricted to c ∈ [π/5, 4π/5][32]. The value of c is chosen at random for each iteration e = 1, 2, …, E, so that c(e) denotes the value used in the e-th iteration. Varying c in this way samples the voice signal thoroughly and measures its chaos distribution.
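As a small illustration of the construction above, the two running sums can be computed as cumulative sums over the signal. The sketch below is a minimal example with hypothetical function names. Drawing c uniformly from [π/5, 4π/5] is an assumption, since the text only says that c is chosen at random within that interval.

```python
import math
import random
from itertools import accumulate

def translation_variables(x, c):
    """Return the lists p_c(n) and q_c(n) for n = 1..N as running sums."""
    p = list(accumulate(v * math.cos(j * c) for j, v in enumerate(x, start=1)))
    q = list(accumulate(v * math.sin(j * c) for j, v in enumerate(x, start=1)))
    return p, q

def draw_c(rng):
    """Draw one value of c from [pi/5, 4*pi/5] (uniform draw is an assumption)."""
    return rng.uniform(math.pi / 5, 4 * math.pi / 5)
```

Calling `translation_variables(x, draw_c(random.Random(0)))` once per iteration e gives the pair (p_c, q_c) for that iteration. Passing a seeded `random.Random` instance keeps the sequence of c values reproducible.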

The mean square displacement M_c(i) is constructed from p_c and q_c at the e-th iteration to examine the chaos behavior of the voice signal:

M_c(i) = \frac{1}{N - i} \sum_{k=1}^{N-i} \left( [p_c(k + i) - p_c(k)]^2 + [q_c(k + i) - q_c(k)]^2 \right) - \left( \frac{1}{N} \sum_{j=1}^{N} x(j) \right)^2 \frac{1 - \cos(i\,c(e))}{1 - \cos(c(e))}

(2)

where the correction term $\left( \frac{1}{N} \sum_{j=1}^{N} x(j) \right)^2 \frac{1 - \cos(i\,c(e))}{1 - \cos(c(e))}$ is derived to regularize the linear behavior of M_c(i), which improves the determination of the growth rate S_c(e). The lag i must satisfy i ≪ N. In practice, i is chosen to be 1, 2, …, N/10 to ensure that i is sufficiently smaller than N[33].
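The corrected mean square displacement can be sketched as follows. This is an illustrative stdlib-only version with a hypothetical function name. It assumes the correction term takes the form given in the implementation paper cited as [33], namely the squared signal mean multiplied by (1 − cos(i·c)) / (1 − cos(c)).

```python
import math
from itertools import accumulate

def mean_square_displacement(x, c):
    """Return [M_c(1), ..., M_c(N // 10)] for signal x and a fixed value of c."""
    n = len(x)
    # Running sums p_c and q_c; list index k holds the value for n = k + 1.
    p = list(accumulate(v * math.cos(j * c) for j, v in enumerate(x, start=1)))
    q = list(accumulate(v * math.sin(j * c) for j, v in enumerate(x, start=1)))
    mean_x = sum(x) / n
    msd = []
    for i in range(1, n // 10 + 1):  # lags i = 1 .. N/10, so that i << N
        raw = sum((p[k + i] - p[k]) ** 2 + (q[k + i] - q[k]) ** 2
                  for k in range(n - i)) / (n - i)
        # Oscillatory term removed so that bounded and growing cases separate.
        correction = mean_x ** 2 * (1 - math.cos(i * c)) / (1 - math.cos(c))
        msd.append(raw - correction)
    return msd
```

A quick sanity check on the correction term: for a constant signal, every displacement is a sum of i consecutive unit phasors scaled by the constant, so the uncorrected average equals the correction term exactly and `mean_square_displacement([2.0] * 200, 1.0)` is zero up to rounding error. The double loop costs on the order of N²/10 operations, so long recordings would call for a vectorized version.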

A parameter S_c(e) is calculated for the e-th iteration as the asymptotic growth rate of M_c(i). This parameter quantifies the correlation between M_c(i) and linear growth. Thus, when M_c(i) remains bounded over time, the calculated S_c(e) value is close to zero[33]. In the e-th iteration, S_c is defined as:

S_c(e) = \frac{\sum_{i=1}^{N/10} \left( i - \frac{10}{N} \sum_{i'=1}^{N/10} i' \right) \left( M_c(i) - \frac{10}{N} \sum_{i'=1}^{N/10} M_c(i') \right)}{\sqrt{\sum_{i=1}^{N/10} \left( i - \frac{10}{N} \sum_{i'=1}^{N/10} i' \right)^2 \ \sum_{i=1}^{N/10} \left( M_c(i) - \frac{10}{N} \sum_{i'=1}^{N/10} M_c(i') \right)^2}}

(3)

The voice’s chaos behavior is examined through the mean of S_c(e) from each iteration e.

CB = \frac{1}{E} \sum_{e=1}^{E} S_c(e)

(4)
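The last two steps can be illustrated with a short sketch: S_c(e) is the Pearson correlation between the lag i and M_c(i), and CB is the mean of S_c(e) over the E iterations. The function names are hypothetical, the input is assumed to be a list of precomputed M_c curves (one per value of c), and returning 0 for a perfectly flat curve is a convention chosen here rather than something stated in the text.

```python
import math

def growth_correlation(msd):
    """S_c(e): Pearson correlation between lag i = 1..len(msd) and M_c(i)."""
    n = len(msd)
    lags = range(1, n + 1)
    mean_i = sum(lags) / n
    mean_m = sum(msd) / n
    cov = sum((i - mean_i) * (m - mean_m) for i, m in zip(lags, msd))
    var_i = sum((i - mean_i) ** 2 for i in lags)
    var_m = sum((m - mean_m) ** 2 for m in msd)
    if var_m == 0:
        return 0.0  # flat curve: no growth (assumed convention)
    return cov / math.sqrt(var_i * var_m)

def chaos_behavior(msd_curves):
    """CB: mean of S_c(e) over all iterations e = 1..E."""
    return sum(growth_correlation(m) for m in msd_curves) / len(msd_curves)
```

A linearly growing curve such as `[3.0 * i + 1.0 for i in range(1, 51)]` gives a correlation of 1, while a bounded oscillation such as `[math.sin(i) for i in range(1, 51)]` gives a value near 0. This matches the interpretation above that bounded M_c(i) yields S_c(e) close to zero.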

Statistical Analysis

Two sets of comparisons were made: normal versus alaryngeal voices, and SE versus TE voices. For each comparison, and for both CB and D₂, a pairwise t test was performed to determine whether the method yielded significantly different mean values (P<0.01). A multiclass receiver operating characteristic (ROC) curve was constructed for each method for SE and TE classification, and the area under the curve (AUC) was calculated to compare the classification performance of the two methods. A d-prime statistic was used to analyze the effect sizes between the means of the voice samples. Effect sizes quantify the separation between two group means, and a larger effect size indicates a more robust difference between the mean values of two groups[34]. Generally, a d-prime value greater than 0.8 is considered to indicate a large effect size. Scatter plots and box plots were generated to visualize the data.
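As a rough illustration of the two group statistics, the sketch below computes a d-prime effect size and a t statistic for two lists of values. The text does not give the exact formulas, so both are assumptions: d-prime is taken as the absolute mean difference divided by the root mean square of the two sample standard deviations, and the t statistic uses the unequal-variance (Welch) form. Function names are hypothetical.

```python
import math
from statistics import mean, variance

def d_prime(a, b):
    """Effect size: |mean(a) - mean(b)| over the RMS of the two sample SDs."""
    rms_sd = math.sqrt((variance(a) + variance(b)) / 2)
    return abs(mean(a) - mean(b)) / rms_sd

def welch_t(a, b):
    """t statistic for two independent samples with unequal variances."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error
```

For example, `d_prime([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])` returns 3.0, well above the 0.8 threshold for a large effect. Converting the t statistic to a P value requires the t distribution, which the Python standard library does not provide, so that step is left to a statistics package.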

RESULTS

Normal and Alaryngeal, SE and TE voices classification

As shown in Figure 1 and Table 2, the CB values of alaryngeal voices are much higher than those of normal voices. The CB values within each voice type fluctuated within a narrow range, demonstrating the proposed method’s stability (Figure 1 and Table 2). In contrast to D₂, the normalized distribution curves of CB for normal and alaryngeal voices exhibited no overlap.

Table 2.

CB and D₂ values for Normal, Alaryngeal, TE, and SE Voices

Type	CB (mean ± SD)	D₂ (mean ± SD)
Normal	0.17±0.012	2.66±0.28
Alaryngeal (SE+TE)	0.96±0.034	2.37±0.77
SE	0.98±0.028	2.27±0.78
TE	0.93±0.024	2.57±0.74

Figure 2 and Table 2 demonstrate that the D₂ values within each of the normal and alaryngeal voice types exhibited greater variability and deviation than the CB values. The data distribution curves of D₂ values for normal and alaryngeal voices are shown in Figure 2. These curves overlap more than the corresponding CB curves do.

As shown in Table 2, compared with the TE voices, the SE voices exhibited a higher mean and standard deviation of CB values. The data distribution curves of CB values for SE and TE voices are shown in Figure 3. These curves overlap more than the CB curves of normal and alaryngeal voices do, which suggests that the SE and TE voices have more complex dynamic behavior.

Figure 3: The chaos behavior (CB) values and normalized data distribution for SE and TE voice samples. The data corresponding to each voice type are represented by a specific shape and color.

As shown in Figure 4 and Table 2, the D₂ values for SE and TE voice samples were more widely scattered than the CB values, as is evident from their greater standard deviations.

Pairwise t test

Pairwise t tests were used to test for significant differences between the means of the normal and alaryngeal voice samples and between the means of the SE and TE voice samples. Results from the pairwise t tests and d-prime analyses are displayed in Table 3 for normal and alaryngeal voice samples and in Table 4 for SE and TE voice samples. CB was the only metric that yielded significant mean differences (P<0.01) and substantial effect sizes (D’>0.8) in both comparisons. D₂, on the other hand, did not yield significant mean differences at the P<0.01 threshold. This can be seen in Tables 3 and 4 and visualized in Figures 5 and 6.

Table 3.

Comparison Between Alaryngeal and Normal Voice Samples

Method	P	D’
CB	<0.01	29.93
D₂	0.039	0.48


Table 4.

Comparison Between TE and SE Voice Samples

Method	P	D’
CB	<0.01	1.64
D₂	0.48	0.39


Figure 5: Box-and-whisker plots of (A) CB of alaryngeal and normal voice samples, (B) D₂ of alaryngeal and normal voice samples. Boxes represent interquartile ranges. The solid red lines indicate median values. The red pluses represent outliers.

Figure 6: Box-and-whisker plots of (A) CB of SE and TE voice samples, (B) D₂ of SE and TE voice samples. Boxes represent interquartile ranges. Solid red lines indicate median values. The red pluses represent outliers.

D₂ analysis yielded no significant differences (P<0.01) either between normal and alaryngeal voice samples or between SE and TE voice samples. Additionally, d-prime statistical analysis indicated moderate to small effect sizes for both comparisons (Tables 3 and 4). The primary limitation of the D₂ method was the overlap between the D₂ values of normal and alaryngeal voice samples and between those of SE and TE voice samples (Figures 5B and 6B).

ROC

The CB results exhibited no overlap between the normal and alaryngeal voice samples, so multiclass ROC curves were constructed only for SE and TE voice samples (Figure 7). The calculated AUC values for CB and D₂ are displayed in Table 5. For the SE and TE classification, the CB method shows a much larger AUC than D₂ (Table 5).

Figure 7: ROC Plots for Classifier Performance Comparison of CB and D₂ for SE and TE Voices.

Table 5.

AUC Values of CB and D₂ Obtained from Multiclass ROC Analysis for SE and TE Voices

Comparison	CB	D₂
SE and TE	0.90	0.39
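To make the AUC values concrete, the following sketch builds an ROC curve from two lists of scores and integrates it with the trapezoidal rule. It is an illustration only: it treats the SE versus TE problem as binary with one class taken as positive, which is an assumption, and the function names are hypothetical.

```python
def roc_points(scores_pos, scores_neg):
    """ROC points (FPR, TPR) obtained by lowering the decision threshold."""
    thresholds = sorted(set(scores_pos) | set(scores_neg), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing is positive
    for t in thresholds:
        tpr = sum(s >= t for s in scores_pos) / len(scores_pos)
        fpr = sum(s >= t for s in scores_neg) / len(scores_neg)
        points.append((fpr, tpr))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as consecutive (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

With perfectly separated scores the area is 1.0, and with the class labels swapped it is 0.0. An AUC below 0.5 therefore means the metric tends to rank the two classes in the opposite order to the one assumed when choosing the positive class.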


DISCUSSION

Human phonation has recently been viewed as a chaotic system because heavily disordered voices exhibit irregular waveforms, unreliable perturbation values, and poor perceptual qualities. Studies using vocal fold models, excised larynges, and human vocal fold analysis have shown that human vocal folds produce some inherent chaos[35].

Perturbation analysis relies heavily on the regularity of vibratory cycles and therefore suffers from the notorious aperiodicity of alaryngeal voices. Consequently, perturbation analysis is inapplicable to SE and TE analysis. Before this study, few nonlinear parameters had been applied to the alaryngeal voice types to overcome this limitation, and research that successfully classified and examined the chaotic characteristics of SE and TE voices was lacking. Because of the aperiodic qualities of SE and TE voices, it was necessary to evaluate voice samples from these patients by quantifying and classifying their chaos behavior.

This has allowed for the development and application of new nonlinear dynamic parameters to analyze alaryngeal voices. The present results demonstrated that alaryngeal voices are significantly more chaotic than normal voices, mainly due to the PE segment’s use as the replacement sound source. Compared with the glottis, the PE segment does not have adequate flexibility and volitional controllability to create normal vibration. This results in highly asymmetrical and asynchronous vibration of the PE segment, accompanied by irregular voices.

The proposed CB method was compared with the previously and thoroughly tested D₂ approach. Only CB classified all voice types, separating normal from alaryngeal voices and SE from TE voices. As shown in Table 2, SE voices had the highest mean CB value, higher than both normal and TE voices, indicating more chaotic behavior. The D₂ method failed to distinguish between alaryngeal and normal voices, and the alaryngeal and SE voices had smaller mean D₂ values than normal and TE voices, respectively. This corroborates previous research findings[36]. Although D₂ converges to a finite value with increasing embedding dimension when analyzing periodic and semiperiodic signals, it does not converge to a finite value when the chaos component of the signal becomes more extensive. Under these high-chaos conditions, the signal exhibits infinite dimensionality and therefore cannot yield accurate estimates of D₂ for SE and TE voices. Figures 1 and 3 demonstrate that the CB data for each voice type (normal and alaryngeal, SE and TE) were highly localized to discrete ranges of CB values. Consequently, this yielded significantly different means and large effect sizes between the voice types. In contrast, the increased variability and fluctuations in the D₂ values obtained within and between each voice type (Figure 2 and Figure 4, respectively) indicate that D₂ is unreliable for differentiating between all voice types. Construction of multiclass ROC curves and calculation of AUC values allowed for a direct comparison of overall classifier performance between the CB and D₂ methods (Figure 7). The CB method exhibited a much larger AUC value, which indicates greater classification reliability.

The better performance of TE voice in the experimental results stems from the different air reservoir used in SE voice. TE speakers use pulmonary air from the lungs to vibrate the neoglottis, similar to normal voice production. In contrast, SE voice uses air from the upper esophagus to vibrate the PE segment. Because non-skeletal muscle fibers render the intraesophageal air pressure for neoglottal vibration unstable, controlling SE voice production appears to be more difficult than controlling TE voice production. TE speakers can be expected to have good mastery of their air supply mechanism, as they used it long before the surgery.

Compared to SE voice, TE voice is considered the gold standard for total laryngectomy patients, and it is currently the preferred method of voice rehabilitation for most patients. This study is the first to provide substantial evidence for the reliability of TE voices compared to the more chaotic behavior of SE voices.

CONCLUSION

In this study, we applied a recently developed chaos behavior measurement to both normal and alaryngeal voices. This parameter was capable of quantifying the amount of chaos behavior present in the voice signals. Chaos behavior was found to be lower in TE voices and higher in SE voices. The presented CB method was able to significantly differentiate between normal and alaryngeal voices, and between SE and TE voices.

Currently, the utility of acoustic analysis in clinical practice is limited by deficiencies in the accuracy of objective parameters; however, the results of this study suggest that CB might offer a solution to the shortcomings of previous methods. Using the values obtained from the presented method, we were able, for the first time, to classify SE and TE voices. This classification will assist clinical otolaryngologists and speech therapists in evaluating the quality of both normal and alaryngeal voices. The CB method could also assist clinicians in obtaining additional acoustical information, which may lead to superior treatment for individuals who have undergone total laryngectomies. Furthermore, by measuring changes in chaotic behavior over the course of treatment interventions, clinicians can potentially evaluate treatment quantitatively. Clinical practice is also limited by deficiencies in understanding the dynamic behavior of alaryngeal voice, and the results of this study suggest that CB might provide valuable clinical information concerning the similarities and differences across the SE and TE phonation methods, as well as normal speech.

In the future, to develop a universal training plan, our lab will collect more SE and TE voice samples and obtain more information on the different durations and ways of using alaryngeal speech. Future research by our lab will also focus on the use of different suture techniques during total laryngectomies and the resulting vocal rehabilitation outcomes. In addition, these nonlinear parameters should be applied to other forms of disordered voice to quantify the chaos present.

Funding Sources:

This study was supported by National Institutes of Health grant NIH/NIDCD (2-R01 DC006019) and NSFC81870710.

Statement of Ethics: This work was approved by the University of Wisconsin-Madison Institutional Review Board (Protocol #2016-1154), the Fudan Medical School Ethics Committee (IRB #2018019-1), and the 2nd Hospital of Tianjin Medical University Ethics Committee (IRB #KY2021K030). All subjects have given their written consent for their data to be analyzed for this study.

Footnotes

Conflict of Interest Statement: The authors have no conflicts of interest to declare.

Contributor Information

Boquan Liu, School of Humanities, Shanghai Jiao Tong University, Shanghai, China.

Fan Zhang, Otorhinolaryngology Department of the Eye, ENT Hospital affiliated with Fudan University.

Ling Chen, Otorhinolaryngology Department of the Eye, ENT Hospital affiliated with Fudan University.

Matthew A. Silverman, Department of Surgery, Division of Otolaryngology Head and Neck Surgery, University of Wisconsin – Madison, Madison, Wisconsin

Hengxin Liu, Minzu University of China, Beijing, China.

Dehui Fu, ENT Department, The 2nd Hospital of Tianjin Medical University.

Yongwang Huang, ENT Department, The 2nd Hospital of Tianjin Medical University.

Jing Pan, ENT Department, The 2nd Hospital of Tianjin Medical University.

Jack J. Jiang, Department of Surgery, Division of Otolaryngology Head and Neck Surgery, University of Wisconsin – Madison, Madison, Wisconsin.

Data Availability Statement:

All data generated or analyzed during this study are included in this article. Further enquiries can be directed to the corresponding author.

REFERENCES

1. Kramp B, & Dommerich S (2009). Tracheostomy cannulas and voice prosthesis. GMS Current Topics in Otorhinolaryngology, Head and Neck Surgery, 8, Doc05.
2. Elmiyeh B, Dwivedi RC, Jallali N, Chisholm EJ, Kazi R, Clarke PM, Rhys-Evans PH. (2010). Surgical voice restoration after total laryngectomy: An overview. Indian Journal of Cancer, 47(3), 239–247.
3. Ng ML. (2019). The use of the Lombard Effect in Improving Alaryngeal Speech. Journal of Voice, 35(1), 18–28.
4. Ng ML, Kwok C-LI, and Chow S-FW (1997) Speech Performance of Adult Cantonese-Speaking Laryngectomees Using Different Types of Alaryngeal Phonation, Journal of Voice, 11(3), 338–344.
5. Guttman MR (1932). “Rehabilitation of the voice in laryngectomized patients”. Arch Otolaryngol. 15, 478–488.
6. Malik T, Bruce I, & Cherry J (2007). Surgical complications of tracheo-oesophageal puncture and speech valves. Current Opinion in Otolaryngology & Head and Neck Surgery, 15(2), 117–122.
7. Singer S, Merbach M, Dietz A, & Schwarz R (2007). Psychosocial determinants of successful voice rehabilitation after laryngectomy. Journal of the Chinese Medical Association: JCMA, 70(10), 407–423.
8. Williams S, Watson B (1987) Speaking proficiency variations according to method of alaryngeal voicing. Laryngoscope, 97: 737–739.
9. Pindzola R, Cain B (1988) Acceptability ratings of tracheo-oesophageal speech. Laryngoscope, 98: 394–397.
10. Max L, Steurs W, & De Bruyn. (1996). Vocal capacities in esophageal and tracheoesophageal speakers. The Laryngoscope, 106(1 Pt 1), 93–96.
11. Searl J (2019). Alaryngeal Speech Aerodynamics: Lower and Upper Airway Considerations. In: Doyle P (eds) Clinical Care and Rehabilitation in Head and Neck Cancer. Springer.
12. Ng ML, Liu H, Zhao Q, Lam PKY. (2009) Long-term average spectral characteristics of Cantonese alaryngeal speech. Auris Nasus Larynx, 36, 571–577.
13. Robbins J (1984) Acoustic differentiation of laryngeal, esophageal, and trachea-oesophageal speech. Journal of Speech and Hearing Research, 27: 577–585.
14. Robbins J, Fisher H, Blom E, Singer M (1984) A comparative acoustic study of normal, esophageal and tracheo-oesophageal speech production. Journal of Speech and Hearing Disorders, 49, 202–210.
15. Ainsworth WA, Singh W (1992) Perceptual comparison of neo-glottal, oesophageal and normal speech. Folia Phoniatrica, 44, 297–307.
16. Sanderson R, Anderson S, Denholm S, & Kerr A (1993). The assessment of alaryngeal speech. Clinical Otolaryngology and Allied Sciences, 18(3), 181–183.
17. Van As C, Hilgers F, Verdonck-de Leeuw I, & Koopmans-van Beinum F (1998). Acoustical analysis and perceptual evaluation of tracheoesophageal prosthetic voice. Journal of Voice, 12(2), 239–248.
18. Debruyne F, Delaere P, Wouters J, Uwents P (1994) Acoustic analysis of tracheo-oesophageal versus oesophageal speech. The Journal of Laryngology and Otology, 108(4), 325–328.
19. van As-Brooks Corina J., Beinum Florien J. Koopmans-van, Pols Louis C.W., and Hilgers Frans J.M., (2006) Acoustic Signal Typing for Evaluation of Voice Quality in Tracheoesophageal Speech, Journal of Voice, 20(3), 355–368.
20. Clapham Renee P., van As-Brooks Corina J., van Son Rob J. J. H., Hilgers Frans J. M., and van den Brekel Michiel W. M., (2015) The Relationship Between Acoustic Signal Typing and Perceptual Evaluation of Tracheoesophageal Voice Quality for Sustained Vowels. Journal of Voice, 29(4), 517.e23–517.e29.
21. MacCallum JK, Cai L, Zhou L, Zhang Y, and Jiang JJ. (2009) Acoustic Analysis of Aperiodic Voice: Perturbation and Nonlinear Dynamic Properties in Esophageal Phonation. Journal of Voice, 23(3), 283–290.
22. Yan Nan, Ng Manwa L., Wang Dongning, Zhang Lan, Chan Victor, and Ho Rerrario S., (2013) “Nonlinear Dynamical Analysis of Laryngeal, Esophageal and Tracheoesophageal Speech of Cantonese,” Journal of Voice, 27(1), 101–110.
23. Yan N, Ng ML, Wang D, Chan V, Zhang L. (2011) Nonlinear dynamics of voices in esophageal phonation. Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2011:2732–2735.
24. Jiang JJ, Zhang Y, McGilligan C (2006) Chaos in voice, from modeling to measurement. J. Voice, 20(1), 2–17.
25. Moon FC. (1992) Chaotic and fractal dynamics: an introduction for applied scientists and engineers. New York: Wiley.
26. Zhang Y, McGilligan C, Zhou L, Vig M, Jiang JJ. (2004) Nonlinear dynamic analysis of voices before and after surgical excision of vocal polyps. J. Acoust. Soc. Am, 115(5), 2270–2277.
27. Zhang Y, Jiang JJ, Biazzo L, Jorgensen M (2005) Perturbation and nonlinear dynamic analyses of voices from patients with unilateral laryngeal paralysis. J. Voice, 19(4), 519–528.
28. Tao C, Jiang JJ. (2008) Chaotic component obscured by strong periodicity in voice production system. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 77(6), 061922.
29. Packard NH, Crutchfield JP, Farmer JD, Shaw RS. (1980). Geometry from a Time Series. Phys Rev Lett. 45(9), 712–716.
30. Awan SN, Roy N, & Jiang JJ (2010). Nonlinear dynamic analysis of disordered voice: The relationship between the correlation dimension (D2) and pre-/post-treatment change in perceived dysphonia severity. Journal of Voice, 24, 285–293.
31. Fraser AM, & Swinney HL (1986). Independent coordinates for strange attractors from mutual information. Physical Review A, 33, 1134–1140.
32. Gottwald GA, Melbourne I. (2009) On the validity of the 0–1 test for chaos. Nonlinearity. 22(6), 1367–1382.
33. Gottwald GA, & Melbourne I (2009). On the implementation of the 0–1 test for chaos. SIAM Journal on Applied Dynamical Systems, 8, 129–145.
34. Kelley K, Preacher KJ. (2012) On effect size. Psychol Methods. 17(2), 137–152.
35. Jiang JJ, Zhang Y, Ford CN (2003) Nonlinear dynamics of phonations in excised larynx experiments. J Acoust Soc Am, 114(4):2198–2205.
36. Liu B, Polce E, Sprott JC, and Jiang JJ, (2018) Applied Chaos Level Test for Validation of Signal Conditions Underlying Optimal Performance of Voice Classification Methods, Journal of Speech, Language, and Hearing Research, 61(5), 1130–1139.

Associated Data

Data Availability Statement

All data generated or analyzed during this study are included in this article. Further enquiries can be directed to the corresponding author.
