Abstract
Background
Amyotrophic Lateral Sclerosis (ALS) is a degenerative neurologic disease with no definitive biomarkers for early detection. This paper discusses the use of acoustic analysis of sustained vowel phonations (SVP) and machine learning in ALS detection.
Methods
An SVP corpus of 128 recordings (64 /a/ and 64 /i/) from 31 patients with ALS and 33 healthy controls (HC) was used. A total of 131 acoustic features, including jitter, shimmer, Mel-Frequency Cepstral Coefficients (MFCCs), and the Pathological Vibrato Index (PVI), were extracted. A LightGBM (Light Gradient Boosting Machine) model was built and tuned using 5-fold cross-validation to separate ALS cases from controls. Model performance and feature importance were evaluated.
Results
The model showed strong predictive performance, yielding an RMSLE of 0.162, with most predictions closely matching the actual diagnoses. The top-ranked features were S55_i, CCI(2), and dCCa(12), which consistently appeared at the top of the importance ranking, indicating their role in ALS detection. The PVI emerged as a significant biomarker, with high values strongly associated with ALS diagnoses. However, the multimodal distribution of the predicted values indicated some limitations in generalization.
Conclusion
This paper demonstrates the applicability of acoustic analysis and machine learning for early ALS detection. The proposed method provides a low-cost, non-invasive approach to ALS diagnosis with potential for application in telemedicine and clinical settings. Future research must expand datasets and integrate additional diagnostic modalities to improve the model's robustness and clinical translation.
Keywords: ALS, Acoustic analysis, SVP, Machine learning, PVI, Feature importance
Highlights
• SVP with machine learning detects early ALS, especially bulbar-onset, using voice acoustic biomarkers.
• 131 acoustic features, including jitter, shimmer, MFCCs, and PVI, achieved high LightGBM accuracy.
• S55_i, CCI(2), and PVI_a ranked high in importance, suggesting use as non-invasive ALS biomarkers.
• Low-cost, non-invasive, smartphone-ready method for remote ALS screening in resource-poor settings.
1. Introduction
Speech is produced through the coordinated movements of the articulatory, resonatory, phonatory, and respiratory subsystems. Impairments in one or more of these speech subsystems can reduce speech comprehensibility (1,2). Modeling the independent and combined effects of subsystem impairment on speech comprehensibility has been a notable challenge, but it is important for improving speech motor assessment and identifying speech treatment targets (3,4).
Amyotrophic lateral sclerosis (ALS) is a fatal disease involving the upper and lower motor neurons. There are two main forms of ALS, which differ by onset: the spinal form (first symptoms appear in the arms and legs) and the bulbar form (voice and/or swallowing difficulties are usually the first signs). Progressive bulbar motor impairment due to ALS leads to worsening of speech and swallowing function (3). Impairments in speech production, articulation, and phonation due to neurological disorders are referred to as dysarthria. Dysarthria develops in more than 80 % of individuals affected by ALS at some time during the disease's course (5).
Currently, the diagnosis of ALS is based on clinical observation of upper and lower motor neuron degeneration in the absence of other causes. Because ALS lacks clinically typical markers, confirming the diagnosis takes 12 months on average (6). Over the last few years, objective assessment of voice and speech has gained traction as a means of finding early signs of neurological diseases (7–9). This can be explained by the fact that speech is produced through complex articulatory movements that require precise coordination and timing, and is therefore very sensitive to disruptions in the peripheral or central nervous system (10,11).
Bulbar ALS, which affects speech and swallowing, is eventually characterized by loss of speech comprehensibility and the ability to swallow (3,12,13). The serious impact of bulbar motor impairment on quality of life is well-documented (14,15).
This underscores the need to search for sensitive and specific markers of bulbar disease onset and progression. The current standard evaluation of bulbar function involves clinician-based assessments of speech intelligibility and speaking rate. Despite its widespread clinical use, speech intelligibility is not sensitive to the early stages of the disease; changes in speech intelligibility occur late in the disease course and long after the onset of bulbar motor symptoms (1,16–20). However, slowing of speech appears to precede decreases in speech intelligibility, which tends to decrease rapidly once speaking rate slows to approximately 120 words per minute (WPM) (13,19,21). Therefore, the slowing of speaking rate to 120 WPM signifies the onset of the rapid decline phase of speech intelligibility (i.e., intelligibility <85 %). In contrast to the normal speech phase (i.e., intelligibility between 100 and 97 %) and the slow disease phase (i.e., intelligibility between 96 and 86 %), which correspond to slow declines in intelligibility, the rapid disease phase is characterized by precipitous decreases in intelligibility and the eventual loss of speech communication within a short timeframe (3,13).
The primary goal of this work is to investigate the acoustic features of sustained vowel phonations (SVP), specifically the vowels /a/ and /i/, for the classification of ALS patients. The voice database was obtained from the Republican Research and Clinical Center of Neurology and Neurosurgery in Minsk, Belarus. It comprises 128 SVP recordings, including 64 of vowel /a/ and 64 of vowel /i/, uttered by 64 speakers, of whom 31 were ALS patients and 33 were HC. Each speaker was instructed to produce the /a/ and /i/ vowels at a comfortable pitch and loudness for as long as possible. The data are nearly balanced, with 48 % pathological voices (ALS) and 52 % healthy voices (HC). This study conducts an acoustic analysis of the SVP to detect early signs of ALS, particularly its bulbar form, and develops a framework for automated ALS detection. We focus on SVP as a means of capturing subtle changes in speech production that may precede more overt clinical symptoms, enabling earlier diagnosis and intervention.
2. Method
2.1. Dataset description
This study utilized the Minsk2020 ALS dataset, which includes clinical data from patients with ALS. The dataset comprises 128 SVP recordings, 64 of vowel /a/ and 64 of vowel /i/. These were collected from 31 subjects with a diagnosis of ALS and 33 healthy control (HC) subjects. Participants were instructed to sustain the /a/ and /i/ vowels at a comfortable pitch and volume for as long as possible. The dataset is approximately balanced, with 48 % pathological voices (ALS) and 52 % healthy voices (HC). Voice recordings were captured using smartphones with standard headsets at a 44.1 kHz sampling rate and stored as 16-bit uncompressed PCM files. The average recording duration was 3.7 ± 1.5 s for the HC group and 4.1 ± 2.0 s for the ALS group.
2.2. Clinical profile of participants
The study compared data from 31 ALS patients (20 males, 11 females; mean age 59.4 years, range 39–70 years) and 33 HC (13 males, 20 females; mean age 53.8 years, range 34–80 years). The mean disease duration for ALS patients at the time of recording was 20.2 months (range 5–58 months). Disease onset was spinal in 18 patients (58 %) and bulbar in 13 patients (42 %). Clinically apparent bulbar signs were present in 24 (77 %) of the ALS patients. Healthy control participants were screened for the absence of any history of neurological, respiratory, or otolaryngological disease, and were not on medications that would influence vocal characteristics.
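As a quick consistency check, the proportions quoted above can be recomputed from the raw counts. The sketch below uses only the counts given in the text; the helper name `pct` and the dictionary keys are illustrative and not part of the original analysis.

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)

# Counts taken from the text above.
n_als, n_hc = 31, 33
n_speakers = n_als + n_hc

summary = {
    "ALS share of speakers": pct(n_als, n_speakers),
    "HC share of speakers": pct(n_hc, n_speakers),
    "spinal onset (of ALS)": pct(18, n_als),
    "bulbar onset (of ALS)": pct(13, n_als),
    "bulbar signs present (of ALS)": pct(24, n_als),
}

for name, value in summary.items():
    print(f"{name}: {value} %")
```

The rounded values agree with the 48 %, 52 %, 58 %, 42 %, and 77 % reported in the text.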
2.3. Acoustic feature extraction
From each sustained vowel phonation (/a/ and /i/), a set of 131 acoustic features was extracted. These features describe different aspects of vocal production and are grouped into vocal fold irregularity measures (e.g., jitter, shimmer), noise components (e.g., Harmonic-to-Noise Ratio (HNR), Glottal-to-Noise Excitation Ratio (GNE)), measures of phonatory range (e.g., Phonatory Frequency Range (PFR), Pitch Period Entropy (PPE)), and spectral features (e.g., Mel-Frequency Cepstral Coefficients (MFCCs) and their temporal derivatives (ΔMFCCs)). The Pathological Vibrato Index (PVI) was also added as a sensitive marker for frequency modulations, reflecting bulbar involvement in ALS. Other harmonic-related parameters and joint parameters describing cross-vowel effects were also included. For in-depth descriptions of individual features, see the Supplementary Material. The feature design generally adhered to the structure presented by Vashkevich et al. (22), with adaptations relevant to the clinical setting and the machine learning algorithm (23).
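To make the perturbation measures concrete, the following minimal sketch computes one common variant of jitter and shimmer (the "local" form: mean absolute difference between consecutive cycles, normalized by the mean). It assumes the glottal periods and per-cycle peak amplitudes have already been estimated from the recording; that estimation step, the function names, and the toy values are assumptions for illustration. The exact feature variants used in this study are described in the Supplementary Material and may differ.

```python
from statistics import mean

def _local_perturbation(values):
    """Mean absolute difference between consecutive cycle values,
    divided by the mean value (returned as a fraction, not a percent)."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values[:-1], values[1:])]
    return mean(diffs) / mean(values)

def local_jitter(periods_s):
    """Cycle-to-cycle variation of the glottal period (seconds)."""
    return _local_perturbation(periods_s)

def local_shimmer(amplitudes):
    """Cycle-to-cycle variation of the per-cycle peak amplitude."""
    return _local_perturbation(amplitudes)

# Toy values for illustration only (not data from this study).
periods = [0.0100, 0.0102, 0.0099, 0.0101]
amplitudes = [0.80, 0.78, 0.82, 0.79]
print(f"jitter  = {100 * local_jitter(periods):.2f} %")
print(f"shimmer = {100 * local_shimmer(amplitudes):.2f} %")
```

A perfectly periodic signal yields zero for both measures; larger values indicate greater cycle-to-cycle irregularity of vocal fold vibration.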
2.4. Feature selection and model overview
We used feature selection methods, namely Least Absolute Shrinkage and Selection Operator (LASSO) (24), Relief (25), ReliefF (26), and Quality of Variation (QoV), to reduce dimensionality and identify the most informative features for ALS prediction. For the classification problem, we created a LightGBM (Light Gradient Boosting Machine) model. LightGBM is an efficient gradient boosting framework that builds an ensemble of decision trees, improving the model iteratively by minimizing a loss function. The model was tuned with 5-fold cross-validation to evaluate its generalization performance on unseen data. The Root Mean Squared Logarithmic Error (RMSLE) was used as the evaluation metric because it is robust against outliers and apt for skewed target variables. Additional information on feature selection methods, model setup, and cross-validation procedures can be found in the Supplementary Material.
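The 5-fold cross-validation scheme can be illustrated with a small, dependency-free sketch. It makes two assumptions that the text does not state explicitly: that there is one feature row per speaker (64 rows, so that a speaker's /a/ and /i/ recordings never end up on opposite sides of a split), and that folds are stratified by diagnosis. The function name `stratified_kfold` is hypothetical; the actual procedure is described in the Supplementary Material.

```python
import random

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which each fold keeps
    roughly the same class proportions as the full label list."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx

# Labels assumed per speaker: 1 = ALS (31 speakers), 0 = HC (33 speakers).
labels = [1] * 31 + [0] * 33
for fold, (train_idx, test_idx) in enumerate(stratified_kfold(labels), start=1):
    n_als = sum(labels[i] for i in test_idx)
    print(f"fold {fold}: train={len(train_idx)}, test={len(test_idx)}, ALS in test={n_als}")
```

In each iteration a model would be fit on `train_idx` and scored on `test_idx`, and the five scores averaged; with only 64 speakers, each test fold holds 12 to 14 speakers, so per-fold scores can vary noticeably.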
3. Results
3.1. Overall performance
The LightGBM model performed well in identifying ALS patients. The overall Root Mean Squared Logarithmic Error (RMSLE) obtained was 0.162. The scatter plot of predicted vs. actual ALS diagnosis (Fig. 1) showed clear agreement, with the majority of the data points clustering closely along the diagonal line. This indicates correct classification for both ALS and non-ALS cases, with very few examples showing slight misclassifications.
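For reference, RMSLE is the square root of the mean squared difference between log(1 + prediction) and log(1 + target). The sketch below is a generic implementation of that definition, not the code used in this study; the function name `rmsle` and the toy inputs are illustrative.

```python
import math

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error:
    sqrt(mean((log(1 + pred) - log(1 + true))**2)).
    Inputs must be non-negative and of equal, non-zero length."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must have the same non-zero length")
    squared = [
        (math.log1p(p) - math.log1p(t)) ** 2
        for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Toy example (invented values): labels are 0 = HC, 1 = ALS,
# predictions are scores between 0 and 1.
y_true = [0, 0, 1, 1]
y_pred = [0.1, 0.3, 0.8, 0.6]
print(f"RMSLE = {rmsle(y_true, y_pred):.3f}")
```

Note that when both labels and predictions lie in [0, 1], the per-sample log difference is at most ln 2 ≈ 0.693 (a prediction of 0 for a true label of 1, or the reverse), so RMSLE is bounded by that value in this setting.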
3.2. Key feature significance
A feature importance analysis revealed that several acoustic parameters contributed substantially to the model's predictive power across multiple instances. The most salient features included S55_i, CCI(2), and dCCa(12). Notably, the Pathological Vibrato Index (PVI_a) was found to be a significant biomarker that showed a high degree of differentiation between ALS patients and healthy controls (Fig. 2). High PVI_a values were strongly associated with an increased likelihood of an ALS diagnosis, which points to its potential clinical utility for early detection, particularly in those with bulbar symptoms.
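One simple way to quantify how well a single feature such as PVI_a separates the two groups is the area under the ROC curve computed directly from the feature values, which equals the probability that a randomly chosen ALS value exceeds a randomly chosen HC value. The sketch below is a generic illustration, not an analysis performed in this study; the function name `auroc` is hypothetical and the feature values are invented placeholders.

```python
def auroc(pos_values, neg_values):
    """Probability that a random positive value exceeds a random
    negative value, counting ties as one half (pairwise definition)."""
    if not pos_values or not neg_values:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in pos_values:
        for n in neg_values:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_values) * len(neg_values))

# Invented placeholder values, NOT measurements from the dataset.
pvi_als = [0.42, 0.55, 0.31, 0.60]
pvi_hc = [0.12, 0.20, 0.35, 0.08]
print(f"single-feature AUROC = {auroc(pvi_als, pvi_hc):.2f}")
```

A value of 1.0 means the feature separates the groups perfectly, while 0.5 means it carries no discriminative information on its own.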
4. Discussion
The goal of this study was to demonstrate the feasibility of using acoustic analysis of sustained vowel phonations combined with machine learning techniques for the early detection of ALS. The LightGBM model used in this research had strong predictive power, as reflected in an RMSLE of 0.162 and the close alignment between predicted and actual diagnoses shown in Fig. 1.
The high importance of features such as the PVI, S55_i, and CCI(2) reflects their key role as acoustic biomarkers for early ALS detection.
Our results agree with those of Vashkevich et al. (22,27), who also highlighted the value of phonation features and the diagnostic potential of analyzing sustained vowel production, despite the application of different datasets and classification strategies. Also, these findings are consistent with studies prioritizing the sensitivity of motor speech features in neurodegenerative diseases and the impact of biomechanical modeling on speech production in ALS (10).
Differences in highlighted features and performance metrics (RMSLE vs. AUROC) are primarily methodological in nature, stemming in particular from the choice of speech task. Sustained vowel phonations, as used in the present study and by Vashkevich et al. (22,27), are easier to elicit and inform primarily about the integrity of the phonatory and respiratory systems, with features like PVI being highly relevant to capturing subtle vocal fold instabilities. Connected speech, by contrast, as used in the Winterlight Labs study, provides more information about articulatory precision, prosody, and speaking rate, which are likewise severely compromised in ALS (28). Another recent investigation applied attention-based deep learning to remotely recorded sentences to assess dysarthria severity and achieved a high R² of 0.92 (29). That study differs fundamentally from the present one in that it applies deep learning to automatically learn features from raw audio, rather than relying on a predefined set of hand-engineered features. The larger dataset in the deep learning study (2102 recordings from 125 speakers) also most likely contributed to its strong performance and ability to pick up on intricate patterns, which may be more challenging for models fit on smaller datasets such as the 128 SVP recordings in the current study. Advances in machine learning techniques, from LightGBM to deep learning, allow for ever more refined pattern detection, which may underlie variable levels of predictive strength and feature importance across studies (30).
The strengths of the study are its non-invasive, low-cost, and telemedicine-friendly approach, which makes it practical for large-scale application. Its emphasis on early detection of ALS is critical for early intervention, and the recognition of PVI as a distinct biomarker is an important contribution. Its limitations are a relatively small dataset (64 speakers), which could limit generalization; the use of only sustained vowel phonations, which may not reflect the full range of dysarthric symptoms found in more elaborate speech; and the use of hand-crafted features, which may not exploit the pattern recognition potential of state-of-the-art deep learning models.
This study has important clinical relevance. The inexpensive and non-invasive technique may prove to be a valuable early screening test for ALS, helping to avoid delays in diagnosis and enable earlier initiation of treatment. It provides a useful means for clinicians to objectively evaluate risk and monitor disease progression, especially for bulbar symptoms. The obvious association of PVI with ALS gives a measurable parameter for guiding clinical decision-making. This could ultimately translate to better patient outcomes by enabling earlier access to therapies and more prompt management of communication and swallowing impairment.
To build on this strategy, future studies must enlarge and diversify datasets to include more heterogeneous patient groups and stages of ALS. Combination of multimodal data with other non-invasive biomarkers may yield more complete diagnoses. The inclusion of a larger range of speech tasks, rather than solely sustained vowels, would provide a more complete picture of dysarthric features. The use of more sophisticated deep learning architectures may enhance predictive capability. Most importantly, longitudinal studies must confirm the results over time and examine the long-term effect of early detection, ultimately opening the door to real-time, clinically viable systems for ALS management.
5. Conclusion
This research demonstrated the feasibility of combining acoustic evaluation of sustained vowel phonations with machine learning as a method for early detection of Amyotrophic Lateral Sclerosis. The LightGBM model showed good predictive performance and good discrimination between healthy controls and ALS patients, driven by features such as S55_i, CCI(2), dCCa(12), and especially the PVI, which emerged as a major biomarker. This simple and non-invasive method holds considerable promise for ALS screening early in the disease course, in both clinical practice and telemedicine. Subsequent research emphasizing data expansion, inclusion of other diagnostic modalities, and creation of real-time systems will further advance its robustness and clinical application, eventually leading to enhanced treatment and patient outcomes.
CRediT authorship contribution statement
Zahra Farrokhi: Writing – review & editing, Writing – original draft, Methodology, Investigation. Seyed Amirali Zakavi: Writing – review & editing, Methodology, Investigation, Data curation. Arian Sarafraz: Writing – review & editing, Writing – original draft, Methodology, Investigation. Maryam Valifard: Writing – review & editing, Resources, Methodology. Salar Yousefzadeh: Writing – original draft, Software, Investigation. Zahra Mashhadi Tafreshi: Writing – review & editing, Writing – original draft, Investigation. Omid Anbiyaee: Writing – review & editing, Writing – original draft, Methodology. Navid Rostami: Writing – review & editing, Validation, Methodology. Mahsa Asadi Anar: Visualization, Supervision, Software, Project administration, Methodology, Investigation, Formal analysis, Data curation. Niloofar Deravi: Writing – original draft, Supervision, Software, Formal analysis, Data curation, Conceptualization.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Funding
None.
Declaration of competing interest
The authors declare that they have no competing interests.
Acknowledgment
The authors would like to thank the researchers whose work was included in this study.
Footnotes
Supplementary data to this article can be found online at https://doi.org/10.1016/j.ensci.2025.100579.
Contributor Information
Mahsa Asadi Anar, Email: Mahsa.boz@gmail.com.
Niloofar Deravi, Email: niloofarderavi@sbmu.ac.ir.
Appendix A. Appendices
Fig. 1.
Scatter plot of predicted vs. actual ALS diagnoses.
Red markers represent ALS cases, and blue markers represent healthy controls (HC). The diagonal dashed line indicates perfect prediction. Most points are clustered near the top-right (correct ALS) and bottom-left (correct HC), indicating strong classification performance with some mild misclassifications near the decision boundary.
Fig. 2.
Relationship Between Pathological Vibrato Index (PVI_a) and ALS Diagnosis.
This scatter plot illustrates the distribution of PVI_a values in relation to ALS diagnosis labels (0 = Healthy Control, 1 = ALS). ALS patients exhibit substantially higher PVI_a values compared to controls, indicating increased vibrato irregularities in sustained phonation. This clear separation supports the clinical utility of PVI_a as a discriminative acoustic biomarker for early detection of bulbar involvement in ALS.
Appendix B. Supplementary data
Supplementary material
Data availability
Data are available upon request from the corresponding author.
References
- 1. Kent R.D., Kent J.F., Weismer G., Sufit R.L., Rosenbek J.C., Martin R.E., et al. Impairment of speech intelligibility in men with amyotrophic lateral sclerosis. J. Speech Hear. Disord. 1990;55(4):721–728. doi: 10.1044/jshd.5504.721.
- 2. Kent R.D., Weismer G., Kent J.F., Rosenbek J.C. Toward phonetic intelligibility testing in dysarthria. J. Speech Hear. Disord. 1989;54(4):482–499. doi: 10.1044/jshd.5404.482.
- 3. Green J.R., Yunusova Y., Kuruvilla M.S., Wang J., Pattee G.L., Synhorst L. Bulbar and speech motor assessment in ALS: Challenges and future directions. Amyotroph. Lateral Sclerosis Frontotemporal Degenerat. 2013;14(7–8):494–500. doi: 10.3109/21678421.2013.817585.
- 4. Rong P., Yunusova Y., Wang J., Green J.R. 2015. Predicting early bulbar decline in amyotrophic lateral sclerosis: A speech subsystem approach.
- 5. Duffy J.R. Elsevier Health Sciences; 2012. Motor Speech Disorders: Substrates, Differential Diagnosis, and Management.
- 6. Iwasaki Y., Ikeda K., Kinoshita M. The diagnostic pathway in amyotrophic lateral sclerosis. Amyotrophic Lateral Sclerosis Motor Neuron Disorders. 2001;2(3):123–126. doi: 10.1080/146608201753275571.
- 7. Benba A., Jilbab A., Hammouch A. Discriminating between patients with Parkinson’s and neurological diseases using cepstral analysis. IEEE Trans. Neural. Syst. Rehabil. Eng. 2016;24(10):1100–1108. doi: 10.1109/TNSRE.2016.2533582.
- 8. Orozco-Arroyave J.R., Hönig F., Arias-Londoño J., Vargas-Bonilla J., Daqrouq K., Skodda S. Automatic detection of Parkinson’s disease in running speech spoken in three different languages. J. Acoust. Soc. Am. 2016;139(1):481–500. doi: 10.1121/1.4939739.
- 9. Rusz J., Cmejla R., Ruzickova H., Ruzicka E. Quantitative acoustic measurements for characterization of speech and voice disorders in early untreated Parkinson’s disease. J. Acoust. Soc. Am. 2011;129(1):350–367. doi: 10.1121/1.3514381.
- 10. Gómez-Vilda P., Londral A.R.M., Rodellar-Biarge V., Ferrández-Vicente J.M., de Carvalho M. Monitoring amyotrophic lateral sclerosis by biomechanical modeling of speech production. Neurocomputing. 2015;151:130–138.
- 11. Guerra E.C., Lovey D.F., editors. Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. 2003. A modern approach to dysarthria classification.
- 12. Devine M.S., Farrell A., Woodhouse H., McCombe P.A., Henderson R.D. A developmental perspective on bulbar involvement in amyotrophic lateral sclerosis. Amyotroph. Lateral Sclerosis Frontotemporal Degenerat. 2013;14(7–8):638–639. doi: 10.3109/21678421.2013.812663.
- 13. Yorkston K.M. Speech deterioration in amyotrophic lateral sclerosis: implications for the timing of intervention. J. Med. Speech-Lang. Pathol. 1993;1:35–46.
- 14. Del Aguila M., Longstreth W., McGuire V., Koepsell T., Van Belle G. Prognosis in amyotrophic lateral sclerosis: a population-based study. Neurology. 2003;60(5):813–819. doi: 10.1212/01.wnl.0000049472.47709.3b.
- 15. Mitsumoto H., Bene M.D. Improving the quality of life for people with ALS: the challenge ahead. Amyotroph. Lateral Scler. Motor Neuron Disorders. 2000;1(5):329–336. doi: 10.1080/146608200300079464.
- 16. Ball L.J., Beukelman D.R., Pattee G.L. Timing of speech deterioration in people with amyotrophic lateral sclerosis. J. Med. Speech Lang. Pathol. 2002;10(4):231–236.
- 17. DePaul R., Brooks B.R. Multiple orofacial indices in amyotrophic lateral sclerosis. J. Speech, Lang. Hear. Res. 1993;36(6):1158–1167. doi: 10.1044/jshr.3606.1158.
- 18. Mefferd A.S., Green J.R., Pattee G. A novel fixed-target task to determine articulatory speed constraints in persons with amyotrophic lateral sclerosis. J. Commun. Disord. 2012;45(1):35–45. doi: 10.1016/j.jcomdis.2011.09.002.
- 19. Niimi M.N., Seiji N. Changes over time in dysarthric patients with amyotrophic lateral sclerosis (ALS): a study of changes in speaking rate and maximum repetition rate (MRR). Clin. Linguist. Phonet. 2000;14(7):485–497.
- 20. Yunusova Y., Green J., Ball L., Mefferd A., Pattee G., editors. Proceedings of the Annual Speech Language and Hearing Convention. 2007. Articulatory correlates of speech intelligibility in ALS.
- 21. Ball L.J., Beukelman D.R., Pattee G.L. Acceptance of augmentative and alternative communication technology by persons with amyotrophic lateral sclerosis. Augment. Alternat. Commun. 2004;20(2):113–122.
- 22. Vashkevich M., Rushkevich Y. Classification of ALS patients based on acoustic analysis of sustained vowel phonations. Biomed. Signal Proc. Control. 2021;65.
- 23. Baken R.J. 1987. Clinical measurement of speech and voice.
- 24. Tibshirani R. Regression shrinkage and selection via the lasso. J. Royal Stat. Soc. Series B: Stat. Methodol. 1996;58(1):267–288.
- 25. Kononenko I. European Conference on Machine Learning. Springer; 1994. Estimating attributes: Analysis and extensions of RELIEF.
- 26. Robnik-Šikonja M., Kononenko I. Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 2003;53:23–69.
- 27. Vashkevich M., Petrovsky A., Rushkevich Y., editors. 2019 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA) 2019. Bulbar ALS Detection Based on Analysis of Voice Perturbation and Vibrato.
- 28. Simmatis L.E.R., Robin J., Spilka M.J., Yunusova Y. Detecting bulbar amyotrophic lateral sclerosis (ALS) using automatic acoustic analysis. Biomed. Eng. Online. 2024;23(1):15. doi: 10.1186/s12938-023-01174-z.
- 29. Merler M., Agurto C., Peller J., Roitberg E., Taitz A., Trevisan M.A., et al. Clinical assessment and interpretation of dysarthria in ALS using attention based deep learning AI models. NPJ Digital Med. 2025;8(1):260. doi: 10.1038/s41746-025-01654-7.
- 30. Stefanovska E., Pepelnjak T. Optimising predictive accuracy in sheet metal stamping with advanced machine learning: a LightGBM and neural network ensemble approach. Adv. Eng. Inform. 2025;65.