Table 1.
First Author and Publication Year | Number of participants with ALS (Number with Bulbar Onset) | Length of follow-up | Frequency of data collection | Speech sample acquisition method | Device Brand | Speech features assessed | Additional assessments | Mean Total ALS-FRS(R) (Mean Bulbar Sub-score) | Results/Conclusions |
---|---|---|---|---|---|---|---|---|---|
Agurto, 2019 [28] | 42 (34) | 11 months | Once weekly (ALS-FRS/FVC/SVC at start point) | App | Help us Answer ALS | Pitch variation, prosody features, vowel space, vowel quality and noise measurements, mel frequency cepstral coefficients (MFCC), tremor features including tremor frequency and tremor amplitude, spectral features (including spectral slope, the maximum energy, the frequency where the maximum value is obtained, as well as the median and the energy IQR of the long-term average spectra) | ALS-FRS(R), FVC, SVC | F-34.70; M-40.50 (-) | AR and MFCC highly correlated with initial speech score. Best speech features obtained from the reading task. Initial scores unable to determine disease trajectory but extracted features could. Progression related more to onset type than initial score. |
Berry, 2019 [27] | 23 (4) | 24 weeks | Twice weekly for passage reading. 4 days per week for 3 randomly assigned questions. ALS-FRS(R) in full once weekly. Clinician-administered ALS-FRS(R) at 6, 12 and 24 weeks. | App | Beiwe | Speech and pause variables | ALS-FRS(R), VC, ALS-CBS | 34.00 (-) | The correlation between clinic-assessed and app ALS-FRS(R) scores taken at a single time point was high at 0.93. High correlation for trajectory of decline between clinic assessment and app-based assessment. An increase in pause duration was demonstrated over time. |
Buder, 1996 [31] | 1 (1) | 2 years | 3 assessments completed | Audiocassette | - | FORMOFFA (FOR = FORmants; MO = MOments; FF = Fundamental Frequency; A = Amplitude). | None | - | Reduction in amplitude (flattening) and reduced spectral mean variability (M1). |
Cebola, 2023 [53] | 40 (-) | Single-time point | - | Smartphone | - | Window-based features (temporal, spectral and statistical). Full signal features (silence features and formant features). | None | - | Sample-based classification: Best overall data was from the vowel task. Best results came from use of the whole feature set. The best results were achieved with SVM for both datasets. Patient-based classification: Achieved better results than sample-based, with the best results achieved using the complete feature set. Accuracy and F1 scores also improved. Accuracy values of over 0.90 using the full speech feature set. |
Chiaramonte, 2019 [10] | 22 (22) | 3 months | Monthly | Multi-Dimensional Voice Programme (MDVP) | Multi-Dimensional Voice Programme (MDVP) | F0; long-term control of amplitude (vAM); jitter; shimmer; Noise to harmonic ratio (NHR); Amplitude tremor intensity index (ATRI); Frequency tremor intensity index (FTRI); Voice turbulence index (VTI); Soft phonation index (SPI); Degree of subharmonic components (DSH) and Degree of voice break (DVB). | GIRBAS Scale, Penetration Aspiration Scale, ENT assessment, Spectrogram, Electroglottography, FEES | - | Jitter, shimmer, vF0, ATRI, FTRI and vAM all significantly increased. NHR significantly reduced. The scores of the GIRBAS scale and Penetration Aspiration Scale (PAS) were higher at 3 months than at initial assessment, indicating increased difficulty in swallowing. Both progressive dysphagia and dysarthria are associated with muscle weakness and loss of motor control. Acoustic analysis should be used in combination with other assessments including swallowing assessment. |
Garcia-Gancedo, 2019 [22] | 25 (2) | 48 weeks | Clinic visit every 12 weeks (speech assessment). Wore sensor for 3 consecutive days every month. | Microphone | High fidelity speech capture system | Central tendency of F0, jitter, shimmer, speaking rate, average phoneme rate and % pause time | ALS-FRS(R), FVC | 41.60 (-) | All (100%) patients captured the digital speech data successfully at baseline and at week 48. All collected data were successfully analysed. Little change from baseline, or observable change over time, was seen. Recording of voice samples using this method is feasible and acceptable to patients. |
Kelly, 2020 [23] | 25 (2) | 48 weeks | Clinic visit every 12 weeks (speech assessment). Wore sensor for 3 consecutive days every month. | Microphone | High fidelity speech capture system | Central tendency of F0, jitter, shimmer, speaking rate, average phoneme rate and % pause time | ALS-FRS(R), FVC | 41.60 (11.40) | Four speech endpoints showed between-patient correlation coefficients > 0.5 with the bulbar (average phoneme rate and average speaking rate) and respiratory (jitter and shimmer) sub-scores of the ALS-FRS(R). |
Laganaro, 2021 [32] | 20 (-) | Single-time point | - | Microphone | - | Intelligibility, articulation, pneumophonatory control. Voice features: jitter; shimmer; F0; harmonic to noise ratio (HNR); cepstral peak prominence (CPPs); prosody, speaking rate and diadochokinetic rate (DDK). | None | - | High correlation between device system score and externally validated perceptual score. Sensitivity of 83.3% and specificity 95.2%. System able to identify abnormal speech but not distinguish between pathologies. |
Lévêque, 2022 [33] | 33 (33) | Single-time point | - | Earset Microphone | Focusrite Scarlett (2i4) external audiocard and a professional quality Shure SM35-XLR earset microphone | TSC_MFCCs: Degree of acoustic change across a sequence. Mean TSC_MFCCs: Average acoustic change across a sequence. VARCO: Degree of variability of the acoustic change across a sequence. eventDUR: Differences in articulation rate. | None | - | Mean acoustic change was significantly smaller for ALS and primary lateral sclerosis (PLS) compared with spinal and bulbar muscular atrophy (SBMA) (p < .001 and p < .01, respectively) and controls (p < .001). ALS showed higher VARCO values than controls (p < .0001), PLS (p = .04), and SBMA (p = .001). ALS and PLS speakers were slower than both SBMA and control speakers for all sequences. |
Likhachov, 2021 [30] | 31 (-) | Single-time point | - | Smartphone | ALS Expert Mobile Application for Android. | Jitter. Shimmer. Features based on F0-contour: Pitch period entropy (PPE) and Pathological vibrato index (PVI). Noise features: Harmonic to noise ratio (HNR) and Glottal-to-noise excitation ratio (GNE). | None | - | Classifier using jitter, PPE, PVI and GNE was the most accurate. Correct identification of ALS possible using only one voice test. |
Liscombe, 2021 [35] | 50 (-) | Single-time point | - | Microphone | - | Speech events and silence events measured as: true negative time, true positive time, false negative and false positive. Speaker loudness. | ALS-FRS(R) | - | More silences were present for the bulbar cohort, most markedly in the SIT task. Optimal configuration differs between groups. |
Liscombe, 2023 [34] | 10 (-) | Single-time point | - | Microphone | - | Speech events and silence events measured as: true negative time, true positive time, false negative and false positive. Speaker loudness. | ALS-FRS(R) | - | The most dramatic change between control VAD settings and pathological VAD settings was endSilence, which doubled. When VAD settings were optimised for pathological groups compared to control settings, DCF, I% and FN% reduced. These fell further when settings were further optimised for data in a specific cohort. Cohort-specific 10-fold cross-validation tests to assess robustness found little variation from other data presented. |
Maffei, 2023 [36] | 49 (7) | Single-time point | - | Lapel Microphone | Audio-Technica AT831R | Perturbation and noise-based measures: local jitter; local shimmer; harmonic to noise ratio (HNR). Cepstral/spectral measures: cepstral peak prominence (CPP); Low High Spectral Ratio and Cepstral Spectral Index of Dysphonia (CSID) | None | - | Jitter, shimmer, and HNR levels are abnormal in ALS and can discriminate between normal and dysphonic voices. Cepstral/spectral measures also discriminated the groups with excellent or acceptable diagnostic accuracy, defined as an area under the curve (AUC) > .8 and > .7, respectively. |
Mori, 2004 [37] | 4 (-) | Single-time point | - | Microphone | Dynamic microphone or electret condenser microphone MI-1233 | F0 range and F0 minimum. F1: First formant frequency. F2: Second formant frequency. | None | - | F0 range narrower in dysarthric speakers. F0 minimum for ALS did not differ significantly from controls. F1 and F2 vowel spaces were narrower than controls. Formant frequencies in expected regions. |
Naeini, 2022 [38] | 243 (-) | Single-time point | - | Microphone | - | Pause duration; total duration; speech duration; pause events; % pause; mean phrase; coefficient of variation of phrase durations; coefficient of variation of pause durations | SIT | - | Both MFA and Wav2Vec2 performed well when compared to the Speech and Pause Analysis (SPA) software. Wav2Vec2 generalized better across clinical severities. Wav2Vec2 model performed better with most features. Audio deemed to be ‘good’ had the strongest correlations. |
Neumann, 2021 [51] | 54 (32) | Single-time point | - | Microphone | - | Mean F0; jitter; shimmer; harmonic to noise ratio (HNR); cepstral peak prominence (CPP); speaking & articulation duration and rate; percentage pause time (PPT) and cycle-to-cycle temporal variation (cTV) | ALS-FRS(R) | Bulbar: 33.09 (8.75); pre-bulbar: 36.45 (12.00) | Strong differences between acoustic features for timing measures. Effect sizes between bulbar and control groups were highest. Mean F0 showed a significant difference (smaller effect sizes). UAR: 0.63 between control and pre-bulbar, and 0.77 between bulbar and pre-bulbar. Voice quality measures added power to the predictive model for pre-bulbar samples. |
Nevler, 2020 [24] | 67 (16) | Single-time point | - | - | Speech Activity Detector (SAD), developed at the University of Pennsylvania Linguistic Data Consortium. | F0, F0 range, mean speech segment duration, total speech duration, pause rate | Edinburgh Cognitive Assessment Scale (ECAS), ALS-FRS(R), Motor examination, Mini-Mental State Examination (MMSE), MRI | ALS: 35 (-); ALS-FTD: 34.2 (-) | The F0 range was restricted in patients with ALS-FTD compared to healthy controls (p = 0.005). There was no significant difference in F0 range between motor ALS and healthy controls (p = 0.15). Regression analysis showed a strong association between F0 range and severity of bulbar impairment. No association was found between F0 range and cognitive impairment using MMSE score as predictor of F0 range (p = 0.34). Mean speech segment duration was reduced in ALS-FTD compared to controls (p < 0.001) and motor ALS (p = 0.042). Cognitive impairment was associated with mean speech segment duration and total speech time. Pause rate is related to cognitive function. Exploratory regression analysis revealed a relationship between F0 range, pause rate and total speech duration, and cortical thickness in different areas of the brain. |
Norel, 2018 [29] | 67 (-) | Single-time point | - | App | ALS Mobile Analyzer | Mel-frequency cepstral coefficients (MFCC), spectral changes | ALS-FRS(R) | - | 79% accuracy for males and 83% for females. For males a single feature could distinguish controls and patients. Model was tolerant to uncontrolled recording conditions. |
Peplinski, 2019 [25] | 65 (-) | Several months | Daily | App | ALS at Home | Components of tremor: dominant tremor frequency, maximum absolute tremor intensity, median absolute tremor intensity, mean absolute tremor intensity, max relative tremor intensity, median relative tremor intensity, mean relative tremor intensity, tremor energy, tremor entropy | None | - | Discriminative power to separate perceptually rated tremor vs non-tremor. Unable to distinguish controls from those with ALS without tremor. |
Robert, 1999 [39] | 63 (40) | Single-time point | - | Digital tape recorder | - | F0, jitter, intensity, shimmer, number of harmonics in frequency spectral analysis. | None | - | 5 of 8 acoustic features used present in symptomatic and asymptomatic ALS. Jitter significantly higher with bulbar symptoms. Shimmer and CVF were also higher. Number of harmonics was significantly lower in symptomatic ALS. MPFR was significantly lower in ALS patients. |
Rong, 2015 [16] | 66 (15) | 60 months | Aimed every 3 months but varied based on clinic follow-up schedule. Average number of sessions was 7. | Microphone | Countryman E6 microphone | Respiratory subsystem: Pausing patterns and subglottal pressure. Phonatory subsystem: Jitter; shimmer; noise to harmonic ratio (NHR); loudness; and maximum F0. Resonatory subsystem: Nasalance, peak oral pressure and peak nasal airflow. Articulatory measures. | ALS-FRS(R), SIT | 38.00 (10.00) | DDK and F0 identified early bulbar decline occurring before SR and SIT decline. |
Rong, 2016 [40] | 66 (15) | 1792 days | Approximately every 3 months (varied with clinic schedule) | Microphone | Countryman E6 | 58 measures across 4 subsystems. Speech | ALS-FRS(R), %FVC, SIT | 38.00 (10.00) | Decline in AMR task performance prior to speech intelligibility decline. Distinction between fast and slow bulbar progressors. |
Rong, 2020 [19] | 16 (-) | Single-time point | - | Microphone | - | Cycle-to-cycle temporal variation (cTV) and syllable rate (sylRate). | SIT | - | Cycle-to-cycle temporal variation (cTV) showed large increase in early bulbar disease. Large effect size of cTV between controls and early bulbar disease. |
Rowe, 2022 [20] | 46 (-) | Single-time point | - | Microphone or App (depending on database used) | Professional quality microphones (e.g., AKG C410, Shure SM81 Condenser, Olympus VN-702PC digital recorder) or the Beiwe application | Coordination: relative duration of the silence between two articulatory gestures during each syllable transition (GapSyllProp). Consistency: across-repetition variability in voice onset time (RepVarVOT). Speed: second formant slope in the consonant transition of /k/ (F2Slope). Precision: across-consonant variability in second formant slope in the consonant transitions of /p/, /t/, and /k/ (ConVarF2Slope). Rate: number of syllables produced per second (RepRate). | None | - | Multivariate analysis indicated a different articulatory pattern depending on the diagnosis of the speaker. This was significant for all articulatory components (coordination, speed, precision, rate). Overall Pearson correlation revealed only weak to moderate correlations between pairs of acoustic features, both for each individual pathology and for the whole study population. Speed and Precision were most strongly correlated (0.72) in speakers with ALS. ALS was the only clinical group where multivariate LDAs using receiver-operating characteristic (ROC) curves showed below acceptable values for sensitivity, specificity, and area under the curve (AUC). The full feature profile performed significantly better than the individual features at classifying the clinical groups. |
Rutkove, 2020 [21] | 113 (60) | 9 months | Daily for 90 days then 2x weekly for additional 180 days (ALS-FRS(R) collected weekly) | App | ALS at home | - | ALS-FRS(R), FVC | 36.10 (-) | Patients reported greater sense of control. Frequent at-home data collection was successful and would reduce future sample sizes. |
Silbergleit, 1997 [41] | 20 (-) | Single-time point | - | Headband microphone | Cspeech; CompuAdd computer, model 320/325; IBM ACPA (audio capture and playback adapter) A/D D/A card | Jitter; shimmer; Signal-to-noise ratio (SNR) and Maximum phonation frequency range (MPFR). | Hearing screening | - | Jitter and maximum phonation frequency range (MPFR) showed significant differences between groups. Shimmer and signal-to-noise ratio (SNR) unable to separate groups. |
Stegmann, 2020 [26] | 65 (12) | 9 months | Daily speech samples for 3 months then 2x weekly for 6 months (Average every 2.9 days) | App | ALS at home | Articulatory precision (AP) and speaking rate (SR) | ALS-FRS(R) | 37.10 (9.70) | Speaking rate (SR) and articulatory precision (AP) able to detect bulbar involvement early and track progression. Remote assessment via mobile app possible. Decline of AP and SR faster in bulbar-onset than non-bulbar onset. |
Tanchip, 2022 [42] | 145 (33) | Single-time point | - | Microphone | Marantz PMD660 compact flash recorder with an accompanying Countryman E6 omnidirectional microphone or an Olympus WS-853 recorder with an accompanying ME52W unidirectional microphone | Diadochokinetic rate (DDK); cycle-to-cycle temporal variation (cTV); number of syllables | SIT | - | The intraclass correlation coefficient (ICC) calculated between syllable counts was 0.99 between both Raters 1 and 2 and Raters 2 and 3, suggesting excellent reliability of the manual procedure. Generally, there was overall agreement between the manual and algorithmic syllable detection. Disease severity had a significant effect on syllable count agreement (p < 0.001) with all five algorithms overestimating syllable count in the severe stage, and all except the Energy algorithm overestimating in the moderate stage. For DDK rate and cTV, the Energy algorithm performed best with correlations of over 0.7 with manual analysis. |
Tena, 2022 [43] | 47 (14) | Single-time point | - | Microphone | USB EMITA Streaming GXT 252 microphone and Audacity (open-source application). | Phonatory subsystem features including: absolute jitter; relative jitter; absolute shimmer; relative shimmer; mean harmonic-to-noise ratio; pitch (SD), pitch (min), pitch (max), pitch (mean). Time frequency features including: average instantaneous spectral energy, instantaneous frequency peak and spectral information. | None | - | Differentiation of diagnosis by gender was the most important finding. The best model was Random Forest (RF). RF able to distinguish between control group and bulbar ALS patients with an accuracy of 96.1% and 98.1% for males and females respectively. Different numbers of statistically significant features were identified depending on the cohort and whether participants were male or female. |
Tomik, 2015 [44] | 17 (17) | 12 months | Baseline, at 6 months and 12 months | Microphone | - | F0, jitter, shimmer, noise-to-harmonic ratio (NHR), voice range and maximum phonation time (MPT). | None | - | Jitter was significantly higher for all examinations in women with ALS compared to controls. Mean shimmer and NHR values were significantly higher in women with ALS. Mean F0 did not show a reduction in ALS for either sex. |
Tomik, 1999 [45] | 53 (15) | 36 weeks | Every 10-12 weeks | Microphone | Bruel and Kjaer microphone | Articulation time, pause duration. | None | - | Significant differences between the mean distances for all chosen sounds in both ALS groups. Significant increase over time for mean distances for all sounds in both groups. Different acoustic signature patterns identified for each ALS group with different sounds showing different distance increases. |
Vashkevich, 2018 [61] | 26 (-) | Single-time point | - | Smartphone (with a headset) | - | Distance between vowel envelopes, mutual location of formant frequencies, difference in amplitude of the harmonics. | Norris scale | - | Reduced distance between vowel envelopes in pathology. Harmdiff showed a good separation between control group and ALS. HNR did not show distinction. High accuracy of 88%. |
Vashkevich, 2019 [60] | 15 (-) | Single-time point | - | Smartphone (with a standard headset) | - | Distance between spectral envelopes, formant structure of the speech, formant convergence, breathiness. | None | - | Distance between vowel envelopes, second formant of ‘I’ and second formant convergence of vowels produced good distinction between controls and ALS group. 84.8% accuracy achieved using just second formants of vowel ‘I’. |
Vashkevich, 2021 [59] | 31 (13) | Single-time point | - | Smartphone (with a standard headset) | - | Jitter & shimmer features; F0; spectral envelopes; harmonic-to-noise ratio (HNR); Glottal-to-noise excitation ratio (GNE); Mel-frequency cepstral coefficients (MFCC), Phonatory frequency range (PFR), Pitch period entropy (PPE), Pathological vibrato index (PVI) and tremor and harmonics. | None | - | Pathological vibrato index (PVI) and Mel-frequency cepstral coefficients (MFCC) are most valuable. MFCC is valuable for early diagnosis by distinguishing from controls. Pathological vibrato index (PVI) is valuable for identifying later changes and progression of disease. Jitter, shimmer and harmonic-to-noise ratio (HNR) are less useful. |
Wang, 2018 [46] | 12 (-) | Single-time point | - | Microphone | - | Jitter, shimmer and Mel-frequency cepstral coefficients (MFCC) | SIT | - | Combining lip, tongue and acoustic data achieved higher accuracy, better correlation and better RMSE than acoustic data alone. |
Wang, 2016 [47] | 11 (-) | Single-time point | - | Microphone | - | F0 and Mel-frequency cepstral coefficients (MFCC) | SIT | - | Acoustic data alone produced accuracy above 50%. Lip, tongue, and acoustic data combined improved accuracy to 80.91%. Feasible to detect ALS automatically from short speech samples. |
Wang, 2016 [48] | 9 (-) | Single-time point | - | Microphone | - | F0 features and harmonic-to-noise ratio (HNR) | SIT | - | Feasible with only acoustic data. Adding articulatory data improves model performance. |
Weismer, 2001 [49] | 10 (-) | Single-time point | - | Microphone | - | Formant frequency measure including F2 slopes; intelligibility and speaking rate (SR). | None | - | Total utterance length significantly greater for ALS compared to both PD and controls. Vowel space and F2 slopes taken from either single word or sentence production highly correlated with single word and scaled sentence intelligibility. |
Wisler, 2019 [50] | 66 (-) | 24 months | 4 sessions with an interval of 4-6 months | Shure Microflex microphone | Shure Microflex microphone | Mel-frequency cepstral coefficients (MFCC). | ALS-FRS(R), SIT | - | Best RMSE and correlations when acoustic data is combined with lip and tongue data using SVR model. |
Yunusova, 2016 [9] | 85 (-) | Single-time point | - | Microphone | - | Speaking rate (SR), articulatory rate (AR) & pause features. | ALS-FRS(R), SIT | 33.53 (-) | Articulation rate able to distinguish bulbar disease from respiratory disease. CV phase duration can be used for early detection. |
ALS, Amyotrophic lateral sclerosis; ALS-FRS(R), Amyotrophic lateral sclerosis functional rating scale revised; ALS-CBS, Amyotrophic lateral sclerosis cognitive behavioural screen; SIT, Speech Intelligibility Testing; VC, Vital capacity; FVC, Forced vital capacity; SVC, Slow vital capacity; F0, Fundamental frequency; vF0, Fundamental frequency variation; Jitter, frequency perturbation; Shimmer, amplitude perturbation.
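Jitter and shimmer recur throughout Table 1 but are glossed above only as frequency and amplitude perturbation. As an illustration, the sketch below computes one common "local" variant of each: the mean absolute difference between consecutive cycles, divided by the mean value. This is an assumption about the definition, not the method of any study in the table; individual studies may use other variants (for example, MDVP or absolute measures). The function names and the example values are hypothetical, and the code assumes that glottal cycle periods and peak amplitudes have already been extracted from a sustained vowel.

```python
from typing import Sequence


def _relative_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        raise ValueError("mean value is zero")
    return mean_diff / mean_value


def local_jitter(periods_s: Sequence[float]) -> float:
    """Cycle-to-cycle perturbation of glottal period (frequency perturbation)."""
    return _relative_perturbation(periods_s)


def local_shimmer(peak_amplitudes: Sequence[float]) -> float:
    """Cycle-to-cycle perturbation of peak amplitude (amplitude perturbation)."""
    return _relative_perturbation(peak_amplitudes)


# Hypothetical example: four consecutive cycles of a sustained vowel.
periods = [0.010, 0.011, 0.010, 0.011]   # seconds per glottal cycle
amplitudes = [0.50, 0.50, 0.50, 0.50]    # arbitrary linear units

print(f"jitter (local):  {100 * local_jitter(periods):.2f} %")
print(f"shimmer (local): {100 * local_shimmer(amplitudes):.2f} %")
```

Both measures are ratios, so they are usually reported as percentages; a perfectly periodic signal gives zero for both.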