2024 Feb 10;24(4):1173. doi: 10.3390/s24041173

Table 3.

Voice analysis related papers.

| Reference | Year | Topic | Data and Cohort | Recording Device | ML Models Used | Data Processing Methods | KPIs |
| --- | --- | --- | --- | --- | --- | --- | --- |
| [71] | 2020 | Disease Identification: COVID-19 | Private—116 subjects (76 at 8 weeks post COVID-19, 40 Healthy) | Smartphone, Various Microphones | VGG19 | Log-mel spectrogram | Acc: 0.85, Sens: 0.89, Spec: 0.77 |
| [72] | 2021 | Disease Identification: COVID-19 | Coswara—166 subjects (83 COVID-19 positive, 83 Healthy) | Various Microphones | NB, Bayes Net, SGD, SVM, K-NN, Adaboost algorithm (model combination), DT, OneR, J48, RF, Bagging, Decision table, LWL | Fundamental Frequency (F0), Shimmer, Jitter and Harmonic to Noise Ratio, MFCC or Spectral Centroid or Roll-Off | Best overall results for vowels a, e, o: Random Forest: Acc: 82.35%, Sens: 94.12%, Spec: 70.59% |
| [73] | 2021 | Disease Identification: COVID-19 | Coswara—1027 subjects (77 COVID-19 positive (54M, 23F), 950 Healthy (721M, 229F)) | Various Microphones | SVM, SGD, K-NN, LWL, Adaboost and Bagging, OneR, Decision Table, DT, REPTree | ComParE_2016, FF, Jitter and Shimmer, Harmonic to Noise Ratio, MFCCs, MFCC Δ and ΔΔ, Spec. Centroid, Spec. Roll-off | Best overall results for vowels a, e, o: SVM: Acc: 97.07%, F1: 82.35%, Spec: 97.37% |
| [74] | 2021 | Disease Identification: COVID-19 | Private—196 subjects (69 COVID-19, 130 Healthy) | Mobile App, Web App–Smartphone, Various Microphones | SVM, RBF, RF | 1024 embedding feature vector from D-CNN | Best model: RF: Acc: 73%, F1: 81% |
| [75] | 2022 | Disease Identification: COPD | Corpus Gesproken Nederlands—Cohort n.s. | Various Microphones | SVM | Mean intensity (dB), Mean frequency (Hz), Pitch variability (Hz), Mean center of gravity (Hz), Formants, Speaking rate, Syllables per breath group, Jitter, Jitter ppq5, Shimmer, Shimmer apq3, Shimmer apq5, HNR, ComParE_2016 | Acc: 75.12%, Sens: 85% |
| [76] | 2021 | Disease Identification: COPD | Private—49 subjects (11 COPD exacerbation, 9 Stable COPD, 29 Healthy) | Smartphone | LDA, SVM | Duration, the four formants, mean gravity center, some measures of pitch and intensity, openSMILE, eGeMAPS, number of words read out loud, duration of file | p < 0.01 |
| [77] | 2021 | Disease Identification: COVID-19 | Coswara—Dataset 1: 1040 subjects (965 non-COVID), Dataset 2: 990 subjects (930 non-COVID) | Smartphone | LR, MLP, RF | 39-dimensional MFCCs + Δ and ΔΔ coeff., window size of 1024 samples, window hop size = 441 samples | Dataset 1, RF: Average AUC: 70.69%; Dataset 2, RF: Average AUC: 70.17% |
| [78] | 2020 | Disease Identification: COVID-19 | Israeli COVID-19 collection—88 subjects (29 positive, 59 negative) | Smartphone | Transformer, SVM | Mel spectrum transformation | /z/: F1: 81%, Prec: 82%; counting: F1: 80%, Prec: 80%; /z/, /ah/: F1: 79%, Prec: 80%; /ah/: F1: 74%, Prec: 83%; cough: F1: 58%, Prec: 72% |
| [79] | 2021 | Disease Identification: COVID-19, Asthma | COVID-19 sounds—1541 Respiratory Sounds | Mobile App, Web App–Smartphone, Various Microphones | Light-weight CNN | MMFCC, EGFCC and Data De-noising Auto encoder | COVID-19/non-COVID-19 + breath + cough: Acc: 89%; Asthma/non-asthma + breath + voice: Acc: 84% |
| [80] | 2022 | Disease Identification: Asthma | Private—8 subjects (100 normal, 321 Wheezing, 98 Striding, 73 Rattling sounds) | N/A | DQNN, Hybrid machine learning | IWO, Signal Selection: EHS algorithm | Spec: 99.8%, Sens: 99.2%, Acc: 100% |
| [81] | 2022 | Disease Identification: Asthma | 18 patients—300 respiratory sounds, 10 types of breathing | N/A | DENN | IWO Algorithm for Asthma Detection & Forecasting | Spec: 99.8%, Sens: 99.2%, Acc: 99.91% |
| [82] | 2020 | Disease Identification: Asthma | Private—95 subjects (47 asthmatic, 48 healthy) | Various Microphones | SVM | ISCB using openSMILE, SET A: 5900 features, SET B: 6373 features, MFCC | /oU/ All feature groups: Acc: 74% |
| [13] | 2020 | Disease Identification: COVID-19 | Private—240 acoustic data: 60 normal, 20 COVID-19 subjects | Smartphone | LSTM (RNN) | Spec. Centroid, Spec. roll-off, ZCS, MFCC (+ΔΔ) | Cough: F1: 97.9%, Acc: 97%; breathing: F1: 98.8%, Acc: 98.2%; voices: F1: 92.5%, Acc: 88.2% |
| [83] | 2020 | Disease Identification: Asthma | 88 recordings: 1957 segments (65 Severe resp. distress, 216 Asthma, 673 Mild resp. distress) | Smartphone | LIBSVM | Acoustic features: Interspeech 2010 Paralinguistic Challenge, 38 LLDs and 21 functionals | Acoustic Features: Acc: 86.3%, Sens: 85.9%, Spec: 86.9% |
| [84] | 2021 | Disease Identification: Asthma | Private—30 subjects | N/A | RDNN | Discrete Ripplet-II Transform | Proposed EAP-DL: Acc: 86.3%, Sens: 85.9%, Spec: 86.9% |
| [85] | 2022 | Symptom Identification: Voice Alteration | OPJHRC Fortis hospital in Raigarh—Cohort not specified | Various Microphones | K-NN, SVM, LDA, LR, Linear SVM, etc. | Formant Frequencies, Pitch, Intensity, Jitter, Shimmer, Mean Autocorrelation, Harmonic to Noise ratio, Noise to Harmonic ratio, MFCC, LPC | Decision Tree K-fold: Acc: 90%, Sens: 90%, Spec: 90% |
| [86] | 2019 | Symptom Identification: Voice Alteration | Private—Cohort n.s. | Various Microphones | Pretrained from Intel OpenVINO and TensorFlow | Not specified, however models are vision based | N/A |
| [87] | 2021 | Disease Identification: COVID-19 | Coswara, Cambridge DB-2—4352 Web App users, 2261 Android App users | Smartphone | SVM | MFCC | Acc: 85.7%, F2: 85.1% |

Note. ML models: SVM = Support Vector Machine; K-NN = K-Nearest Neighbors; DT = Decision Trees; RF = Random Forest; NN = Neural Network; D-CNN = Deep Convolutional Neural Network; MLP = Multilayer Perceptron; NB = Naive Bayes; IWO = Improved Weed Optimization; DENN = Differential Evolutionary Neural Network; RBF model = Radial Basis Function model; LR = Linear Regression; LWL = Locally Weighted Regression (or Lowess); LDA = Linear Discriminant Analysis. Data Processing Methods: MFCCs = Mel-Frequency Cepstral Coefficients; CIF = Cochleagram Image Features; EGFCC = Enhanced-Gamma-tone Frequency Cepstral Coefficients; MMFCC = Modified Mel-frequency Cepstral Coefficients; IWO = Improved Weed Optimization; EHS = Effective Hand Strength; ISCB = Improved Standard Capon Beamforming; LPC = Linear Predictive Coding; FF = Fundamental Frequency; ZCS = Zero Crossing Rate. Metrics: Acc = Accuracy; Sens = Sensitivity; Spec = Specificity; Prec = Precision; AUC = Area Under Curve.
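
The KPI column mixes several metrics, so it may help to see how Acc, Sens, Spec, Prec and F1 relate to one binary confusion matrix. The following is a minimal sketch using the standard textbook definitions. The names `ConfusionCounts` and `binary_metrics` are hypothetical helpers, and the counts in the example are illustrative assumptions rather than data reported by any of the cited papers; the table itself does not give confusion matrices.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container for illustration)."""
    tp: int  # diseased subjects correctly flagged
    fp: int  # healthy subjects wrongly flagged
    tn: int  # healthy subjects correctly cleared
    fn: int  # diseased subjects missed


def _ratio(num: float, den: float) -> float:
    # Return NaN instead of raising when a metric is undefined (zero denominator).
    return num / den if den else float("nan")


def binary_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Compute the table's KPI abbreviations as fractions in [0, 1]."""
    return {
        "Acc": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "Sens": _ratio(c.tp, c.tp + c.fn),   # recall on the positive class
        "Spec": _ratio(c.tn, c.tn + c.fp),   # recall on the negative class
        "Prec": _ratio(c.tp, c.tp + c.fp),
        "F1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


# Illustrative counts only (an assumption): a balanced test set of 17 positive
# and 17 negative subjects.
example = binary_metrics(ConfusionCounts(tp=16, fp=5, tn=12, fn=1))
for name, value in example.items():
    print(f"{name}: {value * 100:.2f}%")
```

With balanced classes, as in this example, Acc is the arithmetic mean of Sens and Spec. On imbalanced cohorts such as the 77 positive versus 950 healthy subjects of [73], a high Acc can coexist with a much lower F1, which is one reason the rows of the table are not directly comparable on accuracy alone.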