Table 3.
Reference | Year | Topic | Data and Cohort | Recording Device | ML Models Used | Data Processing Methods | KPIs |
---|---|---|---|---|---|---|---|
[71] | 2020 | Disease Identification: COVID-19 | Private—116 subjects (76 at 8 weeks post COVID-19, 40 Healthy) | Smartphone, Various Microphones | VGG19 | Log-mel spectrogram | Acc: 85%, Sens: 89%, Spec: 77% |
[72] | 2021 | Disease Identification: COVID-19 | Coswara—166 subjects (83 COVID-19 positive, 83 Healthy) | Various Microphones | NB, Bayes Net, SGD, SVM, K-NN, Adaboost algorithm (model combination), DT, OneR, J48, RF, Bagging, Decision table, LWL | Fundamental Frequency (F0), Shimmer, Jitter and Harmonic to Noise Ratio, MFCC or Spectral Centroid or Roll-Off | Best overall results for vowels a, e, o: Random Forest: Acc: 82.35%, Sens: 94.12%, Spec: 70.59% |
[73] | 2021 | Disease Identification: COVID-19 | Coswara—1027 subjects (77 COVID-19 positive (54M, 23F), 950 Healthy (721M, 229F)) | Various Microphones | SVM, SGD, K-NN, LWL, Adaboost and Bagging, OneR, Decision Table, DT, REPTree | ComParE_2016, FF, Jitter and Shimmer, Harmonic to Noise Ratio, MFCCs, MFCC Δ and ΔΔ, Spec. Centroid, Spec. Roll-off | Best overall results for vowels a, e, o: SVM: Acc: 97.07%, F1: 82.35%, Spec: 97.37% |
[74] | 2021 | Disease Identification: COVID-19 | Private—196 subjects (69 COVID-19, 130 Healthy) | Mobile App, Web App–Smartphone, Various Microphones | SVM, RBF, RF | 1024 embedding feature vector from D-CNN | Best model: RF: Acc: 73%, F1: 81% |
[75] | 2022 | Disease Identification: COPD | Corpus Gesproken Nederlands—Cohort n.s. | Various Microphones | SVM | Mean intensity (dB), Mean frequency (Hz), Pitch variability (Hz), Mean center of gravity (Hz), Formants, Speaking rate, Syllables per breath group, Jitter, Jitter ppq5, Shimmer, Shimmer apq3, Shimmer apq5, HNR, ComParE_2016 | Acc: 75.12%, Sens: 85% |
[76] | 2021 | Disease Identification: COPD | Private—49 subjects (11 COPD exacerbation, 9 Stable COPD, 29 Healthy) | Smartphone | LDA, SVM | Duration, the four formants, mean gravity center, some measures of pitch and intensity, openSMILE, eGeMAPS, # of words read out loud, duration of file | p < 0.01 |
[77] | 2021 | Disease Identification: COVID-19 | Coswara—Dataset 1: 1040 subjects (965 non-COVID), Dataset 2: 990 subjects (930 non-COVID) | Smartphone | LR, MLP, RF | 39-dimensional MFCCs + Δ and ΔΔ coeff., window size of 1024 samples, window hop size = 441 samples | Dataset 1 - RF: Average AUC: 70.69%, Dataset 2 - RF: Average AUC: 70.17% |
[78] | 2020 | Disease Identification: COVID-19 | Israeli COVID-19 collection—88 subjects (29 positive, 59 negative) | Smartphone | Transformer, SVM | Mel spectrum transformation | /z/: F1: 81%, Prec: 82%, counting: F1: 80%, Prec: 80%, /z/, /ah/: F1: 79%, Prec: 80%, /ah/: F1: 74%, Prec: 83%, cough: F1: 58%, Prec: 72% |
[79] | 2021 | Disease Identification: COVID-19, Asthma | COVID-19 sounds—1541 Respiratory Sounds | Mobile App, Web App–Smartphone, Various Microphones | Lightweight CNN | MMFCC, EGFCC and Data De-noising Autoencoder | COVID-19/non-COVID-19 + breath + cough: Acc: 89%, Asthma/non-asthma + breath + voice: Acc: 84% |
[80] | 2022 | Disease Identification: Asthma | Private—8 subjects (100 normal, 321 Wheezing, 98 Striding, 73 Rattling sounds) | N/A | DQNN, Hybrid machine learning | IWO, Signal Selection: EHS algorithm | Spec: 99.8%, Sens: 99.2%, Acc: 100% |
[81] | 2022 | Disease Identification: Asthma | 18 patients—300 respiratory sounds, 10 types of breathing | N/A | DENN | IWO Algorithm for Asthma Detection & Forecasting | Spec: 99.8%, Sens: 99.2%, Acc: 99.91% |
[82] | 2020 | Disease Identification: Asthma | Private—95 subjects (47 asthmatic, 48 healthy) | Various Microphones | SVM | ISCB using openSMILE, SET A: 5900 features, SET B: 6373 features, MFCC | /oU/ All feature groups: Acc: 74% |
[13] | 2020 | Disease Identification: COVID-19 | Private–240 acoustic data—60 normal, 20 COVID-19 subjects | Smartphone | LSTM (RNN) | Spec. Centroid, Spec. Roll-off, ZCS, MFCC (+) | Cough: F1: 97.9%, Acc: 97%, breathing: F1: 98.8%, Acc: 98.2%, voices: F1: 92.5%, Acc: 88.2% |
[83] | 2020 | Disease Identification: Asthma | 88 recordings: 1957 segments (65 Severe resp. distress, 216 Asthma, 673 Mild resp. distress) | Smartphone | LIBSVM | Acoustic features: Interspeech 2010 Paralinguistic Challenge, 38 LLDs and 21 functionals | Acoustic Features: Acc: 86.3%, Sens: 85.9%, Spec: 86.9% |
[84] | 2021 | Disease Identification: Asthma | Private—30 subjects | N/A | RDNN | Discrete Ripplet-II Transform | Proposed EAP-DL: Acc: 86.3%, Sens: 85.9%, Spec: 86.9% |
[85] | 2022 | Symptom Identification: Voice Alteration | OPJHRC Fortis hospital in Raigarh—Cohort n.s. | Various Microphones | K-NN, SVM, LDA, LR, Linear SVM, etc. | Formant Frequencies, Pitch, Intensity, Jitter, Shimmer, Mean Autocorrelation, Harmonic to Noise Ratio, Noise to Harmonic Ratio, MFCC, LPC | Decision Tree K-fold: Acc: 90%, Sens: 90%, Spec: 90% |
[86] | 2019 | Symptom Identification: Voice Alteration | Private—Cohort n.s. | Various Microphones | Pretrained from Intel OpenVINO and TensorFlow | Not specified; however, the models are vision-based | N/A |
[87] | 2021 | Disease Identification: COVID-19 | Coswara, Cambridge DB-2—4352 Web App users, 2261 Android App users | Smartphone | SVM | MFCC | Acc: 85.7%, F2: 85.1% |
Note. n.s. = not specified. ML models: SVM = Support Vector Machine; K-NN = K-Nearest Neighbors; DT = Decision Trees; RF = Random Forest; NN = Neural Network; D-CNN = Deep Convolutional Neural Network; MLP = Multilayer Perceptron; NB = Naive Bayes; IWO = Improved Weed Optimization; DENN = Differential Evolutionary Neural Network; RBF model = Radial Basis Function model; LR = Linear Regression; LWL = Locally Weighted Regression (or Lowess); LDA = Linear Discriminant Analysis. Data Processing Methods: MFCCs = Mel-Frequency Cepstral Coefficients; CIF = Cochleagram Image Features; EGFCC = Enhanced-Gamma-tone Frequency Cepstral Coefficients; MMFCC = Modified Mel-frequency Cepstral Coefficients; EHS = Effective Hand Strength; ISCB = Improved Standard Capon Beamforming; LPC = Linear Predictive Coding; FF = Fundamental Frequency; ZCS = Zero Crossing Rate. Metrics: Acc = Accuracy; Sens = Sensitivity; Spec = Specificity; Prec = Precision; AUC = Area Under Curve.
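For readers comparing the KPI column across studies, the following minimal sketch shows how Acc, Sens, Spec, Prec and F1 follow from the four cells of a binary confusion matrix. The class `ConfusionCounts`, the function `kpis`, and the example counts are our own illustrative assumptions; none of the reviewed studies report their confusion matrices in this table, so the counts below should not be read as data from any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # diseased subjects classified as diseased
    fn: int  # diseased subjects classified as healthy
    tn: int  # healthy subjects classified as healthy
    fp: int  # healthy subjects classified as diseased


def kpis(c: ConfusionCounts) -> dict[str, float]:
    """Return Acc, Sens, Spec, Prec and F1 as percentages (2 decimals).

    Assumes both classes are present and at least one positive
    prediction was made, so no denominator is zero.
    """
    acc = (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)
    sens = c.tp / (c.tp + c.fn)   # recall on the diseased class
    spec = c.tn / (c.tn + c.fp)   # recall on the healthy class
    prec = c.tp / (c.tp + c.fp)
    f1 = 2 * prec * sens / (prec + sens)
    raw = {"Acc": acc, "Sens": sens, "Spec": spec, "Prec": prec, "F1": f1}
    return {name: round(100 * value, 2) for name, value in raw.items()}


if __name__ == "__main__":
    # Illustrative counts only: a balanced test set of 17 + 17 subjects.
    example = ConfusionCounts(tp=16, fn=1, tn=12, fp=5)
    print(kpis(example))
    # {'Acc': 82.35, 'Sens': 94.12, 'Spec': 70.59, 'Prec': 76.19, 'F1': 84.21}
```

On a balanced test set, accuracy is the mean of sensitivity and specificity, which is why a high Sens can coexist with a modest Acc when Spec is low. On heavily imbalanced cohorts, Acc is dominated by the majority class, so Sens, Prec or F1 are usually more informative.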