Table 2.
AI techniques used for voice-based analysis
Study | Analysis modality | Objective | AI technique | Validation method | No. of samples in the training dataset | No. of samples in the testing dataset | Best result |
---|---|---|---|---|---|---|---|
[58] | CI | Noise reduction | NC+DDAE | Hold-out | 120 Utterances | 200 Utterances | Accuracy: 99.5% |
[59] | CI | Segregated speech from background noise | DNN | Hold-out | 560×50 Mixtures for each noise type and SNR | 160 Noise segments from original unperturbed noise | Hit ratio: 84%; false alarm: 7% |
[60] | CI | Improved pitch perception | ANN | Hold-out | 1,500 Pitch pairs | 10% of the training material | Accuracy: 95% |
[61] | CI | Predicted speech recognition and QoL outcomes | k-NN, DT | 10-CV | A total of 29 patients, including 48% unilateral CI users and 51% bimodal CI users | | Accuracy: 81% |
[62] | CI | Noise reduction | DDAE | Hold-out | 12,600 Utterances | 900 Noisy utterances | Accuracy: 36.2% |
[63] | CI | Improved speech intelligibility in unknown noisy environments | DNN | Hold-out | 640,000 Mixtures of sentences and noises | - | Accuracy: 90.4% |
[64] | CI | Modeling electrode-to-nerve interface | ANN | Hold-out | 360 Sets of fiber activation patterns per electrode | 40 Sets of fiber activation patterns per electrode | - |
[65] | CI | Provided digital signal processing plug-in for CI | WNN | Hold-out | 120 Consonants and vowels, sampled at 16 kHz; half of the data was used as the training set and the rest as the testing set | | SNR: 2.496; MSE: 0.086; LLR: 2.323 |
[66] | CI | Assessed disyllabic speech test performance in CI | k-NN | - | 60 Patients | - | Accuracy: 90.83% |
[67] | Acoustic signals | Voice disorders detection | CNN | 10-CV | 451 Images from 10 healthy adults and 70 adults with voice disorders | | Accuracy: 90% |
[68] | Dysphonic symptoms | Voice disorders detection | ANN | Repeated hold-out | 100 Cases of neoplasm, 508 cases of benign phonotraumatic, 153 cases of vocal palsy | | Accuracy: 83% |
[69] | Pathological voice | Voice disorders detection | DNN, SVM, GMM | 5-CV | 60 Normal voice samples and 402 pathological voice samples | | Accuracy: 94.26% |
[70] | Acoustic signal | Hot potato voice detection | SVM | Hold-out | 2,200 Synthetic voice samples | 12 HPV samples from real patients | Accuracy: 88.3% |
[71] | SEMG signals | Voice restoration for laryngectomy patients | XGBoost | Hold-out | 75 Utterances using 7 SEMG sensors | - | Accuracy: 86.4% |
AI, artificial intelligence; CI, cochlear implant; NC, noise classifier; DDAE, deep denoising autoencoder; DNN, deep neural network; SNR, signal-to-noise ratio; ANN, artificial neural network; QoL, quality of life; k-NN, k-nearest neighbors; DT, decision tree; CV, cross-validation; WNN, wavelet neural network; MSE, mean square error; LLR, log-likelihood ratio; CNN, convolutional neural network; GMM, Gaussian mixture model; SVM, support vector machine; HPV, hot potato voice; SEMG, surface electromyographic.