2020 Jun 18;13(4):326–339. doi: 10.21053/ceo.2020.00654

Table 2.

AI techniques used for voice-based analysis

Study	Analysis modality	Objective	AI technique	Validation method	No. of samples in the training dataset	No. of samples in the testing dataset	Best result
[58]	CI	Noise reduction	NC+DDAE	Hold-out	120 Utterances	200 Utterances	Accuracy: 99.5%
[59]	CI	Segregated speech from background noise	DNN	Hold-out	560×50 Mixtures for each noise type and SNR	160 Noise segments from original unperturbed noise	Hit ratio: 84%; false alarm: 7%
[60]	CI	Improved pitch perception	ANN	Hold-out	1,500 Pitch pairs	10% of the training material	Accuracy: 95%
[61]	CI	Predicted speech recognition and QoL outcomes	k-NN, DT	10-CV	A total of 29 patients, including 48% unilateral CI users and 51% bimodal CI users		Accuracy: 81%
[62]	CI	Noise reduction	DDAE	Hold-out	12,600 Utterances	900 Noisy utterances	Accuracy: 36.2%
[63]	CI	Improved speech intelligibility in unknown noisy environments	DNN	Hold-out	640,000 Mixtures of sentences and noises	-	Accuracy: 90.4%
[64]	CI	Modeling electrode-to-nerve interface	ANN	Hold-out	360 Sets of fiber activation patterns per electrode	40 Sets of fiber activation patterns per electrode	-
[65]	CI	Provided digital signal processing plug-in for CI	WNN	Hold-out	120 Consonants and vowels, sampled at 16 kHz; half of the data were used for training and the rest for testing		SNR: 2.496; MSE: 0.086; LLR: 2.323
[66]	CI	Assessed disyllabic speech test performance in CI	k-NN	-	60 Patients	-	Accuracy: 90.83%
[67]	Acoustic signals	Voice disorders detection	CNN	10-CV	451 Images from 10 healthy adults and 70 adults with voice disorders		Accuracy: 90%
[68]	Dysphonic symptoms	Voice disorders detection	ANN	Repeated hold-out	100 Cases of neoplasm, 508 cases of benign phonotraumatic, 153 cases of vocal palsy		Accuracy: 83%
[69]	Pathological voice	Voice disorders detection	DNN, SVM, GMM	5-CV	60 Normal voice samples and 402 pathological voice samples		Accuracy: 94.26%
[70]	Acoustic signal	Hot potato voice detection	SVM	Hold-out	2,200 Synthetic voice samples	12 HPV samples from real patients	Accuracy: 88.3%
[71]	SEMG signals	Voice restoration for laryngectomy patients	XGBoost	Hold-out	75 Utterances using 7 SEMG sensors	-	Accuracy: 86.4%

AI, artificial intelligence; CI, cochlear implant; NC, noise classifier; DDAE, deep denoising autoencoder; DNN, deep neural network; SNR, signal-to-noise ratio; ANN, artificial neural network; QoL, quality of life; k-NN, k-nearest neighbors; DT, decision tree; CV, cross-validation; WNN, wavelet neural network; MSE, mean square error; LLR, log-likelihood ratio; CNN, convolutional neural network; GMM, Gaussian mixture model; SVM, support vector machine; HPV, hot potato voice; SEMG, surface electromyographic.
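
The "Validation method" column distinguishes hold-out from k-fold cross-validation (10-CV, 5-CV), which is also why the cross-validated rows report a single pooled sample count rather than separate training and testing counts. The following is a minimal sketch of the two splitting schemes, not the procedure of any cited study. The function names `hold_out_split` and `k_fold_splits`, the fixed seed, and the random shuffling are assumptions made for illustration. The sample counts are borrowed from the table only as example sizes.

```python
import random


def hold_out_split(n_samples, test_fraction, seed=0):
    """Return (train_idx, test_idx) for a single hold-out split.

    Hypothetical helper: shuffles the sample indices once and sets
    aside a fixed fraction of them for testing.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) for each of k cross-validation folds.

    Hypothetical helper: every sample lands in the test set of exactly
    one fold and in the training set of the other k - 1 folds.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# Hold-out with sizes matching the 360/40 counts listed for [64]
# (assumption: the two sets are treated as one pool of 400 items).
train, test = hold_out_split(400, test_fraction=0.1)
print(len(train), len(test))

# 5-fold CV over a pool the size of the 60 + 402 samples listed for [69].
for fold_no, (tr, te) in enumerate(k_fold_splits(462, k=5), start=1):
    print(fold_no, len(tr), len(te))
```

With hold-out, the reported result depends on one particular partition. With k-fold cross-validation, each sample is tested exactly once and the metric is typically averaged over the k folds, so the whole dataset contributes to both training and testing.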