Harmonic-to-noise ratio as speech biomarker for fatigue: K-nearest neighbour machine learning algorithm

Savita Gaur; Priti Kalani; M Mohan

doi:10.1016/j.mjafi.2022.12.001

. 2023 Jan 16;80(Suppl 1):S120–S126. doi: 10.1016/j.mjafi.2022.12.001


PMCID: PMC11670608 PMID: 39734876

Abstract

Background

Vital information about a person's physical and emotional health can be perceived in their voice, and voice quality is noticeably altered after sleep loss. The circadian rhythm controls the sleep cycle; when it is disrupted, the resulting fatigue is manifested in speech. Using statistical techniques in MATLAB and the k-nearest neighbour (KNN) machine learning algorithm, this study assessed the efficacy of the harmonic-to-noise ratio (HNR) as a speech biomarker for differentiating fatigued from normal voice after one night of sleep deprivation.

Methods

After one night of sleep deprivation, acoustic samples of the sustained vowel /a/ and visual reaction time were recorded from n = 32 healthy young Indian male volunteers (20–40 yrs). One-way ANOVA established significant changes in voice characteristics with progressive sleep deprivation. The effectiveness of speech HNR as a biomarker for distinguishing healthy from fatigued voice was evaluated using the KNN classifier.

Results

The HNR voice feature was extracted from acoustic samples at three time points over one night of progressive sleep loss: baseline (Time 1), 3 AM (Time 2), and 7 AM (Time 3). At 3 AM, the HNR changed significantly (p < 0.05). Utilizing an iterative signal extrapolation approach, the KNN classifier divided the submitted voice signal samples into normal and fatigued categories.

Conclusion

The findings imply that the HNR can be used to link fatigue from sleep deprivation with vocal alterations by classifying voice samples in a KNN classifier. Along with the multimodal diagnostic features, this method may also offer an additional acoustic biomarker for the diagnosis of fatigue post sleep loss.

Keywords: Speech biomarker, Harmonic-to-noise ratio, K-nearest neighbour, Machine learning

Introduction

The unique characteristics of the military environment make soldiers susceptible to fatigue. The capacity to rapidly and covertly assess a soldier's level of fatigue before the start of a key operation would offer crucial medical information and increase the likelihood of mission success. Studies on voice analysis have shown that voice characteristics are significantly affected by exhaustion, which has introduced the idea that fatigue may be linked to audible voice changes.¹,²,³ Acoustic properties such as voice frequency, jitter, shimmer, harmonic-to-noise ratio (HNR), and Mel-frequency cepstral coefficients (MFCC) have been shown to be useful in defining the vocal characteristics of human speech; however, significant biomarkers must be identified so that not every voice parameter needs to be examined when detecting fatigue due to sleep deprivation stress.⁴,⁵ Additionally, because sleep is necessary for organ health, complete or partial sleep deprivation has negative effects on speech and other psychomotor activities that may be related to electrophysiological alterations in the human body.⁶,⁷ Given the strategic requirements of military aviation, it is important to identify key sleep-related voice biomarkers. There are also too few research contributions that point to a single voice characteristic as an indicator of fatigue brought on by insufficient sleep. The important publications reviewed for this investigation are shown in Table 1. It was anticipated that HNR, measured in the voices of male participants at precise intervals during incremental sleep deprivation, could be a significant speech biomarker for fatigue caused by sleep loss. The k-nearest neighbour (KNN) machine learning algorithm was used to test speech HNR as an efficient and trustworthy biomarker for detecting fatigue. The aim of the research was to identify speech biomarkers associated with sleep deprivation in order to reduce fatigue-related risks.

Table 1.

Key reviewed articles during this study.

Author/Year	Sleep loss (h)/experiment	Participants	Voice recording/physiological parameter	Voice features extracted	Result
Xiujie G, et al, 2022	36 h sleep deprivation	N = 15	Short text voice, ERP P300, saliva, questionnaire	Fundamental frequency, energy, zero crossing, HNR, jitter, shimmer, loudness, Mel frequency coefficient	No significant result. However, with an SVM algorithm, prediction accuracy for fatigue versus normal classification on multiple vowels was 94%
Ashok BW, et al, 2022	2 and 8 h of sleep deprivation	N = 30	Standard paragraph reading, vowel /a/ on Dr. Speech voice assessment tool, psychomotor vigilance task (PVT)	Shimmer%, NNE, HNR, SNR	Harmonic-to-noise ratio and signal-to-noise ratio decreased significantly after 8 h of sleep deprivation
Michal I, et al, 2020	24 h sleep deprivation	N = 47	Phonation, word, sentence	Fundamental frequency, intensity, HNR, jitter, shimmer	Higher HNR value post sleep deprivation
Thomas Z, et al, 2019	Effect of CR on voice	N = 216	Phonation of vowel /a/ (PRAAT), Horne–Östberg Morningness–Eveningness Scale	Pitch, jitter, and shimmer	Circadian rhythm has a strong relationship with jitter, pitch, and shimmer in the voice
Adam PV, et al, 2010	24 h of sustained wakefulness	N = 18	Extemporaneous speech tasks, sustained vowel, and a read passage	Frequency and spectral energy (PRAAT)	Changes in speech production after just 24 h of sustained wakefulness
Harold PG et al, 2007	34 h sleep deprivation	N = 31	Random words, speech phones, ARES test battery, SSS, POMS	36-component voice vector	MFCC component most sensitive to fatigue
Kumara S et al, 2007	Classification of normal and pathological voices	N = 700	Speech database (Kay Elemetrics Corporation)	HNR, critical band energy spectrum	Pathological voice exhibits higher noise. Differences in energy distribution at critical bands between clean and noisy speech


Material and methods

A prospective observational group study design was employed. Thirty-two healthy Indian male volunteers (20–40 years) participated in the study. After ethical clearance was secured from the institute, informed consent was obtained. The participants were instructed to speak naturally, at their usual speaking rate, and with a vocal effort suitable for conversing with people in quiet settings. Following the experimental protocol, voice samples were collected at several time points before and during one night of sleep loss. The speech test involved saying an open vowel, /a:/ (AAAH), for around 3 s. The baseline (normal voice) was recorded in the lab at 8 AM (Time 1: baseline) using the vocal assessment instrument of the Dr. Speech software.⁸ The basic visual reaction time (RT) was recorded using a portable psychomotor vigilance task monitor that stores repeated response time data.⁸ The measurements for sleep-deprived voice (SDV), including voice parameters and RT, were repeated at 3 AM (Time 2) and at dawn, 7 AM (Time 3). The HNR voice features were retrieved from the data using Dr. Speech's vocal assessment statistical report. Significance analyses of HNR and RT were conducted using the one-way ANOVA test and post hoc statistics in MATLAB. The effectiveness of speech HNR as a biomarker for distinguishing normal from fatigued voice was assessed using the KNN classifier, a supervised machine learning method that builds its model from labelled data. All guidelines as per the Declaration of Helsinki and good clinical practice guidelines were followed.

Results

For each participant, HNR voice features were extracted from voice samples corresponding to normal voice (NV) and to SDV at 3 AM and 7 AM. Fig. 1 shows the HNR for NV and for SDV at 3 AM for the n = 32 subjects. The HNR is presented on a log scale. On average, the HNR values for SDV at 3 AM were lower than those for NV. The voice data and psychomotor vigilance task RT data were tabulated in an Excel sheet and exported to MATLAB R2020b for statistical analysis. A simple one-way ANOVA was carried out to check for significant differences in the voice parameters and RT across the time points of sleep deprivation: Time 1 (NV), and Time 2 (3 AM) and Time 3 (7 AM) as the two SDV conditions. The one-way ANOVA for HNR was significant (F = 7.1, p < 0.05); the statistics are given in Fig. 2 and the box plot in Fig. 3. The HNR in the NV was 26.9 dB, and at 3 AM and 7 AM the values were 24.16 dB and 28.11 dB, respectively. A post hoc test using multcompare in MATLAB revealed that NV HNR was significantly different from SDV HNR at 3 AM. The results are displayed in Fig. 4. Fig. 5 illustrates the statistics of the one-way ANOVA for visual RT (F = 54.9, p < 0.05). According to the post hoc analysis, the RT at 3 AM and at 7 AM differed significantly from the baseline values.
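The F statistic reported by a one-way ANOVA is the ratio of between-group to within-group mean squares. The following is a minimal sketch of that calculation in plain Python, offered only as an illustration: the study itself used MATLAB, the function name `one_way_anova_f` is hypothetical, and the numbers are invented placeholders, not the study data. Like any plain one-way ANOVA, it treats the three time points as independent groups.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented HNR values in dB for three time points (NOT the study data).
baseline = [27.1, 26.5, 27.4, 26.8]
at_3am = [24.0, 24.5, 23.9, 24.3]
at_7am = [28.0, 28.3, 27.9, 28.2]

f_stat, df_b, df_w = one_way_anova_f([baseline, at_3am, at_7am])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The p value would then be read from the F distribution with those degrees of freedom, which a statistics package normally does for you.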

Fig. 2 — The one-way ANOVA results for HNR feature of voice, compared for normal voice (NV) and sleep-deprived voice (SDV) at 3AM for (n = 32) participants. The p value significance of the statistics is p< 0.05.

Fig. 3 — The Box plot displaying the distribution of HNR feature of voice, compared for normal voice (NV) and sleep-deprived voice (SDV) at 3AM for (n = 32) participants. The p value significance of the statistics is p<0.05. The HNR in the voice was 26.90 dB for normal voice (NV) and 3 AM and 7 AM values were 24.16 dB and 28.11 dB respectively.

Fig. 4 — Post hoc analysis displaying the distribution of HNR feature of voice, compared for normal voice (NV) and sleep-deprived voice (SDV) at 3AM and 7AM. Pairwise comparison using multcompare in MATLAB revealed that normal voice HNR was significantly different from SDV HNR at 3 AM.

Fig. 5 — The one-way ANOVA results for visual reaction time compared for normal voice (NV) and sleep-deprived voice (SDV) at 3AM for (n = 32) participants. The p value significance of the statistics is p<0.05.

The KNN machine learning algorithm was implemented to establish speech HNR as an effective and reliable biomarker for detecting fatigue. Using an iterative signal extrapolation approach, the KNN classifier classified each voice signal sample as fatigued or normal. The number of neighbours, k, was selected as the value that maximized the true positive rate. We performed tests with k equal to 10, 40 and 50, and the best result for HNR classification was obtained with k = 40. The performance of the classifier was evaluated using the test and validation accuracy, as well as the true positive and false positive rates on the receiver operating characteristic (ROC) curve. The classification results for the HNR feature from the vowel /a:/ are presented in Table 2. The data split plot shows how the data set is partitioned into training, validation, and test sets. In this case, 61 observations were used to train the model, 16 to validate it, and 19 to test its performance. The classifier results suggest that the HNR feature set can be used to classify NV and fatigued voice. The validation accuracy was 62%, and the test accuracy was 57%. The ROC curves of the KNN classification for HNR and RT are presented in Figs. 6 and 7. The higher the area under the curve (AUC), the better the KNN classifier distinguishes between the NV and SDV classes. The AUC value for HNR was 0.71 at 3 AM and 0.75 at 7 AM. The AUC value for RT was 0.80 at 3 AM and 0.71 at 7 AM. The best performing value of k for visual RT in KNN classification is presented in Table 3.
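The classification step described above can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the authors' implementation: the function names are hypothetical, the HNR values are invented, "rectangular" weights are read here as equal (unweighted) votes, and k is chosen by validation accuracy as the note under Table 2 indicates.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Label one scalar feature by majority vote among its k nearest training points."""
    # With a single feature, Euclidean distance is the absolute difference.
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, xs, ys, k):
    """Fraction of (xs, ys) pairs that the classifier labels correctly."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(xs, ys))
    return hits / len(xs)


def select_k(train_x, train_y, val_x, val_y, candidates):
    """Return the candidate k with the highest validation accuracy (first wins ties)."""
    return max(candidates, key=lambda k: accuracy(train_x, train_y, val_x, val_y, k))


# Invented HNR values in dB (NOT the study data).
train_x = [27.0, 26.5, 27.3, 26.8, 24.1, 24.4, 23.8, 24.6]
train_y = ["normal"] * 4 + ["fatigued"] * 4
val_x, val_y = [26.9, 24.2], ["normal", "fatigued"]

best_k = select_k(train_x, train_y, val_x, val_y, candidates=[1, 3, 5])
print(best_k, knn_predict(train_x, train_y, 24.0, best_k))
```

Because KNN stores the training set and votes at query time, a large k relative to the training set (such as k = 40 with 61 training observations) pushes predictions toward the majority class, which is worth keeping in mind when reading the accuracies in Table 2.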

Table 2.

The best performing value of k neighbour for HNR feature in KNN classification along with data split details. The validation accuracy was 62% and test accuracy was 57%.

K-Nearest Neighbour Classification
Nearest neighbour	Weights	Distance	n (Train)	n (Validation)	n (Test)	Validation Accuracy	Test Accuracy
40	rectangular	Euclidean	61	16	19	0.625	0.57


Note. The model is optimized with respect to the validation set accuracy.

Note. The optimum number of nearest neighbour is the maximum number.
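The 61/16/19 partition reported in Table 2 sums to 96 observations, which would match 32 participants recorded at three time points (an inference, not something the source states). A shuffled split of that shape might look like the sketch below; the function name, the fixed seed, and the use of placeholder indices are all assumptions made for illustration.

```python
import random


def split_dataset(items, n_train, n_val, n_test, seed=0):
    """Shuffle items and cut them into train, validation and test lists."""
    if n_train + n_val + n_test != len(items):
        raise ValueError("split sizes must sum to the number of observations")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Placeholder observation indices standing in for the 96 recordings.
observations = list(range(96))
train, val, test = split_dataset(observations, 61, 16, 19)
print(len(train), len(val), len(test))
```

A purely random split like this can place recordings from the same participant in both the training and test sets; whether the original analysis did so is not described in the source.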


Fig. 6 — ROC curve for KNN classifier on HNR dataset. The AUC value for HNR in ROC curve at 3AM was 0.71 and for 7AM is 0.75.

Fig. 7 — ROC curve for KNN classifier on Reaction Time dataset. The AUC value in ROC curve for Reaction time at 3AM was 0.80 and for 7AM was 0.71.

Table 3.

The best performing value of k neighbour for visual reaction time in KNN classification. The validation accuracy was 56% and test accuracy was 78%.

K-Nearest Neighbour Classification
Nearest neighbour	Weights	Distance	n (Train)	n (Validation)	n (Test)	Validation Accuracy	Test Accuracy
2	Rectangular	Euclidean	61	16	19	0.563	0.78


Note. The model is optimized with respect to the validation set accuracy.

Discussion

Sleep loss refers to a reduction in regular sleep hours or complete sleep deprivation. The lack of sleep, whether total or partial, has been found to have detrimental effects on a number of neurocognitive and psychomotor abilities, which may be related to electrophysiological changes in the human body.⁶,⁷

Reduced cognitive processing speed, decreased speech planning, sluggish articulator movements, and slower speech are all effects of fatigue induction.⁹ Sleep loss fatigue reduces muscular tension and increases vocal fold elasticity, which shifts the spectral energy distribution and results in breathiness and laxness. It also causes vocal tract softening, which lowers the energy of the speech signal.⁴ Evaluation of fatigue is crucial to minimizing the risk of accidents in an operational setting. Because speaking is a natural action that requires no previous training and voice recording is an unobtrusive procedure, speech analysis offers a way of evaluating fatigue that is distinct from, and easier than, established cognitive assessments.

The term “biological marker” or “biomarker” refers to indicators that reveal the patient's medical condition as seen by others.¹⁰ Acoustic biomarkers have enormous potential for improving diagnostics in disorders that affect the heart, lungs, vocal folds, or brain, all of which can affect a person's voice. Speech recognition technology is becoming one of the most promising technologies for improving healthcare services, and voice analysis using machine learning techniques opens up new avenues in medicine.

Circadian rhythm (CR), defined as “physical, mental, and behavioural changes that occur throughout a roughly 24 h cycle in response to light and darkness,” plays a vital role in human health. Speech quality has been reported to follow a circadian pattern, peaking during regular working hours and reaching a trough during regular sleeping hours.¹¹ Researchers have also discovered a link between intonation loss and sleep loss. Yvonne et al showed that intonation had clearly deteriorated by day 2 of sleep deprivation.¹² Jarek et al showed that an increase in a subject's level of fatigue influenced the speech production mechanism.¹³ An important factor to consider when analysing sleepiness-influenced speech production is the generation of non-linear aerodynamic phenomena within the vocal tract. A recognition rate of 83% was achieved using the vowel [a:] to study the performance difference between the active and fatigued states of a person. In the literature reviewed, a sleep loss period of 24 h was used in about 40% of the studies.

One of the important characteristics of voiced speech is its well-defined harmonic structure.¹,⁴ The HNR indicates the dominance of the harmonic part of the wave over the noise in the voice and is quantified in dB. Non-periodic vibration of the vocal folds contributes to the noise part. Perceptually, HNR reflects voice quality and is used as a practical index of the degree of hoarseness.⁴ A low HNR indicates dysphonia and an asthenic voice; an HNR value of less than 7 dB is classified as pathological.¹⁴
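To make the dB definition concrete, the sketch below estimates HNR with a common autocorrelation-based formula, HNR = 10 log10(r / (1 − r)), where r is the highest normalised autocorrelation of the signal within the expected pitch-lag range. This is only an illustration: the method used inside Dr. Speech is not described in the source, the function names are hypothetical, and the test signals are synthetic sine waves with added noise, not voice recordings.

```python
import math
import random


def hnr_db(signal, min_lag, max_lag):
    """Estimate HNR in dB from the highest normalised autocorrelation peak."""
    best_r = 0.0
    for lag in range(min_lag, max_lag + 1):
        a, b = signal[:-lag], signal[lag:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if den > 0:
            best_r = max(best_r, num / den)
    # Clamp so a perfectly periodic (or aperiodic) signal stays finite.
    best_r = min(max(best_r, 1e-6), 1 - 1e-6)
    return 10 * math.log10(best_r / (1 - best_r))


def synthetic_vowel(noise_level, n=2000, period=80, seed=1):
    """Sine wave with the given period (in samples) plus Gaussian noise."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * i / period) + noise_level * rng.gauss(0, 1)
            for i in range(n)]


clean_hnr = hnr_db(synthetic_vowel(0.05), min_lag=40, max_lag=160)
noisy_hnr = hnr_db(synthetic_vowel(0.5), min_lag=40, max_lag=160)
print(f"low noise: {clean_hnr:.1f} dB, high noise: {noisy_hnr:.1f} dB")
```

The noisier signal yields the lower value, which is the direction of change the study reports for sleep-deprived voice at 3 AM.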

The fatigue induced in the study participants by depriving them of sleep for a whole night caused their voice harmonicity to decrease significantly (F = 7.61, p = 0.0015) and the noise in the voice to increase, giving a lower HNR value for SDV. The dip in harmonicity at 3 AM and its increase at the following dawn again demonstrate the influence of the circadian pattern on human sleep and on fatigue due to sleep loss. Individuals began to show the effect of fatigue around midnight (19 h post baseline), with the maximal effect arising before dawn (3 AM). The sleep deprivation in the present research lasted one night only and does not fall into the category of acute or chronic sleep deprivation. This may be the major reason that the influence of the circadian pattern of sleep and fatigue was observed in the voice parameters. This is an important observation of the present study and will be very useful for military operations in which strategic operations are to be carried out around midnight. The voice analysis results were compared and validated with simple visual RT data. Performance degradation, measured as an increase in RT, worsened with increasing wakefulness. The cognitive response decreases in the fatigued state, and several studies have investigated and confirmed slower RTs and increased errors after sleep loss. The effect of sleep deprivation fatigue on attention and response was clearly seen.

KNN is a non-parametric classifier that classifies a sample based on the majority label among its neighbours. The HNR-based features provided lower false rejection and thus higher sensitivity, with a test accuracy of 57% (Table 2). In a study by Sharma et al, in which the HNR voice feature was used to distinguish normal from pathological voice using KNN, the test accuracy was about 90%.⁴ Fatigue assessment based on visual reaction analysis provided higher sensitivity, with a test accuracy of 78% (Table 3). The AUC of the ROC plot gives the probability that the classifier ranks a positive (fatigued) case above a negative case. In the present case, the AUC values in the ROC curves for both HNR and RT are above 0.70, indicating that the KNN classifier can distinguish NV from SDV.
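The ranking interpretation of the AUC can be computed directly by comparing every fatigued score with every normal score. The sketch below does this in plain Python; the function name is hypothetical and the scores are invented for illustration. For a KNN classifier, a natural score would be the fraction of the k neighbours labelled fatigued, which is an assumption here rather than a detail given in the source.

```python
def auc_from_scores(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Invented classifier scores (NOT the study data).
fatigued_scores = [0.9, 0.8, 0.4]
normal_scores = [0.7, 0.3, 0.2]

auc = auc_from_scores(fatigued_scores, normal_scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the reported values of 0.71 to 0.80 sit between the two.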

Conclusion

Speech provides data that can be utilised as acoustic biomarkers to track a person's physical and mental health. The use of speech information is a relatively new approach to detecting fatigue that can identify physical fatigue quickly and non-invasively. This technique can overcome the drawbacks of current detection techniques, which are cumbersome, inconvenient, and incapable of preventing accidents. Interestingly, this study indicated that the circadian pattern of sleep and fatigue had an impact on the voice parameters. This is an important finding of the study, and it will be enormously beneficial in military operations involving tactical operations at night. The results further demonstrate that NV and SDV speech classes may be classified using HNR features with a KNN classifier.

Patients/ Guardians/ Participants consent

Participants' informed consent was obtained.

Ethical clearance

Institute/hospital ethical clearance certificate was obtained.

Source of support

This paper is based on Armed Forces Medical Research Committee Project No. 4693/2015 granted and funded by the office of the Directorate General Armed Forces Medical Services and Defence Research Development Organization, Government of India.

Disclosure of competing interest

The authors have none to declare.

Acknowledgement

Authors gratefully acknowledge and thank Deepak Gaur, Commandant CH (AF), and Narinder Taneja, Commandant, IAM, IAF, Rajeev Varshney, Director DIPAS, Bhuvnesh Kumar DS for their support in completion of this research work. We sincerely thank all the subjects who participated in this study, for their cooperation.

References

1. Michal I., Zukermann G., Hershkovish S., et al. The effect of 24 hrs of sleep deprivation on vocal parameters of young adults. J Voice. 2020;34(3):489.e1–489.e9. doi: 10.1016/j.jvoice.2018.11.010.
2. Xiujie G., Kefeng M., Honglian Y., et al. A rapid, non invasive method for fatigue detection based on voice information. Front Cell Dev Biol. 2020:1–10. doi: 10.3389/fcell.2022.994001.
3. Harold P.G., Joel B., Eric F., et al. Fatigue estimation using voice analysis. Behavioral research methods. 2007;39(3):610–619. doi: 10.3758/bf03193033.
4. Kumara S., Ananth K., Udipi N., et al. Study of harmonics-to-noise ratio and critical-band energy spectrum of speech as acoustic indicators of laryngeal and voice pathology. Journal on Advances in Signal Processing. 2007:1–9. doi: 10.1155/2007/85286.
5. Orozco Rafael J., Honig F., Arias-Londoño J.D., Vargas-Bonilla J.F. Voiced/unvoiced transitions in speech as a potential bio-marker to detect Parkinson's disease. INTERSPEECH. 2015. doi: 10.21437/Interspeech.2015-34.
6. Mohammad H.R.K., Nasir B.M., Hossain Salim. Psychomotor skills in total sleep deprivation. Int Med J. 2021;28(2):159–162.
7. Banks S., Dinges D.F. Behavioral and physiological consequences of sleep restriction. J Clin Sleep Med. 2007;3(5):519–528.
8. Ashok B.W., Vipin S. Changes in speech and voice characteristics following two hours and eight hours of acute sleep deprivation. Indian J Aero Med. 2022;66(1):15–20.
9. Adam P.V., Janet F., Paul M. Acoustic analysis of the effects of sustained wakefulness on speech. J Acoust Soc Am. 2010;128(6):3747–3756. doi: 10.1121/1.3506349.
10. Kyle S., Jorge A.T. What are biomarkers? Review Curr Opin HIV AIDS. 2010;5(6):463–466. doi: 10.1097/COH.0b013e32833ed177.
11. Thomas Z., D'souza P. Effect of circadian cycle on voice: a cross-sectional study with young adults of different chronotypes. J Laryngol Voice. 2018;8(1):19–23.
12. Yvonne H., James A.H. Sleep deprivation affects speech. Sleep. 1997;20(10):871–877. doi: 10.1093/sleep/20.10.871.
13. Jarek K., Sebastian S., David S., Anton B., Björn S. Applying multiple classifiers and non-linear dynamics features for detecting sleepiness from speech. Neurocomputing. 2012;(84):65–75.
14. João P.T., Carla O., Carla L. Vocal acoustic analysis - jitter, shimmer and HNR parameters. Procedia Technology. 2013;(9):112–1122.
