PLOS Digital Health. 2022 Oct 20;1(10):e0000112. doi: 10.1371/journal.pdig.0000112

A voice-based biomarker for monitoring symptom resolution in adults with COVID-19: Findings from the prospective Predi-COVID cohort study

Guy Fagherazzi 1,*, Lu Zhang 2, Abir Elbéji 1, Eduardo Higa 1, Vladimir Despotovic 3, Markus Ollert 4,5, Gloria A Aguayo 1, Petr V Nazarov 2,6, Aurélie Fischer 1
Editor: Ryan S McGinnis
PMCID: PMC9931359  PMID: 36812535

Abstract

People with COVID-19 can experience impairing symptoms that require enhanced surveillance. Our objective was to train an artificial intelligence-based model to predict the presence of COVID-19 symptoms and derive a digital vocal biomarker for easily and quantitatively monitoring symptom resolution. We used data from 272 participants in the prospective Predi-COVID cohort study recruited between May 2020 and May 2021. A total of 6473 voice features were derived from recordings of participants reading a standardized pre-specified text. Models were trained separately for Android devices and iOS devices. A binary outcome (symptomatic versus asymptomatic) was considered, based on a list of 14 frequent COVID-19 related symptoms. A total of 1775 audio recordings were analyzed (6.5 recordings per participant on average), including 1049 corresponding to symptomatic cases and 726 to asymptomatic ones. The best performances were obtained from Support Vector Machine models for both audio formats. We observed an elevated predictive capacity for both Android (AUC = 0.92, balanced accuracy = 0.83) and iOS (AUC = 0.85, balanced accuracy = 0.77), as well as low Brier scores (0.11 and 0.16 respectively for Android and iOS) when assessing calibration. The vocal biomarker derived from the predictive models accurately discriminated asymptomatic from symptomatic individuals with COVID-19 (t-test P-values < 0.001). In this prospective cohort study, we have demonstrated that a simple, reproducible task of reading a standardized pre-specified text of 25 seconds enabled us to derive a vocal biomarker for monitoring the resolution of COVID-19 related symptoms with high accuracy and calibration.

Author summary

People infected with SARS-CoV-2 may develop different forms of COVID-19 characterized by diverse sets of COVID-19 related symptoms and thus may require personalized care. Among digital technologies, voice analysis is a promising field of research to develop user-friendly, cheap-to-collect, non-invasive vocal biomarkers to facilitate the remote monitoring of patients. Previous attempts have tried to use voice to screen for COVID-19, but so far, little research has been done to develop vocal biomarkers specifically for people living with COVID-19. In the Predi-COVID cohort study, we have been able to identify an accurate vocal biomarker to predict the symptomatic status of people with COVID-19 based on a standardized voice recording task of about 25 seconds, where participants had to read a pre-specified text. Such a vocal biomarker could soon be integrated into clinical practice for rapid screening during a consultation to aid clinicians during anamnesis, or into future telemonitoring solutions and digital devices to help people with COVID-19 or Long COVID.

Introduction

The COVID-19 pandemic has massively impacted the worldwide population and healthcare systems, with more than 200 million cases and 4 million deaths as of August 2021 [1]. COVID-19 is a heterogeneous disease with various phenotypes and severity levels. This diversity of profiles, from asymptomatic cases to severe cases admitted to the ICU, requires tailored care pathways [2].

Except for hospitalized individuals, asymptomatic, mild, and moderate COVID-19 cases are advised to isolate and rely on home-based healthcare [3]. Monitoring symptom resolution or aggravation can be useful to identify individuals at risk of hospitalization or in need of immediate attention. An objective monitoring solution could therefore be beneficial, with its use potentially extended to people with Long Covid syndrome [4] to monitor their symptoms in the long run and improve their quality of life.

The pandemic has put entire healthcare systems under considerable pressure, to the point of requiring national or regional lockdowns. Identifying solutions to help healthcare professionals focus on the most severe and urgent cases was strongly recommended. Digital health and artificial intelligence (AI)-based solutions hold the promise of relieving clinicians by automating tasks or transferring them to the patients themselves [5]. Enabling self-surveillance and remote monitoring of symptoms using augmented telemonitoring solutions could therefore help to improve and personalize the way COVID-19 cases are handled [6].

Among all the types of digital data easily available at a large scale, voice is a promising source, as it is rich, user-friendly, cheap to collect, non-invasive, and can serve to derive vocal biomarkers to characterize and monitor health-related conditions which could then be integrated into innovative telemonitoring or telemedicine technologies [7].

Several vocal biomarkers have already been identified in other contexts, such as neurodegenerative diseases or mental health, or as a potential COVID-19 screening tool based on cough recordings [8], but no prior work has been performed yet to develop a vocal biomarker of COVID-19 symptom resolution.

We hypothesized that symptomatic people with COVID-19 had different audio features from asymptomatic cases and that it was possible to train an AI-based model to predict the presence of COVID-19 symptoms and then derive a digital vocal biomarker for easily and quantitatively monitoring symptom resolution. To test this hypothesis, we used data from the large hybrid prospective Predi-COVID cohort study.

Methods

Study design and population

Predi-COVID is a prospective, hybrid cohort study composed of laboratory-confirmed COVID-19 cases in Luxembourg who are followed up remotely for 1 year to monitor their health status and symptoms. The objectives of the Predi-COVID study are to identify new determinants of COVID-19 severity and to conduct deep phenotyping analyses of patients by stratifying them according to the risk of complications. The study design and initial analysis plan were published elsewhere [9]. Predi-COVID is registered on ClinicalTrials.gov (NCT04380987) and was approved by the National Research Ethics Committee of Luxembourg (study number 202003/07) in April 2020. All participants provided written informed consent to take part in the study.

Predi-COVID includes a digital sub-cohort study composed of volunteers who agreed to a real-life remote assessment of their symptoms and general health status based on a digital self-reported questionnaire sent every day for the first 14 days after inclusion, then once a week during the third and fourth weeks and then every month for up to one year. Participants were asked to answer these questionnaires as often as possible but were free to skip them if they felt too ill or if symptoms did not materially change from one day to the other.

Predi-COVID volunteers were also invited to download and use, on their smartphone, Colive LIH, a smartphone application developed by the Luxembourg Institute of Health to specifically collect audio recordings in cohort studies. Participants were given a unique code to enter the smartphone application and perform the recordings.

Data collection in Predi-COVID follows the best practices guidelines from the German Society of Epidemiology [10]. For the present work, the authors also followed the TRIPOD standards for reporting AI-based model development and validation and used the corresponding checklist to draft the manuscript [11].

In the present analysis, we included all the Predi-COVID participants recruited between May 2020 and May 2021 with available audio recordings at any time point in the first two weeks of the follow-up and who had filled in the daily questionnaire on the same day as the audio recordings. Therefore, multiple audio recordings were available for a single participant.

COVID-19 related symptoms

Study participants were asked to report their symptoms from a list of those frequently reported in the literature: dry cough, fatigue, sore throat, loss of taste and smell, diarrhea, fever, respiratory problems, increase in respiratory problems, difficulty eating or drinking, skin rash, conjunctivitis or eye pain, muscle pain/unusual aches, chest pain, and overall pain level (for more details, please see Table 1). We considered a symptomatic case to be someone reporting at least one symptom from the list and an asymptomatic case to be someone who completed the questionnaires but did not report any symptom from the list.
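For illustration only, the sketch below shows how such a binary label could be derived from a questionnaire record; the column names are hypothetical and not the study's variable names, and the mapping of the 1–10 pain scale to symptom presence is left out since it is not detailed here.

```python
# Hypothetical sketch of deriving the binary symptomatic/asymptomatic label from
# a daily questionnaire; column names are illustrative, not the study's variables.
import pandas as pd

SYMPTOM_COLUMNS = [
    "dry_cough", "fatigue", "sore_throat", "loss_taste_smell", "diarrhea", "fever",
    "respiratory_problems", "increase_respiratory_problems",
    "difficulty_eating_drinking", "skin_rash", "conjunctivitis_eye_pain",
    "muscle_pain", "chest_pain",
]

def label_symptomatic(answers: pd.Series) -> int:
    """Return 1 if at least one listed symptom is reported, 0 otherwise."""
    return int(any(answers.get(col) == "Yes" for col in SYMPTOM_COLUMNS))

# Example: a questionnaire reporting only fatigue is labeled symptomatic (1)
example = pd.Series({**{col: "No" for col in SYMPTOM_COLUMNS}, "fatigue": "Yes"})
print(label_symptomatic(example))  # -> 1
```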

Table 1. Distribution of symptoms for participants with at least one symptom reported in the 14 days of follow-up.

Symptom | Question | Modalities | Symptom presence (%) in symptomatic evaluations (N = 1049 symptom evaluations from 225 individuals)
Dry cough | Do you have a dry cough? | Yes/No | 44.7
Fatigue | Do you feel tired? | Yes/No | 50.0
Sore throat | Did you have a sore throat in the past few days? | Yes/No | 22.7
Loss of taste and smell | Did you notice a strong decrease or a loss of taste or smell? | Yes/No | 47.3
Diarrhea | Do you have diarrhea? At least 3 loose stools per day. | Yes/No | 7.3
Fever | Do you have fever? | Yes/No | 7.7
Respiratory problems | Do you have respiratory problems? | Yes/No | 15.3
Increase in respiratory problems | Did you notice the appearance or an increase of your usual respiratory problems since the diagnosis? | Yes/No | 18.9
Difficulty eating or drinking | Do you have significant difficulty in eating or drinking? | Yes/No | 2.7
Skin rash | Did you notice any sudden-onset skin rash on the hands or feet (for example frostbite, persistent redness, sometimes painful, acute urticaria)? | Yes/No | 3.5
Conjunctivitis or eye pain | Did you notice the appearance of conjunctivitis or eye pain (persistent redness in the whites of the eye, itchy eyelid, tingling, burning, frequent tearing)? | Yes/No | 14.8
Muscle pain/unusual aches | Did you have muscle pain or unusual aches in the last days? | Yes/No | 34.1
Chest pain | Did you have chest pain in the last days? | Yes/No | 14.6
Overall pain level | What is your current pain level? | Rate from 1 (low) to 10 (high) | Pain level > 2 (%): 17.3

Voice recordings

Participants were asked to record themselves while reading, in their language (German, French, English, or Portuguese), a standardized, prespecified text which is the official translation of the first section of Article 25 of the Universal Declaration of Human Rights of the United Nations [12] (see S1 File for more details). The audio recordings were performed in a real-life setting. Study investigators provided the participants with a few guidelines on how to position themselves and their smartphones for optimal audio quality, along with a demo video.

Pre-processing

Raw audio recordings were then pre-processed before the training of the algorithms using Python libraries (Fig 1). First, all audio files were converted into .wav files using the ffmpy.FFmpeg() function, keeping the original sampling rate (i.e. 8 kHz and 44.1 kHz for 3gp and m4a respectively) and a 16-bit bit-depth. The compression ratio is around 10:1 for 3gp files and between 1.25:1 and 10:1 for m4a files. Audio recordings shorter than 2 seconds were excluded at this stage. A clustering (DBSCAN) on basic audio features (duration; the average, sum, and standard deviation of signal power; and fundamental frequency), power in the time domain, and cepstrum was performed to detect outliers, which were further checked manually and removed in case of bad audio quality. Audio recordings were then normalized for volume using the pydub.effects.normalize function, which finds the maximum volume of an audio segment and then adjusts the rest of the audio in proportion. Noise reduction was applied to the normalized audio with the log-minimum mean square error (logMMSE) speech enhancement/noise reduction algorithm, which has been shown to result in a substantially lower residual noise level without substantially affecting the voice signal [13]. Finally, blanks > 350 ms at the start or the end of the audio were trimmed.
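A minimal sketch of these steps is given below, assuming the ffmpy and pydub packages mentioned above; file names, the silence threshold, and the trimming logic are illustrative and not taken from the study pipeline, and the DBSCAN outlier detection and logMMSE noise reduction steps are omitted.

```python
# Illustrative pre-processing sketch (not the authors' exact pipeline).
import ffmpy
from pydub import AudioSegment
from pydub.effects import normalize
from pydub.silence import detect_leading_silence

# 1. Convert the raw recording to 16-bit PCM .wav, keeping the original sampling rate
ffmpy.FFmpeg(
    inputs={"recording.3gp": None},
    outputs={"recording.wav": "-acodec pcm_s16le -ar 8000"},  # 8 kHz for a 3gp input
).run()

audio = AudioSegment.from_wav("recording.wav")

# 2. Exclude recordings shorter than 2 seconds (pydub durations are in milliseconds)
if len(audio) < 2000:
    raise ValueError("Recording shorter than 2 seconds, excluded from analysis")

# 3. Peak-normalize the volume
audio = normalize(audio)

# 4. Trim leading/trailing silence (hypothetical -40 dBFS threshold)
def trim_silence(segment: AudioSegment, threshold_db: float = -40.0) -> AudioSegment:
    start = detect_leading_silence(segment, silence_threshold=threshold_db)
    end = detect_leading_silence(segment.reverse(), silence_threshold=threshold_db)
    return segment[start:len(segment) - end]

audio = trim_silence(audio)
audio.export("recording_clean.wav", format="wav")
```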

Fig 1. General pipeline from data collection to vocal biomarker.


Feature extraction

We extracted audio features from the pre-processed voice recordings using the OpenSmile package [14] (see S2 File), at 8 kHz for both the 3gp and m4a formats. We used the ComParE 2016 feature set but modified the configuration file to add the Low-Level Descriptor (LLD) MFCC0, which is the average of log energy and is commonly used in speech recognition. Applying the logarithm to the computation of energy mimics the dynamic variation in the human auditory system, making the energy less sensitive to input variations that might be caused by the speaker moving closer to or further from the microphone. Overall, this allowed us to extract 6473 features (instead of the original 6373)—functionals of 66 LLDs as well as their delta coefficients.
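For illustration, the snippet below extracts the standard ComParE 2016 functionals with the opensmile Python wrapper; it uses the default configuration (i.e. without the modified configuration file adding MFCC0 described above), and the file name is hypothetical.

```python
# Sketch of ComParE 2016 functional extraction with the opensmile Python wrapper
# (default configuration, 6373 functionals; the study used a modified config).
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("recording_clean.wav")  # 1 row x 6373 columns DataFrame
print(features.shape)
```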

Data analysis

We first performed descriptive statistics to characterize the included study participants, using means and standard deviations for quantitative features, and counts and percentages for qualitative features. Within each audio type (3gp and m4a), we compared the distribution of the arithmetic mean of each LLD between symptomatic and asymptomatic samples (S3 File and S4 File). Separate models were trained for each audio format (3gp/Android and m4a/iOS devices).

Feature selection

We used recursive feature elimination to reduce the dimensionality and select meaningful information from the raw audio signal that could be further processed by a machine learning algorithm. Recursive Feature Elimination (RFE) is a wrapper-based feature selection method, meaning that the model parameters of another algorithm (e.g. Random Forest) are used as criteria to rank the features, and the lowest-ranked features are iteratively eliminated. In this way, an optimal subset of features is extracted at each iteration; note that the features with the highest rank (eliminated last) are not necessarily the most relevant individually [15]. We used Random Forest as the core estimator, and the optimal number of selected features was determined using a grid search over the values [100, 150, 200, 250, 300]. The optimal number of features is provided in the subsection “Best predictive model” of the “Results” section.
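A sketch of this selection step using scikit-learn is shown below; the pipeline layout, estimator settings, and scoring metric are illustrative assumptions rather than the study's exact configuration, and X and y stand for the ComParE feature matrix and the symptomatic/asymptomatic labels.

```python
# Illustrative RFE with a Random Forest core estimator, grid-searched over the
# candidate numbers of retained features (a sketch, not the exact study setup).
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipe = Pipeline([
    ("rfe", RFE(estimator=RandomForestClassifier(n_estimators=200, random_state=0))),
    ("clf", SVC(probability=True)),
])
grid = GridSearchCV(
    pipe,
    param_grid={"rfe__n_features_to_select": [100, 150, 200, 250, 300]},
    scoring="balanced_accuracy",
    cv=5,
)
# grid.fit(X, y)  # X: n_recordings x 6473 feature matrix, y: 1 = symptomatic, 0 = asymptomatic
```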

Classification model selection and evaluation

We performed a randomized search with 5-fold cross-validation for the optimal hyperparameters of five methods frequently used in audio signal processing, using their respective scikit-learn functions: Support Vector Machine (SVM), bagging SVMs, bagging trees, Random Forest (RF), and Multi-Layer Perceptron (MLP). SVM is a widely used machine learning method for audio classification [16]. SVM constructs a maximum-margin hyperplane, which can be used for classification [17]. One of the advantages of SVM is its robustness to a high variable-to-sample ratio and a large number of variables. A bagging classifier is an ensemble meta-estimator that fits base classifiers on random subsets of the original dataset and then aggregates their individual predictions to form a final prediction [17,18]. This approach can be used to reduce the variance of an estimator. A decision tree is a non-parametric classification method, which creates a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. However, a single decision tree suffers from high variance; we therefore added the bagging approach described above to reduce the variance. A RF is also an ensemble learning method that fits a number of decision trees to improve the predictive accuracy and control over-fitting [19]. Unlike bagging, the random forest selects a random subset of the features at each candidate split in the learning process, whereas bagging uses all the features. We applied the random forest with different parameter configurations for the number of trees (100, 150, 200, 250, 300, 350, 400). MLP is a fully connected feedforward neural network using backpropagation for training. The scripts are made available in open source (please see the GitHub link below).
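As an example, a randomized hyperparameter search for one of the candidate classifiers (SVM) could look like the sketch below; the parameter distributions are illustrative assumptions, not the values used in the study.

```python
# Sketch of a randomized hyperparameter search with 5-fold cross-validation for
# an SVM classifier; parameter ranges are illustrative assumptions.
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

search = RandomizedSearchCV(
    SVC(probability=True),
    param_distributions={
        "C": loguniform(1e-2, 1e2),
        "gamma": loguniform(1e-4, 1e-1),
        "kernel": ["rbf", "linear"],
    },
    n_iter=50,
    scoring="balanced_accuracy",
    cv=5,
    random_state=0,
)
# search.fit(X_selected, y)  # X_selected: features retained by RFE
```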

The performance of the models with optimal hyperparameters was assessed using 5-fold cross-validation, as well as on test datasets unseen during feature selection and hyperparameter tuning, with the following indices: area under the ROC curve (AUC), balanced accuracy, F1-score, precision, recall, and Matthews correlation coefficient (MCC, a more reliable measure of the differences between actual and predicted values) [20]. The model with the highest MCC was selected as the final model. We evaluated the significance of the cross-validated scores of the final model with 1000 permutations [21]. Briefly, we generated randomized datasets by permuting only the binary outcome of the original dataset 1000 times and calculated the cross-validated scores on each randomized dataset. The p-value represented the fraction of randomized datasets on which the model performed as well as or better than on the original data. Calibration was then assessed by plotting reliability diagrams for the selected models using 5-fold cross-validation and by computing the mean Brier score [22]. Classification models and calibration assessments were generated using scikit-learn 0.24.2. Statistical analysis was performed using scipy 1.6.2. The plots were generated using matplotlib 3.3.4 and seaborn 0.11.1.
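The sketch below illustrates these evaluation metrics and the label-permutation test with scikit-learn utilities, run on a small synthetic dataset purely for demonstration; it is not the study's evaluation script.

```python
# Illustrative evaluation of a classifier: AUC, MCC, Brier score, and a
# label-permutation test, on a synthetic dataset (for demonstration only).
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import cross_val_predict, permutation_test_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=100, random_state=0)
model = SVC(probability=True, random_state=0)

y_prob = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
print("AUC:", roc_auc_score(y, y_prob))
print("MCC:", matthews_corrcoef(y, y_prob > 0.5))
print("Brier score:", brier_score_loss(y, y_prob))

# Permutation test: fraction of label-shuffled datasets on which the model scores
# as well as or better than on the real labels (reduce n_permutations for speed).
score, perm_scores, p_value = permutation_test_score(
    model, X, y, cv=5, n_permutations=1000, scoring="balanced_accuracy"
)
print("Permutation p-value:", p_value)
```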

Derivation of the digital vocal biomarker

For each type of device, we used the predicted probability of being classified as symptomatic from the best model as our final vocal biomarker, which can be used as a quantitative measure to monitor the presence of symptoms. We further described its distribution in the symptomatic and asymptomatic groups and performed a t-test between the two groups.
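A minimal sketch of this comparison is given below, using simulated biomarker values (predicted probabilities) for the two groups rather than the study data.

```python
# Sketch: compare the vocal biomarker (predicted probability of being symptomatic)
# between groups with a t-test; the values below are simulated, not study data.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
biomarker_symptomatic = rng.beta(5, 2, size=100)   # hypothetical probabilities
biomarker_asymptomatic = rng.beta(2, 5, size=100)  # hypothetical probabilities

t_stat, p_value = ttest_ind(biomarker_symptomatic, biomarker_asymptomatic)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```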

Results

Study participants’ characteristics

We analyzed data from N = 272 participants with an average age of 39.9 years. Among them, 50.3% were women. For recording, 101 participants used Android devices (3gp format) while 171 participants used iOS devices (m4a format). We did not observe a difference in the distribution of age, sex, BMI, smoking, antibiotic use, or comorbidities, including diabetes, hypertension, and asthma, between the two types of devices (Table 2). On average, participants reported their symptoms on 6.5 days during the first 14 days of follow-up, which resulted in 1775 audio recordings for analysis. Among them, N = 1049 were classified as “symptomatic” and N = 726 as “asymptomatic”.

Table 2. Overall study participants’ characteristics split by type of smartphone/audio format (Predi-COVID Cohort Study, N = 272).

Variables [mean (SD) or N(%)] | Overall (N = 272) | Android devices / 3gp audio format (N = 101) | iOS devices / m4a audio format (N = 171) | P-values (Student’s t or Chi-square)
Clinical Features
Sex (% female) | 50.3% | 46.6% | 52.4% | 0.34
Age (years) | 39.9 (13.2) | 40.1 (12.5) | 39.7 (13.6) | 0.80
Body mass index (kg/m²) | 25.6 (4.7) | 25.6 (4.3) | 25.6 (4.9) | 0.97
Smoking (% current smoker) | 15.8% | 15.5% | 15.9% | 0.75
Antibiotic use (% yes) | 11.1% | 10.3% | 11.6% | 0.86
Diabetes (% yes) | 2.5% | 2.3% | 2.7% | 1.00
Hypertension (% yes) | 9.6% | 8.5% | 10.2% | 0.75
Asthma (% yes) | 5.6% | 4.7% | 6.2% | 0.71
Voice recordings
Total audio samples available | 1775 | 693 | 1082 |
"Symptomatic" labels | 1049 | 449 | 600 | <0.001
"Asymptomatic" labels | 726 | 244 | 482 |
Language
French | 44.6% | 48.8% | 41.9% | <0.001
German | 30.9% | 32.0% | 30.2% | 0.45
English | 22.6% | 18.3% | 25.3% | <0.001
Portuguese | 1.9% | 0.9% | 2.6% | 0.016
Number of voice recordings & assessment of symptoms per participant over 14 days | 6.5 | 6.9 | 6.3 |

A total of 225 participants reported at least one symptom during the 14 days. The most frequently observed symptoms were fatigue (50%), loss of taste and smell (47.3%), and dry cough (44.7%). Conversely, difficulty eating or drinking (2.7%) and skin rash (3.5%) were the least frequent symptoms. The participants spoke four languages: French (44.6%), German (30.9%), English (22.6%), and Portuguese (1.9%).

Best predictive model

We selected 100 and 250 features from the 3gp and m4a audio recordings, respectively, using the Recursive Feature Elimination method (see S5 File). The optimal number of features was determined from a five-fold cross-validated selection. For the Android, 3gp format, we observed that the selected features mainly came from the spectral domain (53%), followed by cepstral (37%), prosodic (8%), and voice quality features (2%). We observed a similar trend for the iOS, m4a format, where the selected features mainly came from the spectral domain (59%), followed by cepstral (29%), prosodic (9%), and voice quality features (3%). For each type of device (Android, 3gp format versus iOS, m4a format), we compared the performances of each classifier (Table 3). According to the balanced accuracy and AUC criteria, SVM models outperformed all other models. The SVM model had slightly better performance on Android devices than on iOS devices. For Android, the SVM model had an AUC of 0.92 and a balanced accuracy of 0.83; the MCC was 0.68, the F1-score 0.85, precision 0.86, and recall 0.86. For iOS devices, the SVM model had an AUC of 0.85 and a balanced accuracy of 0.77; the MCC was 0.54, the F1-score 0.77, precision 0.78, and recall 0.78. We can also observe in the confusion matrices that the models rarely predicted the wrong symptomatic/asymptomatic status and that both mean Brier scores were low (respectively 0.11 and 0.16 for Android and iOS devices, Fig 2). The scores obtained from permuted datasets were all lower than the scores from the original dataset, i.e. the permutation p-value was < 0.001.

Table 3. Performances of the different algorithms.

Audio format | Model | AUC | Balanced accuracy | MCC | F1-score (overall / class 0 / class 1) | Precision (overall / class 0 / class 1) | Recall (overall / class 0 / class 1)
Android devices / 3gp Random Forest 0.90(0.92) 0.79(0.83) 0.60(0.67) 0.82(0.85) 0.73(0.78) 0.87(0.89) 0.82(0.85) 0.79(0.80) 0.84(0.87) 0.82(0.85) 0.68(0.76) 0.90(0.90)
Support Vector Machine (SVM) 0.92(0.92) 0.83(0.83) 0.68(0.66) 0.85(0.84) 0.79(0.78) 0.89(0.88) 0.86(0.84) 0.83(0.76) 0.87(0.89) 0.86(0.84) 0.75(0.80) 0.91(0.87)
449 symptomatic cases Bagging Tree 0.89(0.91) 0.79(0.80) 0.62(0.63) 0.82(0.83) 0.73(0.75) 0.88(0.88) 0.83(0.83) 0.82(0.81) 0.83(0.85) 0.83(0.83) 0.66(0.69) 0.92(0.91)
244 asymptomatic cases Bagging SVM 0.92(0.94) 0.78(0.80) 0.62(0.65) 0.82(0.84) 0.71(0.75) 0.88(0.88) 0.84(0.84) 0.87(0.85) 0.82(0.84) 0.83(0.84) 0.61(0.67) 0.95(0.93)
Multi-Layer Perceptron (MLP) 0.88(0.88) 0.79(0.80) 0.58(0.59) 0.81(0.81) 0.73(0.73) 0.86(0.86) 0.81(0.81) 0.74(0.73) 0.85(0.86) 0.81(0.81) 0.71(0.73) 0.87(0.86)
iOS devices / m4a Random Forest 0.81(0.78) 0.72(0.70) 0.47(0.41) 0.73(0.71) 0.67(0.66) 0.78(0.74) 0.74(0.71) 0.76(0.68) 0.73(0.73) 0.74(0.71) 0.61(0.65) 0.84(0.76)
Support Vector Machine (SVM) 0.85(0.84) 0.77(0.76) 0.54(0.52) 0.77(0.76) 0.74(0.74) 0.80(0.78) 0.78(0.76) 0.76(0.72) 0.79(0.80) 0.77(0.76) 0.73(0.76) 0.81(0.76)
600 symptomatic cases Bagging Tree 0.83(0.80) 0.74(0.73) 0.50(0.48) 0.75(0.74) 0.69(0.69) 0.80(0.78) 0.76(0.75) 0.79(0.75) 0.74(0.74) 0.75(0.75) 0.61(0.64) 0.87(0.83)
482 asymptomatic cases Bagging SVM 0.86(0.83) 0.72(0.74) 0.51(0.52) 0.73(0.75) 0.63(0.68) 0.80(0.81) 0.78(0.77) 0.88(0.82) 0.70(0.73) 0.74(0.76) 0.50(0.58) 0.94(0.90)
Multi-Layer Perceptron (MLP) 0.81(0.78) 0.73(0.73) 0.47(0.46) 0.74(0.73) 0.70(0.70) 0.77(0.75) 0.74(0.73) 0.71(0.69) 0.76(0.77) 0.74(0.73) 0.69(0.72) 0.78(0.74)

Two strategies were used to assess the model performances. The first was a 5-fold cross-validation, which provides more information but is potentially based on partially seen data; the second was an assessment of the model performances only on the test dataset corresponding to the training dataset on which the highest performances were obtained. The numbers outside the brackets indicate the mean values from a 5-fold cross-validation across the whole dataset. The values in brackets indicate the performance on the test dataset, with the model trained on a separate training dataset. * A description of each model is available in the Material and Methods section.

Fig 2. Vocal biomarker distribution in people with and without symptoms—Confusion Matrix (a), Boxplot (b), AUC (c), Calibration curve (d).

Vocal biomarker of symptom resolution

Based on the selected best predictive SVM models, we derived, for each type of device, a digital vocal biomarker that quantitatively represents the probability of being classified as symptomatic (Fig 2). In the test set, we observed a substantial difference in the distributions of the vocal biomarker between the symptomatic and asymptomatic categories (P < 0.001 for both Android and iOS devices).

Discussion

In the prospective Predi-COVID cohort study, we have trained an AI-based algorithm to predict the presence or absence of symptoms in people with COVID-19. We have then derived, for each type of smartphone device (Android or iOS), a vocal biomarker that can be used to accurately identify symptomatic and asymptomatic individuals with COVID-19.

Comparison with the literature

The use of respiratory sounds has previously been suggested for COVID-19 [23], but these efforts largely focused on the use of cough [8,24] or breathing [25] recordings to predict COVID-19 diagnosis [26]. The most promising approach so far has been proposed by MIT [8], achieving elevated sensitivity (98.5% in symptomatic and 100% in asymptomatic individuals) and specificity (94.2% in symptomatic and 83.2% in asymptomatic individuals). Similar COVID-19 infection risk evaluation systems and large cough databases are currently under development [27], but additional research still needs to be performed to assess whether such algorithms predict the true COVID-19 status or rather the general health status of the individual [28]. We can also mention the recent work by Robotti et al., who trained machine learning models (SVM classifiers) based on OpenSmile-derived features from voice recordings to classify individuals as positive (n = 70), recovered (n = 70), or healthy (n = 70) [29]. In one sensitivity analysis, they trained algorithms to classify positive versus recovered individuals, which is the approach closest to ours. They reported higher performances than ours (AUC = 0.96 versus AUC = 0.92 and 0.85 in the present work for Android and iOS devices, respectively) and showed that similar results can be achieved regardless of the type of audio recording used (vowel phonation, text, or cough). The slightly higher performances in their work are probably attributable to their standardized recruitment process as well as the controlled environment of the recordings, as the recording sessions were conducted in similar hospital rooms, with quiet environments and tolerable levels of background noise. In comparison to their work, our models suggest that, in real life, we can expect a slight decrease in the overall performance of the models but that it remains feasible to use such an approach without any controlled environment, relying on various devices and recording situations. To our knowledge, no other work focusing on individuals with COVID-19 has been reported; no comparison was therefore possible with other models based on voice recordings other than cough, nor with models aiming to monitor symptoms in people with COVID-19.

Biological mechanisms

SARS-CoV-2 infection can cause damage to multiple organs [30,31], regardless of the initial disease severity, and can persist chronically in individuals with Long Covid [32,33]. Frequently reported COVID-19 related symptoms have now been described at length (fatigue, dyspnea, cardiac abnormalities, cognitive impairment, sleep disturbances, symptoms of post-traumatic stress disorder, muscle pain, concentration problems, headache, anosmia, ageusia [34]), and the underlying mechanisms have also been described [33]. Many systems such as, but not restricted to, the respiratory, cardiovascular, neurological, gastrointestinal, and musculoskeletal systems can be altered and, if impaired, can directly impact voice characteristics. An inappropriate inflammatory response [35], increased levels of cytokines such as interleukin-6 [36], stiffer and less flexible vocal cords due to inflammation, ACE2 receptors expressed on the mucous membrane of the tongue [37], or, more generally, a combination of central, peripheral, and negative psychological or social factors are involved both in COVID-19 pathogenesis and in voice production or voice disorders. In addition, for hospitalized individuals, tracheostomy [38] or intubation could also modify voice audio features [39]. Of note, none of the participants included in the present study underwent such a procedure.

Strengths and limitations

This work has several strengths. First, enrolled participants were all cases with a diagnosis confirmed by a positive PCR test, which excludes the risk of having non-infected individuals in the asymptomatic category and false-positive individuals in the symptomatic group. The prospective design of the Predi-COVID cohort study limits the risk of differential bias. The data collection process also ensures that the audio recordings were performed on the same day as the assessment of the symptoms, which limits the risk of reverse causation. Finally, this work relies on a large set of symptoms frequently reported in the literature.

This work also has some limitations. First, our analysis only covers the discovery and internal validation phases of the vocal biomarker. Because the recordings were performed in real life, we first cleaned and pre-processed the audio recordings and developed a pipeline to ensure that the training of the vocal biomarker was as clean as possible, but we cannot completely rule out the possibility of having a few recordings of low quality. Potential sources of low recording quality include sub-optimal recording conditions in an uncontrolled environment, the use of lossy audio formats for data compression, and the artifacts potentially introduced by noise reduction. No similar dataset with comparable audio recordings currently exists on this topic, which prevents us from performing an external validation. Our vocal biomarker is mostly based on French and German voice recordings, and as audio features may vary across languages or accents, our work will have to be replicated in other settings and populations [40]. The data used to train and test the models, as well as the corresponding programs, are open source and made available to the scientific community for replication or follow-up studies.

Conclusions and perspectives

Using a simple, reproducible task of reading a standardized pre-specified text of 25 seconds, our work has demonstrated that it is possible to derive a vocal biomarker from a machine learning model to monitor the resolution of COVID-19 related symptoms with elevated accuracy and calibration. We have shown that voice is a non-invasive, quick, and cheap way to monitor COVID-19-related symptom resolution or aggravation. Such a vocal biomarker could be integrated into future telemonitoring solutions, digital devices, or in clinical practice for a rapid screening during a consultation to aid clinicians during anamnesis.

Supporting information

S1 File. Text to read for voice recording.

(DOCX)

S2 File. List of 66 acoustic features used based on OpenSmile_COMPARE 2016.

(XLSX)

S3 File. Density of LLD audio features for symptomatic vs asymptomatic cases—Android devices - 3gp audio format.

(TIFF)

S4 File. Density of LLD audio features for symptomatic vs asymptomatic cases—iOS devices—m4a audio format.

(TIFF)

S5 File. Selected features for the best model for each audio format.

(XLSX)

Acknowledgments

We thank the Predi-COVID participants for their involvement in the study, the members of the Predi-COVID external scientific committee for their expertise, as well as the project team, the IT team in charge of the app development, and the nurses in charge of recruitment, data and sample collection, and management on the field.

Data Availability

Data and Code Availability: The data and code used to train and validate the models are available here: Data: https://zenodo.org/record/5572855 Code: https://github.com/LIHVOICE/Predi-COVID_voice_symptoms.

Funding Statement

The Predi-COVID study is supported by the Luxembourg National Research Fund (FNR) (grant number 14716273 to GF, MO), the André Losch Foundation (GF, MO), and the Luxembourg Institute of Health (GF, MO). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. COVID-19 Map—Johns Hopkins Coronavirus Resource Center. [cited 10 Sep 2021]. Available from: https://coronavirus.jhu.edu/map.html.
2. Wilmes P, Zimmer J, Schulz J, Glod F, Veiber L, Mombaerts L, et al. SARS-CoV-2 transmission risk from asymptomatic carriers: Results from a mass screening programme in Luxembourg. Lancet Reg Health Eur. 2021;4: 100056. doi: 10.1016/j.lanepe.2021.100056
3. Johansson MA, Quandelacy TM, Kada S, Prasad PV, Steele M, Brooks JT, et al. SARS-CoV-2 Transmission From People Without COVID-19 Symptoms. JAMA Netw Open. 2021;4: e2035057. doi: 10.1001/jamanetworkopen.2020.35057
4. Crook H, Raza S, Nowell J, Young M, Edison P. Long covid—mechanisms, risk factors, and management. BMJ. 2021;374. doi: 10.1136/bmj.n1648
5. Fagherazzi G, Goetzinger C, Rashid MA, Aguayo GA, Huiart L. Digital Health Strategies to Fight COVID-19 Worldwide: Challenges, Recommendations, and a Call for Papers. J Med Internet Res. 2020;22: e19284. doi: 10.2196/19284
6. DeMerle K, Angus DC, Seymour CW. Precision Medicine for COVID-19: Phenotype Anarchy or Promise Realized? JAMA. 2021;325: 2041–2042. doi: 10.1001/jama.2021.5248
7. Fagherazzi G, Fischer A, Ismael M, Despotovic V. Voice for Health: The Use of Vocal Biomarkers from Research to Clinical Practice. Digit Biomark. 2021;5: 78–88. doi: 10.1159/000515346
8. Laguarta J, Hueto F, Subirana B. COVID-19 Artificial Intelligence Diagnosis Using Only Cough Recordings. IEEE Open Journal of Engineering in Medicine and Biology. 2020. pp. 275–281. doi: 10.1109/OJEMB.2020.3026928
9. Fagherazzi G, Fischer A, Betsou F, Vaillant M, Ernens I, Masi S, et al. Protocol for a prospective, longitudinal cohort of people with COVID-19 and their household members to study factors associated with disease severity: the Predi-COVID study. BMJ Open. 2020;10: e041834. doi: 10.1136/bmjopen-2020-041834
10. Hoffmann W, Latza U, Baumeister SE, Brünger M, Buttmann-Schweiger N, Hardt J, et al. Guidelines and recommendations for ensuring Good Epidemiological Practice (GEP): a guideline developed by the German Society for Epidemiology. Eur J Epidemiol. 2019;34: 301–317. doi: 10.1007/s10654-019-00500-x
11. TRIPOD statement. [cited 10 Sep 2021]. Available from: https://www.tripod-statement.org/resources/.
12. United Nations. Universal Declaration of Human Rights. [cited 10 Sep 2021]. Available from: https://www.un.org/en/about-us/universal-declaration-of-human-rights
13. Ephraim Y, Malah D. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing. 1985. pp. 443–445. doi: 10.1109/tassp.1985.1164550
14. audeering. opensmile: The Munich Open-Source Large-Scale Multimedia Feature Extractor. [cited 10 Sep 2021]. Available from: https://github.com/audeering/opensmile.
15. Guyon I, Weston J, Barnhill S, Vapnik V. Machine Learning. 2002. pp. 389–422. doi: 10.1023/a:1012487302797
16. Sezgin MC, Gunsel B, Kurt GK. Perceptual audio features for emotion detection. EURASIP J Audio Speech Music Process. 2012;2012. doi: 10.1186/1687-4722-2012-16
17. Cortes C, Vapnik V. Support-vector networks. Machine Learning. 1995. pp. 273–297. doi: 10.1007/bf00994018
18. Breiman L. Bagging predictors. Machine Learning. 1996. pp. 123–140. doi: 10.1007/bf00058655
19. Breiman L. Random Forests. Mach Learn. 2001;45: 5–32.
20. Chicco D, Tötsch N, Jurman G. The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. BioData Min. 2021;14: 1–22.
21. Ojala M, Garriga GC. Permutation Tests for Studying Classifier Performance. 2009 Ninth IEEE International Conference on Data Mining. 2009. doi: 10.1109/icdm.2009.108
22. Brier GW. Verification of forecasts expressed in terms of probability. Monthly Weather Review. 1950. pp. 1–3.
23. Anthes E. Alexa, do I have COVID-19? Nature. 2020. pp. 22–25. doi: 10.1038/d41586-020-02732-4
24. Imran A, Posokhova I, Qureshi HN, Masood U, Riaz MS, Ali K, et al. AI4COVID-19: AI enabled preliminary diagnosis for COVID-19 from cough samples via an app. Inform Med Unlocked. 2020;20: 100378. doi: 10.1016/j.imu.2020.100378
25. COVID-19 Sounds App. [cited 10 Sep 2021]. Available from: http://www.covid-19-sounds.org/.
26. Automatic diagnosis of COVID-19 disease using deep convolutional neural network with multi-feature channel from respiratory sound data: Cough, voice, and breath. Alex Eng J. 2021. [cited 10 Sep 2021]. doi: 10.1016/j.aej.2021.06.024
27. Orlandic L, Teijeiro T, Atienza D. The COUGHVID crowdsourcing dataset, a corpus for the study of large-scale cough analysis algorithms. Scientific Data. 2021;8: 1–10.
28. Claxton S, Porter P, Brisbane J, Bear N, Wood J, Peltonen V, et al. Identifying acute exacerbations of chronic obstructive pulmonary disease using patient-reported symptoms and cough feature analysis. NPJ Digit Med. 2021;4: 107. doi: 10.1038/s41746-021-00472-x
29. Robotti C, Costantini G, Saggio G, Cesarini V, Calastri A, Maiorano E, et al. Machine Learning-based Voice Assessment for the Detection of Positive and Recovered COVID-19 Patients. J Voice. 2021. doi: 10.1016/j.jvoice.2021.11.004
30. Hamming I, Timens W, Bulthuis MLC, Lely AT, Navis GJ, van Goor H. Tissue distribution of ACE2 protein, the functional receptor for SARS coronavirus. A first step in understanding SARS pathogenesis. J Pathol. 2004;203: 631–637. doi: 10.1002/path.1570
31. Wu Z, McGoogan JM. Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72 314 Cases From the Chinese Center for Disease Control and Prevention. JAMA. 2020;323: 1239–1242. doi: 10.1001/jama.2020.2648
32. Raman B, Cassar MP, Tunnicliffe EM, Filippini N, Griffanti L, Alfaro-Almagro F, et al. Medium-term effects of SARS-CoV-2 infection on multiple vital organs, exercise capacity, cognition, quality of life and mental health, post-hospital discharge. EClinicalMedicine. 2021;31: 100683. doi: 10.1016/j.eclinm.2020.100683
33. Dennis A, Wamil M, Alberts J, Oben J, Cuthbertson DJ, Wootton D, et al. Multiorgan impairment in low-risk individuals with post-COVID-19 syndrome: a prospective, community-based study. BMJ Open. 2021;11: e048391. doi: 10.1136/bmjopen-2020-048391
34. Sudre CH, Keshet A, Graham MS, Joshi AD, Shilo S, Rossman H, et al. Anosmia, ageusia, and other COVID-19-like symptoms in association with a positive SARS-CoV-2 test, across six national digital surveillance platforms: an observational study. Lancet Digit Health. 2021;3: e577–e586. doi: 10.1016/S2589-7500(21)00115-1
35. Islam MF, Cotler J, Jason LA. Post-viral fatigue and COVID-19: lessons from past epidemics. Fatigue: Biomedicine, Health & Behavior. 2020. pp. 61–69. doi: 10.1080/21641846.2020.1778227
36. McElvaney OJ, McEvoy NL, McElvaney OF, Carroll TP, Murphy MP, Dunlea DM, et al. Characterization of the Inflammatory Response to Severe COVID-19 Illness. Am J Respir Crit Care Med. 2020;202: 812–821. doi: 10.1164/rccm.202005-1583OC
37. Xu H, Zhong L, Deng J, Peng J, Dan H, Zeng X, et al. High expression of ACE2 receptor of 2019-nCoV on the epithelial cells of oral mucosa. Int J Oral Sci. 2020;12: 8. doi: 10.1038/s41368-020-0074-x
38. Rouhani MJ, Clunie G, Thong G, Lovell L, Roe J, Ashcroft M, et al. A Prospective Study of Voice, Swallow, and Airway Outcomes Following Tracheostomy for COVID-19. Laryngoscope. 2021;131: E1918–E1925. doi: 10.1002/lary.29346
39. Archer SK, Iezzi CM, Gilpin L. Swallowing and Voice Outcomes in Patients Hospitalized With COVID-19: An Observational Cohort Study. Arch Phys Med Rehabil. 2021;102: 1084–1090. doi: 10.1016/j.apmr.2021.01.063
40. The Lancet Digital Health. Do I sound sick? Lancet Digit Health. 2021;3: e534. doi: 10.1016/S2589-7500(21)00182-5
PLOS Digit Health. doi: 10.1371/journal.pdig.0000112.r001

Decision Letter 0

Ryan S McGinnis, Liliana Laranjo

30 Mar 2022

PDIG-D-21-00115

A Voice-based Biomarker For Monitoring Symptom Resolution In Adults With COVID-19: Findings From The Prospective Predi-COVID Cohort Study

PLOS Digital Health

Dear Dr. Fagherazzi,

Thank you for submitting your manuscript to PLOS Digital Health. After careful consideration, we feel that it has merit but does not fully meet PLOS Digital Health's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 29 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at digitalhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pdig/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Ryan S McGinnis, Ph.D.

Academic Editor

PLOS Digital Health

Journal Requirements:

1. Please amend your detailed Financial Disclosure statement. This is published with the article, therefore should be completed in full sentences and contain the exact wording you wish to be published.

State the initials, alongside each funding source, of each author to receive each grant.

2. Please update your Competing Interests statement. If you have no competing interests to declare, please state: “The authors have declared that no competing interests exist.”

3. We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex. If you are providing a .tex file, please upload it under the item type ‘LaTeX Source File’ and leave your .pdf version as the item type ‘Manuscript’.

4. Please provide separate figure files in .tif or .eps format only and ensure that all files are under our size limit of 20MB.


For more information about how to convert your figure files please see our guidelines: https://journals.plos.org/digitalhealth/s/figures

Additional Editor Comments (if provided):


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Does this manuscript meet PLOS Digital Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

--------------------

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

--------------------

3. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

--------------------

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Digital Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

--------------------

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The paper presents a machine-learning based approach for the automatic identification of the presence of COVID-19 symptoms from the voice signal, proposing a “biomarker” which ultimately is the calibrated probability of being symptomatic. A pool of COVID-positive patients were recruited (Predi-COVID cohort), and vocal recordings were collected along with self-reported symptomatology questionnaires, throughout a year-long period. There is heterogeneity of language and demographics within the subjects. A pipeline was built, starting from audio pre-processing followed by a wrapper-based feature selection and multiple classifiers, the best of which was chosen as the final model, whose calibrated outputs bring the “biomarker”. The test is independently divided between Android and iOS-based recording devices.

The paper is well written and coherently organized, although there’s an abundance of sections. Moreover, the building and usage of a new dataset and the search for symptomatology are a strong contribution to the originality of the study. However, there is a number of weak points that affect the generality of the work.

Especially when building custom datasets, which are often small, particular attention to cleanliness and a rigorous methodology should be employed. The recording conditions are not explained in sufficient detail. The main point that should be addressed with regard to the recordings deals with the source audio. In the “Data Analysis” section, the format of the files is stated to be 3GP for Android and M4A for iOS. Both of these extensions employ AAC coding for audio signals, which is inherently lossy. This often leads to unsatisfactory and/or biased performances as well as loss of information for the audio analysis. An app is said to be used: since it manages the audio recording, we argue that it should record uncompressed/lossless audio. Additionally, no information is given on the compression parameters, sampling frequency, and bit-depth of the collected audio.

The automatic selection of the recordings is interesting, although very briefly explained in the “Pre-processing” section. Some points could be stated for the whole pre-processing and feature extraction part:

1) Some references or a concise discussion on the validity of the proposed selection methods could be beneficial, as it could be argued that “bad” recordings might have been left in if the selection methods weren’t sufficiently accurate.

2) The pydub.effects.normalize function is stated to bring “a volume boost on the quiet parts”. Although the manual for the function states so, this is inherently wrong. This kind of normalization simply brings the peak of the signal to 0 dBFS. The relative volume throughout different sections of the signal stays the same. Quiet parts are selectively raised in volume only if compression is employed, or if normalization happens on smaller sections of the main signal, which appears not to be the case.

3) Noise reduction is applied, using the logMMSE method. Using a noise reduction algorithm is a very delicate matter in audio analysis, as it inevitably changes and denatures the original data, possibly bringing artifacts not found in common speech signals, even when perceptual characteristics seem to improve. Therefore, a more thorough explanation and some proper referencing are needed in order to disclose the very algorithm used, the framework/library it has been employed with, and the reason why it should improve performance.

4) It is not totally clear if there is a one-to-one correspondence between an audio recording of a patient and his questionnaires. Does a patient record arbitrarily, whereas questionnaires are (mostly) compulsory?

5) A better explanation behind the choice of adding the MFCC0 features, and on their nature as well, could be beneficial.

The machine-learning part has some strong points, such as the use of many classifiers and steps, but needs some clarification and a more coherent explanation of some of the choices.

In the Feature Selection section, the theory behind the RFE is not clearly explained, no references are present and the first sentence should be re-written. Additionally, it is not specified that it is a wrapper-based method and there is no clear explanation for the way the number of features is reached using Random Forest classifiers.

However, the most crucial point in the machine-learning section is the training-test split and the cross-validation throughout all the steps. What appears from the manuscript is that a 5-fold cross-validation was independently used within every different step, starting from the feature selection. However, since the 5 folds in the classification step will be different, the features selected beforehand will be based on samples that may be the same as those now in the testing fold. This generates an inherent bias. Technically, the test data should never undergo feature selection, not even partly.

Nevertheless, in section “Derivation of the digital biomarker” a “distribution in the test dataset” is mentioned. Which test dataset is it referring to? Perhaps the cross-validation happened only once, before the Feature Selection, as Figure 1 may suggest; this is not clear.

Several other points should be clarified, justified or corrected:

1) In the “Classification model selection and evaluation” it is said that “We evaluated the significance of the cross-validated scores of the final model with 1000 permutations.”. The sentence is unclear and the methodology should be explained.

2) Since it is (understandably) stated as the most reliable indicator, a more thorough explanation would be needed for the MCC, as well as referencing. It would also be beneficial to present its formula, possibly in terms of false positives (FP), false negatives (FN), etc. Moreover, there is a “T” missing in Matthews’ name.

3) The MCC is also the only indicator left out in the “Best predictive model” section.

4) At the end of the same section, the calibrations are deemed as “good” because of their Brier scores. However, such scores are not explained or referenced. A Brier score of 0.09 means a correct prediction with a 70% certainty, so what does “good” exactly mean in this scenario?

The meta-analysis is also very hasty and lackluster. There is a brief digression on cough-based studies on COVID, and then it’s just stated that “no similar work to ours, i.e. focusing on individuals with COVID-19, has been previously reported. No comparison was possible with potential other models based on voice recordings rather than cough …”. However, this is definitely not true, as there is a handful of studies dealing with speech (sometimes associated with cough too) for COVID-19. The main ones that should be assessed are:

1) Robotti, C, Costantini, G et al. : Machine Learning-based Voice Assessment for the Detection of Positive and Recovered COVID-19 Patients (Journal of Voice, DOI:https://doi.org/10.1016/j.jvoice.2021.11.004)

2) Pinkas, G, Karny, Y et al. : SARS-CoV-2 detection from voice. (IEEE Open J Eng Med. 2020)

3) Shimon, C, Shafat, G et al.: Artificial intelligence enabled preliminary diagnosis for COVID-19 from voice cues and questionnaires. (J Acoust Soc Am. 2021)

In the end, despite the novelty of the work and the undying necessity for quality research on COVID-19, we argue that this paper needs major revisions. A more thorough analysis and justification of the audio quality should be presented, as well as some clarification in the machine-learning section, and definitely a more extensive literature analysis.

Reviewer #2: Review of “a Voice-based Biomarker for Monitoring Symptom Resolution in Adults with Covid-19: Findings from the Prospective Predi-Covid Cohort Study” for PLOS

This paper sets out to derive a digital vocal biomarker to monitor COVID-19 symptom resolution. The authors used data from a prospective COVID study which included voice recordings from both symptomatic and asymptomatic participants who tested positive for COVID. Artificial intelligence models were trained to detect the resolution of symptoms.

The paper would be improved by including additional narrative clarifying the methodology. PLOS Digital Health includes studies from a variety of disciplines, and not everyone will have an extensive background in artificial intelligence modeling or handling digital voice recordings. Please include a list of the acronyms and their meanings.

Please include the total number of audio recordings and the number of symptomatic cases and asymptomatic cases in the text in the final paragraph under “methods.”

In developing the list of COVID-19 related symptoms that were included in self-report questionnaires, did the authors consider “Flu-pro plus”? This is widely used in the United States.

On page 4, under the section “Pre-processing”, the sentence beginning “First, audio files…” appears to be missing something in parentheses. The sentence on noise reduction appears to be incomplete.

On page 5, “Feature extraction,” please explain more about why you did this. This should be clear to those unfamiliar with sound processing and artificial intelligence modeling. The sentence starting “We compared the distribution…” talks about the arithmetic mean of symptomatic versus asymptomatic samples but also discusses separating android versus iOS audio in one confusing sentence. It would be clearer if these two concepts were described separately.

On page 5, “classification model selection and evaluation,” please explain the different models and why you would choose to test all four.

On page 7, in the last sentence of the section entitled “comparison with the literature,” do you mean “other” instead of the word “rather”?

On Table 1, does the table include only symptomatic participants? It’s confusing, because you mentioned you started out with 272 study participants, but I could not find the number of symptomatic versus asymptomatic participants.

On Table 3, please include a key with a brief description of the machine learning algorithms and their acronyms. In this table, “cases” appears to mean the number of voice recordings, not the number of actual participants; it would be good to clarify this.

In your discussion section, you might consider including possible future directions. You did not have a COVID-free control group, and I wondered whether the voices of people who had not tested positive for COVID would differ from those of people who tested positive but were asymptomatic. It appears that you compared voice recordings of participants reporting symptoms with voice recordings of participants who did not report any symptoms. So, over the course of time, would a person be classified as “asymptomatic” following the resolution of symptoms reported earlier? If so, it might also be interesting to compare the same participant’s voice while symptoms are reported with that participant’s voice after the symptoms have resolved.

Reviewer #3: This is a clearly written, timely paper. In my view, its major strengths lie in the good data availability and reproducibility, and in the large, COVID-19-specific symptom database.

However, the paper has some major drawbacks, outlined below:

1. The database is large enough to support state-of-the-art deep-learning-based classification algorithms, yet only some traditional algorithms were tested. Although DL algorithms would not necessarily perform better, they should be included in the study.

2. The authors extract a huge number of features with OpenSmile (6472) and then perform feature selection using a Recursive Feature Elimination method. However, there is no detailed description of which voice features were extracted and selected, nor any analysis of the relevance of the extracted features for the classification. Are the best features more related to prosodic, purely acoustic, or dynamic features?

Such an analysis would be crucial for a better understanding of a speech-based biomarker model (a minimal feature-selection sketch follows this list of remarks).

3. Apart from the large database collection with its specificities, the current study does not represent a novelty with respect to the state of the art. The techniques used are not novel, and a discussion of which features are relevant as biomarkers, which would be a good contribution to the paper, is not presented. The authors mention that "The most promising approach so far has been proposed by the MIT, as they achieved an elevated sensitivity (98.5% in symptomatic and 100% in asymptomatic individuals) and specificity (94.2% in symptomatic and 83.2% in asymptomatic individuals)." Which study are they referring to? This approach is not properly referenced. Also, what are the main differences with respect to MIT's study that could make this contribution novel?

4. One of the limitations mentioned by the authors is that "Our vocal biomarker is mostly based on French and German voice recordings and as audio features may vary across languages or accents, our work will have to be replicated in other settings and populations." To overcome this, the experiments could have been performed separately by language, making it possible to analyse the language dependence of the biomarker.
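To make point 2 concrete, below is a minimal sketch of recursive feature elimination with a linear SVM in scikit-learn, offered for illustration only and not as the authors' pipeline; the feature matrix X and labels y are random placeholders standing in for the OpenSmile features and symptom labels. The retained feature indices could then be mapped back to named acoustic features to support the interpretability analysis requested above.

```python
# Minimal RFE sketch with hypothetical data (not the study's actual features).
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))   # placeholder for extracted voice features
y = rng.integers(0, 2, size=200)  # placeholder symptomatic/asymptomatic labels

# A linear kernel exposes coefficients that RFE uses to rank features;
# 50 features are removed per iteration until 50 remain.
selector = RFE(SVC(kernel="linear"), n_features_to_select=50, step=50).fit(X, y)
selected = np.flatnonzero(selector.support_)
print(len(selected))  # 50 retained feature indices
```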

Minor remark:

- "350ms" --> "350 ms"

--------------------

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

--------------------

Comments to the Author

1. Does this manuscript meet PLOS Digital Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #3: Yes

--------------------

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLOS Digit Health. doi: 10.1371/journal.pdig.0000112.r003

Decision Letter 1

Ryan S McGinnis, Liliana Laranjo

13 Jul 2022

PDIG-D-21-00115R1

A Voice-based Biomarker For Monitoring Symptom Resolution In Adults With COVID-19: Findings From The Prospective Predi-COVID Cohort Study

PLOS Digital Health

Dear Dr. Fagherazzi,

Thank you for submitting your manuscript to PLOS Digital Health. After careful consideration, we feel that it has merit but does not fully meet PLOS Digital Health's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 30 days, by Sep 11 2022, 11:59 PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at digitalhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pdig/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Ryan S McGinnis, Ph.D.

Academic Editor

PLOS Digital Health

Journal Requirements:

1. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article's retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments (if provided):

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

--------------------

2. Does this manuscript meet PLOS Digital Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

--------------------

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

--------------------

4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

--------------------

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Digital Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

--------------------

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: MINOR REVISION

The authors addressed most of the concerns with reasonable answers.

Several limitations to the scientific validity of the proposed study still remain, especially regarding the quality of the dataset and the reproducibility of the methodology. However, the proposal is coherent and interesting enough for publication, with a suggested minor revision addressing the following points:

- In the response, the authors state that lossy audio is “a necessary trade-off in anticipation of the future implementation in real life of such a digital health solution in practice where multiple devices are used”. We politely disagree, as losing audio quality on sources as diverse and complex as self-recorded voice could genuinely hinder the quality of the measurements; on the other hand, it is feasible to provide an app or API that allows different devices to record lossless audio.

- For the abovementioned reasons, it would be beneficial to clearly state the main limitations of this study for readers: lossy audio, sub-optimal recording conditions and lack of control (already mentioned in the Limitations section), non-homogeneous processing due to the .m4a compression, and possible artifacts induced by the use of logMMSE noise reduction.

- The explanation of the 1000 permutations used to train the models is still unclear: which “targets” are permuted? Please rewrite this passage.

- It is unclear to me why the results of the proposed references detecting COVID-19 from voice are not considered comparable to those in the present study. Detecting “symptomatic” COVID-19 is of course the main aim when trying to pre-diagnose it from voice. However, this is just a minor consideration.

We thank the authors for following our suggestions.

Reviewer #2: Excellent contribution. Thank you for your response to reviewer comments.

--------------------

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: None

Reviewer #2: No

--------------------

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLOS Digit Health. doi: 10.1371/journal.pdig.0000112.r005

Decision Letter 2

Ryan S McGinnis, Liliana Laranjo

26 Aug 2022

A Voice-based Biomarker For Monitoring Symptom Resolution In Adults With COVID-19: Findings From The Prospective Predi-COVID Cohort Study

PDIG-D-21-00115R2

Dear Dr Fagherazzi,

We are pleased to inform you that your manuscript 'A Voice-based Biomarker For Monitoring Symptom Resolution In Adults With COVID-19: Findings From The Prospective Predi-COVID Cohort Study' has been provisionally accepted for publication in PLOS Digital Health.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email from a member of our team. 

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact digitalhealth@plos.org.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Digital Health.

Best regards,

Ryan S McGinnis, Ph.D.

Academic Editor

PLOS Digital Health

***********************************************************

Reviewer Comments (if any, and for reference):

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Text to read for voice recording.

    (DOCX)

    S2 File. List of 66 acoustic features used based on OpenSmile_COMPARE 2016.

    (XLSX)

    S3 File. Density of LLD audio features for symptomatic vs asymptomatic cases—Android devices - 3gp audio format.

    (TIFF)

    S4 File. Density of LLD audio features for symptomatic vs asymptomatic cases—iOS devices—m4a audio format.

    (TIFF)

    S5 File. Selected features for the best model for each audio format.

    (XLSX)

    Attachment

    Submitted filename: Rebuttal Letter - PLOS DIGITAL HEALTH.docx

    Attachment

    Submitted filename: Rebuttal Letter - PDH.docx

    Data Availability Statement

    Data and Code Availability: The data and code used to train and validate the models are available here: Data: https://zenodo.org/record/5572855 Code: https://github.com/LIHVOICE/Predi-COVID_voice_symptoms.

