2022 Feb 1:1–13. Online ahead of print. doi: 10.1007/s12652-022-03732-0

CR19: a framework for preliminary detection of COVID-19 in cough audio signals using machine learning algorithms for automated medical diagnosis applications

Ezz El-Din Hemdan 1, Walid El-Shafai 2,3, Amged Sayed 4
PMCID: PMC8803577  PMID: 35126765

Abstract

The second wave of COVID-19 has caused panic and disruption worldwide. The disease has numerous symptoms, ranging from a simple fever to an inability to breathe that may lead to death. Cough is considered one of the most common of these symptoms. Recent research shows that the cough of a COVID-19 patient has distinct features that differ from those of other diseases. Consequently, cough sounds can be detected and classified as a preliminary diagnosis of COVID-19, which helps reduce the spread of the disease. An artificial intelligence (AI) engine can diagnose COVID-19 by performing a differential analysis of the inherent characteristics of the cough and comparing it with non-COVID-19 coughs. However, diagnosing a COVID-19 infection from cough alone is an extremely challenging multidisciplinary problem. Therefore, this paper proposes a hybrid framework, called CR19, for efficient COVID-19 detection and diagnosis from cough audio signals using various machine learning (ML) algorithms. The accuracy of the framework is improved by combining a genetic algorithm (GA) with the ML techniques. We also assess CR19 on metrics such as precision, recall, and F-measure. The results show that the hybrid (GA-ML) technique outperforms the plain ML approaches (LR, LDA, KNN, CART, NB, and SVM) on the different evaluation metrics. The proposed framework achieves accuracies of 92.19%, 94.32%, 97.87%, 92.19%, 91.48%, and 93.61%, compared with 90.78%, 92.90%, 95.74%, 87.94%, 81.56%, and 92.198% for plain LR, LDA, KNN, CART, NB, and SVM, respectively. The CR19 framework can therefore serve as a clinical decision assistance tool that helps physicians make proper medical decisions about COVID-19 and channels clinical testing and treatment to those who need it most, thereby saving more lives.

Keywords: COVID-19, Machine learning, Cough, Automated diagnosis, Genetic algorithm, AI, GA-ML technique, Classification

Introduction

With the continued spread of COVID-19 and the emergence of a second wave in many countries, the pandemic has seriously affected various aspects of life around the world. It is causing enormous human suffering and, because of its health, social, and economic consequences, has become the largest global crisis since the World Wars. Ten months after its emergence, COVID-19 had claimed more than a million lives and infected more than 45 million people globally (WHO 2020).

COVID-19 is a respiratory illness that has claimed more than a million lives and infected more than 62 million people globally (Fig. 1). One of the key factors that contributed to the spread of this epidemic is misdiagnosis, or confusion with regular flu symptoms. With the continued spread of the disease, researchers and scientists are striving to propose concepts and methods that can control and prevent the outbreak. Devising appropriate methods and techniques for early and accurate diagnosis of COVID-19 is therefore both difficult and crucial (Apostolopoulos and Mpesiana 2020a, b; Li et al. 2020; Lalmuanawma et al. 2020).

Fig. 1. Reported cases in some countries (WHO, Nov 28th 2020)

Recently, researchers and scientists have been striving to propose concepts and methods that can control and prevent the outbreak of COVID-19, and state-of-the-art digital technologies are being applied for diagnosis, detection, and tracking. The most straightforward application is the use of artificial intelligence (AI) to diagnose COVID-19 infection automatically from computed tomography (CT) scans and X-ray images (Goel et al. 2020; Zebin and Rezvy 2020; Apostolopoulos and Mpesiana 2020a, b; Malhotra et al. 2021; Lalmuanawma et al. 2020). This strategy works with a dataset of COVID-19 and normal chest X-ray images. Its difficulty is that many people wait for hours in hospitals for scan examinations, which burdens the medical system and delays the diagnosis of infectious cases, contributing to the outbreak of the disease. In another direction, tremendous efforts are being made to diagnose infectious people early and prevent the spread of the disease using mobile apps or websites, as in the strategies adopted in China and South Korea for discovering, tracing, and isolating patients (Imran et al. 2020; Maghdid et al. 2020; Menni et al. 2020). New information and communication technologies (ICT) help the medical sector monitor, diagnose, and track infected patients to control the spread of COVID-19. However, as the pandemic spreads worldwide, this strategy needs further development to keep up with the changing situation. A particular danger of this disease is that many people who develop COVID-19 do not show any symptoms, which makes them one of the main reasons for its spread. Thus, a more reliable technique is required, one that combines several modern technologies, such as the internet of things (IoT) and machine learning, for diagnosing and tracking infected patients.

Therefore, this paper proposes a framework for efficient COVID-19 detection and diagnosis from cough audio signals using machine learning algorithms hybridized with a genetic algorithm. The proposed CR19 framework combines ML and genetic algorithms to improve the efficiency and classification accuracy on the cough data set. The results show that the hybrid (GA-ML) technique provides the best results on different evaluation metrics compared with the plain machine learning approaches. The objective of this work is thus to propose a novel hybrid framework of machine learning and a genetic algorithm that assists medical staff in automatically diagnosing COVID-19 from cough audio signals. The accuracy is promising enough to encourage a large-scale collection of labeled cough data to gauge the generalization capability of the proposed framework. The framework is not a clinical-grade testing tool; however, it could help doctors save more lives by supporting proper medical decisions about COVID-19. The contributions of this paper are summarized as follows:

  • Developing the CR19 framework, which combines machine learning algorithms with a GA to automatically assist the primary diagnosis of COVID-19 from patients’ cough audio signals.

  • Conducting an empirical analysis of the proposed machine learning algorithms on the task of classifying COVID-19 from cough audio signals, at a lower cost than imaging modalities such as CT and X-ray.

  • Analyzing and comparing the proposed models, which show enhanced accuracy over several machine learning algorithms when classifying COVID-19 on a large cough dataset.

  • Helping to fight and control the COVID-19 outbreak by supporting interdisciplinary scientists in developing innovative, intelligent techniques for healthcare systems.

  • Developing a proficient remote patient monitoring system for early COVID-19 detection and diagnosis using hybrid machine learning and GA algorithms in smart healthcare systems, reducing the risk of its spread.

  • Advocating an IoT-based cloud framework to collect, store, and analyze cough data from users’ portable devices (e.g., mobile phones), with scalability to millions of users at the country level.

The rest of this paper is organized as follows. Section 2 briefly reviews the machine learning algorithms and the genetic algorithm used, along with related work. A detailed description of the proposed CR19 framework is presented in Sect. 3. A high-level framework for remote patient monitoring in smart healthcare systems is presented in Sect. 4, while the experimental results and comparative performance of the proposed algorithms are discussed in Sect. 5. Finally, Sect. 6 concludes the paper.

Preliminaries

Machine learning techniques

Machine learning focuses on automatic computer learning that is capable of making its own decisions based on data. In this section, we describe some of the existing state-of-the-art machine learning algorithms that are required to accomplish the clinical purpose of the CR19 framework using algorithms like the following (Randhawa et al. 2018):

  1. Logistic regression (LR): LR is a predictive regression model in which the dependent variable is categorical. LR uses maximum likelihood estimation to fit the probability that an observation belongs to a particular class, with an iterative algorithm such as Newton’s method used to obtain the model parameters.

  2. Linear discriminant analysis (LDA): The main idea of linear discriminant analysis (LDA) is that it decreases the dimensions of a given classification task, focusing on maximizing the separability among known categories. In practice, LDA creates a new axis by maximizing the distance between the means and by minimizing the variation. It then projects the data onto this new axis while reducing the dimensionality.

  3. K-nearest neighbors (KNN): KNN is a supervised learning algorithm used mostly for regression and classification problems. It requires labeled (known) data. Unlike most other machine learning techniques, KNN has no explicit training phase: the prediction for a test observation is based on its distance to the known observations. The main idea is to find the K nearest neighbors of the unknown point and assign it a class according to the predefined classes of those neighbors.

  4. Classification and regression trees (CART): A decision tree is a supervised learning algorithm used mostly in classification problems. It is a flowchart-like structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf (terminal) node holds a class label. CART can handle both numerical and categorical data.

  5. Gaussian Naive Bayes (NB): NB classifies the tested features based on probability. This technique uses normal probability distributions and assumes that the data is normally distributed; in that way, the classification process is more efficient.

  6. Support vector machine (SVM): SVM aims to fit a hyperplane between the data points of different classes such that the samples are separated by the largest gap (margin) possible; the points closest to the hyperplane are the support vectors.
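As an illustration of one of the classifiers listed above, the following is a minimal pure-Python sketch of KNN prediction by distance and majority vote. The function name `knn_predict`, the value `k=3`, and the toy two-feature data are assumptions made for this example; they are not the features or settings used in CR19.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every labeled observation.
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest observations and count their class labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy two-feature data (made-up numbers, not taken from the cough dataset).
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
           (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
train_y = ["negative", "negative", "negative",
           "positive", "positive", "positive"]

print(knn_predict(train_X, train_y, (0.82, 0.88)))  # near the "positive" cluster
print(knn_predict(train_X, train_y, (0.12, 0.18)))  # near the "negative" cluster
```

There is no fitting step here: all the work happens at prediction time, which is what "no training phase" means in the KNN description.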

Genetic algorithm

A genetic algorithm (GA) is a metaheuristic that belongs to the broader class of evolutionary algorithms (EA) inspired by the natural selection process (Sivanandam and Deepa 2008). It was introduced by John Holland in 1975 and brings concepts from biology, such as mutation, crossover, and selection, into computer science to produce high-quality solutions for optimization and search problems. Evolutionary algorithms need a data structure to represent solutions and a way to derive and assess new solutions from old ones. In GAs, for example, the variables of the search problem are encoded into finite-length strings of symbols of a specific cardinality. Each solution is thus represented by a chromosome, which is composed of genes; together, the genes of a chromosome encode one proposed solution to the problem that the GA is trying to solve. The collection of all chromosomes is called the population.

The GA begins with a population that is initialized randomly based on the data. This population of solutions is then modified using different operators, namely reproduction (selection), crossover, and mutation. The steps of the GA are as follows:

  1. Evaluation: Once the population is initialized or an offspring population is created, the fitness values of the candidate solutions are evaluated. This step computes the fitness value of each chromosome, which indicates how “fit” or how “good” the solution is with respect to the problem under consideration. (A chromosome is a set of parameters that defines a candidate solution to the problem that the genetic algorithm is trying to solve.)

  2. Selection: This step selects the best-fitted chromosomes as parents, which pass their genes to the next generation and create a new population. A commonly used selection technique is roulette-wheel selection (Goldberg and Holland 1988), in which chromosomes are selected based on their fitness relative to all other chromosomes in the population. Thus, the ith string in the population is selected with probability $F_i$ proportional to its fitness value $A_i$. The roulette-wheel selection scheme can be implemented as follows (Burke et al. 2014):
    • A. Compute the fitness value $A_i$ of each individual.
    • B. Calculate the probability $F_i$ of selecting the ith member of the population:
      $$F_i = \frac{A_i}{\sum_{n=1}^{k} A_n} \qquad (1)$$
      where k is the population size.
    • C. Evaluate the cumulative probability $X_i$ for each individual:
      $$X_i = \sum_{n=1}^{i} F_n \qquad (2)$$
    • D. Generate a random number $z \in [0, 1]$.
    • E. If $z < X_1$, select the first chromosome; otherwise, select the ith individual for which $X_{i-1} \le z < X_i$.
    • F. To create the k candidates for the mating pool, repeat steps D and E k times.
  3. Crossover: After reproduction (selection), the population is enriched with better individuals. Crossover then creates a new set of chromosomes by combining the parents and adds them to the new population. The operation selects two parents and randomly chooses a point between two genes at which both chromosomes are cut into two parts. The first part of the first parent is combined with the second part of the second parent to give the first offspring; similarly, the first part of the second parent is combined with the second part of the first parent to give the second offspring. These offspring belong to the next population.

  4. Mutation: Mutation alters one or more gene values in a chromosome of the newly generated population. It adds diversity to the population, which is important because the search should keep reaching regions not previously seen so that the entire search space is explored. The resulting population is carried into the next generation.

    Repeat the steps as in Fig. 2 for each generation until finding the best solution to the problem.
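    Steps A to F of the roulette-wheel scheme in the selection step can be sketched in plain Python as follows. The function name `roulette_wheel_select`, the optional `rng` argument, and the example fitness values are assumptions for illustration; the round-off guard at the end is an implementation detail not described in the text.

```python
import random
from itertools import accumulate

def roulette_wheel_select(fitness, k, rng=random):
    """Return k selected indices, each drawn with probability proportional to fitness."""
    total = sum(fitness)                      # Step A supplies the fitness values A_i.
    if total <= 0:
        raise ValueError("fitness values must sum to a positive number")
    # Step B: selection probability F_i = A_i / sum of all A_n (Eq. 1).
    probs = [a / total for a in fitness]
    # Step C: cumulative probability X_i = F_1 + ... + F_i (Eq. 2).
    cumulative = list(accumulate(probs))
    selected = []
    for _ in range(k):                        # Step F: repeat steps D and E k times.
        z = rng.random()                      # Step D: random number in [0, 1).
        for i, x in enumerate(cumulative):    # Step E: first i with z < X_i.
            if z < x:
                selected.append(i)
                break
        else:
            # Guard: floating-point round-off can leave the last X_i just below 1.
            selected.append(len(fitness) - 1)
    return selected

# Example: the third individual holds 60% of the total fitness,
# so it should dominate the mating pool on average.
mating_pool = roulette_wheel_select([1.0, 3.0, 6.0], k=10, rng=random.Random(0))
print(mating_pool)
```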

Fig. 2. Flowchart of genetic algorithm
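The full evaluation, selection, crossover, and mutation loop can be sketched on a toy problem. This is a minimal illustration, not the GA configuration used in CR19: the "OneMax" fitness (count of 1-genes), the binary encoding, the population size, mutation rate, and all function names are assumptions chosen to keep the example small.

```python
import random

def one_max(chromosome):
    """Toy fitness: 1 + number of 1-genes (the +1 keeps every weight positive)."""
    return 1 + sum(chromosome)

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: cut both parents at one point and swap the tails."""
    point = rng.randint(1, len(parent_a) - 1)
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(chromosome, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]

def run_ga(n_genes=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Run the GA loop and return the best chromosome seen (pop_size assumed even)."""
    rng = random.Random(seed)
    # Random initial population of binary chromosomes.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=one_max)
    for _ in range(generations):
        # Evaluation: fitness of every candidate solution.
        fitness = [one_max(c) for c in population]
        # Selection: fitness-proportional sampling (roulette wheel).
        parents = rng.choices(population, weights=fitness, k=pop_size)
        # Crossover: pair up parents and produce two offspring per pair.
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):
            offspring.extend(crossover(a, b, rng))
        # Mutation: the mutated offspring form the next generation.
        population = [mutate(c, mutation_rate, rng) for c in offspring]
        best = max(population + [best], key=one_max)
    return best

best = run_ga()
print(best, "ones:", sum(best))
```

In a GA-ML hybrid, the toy fitness function would be replaced by a score that depends on the classifier, but what the chromosome encodes in that case is not specified in this section.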

Previous studies

Breathlessness is a symptom in approximately 50% of COVID-19 patients and can likewise indicate other life-threatening infections such as pneumonia (Greenhalgh et al. 2020). Automated recognition of breathlessness from speech signals is therefore needed for COVID-19 healthcare and telemedicine screening applications. A patient’s speech, including its breathing patterns, can be recorded with a simple microphone connected to a smart portable device, and abnormalities associated with COVID-19 can be discovered from these breathing patterns. Recently, research on COVID-19 diagnosis algorithms based on cough and breath signals has received more attention (Bagad et al. 2020; Rasheed et al. 2020; Alqudaihi et al. 2021; Pal and Sankarasubbu 2021). Using ML and deep learning (DL) algorithms to detect and screen COVID-19 cases from cough and breath samples has shown encouraging outcomes (Belkacem et al. 2021; Verde et al. 2021; Pahar et al. 2021; Deshpande et al. 2022). The main challenge of cough- or breath-based COVID-19 diagnosis research is the small size of the available open-source datasets.

Audio (speech) datasets assist COVID-19 detection and diagnosis in three essential ways: (1) cough sounds can help identify positive COVID-19 cases using machine learning algorithms (Schuller et al. 2020), (2) the breathing rate can be estimated from the speech signal for COVID-19 screening (Greenhalgh et al. 2020), and (3) stress recognition algorithms applied to the speech signal can identify individuals with signs of psychological health difficulties and gauge the severity of COVID-19 symptoms. However, these speech-based COVID-19 analysis procedures require a massive effort to collect large datasets. Therefore, these procedures can be supported by telemedicine care or smartphone applications.

Brown et al. (2020) developed a cross-platform application to collect crowdsourced breath and cough sounds in order to differentiate between unhealthy and healthy individuals. The voice signals are used to discriminate between individuals with asthma, healthy individuals, and individuals with COVID-19. The authors constructed three binary classification tasks: (a) distinguishing persons with asthma who reported a cough from COVID-19-positive persons with a cough, (b) distinguishing COVID-19-positive persons from healthy persons, and (c) distinguishing healthy persons with a cough from COVID-19-positive persons with a cough. More than 10,000 samples from about 7000 distinct persons were crowdsourced, of which more than 200 were reported as positive COVID-19 cases. Different audio augmentation processes are used to expand the size of the dataset. SVM, gradient boosting trees, and logistic regression classifiers are employed for the classification task. The analysis uses the area under the curve (AUC) for performance evaluation, and an AUC above 70% is obtained in all of the classification tasks.

Sharma et al. (2020) developed efficient cough-based COVID-19 diagnosis algorithms. Speech, breath, and cough sounds are used to measure biomarkers in the proposed acoustic applications. Nine different vocal sounds are collected from every patient, including vowel phonations, breathing (deep and shallow), and coughing (heavy and shallow); these nine sounds capture distinct physical states of the respiratory system. Multidimensional temporal and spectral features are extracted from the audio files. The data curation and classification tasks are still under development and left for future research. The work is complemented by a web-based application for data collection and an open voice dataset of roughly 1000 trials.

Faezipour and Abuzneid (2020) suggested a smartphone-based mobile application for self-examination of COVID-19 using the person’s breathing sounds. They indicate that breathing problems due to COVID-19 can reveal acoustic patterns and features that are essential for pre-diagnosis. The breathing sounds can be captured through the microphone attached to the person’s smartphone. Deep learning, machine learning, and signal processing algorithms can then be applied to the breathing sound to extract the main features and classify the input sound as a negative or positive COVID-19 case. The application can be used as a self-test, eliminating the costs and risks of visiting hospitals and healthcare institutions. The system can be supplemented with blood oxygenation data from a pulse oximeter and lung volume data from a spirometer. To train the framework, the collected data must first be labeled as negative or positive COVID-19 cases by medical professionals based on clinical outcomes. ML algorithms can then extract the main features and classify new input sounds based on this training.

Trivedy et al. (2020) suggested a compact, smartphone-powered spirometer with an automatic CNN-based disease classification system. The spirometer measures the volume of inspired and expired air. The framework can be extended to include COVID-19 classification, and the classifiers can be re-evaluated for accuracy. Melek (2021) used a Fourier transform with Mel-frequency cepstral coefficients (MFCCs) for feature extraction and applied a support vector machine, which gave a classification accuracy of 95.86% with 98.6% sensitivity. Grant et al. (2021) used MFCC and relative spectral perceptual linear prediction (RASTA-PLP) feature extraction with a random forest classifier for cough analysis, and compared this technique with other techniques to verify its efficiency. Andreu-Perez et al. (2021) proposed a deep neural network (DNN) for cough analysis, which achieved an AUC of 0.98 on a dataset of 2339 COVID-19 positives and 6041 COVID-19 negatives. Laguarta et al. (2020) classified COVID-19 coughs with an AUC of 0.97 (sensitivity = 98.5% and specificity = 94.2%) using an architecture of three pretrained ResNet50 networks, after transforming the cough recordings for input to a CNN. Table 1 summarizes the contributions and limitations of some existing work on COVID-19 detection based on cough signals.

Table 1. Summary of contributions and limitations of some existing work in COVID-19 diagnosis using cough signals

| Work | Contribution | Limitations |
|---|---|---|
| Imran et al. (2020) | They provided AI4COVID-19, an AI-enabled preliminary diagnosis system for COVID-19 from cough samples collected via an app | They focused only on cough samples collected via an app; other acquisition methods could be used for collecting data and testing the system for better performance analysis |
| Pal and Sankarasubbu (2021) | An interpretable COVID-19 diagnosis AI framework is devised and developed based on cough sound features and symptom metadata | They worked on a medical dataset containing symptoms and demographic data of 30,000 audio segments, 328 cough sounds from 150 patients; more datasets would be desirable for testing their work |
| Pahar et al. (2021) | A machine learning-based COVID-19 cough classifier that can discriminate COVID-19 coughs recorded on a smartphone | The performance of their model still needs to be improved through feature selection and other preprocessing |
| Deshpande et al. (2022) | They provided an overview of research on human audio signals using AI techniques to screen, diagnose, monitor, and spread awareness about COVID-19 | More comparative analysis is needed of existing machine and deep learning models and their performance on COVID-19 coughs from different datasets |
| Trivedy et al. (2020) | They offered the design and development of a low-cost, portable, smartphone-enabled spirometer with automatic disease classification using a CNN | More datasets and more experimental investigation with additional algorithms are needed to evaluate the proposed system |
| Melek (2021) | They offered a system for the diagnosis of COVID-19 coughs based on an SVM with the radial basis function (RBF) kernel and the MFCC method | Their work focused only on the SVM with the MFCC method; it could be extended to more models and feature extraction methods for cough audio signals |
| Grant et al. (2021) | They proposed an approach for analyzing sounds to unobtrusively detect COVID-19 based on MFCC and RASTA-PLP features with RF and DNN classifiers | They worked only on MFCC and RASTA-PLP features; more features could be extracted from the audio signals and more classifiers tested over different datasets |
| Andreu-Perez et al. (2021) | They developed a generic method based on EMD with subsequent classification based on a tensor of audio features and the DeepCough classifier | They proposed a web tool and underpinning algorithm; a mobile app alongside the web tool would strengthen their approach |
| Laguarta et al. (2020) | They built a data collection pipeline of COVID-19 cough recordings through their website to train their MIT open voice model using CNN-based models | They worked only on CNN-based models; their collected data could be tested with other machine learning models and compared with existing proposed models |

It is observed that many works address COVID-19 detection through image processing of X-ray images, whereas there are few contributions in the literature on cough signal processing for COVID-19 detection. Therefore, this study proposes an efficient prediction framework consisting of four phases: (1) data acquisition, (2) data preprocessing, (3) model training and testing, and (4) COVID-19 classification and detection in cough data. This work aims to build an innovative smart prediction framework based on a genetic algorithm hybridized with various machine learning techniques (logistic regression, linear discriminant analysis, K-nearest neighbors, decision tree, Naïve Bayes, and support vector machine) that can classify COVID-19 as positive or negative from cough data acquired through a mobile phone microphone or any other audio acquisition system.

Proposed CR19 framework

In this paper, we propose a new framework for automatically identifying the COVID-19 status from cough audio signals. Figure 3 illustrates the proposed CR19 framework, which consists of a hybrid GA-ML scheme with six different machine learning algorithms, namely logistic regression, linear discriminant analysis, K-nearest neighbors, decision tree, Naïve Bayes, and support vector machine. The CR19 framework comprises the following key phases to accomplish the diagnosis of the novel coronavirus from cough audio signals:

  • Phase 1 (data acquisition and collection): The audio data is collected from mobile phone sources and passed to the next phases of the proposed framework.

  • Phase 2 (data preprocessing): All audio signals are gathered into one dataset and loaded in a form suitable for further processing within the framework, with each audio signal labeled as a positive or negative COVID-19 case.

  • Phase 3 (model training and testing): To train the selected and/or tuned model among the six machine learning models, the preprocessed dataset is split 75–25, that is, 75% of the data is used for training and 25% for testing. Random subsamples of the training audio data are drawn for the machine learning algorithms, and performance evaluation metrics are then applied to assess the proposed framework.

  • Phase 4 (COVID-19 classification and detection in cough data): In the last phase, the testing data is fed to the tuned machine learning algorithms to classify each input audio signal as positive or negative COVID-19, as presented in Fig. 3. Finally, the performance of every machine learning algorithm is evaluated using the classification performance metrics.

Fig. 3. Proposed CR19 framework to diagnose COVID-19 in cough audio signals
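The 75–25 split used in Phase 3 can be sketched as a random shuffle followed by a cut. This is a minimal stdlib-only illustration; the function name `train_test_split_75_25`, the fixed seed, and the placeholder samples are assumptions, and the section does not state whether the split is stratified by class.

```python
import random

def train_test_split_75_25(samples, labels, test_fraction=0.25, seed=42):
    """Shuffle the indices once, then cut them into training and test parts."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)          # reproducible random order
    n_test = round(len(indices) * test_fraction)  # 25% of the data for testing
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx, seq):
        return [seq[i] for i in idx]

    return (pick(train_idx, samples), pick(test_idx, samples),
            pick(train_idx, labels), pick(test_idx, labels))

# Placeholder data: 100 sample ids with alternating labels (not real cough data).
samples = list(range(100))
labels = ["positive" if i % 2 else "negative" for i in samples]
X_train, X_test, y_train, y_test = train_test_split_75_25(samples, labels)
print(len(X_train), len(X_test))
```

Each sample keeps its own label through the shuffle because samples and labels are indexed by the same shuffled index list.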

High-level proposed IoT based cloud framework

The proposed system aims to allow people suffering from COVID-19 to live safely. This can be achieved by using the internet of things, cloud computing, artificial intelligence, and smartphones for real-time monitoring of people with COVID-19. The proposal thus improves the efficiency of healthcare by providing a more reliable and expedient healthcare system that enables observing and controlling the spread of COVID-19. The recommended framework consists of three main stages that collaborate to achieve the system objectives; each stage performs a specific task in harmony with the other stages. Figure 4 illustrates the proposed framework with its three stages:

  • Stage 1: An IoT-enabled smartphone is used for real-time cough data acquisition.

  • Stage 2: The cloud provides a pool of processing and storage resources that receives users’ cough data from their smartphones over the internet and stores it, making it available for doctors’ inspection. Data exploration and handling are also performed in the cloud to detect any disorder in a patient’s data, so that abnormal changes are classified according to the patient’s status.

  • Stage 3: The medical staff uses a cloud-based dashboard to monitor a patient’s cough data records. The staff can inspect the reports provided by the cloud-based analytical system and make a suitable decision.

Fig. 4. Proposed remote monitoring framework for early COVID-19 detection and diagnosis
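The data flow between the three stages can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the class and function names, the in-memory dictionary standing in for cloud storage, and the threshold-based stub classifier are placeholders, and the real system would use network transport, persistent storage, and the trained CR19 model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class CoughRecord:
    """Stage 1: one cough recording uploaded from a user's smartphone."""
    patient_id: str
    features: List[float]  # placeholder for features extracted from the audio

@dataclass
class CloudStore:
    """Stage 2: stores uploaded records and labels them with a classifier."""
    classify: Callable[[List[float]], str]
    records: Dict[str, List[str]] = field(default_factory=dict)

    def upload(self, record: CoughRecord) -> str:
        label = self.classify(record.features)
        self.records.setdefault(record.patient_id, []).append(label)
        return label

def dashboard(store: CloudStore) -> List[str]:
    """Stage 3: patients with at least one positive result, for staff review."""
    return sorted(pid for pid, labels in store.records.items()
                  if "positive" in labels)

def stub_classifier(features: List[float]) -> str:
    """Stand-in rule (hypothetical threshold), NOT the CR19 model."""
    return "positive" if sum(features) / len(features) > 0.5 else "negative"

store = CloudStore(classify=stub_classifier)
store.upload(CoughRecord("patient-001", [0.9, 0.8]))
store.upload(CoughRecord("patient-002", [0.1, 0.2]))
print(dashboard(store))
```

Passing the classifier in as a callable keeps the storage and dashboard logic independent of whichever trained model is deployed in the cloud.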

Experimental results and discussions

This section describes the experimental environment and the dataset used, and then presents the analysis and discussion of the results of the proposed framework.

Experiment setup

To evaluate the proposed framework for COVID-19 diagnosis based on cough audio signals, the machine learning classifiers are coded in Python, and the experiments are carried out on an Intel(R) Core(TM) i5 CPU with 12 GB RAM running Windows 8.1.

Dataset

Coswara-data (https://github.com/walzter/COVID_Cough, 2020), provided by the Indian Institute of Science (IISc) Bangalore, is an attempt to build a diagnostic tool for COVID-19 based on respiratory, cough, and speech sounds. The project is currently in the data collection stage. It requires participants to record breathing sounds, cough sounds, sustained phonation of vowel sounds, and a counting exercise. The data used here was sourced from this project as well as from online audio datasets from different sources. From the audio files, a CSV file is created for the experiments in the proposed work. We used all samples in this CSV file to provide a proof of concept (POC) of the proposed system for detecting COVID-19 from cough audio signals.

Evaluation performance metrics

The proposed approach combines machine learning with a genetic algorithm to improve the classification accuracy of cough audio signals. We perform a relative comparison of the ML and hybrid GA-ML classifiers to evaluate the accuracy of the proposed framework, using precision, recall, and F-measure as the performance metrics. The metrics are calculated for both the positively (“P”) and negatively (“N”) classified samples on each classifier. In the experiments, six machine learning techniques are applied for classification and detection: logistic regression, linear discriminant analysis, K-nearest neighbors, decision tree, Naïve Bayes, and support vector machine, which detect and diagnose COVID-19 in cough audio signals as positive or negative.

As shown in Table 2, the confusion matrix is used to evaluate the performance of a classification model on a set of test data for which the true labels are known (Sokolova and Lapalme 2009). Its four essential parameters are true positive (TP), true negative (TN), false positive (FP), and false negative (FN). TP is the number of positive (COVID-19) cases that are correctly identified as positive. TN is the number of negative (regular) cases that are correctly identified as negative. FP is the number of negative cases that are incorrectly classified as positive. FN is the number of positive cases that are incorrectly classified as negative.

Table 2.

Confusion matrix

Actual class | Predicted negative | Predicted positive
Actual negative | TN | FP
Actual positive | FN | TP
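
The four counts defined above can be tallied directly from paired lists of actual and predicted labels. The following is a minimal Python sketch, not the CR19 implementation: the helper name `confusion_counts`, the encoding 1 = COVID-19 positive and 0 = negative, and the toy labels are all assumptions made for illustration.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive=1):
    """Tally TP, TN, FP and FN for a binary problem (hypothetical helper)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            counts["TP"] += 1  # positive case correctly flagged
        elif a != positive and p != positive:
            counts["TN"] += 1  # negative case correctly cleared
        elif a != positive and p == positive:
            counts["FP"] += 1  # negative case wrongly flagged
        else:
            counts["FN"] += 1  # positive case missed
    return {key: counts[key] for key in ("TP", "TN", "FP", "FN")}


# Toy labels (illustrative only): 1 = COVID-19 positive, 0 = negative.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 0]
counts = confusion_counts(actual, predicted)
print(counts)
```

In this toy example one positive case is missed (FN) and one negative case is wrongly flagged (FP).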

After calculating the parameters in the confusion matrix, the evaluation metrics can be calculated such as accuracy, precision, recall, and F1-Score as follows:

  • Accuracy: It is the most intuitive performance metric. As shown in Eq. (3), it is the sum of true positives and true negatives divided by the total of all confusion matrix components. A model with higher accuracy is generally preferred, but accuracy is a reliable indicator only when the dataset is balanced and the numbers of false positives and false negatives are almost equal. Therefore, additional metrics must be considered to determine the quality of our system.
    $$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{3}$$
  • Precision: As shown in Eq. (4), it is the ratio of true positive predictions to all positive predictions.
    $$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$
  • Recall: As shown in Eq. (5), recall is the ratio of true positives to the sum of true positives and false negatives.
    $$\text{Recall} = \frac{TP}{TP + FN} \tag{5}$$
  • F1-score: It is an overall measure of a model's accuracy that combines precision and recall into a single value, their harmonic mean. As shown in Eq. (6), the F1-score is twice the product of precision and recall divided by their sum.
    $$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{6}$$
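
Equations (3) to (6) can be sketched in a few lines of Python. The function name `classification_metrics` and the two sets of counts are hypothetical, chosen only to illustrate why accuracy alone can mislead on an imbalanced test set. Returning zero when a denominator is zero is a convention assumed here, not something the paper specifies.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 as in Eqs. (3)-(6).

    A metric whose denominator is zero is reported as 0.0 (assumed convention).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced test set: 10 positive and 90 negative cases.
reasonable = classification_metrics(tp=6, tn=88, fp=2, fn=4)
always_negative = classification_metrics(tp=0, tn=90, fp=0, fn=10)

for name, scores in (("reasonable", reasonable),
                     ("always negative", always_negative)):
    print(name, {key: round(value, 3) for key, value in scores.items()})
```

The second classifier never predicts a positive case, yet its accuracy is still 0.90, while its recall and F1-score for the positive class are zero. This is why the positive-class precision, recall, and F1-score are reported alongside accuracy.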

Results analysis

Before discussing the results of the hybrid GA-ML framework, it is worthwhile to check the performance of the individual classifiers in labeling cough data as COVID-19 positive or negative. To estimate the performance of the proposed framework, cough samples from the dataset, including normal and diseased cases, are randomly selected for training.
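
A random selection of training samples could look like the sketch below. The paper does not state the split ratio, the random seed, or the file naming, so the 80/20 split, `seed=42`, the helper name `random_split`, and the file names are all assumptions for illustration.

```python
import random


def random_split(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of `samples` and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: (file name, is_positive); every fifth one is positive.
samples = [(f"cough_{i:03d}.wav", i % 5 == 0) for i in range(50)]
train, test = random_split(samples)
print(len(train), "training samples,", len(test), "test samples")
```

With few positive cases, a purely random split can leave very few positives in the test set. A stratified split is a common alternative, although the paper does not say whether one was used.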

Figures 5 and 6 compare the six machine learning algorithms on precision, recall, and F1-score for the positive and negative cases, respectively. From the figures, the KNN classifier provides the best performance compared with the other techniques. However, the results of all classifiers emphasize the need for a modified framework to enhance the performance of COVID-19 detection and diagnosis.

Fig. 5.

Precision, recall, and F-measure comparison for positive cases of six different classifiers

Fig. 6.

Precision, recall, and F-measure comparison for negative cases of six different classifiers

For all six machine learning classifiers, the resulting confusion matrices are shown in Fig. 7. In addition, the confusion matrices of the hybrid GA-ML algorithms in the proposed framework are shown in Fig. 8.

Fig. 7.

Confusion matrix of all machine learning algorithms in the proposed framework

Fig. 8.

Confusion matrix of hybrid GA-ML algorithms in the proposed framework

Furthermore, the proposed hybrid GA-ML is applied to diagnose COVID-19 via cough audio signals. The performance metrics for the proposed framework are given in Figs. 9 and 10. Combining GA with the ML models improves the accuracy of all six classifiers. GA-KNN gives the best results among the classifiers, which indicates that it can provide a very good preliminary diagnosis of COVID-19 from the patient's cough. As Fig. 11 shows, the accuracy of GA-ML is higher than that of the non-GA-based technique for every classifier. Of the six classifiers, GA-KNN achieves more than 97% accuracy in the diagnosis of COVID-19 from cough audio signals, as also shown in Fig. 12. Table 3 presents the comparative study between the machine learning (ML) algorithms and the proposed hybrid genetic algorithm with machine learning (GA-ML) for classifying cough data as COVID-19 positive or negative. Finally, Table 4 compares the proposed CR19 framework with previous related methods. The proposed framework shows an advantage over the conventional methods listed there. Therefore, the proposed framework uses a hybrid of genetic and ML models to improve the performance of existing machine learning models in COVID-19 detection from cough.
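
One common way to couple a genetic algorithm with a classifier is to let the GA search for a feature subset that maximizes the classifier's validation accuracy. The sketch below illustrates that idea with a small pure-Python k-NN; it is an assumption about how such a hybrid can be built, not a reproduction of the CR19 configuration. The binary-mask encoding, the operators (elitism, truncation selection, single-point crossover, bit-flip mutation), every parameter value, and the synthetic data are hypothetical.

```python
import random


def knn_accuracy(train, test, mask, k=3):
    """Accuracy of a k-NN classifier restricted to features where mask[i] == 1."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for x, y in test:
        # Squared Euclidean distance on the selected features only.
        dists = sorted(
            (sum((x[i] - t[i]) ** 2 for i in idx), label) for t, label in train
        )
        votes = sum(label for _, label in dists[:k])
        correct += (1 if votes * 2 > k else 0) == y
    return correct / len(test)


def ga_feature_selection(train, valid, n_features, pop_size=12,
                         generations=15, mutation_rate=0.1, seed=0):
    """Search for a feature mask that maximizes k-NN validation accuracy."""
    rng = random.Random(seed)

    def fitness(mask):
        return knn_accuracy(train, valid, mask)

    # Seed the population with the all-features mask plus random masks.
    population = [[1] * n_features] + [
        [rng.randint(0, 1) for _ in range(n_features)]
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[:2]                 # elitism: keep the two best masks
        parents = ranked[: pop_size // 2]  # truncation selection
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)


def make_toy_data(n, rng):
    """Synthetic samples: 2 informative features followed by 4 noise features."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        informative = [rng.gauss(2.0 * y, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 3.0) for _ in range(4)]
        data.append((informative + noise, y))
    return data


rng = random.Random(1)
train, valid = make_toy_data(60, rng), make_toy_data(40, rng)
baseline = knn_accuracy(train, valid, [1] * 6)
best_mask, best_score = ga_feature_selection(train, valid, n_features=6)
print("all features:", baseline, "GA-selected:", best_score, "mask:", best_mask)
```

Because the all-features mask is in the initial population and the best masks are carried over unchanged, the GA-selected accuracy on the validation set can never fall below the baseline. That score is optimistic, since the same validation data guides the search, so a separate held-out test set would be needed for a fair final estimate.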

Fig. 9.

Precision, recall, and F-measure comparison for positive cases of hybrid GA-ML for different classifiers

Fig. 10.

Precision, recall, and F-measure comparison for negative cases of hybrid GA-ML for different classifiers

Fig. 11.

Accuracy of each classifier for the ML and GA-ML frameworks

Fig. 12.

Percentage increase in accuracy of the GA-ML framework compared with the classical ML framework
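
The accuracy gains can be recomputed from the accuracies reported in the abstract. Whether Fig. 12 plots the gain in percentage points or as a relative increase is not stated in the text, so this sketch prints both. The helper name `accuracy_gains` is hypothetical.

```python
# Accuracies (%) as reported in the abstract for each classifier.
ml_accuracy = {"LR": 90.78, "LDA": 92.90, "KNN": 95.74,
               "CART": 87.94, "NB": 81.56, "SVM": 92.198}
ga_ml_accuracy = {"LR": 92.19, "LDA": 94.32, "KNN": 97.87,
                  "CART": 92.19, "NB": 91.48, "SVM": 93.61}


def accuracy_gains(before, after):
    """Return {name: (gain in percentage points, relative gain in %)}."""
    gains = {}
    for name, base in before.items():
        absolute = after[name] - base
        gains[name] = (round(absolute, 2), round(100 * absolute / base, 2))
    return gains


gains = accuracy_gains(ml_accuracy, ga_ml_accuracy)
for name, (points, relative) in gains.items():
    print(f"{name:5s} +{points:.2f} points ({relative:.2f}% relative)")
```

On these figures every classifier gains accuracy; NB gains the most in absolute terms and CART the second most.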

Table 3.

Comparison between ML and the proposed hybrid GA-ML for classifying cough data as COVID-19 positive or negative

Metric | Patient status | LR/GA-LR | LDA/GA-LDA | KNN/GA-KNN | CART/GA-CART | NB/GA-NB | SVM/GA-SVM
Precision | Negative | 0.91/0.92 | 0.94/0.95 | 0.97/0.98 | 0.92/0.94 | 0.94/0.94 | 0.92/0.93
Precision | Positive | 0.80/1.00 | 0.80/0.90 | 0.86/1.00 | 0.45/0.73 | 0.32/0.67 | 1.00/1.00
Recall | Negative | 0.99/1.00 | 0.98/0.99 | 0.98/1.00 | 0.95/0.98 | 0.85/0.97 | 1.00/1.00
Recall | Positive | 0.25/0.31 | 0.50/0.56 | 0.75/0.81 | 0.31/0.50 | 0.56/0.50 | 0.31/0.44
F1-score | Negative | 0.95/0.96 | 0.96/0.97 | 0.98/0.99 | 0.93/0.96 | 0.89/0.95 | 0.96/0.97
F1-score | Positive | 0.38/0.48 | 0.62/0.69 | 0.80/0.90 | 0.37/0.59 | 0.41/0.57 | 0.48/0.61
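
As a quick consistency check, the positive-class F1-scores for KNN and GA-KNN can be recomputed from the precision and recall values in Table 3. The helper name `f1_from` is hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall, as in Eq. (6)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for the positive class, copied from Table 3.
knn_rows = {
    "KNN": (0.86, 0.75, 0.80),
    "GA-KNN": (1.00, 0.81, 0.90),
}
recomputed = {name: round(f1_from(p, r), 2)
              for name, (p, r, _) in knn_rows.items()}
for name, (_, _, reported) in knn_rows.items():
    print(name, "recomputed:", recomputed[name], "reported:", reported)
```

Both recomputed values match the table. For other rows, differences of about 0.01 can appear because the tabulated precision and recall are already rounded to two decimals.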

Table 4.

Comparative analysis between the proposed hybrid GA-ML and existing work on cough-based COVID-19 detection

Work | Dataset | Classifier | Results
Imran et al. (2020) | ESC-50 | Multi-class classifier (CNN, SVM, and binary classifier) | Accuracy: 92.64
Verde et al. (2021) | Dataset collected by Cambridge University | ResNet | AUC: 84.6
Pahar et al. (2021) | Coswara | CNN, LSTM, ResNet50, and LSTM + SFS | Accuracy: 73.02, 73.78, 74.58, and 92.91
Grant et al. (2021) | Crowdsourced | Random forest + DNN | AUC: 79.38 for detecting COVID-19 via speech sound analysis, and 75.75 via breathing sound analysis
Proposed CR19 framework | Coswara | Hybrid GA-ML (GA-LR, GA-LDA, GA-KNN, GA-CART, GA-NB, GA-SVM) | Accuracy: 92.19, 94.32, 97.87, 92.19, 91.48, and 93.61

Conclusion

In recent days, the infectious COVID-19 disease has shocked the world, and it is still threatening the lives of billions of people. This paper proposes a new framework to automatically identify or confirm COVID-19 in cough audio signals based on six machine learning algorithms, namely logistic regression, linear discriminant analysis, K-nearest neighbors, decision tree, Naïve Bayes, and support vector machine. The accuracy of this framework is improved by combining the genetic algorithm with the ML techniques. The paper also proposes an IoT-based cloud framework to collect, store, and analyze data from a patient's mobile phone, with scalability to millions of users at a country level. This framework can support doctors in providing a proper medical decision regarding the COVID-19 analysis. The results showed that the K-nearest neighbors algorithm provides the best results on the different evaluation metrics compared with the other algorithms in the detection and diagnosis process. The proposed CR19 framework can help diagnose COVID-19 infection from cough signals and can also be implemented on a smartphone, after performance assessment by medical authorities, for medical internet of things (MIoT) applications.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Ezz El-Din Hemdan, Email: ezzeldinhemdan@el-eng.menofia.edu.eg, Email: ezzvip@yahoo.com.

Walid El-Shafai, Email: walid.elshafai@el-eng.menofia.edu.eg, Email: eng.waled.elshafai@gmail.com.

Amged Sayed, Email: amgad.mahmoud@el-eng.menofia.edu.eg, Email: amged1983@gmail.com.

References

  1. Alqudaihi K, Aslam N, Khan I. Cough sound detection and diagnosis using artificial intelligence techniques: challenges and opportunities. IEEE Access. 2021;9:102327–102344. doi: 10.1109/ACCESS.2021.3097559.
  2. Andreu-Perez J, Perez-Espinosa H, Timonet E. A generic deep learning based cough analysis system from clinically validated samples for point-of-need Covid-19 test and severity levels. IEEE Trans Serv Comput. 2021;2(9):1–13. doi: 10.1109/TSC.2021.3061402.
  3. Apostolopoulos I, Mpesiana T. Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks. Phys Eng Sci Med. 2020;43(2):635–640. doi: 10.1007/s13246-020-00865-4.
  4. Apostolopoulos I, Mpesiana T. Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks. Phys Eng Sci Med. 2020;6(1):1–19. doi: 10.1007/s13246-020-00865-4.
  5. Bagad P, Dalmia A, Doshi J, Nagrani A, Bhamare P, Mahale A, Panicker R (2020) Cough against COVID: evidence of COVID-19 signature in cough sounds. arXiv preprint https://arxiv.org/abs/2009.08790
  6. Belkacem A, Ouhbi S, Lakas A. End-to-end AI-based point-of-care diagnosis system for classifying respiratory illnesses and early detection of COVID-19: a theoretical framework. Front Med. 2021;8(2):1–13. doi: 10.3389/fmed.2021.585578.
  7. Brown C, Chauhan J, Grammenos A, Han J, Hasthanasombat A, Spathis D, Mascolo C (2020) Exploring automatic diagnosis of COVID-19 from crowdsourced respiratory sound data. arXiv preprint https://arxiv.org/abs/2006.05919
  8. Burke E, Burke G, Kendall G, Kendall S. Search methodologies: introductory tutorials in optimization and decision support techniques. Nat Med. 2014;2(3):1–4.
  9. Coswara-data (2020) https://github.com/walzter/COVID_Cough. Accessed 27 Oct 2020
  10. Deshpande G, Batliner A, Schuller W. AI-based human audio processing for COVID-19: a comprehensive overview. Pattern Recognit. 2022;12(2):108–119. doi: 10.1016/j.patcog.2021.108289.
  11. Faezipour M, Abuzneid A. Smartphone-based self-testing of COVID-19 using breathing sounds. Telemed EHealth. 2020;1(5):1–17. doi: 10.1089/tmj.2020.0114.
  12. Goel T, Murugan R, Mirjalili S, Chakrabartty D. OptCoNet: an optimized convolutional neural network for an automatic diagnosis of COVID-19. Appl Intell. 2020;10(3):1–16. doi: 10.1007/s10489-020-01904-z.
  13. Goldberg D, Holland J. Genetic algorithms and machine learning. Comput Learn Theory. 1988;5(8):1–17.
  14. Grant D, McLane I, West J. Rapid and scalable COVID-19 screening using speech, breath, and cough recordings. Biomed Eng. 2021;2(8):1–19.
  15. Greenhalgh T, Koh G, Car J. Covid-19: a remote assessment in primary care. IEEE Access. 2020;5:2015–2045. doi: 10.1136/bmj.m1182.
  16. Imran A, Posokhova I, Qureshi H, Masood U, Riaz S, Ali K, John C, Hussain I, Nabeel M. AI4COVID-19: AI enabled preliminary diagnosis for COVID-19 from cough samples via an app. Inform Med Unlocked. 2020;3(7):100–112. doi: 10.1016/j.imu.2020.100378.
  17. Laguarta J, Hueto F, Subirana B. COVID-19 artificial intelligence diagnosis using only cough recordings. IEEE Open J Eng Med Biol. 2020;1(5):275–281. doi: 10.1109/OJEMB.2020.3026928.
  18. Lalmuanawma S, Hussain J, Chhakchhuak L. Applications of machine learning and artificial intelligence for Covid-19 (SARS-CoV-2) pandemic: a review. Chaos Solitons Fractals. 2020;139(5):205–221. doi: 10.1016/j.chaos.2020.110059.
  19. Li L, Qin L, Xu Z, Yin Y, Wang X. Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT. Radiology. 2020;10(2):1–14. doi: 10.1148/radiol.2020200905.
  20. Maghdid H, Ghafoor K, Sadiq A, Curran K, Rabie K (2020) A novel ai-enabled framework to diagnose coronavirus covid 19 using smartphone embedded sensors: design study. arXiv preprint https://arxiv.org/abs/2003.07434
  21. Malhotra A, Mittal S, Majumdar P, Chhabra S, Thakral K, Vatsa M, Agrawal A. Multi-task driven explainable diagnosis of COVID-19 using chest x-ray images. Pattern Recognit. 2021;10(8):243–251. doi: 10.1016/j.patcog.2021.108243.
  22. Melek N. Identifying COVID-19 by using spectral analysis of cough recordings: a distinctive classification study. Cogn Neurodynamics. 2021;1(8):1–14. doi: 10.1007/s11571-021-09695-w.
  23. Menni C, Valdes A, Freidin M, Sudre C, Nguyen L, Drew D, Visconti A. Real-time tracking of self-reported symptoms to predict potential COVID-19. Nat Med. 2020;2(3):1–4. doi: 10.1038/s41591-020-0916-2.
  24. Pahar M, Klopper M, Warren R, Niesler T. COVID-19 cough classification using machine learning and global smartphone recordings. Comput Biol Med. 2021;13(5):1–17. doi: 10.1016/j.compbiomed.2021.104572.
  25. Pal A, Sankarasubbu M. Pay attention to the cough: early diagnosis of COVID-19 using interpretable symptoms embeddings with cough sound signal processing. Appl Comput. 2021;2(9):620–628.
  26. Randhawa K, Loo C, Seera M, Lim C, Nandi A. Credit card fraud detection using AdaBoost and majority voting. IEEE Access. 2018;6:14277–14284. doi: 10.1109/ACCESS.2018.2806420.
  27. Rasheed J, Jamil A, Hameed A, Aftab U, Aftab J, Shah S, Draheim D. A survey on artificial intelligence approaches in supporting frontline workers and decision makers for COVID-19 pandemic. Chaos Solitons Fractals. 2020;10(2):1–13. doi: 10.1016/j.chaos.2020.110337.
  28. Schuller B, Schuller D, Qian K, Liu J, Zheng H, Li X (2020) Covid-19 and computer audition: an overview on what speech & sound analysis could contribute in the SARS-CoV-2 corona crisis. arXiv preprint https://arxiv.org/abs/2003.11117
  29. Sharma N, Krishnan P, Kumar R, Ramoji S, Chetupalli S, Ghosh P, Ganapathy S (2020) Coswara-a database of breathing, cough, and voice sounds for COVID-19 diagnosis. arXiv preprint https://arxiv.org/abs/2005.10548
  30. Sivanandam S, Deepa S. Genetic algorithms. IEEE Access. 2008;5:2015–2045.
  31. Sokolova M, Lapalme G. A systematic analysis of performance measures for classification tasks. Inf Process Manag. 2009;45(4):427–437. doi: 10.1016/j.ipm.2009.03.002.
  32. Trivedy S, Goyal M, Mohapatra P, Mukherjee A. Design and development of smartphone-enabled spirometer with a disease classification system using convolutional neural network. IEEE Trans Instrum Meas. 2020;1(2):1–18.
  33. Verde L, Pietro G, Ghoneim A. Exploring the use of artificial intelligence techniques to detect the presence of coronavirus Covid-19 through speech and voice analysis. IEEE Access. 2021;9:65750–65757. doi: 10.1109/ACCESS.2021.3075571.
  34. World Health Organization (WHO) (2020) COVID-19 weekly epidemiological update. Accessed 27 Oct 2020
  35. Zebin T, Rezvy S. COVID-19 detection and disease progression visualization: deep learning on chest X-rays for classification and coarse localization. Appl Intell. 2020;10(3):1–12. doi: 10.1007/s10489-020-01867-1.

Articles from Journal of Ambient Intelligence and Humanized Computing are provided here courtesy of Nature Publishing Group
