Abstract
The use of lung sounds to diagnose lung diseases from respiratory sound features has increased significantly in recent years. Digital stethoscope data has been examined extensively by medical researchers and technical scientists to diagnose the symptoms of respiratory diseases, and artificial intelligence-based approaches are now applied in real-world settings to distinguish respiratory disease signs in human pulmonary auscultation sounds. In this work, a Deep CNN model is implemented with combined multi-feature channels (Modified MFCC, Log-Mel, and Soft-Mel) to obtain sound parameters from lung-based digital stethoscope data, and the model is analysed both with and without max-pooling operations using these multi-feature channels. In addition, recently acquired COVID-19 sound data and enriched data are included to improve model performance, with L2 regularization applied to reduce the risk of overfitting caused by the limited amount of respiratory sound data. The proposed DCNN with max-pooling on the enriched dataset demonstrates state-of-the-art performance using multi-feature channel spectrograms. The model was developed with different convolutional filter sizes (1 × 12, 1 × 24, 1 × 36, 1 × 48, and 1 × 60) to test the proposed neural network. According to the experimental findings, the proposed DCNN architecture with a max-pooling function identifies respiratory disease symptoms better than the DCNN without max-pooling. To demonstrate the model's effectiveness in categorization, it is trained and tested on several modalities of respiratory sound data.
Keywords: Artificial intelligence, Human respiratory system, Deep learning, Deep convolutional neural networks, COVID-19, Neural architecture search
Introduction
Public health care researchers have dedicated considerable attention to examining lung acoustic signal features for the identification of various lung-based diseases, including Asthma, Bronchiolitis, COPD, Heart Disease, Pneumonia, URTI (Upper Respiratory Tract Infection), COVID-19 symptoms, a combination of Heart Failure and COPD, and LRTI (Lower Respiratory Tract Infection), through analysis of human respiratory sounds [1–3]. Well-trained medical practitioners are needed to identify respiratory disease symptoms. To address this problem, many Artificial Intelligence (AI) approaches have been deployed in real-world settings. The goal of this study is to classify different respiratory disorders from digital stethoscope data using a human respiratory sound identification algorithm [4–8].
Clinical experts and technical researchers have begun investigating lung sound auscultations captured from the human body with digital stethoscopes [9, 10]. This approach automatically analyses the data in respiratory sound recordings (for example, asthma wheezing). Medical researchers are also experimenting with speech acoustic signals to aid the automatic identification of a broad spectrum of diseases. Parkinson's disease can present a wide range of voice characteristics (different patients may speak slowly and fail to convey adequate emotional states within a single tonal quality, breath sounds, etc.). Alzheimer's disease is recognized by incoherent speech, an impediment in which words are repeated and incomplete sentences are formed. Coronary artery disease may present with voice dysfunction, neck discomfort, and fatigue, and many other undiagnosed conditions manifest as abnormal voice frequency and rhythm, voice-related brain injury, and exhaustion [11–13]. When integrated into primary commodities, lung acoustic signals emerge as a pivotal parameter for the identification of various diseases, offering a vital possibility for quick diagnosis and cost-effective treatment for the public.
The employment of diverse pre-trained models, encompassing Machine Learning (ML), Deep Learning (DL), and signal processing methodologies, has reduced the reliance on manual interpretation of individual acoustic breathing sound signals in digital stethoscope data analysis [14, 15]. Digital stethoscopes can reliably and effectively gather respiratory sounds from humans, as seen in Fig. 1. To confirm the symptoms of respiratory diseases, a recent study investigated how the sounds produced by the human respiratory system differ between healthy subjects and a diverse set of patients infected with SARS-CoV-2 (breathing auscultations were gathered from infected patients in medical centres using a smart stethoscope to identify COVID-19 symptoms), trained together with other medical breath data across a series of simulations [16, 17]. In this research, human respiratory auscultations from digital stethoscope data are utilized.
Fig. 1.

Digital stethoscope to collect the sample respiratory sound data
The research focuses on abnormalities in lung acoustic sound signals to classify different lung illnesses (Asthma, Bronchiolitis, COPD, Heart Disease, Pneumonia, URTI—Upper Respiratory Tract Infection, COVID-19 symptoms, a combination of Heart Failure and COPD, and LRTI—Lower Respiratory Tract Infection). As a result, a suitable model is needed to recognize breathing sounds. This study presents three feature extraction techniques employed by a Deep Convolutional Neural Network (DCNN) to derive deep acoustic features from spectrogram data: the Log-Mel frequency spectrum, Modified Mel-Frequency Cepstral Coefficients (MMFCC), and the Soft-Mel frequency channel. The study additionally leverages the ICBHI sounds dataset, augmented data sourced from local hospitals, and individually collected data to enhance the performance of the model. L2 regularization is applied to mitigate the risk of overfitting arising from the limited amount of respiratory sound data. A few samples were taken from local government hospitals, and the remaining datasets were gathered online. Through statistical analysis and pre-processing of the collected data, we prepared a sound dataset consisting of 10 classes (Asthma, Bronchiolitis, COPD, Heart Disease, Pneumonia, URTI, COVID-19 symptoms, Heart Failure + COPD, LRTI, and Healthy sounds).
The primary focus of this research is the development of two DCNN models (one incorporating max-pooling and the other without) employing three distinct feature spectrums: Soft-Mel frequency channels, Log-Mel frequency spectra, and Modified Mel-frequency Cepstral Coefficients (MMFCC). In addition, recently acquired COVID-19 sound data and enriched data are included to improve model performance, with L2 regularization applied to reduce the risk of overfitting caused by the limited amount of respiratory sound data. We also observed the model's performance with the different filter sizes (1 × 12, 1 × 24, 1 × 36, 1 × 48, and 1 × 60) implemented to test the proposed neural network. To demonstrate the effectiveness of the model in categorization, it is trained and tested with the DCNN model on several modalities of respiratory sound data.
Background
The field of medical diagnostics has witnessed a notable surge in the utilization of lung sounds for the diagnosis of respiratory diseases in recent years. A pivotal tool in this endeavor has been the Digital Stethoscope, which provides a wealth of respiratory sound data that researchers and technical scientists have leveraged for diagnostic purposes. The integration of artificial intelligence (AI) has further propelled these efforts, enabling the identification of respiratory disease symptoms from pulmonary auscultation sounds in real-world scenarios.
In this context, a Deep Convolutional Neural Network (DCNN) model has been implemented, utilizing a combination of multi-feature channels extracted from lung-based Digital Stethoscope data. These features include Modified Mel-Frequency Cepstral Coefficients (MMFCC), Log Mel, and Soft Mel. The utilization of a deep neural network allows for the extraction of intricate sound parameters critical for the diagnosis of respiratory conditions. The analysis of the model involves the incorporation of max-pooling operations and a comparison with scenarios where max-pooling is omitted. This study explores the impact of different convolutional filter sizes (1 × 12, 1 × 24, 1 × 36, 1 × 48, and 1 × 60) on the performance of the proposed neural network architecture. Moreover, the research extends its scope by incorporating COVID-19 sound data and enriched data, a recent acquisition aimed at enhancing model performance. To mitigate the risk of overfitting due to limited respiratory sound data, a combination of L2 regularization techniques is employed.
The experimental findings reveal that the suggested DCNN architecture, particularly when augmented with max-pooling functions, outperforms its counterpart without max-pooling in identifying respiratory disease symptoms. The inclusion of various convolutional filter sizes demonstrates a nuanced exploration of the model’s sensitivity to different sound features. To validate the model’s efficacy in categorization, it undergoes training and testing using diverse modalities of respiratory sound data. This comprehensive approach aims to showcase the versatility and robustness of the proposed DCNN model in identifying and categorizing respiratory disease symptoms, making strides in the intersection of artificial intelligence and medical diagnostics.
The remainder of this study is organized as follows. The analysis of background work and the performance of existing models on respiratory sound datasets are presented in the "Literature Analysis" section. The "Methodology and Methods" section elaborates on the dataset, feature sets, functionality, data augmentation procedure, experimental setup, construction of the DCNN architecture, and analysis of the DCNN model with and without max-pool functions. The "Result Analysis and Discussion" section analyses the results on benchmark data obtained from a digital stethoscope, evaluating the suggested model in terms of accuracy and F1-score by contrasting the experimental findings with current methods. Finally, the key findings of the study are summarized in the concluding section.
Literature analysis
Scientists and specialists have long acknowledged that respiratory acoustic sound signals are a possible medium for human respiratory health screening. A particular investigation employed a digital stethoscope in conjunction with distinct pulmonary audio signals to discern acoustic sound signals originating from the human respiratory system. Because interpreting these signals requires well-trained specialists, approaches such as MRI and ultrasound, which are simpler to observe and analyse, have been quickly replacing auscultation. Conversely, recent advancements in acoustic sound signal analysis and design may bypass these methodologies, presenting cost-effective and widely adoptable alternatives based on lung acoustic signals. The processing of acoustic signals typically involves conventional devices, with recent applications extending to web-based platforms and mobile applications utilizing signals captured through digital stethoscopes.
Lung sound classification for enhanced respiratory pathology detection
At present, lung disease is one of the main causes of death. According to the World Health Organization (WHO), major lung disorders such as COPD (chronic obstructive pulmonary disease), tuberculosis, pneumonia, COVID-19, asthma, and lung cancer kill over three million individuals each year worldwide. COVID-19 is a novel viral pneumonia that first appeared in Wuhan (China) in 2019 and has caused around 160 million infections and around 4 million fatalities worldwide. Lung sounds provide important details on pulmonary pathology, and numerous lung sound classification techniques have been implemented by various researchers. One proposed categorization approach distinguishes between normal respiratory signs and adventitious respiratory signals. First, an Active Noise Control (ANC) technique is developed with a dual-channel electronic stethoscope to enhance lung sounds contaminated by noise in real-world settings. Second, a Hidden Markov Model (HMM) uses a maximum-likelihood approach to determine the acoustic likelihood of each breathing phase and classify lung sounds as normal or adventitious. Third, rather than the conventional Gaussian mixture model, a deep neural network estimates the posterior probability of the HMM states for each observation. According to the findings, the suggested classification method can significantly enhance classification efficiency [18, 19].
Similarly, researchers are investigating the short-term spectral features of lung sounds to characterize them and identify associated disorders. The ability to distinguish between normal, wheeze, and crackle lung sounds has been assessed using five different cepstral features. Several Machine Learning (ML) methods, including SVM and CNN, have been employed to identify respiratory sounds captured by a digital stethoscope and audio recording tools [1]. Although recent studies employ deep learning algorithms for classification, traditional machine learning techniques are still commonly used for classifying lung sounds. The convolutional network architecture is a conceptually straightforward yet highly flexible paradigm that can be adopted for a variety of perceptual tasks. To improve classification accuracy, a max-pooling layer, an average-pooling layer, and the CNN design have been connected in tandem [4]. The Random Subspace Ensembles (RSE) approach has been employed to supply a Linear Discriminant Analysis (LDA) classifier with deep sound features extracted from the data. Researchers have examined various Deep Learning (DL) models such as RNN, CNN, RCNN, and LSTM for the identification of respiratory disease abnormalities from sound data, extracting deep sound features with techniques such as Mel-frequency filters, Gammatone frequency filters, signal processing techniques, and Mel-frequency cepstral coefficients [20–23].
Comparative analysis of machine learning approaches for lung sound classification
One line of work evaluated three ML methods for classifying lung sounds. While the third approach relies on a CNN, the other two depend on obtaining an ensemble of customized characteristics trained by different models (SVM, k-nearest neighbour, stochastic models, and Gaussian mixture methods). The first method generated various MFCC statistics by retrieving 12 Mel coefficients from the acoustic recordings. By now, the CNN model is well developed and capable of handling challenging classification tasks [24–26]; its effectiveness is influenced by the number of iterations, the batch size, and the learning metrics. CNNs can take the role of conventional classifiers by training the parameters of fully connected layers. Deep residual networks (ResNets) and the optimized S-transform (OST) have also been employed to identify different sounds. A ResNet with OST deep learning technique has been suggested for identifying wheezes, crackles, and normal sounds; the STFT, ST, and OST rescaled spectrum maps are extracted as inputs for the proposed ResNet-based technique. The OST spectrogram of the treated raw respiratory sound is rescaled for the ResNet, and the classes of respiratory sounds are identified once the ResNet completes pattern learning and identification [27, 28].
The identification of pulmonary sounds can also be based on wavelet coefficients, in which the characteristic vector is made up of the relative wavelet energy across a total of seven wavelet levels. The wavelet entropy and Gaussian-type filter functions determine the wavelet-based similarities between the wavelet sub-signals and the genuine signal across the seven layers. Several investigations into pulmonary problems have been carried out for early diagnosis; however, these abnormalities are very complex and difficult to detect [29]. COPD (Chronic Obstructive Pulmonary Disease) is one of the common respiratory diseases, along with pneumonia. A unique framework describes the use of signal processing and ML to classify pneumonia and COPD. In order to maintain the integrity of the domain and prevent information loss in lung sound (LS) analysis, the EMD approach has been formulated to obtain the region-of-interest parameters. The LS signals of pneumonia, COPD, and normal subjects are reconstructed using the fundamental mode functions containing the low frequencies. Following denoising, features from the spatial, temporal, and spectral domains are combined to test how well the suggested approach performs compared to other Artificial Intelligence techniques in identifying COPD, pneumonia, and normal patients [30–32].
Advancing lung sound classification with deep learning models: a comprehensive review
Numerous studies using Artificial Intelligence techniques have been published on the automatic identification and classification of lung sounds. Nevertheless, many have concentrated on estimating lung abnormalities, primarily detecting specific lung acoustic signals such as wheezing and crackling, rather than precisely predicting overall lung problems from pulmonary auscultation. Due to the signal's intrinsic complexity, the few efforts focused on pathology classification are relatively recent and frequently entail extensive processing or specific CNN and RNN frameworks. One recommendation employed a deep Convolutional Neural Network-Recurrent Neural Network (CNN-RNN) model for the categorization of pulmonary sounds using Mel-spectrograms. VGG-Net and MobileNet structures were applied using the same methods to compare the model with more widely employed CNN architectures. A layer-wise exponential quantization scheme was then introduced that can decrease the memory footprint of the networks without significantly affecting performance. A lightweight CNN model has even been developed to identify respiratory disorders using lung sound scalogram images; this model performed remarkably well, with a ternary chronic classification accuracy of 99.21% [12, 33].
Additionally, a hybrid method combining CWT and EMD to produce scalogram images has been provided. Autoencoders and CNNs are currently two of the most widely implemented approaches across a variety of applications. A variational autoencoder has been employed for the detection and localization of video anomalies, and Variational Autoencoders (VAs) were commonly employed in 2019 for the evaluation of diverse signal types and their tracking through representative samples. To enlarge the under-represented classes, a new Convolutional Autoencoder (CAE) was employed. All the pulmonary audio files were converted into Mel spectrograms and classified with a CNN. Illnesses in breath sounds can be identified readily using the Mel spectrogram and CNN method, even when the training dataset is imbalanced, and Convolutional Autoencoders (CAEs) are developed to enhance the classes with a smaller number of samples [9].
With the advent of Pattern Recognition (PR), various features and ML techniques for developing automated wheeze identification processes have been proposed. The effect of time on the identification of wheezing has been investigated, specifically how model outcomes were impacted by the introduction of a non-wheezing class. Numerous classification methods were evaluated, with the best one achieving 98% sensitivity and 95% specificity, whereas existing approaches reached only 55.50% sensitivity and 75.5% specificity [34]. Sound analysis has traditionally relied on time-frequency representations such as Fourier-based Transforms (FT) or Wavelet-based Transforms (WT). A multi-time-scale DL framework has been developed that accurately differentiates normal participants from patients whose breathing cycles contain substantial adventitious sounds. This proposal points toward the future of telehealth, because it could serve as the core of a remote health monitoring device that enables clinicians to observe patients' respiratory illnesses and identify potentially problematic situations requiring immediate intervention in real time. Many methods for classifying crackle and non-crackle respiratory noises have been investigated through signal decomposition methods such as DWT, EMD, and EEMD, as well as feature extraction methods such as PCA and autoencoders.
The purpose of another contribution is to decide whether deep learning, as demonstrated by deep CNN and LSTM units, is capable of differentiating between different pulmonary disorders from pulmonary sound waves. Bidirectional LSTM units and a CNN made up the first two phases of the DL network architecture. The resulting CNN + BDLSTM algorithm classified subjects by lung disease category with a maximum of 99.00% accuracy and 98.00% precision [35]. CRNN models that have undergone numerous advances are also utilized in addition to CNNs and aid in the classification of respiratory sounds. One suggested procedure consists of two steps: the respiratory sound data is first transformed by a short-time Fourier algorithm to produce a spectrogram [36], and the resulting spectrogram is then classified into normal and abnormal pulmonary sounds across three classes: crackle, wheeze, and both.
GPUs are needed to support the extensive convolutional operations of standard CNN models. Depth-wise separable (DS) convolution is a method for minimizing the computing operations of standard convolution. With DS convolution layers (DS-CNN), edge devices with limited computing power and no GPUs can run CNN inference more quickly. In order to identify four different types of lung acoustic signals (normal, broken, un-broken, and unknown signals), a feature engineering technique has been proposed to extract specific features for DS-CNN. The STFT feature, Mel coefficient features, and the combination of these two were the three features ultimately retained for the reduced DS-CNN models. The accuracy of DS-CNN networks trained on the STFT and Mel features was 82.27% and 73.02% respectively, increasing to 85.74% when the two features were combined. To accomplish precise AI-aided respiratory disease detection, DS-CNN with combined STFT and Mel-frequency features may be a good fit for thin-edge computing devices [37, 38].
Several authors have reported outstanding results on ARS categorization over the years. Nonetheless, this field's reliance on small or private data collections has been a major challenge. Through the examination of various classifiers in those tests, one paper shows how random event creation can have a considerable impact on the automatic classification of ARS. The challenging endeavour of automatically classifying Acute Respiratory Symptoms (ARS) remains unaccomplished; despite the considerable efforts invested in ARS classification, none of the existing work has attained widespread acceptance. CNNs have become state-of-the-art solutions for a number of tasks, but they were insufficient to address this issue. The CNN-MoE architecture was created by enhancing the C-DNN architecture, inserting the mixture-of-experts (MoE) technique into the DNN portion of the network [39].
Enhancing lung sound classification with augmentation
Data augmentation sets were added to create a mix-up, and its impact on the identification of respiratory sounds was investigated. One investigation employed artificial intelligence methodologies for the identification of critical lung disorders through the analysis of lung sounds: an automated lung sound diagnostic network that classifies lung sounds automatically and robustly was implemented with a variety of machine learning methods, including SVM, Naïve Bayes, KNN, and ANN, along with lung signal pre-processing. Compared to earlier efforts, the suggested technique performs well in terms of precision; according to the testing results, the ANN classifier outperformed the most recent method with a maximum accuracy of about 95%. A great deal of research has also been done on feature extraction and selection techniques for automated lung sound analysis and categorization [31]. Among the features most frequently chosen for lung sound analysis are Mel-spectrums, log-spectrums, and log-Mel spectrums. To discriminate between normal and unexpected breath sounds, a hybrid Deep Learning (DL) architecture has been employed, integrating Convolutional and Long Short-Term Memory (LSTM) models for classification. The rationale is that the convolutional component extracts deep parameters from the actual dataset and reduces their dimensionality, while the LSTM component recognizes and retains long-term relationships in each sound signal sequence [35]. The proposed technique also incorporates the Focal Loss (FL) method to address data imbalance and diminish estimation errors.
Misdiagnosis is one of the main reasons why chronic lung disorders such as asthma and COPD are viewed as a substantial global health burden, so work that helps avoid misdiagnosis is crucial. A computerized system based on lung sounds (LS) has been employed for the categorization of asthma and Chronic Obstructive Pulmonary Disease (COPD) cases. LS denoising is achieved through the application of spectral subtraction, Hurst analysis, and empirical mode decomposition. The study distinguishes between normal, asthmatic, and COPD cases based on LS with a classification accuracy of 99.30% using a decision tree (DT) classifier with different algorithms. Determining the breathing cycles, extracting the characteristics, and classifying the features are the three steps in the lung sound detection process [13].
To extract distinguishing features at the smallest size, feature extraction techniques for the categorization of single-channel lung sounds acquired through automatic identification of breathing cycles have been carefully evaluated, leading to a fully automatic classification system for single-channel lung sound recordings. In the first step of the approach, the lung sounds go through automated detection of breathing cycles; at the conclusion of this step, breathing cycles, including their spectrograms, are obtained as repeating patterns. The boundaries of the recurring breathing cycles are identified and isolated from the acoustic signals using the Dynamic Time Warping (DTW) technique [33]. The triple combination of the mean of the MFCCs, the standard deviation of the LPCs, and the variance produced the best accuracy. With the incorporation of machine learning and deep learning techniques into medicine, the diagnostic process has significantly improved in reliability and efficiency. However, there is currently no appropriate, accurate model to distinguish COVID-19 symptoms from other respiratory disease symptoms in digital stethoscope data. In this study, we explore the application of a Deep Convolutional Neural Network (Deep CNN) model for the classification of irregularities in symptoms associated with diverse respiratory diseases, in scenarios both with and without max-pooling operations, using systematically collected respiratory sound data. The comprehensive background analysis involved an extensive review of pertinent literature through internet searches, scientific databases, and Google Scholar.
Methodology and methods
Data analysis
The pulmonary sound collection was initially developed as part of a research competition held at the 2017 International Conference on Biomedical and Health Informatics (ICBHI). Both the public and private datasets from the ICBHI challenge are included in the most recent edition of this database, which is publicly accessible for academic research. Along with this, we collected a respiratory disease database from online sources and the COVID-19 sound dataset from the University of Cambridge [40].
Respiratory acoustic signals independently collected in two different countries are included in the pulmonary sound data. The bulk of the recordings came from the Hospital Infante D. Pedro in Aveiro, Portugal, and from the Respiratory Research and Rehabilitation Laboratory (Lab3R) of the School of Health Sciences at the University of Aveiro (ESSUA). The second research team, from the Aristotle University of Thessaloniki (AUTH) and the University of Coimbra (UC), collected pulmonary sound samples at the Papanikolaou Medical Center in Thessaloniki and at the Medical Center of Imathia (Health Unit of Naousa), Greece. Along with this, we collected additional respiratory disease samples from local government hospitals (Asthma—12 samples, Bronchiolitis—8 samples, COPD—47 samples, Heart Disease—14 samples, Pneumonia—12 samples, URTI—6 samples, COVID-19 symptoms—19 samples, Heart Failure + COPD—6 samples, LRTI—8 samples, Healthy—35 samples). The dataset was prepared by combining these various respiratory disease databases [41–45].
The ICBHI dataset consists of 980 annotated audio samples from 146 patients, totalling 6.5 h of acoustic sound signals with 7049 breath cycles, of which 1904 contain crackles, 943 contain wheezes, and 583 contain both. Experts in respiratory biology annotated the cycles as having crackles, wheezes, a combination of the two, or no adventitious respiratory noises. The audio signals were created with various tools and ranged in length from 10 to 90 s. The chest regions where the recordings were made are also supplied; the data were collected from different chest locations such as the Anterior Left (AL), Posterior Left (PL), Lateral Left (LL), Trachea (Tc), Anterior Right (AR), Posterior Right (PR), and Lateral Right (LR). Certain respiration cycles have significant noise levels, simulating real-world settings. The annotations of this respiratory sound data comprise the beginning of each respiratory cycle, the absence/presence of crackles (absence = 0, presence = 1), the absence/presence of wheezes (absence = 0, presence = 1), and the end of each respiratory cycle.
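As an illustration of how these annotations can be consumed, the sketch below parses one ICBHI annotation file into per-cycle labels. The columns follow the annotation fields just described; the file name and the column names are hypothetical placeholders for a real recording ID.

```python
import pandas as pd

# Each ICBHI recording has a tab-separated annotation file with four columns:
# cycle start (s), cycle end (s), crackles (0/1), and wheezes (0/1).
# The file name below is a hypothetical placeholder.
cols = ["cycle_start_s", "cycle_end_s", "crackles", "wheezes"]
ann = pd.read_csv("ICBHI_recording_annotation.txt", sep="\t", names=cols)

def cycle_label(row):
    """Map the two binary flags to one of the four cycle labels."""
    if row.crackles and row.wheezes:
        return "both"
    if row.crackles:
        return "crackle"
    if row.wheezes:
        return "wheeze"
    return "normal"

ann["label"] = ann.apply(cycle_label, axis=1)
print(ann["label"].value_counts())
```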
Data augmentation
Three distinct augmentation sets are applied, and the results are added to the original dataset. Each deformation is applied directly to the breathing acoustic signal data, before transformation into the characteristic feature maps used to train the proposed neural network. Note that, to retain operational validity, we carefully select the scaling factors for every augmentation set [46]. The dataset has been augmented using the following approaches: (i) Pitch Shift—the pulmonary acoustic sound signal is raised or lowered by factors of 1, 2, −2, and −1 (semitone steps) without affecting the signal's duration; (ii) Time Stretch—the duration of the pulmonary acoustic pattern is altered using factors of 0.50, 1.00, 1.50, and 2.00 without changing the pitch of the disease signal itself; (iii) Additive Background Noise—each sample is combined with additional noise recordings; for each sample, four background audio noises (collected from online sources) were mixed in. The final input respiratory sound signal (actual signal plus additive noise) is calculated with Eq. (1).
$$\hat{x} = w \cdot x + (1 - w) \cdot n + b \quad (1)$$
where $x$ is the actual sound sample, $\hat{x}$ is the mixture of the original respiratory sound signal and the background noise $n$, and $b$ is the bias value. The randomly allocated weight $w$ takes the factors 0.10 and 0.50. The MUDA library, which documents the procedure for each augmentation in detail, was utilized to apply the augmentations. MUDA takes the pulmonary acoustic signal together with its corresponding JAMS (JSON Annotated Music Specification) file and produces the deformed audio along with updated JAMS files that record the applied deformation parameters. For this work, we created JAMS records based on the actual findings from the assessment data; these are accessible as an open-source library.
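A minimal sketch of the three augmentations is given below, using the librosa effects API rather than MUDA itself; the file names, the zero bias, and the length-matching of the noise are assumptions made for illustration.

```python
import numpy as np
import librosa

y, sr = librosa.load("breath_sample.wav", sr=22050)  # hypothetical file name

# (i) Pitch shift by -2, -1, +1, +2 semitone steps; duration is unchanged.
pitch_shifted = [librosa.effects.pitch_shift(y, sr=sr, n_steps=n)
                 for n in (-2, -1, 1, 2)]

# (ii) Time stretch by the stated factors; pitch is unchanged.
time_stretched = [librosa.effects.time_stretch(y, rate=r)
                  for r in (0.50, 1.00, 1.50, 2.00)]

# (iii) Additive background noise following Eq. (1):
#       x_hat = w*x + (1 - w)*n + b, with w drawn from {0.10, 0.50}.
noise, _ = librosa.load("background_noise.wav", sr=sr)  # hypothetical file
noise = np.resize(noise, y.shape)            # match lengths (assumption)
w = np.random.choice([0.10, 0.50])           # randomly allocated weight
b = 0.0                                      # bias term (assumed zero here)
x_hat = w * y + (1.0 - w) * noise + b
```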
Experiments
All studies were conducted in Python 3.8 [47], training two distinct deep CNN models from scratch using various API packages, one with and one without max-pooling; the operating system was Windows 10. The motivation of the present work is to create the two deep CNN models, balance the disease classes by applying audio augmentation, and assess the models' performance with both augmented and real-world data. The proposed network is trained using the Log-Mel, Soft-Mel, and Modified MFCC feature spectrums to obtain deep acoustic sound signal features for evaluating the actual findings of this work. Our approach, briefly described below, draws on a number of open-source libraries.
The open-source ANACONDA environment is configured to run Python [48]. Machine learning and data science packages such as NumPy and pandas come pre-loaded, along with well-known visualization packages (matplotlib). It is compatible with both Windows and Linux, and it enables the creation of multiple environments, each with its own collection of packages. Another crucial Python module in this analysis is LIBROSA [49], which is used to evaluate the acoustic sound signals of the pulmonary samples; the three feature extraction approaches employed in this study are all contained in this single package, which is utilized for preliminary data processing and can interpret complicated respiratory sound files without a visual dataset. The KERAS library [50] is used to implement the DCNN model, with and without max-pool functions and with various hyperparameters (activation functions, dropout layers, layer configuration, and max-pooling) to make the model more accurate. The features extracted from the pre-processed data are utilized to identify disease symptoms and observe abnormalities in acoustic signals associated with COVID-19 and other pulmonary illnesses.
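To make the pipeline concrete, the sketch below loads a clip with LIBROSA and computes two of the feature channels. Standard MFCCs stand in for the authors' Modified MFCC, and the Soft-Mel variant is not reproduced here since its exact definition is specific to this work; the file name is a placeholder.

```python
import librosa

y, sr = librosa.load("stethoscope_clip.wav", sr=22050)  # hypothetical file

# Log-Mel channel: a 128-band Mel spectrogram converted to decibels.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel)

# Plain MFCCs as a stand-in for the paper's Modified MFCC (MMFCC) channel,
# matching the stated (64, 128) patch height of 64 coefficients.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=64)

print(log_mel.shape, mfcc.shape)  # (128, n_frames) and (64, n_frames)
```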
Methodological framework
This study trains and evaluates the models using the ICBHI dataset and a newly acquired dataset of our own. Lung sound categorization has seen a tremendous increase in the use of CNNs to recognize distinctive audible sounds over the past few years. To categorize the distinct lung sounds in this work, we implemented the DCNN model with and without max-pooling functions. Three Mel-spectrum feature extraction methods (MMFCC, Log-Mel, and Soft-Mel) were used to train the newly constructed DCNN model, addressing sound feature categorization specifically for respiratory sound data. The accuracy of the DCNN model is improved by applying a variety of data augmentation techniques to the ICBHI dataset and the newly acquired data, and the same DCNN is then applied to the expanded dataset. The experimental outcomes show considerable increases in testing accuracy. The structural framework of this research is represented in Fig. 2.
Fig. 2.
The network of the newly implemented model with different feature extraction methods
DCNN model implementation
The DCNN model has been implemented with two pooling configurations (with and without max-pooling functions). After obtaining sound parameters from multiple feature channels (Modified Mel-frequency Cepstral Coefficients (MMFCC), the Soft-Mel frequency channel, and the Log-Mel frequency channel), the DCNN model classifies respiratory sounds according to respiratory disease (Asthma, Bronchiolitis, COPD, Heart Disease, Pneumonia, URTI, COVID-19 symptoms, Heart Failure + COPD, and LRTI). In this study, we developed a DCNN model to recognize respiratory disease symptoms in data from digital stethoscopes; the model can also discern disparities between COVID-19 symptoms and symptoms of other respiratory diseases in such data. Three convolutional layers, two max-pooling functions, and one dense layer make up the deep convolutional portion of the newly implemented Deep CNN model, which separates the deep respiratory sound features collected from a digital stethoscope.
The model input is made up of Time-Frequency Patches (TF-Ps) acquired from the Log-Mel frequency spectrum of the respiratory acoustic signals; these relate to the originally approved feature learning techniques, modified here for the classification of respiratory diseases from digital stethoscope data. We utilized the Essentia Python library with a fixed hop size to generate the log-frequency spectrogram with 256 bands covering the recognized frequency range (0–22 kHz) using a frame length of 25 ms. The implemented model acquired 512 sound feature vectors, comprising 256 deep features and 256 handcrafted feature vectors, for the various modalities of the respiratory sound data. After applying the three feature extraction spectrums to the digital stethoscope data, audio data augmentation techniques were applied to enhance the efficacy and reliability of the proposed models. The actual vector size of the log-frequency spectrum is (128, 128), the Soft-Mel spectrum is (128, 128), and the Modified MFCC (MMFCC) spectrogram size is (64, 128).
The dataset contains sounds with durations ranging from 0 to 30 s. Therefore, we set the TF-Ps (Time-Frequency Patches) for the input digital stethoscope sound data to 5 s per patch (a total of 256 frames), as illustrated in the sketch below.
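A minimal sketch of the patch extraction follows, slicing a spectrogram into fixed-width TF-Ps of 256 frames; the hop between patches and the zero-padding of short clips are assumptions, since the paper does not state them.

```python
import numpy as np

def tf_patches(spec, patch_frames=256, hop_frames=128):
    """Slice a (n_bands, n_frames) spectrogram into fixed-width
    time-frequency patches (TF-Ps); 256 frames follows the paper's
    5-second patch setting, while hop_frames is an assumption."""
    n_bands, n_frames = spec.shape
    patches = [spec[:, s:s + patch_frames]
               for s in range(0, n_frames - patch_frames + 1, hop_frames)]
    if not patches:  # zero-pad clips shorter than one patch (assumption)
        pad = np.zeros((n_bands, patch_frames), dtype=spec.dtype)
        pad[:, :n_frames] = spec
        patches = [pad]
    return np.stack(patches)
```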
During training, TF-Ps are automatically derived over time from the entire log-Mel spectrogram of every acoustic signal snippet. The network is built to learn the parameters $\theta$ of a non-linear function $f$ that maps the input $T$ to the output $Y$, as given by Eq. (2).
$$Y = f(T; \theta) \quad (2)$$
where $f(\cdot)$ is applied to each input of the DCNN network. The operational function of a convolutional layer is presented in Eq. (3), and the dense layer operational function is represented in Eq. (4).
$$Z = h(W * X + b) \quad (3)$$
$$\hat{Y} = \sigma(W \cdot X + b) \quad (4)$$
where $\hat{Y}$ represents the output of the prediction classifier, $h(\cdot)$ is the pointwise activation function, $\sigma$ is the non-linear (softmax) function, $W$ is the kernel or filter, $X$ is the input vector, $b$ is the bias, and $*$ denotes the convolution operation.
The optimizer used in this analysis is Adam, with a total of 100 epochs and a batch size of 32 for training the deep CNN model. ReLU is chosen as the activation function for the intermediate layers, and an L2-norm regularization term with a value of 0.001 is added to the loss function. The Softmax activation function is applied at the output layer for the classification of the respiratory disease classes.
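The sketch below assembles a Keras model consistent with this description (three convolutional layers, optional max-pooling, one dense softmax output, L2 = 0.001, Adam); the filter counts, dropout rate, and pooling placement are assumptions not fixed by the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_dcnn(input_shape=(128, 128, 1), n_classes=10,
               kernel=(1, 12), use_max_pool=True):
    """Three conv layers + optional max-pooling + one dense softmax layer.
    Filter counts (32/64/128) and the 0.5 dropout rate are assumptions."""
    reg = regularizers.l2(0.001)
    x = inputs = keras.Input(shape=input_shape)
    for i, filters in enumerate((32, 64, 128)):
        x = layers.Conv2D(filters, kernel, padding="same",
                          activation="relu", kernel_regularizer=reg)(x)
        if use_max_pool and i < 2:        # the two max-pooling operations
            x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(n_classes, activation="softmax",
                           kernel_regularizer=reg)(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Stated training setup: 100 epochs, batch size 32, e.g.
# model.fit(x_train, y_train, epochs=100, batch_size=32,
#           validation_data=(x_val, y_val))
```

Calling `build_dcnn(use_max_pool=False)` yields the Model-2 variant discussed in the next section.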
Functional description of DCNN framework
Convolutional layers form the DCNN model's framework. The input spectrum is subjected to convolutional filters, and convolved features of the original sound spectrogram are found through operations such as edge detection and pattern identification. These convolved features are produced by the convolution of the kernel parameters with the matching spectrum parameters. The filter travels across the input spectrogram until it reaches the endpoint, moving according to the stride value. In our Model-2 framework, the ReLU (Rectified Linear Unit) activation function receives its input directly from the deep convolutional feature values; if the input value is negative, the output value is zero. In Model 2, no max-pool operation is carried out between the convolution function and the ReLU activation. Figure 3 shows the overall design of the proposed network model with max-pooling, while Fig. 4 shows the design without max-pooling. Both model representations are described in the following sections.
Fig. 3.
Deep CNN layered network with max-pool function with various feature extraction methods
Fig. 4.
Deep CNN layered network without a max-pool function using various feature extraction methods for 10 different classes
The pooling function significantly aids the extraction of smooth and precise features, and it reduces variance, computation, and dimensionality. In general, there are two types of pooling procedures: max-pooling, the most popular method for obtaining features such as points and edges, and average pooling, which helps capture smoother feature vectors. In this design, Model-1 applies the max-pool operation after the convolution process. The goal of this research is to assess the influence on the DCNN of employing max-pooling versus omitting it.
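The toy example below contrasts the two pooling operations on a 4 × 4 feature map: max-pooling keeps the strongest activation in each 2 × 2 window, while average pooling smooths the window.

```python
import numpy as np
from tensorflow.keras import layers

fmap = np.array([[1., 3., 2., 0.],
                 [4., 6., 1., 1.],
                 [0., 2., 5., 7.],
                 [1., 1., 8., 2.]], dtype="float32").reshape(1, 4, 4, 1)

max_out = layers.MaxPooling2D(pool_size=(2, 2))(fmap)
avg_out = layers.AveragePooling2D(pool_size=(2, 2))(fmap)

print(np.squeeze(max_out))  # [[6. 2.]  [2. 8.]]  -> strongest activations
print(np.squeeze(avg_out))  # [[3.5 1.] [1. 5.5]] -> smoothed responses
```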
In Models 1 and 2, we regard the respiratory sounds generated by human lungs as the primary input to the Deep Convolutional Neural Network (DCNN). Specifically, the sound spectrogram is employed to automatically extract deep respiratory sound features through DCNN Models 1 and 2. To construct the spectrograms, three distinct feature channels are employed: the Modified Mel-frequency cepstral coefficient (MMFCC) spectrogram, the Log-Mel spectrogram, and the Soft-Mel spectrogram. The following features are considered in this research to analyse the abnormalities of the various respiratory sounds, including COVID-19.
The main features of respiratory sounds are the expiry volume of the cough sound, the loudness of the respiratory sound, the breath air volume, the peak flow rate of the cough, sound peak velocity, sound intensity, the acoustic signal of the sound, voice pitch, and the subglottic pressure of the voice. Other features, including VAD (Voice Activity Detection), cough sound frequency, breath sound frequency, respiratory sound pitch velocity, speech sound frequency, signal duration, and vocal resonance, are derived by the proposed DCNN model from the various input modalities (cough, voice, and breath). It is crucial to distinguish abnormal respiratory noises such as crackles, wheezes, and pleural rubs from typical ones for an appropriate diagnosis. For regularizing the imbalanced COVID-19 sounds dataset in the present research, the DCNN models performed better than older deep-learning approaches, in which gradients of the learnable features vanish in the middle layers while the error rate is being improved, causing the model to finish training early; we suggest using the proposed DCNN models to address this. Existing deep learning algorithms classify the COVID-19 illness in a binary fashion, whereas the suggested model produces the best results when respiratory disorders are categorized into multiple classes.
When a patient is tested for a respiratory illness, the benchmark dataset (ICBHI) reveals certain discriminating signs in the findings that can serve as accurate predictors. Specifically, the model's testing accuracy for each disease class improves when various breathing modalities are combined. Existing approaches such as SVM, VGG-Net, 1D CNN, lightweight CNN, and a 5-layer DCNN do not perform as well; the proposed model provides better performance in classifying the various respiratory diseases from patient sample data. However, the dataset samples are too few to train the model optimally. Additional data will be gathered soon to boost the efficacy of the multi-level proposed DCNN model, and a multi-layer deep neural network will be put into practice in an effort to boost the accuracy of diagnosing various lung illnesses. The comparison of the suggested model with earlier audio classification models on various lung sound data from the past few years is explained in the "Result Analysis and Discussion" section.
Result analysis and discussion
The performance of the DCNN model is discussed with respect to two datasets: the Respiratory Disease Database and the COVID-19 sounds data. The accuracy and F1-score for the model are given by Eqs. (5) and (6). By contrasting the suggested model with existing models on digital stethoscope data, this observation demonstrates that it performs well.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (5)$$

$$F1\text{-score} = \frac{2 \cdot P \cdot R}{P + R} \quad (6)$$
Precision and recall are calculated using Eqs. (7) and (8), where $P$ is the precision and $R$ is the recall used in Eq. (6) to estimate the F1-score of the proposed model. To assess the working functionality of the suggested approach and compare it with existing DL and ML approaches, the model has been evaluated with the k-fold cross-validation technique. The accuracy of the suggested model is measured for the various respiratory disease classes: Asthma, Pneumonia, COPD, Bronchiolitis, Heart Diseases, Heart Failure + COPD, URTI (Upper Respiratory Tract Infection), LRTI (Lower Respiratory Tract Infection), Pertussis, and COVID-19.
$$P = \frac{TP}{TP + FP} \quad (7)$$

$$R = \frac{TP}{TP + FN} \quad (8)$$
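Eqs. (5)–(8) correspond directly to the standard scikit-learn metrics; a sketch of the per-fold evaluation is shown below, where the macro average applies each equation per class before averaging. The arrays are placeholders and the fold count of 5 is an assumption.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)
from sklearn.model_selection import StratifiedKFold

y = np.random.randint(0, 10, size=200)  # placeholder labels (10 classes)
X = np.random.rand(200, 64)             # placeholder feature vectors

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # ... train the DCNN on X[train_idx], then predict on X[test_idx] ...
    y_pred = y[test_idx]                # stand-in for real predictions
    print("acc ", accuracy_score(y[test_idx], y_pred),                    # Eq. (5)
          "f1  ", f1_score(y[test_idx], y_pred, average="macro"),         # Eq. (6)
          "prec", precision_score(y[test_idx], y_pred, average="macro"),  # Eq. (7)
          "rec ", recall_score(y[test_idx], y_pred, average="macro"))     # Eq. (8)
```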
The CNN model's settings were improved by controlling, for example, the number of hidden layers. In our approach, we incorporated diverse learning rates and employed various activation parameters across different networks. Additionally, to mitigate the risk of overfitting, we implemented L2-norm regularization in the proposed model, and we also varied the number of epochs (up to 500) to reduce model loss. This observation demonstrated that, compared to current models on the COVID-19 Sounds Data and the Respiratory Disease Database, the suggested model provides good accuracy. The COVID-19 sounds dataset was obtained from the University of Cambridge, with additional samples from local hospitals added, while the respiratory disease database was created from freely accessible online sources plus a few respiratory sound sets from local government hospitals. The analysis of the suggested approach on the human respiratory disease database and the COVID-19 acoustic signals is shown in Table 1. The imbalanced classes in both datasets are regularized using the L2-norm method.
Table 1.
The performance of the proposed model on the human respiratory sound disease database and COVID-19 sounds data
| Dataset | Name of disease | Accuracy (%) | F1-score (%) |
|---|---|---|---|
| Respiratory disease database + ICBHI data | Asthma | 91.28 | 92.77 |
| | Bronchiolitis | 92.32 | 94.02 |
| | COPD | 94.86 | 96.49 |
| | Heart diseases | 89.92 | 92.01 |
| | Pneumonia | 95.13 | 95.88 |
| | URTI | 88.68 | 91.01 |
| | COVID-19 | 92.43 | 93.71 |
| | Heart failure + COPD | 94.56 | 96.01 |
| | LRTI | 88.74 | 91.02 |
| | Healthy | 95.23 | 96.78 |
| COVID-19 sounds data + ICBHI data | Asthma | 92.14 | 93.49 |
| | Bronchiolitis | 92.18 | 93.49 |
| | COPD | 95.04 | 95.49 |
| | Heart diseases | 90.06 | 91.46 |
| | Pneumonia | 93.27 | 94.89 |
| | URTI | 90.36 | 91.79 |
| | COVID-19 | 95.83 | 97.20 |
| | Heart failure + COPD | 94.49 | 95.90 |
| | LRTI | 90.87 | 92.06 |
| | Healthy | 96.05 | 96.97 |
With the Respiratory Disease Database + ICBHI data, the proposed model obtained an accuracy of 91% for identifying Asthma symptoms, 92% for Bronchiolitis, 94% for COPD, 89% for heart disease, and 95% for Pneumonia. In addition, it reached 88% accuracy for URTI (Upper Respiratory Tract Infection), 92% for COVID-19, 94% for Heart Failure + COPD, 88% for LRTI (Lower Respiratory Tract Infection), and 95% for healthy symptoms.
The result analysis of the various diseases on the Respiratory Disease Database + ICBHI data is shown in Fig. 5. The red dotted line shows the performance of the suggested approach on the COVID-19 Sounds Data + ICBHI data, and the blue dotted line shows the performance on the Respiratory Disease Database + ICBHI data. The abnormalities are characterized using various respiratory sound parameters such as sound frequency, cough peak time velocity, signal-to-noise ratio, sound loudness, acoustic signal rate, air volume, sound pitch and velocity, quality and intensity of the sound, cough expiry volume, voice activity detection, and cough peak flow rate. Based on the findings of the initial analysis, the hyper-parameter objective function is revised, and an improved model framework is then examined. The abnormality comparisons are made for various respiratory diseases: Asthma, Pneumonia, COPD, Bronchiolitis, Heart Diseases, Heart Failure + COPD, URTI (Upper Respiratory Tract Infection), and LRTI (Lower Respiratory Tract Infection).
Fig. 5.

The performance of the proposed model analysis on human respiratory sound disease database and COVID-19 sounds data
The data analysis is done to observe abnormalities in the various respiratory diseases using different Mel spectrograms; MMFCC (Modified Mel-frequency cepstral coefficients) yields better performing features than the other spectrograms. Applying lung acoustic sound signals, the investigation demonstrates the differences between SARS-CoV-2 illness and other respiratory illnesses. The feature extraction analysis with MMFCC is shown in Fig. 6 for various respiratory diseases (Asthma, Bronchitis, COPD, Heart Disease, COVID-19, and Pneumonia). The spectrogram reveals the frequency abnormalities of each respiratory disease and the variations in respiratory disease cough sounds. We examined the pulmonary acoustic sound disease data using each of the three channels and found that MMFCC outperformed the other two feature spectrums in obtaining deep pulmonary acoustic signal features. In this analysis, we observed many crackles in the COVID-19 sample sounds; the frequency and loudness of the COVID-19 sound are higher when compared with the other respiratory disease sounds.
Fig. 6.
Comparison of disease abnormalities for various respiratory diseases with MMFCC
The performance metrics of the proposed model for the various respiratory diseases (Asthma, Pneumonia, COPD, Bronchiolitis, Heart Diseases, Heart Failure + COPD, URTI, LRTI, Pertussis, and COVID-19) are displayed in Table 2. Precision, recall, accuracy, and F1-score for the proposed models are assessed in this investigation. The triangular filter representation for several respiratory diseases (Asthma, Bronchitis, COPD, COVID-19) is shown in Fig. 7. Three aspects were observed in this analysis: how the disease sound filter bank converts from Hz to Mel, the frequency of each filter bank (the model was observed with 10 filter banks), and the plot of the triangular filters for each disease. There is a large difference between the normal and abnormal acoustic signals in terms of frequency, sound pitch, sound velocity time, sound volume, and Mel filter coefficient values, and the model is tested across all these cases to identify the abnormalities of the respiratory sound.
Table 2.
Analysis of the proposed method's performance with respect to the precision, recall, accuracy, and F1-score metrics on COVID-19 sounds data + ICBHI data and respiratory disease database + ICBHI data
| Dataset name | Disease class name | Precision (%) | Recall (%) | Accuracy (%) | F1-score (%) |
|---|---|---|---|---|---|
| Respiratory disease database + ICBHI data | Asthma | 92.13 | 93.42 | 91.28 | 92.77 |
| | Bronchiolitis | 93.82 | 94.23 | 92.32 | 94.02 |
| | COPD | 95.88 | 97.12 | 94.86 | 96.49 |
| | Heart diseases | 91.21 | 92.83 | 89.92 | 92.01 |
| | Pneumonia | 95.41 | 96.36 | 95.13 | 95.88 |
| | URTI | 90.36 | 91.67 | 88.68 | 91.01 |
| | COVID-19 | 93.38 | 94.05 | 92.43 | 93.71 |
| | Heart failure + COPD | 95.21 | 96.82 | 94.56 | 96.01 |
| | LRTI | 90.43 | 91.62 | 88.74 | 91.02 |
| | Healthy | 96.20 | 97.37 | 95.23 | 96.78 |
| COVID-19 sounds data + ICBHI data | Asthma | 93.32 | 94.16 | 92.14 | 93.49 |
| | Bronchiolitis | 93.42 | 94.28 | 92.18 | 93.49 |
| | COPD | 95.20 | 96.63 | 95.04 | 95.49 |
| | Heart diseases | 90.82 | 92.10 | 90.06 | 91.46 |
| | Pneumonia | 94.12 | 95.68 | 93.27 | 94.89 |
| | URTI | 91.31 | 92.28 | 90.36 | 91.79 |
| | COVID-19 | 95.16 | 96.21 | 95.83 | 97.20 |
| | Heart failure + COPD | 95.69 | 96.12 | 94.49 | 95.90 |
| | LRTI | 91.34 | 92.81 | 90.87 | 92.06 |
| | Healthy | 96.82 | 97.13 | 96.05 | 96.97 |
Fig. 7.
Triangular filter representation for various respiratory disease (Asthma, Bronchitis, COPD, COVID-19)
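The triangular filters in Fig. 7 can be reproduced with librosa's Mel filter-bank utilities; the sketch below builds the 10-filter bank described above and reports each filter's centre frequency (the sampling rate and FFT size are assumptions).

```python
import numpy as np
import librosa

sr, n_fft = 22050, 2048

# A bank of 10 triangular Mel filters, as used in the analysis above.
fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=10)  # shape (10, 1025)

# Centre frequency (Hz) of each triangular filter (peak of the triangle).
freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
centres = [freqs[int(np.argmax(fb[i]))] for i in range(fb.shape[0])]
print(np.round(centres, 1))

# The Hz <-> Mel conversion used to place the filters on the Mel scale.
print(librosa.hz_to_mel(1000.0), librosa.mel_to_hz(1000.0))
```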
The model observes variations in the respiratory sound diseases (Asthma, Bronchiolitis, COPD, Heart Disease, Pneumonia, URTI, COVID-19 symptoms, Heart Failure + COPD, and LRTI) with respect to breathing sound parameters (loudness, sound peak flow rate, inhale and exhale, etc.). The proposed model is well suited to identifying abnormalities in the respiratory sound using the above-mentioned parameters.
Current strategies mainly concentrate on identifying pertussis, bronchitis, and asthma, as well as coronary artery disease. We have instead included COVID-19 sound data with the existing data and trained the proposed DCNN model, which helps to identify the differences between COVID-19 sounds and various pulmonary diseases. The analysis of the proposed model's metrics (precision, recall, accuracy, and F1-score) using COVID-19 Sounds Data + ICBHI data and Respiratory Disease Database + ICBHI data is shown in Fig. 8. The suggested model achieves state-of-the-art results on COVID-19 Sounds Data + ICBHI data compared to Respiratory Disease Database + ICBHI data.
Fig. 8.
Analysis of the proposed approach's metrics (precision, recall, accuracy, F1-Score) on COVID-19 sounds data + ICBHI data and respiratory disease database + ICBHI data
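As a brief aside, the per-class metrics reported in Table 2 and Fig. 8 can be computed as in the sketch below. scikit-learn is an assumption here (the paper does not name a metrics library), and `y_true`/`y_pred` are placeholders for the true and predicted disease-class labels.

```python
from sklearn.metrics import classification_report

# The ten classes evaluated in Table 2.
CLASSES = ["Asthma", "Bronchiolitis", "COPD", "Heart diseases", "Pneumonia",
           "URTI", "COVID-19", "Heart failure + COPD", "LRTI", "Healthy"]

def per_class_report(y_true, y_pred):
    # Precision, recall, and F1-Score for each class, plus overall accuracy.
    return classification_report(y_true, y_pred, target_names=CLASSES, digits=4)
```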
Impact of hyperparameters
The convolutional layer in the suggested DCNN is crucial for identifying irregularities in lung sounds. Using the respiratory disease database, COVID-19 sound data, and ICBHI sound dataset, an observational study was conducted to determine the number of convolutional layers needed for the fundamental network model. The respiratory database and COVID-19 sound files were divided into a training set (70% of the data), a validation set (10%), and a test set (20%), and the model was trained with a batch size of 32 for 500 epochs. The upper limit is 10 DCNN layers, beyond which the smallest dimension of the feature map at the current layer would be exceeded. For numerous additional variables extracted from the fundamental model, the ideal number of convolutional layers and their associated parameters are determined using the same technique.
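The sketch below illustrates one way the 70/10/20 split and training configuration described above could be set up. The stratified scikit-learn split and random seed are assumptions, since the text does not specify a splitting tool; `features` and `labels` are placeholders for the spectrogram tensors and disease-class labels.

```python
from sklearn.model_selection import train_test_split

BATCH_SIZE = 32   # batch size used in the study
EPOCHS = 500      # training epochs used in the study

def split_dataset(features, labels, seed=42):
    # Hold out 20% of the respiratory + COVID-19 sound files for testing.
    x_rem, x_test, y_rem, y_test = train_test_split(
        features, labels, test_size=0.20, stratify=labels, random_state=seed)
    # Of the remaining 80%, take 1/8 (= 10% of the full set) for validation,
    # leaving 70% for training.
    x_train, x_val, y_train, y_val = train_test_split(
        x_rem, y_rem, test_size=0.125, stratify=y_rem, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```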
Earlier studies have verified sound features for distinguishing respiratory illness symptoms using the MFCC feature extraction method. According to the experimental results, the different feature extraction techniques (MMFCC, MFCC, Log-Mel, and Soft-Mel) combined as a multi-feature channel yield very deep feature maps with favourable recognition rates; these results are summarized in Table 3. The features filter out noise and disturbance from the audio files, capture the deep characteristics of a respiratory sound recording, and respond positively to the DCNN model's sound features. Figure 9 illustrates the comparative outcomes of the DCNN convolutional model with MMFCC, MFCC, Log-Mel, and Soft-Mel using deep acoustic features. The collected data consist of the Time-Frequency Patch (T-FP) values derived from the exponentially scaled Mel-spectrum architecture of lung acoustic pattern features, together with the initially developed feature training methods for identifying different lung disease signals.
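A minimal sketch of assembling such a multi-feature channel input is shown below, using librosa [49]. Only standard MFCC and Log-Mel channels are computed; the Modified MFCC and Soft-Mel variants are specific to this work and appear only as hypothetical placeholder hooks (`modified_mfcc`, `soft_mel`). The sample rate, Mel-band count, and frame count are illustrative assumptions.

```python
import librosa
import numpy as np

def multi_feature_channels(path, sr=22050, n_mels=40, n_frames=256):
    """Stack spectral features as channels: (n_mels, n_frames, n_channels)."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)                      # Log-Mel channel
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mels)  # MFCC channel
    channels = [log_mel, mfcc]
    # The paper's own variants would slot in here, e.g.:
    # channels += [modified_mfcc(y, sr), soft_mel(y, sr)]   # hypothetical hooks
    # Pad/crop every feature map to a fixed frame count, then stack on a
    # trailing channel axis for the CNN input.
    fixed = [librosa.util.fix_length(c, size=n_frames, axis=1) for c in channels]
    return np.stack(fixed, axis=-1)
```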
Table 3.
The impact of the feature extraction methods employed in the proposed deep CNN model on COVID-19 Sounds Data + ICBHI Data for identifying COVID-19 disease symptoms
| Kernel size | Feature type | Accuracy (%) | F1-Score (%) |
|---|---|---|---|
| 1 × 12 | MMFCC | 95.83 | 96.38 |
| | MFCC | 92.31 | 93.13 |
| | Log-Mel | 88.38 | 89.21 |
| | Soft-Mel | 86.21 | 87.32 |
| 1 × 24 | MMFCC | 94.86 | 95.45 |
| | MFCC | 91.73 | 92.34 |
| | Log-Mel | 87.93 | 88.39 |
| | Soft-Mel | 86.01 | 87.28 |
| 1 × 36 | MMFCC | 94.31 | 95.13 |
| | MFCC | 91.21 | 92.18 |
| | Log-Mel | 87.45 | 88.37 |
| | Soft-Mel | 85.82 | 86.92 |
| 1 × 48 | MMFCC | 94.04 | 94.96 |
| | MFCC | 90.82 | 91.68 |
| | Log-Mel | 87.06 | 87.95 |
| | Soft-Mel | 85.43 | 86.32 |
| 1 × 60 | MMFCC | 93.45 | 94.12 |
| | MFCC | 90.18 | 91.27 |
| | Log-Mel | 86.28 | 87.38 |
| | Soft-Mel | 85.07 | 86.16 |
Fig. 9.
The impact of the feature extraction methods utilized in designing the proposed deep CNN model on COVID-19 sounds data combined with ICBHI data for the identification of COVID-19 disease symptoms
The proposed DCNN convolutional model performs well, and adding further layers to the DCNN network yields a 10-layer model. Convolution and pooling layers alternate throughout the DCNN framework, and a fully connected layer finally produces the output. The network's output is evaluated for multiple layer counts; with more than five hidden layers, the output stays almost the same. Table 4 displays the testing findings. The results indicate that expanding the artificial neural network in the present study improved sound signal accuracy for the different tests as well as identification. Figure 10 contrasts the findings of the 10-layer and 15-layer DCNN models for various DCNN kernel sizes.
Table 4.
The impact of the MMFCC feature extraction method on the proposed model with different numbers of layers and different kernel sizes
| CNN kernel shape | Architecture type | Accuracy (%) | F1-Score (%) |
|---|---|---|---|
| 1 × 60 | 10-Layers | 93.45 | 94.12 |
| | 15-Layers | 91.26 | 92.13 |
| 1 × 48 | 10-Layers | 94.04 | 94.96 |
| | 15-Layers | 92.32 | 93.16 |
| 1 × 36 | 10-Layers | 94.31 | 95.13 |
| | 15-Layers | 92.18 | 93.27 |
| 1 × 24 | 10-Layers | 94.86 | 95.45 |
| | 15-Layers | 93.16 | 94.29 |
| 1 × 12 | 10-Layers | 95.83 | 96.38 |
| | 15-Layers | 94.19 | 95.13 |
Fig. 10.
The impact of the MMFCC feature extraction method on the proposed model with 10-layered and 15-layered architectures using different kernel sizes (1 × 12, 1 × 24, 1 × 36, 1 × 48, 1 × 60)
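For illustration, the following Keras [50] sketch builds a 10-convolutional-layer network with max-pooling in the spirit of the architecture compared in Table 4. Only the layer count, the 1 × 12 kernel, and the max-pooling choice come from the text; the filter counts, pooling schedule, input shape, optimizer, and loss are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_dcnn(input_shape=(40, 256, 4), n_classes=10, kernel=(1, 12)):
    """10 convolutional layers (5 blocks x 2) with max-pooling between blocks."""
    model = keras.Sequential()
    model.add(keras.Input(shape=input_shape))  # (mel bands, frames, channels)
    for block in range(5):
        for _ in range(2):
            model.add(layers.Conv2D(32 * (block + 1), kernel,
                                    padding="same", activation="relu"))
        # The max-pool variant studied in the text; halves the time axis.
        model.add(layers.MaxPooling2D(pool_size=(1, 2)))
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Dropping the `MaxPooling2D` lines gives the "without max-pooling" variant compared in the experiments; swapping `kernel` for (1, 24) through (1, 60) reproduces the sweep in Table 4.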
The experimental findings show that the current model's comprehensive features preserve extra feature values from the digital stethoscope audio signals, improving the precision with which respiratory disease symptoms are identified. The suggested DCNN performs better than the regular CNN at classifying the digital stethoscope sounds employed in this study; this is evident when the convolutional filter dimension is less than 48, since larger filter dimensions lengthen processing owing to boundary handling and the larger computations involved. In contrast to the conventional MFCC approach, the MMFCC feature extraction method employed in this study proves significantly more apt for detecting abnormalities associated with COVID-19. Additionally, identification precision remains very high when the number of hidden convolutional layers is large; however, the computational cost grows and the processing time for class identification lengthens as the number of hidden convolutional layers increases.
In contrast to the traditional input, collecting the deep sound features requires no preliminary processing for any length of COVID-19-based lung audio; thus, the categorization techniques can include lung sounds directly. Using the suggested DCNN model instead of a CNN significantly improves the whole model's ability to recognize COVID-19 disease symptoms without X-rays, computed tomography (CT), or other studies. Based on sound features, the suggested approach produced reliable identification effectiveness. The conceptual framework created in this study can be applied to routine screening in daily living or to public health treatment programmes; if the model pinpoints disease signs, the user should visit a doctor for an additional checkup.
Asthma, Bronchiolitis, COPD, Heart Disease, Pneumonia, URTI (Upper Respiratory Tract Infection), COVID-19 Symptoms, a combination of Heart Failure and COPD, and LRTI (Lower Respiratory Tract Infection) are the only abnormalities this method can detect, and it has so far been validated on only a few samples. As a result, it is not yet used in medical care, but the proposed model may be applied to help in early illness identification. The initial endeavor aimed to improve effectiveness, and the ICBHI collection was subsequently employed in healthcare settings for the identification of various respiratory diseases. Future research will leverage a diverse range of techniques to construct artificial models capable of extracting respiratory symptoms from digital stethoscope data, with the ultimate goal of identifying data suitable for clinical applications.
Comparison of proposed model with existing approaches
When a patient is tested for a respiratory illness, the benchmark dataset (ICBHI) reveals certain discriminating signs in the findings that can be accurate predictors. Specifically, combining various breathing modalities gives better test accuracy for each disease class. The DCNN model is proposed for the classification of various respiratory diseases from patient sample data. The available samples are too few to train the model thoroughly; additional data will be gathered soon to boost the efficacy of the multi-level deep CNN model, and a multi-layer deep neural network will be put into practice to boost the accuracy of diagnosing various lung illnesses. Table 5 compares the suggested model to earlier audio classification models on various lung sound data from the past few years.
Table 5.
The comparison of the proposed model with existing works to analyze the accuracy of the suggested approach
| Model name | Dataset name | Parameters | Accuracy (%) |
|---|---|---|---|
| SVM | COVID-19 | Cough, breath | 82 |
| VGG-Net | COVID-19 | Cough, breath | 87 |
| 1D CNN | COVID-19 | Cough, breath, and speech sound | 90 |
| Light-weight CNN | COVID-19 | Cough, breath, and speech sound | 92 |
| DCNN with 5 layers | COVID-19 | Cough, breath, and speech sound | 93 |
| Deep CNN (proposed model) | COVID-19 sounds data + ICBHI data, and samples collected from Govt. hospitals | Cough and breath | 94 |
| Deep CNN (proposed model) | Respiratory disease database + ICBHI data, and samples collected from Govt. hospitals | Cough and breath | 96 |
In contrast to the conventional technique, respiratory sound of any duration requires no preliminary processing before lung sound audio features are identified. As a result, the proposed classification networks can explicitly include lung sound patterns. The suggested DCNN model outperforms the pre-trained CNN model in this study, significantly improving the whole system's ability to detect anomalies among the different lung sound acoustic signals. Without reference X-ray or computed tomography (CT) images, the suggested model produces reliable identification results based on respiratory sound signal characteristics.
The framework created in this work may be applied in general public medical facilities or in evaluations of daily life. It is imperative to seek clinical attention for further testing if any disorder manifests. Consequently, it can be employed for the early detection of respiratory diseases, even though it has not yet been applied in clinical practice. Our ongoing efforts involve the continual development of methods and the selection of samples intended for application in medical care. The proposed models utilize a more comprehensive dataset than existing approaches, as delineated in Table 6, and the analysis results are illustrated in Fig. 11.
Table 6.
Systematic comparative analysis of various respiratory disease datasets
| Dataset | Modalities | Accuracy (%) |
|---|---|---|
| COVID-19 audio data | Sleep quality, fatigue, and anxiety | 55 |
| ComParE2021-CCS-CSS-data | Cough, breath, and voice | 74 |
| MIT open voice data set | Voice and breath | 79 |
| COVID-19 Sounds Data | Cough, breath, and voice | 82 |
| CT images data | 296 chest CT images | 86 |
| COUGHVID data | Cough and wheezing sounds | 90 |
| COVID-19 English labelled tweets dataset | English tweets | 91 |
| COVID-19 sounds data + ICBHI data, and samples collected from Govt. hospitals (proposed and utilized in the present work) | Cough and breath | 96 |
Fig. 11.
The comparative results for various respiratory disease datasets
Diagnosing COVID-19 typically relies on a combination of clinical evaluation, laboratory tests, and medical imaging. The primary methods employed for COVID-19 diagnosis are:
Reverse transcription polymerase chain reaction (RT-PCR) test
This is the most common diagnostic test for COVID-19. It detects the presence of the virus’s genetic material (RNA) in a patient’s respiratory sample, usually obtained through a nasal or throat swab. RT-PCR is highly specific and sensitive and is considered the gold standard for COVID-19 diagnosis.
Antigen tests
Antigen tests detect specific proteins on the surface of the virus. They are faster than RT-PCR tests but can be less sensitive, meaning they may yield false-negative results, especially in asymptomatic cases.
Serological (antibody) tests
These tests are designed to detect antibodies produced by the immune system in response to the virus. Their primary utility lies in identifying past infections, and they are generally less effective for diagnosing acute cases of COVID-19.
Medical imaging
Chest imaging, such as X-rays and computed tomography (CT) scans, can be helpful to evaluate the extent of lung involvement in COVID-19 patients. These imaging techniques can show characteristic patterns of lung abnormalities associated with the virus, such as ground-glass opacities. Nevertheless, medical imaging is not conventionally employed as a standalone diagnostic tool for COVID-19; rather, it serves as a complementary method to assess the severity of the disease.
Comparing these diagnostic techniques
RT-PCR is the most accurate method for diagnosing current COVID-19 infections, but it requires physical contact to take the samples and trained medical representatives to test them. Antigen tests are faster but less sensitive than RT-PCR tests, and in the existing approaches they likewise require physical contact and trained medical staff. Our proposed work avoids physical contact, which is the major way the virus spreads.
Serological tests prove valuable in identifying past infections and assessing immunity, whereas medical imaging is primarily utilized to evaluate lung involvement and monitor disease progression and is not considered a primary diagnostic tool for COVID-19. The choice of diagnostic method depends on the clinical context, the availability of resources, and the stage of the disease. Combining multiple methods, such as clinical assessment, laboratory testing (RT-PCR or antigen), and imaging, can provide a more comprehensive picture of a patient's COVID-19 status; however, all the existing approaches need physical contact to take the samples and trained medical representatives to test them. It is important to note that the accuracy of these tests can vary, and false negatives and false positives can occur. Healthcare providers consider multiple factors, including symptoms, exposure history, and test results, when making a COVID-19 diagnosis. The main aim of this proposed work is to avoid physical contact and provide a better primary diagnostic model for identifying COVID-19 disease symptoms.
Conclusion
The research was dedicated to identifying abnormalities in respiratory diseases by employing the proposed Deep CNN model, underscoring its significance in advancing our understanding of respiratory health. The suggested approach is tested on various datasets (Respiratory Disease Database + ICBHI Data, COVID-19 Sounds Data + ICBHI Data) to analyse model performance. The model was tested with and without the max-pool function, and it performs better with the max-pool function on these datasets. The proposed deep CNN model is designed with three feature extraction approaches (Modified MFCC, Soft-Mel, and Log-Mel) and tested on both datasets. The final findings show that, on the various benchmark datasets for identifying COVID-19 disease abnormalities, the Deep CNN network with the max-pooling function and MMFCC features provides promising results. The model was compared across different filter shapes (1 × 12, 1 × 24, 1 × 36, 1 × 48, and 1 × 60), and the 1 × 12 kernel performs better than the remaining filters. The proposed 10-layered architecture outperforms the existing 5-layered DCNN model by 2–4%. Notably, it demonstrates exceptional proficiency in detecting abnormalities associated with respiratory diseases across both datasets. The study also highlights the significance of Neural Architecture Search (NAS) as a tool for identifying superior frameworks for categorization.

However, it is essential to acknowledge the model's limitations: it relies heavily on the quality and quantity of the data utilized in this research. Overfitting is also a concern, wherein the model becomes too specific to the training data, hindering its ability to generalize effectively across diverse scenarios. In dynamic environments, a challenge known as dataset shift may arise, wherein the model's performance deteriorates if the distribution of the testing data differs significantly from that of the training data. The first constraint can be addressed through ensemble methods, which offer potential improvements in overall performance and robustness. The second can be tackled by applying data transformations and incorporating regularization techniques such as L1 or L2 to discourage overly complex models, thereby mitigating overfitting to noise in the training data. Additionally, adversarial training methods can bolster the model's resilience to distributional shifts, and domain-specific metrics, or metrics that explicitly acknowledge variations between training and testing data, can be explored. These avenues present promising directions for future research and development. To ensure continuous availability for patients seeking information or advice, especially in non-emergency situations beyond regular office hours, the proposed model is set to be deployed in artificial chatbots as part of future directions.
Acknowledgements
We want to express our gratitude to everyone who is working to control the SARS-CoV-2 outbreak.
Data availability
The data used in this work are available in a repository. COVID-19 large-scale sound (breath, voice, cough) data were collected from reputable repositories and referenced sources [41–45], and ICBHI data (digital stethoscope data) were collected from online sources [40]. The corresponding author may provide the data to scientists and investigators upon reasonable request.
Declarations
Conflict of interest
There are no conflicts of interest among the authors.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Wang Y, et al. Unobtrusive and automatic classification of multiple people's abnormal respiratory patterns in real time using deep neural network and depth camera. IEEE Internet Things J. 2020;7(9):8559–71. 10.1109/JIOT.2020.2991456.
2. Saatci E, Saatci E. Determination of respiratory parameters by means of Hurst exponents of the respiratory sounds and stochastic processing methods. IEEE Trans Biomed Eng. 2021;68(12):3582–92. 10.1109/TBME.2021.3079160.
3. World Health Organization. Coronavirus disease 2019 (COVID-19). 2021. Available from: https://covid19.who.int/.
4. Vaishya R, Javaid M, Khan IH, Haleem A. Artificial intelligence (AI) applications for COVID-19 pandemic. Diabetes Metab Syndr Clin Res Rev. 2020;14(4):337–9. 10.1016/j.dsx.2020.04.012.
5. Unwin HJT, Mishra S, Bradley VC, et al. State-level tracking of COVID-19 in the United States. Nat Commun. 2020;11:6189. 10.1038/s41467-020-19652-6.
6. Easwaramoorthy D, Gowrisankar A, Manimaran A, Nandhini S, Rondoni L, Banerjee S. An exploration of fractal-based prognostic model and comparative analysis for second wave of COVID-19 diffusion. Nonlinear Dyn. 2021;8:1–21. 10.1007/s11071-021-06865-7.
7. Kavitha C, Gowrisankar A, Banerjee S. The second and third waves in India: when will the pandemic be culminated? Eur Phys J Plus. 2021;136(5):596. 10.1140/epjp/s13360-021-01586-7.
8. Gowrisankar A, Rondoni L, Banerjee S. Can India develop herd immunity against COVID-19? Eur Phys J Plus. 2020;135(6):526. 10.1140/epjp/s13360-020-00531-4.
9. Ma Y, et al. LungBRN: a smart digital stethoscope for detecting respiratory disease using bi-ResNet deep learning algorithm. 2019 IEEE Biomedical Circuits and Systems Conference (BioCAS), 2019, pp. 1–4. 10.1109/BIOCAS.2019.8919021.
10. Xia T, et al. COVID-19 Sounds: a large-scale audio dataset for digital respiratory screening. 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks, 2021. https://openreview.net/forum?id=9KArJb4r5ZQ.
11. Brabenec L, Mekyska J, Galaz Z, et al. Speech disorders in Parkinson's disease: early diagnostics and effects of medication and brain stimulation. J Neural Transm. 2017;124:303–34. 10.1007/s00702-017-1676-0.
12. Shi J, Zheng X, Li Y, Zhang Q, Ying S. Multimodal neuroimaging feature learning with multimodal stacked deep polynomial networks for diagnosis of Alzheimer's disease. IEEE J Biomed Health Inf. 2018;22(1):173–83. 10.1109/JBHI.2017.2655720.
13. Robert RD, Pipe AL, Quinlan B, Oda J. Interactive voice response telephony to promote smoking cessation in patients with heart disease: a pilot study. Patient Educ Couns. 2007;66(3):319–26. 10.1016/j.pec.2007.01.005.
14. Liu Y, Whitfield C, Zhang T, et al. Monitoring COVID-19 pandemic through the lens of social media using natural language processing and machine learning. Health Inf Sci Syst. 2021;9:25. 10.1007/s13755-021-00158-4.
15. Campagner A, Carobene A, Cabitza F. External validation of machine learning models for COVID-19 detection based on complete blood count. Health Inf Sci Syst. 2021;9:37. 10.1007/s13755-021-00167-3.
16. Bezzan VP, Rocco CD. Predicting special care during the COVID-19 pandemic: a machine learning approach. Health Inf Sci Syst. 2021;9:34. 10.1007/s13755-021-00164-6.
17. Malla SJ, Alphonse PJA. COVID-19 outbreak: an ensemble pre-trained deep learning model for detecting informative tweets. Appl Soft Comput. 2021;107:107495. 10.1016/j.asoc.2021.107495.
18. Pham TD. Classification of COVID-19 chest X-rays with deep learning: new models or fine tuning? Health Inf Sci Syst. 2021;9:2. 10.1007/s13755-020-00135-3.
19. Jagadeesh MS, Alphonse PJA. NIT_COVID-19 at WNUT-2020 Task 2: deep learning model RoBERTa for identify informative COVID-19 English tweets. Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pp. 450–454, 2020. 10.18653/v1/2020.wnut-1.66.
20. Andreu-Perez J, et al. A generic deep learning based cough analysis system from clinically validated samples for point-of-need COVID-19 test and severity levels. IEEE Trans Serv Comput. 2021. 10.1109/TSC.2021.3061402.
21. Islam R, Tarique M, Abdel-Raheem E. A survey on signal processing based pathological voice detection techniques. IEEE Access. 2020;8:66749–76.
22. Al Ismail M, Deshmukh S, Singh R. Detection of COVID-19 through the analysis of vocal fold oscillations. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 1035–1039. 10.1109/ICASSP39728.2021.9414201.
23. Lella KK, Alphonse PJA. A literature review on COVID-19 disease diagnosis from respiratory sound data. AIMS Bioeng. 2021;8(2):140–53. 10.3934/bioeng.2021013.
24. Nikolaou V, Massaro S, Fakhimi M, et al. COVID-19 diagnosis from chest x-rays: developing a simple, fast, and accurate neural network. Health Inf Sci Syst. 2021;9:36. 10.1007/s13755-021-00166-4.
25. Rahman T, Akinbi A, Chowdhury MEH, et al. COV-ECGNET: COVID-19 detection using ECG trace images with deep convolutional neural network. Health Inf Sci Syst. 2022;10:1. 10.1007/s13755-021-00169-1.
26. Chowdhury NK, Rahman MM, Kabir MA. PDCOVIDNet: a parallel-dilated convolutional neural network architecture for detecting COVID-19 from chest X-ray images. Health Inf Sci Syst. 2020;8:27. 10.1007/s13755-020-00119-3.
27. Lella KK, Alphonse PJA. Automatic COVID-19 disease diagnosis using 1D convolutional neural network and augmentation with human respiratory sound based on parameters: cough, breath, and voice. AIMS Public Health. 2021;8(2):240–64. 10.3934/publichealth.2021019.
28. Mushtaq Z, Su S-F. Environmental sound classification using a regularized deep convolutional neural network with data augmentation. Appl Acoust. 2020;167:107389. 10.1016/j.apacoust.2020.107389.
29. Malla SJ, Lella KK, Alphonse PJA. Novel fuzzy deep learning approach for automated detection of useful COVID-19 tweets. Artif Intell Med. 2023. 10.1016/j.artmed.2023.102627.
30. Lella KK, Alphonse PJA. Automatic diagnosis of COVID-19 disease using deep convolutional neural network with multi-feature channel from respiratory sound data: cough, voice, and breath. Alexandria Eng J. 2022. 10.1016/j.aej.2021.06.024.
31. Tan W, Liu P, Li X, et al. Classification of COVID-19 pneumonia from chest CT images based on reconstructed super-resolution images and VGG neural network. Health Inf Sci Syst. 2021;9:10. 10.1007/s13755-021-00140-0.
32. Abreu V, Oliveira A, Duarte JA, Marques A. Computerized respiratory sounds in paediatrics: a systematic review. Respir Med X. 2021. 10.1016/j.yrmex.2021.100027.
33. Lella KK, Alphonse PJA. COVID-19 disease diagnosis with light-weight CNN using modified MFCC and enhanced GFCC from human respiratory sounds. Eur Phys J Spec Top. 2022. 10.1140/epjs/s11734-022-00432-w.
34. Khan SM, Qaiser N, Shaikh SF, Hussain MM. Design analysis and human tests of foil-based wheezing monitoring system for asthma detection. IEEE Trans Electron Devices. 2020;67(1):249–57. 10.1109/TED.2019.2951580.
35. Guo C, Lin S, Huang Z, et al. Analysis of sentiment changes in online messages of depression patients before and during the COVID-19 epidemic based on BERT + BiLSTM. Health Inf Sci Syst. 2022;10:15. 10.1007/s13755-022-00184-w.
36. Chemaitelly H, Yassine HM, Benslimane FM, et al. mRNA-1273 COVID-19 vaccine effectiveness against the B.1.1.7 and B.1.351 variants and severe COVID-19 disease in Qatar. Nat Med. 2021;27:1614–21. 10.1038/s41591-021-01446-y.
37. Mardani R, et al. Laboratory parameters in detection of COVID-19 patients with positive RT-PCR; a diagnostic accuracy study. Arch Acad Emerg Med. 2020;8(1):e43.
38. Tahamtan A, Ardebili A. Real-time RT-PCR in COVID-19 detection: issues affecting the results. Expert Rev Mol Diagn. 2020;20(5):453–4. 10.1080/14737159.2020.1757437.
39. van Kasteren PB, van der Veer B, et al. Comparison of seven commercial RT-PCR diagnostic kits for COVID-19. J Clin Virol. 2020;128:104412. 10.1016/j.jcv.2020.104412.
40. Rocha BM, Filos D, Mendes L, Serbes G, Ulukaya S, Kahya YP, Jakovljevic N, Turukalo TL, Vogiatzis IM, Perantoni E, Kaimakamis E, Natsiavas P, Oliveira A, Jácome C, Marques A, Maglaveras N, Paiva RP, Chouvarda I, de Carvalho P. An open access database for the evaluation of respiratory sound classification algorithms. Physiol Meas. 2019;40(3):035001.
41. Brown C, Chauhan J, Grammenos A, et al. Exploring automatic diagnosis of COVID-19 from crowdsourced respiratory sound data. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020.
42. Mesaros A, et al. DCASE 2017 challenge setup: tasks, datasets and baseline system. Proceedings of the Detection and Classification of Acoustic Scenes and Events Workshop, 2017. https://hal.inria.fr/hal-01627981/.
43. Chaudhari G, Jiang X, Fakhry A, et al. Virufy: global applicability of crowdsourced and clinical datasets for AI detection of COVID-19 from cough audio samples. 10.48550/arXiv.2011.13320.
44. Sharma N, Krishnan P, Kumar R, Ramoji S, Chetupalli SR, Nirmala R, Ghosh PK, Ganapathy S. Coswara: a database of breathing, cough, and voice sounds for COVID-19 diagnosis. Proceedings Interspeech 2020, pp. 4811–4815. 10.21437/Interspeech.2020-2768.
45. Orlandic L, Teijeiro T, Atienza D. The COUGHVID crowdsourcing dataset, a corpus for the study of large-scale cough analysis algorithms. Sci Data. 2021;8(1):156. 10.1038/s41597-021-00937-4.
46. Jayalakshmy S, Sudha GF. Conditional GAN based augmentation for predictive modeling of respiratory signals. Comput Biol Med. 2021. 10.1016/j.compbiomed.2021.104930.
47. Van Rossum G, Drake FL. Python 3 reference manual. Scotts Valley: CreateSpace; 2009.
48. Anaconda Software Distribution. Anaconda Inc. 2020. Available from: https://docs.anaconda.com/.
49. McFee B, et al. librosa: audio and music signal analysis in Python. Proceedings of the 14th Python in Science Conference, 2015.
50. Chollet F, et al. Keras. 2015. Available from: https://github.com/fchollet/keras.