Abstract
Sleep apnea (SLA) is a respiratory-related sleep disorder that affects a large proportion of the population. The gold standard in sleep testing, polysomnography, is costly, inconvenient, and unpleasant, and it requires a skilled professional to score. To address these problems, multiple researchers have proposed and developed automated scoring processes that use fewer sensors and automated classification algorithms. An automatic detection system would allow a higher diagnosis rate and the analysis of additional patients. Deep learning (DL) is receiving considerable attention owing to the availability of databases and recently developed methods. As one of the most promising techniques for classification and generative tasks, DL has shown significant potential in 2-dimensional clinical image processing studies. However, this approach has yet to be applied as effectively to physiological information collected as 1-dimensional data in order to achieve the needed medical goals. In this study, we therefore review the most recent studies in the field of DL applied to physiological data based on pulse oxygen saturation, electrocardiogram, airflow, and sound signals. A total of 47 articles from different journals and publishers, published between 2012 and 2022, were identified. The primary objective of this work is to analyze, classify, and compare the main characteristics of deep-learning algorithms applied to physiological data processing for SLA detection. The critical factors of the DL approach examined are the data input source, objective, DL network, training framework, and database references, because these variables most strongly influence system performance. Overall, our analysis provides comprehensive and detailed information for researchers looking to contribute to this field.
We categorized the relevant research studies in physiological sensor data analysis using the DL approach based on (1) Physiological sensor data aspects, like signal types, sampling frequency, and window size; and (2) DL model perspectives, such as learning structure and input data types.
Supplementary Information
The online version contains supplementary material available at 10.1007/s13534-023-00297-5.
Keywords: CNN, DBN, Deep learning, GRU, LSTM, Sleep apnea
Introduction
Sleep apnea (SLA) is one of the most common sleep-related disorders, and the effects of undiagnosed SLA can range from high blood pressure to cardiac arrest [1]. SLA is defined by the American Academy of Sleep Medicine (AASM) as a sleep-related condition caused by respiratory difficulties while sleeping [1]. The Apnea–Hypopnea Index (AHI), which represents the number of apneic events per hour of sleep, is regarded as the most relevant measure for diagnosing the presence and severity of the condition. SLA can increase the likelihood of developing cardiovascular disease, hypertension, chronic kidney disease, diabetes, anxiety, and cognitive impairment [2]. Obstructive sleep apnea (OSA) is a frequently reported sleep condition characterized by repeated pharyngeal collapses that cause a partial or total interruption of airflow in the upper airway [3]. As a result, inadequate airflow reaches the lungs and the blood oxygen level drops. Episodes in which there is a significant loss of breathing with no accompanying respiratory effort are called central sleep apnea (CEN) episodes [4]. This indicates either that the part of the brain controlling the respiratory system failed to stimulate respiration or that the impulse to breathe was not correctly transmitted. Cessation of respiratory function, sudden awakenings during sleep, hypersomnia, and snoring are also prominent symptoms of CEN. Mixed (MIX) apnea is a combination of OSA and CEN apnea, characterized by a decrease in respiratory effort and an upper-airway obstruction.
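The AHI definition above (apneic events per hour of sleep) can be illustrated with a minimal sketch. The function names are hypothetical, and the severity cut-offs of 5, 15, and 30 events/h are the commonly used adult thresholds, which are an assumption here and are not stated in this review.

```python
def apnea_hypopnea_index(n_events: int, sleep_minutes: float) -> float:
    """Return the number of scored apnea/hypopnea events per hour of sleep."""
    if sleep_minutes <= 0:
        raise ValueError("sleep_minutes must be positive")
    return n_events / (sleep_minutes / 60.0)


def severity(ahi: float) -> str:
    """Map an AHI value to a severity label.

    The cut-offs (5, 15, 30 events/h) are commonly used adult thresholds
    and are an assumption, not a value taken from the reviewed studies.
    """
    if ahi < 5:
        return "normal"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"
```

For example, 120 scored events over 480 min of sleep correspond to 15 events/h, which this sketch labels as moderate.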
Overnight polysomnography (PSG) in a sleep laboratory is the global standard method for the diagnosis and monitoring of sleep apnea, and SLA is diagnosed using a full-night PSG report from a specialized sleep center [5]. Various physiological parameters, including respiration, oxygen levels, cardiopulmonary activity, and sleep state, are collected during PSG, as shown in Fig. 1. A trained technician then analyzes the overnight data and examines every part of the waveforms for evidence of a sleep disorder. Because PSG is an unpleasant, time-consuming, and costly test, several researchers have worked on compact and far less costly approaches to SLA diagnosis that use fewer physiological signals, such as blood oxygen saturation, the electrocardiogram, abdominal breathing signals, respiratory motion, and combinations of these signals [6]. Among the techniques that rely on a single modality, single-lead electrocardiographic monitoring systems are reported to have the highest global detection rates relative to the number of patients analyzed [6].
This review addresses the diagnostic techniques used in the area of SLA. Various approaches have been developed in the literature, the majority of which require two steps: handcrafting a set of specific features and developing a suitable classification model to provide an automated diagnosis. Some existing research has concentrated on feature processing, which usually requires a specialized feature extraction process to obtain ECG characteristics from the ECG signal, R-to-R intervals, and ECG-derived respiration (EDR) signals [7]. Several studies used wavelet transforms to derive characteristics of the ECG waveform [8] and recurrence quantification analysis of heart rate variability (HRV) data to capture the complex changes within the cardiopulmonary cycle during OSA [9]. Varon et al. [10] used orthonormal feature space projections to obtain two characteristics resulting from changes in the morphology of QRS complexes, as well as heart rhythm and EDR. The k-nearest neighbor (kNN) [11], hidden Markov model [9], support vector machine (SVM) [3, 10], fuzzy logic [12], least-square SVM [13], linear discriminant analysis (LDA) [9, 11, 13], and neural network [7, 8, 14] are some of the classifiers used in these approaches. These approaches have two issues: first, the unlimited number of features that may be selected, a problem amplified by the fact that merging two or more individually top-ranked features may not ensure a superior feature set [15]; and second, the requirement for extensive domain knowledge in order to build meaningful features. Deep learning (DL) models, which automatically produce features by identifying patterns in the sensor signals, could address both problems.
Although existing reviews in the area of SLA diagnosis have covered testing devices for home diagnosis of SLA, a diagnostic model based on cardiovascular and oximetry sensors [13], various identification approaches [6], and diagnostic and treatment techniques [15], gaps in knowledge remain. A comprehensive study is still required that describes the physiological datasets, surveys the different DNN models, outlines the challenges in datasets and DNN methods, and sets out the future scope of DL-based approaches for identifying SLA. Furthermore, recently published studies indicate that deep networks generally achieve higher accuracy than shallow networks. The primary focus of this study is therefore on analyzing those works and evaluating the performance of the present approaches in order to provide in-depth information about the use of DL in the diagnosis of sleep disorders. The key contributions of this paper are:
A comprehensive survey has been done on the recent technique of DL for SLA, including problem identification, examination of datasets, and a parametric evaluation metric.
A systematic overview has been provided for DL-based techniques and a comparison of each method for an optimal solution.
Different SLA classification methods have been analyzed, and open issues and challenges are discussed to provide future research directions.
In the remainder of the paper, Sect. 2 presents the data selection methodology, Sect. 3 briefly describes the signals and datasets, Sect. 4 discusses data preprocessing, and Sects. 5 and 6 discuss the performance metrics and classifiers, respectively. A discussion and the open issues and challenges are presented in Sects. 7 and 8, respectively. The future scope and conclusion of the paper are given in Sects. 9 and 10, respectively.
Data selection methodology
Search strategy and data selection
The study analyzed research published over the last decade. IEEE Xplore, Web of Science, ScienceDirect, and PubMed were all used to perform a comprehensive search. Because of the spelling variants of the word apnea, the search terms were sleep apnea OR sleep apnoea, combined through the AND operator with: semi-supervised function learning; unsupervised function learning; ‘DBN,’ deep belief network; ‘DNN,’ deep neural network; 1D and 2D ‘CNN,’ one-dimensional and two-dimensional convolutional neural network; autoencoder; ‘RNN,’ recurrent neural network. A total of 347 articles were collected after searching the databases. Subsequently, 167 duplicate papers were excluded so that only one copy of each article was retained.
Data extraction and inclusion criteria
The titles and abstracts of a total of 180 papers were found to be relevant to the subject. Studies that make a categorical distinction between SLA and healthy states were included in this systematic review. The review also included a number of diagnostic analyses that used a multiclass method to distinguish apnea, hypopnea, and the normal (healthy) state. The key terms apnea and deep network were evaluated as part of the inclusion criteria. Nineteen articles were removed because the full-text paper could not be accessed. Non-English articles were also excluded. After that, 18 other articles were excluded because they were not explicitly developed for the diagnosis of sleep disorders, although they could be adapted for that purpose. Rather than rejecting such papers outright, we excluded them only after further analysis. We included two articles that did not appear in the search procedure because of the limited time span used for selecting papers; we added them because both are highly relevant to this review. A total of 47 articles were chosen for this evaluation. Figure 2 depicts the search strategy flow chart based on the PRISMA guidelines [16], showing the number of articles at each stage.
Signal and data set
All physiological inputs used in the system were obtained by either researchers or their collaborators or have been previously collected and recovered from datasets discussed in this section. A detailed overview of each input signal is given in the supplementary data (Table S1).
Based on pulse oxygen saturation signal
The pulse oxygen saturation level (SpO2) measures the level of oxygen in the bloodstream. The Apnea-ECG database (AED) [17] and the University College Dublin SLA database (UCD-Dataset) [18] were obtained from PhysioNet, which is free and publicly accessible. Just eight of the 70 recordings in the AED have SpO2 signals [19]. These records, which range in length from 7 to 10 h and include minute-by-minute annotation [19], were used. The SpO2 signal in this dataset is sampled at a rate of 50 Hz, and the AED data have been used by Pathinarupothi et al. [20], Almazaydeh et al. [21], and Mostafa et al. [22]. Hypopnea (HYP), OSA, and CEN apnea are all annotated in the UCD data, in which the SpO2 signal was sampled every 8 s; these data were used by Almazaydeh et al. [21], Mostafa et al. [22], and Cen et al. [23].
Ravelo-García et al. [13] used data from the Sleep Unit of the Dr. Negrín Hospital in Gran Canaria to compile a sample of 70 subjects. The patients did not have any heart problems. All annotations were produced for 30-s epochs. Biswal et al. [24] used two datasets: one from the Massachusetts General Hospital (MGH) sleep laboratory, with six channels of data from 10,000 subjects, and the Sleep Heart Health Study (SHHS) database [25], which included two central channels from 5804 subjects. While the MGH database contains five sensors, data from four of them (chest, abdomen, pulse-ox, and airflow) were used across both datasets. In Mostafa et al. [26], data were obtained at the University Hospital of Gran Canaria using a VIASYS Healthcare Inc. (Wilmington, USA) system. The dataset, referred to as HuGCDN2008 [27], contains 70 patients ranging in age from 18 to 82 years. Only the SpO2 signal, with a sampling rate of 50 Hz, was used in the study. Leimo et al. [28] used SpO2 signals collected from 1379 home sleep apnea tests (HSAT) of individuals with diagnosed SLA who were tested with SpO2. In another study, Sharma et al. [29] analyzed data from two separate PSG visits spanning a decade, SHHS1 (1995–1998) and SHHS2 (2001–2003), which provided data for a total of 5763 and 2651 subjects, respectively [30]. Of these, 5424 recordings from SHHS1 and 2644 recordings from SHHS2 were included in the study. SpO2 and pulse rate (PR) data are sampled at 1 Hz in both SHHS datasets.
Based on electrocardiogram (ECG)
One of the most widely used datasets for ECG research is the AED data [17]. The ECG recordings range from 430 to 578 min per patient. All 35 annotated recordings are labeled minute by minute to differentiate between apnea and non-apnea; experts inspected and marked each one-minute segment. There are a total of 16,988 one-minute segments, of which 6496 are apnea segments and 10,492 are non-apnea segments. This dataset was used in Tyagi et al. [14], Pathinarupothi et al. [31], Wang et al. [32], Li et al. [33], DeFalco et al. [34], Chang et al. [35], Mashrur et al. [36], Zhang et al. [37], Shen et al. [38], Zarei et al. [39], Gupta et al. [40], Faust et al. [41], Bahrami et al. [42], Liang et al. [43], and Bahrami et al. [44].
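The per-minute labeling described above maps naturally onto fixed-length segments. The following is a minimal sketch, assuming a 100 Hz ECG sampling rate (not stated in this section) and a hypothetical helper name `minute_segments`; it also checks the class balance implied by the segment counts quoted in the text.

```python
import numpy as np

FS_HZ = 100      # assumed ECG sampling rate; not given in this section
SEGMENT_S = 60   # one label per minute, as described in the text


def minute_segments(ecg, labels, fs=FS_HZ):
    """Split a 1-D ECG recording into one-minute segments aligned with labels.

    Trailing samples that do not fill a whole minute are dropped.
    """
    ecg = np.asarray(ecg)
    labels = np.asarray(labels)
    samples_per_segment = fs * SEGMENT_S
    n_segments = min(len(ecg) // samples_per_segment, len(labels))
    segments = ecg[: n_segments * samples_per_segment]
    segments = segments.reshape(n_segments, samples_per_segment)
    return segments, labels[:n_segments]


# Segment counts quoted in the text for the annotated recordings.
apnea_minutes, normal_minutes = 6496, 10492
apnea_fraction = apnea_minutes / (apnea_minutes + normal_minutes)
```

Under these counts roughly 38% of the segments are apnea segments, so the two classes are moderately imbalanced.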
In Banluesombatkul et al. [45], a dataset was collected from the MrOS sleep study (Visit 1) database. The data were obtained from 2911 people aged 65 and above at 6 health centers through a baseline test. The ECG signals in that dataset were collected using Ag/AgCl patch sensors at a sampling rate of 512 Hz. Urtnasan et al. [46] used the PSG recordings of 86 patients (65 M and 21 F) diagnosed with OSA. The PSG data were collected at the Samsung Medical Clinic Laboratory (Seoul, Korea) and are referred to as SCSMC86 [46]. In the same laboratory, night-time PSG records were obtained for 82 patients (63 M and 19 F) and 92 patients (74 M and 18 F), providing the databases SCSMC82 [47] and SCSMC92 [48], respectively.
Erdenebayar et al. [49] used full night-time PSG data from 86 subjects (65 M and 21 F) with a mean age of 58.18 ± 11.02 years (mean ± SD). The PSG records were acquired at the Samsung Medical Center (Seoul, Korea) using an Embla N7000 amplifier system. Li et al. [50] used retrospective PSG records from 148 SLA subjects (114 M and 34 F) and 33 healthy subjects (20 M and 13 F) from Beijing Tongren Hospital (TRECKY2019–049), collected between Jan 2018 and Sep 2019. Iwasaki et al. [51] used the PSG recordings of 59 patients diagnosed with OSA. The PSG data were collected at the Shiga University of Medical Science (SUMS) hospital, and the inclusion criteria were the complete absence of depression, hypertension, diabetes, ischemia, or other endocrine disorders. A sleep expert recorded PSG during sleep (6–7 h) in an EEG shield room. The validation dataset has an AHI of 40.4 and 18.0.
Ravelo-García et al. [52] collected data from a total of 133 subjects (93 M and 40 F) between the ages of 18 and 83. Of the initial 133 patients in the dataset, 36 were excluded because of insufficient sleep duration or cardiac problems, leaving 97 subjects for the study. All recordings include a standard ECG digitized at 200 Hz with a resolution of 16 bits. Olsen et al. [53] used approximately 10,000 PSG sleep recordings from two large population-based datasets, the SHHS [30] and the Multi-Ethnic Study of Atherosclerosis (MESA) [53]. A total of 8444 PSG recordings from the SHHS were included. Eighty-two records that contained less than one consecutive hour of sleep were eliminated. The final 8362 PSG records were used to obtain a single ECG (lead II) sampled at 125 Hz for analysis. Additionally, the 70 records of the AED data were used for testing.
Based on airflow (AF)
Thommandram et al. [54] used the AED data in their proposed study. The archive comprises 70 recordings, each about 8 h long. However, only eight records (age: 43.3 ± 8.3 years, 7 M and 1 F) provide respiratory signals. These AED AF data were also used by Minu et al. [55]. Biswal et al. [24] examined AF signal data from the sleep lab at MGH, which had 10,000 patients, as well as data from the SHHS [25], which had 5804 patients. Choi et al. [56] obtained PSG sensor data from 129 subjects over the age of 20 at the Seoul National University Hospital (SNUH) Center. A PTAF-2 pressure transducer (Pro-Tech, Woodinville, USA) was used to measure the respiratory signals of the 129 patients. Steenkiste et al. [57] used the SHHS-1 dataset [25], which includes data from 5804 subjects aged 40 and up. The breathing signal was sampled at a rate of 10 Hz for 2100 subjects (1008 F and 1092 M). Respiratory data derived from the ECG were also used. Cen et al. [23] evaluated oronasal AF from 23 subjects in the UCD dataset [18]. The respiratory AF signals from the MESA database were used by Haidar et al. [58, 59] and McCloskey et al. [60]. ElMoaqet et al. [61] used PSG data from 17 subjects collected at the Charité-Universitätsmedizin Berlin. The signal from the nasal AF transducer was sampled at a rate of 256 Hz.
Haidar et al. [62] used the MESA dataset, which contains PSG records for 1507 subjects. Cardiopulmonary tracheal motions have also been used to detect respiratory events and the severity of SLA. Hafezi et al. [63] used a dataset of 69 adult patients from the Toronto Rehabilitation Center, in which an accelerometer was attached to the patient's suprasternal notch to measure tracheal breathing movements. Lakhan et al. [64] used AF input taken from the MrOS sleep dataset (Visit 2). A total of 1026 male participants took part in sleep studies at 6 different clinical sites. ProTech thermistor sensors sampled at 32 Hz were used to capture the AF signals. The authors analyzed 520 randomly selected participants from the data. Drzazag et al. [65] calculated the respiratory event index (REI) using the oronasal AF combined with the abdominal and thoracic respiratory effort data from the SHHS-1 and UCD datasets. In the SHHS-1 dataset, 3610 of the 5804 recordings were automatically evaluated based on respiration pulse energy. Additionally, 25 records from the UCD dataset were used for the accuracy evaluation.
Based on sound
Respiratory activity generates distinct sounds that can be used to identify abnormalities. Kim et al. [66] obtained the complete PSG records of 120 subjects at the Seoul National University Bundang Hospital (SNUBH) sleep center. The respiratory sound was captured as part of PSG using a PSG-embedded microphone (SUPR-102) mounted on the ceiling above the subject's bed at a height of 1.7 m. Rosenwein et al. [67] used a database of audio files from 186 subjects, 93 of which were obtained through a PSG examination at the Sleep–Wake Disorder Center (Soroka University). The data were captured with a digital sound recorder (Edirol R-4 Pro) and an attached vertical condenser microphone (RODE NTG-1), mounted 1.0 m above the subject's head. The signals of the remaining 93 subjects were collected at their residences. Romero et al. [68] used an HSAT study that included 103 subjects (67 M and 36 F). Over 1 or 2 nights, each subject completed home SLA testing while the sound was captured concurrently. Wang et al. [69] used data from 194 overnight PSG tests performed at Beijing Tongren Hospital's sleep center. All subjects were above the age of 18, and 162 eligible subjects were registered from Oct 2018 to Jan 2020. A contactless digital audio recorder (Sony, PCM-D10) with a 44,100 Hz sampling frequency was used to record nighttime ambient sleep sounds, time-synchronized with PSG. The audio recorder was placed one meter from the subjects' heads.
Cheng et al. [70] used the audio signals of 33 patients and 10 healthy persons. Microphones were used to capture a total of 4780 abnormal snoring segments in addition to 10,740 regular snoring segments. After the PSG findings were individually screened and analyzed, snoring within two seconds of each episode was categorized as pathological snoring (PAS). Nakano et al. [71] collected tracheal sound (TS) recordings from the PSG data of 1852 patients at the Fukuoka National Center from 2008 to 2016. An audio microphone (Audio-Technica, AT9904; Panasonic, RP-VC3; Sharp, MC-TP2; ECM-PC60) was placed on the neck above the trachea to analyze snoring. Every 0.2 s, the power spectrum was log-transformed and recorded in the computer as decibel readings. Despite discarding half the sample, this approach is adequate to create compressed TS spectrogram images of respiratory episodes.
Data pre-processing
Physiological signals are susceptible to a wide variety of noise and artifacts, including improper lead placement, movement artifacts, power line interference, and many others. To filter out the noise and recover the underlying information, researchers apply the following processes.
Raw signal
Biswal et al. [24] used unprocessed respiratory signals (AF, pulse oximetry, and chest and abdominal belts) as inputs for the DL model. Almazaydeh et al. [21] used pulse oximetry SpO2 signals as inputs to a neural network. Cen et al. [23] used physiological signals, including SpO2, oronasal AF, and abdominal motion. The number of signal samples in every instance is given by

$$N = w \times f_s \times c \tag{1}$$

where $w$ represents the window length of 5, $f_s$ is the sampling frequency of 16 Hz, and $c$ is the number of channels, 3. Mostafa et al. [22] used signals from two distinct databases, with both resampled to a sampling frequency of 1 Hz. As proposed by Haidar et al. [58], the raw respiratory signal measured by respiratory belts around the abdomen and thorax can be used directly as input to a classification algorithm.
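As a worked illustration of the instance size, the number of samples per instance is the product of the window length, the sampling frequency, and the number of channels. The sketch below assumes the window length of 5 is in seconds and that windows do not overlap; neither detail is stated in the text, and the helper name `make_instances` is hypothetical.

```python
import numpy as np

WINDOW_S = 5     # window length w (assumed to be in seconds)
FS_HZ = 16       # sampling frequency f_s
N_CHANNELS = 3   # SpO2, oronasal airflow, abdominal motion

# Samples per instance: window length x sampling frequency x channels.
samples_per_instance = WINDOW_S * FS_HZ * N_CHANNELS


def make_instances(recording):
    """Cut a (n_samples, n_channels) recording into non-overlapping windows.

    Returns an array of shape (n_windows, WINDOW_S * FS_HZ, n_channels).
    Non-overlapping windows are an assumption made for this sketch.
    """
    recording = np.asarray(recording)
    win = WINDOW_S * FS_HZ
    n_windows = recording.shape[0] // win
    trimmed = recording[: n_windows * win]
    return trimmed.reshape(n_windows, win, recording.shape[1])
```

With these values each instance holds 80 time steps across 3 channels, that is, 240 samples in total.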
Filtered signal
In an ECG signal, various filtering methods are used to minimize the effect of noise. Researchers have used different approaches to remove noise from the widely used AED data. In [36], a Chebyshev Type II bandpass filter was designed to retain frequency information ranging from 0.5 to 48 Hz. In other research, a Chebyshev Type II bandpass filter with a passband of 5–11 Hz was applied to the ECG data in order to eliminate unwanted noise [39, 55]. Zarei et al. [39] used a Chebyshev bandpass filter with a frequency range of 0.5–48 Hz to minimize baseline wandering (BW) and power line interference. An automated weight calculation approach is then used to identify the noisy segments, which are eliminated using a threshold level of 0.8.
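A Chebyshev Type II bandpass filter of the kind described above can be sketched with SciPy. The filter order, the 40 dB stopband attenuation, the 100 Hz sampling rate, and the use of zero-phase filtering are all assumptions made for illustration; the reviewed studies do not report them here.

```python
import numpy as np
from scipy.signal import cheby2, sosfiltfilt


def cheby2_bandpass(ecg, fs=100.0, low_hz=0.5, high_hz=48.0,
                    order=4, stop_atten_db=40.0):
    """Apply a zero-phase Chebyshev Type II bandpass filter to a 1-D signal.

    For a Type II design, the two critical frequencies are the stopband
    edges, so the -3 dB points lie slightly inside [low_hz, high_hz].
    Order, attenuation, and sampling rate are illustrative assumptions.
    """
    sos = cheby2(order, stop_atten_db, [low_hz, high_hz],
                 btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, np.asarray(ecg, dtype=float))
```

Second-order sections are used because they are numerically better behaved than transfer-function coefficients for bandpass designs with a very low lower edge. Slow baseline drift well below 0.5 Hz is strongly attenuated, while components in the middle of the band pass essentially unchanged.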
Various undesirable signals affect the ECG signal, so its spectrogram is similarly contaminated by noise. The noise in the distorted spectrogram is removed using Savitzky–Golay (S–G) filtering [40]:

$$y[n] = \sum_{k=-M}^{M} c_k \, x[n+k] \tag{2}$$

where $x[n]$ represents the input sequence, $c_k$ represents the polynomial coefficients over a window of $2M+1$ samples, and $y[n]$ represents the filtered output sequence. In [45], the source ECG signal was filtered using a 60 Hz notch filter, followed by a second-order Butterworth bandpass filter with cut-off frequencies of 5 and 35 Hz. In other methods, a bandpass filter with cut-off frequencies of 5 and 11 Hz was applied in [54–56], and a finite impulse response (FIR) bandpass filter with cut-off frequencies of 0.5 and 30 Hz was used in [49], to remove unwanted noise from the raw ECG data.
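S–G filtering replaces each sample with a weighted sum of its neighbors, where the weights come from a local least-squares polynomial fit. The sketch below uses SciPy's implementation; the window length of 11 samples and the cubic polynomial are assumptions chosen for illustration, not values reported in [40].

```python
import numpy as np
from scipy.signal import savgol_coeffs, savgol_filter

WINDOW_LENGTH = 11   # 2M + 1 samples (assumed)
POLY_ORDER = 3       # degree of the local polynomial fit (assumed)


def sg_smooth(x):
    """Smooth a 1-D sequence with a Savitzky-Golay filter."""
    return savgol_filter(np.asarray(x, dtype=float), WINDOW_LENGTH, POLY_ORDER)


# The fixed weights applied to every interior window of the signal.
sg_weights = savgol_coeffs(WINDOW_LENGTH, POLY_ORDER)
```

Away from the edges, the output of `sg_smooth` is the convolution of the input with `sg_weights`, and any polynomial of degree at most `POLY_ORDER` passes through the filter unchanged.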
During long periods of respiratory recording, the signals are altered by muscle movement. Choi et al. [56] minimized high-frequency noise and BW in the raw signal by filtering it first with a high-pass filter (0.01 Hz) and then with a low-pass filter (3 Hz), using 5th-order IIR Butterworth filters. In Steenkiste et al. [57], physiological breathing signals are filtered using a 4th-order low-pass zero-phase-shift Butterworth filter with a cut-off frequency of 0.7 Hz to minimize noise. In ElMoaqet et al. [61], all respiratory signals were brought to the same sampling rate, with the nasal pressure signal down-sampled to 32 Hz, and the signals were preprocessed using a low-pass FIR filter (0.5 Hz). A third-order Butterworth bandpass filter with cut-off frequencies of 0.1 and 25 Hz was used to filter the respiration-related tracheal motion signal [63]. The 0.1 Hz cut-off frequency was chosen to eliminate the influence of baseline drift in the motion signal, and the 25 Hz cut-off frequency was chosen to suppress snoring-induced high-frequency vibrations.
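A zero-phase low-pass Butterworth filter such as the one described for breathing signals can be sketched as follows. The 10 Hz sampling rate is taken from the dataset description earlier in the paper; running the filter forward and backward to obtain zero phase shift is an assumption about how the zero-phase behavior was achieved.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def lowpass_zero_phase(x, fs=10.0, cutoff_hz=0.7, order=4):
    """Low-pass a 1-D respiratory signal without introducing phase lag.

    The filter is applied forward and backward, which cancels the phase
    shift and doubles the effective attenuation of the designed filter.
    """
    sos = butter(order, cutoff_hz, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(sos, np.asarray(x, dtype=float))
```

Because forward-backward filtering squares the magnitude response, a designed order of 4 behaves like a steeper filter overall; whether the order quoted in the text refers to the single-pass or the combined response is not stated.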
The collected sleep sound recordings also included noise from other sources, such as the PSG equipment, the duvet, and patient-clinician conversations. Kim et al. [66] reduced the noise in the respiration sound using a two-stage filtering procedure. Initially, a spectral subtraction filtering approach [72] was applied to increase the efficiency. Then, sleep stage filtering was performed to remove noise from duvets and conversations.
Signal normalization
Signal normalization rescales the data to a common range so that amplitude differences between subjects, sensors, and recording conditions do not dominate the analysis. In Choi et al. [56], an adaptive normalization technique was used to handle segments in which the magnitude of respiration was low owing to a long period in the same sleeping posture. For each second, the area and standard deviation (SD) of the filtered signal were calculated [56]. In Haidar et al. [58], each sample was normalized using the mean ($\mu$) and SD ($\sigma$) of the common episodes for every subject to balance the differences between signals (nasal, thoracic, and abdominal). The normalized signal is

$$\hat{x}_{s,n} = \frac{x_{s,n} - \mu_{s,n}}{\sigma_{s,n}} \tag{3}$$

where $x_{s,n}$ is the raw signal for subject $s$ and type $n$ (which is either thoracic, nasal, or abdominal). McCloskey et al. [60] used the AF signal for all segments between the start and end of a sleep event and applied the normalization in Eq. (3) to the type n = 1 signal. In Drzazag et al. [65], the offset of each data point was eliminated by subtracting the signal's average for each instance. A scaling factor is then computed for each input and multiplied by the input.
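Per-subject normalization of this kind subtracts a reference mean and divides by a reference standard deviation. The sketch below is a minimal illustration; the function name and the optional `reference` argument (the episodes from which the statistics are computed) are hypothetical.

```python
import numpy as np


def normalize_per_subject(x, reference=None):
    """Z-score a 1-D signal using the mean and SD of a reference segment.

    If no reference segment is given, the statistics of x itself are used.
    """
    x = np.asarray(x, dtype=float)
    ref = x if reference is None else np.asarray(reference, dtype=float)
    sigma = ref.std()
    if sigma == 0:
        raise ValueError("reference segment has zero variance")
    return (x - ref.mean()) / sigma
```

Applying the function separately to each subject and signal type removes differences in offset and amplitude between sensors before the data reach the network.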
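Batch Normalization (BN) is a deep learning technique used in CNNs to improve training efficiency and reduce overfitting by normalizing the input to each layer. A change in the input distribution of one layer during model training affects all subsequent outputs, which can make it challenging to train networks with saturating nonlinearities. BN was applied to address this issue in [37, 45]. Suppose $x = (x^{(1)}, \ldots, x^{(d)})$ is the input to a $d$-dimensional layer. Each dimension $k$ is normalized and transformed as

$$\hat{x}^{(k)} = \frac{x^{(k)} - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y^{(k)} = \gamma \, \hat{x}^{(k)} + \beta \tag{4}$$

where $\mu_B$ and $\sigma_B^2$ are the mean and variance of a mini-batch, computed per dimension, $\epsilon$ is a small constant added for numerical stability, and $\beta$ and $\gamma$ are the learnable shift and scale parameters.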
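The BN transform can be written in a few lines of NumPy. This is a training-mode sketch only: it uses the statistics of the current mini-batch and omits the running averages that frameworks keep for inference. The value of `eps` is a typical default and an assumption.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch of shape (batch, d) per feature dimension.

    gamma and beta are the scale and shift parameters of shape (d,);
    eps is a small constant that avoids division by zero.
    """
    x = np.asarray(x, dtype=float)
    mu = x.mean(axis=0)          # per-dimension mini-batch mean
    var = x.var(axis=0)          # per-dimension mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

With `gamma` set to ones and `beta` set to zeros, every output dimension has approximately zero mean and unit variance regardless of the scale of the input.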
Spectrogram
Biswal et al. [24] segmented each 30-s epoch into sub-epochs of 2-s length with 1-s overlap for the spectrogram representation of the EEG and EMG records. Erdenebayar et al. [49] used the short-time Fourier transform (STFT) to convert the ECG into 2D spectrogram images in order to obtain a 2D input as:

$$X(\tau, \omega) = \sum_{n=-\infty}^{\infty} x[n] \, w[n - \tau] \, e^{-j \omega n} \tag{5}$$

where $\tau$ and $\omega$ represent time and frequency, respectively, $x[n]$ is the signal, and $w[n]$ is the window function with a 128-point window length and a 127-point overlap. Mashrur et al. [36] presented a signal-transforming pipeline in which each epoch is turned into a scalogram, a visual representation of time and frequency. The 60-s ECG scalogram (time–frequency) is generated using a continuous wavelet transform.
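A spectrogram with a 128-point window and a 127-point overlap advances the window by one sample per frame. The NumPy sketch below illustrates this; the Hann window is an assumption, since the text does not name the window type, and the function name is hypothetical.

```python
import numpy as np


def stft_spectrogram(x, n_window=128, n_overlap=127):
    """Return the power spectrogram of a 1-D signal, shape (n_freq, n_frames).

    Frames are taken every (n_window - n_overlap) samples, multiplied by a
    Hann window (assumed), and transformed with a real FFT.
    """
    x = np.asarray(x, dtype=float)
    hop = n_window - n_overlap
    window = np.hanning(n_window)
    frames = np.lib.stride_tricks.sliding_window_view(x, n_window)[::hop]
    spectrum = np.fft.rfft(frames * window, axis=1)
    return (np.abs(spectrum) ** 2).T
```

A signal of N samples yields N - 127 frames and 65 frequency bins, so the heavy overlap produces a dense image along the time axis at the cost of highly redundant columns.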
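The Gabor transform (GT) was used by Gupta et al. [40] to convert the 1-D ECG into a 2-D Gabor spectrogram (GS). The GT is often used to represent signal properties that do not remain stationary. It isolates a portion of the time-series signal by making use of a Gaussian window. The GT of a given input signal $x(t)$ is

$$G_x(\tau, \omega) = \int_{-\infty}^{\infty} x(t) \, e^{-\pi (t - \tau)^2} \, e^{-j \omega t} \, dt \tag{6}$$

where $\tau$ and $\omega$ denote the time shift and the angular frequency, respectively.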
In Nakano et al. [71], all PSG data were used to create 60-s sound spectrograms. TS spectrograms (300 columns, equivalent to 60 s, and 64 rows, equivalent to 22 to 700 Hz; 24-bit color, 64 × 300 cells) were created from thousands of respiration episodes.
Feature analyses
Almazaydeh et al. [21] used the SpO2-based delta index and oxygen desaturation of 3% as two oximetric features, together with one non-linear metric of the signal. Ravelo-García et al. [13] used a combination of SpO2 and RR interval (RRI) features; both time- and frequency-domain characteristics were derived from the RR sequence and the oxygen saturation signal. DeFalco et al. [34] used 12 standard parameters from the heart rate signal of the ECG data, covering the time domain, frequency domain, and non-linear domain. Lakhan et al. [64] extracted 17 characteristics from sleep-time AF data to be used as input for the classification system.
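Time-domain features derived from the RR sequence are typically simple summary statistics. The sketch below computes three widely used heart rate variability measures as examples; they are illustrative and are not claimed to be the exact feature sets used in [13] or [34].

```python
import numpy as np


def rr_time_domain_features(rr_ms):
    """Compute basic time-domain HRV features from RR intervals in ms.

    mean_rr: mean RR interval
    sdnn:    sample standard deviation of the RR intervals
    rmssd:   root mean square of successive RR differences
    """
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)
    return {
        "mean_rr": float(rr.mean()),
        "sdnn": float(rr.std(ddof=1)),
        "rmssd": float(np.sqrt(np.mean(diff ** 2))),
    }
```

Features of this kind are computed per segment (for example, per minute) and stacked into a vector that is passed to the classifier.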
Performance metrics
Table 1 shows the parameters and calculations used to determine the metrics (accuracy, sensitivity, specificity, and other parameters) in terms of true negative (TN), true positive (TP), false negative (FN), and false positive (FP) values. The receiver operating characteristic (ROC) curve is used to assess apnea detection performance at various classification thresholds, and the area under the ROC curve (AUC) is measured to assess overall performance. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR), and the area underneath it reaches a maximum value of 1. The positive-predictive value and sensitivity are used to calculate the F-measure. A weighted proportion, $w_i = n_i / N$, is applied to the per-class F1 scores to determine the weighted F1, $F1_w = \sum_i w_i \, F1_i$, where $i$ is the class index, $N$ represents the total number of samples, and $n_i$ is the number of samples in class $i$.
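Table 1.
Parameters | Calculation |
---|---|
Accuracy (Acc) | $(TP + TN) / (TP + TN + FP + FN)$ |
Specificity (Spe) | $TN / (TN + FP)$ |
Sensitivity (Sen) | $TP / (TP + FN)$ |
Positive-predictive value (PPV) | $TP / (TP + FP)$ |
Negative-predictive value (NPV) | $TN / (TN + FN)$ |
True positive rate (TPR) | $TP / (TP + FN)$ |
False positive rate (FPR) | $FP / (FP + TN)$ |
F-measure (F1) | $2 \cdot PPV \cdot Sen / (PPV + Sen)$ |
Weighted F1 ($F1_w$) | $\sum_i w_i \, F1_i$, where $w_i = n_i / N$ |
AUC | $\int_0^1 TPR \, d(FPR)$ |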
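The metrics in Table 1 follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions; the function names are hypothetical, and the weighted F1 follows the description in the text (per-class F1 weighted by the share of samples in each class).

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute standard binary classification metrics from confusion counts."""
    sen = tp / (tp + fn)                      # sensitivity / TPR
    spe = tn / (tn + fp)                      # specificity
    ppv = tp / (tp + fp)                      # positive-predictive value
    npv = tn / (tn + fn)                      # negative-predictive value
    acc = (tp + tn) / (tp + tn + fp + fn)     # accuracy
    fpr = fp / (fp + tn)                      # false positive rate
    f1 = 2 * ppv * sen / (ppv + sen)          # F-measure
    return {"acc": acc, "sen": sen, "spe": spe, "ppv": ppv,
            "npv": npv, "fpr": fpr, "f1": f1}


def weighted_f1(f1_per_class, n_per_class):
    """Average per-class F1 scores weighted by the class sample counts."""
    total = sum(n_per_class)
    return sum(f * n / total for f, n in zip(f1_per_class, n_per_class))
```

For example, with TP = 80, TN = 90, FP = 10, and FN = 20, the accuracy is 0.85, the sensitivity 0.80, and the specificity 0.90.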
Classifiers
The researchers used three types of deep networks: CNN, RNN, and deep vanilla neural network (DVNN). A detailed discussion of each DL model is provided in the Supplementary data sheet.
CNN
CNNs are a form of DNN that is commonly used in image recognition, speech recognition, and signal analysis. For signal classification, two different types of CNN are frequently used: one-dimensional CNN (1D-CNN) and two-dimensional CNN (2D-CNN).
1D-CNN
A 1D-CNN block is made up of four sublayers: 1D-convolution, activation, 1D-pooling, and dropout. In the convolutional (Conv) layer, 1D-convolution is used to extract feature maps from the signal. Mostafa et al. [26] used a greedy-based optimization algorithm to optimize a CNN model's effectiveness in detecting apnea events from a 1D SpO2 signal. For apnea detection, the weighted-topology transfer with rough estimation was found to be the most accurate, with an accuracy of 88.49 percent for the HuGCDN2008 data and 95.14 percent for the AED dataset. Mostafa et al. [27] used the non-dominated sorting genetic algorithm-II (NSGA-II), a multi-objective evolutionary algorithm, to optimize the model parameters of a CNN for detecting abnormal respiration from signal segments. The CNN used a structure of Conv layers, nonlinear layers (ReLU), and a fully-connected (FC) layer with 2 outputs followed by a softmax layer. Three distinct input widths and datasets were evaluated, with the most effective configuration achieving an average Acc of 94.0 percent. Choi et al. [56] used a CNN with only a nasal pressure signal to estimate the AHI. A CNN structure of 3 Conv layers, 2 max-pooling (MPL) layers, and 2 FC layers was used. The first MPL layer subsamples the filtered signal. This process was repeated for the 2nd and 3rd Conv layers and the 2nd MPL layer, resulting in a total of 30 output signals. These signals were then fed to the FC layers of 50 units, and the model obtained an Acc of 96.6 percent.
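The four sublayers of a 1D-CNN block (convolution, activation, pooling, and dropout) can be sketched as a forward pass in NumPy. This is an illustration of the data flow only, with random weights and no training; the kernel size, number of filters, pool size, and dropout rate are assumptions and do not reproduce any of the reviewed architectures.

```python
import numpy as np


def conv1d(x, kernels, stride=1):
    """Valid 1-D convolution (cross-correlation, as in DL frameworks).

    x: (length, in_channels); kernels: (k, in_channels, out_channels).
    Returns an array of shape (n_out, out_channels).
    """
    k = kernels.shape[0]
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=0)[::stride]
    # windows has shape (n_out, in_channels, k)
    return np.einsum("nck,kco->no", windows, kernels)


def relu(x):
    return np.maximum(x, 0.0)


def max_pool1d(x, pool=2):
    """Non-overlapping max pooling along the time axis."""
    n = (x.shape[0] // pool) * pool
    return x[:n].reshape(-1, pool, x.shape[1]).max(axis=1)


def dropout(x, rate, rng, training=True):
    """Inverted dropout: active only during training."""
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)


def cnn_block(x, kernels, rng, pool=2, rate=0.2, training=False):
    """Conv -> ReLU -> max-pool -> dropout, in that order."""
    return dropout(max_pool1d(relu(conv1d(x, kernels)), pool), rate, rng, training)
```

A 100-sample single-channel input passed through 8 filters of length 5 and a pool size of 2 produces a 48 × 8 feature map, which a following block or an FC layer would take as input.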
Leimo et al. [28] developed a CNN model to estimate SLA severity using only overnight SpO2 data. The CNN model consists of four Conv layers: two with kernel size (200 × 10), followed by an MPL layer (2 × 2), and two with kernel size (300 × 5), with global average pooling after the fourth Conv layer. The presented model achieved an Acc of 88.3 percent and a Sen of 90.9 percent. Sharma et al. [29] presented an epoch-based DL method for SLA diagnosis with the widely used PR and SpO2 inputs from pulse oximeters. Three 1D Conv layers with a kernel size of 5, filter counts of 32, 32, and 8, and strides of 2, 1, and 1 process the input. The dropout layers use a probability of 0.2 and the MPL layers a pool size of 3. For the SHHS1 and SHHS2 datasets, the model achieved an Acc of 84.3 and 82.2 percent, respectively. Wang et al. [32] used a CNN and a residual network with RRI data for the detection of apnea segments. The CNN has 7 Conv and 2 dense layers, while the residual network has 33 Conv layers in 31 residual blocks and 1 dense layer. The accuracy of the CNN and the residual network was 90.9 percent and 94.4 percent, respectively. Chang et al. [35] used an SLA prediction model based on a deep 1DCNN using ECG data. The CNN architecture is composed of ten equivalent CNN-based layers, a flattened layer, and four equivalent classification layers primarily made up of FC networks. The method obtained the highest accuracy of 87.9 percent for per-minute apnea diagnosis.
Shen et al. [38] proposed a weighted-loss, time-dependent classification model with a multiscale feature extraction method for OSA identification. The RRI segments were automatically processed to extract variational features using the MSDA-1DCNN framework, with 2 Conv layers of 256 filters, kernel size 7, stride 1, and the ReLU activation function, and the model obtained an Acc of 89.4 percent. Bahrami et al. [44] detected SLA using CNNs and hybrid DNN models. ZF-Net and Alex-Net are popular CNN structures used in signal processing. The presented ZF-Net had 96 kernels (7 × 1) followed by BN and MPL (3 × 1) in the first layer; 256 kernels (5 × 1) followed by BN and MPL (3 × 1) in the next layer; and 2 Conv layers with 384 kernels (3 × 1) and 256 kernels (3 × 1), each followed by BN, with MPL (3 × 1) in the last layer. The flattened data were provided to two FC layers (48 × 2 nodes), and the model obtained an Acc of 87.36 percent. The other CNN model, Alex-Net, had 96 kernels (11 × 1) followed by MPL (3 × 1) and BN in the first layer; 256 kernels (5 × 1) followed by MPL (3 × 1) and BN in the next layer; and 3 Conv layers with 512 kernels (3 × 1), 1024 kernels (3 × 1), and 512 kernels (3 × 1), followed by MPL (3 × 1) in the last layer. The flattened data were provided to two FC layers (209 × 2 nodes), and the model obtained an Acc of 87.09 percent.
Urtnasan et al. [46] used a multi-class CNN structure to diagnose SLA events using ECG segments from SAHS subjects. The CNN structure used 1D-Conv, MPL (size of 1 × 2), dropout (p = 0.25), and FC layers with a softmax function. The six-layer CNN obtained a mean F1-score of 87 percent and an Acc of 90.8 percent for all classes. Urtnasan et al. [47] proposed an automatic detection system. The CNN model is made up of several Conv layers, an activation function (ReLU), an MPL (size of 1 × 2), a dropout (rate of 0.25), and an FC layer. The CNN with six layers obtained the best Acc of 96 percent. Erdenebayar et al. [49] analyzed the utility of 6 DL models, including 1D-CNN, for detecting apnea events. An accuracy of 98.5 percent was achieved using a six-layer Conv network with three kernels (sizes of 50 × 1, 30 × 1, and 10 × 1), 1D pooling (size 1 × 2), the ReLU activation function, a dropout layer (0.25), and an FC layer. Haidar et al. [58] fed three 1D signals segmented into 30-s epochs to a 1DCNN for OSA detection. The CNN consists of 6 Conv layers (32 filters), 3 MPLs (size 1 × 2), and 1 FC softmax output layer. The collective use of the three channels significantly outperformed both individual channels and paired combinations, achieving an accuracy of 83.5 percent. In another study, Haidar et al. [59] proposed a method for detecting apnea-hyp episodes from the nasal AF signal. The model consists of three 1D Conv layers (30 filters, kernel size 5), followed by an MPL and an FC layer with a softmax activation. The CNN was trained with backpropagation to minimize the cross-entropy loss and obtained an average Acc of 75.0 percent.
McCloskey et al. [60] analyzed the nasal AF signal, normalized in 30 s epochs, from a dataset of 1,507 subjects to detect OSA events using a CNN and wavelets. The normalized signal of each epoch was fed into the 1DCNN. Each 30 s epoch had 960 attributes. Each Conv layer of kernel size (3 × 1) is preceded by the MPL, which is followed by another Conv layer of kernel size (2 × 1) with a stride of 2, and the model obtained an Acc of 77.6 percent. Haidar et al. [62] used CNN models to detect SLA instances based on the analysis of respiratory signals. The number of Conv layers was assigned to 32 with the ReLU activation function, and a pooling layer of size two (Conv-MPL) was used in a three-cascading structure followed by an FC layer. The CNN model achieved the highest outcomes, with an accuracy of 80.78 percent. Wang et al. [69] developed OSAnet, a deep CNN approach for event-by-event OSA identification using sleep sounds from a contactless sound recorder. OSAnet's layers included a feature extraction and fusion layer. Dimensionality-reduction layers reduced the input dimension from 64 to 1, and ResNet50-like network structures were used to extract features. The 2D Conv layers and MPL layers of the ResNet50-like structures were replaced with 1D layers. The network consists of two Conv layers of 16 filters followed by an MPL, then two Conv layers of 32 filters and two of 64 filters, with each pair of Conv layers followed by an MPL layer. The presented method identified severe cases with a Sen of 95.6 percent and a Spe of 91.6 percent.
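The epoch-based input described for the 1DCNN of McCloskey et al. [60] can be sketched as follows. The 32 Hz sampling rate is inferred here from the reported 960 attributes per 30 s epoch (960 / 30 = 32) and is not stated in the text. The min-max normalization and the synthetic sine-wave recording are assumptions made only for illustration, since the source does not say which normalization was applied.

```python
import math


def split_into_epochs(signal, fs_hz, epoch_s):
    """Cut a signal into consecutive, non-overlapping epochs; drop any remainder."""
    n = int(fs_hz * epoch_s)
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def min_max_normalize(epoch):
    """Scale an epoch to the range [0, 1] (assumed normalization scheme)."""
    lo, hi = min(epoch), max(epoch)
    if hi == lo:
        return [0.0] * len(epoch)
    return [(v - lo) / (hi - lo) for v in epoch]


fs_hz = 960 / 30   # 32 samples per second, inferred from 960 attributes per 30 s
epoch_s = 30

# Hypothetical 95 s airflow-like recording: a 0.25 Hz breathing oscillation.
recording = [math.sin(2 * math.pi * 0.25 * t / fs_hz) for t in range(int(fs_hz * 95))]

epochs = [min_max_normalize(e) for e in split_into_epochs(recording, fs_hz, epoch_s)]
print(len(epochs), len(epochs[0]))  # 3 960
```

Only three full epochs fit into 95 s, so the final 5 s are discarded; a real pipeline would have to decide whether to pad or drop such a remainder.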
2D-CNN
Cen et al. [23] developed a method to detect events based on 1-s annotation using a mixture of SpO2, oronasal AF, and ribcage and abdominal motions with two layers of convolution and subsampling. The first Conv operation has six feature vectors, while the next Conv layer increases the feature vectors to twelve. Every Conv layer is followed by two subsampling levels with scale sizes of 2, and the model obtained an Acc of 79.6 percent. Mashrur et al. [36] presented a scalogram-CNN (SCNN) network that has nine layers, including three 2D-Conv layers, three MPLs, and one FC layer. The scalogram image was used as input to the SCNN model. The first Conv layer has 512 kernels, the second has 256, and the third has 128. To avoid model overfitting, dropout was used in the FC layer, and the model obtained an Acc of 94.38 and 81.86 percent for the AED and UCD data, respectively. Gupta et al. [40] presented a DL model for automated OSA diagnosis that aims at high efficiency by utilizing the smoothed GS (SGS) of ECG signals. The GS and SGS of ECG signals were provided as input to the pretrained Squeeze-Net and Res-Net50 and to a purpose-designed OSA-CNN network (OSACN-Net). In addition to a classification layer, the OSA-CNN has a total of five layers network: four Conv layers, two MPLs, two FC Conv layers, and one softmax layer. The Conv layers use 96, 84, 48, and 128 filters with a stride of 2. The average classification Acc for SGS images using Squeeze-Net, Res-Net50, and OSACN-Net was 90.34, 94.51, and 94.81 percent, respectively.
Erdenebayar et al. [49] analyzed multiple DL models, including 2DCNN, for detecting apnea events from single-lead ECG data. A seven-layer Conv network with three kernel sizes (50 × 2, 30 × 2, and 10 × 2), MPL (size 2 × 2), an activation function, a dropout layer (p = 0.25), and an FC layer achieved an accuracy of 95.9 percent. McCloskey et al. [60] proposed a 2DCNN approach for detecting three types of incidents (OSA, hyp, and normal) using wavelet transform spectrogram images of nasal AF. The CNN comprised two Conv layers of 56 filters with (10 × 10) kernel size, followed by activation function layers, an MPL (size of 2 × 2), an FC layer, and a 3-node softmax layer, and obtained an Acc of 79.8 percent. Romero et al. [68] proposed a DL technique for OSA screening using sleep audio recordings. The power spectrogram was processed using a bank of 64 Mel filters with overlapped pass-bands from 70 Hz to 7.5 kHz to obtain Mel-spectrograms. The CNN has three Conv layers, each with a different kernel size and a different number of filters (16, 32, and 64). The CNN was regularized with a 0.3 dropout rate. Finally, an FC layer with one sigmoid activation unit classified events as apneic or normal, and the model obtained an AUC of 0.92 and a Spe of 93 percent. Nakano et al. [71] used spectrogram images from TS records as input to a CNN to identify SLA episodes. A five-layered neural network was observed to perform very well at differentiating these events. The network included 3 Conv and 2 FC layers. Three MPLs, 2 BN layers, and 4 activation function layers were added to optimize the model. For each subject in the validation set, a 60 s TS spectrogram image was created every 30 s (50% overlap). The presented network has a Sen of 98.0 percent and a Spe of 76.0 percent.
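The overlapping windowing reported for Nakano et al. [71], a 60 s spectrogram image created every 30 s, can be sketched as below. The function name and the 180 s recording length are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def sliding_windows(total_s, window_s, hop_s):
    """Return the (start, end) times in seconds of every full window."""
    return [(s, s + window_s) for s in range(0, total_s - window_s + 1, hop_s)]


window_s, hop_s = 60, 30           # 60 s window, new window every 30 s
overlap = 1 - hop_s / window_s     # fraction shared by consecutive windows

windows = sliding_windows(total_s=180, window_s=window_s, hop_s=hop_s)
print(overlap)   # 0.5
print(windows)   # [(0, 60), (30, 90), (60, 120), (90, 150), (120, 180)]
```

With 50% overlap, every second of the recording except the first and last 30 s appears in two windows, which increases the number of training images at the cost of correlated samples.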
RNN
RNNs are well suited to sequential inputs and time-series data because the network's previous output is fed back into the network together with the current input. The analyzed works used two forms of RNN: long short-term memory (LSTM) and gated recurrent units (GRU).
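The feedback mechanism can be illustrated with a single-unit Elman-style recurrence, h_t = tanh(w_x x_t + w_h h_{t-1} + b). This is a deliberately simplified sketch rather than an LSTM or GRU, and the weights are hypothetical fixed values instead of trained parameters.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a one-unit recurrent layer over a sequence and return all hidden states."""
    h = h0
    states = []
    for x in inputs:
        # The previous state h is fed back and combined with the current input x.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# A single impulse followed by zeros: later states are nonzero only through feedback.
states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
print([round(s, 3) for s in states])
```

The hidden state stays positive after the input has returned to zero, so information from earlier time steps influences later outputs. LSTM and GRU cells add gates that control how much of this state is kept or overwritten.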
LSTM
Pathinarupothi et al. [20] used the LSTM-RNN technique to detect OSA on a minute-by-minute basis using SpO2 and instantaneous heart rate (IHR) data. The structure is composed of three layers: the input layer contains 30 neurons (or 60 in the case of SpO2), the hidden layer consists of 32 memory blocks with one neuron each, and the output layer includes two neurons, one per class. Minute-to-minute IHR data gave an accuracy of 89.0 percent, and SpO2 gave an even higher Acc of 95.5 percent. Van Steenkiste et al. [57] used an LSTM network to identify SLA from chest and abdomen respiratory data. Each LSTM model was trained on balanced datasets and had one LSTM layer, 3 dropout layers, and an output unit at the end. A performance analysis was carried out for three respiratory signals using temporal and non-temporal models. The temporal models, LSTM and FLSTM, outperformed the non-temporal models of random forest, logistic regression, and ANN.
Pathinarupothi et al. [31] used a 2-layer LSTM-RNN with two memory blocks in every layer on HRV data. The existing algorithms used a single hidden layer with between 2 and 32 LSTM blocks, a learning rate of 0.1, and 150 epochs. This method achieved a precision and an F1 score of 1. Faust et al. [41] developed LSTM models to accurately diagnose sleep apnea using HRV data. The RR input series is passed simultaneously backward and forward through two different LSTM models. The Adam optimizer was used to train the model, with a batch size of 1024 and a learning rate of 1e-3, and the model achieved an Acc of 99.80 percent. Bahrami et al. [44] used RNN and hybrid DNN networks to predict SLA. The designed LSTM and bidirectional LSTM (BiLSTM) have a three-layer, two-cell structure, with each cell receiving a 2D input. Separate output dimensions were selected for the LSTM and BiLSTM structures. The accuracy of the presented LSTM and BiLSTM was 82.52 percent and 82.45 percent, respectively. Urtnasan et al. [48] used single-lead ECG data segmented into 10-s episodes to diagnose apnea occurrences using a deep LSTM-RNN model, which consisted of six recurrent layers. This approach achieved an Acc of 98.5 percent and an F1 measure of 98 percent. Erdenebayar et al. [49] analyzed multiple DL models, including LSTM, for detecting apnea events. The LSTM model, which consists of 3 layers of RNNs with 60, 80, or 120 memory cells each, was followed by BN and dropout on the output feature maps and achieved an Acc of 98 percent.
Iwasaki et al. [51] developed an SLA assessment method that used an LSTM and HRV data. The model had an LSTM layer with 32 nodes in a hidden layer and was trained for 150 epochs using the Adam optimizer with a learning rate of 0.01. The proposed approach obtained a Sen and Spe of 100 percent in identifying individuals with moderate-to-serious SLA. ElMoaqet et al. [61] used an LSTM and a BiLSTM network for feature extraction and diagnosis of apnea episodes from single respiration channel inputs. The memory cell counts of the 1st and 2nd BiLSTM layers were set at 100 (100 × 2 LSTMs) and 40 (40 × 2 LSTMs), respectively. In both detection models, the nasal pressure (NPRE) signals produced better detection performance. In comparison to the LSTM, the BiLSTM-based model performed better with oronasal thermal AF and ABD signals. For the NPRE signal, the LSTM and BiLSTM obtained an accuracy of 85.1 percent and 85 percent, respectively. Drzazag et al. [65] suggested an approach that uses a structure based on LSTM networks to find the locations of apnea or hypopnea events during sleep. The input was processed by an LSTM layer of 150 units, followed by dropout (factor 0.5), a dense layer (30 units), ReLU activation, and an FC softmax layer at the output. For the SHHS-1 and UCD databases, the model achieved an average Acc of 80.66 and 82.04 percent, respectively. Cheng et al. [70] presented an LSTM-based classifier to distinguish snoring related to respiratory events from normal snoring. The properties of snoring are represented by the Mel-frequency cepstrum coefficients (MFCC), Mel-filter banks (Fbanks), short-time energy, and linear prediction coefficients (LPC), which were extracted as features. The input data was processed using a single-layer LSTM with 100 units, followed by a dropout layer and a single-layer LSTM with 50 units. The snoring binary classifier achieved 95.3% accuracy, allowing it to be employed in the alternative diagnosis of the OSA condition.
GRU
Urtnasan et al. [48] used a GRU-RNN to evaluate nighttime ECG signals and develop a method for automatically identifying sleep-disordered breathing (SDB) episodes. All ECG segments had the same length of 10 s and were shaped as 2000 × 1. The structure is made up of six layers of RNNs, each with a distinct number of memory cells. The GRU-RNN method obtained an Acc and a weighted F1 measure of 99.0 percent. Erdenebayar et al. [49] analyzed multiple DL models, including GRU, for detecting apnea events. The GRU model, which consists of 3 layers of RNNs with 60, 80, or 120 memory cells each, was followed by BN and dropout on the output feature maps and achieved an Acc of 99 percent. Bahrami et al. [44] used the ECG signal with RNN and hybrid DNN models to identify SLA. The designed GRU network has a three-layer, two-cell structure, with each cell receiving a 2D input. The Acc and Spe of the presented model were 82.93 and 88.63 percent, respectively. Olsen et al. [53] used a bidirectional GRU on the ECG signal to identify irregular respiration events during sleep in diverse and large cohorts. The classifier structure includes GRU units with ReLU activation, an MPL to reduce the dimensionality, and subsequent BN and dropout layers to reduce overfitting. The presented method used 64, 128, 256, and 512 filters for the recurrent and densely connected layers. The model achieved a Sen of 70.9 percent and an Acc of 84.9 percent.
Deep vanilla neural network (DVNN)
The researchers of the reviewed papers used three forms of DVNN: multi-hidden layers neural networks (MHLNN), stacked sparse autoencoders (SSAE), and deep belief networks (DBN).
MHLNN
Almazaydeh et al. [21] used a three-layered feed-forward neural network as a classifier for OSA detection using the SpO2 signal. The oxygen desaturation index and the delta index are fed into the neural network, which achieves an epoch-based Acc of 93.3 percent on the Apnea database. DeFalco et al. [34] used evolutionary algorithms (EAs) in combination with a data subsampling method, which reduces simulation time, to select the optimal MHLNN hyper-parameters. The optimal DNN structure obtained consists of two hidden layers of 23 and 24 units, respectively, with a rectified linear unit as the activation function, and achieved an Acc of 68.37 percent. Kim et al. [66] used an MHLNN with two hidden layers and two dropout layers to classify respiratory sounds recorded during the night into 4 classes. Using ten-fold cross-validation, 4 window sizes of 2.5, 5, 7.5, and 10 s were evaluated, with 5 s outperforming the others. The model obtained an Acc of 88.3 percent in the four-category classification and 92.5 percent in the binary classification.
Li et al. [50] developed a multi-layer feed-forward neural network (FNN) using the characteristics learned from the ECG, SpO2, and BMI studies. It had an input layer, an output layer, and 10 hidden layers in between. To optimize the network and reduce overfitting, a cross-entropy loss function was applied. The output vectors of the network were either OSA (1.0) or normal (0.1), and the model obtained an Acc of 97.8 percent in SLA classification. Lakhan et al. [64] developed 17 features from AF data and a sequence of fully-connected DNN with layer sizes of 1024, 512, 256, 128, 64, 32, 16, 8, and 4 hidden layers followed by the hyperbolic tangent (tanh) activation function. Tenfold cross-validation gave averaged Acc values of 83.46, 85.39, and 92.69% for the three AHI cutoff criteria of 5, 16, and 30, respectively.
SSAE
A deep autoencoder is made up of multiple encoder layers that are stacked on top of each other and can be used to make a deep SSAE. Li et al. [33] used a sparse autoencoder and a hidden Markov model to diagnose OSA. The R-peaks were detected using the Pan-Tompkins algorithm [73], and the physiologically irrelevant and redundant points were removed using the median filter. The RRI sequence was computed using the validated R-peak locations, and the RRI sequence was interpolated into 100 points. For primary extraction of features, a hidden layer of SAE unsupervised training was used first, followed by fine-tuning with a logistic regression layer. The highest Acc of 84.7 percent was achieved by analyzing two deep network architectures.
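The RRI preparation described for Li et al. [33] (RR intervals computed from R-peak locations, then resampled to 100 points) can be sketched as below. The R-peak times are hypothetical, linear interpolation over beat index is an assumption because the text does not name the interpolation method, and the Pan-Tompkins detection and median filtering steps are omitted.

```python
def rr_intervals(r_peak_times_s):
    """RR intervals (s) as differences between consecutive R-peak times."""
    return [b - a for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]


def resample_linear(values, n_points):
    """Linearly interpolate `values` onto n_points evenly spaced positions."""
    if len(values) < 2 or n_points < 2:
        raise ValueError("need at least two values and two output points")
    last = len(values) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)   # fractional index into `values`
        i = min(int(pos), last - 1)       # left neighbour, clamped at the end
        frac = pos - i
        out.append(values[i] * (1 - frac) + values[i + 1] * frac)
    return out


# Hypothetical R-peak times (s) from a short ECG excerpt.
r_peaks = [0.0, 0.8, 1.7, 2.5, 3.6, 4.4]
rri = rr_intervals(r_peaks)
rri_100 = resample_linear(rri, 100)
print(len(rri), len(rri_100))  # 5 100
```

Resampling every segment to a fixed length gives the autoencoder a constant input dimension even though the number of heartbeats per segment varies.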
DBN
A DBN is a probabilistic generative model that addresses the optimization problem of deep neural networks through layer-by-layer learning. Mostafa et al. [22] developed a DL method for OSA classification by providing the raw SpO2 input to a DBN model. The initial weights are first computed using an unsupervised learning technique and are then refined using supervised fine-tuning. The learning model has three layers: the first two are Restricted Boltzmann Machines (RBMs), and the final one is a softmax layer. The model obtained an Acc of 85.26 and 97.64 percent for the UCD and AED datasets, respectively.
Tyagi et al. [14] proposed a DL method for classifying SLA from single-lead ECG data by cascading two distinct types of RBM in an Enhanced-DBN (E-DBN). HRV and EDR data were extracted from the ECG signal segmented into 1-min intervals. The performance of the fine-tuned E-DBN model for identifying SLA episodes was tested using the AED datasets [19]. The suggested technique obtained specificity, sensitivity, and F1-scores of 92.28, 83.89, and 0.913, respectively.
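The performance measures quoted throughout this section (Acc, Sen, Spe, and F1) all derive from the counts of a binary confusion matrix. The sketch below uses their standard definitions, with apnea as the positive class; the counts are hypothetical and do not correspond to any reviewed study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts (apnea = positive)."""
    sen = tp / (tp + fn)                    # sensitivity (recall)
    spe = tn / (tn + fp)                    # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)   # accuracy
    precision = tp / (tp + fp)
    f1 = 2 * precision * sen / (precision + sen)
    return {"Acc": acc, "Sen": sen, "Spe": spe, "F1": f1}


# Hypothetical counts for 200 scored epochs.
m = classification_metrics(tp=80, fp=10, tn=90, fn=20)
print({name: round(value, 3) for name, value in m.items()})
# {'Acc': 0.85, 'Sen': 0.8, 'Spe': 0.9, 'F1': 0.842}
```

Because Acc depends on the class proportions while Sen and Spe do not, two studies with the same accuracy can differ substantially in how many apneic epochs they miss.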
Combined CNN and RNN approach
Biswal et al. [24] used a hybrid deep recurrent and CNN (RCNN) network to estimate the AHI value during sleep from spectrograms of the nasal AF, SpO2, and abdominal signals. Combining CNN and RNN makes it possible to derive features from the input data with the CNN (two filter sizes, 100 and 200) and to model long-term temporal correlations in the dataset with the RNN. This hybrid approach obtained an Acc of 80.2 and 88.2 percent with RCNN classifiers on the SHHS and MGH datasets, respectively. Zhang et al. [37] presented a CNN-LSTM model for the assessment of SLA. The first Conv layer uses three distinct kinds of filters, each with 24 units and a different size, to enable the learning of features at distinct scales. The developed model obtained 94.8 percent and 96.1 percent accuracy for the CNN and CNN-LSTM networks, respectively.
Zarei et al. [39] developed an automated feature extraction approach by combining a CNN with an LSTM recurrent network. To extract spatial characteristics, a 2D CNN model was used. The suggested model is divided into two sections. The first section has four 2D CNN layers (number of filters = 128, kernel size = 2 × 2, activation = ReLU) with MPL (2 × 2) and three LSTM layers with 256 cells. An FC layer of 64 cells with a sigmoid function is applied to the feature vector to distinguish SLA from regular intervals. The presented approach has an Acc of 97.21% for the AED and 93.70% for the UCDDB dataset. Bahrami et al. [42] collected the R maximum amplitudes and RRI data, analyzed CNN, RNN, and hybrid models, and determined that hybrid methods significantly improved performance. The generated outputs are input into the Deep-RNN models (LSTM, GRU, and BiLSTM) and LeNet-5 CNN models. LeNet-5 has 2 Conv, 2 pooling, and 2 FC layers. In the first layer, 20 filters were followed by an MPL; in the second layer, 50 filters were followed by an MPL. The output of the DRNNs was passed to an FC layer and a softmax layer. The hybrid method of LeNet and LSTM obtained the best Acc and Sen of 80.67 and 75.04 percent. Liang et al. [43] enhanced overall effectiveness by merging CNNs with RNNs, with RR-interval signals passing through two Conv layers and one LSTM layer: the first Conv layer (kernel size 5, 80 features), the second Conv layer (kernel size 3, 100 features), and the LSTM layer (100 hidden units). The Nesterov accelerated gradient technique [43] was used to speed up network convergence. The presented model obtained a Spe of 96.94, a Sen of 98.97, and an Acc of 99.80 percent.
Bahrami et al. [44] used ECG signal data and hybrid DNN methods to obtain better results. The designed hybrid ZF-Net has 96 kernels with an MPL layer followed by BN in the initial layer, 256 kernels followed by an MPL layer and BN in the next layer, and 512 kernels, 1024 kernels, and 512 kernels placed with an MPL layer in the third layer. The output size chosen for the GRU and LSTM differed from the one chosen for the BiLSTM. ZFNet-BiLSTM obtained the maximum Acc (88.13%) and Spe (92.27%), and ZFNet-GRU obtained the maximum Sen (84.26%). Banluesombatkul et al. [45] used raw ECG records of AHI incidents with a sequence of 1DCNNs for automated feature extraction, an RNN with LSTM for temporal information, and an FC-DNN for feature encoding out of a wide set of features. A stack of 1DCNNs with 64, 128, and 256 units, the ReLU activation function, and an MPL (size of 2) was used to find essential features. It was followed by an LSTM structure with the same number of units as the CNN and the intermittent dropout set to 0.4, then by stacked DNNs with 5 layers (128, 64, 32, 16, 8) and 4 hidden nodes for feature encoding, and finally by the softmax layer. For OSA severity classification, an Acc of 79.45 percent was obtained. To diagnose physiological events from respiratory-related motion inputs, Hafezi et al. [63] used a DL method based on a combination of CNN and LSTM. Each CNN layer used a kernel size of four with a stride of 1 to obtain 64 features, followed by BN and the ReLU activation function. The LSTM model has 128 hidden units and produces the same number of hidden states and output states for each time step, followed by an FC layer with a sigmoid activation function, and obtained an Acc of 84 percent in detecting SLA.
Discussion
Throughout the extensive literature analysis, various state-of-the-art methodologies used in signal collection, preprocessing, normalization, feature extraction, and classification that impact the computer-assisted automated detection of SLA have been discussed in detail. Over the past ten years, there have been more innovations in the SLA field than ever due to the increased interest in sleep research and in the treatment methods that could be used to assist. The bibliometric study was done based on the frameworks, datasets, and networks for SLA, as shown in Fig. 3. The existing deep classification techniques for the identification of SLA have been summarized and presented in this comprehensive study. The key conclusions from the 47 research studies that were chosen are presented below. A substantial number of articles have been published in the past five years, demonstrating great attention to this issue within the scholarly community. Additionally, it is not yet known which sensor or signal is optimal for detecting apnea. The primary objective of the study is to help the researchers involved in building computer-aided methods for diagnosing SLA. The term diagnosis should here be understood as the ability of computer-based approaches to determine the severity of SLA based on the AHI value.
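Since severity is tied to the AHI value, the mapping can be sketched as follows. The AHI is the number of apnea and hypopnea events per hour of sleep. The default cutoffs of 5, 15, and 30 events per hour are a commonly used clinical convention and are an assumption here, not something taken from the reviewed studies, which may use other thresholds; the event counts are hypothetical.

```python
def ahi(n_apneas, n_hypopneas, sleep_hours):
    """Apnea-hypopnea index: respiratory events per hour of sleep."""
    if sleep_hours <= 0:
        raise ValueError("sleep_hours must be positive")
    return (n_apneas + n_hypopneas) / sleep_hours


def severity(ahi_value, cutoffs=(5, 15, 30)):
    """Map an AHI value to a severity label (cutoffs are an assumed convention)."""
    labels = ("normal", "mild", "moderate", "severe")
    # Count how many cutoffs the value reaches; that count indexes the label.
    return labels[sum(ahi_value >= c for c in cutoffs)]


value = ahi(n_apneas=60, n_hypopneas=84, sleep_hours=8)
print(value, severity(value))  # 18.0 moderate
```

Passing a different `cutoffs` tuple reproduces the binary screening setups used in several studies, where a single threshold separates normal from apneic subjects.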
The single-source signals used in apnea detection, namely SpO2 [17–30], electrocardiogram (ECG) [14, 17, 19, 31–53], respiration [23, 54–65], and sound [66–71], have all been analyzed as input variables. The ECG signal has been the most widely used, a choice supported by Li et al. [33], who used features of the RR series and reported that, among single-source signals, ECG data gave the highest overall classification accuracy. Among all the reviewed sensors, the signal from pulse oximetry, which measures oxygen saturation, has shown promising results for convenient and effective SLA identification. The higher accuracy reported for ECG signals may be attributable to the use of publicly available data, which is less likely to be affected by interference. Among the single-source studies, Pathinarupothi et al. [20] obtained the highest performance by using the SpO2 signal rather than the IHR derived from ECG. However, because the studies use different algorithms and datasets, a direct comparison of performance metrics across input signals is not realistic. As stated in [23, 24, 58, 61], using more than one input signal increases the prediction accuracy of all model types. Nevertheless, the main aim of the majority of the experiments is to obtain a decent outcome with a minimal number of detectors and sensors.
According to the literature reviewed for this study, most of the researchers opted to use the AED dataset [14, 19–21, 31–44] of PhysioNet in conjunction with a survey questionnaire. This is because it can be challenging to acquire real-time records and find patients to participate in a study. The ethical issue is the main difficulty in conducting sleep studies in health facilities. Furthermore, noise and artifacts are always present in these data. For the signal to be useful and the resulting image to be suitable for feature collection and classification, noise cancellation techniques must be applied [40, 57]. Several authors have segmented the signal to reduce the processing time [23, 24, 28, 37, 45–49, 57–61, 63, 66, 68, 70]. Normally, the ECG waveform is divided into segments of one minute in length [14, 20, 22, 26, 31–36, 38–43, 51, 62, 71]. Researchers frequently employ the Pan-Tompkins technique to extract data from the QRS complex of the ECG waveform [33]. Wavelet transforms are most often used because they can filter with a high level of accuracy [36, 60]. According to the literature review, CAD techniques are primarily based on two important aspects. One is to reduce the amount of time needed to monitor several parameters, as in the case of PSG tests, which are carried out overnight in a monitored medical environment [6]. The other is the development of advanced algorithms for signal processing and computational classifiers, which has enabled a breakthrough in the automated identification of SLA. Although reducing the number of signals that need to be observed has a significant impact in terms of decreased expenses, enhanced patient comfort, and shorter waiting lists, manual scoring, as in the case of PSG, remains difficult and time-consuming. The screening process should be automated for this reason.
CNN, in both its 1DCNN and 2DCNN forms, has been the most widely applied classification method. CNNs were originally developed for 2D images with multiple channels as input, but they can also be used for signals with only a single channel [35, 47, 49, 69]. Several researchers used the SpO2 signal [21, 22, 26, 28], nasal AF [55–57, 60, 65], or a mixture of SpO2, AF, and ribcage and abdomen motions [58, 63], and transformed these 1D signals into a 2D input for apnea detection, applying the 2DCNN [23, 60, 69] directly. McCloskey et al. [60] evaluated both and observed that the 2D spectrogram image of the nasal AF outperformed the raw 1D signal as CNN input. Biswal et al. [24] reported a significant result, with the RCNN and spectrogram representation obtaining better accuracy. Wang et al. [32] observed that a residual network performed significantly better than a CNN with a reduced number of input samples. Urtnasan et al. applied 1DCNN [46, 47] and RNN [48] to data from the same research lab and determined that the RNN performed better than the CNN. Evaluating the studies that have used LSTM [31, 41, 48, 51] and GRU [48, 49] leads to a similar conclusion. ElMoaqet et al. [61] evaluated three respiratory signals using automatic extraction of temporal features and classification of apneic episodes and found that the NPRE signal performed best with the deep BiLSTM-based method.
Optimization of hyperparameters is also an issue in deep network implementation. Several studies [46, 47] found that simply increasing the number of neurons or hidden layers in the network did not improve results. Others searched for a solution within a predefined search space [22]. DeFalco et al. [34] proposed an alternative approach in which the hyperparameters are chosen using EAs. With the exception of the research conducted by Kim et al. [66], the majority of the studies that used deep networks demonstrated superior performance over those that utilized shallow networks. In their work, the performance of the deep network was marginally worse than that of a shallow network. However, they fed human-engineered features to the deep neural network, and a similar study [64] had shown that an MHLNN outperformed traditional learning methods when such features were included. The result of Kim et al. [66] may therefore stem from the feature selection method or from the selection of the deep network hyperparameters.
The literature shows that automatic SLA detection approaches focus on two main aspects. The first aims to reduce the time required to monitor multiple parameters, as in the case of PSG, which is performed overnight in an attended hospital setting. The second involves the development of complex signal processing techniques and predictive classification methods, which has led to advances in automatic SLA detection. In addition, minimizing the number of input signals that need to be monitored can result in lower costs, improved patient comfort, and reduced wait times.
Open issues and challenges
In this review, we summarized the challenges and causes of SLA and the issues with the respective diagnostic methods. Using biological signals such as SpO2, ECG, AF, and sound, we discussed relevant research on feature learning approaches, effective use of conventional DL, and sensor/feature fusion techniques to diagnose SLA and, in certain cases, to assess its severity. The analysis shows that different DL algorithms, when applied to the databases included in the literature, achieve varying degrees of performance. The effectiveness of the algorithms therefore depends on many conditions, which include:
Dataset collection methods
The training of DL classifiers is affected by factors such as the type, position, frequency, and sensitivity of the sensors used to collect the data. Several biomedical signals can help detect SLA; SpO2, ECG, AF, and sound signals are the most common. A limitation of ECG is that stationary ECG or Holter monitoring requires signals from three or more leads, and the lead positions can be restrictive for the patient being studied. Wearable devices can record single-lead ECG, although their accuracy can be relatively low compared to devices with many leads. SpO2 sensors, like single-lead ECG sensors, can be embedded in wearable technology and, when combined with subject demographic data, have been observed to identify SLA effectively. Some sensors may introduce noise into the data collection process; for instance, sound sensors are sensitive to environmental noise.
Characteristics of the dataset
The performance of supervised training approaches is also affected by dataset characteristics such as class distribution and sample attributes. A balanced dataset is required for a classification model to be trained efficiently. In the case of SLA, the number of apneic instances in the data should be comparable to the number of non-apneic instances. Otherwise, the classifier is biased toward the majority class, and instances of the minority class are misclassified. To optimize training, it is necessary to use appropriate data pre-processing and feature learning when fine-tuning the model.
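When a balanced dataset is not available, one common compensation is to weight each class inversely to its frequency during training. The sketch below shows this weighting only; it is a swapped-in illustration rather than a method taken from the reviewed studies, and the function name `inverse_frequency_weights` and the 90/10 label split are assumptions.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return one weight per class so that every class contributes equally.

    weight(c) = n_samples / (n_classes * count(c))
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}

# Assumed example: 90 non-apneic segments and 10 apneic segments.
labels = ["normal"] * 90 + ["apnea"] * 10
weights = inverse_frequency_weights(labels)
```

With these weights, the total weight of the apneic class equals that of the non-apneic class, so errors on the minority class are no longer dominated by the majority class in the training loss.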
Challenges in DL methods
This section discusses some of the most significant challenges that can occur when employing DL techniques to diagnose apnea. Table 2 lists the studies that used various DL models to diagnose SLA from physiological signals. It shows that standard or conventional DL models have been used to diagnose SLA with satisfactory performance. Nonetheless, more advanced DL approaches, such as representation learning [74, 75], graph-based learning [76], and attention networks [77], have yet to be applied in SLA studies. This is primarily due to limited access to extensive data. Another issue that keeps authors from using complex approaches to detect sleep disorders is the lack of high-performance hardware resources [78, 79].
Table 2.
| Study | Analysis model | Classifier type# | Acc (%) | Spe (%) | Sen (%) | PPV (%) | NPV (%) | AUC (%) | Others (%) |
|---|---|---|---|---|---|---|---|---|---|
| [26] | 1D-CNN (AED) | AH/N | 88.49 | 93.80 | 73.64 | – | – | – | – |
| | | G | 95.71 | – | – | – | – | – | – |
| | 1D-CNN (HuGCDN2008) | AH/N | 95.14 | 97.08 | 92.36 | – | – | – | – |
| | | G | 100 | – | – | – | – | – | – |
| [27] | 1D-CNN (AED) | A/N | 94.24 | 96.61 | 90.04 | – | – | – | – |
| | 1D-CNN (UCD) | | 85.79 | 93.90 | 67.35 | – | – | – | – |
| | 1D-CNN (HuGCDN2008) | | 89.32 | 94.60 | 74.75 | – | – | – | – |
| [56] | 1D-CNN | A/H/N | 96.6 | 98.5 | 81.1 | 87 | 97.7 | – | – |
| | | G (AHI 5) | 96.2 | 84.6 | 100.0 | 95.1 | 100 | 99.0 | F1 98.0 |
| | | G (AHI 5) | 92.3 | 86.5 | 98.1 | 87.9 | 97.8 | 99.0 | F1 93.0 |
| | | G (AHI ) | 96.2 | 96.2 | 96.2 | 89.3 | 98.7 | 100 | F1 93.0 |
| [28] | 1D-CNN | G | 88.3 | 95.4 | 90.9 | – | – | – | – |
| [29] | 1D-CNN (SHHS1) | A/H/N | 84.3 | 86.4 | 68.9 | – | – | 86.2 | – |
| | 1D-CNN (SHHS2) | | 82.2 | 82.1 | 82.9 | – | – | 90.4 | – |
| [32] | 1D-CNN | A/N | 90.97 | 83.04 | 95.50 | – | – | 88.0 | – |
| | Residual Network | | 94.39 | 93.04 | 94.95 | – | – | – | – |
| [35] | 1D-CNN | A/N | 87.9 | 92.0 | 81.1 | – | – | 94.0 | – |
| [38] | 1D-CNN | OA/N | 89.4 | 89.1 | 89.8 | 83.6 | – | 96.4 | 6 |
| [46] | 1D-CNN | OA/H/N | 90.8 | 87.0 | 87.0 | 87.0 | – | – | |
| [47] | 1D-CNN | OA/N | 96.0 | 96.0 | 96.0 | – | – | – | |
| [49] | 1D-CNN | A/N | 98.5 | 99.0 | 99.0 | – | – | – | – |
| | 2D-CNN | | 95.9 | 96.0 | 96.0 | – | – | – | – |
| | LSTM | | 98.0 | 98.0 | 98.0 | – | – | – | – |
| | GRU | | 99.0 | 99.0 | 99.0 | – | – | – | – |
| [58] | 1D-CNN | OA/H/N | 83.5 | – | 83.4 | 83.4 | – | – | F1 83.4 |
| [59] | 1D-CNN | OA/N | 74.70 | – | 74.70 | 74.50 | – | – | F1 75.0 |
| [60] | 1D-CNN | OA/H/N | 77.6 | – | 77.6 | 77.4 | – | – | F1 77.5 |
| | 2D-CNN | | 79.8 | | 79.7 | 79.8 | – | – | F1 79.7 |
| [62] | 1D-CNN | A/N | 80.78 | – | 81.73 | 80.78 | – | – | F1 80.63 |
| [69] | 1D-CNN | G (AHI 5) | 91.5 | 83.3 | 93.6 | 95.6 | 76.9 | 94.1 | – |
| | | G (AHI 10) | 81.3 | 78.9 | 82.5 | 89.1 | 68.1 | 93.5 | – |
| | | G (AHI 5) | 91.5 | 95.8 | 88.5 | 96.8 | 85.1 | 98.1 | – |
| | | G (AHI ) | 93.2 | 91.6 | 95.6 | 88.0 | 97.0 | 98.7 | – |
| [23] | 2D-CNN | OA/H/N | 79.61 | – | – | – | – | – | – |
| [36] | 2D-CNN (AED) | OA/N | 94.38 | 94.51 | 94.30 | – | – | – | |
| | 2D-CNN (UCD) | | 81.86 | 86.05 | 71.62 | – | – | – | |
| [68] | 2D-CNN | G (Acoustic) | – | 93.0 | 78.0 | – | – | 92.0 | – |
| | (DNN) | G (SpO2) | – | 82.0 | 93.0 | – | – | 93.0 | – |
| [71] | 2D-CNN | G (AHI 5) | – | 76.0 | 98.0 | – | – | 99.0 | – |
| | | G (AHI 5) | – | 90.0 | 97.0 | – | – | 99.0 | – |
| | | G (AHI 30) | – | 94.0 | 92.0 | – | – | 98.0 | – |
| [40] | OSACN-Net | OA/N | 94.81 | 94.95 | 94.58 | 94.97 | 94.56 | 98.92 | |
| | Res-Net50 | | 94.51 | 94.59 | 94.48 | 94.60 | 94.47 | 98.76 | |
| | Squeeze-Net | | 90.34 | 89.94 | 90.76 | 89.84 | 90.86 | 97.42 | |
| [20] | LSTM | G (IHR) | – | – | 99.4 | – | – | – | – |
| | | OA/N (SpO2) | 95.5 | – | 92.9 | 99.2 | – | 98.0 | – |
| | | OA/N (IHR) | 89.0 | – | 99.4 | 82.4 | – | 99.0 | – |
| | | OA/N (SpO2 + IHR) | 92.1 | – | 84.7 | 99.5 | – | 99.0 | – |
| [31] | LSTM | G (IHR) | 100 | – | – | – | – | – | |
| [41] | LSTM (tenfold) | A/N | 99.80 | 99.73 | 99.85 | – | – | 100 | – |
| | LSTM (hold-out) | | 81.30 | 91.75 | 59.90 | – | – | 85.32 | – |
| [48] | LSTM | A/H/N | 98.5 | 98.0 | 98.0 | – | – | – | |
| | GRU | | 99.0 | 99.0 | 99.0 | – | – | – | |
| [51] | LSTM | A/N | – | 100 | 100 | – | – | 96.0 | – |
| [61] | LSTM | A/N (NPRE) | 85.1 | 83.8 | 90.0 | 58.9 | 97.0 | 91.7 | |
| | BiLSTM | | 85.0 | 83.7 | 90.3 | 58.8 | 97.1 | 92.4 | |
| [57] | LSTM | A/N (abdores) | 77.2 | 80.3 | 62.3 | 39.9 | 91.1 | 77.5 | – |
| | LSTM | A/N (thorres) | 75.0 | 76.5 | 67.8 | 37.7 | 91.9 | 79.7 | – |
| | LSTM | A/N (EDR) | 60.1 | 61.8 | 52.1 | 22.1 | 86.1 | 58.8 | – |
| | FLSTM | A/N (abdores) | 71.1 | 73.9 | 57.9 | 33.0 | 89.5 | 71.5 | – |
| | FLSTM | A/N (thorres) | 74.7 | 77.2 | 62.9 | 36.8 | 90.9 | 76.9 | – |
| | FLSTM | A/N (EDR) | 58.7 | 60.8 | 48.8 | 21.1 | 85.0 | 57.6 | – |
| [65] | LSTM (SHHS-1) | A/H/N | 80.66 | – | – | – | – | – | – |
| | LSTM (UCD) | | 82.04 | – | – | – | – | – | – |
| [70] | LSTM | OA/N | 95.30 | 95.70 | 94.9 | 95.70 | – | – | |
| [53] | GRU | | 84.9 | – | 70.9 | 73.4 | – | – | F1 72.1 |
| | GRU (AED) | G | 89.9 | 91.1 | 87.8 | – | – | – | – |
| [21] | MHLNN | OA/N | 93.3 | 100 | 87.5 | – | – | – | – |
| [34] | MHLNN | OA/N | 68.37 | – | – | – | – | – | – |
| [50] | MHLNN | OA/N | 97.80 | 93.90 | 98.60 | – | – | 97.0 | |
| [64] | MHLNN | G (AHI 5) | 83.46 | 86.35 | 80.47 | – | – | – | – |
| | | G (AHI 10) | 85.39 | 79.23 | 85.56 | | | | |
| | | G (AHI 15) | 92.69 | 78.07 | 93.06 | | | | |
| [66] | MHLNN | G | 75.0 | – | – | – | – | – | – |
| [33] | SSAE | OA/N | 84.7 | 82.1 | 88.9 | – | – | 86.9 | – |
| | | G | 100 | 100 | 100 | – | – | – | – |
| [22] | DBN (AED) | A/N | 97.64 | 95.89 | 78.75 | – | – | – | – |
| | DBN (UCD) | | 85.26 | 91.71 | 60.36 | – | – | – | – |
| [14] | E-DBN | A/N | 89.11 | 92.28 | 83.89 | – | – | 0.960 | |
| [24] | RCNN (MGH) | G | 88.2 | – | – | – | – | – | – |
| | RCNN (SHHS) | G | 80.2 | – | – | – | – | – | – |
| [37] | CNN-LSTM | OA/N | 96.1 | 96.2 | 96.1 | 97.6 | 93.8 | – | – |
| | CNN_1 | | 94.8 | – | – | – | – | – | – |
| [44] | ZFNet-LSTM | A/N | 87.84 | 90.68 | 83.29 | – | – | – | F1 84.03 |
| | ZFNet-BiLSTM | | 88.13 | 92.27 | 81.49 | – | – | – | F1 84.04 |
| | ZFNet-GRU | | 87.43 | 89.42 | 84.26 | – | – | – | F1 83.74 |
| | ZF-Net | | 87.36 | 90.92 | 81.64 | – | – | – | F1 83.20 |
| | Alex-Net | | 87.09 | 89.87 | 82.64 | – | – | – | F1 83.10 |
| | LSTM | | 82.52 | 87.88 | 73.93 | – | – | – | F1 76.46 |
| | BiLSTM | | 82.45 | 88.25 | 73.14 | – | – | – | F1 76.17 |
| | GRU | | 82.93 | 88.63 | 73.80 | – | – | – | F1 76.86 |
| [39] | CNN-LSTM (AED) | A/N | 97.21 | 98.94 | 94.41 | 94.42 | – | – | |
| | CNN-LSTM (UCD) | | 93.70 | 95.82 | 90.69 | – | – | – | – |
| | CNN (AED) | | 95.77 | 96.03 | 96.35 | – | – | – | – |
| | LSTM (AED) | | 93.33 | 95.58 | 89.70 | – | – | – | – |
| [42] | LeNet-LSTM | A/H/N | 80.67 | 84.13 | 75.04 | – | – | – | F1 74.72 |
| | LeNet-BiLSTM | | 79.98 | 87.11 | 72.23 | – | – | – | F1 72.23 |
| | LeNet-GRU | | 80.17 | 85.42 | 71.62 | – | – | – | F1 73.33 |
| [43] | CNN-LSTM | G | 99.80 | 96.94 | 98.97 | – | – | – | – |
| [45] | CNN-LSTM-DNN | G | 79.45 | 80.10 | 77.60 | – | – | – | |
| [63] | CNN-LSTM | G (AHI 15) | 84.00 | 87.00 | 81.00 | – | – | – | |
# N: normal, A: apnea, H: hypopnea, O: obstructive, G: global or OSA severity
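For readers comparing the columns of Table 2, the following sketch shows how the reported metrics relate to a binary confusion matrix, assuming the abbreviations carry their usual meanings (Acc: accuracy, Spe: specificity, Sen: sensitivity, PPV and NPV: positive and negative predictive value). The function name `classification_metrics` and the example counts are hypothetical and are not taken from any listed study. AUC is omitted because it cannot be derived from a single confusion matrix.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return standard binary metrics in percent from confusion-matrix counts.

    tp/fn: apneic segments detected/missed; tn/fp: normal segments
    correctly/incorrectly classified.
    """
    def ratio(num, den):
        # Guard against empty denominators, e.g. no positive predictions.
        return 100.0 * num / den if den else float("nan")

    return {
        "Acc": ratio(tp + tn, tp + tn + fp + fn),
        "Spe": ratio(tn, tn + fp),
        "Sen": ratio(tp, tp + fn),
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
        "F1": ratio(2 * tp, 2 * tp + fp + fn),
    }

# Assumed example counts for illustration only.
metrics = classification_metrics(tp=40, tn=50, fp=5, fn=5)
```

Because accuracy depends on the class ratio of the test set, two studies with the same sensitivity and specificity can report different accuracies, which should be kept in mind when comparing rows evaluated on different databases.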
Future scope
This section outlines future directions for the assessment of SLA in PSG data using DL approaches. Future research should first focus on making available datasets of physiological signals with a large number of instances. Data sources are one of the essential aspects of a computer-aided diagnosis system (CADS) for diagnosing different diseases, so providing datasets with a large number of cases is essential. Additionally, a number of clinical studies aim to analyze the effectiveness of magnetic resonance imaging (MRI) modalities in the detection of apnea [80]. Access to MRI data would therefore enable researchers to examine the brain activity that occurs during SLA and to compare the brain functioning of people with SLA with that of healthy persons.
This section also presents a number of potential approaches for using the most recent DL methods in upcoming research on the detection of SLA. Researchers working in this area have been able to construct new models primarily owing to the major evolution of DL approaches over the past few years. Table 2 summarizes the research that examines the use of DL methods in the clinical diagnosis of SLA. As can be observed, the investigations on the detection of SLA employed conventional DL models. For this reason, several recent DL models are suggested for future research on the detection of SLA, including representation learning [74, 75], graph-based learning [76], attention networks [77], and others.
Conclusion
This study summarized the findings of an analysis of several techniques for detecting SLA. The purpose of the analysis was to assess some of the most effective DL approaches for detecting SLA from different forms of input signal. A significant number of works have been presented over the last five years, highlighting the interest in this topic among the academic community. The comparative analysis of deep neural networks, as well as the parameter selection of DL models, is also an active area of study and a key topic of discussion. Automatic SLA detection techniques are promising, and new DL techniques may also be discovered for other detection methods. Because epoch-based diagnosis is difficult, an automatic and effective SLA detection solution is required. When developing a new automatic SLA detection method, considering several signals (ECG, oximetry, and respiratory) is expected to increase the detection rate.
Moreover, we reviewed previous studies to show the present status and future direction of SLA event identification and detection based on DL frameworks. This study outlines the development trend of the research topic over a period of ten years. Recent years have seen a rise in the use of DL techniques in healthcare, notably for the identification and classification of SLA and normal events to help in the diagnosis of cardiovascular disorders, anxiety, hypertension, and many other health issues. DL-based models have shown promise for bringing these recent developments in physiological signal processing for SLA diagnosis to the state-of-the-art level.
This analysis should help authors decide which input data source, DL process, DL model, and database are best suited for implementing an SLA detection framework and obtaining state-of-the-art performance. Our discussion may also assist other authors in selecting a DL problem, a DL network, or a training architecture, which are the primary factors affecting overall system performance. Furthermore, DL strategies have shown promise in advancing physiological signal processing for medical applications to the state of the art.
In addition, researchers should concentrate their efforts on areas that presently receive relatively little attention in DL-based analysis of ECG signals. Examples include wearable smart devices as well as biological recognition and detection. All these domains have the potential to emerge as the next hotspot. The research community is also still searching for ways to integrate ECG-based physiological detection into smart wearables that can recognize and track cardiovascular diseases (CVDs). More attention should be paid to this kind of multidisciplinary or fusion analysis.
Author Contribution
PKT: design, conceptualization, methodology, analysis and interpretation of the data, writing (original draft), writing (review and editing). DA: supervision, visualization, resources, formal analysis, investigation, validation, writing (review and editing). Both authors read and approved the final draft.
Funding
This study received no funding.
Declarations
Conflict of interest
None of the authors has any conflict of interest to declare.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. JeyaJothi ES, Anitha J, Rani S, Tiwari B. A comprehensive review: computational models for obstructive sleep apnea detection in biomedical applications. BioMed Res Int. 2022. doi: 10.1155/2022/7242667. [Retracted]
- 2. Tyagi PK, Rathore N, Parashar D, Agrawal D. A review of automated diagnosis of ECG arrhythmia using deep learning methods. AI-Enabled Smart Healthcare Biomed Signals. 2022;2022:98–111. doi: 10.4018/978-1-6684-3947-0.ch005.
- 3. Olson EJ, Moore WR, Morgenthaler TI, Gay PC, Staats BA. Obstructive sleep apnea-hypopnea syndrome. Mayo Clinic Proc. 2003;78(12):1545–1552. doi: 10.4065/78.12.1545.
- 4. Tyagi PK, Agarwal D, Mishra P. A review of automated sleep apnea detection using deep neural network. Artif Intell Intern Things Smart Mater Energy Appl. 2022;12:1–20. doi: 10.1201/9781003220176-1.
- 5. Sezgin N, Tagluk ME. Energy based feature extraction for classification of sleep apnea syndrome. Comput Biol Med. 2009;39(11):1043–1050. doi: 10.1016/j.compbiomed.2009.08.005.
- 6. Mendonca F, Mostafa SS, et al. A review of obstructive sleep apnea detection approaches. IEEE J Biomed Health Inform. 2018;23(2):825–837. doi: 10.1109/JBHI.2018.2823265.
- 7. Mostafa SS, Mendonça F, Ravelo-García AG, Morgado-Dias F. A systematic review of detecting sleep apnea using deep learning. Sensors. 2019;19(22):4934. doi: 10.3390/s19224934.
- 8. Tyagi PK, Rathore N, et al. (2023) A review on heartbeat classification for arrhythmia detection using ECG signal processing. In: IEEE international students' conference on electrical, electronics and computer science. IEEE, pp 1–6. doi: 10.1109/SCEECS57921.2023
- 9. Song C, Liu K, Zhang X, Chen L, Xian X. An obstructive sleep apnea detection approach using a discriminative hidden Markov model from ECG signals. IEEE Trans Biomed Eng. 2015;63(7):1532–1542. doi: 10.1109/TBME.2015.2498199.
- 10. Varon C, Caicedo A, et al. A novel algorithm for the automatic detection of sleep apnea from single-lead ECG. IEEE Trans Biomed Eng. 2015;62(9):2269–2278. doi: 10.1109/TBME.2015.2422378.
- 11. Sharma H, Sharma KK. An algorithm for sleep apnea detection from single-lead ECG using Hermite basis functions. Comput Biol Med. 2016;77:116–124. doi: 10.1016/j.compbiomed.2016.08.012.
- 12. Álvarez-Estévez D, Moret-Bonillo V. Fuzzy reasoning used to detect apneic events in the sleep apnea-hypopnea syndrome. Expert Syst Appl. 2009;36(4):7778–7785. doi: 10.1016/j.eswa.2008.11.043.
- 13. Ravelo-García AG, Kraemer JF, et al. Oxygen saturation and RR intervals feature selection for sleep apnea detection. Entropy. 2015;17(5):2932–2957. doi: 10.3390/e17052932.
- 14. Tyagi PK, Agrawal D. Automatic detection of sleep apnea from single-lead ECG signal using enhanced-deep belief network model. Biomed Signal Process Control. 2023;80:104401. doi: 10.1016/j.bspc.2022.104401.
- 15. Jayaraj R, Mohan J, Kanagasabai A. A review on detection and treatment methods of sleep apnea. J Clin Diagn Res JCDR. 2017;11(3):VE01. doi: 10.7860/JCDR/2017/24129.9535.
- 16. Page MJ, McKenzie JE, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Int J Surg. 2021;88:105906. doi: 10.1016/j.ijsu.2021.105906.
- 17. PhysioNet. Available online: www.physionet.org.
- 18. St. Vincent's University Hospital/University College Dublin Sleep Apnea Database. Available: https://physionet.org/pn3/ucddb/.
- 19. Penzel T, Moody GB, Mark RG, et al. The apnea-ECG database 2000. Comput Cardiol. 2000;27:255–258. doi: 10.1109/CIC.2000.898505.
- 20. Pathinarupothi RK, Rangan ES, et al. (2017) Single sensor techniques for sleep apnea diagnosis using deep learning. In: 2017 IEEE international conference on healthcare informatics (ICHI). IEEE, pp 524–529
- 21. Almazaydeh L, Faezipour M, et al. A neural network system for detection of obstructive sleep apnea through SpO2 signal features. Int J Adv Comput Sci Appl. 2012. doi: 10.14569/IJACSA.2012.030502.
- 22. Mostafa SS, Mendonça F, et al. (2017) SpO2 based sleep apnea detection using deep learning. In: 2017 IEEE 21st international conference on intelligent engineering systems, Larnaca, Cyprus. IEEE, pp 000091–000096. doi: 10.1109/INES.2017.8118534
- 23. Cen L, Yu ZL, et al. (2018) Automatic system for obstructive sleep apnea events detection using convolutional neural network. In: Proceedings 40th annual international conference of the IEEE engineering in medicine and biology society (EMBC), IEEE, pp 3975–3978
- 24. Biswal S, Sun H, et al. Expert-level sleep scoring with deep neural networks. J Am Med Inform Assoc. 2018;25(12):1643–1650. doi: 10.1093/jamia/ocy131.
- 25. Sleep Heart Health Study: https://sleepdata.org/datasets/shhs.
- 26. Mostafa SS, Baptista D, et al. Greedy based convolutional neural network optimization for detecting apnea. Comput Methods Programs Biomed. 2020;197:105640. doi: 10.1016/j.cmpb.2020.105640.
- 27. Mostafa SS, Mendonca F, et al. Multi-objective hyperparameter optimization of convolutional neural network for obstructive sleep apnea detection. IEEE Access. 2020;8:129586–129599. doi: 10.1109/ACCESS.2020.3009149.
- 28. Leino A, Nikkonen S, Kainulainen S, et al. Neural network analysis of nocturnal SpO2 signal enables easy screening of sleep apnea in patients with acute cerebrovascular disease. Sleep Med. 2021;79:71–78. doi: 10.1016/j.sleep.2020.12.032.
- 29. Sharma P, Jalali A, et al. (2022) Deep-Learning based Sleep Apnea Detection using SpO2 and Pulse Rate. In: Annual international conference of the IEEE engineering in medicine & biology society (EMBC), pp 2611–14
- 30. Quan SF, Howard BV, et al. The sleep heart health study: design, rationale, and methods. Sleep. 1997;20(12):1077–1085. doi: 10.1093/sleep/20.12.1077.
- 31. Pathinarupothi R, Vinaykumar R, et al. (2017) Instantaneous heart rate as a robust feature for sleep apnea severity detection using deep learning. In: Proceedings EMBS international conference on biomedical & health informatics IEEE, pp 293–296
- 32. Wang L, Lin Y, Wang J. A RR interval based automated apnea detection approach using residual network. Comput Methods Programs Biomed. 2019;176:93–104. doi: 10.1016/j.cmpb.2019.05.002.
- 33. Li K, Pan W, Li Y, Jiang Q, Liu G. A method to detect sleep apnea based on deep neural network and hidden Markov model using single-lead ECG signal. Neurocomputing. 2018;294:94–101. doi: 10.1016/j.neucom.2018.03.011.
- 34. De Falco I, De Pietro G, et al. (2018) Deep neural network hyper-parameter setting for classification of obstructive sleep apnea episodes. In: Proceedings IEEE symposium on computers and communications (ISCC), pp 01187–92
- 35. Chang HY, Yeh CY, Lee CT, Lin CC. A sleep apnea detection system based on a one-dimensional deep convolution neural network model using single-lead electrocardiogram. Sensors. 2020;20(15):4157. doi: 10.3390/s20154157.
- 36. Mashrur FR, Islam MS, et al. SCNN: scalogram-based convolutional neural network to detect obstructive sleep apnea using single-lead electrocardiogram signals. Comput Biol Med. 2021;134:104532. doi: 10.1016/j.compbiomed.2021.104532.
- 37. Zhang J, Tang Z, et al. Automatic detection of obstructive sleep apnea events using a deep CNN-LSTM model. Comput Intell Neurosci. 2021. doi: 10.1155/2021/5594733.
- 38. Shen Q, Qin H, et al. Multiscale deep neural network for obstructive sleep apnea detection using RR interval from single-lead ECG signal. IEEE Trans Instrum Meas. 2021;70:1–3. doi: 10.1109/TIM.2021.3062414.
- 39. Zarei A, Beheshti H, Asl BM. Detection of sleep apnea using deep neural networks and single-lead ECG signals. Biomed Signal Process Control. 2022;71:103125. doi: 10.1016/j.bspc.2021.103125.
- 40. Gupta K, Bajaj V, Ansari IA. OSACN-Net: automated classification of sleep apnea using deep learning model and smoothed Gabor spectrograms of ECG signal. IEEE Trans Instrum Meas. 2021;71:1–9.
- 41. Faust O, Barika R, et al. Accurate detection of sleep apnea with long short-term memory network based on RR interval signals. Knowl-Based Syst. 2021;212:106591. doi: 10.1016/j.knosys.2020.106591.
- 42. Bahrami M, Forouzanfar M. (2021) Detection of sleep apnea from single-lead ECG: comparison of deep learning algorithms. In: IEEE international symposium on medical measurements and applications (MeMeA), IEEE, pp 1–5
- 43. Liang X, Qiao X, Li Y. (2019) Obstructive sleep apnea detection using combination of CNN and LSTM techniques. In: 2019 IEEE 8th joint international information technology and artificial intelligence conference (ITAIC), pp 1733–1736. doi: 10.1109/ITAIC.2019.8785833
- 44. Bahrami M, Forouzanfar M. Sleep apnea detection from single-lead ECG: a comprehensive analysis of machine learning and deep learning algorithms. IEEE Trans Instrum Meas. 2022;71:1–1.
- 45. Banluesombatkul N, Rakthanmanon T, et al. (2018) Single channel ECG for obstructive sleep apnea severity detection using a deep learning approach. In: TENCON 2018 IEEE region 10 conference, pp 2011–2016
- 46. Urtnasan E, Park JU, Lee KJ. Multiclass classification of obstructive sleep apnea/hypopnea based on a convolutional neural network from a single-lead electrocardiogram. Physiol Meas. 2018;39(6):065003. doi: 10.1088/1361-6579/aac7b7.
- 47. Urtnasan E, Park JU, Joo EY, Lee KJ. Automated detection of obstructive sleep apnea events from a single-lead electrocardiogram using a convolutional neural network. J Med Syst. 2018;42:1–8. doi: 10.1007/s10916-018-0963-0.
- 48. Urtnasan E, Park JU, Lee KJ. Automatic detection of sleep-disordered breathing events using recurrent neural networks from an electrocardiogram signal. Neural Comput Appl. 2020;32:4733–4742. doi: 10.1007/s00521-018-3833-2.
- 49. Erdenebayar U, Kim YJ, et al. Deep learning approaches for automatic detection of sleep apnea events from an electrocardiogram. Comput Methods Programs Biomed. 2019;180:105001. doi: 10.1016/j.cmpb.2019.105001.
- 50. Li Z, Li Y, et al. A model for obstructive sleep apnea detection using a multi-layer feed-forward neural network based on electrocardiogram, pulse oxygen saturation, and body mass index. Sleep Breathing. 2021;2021:1–8. doi: 10.1007/s11325-021-02302-6.
- 51. Iwasaki A, Nakayama C, et al. Screening of sleep apnea based on heart rate variability and long short-term memory. Sleep Breathing. 2021;25:1821–1829. doi: 10.1007/s11325-020-02249-0.
- 52. Ravelo-García AG, Saavedra-Santana P, et al. Symbolic dynamics marker of heart rate variability combined with clinical variables enhance obstructive sleep apnea screening. Chaos. 2014;24(2):024404. doi: 10.1063/1.4869825.
- 53. Olsen M, Mignot E, et al. ECG-based detection of Sleep-disordered breathing in large population-based cohorts. Sleep. 2020;43(5):zsz276. doi: 10.1093/sleep/zsz276.
- 54. Thommandram A, Eklund JM, McGregor C. (2013) Detection of apnoea from respiratory time series data using clinically recognizable features and kNN classification. In: Proceedings 35th annual international conference of the IEEE engineering in medicine and biology society, pp 5013–5016. doi: 10.1109/EMBC.2013.6610674
- 55. Minu M, Paul AM. SAHS detection based on ANFIS using single-channel airflow signal. Int J Innov Res Sci, Eng Technol. 2016;5(7):13053–13061.
- 56. Choi SH, Yoon H, et al. Real-time apnea-hypopnea event detection during sleep by convolutional neural networks. Comput Biol Med. 2018;100:123–131. doi: 10.1016/j.compbiomed.2018.06.028.
- 57. Van Steenkiste T, Groenendaal W, et al. Automated sleep apnea detection in raw respiratory signals using long short-term memory neural networks. IEEE J Biomed Health Inform. 2018;23(6):2354–2364. doi: 10.1109/JBHI.2018.2886064.
- 58. Haidar R, McCloskey S, et al. (2018) Convolutional neural networks on multiple respiratory channels to detect hypopnea and obstructive apnea events. In: Proceedings international joint conference on neural networks (IJCNN), IEEE, pp 1–7
- 59. Haidar R, Koprinska I, Jeffries B. (2017) Sleep apnea event detection from nasal airflow using convolutional neural networks. In: international conference neural information processing (ICONIP), pp 819–827
- 60. McCloskey S, Haidar R, Koprinska I, Jeffries B. (2018) Detecting hypopnea and obstructive apnea events using convolutional neural networks on wavelet spectrograms of nasal airflow. In: Proceedings Pacific-Asia conference advances in knowledge discovery and data mining (PAKDD), pp 361–372
- 61. ElMoaqet H, Eid M, et al. Deep recurrent neural networks for automatic detection of sleep apnea from single channel respiration signals. Sensors. 2020;20(18):5037. doi: 10.3390/s20185037.
- 62. Haidar R, Koprinska I, Jeffries B. (2020) Sleep apnea event prediction using convolutional neural networks and Markov chains. In: 2020 international joint conference on neural networks (IJCNN). IEEE, pp 1–8
- 63. Hafezi M, Montazeri N, et al. Sleep apnea severity estimation from tracheal movements using a deep learning model. IEEE Access. 2020;8:22641–22649. doi: 10.1109/ACCESS.2020.2969227.
- 64. Lakhan P, Ditthapron A, et al. (2018) Deep neural networks with weighted averaged overnight airflow features for sleep apnea-hypopnea severity classification. In: Proceedings IEEE TENCON Region 10 conference, pp 441–445
- 65. Drzazga J, Cyganek B. An LSTM network for apnea and hypopnea episodes detection in respiratory signals. Sensors. 2021;21(17):5858. doi: 10.3390/s21175858.
- 66. Kim T, Kim JW, Lee K. Detection of sleep disordered breathing severity using acoustic biomarker and machine learning techniques. Biomed Eng Online. 2018;17:1–9. doi: 10.1186/s12938-018-0448-x.
- 67. Rosenwein T, Dafna E, et al. (2015) Breath-by-breath detection of apneic events for OSA severity estimation using non-contact audio recordings. In: Proceedings 37th annual international conference of the IEEE engineering in medicine and biology society, pp 7688–7691
- 68. Romero HE, Ma N, Brown GJ, Hill EA. Acoustic screening for obstructive sleep apnea in home environments based on deep neural networks. IEEE J Biomed Health Inform. 2022;26(7):2941–2950. doi: 10.1109/JBHI.2022.3154719.
- 69. Wang B, Tang X, et al. Obstructive sleep apnea detection based on sleep sounds via deep learning. Nat Sci Sleep. 2022;2022:2033–2045. doi: 10.2147/NSS.S373367.
- 70. Cheng S, Wang C, et al. Automated sleep apnea detection in snoring signal using long short-term memory neural networks. Biomed Signal Process Control. 2022;71:103238. doi: 10.1016/j.bspc.2021.103238.
- 71. Nakano H, Furukawa T, Tanigawa T. Tracheal sound analysis using a deep neural network to detect sleep apnea. J Clin Sleep Med. 2019;15(8):1125–1133. doi: 10.5664/jcsm.7804.
- 72. Boll S. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans Acoust, Speech, Signal Process. 1979;27(2):113–120. doi: 10.1109/TASSP.1979.1163209.
- 73. Pan J, Tompkins WJ. A real-time QRS detection algorithm. IEEE Trans Biomed Eng. 1985;3:230–236. doi: 10.1109/TBME.1985.325532.
- 74. Guo W, Wang J, Wang S. Deep multimodal representation learning: a survey. IEEE Access. 2019;7:63373–63394. doi: 10.1109/ACCESS.2019.2916887.
- 75. Butepage J, Black MJ, et al. (2017) Deep representation learning for human motion prediction and classification. In: Proceedings IEEE conference on computer vision and pattern recognition. pp 6158–6166. doi: 10.1109/CVPR.2017.173.
- 76. Wang MY. (2019) Deep graph library: towards efficient and scalable deep learning on graphs. In: Proceedings ICLR workshop represent. learn. Graphs Manifolds, pp 1–7
- 77. Wu J, Zhang Y, et al. (2020) AttenNet: deep attention based retinal disease classification in OCT images. In: Proceedings international conference multimedia modeling: Springer. pp 565–576. doi: 10.1007/978-3-030-37734-2_75
- 78. Khodatars M, Shoeibi A, et al. Deep learning for neuroimaging-based diagnosis and rehabilitation of autism spectrum disorder: a review. Comput Biol Med. 2021;139:104949. doi: 10.1016/j.compbiomed.2021.104949.
- 79. Shoeibi A, Khodatars M, et al. Applications of deep learning techniques for automated multiple sclerosis detection using magnetic resonance imaging: a review. Comput Biol Med. 2021;136:104697. doi: 10.1016/j.compbiomed.2021.104697.
- 80. Agarwal C, Gupta S, et al. Deep learning analyses of brain MRI to identify sustained attention deficit in treated obstructive sleep apnea: a pilot study. Sleep Vigilance. 2022;15:1–6. doi: 10.1007/s41782-021-00190-0.