Digital Health. 2023 Mar 14; 9: 20552076231163783. doi: 10.1177/20552076231163783

Validation of the influence of biosignals on performance of machine learning algorithms for sleep stage classification

Junggu Choi 1, Seohyun Kwon 2, Sohyun Park 3, Sanghoon Han 1,4
PMCID: PMC10017951  PMID: 36937698

Abstract

Background

Sleep stage identification is critical in multiple areas (e.g. medicine or psychology) to diagnose sleep-related disorders. Previous studies have reported that the performance of machine learning algorithms for sleep stage classification varies with the biosignals used and the feature-extraction process.

Methods

To compare as many conditions as possible, 414 experimental conditions were applied, considering the combination of different biosignals, biosignal length, and window length. Five biosignals in polysomnography (i.e. electrocardiogram (ECG), electroencephalogram (EEG), electromyogram (EMG), electrooculogram left, and electrooculogram right) were used to identify optimal signal combinations for classification. In addition, three different signal-length conditions and six different window-length conditions were applied. The validity of each condition was examined via classification performance from the XGBoost classifiers trained using 10-fold cross-validation. Furthermore, results considering feature importance were examined to validate the experimental results in terms of model explanation.

Results

The combination of EEG + EMG + ECG with a 40 s window and 120 s signal length resulted in the best classification performance (precision: 0.853, recall: 0.855, F1-score: 0.853, and accuracy: 0.853). Compared to other conditions and feature importance results, EEG signals showed a relatively higher importance for classification in the present study.

Conclusion

We determined the optimal biosignal and window conditions for the feature-extraction process in machine learning algorithm-based sleep stage classification. Our experimental results can inform researchers conducting related studies in the future. To generalize our results, more diverse methodologies and conditions should be applied in future studies.

Keywords: Sleep stage classification, polysomnography, biosignal, machine learning, classification algorithm

Introduction

The number of people with sleep-related disorders is increasing continuously, while the underlying causes can be diverse.1–3 To date, and especially during the coronavirus disease 2019 (COVID-19) pandemic, the prevalence of sleep disturbances has increased widely, affecting various subpopulations. In a meta-analytic study, Al Maqbali et al.4 examined the psychological impact of stress and sleep disturbances associated with the COVID-19 pandemic on nurses working in hospitals. The authors suggested that sleep disturbance and depression among nurses during the COVID-19 pandemic were more prevalent than during the previous Middle East respiratory syndrome and severe acute respiratory syndrome outbreaks. Deng et al.5 investigated rates of sleep disturbance in college students in a systematic review; the prevalence of sleep disturbance and the associated risk of mental illness increased with the duration of the ongoing pandemic as well as with higher age. In contrast to the aforementioned studies, Ara et al.6 conducted a web-based survey of the general population, including 1128 individuals from Bangladesh, to identify sleep disturbance during the COVID-19 lockdown. They found various factors, such as working from home or taking online classes, to be linked with the presence of sleep disorders during the pandemic.

The classification of sleep stages is important when examining sleep disorders and disturbances. Various methodologies have been applied to measure the depth or stage of sleep. Haythornthwaite et al.7 developed a sleep diary assessment for patients with chronic pain based on diverse categories of evaluation questions (e.g. difficulties falling asleep, early awakening, and quality of sleep). Currie et al.8 collected sleep reports from patients with alcohol dependency to evaluate sleep problems. They found similar difficulties falling asleep among alcoholics with short-term and long-term abstinence.

In recent research, physiological data collected from participants have been widely used to overcome biases associated with self-reports in the form of sleep diaries. Yong et al.9 conducted polysomnography studies including 124 participants with Parkinson's disease, and revealed altered sleep architecture and reduced sleep duration in these patients. They analyzed variations in several biosignals, including electroencephalogram (EEG), electrooculogram (EOG), electromyogram (EMG), and electrocardiogram (ECG). In addition, Goyal et al.10 identified risk factors related to obstructive sleep apnea in COVID-19 patients using a polysomnography dataset. Data from EEG, EOG, EMG, and body position were used to compare apnea levels and sleep depth.

In previous studies, machine learning algorithms have been widely used to find latent patterns in multiple biosignals and variables. Arslan et al.11 used machine learning classification models to automatically score sleep stages using multichannel polysomnography data. The framework proposed by the authors showed superior sleep stage classification performance compared with models used in previous studies. Furthermore, Satapathy et al.12 proposed machine learning models for the classification of sleep stages. Their system focused on sleep irregularities based on abnormal sleep patterns. They used a polysomnographic dataset to evaluate the performance of their model, and the respective framework achieved higher classification accuracy than previously proposed models.

Similar to the existing studies mentioned above, we confirmed that diverse biosignal data from polysomnography can be applied to develop sleep stage classification using machine learning models. In addition, we reviewed several studies related to feature-extraction conditions and types of biosignals. Wongsirichot and Hanskunatai13 compared four machine learning algorithms (k-means clustering, k-nearest neighbor, support vector machine, and multilayer perceptron) based on four biosignals (EEG, muscle movement, ECG, and thoracic respiratory effort) in a polysomnography dataset. The authors determined that the classification performance of the machine learning algorithms changed with the combination of biosignals and, based on their results, suggested the importance of investigating the optimal features for sleep-level detection. The influence of EEG features on machine learning classifiers was validated by Satapathy et al.14 Twelve features were calculated from the EEG signals in polysomnography datasets, and the classification performances of the machine learning algorithms were compared across three feature-set conditions (12, 9, and 5 features). The three feature conditions yielded different performance depending on the combination of EEG features. Santaji et al.15 applied three different epoch lengths (1, 2, and 10 s) in the EEG feature-extraction process to identify the effect of EEG signal length on the sleep scoring performance of machine learning classifiers. The authors verified that the classification performance of three classifiers (decision tree, support vector machine, and random forest) can be altered by the feature-extraction conditions. Based on associated studies, including the three mentioned above, we evaluated the classification performance of machine learning algorithms using several combinations of biosignals (i.e. ECG, EEG, EOG left (EOGL), EOG right (EOGR), and EMG) in this study. Furthermore, different combinations of window and signal lengths during feature extraction were compared. Finally, the performance of each model was validated by examining feature importance.

Methods

Overview

To compare the influences of different biosignals and several feature-extraction conditions on the performance of machine learning algorithms for sleep stage classification, we composed a five-step research scheme. First, a total of five biosignals (ECG, EEG, EMG, EOG from left eye movements, and EOG from right eye movements) were extracted from the polysomnography dataset (sleep heart health study dataset). Second, 64 features were calculated from the 5 selected biosignals under diverse window- and signal-length conditions. Third, utilizing the 64 extracted feature sets, we created combinations based on the biosignals (e.g. combinations with two signals: ECG + EEG or ECG + EMG). Fourth, each dataset of predefined conditions was used to train and evaluate the machine learning classification algorithm (XGBoost classifier). Finally, the classification performance of the XGBoost classifier was evaluated using four performance indices, and the feature importance results for the experimental condition with the highest classification performance were examined based on the trained algorithms. A detailed depiction of the present research scheme is presented in Figure 1.

Figure 1.

Overview of the research scheme. ECG: electrocardiogram; EEG: electroencephalogram; EMG: electromyogram; EOGL: electrooculogram left; EOGR: electrooculogram right.
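The five-step scheme can also be viewed as a driver loop over the experimental conditions. The following minimal sketch (the function arguments `extract_features` and `train_and_evaluate` are hypothetical placeholders, not the authors' code) illustrates how the conditions arise from combining biosignal sets, window lengths, and signal lengths; with the 23 biosignal combinations of Table 2, this loop yields the 414 conditions evaluated in this study.

```python
from itertools import product

# Conditions described in the Methods; the biosignal sets shown here are only a
# few of the 23 combinations listed in Table 2.
SIGNAL_SETS = [("EEG", "ECG", "EMG"), ("EEG", "ECG"), ("EEG",)]
WINDOW_LENGTHS_S = [15, 20, 30, 40, 50, 60]   # six window-length conditions
SIGNAL_LENGTHS_S = [60, 90, 120]              # three signal-length conditions

def run_all_conditions(extract_features, train_and_evaluate):
    """Evaluate one classifier per (signals, window length, signal length) condition."""
    results = {}
    for signals, window_s, signal_s in product(SIGNAL_SETS, WINDOW_LENGTHS_S, SIGNAL_LENGTHS_S):
        X, y = extract_features(signals, window_s=window_s, signal_s=signal_s)
        results[(signals, window_s, signal_s)] = train_and_evaluate(X, y)  # 10-fold CV inside
    return results
```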

Data source

An open-source polysomnography dataset from the sleep heart health study (SHHS) was used.16,17 The SHHS is a multicenter cohort study conducted by the National Heart Lung and Blood Institute in the United States to investigate cardiovascular and other consequences of sleep-disordered breathing. A total of 9736 participants (mean age: 40 years) were tested for associations between sleep-related breathing and the risk of heart disease, stroke, and hypertension. The dataset was collected over two cycles: the first cycle (SHHS visit 1) included surveys from 6441 participants enrolled between November 1, 1995, and January 31, 1998, and the second-cycle surveys (SHHS visit 2) were conducted from January 2001 to June 2003 and included 3295 participants. The final dataset comprised polysomnography and survey data. For the polysomnography data, the collected biosignals were saved in EDF file format, with one EDF file per participant; consequently, 9736 EDF files were included in the SHHS dataset. Each biosignal in the EDF files was labeled with sleep level scores in 30 s intervals, and a total of six sleep level scores were included (i.e. the awake level and levels 1–5 based on sleep depth). For the survey data, each response to the survey questions was included in two Excel files (i.e. SHHS1.xlsx and SHHS2.xlsx). The detailed subcategories of the variables of the SHHS dataset are listed in Table 1.

Table 1.

Categories of variables in the SHHS dataset.

No. Category No. Category No. Category
1 Demographics 5 Medication 9 Family history CVD
2 SES 6 Smoking 10 Diabetes
3 Obesity/overweight 7 Alcohol intake 11 Lipids
4 Blood pressure/hypertension 8 Subclinical CVD 12 Respiratory diseases and symptoms

CVD: cardiovascular disease; SHHS: sleep heart health study.
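As a minimal sketch of how one of the per-participant EDF files described above might be inspected in Python (the file name and channel labels below are assumptions for illustration; actual SHHS channel names should be taken from the EDF header), the MNE library could be used:

```python
import mne

# Hypothetical file name; the SHHS distributes one EDF recording per participant.
raw = mne.io.read_raw_edf("shhs1-200001.edf", preload=True, verbose=False)

print(raw.ch_names)        # inspect the channel labels stored in the EDF header
print(raw.info["sfreq"])   # sampling frequency used by MNE after loading

# Assumed channel names for illustration only; adjust to the labels printed above.
ecg = raw.get_data(picks=["ECG"])[0]
eeg = raw.get_data(picks=["EEG"])[0]
```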

Feature extraction from biosignals

An overview of the feature-extraction process is shown in Figure 2. Of the two data categories in the SHHS dataset (biosignals and survey data), we used only the five biosignals (ECG, EEG, EMG, EOGL, and EOGR) from the polysomnography recordings. To examine the influence of signal length and window length during feature extraction on classification, we created 18 conditions (3 signal-length conditions × 6 window-length conditions). Three signal-length conditions (60, 90, and 120 s) were applied, and sliced signals were used for feature extraction under six window-length conditions (15, 20, 30, 40, 50, and 60 s). Before feature extraction, we extracted consecutive intervals of each biosignal sharing the same sleep label so that the aforementioned conditions could be applied (e.g. a 2 min stretch of biosignal with the same sleep level corresponds to four consecutive 30 s intervals with the same sleep score). The sampling frequency of each of the five biosignals was also considered in the feature-extraction process; for example, ECG signals were measured at a sampling frequency of 250 Hz, so a 60 s signal-length condition corresponded to 15,000 samples. Additionally, to validate diverse features extracted from the 5 biosignals, we calculated 64 features from the signals. Detailed lists of the features are given in Appendices A and B (additional descriptions of each feature are included in Appendices C and D). Furthermore, 23 combinations of biosignals were used to validate the usability of each signal; the combinations are listed in Table 2. As a result, we evaluated classification performance for 414 conditions (23 biosignal combinations × 18 signal- and window-length conditions) in this study. To reflect the characteristics of the biosignals as much as possible, all biosignal samples utilized in this study were normalized to a range of 0 to 1 before feature extraction.

Figure 2.

Example of feature-extraction process (15 s length window and 120 s length ECG signals). ECG: electrocardiogram.

Table 2.

Combinations of five biosignals in the SHHS dataset.

No. Number of signals Combination No. Number of signals Combination
1 4 ECG + EEG + EMG + EOGL 13 2 EEG + ECG
2 ECG + EEG + EMG + EOGR 14 EEG + EMG
3 3 ECG + EMG + EOGL 15 EEG + EOGL
4 ECG + EMG + EOGR 16 EEG + EOGR
5 EEG + ECG + EMG 17 EMG + EOGL
6 EEG + ECG + EOGL 18 EMG + EOGR
7 EEG + ECG + EOGR 19 1 ECG
8 EEG + EMG + EOGL 20 EEG
9 EEG + EMG + EOGR 21 EMG
10 2 ECG + EMG 22 EOGL
11 ECG + EOGL 23 EOGR
12 ECG + EOGR

ECG: electrocardiogram; EEG: electroencephalogram; EMG: electromyogram; EOGL: electrooculogram left; EOGR: electrooculogram right.
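The slicing and normalization described above can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes a 1-D signal already restricted to consecutive epochs sharing one sleep label, and it applies the 0 to 1 min-max normalization per segment (the paper states that signals were normalized to 0 to 1 before feature extraction but does not specify the normalization scope).

```python
import numpy as np

def slice_windows(signal, sfreq, signal_len_s, window_len_s):
    """Cut a labelled segment of `signal_len_s` seconds into non-overlapping
    windows of `window_len_s` seconds, after min-max normalization to [0, 1]."""
    segment = np.asarray(signal[: int(signal_len_s * sfreq)], dtype=float)
    segment = (segment - segment.min()) / (segment.max() - segment.min() + 1e-12)
    win = int(window_len_s * sfreq)
    n_windows = len(segment) // win
    return segment[: n_windows * win].reshape(n_windows, win)

# Example: 250 Hz ECG, 120 s signal length, 40 s windows -> 3 windows of 10,000 samples each.
windows = slice_windows(np.random.randn(40_000), sfreq=250, signal_len_s=120, window_len_s=40)
print(windows.shape)  # (3, 10000)
```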

Machine learning algorithm for validation

Among various machine learning classification algorithms, we selected extreme gradient boosting (XGBoost) classifiers based on previous studies on similar research topics.18–20 These supervised algorithms can be used for both regression and classification problems; in our case, we applied the XGBoost algorithm to classify sleep stages. Because these algorithms are ensembles of decision tree models, the classification and regression tree (CART) algorithm is the basis of XGBoost. The predicted values from the multiple CART models (i.e. decision trees) are summed to calculate the final prediction, which is given by the following equation:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F} \tag{1}$$

where $\hat{y}_i$ denotes the final prediction for input $x_i$, obtained by summing the outputs of the $K$ decision trees, and $f_k$ denotes the $k$th decision tree (CART model) drawn from the space of trees $\mathcal{F}$. Based on the predicted value $\hat{y}_i$, the objective function of the XGBoost classifier measures the difference between the prediction and the target using a loss function. The objective function of the XGBoost classifier is as follows:

$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \tag{2}$$

In Equation (2), $l$ denotes the loss function comparing the target value $y_i$ with the predicted value $\hat{y}_i$. In addition, a regularization term $\Omega(f_k)$ for each decision tree is added to prevent overfitting of the algorithm. In summary, the XGBoost algorithm determines its final prediction from the predictions of the multiple trained decision tree models.
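As a minimal sketch of how an XGBoost classifier could be trained and evaluated with 10-fold cross-validation for multi-class sleep staging (the hyperparameters, feature matrix `X`, and label vector `y` below are illustrative assumptions, not the settings reported in this study):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

def cross_validate_xgb(X, y, n_splits=10, seed=0):
    """Train and evaluate an XGBoost classifier with stratified 10-fold cross-validation.
    Labels in `y` are assumed to be encoded as integers 0..n_classes-1."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_acc, models = [], []
    for train_idx, test_idx in skf.split(X, y):
        model = XGBClassifier(
            objective="multi:softmax",  # multi-class sleep stage labels
            n_estimators=200,           # illustrative hyperparameters only
            max_depth=6,
            learning_rate=0.1,
        )
        model.fit(X[train_idx], y[train_idx])
        fold_acc.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
        models.append(model)
    return models, float(np.mean(fold_acc))
```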

Evaluation metrics

To evaluate the classification performance of the XGBoost classifiers, we utilized four evaluation metrics (precision, recall, F1-score, and accuracy). To obtain these metrics, we computed confusion matrices from the trained classifiers. From each confusion matrix, the numbers of true positive (TP), false positive (FP), true negative (TN), and false negative (FN) samples were obtained; TP and TN correspond to correctly classified samples, whereas FP and FN correspond to incorrectly classified samples. Finally, the four evaluation indices were obtained using the following equations:

$$\text{Precision} = \frac{TP}{TP+FP} \tag{3}$$
$$\text{Recall} = \frac{TP}{TP+FN} \tag{4}$$
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision}+\text{Recall}} \tag{5}$$
$$\text{Accuracy} = \frac{TP+TN}{TP+FP+TN+FN} \tag{6}$$
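Equations (3) to (6) can be computed directly from the predictions, for instance with scikit-learn. The sketch below uses macro-averaging across the sleep stages as one plausible way to obtain a single value per metric in the multi-class setting; the averaging scheme is an assumption, as it is not stated in the paper.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

def evaluate(y_true, y_pred):
    """Compute the four evaluation metrics (and the confusion matrix) for sleep staging."""
    return {
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "f1_score": f1_score(y_true, y_pred, average="macro", zero_division=0),
        "accuracy": accuracy_score(y_true, y_pred),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }
```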

Tools

XGBoost classifiers were built, and data preprocessing was performed using Python (version 3.7.1; scikit-learn, version 2.4.1) and R (version 4.0.3).

Results

Classification performances of machine learning classifier

Based on the extracted features, 414 final datasets corresponding to the experimental conditions were used with the XGBoost classifier model. Each final dataset contained an average of approximately 1,200,000 rows, with the features of the corresponding signal combination as columns. For example, for the dataset with a 120 s signal length and a 15 s window from ECG signals, the dimension of the dataset was (1,233,053, 18), where 1,233,053 denotes the number of rows and 18 indicates the number of ECG features.

Using the aforementioned 414 datasets, we compared the classification performances across the 414 experimental conditions to determine the optimal window and biosignal length for sleep stage identification. Among the tested conditions, "EEG + EMG + ECG" with a 40 s window and 120 s signal length showed the highest evaluation metric values (precision: 0.853, recall: 0.855, F1-score: 0.853, and accuracy: 0.853). Table 3 and Appendices E, F, G, H, and I present the full results.

Table 3.

Averaged classification performances with features from a 40 s length window.

Signal condition (combinations) Window and signal length: 40 s and 120 s Window and signal length: 40 s and 90 s Window and signal length: 40 s and 60 s
Precision Recall F1-score Accuracy Precision Recall F1-score Accuracy Precision Recall F1-score Accuracy
ECG + EEG + EMG + EOGL 0.850 0.841 0.845 0.849 0.835 0.832 0.835 0.833 0.733 0.735 0.731 0.734
ECG + EEG + EMG + EOGR 0.851 0.843 0.839 0.846 0.835 0.833 0.832 0.833 0.732 0.734 0.730 0.734
ECG + EMG + EOGL 0.848 0.853 0.854 0.853 0.805 0.805 0.802 0.802 0.453 0.471 0.458 0.471
ECG + EMG + EOGR 0.847 0.843 0.840 0.854 0.811 0.816 0.825 0.817 0.459 0.478 0.465 0.478
EEG + ECG + EMG 0.853 0.855 0.853 0.853 0.826 0.821 0.811 0.821 0.735 0.732 0.730 0.731
EEG + ECG + EOGL 0.849 0.847 0.844 0.848 0.828 0.828 0.820 0.818 0.684 0.687 0.683 0.687
EEG + ECG + EOGR 0.848 0.845 0.845 0.840 0.832 0.830 0.831 0.824 0.680 0.673 0.671 0.673
EEG + EMG + EOGL 0.852 0.847 0.847 0.843 0.831 0.832 0.835 0.824 0.745 0.740 0.737 0.739
EEG + EMG + EOGR 0.853 0.848 0.851 0.845 0.833 0.832 0.837 0.831 0.741 0.737 0.734 0.736
ECG + EMG 0.446 0.463 0.452 0.463 0.439 0.458 0.445 0.458 0.429 0.450 0.434 0.450
ECG + EOGL 0.463 0.469 0.462 0.462 0.330 0.340 0.333 0.340 0.451 0.463 0.460 0.463
ECG + EOGR 0.463 0.465 0.474 0.466 0.331 0.344 0.334 0.344 0.457 0.461 0.461 0.451
EEG + ECG 0.840 0.839 0.842 0.844 0.829 0.827 0.829 0.826 0.682 0.682 0.679 0.681
EEG + EMG 0.829 0.829 0.831 0.835 0.819 0.823 0.824 0.818 0.740 0.734 0.732 0.734
EEG + EOGL 0.832 0.828 0.823 0.827 0.818 0.822 0.825 0.821 0.683 0.685 0.681 0.684
EEG + EOGR 0.831 0.837 0.836 0.830 0.822 0.820 0.822 0.818 0.673 0.672 0.669 0.671
EMG + EOGL 0.475 0.470 0.482 0.490 0.497 0.503 0.492 0.495 0.426 0.447 0.431 0.447
EMG + EOGR 0.493 0.479 0.485 0.497 0.503 0.504 0.495 0.491 0.432 0.453 0.437 0.453
ECG 0.302 0.308 0.303 0.308 0.302 0.308 0.303 0.308 0.281 0.289 0.282 0.289
EEG 0.814 0.816 0.817 0.822 0.819 0.823 0.820 0.818 0.700 0.703 0.698 0.702
EMG 0.397 0.418 0.403 0.418 0.397 0.421 0.402 0.421 0.401 0.426 0.405 0.426
EOGL 0.455 0.450 0.451 0.449 0.460 0.461 0.452 0.458 0.265 0.278 0.264 0.278
EOGR 0.451 0.453 0.456 0.453 0.461 0.455 0.462 0.470 0.267 0.290 0.246 0.290

ECG: electrocardiogram; EEG: electroencephalogram; EMG: electromyogram; EOGL: electrooculogram left; EOGR: electrooculogram right.

Feature importance of experimental condition with best classification performances

To validate model performance in terms of the features important for classification, we examined the feature importance of the trained classifiers. The feature importance of the model with the highest classification performance (i.e. the "EEG + ECG + EMG" condition with a 40 s window and 120 s signals) was examined. Ten sets of feature importance results were compared because 10-fold cross-validation was used for model training and evaluation. The same trend was confirmed for the features ranked first to fourth: only features extracted from the EEG signal ("DELTA," "Higuchi_Fractal_Dimension," "Petrosian_Fractal_Dimension," and "Detrended_Fluctuation_Analysis") appeared in the top four. Features extracted from the other signals (ECG and EMG) showed different ranking trends. The feature importance results for the top 1 to 10 features are detailed in Table 4.

Table 4.

Top 10 feature importance of EEG + ECG + EMG condition with 40 s window and 120 s signals.

1CV 2CV 3CV 4CV 5CV
Feature F-score Feature F-score Feature F-score Feature F-score Feature F-score
DELTA 132 DELTA 134 DELTA 125 DELTA 133 DELTA 136
Higuchi_Fractal_Dimension 67 Higuchi_Fractal_Dimension 76 Higuchi_Fractal_Dimension 75 Higuchi_Fractal_Dimension 71 Higuchi_Fractal_Dimension 63
Petrosian_Fractal_Dimension 59 Petrosian_Fractal_Dimension 66 Petrosian_Fractal_Dimension 56 Petrosian_Fractal_Dimension 55 Petrosian_Fractal_Dimension 59
Detrended_Fluctuation_Analysis 44 Detrended_Fluctuation_Analysis 51 Detrended_Fluctuation_Analysis 42 Detrended_Fluctuation_Analysis 48 Detrended_Fluctuation_Analysis 41
HRV_TINN 39 HRV_TINN 48 HRV_RMSSD 41 WAMP 45 WAMP 39
HRV_MCVNN 38 Hjorth_mobility 41 WAMP 36 HRV_SDNN 23 HRV_TINN 32
Hurst_Exponent 34 WAMP 27 HRV_MeanNN 24 Hurst_Exponent 23 Hjorth_mobility 27
WAMP 26 HRV_MCVNN 26 Hurst_Exponent 23 Hjorth_mobility 23 Hurst_Exponent 20
HRV_SDNN 26 Hurst_Exponent 20 HRV_SDNN 19 HRV_MCVNN 18 HRV_SDNN 16
HRV_MeanNN 25 WL 16 Hjorth_mobility 15 HRV_IQRNN 16 PKF 16
6CV 7CV 8CV 9CV 10CV
Feature F-score Feature F-score Feature F-score Feature F-score Feature F-score
DELTA 134 DELTA 128 DELTA 123 DELTA 133 DELTA 134
Higuchi_Fractal_Dimension 79 Higuchi_Fractal_Dimension 71 Higuchi_Fractal_Dimension 80 Higuchi_Fractal_Dimension 85 Higuchi_Fractal_Dimension 74
Petrosian_Fractal_Dimension 71 Petrosian_Fractal_Dimension 60 Petrosian_Fractal_Dimension 60 Petrosian_Fractal_Dimension 68 Petrosian_Fractal_Dimension 61
Detrended_Fluctuation_Analysis 42 Detrended_Fluctuation_Analysis 49 Detrended_Fluctuation_Analysis 39 Detrended_Fluctuation_Analysis 63 Detrended_Fluctuation_Analysis 54
WAMP 35 HRV_TINN 34 WAMP 29 WAMP 38 Hjorth_mobility 30
Hurst_Exponent 32 WAMP 29 HRV_TINN 21 Hurst_Exponent 34 HRV_TINN 27
Hjorth_mobility 25 HRV_SDNN 25 HRV_MCVNN 21 Hjorth_mobility 28 HRV_SDNN 26
WL 19 WENT 23 Hurst_Exponent 20 HRV_SDNN 23 HRV_MCVNN 19
HRV_SDNN 18 Hjorth_mobility 19 HRV_IQRNN 14 MDF 20 Hurst_Exponent 18
Hjorth_complexity 18 HRV_MCVNN 19 HRV_CVNN 13 HRV_MadNN 20 RMS 18

ECG: electrocardiogram; EEG: electroencephalogram; EMG: electromyogram; EOGL: electrooculogram left; EOGR: electrooculogram right.
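The F-scores in Table 4 are consistent with XGBoost's weight-based importance, i.e. the number of times a feature is used in a split across all trees; whether this exact importance type was used is an assumption. A minimal sketch of extracting and ranking such scores from a fitted classifier (assuming the scikit-learn XGBoost wrapper trained on data with named feature columns, so that feature names are preserved in the booster) follows.

```python
def top_features(model, k=10):
    """Return the k features with the highest XGBoost 'weight' importance
    (number of times each feature is used in a split), as shown by plot_importance."""
    scores = model.get_booster().get_score(importance_type="weight")
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]

# Example usage with a fitted XGBClassifier `model` from one cross-validation fold:
# for name, f_score in top_features(model):
#     print(name, f_score)
```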

Discussion

In this study, we tested the classification performance of XGBoost classifiers under diverse conditions to determine the optimal window and signal conditions for sleep stage classification. Before conducting our research, we sought reasonable evidence supporting our research topic (i.e. identifying optimal signal and window conditions for sleep stage classification with machine learning algorithms). First, regarding the two keywords machine learning and sleep stage classification, Surantha et al.21 utilized an extreme learning machine and support vector machine (SVM) to classify sleep stages under diverse class conditions, and Aboalayon et al.22 compared five supervised machine learning classification algorithms (decision tree, neural network, k-nearest neighbors, naive Bayes, and SVM) for sleep stage classification tasks.

Second, regarding optimal feature-extraction conditions (the third keyword), Satapathy et al.23 validated 12 statistical features from input biosignals to find optimal feature sets for sleep-level identification; the usability of each feature was verified using the accuracy of random forest algorithms. Santaji and Desai24 extracted nine EEG features, including time- and frequency-domain features, to detect rapid eye movement (REM) and non-REM sleep stages. In addition, different conditions for the amplitude and frequency ranges of the EEG signals were used in the feature-extraction process, and the utility of each feature under these conditions was compared via the performance of machine learning algorithms. Based on the aforementioned studies, we concluded that our research aims were appropriate.

To construct a research scheme for our study, we considered several previous studies with similar research topics. Şen et al.25 applied five machine learning classification algorithms (random forest, feed-forward neural network, decision tree, support vector machine, and radial basis function neural network) to identify sleep levels. Their study consisted of three stages: feature extraction from EEG signals, feature selection, and classification using machine learning algorithms. In the first stage, 41 features in 4 categories (time, nonlinear, frequency-based, and entropy) were extracted from the EEG signals. In the feature-selection stage, the most effective of the 41 features were selected using 5 algorithms ("fast correlation based filter," "mRMR algorithm," "fisher score algorithm," "t-test algorithm," and "ReliefF algorithm"). In the last stage, five machine learning classifiers were used to compare classification performance in sleep scoring.

Ugi et al.26 proposed a sleep stage classification framework with a machine learning classifier for two classes (awake and sleep). Their research comprised four phases ("segmentation and filtering," "feature extraction," "estimation," and "performance check"). In the first phase, the ECG signal collected from each participant was segmented into 30 s epochs and filtered using a finite impulse response filter with a passband of 0.05 ∼ 35 Hz. In the second phase (feature extraction), three ECG features (mean, variance, and standard deviation) were extracted from each 30 s segment. The three extracted features were applied to SVM classifiers for sleep stage classification in the estimation phase. Finally, the classification performance of the optimized SVM models was evaluated using three metrics (accuracy, precision, and recall).

Satapathy and Loganathan27 suggested a classification methodology that uses dual-channel EEG signals for automated sleep staging. Their research was composed of three steps ("feature extraction from EEG signals," "feature selection," and "classification with machine learning algorithms"). In the first step, linear and nonlinear features were extracted from the input signals. In the second step, the optimal features were selected from the extracted feature sets using the ReliefF weight algorithm. In the final step, a random forest classification model was trained and evaluated using a 10-fold cross-validation strategy.

Similar to previous studies, including those mentioned above, we included several common steps ("feature extraction from biosignals," "classification with machine learning algorithm," and "performance evaluation with metrics") in our research scheme. However, in our research, we focused on validating the influence of window length and biosignals in the feature-extraction process. To compare the effects of window length and biosignal length in feature extraction, 6 window-length conditions (15, 20, 30, 40, 50, and 60 s) and 3 biosignal-length conditions (60, 90, and 120 s) were used. Additionally, a total of five biosignals (ECG, EEG, EMG, EOGL, and EOGR) and their combinations were utilized to identify the optimal combination of biosignals for sleep stage classification. Furthermore, unlike previous studies that compared several machine learning classifiers, only a single machine learning algorithm was used in this research to concentrate on the effects of signal and window length on classification. Among the diverse machine learning classifiers available, we selected XGBoost classifiers based on previous studies. Siyuan et al.28 compared three machine learning algorithms (XGBoost, AdaBoost, and SVM) in sleep staging research; in their experiments, XGBoost classifiers showed better performance (accuracy: 90.6%) than the AdaBoost and SVM classifiers, which have been widely applied in related studies. In addition, Choi et al.29 used XGBoost classifiers to develop a framework for detecting extreme drowsiness using short-segment EEG signals, demonstrating the potential of these algorithms for classification with relatively short biosignal lengths.

To interpret our experimental results, we compared them with those of previous studies. First, the condition with EEG + EMG + ECG, a 40 s window, and 120 s signal length showed the best classification performance (precision: 0.853, recall: 0.855, F1-score: 0.853, and accuracy: 0.853) among all experimental conditions. Choi et al.29 observed similar trends (accuracy: 0.788, sensitivity: 0.788, and specificity: 0.787) in the classification performance of XGBoost classifiers in a similar research setting. Similar to our study, those authors applied only filtering methods without detailed preprocessing steps; they extracted features from EEG signals using a 2 s window, and their framework classified two classes (extremely drowsy and normal). Hei et al.30 also suggested an XGBoost-based sleep stage classification framework with a similar performance level (average accuracy: 0.830). Their research design was similar to ours: they applied only filtering methods to preprocess the EEG and EOG signals, and each feature was calculated using a 30 s window.

Second, for the three biosignals EEG, ECG, and EMG, we compared the relative importance of each biosignal through the performance of other combinations (e.g. EEG + EMG or EEG + ECG) under the same window and signal length. Among the combinations of two biosignals, the ECG + EMG condition showed a precision of 0.446, recall of 0.463, F1-score of 0.452, and accuracy of 0.463, whereas the conditions including EEG performed better (EEG + EMG: precision 0.829, recall 0.829, F1-score 0.831, accuracy 0.835; EEG + ECG: precision 0.840, recall 0.839, F1-score 0.842, accuracy 0.844). Similarly, among single biosignals, the best performance was observed for the EEG condition (precision: 0.814, recall: 0.816, F1-score: 0.817, and accuracy: 0.822). Bin Heyat et al.31 compared the performance of several combinations of ECG, EMG, and EEG signals, and the conditions including EEG signals exhibited the best classification performance in their experiments.

Finally, in the feature importance results, four EEG features ("DELTA," "Higuchi_Fractal_Dimension," "Petrosian_Fractal_Dimension," and "Detrended_Fluctuation_Analysis") were consistently ranked first to fourth. This is consistent with the aforementioned finding that EEG signals are the most important for classification. Furthermore, the delta wave of the EEG signal is related to sleep.32,33 These results further support the validity of the present findings.

Strengths and limitations

This study has several strengths and limitations. Regarding strengths, first, diverse combinations (i.e. 414 experimental conditions) of 5 biosignals and feature-extraction conditions (signal and window length) were compared to determine the optimal conditions for sleep stage classification. Second, we validated our results using the feature importance of the trained XGBoost classifiers with the highest classification performance. However, our study also had some limitations. First, we used only one machine learning algorithm (i.e. the XGBoost classifier) to investigate the optimal conditions for sleep stage classification; although only one algorithm was applied, it has been widely used in previous studies and has attained higher performance than other algorithms. Second, other latent patterns in biosignals for sleep stage classification could be identified using other data-driven algorithms (e.g. deep learning algorithms). Third, rather than using all six stages, different sleep stage combinations could be applied to find meaningful features for classification (e.g. classifying awake versus stage 1). We consider that our experimental results can serve as preliminary data for associated studies, and we plan to examine other sleep stage conditions and patterns in future work.

Conclusion

Accurate sleep stage classification is critical for various fields, including medicine and psychology. In this study, we compared several window lengths and biosignal conditions in feature extraction to determine the optimal combination of biosignals and respective conditions for sleep stage classification using machine learning algorithms. To examine the influence of each condition on classification performance, 414 experimental conditions, including different biosignal combinations, were applied. We found that the EEG, ECG, and EMG combination with a 40 s window and 120 s signal length showed the best classification performance (precision: 0.853, recall: 0.855, F1-score: 0.853, and accuracy: 0.853) across all evaluation metrics used in the present research setting. In addition, we found that the importance of EEG features was higher than that of ECG and EMG features. Moreover, we validated the importance of EEG features for sleep stage classification by comparing our results with those of previous studies. In conclusion, we confirmed that our results are reasonable in terms of both quantitative (i.e. classification performance) and qualitative (i.e. feature importance) aspects. Our research can provide appropriate evidence regarding window and signal length for researchers who want to conduct similar studies. Furthermore, to generalize our experimental results, we will conduct additional analyses in future studies.

Supplemental Material

sj-docx-1-dhj-10.1177_20552076231163783 - Supplemental material for Validation of the influence of biosignals on performance of machine learning algorithms for sleep stage classification by Junggu Choi, Seohyun Kwon, Sohyun Park and Sanghoon Han in Digital Health

Acknowledgements

We would like to thank Editage (www.editage.co.kr) for editing and reviewing this manuscript for English language.

Appendix

Appendix A.

List of features extracted from biosignals (ECG and EEG).

No. Biosignal Feature No. Biosignal Feature
1 ECG Mean NN 18 LnHF
2 SDNN 19 EEG Delta
3 RMSSD 20 Theta
4 SDSD 21 Alpha
5 CVNN 22 Beta
6 CVSD 23 Delta ratio (%)
7 Median NN 24 Theta ratio (%)
8 Mad NN 25 Alpha ratio (%)
9 MCV NN 26 Beta ratio (%)
10 IQR NN 27 Petrosian Fractal Dimension
11 pNN 50 28 Higuchi Fractal Dimension
12 pNN 20 29 Hjorth mobility
13 HTI 30 SVD Entropy
14 TINN 31 Fisher information
15 HF 32 Approximate Entropy
16 VHF 33 Detrended Fluctuation Analysis
17 HFn 34 Hurst Exponent

ECG: electrocardiogram; EEG: electroencephalogram.

Appendix B.

List of features extracted from biosignals (EMG and EOG).

No. Biosignal Feature No. Biosignal Feature
1 EMG VAR 18 EOG Minimum value
2 RMS 19 Maximum value
3 IEMG 20 Median value
4 MAV 21 Kurtosis
5 LOG 22 Skewness
6 WL 23 Frequency power value (0.1 ∼ 2 Hz)
7 ACC 24 Frequency power value (2 ∼ 4 Hz)
8 DASDV 25 Frequency power value (4 ∼ 6 Hz)
9 ZC 26 Frequency power value (6 ∼ 8 Hz)
10 WAMP 27 Frequency power value (8 ∼ 10 Hz)
11 MYOP 28 Frequency power value (10 ∼ 12 Hz)
12 MNP 29 Frequency power value (12 ∼ 14 Hz)
13 TP 30 Frequency power value (14 ∼ 16 Hz)
14 MNF
15 MDF
16 PKF
17 WENT

ECG: electrocardiogram; EEG: electroencephalogram; EOG: electrooculogram.

Appendix C.

Description of features extracted from biosignals (ECG and EEG).

Biosignals Feature Description or formula
ECG Mean NN Mean of RR interval
SDNN The standard deviation of the RR interval
RMSSD The square root of the mean of the sum of successive differences between RR intervals
SDSD Standard deviation of the successive difference between RR intervals
CVNN Standard deviation of RR intervals divided by the mean of the RR intervals
CVSD The root mean square of the sum of successive differences divided by the mean of the RR intervals
Median NN The median of the RR intervals
Mad NN The median absolute deviation of the RR intervals
MCV NN The median absolute deviation of the RR intervals divided by the median of the absolute differences of their successive differences
IQR NN The interquartile range of the RR intervals
pNN 50 The percentage of successive RR intervals that differ by more than 50 ms
pNN 20 The percentage of successive RR intervals that differ by more than 20 ms
HTI The HRV triangular index, measuring the total number of RR intervals divided by the height of the RR intervals histogram
TINN A geometrical parameter of the HRV (It is an approximation of the RR interval distribution.)
HF The spectral power of high frequencies with range from 0.15 to 0.4 Hz
VHF The spectral power of very high frequencies with range from 0.4 to 0.5 Hz
HFn The normalized high frequency, obtained by dividing the high frequency power by the total power
LnHF The log transformed HF
EEG Delta Spectral power of frequency range from 0.5 to 4 Hz
Theta Spectral power of frequency range from 4 to 7 Hz
Alpha Spectral power of frequency range from 7 to 12 Hz
Beta Spectral power of frequency range from 12 to 30 Hz
Delta ratio (%) Spectral power in frequency ranges from 0.5 to 4 Hz normalized by total power
Theta ratio (%) Spectral power in frequency ranges from 4 to 7 Hz normalized by total power
Alpha ratio (%) Spectral power in frequency ranges from 7 to 12 Hz normalized by total power
Beta ratio (%) Spectral power in frequency ranges from 12 to 30 Hz normalized by total power
Petrosian Fractal Dimension Quantity of spatial information with dissimilar pairs in EEG patterns
Higuchi Fractal Dimension Approximated value for the box-counting dimension of the EEG signals
Hjorth mobility The mean frequency or the proportion of standard deviation of the power spectrum in EEG signal
SVD Entropy The number of eigenvectors in EEG signals calculated by singular value decomposition (SVD)
Fisher information Scalar values calculated in normalized singular spectrum
Approximate Entropy Quantified scalar values of the regularity of EEG signals
Detrended Fluctuation Analysis Quantified scalar values of the correlation property in EEG signals
Hurst Exponent Quantified scalar values of the autocorrelation between values of the EEG signals

ECG: electrocardiogram; EEG: electroencephalogram; EOG: electrooculogram.

Appendix D.

Description of features extracted from biosignals (EMG and EOG).

Biosignals Feature Description
EMG (time domain) VAR The variance of EMG signals (time domain feature)
RMS The square root of the mean of the squared EMG signal values (root mean square)
IEMG The integral values of absolute signal values in EMG signals
MAV Mean absolute values of EMG signals
LOG The exponential value of sum of log absolute EMG signal values
WL The sum of absolute wavelength value in EMG signals
ACC The sum of absolute values about average amplitude change in EMG signals
DASDV Difference absolute standard deviation values in EMG signals
ZC Zero-crossing values in EMG signals
WAMP Willison amplitude values in EMG signals
MYOP Myopulse percentage rate values in EMG signals
EMG (frequency domain) MNP Mean of spectral power values in EMG signals
TP Total value of spectral power values in EMG signals
MNF Mean of frequency of EMG signals
MDF Median of frequency of EMG signals
PKF Peak of frequency in EMG signals
WENT Wavelet energy values in EMG signals
EOG Minimum value The minimum value in EOG signals
Maximum value The maximum value in EOG signals
Median value The median value in EOG signals
Kurtosis The kurtosis value in EOG signals
Skewness The skewness value in EOG signals
Frequency power value (0.1 ∼ 2 Hz) Spectral power of frequency range from 0.1 to 2 Hz in EOG signals
Frequency power value (2 ∼ 4 Hz) Spectral power of frequency range from 2 to 4 Hz in EOG signals
Frequency power value (4 ∼ 6 Hz) Spectral power of frequency range from 4 to 6 Hz in EOG signals
Frequency power value (6 ∼ 8 Hz) Spectral power of frequency range from 6 to 8 Hz in EOG signals
Frequency power value (8 ∼ 10 Hz) Spectral power of frequency range from 8 to 10 Hz in EOG signals
Frequency power value (10 ∼ 12 Hz) Spectral power of frequency range from 10 to 12 Hz in EOG signals
Frequency power value (12 ∼ 14 Hz) Spectral power of frequency range from 12 to 14 Hz in EOG signals
Frequency power value (14 ∼ 16 Hz) Spectral power of frequency range from 14 to 16 Hz in EOG signals

ECG: electrocardiogram; EEG: electroencephalogram; EOG: electrooculogram.

Appendix E.

Averaged classification performances with features from a 15 s window length.

Signal condition (combinations)  Window and signal length: 15 and 120 s  Window and signal length: 15 and 90 s  Window and signal length: 15 and 60 s
Precision Recall F1-score Accuracy Precision Recall F1-score Accuracy Precision Recall F1-score Accuracy
ECG + EEG + EMG + EOGL 0.820 0.823 0.826 0.827 0.832 0.827 0.829 0.826 0.721 0.720 0.717 0.719
ECG + EEG + EMG + EOGR 0.825 0.827 0.823 0.824 0.832 0.827 0.826 0.829 0.723 0.723 0.718 0.722
ECG + EMG + EOGL 0.818 0.836 0.816 0.808 0.791 0.784 0.797 0.796 0.452 0.469 0.457 0.469
ECG + EMG + EOGR 0.810 0.815 0.818 0.828 0.775 0.773 0.805 0.752 0.454 0.472 0.460 0.472
EEG + ECG + EMG 0.816 0.824 0.819 0.825 0.790 0.805 0.803 0.806 0.754 0.753 0.749 0.753
EEG + ECG + EOGL 0.806 0.803 0.812 0.812 0.821 0.817 0.817 0.820 0.721 0.722 0.717 0.721
EEG + ECG + EOGR 0.805 0.811 0.815 0.816 0.822 0.820 0.815 0.820 0.680 0.673 0.671 0.673
EEG + EMG + EOGL 0.819 0.820 0.819 0.820 0.822 0.828 0.822 0.821 0.725 0.726 0.721 0.725
EEG + EMG + EOGR 0.814 0.822 0.820 0.825 0.830 0.825 0.823 0.821 0.750 0.746 0.744 0.746
ECG + EMG 0.422 0.439 0.428 0.439 0.832 0.827 0.829 0.826 0.418 0.439 0.424 0.439
ECG + EOGL 0.431 0.447 0.425 0.432 0.832 0.827 0.826 0.829 0.267 0.276 0.269 0.276
ECG + EOGR 0.455 0.454 0.439 0.450 0.423 0.442 0.429 0.442 0.269 0.277 0.270 0.277
EEG + ECG 0.830 0.831 0.828 0.835 0.341 0.355 0.344 0.355 0.682 0.682 0.679 0.681
EEG + EMG 0.832 0.830 0.823 0.826 0.352 0.363 0.354 0.363 0.737 0.737 0.733 0.736
EEG + EOGL 0.827 0.825 0.822 0.822 0.830 0.824 0.825 0.825 0.683 0.685 0.681 0.684
EEG + EOGR 0.837 0.832 0.833 0.830 0.828 0.823 0.823 0.823 0.673 0.672 0.669 0.671
EMG + EOGL 0.444 0.459 0.447 0.447 0.830 0.825 0.825 0.825 0.378 0.403 0.383 0.403
EMG + EOGR 0.441 0.442 0.433 0.451 0.831 0.832 0.829 0.824 0.421 0.442 0.426 0.442
ECG 0.306 0.310 0.307 0.310 0.452 0.453 0.453 0.450 0.283 0.290 0.284 0.290
EEG 0.803 0.817 0.828 0.818 0.457 0.449 0.447 0.442 0.700 0.703 0.698 0.702
EMG 0.372 0.394 0.379 0.394 0.300 0.305 0.300 0.305 0.380 0.407 0.385 0.407
EOGL 0.405 0.408 0.404 0.415 0.419 0.416 0.423 0.422 0.284 0.308 0.268 0.308
EOGR 0.414 0.421 0.417 0.416 0.417 0.413 0.418 0.418 0.304 0.310 0.277 0.310

ECG: electrocardiogram; EEG: electroencephalogram; EMG: electromyogram; EOGL: electrooculogram left; EOGR: electrooculogram right.

Appendix F.

Averaged classification performances with features from a 20 s window length.

Signal condition (combinations)  Window and signal length: 20 and 120 s  Window and signal length: 20 and 90 s  Window and signal length: 20 and 60 s
Precision Recall F1-score Accuracy Precision Recall F1-score Accuracy Precision Recall F1-score Accuracy
ECG + EEG + EMG + EOGL 0.821 0.822 0.817 0.822 0.834 0.832 0.833 0.832 0.733 0.735 0.731 0.734
ECG + EEG + EMG + EOGR 0.827 0.827 0.822 0.822 0.832 0.834 0.833 0.835 0.732 0.734 0.730 0.734
ECG + EMG + EOGL 0.809 0.815 0.822 0.820 0.810 0.813 0.817 0.813 0.441 0.462 0.447 0.462
ECG + EMG + EOGR 0.807 0.821 0.826 0.822 0.821 0.831 0.822 0.827 0.453 0.471 0.459 0.471
EEG + ECG + EMG 0.805 0.814 0.806 0.806 0.830 0.828 0.821 0.824 0.735 0.732 0.730 0.731
EEG + ECG + EOGL 0.818 0.817 0.822 0.814 0.831 0.830 0.833 0.83 0.721 0.721 0.718 0.721
EEG + ECG + EOGR 0.808 0.815 0.819 0.821 0.821 0.831 0.84 0.836 0.704 0.705 0.701 0.704
EEG + EMG + EOGL 0.813 0.816 0.821 0.820 0.829 0.832 0.837 0.832 0.745 0.740 0.737 0.739
EEG + EMG + EOGR 0.813 0.814 0.815 0.813 0.835 0.833 0.835 0.832 0.741 0.737 0.734 0.736
ECG + EMG 0.436 0.452 0.442 0.452 0.427 0.447 0.434 0.447 0.420 0.441 0.426 0.441
ECG + EOGL 0.452 0.454 0.452 0.447 0.318 0.332 0.321 0.332 0.269 0.277 0.271 0.277
ECG + EOGR 0.462 0.457 0.453 0.452 0.342 0.354 0.344 0.354 0.268 0.276 0.269 0.276
EEG + ECG 0.830 0.830 0.826 0.833 0.827 0.829 0.829 0.832 0.705 0.705 0.701 0.705
EEG + EMG 0.824 0.823 0.821 0.825 0.823 0.823 0.823 0.824 0.740 0.734 0.732 0.734
EEG + EOGL 0.826 0.822 0.821 0.823 0.824 0.824 0.824 0.825 0.715 0.712 0.709 0.711
EEG + EOGR 0.830 0.823 0.822 0.824 0.813 0.824 0.829 0.83 0.714 0.713 0.710 0.713
EMG + EOGL 0.421 0.432 0.448 0.443 0.457 0.446 0.457 0.443 0.391 0.415 0.396 0.415
EMG + EOGR 0.437 0.455 0.441 0.453 0.457 0.438 0.442 0.432 0.421 0.445 0.427 0.445
ECG 0.304 0.308 0.305 0.308 0.298 0.303 0.298 0.303 0.287 0.293 0.287 0.293
EEG 0.802 0.817 0.819 0.817 0.814 0.821 0.8175 0.821 0.667 0.668 0.663 0.667
EMG 0.380 0.402 0.386 0.402 0.389 0.412 0.394 0.412 0.389 0.414 0.392 0.414
EOGL 0.421 0.423 0.435 0.431 0.419 0.419 0.416 0.416 0.250 0.271 0.237 0.271
EOGR 0.425 0.422 0.425 0.420 0.431 0.430 0.441 0.438 0.285 0.299 0.279 0.299

ECG: electrocardiogram; EEG: electroencephalogram; EMG: electromyogram; EOGL: electrooculogram left; EOGR: electrooculogram right.

Appendix G.

Averaged classification performances with features from a 30 s window length.

Signal condition (combinations)  Window and signal length: 30 and 120 s  Window and signal length: 30 and 90 s  Window and signal length: 30 and 60 s
Precision Recall F1-score Accuracy Precision Recall F1-score Accuracy Precision Recall F1-score Accuracy
ECG + EEG + EMG + EOGL 0.824 0.830 0.828 0.825 0.831 0.832 0.834 0.832 0.721 0.720 0.717 0.719
ECG + EEG + EMG + EOGR 0.827 0.829 0.823 0.823 0.831 0.829 0.831 0.828 0.723 0.723 0.718 0.722
ECG + EMG + EOGL 0.831 0.824 0.836 0.832 0.801 0.800 0.806 0.811 0.451 0.469 0.456 0.469
ECG + EMG + EOGR 0.829 0.832 0.829 0.835 0.815 0.811 0.808 0.810 0.467 0.483 0.471 0.483
EEG + ECG + EMG 0.825 0.823 0.827 0.822 0.810 0.814 0.813 0.816 0.453 0.471 0.459 0.471
EEG + ECG + EOGL 0.829 0.828 0.824 0.826 0.820 0.822 0.815 0.818 0.555 0.567 0.558 0.566
EEG + ECG + EOGR 0.829 0.823 0.824 0.827 0.825 0.823 0.822 0.821 0.680 0.683 0.678 0.683
EEG + EMG + EOGL 0.830 0.825 0.824 0.827 0.833 0.824 0.831 0.831 0.725 0.726 0.721 0.725
EEG + EMG + EOGR 0.828 0.826 0.827 0.824 0.834 0.833 0.832 0.833 0.750 0.746 0.744 0.746
ECG + EMG 0.437 0.453 0.442 0.453 0.434 0.452 0.440 0.452 0.428 0.448 0.434 0.448
ECG + EOGL 0.447 0.453 0.457 0.455 0.327 0.338 0.329 0.338 0.268 0.277 0.270 0.277
ECG + EOGR 0.454 0.452 0.453 0.463 0.358 0.367 0.360 0.367 0.269 0.277 0.271 0.277
EEG + ECG 0.831 0.825 0.823 0.832 0.827 0.826 0.825 0.824 0.703 0.701 0.698 0.701
EEG + EMG 0.828 0.823 0.821 0.820 0.813 0.822 0.822 0.824 0.737 0.737 0.733 0.736
EEG + EOGL 0.825 0.822 0.828 0.824 0.813 0.819 0.822 0.822 0.697 0.699 0.696 0.699
EEG + EOGR 0.833 0.825 0.830 0.823 0.822 0.824 0.824 0.826 0.681 0.679 0.675 0.679
EMG + EOGL 0.448 0.450 0.458 0.460 0.459 0.466 0.477 0.463 0.424 0.444 0.428 0.444
EMG + EOGR 0.458 0.466 0.464 0.460 0.464 0.470 0.474 0.455 0.444 0.462 0.448 0.462
ECG 0.308 0.314 0.309 0.314 0.300 0.305 0.300 0.305 0.282 0.289 0.283 0.289
EEG 0.811 0.817 0.810 0.810 0.827 0.823 0.817 0.823 0.726 0.724 0.721 0.724
EMG 0.393 0.414 0.399 0.414 0.395 0.418 0.400 0.418 0.393 0.419 0.398 0.419
EOGL 0.442 0.437 0.431 0.433 0.448 0.453 0.448 0.448 0.254 0.275 0.232 0.275
EOGR 0.430 0.430 0.421 0.438 0.454 0.459 0.456 0.455 0.312 0.316 0.274 0.316

ECG: electrocardiogram; EEG: electroencephalogram; EMG: electromyogram; EOGL: electrooculogram left; EOGR: electrooculogram right.

Appendix H.

Averaged classification performances with features from a 50 s length window.

Signal condition (combinations)  Window and signal length: 50 and 120 s  Window and signal length: 50 and 90 s  Window and signal length: 50 and 60 s
Precision Recall F1-score Accuracy Precision Recall F1-score Accuracy Precision Recall F1-score Accuracy
ECG + EEG + EMG + EOGL 0.821 0.830 0.826 0.828 0.831 0.835 0.834 0.831 0.721 0.720 0.717 0.719
ECG + EEG + EMG + EOGR 0.827 0.828 0.827 0.829 0.832 0.834 0.834 0.833 0.723 0.723 0.718 0.722
ECG + EMG + EOGL 0.810 0.815 0.797 0.809 0.807 0.808 0.814 0.788 0.464 0.483 0.470 0.483
ECG + EMG + EOGR 0.828 0.821 0.819 0.817 0.823 0.816 0.815 0.819 0.462 0.479 0.467 0.479
EEG + ECG + EMG 0.823 0.823 0.816 0.816 0.832 0.832 0.826 0.833 0.754 0.753 0.749 0.753
EEG + ECG + EOGL 0.822 0.821 0.820 0.819 0.831 0.830 0.827 0.827 0.707 0.709 0.705 0.709
EEG + ECG + EOGR 0.822 0.820 0.825 0.813 0.832 0.832 0.833 0.831 0.704 0.705 0.701 0.704
EEG + EMG + EOGL 0.824 0.822 0.821 0.822 0.827 0.833 0.834 0.832 0.745 0.740 0.737 0.739
EEG + EMG + EOGR 0.825 0.828 0.824 0.826 0.834 0.835 0.835 0.832 0.750 0.746 0.744 0.746
ECG + EMG 0.445 0.461 0.450 0.461 0.440 0.458 0.445 0.458 0.436 0.457 0.441 0.457
ECG + EOGL 0.461 0.464 0.458 0.465 0.335 0.35 0.338 0.350 0.453 0.458 0.461 0.464
ECG + EOGR 0.462 0.455 0.452 0.456 0.330 0.342 0.333 0.342 0.458 0.457 0.461 0.463
EEG + ECG 0.826 0.828 0.828 0.831 0.826 0.824 0.825 0.827 0.703 0.701 0.698 0.701
EEG + EMG 0.824 0.825 0.824 0.829 0.825 0.822 0.822 0.825 0.737 0.737 0.733 0.736
EEG + EOGL 0.822 0.826 0.826 0.829 0.821 0.824 0.823 0.824 0.715 0.712 0.709 0.711
EEG + EOGR 0.824 0.828 0.831 0.832 0.827 0.824 0.822 0.823 0.681 0.679 0.675 0.679
EMG + EOGL 0.502 0.503 0.502 0.513 0.499 0.496 0.499 0.500 0.440 0.461 0.445 0.461
EMG + EOGR 0.501 0.505 0.511 0.512 0.502 0.501 0.502 0.503 0.436 0.456 0.441 0.456
ECG 0.308 0.314 0.309 0.314 0.298 0.304 0.299 0.304 0.284 0.291 0.284 0.291
EEG 0.811 0.813 0.821 0.813 0.805 0.805 0.806 0.808 0.692 0.689 0.687 0.689
EMG 0.406 0.428 0.412 0.428 0.406 0.428 0.411 0.428 0.407 0.431 0.411 0.431
EOGL 0.424 0.425 0.432 0.423 0.433 0.429 0.419 0.423 0.283 0.298 0.261 0.298
EOGR 0.428 0.425 0.427 0.425 0.448 0.440 0.421 0.422 0.273 0.288 0.250 0.288

ECG: electrocardiogram; EEG: electroencephalogram; EMG: electromyogram; EOGL: electrooculogram left; EOGR: electrooculogram right.

Appendix I.

Averaged classification performances with features from a 60 s length window.

Signal condition (combinations)  Window and signal length: 60 and 120 s  Window and signal length: 60 and 90 s
Precision Recall F1-score Accuracy Precision Recall F1-score Accuracy
ECG + EEG + EMG + EOGL 0.831 0.824 0.823 0.826 0.830 0.829 0.828 0.826
ECG + EEG + EMG + EOGR 0.831 0.830 0.826 0.825 0.830 0.831 0.827 0.825
ECG + EMG + EOGL 0.827 0.826 0.822 0.824 0.811 0.804 0.814 0.820
ECG + EMG + EOGR 0.830 0.829 0.833 0.828 0.826 0.823 0.824 0.825
EEG + ECG + EMG 0.822 0.821 0.827 0.827 0.829 0.829 0.825 0.822
EEG + ECG + EOGL 0.825 0.822 0.826 0.827 0.832 0.826 0.825 0.824
EEG + ECG + EOGR 0.823 0.826 0.830 0.827 0.831 0.830 0.837 0.830
EEG + EMG + EOGL 0.826 0.826 0.829 0.825 0.831 0.833 0.829 0.831
EEG + EMG + EOGR 0.827 0.826 0.827 0.826 0.833 0.830 0.827 0.829
ECG + EMG 0.445 0.462 0.451 0.462 0.442 0.460 0.447 0.460
ECG + EOGL 0.452 0.463 0.461 0.460 0.322 0.334 0.325 0.334
ECG + EOGR 0.465 0.467 0.464 0.461 0.342 0.353 0.343 0.353
EEG + ECG 0.821 0.822 0.825 0.826 0.825 0.821 0.825 0.825
EEG + EMG 0.815 0.820 0.823 0.824 0.821 0.823 0.825 0.825
EEG + EOGL 0.822 0.823 0.825 0.827 0.826 0.824 0.823 0.825
EEG + EOGR 0.831 0.823 0.823 0.822 0.822 0.822 0.823 0.823
EMG + EOGL 0.498 0.501 0.502 0.503 0.496 0.503 0.499 0.495
EMG + EOGR 0.497 0.499 0.501 0.502 0.492 0.493 0.494 0.493
ECG 0.306 0.311 0.306 0.311 0.297 0.302 0.298 0.302
EEG 0.801 0.803 0.811 0.804 0.806 0.811 0.809 0.807
EMG 0.397 0.419 0.403 0.419 0.403 0.427 0.409 0.427
EOGL 0.440 0.450 0.450 0.435 0.269 0.277 0.268 0.277
EOGR 0.442 0.445 0.449 0.444 0.291 0.299 0.286 0.299

ECG: electrocardiogram; EEG: electroencephalogram; EMG: electromyogram; EOGL: electrooculogram left; EOGR: electrooculogram right.

Footnotes

Contributorship: JC and SK contributed to the conception and design of the study. JC, SK, and SP contributed to the acquisition of data. JC, SK, SP, and SH contributed to the analysis and interpretation of the data. JC contributed to the drafting of the manuscript.

Declaration of conflicting interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Ethical approval: Not applicable.

Funding: The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Yonsei Signature Research Cluster Program of 2021 (grant number 2021-22-0005).

Supplemental material: Supplemental material for this article is available online.

Guarantor: SHH.

References

  • 1.Zhang J, Zhang X, Zhang K, et al. An updated of meta-analysis on the relationship between mobile phone addiction and sleep disorder. J Affect Disord 2022; 305: 94–101. [DOI] [PubMed] [Google Scholar]
  • 2.Abad VC, Guilleminault C. Sleep and psychiatry. Dialogues Clin Neurosci 2022; 7: 291–303. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Xu Z, Anderson KN, Pavese N. Longitudinal studies of sleep disturbances in Parkinson’s disease. Curr Neurol Neurosci Rep 2022; 10: 635–655. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Al Maqbali M, Al Sinani M, Al-Lenjawi B. Prevalence of stress, depression, anxiety and sleep disturbance among nurses during the COVID-19 pandemic: A systematic review and meta-analysis. J Psychosom Res 2021; 141: 110343. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Deng J, Zhou F, Hou W, et al. The prevalence of depressive symptoms, anxiety symptoms and sleep disturbance in higher education students during the COVID-19 pandemic: A systematic review and meta-analysis. Psychiatry Res 2021; 301: 113863. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Ara T, Rahman MM, Hossain MA, et al. Identifying the associated risk factors of sleep disturbance during the COVID-19 lockdown in Bangladesh: A web-based survey. Front Psychiatry 2020; 11: 580268. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Haythornthwaite JA, Hegel MT, Kerns RD. Development of a sleep diary for chronic pain patients. J Pain Symptom Manage 1991; 6: 65–72. [DOI] [PubMed] [Google Scholar]
  • 8.Currie SR, Clark S, Rimac S, et al. Comprehensive assessment of insomnia in recovering alcoholics using daily sleep diaries and ambulatory monitoring. Alcoholism Clin Exp Res 2003; 27: 1262–1269. [DOI] [PubMed] [Google Scholar]
  • 9.Yong MH, Fook-Chong S, Pavanni R, et al. Case control polysomnographic studies of sleep disorders in Parkinson's disease. PLoS One 2011; 6: e22511. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Goyal A, Saxena K, Kar A, et al. Obstructive sleep apnea is highly prevalent in COVID19 related moderate to severe ARDS survivors: Findings of level I polysomnography in a tertiary care hospital. Sleep Med 2022; 91: 226–230. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Arslan RS, Ulutaş H, Köksal AS, et al. Automated sleep scoring system using multi-channel data and machine learning. Comput Biol Med 2022; 146: 105653. [DOI] [PubMed] [Google Scholar]
  • 12.Satapathy SK, Malladi R, Kondaveeti HK. Accurate machine learning-based automated sleep staging using clinical subjects with suspected sleep disorders. In: Emergent converging technologies and biomedical systems. Singapore: Springer. 2021, pp. 363–379. [Google Scholar]
  • 13.Wongsirichot T, Hanskunatai A. A classification of sleep disorders with optimal features using machine learning techniques. J Health Res 2017; 31: 209–217. [Google Scholar]
  • 14.Satapathy S, Loganathan D, Kondaveeti HK, et al. Performance analysis of machine learning algorithms on automated sleep staging feature sets. CAAI Transact Intell Technol 2021; 6: 155–174. [Google Scholar]
  • 15.Santaji S, Santaji S, Desai V. Automatic sleep stage classification with reduced epoch of EEG. Evol Intell 2022; 15: 2239–2246. [Google Scholar]
  • 16.Zhang GQ, Cui L, Mueller R, et al. The national sleep research resource: Towards a sleep data commons. J Am Med Inform Assoc 2018; 25: 1351–1358. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Quan SF, Howard BV, Iber C, et al. The sleep heart health study: design, rationale, and methods. Sleep 1997; 20: 1077–1085. [PubMed] [Google Scholar]
  • 18.Liao Y, Zhang M, Wang Z, et al. Design and FPGA implementation of an high efficient XGBoost based sleep staging algorithm using single channel EEG. In: International Conference on Cognitive Systems and Signal Processing. Springer. 2018, pp. 294–303. [Google Scholar]
  • 19.Zhao X, Rong P, Sun G, et al. Automatic sleep staging based on XGBOOST physiological signals. In Proceedings of the 11th International Conference on Modelling, Identification and Control (ICMIC2019). Springer. 2020, pp. 1095–1106. [Google Scholar]
  • 20.Radhakrishnan BL, Kirubakaran E, Jebadurai IJ, et al. Classifying sleep stages automatically in single-channel against multi-channel EEG: A performance analysis. In: Disruptive technologies for big data and cloud applications. Singapore: Springer. 2022, pp. 527–537. [Google Scholar]
  • 21.Surantha N, Lesmana TF, Isa SM. Sleep stage classification using extreme learning machine and particle swarm optimization for healthcare big data. J Big Data 2021; 8: 1–17.33425651 [Google Scholar]
  • 22.Aboalayon KA, Almuhammadi WS, Faezipour M. A comparison of different machine learning algorithms using single channel EEG signal for classifying human sleep stages. In: 2015 Long island systems, applications and technology. IEEE. 2015, pp. 1–6. [Google Scholar]
  • 23.Satapathy S, Loganathan D, Kondaveeti HK, et al. Performance analysis of machine learning algorithms on automated sleep staging feature sets. CAAI Transact Intell Technol 2021; 6: 155–174. [Google Scholar]
  • 24.Santaji S., Desai V. Analysis of EEG signal to classify sleep stages using machine learning. Sleep Vigil 2020; 4: 145–152. [Google Scholar]
  • 25.Şen B, Peker M, Çavuşoğlu A, et al. A comparative study on classification of sleep stage based on EEG signals using feature selection and classification algorithms. J Med Syst 2014; 38: 1–21. [DOI] [PubMed] [Google Scholar]
  • 26.Ugi LV, Suratman FY, Sunarya U. Electrocardiogram feature selection and performance improvement of sleep stages classification using grid search. Bull Electr Eng Informat 2022; 11: 2033–2043. [Google Scholar]
  • 27.Satapathy SK, Loganathan D. A study of human sleep stage classification based on dual channels of EEG signal using machine learning techniques. SN Comput Sci 2021; 2: 1–16. [Google Scholar]
  • 28.Siyuan L, Jingyuan L, Hangping G, et al. Sleep staging prediction model based on XGBoost. In: 2021 International Conference on Electronic Information Engineering and Computer Science (EIECS). IEEE. 2021, pp. 350–353. [Google Scholar]
  • 29.Choi HS, Kim S, Oh JE, et al. XGBoost-based instantaneous drowsiness detection framework using multitaper spectral information of electroencephalography. In: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. 2018, pp. 111–121. [Google Scholar]
  • 30.Hei Y, Yuan T, Fan Z, et al. Sleep staging classification based on a new parallel fusion method of multiple sources signals. Physiol Meas 2022; 43: 045003. [DOI] [PubMed] [Google Scholar]
  • 31.Bin Heyat MB, Akhtar F, Khan A, et al. A novel hybrid machine learning classification for the detection of bruxism patients using physiological signals. Appl Sci 2020; 10: 7410. [Google Scholar]
  • 32.Amzica F, Steriade M. Electrophysiological correlates of sleep delta waves. Electroencephalogr Clin Neurophysiol 1998; 107: 69–83. [DOI] [PubMed] [Google Scholar]
  • 33.Hauri P, Hawkins DR. Alpha-delta sleep. Electroencephalogr Clin Neurophysiol 1973; 34: 233–237. [DOI] [PubMed] [Google Scholar]
