Scientific Reports. 2024 Aug 2;14:17952. doi: 10.1038/s41598-024-68978-4

GRU-powered sleep stage classification with permutation-based EEG channel selection

Luis Alfredo Moctezuma 1, Yoko Suzuki 1, Junya Furuki 1, Marta Molinas 2, Takashi Abe 1
PMCID: PMC11297028  PMID: 39095608

Abstract

We present a new approach to sleep stage classification that incorporates a computationally inexpensive, permutation-based method for channel selection and leverages the power of deep learning, specifically the gated recurrent unit (GRU) model, alongside other deep learning methods. By systematically permuting the electroencephalographic (EEG) channels, different channel combinations are evaluated to identify the most informative subset for 5-class sleep stage classification. For the analysis, we used an EEG dataset collected at the International Institute for Integrative Sleep Medicine (WPI-IIIS) at the University of Tsukuba in Japan. These explorations provide several new insights: (1) performance decreases drastically when fewer than 3 channels are used; (2) 3 random channels selected by permutation provide predictions as good as or better than the 3 channels recommended by the American Academy of Sleep Medicine (AASM); (3) the N1 class suffers the largest drop in prediction accuracy as the channel count falls from 128 to 3 (random or AASM); and (4) no single channel provides an acceptable level of accuracy for 5-class prediction. The results demonstrate the GRU's ability to retain essential temporal information from EEG data, allowing it to capture the underlying patterns associated with each sleep stage effectively. With permutation-based channel selection, we enhance, or at least maintain, the efficiency obtained with high-density EEG while incorporating only the most informative EEG channels.

Keywords: Channel selection, Deep learning, Electroencephalogram (EEG), Gated recurrent unit (GRU), Permutation-based channel selection, Sleep, Sleep staging

Subject terms: Computational neuroscience, Learning algorithms, Network models, Biomedical engineering, Scientific data

Introduction

Sleep is a natural and essential state to regulate our mood and emotions, maintaining good health and well-being. We also sleep to help our brain process, consolidate, and organize new information and memories1,2. This is important because, considering 6–8 h of sleep per day, humans spend approximately 20–25 years sleeping throughout their entire lives3,4. During sleep, the human brain undergoes different patterns of activity, transitioning through wakefulness (W), the rapid eye movement (REM) sleep stage, and three non-rapid eye movement (NREM) sleep stages: N1, N2, and N35,6.

Sleep staging is a fundamental process in sleep medicine used to assess and understand the different sleep stages a subject experiences during the night. For this, sleep experts manually determine sleep stages based on polysomnography (PSG), which includes electroencephalograms (EEG), electrooculograms (EOG), and electromyograms (EMG)7. This relies mainly on well-known features of each sleep stage: W is characterized by alpha rhythms, muscle activity is present, and eye movements typically include eye blinks, rapid eye movements, and reading eye activity8. N1 is a brief stage of light sleep in which brain activity begins to shift from alpha waves (8–13 Hz) to low-amplitude mixed frequency (LAMF, predominantly 4–7 Hz); the EOG frequently shows slow eye movements during N18. N2 is characterized by sleep spindles (11–16 Hz oscillations arising in the thalamus and cortex) and K-complexes (brief, high-amplitude waves)9; eye movements are minimal, and heart rate and body temperature decrease. N3 is dominated by delta waves10,11. REM is characterized by rapid eye movements with the lowest level of chin muscle tone; brain activity shows fast, desynchronized waves, similar to the LAMF of N18.

Sleep stage studies can help unravel the mysteries surrounding sleep, dreams, dream emotions, memory consolidation, and other essential functions of the brain and body during rest4,12–14. Sleep disorders, such as insomnia, sleep apnea, narcolepsy, and restless leg syndrome, among others, can be diagnosed and treated with the help of sleep staging14–16. Sleep disorders caused by interrupted sleep periods affect approximately 60% of adults, and sleep apnea affects 2–4% of adults and 1–3% of children5,16. Therefore, it is highly desirable to measure sleep quality through sleep monitoring and sleep stage classification.

Manual sleep staging, which is still prevalent in sleep clinics and sleep research, is an expensive and time-consuming process. To address these challenges, several research papers have suggested the use of artificial intelligence techniques such as machine learning (ML) or deep learning (DL) models17–26.

ML methods require specialized expertise from the data engineer to perform feature extraction (such as fractal dimensions, energy distribution, power spectral density, or wavelet coefficients), with manually crafted features used to develop a classification model27–29. ML methods have been shown to achieve high performance, even when using only a single bipolar EEG channel19.

DL has demonstrated good capabilities in domains such as natural language processing (NLP), computer vision, and time-series analysis30. Recurrent Neural Networks (RNNs) are particularly well suited for sequential data, making them ideal for EEG signal classification. Among RNNs, the Gated Recurrent Unit (GRU) has shown superior performance in modeling long-range dependencies and mitigating the problem of vanishing gradients30,31.
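To make the GRU's gating mechanism concrete, the following is a minimal NumPy sketch of a single GRU step, showing the update gate, reset gate, and candidate state equations. This illustrates the mechanism only; the weights, dimensions, and toy sequence are illustrative, not the authors' trained architecture.

```python
import numpy as np

def gru_step(x, h_prev, W, U, b):
    """One GRU step. W, U, b each hold the parameters for the
    update gate (z), reset gate (r), and candidate state (n)."""
    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    Wz, Wr, Wn = W          # input-to-hidden weights
    Uz, Ur, Un = U          # hidden-to-hidden weights
    bz, br, bn = b

    z = sigmoid(x @ Wz + h_prev @ Uz + bz)        # update gate
    r = sigmoid(x @ Wr + h_prev @ Ur + br)        # reset gate
    n = np.tanh(x @ Wn + (r * h_prev) @ Un + bn)  # candidate state
    return (1.0 - z) * n + z * h_prev             # new hidden state

# Toy sequence: 4 time steps, 3 input features, hidden size 2.
rng = np.random.default_rng(0)
W = [rng.standard_normal((3, 2)) * 0.1 for _ in range(3)]
U = [rng.standard_normal((2, 2)) * 0.1 for _ in range(3)]
b = [np.zeros(2) for _ in range(3)]

h = np.zeros(2)
for t in range(4):
    h = gru_step(rng.standard_normal(3), h, W, U, b)
print(h.shape)  # (2,)
```

The convex combination in the last line of `gru_step` is what lets the GRU carry the hidden state across many time steps largely unchanged when the update gate is near 1, which is why it mitigates vanishing gradients on long EEG sequences.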

CNNs have been used successfully to extract sleep-related waveform characteristics but, as with most DL methods, require a task-specific architecture and large models, and are data-intensive21,22,25–27,32–36. A CNN approach called EEGNet has been applied to EEG-related applications, including sleep staging, and has shown high performance37,38. It has been combined with bidirectional long short-term memory (BiLSTM) and with the mixing of randomly selected delta, theta, and beta frequencies for data augmentation to overcome class imbalance; in this way, ref. 39 reported accuracies of up to 0.87 with 5-class classification models.

The authors of ref. 40 combined two bipolar EEG channels (Fpz-Cz and Pz-Oz) and one horizontal EOG channel from the Sleep-EDF Expanded dataset (Sleep-EDFX)41 with EEGNet-BiLSTM. They reported an accuracy of 0.90 and a confusion matrix in which the most misclassified classes were N1, N2, and N3. Another recent 5-class approach, called SalientSleepNet, was proposed by Ziyu Jia et al.42 and tested on the Sleep-EDF-39 and Sleep-EDF-153 datasets using the Fpz-Cz EEG and horizontal EOG channels41,43, reporting an average accuracy of 0.85 but only 0.55 on average for detecting the N1 sleep stage.

Other work used the Sleep-EDF-20 (Fpz-Cz channel), Sleep-EDF-78 (Fpz-Cz channel)41,43, and Sleep Heart Health Study (SHHS, C4-A1 channel)44 datasets. In the best case, an attention-based deep learning architecture (AttnSleep) obtained accuracies of 0.856, 0.829, and 0.866, respectively, on these datasets45. Another approach applied a CNN model to the SHHS dataset and obtained an accuracy of 0.87, an F-score of 0.87, and a kappa of 0.8132. A recent work proposed a cascaded CNN + LSTM using only one bipolar channel from the Sleep-EDF dataset, showing that it is possible to obtain an accuracy of 0.827 using only Fpz-Cz and 0.797 using Pz-Oz46.

The mentioned methods are mainly tested on different versions of the public Sleep-EDF or SHHS datasets, in all cases using bipolar EEG41,43,44. Many factors confound comparisons in the state of the art, since different approaches are tested under different conditions, such as using a subset of subjects, splitting the data into different percentages, or using a different number of cross-validation folds. In general, the state of the art covers a range of sleep stage classification methods from different perspectives, using hand-crafted features with ML or using DL models. Furthermore, many works have presented experimental results for different class groupings, such as sleep versus W, N1 and W as a single class, N1N2 versus N3, W versus NREM versus REM, etcetera21,33,38,45–54.

Here, we present a method for the classification of 5-class sleep stages using different DL architectures, comparing performance in training and validation sets, and for the test set, we measure performance using accuracy, Fscore, precision, recall, and area under the receiver operating characteristic (AUROC). For the analysis, we used a dataset collected at the International Institute for Integrative Sleep Medicine (WPI-IIIS) at the University of Tsukuba in Japan.

One of the challenges in any EEG-related task, including sleep staging, is the selection of channels, as some of them may be more informative than others for certain stages of sleep25,26,36,38,55. Although properly selecting EEG channels that contribute more to classification tasks will drastically reduce the computational cost to train and predict new data, the computational cost of the channel selection process itself can be high28,29.

For example, methods proposed for EEG-related tasks using the Non-dominated Sorting Genetic Algorithm (NSGA) have been shown to be effective at selecting the most informative channels for classification tasks; however, the datasets involved have fewer than 64 channels28,29. The process creates up to 20 models (20 chromosomes) per population over up to 200 generations to minimize a multiobjective optimization (MOO) function that reduces the number of channels while increasing, or at least maintaining, performance.
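A back-of-the-envelope comparison, using the figures quoted above, makes the cost difference concrete. The counts below are illustrative upper bounds: NSGA-style wrapper search trains a model per chromosome per generation, whereas permutation-based selection trains one model per fold and then only needs inference passes with each channel permuted.

```python
# Rough training-cost comparison (counts of model trainings / evaluations),
# using the figures quoted in the text: 20 chromosomes, up to 200
# generations, 128 channels, 5-fold cross-validation.
n_channels = 128
folds = 5

nsga_trainings = 20 * 200                    # one training per chromosome per generation
perm_trainings = folds                       # one trained model per fold
perm_inference_passes = folds * n_channels   # re-score with each channel permuted

print(nsga_trainings, perm_trainings, perm_inference_passes)  # 4000 5 640
```

Even counting every permuted re-scoring pass, the permutation approach replaces thousands of trainings with a handful of trainings plus cheap inference.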

We present a new approach for sleep stage classification that incorporates a computationally inexpensive method based on permutations for channel selection and leverages the power of DL, specifically the GRU model, along with other deep learning methods. By systematically permuting the channels, different combinations of channels are evaluated to identify the most informative subset for the classification of sleep stages. Together, our results can provide indices that could improve understanding of sleep disorders and pave the way for more personalized and effective treatments in sleep medicine.

Results

Comparison of different algorithms for the classification of 5-class sleep stages using 128 EEG channels

Here, we compare the performance of various DL algorithms to create a 5-class classifier (W, N1, N2, N3, and REM). With this comparison, our aim is to analyze the performance of the different DL algorithms and thus select the one with the highest performance for further analysis. We used 128 channels and created a model per subject, which was validated using 5-fold cross-validation. The average results for all subjects are presented in Table 1.

Table 1.

Average 5-class sleep stage classification performance from all the subjects in the dataset with various DL algorithms.

Model M-acc V-Acc Acc FScore Precision Recall AUROC Kappa
GRU 0.879 ± 0.03 0.867 ± 0.03 0.885 ± 0.03 0.860 ± 0.02 0.889 ± 0.02 0.843 ± 0.03 0.983 ± 0.01 0.838 ± 0.04
EEGNeX 0.885 ± 0.03 0.878 ± 0.03 0.882 ± 0.03 0.859 ± 0.02 0.874 ± 0.03 0.849 ± 0.02 0.981 ± 0.01 0.837 ± 0.04
1D-CNN 0.817 ± 0.05 0.791 ± 0.05 0.809 ± 0.05 0.763 ± 0.05 0.808 ± 0.04 0.746 ± 0.06 0.948 ± 0.02 0.731 ± 0.07
EEGNet 0.785 ± 0.05 0.759 ± 0.07 0.811 ± 0.05 0.747 ± 0.06 0.801 ± 0.05 0.734 ± 0.06 0.947 ± 0.03 0.730 ± 0.08
Single 2D-CNN-LSTM 0.766 ± 0.09 0.764 ± 0.09 0.797 ± 0.06 0.711 ± 0.10 0.775 ± 0.07 0.702 ± 0.10 0.930 ± 0.04 0.707 ± 0.10
Bi-LSTM 0.615 ± 0.12 0.624 ± 0.12 0.677 ± 0.11 0.504 ± 0.14 0.520 ± 0.16 0.522 ± 0.13 0.820 ± 0.08 0.512 ± 0.18
Single LSTM 0.749 ± 0.19 0.747 ± 0.19 0.825 ± 0.09 0.777 ± 0.11 0.821 ± 0.10 0.765 ± 0.09 0.954 ± 0.04 0.753 ± 0.12

As stated above, we have tested all algorithms presented in a previous benchmark; however, we report only the models with the highest performance, without including those that gave low or random performance for the sleep staging task56.

Table 1 shows that performance is similar when using GRU or EEGNeX, in both cases with a low standard deviation. With any other model, performance decreases by at least 8% in most metrics. An important point to highlight is that the standard deviation is below 0.05 for both GRU and EEGNeX, but not for the other algorithms.

The 2-way ANOVA test found no statistically significant difference in performance between GRU and EEGNeX (p-value=0.53, see Table S1 in the supplementary information), but both differ significantly from 1D-CNN (p-value=0.000); as shown in Table 1, these are the top three methods in terms of performance.

5-class sleep stage classification with GRU and EEGNeX using 3 recommended channels of AASM or 3 random channels

The American Academy of Sleep Medicine (AASM) recommends the use of 3 channels, and many studies have used them to compare their reported results. In this experiment, we performed sleep stage classification using the 3 channels recommended by the AASM and compared the results with models created using 3 random channels. The 3 random channels were re-drawn separately for each subject and for each of the 5 folds. We present this comparison to determine whether the 3 channels proposed by the AASM contain more useful information for the classifiers than any other channels.

For this analysis, we use the top 3 algorithms according to the results presented in Table 1, and the average results for the 5-class sleep staging of all subjects are presented in Fig. 1.

Figure 1.

Figure 1

Comparing the 5-class sleep stage classification performance with 3 AASM recommended channels and 3 random channels. The presented results are the average of all subjects and the 5-fold, which were tested with the top 3 methods according to the results from Table 1.

The results in Fig. 1 show that the highest sleep staging performance is obtained with the GRU model in both cases, using 3-AASM or 3-random channels (p-value<0.002 for comparisons between GRU and EEGNeX/1D-CNN across methods). Additionally, performance appears similar with both channel sets, indicating that the 3 channels recommended by the AASM may not necessarily provide more information than other channel sets for automatic sleep stage classification (p-value>0.53 for GRU across channel subsets, 3-AASM and 3-random; see Table S2 in the supplementary information for 2-way ANOVA results). However, further analysis on different datasets and with a larger number of subjects is needed to reach a more solid conclusion.

The 2-way ANOVA test shows no statistically significant difference between GRU and EEGNeX using the 128 EEG channels (see Table 1); however, when using 3-AASM or 3-random channels, GRU shows significantly higher performance, supporting the decision to choose GRU as the best method for further analysis in the experiments.

Figure 2 presents the confusion matrix of the average results for all subjects. To compare which stages are most affected by channel reduction, the figure includes the confusion matrix using GRU with the 128 channels (results from Fig. 1), 3 recommended channels of AASM, and 3 random channels.

Figure 2.

Figure 2

Confusion matrices of the average results (in percentage) from all the subjects, for the GRU models presented in Table 1 and Fig. 1. (A) 128-channel model, (B) 3 recommended channels of AASM, and (C) 3-random channel model.

The confusion matrices clearly show that the most affected sleep stages are W and N1, although the misclassification percentages also increase for the other stages. When using 3 channels, 38–44% of N1 instances are misclassified as N2, and 13–15% of W instances are misclassified as N2. Based on the results in Table 1 and Fig. 1, we selected GRU for further analysis.

Single-channel for 5-class sleep staging using GRU

As presented in the Introduction, many current studies have used single bipolar channels for sleep staging. To evaluate the effectiveness of using bipolar electrodes, as in the state of the art, we created single-channel models using the Pz-Oz or Fpz-Cz channels.

In the BioSemi 128-channel cap, Pz corresponds to A19, Oz to A23, Fpz to C17, and Cz to A1. This makes it easier to compare our models with state-of-the-art methods that use a single bipolar channel. The single-channel EEG experiments were also repeated using each channel individually, that is, creating a model with channel A19, A23, C17, or A1.
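As an illustration, deriving a bipolar channel from referential recordings amounts to subtracting one labeled channel from another. The sketch below is a hypothetical helper, not the authors' code; the 10-20-to-BioSemi mapping follows the text, and the data are synthetic.

```python
import numpy as np

# Mapping from 10-20 names to BioSemi 128-channel labels, as given in the text.
BIOSEMI = {"Pz": "A19", "Oz": "A23", "Fpz": "C17", "Cz": "A1"}

def bipolar(eeg, labels, anode, cathode):
    """Derive a bipolar channel (anode minus cathode) from referential EEG.

    eeg    : array of shape (n_channels, n_samples)
    labels : list of BioSemi channel labels, one per row of `eeg`
    """
    ia = labels.index(BIOSEMI[anode])
    ic = labels.index(BIOSEMI[cathode])
    return eeg[ia] - eeg[ic]

# Toy example: 4 labeled channels, 5 samples each.
labels = ["A1", "A19", "A23", "C17"]
eeg = np.arange(20, dtype=float).reshape(4, 5)
fpz_cz = bipolar(eeg, labels, "Fpz", "Cz")  # C17 minus A1
print(fpz_cz)  # [15. 15. 15. 15. 15.]
```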

The average results over all subjects and 5 folds are presented in Table 2. The highest performance is obtained using A1 (Cz) or C17-A1 (Fpz-Cz), indicating the importance of the A1 channel for sleep staging, at least among the channels tested (A1, A19, A23, C17). However, performance during training and validation is below 0.60, and performance on the test set is close to chance.

Table 2.

Average performance with single-channel models for 5-class sleep stage classification.

Channel M-acc V-Acc Acc FScore Precision Recall AUROC Kappa
Cz (A1) 0.568 ± 0.06 0.567 ± 0.06 0.586 ± 0.06 0.467 ± 0.07 0.559 ± 0.12 0.466 ± 0.06 0.802 ± 0.04 0.388 ± 0.08
Pz (A19) 0.493 ± 0.06 0.497 ± 0.06 0.503 ± 0.06 0.353 ± 0.11 0.404 ± 0.15 0.371 ± 0.08 0.72 ± 0.08 0.249 ± 0.10
Oz (A23) 0.482 ± 0.07 0.488 ± 0.07 0.485 ± 0.08 0.300 ± 0.09 0.343 ± 0.12 0.325 ± 0.06 0.712 ± 0.07 0.205 ± 0.11
Fpz (C17) 0.482 ± 0.05 0.488 ± 0.05 0.492 ± 0.06 0.307 ± 0.08 0.363 ± 0.12 0.331 ± 0.06 0.716 ± 0.05 0.216 ± 0.08
Pz-Oz (A19, A23) 0.514 ± 0.08 0.523 ± 0.08 0.530 ± 0.08 0.392 ± 0.11 0.437 ± 0.14 0.407 ± 0.09 0.751 ± 0.09 0.298 ± 0.11
Fpz-Cz (C17, A1) 0.591 ± 0.07 0.600 ± 0.08 0.604 ± 0.07 0.483 ± 0.08 0.573 ± 0.10 0.488 ± 0.08 0.817 ± 0.06 0.415 ± 0.12

The channel names are given in the 10-20 international system, with the BioSemi labels in parentheses.

Permutation-based channel selection for 5-class sleep staging using GRU: classification performance comparison selecting the k best versus the k worst channels

Here, our objective is to perform 5-class sleep staging with channels selected by permuting channels and evaluating the performance delta (that is, performance with all channels minus performance after permuting the information of a given channel). We followed the process explained in Fig. 10.

Figure 10.

Figure 10

Flowchart of the permutation-based channel selection process.

For this, we created a model per subject 5 times (5-fold), and the average performance delta is used to quantify how important each channel is. To illustrate this, Fig. 3 presents an example of the performance delta in accuracy, Fscore, and kappa, where each line represents the channel importance in one fold. We consider a channel more important if its performance delta is higher, since this means that removing the channel decreases performance more. When the performance delta is close to 0, performance is unaffected whether the channel is used or not; therefore, we could use fewer channels and obtain the same or very similar performance as with a high number of EEG channels.
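The performance-delta computation just described can be sketched as follows. Here `score_fn` stands in for evaluating the already-trained per-fold model, which is not retrained between permutations; the toy "model" and synthetic data are purely illustrative.

```python
import numpy as np

def permutation_channel_importance(score_fn, X, y, n_repeats=1, rng=None):
    """Performance delta per channel: baseline score minus the score after
    permuting that channel's values across epochs (no retraining).

    score_fn : callable(X, y) -> scalar performance (e.g. accuracy)
    X        : array of shape (n_epochs, n_channels, n_samples)
    """
    rng = np.random.default_rng(rng)
    baseline = score_fn(X, y)
    deltas = np.zeros(X.shape[1])
    for ch in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            # Shuffle this channel across epochs to break its link with labels.
            Xp[:, ch] = Xp[rng.permutation(len(X)), ch]
            deltas[ch] += (baseline - score_fn(Xp, y)) / n_repeats
    return deltas

# Toy check: a "model" that only looks at channel 0's mean sign.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 200)
X = rng.standard_normal((200, 3, 10))
X[:, 0] += (2 * y - 1)[:, None]  # channel 0 carries the label
score = lambda X, y: np.mean((X[:, 0].mean(axis=1) > 0).astype(int) == y)

deltas = permutation_channel_importance(score, X, y, rng=0)
print(deltas.argmax())  # 0: the informative channel has the largest delta
```

A channel whose permutation leaves the score unchanged (delta near 0) is uninformative for the model, matching the interpretation in the text.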

Figure 3.

Figure 3

Example of performance delta values in accuracy, Fscore and kappa for channel importance from the 5 folds with one of the subjects.

An important point in Fig. 3 is that across the five folds, the important channels are similar or the same. However, channel importance may differ for other subjects in the dataset; to examine this, Fig. 4 presents topographic maps (BioSemi layout) for all subjects. The values used to plot the topographic maps were calculated as the average performance delta across the metrics used; unreported results show the same behavior when using the performance delta of the individual metrics common in sleep staging: accuracy, Fscore, or kappa.

Figure 4.

Figure 4

Topographic maps from all the subjects in the dataset showing the permutation-based channel importance for 5-class sleep staging. Each topographic map represents one of the 14 subjects in the dataset.

Figure 7 presents the average topographic map over all subjects evaluating channel importance with the 128-channel model; it is the same for both (A) and (B) because the calculated importance does not yet differ at that point. The changes appear from the 64-channel models onward.

Figure 7.

Figure 7

Topographic map of the average channel importance from all the subjects. Channel importance for models created by recursively removing half of the most important (A) or least important (B) channels for 5-class sleep staging.

As shown in Fig. 4, channel importance differs between some subjects, but only in a few cases. The most important channels tend to lie over central to frontal regions.

Based on the channel importance presented in Figs. 4 and 7, we created 5-class sleep staging models using GRU. To track the evolution of channel importance, we first created a model using only the single most important channel, that is, the channel with the highest performance delta averaged over all metrics in the 128-channel model. We then created a model using the two most important channels, and subsequently created 5-class models using the 3–10 most important channels for each subject.

To compare classification performance when the incorrect channels are selected, we repeated the process using the k least important channels from the 128-channel models, i.e., the channels whose inclusion or exclusion leaves performance unchanged or changes it very little (less than 0.01 on average). All processes were carried out one subject at a time, that is, using each subject's own channel importance, and the average performance is reported in Fig. 5.
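Selecting the k best or k least important channels from the per-fold deltas is then a simple ranking step, sketched below with hypothetical delta values (the fold count and channel count are illustrative).

```python
import numpy as np

def rank_channels(fold_deltas):
    """Average performance deltas over folds and return channel indices
    ordered from most to least important."""
    mean_delta = np.mean(fold_deltas, axis=0)
    return np.argsort(mean_delta)[::-1]

# Toy example: 5 folds x 6 channels of hypothetical deltas.
rng = np.random.default_rng(2)
deltas = rng.normal(0.0, 0.005, size=(5, 6))
deltas[:, 2] += 0.05   # channel 2 consistently important across folds
deltas[:, 4] += 0.03   # channel 4 second

order = rank_channels(deltas)
best_k = list(order[:3])       # k most important channels
worst_k = list(order[::-1][:3])  # k least important channels
print(order[0], order[1])  # 2 4
```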

Figure 5.

Figure 5

Average performance obtained from all the subjects using (A) the 1–10 most important channels or (B) the 1–10 least important channels for 5-class sleep staging. p-value=0.000 for 2-way ANOVA comparing the k best versus k least important channels (see Table S3 in the supplementary information).

Figure 5 shows that incorrectly selecting channels can decrease performance by approximately 5%, especially with fewer than 8 channels. It also shows that performance in all metrics is close to chance when a model is created with only the single best or worst channel. However, adding information from one more channel (that is, using information from at least two EEG channels) increases performance significantly.

Permutation-based channel selection for sleep staging with recursive removal of half of the best/worst channels

Selecting the channels that are most important when using all 128 available channels does not ensure that those channels are globally the most relevant for sleep staging, i.e., that they would remain the most relevant if channel importance were calculated with a model created from fewer channels. To analyze this further, we performed the same sleep staging process but iteratively removed half of the best or worst channels and then recalculated the channel importance. The process was repeated with models of 128, 64, 32, 16, 8, 4, and 2 channels.

In short, we created a GRU model using 128 channels, computed the performance and the channel importance, then selected the 64 most relevant channels to create another model and again computed the performance and the channel importance. From the 64-channel model, we selected the 32 most relevant channels and repeated the process until a 2-channel model was created.
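The recursive halving loop just described can be sketched as follows. Here `importance_fn` is a placeholder for training a GRU on the current channel set and scoring each channel by its permutation delta; the toy importance function below is purely illustrative.

```python
def recursive_halving(channels, importance_fn, keep_best=True):
    """Repeatedly rank the current channel set by permutation importance
    and keep the better (or worse) half until two channels remain.

    importance_fn : callable(channel_list) -> list of (channel, delta),
                    i.e. train a model on `channel_list` and score each
                    channel by its performance delta.
    Returns the list of channel sets evaluated at each step.
    """
    history = [list(channels)]
    current = list(channels)
    while len(current) > 2:
        ranked = sorted(importance_fn(current), key=lambda cd: cd[1],
                        reverse=keep_best)  # best-first if keeping the best half
        current = [ch for ch, _ in ranked[: len(current) // 2]]
        history.append(current)
    return history

# Toy run: channel "importance" is just the channel index, so halving
# keeps the highest-numbered half each time.
fake_importance = lambda chans: [(c, float(c)) for c in chans]
sets = recursive_halving(list(range(128)), fake_importance)
print([len(s) for s in sets])  # [128, 64, 32, 16, 8, 4, 2]
```

Setting `keep_best=False` yields the control condition from the text, in which the most important half is discarded at each step instead.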

In this way, the important channels are confirmed with a smaller model, which provides more information about which channels are more or less relevant for the classification process, since the performance of a smaller model can differ from what is suggested by validating channel importance only with the 128-channel models.

The whole process was repeated using 5-fold cross-validation for each subject individually. The average performance of all subjects is presented in Fig. 6, and the average channel importance over all subjects is shown as a topographic map in Fig. 7.

Figure 6.

Figure 6

Average performance obtained from all the subjects for models created by recursively removing half of the most important (A) or least important (B) channels for 5-class sleep staging. p-value=0.000 for 2-way ANOVA comparing the k best versus k least important channels, except when using 32 or 64 channels, where no statistical significance was found (p-value>0.064, see Table S4 in the supplementary information).

When we consider the evolution of removing half of the channels, keeping only the k most important channels to create the models, and examining the permutation-based importance, the results vary across subjects. However, as shown in Figs. 8 and 9, when the information from a single subject is plotted as the average over the 5 folds, the evolution is clear: the important channels remain important as the models are created with half of the channels each time.

Figure 8.

Figure 8

Topographic maps of the average channel importance for one subject at a time (subjects 1–7). Channel importance for models created by recursively removing half of the most important channels for 5-class sleep staging.

Figure 9.

Figure 9

Topographic maps of the average channel importance for one subject at a time (subjects 8–14). Channel importance for models created by recursively removing half of the most important channels for 5-class sleep staging.

In panel (B) of Fig. 6, we iteratively removed the most important channels, and the difference in performance compared with (A) is clear. For example, using only the two channels from (A), we can obtain a kappa value of 0.667, but only 0.416 with the two channels selected in (B).

Discussion

This paper investigated DL methods for 5-class sleep stage classification from EEG signals, showing that GRU-based models can outperform methods such as EEGNet, which have been used previously and shown high performance. The proposed channel selection approach, based on permutations, is computationally cheap, since it requires training only a single model (or 5 when using 5-fold cross-validation) to obtain an overview of which EEG channels contribute most to sleep stage classification.

Using GRU-based models, we effectively classified sleep stages with fewer than 6 EEG channels while obtaining performance as high as with all 128 channels.

By comparing performance between the best channels found and the least informative ones, we have shown the importance of selecting channels correctly. The results show a difference of approximately 10% between using the best and the worst channels, especially with fewer than 6 EEG channels.

In our experiments, we compared the use of the 3 channels recommended by the AASM with random sets of 3 channels, and the results show the same or similar performance. This suggests that using expert knowledge for channel selection may not be optimal for automatic classification with ML/DL. Because the characteristic waveforms of sleep (K-complexes, spindles, and slow wave activity) and wakefulness (alpha waves) are topographically distributed across the brain57–59, the characteristic waveforms required for sleep stage classification could be extracted even when 3 channels were selected at random. Furthermore, even with random selection, electrodes near those used for AASM sleep stage labeling could have been chosen, allowing automatic sleep scoring without significant differences in the EEG waveform.

We believe that the proposed method is a promising approach to determining the stages of sleep. The channel selection method is simple to implement and easy to understand, and since the channel importance is calculated with the trained model, it can be used with any ML or DL algorithm.

Importantly, and contrary to the state of the art using a single bipolar channel, in the WPI-IIIS dataset we found that performance decreases by approximately 10% in all metrics when using fewer than 3 EEG channels.

The results have shown that we can select a set of the most relevant channels for sleep staging, but the selected set may differ between subjects. Therefore, we have to either (1) obtain a set of the most relevant channels from a larger population or (2) select a set of channels for each subject. The essential channels for sleep scoring in this study tended to lie over frontal to central regions. This is consistent with the fact that characteristic EEG waveforms, such as the K-complex and sleep slow waves, are frontally dominant, while spindles are centrally dominant18, which may be one reason for their contribution to automatic sleep scoring performance. In addition, EOG artifacts such as blinks during wakefulness, the rapid eye movements characteristic of stages W and REM, and the slow eye movements (SEM) of N1 may also be picked up in frontal EEG, which may help determine stages W, N1, and REM. However, the performance deltas presented in Figs. 4 and 9 tend to be higher in the central and left temporal regions. One explanation is that automatic sleep scoring picks up delta wave changes in the central and left temporal regions rather than in the frontal region, which is essential for visual scoring. The central and temporal regions have lower delta wave amplitudes than the frontal regions, so changes in amplitude and power may be easier to track there than in the high-amplitude frontal regions. Another possible reason is that artifacts can resemble sleep slow waves, such as slow-frequency drift from sweat or rapid eye movements in the EOG, making frontal slow waves less reliable when these artifacts are present. Information from the central and temporal regions, where such artifacts are less likely to intrude, may have been necessary to automatically determine sleep stage N3.

Both 3-EEG-electrode configurations, AASM-recommended and randomly extracted, increased the number of N1-to-N2 misclassifications; however, 128 channels increased N1 detection performance above the N1 inter-rater agreement rate (63%) reported for human scorers60. The following reasons for the disagreement on N1 can be considered: (1) detection of N1 feature waveforms, such as sharp vertex waves, and of N2 feature waveforms, such as the K-complex and the spindle, may be better with 128 channels than with 3 channels. (2) Under the AASM manual sleep stage scoring rules, cortical arousal during stage N2 terminates N2 and changes the sleep stage to N18. The 128 channels may have detected cortical arousals that shift the sleep stage from N2 to N1 and more accurately detected changes in EEG frequency, whereas 3-channel models may not be able to detect cortical arousals. (3) N1 produces more SEM than N2, which may produce SEM with flatter eye movements; 128 channels may have captured SEM artifacts in areas closer to the eyeballs8. The definition of sleep stage N1 is relatively vague and has the lowest inter-scorer reliability60,61; therefore, compared with human scorers, the GRU model performs relatively well.

A weak point of permutation-based channel selection is that it provides only a general indication of which channels contain less or more information, since it does not consider complex channel combinations that could increase classification performance. Such combinations can be explored with approaches like recursive channel elimination, forward addition, or NSGA28,29. In all cases, however, the computational cost is higher than with permutation-based channel selection, since these approaches must train as many models as there are channel combinations to test, which for our 128-channel dataset is a very large number.

Future work will focus on solving the problem associated with the N1 sleep stage since, as shown in the confusion matrices in Fig. 2, even when using all available channels, N1 is misclassified mainly as N2; other works have also shown that it is misclassified as REM and, to a lesser extent, as W epochs38.

For this, we would like to explore one-class classifiers, data augmentation methods, larger datasets from many subjects, or transfer learning. Using data augmentation, other authors have reported improvements of approximately 1–3% for N1 and up to 12% for the remaining sleep stages, but the problem with N1 remains unsolved62.

In this work, we focused on sleep staging using only EEG signals, since EOG/EMG channels could introduce noise (movements, blinking, muscle contractions) and potentially affect classification performance in the absence of a preprocessing step, as well as adding signals with different amplitude and frequency ranges. In future work, however, we would like to explore the use of EOG signals, since other works have shown that combining EEG and EOG can reach performance of up to 0.85 (ref. 42).

Our future work will also include testing our approach for sleep staging on public datasets such as SHHS and Sleep-EDF41,43,44 and comparing its performance with state-of-the-art methods. Our experimental configuration (skipping the preprocessing step and using 2-s segments) was defined based on our previous findings; however, we will further explore whether a preprocessing step or a different segment size increases classification performance, for both our proposed method and state-of-the-art methods. We also plan to test permutation-based channel selection on high-density EEG datasets (more than 128 channels) and to compare whether different methods yield similar performance, identifying common brain areas relevant for sleep staging.

Materials and methods

EEG recording

The dataset was collected at the International Institute for Integrative Sleep Medicine (WPI-IIIS) at the University of Tsukuba, Ibaraki, Japan. The data collection process was approved by the Clinical Research Ethics Review Committee of the University of Tsukuba Hospital (R02-213). This study was conducted in accordance with the Declaration of Helsinki, and all subjects gave their informed consent. The dataset consists of EEG recordings of 14 subjects (5 females and 9 males, 22.5 ± 0.9 years), who slept for approximately 8 h and 15 min (±2 min for females, ±4 min for males); sleep time was controlled by turning the lights off and on. Data were collected using a BioSemi headcap with 128 EEG channels in a radial montage at a sampling rate of 1024 Hz, along with three EOG channels, three EMG channels, and two mastoid channels. It should be noted that we did not include the EOG/EMG data in any of the experiments in this work.

The sleep stages were manually labeled every 30 s by a registered polysomnographic technologist. Table 3 presents the average percentage distribution of the five stages across all subjects: W, N1, N2, N3, and REM.

Table 3.

Average percentage distribution of the awake and the four sleep stages from all the subjects in the WPI-IIIS dataset.

W N1 N2 N3 REM
9.3 ± 5.8% 11.6 ± 6.1% 37.4 ± 9.2% 20.9 ± 6.7% 20.8 ± 8.5%

In the preprocessing step, the EEG channels were re-referenced by subtracting (M1+M2)/2, the average of the right and left mastoids. We downsampled the dataset to 128 Hz and, to increase the number of instances, divided each 30-s epoch into 2-s segments: 2-s segments have shown performance as high as larger segments (previous experiments considered 2-, 5-, 10-, 15-, and 30-s segments), they increase the number of instances, and they would make it possible to detect sleep stage onsets faster38,63,64.

Additional preprocessing, such as bandpass or notch filtering, was skipped, since experimental results have shown that the tested algorithms perform better with raw data.
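These steps can be sketched as follows (a minimal NumPy illustration; plain stride-based decimation stands in for the downsampling, as the paper does not specify the resampling method, and the function and variable names are our own):

```python
import numpy as np

def preprocess(eeg, mastoids, fs_in=1024, fs_out=128, seg_s=2):
    """Re-reference to the mastoid average, downsample, and segment one epoch.

    eeg      : (n_channels, n_samples) raw EEG of one 30-s epoch
    mastoids : (2, n_samples) right/left mastoid channels (M1, M2)
    Returns  : (n_segments, n_channels, seg_s * fs_out)
    """
    # Re-reference: subtract (M1 + M2) / 2 from every EEG channel
    eeg = eeg - mastoids.mean(axis=0)
    # Downsample 1024 Hz -> 128 Hz by decimation (factor 8);
    # a production pipeline might use a polyphase resampler instead
    factor = fs_in // fs_out
    eeg = eeg[:, ::factor]
    # Split the 30-s epoch into non-overlapping 2-s segments
    seg_len = seg_s * fs_out
    n_segs = eeg.shape[1] // seg_len
    segs = eeg[:, :n_segs * seg_len].reshape(eeg.shape[0], n_segs, seg_len)
    return segs.transpose(1, 0, 2)

epoch = np.random.randn(128, 30 * 1024)   # one 30-s epoch, 128 channels
mast = np.random.randn(2, 30 * 1024)      # two mastoid channels
segments = preprocess(epoch, mast)
print(segments.shape)                     # (15, 128, 256): 15 2-s segments
```

Each 30-s epoch thus yields fifteen 2-s instances of shape (128 channels x 256 samples), matching the model input shape in Table 4.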

Deep learning models

In the interest of brevity, here we describe only the relevant details of the GRU; however, we tested many other architectures proposed for EEG-related tasks, including brain-computer interfaces (BCIs)56.

As mentioned before, EEGNet has shown good performance and has several current versions; we tested all of them but report only the one with the highest performance. In the case of the CNN, we tested different padding configurations (causal, dilated, and causal-dilated) but likewise report only the configuration with the highest performance.

Gated recurrent unit—GRU

GRUs are a type of RNN architecture that are designed to model sequential data and capture long-term dependencies. This makes them suitable for analyzing sleep EEG signals, which are time series data that contain information about the brain’s electrical activity during sleep. GRUs work by using a set of gating mechanisms to control the flow of information within the network. These gating mechanisms include an update gate and a reset gate31,65.

The update gate determines how much of the previous hidden state should be remembered, while the reset gate determines how much of the previous hidden state should be forgotten31,65. The candidate hidden state is then computed by combining the reset gate and the current input, and passing them through a nonlinear activation function, such as the hyperbolic tangent (tanh) function66. This candidate hidden state is then combined with the output of the update gate to produce the new hidden state65.
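In symbols (following Cho et al.65 and Chung et al.31, with bias terms omitted for brevity), a GRU computes its new hidden state h_t from the input x_t and the previous hidden state h_{t-1} as:

```latex
\begin{aligned}
z_t &= \sigma\!\left(W_z x_t + U_z h_{t-1}\right) && \text{(update gate)}\\
r_t &= \sigma\!\left(W_r x_t + U_r h_{t-1}\right) && \text{(reset gate)}\\
\tilde{h}_t &= \tanh\!\left(W x_t + U \left(r_t \odot h_{t-1}\right)\right) && \text{(candidate hidden state)}\\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new hidden state)}
\end{aligned}
```

where sigma denotes the logistic sigmoid and the circled dot denotes element-wise multiplication.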

The GRU’s gating mechanisms allow it to selectively update and remember information over time, making it particularly effective in handling long sequences and capturing dependencies across time steps31,65. This is important for analyzing sleep EEG signals, as these signals can be long and complex and contain information on brain activity over time.

The number of GRU layers in a model can affect its performance. Adding more GRU layers can allow the model to capture more complex patterns and dependencies in the data, but it also increases the model’s capacity, which may lead to overfitting31.

Table 4 presents the architecture of the 3-layer GRU model that we used, tested experimentally and inspired by the results of a previous benchmark showing that a 3-layer GRU offers a good balance between performance and complexity, with a relatively low risk of overfitting56. The model has an input layer with 128 neurons, followed by 3 GRU layers with 100 neurons each; the output layer has 5 neurons, one for each sleep stage. We used an L2 regularization penalty with a factor of 0.0001, the exponential linear unit (ELU) activation function, and a softmax output. These parameters have been shown to improve the performance of GRU models in EEG classification tasks25,26,56.

Table 4.

Followed GRU architecture for 5-class sleep staging.

Layer     # units   Parameters
Input     –         input = (None, 128, 256)
GRU       100       input_shape = (no_chs, samples), return_sequences = True, kernel_regularizer = l2(0.0001)
GRU       100       return_sequences = True, kernel_regularizer = l2(0.0001)
GRU       100       kernel_regularizer = l2(0.0001)
Dropout   –         rate = 0.5
Dense     100       activation = elu
Dense     5         activation = softmax, output = (None, 5)
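The architecture in Table 4 can be sketched in Keras as follows (an illustrative reconstruction, not necessarily the authors' exact implementation; the loss and optimizer follow the "Classification configuration" section):

```python
from tensorflow.keras import layers, models, regularizers

def build_gru(no_chs=128, samples=256, n_classes=5):
    """3-layer GRU for 5-class sleep staging, following Table 4."""
    model = models.Sequential([
        layers.Input(shape=(no_chs, samples)),          # (None, 128, 256)
        layers.GRU(100, return_sequences=True,
                   kernel_regularizer=regularizers.l2(0.0001)),
        layers.GRU(100, return_sequences=True,
                   kernel_regularizer=regularizers.l2(0.0001)),
        layers.GRU(100, kernel_regularizer=regularizers.l2(0.0001)),
        layers.Dropout(0.5),
        layers.Dense(100, activation="elu"),
        layers.Dense(n_classes, activation="softmax"),  # one unit per stage
    ])
    # Categorical cross-entropy loss optimized with Adam, as in the paper
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Passing a different `no_chs` builds the 3-channel (AASM or random) or single-channel variants from the same definition.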

Classification configuration

For all DL algorithms tested, we use the categorical cross-entropy loss function, and the networks are optimized with the Adam algorithm67. In all cases, we consider a maximum of 300 epochs to train the models, applying an Early Stopping Callback to interrupt training when the validation accuracy stops improving.

The dataset was divided into 50% training data, 25% validation data, and 25% test data. Model performance was validated using the model accuracy (referred to in the text as M-Acc), the validation accuracy (V-acc), and, for the test set: accuracy, Fscore, precision, recall, AUROC (computed with the one-versus-rest approach), and kappa. We tested the models using 5-fold cross-validation68,69, repeating the experiments and reporting the average performance.

Since the dataset is unbalanced, we experimented with various weight values of the classes; in this way, we observed how performance increased or decreased by adjusting the weights of the classes70. However, the experimental results show the highest performance when we use the same weight for all classes.
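The data split and training configuration described above can be sketched as follows (illustrative only: the arrays are synthetic stand-ins with smaller shapes than the real 128-channel, 256-sample segments, and the early-stopping `patience` value is our assumption, as the paper does not specify it):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 2-s EEG segments: (instances, channels, samples)
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 4, 8)).astype("float32")
y = rng.integers(0, 5, size=1000)

# 50% training, 25% validation, 25% test
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.5, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.5, random_state=0)
print(len(X_train), len(X_val), len(X_test))  # 500 250 250

# With Keras, training would then cap at 300 epochs and stop early when the
# validation accuracy stops improving, e.g.:
# early_stop = tf.keras.callbacks.EarlyStopping(
#     monitor="val_accuracy", patience=20, restore_best_weights=True)
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=300, callbacks=[early_stop])
```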

EEG channel selection

EEG signals are commonly used in sleep research, and there is long-standing interest in using only a few channels to achieve real-time sleep staging. To this end, channels have been selected using expert knowledge, as well as automatic methods such as greedy algorithms and multi-objective optimization (MOO) algorithms. These methods have been tested not only for sleep staging but also for other EEG-related tasks, showing promising results28,29,63.

The American Academy of Sleep Medicine (AASM) recommends the use of EEG channels F4, C4, and O2, which correspond, respectively, to C4, B22, and A28 of the 128-channel BioSemi cap18. For the experiments presented, we tested three main configurations: (1) the set of 3 channels recommended by the AASM, (2) a set of 3 random channels (randomly selected in each fold/subject), and (3) the most important channels selected using a permutation-based method. Additionally, we present a set of experiments using a single channel, as is done in the state-of-the-art.

Permutation-based channel selection

Traditional approaches to classifying sleep stages typically use all available EEG channels. However, the use of irrelevant or redundant information from EEG channels could lead to poor classification performance, as well as the need to use high-density EEG devices.

To address this issue, we propose a computationally cheap permutation-based channel selection method. By permuting the EEG channels and evaluating the performance of the classification model, we can identify the most informative channels for sleep stage classification. This process allows the use of only the most relevant EEG channels, yielding classification performance comparable to, if not exceeding, that of high-density EEG.

Figure 10 presents an illustrative example of the process followed to identify the most relevant channels.

The process starts by creating a model with all 128 EEG channels, using 50% of the data for training and 25% for validation.

Once the model is created, the test set, consisting of the remaining 25% of the dataset, is used to predict sleep stages and measure performance. The performance obtained is considered the baseline, against which the other performances are compared.

We then take the data of one EEG channel across all instances in the test set and randomly shuffle them among the instances. Each instance thus carries different information at that channel position than when it was collected; since the model learned to expect specific information at that position, we can measure whether the information from that channel is important for classifying the sleep stages.

When computing the performance delta, any of the metrics can be used, but in this paper we use the average difference across the metrics (accuracy, Fscore, precision, recall, AUROC, and kappa). Throughout the document, we refer to the performance delta as the difference between the baseline performance (using all channels) and the performance of the model when the information from a given channel is permuted.

This process is repeated for all EEG channels, after which we can analyze which channels decrease or increase performance, reflecting each channel's importance for sleep stage classification. Furthermore, the process is repeated using 5-fold cross-validation and for each subject individually, to verify whether the important channels remain the same across different test sets.

In the example presented in Fig. 10, the baseline accuracy is 90%; the accuracy decreases to 85% when the information from channel 2 (ch 2 in the figure) is permuted, and to 89% when the information from channel 3 (ch 3 in the figure) is permuted. Once the process is completed, the graph at the bottom right illustrates that performance decreases most when the information from channel 2 is permuted, meaning that channel 2 is more important than the other channels for high performance.

From this analysis, we can also determine whether some channels are irrelevant, or even whether they reduce performance, as illustrated by the last channel in the example graph.
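The procedure above can be sketched as follows (an illustrative implementation: for brevity it uses a single accuracy metric rather than the average over the six metrics, and a toy classifier stands in for the GRU):

```python
import numpy as np

def permutation_channel_importance(predict, X_test, y_test, metric, seed=0):
    """Permutation-based channel importance.

    predict : callable mapping (instances, channels, samples) -> labels
    metric  : callable mapping (y_true, y_pred) -> score
    Returns the baseline score and the per-channel performance delta
    (baseline minus the score with that channel shuffled across instances);
    larger deltas indicate more informative channels.
    """
    rng = np.random.default_rng(seed)
    baseline = metric(y_test, predict(X_test))
    deltas = np.empty(X_test.shape[1])
    for ch in range(X_test.shape[1]):
        X_perm = X_test.copy()
        # Shuffle this channel across instances: each instance now carries
        # another instance's data at this channel position
        X_perm[:, ch] = X_perm[rng.permutation(len(X_perm)), ch]
        deltas[ch] = baseline - metric(y_test, predict(X_perm))
    return baseline, deltas

# Toy demonstration: the class label is encoded only in channel 0,
# so only channel 0 should show a large performance delta
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
X = rng.standard_normal((200, 4, 16))
X[:, 0] += 5.0 * y[:, None]
predict = lambda data: (data[:, 0].mean(axis=1) > 2.5).astype(int)
accuracy = lambda yt, yp: float(np.mean(yt == yp))
baseline, deltas = permutation_channel_importance(predict, X, y, accuracy)
```

Note that only one model per fold is trained; every channel's importance is evaluated against the same fitted model, which is what makes the method computationally cheap compared with wrapper approaches.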

Two-way ANOVA

A two-way ANOVA was performed to evaluate the effects of the tested methods and the performance metrics. For this, we used the Python library Statsmodels71. The ANOVA was computed using ordinary least squares (OLS) regression, and ANOVA tables were generated showing the main effects and interaction effects of the factors.

The analysis was carried out using two factors: Method (two levels, representing the two methods being compared; example 1: GRU and EEGNeX, example 2: GRU-3AASM and EEGNeX-3AASM) and Metric (six levels, corresponding to the performance metrics used to validate the classification models). For more information on the comparison settings, see the Supplementary Information. Multiple-comparison corrections were not performed, since the performance metrics used for the ANOVA already represent averages from the 5-fold cross-validation72.

Acknowledgements

This work was supported by the Japan Society for the Promotion of Science (JSPS) Postdoctoral Fellowship for Research in Japan: Fellowship ID P22716, JSPS KAKENHI: Grant numbers JP22K19802 and JP20K03493, Japan Agency for Medical Research and Development (AMED) under Grant Number JP21zf0127005, and the Ministry of Education, Culture, Sports, Science and Technology (MEXT), World Premier International Research Center Initiative (WPI) program. Sponsors were not involved in the design of the study, the collection, analysis, or interpretation of data, the writing of this article, or the decision to submit it for publication.

Author contributions

L.A.M. devised the methods, performed the experiments, analyzed the results, and wrote the manuscript. Y.S., M.M., and T.A. discussed the results and partially wrote, read, and revised the manuscript. J.F. collected and labeled the dataset.

Data and code availability

The dataset collected at the University of Tsukuba is available from the corresponding author upon reasonable request. The classifier code, including an example with a random subset of the dataset, is available at https://github.com/wavesresearch/sleep-staging_IIIS.

Competing interests

The authors declare that the investigation was carried out in the absence of commercial or financial relationships that could be construed as a potential conflict of interest.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-024-68978-4.

References

  • 1. Stampi, C. Why we nap (Springer, 1992).
  • 2. Walker, M. Why we sleep: Unlocking the power of sleep and dreams (Simon and Schuster, 2017).
  • 3. Bisson, A. N. S., Robinson, S. A. & Lachman, M. E. Walk to a better night of sleep: Testing the relationship between physical activity and sleep. Sleep Health 5, 487–494 (2019). 10.1016/j.sleh.2019.06.003
  • 4. Patel, A. K., Reddy, V., Shumway, K. R. & Araujo, J. F. Physiology, sleep stages. In StatPearls [Internet] (StatPearls Publishing, 2022).
  • 5. Aboalayon, K. A. I., Faezipour, M., Almuhammadi, W. S. & Moslehpour, S. Sleep stage classification using EEG signal analysis: A comprehensive survey and new investigation. Entropy 18, 272 (2016). 10.3390/e18090272
  • 6. Yildirim, O., Baloglu, U. B. & Acharya, U. R. A deep learning model for automated sleep stages classification using PSG signals. Int. J. Environ. Res. Public Health 16, 599 (2019). 10.3390/ijerph16040599
  • 7. Keenan, S. A. An overview of polysomnography. Handb. Clin. Neurophysiol. 6, 33–50 (2005). 10.1016/S1567-4231(09)70028-0
  • 8. Berry, R. B. et al. The AASM manual for the scoring of sleep and associated events: Rules, terminology and technical specifications (American Academy of Sleep Medicine, 2018).
  • 9. Contreras, D., Destexhe, A., Sejnowski, T. J. & Steriade, M. Spatiotemporal patterns of spindle oscillations in cortex and thalamus. J. Neurosci. 17, 1179–1196 (1997). 10.1523/JNEUROSCI.17-03-01179.1997
  • 10. Stickgold, R. & Walker, M. P. The neuroscience of sleep (Academic Press, 2010).
  • 11. Kryger, M. H., Roth, T. & Dement, W. C. Principles and practice of sleep medicine E-book: Expert consult-online and print (Elsevier, 2010).
  • 12. Calkins, M. W. Statistics of dreams. Am. J. Psychol. 5, 311–343 (1893). 10.2307/1410996
  • 13. Hobson, J. A. REM sleep and dreaming: Towards a theory of protoconsciousness. Nat. Rev. Neurosci. 10, 803–813 (2009). 10.1038/nrn2716
  • 14. Zhao, X. et al. Classification of sleep apnea based on EEG sub-band signal characteristics. Sci. Rep. 11, 5824 (2021). 10.1038/s41598-021-85138-0
  • 15. Schlüter, T. & Conrad, S. An approach for automatic sleep stage scoring and apnea-hypopnea detection. Front. Comput. Sci. 6, 230–241 (2012). 10.1007/s11704-012-2872-6
  • 16. Almuhammadi, W. S., Aboalayon, K. A. & Faezipour, M. Efficient obstructive sleep apnea classification based on EEG signals. In 2015 Long Island Systems, Applications and Technology, 1–6 (IEEE, 2015).
  • 17. Moser, D. et al. Sleep classification according to AASM and Rechtschaffen & Kales: Effects on sleep scoring parameters. Sleep 32, 139 (2009). 10.1093/SLEEP/32.2.139
  • 18. Berry, R. et al. The AASM manual for the scoring of sleep and associated events: Rules, terminology and technical specifications (American Academy of Sleep Medicine, 2018).
  • 19. Memar, P. & Faradji, F. A novel multi-class EEG-based sleep stage classification system. IEEE Trans. Neural Syst. Rehabil. Eng. 26, 84–95 (2017). 10.1109/TNSRE.2017.2776149
  • 20. Ghimatgar, H., Kazemi, K., Helfroush, M. S. & Aarabi, A. An automatic single-channel EEG-based sleep stage scoring method based on hidden Markov model. J. Neurosci. Methods 324, 108320 (2019). 10.1016/j.jneumeth.2019.108320
  • 21. Tsinalis, O., Matthews, P. M., Guo, Y. & Zafeiriou, S. Automatic sleep stage scoring with single-channel EEG using convolutional neural networks. arXiv:1610.01683 (2016).
  • 22. Fiorillo, L., Favaro, P. & Faraci, F. D. DeepSleepNet-Lite: A simplified automatic sleep stage scoring model with uncertainty estimates. IEEE Trans. Neural Syst. Rehabil. Eng. 29, 2076–2085 (2021). 10.1109/TNSRE.2021.3117970
  • 23. Zhang, H., Wang, X., Li, H., Mehendale, S. & Guan, Y. Auto-annotating sleep stages based on polysomnographic data. Patterns 3, 100371 (2022). 10.1016/j.patter.2021.100371
  • 24. Decat, N. et al. Beyond traditional sleep scoring: Massive feature extraction and data-driven clustering of sleep time series. Sleep Med. 98, 39–52 (2022). 10.1016/j.sleep.2022.06.013
  • 25. Craik, A., He, Y. & Contreras-Vidal, J. L. Deep learning for electroencephalogram (EEG) classification tasks: A review. J. Neural Eng. 16, 031001 (2019). 10.1088/1741-2552/ab0ab5
  • 26. Hosseini, M.-P., Hosseini, A. & Ahi, K. A review on machine learning for EEG signal processing in bioengineering. IEEE Rev. Biomed. Eng. 14, 204–218 (2020). 10.1109/RBME.2020.2969915
  • 27. Lotte, F. et al. A review of classification algorithms for EEG-based brain-computer interfaces: A 10 year update. J. Neural Eng. 15, 031005 (2018). 10.1088/1741-2552/aab2f2
  • 28. Moctezuma, L. A. Towards universal EEG systems with minimum channel count based on machine learning and computational intelligence. Ph.D. thesis, NTNU (2021).
  • 29. Moctezuma, L. A. & Molinas, M. Multi-objective optimization for EEG channel selection and accurate intruder detection in an EEG-based subject identification system. Sci. Rep. 10, 1–12 (2020). 10.1038/s41598-020-62712-6
  • 30. Goodfellow, I., Bengio, Y. & Courville, A. Deep learning (MIT Press, 2016).
  • 31. Chung, J., Gulcehre, C., Cho, K. & Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555 (2014).
  • 32. Sors, A., Bonnet, S., Mirek, S., Vercueil, L. & Payen, J.-F. A convolutional neural network for sleep stage scoring from raw single-channel EEG. Biomed. Signal Process. Control 42, 107–114 (2018). 10.1016/j.bspc.2017.12.001
  • 33. Mousavi, S., Afghah, F. & Acharya, U. R. SleepEEGNet: Automated sleep stage scoring with sequence to sequence deep learning approach. PLoS ONE 14, e0216456 (2019). 10.1371/journal.pone.0216456
  • 34. Lu, N., Li, T., Ren, X. & Miao, H. A deep learning scheme for motor imagery classification based on restricted Boltzmann machines. IEEE Trans. Neural Syst. Rehabil. Eng. 25, 566–576 (2016). 10.1109/TNSRE.2016.2601240
  • 35. Yin, Z. & Zhang, J. Cross-session classification of mental workload levels using EEG and an adaptive deep learning model. Biomed. Signal Process. Control 33, 30–47 (2017). 10.1016/j.bspc.2016.11.013
  • 36. Phan, H. & Mikkelsen, K. Automatic sleep staging of EEG signals: Recent development, challenges, and future directions. Physiol. Meas. 43, 04TR01 (2022). 10.1088/1361-6579/ac6049
  • 37. Lawhern, V. J. et al. EEGNet: A compact convolutional neural network for EEG-based brain-computer interfaces. J. Neural Eng. 15, 056013 (2018). 10.1088/1741-2552/aace8c
  • 38. Moctezuma, L. A., Abe, T. & Molinas, M. EEG-based 5- and 2-class CNN for sleep stage classification. In The 22nd World Congress of the International Federation of Automatic Control, 6 (2023).
  • 39. Lee, C.-H., Kim, H.-J., Heo, J.-W., Kim, H. & Kim, D.-J. Improving sleep stage classification performance by single-channel EEG data augmentation via spectral band blending. In 2021 9th International Winter Conference on Brain-Computer Interface (BCI), 1–5 (IEEE, 2021).
  • 40. Wang, I.-N., Lee, C.-H., Kim, H.-J., Kim, H. & Kim, D.-J. An ensemble deep learning approach for sleep stage classification via single-channel EEG and EOG. In 2020 International Conference on Information and Communication Technology Convergence (ICTC), 394–398 (IEEE, 2020).
  • 41. Kemp, B., Zwinderman, A. H., Tuk, B., Kamphuisen, H. A. & Oberye, J. J. Analysis of a sleep-dependent neuronal feedback loop: The slow-wave microcontinuity of the EEG. IEEE Trans. Biomed. Eng. 47, 1185–1194 (2000). 10.1109/10.867928
  • 42. Jia, Z. et al. SalientSleepNet: Multimodal salient wave detection network for sleep staging. arXiv:2105.13864 (2021).
  • 43. Goldberger, A. L. et al. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 101, e215–e220 (2000). 10.1161/01.CIR.101.23.e215
  • 44. Quan, S. F. et al. The Sleep Heart Health Study: Design, rationale, and methods. Sleep 20, 1077–1085 (1997).
  • 45. Eldele, E. et al. An attention-based deep learning approach for sleep stage classification with single-channel EEG. IEEE Trans. Neural Syst. Rehabil. Eng. 29, 809–818 (2021). 10.1109/TNSRE.2021.3076234
  • 46. Tao, Y. et al. A novel feature relearning method for automatic sleep staging based on single-channel EEG. Complex Intell. Syst. 9, 41–50 (2023). 10.1007/s40747-022-00779-6
  • 47. Hassan, A. R. & Subasi, A. A decision support system for automated identification of sleep stages from single-channel EEG signals. Knowl.-Based Syst. 128, 115–124 (2017). 10.1016/j.knosys.2017.05.005
  • 48. Zhu, G., Li, Y. & Wen, P. Analysis and classification of sleep stages based on difference visibility graphs from a single-channel EEG signal. IEEE J. Biomed. Health Inform. 18, 1813–1821 (2014). 10.1109/JBHI.2014.2303991
  • 49. Sharma, R., Pachori, R. B. & Upadhyay, A. Automatic sleep stages classification based on iterative filtering of electroencephalogram signals. Neural Comput. Appl. 28, 2959–2978 (2017). 10.1007/s00521-017-2919-6
  • 50. Supratak, A., Dong, H., Wu, C. & Guo, Y. DeepSleepNet: A model for automatic sleep stage scoring based on raw single-channel EEG. IEEE Trans. Neural Syst. Rehabil. Eng. 25, 1998–2008 (2017). 10.1109/TNSRE.2017.2721116
  • 51. Hassan, A. R. & Bhuiyan, M. I. H. Computer-aided sleep staging using complete ensemble empirical mode decomposition with adaptive noise and bootstrap aggregating. Biomed. Signal Process. Control 24, 1–10 (2016). 10.1016/j.bspc.2015.09.002
  • 52. Zhang, Y. et al. SHNN: A single-channel EEG sleep staging model based on semi-supervised learning. Expert Syst. Appl. 213, 119288 (2023). 10.1016/j.eswa.2022.119288
  • 53. Tsinalis, O., Matthews, P. M. & Guo, Y. Automatic sleep stage scoring using time-frequency analysis and stacked sparse autoencoders. Ann. Biomed. Eng. 44, 1587–1597 (2016). 10.1007/s10439-015-1444-y
  • 54. Hsu, Y.-L., Yang, Y.-T., Wang, J.-S. & Hsu, C.-Y. Automatic sleep stage recurrent neural classifier using energy features of EEG signals. Neurocomputing 104, 105–114 (2013). 10.1016/j.neucom.2012.11.003
  • 55. Nazih, W., Shahin, M., Eldesouki, M. I. & Ahmed, B. Influence of channel selection and subject's age on the performance of the single channel EEG-based automatic sleep staging algorithms. Sensors 23, 899 (2023). 10.3390/s23020899
  • 56. Chen, X., Teng, X., Chen, H., Pan, Y. & Geyer, P. Toward reliable signals decoding for electroencephalogram: A benchmark study to EEGNeX. arXiv:2207.12369 (2022).
  • 57. Zeitlhofer, J. et al. Topographic distribution of sleep spindles in young healthy subjects. J. Sleep Res. 6, 149–155 (1997). 10.1046/j.1365-2869.1997.00046.x
  • 58. McCormick, L., Nielsen, T., Nicolas, A., Ptito, M. & Montplaisir, J. Topographical distribution of spindles and K-complexes in normal subjects. Sleep 20, 939–941 (1997). 10.1093/sleep/20.11.939
  • 59. Happe, S. et al. Scalp topography of the spontaneous K-complex and of delta-waves in human sleep. Brain Topogr. 15, 43–49 (2002). 10.1023/A:1019992523246
  • 60. Rosenberg, R. S. & Van Hout, S. The American Academy of Sleep Medicine inter-scorer reliability program: Sleep stage scoring. J. Clin. Sleep Med. 9, 81–87 (2013). 10.5664/jcsm.2350
  • 61. Danker-Hopfe, H. et al. Interrater reliability between scorers from eight European sleep laboratories in subjects with different sleep disorders. J. Sleep Res. 13, 63–69 (2004). 10.1046/j.1365-2869.2003.00375.x
  • 62. Fan, J. et al. EEG data augmentation: Towards class imbalance problem in sleep staging tasks. J. Neural Eng. 17, 056017 (2020). 10.1088/1741-2552/abb5be
  • 63. Seljevoll Herleiksplass, K., Moctezuma, L. A., Furuki, J., Suzuki, Y. & Molinas, M. Automatic sleep-wake scoring with optimally selected EEG channels from high-density EEG. In The 16th International Conference on Brain Informatics, 12 (2023).
  • 64. Moctezuma, L. A., Suzuki, Y., Furuki, J., Molinas, M. & Abe, T. Enhancing sleep stage classification with 2-class stratification and permutation-based channel selection. In 45th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 1–4 (IEEE, 2024).
  • 65. Cho, K. et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv:1406.1078 (2014).
  • 66. Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986). 10.1038/323533a0
  • 67. Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. arXiv:1412.6980 (2014).
  • 68. Berrar, D. Cross-validation. In Encyclopedia of bioinformatics and computational biology (eds Ranganathan, S. et al.) (Academic Press, 2019).
  • 69. King, R. D., Orhobor, O. I. & Taylor, C. C. Cross-validation is safe to use. Nat. Mach. Intell. 3, 276 (2021). 10.1038/s42256-021-00332-z
  • 70. King, G. & Zeng, L. Logistic regression in rare events data. Polit. Anal. 9, 137–163 (2001). 10.1093/oxfordjournals.pan.a004868
  • 71. Seabold, S. & Perktold, J. Statsmodels: Econometric and statistical modeling with Python. In 9th Python in Science Conference (2010).
  • 72. Armstrong, R. A. When to use the Bonferroni correction. Ophthalmic Physiol. Opt. 34, 502–508 (2014). 10.1111/opo.12131
