Author manuscript; available in PMC: 2024 Oct 5.
Published in final edited form as: IEEE Trans Neural Netw Learn Syst. 2023 Oct 5;34(10):6983–7003. doi: 10.1109/TNNLS.2022.3145365

TABLE IV.

A summary of contributions describing the application of RNNs to sleep stage classification.

Author | Model architecture | Dataset (footnotes a, b) | Performance
Supratak et al. 2017[21] • The authors proposed a hierarchical structure called DeepSleepNet with CNN and RNN parts.
• Each signal chunk was first fed into a CNN part composed of two parallel CNN-1Ds with different filter sizes, and the two outputs were then concatenated as a chunk-level feature.
• All chunk features were connected as temporal sequences and fed into a 2-layer bidirectional LSTM with a residual connection.
MASS and Sleep-EDF For MASS, the accuracy and macro F1-score were 86.2% and 0.817; for Sleep-EDF, the accuracy and macro F1-score were 82.0% and 0.769.
Dong et al. 2017[38] • In each signal chunk, engineered features were extracted.
• Each feature vector was first fed into two dense layers. The outputs of all chunks in each recording were then connected into a feature sequence.
• The feature sequence was fed into an LSTM as an input vector.
MASS The accuracy and macro F1-score were 85.92% and 0.805.
Phan et al. 2018[23] • The authors first calculated the log-power spectral coefficients in each chunk as model input.
• A DNN model was used as a filter bank for feature extraction and dimension reduction.
• A two-layer bidirectional GRU with an attention layer was then applied on the top of the DNN part. The attention vector was fed to a softmax layer for final output.
• After the model training, the softmax layer was replaced by an SVM for final decision-making.
Sleep-EDF Expanded When only the in-bed period was considered, the accuracy and macro F1-score were 79.1% and 0.698; when the periods before and after sleep were also included, they were 82.5% and 0.72.
Michielli et al. 2018[81] • Each 30s signal chunk was divided into 30 smaller chunks with 1s length, and 55 features were extracted in each chunk.
• The authors designed a two-level LSTM-based classifier. After feature selection, the feature sequences were fed into the first-level classifier, which distinguished four classes, with stages N1 and REM merged into a single class.
• The samples assigned to the merged N1/REM class were then fed to a second, binary classifier.
Sleep-EDF The accuracy was 86.7%.
Phan et al. 2019[14] • The authors proposed a hierarchical structure called SeqSleepNet with filter bank layers and two levels of RNNs.
• Raw data had three channels (EEG, EOG, and EMG). In each chunk, a power spectrum image was computed for each channel, and the filter bank layers were applied to these images.
• Within each chunk, the time-step features were connected as a sequence and fed into the first-level RNN (a bidirectional GRU) with an attention layer, yielding a chunk-level feature.
• The chunk-level features were then connected as a sequence across chunks and fed into the second-level bidirectional GRU.
MASS The accuracy and macro F1-score were 87.1% and 0.833.
Phan et al. 2019[31] • The authors proposed a transfer learning strategy.
• The authors used either the CNN part from DeepSleepNet or the RNN part from SeqSleepNet to extract chunk-level features in each chunk.
• The chunk-level features were connected as feature sequences and fed into a 2-layer bidirectional LSTM with residual connection.
The model was trained on MASS and fine-tuned on Sleep-EDF-SC, Sleep-EDF-ST, Surrey-cEEGGrid, and Surrey-PSG. The accuracy and macro F1-score obtained with transfer learning outperformed direct training on all four datasets.
Phan et al. 2019[82] The authors proposed a Fusion model, which was composed of DeepSleepNet and SeqSleepNet with slight modifications. MASS The accuracy and macro F1-score were 88.0% and 0.843.
Mousavi et al. 2019[83] • The authors proposed a Seq2seq model called SleepEEGNet.
• The signal was first fed into CNN layers to extract the feature sequence, which was used as input to an encoder.
• Both the encoder and decoder were constructed with bidirectional LSTMs and an attention mechanism.
Sleep-EDF Expanded (version 1 and 2) The accuracy and macro F1-score were 84.26% and 0.7966 for version 1; 80.03% and 0.7355 for version 2.
a MASS: Montreal Archive of Sleep Studies[84]; Sleep-EDF: a database in European Data Format[15]; Sleep-EDF Expanded: an expanded version of Sleep-EDF[15]; Sleep-EDF-SC: the Sleep Cassette subset of the Sleep-EDF Expanded dataset; Sleep-EDF-ST: the Sleep Telemetry subset of the Sleep-EDF Expanded dataset; Surrey-cEEGGrid and Surrey-PSG: collected at the University of Surrey using behind-the-ear electrodes and PSG electrodes, respectively[85].

b Sleep stages are commonly categorized into five classes: rapid eye movement (REM) sleep, three non-REM stages corresponding to increasing depths of sleep (N1–N3), and wakefulness.
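Several of the tabulated models (Phan et al. 2018 and 2019) place an attention layer on top of a bidirectional GRU to pool the per-time-step hidden states into a single attention vector. A minimal sketch of this pooling step, in plain Python with hand-supplied relevance scores (in the actual models the scores come from a small trainable layer; the values here are purely illustrative):

```python
import math

def attention_pool(hidden_states, scores):
    """Combine a sequence of RNN hidden vectors into one attention vector.

    hidden_states: list of equal-length feature vectors (one per time step)
    scores: one unnormalized relevance score per time step (in practice
    produced by a trainable layer; here they are supplied directly)
    """
    # Softmax over the scores gives one weight per time step.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The attention vector is the weighted average of the hidden states.
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states))
            for d in range(dim)]

# Toy example: three 2-D hidden states, the middle time step scored highest,
# so it dominates the pooled vector.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vec = attention_pool(h, scores=[0.1, 2.0, 0.1])
```

The weighted-average form is what lets the classifier emphasize the most informative portion of an epoch instead of only the final RNN state.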
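The two-level design of Michielli et al. (a four-class first stage with N1 and REM merged, followed by a binary N1-versus-REM stage) is a cascade of classifiers. A hedged sketch of the routing logic only, with trivial threshold rules standing in for the LSTM-based classifiers of the original work:

```python
def two_level_classify(epoch_features, first_level, second_level):
    """Two-level cascade in the spirit of Michielli et al.'s design.

    first_level: maps epoch features to 'W', 'N3', or the merged 'N1/REM'
    second_level: binary classifier separating 'N1' from 'REM'
    Both are stand-in callables; in the original work each level is an
    LSTM-based network operating on selected features.
    """
    labels = []
    for feats in epoch_features:
        label = first_level(feats)
        if label == "N1/REM":
            # Only epochs in the merged class reach the second classifier.
            label = second_level(feats)
        labels.append(label)
    return labels

# Toy stand-ins: threshold rules on a single scalar feature.
first = lambda f: "W" if f > 0.8 else ("N1/REM" if f > 0.4 else "N3")
second = lambda f: "REM" if f > 0.6 else "N1"
out = two_level_classify([0.9, 0.7, 0.5, 0.1], first, second)
# out == ["W", "REM", "N1", "N3"]
```

Splitting the hardest distinction (N1 vs. REM) into its own classifier lets each level solve an easier problem than a single five-class model would face.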