Author manuscript; available in PMC: 2024 Oct 5.
Published in final edited form as: IEEE Trans Neural Netw Learn Syst. 2023 Oct 5;34(10):6983–7003. doi: 10.1109/TNNLS.2022.3145365

TABLE II.

A summary of contributions applying RNNs to emotion recognition with EEG signals.

Author | Emotion classesᵃ | Model architecture | Performance | Datasetᵇ
Li, Xiang et al. 2016 [30]
Emotion classes: Low/high arousal and low/high valence.
Model architecture:
• The authors calculated a scalogram (a time-frequency representation) for each channel by CWTᶜ, then stacked the scalograms from all the channels to form a frame sequence.
• Each frame was fed into 2 CNN-1D layers, followed by one LSTM layer.
• The average values of all hidden states were fed to a softmax layer for the final output.
Performance: Mean accuracies were 72.06% and 74.12% on valence and arousal classification, respectively.
Dataset: DEAP
Li, Youjun et al. 2017 [47]
Emotion classes: Low/high arousal and low/high valence.
Model architecture:
• The 32 electrode leads were first mapped to a 9×9 matrix, which was filled with PSDᶜ features from the raw EEG signal.
• The matrix was then interpolated into a 200×200 image at each time step.
• The images were first fed into a 2-layer CNN with max pooling and one dense layer.
• An LSTM and a dense layer were used for final decision-making on the feature sequence obtained from the last time step.
Performance: The average accuracy was 75.21%.
Dataset: DEAP
Alhagry et al. 2017 [64]
Emotion classes: Low/high arousal, low/high valence, and low/high liking.
Model architecture:
• The raw signal in each segment was used as input to a two-layer LSTM.
Performance: For the arousal, valence, and liking classes, average accuracies were 85.65%, 85.45%, and 87.99%, respectively.
Dataset: DEAP
Yang et al. 2018 [65]
Emotion classes: Low/high arousal and low/high valence.
Model architecture:
• The authors designed a parallel model combining an RNN and a CNN.
• The signal was first fed into one dense layer and two stacked LSTM layers; the last hidden state was fed to another dense layer.
• The single vector at each time step was transformed into a 2D frame according to the positions of the electrodes, and the frame sequences were fed into 3 CNN layers.
• Both CNN and RNN outputs were concatenated for final decision-making.
Performance: Mean accuracies were 90.80% and 91.03% on valence and arousal classification tasks, respectively.
Dataset: DEAP
Hofmann et al. 2018 [19]
Emotion classes: Low/high arousal.
Model architecture:
• Raw signals were decomposed with either Spatio-Spectral Decomposition or Source Power Comodulation.
• Within each 1-s segment, the decomposed signal (250 samples) was fed to a 2-layer LSTM, and 2 dense layers were used for final classification.
Performance:
• The model with Spatio-Spectral Decomposition achieved 63.4% accuracy.
• The model with Source Power Comodulation achieved 62.3% accuracy.
Dataset: Self-collected EEG data from 45 participants.
Li, Yang et al. 2018 [62]
Emotion classes: Positive, neutral, and negative emotion.
Model architecture:
• Both the left and right hemispheres had 31 channels of EEG signals.
• The authors calculated DEᶜ in 9-s signals with a 1-s window and 5 frequency bands to form a time sequence.
• An LSTM was applied as a feature extractor for the sequence.
• A Domain Adversarial Neural Network was used on top of the LSTM.
Performance:
• In the subject-specific experiment, the accuracy was 92.38%.
• With leave-one-subject-out cross-validation, the accuracy was 83.28%.
Dataset: SEED
Li, Yang et al. 2018 [63]
Emotion classes: Positive, neutral, and negative emotion.
Model architecture: Proposed an extended version of [62] by adding a subject discriminator.
Performance: Achieved 84.14% accuracy.
Dataset: SEED
Zhang et al. 2018 [29]
Emotion classes: Positive, neutral, and negative emotion.
Model architecture:
• The authors extracted DEᶜ features in 1-s windows for all 62 electrodes.
• Within a single time step, an Elman RNN was applied to model the spatial dependency from 4 directions.
• The outputs of the spatial model at each time step were formed into sequences and fed into a bi-directional Elman RNN.
Performance: Achieved 89.5% accuracy.
Dataset: SEED
Xing et al. 2019 [28]
Emotion classes: Low/high arousal and low/high valence.
Model architecture:
• A stacked DNN autoencoder was applied to obtain the sequence representation, with the raw signal as input.
• A Hanning window was used to segment the sequence representation, and frequency-band power features and correlation coefficients were extracted in each window.
• The feature sequence was fed to a 1-layer LSTM, and all the hidden states were fed into a dense layer for final decision-making.
Performance: Mean accuracies were 81.10% and 74.38% on valence and arousal classification tasks, respectively.
Dataset: DEAP
Li, Xiang et al. 2020 [66]
Emotion classes: Positive and negative for SEED; low/high arousal and low/high valence for DEAP.
Model architecture:
• Different types of autoencoders (traditional, restricted Boltzmann machine, and variational) were first applied to obtain latent sequences.
• The latent sequence was fed into an LSTM, and 2 dense layers were used for final decision-making.
Performance:
• For DEAP, mean accuracies were 76.23% and 79.89% on valence and arousal classification tasks, respectively.
• For SEED, the accuracy was 85.81%.
Dataset: DEAP and SEED
ᵃ Valence, arousal, and dominance space is a dimensional representation of emotions. Valence ranges from unpleasant to pleasant; arousal ranges from calm to activated and can describe emotional intensity [67].

ᵇ SEED: Shanghai Jiao Tong University Emotion EEG Dataset, which contains EEG data from 15 subjects recorded with 62 channels placed according to the 10–20 system. SEED covers three categories of emotion (positive, neutral, and negative) [61]. DEAP: a Database for Emotion Analysis using Physiological signals, which contains EEG signals from 32 subjects collected with 32 channels according to the 10–20 system. Each participant rated the emotion in terms of arousal, valence, like/dislike, dominance, and familiarity [58].

ᶜ DE: differential entropy; PSD: power spectral density; CWT: continuous wavelet transform.
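Several of the entries above ([62], [29]) build their input sequences from per-window, per-band differential entropy (DE) features. As a minimal illustrative sketch (not any author's released code): DE is computed under the common Gaussian assumption, h = ½ ln(2πe·σ²), on band-limited 1-s windows. The sampling rate, band edges (delta through gamma, a typical SEED-style choice), and function names here are assumptions for illustration.

```python
import numpy as np

def differential_entropy(x):
    """DE of one window under the Gaussian assumption: 0.5 * ln(2*pi*e*var(x))."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x))

def de_features(eeg, fs=200, win_sec=1.0,
                bands=((1, 4), (4, 8), (8, 14), (14, 31), (31, 50))):
    """Per-window, per-channel, per-band DE features.

    eeg: array of shape (channels, samples).
    Returns an array of shape (windows, channels, bands),
    i.e. the kind of time sequence fed to an LSTM in [62]/[29].
    """
    win = int(fs * win_sec)
    n_win = eeg.shape[1] // win
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    feats = np.zeros((n_win, eeg.shape[0], len(bands)))
    for w in range(n_win):
        seg = eeg[:, w * win:(w + 1) * win]
        spec = np.fft.rfft(seg, axis=1)
        for b, (lo, hi) in enumerate(bands):
            # crude brick-wall band-pass in the frequency domain
            mask = (freqs >= lo) & (freqs < hi)
            band = np.fft.irfft(spec * mask, n=win, axis=1)
            for c in range(eeg.shape[0]):
                feats[w, c, b] = differential_entropy(band[c])
    return feats
```

For a 9-s recording at 200 Hz with 62 electrodes and 5 bands, this yields a (9, 62, 5) sequence: 9 time steps, each a 62×5 feature map, matching the "1-s window, 5 frequency bands" description in [62].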