Abstract
Electroencephalography (EEG)-based emotion brain-computer interfaces (BCIs) are a significant field within affective computing, where EEG signals provide a reliable and objective basis for recognizing emotional states. Despite these advancements, significant challenges persist, including individual differences in EEG signals across subjects during emotion recognition. To cope with this challenge, the current study introduces a cross-subject contrastive learning (CSCL) scheme for brain-region-aware EEG signal representation. The proposed scheme directly addresses generalization across subjects, which is a primary challenge in EEG-based emotion recognition. The CSCL scheme captures complex patterns effectively by employing emotion and stimulus contrastive losses within hyperbolic space. CSCL is designed primarily to learn representations that can effectively distinguish signals originating from different brain regions. Further, we evaluate the significance of the proposed CSCL scheme on four different datasets, namely SEED, CEED, FACED, and MPED, obtaining accuracies of 97.70%, 96.26%, 65.98%, and 51.30%, respectively. The experimental results show that the proposed CSCL scheme is highly effective in addressing the challenges of cross-subject variability and label noise in EEG-based emotion recognition systems.
Keywords: Ensemble learning, CNN, Deep learning, Artificial Intelligence
Subject terms: Energy science and technology, Engineering
Introduction
Emotions play a significant role in human cognition, behaviour, and social interaction, influencing how individuals perceive, interpret, and respond to their environment1. The ability to automatically recognize emotions is central to the advancement of affective brain-computer interfaces (BCIs), which aim to bridge the gap between human affect and machine intelligence2. Traditional emotion recognition systems rely on behavioural and visual cues such as facial expressions, speech patterns, and body language3–5. However, these methods are often susceptible to voluntary control and lack the ability to reflect subtle emotional states6. Electroencephalography (EEG), a non-invasive method for recording brain activity, has emerged as a powerful and objective tool for emotion recognition due to its high temporal resolution, low cost, and ability to capture spontaneous neural responses7. Unlike fMRI or MEG, EEG provides real-time monitoring of cognitive and emotional processes and is more feasible for portable and wearable applications8. As a result, EEG-based emotion recognition has gained significant traction in affective computing and neurotechnology, offering deeper insights into user affect and intent. Extracting discriminative features from EEG signals is a key step in enabling accurate emotion classification. Classical methods such as Power Spectral Density (PSD) and Differential Entropy (DE) features have been widely used to represent emotional states across different frequency bands, particularly Beta and Gamma, which are often linked to affective processes in temporal regions9–12. Advanced signal processing techniques like the Tunable Q-factor Wavelet Transform (TQWT)13 and intelligent feature selection algorithms like greedy-based Max-Relevance14 and Min-Redundancy (Greedy-mRMR)15 have further improved model performance by isolating relevant and non-redundant features. In parallel, deep learning models have been leveraged to automatically learn spatiotemporal dependencies in EEG data, bypassing the need for manual feature engineering. Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and Graph Neural Networks (GNNs) have shown promising results by capturing inter-channel relationships and temporal variations16–22. More recently, attention mechanisms and capsule networks have been introduced to enhance the representation of emotion-related features and to deal with the uncertainty in noisy EEG labels23.
Despite these advances, EEG-based emotion recognition remains fundamentally challenged by individual differences in EEG patterns across subjects. These inter-subject variations arise from anatomical, cognitive, and perceptual differences in emotional response, leading to poor generalisation of models trained on one subject to another. Collecting large volumes of subject-specific data to overcome this limitation is time-consuming, expensive, and often infeasible, limiting the scalability of practical emotion BCI systems24,25. Furthermore, label noise due to inconsistent emotional responses to the same stimuli further complicates training and reduces recognition accuracy. Variations in psychological states, mental fatigue, attention levels, and prior emotional context can further alter EEG signals, even under the same experimental setup. Additionally, hardware discrepancies, electrode placements, and environmental factors introduce noise and inconsistencies that degrade model robustness. As a result, models trained on data from one cohort often fail to generalize to new individuals or even to repeated sessions with the same subjects. This issue limits the deployment of real-world, user-independent emotion recognition systems, especially in dynamic environments. Therefore, enhancing cross-subject generalization without requiring exhaustive retraining remains an essential goal. Developing algorithms capable of learning invariant features that transcend individual differences is critical to pushing the boundaries of practical, scalable affective computing systems.
To overcome these challenges, this study introduces a novel Cross-Subject Contrastive Learning (CSCL) scheme that enables robust and discriminative EEG signal representations across individuals. Unlike traditional methods, CSCL directly addresses the cross-subject generalization challenge by leveraging contrastive learning with emotion and stimulus-based loss functions, embedded in a hyperbolic space. This geometric embedding enables the model to capture the hierarchical and complex relationships within EEG data, making it especially effective in differentiating signals arising from distinct brain regions and emotional stimuli. The framework is explicitly designed to learn invariant and subject-discriminative features that remain consistent despite inter-individual variability. Additionally, CSCL introduces a region-aware learning mechanism, where the model learns to focus on specific EEG channel combinations associated with distinct brain areas implicated in emotional processing. This mechanism enhances interpretability and strengthens the physiological relevance of the learned features. To validate the effectiveness and generalizability of the proposed CSCL framework, we conduct extensive experiments across five diverse EEG emotion datasets, including SEED, CEED, FACED, and MPED, each containing unique challenges such as varied sampling rates, emotional classes, and subject demographics. By leveraging the strengths of contrastive learning in learning discriminative and invariant representations, CSCL advances the field of EEG-based emotion recognition, pushing toward more scalable, adaptable, and user-independent affective computing systems.
In this study, our contribution is summarized as follows:
Propose a novel Cross-Subject Contrastive Learning (CSCL) scheme that enhances the accuracy of emotion recognition by minimizing the variability among the subjects effectively.
The proposed method leverages dual contrastive objectives, employing emotion and stimulus contrastive losses within hyperbolic space to capture complex patterns effectively. This mechanism captures both subject-specific characteristics and generalizable features. Unlike existing approaches that use only Euclidean space, our model embeds EEG features in a hyperbolic space to better model hierarchical relationships among emotional states.
A triple-path encoder integrating spatial, temporal, and frequency information enhances brain-region specificity.
Extensively evaluated the proposed scheme on a variety of datasets, including SEED, CEED, FACED, and MPED. The experimental results show that the proposed model is promising and effectively minimizes the impact of individual differences in emotion recognition.
The rest of the paper is organized as follows: the section "Literature review" presents the literature review, followed by comprehensive background studies. The section "Methodology" describes the proposed model in detail. The section "Experiments" presents the datasets used and the implementation details. The section "Results & discussion" reports and analyses the evaluation results. Finally, the section "Conclusion" concludes the study.
Literature review
This section details existing state-of-the-art methods for emotion detection, followed by comprehensive background on the technologies used in this study.
Contrastive Learning has emerged as a powerful self-supervised learning paradigm that enables models to learn discriminative representations by contrasting positive and negative sample pairs. Rather than relying solely on labelled data, contrastive learning pulls semantically similar instances closer in the representation space while pushing dissimilar instances apart26,27. This learning mechanism has shown remarkable success in a variety of domains such as computer vision, natural language processing, and speech analysis. Its ability to extract meaningful features from raw data without requiring extensive labels makes it especially attractive in fields with noisy or limited annotations. In the context of EEG-based emotion recognition, where label noise and subject variability present major challenges, contrastive learning offers a promising solution. It helps discover invariant features that are robust across individuals by learning consistent patterns associated with emotional responses. In the recent past, a variety of contrastive learning based solutions have been proposed for emotion recognition. For instance, Xiao et al.28 proposed a multi-source dynamic contrastive domain adaptation method (MS-DCDA) to address the signal variation issue in emotion recognition. Leveraging the domain knowledge of individual subjects, the proposed model obtained an effective trade-off between discriminability and transferability of the domain. With this mechanism, the authors attained 90% recognition accuracy on the SEED dataset.
Minghao et al.29 worked on spatiotemporal features for EEG emotion recognition as well as fuzzy emotion labelling, and proposed the FCLGCN approach. The proposed architecture simultaneously attends to time-domain and frequency-domain information flows, ensuring comprehensive feature learning. To further boost discriminative power, contrastive learning is incorporated, improving the separation between emotion-related EEG patterns. The model demonstrates competitive performance across benchmark EEG datasets, indicating its potential in affective computing and human–computer interaction. Beyond emotion recognition, the study also explores broader applications of EEG signals in motor imagery, brain disease diagnosis, and intelligent assistive technologies. The integration of graph neural networks is highlighted for future work in decoding physiological states from multimodal bio-signals like ECG and EMG. Despite its effectiveness, the proposed model faces challenges in optimizing performance across highly variable subject data. Additionally, further refinement is needed to enhance classification accuracy, especially under noisy or low-quality EEG conditions.
Fangze et al.30 emphasized the integration of textual, auditory, and visual cues for classifying emotional states. However, most existing models assume complete modality availability, which rarely aligns with real-world conversational data. Addressing this, recent work in Incomplete MERC (IMERC) leverages Graph Neural Networks (GNNs) to model modality gaps. Traditional GNNs, though effective, often struggle with over-smoothing and limited binary relational modelling. To overcome these challenges, the Spectral Domain Reconstruction GNN (SDR-GNN) has been proposed, which constructs speaker-context interaction graphs and employs spectral multi-frequency aggregation. This design enables the extraction of both high- and low-frequency features, improving semantic consistency and robustness in emotion classification. Multi-head attention further enhances feature fusion across modalities. Empirical results on multiple datasets confirm SDR-GNN’s superiority in handling incomplete multimodal data. The framework sets a strong benchmark in both accuracy and resilience under varying missing modality conditions. While SDR-GNN demonstrates strong performance, its reliance on carefully tuned spectral aggregation may limit generalizability across diverse datasets. Additionally, the model’s accuracy could benefit from more adaptive strategies for modality reconstruction under highly imbalanced or noisy data conditions.
Cross-subject emotion detection
Traditional machine learning approaches31–33 often fail to capture the diverse EEG patterns observed across subjects, resulting in reduced accuracy when applied to unseen subjects. Deep learning has made progress in feature extraction, but generalization across subjects remains limited. On the other hand, when applied in cross-subject scenarios, contrastive learning can reduce the impact of inter-subject variability by aligning representations of similar emotional states across different users34. Additionally, it can mitigate the influence of noisy labels by leveraging intrinsic relationships in the data rather than explicit supervision. Contrastive learning offers a compelling solution by focusing on learning invariant representations that transcend individual differences35. By contrasting EEG signals based on emotional similarity rather than subject identity, it helps the model emphasize emotion-relevant features while suppressing subject-specific noise. Integrating cross-subject learning with contrastive learning, several methods have been introduced recently. Mengting et al.36 addressed cross-subject EEG-based emotion recognition, which faces significant challenges due to individual variability and signal complexity. Their method aligns EEG features across spatial, temporal, and frequency domains. It uses contrastive loss to enhance class separation and similarity learning. A three-domain encoder extracts differential entropy features, followed by classification with a multilayer perceptron. The proposed scheme achieved strong results on multiple public datasets and generalizes well to unseen subjects. Its robustness highlights the potential for real-world emotion recognition applications. However, variability in EEG signal characteristics across datasets complicates preprocessing and may lead to information loss, affecting classification accuracy and generalization. Similarly, Zhu et al.37 worked on multi-modal emotion recognition, which has gained prominence in human–computer interaction due to its ability to leverage complementary signals from multiple modalities. Despite this potential, distributional shifts across subjects and modality heterogeneity hinder consistent performance. To address these challenges, the authors proposed a novel method that jointly learns subject-invariant and shared features from EEG and eye movement data. The approach employs dynamic adversarial domain adaptation to align cross-subject distributions by adaptively selecting source domains. It further captures intra- and inter-modal emotional cues through the integration of self-attention and cross-attention mechanisms. To enhance semantic consistency and reduce modality divergence, two contrastive loss functions are introduced. These strategies yield robust and complementary emotional representations. Experiments conducted on multi-modal datasets confirm the framework's superiority over existing emotion recognition methods. Although the model performs well, its reliance on carefully synchronized multi-modal inputs may limit scalability. Moreover, improving generalization under highly noisy or imbalanced modality conditions remains an open challenge. Sujatha et al.38 introduced a deep learning-based AER model using four publicly available speech datasets: URDU, TESS, AESDD, and BAVED. Leveraging the Librosa library for audio feature extraction, the system is implemented in Jupyter Notebook. Experimental evaluations demonstrate that the proposed model outperforms existing methods on all datasets.
Notably, the model achieves exceptionally high accuracy on the URDU and TESS datasets, showcasing its effectiveness in cross-lingual emotion recognition. However, the model shows performance variance across datasets, with lower accuracy on AESDD and BAVED, indicating sensitivity to dataset-specific features. Further enhancement is needed for generalizing the model to underrepresented languages and emotional expressions in real-world scenarios. Zhong et al.39 recently published their work on emotion recognition (ER) using EEG signals. According to the authors, ER faces major challenges due to label scarcity and distribution differences across individuals and sessions. To tackle these, they proposed a novel unsupervised domain adaptation method with pseudo-label propagation (DAPLP). It aligns global distributions between domains and selects reliable pseudo-labels to guide label propagation. The propagation process is further refined using correction and smoothing techniques. DAPLP demonstrates strong performance on the SEED, SEED-IV, and SEED-V datasets in both cross-subject and cross-session scenarios. The approach enhances generalization without requiring labeled target data. Future work will explore DAPLP's robustness in noisy environments. However, DAPLP still struggles to fully capture the complex brain-region-specific patterns and does not leverage contrastive learning for better representation, both of which are effectively addressed in our proposed CSCL scheme.
Despite recent advancements, existing EEG-based emotion recognition models struggle with cross-subject variability and label noise, leading to poor generalization across individuals. Traditional contrastive learning approaches often fail to effectively capture the complex brain region-specific patterns critical for emotion differentiation. Moreover, most current methods rely on Euclidean space representations, limiting their ability to model hierarchical emotional relationships. Few studies exploit hyperbolic space, which is naturally suited for embedding such structured data. Additionally, current solutions lack robust mechanisms to jointly leverage emotional and stimulus-driven contrasts. Our proposed CSCL framework addresses these gaps by incorporating dual contrastive losses in hyperbolic space to learn invariant, brain-region-discriminative representations across subjects.
Methodology
This section presents the proposed cross-subject contrastive learning (CSCL) scheme for brain-region-aware EEG signal representation. CSCL operates in two phases: (1) a contrastive learning phase and (2) a prediction phase. The detail of each phase is shown in Fig. 1. The primary flow of the contrastive learning phase includes the data generator source, three domain encoders, feature extraction, the representation projector, the loss objective, and domain adaptation. On the other hand, the prediction phase includes feature extraction, training/testing data splitting, results evaluation, regrouping, an MLP classifier, final evaluation, and the emotion classifier.
Fig. 1.
Block diagram of the CSCL model showing the contrastive learning phase (left) and prediction phase (right), highlighting the data flow across domain encoders, projection, domain adaptation, and classification.
Phase I. The proposed CSCL scheme begins at the data generator, which divides the input data into positive/negative sample pairs. To stabilize the training process, the input data is passed through three domain-specific encoders (spatial convolutional, temporal convolutional, and frequency convolutional), with batch normalization applied at the start and end. The spatial encoder captures relationships across the different electrode channels placed on the scalp; these spatial patterns often reflect functional connectivity between brain regions. The temporal encoder focuses on the sequential dynamics of brain activity over time, capturing how emotions evolve during a stimulus. The frequency encoder extracts spectral features (such as alpha, beta, and gamma bands), which are known to correlate with different emotional and cognitive states. By learning from all three domains, the encoder generates a comprehensive and robust feature representation for each EEG sample. Each encoder outputs an intermediate representation, denoted $H_i$ for the $i$-th input. These representations capture spatial, temporal, and frequency-specific information from EEG signals. The collected information across multiple subjects is therefore represented as $H = \{H_1, H_2, H_3, \ldots, H_n\}$.

In the next stage, the representation projector module transforms the rich but high-dimensional encoded features $H$ into a more compact and discriminative representation $Z$ suitable for downstream tasks. It refines and compresses the features extracted from the spatial, temporal, and frequency pathways. These multi-domain features are first combined to form a unified representation. The fused representation is then passed through an average pooling layer, which reduces dimensionality by summarizing feature responses. The pooled output is fed into one or more fully connected (FC) layers to learn deeper, abstract patterns. Following this, normalization ensures feature stability, while an activation function introduces non-linearity for better generalization. The final output is a compact latent vector that captures subject-specific information effectively. This vector, denoted $Z_i$, becomes the standardized input for contrastive learning or classification.

Contrastive loss: Contrastive learning aims to learn a representation space where similar data points (such as EEG signals representing the same emotion) are pulled closer together, while dissimilar data points (e.g., different emotions) are pushed farther apart. In our model, each training example is paired with another example from the same or a different class to form positive or negative pairs. This learning objective helps the model discover discriminative features without requiring large quantities of labelled data, thereby improving generalization across subjects. The model receives both the original and augmented versions of each subject's feature representation; these pairs, denoted $Z_i$ and $Z_i'$, are used to calculate a contrastive loss. The goal is to increase the similarity between embeddings from the same subject to preserve identity-consistent information, while simultaneously reducing the similarity between embeddings from different subjects to learn distinctiveness. This process helps the network capture patterns that are invariant across individuals, despite variations in EEG signals. The contrastive objective promotes robust feature learning without relying solely on labels. As a result, the system generalizes better to unseen subjects during emotion recognition.
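For concreteness, the following is a minimal sketch of one such contrastive term over the projected pairs $Z_i$ and $Z_i'$, written here as an NT-Xent-style loss with cosine similarity in Euclidean space; the temperature value and embedding size are illustrative assumptions, and the emotion/stimulus losses used in CSCL instead compare representations with a hyperbolic distance (see the projector sketch later in this section).

```python
# Minimal sketch of an NT-Xent-style contrastive term over projected embeddings.
# Cosine similarity in Euclidean space and tau = 0.1 are illustrative choices;
# the CSCL losses replace this similarity with a hyperbolic (Poincare) distance.
import torch
import torch.nn.functional as F

def contrastive_loss(z: torch.Tensor, z_prime: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z, z_prime: (N, d) original and augmented views; positives share a row index."""
    z = F.normalize(z, dim=1)
    z_prime = F.normalize(z_prime, dim=1)
    logits = z @ z_prime.t() / tau              # (N, N) similarity matrix
    targets = torch.arange(z.size(0))           # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# The emotion and stimulus losses share this form but define positives differently
# (same emotion label vs. same stimulus), as described in the text above.
z_i, z_i_prime = torch.randn(8, 128), torch.randn(8, 128)
loss = contrastive_loss(z_i, z_i_prime)
```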
In EEG-based emotion recognition, signals from different subjects often exhibit substantial variability, known as domain shift. A domain-adversarial neural network (DANN) is a powerful mechanism that reduces this discrepancy by making the learned features domain-invariant. It does so by including a domain discriminator network that tries to distinguish between source and target domain samples, while the encoder learns to generate representations that confuse this discriminator. This adversarial setup encourages the encoder to produce features that are useful for classification yet invariant to subject-specific noise or bias. To address distributional discrepancies in cross-subject EEG data, we adopt the DANN framework. Let $X_s$ denote the source-domain EEG signals with corresponding emotion labels, and $X_t$ the unlabeled target-domain signals. The architecture comprises three components: a feature extractor $G_f$, a label predictor $G_y$, and a domain discriminator $G_d$. EEG data $x \in X_s \cup X_t$ is input to $G_f$, which maps it to a latent representation $f = G_f(x)$. The label predictor $G_y$ then maps $f$ to class probabilities $\hat{y} = G_y(f)$ for emotion classification. Concurrently, $G_d$ attempts to classify the domain (source vs. target) of $f$, while $G_f$ is trained to minimize the classification loss $\mathcal{L}_y$ and maximize domain confusion by fooling $G_d$. This is achieved via a gradient reversal layer (GRL), which negates gradients from $G_d$ during backpropagation, encouraging domain-invariant feature learning. The domain loss $\mathcal{L}_d$ and classification loss $\mathcal{L}_y$ are jointly optimized using the total objective given in Eq. 1.

$$\mathcal{L}_{total}(G_f, G_y, G_d) = \mathcal{L}_y\big(G_y(G_f(x)),\, y\big) - \lambda\,\mathcal{L}_d\big(G_d(G_f(x)),\, d\big) \tag{1}$$
where λ controls the trade-off between task performance and domain alignment. Through iterative training, Gf learns representations that are both discriminative for emotions and invariant to subject identity.
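A minimal PyTorch sketch of the GRL and the two heads described above is shown below; the feature dimension, number of classes/domains, and single-layer heads are illustrative assumptions rather than the exact CSCL configuration.

```python
# Sketch of a gradient reversal layer (GRL) and DANN-style heads.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)                      # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None      # negated, scaled gradient flows back

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

class DANNHeads(nn.Module):
    """Label predictor Gy and domain discriminator Gd on top of features f = Gf(x)."""
    def __init__(self, feat_dim=128, n_classes=3, n_domains=2):
        super().__init__()
        self.label_predictor = nn.Linear(feat_dim, n_classes)        # Gy
        self.domain_discriminator = nn.Linear(feat_dim, n_domains)   # Gd

    def forward(self, f, lam=1.0):
        y_logits = self.label_predictor(f)
        d_logits = self.domain_discriminator(grad_reverse(f, lam))   # GRL in front of Gd
        return y_logits, d_logits
```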
Data Generator. The initial stage involves acquiring EEG signals $S = \{s_i\}_{i=1}^{N}$, where each $s_i \in \mathbb{R}^{C \times T}$, with $C$ being the number of EEG channels and $T$ the number of time samples. Data is collected across multiple subjects under diverse emotional stimuli, representing variable brain activity. This component introduces domain variability, mimicking real-world inter-subject and intra-subject differences. Signals are standardized to remove artefacts and baseline drift using preprocessing techniques such as band-pass filtering and z-score normalization. The generator thus supplies a rich and heterogeneous dataset for robust cross-subject modeling.
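The snippet below sketches this preprocessing step on a single EEG segment; the 0.5–47 Hz pass band, filter order, and segment size are illustrative assumptions, not the exact settings used here.

```python
# Sketch of band-pass filtering followed by per-channel z-score normalization.
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(x: np.ndarray, fs: float, low: float = 0.5, high: float = 47.0, order: int = 4):
    """x: (channels, time) EEG segment sampled at fs Hz."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x, axis=-1)

def zscore(x: np.ndarray):
    return (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + 1e-8)

eeg = np.random.randn(62, 200 * 4)          # 62 channels, 4 s at 200 Hz (illustrative)
clean = zscore(bandpass(eeg, fs=200))
```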
Three-domain encoder. The three-domain encoder $G_f$ is a neural model trained to encode an input $s_i$ into a compact representation $h_i = G_f(s_i) \in \mathbb{R}^{d}$. It leverages shared feature structures among domains (subjects/tasks) to generalize representations while suppressing domain-specific variations. Architecturally, it may consist of convolutional and recurrent blocks to model spatial and temporal patterns simultaneously. The encoder supports downstream generalization by learning invariant embeddings across source, auxiliary, and target domains. This forms the backbone for achieving domain-agnostic emotional representations.
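A compact sketch of such a triple-path encoder is given below; the layer widths, kernel sizes, and the assumption that the frequency path consumes per-channel band features (e.g., differential entropy in five bands) are illustrative choices rather than the exact CSCL architecture.

```python
# Sketch of a three-domain (spatial / temporal / frequency) encoder for EEG.
import torch
import torch.nn as nn

class ThreeDomainEncoder(nn.Module):
    def __init__(self, n_channels=62, n_bands=5, d_out=128):
        super().__init__()
        # Spatial path: 1x1 convolution mixes electrode channels.
        self.spatial = nn.Sequential(nn.Conv1d(n_channels, 64, kernel_size=1),
                                     nn.BatchNorm1d(64), nn.ReLU())
        # Temporal path: wide kernels capture dynamics over time.
        self.temporal = nn.Sequential(nn.Conv1d(n_channels, 64, kernel_size=25, padding=12),
                                      nn.BatchNorm1d(64), nn.ReLU())
        # Frequency path: operates on per-channel band features (e.g., DE per band).
        self.frequency = nn.Sequential(nn.Linear(n_channels * n_bands, 64), nn.ReLU())
        self.fuse = nn.Linear(64 * 3, d_out)

    def forward(self, x, band_feats):
        # x: (batch, channels, time); band_feats: (batch, channels * bands)
        hs = self.spatial(x).mean(dim=-1)     # global average pooling over time
        ht = self.temporal(x).mean(dim=-1)
        hf = self.frequency(band_feats)
        return self.fuse(torch.cat([hs, ht, hf], dim=1))

enc = ThreeDomainEncoder()
h = enc(torch.randn(4, 62, 800), torch.randn(4, 62 * 5))   # -> (4, 128)
```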
Feature extraction. Feature extraction is carried out by applying the trained encoder $G_f$ (with parameters $\theta$) to EEG signals to derive high-level abstractions $z_i$. These feature vectors are optimized to encode emotional relevance while being insensitive to domain shifts. The extracted features are then normalized and transformed into structured vectors, preserving semantic relationships. Techniques like temporal slicing or channel attention can be integrated for enhanced discriminatory power. The result is a unified feature space conducive to inter-domain clustering and classification.
Representation projector. Traditional deep learning models operate in Euclidean space, which is limited in capturing complex hierarchical relationships between data points. Hyperbolic space, by contrast, offers better capacity for modelling such hierarchies, especially when representing structured relationships like emotional states that may exhibit tree-like or layered structure. Embedding EEG features in hyperbolic space allows our model to better separate emotional classes while preserving underlying similarities among them. The projector component $P(\cdot)$ maps feature vectors $z$ into a contrastive subspace $z' = P(z)$, enhancing the separability of emotion categories. It aligns similar samples (e.g., same emotion, different subjects) while maximizing the distance between dissimilar ones. Often implemented as a few fully connected layers with batch normalization, this module aids in learning better margins between classes. Projected representations are used in the contrastive learning loss to promote inter-class dispersion. This significantly improves robustness against individual variability in EEG signals. In our CSCL framework, contrastive learning is tightly coupled with the neural network architecture. EEG signals are passed through the three-domain encoder and the representation projector, both composed of neural layers, to generate latent embeddings. The contrastive loss is then computed on these neural representations ($Z$) to bring semantically similar samples closer and push dissimilar ones apart. This loss is backpropagated through the neural network along with the classification and domain-adversarial losses.
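To illustrate the hyperbolic embedding described above, the sketch below maps Euclidean projector outputs onto a Poincaré ball and computes the geodesic distance used to compare representations; the unit curvature and the clipping constants are illustrative assumptions rather than the exact CSCL settings.

```python
# Sketch of projecting embeddings onto a Poincare ball and measuring distance.
import torch

def expmap0(v: torch.Tensor, c: float = 1.0, eps: float = 1e-5) -> torch.Tensor:
    """Exponential map at the origin of the Poincare ball with curvature c."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(c ** 0.5 * norm) * v / (c ** 0.5 * norm)

def poincare_dist(x: torch.Tensor, y: torch.Tensor, c: float = 1.0) -> torch.Tensor:
    """Geodesic distance between points already inside the ball."""
    diff2 = ((x - y) ** 2).sum(-1)
    denom = (1 - c * (x ** 2).sum(-1)) * (1 - c * (y ** 2).sum(-1))
    return torch.acosh(1 + 2 * c * diff2 / denom.clamp_min(1e-5)) / c ** 0.5

z = torch.randn(4, 128) * 0.1
z_ball = expmap0(z)                        # projected representations z'
d = poincare_dist(z_ball[:1], z_ball[1:])  # distances used by the contrastive terms
```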
All components (encoder, projector, and classifier) are optimized jointly using stochastic gradient descent. This integration enables the model to learn both discriminative and domain-invariant features directly through neural network training. Further, the training is guided by a composite objective as
$$\mathcal{L} = \mathcal{L}_{cls} + \alpha\,\mathcal{L}_{dom} + \beta\,\mathcal{L}_{con} \tag{2}$$

where $\mathcal{L}_{cls}$ is the cross-entropy classification loss, $\mathcal{L}_{dom}$ is the adversarial domain loss, and $\mathcal{L}_{con}$ is the contrastive loss. The hyperparameters $\alpha$ and $\beta$ control the influence of the auxiliary learning signals. This joint loss encourages discriminative yet generalizable features. It harmonizes the dual goals of classification accuracy and domain generalization.
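The following self-contained sketch shows one optimization step over a composite objective of the form in Eq. (2); the stand-in linear modules, the 310-dimensional input (e.g., 62 channels × 5 DE bands), and the weights alpha and beta are illustrative assumptions, and a plain domain cross-entropy stands in for the GRL-based adversarial term sketched earlier.

```python
# Sketch of one training step over L = L_cls + alpha * L_dom + beta * L_con.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for the encoder, projector, classifier, and domain discriminator.
encoder = nn.Linear(310, 128)
projector = nn.Linear(128, 64)
classifier = nn.Linear(128, 3)
discriminator = nn.Linear(128, 2)
params = [*encoder.parameters(), *projector.parameters(),
          *classifier.parameters(), *discriminator.parameters()]
optimizer = torch.optim.Adam(params, lr=1e-3)

def nt_xent(z1, z2, tau=0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    return F.cross_entropy(z1 @ z2.t() / tau, torch.arange(z1.size(0)))

x1, x2 = torch.randn(8, 310), torch.randn(8, 310)       # two views of a batch
y, dom = torch.randint(0, 3, (8,)), torch.randint(0, 2, (8,))
alpha, beta = 0.1, 0.5                                   # illustrative loss weights

h1, h2 = encoder(x1), encoder(x2)
loss = (F.cross_entropy(classifier(h1), y)                   # L_cls
        + alpha * F.cross_entropy(discriminator(h1), dom)    # L_dom (GRL omitted here)
        + beta * nt_xent(projector(h1), projector(h2)))      # L_con
optimizer.zero_grad()
loss.backward()
optimizer.step()
```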
Domain adaptation. In our proposed model, the domain discriminator D tries to classify the domain (source, auxiliary, or target) of encoded features, while the encoder Gf tries to fool it, forming a min–max adversarial game:
$$\min_{G_f}\;\max_{D}\;\; \mathbb{E}_{x \sim X_s \cup X_a \cup X_t}\big[\log D_{d(x)}\big(G_f(x)\big)\big] \tag{3}$$

where $d(x)$ denotes the domain label (source, auxiliary, or target) of sample $x$ and $D_{d(x)}(\cdot)$ the discriminator's predicted probability for that domain.
This encourages Gf to produce domain-invariant embeddings. The training alternates between updating D to maximize domain discrimination and Gf to minimize it. Gradient reversal layers (GRL) are often used to facilitate adversarial optimization. This stage ensures that emotion-specific signals are preserved while subject-specific noise is attenuated.
A feedback loop integrates performance evaluation to iteratively enhance training through parameter updates. This process enables the model to generalize learned emotion features across varying distributions and subject-specific noise. Domain adaptation facilitates minimizing the shift between the training and testing distributions, $P_{train}(x) \neq P_{test}(x)$. Emotion outputs span categories such as happiness, neutrality, and sadness, denoted $Y = \{y_1, y_2, y_3\}$. Ultimately, the system maintains classification fidelity under inter-subject variability and supports transfer learning from source to target domains. This stage ensures robustness of inference under real-world conditions, where subject-specific calibration is often unavailable. Unlike traditional pipelines, Algorithm 1 integrates contrastive learning, adversarial domain adaptation, and classification in a unified iterative procedure to ensure subject-invariant and emotion-discriminative representation learning. The workflow of the proposed CSCL framework is presented in Algorithm 1 as follows.
Algorithm 1.
Cross-subject contrastive learning (CSCL) scheme for EEG representation for emotion recognition.
Feature extraction. Post-adaptation, the encoder Gf is reused (frozen or fine-tuned) to extract features from unseen EEG samples. This produces domain-invariant feature vectors. The features serve as consistent input to subsequent classification steps. This reuse of pre-trained models leverages knowledge transfer and minimizes the need for large annotated datasets. Additionally, visualization techniques such as t-SNE or PCA can be applied to verify feature-space coherence.
Dataset splitting. The entire dataset is divided into mutually exclusive sets by following the methods presented in Eqs. (4, 5).
$$D_{train} = \{(x_i, y_i)\}_{i=1}^{N_{tr}} \tag{4}$$

$$D_{test} = \{(x_j, y_j)\}_{j=1}^{N_{te}}, \qquad D_{train} \cap D_{test} = \varnothing \tag{5}$$

where $y_i \in Y$ denotes emotion labels. Techniques such as LOSO and stratified sampling are applied to simulate real-world deployment. Care is taken to avoid data leakage and to ensure subject-independent evaluation. This design choice tests the model's ability to generalize to unseen individuals.
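As an example of subject-independent splitting, the snippet below uses scikit-learn's LeaveOneGroupOut with per-sample subject identifiers; the array shapes are illustrative.

```python
# Sketch of leave-one-subject-out (LOSO) splitting with subject identifiers.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

X = np.random.randn(60, 310)             # 60 samples, 310 features (illustrative)
y = np.random.randint(0, 3, 60)          # emotion labels
subjects = np.repeat(np.arange(15), 4)   # 15 subjects, 4 samples each

for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # train on 14 subjects, evaluate on the held-out subject
```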
Results evaluation. Model predictions $\hat{y}_i = G_y(z_i)$ are benchmarked using evaluation metrics: accuracy, precision, recall, and F1-score. Confusion matrices and ROC curves may also be generated to understand classification tendencies. Evaluations are averaged over multiple runs or folds to obtain statistically significant results. This stage validates the effectiveness of the domain adaptation and feature projection mechanisms. If performance drops, model parameters and the feature space are re-tuned.
Regrouping. Post-classification, results are grouped by emotion classes Y = {happy, neutral, sad} to assess inter-class distinction. Analysis includes intra-class variance, class imbalance effects, and misclassification trends. Re-grouped results also aid in emotion-wise model calibration. Visualization of feature clusters helps verify the separation between emotional states. This provides interpretability and feedback for improving earlier phases.
Multi-layer perceptron (MLP) classifier. An MLP classifier $G_y$ is trained on $D_{train}$ with input dimension $d$ and output dimension $k = |Y|$. It uses one or more hidden layers with ReLU activation and dropout for regularization. The network computes its output following Eq. 6.

$$\hat{y} = \mathrm{softmax}\big(W_2\,\sigma(W_1 z + b_1) + b_2\big) \tag{6}$$
where σ is a non-linear activation and W, b are learned parameters. The MLP learns to associate invariant EEG features with emotional classes robustly.
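A minimal PyTorch sketch of such a classifier follows; the hidden width and dropout rate are illustrative assumptions, and the softmax of Eq. (6) is folded into the cross-entropy loss as is conventional.

```python
# Sketch of the MLP classifier of Eq. (6): hidden layer with ReLU and dropout.
import torch.nn as nn

class MLPClassifier(nn.Module):
    def __init__(self, d_in=128, d_hidden=64, k=3, p_drop=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(d_hidden, k))   # logits; softmax is applied inside the loss

    def forward(self, z):
        return self.net(z)
```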
Experiments
EEG datasets
In this study, we used four different EEG signal-based datasets, including SEED, CEED, FACED, and MPED. A summary of each dataset is presented in Table 1. Further, comprehensive details of each dataset are given as follows.
Table 1.
Summary of datasets used for the evaluation of proposed scheme.
| Attributes | SEED | CEED | FACED | MPED |
|---|---|---|---|---|
| Emotion classes | 3 | 5 | 9 | 7 |
| Stimulation type | Video (Chinese film clips) | Video (Emotion-eliciting clips) | Video | Video (Emotion-eliciting clips) |
| Sample rate | 200 Hz | 200 Hz | 201 Hz | 203 Hz |
| No. of videos | 15 | 15 | 28 | 30 |
| No. of subjects | 15 | 15 | 123 | 30 |
| No. of channels | 62 | 63 | 32 | 62 |
| Filtering frequency | 0 to 75 Hz | 1000 Hz | 1000 Hz | 1000 Hz |
SEED dataset comprises EEG recordings from 15 individuals who were exposed to emotional stimuli categorized as positive, neutral, or negative40. To evoke these emotional states, each participant watched 15 videos specifically curated to trigger one of the three emotional responses. This experimental design results in a comprehensive dataset suitable for studying the impact of emotional states on EEG patterns across subjects. Each video has a duration of approximately four minutes, and participants took part in three separate sessions spaced at least a week apart. EEG signals were captured using 62 electrodes at an initial sampling rate of 1000 Hz, which was later reduced to 200 Hz. To enhance signal quality, the data underwent bandpass filtering within a 0–75 Hz frequency range.
CEED A total of 50 university students participated in the study, comprising 40 males and 10 females, with an average age of 21.9 years. All participants provided informed consent before the experiment. The study involved the presentation of 15 video clips, each approximately three minutes in length, selected to evoke one of three emotional categories: positive, negative, or neutral. Participants began the experiment by pressing a key after reading the instructions41. The experimental session was divided into five parts, with each segment initiated by the participant and followed by a 30-s resting interval after each video. Each trial consisted of a 5-s cue followed by an emotion-inducing video, with three clips shown for each emotional category, as depicted in Fig. 2. Upon completion of all videos, participants filled out a questionnaire to assess their emotional experiences.
Fig. 2.
Experimental workflow describing the trial order for emotion recognition.
EEG data were collected using the Brain Vision actiChamp-Plus system and a 64-channel actiCAP, configured according to the international 10–20 electrode placement system, excluding the reference electrode. This setup yielded signals from 63 channels, sampled at 1000 Hz.
The FACED dataset developed by Tsinghua University, comprises EEG recordings from 123 healthy university students (including 75 females), aged between 17 and 38 years, with an average age of 23.242. During the experiment, participants were shown 28 video clips designed to evoke emotional responses across nine categories: four positive (amusement, inspiration, joy, tenderness), four negative (anger, fear, disgust, sadness), and one neutral. To capture the most intense emotional reactions, EEG data from the last 30 s of each video clip were utilized.
The MPED dataset, developed by Professor Zheng Wenming's team at Southeast University, is available in two formats: a multimodal version involving 23 participants and an EEG-only version comprising 30 participants. The initial 23 subjects are common to both versions, while the remaining 7 in the EEG-only set have data limited to EEG recordings without associated multimodal inputs. The 23-subject dataset is primarily used for multimodal emotion research, whereas the 30-subject version is frequently employed in EEG-based cross-subject emotion recognition studies43. In this study, we utilized the 30-subject EEG-only version, in which each participant underwent a single experimental session. During the session, seven emotional states (joy, humour, neutral, sadness, fear, disgust, and anger) were induced using four video clips per emotion, resulting in 28 trials per session. Each trial consisted of 120 EEG samples, amounting to a total of 3360 samples per subject.
Preprocessing
We followed the preprocessing approach outlined in CLISA43, which involved down-sampling the CEED, SEED, FACED, and MPED datasets to 200 Hz. To prepare the data for training, we employed stratified normalization44, applied along the channel dimension for each subject. This method preserves the relative distribution of EEG signals across channels, which is essential for maintaining signal fidelity and improving the model's cross-subject generalization capabilities. In our experiments, we used the SEED dataset for a three-class classification task, the MPED dataset for a seven-class task, the CEED dataset for a three-class task, and the FACED dataset for a nine-class task. These datasets were selected to cover a wide range of emotional categories and classification challenges, offering a robust assessment of the model's performance in cross-subject EEG-based emotion recognition. The experimental protocol closely mirrored the configurations described in CLISA to ensure consistency and comparability with previous research. The detailed description of the dataset splitting is given in Table 2.
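A simple per-subject, per-channel normalization consistent with the description above is sketched below; the exact stratified-normalization procedure of Ref. 44 may differ in detail, and the array shape is illustrative.

```python
# Sketch of per-subject, per-channel normalization of DE features.
import numpy as np

def normalize_per_subject(features: np.ndarray) -> np.ndarray:
    """features: (trials, channels, bands) array for one subject."""
    mean = features.mean(axis=(0, 2), keepdims=True)        # statistics per channel
    std = features.std(axis=(0, 2), keepdims=True) + 1e-8
    return (features - mean) / std

subject_feats = np.random.randn(45, 62, 5)   # e.g., 45 trials, 62 channels, 5 bands
subject_feats = normalize_per_subject(subject_feats)
```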
Table 2.
Subject-wise tenfold cross-validation splitting scheme for multi-class emotion classification across four EEG datasets.
| Dataset | Classes | Fold | Training (subjects) | Validation (subjects) |
|---|---|---|---|---|
| SEED | Positive, Neutral, Negative | 1 | 1–11 | 12–13 |
| | | 2 | 2–12 | 13–14 |
| | | 3 | 3–13 | 14–15 |
| | | 4 | 4–14 | 15–1 |
| | | 5 | 5–15 | 1–2 |
| | | 6 | 6–1 | 2–3 |
| | | 7 | 7–2 | 3–4 |
| | | 8 | 8–3 | 4–5 |
| | | 9 | 9–4 | 5–6 |
| | | 10 | 10–5 | 6–7 |
| MPED | Joy, Funny, Neutral, Sad, Fear, Disgust, Anger | 1 | 1–26 | 27–29 |
| | | 2 | 2–27 | 28–30 |
| | | 3 | 3–28 | 29–1 |
| | | 4 | 4–29 | 30–2 |
| | | 5 | 5–30 | 1–3 |
| | | 6 | 6–1 | 2–4 |
| | | 7 | 7–2 | 3–5 |
| | | 8 | 8–3 | 4–6 |
| | | 9 | 9–4 | 5–7 |
| | | 10 | 10–5 | 6–8 |
| FACED | Amusement, Inspiration, Joy, Tenderness, Anger, Fear, Disgust, Sadness, Neutral | 1 | 1–110 | 111–123 |
| | | 2 | 2–111 | 112–1 |
| | | 3 | 3–112 | 113–2 |
| | | 4 | 4–113 | 114–3 |
| | | 5 | 5–114 | 115–4 |
| | | 6 | 6–115 | 116–5 |
| | | 7 | 7–116 | 117–6 |
| | | 8 | 8–117 | 118–7 |
| | | 9 | 9–118 | 119–8 |
| | | 10 | 10–119 | 120–9 |
| CEED | Positive, Neutral, Negative | 1 | 1–12 | 13–15 |
| | | 2 | 2–13 | 14–1 |
| | | 3 | 3–14 | 15–2 |
| | | 4 | 4–15 | 1–3 |
| | | 5 | 5–1 | 2–4 |
| | | 6 | 6–2 | 3–5 |
| | | 7 | 7–3 | 4–6 |
| | | 8 | 8–4 | 5–7 |
| | | 9 | 9–5 | 6–8 |
| | | 10 | 10–6 | 7–9 |
Pre-train and classification
Our methodology is structured into two key phases: pre-training and classification. During the pre-training stage, we utilize the proposed CSCL framework to enable the model to learn robust, subject-invariant representations of EEG signals. This phase is designed to train the model on large-scale datasets, allowing it to capture multi-scale, hierarchical features that remain consistent across individuals. The objective is to ensure that EEG signals from the same subject under identical stimuli are embedded closely, while those from different subjects or stimuli exhibit greater separation. Further in the classification stage, the features generated by the pre-trained encoder are fed into a Multi-Layer Perceptron (MLP)45 for emotional state prediction.
Here, the encoder acts solely as a fixed feature extractor, with its parameters frozen during fine-tuning. The MLP is trained to map these fixed features to the corresponding emotion labels. To evaluate model performance and ensure its ability to generalize, we apply Leave-One-Subject-Out (LOSO)46 validation along with cross-validation. LOSO involves training on all participants except one, who is used for testing, thereby assessing generalization to unseen individuals. Cross-validation further verifies robustness across multiple data partitions, particularly for larger datasets. These evaluation strategies help prevent overfitting and confirm the model's effectiveness across a wide range of subjects.
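The classification stage can be summarized by the following sketch, in which a stand-in encoder is frozen and only the MLP head is trained; the module sizes and single gradient step are illustrative.

```python
# Sketch of the classification stage: frozen encoder + trainable MLP head.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Linear(310, 128)        # stand-in for the pre-trained CSCL encoder
mlp = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 3))

for p in encoder.parameters():       # encoder acts as a fixed feature extractor
    p.requires_grad = False

optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)
x, y = torch.randn(8, 310), torch.randint(0, 3, (8,))
with torch.no_grad():
    feats = encoder(x)               # frozen features
loss = F.cross_entropy(mlp(feats), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```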
Results & discussion
Comparison methods
This section presents the evaluated results of our proposed scheme and a comparison with three recent leading cross-subject models proposed in37,38, and39.
CSMM (2025) targets accurate emotion recognition through a cross-subject multi-modal implementation that jointly learns from EEG signals and eye features. The authors evaluated CSMM on different versions of the SEED dataset37.
DNN_AER (2024) proposed a deep neural network-based automatic emotion recognition scheme, evaluated on a variety of datasets including BAVED, TESS, CEED, and SEED38.
DAPLP (2025) adopts pseudo-label propagation and an unsupervised domain adaptation mechanism over EEG signals for emotion recognition. DAPLP was evaluated on SEED and its different variants39.
Further, we compare the accuracy of our proposed CSCL scheme with these existing state-of-the-art approaches, which indicates the promising performance and robustness of CSCL in emotion recognition. The existing schemes, CSMM, DNN_AER, and DAPLP, aim to enhance generalization by learning features that remain consistent across different subjects, effectively addressing the core issue of inter-subject variability in cross-subject emotion recognition. CSMM combines dynamic adversarial domain adaptation with self-attention and cross-attention mechanisms to model the complex inter-subject relationships across EEG and eye-movement signals. DNN_AER leverages a deep neural network framework, making it well-suited for capturing the nonlinear and intricate patterns in emotional data. DAPLP, on the other hand, applies pseudo-label propagation with unsupervised domain adaptation to improve feature representations across subjects and sessions. For evaluation, we adopted standard validation procedures: leave-one-subject-out (LOSO) and tenfold cross-validation for the SEED, CEED, FACED, and MPED datasets, ensuring consistency with prior work and enabling fair comparisons. Tables 3, 4, 5 and 6 outline the accuracy and standard deviation achieved by each method on each dataset. The results of our proposed CSCL scheme are from our implementation, whereas the performance of the other methods was taken from their respective publications due to reproducibility constraints.
Table 3.
Results analysis on the SEED dataset.
| Dataset | Scheme | Average (%) | Standard deviation (%) |
|---|---|---|---|
| SEED (with three classes) | DNN_AER | 84.3 | 5.8 |
| | DAPLP | 89.4 | 6.4 |
| | CSMM | 94.6 | 5.1 |
| | CSCL (Our) | 97.7 | 5.3 |
Performance evaluation metrics
To evaluate the performance of the proposed framework for emotion recognition, we used standard and most common performance evaluation metrics, including accuracy and standard deviation. Accuracy is one of the most widely used indicators for assessing the effectiveness of classification models. It reflects the ratio of correctly predicted outcomes to the total number of predictions. To compute accuracy, one must determine how many predictions align with the actual labels and then divide this count by the total number of samples evaluated. In this context, TP refers to true positives (correct identifications), TN to true negatives (correct rejections), FP to false positives (incorrect classifications), and FN to false negatives (incorrect rejections). Equation 7 presents the formula used to determine the accuracy.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$
On the other hand, when a model is evaluated over multiple runs (e.g., k-fold cross-validation or repeated experiments), the standard deviation indicates how much the accuracy varies across those runs. Let $A_1, A_2, \ldots, A_n$ denote the accuracies obtained from $n$ runs. The mean accuracy is

$$\bar{A} = \frac{1}{n}\sum_{i=1}^{n} A_i,$$

and the standard deviation is then given by Eq. 8.

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(A_i - \bar{A}\big)^2} \tag{8}$$
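A tiny worked example of Eq. (8) over fold-wise accuracies (the values are illustrative, not our reported results):

```python
# Worked example of Eq. (8): mean and standard deviation of fold accuracies.
import numpy as np

fold_acc = np.array([96.1, 97.8, 98.4, 97.0, 98.2])   # illustrative fold accuracies (%)
mean_acc = fold_acc.mean()                             # A-bar
std_acc = fold_acc.std(ddof=0)                         # population form, as in Eq. (8)
```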
Following the above performance metrics, we have evaluated our proposed scheme and compared it with existing state-of-the-art methods. Tables 3, 4, 5 and 6 present the performance evaluation of our proposed CSCL model against three baseline approaches, DNN_AER, DAPLP, and CSMM, across four diverse emotion recognition datasets: SEED, CEED, FACED, and MPED. The experimental results demonstrate the superior effectiveness and stability of CSCL in comparison to the other methods. On the SEED dataset (three-class classification), our CSCL model achieves the highest average accuracy of 97.7%, outperforming CSMM (94.6%), DAPLP (89.4%), and DNN_AER (84.3%). The standard deviation of 5.3% indicates that CSCL maintains consistent performance across runs, highlighting its robustness. Further, on the CEED dataset, CSCL again leads with an accuracy of 96.15%, showing significant improvements over CSMM (91.8%), DAPLP (85.4%), and DNN_AER (78.63%). Notably, it also has the lowest standard deviation (3.72%), further affirming its reliability, as shown in Table 4.
Table 4.
Results analysis on the CEED dataset.
| Dataset | Scheme | Average (%) | Standard deviation (%) |
|---|---|---|---|
| CEED (with three classes) | DNN_AER | 78.63 | 7.8 |
| | DAPLP | 85.4 | 5.3 |
| | CSMM | 91.8 | 8.03 |
| | CSCL (Our) | 96.15 | 3.72 |
For more complex multi-class tasks such as those in the FACED dataset (nine emotion classes), CSCL attains an accuracy of 65.98%, which is higher than CSMM (62.7%), DAPLP (54.5%), and DNN_AER (43.7%). Moreover, it exhibits a smaller standard deviation (4.09%) compared to other methods, suggesting more stable performance despite increased classification complexity (Table 5). On the MPED dataset (seven emotion classes), CSCL achieves 51.30% accuracy. While this is slightly lower than CSMM (55.6%), it surpasses DAPLP (49.02%) and DNN_AER (36.3%), and it yields the lowest standard deviation of 3.3%, reflecting highly stable predictions across cross-subject samples (Table 6).
Table 5.
Results analysis on FACED dataset.
| Dataset | Scheme | Average (%) | Standard deviation (%) |
|---|---|---|---|
| FACED (with nine classes) | DNN_AER | 43.7 | 7.6 |
| | DAPLP | 54.5 | 6.2 |
| | CSMM | 62.7 | 8.1 |
| | CSCL (Our) | 65.98 | 4.09 |
Table 6.
Results analysis on MPED dataset.
| Dataset | Scheme | Average (%) | Standard deviation (%) |
|---|---|---|---|
| MPED (with seven classes) | DNN_AER | 36.3 | 7.4 |
| | DAPLP | 49.02 | 4.18 |
| | CSMM | 55.6 | 6.3 |
| | CSCL (Our) | 51.30 | 3.3 |
In EEG-based emotion recognition, relying solely on accuracy may lead to misleading interpretations, especially in multi-class or imbalanced datasets where certain emotions may be over- or under-represented. Therefore, it is essential to compute additional performance metrics such as Precision, Recall, and F1-score to gain a more nuanced understanding of model effectiveness. Precision measures the proportion of correctly predicted instances among all instances predicted as a particular class, highlighting how trustworthy the model’s predictions are. Recall, also known as sensitivity, measures the proportion of correctly predicted instances out of all actual instances of a class, indicating how well the model captures true emotional states. F1-score is the harmonic mean of precision and recall, offering a balanced metric that is especially useful when class distribution is uneven or when false positives and false negatives carry different consequences. These metrics are calculated as follows:
$$\text{Precision} = \frac{TP}{TP + FP} \tag{9}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{10}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{11}$$
where TP, FP, and FN represent true positives, false positives, and false negatives, respectively. By reporting these metrics for each emotion class, we can better assess the model’s capability to generalize across diverse emotional expressions and subject variations. Therefore, we evaluate the proposed CSCL model across four benchmark EEG datasets using multiple classification metrics, including precision, recall, and F1-score presented in Table 7.
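These metrics can be computed per class and macro-averaged with scikit-learn, as sketched below with illustrative labels and predictions.

```python
# Sketch of per-class and macro-averaged precision, recall, and F1 (Eqs. 9-11).
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

y_true = np.array([0, 1, 2, 0, 1, 2, 0, 2])
y_pred = np.array([0, 1, 1, 0, 1, 2, 2, 2])
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, average=None)
macro = precision_recall_fscore_support(y_true, y_pred, average="macro")
cm = confusion_matrix(y_true, y_pred)   # basis for the matrices in Figs. 3-5
```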
Table 7.
Class-wise and average precision, recall, and F1-score of the proposed CSCL model on SEED, CEED, FACED, and MPED datasets.
| Dataset | Class | Precision | Recall | F1-score |
|---|---|---|---|---|
| SEED | Positive | 0.94 | 0.94 | 0.94 |
| | Neutral | 0.92 | 0.91 | 0.92 |
| | Negative | 0.94 | 0.94 | 0.94 |
| | Avg | 0.93 | 0.93 | 0.93 |
| CEED | Positive | 0.93 | 0.94 | 0.94 |
| | Neutral | 0.93 | 0.91 | 0.92 |
| | Negative | 0.92 | 0.92 | 0.92 |
| | Avg | 0.92 | 0.93 | 0.92 |
| FACED | Amusement | 0.64 | 0.64 | 0.64 |
| | Inspiration | 0.56 | 0.56 | 0.56 |
| | Joy | 0.54 | 0.58 | 0.56 |
| | Tenderness | 0.56 | 0.60 | 0.58 |
| | Anger | 0.53 | 0.62 | 0.57 |
| | Fear | 0.43 | 0.54 | 0.48 |
| | Disgust | 0.55 | 0.58 | 0.56 |
| | Sadness | 0.50 | 0.58 | 0.54 |
| | Neutral | 0.56 | 0.56 | 0.56 |
| | Avg | 0.54 | 0.58 | 0.56 |
| MPED | Neutral | 0.42 | 0.52 | 0.46 |
| | Anger | 0.39 | 0.47 | 0.43 |
| | Disgust | 0.36 | 0.45 | 0.40 |
| | Fear | 0.39 | 0.43 | 0.41 |
| | Happiness | 0.37 | 0.45 | 0.41 |
| | Sadness | 0.40 | 0.47 | 0.43 |
| | Surprise | 0.41 | 0.43 | 0.42 |
| | Avg | 0.39 | 0.46 | 0.42 |
On the SEED and CEED datasets, CSCL achieved strong and balanced performance, with macro-averaged F1-scores of 0.93 and 0.92, respectively, indicating high generalization and robustness in three-class emotion recognition settings. The model maintained consistent precision and recall across all emotion classes in these datasets, reflecting its ability to distinguish emotional states even under cross-subject conditions. On the more challenging FACED dataset, which includes nine emotion categories, CSCL achieved a macro-averaged F1-score of 0.56. Despite the increased complexity, the model performed particularly well on classes like Amusement and Disgust, while showing slightly lower recall for emotions such as Fear and Inspiration. For MPED, a seven-class dataset with highly overlapping affective states, CSCL achieved a macro F1-score of 0.42. This drop is attributed to the dataset's noisy and ambiguous nature, where emotions like Surprise, Disgust, and Anger often share neural signatures. Nonetheless, CSCL still maintained a balanced trade-off between precision and recall across most classes. These results demonstrate the model's adaptability across both controlled and real-world EEG data, outperforming many baseline approaches in terms of class-level reliability. Further, we generate confusion matrices for the SEED, CEED, FACED, and MPED datasets, which demonstrate the robust classification capability of the proposed CSCL framework across diverse emotional categories. On the SEED dataset with three classes, the model exhibits highly accurate predictions, with only a few misclassifications, confirming strong performance in distinguishing between positive, neutral, and negative states, as shown in Fig. 3a. Similarly, on the CEED dataset, the classifier maintains impressive precision across all three emotion categories, resulting in a high overall accuracy of 95.49%. Figure 3b shows the confusion matrix for the CEED dataset using our CSCL model. For the more complex FACED dataset, which includes nine emotional classes, the model handles a broader range of emotions with a balanced confusion matrix, although slight overlap is observed between semantically close categories. Despite the increased challenge, accuracy remains solid at over 58%, as shown in Fig. 4.
Fig. 3.
a The confusion matrix of our proposed scheme CSCL on cross-subject emotion recognition while using SEED dataset on three classes. b The confusion matrix of our proposed scheme CSCL on cross-subject emotion recognition while using three class CEED dataset.
Fig. 4.
The confusion matrix of our proposed scheme CSCL on cross-subject emotion recognition on nine class FACED dataset.
On the MPED dataset, which involves seven emotional states, the confusion matrix reveals that the model performs well across all categories, particularly for Joy and Neutral, while still facing some confusion among emotionally similar labels like Fear and Disgust, as presented in Fig. 5. These visualisations collectively validate the CSCL model’s capacity to generalize across datasets with varying class counts and emotional complexity. The overall results highlight the strength of cross-subject learning and the effectiveness of the model’s alignment and fusion strategy. Each matrix provides valuable insight into class-wise prediction patterns, showcasing consistent dominance of true positives along the diagonal. The performance difference across datasets also reflects how class granularity impacts recognition difficulty.
Fig. 5.
The confusion matrix of our proposed scheme, CSCL on cross-subject emotion recognition on the seven-class MPED dataset.
Discussion
A comparative analysis of recent state-of-the-art EEG-based emotion recognition models, presented in Table 8, reveals the superior performance and robustness of our proposed CSCL framework. Unlike methods such as MS-DCDA and FCLGCN, which primarily focus on domain alignment or fuzzy labelling, CSCL introduces dual contrastive losses embedded in hyperbolic space to enhance semantic separation and brain-region discrimination. While models like DAPLP and CLISA target subject-invariant representation, they lack the hierarchical embedding and dual-objective mechanisms present in CSCL. Multimodal models such as CSMM offer promising results but are constrained by their reliance on synchronized data and limited scalability. Our model outperforms the compared methods on SEED, CEED, and FACED, achieving 97.7% on SEED and 96.15% on CEED compared to 94.6% and 91.8% from the closest competitor, while remaining competitive on MPED. Additionally, CSCL demonstrates greater resilience to label noise and cross-subject variability, particularly in complex multi-class settings like FACED and MPED. This performance is attributed to the integrated triple-path encoder and domain adversarial network that ensure cross-subject generalization. The results underscore CSCL's potential as a scalable, subject-independent emotion recognition solution for real-world EEG applications. Our proposed CSCL model significantly outperforms existing approaches by leveraging a novel combination of contrastive learning and domain adaptation, tailored specifically for cross-subject EEG emotion recognition. Its higher accuracy stems from a multi-phase architecture that captures spatial, temporal, and frequency domain patterns through three specialized encoders.
Table 8.
Comparative analysis of our proposed CSCL scheme with existing state of the art methods.
| References | Method Name | Proposed Schemes | Dataset | Accuracy (%) | Limitations |
|---|---|---|---|---|---|
| Xiao et al. (2025)28 | MS-DCDA | Multi-source Dynamic Contrastive Domain Adaptation | SEED | 90% | Limited to domain knowledge of labeled subjects; performance may drop with highly noisy data |
| Minghao et al. (2025)29 | FCLGCN | Fuzzy Emotion Labelling + Graph Convolution + Contrastive Learning | SEED, others | Not specified (claimed competitive) | Sensitive to noisy or low-quality EEG; generalization is still limited |
| Fangze et al. (2025)30 | SDR-GNN | Spectral Domain Reconstruction Graph Neural Network | Multiple multimodal datasets | Not specified | Requires spectral tuning; performance may drop with incomplete or noisy modalities |
| Mengting et al. (2024)36 | N/A | 3-domain encoding + contrastive learning + entropy-based features | SEED, SEED-IV | Not explicitly stated | High complexity in preprocessing across datasets; information loss risk |
| Zhu et al. (2025)37 | CSMM | Cross-subject multimodal learning (EEG + eye movement) + contrastive losses | SEED | 94.60% | Performance depends on multimodal synchronization; scalability issues |
| Sujatha et al. (2025)38 | DNN_AER | Deep Neural Network with automatic emotion recognition | BAVED, TESS, CEED, SEED | 84.3% (SEED), 43.7% (FACED) | Poor generalization across diverse datasets; cross-lingual issues |
| Zhong et al. (2025)39 | DAPLP | Pseudo-label propagation + domain adaptation (unsupervised) | SEED, SEED-IV, SEED-V | 89.4% (SEED) | Doesn’t capture brain-region-specific features; lacks contrastive learning |
| Shen et al. (2022)44 | CLISA | Contrastive Learning for Subject-Invariant Representations | SEED | ~ 90% | Limited use of dual losses; lacks hyperbolic geometry for structured data |
| Wang et al. (2025)35 | Source-free EEG Recognition | Contrastive Self-Supervised Learning | Not specified | Not reported | Does not incorporate domain adaptation or region-aware modules |
| Current Study (2025) | CSCL (Proposed) | Dual contrastive losses in hyperbolic space, with spatial–temporal-frequency and DANN | SEED, CEED, FACED, MPED | 97.7%, 96.15%, 65.98%, 51.30% | Slightly lower accuracy on highly noisy multi-class data (MPED); real-time validation remains future work |
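To make the role of the hyperbolic embedding more concrete, the display below gives one standard formulation of the Poincaré-ball distance together with an InfoNCE-style contrastive term built on it. This is a hedged illustration of the general construction rather than the exact loss used in CSCL; the temperature τ and the choice of the unit ball (curvature −1) are assumptions made for the sketch.

```latex
% Poincare-ball distance (unit ball) between embeddings u and v:
d(\mathbf{u},\mathbf{v}) = \operatorname{arcosh}\!\left(
  1 + \frac{2\,\lVert \mathbf{u}-\mathbf{v}\rVert^{2}}
           {\bigl(1-\lVert \mathbf{u}\rVert^{2}\bigr)\bigl(1-\lVert \mathbf{v}\rVert^{2}\bigr)} \right)

% InfoNCE-style contrastive term with anchor z_i, positive z_p, temperature tau,
% using negative hyperbolic distance as the similarity score:
\mathcal{L}_{\mathrm{con}}^{(i)} = -\log
  \frac{\exp\!\bigl(-d(\mathbf{z}_i,\mathbf{z}_p)/\tau\bigr)}
       {\sum_{k \neq i} \exp\!\bigl(-d(\mathbf{z}_i,\mathbf{z}_k)/\tau\bigr)}
```

Under a construction of this kind, samples sharing an emotion (or stimulus) label are pulled together in the ball while pairs from different classes or brain regions are pushed apart, which matches the semantic-separation behaviour described above.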
Unlike traditional models that rely heavily on subject-specific features, our model ensures generalization by learning domain-invariant embeddings, which is vital for real-world applicability. The contrastive learning phase maximizes inter-class separability while preserving intra-class consistency, enabling the model to distinguish emotions even across unseen subjects. Additionally, the DANN module minimizes distributional shifts between source and target domains, thus improving cross-subject generalization. The fusion of multi-domain features, followed by a well-designed representation projector, further enhances feature compactness and discriminability. Moreover, our composite loss function, integrating classification, domain, and contrastive losses, optimally balances the learning objectives for robust emotion detection. Extensive evaluations on diverse datasets, namely SEED, CEED, FACED, and MPED, show that our model consistently achieves higher classification accuracy than baseline models, demonstrating scalability and adaptability. The MLP classifier, fine-tuned on domain-invariant features, ensures high fidelity in emotion labelling. The training pipeline, comprising iterative optimisation, careful data standardisation, and rigorous leave-one-subject-out (LOSO) testing, keeps overfitting to a minimum. Feature visualisations and class-wise regrouping confirm the model's ability to form well-separated clusters for different emotional states. Overall, these findings collectively validate the effectiveness of our cross-subject, multi-domain emotion recognition framework. Even under more challenging multi-class settings such as FACED and MPED, CSCL maintains competitive accuracy with enhanced stability, reinforcing its practical applicability in real-world emotion recognition systems.
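As a concrete (and deliberately simplified) illustration of how such a composite objective can be assembled, the PyTorch sketch below combines a classification term, a DANN-style domain term fed through a gradient-reversal layer, and a contrastive term based on the Poincaré distance. The loss weights (`lam_domain`, `lam_con`), the temperature, and the single-negative contrastive form are assumptions made for brevity; this is not the exact CSCL implementation.

```python
# Hedged sketch of a composite objective: classification + domain-adversarial
# (DANN-style) + hyperbolic contrastive terms. Weights, temperature, and the
# single-negative contrastive form are illustrative assumptions.
import torch
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses and scales gradients backward.
    Typical use: domain_logits = domain_head(GradReverse.apply(features, 1.0))."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None


def poincare_distance(u, v, eps=1e-5):
    """Distance in the unit Poincare ball; u and v are assumed to lie inside it."""
    sq = torch.sum((u - v) ** 2, dim=-1)
    nu = torch.clamp(1.0 - torch.sum(u ** 2, dim=-1), min=eps)
    nv = torch.clamp(1.0 - torch.sum(v ** 2, dim=-1), min=eps)
    return torch.acosh(1.0 + 2.0 * sq / (nu * nv))


def composite_loss(emotion_logits, emotion_labels,
                   domain_logits, domain_labels,
                   anchors, positives, negatives,
                   lam_domain=0.1, lam_con=0.5, tau=0.5):
    """L_total = L_cls + lam_domain * L_domain + lam_con * L_contrastive."""
    l_cls = F.cross_entropy(emotion_logits, emotion_labels)
    l_dom = F.cross_entropy(domain_logits, domain_labels)

    # One positive and one negative per anchor; index 0 is the positive class.
    d_pos = poincare_distance(anchors, positives)
    d_neg = poincare_distance(anchors, negatives)
    con_logits = torch.stack([-d_pos, -d_neg], dim=1) / tau
    target = torch.zeros(anchors.size(0), dtype=torch.long, device=anchors.device)
    l_con = F.cross_entropy(con_logits, target)

    return l_cls + lam_domain * l_dom + lam_con * l_con
```

In practice, the relative weights would be tuned or scheduled (as is common for the gradient-reversal coefficient), and the contrastive term would typically draw many negatives from other subjects or stimuli within each batch.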
Conclusion
This study presents a novel Cross-Subject Contrastive Learning (CSCL) scheme for EEG-based emotion recognition, addressing a critical challenge in affective computing: the variability of EEG signals across individuals. By introducing a dual contrastive objective within hyperbolic space, our approach effectively learns both subject-specific nuances and generalizable representations. This enhances the robustness of emotional feature extraction across diverse subjects. The integration of emotion and stimulus contrastive losses enables the model to distinguish between different brain regions with greater fidelity, thereby improving classification performance. The CSCL model was rigorously evaluated on four benchmark datasets, SEED, CEED, FACED, and MPED, where it achieved impressive recognition accuracies of 97.70%, 96.26%, 65.98%, and 51.30%, respectively. These results demonstrate that our approach outperforms existing methods in handling inter-subject variability and label noise, two major bottlenecks in EEG-based emotion classification systems. Additionally, the extensive testing across multiple datasets ensures the reliability and adaptability of our method in real-world BCI applications. Overall, the CSCL framework not only enhances emotion recognition accuracy but also contributes to the broader development of more inclusive and generalizable brain-computer interface systems.
While the proposed CSCL framework demonstrates strong performance across multiple EEG datasets, certain limitations remain. First, although hyperbolic space embedding and dual contrastive losses improve cross-subject generalization, the model's performance decreases slightly on highly complex and noisy multi-class datasets such as MPED. Second, real-time emotion recognition was not evaluated in this study; thus, its applicability in online BCI systems requires further validation. Third, despite demonstrating generalization across four datasets, all data were collected in controlled laboratory settings; the model's robustness in naturalistic or mobile environments is yet to be tested. Additionally, the computational cost of training the triple-path encoders and domain adaptation modules may be high for edge devices. Finally, the model currently focuses solely on EEG signals; integrating multi-modal physiological data could further improve recognition accuracy and contextual awareness.
Acknowledgements
This work was funded by the University of Jeddah, Jeddah, Saudi Arabia, under grant no. (UJ-24-SUCH-1247). The authors, therefore, thank the University of Jeddah for its technical and financial support.
Author contributions
A. M. A. and M. U. A. contributed to the study conception and design. Material preparation, data collection, and analysis were performed by A. B., K. A. A., W. A. A. and A. D. The first draft of the manuscript was written by M. U. A., and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding
This work was funded by the University of Jeddah, Jeddah, Saudi Arabia, under grant no. (UJ-24-SUCH-1247). The authors, therefore, thank the University of Jeddah for its technical and financial support.
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.LeDoux, J. E. Cognitive-emotional interactions in the brain. Cogn. Emot.3(4), 267–289 (1989). [Google Scholar]
- 2.Garcia-Molina, G., Tsoneva, T. & Nijholt, A. Emotional brain–computer interfaces. Int. J. Auton. Adapt. Commun. Syst.6(1), 9–25 (2013). [Google Scholar]
- 3.Abramson, L., Petranker, R., Marom, I. & Aviezer, H. Social interaction context shapes emotion recognition through body language, not facial expressions. Emotion21(3), 557 (2021). [DOI] [PubMed] [Google Scholar]
- 4.Noroozi, F. et al. Survey on emotional body gesture recognition. IEEE Trans. Affect. Comput.12(2), 505–523 (2018). [Google Scholar]
- 5.Rao, K. S. & Koolagudi, S. G. Emotion Recognition Using Speech Features (Springer, 2012). [Google Scholar]
- 6.Kraack, K. A multimodal emotion recognition system: Integrating facial expressions, body movement, speech, and spoken language. arXiv preprint arXiv:2412.17907 (2024).
- 7.Dadebayev, D., Goh, W. W. & Tan, E. X. EEG-based emotion recognition: Review of commercial EEG devices and machine learning techniques. J. King Saud Univ.-Comput. Inf. Sci.34(7), 4385–4401 (2022). [Google Scholar]
- 8.Hall, E. L., Robson, S. E., Morris, P. G. & Brookes, M. J. The relationship between MEG and fMRI. Neuroimage15(102), 80–91 (2014). [DOI] [PubMed] [Google Scholar]
- 9.Lu, Y., Yao, X., Wang, W., Zhou, L. & Wu, T. Emotion recognition classification with differential entropy and power spectral density features. In International Conference on Image, Vision and Intelligent Systems, 541–548 (Springer Nature Singapore, 2023).
- 10.Ufade, M. A., Gond, V. J., Kawade, M. M. Power spectral density based discrete emotional state recognition system using electroencephalography signals. Afr. J. Biomed. Res. 59–66 (2024).
- 11.Uyanık, H., Ozcelik, S. T., Duranay, Z. B., Sengur, A. & Acharya, U. R. Use of differential entropy for automated emotion recognition in a virtual reality environment with EEG signals. Diagnostics12(10), 2508 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Anjum, M., Batool, W., Saher, R. & Saeed, S. M. Enhanced classification of video-evoked stress response using power spectral density features. Appl. Sci.14(20), 9527 (2024). [Google Scholar]
- 13.Khare, S. K., Bajaj, V. & Sinha, G. R. Adaptive tunable Q wavelet transform-based emotion identification. IEEE Trans. Instrum. Meas.69(12), 9609–9617 (2020). [Google Scholar]
- 14.You, H. et al. Mitigating regression faults induced by feature evolution in deep learning systems. ACM Trans. Softw. Eng. Methodol. (2025).
- 15.Yang, L., Zhang, Q., Chao, S., Liu, D. & Yuan, X. Greedy-mRMR: An emotion recognition algorithm based on EEG using greedy algorithm. In 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 1329–1336 (IEEE, 2022).
- 16.Badrulhisham, N. A. & Mangshor, N. N. Emotion recognition using convolutional neural network (CNN). J. Phys.: Conf. Ser.1962(1), 012040 (2021). [Google Scholar]
- 17.Cîrneanu, A. L., Popescu, D. & Iordache, D. New trends in emotion recognition using image analysis by neural networks, a systematic review. Sensors23(16), 7092 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Abdullah, S. M. & Abdulazeez, A. M. Facial expression recognition based on deep learning convolution neural network: A review. J. Soft Comput. Data Min.2(1), 53–65 (2021). [Google Scholar]
- 19.Canal, F. Z. et al. A survey on facial emotion recognition techniques: A state-of-the-art literature review. Inf. Sci.1(582), 593–617 (2022). [Google Scholar]
- 20.Pandey, S. K., Shekhawat, H. S. & Prasanna, S. M. Deep learning techniques for speech emotion recognition: A review. In 2019 29th International Conference Radioelektronika (RADIOELEKTRONIKA), 1–6 (IEEE, 2019).
- 21.Boughanem, H., Ghazouani, H. & Barhoumi, W. Facial emotion recognition in-the-wild using deep neural networks: A comprehensive review. SN Comput. Sci.5(1), 96 (2023). [Google Scholar]
- 22.Rajwal, S. & Aggarwal, S. Convolutional neural network-based EEG signal analysis: A systematic review. Arch. Comput. Methods Eng.30(6), 3585–3615 (2023). [Google Scholar]
- 23.Liu, S. et al. EEG emotion recognition based on the attention mechanism and pre-trained convolution capsule network. Knowl.-Based Syst.8(265), 110372 (2023). [Google Scholar]
- 24.Gkintoni, E., Aroutzidis, A., Antonopoulou, H. & Halkiopoulos, C. From neural networks to emotional networks: A systematic review of EEG-based emotion recognition in cognitive neuroscience and real-world applications. Brain Sci.15(3), 220 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Indraneel, K., & Miriyala, T. B. Artificial Intelligence Based Emotion Recognition from Fuzzy EEG Signals: A Comprehensive Review. In 2025 4th International Conference on Sentiment Analysis and Deep Learning (ICSADL), 921–926 (IEEE, 2025).
- 26.Desai, S., Ghose, D., & Chakraborty, D. A Survey on Data Curation for Visual Contrastive Learning: Why Crafting Effective Positive and Negative Pairs Matters. arXiv preprint arXiv:2502.08134 (2025).
- 27.Pang, B., Wei, Z., Lin, J., & Lu, C. Auto-Pairing Positives through Implicit Relation Circulation for Discriminative Self-Learning. In IEEE Transactions on Pattern Analysis and Machine Intelligence (2025). [DOI] [PubMed]
- 28.Xiao, Y. et al. Multi-source EEG emotion recognition via dynamic contrastive domain adaptation. Biomed. Signal Process. Control1(102), 107337 (2025). [Google Scholar]
- 29.Yu, M., He, Q., Wang, Y. & Du, N. Fusing temporal-frequency information with contrast learning on graph convolution network to decoding EEG. Biomed. Signal Process. Control1(100), 106986 (2025). [Google Scholar]
- 30.Fu, F. et al. SDR-GNN: Spectral domain reconstruction graph neural network for incomplete multimodal learning in conversational emotion recognition. Knowl.-Based Syst.30(309), 112825 (2025). [Google Scholar]
- 31.Khan, A. R. Facial emotion recognition using conventional machine learning and deep learning methods: current achievements, analysis and remaining challenges. Information13(6), 268 (2022). [Google Scholar]
- 32.Bota, P. J., Wang, C., Fred, A. L. & Da Silva, H. P. A review, current challenges, and future possibilities on emotion recognition using machine learning and physiological signals. IEEE Access26(7), 140990–141020 (2019). [Google Scholar]
- 33.Khalil, R. A. et al. Speech emotion recognition using deep learning techniques: A review. IEEE Access19(7), 117327–117345 (2019). [Google Scholar]
- 34.Deng, X., Li, C., Hong, X., Huo, H. & Qin, H. A novel multi-source contrastive learning approach for robust cross-subject emotion recognition in EEG data. Biomed. Signal Process. Control1(97), 106716 (2024). [Google Scholar]
- 35.Wang, Y., Ruan, Q., Wu, Q. & Wang, S. A contrastive self-supervised learning method for source-free EEG emotion recognition. User Model. User-Adap. Inter.35(1), 4 (2025). [Google Scholar]
- 36.Hu, M., Xu, D., He, K., Zhao, K. & Zhang, H. Cross-subject emotion recognition with contrastive learning based on EEG signal correlations. Biomed. Signal Process. Control1(104), 107511 (2025). [Google Scholar]
- 37.Zhu, Q. et al. Multi-modal cross-subject emotion feature alignment and recognition with EEG and eye movements. IEEE Trans. Affect. Comput.10.1109/TAFFC.2025.3554399 (2025). [Google Scholar]
- 38.Sujatha, R., Chatterjee, J. M., Pathy, B. & Hu, Y. C. Automatic emotion recognition using deep neural network. Multimed. Tools Appl.10, 1–30 (2025). [Google Scholar]
- 39.Zhong, X. C. et al. Unsupervised domain adaptation with pseudo-label propagation for cross-domain EEG emotion recognition. IEEE Trans. Instrum. Meas. (2025).
- 40.SEED dataset. https://bcmi.sjtu.edu.cn/home/seed/ (accessed 12 March 2025).
- 41.CEED: The Complex Emotion Expression Database: A validated stimulus set of trained actors. https://nyu.databrary.org/volume/874 (accessed 24 March 2025). [DOI] [PMC free article] [PubMed]
- 42.FACED dataset. https://github.com/openmedlab/Awesome-Medical-Dataset/blob/main/resources/FACE.md (accessed 13 March 2025).
- 43.MPED dataset. https://github.com/Tengfei000/MPED (accessed 17 March 2025).
- 44.Shen, X., Liu, X., Hu, X., Zhang, D. & Song, S. Contrastive learning of subject-invariant EEG representations for cross-subject emotion recognition. IEEE Trans. Affect. Comput.14(3), 2496–2511 (2022). [Google Scholar]
- 45.Fdez, J., Guttenberg, N., Witkowski, O. & Pasquali, A. Cross-subject EEG-based emotion recognition through neural networks with stratified normalization. Front. Neurosci.3(15), 626277 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Kunjan, S. The necessity of leave one subject out (LOSO) cross validation for EEG disease diagnosis. In Brain Informatics: 14th International Conference, BI 2021, Virtual Event, September 17–19, 2021, Proceedings, 558–567 (Springer International Publishing, 2021).