Abstract
Electroencephalogram (EEG) emotion recognition plays an important role in human–computer interaction, and higher recognition accuracy improves the user experience. In recent years, domain adaptation methods from transfer learning have been used to construct general emotion recognition models that deal with domain differences among subjects and sessions. However, it is still challenging to effectively reduce domain differences in domain adaptation. In this paper, we propose a Multiple-Source Distribution Deep Adaptive Feature Norm Network for EEG emotion recognition, which reduces domain differences by improving the transferability of task-specific features. In detail, the domain adaptation method of our model employs a three-layer network topology, inserts an Adaptive Feature Norm for self-supervised adjustment between layers, and combines a multiple-kernel selection approach for mean-embedding matching. The proposed method achieves the best classification performance on the SEED and SEED-IV datasets. On the SEED dataset, the average accuracies of the cross-subject and cross-session experiments are 85.01 and 91.93%, respectively. On the SEED-IV dataset, the average accuracy is 58.81% in cross-subject experiments and 59.51% in cross-session experiments. The experimental results demonstrate that our method can effectively reduce domain differences and improve emotion recognition accuracy.
Keywords: EEG, Emotion recognition, Domain adaptation, Transfer learning
Introduction
Different from extensively researched logical intelligence, emotion is people's psychological and physiological response to the outside world, and is vital to human behavior and mental health (Dolan 2002). Research on affective brain–computer interfaces must take the emotional state of human beings into account. Although non-physiological signals such as body movements (Ahmed et al. 2019) and speech (Liu et al. 2018) can also reflect emotional states, they are susceptible to camouflage and external factors, which limits their reliability in emotion recognition. Consequently, researchers tend to favor physiological signals. Emotion recognition from physiological signals can give machines the ability to perceive human emotional states and enhance their learning and prediction abilities (Cowie et al. 2001). In recent years, Electroencephalogram (EEG) technology has gradually been applied to emotion recognition due to its convenience, simplicity, accessibility and ease of operation. Compared with other physiological signals such as Electromyography (EMG) (Mithbavkar and Shah 2021) and Galvanic Skin Response (GSR) (Wu et al. 2010), EEG is derived from the synchronous synaptic activity of many neurons in the cerebral cortex and can capture the changes in neural electrical activity caused by emotions, with which it is highly correlated (Feng et al. 2023). Therefore, EEG holds significant research potential in emotion recognition.
When researchers extract task-related features from EEG signals, the properties of EEG signals must be considered. Firstly, the complexity of EEG directly leads to the non-stationary characteristics of the collected signals. Many methods have been investigated to deal with the non-stationary properties of EEG signals (Ghare and Paithane 2016; Chai et al. 2016), such as Sparse Representation (Shin et al. 2015), Wavelet Transform (Bozhokin and Suslova 2015), and Domain Adaptation (Chai et al. 2016). Secondly, the acquisition process of EEG signals is easily disturbed and the signals are noticeably noisy, so there may be deviations even among data collected from the same subject or session. Researchers have carried out extensive research to address the differences among EEG signals. By modifying the covariance matrix, Common Spatial Pattern (CSP) maps EEG signals into a common space and obtains highly discriminative feature vectors (Lotte and Guan 2010). In addition, domain adaptation has also been widely used to reduce the differences among EEG signals. The goal of domain adaptation is to lessen the disparity in distribution between two domains, so as to determine the labels of the target data. Tzeng et al. (Tzeng et al. 2014) proposed the Deep Domain Confusion (DDC) method, which utilizes Maximum Mean Discrepancy (MMD) to reduce the difference in feature distribution between the source and target domains. Zheng et al. (Zheng et al. 2015) introduced MMD into the field of EEG emotion recognition to improve accuracy. MMD is a kernel-based method that measures the distance between two distributions in a Reproducing Kernel Hilbert Space (RKHS) (Borgwardt et al. 2006).
However, MMD utilizes a single fixed kernel, which may deviate significantly across models, so a different kernel function has to be selected for each model. Therefore, Multiple-Kernel Maximum Mean Discrepancy (MK-MMD) has been proposed and demonstrates superior domain adaptation ability (Hang et al. 2019).
With the extensive research and application of deep learning, researchers have found that deep domain adaptation has superior adaptive ability (Ganin et al. 2016; Yosinski et al. 2014). Yosinski et al. (Yosinski et al. 2014) analyzed the transferability of features and found that it decreases significantly in the deeper fully connected layers, so adapting multiple layers is particularly important. Long et al. (Long and Wang 2015) proposed the Deep Adaptation Network (DAN) based on these observations, which combines neural networks with domain adaptation. Jin et al. (Jin et al. 2017) introduced DAN into the field of emotion recognition to eliminate the difference between the source and target domains, and compared it with traditional methods. The results show that DAN outperforms similar methods in emotion recognition.
While deep domain adaptation can better reduce the domain difference, it often merges all source domains into one general domain and applies domain adaptation for alignment (single source domain in Fig. 1). This simple way of expanding the training data can enhance model performance to a certain degree, but it ignores the unique characteristics of each EEG signal. Each person can be regarded as an independent emotional individual, and individual differences lead to different marginal distributions of EEG data. If all subjects are placed in the same source domain, there is no guarantee that the merged marginal distribution matches that of each individual, resulting in large deviations. To address this problem, researchers have explored strategies for multiple source domains (Li et al. 2018; Gu et al. 2022; Chen et al. 2021a), which preserve the unique characteristics of each EEG signal to the greatest extent (multiple source domain in Fig. 1). Figure 1 shows the comparison of the single and multiple source domain methods. Chen et al. (Chen et al. 2021a) proposed a Multi-Source Marginal Distribution Adaptation (MS-MDA) method for EEG emotion recognition, which pairs each source domain with the same target domain to create multiple branches and matches the distributions to extract domain-specific information. Although utilizing multiple source domains can effectively preserve the features of each EEG signal and minimize individual differences, a drawback remains: task-specific features are less transferable and the degree of domain adaptation is lower.
Fig. 1.
Comparison of single and multiple source domains
The low transferability of task-specific features can lead to model degradation, which is a problem in transfer learning. This means that the trained model may not be fully applicable to new tasks, resulting in incomplete transfer of task-specific features and poor domain adaptation. The notion of model degradation has gained widespread acceptance in domain adaptation in recent years (Yosinski et al. 2014; Tzeng et al. 2017); however, there has been limited research into the potential causes of this issue. Xu et al. (Xu et al. 2018) proposed the Adaptive Feature Norm (AFN) method and revealed that the instability of target domain recognition is primarily due to its smaller feature norm compared with that of the source domain. The main idea is to gradually adjust the feature norms of the target and source domains, bringing them close to stable values in a large range. This is based on the observation that larger feature norms for task-specific features simplify the transfer process. Imran et al. (Imran and Athitsos 2021) combined the adaptive feature norm with subdomain adaptation to improve transfer gains. These results show that AFN has a positive impact on transfer learning, which offers an innovative way to improve the domain adaptability of emotion recognition models. Based on this, this paper proposes a Multiple-Source Distribution Deep Adaptive Feature Norm Network (MSD-DAFNN) for EEG emotion recognition. The contributions of this paper are summarized as follows:
We propose a Multiple-Source Distribution Deep Adaptive Network (MSD-DAN) that combines multiple source domains with a deep adaptation network. The layers related to task-specific features in this structure are adapted hierarchically. Compared with the single-kernel method, our proposed method significantly enhances the adaptation efficiency.
We propose a domain adaptation method called Deep Adaptive Feature Norm Network (DAFNN) that combines multiple kernels, multiple layers and AFN. After the distribution discrepancy between the source and target domains is measured, the AFN is applied: the step size is gradually adjusted to ensure a steady increase of the feature norm within each domain, finally leading to stable values in a wide range. It has been shown that task-specific features with larger norms are easier to transfer. We verified this on different datasets, and our method shows superior domain adaptability.
Inspired by the previous two items, we propose MSD-DAFNN for emotion recognition. We establish a three-layer network topology and adapt it to multiple source domains. In each branch, we utilize DAFNN to conduct deep domain adaptation. The final experimental results show that this method further improves the mean accuracy over MSD-DAN.
The remaining sections of this paper are organized as follows. "Methods" section describes the methods. "Experiments" section introduces the datasets and experimental setup. "Results" section gives the results and analyses of the different experiments. "Discussion" section contains the discussion, and "Conclusions" section summarizes the research.
Methods
The proposed MSD-DAFNN is illustrated in Fig. 2. The network consists of four parts: pre-processing of signals, a domain invariant feature extractor, domain specific feature extractors, and multiple source domain classifiers. The pre-processing module extracts the Differential Entropy (DE) features of the collected signals. The domain invariant feature extractor extracts low-level domain invariant features from both the source and target domains. The domain specific feature extractors obtain domain-specific features from the various domains and employ DAFNN to minimize the domain difference. The multiple source domain classifiers module predicts the classification results, and the average of all predictions is taken as the final result.
Fig. 2.
The structure of the proposed MSD-DAFNN
Pre-processing of signals
It is necessary to pre-process the collected EEG signals. Firstly, according to the reactions of the subjects, only the experimental periods during which the target emotions were stimulated were selected for analysis. Secondly, the original EEG signals were down-sampled to a sampling rate of 200 Hz. Data segments that were seriously contaminated by electromyography and electrooculography were then removed. Finally, a bandpass filter with a range of 0.3 to 50 Hz was applied to eliminate noise and interference signals. Since DE is capable of distinguishing patterns from different frequency bands (Zheng and Lu 2015), we adopt the DE feature as the initial input to our model. For an EEG segment that approximately obeys a Gaussian distribution $N(\mu, \sigma^2)$, the calculation formula is defined as

$$DE = \frac{1}{2} \log\left( 2 \pi e \sigma^{2} \right) \tag{1}$$
Each sampled segment is decomposed into five frequency bands: delta (1–3 Hz), theta (4–7 Hz), alpha (8–13 Hz), beta (14–30 Hz) and gamma (31–50 Hz), and DE features are extracted for each band. The EEG collection process uses 62 channels, so the data form is sample size × 62 (channels) × 5 (frequency bands). Finally, the channel and frequency-band dimensions are flattened together to form the final data form, which is sample size × 310 (62 × 5).
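As a concrete illustration of this pipeline, the sketch below band-passes a recording into the five bands and computes the Gaussian-form DE of Eq. (1) over 1 s windows. It is a minimal reconstruction under our assumptions (the function names, filter order, and 1 s window are ours, not from the authors' code):

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 200  # sampling rate after down-sampling (Hz)
BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (31, 50)}

def de_features(eeg: np.ndarray, fs: int = FS, win_sec: float = 1.0) -> np.ndarray:
    """eeg: (n_channels, n_times) array; returns (n_windows, n_channels * 5)."""
    win = int(win_sec * fs)
    n_ch, n_times = eeg.shape
    n_win = n_times // win
    feats = np.empty((n_win, n_ch, len(BANDS)))
    for b, (lo, hi) in enumerate(BANDS.values()):
        num, den = butter(4, [lo, hi], btype="band", fs=fs)  # band-pass filter
        filtered = filtfilt(num, den, eeg, axis=1)
        for w in range(n_win):
            var = filtered[:, w * win:(w + 1) * win].var(axis=1)
            # DE of a Gaussian signal: 0.5 * ln(2 * pi * e * sigma^2)
            feats[w, :, b] = 0.5 * np.log(2 * np.pi * np.e * (var + 1e-12))
    return feats.reshape(n_win, -1)  # flatten 62 channels x 5 bands -> 310

de = de_features(np.random.randn(62, 60 * FS))  # -> (60, 310) for a 60 s clip
```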
Domain invariant feature extractor
For the multiple source domain data with labels and the newly collected target domain data, both are simultaneously input into the domain invariant feature extractor to extract features common across different domains. The inputs of our model are the pre-processed data of $N$ source domains $\{\mathcal{X}_{s_i}, \mathcal{Y}_{s_i}\}_{i=1}^{N}$ and a target domain $\{\mathcal{X}_t\}$, where $\mathcal{X}_{s_i}$ and $\mathcal{Y}_{s_i}$ represent the source domain data features and labels of the $i$th branch respectively, and $\mathcal{X}_t$ represents the target domain data features. This module adopts a single-layer network structure and maps all data to a shared latent space to extract domain invariant features.
Domain specific feature extractor
We have obtained the common features from both the source and target domains, and have paired the target domain with each source domain to form $N$ branches. Each branch is then mapped to a specific latent space to extract domain-specific features. In order to enhance the adaptability between the source and target domains, we propose the DAFNN method to bring the two domains closer in the latent space, which is detailed in the "DAFNN" section.
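For concreteness, the following PyTorch sketch shows one possible reading of this topology: a single-layer shared extractor, one three-layer specific extractor per branch, and one classifier per branch. The layer widths (256/128/64/32) and the class name `MSDNet` are our assumptions; the excerpt only fixes the 310-dimensional DE input and the three-layer branch depth:

```python
import torch
import torch.nn as nn

class MSDNet(nn.Module):
    """Shared extractor + per-branch three-layer extractors and classifiers."""
    def __init__(self, n_sources: int, n_classes: int = 3, in_dim: int = 310):
        super().__init__()
        # single-layer shared extractor: all domains -> one latent space
        self.shared = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        # per-branch specific extractors; each layer's activation is kept
        # so that DAFNN can adapt all three layers
        self.specific = nn.ModuleList([
            nn.ModuleList([nn.Sequential(nn.Linear(256, 128), nn.ReLU()),
                           nn.Sequential(nn.Linear(128, 64), nn.ReLU()),
                           nn.Sequential(nn.Linear(64, 32), nn.ReLU())])
            for _ in range(n_sources)])
        self.classifiers = nn.ModuleList(
            [nn.Linear(32, n_classes) for _ in range(n_sources)])

    def forward(self, x: torch.Tensor, branch: int):
        h = self.shared(x)
        feats = []                       # hidden representations z^1..z^3
        for layer in self.specific[branch]:
            h = layer(h)
            feats.append(h)
        return feats, self.classifiers[branch](feats[-1])
```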
DAFNN
DAFNN adopts a three-layer network for deep domain adaptation, as illustrated in Fig. 3. We employ MK-MMD at all three layers to reduce the domain difference, and introduce AFN after calculating the distribution discrepancy at each layer. AFN enhances the transferability of task-specific features by adjusting the feature norms of the source and target domains, so as to improve the domain adaptability of the model.
Fig. 3.
The structure of DAFNN
The formula of MK-MMD is defined as

$$d_{k}^{2}(p, q) \triangleq \left\| \mathbf{E}_{p}\left[\phi\left(\mathbf{x}^{s}\right)\right] - \mathbf{E}_{q}\left[\phi\left(\mathbf{x}^{t}\right)\right] \right\|_{\mathcal{H}_{k}}^{2} \tag{2}$$

where $\mathcal{H}_k$ denotes the RKHS endowed with the characteristic kernel $k$, and $\mathbf{E}[\cdot]$ is the mathematical expectation. $\mathbf{x}^s$ and $\mathbf{x}^t$ indicate the data in the source and target domains, respectively. For the probability distributions $p$ and $q$, $d_k^2(p,q)$ is the RKHS distance between the mean embeddings of $p$ and $q$. The characteristic kernel associated with the feature map $\phi$, $k(\mathbf{x}^s, \mathbf{x}^t) = \langle \phi(\mathbf{x}^s), \phi(\mathbf{x}^t) \rangle$, is defined as a convex combination of the positive semi-definite (PSD) kernels $\{k_u\}$. The formula of multiple-kernel selection is as follows,

$$\mathcal{K} \triangleq \left\{ k = \sum_{u=1}^{m} \beta_{u} k_{u} : \sum_{u=1}^{m} \beta_{u} = 1,\; \beta_{u} \geq 0,\; \forall u \right\} \tag{3}$$

where $m$ is the number of kernels and $\beta_u$ represents the constraint coefficients that ensure the derived multiple kernel $k$ is characteristic. MK-MMD uses different $\beta_u$ to enhance the two-sample test and obtain a theoretically optimal kernel.
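In implementations of MK-MMD, the kernel weights are commonly fixed to $\beta_u = 1/m$ over a bank of Gaussian kernels whose bandwidths are spread around the mean pairwise distance. The sketch below follows that common simplification (an assumption on our part; the paper does not report its kernel bank) and returns a biased batch estimate of Eq. (2):

```python
import torch

def mk_mmd(xs: torch.Tensor, xt: torch.Tensor, n_kernels: int = 5,
           mul: float = 2.0) -> torch.Tensor:
    """Biased MK-MMD^2 estimate between a source batch xs and a target batch xt."""
    n_s = xs.size(0)
    x = torch.cat([xs, xt], dim=0)
    d2 = torch.cdist(x, x).pow(2)                        # pairwise squared distances
    bw0 = d2.detach().mean() / (mul ** (n_kernels // 2)) # smallest bandwidth
    k = sum(torch.exp(-d2 / (bw0 * mul ** i + 1e-12)) for i in range(n_kernels))
    k = k / n_kernels                                    # equal weights beta_u = 1/m
    k_ss, k_tt, k_st = k[:n_s, :n_s], k[n_s:, n_s:], k[:n_s, n_s:]
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()
```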
The module for feature extraction is denoted as $F$, and the task-specific classifier with $l$ fully connected (FC) layers is denoted as $G$. The first $l-1$ FC layers are marked as $G^{1 \sim l-1}$, and the last layer is marked as $G^{l}$. The formula is as follows,

$$d(p, q) \triangleq \sup_{h \in \mathcal{H}} \left| \frac{1}{n_{s}} \sum_{\mathbf{x}_{i} \in D_{s}} h\left(\mathbf{x}_{i}\right) - \frac{1}{n_{t}} \sum_{\mathbf{x}_{j} \in D_{t}} h\left(\mathbf{x}_{j}\right) \right| \tag{4}$$

where $n_s$ and $n_t$ denote the amount of data in the source and target domains respectively, and $D_s$ and $D_t$ represent the source and target domains respectively. The function class $\mathcal{H}$ represents all possible compositions of the L2-norm operator and the module for deep representation, i.e., $\mathcal{H} = \left\{ h : h = \|\cdot\|_{2} \circ G^{1 \sim l-1} \circ F \right\}$, and $h$ is a possible case of $\mathcal{H}$.

The function class $\mathcal{H}$ is clearly rich enough that, if the function type is not limited, the upper bound of Eq. (4) will depart widely from zero. Therefore, a fixed scalar $R$ is introduced to restrict the mean feature norm. However, as the fixed scalar $R$ increases, the accuracy of the model can still increase continuously, but the gradient generated by the feature norm penalty eventually explodes, which also means that we cannot accurately set a reasonable value of $R$.
To settle the above problem, we employ a gradual adjustment strategy to make the model learn task-specific features with higher feature norms. The formula is as follows,

$$C(\theta) = \frac{1}{n_{s}} \sum_{\left(\mathbf{x}_{i}, y_{i}\right) \in D_{s}} L_{y}\left(\mathbf{x}_{i}, y_{i}\right) + \frac{\lambda}{n_{s}+n_{t}} \sum_{\mathbf{x}_{i} \in D_{s} \cup D_{t}} L_{d}\left( h\left(\mathbf{x}_{i} ; \theta_{0}\right) + \Delta r,\; h\left(\mathbf{x}_{i} ; \theta\right) \right) \tag{5}$$

where $\theta_f$ and $\theta_g$ represent the arguments of $F$ and $G$, respectively, and $\theta = (\theta_f, \theta_g)$. $\theta_0$ and $\theta$ represent the model parameters updated in the prior iteration and the current iteration, respectively. $L_d$ denotes the L2 distance, $\lambda$ is a trade-off parameter, and $L_y$ represents the source classification loss. $\Delta r$ indicates the positive residual scalar that controls the increase of the feature norm.
By gradually adjusting the step size, the feature norms of the source and target domains become asymptotically stable within a large range, which increases the transferability of task-specific features. DAFNN embeds AFN into the three-layer network; when the feature norm of each layer reaches its asymptotically stable value, the transferability of the task-specific features of that layer is optimal.
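The sketch below gives a minimal form of the stepwise norm term in Eq. (5), following the self-driven variant used in the original AFN implementation, where the detached current norm stands in for $h(\mathbf{x}; \theta_0)$. The residual $\Delta r = 1.0$ is an illustrative assumption, not a value reported in this paper:

```python
import torch

def safn_loss(feat: torch.Tensor, delta_r: float = 1.0) -> torch.Tensor:
    """Encourage feature norms to grow steadily by delta_r per iteration."""
    norm = feat.norm(p=2, dim=1)          # h(x; theta) for each sample
    target = norm.detach() + delta_r      # h(x; theta_0) + delta_r, no gradient
    return (norm - target).pow(2).mean()  # L2 distance L_d of Eq. (5)
```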
Multiple source domain classifiers
Since we adapt the target domain to multiple source domains, the model has multiple branches. Therefore, we assign a classifier to each branch, with each classifier trained on a distinct source domain. The features of the source and target domains in each branch are input into the classifier for classification prediction, and the classification loss is calculated from the prediction results. Finally, the average of the predictions from all classifiers is taken as the final result.
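At inference time this amounts to averaging the branch classifiers' class probabilities, as in the following sketch (which assumes the `MSDNet` sketch above):

```python
import torch

@torch.no_grad()
def predict(model, x_t: torch.Tensor) -> torch.Tensor:
    """Average the N branch classifiers' softmax outputs on a target batch."""
    probs = [torch.softmax(model(x_t, b)[1], dim=1)
             for b in range(len(model.classifiers))]
    return torch.stack(probs).mean(dim=0).argmax(dim=1)
```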
Loss function
The loss function for the model proposed in this paper is depicted in Fig. 4, and is defined as

$$\mathcal{L} = \mathcal{L}_{cls} + \mathcal{L}_{disc} + \mathcal{L}_{DAFNN} \tag{6}$$

where $\mathcal{L}_{cls}$, $\mathcal{L}_{disc}$ and $\mathcal{L}_{DAFNN}$ represent the classification loss, the discrepancy loss and the DAFNN loss respectively. The specific descriptions of the different losses are given below. The $\mathcal{L}_{DAFNN}$ is defined as

$$\mathcal{L}_{DAFNN} = \sum_{l} d_{k}^{2}\left(\mathbf{z}_{s}^{l}, \mathbf{z}_{t}^{l}\right) + \lambda \left( \mathcal{L}_{AFN}^{s} + \mathcal{L}_{AFN}^{t} \right) \tag{7}$$

where $\mathcal{L}_{AFN}^{s}$ and $\mathcal{L}_{AFN}^{t}$ are the totals of the feature norm loss (the second term of Eq. 5) of the source and target domains in one branch, respectively, and $\lambda$ is a hyper-parameter. $\mathbf{z}_{s}^{l}$ and $\mathbf{z}_{t}^{l}$ represent the hidden representations of the $l$th layer of the source and target domains respectively, and $d_{k}^{2}(\mathbf{z}_{s}^{l}, \mathbf{z}_{t}^{l})$ means the evaluation of DAFNN between the source and target domains on the $l$th layer.
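A per-branch sketch of Eq. (7), reusing the `mk_mmd` and `safn_loss` helpers above; the weight `lam = 0.05` is an assumption (a common AFN default), not a value reported here:

```python
def dafnn_loss(feats_s: list, feats_t: list, lam: float = 0.05):
    """feats_s / feats_t: the three per-layer activations z_s^l and z_t^l."""
    mmd = sum(mk_mmd(zs, zt) for zs, zt in zip(feats_s, feats_t))  # layer-wise MK-MMD
    afn = sum(safn_loss(zs) + safn_loss(zt)                        # L_AFN^s + L_AFN^t
              for zs, zt in zip(feats_s, feats_t))
    return mmd + lam * afn
```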
Fig. 4.
Description of the model’s loss function
For the training of each classifier, the $\mathcal{L}_{cls}$ is estimated using cross-entropy, which is represented as

$$\mathcal{L}_{cls} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{E}_{\mathbf{x} \in \mathcal{X}_{s_i}} \, J\left(\hat{y}_{s_i}, y_{s_i}\right) \tag{8}$$

where $\hat{y}_{s_i}$ and $y_{s_i}$ represent the predicted and real labels of the source domain data respectively, and $N$ represents the number of classifiers. $J$ is the cross-entropy loss function, and $\mathbf{x} \in \mathcal{X}_{s_i}$ indicates that the data belongs to the source domain data set $\mathcal{X}_{s_i}$.
A simple average of the predictions of multiple classifiers may lead to significant variance. This effect is particularly pronounced for predictions close to the decision boundary, adversely affecting the final experimental results. $\mathcal{L}_{disc}$ makes the prediction results of the multiple classifiers converge, which effectively reduces this variance, as shown below,

$$\mathcal{L}_{disc} = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \mathbf{E}_{\mathbf{x} \in \mathcal{X}_{t}} \left| \hat{y}_{t_i} - \hat{y}_{t_j} \right| \tag{9}$$

where $\hat{y}_{t_i}$ and $\hat{y}_{t_j}$ represent the prediction results of two different classifiers for the target domain data, and $\mathbf{x} \in \mathcal{X}_t$ indicates that the data belongs to the target domain data set $\mathcal{X}_t$.
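The two remaining terms can be sketched directly from Eqs. (8) and (9). Whether the predictions are compared as logits or as softmax probabilities is not specified in this excerpt, so the sketch below uses probabilities:

```python
import torch
import torch.nn.functional as F

def cls_loss(logits_s: list, labels_s: list) -> torch.Tensor:
    """Average cross-entropy J over the N source-domain classifiers (Eq. 8)."""
    return sum(F.cross_entropy(p, y)
               for p, y in zip(logits_s, labels_s)) / len(logits_s)

def disc_loss(logits_t: list) -> torch.Tensor:
    """Pairwise absolute differences between branch predictions (Eq. 9)."""
    probs = [torch.softmax(p, dim=1) for p in logits_t]
    loss = 0.0
    for i in range(len(probs) - 1):
        for j in range(i + 1, len(probs)):
            loss = loss + (probs[i] - probs[j]).abs().mean()
    return loss
```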
Experiments
Datasets
The SEED (Zheng and Lu 2015; Duan et al. 2013) and SEED-IV (Zheng et al. 2018) datasets are utilized in our study, which are developed and released by the BCMI Laboratory of Shanghai Jiao Tong University.
SEED: The SEED dataset includes positive, negative and neutral emotions and involves 15 subjects (7 males and 8 females). Each subject watches 15 film clips while data are collected. There are 5 films for each emotion, with an average clip length of 226 s and a per-clip duration ranging from 165 to 185 s. Each trial includes 5 s of hints, 45 s of self-assessment, and 15 s of rest. Each subject carries out three experiments at an interval of one week, resulting in a total of 45 experiments. EEG signals are recorded through 62 electrodes at a sampling rate of 1000 Hz. After preprocessing, each signal is segmented into 1 s segments; each subject contributes 3394 samples, with approximately 1100 samples per emotion.
SEED-IV: The SEED-IV dataset includes four emotions: happiness, sadness, fear and neutrality. It also involves 15 subjects (7 males and 8 females), each of whom completes the experiment three times at different times. Each experiment involves watching 24 film clips (6 films for each emotion). Each trial consists of a 5 s start prompt and 45 s of self-assessment. After preprocessing, each signal is segmented into 4 s segments; each subject contributes approximately 850 samples, with approximately 200 samples per emotion.
Experimental setup
The model is implemented in Python using PyTorch, and the experimental environment is an Intel® Core™ i5-1135G7 @ 2.40 GHz with a 64-bit operating system. The PyTorch version is 1.7.0 and the CUDA version is 11.0. The batch size of the model is set to 64, the number of epochs is set to 20, and the learning rate of Adam (Kingma and Ba 2014) is set to 0.01. All experiments follow the 3 (sessions) × 15 (subjects) cross-validation method, resulting in 45 experiments; the average of their results is taken as the final result. For each experiment, we calculate several evaluation metrics: mean accuracy (ACC), standard deviation (STD), precision (P), recall (R), and F1 score.
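One illustrative training step, tying together the `MSDNet`, `dafnn_loss`, `cls_loss` and `disc_loss` sketches above with the reported settings (Adam, learning rate 0.01, batch size 64). This is our reconstruction of the procedure, not the authors' released code:

```python
import torch

model = MSDNet(n_sources=14)  # e.g., the cross-subject setting
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

def train_step(xs_batches, ys_batches, xt_batch):
    """xs_batches/ys_batches: one 64-sample batch per source branch;
    xt_batch: an unlabeled 64-sample target batch shared by all branches."""
    optimizer.zero_grad()
    dafnn_total, logits_s, logits_t = 0.0, [], []
    for b, xs in enumerate(xs_batches):
        feats_s, out_s = model(xs, b)        # branch b on its source domain
        feats_t, out_t = model(xt_batch, b)  # branch b on the target domain
        dafnn_total = dafnn_total + dafnn_loss(feats_s, feats_t)
        logits_s.append(out_s)
        logits_t.append(out_t)
    loss = cls_loss(logits_s, ys_batches) + disc_loss(logits_t) + dafnn_total
    loss.backward()
    optimizer.step()
    return float(loss)
```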
Results
Cross-subject experiment
In the cross-subject experiments, three groups of experiments were carried out. Each group took data from one session, and each session contained the data of 15 subjects. Initially, the data of one subject were selected as the target domain, while the data of the remaining 14 subjects served as the source domains. This process was then repeated with a different subject chosen as the target domain, so each subject's data were used as the target domain once; the session was changed after the completion of the loop, resulting in a total of 15 (subjects) × 3 (sessions) cross-subject experiments.
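The split logic itself is simple; a minimal sketch (the names are ours) for one session is:

```python
def loso_splits(n_subjects: int = 15):
    """Yield (source subject ids, target subject id) for one session."""
    for target in range(n_subjects):
        sources = [s for s in range(n_subjects) if s != target]
        yield sources, target  # 14 labeled sources + 1 unlabeled target

for sources, target in loso_splits():
    # train one MSD-DAFNN model with one branch per source subject, then
    # evaluate on the target subject; repeat for the other two sessions
    pass
```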
In the SEED and SEED-IV datasets, each subject contains 3394 and 850 samples, respectively. Hence, within a single experiment, the input data of the model are 15 (14 source domain branches + 1 target domain branch) × 3394/850 (SEED/SEED-IV) × 310 (DE features). We conducted comparative experiments under consistent parameters and calculated the average ACC ± STD, P, R, and F1 score for each model. Each result represents the average of 45 experiments, as shown in Table 1. MSD-DAN represents the initial improvement and serves as a baseline for gauging the additional gain of MSD-DAFNN.
Table 1.
The average ACC ± STD, P, R, and F1 score of different methods in cross-subject experiments
| Dataset | Method | ACC ± STD (%) | P (%) | R (%) | F1 (%) |
|---|---|---|---|---|---|
| SEED | DDC | 59.87 ± 7.28 | 73.99 | 59.90 | 52.21 |
| | DAN | 75.59 ± 7.89 | 77.78 | 75.64 | 74.93 |
| | MS-MDA | 80.41 ± 10.75 | 78.16 | 77.69 | 77.14 |
| | MSD-DAN | 83.32 ± 12.08 | 80.46 | 79.88 | 79.80 |
| | MSD-DAFNN | 85.01 ± 12.21 | 82.83 | 82.15 | 81.89 |
| SEED-IV | DDC | 43.16 ± 11.25 | 40.96 | 44.57 | 36.75 |
| | DAN | 44.46 ± 12.21 | 42.01 | 45.28 | 39.46 |
| | MS-MDA | 52.69 ± 11.92 | 45.75 | 45.76 | 42.72 |
| | MSD-DAN | 54.32 ± 11.06 | 46.07 | 46.14 | 44.61 |
| | MSD-DAFNN | 58.81 ± 10.46 | 47.67 | 47.25 | 46.07 |
In the SEED and SEED-IV datasets, DAN exhibits significantly better performance than DDC across the evaluation metrics in cross-subject experiments. This indicates that the use of multiple kernels has a positive impact on emotion recognition accuracy. In the two datasets, the average ACC of MSD-DAN is 2.91 and 1.63% higher than that of MS-MDA, respectively. Furthermore, MSD-DAN shows higher P, R and F1 scores than MS-MDA. This suggests that combining multiple source domains with a deep adaptation network can effectively enhance emotion recognition accuracy and yields a stronger ability to identify positive samples. Compared to MS-MDA, MSD-DAFNN increases the average ACC by 4.6 and 6.12%, P by 4.67 and 1.92%, R by 4.46 and 1.49%, and F1 score by 4.75 and 3.35%, respectively. This indicates that MSD-DAFNN improves the model's average accuracy and its ability to identify positive samples in cross-subject experiments. Additionally, MSD-DAFNN achieves the highest recognition accuracy among the compared methods, with an average ACC ± STD of 85.01 ± 12.21% on the SEED dataset and 58.81 ± 10.46% on the SEED-IV dataset, which indicates superior model performance.
Cross-session experiment
In the cross-session experiments, a total of 15 groups of experiments were carried out. Each group took data from one subject, and each subject contained data from 3 sessions. We first selected one session's data of a subject as the target domain and the remaining two sessions' data of this subject as the source domains, and then rotated a different session into the target role. In this way each session's data was used as the target domain once, and the subject was changed after the completion of the loop, so the final number of cross-session experiments is 3 (sessions) × 15 (subjects). In the SEED and SEED-IV datasets, the input data of the model in a single experiment are 3 (2 source domain branches + 1 target domain branch) × 3394/850 (SEED/SEED-IV) × 310 (DE features). We conducted cross-session comparative experiments with consistent parameters and calculated the average ACC ± STD, P, R and F1 score of each model. Each result is the average of 45 experiments. The detailed results are shown in Table 2.
Table 2.
The average ACC ± STD, P, R, and F1 score of different methods in cross-session experiments
| Dataset | Method | ACC ± STD (%) | P (%) | R (%) | F1 (%) |
|---|---|---|---|---|---|
| SEED | DDC | 73.97 ± 7.97 | 81.45 | 74.01 | 70.80 |
| | DAN | 85.51 ± 9.29 | 86.57 | 85.56 | 85.26 |
| | MS-MDA | 88.21 ± 10.64 | 88.16 | 87.89 | 87.57 |
| | MSD-DAN | 88.41 ± 8.64 | 88.35 | 87.96 | 87.97 |
| | MSD-DAFNN | 91.93 ± 8.08 | 91.88 | 91.69 | 91.62 |
| SEED-IV | DDC | 50.33 ± 15.51 | 52.93 | 52.81 | 47.17 |
| | DAN | 51.26 ± 15.16 | 54.67 | 51.85 | 48.31 |
| | MS-MDA | 56.68 ± 15.71 | 60.54 | 57.21 | 55.48 |
| | MSD-DAN | 57.13 ± 12.62 | 60.66 | 58.19 | 57.36 |
| | MSD-DAFNN | 59.51 ± 14.58 | 64.46 | 61.78 | 61.11 |
Since the data of the cross-session experiments all come from a single subject, the degree of individual difference is much smaller than in the cross-subject experiments. Consequently, the accuracy and evaluation metrics of the cross-session experiments are higher than those of the cross-subject experiments. In the SEED and SEED-IV datasets, the average ACC, P, R, and F1 score of DAN are higher than those of DDC. Similarly, the evaluation metrics of MSD-DAN surpass those of MS-MDA. This indicates that the multiple-kernel adaptive method improves both the average ACC and the identification of positive samples in single and multiple source domains. From Table 2, it is evident that MSD-DAFNN achieves the highest average accuracy. In comparison to MS-MDA, MSD-DAFNN increases the average ACC by 3.72 and 2.83%, P by 3.72 and 3.92%, R by 3.8 and 4.57%, and F1 score by 4.05 and 5.63%, respectively. This demonstrates that MSD-DAFNN enhances emotion recognition in cross-session experiments and enables the model to better identify positive samples. The improvement in average accuracy signifies a stronger predictive ability, while the increase in F1 score indicates better handling of imbalanced data.
Ablation experiment
In order to study the impact of each module in MSD-DAFNN, we conducted ablation experiments on the SEED dataset. We removed each module of MSD-DAFNN in turn and compared the performance after ablating the deep adaptive module, the AFN module, and both. Following the experimental steps described in the comparative experiments, we conducted cross-subject and cross-session experiments for all models and evaluated the final results in terms of average ACC ± STD, P, R, and F1 score. The results are shown in Table 3, where each result represents the average of 45 experiments; the input data of the models remained entirely consistent throughout. To observe the impact of each module on per-subject accuracy more intuitively, we averaged each subject's accuracy over the three sessions. The results are shown in Fig. 5.
Table 3.
Comparison of the ablation experiment in the SEED dataset
| Experiment | Method | ACC ± STD (%) | P (%) | R (%) | F1 (%) |
|---|---|---|---|---|---|
| Cross-subject | MSD-DAFNN | 85.01 ± 12.21 | 82.83 | 82.15 | 81.89 |
| | (-) AFN | 83.32 ± 12.08 | 80.46 | 79.88 | 79.80 |
| | (-) Deep Adaptive | 81.55 ± 11.39 | 79.71 | 78.77 | 78.38 |
| | (-) Both modules | 80.41 ± 10.75 | 78.16 | 77.69 | 77.14 |
| Cross-session | MSD-DAFNN | 91.93 ± 8.08 | 91.88 | 91.69 | 91.62 |
| | (-) AFN | 88.41 ± 8.64 | 88.35 | 87.96 | 87.97 |
| | (-) Deep Adaptive | 90.88 ± 8.32 | 90.84 | 90.32 | 90.22 |
| | (-) Both modules | 88.21 ± 10.64 | 88.16 | 87.89 | 87.57 |
"(-)" indicates the ablation of certain modules
Fig. 5.
Comparison of different algorithms on each subject in the SEED dataset
Firstly, we evaluated the performance of MSD-DAFNN after removing the AFN module. In cross-subject and cross-session experiments, the average ACC ± STD was 83.32 ± 12.08 and 88.41 ± 8.64%, respectively. Subsequently, we assessed the performance after solely removing the deep adaptation module, resulting in average ACC ± STD of 81.55 ± 11.39 and 90.88 ± 8.32% in cross-subject and cross-session experiments, respectively. Finally, upon removing both the AFN module and the deep adaptation module simultaneously, the model achieved an average ACC ± STD of 80.41 ± 10.75 and 88.21 ± 10.64% in cross-subject and cross-session experiments, respectively.
The ablation experiments reveal that removing either the AFN module or the deep adaptation module degrades the model performance. In cross-subject experiments, the average ACC decreases by 1.69 and 3.46% when solely removing the AFN module or the deep adaptation module, respectively. In cross-session experiments, the corresponding decreases are 3.52 and 1.05%. Moreover, upon removing both modules simultaneously, the average ACC decreases by 4.6% in cross-subject experiments and 3.72% in cross-session experiments. Figure 5(a) and (b) show bar charts of the average accuracy of each subject in the cross-subject and cross-session ablation experiments, respectively. For both settings, the accuracy of most subjects decreases after removing a single module and decreases further after removing both modules. Additionally, we analyzed P, R, and F1 scores and found that they are all affected, as shown in Table 3. In cross-subject experiments, the average P, R, and F1 score of MSD-DAFNN are 82.83, 82.15, and 81.89%, respectively, while in cross-session experiments they are 91.88, 91.69, and 91.62%, respectively. When removing both the AFN and deep adaptation modules, the average P, R, and F1 score decrease by 4.67, 4.46, and 4.75% in cross-subject experiments, and by 3.72, 3.8, and 4.05% in cross-session experiments, respectively.
Visualization of results
To better compare the effects of our method, we utilized t-SNE (van der Maaten and Hinton 2008) for visualization in the cross-subject experiment. We chose the input data of the last network layer to compute t-SNE; the results are shown in Fig. 6. Compared with cross-session experiments, cross-subject experiments have more source domains and can thus provide a more comprehensive visualization. In the plots, different colors represent different source domains, while the black region is the target domain. In the DAN experiment, since the method does not employ multiple source domains, the source domains are mixed together in a single color. It is obvious that the target domain of DAN does not completely coincide with the source domain, indicating low domain adaptability. In the MS-MDA experiment, the different source domains are scattered, and each color shows some closeness to and overlap with the black target domain. However, the overlap between the source and target domains is not very pronounced, which indicates that MS-MDA can adapt the target domain to multiple source domains, but its domain adaptability is still limited. In the MSD-DAFNN experiment, the different colors are clearly more dispersed, and the source domain of each color almost overlaps with the black target domain. This indicates that each source domain is highly adapted to the target domain, and the domain adaptation effect is significant.
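A minimal sketch of this visualization; the feature matrix and domain labels below are random stand-ins for the real last-layer activations of the 14 source branches and the target domain:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

feats = np.random.randn(1500, 32)           # last-layer activations (stand-in)
domain_ids = np.repeat(np.arange(15), 100)  # domains 0-13: sources, 14: target
emb = TSNE(n_components=2, perplexity=30.0, init="pca").fit_transform(feats)
for d in np.unique(domain_ids):
    m = domain_ids == d
    plt.scatter(emb[m, 0], emb[m, 1], s=4,
                color="black" if d == 14 else plt.cm.tab20(d % 20))
plt.title("Feature distribution (t-SNE)")
plt.show()
```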
Fig. 6.
Scatter plot of feature distribution
Performance of models
We constructed confusion matrices for the different methods on the SEED dataset, as shown in Fig. 7. Each row of a confusion matrix represents the real label and each column represents the predicted label, with negative, neutral, and positive emotions ordered from left to right and from top to bottom. We selected the data of the same subject for this experiment. In order to reduce the influence of randomness, we ran the experiment on each of the subject's three sessions, and the final result is the average of the three runs.
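A sketch of this computation; `session_results` is a random stand-in for the (true, predicted) label pairs of one subject's three sessions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def normalized_cm(y_true, y_pred):
    """Row-normalized confusion matrix: rows are real labels, columns predicted."""
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2]).astype(float)
    return cm / cm.sum(axis=1, keepdims=True)

session_results = [(np.random.randint(0, 3, 500), np.random.randint(0, 3, 500))
                   for _ in range(3)]  # replace with real per-session labels
final_cm = np.mean([normalized_cm(t, p) for t, p in session_results], axis=0)
print(final_cm.round(2))
```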
Fig. 7.
Confusion matrices of models in the SEED dataset
On the SEED dataset, Fig. 7(a) shows that DAN identifies positive emotions well but has low recognition accuracy for neutral and negative emotions. In Fig. 7(b), MS-MDA achieves recognition accuracies of 73 and 76% for neutral and negative emotions, respectively, increases of 20 and 14% over DAN. In Fig. 7(c), the recognition accuracies of MSD-DAFNN for neutral and negative emotions reach 80 and 77%, respectively, which are 7 and 1% higher than those of MS-MDA. These results indicate that, compared to the other methods, MSD-DAFNN achieves higher recognition accuracy and lower error rates.
Discussion
We conducted several experiments to verify whether MSD-DAFNN can enhance the model's domain adaptability and improve the accuracy of emotion recognition. In addition, there are some aspects that are worth discussing.
Xu et al. (Xu et al. 2018) proposed that the instability of target domain recognition in domain adaptation stems from the feature norm of the target domain being much smaller than that of the source domain. In the field of EEG emotion recognition, researchers have neglected the effect of the feature norm on the transferability of task-specific features. Therefore, we combined deep domain adaptation with the adaptive feature norm and proposed a new domain adaptation method, MSD-DAFNN, for EEG emotion recognition. It adopts a three-layer network structure for deep domain adaptation and continually adjusts the feature norms of the source and target domains to increase the transferability of task-specific features. Our experimental results verified that MSD-DAFNN has better domain adaptability and can improve the accuracy of emotion recognition.
In the study of domain adaptation, the single source domain method places all subjects in the same source domain. This ignores the differences in marginal distribution among subjects and increases the difficulty of domain adaptation to a certain extent. Therefore, we combined MSD-DAFNN with the multiple source domain method and established an individual branch for each subject. We applied MSD-DAFNN in single and multiple source domain settings to verify the higher emotion recognition accuracy of the multiple source domain method, conducting cross-subject and cross-session experiments on the SEED dataset. The three sessions of each subject were tested separately and their average values were taken as the final results. For a more intuitive comparison, the average accuracy of each subject is drawn as a line chart. Figure 8(a) and (b) illustrate the results of the cross-subject and cross-session experiments, respectively. The average accuracy of each subject with multiple source domains is clearly higher than with a single source domain. The experimental results show that combining multiple source domains with MSD-DAFNN is more beneficial to EEG emotion recognition.
In order to evaluate the novelty and superiority of MSD-DAFNN, we compared our method with several recent EEG emotion recognition methods on the SEED dataset; the experimental results are shown in Table 4. We set the number of epochs of MSD-DAFNN to 80 for superior model performance, and † indicates a result we reproduced ourselves with the same parameters. MDTDDL (Gu et al. 2022) integrates transfer learning and dictionary learning into one learning model for emotion recognition. PPDA (Zhao et al. 2021) is a plug-and-play domain adaptation method that divides EEG representations into private components specific to each subject and emotional components shared by all subjects. MEERNet (Chen et al. 2021b) considers both domain-invariant and domain-specific features for emotion recognition. PR-PL (Zhou et al. 2023) learns discriminative and generalized EEG features to reveal emotions across individuals and formulates emotion recognition as pairwise learning. We evaluated the cross-subject and cross-session experiments for the different methods; the average ACC ± STD of MSD-DAFNN is 87.88 ± 9.11% in the cross-subject experiment and 93.25 ± 6.30% in the cross-session experiment. The results show that MSD-DAFNN achieves the highest average accuracy in both settings, which indicates the novelty and superiority of our method.
The evaluation metrics of the model on the SEED dataset are significantly higher than on the SEED-IV dataset. The reason is that SEED is a three-class dataset while SEED-IV is a four-class dataset, and predicting labels becomes harder as the number of classes grows. In addition, the quality of the dataset is also a key factor affecting classification accuracy. We have verified the superiority of the proposed method on three- and four-class tasks; in future work, we will try more complex classification tasks to further verify the domain adaptability of MSD-DAFNN.
EEG is a non-invasive, safe and easily accessible method for studying emotions. Emotions are associated with specific patterns of brain activity, and EEG can directly measure the electrical activity of the brain and reveal these neural processes. Therefore, our study is based on EEG; in future studies we will try to combine multimodal approaches with other physiological signals for more accurate emotion recognition.
Emotion recognition based on EEG holds extensive application potential, for example in human–computer interaction, psychological therapy, and education. In practical applications, acquiring a large amount of labeled data is expensive and time-consuming, and traditional models typically require training on user-specific data. Transfer learning allows models to transfer knowledge learned from one task to another related task, which can enhance a model's adaptability and generalization and offers better prospects for practical applications. MSD-DAFNN is a new domain adaptation method with stronger domain adaptability, which can improve the accuracy of transfer learning in the field of emotion recognition.
Fig. 8.
Comparative experiment between single and multiple source domains in the SEED dataset
Table 4.
The average ACC ± STD (%) of the compared methods
| Method | Cross-subject | Cross-session |
|---|---|---|
| MDTDDL (Gu et al. 2022) | 76.75 ± 7.06 | – |
| PPDA (Zhao et al. 2021) | 86.70 ± 7.10 | – |
| † MS-MDA (Chen et al. 2021a) | 82.45 ± 10.70 | 90.42 ± 8.75 |
| MEERNet (Chen et al. 2021b) | 87.10 ± 2.00 | 86.20 ± 5.80 |
| PR-PL (Zhou et al. 2023) | 85.56 ± 4.78 | 93.06 ± 5.12 |
| MSD-DAFNN (ours) | 87.88 ± 9.11 | 93.25 ± 6.30 |
Conclusions
In this paper, we proposed the MSD-DAFNN model, which combines multiple source domains with a deep adaptive method. Notably, DAFNN is a new domain adaptation method proposed in this paper, with the AFN module embedded in the deep adaptive module. The main process gradually adjusts the feature norms of the source and target domains so that they become asymptotically stable in a large range, which improves the transferability of task-specific features and the domain adaptability of the model. Compared with other methods, the proposed method achieves state-of-the-art performance on the SEED and SEED-IV datasets, which means that MSD-DAFNN has the highest recognition accuracy and the strongest ability to identify positive samples. The experimental results show that combining the AFN module with the deep adaptive module can improve EEG emotion recognition in transfer learning.
Acknowledgements
This work was supported by the Key Research and Development Project of Zhejiang Province (2020C04009) and the Laboratory of Brain Machine Collaborative Intelligence of Zhejiang Province (2020E10010).
Data availability
The datasets analyzed during the current study are available in the SEED repository, https://bcmi.sjtu.edu.cn/~seed/index.html.
Declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- Ahmed F, Bari ASMH, Gavrilova ML (2019) Emotion recognition from body movement. IEEE Access 8:11761–11781 [Google Scholar]
- Borgwardt KM, Gretton A, Rasch MJ et al (2006) Integrating structured biological data by Kernel Maximum Mean Discrepancy. Bioinformatics 22(14):e49–e57 [DOI] [PubMed] [Google Scholar]
- Bozhokin SV, Suslova IB (2015) Wavelet-based analysis of spectral rearrangements of EEG patterns and of non-stationary correlations[J]. Physica A 421:151–160 [Google Scholar]
- Chai X, Wang Q, Zhao Y et al (2016) Unsupervised domain adaptation techniques based on auto-encoder for non-stationary EEG-based emotion recognition[J]. Comput Biol Med 79:205–214 [DOI] [PubMed] [Google Scholar]
- Chen H, Jin M, Li Z et al (2021) MS-MDA: Multisource Marginal Distribution Adaptation for Cross-subject and Cross-session EEG Emotion Recognition [J]. Front Neurosci 15:778488 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen H, Li Z, Jin M, et al (2021) MEERNet: multi-source EEG-based emotion recognition network for generalization across subjects and sessions[C]//2021 In: 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, pp 6094–6097 [DOI] [PubMed]
- Cowie R, Douglas-Cowie E, Tsapatsoulis N et al (2001) Emotion recognition in human-computer interaction [J]. IEEE Signal Process Mag 18(1):32–80 [Google Scholar]
- Dolan RJ (2002) Emotion, cognition, and behavior [J]. Science 298(5596):1191–1194 [DOI] [PubMed] [Google Scholar]
- Duan RN, Zhu JY, Lu BL (2013) Differential entropy feature for EEG-based emotion classification[C]//2013 In: 6th International IEEE/EMBS Conference on Neural Engineering (NER). IEEE, pp 81–84
- Feng X, Cong P, Dong L et al (2023) Channel attention convolutional aggregation network based on video-level features for EEG emotion recognition [J]. Cognit Neurodynamics. 10.1007/s11571-023-10034-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ganin Y, Ustinova E, Ajakan H et al (2016) Domain-adversarial training of neural networks [J]. J Mach Learn Res 17(1):1–35 [Google Scholar]
- Ghare PS, Paithane AN (2016) Human emotion recognition using non-linear and non-stationary EEG signal[C]//2016 In: International Conference on Automatic Control and Dynamic Optimization Techniques (ICACDOT). IEEE, pp 1013–1016
- Gu X, Cai W, Gao M et al (2022) Multi-source domain transfer discriminative dictionary learning modeling for electroencephalogram-based emotion recognition [J]. IEEE Trans Comput Soc Syst 9(6):1604–1612 [Google Scholar]
- Hang W, Feng W, Du R et al (2019) Cross-subject EEG signal recognition using deep domain adaptation network[J]. IEEE Access 7:128273–128282 [Google Scholar]
- Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE[J]. J Mach Learn Res 9:2579–2605 [Google Scholar]
- Imran A, Athitsos V M (2021) Adaptive feature norm for unsupervised subdomain adaptation [C]//International Symposium on Visual Computing. Springer, Cham, pp 341-352
- Jin YM, Luo YD, Zheng WL, et al (2017) EEG-based emotion recognition using domain adaptation network[C]//2017 In: international conference on orange technologies (ICOT). IEEE, pp 222–225
- Kingma DP, Ba J (2014) Adam: a method for stochastic optimization [J]. arXiv preprint arXiv:1412.6980
- Li H, Jin YM, Zheng WL, et al (2018) Cross-subject emotion recognition using deep adaptation networks[C]//In: International conference on neural information processing. Springer, Cham, pp 403–413
- Liu ZT, Xie Q, Wu M et al (2018) Speech emotion recognition based on an improved brain emotion learning model [J]. Neurocomputing 309:145–156 [Google Scholar]
- Long M, Wang J (2015) Learning transferable features with deep adaptation networks[C]//In: International conference on machine learning. JMLR.org, pp 97–105
- Lotte F, Guan C (2010) Learning from other subjects helps reducing brain-computer interface calibration time[C]//2010 In: IEEE International conference on acoustics, speech and signal processing. IEEE, pp 614–617
- Mithbavkar SA, Shah MS (2021) Analysis of EMG based emotion recognition for multiple people and emotions[C]//2021 In: IEEE 3rd Eurasia Conference on Biomedical Engineering, Healthcare and Sustainability (ECBIOS). IEEE, pp 1–4
- Shin Y, Lee S, Ahn M et al (2015) Noise robustness analysis of sparse representation based classification method for non-stationary EEG signal classification[J]. Biomed Signal Process Control 21:8–18 [Google Scholar]
- Tzeng E, Hoffman J, Zhang N, et al (2014) Deep domain confusion: maximizing for domain invariance [J]. arXiv preprint arXiv:1412.3474
- Tzeng E, Hoffman J, Saenko K, et al (2017) Adversarial discriminative domain adaptation [C]//In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 7167–7176
- Wu G, Liu G, Hao M (2010) The analysis of emotion recognition from GSR based on PSO[C]//2010 In: International symposium on intelligence information processing and trusted computing. IEEE, pp 360–363
- Xu R, Li G, Yang J, et al (2018) Larger norm more transferable: an adaptive feature norm approach for unsupervised domain adaptation[J]
- Yosinski J, Clune J, Bengio Y, et al (2014) How transferable are features in deep neural networks?[C]//In: Advances in neural information processing systems, pp 3320–3328
- Zhao L M, Yan X, Lu B L (2021) Plug-and-play domain adaptation for cross-subject EEG-based emotion recognition[C]//In: Proceedings of the AAAI Conference on Artificial Intelligence., vol 35(1): pp 863–870
- Zheng WL, Lu BL (2015) Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks[J]. IEEE Trans Auton Ment Dev 7(3):162–175 [Google Scholar]
- Zheng WL, Liu W, Lu Y et al (2018) Emotionmeter: a multimodal framework for recognizing human emotions[J]. IEEE Trans Cybern 49(3):1110–1122 [DOI] [PubMed] [Google Scholar]
- Zheng W L, Zhang Y Q, Zhu J Y, et al (2015) Transfer components between subjects for EEG-based emotion recognition[C]//2015 In: international conference on affective computing and intelligent interaction (ACII). IEEE, pp 917–922
- Zhou R, Zhang Z, Fu H, et al (2023) PR-PL: a novel prototypical representation based pairwise learning framework for emotion recognition using EEG signals[J]. IEEE Trans Affect Comput