Health Information Science and Systems
. 2024 Feb 17;12(1):9. doi: 10.1007/s13755-024-00271-0

Enhanced performance of EEG-based brain–computer interfaces by joint sample and feature importance assessment

Xing Li 1, Yikai Zhang 1, Yong Peng 1,2,, Wanzeng Kong 1,2
PMCID: PMC10874355  PMID: 38375134

Abstract

Electroencephalography (EEG) has been a reliable data source for building brain–computer interface (BCI) systems; however, it is not reasonable to directly perform recognition on the feature vector extracted from multiple EEG channels and frequency bands, due to two deficiencies. One is that EEG signals are weak and non-stationary, so different EEG samples easily differ in quality. The other is that different feature dimensions, corresponding to different brain regions and frequency bands, have different correlations with a given mental task, which has not been sufficiently investigated. To this end, a Joint Sample and Feature importance Assessment (JSFA) model is proposed to simultaneously explore the different impacts of EEG samples and features in mental state recognition, in which the former is based on the self-paced learning technique while the latter is completed by the feature self-weighting technique. The efficacy of JSFA is extensively evaluated on two EEG data sets, i.e., SEED-IV and SEED-VIG; one is a classification task for emotion recognition and the other is a regression task for driving fatigue detection. Experimental results demonstrate that JSFA can effectively identify the importance of different EEG samples and features, leading to enhanced recognition performance of the corresponding BCI systems.

Keywords: EEG, Driving fatigue detection, Emotion recognition, Joint assessment, Sample and feature importance

Introduction

With recent developments in neurotechnology and artificial intelligence, the use of physiological signals in BCI communication has advanced from perception to higher-order cognitive activities [1]. A BCI aims to establish a direct channel to transmit information between the human brain and external devices. Compared with other physiological signals, EEG is more often used in cognitive and neuroscience research since it is closely related to the neural activities of the cerebral cortex. Thanks to rapid progress in weak-signal acquisition and analysis techniques, EEG has played important roles in diverse fields such as healthcare, disease diagnosis and rehabilitation [2]. The EEG signal is acquired by non-invasively measuring, through scalp electrodes, the electrical potentials generated by neural activities, and is characterized by low cost and high temporal resolution.

Studies have shown that EEG signals dominate recent research efforts in physiological signal-based affective computing and fatigue detection. Many machine learning paradigms, such as semi-supervised learning, transfer learning and deep learning, have been employed to classify emotional states from EEG [3]. Transfer learning is a domain-adaptive approach that finds a common space where inter-subject (inter-session) EEG data discrepancies are reduced while discriminative information across different subjects (sessions) is preserved [4]. Deep feature representations usually obtain better recognition results than manually extracted features. For example, by combining long short-term memory and attention networks, the formulated S-LSTM-ATT model can effectively handle time-series EEG data and recognize intrinsic connections and patterns [5]. However, the learned features are less interpretable. Similarly, among the different data modalities used in fatigue detection, EEG has been considered the gold standard, offering more objective detection results [6–8].

The weak and non-stationary properties make EEG data easily contaminated by noise, meaning that the quality of different EEG samples might differ in mental state recognition. Besides, we usually extract EEG features from multiple frequency bands and channels and then concatenate them to form sample vectors for subsequent processing. As a result, features of different dimensions contribute differently to the recognition task. Though some researchers have tried to identify the different functions of different EEG channels and rhythms [9], the metaheuristic optimization method used has a limited theoretical basis. Most existing models in EEG-based BCI research do not simultaneously take the quality of EEG samples and features into consideration, which obviously violates the common sense that different EEG samples as well as different features contribute differently to mental state recognition. More reasonably, it is necessary to weaken the impact of noisy samples and enhance the impact of high-quality samples to improve model robustness. Similarly, the importance of different EEG features should also be explored to improve the model's discriminative ability.

To jointly complete the above mentioned two tasks, we propose a novel JSFA model to jointly measure the importance of EEG samples and features for mental state recognition. Because EEG samples in vector representation are usually arranged as rows (or columns) of the data matrix, JSFA intuitively performs the measurement along both the horizontal and vertical directions. Specifically, we use the self-paced learning technique to gradually incorporate samples into model training from easy to more difficult ones. Meanwhile, feature weighting technique is used to adaptively assign large or small weights to different EEG features according to their different contributions in mental state recognition. By jointly measuring the importance of EEG samples and features, on one hand, the EEG decoding performance of BCI systems is greatly enhanced; on the other hand, the critical EEG frequency bands and channels in characterizing the mental states of respective BCI tasks can be automatically identified according to the correspondence between each EEG feature dimension and the specific frequency band (channel).

In summary, the contributions of this paper are as follows.

  • We propose to jointly measure the importance of samples and features in EEG-based BCI systems. As far as we know, this is the first attempt to improve BCI system performance by investigating the importance of both samples and features. As a result, the importance value of each sample and feature is quantitatively obtained. The detailed JSFA model formulation as well as its optimization method are provided.

  • Beyond improving the recognition performance of BCI systems, JSFA provides a basis for further knowledge discovery from EEG data. That is, according to the correspondence between EEG spectral features and the frequency bands (channels), we automatically identify the specific spatial-frequency EEG patterns of a certain BCI task from the learned feature importance variable.

  • Our proposed JSFA model is flexible enough for both classification and regression tasks, making it competent for diverse EEG-based BCI paradigms. In our experiments, it performs well on two representative BCI paradigms, i.e., emotion recognition and driving fatigue detection, which correspond to classification and regression tasks, respectively.

The rest of this paper is structured as follows. Section “Related works” reviews some recent advances in EEG-based BCI research and related techniques. In section “Method”, we propose the novel JSFA model and solve the derived optimization problem by an efficient iterative algorithm. In section “Experimental studies”, comparative studies are conducted to demonstrate the effectiveness of JSFA by applying it to a synthetic data set, an emotion recognition data set SEED-IV (i.e., a classification task) and a driving fatigue detection data set SEED-VIG (i.e., a regression task). We show the conclusions in section “Conclusion”.

Notations We use lowercase letters, boldface lowercase letters and boldface uppercase letters to denote scalars, vectors, and matrices, respectively. For a matrix $\mathbf{M}$, its $i$-th row and $j$-th column are denoted as $\mathbf{m}^i$ and $\mathbf{m}_j$, respectively. The EEG frequency bands are denoted as Delta, Theta, Alpha, Beta and Gamma.

Related works

Below we review the related works from two aspects, recent advances in EEG-based BCIs with the emphasis on emotion recognition and driving fatigue detection, and the related techniques on sample and feature purification.

EEG-based BCIs

In a narrow sense, EEG-based BCIs aim to establish a new type of information communication and control channel between the brain and external devices by decoding EEG signals and translating them into commands [10]. Typical BCI paradigms include motor imagery, P300, and the steady-state visual evoked potential, among others. The central problem in EEG-based BCIs is how to accurately decode EEG signals, which is usually formulated as a pattern recognition task. In this work, EEG-based emotion recognition and driving fatigue detection are considered in our experiments; that is, we need to determine the emotional states of subjects and the fatigue indices of drivers from EEG signals.

According to the general pipeline of pattern recognition, a typical EEG data analysis procedure includes three consecutive stages: data preprocessing, feature extraction and model learning. Data preprocessing consists of operations such as down-sampling, filtering and artifact removal, in order to provide clean and reliable EEG data for subsequent analysis [11]. Then, EEG features can be extracted from different domains to depict the abundant characteristics of EEG data [12]. Time-domain features, such as event-related potentials, statistics, energy, power, and high-order zero-crossing analysis, are the most intuitive because raw EEG data is multi-channel time-series data. Research has shown that frequency-domain features, such as the power spectral density (PSD), event-related synchronization (desynchronization), high-order spectrum, and differential entropy (DE), are more stable than time-domain ones [13]. Sometimes, time–frequency features are necessary to capture frequency information that varies over time, which can be achieved by wavelet transformation. To exploit the multi-channel property, connectivity features have been developed to utilize spatial information. For example, the differential asymmetry and rational asymmetry features respectively explore the difference and ratio of features on symmetric electrodes of the left and right hemispheres. Brain networks that encode functional connectivity among electrodes also provide useful information for EEG decoding [14].

On machine learning-based EEG feature transformation and mental state recognition, considerable efforts have been made in the past decades. We can roughly divide the existing models into two categories: linear and nonlinear ones. The nonlinear models are mainly implemented by the kernel trick or neural networks. To explore the complementary information among the spatial, temporal and spectral domains of EEG signals, a spatio-temporal-spectral network termed STSNet was proposed for subject-independent EEG-based emotion recognition [15]. In [16], the random vector functional link network was extended to semi-supervised learning by jointly optimizing the model variables and estimating the emotional states of the unlabeled EEG samples. In [17], a classification procedure was proposed by combining correlation-based feature selection and a k-nearest neighbor classifier for EEG-based attention recognition. In [18], by incorporating an adaptive graph learning strategy into the semi-supervised regression framework, the authors achieved average results of 78.18%, 80.55% and 81.99% on the three subject-dependent cross-session emotion recognition tasks. By exploring the label-common and label-specific EEG features in cross-session EEG emotion recognition, the proposed JCSFE model not only obtained improved recognition performance but also provided data-driven EEG spatial-frequency activation patterns [19]. Because EEG data discrepancies usually appear in cross-subject (cross-session) scenarios, transfer learning models have been widely used to enhance model universality in motor imagery [20], emotion recognition [21, 22], driving fatigue detection [23], and epileptic recognition [24]. It is worth mentioning that some deep learning models unify the feature extraction and recognition stages, leading to an end-to-end mode. Though deep models exhibit promising performance in diverse BCI applications, their interpretability still needs to be improved [25].
Recent advances in emotion recognition and driving fatigue detection can be found in [26, 27], respectively.

Related techniques

Based on the fact that EEG features are often extracted from multiple frequency bands and channels, they should have different impacts in mental state recognition. In [28], a unified framework was proposed for feature importance learning, which is well suited for evaluating the importance of EEG frequency bands and channels in emotion recognition. EEG features originating from different brain regions have different correlations with drowsiness, based on which a softmax feature weighting technique was incorporated into episodic training for driver drowsiness estimation [29]. In [30], the authors used a short-tailed Gaussian function to weight the common spatial pattern features rather than discarding unreliable features for EEG-based motor imagery.

Besides, EEG data is weak and easily contaminated by noise, which sometimes makes the obtained EEG samples unreliable for subsequent analysis. Some robust learning methods have been proposed to enhance model robustness in (but not limited to) EEG data analysis. Peng et al. [31] proposed a robust face recognition model based on structured sparse representation, whose objective was optimized under the half-quadratic framework. To handle low-quality test data, a neural process method was proposed to more robustly estimate vigilance from EEG [32]. Recently, Meng et al. proposed a robust learning theory for modeling and overcoming noise by self-paced learning (SPL) [33, 34], inspired by the fact that easy concepts should be taught before difficult ones in teaching–learning activities. Following this idea, machine learning models should be trained in a self-paced fashion, that is, by gradually incorporating samples from easy to difficult. Since then, the SPL technique has been an effective tool for improving the robustness of existing models such as matrix factorization [35], feature selection [36] and multi-view learning [37]. In this paper, we mainly rely on the SPL technique for sample quality assessment.

Method

This section first presents the JSFA model formulation and then its optimization. Besides, discussions are provided to illustrate the rationality of SPL-based robust learning and the use of EEG feature weights in spatial-frequency pattern analysis.

Model formulation

Without loss of generality, below we take the classification paradigm as an example to state the EEG decoding settings. We are given the training samples $\mathbf{X}=[\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n]\in\mathbb{R}^{d\times n}$, where $n$ and $d$ respectively denote the number of training samples and the feature dimensionality. The corresponding label matrix is $\mathbf{Y}=[\mathbf{y}_1;\mathbf{y}_2;\ldots;\mathbf{y}_n]\in\mathbb{B}^{n\times c}$, where $c$ represents the number of classes. Each $\mathbf{y}_i\in\mathbb{B}^{1\times c}$ ($i=1,\ldots,n$) uses one-hot encoding to indicate the category of the $i$-th sample.

As shown in Fig. 1, we propose to measure the importance along both the sample and feature directions. Concretely, we use $v_i$ ($i=1,\ldots,n$) to characterize the importance of the $n$ samples $\mathbf{x}_i$ and $\theta_j$ ($j=1,\ldots,d$) to depict the importance of the $d$ EEG features. Then, we should learn both vectors $\mathbf{v}\in\mathbb{R}^{n}$ and $\boldsymbol{\theta}\in\mathbb{R}^{d}$ (which satisfies $\boldsymbol{\theta}\ge 0$ and $\mathbf{1}^T\boldsymbol{\theta}=1$) from the given EEG data. To this end, we treat both vectors as variables and incorporate them into the least squares regression formulation due to its simplicity and efficacy, and propose the following objective function of our JSFA model:

$$\min_{\mathbf{W},\mathbf{v},\mathbf{b},\boldsymbol{\theta}}\ \frac{C}{2}\sum_{i=1}^{n} v_i\left\|\mathbf{x}_i^T\boldsymbol{\Theta}\mathbf{W}+\mathbf{b}^T-\mathbf{y}_i\right\|_2^2+\frac{1}{2}\|\mathbf{W}\|_2^2+f(\lambda,\mathbf{v}),\quad \text{s.t.}\ \mathbf{v}\ge 0,\ \boldsymbol{\theta}\ge 0,\ \mathbf{1}^T\boldsymbol{\theta}=1. \tag{1}$$

In Eq. (1), $\boldsymbol{\Theta}\in\mathbb{R}^{d\times d}$ is a diagonal matrix whose $j$-th diagonal element is defined as $\Theta_{jj}=\sqrt{\theta_j}$ ($j=1,\ldots,d$). $\mathbf{W}\in\mathbb{R}^{d\times c}$ and $\mathbf{b}\in\mathbb{R}^{c}$ are respectively the slope and intercept variables of the least squares regression. $C$ is a parameter to depict the impact of the first term in objective function (1). $f(\lambda,\mathbf{v})$ is a regularization term associated with variable $\mathbf{v}$ and parameter $\lambda$, which determines how the EEG samples are incorporated into model learning and how their importance values are calculated. Formally, this term is called the self-paced function since it controls the learning pace through the age parameter $\lambda$ ($\lambda>0$). In this paper, we define its exact form according to the linear self-paced regularization proposed in [34]. That is,

$$f(\lambda,\mathbf{v})=\frac{\lambda}{2}\sum_{i=1}^{n}\left(v_i^2-2v_i\right),\quad \text{s.t.}\ v_i\ge 0,\ i=1,\ldots,n. \tag{2}$$

By introducing an intermediate variable $\mathbf{A}$ to replace $\boldsymbol{\Theta}\mathbf{W}$, Eq. (1) can be rewritten as

$$\min_{\mathbf{A},\mathbf{v},\mathbf{b},\boldsymbol{\theta}}\ \frac{C}{2}\sum_{i=1}^{n} v_i\left\|\mathbf{x}_i^T\mathbf{A}+\mathbf{b}^T-\mathbf{y}_i\right\|_2^2+\frac{1}{2}\left\|\boldsymbol{\Theta}^{-1}\mathbf{A}\right\|_2^2+f(\lambda,\mathbf{v}),\quad \text{s.t.}\ \mathbf{v}\ge 0,\ \boldsymbol{\theta}\ge 0,\ \mathbf{1}^T\boldsymbol{\theta}=1. \tag{3}$$

When $\mathbf{v}$, $\mathbf{A}$ and $\mathbf{b}$ are fixed, based on the definition of $\Theta_{jj}$ and the constraint $\mathbf{1}^T\boldsymbol{\theta}=1$, we have the following equation

$$\min_{\boldsymbol{\theta}\ge 0,\,\mathbf{1}^T\boldsymbol{\theta}=1}\left\|\boldsymbol{\Theta}^{-1}\mathbf{A}\right\|_2^2=\min_{\boldsymbol{\theta}\ge 0,\,\mathbf{1}^T\boldsymbol{\theta}=1}\sum_{j=1}^{d}\frac{\|\mathbf{a}^j\|_2^2}{\theta_j}. \tag{4}$$

The corresponding Lagrangian function with respect to $\boldsymbol{\theta}$ is

$$\mathcal{L}(\boldsymbol{\theta})=\sum_{j=1}^{d}\frac{\|\mathbf{a}^j\|_2^2}{\theta_j}+\eta\left(\mathbf{1}^T\boldsymbol{\theta}-1\right)+\boldsymbol{\theta}^T\boldsymbol{\beta}, \tag{5}$$

where $\eta\in\mathbb{R}$ and $\boldsymbol{\beta}\in\mathbb{R}^{d}$ are two Lagrange multipliers. By setting $\partial\mathcal{L}(\boldsymbol{\theta})/\partial\theta_j$ to zero and taking the normalization constraint on $\boldsymbol{\theta}$ into consideration, the solution to $\boldsymbol{\theta}$ is calculated as

$$\theta_j=\frac{\|\mathbf{a}^j\|_2}{\sum_{j'=1}^{d}\|\mathbf{a}^{j'}\|_2}. \tag{6}$$

Then, we have the equivalent form of Eq. (4) as

$$\min_{\boldsymbol{\theta}\ge 0,\,\mathbf{1}^T\boldsymbol{\theta}=1}\left\|\boldsymbol{\Theta}^{-1}\mathbf{A}\right\|_2^2=\|\mathbf{A}\|_{2,1}^2. \tag{7}$$

Now, Eq. (3) can be rewritten as

$$\min_{\mathbf{A},\mathbf{v},\mathbf{b}}\ \frac{C}{2}\sum_{i=1}^{n} v_i\left\|\mathbf{x}_i^T\mathbf{A}+\mathbf{b}^T-\mathbf{y}_i\right\|_2^2+\frac{1}{2}\|\mathbf{A}\|_{2,1}^2+f(\lambda,\mathbf{v}). \tag{8}$$
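As a quick numerical sanity check of the derivation above (a sketch using a small hypothetical matrix $\mathbf{A}$, not taken from the paper), the closed-form weights of Eq. (6) indeed attain the minimum of the weighted objective in Eq. (4), which equals the squared $\ell_{2,1}$-norm of Eq. (7):

```python
import numpy as np

# Hypothetical small matrix A (rows correspond to EEG feature dimensions).
A = np.array([[3.0, 4.0],   # ||a^1||_2 = 5
              [0.0, 2.0],   # ||a^2||_2 = 2
              [1.0, 0.0]])  # ||a^3||_2 = 1

row_norms = np.linalg.norm(A, axis=1)
theta = row_norms / row_norms.sum()      # Eq. (6): theta_j proportional to 5, 2, 1
weighted = np.sum(row_norms ** 2 / theta)  # objective of Eq. (4) at the optimum
l21_sq = row_norms.sum() ** 2              # ||A||_{2,1}^2, Eq. (7)
print(weighted, l21_sq)  # 64.0 64.0
```

Changing any $\theta_j$ away from the normalized row norms only increases the weighted sum, which is why the minimum collapses to $\|\mathbf{A}\|_{2,1}^2$.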

Fig. 1. The overall architecture of JSFA

Model optimization

We propose to optimize v, A and b in objective function (8) by the alternating direction method; that is, one variable is updated with the others fixed. Below the detailed derivations are provided for each of the three variables.

  • Update $\mathbf{v}$ with $\mathbf{A}$ and $\mathbf{b}$ fixed. The objective function $O(\mathbf{v})$ defined on $\mathbf{v}$ is
    $$\min_{\mathbf{v}}\ \frac{C}{2}\sum_{i=1}^{n} v_i\left\|\mathbf{x}_i^T\mathbf{A}+\mathbf{b}^T-\mathbf{y}_i\right\|_2^2+f(\lambda,\mathbf{v}). \tag{9}$$
    To simplify the following notations, we use $\ell_i$ to denote the squared regression loss $\|\mathbf{x}_i^T\mathbf{A}+\mathbf{b}^T-\mathbf{y}_i\|_2^2$ on EEG sample $\mathbf{x}_i$. By merging parameter $C$ with $\lambda$ into a new one [38], we rewrite $O(\mathbf{v})$ as
    $$\min_{\mathbf{v}}\ \sum_{i=1}^{n} v_i\ell_i+\frac{\lambda}{2}\sum_{i=1}^{n}\left(v_i^2-2v_i\right),\quad \text{s.t.}\ v_i\ge 0,\ i=1,\ldots,n. \tag{10}$$
    By calculating the derivative of Eq. (10) with respect to $v_i$ and setting it to zero, we have
    $$\frac{\partial O(v_i)}{\partial v_i}=\ell_i+\lambda v_i-\lambda=0. \tag{11}$$
    It is easy to verify that the closed-form solution to $v_i$ is
    $$v_i=\begin{cases}1-\dfrac{\ell_i}{\lambda}, & \ell_i<\lambda;\\[4pt] 0, & \ell_i\ge\lambda.\end{cases} \tag{12}$$
  • Update $\mathbf{b}$ with $\mathbf{A}$ and $\mathbf{v}$ fixed. Now objective function (8) degenerates to
    $$\min_{\mathbf{b}}\ \frac{C}{2}\sum_{i=1}^{n} v_i\left\|\mathbf{x}_i^T\mathbf{A}+\mathbf{b}^T-\mathbf{y}_i\right\|_2^2. \tag{13}$$
    Denote $\mathbf{U}=\mathrm{diag}(\sqrt{v_1},\ldots,\sqrt{v_n})$ and we obtain the more compact matrix form of the above equation as
    $$\min_{\mathbf{b}}\ \left\|\mathbf{U}\left(\mathbf{X}^T\mathbf{A}+\mathbf{1}\mathbf{b}^T-\mathbf{Y}\right)\right\|_2^2. \tag{14}$$
    By calculating the partial derivative of the above equation with respect to $\mathbf{b}$ and setting it to 0, the updating rule for the intercept variable $\mathbf{b}$ is
    $$\mathbf{b}=\left(\mathbf{H}^T\mathbf{H}\right)^{-1}\left(\mathbf{T}-\mathbf{G}\mathbf{A}\right)^T\mathbf{H}, \tag{15}$$
    where $\mathbf{H}=\mathbf{U}\mathbf{1}$, $\mathbf{T}=\mathbf{U}\mathbf{Y}$ and $\mathbf{G}=\mathbf{U}\mathbf{X}^T$.
  • Update $\mathbf{A}$ with $\mathbf{v}$ and $\mathbf{b}$ fixed. The objective function defined on variable $\mathbf{A}$, i.e., $O(\mathbf{A})$, is
    $$\min_{\mathbf{A}}\ \frac{C}{2}\sum_{i=1}^{n} v_i\left\|\mathbf{x}_i^T\mathbf{A}+\mathbf{b}^T-\mathbf{y}_i\right\|_2^2+\frac{1}{2}\|\mathbf{A}\|_{2,1}^2. \tag{16}$$
    To avoid the singularity problem caused when the $\ell_2$-norm of a certain row of $\mathbf{A}$ is zero, we regularize $\|\mathbf{A}\|_{2,1}^2$ as $\big(\sum_{j=1}^{d}\sqrt{\|\mathbf{a}^j\|_2^2+\epsilon}\big)^2$, where $\epsilon>0$ is a small enough constant. Then, we have
    $$\min_{\mathbf{A}}\ \frac{C}{2}\left\|\mathbf{U}\left(\mathbf{X}^T\mathbf{A}+\mathbf{1}\mathbf{b}^T-\mathbf{Y}\right)\right\|_2^2+\frac{1}{2}\Bigg(\sum_{j=1}^{d}\sqrt{\|\mathbf{a}^j\|_2^2+\epsilon}\Bigg)^2. \tag{17}$$
    According to the previous definitions of $\mathbf{G}$, $\mathbf{H}$ and $\mathbf{T}$, we rewrite Eq. (17) as
    $$O(\mathbf{A})=\frac{C}{2}\left\|\mathbf{G}\mathbf{A}+\mathbf{H}\mathbf{b}^T-\mathbf{T}\right\|_2^2+\frac{1}{2}\mathrm{Tr}\left(\mathbf{A}^T\mathbf{D}\mathbf{A}\right), \tag{18}$$
    where $\mathbf{D}\in\mathbb{R}^{d\times d}$ is a diagonal matrix whose $j$-th diagonal element is calculated as
    $$d_{jj}=\frac{\sum_{p=1}^{d}\sqrt{\|\mathbf{a}^p\|_2^2+\epsilon}}{\sqrt{\|\mathbf{a}^j\|_2^2+\epsilon}}. \tag{19}$$
    Similarly, by calculating $\partial O(\mathbf{A})/\partial\mathbf{A}$ and setting its value to zero, we have
    $$C\left(\mathbf{G}^T\mathbf{G}\mathbf{A}+\mathbf{G}^T\mathbf{H}\mathbf{b}^T-\mathbf{G}^T\mathbf{T}\right)+\mathbf{D}\mathbf{A}=0. \tag{20}$$
    Thus, the updating rule for $\mathbf{A}$ is
    $$\mathbf{A}=\left(\mathbf{G}^T\mathbf{G}+\frac{\mathbf{D}}{C}\right)^{-1}\mathbf{G}^T\left(\mathbf{T}-\mathbf{H}\mathbf{b}^T\right). \tag{21}$$

Since A and D are mutually involved in respective solutions, we propose to optimize variable A iteratively, as shown in Algorithm 1. Based on the above derivations, we summarize the complete optimization procedure to JSFA objective function in Algorithm 2.
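Since the algorithms themselves appear only as figures here, the following NumPy sketch illustrates how the three updates can be interleaved. The function name `jsfa_fit` and the toy data are our own, not from the paper; $\mathbf{U}$ is taken as $\mathrm{diag}(\sqrt{v_i})$ so that the Frobenius loss reproduces the $v_i$-weighted loss, and the inner loop alternates Eq. (19) and Eq. (21) as in Algorithm 1:

```python
import numpy as np

def jsfa_fit(X, Y, C=1.0, lam=2.0, eps=1e-8, outer=20, inner=30):
    """Sketch of Algorithm 2: alternately update v (Eq. 12), b (Eq. 15)
    and A (Eqs. 19 and 21) for objective (8)."""
    d, n = X.shape
    c = Y.shape[1]
    A, b = np.zeros((d, c)), np.zeros((c, 1))
    for _ in range(outer):
        # Update v with A, b fixed: closed form of Eq. (12).
        losses = ((X.T @ A + b.T - Y) ** 2).sum(axis=1)
        v = np.where(losses < lam, 1.0 - losses / lam, 0.0)
        U = np.diag(np.sqrt(v))  # ||U(.)||_F^2 reproduces the v_i-weighted loss
        G, H, T = U @ X.T, U @ np.ones((n, 1)), U @ Y
        # Update b with A, v fixed: Eq. (15); guard against all-zero v.
        b = ((T - G @ A).T @ H) / max(float((H ** 2).sum()), eps)
        # Update A with v, b fixed: inner loop of Algorithm 1.
        for _ in range(inner):
            s = np.sqrt((A ** 2).sum(axis=1) + eps)   # sqrt(||a^j||^2 + eps)
            D = np.diag(s.sum() / s)                  # Eq. (19)
            A = np.linalg.solve(G.T @ G + D / C, G.T @ (T - H @ b.T))  # Eq. (21)
    theta = np.sqrt((A ** 2).sum(axis=1))
    theta = theta / theta.sum()                       # feature importance, Eq. (6)
    return A, b, v, theta

# Toy 3-class problem: class k has mean 2*e_k in the first three dimensions.
rng = np.random.default_rng(0)
n, d, c = 60, 5, 3
labels = rng.integers(0, c, n)
M = np.zeros((d, c))
M[np.arange(c), np.arange(c)] = 2.0
X = 0.3 * rng.normal(size=(d, n)) + M[:, labels]
Y = np.eye(c)[labels]

A, b, v, theta = jsfa_fit(X, Y, C=10.0, lam=2.0)
pred = (X.T @ A + b.T).argmax(axis=1)
print((pred == labels).mean())  # high training accuracy on this toy problem
```

Note that `lam` must exceed the initial losses (here, 1 for one-hot targets at $\mathbf{A}=\mathbf{0}$), otherwise every sample would be excluded at the first iteration.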

Algorithm 1. The algorithm to solve sub-objective (16)

Algorithm 2. The algorithm to optimize objective (8)

Below we analyze the computational complexity of using Algorithm 2 to optimize the JSFA objective function. In big-$O$ notation, the complexity of updating variable $\mathbf{v}$ is $O(dnc+nc)$, and that of updating $\mathbf{b}$ is $O(dnc+nc+n)$. Updating the projection matrix $\mathbf{A}$ consumes $O(d^3+d^2n+dnc+d^2c+n^2d)$. Considering that in the general case $n$ is larger than $d$ and $c$, the overall complexity of JSFA is $O(tn^2d)$, where $t$ is the number of optimization iterations.

Discussions on JSFA

Below we provide a brief explanation of the rationality of JSFA in joint sample and feature importance assessment.

We know that $v_i$ ($i=1,\ldots,n$) reflects the importance of the $i$-th sample. Obviously, $v_i=0$ means that the weighted loss of the $i$-th sample is zero; that is, this sample is not involved in model learning. Conventionally, all samples are treated equally, meaning that they share the same weight of one. The weighted loss of the $i$-th sample, $v_i\ell_i$, is decreased when the $i$-th sample is identified as a noisy sample and $v_i$ is assigned a small value. As pointed out by [39], the latent self-paced learning loss under the linear regularizer is

$$F_\lambda^L(\ell)=\begin{cases}\ell-\dfrac{\ell^2}{2\lambda}, & \ell<\lambda;\\[4pt] \dfrac{\lambda}{2}, & \ell\ge\lambda,\end{cases} \tag{22}$$

whose graphical illustration is provided in Fig. 2; we omit the subscript $i$ here for succinctness. Essentially, when $\lambda=\infty$, $F_\lambda^L(\ell)$ degenerates to the original least squares loss. However, when $\lambda$ is set to a reasonable value, we can easily observe the evident suppressing effect of $F_\lambda^L(\ell)$: once the loss exceeds the given threshold, $F_\lambda^L(\ell)$ becomes a constant [31]. This explains how SPL deals with outliers or heavy noise so as to improve model robustness. Specifically, if the loss value of a sample is larger than the age parameter, it has little influence on model training because of its zero gradient; in other words, the importance values $v_i$ of such samples are zero, so they have no influence on the model optimization.
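The suppressing effect of Eq. (22) can be seen directly by evaluating the latent loss; a minimal sketch (the loss values below are arbitrary):

```python
def latent_spl_loss(loss, lam):
    """Latent linear-SPL loss of Eq. (22): quadratic below lam, flat above."""
    return loss - loss ** 2 / (2 * lam) if loss < lam else lam / 2

lam = 1.0
print([round(latent_spl_loss(l, lam), 3) for l in (0.2, 0.8, 1.5, 10.0)])
# [0.18, 0.48, 0.5, 0.5] -> losses beyond lam are capped at lam/2
```

However large an outlier's residual grows, its contribution saturates at $\lambda/2$, which is exactly the robustness mechanism described above.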

Fig. 2. Graphical illustration of the latent self-paced loss $F_\lambda^L(\ell)$

Once the model training of JSFA is completed, the learned variable $\boldsymbol{\theta}$ acts as the quantitative importance of features; specifically, $\theta_i$ depicts the importance value of the $i$-th EEG feature. As shown in Fig. 3, for the widely used spectral features (i.e., power spectral density and differential entropy), each EEG feature dimension always corresponds to a certain frequency band and channel. If an EEG data set has $M$ frequency bands and $R$ channels in total, and we form the sample vector by concatenating the $R$ channel features of each of the $M$ frequency bands, then we can quantify the importance value of the $m$-th ($m=1,\ldots,M$) frequency band as

$$\phi(m)=\theta_{(m-1)R+1}+\theta_{(m-1)R+2}+\cdots+\theta_{mR}. \tag{23}$$

Similarly, we can obtain the quantitative importance of the $r$-th ($r=1,\ldots,R$) channel by

$$\psi(r)=\theta_r+\theta_{r+R}+\cdots+\theta_{r+(M-1)R}. \tag{24}$$

Recent studies have pointed out that identifying the critical frequency bands and channels not only provides more insight into the task-related EEG spatial-frequency activation patterns, but also lays a foundation for simplifying the hardware design of task-specific EEG acquisition devices in the future.
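Assuming features are concatenated band by band as described, Eqs. (23) and (24) reduce to row and column sums of $\boldsymbol{\theta}$ reshaped into an $M\times R$ grid. A small sketch with a randomly generated, hypothetical $\boldsymbol{\theta}$ (the SEED-IV dimensions $M=5$, $R=62$ are used only for illustration):

```python
import numpy as np

M, R = 5, 62                           # frequency bands and channels, as in SEED-IV
rng = np.random.default_rng(2)
theta = rng.dirichlet(np.ones(M * R))  # hypothetical learned importance, 1^T theta = 1

grid = theta.reshape(M, R)             # row m-1 holds the R channel weights of band m
phi = grid.sum(axis=1)                 # Eq. (23): band importance
psi = grid.sum(axis=0)                 # Eq. (24): channel importance
print(phi.sum(), psi.sum())            # both sum to 1, inherited from theta
```

Because the reshape is row-major, `phi[m-1]` collects exactly the entries $\theta_{(m-1)R+1},\ldots,\theta_{mR}$ of Eq. (23).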

Fig. 3. Graphical illustration of the feature importance-based identification of EEG spatial-frequency activation patterns

Experimental studies

This section conducts experiments to evaluate the effectiveness of the proposed JSFA model. This work involved human subjects in its research. Approval of all ethical and experimental procedures and protocols was granted by the Research Ethics Committee of Shanghai Jiao Tong University under Protocol No. 2017060.

Experiments on synthetic data

We first explain how the synthetic data set is constructed and then conduct experiments to illustrate the effectiveness of JSFA on joint sample and feature importance assessment.

As shown in Fig. 4a, the three Gaussian-distributed clusters in different colors correspond to three different classes, each of which consists of 150 data points. Using the one-versus-one mode, we trained three least squares regression (LSR) classifiers whose decision boundaries are shown by the corresponding lines. For example, the blue line is the decision boundary between the cyan and magenta classes. Then, we deliberately introduced some noisy samples to investigate the model robustness. There are five outliers belonging to the blue class and 15 outliers belonging to the magenta class in Fig. 4b. We find that these noisy samples significantly affect the originally obtained decision boundaries. Taking the magenta line, the boundary between the blue and cyan classes, as an example: since there is no mechanism to guarantee robustness, the LSR classifier has to take the five blue-class outliers into consideration (i.e., LSR has to try its best to correctly classify these two classes). As a result, the original decision boundary is rotated anticlockwise by some degrees.

Fig. 4. Experiments on synthetic data

Given a certain model, the fitting errors of these deliberately introduced 20 points are much larger than those of the remaining samples, indicating their poor quality. To decrease their impact on model training, they should be assigned small weights to improve the model robustness. For our proposed JSFA model, we set the age parameter $\lambda$ to 0.85 in the experiment. After model training, the average importance value of these 20 noisy samples was 0.5996, while that of the other points was 0.9724. In Fig. 4c, these samples are highlighted in boxes and the decision boundaries obtained by JSFA are almost identical to those in Fig. 4a. That is, the negative effects caused by these noisy samples are largely eliminated.

For this synthetic data set, if we want the samples to be well classified, they should be projected onto the x-axis rather than the y-axis, meaning that the first feature dimension is more discriminative than the second one. From Fig. 5b, we find that the cyan and blue classes overlap almost completely when projected onto the y-axis. Accordingly, the first feature dimension should be assigned a larger weight. By feeding this data set into JSFA, the learned feature importance vector is $\boldsymbol{\theta}=[0.7793, 0.2207]$. The value corresponding to the first feature dimension is significantly larger than that of the second. Therefore, the importance of different feature dimensions is adaptively learned by maximizing the model's discriminative ability.

Fig. 5. Data points respectively projected onto the x-axis (top) and y-axis (bottom)

Experiments on emotion recognition

Data descriptions

SEED-IV is a video-evoked emotional EEG data set. Fifteen healthy subjects were recruited for the EEG data acquisition experiments, and each subject participated in the experiments three times (also termed three sessions). Therefore, SEED-IV consists of 45 sessions in total. In each session, four different emotional states (i.e., sad, fear, happy and neutral) were elicited by asking the subjects to watch 24 well-chosen video clips, with six clips corresponding to each emotional state. While the subjects were watching the videos, EEG data was simultaneously recorded using the ESI NeuroScan system with a 62-channel cap. The electrode placement is in line with the international 10–20 system.

For each session, EEG data was partitioned into multiple 4-s non-overlapping segments, each of which corresponds to one sample for model learning. After being down-sampled from 1000 to 200 Hz, EEG data was first bandpass filtered to 1–75 Hz and then decomposed into five frequency bands, i.e., the Delta, Theta, Alpha, Beta and Gamma bands, whose frequency ranges are respectively 1–4 Hz, 4–8 Hz, 8–14 Hz, 14–31 Hz and 31–50 Hz. The DE feature was extracted from each segment in each of these five frequency bands. Assuming that EEG data is a random variable $X$ which follows the Gaussian distribution $f(x)=\mathcal{N}(x;\mu,\sigma^2)$, the DE feature [13, 40] can be calculated by

$$h(X)=-\int_{-\infty}^{+\infty} f(x)\ln\!\left[\frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\right]dx=\frac{1}{2}\ln(2\pi\sigma^2)+\frac{\mathrm{Var}(X)}{2\sigma^2}=\frac{1}{2}\ln(2\pi e\sigma^2). \tag{25}$$

From the above equation, it is easy to find an equivalence between DE and the logarithm of the power spectrum. For each frequency band, there are 62 channel-wise features. We concatenate the features of all five frequency bands together, leading to a sample dimensionality of 310 in SEED-IV. Because the video clips have slightly different time durations, there are respectively 851, 832 and 822 EEG samples in the three sessions.
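Under the Gaussian assumption, the DE feature of a band-passed segment is simply a function of its variance. A minimal sketch with synthetic Gaussian samples (not real EEG; the segment length matches a 4-s window at 200 Hz):

```python
import math
import numpy as np

rng = np.random.default_rng(3)
sigma = 2.0
segment = rng.normal(0.0, sigma, size=800)  # hypothetical 4-s segment at 200 Hz

de = 0.5 * math.log(2 * math.pi * math.e * segment.var())     # Eq. (25) estimate
closed_form = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
print(abs(de - closed_form) < 0.1)  # True: sample estimate matches the closed form
```

Since the variance of a band-passed signal equals its band power, this also makes the equivalence between DE and log band power explicit.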

Experimental settings

Since each subject has EEG data from three different sessions, we perform emotion recognition in the cross-session setting. Following the chronological order, three tasks are considered: 'session1→session2', 'session1→session3' and 'session2→session3'. For example, in the 'session2→session3' task, the labeled EEG samples from the second session of each subject are used for model training and the unlabeled samples from the third session are used for model testing.

We compare our proposed JSFA model with some closely related ones, including the support vector machine (SVM) and least squares regression (LSR). In addition, to show the effectiveness of the feature importance variable $\boldsymbol{\theta}$ and the self-paced regularization term in Eq. (8), we additionally include comparisons with Rescaled Linear Square Regression (RLSR) and self-paced learning (SPL). Here, RLSR is a supervised classification model obtained by incorporating the feature self-weighting variable $\boldsymbol{\theta}$ into LSR, which is different from the semi-supervised version proposed in [41]. SPL augments the least squares loss with the linear self-paced regularization term. In SVM, the linear kernel is used. Each of the four compared models has only one regularization coefficient $C$ to tune. The regularization parameters of the compared models were tuned from $\{2^{-25}, 2^{-24}, \ldots, 2^{25}\}$. The set $\{1.1, 1.2, \ldots, 3.0\}$ defines the search scope of the step size parameter $k$ in SPL and JSFA.

Results and analysis

Table 1 shows the results of JSFA and the other compared models, where we mark the best results in bold. These results depict the following meaningful points.

  • Though the EEG data collected in different sessions usually exhibits considerable distribution discrepancies, our proposed JSFA model still achieves promising emotion recognition accuracies. To be specific, the average accuracies of JSFA on the three cross-session tasks are 80.79%, 82.52% and 81.20%, which are respectively 4.96%, 6.74% and 4.35% higher than those of the second-placed model. Therefore, we conclude that the internal subject-dependent emotional pattern is potentially stable but obscured by a layer of external factors, and our JSFA model can effectively remove these factors by jointly filtering out meaningless EEG samples and features.

  • During the acquisition process, EEG data is easily and sometimes inevitably contaminated by different types of noise, such as that from hardware devices and other physiological signals. Therefore, it is necessary to take outliers and noise into consideration rather than ignoring them. In terms of average results, SPL outperformed LSR by 6.41%, 9.15% and 6.22% respectively in the three emotion recognition tasks, benefiting from the sample importance descriptor that adaptively increases or decreases the impact of samples. Specifically, if a sample is difficult to fit with the current model, it is considered a noisy sample and is automatically assigned a smaller weight.

  • Based on the consensus that different EEG frequency bands as well as different brain regions might have different correlations with neural activities, the discriminative abilities of the extracted EEG features should differ in mental state recognition. By introducing the feature self-weighting variable to adaptively explore the contributions of different EEG feature dimensions, RLSR obtained performance superior to LSR. Moreover, the average performance of JSFA outperforms that of SPL by 4.96%, 6.74% and 4.35% in the three cross-session tasks, indicating that adaptive learning of feature weights is beneficial for improving the recognition accuracy.

  • The experimental results show that the quality of both EEG samples and features determines the emotion recognition performance to a large extent. Therefore, seamlessly merging them into a unified model is beneficial for enhancing the recognition performance. In our view, these two aspects are complementary. In JSFA, the sample importance descriptor v is jointly optimized with the feature importance vector θ to better capture the EEG data components that are more correlated with emotion expression, leading to improved emotion recognition performance.
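The joint weighting described in the points above can be sketched minimally. The hard self-paced rule below is a common instantiation of such a sample importance descriptor; the threshold, variable names and toy losses are illustrative, not JSFA's exact formulation:

```python
import numpy as np

def self_paced_weights(losses, lam):
    """Hard self-paced weighting: samples whose loss exceeds the
    age parameter lam are treated as noisy and get zero weight."""
    return (losses < lam).astype(float)

# Toy per-sample losses: three easy samples and one hard (noisy) one.
losses = np.array([0.1, 0.2, 0.15, 2.0])
v = self_paced_weights(losses, lam=1.0)
print(v)  # -> [1. 1. 1. 0.]
```

As the age parameter lam grows during training, harder samples are gradually admitted into the model, which is the "easy samples first" behavior of self-paced learning.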

Below we perform one-way analysis of variance (ANOVA) between the experimental results obtained by JSFA and each of the other compared models. For each model, there are 15 recognition accuracies corresponding to the 15 subjects in each session, leading to a sequence of 45 recognition accuracies in total, as shown in Table 1. The null hypothesis assumes that the means of the result groups corresponding to different models are equal. Table 2 shows the p-values returned by the ANOVA function, from which we find that JSFA significantly outperforms all the other models in emotion recognition and the null hypothesis should definitely be rejected, demonstrating the effectiveness of our joint sample and feature assessment strategy.
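A significance test of this kind can be reproduced with standard tooling. The snippet below runs a one-way ANOVA on two synthetic 45-value accuracy sequences standing in for the per-subject results of two compared models; the generated numbers are illustrative, not the paper's:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
# Hypothetical accuracy sequences (45 values each), one per model.
acc_model_a = rng.normal(loc=81.5, scale=8.0, size=45)
acc_model_b = rng.normal(loc=76.2, scale=8.0, size=45)

# One-way ANOVA; reject the equal-means null hypothesis when p < 0.05.
stat, p = f_oneway(acc_model_a, acc_model_b)
print(f"F={stat:.3f}, p={p:.4f}")
```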

Table 1.

The recognition accuracies (%) of JSFA and other compared methods on SEED-IV

ID Session1→Session2 Session1→Session3 Session2→Session3
SVM LSR RLSR SPL JSFA SVM LSR RLSR SPL JSFA SVM LSR RLSR SPL JSFA
Sub1 37.50 52.88 52.64 71.15 75.00 50.85 79.44 85.77 84.79 87.47 66.79 59.73 61.19 68.61 69.83
Sub2 83.89 85.82 92.79 94.47 97.36 72.02 40.51 50.36 64.36 80.90 82.12 45.86 46.84 63.75 64.84
Sub3 51.44 79.69 79.69 84.25 84.98 41.73 61.19 69.22 71.90 80.41 61.07 67.27 77.49 77.98 78.47
Sub4 32.45 69.35 73.80 72.96 82.93 54.01 76.16 78.35 81.87 82.60 62.04 84.79 85.04 85.77 88.69
Sub5 54.33 67.43 67.43 74.16 80.65 50.24 58.76 60.95 73.48 80.17 62.04 70.92 76.64 72.63 85.40
Sub6 44.23 68.75 69.59 71.03 74.76 84.67 90.15 87.47 92.34 94.89 67.40 91.73 92.82 92.70 94.28
Sub7 71.63 86.78 92.07 93.03 95.55 65.45 82.12 84.55 87.23 95.01 81.14 81.14 94.89 90.75 96.84
Sub8 65.38 67.55 77.76 74.40 81.85 84.55 83.70 86.01 89.05 94.04 73.97 74.93 77.13 76.28 77.13
Sub9 77.28 54.93 66.71 71.15 82.69 59.49 45.50 57.18 69.70 77.25 56.20 51.82 63.50 68.49 75.55
Sub10 41.59 57.81 58.29 68.15 70.19 34.79 58.15 58.64 63.26 75.43 67.03 61.56 66.30 75.55 77.37
Sub11 48.80 59.25 60.82 63.82 66.83 62.53 63.99 63.99 69.22 75.18 53.65 69.83 76.28 70.32 83.94
Sub12 43.63 65.87 73.08 66.23 74.76 30.17 51.46 53.16 58.52 64.60 64.84 70.80 68.13 72.87 75.67
Sub13 57.21 57.57 68.03 62.50 69.83 54.62 51.58 56.57 60.58 64.72 50.85 54.26 57.18 55.47 63.75
Sub14 78.61 74.40 77.76 77.28 78.73 63.38 77.86 84.18 86.50 92.58 82.12 87.23 86.86 91.73 91.73
Sub15 88.82 93.15 91.11 92.91 95.67 84.06 78.95 82.00 83.94 92.58 85.28 87.59 90.27 89.90 94.53
Avg. 58.45 69.42 73.44 75.83 80.79 59.50 66.63 70.56 75.78 82.52 67.77 70.63 74.70 76.85 81.20
Table 2.

The analysis of the variance (ANOVA) between JSFA and each of the other models

JSFA vs. SVM JSFA vs. LSR JSFA vs. RLSR JSFA vs. SPL
p-value 4.3707e−10** 3.34879e−06** 0.0007** 0.0165*

**p-value<0.01, *p-value<0.05

To provide more details on the recognition performance of each emotional state, we reorganize the recognition accuracies of the compared models in the form of confusion matrices in Fig. 6. From this figure, we gain some insights into (1) the average recognition rate of each emotional state obtained by each compared model; (2) the rates of misclassifying samples from one class into the others; and (3) the performance improvements for each emotional state made by JSFA in comparison with the other models. Taking the neutral state as an example, the average recognition rate of JSFA is 87.77%, which is 11.78% higher than that of RLSR, i.e., 75.99%. In addition to the fact that 87.77% of the neutral EEG samples were classified correctly, the confusion matrix of JSFA also shows that 4.27%, 4.65% and 3.30% of the neutral samples were misclassified as the sad, fear and happy states, respectively. Among the four emotional states, JSFA achieved the highest recognition rate on the neutral state.
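Row-normalized confusion matrices like those in Fig. 6 can be computed as follows; the labels and predictions here are toy stand-ins for the four SEED-IV states, not the paper's results:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for the four SEED-IV states
# (0 = neutral, 1 = sad, 2 = fear, 3 = happy).
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 3, 3, 3])
y_pred = np.array([0, 0, 1, 1, 1, 2, 0, 3, 3, 2])

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3])
# Row-normalize so each row gives per-class recognition rates.
rates = cm / cm.sum(axis=1, keepdims=True)
print(rates)
```

The diagonal of `rates` gives the per-state recognition rate; off-diagonal entries give the misclassification rates from one state into another.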

Fig. 6.

Fig. 6

Confusion matrices of the compared models

Activation patterns mining from emotion data

In EEG-based emotion recognition, we are interested in more than the recognition accuracy; we also expect JSFA to perform knowledge discovery on the EEG spatial-frequency activation patterns in emotion expression. In view of the abundant frequency and channel information contained in the EEG data, it is necessary to investigate the correlations between different frequency bands (brain regions) and emotion recognition. Based on the descriptions in the second part of section “Discussions on JSFA”, JSFA provides us with a quantitative way to identify the critical frequency bands and channels in EEG-based emotion recognition according to its learned feature importance vector θ.

As shown in Fig. 7, we visualize the learned θs. For example, the θ in Fig. 7a corresponds to the average of the 15 cases in the session1→session2 task, and the last subfigure is the average across all 45 cases. Obviously, these EEG features contribute very differently to emotion recognition. We divide the horizontal axis into five intervals so that it corresponds more intuitively to the five EEG frequency bands. According to Eq. (23), the importance of all five frequency bands is quantified in Fig. 8, where the mean values are annotated on top of the bars. From these results, we experimentally demonstrate that the Gamma frequency band has the strongest correlation with the occurrence of affective effects, followed by the Delta band. Similarly, to check the significance between the Gamma band and the others, one-way ANOVA returns p-values of 0.0063, 1.1413e−16, 2.4991e−15 and 1.0236e−18, respectively, indicating that the importance of the Gamma band is significantly higher than that of the others. Though the obtained results are similar, the identification of EEG spatial-frequency activation patterns by JSFA is more flexible and adaptive in comparison with trial-and-error methods [42, 43].
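The band-level aggregation can be sketched as below. It assumes a band-major feature layout (the horizontal axis splits into five contiguous channel blocks) and that Eq. (23) averages the learned importance values within each block; both are assumptions here, since the equation is defined earlier in the paper, and the θ values are random placeholders:

```python
import numpy as np

BANDS = ["Delta", "Theta", "Alpha", "Beta", "Gamma"]
N_CH = 62  # SEED-IV channels

rng = np.random.default_rng(0)
theta = rng.random(N_CH * len(BANDS))  # stand-in for the learned importance vector

# Band-major layout: dimensions [i*N_CH, (i+1)*N_CH) belong to band i.
band_importance = {b: theta[i * N_CH:(i + 1) * N_CH].mean()
                   for i, b in enumerate(BANDS)}
print(band_importance)
```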

Fig. 7.

Fig. 7

The quantitative feature importance values learned by JSFA

Fig. 8.

Fig. 8

The mean importance of different frequency bands in SEED-IV

As analyzed above, the channel-wise EEG features (i.e., DE in this experiment) have different discriminative abilities; in turn, we want to identify the contributions of different EEG channels, and further of different brain regions, in emotion recognition. Similar to the above analysis on frequency bands, the importance of EEG channels can be measured by Eq. (24) once the feature importance vector θ is fitted to the data. Instead of directly listing their importance values here, we adopt the brain topographical map to more intuitively show the critical brain regions in Fig. 9. Obviously, the prefrontal, left/right central and (central) parietal lobes are considered more important in emotion recognition. The data-driven identification results of EEG spatial-frequency activation patterns not only provide us with more insights into the underlying neural mechanism of emotion processing, but also inspire us to design specialized EEG acquisition devices for emotion recognition in the future.
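Under the same assumed band-major layout as before, the channel-level aggregation of Eq. (24) reduces to averaging θ over the five bands for each channel. This is a sketch with placeholder θ values, not the paper's exact implementation:

```python
import numpy as np

N_BANDS, N_CH = 5, 62
rng = np.random.default_rng(1)
theta = rng.random(N_BANDS * N_CH)  # stand-in for the learned importance vector

# Reshape to (bands, channels), then average over bands per channel.
channel_importance = theta.reshape(N_BANDS, N_CH).mean(axis=0)
print(channel_importance.shape)  # -> (62,)
```

The resulting 62-value vector is what a topographical map such as Fig. 9 would plot over the scalp layout.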

Fig. 9.

Fig. 9

The spatial activation patterns of the three cross-session tasks (ac) and their average (d) identified by JSFA

Effect of the sample importance measurement

In addition to the theoretical analysis in section “Discussions on JSFA”, below we illustrate by experiments how the sample importance descriptor v acts in weighting the importance of samples and further improving the robustness of JSFA. We take the EEG samples from ‘subject 2: session 1’ as an example and visualize them in a two-dimensional subspace by the t-distributed stochastic neighbor embedding (t-SNE) method in Fig. 10a, where the four different colors correspond to the four different states in SEED-IV. Obviously, there is an overlapping area highlighted by a rectangle, whose enlarged version is provided in Fig. 10b. Below we provide an illustration from the perspective of the data acquisition paradigm in SEED-IV.
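A t-SNE projection of this kind can be obtained with scikit-learn; the data below is random and only illustrates the call, not the actual SEED-IV features:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for one session's DE features: 200 samples x 50 dims,
# with integer labels for the four emotional states.
X = rng.normal(size=(200, 50))
y = rng.integers(0, 4, size=200)

# Project to 2-D; each row of emb can then be scattered, colored by y.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)  # -> (200, 2)
```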

Fig. 10.

Fig. 10

An example to illustrate the sample importance measurement in JSFA. a The 2-D visualization by t-SNE of the samples from session 1 of subject 2; b A larger version corresponding to the rectangle part in a

In SEED-IV, self-assessment is conducted for each subject in a 45-s interval between trials. However, when a subject is completely immersed in the video clip of one trial and cannot extricate himself (herself), it is difficult for him (her) to quickly recover from that emotional state during such a short break. Then, in the front part of the next trial, this state will inevitably act as a background component, which results in inconsistencies between the extracted EEG features and the labeled emotional states. To be specific, though the EEG samples belonging to the front part of the next trial are labeled with another emotional state, the underlying EEG features might be more similar to those in the previous trial. Therefore, the samples from both trials can be considered to have similar features but different emotional labels. Correspondingly, we see some overlapped samples in the rectangle of Fig. 10, and there is no doubt that these samples are difficult to distinguish. To enhance the model robustness, we can perform model training by decreasing the weights of these samples to eliminate their side effects. Under the self-paced regularizer, these hard samples are treated as noisy ones whose weights are close to zero, while the remaining samples are treated as normal ones whose weights are close to one. As a result, samples are treated differently according to their importance in the training process, and the robustness of JSFA is enhanced by reducing the influence of noisy samples.

Experiments on driving fatigue detection

Data descriptions

SEED-VIG is a benchmark EEG data set for driving fatigue detection. During the EEG data collection experiments, subjects were asked to perform simulated driving operations in a virtual reality-based driving scene. To ensure that fatigue states definitely exist in the collected EEG data, the virtual road scenes were predominantly straight and monotonous. Besides, most experiments were conducted in the early afternoon after lunch, when circadian sleepiness peaks. 23 subjects participated in the experiments, and the data acquisition experiment for each subject lasted about 2 h. The Neuroscan system and a 62-channel EEG cap were used to record both the EEG and EOG data of the subjects during the experiments. Additionally, SMI eye-tracking glasses were worn to simultaneously capture eye movements. The PERCLOS indicator values can then be calculated to serve as the fatigue indices:

\mathrm{PERCLOS} = \frac{eye\_closing\_time}{total\_time}. (26)
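Eq. (26) is a simple ratio of eye-closure time to total time; as a sketch (the time arguments below are illustrative):

```python
def perclos(eye_closing_time, total_time):
    """PERCLOS (Eq. 26): fraction of time the eyes are closed."""
    return eye_closing_time / total_time

# E.g., eyes closed for 12 s within a 60-s window.
print(perclos(12.0, 60.0))  # -> 0.2
```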

As shown in Fig. 11, the 18 EEG channels which are mainly from the temporal and posterior lobes, and which are considered more correlated to fatigue expression, were used for extracting EEG features. SEED-VIG provides both the PSD and DE features, extracted from the same five EEG frequency bands as in SEED-IV. We used the DE features smoothed by a linear dynamical system in the subsequent experiments. Since the time window used in the short-time Fourier transform is 8 s and there is a bad channel, we get about 885 EEG samples for each subject and the feature dimensionality is 85 (i.e., 17 channels and five frequency bands).

Fig. 11.

Fig. 11

The 18 channels used in SEED-VIG data set [44]

Experimental settings

Comparative studies are conducted between JSFA and some popular regression models, i.e., support vector regression (SVR), least squares regression (LSR), rescaled least squares regression (RLSR) and self-paced learning (SPL). It is worth mentioning that here RLSR incorporates the feature importance variable θ into LSR; it is a regression model and differs from that in [41]. The available EEG samples of each subject are randomly partitioned into a training set of 600 samples and a test set of 285 samples. The parameter settings of each model are consistent with those in emotion recognition.
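The random 600/285 per-subject partition can be sketched as follows (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_train = 885, 600  # per-subject samples in SEED-VIG

# Shuffle indices once, then split into disjoint train/test sets.
perm = rng.permutation(n_samples)
train_idx, test_idx = perm[:n_train], perm[n_train:]
print(len(train_idx), len(test_idx))  # -> 600 285
```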

Two metrics, i.e., root mean square error (RMSE) and mean absolute percentage error (MAPE), are used to evaluate the regression performance of the compared models. They are respectively defined as

\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} (27)

and

\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|y_i - \hat{y}_i\right|}{y_i}, (28)

where n, y_i and ŷ_i respectively denote the number of samples, the ground-truth fatigue index and the estimated fatigue index of the i-th sample. Obviously, the smaller both metrics are, the better the driving fatigue detection performance.
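Both metrics are straightforward to implement; the sketch below follows Eqs. (27) and (28), reporting MAPE in percent as in Table 3 (the toy fatigue indices are illustrative):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, Eq. (27)."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error, Eq. (28), in percent."""
    return 100.0 * np.mean(np.abs(y_true - y_pred) / y_true)

y = np.array([0.2, 0.4, 0.5])      # ground-truth PERCLOS indices
yhat = np.array([0.25, 0.35, 0.5])  # estimated indices
print(rmse(y, yhat), mape(y, yhat))
```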

Results and analysis

According to the experimental setup above, the driving fatigue detection results are shown in Table 3, where the best ones are highlighted in bold. In terms of both regression metrics, the average performance of JSFA ranks first among all the compared models, demonstrating its superiority in EEG-based driving fatigue detection. Meanwhile, JSFA also exhibits its flexibility in handling regression tasks.

Table 3.

Driving fatigue detection results by different models evaluated by the two metrics of RMSE and MAPE

ID RMSE MAPE
SVR LSR RLSR SPL JSFA SVR LSR RLSR SPL JSFA
Sub1 0.2549 0.0441 0.0440 0.0405 0.0396 34.76 6.48 6.47 5.03 5.03
Sub2 0.1498 0.0611 0.0610 0.0606 0.0514 39.97 20.71 20.29 15.99 13.69
Sub3 0.0801 0.0404 0.0405 0.0388 0.0386 12.82 6.80 6.80 5.76 5.67
Sub4 0.1186 0.0595 0.0596 0.0593 0.0573 41.58 18.75 18.77 15.17 14.54
Sub5 0.1591 0.0657 0.0656 0.0648 0.0647 44.04 13.53 13.53 10.32 10.25
Sub6 0.2410 0.0883 0.0878 0.0875 0.0863 45.00 27.10 27.13 21.40 19.30
Sub7 0.1687 0.0485 0.0485 0.0480 0.0463 25.39 16.48 16.50 11.56 9.64
Sub8 0.1009 0.0474 0.0474 0.0474 0.0436 19.64 11.04 11.04 9.91 9.27
Sub9 0.0569 0.0280 0.0281 0.0280 0.0280 9.53 5.43 5.43 5.04 5.00
Sub10 0.4544 0.0502 0.0502 0.0502 0.0489 232.83 19.45 19.13 17.08 14.52
Sub11 0.0740 0.0404 0.0404 0.0404 0.0397 13.71 7.01 7.01 6.11 5.96
Sub12 0.1279 0.0610 0.0608 0.0601 0.0586 21.68 11.74 11.67 10.53 10.30
Sub13 0.0988 0.0488 0.0488 0.0488 0.0488 19.95 10.90 10.81 9.90 9.84
Sub14 0.1432 0.0704 0.0704 0.0698 0.0695 47.70 16.02 15.80 13.75 13.36
Sub15 0.1753 0.0408 0.0408 0.0405 0.0376 19.98 4.73 4.72 4.27 3.73
Sub16 0.1387 0.0816 0.0805 0.0806 0.0765 32.07 12.62 12.61 10.44 10.25
Sub17 0.1468 0.0387 0.0387 0.0387 0.0384 21.79 8.26 8.23 7.54 7.08
Sub18 0.2513 0.0595 0.0595 0.0595 0.0595 33.90 10.97 11.03 9.77 8.82
Sub19 0.0543 0.0409 0.0409 0.0399 0.0384 9.71 6.68 6.63 5.74 5.57
Sub20 0.0921 0.0546 0.0545 0.0527 0.0521 17.92 12.88 12.84 9.93 8.91
Sub21 0.0884 0.0294 0.0292 0.0294 0.0287 13.60 5.36 5.30 4.69 4.60
Sub22 0.1105 0.0665 0.0665 0.0649 0.0635 172.22 43.47 41.24 17.91 17.27
Sub23 0.2164 0.0780 0.0780 0.0775 0.0774 59.49 61.25 59.66 28.94 27.89
Avg. 0.1523 0.0541 0.0540 0.0534 0.0519 43.01 15.55 15.33 11.17 10.46

Since the channels used in SEED-VIG have already been fixed in the temporal, parietal and occipital lobes, instead of using all 62 channels to cover all brain regions, we do not perform critical EEG channel identification and only provide knowledge discovery on the EEG frequency bands. According to the θ learned by JSFA, we obtain the importance of different frequency bands on the two metrics in Fig. 12. Generally, the importance distributions of frequency bands in driving fatigue detection are completely different from those in the emotion recognition experiment. Below we summarize the possible reasons why the Alpha band contributes the most in driving fatigue detection, from the two perspectives of the neural mechanism of sleepiness and the characteristics of SEED-VIG itself. First, Alpha activities have been considered the most reliable index to depict the transition between wakefulness and sleepiness; specifically, they are attenuated when the vigilance of drivers decreases [45]. In other words, the EEG features corresponding to the Alpha band are sensitive to changes in drivers’ fatigue states. Therefore, the Alpha band is deservedly identified as the most important one. Second, Alpha waves generally appear more often in the parietal and occipital lobes, especially the latter. Among the 18 channels used in SEED-VIG, 12 are concentrated in these areas, which increases the chance of detecting Alpha activities. From the results shown in Table 4, we see statistical significance between the Alpha band and all the other bands except the Beta band. Though the above results are totally data-driven, they show the important role of the Alpha band in driving fatigue detection. Since the Alpha rhythm is closely related to fatigue and sleep research [46–48], we will continue to investigate this topic from both the perspectives of engineering practice and cognitive neuroscience.

Fig. 12.

Fig. 12

The mean importance of different frequency bands in SEED-VIG

Table 4.

The analysis of the variance (ANOVA) between Alpha and each of the other bands

Alpha vs. Delta Alpha vs. Theta Alpha vs. Beta Alpha vs. Gamma
p-value(RMSE) 0.0006** 0.0189* 0.2365 7.1114e−07**
p-value(MAPE) 0.0022** 0.0024** 0.2291 7.8639e−07**

**p-value<0.01, *p-value<0.05

Conclusion

In this paper, a joint sample and feature importance assessment (JSFA) model was proposed for improving the EEG decoding performance of BCI systems. Unlike the usual approaches which treat all samples and features equally, JSFA adaptively learns their importance from EEG data according to their contributions to a certain recognition task. Extensive experiments were conducted on two typical supervised EEG-based BCI applications (i.e., emotion classification and driving fatigue detection) and the results demonstrated the superiority of such a joint optimization approach in improving the recognition performance. Moreover, JSFA provided us with more insights into the occurrence of emotion and fatigue effects, mainly from the perspective of identifying the EEG spatial-frequency activation patterns. The limitations of the proposed JSFA model are mainly from two aspects. One is that JSFA is a linear model, which might not be competent enough to characterize the nonlinear structures in EEG data. The other is that the sample importance is evaluated based on the squared ℓ2-norm approximation error, whose robustness can be further improved. As future work, we will develop neural network-based nonlinear JSFA variants to improve the nonlinear feature learning ability and incorporate more robust measures of the data reconstruction error, such as the ℓ2,1-norm.

Acknowledgements

This work was supported in part by the National Key Research and Development Program of China (Grant No. 2023YFE0114900) and in part by the National Natural Science Foundation of China (Grant No. 61971173).

Author contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by XL, YZ, YP, and WK. The first draft of the manuscript was written by XL and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Funding

This work was supported in part by the National Key Research and Development Program of China (Grant No. 2023YFE0114900) and in part by the National Natural Science Foundation of China (Grant No. 61971173).

Data availability

The two EEG data sets, SEED-IV and SEED-VIG, can be publicly accessed from https://bcmi.sjtu.edu.cn/~seed/index.html.

Code availability

The source code is available from https://github.com/SunseaIU/JSFA.

Declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Ethical approval

This work involved human subjects or animals in its research. Approval of all ethical and experimental procedures and protocols was granted by the Research Ethics Committee of Shanghai Jiao Tong University under Approval No. 2017060.

Consent for publication

All authors have checked the manuscript and have agreed to the submission.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Gao X, Wang Y, Chen X, Gao S. Interface, interaction, and intelligence in generalized brain–computer interfaces. Trends Cogn Sci. 2021;25(8):671–84.
  • 2. Pourbabaee B, Roshtkhari MJ, Khorasani K. Deep convolutional neural networks and learning ECG features for screening paroxysmal atrial fibrillation patients. IEEE Trans Syst Man Cybern Syst. 2018;48(12):2095–104.
  • 3. Chen L, Wu M, Zhou M, Liu Z, She J, Hirota K. Dynamic emotion understanding in human–robot interaction based on two-layer fuzzy SVR-TS model. IEEE Trans Syst Man Cybern Syst. 2020;50(2):490–501.
  • 4. Lan Z, Sourina O, Wang L, Scherer R, Müller-Putz GR. Domain adaptation techniques for EEG-based emotion recognition: a comparative study on two public datasets. IEEE Trans Cognit Dev Syst. 2018;11(1):85–94.
  • 5. Abgeena A, Garg S. S-LSTM-ATT: a hybrid deep learning approach with optimized features for emotion recognition in electroencephalogram. Health Inf Sci Syst. 2023;11(1):40.
  • 6. King J-T, Prasad M, Tsai T, Ming Y-R, Lin C-T. Influence of time pressure on inhibitory brain control during emergency driving. IEEE Trans Syst Man Cybern Syst. 2020;50(11):4408–14.
  • 7. Li C, Bao Z, Li L, Zhao Z. Exploring temporal representations by leveraging attention-based bidirectional LSTM-RNNs for multi-modal emotion recognition. Inf Process Manag. 2020;57(3):102185.
  • 8. Yang Y, Gao Z, Li Y, Cai Q, Marwan N, Kurths J. A complex network-based broad learning system for detecting driver fatigue from EEG signals. IEEE Trans Syst Man Cybern Syst. 2021;51(9):5800–8.
  • 9. Olmez Y, Koca GO, Sengur A, Acharya UR. PS-VTS: particle swarm with visit table strategy for automated emotion recognition with EEG signals. Health Inf Sci Syst. 2023;11(1):22.
  • 10. Wolpaw JR, Millán JDR, Ramsey NF. Brain–computer interfaces: definitions and principles. Handb Clin Neurol. 2020;168:15–23.
  • 11. Chen X, Liu Q, Tao W, Li L, Lee S, Liu A, Chen Q, Cheng J, McKeown MJ, Wang ZJ. ReMAE: user-friendly toolbox for removing muscle artifacts from EEG. IEEE Trans Instrum Meas. 2019;69(5):2105–19.
  • 12. Zhang G, Yu M, Chen G, Han Y, Zhang D, Zhao G, Liu Y-J. A review of EEG features for emotion recognition. Sci Sin Inf. 2019;49(9):1097–118.
  • 13. Duan R-N, Zhu J-Y, Lu B-L. Differential entropy feature for EEG-based emotion classification. In: Proceedings of international IEEE/EMBS conference on neural engineering. 2013. p. 81–4.
  • 14. Li J, Thakor N, Bezerianos A. Brain functional connectivity in unconstrained walking with and without an exoskeleton. IEEE Trans Neural Syst Rehabil Eng. 2020;28(3):730–9.
  • 15. Li R, Ren C, Zhang S, Yang Y, Zhao Q, Hou K, Yuan W, Zhang X, Hu B. STSNet: a novel spatio-temporal-spectral network for subject-independent EEG-based emotion recognition. Health Inf Sci Syst. 2023;11(1):25.
  • 16. Peng Y, Li Q, Kong W, Qin F, Zhang J, Cichocki A. A joint optimization framework to semi-supervised RVFL and ELM networks for efficient data classification. Appl Soft Comput. 2020;97:106756.
  • 17. Hu B, Li X, Sun S, Ratcliffe M. Attention recognition in EEG-based affective learning research using CFS+KNN algorithm. IEEE/ACM Trans Comput Biol Bioinform. 2018;15(1):38–45.
  • 18. Sha T, Zhang Y, Peng Y, Kong W. Semi-supervised regression with adaptive graph learning for EEG-based emotion recognition. Math Biosci Eng. 2023;20(6):11379–402.
  • 19. Peng Y, Liu H, Li J, Huang J, Lu B-L, Kong W. Cross-session emotion recognition by joint label-common and label-specific EEG features exploration. IEEE Trans Neural Syst Rehabil Eng. 2023;31:759–68.
  • 20. Wu D, Xu Y, Lu B-L. Transfer learning for EEG-based brain–computer interfaces: a review of progress made since 2016. IEEE Trans Cognit Dev Syst. 2022;14(1):4–19.
  • 21. Li W, Huan W, Hou B, Tian Y, Zhang Z, Song A. Can emotion be transferred? A review on transfer learning for EEG-based emotion recognition. IEEE Trans Cognit Dev Syst. 2022;14:833–46.
  • 22. Peng Y, Wang W, Kong W, Nie F, Lu B-L, Cichocki A. Joint feature adaptation and graph adaptive label propagation for cross-subject emotion recognition from EEG signals. IEEE Trans Affect Comput. 2022;13(4):1941–58.
  • 23. Liu Y, Lan Z, Cui J, Sourina O, Müller-Wittig W. Inter-subject transfer learning for EEG-based mental fatigue recognition. Adv Eng Inform. 2020;46:101157.
  • 24. Xia K, Ni T, Yin H, Chen B. Cross-domain classification model with knowledge utilization maximization for recognition of epileptic EEG signals. IEEE/ACM Trans Comput Biol Bioinform. 2021;18(1):53–61.
  • 25. Gong S, Xing K, Cichocki A, Li J. Deep learning in EEG: advance of the last ten-year critical period. IEEE Trans Cognit Dev Syst. 2022;14(2):348–65.
  • 26. Suhaimi NS, Mountstephens J, Teo J. EEG-based emotion recognition: a state-of-the-art review of current trends and opportunities. Comput Intell Neurosci. 2020;2020:8875426.
  • 27. Sikander G, Anwar S. Driver fatigue detection systems: a review. IEEE Trans Intell Transp Syst. 2018;20(6):2339–52.
  • 28. Peng Y, Qin F, Kong W, Ge Y, Nie F, Cichocki A. GFIL: a unified framework for the importance analysis of features, frequency bands and channels in EEG-based emotion recognition. IEEE Trans Cognit Dev Syst. 2022;14(3):935–47.
  • 29. Cui Y, Xu Y, Wu D. EEG-based driver drowsiness estimation using feature weighted episodic training. IEEE Trans Neural Syst Rehabil Eng. 2019;27(11):2263–73.
  • 30. Mishuhina V, Jiang X. Feature weighting and regularization of common spatial patterns in EEG-based motor imagery BCI. IEEE Signal Process Lett. 2018;25(6):783–7.
  • 31. Peng Y, Lu B-L. Robust structured sparse representation via half-quadratic optimization for face recognition. Multimed Tools Appl. 2017;76(6):8859–80.
  • 32. Yao C-L, Lu B-L. A robust approach to estimating vigilance from EEG with neural processes. In: Proceedings of IEEE international conference on bioinformatics and biomedicine. 2020. p. 1202–5.
  • 33. Kumar M, Packer B, Koller D. Self-paced learning for latent variable models. In: Proceedings of advances in neural information processing systems. 2010. p. 1189–97.
  • 34. Jiang L, Meng D, Mitamura T, Hauptmann A. Easy samples first: self-paced reranking for zero-example multimedia search. In: Proceedings of ACM international conference on multimedia. 2014. p. 547–56.
  • 35. Zhao Q, Meng D, Jiang L, Xie Q, Xu Z, Hauptmann AG. Self-paced learning for matrix factorization. In: Proceedings of AAAI conference on artificial intelligence. 2015. p. 3196–202.
  • 36. Gan J, Wen G, Yu H, Zheng W, Lei C. Supervised feature selection by self-paced learning regression. Pattern Recogn Lett. 2020;132:30–7.
  • 37. Ma F, Meng D, Dong X, Yang Y. Self-paced multi-view co-training. J Mach Learn Res. 2020;21:1–38.
  • 38. Li L, Zhao K, Li S, Sun R, Cai S. Extreme learning machine for supervised classification with self-paced learning. Neural Process Lett. 2020;52(3):1723–44.
  • 39. Meng D, Zhao Q, Jiang L. What objective does self-paced learning indeed optimize? arXiv preprint. 2015. arXiv:1511.06049.
  • 40. Shi L-C, Jiao Y-Y, Lu B-L. Differential entropy feature for EEG-based vigilance estimation. In: Proceedings of international conference of the IEEE engineering in medicine and biology society. 2013. p. 6627–30.
  • 41. Chen X, Yuan G, Nie F, Ming Z. Semi-supervised feature selection via sparse rescaled linear square regression. IEEE Trans Knowl Data Eng. 2020;32(1):165–76.
  • 42. Peng Y, Lu B-L. Discriminative manifold extreme learning machine and applications to image and EEG signal classification. Neurocomputing. 2016;174:265–77.
  • 43. Zheng W-L, Zhu J-Y, Lu B-L. Identifying stable patterns over time for emotion recognition from EEG. IEEE Trans Affect Comput. 2019;10:417–29.
  • 44. Zheng W-L, Lu B-L. A multimodal approach to estimating vigilance using EEG and forehead EOG. J Neural Eng. 2017;14(2):026017.
  • 45. Shi L-C, Lu B-L. Dynamic clustering for vigilance analysis based on EEG. In: Proceedings of annual international conference of the IEEE engineering in medicine and biology society. 2008. p. 54–7.
  • 46. Pivik RT, Harman K. A reconceptualization of EEG alpha activity as an index of arousal during sleep: all alpha activity is not equal. J Sleep Res. 1995;4(3):131–7.
  • 47. Benca RM, Obermeyer WH, Larson CL, Yun B, Dolski I, Kleist KD, Weber SM, Davidson RJ. EEG alpha power and alpha power asymmetry in sleep and wakefulness. Psychophysiology. 1999;37(4):430–6.
  • 48. Kerr CE, Sacchet MD, Lazar SW, Moore CI, Jones SR. Mindfulness starts with the body: somatosensory attention and top-down modulation of cortical alpha rhythms in mindfulness meditation. Front Hum Neurosci. 2013;7(12):1–15.
