Brain Informatics. 2025 Aug 15;12(1):20. doi: 10.1186/s40708-025-00267-w

Motor imagery decoding network with multisubject dynamic transfer

Zhi Li 1, Mingai Li 1,2,3,, Yufei Yang 1
PMCID: PMC12356803  PMID: 40815349

Abstract

Brain-computer interfaces (BCIs) provide a promising and intelligent rehabilitation method for motor function, and it is crucial to acquire a patient's movement intentions accurately by decoding motor imagery EEG (MI-EEG). Because of inter-individual heterogeneity, the decoding model should demonstrate dynamic adaptation abilities. Domain adaptation (DA) is effective in enhancing model generalization by reducing the inherent distribution differences among subjects. However, existing DA methods usually mix multiple source domains into a single new domain, and the resulting multi-source domain conflict may cause negative transfer. In this paper, we propose a multisubject dynamic conditional domain adaptation network (MSDCDA). First, a multi-channel attention block is employed in the feature extractor to focus on the channels relevant to the corresponding MI task. Subsequently, shallow spatial-temporal features are extracted using a spatial-temporal convolution block. A dynamic residual block is then applied in the feature extractor to dynamically adapt to the specific features of each subject and alleviate conflicts among multiple source domains, since each domain is viewed as a distribution of electroencephalogram (EEG) signals. Furthermore, we employ the margin disparity discrepancy (MDD) as the metric to achieve conditional distribution domain adaptation between the source and target domains through adversarial learning with an auxiliary classifier. MSDCDA achieved accuracies of 78.55% and 85.08% on Datasets IIa and IIb of BCI Competition IV, respectively. Our experimental results demonstrate that MSDCDA can effectively address multi-source domain conflicts and significantly enhance the decoding performance for target subjects. This study facilitates the application of BCIs to motor function rehabilitation.

Keywords: Motor function rehabilitation, Motor imagery electroencephalogram (MI-EEG), Domain adaptation (DA), Multisubject dynamic transfer, Conditional domain adaptation

Introduction

Brain-computer interface (BCI) is a crucial tool for achieving human–machine interaction by establishing direct communication between the brain and external devices. It can help people with disabilities regain visual, auditory, sensory, and motor functions, enhancing their quality of life and productivity. To date, BCIs have achieved significant advancements in areas such as emotion recognition [1–3], disease detection [4–6], and disability rehabilitation [7]. Electroencephalogram (EEG) signals, which record electrical changes during brain activity and reflect the electrophysiological activity of neurons on the surface of the cerebral cortex or scalp, are widely used in BCI systems because of their high temporal resolution, safety, and cost-effectiveness [8–10]. EEG-based patterns are generally categorized into four types: (1) sensorimotor rhythm (SMR), (2) steady-state visual evoked potential, (3) event-related potential (ERP), and (4) slow cortical potential. Motor imagery (MI), which is related to SMR, is a widely adopted BCI paradigm. During imagined actions, the amplitudes of the μ (7–13 Hz) and β (13–30 Hz) rhythms can be suppressed or enhanced; these phenomena are referred to as event-related desynchronization (ERD) and event-related synchronization, respectively [11]. MI-based BCI systems are advantageous because they do not require external stimulation of the user and rely on spontaneous EEG signals closely linked to the user's active motor intentions, making them ideal for rehabilitation in patients with motor dysfunction. For instance, Tang et al. [12] integrated the MI paradigm with virtual reality technology and deep learning (DL) to design an exoskeleton system for upper-limb rehabilitation. Achieving excellent MI decoding performance is crucial for the effective implementation of MI-BCI systems.

DL, a branch of machine learning, is a salient technique for data analysis that can automatically extract and classify complex nonlinear features from datasets. It has achieved remarkable success in decoding EEG signals in recent years. For example, Schirrmeister et al. [13] proposed the shallow network (ShallowNet) and deep network (DeepNet), which can be directly employed for MI-EEG classification, substantially outperforming traditional machine learning methods. Lawhern et al. [14] developed a network called EEGNet for MI decoding, which not only surpassed the performance of prior algorithms but also extracted neurophysiologically interpretable features from EEG signals. Inspired by the inception structure [15], Santamaria et al. [16] proposed a model that integrates an inception module for ERP-based task detection, achieving advanced performance in multiple tasks, including MI decoding.

In practice, the application of DL methods, which require a large amount of labeled data for model training, is often hindered by the lack of labelled data and the heterogeneity of data among different subjects. To address this challenge, DL-based unsupervised domain adaptation (UDA) has been extensively studied. UDA leverages labeled data from a source domain to build a model that is suitable for unlabeled data from a target domain. For example, Xu et al. [17] proposed a multi-level spatial-temporal adaptation network to relieve the effects of variance across subjects and sessions, overcoming limitations in cross-domain MI tasks. Zhong et al. [18] developed a deep domain adaptation framework with correlation alignment, which aligns the second-order statistics of the marginal distributions between the source and target domains. Li et al. [19] introduced a multi-direction transfer learning strategy for cross-subject adaptation, where data from multi-source domains are applied to the target domain, as well as from one multi-source domain to another. This method is model-agnostic, allowing for rapid deployment on existing models. Wei et al. [20] improved transfer joint matching to align the data distribution for each pair of subjects and combined the results using decision fusion with a multi-branch structure, effectively addressing the challenges posed by significant differences between multiple source subjects in domain adaptation. In addition to these traditional DL-based UDA methods, many studies inspired by generative adversarial networks [21] and domain-adversarial neural networks [22] have integrated UDA with adversarial learning techniques. She et al. [23] used the Wasserstein distance instead of the 0-1 loss as a criterion for measuring the difference between the source and target domains, effectively preventing gradient vanishing and model collapse during adversarial training. Chen et al. [24] designed a multi-attention layer to automatically focus on the dominant brain region associated with motor actions, enhancing the ability of the model to extract discriminative features relevant to MI tasks. Li et al. [25] developed a dual-attention adversarial network that uses two unshared attention modules for the source and target domains to preserve domain-specific information and avoid its loss. Hong et al. [26] explored global and local discriminators to align the marginal and conditional distributions between the source and target domains, respectively, and then dynamically adjusted the weights of the marginal and conditional alignments using a dynamic factor.

In traditional domain adaptation, the source domain data are sampled from a single distribution P(x). The idea of domain adaptation is to align the source domain distribution P(x) with the target domain distribution Q(x) to improve model performance on the target domain. However, in many practical cases, the source data come from multiple distributions (such as P1(x), P2(x), …, Pn(x)), so there are essentially multiple source domains. The effect of domain adaptation is inevitably affected by the distribution differences among these source domains, a problem called multi-source domain conflict [27].

Most existing domain adaptation methods combine the data of multiple source subjects into a single domain without considering the distribution differences among subjects. If the differences among multiple source domains are not taken into account, the target domain cannot be aligned with all source domains simultaneously, which makes it difficult for the feature extractor to extract domain-invariant features and may even lead to negative transfer. Additionally, domain adaptation is generally implemented through adversarial training with a 0-1 loss function, which achieves only global adaptation between the source and target domains without considering the relationship between corresponding categories.

To address these challenges, we propose a multisubject dynamic conditional domain adaptation network (MSDCDA) in this paper. A dynamic residual module is introduced into MSDCDA to alleviate the multi-source domain conflict without significantly increasing the network complexity. Meanwhile, an auxiliary classifier is designed to be consistent with the classifier structure, and the margin disparity discrepancy (MDD), as a discrepancy measurement of the two domains, is applied to align the conditional shift between the source and target domains via an adversarial learning strategy, which can achieve better alignment by explicitly increasing the class information of samples during domain adaptation and avoid gradient vanishing. The main contributions of this study are summarised as follows:

  • We design a dynamic residual module in MSDCDA to dynamically adjust the network parameters according to samples from different domains, effectively mitigating multi-source domain conflicts.

  • We design a Multi-channel Attention Block containing multiple channel attention sub-blocks, which improves the decoding performance of the network by focusing on the channels related to the MI tasks.

  • We utilize MDD as a metric to measure the discrepancy (or distance) between source and target domains, achieving the purpose of aligning the corresponding categories of the source and target domains by introducing class information and conducting adversarial training with two structurally consistent classifiers.

The remainder of this paper is organised as follows. Section 2 presents the method in detail, including the architecture, design principles, and optimization of the proposed MSDCDA. Section 3 describes the experiments conducted on two public datasets. Section 4 presents the performance comparison, limitations, and potential future work. Finally, Sect. 5 summarizes this study.

Methods

Definitions and notations

This section presents the definitions and symbols used throughout this paper. The source domain with $n_s$ labelled samples is defined as

$D_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$,

where $x_i^s \in \mathbb{R}^{E \times P}$ represents the $i$th EEG trial with $E$ electrodes and $P$ sampling points, and $y_i^s \in \mathbb{R}^{N}$ is the corresponding label. Analogously, the target domain with $n_t$ unlabelled samples is represented by

$D_t = \{x_j^t\}_{j=1}^{n_t}$,

where $x_j^t$ indicates the $j$th EEG trial. In unsupervised domain adaptation, the assumption is that data distributions from different domains are similar but not identical, a concept referred to as domain shift (covariate shift). The objective is to leverage labelled data from the source subjects to train a classifier that performs well on a target subject using UDA techniques.

Network architecture

In this section, the framework of MSDCDA is presented in detail. As shown in Fig. 1, MSDCDA consists of three main parts: a feature extractor, classifier, and auxiliary classifier that is consistent with the classifier structure. During the training process of MSDCDA, the source and target EEG signals are first sent to the feature extractor. Subsequently, the classifier is trained on features from the labelled source domain using supervised learning. The target features are then passed to the classifier to generate pseudo-labels for the auxiliary classifier. This auxiliary classifier helps reduce the distribution discrepancy between the source and target domains by introducing a gradient reverse layer between the feature extractor and the auxiliary classifier during adversarial training. This setup ensures that the data distributions of the two domains are aligned while simultaneously learning domain-invariant feature representations. Consequently, labelled data from multiple source subjects can be leveraged to enhance classification performance on an unlabelled target subject.

Fig. 1.

Fig. 1

Network architecture of MSDCDA

The structure and working mechanism of MSDCDA are detailed below.

Feature extractor

To extract task-related discriminative features from EEG signals and mitigate multiple domain conflicts, we designed a feature extractor composed of three cascaded blocks: the Multi-channel attention block, spatial-temporal convolution block, and dynamic residual block.

1) Multi-channel Attention Block: The activation levels of the brain are related to MI tasks, and the MI signals from different channels contribute variably to recognition. Therefore, we introduce the multi-channel attention block to dynamically estimate the channel contributions in motion intent decoding, aiming to search for specific spatial patterns associated with MI tasks.

The mechanism of Multi-channel Attention Block (McAB) is illustrated in Fig. 2. The importance of each channel is evaluated by assigning a learnable weight, and a multi-channel attention block, which involves K parallel channel attention sub-blocks, is introduced to represent the corresponding channel activation levels for all MI tasks.

Fig. 2.

Fig. 2

Schematic diagram of the McAB in the feature extractor

We initialise the weights of all channel attention sub-blocks to their average, allowing the weights of each channel to be learned effectively and simultaneously from the beginning of training. In addition, we use the softmax function to constrain each sub-block's weights to lie in (0, 1) and sum to 1, promoting the stability of network training. The weight is calculated using the following formula:

$A_{K,i} = \dfrac{e^{a_{K,i}}}{\sum_{j=1}^{E} e^{a_{K,j}}} \qquad (1)$

where $K$ and $E$ denote the numbers of parallel channel attention sub-blocks and EEG channels, respectively. $A_{K,i}$ indicates the $i$th weight computed by the softmax in the $K$th channel attention sub-block, and $a_{K,i}$ and $a_{K,j}$ are the learnable weights corresponding to the $i$th and $j$th EEG channels, respectively.

Compared with general spatial attention modules, McAB is essentially composed of multiple spatial attention patterns. Each pattern may select key channels highly relevant to specific mental tasks. Therefore, it is necessary to incorporate multiple patterns in the attention layer, especially in the case of multi-classification tasks.
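As an illustration, the per-sub-block softmax weighting of Eq. (1) can be sketched as follows. This is a minimal sketch, not the authors' code: the names (`MultiChannelAttention`, `softmax`) and the plain-array weights are ours, and a real implementation would register the weights as learnable network parameters.

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax over the last axis."""
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MultiChannelAttention:
    """Sketch of McAB: K parallel channel-attention sub-blocks (Eq. 1).

    Each sub-block holds one learnable weight per EEG channel; weights
    are initialised uniformly so every channel starts with importance
    1/E, and each sub-block's softmax weights sum to 1.
    """

    def __init__(self, K, E):
        self.a = np.zeros((K, E))  # uniform init -> softmax gives 1/E each

    def forward(self, x):
        """x: (E, P) EEG trial -> (K, E, P) channel-reweighted maps."""
        A = softmax(self.a)              # (K, E), each row sums to 1
        return A[:, :, None] * x[None, :, :]

mcab = MultiChannelAttention(K=5, E=22)
out = mcab.forward(np.random.randn(22, 1000))
print(out.shape)  # (5, 22, 1000)
```

Each of the K rows of `A` is one spatial attention pattern; the stacked output becomes the K input channels of the subsequent convolution block.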

2) Spatial-Temporal Convolution Block: The ShallowNet [13], as shown in Table 1, was selected as the spatial-temporal convolution block (S-T Conv Block) to extract spatial-temporal information from the feature maps generated by McAB. Specifically, we use two convolutional layers with kernel sizes (1, 25) and (E, 1) (where E represents the number of channels in the EEG signals) to extract the temporal and spatial features of the EEG signals, respectively. A batch normalisation layer follows to normalise the extracted features. Subsequently, a nonlinear squaring operation is applied, along with an average pooling layer sized (1, 75) with a stride of 15, and a logarithmic activation function. This sequence of operations simulates the log-variance computation of the signals, further enhancing the extraction of nonlinear features from the EEG data.

Table 1.

Structure of the spatial-temporal convolution block in the feature extractor

Layer Filter Kernel Stride Output
Input (K,E,1000)
Conv2d 32 (1,25) (1,1) (32,E,976)
Conv2d 32 (E,1) (1,1) (32,1,976)
BN (32,1,976)
Square (32,1,976)
AvgPool2d (1,75) (1,15) (32,1,61)
Log (32,1,61)
Dropout p=0.5 (32,1,61)
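The layer sequence of Table 1 can be sketched as a PyTorch module. This is a reconstruction under our assumptions (K = 5 attention maps in, E = 22 EEG channels, 1000 samples per trial), not the authors' exact code; the small clamp before the logarithm is our addition for numerical safety.

```python
import torch
import torch.nn as nn

class STConvBlock(nn.Module):
    """ShallowNet-style spatial-temporal convolution block (Table 1).

    Temporal conv (1,25) -> spatial conv (E,1) -> BN -> square ->
    average pooling (1,75) with stride (1,15) -> log -> dropout,
    approximating a log-variance feature of the band-passed signal.
    """

    def __init__(self, K=5, E=22):
        super().__init__()
        self.temporal = nn.Conv2d(K, 32, kernel_size=(1, 25))   # time: 1000 -> 976
        self.spatial = nn.Conv2d(32, 32, kernel_size=(E, 1))    # space: E -> 1
        self.bn = nn.BatchNorm2d(32)
        self.pool = nn.AvgPool2d(kernel_size=(1, 75), stride=(1, 15))  # 976 -> 61
        self.drop = nn.Dropout(p=0.5)

    def forward(self, x):
        # x: (batch, K, E, 1000)
        x = self.spatial(self.temporal(x))
        x = self.bn(x)
        x = x ** 2                                   # nonlinear squaring
        x = self.pool(x)
        x = torch.log(torch.clamp(x, min=1e-6))      # log activation
        return self.drop(x)

block = STConvBlock()
y = block(torch.randn(2, 5, 22, 1000))
print(y.shape)  # torch.Size([2, 32, 1, 61])
```

The time axis follows Table 1: 1000 − 25 + 1 = 976 after the temporal convolution, and ⌊(976 − 75)/15⌋ + 1 = 61 after pooling.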

3) Dynamic Residual Block: The MI data distributions of different subjects exhibit heterogeneity due to individual differences, which can negatively affect domain adaptation and subsequently degrade the classification performance of the model. The underlying principle is that model adaptation is achieved statistically by tailoring the model to each MI-EEG data distribution, considering each domain as a distinct representation of the subject's MI-EEG data. Inspired by the dynamic transfer proposed in [27], a dynamic residual block that can dynamically adjust its parameters according to the input samples is introduced into the feature extractor to alleviate the multi-source domain conflicts caused by individual differences. The key insight is that adapting the model across domains is achieved by adapting it across samples, since each domain is viewed as a distribution of electroencephalogram (EEG) signals. This means that there are no rigid boundaries between the multiple source domains.

A fully dynamic block $M_\theta(x)$ is difficult to learn, and the key is to restrict the model's dependence on the input $x$ to a small number of parameters. Thus, we take the residual part, which has fewer parameters, as the dynamic subpath of the dynamic residual block, as shown in Fig. 3, and the dynamic residual block can be formulated as

$W(x) = W_0 + \Delta W(x), \qquad (2)$

where $W_0$ and $\Delta W(x)$ denote the static convolution kernel and the dynamic residual matrix, respectively.

Fig. 3.

Fig. 3

Dynamic subpath of the dynamic residual block

The dynamic subpath can be expressed as

$\Delta W(x) = \Lambda(x)\,W_0 + \sum_{i=1}^{L} \pi_i(x)\,\Phi_i. \qquad (3)$

Specifically, $\Lambda(x)\,W_0$ represents the channel attention component, where $\Lambda(x)$ is a diagonal $C_{out} \times C_{out}$ matrix and $C_{out}$ denotes the number of output channels. This component rescales the output channels of $W_0$ using a squeeze-and-excitation block (SE block) [28].

The term $\sum_{i=1}^{L} \pi_i(x)\,\Phi_i$ denotes the subspace routing, which is a linear combination of $L$ static matrices $\Phi_i$ with weights dependent on the input $x$. The value of $L$ is empirically set to 4. The matrices $\Phi_i$ represent static convolution kernel weights. To reduce the number of dynamic parameters and enhance the stability of network training, we employ a $1 \times 1$ convolution kernel and introduce the dynamic residual only in the middle convolution layer. The dynamic coefficients $\pi_i$ can be regarded as projections of the dynamic residual matrix onto the corresponding weight subspaces.

Similar to $\Lambda(x)$, the dynamic coefficients $\pi_i(x)$ are implemented using a lightweight attention module comprising an average pooling layer followed by two fully connected layers. $\Lambda(x)$ is normalised with the sigmoid function, whereas $\pi_i(x)$ uses the softmax function. As part of the feature extractor, the dynamic subpath optimises its parameters via backpropagation of the classifier gradient and of the auxiliary classifier gradient through the gradient reversal layer (GRL).
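One possible reading of Eqs. (2)-(3) as a per-sample dynamic 1×1 convolution is sketched below. The module and attribute names are ours, the per-sample loop is written for clarity rather than efficiency, and the attention heads are reduced to their simplest form (average pooling plus two linear layers, sigmoid for Λ and softmax for π).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicResidualConv(nn.Module):
    """Sketch of the dynamic residual 1x1 convolution, W(x) = W0 + dW(x).

    dW(x) = Lambda(x)*W0 + sum_i pi_i(x)*Phi_i (Eq. 3): an SE-style
    rescaling of the static kernel W0 plus subspace routing over L
    static 1x1 kernels Phi_i, with input-dependent coefficients.
    """

    def __init__(self, channels=32, L=4, reduction=4):
        super().__init__()
        self.W0 = nn.Parameter(torch.randn(channels, channels, 1, 1) * 0.01)
        self.Phi = nn.Parameter(torch.randn(L, channels, channels, 1, 1) * 0.01)
        hidden = max(channels // reduction, 4)
        self.att_lambda = nn.Sequential(   # SE-style head for Lambda(x)
            nn.Linear(channels, hidden), nn.ReLU(), nn.Linear(hidden, channels))
        self.att_pi = nn.Sequential(       # routing head for pi(x)
            nn.Linear(channels, hidden), nn.ReLU(), nn.Linear(hidden, L))

    def forward(self, x):
        # x: (batch, C, H, W); build a dynamic kernel per sample
        s = x.mean(dim=(2, 3))                      # global average pooling
        lam = torch.sigmoid(self.att_lambda(s))     # (batch, C), in (0, 1)
        pi = F.softmax(self.att_pi(s), dim=1)       # (batch, L), sums to 1
        outs = []
        for b in range(x.size(0)):                  # per-sample loop for clarity
            dW = lam[b].view(-1, 1, 1, 1) * self.W0 \
                 + torch.einsum('l,lcdhw->cdhw', pi[b], self.Phi)
            outs.append(F.conv2d(x[b:b + 1], self.W0 + dW))  # Eq. 2
        return torch.cat(outs, 0)

layer = DynamicResidualConv(channels=8)
y = layer(torch.randn(2, 8, 1, 61))
print(y.shape)  # torch.Size([2, 8, 1, 61])
```

Because the kernel depends on the sample, two trials drawn from different source subjects are effectively processed by slightly different convolutions, which is how the block softens the boundaries between source domains.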

Classifier

The classifier predicts the labels of the representations learned by the feature extractor. As shown in Table 2, it comprises two fully connected (FC) layers and a softmax function, which transforms network predictions into class probabilities, with Class denoting the number of categories. In this study, the target domain was not used for training the classifier; instead, it was trained solely on the labelled MI-EEG data from the source domains. The well-trained classifier was then directly applied to a target subject for MI classification.

Table 2.

Structure of the classifier in MSDCDA

Layer Units Output
FC 128 (,128)
BN (,128)
ELU (,128)
FC Class (,Class)
Softmax (,Class)

Conditional alignment and auxiliary classifier

An auxiliary classifier, structured identically to the primary classifier, is employed for domain adaptation. The MDD serves as a measure of the discrepancy between the source and target domains; the auxiliary classifier maximises it while the feature extractor minimises it through adversarial learning. Compared with the general adversarial domain adaptation method based on the 0-1 loss, this approach introduces category information to achieve conditional domain adaptation at a finer granularity, and it also avoids the problem of vanishing gradients during training.

We use MDD as a metric to measure the distance between the source and target domains, and its expression is as follows:

$d_{f,\mathcal{F}}^{(\rho)}(D_s, D_t) \triangleq \sup_{f' \in \mathcal{F}} \left( \mathrm{disp}_{D_t}^{(\rho)}(f', f) - \mathrm{disp}_{D_s}^{(\rho)}(f', f) \right), \qquad (4)$

where $D_s$ and $D_t$ are the source and target domains, respectively, and $\mathrm{disp}_{D}^{(\rho)}(f', f)$ is the margin disparity. $f$ and $f'$ are scoring functions, and $h_f$ and $h_{f'}$ are their corresponding label functions, respectively.

The MDD satisfies non-negativity and sub-additivity and can be defined as a distribution difference measurement criterion, although it does not satisfy symmetry. Therefore, a target domain error upper bound can be stated for UDA as follows:

$err_{D_t}(h_f) \le err_{D_s}^{(\rho)}(f) + d_{f,\mathcal{F}}^{(\rho)}(D_s, D_t) + \lambda, \quad \lambda = \min_{f^* \in \mathcal{F}} \left( err_{D_s}^{(\rho)}(f^*) + err_{D_t}^{(\rho)}(f^*) \right), \qquad (5)$

where $err_{D_t}(h_f)$ is the prediction error on the unlabelled target domain, and $err_{D_s}^{(\rho)}(f)$ represents the training error on the labelled source domains. $\lambda$ denotes the ideal error when the source and target domains are combined, and $f^*$ is the ideal scoring function in the hypothesis space.

The above shows that, if the hypothesis space is sufficiently rich (that is, it includes the ideal scoring function), the prediction error on the target domain can be constrained to a small range. This is achieved by minimising the training error on the source domains and addressing the differences between the source and target domains. Hence, the expected error on the target domain can be reduced by minimising the error on the source domains and the MDD between the source and target domains:

$\min_{f, \psi} \; err_{\psi(D_s)}^{(\rho)}(f) + \left( \mathrm{disp}_{\psi(D_t)}^{(\rho)}(f^*, f) - \mathrm{disp}_{\psi(D_s)}^{(\rho)}(f^*, f) \right), \qquad (6)$

where $\psi$ is the feature extractor and $f^*$ is the ideal classifier. However, the ideal classifier is difficult to obtain directly; therefore, we introduce an auxiliary classifier $f'$ that shares the hypothesis space with the classifier $f$. Minimising the margin disparity discrepancy is then a mini-max game, because the MDD is defined as a supremum over the hypothesis space. Because the max-player is otherwise too strong, the feature extractor $\psi$ strengthens the min-player. The overall optimisation problem can be written as follows:

$\min_{f, \psi} \; err_{\psi(D_s)}^{(\rho)}(f) + \left( \mathrm{disp}_{\psi(D_t)}^{(\rho)}(f', f) - \mathrm{disp}_{\psi(D_s)}^{(\rho)}(f', f) \right), \quad f' = \arg\max_{f' \in \mathcal{F}} \left( \mathrm{disp}_{\psi(D_t)}^{(\rho)}(f', f) - \mathrm{disp}_{\psi(D_s)}^{(\rho)}(f', f) \right). \qquad (7)$

Network optimization

As mentioned in 2.2.3, mini-max optimisation is involved, prompting the application of adversarial learning to solve the problem.

In Fig. 1, the feature extractor learns discriminative features conducive to classification by training the classifier $f$ with labelled source domain data. The cross-entropy loss is denoted as $L(p, y)$, where $p$ represents the prediction and $y$ the label. The classification error on the source domains can be expressed as follows:

$\mathcal{E}(D_s) = \mathbb{E}_{(x^s, y^s) \sim D_s} \, L\!\left(f(\psi(x^s)), y^s\right). \qquad (8)$

The auxiliary classifier $f'$ drives the feature extractor to learn domain-invariant features through adversarial learning that confuses the source and target domains. However, unlike a general 0-1 discriminator that considers only domain information (the entire source domain is labelled 0, and the entire target domain is labelled 1), it uses the class information of samples to align the conditional distributions. In addition, the auxiliary classifier requires sample labels. Since the target domain samples are unlabelled at this stage, the pseudo-labels predicted by the classifier $f$ serve as labels for the target domain samples. The one-hot pseudo-labels for the target domain samples can be obtained using the following formula:

$\hat{y}^t = \arg\max_k f(\psi(x^t))_k. \qquad (9)$

Then, the MDD between the source and target domains is

$D_\rho(D_s, D_t) = \mathbb{E}_{x^t \sim D_t} \, L\!\left(1 - f'(\psi(x^t)), \hat{y}^t\right) - \rho \, \mathbb{E}_{(x^s, y^s) \sim D_s} \, L\!\left(f'(\psi(x^s)), y^s\right), \qquad (10)$

where $\rho$ is the margin, whose value is determined by subsequent experiments.

Finally, the combined objective function of this optimisation problem is as follows:

$\min_{f, \psi} \; \mathcal{E}(D_s) + \alpha D_\rho(D_s, D_t), \quad \max_{f'} \; D_\rho(D_s, D_t), \qquad (11)$

where $\alpha$ is a trade-off coefficient, which was set to 1 in our experiments.
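Under these definitions, the adversarial MDD term of Eqs. (9)-(10) might be computed as follows. This is a sketch with hypothetical tensor names: cross-entropy stands in for $L$, the $1 - f'$ target term is implemented as $-\log(1 - p)$ on the pseudo-labelled class, and the small ε inside the logarithm is our addition for numerical safety.

```python
import torch
import torch.nn.functional as F

def mdd_loss(src_logits_aux, src_labels, tgt_logits_aux, tgt_logits_cls, rho=4.0):
    """Sketch of the margin disparity discrepancy term (Eqs. 9-10).

    Pseudo-labels for the target come from the main classifier f (Eq. 9);
    the target term penalises the auxiliary classifier f' for assigning
    probability to them (via -log(1 - p)), and the source term is a
    rho-weighted cross-entropy against the true source labels.
    """
    # Eq. 9: one-hot pseudo-labels from the main classifier
    pseudo = tgt_logits_cls.argmax(dim=1)

    # Target disparity: -log(1 - p_f'(pseudo-class)), averaged over the batch
    p_tgt = F.softmax(tgt_logits_aux, dim=1)
    tgt_term = -torch.log(1.0 - p_tgt.gather(1, pseudo.unsqueeze(1)) + 1e-6).mean()

    # Source disparity: standard cross-entropy, scaled by the margin rho
    src_term = F.cross_entropy(src_logits_aux, src_labels)

    return tgt_term - rho * src_term  # Eq. 10

loss = mdd_loss(torch.randn(8, 4), torch.randint(0, 4, (8,)),
                torch.randn(8, 4), torch.randn(8, 4))
print(torch.isfinite(loss).item())  # True
```

Per Eq. (11), the auxiliary classifier would ascend this quantity while the feature extractor descends it (plus the source classification loss), which is exactly what the gradient reversal layer described below automates.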

Moreover, to streamline network training, a gradient reversal layer (GRL) is inserted between the feature extractor $\psi$ and the auxiliary classifier $f'$. The GRL acts as an identity mapping during forward propagation, while during backward propagation it reverses the gradient from the auxiliary classifier by multiplying it by a coefficient constrained to 0.1. This approach effectively transforms the min-max optimisation into a single minimisation problem.
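A standard GRL can be written as a custom autograd function: identity in the forward pass, gradient scaled by −coeff in the backward pass (with coeff = 0.1 as stated above). This is a generic sketch of the technique, not the authors' code.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity forward, -coeff * grad backward."""

    @staticmethod
    def forward(ctx, x, coeff):
        ctx.coeff = coeff      # stash the scaling coefficient
        return x.clone()       # identity mapping in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse and scale the gradient; no gradient w.r.t. coeff
        return -ctx.coeff * grad_output, None

x = torch.ones(3, requires_grad=True)
y = GradReverse.apply(x, 0.1)
y.sum().backward()
print(x.grad)  # tensor([-0.1000, -0.1000, -0.1000])
```

With the GRL in place, a single optimiser step that minimises the total loss simultaneously updates the auxiliary classifier in the ascent direction of the MDD term and the feature extractor in the descent direction.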

Experiments

This section presents experimental details, including dataset information, data preprocessing, and hyperparameter selection.

Datasets

The experimental data used in this study consisted of dataset IIa [29] and IIb [30] from BCI Competition IV, with details provided in Table 3.

Table 3.

Details of datasets IIa and IIb from BCI competition IV

Dataset: IIa | IIb
Subjects: 9 | 9
Sessions: 2 (first for training, second for testing) | 5 (first three for training, last two for testing)
Tasks: Left hand, Right hand, Feet, Tongue | Left hand, Right hand
Number of trials: 288 per session (72 trials per task) | 120 in each of the first two sessions (60 trials per task); 160 in each of the last three sessions (80 trials per task)
Time period per trial: 2–6 s | 3–7 s
Sample rate (Hz): 250 | 250

Preprocessing

The data used in our experiments were obtained from the raw EEG signals through a series of preprocessing steps. First, the raw EEG signals were filtered using a third-order Butterworth band-pass filter with a 3–40 Hz passband, effectively removing noise while preserving information relevant to MI. Subsequently, exponential moving standardisation was applied to eliminate undesirable non-stationarity and electrode-by-electrode fluctuations. The standardised data can be expressed as follows:

$x'_t = \dfrac{x_t - u_t}{\sqrt{\delta_t^2}}, \qquad (12)$

where $x'_t$ and $x_t$ represent the filtered signal after and before standardisation, respectively, at time $t$. $u_t$ and $\delta_t^2$ denote the exponential moving mean and variance of the signal, respectively, and were calculated as follows:

$u_t = (1-\gamma)\,x_t + \gamma\,u_{t-1}, \quad \delta_t^2 = (1-\gamma)(x_t - u_t)^2 + \gamma\,\delta_{t-1}^2, \qquad (13)$

where $\gamma$ is a decay factor. Following [24], we set $\gamma$ to 0.999 and used the per-channel mean and variance as the initial exponential moving mean and variance, respectively. The preprocessed EEG signals retain the MI-related information while removing incidental noise, which is conducive to training our network.
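The two preprocessing steps can be sketched as follows. The function name and defaults are ours, and `scipy.signal.filtfilt` (zero-phase filtering) stands in for whichever filtering routine the authors actually used; the running statistics are initialised from the per-channel mean and variance as described above.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(raw, fs=250, band=(3.0, 40.0), gamma=0.999):
    """Sketch of the preprocessing pipeline: third-order Butterworth
    band-pass (3-40 Hz) followed by exponential moving standardisation
    (Eqs. 12-13). `raw` is (E, P): E channels, P samples.
    """
    # Third-order Butterworth band-pass, applied along the time axis
    b, a = butter(3, [band[0] / (fs / 2), band[1] / (fs / 2)], btype='band')
    x = filtfilt(b, a, raw, axis=-1)

    # Initialise running mean/variance with per-channel statistics
    u = x.mean(axis=-1)
    v = x.var(axis=-1)
    out = np.empty_like(x)
    for t in range(x.shape[-1]):
        u = (1 - gamma) * x[:, t] + gamma * u              # Eq. 13, mean
        v = (1 - gamma) * (x[:, t] - u) ** 2 + gamma * v   # Eq. 13, variance
        out[:, t] = (x[:, t] - u) / np.sqrt(v)             # Eq. 12
    return out

z = preprocess(np.random.randn(22, 1000))
print(z.shape)  # (22, 1000)
```

With γ = 0.999, the effective averaging window spans roughly the last 1000 samples (4 s at 250 Hz), so the standardisation tracks slow drifts without erasing trial-level dynamics.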

Experimental settings

Three channels of electrooculogram recordings were eliminated from both datasets, leaving only the EEG channels for our experiments. An Adam optimiser with a weight decay of 0.0005 served as the solver for this problem. The learning rate was set to 0.001, and the batch size was configured to 80. To prevent overfitting, an early stopping strategy was employed, halting optimization when the accuracy of the test datasets showed no improvement for 10,000 iterations or when the total number of iterations reached 60,000. The division of the training and test datasets conformed to competition principles to ensure a fair comparison with other methods. The model adopted a leave-one-subject-out strategy for training and evaluation, whereby one subject was selected as the target domain while the remaining subjects served as the source domain. Specifically, in Dataset IIa, when A01 was tested as a target subject, the EEG data from the second session in A01 were used as the target domain, and the first session data from the other eight subjects constituted the source domain. In Dataset IIb, when we selected B01 as the target subject to test the model, the EEG data of the last two sessions in B01 were set as the target domain, and the first three sessions of the remaining eight subjects were set as the source domain.
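The leave-one-subject-out protocol described above can be sketched as a simple generator; the subject IDs follow the Dataset IIa naming (A01-A09), and the helper name is ours.

```python
def loso_splits(subjects):
    """Leave-one-subject-out: each subject in turn is the unlabelled
    target domain, and all remaining subjects form the multi-source
    domain (their labelled training sessions become the source data).
    """
    for target in subjects:
        sources = [s for s in subjects if s != target]
        yield sources, target

subjects = [f"A0{i}" for i in range(1, 10)]
splits = list(loso_splits(subjects))
print(len(splits), splits[0][1], len(splits[0][0]))  # 9 A01 8
```

For Dataset IIa the target side would then load the subject's second session and the source side the first sessions of the other eight subjects, mirroring the description above.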

In our experiments, the DL framework was PyTorch, running on an Intel Core i7 CPU and an NVIDIA RTX 2070 GPU.

Hyperparameter selection

In this section, the rational selection of two hyperparameters (ρ and K) in MSDCDA is explored through experiments.

1) Influence of margin ρ:

As mentioned in Sect. 2.3, the margin ρ, a hyperparameter in the MDD loss, has a significant impact on the performance of MSDCDA. According to [31], the larger the margin ρ, the better the generalisation of the model in theory. In practice, however, ρ cannot simply be set to its theoretical maximum, because an excessively large ρ may cause gradient explosion during optimisation. We therefore prefer a relatively large ρ whenever exploding gradients are not encountered and empirically specify the value range of ρ for each dataset.

Specifically, we limit the range of ρ to [2, 3, 4, 5] for Dataset IIa and [4, 5, 6, 7] for Dataset IIb. We evaluated the classification performance for different values of ρ, selecting the ρ associated with the highest accuracy as the optimal margin for the current target subject. The results indicating the appropriate margin ρ for each subject are highlighted, with the value of K set to 5 (K=5) in this experiment.

As shown in Tables 4 and 5, varying ρ values lead to significant differences in network performance. Notably, the network’s performance improved markedly with an increase in ρ within a certain range. However, performance declined when ρ exceeded this range, leading to potential gradient explosions during training, as observed with the second subject (A02) in Table 4. This shows that while alignment between the source and target domains improves with increasing ρ, there is a practical limit to its growth. Therefore, selecting a suitable ρ for each target subject is essential.

Table 4.

Classification performance (%) of different margins ρ (K=5) on dataset IIa of BCI competition IV

Margin(ρ) A01 A02 A03 A04 A05 A06 A07 A08 A09
2 81.58 49.31 89.58 58.68 53.13 50.69 77.78 77.08 83.33
3 87.50 53.82 92.71 59.72 54.51 60.76 85.76 84.72 85.42
4 91.32 60.42 88.89 73.96 60.41 60.42 87.5 86.46 86.11
5 86.11 — 90.97 70.83 51.04 59.72 91.32 84.72 89.58

'—' indicates gradient explosion during the optimisation of the network

Table 5.

Classification performance (%) of different margins ρ (K=5) on dataset IIb of BCI competition IV

Margin(ρ) B01 B02 B03 B04 B05 B06 B07 B08 B09
4 80.63 63.57 59.06 97.19 83.75 87.50 82.19 94.38 89.06
5 81.56 62.50 63.13 97.50 88.75 87.19 84.38 95.31 90.94
6 81.88 62.50 59.06 98.13 93.75 90.00 83.13 95.63 91.56
7 84.06 61.07 60.31 98.13 95.31 89.38 81.56 95.63 91.56

2) Influence of the number of attention vectors K:

To demonstrate the effectiveness of McAB in the feature extractor, we investigated the performance of MSDCDA with different numbers of channel attention sub-blocks on Datasets IIa and IIb. Specifically, we set the number of channel attention sub-blocks to 0, 3, 5, and 7, and set ρ to the appropriate value obtained from the previous experiments.

The results are presented in Fig. 4, demonstrating that the network incorporating the attention mechanism achieves superior performance for most subjects, with the exception of A06. This indicates that introducing an attention mechanism is beneficial for enhancing MI-EEG decoding performance. Furthermore, the average classification accuracies on both Datasets IIa and IIb peak when the number of attention vectors (K) is set to 5, whereas they decline at K = 7. This observation suggests that while networks with multiple channel attention sub-blocks can outperform those without, increasing the number of sub-blocks does not necessarily yield better performance. MI signals reflect neuronal activity levels across different brain regions in response to various MI intentions. Concretely, the ERD of imagined hand movements appears over the somatosensory areas, while the ERD of imagined feet movements localizes on the central cortex between the two hemispheres. Moreover, the multi-channel attention block is essentially composed of multiple spatial attention patterns, each of which may select critical channels highly relevant to a specific mental task. Consequently, the number of sub-blocks significantly influences network performance.

Fig. 4 Classification performance with varying numbers of channel attention sub-blocks. a and b are the results on Datasets IIa and IIb, respectively

The above experiments confirm that the number of attention sub-blocks has a clear impact on network performance. We therefore set the number of sub-blocks to 5 according to the results shown in Fig. 4.

Effectiveness of dynamic residual module

The dynamic residual module can adjust the model parameters according to each input sample. In this study, this property was used to resolve the domain conflict caused by the differences among the multiple subjects in the source domain. To verify the effectiveness of the dynamic residual module, experiments were conducted on Datasets IIa and IIb. Specifically, we replaced the dynamic residual module in the model with an ordinary residual module while keeping all other modules and hyperparameters unchanged, and optimised the network with the same optimiser. In Tables 6 and 7, DR and SR denote the model with the dynamic residual module and with the ordinary (static) residual module, respectively.

For the nine subjects in Dataset IIa, the model with the dynamic residual module performed better. In particular, for subjects A01, A04, and A06, accuracy improved by 4.51%, 10.07%, and 3.82%, respectively. On Dataset IIb, accuracy decreased by 0.94% for subject B01 and did not improve for subjects B05 and B09; nevertheless, performance still improved considerably for most subjects, and the average accuracy increased by 1.19%. These results indicate that the dynamic residual module can alleviate the conflict caused by the differences among multiple subjects in the source domain and thus improve model performance. For some subjects (B05 and B09), the dynamic residual module brought no improvement over the ordinary residual module, and for others (B01, B08) it even caused a slight decline. This might be because the data distribution differences between these subjects and the others are relatively small, so the dynamic parameters had little effect or even introduced a minor performance drop.
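The per-sample adaptation of the dynamic residual module can be sketched as attention over candidate kernels, in the spirit of dynamic convolution [31]; the shapes, the pooling used to compute the attention, and all weights below are illustrative assumptions rather than the authors' exact architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

def dynamic_residual(x, kernels, attn_w):
    """Dynamic residual sketch: the effective weight matrix is a
    per-sample convex combination of K candidate weight matrices,
    with softmax attention computed from the input itself, followed
    by a residual (skip) connection."""
    s = x.mean(axis=1)                    # summarize the trial: (C,)
    logits = attn_w @ s                   # one logit per candidate kernel: (K,)
    a = np.exp(logits - logits.max())
    a /= a.sum()                          # softmax over kernels
    w = np.tensordot(a, kernels, axes=1)  # sample-specific weights: (C, C)
    return x + w @ x                      # residual connection

C, T, K = 22, 256, 4                      # channels, time samples, candidate kernels
x = rng.standard_normal((C, T))
kernels = rng.standard_normal((K, C, C)) * 0.05  # K candidate weight sets
attn_w = rng.standard_normal((K, C)) * 0.1
y = dynamic_residual(x, kernels, attn_w)
print(y.shape)                            # (22, 256)
```

Because the attention weights depend on the input, trials from different source subjects are routed through different effective parameters, which is the mechanism argued above to relieve multi-source conflicts.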

Table 6.

Classification performance (%) of the model with dynamic residual (DR) module and static residual (SR) module on Dataset IIa of BCI Competition IV

A01 A02 A03 A04 A05 A06 A07 A08 A09 Avg
DR 91.32 60.42 92.71 73.96 60.41 60.76 91.32 86.46 89.58 78.55
SR 86.81 59.03 90.97 63.89 58.69 56.94 88.54 83.68 88.54 75.23

Table 7.

Classification performance (%) of the model with dynamic residual (DR) module and static residual (SR) module on Dataset IIb of BCI Competition IV

B01 B02 B03 B04 B05 B06 B07 B08 B09 Avg
DR 84.06 63.57 63.13 98.13 95.31 90.00 84.38 95.63 91.56 85.08
SR 85.00 60.36 60.00 97.50 95.31 89.06 80.31 95.94 91.56 83.89

Discussion

Comparison with other methods

We conducted comparative experiments on the two datasets to demonstrate the advantages of the proposed approach.

Nine representative methods were compared, including traditional methods such as FBCSP [32], traditional transfer learning methods RA-MDRM [33] and EA-CSP-LDA [34], deep learning (DL) methods EEGNet [14] and ConvNet [13], the supervised domain adaptation method MI-DABAN [25], and unsupervised domain adaptation (UDA) methods DRDA [35], DAFS [36], and DAWD [23]. Results for the comparison models were obtained from their original papers.

Based on Dataset IIa, we compared the per-subject classification accuracy, the average accuracy, and the p-value. The results are shown in Table 8, where the highest accuracies are marked in bold. Our method is superior to most state-of-the-art methods, such as the DL methods EEGNet and ConvNet and the multi-source deep transfer learning methods DRDA and DAFS. Meanwhile, our method achieved the best classification performance for most subjects (i.e., A01, A03, A06, A07, A08, and A09). In addition, the p-values obtained by the paired t-test showed significant differences (p<0.05) between the proposed method and most of the comparison methods. Although there were no significant differences between our method and DAFS (p=0.11712) or DAWD (p=0.30469), the average accuracy of our method was 2.70% and 0.95% higher, respectively.
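As a sanity check on the significance tests, the paired t-statistic for our method versus FBCSP can be recomputed from the per-subject accuracies in Table 8 with only the standard library (the exact two-sided p-value additionally requires the t-distribution CDF, e.g. via scipy.stats.ttest_rel):

```python
from statistics import mean, stdev

# Per-subject accuracies on Dataset IIa, taken from Table 8
ours  = [91.32, 60.42, 92.71, 73.96, 60.42, 60.76, 91.32, 86.46, 89.58]
fbcsp = [76.00, 56.20, 81.25, 61.00, 55.00, 45.25, 82.75, 81.25, 70.75]

# Paired t-statistic: t = mean(d) / (sd(d) / sqrt(n)) on the per-subject differences
diffs = [a - b for a, b in zip(ours, fbcsp)]
n = len(diffs)
t = mean(diffs) / (stdev(diffs) / n ** 0.5)

# With n - 1 = 8 degrees of freedom, the two-sided 5% critical value is 2.306,
# so |t| > 2.306 indicates a significant difference, consistent with p < 0.05.
print(round(t, 2), t > 2.306)             # → 6.19 True
```

The statistic comfortably clears the 5% critical value, consistent with the very small p-value reported for FBCSP in Table 8.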

Table 8.

Comparison of classification performance (%) with state-of-the-art methods on dataset IIa of BCI competition IV

Methods Subjects Average Acc
(P-Value)
A01 A02 A03 A04 A05 A06 A07 A08 A09
FBCSP [32] 76.00 56.20 81.25 61.00 55.00 45.25 82.75 81.25 70.75 67.75 (0.00010)
RA-MDRM [33] 60.20 42.19 66.63 50.97 48.22 51.32 46.75 48.28 66.14 53.49 (0.00008)
EA-CSP-LDA [34] 69.50 40.25 83.01 51.61 38.20 46.58 53.25 68.88 56.12 56.37 (0.00003)
EEGNet [14] 79.86 58.68 89.93 64.93 63.19 58.68 64.24 73.61 77.08 70.22 (0.01040)
ConvNet [13] 76.39 55.21 89.24 74.65 56.94 54.17 92.71 77.08 76.39 72.53 (0.00638)
DRDA [35] 83.19 55.14 87.43 75.28 62.29 57.15 86.18 83.61 82.00 74.75 (0.00548)
DAFS [36] 81.94 64.58 88.89 73.61 70.49 56.60 85.42 79.51 81.60 75.85 (0.11712)
DAWD [23] 83.29 63.97 90.30 76.94 69.34 60.08 89.31 82.35 82.81 77.60 (0.30469)
MI-DABAN [25] 88.54 55.56 91.32 77.43 60.42 58.68 87.15 83.68 82.64 76.16 (0.02155)
Ours 91.32 60.42 92.71 73.96 60.42 60.76 91.32 86.46 89.58 78.55 (—)

The results on Dataset IIb are shown in Table 9. Our method achieved the highest average accuracy and was significantly superior to the traditional transfer learning and DL methods. Although there was no significant difference between our method and EEGNet (p=0.23849), DAFS (p=0.45259), DAWD (p=0.49128), or MI-DABAN (p=0.26774), the average accuracy of our method was slightly higher. In addition, EEGNet is a traditional supervised DL method that requires a large number of labelled samples from each subject. DAFS and DRDA require signals with high signal-to-noise ratios, which makes the preprocessing phase time-consuming and hinders future online applications. MI-DABAN is a supervised multi-source domain adaptation method that requires labelled target domains. In contrast, our method, as an unsupervised multi-source domain adaptation method, shows better performance and is more suitable for practical BCI applications.

Table 9.

Comparison of classification performance (%) with state-of-the-art methods on dataset IIb from BCI competition IV

Methods Subjects Average Acc
(P-Value)
B01 B02 B03 B04 B05 B06 B07 B08 B09
FBCSP [32] 70.00 60.36 60.94 97.50 93.12 80.63 78.13 92.50 86.88 80.00 (0.00357)
RA-MDRM [33] 73.33 59.17 47.50 85.00 60.00 57.50 54.17 59.17 65.83 62.41 (0.00085)
EA-CSP-LDA [34] 72.50 60.83 53.33 86.67 56.67 57.50 51.67 57.50 65.83 62.50 (0.00058)
EEGNet [14] 70.31 70.36 78.44 95.33 93.44 82.18 91.88 87.19 71.65 82.37 (0.23849)
ConvNet [13] 76.56 50.00 51.56 96.88 93.13 85.31 83.75 91.56 85.62 79.37 (0.00256)
DRDA [35] 81.37 62.86 63.63 95.94 93.56 88.19 85.00 95.25 90.00 83.98 (0.01118)
DAFS [36] 70.31 73.57 80.31 94.69 95.00 83.75 93.73 95.00 75.31 84.63 (0.45259)
DAWD [23] 84.66 66.57 68.04 96.78 94.32 82.61 88.47 93.96 90.10 85.06 (0.49128)
MI-DABAN [25] 84.03 63.43 62.50 98.61 94.44 88.19 86.81 94.79 90.63 84.83 (0.26774)
Ours 84.06 63.57 63.13 98.13 95.31 90.00 84.38 95.63 91.56 85.08 (—)

Visualization

To intuitively demonstrate the effectiveness of McAB and MDD, we randomly selected subjects A01 (from Dataset IIa) and B06 (from Dataset IIb) to visualise their category distributions. The results are shown in Figs. 5 and 6, where different coloured dots represent different MI tasks, as shown in the legends.

Fig. 5 t-SNE category distribution visualization with different ρ and K for subject A01 from Dataset IIa of BCI Competition IV. a ρ=2 and K=5. b ρ=4 and K=0. c ρ=4 and K=5

Fig. 6 t-SNE category distribution visualization with different ρ and K for subject B06 from Dataset IIb of BCI Competition IV. a ρ=5 and K=5. b ρ=6 and K=0. c ρ=6 and K=5

Three scenarios are presented in the visualised results. First, when the model introduces McAB but does not adopt an appropriate margin ρ (Figs. 5a and 6a), the classification boundary is very fuzzy and different categories are easily confused. Second, when we adopt the appropriate margin ρ without McAB (Figs. 5b and 6b), the class boundary ambiguity and class confusion are improved to a certain extent. Third, when McAB and the appropriate margin ρ are introduced simultaneously (Figs. 5c and 6c), samples of the same class are more tightly clustered and the margin between samples of different classes is larger, indicating that MDD can both improve class boundaries and reduce generalisation error. This suggests that an appropriate margin ρ has the greater effect on classification performance, and that performance can be further improved by combining MDD with McAB. Interestingly, in every case, the class distributions of the left-hand and right-hand MI tasks lie close to each other, while those of the foot and tongue tasks are more easily confused. This phenomenon may be related to the functional connections between the activated areas associated with the MI tasks.
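The role of the margin ρ discussed above can be illustrated with a simplified numpy version of the MDD auxiliary-classifier loss: the auxiliary classifier is pushed to agree with the main classifier's pseudo-labels on the source domain (a term weighted by ρ) and to disagree on the target domain. Batch sizes, logits, and the exact weighting are arbitrary assumptions for illustration, not the authors' training code:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mdd_aux_loss(src_main, src_aux, tgt_main, tgt_aux, rho):
    """Margin Disparity Discrepancy sketch: cross-entropy toward the main
    classifier's pseudo-labels on the source (weighted by the margin rho),
    plus a term rewarding disagreement with those pseudo-labels on the
    target; the feature extractor plays against this loss adversarially."""
    idx = np.arange(len(src_main))
    src_lbl = src_main.argmax(axis=1)                   # source pseudo-labels
    p_src = softmax(src_aux)[idx, src_lbl]              # aux prob. of agreeing (source)
    tgt_lbl = tgt_main.argmax(axis=1)
    p_tgt = softmax(tgt_aux)[idx, tgt_lbl]              # aux prob. of agreeing (target)
    eps = 1e-12
    return (rho * -np.log(p_src + eps).mean()           # agree on source
            - np.log(1.0 - p_tgt + eps).mean())         # disagree on target

B, n_cls, rho = 8, 4, 4.0                               # batch size, MI classes, margin
src_main, src_aux = rng.standard_normal((B, n_cls)), rng.standard_normal((B, n_cls))
tgt_main, tgt_aux = rng.standard_normal((B, n_cls)), rng.standard_normal((B, n_cls))
loss = mdd_aux_loss(src_main, src_aux, tgt_main, tgt_aux, rho)
print(np.isfinite(loss))                                # True
```

A larger ρ up-weights the agreement term, which in the full adversarial game corresponds to demanding a larger margin between classes, matching the tighter clusters seen in Figs. 5c and 6c.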

Limitations and future works

The proposed MSDCDA performed well in MI classification; however, some limitations were identified. Notably, the quality of the EEG signal, such as the signal-to-noise ratio, was not considered. Brain activation is influenced by both the measurement environment and individual factors, resulting in lower signal-to-noise ratios for some subjects. Low-quality EEG signals can hinder classification performance for target subjects or negatively affect the domain adaptation process, ultimately leading to reduced classification accuracy. Thus, future studies should explore effective methods for evaluating EEG signal quality, which can serve as a basis for subject selection. Moreover, while domain adaptation effectively addresses the challenges posed by small and unlabeled samples from target subjects, real-world scenarios may involve unknown target subjects’ data. Therefore, developing an MI-EEG classification system that can leverage existing subject data to achieve satisfactory performance for unknown subjects holds significant practical importance. Domain generalization represents another key area for further investigation.

Conclusion

In this study, we proposed a multi-source dynamic conditional domain adaptation network (MSDCDA). The multi-channel attention block extracts distinguishing features, representing appropriate channel activation levels corresponding to all MI tasks by introducing K channel attention sub-blocks. The dynamic residual block mitigates multi-source domain conflicts by dynamically adjusting network parameters to adapt to samples from different domains. In addition, the auxiliary classifier shares the same structure as the primary classifier and uses the MDD strategy to implement conditional alignment between the source and target domains through adversarial learning. Together, these components enhance the effectiveness of domain adaptation. Experimental results show that the proposed method effectively decodes the MI intention of target subjects using EEG signals from multi-source subjects, achieving competitive performance compared with other advanced methods. This study presents a novel unsupervised domain adaptation algorithm for decoding EEG-based MI signals, showcasing significant potential in the field of BCI.

Acknowledgements

The research was financially supported by the National Natural Science Foundation of China (No. 62173010). We would like to thank the provider of the datasets and all the people who have given us helpful suggestions and advice. The authors are also obliged to the anonymous reviewers and the editor for carefully looking over the details and for useful comments that improved this paper.

Biographies

Zhi Li

received his B.Sc. degree from Hubei University of Technology, Hubei, China, in 2022. He is currently pursuing his M.Sc. degree in Beijing University of Technology, Beijing, China. His research interests include brain-computer interface, deep learning, and transfer learning.

Mingai Li

received her B.Sc. degree and M.Sc. degree from Daqing Petroleum Institute, Heilongjiang, China, in 1987 and 1990, respectively, and Ph.D. degree from Beijing University of Technology, Beijing, in 2006. She is a professor at the Faculty of Information Technology, Beijing University of Technology. Her research interests mainly include brain computer interface, artificial intelligence, pattern recognition, and rehabilitation robots.

Yufei Yang

received her B.Sc. degree in electrical engineering and automation from Tianjin University of Technology and Education, Tianjin, China, in 2018, and received the M.Sc. degree from Beijing Technology and Business University, Beijing, China, in 2021. She is currently pursuing her Ph.D. degree in Beijing University of Technology, Beijing, China. Her research interests include brain computer interface and incremental learning.

Author contributions

Z.L. wrote down the entire manuscript, including all the pictures and tables. M.L. reviewed and revised the manuscript. Y.Y. reviewed the manuscript and put forward helpful suggestions.

Funding

This study was financially supported by the National Natural Science Foundation of China (No. 62173010).

Data availability

No datasets were generated or analysed during the current study.

Declarations

Ethics approval and consent to participate

Not applicable. The data used in this study are all from the datasets (Datasets IIa and IIb from BCI Competition IV) made public by the Institute for Knowledge Discovery from Graz University of Technology in Austria.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Song T, Zheng W, Song P, Cui Z (2018) EEG emotion recognition using dynamical graph convolutional neural networks. IEEE Trans Affect Comput 11(3):532–541
2. Liu S, Wang X, Jiang M, An Y, Gu Z, Li B, Zhang Y (2024) MAS-DGAT-Net: a dynamic graph attention network with multibranch feature extraction and staged fusion for EEG emotion recognition. Knowl Based Syst 305:112599
3. Cui R, Chen W, Li M (2024) Emotion recognition using cross-modal attention from EEG and facial expression. Knowl Based Syst 304:112587
4. Prabal DB, Tuncer T, Baygin M, Dogan S, Acharya UR (2024) N-BodyPat: investigation on the dementia and Alzheimer's disorder detection using EEG signals. Knowl Based Syst 304:112510
5. Samiee K, Kovács P, Gabbouj M (2017) Epileptic seizure detection in long-term EEG records using sparse rational decomposition and local Gabor binary patterns feature extraction. Knowl Based Syst 118:228–240
6. Li Y, Cui W-G, Huang H, Guo Y-Z, Li K, Tan T (2019) Epileptic seizure detection in EEG signals using sparse multiscale radial basis function networks and the Fisher vector approach. Knowl Based Syst 164:96–106
7. Yu X, Aziz MZ, Sadiq MT, Jia K, Fan Z, Xiao G (2022) Computerized multidomain EEG classification system: a new paradigm. IEEE J Biomed Health Inform 26(8):3626–3637
8. Vaid S, Singh P, Kaur C (2015) EEG signal analysis for BCI interface: a review. In: 2015 Fifth International Conference on Advanced Computing & Communication Technologies, pp 143–147. IEEE
9. Janapati R, Dalal V, Sengupta R (2023) Advances in modern EEG-BCI signal processing: a review. Mater Today Proc 80:2563–2566
10. Craik A, He Y, Contreras-Vidal JL (2019) Deep learning for electroencephalogram (EEG) classification tasks: a review. J Neural Eng 16(3):031001
11. Pfurtscheller G, Da Silva FHL (1999) Event-related EEG/MEG synchronization and desynchronization: basic principles. Clin Neurophysiol 110(11):1842–1857
12. Tang Z, Wang H, Cui Z, Jin X, Zhang L, Peng Y, Xing B (2023) An upper-limb rehabilitation exoskeleton system controlled by MI recognition model with deep emphasized informative features in a VR scene. IEEE Trans Neural Syst Rehabil Eng. 10.1109/TNSRE.2023.3329059
13. Schirrmeister RT, Springenberg JT, Fiederer LDJ, Glasstetter M, Eggensperger K, Tangermann M, Hutter F, Burgard W, Ball T (2017) Deep learning with convolutional neural networks for EEG decoding and visualization. Hum Brain Mapp 38(11):5391–5420
14. Lawhern VJ, Solon AJ, Waytowich NR, Gordon SM, Hung CP, Lance BJ (2018) EEGNet: a compact convolutional neural network for EEG-based brain-computer interfaces. J Neural Eng 15(5):056013
15. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1–9
16. Santamaria-Vazquez E, Martinez-Cagigal V, Vaquerizo-Villar F, Hornero R (2020) EEG-Inception: a novel deep convolutional neural network for assistive ERP-based brain-computer interfaces. IEEE Trans Neural Syst Rehabil Eng 28(12):2773–2782
17. Xu W, Wang J, Jia Z, Hong Z, Li Y, Lin Y (2022) Multi-level spatial-temporal adaptation network for motor imagery classification. In: ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing, pp 1251–1255. IEEE
18. Zhong X-C, Wang Q, Liu D, Liao J-X, Yang R, Duan S, Ding G, Sun J (2023) A deep domain adaptation framework with correlation alignment for EEG-based motor imagery classification. Comput Biol Med 163:107235
19. Li A, Wang Z, Zhao X, Xu T, Zhou T, Hu H (2023) MDTL: a novel and model-agnostic transfer learning strategy for cross-subject motor imagery BCI. IEEE Trans Neural Syst Rehabil Eng 31:1743–1753
20. Wei F, Xu X, Jia T, Zhang D, Xia W (2023) A multi-source transfer joint matching method for inter-subject motor imagery decoding. IEEE Trans Neural Syst Rehabil Eng 31:1258–1267
21. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in Neural Information Processing Systems 27
22. Ganin Y, Ustinova E, Ajakan H, Germain P, Larochelle H, Laviolette F, March M, Lempitsky V (2016) Domain-adversarial training of neural networks. J Mach Learn Res 17(59):1–35
23. She Q, Chen T, Fang F, Zhang J, Gao Y, Zhang Y (2023) Improved domain adaptation network based on Wasserstein distance for motor imagery EEG classification. IEEE Trans Neural Syst Rehabil Eng 31:1137–1148
24. Chen P, Gao Z, Yin M, Wu J, Ma K, Grebogi C (2021) Multiattention adaptation network for motor imagery recognition. IEEE Trans Syst Man Cybern Syst 52(8):5127–5139
25. Li H, Zhang D, Xie J (2023) MI-DABAN: a dual-attention-based adversarial network for motor imagery classification. Comput Biol Med 152:106420
26. Hong X, Zheng Q, Liu L, Chen P, Ma K, Gao Z, Zheng Y (2021) Dynamic joint domain adaptation network for motor imagery classification. IEEE Trans Neural Syst Rehabil Eng 29:556–565
27. Li Y, Yuan L, Chen Y, Wang P, Vasconcelos N (2021) Dynamic transfer for multi-source domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 10998–11007
28. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7132–7141
29. Brunner C, Leeb R, Müller-Putz G, Schlögl A, Pfurtscheller G (2008) BCI Competition 2008 - Graz data set A. Institute for Knowledge Discovery (Laboratory of Brain-Computer Interfaces), Graz University of Technology 16:1–6
30. Leeb R, Brunner C, Müller-Putz G, Schlögl A, Pfurtscheller G (2008) BCI Competition 2008 - Graz data set B. Graz University of Technology, Austria 16:1–6
31. Chen Y, Dai X, Liu M, Chen D, Yuan L, Liu Z (2020) Dynamic convolution: attention over convolution kernels. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 11030–11039
32. Ang KK, Chin ZY, Zhang H, Guan C (2008) Filter bank common spatial pattern (FBCSP) in brain-computer interface. In: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pp 2390–2397. IEEE
33. Zanini P, Congedo M, Jutten C, Said S, Berthoumieu Y (2017) Transfer learning: a Riemannian geometry framework with applications to brain-computer interfaces. IEEE Trans Biomed Eng 65(5):1107–1116
34. He H, Wu D (2019) Transfer learning for brain-computer interfaces: a Euclidean space data alignment approach. IEEE Trans Biomed Eng 67(2):399–410
35. Zhao H, Zheng Q, Ma K, Li H, Zheng Y (2020) Deep representation-based domain adaptation for nonstationary EEG classification. IEEE Trans Neural Netw Learn Syst 32(2):535–545
36. Phunruangsakao C, Achanccaray D, Hayashibe M (2022) Deep adversarial domain adaptation with few-shot learning for motor-imagery brain-computer interface. IEEE Access 10:57255–57265
