Cognitive Neurodynamics. 2024 May 3;18(5):2689–2707. doi: 10.1007/s11571-024-10115-y

An effective classification approach for EEG-based motor imagery tasks combined with attention mechanisms

Jixiang Li 1,2, Wuxiang Shi 1,2, Yurong Li 1,2
PMCID: PMC11564468  PMID: 39555298

Abstract

Electroencephalogram (EEG)-based motor imagery (MI) signals have received extensive attention because they can help people with disabilities control wheelchairs, automatic driving, and other activities. However, EEG signals are easily contaminated by muscle movements, wireless devices, power-line interference, and similar factors, which lowers the signal-to-noise ratio and degrades EEG decoding results. It is therefore crucial to develop a stable model for decoding MI-EEG signals. To address this issue and further improve decoding performance on MI tasks, this study develops a hybrid structure, named CBLSTM, that combines convolutional neural networks with a bidirectional long short-term memory (BLSTM) model to handle various EEG-based MI tasks. In addition, an attention mechanism (AM) model is adopted to adaptively weight the vital EEG features and strengthen the representations that benefit MI classification. First, CBLSTM extracts the spatial features and the time-series features from the preprocessed MI-EEG data. The AM model then mines more effective feature information, and the softmax function is used to recognize the intention categories. The numerical results show that the proposed model achieves an average accuracy of 98.40% on the public PhysioNet dataset with a faster training process, outperforming several other advanced models. An ablation experiment also verifies the effectiveness and feasibility of the developed model. Moreover, the established network model provides a good basis for applying brain-computer interfaces in rehabilitation medicine.

Keywords: Brain-computer interface, Intention recognition, Attention mechanism, Convolutional neural networks, Motor imagery

Introduction

In recent years, brain-computer interface (BCI) technology has developed rapidly. It gives patients with paralysis or spinal cord injury a direct way to communicate that does not depend on the peripheral nerves and muscles of the human body (Schalk et al. 2004; Li et al. 2023). BCI has become a research hotspot in brain information and biomedical engineering, and the decoding of motor imagery (MI) tasks has been a key research topic. In addition, a BCI system can convert human neuronal activity into electrophysiological signals, which makes it convenient for researchers to study the internal connection between brain activity and intentional behavior and stimulates applications in medical rehabilitation (Nandhini and Sangeetha 2023; Kumar and Sharma 2018; Keerthi Krishnan and Soman 2021). However, the electroencephalogram (EEG) is an extremely weak electrophysiological signal that is nonlinear and nonstationary. At present, EEG acquisition methods are mainly divided into invasive, semi-invasive, and non-invasive approaches. The invasive and semi-invasive approaches require implanting electrodes into the cerebral cortex to obtain the relevant signal features, and they are difficult to use widely because they are harmful to the human body. In contrast, non-invasive EEG acquisition, the most commonly used approach, is low-cost, convenient, and harmless. Unfortunately, brain signals acquired non-invasively contain a large number of artifacts, for example from the electrocardiogram (ECG), electromyography (EMG), and power lines (PL), so a clean EEG signal is hard to obtain. Therefore, because their signal-to-noise ratio (SNR) is lower than that of invasive and semi-invasive recordings, EEG signals collected non-invasively need numerous later-stage operations such as artifact removal (Al-Saegh et al. 2021; Wang and Li 2023; Altaheri et al. 2023).

MI is a branch of brain science research: imagining a specific movement causes changes in the related brain activity. In particular, event-related desynchronization and synchronization (ERD/ERS) are two distinct characteristics of EEG-based MI tasks (Graimann et al. 2009; Qin et al. 2024; Santos et al. 2023). When a subject performs the ipsilateral MI task, the EEG activity on the contralateral side clearly decreases, which is called the ERD phenomenon. When the ipsilateral motor imagery task is at rest, the EEG activity on the contralateral side rises, which is called the ERS phenomenon. ERD and ERS are important reference standards in MI research. Research on controlling BCI systems through MI tasks is gradually increasing. Numerous researchers have designed forward-looking applications by decoding EEG signals, notably robot and mouse control systems, wheelchair control systems, and mechanical arm control systems (Jiang et al. 2023; Hsu 2011; Lebedev and Nicolelis 2017).

Machine learning is increasingly used in medical data analysis, including artificial neural networks (Prisciandaro et al. 2023; Souza Jr et al. 2021), associative memory neural networks (Sun et al. 2023a, 2023b), and extreme learning machines (Wang et al. 2020). Machine learning has also been applied in the BCI field, and several earlier studies (Hsu et al. 2007; Lemm et al. 2005, 2006; Zhang et al. 2018a; Falzon et al. 2012) used classical machine learning to classify EEG-based MI tasks. However, these methods relied heavily on handcrafted features and extensive experience, and their recognition rates of only about 80% were insufficient for real-time BCI applications. Specifically, Hsu et al. (2007) used the discrete wavelet transform (DWT) to extract features from EEG signals and then classified the extracted features with linear discriminant analysis (LDA). Lemm et al. (2005, 2006) extended the common spatial pattern (CSP) method to the state space so that a frequency filter could be set independently for each channel, and decomposed the related EEG signals by independent component analysis (ICA). Irrelevant interference was removed according to the obtained weights, which significantly increased the proportion of the required signal component. Zhang et al. (2018a) employed a sparse regression approach to optimize the CSP features of EEG signals across various frequency bands and sliding windows of different sizes. Each band-specific signal was further divided into multiple segments by the sliding window method. Finally, a support vector machine (SVM) identified the MI tasks from the optimized EEG features. Falzon et al. (2012) proposed an analytic CSP (ACSP) algorithm for feature extraction, which combines the phase and amplitude information of the data and so provides a more comprehensive picture of the underlying activity than the traditional CSP method. As a result, the ACSP approach improved the classification accuracy.

Traditional machine learning approaches require complex feature extraction, which is time-consuming and often leads to lower recognition rates. Accurately decoding MI actions therefore remains challenging. In recent years, deep learning (DL) has attracted attention across many disciplines as a way to improve EEG feature recognition in classification tasks. DL can use raw EEG data directly, without manual preprocessing or feature extraction (Altaheri et al. 2022; Luo et al. 2018; Schirrmeister et al. 2017). Furthermore, as a modern network learning structure, DL has shown strong feature extraction capability and excellent performance in applications such as speech recognition (SR) (Tang et al. 2017), image recognition (Lee and Kwon 2017; Bird et al. 2021), and EEG recognition (Lawhern et al. 2018). Some researchers have also shown that DL plays a key role in accurately decoding brain activity, which makes it well suited to processing complex EEG signals. For instance, Tabar and Halici (2016) combined convolutional neural networks (CNN) with a stacked autoencoder (SAE) to classify two types of MI tasks. The short-time Fourier transform (STFT) was used to convert the EEG signals into two-dimensional time–frequency images, which were then fed to the CNN-SAE network for MI task recognition. The approach achieved a recognition accuracy of 77.6% on a public database, a 9% improvement over the winning method. Hou et al. (2020a) presented a graph convolutional neural network (GCNN)-based model to classify MI tasks across different groups and to cope with the inter-trial and inter-subject variability in EEG signals. The Pearson, adjacency, and Laplacian matrices of the overall features obtained by an optimal recurrent neural network (RNN)-based model were introduced in turn to represent the topological structure of the features. A promising accuracy of 94.64% was obtained in decoding MI-EEG signals, surpassing other advanced methods. Dai et al. (2020) proposed a mixed-scale CNN architecture with data augmentation for EEG-based MI classification, motivated by the differences in subjects' best convolution scales. The method achieved average classification accuracies of 91.57% and 87.6% on two commonly used datasets, exceeding other advanced algorithms. Li et al. (2022) used a CNN model to extract the spatial feature information of EEG signals and also considered the important feature information of the middle layer when classifying MI tasks. The fused model achieved a recognition rate of 87%, providing new ideas for feature extraction and classification in the MI-BCI field.

However, most previous studies focused on binary or three-class MI classification. Studies of multi-class MI recognition, such as those involving four or five tasks, remain relatively scarce. In addition, the recognition rate of some DL methods for decoding MI-based EEG signals was only about 80%, and the use of small-scale datasets with only a subset of channels may lose important features and reduce recognition performance. Because the decoding performance of previous studies still leaves room for improvement, an effective model is needed to enhance EEG-based MI decoding for multi-class tasks. Inspired by previous research, this work develops a novel architecture, the CBLSTM framework with a three-layer CNN and bidirectional long short-term memory (BLSTM), to improve the decoding of five actions: four MI tasks and one eyes-closed (EC) task. First, a three-layer CNN extracts spatial feature information from the processed two-dimensional (2D) EEG data. Then, the BLSTM framework extracts the temporal feature information related to MI tasks in the EEG signal. Because MI-EEG signals vary over time, an attention mechanism (AM) model is also introduced to highlight the most valuable feature information. Several comparative experiments are carried out as well. The results show that the proposed model achieves 98.40% accuracy in decoding the five actions, significantly higher than other advanced models.

The main contributions of this study are summarized as follows:

  1. A novel hybrid model combining CNN and BLSTM, namely CBLSTM, is proposed to learn the spatio-temporal features of EEG-based five-class tasks from the converted array representations.

  2. The adopted AM model assigns weights to important feature information to further mine the most relevant discriminative features of the network.

  3. Under the same hardware configuration, the proposed model achieves better decoding performance (98.40%) with less training time (11 h 35 min) than other models (97.36%, 13 h 30 min), demonstrating the effectiveness of the designed model.

The remainder of this paper is organized as follows: Section "Related works" describes the related network models. Section "Materials and methods" elaborates the overall framework of the developed model, including the data processing, the design idea, and a partial introduction to the neural network. Section "Results and discussion" provides the analysis and discussion of the experimental results. Section "Conclusion" summarizes the contributions and the future development trend of this research.

Related works

CNN model

CNN is a popular deep learning network inspired by biological vision mechanisms (Gu et al. 2018), and it is widely used in image recognition, power system fault diagnosis (PSFD), EMG signal detection, and other fields. A CNN has strong feature extraction capacity and can learn effective feature representations directly from data without complicated preprocessing. A CNN generally consists of three parts: convolutional layers for feature extraction, pooling layers for dimensionality reduction, and fully connected layers for deciding the category of the data. Its structure of local weight sharing gives CNN unique advantages in speech recognition and image processing and makes its layout closer to an actual biological neural network. Weight sharing also reduces the complexity of the network. In particular, multi-dimensional feature vectors can be fed directly into the network, which reduces the complexity of data reconstruction for feature extraction and classification. The collected EEG data generally exhibit strong temporal correlation but relatively weak spatial correlation. Although the data are recorded from 64 scalp channels, the relationship between channels is not strong. Therefore, extracting spatial features with a CNN model can improve MI decoding performance. Some works (Lu et al. 2019; Hershey et al. 2017) showed that CNN models are effective feature extractors and achieve good recognition results in several fields. For instance, Lu et al. (2019) combined a CNN with a long short-term memory (LSTM) framework for MI classification on a public four-class dataset, using a one-dimensional (1D) convolutional neural network and an LSTM network for feature extraction and task classification. The experimental results showed a good recognition rate. Hershey et al. (2017) used various CNN architectures to classify the soundtracks of a dataset of 70 M training videos (5.24 million hours) with 30,871 video-level labels. The results established that CNN models are effective not only on image data but also as robust feature extractors for audio tasks, particularly when larger training sets and label sets are used. For MI intention recognition, Wang et al. (2024) employed a fused multi-branch CNN framework for EEG-based MI classification and obtained accuracies of 78.82% and 68.41% on an open dataset in the subject-dependent and subject-independent modes, respectively, which confirmed that the framework was superior to other classification methods. Given these strong feature extraction capabilities, a CNN model is used in this paper to extract the vital spatial features for EEG-based MI task recognition.

BLSTM network model

EEG data are easily contaminated by artifacts, which makes the time-series features difficult to analyze. The RNN model is an effective way to extract the temporal features of EEG signals. Although the cell units of an RNN are similar to those of a feedforward neural network, their outputs are fed back to themselves through a backward connection (Hou et al. 2020a). RNNs handle time-series signals well, as shown in natural language processing (NLP) problems, and can therefore extract temporal feature information from EEG signals effectively. RNNs have also been applied successfully to SR, brain rhythm signal detection, machine translation (MT), etc. As a widely used RNN, the LSTM can also extract temporal features effectively when combined with other deep learning models for classifying MI tasks (Li et al. 2022). BLSTM, an advanced variant of the LSTM structure, is particularly effective at task recognition on time-series data compared with the LSTM. BLSTM applies an LSTM layer in each of the two directions of the sequence, which reduces the information omitted when the input is a time sequence. One layer reads the input sequence in its original order, and the other reads the reversed sequence. This gives the network additional context and lets it learn the feature representation more quickly and comprehensively. For this reason, BLSTM can better detect the long-term dependencies in the original EEG signal. The operation of the LSTM cell is described by Eqs. (1) to (6):

$$i_t = \sigma\left(W_{xi}^{T} x_t + W_{hi}^{T} h_{t-1} + b_i\right) \tag{1}$$
$$f_t = \sigma\left(W_{xf}^{T} x_t + W_{hf}^{T} h_{t-1} + b_f\right) \tag{2}$$
$$o_t = \sigma\left(W_{xo}^{T} x_t + W_{ho}^{T} h_{t-1} + b_o\right) \tag{3}$$
$$g_t = \tanh\left(W_{xg}^{T} x_t + W_{hg}^{T} h_{t-1} + b_g\right) \tag{4}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g_t \tag{5}$$
$$y_t = h_t = o_t \odot \tanh\left(c_t\right) \tag{6}$$

where $x_t$ is the input at time $t$; $i_t$, $f_t$, and $o_t$ are the activations of the input gate, forget gate, and output gate of the LSTM, respectively; $g_t$ is the extracted candidate feature; $c_t$ and $c_{t-1}$ are the memory states at the present and the previous moment, respectively; and $h_{t-1}$ is the output of the previous cell. $W_{xi}^{T}$, $W_{xf}^{T}$, $W_{xo}^{T}$, $W_{xg}^{T}$ and $b_i$, $b_f$, $b_o$, $b_g$ are the weights and biases of the different layers for storing, memorizing, and learning generalized models, respectively. $W_{hi}^{T}$, $W_{hf}^{T}$, $W_{ho}^{T}$, and $W_{hg}^{T}$ are the weight coefficients of $h_{t-1}$ for the input gate, forget gate, output gate, and feature extraction, respectively. Moreover, $\sigma$ is the nonlinear activation function, namely the sigmoid function used in the experiments, $\odot$ denotes element-wise multiplication, and $y_t$ is the final output.
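As a minimal sketch of Eqs. (1) to (6), the following NumPy code computes one LSTM time step. The dictionary layout of the weights, the toy dimensions, and the random initialization are assumptions made for illustration and are not details reported in this study; the products in Eqs. (5) and (6) are taken to be element-wise, as is standard for LSTM cells.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid used by the three gates in Eqs. (1)-(3)."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    """One LSTM time step following Eqs. (1)-(6).

    W_x, W_h and b are dicts keyed by "i", "f", "o", "g" (hypothetical layout).
    W_x[k] has shape (n_in, n_hidden), W_h[k] has shape (n_hidden, n_hidden).
    """
    i_t = sigmoid(W_x["i"].T @ x_t + W_h["i"].T @ h_prev + b["i"])  # Eq. (1)
    f_t = sigmoid(W_x["f"].T @ x_t + W_h["f"].T @ h_prev + b["f"])  # Eq. (2)
    o_t = sigmoid(W_x["o"].T @ x_t + W_h["o"].T @ h_prev + b["o"])  # Eq. (3)
    g_t = np.tanh(W_x["g"].T @ x_t + W_h["g"].T @ h_prev + b["g"])  # Eq. (4)
    c_t = f_t * c_prev + i_t * g_t                                  # Eq. (5)
    h_t = o_t * np.tanh(c_t)                                        # Eq. (6)
    return h_t, c_t


# Toy dimensions chosen only for the example.
rng = np.random.default_rng(0)
n_in, n_hidden = 4, 3
W_x = {k: rng.normal(scale=0.1, size=(n_in, n_hidden)) for k in "ifog"}
W_h = {k: rng.normal(scale=0.1, size=(n_hidden, n_hidden)) for k in "ifog"}
b = {k: np.zeros(n_hidden) for k in "ifog"}

h_t, c_t = lstm_step(rng.normal(size=n_in), np.zeros(n_hidden),
                     np.zeros(n_hidden), W_x, W_h, b)
print(h_t.shape, c_t.shape)
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate to 0, so the cell simply halves the previous memory state, which is a convenient sanity check for an implementation.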

The memory of the LSTM is controlled through three gate structures: the input gate, the forget gate, and the output gate. The input gate $i_t$ controls which part of the new information derived from the input $x_t$ is passed to the long-term state $c_t$. The forget gate $f_t$ determines which part of the long-term state should be discarded. The output gate $o_t$ controls which part of $c_t$ is exposed as output. The output $y_t$ is the so-called short-term state $h_t$. Through these gates, two states are stored. The long-term state passes through the LSTM unit from left to right: some memory is discarded at the forget gate, and new information from the input gate is added. The result is passed through the tanh activation function and filtered by the output gate to produce the short-term state $h_t$. In a BLSTM, the input sequence $x_t$ is fed to the forward LSTM unit from left to right, and the same sequence is reversed and fed to a second LSTM unit, the reverse LSTM. The BLSTM therefore produces two output vectors, and its final output carries more comprehensive information than that of a single LSTM.
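The bidirectional pass described above can be sketched as follows. To keep the example short, a plain tanh recurrent cell stands in for the LSTM cell, and the two output sequences are concatenated at each time step; both choices are assumptions for illustration, since the text does not state how the two output vectors are merged.

```python
import numpy as np


def run_direction(xs, W_x, W_h):
    """Run a simple tanh recurrent cell over xs of shape (T, n_in).

    This cell is a stand-in for an LSTM layer. Returns outputs of shape
    (T, n_hidden), one row per time step.
    """
    h = np.zeros(W_h.shape[0])
    outputs = []
    for x_t in xs:
        h = np.tanh(W_x.T @ x_t + W_h.T @ h)
        outputs.append(h)
    return np.stack(outputs)


def bidirectional(xs, fwd_params, bwd_params):
    """Process xs in original and reversed order, then merge per time step."""
    h_fwd = run_direction(xs, *fwd_params)
    # Reverse the input, run the second cell, then flip the outputs back so
    # that row t of both directions refers to the same time stamp.
    h_bwd = run_direction(xs[::-1], *bwd_params)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)


# Toy dimensions chosen only for the example.
rng = np.random.default_rng(0)
T, n_in, n_hidden = 5, 4, 3
xs = rng.normal(size=(T, n_in))
fwd = (rng.normal(scale=0.1, size=(n_in, n_hidden)),
       rng.normal(scale=0.1, size=(n_hidden, n_hidden)))
bwd = (rng.normal(scale=0.1, size=(n_in, n_hidden)),
       rng.normal(scale=0.1, size=(n_hidden, n_hidden)))

ys = bidirectional(xs, fwd, bwd)
print(ys.shape)
```

At the first time step the forward half has seen only the first sample while the backward half has already seen the whole sequence, which is the additional context the bidirectional design provides.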

To extract EEG features more effectively, a two-layer bidirectional recurrent neural network based on LSTM is adopted in our work. After spatial feature extraction, the EEG signals are passed to the two-layer bidirectional LSTM for temporal feature extraction. EEG signals have high temporal resolution but are easily contaminated during acquisition by artifacts such as blinks and muscle fatigue, which often results in a lower SNR. Consequently, a conventional LSTM tends to miss informative features during temporal feature extraction and yields poor recognition results. To reduce the omission of feature information and improve the intention recognition rate of EEG-based MI tasks, a three-layer convolutional network is designed to extract the spatial features of EEG signals, and the BLSTM model is adopted to extract the temporal features. Simulation results indicate that combining a three-layer CNN with a two-layer BLSTM yields a higher recognition rate for MI intentions than other methods, which supports the effectiveness of the BLSTM model adopted in this paper.

AM model

The AM model is widely used in fields such as MT, SR, and power line detection (PLD) (Huang et al. 2016; Chorowski et al. 2015). AM also shows good decoding performance in MI intention recognition. For example, Shi et al. (2023) proposed a classification method for EEG-based MI built on a hybrid neural network that combines spatio-temporal convolution with an AM model. The method achieved a global average classification accuracy of 83.3%, outperforming most existing methods. The reason is that the AM model can discriminate between inputs, as shown in MT and SR applications, where it assigns a different weight to each word in a sentence and makes the learning of the neural network more flexible. In the original EEG recordings, not all signals contribute equally to classification. Compared with other physiological datasets, the MI dataset contains long sequences of sampling points. There are also many EEG channels; 64 channels are commonly used to collect more feature information related to motor intention. Because the CNN model is used for feature extraction, many relevant features are obtained. However, a convolutional network shares its weights, which makes it difficult to judge the importance of individual pieces of information even though they influence the final result to different degrees. Therefore, the AM output $s_t$ is jointly trained as a weighted sum of the BLSTM outputs. The attention-based feature mining method is described by Eqs. (7) to (9):

$$u_t = \tanh\left(w_w y_t + b_w\right) \tag{7}$$
$$\alpha_t = \frac{\exp\left(u_t^{T} u_w\right)}{\sum_t \exp\left(u_t^{T} u_w\right)} \tag{8}$$
$$s_t = \sum_t \alpha_t y_t \tag{9}$$

where $u_t$ is the output of a fully connected (FC) layer that learns features of the BLSTM output $y_t$, and a softmax layer then outputs the probabilities of the detected tasks. Moreover, $w_w$ and $u_w$ are trainable weights and $b_w$ is a trainable bias; $s_t$ is the AM output, which is jointly trained as a weighted sum of the BLSTM outputs based on the attention weights. The weights $\alpha_t$ capture the major contributions to the spatio-temporal information.
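The attention pooling of Eqs. (7) to (9) can be sketched in NumPy as below, with tanh as the nonlinearity in Eq. (7). The function name `attention_pool`, the matrix orientation (rows are time steps), the toy dimensions, and the random parameters are assumptions for illustration only; in the trained model these parameters would be learned jointly with the rest of the network.

```python
import numpy as np


def attention_pool(Y, W_w, b_w, u_w):
    """Attention-weighted sum of BLSTM outputs.

    Y   : (T, d)   outputs y_t, one row per time step
    W_w : (d, d_a) projection weights, b_w : (d_a,) bias
    u_w : (d_a,)   context vector
    Returns the pooled vector s of shape (d,) and the weights alpha of shape (T,).
    """
    U = np.tanh(Y @ W_w + b_w)        # Eq. (7): u_t for every time step
    scores = U @ u_w                  # u_t^T u_w for every time step
    scores = scores - scores.max()    # numerical stability, alpha is unchanged
    alpha = np.exp(scores) / np.exp(scores).sum()   # Eq. (8)
    s = alpha @ Y                     # Eq. (9): weighted sum over time
    return s, alpha


# Toy dimensions chosen only for the example.
rng = np.random.default_rng(0)
T, d, d_a = 5, 6, 4
Y = rng.normal(size=(T, d))
W_w = rng.normal(scale=0.1, size=(d, d_a))
b_w = np.zeros(d_a)
u_w = rng.normal(size=d_a)

s, alpha = attention_pool(Y, W_w, b_w, u_w)
print(alpha.sum(), s.shape)
```

When the context vector is all zeros, every time step receives the same weight and the pooled vector reduces to the plain average of the BLSTM outputs; training moves the weights away from this uniform case toward the more informative time steps.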

In this paper, BLSTM is combined with the AM model to extract the MI-related temporal characteristics of EEG signals. Weights are assigned so that the features most conducive to classification are selected. Given the high complexity of EEG signals and the diversity of the collected features, features that discriminate poorly between classes can increase the complexity of model training and greatly reduce the classification accuracy. To reduce the training difficulty of the model and improve intention recognition, the adopted AM model not only reduces the weights of the less distinctive features, following the principle of random distribution of weights, but also significantly increases the amount of information carried by the weights. In this way, the adopted AM model extracts the important features related to the classification tasks. Furthermore, to show the advantages of the AM model, the network without the attention model is also evaluated for comparison, with all parameter values kept the same during training. Simulation results confirm that the model with the AM module achieves a better recognition rate than the model without it. Specifically, integrating an attention network is intended both to speed up the proposed network model and to mitigate the influence of uncorrelated EEG information on the classification of MI tasks. Therefore, the AM model is introduced in this paper to further improve the effectiveness of the model.

Materials and methods

Data description and processing

The data used in this paper come from the public EEG Motor Movement/Imagery Database (EEGMMIDB), sourced from PhysioNet (Goldberger et al. 2000), which includes 109 subjects. EEG was recorded from 64 electrode channels at a sampling frequency of 160 Hz using BCI2000 equipment. Five states of EEG activity were recorded: eyes closed (without any imagery task) and imagining opening and closing the left fist, the right fist, both fists, and both feet without actual movement. Each subject completed 14 experiments: two one-minute baseline experiments (one with eyes open and the other with eyes closed) and three two-minute motor imagery experiments for each of the four tasks. The EEG acquisition was carried out in a well-closed laboratory. All participants sat quietly and comfortably in front of the EEG acquisition equipment to avoid other physiological activity during acquisition. The eyes-closed condition of all participants was first recorded as the benchmark. The remaining MI actions were then repeated three times, giving more than 1,500 recordings. The data from Subject No. 89 were excluded because they deviated markedly from those of the other participants. Consequently, a dataset comprising 108 subjects is used for the classification of MI tasks.

The topological layout of the 64 electrodes of the EEG-based MI signal acquisition device is displayed in Fig. 1. MI-EEG signals are collected according to the present experimental paradigm, and each electrode records brain activity with high temporal resolution at every timestamp (Yilmaz et al. 2020).

Fig. 1. The position distribution of the 64-electrode topology

Moreover, the EEG signals collected from all electrodes are one-dimensional (1D) time-series signals. Many studies therefore consider only the 1D temporal characteristics of EEG signals and give little consideration to the changes caused by their spatial structure. In addition, the electrode layout shows that each channel has two adjacent channels (front and back, or left and right). Because of this layout, the collected EEG signals, although acquired as time series, carry certain spatial information. To analyze the collected EEG signals effectively, a 50 Hz filter is applied to attenuate the power line interference. The collected EEG signals, interpreted as 1D feature vectors, are then transformed so that they contain both temporal and spatial features, and the converted signals are fed to the proposed model for training and testing. Specifically, the 1D EEG vector $r_t = [s_t^1, s_t^2, \ldots, s_t^n, \ldots, s_t^{64}]^T$ is arranged into a two-dimensional matrix according to the actual positions of the 64 electrodes and used as the input of the developed model. Here $s_t^n$ is the feature of the $n$-th electrode at time $t$, $n = 1, \ldots, 64$ indexes the electrodes, and $T$ denotes the transpose. As illustrated in Fig. 2, $m_t$ denotes the converted 2D matrix. Positions without EEG electrodes are filled with zeros, which yields a matrix with 10 rows and 11 columns. Z-score normalization is applied to standardize the data, which also helps accelerate model training.
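A minimal sketch of this 1D-to-2D conversion is given below. The grid coordinates in `positions` are hypothetical and use a 3 × 3 toy grid with four electrodes; the actual study places 64 electrodes on a 10 × 11 grid following the layout in Fig. 2. Applying the Z-score to each electrode vector before placing it on the grid is also an assumption, since the text does not specify over which axis the normalization is computed.

```python
import numpy as np


def zscore(v):
    """Z-score normalization: zero mean and unit standard deviation."""
    return (v - v.mean()) / v.std()


def to_grid(r_t, positions, shape):
    """Place the electrode readings r_t (n,) on a 2D grid.

    positions[k] is the (row, col) cell of electrode k. Cells that hold no
    electrode keep the value zero.
    """
    m_t = np.zeros(shape)
    idx = np.asarray(positions)
    m_t[idx[:, 0], idx[:, 1]] = r_t
    return m_t


# Hypothetical toy layout: four electrodes arranged in a cross on a 3 x 3 grid.
positions = [(0, 1), (1, 0), (1, 2), (2, 1)]
r_t = np.array([1.0, 2.0, 3.0, 4.0])   # one time sample, one value per electrode

m_t = to_grid(zscore(r_t), positions, shape=(3, 3))
print(m_t)
```

For the full dataset the same function would be called with 64 positions and `shape=(10, 11)`, once per time sample, producing the sequence of matrices $m_t$.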

Fig. 2. The 2D data structures transformed from 1D time-series signals

Because EEG is a non-stationary signal with many artifacts, the information carried by EEG signals differs from one time to another. A simple time-domain representation cannot fully reflect the characteristics expressed by EEG signals. For this reason, the 2D matrix form is adopted to represent EEG sequences with temporal and spatial characteristics simultaneously. Moreover, to better express the classification features of EEG-based MI tasks, a sliding window is applied during data preprocessing to obtain a series of segments, which serve as inputs to the proposed model. To preserve more effective classification features, the overlap between consecutive windows is set to 50%. The resulting 2D data segments are defined as $D_j = [m_t, m_{t+1}, \ldots, m_{t+p-1}]$, where $p$ is the size of the time window, $t$ is the time stamp, and $j = 1, 2, 3, \ldots, q$ indexes the data segments obtained in the period. Figure 3 illustrates the conversion of 1D time-series signals into 2D data segments. This approach preserves the temporal features of the data while capturing the spatial characteristics inherent in the original EEG signals, which significantly enhances the recognition of MI tasks.
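The segmentation with 50% overlap can be sketched as follows. The window size `p=10` and the number of frames are hypothetical values chosen for the example, as the window length is not stated in this section; the function assumes the recording contains at least `p` frames.

```python
import numpy as np


def sliding_windows(frames, p, overlap=0.5):
    """Split frames of shape (T, h, w) into segments of p consecutive frames.

    Consecutive segments share a fraction `overlap` of their frames.
    Returns an array of shape (q, p, h, w), where q is the number of segments.
    """
    step = max(1, int(round(p * (1.0 - overlap))))
    starts = range(0, frames.shape[0] - p + 1, step)
    return np.stack([frames[s:s + p] for s in starts])


# 40 frames of a 10 x 11 grid, filled with increasing numbers for clarity.
frames = np.arange(40 * 10 * 11, dtype=float).reshape(40, 10, 11)
segments = sliding_windows(frames, p=10)
print(segments.shape)
```

With 40 frames, a window of 10 frames, and a step of 5 frames, the windows start at frames 0, 5, ..., 30, so each segment shares its second half with the first half of the next one.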

Fig. 3. The whole preprocessing of EEG signals. The original 1D time-series signal is converted into a 2D matrix according to the channel distribution, and the sliding window increases the number of data fragments from the same trial

Overall structure of proposed network model

In this paper, the CBLSTM model, which combines CNN and BLSTM, is developed to mine as much effective feature information from MI-EEG tasks as possible and improve the classification accuracy. Because most studies extract only spatial or only temporal features, the one-dimensional time-series data collected from 64 electrodes are first transformed into 2D grids during preprocessing. The sliding window then produces a series of 2D segments containing both spatial and temporal feature information, which are fed into the developed network model. In addition, to highlight the effective components and speed up the training of the model, the AM module is adopted to increase the weight of effective information. Five tasks (T0–T4) are classified: one EC task, which is not a valid MI task, and four MI tasks. Table 1 lists the task types represented by T0–T4. The overall structure is displayed in Fig. 4.

Table 1.

Symbolic representation for different tasks

Symbolic representation | Corresponding task
T1 | Left fist (MI task)
T2 | Right fist (MI task)
T3 | Both fists (MI task)
T4 | Both feet (MI task)
T0 | Eyes closed (invalid MI task)

Fig. 4.


The overall structure of the proposed network model

In the developed network model, the data segments obtained by the sliding window are used as the input of the network, namely input $= D_j=[m_t, m_{t+1}, \ldots, m_{t+p-1}] \in \mathbb{R}^{p\times h\times w}$, in which $h$ and $w$ represent the height and width of the 2D array and $p$ represents the size of the sliding window. First, a three-layer CNN is employed to extract spatial features from the EEG signals; the three convolutional layers share the same kernel size. Then, the BLSTM model extracts the temporal characteristics of the MI-related tasks, and the AM model further increases the weight of effective feature information. Finally, a fully connected layer passes the extracted features to the softmax layer for classification. In this work, T1, T2, T3, T4, and T0 represent the five classification tasks. The proposed network model is mainly used to decode these MI tasks and obtain the recognition rate of intentions.

The idea behind the design is that the processed 2D EEG signals, which carry spatio-temporal information, resemble a large number of images, and the CNN model is well suited to extracting the spatial features they contain. Although a pooling operation can reduce the dimension of the data, it also discards a lot of relevant information. Hence, no pooling layer is used in the proposed model. The number of convolution kernels is 32, 64, and 128 for the three layers in turn; the kernel size is 3 × 3 and the stride is 1. Apart from the number of kernels, all other parameters are the same across layers. Previous research showed that this setting achieved the best effect and that the features extracted by the convolution layers could be fed directly to long short-term memory (LSTM) (Zhang et al. 2018b), which can then extract time-dependent features directly from the CNN output. To obtain better features for intention recognition of MI tasks, the BLSTM strategy is adopted in this paper. The cell size of each LSTM layer is set to 512, and the attention layer optimizes the weights so that the relevant characteristic information is emphasized. The vital features are then passed through the fully connected layer to the softmax layer for intention recognition, which yields the predicted probability of each category.

Moreover, zero-padding is added in each convolution layer so that the feature maps keep the same size as the inputs, which reduces the loss of information at the borders. As a result, each feature map has the same size as the input feature map, namely h × w. The number of convolution kernels is 32 in the first layer and doubles in each subsequent layer, so after the third convolution there are 128 feature maps containing high-level features related to MI tasks.
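The effect of zero-padding on the feature-map size can be checked with simple arithmetic. The sketch below traces the three convolution layers under stated assumptions: a single-channel 10 × 11 input frame (the grid size given in the Discussion), 3 × 3 kernels, stride 1, and one bias per filter. The helper names `conv_out` and `trace_cnn` are illustrative only.

```python
def conv_out(n, kernel=3, stride=1, pad=1):
    """Output length along one axis for a convolution with `pad` zeros on each side."""
    return (n + 2 * pad - kernel) // stride + 1

def trace_cnn(h=10, w=11, in_channels=1, filters=(32, 64, 128), kernel=3):
    """Return the output shape and parameter count of each convolution layer."""
    pad = (kernel - 1) // 2  # padding that keeps the size unchanged for odd kernels and stride 1
    layers = []
    for out_channels in filters:
        h, w = conv_out(h, kernel, 1, pad), conv_out(w, kernel, 1, pad)
        # Weights per filter plus one bias term (the bias is an assumption of this sketch).
        params = (kernel * kernel * in_channels + 1) * out_channels
        layers.append({"shape": (h, w, out_channels), "params": params})
        in_channels = out_channels
    return layers

layers = trace_cnn()
for index, layer in enumerate(layers, start=1):
    print(index, layer["shape"], layer["params"])
```

With padding the spatial size stays at 10 × 11 through all three layers, whereas `conv_out(10, pad=0)` would shrink a 10-row input to 8 rows after a single unpadded 3 × 3 convolution.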

Results and discussion

Model training and testing

In this paper, the developed CBLSTM framework with the AM strategy is adopted to classify EEG-based MI tasks. The model is trained and verified on a public physiological data set. To increase the discrimination ability of the proposed model and reduce the interference of redundant information, the AM module is adopted to highlight the features relevant to MI tasks. The evaluation flow chart of the whole framework is shown in Fig. 5; it contains three main modules: data processing, model training and model testing.

Fig. 5.


The flow chart of the proposed network model for MI tasks recognition

Before training and testing the proposed network, the preprocessed 2D data segments are split by percentage: 75% of the data segments are selected as the training set, and the remaining 25% form the test set. The number of data segments from the public data set is expanded by the sliding-window operation, which helps to obtain a robust model. The window length is set to 62.5 ms, which gave the best performance; other values considerably degraded the training and testing performance of the model. This corresponds to a window size of p = 10 samples at the 160 Hz sampling rate of the data set, used with a 50% overlap between windows. In addition, stochastic gradient descent with the Adam update rule (Kingma and Ba 2014) is adopted to minimize the cross-entropy loss. Both the batch size and the number of training epochs are set to 300. The learning rate is set to 1e-4 via grid search. The softmax function maps the network output to the prediction probability $y_{p,m}$, as shown in Eq. (10):

$$y_{p,m}=\frac{e^{y_m}}{\sum_{i=1}^{T} e^{y_i}} \qquad (10)$$

where $y_{p,m}$ represents the prediction probability of class $m$, $y_m$ is the $m$-th network output, $m, i = 1, 2, \ldots, T$, and $T$ represents the total number of classes. In this work, $T = 5$. The cross-entropy loss function is calculated by Eq. (11):

$$L=-\sum_{m=1}^{T} y_{c,m}\log\left(y_{p,m}\right) \qquad (11)$$

where $y_{c,m}$ is the label indicator for intention class $m$ (1 for the true class and 0 otherwise).
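A small numerical sketch of the softmax output and the cross-entropy loss may help to make the two formulas concrete. It assumes one-hot labels, and the logits are made-up values for T = 5 classes; the max-shift inside `softmax` and the small `eps` inside the logarithm are standard numerical safeguards added for this example.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Per-sample loss: -sum_m y_true[m] * log(y_prob[m]), with y_true one-hot."""
    return -np.sum(y_true * np.log(y_prob + eps), axis=-1)

logits = np.array([[2.0, 1.0, 0.1, 0.0, -1.0]])   # made-up network outputs for 5 classes
y_true = np.array([[1.0, 0.0, 0.0, 0.0, 0.0]])    # one-hot label: the first class is correct
probs = softmax(logits)
loss = cross_entropy(y_true, probs)
print(probs.round(3), loss)
```

Because the label is one-hot, the loss reduces to the negative log-probability assigned to the true class, so it shrinks toward zero as that probability approaches 1.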

Furthermore, dropout with a rate of 0.5 is applied in the FC layer to mitigate over-fitting during model training. The number of hidden neurons in both the BLSTM and FC layers is set to 512*2, where “*” stands for replication. The detailed network parameters are listed in Table 2. The network model is trained and tested in a Python 3.7 environment with TensorFlow 1.13, on a machine with 16 GB of RAM, an Intel Core i7-9700 processor clocked at 3.0 GHz, and an NVIDIA GTX 2080Ti GPU.

Table 2.

Hyper-parameter settings of the proposed CBLSTM model

Hyper-parameter | Setting
Filters | 32, 64, 128
FC_size | 512*2
Dropout rate | 0.5
Epoch | 300
Batch_size | 300
Learning_rate | 1e-4
BLSTM_size | 512*2
Attention_size | 8
Activation function | ELU, sigmoid, tanh
Optimization algorithm | Adam

Models comparison

In this paper, several advanced classification models for MI tasks are summarized to assess the performance of the proposed framework. The related literature and basic algorithm models are listed for comparative analysis. The impact of incorporating an AM on intention recognition is also investigated. All the compared models are based on the same public data set. The comparative models are introduced below.

Advanced models

  1. Alomari et al. (Alomari et al. 2014) employed collected EEG signals from C3, C4 and Cz channels. DWT method was adopted for feature extraction; SVM classifier was employed for final binary classification. The experimental results showed that the intention decoding rate of the proposed method achieved 98% on both fists and both feet MI tasks.

  2. Sita et al. (Sita and Nair 2013) utilized Gaussian weighting to extract EEG features of corresponding tasks through ICA. Moreover, LDA and fast discriminant analysis (FDA) were used to classify the MI actions. The results showed that the framework developed for this research achieved an average recognition accuracy of 87.24% for right fist, left fist, and both feet MI tasks.

  3. Hou et al. (Hou et al. 2020b) solved the forward and inverse problems of EEG signals using the boundary element method (BEM) and weighted minimum norm estimation (WMNE) of EEG source imaging (ESI) technology, respectively. Afterwards, ten scouts were constructed in the motor cortex to select the region of interest (ROI), and the important features were extracted from the scout time series. Finally, a CNN was employed to classify MI tasks. The average recognition rate was 94.50% for left fist, right fist, both fists and both feet tasks using data from 10 subjects.

  4. Zhang et al. (Zhang et al. 2017) built a seven-layer deep recurrent neural network model specifically for classifying original EEG signals. Furthermore, an orthogonal array experimental approach was also integrated for optimal hyperparameter selection, which significantly enhanced the EEG-based intention recognition capabilities of developed model. Finally, the average decoding rate 95.53% was obtained for eye closed, left fist, right fist, both fists and both feet MI tasks.

  5. Chen et al. (Chen et al. 2018) pointed out that the EEG signal specific frequency was beneficial to detect the intentional activity of the brain. For this reason, the collected EEG signals were decomposed into different frequency bands and imported into the deep RNN to realize these five classification tasks including eye closed, left fist, right fist, both fists and both feet MI. The experimental findings demonstrated that the proposed algorithmic model attained an average recognition rate of 97.86%. However, the spatial characteristics of MI-EEG signals were not considered.

  6. Zhang et al. (Zhang et al. 2019) employed the CNN model to enhance selective attention and automatically extract the EEG signal feature directly from the original acquisition. This approach led to the development of a versatile recognition framework, tailored to meet the practical demands of BCI applications across various scenarios. The experimental outcomes revealed that the proposed algorithm model achieved a notable accuracy of 96.32%. This research was accomplished by utilizing data from 20 subjects engaged in MI tasks involving eye closed, left fist, right fist, both fists and both feet.

  7. Zhang et al. (Zhang et al. 2018b) built a deep model combining CNN and LSTM to learn spatio-temporal representations. Exploiting the advantages of both CNN and LSTM, the framework identified five MI tasks from the original EEG signals. The model achieved an average recognition result of 97.34% using data from 108 subjects for eyes closed, left fist, right fist, both fists and both feet MI tasks.

  8. Yang et al. (Li et al. 2020) proposed a model training and classification scheme based on CNN and a gated recurrent unit (GRU). In this framework, the spatial features were extracted using the CNN, and the temporal features were extracted via the GRU model. Afterwards, the softmax layer performed intention recognition of MI tasks. The framework obtained an average accuracy of 97.36% for eyes closed, left fist, right fist, both fists and both feet task classification.

  9. Huang et al. (Huang et al. 2022) developed a novel DL model based on the EEG signals to enhance MI classification performance, namely local reparameterization trick into convolutional neural networks (LRT-CNN). Moreover, a global classifier was also evaluated for five MI tasks by different groups on PhysioNet Dataset. At last, the proposed method achieved the average accuracy 92.41% for eye closed, left fist, right fist, both fists and both feet tasks classification.

  10. Mammone et al. (Mammone et al. 2023) proposed a method named autoencoder-filter bank common spatial patterns (AE-FBCSP) to decode MI tasks from EEG. First, features were extracted from the EEG signal by FBCSP. Then, the AE encoder was trained to transfer the features to the fully connected layer for intention recognition. The framework achieved a recognition rate of 83.86% using 105 subjects in five types of MI task recognition, namely right fist, left fist, both fists and both feet tasks.

  11. Huang et al. (Huang et al. 2023) proposed a convolutional sliding window-attention network (CSANet), which includes novel spatiotemporal convolution, sliding window and two-stage attention blocks. First, the spatiotemporal convolution was used to extract MI-EEG features. Afterwards, the output feature sequence was fed into the sliding window blocks to further extract the local and global context information of the feature sequences. Finally, the extracted features were adaptively selected in the attention block and fed to two FC layers with softmax activation for classification. The framework achieved an average recognition rate of 92.36% for right fist, left fist, both fists and both feet MI tasks.

Baseline models and attention model

To explore the effectiveness of the proposed network framework, basic network structures, namely 2D-CNN, RNN and 2D-CNN + RNN, are also compared. Because a 2D-CNN extracts spatial features well and an RNN extracts temporal information, the model designed in this paper combines the two kinds of model to extract spatial-temporal information. Although the basic 2D-CNN takes a 2D EEG grid as input without any sliding window operation in the processing stage, the basic models keep the same parameter settings and structure as the proposed model. In addition, this paper examines whether the AM module improves EEG intention decoding, with the rest of the CBLSTM structure unchanged. To ensure a fair comparison, the baseline models, the AM model and the proposed framework are all tested under identical conditions using all channels. All experiments use the public EEGMMIDB data set from PhysioNet (Goldberger et al. 2000).

Comparison of results

Overall performance

In this paper, a new hybrid CBLSTM network model is developed, and the AM model is adopted to increase the effective weight coefficients of the BLSTM. First, the acquired time-series signals are transformed into image-like 2D arrays and segmented with a sliding window. These 2D arrays then serve as the input of the network model. Subsequently, a three-layer CNN extracts spatial features, while the BLSTM network extracts the temporal aspects of the EEG signals. To amplify critical information, the AM model increases the weighting of valuable features. The experimental results show that the hybrid CBLSTM model achieves a classification accuracy of 98.40% for MI-EEG intention recognition, outperforming many advanced models and the basic models. Table 3 provides a detailed comparison of these network models.

Table 3.

Comparison between the model proposed in this paper and others on the public EEGMMIDB dataset

Index | Work | Tasks | Participants | Electrodes | Training time | ACC (%) | p-Value
1 | Alomari et al. (2014) | 2 | 100 | 3 | - | 74.90 | -
2 | Sita et al. (2013) | 3 | 30 | 64 | - | 87.24 | -
3 | Hou et al. (2020b) | 4 | 10 | 64 | - | 94.50 | -
4 | Zhang et al. (2017) | 5 | 10 | 64 | - | 95.53 | -
5 | Chen et al. (2018) | 5 | 10 | 64 | - | 97.86 | -
6 | Zhang et al. (2019) | 5 | 20 | 64 | - | 96.32 | -
7 | Zhang et al. (2018b) | 5 | 108 | 64 | 13h45min | 97.34 | 0.018
8 | Yang et al. (2020) | 5 | 108 | 64 | 13h30min | 97.36 | 0.021
9 | Huang et al. (2022) | 5 | 109 | 17 | - | 92.37 | -
10 | Mammone et al. (2023) | 5 | 105 | 36 | - | 83.86 | -
11 | Huang et al. (2023) | 4 | 109 | 18 | - | 92.36 | -
12 | 2DCNN | 5 | 108 | 64 | 14h15min | 90.36 | 0.017
13 | RNN | 5 | 108 | 64 | 15h20min | 86.78 | 0.035
14 | 2DCNN + RNN | 5 | 108 | 64 | 14h50min | 94.51 | 0.023
15 | This work | 5 | 108 | 64 | 11h35min | 98.40 | -

Some of these studies focus on simpler scenarios. For example, (Alomari et al. 2014) and (Sita and Nair 2013) addressed binary and three-class problems, respectively, while (Hou et al. 2020b) and (Huang et al. 2023) focused on four categories of MI tasks. The studies also differ in the number of subjects. The experiments show that our results still exceed those of the other methods. Compared with methods #2 to #6, this paper uses a large data set that includes 108 subjects and 630,104 generated data fragments. In (Zhang et al. 2018b), a framework of three-layer CNN + two-layer LSTM was adopted, which addressed the low spatial resolution of EEG by retaining spatial information. We reproduced their work using their public code, and the resulting classification accuracy on the data set was 97.34%. Inspired by this work, a three-layer CNN and a two-layer BLSTM are built to classify the MI actions in our work, and the decoding rate improves. When the AM model is introduced, the overall training time of the model also decreases. Therefore, the developed CBLSTM combined with the AM module achieves a good recognition rate, which supports practical application in the field of BCI.

A comparison with the models in Table 3 shows that the proposed CBLSTM model combined with the AM structure achieves an average intention recognition accuracy of 98.40%. Compared with the structures of Zhang et al. (2018b) and Yang et al. (Li et al. 2020), the classification results are higher by 1.06 and 1.04 percentage points, respectively. Compared with the basic 2D-CNN, RNN and 2D-CNN + RNN structures, the decoding results are improved by 8.04, 11.62 and 3.89 percentage points, respectively. To evaluate the effectiveness of the model, the experiments were repeated several times. The results obtained are better than the best results reported in all the compared articles and those of the base models. Moreover, a paired t-test shows that the proposed method differs significantly from the other methods: for the reproduced models (Zhang et al. 2018b; Li et al. 2020) and the baseline methods evaluated through cross-validation, the p-value is below 0.05. Hence, the proposed scheme has a better decoding effect.

For the studies that trained on a subset of channels, namely Huang et al. (2022), Mammone et al. (2023) and Huang et al. (2023), the recognition results are 92.37%, 83.86% and 92.36%, respectively, all lower than that of our model. Moreover, Huang et al. (2023) classified only four categories, while our model recognizes five types of EEG activity. In detail, the accuracy of the proposed model is higher by about 23.5 percentage points than that of the model of Alomari et al. (2014), by 11.16 than that of Sita and Nair (2013), by 3.9 than that of Hou et al. (2020b), by 2.87 than that of Zhang et al. (2017), by 0.54 than that of Chen et al. (2018), and by 2.08 than that of Zhang et al. (2019). According to (Huang et al. 2022), although Huang et al. used 18-channel data to study five kinds of intention recognition, the recognition rate was only 92.37%, which is 6.03 percentage points lower than that of the model developed in this study. Mammone et al. (2023) used the data of 105 subjects with the number of channels reduced by nearly half; the recognition rate was correspondingly lower, at 83.86%, which the proposed model exceeds by 14.54 percentage points. Huang et al. (2023) used a subset of channels for four types of tasks and obtained 92.36%; our model is higher by 6.04 percentage points while recognizing five types of intention.

In terms of training cost, on the same large data set and in the same hardware environment, the training time of the proposed model is 11h35min, whereas the reproduced models of Zhang et al. (2018b) and Yang et al. (Li et al. 2020) take 13h45min and 13h30min, respectively. The reproduced model of Yang et al. reaches 97.36%, so the proposed model improves the decoding rate by 1.04 percentage points while shortening the training time by about 2 h. These comparisons support the effectiveness and feasibility of the developed model for intention recognition of MI tasks. Fig. 6 compares the training loss of the developed model with that of the model proposed by Yang et al. (Li et al. 2020). The loss of the network built in this paper decreases faster than that of the model of Yang et al., which indicates that our model trains more efficiently.

Fig. 6.


Comparison of training loss over training iterations between our developed model and the model of Yang et al. (Li et al. 2020)

To further assess the performance of the developed framework, other indicators such as Recall, Precision and F1-score are used to evaluate the trained model, as given in Eqs. (12) to (14).

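$$\mathrm{Recall}=\frac{TP}{TP+FN} \qquad (12)$$
$$\mathrm{Precision}=\frac{TP}{TP+FP} \qquad (13)$$
$$F1=\frac{2}{1/\mathrm{Precision}+1/\mathrm{Recall}} \qquad (14)$$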

where TP represents the true positives, namely samples whose predicted class is the same as the real label; FP represents the false positives; and FN represents the false negatives, namely samples of a class whose predicted value differs from the real label.
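For a multi-class problem, these quantities are usually computed per class from a confusion matrix. The following sketch shows one way to do so; the 3 × 3 matrix is a made-up toy example rather than data from the paper, and the function assumes that every class has at least one correct prediction so that no division by zero occurs.

```python
import numpy as np

def per_class_metrics(cm):
    """cm[i, j] = number of samples with true class i that were predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                 # correctly predicted samples of each class
    fn = cm.sum(axis=1) - tp         # samples of the class predicted as something else
    fp = cm.sum(axis=0) - tp         # samples of other classes predicted as this class
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2.0 / (1.0 / precision + 1.0 / recall)
    return recall, precision, f1

# Toy confusion matrix with 10 samples per true class.
cm = np.array([[9, 1, 0],
               [2, 6, 2],
               [0, 1, 9]])
recall, precision, f1 = per_class_metrics(cm)
print(recall, precision, f1)
```

For the middle class, recall is 6/10 and precision is 6/8, so the harmonic-mean form of the F1-score gives 2/3.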

The evaluation results show that the result for each category is more than 97% on every index, as shown in Table 4. The mean values of the four evaluation indexes, Recall, Precision, Test-auc and F1-score, are 98.11%, 98.39%, 98.85% and 98.25%, respectively. This indicates that the developed model has a certain robustness and feasibility.

Table 4.

Evaluation indexes: Recall, Precision, Test-auc and F1-score

Index/Class | Task 1 | Task 2 | Task 3 | Task 4 | Task 5 | Mean
Recall (%) | 99.81 | 97.58 | 98.20 | 97.69 | 97.27 | 98.11
Precision (%) | 98.45 | 98.49 | 98.15 | 98.11 | 98.78 | 98.39
Test-auc (%) | 99.50 | 98.65 | 98.92 | 98.65 | 98.52 | 98.85
F1-score (%) | 99.12 | 98.03 | 98.18 | 97.90 | 98.02 | 98.25

Optimization of proposed structure

Parameter optimization is also a crucial aspect of DL models. Thus, to ensure the reliability of the model designed in this study, continuous optimization of certain parameters and hyper-parameters is necessary during the training process. Nevertheless, there are many model parameters involved in DL, and it is impossible to list all the influencing parameters. Therefore, this paper primarily focuses on exploring the impact of the number of CNN network layers on performance. As indicated in Table 5, appropriately increasing the network layers can enhance the classification capabilities of the proposed model. On the flip side, it also leads to a more complex model structure.

Table 5.

The influence of different network layers on the recognition rate of the developed model

Model framework ACC (%)
Two-layer CNN + two-layer BLSTM 96.32
Three-layer CNN + two-layer BLSTM 98.40
Four-layer CNN + two-layer BLSTM 98.42
Five-layer CNN + two-layer BLSTM 98.45

When the number of CNN layers is set to 3, the accuracy of the proposed framework reaches 98.40%, which is 2.08 percentage points higher than that of the two-layer convolutional network. To further explore the influence of network depth on the classification results, more convolution layers are added in turn to the three-layer CNN + two-layer BLSTM framework, while the number of BLSTM layers remains unchanged. The accuracy improves only marginally (98.42% and 98.45%) with the additional layers. Based on this analysis, and with the aim of minimizing model complexity, the final structure adopted in this paper is a three-layer CNN coupled with a two-layer BLSTM.
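The trade-off between depth and accuracy can be made explicit with the values from Table 5. The sketch below computes the gain from each added convolution layer and picks the shallowest network whose accuracy lies within a tolerance of the best one. The 0.1-point tolerance and the helper names are assumptions for illustration; the paper does not state a numerical selection rule.

```python
# Accuracy (%) by number of CNN layers, taken from Table 5 (two BLSTM layers in all cases).
acc_by_cnn_layers = {2: 96.32, 3: 98.40, 4: 98.42, 5: 98.45}

def marginal_gains(acc):
    """Accuracy gain obtained by each depth relative to the next shallower network."""
    depths = sorted(acc)
    return {d: round(acc[d] - acc[prev], 2) for prev, d in zip(depths, depths[1:])}

def shallowest_within(acc, tolerance=0.1):
    """Smallest depth whose accuracy is within `tolerance` points of the best accuracy."""
    best = max(acc.values())
    return min(d for d, a in acc.items() if best - a <= tolerance)

gains = marginal_gains(acc_by_cnn_layers)
chosen = shallowest_within(acc_by_cnn_layers)
print(gains, chosen)
```

Under this rule the third layer contributes 2.08 points while the fourth and fifth add 0.02 and 0.03, so the three-layer network is selected.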

Effects for AM on performance

To capture feature information that is more favorable for classification, the AM model is employed to concentrate on effective features. The advantage of the AM model is that it can optimize the weights and biases of the network model, thereby emphasizing the information pertinent to MI tasks and enabling the identification of more effective features. In Fig. 7a, b, the vertical axes represent the training loss and training accuracy, respectively, and the horizontal axes represent the number of training epochs. After the AM model is added to the network, the training loss drops faster than without the attention network, and the training accuracy is clearly higher. Furthermore, the confusion matrix in Fig. 8 shows that the proposed model has good recognition performance for the five types of tasks: the recognition results all exceed 97%, which shows that the model is feasible and effective.

Table 6 shows that the decoding accuracy reaches 98.40% after the AM model is added, which is superior to the other network structures (97.34%, 97.36%, 97.85% and 91.03%). With the AM model, the training time is 11h35min and the test accuracy is 98.40%. When the AM model is removed and the other modules remain unchanged, the training time is 13h40min and the test accuracy is 97.85%, slightly lower than with the AM model. When the model uses only EEG temporal feature information, namely BLSTM combined with the AM model, the accuracy is 91.03% and the training time is 9h50min. Compared with the model using temporal-spatial information (accuracy 98.40%, training time 11h35min), the training time is reduced by nearly 2 h, but the recognition rate drops by about 7 percentage points. The experiments demonstrate that considering both temporal and spatial information, combined with the attention mechanism, is effective and feasible for decoding MI tasks from EEG signals. Therefore, the AM module adopted in this study not only shortens the training time of the model but also improves the intention recognition ability compared with the other model structures.
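To illustrate how an attention layer can reweight BLSTM outputs, the sketch below implements a common additive attention pooling in NumPy. This is an assumed form, not necessarily the exact AM used in the paper: the function `attention_pool`, the random parameters `W`, `b` and `v`, and the feature width of 1024 (read here as forward and backward states of 512 units each) are all choices made for the example; only the attention size of 8 and the window size of 10 come from the text.

```python
import numpy as np

def attention_pool(H, W, b, v):
    """H: (p, d) sequence of BLSTM outputs. Returns a (d,) context vector and (p,) weights."""
    u = np.tanh(H @ W + b)            # (p, a): hidden representation of each time step
    scores = u @ v                    # (p,): unnormalised relevance of each time step
    scores = scores - scores.max()    # shift for a numerically stable softmax
    alpha = np.exp(scores) / np.exp(scores).sum()
    context = alpha @ H               # weighted sum of the time steps
    return context, alpha

rng = np.random.default_rng(0)
p, d, a = 10, 1024, 8                 # window size, assumed feature width, attention size
H = rng.standard_normal((p, d))       # stand-in for BLSTM outputs
W = rng.standard_normal((d, a)) * 0.01
b = np.zeros(a)
v = rng.standard_normal(a)
context, alpha = attention_pool(H, W, b, v)
print(context.shape, alpha.round(3))
```

The weights `alpha` are positive and sum to 1, so time steps with higher scores contribute more to the vector passed on to the classifier; with `v` set to zero every step would receive the same weight and the result would be a plain average.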

Fig. 7.


Influence of AM model on the performance of the presented model. a The comparison of training loss with and without AM model. b The comparison of training accuracy with and without AM model

Fig. 8.


The confusion matrix of the developed model in this paper

Table 6.

The influence of attention mechanism in proposed model

Model structure | ACC (%) | Training time
CNN + LSTM (Zhang et al. 2018b) | 97.34 | 13h45min
CNN + GRU (Li et al. 2020) | 97.36 | 13h30min
CNN + BLSTM | 97.85 | 13h40min
BLSTM + Attention | 91.03 | 9h50min
CNN + BLSTM + Attention | 98.40 | 11h35min
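The differences between the full model and the other rows of Table 6 can be reproduced with a short calculation. The snippet below uses only the accuracies and training times listed in the table; the dictionary layout and the helper `to_minutes` are illustrative.

```python
import re

# (accuracy in %, training time) for each structure in Table 6.
table6 = {
    "CNN + LSTM": (97.34, "13h45min"),
    "CNN + GRU": (97.36, "13h30min"),
    "CNN + BLSTM": (97.85, "13h40min"),
    "BLSTM + Attention": (91.03, "9h50min"),
    "CNN + BLSTM + Attention": (98.40, "11h35min"),
}

def to_minutes(text):
    """Convert a duration such as '13h45min' to minutes."""
    hours, minutes = re.fullmatch(r"(\d+)h(\d+)min", text).groups()
    return 60 * int(hours) + int(minutes)

reference = "CNN + BLSTM + Attention"
ref_acc, ref_time = table6[reference]

# For each other structure: accuracy gain of the full model (percentage points)
# and training time saved by the full model (minutes; negative means it trains longer).
deltas = {
    name: (round(ref_acc - acc, 2), to_minutes(t) - to_minutes(ref_time))
    for name, (acc, t) in table6.items()
    if name != reference
}
for name, (gain, saved) in deltas.items():
    print(f"{name}: +{gain} points, {saved} min saved")
```

Removing the attention module costs 0.55 points and 125 minutes of extra training, while dropping the CNN saves 105 minutes but loses 7.37 points, which matches the trade-off described in the text.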

Discussion

This paper addresses the intention decoding of MI tasks based on EEG signals. Because EEG signals are non-stationary, the many artifacts generated during acquisition lower the SNR and make MI-EEG intention decoding challenging. Although researchers have done much work on feature extraction and obtained good results by exploiting the high time resolution of EEG signals, the recognition performance is still insufficient for practical application in the BCI field.

For the purpose of improving the recognition ability of MI tasks, a novel network model structure CBLSTM combined with AM model is proposed. First of all, the data sets are processed. Although some studies transformed the collected 1D data into the time and frequency domains, or used the channel information alone for signal analysis, few studies used the information of electrode position for effective processing. In this paper, the structure of the data grid matrix is considered to transform the data structure of time-series signals. Afterwards, according to the electrode distribution, the 1D time series signals are mapped to the 2D topological structure with 10 rows and 11 columns by using the position structure information. However, not all elements can correspond to the location of the electrode information in the array. Therefore, the parts without electrodes are filled with zero to solve the blank area, namely null electrodes. As a result, the 2D spatial structure containing all the electrodes is obtained. Moreover, the formed data is segmented in the form of sliding window to further increase the amount of data. Finally, a series of data fragments are obtained. In view of these 2D data segments containing all the information recorded by 64 electrodes, the developed CBLSTM model is used to extract these characteristics related to MI intention recognition. In order to reduce the influence of network layer on classification accuracy, many times by increasing or decreasing the number of convolutional layers have been simulated in this paper. Subsequently, the best setting of the number of convolution layers is obtained. In addition, the AM model is added to mine the deep information of features to further increase the weight of effective information and compared with the network model without attention mechanism. Experimental results indicate that the recognition rate of the built framework is 98.40% with the AM model. 
In addition, a comparative analysis has been also conducted with the model structures proposed by Zhang (2018b) and Yang (2020). The classification accuracy of our network model shows an improvement of 1.06% and 1.04% over these models, respectively. By comparing with the basic model structure of 2DCNN, RNN, and 2D-CNN + RNN, the classification accuracy of the built network has also been improved by 8.04%, 11.62%, and 3.89%, respectively. Furthermore, this study also includes a comparison with and without the AM model, maintaining identical experimental conditions. Results indicate that the intention recognition result of combining CBLSTM and AM model outperforms the only CBLSTM model, namely without AM model. Meanwhile, the training process is also shortened about 2 h. It also proves that the developed CBLSTM + AM network model promotes the ability of EEG intention decoding by comparing with other mainstream algorithms. Moreover, the proposed CBLSTM + AM model has also achieved a good recognition effect by comparing with (Zhang et al. 2017) which only used the temporal information of EEG signals. Although (Chen et al. 2018) decomposed EEG signals into different frequencies and these features including time-frequency information were sent to the RNN model to realize the classification of MI, the classification accuracy obtained is almost the same as ours. It is indicated that increasing the dimension of features can improve the decoding performance, since part of the key lies in whether the effective components of EEG signals can be fully learned. Through the optimization experiment of the built network model, the results display that the number of convolution layers increases, the higher the obtained decoding rate. Besides, the high-level spatial features of EEG signals can be extracted via CNN model with many layers and the classification accuracy is also enhanced. However, the complexity of the model will be increased with higher computing resources. 
After weighing performance against resources, the CBLSTM strategy is used to extract spatial–temporal features. Moreover, combining it with the AM model allows more important feature information of MI-EEG signals to be extracted and classified effectively. Experimental results indicate that the proposed approach also reduces the training time.
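
To make the role of the AM model concrete, the sketch below shows one generic way attention can re-weight a sequence of feature vectors, such as the per-time-step outputs of a BLSTM, before classification. It assumes a simple dot-product scoring against a query vector followed by softmax normalisation. The scoring function, the `query` vector and the toy values are assumptions chosen for illustration and may differ from the formulation used in the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step by softmax(dot(h_t, query)); return weights and weighted sum."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


# Toy sequence of three 2-dimensional feature vectors (arbitrary values).
hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]  # hypothetical stand-in for a learned parameter
weights, context = attention_pool(hidden_states, query)
```

The weights sum to one, and the time step whose features align best with the query (the second one here) dominates the pooled `context` vector, which would then be passed to the softmax classifier.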

However, the network framework based on DL methods still has some limitations, including the selection of layers and the optimization of the network structure. A particular concern is the uncertain generalization and interpretability of DL models. In addition, the EEG data of each channel reflect the neuroelectrical activity of the brain area covered by the corresponding electrode, and different brain areas are linked with different brain activities. It remains difficult to formulate the internal connection between brain regions and neural activities. Since analysing neural activity can clarify its relationship with brain regions, subsequent work will investigate this relationship.

Conclusion

In this paper, to enhance EEG-based MI intention discrimination, a novel model, CBLSTM, has been proposed for decoding MI tasks using a multilevel CNN and BLSTM framework. In view of the advantages of LSTM in handling time series, the BLSTM strategy is employed to extract the temporal feature information of MI-EEG by combining the preceding and following context of the time series. Moreover, because of the low SNR of EEG signals, a data conversion is designed to first transform the original data into an array containing both temporal and spatial feature information. Then, a shallow three-layer CNN is designed to extract the spatial features. In addition, the AM model is chosen to focus on the feature information that makes categories easier to identify. Experimental results indicate that the proposed EEG-based MI task recognition framework achieves an accuracy of 98.40% with less training time in multi-scene applications.

Although the built network model can enhance the discrimination, the work still has limitations in exploring which brain regions are more active in specific neural activities, such as insufficient data samples. It is also expected that the proposed work can guide other researchers in improving MI-EEG signal classification. Furthermore, data augmentation methods will be explored to expand the data volume and further enhance the reliability of the intention recognition model in the face of feature selection. Other model fusion methods will also be adopted to further promote the application of BCI in medical rehabilitation.

Acknowledgements

This work was supported by the National Natural Science Foundation of China [Grant Number 62373108].

Appendix

See Table 7.

Table 7.

Abbreviation

Acronyms Full terminology
BCI Brain-computer interface
MI Motor imagery
EEG Electroencephalogram
SNRs Signal-to-noise ratios
CBLSTM Convolution bidirectional long short-term memory
BLSTM Bidirectional long short-term memory
EMG Electromyography
ECG Electrocardiogram
NLP Natural language processing
AM Attention mechanism
PL Power line
DWT Discrete wavelet transform
ERS Event-related synchronization
ERD Event-related desynchronization
CSP Common spatial patterns
SVM Support vector machine
ICA Independent component analysis
PCA Principal component analysis
STFT Short-time Fourier transform
ACSP Analytic common spatial patterns
DL Deep learning
CNN Convolutional neural networks
SAE Stacked automatic encoder
GCN Graph convolution neural networks
EMD Empirical mode decomposition
MEMD Multivariate empirical mode decomposition
SLFN Single hidden layer feedforward neural network
GCNN Graph convolution neural networks
1D One-dimensional
2D Two-dimensional
LSTM Long short-term memory
PSFD Power system fault diagnosis
RNN Recurrent neural network
MT Machine translation
PLD Power line detection
SR Speech recognition
LDA Linear discriminant analysis
FDA Fast discriminant analysis
WMNE Weighted minimum norm estimation
ESI Electroencephalogram source imaging
GRU Gated recurrent unit
BEM Boundary element method
ROI Region of interest
ACC Accuracy

Author contributions

Jixiang Li: the technology design of the study, analysis and interpretation of data; Wuxiang Shi: drafting the article or revising it critically for important intellectual content; Yurong Li: final approval of the version to be submitted.

Declarations

Conflict of interest

The authors declare that there is no conflict of interest in the submission of this manuscript, and the manuscript is approved by all authors for publication.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Alomari MH, AbuBaker A, Turani A, Baniyounes AM, Manasreh A (2014) EEG mouse: a machine learning-based brain computer interface. Int J Adv Comput Sci 5(4):193–198. 10.14569/IJACSA.2014.050428
  2. Al-Saegh A, Dawwd SA, Abdul-Jabbar JM (2021) Deep learning for motor imagery EEG-based classification: a review. Biomed Sign Proc Con 63:102172. 10.1016/j.bspc.2020.102172
  3. Altaheri H, Muhammad G, Alsulaiman M (2022) Physics-informed attention temporal convolutional network for EEG-based motor imagery classification. IEEE Trans Industr Inf 19(2):2249–2258. 10.1109/TII.2022.3197419
  4. Altaheri H, Muhammad G, Alsulaiman M, Amin SU, Altuwaijri GA, Abdul W, Faisa M (2023) Deep learning techniques for classification of electroencephalogram (EEG) motor imagery (MI) signals: a review. Neural Comput Appl 35(20):14681–14722. 10.1007/s00521-021-06352-5
  5. Bird JJ, Faria DR, Manso LJ, Ayrosa PP, Ekart A (2021) A study on CNN image classification of EEG signals represented in 2D and 3D. J Neural Eng 18(2):026005. 10.1088/1741-2552/abda0c
  6. Chen W, Wang S, Zhang X, Yao L, Yue L, Qian B (2018) EEG-based motion intention recognition via multi-task RNNs. In: Proc. SIAM-ICDM pp 279–287. 10.1137/1.9781611975321.32.
  7. Chorowski JK, Bahdanau D, Serdyuk D, Cho K, Bengio Y (2015) Attention-based models for speech recognition. Adv Neural Inf Process Syst, 577–585. https://proceedings.neurips.cc/paper_files/paper/2015/file/1068c6e4c8051cfd4e9ea8072e3189e2-Paper.pdf.
  8. Dai G, Zhou J, Huang J, Wang N (2020) HS-CNN: a CNN with hybrid convolution scale for EEG motor imagery classification. J Neural Eng 17(1):016025. 10.1088/1741-2552/ab405f
  9. De Souza Jr LA, Mendel R, Strasser S, Ebigbo A, Probst A, Messmann H, Palm C (2021) Convolutional neural networks for the evaluation of cancer in Barrett’s esophagus: explainable AI to lighten up the black-box. Comput Biol Med 135:104578. 10.1016/j.compbiomed.2021.104578
  10. Falzon O, Camilleri KP, Muscat J (2012) The analytic common spatial patterns method for EEG-based BCI data. J Neural Eng 9(4):045009. 10.1088/1741-2560/9/4/045009
  11. Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG (2000) PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101(23):e215–e220. 10.1161/01.CIR.101.23.e215
  12. Graimann B, Allison B, Pfurtscheller G (2009) Brain-computer interfaces: a gentle introduction. In: Proc. BCI Springer Berlin Heidelberg pp 1–27. 10.1007/978-3-642-02091-9_1.
  13. Gu J, Wang Z, Kuen J, Ma L, Shahroudy A, Shuai B (2018) Recent advances in convolutional neural networks. Pattern Recogn 77:354–377. 10.1016/j.patcog.2017.10.013
  14. Hershey S, Chaudhuri S, Ellis DP, Gemmeke JF, Jansen A, Moore RC (2017) CNN architectures for large-scale audio classification. In: Proc. IEEE ICASSP pp 131–135. 10.1109/ICASSP.2017.7952132.
  15. Hou Y, Zhou L, Jia S, Lun X (2020b) A novel approach of decoding EEG four-class motor imagery tasks via scout ESI and CNN. J Neural Eng 17(1):016048. 10.1088/1741-2552/ab4af6/meta
  16. Hou Y, Jia S, Zhang S, Lun X, Shi Y, Li Y (2020) Deep feature mining via attention-based BiLSTM-GCN for human motor imagery recognition. J Latex Cla Files, 14(8). https://arxiv.org/abs/2005.00777.
  17. Hsu WY (2011) Continuous EEG signal analysis for asynchronous BCI application. Int J Neural Sys 21(4):335–350. 10.1142/S0129065711002870
  18. Hsu WY, Lin CC, Ju MS, Sun YN (2007) Wavelet-based fractal features with active segment selection: application to single-trial EEG data. J Neurosci Meth 163(1):145–160. 10.1016/j.jneumeth.2007.02.004
  19. Huang W, Chang W, Yan G, Yang Z, Luo H, Pei H (2022) EEG-based motor imagery classification using convolutional neural networks with local reparameterization trick. Expert Syst Appl 187:115968. 10.1016/j.eswa.2021.115968
  20. Huang PY, Liu F, Shiang SR, Oh J, Dyer C (2016) Attention-based multimodal neural machine translation. Proc First Conf Mach Trans 2:639–645
  21. Huang Y, Zheng J, Xu B, Li X, Liu Y, Wang Z, Feng H, Cao S (2023) An improved model using convolutional sliding window-attention network for motor imagery EEG classification. Front Neurosci 17:1204385. 10.3389/fnins.2023.1204385
  22. Jiang L, Luo C, Liao Z, Li X, Chen Q, Jin Y, Zhang D (2023) SmartRolling: a human-machine interface for wheelchair control using EEG and smart sensing techniques. Inform Process Manag 60(3):103262. 10.1016/j.ipm.2022.103262
  23. Keerthi Krishnan K, Soman KP (2021) CNN based classification of motor imaginary using variational mode decomposed EEG-spectrum image. Biomed Eng Lett 11(3):235–247. 10.1007/s13534-021-00190-z
  24. Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  25. Kumar S, Sharma A (2018) A new parameter tuning approach for enhanced motor imagery EEG signal classification. Med Biol Eng Comput 56:1861–1874. 10.1007/s11517-018-1821-4
  26. Lawhern VJ, Solon AJ, Waytowich NR, Gordon SM, Hung CP, Lance BJ (2018) EEGNet: a compact convolutional neural network for EEG-based brain-computer interfaces. J Neural Eng 15(5):056013. 10.1088/1741-2552/aace8c
  27. Lebedev MA, Nicolelis MA (2017) Brain-machine interfaces: from basic science to neuroprostheses and neurorehabilitation. Physiol Rev 97(2):767–837. 10.1152/physrev.00027.2016
  28. Lee H, Kwon H (2017) Going deeper with contextual CNN for hyperspectral image classification. IEEE Trans Image Process 26(10):4843–4855. 10.1109/TIP.2017.2725580
  29. Lemm S, Blankertz B, Curio G, Muller KR (2005) Spatio-spectral filters for improving the classification of single trial EEG. IEEE Trans Biomed Eng 52(9):1541–1548. 10.1109/TBME.2005.851521
  30. Lemm S, Curio G, Hlushchuk Y, Muller KR (2006) Enhancing the signal-to-noise ratio of ICA-based extracted ERPs. IEEE Trans Biomed Eng 53(4):601–607. 10.1007/978-3-642-13318-3_50
  31. Li Y, Yang H, Li J, Chen D, Du M (2020) EEG-based intention recognition with deep recurrent-convolution neural network: performance and channel selection by Grad-CAM. Neurocomputing 415:225–233. 10.1016/j.neucom.2020.07.072
  32. Li H, Ding M, Zhang R, Xiu C (2022) Motor imagery EEG classification algorithm based on CNN-LSTM feature fusion network. Biomed Signal Proces 72:103342. 10.1016/j.bspc.2021.103342
  33. Li J, Li Y, Du M (2023) Comparative study of EEG motor imagery classification based on DSCNN and ELM. Biomed Signal Proces 84:104750. 10.1016/j.bspc.2023.104750
  34. Lu P, Gao N, Lu Z, Yang J, Bai O, Li Q (2019) Combined CNN and LSTM for motor imagery classification. In: Proc. CISP-BMEI pp 1–6. 10.1109/CISP-BMEI48845.2019.8965653.
  35. Luo TJ, Zhou CL, Chao F (2018) Exploring spatial-frequency-sequential relationships for motor imagery classification with recurrent neural network. BMC Bioinform 19(1):1–18. 10.1186/s12859-018-2365-1
  36. Mammone N, Ieracitano C, Adeli H, Morabito FC (2023) Autoencoder filter bank common spatial patterns to decode motor imagery from EEG. IEEE J Biomed Health 27(5):2365–2376. 10.1109/JBHI.2023.3243698
  37. Nandhini A, Sangeetha J (2023) A review on deep learning approaches for motor imagery EEG signal classification for brain-computer interface systems. Comput Vision Bio-Inspired Comput: Proc ICCVBIC 2022:353–365. 10.1007/978-981-19-9819-5_27
  38. Prisciandaro E, Sedda G, Cara A, Diotti C, Spaggiari L, Bertolaccini L (2023) Artificial neural networks in lung cancer research: a narrative review. J Clin Med 12(3):880. 10.3390/jcm12030880
  39. Qin Y, Li B, Wang W, Shi X, Wang H, Wang X (2024) ETCNet: an EEG-based motor imagery classification model combining efficient channel attention and temporal convolutional network. Brain Res 1823:148673. 10.1016/j.brainres.2023.148673
  40. Dos Santos EM, San-Martin R, Fraga FJ (2023) Comparison of subject-independent and subject-specific EEG-based BCI using LDA and SVM classifiers. Med Bio Eng Comput. 10.1007/s11517-023-02769-3
  41. Schalk G, McFarland DJ, Hinterberger T, Birbaumer N, Wolpaw JR (2004) BCI2000: a general-purpose brain-computer interface (BCI) system. IEEE Trans Biom Eng 51(6):1034–1043. 10.1109/TBME.2004.827072
  42. Schirrmeister RT, Springenberg JT, Fiederer LGJ, Glasstetter M, Eggensperger K, Tangermann M (2017) Deep learning with convolutional neural networks for EEG decoding and visualization. Hum Brain Mapp 38(11):5391–5420. 10.1002/hbm.23730
  43. Shi X, Li B, Wang W, Qin Y, Wang H, Wang X (2023) Classification algorithm for electroencephalogram-based motor imagery using hybrid neural network with spatio-temporal convolution and multi-head attention mechanism. Neuroscience 527:64–73. 10.1016/j.neuroscience.2023.07.020
  44. Sita J, Nair GJ (2013) Feature extraction and classification of EEG signals for mapping motor area of the brain. In: Proc. ICCC pp 463–468, 10.1109/ICCC.2013.6731699.
  45. Sun J, Wang Y, Liu P, Wen S, Wang Y (2023b) Memristor-based circuit design of PAD emotional space and its application in mood congruity. IEEE Internet Things 10(18):16332–16342. 10.1109/JIOT.2023.3267778
  46. Sun J, Li C, Wang Z, Wang Y (2023a) A memristive fully connect neural network and application of medical image encryption based on central diffusion algorithm. IEEE Trans Ind Inform. 10.1109/TII.2023.3312405
  47. Tabar YR, Halici U (2016) A novel deep learning approach for classification of EEG motor imagery signals. J Neural Eng 14(1):016003. 10.1088/1741-2560/14/1/016003
  48. Tang Z, Shi Y, Wang D, Feng Y, Zhang S (2017) Memory visualization for gated recurrent neural networks in speech recognition. In: Proc ICASSP pp 2736–2740, 10.1109/ICASSP.2017.7952654.
  49. Wang W, Li B (2023) A novel model based on a 1D-ResCNN and transfer learning for processing EEG attenuation. Comput Methods Biomech Biomed Eng 26(16):1980–1993. 10.1080/10255842.2022.2162339
  50. Wang P, Song Q, Li Y, Lv S, Wang J, Li L, Zhang H (2020) Cross-task extreme learning machine for breast cancer image classification with deep convolutional features. Biomed Signal Proces 57:101789. 10.1016/j.bspc.2019.101789
  51. Wang W, Li B, Wang H, Wang X, Qin Y, Shi X, Liu S (2024) EEG-FMCNN: a fusion multi-branch 1D convolutional neural network for EEG-based motor imagery classification. Med Biol Eng Comput 62:107–120. 10.1007/s11517-023-02931-x
  52. Yilmaz BH, Yilmaz CM, Kose C (2020) Diversity in a signal-to-image transformation approach for EEG-based motor imagery task classification. Med Biol Eng Comput 58(2):443–459. 10.1007/s11517-019-02075-x
  53. Zhang Y, Nam CS, Zhou G, Jin J, Wang X, Cichocki A (2018a) Temporally constrained sparse group spatial patterns for motor imagery BCI. IEEE Trans Cybern 49(9):3322–3332. 10.1109/TCYB.2018.2841847
  54. Zhang X, Yao L, Huang C, Sheng QZ, Wang X (2017) Intent recognition in smart living through deep recurrent neural networks. In: Proc. ICNIP. Springer, Cham pp 748–758, 10.1007/978-3-319-70096-0_76.
  55. Zhang D, Yao L, Zhang X, Wang S, Chen W, Boots R (2018b) Cascade and parallel convolutional recurrent neural networks on EEG-based intention recognition for brain computer interface. Proc AAAI Confer Artific Intell. 10.1609/aaai.v32i1.11496
  56. Zhang X, Yao L, Wang X, Zhang W, Zhang S, Liu Y (2019) Know your mind: adaptive cognitive activity recognition with reinforced CNN. In: Proc. ICDM pp 896–905. 10.1109/ICDM.2019.00100.

Articles from Cognitive Neurodynamics are provided here courtesy of Springer Science+Business Media B.V.
