Abstract
Brain-computer interface (BCI) based on the motor imagery paradigm typically utilizes multi-channel electroencephalogram (EEG) to ensure accurate capture of physiological phenomena. However, excessive channels often contain redundant information and noise, which can significantly degrade BCI performance. Although there have been numerous studies on EEG channel selection, most of them require manual feature extraction, and the extracted features are difficult to fully represent the effective information of EEG signals. In this paper, we propose a spatio-temporal-graph attention network for channel selection (STGAT-CS) of EEG signals. We consider the EEG channels and their inter-channel connectivity as a graph and treat the channel selection problem as a node classification problem on the graph. We leverage the multi-head attention mechanism of graph attention network to dynamically capture topological relationships between nodes and update node features accordingly. Additionally, we introduce one-dimensional convolution to automatically extract temporal features from each channel in the original EEG signal, thereby obtaining more comprehensive spatiotemporal characteristics. In the classification tasks of the BCI Competition III Dataset IVa and BCI Competition IV Dataset I, STGAT-CS achieved average accuracies of 91.5% and 85.4% respectively, demonstrating the effectiveness of the proposed method.
Keywords: Brain-computer interface (BCI), Motor imagery (MI), Channel selection, One-dimensional convolution (1D Conv), Graph attention network (GAT)
Introduction
The brain-computer interface (BCI) is an advanced technical system that enables direct communication between the human brain and computers or other external devices (Huang et al. 2016). It facilitates mind-controlled manipulation of external devices, enabling applications such as auxiliary medical treatment, neuroscience research, and virtual reality experiences (Zuo et al. 2020). Currently, a range of invasive and non-invasive techniques are available for recording physiological signals associated with brain activity. Among these methods, electroencephalography (EEG) stands out due to its non-invasiveness, affordability, and high portability (Zhang et al. 2021).
In the BCI field, commonly employed paradigms include steady-state visual evoked potentials (SSVEPs) (Hsu et al. 2020), P300 (Allison et al. 2020), and motor imagery (MI) (Schlögl et al. 2005; Rong et al. 2020). Among these, MI has garnered significant attention due to its stimulus-independent nature and practicality (Cincotti et al. 2003). MI induces signal alterations in relevant brain regions by prompting users to imagine specific actions. For instance, imagining movement of the hand results in a decrease in α and β band amplitudes in the EEG signal of the contralateral motor cortex, known as event-related desynchronization (ERD), while an increase in corresponding band amplitudes is observed in the ipsilateral cortical area, referred to as event-related synchronization (ERS) (Park and Chung 2020). By effectively identifying this phenomenon, distinct MI of various body parts can be discerned for generating control signals applicable to MI-BCI.
The brain areas that contain discriminative information for MI tasks vary across individuals (Xiao et al. 2022). To ensure accurate assessments of ERD/ERS, multi-channel EEG is widely employed in MI-based BCI systems (Blankertz et al. 2008). The use of multiple EEG channels is considered essential for achieving efficient performance in brain-computer interfaces, as it provides more comprehensive information about underlying brain activity. However, an excessive number of EEG channels may include information irrelevant or redundant to MI tasks, and increasing the channel count does not necessarily enhance classification accuracy (Liu et al. 2015). Therefore, channel selection methods are extensively utilized to improve the performance of BCI. Feng et al. (2019) proposed a method based on common spatial pattern (CSP) rank channel selection for multifrequency band EEG (CSP-R-MF), which selects channels across different frequency bands to further increase the discriminability of the extracted features. Wang et al. (2020) developed a channel selection weight update method based on canonical correlation analysis (CCA); they calculated weights using cross-validation with a support vector machine (SVM) classifier. Tang et al. (2022) studied a channel selection method using sequential backward floating search.
The aforementioned channel selection methods based on traditional machine learning have achieved satisfactory results. However, they necessitate manual feature extraction and entail intricate steps. With the advancement of deep learning, researchers have devised a neural network architecture capable of processing graph data and applied it in spatial analysis, known as the graph neural network (GNN) (Luo et al. 2022). GNN, based on graph structure, exhibits advantages in modeling the multi-channel and inter-channel connectivity of EEG data while effectively extracting the topological relationships between channels (Chang et al. 2021). The graph convolutional network (GCN) is a variant of GNN that processes node features through convolution operations on the graph structure. Some researchers have employed GCN for channel selection. Liang et al. (2023) combined the Pearson correlation coefficient (PCC) with GCN to propose a channel selection method based on node classification. Sun et al. (2023) proposed the edge-selection and aggregation-selection channel selection methods, which utilize the dynamically updated adjacency matrix in GCN to eliminate redundant channels. However, GCN-based channel selection methods necessitate predefining the relationship information between nodes, which restricts network flexibility and generalization ability. The graph attention network (GAT) employs an attention mechanism to dynamically compute the relationship information between nodes, thereby exhibiting superior generalization capability when processing graph data. Demir et al. (2022) introduced a framework called EEG-GAT for multi-paradigm EEG classification tasks that utilizes GAT's attention mechanism to automatically capture channel information and perform classification. Zhu et al. (2023) presented a multi-domain feature fusion model incorporating the GAT attention mechanism, and improved accuracy was observed compared to using GCN alone.
Typically, traditional GNN models such as GAT are predominantly employed for handling static graph data, potentially resulting in a lack of mechanisms specifically designed to directly capture temporal information. The EEG signals generated by motor imagery possess multidimensional characteristics, encompassing temporal, spectral, and spatial domains that are relevant for brain classification tasks. These dimensions of information are interconnected rather than independent from each other, exerting mutual influence (Cong et al. 2015). The convolutional neural network (CNN) can capture features from various perspectives by employing convolutional kernels of different shapes (Hamedi et al. 2016). To address the challenge of capturing temporal information within individual channels in GAT, one-dimensional convolution (1D Conv) can be utilized. When dealing with sequential data, 1D Conv typically yields good results (Han et al. 2019) by performing convolution operations along one dimension of the input sequence using a sliding window to extract features. Lin et al. (2023) employed 1D Conv to capture EEG signal characteristics within channels and established a neural network model for emotion recognition and classification. Tang et al. (2023) proposed a novel multi-scale hybrid convolutional neural network (MSHCNN), which utilizes 1D Conv to extract high-level temporal features.
To fully leverage the spatiotemporal features of EEG signals and eliminate the laborious process of manual feature extraction, the present study proposes an innovative channel selection method, namely the spatio-temporal-graph attention network based channel selection (STGAT-CS), which integrates neurophysiology knowledge and graph theory principles. STGAT-CS treats channel selection as a node classification problem while considering both temporal representation within each EEG channel and topological relationships between channels. By combining 1D Conv with GAT to automatically extract features and perform classification, a model for channel selection is constructed to enhance classification accuracy while reducing the number of channels.
The remaining sections of this paper are structured as follows: Sect. "Methods" provides an overview of the relevant concepts and presents our method. Section "Results" introduces the dataset and showcases our results. In Sect. "Discussion", a comprehensive discussion is conducted. Finally, a conclusion is presented in Sect. "Conclusion".
Methods
In this study, we introduce STGAT-CS, a channel selection method that merges neurophysiology knowledge with graph theory principles. We address the channel selection challenge as a node classification task on a graph, where EEG channel distribution forms the graph. Initially, we utilize 1D Convolution to capture temporal information within the original EEG signals per channel. Then, employing GAT’s multi-head attention mechanism facilitates the dynamic capture of spatial relationships among channels, refining node features. Consequently, our model discerns effective channels from redundant ones. Finally, CSP features are extracted and SVM is applied for classification. The overall framework is shown in Fig. 1.
Fig. 1.
Spatio-temporal-graph attention network based channel selection
One-dimensional convolution
The 1D Conv is extensively employed for feature extraction and pattern recognition of 1D data. The extraction of channel-specific information without altering the number of channels can be achieved by performing 1D convolution operations on each independent channel, particularly for multidimensional data.
The 1D Conv slides a one-dimensional convolution kernel over the one-dimensional representation of the data $x^{(l)} \in \mathbb{R}^{N \times T}$, where $N$ denotes the number of nodes, $T$ denotes the number of sampling points, and $l$ is the index of the 1D Conv layer, yielding an output through multiplication between the input data and the convolution kernel followed by summation

$$y^{(l)}(p) = \sum_{k} w^{(l)}(k)\, x^{(l)}(p + k) \tag{1}$$

where $k$ is the index variable of the convolution operation, $p$ is the position in the output sequence, $w^{(l)}(k)$ denotes the value of the convolution kernel at position $k$, and the summation denotes the weighted accumulation obtained by convolving the input sequence with the kernel at different positions. Figure 2 shows the procedure of the 1D Conv.
Fig. 2.
Procedure of the 1D Conv
Several advantages are offered by the utilization of 1D Conv. It enables the capture of local features in 1D data and exhibits translation invariance, thereby allowing recognition of data features regardless of their location. Given its efficacy in temporal feature extraction, employing a 1D Conv module is an effective approach for extracting temporal features.
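The following is a minimal sketch of how a 1D convolution can be applied to each channel independently so that the channel count is preserved, here realized as a depthwise (grouped) convolution in PyTorch; the channel/sample counts and the kernel size are illustrative values, not the paper's exact settings.

```python
import torch
import torch.nn as nn

# Minimal sketch: apply a 1D convolution to each EEG channel independently,
# so the number of channels N is preserved and only the temporal dimension
# is transformed. The kernel size (7) is an assumed, illustrative value.
n_channels, n_samples = 59, 200             # e.g., one trial after windowing
x = torch.randn(1, n_channels, n_samples)   # (batch, channels, time)

# groups=n_channels makes the convolution depthwise: one kernel per channel,
# so no information is mixed across channels at this stage.
conv = nn.Conv1d(in_channels=n_channels, out_channels=n_channels,
                 kernel_size=7, padding=3, groups=n_channels)

y = conv(x)
print(y.shape)  # torch.Size([1, 59, 200]) -- channel count unchanged
```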
Graph attention network
A graph is an abstract data structure comprising nodes and edges. Nodes typically represent entities or objects of a specific kind, while edges denote relationships or connections between these entities. GNNs are a class of models that learn from graph-structured data and can effectively capture both the global and local properties of graphs. A GNN continuously learns the structure and features of the graph through information transfer and aggregation between nodes, and new node features are obtained at each layer.
A graph is represented in a GNN as a tuple $G = (V, E)$, where $V = \{v_1, v_2, \dots, v_N\}$ represents the collection of nodes, $E$ denotes the collection of edges, $v_i$ denotes each node, $e_{ij} \in E$ represents the relationship between nodes $v_i$ and $v_j$, and $N$ denotes the number of nodes. Each node has an initial feature vector $h_i^{(0)} \in \mathbb{R}^{T}$, where $T$ denotes the initial time dimension of the vector, and the output of the next layer can be obtained by aggregating these node feature representations

$$h_i^{(l+1)} = \sigma\!\left(\sum_{j \in \mathcal{N}(i)} \frac{1}{c_{ij}} W^{(l)} h_j^{(l)}\right) \tag{2}$$

where $l$ denotes the index of the GNN layer, $\mathcal{N}(i)$ represents the set of neighboring nodes of node $v_i$, and $c_{ij}$ is the normalization factor employed to balance the degrees of different nodes. $W^{(l)}$ refers to the weight matrix, while $\sigma(\cdot)$ signifies a non-linear activation function; the ReLU activation function is used in this study.
By facilitating information propagation and aggregation among neighboring nodes, the fundamental concept of GNN is to iteratively refine the feature representations of nodes. In each layer, the features of the neighboring nodes are first linearly transformed by the corresponding weight matrix and then combined by weighted summation to obtain the aggregated features of node $v_i$. Subsequently, a nonlinear activation function is applied to these aggregated features to derive the node feature representation for the next layer. Through multiple layers of information diffusion and feature updates, GNN effectively captures both local and global graph structures while extracting high-level representations of individual nodes.
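Below is a minimal NumPy sketch of one such aggregation layer following Eq. (2) on a toy three-node graph; the symmetric degree normalization used for the balancing factor and the feature dimensions are assumptions for illustration.

```python
import numpy as np

# Minimal sketch of one GNN layer following Eq. (2): aggregate neighboring
# node features, normalize by node degrees, apply a weight matrix and ReLU.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)      # adjacency of a toy 3-node graph
H = np.random.randn(3, 4)                   # node features, dimension 4
W = np.random.randn(4, 4)                   # layer weight matrix

deg = A.sum(axis=1)
# symmetric normalization 1 / sqrt(d_i * d_j) as the balancing factor c_ij
C = 1.0 / np.sqrt(np.outer(deg, deg))
H_next = np.maximum((A * C) @ H @ W, 0.0)   # ReLU activation
print(H_next.shape)                         # (3, 4)
```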
As a variant of the GNN, the GAT effectively captures inter-node relationships by adaptively computing attention coefficients between each node and its neighboring nodes. Specifically, GAT primarily employs the self-attention mechanism to calculate attention weights among nodes and update their respective features.
The attention weight $\alpha_{ij}^{(l)}$ between node $v_i$ and node $v_j$ at layer $l$ can be acquired as

$$\alpha_{ij}^{(l)} = \operatorname{softmax}_j\!\left(e_{ij}^{(l)}\right) = \frac{\exp\!\left(e_{ij}^{(l)}\right)}{\sum_{k \in \mathcal{N}(i)} \exp\!\left(e_{ik}^{(l)}\right)} \tag{3}$$

where $\mathcal{N}(i)$ represents the neighboring nodes of node $v_i$. The similarity coefficient $e_{ij}^{(l)}$ is denoted by

$$e_{ij}^{(l)} = \operatorname{LeakyReLU}\!\left(a\!\left(\left[\,W h_i^{(l)} \,\big\|\, W h_j^{(l)}\,\right]\right)\right) \tag{4}$$

where $W$ is a shared parameter that adds dimension to the features of a node via a linear mapping, $\|$ denotes that the transformed features of the two vertices are concatenated, $a(\cdot)$ is a single-layer feed-forward neural network that maps the concatenated high-dimensional features to a real number, and $\operatorname{LeakyReLU}$ is an activation function which is a variant of ReLU; finally, the attention coefficient is obtained by the softmax normalization in Eq. (3).
Then the new feature representation of node $v_i$ at layer $l+1$ can be obtained through a weighted summation of the features of neighboring nodes and the attention weights

$$h_i^{(l+1)} = \sigma\!\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}^{(l)} W h_j^{(l)}\right) \tag{5}$$
When employing multiple layers of GAT, the features of nodes can be progressively updated to accomplish feature learning and representation learning for graph data.
To further enhance its performance, the GAT can also utilize a multi-head attention mechanism (Vaswani et al. 2017). Specifically, by employing multiple independent attention mechanisms and either concatenating their features or summing and then averaging them, we can obtain more comprehensive output feature representations. To alleviate the computational burden, we adopt the multi-head attention mechanism of summation followed by averaging
$$h_i^{(l+1)} = \sigma\!\left(\frac{1}{K} \sum_{k=1}^{K} \sum_{j \in \mathcal{N}(i)} \alpha_{ij}^{k} W^{k} h_j^{(l)}\right) \tag{6}$$

where $K$ denotes the number of attention heads used to obtain the attention weights; Fig. 3b shows the process of the multi-head attention mechanism.
Fig. 3.
Attention mechanism. a Attention weight and b Multi-head attention mechanism
By effectively modeling the inter-node relationships, the GAT achieves remarkable performance on graph-structured data through the utilization of the attention mechanism. Consequently, GAT is envisioned as a dynamic approach for capturing cross-channel information and obtaining high-level feature representations for each channel.
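As a concrete illustration, the sketch below applies a single GAT layer from the torch_geometric library to a small fully connected toy graph; setting concat=False averages the attention heads as in Eq. (6). The node count, feature sizes, and the graph itself are illustrative assumptions.

```python
import torch
from torch_geometric.nn import GATConv

# Minimal sketch: one GAT layer whose attention coefficients follow
# Eqs. (3)-(4) and whose heads are averaged as in Eq. (6) (concat=False).
num_nodes, in_dim, out_dim = 5, 16, 8
x = torch.randn(num_nodes, in_dim)

# fully connected unweighted graph (all ordered pairs, no self-loops)
src, dst = zip(*[(i, j) for i in range(num_nodes)
                 for j in range(num_nodes) if i != j])
edge_index = torch.tensor([src, dst], dtype=torch.long)

gat = GATConv(in_dim, out_dim, heads=8, concat=False)  # average the 8 heads
out = gat(x, edge_index)
print(out.shape)  # torch.Size([5, 8])
```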
Spatio-temporal-graph attention network based channel selection
In this method, we treat the channel selection problem as a node classification problem and develop a graph-based network for classification.
Leveraging prior knowledge in the field of neurophysiology and channel selection, we designate certain channels as effective or redundant. According to studies on brain activity, it is widely believed that specific regions of the brain exhibit distinctive activity during MI tasks. Drawing upon neuroscience knowledge, conventional approaches typically select effective EEG channels placed according to the international 10–20 system (such as C3 and C4) owing to their ability to distinguish different MI tasks (Pfurtscheller and DaSilva 1999; Gaur et al. 2015; Yang et al. 2017).
Therefore, we designate C3, C4, and Cz as effective channels since they are considered relevant to the motor cortex. Additionally, several unrelated occipital and frontal lobe channels are labeled as redundant before training our model using these labeled channels. The structure of our network can be seen in Fig. 4.
Fig. 4.
Framework of the STGAT-Net. a Dataset 1 and b Dataset 2
We denote the preprocessed EEG data as $X \in \mathbb{R}^{N \times T}$, where $N$ denotes the number of channels and $T$ denotes the number of sampling points. Additionally, we identified C3, C4, and Cz in the parietal region as effective channels, while AF3 and AF4 in the frontal lobe and O1 and O2 in the occipital lobe were considered redundant channels. The two-dimensional planar distribution of EEG electrodes is represented as a fully connected unweighted graph, with the EEG data embedded in the nodes. Each node's feature is $h_i^{(0)} = x_i \in \mathbb{R}^{T}$, where $i = 1, 2, \dots, N$. We define the loss function

$$\mathcal{L} = \sum_{i \in \mathcal{Y}_L} \operatorname{CE}\!\left(y_i, \hat{y}_i\right) \tag{7}$$

where $\operatorname{CE}(\cdot)$ represents the cross-entropy loss function, $\mathcal{Y}_L$ is the set of nodes that have labels, $y_i$ is the ground truth label of the $i$th node, and, correspondingly, $\hat{y}_i$ is the predicted probability of the label of the $i$th node. The new feature representation of each node, $h_i^{(l)}$, where $l$ is the index of the convolution layer, is obtained after each layer of convolution by employing three 1D Conv layers with different kernel sizes; the calculation process is shown in Eq. (1). Subsequently, the features are updated through three stacked GAT layers, and the final output of each node is acquired by performing a weighted sum of neighboring node features and attention weights at each layer, followed by applying the multi-head attention mechanism to average them, as in Eqs. (5) and (6). Finally, the training model is derived, and multiple rounds of training lead to the optimal model.
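The sketch below outlines one plausible realization of such a node classification network that stacks three per-node 1D Conv layers and three head-averaged GAT layers; since the exact layer widths and kernel sizes in Table 1 were not preserved in this text, the values used here are assumptions rather than the paper's reported configuration.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv

class STGATNet(nn.Module):
    """Sketch of an STGAT-style node classifier: temporal 1D Conv per node
    (channel) followed by stacked GAT layers; layer sizes are illustrative."""

    def __init__(self, in_len, hidden=64, heads=8, n_classes=2):
        super().__init__()
        # Three 1D Conv layers with different kernel sizes (values assumed);
        # each node's raw time series is treated as a 1-channel sequence.
        self.convs = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(8, 8, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(8, 1, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.dropout = nn.Dropout(0.5)
        # Three GAT layers with head averaging (Eq. 6).
        self.gat1 = GATConv(in_len, hidden, heads=heads, concat=False)
        self.gat2 = GATConv(hidden, hidden, heads=heads, concat=False)
        self.gat3 = GATConv(hidden, n_classes, heads=heads, concat=False)

    def forward(self, x, edge_index):
        # x: (num_nodes, T) -- one time series per EEG channel
        h = self.convs(x.unsqueeze(1)).squeeze(1)   # temporal features, (N, T)
        h = self.dropout(h)
        h = torch.relu(self.gat1(h, edge_index))
        h = torch.relu(self.gat2(h, edge_index))
        return self.gat3(h, edge_index)             # (N, 2) class scores
```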
The optimal model is employed for channel selection in order to acquire the output of the final layer, where the two class weights indicate the likelihood of each channel belonging to a specific channel class. The channels are then ranked by their weights, and these sorted results are used for subsequent channel selection.
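A small sketch of this ranking step is given below; scoring each channel by the difference between its two softmax class weights is one plausible reading of "two-class weights", and `model`, `x`, `edge_index`, and `top_n` follow the earlier illustrative sketch rather than the paper's exact code.

```python
import torch
import torch.nn.functional as F

# Sketch: rank channels by the difference between the two class weights the
# trained model outputs for each node (channel). Higher score = more likely
# to be an effective MI channel. Variable names are illustrative.
with torch.no_grad():
    logits = model(x, edge_index)            # (num_channels, 2)
    probs = F.softmax(logits, dim=1)
    score = probs[:, 1] - probs[:, 0]        # effective minus redundant

ranking = torch.argsort(score, descending=True)
top_n = 20                                   # candidate channel count (assumed)
selected = ranking[:top_n].tolist()
print("selected channel indices:", selected)
```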
Common spatial pattern
In the MI-BCI field, the CSP algorithm is generally employed for feature extraction (Blankertz et al. 2007). The primary goal of CSP is to identify a projection that optimizes the discriminative power between two categories. Specifically, CSP constructs filters that aim to maximize the variance of the projected data within one category while simultaneously minimizing it in the other category. Let the EEG signal be $X_c \in \mathbb{R}^{N \times T}$, where $N$ represents the number of channels, $T$ represents the number of sampling points, and $c \in \{1, 2\}$ indicates the motor imagery class; the spatial covariance matrices for the two classes can be calculated as

$$R_c = \frac{X_c X_c^{\top}}{\operatorname{tr}\!\left(X_c X_c^{\top}\right)} \tag{8}$$

where the operator $\operatorname{tr}(\cdot)$ represents the sum of the diagonal elements of the input matrix.

Consequently, the objective function can be formulated as the Rayleigh quotient

$$J(w) = \frac{w^{\top} S_d w}{w^{\top} S_c w} \tag{9}$$

where $S_d$ and $S_c$ represent the discriminative activity and the common activity respectively, and $w$ is the spatial filter. The Lagrange multiplier method offers a solution to this optimization problem. Then we have the following solution:

$$S_d w = \lambda S_c w \tag{10}$$

where $\lambda$ is an eigenvalue of $S_c^{-1} S_d$ while $w$ is the corresponding eigenvector.

Then, the optimal spatial filter can be constructed by selecting the first and last columns from the eigenvector matrix $W$, resulting in the spatially filtered EEG signal

$$Z = W^{\top} X \tag{11}$$

Finally, CSP features can be extracted from the filtered EEG signal as

$$f_n = \log\!\left(\frac{\operatorname{var}(Z_n)}{\sum_{k} \operatorname{var}(Z_k)}\right) \tag{12}$$

where $Z_n$ is the $n$th row of matrix $Z$ and $\operatorname{var}(\cdot)$ is the variance operation.
The CSP algorithm allows for the extraction of feature vectors that demonstrate optimal class differentiation, which can be subsequently utilized for classification tasks. This approach is extensively employed in MI-BCI systems and has demonstrated significant efficacy.
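A compact NumPy/SciPy sketch of the CSP computation in Eqs. (8)–(12) is shown below; the number of filter pairs `m` and the exact covariance averaging are assumptions for illustration, not the paper's specific implementation.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_1, trials_2, m=3):
    """Sketch of CSP (Eqs. 8-10). trials_k: list of (channels x samples) arrays
    for class k; returns 2*m spatial filters (first and last eigenvectors)."""
    def avg_cov(trials):
        covs = [X @ X.T / np.trace(X @ X.T) for X in trials]   # Eq. (8)
        return np.mean(covs, axis=0)

    R1, R2 = avg_cov(trials_1), avg_cov(trials_2)
    # Generalized eigenvalue problem corresponding to the Rayleigh quotient;
    # eigenvectors are returned sorted by ascending eigenvalue.
    vals, vecs = eigh(R1, R1 + R2)
    W = np.hstack([vecs[:, :m], vecs[:, -m:]])                  # filter matrix
    return W

def csp_features(X, W):
    """Log-variance CSP features of one trial X (channels x samples), Eq. (12)."""
    Z = W.T @ X                                                  # Eq. (11)
    var = np.var(Z, axis=1)
    return np.log(var / var.sum())
```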
Support vector machines
The combination of CSP and SVM is widely employed for the classification of MI tasks because of its ability to attain high accuracy. Therefore, SVM is selected as the classifier in this experiment (Ali and Smith 2003). By computing a hyperplane, SVM effectively separates the two classes of data. The hyperplane can be mathematically denoted as $w^{\top} x + b = 0$, where $w$ is the weight vector and $b$ represents the bias term. The primary objective of separation is to maximize the margin between classes, which is subsequently transformed into a convex quadratic programming problem

$$\min_{w,\, b,\, \xi} \ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i} \xi_i \quad \text{s.t.} \quad y_i\!\left(w^{\top} x_i + b\right) \ge 1 - \xi_i, \ \ \xi_i \ge 0 \tag{13}$$

where $C$ is the regularization parameter of the soft margin, $\xi_i$ is the slack variable, $y_i$ denotes the class label, and $x_i$ corresponds to the feature vector of the $i$th experiment.
We adopted a sequential approach that involved applying CSP to the first n selected channels to extract features from the EEG signals, followed by classification using SVM. The average test accuracy was calculated through tenfold cross-validation, and we then selected the channel combination with the highest accuracy.
This experimental design enables us to identify the most suitable channel combination for achieving optimal classification results. By combining CSP and SVM, we can efficiently extract EEG features and achieve accurate classification, providing robust support for future research and applications.
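A hedged sketch of this evaluation loop is given below: for each candidate number of top-ranked channels, CSP features are extracted within each cross-validation fold and a linear SVM is scored. It reuses the `csp_filters`/`csp_features` helpers from the earlier sketch; the linear kernel, the candidate range, and the variable names are assumptions rather than the paper's exact settings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold

def evaluate_channel_subsets(trials, labels, ranking, candidates=range(3, 31)):
    """Sketch: tenfold CV accuracy of CSP+SVM for the top-n ranked channels.
    trials: np.ndarray (n_trials, n_channels, n_samples); labels: np.ndarray
    of 0/1; ranking: channel indices ordered from most to least effective."""
    best_n, best_acc = None, 0.0
    for n in candidates:
        chans = list(ranking[:n])
        accs = []
        skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
        for tr, te in skf.split(trials, labels):
            Xtr = [trials[i][chans] for i in tr]
            # Fit CSP filters on the training fold only, per class.
            W = csp_filters([x for x, y in zip(Xtr, labels[tr]) if y == 0],
                            [x for x, y in zip(Xtr, labels[tr]) if y == 1])
            Ftr = np.array([csp_features(x, W) for x in Xtr])
            Fte = np.array([csp_features(trials[i][chans], W) for i in te])
            clf = SVC(kernel="linear").fit(Ftr, labels[tr])
            accs.append(clf.score(Fte, labels[te]))
        if np.mean(accs) > best_acc:
            best_n, best_acc = n, float(np.mean(accs))
    return best_n, best_acc
```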
Results
Dataset description
We utilize two competition datasets BCI Competition III Dataset IVa and BCI Competition IV Dataset I in this experiment. These datasets encompass EEG signals obtained from numerous healthy subjects during MI tasks.
Dataset 1 came from part IVa of BCI Competition III, which comprises 118 channels of EEG data from five subjects, namely aa, al, av, aw and ay, recorded at a sampling rate of 100 Hz (Blankertz et al. 2006). The subjects performed MI tasks involving right hand or foot movements. Each task category consisted of 140 trials, resulting in a total of 280 trials for both categories. The duration of each trial was 3.5 s. The experimental procedure primarily revolved around displaying arrows on the screen as task prompts and instructing the subjects to perform the MI indicated by the arrow. The timeline of a single trial is shown in Fig. 5a. Detailed information can be found at http://www.bbci.de/competition/iii/.
Fig. 5.
Timeline of a single experiment
Dataset 2 came from Dataset I of BCI Competition IV; it includes EEG data from seven subjects, namely a, b, c, d, e, f and g, recorded at a sampling rate of 100 Hz (Zhang et al. 2012). However, we only utilized data from four of these seven subjects (a, b, f, g); the data of the remaining three subjects (c, d, and e) were artificially generated and not used in our analysis. For each task category (left-hand or foot MI), there were one hundred trials, resulting in a total of two hundred trials across both categories. As shown in Fig. 5b, a fixation cross and arrow prompts were displayed on the screen as cues for the participants to perform MI. Further details can be obtained at http://www.bbci.de/competition/iv/.
Data preprocessing
The ERD/ERS phenomenon is observed within the 8–13 Hz alpha band and the 14–30 Hz beta band. Hence, we employed a finite impulse response (FIR) filter to band-pass filter the EEG signals of both datasets within the 8–30 Hz frequency range. This effectively mitigates interference from other frequencies, thereby reducing their impact on the experimental outcomes.
In MI experiments, the time period from 0 to 1 s following the visual cue is commonly referred to as the preparation phase of imagination, while the interval between 3.5 and 4 s is denoted as the post-imagination phase (Wang et al. 2017). To determine an appropriate data interception window, we used event-related spectral perturbation (ERSP) analysis (Delorme and Makeig 2004). Considering data quality, we set a uniform time window of 0.5–2.5 s following the visual cue for both datasets.
By applying frequency band processing and time window processing techniques, we preprocess raw EEG data with an aim to minimize interference and noise while selecting suitable temporal segments that capture information relevant to MI tasks. These preprocessing steps are crucial for subsequent feature extraction and classification processes.
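The SciPy sketch below illustrates this preprocessing: an 8–30 Hz FIR band-pass followed by extraction of the 0.5–2.5 s post-cue segment at 100 Hz. The filter order (101 taps) and zero-phase filtering are assumptions; the paper does not specify these details.

```python
import numpy as np
from scipy.signal import firwin, filtfilt

fs = 100                       # sampling rate of both datasets (Hz)
# FIR band-pass 8-30 Hz; the filter order (101 taps) is an assumed value.
taps = firwin(numtaps=101, cutoff=[8, 30], pass_zero=False, fs=fs)

def preprocess_trial(raw, cue_sample):
    """raw: (channels x samples) continuous EEG; cue_sample: index of the cue.
    Returns the band-passed 0.5-2.5 s post-cue segment."""
    filtered = filtfilt(taps, [1.0], raw, axis=1)        # zero-phase filtering
    start = cue_sample + int(0.5 * fs)
    stop = cue_sample + int(2.5 * fs)
    return filtered[:, start:stop]                        # (channels x 200)
```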
Experiment settings
We accomplish this task by utilizing the third-party Python library torch_geometric.
Firstly, we conduct preprocessing on the raw EEG data and represent the distribution of EEG electrodes as a fully connected unweighted graph. Then, we embed the EEG data with prior labels into the graph to obtain structured graph data. Subsequently, temporal features within each node are automatically extracted through three 1D Conv layers with different kernel sizes. Additionally, we smooth the EEG signals of each node to mitigate potential noise effects. Furthermore, we employ GAT's attention mechanism to dynamically calculate channel correlations and update node features layer by layer, ultimately obtaining high-level node features. Our approach utilizes an 8-head attention mechanism along with the ReLU activation function, and the Dropout technique is applied to prevent overfitting and reduce the computational load by randomly ignoring certain features. The loss function is given in Eq. (7), and Adam serves as our optimizer with a learning rate of 0.01; after training for 100 epochs, we obtain the best model. Table 1 provides the parameter settings of our network.
Table 1.
Parameters of the STGAT-Net ($T$ represents the initial feature dimension of each channel and is related to the number of samples)
| Module | Parameters |
|---|---|
| 1D Conv | |
| 1D Conv | |
| 1D Conv | |
| Dropout | |
| GAT layer | |
| ReLU | |
| GAT layer | |
| ReLU | |
| GAT layer |
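Under the settings above, a minimal training sketch is given below: masked cross-entropy on the few labeled nodes (Eq. 7), Adam with a learning rate of 0.01, and 100 epochs. The `model`, `x`, and `edge_index` objects follow the earlier illustrative sketches, and the labeled node indices and label encoding are examples, not the actual electrode indices of either dataset.

```python
import torch
import torch.nn.functional as F

# Example prior labels: a few "effective" nodes (e.g., around C3/Cz/C4) and a
# few "redundant" frontal/occipital nodes. Indices here are illustrative only.
labeled_idx = torch.tensor([12, 14, 16, 0, 1, 40, 41])
node_labels = torch.tensor([1, 1, 1, 0, 0, 0, 0])   # 1 = effective, 0 = redundant

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
model.train()
for epoch in range(100):
    optimizer.zero_grad()
    logits = model(x, edge_index)                    # (N, 2) node class scores
    # Eq. (7): cross-entropy evaluated only on the labeled nodes.
    loss = F.cross_entropy(logits[labeled_idx], node_labels)
    loss.backward()
    optimizer.step()
```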
The optimal model is then employed for testing, calculating probability weights that indicate how likely each channel is to belong to each of the two channel classes under consideration. By comparing the weight differences between the effective-channel class and the redundant-channel class, the degree of association of each channel with the effective channels can be assessed. Based on these values, we select the top n channels as our final selection result.
Finally, we extract CSP features from these selected channels and validate them using an SVM classifier.
Performance of the channel selection method
In order to evaluate the effectiveness of our method, we extract CSP features from different combinations of channels. These combinations include all channels (AC-CSP), only the C3, C4, and Cz channels (3C-CSP), and the selected channels. A tenfold cross-validation approach was employed to train the model and evaluate its accuracy. Table 2 presents the highest accuracy achieved by each participant along with the corresponding number of channels utilized. The average accuracy achieved by the selected channel combination surpassed that obtained using all EEG channels (91.5% vs. 78.2%, 85.4% vs. 77.9%). Furthermore, statistical analysis using the Wilcoxon signed-rank test indicated that the selected channels effectively captured MI patterns (p < 0.05) (Woolson 2007). By eliminating redundant channels, the average classification accuracy improved significantly. Additionally, subject 'aa' saw an increase in accuracy from 70.6% to 86.3%, while subject 'av' experienced an improvement from 69.2% to 84.6%, a gain of over ten percentage points in both cases. The comparison between STGAT-CS and 3C-CSP revealed a significantly higher classification accuracy for STGAT-CS (91.5% vs. 75.3%, 85.4% vs. 70.6%). This suggests that capturing the ERD/ERS phenomenon comprehensively requires more than just the three specific channels (C3, C4, and Cz); in contrast, the selected channels capture this phenomenon more comprehensively.
Table 2.
Classification accuracy (%) of the STGAT-CS method on the two datasets
| Participant | Method | |||||
|---|---|---|---|---|---|---|
| AC-CSP | 3C-CSP | STGAT-CS | ||||
| Acc (%) | Num | Acc (%) | Num | Acc (%) | Num | |
| aa | 70.6 | 118 | 60.4 | 3 | 86.3 | 32 |
| al | 97.1 | 118 | 86.0 | 3 | 98.5 | 25 |
| av | 69.2 | 118 | 64.5 | 3 | 84.6 | 8 |
| aw | 76.5 | 118 | 75.2 | 3 | 88.2 | 14 |
| ay | 77.8 | 118 | 90.2 | 3 | 100.0 | 15 |
| Mean ± std | 78.2 ± 11.2 | 118 | 75.3 ± 13.0 | 3 | 91.5 ± 7.2 | 19 |
| a | 83.3 | 59 | 75.0 | 3 | 91.7 | 5 |
| b | 58.3 | 59 | 60.0 | 3 | 68.3 | 6 |
| f | 80.0 | 59 | 66.7 | 3 | 85.0 | 19 |
| g | 90.0 | 59 | 80.7 | 3 | 96.7 | 35 |
| Mean ± std | 77.9 ± 13.7 | 59 | 70.6 ± 9.1 | 3 | 85.4 ± 12.4 | 16 |
| p value | 0.047 | 0.010 | – | |||
Comparison with other methods and models
We compared our proposed method with the following channel selection approaches. All methods extract CSP features and use an SVM classifier.
The CSP-rank method (Tam et al. 2011) chooses channels according to the eigenvectors corresponding to the largest and smallest eigenvalues, sorting the channels by the magnitude of the filter coefficients obtained by CSP.
The sparse CSP (SCSP) (Arvaneh et al. 2011) integrates the ratio of the L1 norm to the L2 norm as a constraint in the CSP objective function, which enables spatial filters obtained from the refined objective function to be used for ranking.
The position prior weighted permutation entropy-CSP (PPWPE-CSP) (Sun et al. 2021) evaluates the channels on the basis of both amplitude and spatial information.
The channel selection method based on the coefficient of variation (CVCS) (Xiao et al. 2022) ranks the channels according to their contribution to feature extraction, considering both the coefficient of variation and the inter-class distance.
The channel selection method based on graph convolutional network (GCN-CS) (Liang et al. 2023) utilizes a combination of the Pearson correlation coefficient (PCC) and GCN to perform EEG channel selection based on node classification.
As can be seen from Table 3, where the highest accuracy for each subject is indicated in bold type, STGAT-CS exhibits the best average accuracy on both datasets. With the exception of subjects 'al' and 'aw', this method outperforms the others for the remaining subjects and even achieves 100% accuracy on subject 'ay'. Furthermore, in Dataset 1, this method utilizes fewer channels yet attains higher accuracy, surpassing CVCS by more than 3% while reducing the number of channels by approximately 30. In Dataset 2, despite selecting around 5 more channels than PPWPE-CSP, it achieves an accuracy improvement of more than 6%. These results demonstrate that the proposed channel selection approach outperforms the alternative methods.
Table 3.
Comparison of classification accuracy and number of channels between STGAT-CS and other channel selection methods
| Participant | Method | |||||
|---|---|---|---|---|---|---|
| CSP-rank | SCSP | PPWPE-CSP | CVCS | GCN-CS | STGAT-CS | |
| Acc (%)/Num | Acc (%)/Num | Acc (%)/Num | Acc (%)/Num | Acc (%)/Num | Acc (%)/Num | |
| aa | 81.5/25 | 80.3/85 | 82.6/13 | 85.4/56 | 83.2/20 | 86.3/32 |
| al | 96.2/22 | 79.2/59 | 98.8/50 | 97.9/49 | 98.9/37 | 98.5/25 |
| av | 63.7/10 | 65.3/12 | 59.3/49 | 71.8/28 | 74.6/6 | 84.6/8 |
| aw | 89.5/34 | 85.4/53 | 91.1/32 | 88.9/93 | 95.0/51 | 88.2/14 |
| ay | 88.3/36 | 93.2/36 | 89.4/15 | 94.3/48 | 93.9/20 | 100.0/15 |
| Mean ± std | 83.8/25 ± 12.4 | 84.3/49 ± 12.5 | 84.2/32 ± 15.1 | 87.7/55 ± 10.1 | 89.1/27 ± 9.9 | 91.5/19 ± 7.2 |
| a | 66.3/6 | 74.5/13 | 86.0/8 | 80.5/14 | 85.0/19 | 91.7/5 |
| b | 56.4/15 | 62.0/26 | 59.6/4 | 65.5/15 | 64.0/12 | 68.3/6 |
| f | 52.6/35 | 74.0/52 | 75.5/7 | 72.0/7 | 75.0/8 | 85.0/19 |
| g | 81.3/34 | 93.2/33 | 95.0/23 | 82.5/28 | 94.0/48 | 96.7/35 |
| Mean ± std | 64.2/23 ± 12.8 | 75.9/31 ± 12.9 | 79.0/11 ± 15.2 | 75.1/16 ± 7.9 | 79.5/22 ± 12.9 | 85.4/16 ± 12.4 |
We compare the channel selection effect of GCN-CS (GCN-based channel selection) (Liang et al. 2023) and GAT-CS (GAT-based channel selection), both of which are node classification models that utilize node aggregation in graph networks. While GCN-CS requires prior construction of an adjacency matrix based on the Pearson correlation coefficient, GAT-CS can directly take the unweighted fully connected graph as input and dynamically adjust the aggregation weights between nodes using the attention mechanism. As shown in Table 4, despite having more channels on average than GCN-CS (34 vs. 27, 22 vs. 22), GAT-CS achieves comparable best accuracy (88.0% vs. 89.1%, 82.9% vs. 79.5%), eliminating the need for the complex manual calculation of the adjacency matrix and greatly simplifying the procedure. Additionally, we introduce a 1D Conv module into GAT-CS, resulting in a significant reduction in the number of channels (19 vs. 34, 16 vs. 22). This improvement leads to enhanced accuracy (91.5% vs. 88.0%, 85.4% vs. 82.9%), validating the effectiveness of the temporal information extraction introduced by the convolutional module.
Table 4.
Comparison results of node classification models
| Participant | Method | |||||
|---|---|---|---|---|---|---|
| GCN-CS | GAT-CS | STGAT-CS | ||||
| Acc (%) | Num | Acc (%) | Num | Acc (%) | Num | |
| aa | 83.2 | 20 | 80.4 | 43 | 86.3 | 32 |
| al | 98.9 | 37 | 97.5 | 30 | 98.5 | 25 |
| av | 74.6 | 6 | 79.6 | 10 | 84.6 | 8 |
| aw | 95.0 | 51 | 85.2 | 18 | 88.2 | 14 |
| ay | 93.9 | 20 | 97.2 | 71 | 100.0 | 15 |
| Mean ± std | 89.1 ± 9.9 | 27 | 88.0 ± 8.8 | 34 | 91.5 ± 7.2 | 19 |
| a | 85.0 | 19 | 86.7 | 7 | 91.7 | 5 |
| b | 64.0 | 12 | 68.3 | 25 | 68.3 | 6 |
| f | 75.0 | 8 | 83.3 | 43 | 85.0 | 19 |
| g | 94.0 | 48 | 93.3 | 14 | 96.7 | 35 |
| Mean ± std | 79.5 ± 12.9 | 22 | 82.9 ± 10.6 | 22 | 85.4 ± 12.4 | 16 |
Relationship between the number of channels and the average accuracy
We examined the relationship between the proportion of selected channels and the average accuracy across all subjects. The sorted channels are retained proportionally, and the average accuracy and standard deviation are plotted for each dataset.
As shown in Fig. 6, for BCI Competition III Dataset IVa, reducing the proportion of channels from 100% to 80% does not decrease the accuracy but increases it, indicating effective removal of irrelevant channels. Further reduction in the proportion of channels does not significantly reduce the accuracy, thereby excluding the influence of redundant channels. Notably, when only 20% of the channels remain, the highest accuracy among all proportions is obtained, demonstrating the effectiveness of this method in eliminating channels irrelevant or redundant to MI. For BCI Competition IV Dataset I, although a slight decrease in accuracy occurs when reducing the channel proportion from 100% to 80%, the final accuracy can be maintained as further reduction effectively eliminates redundant channels. This method also exhibits favorable results for fixed channel numbers.
Fig. 6.
Relationship between the number of channels selected proportionally and the average accuracy and standard deviation in two datasets
Furthermore, we have also generated a line chart illustrating the relationship between the number of channels and accuracy for each participant in Dataset 2 in Fig. 7. The red line represents the highest classification accuracy achieved along with its corresponding number of channels. Notably, subjects ‘a’, ‘b’, and ‘f’ exhibit superior accuracy while utilizing less than two-thirds of the total channel count. Subject ‘g’, on the other hand, attains optimal accuracy at 35 channels; however, it already achieves remarkably high accuracy when employing approximately 15 channels. This indicates that about two-thirds of the channels can be excluded effectively to eliminate irrelevant or redundant information. Additionally, an intuitive observation across all subjects reveals that as the number of channels increases, there is initially an improvement in accuracy followed by a slight decline before stabilizing at a satisfactory level. These findings demonstrate that our selected channels possess excellent discrimination capabilities for MI tasks while most excluded channels are indeed irrelevant or redundant.
Fig. 7.
Relationship between the number of channels and average accuracy in Dataset 2. a Subject a, b Subject b, c Subject f and d Subject g
We use the t-SNE technique to visualize the feature distribution by projecting the CSP features from all channels and from the selected channels onto a 2D plane. As can be seen from Fig. 8, the separability of the selected channels is clearly enhanced compared with utilizing all channels. For subjects 'aa' and 'b', after applying STGAT-CS, Table 2 demonstrates an improvement exceeding 10%, accompanied by a heightened level of separability illustrated in the figure. Similar outcomes are observed for the other subjects as well. These findings imply that eliminating redundant EEG channels can augment feature separability and contribute to enhancing classification performance.
Fig. 8.
Comparison of feature distributions. a Subject aa and b Subject b
To examine channel discriminability, we consider the difference in weights as a measure of effectiveness, with higher scores indicating stronger MI effects. After sorting the channels, we selected the top 10 and bottom 10 channels for each subject in Dataset 2 and plotted their electrode distributions. The electrode distribution maps highlight effective channels in red and irrelevant or redundant channels in blue.
From Fig. 9 we can see notable commonalities across all subjects, with the parietal region (proximate to channels C3, C4, and Cz) predominantly hosting the effective channels. Moreover, the frontal and occipital regions primarily contain irrelevant or redundant channels. This observation aligns with the neurocognitive concept that voluntary movements are governed by the frontal motor cortex. Furthermore, the resemblance in channel topography underscores the potential of our proposed method as a powerful tool for identifying critical areas within multi-channel EEG signals.
Fig. 9.
Electrode distribution diagram of selected channels. a Subject a, b Subject b, c Subject f, and d Subject g
Discussion
Interpretability of STGAT in channel selection
Most MI-BCI channel selection approaches require manually specified criteria to rank the importance of channels. For example, Wang et al. (2020) designed an EEG channel selection weight update method based on canonical correlation analysis (CCA), in which the cross-validation of a support vector machine (SVM) classifier is used to update the weights and channels are selected according to these weights. However, this channel selection method requires manual feature extraction and the steps are cumbersome.
Deep learning models can leverage the powerful computing capabilities of computers to achieve automatic feature extraction. For example, Liang et al. (2023) combined the Pearson correlation coefficient (PCC) and GCN to propose an EEG channel selection method based on node classification. However, EEG signals are multidimensional, and such graph-based channel selection methods often rely on the spatial relationships of EEG signals while ignoring temporal information, resulting in poor channel selection performance. Moreover, GCN-based channel selection requires the relationship information between nodes to be defined in advance, which limits the flexibility and generalization ability of the network.
Combining 1D Conv with the graph attention network can, on the one hand, capture temporal information within a single channel with the help of 1D Conv, making the information contained in the features more comprehensive. On the other hand, the GAT relies on the multi-head attention mechanism, which can adaptively capture the dependencies between nodes, thereby improving the generalization of the network.
Validity of STGAT-CS
In motor imagery classification, relying solely on 1D Conv for temporal feature extraction may not significantly enhance classification outcomes, while the efficacy of attention mechanisms in capturing inter-node dependencies could be limited. However, in our study, we address the channel selection challenge by framing it as a node classification problem on a graph, leveraging a combination of 1D Conv and GAT. By integrating neurophysiological insights, we construct graph data with tailored labels and employ the proposed STGAT-CS method to learn, predict, and select effective channels, thereby optimizing channel selection. The main contributions of our research are as follows: (1) Leveraging the popular deep learning method enables automatic feature extraction, thereby eliminating laborious manual feature extraction steps; (2) By considering the channel selection problem as a node classification problem within graph theory, we introduce the attention mechanism to simplify operational procedures and improve the generalization ability; (3) Through combining 1D Conv and GAT, we effectively extract spatiotemporal features from EEG signals, enhancing distinguishability of information contained within these features.
Specifically, the channel classification model STGAT was first designed by combining the characteristics of 1D Conv and GAT; the model is shown in Fig. 4. Subsequently, CSP features were extracted from the selected channels and SVM was employed for classification. The model was evaluated on two publicly available competition datasets. Through tenfold cross-validation, the optimal number of selected channels and the corresponding accuracy were determined; the whole process is shown in Fig. 1.
A comparison between the channels selected by this method, all channels, and the traditional 3 channels revealed a higher accuracy, as shown in Table 2; our selected channels work better than using all channels or the traditional 3-channel configuration. Furthermore, when compared with other classical and recently popular methods, our approach yielded superior results, as shown in Table 3. Additionally, we conducted a comparative analysis to assess both node classification models and the influence of the 1D Conv module on channel selection effectiveness, ultimately confirming the advantages of our chosen model, as can be seen in Table 4. The efficacy of the GAT with its attention mechanism becomes evident in its ability to capture inter-channel dependencies, comparable to the GCN that relies on a manually constructed adjacency matrix. Moreover, the incorporation of the 1D Conv module addresses the temporal information inadequacies inherent in graph-based networks, leading to improved channel selection outcomes. Finally, we visually represented the effect of channel selection from various perspectives, including feature distribution diagrams and electrode distribution diagrams of the selected channels, effectively demonstrating the efficacy of our channel selection method, as shown in Figs. 6, 7, 8 and 9. Our results demonstrate impressive accuracy gains with fewer channels, highlighting the cost-effectiveness of our channel selection method: we require less than a third of the channels yet achieve equivalent accuracy to using all channels. Furthermore, the feature distribution of our selected channels is more discriminative after selection. Notably, our selected channels are concentrated around the C3, C4, and Cz electrodes, aligning well with neurophysiological knowledge.
Future work
Our method facilitates convenient channel selection, eliminating the need for manual feature extraction and overcoming the limitations of traditional methods in capturing comprehensive EEG signal information. Moving forward, we can first address deep learning's inherent limitations, such as its significant computational overhead and extensive parameterization, by reducing training parameters and developing lightweight models. Furthermore, we can leverage information theory to achieve a more optimal combination of effective channel markers.
Conclusion
This paper introduces a novel method, the spatio-temporal-graph attention network based channel selection (STGAT-CS), for EEG signal channel selection. By harnessing the capabilities of deep learning networks in automatic feature extraction, while simultaneously capturing temporal and spatial topological information, our approach facilitates comprehensive and efficient channel selection directly from raw data. In the context of MI-based BCI, we frame the channel selection problem as a node classification task on a graph, treating EEG electrodes as graph nodes and utilizing attention mechanisms to capture inter-node information.
Specifically, temporal information from each EEG channel is extracted using 1D Conv, while the attention mechanism of the graph attention network automatically captures the topological relationships among EEG channels, thereby updating the node features. Channels are then sorted and selected based on the output weights of the network, effectively modeling the topological relationships between channels while conducting channel selection.
Additionally, we employ the CSP method to extract features from selected channels, and SVM for MI classification, tested on two public competition datasets. Experimental results illustrate that our method significantly reduces redundant channels compared to using all channels, resulting in fewer selected channels with higher accuracy. Furthermore, our approach outperforms other methods in terms of average classification accuracy and rational channel selection, thus validating the effectiveness and rationality of our channel selection method. In summary, our proposed method offers new insights and approaches for related research.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 62271181, Grant 62171171, Grant 62071161, and Grant 62371171.
Declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- Ali S, Smith KA (2003) Matching svm kernel’s suitability to data characteristics using tree by fuzzy c-means clustering. Des Appl Hybrid Intell Syst 2003:553–562. 10.5555/998038.998103 [Google Scholar]
- Allison BZ, Kübler A, Jin J (2020) 30+years of P300 brain-computer interfaces. Psychophysiology 57(7):e13569. 10.1111/psyp.13569 [DOI] [PubMed] [Google Scholar]
- Arvaneh M, Guan C, Ang KK, Quek C (2011) Optimizing the channel selection and classification accuracy in EEG-based BCI. IEEE Trans Biomed Eng 58(6):1865–1873. 10.1109/TBME.2011.2131142 [DOI] [PubMed] [Google Scholar]
- Blankertz B, Muller KR, Krusienski DJ, Schalk G, Wolpaw JR, Schlogl A, Pfurtscheller G (2006) The BCI competition III: validating alternative approaches to actual BCI problems. IEEE Trans Neural Syst Rehabil Eng 14(2):153–159. 10.1109/TNSRE.2006.875642 [DOI] [PubMed] [Google Scholar]
- Blankertz B, Tomioka R, Lemm S, Kawanabe M, Müller K (2007) Optimizing spatial filters for robust EEG single-trial analysis. IEEE Signal Process Mag 25(1):41–56. 10.1109/MSP.2008.4408441 [Google Scholar]
- Blankertz B, Losch F, Krauledat M, Dornhege G, Curio G, Müller KR (2008) The Berlin brain-computer interface: accurate performance from first-session in BCI-naive subjects. IEEE Trans Biomed Eng 55(10):2452–2462. 10.1109/TBME.2008.923152 [DOI] [PubMed] [Google Scholar]
- Chang W, Huang W, Yan G, Zhang Y (2021) EEG based graph network analysis for motor imagery task. In: 2021 6th international conference on computational intelligence and applications (ICCIA). IEEE, pp 185–189. 10.1109/ICCIA52886.2021.00043
- Cincotti F, Mattia D, Babiloni C, Carducci F, Salinari S, Bianchi L, Marciani MG, Babiloni F (2003) The use of EEG modifications due to motor imagery for brain-computer interfaces. IEEE Trans Neural Syst Rehabil Eng 11(2):131–133. 10.1109/TNSRE.2003.814455 [DOI] [PubMed] [Google Scholar]
- Cong F, Lin Q-H, Kuang L-D, Gong X-F, Astikainen P, Ristaniemi T (2015) Tensor decomposition of EEG signals: a brief review. J Neurosci Methods 248:59–69. 10.1016/j.jneumeth.2015.03.018 [DOI] [PubMed] [Google Scholar]
- Delorme A, Makeig S (2004) EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods 134(1):9–21. 10.1016/j.jneumeth.2003.10.009 [DOI] [PubMed] [Google Scholar]
- Demir A, Koike-Akino T, Wang Y, Erdoğmuş D (2022) EEG-GAT: graph attention networks for classification of electroencephalogram (EEG) signal. In: 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, pp 30–35. 10.1109/EMBC48229.2022.9871984 [DOI] [PubMed]
- Feng JK, Jin J, Daly I, Zhou J, Niu Y, Wang X, Cichocki A (2019) An optimized channel selection method based on multifrequency CSP Rank for motor imagery-based BCI system. Comput Intell Neurosci 2019:8068357. 10.1155/2019/8068357 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gaur P, Pachori RB, Wang H, Prasad G (2015) An empirical mode decomposition based filtering method for classification of motor-imagery EEG signals for enhancing brain-computer interface. In: 2015 International joint conference on neural networks (IJCNN). IEEE, pp 1–7. 10.1109/IJCNN.2015.7280754
- Hamedi M, Salleh SH, Noor AM (2016) Electroencephalographic motor imagery brain connectivity analysis for BCI: a review. Neural Comput 28(6):999–1041. 10.1162/NECO_a_00838 [DOI] [PubMed] [Google Scholar]
- Han KJ, Prieto R, Ma T (2019) State-of-the-art speech recognition using multi-stream self-attention with dilated 1D Conv. In: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, pp 54–61. 10.1109/ASRU46091.2019.9003730
- Hsu CC, Yeh CL, Lee WK, Hsu HT, Shyu KK (2020) Extraction of high-frequency SSVEP for BCI control using iterative filtering based empirical mode decomposition. Biomed Signal Process Control 61:102022. 10.1016/j.bspc.2020.102022 [Google Scholar]
- Huang M, Daly I, Jin J, Zhang Y, Wang X, Cichocki A (2016) An exploration of spatial auditory BCI paradigms with different sounds: music notes versus beeps. Cogn Neurodyn 10:201–209. 10.1007/s11571-016-9377-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liang W, Jin J, Daly I, Sun H, Wang X, Cichocki A (2023) Novel channel selection model based on graph convolutional network for motor imagery. Cogn Neurodyn 17:1283–1296. 10.1007/s11571-022-09892-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lin X, Chen J, Ma W, Tang W, Wang Y (2023) EEG emotion recognition using improved graph neural network with channel selection. Comput Methods Programs Biomed 231:107380. 10.1016/j.cmpb.2023.107380 [DOI] [PubMed] [Google Scholar]
- Liu Y, Zhang H, Chen M, Zhang L (2015) A boosting-basedspatial-spectral model for stroke patients’ EEG analysis in rehabilitation training. IEEE Trans Neural Syst Rehabil Eng 24(1):169–179. 10.1109/TNSRE.2015.2466079 [DOI] [PubMed] [Google Scholar]
- Luo C, Li F, Li P, Yi C, Li C, Tao Q, Zhang X, Si Y, Yao D, Yin G, Song P, Wang H, Xu P (2022) A survey of brain network analysis by electroencephalographic signals. Cogn Neurodyn 16:17–41. 10.1007/s11571-021-09689-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Park Y, Chung W (2020) A novel EEG correlation coefficient feature extraction approach based on demixing EEG channel pairs for cognitive task classification. IEEE Access 8:87422–87433. 10.1109/ACCESS.2020.2993318 [Google Scholar]
- Pfurtscheller G, DaSilva FHL (1999) Event-related EEG/MEG synchronization and desynchronization: basic principles. Clin Neurophysiol 110(11):1842–1857. 10.1016/S1388-2457(99)00141-8 [DOI] [PubMed] [Google Scholar]
- Rong Y, Wu X, Zhang Y (2020) Classification of motor imagery electroencephalography signals using continuous small convolutional neural network. Int J Imaging Syst Technol 30(3):653–659. 10.1002/ima.22405 [Google Scholar]
- Schlögl A, Lee F, Bischof H, Pfurtscheller G (2005) Characterization of four-class motor imagery EEG data for the BCI-competition 2005. J Neural Eng 2(4):L14. 10.1088/1741-2560/2/4/L02 [DOI] [PubMed] [Google Scholar]
- Sun H, Jin J, Kong W, Zuo C, Li S, Wang X (2021) Novel channel selection method based on position priori weighted permutation entropy and binary gravity search algorithm. Cogn Neurodyn 15:141–156. 10.1007/s11571-020-09608-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sun B, Liu Z, Wu Z, Mu C, Li T (2023) Graph Convolution neural network based end-to-end channel selection and classification for motor imagery brain-computer interfaces. IEEE Trans Industr Inform 19:9314–9324. 10.1109/TII.2022.3227736 [Google Scholar]
- Tam WK, Ke Z, Tong KY (2011) Performance of common spatial pattern under a smaller set of EEG electrodes in brain-computer interface on chronic stroke patients: a multi-session dataset study. In: 2011 annual international conference of the IEEE engineering in medicine and biology society. IEEE, pp 6344–6347. 10.1109/IEMBS.2011.6091566 [DOI] [PubMed]
- Tang C, Gao T, Li Y, Chen B (2022) EEG channel selection based on sequential backward floating search for motor imagery classification. Front Neurosci 16:1045851. 10.3389/fnins.2022.1045851 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tang X, Yang C, Sun X, Zou M, Wang H (2023) Motor imagery EEG decoding based on multi-scale hybrid networks and feature enhancement. IEEE Trans Neural Syst Rehabil Eng 31:1208–1218. 10.1109/TNSRE.2023.3242280 [DOI] [PubMed] [Google Scholar]
- Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. Adv Neural Inf Process Syst 30. 10.48550/arXiv.1706.03762
- Wang HT, Li T, Huang H, He YB, Liu XC (2017) A motor imagery analysis algorithm based on spatio-temporal-frequency joint selection and relevance vector machine. Control Theory Appl 34(10):1403–1408. 10.7641/CTA.2017.70169 [Google Scholar]
- Wang Q, Cao T, Liu D, Zhang M, Lu JY, Bai O, Sun J (2020) A motor-imagery channel-selection method based on SVM-CCA-CS. Meas Sci Technol 32(3):035701. 10.1088/1361-6501/abc205 [Google Scholar]
- Woolson RF (2007) Wilcoxon signed‐rank test. Wiley encyclopedia of clinical trials, pp 1–3. 10.1002/9780471462422.eoct979
- Xiao R, Huang Y, Xu R, Wang B, Wang X (2022) Coefficient-of-variation-based channel selection with a new testing framework for MI-based BCI. Cogn Neurodyn 16:791–803. 10.1007/s11571-021-09752-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yang Y, Chevallier S, Wiart J, Bloch I (2017) Subject-specific time-frequency selection for multi-class motor imagery-based BCIs using few Laplacian EEG channels. Biomed Signal Process Control 38:302–311. 10.1016/j.bspc.2017.06.016 [Google Scholar]
- Zhang H, Guan C, Ang KK, Wang C (2012) BCI competition IV–data set I: learning discriminative patterns for self-paced EEG-based motor imagery detection. Front Neurosci 6:7. 10.3389/fnins.2012.00007 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang X, Jin J, Li S, Wang X, Cichocki A (2021) Evaluation of color modulation in visual P300-speller using new stimulus patterns. Cogn Neurodyn 15:873–886. 10.1007/s11571-021-09669-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhu K, Zhang X, Wang J, Cheng N, Xiao J (2023) Improving EEG-based Emotion Recognition by Fusing Time-Frequency and Spatial Representations. In: 2023–2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 1–5. 10.1109/ICASSP49357.2023.10097171
- Zuo C, Jin J, Yin E, Saab R, Miao Y, Wang X, Hu D, Cichocki A (2020) Novel hybrid brain-computer interface system based on motor imagery and P300. Cogn Neurodyn 14:253–265. 10.1007/s11571-019-09560-x [DOI] [PMC free article] [PubMed] [Google Scholar]