Abstract
Spatial filtering is widely used in brain-computer interface (BCI) systems to augment the signal characteristics of electroencephalogram (EEG) signals. In this study, an EEG feature extraction method based on spatial-domain filtering, optimal discriminant hyperplane-common spatial subspace decomposition (ODH-CSSD), is proposed. Specifically, multi-dimensional EEG features are extracted from the original EEG signals by the common spatial subspace decomposition (CSSD) algorithm, and an optimal feature criterion is established to find the multi-dimensional optimal projection space. A classic method of reducing the data dimension is to use the eigenvectors of a lumped covariance matrix corresponding to the largest eigenvalues. Here, the cost function is defined as the extreme value of the discriminant criterion, and the N mutually orthogonal discriminant vectors corresponding to the N extreme values of the criterion are solved and assembled into the N-dimensional optimal feature space. Finally, the multi-dimensional EEG features are projected into the N-dimensional optimal projection space to obtain the optimal N-dimensional EEG features. Two-dimensional and three-dimensional optimal EEG features are extracted from motor imagery EEG datasets and identified using the interpretable discriminative rectangle mixture model (DRMM). Experimental results show that the accuracy of DRMM in identifying two-dimensional optimal features is at least 0.91, and the highest accuracy reaches 0.975. DRMM also has the most stable recognition accuracy for two-dimensional optimal features, with an average clustering accuracy of 0.942, and the gap between the accuracy of DRMM and that of fuzzy C-means (FCM) and K-means can reach 0.26. For the optimal three-dimensional features, the clustering accuracy of DRMM is higher than that of FCM and K-means for most subjects.
In general, the decision rectangles obtained by DRMM can clearly explain the differences between clusters. Notably, optimizing multi-dimensional EEG features by optimal projection is superior to Fisher's ratio, and this method provides an alternative for BCI applications.
Supplementary Information
The online version contains supplementary material available at 10.1007/s11571-021-09768-w.
Keywords: Electroencephalogram (EEG), Motor imagery, Optimal discriminant hyperplane, Interpretable clustering
Introduction
Electroencephalogram (EEG) signals are generated by neurons in the human brain and are widely used in noninvasive brain-computer interface systems (Birbaumer et al. 2019). In a brain-computer interface system, it is very important to extract highly separable features from EEG signals and to accurately recognize the intention they carry by using a strong classification algorithm (Perdikis et al. 2018). In addition, EEG signals play an extremely important role in classifying different cognitive states during the performance of various attention and visual-motor coordination tasks (Gaurav et al. 2021). Cui et al. suggested that identifying the temporal changes of mental workload level is crucial for enhancing the safety of human–machine system operations; this is also vital in EEG intention recognition (Cui et al. 2016).
Extensive research efforts have been dedicated to improving EEG feature extraction and classification for BCI applications (Li et al. 2004; Mishchenko et al. 2018; Zhang et al. 2013, 2015; Reddy et al. 2019; Hua et al. 2019). Common spatial subspace decomposition (CSSD) (Chen et al. 2019; Jiang et al. 2018; Mishuhina et al. 2018) finds the spatial filter that maximizes the variance of one class while minimizing the variance of the other class before the EEG data are passed to the classification algorithm, and it has been widely used to improve the quality of EEG signals. However, CSSD suffers from several problems (Jiang et al. 2018; Mishuhina et al. 2018): its sensitivity to outliers, overfitting when trained on small datasets, and interference channels all degrade its performance. In view of these shortcomings, many researchers have put forward improved methods (Wu et al. 2008, 2014; Sun et al. 2015; Fu et al. 2016; Aghaei et al. 2015; Meng et al. 2014). One line of work attempts to find more discriminant information in the time or spectral domain to help improve recognition accuracy. Another tries to solve the over-fitting problem in the spatial filter optimization process.
Some researchers have tried to improve the quality of EEG signals through spatial filtering. Several event-related potential (ERP) spatial filtering methods (Roy et al. 2015) have been designed to enhance relevant EEG activity for active brain-computer interfaces. Spatial filtering was applied to increase the signal-to-noise ratio and reduce the dimension of the signal (Barachant 2014). Two CSSD filters (Wu et al. 2017), extended from the CSSD filter for classification, were demonstrated for EEG-based regression problems in BCI. Bhattacharyya et al. applied a novel filtering approach for suppressing the cortical stimulation artifact in stereo-EEG signals using time, frequency and spatial information (Bhattacharyya et al. 2018). A clustering algorithm was employed to extract the spatial nodes with distinct connectivity attributes throughout EEG-based brain networks (Yamada et al. 2016; Zeng et al. 2020). The temporal features of wavelet entropy from the extracted nodes were then transformed into spatio-temporal images (Zhang et al. 2020).
Inspired by these studies, we present a novel spatial filtering method, called ODH-CSSD, to improve the quality of EEG signals. ODH-CSSD optimizes multi-dimensional EEG features through a multi-dimensional optimal projection space. This is superior to the Fisher ratio, which optimizes multi-dimensional EEG features through a single best projection, because more feature dimensions with high separability can be obtained.
To verify the effectiveness of the proposed EEG feature extraction method, the ODH-CSSD algorithm was used to extract the optimal EEG features from a public EEG dataset. This study mainly examines the recognition accuracies of two-dimensional and three-dimensional optimal EEG features. Interpretable clustering results (Fraiman et al. 2013; Kim et al. 2015) not only describe the characteristics of a cluster but also show how it differs from other clusters. To identify the optimal features, an interpretable clustering model (Chen et al. 2016), called DRMM in this study, was used. Compared with other interpretable clustering models (Fraiman et al. 2013; Kim et al. 2015; Liu et al. 2000), DRMM can make full use of prior knowledge to form interpretable clusters rather than constructing decision trees. Meanwhile, compared with conventional clustering models (Agrawal et al. 2019; Wang et al. 2019), DRMM can clearly explain the differences between clusters. Unlike supervised learning algorithms (Mishchenko et al. 2018; Fu et al. 2018; Kshirsagar et al. 2018; Chen et al. 2018), DRMM does not train a model on labeled data, which significantly shortens the training process for BCI applications. The implementation process of the interpretable clustering proposed in this study is shown in Fig. 1.
Fig. 1.
Identifying EEG intention with interpretable clustering
Experimental data and tensor-structure representation
The data are the motor imagery (MI) EEG data provided by the German team for BCI Competition IV (Blankertz et al. 2007). Signals from 59 EEG electrodes, most densely distributed over the sensorimotor region, were measured and sampled at 1000 Hz. The collected EEG signals were stored in a four-dimensional tensor structure of size (#) of samples × (#) of channels × (#) of trials × (#) of classes (400 × 59 × 90 × 2), as shown in Fig. 2.
Fig. 2.

High dimensional representation of data structure
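The tensor layout described above can be illustrated with a short NumPy sketch. The array contents are random placeholders standing in for the recorded EEG, and the helper name `get_trial` is hypothetical; only the dimension sizes come from the text.

```python
import numpy as np

# Dimensions quoted in the text: samples x channels x trials x classes.
N_SAMPLES, N_CHANNELS, N_TRIALS, N_CLASSES = 400, 59, 90, 2

# Placeholder values standing in for the recorded EEG (assumption).
rng = np.random.default_rng(0)
eeg = rng.standard_normal((N_SAMPLES, N_CHANNELS, N_TRIALS, N_CLASSES))


def get_trial(tensor, trial, cls):
    """Return one trial as a channels x samples matrix (hypothetical helper)."""
    return tensor[:, :, trial, cls].T


x = get_trial(eeg, trial=0, cls=1)
print(eeg.shape, x.shape)
```

Returning each trial as channels × samples matches the orientation that spatial covariance computations usually expect.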
ODH-CSSD feature optimization
Common spatial subspace decomposition
The core of the CSSD algorithm is to compute a spatial-domain filter that extracts the task-related components from the raw EEG signals.

(1) Let $X_i$ ($i = 1$ or $2$, where 1 denotes imagined movement of the left hand and 2 denotes imagined movement of the right hand) denote a multi-channel EEG signal under imagination task $i$. The normalized spatial covariance matrix is defined as

$$R_i = \frac{X_i X_i^{T}}{\mathrm{trace}\left(X_i X_i^{T}\right)} \tag{1}$$

where $X_i^{T}$ is the transpose of the matrix $X_i$, and $\mathrm{trace}(\cdot)$ is the sum of the diagonal elements of the matrix.

(2) The covariance matrices of multiple trials of the same task are averaged to obtain an average spatial covariance matrix $\bar{R}_i$:

$$\bar{R}_i = \frac{1}{n}\sum_{j=1}^{n} R_i(j) \tag{2}$$

where $n$ represents the number of trials in each class.

(3) The eigenvalues and eigenvectors are obtained according to Eq. (3).

$$\bar{R}_1 + \bar{R}_2 = U \Lambda U^{T} \tag{3}$$

in which $U$ is a matrix of eigenvectors and $\Lambda$ is a diagonal matrix of eigenvalues.

(4) The EEG signals are filtered by using $Z = W X$, where $W$ is the spatial filter constructed from $U$ and $\Lambda$, $X \in \mathbb{R}^{N \times T}$, $N$ represents the number of channels of the EEG signals, and $T$ represents the number of samples in one trial. According to Eq. (4), the feature vector $f$ is extracted from the filtered matrix $Z$, and the dimension of $f$ does not exceed the number of channels.

$$f = \log\left(\frac{\mathrm{var}\left(Z_m\right)}{\sum \mathrm{var}\left(Z_m\right)}\right) \tag{4}$$

where the matrix $Z_m$ consists of the first m rows and the last m rows of $Z$. These steps follow previous work based on CSSD (Fu et al. 2021).
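The four steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the whitening step and the log-variance feature are taken from the standard common-spatial-pattern formulation and are assumptions here, and the function names and the synthetic six-channel data are hypothetical.

```python
import numpy as np


def normalized_cov(x):
    """Normalized spatial covariance of one trial (channels x samples)."""
    c = x @ x.T
    return c / np.trace(c)


def cssd_filters(trials_1, trials_2):
    """Spatial filters from two lists of trials (each channels x samples)."""
    r1 = np.mean([normalized_cov(x) for x in trials_1], axis=0)
    r2 = np.mean([normalized_cov(x) for x in trials_2], axis=0)
    # Eigendecomposition of the composite covariance, then whitening
    # (the whitening step is an assumption borrowed from standard CSP).
    evals, evecs = np.linalg.eigh(r1 + r2)
    whiten = np.diag(evals ** -0.5) @ evecs.T
    # In the whitened space the two classes share eigenvectors.
    lam, b = np.linalg.eigh(whiten @ r1 @ whiten.T)
    order = np.argsort(lam)[::-1]  # largest class-1 variance first
    return b[:, order].T @ whiten


def cssd_features(x, filters, m):
    """Log-variance features from the first and last m filtered rows."""
    z = filters @ x
    z_sel = np.vstack([z[:m], z[-m:]])
    var = z_sel.var(axis=1)
    return np.log(var / var.sum())


# Synthetic data: class 1 is strong on channel 0, class 2 on channel 5.
rng = np.random.default_rng(0)


def make_trial(strong_channel):
    x = rng.standard_normal((6, 200))
    x[strong_channel] *= 5.0
    return x


class_1 = [make_trial(0) for _ in range(20)]
class_2 = [make_trial(5) for _ in range(20)]
W = cssd_filters(class_1, class_2)
f1 = cssd_features(class_1[0], W, m=1)
f2 = cssd_features(class_2[0], W, m=1)
print(f1, f2)
```

With m rows taken from each end of the filtered matrix, the feature vector has 2m entries, so m = 5 would give ten-dimensional features.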
Optimal discriminant hyperplane
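In order to find the optimal discriminant vectors (Foley et al. 1975), the optimal discriminant criterion should be set first. The optimal discriminant criterion set in this study is as follows:

$$R(d) = \frac{\left(d^{T}\Delta\right)^{2}}{d^{T} A d} \tag{5}$$

in which $A = cW_1 + (1-c)W_2$, $0 \le c \le 1$. $d$ is the L-dimensional column vector on which the data are projected. $d^{T}$ is the transpose of $d$. $W_i$ is the within-class scatter for class i and its formula is as follows:

$$W_i = \sum_{j=1}^{N_i}\left(x_{ij}-\hat{\mu}_i\right)\left(x_{ij}-\hat{\mu}_i\right)^{T} \tag{6}$$

where $\Delta$ is the difference in the estimated means, i.e. $\Delta = \hat{\mu}_1 - \hat{\mu}_2$. $\hat{\mu}_i$ is the estimated mean of class i and it can be given by:

$$\hat{\mu}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} x_{ij} \tag{7}$$

where $x_{ij}$ represents the jth feature vector for class i and $N_i$ is the number of samples in class i. Note that the generalized ratio is independent of the magnitude of the vector $d$, as $R(\alpha d) = R(d)$ for any nonzero scalar $\alpha$.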
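The scale independence noted above can be checked numerically. The sketch below assumes the criterion takes the Foley-Sammon form, the squared projected mean difference divided by the projected weighted within-class scatter; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def scatter(x):
    """Within-class scatter of samples stored as rows of x."""
    centered = x - x.mean(axis=0)
    return centered.T @ centered


def criterion(d, x1, x2, c=0.5):
    """Generalized Fisher ratio of direction d (assumed Foley-Sammon form)."""
    delta = x1.mean(axis=0) - x2.mean(axis=0)
    a = c * scatter(x1) + (1.0 - c) * scatter(x2)
    return (d @ delta) ** 2 / (d @ a @ d)


rng = np.random.default_rng(0)
x1 = rng.standard_normal((90, 4)) + np.array([1.0, 0.0, 0.0, 0.0])
x2 = rng.standard_normal((90, 4))

d = rng.standard_normal(4)
a = 0.5 * scatter(x1) + 0.5 * scatter(x2)
d_fisher = np.linalg.solve(a, x1.mean(axis=0) - x2.mean(axis=0))

r_random = criterion(d, x1, x2)
r_scaled = criterion(3.0 * d, x1, x2)   # same direction, different length
r_fisher = criterion(d_fisher, x1, x2)  # Fisher direction
print(r_random, r_scaled, r_fisher)
```

Because rescaling the direction leaves the ratio unchanged, a unit-length constraint on each discriminant vector can be imposed without loss of generality.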
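In Fisher's ratio, the optimal direction of $d$ is $d_1$. Setting $d = d_1$ in the criterion gives Eq. (8)

$$R(d_1) = \frac{\left(d_1^{T}\Delta\right)^{2}}{d_1^{T} A d_1} \tag{8}$$

then

$$d_1 = \alpha_1 A^{-1}\Delta \tag{9}$$

where $\alpha_1$ is chosen subject to $d_1^{T} d_1 = 1$ and $\alpha_1^{2} = \left(\Delta^{T}\left[A^{-1}\right]^{2}\Delta\right)^{-1}$.

When maximizing $R(d_2)$ to solve for the optimal discriminant direction $d_2$, one of the constraints is that $d_1$ and $d_2$ are orthogonal. Suppose the objective function is:

$$C = R(d_2) - \lambda\, d_2^{T} d_1 \tag{10}$$

Take the partial derivative of $C$ with respect to $d_2$ and set it equal to zero,

$$\frac{\partial C}{\partial d_2} = \frac{2\left(d_2^{T}\Delta\right)}{d_2^{T} A d_2}\,\Delta - \frac{2\left(d_2^{T}\Delta\right)^{2}}{\left(d_2^{T} A d_2\right)^{2}}\,A d_2 - \lambda d_1 = 0 \tag{11}$$

then

$$d_2 = \alpha_2\left[A^{-1} - b_1\left[A^{-1}\right]^{2}\right]\Delta \tag{12}$$

in which $\alpha_2$ is the normalized constant of $d_2$ and $b_1 = \Delta^{T}\left[A^{-1}\right]^{2}\Delta \,/\, \Delta^{T}\left[A^{-1}\right]^{3}\Delta$. If the third optimal discriminant vector $d_3$ is sought, Eq. (13) can be obtained:

$$C = R(d_3) - \lambda_1 d_3^{T} d_1 - \lambda_2 d_3^{T} d_2 \tag{13}$$

Take the partial derivatives of $C$ with respect to $d_3$, $\lambda_1$ and $\lambda_2$, and set them equal to zero:

$$\frac{2\left(d_3^{T}\Delta\right)}{d_3^{T} A d_3}\,\Delta - \frac{2\left(d_3^{T}\Delta\right)^{2}}{\left(d_3^{T} A d_3\right)^{2}}\,A d_3 - \lambda_1 d_1 - \lambda_2 d_2 = 0, \qquad d_3^{T} d_1 = 0, \qquad d_3^{T} d_2 = 0 \tag{14}$$

Finally, the third optimal discriminant vector can be obtained:

$$d_3 = \alpha_3 A^{-1}\left\{\Delta - \left[d_1 \; d_2\right] S_2^{-1} \begin{bmatrix} 1/\alpha_1 \\ 0 \end{bmatrix}\right\} \tag{15}$$

where $\alpha_3$ is the normalized constant of $d_3$ and $S_2$ is the $2 \times 2$ matrix with entries $s_{ij} = d_i^{T} A^{-1} d_j$.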
Please refer to the Supplementary Information for the detailed derivation process (SI Appendix, section II, page 1).
Pseudo-code for solving for the optimal EEG features is shown in Table 1.
Table 1.
ODH-CSSD feature optimization
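As a rough illustration of the feature optimization step, the sketch below computes mutually orthogonal discriminant vectors with the recursion given by Foley and Sammon (1975) and projects the features onto them. It is a sketch under that assumption rather than a transcription of Table 1; the function name, the equal class weighting c = 0.5 and the synthetic ten-dimensional features are illustrative choices.

```python
import numpy as np


def discriminant_vectors(x1, x2, n_vectors, c=0.5):
    """Mutually orthogonal unit discriminant vectors (Foley-Sammon recursion).

    x1, x2: samples of the two classes stored as rows.
    Returns an L x n_vectors matrix whose columns are d_1, ..., d_n.
    """
    delta = x1.mean(axis=0) - x2.mean(axis=0)
    c1 = x1 - x1.mean(axis=0)
    c2 = x2 - x2.mean(axis=0)
    a_inv = np.linalg.inv(c * (c1.T @ c1) + (1.0 - c) * (c2.T @ c2))

    d1 = a_inv @ delta
    alpha_1 = 1.0 / np.linalg.norm(d1)  # normalizing constant of d_1
    vectors = [alpha_1 * d1]

    for n in range(1, n_vectors):
        d_prev = np.column_stack(vectors)   # L x n matrix of known vectors
        s = d_prev.T @ a_inv @ d_prev       # s_ij = d_i^T A^{-1} d_j
        rhs = np.zeros(n)
        rhs[0] = 1.0 / alpha_1
        d_n = a_inv @ (delta - d_prev @ np.linalg.solve(s, rhs))
        vectors.append(d_n / np.linalg.norm(d_n))
    return np.column_stack(vectors)


# Synthetic 10-dimensional features for two classes (placeholder data).
rng = np.random.default_rng(1)
shift = np.linspace(1.0, 0.1, 10)
x1 = rng.standard_normal((90, 10)) + shift
x2 = rng.standard_normal((90, 10)) - shift

D = discriminant_vectors(x1, x2, n_vectors=3)
y1, y2 = x1 @ D, x2 @ D  # projected three-dimensional features
print(np.round(D.T @ D, 6))
```

Passing `n_vectors=2` yields the two-dimensional discriminant plane, and the columns of `D` stay orthonormal by construction.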
Discriminative rectangle mixture model
Let x represent the rule-generating features and y represent the clustering structure. The features are then normalized so that each feature has zero mean and unit variance. Assume K is the number of clusters, and $z_n$ is defined as the K-dimensional cluster indicator of the n-th sample. For example, $z_{nk} = 1$ means that the n-th sample belongs to the k-th cluster (Ahmed et al. 2019; Bishop 2006; Fraley and Raftery 2002).
Decision boundary with prior distribution (SI Appendix, section III page 4).
Soft clustering indicator (SI Appendix, section III page 5).
Soft decision rectangle clustering (SI Appendix, section III page 5).
Generating the Cluster-Preserving Features (SI Appendix, section III page 6).
Overall Model (SI Appendix, section III page 6).
The joint probability of DRMM is:
| 16 |
where represents rectangular decision boundaries, and represents parameters. Given the observed data , and , DRMM focuses on learning the posterior distributions and the optimal parameters .
Solving model (SI Appendix, section III page 6).
In the EM algorithm (Bishop 2006), is used as a latent variable, and the maximum a posteriori estimates of the parameters and are obtained. The objective function is:
| 17 |
① Expectation Step.
The posterior probability of the implicit variable is calculated, given the parameters and the observed variables .
| 18 |
where represents constants that are not functions of and normalize this equation such that it defines a valid probability. For any , is conditionally independent of each other:
| 19 |
in which and is the vector of K-dimension. Given that the posterior distribution is known, the expectation of the joint logarithm is calculated by Eq. (20):
| 20 |
② Maximization Step.
The DRMM algorithm is summarized in Table 2.
Table 2.
DRMM algorithm
Here, the steps of the DRMM algorithm used in this work are only briefly described; the full procedure is summarized in Table 2.
Results
It is well known that even the most powerful algorithm cannot give satisfactory results without good data features. In view of this, this study proposes an EEG feature extraction method that yields features with maximum separability. To demonstrate the reliability of the proposed method, the motor imagery EEG data (Blankertz et al. 2007) of the German team in BCI Competition IV were used for verification. The EEG data cover 7 subjects, of which the data of subjects C, D, and E were artificially fitted by the team for the purpose of evaluating and comparing algorithm performance.
In this study, the best clustering results were obtained when m is 5 (Fu et al. 2021). For comparison, m = 3, 4, 6 and 7 were also tested; the clustering results with different values of m based on BCI Competition IV are shown in Table 3.
Table 3.
The clustering results with different values of m based on BCI Competition IV Data
| Subjects | A | B | F | G |
|---|---|---|---|---|
| m* = 3 | 0.89 | 0.83 | 0.84 | 0.81 |
| m = 4 | 0.83 | 0.89 | 0.92 | 0.82 |
| m = 5 | 0.91 | 0.93 | 0.935 | 0.92 |
| m = 6 | 0.85 | 0.88 | 0.83 | 0.91 |
| m = 7 | 0.79 | 0.84 | 0.87 | 0.82 |
*m denotes the number of rows taken from each of the top and the bottom of the filtered matrix to form the feature matrix passed to the algorithm
As can be seen from Table 3, for subjects A, B, F and G, the clustering performance is best when m is 5, with accuracies of 0.91, 0.93, 0.935 and 0.92, respectively. When m is greater or less than 5, the accuracy decreases and the clustering performance becomes worse. Therefore, m = 5 is used when CSSD extracts multidimensional EEG features from the original EEG signals in this study.
When CSSD is used to extract multidimensional EEG features from raw EEG signals, it can overfit when trained on small datasets (Jiang et al. 2018; Mishuhina et al. 2018). An effective solution to this problem is to reduce the dimension of the features. The classical method of reducing the data dimension is to use the eigenvectors of the lumped covariance matrix corresponding to the largest eigenvalues. Encouraged by this, an optimal criterion was established based on multidimensional EEG features in this study. The N mutually orthogonal discriminant vectors corresponding to the N extreme values of the optimal discriminant criterion are then used as the optimal discriminant vectors. In the training process, the interpretable clustering model DRMM was used to identify the optimal EEG features. In this study, the parameters of the clustering algorithm are set as a = 10 and αt = βt = 1. DRMM learns the optimal features, gives the predicted cluster labels and finds the rectangular decision rule for each cluster. Rectangular decision rules can clearly explain the differences between clusters.
The two-dimensional EEG feature optimization results are shown in Fig. 3. For subjects C, D, and E, the two-dimensional optimal features are distributed near a straight line; since these data are all artificially fitted, there are no outliers. For subjects A, B, F, and G, the two-dimensional optimal features show some outliers, but the separability between the two clusters is very high. The optimal features of subject G have the most outliers. For all subjects, the rectangular decision rules obtained by DRMM accurately identify the distribution boundaries of the data.
Fig. 3.
Plots of the clustering results in two-dimensional optimal features of subjects. The rectangles represent the rules determined by DRMM
To further study the performance of ODH-CSSD in extracting high-dimensional EEG features, this study also extracts the optimal three-dimensional features and uses DRMM to identify them. The results are shown in Fig. 4. Since the discriminant criterion is established based on maximum separability, the distributions of the optimal features show little overlap for all subjects. According to Fig. 4, DRMM accurately identified the boundary between the two clusters of data for each subject. For subject F, the rectangular decision boundaries obtained by DRMM perform better than Bayesian classification boundaries.
Fig. 4.
Plots of the clustering results in three-dimensional optimal features of subjects. The rectangles represent the rules determined by DRMM
Comparing Fig. 3 with Fig. 4, it can be concluded that the optimal features extracted by ODH-CSSD show great separability. It is worth noting, however, that the three-dimensional optimal features of subject G were misclustered. For the two-dimensional optimal features of subject G, the rectangular decision rule accurately identifies the clustering boundary of the scattered data points, whereas in the clustering results of subject G in Fig. 4 the rectangular decision boundaries do not accurately identify the boundaries of the data.
According to the clustering labels predicted by DRMM, the clustering accuracy of each subject was calculated, and the clustering accuracy of two-dimensional and three-dimensional optimal EEG features was compared. The clustering accuracy is shown in Table 4.
Table 4.
Comparison of clustering accuracy
| Subjects | A | B | C | D | E | F | G | AVERAGE |
|---|---|---|---|---|---|---|---|---|
| Tw-D-AC* | 0.91 | 0.93 | 0.95 | 0.965 | 0.975 | 0.935 | 0.92 | 0.942 |
| Th-D-AC* | 0.92 | 0.865 | 0.97 | 0.97 | 0.975 | 0.945 | 0.66 | 0.900 |
*Tw-D-AC represents two-dimensional optimal feature clustering accuracy, Th-D-AC represents three-dimensional optimal feature clustering accuracy
It can be seen from Table 4 that, for subjects A and F, although the recognition accuracy of the three-dimensional optimal features is higher than that of the two-dimensional optimal features, the difference is small. For subjects B and G, the clustering accuracy of the optimal two-dimensional features is higher than that of the optimal three-dimensional features. For subjects C, D and E, increasing the feature dimension improves the clustering accuracy only slightly or not at all; for subject E, the accuracy of the two-dimensional optimal features is the same as that of the three-dimensional optimal features.
To study the performance of the proposed method for identifying EEG intention, the mean square error (MSE) of each subject is calculated. In BCI Competition IV, MSE was used as an indicator to evaluate team performance. This study compares the MSE calculated from the clustering labels with that of the champion of the BCI competition. The results of the comparison are shown in Table 5.
Table 5.
Intention recognition performance of all seven participants evaluated by MSE
| Subjects | A | B | F | G | AVERAGE1 | C | D | E | AVERAGE2 |
|---|---|---|---|---|---|---|---|---|---|
| MSE1* | 0.40 | 0.42 | 0.42 | 0.29 | 0.382 | 0.33 | 0.23 | 0.28 | 0.281 |
| MSE2* | 0.36 | 0.24 | 0.24 | 0.28 | 0.28 | 0.20 | 0.14 | 0.10 | 0.146 |
| MSE3* | 0.32 | 0.54 | 0.22 | 1.36 | 0.61 | 0.12 | 0.12 | 0.10 | 0.113 |
*MSE1 represents the results of the champion in the competition list. MSE2 represents the results of the two-dimensional optimal features and MSE3 represents the results of the three-dimensional optimal features
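One way to relate MSE to clustering accuracy is sketched below. It assumes that the two classes are coded as -1 and +1, which the text does not state; under that assumption every wrong label contributes a squared error of 4, so MSE = 4 × (1 − accuracy). The trial count of 200 and the function name are hypothetical. The example reproduces the pairing of an accuracy of 0.91 with an MSE of 0.36 reported for the two-dimensional features of subject A.

```python
import numpy as np


def mse_from_labels(true_labels, predicted_labels):
    """Mean square error between two label vectors."""
    t = np.asarray(true_labels, dtype=float)
    p = np.asarray(predicted_labels, dtype=float)
    return float(np.mean((t - p) ** 2))


# 200 hypothetical trials coded as +1 / -1, with 18 labels flipped.
true = np.array([1] * 100 + [-1] * 100)
pred = true.copy()
pred[:18] *= -1

accuracy = float(np.mean(pred == true))
mse = mse_from_labels(true, pred)
print(accuracy, mse)
```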
For the clustering of two-dimensional optimal EEG features, the results of all real subjects are better than those of the competition champion, and the average MSE is also lower than that of the champion. For the artificially fitted subjects, the results are also better than those of the champion in the competition list.
As for the clustering of three-dimensional optimal EEG features, the artificially fitted data give excellent clustering results. For the experimental subjects B and G, however, the MSE of clustering is larger than that of the champion in the competition list. The MSE of clustering for subject G is larger, but the overall MSE of clustering is lower than that of the champion in the competition list. Judging from the MSE of all subjects, the results of the two-dimensional optimal feature clustering are more stable.
To study the EEG intention recognition method further, receiver operating characteristic (ROC) curves and the corresponding area under curve (AUC) values were plotted according to the cluster labels predicted by DRMM, as shown in Fig. 5. Except for subject G, the AUC values of the other three subjects are all greater than 0.96, and the AUC values are basically equal for the same subject. It is worth noting from Fig. 5b that the AUC value for the three-dimensional optimal features of subject G is only 0.74, whereas the AUC value for the two-dimensional optimal features reaches 0.98. As shown in Fig. 5, the two-dimensional optimal features perform better than the optimal three-dimensional features.
Fig. 5.
ROC curves of two-dimensional optimal features and three-dimensional optimal features when clustered by DRMM
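For reference, an ROC curve and its AUC can be computed from class labels and a continuous score as in the sketch below. The text does not say which score was swept to draw Fig. 5, so the scores here are hypothetical values (for example, a soft cluster membership), and the function names are illustrative. The sketch assumes that no two scores are tied.

```python
import numpy as np


def roc_curve_points(labels, scores):
    """False- and true-positive rates swept over all score thresholds."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = labels[order] == 1
    tpr = np.concatenate([[0.0], np.cumsum(hits) / hits.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(~hits) / (~hits).sum()])
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


# Hypothetical labels (1 = positive class) and membership scores.
labels = np.array([1, 1, 0, 1, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.3, 0.2])

fpr, tpr = roc_curve_points(labels, scores)
area = auc(fpr, tpr)
print(area)
```

In this toy example eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9.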
To consider the influence of the algorithm on clustering accuracy, this study compares the clustering results obtained by DRMM with those predicted by Fuzzy C-means (FCM) (Agrawal et al. 2019; Bezdek et al. 1984) and K-means (MacQueen 1967; Wang et al. 2019). The results are shown in Table 6, where the numbers in bold are the best results for each subject. The FCM clustering results show that increasing the dimension of the optimal features does not always have a positive effect on clustering accuracy. For example, for subjects A and G, the accuracies of the optimal three-dimensional features are slightly lower than those of the two-dimensional optimal features. For subjects D and E, increasing the dimension of the optimal features also reduces the clustering accuracy. The clustering results of subjects B, C and F show that increasing the optimal feature dimension has a positive effect on the clustering accuracy, but the effect is very weak. According to the clustering results obtained by K-means, increasing the dimension of the features helps to improve the accuracy of EEG intention recognition for the four real subjects. However, among these four subjects, the accuracy improved by at most 0.03, so increasing the feature dimension contributes little to recognition accuracy. For subjects D and E, extending the feature dimension does not improve the clustering accuracy; for subject D it even reduces it.
Table 6.
Comparison of the clustering accuracy of DRMM with FCM and K-means for optimal features
| Subjects | A | B | C | D | E | F | G | AVERAGE |
|---|---|---|---|---|---|---|---|---|
| DRMM_2* | 0.91 | **0.93** | 0.95 | 0.965 | **0.975** | 0.935 | **0.92** | 0.942 |
| Kmeans_2* | 0.875 | 0.85 | 0.855 | 0.96 | 0.965 | 0.805 | 0.66 | 0.852 |
| FCM_2* | 0.885 | 0.86 | 0.865 | 0.94 | 0.96 | 0.805 | 0.675 | 0.855 |
| DRMM_3* | **0.92** | 0.865 | **0.97** | **0.97** | **0.975** | **0.945** | 0.66 | 0.900 |
| Kmeans_3* | 0.895 | 0.88 | 0.955 | 0.945 | 0.965 | 0.825 | 0.665 | 0.875 |
| FCM_3* | 0.87 | 0.865 | 0.94 | 0.935 | 0.955 | 0.82 | 0.67 | 0.865 |
*DRMM_2 represents the result of DRMM clustering optimal two-dimensional features. Kmeans_2 represents the result of K-means clustering optimal two-dimensional features. FCM_2 represents the result of FCM clustering optimal two-dimensional features. DRMM_3 represents the result of DRMM clustering optimal three-dimensional features. Kmeans_3 represents the result of K-means clustering optimal three-dimensional features. FCM_3 represents the result of FCM clustering optimal three-dimensional features
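Because cluster indices are arbitrary, the accuracy of a clustering has to be computed up to a relabeling of the clusters. The sketch below shows a minimal K-means baseline together with a two-cluster accuracy that accounts for label swapping. It is an illustrative sketch on synthetic data, not the evaluation code used in this study; the function names are hypothetical.

```python
import numpy as np


def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns cluster labels and centers."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def two_cluster_accuracy(true_labels, cluster_labels):
    """Accuracy for two clusters, allowing the cluster indices to be swapped."""
    agree = float(np.mean(np.asarray(true_labels) == np.asarray(cluster_labels)))
    return max(agree, 1.0 - agree)


# Two well-separated synthetic clusters standing in for 2-D optimal features.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(-5, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
truth = np.array([0] * 100 + [1] * 100)

labels, _ = kmeans(x, k=2)
acc = two_cluster_accuracy(truth, labels)
print(acc)
```

For more than two clusters, the maximum over all label permutations would be needed instead of the simple swap used here.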
As shown in Table 6, for the two-dimensional optimal features, the gap between the accuracy of DRMM and that of FCM and K-means can reach 0.26. For the optimal three-dimensional features, except for subjects B and G, the clustering accuracy of DRMM is higher than that of FCM and K-means. Table 6 also shows that the accuracy of DRMM in identifying two-dimensional optimal features is at least 0.91, and the highest accuracy reaches 0.975. At the same time, DRMM has the most stable recognition accuracy for two-dimensional optimal features, with an average clustering accuracy of 0.942.
Discussion
In this study, an EEG feature extraction method based on spatial filtering is proposed. This method differs from previous studies that try to solve the over-fitting problem in spatial filter optimization. ODH-CSSD optimizes multidimensional EEG features through an N-dimensional optimal feature space. On the basis of reconstructing the optimal feature space, and on the premise of maximum resolution and separability, a criterion for multidimensional EEG features was established to optimize the EEG feature extraction method and improve feature separability. An EEG intention decoding method based on interpretable clustering is then used to cluster the optimal features obtained by the proposed feature extraction method with high accuracy. The orthogonal discriminant vectors corresponding to the extrema of the optimal discriminant criterion are taken as the optimal discriminant vectors. In this study, two (three) optimal discriminant vectors are selected to form a two-dimensional (three-dimensional) discriminant hyperplane. The multi-dimensional EEG features extracted by CSSD are then projected into the optimal feature space to obtain the optimal EEG features. The interpretable clustering model DRMM is investigated to identify the optimal features.
CSSD is widely used to extract the features of multi-channel EEG signals. Many researchers have proposed improved CSSD algorithms (Wu et al. 2008; Aghaei et al. 2015) in order to improve the separability of EEG features as much as possible. These studies motivated us to put forward a novel EEG feature extraction method. Multi-dimensional optimal spatial filtering is intended to optimize multi-dimensional EEG features so as to obtain feature information in more dimensions. To ensure that the discriminant vectors can span a multidimensional feature space, a constraint is introduced to make the N-th discriminant vector to be solved orthogonal to the N-1 known discriminant vectors. When setting the optimal criterion, the matrix A can be regarded as the weighting of the within-class scatter of the two clusters in the discriminant vector. In this study, clusters 1 and 2 are ideally weighted equally. Therefore, A = 0.5 W1 + 0.5 W2.
When the interpretable clustering model DRMM is used to identify the optimal features, the parameter a is a threshold for determining the probabilities of the two clusters. If the value of a is too small, the number of elements in the predicted cluster is larger than the number of items in the real cluster, which significantly reduces the clustering accuracy. If the value of a is too large, g(t) is not differentiable, and DRMM cannot predict the labels of the optimal features. In this study, αt = βt = 1, so that the decision rectangles can contain as many elements of the cluster as possible. If the two parameters take another value but their ratio is still 1, the clustering accuracy is not affected, but some elements of a cluster will fall outside the decision rectangle. In this case, the n-th sample is assigned to the k-th cluster by calculating the expected value of the soft clustering indicator according to Eq. (24).
| 24 |
where γnk is defined by Eq. (25) (Fu et al. 2021) and E denotes the expectation under the variational distribution q.
| 25 |
This study mainly examines the performance of DRMM in identifying the optimal EEG features in two-dimensional space. According to the accuracy of DRMM in identifying the optimal features, except for subjects B and G, the accuracy of the optimal three-dimensional feature clustering is slightly higher than or equal to that of the two-dimensional optimal feature clustering. The accuracy of optimal feature recognition in two dimensions is at least 91%, and reaches up to 97.5%. Except for the AUC value of the three-dimensional optimal features of subject G, the AUC values are higher than 0.96, which indicates that the EEG intention recognition method performs well.
It should be noted that, for the same subject, the performance gain of the optimal three-dimensional features over the optimal two-dimensional features is not large. According to the clustering results obtained by DRMM, among the four real subjects, the improvement in accuracy of the optimal three-dimensional features over the two-dimensional optimal features is at most 0.01. For the optimal feature clustering results obtained by FCM and K-means, the maximum improvement in accuracy of the optimal three-dimensional features over the two-dimensional optimal features is 0.03 among the four real subjects. Another point to note is that increasing the dimension of the features can increase the ambiguity of data discrimination, which reduces the clustering accuracy. On the one hand, in the clustering results of DRMM, the maximum difference between the clustering accuracy of the two-dimensional optimal features and that of the optimal three-dimensional features is 0.26 for subjects B and G. On the other hand, in the clustering results of FCM, the maximum difference between the accuracy of the two-dimensional optimal features and that of the optimal three-dimensional features reaches 0.05 for subjects A and G.
Comparing the results of the three clustering algorithms, the recognition accuracy of the interpretable clustering model DRMM is higher than that of the other two clustering algorithms. The average accuracy for the two-dimensional optimal features reaches 0.942. Besides, the effect of increasing the dimension of the optimal features on clustering accuracy differs across subjects, and this difference is caused by individual differences in EEG signals.
The overall clustering results show that the two-dimensional optimal EEG features already have great separability. Therefore, projecting the multi-dimensional EEG features into the three-dimensional optimal feature space does not significantly improve the clustering accuracy. When the dimension of the extracted optimal EEG features is four or higher, the clustering accuracy may be lower than that of the optimal three-dimensional features.
Based on the above analysis, the two-dimensional optimal EEG features have higher separability. Compared with the optimal three-dimensional features, they give more stable and better performance in single-trial motor imagery EEG intention recognition.
Conclusion
In this study, we propose a novel single-trial EEG intention recognition method for motor imagery. It is an EEG feature extraction method based on spatial-domain filtering. Different from previous studies that try to solve the over-fitting problem in spatial filter optimization, ODH-CSSD optimizes multidimensional EEG features through an N-dimensional optimal feature space on the basis of reconstruction and under the premise of maximum resolution and separability. The discriminant criterion of multidimensional EEG features was established to optimize the feature extraction method and improve the separability of EEG features. In addition, this study identifies the optimal EEG features using the interpretable clustering model DRMM. Compared with supervised algorithms, DRMM does not require training a model on labeled data, which dramatically shortens the training process for BCI applications. More importantly, compared with the FCM and K-means clustering algorithms, the decision rectangles obtained by DRMM can clearly explain the differences between clusters. Comparative experiments have shown that the EEG intention recognition method proposed in this study performs better. This method can provide an alternative solution for BCI applications.
Supplementary Information
Below is the link to the electronic supplementary material.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 62073282, 61973262, 61806174), the Central Guidance on Local Science and Technology Development Fund of Hebei Province (Grant No. 206Z0301G), the China Postdoctoral Science Foundation (Grant No. 2016M600193), and the Natural Science Foundation of Hebei Province (Grant No. E2018203433, F2020203070).
Author contribution
RF: Conceptualization, Methodology. DX: Data curation, Writing—original draft, Software, Visualization. WL: Writing—review & editing. PS: Software.
Data availability
The mat data used to support the findings of this study are currently under embargo while the research findings are commercialized.
Declarations
Conflicts of interest
The authors have no relevant financial or non-financial interests to disclose.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Rongrong Fu, Email: frr1102@aliyun.com.
Dong Xu, Email: xd26300@126.com.
Weishuai Li, Email: rookie_li@stumail.ysu.edu.cn.
Peiming Shi, Email: spm@ysu.edu.cn.
References
- Aghaei AS, Mahanta MS, Plataniotis KN. Separable common spatio-spectral patterns for motor imagery BCI systems. IEEE Trans Biomed Eng. 2015;63(1):15–29. doi: 10.1109/TBME.2015.2487738.
- Agrawal A, Tripathy BK. Efficiency analysis of hybrid fuzzy C-means clustering algorithms and their application to compute the severity of disease in plant leaves. Comput Rev J. 2019;3:156–169.
- Ahmed SRA, Al Barazanchi I, Jaaz ZA, Abdulshaheed HR. Clustering algorithms subjected to K-mean and gaussian mixture model on multidimensional data set. Period Eng Nat Sci. 2019;7(2):448–457.
- Barachant A (2014) MEG decoding using Riemannian geometry and unsupervised classification. Technical Report. [Online]. Available: http://alexandre.barachant.org/wpcontent/uploads/2014/08/documentation.pdf
- Bezdek JC, Ehrlich R, Full W. FCM: The fuzzy c-means clustering algorithm. Comput Geosci. 1984;10(2–3):191–203. doi: 10.1016/0098-3004(84)90020-7.
- Bhattacharyya A, Ranta R, Le Cam S, Louis-Dorr V, Tyvaert L, Colnat-Coulbois S, Pachori RB. A multi-channel approach for cortical stimulation artefact suppression in depth EEG signals using time-frequency and spatial filtering. IEEE Trans Biomed Eng. 2018;66(7):1915–1926. doi: 10.1109/TBME.2018.2881051.
- Birbaumer N, Rana A (2019) Brain–computer interfaces for communication in paralysis. In: Casting light on the dark side of brain imaging. pp. 25–29. Academic Press. doi: 10.1016/B978-0-12-816179-1.00003-7
- Bishop CM. Pattern recognition and machine learning, chapter 9, mixture models and EM. Berlin: Springer Science+Business Media; 2006.
- Blankertz B, Dornhege G, Krauledat M, Müller KR, Curio G. The non-invasive Berlin brain–computer interface: fast acquisition of effective performance in untrained subjects. Neuroimage. 2007;37(2):539–550. doi: 10.1016/j.neuroimage.2007.01.051.
- Chen JC, Wang H, Hua CC. Electroencephalography based fatigue detection using a novel feature fusion and extreme learning machine. Cogn Syst Res. 2018;52:715–728. doi: 10.1016/j.cogsys.2018.08.018.
- Chen JX, Chang YL, Hobbs B, Castaldi P, Cho M, Silverman E, Dy J (2016) Interpretable clustering via discriminative rectangle mixture model. In: 2016 IEEE 16th International Conference on Data Mining (ICDM) (pp. 823–828). IEEE. https://ieeexplore.ieee.org/abstract/document/7837910
- Chen JX, Jiang DM, Zhang YN. A common spatial pattern and wavelet packet decomposition combined method for EEG-based emotion recognition. J Adv Comput Intell Intell Inform. 2019;23(2):274–281. doi: 10.20965/jaciii.2019.p0274.
- Cui X, Zhang J, Wang R. Identification of mental workload using imbalanced EEG data and DySMOTE-based neural network approach. IFAC-PapersOnLine. 2016;49(19):567–572. doi: 10.1016/j.ifacol.2016.10.627.
- Foley DH, Sammon JW. An optimal set of discriminant vectors. IEEE Trans Comput. 1975;100(3):281–289. doi: 10.1109/T-C.1975.224208.
- Fraiman R, Ghattas B, Svarc M. Interpretable clustering using unsupervised binary trees. Adv Data Anal Classif. 2013;7(2):125–145. doi: 10.1007/s11634-013-0129-3.
- Fraley C, Raftery AE. Model-based clustering, discriminant analysis, and density estimation. J Am Stat Ass. 2002;97(458):611–631. doi: 10.1198/016214502760047131.
- Fu R, Li W, Chen J, Han M. Recognizing single-trial motor imagery eeg based on interpretable clustering method. Biomed Signal Process Control. 2021;63:102171. doi: 10.1016/j.bspc.2020.102171.
- Fu YF, Xiong X, Jiang CH, Xu BL, Li YC, Li HY. Imagined hand clenching force and speed modulate brain activity and are classified by NIRS combined with EEG. IEEE Trans Neural Syst Rehabil Eng. 2016;25(9):1641–1652. doi: 10.1109/TNSRE.2016.2627809.
- Fu YF, Chen J, Xiong X (2018) Calculation and analysis of microstate related to variation in executed and imagined movement of force of hand clenching. Computational Intelligence and Neuroscience, 2018. https://www.hindawi.com/journals/cin/2018/9270685/
- Gaurav G, Anand RS, Kumar V. Eeg based cognitive task classification using multifractal detrended fluctuation analysis. Cogn Neurodyn. 2021. doi: 10.1007/s11571-021-09684-z.
- Hua CC, Wang H, Wang H, Lu SW, Liu C, Khalid SM. A novel method of building functional brain network using deep learning algorithm with application in proficiency detection. Int J Neural Syst. 2019;29(01):1850015. doi: 10.1142/S0129065718500156.
- Jiang AM, Wang Q, Shang J, Liu XF (2018) Sparse common spatial pattern for EEG channel reduction in brain-computer interfaces. In: 2018 IEEE 23rd international conference on digital signal processing (DSP) (pp. 1–4). IEEE. https://ieeexplore.ieee.org/abstract/document/8631618
- Kim B, Shah JA, Doshi-Velez F (2015) Mind the gap: a generative approach to interpretable feature selection and extraction. In: Advances in neural information processing systems. pp. 2260–2268. https://dspace.mit.edu/handle/1721.1/109373
- Kshirsagar GB, Londhe ND. Improving performance of Devanagari script input-based P300 speller using deep learning. IEEE Trans Biomed Eng. 2018;66(11):2992–3005. doi: 10.1109/TBME.2018.2875024.
- Li Y, Gao XR, Liu HS, Gao SK. Classification of single-trial electroencephalogram during finger movement. IEEE Trans Biomed Eng. 2004;51(6):1019–1025. doi: 10.1109/TBME.2004.826688.
- Liu B, Xia YY, Yu PS (2000) Clustering through decision tree construction. In: Proceedings of the ninth international conference on information and knowledge management (pp. 20–29). ACM. doi: 10.1145/354756.354775
- MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley symposium on mathematical statistics and probability. Vol. 1, No. 14, pp. 281–297
- Meng JJ, Yao L, Sheng XJ, Zhang DG, Zhu XY. Simultaneously optimizing spatial spectral features based on mutual information for EEG classification. IEEE Trans Biomed Eng. 2014;62(1):227–240. doi: 10.1109/TBME.2014.2345458.
- Mishchenko Y, Kaya M, Ozbay E, Yanar H. Developing a three-to six-state EEG-based brain-computer interface for a virtual robotic manipulator control. IEEE Trans Biomed Eng. 2018;66(4):977–987. doi: 10.1109/TBME.2018.2865941.
- Mishuhina V, Jiang XD. Feature weighting and regularization of common spatial patterns in EEG-based motor imagery BCI. IEEE Signal Process Lett. 2018;25(6):783–787. doi: 10.1109/LSP.2018.2823683.
- Perdikis S, Bourban F, Rouanne V, Millán JDR, Leeb R (2018) Effects of data sample dependence on the evaluation of BCI performance (No. CONF). BCI Society. https://infoscience.epfl.ch/record/254834/files/BCIMeeting18_PerdikisMM_CV.pdf
- Reddy TK, Arora V, Behera L, Wang YK, Lin CT (2019) Multi-class fuzzy time-delay common spatio-spectral patterns with fuzzy information theoretic optimization for EEG based regression problems in brain computer interface (BCI). IEEE Transactions on Fuzzy Systems. https://ieeexplore.ieee.org/abstract/document/8611122
- Roy RN, Bonnet S, Charbonnier S, Jallon P, Campagne A (2015) A comparison of ERP spatial filtering methods for optimal mental workload estimation. In: 2015 37th annual international conference of the IEEE engineering in medicine and biology society (EMBC) (pp. 7254–7257). IEEE. https://ieeexplore.ieee.org/abstract/document/7320066
- Sun HW, Fu YF, Xiong X, Yang J, Liu CW, Yu ZT. Identification of EEG induced by motor imagery based on hilbert-huang transform. Acta Automatica Sinica. 2015;41(9):1686–1692.
- Wang S, Gittens A, Mahoney MW. Scalable kernel K-means clustering with Nyström approximation: relative-error bounds. J Mach Learn Res. 2019;20(1):431–479.
- Wu W, Gao XR, Hong B, Gao SK. Classifying single-trial EEG during motor imagery by iterative spatio-spectral patterns learning (ISSPL). IEEE Trans Biomed Eng. 2008;55(6):1733–1743. doi: 10.1109/TBME.2008.919125.
- Wu W, Chen Z, Gao XR, Li YQ, Brown EN, Gao SK. Probabilistic common spatial patterns for multichannel EEG analysis. IEEE Trans Pattern Anal Mach Intell. 2014;37(3):639–653. doi: 10.1109/TPAMI.2014.2330598.
- Wu DR, King JT, Chuang CH, Lin CT, Jung TP. Spatial filtering for EEG-based regression problems in brain–computer interface (BCI). IEEE Trans Fuzzy Syst. 2017;26(2):771–781. doi: 10.1109/TFUZZ.2017.2688423.
- Yamada H, Inokawa H, Hori Y, Pan X, Matsuzaki R, Nakamura K, et al. Characteristics of fast-spiking neurons in the striatum of behaving monkeys. Neurosci Res. 2016;105:2–18. doi: 10.1016/j.neures.2015.10.003.
- Zeng T, Tang F, Ji D, Si B. Neurobayesslam: neurobiologically inspired bayesian integration of multisensory information for robot navigation. Neural Netw. 2020;126:21–35. doi: 10.1016/j.neunet.2020.02.023.
- Zhang C, Tong L, Zeng Y, Jiang JF, Bu HB, Yan B, Li J. Automatic artifact removal from electroencephalogram data based on a priori artifact information. Biomed Res Int. 2015. doi: 10.1155/2015/720450.
- Zhang C, Sun L, Cong F, Kujala T, Ristaniemi T, Parviainen T. Optimal imaging of multi-channel EEG features based on a novel clustering technique for driver fatigue detection. Biomed Signal Process Control. 2020;62:102103. doi: 10.1016/j.bspc.2020.102103.
- Zhang C, Wang H, Wu MH. EEG-based expert system using complexity measures and probability density function control in alpha sub-band. Integr Comput-aided Eng. 2013;20(4):391–405. doi: 10.3233/ICA-130439.