Abstract
In visual-imagery-based brain–computer interfaces (VI-BCIs), the limited variety of imagination tasks and the insufficient description of feature information seriously hinder the development and application of VI-BCI technology in the field of restoring communication. In this paper, we design and optimize a multi-character classification scheme based on electroencephalogram (EEG) signals of visual imagery (VI), which classifies 29 characters: 26 lowercase English letters and three punctuation marks. First, a new paradigm that presents characters randomly and includes a preparation stage is designed to acquire EEG signals and construct a multi-character dataset, eliminating interference between successive VI tasks. Second, tensor data are obtained by the Morlet wavelet transform, and a tensor-based feature extraction algorithm, uncorrelated multilinear principal component analysis (UMPCA), is used to extract high-quality features. Finally, three classifiers, namely support vector machine, K-nearest neighbor, and extreme learning machine, are employed for multi-character classification, and their results are compared. The experimental results demonstrate that the proposed scheme effectively extracts character features with minimal redundancy, weak correlation, and strong representation capability, achieving an average classification accuracy of 97.59% for the 29 characters and surpassing existing studies in both accuracy and number of classes. The present study designs a new paradigm for acquiring EEG signals of VI and combines the Morlet wavelet transform with the UMPCA algorithm to extract character features, enabling multi-character classification with various classifiers. This research paves a novel pathway toward direct brain-to-world communication.
Keywords: Brain–computer interface, Multi-character classification, EEG, Visual imagery, Morlet wavelet
Introduction
Brain–computer interface (BCI) is a technology that transforms the neural activity of the brain into the user's intended output or mental activity, enabling direct communication and control between the brain and the external world without relying on peripheral nerves and muscle tissue (Vinay et al. 2020; Jelena et al. 2021). As the primary source of sensory input for human beings, visual information plays a crucial role in the development of BCI technology within the realm of communication restoration. Previous studies (Daniel et al. 2015; Kano 2019; Tang et al. 2020) have primarily focused on enabling patients to achieve basic communication through BCI systems using visual stimulus-related paradigms. However, this approach necessitates prolonged visual stimulation, which can cause subject fatigue and compromise the accuracy and reliability of the system in practical use. By contrast, if electroencephalogram (EEG) signals can be obtained directly from imagined images, it would enable real-time control of the BCI system, significantly reduce training requirements, and further enhance brain-to-text communication performance (Lee et al. 2020). The emergence of the visual-imagery-based BCI (VI-BCI) system presents new opportunities for development and potential applications in restoring communication. As a new intuitive paradigm within BCI technology, visual imagery (VI) is essentially the manipulation of visual information in memory: the subject imagines tasks independently rather than receiving information from external stimuli (Markus et al. 2000). Owing to its reduced training requirements and wider range of imagination tasks, VI-BCI has garnered increasing research attention. Current work focuses on signal acquisition, pre-processing, feature extraction, and classification.
Feature extraction, as a crucial component of the VI-BCI system, projects data from high to low dimensions, eliminates redundant information, and transforms the data into optimal feature vectors. The reliability and stability of feature extraction significantly affect subsequent classification outcomes. In VI-BCI research, Lee et al. employed the common spatial pattern (CSP) to extract EEG features corresponding to words/phrases commonly used in communication (e.g., ambulance, clock) (Lee et al. 2020). Llorella et al. developed a framework based on convolutional neural networks and the black hole algorithm for classifying 12 types of VI tasks (e.g., apples, cars) (Llorella et al. 2023). However, owing to task complexity and the neglect of frequency-domain information, average accuracy was relatively low in these studies. Koji et al. collected EEG signals for six Japanese words using VI-based methods and extracted power spectral density as features through the Welch periodogram method (Koji et al. 2018). Fu et al. gathered EEG signals related to imagining static or moving five-pointed stars and utilized classical modal decomposition combined with an autoregressive model for feature extraction (Fu et al. 2021). Koji et al. (2018) and Fu et al. (2021) analyzed EEG signals in the time or frequency domain only; however, EEG signals are non-stationary with time-varying spectral characteristics, so analyzing them in a single domain may be insufficient (Boashash 2015).
Researchers have therefore conducted time-frequency analysis of EEG signals, which holds significant importance in data processing (Altan and Karasu 2020; Yağ and Altan 2022). Lee et al. employed event-related spectral perturbation (ERSP) and the Choi-Williams time-frequency distribution to process EEG signals from three image categories (object, digit, shape) (Lee et al. 2022). Bang et al. utilized ERSP for time-frequency analysis of EEG signals of six different color plane patterns (Bang et al. 2021). However, the feature extraction methods used in Lee et al. (2022) and Bang et al. (2021) transform EEG signals into one-dimensional objects for processing, which destroys the original structure and correlations of the EEG signals and ignores the internal relations between multi-channel recordings. To obtain high-quality features with low redundancy, weak correlation, and strong representation ability, in this paper we apply uncorrelated multilinear principal component analysis (UMPCA) to extract EEG features after performing wavelet-transform-based time-frequency analysis. UMPCA is a tensor-based feature extraction algorithm that can project tensors onto vectors without vectorizing the multi-channel EEG signals (Lu et al. 2009a, b).
The appropriate selection of classifiers is crucial for achieving good classification results on the extracted features. In the VI-BCI field, the support vector machine (SVM) has been employed to classify six Japanese words and static/moving five-pointed stars, with average classification accuracies of 84.6% and 78.4% respectively (Koji et al. 2018; Fu et al. 2021). Additionally, Kumar et al. conducted VI experiments involving ten numbers, letters, or object images and reported an average accuracy of 85.20% using random forest technology (Kumar et al. 2018). A neural network was utilized to classify five VI tasks, including tree, dog, plane, house, and the relaxed state, with an average accuracy of 60.5% (Fabio et al. 2020). Furthermore, an improved convolutional neural network achieved an average classification accuracy of 46.43% for three categories encompassing nine tasks, with a maximum of 70.38% (Lee et al. 2022). SVM generalizes well, handles high-dimensional nonlinear data, and is suitable for small samples. Neural networks (Ko et al. 2021; Zhang et al. 2022) can handle irrelevant features and are robust against noise, but they require substantial training data and are prone to overfitting. Other common classification algorithms include the K-nearest neighbor (KNN) (Alom and Islam 2020) and the extreme learning machine (ELM) (Juneja 2019; Zhang et al. 2020). KNN is particularly suitable for multi-class problems owing to its low training-time complexity, simplicity of use, and capability for nonlinear classification. ELM requires no iterative training of the hidden layer and offers fast learning speed and excellent generalization performance with high learning accuracy. In this paper, we utilize three different classifiers (SVM, KNN, ELM) to perform multi-character classification based on VI.
To sum up, in this paper we design and optimize a multi-character classification scheme based on EEG signals of VI. The overall structure is shown in Fig. 1. First, the subjects' EEG signals are acquired according to the designed experimental paradigm and electrode layout to construct the offline raw EEG dataset, which then undergoes pre-processing. Second, the Morlet wavelet transform is used to conduct time-frequency analysis of the pre-processed EEG signals, encapsulating them in tensor form (channels × frequencies × time points), and UMPCA is utilized for tensor-to-vector projection to extract high-quality features. Finally, the character features are fed into three classifiers, SVM, KNN, and ELM, for model training or testing to obtain character labels. The experimental results show that the proposed scheme achieves superior performance in multi-character classification.
Fig. 1.
The overall scheme structure (C denotes the number of electrodes, T the number of sample points, and F the number of frequency points)
The scheme presented in this paper builds on a more intuitive and simplified paradigm to construct a comprehensive multi-character classification pipeline. Taking the time-frequency information of EEG signals into consideration, the scheme utilizes a tensor algorithm to extract uncorrelated multi-character features and investigates the feasibility of classifying 29 VI tasks, comprising 26 lowercase English letters and three punctuation marks. The findings of this study not only explore the potential of VI-BCI in restoring communication, but also provide a novel approach to achieving real-time online text communication, which holds significant implications for the practical application of BCI.
Dataset construction and pre-processing
In this section, we rationally design the experimental paradigm and electrode layout to acquire effective EEG signals, and implement pre-processing techniques to enhance the signal-to-noise ratio and reliability of the signals.
Experimental paradigm
To eliminate the subjective influence arising from continuous imagination and character sequence imagination, we design an experimental paradigm comprising a preparation stage and random prompt stage of characters. The experimental paradigm is structured into five stages, as shown in Fig. 2, which can be described as follows:
Start stage (0 ~ 3 s): A three-second countdown appears on the computer screen, reminding the subjects to start the trial.
Preparation stage (3 ~ 4 s): A fixation cross is displayed on the screen, which indicates that the subject’s VI task is about to begin, and asks the subject to concentrate, keep breathing steady, and let EEG signals return to the baseline level to avoid EEG signals being disturbed by blinking or head movement.
Character prompt stage (4 ~ 6 s): A character appears randomly on the screen as a prompt for the VI task. Subjects are asked to observe the shape of this character.
Visual imagery stage (6 ~ 8 s): The screen turns black, and the subjects are asked to imagine the character prompted before and present it in the brain in the form of a picture.
Relaxation stage (8 ~ 10 s): The subjects take a short rest and are asked not to do any VI tasks to avoid fatigue.
Considering the integrity and legibility of text communication, the characters used for VI tasks in this paper include “,”, “.”, “~” and “a” to “z”, with a total of 29 characters, where “,” represents comma, “.” represents period and “~” represents space. In each trial, each character is presented in the form of a picture on the screen. Notably, 29 characters are randomly displayed without repetition.
Fig. 2.
Experimental paradigm of visual imagery
Electrode layout
Although the current research does not provide a definitive understanding of the specific brain area associated with VI tasks, according to the research and analysis of Koji et al. (2018); Esposito et al. (1997); Kwon et al. (2022); Nataliya et al. (2018); Sousa et al. (2017); Babo et al. (2019); Winlove et al. (2018), in this paper we adopt an electrode layout based on the 10–10 international electrode placement system, as shown in Fig. 3, to effectively record EEG signals related to VI. The electrodes are strategically distributed across various areas of the brain: occipital lobe (PO9, PO7, PO3, O1, POz, Oz, O2, PO4, PO8, PO10), prefrontal lobe (AFz, Fpz, AF7, F7, F5, AF8, F6, F8), temporal lobe (T7, TP7, P7, P8, TP8, T8), parietal lobe (P1, P3, Pz, P4, P2), the motor cortex (Cz, C3, C4), and the earlobe is equipped with two reference electrodes (CMS, DRL).
Fig. 3.

Electrode layout based on the 10–10 international electrode placement system
Data acquisition
In this paper, six healthy subjects (4 females and 2 males, average age 24 years) are recruited to acquire EEG signals using EMOTIV EPOC Flex equipment with a sampling frequency of 128 Hz; the subjects are labeled S1–S6. Prior to the experiment, subjects are required to have sufficient sleep and clean scalps. During the experiment, subjects sit in a comfortable chair facing a computer screen in a quiet room, minimizing movement and blinking to reduce artifacts. In each trial, subjects imagine characters following the experimental paradigm designed in Fig. 2. Each subject completes 10 groups of trials with 10 epochs per character, resulting in a total of 290 epochs. The constructed dataset is represented as 6 × 290 × C × T, where C denotes the number of electrodes and T the number of sample points.
Pre-processing
The EEG signals are pre-processed using Matlab 2020. Following pre-processing, the EEG signal of each character is represented as a second-order tensor X ∈ R^(C×T). The pre-processing procedure in this paper is described as follows:
Filtering: Following Filip and Tom (2020); Christa et al. (2005); Ehsan (2012); Lee et al. (2020); Koji et al. (2018), an FIR bandpass filter is used to filter the raw EEG signals to the 4–60 Hz band to eliminate noise, and a 50 Hz notch filter is used to avoid power-line interference.
Removing artifacts: Independent component analysis (ICA) (Srivastava et al. 2005) is used to remove ocular (EOG), cardiac (ECG), and muscular (EMG) artifacts.
Getting epochs: Each trial is marked with its corresponding label, and the two seconds following the label are taken as the subject's imagined-character signal. Each character constitutes an epoch; each subject's EEG signals are divided into 29 parts, each part corresponding to one character.
Time alignment: Time-alignment technology (Williams et al. 2020) is used to align the EEG signals, eliminating the timing differences across imagery tasks caused by changes in behavioral or cognitive state.
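The filtering and epoching steps above can be sketched in SciPy as follows. This is a minimal illustration, not the authors' exact implementation: the 4–60 Hz pass band, the filter order, and the function names are illustrative assumptions.

```python
import numpy as np
from scipy.signal import firwin, filtfilt, iirnotch

FS = 128  # sampling frequency (Hz) used in this study


def preprocess(raw, fs=FS):
    """Band-pass and notch filter multi-channel EEG (channels x samples)."""
    # FIR band-pass; pass band and numtaps are illustrative choices
    bp = firwin(numtaps=101, cutoff=[4, 60], pass_zero=False, fs=fs)
    x = filtfilt(bp, [1.0], raw, axis=-1)
    # 50 Hz notch to suppress power-line interference
    b, a = iirnotch(w0=50, Q=30, fs=fs)
    return filtfilt(b, a, x, axis=-1)


def epoch(signal, onset_s, dur_s=2.0, fs=FS):
    """Cut the 2 s imagery segment starting at the task label."""
    start = int(onset_s * fs)
    return signal[:, start:start + int(dur_s * fs)]
```

Zero-phase filtering with `filtfilt` avoids introducing time shifts, which matters for the later time-alignment step.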
Feature extraction
In this section, we introduce two feature extraction approaches. One uses wavelet packet decomposition to process the signals and extracts fuzzy entropy (FE) (Li et al. 2020) as character features; the other transforms the pre-processed signals into time-frequency data using the Morlet wavelet transform and then applies the UMPCA algorithm to project the time-frequency data onto a low-dimensional vector space, yielding low-dimensional feature vectors.
Wavelet packet decomposition and fuzzy entropy
Wavelet packet decomposition (WPD) (Li et al. 2022; Pan et al. 2022) has arbitrary multi-scale characteristics and can improve time-frequency resolution by decomposing both low-frequency and high-frequency signals. Different decomposed signals can be obtained by choosing different wavelet bases. FE is a statistical measure characterizing the complexity of a signal sequence, with good anti-interference and anti-noise ability. FE can be described as:
FE(z, r, v) = lim_{N→∞} [ln φ^z(r, v) − ln φ^{z+1}(r, v)] | 1 |
where φ^z is the average of the fuzzy similarity function over all template pairs of embedding dimension z, and z, r, and v represent the embedding dimension, the width of the fuzzy function boundary, and the gradient of the similar-tolerance boundary, respectively.
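A compact sketch of fuzzy entropy with the exponential similarity function, using as defaults the parameters reported later in the paper (z = 2, r = 0.25, v = 2); scaling r by the signal's standard deviation is a common convention and an assumption here, not something the paper states.

```python
import numpy as np


def fuzzy_entropy(x, z=2, r=0.25, v=2):
    """Fuzzy entropy of a 1-D sequence: z embedding dimension, r boundary
    width, v boundary gradient, similarity exp(-(d**v) / r)."""
    x = np.asarray(x, dtype=float)
    r = r * np.std(x)  # scale tolerance by signal spread (assumption)

    def phi(m):
        n = len(x) - m
        # delayed embedding; each template is made zero-mean
        emb = np.array([x[i:i + m] for i in range(n)])
        emb -= emb.mean(axis=1, keepdims=True)
        # Chebyshev distance between every pair of templates
        d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=-1)
        sim = np.exp(-(d ** v) / r)
        # average pairwise similarity, excluding self-matches
        return (sim.sum() - n) / (n * (n - 1))

    return np.log(phi(z)) - np.log(phi(z + 1))
```

A regular signal (e.g., a sine wave) yields a lower FE than white noise, matching FE's role as a complexity measure.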
Uncorrelated multilinear principal component analysis
Wavelet transform (WT) can effectively extract the time-frequency information of a signal and is a powerful tool for processing non-stationary signals. The Morlet wavelet transform offers time-frequency locality, multi-resolution analysis, and real-time capability. In this paper, we use the Morlet wavelet transform to transform X ∈ R^(C×T) into an Hth-order tensor A ∈ R^(C×F×T), where H is 3 and F is the number of frequency points, set to 256. The Morlet wavelet function is as follows:
ψ(t, f) = exp(−t² / (2σ²)) exp(i2πft) | 2 |
where i is the imaginary unit, f is the frequency in Hz, t is the time in seconds, and σ is the width of the Gaussian envelope.
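The transform of Eq. (2), applied channel by channel, can be sketched as below: each channel is convolved with a bank of Morlet wavelets and the magnitudes are stacked into the (channels × frequencies × time) tensor. The Gaussian width `sigma` and the amplitude normalisation are illustrative assumptions.

```python
import numpy as np

FS = 128  # sampling frequency (Hz)


def morlet_tf(signal, freqs, fs=FS, sigma=0.1):
    """Morlet time-frequency transform of one channel.
    Returns an array of shape (len(freqs), len(signal))."""
    t = np.arange(-1, 1, 1 / fs)  # 2 s wavelet support
    out = np.empty((len(freqs), len(signal)), dtype=complex)
    for k, f in enumerate(freqs):
        # psi(t, f) = exp(-t^2 / (2 sigma^2)) * exp(i 2 pi f t)
        wavelet = np.exp(-t**2 / (2 * sigma**2)) * np.exp(2j * np.pi * f * t)
        wavelet /= np.abs(wavelet).sum()  # crude amplitude normalisation
        out[k] = np.convolve(signal, wavelet, mode="same")
    return out


def eeg_to_tensor(epoch, freqs, fs=FS):
    """Stack per-channel transforms into a (channels x freqs x time) tensor."""
    return np.abs(np.stack([morlet_tf(ch, freqs, fs) for ch in epoch]))
```

For a pure 10 Hz tone, the mean power across time peaks at the frequency bin nearest 10 Hz, which is a quick sanity check of the transform.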
The UMPCA algorithm is used to process the tensor data representing the time-frequency information. UMPCA is a tensor-based approach that leverages tensor-to-vector projection (TVP) to directly extract discriminative features from tensor data. By avoiding data vectorization, it preserves the structural information and correlations inherent in the original data, ensuring feature vectors with minimal redundancy. During feature extraction, UMPCA identifies important directions in the data by selecting eigenvectors associated with the largest eigenvalues (Lu et al. 2009a, b). The discriminant features {y_1, y_2, ..., y_P} of Nth-order tensor samples X_m ∈ R^(I_1×...×I_N), m = 1, ..., M, can be calculated using UMPCA, where M represents the number of samples, N the order of the tensor, I_n the n-mode dimension of the tensor, and P the number of elementary multilinear projections (EMPs) in the TVP. UMPCA is a variant of PCA whose core idea is to project along the directions of maximum variance, so the discriminant features are obtained by constructing the objective function defined as:
{u_p^(n), n = 1, ..., N} = arg max Σ_{m=1}^{M} (y_{m,p} − ȳ_p)² | 3a |
subject to u_p^(n)T u_p^(n) = 1 and (g_p^T g_q) / (‖g_p‖ ‖g_q‖) = δ_{pq}, p, q = 1, ..., P | 3b |
where y_{m,p} represents the projection of sample X_m through the pth EMP, g_p = [y_{1,p}, y_{2,p}, ..., y_{M,p}]^T represents the pth coordinate vector, g_q represents the qth coordinate vector, and the Kronecker delta δ_{pq} is defined as:
δ_{pq} = {1, p = q; 0, p ≠ q} | 4 |
In this paper, the EEG signals are decomposed into five layers using the Db4 wavelet basis, and the fuzzy entropy of each frequency band (4–48 Hz) in the fifth layer is extracted as the feature, with z, r, and v set to 2, 0.25, and 2 respectively. The UMPCA algorithm is used to extract the character features with P set to 50. All processes are implemented in Python.
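The building block of UMPCA's tensor-to-vector projection, a single elementary multilinear projection, can be illustrated with NumPy. This toy sketch only shows the projection itself; the full UMPCA algorithm additionally optimises the vectors u_p^(n) for maximum variance under the zero-correlation constraint of Eq. (3b), which is omitted here.

```python
import numpy as np


def emp_project(tensors, u1, u2, u3):
    """Project each (I1 x I2 x I3) tensor sample to a scalar through one
    elementary multilinear projection (u1, u2, u3): the mode-n products
    y_m = X_m x1 u1^T x2 u2^T x3 u3^T. A TVP with P EMPs simply stacks
    P such scalars into a P-dimensional feature vector."""
    return np.einsum("mijk,i,j,k->m", tensors, u1, u2, u3)


rng = np.random.default_rng(0)
samples = rng.standard_normal((10, 5, 6, 7))  # M=10 toy tensor samples
u1, u2, u3 = (v / np.linalg.norm(v) for v in
              (rng.standard_normal(5), rng.standard_normal(6),
               rng.standard_normal(7)))
g = emp_project(samples, u1, u2, u3)  # one coordinate vector g_p
```

Stacking P such coordinate values per sample yields the P-dimensional UMPCA feature vector used for classification.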
Multi-character classification
To enhance the efficacy of multi-character classification, in this section we use three classifiers suited to multi-class tasks: SVM, KNN, and ELM.
Support vector machine
The core idea of SVM is to achieve accurate dataset classification by constructing a separating hyperplane that maximizes the geometric margin. In this study, the one-vs-one (OVO) learning strategy is employed for multi-class classification. All pairs of classes are split into positive and negative sets, and a total of n(n − 1)/2 binary classifiers are generated, where n refers to the total number of classes (Lee et al. 2022). The radial basis function (RBF) is selected as the kernel function of the SVM. For sample data b and c of different characters, the RBF can be described as:
K(b, c) = exp(−γ‖b − c‖²) | 5 |
where γ > 0 is a hyperparameter controlling the kernel width.
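With scikit-learn, the OVO RBF-kernel SVM described above can be sketched as follows. The random matrix is merely a stand-in for the P = 50 UMPCA feature vectors (290 epochs, 29 classes); for n = 29 classes, OVO builds 29 · 28 / 2 = 406 pairwise classifiers.

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-in for one subject's normalized character features
rng = np.random.default_rng(1)
X = rng.random((290, 50))            # 290 epochs x 50 UMPCA features
y = np.repeat(np.arange(29), 10)     # 29 characters, 10 epochs each

# RBF kernel; 'ovo' exposes the 406 pairwise decision values directly
clf = SVC(kernel="rbf", gamma="scale", decision_function_shape="ovo")
clf.fit(X, y)
```

`gamma="scale"` sets γ = 1 / (n_features · Var(X)), a reasonable default when γ has not been tuned.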
K-nearest neighbor
The core idea of the KNN algorithm is that if the majority of the K nearest samples in feature space belong to a specific category, then the sample under consideration also belongs to that category. In this paper, the parameter K of KNN is set to 3, and the Euclidean distance is used as the distance metric. The Euclidean distance between sample data b and c is defined as:
d(b, c) = √(Σ_{l=1}^{g} (b_l − c_l)²) | 6 |
where g represents the dimension of the feature space.
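A minimal scikit-learn sketch of the KNN setup above (k = 3, Euclidean metric); the three-class blob data is a toy stand-in for the character feature vectors.

```python
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Well-separated toy clusters standing in for character features
X, y = make_blobs(n_samples=150,
                  centers=[[0, 0], [10, 10], [0, 10]],
                  cluster_std=1.0, random_state=0)

# k = 3 with the Euclidean distance of Eq. (6)
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean").fit(X, y)
```

Because KNN stores the training set and defers all work to query time, its training cost is negligible, which is the low training-time complexity noted earlier.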
Extreme learning machine
The core idea of ELM lies in the random initialization of the connection weights and biases between the input layer and the hidden layer, with the hidden-to-output weights solved analytically, thereby facilitating rapid training and enabling high-precision models. Assume training samples (x_i, t_i), i = 1, ..., Q, where x_i is an n-dimensional feature vector and t_i is the true label drawn from m classes. The output of an ELM with L hidden neurons and activation function G(·) can be formulated as follows:
o_i = Σ_{j=1}^{L} β_j G(w_j · x_i + b_j), i = 1, ..., Q | 7 |
where w_j is the weight vector between the input neurons and the jth hidden neuron, β_j is the weight vector between the jth hidden neuron and the output neurons, o_i is the output vector of the network, and b_j is the bias of the jth hidden neuron.
In this paper, the probability density function used to assign the input weights and biases is a uniform distribution, the number of hidden neurons L is set to 2400, G(·) is the Sigmoid function, and the fitting error between the output layer and the training data is calculated using the least-squares method (LSM) (Min et al. 2016).
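A minimal NumPy ELM matching the description above: uniformly random input weights and biases, a sigmoid hidden layer, and output weights obtained in closed form by least squares (the LSM step, here via the pseudo-inverse). The toy size of 100 hidden neurons, versus 2400 in the paper, keeps the sketch small.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ELM:
    """Single-hidden-layer ELM: random W, b; beta solved by least squares."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        T = np.eye(n_classes)[y]                 # one-hot targets
        self.W = self.rng.uniform(-1, 1, (X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1, 1, self.n_hidden)
        H = sigmoid(X @ self.W + self.b)         # hidden-layer outputs
        self.beta = np.linalg.pinv(H) @ T        # closed-form LSM solution
        return self

    def predict(self, X):
        return np.argmax(sigmoid(X @ self.W + self.b) @ self.beta, axis=1)
```

No gradient iterations are needed, which is the fast-learning property cited earlier for ELM.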
In this paper, min-max normalization is used to map the character feature vectors into the range [0, 1]; the normalized feature vectors, together with their corresponding labels, serve as input for training and evaluating the classification models implemented in Python. To estimate the generalization ability of the classification models more accurately and obtain reliable evaluation results, we adopt a K-fold cross-validation strategy over the trials to obtain average classification results. K is set to 10, segmenting each subject's EEG data into ten distinct groups: nine groups are used for training the classification models, while the remaining group is reserved for testing. By iterating ten times over the data, we obtain an average result across iterations, representing the multi-character classification outcome for each subject.
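The normalization and 10-fold evaluation protocol can be sketched with a scikit-learn pipeline. `StratifiedKFold` is used here as a stand-in for the paper's group-wise splitting, the scaler is fitted inside each training fold to avoid leakage, and the random features (so the scores are near chance) merely stand in for real UMPCA features.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Toy features standing in for one subject's 290 character epochs
rng = np.random.default_rng(0)
X = rng.random((290, 50))
y = np.repeat(np.arange(29), 10)

# Min-max scaling to [0, 1] + RBF SVM, evaluated by 10-fold CV
model = make_pipeline(MinMaxScaler(), SVC(kernel="rbf"))
scores = cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=10))
```

With 10 epochs per character, stratified 10-fold CV places exactly one epoch of each character in every test fold.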
Experimental results and analysis
The results and analysis of feature extraction
To investigate the impact of vectorization on the extraction of high-quality features, in this paper we use the maximum information coefficient (MIC) method (Sun et al. 2017) to assess the correlation between features extracted by WPD or UMPCA. MIC, calculated mainly through mutual information and grid division, detects nonlinear correlation between variables and ranges from 0 to 1, with higher values indicating stronger correlation. The MIC of a two-dimensional finite set D is defined as:
MIC(D) = max_{xy < B(n)} M(D)_{x,y} = max_{xy < B(n)} { I*(D, x, y) / log₂ min(x, y) } | 8 |
where B(n) is the upper limit of grid division and M(D) is the characteristic matrix composed of the maximum normalized mutual information values I*(D, x, y) over all x-by-y grids.
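A deliberately simplified MIC sketch follows: it searches only equipartition (quantile) grids with x·y ≤ B(n) = n^0.6 and normalises each grid's mutual information by log₂ min(x, y). The true MIC estimator also optimises non-uniform grid boundaries, so this is an approximation for intuition only.

```python
import numpy as np


def grid_mi(x, y, kx, ky):
    """Mutual information (bits) of x, y on a kx-by-ky quantile grid."""
    xb = np.searchsorted(np.quantile(x, np.linspace(0, 1, kx + 1)[1:-1]), x)
    yb = np.searchsorted(np.quantile(y, np.linspace(0, 1, ky + 1)[1:-1]), y)
    p, _, _ = np.histogram2d(xb, yb, bins=(np.arange(kx + 1) - 0.5,
                                           np.arange(ky + 1) - 0.5))
    p /= len(x)
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())


def mic_approx(x, y):
    """Crude MIC: max over quantile grids with kx*ky <= B(n) = n**0.6 of
    MI normalised by log2(min(kx, ky))."""
    n = len(x)
    B = max(int(n ** 0.6), 4)
    best = 0.0
    for kx in range(2, B + 1):
        for ky in range(2, B // kx + 1):
            best = max(best, grid_mi(x, y, kx, ky) / np.log2(min(kx, ky)))
    return best
```

Even this crude version reproduces the qualitative behaviour used in the analysis: a deterministic relationship scores near 1 while independent variables score near 0.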
Figure 4 shows the MIC matrices between character features. Figure 4a shows the MIC for fuzzy entropy features extracted using WPD. The MIC varies across character feature pairs, with some pairs, such as characters "i" and "m", exhibiting a remarkably high correlation of 0.99. This finding highlights the strong interdependence among certain character features. Conversely, Fig. 4b presents the MIC for features extracted by UMPCA, showing relatively low correlations ranging from 0.02 to 0.19 between different character features, which suggests that the features of different characters are essentially uncorrelated. The correlation analysis makes it evident that the UMPCA algorithm used in this paper effectively extracts high-quality features with low redundancy, weak correlation, and strong representation ability, which in turn supports superior classification performance.
Fig. 4.

MIC matrix: a WPD; b UMPCA (both the x-axis and y-axis indicate VI tasks)
To further explore the influence of the features extracted by the UMPCA algorithm on classification performance, we study the change in the average classification accuracy for the 29 characters under different P values. We set P = [2, 4, 6, 8, 10, 20, 30, 40, 50] to extract the character features and use SVM, KNN, and ELM to perform multi-character classification. The experimental results are shown in Fig. 5. The three classifiers follow the same trend: for P < 20, classification accuracy increases with larger P; for P > 20, performance increases only slowly and tends to stabilize. In this paper, we set P to 50 to ensure high classification performance.
Fig. 5.
Average classification accuracy under different P values
Remark 1
The P value is an important parameter of the UMPCA algorithm. To obtain better classification performance, it is important to tune and optimize P according to the requirements of the specific application.
The results and analysis of multi-character classification
The features of EEG signals are extracted by UMPCA and classified by SVM, KNN and ELM. The classification performance is evaluated by four indicators (accuracy, precision, recall and F1 score), which are defined as:
Accuracy = (TP + TN) / (TP + TN + FP + FN) | 9a |
Precision = TP / (TP + FP) | 9b |
Recall = TP / (TP + FN) | 9c |
F1 = 2 × Precision × Recall / (Precision + Recall) | 9d |
where TP (true positive) denotes a positive sample predicted as positive, TN (true negative) a negative sample predicted as negative, FN (false negative) a positive sample predicted as negative, and FP (false positive) a negative sample predicted as positive.
Table 1 shows the detailed classification results for the EEG signals of S1 to S6 under the different classification models, including the mean value and standard deviation (Std) of the four indicators. Classification accuracy, precision, recall, and F1 score are all obtained via 10-fold cross-validation, and the mean and Std are calculated over the results of S1 to S6. The average classification accuracy of SVM, KNN, and ELM reaches 97.59%, 95.40%, and 93.73% respectively, demonstrating good performance across all classifiers. Notably, SVM exhibits superior classification effectiveness for EEG signals acquired from different subjects, with the highest accuracy of 98.97% for subject S6. In summary, our proposed scheme demonstrates high average classification accuracy across classifiers and subjects, highlighting its stability.
Table 1.
The detailed classification performance of SVM, KNN and ELM
| Subject | SVM | KNN | ELM | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Acc (%) | Pr (%) | Re (%) | F1 (%) | Acc (%) | Pr (%) | Re (%) | F1 (%) | Acc (%) | Pr (%) | Re (%) | F1 (%) | |
| S1 | 98.28 | 99.77 | 98.28 | 98.76 | 95.86 | 96.32 | 98.28 | 95.64 | 95.86 | 98.33 | 95.86 | 96.51 |
| S2 | 95.52 | 97.36 | 95.52 | 95.77 | 93.10 | 96.84 | 93.45 | 93.62 | 93.79 | 96.32 | 93.79 | 94.20 |
| S3 | 98.62 | 99.02 | 98.62 | 98.54 | 98.28 | 98.48 | 97.93 | 98.27 | 92.41 | 94.22 | 92.41 | 92.69 |
| S4 | 96.90 | 97.93 | 96.90 | 96.87 | 95.86 | 96.32 | 97.93 | 95.73 | 92.07 | 94.05 | 92.07 | 92.05 |
| S5 | 97.24 | 98.56 | 97.24 | 97.56 | 95.86 | 96.90 | 95.86 | 95.96 | 94.48 | 95.98 | 94.48 | 94.46 |
| S6 | 98.97 | 99.31 | 98.97 | 99.00 | 93.45 | 95.36 | 97.62 | 93.56 | 93.79 | 96.24 | 93.79 | 94.34 |
| Mean | 97.59 | 98.66 | 97.59 | 97.75 | 95.40 | 96.70 | 95.40 | 95.46 | 93.73 | 95.86 | 93.74 | 94.04 |
| Std. | 1.88 | 0.82 | 1.88 | 1.15 | 1.73 | 0.94 | 1.73 | 1.59 | 1.27 | 1.44 | 1.27 | 1.42 |
Compared with the study of Kumar et al. (2018), the average classification accuracy of the proposed scheme is 12.39% higher, and it is 26.21% higher than the best classification accuracy in Lee et al. (2022). This outcome can be attributed to the limited use of frequency-domain information by Kumar et al. (2018), who considered only four temporal features, and to the neglect of spatial relationships between channels by Lee et al. (2022), despite their consideration of both temporal and spectral information. In contrast, the proposed scheme fully utilizes time-frequency information while using tensor-based algorithms to capture multi-channel spatial information. These results further validate the superiority and feasibility of the proposed approach.
Remark 2
The reason for comparing our classification results with those of Kumar et al. (2018) and Lee et al. (2022) is that, like ours, their studies focus on VI tasks involving letters, which is directly relevant to investigating communication recovery in BCI. This comparison serves to validate the effectiveness and practicality of our scheme.
To provide a more intuitive view of the classification results for the characters, we use the t-distributed stochastic neighbor embedding (t-SNE) method (Laurens and Hinton 2008) to visually analyze them. S1 is chosen as a representative owing to its excellent performance across all three classifiers. Figure 6 illustrates the t-SNE visualization results for S1, where each dot represents a character and the same character is marked with the corresponding character label. While some characters exhibit varying degrees of overlap, such as "g" and "r", and characters "k" and "o" contain outliers, most characters demonstrate clear separation: instances of the same character cluster together, and different characters are separated by a certain distance. Overall, all characters exhibit a noticeable clustering effect. Based on this analysis, it can be concluded that the approach proposed in this paper is effective and superior.
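The visualization can be reproduced in outline with scikit-learn's TSNE; the random matrix below stands in for one subject's 290 UMPCA feature vectors, and the perplexity value is an illustrative choice.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for one subject's 290 epochs x 50 UMPCA features
rng = np.random.default_rng(0)
feats = rng.random((290, 50))

# Embed the 50-D features into 2-D for plotting; PCA init and a fixed
# random_state make the layout reproducible across runs
emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(feats)
```

Each row of `emb` gives the 2-D coordinates of one epoch, which can then be scatter-plotted and labelled with its character, as in Fig. 6.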
Fig. 6.

The t-SNE visualization results of S1 (each dot represents a character and the same character is marked with a corresponding character label)
To assess the impact of different cortical areas of the brain on classification performance in VI tasks, we compute the average classification accuracy across the three classifiers using the electrodes of each individual cortical area, as well as all electrodes, based on the electrode placement, as shown in Fig. 7. Using all electrodes yields the highest classification performance, while among individual areas the occipital, prefrontal, and temporal lobes achieve average classification accuracies of 87.59%, 85.79%, and 77.38% respectively, indicating their significant contribution to accurate classification. Conversely, the motor cortex and parietal lobe exhibit lower average accuracies of 63.03% and 52.07% respectively, suggesting a relatively weaker influence on classification performance. Based on these findings, it can be inferred that the cortical areas related to VI are not confined to a single region; VI is related to the activation of the occipital, prefrontal, and temporal lobes.
Fig. 7.
Average classification accuracy of different cortical areas of the brain
Remark 3
The finding that VI exhibits connectivity with multiple cortical areas of the brain is consistent with previous studies (Koji et al. 2018; Dijkstra et al. 2018; Lee et al. 2020; Kwon et al. 2022). This outcome holds significant implications for research and practical applications in related fields, while also serving as a valuable reference for future investigations.
Further discussions
1) The great versatility of the presented scheme. When dealing with multi-channel EEG signals, we take full account of their time-frequency information and employ tensor algorithms to capture the spatial relationships between channels. By jointly extracting features from multiple channels, a higher classification accuracy is achieved, indicating a certain generalizability of our method. Moreover, even as the tensor dimensionality increases, our method can still extract high-quality features, demonstrating its potential. However, higher dimensionality often leads to increased computational complexity, which requires further investigation.
2) The high classification accuracy of simple images. In current studies, VI tasks based on the VI-BCI system take diverse representational forms. For example, in Lee et al. (2020) and Llorella et al. (2023), subjects imagined complex visual scenes and object shapes such as an ambulance and an apple, achieving classification accuracies of 40.14% and 30%, respectively. Conversely, simpler images such as letters and numbers were imagined in Lee et al. (2022), Kumar et al. (2018) and in our study, with classification accuracies reaching 71.38%, 85.20% and 97.59%, respectively. These findings suggest that higher classification performance can be attained when imagining simpler images, while more complex imagined structures may reduce accuracy. This helps explain the promising results presented in this paper.
3) Possible directions for multi-character classification. The research in this paper shows that the characters imagined during VI are separable to a considerable degree. This discovery is of great significance to the development of BCI technology in the field of restoring communication. To further enhance the accuracy and reliability of multi-character classification and facilitate its application in human-computer interaction, rehabilitation medicine, and other domains, our future research should consider the following aspects: (1) expanding from offline VI-BCI research to an online multi-character recognition system; (2) investigating the impact of specific channels on classification performance through channel-optimization algorithms, as this paper only analyzes the influence of different cortical areas; (3) incorporating deep learning techniques to construct an end-to-end adaptive model that can effectively address inter-subject variability and improve the precision and dependability of multi-character classification.
Conclusions
Visual imagery, as a paradigm that directly reflects users' mental activities, can establish a new channel of communication between the brain and the outside world. In this paper, we design and optimize a multi-character classification scheme, which adopts a new VI-BCI paradigm to acquire EEG signals, combines the Morlet wavelet transform and UMPCA to extract features from the EEG signals, and then uses SVM, KNN and ELM to classify multiple characters. The proposed scheme not only achieves good classification performance, demonstrating its effectiveness and superiority, but also provides a new, intuitive and high-performance method for the development and application of BCI technology in the field of restoring communication, and offers a useful reference for research on other types of BCI systems. In the future, it is necessary to further study whole-sentence recognition in the VI-BCI system and to build an online system that outputs sentences in real time.
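Of the three classifiers named above, the ELM is the simplest to state compactly: a random, untrained hidden layer followed by closed-form output weights. The sketch below (hidden-layer size, tanh activation, and the toy data are assumptions for illustration, not our experimental configuration) shows this training via the Moore-Penrose pseudoinverse:

```python
import numpy as np

class SimpleELM:
    """Minimal extreme learning machine: a random hidden layer with
    least-squares output weights solved in closed form."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # Random input weights and biases are fixed, never trained.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        # One-hot targets; output weights via the pseudoinverse of H.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return self.classes_[np.argmax(H @ self.beta, axis=1)]

# Toy usage: two well-separated Gaussian blobs stand in for two
# character classes' feature vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (50, 4)), rng.normal(2, 0.5, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
clf = SimpleELM(n_hidden=50).fit(X, y)
print((clf.predict(X) == y).mean())
```

Because the only "training" is one pseudoinverse, the ELM trains orders of magnitude faster than iterative classifiers, which is one reason it is attractive alongside SVM and KNN for multi-class EEG features.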
Declarations
The data that support the findings of this study are available on request from the corresponding author Hongguang Pan (hongguangpan@163.com). The data are not publicly available because they contain information that could compromise the privacy of research participants.
Funding
This work is supported by the Xi'an Science and Technology Program [2022JH-RGZN-0041] and the Natural Science Basic Research Program of Shaanxi [2021JQ-574].
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- Alom KM, Islam S (2020) Classification for the P300-based brain computer interface (BCI). In: 2020 2nd international conference on advanced information and communication technology (ICAICT), pp. 387–391
- Altan A, Karasu S (2020) Recognition of COVID-19 disease from X-ray images by hybrid model consisting of 2D curvelet transform, chaotic salp swarm algorithm and deep learning technique. Chaos Solit Fract 140:110071 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Babo RM, Buot A, Tallon BC (2019) Neural responses to heartbeats distinguish self from other during imagination. NeuroImage 191:10–20 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bang JS, Jeong JH, Won DO (2021) Classification of visual perception and imagery based EEG signals using convolutional neural networks. In: 2021 9th international winter conference on brain-computer interface (BCI), pp. 1–6
- Boashash B (2015) Time-frequency signal analysis and processing: a comprehensive reference
- Christa N, Scherer R, Reiner M, Gert P (2005) Imagery of motor actions: differential effects of kinesthetic and visual-motor mode of imagery in single-trial EEG. Brain Res Cogn Brain Res 25(3):668–677 [DOI] [PubMed] [Google Scholar]
- Daniel B, Beata J, Nicolas MY, Sergey SD (2015) Neural point-and-click communication by a person with incomplete locked-in syndrome. Neurorehabil Neural Rep 29(5):462–471 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dijkstra N, Mostert P, Lange FPd, Bosch S, Gerven MA (2018) Differential temporal dynamics during visual imagery and perception. eLife 7:e33904 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ehsan ET (2012) Classification of primitive shapes using brain computer interfaces. Comput Aided Des 44(10):1011–1019 [Google Scholar]
- Esposito DM, Detre JA, Aguirre GK, Stallcup M, Alsop DC (1997) A functional MRI study of mental image generation. Neuropsychologia 35(5):725–30 [DOI] [PubMed] [Google Scholar]
- Fabio LR, Gustavo P, Jose AM (2020) Convolutional neural networks and genetic algorithm for visual imagery classification. Austral Phys Eng Sci Med 43(3):973–983 [DOI] [PubMed] [Google Scholar]
- Filip S, Tom C (2020) Non-specific visuospatial imagery as a novel mental task for online EEG-based BCI control. Int J Neural Syst 30(6):2050026 [DOI] [PubMed] [Google Scholar]
- Fu Y, Li Z, Gong A, Qian Q, Su L, Zhao L (2021) Identification of visual imagery by electroencephalography based on empirical mode decomposition and an auto-regressive model. Comput Intell Neurosci 30:203 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jelena M, Jeremy F, Smeety P, Jeremie M, Fabien L (2021) Towards identifying optimal biased feedback for various user states and traits in motor imagery BCI. IEEE Trans Biomed Eng 69(23):1101–1110 [DOI] [PubMed] [Google Scholar]
- Juneja K (2019) Individual and mutual feature processed ELM model for EEG signal based brain activity classification. Wirel Pers Commun Int J 108(2):659 [Google Scholar]
- Kano N (2019) Communication of ALS patients with totally locked-in syndrome. J Nurs Sci Eng 6(2):63–69 [Google Scholar]
- Ko W, Jeon E, Jeong S, Suk H-I (2021) Multi-scale neural network for EEG representation learning in BCI. IEEE Computat Intell Magaz 16(2):31–45 [Google Scholar]
- Koizumi K, Ueda K, Nakao M (2018) Development of a cognitive brain-machine interface based on a visual imagery method. Annu Int Conf IEEE Eng Med Biol Soc 18:1062–1065 [DOI] [PubMed] [Google Scholar]
- Kumar P, Saini R, Roy PP, Sahu PK, Dogra DP (2018) Envisioned speech recognition using EEG sensors. Pers Ubiquit Comput 22:185–199 [Google Scholar]
- Kwon BH, Lee BH, Cho JH, Jeong JH (2022) Decoding visual imagery from EEG signals using visual perception guided network training method. In: 2022 10th international winter conference on brain-computer interface (BCI), pp. 1–5
- Laurens VDM, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(86):2579–2605 [Google Scholar]
- Lee S, Jang S, Jun SC (2022) Exploring the ability to classify visual perception and visual imagery EEG data: toward an intuitive BCI system. Electronics 11(17):2706 [Google Scholar]
- Lee S, Lee M, Lee S (2020) Neural decoding of imagined speech and visual imagery as intuitive paradigms for BCI communication. IEEE Trans Neural Syst Rehabilit Eng 28(12):2647–2659 [DOI] [PubMed] [Google Scholar]
- Li Y, Ning F, Jiang X, Yi Y (2022) Feature extraction of ship radiation signals based on wavelet packet decomposition and energy entropy. Math Probl Eng 2022:1–12 [Google Scholar]
- Li M, Wang R, Xu D (2020) An improved composite multiscale fuzzy entropy for feature extraction of MI-EEG. Entropy (Basel) 22(12):1356 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Llorella FR, Azorín JM, Patow G (2023) Black hole algorithm with convolutional neural networks for the creation of brain-computer interface based in visual perception and visual imagery. Neural Comput Appl 35(8):5631–5641 [Google Scholar]
- Lu H, Konstantinos PN, Anastasios VN (2009) Uncorrelated multilinear discriminant analysis with regularization and aggregation for tensor object recognition. IEEE Trans Neural Netw 20(1):103–23 [DOI] [PubMed] [Google Scholar]
- Lu H, Konstantinos PN, Anastasios VN (2009) Uncorrelated multilinear principal component analysis for unsupervised multilinear subspace learning. IEEE Trans Neural Netw 20(11):1820–1836 [DOI] [PubMed] [Google Scholar]
- Markus K, Jan K, Thomas M, Mark G (2000) Cortical activation evoked by visual mental imagery as measured by fMRI. Neuro Rep 11:3957–3962 [DOI] [PubMed] [Google Scholar]
- Min B, Kim J, Park H-j, Lee B, et al. (2016) Vowel imagery decoding toward silent speech BCI using extreme learning machine with electroencephalogram. BioMed Res Int 2016 [DOI] [PMC free article] [PubMed]
- Nataliya K, Lindgren JT, Anatole L (2018) Attending to visual stimuli versus performing visual imagery as a control strategy for EEG-based brain-computer interfaces. Sci Rep 8(1):13222 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pan H, Li Z, Tian C, Wang L, Fu Y, Qin X, Liu F (2022) The light GBM-based classification algorithm for Chinese characters speech development imagery BCI system. Cognit Neurodyn 17(2):373–384 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sousa T, Amaral C, Andrade J, Pires G, Nunes UJ, Castelo-Branco M (2017) Pure visual imagery as a potential approach to achieve three classes of control for implementation of BCI in non-motor disorders. J Neural Eng 14(4):1–11 [DOI] [PubMed] [Google Scholar]
- Srivastava G, Crottaz-Herbette S, Lau KM, Glover GH, Menon V (2005) ICA-based procedures for removing ballistocardiogram artifacts from EEG data acquired in the MRI scanner. Neuroimage 24(1):50–60 [DOI] [PubMed] [Google Scholar]
- Sun G, Song Z, Liu J, Zhu S, He Y (2017) Feature selection method based on maximum information coefficient and approximate markov blanket. Acta Automat Sin 43(5):795–805 [Google Scholar]
- Tang J, Xu M, Han J, Liu M, Dai T, Chen S, Ming D (2020) Optimizing SSVEP-based BCI system towards practical high-speed spelling. Sensors 20(15):4186 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vinay C, Ankur V, Anand N, Eklas H (2020) Brain-computer interface-based humanoid control: a review. Sensors (Basel) 20(13):3620 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Williams AH, Ben P, Maheswaranathan N, Dhawale AK (2020) Discovering precise temporal patterns in large-scale neural recordings through robust and interpretable time warping. Neuron 105(2):246–259 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Winlove C, Milton F, Ranson J, Fulford J, Mackisack M, Macpherson F, Zeman A (2018) The neural correlates of visual imagery: a co-ordinate-based meta-analysis. Cortex 105:4–25 [DOI] [PubMed] [Google Scholar]
- Yağ I, Altan A (2022) Artificial intelligence-based robust hybrid algorithm design and implementation for real-time detection of plant diseases in agricultural environments. Biology 11(12):1732 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang Z, Sun J, Chen T (2022) A new dynamically convergent differential neural network for brain signal recognition. Biomed Sig Process Control 71(3):103130 [Google Scholar]
- Zhang L, Wen D, Li C, Zhu R (2020) Ensemble classifier based on optimized extreme learning machine for motor imagery classification. J Neural Eng 17(2):1–12 [DOI] [PubMed] [Google Scholar]