Cognitive Neurodynamics. 2024 Apr 10;18(5):2535–2550. doi: 10.1007/s11571-024-10105-0

An improved CapsNet based on data augmentation for driver vigilance estimation with forehead single-channel EEG

Huizhou Yang 1, Jingwen Huang 1, Yifei Yu 2, Zhigang Sun 2, Shouyi Zhang 2, Yunfei Liu 1,, Han Liu 3, Lijuan Xia 2
PMCID: PMC11639747  PMID: 39678725

Abstract

Various studies have shown that it is necessary to estimate drivers’ vigilance to reduce the occurrence of traffic accidents. Most existing EEG-based vigilance estimation studies have been performed on intra-subject and multi-channel signals, and these methods are too costly and complicated to implement in practice. Hence, aiming at the problem of cross-subject vigilance estimation from single-channel EEG signals, an estimation algorithm based on the capsule network (CapsNet) is proposed. Firstly, we propose a new construction method for the input feature maps to fit the characteristics of CapsNet and improve the algorithm’s accuracy. Meanwhile, the self-attention mechanism is incorporated into the algorithm to focus on the key information in the feature maps. Secondly, we propose substituting the traditional multi-channel signals with single-channel signals to improve the utility of the algorithm. Thirdly, since single-channel signals carry fewer dimensions of information compared to multi-channel signals, we use the conditional generative adversarial network (CGAN) to improve the accuracy on single-channel signals by increasing the amount of data. The proposed algorithm is verified on the SEED-VIG dataset, with root-mean-square error (RMSE) and Pearson correlation coefficient (PCC) used as the evaluation metrics. The results show that the proposed algorithm improves the computing speed while the RMSE is reduced by 3% and the PCC is improved by 12% compared to the mainstream algorithm. The experimental results prove the feasibility of using forehead single-channel EEG signals for cross-subject vigilance estimation and offer the possibility of lightweight EEG vigilance estimation devices for practical applications.

Keywords: Cross-subject, Vigilance estimation, Single-channel EEG signals, Capsule network, CGAN, Self-attention

Introduction

With the rapid increase in vehicles, the probability of traffic accidents continues to increase each year. Statistically, the leading cause of traffic accidents is fatigue driving, and about 20–30% of traffic accidents occur when drivers reduce or lose their vigilance due to prolonged driving (Sikander and Anwar 2019). Therefore, providing timely vigilance reminders to drivers during driving is essential.

In order to provide reliable and objective methods for vigilance estimation, several studies based on various driver behaviors and physiological signals have been proposed. Alioua et al. (2014) proposed to detect the driver’s mouth state using the circular Hough transform (CHT) and to monitor the driver’s vigilance based on yawning. Flores et al. (2010) proposed to use a coalescent algorithm to extract eye state features and thus calculate a drowsiness index. Murphy-Chutorian and Trivedi (2010) presented a particle filter-based head tracker to determine the driver’s awareness via the position and rotation angle of the driver’s head. All of the above methods are based on drivers’ behaviors and are studied from the perspective of visual information. However, these methods are highly influenced by lighting conditions (Yue and Ji 2019), and different drivers have different driving habits. Therefore, vigilance estimation based on behaviors suffers from a lack of generalizability. Compared to drivers’ behaviors, the physiological signals spontaneously generated by the human body are more objective. Currently, electrocardiography (ECG) (Buendia et al. 2019; Rogado et al. 2009), electrooculography (EOG) (Kamakura et al. 2007), electromyography (EMG) (Naeije and Zorn 1982; Jianchao et al. 2021) and electroencephalography (EEG) (Kong et al. 2017; Luo et al. 2019; Tuncer et al. 2021) have been used in vigilance estimation. In particular, EEG directly records neural signals from the human brain and is considered the gold standard for vigilance estimation (Pei et al. 2021).

Although there have been many studies on EEG-based vigilance estimation, most of the existing studies are based on the intra-subject condition (Zeng et al. 2018; Wu et al. 2022), where the data for model training and testing both come from the same individual, so the estimation accuracy is significantly reduced when the model is applied to another individual. Moreover, the collection of training data and the training of models require considerable time. Therefore, research on cross-subject vigilance estimation is necessary. Existing solutions for cross-subject vigilance estimation can be divided into classical machine-learning-based and deep-learning-based methods. Chuang et al. (2014) proposed to fuse a Gaussian classifier (GC), a support vector machine (SVM) and a radial basis function neural network (RBFNN) into an integrated classifier through PRTools to classify the level of vigilance, and the results demonstrated that the proposed integrated approach improved the accuracy by 7%. Ko et al. (2020) used the fused features of EEG and EOG signals to assess vigilance through a multiple linear regression (MLR) model, and the algorithm was able to produce results in 0.034 s with low cost and high efficiency. Deep learning is also popular in cross-subject vigilance estimation. Gao et al. (2019) proposed a CNN-based fatigue detection method to automatically learn features from EEG signals for classification, achieving an accuracy of 97.37%. In Lu et al. (2018), an adversarial discriminative domain adaptation (ADDA) method based on domain adaptation was proposed, and the experimental results showed that the data distributions of different individuals can be aligned to similar distributions using domain adaptation, thus achieving a model with generality. Based on domain adaptation, Ma et al.
(2019) proposed the use of domain generalization for vigilance estimation, where the model was trained without adding any additional test data, and the overall performance is stable despite a slight decrease in accuracy compared to the results of domain adaptation. Zhang and Etemad (2021) proposed an estimation method based on LSTM and capsule network using different capsule layers to extract local and global information, and state-of-the-art results were obtained in the experiments.

However, existing EEG-based methods for cross-subject vigilance estimation suffer from several drawbacks. Firstly, the recent high-precision cross-subject solutions require long operation times, resulting in weak real-time capabilities (Zhang and Etemad 2021). Meanwhile, the transfer learning-based solutions require certain data from testers for pre-learning and cannot respond to changes between different testers in time (Lu et al. 2018). Secondly, EEG signals are cumbersome to collect. EEG signals are usually collected from the scalp with electrodes arranged according to the international 10–20 system, which often requires a great deal of time to configure the equipment and is not suitable for daily use (Zhang et al. 2020). Benefiting from the development of forehead EEG signal acquisition devices, we are able to collect EEG signals from the forehead (Maskeliunas et al. 2016). Compared with the international 10–20 system, the forehead collection method is more convenient and suitable for daily wear. Moreover, in order to reduce the power consumption and cost of products, most of the commonly used consumer-grade EEG devices contain only one forehead sensor (Ratti et al. 2017). Therefore, it is necessary to investigate a single forehead-channel vigilance estimation algorithm suitable for practical applications. Thirdly, the training of the network relies on a large number of samples, while physiological signals such as EEG are difficult to obtain in quantity; hence, the available data need to be expanded to better train the network. Finally, most existing solutions simply convert the continuous vigilance metric into a discrete classification problem (Ko et al. 2020; Liu et al. 2019), classifying vigilance into several categories such as normal, slight fatigue and excessive fatigue, which deviates from the essence of vigilance estimation.

To handle the drawbacks described above, we propose a cross-subject vigilance estimation algorithm based on forehead single-channel EEG, and the capsule network is carefully chosen as the solution to the final regression task. In addition, to verify the feasibility of the forehead single-channel signals for vigilance estimation, we compared the multichannel signal results with the single-channel results in terms of both estimation accuracy and operation time. Meanwhile, in order to prevent the problem of insufficient data, we perform data augmentation on the training data to provide more learnable data to the model.

The contributions of our paper are as follows:

  • To improve the accuracy of cross-subject vigilance estimation, we propose an improved capsule network regression algorithm that incorporates the self-attention mechanism and modifies the network’s normalization layer. In addition, we propose to change the input feature maps to better match the properties of the capsule network and further improve the estimation accuracy.

  • The forehead single-channel EEG signals are used for cross-subject vigilance estimation to reduce algorithm operation time while ensuring estimation accuracy. Meanwhile, the utilization of single-channel signals offers the possibility of producing wearable consumer-grade vigilance alert devices.

  • In order to solve the problem of insufficient data, we use CGAN for data augmentation in the training phase of the network to obtain sufficient samples.

Methods

Solution overview

We propose a model to learn discriminative information from single-channel EEG signals. To achieve this, we combine the capsule network with the self-attention mechanism. First, the self-attention mechanism is used to extract the key features of the feature maps, and then the capsule network is used to further learn the part-whole hierarchical relationships. Furthermore, a deep learning model generally becomes more accurate as it accesses more training data. Therefore, to further improve the training effectiveness of the network, we perform data augmentation on the training data, thus improving the accuracy of the network by expanding the training set. The overall structure of the proposed method is shown in Fig. 1. It is worth mentioning that the training data and the test data come from different individuals, and only the training data are augmented by the CGAN module, while the test data are not, so there is no data leakage.

Fig. 1.

Fig. 1

The overall architecture of the proposed method

Input representation layer

Before training and testing the network, the input EEG signals need to be encoded into extracted feature maps through the input representation layer, consisting of three steps: data pre-processing, feature extraction and construction of the feature maps.

Data pre-processing

The raw EEG signals acquired in vigilance experiments have a high sampling rate and are susceptible to interference from the surrounding environment, which is not conducive to processing and analyzing vigilance-related brain neural activity. Hence, to reduce the computational complexity, the raw EEG signals are first downsampled to 200 Hz. The signal is then filtered by a 1–75 Hz band-pass filter and a 49–51 Hz band-stop filter to reduce artifact interference and power-line interference (Zheng and Lu 2017).

Feature extraction

The time-frequency features are calculated by the short-time Fourier transform (STFT) with a non-overlapping Hanning window. For feature extraction, the power spectral density (PSD) and differential entropy (DE) calculated from the output of the STFT are used as effective features for vigilance estimation; they are extracted from the total frequency band between 1.0 and 50 Hz with a frequency resolution of 2 Hz (Zheng and Lu 2017). The formula for calculating the PSD is as follows:

S_xx(ω) = lim_{T→∞} (1/T) E[ |X̂(ω)|² ]    (1)

where x ∼ N(μ, σ²) represents a random time series that follows a Gaussian distribution, and its probability density function is:

f(x | μ, σ²) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²))    (2)

Similarly, the DE feature of a random time series that obeys a Gaussian distribution can be calculated using:

h(x | μ, σ²) = −∫ f(x | μ, σ²) log f(x | μ, σ²) dx = (1/2) log(2πeσ²)    (3)

In general, we can extract 25 features of PSD and DE respectively from each segment of single-channel EEG signals, with a total of 50 features.
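As a rough illustration of this step, the band-wise PSD and DE features can be sketched in numpy as follows. The function names, the Hanning windowing of a single segment, and the exact band edges (25 contiguous 2-Hz bands assumed to start at 1 Hz) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def differential_entropy(band_power):
    """DE of a Gaussian band, Eq. (3): 0.5 * log(2*pi*e*sigma^2),
    using the band power as the variance estimate."""
    return 0.5 * np.log(2 * np.pi * np.e * band_power)

def extract_features(segment, fs=200):
    """25 PSD and 25 DE features from one single-channel EEG segment,
    computed over 2-Hz bands (band edges assumed to start at 1 Hz)."""
    windowed = segment * np.hanning(len(segment))
    spec = np.fft.rfft(windowed)
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    psd = np.abs(spec) ** 2 / len(segment)
    psd_feats, de_feats = [], []
    for k in range(25):
        lo, hi = 1.0 + 2.0 * k, 3.0 + 2.0 * k
        p = psd[(freqs >= lo) & (freqs < hi)].mean()  # average power in band
        psd_feats.append(p)
        de_feats.append(differential_entropy(p))
    return np.array(psd_feats), np.array(de_feats)
```

Concatenating the two returned arrays gives the 50 features per segment described above.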

Construction of feature maps

After obtaining the features of each segment, it is necessary to build the feature maps used to train the network. In the conventional way of constructing feature maps, different feature types are placed in different channels. This approach speeds up the network’s computation of each output, especially for the multimodal fused features studied in most of the literature, which include multi-channel EEG and EOG signals: each segment there has 3 to 5 different feature types, with a total of 1000–2000 features (Zheng and Lu 2017; Wei et al. 2021). Such large-scale features require multi-channel feature maps to speed up the network operations. In contrast, for the single-channel EEG signals studied in this paper, each segment includes only 2 feature types and a total of 50 features. Hence, for single-channel signals with fewer features, we propose a new method of constructing the feature map to match the vigilance estimation algorithm proposed in this paper; the specific construction method is shown in Fig. 2. By stitching the features of the two types on the same plane, the correlation between the positions of different features is significantly increased, which is more suitable for the subsequent use of the capsule network for vigilance estimation.
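A minimal sketch of the two construction methods in Fig. 2, assuming the 25 features of each type are reshaped row-wise into a 5 × 5 block (the function name and the row-wise ordering are hypothetical):

```python
import numpy as np

def build_feature_maps(psd, de):
    """Two ways to arrange 25 PSD and 25 DE features (cf. Fig. 2):
    (a) traditional: one 5x5 plane per feature type, stacked as 2 channels;
    (b) proposed: both 5x5 blocks stitched side by side on one 5x10 plane."""
    psd_block = np.asarray(psd).reshape(5, 5)
    de_block = np.asarray(de).reshape(5, 5)
    traditional = np.stack([psd_block, de_block])                   # (2, 5, 5)
    proposed = np.concatenate([psd_block, de_block], axis=1)[None]  # (1, 5, 10)
    return traditional, proposed
```

The proposed single-plane map matches the 5 × 10 CapsNet input size listed in Table 1.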

Fig. 2.

Fig. 2

The construction methods of feature maps. The two colors represent PSD and DE features respectively. a Is the traditional construction method, and different types of features are located in different channels. b Shows the proposed construction method, and features are arranged in the same channel

Self-attention module

In order to make the neural network notice the correlations between different parts of the input feature maps, we introduce the self-attention module. The whole module includes three parts: positional encoding, the self-attention mechanism, and residual connection with normalization.

Positional encoding

To further utilize the position information between different features, “positional encodings” are incorporated into the input feature maps (Vaswani et al. 2017). The position codes are obtained as:

PE(pos, 2i) = sin(pos / 10000^(2i/d_model)),  PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))    (4)

where pos is the position, i is the dimension index, and d_model represents the dimension of the input and output.
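Eq. (4) can be sketched directly in numpy (assuming an even d_model, as in Table 1 where d_model = 10):

```python
import numpy as np

def positional_encoding(n_pos, d_model):
    """Sinusoidal position codes of Eq. (4): sin on even dimensions,
    cos on odd dimensions (d_model assumed even)."""
    pos = np.arange(n_pos)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angle = pos / 10000 ** (2 * i / d_model)
    pe = np.empty((n_pos, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe
```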

Self-attention mechanism

The self-attention mechanism is introduced for its ability to fully exploit the relationships between different input features; the effectiveness of model training can be further enhanced when these interrelationships are fully utilized. The output of each feature map after the self-attention mechanism is calculated as:

attention(Q, K, V) = softmax(QKᵀ / √d_k) V    (5)

where Q is the set of queries, K is the set of keys, V is the set of values, and d_k denotes the dimension of K.
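A minimal numpy sketch of Eq. (5); the projection matrices w_q, w_k, w_v are illustrative stand-ins for the learned weights:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention, Eq. (5):
    softmax(Q K^T / sqrt(d_k)) V, with Q, K, V all projected from x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over keys
    return weights @ v
```

With zero query/key projections the attention weights become uniform and every output row is simply the mean of the values, which is a convenient sanity check.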

Residual connection and normalization

In order to prevent the degradation of deep neural network training, a residual connection block is added to the self-attention module (He et al. 2016). The output with the residual connection is then normalized to speed up training and improve its stability.

CapsNet

In the CapsNet module, the feature maps of size A_m × A_n from the self-attention module are used as input to the first layer. The first layer of this module is a convolution layer with an e × e convolution kernel and a stride of g; no padding is employed. Accordingly, the output volume of this layer is C × ((A_m − e)/g + 1) × ((A_n − e)/g + 1), where C denotes the number of channels.

The convolution layer is followed by a normalization layer. Differing from the conventional normalization method in CapsNet (Zhang and Etemad 2021; Sabour et al. 2017; Chao et al. 2019; Guarda et al. 2022), we propose to use Layer Normalization (LN) instead of Batch Normalization (BN). Whereas the BN layer operates across the samples of a batch, the LN layer processes the different features of a single sample. This keeps the distribution of the input data in each layer relatively stable while strengthening the connections between features, which is more suitable for the subsequent CapsNet.

The third layer is the part of CapsNet called the primary capsules. In this layer, the normalized feature maps are split into n capsules of dimension C_d (n × C_d = C), and the same convolution operation with an e × e kernel and stride g is used within each primary capsule. Thus, the output of each primary capsule can be denoted as u_i, i ∈ [1, n], whose volume is C_d × (((A_m − e)/g + 1 − e)/g + 1) × (((A_n − e)/g + 1 − e)/g + 1).

Digit capsules constitute the fourth layer of the network. This layer is composed of a K × H matrix, which receives the vectors transmitted from the primary capsules, where K is the number of digit capsules and H is the dimension of each digit capsule. The output of the digit capsules s_j, j ∈ [1, K], can be calculated as:

s_j = ∑_i c_ij û_j|i    (6)

where c_ij are the coupling coefficients determined by r iterations of dynamic routing, and û_j|i is the product of the weight matrix W_ij and the output of the primary capsule, which can be expressed as:

û_j|i = W_ij u_i    (7)

In order for the length of the output vector of a capsule to represent the probability of the occurrence of the content represented by that capsule, the output from the digit capsule is normalized to between 0 and 1 by a nonlinear squash function, which can be expressed by equation (8):

v_j = (‖s_j‖² / (1 + ‖s_j‖²)) · (s_j / ‖s_j‖)    (8)

where v_j, j ∈ [1, K], represents the output of the squashed capsules.

Finally, we added a fully connected layer which employs sigmoid as the activation function to obtain an estimate of the vigilance.
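The primary-to-digit capsule computation of Eqs. (6)–(8) can be sketched as routing-by-agreement. This is a simplified numpy illustration under stated assumptions (prediction vectors already computed, softmax over the digit capsules for the coupling coefficients); the bias handling and exact update schedule may differ from the authors' implementation:

```python
import numpy as np

def squash(s, eps=1e-9):
    """Eq. (8): scales a capsule vector so its length lies in [0, 1)."""
    norm2 = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def dynamic_routing(u_hat, r=5):
    """Routing-by-agreement over prediction vectors u_hat of shape
    (n_primary, K, H); r = 5 iterations as chosen in the paper."""
    n, K, _ = u_hat.shape
    b = np.zeros((n, K))                        # routing logits
    v = None
    for _ in range(r):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)       # coupling coefficients c_ij
        s = (c[..., None] * u_hat).sum(axis=0)  # s_j, Eq. (6)
        v = squash(s)                           # v_j, Eq. (8)
        b += (u_hat * v[None]).sum(axis=-1)     # agreement update
    return v
```

Because of the squash nonlinearity, every output capsule has length strictly below 1, so its length can be read as a probability.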

Data augmentation

In deep neural networks, more training data tend to imply higher estimation accuracy. However, in practical situations, it is difficult to acquire EEG data in large quantities. This problem is well addressed by the emergence of Generative Adversarial Net (GAN) (Goodfellow et al. 2014), which can learn the distribution of real data to generate artificial data for data augmentation. Jiao et al. (2020) proposed using Conditional Wasserstein GAN (CWGAN) to generate high-quality EOG features, and the experiments proved that the accuracy of the classifier was improved. Therefore, in this paper, we employ CGAN for data augmentation of the features extracted from single-channel EEG signals to further train the proposed neural network. The structure of CGAN is shown in Fig. 3.

Fig. 3.

Fig. 3

Structure of CGAN

The CGAN consists of two competing components: the generator and the discriminator. The inputs of the generator G are random noise and the specified category label, and the generator outputs generated data whose distribution is similar to that of the real data, in order to confuse the discriminator. The inputs of the discriminator D are the generated data, the real data and the specified category label, and its output is a judgment of real or fake. Hence, the goal of the discriminator-generator game can be expressed as the following minmax problem:

min_G max_D V(D, G) = E_{x∼X_r}[log D(x | y)] + E_{z∼X_g}[log(1 − D(G(z | y)))]    (9)

where z is random noise obeying a Gaussian distribution, y is the specified category label, X_r denotes the distribution that the real data obey, and X_g represents the distribution that the generated data obey. In this paper, to simplify the training of CGAN, the original continuous labels in [0, 1] are rounded to two decimal places, yielding 101 discrete categories.
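The 101-category label discretization can be sketched as (the function name is assumed):

```python
def condition_label(perclos):
    """Round a continuous PERCLOS label in [0, 1] to two decimals,
    giving one of the 101 CGAN condition categories (0.00 ... 1.00)."""
    return int(round(perclos * 100))
```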

Experimental studies

In order to evaluate the performance of our proposed vigilance estimation algorithm, we designed relevant experiments on a public dataset. The details are as follows.

Dataset

  1. SEED-VIG

SEED-VIG is a large public dataset that records the EEG signals from a total of 23 participants, each for 120 consecutive minutes of simulated driving (Zheng and Lu 2017). 17 EEG channels were recorded from the temporal and posterior brain regions, and four EEG channels were collected from the forehead. All the signals were collected at the sampling frequency of 1000 Hz with a Neuroscan system. Considering practicality and portability, the four channels from the forehead (called Channel 4, Channel 5, Channel 6 and Channel 7, respectively) were extracted for this study to explore the feasibility of single-channel EEG signals from the forehead for vigilance estimation. Figure 4 presents the specific locations of the four EEG channels in the forehead.

Fig. 4.

Fig. 4

The specific locations of the four EEG channels on the forehead in the SEED-VIG dataset. a Shows the actual wearing positions of the forehead EEG signal acquisition, and b represents the relative positions of the four channels. The figure is inspired by the SEED-VIG paper (Zheng and Lu 2017)

To ensure the accuracy of the vigilance labels for the dataset, SEED-VIG uses SMI eye-tracking glasses to record different eye movements, such as blinks, fixations, saccades, and the duration of eye closures (CLOS). Therefore, the percentage of eye closure, PERCLOS, which is one of the most widely accepted vigilance indices in the literature (Trutschel et al. 2011; Bergasa et al. 2006; Dong et al. 2011), can be calculated as:

PERCLOS = (blink + CLOS) / (blink + fixation + saccade + CLOS)    (10)
  2. Sustained-attention driving task (SADT) dataset

The dataset is provided by the Brain Research Center, National Chiao Tung University (NCTU). The experiment was set on a four-lane highway, and the car would drift from the original cruise lane to the left or right lane (Cao et al. 2019). Twenty-seven subjects were asked to keep the car in the middle of the lane. SADT recorded the time at which the car deviated from the original lane (deviation onset), the time at which the subjects began to respond to the deviation event (response onset), and the time at which the car returned to the original lane (response offset). The experiment lasted about 90 min, during which the subjects’ EEG signals were continuously recorded. Figure 5 shows the experimental design and the electrode positions of the EEG signals.

Fig. 5.

Fig. 5

Experimental design of SADT dataset. a Shows the illustration of the event-related lane-departure paradigm. b Shows the positions of 30 electrodes and reference electrodes (A1 and A2) for obtaining EEG signals in the SADT (Cao et al. 2019)

One experimental session is chosen for each subject, and the 27 subsets constitute the total dataset. To test the performance of the proposed algorithm on forehead EEG signals, as with SEED-VIG, we select the data of the FP1 and FP2 electrodes for the experiment. The data pre-processing method is the same as in Jiang et al. (2021): a 0.5 Hz high-pass filter and a 50 Hz low-pass filter are applied, and the signal is then downsampled from 500 to 250 Hz. The reaction time (RT) of the subjects is an important basis for judging the fatigue state. We calculate the features from the EEG signals during the reaction time and also calculate the fatigue index (FI) from the RT; the formula is:

FI = max(0, (1 − e^(−(τ − τ₀))) / (1 + e^(−(τ − τ₀))))    (11)

where τ is the RT of each lane-departure event, and τ0 is the alert RT with a value of 1 (Zhang et al. 2022).
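The two vigilance labels, PERCLOS in Eq. (10) and FI in Eq. (11), can be sketched as (function names assumed; durations and reaction times in seconds):

```python
import math

def perclos(blink, fixation, saccade, clos):
    """Eq. (10): fraction of eye-closed time over all eye-movement durations."""
    return (blink + clos) / (blink + fixation + saccade + clos)

def fatigue_index(tau, tau0=1.0):
    """Eq. (11): maps reaction time tau to [0, 1); tau0 is the alert RT (1 s)."""
    e = math.exp(-(tau - tau0))
    return max(0.0, (1.0 - e) / (1.0 + e))
```

Note that FI is clamped at 0 for reactions faster than the alert RT and approaches 1 as the reaction time grows.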

Evaluation methods

In order to evaluate the performance of our proposed algorithm, the root-mean-square error (RMSE) and the Pearson correlation coefficient (PCC) are used as performance metrics. RMSE reflects the squared error between the predicted and observed values, and its formula is as follows:

RMSE(Y, Ŷ) = √((1/N) ∑_{i=1}^{N} (y_i − ŷ_i)²)    (12)

where Y = [y_1, y_2, …, y_N]ᵀ represents the observed values calculated by PERCLOS, and Ŷ = [ŷ_1, ŷ_2, …, ŷ_N]ᵀ represents the predicted values output by the proposed regression model.

PCC is applied due to its ability to describe the consistency of the trend between the predicted and observed values, and it can be calculated based on:

PCC(Y, Ŷ) = ∑_{i=1}^{N} (y_i − ȳ)(ŷ_i − ŷ̄) / √(∑_{i=1}^{N} (y_i − ȳ)² · ∑_{i=1}^{N} (ŷ_i − ŷ̄)²)    (13)

where ȳ is the mean of Y and ŷ̄ is the mean of Ŷ. The PCC values range from −1 to 1, where −1, 0 and 1 indicate a completely opposite trend, no correlation and a completely consistent trend, respectively. In general, a lower RMSE value and a higher PCC value indicate higher prediction accuracy.
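Both metrics, Eqs. (12) and (13), can be sketched in numpy as:

```python
import numpy as np

def rmse(y, y_hat):
    """Eq. (12): root-mean-square error between observed and predicted values."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def pcc(y, y_hat):
    """Eq. (13): Pearson correlation coefficient between observed and predicted values."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    yc, hc = y - y.mean(), y_hat - y_hat.mean()
    return float(np.sum(yc * hc) / np.sqrt(np.sum(yc ** 2) * np.sum(hc ** 2)))
```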

To evaluate our proposed algorithm on the cross-subject task, we performed Leave-One-Subject-Out (LOSO) cross-validation on the dataset, where the data collected from 22 participants were used to train the network and the data of the remaining one were used to test it. This process was repeated until the data of each participant had served as the test data.
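The LOSO protocol can be sketched as (subjects indexed 0–22 for SEED-VIG's 23 participants; the generator form is an assumption):

```python
def loso_splits(n_subjects=23):
    """Leave-One-Subject-Out: each subject serves as the test set exactly once,
    with the remaining n-1 subjects forming the training set."""
    for test_subject in range(n_subjects):
        train = [s for s in range(n_subjects) if s != test_subject]
        yield train, test_subject
```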

Implementation details

In our experiments, the root-mean-square error L = √((1/N) ∑_{i=1}^{N} (y_i − ŷ_i)²) is chosen as the loss function. The CGAN module is employed to produce generated data of the same size as the training data. The number of iterations of the dynamic routing algorithm is an important hyperparameter in the CapsNet module, and it can be determined by comparing the training loss under different routing iterations. As shown in Fig. 6, we record the RMSE loss of the network during 30 epochs. Similar curve trends appear for 1–6 iterations, but the best performance is achieved with 5 iterations, which gives the fastest convergence and the best precision. Furthermore, the Adam optimizer (Kingma and Ba 2014) is used to automatically optimize the network parameters and efficiently train our proposed model. We empirically tune the hyper-parameters of the network to obtain the best performance, as listed in Table 1. The pipeline is implemented on a PC with an Intel i5-8265U CPU @ 1.6 GHz; the details of our experimental environment are given in Table 2.

Fig. 6.

Fig. 6

Performance comparison of the proposed algorithm under different routing iterations in CapsNet. The x-axis is the training epoch, and the y-axis is the RMSE loss. Different colors represent different iterations

Table 1.

Parameter settings of each module

Modules Parameters Value
Proposed module Training epochs 30
Batch size 32
Self-attention module Input dimension dmodule 10
CapsNet module Input feature map size Am×An [5×10]
Convolution kernel size e 3
Stride g 1
Total numbers of channels C 8
Dimensions of primary capsules n 4
Numbers of primary capsule channels Cd 2
Numbers of representations K 10
Dimensions of digit capsules H 16
Iterations of dynamic routing r 5
CGAN module Training epochs 5000
Batch size 32
Regression Activation Sigmoid

Table 2.

Experimental environments

Experimental environments Details
Program language Python 3.9
Framework Pytorch 1.12.0
Operating system Windows 10
CPU Type Intel i5-8265U
RAM 8.0GB

Comparison

State-of-the-art methods

As described in the Introduction, a number of solutions have been proposed for cross-subject vigilance estimation. Here, we further present the state-of-the-art cross-subject solutions. We report work related to cross-subject vigilance estimation using single-channel EEG signals to provide a fair comparison.

In Lu et al. (2018), a cross-subject vigilance estimation algorithm based on the fused features of EEG and EOG, namely adversarial discriminative domain adaptation (ADDA), was proposed. The method narrows the distribution gap between the features of the source domain (training set) and those of the target domain (testing set) through an adversarial domain-adaptation network, thereby improving the accuracy of vigilance estimation.

In Zhang and Etemad (2021), LSTM-CapsAtt was proposed for vigilance estimation. The architecture learns hierarchical dependencies in the data through LSTM and capsule networks, extracting lower-level information with lower-level capsule layers and then capturing and grouping these representations with higher-level capsule layers. The experiments demonstrated the effectiveness of the approach and set a new state of the art. However, this method still has some shortcomings: the inference speed does not meet real-time requirements, and there is still room for improving the network structure. Therefore, we propose to use the capsule network as the baseline, improve it by changing the normalization layer, and introduce a new construction method for the feature maps to increase the accuracy of the algorithm, thereby alleviating the unsatisfactory inference speed caused by multi-network fusion.

Classical methods

In addition to comparing with the state-of-the-art models, we also compare against several classic regression models, including traditional machine learning and deep learning methods: Support Vector Regression (SVR) (Zhang et al. 2015), LSTM (Zhang and Etemad 2021) and Convolutional Neural Network (CNN) (Zhang and Etemad 2021), thus ensuring a comprehensive comparison from multiple perspectives. The RBF kernel function is used in SVR, with the kernel coefficient γ tuned in the range [0.0001, 0.001,..., 1] and the penalty factor C tuned in the range [1, 10, 50, 100]. Three stacked LSTM layers are utilized in the LSTM model, where each layer has 15 cells and 256 hidden units. The CNN model includes one 2D convolution layer with a kernel size of 3.

Results

In this section, we present the results of the proposed algorithm with the published algorithms described above. Meanwhile, to demonstrate the feasibility of using single-channel EEG signals to estimate vigilance, we compare the accuracy of vigilance estimation using forehead single-channel EEG signals with that of forehead four-channel EEG signals. Furthermore, we perform extensive ablation experiments to investigate the effects of the various modules proposed in this paper on regression accuracy through data visualization and comparison of results.

Performance

After Linear Dynamic System (LDS) filtering (Zheng and Lu 2017), the features called PSD_LDS and DE_LDS are extracted from SEED-VIG. To examine the viability of single-channel EEG signals for vigilance estimation, we extracted the features of each single channel as well as the fused four-channel features separately, so as to compare vigilance estimation between multi-channel and single-channel EEG signals. Accordingly, 50 features are obtained for each single-channel EEG signal, while 200 features are available for the combined four-channel EEG signals. The mean performances of all algorithms with different channels of EEG signals are listed in Table 3. It can be seen that each single-channel signal can support the estimation of vigilance. In particular, Channel 6 performs better than the other single-channel signals; even compared to the fused signals of all channels, Channel 6 still performs better for some of the algorithms. We also use single-channel EEG signals on SADT to compare the different algorithms, and the results are shown in Table 4. The proposed model obtains state-of-the-art RMSE and PCC values, outperforming previous solutions, which confirms the effectiveness of the proposed improvements and the individual components of the model.

Table 3.

The performance of our proposed model in comparison to other solutions in SEED-VIG, which use different channels of EEG signals

Each cell lists RMSE±SD / PCC±SD.

| Methods | All Channels | Channel 4 | Channel 5 | Channel 6 | Channel 7 |
| --- | --- | --- | --- | --- | --- |
| SVR (Zhang et al. 2015) | 0.259±0.09 / 0.521±0.34 | 0.245±0.08 / 0.556±0.27 | 0.230±0.06 / 0.553±0.30 | *0.243±0.12 / 0.565±0.27 | 0.242±0.07 / 0.496±0.29 |
| LSTM (Zhang and Etemad 2021) | 0.290±0.10 / 0.498±0.29 | 0.345±0.10 / 0.402±0.32 | 0.348±0.10 / 0.402±0.32 | 0.332±0.12 / 0.469±0.30 | 0.346±0.15 / 0.415±0.31 |
| CNN (Zhang and Etemad 2021) | 0.241±0.10 / 0.523±0.32 | 0.277±0.15 / 0.445±0.36 | 0.286±0.29 / 0.500±0.28 | 0.303±0.30 / 0.521±0.28 | 0.297±0.21 / 0.417±0.42 |
| LSTM-CapsAtt (Zhang and Etemad 2021) | 0.235±0.10 / 0.489±0.35 | 0.240±0.08 / 0.483±0.33 | 0.235±0.06 / 0.500±0.37 | 0.220±0.08 / 0.574±0.27 | 0.245±0.09 / 0.463±0.34 |
| ADDA (Lu et al. 2018) | 0.229±0.09 / 0.489±0.34 | 0.237±0.09 / 0.428±0.22 | 0.240±0.09 / 0.356±0.29 | 0.239±0.09 / 0.399±0.26 | 0.231±0.08 / 0.397±0.24 |
| CapsNet (baseline) | 0.225±0.06 / 0.538±0.25 | 0.250±0.08 / 0.450±0.30 | 0.239±0.06 / 0.445±0.29 | 0.219±0.07 / 0.469±0.33 | 0.247±0.07 / 0.419±0.24 |
| + Self-attention | 0.216±0.06 / 0.610±0.23 | 0.235±0.08 / 0.505±0.25 | 0.211±0.06 / 0.528±0.29 | 0.209±0.07 / 0.569±0.23 | 0.228±0.08 / 0.464±0.34 |
| ++ CGAN | 0.209±0.06 / 0.614±0.24 | 0.227±0.07 / 0.544±0.25 | 0.210±0.06 / 0.567±0.27 | *0.201±0.06 / 0.610±0.22 | 0.220±0.07 / 0.498±0.34 |

The optimal performance is shown in bold, and * denotes that this single-channel signal outperforms the multi-channel signals. SD represents the standard deviation of performance between different subjects

Table 4.

The performance of our proposed model in comparison to other solutions in SADT

Each cell lists RMSE±SD / PCC±SD.

| Methods | FP1 | FP2 |
| --- | --- | --- |
| SVR (Zhang et al. 2015) | 0.113±0.03 / 0.825±0.22 | 0.120±0.03 / 0.803±0.25 |
| LSTM (Zhang and Etemad 2021) | 0.262±0.12 / 0.390±0.20 | 0.257±0.11 / 0.385±0.20 |
| CNN (Zhang and Etemad 2021) | 0.112±0.03 / 0.843±0.22 | 0.132±0.05 / 0.813±0.24 |
| LSTM-CapsAtt (Zhang and Etemad 2021) | 0.091±0.03 / 0.865±0.23 | 0.103±0.04 / 0.832±0.25 |
| ADDA (Lu et al. 2018) | 0.217±0.07 / 0.564±0.26 | 0.224±0.09 / 0.533±0.26 |
| CapsNet (baseline) | 0.079±0.03 / 0.879±0.18 | 0.080±0.03 / 0.887±0.17 |
| + Self-attention | 0.078±0.03 / 0.889±0.18 | 0.078±0.03 / 0.905±0.15 |
| ++ CGAN | 0.073±0.03 / 0.893±0.19 | 0.074±0.03 / 0.909±0.15 |

The optimal performance is shown in bold. SD represents the standard deviation of performance between different subjects

For a visual demonstration of the detection performance of different algorithms with single-channel signals, the Channel 6 signals of three subjects (subject 15, subject 20 and subject 21) are randomly selected from SEED-VIG to plot the vigilance estimation results of our proposed method, as shown in Fig. 7. In addition, LSTM-CapsAtt, as a representative state-of-the-art algorithm, is compared with the proposed algorithm. The figure shows that the output estimates of both methods follow the trend in vigilance, and our proposed algorithm performs best in most cases.

Fig. 7.


Comparison of vigilance estimation results under different algorithms on subject 15, subject 20, and subject 21, with features extracted from the signals of Channel 6. The x-axis corresponds to elapsed time and the y-axis corresponds to the estimated vigilance. The black line is the PERCLOS values measured by the eye-tracking glasses as ground truth. The red line and the green line are the estimates provided by the proposed algorithm and the LSTM-CapsAtt method, respectively

Furthermore, we compare the computing time of different algorithms with different numbers of channels during testing, as shown in Fig. 8. As the computing times of the algorithms differ by orders of magnitude, the results are shown on a natural logarithmic scale for an intuitive comparison. The results show that the computing time for vigilance estimation using single-channel signals is more than 15% shorter than that using the combined signals of all four channels, which greatly improves the speed of real-time detection. Compared with conventional solutions, the proposed algorithm significantly improves the estimation accuracy despite being more time-consuming. Compared with the state-of-the-art model LSTM-CapsAtt, the proposed model significantly improves the estimation accuracy while greatly reducing the time required for each estimation. It is worth mentioning that the computing time of each test stays the same after the addition of CGAN, since the generated data are only used in the model training stage.

Fig. 8.


The computing time required for each test for signals with different numbers of channels under different algorithms. Blue bars indicate the combined signals of the four channels, and yellow bars indicate the signals of the single Channel 6

Ablation experiments

To better examine the effect of each module in our algorithm, we evaluate each component through ablation experiments on SEED-VIG. We design four ablation experiments for different modules: the comparison of the proposed feature map construction method with the conventional method, the comparison of the LN layer with the traditional BN layer, the effect of the self-attention mechanism, and the effect of the CGAN module.

Effect of proposed feature maps

The difference in feature map construction between our proposed method and the traditional method is shown in Fig. 2. To demonstrate the effectiveness of our proposed approach, we compare their results under the CapsNet algorithm in Table 5. It can be seen from the results that the proposed construction method has smaller errors as well as more stable performance, with a smaller standard deviation (SD) of both PCC and RMSE, although its output is less similar to the ground truth in trend than that of the conventional method. However, the first criterion for judging a vigilance estimation algorithm is the error, so the reduction in RMSE is the more important outcome, while the requirement on PCC is less strict.

Table 5.

Comparison results of different feature map construction methods

| Methods | RMSE±SD | PCC±SD |
| --- | --- | --- |
| Conventional Feature Maps | 0.229±0.08 | 0.559±0.27 |
| Proposed Feature Maps | 0.225±0.06 | 0.538±0.25 |

The features are extracted from the fused signals of four channels in the forehead
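The two constructions compared above can be sketched as follows, assuming 25 PSD_LDS and 25 DE_LDS features per channel arranged into the 5×10 map size reported for Fig. 9; the exact arrangement is our assumption, not taken from the paper:

```python
# Illustrative sketch of the two feature-map constructions compared in Table 5.
import numpy as np

psd = np.arange(25, dtype=float)     # placeholder PSD_LDS features, one sample
de = np.arange(25, 50, dtype=float)  # placeholder DE_LDS features

# Conventional: each feature type in its own channel -> a 2-channel 5x5 map.
conventional = np.stack([psd.reshape(5, 5), de.reshape(5, 5)])  # (2, 5, 5)

# Proposed: both feature types side by side in one channel -> a 1-channel 5x10
# map, so the capsules can also learn positional relations between the types.
proposed = np.concatenate([psd.reshape(5, 5), de.reshape(5, 5)], axis=1)[None]  # (1, 5, 10)
```

The single-channel layout places both feature types in one spatial plane, which is what allows the capsule layers to model positions across feature types rather than within each type alone.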

Effect of LN layer

In this paper, the difference between the LN and BN layers is discussed in Methods, and we propose to adopt an LN layer to replace the original BN layer in the CapsNet. Hence, in this ablation experiment, the results of employing the LN layer and the BN layer are compared, as shown in Table 6. The results demonstrate that the LN layer not only reduces the error but also improves the similarity to the trend of the ground truth. Compared with the BN layer, the LN layer performs more robustly as the normalization layer for vigilance estimation, with smaller SDs of both RMSE and PCC.

Table 6.

Comparison results of different normalization layers

| Layers | RMSE±SD | PCC±SD |
| --- | --- | --- |
| Batch Normalization | 0.254±0.07 | 0.536±0.27 |
| Layer Normalization | 0.225±0.06 | 0.538±0.25 |

The features are extracted from the fused signals of four channels in the forehead
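The behavioral difference between the two normalization layers can be sketched in numpy (learnable scale and shift parameters omitted for brevity):

```python
# BN normalizes each feature position across the batch; LN normalizes all
# features within one sample, tying the frequency-band features together.
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize over the batch axis, separately for every feature position.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def layer_norm(x, eps=1e-5):
    # Normalize over all features of each individual sample.
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.default_rng(2).normal(size=(8, 5, 10))  # batch of 5x10 feature maps
bn, ln = batch_norm(x), layer_norm(x)
```

Note that the LN output for a sample is independent of the other samples in the batch, whereas the BN output changes with batch composition; this per-sample view is what couples the different frequency-band features within one map.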

Effect of self-attention mechanism

To evaluate the influence of the self-attention mechanism on the proposed model, we compare the accuracy before and after its inclusion; Table 3 shows the results. The results indicate that both the accuracy and the stability of estimation are improved with the addition of the self-attention mechanism, indicating that it can concentrate on the critical features and thus improve the training of the network.
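A minimal numpy sketch of scaled dot-product self-attention applied to a flattened feature map; the projection weights here are random placeholders, not the trained parameters of the model:

```python
# Scaled dot-product self-attention over the rows of one 5x10 feature map.
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    q, k, v = x @ wq, x @ wk, x @ wv
    # (n, n) attention weights: each row re-weights all positions of the map.
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return weights @ v, weights

rng = np.random.default_rng(3)
x = rng.normal(size=(5, 10))                  # one feature map, rows as tokens
wq, wk, wv = (rng.normal(size=(10, 10)) for _ in range(3))
out, weights = self_attention(x, wq, wk, wv)  # weights is the map visualized in Fig. 9
```

The `weights` matrix is what a heatmap like Fig. 9 visualizes: larger entries mark the feature positions the network attends to most.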

Effect of CGAN

We perform data augmentation through CGAN in order to better train the network by expanding the training data. In this paper, data of the same size as the original training data are generated through CGAN. To evaluate its effectiveness, we compare the vigilance estimation results of the baseline network and the network with CGAN; the comparison results are shown in Table 7. The results show that the data generated through CGAN effectively expand the training data and thus improve the estimation accuracy of the network.

Table 7.

Comparison results of the presence and absence of CGAN module

| Methods | RMSE±SD | PCC±SD |
| --- | --- | --- |
| CapsNet (without CGAN) | 0.225±0.06 | 0.538±0.25 |
| CGAN + CapsNet | 0.215±0.06 | 0.575±0.28 |

The features are extracted from the fused signals of four channels in the forehead
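The conditioning idea behind the CGAN generator can be sketched as follows; the layer sizes and weights are illustrative placeholders, and no adversarial training loop is shown:

```python
# Sketch of conditional generation only: the PERCLOS label is concatenated
# with the noise vector so generated feature maps are tied to a vigilance
# level. Weights are random stand-ins for a trained generator.
import numpy as np

rng = np.random.default_rng(4)
W1 = rng.normal(scale=0.1, size=(33, 64))   # (noise 32 + label 1) -> hidden
W2 = rng.normal(scale=0.1, size=(64, 50))   # hidden -> 50 features (5x10 map)

def generate(label, n_samples=4, z_dim=32):
    z = rng.normal(size=(n_samples, z_dim))
    cond = np.full((n_samples, 1), label)        # condition on the PERCLOS label
    h = np.tanh(np.concatenate([z, cond], axis=1) @ W1)
    return (h @ W2).reshape(n_samples, 5, 10)    # fake single-channel feature maps

fake = generate(label=0.15)  # e.g. augment the 0.1-0.2 PERCLOS range
```

In the real algorithm a discriminator, also conditioned on the label, would be trained adversarially against this generator; the sketch only shows how the label steers what is generated.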

Discussion

Impact of new feature maps and LN layer

As seen from the results in Table 5, the change in the approach to constructing the feature maps improves the accuracy of the vigilance estimation. The proposed model learns discriminative information from the signals through the CapsNet, which consists of two structures: the primary capsule layer (learning part information) and the digit capsule layer (learning whole information), enabling the CapsNet to concentrate not only on the information contained in the feature maps, but also on the positional relationships between different pieces of information (Sabour et al. 2017). Hence, "providing more positional information" is the fundamental reason for changing the construction method of the feature maps. By reconstructing the traditional 2-channel feature maps into a single channel, different kinds of features can be connected, thus expanding the range that the CapsNet is able to learn and reducing the error in the final estimation.
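The routing-by-agreement step that connects the primary and digit capsules (Sabour et al. 2017) can be sketched as follows; the capsule counts and dimensions are illustrative, not those of the proposed network:

```python
# Routing-by-agreement between primary and digit capsules, per Sabour et al.
# (2017): coupling coefficients are iteratively sharpened toward digit
# capsules that agree with a primary capsule's prediction.
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    # Nonlinearity that keeps vector orientation and bounds length below 1.
    n2 = (s ** 2).sum(axis=axis, keepdims=True)
    return (n2 / (1 + n2)) * s / np.sqrt(n2 + eps)

def route(u_hat, iterations=3):
    # u_hat: prediction vectors, shape (n_primary, n_digit, dim_digit)
    n_primary, n_digit, _ = u_hat.shape
    b = np.zeros((n_primary, n_digit))            # routing logits
    for _ in range(iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coeffs
        s = (c[..., None] * u_hat).sum(axis=0)    # weighted sum per digit capsule
        v = squash(s)                             # (n_digit, dim_digit)
        b = b + (u_hat * v[None]).sum(axis=-1)    # agreement updates the logits
    return v

u_hat = np.random.default_rng(5).normal(size=(32, 10, 16))
v = route(u_hat)  # digit capsule outputs; each has vector length < 1
```

It is this vector (rather than scalar) message passing that lets the network encode where a pattern sits in the map, which is what the single-channel feature map construction exploits.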

Meanwhile, the results of vigilance estimation under different normalization layers are shown in Table 6, which indicates that the LN layer is more suitable for the CapsNet than the BN layer. This is because each feature on the feature maps we constructed represents information from a different frequency band, and the LN layer normalizes the features within the same layer; such an operation strengthens the connection between different pieces of information and helps the CapsNet learn the global information.

Role of self-attention mechanism

Table 3 reflects the excellent performance of the self-attention mechanism in the proposed model. To further analyze the reasons for this superior performance, we visualize the weights of the feature maps after the self-attention mechanism, as shown in Fig. 9. It can be seen from the figure that the various parts of the feature maps are given different weight coefficients by the self-attention mechanism, which helps the network learn the priority information and thus produce more accurate estimates. In addition, Fig. 9 shows the weight coefficients corresponding to different feature maps: each feature map has its own weight matrix, and as the feature map varies, the emphasis of attention also changes, which is consistent with intuition.

Fig. 9.


Weight heatmap of features after the self-attention mechanism. The size of a single heatmap is 5×10, the same as the size of the feature map. Each color block represents the weight of a feature and corresponds to the position of the feature map. Min-max normalization is employed to rescale the weights to [0, 1], and a higher value indicates that the corresponding feature receives more attention

Impact of CGAN

From Table 7, it can be seen that the inclusion of CGAN further reduces the estimation error. To further evaluate the effectiveness of CGAN, we employ t-SNE (Maaten and Geoffrey 2008) to visualize the high-dimensional features, as shown in Fig. 10. To observe the validity of the generated data more clearly, we extract the data with PERCLOS labels between 0.1 and 0.2 separately for visualization. Blue points indicate the real data of the 22 subjects used as the training set in the SEED-VIG dataset, orange points represent the data generated by the CGAN, which learns the distribution of the real data, and green points denote the real data of the one remaining subject used as the test set. The generated data lie close to the real data, which means that the generated data carry enough realistic information. In addition, the test data are well covered by the distribution of the generated data, including some regions not covered by the real data; this indicates that the generated data supplement the training data and explains the improved accuracy of the model with CGAN.

Fig. 10.


Two-dimensional visualization of real data (blue points), generated data (orange points) and test data (green points) from one experiment in the SEED-VIG dataset (data with PERCLOS between 0.1 and 0.2)
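The visualization pipeline can be sketched with scikit-learn's t-SNE, using random stand-ins for the three feature sets; the distributions and sizes here are placeholders, not the SEED-VIG data:

```python
# Sketch of the t-SNE embedding behind Fig. 10: real, CGAN-generated, and
# test feature vectors are stacked and projected to two dimensions.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(6)
real = rng.normal(0.0, 1.0, size=(40, 50))        # real training features
generated = rng.normal(0.1, 1.0, size=(40, 50))   # CGAN-generated features
test = rng.normal(0.2, 1.0, size=(20, 50))        # held-out subject's features

X = np.vstack([real, generated, test])
emb = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(X)
# emb[:40] -> blue (real), emb[40:80] -> orange (generated), emb[80:] -> green (test)
```

Plotting the three slices of `emb` in different colors reproduces the kind of overlap analysis described above.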

Performance analysis

To evaluate the performance of the proposed algorithm, we compare the regression results with other advanced algorithms on SEED-VIG and SADT. Tables 3 and 4 show the comparison results among SVR (Zhang et al. 2015), CNN (Zhang and Etemad 2021), LSTM (Zhang and Etemad 2021), LSTM-CapsAtt (Zhang and Etemad 2021), ADDA (Lu et al. 2018) and our proposed algorithm. We observe that our architecture achieves better regression performance than the others. Compared with the traditional CNN and LSTM, we use the improved CapsNet as the basic skeleton to transmit vector information between capsules, which can better handle changes in the position, direction, and size of objects in the feature maps. We reconstruct the output of the digit capsule layer and compare it with the original input feature map; the result is shown in Fig. 11. Compared with the original feature map, the reconstructed feature map restores most of the information, which shows the superiority of vector information transmission in the CapsNet. In addition, compared with the LSTM-CapsAtt (Zhang and Etemad 2021) network, we introduce the self-attention mechanism to enhance the key information of the features based on the improved CapsNet, as shown in Fig. 9, which strengthens the useful features, adapts better to different data, and improves the robustness of the network.

Fig. 11.


Heatmap of feature maps. The position of the heatmap corresponds to the feature map, and its shape is 5×10. (a) shows the data distribution of the original input feature map. (b) shows the feature heatmap obtained by reconstructing the output of the digit capsule layer

We propose to use CGAN to augment the training data. The results in Table 7 prove the effectiveness of this operation, and the visualization results in Fig. 10 illustrate why the operation improves the performance of the algorithm. ADDA (Lu et al. 2018) uses a portion of the test data to guide the trained model's adaptation to new subjects' data. However, obtaining new individual data to guide model adaptation requires additional data collection, which increases the cost. Different from ADDA, in our proposed algorithm the application of CGAN addresses the adaptability problem for new subjects well: by learning the distribution of a large amount of data, it simulates new data with a distribution similar to the test data, which reduces the cost of collecting new data.

Feasibility and advantages of single channel

In this paper, we propose to use single-channel EEG signals for cross-subject vigilance estimation, and the results in Table 3 demonstrate the feasibility of this idea. The Channel 5 and Channel 6 signals perform well compared to the EEG signals of the four channels combined. The error of vigilance estimation for each channel is further reduced with the proposed model, and the performance of Channel 6 is even more stable than that of the combined channels, which proves the feasibility of using single-channel EEG signals to estimate vigilance. Furthermore, according to the results in Fig. 8, estimation from the signals of a single channel provides better real-time performance, which facilitates warning the driver in time to prevent accidents.

Furthermore, from Table 3, we observe that Channel 6 performs better than the other single channels in the SEED-VIG dataset, whereas in the SADT dataset (Table 4) the performance of channels FP1 and FP2 is equivalent. A possible reason is that Channels 4, 5, and 7 are close to each other, so noise generated during signal acquisition interferes across them, while Channel 6 is farther away, is less affected by such noise, and thus yields more effective information. In the EEG acquisition montage of SADT, channels FP1 and FP2 are far from each other and from the other channels, so there is no such interference problem, and the fatigue detection accuracy of the two channels is similar.

Limitation

Although this work presents a novel method for cross-subject vigilance estimation using single-channel EEG signals and the performance is improved through the proposed algorithm, some efforts can still be attempted. The validation of the algorithm in this study is performed on publicly available datasets; therefore, in order to further demonstrate its effectiveness, validation in practical scenarios is needed, which is part of our subsequent work. Furthermore, in this study we only extract EEG signals, in order to minimize the power consumption of the algorithm. However, in recent studies, multimodal fusion for cross-subject vigilance estimation has performed excellently (Zhao et al. 2022; Chen et al. 2021; Zhang et al. 2023). Hence, in subsequent work we will try multimodal fusion to further improve the algorithm's accuracy.

Conclusions

In this paper, we propose a cross-subject vigilance estimation method based on single-channel EEG signals from the forehead. To improve the estimation accuracy, we propose a capsule network capable of sensing both local and global information for vigilance estimation. We also modify the construction method of the feature maps and focus on the key information using a self-attention mechanism. The performance of the proposed algorithm is validated on the SEED-VIG dataset, and the experimental results show that the algorithm has higher accuracy and stability than other mainstream algorithms, while the ablation experiments prove the necessity of the proposed components. We also compare the single-channel results with the combined forehead four-channel ones; the results demonstrate that the estimation accuracy of the single-channel signals is comparable to that of the combined signals, while the computing time is significantly improved. On this basis, we conclude that it is feasible to use single-channel signals for vigilance estimation in practical applications. In addition, the role of CGAN in expanding the training data to improve the accuracy of single-channel estimation is also proved by ablation experiments and data visualization. Through this investigation of cross-subject vigilance estimation using single-channel signals, our proposed method can be further integrated with existing lightweight EEG equipment, offering the possibility of real-time wearable vigilance warning devices.

Funding

This work was funded by the National Key R&D Program of China (grant number 2017YFD0600904).

Declarations

Conflict of interest

The authors declare no conflict of interest.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Alioua N, Amine A, Rziza M (2014) Driver's fatigue detection based on yawning extraction. Int J Veh Technol 2014:47–75
2. Bergasa LM, Nuevo J, Sotelo MA, Vázquez M (2006) Real-time system for monitoring driver vigilance. IEEE Trans Intell Transp Syst 7:63–77
3. Buendia R, Forcolin F, Karlsson J, Sjöqvist BA, Anund A, Candefjord S (2019) Deriving heart rate variability indices from cardiac monitoring-an indicator of driver sleepiness. Traffic Inj Prev 20:249–254
4. Cao Z, Chuang C-H, King J-K, Lin C-T (2019) Multi-channel EEG recordings during a sustained-attention driving task. Sci Data
5. Chao H, Dong L, Liu Y, Lu B (2019) Emotion recognition from multiband EEG signals using CapsNet. Sensors 19
6. Chen S, Kaili X, Yao X, Ge J, Li L, Zhu S, Li Z (2021) Information fusion and multi-classifier system for miner fatigue recognition in plateau environments based on electrocardiography and electromyography signals. Comput Methods Programs Biomed 211:106451
7. Chuang CH, Ko LW, Lin YP, Jung TP, Lin CT (2014) Independent component ensemble of EEG for brain-computer interface. IEEE Trans Neural Syst Rehabil Eng 22:230–238
8. Dong Y, Hu Z, Uchimura K, Murayama N (2011) Driver inattention monitoring system for intelligent vehicles: a review. IEEE Trans Intell Transp Syst 12:596–614
9. Flores MJ, Armingol JM, de la Escalera A (2010) Driver drowsiness warning system using visual information for both diurnal and nocturnal illumination conditions. In: Ad Hoc networks
10. Gao Z, Wang X, Yang Y, Chaoxu M, Cai Q, Dang W, Zuo S (2019) EEG-based spatio-temporal convolutional neural network for driver fatigue evaluation. IEEE Trans Neural Netw Learn Syst 30:2755–2763
11. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Neural information processing systems
12. Guarda L, Tapia J, Droguett EL, Ramos M (2022) A novel capsule neural network based model for drowsiness detection using electroencephalography signals. Expert Syst Appl 201
13. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 770–778
14. Jianchao L, Zheng X, Tang L, Zhang T, Sheng QZ, Wang C, Jin J, Shui Yu, Zhou W (2021) Can steering wheel detect your driving fatigue? IEEE Trans Veh Technol 70:5537–5550
15. Jiang Y, Zhang Y, Lin C, Dongrui W, Lin C-T (2021) EEG-based driver drowsiness estimation using an online multi-view and transfer TSK fuzzy system. IEEE Trans Intell Transp Syst 22(3):1752–1764
16. Jiao Y, Deng Y, Luo Y, Lu BL (2020) Driver sleepiness detection from EEG and EOG signals using GAN and LSTM networks. Neurocomputing 408:100–111
17. Kamakura Y, Ohsuga M, Inoue Y, Noguchi Y (2007) Classification of blink waveforms towards the assessment of driver's arousal level. Trans Soc Autom Engineers Jpn 38:173–178
18. Kingma D, Ba J (2014) Adam: a method for stochastic optimization. Comput Sci
19. Ko LW, Komarov O, Lai WK, Liang WG, Jung TP (2020) Eyeblink recognition improves fatigue prediction from single-channel forehead EEG in a realistic sustained attention task. J Neural Eng 17:036015
20. Kong W, Zhou Z, Jiang B, Babiloni F, Borghini G (2017) Assessment of driving fatigue based on intra/inter-region phase synchronization. Neurocomputing 219:474–482
21. Ko W, Oh K, Jeon E, Suk H-I (2020) VigNet: a deep convolutional neural network for EEG-based driver vigilance estimation. In: 2020 8th International Winter Conference on Brain-Computer Interface (BCI), pp 1–3
22. Liu Y, Lan Z, Cui J, Sourina O, Muller-Wittig W (2019) EEG-based cross-subject mental fatigue recognition. In: Proceedings - 2019 International Conference on Cyberworlds, CW 2019, pp 247–252
23. Lu BL, Li H, Zheng WL (2018) Multimodal vigilance estimation with adversarial domain adaptation networks. In: 2018 International Joint Conference on Neural Networks (IJCNN), pp 1–6
24. Luo H, Qiu T, Liu C, Huang P (2019) Research on fatigue driving detection using forehead EEG based on adaptive multi-scale entropy. Biomed Signal Process Control 51:50–58
25. Maaten L, Geoffrey H (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605
26. Ma B, Li H, Luo Y, Lu BL (2019) Depersonalized cross-subject vigilance estimation with adversarial domain generalization. In: 2019 International Joint Conference on Neural Networks (IJCNN), pp 1–8
27. Maskeliunas R, Damasevicius R, Martisius I, Vasiljevas M (2016) Consumer-grade EEG devices: are they usable for control tasks? PeerJ
28. Murphy-Chutorian E, Trivedi MM (2010) Head pose estimation and augmented reality tracking: an integrated system and evaluation for monitoring driver awareness. IEEE Trans Intell Transp Syst 11:300–311
29. Naeije M, Zorn H (1982) Relation between EMG power spectrum shifts and muscle fibre action potential conduction velocity changes during local muscular fatigue in man. Eur J Appl Physiol 50:23–33
30. Pei Z, Wang H, Bezerianos A, Li J (2021) EEG-based multiclass workload identification using feature fusion and selection. IEEE Trans Instrum Meas 70:4001108
31. Ratti E, Waninger S, Berka C, Ruffini G, Verma A (2017) Comparison of medical and consumer wireless EEG systems for use in clinical trials. Front Hum Neurosci 11
32. Rogado E, García JL, Barea R, Bergasa LM, López E (2009) Driver fatigue detection system. In: 2008 IEEE International Conference on Robotics and Biomimetics (ROBIO 2008), pp 1105–1110
33. Sabour S, Frosst N, Hinton GE (2017) Dynamic routing between capsules. In: Advances in Neural Information Processing Systems, pp 3856–3866
34. Sikander G, Anwar S (2019) Driver fatigue detection systems: a review. IEEE Trans Intell Transp Syst 20:2339–2352
35. Trutschel U, Sirois B, Sommer D, Golz M, Edwards D (2011) PERCLOS: an alertness measure of the past. In: Driving Assessment: International Driving Symposium on Human Factors in Driver Assessment
36. Tuncer T, Dogan S, Ertam F, Subasi A (2021) A dynamic center and multi threshold point based stable feature extraction network for driver fatigue detection utilizing EEG signals. Cogn Neurodyn 15:223–237
37. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in Neural Information Processing Systems
38. Wei W, Jonathan Wu QM, Sun W, Yang Y, Yuan X, Zheng WL, Lu BL (2021) A regression method with subnetwork neurons for vigilance estimation using EOG and EEG. IEEE Trans Cogn Dev Syst 13:209–222
39. Wu W, Sun W, Jonathan Wu QM, Yang Y, Zhang H, Zheng WL, Lu BL (2022) Multimodal vigilance estimation using deep learning. IEEE Trans Cybern 52:3097–3110
40. Yue W, Ji Q (2019) Facial landmark detection: a literature survey. Int J Comput Vis 127:115–142
41. Zeng H, Yang C, Dai G, Qin F, Zhang J, Kong W (2018) EEG classification of driver mental states by deep learning. Cogn Neurodyn 12:597–606
42. Zhang Y-F, Gao X-Y, Zhu J-Y, Zheng W-L, Lu B-L (2015) A novel approach to driving fatigue detection using forehead EOG. In: 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), pp 707–710
43. Zhang G, Etemad A (2021) Capsule attention for multimodal EEG-EOG representation learning with application to driver vigilance estimation. IEEE Trans Neural Syst Rehabil Eng 29:1138–1149
44. Zhang C, Sun L, Cong F, Kujala T, Ristaniemi T, Parviainen T (2020) Optimal imaging of multi-channel EEG features based on a novel clustering technique for driver fatigue detection. Biomed Signal Process Control 62:102103
45. Zhang Y, Guo R, Peng Y, Kong W, Nie F, Bao-Liang L (2022) An auto-weighting incremental random vector functional link network for EEG-based driving fatigue detection. IEEE Trans Instrum Meas 71:1–14
46. Zhang Y, Guo H, Zhou Y, Chengji X, Liao Y (2023) Recognising drivers' mental fatigue based on EEG multi-dimensional feature selection and fusion. Biomed Signal Process Control 79:104237
47. Zhao L, Li M, He Z, Ye S, Qin H, Zhu X, Dai Z (2022) Data-driven learning fatigue detection system: a multimodal fusion approach of ECG (electrocardiogram) and video signals. Measurement 201:111648
48. Zheng WL, Lu BL (2017) A multimodal approach to estimating vigilance using EEG and forehead EOG. J Neural Eng 14
