Abstract
Effective seizure detection from long-term EEG is highly important for seizure diagnosis. Existing methods usually design the feature and classifier separately, and little work has been done on optimizing the two parts simultaneously. This work proposes a deep network that jointly learns a feature and a classifier so that the two can reinforce each other and make the whole system optimal. To deal with the impulsive noises and outliers caused by EMG artifacts in EEG signals, we formulate a robust stacked autoencoder (R-SAE) as part of the network to learn an effective feature. In the R-SAE, the maximum correntropy criterion (MCC) is adopted to reduce the effect of noise and outliers: unlike the mean square error (MSE), the kernel-based MCC cost grows only slowly as the input moves away from the center, so noises and outliers lying far from the center are suppressed. The proposed method is evaluated on 33.6 hours of scalp EEG data from six patients. Our method achieves a sensitivity of 100% and a specificity of 99%, which is promising for clinical applications.
1. Introduction
Epilepsy is a common and serious brain disorder, which affects about 50 million people worldwide [1]. Epileptic seizures are characterized by convulsions, loss of consciousness, and muscle spasms resulting from excessive synchronization of neuronal activities in the brain [2]. The abnormal neuronal discharges lead to epileptic patterns such as closely spaced spikes and slow waves in the electroencephalogram (EEG). In seizure diagnosis and evaluation, visual inspection of these epileptic patterns in long-term EEG is a routine job for doctors, which can be highly tedious and time-consuming [3]. Therefore, a reliable seizure detection system that identifies seizure events automatically would facilitate seizure diagnosis and has great potential for clinical applications.
There are two key points in automatic seizure detection. The first is how to capture the diverse patterns of seizure EEG. The morphologies of seizure patterns can vary considerably across individuals, so effective feature extraction plays a key role in seizure detection, and many efforts have been made. To characterize the changes in amplitude and energy in epileptic EEG, Saab and Gotman [4] proposed three measures: relative average amplitude, relative scale energy, and coefficient of variation of amplitude. Similarly, Majumdar and Vardhan [5] used the windowed variance of the differentiated signal to detect significant changes in EEG. To identify the sharp waves that typically appear in seizure signals, Yadav et al. [6] introduced a morphology-based detector built on the slopes of the half-waves of the signal. To characterize the intrinsic time-frequency components of seizure patterns, Ghosh-Dastidar et al. [7] used principal component analysis and Zandi et al. [8] applied the wavelet transform to decompose the EEG signal for feature enhancement. To encode the changes in the dynamics of the epileptic signal, Jouny and Bergey [9] used the nonlinear measures of sample entropy and Lempel-Ziv complexity. To describe the topological state of the epileptic brain, Santaniello et al. [10] transformed the multichannel EEG data into a cross-power matrix and used the eigenvalues of the matrix for seizure detection. The second key point is how to reduce the effect of noise. Noises caused by electromyography (EMG) or electrode movements commonly appear in EEG signals and are prone to trigger false alarms. These artifacts can introduce impulsive, large-amplitude changes in the EEG signal and produce outlying values in the feature space. Some existing methods simply assume these noises to be Gaussian [11, 12] and are therefore fragile in the presence of many outliers. Other approaches apply specific false-alarm avoidance methods against these noises [4–6].
Although existing methods have shown strengths on specific EEG datasets, the following problems have not yet been well explored. First, most existing features are designed from observations of a few seizure patterns, which is too empirical to cover a wide range of seizure patterns; thus the features are usually suboptimal. Second, existing methods can be sensitive to noises in EEG signals. Artifacts caused by EMG or electrode movements can produce an EEG signal shape similar to that of seizure states, so a simple Gaussian assumption for the noises can be incorrect, and approaches built on it can produce high false alarm rates [11, 12]. Finally, most methods design the feature and classifier separately. Few efforts have been made to study the relationship between them or to optimize both parts simultaneously so as to make the most of each.
Inspired by the great success of deep networks in image retrieval, speech recognition, and computer vision [13–21], this paper proposes a deep model framework to deal with the above issues. The main contributions of our work can be summarized as follows.
Instead of manually designing a feature, we propose a network called robust stacked autoencoder (R-SAE) to automatically learn a feature to represent seizure patterns. The reconstruction error is first used to learn an initial feature.
To reduce the effect of noises in EEG signals, we introduce the maximum correntropy criterion (MCC) into the R-SAE network. Unlike the traditional autoencoder, which uses the mean square error (MSE) as the reconstruction cost, the kernel-based MCC cost increases much more slowly than the MSE cost as the input moves away from the center. Thus, the effect of noises and outliers positioned far from the center is suppressed.
The R-SAE part and the classification part are integrated into a new deep network whose objective is the best seizure classification accuracy. Thus, both the initial feature and the classifier are optimized according to the detection objective, making the whole detection system as close to optimal as possible. Moreover, the learned feature is completely data-driven: given enough training data, the feature learned by our method is able to represent a wide variety of seizure patterns.
Our method is evaluated on 33.6 hours of EEG signals from six patients. With the MCC-based R-SAE model, robust features are extracted from noisy EEG signals, and the sensitivity and specificity increase by 14% and 1%, respectively, compared with the standard stacked autoencoder (S-SAE). With supervised joint optimization of our deep model, the features are further refined with better separability in the feature space, and the sensitivity and specificity increase by a further 8% and 15%, respectively. In comparison with other methods, the proposed R-SAE model outperforms the competitors and achieves a sensitivity of 100% and a specificity of 99%.
The rest of this paper is organized as follows. Section 2 presents the details of the R-SAE deep model. The experimental results and discussions are given in Section 3. Finally, we draw conclusions in Section 4.
2. Materials and Methods
The framework of our method is shown in Figure 1. The multichannel EEG signals are first divided into short-time segments, and the cross-power matrix of each segment is calculated to reveal the spatial patterns of the brain. Compact features are then extracted from the cross-power matrix by a deep network cascaded to a softmax classifier. In our method, the deep network is first pretrained with the R-SAE model to extract useful features, and the features are then optimized jointly with the classifier to obtain an optimal seizure detection system.
Figure 1. Framework of our method.
2.1. EEG Data
Scalp EEG data of six patients are used in this study. The EEG data were recorded during long-term presurgical epilepsy monitoring with a NicoletOne amplifier at the Second Affiliated Hospital of Zhejiang University College of Medicine. A total of 28 channels were acquired at a sampling rate of 256 Hz according to the 10–20 electrode placement system. The details of the EEG data are given in Table 1. For each patient, all the available seizure EEG signals are used, and we randomly choose two 2.8-hour-long EEG segments as the nonseizure data.
Table 1.
Patient information and selected frequency bands.
| Patient | Sex | Channels | Seizures | Hours | Frequency band |
|---|---|---|---|---|---|
| Pt01 | Female | 28 | 2 | 5.6 | 14–30 Hz |
| Pt02 | Female | 28 | 2 | 5.6 | 8–13 Hz |
| Pt03 | Female | 28 | 3 | 5.6 | 4–7 Hz |
| Pt04 | Male | 28 | 3 | 5.6 | 14–30 Hz |
| Pt05 | Male | 28 | 3 | 5.6 | 8–13 Hz |
| Pt06 | Male | 28 | 3 | 5.6 | 8–13 Hz |
2.2. Segmentation and Data Preparation
In the preprocessing stage, the multichannel EEG data are divided into 5-second-long segments with a sliding window. For each patient, a total of 4000 nonseizure segments and 1000 seizure segments are extracted from the EEG signals. There is no overlap between nonseizure segments; for seizure segments, the amount of overlap is chosen according to the total length of the seizure signal and the number of segments required.
After segmentation, all the segments are shuffled, and we randomly pick 750 seizure segments and 750 nonseizure segments as the training set; the remaining 3500 segments are used as the testing set. All experiments are carried out on the same training and testing sets.
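As a rough illustration of this preprocessing step, a windowing routine might look like the following sketch (Python/NumPy; the function name and the overlap handling are our own illustrative choices, not part of the original implementation):

```python
import numpy as np

def segment_eeg(data, fs=256, win_sec=5, step_sec=5):
    """Cut a (channels, samples) EEG array into fixed-length windows.

    step_sec == win_sec gives the non-overlapping windows used for
    nonseizure data; step_sec < win_sec yields overlapping windows,
    as used here to obtain enough seizure segments.
    """
    win, step = int(win_sec * fs), int(step_sec * fs)
    starts = range(0, data.shape[1] - win + 1, step)
    return np.stack([data[:, s:s + win] for s in starts])

# Hypothetical usage mirroring the protocol above:
# seiz = segment_eeg(seizure_eeg, step_sec=1)   # overlapping seizure windows
# nons = segment_eeg(nonseizure_eeg)            # non-overlapping windows
```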
2.3. Multichannel Analysis
Studies have shown that the correlation structure of all pairs of EEG channels could reflect the spatiotemporal evolution of electrical ictal activities [22–24]. By characterizing the spatiotemporal patterns, it is possible to identify seizures and analyze seizure dynamics.
In this study, we adopt the cross-power matrix [10] to reflect the spatial patterns of the brain. For each time window with N channels, the cross-power matrix A is N × N. Each element a_ij of A is defined by the cross-power [10] between EEG channels i and j in a given frequency band [lb, ub] as follows:

$$a_{ij} = \int_{lb}^{ub} \left| P_{ij}(\omega) \right| \, d\omega \qquad (1)$$

where P_ij(ω) is the cross-power spectral density of channels i and j at frequency ω.
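As a sketch of how (1) might be computed in practice, the following uses SciPy's Welch-based cross-spectral density estimator; the paper does not state which spectral estimator was used, so `csd` and its settings here are assumptions:

```python
import numpy as np
from scipy.signal import csd

def cross_power_matrix(segment, fs=256, band=(14.0, 30.0)):
    """N x N cross-power matrix A for one (channels, samples) segment.

    Each a_ij integrates the magnitude of the cross-power spectral
    density P_ij(w) over the frequency band [lb, ub], as in (1).
    """
    n_ch = segment.shape[0]
    A = np.zeros((n_ch, n_ch))
    for i in range(n_ch):
        for j in range(i, n_ch):
            f, Pij = csd(segment[i], segment[j], fs=fs, nperseg=fs)
            sel = (f >= band[0]) & (f <= band[1])
            A[i, j] = A[j, i] = np.trapz(np.abs(Pij[sel]), f[sel])
    return A
```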
2.4. Frequency Band Selection
Considering the diversity of epileptic patterns among patients, we choose the frequency band patient-specifically from the theta (4–7 Hz), alpha (8–13 Hz), and beta (14–30 Hz) bands. In order to select the frequency band that best reflects the difference between seizure and nonseizure states, we adopt Fisher's discriminant ratio (FDR) [25] as the criterion:

$$\mathrm{FDR} = \frac{(\mu_s - \mu_n)^2}{\sigma_s^2 + \sigma_n^2} \qquad (2)$$

where μ_s and σ_s² are the mean and variance of the cross-power features of seizure segments, and μ_n and σ_n² are those of nonseizure segments. For each patient, only the training segments are used for frequency band selection, and the band with the highest FDR is used for seizure detection. The band selected for each patient is shown in Table 1.
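A minimal sketch of the band selection, assuming the FDR of (2) is computed element-wise on flattened cross-power matrices and summed into one score per band (the aggregation rule is our assumption):

```python
import numpy as np

def fdr_score(feats_seizure, feats_nonseizure):
    """Fisher's discriminant ratio (2), summed over feature dimensions.

    Both inputs are (segments, features) arrays of flattened
    cross-power matrices from the training set only.
    """
    mu_s, mu_n = feats_seizure.mean(0), feats_nonseizure.mean(0)
    var_s, var_n = feats_seizure.var(0), feats_nonseizure.var(0)
    return np.sum((mu_s - mu_n) ** 2 / (var_s + var_n + 1e-12))

# Pick the band with the highest FDR on the training segments, e.g.:
# bands = {"theta": (4, 7), "alpha": (8, 13), "beta": (14, 30)}
# best = max(bands, key=lambda b: fdr_score(F_s[b], F_n[b]))  # F_s, F_n hypothetical
```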
2.5. Robust Stacked Autoencoder
After multichannel analysis, each time window is represented by an N × N cross-power matrix, where N denotes the number of EEG channels. We propose to employ robust stacked autoencoders to extract reliable, compact features from this matrix.
In this section, first, we briefly introduce the basic autoencoder. Then, the robust autoencoder with MCC is presented to improve the feature learning ability under noises. Finally, we stack the robust autoencoders into a deep model for compact feature extraction.
2.5.1. Basic Autoencoder
Here, we begin with the traditional standard stacked autoencoder (S-SAE) model. An autoencoder is a three-layer artificial neural network comprising an encoder and a decoder. The encoder takes an input vector x and maps it to a hidden representation x′ through a nonlinear function:

$$x' = s\left(W^{(1)} x + b^{(1)}\right) \qquad (3)$$

where s(·) is the sigmoid function. Suppose x and x′ are d-dimensional and d′-dimensional vectors, respectively; then W^(1) is a d′ × d weight matrix and b^(1) is a d′-dimensional bias vector.
Then, the vector x′ is mapped back to a reconstruction vector y by the decoder:

$$y = s\left(W^{(2)} x' + b^{(2)}\right) \qquad (4)$$

where the output vector y is d-dimensional, W^(2) is a d × d′ weight matrix, and b^(2) is a d-dimensional bias vector.
The parameter set θ = {W^(1), b^(1), W^(2), b^(2)} is optimized by minimizing the average reconstruction error over the n training samples:

$$\theta^{*} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} L\left(x^{(i)}, y^{(i)}\right) \qquad (5)$$

where L is the loss function. Most commonly, the mean square error (MSE) is used:

$$L(x, y) = \left\| x - y \right\|^2 \qquad (6)$$
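For concreteness, (3)–(6) amount to the following forward pass and loss (a minimal NumPy sketch; the variable names are ours):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencode(x, W1, b1, W2, b2):
    """Forward pass of the basic autoencoder: encoder (3), decoder (4)."""
    h = sigmoid(W1 @ x + b1)   # hidden code x', dimension d'
    y = sigmoid(W2 @ h + b2)   # reconstruction, dimension d
    return h, y

def mse_loss(x, y):
    """Squared reconstruction error (6)."""
    return np.sum((x - y) ** 2)
```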
2.5.2. Robust Autoencoder
The traditional autoencoder model based on the MSE loss is not suitable for stable feature learning from EEG signals. In EEG, especially scalp EEG, the large amount of noise caused by EMG artifacts or electrode movements can introduce abrupt changes in the signal and produce outliers in both the time and frequency domains. A typical example is shown in Figure 2. In this time window, the EEG signals are contaminated by short-term EMG artifacts, which cause abrupt large-amplitude oscillations in some of the channels, as shown in Figure 2(a). In the cross-power domain, such artifacts produce outlying large values, visible as the light blocks in Figure 2(b). In the example illustrated, the cross-power between channel 17 and channel 18 is 5.41 × 10^4, far from the interquartile range value of 395.3. In this situation, the MSE-based cost of the traditional autoencoder can be dominated by these outliers, weakening its feature learning ability.
Figure 2. An EEG segment with impulsive noises. (a) EMG artifacts cause short-term burst noises in some channels of the EEG signal; (b) visualization of the cross-power matrix of the segment with noises. The vertical and horizontal axes denote the channels, and each point (i, j) is the cross-power value of channels i and j. The cross-power matrix contains outliers with large values: because of the noise, the cross-power between channel 17 and channel 18 is far from the interquartile range value (5.41 × 10^4 versus 395.3).
In order to learn robust features from EEG signals, we replace the loss function of the autoencoder model with a correntropy-based criterion to build a robust autoencoder.
Maximum Correntropy Criterion. Correntropy is defined as a localized similarity measure [26] and it has shown good outlier suppression ability in studies [27, 28]. For two random variables X and Y, the correntropy is defined as
$$V_{\sigma}(X, Y) = E\left[\kappa_{\sigma}(X - Y)\right] \qquad (7)$$
where E[·] denotes mathematical expectation and κ_σ(·) is the Gaussian kernel with kernel size σ:

$$\kappa_{\sigma}(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right) \qquad (8)$$
Correntropy induces a new metric in which, as the distance between X and Y grows, the equivalent norm evolves from the 2-norm to the 1-norm, and eventually to the zero-norm when X and Y are far apart [29]. Compared with second-order statistics such as MSE, correntropy is therefore less sensitive to outliers. Figure 3 compares the second-order and correntropy costs: as the input x moves away from the center, the second-order cost increases sharply, making it sensitive to outliers, whereas the correntropy cost is sensitive only within a local range and grows extremely slowly once the input leaves the central area. This makes the correntropy measure particularly effective for outlier suppression.
Figure 3. Illustration of second-order cost (red solid line) and correntropy cost (purple dashed line).
In practice, the joint probability density function is unknown and only a finite set of samples {(x_i, y_i)}_{i=1}^{N} of X and Y is available; the correntropy can then be estimated by

$$\hat{V}_{N,\sigma}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} \kappa_{\sigma}\left(x_i - y_i\right) \qquad (9)$$
Maximizing the correntropy in (9) is called the maximum correntropy criterion (MCC) [29]. Owing to the good outlier rejection property of correntropy, MCC is well suited to robust algorithm design.
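The estimator (9) is straightforward to implement. The toy comparison below (our own illustration, not from the paper) shows the outlier behaviour of Figure 3: a single impulsive error inflates the MSE dramatically but changes the correntropy by at most 1/N of the kernel peak:

```python
import numpy as np

def gaussian_kernel(e, sigma):
    """kappa_sigma(e), the Gaussian kernel of (8)."""
    return np.exp(-e ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def correntropy(x, y, sigma=0.05):
    """Sample estimate of correntropy, (9)."""
    return np.mean(gaussian_kernel(x - y, sigma))

x = np.zeros(100)
y = np.zeros(100)
y[0] = 50.0                                   # one impulsive outlier
print(np.mean((x - y) ** 2))                  # MSE: 25.0, dominated by the outlier
print(correntropy(x, x), correntropy(x, y))   # correntropy drops by only 1/100 of the peak
```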
Robust Autoencoder Based on MCC. In order to improve the noise robustness of traditional autoencoders, we measure the reconstruction loss between the input vector x and the output vector y by MCC instead of MSE. In the MCC-based robust autoencoder, the cost function is formulated as

$$J_{\text{MCC}}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m} \kappa_{\sigma}\left(x_{ij} - y_{ij}\right) \qquad (10)$$
where n is the number of training samples and m is the length of each training sample. The optimal parameter set θ is obtained when J_MCC(θ) is maximized.
In order to encourage the deep model to capture more implicit patterns, a sparsity-inducing term is adopted. Studies of sparse coding have shown that sparseness plays a key role in learning useful features [30, 31]. Xie et al. [32] combined the virtues of sparse coding and deep networks in a sparse stacked denoising autoencoder to achieve better feature learning and denoising performance. In our model, we regularize the reconstruction loss by a sparsity-inducing term defined as in [32]:

$$J_{\text{sparse}}(\theta) = \beta \sum_{j=1}^{s_2} \left[ \rho \log\frac{\rho}{\hat{\rho}_j} + (1 - \rho)\log\frac{1 - \rho}{1 - \hat{\rho}_j} \right] \qquad (11)$$

where β is the weight adjustment parameter, s_2 is the number of units in the second (hidden) layer, ρ̂_j is the average activation value of the jth hidden unit over the training samples, and ρ is a small constant. The sparsity-inducing term constrains each ρ̂_j to be near ρ under the Kullback-Leibler divergence.
Also, a weight decay term J_weight(θ) is added to avoid overfitting:

$$J_{\text{weight}}(\theta) = \frac{\lambda}{2} \sum_{l=1}^{2} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left(w_{ji}^{(l)}\right)^2 \qquad (12)$$

where w_ji^(l) is an element of W^(l), λ is the parameter adjusting the weight of J_weight(θ), and s_l denotes the number of units in layer l. Therefore, the cost function of the proposed robust autoencoder is defined as
$$J_{\text{R-SAE}}(\theta) = -J_{\text{MCC}}(\theta) + J_{\text{sparse}}(\theta) + J_{\text{weight}}(\theta) \qquad (13)$$
By minimizing the cost J_R-SAE(θ), the parameter set θ is optimized; the MCC term enters with a negative sign since the correntropy is to be maximized while the overall cost is minimized.
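Putting (10)–(13) together, the full cost of one robust autoencoder might be sketched as follows (NumPy; written in minimization form as above, with helper names of our own choosing):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kappa(e, sigma):
    # Gaussian kernel of (8)
    return np.exp(-e ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def r_sae_cost(X, W1, b1, W2, b2, sigma=0.05, beta=3.0, lam=0.003, rho=0.1):
    """Cost (13) for a batch X of shape (d, n):
    -MCC (10) + sparsity (11) + weight decay (12)."""
    H = sigmoid(W1 @ X + b1[:, None])                # hidden activations
    Y = sigmoid(W2 @ H + b2[:, None])                # reconstructions
    j_mcc = kappa(X - Y, sigma).sum(axis=0).mean()   # (10): sum over dims, mean over samples
    rho_hat = H.mean(axis=1)                         # average activation per hidden unit
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    j_sparse = beta * kl.sum()                       # (11)
    j_weight = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))  # (12)
    return -j_mcc + j_sparse + j_weight
```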
2.5.3. Stacking Robust Autoencoders into Deep Network
In order to learn more effective features for seizure classification, we stack the robust autoencoders into a deep model. Stacking robust autoencoders works the same way as stacking ordinary autoencoders [17], and the output of the highest layer is cascaded to a softmax classifier for seizure detection. Such a model aims at the best seizure classification accuracy and is able to optimize the feature and classifier simultaneously.
The training process of the deep network includes two stages: unsupervised pretraining and supervised fine-tuning. In the pretraining stage, the network is trained layer-wise by the proposed robust autoencoder model to learn useful filters for feature extraction; a well-pretrained network yields a good starting point for fine-tuning [33]. In the fine-tuning stage, a softmax classifier is added to the output of the stack, and the parameters of the whole system are tuned to minimize the classification error in a supervised manner. The network is tuned globally through back-propagation, so all the parameters of both feature extraction and classification are optimized jointly. After fine-tuning, the deep network is configured for optimal overall classification performance.
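A compact sketch of the two-stage procedure, written here in PyTorch for brevity (the paper's own implementation is not specified; the sparsity and weight-decay terms of (13) are omitted from the pretraining loss to keep the sketch short, and the constant kernel factor is dropped since it does not affect the optimum):

```python
import torch
import torch.nn as nn

def neg_correntropy(x, y, sigma=0.05):
    # Negated MCC reconstruction term (10), up to a constant factor:
    # minimizing this maximizes the correntropy.
    return -torch.exp(-(x - y) ** 2 / (2 * sigma ** 2)).sum(dim=1).mean()

enc1, dec1 = nn.Linear(784, 50), nn.Linear(50, 784)   # first robust autoencoder
enc2, dec2 = nn.Linear(50, 10), nn.Linear(10, 50)     # second robust autoencoder

def pretrain(enc, dec, batches, epochs=50, lr=0.01):
    """Stage 1: layer-wise unsupervised pretraining of one autoencoder."""
    opt = torch.optim.SGD(list(enc.parameters()) + list(dec.parameters()), lr=lr)
    for _ in range(epochs):
        for x in batches:
            loss = neg_correntropy(x, torch.sigmoid(dec(torch.sigmoid(enc(x)))))
            opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: stack the pretrained encoders, append a softmax classifier, and
# fine-tune all parameters jointly on the classification error.
model = nn.Sequential(enc1, nn.Sigmoid(), enc2, nn.Sigmoid(), nn.Linear(10, 2))
criterion = nn.CrossEntropyLoss()   # softmax + negative log-likelihood
# for x, label in labelled_batches:
#     loss = criterion(model(x), label)  # then backward() and step() as above
```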
3. Results and Discussion
In this section, experiments are carried out to evaluate the seizure detection performance of our model. The experiments include four parts: (1) we compare the unsupervised feature learning performance of the modified R-SAE model and the standard stacked autoencoder (S-SAE); (2) we compare the features before and after supervised fine-tuning to demonstrate the strength of joint optimization; (3) we compare the seizure detection performance of R-SAE model with other methods; (4) we evaluate the influence of parameters in the R-SAE model on the seizure detection performance.
In our experiments, the seizure detection performance is evaluated with the two commonly used criteria, sensitivity and specificity. Sensitivity is defined as the percentage of true seizure segments detected and specificity is the proportion of nonseizure segments correctly classified.
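Both criteria reduce to simple counts over the test segments; a minimal sketch (label convention and names are ours):

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity over segments; label 1 = seizure."""
    tp = np.sum((y_true == 1) & (y_pred == 1))   # seizure segments detected
    fn = np.sum((y_true == 1) & (y_pred == 0))   # seizure segments missed
    tn = np.sum((y_true == 0) & (y_pred == 0))   # nonseizure correctly rejected
    fp = np.sum((y_true == 0) & (y_pred == 1))   # false alarms
    return tp / (tp + fn), tn / (tn + fp)
```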
3.1. Performance of Feature Learning
In this experiment, we evaluate the unsupervised feature learning ability of the R-SAE model on EEG signals. We train the R-SAE model to learn compact features from the cross-power matrix. After the layer-wise self-taught training, the deep network is configured to learn useful features. The feature extraction results of the proposed R-SAE model are illustrated in Figure 4. In both illustrations, the seizure begins at about the 20th second, after which the features extracted by the R-SAE model show clear differences from nonseizure ones.
Figure 4. Unsupervised feature learning results of the R-SAE model for patient pt03 (a) and pt04 (b). In each subfigure, the top panel is the original EEG signal from one channel and the bottom panel shows the features extracted by the R-SAE model.
The feature learning performance of R-SAE and S-SAE is compared on the EEG signals. To evaluate the learned features quantitatively, we use classification performance as the criterion. In this experiment, the cost function of the S-SAE model is

$$J_{\text{S-SAE}}(\theta) = J_{\text{MSE}}(\theta) + J_{\text{sparse}}(\theta) + J_{\text{weight}}(\theta) \qquad (14)$$

where J_MSE(θ) is the average MSE reconstruction loss as in (6), and J_sparse(θ) and J_weight(θ) are formulated the same as in R-SAE.
We stack two autoencoders to constitute a three-layer network with 784 input units, 50 hidden units, and 10 output units. The same stacked architecture is used for both R-SAE and S-SAE. The networks are initialized randomly and trained layer-wise using back-propagation to minimize the cost functions. The parameters are set to λ = 0.003, β = 3, and ρ = 0.1 for both methods and σ = 0.05 for R-SAE.
The seizure detection results of the R-SAE and S-SAE models are shown in Table 2. To eliminate the effects of randomness in network initialization, all results are averaged over 10 trials. The average sensitivity of R-SAE is 97%, a 14% improvement over S-SAE. The average specificity of R-SAE is 92%, also higher than that of S-SAE. Thus, R-SAE outperforms S-SAE in both sensitivity and specificity.
Table 2.
Comparison between R-SAE and S-SAE (before fine-tuning).
| Patient | R-SAE sensitivity | R-SAE specificity | S-SAE sensitivity | S-SAE specificity |
|---|---|---|---|---|
| Pt01 | 0.99 ± 8.1 × 10^−3 | 0.96 ± 2.0 × 10^−2 | 0.83 ± 2.0 × 10^−1 | 0.92 ± 3.2 × 10^−2 |
| Pt02 | 0.96 ± 2.0 × 10^−2 | 0.91 ± 1.7 × 10^−2 | 0.90 ± 1.0 × 10^−1 | 0.83 ± 7.6 × 10^−2 |
| Pt03 | 0.96 ± 1.5 × 10^−2 | 0.93 ± 2.1 × 10^−2 | 0.82 ± 2.8 × 10^−1 | 0.93 ± 3.2 × 10^−2 |
| Pt04 | 0.95 ± 3.7 × 10^−2 | 0.91 ± 1.3 × 10^−2 | 0.91 ± 4.1 × 10^−2 | 0.92 ± 2.6 × 10^−2 |
| Pt05 | 0.98 ± 1.7 × 10^−2 | 0.94 ± 1.5 × 10^−2 | 0.70 ± 1.3 × 10^−1 | 0.96 ± 2.4 × 10^−2 |
| Pt06 | 0.97 ± 2.9 × 10^−2 | 0.84 ± 5.5 × 10^−2 | 0.82 ± 1.7 × 10^−1 | 0.90 ± 1.5 × 10^−2 |
| Avg. | 0.97 ± 1.3 × 10^−2 | 0.92 ± 3.8 × 10^−2 | 0.83 ± 6.9 × 10^−2 | 0.91 ± 4.0 × 10^−2 |
In analyzing the detection results, we find that S-SAE fails mostly on EEG segments with impulsive noises, such as the segment illustrated in Figure 2. Since such abrupt artifacts appear frequently in EEG signals, the S-SAE model cannot be trained well: its MSE-based cost is dominated by the large outliers, so these EEG segments are poorly represented. By contrast, the MCC in the R-SAE model is more robust to large outliers. Therefore, the proposed R-SAE method handles noise in EEG signals well and provides more robust feature extraction than S-SAE.
3.2. Performance of Joint Feature Optimization
In this experiment, we test the effects of joint feature optimization. After the MCC-based unsupervised learning, the deep network is configured to extract useful features from EEG signals. On this basis, the deep model is fine-tuned through back-propagation to optimize both the feature and the classifier jointly, so that the best overall classification performance can be achieved. The parameters of R-SAE are set as in Section 3.1, except that the number of units in the output layer is set to 3 for visualization convenience.
The visual comparison of features before and after fine-tuning is illustrated in Figure 5. In Figures 5(a) and 5(b), the red circles denote features of seizure segments and the blue stars denote nonseizure ones. After fine-tuning, the seizure and nonseizure segments are more separable in the feature space. We quantitatively analyze the separability of the features before and after fine-tuning with the FDR criterion of (2) on the first four patients. As illustrated in Figure 5(c), the fine-tuned features achieve roughly ten times higher FDR than the original ones, which strongly indicates that joint optimization helps learn superior features with high separability and thereby improves seizure detection performance.
Figure 5. Comparison between features before and after joint optimization. (a-b) Visualization of features for seizure and nonseizure segments; the red circles denote features of seizure segments and the blue stars denote nonseizure ones. (c) The FDR values of features before and after joint optimization.
The seizure detection performance of features before and after fine-tuning is presented in Table 3. After joint feature learning, the average sensitivity over the six patients increases by 8% and the specificity increases by 15%. The joint learning process thus enhances the separability of the features between the two classes and greatly improves seizure detection performance.
Table 3.
Comparison of seizure detection performance before and after fine-tuning (FT).
| Feature | Sensitivity | Specificity |
|---|---|---|
| Before FT | 0.90 ± 1.0 × 10^−1 | 0.84 ± 1.4 × 10^−1 |
| After FT | 0.98 ± 3.1 × 10^−2 | 0.99 ± 4.7 × 10^−3 |
3.3. Performance of Seizure Detection
In this experiment, the seizure detection performance of the proposed R-SAE model is evaluated and compared with a singular value decomposition (SVD) based method. SVD is a popular tool for correlation matrix analysis: studies have shown that seizure EEG commonly enters a lower-complexity state that is well reflected by the eigenvalues obtained from SVD of the correlation matrix [10, 22].
To provide a benchmark for the comparison, we also test the seizure detection performance with the original cross-power matrix without further feature extraction. The methods included in the comparison are configured as follows.
SVM: the cross-power matrix of each time window is reshaped into a vector and fed into an SVM classifier with an RBF kernel. The parameters of the SVM model are selected using 3-fold cross-validation.
SVD(p) + SVM: for each time window, the cross-power matrix is decomposed by SVD, and the first p eigenvalues are adopted as the features. The feature vectors are then classified by an SVM classifier with RBF kernel. The parameters of the SVM model are selected using 3-fold cross-validation.
R-SAE(q): the R-SAE model is configured with 784 input units, 50 hidden units, and q output units. The parameters are set as λ = 0.003, β = 3, ρ = 0.1, and σ = 0.05. For this method, all results are averaged over 10 trials.
The seizure detection results of the three methods are given in Table 4. For both SVD + SVM and R-SAE, we test the performance under two choices of the parameters p and q, respectively. With the original cross-power matrix classified by SVM, sensitivities of 0.99 or above are achieved for all six patients and the average specificity is 0.91. The SVD + SVM method with p = 3 shows uneven performance across patients: for pt03, a high sensitivity of 0.96 is reached with a specificity of 0.99, but low sensitivities are obtained for pt01, pt05, and pt06. With p = 10, where more features are preserved, better sensitivities and specificities are achieved, yet the uneven performance remains and the average sensitivity is only 0.83. Since the feature extraction of the SVD-based method discards much useful information, its performance is lower than the SVM benchmark, and it degrades further when fewer eigenvalues are used. By contrast, the proposed R-SAE method outperforms the benchmark SVM. R-SAE with q = 10 achieves an average sensitivity of 1.00 and an average specificity of 0.99, with near-perfect results for every patient, and nearly as high performance is obtained with q = 3. The R-SAE model thus retains robust seizure detection ability even with a very small feature dimension.
Table 4.
Comparison with other methods.
| Method | Pt01 SEN∗ | Pt01 SPE∗ | Pt02 SEN | Pt02 SPE | Pt03 SEN | Pt03 SPE | Pt04 SEN | Pt04 SPE | Pt05 SEN | Pt05 SPE | Pt06 SEN | Pt06 SPE | Avg SEN | Avg SPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 1.00 | 0.96 | 1.00 | 0.89 | 1.00 | 0.93 | 0.99 | 0.95 | 1.00 | 0.96 | 1.00 | 0.78 | 1.00 | 0.91 |
| SVD(3) + SVM [10] | 0.45 | 1.00 | 0.72 | 0.99 | 0.96 | 0.99 | 0.84 | 0.95 | 0.46 | 0.98 | 0.64 | 0.97 | 0.68 | 0.98 |
| SVD(10) + SVM [10] | 0.61 | 1.00 | 0.76 | 0.99 | 0.99 | 1.00 | 0.80 | 0.95 | 0.84 | 0.93 | 0.95 | 0.96 | 0.83 | 0.97 |
| R-SAE(3) | 1.00 | 0.99 | 0.99 | 0.99 | 0.98 | 0.99 | 1.00 | 0.99 | 1.00 | 0.99 | 0.92 | 0.98 | 0.98 | 0.99 |
| R-SAE(10) (ours) | 1.00 | 0.99 | 1.00 | 0.99 | 1.00 | 0.99 | 1.00 | 0.99 | 1.00 | 0.99 | 0.99 | 0.97 | 1.00 | 0.99 |
*SEN indicates sensitivity and SPE is specificity.
3.4. Model Analysis
In this experiment, we test the influence of the two important parameters on the seizure detection performance. The first parameter is the output feature number, that is, the number of units of the output layer of the R-SAE model, and the second parameter is the kernel size σ in MCC. The experiment is carried out using the first four patients.
3.4.1. Analysis of Feature Number
The feature number is controlled by the parameter q of Section 3.3. To test the influence of q on seizure detection, all other parameters are fixed as in Section 3.3, and q is gradually decreased from 20 to 3. Figure 6(a) shows the seizure detection results averaged over the four patients for different choices of q. Before fine-tuning, the performance of R-SAE decreases slightly as the feature number decreases. After fine-tuning, however, the performance is greatly enhanced: sensitivities and specificities of up to 99% are achieved even with small feature numbers.
Figure 6. Model analysis of two important parameters of R-SAE. (a) Seizure detection performance under different feature numbers; (b) seizure detection performance under different selections of σ. SEN-FT and SPE-FT denote sensitivity and specificity after fine-tuning; SEN-NFT and SPE-NFT denote those before fine-tuning.
3.4.2. Analysis of σ
In the MCC, the kernel size σ is an important parameter: an appropriate choice of σ can effectively suppress outliers and noise. The kernel size (or bandwidth) is a free parameter whose selection remains an open issue in information-theoretic learning (ITL) [26, 29, 34]. In practice, σ can be selected with Silverman's rule [35]. In the experiments of Sections 3.1–3.3, we simply set σ = 0.05.
Here, we test the influence of σ on overall seizure detection performance, with all other parameters fixed as in Section 3.3. Figure 6(b) shows the seizure detection results under different choices of σ, averaged over the four patients. High performance is achieved over a wide range of σ. Better results are obtained with small σ; when σ increases from 0.1 to 0.2, the performance degrades. In practice, σ should be kept small to preserve the good local property of the MCC.
4. Conclusions
In this paper, we have presented a novel deep model capable of extracting robust features in the presence of large amounts of outliers. Experimental results show that the proposed R-SAE model learns effective features from EEG signals for high-performance seizure detection and is promising for clinical applications.
Acknowledgments
This research was supported by Grants from the National Natural Science Foundation of China (no. 61031002), National 973 Program (no. 2013CB329500), National High Technology Research and Development Program of China (no. 2012AA020408), National Natural Science Foundation of China (no. 61103107), and Zhejiang Provincial Science and Technology Project (no. 2013C03045-3).
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
- 1. World Health Organization. Epilepsy. Geneva, Switzerland: WHO; 2012. Fact sheet no. 999.
- 2. Fisher RS, Van Emde Boas W, Blume W, et al. Epileptic seizures and epilepsy: definitions proposed by the International League Against Epilepsy (ILAE) and the International Bureau for Epilepsy (IBE). Epilepsia. 2005;46(4):470–472. doi: 10.1111/j.0013-9580.2005.66104.x.
- 3. Mormann F, Andrzejak RG, Elger CE, Lehnertz K. Seizure prediction: the long and winding road. Brain. 2007;130(2):314–333. doi: 10.1093/brain/awl241.
- 4. Saab ME, Gotman J. A system to detect the onset of epileptic seizures in scalp EEG. Clinical Neurophysiology. 2005;116(2):427–442. doi: 10.1016/j.clinph.2004.08.004.
- 5. Majumdar K, Vardhan P. Automatic seizure detection in ECoG by differential operator and windowed variance. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2011;19(4):356–365. doi: 10.1109/TNSRE.2011.2157525.
- 6. Yadav R, Shah AK, Loeb JA, Swamy MNS, Agarwal R. Morphology-based automatic seizure detector for intracerebral EEG recordings. IEEE Transactions on Biomedical Engineering. 2012;59(7):1871–1881. doi: 10.1109/TBME.2012.2190601.
- 7. Ghosh-Dastidar S, Adeli H, Dadmehr N. Principal component analysis-enhanced cosine radial basis function neural network for robust epilepsy and seizure detection. IEEE Transactions on Biomedical Engineering. 2008;55(2):512–518. doi: 10.1109/TBME.2007.905490.
- 8. Zandi AS, Javidan M, Dumont GA, Tafreshi R. Automated real-time epileptic seizure detection in scalp EEG recordings using an algorithm based on wavelet packet transform. IEEE Transactions on Biomedical Engineering. 2010;57(7):1639–1651. doi: 10.1109/TBME.2010.2046417.
- 9. Jouny CC, Bergey GK. Characterization of early partial seizure onset: frequency, complexity and entropy. Clinical Neurophysiology. 2012;123(4):658–669. doi: 10.1016/j.clinph.2011.08.003.
- 10. Santaniello S, Burns SP, Golby AJ, Singer JM, Anderson WS, Sarma SV. Quickest detection of drug-resistant seizures: an optimal control approach. Epilepsy and Behavior. 2011;22(supplement 1):S49–S60. doi: 10.1016/j.yebeh.2011.08.041.
- 11. Liu D, Pang Z. Epileptic seizures predicted by modified particle filters. In: Proceedings of the IEEE International Conference on Networking, Sensing and Control (ICNSC '08); April 2008; Sanya, China. pp. 351–356.
- 12. Liu D, Pang Z, Wang Z. Epileptic seizure prediction by a system of particle filter associated with a neural network. EURASIP Journal on Advances in Signal Processing. 2009;2009:638534.
- 13. Vincent P, Larochelle H, Bengio Y, Manzagol P-A. Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning; July 2008. pp. 1096–1103.
- 14. Hinton GE, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Computation. 2006;18(7):1527–1554. doi: 10.1162/neco.2006.18.7.1527.
- 15. Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006;313(5786):504–507. doi: 10.1126/science.1127647.
- 16. Boureau Y-L, LeCun Y. Sparse feature learning for deep belief networks. In: Advances in Neural Information Processing Systems; 2007. pp. 1185–1192.
- 17. Bengio Y. Learning deep architectures for AI. Foundations and Trends in Machine Learning. 2009;2(1):1–127.
- 18. Salakhutdinov R, Tenenbaum JB, Torralba A. Learning with hierarchical-deep models. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2013;35(8):1958–1971. doi: 10.1109/TPAMI.2012.269.
- 19. Lee H, Grosse R, Ranganath R, Ng AY. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In: Proceedings of the 26th International Conference on Machine Learning (ICML '09); June 2009; Montreal, Canada. pp. 609–616.
- 20. Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Processing Magazine. 2012;29(6):82–97.
- 21. Lee H, Pham P, Largman Y, Ng AY. Unsupervised feature learning for audio classification using convolutional deep belief networks. In: Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09); December 2009. pp. 1096–1104.
- 22. Schindler K, Leung H, Elger CE, Lehnertz K. Assessing seizure dynamics by analysing the correlation structure of multichannel intracranial EEG. Brain. 2007;130(1):65–77. doi: 10.1093/brain/awl304.
- 23. Schindler KA, Bialonski S, Horstmann M-T, Elger CE, Lehnertz K. Evolving functional network properties and synchronizability during human epileptic seizures. Chaos. 2008;18(3):033119. doi: 10.1063/1.2966112.
- 24. Rummel C, Müller M, Baier G, Amor F, Schindler K. Analyzing spatio-temporal patterns of genuine cross-correlations. Journal of Neuroscience Methods. 2010;191(1):94–100. doi: 10.1016/j.jneumeth.2010.05.022.
- 25. Mika S, Rätsch G, Weston J, Schölkopf B, Müller K-R. Fisher discriminant analysis with kernels. In: Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop; 1999. pp. 41–48.
- 26. Liu W, Pokharel PP, Principe JC. Correntropy: a localized similarity measure. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN '06); July 2006. pp. 4919–4924.
- 27. Jeong K-H, Liu W, Han S, Hasanbelliu E, Principe JC. The correntropy MACE filter. Pattern Recognition. 2009;42(5):871–885.
- 28. He R, Hu B-G, Zheng W-S, Kong X-W. Robust principal component analysis based on maximum correntropy criterion. IEEE Transactions on Image Processing. 2011;20(6):1485–1494. doi: 10.1109/TIP.2010.2103949.
- 29. Liu W, Pokharel PP, Principe JC. Correntropy: properties and applications in non-Gaussian signal processing. IEEE Transactions on Signal Processing. 2007;55(11):5286–5298.
- 30. Olshausen BA, Field DJ. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature. 1996;381(6583):607–609. doi: 10.1038/381607a0.
- 31. Lee H, Battle A, Raina R, Ng AY. Efficient sparse coding algorithms. Advances in Neural Information Processing Systems. 2007;19:801–808.
- 32. Xie J, Xu L, Chen E. Image denoising and inpainting with deep neural networks. In: Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS '12); December 2012. pp. 350–358.
- 33. Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol P-A. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research. 2010;11:3371–3408.
- 34. He R, Zheng W-S, Hu B-G, Kong X-W. A regularized correntropy framework for robust pattern recognition. Neural Computation. 2011;23(8):2074–2100.
- 35. Silverman BW. Density Estimation for Statistics and Data Analysis. CRC Press; 1986.
