Cognitive Neurodynamics. 2022 Apr 29;17(1):119–131. doi: 10.1007/s11571-022-09809-y

Generalizable epileptic seizures prediction based on deep transfer learning

Bahram Sarvi Zargar 1, Mohammad Reza Karami Mollaei 1, Farideh Ebrahimi 1, Jalil Rasekhi 1
PMCID: PMC9871115  PMID: 36704623

Abstract

Predicting seizures before they happen can help prevent them through medication. In this research, first, a total of 22 features were extracted from 5-s segmented EEG signals. Second, tensors were developed as inputs for different deep transfer learning models to find the best model for predicting epileptic seizures. The effect of Pre-ictal state duration was also investigated by selecting four different intervals of 10, 20, 30, and 40 min. Then, nine models were created by combining three ImageNet convolutional networks with three classifiers and were examined for predicting seizures patient-dependently. The Xception convolutional network with a Fully Connected (FC) classifier achieved an average sensitivity of 98.47% and a False Prediction Rate (FPR) of 0.031 h−1 in a 40-min Pre-ictal state for ten patients from the European database. The most promising result of this study was the patient-independent prediction of epileptic seizures; the MobileNet-V2 model with an FC classifier was trained with one patient’s data and tested on six other patients, achieving a sensitivity rate of 98.39% and an FPR of 0.029 h−1 for a 40-min Pre-ictal scheme.

Keywords: Epilepsy, Seizure prediction, Deep learning, Convolutional network, Transfer learning

Introduction

One of the most frustrating aspects of epilepsy is the lack of warning before a seizure occurs. For some patients, tens of seizures may occur each day, but seizures can happen once in a few months or a year for others. This unpredictability negatively affects patients’ quality of life and disrupts basic daily activities; even simple activities such as driving or swimming may become life-threatening for epileptic patients. Predicting seizures beforehand will provide time for interventions and give epileptic patients ample opportunity to distance themselves from the potential risks.

New research suggests that the epilepsy process is caused by synchronized changes in a network of scattered components throughout the brain (Mormann et al. 2000). In 2000, the International Seizure Prediction Group (ISPG) was formed to investigate and resolve the mismatch problem in the performance tests of seizure prediction models. The ISPG found that the data, performance metrics, naming, and analysis methods needed to be standardized (Lehnertz et al. 2007). It was also concluded that large international databases were required to validate prediction performance on standard datasets. A complete course of the epilepsy process can be divided into four general states (interictal, pre-ictal, ictal, and postictal), as shown in Fig. 1. The seizure prediction problem has traditionally been considered a classification problem. In most studies, features have first been extracted from raw Electroencephalogram (EEG) signals in the form of univariate (from a single channel), bivariate (from pairs of channels), or multivariate (from numerous channels at the same time) features.

Fig. 1. Brain states in a typical epileptic EEG recording

The Fourier Transform is the most widely used procedure for converting EEG signals from the time domain to the frequency domain (Rasekhi et al. 2013). In a study by Bandarabadi et al. (2015), features were first extracted from the preprocessed, windowed EEG signals and then classified into preictal/non-preictal states. EEG signal data from twenty-four patients with drug-resistant focal epilepsy from the Epilepsia database were used. For each patient, the first three seizures and their related data were used for training, and the remaining seizures were used to test the algorithm. In their study, a Support Vector Machine (SVM) classifier with a Gaussian radial basis function (RBF) kernel was used and achieved a sensitivity of 75.8% and an FPR of 0.1 h−1.

In the study by Ahmadi and Soltanian-Zadeh (2019), a dataset of 12 patients from the CHB-MIT database was used. Gamma frequency band signals between 32 and 120 Hz were used to extract the features. Firstly, the gamma frequency band was separated from the primary signal using the Hamming high-pass filter. Then, the Shannon entropy and four statistical characteristics of minimum, mean, maximum, and standard deviation were calculated and used in the classification. Since not all extracted features were used for classification purposes, the t-feature selection technique separated the best features. An SVM classifier with linear kernel and K-nearest neighbor (KNN) were used to predict epileptic seizures using the first 9 min of a 10-min interval preceding the seizure. This study obtained an average sensitivity of 83.8% for both classifications, but the KNN classification accuracy with 71% was slightly better than the SVM classification with an accuracy of 67.8%.

In the proposed method by Aarabi and He (2017), seizures were predicted by extracting different features, including correlation dimension, correlation entropy, noise level, Lempel–Ziv complexity, largest Lyapunov exponent, and nonlinear interdependence from intracranial EEG segments of ten Freiburg database patients. Their results showed an average sensitivity of 92.9% and an average FPR of 0.096 h−1 for an average minimum prediction time of 33.3 min.

Zhang and Parhi (2016) extracted spectral power features and computed their ratios. For each channel, a total of 44 features, including eight absolute spectral powers, eight relative spectral powers, and 28 spectral power ratios, were extracted every 2 s using a 4-s window with a 50% overlap. These features were then ranked and selected in a patient-specific manner. A second-order Kalman filter further processed the selected features. They were then fed as input into a linear SVM classifier, resulting in a sensitivity of 98.68% and an FPR of 0.0465 h−1 with the CHB-MIT dataset.

Similarly, Li et al. (2013) applied a low-pass filter to remove the high-frequency artifacts in the EEG. Then, they used a morphology filter to detect spikes. Subsequently, the spike rate (SR) was computed and smoothed by an average filter. Finally, the performance of smoothed SR of the EEG during interictal, pre-ictal, and ictal periods was analyzed and employed as an index for seizure prediction. Experiments with intracranial EEGs of 21 patients showed that the proposed seizure prediction approach achieved a sensitivity of 75.8% with an average FPR of 0.09 h−1.

In summary, there are various methods for characterizing and classifying EEG signals to predict seizures. Although some of the methods have acceptable results, for most of them, the steps of extracting complex features, optimizing a large set of features, and selecting the channels must be done separately for each patient, making the implementation of complex algorithms difficult.

Artificial neural networks and deep learning tools have been used to predict seizures for over a decade. Mirowski et al. (2009) were the first to use a convolutional network to predict seizures and achieved a sensitivity of 71% on the University of Freiburg dataset. This network was trained with bivariate features extracted from different EEG channels. The inputs were transformed by the Short-Time Fourier Transform (STFT) before being fed into the CNN. Daoud and Bayoumi (2019) used raw EEG signals recorded from 8 children and available in the CHB-MIT database to predict epilepsy. In the study, the interval of 1 h preceding the seizure onset was considered as the pre-ictal state, and the same time interval with a length of 4 h before the seizure was marked as interictal state data. The problem of epilepsy prediction was turned into the classification of two types of data: interictal and pre-ictal. However, in their study, the data of ictal and postictal states were ignored, and four models (a perceptron neural network, a convolutional network, a BiLSTM, and an auto-encoder) were used to classify the data. Additionally, an EEG channel selection method was developed (the product of the channel’s variance and the whole channel’s entropy is considered as the channel selection criterion), which reduced the number of network parameters and lowered the network learning time. They achieved the highest sensitivity of 99.72%, an FPR of 0.004 h−1, and a very early seizure prediction time of 1 h.

In a study by Truong et al. (2018), the data from three databases were evaluated using a convolutional network. They applied convolutional neural networks to different intracranial and scalp EEG datasets and proposed a generalized retrospective and patient-specific seizure prediction method. They used the STFT on 30-s EEG windows to extract information in both frequency and time domains. The algorithm automatically generated optimized features for each patient to best classify pre-ictal and interictal segments. The proposed approach achieved a sensitivity of 81.4%, 81.2%, and 75% and an FPR of 0.06 h−1, 0.16 h−1, and 0.21 h−1 on the Freiburg Hospital intracranial EEG dataset, the Boston Children’s Hospital–MIT scalp EEG dataset, and the American Epilepsy Society Seizure Prediction Challenge dataset, respectively.

Shahbazi and Aghajan (2018) proposed a CNN-LSTM based approach to predict seizures. First, they applied STFT to EEG signals for constructing multichannel images. After a preprocessing step, a CNN-LSTM neural network was trained on the STFTs to capture the spectral, spatial, and temporal features from the EEG segments and classify them as a pre-ictal or interictal state. The proposed method achieved a sensitivity of 98.21%, an FPR of 0.13 h−1, and a mean prediction time of 44.74 min on the CHB-MIT dataset.

Most studies in the literature on deep learning-based epileptic seizure prediction have focused on patient-dependent models rather than patient-independent seizure predictions. All previously discussed studies were patient-dependent. Two goals were considered in a patient-independent study conducted by Khan et al. (2017). The study’s first goal was to extract features from the EEG signal that discriminate between different states. The convolutional network was used to extract features from the EEG signal to make a definite distinction between epileptic pre-ictal and non-epileptic interictal samples. The inputs required for CNN training were generated using the continuous wavelet transform, which provided a tensor of wavelet coefficients with three dimensions: time, scale, and channel. As a second goal, no decision about the duration of the pre-ictal state was made in advance; instead, an algorithm was used to find the optimal duration of the pre-ictal period. The algorithm was based on finding where the adopted features started to change as a sign of the brain shifting from the interictal to the pre-ictal state. The results were promising, with a sensitivity of 87.8% and a low FPR of 0.142 h−1, predicting the seizure on average 10 min before the seizure onset.

The study by Tsiouris et al. (2017) proposed an SVM classifier to identify pre-ictal and interictal brain states from EEG signals. To train their model, they extracted time domain, frequency domain, and graph theory-based features from EEG signals. The resulting classifier, which was evaluated on all 24 subjects from the CHB-MIT EEG dataset, achieved a sensitivity of 68.50% for patient-independent classification.

Dissanayake et al. (2011) proposed the deep learning model as the state-of-the-art patient-independent epileptic seizure prediction model. Two deep learning architectures were proposed for ten patients from the CHB-MIT database, with two different learning strategies that could learn global functions using data from multiple patients. These two models with a 1-h pre-ictal interval demonstrated accuracies of 88.81% and 91.54%, respectively, and the highest achieved sensitivity rate was 97.88% for a 15-min pre-ictal interval.

There are other methods like experimental trials to control epileptic seizures. The experimental sets include numerous sensory devices for measuring the dynamics of the brain cortex. This cortical model can be unreliable and have time-varying properties. A robust Takagi–Sugeno controller and observer were created in Çetin’s study (2020) to suppress epileptic seizures without sensory measurements.

Materials and methods

This paper proposed nine deep transfer-learning-based models for early and accurate patient-dependent seizure prediction. The best-performing models were then trained and tested for patient-independent seizure prediction. In seizure prediction, the main goal is to detect the pre-ictal state, which precedes the ictal state, so that the seizure onset can be predicted upon detection of the pre-ictal state. One way to do so is to build a two-class problem, that is, the pre-ictal state against all other states treated as a single class; another is to keep the original four states and build a four-class problem. The four-class approach is used in this study.

The nine deep transfer-learning-based models were trained and tested for each patient separately to allow comparison with other patient-dependent models in the literature. In various studies, the data from the first two or three seizures were used to train the network, which was then tested on the other seizures (Rasekhi et al. 2013; Bandarabadi et al. 2015). In this study, we used the data of the first three seizures of each patient for training and the rest for testing the models. Because we used transfer learning, the number of weights and network parameters that needed to be learned was reduced significantly. The patients' data were therefore needed only to fit the parameters of the classifier, and the data from three seizures were sufficient for this purpose.

Despite the abundant research done on seizure prediction, there is no standard duration for the pre-ictal state. In this study, four different pre-ictal times were examined. The proper choice of pre-ictal time is essential to the overall performance of a seizure predictor. It should cover almost all transient changes preceding a seizure onset but exclude other EEG dynamics, such as ictal and interictal states, as much as possible. The proposed approach for predicting epileptic seizures in this study includes the steps discussed below:

  • Step 1

    Ten subjects were randomly selected for this study, and to investigate the effect of preictal length, four different pre-ictal times (10 min, 20 min, 30 min, and 40 min) were used.

  • Step 2

    For the selected patients, the extraction and preprocessing of 22 linear and univariate features from each of the EEG channels were carried out.

  • Step 3

    Normalizing features and converting feature vectors to feature tensors was necessary to feed deep convolutional neural network models.

  • Step 4

    In order to find a high-performance patient-dependent model, artificial neural networks were designed by mixing three ImageNet models with three classifiers. Nine different models were investigated.

  • Step 5

    Patient-dependent training and testing models gave insights about the models' performance and provided a reference to compare patient-independent models’ performance. For each patient, the first three seizure data were used to train the models, and the rest of the seizures were allocated for testing the models’ performance.

  • Step 6

    The models were trained with one patient’s data and then tested with the data from other patients to examine whether seizures can be predicted patient-independently.

The types of features extracted from the signals, the ImageNet models, and the classifiers were not new; each had been used repeatedly in earlier machine learning projects. However, applying these deep transfer-learning models to feature-based tensors for the problem of seizure prediction was first proposed and practiced in this study and yielded promising results.

Dataset

Long-term continuous multichannel EEG recordings of 10 patients from the European database on epilepsy (Epilepsia) (Klatt et al. 2012) were used to train the proposed models and test their performance. From the Epilepsia database, we only had access to ten patients’ data, which had been selected randomly from the dataset (Table 1). Therefore, all patients on whom the models were tested are reported in this paper. In most cases, EEG recordings from 27 channels of surface electrodes were made using the International 10–20 system. Patients were aged between 11 and 48 years and had drug-resistant seizures. Only one patient (code 1236703) was recorded invasively. For the rest of the patients, the recording of brain signals was noninvasive. The data used in this study had different sampling frequencies, and the number of signal recording electrodes varied. Details of these patients are listed in Table 1. The types of epileptic seizures were very diverse. Even in one patient, the type of seizure could change over time, and some types of seizures had not yet been identified and were still under investigation. In this study, the effect of seizure type on prognosis was not investigated.

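Table 1. Information for the ten studied patients

| Number | Patient code | Gender | Age | Number of electrodes | Lo. | La. | Number of seizures | Record duration (hours) | Sampling frequency (Hz) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 300 | M | 20 | 19 | F | B | 28 | 98.42 | 512 |
| 2 | 32502 | M | 46 | 23 | F | R | 7 | 68.97 | 256 |
| 3 | 21602 | M | 44 | 24 | F | L | 11 | 145.58 | 256 |
| 4 | 1326003 | F | 24 | 27 | F | B | 27 | 186.64 | 400 |
| 5 | 1327903 | F | 21 | 27 | F | R | 9 | 256.09 | 400 |
| 6 | 1328803 | M | 46 | 27 | F | R | 5 | 91.36 | 400 |
| 7 | 2800 | F | 18 | 27 | F | B | 7 | 117.82 | 512 |
| 8 | 1329003 | M | 36 | 39 | F | R | 12 | 218 | 400 |
| 9 | 107802 | M | 11 | 48 | F | R | 54 | 158 | 1024 |
| 10 | 1236703 | M | 48 | 62 | F | L | 25 | 211 | 400 |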

Lo. localization of seizures; F frontal; T temporal; C central; O occipital; P parietal; La. lateralization; R right; L left; B bilateral

Patients' brain signals were recorded at different sampling rates (256, 400, 512, and 1024 Hz). Because of this difference, the number of samples recorded over the same period differs between patients. In addition, the nature of epilepsy is such that seizure duration varies between patients, and even a single patient may experience seizures of varying lengths.

The preictal sample set, one of the four classes in the seizure prediction issue, was chosen based on a preictal time. Preictal time is thought to contain the most important transitional information and may contain evidence of an impending seizure. The seizure onset was initially marked by experts. However, preictal time is a matter of choice and can have a fixed value ranging from several seconds to tens of minutes, but it is undoubtedly ending at the seizure onset. In this study, four different preictal times were used: 10 min, 20 min, 30 min, and 40 min. Preictal time selection is critical to a seizure predictor's overall efficacy, as it should include practically all transitory changes preceding a seizure onset while avoiding other EEG dynamics such as ictal and interictal states as much as possible.

All the ictal samples were used for training and testing because this class had the lowest number of samples among the four classes. Since the number of interictal and postictal samples was much higher than the number of preictal ones, and classifiers usually tend to produce higher accuracy on the class with more training samples, the number of interictal and postictal samples in the training set was reduced by resampling to achieve a balanced number of samples across the classes. However, for the testing phase, all of the samples were considered, and no resampling was carried out.
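The balancing step can be illustrated with a minimal sketch. The paper only states that the interictal and postictal training samples were "reduced by resampling"; the random undersampling used here, the choice of the preictal count as the target size, and the function name `balance_training_set` are all assumptions made for illustration.

```python
import random
from collections import Counter


def balance_training_set(samples, labels, reduce=("interictal", "postictal"),
                         reference="preictal", seed=0):
    """Randomly undersample the classes in `reduce` to the size of `reference`.

    Assumption: the target size is the number of preictal samples; the paper
    does not state the exact resampling rule.
    """
    rng = random.Random(seed)
    target = sum(1 for label in labels if label == reference)

    # Group the samples by class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    balanced = []
    for label, group in by_class.items():
        # Only the over-represented classes are reduced; ictal and preictal
        # samples are all kept.
        if label in reduce and len(group) > target:
            group = rng.sample(group, target)
        balanced.extend((sample, label) for sample in group)

    rng.shuffle(balanced)
    return [s for s, _ in balanced], [l for _, l in balanced]


# Hypothetical class sizes, for illustration only.
labels = (["interictal"] * 500 + ["preictal"] * 120
          + ["ictal"] * 15 + ["postictal"] * 300)
samples = list(range(len(labels)))
train_x, train_y = balance_training_set(samples, labels)
counts = Counter(train_y)
```

The test set would be left untouched, as described above.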

Preprocessing of EEG signals and extracting features

In the first step, the recorded signals must be cleaned of unwanted components such as noise and environmental interference. The raw signals were segmented using 5-s windows and then passed through a 50 Hz forward–backward Butterworth IIR filter. The 22 univariate linear features extracted from each segment are listed in Table 2 and briefly described below.
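As a rough illustration of the segmentation step, the sketch below cuts one channel into 5-s windows. Non-overlapping windows and dropping the incomplete tail are assumptions, since the text does not say whether the windows overlap; `segment_signal` is a hypothetical helper name.

```python
def segment_signal(signal, fs, window_s=5.0):
    """Split one EEG channel into consecutive, non-overlapping windows.

    signal   : sequence of samples from a single channel
    fs       : sampling frequency in Hz
    window_s : window length in seconds (5 s in the study)
    """
    n = int(round(fs * window_s))  # samples per window
    # Any incomplete window at the end of the recording is dropped.
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


# 12 s of dummy data at 256 Hz gives two full 5-s windows of 1280 samples.
dummy = [0.0] * (12 * 256)
segments = segment_signal(dummy, fs=256)
```

Because the window length is fixed in seconds, a recording sampled at 1024 Hz yields 5120 samples per window, while one sampled at 256 Hz yields 1280.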

Table 2. List of univariate features extracted from a channel of EEG signals

| # | Feature | # | Feature |
|---|---|---|---|
| 1 | Mean | 12 | Spectral power of the delta band (< 4 Hz) |
| 2 | Variance | 13 | Spectral power of the theta band (4–7 Hz) |
| 3 | Skewness | 14 | Spectral power of the alpha band (8–12 Hz) |
| 4 | Kurtosis | 15 | Spectral power of the beta band (12–30 Hz) |
| 5 | Long-term energy | 16 | Spectral power of the gamma band (> 30 Hz) |
| 6 | AR model predictive error | 17 | Wavelet Daubechies-4 (db4) energy band, level 1 |
| 7 | Decorrelation time | 18 | Wavelet Daubechies-4 (db4) energy band, level 2 |
| 8 | Hjorth mobility | 19 | Wavelet Daubechies-4 (db4) energy band, level 3 |
| 9 | Hjorth complexity | 20 | Wavelet Daubechies-4 (db4) energy band, level 4 |
| 10 | Spectral edge frequency | 21 | Wavelet Daubechies-4 (db4) energy band, level 5 |
| 11 | Spectral edge power | 22 | Wavelet Daubechies-4 (db4) energy band, level 6 |

The mean, variance, skewness, and kurtosis are four well-known parameters used in statistical analysis to provide information on the amplitude distribution of time series. While the mean and variance provide information on the position and spread of the distribution, skewness and kurtosis measure the amplitude distribution’s symmetry and its relative peakedness or flatness, respectively.

Long-term energy, also known as accumulated energy, has been employed in research and has been shown to predict seizures (Litt et al. 2001). It has been demonstrated that as a seizure approaches, bursts of long-term energy rise. The idea of monitoring long-term energy bursts to forecast seizures is based on the hypothesis that seizures are caused by a series of EEG events that occur over several hours.

For both detection (Altunay et al. 2010) and prediction (Rajdev et al. 2010), the prediction error calculated from an autoregressive model of the EEG signal has been highlighted. The theory is that as the seizure approaches, the EEG data become more well-behaved and are predicted more accurately by the autoregressive model, decreasing the preictal prediction error. With the occurrence of a seizure, the decrease in EEG prediction error vanishes, and higher error levels are recorded once again.

The autocorrelation of a signal x(t) is defined as the correlation between x(t) and a copy of itself shifted by τ; as a result, it is a function of the time shift only. Autocorrelation can be used to determine whether a time series is stationary. The decorrelation time is defined as the first zero-crossing of the autocorrelation function (Box et al. 2008). The closer the decorrelation time is to zero, the less correlated the signal samples are. Mormann et al. (2005) discovered that the decorrelation time of epileptogenic EEG data could be used to distinguish between preictal and interictal periods.
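A minimal sketch of the decorrelation time, taken as the first lag at which the autocorrelation of the mean-removed segment is no longer positive. Removing the mean and using an unnormalized autocorrelation estimate are assumptions; the paper does not describe its exact estimator.

```python
import math


def decorrelation_time(x, fs):
    """Return the first zero-crossing of the autocorrelation, in seconds.

    Returns None if the autocorrelation never reaches zero within the segment.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    for lag in range(1, n):
        # Unnormalized autocorrelation at this lag.
        acf = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        if acf <= 0:
            return lag / fs
    return None


# A 1 Hz cosine has its first autocorrelation zero at a quarter period (0.25 s).
fs = 100
cosine = [math.cos(2 * math.pi * 1.0 * k / fs) for k in range(10 * fs)]
dt = decorrelation_time(cosine, fs)
```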

To quantitatively define EEG data, Hjorth (1970) established a set of three time-domain parameters: activity, mobility, and complexity. The mobility and complexity of the EEG increased significantly during the preictal stage (Mormann et al. 2005). As a result, we have extracted the mobility and complexity properties of the EEG signal. The mobility is calculated by dividing the root mean square (RMS) of the EEG signal slopes by the RMS of the amplitudes within a moving time window. The signal’s bandwidth is estimated using the complexity.
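The two Hjorth parameters used here can be sketched as follows. The sketch uses the common variance-based formulation (for a zero-mean signal this matches the RMS ratio described above) and a first difference as the slope estimate; both are assumptions about implementation details that the paper does not spell out.

```python
import math


def _variance(x):
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def _diff(x):
    # First difference as a discrete approximation of the signal slope.
    return [b - a for a, b in zip(x, x[1:])]


def hjorth_mobility(x):
    """Square root of the ratio of slope variance to amplitude variance."""
    return math.sqrt(_variance(_diff(x)) / _variance(x))


def hjorth_complexity(x):
    """Mobility of the slope divided by the mobility of the signal."""
    return hjorth_mobility(_diff(x)) / hjorth_mobility(x)


# For a pure sine wave the complexity is close to 1 (its minimum value).
fs, f0 = 500, 5.0
sine = [math.sin(2 * math.pi * f0 * k / fs) for k in range(1000)]
mobility = hjorth_mobility(sine)
complexity = hjorth_complexity(sine)
```

Mobility here is expressed per sample; multiplying by the sampling frequency would give it in radians per second.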

EEG signals have spectral power that is distributed across all frequencies. However, a large portion of the power is confined to frequencies below 40 Hz, and the spectral edge frequency and spectral edge power measure how power is distributed within these frequencies. The spectral edge frequency is defined as the frequency below which x percent of the signal’s overall power is found, where x can take any value from 0 to 100. In the context of seizure prediction, x = 50% has been used successfully, giving the lowest frequency below which 50% of the total power of the 0–40 Hz band is contained.

The spectral edge power equals half of the signal’s power up to 40 Hz.
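The following sketch computes both quantities from a periodogram, assuming NumPy is available. Using the squared magnitude of the real FFT as the power estimate is an assumption; the paper does not state which spectral estimator was used.

```python
import numpy as np


def spectral_edge(segment, fs, percent=50.0, f_max=40.0):
    """Return (spectral edge frequency in Hz, spectral edge power).

    The edge frequency is the lowest frequency below which `percent` of the
    0..f_max power lies; the edge power is that share of the band power.
    """
    spectrum = np.abs(np.fft.rfft(segment)) ** 2
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    band = freqs <= f_max
    cumulative = np.cumsum(spectrum[band])
    target = cumulative[-1] * percent / 100.0
    # First bin at which the cumulative power reaches the target.
    idx = int(np.searchsorted(cumulative, target))
    return float(freqs[band][idx]), float(target)


# Synthetic 5-s segment: a strong 10 Hz component plus a weaker 30 Hz one.
fs = 256
t = np.arange(5 * fs) / fs
segment = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 30 * t)
sef, sep = spectral_edge(segment, fs)
```

In this synthetic case most of the power sits at 10 Hz, so the 50% edge frequency falls on that component.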

For seizure prediction, the spectral power in different frequency bands of the EEG was also taken into account. The transfer of power from lower to higher frequencies was seen before the seizure onset, according to Box et al. (2008). Each sub-spectral band’s power is computed by performing a Fast Fourier Transform (FFT) on the segmented signals and then summing the resulting Fourier coefficients within that sub-band. It has been suggested that normalized power values rather than absolute values can separate epileptogenic variations from brain signals generated by daily life. The use of normalized power reduces the dependence of sub-band power values on total power, resulting in better comparison metrics. Spectral power features are normalized by being divided by the total power.
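A hedged sketch of the normalized band-power features, assuming NumPy. The band edges below are made contiguous (theta up to 8 Hz, gamma up to the Nyquist frequency) so that every frequency bin belongs to exactly one band; this differs slightly from the ranges listed in Table 2 and is an assumption, as is the use of squared FFT magnitudes as power.

```python
import numpy as np

# Band edges in Hz; None means "up to the Nyquist frequency".
BANDS = {
    "delta": (0.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 12.0),
    "beta": (12.0, 30.0),
    "gamma": (30.0, None),
}


def relative_band_powers(segment, fs):
    """Power in each band divided by the total power of the segment."""
    spectrum = np.abs(np.fft.rfft(segment)) ** 2
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    total = spectrum.sum()
    powers = {}
    for name, (low, high) in BANDS.items():
        upper = fs / 2 + 1 if high is None else high
        mask = (freqs >= low) & (freqs < upper)
        powers[name] = float(spectrum[mask].sum() / total)
    return powers


# A pure 10 Hz tone should place almost all of its power in the alpha band.
fs = 256
t = np.arange(5 * fs) / fs
powers = relative_band_powers(np.sin(2 * np.pi * 10 * t), fs)
```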

In signal processing, the wavelet transform is one of the most frequently used mathematical transforms. This transform has a crucial role in processing nonstationary EEG signals due to its multi-resolution character. The wavelet transform allows for the time–frequency decomposition of a signal into several levels (sub-signals). The initial levels are associated with the signal’s high-frequency components, while the later levels are related to the signal’s lower-frequency components. A measure of the energy in different frequency ranges can be obtained by computing the energy of the signals generated by the decomposition. The Daubechies-4 (db4) wavelet was employed in this study since it had shown good EEG signal localization properties in the time and frequency domains (Petrosian et al. 2000).

Converting features to tensors

Before moving on to the next phase, the extracted linear features were normalized by applying Eq. (1) to each of the twenty-two features, and then tensors were built from the feature vectors.

$$x_{\mathrm{norm}} = \frac{x - \min(x)}{\max(x) - \min(x)} \tag{1}$$
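Equation (1) is ordinary min-max scaling applied to one feature at a time. A minimal sketch follows; the guard for a constant feature is an added assumption, since the equation is undefined when max(x) equals min(x).

```python
def min_max_normalize(values):
    """Scale one feature's values to the range [0, 1] following Eq. (1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: map a constant feature to zeros to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


normalized = min_max_normalize([2.0, 4.0, 6.0])  # [0.0, 0.5, 1.0]
```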

Since convolutional networks perform better with tensors and are essentially designed to process and classify images, in the next step, the groups of linear features extracted from each 5-s segment were converted into tensors. For example, tensors constructed from the 22 extracted linear features for patient code 2800 in the four groups (Interictal, Pre-ictal, Ictal, and Postictal) are rendered as images and shown in Fig. 2.

Fig. 2. Linear features extracted from the four states of epilepsy, converted to three-dimensional tensors

ImageNet models require input dimensions with specific standards, and input tensors must be three-dimensional. In this study, the features extracted from EEG signals were resized as suitable inputs for the ImageNet networks. The 22 features extracted from each of the EEG channels were in the form of vectors. After the features were normalized, the feature vectors were transformed into a tensor by changing the dimensions.

For example, for the patient with code number 2800, brain signals were recorded from 27 channels. Given that 22 features were extracted from each channel, the total number of features extracted from each 5-s segment was 22 × 27 = 594 for this patient. The obtained features were first normalized and then reshaped into a three-dimensional tensor suitable as input for the ImageNet networks; for this patient, the features of each segment were reshaped to a tensor of shape (18, 11, 3), since 18 × 11 × 3 = 594.
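The reshaping for patient 2800 can be sketched with NumPy. The paper gives only the target shape; the row-major element order used by `reshape` and the (channels, features) layout of the input matrix are assumptions.

```python
import numpy as np

n_channels, n_features = 27, 22

# Placeholder for the normalized features of one 5-s segment
# (one row per channel, one column per feature).
segment_features = np.random.default_rng(0).random((n_channels, n_features))

# 27 * 22 = 594 = 18 * 11 * 3, so the matrix can be reshaped without padding.
tensor = segment_features.reshape(18, 11, 3)
```

A stack of such tensors, one per segment, would then form the input batch for the convolutional networks after any resizing the chosen network requires.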

Designing ImageNet networks based models

Convolutional neural networks are a group of deep neural networks commonly used to analyze visual images. Their applications include image and video recognition, image classification, medical image analysis, and natural language processing. Three ImageNet deep convolutional networks were selected as the basis of the models; their characteristics are given in Table 3. Three different classifiers were examined as the last layers of each of these networks, leading to the design of 9 deep convolutional networks, as shown in Fig. 3.

Table 3. Details of ImageNet convolutional networks selected for this study

| Name of network | Year | Number of layers | Number of parameters (million) | Top-1 accuracy |
|---|---|---|---|---|
| Xception (Chollet 2017) | 2017 | 71 | 23 | 0.79 |
| EfficientNet-B0 (Tan and Le 2019) | 2019 | 237 | 7.8 | 0.844 |
| MobileNet-V2 (Sandler et al. 2018) | 2018 | 53 | 3.4 | 0.728 |

Fig. 3. ImageNet Convolution networks and classifiers (Fully Connected = FC, Support Vector Machine = SVM, Random Forest = RF)

The transfer learning method was used to reduce the number of learnable network parameters. The weights and parameters of the ImageNet networks were transferred, and only the input connections and the parameters of the replacement classifier were learned and optimized during the training process. The networks were implemented in Python with the TensorFlow and Keras libraries and executed on Google Colab GPUs.

After the CNN extracts complex features from the input tensors, these features must be classified by an appropriate method. To do this, the classification layers of the ImageNet networks were first removed and replaced with suitable classifiers. Three different types of classifiers were tested for this task. After removing the original classifier layers from the deep convolutional network (Xception, EfficientNet-B0, or MobileNet-V2), the outputs of the remaining network were connected to a Fully Connected (FC) classifier with a Flatten layer in between. The FC classifier contains two layers of 128 neurons each, connected to an output layer with four neurons. The categorical_crossentropy function was used to calculate the loss for the designed networks, and the Adadelta optimizer was used for training. The model saw all of the training data 10 times (epochs = 10), and the training data were fed to the network in batches of 32 samples (batch size = 32).

Since its introduction, the SVM has been used for classification (Piryonesi and El-Diraby 2020) and has become one of the most widely used machine learning classifiers for predicting epileptic seizures. SVM uses special kernel functions to classify data with nonlinear boundaries, mapping the data to a higher-dimensional feature space where linear separation is possible. In our study, a Radial Basis Function (RBF) kernel was used:

$$K(x, y) = \exp\left(-\frac{\lVert x - y \rVert^{2}}{2\sigma^{2}}\right) \tag{2}$$

σ is a scale parameter, and x and y are feature vectors in the input space. The Sklearn library was used to design and build the SVM classifier.
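For reference, the RBF kernel can be written out directly as below. This is only an illustration of the formula, not the code used in the study, and `rbf_kernel` is a hypothetical helper name.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


same = rbf_kernel([1.0, 2.0], [1.0, 2.0])           # identical vectors
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma=1.0)  # squared distance of 2
```

In scikit-learn's `SVC`, the RBF kernel is parameterized by `gamma` rather than σ; the two are related by gamma = 1 / (2σ²).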

Random forest (RF) is an ensemble learning method for classification and regression based on many decision trees. The decision tree is a popular method for many types of machine learning tasks, but in many cases it is not accurate: a tree that is grown too deep tends to learn irregular patterns and over-fit the training data, and in such cases an RF is a good alternative (Hastie et al. 2009). The scikit-learn (sklearn) library was used to design and construct the RF classifier, with the number of estimators set to 10.

Training and testing patient-dependent models

Two indicators of a seizure prediction model's performance are its sensitivity and its FPR. Sensitivity measures the algorithm’s ability to predict seizures correctly; ideally, the number of predicted seizures should not differ from the actual number of seizures. The FPR is one of the most essential and practical criteria. In any forecasting system, error is inevitable, and the FPR expresses this error and shows how reliable the system is. The FPR is the number of false predictions per unit of time and is reported here per hour (h−1).
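The two metrics can be sketched as below. Since FPR values are reported in h−1 throughout the paper, the sketch assumes the FPR is the number of false predictions divided by the duration of the evaluated recording in hours; the example counts are hypothetical and not taken from the results.

```python
def sensitivity(predicted_seizures, total_seizures):
    """Fraction of test seizures that were correctly predicted."""
    return predicted_seizures / total_seizures


def false_prediction_rate(false_predictions, recording_hours):
    """False predictions per hour of evaluated recording (h^-1)."""
    return false_predictions / recording_hours


# Hypothetical example: 9 of 10 seizures predicted, 3 false alarms in 100 h.
sens = sensitivity(9, 10)
fpr = false_prediction_rate(3, 100.0)
```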

Transfer learning is flexible: pre-trained models can be used directly as feature extractors or integrated into entirely new models, which makes it possible to build accurate models in less time. Instead of starting the learning process from scratch, transfer learning begins with patterns learned while solving a different problem, so a new model is built on existing knowledge. This study used models pre-trained on a large benchmark dataset to solve a similar problem. Because training such models is computationally expensive, it is common practice to reuse models from the literature.

To repurpose a pre-trained model for our goals, we removed the original classifier, added a new classifier appropriate for seizure prediction, and trained the models while keeping the convolutional layers frozen. The fundamental concept is to preserve the convolutional base in its original state and feed its outputs to the classifier. For all nine proposed models, the parameters and weights of the ImageNet networks were transferred, and only the classifier's parameters and the network's hyperparameters were learned during training.

Models were first trained on the data from three of the patient's seizures and then tested on the remaining seizures. Of the training data, 20% was set aside for validation and hyperparameter tuning. Performance results are reported on the test data of each patient. In the training phase, the training data were first shuffled randomly and then given to the networks, to eliminate the effect of data order on training.
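The data split described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the function name, the representation of the data as one list of segments per seizure, and the fixed random seed are hypothetical, and the sketch assumes that shuffling happens before the 20% validation portion is taken.

```python
import random


def split_patient_segments(segments_by_seizure, n_train_seizures=3,
                           val_fraction=0.2, seed=0):
    """Split one patient's segments into training, validation, and test sets.

    segments_by_seizure: list with one list of segments per seizure, in
    chronological order. The first n_train_seizures seizures feed training
    and validation; the remaining seizures form the test set.
    """
    train_pool = [segment
                  for seizure in segments_by_seizure[:n_train_seizures]
                  for segment in seizure]
    test = [segment
            for seizure in segments_by_seizure[n_train_seizures:]
            for segment in seizure]

    # Shuffle so that the order of the segments does not influence training.
    rng = random.Random(seed)
    rng.shuffle(train_pool)

    n_val = int(round(len(train_pool) * val_fraction))
    validation, train = train_pool[:n_val], train_pool[n_val:]
    return train, validation, test


# Hypothetical patient with five seizures and ten segments per seizure.
example = [[(seizure, i) for i in range(10)] for seizure in range(5)]
train_set, val_set, test_set = split_patient_segments(example)
```

Keeping whole seizures out of the training pool, as in this sketch, ensures that the test set contains only seizures the model has never seen.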

Training and testing patient-independent model

In the patient-independent model, the designed network must be trained with the data from one or more patients and then tested on the data from the remaining patients. Seven patients with 27 or fewer electrodes (Table 1) were selected to test the feasibility of patient-independent seizure prediction. For three patients with fewer channels (300, 32502, and 21602), the missing channels were filled by zero-padding so that the tensors have equal sizes for all patients participating in patient-independent seizure prediction.
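The zero-padding step can be illustrated as follows. This is a minimal sketch under assumptions: the function name is hypothetical, each segment is represented as a list of channels with one feature vector per channel (the actual tensor layout may differ), and the target of 27 channels is inferred from the selection criterion above rather than stated as the padded size.

```python
def zero_pad_channels(segment, target_channels=27):
    """Append all-zero channels so that the segment has target_channels rows.

    segment: list of channels, each a list of feature values of equal length.
    """
    n_channels = len(segment)
    if n_channels > target_channels:
        raise ValueError("segment has more channels than the target size")
    n_features = len(segment[0]) if segment else 0
    padding = [[0.0] * n_features
               for _ in range(target_channels - n_channels)]
    # Copy the existing rows so that the input is left unmodified.
    return [list(channel) for channel in segment] + padding


# Hypothetical segment with 25 channels and 22 features per channel.
short_segment = [[1.0] * 22 for _ in range(25)]
padded = zero_pad_channels(short_segment)   # 27 channels, last two all zeros
```

Padding with zeros keeps the original channel values unchanged while giving every patient the same input size, which a shared network requires.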

From the nine designed models, three high-performance models were trained and tested for patient-independent seizure prediction. For training, only the first three seizures of each patient were used. A rotation technique was used in which each patient in turn supplied the training data: the model is trained on the data from one patient at a time and tested on the data from the rest of the patients. The average performance over all rotations was then calculated and reported as the overall performance of the patient-independent learning method.
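The rotation technique can be sketched as a simple loop. This is an illustrative sketch, not the authors' code: `rotation_evaluate`, `train_fn`, and `eval_fn` are hypothetical names, and the training and evaluation functions are placeholders that a real pipeline would replace with network training and sensitivity or FPR computation.

```python
def rotation_evaluate(patients, train_fn, eval_fn):
    """Train on one patient at a time and test on every other patient.

    Returns the mean score per target patient (averaged over the models
    trained on each of the other patients) and the overall mean.
    """
    if len(patients) < 2:
        raise ValueError("at least two patients are required")
    scores_per_target = {patient: [] for patient in patients}
    for train_patient in patients:
        model = train_fn(train_patient)
        for target in patients:
            if target == train_patient:
                continue  # never test on the patient used for training
            scores_per_target[target].append(eval_fn(model, target))
    mean_per_target = {patient: sum(scores) / len(scores)
                       for patient, scores in scores_per_target.items()}
    overall = sum(mean_per_target.values()) / len(mean_per_target)
    return mean_per_target, overall


# Placeholder functions and made-up scores, for illustration only.
toy_scores = {"A": 0.9, "B": 0.8, "C": 0.7}
per_patient, overall_mean = rotation_evaluate(
    ["A", "B", "C"],
    train_fn=lambda patient: patient,
    eval_fn=lambda model, target: toy_scores[target],
)
```

With seven patients, this loop trains seven models and gives each target patient six test results, which are then averaged.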

Results

For each patient, the highest sensitivity achieved among all nine proposed models and the different pre-ictal states is given in Table 4.

Table 4. The highest sensitivity achieved across the four pre-ictal times (10, 20, 30, 40 min) for each patient among all nine patient-dependent models

| Patient code | Pre-ictal (min) | ImageNet network | Classifier | Sensitivity (%) |
|---|---|---|---|---|
| 300 | 40 | Xception | Fully connected | 94.86 |
| 32502 | 20 | MobileNet-V2 | Fully connected | 99.01 |
| 21602 | 40 | Xception/MobileNet-V2 | Fully connected | 98.11 |
| 1326003 | 40 | MobileNet-V2 | Fully connected | 99.13 |
| 1327903 | 20 | MobileNet-V2 | Fully connected | 99.24 |
| 1328803 | 30 | MobileNet-V2 | Fully connected | 99.66 |
| 2800 | 40 | MobileNet-V2 | Fully connected | 99.83 |
| 1329003 | 30 | MobileNet-V2 | Fully connected | 99.21 |
| 107802 | 40 | Xception | Fully connected | 98.61 |
| 1236703 | 40 | MobileNet-V2 | Fully connected | 99.02 |

The effect of pre-ictal state

Four different pre-ictal durations were investigated to find the optimal one. Figure 4 shows the results obtained from the different models as the pre-ictal duration increases.

Fig. 4. The average sensitivity of the nine proposed models in four different pre-ictal periods with deep CNNs (blue: Xception; red: EfficientNet-B0; green: MobileNet-V2) and classifiers (square: Fully Connected; circle: SVM; star: Random Forest). (Color figure online)

We expected models with the same classifier but different transferred ImageNet networks to produce similar results, because the ImageNet networks share many similarities and were trained and tuned on the same image database.

For the deep networks with FC classifiers, the average sensitivity improves as the pre-ictal duration increases, whereas the models that use SVM and RF classifiers perform better closer to the seizure and make their best prediction 10 min before the seizure. The networks with FC classifiers performed markedly better than those with the other two classifiers. For ten patients and a pre-ictal scheme of 40 min, the Xception network with an FC classifier showed the best prediction performance among the nine proposed models, with an average sensitivity of 98.468% and an FPR of 0.031 h−1.

Effect of the ImageNet models

Figure 5 shows the models' average sensitivity rates across patients, without considering the effect of pre-ictal length and averaging over the best model results obtained for each of the six patients, so that the models can be compared based on their structure. MobileNet-V2 networks combined with FC classifiers achieved the highest prediction rates. Although EfficientNet-B0 had the highest number of parameters among the three proposed ImageNet networks, it could not produce better results than its competitor networks.

Fig. 5. Average sensitivity of the patient-dependent models for each of the nine proposed models (details of each model are shown in Fig. 3) on the test set, across all four pre-ictal durations

The effect of the classifier

The results showed that the choice of classifier strongly affected seizure prediction. Table 4 and Fig. 4 show that the FC classifiers performed markedly better than the other two classifiers. The FC classifiers, which have more parameters than the SVM or RF classifiers, can make better use of the features supplied by the CNN and better capture the differences between features.

Patient-independent model

All of the proposed patient-dependent networks show a noticeable tendency towards longer pre-ictal intervals, and the optimal pre-ictal length becomes clear when the model is used patient-independently. With a pre-ictal length of 40 min, an average sensitivity of 98.39% and an FPR of 0.029 h−1 were achieved, which indicates that it is possible to design and build a single model for all patients (Tables 5, 6, 7).

Table 5. Performance of the patient-independent model (MobileNet-V2 and fully connected classifier)

| Patient code | Sensitivity (%), 10 min | Sensitivity (%), 20 min | Sensitivity (%), 30 min | Sensitivity (%), 40 min | FPR (h−1), 10 min | FPR (h−1), 20 min | FPR (h−1), 30 min | FPR (h−1), 40 min |
|---|---|---|---|---|---|---|---|---|
| 300 | 90.02 | 93.07 | 93.61 | 93.84 | 0.109 | 0.124 | 0.137 | 0.139 |
| 2800 | 99.62 | 99.80 | 99.69 | 99.71 | 0.002 | 0.003 | 0.004 | 0.004 |
| 32502 | 97.83 | 98.44 | 98.64 | 99.36 | 0.007 | 0.009 | 0.011 | 0.015 |
| 21602 | 96.18 | 97.86 | 98.03 | 98.11 | 0.012 | 0.014 | 0.017 | 0.020 |
| 1326003 | 96.63 | 98.94 | 99.25 | 99.13 | 0.009 | 0.008 | 0.009 | 0.012 |
| 1327903 | 99.54 | 99.20 | 99.51 | 99.20 | 0.003 | 0.005 | 0.005 | 0.007 |
| 1328803 | 99.30 | 99.40 | 99.39 | 99.39 | 0.003 | 0.003 | 0.004 | 0.007 |
| Average | 97.24 | 98.08 | 98.32 | 98.39 | 0.020 | 0.023 | 0.026 | 0.029 |

For each specific patient, the model was trained six times, each time with the data from one of the six other patients, and tested each time on the target patient's data; the average of the six tests is reported.

Table 6. Performance of the patient-independent model (Xception and fully connected classifier)

| Patient code | Sensitivity (%), 10 min | Sensitivity (%), 20 min | Sensitivity (%), 30 min | Sensitivity (%), 40 min | FPR (h−1), 10 min | FPR (h−1), 20 min | FPR (h−1), 30 min | FPR (h−1), 40 min |
|---|---|---|---|---|---|---|---|---|
| 300 | 88.05 | 92.43 | 93.01 | 93.23 | 0.120 | 0.128 | 0.136 | 0.143 |
| 2800 | 99.50 | 99.71 | 99.68 | 99.69 | 0.001 | 0.002 | 0.002 | 0.003 |
| 32502 | 97.67 | 98.63 | 98.48 | 98.44 | 0.010 | 0.012 | 0.013 | 0.020 |
| 21602 | 98.44 | 98.00 | 98.31 | 98.32 | 0.016 | 0.016 | 0.022 | 0.026 |
| 1326003 | 96.64 | 98.43 | 99.14 | 99.15 | 0.014 | 0.015 | 0.017 | 0.021 |
| 1327903 | 98.86 | 99.27 | 99.20 | 99.25 | 0.004 | 0.004 | 0.006 | 0.007 |
| 1328803 | 99.21 | 99.42 | 99.42 | 99.44 | 0.004 | 0.005 | 0.007 | 0.008 |
| Average | 96.91 | 97.98 | 98.18 | 98.22 | 0.024 | 0.026 | 0.029 | 0.032 |

For each specific patient, the model was trained six times, each time with the data from one of the six other patients, and tested each time on the target patient's data; the average of the six tests is reported.

Table 7. Performance of the patient-independent model (EfficientNet-B0 and fully connected classifier)

| Patient code | Sensitivity (%), 10 min | Sensitivity (%), 20 min | Sensitivity (%), 30 min | Sensitivity (%), 40 min | FPR (h−1), 10 min | FPR (h−1), 20 min | FPR (h−1), 30 min | FPR (h−1), 40 min |
|---|---|---|---|---|---|---|---|---|
| 300 | 89.21 | 91.13 | 92.77 | 92.97 | 0.078 | 0.106 | 0.118 | 0.148 |
| 2800 | 99.34 | 99.67 | 99.67 | 99.67 | 0.002 | 0.003 | 0.004 | 0.005 |
| 32502 | 97.08 | 98.44 | 98.21 | 98.44 | 0.010 | 0.013 | 0.023 | 0.018 |
| 21602 | 95.86 | 97.85 | 98.03 | 98.10 | 0.014 | 0.020 | 0.026 | 0.032 |
| 1326003 | 97.38 | 98.93 | 99.68 | 99.12 | 0.014 | 0.017 | 0.018 | 0.022 |
| 1327903 | 98.47 | 99.19 | 99.19 | 99.21 | 0.005 | 0.007 | 0.010 | 0.012 |
| 1328803 | 98.81 | 99.38 | 99.38 | 99.38 | 0.004 | 0.006 | 0.008 | 0.010 |
| Average | 96.59 | 97.80 | 98.13 | 98.13 | 0.018 | 0.024 | 0.029 | 0.035 |

For each specific patient, the model was trained six times, each time with the data from one of the six other patients, and tested each time on the target patient's data; the average of the six tests is reported.

Discussion

There are several reasons why analyzing electrophysiological signals is the best method for studying effective connectivity. The first is that neural activity is measured at a group level. The second is the temporal resolution, which is compatible with neural processing times in the millisecond range. These signals can be measured with invasive or noninvasive approaches. Data collected by invasive procedures, such as implanting electrodes in the brain, are of high quality and spatial precision. Noninvasive approaches, on the other hand, are widely employed because of their high sampling frequency and because source reconstruction techniques provide a higher signal-to-noise ratio and spatial resolution. In this study, the models were trained and tested with noninvasive data from the Epilepsia dataset. We assumed that the first practical requirements for seizure prediction are safety and noninvasiveness.

In this study, we investigated three problems: (1) finding a suitable patient-dependent model for predicting seizures, (2) investigating the effect of the pre-ictal duration, and (3) developing a patient-independent model for predicting seizures in general. For the first problem, nine different models were tested, among which the Xception model with an FC classifier predicted seizures with the highest average sensitivity. For the second problem, increasing the pre-ictal duration reduced the sensitivity of the models using the SVM and RF classifiers, whereas the FC classifiers tended towards increasing sensitivity. Our findings support the importance of the significant changes that occur near seizure onset, which can be useful in seizure prediction. For the third problem, three models were trained with the data of individual patients and tested on the data from other patients. MobileNet-V2 with an FC classifier produced acceptable results, indicating that patient-independent seizure prediction is possible.

In this study, for the first time, ImageNet convolutional networks were used for seizure prediction. Previously, researchers designed convolutional networks manually and adjusted the network details through trial and error. In addition, extracting a limited number of features instead of using the raw, nonstationary EEG signal dramatically reduces the processing required, which is promising for an online, real-time, and practical model for patients.

Patient-dependent

The results of our study can be compared with the research conducted by Rasekhi et al. (2013), in which the same 22 linear features were extracted from EEG signals. Table 8 shows that the results improve noticeably when a deep ImageNet CNN and the transfer learning method are applied.

Table 8. Comparison of the proposed model in this study and recent studies based on deep learning to predict seizures (patient-dependent)

| Year | Authors | Database | Method | Sensitivity (%) | FPR (h−1) | Pre-ictal (min) |
|---|---|---|---|---|---|---|
| 2009 | Mirowski et al. (2009) | Freiburg (10 patients) | Bivariate features + CNN | 71 | 0 | |
| 2013 | Rasekhi et al. (2013) | Epilepsia (10 patients) | Linear features + SVM | 73.9 | 0.15 | |
| 2016 | Zhang and Parhi (2016) | Freiburg + CHB-MIT | Spectral Power + SVM | 98.68 | 0.046 | 42.7 |
| 2018 | Shahbazi and Aghajan (2018) | CHB-MIT (14 patients) | STFT + CNN + LSTM | 98.2 | 0.13 | 44.74 |
| 2019 | Daoud and Bayoumi (2019) | CHB-MIT (8 patients) | DCAE + Bi-LSTM | 99.72 | 0.004 | 60 |
| 2021 | This work | Epilepsia (10 patients) | Linear features + Xception + FC | 98.47 | 0.031 | 40 |

Patient-independent

Seizure prediction studies have been primarily patient-dependent. The patient-independent approach has attracted considerable interest following the success of artificial neural networks. Among the studies compared, this study predicted seizures with the highest sensitivity, using a MobileNet-V2 network in combination with an FC classifier (Table 9).

Table 9. Comparison between our model and recent studies (patient-independent)

| Year | Authors | Database | Method | Sensitivity (%) | FPR (h−1) | Pre-ictal (min) |
|---|---|---|---|---|---|---|
| 2017 | Khan et al. (2017) | CHB-MIT (13 patients) | Hand-extracted features + LSTM | 87.8 | 0.142 | 10 |
| 2017 | Tsiouris et al. (2017) | CHB-MIT (24 patients) | Feature selection + SVM | 68.5 | | 120 |
| 2020 | Dissanayake et al. (2020) | CHB-MIT (10 patients) | CNN | 97.88 | | 15 |
| 2021 | This work | Epilepsia (7 patients) | Linear features + MobileNet-V2 + FC | 98.39 | 0.029 | 40 |

Conclusion

In this paper, a novel deep transfer-learning method for both patient-specific and patient-independent epileptic seizure prediction from long-term scalp EEG data was proposed. This method achieved a prediction sensitivity of 98.39%, a false alarm rate of 0.029 h−1, and a prediction time of 40 min prior to seizure onset. A total of nine models, combining three convolutional networks with three classifiers, were used. The proposed models were trained and tested to achieve high sensitivity and low FPR on the EEG recordings of ten patients. For the first time, transfer learning from ImageNet networks was implemented in these models to predict epileptic seizures. For patient-independent prediction, the optimal model was trained on the data from a single patient and tested on the data from six other patients. The results indicated that our model generalizes well across patients.

In future studies, nonlinear, bivariate, and multivariate features extracted from two or more channels can be added to the linear features already employed, and their effects on seizure prediction can be investigated. Reducing the number of EEG channels may improve the model's speed and performance. The proposed model's efficiency in predicting seizures can also be examined using raw EEG signals instead of the extracted features. In addition, the degree of patient-independence of the proposed model can be assessed on new patient data as well as across different types of epilepsy.

Acknowledgements

This study followed a previous project at the Babol Noshirvani University of Technology and used the same limited dataset of EEG features, collected under the FP7 Epilepsia Project and made available to Dr. Rasekhi through a research visit to the Department of Informatics at the University of Coimbra, Portugal. We again thank Professor António Dourado and Dr. C.A. Teixeira from UC for their essential support and priceless contribution. We also thank Google for providing GPU access at ‘colab.research.google.com’, which made it possible to run the related programs.

References

  1. Aarabi A, He B. Seizure prediction in patients with focal hippocampal epilepsy. Clin Neurophysiol. 2017;128(7):1299–1307. doi: 10.1016/j.clinph.2017.04.026.
  2. Ahmadi A, Soltanian-Zadeh H. Epileptic seizure prediction using spectral entropy-based features of EEG. In: 2019 4th international conference on pattern recognition and image analysis (IPRIA). IEEE; 2019.
  3. Altunay S, Telatar Z, Erogul O. Epileptic EEG detection using the linear prediction error energy. Expert Syst Appl. 2010;37(8):5661–5665. doi: 10.1016/j.eswa.2010.02.045.
  4. Bandarabadi M, et al. Epileptic seizure prediction using relative spectral power features. Clin Neurophysiol. 2015;126(2):237–248. doi: 10.1016/j.clinph.2014.05.022.
  5. Box GEP, Jenkins GM, Reinsel GC. Time series analysis: forecasting and control. 4th ed. Hoboken: Wiley; 2008. ISBN 978-0-470-27284-8.
  6. Çetin M. Model-based robust suppression of epileptic seizures without sensory measurements. Cogn Neurodyn. 2020;14(1):51–67. doi: 10.1007/s11571-019-09555-8.
  7. Chollet F. Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017.
  8. Daoud H, Bayoumi M. Efficient epileptic seizure prediction based on deep learning. IEEE Trans Biomed Circuits Syst. 2019;13(5):804–813.
  9. Dissanayake T, Fernando T, Denman S, Sridharan S, Fookes C. Patient-independent epileptic seizure prediction using deep learning models. arXiv preprint; 2020. https://arxiv.org/abs/2011.09581
  10. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. Berlin: Springer Science & Business Media; 2009.
  11. Hjorth B. EEG analysis based on time domain properties. Electroencephalogr Clin Neurophysiol. 1970;29:306–310. doi: 10.1016/0013-4694(70)90143-4.
  12. Khan H, et al. Focal onset seizure prediction using convolutional networks. IEEE Trans Biomed Eng. 2017;65(9):2109–2118. doi: 10.1109/TBME.2017.2785401.
  13. Klatt J, Feldwisch-Drentrup H, Ihle M, Navarro V, Neufang M, Teixeira C, Adam C, et al. The EPILEPSIAE database: an extensive electroencephalography database of epilepsy patients. Epilepsia. 2012;53:1669–1676. doi: 10.1111/j.1528-1167.2012.03564.x.
  14. Lehnertz K, Mormann F, Osterhage H, Müller A, Prusseit J, Chernihovskyi A, Staniek M, Krug D, Bialonski S, Elger CE. State-of-the-art of seizure prediction. J Clin Neurophysiol. 2007;24(2):147–153. doi: 10.1097/WNP.0b013e3180336f16.
  15. Li S, Zhou W, Yuan Q, Liu Y. Seizure prediction using spike rate of intracranial EEG. IEEE Trans Neural Syst Rehabil Eng. 2013;21(6):880–886. doi: 10.1109/TNSRE.2013.2282153.
  16. Litt B, Esteller R, Echauz J, D’Alessandro M, Shor R, Henry T, et al. Epileptic seizures may begin hours in advance of clinical onset: a report of five patients. Neuron. 2001;30:51–64. doi: 10.1016/S0896-6273(01)00262-8.
  17. Mirowski P, Madhavan D, LeCun Y, Kuzniecky R. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol. 2009;120(11):1927–1940. doi: 10.1016/j.clinph.2009.09.002.
  18. Mormann F, Lehnertz K, David P, Elger CE. Mean phase coherence as a measure for phase synchronization and its application to the EEG of epilepsy patients. Physica D. 2000;144(3–4):358–369. doi: 10.1016/S0167-2789(00)00087-7.
  19. Mormann F, Kreuz T, Rieke C, Andrzejak RG, Kraskov A, David P, et al. On the predictability of epileptic seizures. Clin Neurophysiol. 2005;116:569–587. doi: 10.1016/j.clinph.2004.08.025.
  20. Petrosian A, Prokhorov D, Homan R, Dasheiff R, Wunsch D. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing. 2000;30:201–218. doi: 10.1016/S0925-2312(99)00126-5.
  21. Piryonesi SM, El-Diraby TE. Data analytics in asset management: cost-effective prediction of the pavement condition index. J Infrastruct Syst. 2020;26(1):04019036. doi: 10.1061/(ASCE)IS.1943-555X.0000512.
  22. Rajdev P, Ward M, Rickus J, Worth R, Irazoqui P. Realtime seizures prediction from local field potentials using an adaptive Wiener algorithm. Comput Biol Med. 2010;40(1):97–108. doi: 10.1016/j.compbiomed.2009.11.006.
  23. Rasekhi J, et al. Preprocessing effects of 22 linear univariate features on the performance of seizure prediction methods. J Neurosci Methods. 2013;217(1–2):9–16. doi: 10.1016/j.jneumeth.2013.03.019.
  24. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C. Mobilenetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2018. pp 4510–4520.
  25. Shahbazi M, Aghajan H. A generalizable model for seizure prediction based on deep learning using CNN-LSTM architecture. In: 2018 IEEE global conference on signal and information processing (GlobalSIP). IEEE; 2018.
  26. Tan M, Le QV. Efficientnet: rethinking model scaling for convolutional neural networks. arXiv preprint; 2019. https://arxiv.org/abs/1905.11946
  27. Truong ND, et al. Convolutional neural networks for seizure prediction using intracranial and scalp electroencephalogram. Neural Netw. 2018;105:104–111. doi: 10.1016/j.neunet.2018.04.018.
  28. Tsiouris KM, Pezoulas VC, Koutsouris DD, Zervakis M, Fotiadis DI. Discrimination of pre-ictal and interictal brain states from long-term EEG data. In: 2017 IEEE 30th international symposium on computer-based medical systems (CBMS). IEEE; 2017. pp 318–323.
  29. Zhang Z, Parhi KK. Low-complexity seizure prediction from iEEG/sEEG using spectral power and ratios of spectral power. IEEE Trans Biomed Circuits Syst. 2016;10(3):693–706. doi: 10.1109/TBCAS.2015.2477264.
