Ensemble Classifier for Epileptic Seizure Detection for Imperfect EEG Data

Khalid Abualsaud; Massudi Mahmuddin; Mohammad Saleh; Amr Mohamed

doi:10.1155/2015/945689

2015 Feb 4;2015:945689. doi: 10.1155/2015/945689

PMCID: PMC4334942 PMID: 25759863

Abstract

Brain status information is captured by physiological electroencephalogram (EEG) signals, which are extensively used to study different brain activities. This study investigates the use of a new ensemble classifier to detect an epileptic seizure from compressed and noisy EEG signals. This noise-aware signal combination (NSC) ensemble classifier combines four classification models based on their individual performance. The main objective of the proposed classifier is to enhance the classification accuracy in the presence of noisy and incomplete information while preserving a reasonable amount of complexity. The experimental results show the effectiveness of the NSC technique, which yields higher accuracies of 90% for noiseless data compared with 85%, 85.9%, and 89.5% in other experiments. The accuracy for the proposed method is 80% when SNR = 1 dB, 84% when SNR = 5 dB, and 88% when SNR = 10 dB, while the compression ratio (CR) is 85.35% for all of the datasets mentioned.

1. Introduction

Brain status information is captured by physiological electroencephalogram (EEG) signals, which are extensively used to study different brain activities. In particular, they provide important information pertaining to epileptic seizure disease, as reported previously [1–3]. Epilepsy is a neurological disorder involving disturbances to the nervous system that are induced by brain damage. It has been reported [4] that 1% of the population worldwide is affected by this disease. Visual inspection of EEG signals can be very difficult and time consuming due to the difficulty of maintaining a high level of concentration during a lengthy inspection; this difficulty increases operator errors [5, 6]. Therefore, artificial intelligence techniques are proposed to enhance the process of epileptic seizure detection.

Recently, ensemble methods for EEG signal classification have attracted growing attention from both academia and industry. Sun et al. [7] evaluated the performance of three popular ensemble methods, namely, bagging, boosting, and random subspace ensembles. They reported that the capability of the ensemble methods is subject to the type of base classifiers, particularly the settings and parameters used for each individual classifier. Dehuri et al. [8] presented the ensemble of radial basis function neural networks (RBFNs) method to identify epileptic seizures. This method was based on the bagging approach and used differential evolution- (DE-) RBFNs as the base classifier. He et al. [9] proposed a signal-strength-based combining (SSC) method to support decision making in EEG classification. The results show that the proposed SSC method is competitive with the existing classifiers. Wang et al. [5] proposed a bag-of-words model for biomedical EEG and ECG time series that are represented as a histogram of the code words. The results of the proposed model are insensitive to the used parameters and are also robust to noise.

Feature extraction techniques proposed in the literature can generally be categorized into time-domain- or frequency-domain-based according to the features used. These techniques were used in several research works [10, 11].

Time-domain features are easily computed, and their time complexity is usually manageable [10]. Vidaurre et al. [12] proposed a time-domain-parameter- (TDP-) based feature extraction method. It is a generalized form of the Hjorth parameter and can be computed efficiently. The TDP feature is then fed to a linear discriminant analysis (LDA) classifier that is utilized in a brain computer interface application. Mohamed et al. [13] proposed five time-domain features, namely, sum, average, standard deviation, zero crossing, and energy. Subsequently, they used a set of classifiers to detect epileptic seizures. The output of the classifiers was then combined, using the Dempster rule of combination, for a final system decision. A classification accuracy of 89.5% was achieved. Nigam and Graupe [14] proposed an automated neural network-based epileptic seizure detection model, called LAMSTAR. Two features, namely, the relative spike amplitude and the spike rhythmicity of the EEG signals, were calculated and utilized to train the neural network.

Frequency-domain features are usually obtained by transforming EEG signals into their basic frequency components [6]. The characteristics of these components primarily fall within four frequency bands [15]. One classification system uses a one-second time window to extract relevant features [16]. The fast Fourier transformation (FFT) is used to transform the data in the window into the frequency domain. To distinguish between several brain states, frequency components from 9 to 28 Hz were studied and presented to a modified version of Kohonen's learning vector quantization classifier. Wang et al. [17] proposed an EEG classification system for epileptic seizure detection. It consists of three main stages, namely, (1) the best basis-based wavelet packet entropy method, which is used to represent EEG signals by wavelet packet coefficients; (2) a k-NN classifier with the cross-validation method in the training stage of hierarchical knowledge base (HKB) construction; and (3) the top-ranked discriminating rules from the HKB used in the testing stage to compute the classification accuracy and rejection rate. They reported a classification accuracy of close to 100%; however, their experiments considered only healthy subjects which is class A and epileptic seizure active subjects which is class E data and never considered seizure-free intervals which are class C or class D. Trivially, neglecting such classes eliminated the main source of difficulty in this classification process. Moreover, the data of their experiments is only noiseless and used a single classifier, k-NN. Bajaj and Pachori [18] proposed a new method for classifying seizure and nonseizure states. The method used the empirical mode decomposition (EMD) technique based on bandwidth features. The features were used as an input to a least squares SVM classifier. Sharma et al. [19] also presented a classification method of two focal and nonfocal EEG signals. 
Data from five epilepsy patients with longstanding drug resistance were used to test the method. The only base classifier used was the least squares support vector machine (LS-SVM). Average sample entropy and average variance of the intrinsic mode functions (IMFs) were obtained from the EMD of the EEG signals. The results show that the proposed method gives a classification accuracy of 85%. The second-order difference plot of the IMFs [20] has been used as a feature for epileptic seizure classification. The area computed from the diagnostic signal shows that IMF detection is a significant parameter for the analysis of both healthy and unhealthy subjects [21]. The mean frequency of the IMFs has been proposed as a feature for discriminating between ictal and seizure-free EEG signals [22]. Wavelet and multiwavelet transforms have also been applied to the time-frequency analysis and classification of epileptic seizure EEG signals [23]. However, these methods used only noiseless data, whereas this research uses both noiseless and noisy data. They also relied on the LS-SVM as the only base classifier, whereas this research uses four different classifiers. In another research work [24], the discrete wavelet transform (DWT) was used to transform EEG signals into their frequency components. For each wavelet subband, the maximum, minimum, mean, and standard deviation were then calculated and used as an input vector for a set of classifiers. The results revealed that the neural network classifier outperformed other classifiers with a 95% accuracy rate, while the k-NN classifier was more tolerant to imperfect data.

Other reported techniques utilize a mix of time- and frequency-domain features, such as in Valderrama et al. [25]. The first, second, third, and fourth statistical moments (i.e., mean, variance, skewness, and kurtosis, resp.) were extracted using the EEG amplitudes. Along with these time-domain features, energy and other frequency-domain features were extracted. A support vector machine (SVM) was then applied to the obtained features for seizure classification. Weng and Khorasani [26] proposed a method that uses the average EEG amplitude, average EEG duration, coefficient of variation, dominant frequency, and average power spectrum as features that are input to an adaptive structured neural network.

The classification techniques reported in the literature provide satisfactory performance provided that the EEG data are not contaminated. Even when raw, artifact-free EEG datasets are used, lossy compression introduces signal distortion that affects the reconstructed data. Wireless EEG data are often compressed before transmission, which means that some important information may be lost during the reconstruction process on the receiver side. Moreover, a wireless channel may compound the transmission problem by adding noise artifacts to the transmitted data. Therefore, a prospective classification technique should account for the uncertainty in the EEG data to guarantee the targeted performance.

Because the EEG signal is by nature bandwidth hungry, several works have considered in-network processing, either compressing the EEG data [27] or transferring EEG features instead of delivering the raw uncompressed signal [28]. In addition, because the sensor is battery-operated, transmitting the data without compression drains the battery faster. Therefore, we propose a unified framework in which the EEG data are compressed using compressive sensing (CS) and sent over two types of channels: a noiseless channel and an additive white Gaussian noise (AWGN) wireless channel in three cases, where SNR = 1, 5, and 10 dB. On the receiver side, the compressed data are reconstructed and statistical features are extracted. The data obtained are therefore contaminated by both the reconstruction and the different noise levels. A distinguishing factor of this research work is the proposal of a new framework and a new noise-aware signal combination (NSC) method that improves the classification of the reconstructed and noisy EEG data. The unified framework comprises a compressive sensing-based technique for sending compressed EEG data over an AWGN wireless channel, reconstruction, and feature extraction using time-frequency-domain analysis in preparation for data classification. Such a framework makes this work more practical because it performs classification while considering the data imperfection due to compression and wireless channel transmission.

Thus, the main contributions of this paper are as follows: (1) a framework for EEG compression and classification using CS and AWGN channel transmission has been developed, (2) a new noise-aware signal combination (NSC) method that supports both types of biomedical reconstructed EEG data, noiseless and noisy, has been proposed, and (3) a series of comprehensive experiments are conducted to investigate the effectiveness and robustness of the NSC method for classifying EEG signals.

The remainder of this paper is structured as follows. In Section 2, we present an EEG-based framework, including compressive sensing, the DCT method, and feature extraction, as well as the set of classifiers that have been used. Section 3 describes the proposed system model, which mainly includes an ensemble classification method, a description of the EEG datasets, an epileptic seizure detection system model, and a proposed noise-aware signal combination (NSC) method. The results and discussions of extensive experiments investigating the effectiveness and robustness of NSC for EEG signal classification are illustrated in Section 4, and the paper is concluded in Section 5.

2. Materials and Methods

This section first describes the framework of the implemented system, its architecture, and its main components. Second, the EEG datasets used to distinguish between healthy subjects and epilepsy subjects are described. Third, compressive sensing integrated with the discrete cosine transform and the measurement matrix is presented. Fourth, feature extraction is described, and finally, the classification methods are briefly presented.

2.1. Architecture of the Framework

The system model is composed of two main parts, the transmitter and the receiver, as shown in Figure 1. The transmitter holds a raw electroencephalography (EEG) signal of 4096 samples, represented by x, and uses a CS technique to downsample the data based on a sparse measurement matrix. In this framework, we used the DCT as the basis Ψ for different values of M to obtain the compressed data $\hat{x}$, which are transmitted over noiseless and noisy channels (e.g., radio frequency (RF) or Bluetooth). Several sources of noise can alter the data, including wireless channel fading, path loss, and thermal noise at the receiver. In this paper, without loss of generality, we consider thermal noise at the receiving side, modeled as AWGN, which is the most widely used model for representing thermal noise [29–32]. We control the noise level using the signal-to-noise ratio (SNR) to demonstrate data imperfection and to study the behavior of the different classification techniques in the presence of such noise.
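As an illustration of how the noise level can be controlled through the SNR, the following minimal sketch adds white Gaussian noise to a signal so that the result has a target SNR in dB. It is not the authors' implementation: the function names `add_awgn` and `measured_snr_db` are hypothetical, the test signal is a synthetic sine wave rather than EEG data, and only the Python standard library is used.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Return signal plus white Gaussian noise scaled to a target SNR in dB."""
    rng = rng or random.Random()
    # Average signal power: mean of the squared samples.
    signal_power = sum(s * s for s in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise), so P_noise = P_signal / 10^(SNR_dB / 10).
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


rng = random.Random(0)
# Synthetic stand-in for a transmitted vector (assumption: not real EEG data).
clean = [math.sin(2 * math.pi * 5 * t / 4096) for t in range(4096)]
for snr in (1, 5, 10):
    noisy = add_awgn(clean, snr, rng)
    print(snr, round(measured_snr_db(clean, noisy), 2))
```

The measured SNR fluctuates slightly around the target because the noise is random; a lower SNR value corresponds to a noisier channel.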

The receiver, which receives the compressed signal of size M, reconstructs the EEG data using an inverse DCT (iDCT) and basis pursuit to obtain the reconstructed signal. The iDCT reconstruction algorithm is used for the DCT, while an optimization problem with certain constraints is solved for the CS [30, 33, 34]. For example, for a given compressed measurement y at the receiver, the signal x can be reconstructed by solving the following optimization problem:

\begin{matrix} \text{minimize} \quad {‖x_{0}‖}_{2} \\ \text{subject to} \quad y_{i} = 〈Φ_{i}, Ψ x_{0}〉 . \end{matrix}

(1)

Using basis pursuit, we find the vector x₀ with the lowest L₂ norm that satisfies the observations. For an N-dimensional EEG signal x,

\begin{matrix} x = Ψ α, \end{matrix}

(2)

where Ψ is a discrete cosine transform (DCT) basis and α is the vector of coefficients in that transform domain. At the receiver side, once α has been recovered, the iDCT is used to reconstruct the original signal from α. Figure 1 shows the framework, which comprises compressive sensing, data reconstruction, and classification for EEG-based epileptic seizure detection [24].

2.2. EEG Datasets Descriptions

The datasets used in this work originate from Andrzejak et al. [35] and are widely used for automatic epileptic seizure detection. They contain both normal and epileptic EEG data. The EEG datasets were collected from five patients. The datasets consist of five sets termed A, B, C, D, and E. Each set is composed of 100 single-channel EEG segments of 23.6-second duration. For sets A and B, the subjects were relaxed and awake with eyes open and eyes closed, respectively. Segments of sets A and B were taken from surface EEG recordings, which were performed using a standardized electrode placement scheme on five healthy subjects. The segments in set C were recorded from the hippocampal formation of the hemisphere opposite the epileptogenic zone. The segments in set D were recorded from within the epileptogenic zone. Sets C, D, and E originated from an EEG archive of presurgical diagnosis. Sets C and D both contain only the activity measured during seizure-free intervals. Finally, only set E contains seizure activity. All EEG signals were recorded with the same 128-channel amplifier system (neglecting electrodes that have strong eye movement artifacts (A and B) or pathological activity (C, D, and E)). The data were written continuously at a sampling rate of 173.61 Hz to the disk of the data acquisition computer system. Kumar et al. [36] compared the performance obtained with sets A and E against that obtained with sets B and E and concluded that sets A and E were more efficient. In addition, sets A and B, which both represent healthy subjects, have similar feature properties, which makes it hard for a classifier to distinguish between them. It is worth noting that, during performance evaluation, we conducted many experiments using different groups of classes (e.g., one group was all five classes; another group was A, C, and E), and the best results were obtained for the class group A, C, and E. Therefore, in this research paper, we opted to use set A to represent healthy subjects, set C to represent unhealthy subjects during seizure-free intervals, and set E to represent subjects with active epileptic seizures. In this case, 300 EEG segments are used; each class consists of 100 segments. Figure 2 illustrates the ideal raw EEG signals of sets A, C, and E, respectively.

Figure 2: Example of three different classes of EEG signals taken from different subjects.

Typically, transmitters are mobile devices, which are equipped with battery sources; hence, the power consumption during data transmission is critical. Therefore, the compressive sensing (CS) and discrete cosine transform (DCT) methods have been utilized to reduce the amount of data before transmission because CS does not require much complexity for downsampling at the transmitter; this low complexity comes with the cost of higher complexity on the receiver side [29].

2.3. Compressive Sensing

Compressive sensing (CS) technique [37] is used to reduce the size of the data that was sent from the transmitter to the receiver, and thus CS has been considered for efficient EEG acquisition and compression in several applications [31, 38, 39]. Signal acquisition is the critical part of most applications, where the acquisition time or the computational resources are limited, and the CS technique has the significant advantage of offloading the processing from the data acquisition step to the data reconstruction step. CS reduces the time consumed in processing at the transmitter, at the expense of higher complexity at the receiver where more processing time and higher computational capacity are usually available. Previous research work [38, 39] focused on the sparse modeling of EEG signals and evaluating the efficiency of CS-based compression in terms of signal reconstruction errors and time required.

An N-dimensional 4096-sample raw EEG signal x is considered to illustrate the CS compression and reconstruction. Assume that this signal is represented by a projection onto a different basis set Ψ:

\begin{matrix} x = \sum_{i = 1}^{N} x_{0 i} Ψ_{i} \quad \text{or} \quad x = Ψ x_{0}, \end{matrix}

(3)

where x is the original signal, x₀ is the sparse representation of x, and Ψ is an N∗N basis matrix.

Each entry x₀ᵢ of the sparse vector can be calculated from the inner product of x and Ψᵢ:

\begin{matrix} x_{0 i} = 〈x, Ψ_{i}〉 . \end{matrix}

(4)

The basis (Ψ) can be a Gabor, Fourier, or discrete cosine transform (DCT) or a Mexican hat, linear spline, cubic spline, linear B-spline, or cubic B-spline function. In compressive sensing, Ψ is chosen such that x ₀ is sparse. The vector x ₀ is k-sparse if it has k nonzero entries and the remaining (N − k) entries are all zeroes. In addition to the projection above, it is assumed that x can be related to another signal y:

\begin{matrix} y_{[M * 1]} = Φ_{[M * N]} \times x_{[N * 1]}, \end{matrix}

(5)

where Φ is a measurement matrix (also called the sensing matrix) of dimensions M∗N, and y is the compressively sensed version of x. The vector y has dimensions M∗1, and data compression is achieved if M < N. It can be shown that this technique is possible if Φ and Ψ are incoherent. To satisfy this condition, Φ is chosen as a random matrix. The compression ratio (CR) is then defined as follows:

\begin{matrix} CR = (1 - \frac{M}{N}) * 100 . \end{matrix}

(6)
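The measurement step in equation (5) and the compression ratio in equation (6) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the Gaussian choice for the random matrix Φ, the function names, and the toy sizes are all hypothetical. The value M = 600 is inferred here from the reported CR of 85.35% with N = 4096; it is not stated in the text.

```python
import random


def compression_ratio(m, n):
    """Compression ratio in percent, CR = (1 - M/N) * 100, as in equation (6)."""
    return (1.0 - m / n) * 100.0


def random_measurement_matrix(m, n, rng):
    """M x N matrix with i.i.d. Gaussian entries (an assumed choice of random Phi)."""
    scale = 1.0 / (m ** 0.5)
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]


def compress(phi, x):
    """Compute y = Phi x, as in equation (5); y has M entries."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in phi]


# M = 600 is an inferred value (see the lead-in), used only to check equation (6).
print(round(compression_ratio(600, 4096), 2))

rng = random.Random(0)
n, m = 64, 16                      # toy sizes to keep the demonstration fast
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
phi = random_measurement_matrix(m, n, rng)
y = compress(phi, x)
print(len(x), "samples compressed to", len(y), "measurements")
```

In the toy run, 64 samples are reduced to 16 measurements, which corresponds to a CR of 75%; the transmitter only performs one matrix-vector product, which is why the encoding side stays cheap.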

2.4. Discrete Cosine Transform (DCT) Method

The discrete cosine transform (DCT) is used as the basis to make the EEG signal sparse as part of the CS framework. It is a Fourier-related transform similar to the discrete Fourier transform (DFT); however, it only uses real numbers and has low computational complexity [24, 28]. Obtaining the signal x(n) in the DCT domain requires defining the (N + 1)∗(N + 1) DCT matrix, whose elements are given by

\begin{matrix} {[C]}_{m n} = \sqrt{\frac{2}{N}} \left\{ k_{m} k_{n} \cos \left( \frac{m n π}{N} \right) \right\}, \\ m, n = 0, 1, \dots, N, \\ k_{i} = \left\{ \begin{matrix} 1, & \text{for } i \neq 0 \text{ or } N, \\ \frac{1}{\sqrt{2}}, & \text{for } i = 0 \text{ or } N . \end{matrix} \right. \end{matrix}

(7)

This matrix is unitary, and when it is applied to a data vector x of length N + 1, it produces a vector X_c = [C]∗x, whose elements are given by

\begin{matrix} X_{c} (m) = \sqrt{\frac{2}{N}} \sum_{n = 0}^{N} k_{m} k_{n} \cos (\frac{m n π}{N}) x (n) . \end{matrix}

(8)

On the receiver side, the basis of the iDCT [28] is utilized in the CS decoder in order to obtain the reconstructed signal (x _r) as follows:

\begin{matrix} x_{r} (a) = \sum_{k = 1}^{N} w (k) y (k) \cos [\frac{π (2 a + 1) k}{2 N}], \end{matrix}

(9)

where N is the length of both time series and cosine transform signals, a is the time series index (a = 1, 2,…, N), k is the cosine transform index (k = 1, 2,…, N), and the window function w(k) is defined as

\begin{matrix} w (k) = \{\begin{matrix} \frac{1}{\sqrt{N}}, & k = 1, \\ \sqrt{\frac{2}{N}}, & 2 \leq k \leq N . \end{matrix} \end{matrix}

(10)
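To make the role of the window w(k) concrete, the sketch below implements a forward transform and the inverse of equation (9) and checks that a short sequence is recovered exactly. It assumes the standard orthonormal DCT-II convention with 0-based indices (so k = 0 takes the weight 1/√N), which differs from the 1-based indexing written above; the function names are hypothetical and only the standard library is used.

```python
import math


def window(k, n):
    """Normalization w(k) of equation (10), shifted to a 0-based index k."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(x):
    """Forward orthonormal DCT-II of a real sequence."""
    n = len(x)
    return [
        window(k, n)
        * sum(x[a] * math.cos(math.pi * (2 * a + 1) * k / (2 * n)) for a in range(n))
        for k in range(n)
    ]


def idct(y):
    """Inverse transform: x_r(a) = sum_k w(k) y(k) cos(pi (2a + 1) k / (2N))."""
    n = len(y)
    return [
        sum(window(k, n) * y[k] * math.cos(math.pi * (2 * a + 1) * k / (2 * n))
            for k in range(n))
        for a in range(n)
    ]


x = [0.0, 1.0, 4.0, 9.0, 16.0, 9.0, 4.0, 1.0]   # arbitrary toy sequence
coeffs = dct(x)
x_r = idct(coeffs)
max_error = max(abs(u - v) for u, v in zip(x, x_r))
print(max_error < 1e-9)
```

Because the transform is orthonormal under this convention, the inverse is simply the transpose of the forward transform, and the signal energy is the same in both domains.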

After the contaminated reconstructed signal (x_r) is obtained, the DWT is used for feature extraction and selection.

2.5. Feature Extraction

EEG feature extraction plays a significant role in diagnosing most brain diseases. Obtaining useful and discriminant features depends largely on the feature extraction method used. Because EEG signals are time-varying and space-varying nonstationary signals, the discrete wavelet transform (DWT) method is widely used [17]. It captures both frequency and time location information [32, 40–42]. Using multiresolution wavelet analysis, DWT basically decomposes the EEG signals into different frequency bands.

EEG data are generally nonstationary signals, which are heavily dependent on the subject condition. The Daubechies 6 DWT was employed on data sampled at a rate of 173.61 Hz. This means that the maximum EEG data frequency is 86.81 Hz (half the sampling rate); thus, the filter length is long as well, and the wavelet subbands correspond to the fundamental frequency components of the EEG. Hence, decomposition level 7 was calculated based on the EEG frequency. In addition, drawing on our extensive experimental work on the reconstruction accuracy of different wavelet families, filter lengths, and decomposition levels [30], we tested Daubechies 6 with decomposition levels 1 to 8 in this research. We found that Daubechies 6 with decomposition level 7 is the optimum configuration in terms of classification accuracy and computational complexity for the EEG epileptic seizure category of data. Given the EEG signal f(x), the wavelet series expansion is given [30] as follows:

\begin{matrix} f (x) = \sum_{k} c_{j_{0}} (k) φ_{j_{0}, k} (x) \\ + \sum_{j = j_{0}}^{\infty} \sum_{k} d_{j} (k) ψ_{j, k} (x), \end{matrix}

(11)

where f(x) ∈ L²(R) is expanded relative to the wavelet ψ(x) and the scaling function φ(x), and c_j0(k) are the approximation coefficients.

In the first sum, the approximation coefficients c_j0(k) are obtained as the inner product between the original signal f(x) and the approximation function φ_j0,k(x), as expressed by

\begin{matrix} c_{j_{0}} (k) = 〈f (x), φ_{j_{0}, k} (x)〉 . \end{matrix}

(12)

In the second sum, a finer resolution is added to the approximation to provide increasing detail. The coefficients d_j(k) are the detail coefficients, obtained as the inner product between the original signal f(x) and the wavelet function ψ_j,k(x):

\begin{matrix} d_{j} (k) = 〈f (x), ψ_{j, k} (x)〉 . \end{matrix}

(13)
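For orientation, the nominal frequency range covered by each subband of the level-7 decomposition at a sampling rate of 173.61 Hz can be worked out with the short sketch below. It assumes ideal dyadic band splitting, in which each level halves the remaining band; real wavelet filters such as Daubechies 6 overlap at the band edges, so the values are approximate. The function name is hypothetical.

```python
def dwt_band_edges(fs, levels):
    """Nominal frequency ranges (Hz) of dyadic DWT subbands for sampling rate fs."""
    nyquist = fs / 2.0
    bands = {}
    for j in range(1, levels + 1):
        # Detail D_j nominally spans [nyquist / 2^j, nyquist / 2^(j - 1)].
        bands[f"D{j}"] = (nyquist / 2 ** j, nyquist / 2 ** (j - 1))
    # The final approximation holds everything below the lowest detail band.
    bands[f"A{levels}"] = (0.0, nyquist / 2 ** levels)
    return bands


for name, (low, high) in dwt_band_edges(173.61, 7).items():
    print(f"{name}: {low:.2f} to {high:.2f} Hz")
```

Under this idealization, D1 covers the upper half of the spectrum up to the Nyquist frequency, and A7 retains only the slowest components below about 0.68 Hz.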

Generally, the classification accuracies improve when using a combination of time- and frequency-domain features rather than features solely based on either the frequency domain or the time domain [30]. Different implementation choices, including different wavelet families, filter lengths, and decomposition levels, have been utilized to extract features. Accordingly, the conventional statistical features (maximum, minimum, mean, and standard deviation) are extracted from each wavelet subband. The extraction rules for statistical features that have been implemented for the wavelet subband are as follows.

Maximum feature:

\begin{matrix} x_{k} \ \text{such that} \ x_{k} > x_{i}, \ \forall i \neq k, \ i = 1, \dots, n, \\ d_{i} (x) = \max_{i = 1, \dots, k} \{ d_{i} (x) \} . \end{matrix}

(14)

Minimum feature:

\begin{matrix} x_{k} \ \text{such that} \ x_{k} < x_{i}, \ \forall i \neq k, \ i = 1, \dots, n, \\ d_{i} (x) = \min_{i = 1, \dots, k} \{ d_{i} (x) \} . \end{matrix}

(15)

The mean can be calculated by the following:

\begin{matrix} \hat{x} = \frac{1}{N} \sum_{i = 1}^{N} x_{i} . \end{matrix}

(16)

The standard deviation feature is given by the following:

\begin{matrix} σ = \sqrt{\frac{1}{N - 1} \sum_{i = 1}^{N} {(x_{i} - \hat{x})}^{2}} . \end{matrix}

(17)

The original EEG signal was decomposed into the wavelet subbands A7 and D7-D1. Four conventional statistical features are extracted from each of these eight subbands individually. As a consequence, 8 × 4 = 32 attributes are obtained from all subbands and fed to the classifiers. The maximum, minimum, mean, and standard deviation features thus contribute to the classification accuracy in this research. These features have been found to be robust to the dynamic environment of the wireless channel [24, 28], and they have low computational complexity.
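The construction of the 32-attribute feature vector can be sketched as follows. To keep the example dependency-free, a Haar wavelet step is swapped in for the Daubechies 6 filter used in this work, so the coefficient values differ from the real pipeline; only the structure (eight subbands, four statistics each) is the point. The input is synthetic Gaussian noise rather than EEG data, and all function names are hypothetical.

```python
import math
import random
import statistics


def haar_step(signal):
    """One Haar analysis step: returns (approximation, detail), each half the length."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def decompose(signal, levels):
    """Return subbands ordered [A_levels, D_levels, ..., D1]."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


def subband_features(subbands):
    """Maximum, minimum, mean, and sample standard deviation of every subband."""
    features = []
    for band in subbands:
        features += [max(band), min(band), statistics.mean(band), statistics.stdev(band)]
    return features


rng = random.Random(0)
segment = [rng.gauss(0.0, 1.0) for _ in range(4096)]   # synthetic stand-in, not EEG
subbands = decompose(segment, levels=7)
features = subband_features(subbands)
print(len(subbands), "subbands,", len(features), "features")
```

`statistics.stdev` divides by N − 1, which matches the normalization used in the standard deviation formula above.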

2.6. Classification Methods

EEG detection and classification play an essential role in the timely diagnosis and proactive analysis of potentially fatal and chronic diseases in clinical as well as everyday settings [3, 43]. Liang et al. [44] proposed a systematic evaluation of EEGs by combining both complexity analysis and spectral analysis for epilepsy diagnosis and seizure detection. Approximately 60% of the features extracted from the dataset were used for training, while the remaining ones were used to test the performance of the classification procedure on randomly selected EEG signals [44].

In this research work, four different classifiers have been used, namely, ANN, naïve Bayes, k-NN, and SVM. Initially, the classifiers were developed to work individually so that their performances could be compared. We then developed a data fusion method for combining the outputs of all classifiers in order to reduce the effect of data imperfections while maximizing the classification accuracy. Each classifier belongs to a different family of classifiers and has been shown to be the best classifier in its family. It is nevertheless expected that they may yield different classification results because each uses a different classification strategy [13, 17, 45–47]. The following provides a brief description of these classifiers.

2.6.1. Artificial Neural Network

An artificial neural network (ANN) is a mathematical model that is motivated by the structure and functional aspects of biological neural networks. To establish classification rules and perform statistical analysis, ANN is able to estimate the posterior probabilities [14, 47, 48]. The ANN has several parameters; in this paper, the ANN configuration uses training cycles = 500, learning rate = 0.3, and momentum decay = 0.2.

2.6.2. Naïve Bayes

The naïve Bayes (NB) classifier is a statistical classifier. It is a simple probabilistic classifier based on the application of Bayes' theorem. The NB method relies on an assumption that makes the calculation efficient and avoids exponential complexity. Simply put, it assumes that the presence of a certain feature of a class is unrelated to the presence of any other feature. The NB classifier considers each feature independently when calculating how the feature properties contribute to the probability that a certain class is the outcome of the classification. In its default configuration, it uses Laplace correction to prevent the frequent occurrence of zero probabilities [13, 24, 46].

2.6.3. k-Nearest Neighbor (k-NN)

The k-nearest neighbor (k-NN) algorithm compares a given test sample with the training samples that are most similar to it, where the parameter k is a small positive integer, usually odd. The algorithm combines two steps. First, find the k training samples that are closest to the unseen sample. Second, for classification, take the most commonly occurring class among these k samples; for regression, take the average of the values of the k nearest neighbors. Closeness is defined by a distance metric called the normalized Euclidean distance, as indicated in the following equation, given two points Y₁ = (y₁₁, y₁₂,…, y₁ₙ) and Y₂ = (y₂₁, y₂₂,…, y₂ₙ) [6, 24, 45]:

\begin{matrix} dist (Y_{1}, Y_{2}) = \sqrt{\sum_{i = 1}^{n} {(y_{1 i} - y_{2 i})}^{2}} . \end{matrix}

(18)

In this research, the k-NN configuration uses k = 10, with mixed measures selected as the measure type, for which the mixed Euclidean distance is the only available option.
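The two k-NN steps described above can be sketched in a few lines. This is a generic illustration rather than the configuration used in this work: it uses the plain Euclidean distance of equation (18) without any normalization, k = 3 instead of 10, and a made-up two-feature training set whose labels merely echo the set names A and E.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Label a query by majority vote among its k nearest training points."""
    # math.dist is the plain Euclidean distance of equation (18).
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy training set: the feature values are invented for illustration only.
train_points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.3)]
train_labels = ["A", "A", "A", "E", "E", "E"]

print(knn_predict(train_points, train_labels, (0.3, 0.2), k=3))
print(knn_predict(train_points, train_labels, (4.9, 5.0), k=3))
```

An odd k avoids ties in two-class problems; with an even k such as 10, or with three classes, a tie-breaking rule is needed, which this sketch leaves to the ordering returned by `Counter.most_common`.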

2.6.4. Support Vector Machine

The support vector machine (SVM) learner is a strong classifier based on statistical learning theory. SVM constructs an optimal hyperplane that separates the data into two different classes while minimizing the risk. SVM takes a set of input data and predicts, for each given input, which of the two possible classes the input belongs to. SVM is an integrated and powerful method for classification, regression, and distribution estimation. The SVM operator supports the C-SVC and nu-SVC types for classification tasks and the epsilon-SVR and nu-SVR types for regression tasks. Finally, the one-class type is used for distribution estimation [13, 24, 46, 49]. In this research, the SVM configuration uses the nu-SVC type with a radial basis function kernel.

3. Ensemble Detection and Classification

This section first introduces ensemble methods, then the proposed ensemble system model, and finally the proposed ensemble method.

3.1. Ensemble Classification Methods

Several combination techniques have been introduced in the literature, and each offers certain advantages and suffers from certain limitations. However, given several classifiers, the combination (fusion) method must address two critical issues: the dependency among the potentially combined classifiers and the consistency of the information contained in each classifier.

For the first issue, the classifiers must be independent because we consider each classifier to be a source of information. This means that each classifier simply works on the input feature set independently, while the classification is based on combining the outcomes of all classifiers simultaneously.

For the second issue, the classifiers may have conflicting decisions because different classifiers are expected to consider different viewpoints of the current system state. To address this anticipated conflict, an effective mechanism that is capable of quantifying the assurance in the decision of each classifier is desirable.

One well-known combination technique is majority voting. The majority voting (MV) rule collects the votes of all classifiers, identifies the class reported by most of them, and chooses that class as the final decision [50]. However, MV assumes that all classifiers participating in the vote carry the same weight, and it completely ignores the inconsistency that may arise among them. As a result, less capable classifiers can override more capable ones, and the performance of the classification system can deteriorate. Because the classifier models proposed in this work are expected to have different discriminant weights, the MV technique is not suitable as a combination method.
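The MV rule described above can be illustrated with a minimal sketch. The function name, the class labels, and the tie-breaking rule are assumptions made for illustration; the text does not specify how ties are handled.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label reported by the largest number of classifiers.

    `predictions` holds one predicted label per classifier. Ties go to the
    label that appears first in `predictions` (an assumption, not part of
    the MV rule as described in the text).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical votes from five equally weighted classifiers.
votes = ["seizure", "normal", "normal", "seizure", "normal"]
print(majority_vote(votes))
```

Every vote counts equally here, regardless of how reliable the classifier that cast it has proven to be, which is the limitation the paragraph above points out.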

In contrast, in probability-based voting schemes, the combination method assigns a probability value (p) that reflects the confidence of a classifier in its viewpoint. Such a scheme can be based on accumulated experience. For instance, one classifier may correctly identify a certain hypothesis 75% of the time, while another classifier correctly identifies a different hypothesis only 30% of the time. These values can be interpreted as probability assignments.

If the classifiers happen to provide these different and conflicting hypotheses as an explanation of the current system state, then the classifiers should not be treated equally at the classification stage. Clearly, the first classifier is more confident in its decision than the second one. This valuable information should be incorporated into the fusion (combination) process.

For instance, we may assign a weight (p) of 0.75 to the first classifier while assigning only 0.30 as a weight to the second classifier.

Let T be the set of classifiers:

\begin{matrix} T = \{C_{1}, C_{2}, \dots, C_{n}\}, \end{matrix}

(19)

and let C be the set of classes:

\begin{matrix} C = \{c_{1}, c_{2}, \dots, c_{m}\} . \end{matrix}

(20)

Then, let d_{i,j} be the decision of classifier i with respect to class j, taking the value 1 if classifier i votes for class j and 0 otherwise:

\begin{matrix} d_{i, j} \in \{0,1\}, \end{matrix}

(21)

where i = 1,…, |T| and j = 1,…, |C|.

Let p_i represent the weight of classifier i. Then, the probability-based voting decision selects the class j that satisfies

\begin{matrix} \sum_{i = 1}^{| T |} p_{i} d_{i, j} = \max_{j = 1}^{| C |} \sum_{r = 1}^{| T |} p_{r} d_{r, j} . \end{matrix}

(22)

In other words, (22) sums the weights of the classifiers that vote for each class and selects the class with the largest weighted total.
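The weighted vote in (22) can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: the function name is hypothetical, and the third classifier in the example (weight 0.30) is added here only to show how the result can differ from plain majority voting.

```python
def weighted_vote(decisions, weights):
    """Probability-based voting in the spirit of (22).

    decisions[i][j] plays the role of d_{i,j} (1 if classifier i votes for
    class j, else 0) and weights[i] plays the role of p_i.
    Returns (index of the winning class, list of per-class weighted scores).
    """
    if len(decisions) != len(weights):
        raise ValueError("one weight is required per classifier")
    n_classes = len(decisions[0])
    scores = [0.0] * n_classes
    for d_i, p_i in zip(decisions, weights):
        for j, d_ij in enumerate(d_i):
            scores[j] += p_i * d_ij
    # Ties go to the lowest class index (an assumption).
    winner = max(range(n_classes), key=lambda j: scores[j])
    return winner, scores


# Classifier 1 (weight 0.75) votes for class 0; two weaker classifiers
# (weight 0.30 each, the second one hypothetical) vote for class 1.
decisions = [[1, 0], [0, 1], [0, 1]]
weights = [0.75, 0.30, 0.30]
winner, scores = weighted_vote(decisions, weights)
print(winner, scores)
```

In this example the weighted totals are 0.75 for class 0 and 0.60 for class 1, so class 0 wins, whereas unweighted majority voting would choose class 1 by two votes to one.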

3.2. Proposed Ensemble System Model

The proposed model consists of three stages for detecting electroencephalogram seizures, namely, statistical feature extraction, classifier prediction, and the proposed noise-aware signal combination (NSC) method. The extraction of statistical features was discussed in Section 2.5. For classifier prediction, four classifiers are utilized in this model, namely, ANN, Bayes, k-NN, and SVM. These classification methods are trained using popular data mining tools that are widely used in industry and research. The training process is conducted on data covering various combinations of SNRs and downsampling rates. After exhaustive iterated experiments, the trained models are saved, and their averaged performances in the different scenarios are reported to the NSC. The NSC is our proposed ensemble method, which uses combinations of probability estimates. Finally, the overall classification accuracy is obtained for the epileptic seizure detection. The proposed system model is shown in Figure 3.

There are s tabular observations O = {o ₀, o ₁,…,o _s−1}, where each o _i is a t-tuple of readings R _i = (r ₀, r ₁,…,r _t−1). These observations fall into (s/m) different categories of classes = {c ₀, c ₁,…, c _m−1}.

The DWT is applied to the set of observations O to obtain an l-tuple of features F _i = (f ₀, f ₁,…, f _l−1) for each o _i ∈ O. In other words, DWT : O → F such that DWT(o _k) = (f _k,0, f _k,1,…, f _k,l−1), where f _k,j is the jth feature extracted by the DWT for the observation o _k.

Hence, DWT(O) = {(f _i,0, f _i,1,…, f _i,l−1) ∣ i = 0, 1, …, s − 1} is the tabular l-tuple format, used for both training and testing, that represents the input data for the classification model in this research work.
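The mapping from an observation o_k to an l-tuple of features can be sketched as follows. This is only an illustration under stated assumptions: it uses the Haar wavelet and four statistics per sub-band (maximum, minimum, mean, and standard deviation), which may differ from the wavelet and statistics chosen in Section 2.5, and the function names are hypothetical.

```python
import math
import statistics


def haar_dwt_step(signal):
    """One level of the Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        signal = signal[:-1]  # drop the last sample of an odd-length signal
    s = math.sqrt(2.0)
    approx = [(signal[k] + signal[k + 1]) / s for k in range(0, len(signal), 2)]
    detail = [(signal[k] - signal[k + 1]) / s for k in range(0, len(signal), 2)]
    return approx, detail


def band_stats(band):
    """Four summary statistics of one sub-band (assumed feature set)."""
    return [max(band), min(band), statistics.mean(band), statistics.pstdev(band)]


def dwt_features(observation, levels=2):
    """Map a t-tuple of readings to an l-tuple of features, l = 4 * (levels + 1).

    The observation needs at least 2 ** levels samples.
    """
    features = []
    approx = list(observation)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        features.extend(band_stats(detail))
    features.extend(band_stats(approx))
    return tuple(features)


# DWT(O): one feature tuple per observation, i.e. the tabular input format.
observations = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
                [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]]
table = [dwt_features(o) for o in observations]
print(len(table), len(table[0]))
```

Each row of `table` corresponds to one observation and has the same length l, so the rows can be passed directly to any of the classifiers as training or testing data.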

3.3. The Ensemble Method

A number (n) of classifiers built on various hypotheses H = {h ₀, h ₁,…, h _n−1} are fed with the input data. The data are DWT(O) in the tabular l-tuple format discussed above. Each classifier k built on hypothesis h _k is trained on the data to predict the label of the class c _j that best describes a given set of features (f _i,0, f _i,1,…, f _i,l−1) corresponding to the observation o _i.

At the end of the training of each classifier, a set of performance measurements of interest is recorded. Table 1 shows some of these performance measurements. The trained model will then be saved for application to various categories of testing data. This process is replicated and repeated to yield an output that can be averaged to describe the model behavior for long run times.

Table 1.

Classifier's performance measurements.

Measure	Description
PL_i,j	Predicted label of o _i using hypothesis h _j
PC_i,j	Confidence value predicting c _i using hypothesis h _j
PR_i,j	Precision value of c _i using hypothesis h _j
RE_i,j	Class recall value of c _i using hypothesis h _j
AC_j	Accuracy value obtained by applying hypothesis h _j
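As a rough illustration of how the precision, recall, and accuracy measures in Table 1 relate to a classifier's confusion matrix, the sketch below uses the standard definitions of these quantities. The function name and the example counts are hypothetical, and the sketch does not reproduce any normalization or averaging that the authors apply in later tables.

```python
def class_metrics(confusion):
    """Per-class precision and recall plus overall accuracy.

    confusion[p][t] is the number of observations predicted as class p
    whose true class is t (rows: predicted label, columns: true label).
    """
    m = len(confusion)
    total = sum(sum(row) for row in confusion)
    precision, recall = [], []
    for c in range(m):
        predicted_c = sum(confusion[c])                  # row total
        true_c = sum(confusion[r][c] for r in range(m))  # column total
        hits = confusion[c][c]
        precision.append(hits / predicted_c if predicted_c else 0.0)
        recall.append(hits / true_c if true_c else 0.0)
    correct = sum(confusion[c][c] for c in range(m))
    accuracy = correct / total if total else 0.0
    return precision, recall, accuracy


# Hypothetical counts for a two-class problem.
precision, recall, accuracy = class_metrics([[8, 2], [0, 10]])
print(precision, recall, accuracy)
```

Recording these values for every hypothesis h_j gives each classifier a performance profile that a weighted combination scheme can draw on.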

h ₀ = ANN
Predicted class label \ True class label	c ₀	c ₁	c ₂	$\bar{PR}$
c ₀	0.88	0.12	0.02	0.273
c ₁	0.12	0.85	0.02	0.252
c ₂	0.00	0.03	0.96	0.245
$\bar{RE}$	0.247	0.288	0.251	
$\bar{{AC}_{0}}$	0.259

Authors	Noiseless data	Imperfect data	Classifiers	Accuracy
Sharma et al., 2014 [19]	Two different classes	N/A	LS-SVM	85%
Sadati et al., 2006 [15]	A, D, and E	N/A	SVM, FBNN, ANFIS, and proposed ANFN	85.9%
Mohamed et al., 2013 [13]	A, B, C, D, and E	N/A	NB, MLP, k-NN, LDA, and SVM	89.5%
Liang et al., 2010 [44]	A, D, and E	N/A	PCA and GA	80%–90%
Tzallas et al., 2009 [11]	A, B, C, D, and E	N/A	ANN	89%
Proposed method	A, C, and E	A, C, and E	Ensemble NSC	90%
