Abstract
Early diagnosis of Alzheimer’s disease (AD) is becoming an increasingly important healthcare concern. Prior approaches analyzing event related potentials (ERPs) have had varying degrees of success, primarily due to small study cohorts and the inherent difficulty of the problem. A new effort using multiresolution analysis of ERPs is described. Distinctions of this study include analyzing a larger cohort; comparing different wavelets and different frequency bands; using ensemble based decisions; and, most importantly, aiming at the earliest possible diagnosis of the disease. Surprising yet promising outcomes indicate that ERPs in response to the novel sounds of the oddball paradigm may be a more reliable biomarker than the commonly used responses to target sounds.
Keywords: Alzheimer’s disease diagnosis, wavelets, event related potentials, ensemble classifiers
1. Introduction
Neurological disorders that cause gradual loss of cognitive function are collectively known as dementia. Among the several forms of dementia, perhaps the most infamous and the most common is the irreversible and incurable senile dementia of the Alzheimer’s type, or simply Alzheimer’s disease (AD). Alzheimer’s disease, first described by Alois Alzheimer in 1906, was once considered a rare disease, and it was mostly ignored because its primary victims were the elderly. Today, on the centennial anniversary of the disease’s discovery, the situation is much different: as the world’s population ages rapidly – primarily in developed countries – so does the number of people affected by the disease. Estimates vary considerably, but there are now thought to be 18–24 million people suffering from AD worldwide, two-thirds of whom live in developed or developing countries. This number is expected to reach 34 million by 2025. Up to age 60, AD appears in less than 1% of the population, but its prevalence increases sharply thereafter, doubling every five years: AD affects 5% of 65-year-olds and over 30% of 85-year-olds. Beyond age 85, the odds of developing AD approach a terrifying ratio of 1 in 2 [1,2].
The specific causes of AD are unknown; however, the disease is associated with two abnormal proteins: neurofibrillary tangles that cluster inside the neurons, and amyloid plaques that accumulate outside the neurons, primarily in the cerebral cortex, amygdala and hippocampus. These abnormal proteins cause a gradual but irreversible decline in all cognitive (and eventually motor) skills, leaving the victim incapable of caring for him/herself. Furthermore, these proteins can only be identified by examining brain tissue under a microscope, leaving autopsy as the only method of positive diagnosis. AD not only incapacitates its victim, but it also causes unbearable grief for the victim’s caregiver, and a devastating financial toll on society, with an annual cost of over $100 billion.
Several biomarkers have been linked to AD, such as cerebrospinal fluid tau, β-amyloid, urine F2-isoprostane, and brain atrophy and volume loss detected by PET or MRI scans [3,4]. However, these methods have either not proven conclusive, or remain primarily university or research hospital based tools. While clinical and neuropsychological evaluations achieve an average positive predictive value of 85–90%, this level of expertise is typically available only at university or research hospitals, and hence remains beyond reach for most patients. These patients are therefore evaluated by local community healthcare providers, where the expertise and accuracy of AD specific diagnosis remain uncertain. Our sole metric for community clinics is a recent study that reported 83% sensitivity, 55% specificity, and 75% overall accuracy on AD diagnosis by a group of Health Maintenance Organization based physicians, despite their advantage of longitudinal follow-up [5]. Meanwhile, the recent development of pathologically targeted medications requires an accurate diagnosis at the earliest stage possible, so that the patient’s life expectancy, as well as his/her quality of life, may be significantly improved. Therefore, to have a meaningful impact on healthcare, a diagnostic tool must be inexpensive, non-invasive, accurate, available to community physicians, and able to diagnose the disease at its earliest stages.
Event related potentials (ERPs) of the electroencephalogram (EEG) may provide such a tool. ERP acquisition is a well-established and reliable procedure, it is non-invasive, and it is readily available to community clinics. However, the ability of EEG signals to resolve AD specific information is typically masked by changes due to normal aging, coexisting medical illness, and levels of anxiety or drowsiness during measurements. Various components of the ERPs, obtained through the oddball paradigm protocol, have previously been linked to cognitive functioning, and are believed to be relatively insensitive to the above mentioned factors [6–10]. In the oddball paradigm, subjects are instructed to respond, typically by pressing a button, to an occasionally occurring target (oddball) tone of 2 kHz within a series of regular 1 kHz tones. The ERPs then show a series of peaks, among which the P300 – a positive peak with an approximate latency of 300 ms that occurs only in response to the oddball stimulus – is of particular interest. The amplitude and latency of the P300 (P3, for short) are known to be altered by neurological disorders, such as AD, that affect the temporal-parietal regions of the brain [11]. Polich et al. have shown that increased latency and decreased amplitude of the P300 are associated with AD [6,12]. Several other efforts, such as [9,11,13–17], have later confirmed the strong link between AD and the P300. More recently, task-irrelevant novel sounds have been included in the protocol, which may help distinguish AD from other forms of dementia using the amplitude and latency of the P300 [11]. However, looking at just the P300 component – while it provides statistical correlation with AD – does not help in identifying individual patients: cognitively normal people may have delayed or absent P300, and those with AD, in particular in early stages, may still have a strong P300, as shown in Figure 1. The inability of classical statistical approaches to identify individual cases demands more sophisticated approaches.
Figure 1. (a & b) Expected P300 behavior from normal and AD patients; (c & d) not all cases follow this behavior.
Automated classification algorithms, such as neural networks, can be used for case-by-case identification of individual patient ERPs. The success of such automated classification algorithms strongly depends on the quantity and the quality of the training data: the available data must adequately sample the feature space, and the features themselves must carry discriminatory information among different classes. Traditionally, features of the ERPs are obtained either in the time domain (e.g., amplitude and latency of the P300) [11,15–22] or in the frequency domain (e.g., power of different spectral bands of the ERP) [23–31]. However, both are suboptimal, since the ERP is a non-stationary signal whose frequency content varies with time, and a time-frequency based analysis is therefore more suitable. Despite the now mature history of such techniques, studies applying time-frequency analysis, such as wavelets, to ERPs have only recently started, and mostly in non-AD related studies designed specifically for structural analysis of the P300 [32–44]. Other studies investigated the feasibility of wavelet analysis of EEGs, along with neural networks, but they either did not use ERPs [27,45] or did not specifically target AD diagnosis. For individual AD-specific diagnosis, there have been very few studies that use an appropriate time-frequency analysis, such as the discrete wavelet transform, followed by neural network classification. The results of these primarily pilot studies, such as [46,47], and including our previous efforts [48,49], can be summarized as having only limited success, for several reasons: relatively small study cohorts of typically 10–30 patients, not targeting diagnosis at the earliest stages, suboptimal selection of the classifier model and/or its parameters, as well as the sheer inherent difficulty of the problem. The results therefore remain largely inconclusive.
In this study, we describe a new effort that investigates the feasibility of an automated classification approach employing multiresolution wavelet analysis; however, several factors set this study apart from previous efforts: (i) a very strict and controlled recruitment protocol, along with a very detailed and thorough clinical evaluation protocol (see Section 2), is followed to ensure the quality of the study cohort; (ii) the cohort recruited for this study constitutes one of the largest among similar prior efforts; (iii) several different types of wavelets commonly used for the analysis of biological signals are compared, instead of a single generic wavelet; (iv) single classifier, as well as multi-classifier based ensemble approaches are implemented and compared; (v) analysis is done not only with respect to general classification performance, but also with respect to commonly used medical diagnostic quantities, such as sensitivity, specificity and positive predictive value; and, most importantly, (vi) this study uniquely targets diagnosing the disease at its earliest stage possible, typically before commonly recognized symptoms appear.
In P300 studies, the ERPs are typically obtained from one of the so-called PZ, CZ or FZ electrodes of the 10–20 EEG electrode placement system shown in Figure 2, most commonly the PZ electrode. This common choice of the PZ electrode is well justified, as ERPs are known to be most prominent in the central parietal regions of the cortex [40]. Furthermore, since the P300 is traditionally associated with the oddball tone, only responses to this tone are typically analyzed. In our previous preliminary studies, we also analyzed the oddball responses from the PZ electrode. We now investigate the diagnostic information that may reside in data obtained from the other two electrodes, CZ and FZ, and in responses to the novel tones as well as the target tones. Our justification for analyzing the remaining two electrodes is the relative and symmetric proximity of the CZ and FZ electrodes to the PZ electrode. Our justification for analyzing the responses to the novel tones is the potential information that may be present in other components of the ERP, such as the P3a, which may be more prominent in responses to the novel tones.
Figure 2. The 10–20 International EEG electrode placement system.
2. Experimental Setup
2.1. Research Subjects and the Gold Standard
The current gold standard for AD diagnosis is clinical evaluation through a series of neuropsychological tests, including interviews with the patient and their caregivers. Seventy-two patients have been recruited so far by the Memory Disorders Clinic and Alzheimer’s Disease Research Center of the University of Pennsylvania, according to the following inclusion and exclusion criteria for each of the two cohorts: probable AD and cognitively normal.
Inclusion criteria for cognitively normal cohort
(i) age > 60; (ii) Clinical Dementia Rating score = 0; (iii) Mini Mental State Exam Score > 26; (iv) no indication of functional or cognitive decline during the two years prior to enrollment based on a detailed interview with the subject’s knowledgeable informant.
Exclusion criteria for cognitively normal cohort
(i) evidence of any central nervous system neurological disease (e.g., stroke, multiple sclerosis, Parkinson’s disease, or other forms of dementia) by history or exam; (ii) use of sedative, anxiolytic or anti-depressant medications within 48 hours of ERP acquisition.
Inclusion criteria for AD cohort
(i) age > 60; (ii) Clinical Dementia Rating score >0.50; (iii) Mini Mental State Exam Score ≤ 26; (iv) presence of functional and cognitive decline over the previous 12 months based on detailed interview with a knowledgeable informant; (v) satisfaction of NINCDS-ADRDA (National Institute of Neurological and Communicative Disorders and Stroke - Alzheimer’s Disease and Related Disorders Association) criteria for probable AD [50].
Exclusion criteria for AD cohort
Same as for the cognitively normal controls.
All subjects received a thorough medical history and neurological exam. Key demographic and medical information, including current medications (prescription, over-the-counter, or any alternative medications), was noted. The evaluation included standardized assessments of overall impairment, functional impairment, extrapyramidal signs, behavioral changes and depression. The clinical diagnosis was made as a result of these evaluations, as described by the NINCDS-ADRDA criteria for probable AD [50].
The inclusion criteria for the AD cohort were designed to ensure that subjects were at the earliest stages of the disease. One metric is the Mini Mental State Exam (MMSE), a widely used standardized test for evaluating cognitive mental status. The test assesses orientation, attention, immediate and short-term recall, language, and the ability to follow simple verbal and written commands. It also provides a total score placing the individual on a scale of cognitive function. MMSE scores decrease with age and increase with educational attainment, ranging from a median of 29 for those 18 to 24 years of age to 25 for individuals 80 years of age and older. The median MMSE score is 29 for individuals with at least 9 years of schooling, 26 for those with 5 to 8 years of schooling, and 22 for those with 0 to 4 years of schooling [51,52]. A score below 19 usually indicates cognitive impairment. The MMSE is not used for diagnosis on its own, but rather for assessing the severity of the disease. The AD diagnosis itself is made based on the above mentioned NINCDS-ADRDA criteria for probable AD.
Of the 72 patients initially recruited, 20 were removed for various reasons, including AD patients who – despite satisfying the above requirements – were too demented to be considered at the earliest stage of the disease. Of the 52 remaining patients, 28 were probable AD (μAGE=79, μMMSE=24.7), and 24 were cognitively normal (μAGE=76, μMMSE=29.6). Note that with an average MMSE score of about 25, the AD cohort represents those who are at the earliest stage of the disease, a stage during which the symptoms of the disease may not even be noticeable. While this distinction makes the classification problem all the more challenging, it also sets this study apart from similar earlier efforts. Also, with 52 patients, this effort constitutes one of the largest studies of its kind to date.
2.2. Event Related Potentials (ERPs) Acquisition Protocol
The ERPs were obtained using an auditory oddball paradigm while the subjects were comfortably seated in a specially designated room. We used the protocol described by Yamaguchi et al., with slight modifications [11]. Binaural audiometric thresholds were first determined for each subject using a 1 kHz tone. The evoked response stimulus was presented to both ears using stereo earphones at 60 dB above each individual’s auditory threshold. The stimuli consisted of tone bursts 100 ms in duration, including 5 ms onset and offset envelopes. A total of 1000 stimuli, including frequent 1 kHz normal tones (n=650), infrequent 2 kHz oddball (target) tones (n=200), and novel sounds (n=150), were delivered to each subject with an inter-stimulus interval of 1.0 to 1.3 seconds. The subjects were instructed to press a button each time they heard the 2 kHz oddball tone. The subjects were not told ahead of time about the presence of the novel sounds, which consisted of unique sound clips that were not repeated. With frequent breaks (approximately three minutes of rest every five minutes), data collection typically took less than 30 minutes. The experimental session was preceded by a 1-minute practice session without the novel sounds.
ERPs were recorded from 19 electrodes embedded in an elastic cap. The electrode impedances were kept below 20 kΩ. Artifactual recordings were identified and rejected by the EEG technician. The potentials were then amplified, digitized at 256 Hz/channel, lowpass filtered, averaged, notch filtered at 59–61 Hz, and baselined with respect to the prestimulus interval, yielding a final 257-sample, 1-second long signal. ERPs are often difficult to extract from a single response, due to many variations in cortical activity. Consecutive successful responses to each tone are therefore synchronized and averaged (after responses with artifacts, responses to missed targets, etc. are removed by the EEG technician) to obtain a robust ERP. The averaging process, a routine part of the oddball paradigm protocol, consisted of averaging 90–250 responses per stimulus type per patient.
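The averaging and baselining step can be sketched as follows. This is a hedged illustration only, assuming that artifact-free single-trial epochs are already available as a NumPy array; the actual preprocessing was performed by the acquisition system and the EEG technician, and the function name `average_erp` is hypothetical.

```python
# Minimal sketch of synchronized averaging with pre-stimulus baseline correction,
# assuming epochs of shape (n_trials, 257) sampled at 256 Hz with a 200 ms
# pre-stimulus interval. Illustrative only; not the acquisition system's code.
import numpy as np

FS = 256                              # sampling rate (Hz)
PRESTIM_SAMPLES = int(0.2 * FS)       # ~51 samples of pre-stimulus baseline

def average_erp(epochs: np.ndarray) -> np.ndarray:
    """Average the accepted single-trial responses and remove the baseline offset."""
    avg = epochs.mean(axis=0)                  # average 90-250 accepted trials
    baseline = avg[:PRESTIM_SAMPLES].mean()    # mean of the pre-stimulus interval
    return avg - baseline

# Example with synthetic data standing in for 120 accepted trials:
rng = np.random.default_rng(0)
epochs = rng.normal(size=(120, 257))
erp = average_erp(epochs)
print(erp.shape)                               # (257,)
```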
3. Methods
3.1. Multiresolution Wavelet Analysis for Feature Extraction
Time localization of spectral components can be obtained by multiresolution wavelet analysis, as this method provides a time-frequency representation of the signal. Among many time-frequency representations, the discrete wavelet transform (DWT) is perhaps the most popular, due to its many desirable properties and its ability to solve a diverse set of problems, including data compression, biomedical signal analysis, feature extraction, noise suppression, density estimation, and function approximation, all with modest computational expense. Considering the audience of this journal, the well-established nature of wavelet theory, and for brevity, we describe only the main points of the DWT implementation here, and refer the interested reader to the many excellent references listed in [53].
The DWT analyzes the signal at different resolutions (hence, multiresolution analysis) through the decomposition of the signal into several successive frequency bands. The DWT utilizes two sets of functions, a scaling function, φ(t), and a wavelet function, ψ(t), each associated with lowpass and highpass filters, respectively. An interesting property of these functions is that they can be obtained as a weighted sum of the scaled (dilated) and shifted versions of the scaling function itself:
$$\varphi(t) = \sum_{n} h[n]\,\sqrt{2}\,\varphi(2t - n) \tag{1}$$

$$\psi(t) = \sum_{n} g[n]\,\sqrt{2}\,\varphi(2t - n) \tag{2}$$
Conversely, a scaling function φj,k(t) or wavelet function ψj,k(t) that is discretized at scale j and translation k can be obtained from the original (prototype) function φ (t) = φ0,0(t) or ψ(t) = ψ0,0(t) by:
$$\varphi_{j,k}(t) = 2^{-j/2}\,\varphi\!\left(2^{-j}t - k\right) \tag{3}$$

$$\psi_{j,k}(t) = 2^{-j/2}\,\psi\!\left(2^{-j}t - k\right) \tag{4}$$
Different scale and translations of these functions allow us to obtain different frequency and time localizations of the signal. The coefficients (weights) h[n] and g[n] that satisfy (1) and (2) constitute the impulse responses of the lowpass and highpass filters used in the wavelet analysis, and define the type of the wavelet used in the analysis. Decomposition of the signal into different frequency bands is therefore accomplished by successive highpass and lowpass filtering of the time domain signal.
The original time domain signal x(t), sampled at 256 samples/second, forms the discrete time signal x[n], which is first passed through a halfband highpass filter, g[n], and a halfband lowpass filter, h[n]. In terms of normalized angular frequency, the highest frequency in the original signal is π, corresponding to a linear frequency of 128 Hz. According to Nyquist’s rule, half of the samples can be removed after the filtering, since the bandwidth of the signal is reduced to π/2 radians upon filtering. This is accomplished by downsampling by a factor of 2. Filtering followed by subsampling constitutes one level of decomposition, and it can be expressed as follows:
$$y_{\mathrm{high}}[k] = \sum_{n} x[n]\, g[2k - n] \tag{3}$$

$$y_{\mathrm{low}}[k] = \sum_{n} x[n]\, h[2k - n] \tag{4}$$
where yhigh[k] and ylow[k] are the outputs of the highpass and lowpass filters, respectively, after the sub-sampling. The output of the highpass filter, yhigh[k], represents Level 1 DWT coefficients, also called d1: level 1 detail coefficients. The output of the lowpass filter, a1: the level 1 approximation coefficients, is further decomposed by passing ylow[k] through another set of highpass and lowpass filters, to obtain level 2 detail coefficients d2, and level 2 approximation coefficients, a2, respectively.
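A minimal sketch of this single filtering-and-downsampling step, using the Daubechies-4 decomposition filters provided by PyWavelets, is given below. Plain zero-padded convolution is used here for clarity; library implementations differ only in how they handle the signal boundaries, so this is illustrative rather than a reproduction of any particular toolbox.

```python
# One level of decomposition: convolve with the highpass/lowpass filters, then
# keep every other sample. Uses PyWavelets only to obtain the Db4 filter taps.
import numpy as np
import pywt

x = np.random.default_rng(1).normal(size=257)     # stand-in for a 257-sample ERP

wav = pywt.Wavelet('db4')                          # 4 vanishing moments, filter length 8
h = np.array(wav.dec_lo)                           # lowpass decomposition filter h[n]
g = np.array(wav.dec_hi)                           # highpass decomposition filter g[n]

y_low = np.convolve(x, h)[1::2]                    # a1: level-1 approximation coefficients
y_high = np.convolve(x, g)[1::2]                   # d1: level-1 detail coefficients

print(len(np.convolve(x, h)), len(y_low))          # 264 filtered samples -> 132 after downsampling
```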
This procedure, called subband coding, is repeated for further decomposition as many times as desired, or until no more subsampling is possible. At each level, the procedure results in half the time resolution (due to subsampling) and double the frequency resolution (due to filtering), allowing the signal to be analyzed over different frequency ranges with different resolutions. Figure 3 illustrates this procedure, where the frequency range analyzed with each set of coefficients is marked with “F.” The length of each set of coefficients is also provided, which depends on the specific wavelet used in the analysis. The numbers given in Figure 3 are for Daubechies wavelets with 4 vanishing moments, whose corresponding filters h[n] and g[n] are of length 2×4 = 8. For example, starting with the 257-sample signal, the output of each level 1 filter is 257+8−1 = 264 points long, which reduces to 132 after subsampling. A wraparound or truncation can also be used to keep the number of coefficients at exactly half of that of the previous level. The total number of all coefficients then adds up to the original signal length (±1 for odd length signals).
Figure 3. 7-level DWT decomposition.
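The full 7-level decomposition can be sketched with PyWavelets as follows; the band mapping assumes a 256 Hz sampling rate, and the printed lengths depend on the library's boundary handling, so treat this as an illustration of the procedure rather than the exact coefficient counts of Figure 3.

```python
# Seven-level DWT of a 257-sample signal, with each coefficient set mapped to its
# approximate frequency band. pywt may warn that 7 levels is deep for so short a
# signal (boundary effects); the bands used in this study are D4-D7 and A7.
import numpy as np
import pywt

fs = 256
x = np.random.default_rng(2).normal(size=257)      # stand-in for an averaged ERP

coeffs = pywt.wavedec(x, 'db4', level=7)            # [a7, d7, d6, d5, d4, d3, d2, d1]
names = ['A7', 'D7', 'D6', 'D5', 'D4', 'D3', 'D2', 'D1']

for name, c in zip(names, coeffs):
    level = int(name[1])
    if name.startswith('D'):
        lo, hi = fs / 2**(level + 1), fs / 2**level  # detail D_j covers fs/2^(j+1)..fs/2^j
    else:
        lo, hi = 0, fs / 2**(level + 1)              # approximation A7 covers 0..1 Hz
    print(f'{name}: {lo:g}-{hi:g} Hz, {len(c)} coefficients')
# A7: 0-1 Hz, D7: 1-2 Hz, D6: 2-4 Hz (contains the P300), D5: 4-8 Hz, D4: 8-16 Hz, ...
```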
An approximation signal Aj(t) and a detail signal Dj (t) can be reconstructed from level j coefficients:
$$A_j(t) = \sum_{k} a_j[k]\,\varphi_{j,k}(t) \tag{5}$$

$$D_j(t) = \sum_{k} d_j[k]\,\psi_{j,k}(t) \tag{6}$$
The original signal x(t) can then be reconstructed from the approximation signal Aj(t) at any level j and the sum of all detail signals up to and including level j:
$$x(t) = A_j(t) + \sum_{i \le j} D_i(t) \tag{7}$$
Figures 4 and 5 illustrate the reconstructed signals obtained at each level of a 7-level decomposition, for a cognitively normal subject and a probable AD patient, respectively. Daubechies wavelets with 4 vanishing moments were used for these analyses. While signal reconstruction is not required as part of this work, the reconstructed signals in Figures 4 and 5 indicate that the majority of the signals’ energy lies in D4–D7 and A7. The results presented in Section 4 later confirm this observation.
Figure 4. Reconstructed detail and approximation signals of a cognitively normal person.
Figure 5. Reconstructed detail and approximation signals of a probable AD patient.
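The band-limited reconstructions of the kind shown in Figures 4 and 5 can be sketched by zeroing every coefficient set except the one of interest before the inverse transform. This is one plausible way to do it with PyWavelets, under the same assumptions as the previous sketch; the helper name `reconstruct_band` is ours.

```python
# Reconstruct A7 and D1...D7 individually, then verify Eq. (7): their sum
# recovers the original signal (up to numerical precision).
import numpy as np
import pywt

x = np.random.default_rng(3).normal(size=257)
coeffs = pywt.wavedec(x, 'db4', level=7)           # [a7, d7, d6, ..., d1]

def reconstruct_band(coeffs, keep_index):
    """Inverse DWT keeping only one coefficient set (0 = A7, 1 = D7, ..., 7 = D1)."""
    selected = [c if i == keep_index else np.zeros_like(c) for i, c in enumerate(coeffs)]
    return pywt.waverec(selected, 'db4')[:257]      # trim the extra sample for odd-length input

D6 = reconstruct_band(coeffs, 2)                   # 2-4 Hz detail signal (P300 range)
A7 = reconstruct_band(coeffs, 0)                   # 0-1 Hz approximation signal

total = sum(reconstruct_band(coeffs, i) for i in range(len(coeffs)))
print(np.allclose(total, x))                       # True
```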
In this study, we compared four different types of wavelets, including the Daubechies wavelets with 4 (Db4) and 8 (Db8) vanishing moments, symlets with 5 vanishing moments (Sym5), and the quadratic B-spline wavelets (Qbs). The quadratic B-spline wavelet was chosen due to its reported suitability in analyzing ERP data in several studies [34,35,54–56]. Db4 and Db8 were chosen for their simplicity and general purpose applicability in a variety of time-frequency representation problems, whereas Sym5 was chosen due to its similarity to Daubechies wavelets with additional near symmetry property.
3.2. An Ensemble of Classifiers Based Classification
One of the novelties of this work is the investigation of an ensemble of classifiers based approach for the classification of ERP signals. An ensemble based system, also known as a multiple classifier system (MCS), combines several, preferably diverse, classifiers. The diversity in the classifiers is typically achieved by using a different training dataset for each classifier, which then allows each classifier to generate different decision boundaries. The expectation is that each classifier will make a different error, and strategically combining these classifiers can reduce the total error. Numerous studies have shown that such an approach can often outperform a single classifier system, is usually resistant to overfitting problems, and can often provide much more stable results. Since its humble beginnings with seminal works including, but not limited to, [57–65], research in multiple classifier systems has expanded rapidly and has become an important research topic [66–68]. A sample of the immense literature on classifier combination can be found in Kuncheva’s recent book, the first text devoted to the theory and implementation of ensemble based classifiers [66], and the references therein. The field has been developing so rapidly that an international workshop on MCS has recently been established, and the most current developments can be found in its proceedings [69].
The ensemble classification algorithm of choice for this study was Learn++, originally developed for efficient learning of novel information [70,71]. Inspired in part by AdaBoost [64], Learn++ generates an ensemble of diverse classifiers, where each classifier is trained on a strategically updated distribution of the training data that focuses on instances previously not seen or learned. The inputs to Learn++ algorithm are (i) the training data S comprised of m instances xi along with their correct labels yi ∈ Ω = {ω1, … ωC}, i = 1,2, …, m, for C number of classes; (ii) a supervised classification algorithm BaseClassifier, generating individual classifiers (henceforth, hypotheses); and (iii) an integer T, the number of classifiers to be generated. The pseudocode of the algorithm and its block diagram are provided in Figures 6 and 7, respectively; and described below in detail.
Figure 6. Learn++ pseudocode.
Figure 7. Learn++ block diagram.
The BaseClassifier can be any supervised classifier whose instability can be adjusted to ensure adequate diversity, so that sufficiently different decision boundaries can be generated each time the classifier is trained on a different training dataset. This instability can be controlled by adjusting training parameters, such as the size or error goal of a neural network, with respect to the complexity of the problem. However, a meaningful minimum performance is enforced: the probability of any classifier producing the correct labels on a given training dataset, weighted proportionally to individual instances’ probability of appearance, must be better than ½. If the classifiers’ outputs are class-conditionally independent, then the overall error monotonically decreases as new classifiers are added. Originally known as the Condorcet Jury Theorem (1786) [72–74], this condition is necessary and sufficient for a two-class problem (C=2), and it is sufficient, but not necessary, for C>2.
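A quick numerical illustration of this condition, under the idealized assumption of independent classifiers on a two-class problem, is sketched below: if each classifier is correct with probability p > ½, the majority-vote error shrinks as classifiers are added.

```python
# Majority-vote error of T independent classifiers, each correct with probability p.
from math import comb

def majority_vote_error(p: float, T: int) -> float:
    """P(majority is wrong): more than half of the T classifiers err."""
    return sum(comb(T, k) * (1 - p)**k * p**(T - k) for k in range(T // 2 + 1, T + 1))

for T in (1, 3, 5, 7, 15):
    print(T, round(majority_vote_error(0.6, T), 4))
# 1 0.4, 3 0.352, 5 0.3174, 7 0.2898, 15 0.2131  (error decreases as T grows)
```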
An iterative process sequentially generates each classifier of the ensemble: during the tth iteration, Learn++ trains the BaseClassifier on a judiciously selected subset TRt (about ⅔) of the current training data to generate hypothesis ht. The training subset TRt is drawn from the training data according to a distribution Dt, which is obtained by normalizing a set of weights wt maintained on the entire training data S. The distribution Dt determines which instances of the training data are more likely to be selected into the training subset TRt. Unless a priori information indicates otherwise, this distribution is initially set to be uniform, giving equal probability to each instance to be selected into TR1. At each subsequent iteration loop t, the weights previously adjusted at iteration t-1 are normalized (in step 1 of the loop)
$$D_t(i) = w_t(i) \Big/ \sum_{j=1}^{m} w_t(j) \tag{8}$$
to ensure a proper distribution. Training subset TRt is drawn according to Dt (step 2), and the BaseClassifier is trained on TRt (step 3). A hypothesis ht is generated by the tth classifier, whose error εt is computed on the current dataset S as the sum of the distribution weights of the misclassified instances (step 4)
$$\varepsilon_t = \sum_{i=1}^{m} D_t(i)\,[\!|\, h_t(\mathbf{x}_i) \ne y_i \,|\!] \tag{9}$$
where [| · |] evaluates to 1 if the predicate holds true, and 0 otherwise. As mentioned above, we insist that εt be less than ½. If this is the case, the hypothesis ht is accepted, and its error is normalized to obtain
$$\beta_t = \frac{\varepsilon_t}{1 - \varepsilon_t} \tag{10}$$
If εt ≥ ½, the current hypothesis is discarded, and a new training subset is selected by returning to step 2. All hypotheses generated thus far are then combined using weighted majority voting to obtain the composite hypothesis Ht (step 5), in which each hypothesis ht is assigned a voting weight that is inversely related to its normalized error. Hypotheses with smaller training error are awarded a higher voting weight, and thus have more say in the decision of Ht, which then represents the current ensemble decision:
$$H_t(\mathbf{x}) = \arg\max_{y \in \Omega}\; \sum_{k=1}^{t} \log\!\left(\frac{1}{\beta_k}\right) [\!|\, h_k(\mathbf{x}) = y \,|\!] \tag{11}$$
It can be shown that the weight selection of log (1/βt) is optimum for weighted majority voting [66]. The error of the composite hypothesis Ht is computed as the sum of the distribution weights of the instances that are misclassified by the ensemble decision Ht (step 6)
$$E_t = \sum_{i=1}^{m} D_t(i)\,[\!|\, H_t(\mathbf{x}_i) \ne y_i \,|\!] \tag{12}$$
Since the individual hypotheses that make up the composite hypothesis all have individual errors less than ½, so too will the composite error, i.e., 0 ≤ Et < ½. We normalize the composite error Et to obtain
$$B_t = \frac{E_t}{1 - E_t} \tag{13}$$
which is then used for updating the distribution weights assigned to individual instances
$$w_{t+1}(i) = w_t(i)\cdot B_t^{\,[\!|\, H_t(\mathbf{x}_i) = y_i \,|\!]} \tag{14}$$
Equation (14) indicates that the distribution weights of the instances correctly classified by the composite hypothesis Ht are reduced by a factor of Bt. Effectively, this increases the weights of the misclassified instances, making them more likely to be selected into the training subset of the next iteration. Readers familiar with the AdaBoost algorithm have undoubtedly noticed the overall similarities, but also the key difference between the two algorithms: the weight update rule of Learn++ specifically targets learning novel information from the data by focusing on those instances that have not yet been learned by the ensemble, whereas AdaBoost focuses on instances that have been misclassified by the previous classifier. This is because, in AdaBoost, the weight distribution is updated based on the decision of the single previously generated hypothesis ht [64], whereas Learn++ updates its distribution based on the decision of the current ensemble, through the use of the composite hypothesis Ht. This procedure forces Learn++ to focus on instances that have not been properly learned by the ensemble. It can be argued that AdaBoost also looks, albeit indirectly, at the ensemble decision, since the distribution update, while based on a single hypothesis, is cumulative. However, the update in Learn++ is directly tied to the ensemble decision, and has hence been found to be more efficient in learning new information in our previous trials on benchmark datasets. The final hypothesis is obtained by combining all hypotheses that have been generated thus far:
$$H_{\mathrm{final}}(\mathbf{x}) = \arg\max_{y \in \Omega}\; \sum_{t=1}^{T} \log\!\left(\frac{1}{\beta_t}\right) [\!|\, h_t(\mathbf{x}) = y \,|\!] \tag{15}$$
For any given data instance x, Hfinal chooses the label y ∈ Ω = {ω1, …, ωC} that receives the largest total vote from all classifiers ht, where the vote of ht is weighted by its normalized performance, log(1/βt).
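A compact sketch of the Learn++ training loop described by Eqs. (8)–(15) is given below, using a scikit-learn style MLP as the BaseClassifier (as in Section 3.3.2). The variable names mirror the text; the authors' actual implementation details (subset drawing, stopping rules, numerical safeguards) may differ from this illustration.

```python
# Learn++ sketch: each iteration draws a training subset from the current
# distribution, trains a base classifier, and updates the weights based on the
# decision of the current ensemble (the composite hypothesis H_t).
import numpy as np
from sklearn.base import clone
from sklearn.neural_network import MLPClassifier

def weighted_majority_vote(X, hypotheses, betas, classes):
    """Eqs. (11)/(15): each h_t votes with weight log(1/beta_t)."""
    votes = np.zeros((len(X), len(classes)))
    for h, b in zip(hypotheses, betas):
        pred = h.predict(X)
        for ci, c in enumerate(classes):
            votes[pred == c, ci] += np.log(1.0 / b)
    return classes[np.argmax(votes, axis=1)]

def learnpp_train(X, y, T=5, subset_frac=2/3, seed=0):
    rng = np.random.default_rng(seed)
    base = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=seed)
    m, classes = len(y), np.unique(y)
    w = np.ones(m) / m                                   # weights on the training data S
    hypotheses, betas, attempts = [], [], 0
    while len(hypotheses) < T and attempts < 10 * T:     # guard against endless rejection
        attempts += 1
        D = w / w.sum()                                  # Eq. (8): normalize to a distribution
        idx = rng.choice(m, size=int(subset_frac * m), replace=True, p=D)   # step 2: draw TR_t
        h = clone(base).fit(X[idx], y[idx])              # step 3: train the base classifier
        eps = D[h.predict(X) != y].sum()                 # Eq. (9): weighted error on all of S
        if eps >= 0.5:                                   # discard and redraw the training subset
            continue
        beta = max(eps, 1e-10) / (1.0 - eps)             # Eq. (10): normalized error
        hypotheses.append(h); betas.append(beta)
        H = weighted_majority_vote(X, hypotheses, betas, classes)   # Eq. (11): composite hypothesis
        E = D[H != y].sum()                              # Eq. (12): composite error
        B = max(E, 1e-10) / (1.0 - E)                    # Eq. (13)
        w = w * np.where(H == y, B, 1.0)                 # Eq. (14): shrink weights of learned instances
    return hypotheses, betas, classes

# Usage: hyps, betas, classes = learnpp_train(X_train, y_train, T=5)
#        y_pred = weighted_majority_vote(X_test, hyps, betas, classes)
```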
3.3. Implementation Details
As mentioned in the Introduction, several factors set this study apart from previous efforts: a large cohort; analysis using three different electrodes PZ, FZ and CZ, instead of just the standard PZ, and two different stimuli target and novel tones; analysis with four different wavelets and several different frequency bands; analysis with an ensemble of classifiers approach, as well as a single classifier; and the effect of the above mentioned variables on commonly used medical diagnostic quantities, including sensitivity, specificity and positive predictive value, instead of just overall generalization performance.
3.3.1. Feature Extraction
For each patient, six sets of ERPs were extracted and averaged: responses to novel and target tones from each of the PZ, CZ, and FZ electrodes. All averaged ERPs were decomposed into 7 levels using one of four types of wavelets: Daubechies with 4 and 8 vanishing moments (Db4, Db8), symlets with 5 vanishing moments (Sym5), and quadratic B-splines (Qbs). Of the 8 frequency bands created by the decomposition, the following were used for further analysis: the approximation at 0–1 Hz, and the details at 1–2 Hz, 2–4 Hz, 4–8 Hz and 8–16 Hz. Detail coefficients at 16–32, 32–64 and 64–128 Hz were not considered for analysis, as the ERPs are known not to include any relevant frequency components in these intervals. In fact, the P300 is known to reside in the 0–4 Hz interval, primarily around 3 Hz.
The number of coefficients created at each level depends on the analysis wavelet, more specifically on the length of its filters (the number of filter taps). Since each recording started 200 ms before the stimulus and lasted for exactly 1 second, the middle coefficients were extracted in each case to remove those DWT coefficients corresponding to the pre-stimulus baseline as well as the large-latency post-stimulus baseline. The extracted coefficients corresponded to approximately the 50–600 ms interval after the stimulus.
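One plausible way to carry out this "middle coefficient" extraction is sketched below: map the 50–600 ms post-stimulus window to coefficient indices at each level, using the fact that level-j coefficients are spaced roughly 2^j samples apart in time. Filter delays and boundary handling shift this mapping slightly, so the exact indices the authors used may differ; the helper `band_coefficient_slice` is ours.

```python
# Approximate index range of the DWT coefficients covering 50-600 ms post-stimulus,
# for a recording that starts 200 ms before the stimulus and is sampled at 256 Hz.
FS = 256
PRESTIM = 0.200                      # seconds of pre-stimulus baseline

def band_coefficient_slice(level: int, t_start=0.050, t_stop=0.600) -> slice:
    """Coefficient indices at the given level covering t_start..t_stop s post-stimulus."""
    start = int((PRESTIM + t_start) * FS) // 2**level
    stop = int((PRESTIM + t_stop) * FS) // 2**level + 1
    return slice(start, stop)

for level in range(4, 8):            # levels 4-7 (the D4-D7 and A7 bands used here)
    s = band_coefficient_slice(level)
    print(f'level {level}: keep coefficients {s.start}..{s.stop - 1}')
```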
3.3.2. Classification
We first tried a single classifier system, using the multilayer perceptron (MLP) as the base model. We experimented with several architectural parameters, such as the number of hidden layer nodes in the 5 to 50 range and the error goal in the 0.005 to 0.1 range. As a result of thousands of independent trials, an MLP architecture with 10 hidden layer nodes and an error goal of 0.01 was selected as the common architecture for all experiments. We then implemented a Learn++ based ensemble system, for which we tried several different numbers of classifiers in the ensemble, from 3 to 25. In general, a 5-classifier ensemble provided good results. While Learn++ is independent of the base classifier, and can work with any supervised classification algorithm, an MLP with the same architecture mentioned above was chosen as the base classifier for a fair comparison.
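A sketch of this baseline configuration using scikit-learn's MLPClassifier as a stand-in is shown below. The 0.01 error goal is only approximated here by the `tol` stopping criterion; the original work likely used a MATLAB-style MSE goal, so this is not an exact reproduction of that setup.

```python
# Single-classifier baseline: 10 hidden-layer nodes, with tol standing in for the
# error goal. The 5-classifier ensemble uses the same architecture as its base.
from sklearn.neural_network import MLPClassifier

single_mlp = MLPClassifier(hidden_layer_sizes=(10,),   # 10 hidden-layer nodes
                           tol=0.01,                    # rough analogue of the 0.01 error goal
                           max_iter=2000,
                           random_state=0)
print(single_mlp)

# Ensemble counterpart, using the Learn++ sketch from Section 3.2:
# hyps, betas, classes = learnpp_train(X_train, y_train, T=5)
```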
3.3.3. Validation Process: Leave-one-out
In all cases listed below, generalization performance was obtained through leave-one-out cross-validation. In this procedure, a classifier is trained on all but one of the available training data instances, and tested on the remaining instance. Its performance on this instance, 0 or 100%, is noted. The classifier is then discarded, and a new one – with identical architecture – is trained again on all but one training data instance, this time leaving a different data instance out. Assuming that there are m training data points, the entire training and testing procedure is repeated m times, leaving a different instance out as the test instance each time. The mean of the m individual performances is then accepted as the estimate of the performance of the system. The leave-one-out procedure is considered the most rigorous, reliable and conservative – and, of course, computationally most costly – estimate of the true performance of the system, as it removes the bias of choosing particularly easy or difficult instances for the training or test data. Due to the delicate nature of the application, and in order to obtain a reliable estimate of the true performance of this approach, we decided to use the leave-one-out procedure (instead of two-way splitting of the data into training and test datasets, or a k-fold cross-validation) despite its computational cost. In order to further confirm the validity of the results, all leave-one-out validations were repeated three times.
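The leave-one-out estimate can be sketched as follows, using scikit-learn's LeaveOneOut splitter with the single-classifier baseline. The feature matrix and labels are placeholders (synthetic data) used only to make the sketch self-contained; with m = 52 patients, each trial trains 52 classifiers.

```python
# Leave-one-out estimate of the overall generalization performance (OGP).
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(52, 10))        # placeholder: 52 patients x 10 DWT coefficients
y = np.array([1] * 28 + [0] * 24)    # 28 probable AD, 24 cognitively normal

correct = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])

print(f'leave-one-out OGP: {100.0 * correct / len(y):.1f}%')
```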
3.3.4. Diagnostic Performance Figures
While generalization performance is the traditional figure of merit in evaluating machine learning algorithms, more descriptive quantities are often used to evaluate medical tests and procedures. Sensitivity, specificity, positive predictive value and negative predictive value are four such quantities commonly used in medical diagnostics. Table 1 summarizes the concepts defined below.
Table 1.
Category labels for defining diagnostic quantities.
| Classification Decision | True Condition: Probable AD | True Condition: Cognitively Normal |
|---|---|---|
| Probable AD | A | B |
| Cognitively Normal | C | D |
Overall Generalization Performance (OGP)
In pattern recognition, this is the average leave-one-out validation performance of the classifiers, or the average generalization performance on test data. OGP represents the average probability of a correct decision. Within the medical community, OGP is also known as the accuracy of the test: the proportion of patients the classification system is expected to correctly identify.
Sensitivity
Formally defined as the probability of a positive diagnosis given that the patient does in fact have the condition, sensitivity is the ability of a medical test to correctly identify the target group. In the context of this application, sensitivity is the proportion of true AD patients correctly identified as AD patients by the classification system.
Specificity
Formally defined as the probability of a negative diagnosis given that the patient does not have the disease, specificity is the ability of a test to correctly identify the control group. In this study, specificity is the proportion of cognitively normal patients correctly identified as normal.
Positive Predictive Value (PPV)
PPV is defined as the probability that the patient has the disease, given that the test result is positive. It is calculated as the proportion of the sample population that is identified by the test as the target group, who in fact belong to the target group. In the context of this study, PPV is the proportion of patients identified as AD patients by the classifier, who actually have AD.
Negative Predictive Value (NPV)
Not used as commonly, NPV is the probability that the patient does not have the disease, given that the test result is negative. It is calculated as the proportion of the sample population that is identified by the test to belong to control group, who in fact belong to control group. In the context of this study, NPV is the proportion of patients identified as normal by the classifier, who are in fact cognitively normal.
In Table 1, A is the number of patients classified as AD, who are in fact diagnosed as probable AD by the clinical evaluation; B is the number of patients, also classified as AD (albeit incorrectly), who are in fact cognitively normal; C is the number of patients classified as normal (again, incorrectly), who were originally diagnosed as probable AD; and D is the number of patients who are (correctly) classified as cognitively normal, and are in fact clinically determined to be cognitively normal. A+B+C+D is the total number of patients. Then,
$$\mathrm{OGP} = \frac{A + D}{A + B + C + D} \tag{16}$$

$$\mathrm{Sensitivity} = \frac{A}{A + C} \tag{17}$$

$$\mathrm{Specificity} = \frac{D}{B + D} \tag{18}$$

$$\mathrm{PPV} = \frac{A}{A + B} \tag{19}$$

$$\mathrm{NPV} = \frac{D}{C + D} \tag{20}$$
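A direct implementation of Eqs. (16)–(20), computing the diagnostic quantities from the counts A, B, C, D of Table 1, is sketched below; the example counts are hypothetical and not taken from the paper's results.

```python
# Diagnostic figures from the confusion-matrix counts defined in Table 1.
def diagnostic_figures(A: int, B: int, C: int, D: int) -> dict:
    total = A + B + C + D
    return {
        'OGP':         (A + D) / total,    # overall accuracy, Eq. (16)
        'sensitivity': A / (A + C),        # true AD identified as AD, Eq. (17)
        'specificity': D / (B + D),        # normals identified as normal, Eq. (18)
        'PPV':         A / (A + B),        # Eq. (19)
        'NPV':         D / (C + D),        # Eq. (20)
    }

print(diagnostic_figures(A=20, B=5, C=8, D=19))   # hypothetical counts
```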
4. Results
As mentioned above, six sets of ERPs were obtained from each patient (3 electrodes, 2 types of stimuli), and each set was analyzed in 5 frequency bands (0–1 Hz, 1–2 Hz, 2–4 Hz, 4–8 Hz and 8–16 Hz, each constituting an individual feature set), using each of the 4 types of wavelets. Diagnostic classification performances, along with sensitivity, specificity, positive and negative predictive values, were obtained for each of the above mentioned combinations. Furthermore, considering that ERPs are known to occupy primarily the 0–4 Hz range, we also included this frequency range as a 6th feature set, obtained by concatenating the first three sets of coefficients.
Presenting the results for every combination would be impractical and would unnecessarily lengthen this paper. Summary results are therefore provided here. Specifically, the results corresponding to the Daubechies 4 wavelet are provided in most detail, including the performance of each frequency band for each of the six sets of ERPs. Db4 was chosen due to its common use in a broad range of applications, including the analysis of biological signals. Then, the performance figures for each set of ERPs using each of the four wavelets are provided, but only for the highest performing frequency band. For all cases, we provide the overall classification performance obtained by a single classifier, as well as by an ensemble of 5 classifiers. Both single and ensemble performances are averages of three independent 52-fold leave-one-out trials, whereas the Best Ensemble is the best performing leave-one-out trial of the three. As we discuss below, the ensembles performed, on average, better than individual classifiers. The sensitivity, specificity and positive predictive values are therefore provided for the ensemble performances.
Tables 2–4 summarize the performance figures obtained when responses to target tones were processed with the Db4 wavelet, for each of the three electrodes. The best performing spectral band for all electrodes was the 2–4 Hz range (corresponding to level D6 in Figures 4 and 5), indicated in bold face in Tables 2–4. Of the three electrodes, the best performance across all categories was obtained with the CZ electrode, with an average ensemble performance of 72.4%, a best ensemble performance of 75%, and sensitivity, specificity, PPV and NPV values of 68.6%, 69.2%, 72.5%, and 65.3%, respectively.
Table 2.
Spectral specific performances obtained from CZ electrode – target response (Db4)
| Target CZ | Single Classifier OGP | Ensemble OGP | Best Ensemble | Ensemble Sensitivity | Ensemble Specificity | Ensemble PPV | Ensemble NPV |
|---|---|---|---|---|---|---|---|
| 0–1 Hz | 55.7 | 57.7 | 61.5 | 48.6 | 63.3 | 61.6 | 51.2 |
| 1–2 Hz | 50.0 | 48.7 | 51.9 | 47.9 | 47.5 | 51.5 | 43.8 |
| 2–4 Hz | 62.8 | 72.4 | 75.0 | 68.6 | 69.2 | 72.5 | 65.3 |
| 4–8 Hz | 56.4 | 54.5 | 57.7 | 51.4 | 47.5 | 53.9 | 45.0 |
| 8–16 Hz | 53.2 | 55.1 | 61.5 | 57.1 | 49.2 | 56.7 | 49.7 |
| 0–4 Hz | 55.8 | 57.1 | 57.7 | 51.4 | 58.3 | 59.5 | 50.5 |
Table 4.
Spectral specific performances obtained from FZ electrode – target response (Db4)
| Target FZ | Single Classifier OGP | Ensemble OGP | Best Ensemble | Ensemble Sensitivity | Ensemble Specificity | Ensemble PPV | Ensemble NPV |
|---|---|---|---|---|---|---|---|
| 0–1 Hz | 58.3 | 59.0 | 63.5 | 49.3 | 64.2 | 61.7 | 52.0 |
| 1–2 Hz | 59.0 | 55.8 | 61.5 | 50.0 | 51.7 | 54.3 | 47.4 |
| 2–4 Hz | 59.0 | 64.7 | 65.4 | 62.9 | 61.7 | 66.0 | 58.7 |
| 4–8 Hz | 60.3 | 62.8 | 67.3 | 54.3 | 63.3 | 63.6 | 54.2 |
| 8–16 Hz | 57.7 | 58.3 | 59.6 | 50.7 | 61.7 | 60.7 | 51.8 |
| 0–4 Hz | 55.1 | 54.5 | 57.7 | 52.1 | 48.3 | 53.7 | 46.8 |
Tables 5–7 summarize the classification performances obtained when responses to novel tones were processed with the Db4 wavelet, for each of the three electrodes. The performances obtained with the novel tones, particularly for the PZ electrode, were significantly better than those obtained with the target tones. This is perhaps one of the most surprising outcomes of this study, as the novel tones were not originally intended to be used for AD versus normal discrimination, but rather to improve the robustness of the P300 component generated in response to the target tones. Furthermore, the spectral band that provides the best performance is 1–2 Hz, as opposed to the 2–4 Hz band that performed well with target tones. With the PZ electrode, ERPs in response to novel tones decomposed with the Db4 wavelet obtained overall generalization performances of 75%, 78.2% and 80.8% for the averaged single classifier, average ensemble classifier and best ensemble classifier, respectively. The sensitivity was 67.1%, specificity 78.3%, and PPV 78.8%.
Table 5.
Spectral specific performances obtained from CZ electrode – novel response (Db4)
| Novel CZ | Single Classifier OGP | Ensemble OGP | Best Ensemble | Ensemble Sensitivity | Ensemble Specificity | Ensemble PPV | Ensemble NPV |
|---|---|---|---|---|---|---|---|
| 0–1 Hz | 53.8 | 54.5 | 55.8 | 49.3 | 56.7 | 67.0 | 49.0 |
| 1–2 Hz | 51.9 | 51.3 | 57.7 | 49.3 | 48.3 | 53.0 | 44.4 |
| 2–4 Hz | 56.4 | 54.5 | 55.8 | 45.7 | 62.5 | 58.7 | 49.7 |
| 4–8 Hz | 62.8 | 64.1 | 65.4 | 57.9 | 68.3 | 68.4 | 58.1 |
| 8–16 Hz | 55.8 | 57.7 | 57.7 | 50.0 | 62.5 | 61.5 | 51.6 |
| 0–4 Hz | 60.9 | 57.7 | 59.6 | 48.6 | 64.2 | 61.3 | 51.7 |
Table 7.
Spectral specific performances obtained from FZ electrode – novel response (Db4)
| Novel FZ | Single Classifier OGP | Ensemble OGP | Best Ensemble | Ensemble Sensitivity | Ensemble Specificity | Ensemble PPV | Ensemble NPV |
|---|---|---|---|---|---|---|---|
| 0–1 Hz | 51.9 | 50.6 | 51.9 | 42.8 | 55.0 | 52.3 | 45.4 |
| 1–2 Hz | 52.6 | 51.9 | 55.8 | 47.8 | 50.8 | 53.2 | 45.5 |
| 2–4 Hz | 54.5 | 58.3 | 59.6 | 47.1 | 66.7 | 52.4 | 51.9 |
| 4–8 Hz | 50.6 | 56.4 | 57.7 | 47.1 | 57.5 | 56.4 | 48.3 |
| 8–16 Hz | 53.8 | 57.7 | 59.6 | 53.6 | 59.2 | 60.4 | 52.5 |
| 0–4 Hz | 51.9 | 49.4 | 50.0 | 37.1 | 54.2 | 48.1 | 42.6 |
It is also interesting to note that the 0–4 Hz band, which includes both the 1–2 Hz and 2–4 Hz coefficients, did not perform as well, indicating that the 0–1 Hz coefficients are not only non-discriminatory in their own right, but that their inclusion has a deteriorating effect.
The same set of experiments was repeated with the three additional wavelets, Db8, Sym5 and quadratic B-splines. The results for all four wavelets are summarized in Tables 8–11, where performances for only the best performing spectral bands are included. Each row in Table 8 is therefore the best row from the above six tables, and is included here for completeness. Several interesting observations can be made from these results. First, the best performing frequency band was the same regardless of the wavelet used: 2–4 Hz for responses to target tones from all three electrodes and for novel tones from the FZ electrode; 1–2 Hz for responses to novel tones from the PZ electrode; and 4–8 Hz for responses to novel tones from the CZ electrode. This supports the reasonable proposition that the choice of wavelet does not change the amount of information provided by each frequency band. Second, while the best performing frequency bands do not change, the actual performance figures do vary depending on the wavelet chosen. Tables 8–11 indicate that the symlet wavelets perform better than all other wavelets, whereas the quadratic B-splines provide the lowest performance. Furthermore, whereas the average single classifier performance of 67.9% obtained with the symlets is lower than that obtained with Db4 (75%), the average ensemble performance of 79.5% and the best ensemble performance of 84.6% outperform those of the Db4 wavelet. Finally, the medical diagnostic figures are also higher with the symlets, providing 73.5% sensitivity, 79.2% specificity, and 80.4% PPV.
Table 8.
Best spectral performances obtained using Db4 wavelet
| Db4 | Single Classifier OGP | Ensemble OGP | Best Ensemble | Ensemble Sensitivity | Ensemble Specificity | Ensemble PPV | Ensemble NPV |
|---|---|---|---|---|---|---|---|
| TC 2–4Hz | 62.8 | 72.4 | 75.0 | 68.6 | 69.2 | 72.5 | 65.3 |
| TP 2–4Hz | 60.9 | 66.0 | 67.3 | 62.1 | 65.0 | 67.6 | 59.5 |
| TF 2–4Hz | 59.0 | 64.7 | 65.4 | 62.9 | 61.7 | 65.9 | 58.7 |
| NC 4–8Hz | 62.8 | 64.1 | 65.4 | 57.9 | 68.3 | 68.4 | 58.1 |
| NP 1–2Hz | 75.0 | 78.2 | 80.7 | 67.1 | 78.3 | 78.8 | 67.0 |
| NF 2–4Hz | 54.5 | 58.3 | 59.6 | 47.1 | 66.7 | 62.5 | 51.9 |
Table 11.
Best spectral performances obtained using Qbs wavelet
| Qbs | Single Classifier OGP | Ensemble OGP | Best Ensemble | Ensemble Sensitivity | Ensemble Specificity | Ensemble PPV | Ensemble NPV |
|---|---|---|---|---|---|---|---|
| TC 2–4Hz | 50.0 | 52.6 | 53.8 | 47.9 | 50.8 | 52.8 | 45.7 |
| TP 2–4Hz | 52.6 | 57.1 | 61.5 | 57.8 | 49.2 | 56.8 | 50.5 |
| TF 2–4Hz | 56.4 | 55.8 | 55.8 | 55.7 | 49.2 | 56.0 | 49.0 |
| NC 4–8Hz | 59.6 | 57.7 | 63.5 | 52.1 | 57.5 | 59.0 | 50.6 |
| NP 1–2Hz | 67.3 | 65.4 | 65.4 | 57.1 | 71.7 | 70.4 | 59.1 |
| NF 2–4Hz | 47.4 | 50.0 | 50.0 | 43.6 | 54.2 | 52.7 | 45.0 |
5. Conclusions & Discussions
The application presented in this work is concerned with automated early diagnosis of Alzheimer’s disease, using a non-invasive and cost-effective biomarker that can be measured in a community healthcare clinic setting. This problem is widely recognized as a particularly difficult one, not only for machine learning algorithms, but even for most neurophysiologists and neuropsychologists. The difficulty of the problem is exacerbated by our requirement to diagnose the disease at its earliest possible stage, during which the symptoms are often not much different from those associated with normal aging.
The proposed approach seeks a synergistic combination of some well-established techniques, such as ERP analysis using multiresolution wavelets, with more recent developments in machine learning, such as ensemble systems. Specifically, we analyze the discriminatory ability of ERPs obtained in response to novel tones, as well as to the commonly used target tones, acquired from three different electrodes. One of the most surprising outcomes of this study is the ability of the responses to novel tones to discriminate the ERPs of cognitively normal people from those of patients with the earliest form of Alzheimer’s disease. It was found that the novel-tone responses at 1–2 Hz, acquired from the PZ electrode, provide the best performance regardless of the wavelet used, though the best performances were obtained when these signals were decomposed and analyzed using the symlet wavelets with 5 vanishing moments. Daubechies wavelets with 4 vanishing moments closely followed Sym5. It should be noted that symlets have properties similar to those of Daubechies wavelets, and in fact they look similar; however, symlets are nearly symmetric, whereas Daubechies wavelets are not.
Ensemble performances were in general higher than single classifier performances, and sometimes by wide margins, demonstrating the usefulness of the ensemble approach. However, not all ensemble performances were better than those of individual classifiers, indicating that the ensemble classifiers must be constructed with care: if all classifiers that constitute the ensemble provide similar information, then there is nothing to be gained from using an ensemble approach. However, if the classifiers are negatively correlated, that is, they make errors on different instances, their combination can provide a performance boost.
Recall that a recent study estimated community clinic based physicians’ diagnostic performance at 83% sensitivity, 55% specificity and 75% overall classification accuracy. While the results of this study are quite satisfactory in their own right (from a computational intelligence perspective), they are particularly meaningful within the context of this application. This is because the ensemble generalization performance in the 80% range exceeds the 75% diagnostic performance of trained physicians at community based healthcare providers – despite the physicians’ benefit of a longitudinal study. Furthermore, with sensitivity, specificity and positive predictive values also reaching the 80% range, these results are particularly promising, and provide clinically useful outcomes.
A particularly interesting observation can also be made with sensitivity and specificity figures: at 73%, the sensitivity of the novel PZ at 1–2 Hz is the only metric that is lower than that obtained by the community physicians (83%); however, the specificity of the approach at 79.2% is significantly better than the 55% obtained by the physicians. Therefore, while the approach is promising, and outperforms the community physicians in general diagnostic performance, it is particularly useful in discriminating cognitively normal individuals from early AD patients, as measured by specificity.
Overall, the results presented above are significant because an EEG based automated classification system is non-invasive, objective, substantially more cost-effective than clinical evaluations, and can be easily implemented at community clinics, where most patients get their first intervention.
Our current and future work includes analyzing additional wavelets, as well as combining the discriminatory information provided by individual frequency bands in a data fusion setting using the ensemble of classifiers approach. We note that simply concatenating the coefficients from different frequency bands does not necessarily provide better performance, as demonstrated by the poor performance of the 0–4 Hz coefficients. However, combining individual ensembles of classifiers, each trained with signals in a particular frequency range, may prove to be more effective.
Summary
The rapidly growing proportion of the elderly population, combined with the lack of standard and effective diagnostic procedures available to community healthcare providers, makes early diagnosis of Alzheimer’s disease (AD) a major public healthcare concern. Several signal processing based approaches – some combined with automated classifiers – have been proposed for the analysis of electroencephalogram (EEG) signals, with varying degrees of success. To date, the final outcomes of these studies remain largely inconclusive, primarily due to the lack of adequately sized study cohorts, as well as the inherent difficulty of the problem. This paper describes a new effort using multiresolution wavelet analysis of event related potentials (ERPs) of the EEG to investigate whether the EEG can be a reliable biomarker for AD.
Several factors set this study apart from similar prior efforts: (i) a larger cohort recruited through a strict inclusion/exclusion protocol, and diagnosed through a thorough and rigorous clinical evaluation process; (ii) data from three different electrodes and two different stimulus tones (target and novel) are analyzed; (iii) different mother wavelets have been employed in the analysis of the signals; (iv) the performances of six frequency bands (0–1 Hz, 1–2 Hz, 2–4 Hz, 4–8 Hz, 8–16 Hz and 0–4 Hz) have been individually analyzed; (v) an ensemble of classifiers based decision is implemented and compared to a single classifier based decision; and, most importantly, (vi) the earliest possible diagnosis of the disease is targeted. Some expected, and some interesting, outcomes were observed with respect to each parameter analyzed.
Individual frequency bands
In general, 1–2 Hz and 2–4 Hz bands provide the most discriminatory information. This was expected as the spectral content of the primary ERP component of interest, the P300, is known to reside in these intervals.
The choice of wavelet
The best performing bands remained the same when the data were analyzed using different wavelets. However, the specific performances of the frequency bands varied with the choice of the wavelet, with Sym5 providing the best results.
Choice of classifier
In general, ensemble based systems outperformed single classifier based systems, sometimes with wide margins, indicating that such systems should be examined more carefully.
Choice of Electrode
As expected, the PZ electrode provided the best performance, confirming results of previous efforts.
Choice of stimulus
Most surprisingly, the ERPs obtained in response to novel tones provided a better diagnostic performance than the traditionally used responses to target tones.
Diagnostic performance
Perhaps the most significant outcome of this study is the set of promising results obtained with the proposed approach. At around 80%, the overall performance of the proposed approach exceeded that of trained community clinic physicians, and closely approached the gold standard performance of university hospital based clinical evaluation. While the algorithm did well on all diagnostic performance figures, such as sensitivity, specificity and positive predictive value, its performance on specificity was particularly promising.
Considering the challenging nature of diagnosing AD at its earliest stages, the results of this study support the feasibility of this technique as a low-cost, objective, noninvasive approach that can be easily made available to community clinics.
Table 3.
Spectral specific performances obtained from PZ electrode – target response (Db4)
| Target PZ | Single Classifier OGP | Ensemble OGP | Best Ensemble | Ensemble Sensitivity | Ensemble Specificity | Ensemble PPV | Ensemble NPV |
|---|---|---|---|---|---|---|---|
| 0–1 Hz | 54.5 | 53.8 | 55.8 | 46.4 | 57.5 | 55.9 | 48.1 |
| 1–2 Hz | 55.7 | 62.2 | 65.4 | 59.3 | 60.8 | 63.7 | 56.4 |
| 2–4 Hz | 60.9 | 66.0 | 67.3 | 62.1 | 65.0 | 67.6 | 59.5 |
| 4–8 Hz | 49.4 | 53.9 | 55.7 | 51.4 | 50.8 | 55.2 | 47.0 |
| 8–16 Hz | 53.2 | 58.3 | 59.6 | 53.6 | 60.8 | 61.5 | 52.9 |
| 0 –4 Hz | 58.3 | 60.9 | 65.4 | 56.4 | 62.5 | 63.7 | 55.2 |
Table 6.
Spectral specific performances obtained from PZ electrode – novel response (Db4)
| Novel PZ | Single Classifier OGP | Ensemble OGP | Best Ensemble | Ensemble Sensitivity | Ensemble Specificity | Ensemble PPV | Ensemble NPV |
|---|---|---|---|---|---|---|---|
| 0–1 Hz | 61.5 | 66.0 | 67.3 | 57.8 | 73.3 | 71.7 | 59.9 |
| 1–2 Hz | 75.0 | 78.2 | 80.8 | 67.1 | 78.3 | 78.8 | 67.0 |
| 2–4 Hz | 63.5 | 66.0 | 69.2 | 54.3 | 72.5 | 69.9 | 57.7 |
| 4–8 Hz | 62.8 | 65.4 | 69.2 | 60.0 | 61.7 | 65.2 | 56.6 |
| 8–16 Hz | 63.5 | 67.3 | 71.2 | 52.1 | 78.3 | 73.9 | 58.4 |
| 0–4 Hz | 70.5 | 73.7 | 75.0 | 65.0 | 80.8 | 79.9 | 66.5 |
Table 9.
Best spectral performances obtained using Db8 wavelet
| Db8 | Single Classifier OGP | Ensemble OGP | Best Ensemble | Ensemble Sensitivity | Ensemble Specificity | Ensemble PPV | Ensemble NPV |
|---|---|---|---|---|---|---|---|
| TC 2–4Hz | 50.6 | 53.8 | 59.6 | 48.6 | 51.7 | 54.1 | 46.2 |
| TP 2–4Hz | 52.6 | 55.8 | 57.7 | 51.4 | 56.7 | 58.1 | 50.0 |
| TF 2–4Hz | 46.8 | 53.8 | 53.8 | 48.6 | 55.8 | 56.3 | 48.2 |
| NC 4–8Hz | 59.0 | 64.1 | 67.3 | 57.1 | 68.3 | 57.9 | 57.7 |
| NP 1–2Hz | 69.2 | 73.7 | 75.0 | 66.4 | 77.5 | 77.5 | 66.5 |
| NF 2–4Hz | 53.2 | 50.6 | 51.9 | 42.9 | 53.3 | 51.7 | 44.4 |
Table 10.
Best spectral-band performances (in %) obtained using the Sym5 wavelet (row labels: T = target, N = novel response; C, P, F = CZ, PZ, FZ electrodes)
| Sym5 | Single Classifier OGP | Ensemble OGP | Best Ensemble | Ensemble Sensitivity | Ensemble Specificity | Ensemble PPV | Ensemble NPV |
|---|---|---|---|---|---|---|---|
| TC 2–4Hz | 53.8 | 55.1 | 57.7 | 51.4 | 55.8 | 57.8 | 49.5 |
| TP 2–4Hz | 51.3 | 46.8 | 50.0 | 47.1 | 41.7 | 48.6 | 40.1 |
| TF 2–4Hz | 55.8 | 62.2 | 67.3 | 60.7 | 57.5 | 62.6 | 55.6 |
| NC 4–8Hz | 64.1 | 62.8 | 69.2 | 58.6 | 60.0 | 63.6 | 55.0 |
| NP 1–2Hz | 67.9 | 79.5 | 84.6 | 73.5 | 79.2 | 80.4 | 72.1 |
| NF 2–4Hz | 54.5 | 51.9 | 53.8 | 47.9 | 52.5 | 53.9 | 46.4 |
Acknowledgments
This work is supported by the National Institute on Aging of the National Institutes of Health under grant numbers P30 AG10124 and R01 AG022272, and by the National Science Foundation under Grant No. ECS-0239090.
Biographies
Robi Polikar is an Assistant Professor of Electrical and Computer Engineering at Rowan University in Glassboro, NJ. He received his B.Sc. degree in electronics and communications engineering from Istanbul Technical University, Istanbul, Turkey, in 1993, and his M.Sc. and Ph.D. degrees, both co-majors in electrical engineering and biomedical engineering, from Iowa State University, Ames, IA, in 1995 and 2000, respectively. His current research interests are in pattern recognition, ensemble systems, computational models of learning, incremental learning, and their applications in biomedical engineering.
Apostolos Topalis is a graduate student with the Electrical and Computer Engineering department at Rowan University. His areas of interest include signal processing, pattern recognition and microprocessor system design.
Deborah Green is the lab manager for the EEG Lab at Drexel University, Philadelphia, PA. She received her B.A. degree in psychology from the University of Delaware in 2003. Her research interests are in the neuropsychology of healthy aging and dementia, specifically in early diagnostic tools and possible intervention strategies, and the effects of mood and overall health on cognition. She will soon start her Ph.D. studies at Drexel.
John Kounios is a Professor of Psychology at Drexel University and the director of the EEG Lab. He received his dual-major B.A. degree in psychology and music theory/composition from Haverford College in 1978, and his Ph.D. in experimental psychology from the University of Michigan in 1985. His research focuses on the neural and cognitive bases of semantic information processing, problem solving, and creativity.
Christopher M. Clark, a board-certified neurologist, is an Associate Professor of Neurology, Associate Director of the Alzheimer’s Disease Center, Director of the Memory Disorders Clinic (MDC), and Director of the recently initiated Center of Excellence for Research on Neurodegenerative Diseases. He is a Fellow of the University of Pennsylvania’s Institute on Aging. His research interests focus on Alzheimer’s disease and the development of diagnostically specific markers, the identification and evaluation of new treatments, the development of new instruments to measure rates of change, and studies of the relationship between Parkinson’s disease and Alzheimer’s disease.
