Higher-order spectral analysis of spontaneous speech signals in Alzheimer’s disease

Mahda Nasrolahzadeh; Zeynab Mohammadpoory; Javad Haddadnia

doi:10.1007/s11571-018-9499-8

2018 Aug 27;12(6):583–596. doi: 10.1007/s11571-018-9499-8

PMCID: PMC6233329 PMID: 30483366

Abstract

Early and accurate diagnosis of Alzheimer’s disease (AD) has attracted increasing attention in recent years. One of the main problems in AD is the loss of language skills. This paper presents a computational framework for distinguishing AD patients from healthy control subjects using information from spontaneous speech signals. Spontaneous speech data were obtained from 30 AD patients and 30 healthy controls. Because of the nonlinear and dynamic nature of speech signals, higher order spectral (HOS) features, specifically bispectral features, were used for analysis. Four classifiers (k-Nearest Neighbor, Support Vector Machine, Naïve Bayes and Decision Tree) were used to classify subjects into three levels of AD and a healthy group on the basis of the HOS-based features. Ten-fold cross-validation was used to test the reliability of the classifier results. The results showed that the proposed method has good potential for AD diagnosis and was able to detect the earliest stage of AD with high accuracy. The method has the advantage of being non-invasive, cost-effective, and free of side effects. Therefore, the proposed method can serve as a spontaneous-speech-based test for the pre-clinical evaluation of AD.

Keywords: Alzheimer’s disease, Spontaneous speech signal, Bispectrum estimation, Bicoherence estimation, Phase coupling

Introduction

Alzheimer’s disease (AD) is one of the most common neurodegenerative disorders among elderly people. AD causes a gradual loss of mental ability, with problems in memory, understanding, judgment, thinking and language use (McKhann et al. 2011; Jack and Holtzman 2013). In addition to memory loss, one of the main problems in AD is the loss of language skills. The degree to which the ability to communicate through language is lost depends on the stage of the disease. Speech deficits occur in three stages of AD: the pre-clinical stage, the intermediate stage, and the advanced stage. The pre-clinical stage is characterized by difficulty in finding the correct words in spontaneous speech; often the word is not found at all. In the intermediate stage, the ability to use language and vocabulary in everyday situations weakens. In the advanced stage, answers are sometimes very limited and restricted to very few words (Buiza 2010; Martinez et al. 2012; Hu et al. 2010). At this stage the impairment is severe enough to affect the subject’s ability to manage daily activities.

Despite persistent efforts to find an effective treatment for AD, the causes of the disease are still not understood. Nevertheless, an early and accurate diagnosis, combined with prescribed medication, has been shown to be useful in delaying the symptoms.

In the recent literature, AD has been diagnosed using several techniques, such as electroencephalogram (EEG) signals (Dauwels et al. 2010), magnetic resonance imaging (MRI) (Zhang et al. 2015a, b), computed tomography (CT) imaging (Reynolds 2013), functional magnetic resonance imaging (fMRI) (Zhang et al. 2015c), positron emission tomography (PET) (Kippenhan et al. 1994), and single-photon emission computed tomography (SPECT) (Salas-Gonzalez et al. 2010). The relative efficiency and reliability of these techniques have been tested in several studies using machine learning algorithms (Ortiz et al. 2013). However, these methods are expensive, difficult and time-consuming to use.

Fortunately, spontaneous speech analysis, a relatively inexpensive and non-invasive technique, can improve the performance of the methods in early diagnosis of AD (López de Ipiña et al. 2013a, b; König et al. 2015; Nasrolahzadeh et al. 2014, 2015a, b, 2016a; Nasrolahzadeh and Haddadnia 2016).

In this respect, López de Ipiña et al. (2013) used an artificial neural network for the analysis of emotional features and the fractal dimension of spontaneous speech. They reported a classification accuracy of 97.7%.

In another study, A. König et al. used automatic speech analysis for AD diagnosis. They obtained the following classification accuracies: between healthy control-subjects (HCs) and those with mild cognitive impairment (MCI), 79 ± 5%; between HCs and those with AD, 87 ± 3%; and between those with MCI and those with Alzheimer’s, 80 ± 5%, demonstrating its assessment utility (König et al. 2015).

To discriminate control subjects from AD subjects at three different levels of AD, a technique based on acoustic features with an ANFIS classifier was used. An accuracy of 97.96% was achieved among the four groups (Nasrolahzadeh et al. 2014). Moreover, it was shown that nonlinear features of spontaneous speech signals have good potential for AD diagnosis (Nasrolahzadeh et al. 2015a, b, 2016a; Nasrolahzadeh and Haddadnia 2016).

In recent years, higher order spectra (HOS) have been used to represent signal properties in pattern recognition, and many studies continue to use them in a variety of fields. Because of their good performance, HOS have attracted considerable attention for studying the behavior of a dynamical system from an experimental time series such as a speech signal (Nikias and Raghuveer 1987; Nasrolahzadeh et al. 2016b).

Because of these characteristics, HOS, as a nonlinear method that captures subtle changes in signals, can be used to extract features for automated classification (Mookiah et al. 2012). HOS provide information that cannot be obtained through conventional spectral analysis techniques, and they perform better on weak and noisy signals. HOS can reveal information about nonlinear signal generation mechanisms, particularly those involving quadratic nonlinearities and deviation from Gaussianity (Chua et al. 2010). This is particularly interesting for speech signal processing, because it has been suggested that spontaneous speech generation may be a nonlinear process: the air flow generated in the vocal tract is disturbed, and nonlinear neuro-muscular processes may take place at the larynx and at the level of the vocal cords. Moreover, nonlinear coupling may occur between various parts of the vocal tract during speech generation (Indrebo et al. 2004; Banbrook and Mclughlin 1994; Kumar and Mullick 1990; Teodorescu et al. 1996; Silipo et al. 1998).

Among the available methods, the bispectrum and bicoherence have been widely used because they can efficiently quantify nonlinear interactions among harmonic peaks (such as phase coupling). Furthermore, some studies have shown that a non-zero bispectrum indicates a nonlinear interaction in the system.

So far the HOS analysis has been used in several studies to investigate the speech signal. More specifically, the HOS measures have been used for pitch period estimation (Dogan and Mendel 1992; Moreno and Fonollosa 1992), speech endpoint detection (Rangoussi et al. 1993a, b), speech recognition (Paliwal and Sondhi 1991) and the determination of LPC parameters (Moreno et al. 1993; Salavedra et al. 1993a, b; Gdoura et al. 1993).

Toddler et al. (2013) applied the bispectral analysis, which provides an estimate of inter-word time intervals in psychotic speech, to distinguish psychotic subjects from healthy ones. Their results showed the psychotic group had a higher level of bicoherence, that is, a higher level of phase coupling. In another study, Azarbarzin and Moussavi (2011) used the bispectrum and bicoherence measures to discriminate snore segments. Their results showed that all the recorded fragments were non-Gaussian.

The present study aims to develop a new approach for the early diagnosis of Alzheimer’s disease by extracting HOS features from the spontaneous speech signals of healthy subjects and three groups of AD patients. More specifically, this study seeks to answer the following two questions:

Can HOS-extracted features from the spontaneous speech signal provide a practical basis for AD diagnosis?
Can a good classifier be selected by feeding the extracted features to the KNN, SVM, DT and NB classifiers, so that it can serve as a reliable basis for automatic discrimination among the subjects in the four groups?

The rest of this paper is organized as follows: in the next section, the datasets used in this study are described. Next, the methods and the quantification analyses used in the study are presented in section “Methodology”. The simulation results, the evaluation of the performance of the proposed method, and the discussions are summarized in section “Experimental results and discussion”. Finally, the conclusions are provided in section “Conclusion”.

Materials used

In what follows, the different steps taken in this study are presented.

Data collection

Some 60 participants were purposefully selected for this study: a group of 30 healthy control subjects (HCS), (52–98 years old, 15 women, 15 men) and a group of 30 AD patients (52–98 years old, 14 women, 16 men), with three levels of AD: First Stage (FS), Second Stage (SS) and Third Stage (TS), (FS = 6, SS = 15, TS = 9).

They were selected among the patients in a nursing home for the elderly in Sabzevar according to the consensus diagnostic guidelines (McKeith et al. 1996). The patients in the three stages of AD had a Mini-Mental State Examination (MMSE) score of 14–26 and a Clinical Dementia Rating (CDR) of 0.5 or 1.0 and met the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association (NINCDS/ADRDA) criteria for probable AD. The normal subjects were non-depressed, non-MCI, and non-demented and had an MMSE score of 27–30 and a CDR of 0. Table 1 shows the age information, MMSE scores and CDR values for the four groups.

Table 1.

Demographic and Clinical characteristics of the study population

	HCS (N = 30; 15F/15M)			FS (N = 6; 3F/3M)			SS (N = 15; 6F/9M)			TS (N = 9; 5F/4M)
	Mean	SD	Range	Mean	SD	Range	Mean	SD	Range	Mean	SD	Range
Age	75.6	5.6	52–98	73.3	5	60–86	70.6	6.5	52–88	77.4	6.3	56–98
MMSE	28.39	1.4	27–30	27.5	1.5	24–30	26.8	1.7	23–30	23.8	2.1	20–26
CDR	0	0	0	0.3	0.0	0–0.5	0.5	0.0	0.5–0.5	0.7	0.3	0.5–1


All of the patients underwent a standard battery of examinations consisting of medical history, physical and neurological examination, screening laboratory tests, psychometric tests and brain imaging. Diagnoses were made according to the consensus criteria for the diagnosis of AD and NINCDS/ADRDA criteria.

This study was approved by the Institutional Review Board of all participants and was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki.

After the patients were briefed about the objectives of the study, they gave their informed consent prior to their inclusion in the study. They were specifically asked to converse in a friendly way, tell graceful personal stories and express their feelings. The recording atmosphere was relaxed and friendly.

The speech signals were recorded using an audio recorder, and the audio was extracted in WAV format. The sampling frequency and bit depth were 16 kHz and 16 bits, respectively. About 15 h and 17 h of speech were recorded for the healthy control and AD groups, respectively. The recording time was longer for the subjects with Alzheimer’s because they needed more time than the healthy individuals to find words: they spoke more slowly and less clearly, with longer pauses, and their messages were interrupted or remained incomplete. The database was intentionally multicultural (the 60 subjects came from different birthplaces: 10, 20, 6, 12, 1, 4, 2, and 5 subjects from Mashhad, Sabzevar, Bojnord, Tehran, Fariman, Ghochan, Kordestan, and Neyshabour, respectively) so that a methodology could be developed that is independent of cultural, social and linguistic factors.

Non-analyzable events such as laughing, coughing, short hard noises and segments in which speakers overlapped were eliminated after recording. Next, background noise was removed using the method described in Mohammadpoory and Haddadnia (2014). After that, 4 h remained for the AD groups and 12 h for the control subjects for further analysis. Finally, the speech was divided into consecutive segments of 60 s in order to obtain comparable segments for all speakers. In this way, a database of about 960 segments of spontaneous speech (70 segments for FS, 110 segments for SS, 60 segments for TS and 720 segments for the HCS group) was developed. A more detailed description of the database can be found in our previous works (Nasrolahzadeh et al. 2016a, b).
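
The division into consecutive 60 s segments can be illustrated with a minimal Python sketch. The function name `split_into_segments` is hypothetical, and dropping a trailing remainder shorter than 60 s is an assumption; the paper does not state how incomplete segments were handled.

```python
def split_into_segments(samples, sample_rate=16_000, segment_seconds=60):
    """Split a sequence of samples into consecutive, non-overlapping segments.

    Assumption: a trailing remainder shorter than one full segment is
    dropped so that all segments have the same length.
    """
    segment_len = sample_rate * segment_seconds
    n_full = len(samples) // segment_len
    return [samples[i * segment_len:(i + 1) * segment_len] for i in range(n_full)]


# Example: 150 s of (silent) audio at 16 kHz yields two full 60 s segments.
signal = [0] * (150 * 16_000)
segments = split_into_segments(signal)
```

At 16 kHz each segment holds 960,000 samples, which is the length on which the later bispectral analysis would operate.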

Methodology

Feature extraction and selection

The choice of feature extraction method plays an important role in the designing of an accurate diagnostic system. In the next step, the salient features were extracted with which the spontaneous speech signals of the healthy control subjects and patients in three stages of AD could be discriminated. In this paper, HOS based features were applied as inputs to classify these four groups. In the following part, the features along with the extraction method are presented.

Feature sets should be optimized so that computationally low-cost applications can be developed. The information gain criterion was used to select the features (Jia et al. 2005). In this method, the usefulness of a feature is estimated by measuring its gain ratio with respect to the class (López de Ipiña et al. 2015). The higher this value for a feature, the better it separates the classes.
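
To make the criterion concrete, the following Python sketch computes the information gain of a single numeric feature. It is only an illustration: the equal-width binning, the number of bins, the function names and the toy values are assumptions, and the discretization used by the software in the study may differ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, n_bins=4):
    """Information gain of one feature after equal-width binning (an assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0          # avoid a zero width for constant features
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    conditional = 0.0
    for b in set(bins):
        subset = [y for y, bb in zip(labels, bins) if bb == b]
        conditional += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - conditional


# Hypothetical toy data: two samples per group.
labels = ["HCS", "HCS", "FS", "FS", "SS", "SS", "TS", "TS"]
informative = [0.1, 0.2, 1.1, 1.2, 2.1, 2.2, 3.1, 3.2]    # separates the classes
uninformative = [0.1, 3.2, 0.2, 3.1, 0.1, 3.2, 0.2, 3.1]  # unrelated to the class
gains = {
    "informative": information_gain(informative, labels),
    "uninformative": information_gain(uninformative, labels),
}
```

In this toy case the first feature recovers the full class entropy of 2 bits, while the second yields no gain, which is the sense in which a higher value means better class separation.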

Bispectrum estimation

Higher order spectra are well-known descriptors that are widely used in the analysis of spectral systems. HOS are spectral representations of higher order statistics such as cumulants or moments of higher orders (Martis et al. 2013). A special case is the third-order spectrum, also called the bispectrum; the prefix bi refers to the two frequencies involved. The bispectrum is the Fourier transform of the third-order correlation of the data and is expressed as follows (Chua et al. 2008):

$B(f_{1}, f_{2}) = E[X(f_{1}) X(f_{2}) X^{*}(f_{1} + f_{2})],$ (1)

where X denotes the Fourier transform of the signal x and * denotes the complex conjugate. Note that X(f) is the discrete-time Fourier transform, defined for deterministic sampled signals and calculated using the Fast Fourier Transform (FFT) algorithm. E[·] denotes an average over an ensemble of realizations of a random signal (Acharya et al. 2008). For deterministic signals, the relationship holds without the expectation operation, with the third-order correlation being a time average. Moreover, the frequency f is normalized by the Nyquist frequency so that it lies in [0, 1] (Nikias and Raghuveer 1987; Ning and Bronzino 1990).

Equation (1) shows that the bispectrum is a complex-valued function of two frequency variables. The Fourier transform of a real-valued signal exhibits conjugate symmetry, so the power spectrum is redundant in the negative frequency region. The bispectrum, which is the product of three Fourier coefficients, also exhibits symmetry and hence is calculated only in the non-redundant region (Nikias 1993). Assuming that there is no bispectral aliasing, the bispectrum of a real-valued signal is uniquely defined in the triangular region bounded by 0 ≤ f2 ≤ f1 ≤ f1 + f2 ≤ 1. This triangular region, denoted by Ω, is termed the principal domain or the non-redundant region, as illustrated in Fig. 1.

Fig. 1 — Non-redundant region (Ω) for computation of the bispectrum for real signals

To ensure statistical accuracy, a long signal is segmented into K epochs and the bispectrum of each epoch, $b_{j}(f_{1}, f_{2})$, is computed (Muthuswamy and Sharma 1996), where j = 1, 2, …, K is the epoch index. The bispectrum of the whole signal is then calculated as the average of these bispectra:

$BS(f_{1}, f_{2}) = \frac{1}{K} \sum_{j = 1}^{K} b_{j}(f_{1}, f_{2}),$ (2)

Bispectrum can be estimated by various methods. In this paper both direct (FFT-based) (Mendal 1991; Li et al. 2005) and parametric method (Xianda 1995) are used.

In the direct (FFT-based) method, the recorded signal is sampled and segmented into several (possibly overlapping) frames. The mean of each frame is removed and its FFT is calculated. The bispectrum is then computed using the relationship between the cumulant spectrum and the moment spectrum. In addition, the data are smoothed and windowed to reduce the variance of the estimate (SubbaRao and Gabr 1984). Note that the Rao-Gabr window is used in this study (Childers 1978).

In the parametric approach, an autoregressive (AR) model is chosen. When new data are available, the coefficients of the AR model can be easily calculated by solving a set of linear equations, and they can be updated efficiently using the Kalman filter equations (Ning and Bronzino 1990; Sigl and Chamoun 1994). The FFT length is set to 128 for both methods and the overlap between segments is set to zero. For the parametric model, an AR order of ten is used.
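
The direct estimate of Eqs. (1) and (2) can be sketched in a few lines of NumPy. This is a simplified illustration, not the toolbox routine used in the study: the function name `direct_bispectrum` is hypothetical, the frequency-domain smoothing with the Rao-Gabr window is omitted, and the test signal with quadratically phase-coupled components at bins 10, 20 and 30 is made up for demonstration.

```python
import numpy as np


def direct_bispectrum(x, nfft=128):
    """Direct (FFT-based) bispectrum estimate averaged over non-overlapping frames.

    Returns an (nfft//2, nfft//2) complex array B[k1, k2] for the
    non-negative frequency bins k1, k2 (bin k is k/nfft cycles per sample).
    """
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // nfft
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    half = nfft // 2
    k = np.arange(half)
    sum_idx = (k[:, None] + k[None, :]) % nfft       # bin index of f1 + f2
    acc = np.zeros((half, half), dtype=complex)
    for j in range(n_frames):
        frame = x[j * nfft:(j + 1) * nfft]
        X = np.fft.fft(frame - frame.mean())          # remove the mean, then FFT
        acc += X[k][:, None] * X[k][None, :] * np.conj(X[sum_idx])
    return acc / n_frames                             # average over frames, Eq. (2)


# Hypothetical test signal: in every frame the component at bin 30 has the
# phase p1 + p2, i.e. it is quadratically phase-coupled to bins 10 and 20.
rng = np.random.default_rng(0)
n = np.arange(128)
frames = []
for _ in range(64):
    p1, p2 = rng.uniform(0, 2 * np.pi, size=2)
    frames.append(np.cos(2 * np.pi * 10 * n / 128 + p1)
                  + np.cos(2 * np.pi * 20 * n / 128 + p2)
                  + np.cos(2 * np.pi * 30 * n / 128 + p1 + p2))
B = direct_bispectrum(np.concatenate(frames))
```

Because the coupled triple keeps a fixed phase relation across frames, the magnitude at `B[10, 20]` survives the averaging, whereas uncoupled combinations such as `B[10, 10]` have random phases and shrink as more frames are averaged.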

The extracted bispectrum-based features are as follows.

The mean of the bispectral magnitude is defined by

$M_{avg} = \frac{1}{L} \sum_{Ω} |B(f_{1}, f_{2})|,$ (3)

where L denotes the number of points within the region $Ω$.

The maximum of the bispectral magnitude within the region is given by

$Max = \max_{Ω} |B(f_{1}, f_{2})|,$ (4)

The minimum of the bispectral magnitude within the region is given by

$Min = \min_{Ω} |B(f_{1}, f_{2})|,$ (5)

The sum of the logarithmic amplitudes of the bispectrum is calculated as (Chua et al. 2009):

$H = \sum_{Ω} \log(|B(f_{1}, f_{2})|),$ (6)

The bispectral phase entropy ($P_{h}$) is given by (Acharya et al. 2008):

$P_{h} = \sum_{n} p(ψ_{n}) \log p(ψ_{n}),$ (7)

where

$p(ψ_{n}) = \frac{1}{L} \sum_{Ω} I(ϕ(B(f_{1}, f_{2})) \in ψ_{n}),$ (8)

$ψ_{n} = \{ϕ \mid -π + \frac{2πn}{N} \leq ϕ < -π + \frac{2π(n + 1)}{N}\}, \quad n = 0, 1, \dots, N - 1,$ (9)

where $ϕ$ and I(·) denote the phase angle of the bispectrum and the indicator function, respectively (Mookiah et al. 2012). Note that I(·) has a value of 1 when the phase angle is within the bin $ψ_{n}$ defined by Eq. (9).

The bispectrum entropies are calculated as follows (Mookiah et al. 2012):

$P_{1} = - \sum_{k} p_{k} \log p_{k},$ (10)

where

$p_{k} = \frac{|B(f_{1}, f_{2})|}{\sum_{Ω} |B(f_{1}, f_{2})|},$ (11)

$P_{2} = - \sum_{i} q_{i} \log q_{i},$ (12)

where

$q_{i} = \frac{|B(f_{1}, f_{2})|^{2}}{\sum_{Ω} |B(f_{1}, f_{2})|^{2}},$ (13)

$P_{3} = - \sum_{n} r_{n} \log r_{n},$ (14)

where

$r_{n} = \frac{|B(f_{1}, f_{2})|^{3}}{\sum_{Ω} |B(f_{1}, f_{2})|^{3}},$ (15)
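
As a rough illustration of Eqs. (3)–(15), the sketch below computes the eight features from a bispectrum array with NumPy. Several details are assumptions rather than facts from the paper: the function name `bispectral_features`, the discrete approximation of Ω as 0 ≤ k2 ≤ k1 with k1 + k2 below the array size, the choice of 16 phase bins, and the small constant added inside the logarithms.

```python
import numpy as np


def bispectral_features(B, n_phase_bins=16):
    """Compute the eight features of Eqs. (3)-(15) from a bispectrum array B.

    B[k1, k2] is assumed to hold the bispectrum on non-negative frequency
    bins; Omega is approximated as 0 <= k2 <= k1 and k1 + k2 < B.shape[0].
    """
    n = B.shape[0]
    k1, k2 = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    omega = (k2 <= k1) & (k1 + k2 < n)
    vals = B[omega]
    mag = np.abs(vals)
    eps = 1e-12  # guards log(0); an implementation choice, not from the paper

    def entropy(weights):
        p = weights / weights.sum()                  # Eqs. (11), (13), (15)
        return -np.sum(p * np.log(p + eps))          # Eqs. (10), (12), (14)

    # Histogram of phase angles over [-pi, pi], Eqs. (8) and (9).
    hist, _ = np.histogram(np.angle(vals), bins=n_phase_bins, range=(-np.pi, np.pi))
    p_phase = hist / mag.size
    return {
        "M_avg": mag.mean(),                                   # Eq. (3)
        "Max": mag.max(),                                      # Eq. (4)
        "Min": mag.min(),                                      # Eq. (5)
        "H": np.sum(np.log(mag + eps)),                        # Eq. (6)
        "P_h": np.sum(p_phase * np.log(p_phase + eps)),        # Eq. (7), sign as written there
        "P_1": entropy(mag),
        "P_2": entropy(mag ** 2),
        "P_3": entropy(mag ** 3),
    }


# Sanity check on a flat bispectrum: all magnitudes 1, all phases 0.
features = bispectral_features(np.ones((8, 8), dtype=complex))
```

For a flat 8 × 8 bispectrum the region contains 20 points, so the normalized entropies equal log 20 and the mean, maximum and minimum magnitudes are all 1.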

Classification of healthy subjects and AD groups

More recently, classifiers have been used with great success in medical diagnosis (Ren 2012). Classification is a crucial part of a diagnostic system, since its performance directly affects the accuracy of the system. Four well-known classifiers, k-Nearest Neighbor (KNN), Support Vector Machine (SVM), Naïve Bayes (NB) and Decision Tree (DT), are used in this study. The KNN method classifies objects based on the closest training samples in the feature space (Han and Kamber 2006). It is a type of instance-based, or lazy, learning in which the function is only approximated locally and all computation is postponed until classification (Yuvaraj et al. 2014). The Euclidean distance is used as the measure of similarity between a test point and the training points; it is computed as follows:

$D_{E}(a, b) = \sqrt{\sum_{i = 1}^{N} (a_{i} - b_{i})^{2}},$ (16)

where a and b denote the training and testing feature vectors, respectively, and N is the number of features. Values of K from 1 to 10 were tested in this paper.
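
A minimal pure-Python sketch of KNN classification with the Euclidean distance is given below. The function names and the two-dimensional toy feature vectors are hypothetical, and ties between classes are resolved by whichever label `Counter` returns first; the study itself used an existing software implementation.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    # The square root does not change the ranking of the neighbours.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set with two well-separated groups.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0), (1.0, 1.1), (1.1, 0.9), (0.9, 1.0)]
train_y = ["HCS", "HCS", "HCS", "TS", "TS", "TS"]
pred_a = knn_predict(train_x, train_y, (0.15, 0.1))
pred_b = knn_predict(train_x, train_y, (1.0, 1.0))
```

Testing several values of K then amounts to repeating the prediction with `k` set from 1 to 10 and comparing the cross-validated accuracy.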

The SVM has been used extensively for classification and regression because of its good performance in noisy and complex domains (Pai et al. 2011). An SVM finds the separating hyperplane that maximizes the margin between the classes of the n-dimensional input data (n is the number of features used as inputs). Moreover, it can discriminate data that are not linearly separable by using kernel functions to map the data onto a higher-dimensional space in which they become more separable (Muller et al. 2001). The kernel is the key factor that determines the performance of the SVM; the polynomial kernel and the radial basis function (RBF) kernel are the most commonly used (Christianini and Taylor 2000). The SVM classifier with the RBF kernel function is used in this paper.

The DT is a robust classification tool that is commonly built by recursive partitioning (Breiman et al. 1984; Quinlan 1993). It is used as a classification method for complex decision-making structures, and its structure is similar to a flowchart. At each node, the data are divided according to a test; the decision rule determines the next node of the decision process, and the process repeats recursively for each child node. Leaf nodes, i.e., nodes without children, correspond to the final decision yielded by the tree. A univariate (single-attribute) split is selected as the root of the tree using a criterion such as mutual information, gain ratio or the Gini index. In this paper, the C4.5 decision tree algorithm, which uses pruning and the gain-ratio criterion, was implemented (Christianini and Taylor 2000).

To compute the probability of each class, the NB classifier assumes that the attributes are conditionally independent given the class label (Good 1965; Langley et al. 1992). Abstractly, Naïve Bayes is a conditional probability model. Suppose $X = (x_{1}, \dots, x_{N})$ is a problem instance to be classified (N is the number of features or independent variables). This classifier assigns a class label $C_{k}$ to X as follows:

$C_{k} = \arg\max_{k \in \{1, \dots, U\}} P(C_{k} | x_{1}, \dots, x_{N}),$ (17)

where U is the number of possible outcomes or classes. Applying Bayes’ theorem together with the conditional independence assumption, Eq. (17) can be simplified to:

$C_{k} = \arg\max_{k \in \{1, \dots, U\}} p(C_{k}) \prod_{i = 1}^{N} p(x_{i} | C_{k}),$ (18)
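
The decision rule of Eq. (18) can be sketched as follows. The sketch assumes Gaussian class-conditional densities for each feature, which the paper does not specify, and it works with log-probabilities to avoid numerical underflow; the function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior p(C_k) and the per-feature mean and variance of each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small floor keeps the variance strictly positive (implementation choice).
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_nb(model, x):
    """Return the class maximizing log p(C_k) + sum_i log p(x_i | C_k)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
                      for xi, m, var in zip(x, means, variances))
        return math.log(prior) + log_lik
    return max(model, key=log_posterior)


# Hypothetical two-feature data for two classes.
X = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (3.0, 0.5), (3.2, 0.4), (2.8, 0.6)]
y = ["A", "A", "A", "B", "B", "B"]
nb_model = fit_gaussian_nb(X, y)
label_1 = predict_nb(nb_model, (1.1, 2.0))
label_2 = predict_nb(nb_model, (3.0, 0.5))
```

Taking logarithms turns the product over features in Eq. (18) into a sum, which leaves the arg max unchanged.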

The classifiers are implemented by using the Weka software.

Experimental results and discussion

Figure 2a, b shows the spontaneous speech signals and spectrograms of a control subject and an Alzheimer’s patient (TS), respectively. Because of the loss of language skills, the AD patient has difficulty in speaking, understanding and relating to the natural environment, and his signal shows a marked poverty (more and longer pauses or silent sections) during spontaneous speech.

Fig. 2 — Signal and spectrogram of a healthy control subject (a) and an Alzheimer’s patient (TS) (b) during spontaneous speech

The purpose of this study was to achieve automatic classification of healthy subjects and patients at three AD levels. To analyze the effect of HOS on the performance of the overall system, all speech signals were preprocessed, and 60 segments were selected for each group from the database described in section “Data collection”. Segmentation of the speech signals was necessary for the HOS quantification analysis to be performed on the spontaneous speech signals. For more information about the segmentation and processing of the speech signals, see our previous works (Nasrolahzadeh et al. 2014, 2016a). Then, as mentioned before, the bispectrum of each speech segment was estimated using both the FFT-based and the parametric methods. For simplicity, the bispectrum estimated by the FFT method is denoted by FFT-B and the one estimated by the parametric method by AR-B.

Figures 3a–d and 4a–d show the FFT-B and the AR-B for four signals from a healthy person and three AD patients from three levels.

Fig. 3 — The FFT-B of spontaneous speech signals. a healthy control subject, b FS subject, c SS subject, and d TS subject

Fig. 4 — The AR-B of spontaneous speech signals. a healthy control subject, b FS subject, c SS subject, and d TS subject

The colors indicate the relative changes in the amplitude of the bispectrum, with blue and red representing the highest decrease and increase, respectively. These figures show that the estimated bispectra were different for the four groups. In this paper, bispectral analysis was performed using the MATLAB toolbox. As shown in Fig. 3a–d, the phase-coupled harmonics were below 0.2 Hz before AD (i.e., for the healthy control subject) and below 0.5 Hz for the three levels of AD. The bispectrum shows sharp peaks at around (0.01, 0.2) Hz and (0.2, 0.01) Hz (and symmetric locations) before AD, and sharp peaks approximately between 0.2 and 0.4 Hz (and symmetric locations) during AD. An AR order of 10 was used to estimate the AR-B. As shown in Fig. 4a, before AD the dominant peak was approximately at (f1, f2) = (0.05, 0.05) Hz. According to Fig. 4b–d, in the three stages of AD the dominant peak was approximately at (f1, f2) = (1.4, 0.01) Hz. The dominant peaks indicating the presence of a second harmonic were at about 0.05 Hz before AD and 0.1 Hz during AD. As can be seen in Fig. 4a–d, the phase-coupled harmonics were below 0.2 Hz before AD and between 0.1 and 0.5 Hz for the three levels of AD. In addition, during AD the coupling tended toward higher frequencies than before AD. These results suggest that AD differentially affects the characteristic components of spontaneous speech signals at its various stages.

Next, eight features were extracted from each of the FFT-B and AR-B. One-way analysis of variance (ANOVA) was run on the data to evaluate the ability of the features to discriminate between the groups, and the features with the best discrimination power were determined at the significance level p = 0.001. Table 2 presents the 16 features and their corresponding p values. It shows that there are statistically significant differences among the four groups for the $P_{1}$, $P_{h}$, H and $M_{avg}$ FFT-B features and the $P_{3}$, Max, H and $M_{avg}$ AR-B features.

Table 2.

Features and corresponding p values for FFT-B and AR-B

Features	p value (FFT-B)	p value (AR-B)
$M_{avg}$	2.5464e−005	5.1423e−007
$Max$	0.0056	4.2157e−005
$Min$	0.0027	0.0025
$H$	5.2142e−006	6.5478e−007
$P_{1}$	3.8951e−004	0.0065
$P_{2}$	0.0018	0.0011
$P_{3}$	0.0034	5.0458e−005
$P_{h}$	8.0654e−004	0.0087
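
For readers unfamiliar with the test behind these p values, the sketch below computes the one-way ANOVA F statistic for one feature across several groups. The function name and the toy numbers are hypothetical; converting F into a p value additionally requires the F distribution with the given degrees of freedom, which is typically taken from a statistics library and is omitted here.

```python
def one_way_anova_f(groups):
    """Return the F statistic and the (between, within) degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)


# Hypothetical values of one feature in three groups.
f_stat, dof = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F means that the group means differ by much more than the spread within each group, which corresponds to a small p value.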


The classification reliability and performance of the classifiers are evaluated by ten-fold cross-validation in this paper (Refaeilzadeh et al. 2009). In addition, to further investigate the robustness of the proposed system, it is evaluated using the sensitivity, specificity and total classification accuracy measures, which are defined as follows:

$\text{Sensitivity} = \frac{TP}{TP + FN},$ (19)

$\text{Specificity} = \frac{TN}{FP + TN},$ (20)

$\text{Total classification accuracy} = \frac{TP + TN}{TP + FN + FP + TN},$ (21)

where TP is true positive, TN is true negative, FP is false positive and FN is false negative.

A true positive decision occurs when a positive detection by the classifier coincides with a positive detection by the physician, whereas a true negative decision occurs when both the classifier and the physician suggest the absence of a positive detection. A false positive decision occurs when the classifier incorrectly labels a negative case as positive, and a false negative decision occurs when the classifier incorrectly labels a positive case as negative.
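
For a four-class problem these measures can be computed per class in a one-vs-rest fashion. The sketch below does so for the KNN confusion matrix with AR-B features reported in Table 5(a); the function name is hypothetical, and treating the total classification accuracy as the mean of the per-class accuracies is an interpretation that appears to reproduce the value in Table 7(a), not a procedure stated in the text.

```python
def one_vs_rest_metrics(confusion, labels):
    """Per-class sensitivity, specificity and accuracy (in %) from a confusion matrix.

    confusion[i][j] counts samples of actual class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    results = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                    # rest of the row
        fp = sum(row[i] for row in confusion) - tp     # rest of the column
        tn = total - tp - fn - fp
        results[label] = {
            "sensitivity": 100 * tp / (tp + fn),
            "specificity": 100 * tn / (fp + tn),
            "accuracy": 100 * (tp + tn) / total,
        }
    return results


labels = ["HCS", "FS", "SS", "TS"]
knn_ar_b = [[56, 3, 1, 0], [3, 57, 0, 0], [0, 3, 56, 1], [0, 0, 0, 60]]  # Table 5(a)
metrics = one_vs_rest_metrics(knn_ar_b, labels)
mean_accuracy = sum(m["accuracy"] for m in metrics.values()) / len(labels)
```

With these counts the HCS sensitivity and specificity round to 93.33% and 98.33%, and the mean per-class accuracy rounds to 97.71%, in line with Table 7(a).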

Moreover, automatic feature selection is conducted using the information gain method in the WEKA software with ten-fold cross-validation. The information gains (mean ± standard deviation over the 10 folds) of the features are shown in Table 3. As shown in Table 3, the features ranked from the highest to the lowest impact are Max, $P_{1}$, $M_{avg}$, $P_{2}$, $P_{3}$, H, $P_{h}$ and Min for the AR-B features, and H, $P_{1}$, $P_{2}$, $P_{h}$, Max, $M_{avg}$, $P_{3}$ and Min for the FFT-B features. Consequently, subsets of the 8, 7, 6, 5, 4 and 3 features with the highest information gain were selected from each bispectrum and evaluated. For the FFT-B features, the use of 6 features led to the best results; for the AR-B features, 5 features did.

Table 3.

Information gain for selected features

Features	Mean ± std of weight (AR-B features)	Mean ± std of weight (FFT-B features)
$M_{avg}$	0.745 ± 0.035	0.164 ± 0.097
Max	0.945 ± 0.048	0.421 ± 0.076
Min	0.098 ± 0.053	0.034 ± 0.262
H	0.187 ± 0.065	0.854 ± 0.045
$P_{1}$	0.879 ± 0.052	0.721 ± 0.071
$P_{2}$	0.575 ± 0.277	0.564 ± 0.024
$P_{3}$	0.542 ± 0.056	0.084 ± 0.045
$P_{h}$	0.133 ± 0.203	0.552 ± 0.141


The confusion matrices of the four classifiers with the selected FFT-B and AR-B based features are shown in Tables 4(a–d) and 5(a–d). As can be seen, the HCS group is most often confused with FS, and the TS group with SS, while the HCS and FS groups are never confused with the TS group.

Table 4.

Confusion matrix of (a) KNN, (b) SVM, (c) DT, (d) NB for FFT-B based features

	Classes	Predicted
	Classes	HCS	FS	SS	TS	Total
(a)
Actual class	HCS	51	6	3	0	60
	FS	6	50	4	0	60
	SS	1	2	55	2	60
	TS	0	0	6	54	60
	Total	58	58	68	56	240
(b)
Actual class	HCS	48	8	4	0	60
	FS	7	48	5	0	60
	SS	3	5	51	1	60
	TS	0	0	5	55	60
	Total	58	61	65	56	240
(c)
Actual class	HCS	53	3	4	0	60
	FS	4	54	2	0	60
	SS	0	5	53	2	60
	TS	0	0	2	58	60
	Total	57	62	61	60	240
(d)
Actual class	HCS	51	5	4	0	60
	FS	4	53	3	0	60
	SS	0	3	55	2	60
	TS	0	0	3	57	60
	Total	55	61	65	59	240


Table 5.

Confusion matrix of (a) KNN, (b) SVM, (c) DT, (d) NB for AR-B based features

	Classes	Predicted
	Classes	HCS	FS	SS	TS	Total
(a)
Actual class	HCS	56	3	1	0	60
	FS	3	57	0	0	60
	SS	0	3	56	1	60
	TS	0	0	0	60	60
	Total	59	63	57	61	240
(b)
Actual class	HCS	53	4	3	0	60
	FS	3	55	2	0	60
	SS	0	2	57	1	60
	TS	0	0	2	58	60
	Total	56	61	64	59	240
(c)
Actual class	HCS	51	4	5	0	60
	FS	5	51	4	0	60
	SS	0	6	52	2	60
	TS	0	0	2	58	60
	Total	56	61	63	60	240
(d)
Actual class	HCS	54	5	1	0	60
	FS	6	52	2	0	60
	SS	2	2	56	0	60
	TS	0	0	2	58	60
	Total	62	59	61	58	240


Tables 6(a–d) and 7(a–d) present the classification results obtained using the FFT-B and AR-B based features and the four classifiers. These results were calculated from the corresponding confusion matrices using Eqs. (19)–(21). As can be seen, the AR-B based features are superior to the FFT-B based features in the classification of the four groups. The best accuracy with the AR-B based features was obtained by the KNN (97.71%), and the best accuracy with the FFT-B based features was obtained by the DT (95.42%).

Table 6.

The values of statistical parameters for (a) KNN, (b) SVM, (c) DT, (d) NB and FFT-B based features

	Statistical parameters (%)
	Sensitivity	Specificity	Total classification accuracy
(a)
HCS	85	96.10	93.75
FS	83.33	95.55
SS	91.67	92.78
TS	90	98.89
(b)
HCS	80	94.44	92.08
FS	80	92.78
SS	85	92.22
TS	91.67	99.44
(c)
HCS	88.33	97.78	95.42
FS	90	95.55
SS	88.33	95.55
TS	96.67	98.89
(d)
HCS	85	97.78	95
FS	88.33	95.55
SS	91.67	94.44
TS	95	98.89


Table 7.

The values of statistical parameters for (a) KNN, (b) SVM, (c) DT, (d) NB and AR-B based features

	Statistical parameters (%)
	Sensitivity	Specificity	Total classification accuracy
(a)
HCS	93.33	98.33	97.71
FS	95	96.67
SS	93.33	99.44
TS	100	99.44
(b)
HCS	88.33	98.33	96.46
FS	91.67	96.67
SS	95	96.11
TS	96.67	99.44
(c)
HCS	85	97.22	94.18
FS	85	94.44
SS	86.67	93.89
TS	96.67	96.67
(d)
HCS	90	95.55	95.94
FS	86.67	96.11
SS	93.33	97.22
TS	96.67	98.89


All methods showed good performance in detecting the TS stage. In most tables, the best sensitivities and specificities are those of the TS stage, while the worst sensitivities are often those of the FS stage and the worst specificities are often those of the SS stage. The best total classification accuracy (97.71%) was obtained by the KNN with the AR-B based features, and the worst (92.08%) by the SVM with the FFT-B based features. The KNN classifier with the AR-B based features also detected the TS stage with a sensitivity of 100%.

Table 7(a) shows that the AR-B based features were able to detect the early stage of AD (FS) with a high sensitivity of 95%. Therefore, the proposed method can be a good scheme for the early diagnosis of AD, which remains a significant and challenging problem.

To the best of our knowledge, the present study is the first in which the bispectral structure of spontaneous speech data is used for the early diagnosis of AD. HOS has attracted considerable attention as a powerful feature extractor because it is robust to noise when processing speech signals (Nikias and Raghuveer 1987). HOS can also be used to extract additional features related to frequency interactions through coupling, which may be useful for speech processing. Therefore, HOS was used to extract features from the speech signals of the FS, SS, TS, and HCS groups and to build feature vectors for diagnosing and classifying the HCS subjects and the three groups of Alzheimer’s patients (i.e., FS, SS, and TS).
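How the bispectrum exposes frequency coupling can be shown with a small synthetic sketch. It is not the authors' feature-extraction pipeline: it is a plain direct (FFT-type) estimate with no windowing, applied to artificial three-tone segments, and all names (`dft`, `bispectrum_direct`, `make_segments`), the segment length, and the chosen frequency bins are assumptions made for illustration. It uses only the standard library so that it runs anywhere; an FFT routine would normally replace the slow `dft` helper. When the third tone's phase equals the sum of the other two (quadratic phase coupling), the triple product adds up coherently across segments; when the phase is independent, it averages towards zero.

```python
import cmath
import math
import random


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def bispectrum_direct(segments):
    """Average X(k1) * X(k2) * conj(X(k1 + k2)) over segments, for k1, k2 < N/2."""
    n = len(segments[0])
    half = n // 2
    acc = [[0j] * half for _ in range(half)]
    for seg in segments:
        X = dft(seg)
        for k1 in range(half):
            for k2 in range(half):
                acc[k1][k2] += X[k1] * X[k2] * X[(k1 + k2) % n].conjugate()
    return [[v / len(segments) for v in row] for row in acc]


def make_segments(coupled, n_seg=100, n=32, k1=3, k2=5, seed=0):
    """Synthetic segments with tones at bins k1, k2 and k1 + k2.
    If coupled, the third phase is the sum of the first two."""
    rng = random.Random(seed)
    segments = []
    for _ in range(n_seg):
        p1, p2, p3 = (rng.uniform(0, 2 * math.pi) for _ in range(3))
        if coupled:
            p3 = p1 + p2
        segments.append([
            math.cos(2 * math.pi * k1 * t / n + p1)
            + math.cos(2 * math.pi * k2 * t / n + p2)
            + math.cos(2 * math.pi * (k1 + k2) * t / n + p3)
            for t in range(n)
        ])
    return segments


B_coupled = bispectrum_direct(make_segments(coupled=True))
B_uncoupled = bispectrum_direct(make_segments(coupled=False))

# Location of the largest bispectrum magnitude in the coupled case.
peak = max((abs(v), (k1, k2))
           for k1, row in enumerate(B_coupled)
           for k2, v in enumerate(row))[1]
print("peak bins:", peak)
print("magnitude at (3, 5), coupled vs uncoupled:",
      abs(B_coupled[3][5]), abs(B_uncoupled[3][5]))
```

The peak falls on the bin pair of the two coupled tones, and its magnitude is much larger than in the uncoupled case, which is the kind of structure that bispectrum-based features summarize.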

Additionally, the performance of the proposed method was assessed using four classifiers, namely KNN, SVM, NB, and DT, to examine the robustness of the HOS-based features in classifying each of the three levels of AD patients and healthy controls. To this end, sensitivity, specificity, and total classification accuracy were the three parameters used to evaluate the classification results.

Because the purpose of this study was the early diagnosis of AD, the analysis focused on the classification results between the three different stages of Alzheimer’s and HCS. In discriminating the four groups, the proposed method obtained a classification accuracy of 97.71%. As to the discrimination of FS, the high sensitivity of 95% (Table 7(a)) shows that our method is capable of early diagnosis of AD.
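One way to see how a figure such as 97.71% can arise from the confusion matrix in Table 5(a) is sketched below. It assumes that total classification accuracy is the mean of the K one-vs-rest accuracies; this is an assumption made here for illustration, and the symbols K (number of classes), N (number of samples), E (number of misclassified samples), FP_c, and FN_c are introduced only for this sketch.

```latex
\begin{align*}
\mathrm{Acc}_c &= \frac{N - FP_c - FN_c}{N},
  \qquad c \in \{\mathrm{HCS}, \mathrm{FS}, \mathrm{SS}, \mathrm{TS}\},\\
\sum_{c} FP_c &= \sum_{c} FN_c = E
  \quad \text{(each misclassified sample is one false negative and one false positive)},\\
\overline{\mathrm{Acc}} &= \frac{1}{K}\sum_{c} \mathrm{Acc}_c = 1 - \frac{2E}{KN},\\
\text{Table 5(a):}\quad K &= 4,\; N = 240,\; E = 240 - (56 + 57 + 56 + 60) = 11
  \;\Rightarrow\; \overline{\mathrm{Acc}} = 1 - \frac{22}{960} \approx 0.9771 .
\end{align*}
```

Under the same assumption, the plain fraction of correctly classified samples in Table 5(a) is 229/240 ≈ 95.42%, so the two notions of accuracy should not be compared directly.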

A brief comparison between our proposed method and other reported systems in terms of diagnostic and classification results is presented in Table 8. Although our classification accuracy is not higher than that of the best competing methods (López de Ipiña et al. 2013a, b; König et al. 2015), the present method uses fewer features, which is its advantage. Moreover, the present method might be useful for the analysis and automatic diagnosis of other central nervous system pathologies that affect language skills, such as Parkinson’s disease (Nasrolahzadeh et al. 2016a).

Table 8.

Comparison of our proposed method with other systems

Reference	Feature extraction method/Number of features	Number of classes	Database	Accuracy (%)
Our proposed method	HOS-based features/5	4	Private	97.71
Our previous method (Nasrolahzadeh et al. 2014)	Acoustic and voice quality features/more than 100 features	4	Private	97.96
López de Ipiña et al. (2013)	Acoustic, voice quality, and duration features + fractal dimension/more than 100 features	4	Private	97.7
König et al. (2015)	First vocal markers/6	3	Private	81


As mentioned before, AD affects language skills, and these changes are revealed in spontaneous speech. This may be due to brain damage causing the typical symptoms of frontotemporal dementia, including changes in personality and behavior and difficulties with language (Nasrolahzadeh et al. 2016a; Rohrer 2012). Overall, our results showed that the HOS features are able to assess and quantify such differences in speech. However, some limitations are worth mentioning. First, the small number of AD subjects can reduce the dependability of the system. Second, all the subjects were native speakers of Persian; similar studies with subjects speaking other languages seem warranted.

Conclusion

In this paper, a new technique based on HOS was introduced for extracting discriminative information from human speech signals for the early diagnosis of AD and for classification between three stages of AD and healthy controls. The results indicated that spontaneous speech signals can reliably distinguish healthy subjects from subjects at three different stages of Alzheimer’s. The design of the data acquisition and data selection procedure was presented in detail. Since the speech signal is non-stationary, non-Gaussian, and non-linear in nature, non-linear features such as the bispectrum were used to classify the four classes. The performance of the extracted features was analyzed using four classifiers: KNN, SVM, DT, and NB. The proposed system, using the KNN classifier with AR-B based features, discriminated the HCS, FS, SS, and TS classes with an accuracy of 97.71%. The results suggest that HOS features are suitable tools for diagnosing the earliest stage of AD with high accuracy. Hence, this method can serve as an inexpensive, readily usable test for the pre-clinical evaluation of AD.

Contributor Information

Mahda Nasrolahzadeh, Email: ms.nasrolahzadeh@yahoo.com.

Zeynab Mohammadpoory, Email: z.mohammadpoory@gmail.com.

Javad Haddadnia, Email: Haddadnia@hsu.ac.ir.

References

Acharya UR, Chua CK, Ng EY, Yu W, Chee C. Application of higher order spectra for the identification of diabetes retinopathy stages. J Med Syst. 2008;32:481–488. doi: 10.1007/s10916-008-9154-8.
Azarbarzin A, Moussavi Z (2011) Nonlinear properties of snoring sounds. In: 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 4316–4319
Banbrook M, McLaughlin S (1994) Is speech chaotic? Invariant geometrical measures for speech data. In: IEEE colloquium on exploiting chaos in signals processing, 8/1–8/10, 1994
Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. Belmont: Wadsworth International Group; 1984.
Buiza C. Evaluación y tratamiento de los trastornos del lenguaje. Donostia: Matia Fundazioa; 2010.
Childers DG, editor. Modern spectrum analysis. New York: IEEE Press; 1978.
Christianini N, Taylor J. Support vector machines and other kernel-based learning methods. Cambridge: Cambridge University Press; 2000.
Chua KC, Chandran V, Acharya UR, Lim CM. Cardiac state diagnosis using higher order spectra of heart rate variability. J Med Eng Technol. 2008;32:145–155. doi: 10.1080/03091900601050862.
Chua CK, Chandran V, Acharya RU, Min LC. Cardiac health diagnosis using higher order spectra and support vector machine. Open Med Inform J. 2009;3:1–8. doi: 10.2174/1874431100903010001.
Chua KC, Chandran V, Acharya UR, Lim CM. Application of higher order statistics/spectra in biomedical signals—a review. Med Eng Phys. 2010;32:679–689. doi: 10.1016/j.medengphy.2010.04.009.
Dauwels J, Vialatte F, Cichocki A. Diagnosis of Alzheimer’s disease from EEG signals: where are we standing? Curr Alzheimer Res. 2010;7:487–505. doi: 10.2174/156720510792231720.
Dogan MC, Mendel JM (1992) Real time robust pitch detector. In: International conference on acoustics, speech and signal processing, San Francisco, USA, pp I129–I132
Gdoura IJ, Louzou P, Spanias A (1993) Speech processing using higher order statistics. In: Proceeding of the IEEE international symposium on circuits and systems, Chicago, pp 160–163
Good IJ. The estimation of probabilities: an essay on modern Bayesian methods. Cambridge: M.I.T. Press; 1965.
Han J, Kamber M. Data mining: concepts and techniques. 2. San Francisco: Morgan Kaufmann; 2006.
Hu WT, McMillan C, Libon D, Leight S, Forman M, Lee VMY, Trojanowski JQ, Grossman M. Multimodal predictors for Alzheimer’s disease in non fluent primary progressive aphasia. Neurology. 2010;75:595–602. doi: 10.1212/WNL.0b013e3181ed9c52.
Indrebo KM, Povinelli RJ, Johnson MT (2004) A comparison of reconstructed phase spaces and cepstral coefficients for multi-band phoneme classification. In: Proceedings 7th international conference on signal processing, 2004. Proceedings. ICSP ‘04
Jack CR, Jr, Holtzman DM. Biomarker modeling of Alzheimer’s disease. Neuron. 2013;80:1347–1358. doi: 10.1016/j.neuron.2013.12.003.
Jia P, Dai J, Pan Y, Zhu M. Novel algorithm for attribute reduction based on mutual-information gain ratio. J Zhejiang Univ (Eng Ed) 2005;40:1041–1044.
Kippenhan JS, Barker WW, Nagel J, Grady C, Duara R. Neural-network classification of normal and Alzheimer’s disease subjects using high-resolution and low-resolution PET cameras. J Nucl Med. 1994;35:7–15.
König A, Satt A, Sorin A, Hoory R, Toledo-Ronen O, Derreumaux A, Manera V, Verhey F, Aalten P, Robert PH, David R. Automatic speech analysis for the assessment of patients with predementia and Alzheimer’s disease. Alzheimer’s Dement Diagn Assess Disease Monit. 2015;1:112–124. doi: 10.1016/j.dadm.2014.11.012.
Kumar A, Mullick SK. Attractor dimension, entropy and modeling of speech time series. Electron Lett. 1990;26:1790–1791. doi: 10.1049/el:19901147.
Langley P, Iba W, Thompson K (1992) An analysis of Bayesian classifiers. In: Proceedings of the tenth national conference on artificial intelligence. AAAI Press and MIT Press, pp 223–228
Li Z, Wu Z, He Y, Fulei C. Hidden Markov model-based fault diagnostics method in speed-up and speed-down process for rotating machinery. Mech Syst Signal Process. 2005;19:329–339. doi: 10.1016/j.ymssp.2004.01.001.
López de Ipiña K et al (2013a) Automatic analysis of emotional response based on non-linear speech modeling oriented to Alzheimer disease diagnosis. In: IEEE 17th international conference on intelligent engineering systems, 19–21 June 2013
López de Ipiña K, et al. Feature extraction approach based on fractal dimension for spontaneous speech modelling oriented to Alzheimer disease diagnosis. Adv Nonlinear Speech Process. 2013;7911:144–151. doi: 10.1007/978-3-642-38847-7_19.
López de Ipiña K, Solé-Casals J, Eguiraun H, Alonso JB, Travieso CM, Ezeiza A, Barroso N, Ecay-Torres M, Martinez-Lage P, Beitia B. Feature selection for spontaneous speech analysis to aid in Alzheimer’s disease diagnosis: a fractal dimension approach. Comput Speech Lang. 2015;30:43–60. doi: 10.1016/j.csl.2014.08.002.
Martinez F, Garcia J, Perez E, Carro J, Anara JM. Patrones de Prosodia expresiva en pacientes con enfermedad de Alzheimer. Psicothema. 2012;24:16–21.
Martis RJ, Acharya UR, Mandana KM, Ray AK, Chakraborty C. Cardiac decision making using higher order spectra. Biomed Signal Process Control. 2013;8:193–203. doi: 10.1016/j.bspc.2012.08.004.
McKeith IG, Galasko D, Kosaka K, Perry EK, Dickson DW, Hansen LA, Salmon DP, Lowe J, Mirra SS, Byrne EJ, Lennox G, Quinn NP, Edwardson JA, Ince PG, Bergeron C, Burns A, Miller BL, Lovestone S, Collerton D, Jansen EN, Ballard C, de Vos RA, Wilcock GK, Jellinger KA, Perry RH. Consensus guideline for the clinical and pathological diagnosis of dementia with lewy bodies (DLB): report of the consortium on DLB international workshop. Neurology. 1996;47:1113–1124. doi: 10.1212/WNL.47.5.1113.
McKhann GM, et al. The diagnosis of dementia due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimer’s Dement. 2011;7:263–269. doi: 10.1016/j.jalz.2011.03.005.
Mendel JM. Tutorial on higher-order statistics (spectra) in signal processing and system theory: theoretical results and some application. Proc IEEE. 1991;79:278–305. doi: 10.1109/5.75086.
Mohammadpoory Z, Haddadnia J. Speech enhancement using Laplacian mixture model under signal presence uncertainty. IJE Trans C Asp. 2014;27:1367–1376.
Mookiah MRK, Acharya UR, Lim CM, Petznick A, Suri JS. Data mining technique for automated diagnosis of glaucoma using higher order spectra and wavelet energy features. Knowl-Based Syst. 2012;33:73–82. doi: 10.1016/j.knosys.2012.02.010.
Moreno A, Fonollosa JAR (1992) Pitch determination of noisy speech using HOS. In: International conference on acoustics, speech and signal processing, San Francisco, USA, pp 133–136
Moreno A, Fonollosa JAR, Vidal J (1993) Vocoder design based on HOS. In: Eurospeech ‘93, Berlin, Germany, pp 519–522
Muller KR, Mika S, Ratsch G, Tsuda K, Scholkopf B. An introduction to kernel based learning algorithms. IEEE Trans Neural Netw. 2001;12:181–201. doi: 10.1109/72.914517.
Muthuswamy J, Sharma A. A study of electroencephalographic descriptors and end-tidal concentration in estimating depth of anesthesia. J Clin Monit. 1996;12:353–364. doi: 10.1007/BF02077633.
Nasrolahzadeh M, Haddadnia J. Poincaré plots of Spontaneous Speech Signals during Alzheimer’s disease. Mitteilungen Saechsischer Entomologen. 2016;119:358–365.
Nasrolahzadeh M, Mohhamadpoori Z, Haddadnia J. Optimal way to find the frame length of the speech signal for diagnosis of Alzheimer’s disease with PSO. Asian J Math Comput Res. 2014;2:33–41.
Nasrolahzadeh M, Mohhamadpoori Z, Haddadnia J. Adaptive neuro-fuzzy inference system for classification of speech signals in Alzheimer’s disease using acoustic and non-linear characteristics. Asian J Math Comput Res. 2015;3:122–131.
Nasrolahzadeh M, Mohhamadpoori Z, Haddadnia J. Alzheimer’s disease diagnosis using spontaneous speech signals and hybrid features. Asian J Math Comput Res. 2015;7:322–331.
Nasrolahzadeh M, Mohhamadpoori Z, Haddadnia J. Analysis of mean square error surface and its corresponding contour plots of spontaneous speech signals in Alzheimer’s disease with adaptive wiener filter. Comput Hum Behav. 2016;61:364–371. doi: 10.1016/j.chb.2016.03.031.
Nasrolahzadeh M, Mohhamadpoory Z, Haddadnia J. A novel method for early diagnosis of Alzheimer’s disease based on higher-order spectral estimation of spontaneous speech signals. Cognit Neurodyn. 2016;10:495–503. doi: 10.1007/s11571-016-9406-0.
Nikias CL, Petropulu A. Higher-order spectral analysis: a nonlinear signal processing framework. Englewood Cliffs: Prentice Hall; 1993.
Nikias CL, Raghuveer MR. Bispectrum estimation: a digital signal processing framework. Proc IEEE. 1987;75:869–891. doi: 10.1109/PROC.1987.13824.
Ning T, Bronzino JD. Autoregressive and Bispectral analysis techniques: EEG applications. IEEE Eng Med Biol. 1990;9:47–50. doi: 10.1109/51.62905.
Ortiz A, Górriz JM, Ramírez J, Martínez-Murcia FJ. LVQ-SVM Based CAD tool applied to structural MRI for the diagnosis of the Alzheimer’s disease. Pattern Recognit Lett. 2013;34:1725–1733. doi: 10.1016/j.patrec.2013.04.014.
Pai PF, Hsu MF, Wang MC. A support vector machine based model for detecting top management fraud. Knowl-Based Syst. 2011;24:314–321. doi: 10.1016/j.knosys.2010.10.003.
Paliwal KK, Sondhi MM (1991) Recognition of noisy speech using cumulant based linear prediction analysis. In: International conference on acoustics, speech and signal processing, Toronto, Canada, pp 429–432
Quinlan JR. C4.5: programs for machine learning. Los Altos: Morgan Kaufmann Publishers, Inc; 1993.
Rangoussi M, Delopoulos A, Tsatsanis M (1993a) On the use of higher-order statistics for robust endpoint detection of speech. In: IEEE signal processing workshop on higher-order statistics, Lake Tahoe, California, USA. IEEE, pp 56–60
Rangoussi M, Bakamidis S, Carayannis G (1993b) Robust endpoint detection of speech in the presence of noise. In: Eurospeech ‘93, Berlin, Germany. ESCA, pp 649–652
Refaeilzadeh P, Tang L, Liu H (2009) Cross-validation. In: Encyclopedia of database systems, pp 532–538
Ren J. ANN vs. Svm: which one performs better in classification of Mccs in mammogram imaging. Knowl-Based Syst. 2012;26:144–153. doi: 10.1016/j.knosys.2011.07.016.
Reynolds A. Alzheimer disease: focus on computed tomography. Radiol Technol. 2013;2085:187CT–211CT.
Rohrer JD. Structural brain imaging in frontotemporal dementia. Biochim Biophys Acta. 2012;1822:325–332. doi: 10.1016/j.bbadis.2011.07.014.
Salas-Gonzalez D, Gorriz JM, Ramírez J, Lopez M, Alvarez I, Segovia F, Chaves R, Puntonet CG. Computer-aided diagnosis of Alzheimer’s disease using support vector machines and classification trees. Phys Med Biol. 2010;55:2807–2817. doi: 10.1088/0031-9155/55/10/002.
Salavedra JM, Masgrau E, Moreno A, Jove X (1993a) A speech enhancement system using higher order AR estimation in real environments. In: Eurospeech ‘93, Berlin, Germany, pp 223–226
Salavedra JM, Masgrau E, Moreno A, Jove X (1993b) Comparison of different order cumulants in a speech enhancement system by adaptive wiener filtering. In: IEEE signal processing workshop on higher-order statistics, Lake Tahoe, California, USA. IEEE, pp 61–65
Sigl JC, Chamoun NG. An introduction to bispectral analysis for the electroencephalogram. J Clin Monit. 1994;10:392–404. doi: 10.1007/BF01618421.
Silipo R, Deco G, Vergassola R, Bartsch H. Dynamics extraction in multivariate biomedical time series. Biol Cybern. 1998;79:15–27. doi: 10.1007/s004220050454.
SubbaRao T, Gabr MM. An introduction to bispectral analysis and bilinear time series models (Lecture notes in statistics). New York: Springer; 1984.
Teodorescu HN, Grigorasand F, Apppei V. Nonlinear and non stationary processes in speech production. Int J Chaos Theor Appl. 1996;5:1453–1457.
Todder D, Avissar S, Schreiber G. Non-linear dynamic analysis of inter-word time intervals in psychotic speech. IEEE J Transl Eng Health Med. 2013;1:2200107. doi: 10.1109/JTEHM.2013.2268850.
Xianda Z. Modern signal processing. Beijing: Tsinghua University Press; 1995. pp. 373–433.
Yuvaraj R, Murugappan M, Ibrahim NM, Sundaraj K, Omar MI, Mohamad K, Palaniappan R. Detection of emotions in Parkinson’s disease using higher order spectral features from brain’s electrical activity. Biomed Signal Process Control. 2014;14:108–116. doi: 10.1016/j.bspc.2014.07.005.
Zhang Y, Dong Z, Phillips P, Wang S, Ji G, Yang J, Yuan TF. Detection of subjects and brain regions related to Alzheimer’s disease using 3D MRI scans based on eigenbrain and machine learning. Front Comput Neurosci. 2015;9:66. doi: 10.3389/fncom.2015.00066.
Zhang Y, Wang S, Phillips P, Dong Z, Ji G, Yang J. Detection of Alzheimer’s disease and mild cognitive impairment based on structural volumetric MR images using 3D-DWT and WTA-KSVM trained by PSOTVAC. Biomed Signal Process Control. 2015;21:58–73. doi: 10.1016/j.bspc.2015.05.014.
Zhang Z, Zheng H, Liang K, Wang H, Kong S, Hu J, Wu F, Sun G. Functional degeneration in dorsal and ventral attention systems in amnestic mild cognitive impairment and Alzheimer’s disease: an fMRI study. Neurosci Lett. 2015;585:160–165. doi: 10.1016/j.neulet.2014.11.050.

[CR1] Acharya UR, Chua CK, Ng EY, Yu W, Chee C. Application of higher order spectra for the identification of diabetes retinopathy stages. Journal of medical system. 2008;32:481–488. doi: 10.1007/s10916-008-9154-8. [DOI] [PubMed] [Google Scholar]

[CR2] Azarbarzin A, Moussavi Z (2011) Nonlinear properties of snoring sounds. In: 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 4316–4319

[CR3] Banbrook M, Mclughlin S (1994) Is speech chaotic? Invariant geometrical measures for speech data. In: IEEE colloquium on exploiting chaos in signals processing, 8/1–8/10, 1994

[CR4] Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. Belmont: Wadsworth International Group; 1984. [Google Scholar]

[CR5] Buiza C. Evaluación y tratamiento de los trastornosdellenguaje. Donostia: MatiaFundazioa; 2010. [Google Scholar]

[CR6] Childers DG, editor. Modern spectrum analysis. New York: IEEE Press; 1978. [Google Scholar]

[CR7] Christianini N, Taylor J. Support vector machines and other Kernal-based learning methods. Cambridge: Cambridge University Press; 2000. [Google Scholar]

[CR8] Chua KC, Chandran V, Acharya UR, Lim CM. Cardiac state diagnosis using higher order spectra of heart rate variability. J Med Eng Technol. 2008;32:145–155. doi: 10.1080/03091900601050862. [DOI] [PubMed] [Google Scholar]

[CR9] Chua CK, Chandran V, Acharya RU, Min LC. Cardiac health diagnosis using higher order spectra and support vector machine. Open Med Inform J. 2009;3:1–8. doi: 10.2174/1874431100903010001. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR10] Chua KC, Chandran V, Acharya UR, Lim CM. Application of higher order statistics/spectra in biomedical signals—a review. Med Eng Phys. 2010;32:679–689. doi: 10.1016/j.medengphy.2010.04.009. [DOI] [PubMed] [Google Scholar]

[CR11] Dauwels J, Vialatte F, Cichocki A. Diagnosis of Alzheimer’s disease from EEG signals: where are we standing? Curr Alzheimer Res. 2010;7:487–505. doi: 10.2174/156720510792231720. [DOI] [PubMed] [Google Scholar]

[CR12] Dogan MC, Mendel JM (1992) Real time robust pitch detector. In International conference on acoustics, speech and signal processing, San Francisco, USA, pp I129–I132

[CR13] Gdoura IJ, Louzou P, Spanias A (1993) Speech processing using higher order statistics. In: Proceeding of the IEEE international symposium on circuits and systems, Chicago, pp 160–163

[CR14] Good IJ. The estimation of probabilities: an essay on modern Bayesian methods. Cambridge: M.I.T. Press; 1965. [Google Scholar]

[CR15] Han J, Kamber M. Data mining: concepts and techniques. 2. San Francisco: Morgan Kaufmann; 2006. [Google Scholar]

[CR16] Hu WT, McMillan C, Libon D, Leight S, Forman M, Lee VMY, Trojanowski JQ, Grossman M. Multimodal predictors for Alzheimer’s disease in non fluent primary progressive aphasia. Neurology. 2010;75:595–602. doi: 10.1212/WNL.0b013e3181ed9c52. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] Indrebo KM, Povinelli RJ, Johnson MT (2004) A comparison of reconstructed phase spaces and cepstral coefficients for multi-band phoneme classification. In: Proceedings 7th international conference on signal processing, 2004. Proceedings. ICSP ‘04

[CR18] Jack CR, Jr, Holtzman DM. Biomarker modeling of Alzheimer’s disease. Neuron. 2013;80:1347–1358. doi: 10.1016/j.neuron.2013.12.003. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR19] Jia P, Dai J, Pan Y, Zhu M. Novel algorithm for attribute reduction based on mutual-information gain ratio. J Zhejiang Univ (Eng Ed) 2005;40:1041–1044. [Google Scholar]

[CR20] Kippenhan JS, Barker WW, Nagel J, Grady C, Duara R. Neural-network classification of normal and Alzheimer’s disease subjects using high-resolution and low-resolution PET cameras. J Nucl Med. 1994;35:7–15. [PubMed] [Google Scholar]

[CR21] König A, Satt A, Sorin A, Hoory R, Toledo-Ronen O, Derreumaux A, Manera V, Verhey F, Aalten P, Robert PH, David R. Automatic speech analysis for the assessment of patients with predementia and Alzheimer’s disease. Alzheimer’s Dement Diagn Assess Disease Monit. 2015;1:112–124. doi: 10.1016/j.dadm.2014.11.012. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR22] Kumar A, Mullick SK. Attractor dimension, entropy and modeling of speech time series. Electron Lett. 1990;26:1790–1791. doi: 10.1049/el:19901147. [DOI] [Google Scholar]

[CR23] Langley P, Iba W, Thompson K (1992) An analysis of Bayesian classifiers. In Proceedings of the tenth national conference on artificial intelligence. AAAI Press and MIT Press, pp 223–228

[CR24] Li Z, Wu Z, He Y, Fulei C. Hidden Markov model-based fault diagnostics method in speed-up and speed-down process for rotating machinery. Mech Syst Signal Process. 2005;19:329–339. doi: 10.1016/j.ymssp.2004.01.001. [DOI] [Google Scholar]

[CR25] López de Ipiña K et al (2013a) Automatic analysis of emotional response based on non-linear speech modeling oriented to Alzheimer disease diagnosis. In: IEEE 17th international conference on intelligent engineering systems, 19–21 June 2013

[CR26] López de Ipiña K, et al. Feature extraction approach based on fractal dimension for spontaneous speech modelling oriented to Alzheimer disease diagnosis. Adv Nonlinear Speech Process. 2013;7911:144–151. doi: 10.1007/978-3-642-38847-7_19. [DOI] [Google Scholar]

[CR27] López de Ipiña K, Solé-Casals J, Eguiraun H, Alonso JB, Travieso CM, Ezeiza A, Barroso N, Ecay-Torres M, Martinez-Lage P, Beitia B. Feature selection for spontaneous speech analysis to aid in Alzheimer’s disease diagnosis: a fractal dimension approach. Comput Speech Lang. 2015;30:43–60. doi: 10.1016/j.csl.2014.08.002. [DOI] [Google Scholar]

[CR28] Martinez F, Garcia J, Perez E, Carro J, Anara JM. Patrones de Prosodiaexpresiva en pacientes con enfermedadde Alzheimer. Psicothema. 2012;24:16–21. [PubMed] [Google Scholar]

[CR29] Martis RJ, Acharya UR, Mandana KM, Ray AK, Chakraborty C. Cardiac decision making using higher order spectra. Biomed Signal Process Control. 2013;8:193–203. doi: 10.1016/j.bspc.2012.08.004. [DOI] [Google Scholar]

[CR30] McKeith IG, Galasko D, Kosaka K, Perry EK, Dickson DW, Hansen LA, Salmon DP, Lowe J, Mirra SS, Byrne EJ, Lennox G, Quinn NP, Edwardson JA, Ince PG, Bergeron C, Burns A, Miller BL, Lovestone S, Collerton D, Jansen EN, Ballard C, de Vos RA, Wilcock GK, Jellinger KA, Perry RH. Consensus guideline for the clinical and pathological diagnosis of dementia with lewy bodies (DLB): report of the consortium on DLB international workshop. Neurology. 1996;47:1113–1124. doi: 10.1212/WNL.47.5.1113. [DOI] [PubMed] [Google Scholar]

[CR31] McKhann GM, et al. The diagnosis of dementia due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimer’s Dement. 2011;7:263–269. doi: 10.1016/j.jalz.2011.03.005. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR32] Mendal JM. Tutorial on higher-order statistics (spectra) in signal processing and system theory: theoretical results and some application. Proc IEEE. 1991;79:278–305. doi: 10.1109/5.75086. [DOI] [Google Scholar]

[CR33] Mohammadpoory Z, Haddadnia J. Speech enhancement using Laplacian mixture model under signal presence uncertainty. IJE Trans C Asp. 2014;27:1367–1376. [Google Scholar]

[CR34] Mookiah MRK, Acharya UR, Lim CM, Petznick A, Suri JS. Data mining technique for automated diagnosis of glaucoma using higher order spectra and wavelet energy features. Knowl-Based Syst. 2012;33:73–82. doi: 10.1016/j.knosys.2012.02.010. [DOI] [Google Scholar]

[CR35] Moreno A, Fonollosa JAR (1992) Pitch determination of noisy speech using HOS. In: International conference on acoustics, speech and signal processing, San Francisco, USA, pp 133–136

[CR36] Moreno A, Fonollosa JAR, Vidal J (1993) Vocoder design based on HOS. In: Eurospeech ‘93, Berlin, Germany, pp 519–522

[CR37] Muller KR, Mika S, Ratsch G, Tsuda K, Scholkopf B. An introduction to Kernal based learning algorithms. IEEE Trans Neural Netw. 2001;12:181–201. doi: 10.1109/72.914517. [DOI] [PubMed] [Google Scholar]

[CR38] Muthuswamy J, Sharma A. A study of electroencephalographic descriptors and end-tidal concentration in estimating depth of anesthesia. J Clin Monit. 1996;12:353–364. doi: 10.1007/BF02077633. [DOI] [PubMed] [Google Scholar]

[CR39] Nasrolahzadeh M, Haddadnia J. Poincaré plots of Spontaneous Speech Signals during Alzheimer’s disease. Mitteilungen Saechsischer Entomologen. 2016;119:358–365. [Google Scholar]

[CR40] Nasrolahzadeh M, Mohhamadpoori Z, Haddadnia J. Optimal way to find the frame length of the speech signal for diagnosis of Alzheimer’s disease with PSO. Asian J Math Comput Res. 2014;2:33–41. [Google Scholar]

[CR41] Nasrolahzadeh M, Mohhamadpoori Z, Haddadnia J. Adaptive neuro-fuzzy inference system for classification of speech signals in Alzheimer’s disease using acoustic and non-linear characteristics. Asian J Math Comput Res. 2015;3:122–131. [Google Scholar]

[CR42] Nasrolahzadeh M, Mohhamadpoori Z, Haddadnia J. Alzheimer’s disease diagnosis using spontaneous speech signals and hybrid features. Asian J Math Comput Res. 2015;7:322–331. [Google Scholar]

[CR43] Nasrolahzadeh M, Mohhamadpoori Z, Haddadnia J. Analysis of mean square error surface and its corresponding contour plots of spontaneous speech signals in Alzheimer’s disease with adaptive wiener filter. Comput Hum Behav. 2016;61:364–371. doi: 10.1016/j.chb.2016.03.031. [DOI] [Google Scholar]

[CR44] Nasrolahzadeh M, Mohhamadpoory Z, Haddadnia J. A novel method for early diagnosis of Alzheimer’s disease based on higher-order spectral estimation of spontaneous speech signals. Cognit Neurodyn. 2016;10:495–503. doi: 10.1007/s11571-016-9406-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR45] Nikias CL, Petropulu A. Higher-order spectral analysis: a nonlinear signal processing framework. Englewood Cliffs: Prentice Hall; 1993. [Google Scholar]

[CR46] Nikias CL, Raghuveer MR. Bispectrum estimation: a digital signal processing framework. Proc IEEE. 1987;75:869–891. doi: 10.1109/PROC.1987.13824. [DOI] [Google Scholar]

[CR47] Ning T, Bronzino JD. Autoregressive and Bispectral analysis techniques: EEG applications. IEEE Eng Med Biol. 1990;9:47–50. doi: 10.1109/51.62905. [DOI] [PubMed] [Google Scholar]

[CR48] Ortiz A, Górriz JM, Ramírez J, Martínez-Murcia FJ. LVQ-SVM Based CAD tool applied to structural MRI for the diagnosis of the Alzheimer’s disease. Pattern Recognit Lett. 2013;34:1725–1733. doi: 10.1016/j.patrec.2013.04.014. [DOI] [Google Scholar]

[CR49] Pai PF, Hsu MF, Wang MC. A support vector machine based model for detecting top management fraud. Knowl-Based Syst. 2011;24:314–321. doi: 10.1016/j.knosys.2010.10.003.

[CR50] Paliwal KK, Sondhi MM (1991) Recognition of noisy speech using cumulant based linear prediction analysis. In: International conference on acoustics, speech and signal processing, Toronto, Canada, pp 429–432

[CR51] Quinlan JR. C4.5: programs for machine learning. Los Altos: Morgan Kaufmann Publishers, Inc; 1993.

[CR52] Rangoussi M, Delopoulos A, Tsatsanis M (1993a) On the use of higher-order statistics for robust endpoint detection of speech. In: IEEE signal processing workshop on higher-order statistics, Lake Tahoe, California, USA. IEEE, pp 56–60

[CR53] Rangoussi M, Bakamidis S, Carayannis G (1993b) Robust endpoint detection of speech in the presence of noise. In: Eurospeech ’93, Berlin, Germany. ESCA, pp 649–652

[CR54] Refaeilzadeh P, Tang L, Liu H (2009) Cross-validation. In: Encyclopedia of database systems, pp 532–538

[CR55] Ren J. ANN vs. SVM: which one performs better in classification of MCCs in mammogram imaging. Knowl-Based Syst. 2012;26:144–153. doi: 10.1016/j.knosys.2011.07.016.

[CR56] Reynolds A. Alzheimer disease: focus on computed tomography. Radiol Technol. 2013;2085:187CT–211CT.

[CR57] Rohrer JD. Structural brain imaging in frontotemporal dementia. Biochim Biophys Acta. 2012;1822:325–332. doi: 10.1016/j.bbadis.2011.07.014.

[CR58] Salas-Gonzalez D, Gorriz JM, Ramírez J, Lopez M, Alvarez I, Segovia F, Chaves R, Puntonet CG. Computer-aided diagnosis of Alzheimer’s disease using support vector machines and classification trees. Phys Med Biol. 2010;55:2807–2817. doi: 10.1088/0031-9155/55/10/002.

[CR59] Salavedra JM, Masgrau E, Moreno A, Jove X (1993a) A speech enhancement system using higher order AR estimation in real environments. In: Eurospeech ’93, Berlin, Germany, pp 223–226

[CR60] Salavedra JM, Masgrau E, Moreno A, Jove X (1993b) Comparison of different order cumulants in a speech enhancement system by adaptive Wiener filtering. In: IEEE signal processing workshop on higher-order statistics, Lake Tahoe, California, USA. IEEE, pp 61–65

[CR61] Sigl JC, Chamoun NG. An introduction to bispectral analysis for the electroencephalogram. J Clin Monit. 1994;10:392–404. doi: 10.1007/BF01618421.

[CR62] Silipo R, Deco G, Vergassola R, Bartsch H. Dynamics extraction in multivariate biomedical time series. Biol Cybern. 1998;79:15–27. doi: 10.1007/s004220050454.

[CR63] Subba Rao T, Gabr MM. An introduction to bispectral analysis and bilinear time series models (Lecture notes in statistics). New York: Springer; 1984.

[CR64] Teodorescu HN, Grigoras F, Apopei V. Nonlinear and non-stationary processes in speech production. Int J Chaos Theor Appl. 1996;5:1453–1457.

[CR65] Todder D, Avissar S, Schreiber G. Non-linear dynamic analysis of inter-word time intervals in psychotic speech. IEEE J Transl Eng Health Med. 2013;1:2200107. doi: 10.1109/JTEHM.2013.2268850.

[CR66] Xianda Z. Modern signal processing. Beijing: Tsinghua University Press; 1995. pp. 373–433.

[CR67] Yuvaraj R, Murugappan M, Ibrahim NM, Sundaraj K, Omar MI, Mohamad K, Palaniappan R. Detection of emotions in Parkinson’s disease using higher order spectral features from brain’s electrical activity. Biomed Signal Process Control. 2014;14:108–116. doi: 10.1016/j.bspc.2014.07.005.

[CR68] Zhang Y, Dong Z, Phillips P, Wang S, Ji G, Yang J, Yuan TF. Detection of subjects and brain regions related to Alzheimer’s disease using 3D MRI scans based on eigenbrain and machine learning. Front Comput Neurosci. 2015;9:66. doi: 10.3389/fncom.2015.00066.

[CR69] Zhang Y, Wang S, Phillips P, Dong Z, Ji G, Yang J. Detection of Alzheimer’s disease and mild cognitive impairment based on structural volumetric MR images using 3D-DWT and WTA-KSVM trained by PSOTVAC. Biomed Signal Process Control. 2015;21:58–73. doi: 10.1016/j.bspc.2015.05.014.

[CR70] Zhang Z, Zheng H, Liang K, Wang H, Kong S, Hu J, Wu F, Sun G. Functional degeneration in dorsal and ventral attention systems in amnestic mild cognitive impairment and Alzheimer’s disease: an fMRI study. Neurosci Lett. 2015;585:160–165. doi: 10.1016/j.neulet.2014.11.050.
