A deep learning approach for diagnosis of schizophrenia disorder via data augmentation based on convolutional neural network and long short-term memory

Amin Mashayekhi Shams; Sepideh Jabbari

doi:10.1007/s13534-024-00360-9

2024 Feb 24;14(4):663–675. doi: 10.1007/s13534-024-00360-9

PMCID: PMC11208387 PMID: 38946814

Abstract

Schizophrenia (SZ) is a severe, chronic mental disorder without specific treatment. Because of the increasing prevalence of SZ and the similarity of its characteristics to those of other mental illnesses such as bipolar disorder, most affected people are not aware of having it in their daily lives. Early detection therefore allows the sufferer to seek treatment or at least control the disease. Previous studies that detect SZ with machine learning methods require features to be extracted and selected before classification. This study develops a novel, end-to-end approach based on a 15-layer convolutional neural network (CNN) and a 16-layer CNN-long short-term memory (CNN-LSTM) network to help psychiatrists automatically diagnose SZ from electroencephalogram (EEG) signals. The deep model uses CNN layers to learn the temporal properties of the signals, while LSTM layers provide the sequence learning mechanism. In addition, a data augmentation method based on generative adversarial networks is applied to the training set to increase the diversity of the data. Results on a large EEG dataset show the high diagnostic potential of both proposed methods, which achieve accuracies of 98% and 99%, respectively. This study shows that the proposed framework can accurately discriminate SZ patients from healthy subjects and is potentially useful for developing diagnostic tools for SZ.

Keywords: Convolutional neural networks, Deep learning, Generative adversarial networks, Long short-term memory, Schizophrenia disorder

Introduction

Schizophrenia (SZ) is a serious, prolonged mental illness [1]. According to the American Psychiatric Association, one percent of the world’s population suffers from it [2, 3]. It usually begins before the age of 25 and lasts until the end of life, and no social class is immune to it [4]. Early symptoms of the disease are associated with behavioral problems such as withdrawal and emotional changes [5]. It interferes with social activities, work, and daily life [6, 7]. This complication persists despite pharmacological treatment of psychological symptoms, and therefore most studies on individual cognition in SZ are performed with the aim of monitoring or diagnosis, including early diagnosis [8]. A typical plan consists of two stages: offline training and online recognition [9]. During the training phase, brain activity is measured and recorded by various methods such as electroencephalography (EEG), electrical impedance tomography (EIT), magnetoencephalography (MEG), and electroneurogram (ENG). In recent years, the analysis of EEG signals in patients with SZ has received much more attention than the other methods [10–13].

Hornero et al. analyzed time series data recorded from 20 schizophrenic patients and 20 healthy individuals using three nonlinear methods: central tendency measurement (CTM), approximate entropy (AE), and Lempel–Ziv complexity (LZC). Due to the nonlinear nature of the EEG signal, these methods were suitable for classification [14]. Sabeti et al. recorded EEG signals from 20 healthy individuals and 20 patients with SZ and selected the channels using a bidirectional search technique and the plus-L-minus-R (RLS) algorithm. A genetic algorithm (GA) was used to extract the best features from the selected channels. Four types of features were extracted in their study: autoregressive (AR) model parameters, band power, fractal dimension, and wavelet energy. Finally, classification was performed by linear discriminant analysis (LDA) and support vector machine (SVM), which obtained 84.62% and 99.38% accuracy with the first channel selection method and 88.23% and 99.54% accuracy with the second channel selection algorithm, respectively [15]. Kim et al. showed that patients with SZ had lower Lyapunov exponent values in the lower frontal and anterior temporal lobes than the control group [16]. Sabeti et al. continued their work by using other features, including Shannon entropy, spectral entropy, approximate entropy, Lempel–Ziv complexity, and Higuchi fractal dimension. Leave-one-out cross-validation was used to estimate the separability of the two groups, and LDA and adaptive boosting classifiers obtained detection accuracies of 86% and 90%, respectively [17]. Kim et al. analyzed the absolute power of five frequency bands of the EEG signal to examine differences between normal and schizophrenic subjects. The analysis performed on the delta frequency band generated the best result, with an overall classification accuracy of 62.2% [18]. Dvey-Aharon et al. used a single-electrode approach to analyze EEG signals in the time–frequency domain using the Stockwell transform. The results indicated a high classification accuracy, between 91.5% and 93.9%, for the five most distinctive electrodes [19]. Santos-Mayo et al. analyzed event-related potentials (ERPs) of participants involved in an unknown hearing task. After recording, the signals were processed, and 16 time-domain and four frequency-domain properties were extracted from each electrode for each participant. Multilayer perceptron and SVM were used for classification, and the results were 93.42% and 92.23%, respectively [20]. Patel et al. analyzed multichannel EEG data for the N-back task using multivariate empirical mode decomposition (MEMD), a method that breaks down the data into a predetermined number of intrinsic mode functions (IMFs). From each IMF, various features were extracted based on statistical parameters, spectral power associated with brain waves, and parameters derived from time-series data. To mitigate overfitting and inadequate generalization, the study employed kernel principal component analysis (kPCA) to reduce the number of features. The reduced, transformed features were then used for training and testing various machine learning models. The KNN model using the kPCA-transformed features achieved the highest average classification accuracy, 97.34% [21].

Although these methods typically achieved promising results, they require background knowledge about SZ characteristics in the feature extraction and selection process. These manually designed strategies suffer from limited reliability, robustness, and generalizability when dealing with different types of datasets. Recent advances in deep learning have been shown to provide a promising alternative to the aforementioned feature-based methods. Deep learning approaches can automatically extract suitable features for a given problem. Recently, some works have reported the effectiveness of deep learning models in the detection of SZ from EEG signals. Shu Lih et al. developed an 11-layer convolutional neural network (CNN) to analyze EEG signals from 14 SZ patients and 14 healthy subjects. Their proposed model generated classification accuracies of 98.07% and 81.26% in the training and test stages, respectively [22]. Phang et al. proposed a multi-domain connectome CNN for the detection of SZ patients [23]. The input to the model was obtained from 2D time–frequency domain connectivity features and 1D intricate network features, and an accuracy of 91.69% was achieved. In another recent work, Shalbaf et al. introduced a combined methodology based on the continuous wavelet transform (CWT), a pre-trained CNN, and an SVM classifier for automatic discrimination of SZ patients from healthy controls [24]. They compensated for the limitations of a small dataset by using transfer learning with pre-trained CNNs. Chandran et al. extracted nonlinear features such as Katz fractal dimension (KFD) and approximate entropy (ApEn), applied them to an LSTM network, and achieved an accuracy of 99% [25]. To detect SZ, Aslan and Akin constructed images using the Hilbert transform [26]. Additionally, Sun et al. separated the theta, alpha, and beta frequency bands from the EEG data and computed fuzzy entropy and fast Fourier transform (FFT) characteristics from them. An accuracy of 99.22% was then attained by creating two-dimensional images based on those features and fusing CNN and LSTM models to distinguish SZ patients from the healthy group [27]. The input of these models is designed by manual extraction of features, which is time-consuming and cannot guarantee robustness across different types of datasets.

Most previous studies have used machine learning methods for feature extraction and classification. A limited number of studies, reviewed above, have used deep learning methods. Feature extraction and fusion strategies are crucial factors in the implementation of their methods. However, finding a proper feature extraction mechanism for different data sources and problems is still challenging. The main contributions of this work are summarized as follows:

We propose an end-to-end system based on deep CNNs for classifying EEG patterns in SZ. Previous studies used complex hand-crafted features as CNN input, along with robust classifiers, for the automatic detection of SZ. In contrast, the presented model employs the CNN to extract useful features from the raw EEG signals, so separate feature extraction, feature selection, and classification stages are not needed.
We investigate, for the first time, the effectiveness of data augmentation, in which virtually generated data are aggregated with actual data, for enhancing the performance of SZ detection. One of the main problems of previous studies and deep learning networks for diagnosing SZ is the lack of sufficient data for proper training of the networks. The proposed method incorporates deep convolutional generative adversarial networks (DCGAN) to increase the amount of data available for training the models.
We develop a novel hybrid architecture called CNN–LSTM that fuses the strengths of both architectures. The proposed model combines a CNN and an LSTM in a series configuration. The CNN learns the local features of the input EEG signals (representation learning), and the LSTM then processes these features sequentially and learns long-term dependencies (sequence learning), leading to improved classification performance.

The structure of the paper is summarized as follows. Section 2 describes the dataset used in the study and deep learning concept. Moreover, the detailed information and methodology of DCGAN for data augmentation are discussed in this section. The experimental results and related discussions are illustrated in Sect. 3. The end part of this paper is the conclusion in Sect. 4.

Materials and methods

Figure 1 shows an overview of the proposed frameworks for classifying SZ and healthy subjects. The first part is pre-processing, which includes wavelet-based denoising, segmentation, and Z-score normalization. The segmented SZ and normal signals are then fed to the DCGAN to generate additional data. The DCGAN is trained to generate high-quality fake data, which are then combined with the real data to increase the size of the dataset. Samples created by the DCGAN provide more examples for the models to learn from, lowering the risk of overfitting and enhancing the models' capacity to handle various signal patterns. The prepared data are then used as inputs to a deep CNN. Convolutional layers extract the local features. The features are then flattened into a one-dimensional vector, which is followed by fully connected layers that classify the data as SZ or normal. In addition, a hybrid model that combines CNN and LSTM is proposed. The combined model further learns higher-level features.

Fig. 1 — Overview of the proposed framework for diagnosis of SZ

Schizophrenia dataset

To evaluate the classification performance of the proposed models, EEG signals from the database of the Laboratory for Neurophysiology and Neuro-Computer Interfaces of Lomonosov Moscow State University were analyzed. The database contains 16-channel EEG records of 45 adolescent patients with SZ and 39 healthy adolescents, and each record is about 16 min. The sampling rate is 128 Hz, the ADC resolution is 24 bit, and the signals are recorded from channels F7, F3, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, and O2 [28]. Before the EEG signals were fed into the proposed models, wavelet denoising was performed to eliminate artifacts. The denoising process comprises three distinct stages: decomposition, thresholding, and reconstruction. Four levels of decomposition using the “db10” wavelet were applied to the EEG data. Then, one-minute (7680 samples) segments were separated from each of the 15 selected channels. Therefore, a total of 15 × 45 segments from schizophrenic patients and 15 × 39 normal segments were employed. Each segment was also Z-score normalized before being fed into the DCGAN to address amplitude scaling and offset effects, which allows faster convergence of the deep learning model during training.
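
The segment arithmetic and the Z-score step described above can be illustrated with a minimal sketch. It uses only the Python standard library, omits the wavelet denoising stage, and runs on a synthetic sine wave that stands in for an EEG segment; the function name `zscore` and the toy signal are illustrative assumptions, not the authors' code.

```python
import math

FS_HZ = 128            # sampling rate stated in the text
SEGMENT_SECONDS = 60   # one-minute segments
N_CHANNELS = 15        # channels selected per subject
N_SZ, N_HEALTHY = 45, 39

def zscore(segment):
    """Return the segment rescaled to zero mean and unit variance."""
    n = len(segment)
    mean = sum(segment) / n
    var = sum((v - mean) ** 2 for v in segment) / n   # population variance
    std = math.sqrt(var)
    if std == 0.0:
        return [0.0] * n   # a constant segment carries no amplitude information
    return [(v - mean) / std for v in segment]

samples_per_segment = FS_HZ * SEGMENT_SECONDS   # 7680 samples
sz_segments = N_CHANNELS * N_SZ                 # 675 segments
healthy_segments = N_CHANNELS * N_HEALTHY       # 585 segments

# Toy stand-in for one EEG segment (hypothetical values, not real data).
toy = [math.sin(0.05 * k) * 40.0 + 12.0 for k in range(samples_per_segment)]
normalized = zscore(toy)
```

After normalization the offset (12.0) and the amplitude scale (40.0) of the toy signal no longer affect the values, which is the effect the text attributes to the Z-score step.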

Deep learning

Today, deep neural networks are used in classification because of their ability to learn new and combined features. The goal of deep learning is the intelligent extraction of features through several stages of learning. Artificial neural networks are at the heart of all common deep learning algorithms. An artificial neural network is made up of a large number of artificial neurons. An artificial neuron has multiple inputs and a single output. Each input is multiplied by a coefficient called a weight, the weighted inputs are summed together with a bias term, and the resulting value passes through a nonlinear operator. The strength of artificial neural networks lies in the fact that, by combining many neurons in layers, the nonlinear relationships between input and output can be modeled, which is typically needed to make accurate predictions [29].
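
As a minimal sketch of the single neuron described above (weighted inputs, a bias, and a nonlinear operator), the following pure-Python function may help. The names `neuron` and `relu` and the numeric values are illustrative assumptions.

```python
import math

def relu(v):
    """Rectified linear unit, one common choice of nonlinearity."""
    return max(0.0, v)

def neuron(inputs, weights, bias, activation=math.tanh):
    """Weighted sum of the inputs plus a bias, passed through a nonlinearity."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(pre_activation)

# Toy values: 0.5*1 - 1*2 + 0.25*3 + 1.0 = 0.25, which ReLU leaves unchanged.
out = neuron([1.0, 2.0, 3.0], [0.5, -1.0, 0.25], bias=1.0, activation=relu)
```

A layer is simply many such neurons sharing the same inputs, and stacking layers is what lets the network model nonlinear input-output relationships.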

Data augmentation

Data augmentation methods significantly increase the diversity of data without actually collecting new data. In recent years, GANs have been used successfully by researchers to enlarge datasets at various levels [30]. A GAN consists of a pair of networks trained in competition with each other. The forger network, which creates fake data, is called the generator. The expert network, called the discriminator, receives both fake and real data and tries to tell them apart. In other words, it receives a sample (real or fake) as input and predicts a binary class label, real or fake. The error signal for the discriminator is provided by the known label of each sample, that is, whether it is actual or generated data. The same error signal is used to train the generator network and leads it to produce better-quality fake data. The training process of the GAN is shown in Fig. 2.

Fig. 2 — Generative adversarial network training process [30]

Deep convolutional generative adversarial network

The DCGAN proposed by Radford et al. is an extension of the original GAN [31]. The difference is that the discriminator and generator networks explicitly use convolutional layers. Because of its successful track record, the DCGAN was used to augment the data in this study. Figure 3a shows the detailed structure of the generator network, which maps the latent space vector to the data space. The generator comprises convolutional-transpose layers, batch normalization layers, and rectified linear unit (ReLU) activations. The batch normalization layers promote proper gradient flow during training. The batch normalization transform is as follows:

$$\hat{y}^{(l - 1)} = \frac{y^{(l - 1)} - \mu_{B}}{\sqrt{\sigma_{B}^{2} + \varepsilon}}$$

$$z^{(l)} = \gamma^{(l)} \hat{y}^{(l - 1)} + \beta^{(l)}$$

where $y^{(l - 1)}$ is the input vector to the normalizing layer (the output of layer $l - 1$), $\mu_{B} = \frac{1}{n} \sum_{i = 1}^{n} y_{i}^{(l - 1)}$ and $\sigma_{B}^{2} = \frac{1}{n} \sum_{i = 1}^{n} (y_{i}^{(l - 1)} - \mu_{B})^{2}$ are the mean and variance of the batch, respectively, $z^{(l)}$ is the normalized output vector for a neuron, and $\varepsilon$ is a small constant for numerical stability. $\gamma^{(l)}$ and $\beta^{(l)}$ are learnable scale and shift parameters, respectively [32].
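
The two batch normalization equations can be followed step by step in a short sketch. It handles a single feature over one mini-batch in pure Python; the function name `batch_norm`, the default `eps`, and the toy batch are illustrative assumptions.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift it."""
    n = len(batch)
    mu = sum(batch) / n                                 # batch mean
    var = sum((y - mu) ** 2 for y in batch) / n         # batch variance
    y_hat = [(y - mu) / math.sqrt(var + eps) for y in batch]
    return [gamma * v + beta for v in y_hat]            # z = gamma * y_hat + beta

# Toy mini-batch of four activations (hypothetical values).
z = batch_norm([2.0, 4.0, 6.0, 8.0], gamma=2.0, beta=0.5)
```

In this sketch the output mean equals `beta` and the spread is set by `gamma`, regardless of the scale and offset of the incoming activations.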

First, a 240 × 1 random noise vector is fed to a dense layer, which maps it to a representation of size 1680 × 1. Second, to generate an image of size 240 × 480 × 1, the output of the dense layer is reshaped to 240 × 7. Third, the reshaped output passes through a series of convolution-transpose layers that up-sample the representation. Fourth, leaky ReLU (LReLU) activation is used for all layers within the network except the output layer, which uses $\tanh$ activation. This prevents the dead-neuron problem caused by the ReLU function. Fifth, batch normalization is used for all layers except the output layer; it normalizes the input to have zero mean and unit variance, which stabilizes the learning procedure [33]. As shown in Fig. 3a, we used five convolution-transpose layers to up-sample the representation of size 1680 × 1 to an image of size 240 × 480.

The detailed structure of the discriminator network, with an image as input and a scalar probability as output, is shown in Fig. 3b. The discriminator network uses pooling and convolutional layers to down-sample, and it uses batch normalization layers and LReLU activations to speed up gradient flow during training. The discriminator network takes images of size 240 × 480 as input, which are a combination of real images from the original dataset and generated images from the generator network. The input image goes through a series of convolution and max pooling layers, followed by a sigmoid activation function that determines whether the image is real or fake. Dropout is also used to prevent overfitting. As recommended by Radford et al., each convolution layer is followed by an LReLU activation function [31].

The goal of GAN is to minimize the gap between the probability distribution of fake and real data. In this study, we used the minimax loss reported by Goodfellow et al., which is defined by the below equation [30]:

$$\min_{G} \max_{D} V_{GAN}(D, G) = E_{x \sim p_{data}(x)}[\log D(x)] + E_{z \sim p_{z}(z)}[\log(1 - D(G(z)))]$$

where $E_{x \sim p_{data} (x)}$ is the expected value over all real samples, $E_{z \sim p_{z} (z)}$ is the expected value over all the fake samples, $p_{z} (z)$ is a prior on input noise variable, $G (z)$ is the generator function that maps to the data space, $x$ describes original data, and $D (x)$ is the probability that $x$ came from the original data rather than generator’s distribution. The generator network endeavors to minimize the loss function and the discriminator network endeavors to maximize the same loss function. So, the GAN’s learning method is a minimax competition between the discriminator and the generator.

The goal of $G$ is to estimate the distribution $p_{data}$ from which the training data came, so that it can create fake samples from the estimated distribution $p_{g}$ [31]. The discriminator $D$ tries to maximize the probability of correctly classifying fake and real samples. Conversely, the generator $G$ tries to minimize $\log(1 - D(G(z)))$. Theoretically, the solution of this minimax competition is reached when $p_{g} = p_{data}$. The task of the discriminator network is to distinguish real images from fake images, which is a binary classification problem. We used binary cross entropy (BCE) as the loss function, which is given by the equation below:

$$J_{BCE}(\theta) = - \frac{1}{M} \sum_{m = 1}^{M} [y_{m} \times \log(h_{\theta}(x_{m})) + (1 - y_{m}) \times \log(1 - h_{\theta}(x_{m}))]$$

where $M$ is the number of training samples in a mini-batch, $y_{m}$ is the target label for training sample $m$ (the real label is 1 and the fake label is 0), $x_{m}$ is the input for training sample $m$ , and $h_{θ}$ is the model with neural network weights θ. The equation calculates the mean cost of all the samples in the entire batch.
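
The link between the BCE loss and the minimax value function can be made concrete with a small sketch. It evaluates the BCE on one batch of real samples (label 1) and one batch of generated samples (label 0), then compares the sum with a batch estimate of the value function. The discriminator outputs below are made-up numbers for illustration, and the clamping constant `eps` is an added assumption to keep the logarithm finite.

```python
import math

def bce(labels, probs, eps=1e-12):
    """Mean binary cross-entropy over a mini-batch."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)   # clamp to keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)

# Hypothetical discriminator outputs (illustrative numbers only).
d_real = [0.9, 0.8, 0.95]   # D(x) on real samples, label 1
d_fake = [0.2, 0.1, 0.3]    # D(G(z)) on generated samples, label 0

loss_real = bce([1] * len(d_real), d_real)   # -mean log D(x)
loss_fake = bce([0] * len(d_fake), d_fake)   # -mean log(1 - D(G(z)))
d_loss = loss_real + loss_fake

# Batch estimate of the value function V(D, G) on the same samples.
v_estimate = (sum(math.log(p) for p in d_real) / len(d_real)
              + sum(math.log(1.0 - p) for p in d_fake) / len(d_fake))
```

On these batches `d_loss` equals `-v_estimate`, so minimizing the summed BCE for the discriminator is the same as maximizing the batch estimate of the value function.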

DCGAN training contains two steps: (1) training the discriminator network and (2) training the generator network. First, the discriminator is trained with a batch of real samples to compute $\log D(x)$. Second, the generator creates a batch of fake samples, and the discriminator is then trained with this batch of fake examples to compute $\log(1 - D(G(z)))$. We trained the DCGAN for 450 epochs, after which it was able to generate fake samples that resembled real data. During GAN training, the losses of the discriminator and generator networks were computed. Figure 4 shows the loss function during generator and discriminator training. Applying the DCGAN increases the number of segments to 2730 for healthy subjects and to 3150 for SZ patients.

Fig. 4 — Generator and discriminator networks loss

According to Table 1, 70% of the data was chosen for training the model and 30% for validating and testing the model. In the next section, the CNN algorithm and the details of the proposed structure are described.

Table 1. Data segmentation for training, validation, and testing

Data type	With schizophrenia	Normal
Training	2205	1911
Validation	815	749
Test	130	70
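
The counts in Table 1 can be cross-checked against the totals stated in the text with a few lines of arithmetic. The sketch below only re-adds the published numbers; the dictionary names are illustrative.

```python
# Segment counts before augmentation (15 channels per subject, from the text).
real = {"SZ": 15 * 45, "Normal": 15 * 39}
# Totals after DCGAN augmentation, as stated in the text.
augmented = {"SZ": 3150, "Normal": 2730}
# Rows of Table 1.
table1 = {
    "SZ": {"train": 2205, "val": 815, "test": 130},
    "Normal": {"train": 1911, "val": 749, "test": 70},
}

summary = {}
for label, split in table1.items():
    total = sum(split.values())
    summary[label] = {
        "total": total,
        "matches_text": total == augmented[label],
        "shares": {name: count / total for name, count in split.items()},
        "growth": augmented[label] / real[label],
    }

for label, info in summary.items():
    shares = ", ".join(f"{k}={v:.1%}" for k, v in info["shares"].items())
    print(f"{label}: total={info['total']} ({shares}), "
          f"x{info['growth']:.2f} the real segment count")
```

On these numbers the training share is exactly 70% for both classes, while the remaining 30% is divided unevenly between validation and test.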

Convolutional neural network and representation learning

A CNN consists of one or more convolutional layers followed by one or more fully connected (FC) layers, as in a standard neural network. The CNN architecture has two distinctive characteristics: local connectivity and shared weights. A CNN uses local connections between neurons in adjacent layers to take advantage of local correlation. Some neuron connections are replicated across the entire layer with the same weights and biases. In machine learning problems, a CNN tends to generalize better because of this particular architecture of local connections and shared weights [34].

In a CNN, feature extraction is performed by the convolutional layers. Convolutional layers extract local features efficiently because they restrict the receptive fields of the hidden layers to be local. The weights of the convolutional layers used for feature extraction, like those of the fully connected layers used for classification, are determined during training. The error function of the network is minimized by comparing the actual network outputs with the desired outputs and adjusting the network's free parameters, such as weights and biases, accordingly. The proposed network is trained by supervised learning. Figure 5a illustrates the proposed network architecture. The different layers in the model are summarized as follows:

Convolutional layer: The value of a neuron $v_{ij}^{x}$ at position $x$ of the $j$th feature map in the $i$th layer is given by the equation below:
$$v_{ij}^{x} = g\left(b_{ij} + \sum_{m} \sum_{p = 0}^{P_{i} - 1} \omega_{ijm}^{p} v_{(i - 1) m}^{x + p}\right)$$
where $m$ indexes the feature maps in the previous layer (the $(i - 1)$th layer) connected to the current feature map, and $\omega_{ijm}^{p}$ is the weight of position $p$ connected to the $m$th feature map. $P_{i}$ is the width of the kernel toward the spectral dimension, $b_{ij}$ is the bias of the $j$th feature map in the $i$th layer, and $g$ is the activation function applied to the linear combination of inputs.
Dropout layer: Dropout is a strategy used to prevent a model from overfitting by randomly setting input units to zero at a given rate at each update of the training phase. The dropout ratio was set to 0.5, meaning that each neuron is dropped with probability 0.5 during training.
Fully connected layer: The final FC layers come after the convolutional layers. These layers behave like their counterparts in conventional artificial neural networks and account for approximately 90% of the CNN parameters. The FC layer allows the network result to be presented as a vector of a defined size. This vector can be used to further process or categorize images. Different activation functions are used in this regard, such as sigmoid, tanh, ReLU, and LReLU [34].
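
The convolutional-layer equation above can be traced in a minimal pure-Python sketch. It computes only the positions where the kernel fits entirely inside the input (no padding, stride 1), which is an assumption, since the text does not state the padding or stride. The function name `conv1d_layer` and the toy values are illustrative.

```python
def relu(v):
    """Rectified linear unit, used here as the activation g."""
    return max(0.0, v)

def conv1d_layer(prev_maps, weights, biases, g=relu):
    """Forward pass of one 1D convolutional layer (no padding, stride 1).

    prev_maps: list of M input feature maps, each a list of length L
    weights:   weights[j][m][p] for output map j, input map m, kernel offset p
    biases:    biases[j] for output map j
    """
    length = len(prev_maps[0])
    out = []
    for j, kernels in enumerate(weights):
        width = len(kernels[0])                     # P_i, the kernel width
        feature_map = []
        for x in range(length - width + 1):
            s = biases[j]
            for m, kernel in enumerate(kernels):    # sum over input maps m
                for p in range(width):              # sum over kernel offsets p
                    s += kernel[p] * prev_maps[m][x + p]
            feature_map.append(g(s))
        out.append(feature_map)
    return out

signal = [[1.0, 2.0, 4.0, 7.0, 11.0]]   # one input map (toy values)
diff_kernel = [[[-1.0, 1.0]]]           # one output map, one input map, width 2
maps = conv1d_layer(signal, diff_kernel, biases=[0.0])
```

The same kernel weights are reused at every position `x`, which is the weight sharing described earlier; here the kernel `[-1, 1]` responds to local increases in the signal.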

Fig. 5 — Proposed (a) CNN and (b) CNN-LSTM structures

LSTM and sequence learning

The EEG signal can be considered as a time series of brain activity signals. Deep learning algorithms can be used to learn sequential signals. Sequential learning is used to mimic short-term and long-term memory. Although typical recurrent neural networks (RNN) are quite good at modeling short-term memory, they are ineffective in long-term dependency due to vanishing gradient issues [35]. Hochreiter and Schmidhuber proposed LSTM to satisfy this need [36]. Unlike the RNN architecture, there are unique hidden units called memory cells in the LSTM design that are used to recall the previous input for a long period. The forget, learn, remember, and use gates of an LSTM architecture assess whether an input is important enough to be saved. Four distinct functions are employed in the LSTM unit, including sigmoid ( $σ$ ), hyperbolic tangent (tanh), multiplication ( $\times$ ), and sum ( $+$ ), making it easy to update the weights during the back propagation process. The progressive learning strategy over EEG time series is characterized in more formal terms as follows:

$$\begin{aligned}
f_{t} &= \sigma(W_{f} \cdot [h_{t - 1}, x_{t}] + b_{f}), \\
i_{t} &= \sigma(W_{i} \cdot [h_{t - 1}, x_{t}] + b_{i}), \\
\tilde{C}_{t} &= \tanh(W_{C} \cdot [h_{t - 1}, x_{t}] + b_{C}), \\
C_{t} &= f_{t} \times C_{t - 1} + i_{t} \times \tilde{C}_{t}, \\
O_{t} &= \sigma(W_{O} \cdot [h_{t - 1}, x_{t}] + b_{O}), \\
h_{t} &= O_{t} \times \tanh(C_{t})
\end{aligned}$$

Assume that there are $N$ local features $\{x_{1}, x_{2}, \dots, x_{N}\}$ retrieved from our CNN model. $x_{t}$ is the input feature at time $t$. $h_{t - 1}$ and $C_{t - 1}$ denote the short-term and long-term memory values, respectively. $W_{n}$ and $b_{n}$ (with $n \in \{f, i, C, O\}$) are the weight matrices and biases, and $i_{t}$ and $f_{t}$ are the ignore factor and the forget gate, respectively. $i_{t} \times \tilde{C}_{t}$ is the learn gate's output, $f_{t} \times C_{t - 1}$ is the forget gate's output, $C_{t}$ is the remember gate's output, $\tilde{C}_{t}$ is the candidate cell state vector, and $O_{t}$ is the output gate.
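
A single time step of the LSTM equations can be written out as a minimal pure-Python sketch. The gate names follow the subscripts in the equations; the helper names (`affine`, `gate`, `lstm_step`), the tiny layer sizes, and the all-zero weights are illustrative assumptions chosen so that the result is easy to check by hand.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, v, b):
    """Return W . v + b for a matrix W (list of rows) and vectors v, b."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def gate(name, hx, params, fn):
    """Apply one gate: an affine map of [h_{t-1}, x_t] followed by fn."""
    W, b = params[name]
    return [fn(v) for v in affine(W, hx, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps a gate name to its (W, b) pair."""
    hx = h_prev + x_t                                   # concatenation [h_{t-1}, x_t]
    f = gate("f", hx, params, sigmoid)                  # forget gate
    i = gate("i", hx, params, sigmoid)                  # ignore factor
    c_tilde = gate("C", hx, params, math.tanh)          # candidate cell state
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]
    o = gate("O", hx, params, sigmoid)                  # output gate
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c

# Toy setup: hidden size 2, input size 1, all weights and biases zero.
hidden, n_in = 2, 1
zeros = ([[0.0] * (hidden + n_in) for _ in range(hidden)], [0.0] * hidden)
params = {name: zeros for name in ("f", "i", "C", "O")}
h, c = lstm_step([3.0], [0.0, 0.0], [1.0, -2.0], params)
```

With zero weights every sigmoid gate equals 0.5 and the candidate state is 0, so the step simply halves the previous cell state. Trained weights would instead let the gates decide, per feature, how much of the long-term memory to keep.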

In this study, we used two LSTM layers after the CNN layers to learn the sequence. The input of the first LSTM layer is 234 × 128, and the layer produces a 128 × 1 output. A reshape layer converts this output to 32 × 4, which is applied as input to the second LSTM layer and produces an output of 256 × 1. The activation function of the LSTM layers is also the rectified linear unit (ReLU). A further reshape layer converts the 256 × 1 vector to 64 × 4. A flatten layer is placed between the feature extraction layers and the output classification stage; it converts the 64 × 4 input that enters the classification stage into a 256 × 1 vector. The classification is performed by eight fully connected layers. The activation function of all of these layers is linear, except for the last layer, which uses the sigmoid function. The structure of the CNN-LSTM network is shown in Fig. 5b.

Performance evaluation

Various criteria such as accuracy, sensitivity, specificity, negative predictive value, and positive predictive value are used to evaluate the efficiency of a classification system. These parameters allow users to see how well a model works when it comes to data analysis. In this work, five standard performance metrics composed of accuracy, sensitivity, specificity, positive predictive value and negative predictive value, are obtained as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{Positive predictive value} = \frac{TP}{TP + FP}$$

$$\text{Negative predictive value} = \frac{TN}{TN + FN}$$

where true positives (TP) indicates the number of patients correctly classified, true negatives (TN) indicates the number of normal signals correctly classified, false positives (FP) indicates the number of normal signals mistaken as patient signals, and false negative (FN) indicates the number of patient signals mistaken as normal signals.

To visualize the performance of the method, a confusion matrix is used. Each row of the matrix represents the instances in an actual class, while each column represents the instances in a predicted class.
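
The five metrics can be read directly off a confusion matrix laid out as described, with rows for actual classes and columns for predicted classes. The sketch below uses hypothetical counts chosen for easy arithmetic; they are not the results of this study, and the function name `classification_metrics` is illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return the five metrics defined above from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical counts for illustration only; they are not the paper's results.
# Rows are actual classes, columns are predicted classes.
confusion = [[90, 10],    # actual SZ:     TP, FN
             [20, 80]]    # actual normal: FP, TN
(tp, fn), (fp, tn) = confusion
metrics = classification_metrics(tp, tn, fp, fn)
```

Sensitivity and specificity are computed along the rows (per actual class), whereas the two predictive values are computed along the columns (per predicted class), which is why they can differ when the classes are imbalanced.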

Results and discussion

In this study, a total of 130 SZ segments and 70 normal segments, chosen at random using the GAN, were fed into the proposed models as test data. All experiments were carried out in Python 3.7 on a computer with an Intel(R) Core(TM) i7-8550U CPU and 8 GB of RAM.

Figure 6 shows the result of training the 15-layer CNN with the Adam optimizer for 20 epochs on the training and validation data with a learning rate of 0.00001. Figure 6 also shows the CNN's training and validation results when the data were not augmented with the DCGAN. Figure 7 shows the result of training the 16-layer CNN–LSTM with the Adam optimizer for 150 epochs on the training and validation data with a learning rate of 0.000001. Figure 7 also shows the CNN–LSTM network's training and validation results over 200 epochs when the data were not augmented with the DCGAN. Comparing the results shows that augmenting the data with the DCGAN has a significant effect on the training of the convolutional network. Aggregating the real data with the fake data yielded a notable increase in classification accuracy for the model trained on the augmented clinical data compared with the model trained only on real data instances.

Fig. 6 — Performance of CNN on training and validation data with and without DCGAN

Fig. 7 — Performance of CNN-LSTM on training and validation data with and without DCGAN

The confusion matrices for the test data obtained with the two proposed models are shown in Fig. 8. It can be seen that 98% and 99% of the SZ segments were correctly classified as SZ by the proposed CNN and CNN-LSTM models, respectively. Also, 97% and 98% of the normal segments were correctly classified as normal, respectively. The overall accuracies of the CNN and CNN-LSTM models were 98% and 99%, respectively.

In Table 2, the results of this study are compared with related studies that used EEG signals. Phang et al. developed a deep CNN framework based on a parallel ensemble of 1D and 2D CNNs to integrate features from various domains and dimensions using different fusion strategies. They obtained a high classification accuracy of 91.69% with decision-level fusion. Feature extraction and fusion strategies are crucial factors in the implementation of their methods. However, finding a proper feature extraction mechanism for different data sources and problems is still challenging. Shalbaf et al. used transfer learning to detect SZ and reached a high accuracy of 98.60%. The main advantage of their model is that it saves network training time. However, transfer learning models may sometimes not work well with small datasets. Chandran et al. used LSTM to detect SZ and reached an accuracy of 98.96%. LSTM is suitable for sequential learning but has difficulty learning local features. Aslan et al. used a complex process based on CWT and spectrum computation to convert EEG segments into 2D images and then used a CNN for classification. In contrast, our proposed models diagnose SZ with CNN and CNN-LSTM architectures using the raw EEG signals. Their end-to-end mechanism integrates the feature extraction and classification stages into the model and is well suited to automatically learning features from all kinds of high-dimensional datasets. Also, to our knowledge, this is the first work that aggregates high-quality fake data with real data to address unmet clinical needs in the development of robust SZ classification frameworks. Our results highlight the positive impact of aggregating real data with fake data generated by the DCGAN on developing an SZ classification model using just a classic CNN.

Table 2.

Comparison of the classification performance of the proposed models with related studies on the same database

Authors	Accuracy (%)	Precision (%)	Sensitivity (%)	Specificity (%)
Phang et al. [23]	91.69	–	–	–
Shalbaf et al. [24]	98.60	–	99.65	96.92
Chandran et al. [25]	99	99.2	–	–
Aslan et al. [26]	98	–	–	–
Proposed model (CNN)	98	98	98	97
Proposed model (CNN-LSTM)	99	99	99	98


Our framework is, in principle, applicable to other neuropsychiatric disorders besides SZ for which only small datasets are available. The CNN–LSTM model in particular achieved excellent results in classifying SZ signals. The framework is potentially useful for developing robust computer-aided diagnostic tools for clinical settings.

Despite the promising results of this study, deep learning-based methods have some limitations. Diagnosing SZ with deep learning models is challenging because of the limited number of cases in the available datasets; we addressed this lack of data by using the DCGAN method. The primary purpose of this study was to facilitate the diagnosis of SZ. To assess the reproducibility, generalizability, and interpretability of the results, it is crucial to train the models on large sets of clinical data. Further studies could also explore applying the model to assess the severity of the disorder or to discriminate between SZ, schizoaffective disorder, and psychotic bipolar disorder, whose differential diagnosis is more difficult because of overlapping clinical symptoms. Another limitation is that deep learning models require substantial computational resources for training and operation. This requirement is a considerable constraint for researchers, like us, with limited access to high-powered hardware. Some coding environments, such as Google Colab, provide access to GPUs, but they have drawbacks such as resource, runtime, and data storage limits, internet dependency, and privacy and data security concerns.

Nonetheless, the results show that deep learning has had an important positive impact on brain disorder research. As the amount of available data keeps increasing, deep learning will be a key component in improving the understanding of brain disorders in general.

Conclusion

This research uses a deep learning approach to develop a binary classification framework for SZ detection. We examined the effectiveness of data augmentation using DCGAN, in which real clinical research data are supplemented with high-quality synthetic data to improve classification performance. We designed a 15-layer CNN with 11 convolutional layers, one flattening layer, one dropout layer, and two fully connected (FC) layers, with a sigmoid activation function to distinguish the two classes. We also used a 16-layer CNN–LSTM with three convolutional layers, two LSTM layers, one flattening layer, two reshape layers, and eight FC layers, also with a sigmoid activation function. By comparing the results, we showed that the performance of the CNN and CNN–LSTM improved significantly with DCGAN. Overall, our results support the value of data augmentation in SZ detection, which yielded a significant increase in the classification accuracy, sensitivity, and specificity of the proposed models.
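The layer composition stated above can be checked with a short tally. This sketch only encodes the layer counts given in the text; the dictionary keys are illustrative names, and hyperparameters such as filter sizes or unit counts are not stated in this section and are therefore not modeled.

```python
from typing import Dict

# Layer counts as stated in the conclusion; key names are illustrative.
CNN_LAYERS: Dict[str, int] = {
    "convolutional": 11,
    "flatten": 1,
    "dropout": 1,
    "fully_connected": 2,
}

CNN_LSTM_LAYERS: Dict[str, int] = {
    "convolutional": 3,
    "lstm": 2,
    "flatten": 1,
    "reshape": 2,
    "fully_connected": 8,
}


def total_layers(layers: Dict[str, int]) -> int:
    """Sum the per-type layer counts of one architecture."""
    return sum(layers.values())


print("CNN:", total_layers(CNN_LAYERS))
print("CNN-LSTM:", total_layers(CNN_LSTM_LAYERS))
```

The totals agree with the 15-layer and 16-layer descriptions of the two models.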

Author contributions

AM and SJ contributed to the study conception and analysis. AM wrote the first draft, and SJ edited and supervised the revisions of the manuscript. Both authors read and approved the final manuscript.

Funding

The authors have no relevant financial or non-financial interests to disclose.

Declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to participate

This article does not require informed consent.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Caroline W. Cognition and social behavior in schizophrenia: an animal model investigating the potential role of nitric oxide. Sweden Institute of Neuroscience and Physiology; 2007.
2. Guze SB. Diagnostic and statistical manual of mental disorders: DSM-IV. Washington DC: American Psychiatric Association; 1994.
3. World Health Organization. International Statistical Classification of Diseases and Health Related Problems ICD-10. 2005. https://apps.who.int/iris/handle/10665/43110
4. Centorrino F, Baldessarini RJ, Price BH, Tuttle M, Bahk WM, Hennen J. EEG abnormalities during treatment with typical and atypical antipsychotics. Am J Psychiatry. 2002;159(1):109–115. doi: 10.1176/appi.ajp.159.1.109.
5. Keshayan MS, Diwadkar VA, Montrose DM, Rajarethinam R, Sweeney JA. Premorbid indicators and risk for schizophrenia: a selective review and update. Schizophr Res. 2005;79(1):45–57. doi: 10.1016/j.schres.2005.07.004.
6. Tandon R, Nasrallah HA, Keshavan MS. Schizophrenia, just the facts Clinical features and conceptualization. Schizophr Res. 2009;110:1–23. doi: 10.1016/j.schres.2009.03.005.
7. Marwaha S, Johnson S. Schizophrenia and employment-a review. Soc Psychiatry Psychiatr Epidemiol. 2004;39:337–349. doi: 10.1007/s00127-004-0762-4.
8. Andreasen NC. Scale for the assessment of thought, language, and communication (TLC). Schizophr Bull. 1976;12:473–482. doi: 10.1093/schbul/12.3.473.
9. Choi HS, Lee B, Yoon S. Biometric authentication using noisy electrocardiograms acquired by mobile sensors. IEEE Access. 2016;4:1266–1273. doi: 10.1109/ACCESS.2016.2548519.
10. Guger C, Schlogl A, Neuper C, Walterspacher D, Strein T, Pfurtscheller G. Rapid prototyping of an EEG-based BCI. IEEE Trans Neural Syst Rehab Eng. 2001;9:49–58. doi: 10.1109/7333.918276.
11. Panayiotopoulos CP. EEG and brain imaging. In: A clinical guide to epileptic syndromes and their treatment. London: Springer; 2010.
12. Knyazeva MG, Innocenti GM. EEG coherence studies in the normal brain and after early-onset cortical pathologies. Brain Res Rev. 2001;36:119–128. doi: 10.1016/S0165-0173(01)00087-X.
13. Guevara MA, Lorenzo I, Arce C, Ramos J, Corsi-Cabrera M. Inter-and intrahemispheric EEG correlation during sleep and wakefulness. Sleep. 1995;18:257–265. doi: 10.1093/sleep/18.4.257.
14. Hornero R, Abasolo D, Jimeno N, Sánchez CI, Poza J, Aboy M. Variability, regularity and complexity of time series generated by schizophrenic patients and control subjects. IEEE Trans Biomed Eng. 2006;53:210–218. doi: 10.1109/TBME.2005.862547.
15. Sabeti M, Boostani R, Katebi SD, Price GW. Selection of relevant features for EEG signal classification of schizophrenic patients. Biomed Signal Process Control. 2007;2:122–134. doi: 10.1016/j.bspc.2007.03.003.
16. Kim DJ, Jeong J, Chae JH, Park S, Kim SY, Go HJ. An estimation of the first positive Lyapunov exponent of the EEG in patients with schizophrenia. Psychiatry Res Neuroimaging. 2009;98(3):177–189. doi: 10.1016/S0925-4927(00)00052-4.
17. Sabeti M, Katebi S, Boostani R. Entropy and complexity measures for EEG signal classification of schizophrenic and control participants. Artif Intell Med. 2009;47:263–274. doi: 10.1016/j.artmed.2009.03.003.
18. Kim JW, Lee YS, Han DH, Min KJ, Lee J, Lee K. Diagnostic utility of quantitative EEG in un-medicated schizophrenia. Neurosci Lett. 2015;589:126–131. doi: 10.1016/j.neulet.2014.12.064.
19. Dvey Z, Fogelson N, Peled A, Intrator N. Schizophrenia detection and classification by advanced analysis of EEG recordings using a single electrode approach. PLoS ONE. 2015;10:1–12. doi: 10.1371/journal.pone.0123033.
20. Santos-Mayo L, San-José-Revuelta LM, Arribas JI. A computer-aided diagnosis system with EEG based on the P3b wave during an auditory Odd-Ball task in schizophrenia. IEEE Trans Biomed. 2017;64:395–407. doi: 10.1109/TBME.2016.2558824.
21. Patel R, Gireesan K, Baskaran R, Shekar N. Optimal classification of N-back task EEG data by performing effective feature reduction. Sådhanå. 2022. doi: 10.1007/s12046-022-02015-w.
22. Shu LO, Vicnesh J, Ciaccio EJ, Yuvaraj R, Acharya UR. Deep convolutional neural network model for automated diagnosis of schizophrenia using EEG signals. Appl Sci. 2019;9:1–13.
23. Phang CR, Noman F, Hussain H, Ting CM, Ombao H. A multi-domain connectome convolutional neural network for identifying schizophrenia from EEG connectivity patterns. IEEE J Biomed Health Inf. 2019;24(5):1333–1343. doi: 10.1109/JBHI.2019.2941222.
24. Shalbaf A, Bagherzadeh S, Maghsoudi A. Transfer learning with deep convolutional neural network for automated detection of schizophrenia from EEG signals. Phys Eng Sci Med. 2021;43:1229–1239. doi: 10.1007/s13246-020-00925-9.
25. Chandran C, Sreekumar K, Subha DP. EEG-based automated detection of schizophrenia using long short-term memory (LSTM) network. In: Advances in machine learning and computational intelligence. Springer, Singapore; 2021. pp. 229–236.
26. Aslan Z, Akin M. A deep learning approach in automated detection of schizophrenia using scalogram images of EEG signals. Phys Eng Sci Med. 2022;45:83–96. doi: 10.1007/s13246-021-01083-2.
27. Sun J, Cao R, Zhou M, Hussain W, Wang B, Xue J, Xiang J. A hybrid deep neural network for classification of schizophrenia using EEG data. Sci Rep. 2021;11(1):1–16. doi: 10.1038/s41598-021-83350-6.
28. Gorbachevskaya NN, Borisov S. EEG data of healthy adolescents and adolescents with symptoms of schizophrenia. Available via http://brain.bio.msu.ru/eeg_schizophrenia.htm.
29. Litjens G, Ciompi F, Wolternik J, Vos B, Leiner T, Teuwen J, Isgum I. State of the art deep learning in cardiovascular image analysis. JACC Cardiovasc Imaging. 2019;12:1549–1565. doi: 10.1016/j.jcmg.2019.06.009.
30. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Proceedings of the 27th international conference on neural information processing systems, Montreal, QC, Canada. 2014; pp. 2672–2680.
31. Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. 2015. arXiv:1511.06434.
32. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. 2015. arXiv:1502.03167.
33. Xu B, Wang N, Chen T, Li M. Empirical evaluation of rectified activations in convolutional network. 2015. arXiv:1505.00853.
34. Chen Y, Jiang H, Li C, Jia X, Ghamisi P. Deep feature extraction and classification of hyper spectral images based on convolutional neural networks. IEEE Trans Geosci Remote Sens. 2016;54(10):6232–6251. doi: 10.1109/TGRS.2016.2584107.
35. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw. 1994. doi: 10.1109/72.279181.
36. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997. doi: 10.1162/neco.1997.9.8.1735.

