BMC Medical Imaging
2025 Dec 9;26:27. doi: 10.1186/s12880-025-02093-2

A novel strategy for enhanced schizophrenia detection using established CNN architectures

Ali Allahgholi 1, Keivan Maghooli 1, Babak Gholamine 2
PMCID: PMC12801474  PMID: 41366658

Abstract

Schizophrenia, a complex psychotic condition, is challenging to diagnose due to its reliance on clinical assessments and behavioral evaluations. Neuroimaging studies, particularly structural MRI, have revealed reductions in grey matter volume in brain regions such as the temporal lobe and insula. This study combines deep learning and neuroimaging for precise automatic detection of schizophrenia. Using T1-weighted coronal MRI data from three publicly accessible datasets (MCICShare, COBRE, UCLA), a DeepLabv3+ model with ResNet50-based segmentation was employed to isolate the temporal lobe and insula, achieving segmentation accuracies of 96% and 97%, respectively. Enhanced visualization of isolated regions through color-to-grayscale conversion distinguished schizophrenia patients from controls, achieving an AUC of 0.99. Our study specifically focuses on the regions critically linked to both cognitive and emotional dysfunctions in schizophrenia. These results advance the literature by demonstrating improved diagnostic performance and reliability compared to traditional clinical assessments and earlier imaging-based methods.

Keywords: Schizophrenia, Structural MRI, Deep learning, Semantic segmentation, Temporal lobe, Insula, DeepLabv3+, Contrast enhancement, SANS correlation, Grad-CAM

Introduction

Schizophrenia is a complex psychiatric disorder characterized by profound disturbances in perception, cognition, and emotion. It typically manifests in late adolescence or early adulthood, most commonly between the ages of 18 and 25 [1]. Neuroimaging studies have consistently revealed structural brain abnormalities, including grey matter volume reductions across several regions, most notably the frontal, temporal, hippocampal, fusiform, and insular cortices, with the temporal lobe often being the most affected [2–4]. These morphological alterations are closely linked to the cognitive and perceptual symptoms of schizophrenia.

Among these regions, the temporal lobe and insula have received particular attention for their critical roles in auditory, emotional, and self-referential processing, domains that are profoundly disrupted in schizophrenia. The temporal lobe, especially the superior temporal gyrus, is essential for auditory perception and language comprehension; abnormalities here are closely associated with auditory hallucinations [4–6]. The insula contributes to interoception, emotional regulation, and self-awareness, and its structural and functional disruptions have been linked to negative symptoms such as emotional blunting and social withdrawal [7–9]. Taken together, converging evidence indicates that aberrations in the left temporal and insular cortices are central to both the cognitive and affective dysfunctions observed in schizophrenia.

Despite advances in neuroimaging and computational analysis, accurately identifying schizophrenia-related abnormalities remains challenging. Structural MRI provides valuable insights into brain morphology, yet existing deep learning studies often lack interpretability and focus on global brain patterns rather than clinically meaningful regions of interest (ROIs). Moreover, preprocessing variability and the use of whole-brain models may obscure localized patterns critical to understanding symptom-specific brain alterations. There remains a need for methods that integrate region-specific analysis, computational efficiency, and biological interpretability to improve both diagnosis and understanding of schizophrenia.

A wide range of machine learning and deep learning approaches have been explored for schizophrenia classification using diverse neuroimaging and electrophysiological modalities. Early research predominantly employed EEG-based methods, leveraging temporal signal patterns to distinguish patients from healthy controls. These studies typically utilized classifiers such as support vector machines (SVM), k-nearest neighbors (KNN), and ensemble-based algorithms, demonstrating the feasibility of automatic diagnosis based on brain activity patterns [10–12]. With the increased availability of structural and functional MRI, recent work has shifted toward image-based approaches that capture morphological and connectivity abnormalities in schizophrenia. Studies employing structural MRI (sMRI) and functional MRI (fMRI) have explored various feature representations, including grey and white matter volumes, cortical thickness, and regional connectivity metrics, often in combination with classical classifiers such as SVM, random forest (RF), or logistic regression [13–15]. Hybrid models integrating multimodal data, such as sMRI–fMRI or MRI–genetic information, have also been proposed to improve diagnostic robustness [16, 17]. More recently, deep learning techniques have been introduced to automatically extract discriminative features from neuroimaging data. These methods range from convolutional neural networks (CNNs) applied to 2D MRI slices to hybrid architectures combining CNNs with recurrent models for sequential or multimodal data [18, 19]. Despite their promise, many of these studies rely on whole-brain inputs or heterogeneous feature sets, which may obscure localized abnormalities associated with key clinical symptoms.

Existing approaches demonstrate the potential of computational methods for schizophrenia diagnosis but reveal several limitations: limited interpretability due to whole-brain analysis, underexploration of critical regions of interest such as the temporal and insular cortices, and high computational demands associated with complex architectures. These gaps motivate the present study’s region-specific, 2D ROI-driven framework, which focuses on biologically and clinically relevant areas to enhance both diagnostic accuracy and neurobiological interpretability.

To address these challenges, this study proposes a ROI-driven 2D deep learning framework for schizophrenia classification, focusing specifically on the left temporal and insular cortices—two regions deeply implicated in the disorder’s clinical symptoms. We employ the DeepLabv3+ model with a ResNet-50 backbone for precise semantic segmentation of these cortical regions, leveraging its multi-scale and atrous convolution capabilities to capture subtle anatomical variations. Following segmentation, the extracted regions are contrast-enhanced and classified using AlexNet, a computationally efficient CNN suitable for small-to-moderate neuroimaging datasets. This two-stage design—precise region segmentation followed by targeted classification—balances performance, interpretability, and efficiency.

In contrast to prior CNN-based schizophrenia studies that predominantly analyze whole-brain volumes or non-specific features, the proposed method advances the field by introducing a region-focused deep learning pipeline that explicitly isolates and enhances the temporal and insular cortices before classification. This design not only improves the model’s sensitivity to subtle structural variations but also provides a clearer link between imaging biomarkers and clinical symptoms. Furthermore, by combining the segmentation precision of DeepLabv3+ with the simplicity and efficiency of AlexNet, our framework achieves a balance between diagnostic performance, computational tractability, and interpretability. Overall, this study contributes a novel, biologically grounded, and computationally efficient approach that extends previous CNN-based works through explicit region-level analysis and a streamlined diagnostic architecture. The overall workflow of the proposed method is illustrated in Fig. 1, showing the data flow, dimensionality changes, and evaluation procedures across all stages of the pipeline.

Fig. 1.

Fig. 1

Flowchart of the entire pipeline

Methods

Dataset

We obtained publicly available neuroimaging data from three major datasets: COBRE [20], MCICShare [21], and UCLA [22]. The COBRE and MCICShare datasets were accessed through the SchizConnect database [23], originally collected to investigate brain metabolism and structure in patients with schizophrenia, whereas the UCLA dataset was provided for neuropsychiatric phenomics research. All datasets consisted of 2D T1-weighted structural MRI images, acquired on scanners with field strengths ranging from 1.5T to 3T. Specifically, the COBRE dataset included 90 subjects, and the MCICShare dataset included 109 subjects. A summary of the image distribution across datasets and diagnostic groups is presented in Table  1, showing the total number of images and the percentage contributed by each dataset for control and schizophrenia participants.

Table 1.

Statistical information of the data extracted from the SchizConnect database

Statistic / DX               No Known Disorder   Schizophrenia (Broad)   Schizophrenia (Strict)   Schizoaffective
Number of subjects           189                 109                     79                       11
Gender (m/f)                 132/57              83/26                   64/15                    8/3
Age (years, mean ± SD)       35.9 ± 12.2         34.3 ± 11.2             37.8 ± 13.8              40.6 ± 12.6
Age range (years, min–max)   18–65               18–61                   19–66                    19–59

Dataset / Number of images   MCICShare           COBRE                   UCLA                     Total
Control                      264 (41.57%)        241 (37.95%)            130 (20.47%)             635
Schizophrenia                306 (49.51%)        262 (42.39%)            50 (8.09%)               618

The COBRE, MCICShare, and UCLA datasets were combined to create a unified dataset comprising 249 schizophrenia (SZ) patients and 319 healthy controls, all of whom were used for schizophrenia classification. In addition, images from the healthy control group of the UCLA dataset were used for pixel-wise segmentation. All MRI data included in this study satisfied the required quality criteria; no participants or scans were excluded due to motion artifacts, image distortions, or other quality-related issues.

The MCICShare and COBRE datasets were categorized into three diagnostic subgroups: schizophrenia (broad), schizophrenia (strict), and schizoaffective. In this study, we used the schizophrenia (broad) category, which combines schizophrenia and schizoaffective diagnoses [23]. This grouping approach is widely adopted in neuroimaging studies [24, 25], as these conditions often share overlapping structural and functional abnormalities.

MRI acquisition parameters were as follows: MCICShare: 3T – TR = 2530 ms, TE = 3.79 ms, FA = 7°, TI = 1100 ms, bandwidth = 181 Hz/pixel; 1.5T – TR = 12 ms, TE = 4.76 ms, FA = 20°, bandwidth = 110 Hz/pixel; voxel size = 0.625 × 0.625 mm², slice thickness = 1.5 mm, FOV = 16–18 cm, matrix = 256 × 256 × 128. UCLA: TR = 1.9 s, TE = 2.26 ms, FOV = 250 mm, matrix = 256 × 256, sagittal plane, slice thickness = 1 mm, 176 slices. COBRE: TR = 2.53 s, TE = 1.64–9.08 ms, TI = 1.2 s, FA = 7°, slice thickness = 1 mm, FOV = 256 mm, matrix = 256 × 256.

For external validation, we used the dataset provided by Soler-Vidal et al. [26], which included T1-weighted MRI images from 46 patients with schizophrenia (mean age 42.52 years; 36 males, 10 females) and 25 healthy controls (mean age 39.8 years; 18 males, 7 females). Demographic variables in this external dataset were comparable to those in the main SchizConnect datasets, with no significant group differences in gender distribution, age, or other demographic factors, suggesting minimal demographic bias.

Statistical analyses were performed to evaluate potential demographic and dataset-related confounds. A one-way ANOVA showed no significant difference in age among diagnostic groups, indicating that participants were age-balanced across categories. Similarly, a chi-square test for gender distribution revealed no significant difference in male-to-female ratios, with a negligible effect size (Cramer's V). These results confirm that age and sex were well matched between schizophrenia (SZ) and healthy control (HC) groups and are unlikely to bias classification outcomes.
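The gender comparison above can be reproduced in outline from the counts in Table 1 (m/f = 83/26 for the schizophrenia group and 132/57 for the no-known-disorder group). The sketch below computes the chi-square statistic and Cramer's V by hand in Python; it illustrates the method only and does not reproduce the study's reported statistics, which were obtained in MATLAB.

```python
import numpy as np

# 2x2 contingency table from Table 1: rows = SZ / no-known-disorder,
# columns = male / female.
table = np.array([[83.0, 26.0],
                  [132.0, 57.0]])

# Chi-square statistic from observed vs. expected counts.
row = table.sum(axis=1, keepdims=True)
col = table.sum(axis=0, keepdims=True)
n = table.sum()
expected = row @ col / n
chi2 = ((table - expected) ** 2 / expected).sum()

# Cramer's V for an r x c table: sqrt(chi2 / (n * (min(r, c) - 1))).
v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(chi2, v)
```

For a 2x2 table (1 degree of freedom), a chi-square value below the 3.841 critical threshold is non-significant at the 0.05 level, consistent with the well-matched sex ratios reported above.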

Data preprocessing

In this study, preprocessing was crucial to reduce noise, enhance image quality, and standardize data for optimal model performance. Steps such as denoising, intensity normalization, and skull stripping were applied to ensure consistent intensity ranges and to eliminate non-brain elements that could interfere with the analysis. These preprocessing techniques are widely recognized for improving the accuracy and reliability of subsequent tasks such as semantic segmentation (pixel-level) and image-level classification [27, 28].

While deep learning models can process raw data, preprocessing is essential in MRI studies to improve image consistency, as variations in acquisition parameters, scanner types, and contrast profiles can affect model generalization. Skull stripping precisely removed non-brain tissues, including the skull and scalp, enabling the network to focus on relevant brain regions—specifically the temporal lobe and insula, which are the primary regions of interest in this study.

To prevent any potential data leakage, all preprocessing steps were applied in a strictly training-data-driven manner. Parameters for intensity normalization (mean and standard deviation) were computed exclusively from the training set and subsequently applied to the validation and test sets. Denoising and skull stripping were performed independently for each subject using identical processing pipelines, without using any label or distributional information from the validation or test data. This ensured that preprocessing did not bias the model toward unseen data.
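The leakage-free normalization described above can be sketched as follows: the mean and standard deviation are estimated on the training split only and then reused, unchanged, on held-out data. This is an illustrative Python snippet (the study itself was implemented in MATLAB).

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(100.0, 15.0, size=300)  # stand-in training intensities
test = rng.normal(100.0, 15.0, size=100)   # stand-in held-out intensities

# Normalization parameters come from the training split ONLY.
mu, sigma = train.mean(), train.std()

train_z = (train - mu) / sigma
test_z = (test - mu) / sigma  # reuse training statistics; never refit on test data
print(train_z.mean(), train_z.std())
```

The training split is exactly standardized (zero mean, unit variance), while the test split is merely close to standardized, which is the expected behavior when no test-set information leaks into preprocessing.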

Denoising

MRI images are often affected by noise during acquisition, which can compromise quantitative analysis. While multiple acquisitions in the scanner can reduce noise, this is impractical in clinical settings due to time constraints. As an alternative, filtering methods such as the nonlocal means (NLM) filter [29] were employed to reduce noise while preserving anatomical details. The NLM filter reconstructs each pixel using a weighted average of nearby pixels based on a similarity measure, as shown in Eq. 1 and Eq. 2.

$\mathrm{NLM}(v(i)) = \sum_{j \in V_i} w(i,j)\, v(j) \qquad (1)$

In this context, $w(i,j)$ denotes the weight assigned to the value $v(j)$, signifying the similarity between the local patches $N_i$ and $N_j$. These patches have a radius of $r$ and are centered on the voxels $x_i$ and $x_j$. Additionally, $V_i$ represents a local search window around $x_i$.

$w(i,j) = \dfrac{1}{Z(i)} \exp\!\left(-\dfrac{\lVert v(N_i) - v(N_j)\rVert_2^2}{h^2}\right) \qquad (2)$

In Eq. 2, $Z(i)$ serves as a normalization constant ensuring that the weights sum to one, i.e., $Z(i) = \sum_{j \in V_i} \exp\!\left(-\lVert v(N_i) - v(N_j)\rVert_2^2 / h^2\right)$. The parameter $h$ plays a crucial role in filtering, as it controls the rate at which the exponential function decays. Fig. 2 shows the result of applying the NLM filter.
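The NLM weighting of Eqs. 1 and 2 can be sketched in a few lines of Python. This is a deliberately naive, illustrative implementation (the study used MATLAB, and production code would use an optimized library routine); patch radius, search radius, and h are arbitrary demonstration values.

```python
import numpy as np

def nlm_denoise(img, patch_r=1, search_r=3, h=0.1):
    """Minimal nonlocal-means sketch: each pixel is a weighted average of
    pixels in a search window V_i, with weights based on the similarity of
    the patches N_i and N_j (radius patch_r), per Eqs. 1-2."""
    pad = patch_r + search_r
    padded = np.pad(img, pad, mode="reflect")
    out = np.zeros_like(img, dtype=float)
    H, W = img.shape
    for y in range(H):
        for x in range(W):
            cy, cx = y + pad, x + pad
            ref = padded[cy - patch_r:cy + patch_r + 1,
                         cx - patch_r:cx + patch_r + 1]
            weights, values = [], []
            for dy in range(-search_r, search_r + 1):
                for dx in range(-search_r, search_r + 1):
                    ny, nx = cy + dy, cx + dx
                    patch = padded[ny - patch_r:ny + patch_r + 1,
                                   nx - patch_r:nx + patch_r + 1]
                    d2 = ((ref - patch) ** 2).mean()
                    weights.append(np.exp(-d2 / h ** 2))  # exp(-||N_i - N_j||^2 / h^2)
                    values.append(padded[ny, nx])
            w = np.array(weights)
            out[y, x] = (w / w.sum()) @ np.array(values)  # Z(i) normalizes the weights
    return out

# Toy demonstration: a flat region corrupted by Gaussian noise.
rng = np.random.default_rng(1)
noisy = 0.5 + 0.1 * rng.standard_normal((16, 16))
den = nlm_denoise(noisy)
print(noisy.std(), den.std())
```

Because the weights favor similar patches, the filtered image is smoother than the input while edges (dissimilar patches) would receive low weights and be preserved.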

Fig. 2.

Fig. 2

Image denoising; a) original image, b) NLM filter output

Normalization

Normalization was performed using the Z-Score approach, which adjusts pixel intensities to a common scale, compensating for differences in acquisition settings across scanners. This ensures that tissues across images exhibit consistent intensity ranges, which is essential for accurate analysis.

Skull stripping

Skull stripping is an essential step in removing non-brain tissues from MRI scans. This preprocessing step was carried out using SynthStrip [30], a deep learning tool that effectively extracts brain tissues from a variety of imaging modalities. The result is a cleaner image, allowing for more precise segmentation of brain regions of interest (Fig. 3).

Fig. 3.

Fig. 3

Sample result image of skull-stripping process

Semantic segmentation

Semantic segmentation, a deep learning technique, assigns a label to each pixel in an image to identify groups of pixels that belong to the same category. The goal is to divide the image into meaningful regions for further analysis. Recent advances in deep learning have led to powerful segmentation models, making them the dominant approach for semantic segmentation. In this study, we used deep learning-based semantic segmentation to identify regions of interest (ROIs), such as the insula and temporal lobe, to improve the automatic detection of schizophrenia. The following steps outline the process.

Labeling ground truth

A total of 130 2D T1-weighted brain MRI images from the coronal plane were obtained from the UCLA dataset and used for pixel-wise semantic segmentation of two anatomical regions of interest: the insula and the temporal lobe. Each image was manually annotated at the pixel level by expert raters to generate ground truth segmentation masks. Pixels corresponding to the insula were labeled in blue (RGB code [0, 0.45, 0.74]), while those representing the temporal lobe were labeled in orange (RGB code [0.85, 0.33, 0.1]). All other regions were assigned as background. This annotation process produced precise pixel-level class labels that enabled the segmentation network to learn fine-grained appearance and shape characteristics of each region, in contrast to coarse region annotations such as bounding boxes or contour approximations [31]. Figure 4(a) illustrates an example of a manually labeled ground truth image.

Fig. 4.

Fig. 4

Semantic segmentation post-processing; the blue region indicates the insula and the orange region the temporal lobe; a) ROI-labeled image, b) prediction by the CNN, c) grayscale-highlighted image

Data augmentation

To prepare images for semantic segmentation, 130 labeled images from the UCLA dataset were used. However, this number was too small for training the deep learning model, so image augmentation techniques were applied to increase the dataset size. Simple rotations (e.g., 90-degree turns) and flips (horizontal and vertical) generated 390 additional images (three per original). Further transformations, such as shifting and resizing, created four variations per image, adding 520 more. In total, 1040 images (130 originals plus 910 augmented) were used to train the segmentation model.
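The augmentation bookkeeping above can be verified with a short Python sketch. The rotation/flip and shift transforms here are stand-ins chosen for illustration (the paper does not specify exact shift or resize parameters); the point is the 130 + 390 + 520 = 1040 count.

```python
import numpy as np

def rot_flip_variants(img):
    """Three extra images per original: a 90-degree rotation plus
    horizontal and vertical flips (illustrative choices)."""
    return [np.rot90(img), np.fliplr(img), np.flipud(img)]

def shift_variants(img, n=4):
    """Stand-in for the four shift/resize variants per image; simple
    one-pixel circular shifts, purely to illustrate the bookkeeping."""
    return [np.roll(img, s, axis=1) for s in range(1, n + 1)]

originals = [np.zeros((8, 8)) for _ in range(130)]
rotated_flipped = [a for img in originals for a in rot_flip_variants(img)]  # 390
shifted = [v for img in originals for v in shift_variants(img)]             # 520
total = len(originals) + len(rotated_flipped) + len(shifted)
print(total)  # 130 + 390 + 520 = 1040
```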

Moreover, because the data available for testing our model were insufficient, we expanded the external dataset to 426 images using similar data augmentation techniques. In this study, we opted to focus on 2D MRI data, deliberately excluding 3D MRI data. This choice rests on both practical and methodological considerations: analyzing 3D MRI volumes involves substantially higher computational complexity, as it requires processing larger datasets at greater resolution, increasing computation time and resource demands. In future work, we plan to extend our approach to 3D MRI data, which could offer more detailed insights and enhance the robustness of the model.

Semantic segmentation deep learning construction

In this research, we employed a pre-trained DeepLabV3+ network for image segmentation, followed by a CNN for feature extraction. While an integrated model could streamline the process by combining these tasks into a single network, the chosen pipeline was motivated by the need for precise segmentation and specialized feature extraction, each tailored to specific objectives.

The DeepLabV3+ model, as proposed by Chen et al. [32], incorporates an encoder-decoder architecture. The encoder captures multi-scale contextual information through atrous convolutions, while the decoder enhances segmentation accuracy along object boundaries. This combination makes DeepLabV3+ particularly well suited for segmenting complex regions, such as brain tissues, where boundary delineation is critical for downstream tasks. The network is built upon ResNet50, a robust 50-layer backbone designed by He et al. [33], whose series of convolutional and pooling layers, optimized for large-scale image analysis, provides the powerful feature extraction needed to capture the nuanced characteristics required for accurate segmentation. The rationale for the two-step approach is as follows:

  1. Precision in Segmentation: Using DeepLabV3+ ensures that the segmentation process is optimized independently, focusing on delineating meaningful regions (e.g., brain tissues) from the background.

  2. Targeted Feature Extraction: By applying the CNN only to the segmented regions, feature extraction is concentrated on relevant areas, reducing noise and improving model focus.

  3. Flexibility and Modularity: The two-step approach allows each module to be fine-tuned separately. This modular design provides the flexibility to replace or update either the segmentation or feature extraction components without affecting the overall pipeline.

  4. Empirical Performance Gains: Segmenting the images before feature extraction reduces the computational complexity of the subsequent steps by narrowing the focus to the regions of interest, enhancing classification accuracy and efficiency.
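The modularity argument above can be made concrete with a small skeleton: segmentation, enhancement, and classification are separate, swappable callables. All function names here are placeholders for illustration, not the authors' code (which was written in MATLAB).

```python
from typing import Callable
import numpy as np

def run_pipeline(image: np.ndarray,
                 segment: Callable[[np.ndarray], np.ndarray],
                 enhance: Callable[[np.ndarray], np.ndarray],
                 classify: Callable[[np.ndarray], int]) -> int:
    mask = segment(image)             # stage 1: DeepLabv3+-style ROI mask
    roi = np.where(mask, image, 0.0)  # keep only temporal-lobe / insula pixels
    roi = enhance(roi)                # contrast enhancement on the ROI only
    return classify(roi)              # stage 2: AlexNet-style classifier

# Stub stages: either module can be replaced without touching the other.
seg = lambda x: x > 0.5
enh = lambda x: (x - x.min()) / (x.max() - x.min() + 1e-8)
clf = lambda x: int(x.mean() > 0.1)

label = run_pipeline(np.random.default_rng(0).random((4, 4)), seg, enh, clf)
print(label)
```

Because each stage only depends on the array interface, swapping in a different segmenter or classifier requires no change to the rest of the pipeline, which is exactly the flexibility argued for above.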

Contrast enhancement using color-to-gray scale conversion

The main goal of using the color-to-grayscale technique is to improve the visibility of subtle differences in brain structures, which are critical for distinguishing schizophrenia-related abnormalities. By applying this contrast-enhancing technique, we provide a clearer and more defined representation of the regions of interest (ROIs), allowing the model to learn from finer anatomical details. By enhancing the contrast in the segmented regions, the model can more effectively identify and focus on critical areas associated with schizophrenia, such as the temporal lobe, insula, and surrounding cortical regions. Since MRI images often contain numerous non-discriminative features, the proposed enhancement step enables the model to prioritize diagnostically relevant regions, ensuring that these features receive higher importance in the classification process.

Mathematical formulation and application of the color-to-grayscale enhancement

The color-to-grayscale contrast enhancement method proposed by [34] was employed to improve the perceptual visibility of subtle structural variations in MRI slices. Conventional grayscale conversion based solely on luminance ($Y$) often suppresses minor chromatic or intensity differences that can reflect schizophrenia-related abnormalities. To address this, Kuhn's algorithm models the color-to-gray mapping as a mass–spring physical system, in which each color behaves as a particle interacting with all other colors through virtual springs. These interactions iteratively adjust luminance values to produce a grayscale image that preserves perceptual contrast.
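The weakness of luminance-only conversion is easy to demonstrate: two clearly different colors with equal luminance collapse to the same gray level. The snippet below uses the standard Rec. 601 luminance weights (0.299, 0.587, 0.114) with RGB values in [0, 1], chosen purely for illustration.

```python
import numpy as np

# Two different colors constructed so that Y = 0.299 R + 0.587 G + 0.114 B
# is identical for both (0.299 * 0.114 == 0.114 * 0.299).
a = np.array([0.114, 0.0, 0.0])  # dark red
b = np.array([0.0, 0.0, 0.299])  # dark blue
w = np.array([0.299, 0.587, 0.114])

ya, yb = w @ a, w @ b
print(ya, yb)  # same gray level: the chromatic contrast is lost
```

Kuhn's mass-spring formulation, described next, avoids this collapse by assigning gray levels that respect perceptual distances between colors rather than luminance alone.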

Perceptual distance between colors

Each quantized color $c_i$ in the CIE $L^*a^*b^*$ color space is represented as a particle. The desired rest length between two colors in the grayscale domain is given by:

$l_{ij} = \dfrac{\delta(c_i, c_j)}{\delta_{\max}}\, G \qquad (3)$

where $\delta(c_i, c_j)$ denotes the perceptual color distance, $G$ is the full grayscale range, and $\delta_{\max}$ represents the maximum perceptual color distance in the image. This proportional mapping ensures that perceptual differences among colors are maintained after conversion. In this study, this mechanism was used to preserve fine intensity and texture variations across brain regions such as the insula and temporal lobe.

Force modeling among color particles

The system computes the total force applied to each particle as:

$F_i = \sum_{j \neq i} k \left( d_{ij} - l_{ij} \right) \hat{u}_{ij} \qquad (4)$

where $d_{ij}$ is the current grayscale distance between particles $i$ and $j$, $k$ is the stiffness coefficient, and $\hat{u}_{ij}$ is the unit direction from gray level $g_i$ toward $g_j$. If two grayscale levels become too similar, the spring between them exerts a repulsive force, encouraging their separation; conversely, overly distant gray levels are drawn closer. When applied to MRI data, this mechanism prevents different tissue types or cortical structures from merging into similar gray levels, ensuring that subtle neuroanatomical boundaries remain distinguishable.

Dynamic luminance update

The luminance value of each particle is iteratively updated using Verlet integration:

$g_i(t + \Delta t) = 2\, g_i(t) - g_i(t - \Delta t) + \dfrac{F_i(t)}{m_i}\, \Delta t^2 \qquad (5)$

where $m_i$ is the particle's mass. The simulation iterates until the system reaches equilibrium, yielding a stable grayscale representation with maximized local and global contrast. In our application, this dynamic update enhances the definition of cortical folds and boundaries, which contributes to improved visual separation of key brain structures for the CNN classifier.

Color saturation weighting

The mass of each particle is inversely related to its chromatic saturation:

$m_i = \dfrac{1}{S_i} \qquad (6)$

Highly saturated colors (large $S_i$) have smaller masses and therefore respond more strongly to forces, while near-neutral colors remain relatively stable. This weighting highlights highly informative regions and preserves structural uniformity elsewhere. In our framework, it emphasized subtle intensity transitions in segmented regions of interest (ROIs) without introducing artifacts in homogeneous tissue areas.
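A toy version of the mass-spring scheme of Eqs. 3 through 6 can be written in a few lines. Everything here is illustrative: the "color space" and saturation proxy are crude stand-ins, and the constants (k, dt, iteration count) are arbitrary choices; Kuhn's actual algorithm operates on quantized CIE Lab colors with perceptual distances.

```python
import numpy as np

rng = np.random.default_rng(0)
colors = rng.random((6, 3))                     # 6 quantized toy "Lab" colors
sat = colors[:, 1:].sum(axis=1) + 0.1           # crude chromatic-saturation proxy
mass = 1.0 / sat                                # Eq. 6: mass inversely ~ saturation

delta = np.linalg.norm(colors[:, None] - colors[None, :], axis=-1)
G = 1.0                                         # grayscale range [0, 1]
rest = G * delta / delta.max()                  # Eq. 3: target rest lengths

g = colors[:, 0].copy()                         # initialize grays from "luminance"
g_prev = g.copy()
k, dt = 0.05, 0.1
for _ in range(200):
    d = g[:, None] - g[None, :]
    dist = np.abs(d) + 1e-9
    # Eq. 4: springs push gray-level distances toward the rest lengths
    # (repulsive when too close, attractive when too far).
    force = (k * (dist - rest) * (-d / dist)).sum(axis=1)
    # Eq. 5: Verlet integration of each particle's gray level.
    g_next = 2 * g - g_prev + (force / mass) * dt ** 2
    g_prev, g = g, np.clip(g_next, 0.0, 1.0)
print(np.round(g, 3))
```

After iteration, the gray levels spread out so their pairwise differences better match the target rest lengths, which is the contrast-preserving behavior described above.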

Quantitative evaluation using RWMS

The preservation of perceptual contrast was quantitatively assessed using the Root Weighted Mean Square (RWMS) error metric:

$\mathrm{RWMS}_i = \sqrt{ \dfrac{1}{|K|} \sum_{j \in K} \dfrac{\left( \delta_{ij} - \lvert g_i - g_j \rvert \right)^2}{\delta_{ij}^2} } \qquad (7)$

where $\delta_{ij}$ is the expected grayscale difference for the color pair $(i, j)$ and $K$ is the set of color pairs compared. Lower RWMS values indicate stronger preservation of perceptual contrast.
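The RWMS metric can be sketched directly from its description: for each color, compare realized gray-level differences against the expected differences, weighting each error relative to the expected value. The formula below is reconstructed from the text (a relative-error interpretation), so treat it as illustrative rather than the exact published definition.

```python
import numpy as np

def rwms(g, delta):
    """Per-color root weighted mean square error between expected gray
    differences delta[i, j] and realized differences |g[i] - g[j]|."""
    n = len(g)
    err = np.zeros(n)
    for i in range(n):
        s = 0.0
        for j in range(n):
            if i == j:
                continue
            s += ((delta[i, j] - abs(g[i] - g[j])) / delta[i, j]) ** 2
        err[i] = np.sqrt(s / (n - 1))
    return err

# Expected gray differences for three toy colors.
delta = np.array([[0.0, 0.5, 1.0],
                  [0.5, 0.0, 0.5],
                  [1.0, 0.5, 0.0]])
perfect = np.array([0.0, 0.5, 1.0])    # grays matching delta exactly
collapsed = np.array([0.5, 0.5, 0.5])  # all contrast lost
print(rwms(perfect, delta), rwms(collapsed, delta))
```

A mapping that preserves every expected difference scores zero, while a fully collapsed mapping scores the maximum relative error, matching the interpretation that lower RWMS means better contrast preservation.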

CNN construction

Convolutional Neural Networks (CNNs) are some of the most widely used and effective algorithms in Deep Learning. A major advantage of CNNs is their ability to automatically detect important features without human intervention. Among these, AlexNet is notable for its groundbreaking image recognition and classification achievements; it popularized GPU-based training, greatly accelerating the process. AlexNet also introduced a deeper eight-layer architecture, improving feature extraction compared to earlier models such as LeNet. Additionally, it employs the ReLU activation function, whose non-saturating response speeds up training and mitigates the vanishing-gradient problem.

ResNet gained prominence after winning the ILSVRC 2015 competition. It was designed to address the vanishing gradient problem in deep networks and has versions ranging from 34 to 1202 layers. ResNet50, a commonly used version, contains 49 convolutional layers and one fully connected layer.

In this research, both AlexNet and ResNet50 are used. Both models comprise convolutional layers with ReLU activations, max-pooling layers, and fully connected layers; AlexNet additionally employs dropout and a softmax output for classification, while ResNet50 incorporates batch normalization.

Based on Fig. 5, the AlexNet architecture is designed to process and classify images through several layers. It begins with a convolutional layer that uses 96 kernels of size 11 × 11 and a stride of 4, followed by a 3 × 3 max-pooling layer with a stride of 2. The second convolutional layer uses 256 feature maps with 5 × 5 kernels and a stride of 1, followed by another 3 × 3 max-pooling layer with a stride of 2. The third and fourth convolutional layers each contain 384 feature maps with 3 × 3 kernels. The fifth convolutional layer has 256 feature maps with 3 × 3 kernels, followed by a final 3 × 3 max-pooling layer with a stride of 2. After these convolutional layers, three fully connected layers, each containing 4096 neurons, process the features. The architecture concludes with an output layer that generates the classification result.

Fig. 5.

Fig. 5

AlexNet architecture [35]

Implementation detail

The proposed work was implemented in MATLAB 2022b and executed on a Windows 11 computer equipped with an AMD Ryzen 5 6600H processor running at 3.30 GHz and 16 GB of RAM. The GPU used was an NVIDIA RTX 3050. The suggested model was evaluated using three publicly available datasets, focusing on control subjects and schizophrenia patients. The data were divided into 70% for training, 15% for validation, and 15% for testing.

Following segmentation and contrast enhancement, each image was resized to 227 × 227 pixels and used as the input to the AlexNet architecture. For grayscale MRI slices, the single channel was replicated to form a 227 × 227 × 3 input tensor, ensuring compatibility with the network’s original configuration.

AlexNet consists of five convolutional layers, three max-pooling layers, and three fully connected layers. The first convolutional layer extracts 96 low-level feature maps (55 × 55) representing basic edges and textures, followed by max pooling that reduces the spatial dimension to 27 × 27. The second convolutional layer produces 256 mid-level feature maps (27 × 27), and subsequent pooling compresses them to 13 × 13. The third and fourth convolutional layers generate 384 higher-level feature maps (13 × 13), capturing local structural and shape patterns. The fifth convolutional layer outputs 256 abstract feature maps (13 × 13), which are downsampled through the final pooling layer to 6 × 6, resulting in a flattened feature vector of 9,216 elements.
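The feature-map sizes quoted above can be checked with the standard output-size formula, out = floor((in - k + 2p)/s) + 1. The padding values below (2 for conv2, 1 for conv3 through conv5) are the conventional AlexNet settings and are an assumption here, since the text does not state them.

```python
def out_size(inp, k, s, p=0):
    """Spatial output size of a conv/pool layer: floor((in - k + 2p)/s) + 1."""
    return (inp - k + 2 * p) // s + 1

x = 227
x = out_size(x, 11, 4)       # conv1: 96 kernels, 11x11, stride 4 -> 55
x = out_size(x, 3, 2)        # pool1: 3x3, stride 2 -> 27
x = out_size(x, 5, 1, p=2)   # conv2: 256 kernels, 5x5, pad 2 -> 27
x = out_size(x, 3, 2)        # pool2 -> 13
x = out_size(x, 3, 1, p=1)   # conv3: 384 kernels, 3x3, pad 1 -> 13
x = out_size(x, 3, 1, p=1)   # conv4: 384 kernels -> 13
x = out_size(x, 3, 1, p=1)   # conv5: 256 kernels -> 13
x = out_size(x, 3, 2)        # pool5 -> 6
print(x, 256 * x * x)        # final 6x6 maps; flattened vector of 9216
```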

This flattened vector passes through three fully connected layers (fc6–fc8). The first two layers (fc6 and fc7) each contain 4,096 neurons that learn dense representations of the high-level abstract features, while the final layer (fc8) performs binary classification between schizophrenia and control groups. In this study, the 4,096-dimensional feature vector obtained from the fc7 layer was retained as the deep feature representation for each image. The feature vectors extracted from the two ROIs (insula and temporal lobe) were then concatenated, forming an 8,192-dimensional composite feature vector for each subject. This comprehensive feature set captures both emotional (insula-related) and cognitive (temporal-related) information relevant to schizophrenia classification.

Biological implications

While our model highlights the relevance of the temporal lobe and insula in schizophrenia, it is essential to bridge these findings with known biological mechanisms and clinical implications. Schizophrenia is a complex disorder associated with structural and functional abnormalities in specific brain regions, including the temporal lobe and insula. These regions are known to play critical roles in auditory processing, language comprehension, and interoceptive awareness, which are often disrupted in schizophrenia.

The temporal lobe, particularly the superior temporal gyrus (STG), has been extensively linked to schizophrenia. Abnormalities in this region are associated with auditory hallucinations and deficits in social cognition, which are hallmark symptoms of the disorder [6, 36]. Our model’s identification of the temporal lobe aligns with previous neuroimaging studies that have reported reduced gray matter volume and altered functional connectivity in this area [5, 37]. These changes may correlate with biomarkers such as decreased N-acetylaspartate (NAA) levels, a marker of neuronal integrity, observed in magnetic resonance spectroscopy (MRS) studies [38].

The insula, a region involved in emotional regulation and self-awareness, has also been implicated in schizophrenia. Dysfunction in the insula is associated with impaired emotional processing and insight, which are commonly observed in patients [39, 40]. Our findings suggest that the insula’s role in schizophrenia may be linked to its connectivity with other brain regions, such as the anterior cingulate cortex (ACC) and the prefrontal cortex (PFC) [41, 42]. These connections are critical for integrating sensory and emotional information, and their disruption may contribute to the symptomatology of schizophrenia [43]. Biomarkers such as altered glutamate levels in the insula, as reported in some studies, could provide further biological validation of our model’s predictions [44, 45].

To validate the biological interpretability of our model, we utilized Gradient-weighted Class Activation Mapping (Grad-CAM) to identify and visualize the regions of interest (ROIs) that played the most significant role in the model’s decision-making process. Furthermore, we correlated these findings with clinical outcomes by analyzing the Scale for the Assessment of Negative Symptoms (SANS) scores for three patients with schizophrenia, categorized by severity: mild-to-moderate, moderate, and severe. This approach allowed us to bridge the model’s predictions with real-world clinical manifestations of the disorder. In addition to the regions previously mentioned, several studies have investigated the correlation between negative symptoms and structural alterations in specific brain regions in patients with schizophrenia. These studies collectively highlight that anhedonia and avolition, as measured by self-rated scales, are inversely related to white matter volume in the left anterior limb of the internal capsule. Furthermore, SANS scores show significant correlations with the vertical and horizontal distances between the corpus callosum and the infrafornix, ventricular area, and structural changes in the frontal lobe and amygdala. These findings emphasize the complex relationship between negative symptoms and structural brain abnormalities in schizophrenia, providing valuable insights into the neurobiological underpinnings of the disorder [4648].

Figure 6 demonstrates the results of Gradient-weighted Class Activation Mapping (Grad-CAM) applied to our proposed model, highlighting the key brain regions that influence the model’s decision-making process. As shown in the figure, the model identifies several regions, including the superior frontal gyrus, inferior frontal gyrus, medial frontal gyrus, cingulate gyrus, superior temporal gyrus (STG), inferior temporal gyrus, amygdala, orbital gyrus, insula, thalamus, hippocampus, caudate nucleus, and the left lentiform nucleus. These regions are strongly associated with negative symptoms of schizophrenia, as captured by our proposed model.

Fig. 6.

Fig. 6

Grad-CAM highlighted the critical regions that significantly influenced the outcomes of our study

Moreover, to further investigate this relationship, we calculated the average Grad-CAM values in the ROIs for a larger sample (N = 100) and computed the Spearman correlation coefficient (ρ) between these Grad-CAM values and SANS scores. The observed Spearman correlation between higher Grad-CAM values (indicating the AI model’s focus on the hippocampus) and worse negative symptoms (higher SANS scores) suggests a moderate alignment between model attention and symptom severity. This supports existing findings that reduced hippocampal volume is linked to cognitive deficits and negative symptoms in schizophrenia. While Grad-CAM reflects model saliency rather than direct volumetric loss, the correlation underscores the clinical significance of hippocampal abnormalities.
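The correlation analysis above can be reproduced with any Spearman implementation. The minimal pure-Python sketch below (average ranks with tie handling, then Pearson correlation of the ranks) is our own illustration, not the authors' code, and omits the significance test:

```python
def rankdata(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

In practice one would pass the per-subject mean Grad-CAM value of an ROI as `x` and the corresponding SANS score as `y`; `scipy.stats.spearmanr` provides the same statistic together with a p-value.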

A weak-to-moderate negative correlation between IFG Grad-CAM values and SANS scores suggests that increased model attention to the IFG is associated with milder negative symptoms. Similarly, a significant negative correlation between SFG Grad-CAM values and SANS scores indicates that preserved SFG integrity or function, as highlighted by the model, corresponds to less severe negative symptoms. Given the SFG’s role within the dorsolateral prefrontal cortex (DLPFC), a hub for executive control, this finding aligns with prior research linking DLPFC dysfunction to avolition and social withdrawal, core negative symptoms of schizophrenia [49].

Conversely, a moderate-to-strong positive correlation between left insula Grad-CAM values and SANS scores suggests that insular abnormalities contribute to negative symptoms. Greater model attention to the left insula correlates with more severe negative symptoms, consistent with evidence that insular hyperactivity during self-referential tasks is associated with symptom severity [50].

A similar pattern is observed in the thalamus, where a moderate-to-strong positive correlation suggests that thalamic abnormalities may play a role in negative symptoms. This finding aligns with studies linking thalamic dysfunction to symptom severity in schizophrenia [51].

In contrast, a moderate negative correlation between cingulate gyrus Grad-CAM values and SANS scores suggests that preserved cingulate integrity may mitigate negative symptoms. Higher model attention to this region corresponds to lower SANS scores, supporting evidence that intact anterior cingulate cortex (ACC) function is associated with better symptom outcomes [52].

No significant correlation was found between STG Grad-CAM values and SANS scores in this cohort.

Table 2 presents Spearman correlation coefficients (ρ) between Grad-CAM-derived activation patterns in specific brain regions and schizophrenia symptom severity. All reported correlations are statistically significant, indicating robust monotonic relationships. Effect sizes are interpreted using conventional cutoffs: weak, |ρ| < 0.30; moderate, 0.30 ≤ |ρ| < 0.60; strong, |ρ| ≥ 0.60.
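The effect-size labelling used for Table 2 can be expressed as a small helper. The cutoffs of 0.30 and 0.60 are the conventional values assumed here, since the exact thresholds were not preserved in the extracted text:

```python
def effect_size_label(rho, weak=0.30, strong=0.60):
    """Label the magnitude of a Spearman coefficient.

    Assumed convention: |rho| < 0.30 weak, 0.30 <= |rho| < 0.60 moderate,
    |rho| >= 0.60 strong.
    """
    magnitude = abs(rho)
    if magnitude < weak:
        return "weak"
    if magnitude < strong:
        return "moderate"
    return "strong"
```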

Table 2.

Spearman correlation between Grad-CAM activation and negative symptoms and effect size synthesis

Spearman Correlation Coefficients by Brain Region
Region Flattening Alogia Avolition Anhedonia Inattentiveness
Hippocampus 0.69 0.74 0.67 0.73 0.30
Superior Frontal −0.35 −0.26 0.00 −0.38 −0.09
Inferior Frontal 0.00 −0.19 −0.23 −0.38 −0.47
Left Insular 0.30 0.40 0.67 0.20 −0.43
Thalamus 0.30 0.43 0.32 0.34 0.16
Cingulate Cortex −0.61 −0.28 −0.32 −0.42 −0.07
Effect Size Synthesis
Symptom Strongest Positive Correlation Strongest Negative Correlation
Flattening Hippocampus (ρ = 0.69) Cingulate Cortex (ρ = −0.61)
Alogia Hippocampus (ρ = 0.74) Cingulate Cortex (ρ = −0.28)
Anhedonia Hippocampus (ρ = 0.73) Cingulate Cortex (ρ = −0.42)
Inattentiveness Hippocampus (ρ = 0.30) Inferior Frontal (ρ = −0.47)

The hippocampus showed strong positive correlations with several core negative symptoms, including alogia (ρ = 0.74), anhedonia (ρ = 0.73), flattening (ρ = 0.69), and avolition (ρ = 0.67). Inattentiveness showed a weaker association (ρ = 0.30). These findings suggest that increased hippocampal activation, as captured by Grad-CAM, is closely linked to the severity of negative symptoms in schizophrenia. This aligns with the hippocampus’s known role in emotional regulation and memory, functions often impaired in the disorder.

The superior frontal gyrus displayed moderate negative correlations with flattening (ρ = −0.35) and anhedonia (ρ = −0.38), and weaker negative correlations with alogia (ρ = −0.26) and inattentiveness (ρ = −0.09). Avolition was not correlated (ρ = 0.00). These results indicate that higher activation in this executive control region may buffer against affective flattening and anhedonia. The lack of correlation with avolition suggests distinct neural pathways for motivational impairments.

The inferior frontal gyrus showed a moderate negative correlation with inattentiveness (ρ = −0.47), and weak negative correlations with anhedonia (ρ = −0.38), avolition (ρ = −0.23), and alogia (ρ = −0.19). No correlation was observed with flattening (ρ = 0.00). This suggests that increased activity in this region may be linked to better attentional control and, to a lesser extent, to improvements in speech and motivation, potentially reflecting its role in language and inhibition.

The left insular cortex was strongly correlated with avolition (ρ = 0.67) and moderately correlated with alogia (ρ = 0.40). Inattentiveness showed a moderate negative association (ρ = −0.43), while flattening (ρ = 0.30) and anhedonia (ρ = 0.20) had weaker positive correlations. This pattern reflects the insula’s dual role in interoception and cognitive regulation: exacerbating avolition and alogia while potentially reducing inattentiveness.

The thalamus showed a moderate positive correlation with alogia (ρ = 0.43), and weak positive correlations with flattening (ρ = 0.30), avolition (ρ = 0.32), anhedonia (ρ = 0.34), and inattentiveness (ρ = 0.16). These findings highlight the thalamus’s diffuse but consistent contribution, possibly reflecting its role as a sensory relay center involved in filtering and integrating information.

The cingulate cortex exhibited a strong negative correlation with flattening (ρ = −0.61), a moderate negative correlation with anhedonia (ρ = −0.42), and weaker negative associations with avolition (ρ = −0.32) and alogia (ρ = −0.28). Inattentiveness showed minimal correlation (ρ = −0.07). This suggests that greater activation in this region may protect against affective blunting and anhedonia, supporting its role in emotional regulation and conflict monitoring.

Our analysis demonstrates that the hippocampus consistently shows the largest effect sizes for negative symptoms in schizophrenia (ρ > 0.60 for 4 of 5 symptoms), confirming its central role in the pathophysiology of the disorder. In contrast, the cingulate cortex exhibits the most protective effects, with the strongest negative correlations observed for symptoms such as flattening and anhedonia. Frontal regions, including the superior and inferior frontal gyri, correlate with symptom reduction, supporting their involvement in top-down regulatory processes. Interestingly, the left insula shows a symptom-specific paradox: high activation in this region worsens avolition but simultaneously improves attention.

The strong positive correlations observed in the hippocampus suggest that hippocampal hyperactivation drives negative symptoms through disrupted emotional memory and contextual processing. The frontal-insufficiency hypothesis is supported by the negative correlations found in the frontal gyri, reflecting impaired executive control in schizophrenia. Moreover, the cingulate cortex appears to act as a compensatory hub, as indicated by its negative correlations, making it a promising target for neuromodulation therapies. Symptom-specific neural networks are also evident; avolition is linked to insular activation, whereas inattentiveness relates more strongly to frontal regions, advocating for symptom-focused treatment strategies.

In summary, this analysis reveals region and symptom-specific neural patterns in schizophrenia. The hippocampus and insula are primarily associated with symptom aggravation (strong positive correlations), whereas the cingulate and frontal cortices appear to mitigate symptom severity (negative correlations). Effect sizes underscore the hippocampus as a primary neural substrate for negative symptoms, while frontal and cingulate regions offer compensatory potential. These insights could guide targeted interventions, such as neurostimulation of the cingulate cortex to reduce flattening.

Our model consistently highlights the temporal lobe and insula, which aligns with prior biological evidence in schizophrenia. However, we emphasize that re-examining these regions through explainable AI methods such as Grad-CAM provides unique added value. To date, no studies have specifically performed segmentation or knowledge localization of the temporal lobe and insula in the context of schizophrenia using artificial intelligence. While previous research has acknowledged the involvement of these areas, efforts to localize and quantify their individual-level contributions using data-driven approaches remain scarce [53].

Through the use of Grad-CAM, we revisited these regions from a novel, interpretable perspective. The visualization of model attention revealed a consistent focus on the temporal lobe and insula—further supporting the biological plausibility of our findings. More importantly, Grad-CAM enabled us to link these attentional patterns to symptom-specific severity scores, contextualizing their relevance within the clinical heterogeneity of schizophrenia. This fusion of model saliency and symptom-based profiling represents a significant step forward, moving beyond traditional hypothesis-driven or volumetric group comparisons and toward individualized insight into neurobiological mechanisms.

Nonetheless, we recognize current limitations, particularly the limited availability of confounding variables in our dataset. While statistically significant, the correlations between Grad-CAM values and SANS scores are primarily exploratory and should be interpreted as preliminary evidence rather than direct clinical biomarkers. Future studies should aim to validate these findings using larger and more diverse cohorts, incorporating richer clinical and demographic information. Such efforts will not only strengthen the robustness and generalizability of our approach but also pave the way for greater clinical applicability in the personalized diagnosis and management of schizophrenia.

Results

This study focuses on accurately diagnosing schizophrenia using 2D brain MRI images, emphasizing the insula and temporal lobe as critical diagnostic regions. The MRI data were sourced from publicly available datasets, including COBRE, MCICShare, and UCLA [23]. Table 1 summarizes the statistical characteristics of the COBRE and MCICShare datasets and the distribution of control and schizophrenia subjects across these datasets. An external dataset, sourced from the dataset created by Soler-Vidal et al. [26], was used to test the reliability of our model.

Image preprocessing steps included denoising with a Non-Local Means (NLM) filter, normalization, and skull stripping to reduce noise and intensity variation from different scanners and to isolate brain tissue. Figure 2 shows the images before and after the denoising process, and Figure 3 presents the final image following the preprocessing techniques. The semantic segmentation network employed a pre-trained DeepLabv3+ architecture based on ResNet50. ResNet-50 was chosen as the network backbone based on a comparative analysis of four pre-trained networks: ResNet-18, ResNet-50, Xception, and MobileNetv2. The results of this analysis, presented in Table 3, demonstrate that ResNet-50 outperformed the other networks in segmenting our regions of interest. Training used 130 2D brain MRI images from the control group within the UCLA dataset, with the regions of interest (ROI), including the insula and temporal lobe, manually labeled. The network was trained for 200 iterations using MATLAB R2022b [54]. Table 3 presents the segmentation accuracy, Dice coefficient, and Jaccard index for the images, evaluated using 5-fold cross-validation. Additionally, Fig. 4(b) illustrates the regions of interest predicted by the network.
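The Dice and Jaccard indices reported in Table 3 follow the standard overlap definitions. Below is a minimal NumPy sketch for two binary masks; it is an illustration, not the authors' MATLAB evaluation code:

```python
import numpy as np

def dice_jaccard(pred, target):
    """Overlap between a predicted and a reference binary mask.

    Dice = 2|A∩B| / (|A| + |B|); Jaccard = |A∩B| / |A∪B|.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum())
    jaccard = inter / union
    return dice, jaccard
```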

Table 3.

a) Semantic segmentation prediction accuracy using pre-trained networks; b) validation metrics of predicted regions using ResNet-50 for preprocessed and unpreprocessed images; c) performance of semantic segmentation evaluated by Dice and Jaccard indices using 5-fold cross-validation

a) Network Insula Accuracy Temporal Lobe Accuracy
ResNet-18 94% 83%
ResNet-50 96% 97%
Xception 75% 70%
MobileNetv2 73% 60%
b) Class Preprocessed Accuracy Unpreprocessed Accuracy
Insula 96 ± 0.12% 80 ± 0.21%
Temporal lobe 97 ± 0.15% 85 ± 0.06%
c) Class (5-fold cross-validation) Dice(%) Jaccard(%)
Insula Inline graphic Inline graphic
Temporal lobe Inline graphic Inline graphic

After pooling, the dataset, consisting of 249 schizophrenia (SZ) subjects and 319 healthy controls (HC), was randomly divided at the subject level into 70% for training (398 subjects), 15% for validation (85 subjects), and 15% for testing (85 subjects). The classification procedure was conducted using MATLAB, and to ensure the reliability and reproducibility of the results, the entire training and evaluation process was repeated 40 independent times with different random initializations and data shuffles.
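The subject-level 70/15/15 partition can be sketched as follows (a Python illustration of the splitting logic; the actual pipeline was implemented in MATLAB). Assigning whole subjects, rather than individual slices, to partitions is what prevents a subject's augmented slices from leaking across sets:

```python
import numpy as np

def subject_level_split(subject_ids, train=0.70, val=0.15, seed=0):
    """Shuffle unique subject IDs and split 70/15/15 so that all slices
    from one subject land in exactly one partition."""
    rng = np.random.default_rng(seed)
    subjects = np.array(sorted(set(subject_ids)))
    rng.shuffle(subjects)
    n = len(subjects)
    n_train = int(round(train * n))
    n_val = int(round(val * n))
    return (set(subjects[:n_train]),
            set(subjects[n_train:n_train + n_val]),
            set(subjects[n_train + n_val:]))
```

With the 568 pooled subjects this yields the 398/85/85 partition reported above.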

The averaged results across the 40 runs are summarized in Table 4, which reports accuracy, sensitivity, specificity, precision, recall, F1-score, and the area under the ROC curve (AUC) with standard deviations. In this table, item (a) presents the model performance using tuned hyperparameters, (b) shows the results from a K-fold cross-validation setup with K = 10 (representing the number of folds used for validation), (c) corresponds to the performance achieved on the external validation dataset, (d) represents the performance on the training set, and (e) denotes the performance on the validation set. This structured comparison demonstrates the model’s stability across multiple experimental settings.
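The metrics in Table 4 all derive from the four confusion-matrix counts. The small reference implementation below (fractions rather than percentages) is our own illustration, with hypothetical counts used only for checking:

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return dict(accuracy=accuracy, sensitivity=sensitivity,
                specificity=specificity, precision=precision, f1=f1)
```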

Table 4.

Model performance in the current study for schizophrenia prediction

Accuracy (95% CI) Sensitivity (95% CI) Specificity (95% CI) Precision Recall F1-Score AUC (95% CI)
a) 98.02 ± 1.62 [96.4–99.5] 98.95 ± 1.82 [97.1–99.8] 97.06 ± 2.06 [95.0–98.8] 97.65 ± 2.04 98.95 ± 1.82 98.20 ± 1.69 0.9917 ± 0.01 [0.987–0.996]
b) 99.01 ± 0.14 [98.8–99.3] 98.00 ± 0.17 [97.7–98.3] 97.08 ± 0.22 [96.7–97.4] 97.34 ± 0.14 98.00 ± 0.17 97.66 ± 0.14 0.9891 ± 0.001 [0.987–0.991]
c) 96.20 ± 0.20 [95.8–96.6] 97.60 ± 0.15 [97.3–97.9] 98.30 ± 0.30 [97.8–98.8] 97.54 ± 0.20 97.60 ± 0.15 97.56 ± 0.11 0.980 ± 0.12 [0.972–0.988]
d) 100.0 ± 0.10 [99.8–100] 98.8 ± 0.15 [98.5–99.2] 99.1 ± 0.13 [98.8–99.4] 99.0 ± 0.14 98.8 ± 0.15 98.9 ± 0.13 0.994 ± 0.004 [0.986–0.999]
e) 98.2 ± 0.03 [97.9–98.5] 97.9 ± 0.04 [97.5–98.3] 98.4 ± 0.03 [98.1–98.7] 98.1 ± 0.04 97.9 ± 0.04 98.0 ± 0.03 0.986 ± 0.006 [0.975–0.993]

Our proposed model achieved strong and consistent performance, with an average accuracy of 98.02% ± 1.62, sensitivity of 98.95% ± 1.82, specificity of 97.06% ± 2.06, and an AUC of 0.9917 ± 0.01, confirming its robustness in distinguishing SZ from HC subjects. Figure 7 illustrates the receiver operating characteristic (ROC) curves for both the internal test set (a) and the external validation set (b). The proposed model achieved excellent discriminative performance, with an AUC of 0.99 for the internal dataset and 0.98 for the external dataset. The near-perfect AUC value in the internal evaluation indicates the model’s strong ability to distinguish schizophrenia (SZ) patients from healthy controls (HC), reflecting the effectiveness of the region-specific design focusing on the temporal lobe and insula. Moreover, the high AUC obtained on the independent external dataset confirms the model’s generalization capability and robustness against scanner- or site-related variations. These results collectively demonstrate that the proposed framework achieves high diagnostic accuracy and generalizability.

Fig. 7.

Fig. 7

ROC curves for schizophrenia prediction: a) internal test set and b) external validation set

To ensure methodological rigor and eliminate the possibility of data leakage, we conducted all data partitioning strictly at the subject level. In this process, each subject’s MRI slices and all corresponding augmented samples were grouped together and assigned exclusively to a single subset (training, validation, or testing). This ensured that data from any individual subject did not appear in more than one partition. Furthermore, we applied a stratified random split design to maintain a balanced ratio of schizophrenia (SZ) and healthy control (HC) subjects across all subsets and to preserve a proportional site-wise distribution.

After applying the subject-level stratified random split strategy, the model was retrained and evaluated under the same protocol, yielding realistic and generalizable performance: accuracy of 97.8% ± 1.6, sensitivity of 98.6% ± 1.8, specificity of 96.9% ± 2.0, and AUC of 0.987 ± 0.01.

Since the accuracy criterion alone cannot differentiate between false negatives and false positives, we also calculated precision and recall. In addition, to assess the suitability of our proposal, we computed the F1-score; the results demonstrate the effectiveness of the proposed AlexNet-based model for schizophrenia diagnosis. The calculated F1-score is shown in Table 4. Furthermore, the results of our approach for each case introduced in the dataset section, including “Schizophrenia (broad)”, “Schizophrenia (strict)”, and “Schizoaffective”, have been calculated and are presented in Table 5. Each of these experiments was run 40 times, and the accuracy, sensitivity, and specificity are reported as mean ± SD.

Table 5.

Performance of the model for each diagnostic subgroup

Detected case ACC (95% CI) SEN (95% CI) SP (95% CI) Precision (95% CI) Recall (95% CI) F1 (95% CI) AUC (95% CI)
Schizophrenia (broad) 97.09 ± 0.019 [96.7–97.5] 97.27 ± 0.03 [96.8–97.7] 96.75 ± 0.04 [96.2–97.3] 97.48 ± 0.03 [97.0–97.9] 97.27 ± 0.03 [96.8–97.7] 97.37 ± 0.02 [97.0–97.7] 0.984 ± 0.006 [0.973–0.992]
Schizophrenia (strict) 98.05 ± 0.02 [97.7–98.4] 96.57 ± 0.03 [96.1–97.0] 99.69 ± 0.01 [99.5–99.8] 98.11 ± 0.02 [97.7–98.4] 96.57 ± 0.03 [96.1–97.0] 97.33 ± 0.02 [96.9–97.7] 0.989 ± 0.005 [0.981–0.995]
Schizoaffective 98.23 ± 0.018 [97.9–98.5] 97.07 ± 0.03 [96.6–97.5] 99.73 ± 0.009 [99.6–99.8] 98.24 ± 0.02 [97.9–98.5] 97.07 ± 0.03 [96.6–97.5] 97.65 ± 0.02 [97.3–98.0] 0.986 ± 0.007 [0.974–0.994]

The findings of this study were obtained using the best-performing hyperparameters, identified through an experimental setup that compared classification performance across various configurations. Learning rates tested ranged from 0.0001 to 0.01. Three methods were evaluated for weight initialization: Glorot, He, and narrow-normal. The narrow-normal initializer samples weights independently from a normal distribution with a mean of zero and a standard deviation of 0.01, which is crucial for effective network training. Similarly, three bias initializers were compared: zeros, ones, and narrow-normal. The epoch and batch size were predefined and fixed for this experiment. Both Stochastic Gradient Descent with Momentum (SGDM) and Adam optimizers were tested for optimization. This setup resulted in 342 different hyperparameter combinations, all of which were trained and evaluated. The configuration with the highest validation accuracy was selected for the study; these results are summarized in Table 6, along with the chosen hyperparameters.
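The search space described above can be enumerated programmatically. In the sketch below, the learning-rate grid is an assumption pieced together from the values visible in Table 6, chosen so that 19 rates × 3 weight initializers × 3 bias initializers × 2 optimizers gives the 342 reported trials; the training call itself is only indicated as a comment:

```python
from itertools import product

# Assumed learning-rate grid spanning 0.0001-0.01 (19 values).
learn_rates = [0.0001, 0.0002, 0.0003, 0.0004, 0.0005, 0.0006, 0.0007,
               0.0008, 0.0009, 0.001, 0.002, 0.003, 0.004, 0.005,
               0.006, 0.007, 0.008, 0.009, 0.01]
weight_inits = ["glorot", "he", "narrow-normal"]
bias_inits = ["zeros", "ones", "narrow-normal"]
optimizers = ["sgdm", "adam"]

# Cartesian product of all configurations: 19 * 3 * 3 * 2 = 342 trials.
trials = list(product(learn_rates, weight_inits, bias_inits, optimizers))

# for lr, w_init, b_init, opt in trials:
#     val_acc = train_and_validate(lr, w_init, b_init, opt)  # hypothetical call
#     ...keep the configuration with the highest validation accuracy
```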

Table 6.

Tuning hyperparameters results

Trial Hyperparameters Metrics
LearnRate WeightsInitializer biasInitializer Optimizer Tr. ACC Tr. Loss Valid. ACC Valid. Loss
1 0.0001 he zeros sgdm 92.18 0.19 95.1 0.15
2 0.0002 he zeros sgdm 95.31 0.10 94.46 0.11
3 0.0003 he zeros sgdm 97.65 0.0775 96.09 0.11
4 0.0004 he zeros sgdm 96.87 0.074 96.94 0.10
5 0.0005 he zeros sgdm 98.43 0.051 96.74 0.09
20 0.0001 glorot zeros sgdm 89.66 0.21 94.46 0.15
21 0.0002 glorot zeros sgdm 94.53 0.13 95.11 0.11
22 0.0003 glorot zeros sgdm 97.65 0.10 97.6 0.11
40 0.0002 narrow-normal zeros sgdm 95.31 0.12 96.41 0.1
60 0.0003 he ones sgdm 96.09 0.09 95.43 0.11
70 0.004 he ones sgdm 92.18 0.08 95.82 0.2
106 0.002 narrow-normal ones sgdm 100 0.0069 96.41 0.14
123 0.0009 he narrow-normal sgdm 99.2 0.02 98.1 0.09
172 0.0001 he zeros adam 100 0.00001 98.24 0.02
175 0.0004 he zeros adam 95.31 0.088 94.13 0.2
201 0.002 glorot zeros sgdm 55.46 0.68 50.81 0.7
267 0.0001 narrow-normal ones sgdm 97.65 0.05 96.09 0.1
300 0.006 he narrow-normal adam 98.43 0.44 95.11 0.11
310 0.0006 glorot narrow-normal adam 98.43 0.03 96.74 0.1
320 0.007 glorot narrow-normal adam 97.65 0.05 96.09 0.1
342 0.01 narrow-normal narrow-normal adam 96.09 0.09 95.43 0.11
Epoch Batch-Size LearnRate WeightsInitializer BiasInitializer Optimizer
20 64 0.0001 he zeros adam

Additionally, K-fold cross-validation was conducted to assess the robustness of the model with the tuned hyperparameters, with a value of K = 10 chosen for this study. The accuracy, sensitivity, and specificity metrics, reported as means and standard deviations, are presented in Table 4. The high consistency of these metrics across the K-fold cross-validation demonstrates the effectiveness and reliability of the selected hyperparameters for this study. When tested on an external dataset, our model continued to perform admirably, exhibiting high accuracy, sensitivity, specificity, precision, recall, and F1-score. Although there was a slight drop in accuracy and AUC compared to the tuned hyperparameter testing, the model retained robust performance and demonstrated strong generalization ability to unseen data. The higher specificity and stable recall indicate that the model remains effective at identifying true positives and avoiding false positives in an external setting.

To further evaluate the generalizability of the proposed model across independent acquisition sites and scanners, we performed a Leave-One-Site-Out (LOSO) cross-validation. In this analysis, data from one site were held out entirely for testing, while the model was trained and validated using the remaining two datasets. This procedure was repeated three times so that each site (COBRE, MCICShare, UCLA) served once as the unseen test domain. Importantly, LOSO evaluation was conducted using subject-level partitioning, ensuring that all slices from each subject were confined to a single site-specific fold to prevent inter-site or intra-subject data leakage.
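The LOSO protocol reduces to a simple fold generator over the three sites. The sketch below illustrates only the fold structure; model training and the subject-level slice grouping are not shown:

```python
def leave_one_site_out(sites):
    """Yield (training_sites, held_out_site) pairs: each site serves as
    the unseen test domain exactly once."""
    for held_out in sites:
        training = [s for s in sites if s != held_out]
        yield training, held_out

folds = list(leave_one_site_out(["COBRE", "MCICShare", "UCLA"]))
```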

As shown in Table 7, the LOSO experiments demonstrated consistent and high classification performance across all sites, with AUC values ranging between 0.962 and 0.983, indicating strong cross-site robustness. Notably, when trained on COBRE and MCICShare and tested on UCLA data, the model achieved an accuracy of 96.5% and AUC of 0.98, confirming the ability of the framework to generalize effectively to previously unseen scanner protocols and demographic distributions. Similarly, testing on COBRE and MCICShare yielded comparable accuracies of 95.3% and 94.8%, respectively. These results further support the stability of our approach and its resilience to inter-dataset variability, scanner differences, and acquisition heterogeneity.

Table 7.

Leave-one-site-out (LOSO) evaluation results demonstrating cross-site generalization performance

Training Sites Test Site Accuracy (%) Sensitivity (%) Specificity (%) Precision (%) F1-Score (%) AUC
COBRE + MCICShare UCLA Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
MCICShare + UCLA COBRE Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
COBRE + UCLA MCICShare Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Average Inline graphic SD Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic

To evaluate the effect of the color-to-grayscale enhancement, we performed an additional experiment in which the classification pipeline was executed without applying the enhancement step. In this baseline setting, the model achieved an average accuracy of 93.4 ± 1.9%, sensitivity of 94.2 ± 1.8%, specificity of 92.5 ± 2.0%, and an AUC of 0.961 ± 0.010 across 40 independent runs. These results indicate that, although the model maintained acceptable performance, the absence of the enhancement step reduced its ability to capture subtle contrast variations and fine-grained structural features relevant to schizophrenia classification.

While deep learning models are typically designed to work with raw, unprocessed images, this study utilized various preprocessing steps to enhance model performance. These steps contributed to achieving higher accuracy, particularly during the segmentation process. Peak Signal-to-Noise Ratio (PSNR) and Signal-to-Noise Ratio (SNR) analyses were performed to evaluate the impact of preprocessing on image quality. For denoised images, the PSNR and SNR values were 37.11 ± 1.18 and 24.3 ± 1.35, respectively, while for noisy images, the values were 10.047 ± 0.15 and 4.2 ± 0.75. The preprocessing steps implemented in this study significantly improved image quality. We conducted a comparative analysis of segmented unprocessed and preprocessed images to validate these improvements. The results in Table 3 demonstrate that preprocessing notably increased segmentation accuracy. Specifically, we achieved 96% and 97% accuracy for the insula and temporal lobe, respectively, compared to just 80% and 85% for the unprocessed images. Additionally, a comparison with state-of-the-art methods that utilize EEG and neuroimaging data, summarized in Table 8, confirms the superiority of our approach in terms of diagnostic accuracy and robustness. These findings underscore the potential of the model for reliable detection of schizophrenia and its applicability across a variety of datasets. Our proposed region-specific deep learning framework demonstrates promising potential for improving schizophrenia diagnosis through precise segmentation and classification of key brain regions. However, these findings should be interpreted as preliminary. Further multicenter, prospective studies involving larger and more diverse populations are required to validate the generalizability and clinical applicability of this approach.
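The PSNR and SNR figures quoted above follow the usual decibel definitions. Below is a NumPy sketch of both; it is our illustration, and the `data_range` of 255 is an assumption for 8-bit images:

```python
import numpy as np

def psnr(reference, test, data_range=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(test, float)) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def snr(reference, test):
    """Signal-to-noise ratio in dB: signal power over error power."""
    ref = np.asarray(reference, float)
    noise = ref - np.asarray(test, float)
    return 10.0 * np.log10(np.mean(ref ** 2) / np.mean(noise ** 2))
```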

Table 8.

Previous studies

Study Modalities Features Classifier Performance
Rajesh and Sunil Kumar (2024) [18] EEG - LogitBoost classifier ACC: 91.66%
Kanyal (2024) [16] MRI/fMRI MRI/fMRI and SNP DL ACC: 79.01%
Srinivasan et al. (2024) [19] Multi-channel EEG - CNN and LSTM ACC: 98.2%
Agarwal and Singhal (2023) [10] EEG - SVM, KNN, BT, and DT ACC: 99.25%
Zhang et al. (2023) [55] sMRI - 3D CNN AUC: 98.7%
Khare and Bajaj (2022) [56] EEG - Optimized extreme learning machine classifier ACC: 92.93%
De Rosa et al. (2022) [57] Post-mortem brain HP and DLPFC RF AUC: 95%
Febles et al. (2022) [58] EEG - Multi-kernel SVM ACC: 83%
Lei et al. (2022) [13] sMRI/fMRI GM and WM SVM ACC: 90.83%
Algumaei et al. (2022) [59] fMRI HP, TL and FL SVM ACC: 98.57%
Tanveer et al. (2022) [14] sMRI GM and WM SVM and RF ACC: 80.71%
Luján et al. (2022) [60] EEG - SVM, Bayesian LDA, Gaussian NB, KNN, AdaBoost, and RBF ACC: 93.40%
Zandbagle et al. (2022) [61] EEG - KNN, LDA and SVM ACC: 89.21%
Aksöz et al. (2022) [11] EEG - KNN, ANN and SVM ACC: 93.9%
Lin et al. (2021) [17] fMRI/DTI Whole brain Multi-kernel SVM ACC: 95.33%
Shi et al. (2021) [62] sMRI/fMRI GM LDA ACC: 93.75%
Azizi et al. (2021) [12] EEG - LR ACC: 97%
Du et al. (2020) [63] EEG - Non-linear dynamics and functional brain network ACC: 76.77%
Vieira et al. (2020) [64] sMRI GMV/cortical thickness SVM, KNN, LR, and DNN ACC: 70%
Yang et al. (2020) [65] fMRI Whole brain SVM ACC: 99.46%
Yassin et al. (2020) [15] sMRI Cortical thickness, surface area and subcortical volume SVM, RF, LG, AB, DT, and KNN ACC: 75%

Experimental results

An experiment was conducted to evaluate the proposed method against other state-of-the-art classification techniques, aiming to demonstrate its strong potential for accurately diagnosing schizophrenia. The proposed method was compared with several advanced approaches, including CNN architectures integrated with attention mechanisms such as CBAM, transformer-based methods utilizing Vision Transformers (ViT), and hybrid CNN-RNN models. The metrics employed in this comparison include training accuracy, validation accuracy, and the size of the network. These metrics were used to assess and contrast the performance of the different methods. The details of each model are described below:

ResNet50-CBAM

ResNet50-CBAM is an enhanced version of the ResNet50 architecture that integrates the Convolutional Block Attention Module (CBAM). ResNet50, a 50-layer deep convolutional neural network, is renowned for its residual learning framework, which mitigates the vanishing gradient problem and enables the training of very deep networks. CBAM improves feature representation by sequentially applying channel and spatial attention: the channel attention mechanism focuses on "what" is important in the feature map, while the spatial attention mechanism identifies "where" the important regions are located. The Channel Attention module aggregates spatial information via global average pooling and max pooling, processes the pooled descriptors with a shared multi-layer perceptron (MLP) to produce channel-wise attention weights, and applies these weights to the input feature map to accentuate the most significant channels. The Spatial Attention module concatenates the outputs of average pooling and max pooling along the channel dimension, applies a convolution to generate a spatial attention map, and applies that map to the feature map to emphasize relevant spatial regions. Figure 8 illustrates the architecture of the channel attention module, the spatial attention module, and the overall CBAM structure. In our study, CBAM is integrated after the convolutional layers within each residual block of ResNet50, allowing the model to adaptively refine feature maps at multiple levels and better capture the discriminative features needed for accurate schizophrenia classification.
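The channel- and spatial-attention computations described above can be sketched in NumPy as follows. This is an illustrative toy implementation with random weights, not the trained CBAM used in the study; the tensor shapes, reduction ratio, and 7×7 kernel size are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """x: (C, H, W). Global avg- and max-pooled descriptors pass through a
    shared two-layer MLP (w1, w2); the summed outputs give one weight per channel."""
    avg = x.mean(axis=(1, 2))                    # (C,)
    mx = x.max(axis=(1, 2))                      # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0)   # ReLU in the hidden layer
    weights = sigmoid(mlp(avg) + mlp(mx))        # (C,)
    return x * weights[:, None, None]

def spatial_attention(x, kernel):
    """x: (C, H, W). Avg- and max-pool along the channel axis, stack the two
    maps, and convolve with a (2, k, k) kernel to get one attention map."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    H, W = x.shape[1:]
    att = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            att[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return x * sigmoid(att)[None]

def cbam(x, w1, w2, kernel):
    # Channel attention first, then spatial attention, as in CBAM
    return spatial_attention(channel_attention(x, w1, w2), kernel)

rng = np.random.default_rng(1)
C, H, W = 8, 16, 16
x = rng.normal(size=(C, H, W))
w1 = rng.normal(scale=0.1, size=(C // 2, C))   # reduction ratio 2 (assumed)
w2 = rng.normal(scale=0.1, size=(C, C // 2))
kernel = rng.normal(scale=0.1, size=(2, 7, 7))
y = cbam(x, w1, w2, kernel)
print(y.shape)
```

In the actual model these operations run inside each residual block on learned weights; the sketch only shows how the two attention maps reweight a feature tensor.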

Fig. 8. Architecture of the channel attention and spatial attention modules and CBAM within a ResBlock

Vision transformers (ViT)

The Vision Transformer (ViT) is a revolutionary architecture that adapts the Transformer model, originally developed for natural language processing (NLP), to computer vision tasks. Unlike traditional convolutional neural networks (CNNs), which rely on convolutional layers to extract hierarchical features, ViT utilizes the self-attention mechanism to process images, allowing it to capture global relationships between image patches. The ViT architecture consists of two main components: the backbone and the head. The backbone processes input images and generates a vector of features, while the head is responsible for making predictions by mapping the encoded feature vectors to prediction scores. In our research, we fine-tuned the ViT model by replacing the original head with a new classification head tailored to the specific requirements of our study, enabling the model to effectively classify data for our task. Figure 9 provides a visual representation of the Vision Transformer (ViT) architecture.
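The patch-tokenization step and the replaced classification head can be illustrated as follows. This is a minimal NumPy sketch; the 16-pixel patch size, two-class head, and random weight shapes are illustrative assumptions, not the fine-tuned model's actual configuration.

```python
import numpy as np

def image_to_patches(img, patch):
    """Split an (H, W, C) image into flattened non-overlapping patch tokens,
    as ViT does before linear embedding. Returns (num_patches, patch*patch*C)."""
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0
    rows, cols = H // patch, W // patch
    return (img.reshape(rows, patch, cols, patch, C)
               .transpose(0, 2, 1, 3, 4)
               .reshape(rows * cols, patch * patch * C))

def linear_head(features, w, b):
    """A replacement classification head: one linear layer mapping an encoded
    feature vector to class scores (here 2 classes, e.g. SZ vs. HC)."""
    return features @ w + b

img = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
tokens = image_to_patches(img, patch=16)
print(tokens.shape)   # 14x14 = 196 patches, each 16*16*3 = 768 values

rng = np.random.default_rng(0)
w = rng.normal(scale=0.01, size=(768, 2))
b = np.zeros(2)
scores = linear_head(tokens.mean(axis=0), w, b)   # stand-in for the encoder output
print(scores.shape)
```

Fine-tuning for a new task, as described above, keeps the pretrained backbone and trains only (or mainly) such a new head on the target labels.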

Fig. 9. Architecture of ViT

ResNet50-GRU

The ResNet50-GRU model is a powerful hybrid architecture that combines the strengths of ResNet50, a deep convolutional neural network (CNN), and Gated Recurrent Units (GRUs), a type of recurrent neural network (RNN). This integration leverages the spatial feature extraction capabilities of CNNs with the temporal modeling strengths of RNNs, making it particularly effective for tasks that involve both spatial and sequential data, such as video analysis, time-series classification, or medical image sequences. GRUs are a variant of RNNs designed to handle sequential data more efficiently than traditional RNNs. They incorporate gating mechanisms to control the flow of information, allowing the model to capture long-term dependencies in sequential data. In the ResNet50-GRU model, the ResNet50 backbone processes each input frame or image independently, extracting high-dimensional feature vectors that encapsulate spatial information. These feature vectors, representing the spatial characteristics of each frame, are then sequentially fed into the GRU network. The GRU models the temporal relationships between frames or time steps by updating its hidden state at each step, enabling it to capture patterns and dependencies over time. The final hidden state of the GRU, which integrates both spatial and temporal information, is passed to a fully connected layer or classification head to produce the final output, such as class probabilities or regression values. This combination of spatial and temporal modeling makes the ResNet50-GRU model highly effective for tasks involving sequential data. Figure 10 shows the architecture of ResNet50-GRU model.
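The GRU gating update described above can be written out explicitly. This is a toy NumPy sketch with random weights and no biases; the feature and hidden sizes are illustrative, not those of the compared ResNet50-GRU model (whose input features would come from the CNN backbone).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h, x, params):
    """One GRU update. h: (H,) hidden state; x: (D,) input feature vector.
    z: update gate, r: reset gate, n: candidate state."""
    Wz, Uz, Wr, Ur, Wn, Un = params
    z = sigmoid(Wz @ x + Uz @ h)        # how much of the old state to keep
    r = sigmoid(Wr @ x + Ur @ h)        # how much of the old state feeds the candidate
    n = np.tanh(Wn @ x + Un @ (r * h))  # candidate state
    return (1 - z) * n + z * h

rng = np.random.default_rng(0)
D, H = 32, 16   # toy sizes; a ResNet50 fc feature would give a much larger D
params = tuple(rng.normal(scale=0.1, size=s)
               for s in [(H, D), (H, H)] * 3)

h = np.zeros(H)
sequence = rng.normal(size=(10, D))   # e.g. CNN features from 10 frames/slices
for x in sequence:
    h = gru_step(h, x, sequence_params := params)
print(h.shape)
```

The final hidden state `h`, which accumulates information across the sequence, is what a classification head would consume, mirroring the pipeline in the paragraph above.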

Fig. 10. Architecture of the ResNet50-GRU model

The results presented in Table 9 not only compare the performance of different models in terms of training accuracy (Tr. Acc) and validation accuracy (Val. Acc) but also emphasize the critical aspects of computational cost and deployability across various systems. This consideration is particularly crucial in real-world scenarios where hardware resources are limited, and models must run efficiently on a wide range of devices, from high-performance servers to edge devices like mobile phones and IoT systems.

Table 9. Results of the experimental study

| Approach | Tr. Acc (%) | Val. Acc (%) | Size |
|---|---|---|---|
| ResNet50-CBAM | 96.2 ± 0.09 | 95.7 ± 0.3 | 200 MB |
| ViT | 91.02 ± 0.12 | 90.1 ± 0.02 | 670 MB |
| ResNet50-GRU | 97.9 ± 0.06 | 97 ± 0.04 | 104 MB |
| Proposed method | 100 ± 0.1 | 98.2 ± 0.03 | 95 MB |

The proposed method achieves the highest performance, with a training accuracy of 100% and a validation accuracy of 98.2%, outperforming all other models. Despite its superior accuracy, it maintains a remarkably compact size of just 95MB, making it the smallest model in this comparison. This reduced size translates into lower memory requirements and minimal bandwidth consumption for model transmission, making it ideal for deployment on edge devices and resource-constrained systems. Additionally, the lightweight nature of the model significantly reduces computational costs during inference, enabling smooth execution even on less powerful hardware.

In contrast, ResNet50-GRU delivers strong performance with a training accuracy of 97.9% and a validation accuracy of 97%, but at a slightly larger size of 104MB. While still relatively compact, its increased size compared to the proposed method may impose additional storage and computational demands, though it remains suitable for a broad range of devices.

ResNet50-CBAM, with a training accuracy of 96.2% and a validation accuracy of 95.7%, offers solid results but comes with a substantially larger model size of 200MB. This increased size could pose challenges for deployment on edge devices with limited memory, as well as higher computational costs, requiring more powerful hardware for efficient execution.

The Vision Transformer (ViT), despite its advanced architecture, exhibits the weakest performance in this comparison, achieving a training accuracy of 91.02% and a validation accuracy of 90.1%. More critically, it has an exceptionally large model size of 670MB, making it nearly seven times larger than the proposed method. The high computational cost and substantial memory requirements of ViT make it impractical for deployment on resource-constrained systems, as it necessitates powerful GPUs for effective execution. These findings highlight the efficiency of the proposed method, which not only achieves state-of-the-art accuracy but also remains highly deployable due to its compact size and lower computational demands.

Discussion

This study introduced a novel, domain-specific framework for the automated detection of schizophrenia using structural MRI, combining high-precision semantic segmentation and lightweight classification within a two-stage pipeline. The use of DeepLabv3+ for the segmentation of the temporal lobe and insula, regions strongly implicated in schizophrenia, represents a significant methodological advancement. To our knowledge, this is the first study to employ DeepLabv3+ with contrast-enhanced imaging for targeted segmentation in schizophrenia research. The integration of segmentation and classification stages yielded a high overall diagnostic performance (accuracy = 98.02 ± 1.62%, AUC = 0.9917 ± 0.01), exceeding the results reported in several previous deep learning studies using MRI-based classification [55, 66, 67].

Compared with prior work, our approach achieved higher classification accuracy and AUC while maintaining a lightweight and interpretable design. For instance, Goel et al. [66] reported 96.5% accuracy using a ResNet50-based ensemble, while Zhang et al. [55] achieved AUC = 0.987 with whole-brain 3D CNNs. The superior performance of our framework likely stems from its region-specific focus and contrast enhancement, which amplify diagnostically relevant structural cues while suppressing noise from irrelevant brain areas. By localizing the analysis to the temporal lobe and insula–regions consistently associated with structural and functional abnormalities in schizophrenia [5, 6, 68]—our method captures neurobiological substrates that are both clinically and mechanistically meaningful. This localization strategy aligns with the principles of explainable AI in neuroimaging, emphasizing knowledge-guided feature extraction rather than indiscriminate global modeling.

The biological plausibility of our findings is supported by converging evidence. The temporal lobe, particularly the superior temporal gyrus, has been repeatedly implicated in auditory hallucinations and language disturbances, while the insula contributes to sensory integration, self-awareness, and emotional processing—all functions commonly disrupted in schizophrenia [39]. Our Grad-CAM analyses further confirmed that model attention was concentrated on these regions, reinforcing the link between network focus and pathophysiologically relevant structures. Together, these findings suggest that anatomically constrained and contrast-enhanced models can achieve both high accuracy and interpretability, addressing a key limitation of black-box deep learning approaches in psychiatry.

From a computational perspective, the combination of DeepLabv3+ and AlexNet enabled strong diagnostic performance with minimal complexity. The proposed model (95 MB) demonstrated comparable or superior results to larger architectures (e.g., ResNet50-GRU, ViT) while maintaining a smaller parameter footprint, which is advantageous for reproducibility and clinical deployment. Cross-validation and external testing further confirmed model robustness and generalization. The consistency between internal and external results underscores the potential of lightweight, region-specific models for real-world implementation.

The application of a 2D CNN for schizophrenia classification has several limitations. Manual labeling of a small dataset of 130 MRI slices was performed by a single expert rater. While this ensured consistency, it represents a limitation, as inter-rater reliability was not assessed. Future work should incorporate multiple raters and report metrics such as the intraclass correlation coefficient (ICC) to strengthen segmentation reliability. The reliance on 2D MRI slices, rather than full 3D volumes, limits the spatial context available to the model and may restrict its ability to capture complex structural patterns. Additionally, scanner-related variability across datasets could influence image quality and model performance despite harmonized acquisition parameters. The limited number of images used for segmentation also constrains model generalizability; in future studies, we plan to increase the dataset size to improve robustness and overall performance. To further enhance feature extraction and generalization, additional high-quality labeled datasets, along with data augmentation and effective image synthesis techniques, should be incorporated. Future research will also explore leveraging 3D MRI images to provide more comprehensive spatial information and improve model performance.

In this study, we adopted a lightweight classification model (AlexNet, 95 MB) as part of a modular pipeline aimed at balancing high diagnostic performance with clinical interpretability and practical deployment. While larger and more complex architectures such as ResNet50-GRU, ResNet50-CBAM, and Vision Transformers have shown promise in other domains, their utility in schizophrenia research is limited by several factors. First, structural neuroimaging datasets for psychiatric disorders are typically small and heterogeneous, which makes deep, highly parameterized models prone to overfitting. To this end, we implemented 10-fold cross-validation, which consistently showed high accuracy and low variance across folds, supporting the robustness and generalizability of our approach. Furthermore, the model was tested on an independent external dataset, where it maintained strong classification performance (AUC = 0.98, accuracy = 96.2%), indicating that it generalized well beyond the original training distribution. These findings suggest that lightweight architectures, when combined with thoughtful pre-processing and domain-specific constraints, can achieve competitive performance without the complexity of larger models.
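The 10-fold splitting procedure can be sketched as a generic index-partitioning routine. This NumPy example only illustrates the fold mechanics; any site- or subject-level stratification used in the study is not reproduced here.

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) index pairs for k-fold cross-validation
    over a shuffled range of n_samples."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

# Demonstration: 10 folds over 1,200 samples (roughly the slice count here)
splits = list(kfold_indices(1200, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each of the 10 iterations trains on nine folds and validates on the held-out tenth, and the per-fold accuracies are then averaged and their variance reported, as in the paragraph above.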

We also emphasize that the lightweight design does not stand in isolation but complements other overfitting mitigation strategies employed in this study. Specifically, we used data augmentation to synthetically expand the training set and increase variability, along with region-specific segmentation using DeepLabv3+ and contrast enhancement during grayscale conversion to improve signal clarity and reduce irrelevant noise. These steps collectively form a multi-layered regularization strategy that reduces the risk of the model fitting to spurious patterns while maintaining a clear focus on biologically and clinically relevant brain structures.
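The augmentation step mentioned above can be sketched as simple label-preserving transforms applied to each slice. This is an illustrative NumPy example; the specific transforms (flip, 90-degree rotation, mild Gaussian noise) and their parameters are assumptions, not the study's exact augmentation pipeline.

```python
import numpy as np

def augment(slice2d, rng):
    """Return a randomly augmented copy of a 2D MRI slice: optional horizontal
    flip, random 90-degree rotation, and small additive Gaussian noise."""
    out = slice2d.copy()
    if rng.random() < 0.5:
        out = np.fliplr(out)
    out = np.rot90(out, k=int(rng.integers(0, 4)))
    out = out + rng.normal(0, 0.01 * out.std(), size=out.shape)
    return out

rng = np.random.default_rng(42)
slice2d = rng.uniform(size=(128, 128))
augmented = [augment(slice2d, rng) for _ in range(8)]   # 8 variants per slice
print(len(augmented), augmented[0].shape)
```

Expanding each training slice into several such variants increases variability in the training set, which is the regularizing effect the paragraph above describes.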

While the final feature representation extracted by AlexNet consisted of 4096 dimensions per region of interest, we carefully considered the balance between feature dimensionality and sample size to minimize the risk of overfitting. The total dataset incorporated over 1,200 MRI slices from three publicly available repositories (COBRE, MCICShare, and UCLA), complemented by extensive data augmentation to expand variability across the training set. Moreover, the choice of a lightweight classifier (AlexNet, 95 MB) substantially reduced the number of trainable parameters compared to deeper architectures such as ResNet50-GRU or Vision Transformers, thus maintaining an appropriate ratio between sample size and model complexity. 10-fold cross-validation and independent external testing further validated the model's generalization ability, showing minimal performance variance (accuracy = 98.02 ± 1.62%, AUC = 0.99). These results indicate that the model successfully avoided overfitting despite the high-dimensional feature space. In essence, the combination of moderate model complexity, strong regularization via data augmentation, and rigorous validation ensured a stable and generalizable relationship between sample size and feature dimensionality.

Importantly, the lightweight nature of the model also enhances its interpretability and deployability. Smaller networks with fewer parameters tend to produce clearer and more localized Grad-CAM maps, as observed in our saliency analyses (Fig. 6), where model attention aligned with clinically meaningful brain regions such as the hippocampus, insula, and cingulate cortex. This stands in contrast to large “black-box” models, whose interpretability is often compromised by their depth and architectural complexity. Finally, from a translational standpoint, the reduced model size (95MB) makes it highly suitable for integration into clinical imaging systems and portable diagnostic tools, particularly in resource-limited settings where access to high-performance computing infrastructure is restricted. Heavy models often pose barriers to independent validation and deployment across resource-constrained clinical institutions, especially in low-income regions, where schizophrenia research and diagnostic support are already underfunded. By achieving high performance with minimal computational overhead, our lightweight framework helps overcome these limitations and supports the broader goal of democratizing AI-assisted neuropsychiatric diagnosis globally.

Acknowledgements

Authors have not received any funding for this research.

Abbreviations

SZ

Schizophrenia

HC

Healthy Control

DL

Deep Learning

ML

Machine Learning

CNN

Convolutional Neural Network

ROI

Region of Interest

sMRI

Structural Magnetic Resonance Imaging

CBAM

Convolutional Block Attention Module

GRU

Gated Recurrent Unit

ViT

Vision Transformer

ReLU

Rectified Linear Unit

ANOVA

Analysis of Variance

AUC

Area Under the Receiver Operating Characteristic Curve

ACC

Accuracy

SEN

Sensitivity

SPEC

Specificity

PREC

Precision

REC

Recall

F1

F1-score

CI

Confidence Interval

PSNR

Peak Signal-to-Noise Ratio

SNR

Signal-to-Noise Ratio

LOSO

Leave-One-Site-Out

CV

Cross-Validation

K

Number of folds in K-Fold Cross-Validation

ADAM

Adaptive Moment Estimation

fc7

Fully Connected Layer 7 (in AlexNet)

NLM

Non-Local Means Filter

RWMS

Relative Weighted Mean Square

DSM-5

Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition

SANS

Scale for the Assessment of Negative Symptoms

COBRE

Center for Biomedical Research Excellence

MCICShare

Mind Clinical Imaging Consortium Share Database

UCLA

University of California, Los Angeles Dataset

PACS

Picture Archiving and Communication System

Author contributions

All authors wrote and edited the manuscript, and all have read and approved the final version of the manuscript.

Funding

Authors have not received any funding for this research.

Data availability

The dataset used in this study is publicly available at the following link: https://schizconnect.org

Declarations

Ethical approval

Not applicable.

Consent for publication

Not applicable. All authors have approved the final manuscript and consent to its publication.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Insel TR. Rethinking schizophrenia. Nature. 2010;468:187–93. [DOI] [PubMed] [Google Scholar]
  • 2.Ellison-Wright I, Glahn DC, Laird AR, Thelen SM, Bullmore E. The anatomy of first-episode and chronic schizophrenia: an anatomical likelihood estimation meta-analysis. Am J Psychiatry. 2008;165:1015–23. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Glahn DC, et al. Meta-analysis of gray matter anomalies in schizophrenia: application of anatomic likelihood estimation and network analysis. Biol Psychiatry. 2008;64:774–81. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Kasai K, et al. Differences and similarities in insular and temporal Pole mri gray matter volume abnormalities in first-episode schizophrenia and affective psychosis. Archiv Gener Psychiatry. 2003;60:1069–77. [DOI] [PubMed] [Google Scholar]
  • 5.Honea R, Crow TJ, Passingham D, Mackay CE. Regional deficits in brain volume in schizophrenia: a meta-analysis of voxel-based morphometry studies. Am J Psychiatry. 2005;162:2233–45. [DOI] [PubMed] [Google Scholar]
  • 6.Shenton ME, Dickey CC, Frumin M, McCarley RW. A review of mri findings in schizophrenia. Schizophr Res. 2001;49:1–52. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Craig AD. How do you feel? interoception: the sense of the physiological condition of the body. Nat Rev Neurosci. 2002;3:655–66. [DOI] [PubMed] [Google Scholar]
  • 8.Kittleson AR, Woodward ND, Heckers S, Sheffield JM. The insula: leveraging cellular and systems-level research to better understand its roles in health and schizophrenia. Neurosci Biobehav Rev. 2024;105643. [DOI] [PMC free article] [PubMed]
  • 9.Sheffield JM, Rogers BP, Blackford JU, Heckers S, Woodward ND. Insula functional connectivity in schizophrenia. Schizophr Res. 2020;220:69–77. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Agarwal M, Singhal A. Fusion of pattern-based and statistical features for schizophrenia detection from eeg signals. Med Eng Phys. 2023;112:103949. [DOI] [PubMed] [Google Scholar]
  • 11.Aksöz A, et al. Analysis and classification of schizophrenia using event related potential signals. Comput Sci. 2022;32–36.
  • 12.Azizi S, Hier DB, Wunsch DC. Schizophrenia classification using resting state eeg functional connectivity: source level outperforms sensor level. In: 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE; 2021, 1770–73. [DOI] [PubMed]
  • 13.Lei D, et al. Integrating machine learning and multimodal neuroimaging to detect schizophrenia at the level of the individual. Hum Brain Mapp. 2020;41:1119–35. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Tanveer M, et al. Diagnosis of schizophrenia: a comprehensive evaluation. IEEE J Biomed And Health Inf. 2022;27:1185–92. [DOI] [PubMed] [Google Scholar]
  • 15.Yassin W, et al. Machine-learning classification using neuroimaging data in schizophrenia, autism, ultra-high risk and first-episode psychosis. Transl Psychiatry. 2020;10:278. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Kanyal A, et al. Multi-modal deep learning from imaging genomic data for schizophrenia classification. Front Psychiatry. 2024;15:1384842. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Lin X, et al. Characteristics of multimodal brain connectomics in patients with schizophrenia and the unaffected first-degree relatives. Front Cell And Dev Biol. 2021;9:631864. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Rajesh KN, Kumar TS. Schizophrenia detection in adolescents from eeg signals using symmetrically weighted local binary patterns. EMBC. 2021;963–66. [DOI] [PubMed]
  • 19.Srinivasan S, Johnson SD. A novel approach to schizophrenia detection: optimized preprocessing and deep learning analysis of multichannel eeg data. Expert Syst With Appl. 2024;246:122937. [Google Scholar]
  • 20.Chyzhyk D, Savio A, Graña M. Computer aided diagnosis of schizophrenia on resting state fmri data by ensembles of elm. Neural Networks. 2015;68:23–33. [DOI] [PubMed] [Google Scholar]
  • 21.Gollub RL, et al. The mcic collection: a shared repository of multi-modal, multi-site brain image data from a clinical investigation of schizophrenia. Neuroinformatics. 2013;11:367–88. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Bilder R, et al. Ucla consortium for neuropsychiatric phenomics la5c study. 2018.
  • 23.Wang L, et al. Schizconnect: mediating neuroimaging databases on schizophrenia and related disorders for large-scale integration. Neuroimage. 2016;124:1155–67. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Robinson DG, et al. Predictors of treatment response from a first episode of schizophrenia or schizoaffective disorder. Am J Psychiatry. 1999;156:544–49. [DOI] [PubMed] [Google Scholar]
  • 25.Szeszko PR, et al. White matter abnormalities in first-episode schizophrenia or schizoaffective disorder: a diffusion tensor imaging study. Am J Psychiatry. 2005;162:602–05. [DOI] [PubMed] [Google Scholar]
  • 26.Soler-Vidal J, et al. Brain correlates of speech perception in schizophrenia patients with and without auditory hallucinations. PLoS One. 2022;17:e0276975. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Manjón JV. Mri preprocessing. In: Imaging biomarkers: development and clinical integration. 2017, 53–63.
  • 28.Van Ginneken B, Schaefer-Prokop CM, Prokop M. Computer-aided diagnosis: how to move from the laboratory to the clinic. Radiology. 2011;261:719–32. [DOI] [PubMed] [Google Scholar]
  • 29.Buades A, Coll B, Morel J-M. A non-local algorithm for image denoising. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). IEEE; 2005, vol. 2, 60–65.
  • 30.Hoopes A, Mora JS, Dalca AV, Fischl B, Hoffmann M. Synthstrip: skull-stripping for any brain image. NeuroImage. 2022;260:119474. 10.1016/j.neuroimage.2022.119474. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Brostow GJ, Fauqueur J, Cipolla R. Semantic object classes in video: a high-definition ground truth database. Pattern Recognit Lett. 2009;30:88–97. [Google Scholar]
  • 32.Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H. Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV). 2018.
  • 33.He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016, 770–78, 10.1109/CVPR.2016.90.
  • 34.Kuhn GR, Oliveira MM, Fernandes LA. An improved contrast enhancing approach for color-to-grayscale mappings. The Visual Comput. 2008;24:505–14. [Google Scholar]
  • 35.Mazhari A, Allahgholi A, Shafieian M. Automated detection of sdh and edh due to tbi from ct-scan images using cnn. 2023 30th National and 8th International Iranian Conference on Biomedical Engineering (ICBME). IEEE; 2023, 164–70.
  • 36.Hugdahl K, et al. Auditory hallucinations in schizophrenia: the role of cognitive, brain structural and genetic disturbances in the left temporal lobe. Front Hum Neurosci. 2008;2:131. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Sun D, et al. Brain surface contraction mapped in first-episode schizophrenia: a longitudinal magnetic resonance imaging study. Mol Psychiatry. 2009;14:976–86. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Steen RG, et al. Measurement of brain metabolites by 1h magnetic resonance spectroscopy in patients with schizophrenia: a systematic review and meta-analysis. Neuropsychopharmacology. 2005;30:1949–62. [DOI] [PubMed] [Google Scholar]
  • 39.Wylie KP, Tregellas JR. The role of the insula in schizophrenia. Schizophr Res. 2010;123:93–104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Palaniyappan L, Mallikarjun P, Joseph V, White TP, Liddle PF. Regional contraction of brain surface area involves three large-scale networks in schizophrenia. Schizophr Res. 2011;129:163–68. [DOI] [PubMed] [Google Scholar]
  • 41.Menon V, Uddin LQ. Saliency, switching, attention and control: a network model of insula function. Brain Struct And Function. 2010;214:655–67. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Uddin LQ, Nomi JS, Hébert-Seropian B, Ghaziri J, Boucher O. Structure and function of the human insula. J Clin Neurophysiol. 2017;34:300–06. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Craig AD. How do you feel—now? the anterior insula and human awareness. Nat Rev Neurosci. 2009;10:59–70. [DOI] [PubMed] [Google Scholar]
  • 44.Poels EM, et al. Glutamatergic abnormalities in schizophrenia: a review of proton mrs findings. Schizophr Res. 2014;152:325–32. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Merritt K, Egerton A, Kempton MJ, Taylor MJ, McGuire PK. Nature of glutamate alterations in schizophrenia: a meta-analysis of proton magnetic resonance spectroscopy studies. JAMA Psychiatry. 2016;73:665–74. [DOI] [PubMed] [Google Scholar]
  • 46.Xu X-J, Liu T-L, He L, Pu B. Changes in neurotransmitter levels, brain structural characteristics, and their correlation with panss scores in patients with first-episode schizophrenia. World J Clin Cases. 2023;11:5215. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Huang Z, et al. Negative symptoms correlate with altered brain structural asymmetry in amygdala and superior temporal region in schizophrenia patients. Front Psychiatry. 2022;13:1000560. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Chuang J, et al. Brain structural signatures of negative symptoms in depression and schizophrenia. front psychiatry. 2014;5:116. [DOI] [PMC free article] [PubMed]
  • 49.Lesh TA, Niendam TA, Minzenberg MJ, Carter CS. Cognitive control deficits in schizophrenia: mechanisms and meaning. Neuropsychopharmacology. 2011;36:316–38. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Manoliu A, et al. Insular dysfunction within the salience network is associated with severity of symptoms and aberrant inter-network connectivity in major depressive disorder. Front Hum Neurosci. 2014;7:930. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Anticevic A, et al. Characterizing thalamo-cortical disturbances in schizophrenia and bipolar illness. Cereb Cortex. 2014;24:3116–30. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Walton E, et al. Prefrontal cortical thinning links to negative symptoms in schizophrenia via the enigma consortium. Psychological Med. 2018;48:82–94. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Kebets V, et al. Somatosensory-motor dysconnectivity spans multiple transdiagnostic dimensions of psychopathology. Biol Psychiatry. 2019;86:779–91. [DOI] [PubMed] [Google Scholar]
  • 54.The MathWorks Inc. MATLAB version 9.13.0 (R2022b). 2022.
  • 55.Zhang J, et al. Detecting schizophrenia with 3d structural brain mri using deep learning. Sci Rep. 2023;13:14433. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Khare SK, Bajaj V. A hybrid decision support system for automatic detection of schizophrenia using eeg signals. Comput In Biol Med. 2022;141:105028. [DOI] [PubMed] [Google Scholar]
  • 57.De Rosa A, et al. Machine learning algorithm unveils glutamatergic alterations in the post-mortem schizophrenia brain. Schizophrenia. 2022;8:8. [DOI] [PMC free article] [PubMed]
  • 58.Santos Febles E, Ontivero Ortega M, Valdés Sosa M, Sahli H. Machine learning techniques for the diagnosis of schizophrenia based on event-related potentials. Front Neuroinf. 2022;16:893788. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Algumaei AH, Algunaid RF, Rushdi MA, Yassine IA. Feature and decision-level fusion for schizophrenia detection based on resting-state fmri data. PLoS One. 2022;17:e0265300. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Luján MÁ, et al. Mental disorder diagnosis from eeg signals employing automated leaning procedures based on radial basis functions. J Med Biol Eng. 2022;42:853–59. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Zandbagleh A, Mirzakuchaki S, Daliri MR, Premkumar P, Sanei S. Classification of low and high schizotypy levels via evaluation of brain connectivity. Int J Neural Syst. 2022;32:2250013. [DOI] [PubMed] [Google Scholar]
  • 62.Shi D, et al. Machine learning of schizophrenia detection with structural and functional neuroimaging. Disease Markers. 2021;2021:9963824. [DOI] [PMC free article] [PubMed]
  • 63.Du X, et al. Research on electroencephalogram specifics in patients with schizophrenia under cognitive load. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi (J Biomed Eng). 2020;37:45–53. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Vieira S, et al. Using machine learning and structural neuroimaging to detect first episode psychosis: reconsidering the evidence. Schizophr Bull. 2020;46:17–26. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Yang H, Di X, Gong Q, Sweeney J, Biswal B. Investigating inhibition deficit in schizophrenia using task-modulated brain networks. Brain Struct And Function. 2020;225:1601–13. [DOI] [PubMed] [Google Scholar]
  • 66.Tanveer M. Investigating white matter abnormalities associated with schizophrenia using deep learning model and voxel-based morphometry. 2023. [DOI] [PMC free article] [PubMed]
  • 67.Wen Y, et al. Bridging structural mri with cognitive function for individual level classification of early psychosis via deep learning. Front Psychiatry. 2023;13:1075564. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Shenton ME, Whitford TJ, Kubicki M. Structural neuroimaging in schizophrenia from methods to insights to treatments. Dialogues in clinical neuroscience. 2022. [DOI] [PMC free article] [PubMed]
