BMC Medical Imaging
2025 Dec 9;26:27. doi: 10.1186/s12880-025-02093-2

A novel strategy for enhanced schizophrenia detection using established CNN architectures

Ali Allahgholi 1, Keivan Maghooli 1, Babak Gholamine 2
PMCID: PMC12801474  PMID: 41366658

Abstract

Schizophrenia, a complex psychotic condition, is challenging to diagnose due to its reliance on clinical assessments and behavioral evaluations. Neuroimaging studies, particularly structural MRI, have revealed reductions in grey matter volume in brain regions such as the temporal lobe and insula. This study combines deep learning and neuroimaging for precise automatic detection of schizophrenia. Using T1-weighted coronal MRI data from three publicly accessible datasets (MCICShare, COBRE, UCLA), a DeepLabv3+ model with ResNet50-based segmentation was employed to isolate the temporal lobe and insula, achieving segmentation accuracies of 96% and 97%, respectively. Enhanced visualization of isolated regions through color-to-grayscale conversion distinguished schizophrenia patients from controls, achieving an AUC of 0.99. Our study specifically focuses on the regions critically linked to both cognitive and emotional dysfunctions in schizophrenia. These results advance the literature by demonstrating improved diagnostic performance and reliability compared to traditional clinical assessments and earlier imaging-based methods.

Keywords: Schizophrenia, Structural MRI, Deep learning, Semantic segmentation, Temporal lobe, Insula, DeepLabv3+, Contrast enhancement, SANS correlation, Grad-CAM

Introduction

Schizophrenia is a complex psychiatric disorder characterized by profound disturbances in perception, cognition, and emotion. It typically manifests in late adolescence or early adulthood, most commonly between the ages of 18 and 25 [1]. Neuroimaging studies have consistently revealed structural brain abnormalities, including grey matter volume reductions across several regions, most notably the frontal, temporal, hippocampal, fusiform, and insular cortices, with the temporal lobe often being the most affected [2–4]. These morphological alterations are closely linked to the cognitive and perceptual symptoms of schizophrenia.

Among these regions, the temporal lobe and insula have received particular attention for their critical roles in auditory, emotional, and self-referential processing, domains that are profoundly disrupted in schizophrenia. The temporal lobe, especially the superior temporal gyrus, is essential for auditory perception and language comprehension; abnormalities here are closely associated with auditory hallucinations [4–6]. The insula contributes to interoception, emotional regulation, and self-awareness, and its structural and functional disruptions have been linked to negative symptoms such as emotional blunting and social withdrawal [7–9]. Taken together, converging evidence indicates that aberrations in the left temporal and insular cortices are central to both the cognitive and affective dysfunctions observed in schizophrenia.

Despite advances in neuroimaging and computational analysis, accurately identifying schizophrenia-related abnormalities remains challenging. Structural MRI provides valuable insights into brain morphology, yet existing deep learning studies often lack interpretability and focus on global brain patterns rather than clinically meaningful regions of interest (ROIs). Moreover, preprocessing variability and the use of whole-brain models may obscure localized patterns critical to understanding symptom-specific brain alterations. There remains a need for methods that integrate region-specific analysis, computational efficiency, and biological interpretability to improve both diagnosis and understanding of schizophrenia.

A wide range of machine learning and deep learning approaches have been explored for schizophrenia classification using diverse neuroimaging and electrophysiological modalities. Early research predominantly employed EEG-based methods, leveraging temporal signal patterns to distinguish patients from healthy controls. These studies typically utilized classifiers such as support vector machines (SVM), k-nearest neighbors (KNN), and ensemble-based algorithms, demonstrating the feasibility of automatic diagnosis based on brain activity patterns [10–12]. With the increased availability of structural and functional MRI, recent work has shifted toward image-based approaches that capture morphological and connectivity abnormalities in schizophrenia. Studies employing structural MRI (sMRI) and functional MRI (fMRI) have explored various feature representations, including grey and white matter volumes, cortical thickness, and regional connectivity metrics, often in combination with classical classifiers such as SVM, random forest (RF), or logistic regression [13–15]. Hybrid models integrating multimodal data, such as sMRI–fMRI or MRI–genetic information, have also been proposed to improve diagnostic robustness [16, 17]. More recently, deep learning techniques have been introduced to automatically extract discriminative features from neuroimaging data. These methods range from convolutional neural networks (CNNs) applied to 2D MRI slices to hybrid architectures combining CNNs with recurrent models for sequential or multimodal data [18, 19]. Despite their promise, many of these studies rely on whole-brain inputs or heterogeneous feature sets, which may obscure localized abnormalities associated with key clinical symptoms.

Existing approaches demonstrate the potential of computational methods for schizophrenia diagnosis but reveal several limitations: limited interpretability due to whole-brain analysis, underexploration of critical regions of interest such as the temporal and insular cortices, and high computational demands associated with complex architectures. These gaps motivate the present study’s region-specific, 2D ROI-driven framework, which focuses on biologically and clinically relevant areas to enhance both diagnostic accuracy and neurobiological interpretability.

To address these challenges, this study proposes a ROI-driven 2D deep learning framework for schizophrenia classification, focusing specifically on the left temporal and insular cortices—two regions deeply implicated in the disorder’s clinical symptoms. We employ the DeepLabv3+ model with a ResNet-50 backbone for precise semantic segmentation of these cortical regions, leveraging its multi-scale and atrous convolution capabilities to capture subtle anatomical variations. Following segmentation, the extracted regions are contrast-enhanced and classified using AlexNet, a computationally efficient CNN suitable for small-to-moderate neuroimaging datasets. This two-stage design—precise region segmentation followed by targeted classification—balances performance, interpretability, and efficiency.

In contrast to prior CNN-based schizophrenia studies that predominantly analyze whole-brain volumes or non-specific features, the proposed method advances the field by introducing a region-focused deep learning pipeline that explicitly isolates and enhances the temporal and insular cortices before classification. This design not only improves the model’s sensitivity to subtle structural variations but also provides a clearer link between imaging biomarkers and clinical symptoms. Furthermore, by combining the segmentation precision of DeepLabv3+ with the simplicity and efficiency of AlexNet, our framework achieves a balance between diagnostic performance, computational tractability, and interpretability. Overall, this study contributes a novel, biologically grounded, and computationally efficient approach that extends previous CNN-based works through explicit region-level analysis and a streamlined diagnostic architecture. The overall workflow of the proposed method is illustrated in Fig. 1, showing the data flow, dimensionality changes, and evaluation procedures across all stages of the pipeline.

Fig. 1.

Fig. 1

Flowchart of the entire pipeline

Methods

Dataset

We obtained publicly available neuroimaging data from three major datasets: COBRE [20], MCICShare [21], and UCLA [22]. The COBRE and MCICShare datasets were accessed through the SchizConnect database [23], originally collected to investigate brain metabolism and structure in patients with schizophrenia, whereas the UCLA dataset was provided for neuropsychiatric phenomics research. All datasets consisted of 2D T1-weighted structural MRI images, acquired on scanners with field strengths ranging from 1.5T to 3T. Specifically, the COBRE dataset included 90 subjects, and the MCICShare dataset included 109 subjects. A summary of the image distribution across datasets and diagnostic groups is presented in Table  1, showing the total number of images and the percentage contributed by each dataset for control and schizophrenia participants.

Table 1.

Statistical information of the data extracted from the SchizConnect database

Statistic / DX               No Known Disorder   Schizophrenia (Broad)   Schizophrenia (Strict)   Schizoaffective
Number of subjects           189                 109                     79                       11
Gender (m/f)                 132/57              83/26                   64/15                    8/3
Age (years, mean ± SD)       35.9 ± 12.2         34.3 ± 11.2             37.8 ± 13.8              40.6 ± 12.6
Age range (years, min–max)   18–65               18–61                   19–66                    19–59

Dataset / Number of images   MCICShare           COBRE                   UCLA                     Total
Control                      264 (41.57%)        241 (37.95%)            130 (20.47%)             635
Schizophrenia                306 (49.51%)        262 (42.39%)            50 (8.09%)               618

The COBRE, MCICShare, and UCLA datasets were combined to create a unified dataset comprising 249 schizophrenia (SZ) patients and 319 healthy controls, all of whom were used for schizophrenia classification. In addition, images from the healthy control group of the UCLA dataset were used for pixel-wise segmentation. All MRI data included in this study satisfied the required quality criteria; no participants or scans were excluded due to motion artifacts, image distortions, or other quality-related issues.

The MCICShare and COBRE datasets were categorized into three diagnostic subgroups: schizophrenia (broad), schizophrenia (strict), and schizoaffective. In this study, we used the schizophrenia (broad) category, which combines schizophrenia and schizoaffective diagnoses [23]. This grouping approach is widely adopted in neuroimaging studies [24, 25], as these conditions often share overlapping structural and functional abnormalities.

MRI acquisition parameters were as follows: MCICShare: 3T – TR = 2530 ms, TE = 3.79 ms, FA = 7°, TI = 1100 ms, bandwidth = 181 Hz/pixel; 1.5T – TR = 12 ms, TE = 4.76 ms, FA = 20°, bandwidth = 110 Hz/pixel; voxel size = 0.625 × 0.625 mm², slice thickness = 1.5 mm, FOV = 16–18 cm, matrix = 256 × 256 × 128. UCLA: TR = 1.9 s, TE = 2.26 ms, FOV = 250 mm, matrix = 256 × 256, sagittal plane, slice thickness = 1 mm, 176 slices. COBRE: TR = 2.53 s, TE = 1.64–9.08 ms, TI = 1.2 s, FA = 7°, slice thickness = 1 mm, FOV = 256 mm, matrix = 256 × 256.

For external validation, we used the dataset provided by Soler-Vidal et al. [26], which included T1-weighted MRI images from 46 patients with schizophrenia (mean age 42.52 years; 36 males, 10 females) and 25 healthy controls (mean age 39.8 years; 18 males, 7 females). Demographic variables in this external dataset were comparable to those in the main SchizConnect datasets, with no significant group differences in gender distribution, age, or other demographic factors, suggesting minimal demographic bias.

Statistical analyses were performed to evaluate potential demographic and dataset-related confounds. A one-way ANOVA showed no significant difference in age among diagnostic groups, indicating that participants were age-balanced across categories. Similarly, a chi-square test for gender distribution revealed no significant difference in male-to-female ratios, with a negligible effect size (Cramer's V). These results confirm that age and sex were well matched between schizophrenia (SZ) and healthy control (HC) groups and are unlikely to bias classification outcomes.
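The gender comparison above can be reproduced in outline from the counts in Table 1 (m/f = 83/26 for the schizophrenia group and 132/57 for the no-known-disorder group). The sketch below computes the chi-square statistic and Cramer's V by hand in Python; it illustrates the method only and does not reproduce the study's reported statistics, which were obtained in MATLAB.

```python
import numpy as np

# 2x2 contingency table from Table 1: rows = SZ / no-known-disorder,
# columns = male / female.
table = np.array([[83.0, 26.0],
                  [132.0, 57.0]])

# Chi-square statistic from observed vs. expected counts.
row = table.sum(axis=1, keepdims=True)
col = table.sum(axis=0, keepdims=True)
n = table.sum()
expected = row @ col / n
chi2 = ((table - expected) ** 2 / expected).sum()

# Cramer's V for an r x c table: sqrt(chi2 / (n * (min(r, c) - 1))).
v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(chi2, v)
```

For a 2x2 table (1 degree of freedom), a chi-square value below the 3.841 critical threshold is non-significant at the 0.05 level, consistent with the well-matched sex ratios reported above.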

Data preprocessing

In this study, preprocessing was crucial to reduce noise, enhance image quality, and standardize data for optimal model performance. Steps such as denoising, intensity normalization, and skull stripping were applied to ensure consistent intensity ranges and to eliminate non-brain elements that could interfere with the analysis. These preprocessing techniques are widely recognized for improving the accuracy and reliability of subsequent tasks such as semantic segmentation (pixel-level) and image-level classification [27, 28].

While deep learning models can process raw data, preprocessing is essential in MRI studies to improve image consistency, as variations in acquisition parameters, scanner types, and contrast profiles can affect model generalization. Skull stripping precisely removed non-brain tissues, including the skull and scalp, enabling the network to focus on relevant brain regions—specifically the temporal lobe and insula, which are the primary regions of interest in this study.

To prevent any potential data leakage, all preprocessing steps were applied in a strictly training-data-driven manner. Parameters for intensity normalization (mean and standard deviation) were computed exclusively from the training set and subsequently applied to the validation and test sets. Denoising and skull stripping were performed independently for each subject using identical processing pipelines, without using any label or distributional information from the validation or test data. This ensured that preprocessing did not bias the model toward unseen data.
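The leakage-free normalization described above can be sketched as follows: the mean and standard deviation are estimated on the training split only and then reused, unchanged, on held-out data. This is an illustrative Python snippet (the study itself was implemented in MATLAB).

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(100.0, 15.0, size=300)  # stand-in training intensities
test = rng.normal(100.0, 15.0, size=100)   # stand-in held-out intensities

# Normalization parameters come from the training split ONLY.
mu, sigma = train.mean(), train.std()

train_z = (train - mu) / sigma
test_z = (test - mu) / sigma  # reuse training statistics; never refit on test data
print(train_z.mean(), train_z.std())
```

The training split is exactly standardized (zero mean, unit variance), while the test split is merely close to standardized, which is the expected behavior when no test-set information leaks into preprocessing.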

Denoising

MRI images are often affected by noise during acquisition, which can compromise quantitative analysis. While multiple acquisitions in the scanner can reduce noise, this is impractical in clinical settings due to time constraints. As an alternative, filtering methods such as the nonlocal means (NLM) filter [29] were employed to reduce noise while preserving anatomical details. The NLM filter reconstructs each pixel using a weighted average of nearby pixels based on a similarity measure, as shown in Eq. 1 and Eq. 2.

$\mathrm{NLM}(v(i)) = \sum_{j \in V_i} w(i,j)\, v(j) \qquad (1)$

In this context, $w(i,j)$ denotes the weight assigned to the value $v(j)$, signifying the similarity between the local patches $N_i$ and $N_j$. These patches have a radius of $r$ and are centered on the voxels $x_i$ and $x_j$. Additionally, $V_i$ represents a local search window around $x_i$.

$w(i,j) = \dfrac{1}{Z(i)} \exp\!\left(-\dfrac{\lVert v(N_i) - v(N_j)\rVert_2^2}{h^2}\right) \qquad (2)$

In Eq. 2, $Z(i)$ serves as a normalization constant ensuring that the weights sum to one, i.e., $Z(i) = \sum_{j \in V_i} \exp\!\left(-\lVert v(N_i) - v(N_j)\rVert_2^2 / h^2\right)$. The parameter $h$ plays a crucial role in filtering, as it controls the rate at which the exponential function decays. Fig. 2 shows the result of applying the NLM filter.
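The NLM weighting of Eqs. 1 and 2 can be sketched in a few lines of Python. This is a deliberately naive, illustrative implementation (the study used MATLAB, and production code would use an optimized library routine); patch radius, search radius, and h are arbitrary demonstration values.

```python
import numpy as np

def nlm_denoise(img, patch_r=1, search_r=3, h=0.1):
    """Minimal nonlocal-means sketch: each pixel is a weighted average of
    pixels in a search window V_i, with weights based on the similarity of
    the patches N_i and N_j (radius patch_r), per Eqs. 1-2."""
    pad = patch_r + search_r
    padded = np.pad(img, pad, mode="reflect")
    out = np.zeros_like(img, dtype=float)
    H, W = img.shape
    for y in range(H):
        for x in range(W):
            cy, cx = y + pad, x + pad
            ref = padded[cy - patch_r:cy + patch_r + 1,
                         cx - patch_r:cx + patch_r + 1]
            weights, values = [], []
            for dy in range(-search_r, search_r + 1):
                for dx in range(-search_r, search_r + 1):
                    ny, nx = cy + dy, cx + dx
                    patch = padded[ny - patch_r:ny + patch_r + 1,
                                   nx - patch_r:nx + patch_r + 1]
                    d2 = ((ref - patch) ** 2).mean()
                    weights.append(np.exp(-d2 / h ** 2))  # exp(-||N_i - N_j||^2 / h^2)
                    values.append(padded[ny, nx])
            w = np.array(weights)
            out[y, x] = (w / w.sum()) @ np.array(values)  # Z(i) normalizes the weights
    return out

# Toy demonstration: a flat region corrupted by Gaussian noise.
rng = np.random.default_rng(1)
noisy = 0.5 + 0.1 * rng.standard_normal((16, 16))
den = nlm_denoise(noisy)
print(noisy.std(), den.std())
```

Because the weights favor similar patches, the filtered image is smoother than the input while edges (dissimilar patches) would receive low weights and be preserved.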

Fig. 2.

Fig. 2

Image denoising; a) original image, b) NLM filter output

Normalization

Normalization was performed using the Z-Score approach, which adjusts pixel intensities to a common scale, compensating for differences in acquisition settings across scanners. This ensures that tissues across images exhibit consistent intensity ranges, which is essential for accurate analysis.

Skull stripping

Skull stripping is an essential step in removing non-brain tissues from MRI scans. This preprocessing step was carried out using SynthStrip [30], a deep learning tool that effectively extracts brain tissues from a variety of imaging modalities. The result is a cleaner image, allowing for more precise segmentation of brain regions of interest (Fig. 3).

Fig. 3.

Fig. 3

Sample result image of skull-stripping process

Semantic segmentation

Semantic segmentation, a deep learning technique, assigns a label to each pixel in an image to identify groups of pixels that belong to the same category. The goal is to divide the image into meaningful regions for further analysis. Recent advances in deep learning have led to powerful segmentation models, making them the dominant approach for semantic segmentation. In this study, we used deep learning-based semantic segmentation to identify regions of interest (ROIs), such as the insula and temporal lobe, to improve the automatic detection of schizophrenia. The following steps outline the process.

Labeling ground truth

A total of 130 2D T1-weighted brain MRI images from the coronal plane were obtained from the UCLA dataset and used for pixel-wise semantic segmentation of two anatomical regions of interest: the insula and the temporal lobe. Each image was manually annotated at the pixel level by expert raters to generate ground truth segmentation masks. Pixels corresponding to the insula were labeled in blue (RGB code [0, 0.45, 0.74]), while those representing the temporal lobe were labeled in orange (RGB code [0.85, 0.33, 0.1]). All other regions were assigned as background. This annotation process produced precise pixel-level class labels that enabled the segmentation network to learn fine-grained appearance and shape characteristics of each region, in contrast to coarse region annotations such as bounding boxes or contour approximations [31]. Figure 4(a) illustrates an example of a manually labeled ground truth image.

Fig. 4.

Fig. 4

Semantic segmentation post-processing; the blue region indicates the insula and the orange region the temporal lobe; a) ROI-labeled image, b) prediction by the CNN, c) grayscale-highlighted image

Data augmentation

To prepare images for semantic segmentation, 130 labeled images from the UCLA dataset were used. However, this number was too small for training the deep learning model, so image augmentation techniques were applied to increase the dataset size. Simple rotations (e.g., 90-degree turns) and flips (horizontal and vertical) generated 390 additional images (three per original). Further transformations, such as shifting and resizing, created four variations per image, adding 520 more. In total, 1040 images (130 originals plus 910 augmented) were used to train the segmentation model.
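The augmentation bookkeeping above can be verified with a short Python sketch. The rotation/flip and shift transforms here are stand-ins chosen for illustration (the paper does not specify exact shift or resize parameters); the point is the 130 + 390 + 520 = 1040 count.

```python
import numpy as np

def rot_flip_variants(img):
    """Three extra images per original: a 90-degree rotation plus
    horizontal and vertical flips (illustrative choices)."""
    return [np.rot90(img), np.fliplr(img), np.flipud(img)]

def shift_variants(img, n=4):
    """Stand-in for the four shift/resize variants per image; simple
    one-pixel circular shifts, purely to illustrate the bookkeeping."""
    return [np.roll(img, s, axis=1) for s in range(1, n + 1)]

originals = [np.zeros((8, 8)) for _ in range(130)]
rotated_flipped = [a for img in originals for a in rot_flip_variants(img)]  # 390
shifted = [v for img in originals for v in shift_variants(img)]             # 520
total = len(originals) + len(rotated_flipped) + len(shifted)
print(total)  # 130 + 390 + 520 = 1040
```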

Moreover, because the data available for testing our model were insufficient, we expanded the external dataset to 426 images using similar data augmentation techniques. In this study, we opted to focus on 2D MRI data, deliberately excluding 3D MRI data. This choice rests on both practical and methodological considerations: analyzing 3D MRI volumes involves substantially higher computational complexity, as it requires processing larger datasets at greater resolution, increasing computation time and resource demands. In future work, we plan to extend our approach to 3D MRI data, which could offer more detailed insights and enhance the robustness of the model.

Semantic segmentation deep learning construction

In this research, we employed a pre-trained DeepLabV3+ network for image segmentation, followed by a CNN for feature extraction. While an integrated model could streamline the process by combining these tasks into a single network, the chosen pipeline was motivated by the need for precise segmentation and specialized feature extraction, each tailored to specific objectives.

The DeepLabV3+ model, as proposed by Chen et al. [32], incorporates an encoder-decoder architecture. The encoder captures multi-scale contextual information through atrous convolutions, while the decoder enhances segmentation accuracy along object boundaries. This combination makes DeepLabV3+ particularly well suited for segmenting complex regions, such as brain tissues, where boundary delineation is critical for downstream tasks. The network is built upon ResNet50, a robust 50-layer backbone designed by He et al. [33], whose series of convolutional and pooling layers, optimized for large-scale image analysis, provides the powerful feature extraction needed to capture the nuanced characteristics required for accurate segmentation. The rationale for the two-step approach is as follows:

  1. Precision in Segmentation: Using DeepLabV3+ ensures that the segmentation process is optimized independently, focusing on delineating meaningful regions (e.g., brain tissues) from the background.

  2. Targeted Feature Extraction: By applying the CNN only to the segmented regions, feature extraction is concentrated on relevant areas, reducing noise and improving model focus.

  3. Flexibility and Modularity: The two-step approach allows each module to be fine-tuned separately. This modular design provides the flexibility to replace or update either the segmentation or feature extraction components without affecting the overall pipeline.

  4. Empirical Performance Gains: Segmenting the images before feature extraction reduces the computational complexity of the subsequent steps by narrowing the focus to the regions of interest, enhancing classification accuracy and efficiency.
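The modularity argument above can be made concrete with a small skeleton: segmentation, enhancement, and classification are separate, swappable callables. All function names here are placeholders for illustration, not the authors' code (which was written in MATLAB).

```python
from typing import Callable
import numpy as np

def run_pipeline(image: np.ndarray,
                 segment: Callable[[np.ndarray], np.ndarray],
                 enhance: Callable[[np.ndarray], np.ndarray],
                 classify: Callable[[np.ndarray], int]) -> int:
    mask = segment(image)             # stage 1: DeepLabv3+-style ROI mask
    roi = np.where(mask, image, 0.0)  # keep only temporal-lobe / insula pixels
    roi = enhance(roi)                # contrast enhancement on the ROI only
    return classify(roi)              # stage 2: AlexNet-style classifier

# Stub stages: either module can be replaced without touching the other.
seg = lambda x: x > 0.5
enh = lambda x: (x - x.min()) / (x.max() - x.min() + 1e-8)
clf = lambda x: int(x.mean() > 0.1)

label = run_pipeline(np.random.default_rng(0).random((4, 4)), seg, enh, clf)
print(label)
```

Because each stage only depends on the array interface, swapping in a different segmenter or classifier requires no change to the rest of the pipeline, which is exactly the flexibility argued for above.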

Contrast enhancement using color-to-gray scale conversion

The main goal of using the color-to-grayscale technique is to improve the visibility of subtle differences in brain structures, which are critical for distinguishing schizophrenia-related abnormalities. By applying this contrast-enhancing technique, we provide a clearer and more defined representation of the regions of interest (ROIs), allowing the model to learn from finer anatomical details. By enhancing the contrast in the segmented regions, the model can more effectively identify and focus on critical areas associated with schizophrenia, such as the temporal lobe, insula, and surrounding cortical regions. Since MRI images often contain numerous non-discriminative features, the proposed enhancement step enables the model to prioritize diagnostically relevant regions, ensuring that these features receive higher importance in the classification process.

Mathematical formulation and application of the color-to-grayscale enhancement

The color-to-grayscale contrast enhancement method proposed by [34] was employed to improve the perceptual visibility of subtle structural variations in MRI slices. Conventional grayscale conversion based solely on luminance ($Y$) often suppresses minor chromatic or intensity differences that can reflect schizophrenia-related abnormalities. To address this, Kuhn's algorithm models the color-to-gray mapping as a mass–spring physical system, in which each color behaves as a particle interacting with all other colors through virtual springs. These interactions iteratively adjust luminance values to produce a grayscale image that preserves perceptual contrast.
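The weakness of luminance-only conversion is easy to demonstrate: two clearly different colors with equal luminance collapse to the same gray level. The snippet below uses the standard Rec. 601 luminance weights (0.299, 0.587, 0.114) with RGB values in [0, 1], chosen purely for illustration.

```python
import numpy as np

# Two different colors constructed so that Y = 0.299 R + 0.587 G + 0.114 B
# is identical for both (0.299 * 0.114 == 0.114 * 0.299).
a = np.array([0.114, 0.0, 0.0])  # dark red
b = np.array([0.0, 0.0, 0.299])  # dark blue
w = np.array([0.299, 0.587, 0.114])

ya, yb = w @ a, w @ b
print(ya, yb)  # same gray level: the chromatic contrast is lost
```

Kuhn's mass-spring formulation, described next, avoids this collapse by assigning gray levels that respect perceptual distances between colors rather than luminance alone.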

Perceptual distance between colors

Each quantized color $c_i$ in the CIE $L^*a^*b^*$ color space is represented as a particle. The desired rest length between two colors in the grayscale domain is given by:

$l_{ij} = \dfrac{\delta(c_i, c_j)}{\delta_{\max}}\, G \qquad (3)$

where $\delta(c_i, c_j)$ denotes the perceptual color distance, $G$ is the full grayscale range, and $\delta_{\max}$ represents the maximum perceptual color distance in the image. This proportional mapping ensures that perceptual differences among colors are maintained after conversion. In this study, this mechanism was used to preserve fine intensity and texture variations across brain regions such as the insula and temporal lobe.

Force modeling among color particles

The system computes the total force applied to each particle as:

$F_i = \sum_{j \neq i} k \left( d_{ij} - l_{ij} \right) \hat{u}_{ij} \qquad (4)$

where $d_{ij}$ is the current grayscale distance between particles $i$ and $j$, $k$ is the stiffness coefficient, and $\hat{u}_{ij}$ is the unit direction from gray level $g_i$ toward $g_j$. If two grayscale levels become too similar, the spring between them exerts a repulsive force, encouraging their separation; conversely, overly distant gray levels are drawn closer. When applied to MRI data, this mechanism prevents different tissue types or cortical structures from merging into similar gray levels, ensuring that subtle neuroanatomical boundaries remain distinguishable.

Dynamic luminance update

The luminance value of each particle is iteratively updated using Verlet integration:

$g_i(t + \Delta t) = 2\, g_i(t) - g_i(t - \Delta t) + \dfrac{F_i(t)}{m_i}\, \Delta t^2 \qquad (5)$

where $m_i$ is the particle's mass. The simulation iterates until the system reaches equilibrium, yielding a stable grayscale representation with maximized local and global contrast. In our application, this dynamic update enhances the definition of cortical folds and boundaries, which contributes to improved visual separation of key brain structures for the CNN classifier.

Color saturation weighting

The mass of each particle is inversely related to its chromatic saturation:

$m_i = \dfrac{1}{S_i} \qquad (6)$

Highly saturated colors (large $S_i$) have smaller masses and therefore respond more strongly to forces, while near-neutral colors remain relatively stable. This weighting highlights highly informative regions and preserves structural uniformity elsewhere. In our framework, it emphasized subtle intensity transitions in segmented regions of interest (ROIs) without introducing artifacts in homogeneous tissue areas.
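A toy version of the mass-spring scheme of Eqs. 3 through 6 can be written in a few lines. Everything here is illustrative: the "color space" and saturation proxy are crude stand-ins, and the constants (k, dt, iteration count) are arbitrary choices; Kuhn's actual algorithm operates on quantized CIE Lab colors with perceptual distances.

```python
import numpy as np

rng = np.random.default_rng(0)
colors = rng.random((6, 3))                     # 6 quantized toy "Lab" colors
sat = colors[:, 1:].sum(axis=1) + 0.1           # crude chromatic-saturation proxy
mass = 1.0 / sat                                # Eq. 6: mass inversely ~ saturation

delta = np.linalg.norm(colors[:, None] - colors[None, :], axis=-1)
G = 1.0                                         # grayscale range [0, 1]
rest = G * delta / delta.max()                  # Eq. 3: target rest lengths

g = colors[:, 0].copy()                         # initialize grays from "luminance"
g_prev = g.copy()
k, dt = 0.05, 0.1
for _ in range(200):
    d = g[:, None] - g[None, :]
    dist = np.abs(d) + 1e-9
    # Eq. 4: springs push gray-level distances toward the rest lengths
    # (repulsive when too close, attractive when too far).
    force = (k * (dist - rest) * (-d / dist)).sum(axis=1)
    # Eq. 5: Verlet integration of each particle's gray level.
    g_next = 2 * g - g_prev + (force / mass) * dt ** 2
    g_prev, g = g, np.clip(g_next, 0.0, 1.0)
print(np.round(g, 3))
```

After iteration, the gray levels spread out so their pairwise differences better match the target rest lengths, which is the contrast-preserving behavior described above.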

Quantitative evaluation using RWMS

The preservation of perceptual contrast was quantitatively assessed using the Root Weighted Mean Square (RWMS) error metric:

$\mathrm{RWMS}_i = \sqrt{ \dfrac{1}{|K|} \sum_{j \in K} \dfrac{\left( \delta_{ij} - \lvert g_i - g_j \rvert \right)^2}{\delta_{ij}^2} } \qquad (7)$

where $\delta_{ij}$ is the expected grayscale difference for the color pair $(i, j)$ and $K$ is the set of color pairs compared. Lower RWMS values indicate stronger preservation of perceptual contrast.
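The RWMS metric can be sketched directly from its description: for each color, compare realized gray-level differences against the expected differences, weighting each error relative to the expected value. The formula below is reconstructed from the text (a relative-error interpretation), so treat it as illustrative rather than the exact published definition.

```python
import numpy as np

def rwms(g, delta):
    """Per-color root weighted mean square error between expected gray
    differences delta[i, j] and realized differences |g[i] - g[j]|."""
    n = len(g)
    err = np.zeros(n)
    for i in range(n):
        s = 0.0
        for j in range(n):
            if i == j:
                continue
            s += ((delta[i, j] - abs(g[i] - g[j])) / delta[i, j]) ** 2
        err[i] = np.sqrt(s / (n - 1))
    return err

# Expected gray differences for three toy colors.
delta = np.array([[0.0, 0.5, 1.0],
                  [0.5, 0.0, 0.5],
                  [1.0, 0.5, 0.0]])
perfect = np.array([0.0, 0.5, 1.0])    # grays matching delta exactly
collapsed = np.array([0.5, 0.5, 0.5])  # all contrast lost
print(rwms(perfect, delta), rwms(collapsed, delta))
```

A mapping that preserves every expected difference scores zero, while a fully collapsed mapping scores the maximum relative error, matching the interpretation that lower RWMS means better contrast preservation.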

CNN construction

Convolutional Neural Networks (CNNs) are some of the most widely used and effective algorithms in Deep Learning. A major advantage of CNNs is their ability to automatically detect important features without human intervention. Among these, AlexNet is notable for its groundbreaking image recognition and classification achievements; it popularized GPU-based training, greatly accelerating the process. AlexNet also introduced a deeper eight-layer architecture, improving feature extraction compared to earlier models such as LeNet. Additionally, it employs the ReLU activation function, whose non-saturating response speeds up training and mitigates the vanishing-gradient problem.

ResNet gained prominence after winning the ILSVRC 2015 competition. It was designed to address the vanishing gradient problem in deep networks and has versions ranging from 34 to 1202 layers. ResNet50, a commonly used version, contains 49 convolutional layers and one fully connected layer.

In this research, both AlexNet and ResNet50 are used. Both models comprise convolutional layers with ReLU activations, max-pooling layers, and fully connected layers; AlexNet additionally employs dropout and a softmax output for classification, while ResNet50 incorporates batch normalization.

Based on Fig. 5, the AlexNet architecture is designed to process and classify images through several layers. It begins with a convolutional layer that uses 96 kernels of size 11 × 11 and a stride of 4, followed by a 3 × 3 max-pooling layer with a stride of 2. The second convolutional layer uses 256 feature maps with 5 × 5 kernels and a stride of 1, followed by another 3 × 3 max-pooling layer with a stride of 2. The third and fourth convolutional layers each contain 384 feature maps with 3 × 3 kernels. The fifth convolutional layer has 256 feature maps with 3 × 3 kernels, followed by a final 3 × 3 max-pooling layer with a stride of 2. After these convolutional layers, three fully connected layers, each containing 4096 neurons, process the features. The architecture concludes with an output layer that generates the classification result.

Fig. 5.

Fig. 5

AlexNet architecture [35]

Implementation detail

The proposed work was implemented in MATLAB 2022b and executed on a Windows 11 computer equipped with an AMD Ryzen 5 6600H processor running at 3.30 GHz and 16 GB of RAM. The GPU used was an NVIDIA RTX 3050. The suggested model was evaluated using three publicly available datasets, focusing on control subjects and schizophrenia patients. The data were divided into 70% for training, 15% for validation, and 15% for testing.

Following segmentation and contrast enhancement, each image was resized to 227 × 227 pixels and used as the input to the AlexNet architecture. For grayscale MRI slices, the single channel was replicated to form a 227 × 227 × 3 input tensor, ensuring compatibility with the network’s original configuration.

AlexNet consists of five convolutional layers, three max-pooling layers, and three fully connected layers. The first convolutional layer extracts 96 low-level feature maps (55 × 55) representing basic edges and textures, followed by max pooling that reduces the spatial dimension to 27 × 27. The second convolutional layer produces 256 mid-level feature maps (27 × 27), and subsequent pooling compresses them to 13 × 13. The third and fourth convolutional layers generate 384 higher-level feature maps (13 × 13), capturing local structural and shape patterns. The fifth convolutional layer outputs 256 abstract feature maps (13 × 13), which are downsampled through the final pooling layer to 6 × 6, resulting in a flattened feature vector of 9,216 elements.
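The feature-map sizes quoted above can be checked with the standard output-size formula, out = floor((in - k + 2p)/s) + 1. The padding values below (2 for conv2, 1 for conv3 through conv5) are the conventional AlexNet settings and are an assumption here, since the text does not state them.

```python
def out_size(inp, k, s, p=0):
    """Spatial output size of a conv/pool layer: floor((in - k + 2p)/s) + 1."""
    return (inp - k + 2 * p) // s + 1

x = 227
x = out_size(x, 11, 4)       # conv1: 96 kernels, 11x11, stride 4 -> 55
x = out_size(x, 3, 2)        # pool1: 3x3, stride 2 -> 27
x = out_size(x, 5, 1, p=2)   # conv2: 256 kernels, 5x5, pad 2 -> 27
x = out_size(x, 3, 2)        # pool2 -> 13
x = out_size(x, 3, 1, p=1)   # conv3: 384 kernels, 3x3, pad 1 -> 13
x = out_size(x, 3, 1, p=1)   # conv4: 384 kernels -> 13
x = out_size(x, 3, 1, p=1)   # conv5: 256 kernels -> 13
x = out_size(x, 3, 2)        # pool5 -> 6
print(x, 256 * x * x)        # final 6x6 maps; flattened vector of 9216
```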

This flattened vector passes through three fully connected layers (fc6–fc8). The first two layers (fc6 and fc7) each contain 4,096 neurons that learn dense representations of the high-level abstract features, while the final layer (fc8) performs binary classification between schizophrenia and control groups. In this study, the 4,096-dimensional feature vector obtained from the fc7 layer was retained as the deep feature representation for each image. The feature vectors extracted from the two ROIs (insula and temporal lobe) were then concatenated, forming an 8,192-dimensional composite feature vector for each subject. This comprehensive feature set captures both emotional (insula-related) and cognitive (temporal-related) information relevant to schizophrenia classification.

Biological implications

While our model highlights the relevance of the temporal lobe and insula in schizophrenia, it is essential to bridge these findings with known biological mechanisms and clinical implications. Schizophrenia is a complex disorder associated with structural and functional abnormalities in specific brain regions, including the temporal lobe and insula. These regions are known to play critical roles in auditory processing, language comprehension, and interoceptive awareness, which are often disrupted in schizophrenia.

The temporal lobe, particularly the superior temporal gyrus (STG), has been extensively linked to schizophrenia. Abnormalities in this region are associated with auditory hallucinations and deficits in social cognition, which are hallmark symptoms of the disorder [6, 36]. Our model’s identification of the temporal lobe aligns with previous neuroimaging studies that have reported reduced gray matter volume and altered functional connectivity in this area [5, 37]. These changes may correlate with biomarkers such as decreased N-acetylaspartate (NAA) levels, a marker of neuronal integrity, observed in magnetic resonance spectroscopy (MRS) studies [38].

The insula, a region involved in emotional regulation and self-awareness, has also been implicated in schizophrenia. Dysfunction in the insula is associated with impaired emotional processing and insight, which are commonly observed in patients [39, 40]. Our findings suggest that the insula’s role in schizophrenia may be linked to its connectivity with other brain regions, such as the anterior cingulate cortex (ACC) and the prefrontal cortex (PFC) [41, 42]. These connections are critical for integrating sensory and emotional information, and their disruption may contribute to the symptomatology of schizophrenia [43]. Biomarkers such as altered glutamate levels in the insula, as reported in some studies, could provide further biological validation of our model’s predictions [44, 45].

To validate the biological interpretability of our model, we utilized Gradient-weighted Class Activation Mapping (Grad-CAM) to identify and visualize the regions of interest (ROIs) that played the most significant role in the model’s decision-making process. Furthermore, we correlated these findings with clinical outcomes by analyzing the Scale for the Assessment of Negative Symptoms (SANS) scores for three patients with schizophrenia, categorized by severity: mild-to-moderate, moderate, and severe. This approach allowed us to bridge the model’s predictions with real-world clinical manifestations of the disorder. In addition to the regions previously mentioned, several studies have investigated the correlation between negative symptoms and structural alterations in specific brain regions in patients with schizophrenia. These studies collectively highlight that anhedonia and avolition, as measured by self-rated scales, are inversely related to white matter volume in the left anterior limb of the internal capsule. Furthermore, SANS scores show significant correlations with the vertical and horizontal distances between the corpus callosum and the infrafornix, ventricular area, and structural changes in the frontal lobe and amygdala. These findings emphasize the complex relationship between negative symptoms and structural brain abnormalities in schizophrenia, providing valuable insights into the neurobiological underpinnings of the disorder [4648].

Figure 6 demonstrates the results of Gradient-weighted Class Activation Mapping (Grad-CAM) applied to our proposed model, highlighting the key brain regions that influence the model’s decision-making process. As shown in the figure, the model identifies several regions, including the superior frontal gyrus, inferior frontal gyrus, medial frontal gyrus, cingulate gyrus, superior temporal gyrus (STG), inferior temporal gyrus, amygdala, orbital gyrus, insula, thalamus, hippocampus, caudate nucleus, and the left lentiform nucleus. These regions are strongly associated with negative symptoms of schizophrenia, as captured by our proposed model.

Fig. 6.

Fig. 6

Grad-CAM highlighted the critical regions that significantly influenced the outcomes of our study

Moreover, to further investigate this relationship, we calculated the average Grad-CAM values in the ROIs for a larger sample (N = 100) and computed the Spearman correlation coefficient (ρ) between these Grad-CAM values and SANS scores. The observed Spearman correlation between higher Grad-CAM values (indicating the AI model’s focus on the hippocampus) and worse negative symptoms (higher SANS scores) suggests a moderate alignment between model attention and symptom severity. This supports existing findings that reduced hippocampal volume is linked to cognitive deficits and negative symptoms in schizophrenia. While Grad-CAM reflects model saliency rather than direct volumetric loss, the correlation underscores the clinical significance of hippocampal abnormalities.
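The correlation analysis above can be reproduced with any Spearman implementation. The minimal pure-Python sketch below (average ranks with tie handling, then Pearson correlation of the ranks) is our own illustration, not the authors' code, and omits the significance test:

```python
def rankdata(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

In practice one would pass the per-subject mean Grad-CAM value of an ROI as `x` and the corresponding SANS score as `y`; `scipy.stats.spearmanr` provides the same statistic together with a p-value.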

A weak-to-moderate negative correlation between IFG Grad-CAM values and SANS scores suggests that increased model attention to the IFG is associated with milder negative symptoms. Similarly, a significant negative correlation between SFG Grad-CAM values and SANS scores indicates that preserved SFG integrity or function, as highlighted by the model, corresponds to less severe negative symptoms. Given the SFG’s role within the dorsolateral prefrontal cortex (DLPFC), a hub for executive control, this finding aligns with prior research linking DLPFC dysfunction to avolition and social withdrawal, core negative symptoms of schizophrenia [49].

Conversely, a moderate-to-strong positive correlation between left insula Grad-CAM values and SANS scores suggests that insular abnormalities contribute to negative symptoms. Greater model attention to the left insula correlates with more severe negative symptoms, consistent with evidence that insular hyperactivity during self-referential tasks is associated with symptom severity [50].

A similar pattern is observed in the thalamus, where a moderate-to-strong positive correlation suggests that thalamic abnormalities may play a role in negative symptoms. This finding aligns with studies linking thalamic dysfunction to symptom severity in schizophrenia [51].

In contrast, a moderate negative correlation between cingulate gyrus Grad-CAM values and SANS scores suggests that preserved cingulate integrity may mitigate negative symptoms. Higher model attention to this region corresponds to lower SANS scores, supporting evidence that intact anterior cingulate cortex (ACC) function is associated with better symptom outcomes [52].

No significant correlation was found between STG Grad-CAM values and SANS scores in this cohort.

Table 2 presents Spearman correlation coefficients (ρ) between Grad-CAM-derived activation patterns in specific brain regions and schizophrenia symptom severity. All reported correlations are statistically significant, indicating robust monotonic relationships. Effect sizes are interpreted using conventional cutoffs: weak, |ρ| < 0.30; moderate, 0.30 ≤ |ρ| < 0.60; strong, |ρ| ≥ 0.60.
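The effect-size labelling used for Table 2 can be expressed as a small helper. The cutoffs of 0.30 and 0.60 are the conventional values assumed here, since the exact thresholds were not preserved in the extracted text:

```python
def effect_size_label(rho, weak=0.30, strong=0.60):
    """Label the magnitude of a Spearman coefficient.

    Assumed convention: |rho| < 0.30 weak, 0.30 <= |rho| < 0.60 moderate,
    |rho| >= 0.60 strong.
    """
    magnitude = abs(rho)
    if magnitude < weak:
        return "weak"
    if magnitude < strong:
        return "moderate"
    return "strong"
```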

Table 2.

Spearman correlation between Grad-CAM activation and negative symptoms and effect size synthesis

Spearman Correlation Coefficients by Brain Region
Region Flattening Alogia Avolition Anhedonia Inattentiveness
Hippocampus 0.69 0.74 0.67 0.73 0.30
Superior Frontal −0.35 −0.26 0.00 −0.38 −0.09
Inferior Frontal 0.00 −0.19 −0.23 −0.38 −0.47
Left Insular 0.30 0.40 0.67 0.20 −0.43
Thalamus 0.30 0.43 0.32 0.34 0.16
Cingulate Cortex −0.61 −0.28 −0.32 −0.42 −0.07
Effect Size Synthesis
Symptom Strongest Positive Correlation Strongest Negative Correlation
Flattening Hippocampus (ρ = 0.69) Cingulate Cortex (ρ = −0.61)
Alogia Hippocampus (ρ = 0.74) Cingulate Cortex (ρ = −0.28)
Anhedonia Hippocampus (ρ = 0.73) Cingulate Cortex (ρ = −0.42)
Inattentiveness Hippocampus (ρ = 0.30) Inferior Frontal (ρ = −0.47)

The hippocampus showed strong positive correlations with several core negative symptoms, including alogia (ρ = 0.74), anhedonia (ρ = 0.73), flattening (ρ = 0.69), and avolition (ρ = 0.67). Inattentiveness showed a weaker association (ρ = 0.30). These findings suggest that increased hippocampal activation, as captured by Grad-CAM, is closely linked to the severity of negative symptoms in schizophrenia. This aligns with the hippocampus’s known role in emotional regulation and memory, functions often impaired in the disorder.

The superior frontal gyrus displayed moderate negative correlations with flattening (ρ = −0.35) and anhedonia (ρ = −0.38), and weaker negative correlations with alogia (ρ = −0.26) and inattentiveness (ρ = −0.09). Avolition was not correlated (ρ = 0.00). These results indicate that higher activation in this executive control region may buffer against affective flattening and anhedonia. The lack of correlation with avolition suggests distinct neural pathways for motivational impairments.

The inferior frontal gyrus showed a moderate negative correlation with inattentiveness (ρ = −0.47), and weak negative correlations with anhedonia (ρ = −0.38), avolition (ρ = −0.23), and alogia (ρ = −0.19). No correlation was observed with flattening (ρ = 0.00). This suggests that increased activity in this region may be linked to better attentional control and, to a lesser extent, to improvements in speech and motivation, potentially reflecting its role in language and inhibition.

The left insular cortex was strongly correlated with avolition (ρ = 0.67) and moderately correlated with alogia (ρ = 0.40). Inattentiveness showed a moderate negative association (ρ = −0.43), while flattening (ρ = 0.30) and anhedonia (ρ = 0.20) had weaker positive correlations. This pattern reflects the insula’s dual role in interoception and cognitive regulation: exacerbating avolition and alogia while potentially reducing inattentiveness.

The thalamus showed a moderate positive correlation with alogia (ρ = 0.43), and weak positive correlations with flattening (ρ = 0.30), avolition (ρ = 0.32), anhedonia (ρ = 0.34), and inattentiveness (ρ = 0.16). These findings highlight the thalamus’s diffuse but consistent contribution, possibly reflecting its role as a sensory relay center involved in filtering and integrating information.

The cingulate cortex exhibited a strong negative correlation with flattening (ρ = −0.61), a moderate negative correlation with anhedonia (ρ = −0.42), and weaker negative associations with avolition (ρ = −0.32) and alogia (ρ = −0.28). Inattentiveness showed minimal correlation (ρ = −0.07). This suggests that greater activation in this region may protect against affective blunting and anhedonia, supporting its role in emotional regulation and conflict monitoring.

Our analysis demonstrates that the hippocampus consistently shows the largest effect sizes for negative symptoms in schizophrenia (ρ > 0.60 for 4 of 5 symptoms), confirming its central role in the pathophysiology of the disorder. In contrast, the cingulate cortex exhibits the most protective effects, with the strongest negative correlations observed for symptoms such as flattening and anhedonia. Frontal regions, including the superior and inferior frontal gyri, correlate with symptom reduction, supporting their involvement in top-down regulatory processes. Interestingly, the left insula shows a symptom-specific paradox: high activation in this region worsens avolition but simultaneously improves attention.

The strong positive correlations observed in the hippocampus suggest that hippocampal hyperactivation drives negative symptoms through disrupted emotional memory and contextual processing. The frontal-insufficiency hypothesis is supported by the negative correlations found in the frontal gyri, reflecting impaired executive control in schizophrenia. Moreover, the cingulate cortex appears to act as a compensatory hub, as indicated by its negative correlations, making it a promising target for neuromodulation therapies. Symptom-specific neural networks are also evident; avolition is linked to insular activation, whereas inattentiveness relates more strongly to frontal regions, advocating for symptom-focused treatment strategies.

In summary, this analysis reveals region and symptom-specific neural patterns in schizophrenia. The hippocampus and insula are primarily associated with symptom aggravation (strong positive correlations), whereas the cingulate and frontal cortices appear to mitigate symptom severity (negative correlations). Effect sizes underscore the hippocampus as a primary neural substrate for negative symptoms, while frontal and cingulate regions offer compensatory potential. These insights could guide targeted interventions, such as neurostimulation of the cingulate cortex to reduce flattening.

Our model consistently highlights the temporal lobe and insula, which aligns with prior biological evidence in schizophrenia. However, we emphasize that re-examining these regions through explainable AI methods such as Grad-CAM provides unique added value. To date, no studies have specifically performed segmentation or knowledge localization of the temporal lobe and insula in the context of schizophrenia using artificial intelligence. While previous research has acknowledged the involvement of these areas, efforts to localize and quantify their individual-level contributions using data-driven approaches remain scarce [53].

Through the use of Grad-CAM, we revisited these regions from a novel, interpretable perspective. The visualization of model attention revealed a consistent focus on the temporal lobe and insula—further supporting the biological plausibility of our findings. More importantly, Grad-CAM enabled us to link these attentional patterns to symptom-specific severity scores, contextualizing their relevance within the clinical heterogeneity of schizophrenia. This fusion of model saliency and symptom-based profiling represents a significant step forward, moving beyond traditional hypothesis-driven or volumetric group comparisons and toward individualized insight into neurobiological mechanisms.

Nonetheless, we recognize current limitations, particularly the limited availability of confounding variables in our dataset. While statistically significant, the correlations between Grad-CAM values and SANS scores are primarily exploratory and should be interpreted as preliminary evidence rather than direct clinical biomarkers. Future studies should aim to validate these findings using larger and more diverse cohorts, incorporating richer clinical and demographic information. Such efforts will not only strengthen the robustness and generalizability of our approach but also pave the way for greater clinical applicability in the personalized diagnosis and management of schizophrenia.

Results

This study focuses on accurately diagnosing schizophrenia using 2D brain MRI images, emphasizing the insula and temporal lobe as critical diagnostic regions. The MRI data were sourced from publicly available datasets, including COBRE, MCICShare, and UCLA [23]. Table 1 summarizes the statistical characteristics of the COBRE and MCICShare datasets and the distribution of control and schizophrenia subjects across these datasets. An external dataset, sourced from the dataset created by Soler-Vidal et al. [26], was used to test the reliability of our model.

Image preprocessing steps included denoising with a Non-Local Means (NLM) filter, normalization, and skull stripping to reduce noise and intensity variation from different scanners and to isolate brain tissue. Figure 2 shows the images before and after the denoising process, and Figure 3 presents the final image following the preprocessing techniques. The semantic segmentation network employed a pre-trained DeepLabv3+ architecture based on ResNet50. ResNet-50 was chosen as the network backbone based on a comparative analysis of four pre-trained networks: ResNet-18, ResNet-50, Xception, and MobileNetv2. The results of this analysis, presented in Table 3, demonstrate that ResNet-50 outperformed the other networks in segmenting our regions of interest. Training used 130 2D brain MRI images from the control group within the UCLA dataset, with the regions of interest (ROI), including the insula and temporal lobe, manually labeled. The network was trained for 200 iterations using MATLAB R2022b [54]. Table 3 presents the segmentation accuracy, Dice coefficient, and Jaccard index for the images, evaluated using 5-fold cross-validation. Additionally, Fig. 4(b) illustrates the regions of interest predicted by the network.
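The Dice and Jaccard indices reported in Table 3 follow the standard overlap definitions. Below is a minimal NumPy sketch for two binary masks; it is an illustration, not the authors' MATLAB evaluation code:

```python
import numpy as np

def dice_jaccard(pred, target):
    """Overlap between a predicted and a reference binary mask.

    Dice = 2|A∩B| / (|A| + |B|); Jaccard = |A∩B| / |A∪B|.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum())
    jaccard = inter / union
    return dice, jaccard
```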

Table 3.

a) Semantic segmentation prediction accuracy using pre-trained networks; b) validation metrics of predicted regions using ResNet-50 for preprocessed and unpreprocessed images; c) performance of semantic segmentation evaluated by Dice and Jaccard indices using 5-fold cross-validation

a) Network Insula Accuracy Temporal Lobe Accuracy
ResNet-18 94% 83%
ResNet-50 96% 97%
Xception 75% 70%
MobileNetv2 73% 60%
b) Class Preprocessed Accuracy Unpreprocessed Accuracy
Insula 96 ± 0.12% 80 ± 0.21%
Temporal lobe 97 ± 0.15% 85 ± 0.06%
c) Class (5-fold cross-validation) Dice(%) Jaccard(%)
Insula Inline graphic Inline graphic
Temporal lobe Inline graphic Inline graphic

After pooling, the dataset, consisting of 249 schizophrenia (SZ) subjects and 319 healthy controls (HC), was randomly divided at the subject level into 70% for training (398 subjects), 15% for validation (85 subjects), and 15% for testing (85 subjects). The classification procedure was conducted using MATLAB, and to ensure the reliability and reproducibility of the results, the entire training and evaluation process was repeated 40 independent times with different random initializations and data shuffles.
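The subject-level 70/15/15 partition can be sketched as follows (a Python illustration of the splitting logic; the actual pipeline was implemented in MATLAB). Assigning whole subjects, rather than individual slices, to partitions is what prevents a subject's augmented slices from leaking across sets:

```python
import numpy as np

def subject_level_split(subject_ids, train=0.70, val=0.15, seed=0):
    """Shuffle unique subject IDs and split 70/15/15 so that all slices
    from one subject land in exactly one partition."""
    rng = np.random.default_rng(seed)
    subjects = np.array(sorted(set(subject_ids)))
    rng.shuffle(subjects)
    n = len(subjects)
    n_train = int(round(train * n))
    n_val = int(round(val * n))
    return (set(subjects[:n_train]),
            set(subjects[n_train:n_train + n_val]),
            set(subjects[n_train + n_val:]))
```

With the 568 pooled subjects this yields the 398/85/85 partition reported above.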

The averaged results across the 40 runs are summarized in Table 4, which reports accuracy, sensitivity, specificity, precision, recall, F1-score, and the area under the ROC curve (AUC) with standard deviations. In this table, item (a) presents the model performance using tuned hyperparameters, (b) shows the results from a K-fold cross-validation setup with K = 10 (representing the number of folds used for validation), (c) corresponds to the performance achieved on the external validation dataset, (d) represents the performance on the training set, and (e) denotes the performance on the validation set. This structured comparison demonstrates the model’s stability across multiple experimental settings.
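The metrics in Table 4 all derive from the four confusion-matrix counts. The small reference implementation below (fractions rather than percentages) is our own illustration, with hypothetical counts used only for checking:

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return dict(accuracy=accuracy, sensitivity=sensitivity,
                specificity=specificity, precision=precision, f1=f1)
```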

Table 4.

Model performance in the current study for schizophrenia prediction

Accuracy (95% CI) Sensitivity (95% CI) Specificity (95% CI) Precision Recall F1-Score AUC (95% CI)
a) 98.02 ± 1.62 [96.4–99.5] 98.95 ± 1.82 [97.1–99.8] 97.06 ± 2.06 [95.0–98.8] 97.65 ± 2.04 98.95 ± 1.82 98.20 ± 1.69 0.9917 ± 0.01 [0.987–0.996]
b) 99.01 ± 0.14 [98.8–99.3] 98.00 ± 0.17 [97.7–98.3] 97.08 ± 0.22 [96.7–97.4] 97.34 ± 0.14 98.00 ± 0.17 97.66 ± 0.14 0.9891 ± 0.001 [0.987–0.991]
c) 96.20 ± 0.20 [95.8–96.6] 97.60 ± 0.15 [97.3–97.9] 98.30 ± 0.30 [97.8–98.8] 97.54 ± 0.20 97.60 ± 0.15 97.56 ± 0.11 0.980 ± 0.12 [0.972–0.988]
d) 100.0 ± 0.10 [99.8–100] 98.8 ± 0.15 [98.5–99.2] 99.1 ± 0.13 [98.8–99.4] 99.0 ± 0.14 98.8 ± 0.15 98.9 ± 0.13 0.994 ± 0.004 [0.986–0.999]
e) 98.2 ± 0.03 [97.9–98.5] 97.9 ± 0.04 [97.5–98.3] 98.4 ± 0.03 [98.1–98.7] 98.1 ± 0.04 97.9 ± 0.04 98.0 ± 0.03 0.986 ± 0.006 [0.975–0.993]

Our proposed model achieved strong and consistent performance, with an average accuracy of 98.02% ± 1.62, sensitivity of 98.95% ± 1.82, specificity of 97.06% ± 2.06, and an AUC of 0.9917 ± 0.01, confirming its robustness in distinguishing SZ from HC subjects. Figure 7 illustrates the receiver operating characteristic (ROC) curves for both the internal test set (a) and the external validation set (b). The proposed model achieved excellent discriminative performance, with an AUC of 0.99 for the internal dataset and 0.98 for the external dataset. The near-perfect AUC value in the internal evaluation indicates the model’s strong ability to distinguish schizophrenia (SZ) patients from healthy controls (HC), reflecting the effectiveness of the region-specific design focusing on the temporal lobe and insula. Moreover, the high AUC obtained on the independent external dataset confirms the model’s generalization capability and robustness against scanner- or site-related variations. These results collectively demonstrate that the proposed framework achieves high diagnostic accuracy and generalizability.

Fig. 7.

Fig. 7

ROC curves for schizophrenia prediction: a) internal test set and b) external validation set

To ensure methodological rigor and eliminate the possibility of data leakage, we conducted all data partitioning strictly at the subject level. In this process, each subject’s MRI slices and all corresponding augmented samples were grouped together and assigned exclusively to a single subset (training, validation, or testing). This ensured that data from any individual subject did not appear in more than one partition. Furthermore, we applied a stratified random split design to maintain a balanced ratio of schizophrenia (SZ) and healthy control (HC) subjects across all subsets and to preserve a proportional site-wise distribution.

After applying the subject-level stratified random split strategy, the model was retrained and evaluated under the same protocol, yielding realistic and generalizable performance: accuracy of 97.8% ± 1.6, sensitivity of 98.6% ± 1.8, specificity of 96.9% ± 2.0, and AUC of 0.987 ± 0.01.

Since the accuracy criterion alone cannot differentiate between false negatives and false positives, we also calculated precision and recall. In addition, to assess the suitability of our proposal, we computed the F1-score; the results demonstrate the effectiveness of the proposed AlexNet-based model for schizophrenia diagnosis. The calculated F1-score is shown in Table 4. Furthermore, the results of our approach for each case introduced in the dataset section, including “Schizophrenia (broad)”, “Schizophrenia (strict)”, and “Schizoaffective”, have been calculated and are presented in Table 5. Each of these experiments was run 40 times, and the accuracy, sensitivity, and specificity are reported as mean ± SD.

Table 5.

Performance of the model for each diagnostic subgroup

Detected case ACC (95% CI) SEN (95% CI) SP (95% CI) Precision (95% CI) Recall (95% CI) F1 (95% CI) AUC (95% CI)
Schizophrenia (broad) 97.09 ± 0.019 [96.7–97.5] 97.27 ± 0.03 [96.8–97.7] 96.75 ± 0.04 [96.2–97.3] 97.48 ± 0.03 [97.0–97.9] 97.27 ± 0.03 [96.8–97.7] 97.37 ± 0.02 [97.0–97.7] 0.984 ± 0.006 [0.973–0.992]
Schizophrenia (strict) 98.05 ± 0.02 [97.7–98.4] 96.57 ± 0.03 [96.1–97.0] 99.69 ± 0.01 [99.5–99.8] 98.11 ± 0.02 [97.7–98.4] 96.57 ± 0.03 [96.1–97.0] 97.33 ± 0.02 [96.9–97.7] 0.989 ± 0.005 [0.981–0.995]
Schizoaffective 98.23 ± 0.018 [97.9–98.5] 97.07 ± 0.03 [96.6–97.5] 99.73 ± 0.009 [99.6–99.8] 98.24 ± 0.02 [97.9–98.5] 97.07 ± 0.03 [96.6–97.5] 97.65 ± 0.02 [97.3–98.0] 0.986 ± 0.007 [0.974–0.994]

The findings of this study were obtained using the best-performing hyperparameters, identified through an experimental setup that compared classification performance across various configurations. Learning rates tested ranged from 0.0001 to 0.01. Three methods were evaluated for weight initialization: Glorot, He, and narrow-normal. The narrow-normal initializer samples weights independently from a normal distribution with a mean of zero and a standard deviation of 0.01, which is crucial for effective network training. Similarly, three bias initializers were compared: zeros, ones, and narrow-normal. The epoch and batch size were predefined and fixed for this experiment. Both Stochastic Gradient Descent with Momentum (SGDM) and Adam optimizers were tested for optimization. This setup resulted in 342 different hyperparameter combinations, all of which were trained and evaluated. The configuration with the highest validation accuracy was selected for the study; these results are summarized in Table 6, along with the chosen hyperparameters.
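The search space described above can be enumerated programmatically. In the sketch below, the learning-rate grid is an assumption pieced together from the values visible in Table 6, chosen so that 19 rates × 3 weight initializers × 3 bias initializers × 2 optimizers gives the 342 reported trials; the training call itself is only indicated as a comment:

```python
from itertools import product

# Assumed learning-rate grid spanning 0.0001-0.01 (19 values).
learn_rates = [0.0001, 0.0002, 0.0003, 0.0004, 0.0005, 0.0006, 0.0007,
               0.0008, 0.0009, 0.001, 0.002, 0.003, 0.004, 0.005,
               0.006, 0.007, 0.008, 0.009, 0.01]
weight_inits = ["glorot", "he", "narrow-normal"]
bias_inits = ["zeros", "ones", "narrow-normal"]
optimizers = ["sgdm", "adam"]

# Cartesian product of all configurations: 19 * 3 * 3 * 2 = 342 trials.
trials = list(product(learn_rates, weight_inits, bias_inits, optimizers))

# for lr, w_init, b_init, opt in trials:
#     val_acc = train_and_validate(lr, w_init, b_init, opt)  # hypothetical call
#     ...keep the configuration with the highest validation accuracy
```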

Table 6.

Tuning hyperparameters results

Trial Hyperparameters Metrics
LearnRate WeightsInitializer biasInitializer Optimizer Tr. ACC Tr. Loss Valid. ACC Valid. Loss
1 0.0001 he zeros sgdm 92.18 0.19 95.1 0.15
2 0.0002 he zeros sgdm 95.31 0.10 94.46 0.11
3 0.0003 he zeros sgdm 97.65 0.0775 96.09 0.11
4 0.0004 he zeros sgdm 96.87 0.074 96.94 0.10
5 0.0005 he zeros sgdm 98.43 0.051 96.74 0.09
20 0.0001 glorot zeros sgdm 89.66 0.21 94.46 0.15
21 0.0002 glorot zeros sgdm 94.53 0.13 95.11 0.11
22 0.0003 glorot zeros sgdm 97.65 0.10 97.6 0.11
40 0.0002 narrow-normal zeros sgdm 95.31 0.12 96.41 0.1
60 0.0003 he ones sgdm 96.09 0.09 95.43 0.11
70 0.004 he ones sgdm 92.18 0.08 95.82 0.2
106 0.002 narrow-normal ones sgdm 100 0.0069 96.41 0.14
123 0.0009 he narrow-normal sgdm 99.2 0.02 98.1 0.09
172 0.0001 he zeros adam 100 0.00001 98.24 0.02
175 0.0004 he zeros adam 95.31 0.088 94.13 0.2
201 0.002 glorot zeros sgdm 55.46 0.68 50.81 0.7
267 0.0001 narrow-normal ones sgdm 97.65 0.05 96.09 0.1
300 0.006 he narrow-normal adam 98.43 0.44 95.11 0.11
310 0.0006 glorot narrow-normal adam 98.43 0.03 96.74 0.1
320 0.007 glorot narrow-normal adam 97.65 0.05 96.09 0.1
342 0.01 narrow-normal narrow-normal adam 96.09 0.09 95.43 0.11
Epoch Batch-Size LearnRate WeightsInitializer BiasInitializer Optimizer
20 64 0.0001 he zeros adam

Additionally, K-fold cross-validation was conducted to assess the robustness of the model with the tuned hyperparameters, with a value of K = 10 chosen for this study. The accuracy, sensitivity, and specificity metrics, reported as means and standard deviations, are presented in Table 4. The high consistency of these metrics across the K-fold cross-validation demonstrates the effectiveness and reliability of the selected hyperparameters for this study. When tested on an external dataset, our model continued to perform admirably, exhibiting high accuracy, sensitivity, specificity, precision, recall, and F1-score. Although there was a slight drop in accuracy and AUC compared to the tuned hyperparameter testing, the model retained robust performance and demonstrated strong generalization ability to unseen data. The higher specificity and stable recall indicate that the model remains effective at identifying true positives and avoiding false positives in an external setting.

To further evaluate the generalizability of the proposed model across independent acquisition sites and scanners, we performed a Leave-One-Site-Out (LOSO) cross-validation. In this analysis, data from one site were held out entirely for testing, while the model was trained and validated using the remaining two datasets. This procedure was repeated three times so that each site (COBRE, MCICShare, UCLA) served once as the unseen test domain. Importantly, LOSO evaluation was conducted using subject-level partitioning, ensuring that all slices from each subject were confined to a single site-specific fold to prevent inter-site or intra-subject data leakage.
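The LOSO protocol reduces to a simple fold generator over the three sites. The sketch below illustrates only the fold structure; model training and the subject-level slice grouping are not shown:

```python
def leave_one_site_out(sites):
    """Yield (training_sites, held_out_site) pairs: each site serves as
    the unseen test domain exactly once."""
    for held_out in sites:
        training = [s for s in sites if s != held_out]
        yield training, held_out

folds = list(leave_one_site_out(["COBRE", "MCICShare", "UCLA"]))
```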

As shown in Table 7, the LOSO experiments demonstrated consistent and high classification performance across all sites, with AUC values ranging between 0.962 and 0.983, indicating strong cross-site robustness. Notably, when trained on COBRE and MCICShare and tested on UCLA data, the model achieved an accuracy of 96.5% and AUC of 0.98, confirming the ability of the framework to generalize effectively to previously unseen scanner protocols and demographic distributions. Similarly, testing on COBRE and MCICShare yielded comparable accuracies of 95.3% and 94.8%, respectively. These results further support the stability of our approach and its resilience to inter-dataset variability, scanner differences, and acquisition heterogeneity.

Table 7.

Leave-one-site-out (LOSO) evaluation results demonstrating cross-site generalization performance

Training Sites Test Site Accuracy (%) Sensitivity (%) Specificity (%) Precision (%) F1-Score (%) AUC
COBRE + MCICShare UCLA Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
MCICShare + UCLA COBRE Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
COBRE + UCLA MCICShare Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Average Inline graphic SD Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic

To evaluate the effect of the color-to-grayscale enhancement, we performed an additional experiment in which the classification pipeline was executed without applying the enhancement step. In this baseline setting, the model achieved an average accuracy of 93.4 ± 1.9%, sensitivity of 94.2 ± 1.8%, specificity of 92.5 ± 2.0%, and an AUC of 0.961 ± 0.010 across 40 independent runs. These results indicate that, although the model maintained acceptable performance, the absence of the enhancement step reduced its ability to capture subtle contrast variations and fine-grained structural features relevant to schizophrenia classification.

While deep learning models are typically designed to work with raw, unprocessed images, this study utilized various preprocessing steps to enhance model performance. These steps contributed to achieving higher accuracy, particularly during the segmentation process. Peak Signal-to-Noise Ratio (PSNR) and Signal-to-Noise Ratio (SNR) analyses were performed to evaluate the impact of preprocessing on image quality. For denoised images, the PSNR and SNR values were 37.11 ± 1.18 and 24.3 ± 1.35, respectively, while for noisy images, the values were 10.047 ± 0.15 and 4.2 ± 0.75. The preprocessing steps implemented in this study significantly improved image quality. We conducted a comparative analysis of segmented unprocessed and preprocessed images to validate these improvements. The results in Table 3 demonstrate that preprocessing notably increased segmentation accuracy. Specifically, we achieved 96% and 97% accuracy for the insula and temporal lobe, respectively, compared to just 80% and 85% for the unprocessed images. Additionally, a comparison with state-of-the-art methods that utilize EEG and neuroimaging data, summarized in Table 8, confirms the superiority of our approach in terms of diagnostic accuracy and robustness. These findings underscore the potential of the model for reliable detection of schizophrenia and its applicability across a variety of datasets. Our proposed region-specific deep learning framework demonstrates promising potential for improving schizophrenia diagnosis through precise segmentation and classification of key brain regions. However, these findings should be interpreted as preliminary. Further multicenter, prospective studies involving larger and more diverse populations are required to validate the generalizability and clinical applicability of this approach.
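The PSNR and SNR figures quoted above follow the usual decibel definitions. Below is a NumPy sketch of both; it is our illustration, and the `data_range` of 255 is an assumption for 8-bit images:

```python
import numpy as np

def psnr(reference, test, data_range=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(test, float)) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def snr(reference, test):
    """Signal-to-noise ratio in dB: signal power over error power."""
    ref = np.asarray(reference, float)
    noise = ref - np.asarray(test, float)
    return 10.0 * np.log10(np.mean(ref ** 2) / np.mean(noise ** 2))
```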

Table 8.

Previous studies

Study Modalities Features Classifier Performance
Rajesh and Sunil Kumar (2024) [18] EEG - LogitBoost classifier ACC: 91.66%
Kanyal (2024) [16] MRI/fMRI MRI/fMRI and SNP DL ACC: 79.01%
Srinivasan et al. (2024) [19] Multi-channel EEG - CNN and LSTM ACC: 98.2%
Agarwal and Singhal (2023) [10] EEG - SVM, KNN, BT, and DT ACC: 99.25%
Zhang et al. (2023) [55] sMRI - 3D CNN AUC: 98.7%
Khare and Bajaj (2022) [56] EEG - Optimized extreme learning machine classifier ACC: 92.93%
De Rosa et al. (2022) [57] Post-mortem brain HP and DLPFC RF AUC: 95%
Febles et al. (2022) [58] EEG - Multi-kernel SVM ACC: 83%
Lei et al. (2022) [13] sMRI/fMRI GM and WM SVM ACC: 90.83%
Algumaei et al. (2022) [59] fMRI HP, TL and FL SVM ACC: 98.57%
Tanveer et al. (2022) [14] sMRI GM and WM SVM and RF ACC: 80.71%
Luján et al. (2022) [60] EEG - SVM, Bayesian LDA, Gaussian NB, KNN, AdaBoost, and RBF ACC: 93.40%
Zandbagle et al. (2022) [61] EEG - KNN, LDA and SVM ACC: 89.21%
Aksöz et al. (2022) [11] EEG - KNN, ANN and SVM ACC: 93.9%
Lin et al. (2021) [17] fMRI/DTI Whole brain Multi-kernel SVM ACC: 95.33%
Shi et al. (2021) [62] sMRI/fMRI GM LDA ACC: 93.75%
Azizi et al. (2021) [12] EEG - LR ACC: 97%
Du et al. (2020) [63] EEG - Non-linear dynamics and functional brain network ACC: 76.77%
Vieira et al. (2020) [64] sMRI GMV/cortical thickness SVM, KNN, LR, and DNN ACC: 70%
Yang et al. (2020) [65] fMRI Whole brain SVM ACC: 99.46%
Yassin et al. (2020) [15] sMRI Cortical thickness, surface area and subcortical volume SVM, RF, LG, AB, DT, and KNN ACC: 75%

Experimental results

An experiment was conducted to evaluate the proposed method against other state-of-the-art classification techniques, aiming to demonstrate its strong potential for accurately diagnosing schizophrenia. The proposed method was compared with several advanced approaches, including CNN architectures integrated with attention mechanisms such as CBAM, transformer-based methods utilizing Vision Transformers (ViT), and hybrid CNN-RNN models. The metrics employed in this comparison include training accuracy, validation accuracy, and the size of the network. These metrics were used to assess and contrast the performance of the different methods. The details of each model are described below:

ResNet50-CBAM

ResNet50-CBAM is an enhanced version of the ResNet50 architecture that integrates the Convolutional Block Attention Module (CBAM). ResNet50, a 50-layer deep convolutional neural network, is renowned for its residual learning framework, which mitigates the vanishing gradient problem and enables the training of very deep networks. CBAM improves feature representation by sequentially applying channel and spatial attention: the channel attention mechanism focuses on "what" is important in the feature map, while the spatial attention mechanism identifies "where" the important regions are located. The Channel Attention module aggregates spatial information via global average pooling and max pooling, processes the pooled descriptors with a shared multi-layer perceptron (MLP) to produce channel-wise attention weights, and applies these weights to the input feature map to accentuate the most significant channels. The Spatial Attention module concatenates the outputs of average pooling and max pooling along the channel dimension, applies a convolution to generate a spatial attention map, and applies that map to the feature map to emphasize relevant spatial regions. Figure 8 illustrates the architecture of the channel attention module, the spatial attention module, and the overall CBAM structure. In our study, CBAM is integrated after the convolutional layers within each residual block of ResNet50, allowing the model to adaptively refine feature maps at multiple levels and better capture the discriminative features needed for accurate schizophrenia classification.
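The channel- and spatial-attention computations described above can be sketched in NumPy as follows. This is an illustrative toy implementation with random weights, not the trained CBAM used in the study; the tensor shapes, reduction ratio, and 7×7 kernel size are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """x: (C, H, W). Global avg- and max-pooled descriptors pass through a
    shared two-layer MLP (w1, w2); the summed outputs give one weight per channel."""
    avg = x.mean(axis=(1, 2))                    # (C,)
    mx = x.max(axis=(1, 2))                      # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0)   # ReLU in the hidden layer
    weights = sigmoid(mlp(avg) + mlp(mx))        # (C,)
    return x * weights[:, None, None]

def spatial_attention(x, kernel):
    """x: (C, H, W). Avg- and max-pool along the channel axis, stack the two
    maps, and convolve with a (2, k, k) kernel to get one attention map."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    H, W = x.shape[1:]
    att = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            att[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return x * sigmoid(att)[None]

def cbam(x, w1, w2, kernel):
    # Channel attention first, then spatial attention, as in CBAM
    return spatial_attention(channel_attention(x, w1, w2), kernel)

rng = np.random.default_rng(1)
C, H, W = 8, 16, 16
x = rng.normal(size=(C, H, W))
w1 = rng.normal(scale=0.1, size=(C // 2, C))   # reduction ratio 2 (assumed)
w2 = rng.normal(scale=0.1, size=(C, C // 2))
kernel = rng.normal(scale=0.1, size=(2, 7, 7))
y = cbam(x, w1, w2, kernel)
print(y.shape)
```

In the actual model these operations run inside each residual block on learned weights; the sketch only shows how the two attention maps reweight a feature tensor.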

Fig. 8. Architecture of the channel attention and spatial attention modules and CBAM within a ResBlock

Vision transformers (ViT)

The Vision Transformer (ViT) is a revolutionary architecture that adapts the Transformer model, originally developed for natural language processing (NLP), to computer vision tasks. Unlike traditional convolutional neural networks (CNNs), which rely on convolutional layers to extract hierarchical features, ViT utilizes the self-attention mechanism to process images, allowing it to capture global relationships between image patches. The ViT architecture consists of two main components: the backbone and the head. The backbone processes input images and generates a vector of features, while the head is responsible for making predictions by mapping the encoded feature vectors to prediction scores. In our research, we fine-tuned the ViT model by replacing the original head with a new classification head tailored to the specific requirements of our study, enabling the model to effectively classify data for our task. Figure 9 provides a visual representation of the Vision Transformer (ViT) architecture.
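The patch-tokenization step and the replaced classification head can be illustrated as follows. This is a minimal NumPy sketch; the 16-pixel patch size, two-class head, and random weight shapes are illustrative assumptions, not the fine-tuned model's actual configuration.

```python
import numpy as np

def image_to_patches(img, patch):
    """Split an (H, W, C) image into flattened non-overlapping patch tokens,
    as ViT does before linear embedding. Returns (num_patches, patch*patch*C)."""
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0
    rows, cols = H // patch, W // patch
    return (img.reshape(rows, patch, cols, patch, C)
               .transpose(0, 2, 1, 3, 4)
               .reshape(rows * cols, patch * patch * C))

def linear_head(features, w, b):
    """A replacement classification head: one linear layer mapping an encoded
    feature vector to class scores (here 2 classes, e.g. SZ vs. HC)."""
    return features @ w + b

img = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
tokens = image_to_patches(img, patch=16)
print(tokens.shape)   # 14x14 = 196 patches, each 16*16*3 = 768 values

rng = np.random.default_rng(0)
w = rng.normal(scale=0.01, size=(768, 2))
b = np.zeros(2)
scores = linear_head(tokens.mean(axis=0), w, b)   # stand-in for the encoder output
print(scores.shape)
```

Fine-tuning for a new task, as described above, keeps the pretrained backbone and trains only (or mainly) such a new head on the target labels.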

Fig. 9. Architecture of ViT

ResNet50-GRU

The ResNet50-GRU model is a powerful hybrid architecture that combines the strengths of ResNet50, a deep convolutional neural network (CNN), and Gated Recurrent Units (GRUs), a type of recurrent neural network (RNN). This integration leverages the spatial feature extraction capabilities of CNNs with the temporal modeling strengths of RNNs, making it particularly effective for tasks that involve both spatial and sequential data, such as video analysis, time-series classification, or medical image sequences. GRUs are a variant of RNNs designed to handle sequential data more efficiently than traditional RNNs. They incorporate gating mechanisms to control the flow of information, allowing the model to capture long-term dependencies in sequential data. In the ResNet50-GRU model, the ResNet50 backbone processes each input frame or image independently, extracting high-dimensional feature vectors that encapsulate spatial information. These feature vectors, representing the spatial characteristics of each frame, are then sequentially fed into the GRU network. The GRU models the temporal relationships between frames or time steps by updating its hidden state at each step, enabling it to capture patterns and dependencies over time. The final hidden state of the GRU, which integrates both spatial and temporal information, is passed to a fully connected layer or classification head to produce the final output, such as class probabilities or regression values. This combination of spatial and temporal modeling makes the ResNet50-GRU model highly effective for tasks involving sequential data. Figure 10 shows the architecture of ResNet50-GRU model.
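The GRU gating update described above can be written out explicitly. This is a toy NumPy sketch with random weights and no biases; the feature and hidden sizes are illustrative, not those of the compared ResNet50-GRU model (whose input features would come from the CNN backbone).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h, x, params):
    """One GRU update. h: (H,) hidden state; x: (D,) input feature vector.
    z: update gate, r: reset gate, n: candidate state."""
    Wz, Uz, Wr, Ur, Wn, Un = params
    z = sigmoid(Wz @ x + Uz @ h)        # how much of the old state to keep
    r = sigmoid(Wr @ x + Ur @ h)        # how much of the old state feeds the candidate
    n = np.tanh(Wn @ x + Un @ (r * h))  # candidate state
    return (1 - z) * n + z * h

rng = np.random.default_rng(0)
D, H = 32, 16   # toy sizes; a ResNet50 fc feature would give a much larger D
params = tuple(rng.normal(scale=0.1, size=s)
               for s in [(H, D), (H, H)] * 3)

h = np.zeros(H)
sequence = rng.normal(size=(10, D))   # e.g. CNN features from 10 frames/slices
for x in sequence:
    h = gru_step(h, x, sequence_params := params)
print(h.shape)
```

The final hidden state `h`, which accumulates information across the sequence, is what a classification head would consume, mirroring the pipeline in the paragraph above.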

Fig. 10. Architecture of the ResNet50-GRU model

The results presented in Table 9 not only compare the performance of different models in terms of training accuracy (Tr. Acc) and validation accuracy (Val. Acc) but also emphasize the critical aspects of computational cost and deployability across various systems. This consideration is particularly crucial in real-world scenarios where hardware resources are limited, and models must run efficiently on a wide range of devices, from high-performance servers to edge devices like mobile phones and IoT systems.

Table 9. Results of the experimental study

| Approach | Tr. Acc (%) | Val. Acc (%) | Size |
|---|---|---|---|
| ResNet50-CBAM | 96.2 ± 0.09 | 95.7 ± 0.3 | 200 MB |
| ViT | 91.02 ± 0.12 | 90.1 ± 0.02 | 670 MB |
| ResNet50-GRU | 97.9 ± 0.06 | 97 ± 0.04 | 104 MB |
| Proposed method | 100 ± 0.1 | 98.2 ± 0.03 | 95 MB |

The proposed method achieves the highest performance, with a training accuracy of 100% and a validation accuracy of 98.2%, outperforming all other models. Despite its superior accuracy, it maintains a remarkably compact size of just 95MB, making it the smallest model in this comparison. This reduced size translates into lower memory requirements and minimal bandwidth consumption for model transmission, making it ideal for deployment on edge devices and resource-constrained systems. Additionally, the lightweight nature of the model significantly reduces computational costs during inference, enabling smooth execution even on less powerful hardware.

In contrast, ResNet50-GRU delivers strong performance with a training accuracy of 97.9% and a validation accuracy of 97%, but at a slightly larger size of 104MB. While still relatively compact, its increased size compared to the proposed method may impose additional storage and computational demands, though it remains suitable for a broad range of devices.

ResNet50-CBAM, with a training accuracy of 96.2% and a validation accuracy of 95.7%, offers solid results but comes with a substantially larger model size of 200MB. This increased size could pose challenges for deployment on edge devices with limited memory, as well as higher computational costs, requiring more powerful hardware for efficient execution.

The Vision Transformer (ViT), despite its advanced architecture, exhibits the weakest performance in this comparison, achieving a training accuracy of 91.02% and a validation accuracy of 90.1%. More critically, it has an exceptionally large model size of 670MB, making it nearly seven times larger than the proposed method. The high computational cost and substantial memory requirements of ViT make it impractical for deployment on resource-constrained systems, as it necessitates powerful GPUs for effective execution. These findings highlight the efficiency of the proposed method, which not only achieves state-of-the-art accuracy but also remains highly deployable due to its compact size and lower computational demands.

Discussion

This study introduced a novel, domain-specific framework for the automated detection of schizophrenia using structural MRI, combining high-precision semantic segmentation and lightweight classification within a two-stage pipeline. The use of DeepLabv3+ for the segmentation of the temporal lobe and insula, regions strongly implicated in schizophrenia, represents a significant methodological advancement. To our knowledge, this is the first study to employ DeepLabv3+ with contrast-enhanced imaging for targeted segmentation in schizophrenia research. The integration of segmentation and classification stages yielded a high overall diagnostic performance (accuracy = 98.02 ± 1.62%, AUC = 0.9917 ± 0.01), exceeding the results reported in several previous deep learning studies using MRI-based classification [55, 66, 67].

Compared with prior work, our approach achieved higher classification accuracy and AUC while maintaining a lightweight and interpretable design. For instance, Goel et al. [66] reported 96.5% accuracy using a ResNet50-based ensemble, while Zhang et al. [55] achieved AUC = 0.987 with whole-brain 3D CNNs. The superior performance of our framework likely stems from its region-specific focus and contrast enhancement, which amplify diagnostically relevant structural cues while suppressing noise from irrelevant brain areas. By localizing the analysis to the temporal lobe and insula–regions consistently associated with structural and functional abnormalities in schizophrenia [5, 6, 68]—our method captures neurobiological substrates that are both clinically and mechanistically meaningful. This localization strategy aligns with the principles of explainable AI in neuroimaging, emphasizing knowledge-guided feature extraction rather than indiscriminate global modeling.

The biological plausibility of our findings is supported by converging evidence. The temporal lobe, particularly the superior temporal gyrus, has been repeatedly implicated in auditory hallucinations and language disturbances, while the insula contributes to sensory integration, self-awareness, and emotional processing—all functions commonly disrupted in schizophrenia [39]. Our Grad-CAM analyses further confirmed that model attention was concentrated on these regions, reinforcing the link between network focus and pathophysiologically relevant structures. Together, these findings suggest that anatomically constrained and contrast-enhanced models can achieve both high accuracy and interpretability, addressing a key limitation of black-box deep learning approaches in psychiatry.

From a computational perspective, the combination of DeepLabv3+ and AlexNet enabled strong diagnostic performance with minimal complexity. The proposed model (95 MB) demonstrated comparable or superior results to larger architectures (e.g., ResNet50-GRU, ViT) while maintaining a smaller parameter footprint, which is advantageous for reproducibility and clinical deployment. Cross-validation and external testing further confirmed model robustness and generalization. The consistency between internal and external results underscores the potential of lightweight, region-specific models for real-world implementation.

The application of a 2D CNN for schizophrenia classification has several limitations. Manual labeling of a small dataset of 130 MRI slices was performed by a single expert rater. While this ensured consistency, it represents a limitation, as inter-rater reliability was not assessed. Future work should incorporate multiple raters and report metrics such as the intraclass correlation coefficient (ICC) to strengthen segmentation reliability. The reliance on 2D MRI slices, rather than full 3D volumes, limits the spatial context available to the model and may restrict its ability to capture complex structural patterns. Additionally, scanner-related variability across datasets could influence image quality and model performance despite harmonized acquisition parameters. The limited number of images used for segmentation also constrains model generalizability; in future studies, we plan to increase the dataset size to improve robustness and overall performance. To further enhance feature extraction and generalization, additional high-quality labeled datasets, along with data augmentation and effective image synthesis techniques, should be incorporated. Future research will also explore leveraging 3D MRI images to provide more comprehensive spatial information and improve model performance.

In this study, we adopted a lightweight classification model (AlexNet, 95 MB) as part of a modular pipeline aimed at balancing high diagnostic performance with clinical interpretability and practical deployment. While larger and more complex architectures such as ResNet50-GRU, ResNet50-CBAM, and Vision Transformers have shown promise in other domains, their utility in schizophrenia research is limited by several factors. First, structural neuroimaging datasets for psychiatric disorders are typically small and heterogeneous, which makes deep, highly parameterized models prone to overfitting. To this end, we implemented 10-fold cross-validation, which consistently showed high accuracy and low variance across folds, supporting the robustness and generalizability of our approach. Furthermore, the model was tested on an independent external dataset, where it maintained strong classification performance (AUC = 0.98, accuracy = 96.2%), indicating that it generalized well beyond the original training distribution. These findings suggest that lightweight architectures, when combined with thoughtful pre-processing and domain-specific constraints, can achieve competitive performance without the complexity of larger models.
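The 10-fold splitting procedure can be sketched as a generic index-partitioning routine. This NumPy example only illustrates the fold mechanics; any site- or subject-level stratification used in the study is not reproduced here.

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) index pairs for k-fold cross-validation
    over a shuffled range of n_samples."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

# Demonstration: 10 folds over 1,200 samples (roughly the slice count here)
splits = list(kfold_indices(1200, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each of the 10 iterations trains on nine folds and validates on the held-out tenth, and the per-fold accuracies are then averaged and their variance reported, as in the paragraph above.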

We also emphasize that the lightweight design does not stand in isolation but complements other overfitting mitigation strategies employed in this study. Specifically, we used data augmentation to synthetically expand the training set and increase variability, along with region-specific segmentation using DeepLabv3+ and contrast enhancement during grayscale conversion to improve signal clarity and reduce irrelevant noise. These steps collectively form a multi-layered regularization strategy that reduces the risk of the model fitting to spurious patterns while maintaining a clear focus on biologically and clinically relevant brain structures.
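The augmentation step mentioned above can be sketched as simple label-preserving transforms applied to each slice. This is an illustrative NumPy example; the specific transforms (flip, 90-degree rotation, mild Gaussian noise) and their parameters are assumptions, not the study's exact augmentation pipeline.

```python
import numpy as np

def augment(slice2d, rng):
    """Return a randomly augmented copy of a 2D MRI slice: optional horizontal
    flip, random 90-degree rotation, and small additive Gaussian noise."""
    out = slice2d.copy()
    if rng.random() < 0.5:
        out = np.fliplr(out)
    out = np.rot90(out, k=int(rng.integers(0, 4)))
    out = out + rng.normal(0, 0.01 * out.std(), size=out.shape)
    return out

rng = np.random.default_rng(42)
slice2d = rng.uniform(size=(128, 128))
augmented = [augment(slice2d, rng) for _ in range(8)]   # 8 variants per slice
print(len(augmented), augmented[0].shape)
```

Expanding each training slice into several such variants increases variability in the training set, which is the regularizing effect the paragraph above describes.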

While the final feature representation extracted by AlexNet consisted of 4096 dimensions per region of interest, we carefully considered the balance between feature dimensionality and sample size to minimize the risk of overfitting. The total dataset incorporated over 1,200 MRI slices from three publicly available repositories (COBRE, MCICShare, and UCLA), complemented by extensive data augmentation to expand variability across the training set. Moreover, the choice of a lightweight classifier (AlexNet, 95 MB) substantially reduced the number of trainable parameters compared to deeper architectures such as ResNet50-GRU or Vision Transformers, thus maintaining an appropriate ratio between sample size and model complexity. 10-fold cross-validation and independent external testing further validated the model's generalization ability, showing minimal performance variance (accuracy = 98.02 ± 1.62%, AUC = 0.99). These results indicate that the model successfully avoided overfitting despite the high-dimensional feature space. In essence, the combination of moderate model complexity, strong regularization via data augmentation, and rigorous validation ensured a stable and generalizable relationship between sample size and feature dimensionality.

Importantly, the lightweight nature of the model also enhances its interpretability and deployability. Smaller networks with fewer parameters tend to produce clearer and more localized Grad-CAM maps, as observed in our saliency analyses (Fig. 6), where model attention aligned with clinically meaningful brain regions such as the hippocampus, insula, and cingulate cortex. This stands in contrast to large “black-box” models, whose interpretability is often compromised by their depth and architectural complexity. Finally, from a translational standpoint, the reduced model size (95MB) makes it highly suitable for integration into clinical imaging systems and portable diagnostic tools, particularly in resource-limited settings where access to high-performance computing infrastructure is restricted. Heavy models often pose barriers to independent validation and deployment across resource-constrained clinical institutions, especially in low-income regions, where schizophrenia research and diagnostic support are already underfunded. By achieving high performance with minimal computational overhead, our lightweight framework helps overcome these limitations and supports the broader goal of democratizing AI-assisted neuropsychiatric diagnosis globally.

Acknowledgements

Authors have not received any funding for this research.

Abbreviations

SZ

Schizophrenia

HC

Healthy Control

DL

Deep Learning

ML

Machine Learning

CNN

Convolutional Neural Network

ROI

Region of Interest

sMRI

Structural Magnetic Resonance Imaging

CBAM

Convolutional Block Attention Module

GRU

Gated Recurrent Unit

ViT

Vision Transformer

ReLU

Rectified Linear Unit

ANOVA

Analysis of Variance

AUC

Area Under the Receiver Operating Characteristic Curve

ACC

Accuracy

SEN

Sensitivity

SPEC

Specificity

PREC

Precision

REC

Recall

F1

F1-score

CI

Confidence Interval

PSNR

Peak Signal-to-Noise Ratio

SNR

Signal-to-Noise Ratio

LOSO

Leave-One-Site-Out

CV

Cross-Validation

K

Number of folds in K-Fold Cross-Validation

ADAM

Adaptive Moment Estimation

fc7

Fully Connected Layer 7 (in AlexNet)

NLM

Non-Local Means Filter

RWMS

Relative Weighted Mean Square

DSM-5

Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition

SANS

Scale for the Assessment of Negative Symptoms

COBRE

Center for Biomedical Research Excellence

MCICShare

Mind Clinical Imaging Consortium Share Database

UCLA

University of California, Los Angeles Dataset

PACS

Picture Archiving and Communication System

Author contributions

All authors wrote and edited the manuscript, and all have read and approved the final version of the manuscript.

Funding

Authors have not received any funding for this research.

Data availability

The dataset used in this study is publicly available at the following link: https://schizconnect.org

Declarations

Ethical approval

Not applicable.

Consent for publication

Not applicable. All authors have approved the final manuscript and consent to its publication.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Insel TR. Rethinking schizophrenia. Nature. 2010;468:187–93. [DOI] [PubMed] [Google Scholar]
  • 2.Ellison-Wright I, Glahn DC, Laird AR, Thelen SM, Bullmore E. The anatomy of first-episode and chronic schizophrenia: an anatomical likelihood estimation meta-analysis. Am J Psychiatry. 2008;165:1015–23. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Glahn DC, et al. Meta-analysis of gray matter anomalies in schizophrenia: application of anatomic likelihood estimation and network analysis. Biol Psychiatry. 2008;64:774–81. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Kasai K, et al. Differences and similarities in insular and temporal Pole mri gray matter volume abnormalities in first-episode schizophrenia and affective psychosis. Archiv Gener Psychiatry. 2003;60:1069–77. [DOI] [PubMed] [Google Scholar]
  • 5.Honea R, Crow TJ, Passingham D, Mackay CE. Regional deficits in brain volume in schizophrenia: a meta-analysis of voxel-based morphometry studies. Am J Psychiatry. 2005;162:2233–45. [DOI] [PubMed] [Google Scholar]
  • 6.Shenton ME, Dickey CC, Frumin M, McCarley RW. A review of mri findings in schizophrenia. Schizophr Res. 2001;49:1–52. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Craig AD. How do you feel? interoception: the sense of the physiological condition of the body. Nat Rev Neurosci. 2002;3:655–66. [DOI] [PubMed] [Google Scholar]
  • 8.Kittleson AR, Woodward ND, Heckers S, Sheffield JM. The insula: leveraging cellular and systems-level research to better understand its roles in health and schizophrenia. Neurosci Biobehav Rev. 2024;105643. [DOI] [PMC free article] [PubMed]
  • 9.Sheffield JM, Rogers BP, Blackford JU, Heckers S, Woodward ND. Insula functional connectivity in schizophrenia. Schizophr Res. 2020;220:69–77. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Agarwal M, Singhal A. Fusion of pattern-based and statistical features for schizophrenia detection from eeg signals. Med Eng Phys. 2023;112:103949. [DOI] [PubMed] [Google Scholar]
  • 11.Aksöz A, et al. Analysis and classification of schizophrenia using event related potential signals. Comput Sci. 2022;32–36.
  • 12.Azizi S, Hier DB, Wunsch DC. Schizophrenia classification using resting state eeg functional connectivity: source level outperforms sensor level. In: 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE; 2021, 1770–73. [DOI] [PubMed]
  • 13.Lei D, et al. Integrating machine learning and multimodal neuroimaging to detect schizophrenia at the level of the individual. Hum Brain Mapp. 2020;41:1119–35. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Tanveer M, et al. Diagnosis of schizophrenia: a comprehensive evaluation. IEEE J Biomed And Health Inf. 2022;27:1185–92. [DOI] [PubMed] [Google Scholar]
  • 15.Yassin W, et al. Machine-learning classification using neuroimaging data in schizophrenia, autism, ultra-high risk and first-episode psychosis. Transl Psychiatry. 2020;10:278. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Kanyal A, et al. Multi-modal deep learning from imaging genomic data for schizophrenia classification. Front Psychiatry. 2024;15:1384842. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Lin X, et al. Characteristics of multimodal brain connectomics in patients with schizophrenia and the unaffected first-degree relatives. Front Cell And Dev Biol. 2021;9:631864. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Rajesh KN, Kumar TS. Schizophrenia detection in adolescents from eeg signals using symmetrically weighted local binary patterns. EMBC. 2021;963–66. [DOI] [PubMed]
  • 19.Srinivasan S, Johnson SD. A novel approach to schizophrenia detection: optimized preprocessing and deep learning analysis of multichannel eeg data. Expert Syst With Appl. 2024;246:122937. [Google Scholar]
  • 20.Chyzhyk D, Savio A, Graña M. Computer aided diagnosis of schizophrenia on resting state fmri data by ensembles of elm. Neural Networks. 2015;68:23–33. [DOI] [PubMed] [Google Scholar]
  • 21.Gollub RL, et al. The mcic collection: a shared repository of multi-modal, multi-site brain image data from a clinical investigation of schizophrenia. Neuroinformatics. 2013;11:367–88. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Bilder R, et al. Ucla consortium for neuropsychiatric phenomics la5c study. 2018.
  • 23.Wang L, et al. Schizconnect: mediating neuroimaging databases on schizophrenia and related disorders for large-scale integration. Neuroimage. 2016;124:1155–67. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Robinson DG, et al. Predictors of treatment response from a first episode of schizophrenia or schizoaffective disorder. Am J Psychiatry. 1999;156:544–49. [DOI] [PubMed] [Google Scholar]
  • 25.Szeszko PR, et al. White matter abnormalities in first-episode schizophrenia or schizoaffective disorder: a diffusion tensor imaging study. Am J Psychiatry. 2005;162:602–05. [DOI] [PubMed] [Google Scholar]
  • 26.Soler-Vidal J, et al. Brain correlates of speech perception in schizophrenia patients with and without auditory hallucinations. PLoS One. 2022;17:e0276975. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Manjón JV. Mri preprocessing. In: Imaging biomarkers: development and clinical integration. 2017, 53–63.
  • 28.Van Ginneken B, Schaefer-Prokop CM, Prokop M. Computer-aided diagnosis: how to move from the laboratory to the clinic. Radiology. 2011;261:719–32. [DOI] [PubMed] [Google Scholar]
  • 29.Buades A, Coll B, Morel J-M. A non-local algorithm for image denoising. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). IEEE; 2005, vol. 2, 60–65.
  • 30.Hoopes A, Mora JS, Dalca AV, Fischl B, Hoffmann M. Synthstrip: skull-stripping for any brain image. NeuroImage. 2022;260:119474. 10.1016/j.neuroimage.2022.119474. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Brostow GJ, Fauqueur J, Cipolla R. Semantic object classes in video: a high-definition ground truth database. Pattern Recognit Lett. 2009;30:88–97. [Google Scholar]
  • 32.Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H. Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV). 2018.
  • 33.He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016, 770–78, 10.1109/CVPR.2016.90.
  • 34.Kuhn GR, Oliveira MM, Fernandes LA. An improved contrast enhancing approach for color-to-grayscale mappings. The Visual Comput. 2008;24:505–14. [Google Scholar]
  • 35.Mazhari A, Allahgholi A, Shafieian M. Automated detection of sdh and edh due to tbi from ct-scan images using cnn. 2023 30th National and 8th International Iranian Conference on Biomedical Engineering (ICBME). IEEE; 2023, 164–70.
  • 36.Hugdahl K, et al. Auditory hallucinations in schizophrenia: the role of cognitive, brain structural and genetic disturbances in the left temporal lobe. Front Hum Neurosci. 2008;2:131. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Sun D, et al. Brain surface contraction mapped in first-episode schizophrenia: a longitudinal magnetic resonance imaging study. Mol Psychiatry. 2009;14:976–86. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Steen RG, et al. Measurement of brain metabolites by 1h magnetic resonance spectroscopy in patients with schizophrenia: a systematic review and meta-analysis. Neuropsychopharmacology. 2005;30:1949–62. [DOI] [PubMed] [Google Scholar]
  • 39.Wylie KP, Tregellas JR. The role of the insula in schizophrenia. Schizophr Res. 2010;123:93–104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Palaniyappan L, Mallikarjun P, Joseph V, White TP, Liddle PF. Regional contraction of brain surface area involves three large-scale networks in schizophrenia. Schizophr Res. 2011;129:163–68. [DOI] [PubMed] [Google Scholar]
  • 41.Menon V, Uddin LQ. Saliency, switching, attention and control: a network model of insula function. Brain Struct And Function. 2010;214:655–67. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Uddin LQ, Nomi JS, Hébert-Seropian B, Ghaziri J, Boucher O. Structure and function of the human insula. J Clin Neurophysiol. 2017;34:300–06. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Craig AD. How do you feel—now? the anterior insula and human awareness. Nat Rev Neurosci. 2009;10:59–70. [DOI] [PubMed] [Google Scholar]
  • 44.Poels EM, et al. Glutamatergic abnormalities in schizophrenia: a review of proton mrs findings. Schizophr Res. 2014;152:325–32. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Merritt K, Egerton A, Kempton MJ, Taylor MJ, McGuire PK. Nature of glutamate alterations in schizophrenia: a meta-analysis of proton magnetic resonance spectroscopy studies. JAMA Psychiatry. 2016;73:665–74. [DOI] [PubMed] [Google Scholar]
  • 46.Xu X-J, Liu T-L, He L, Pu B. Changes in neurotransmitter levels, brain structural characteristics, and their correlation with panss scores in patients with first-episode schizophrenia. World J Clin Cases. 2023;11:5215. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Huang Z, et al. Negative symptoms correlate with altered brain structural asymmetry in amygdala and superior temporal region in schizophrenia patients. Front Psychiatry. 2022;13:1000560. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Chuang J, et al. Brain structural signatures of negative symptoms in depression and schizophrenia. front psychiatry. 2014;5:116. [DOI] [PMC free article] [PubMed]
  • 49.Lesh TA, Niendam TA, Minzenberg MJ, Carter CS. Cognitive control deficits in schizophrenia: mechanisms and meaning. Neuropsychopharmacology. 2011;36:316–38. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Manoliu A, et al. Insular dysfunction within the salience network is associated with severity of symptoms and aberrant inter-network connectivity in major depressive disorder. Front Hum Neurosci. 2014;7:930. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Anticevic A, et al. Characterizing thalamo-cortical disturbances in schizophrenia and bipolar illness. Cereb Cortex. 2014;24:3116–30. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Walton E, et al. Prefrontal cortical thinning links to negative symptoms in schizophrenia via the enigma consortium. Psychological Med. 2018;48:82–94. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Kebets V, et al. Somatosensory-motor dysconnectivity spans multiple transdiagnostic dimensions of psychopathology. Biol Psychiatry. 2019;86:779–91. [DOI] [PubMed] [Google Scholar]
  • 54.The MathWorks Inc. MATLAB version 9.13.0 (R2022b). 2022.
  • 55.Zhang J, et al. Detecting schizophrenia with 3d structural brain mri using deep learning. Sci Rep. 2023;13:14433. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Khare SK, Bajaj V. A hybrid decision support system for automatic detection of schizophrenia using eeg signals. Comput In Biol Med. 2022;141:105028. [DOI] [PubMed] [Google Scholar]
  • 57.De Rosa A, et al. Machine learning algorithm unveils glutamatergic alterations in the post-mortem schizophrenia brain. Schizophrenia. 2022;8:8. [DOI] [PMC free article] [PubMed]
  • 58.Santos Febles E, Ontivero Ortega M, Valdés Sosa M, Sahli H. Machine learning techniques for the diagnosis of schizophrenia based on event-related potentials. Front Neuroinf. 2022;16:893788. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Algumaei AH, Algunaid RF, Rushdi MA, Yassine IA. Feature and decision-level fusion for schizophrenia detection based on resting-state fmri data. PLoS One. 2022;17:e0265300. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Luján MÁ, et al. Mental disorder diagnosis from eeg signals employing automated leaning procedures based on radial basis functions. J Med Biol Eng. 2022;42:853–59. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Zandbagleh A, Mirzakuchaki S, Daliri MR, Premkumar P, Sanei S. Classification of low and high schizotypy levels via evaluation of brain connectivity. Int J Neural Syst. 2022;32:2250013. [DOI] [PubMed] [Google Scholar]
  • 62.Shi D, et al. Machine learning of schizophrenia detection with structural and functional neuroimaging. Disease Markers. 2021;2021:9963824. [DOI] [PMC free article] [PubMed]
  • 63.Du X, et al. Research on electroencephalogram specifics in patients with schizophrenia under cognitive load. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi (J Biomed Eng). 2020;37:45–53. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Vieira S, et al. Using machine learning and structural neuroimaging to detect first episode psychosis: reconsidering the evidence. Schizophr Bull. 2020;46:17–26. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Yang H, Di X, Gong Q, Sweeney J, Biswal B. Investigating inhibition deficit in schizophrenia using task-modulated brain networks. Brain Struct And Function. 2020;225:1601–13. [DOI] [PubMed] [Google Scholar]
  • 66.Tanveer M. Investigating white matter abnormalities associated with schizophrenia using deep learning model and voxel-based morphometry. 2023. [DOI] [PMC free article] [PubMed]
  • 67.Wen Y, et al. Bridging structural mri with cognitive function for individual level classification of early psychosis via deep learning. Front Psychiatry. 2023;13:1075564. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Shenton ME, Whitford TJ, Kubicki M. Structural neuroimaging in schizophrenia from methods to insights to treatments. Dialogues in clinical neuroscience. 2022. [DOI] [PMC free article] [PubMed]
