Cellular and Molecular Neurobiology. 2026 Jan 23;46:40. doi: 10.1007/s10571-025-01660-z

Explainable Hybrid Deep Learning Framework Integrating MobileNetV2, EfficientNetV2B0, and KNN for MRI-Based Brain Tumor Classification

Mohammed Jajere Adamu 1,2, Li Qiang 1, Charles Okanda Nyatega 1,3, Muhammad Fahad 4, Ayesha Younis 1, Adamu Halilu Jabire 5, Rabiu Sale Zakariyya 6, Halima Bello Kawuwa 7
PMCID: PMC12906459  PMID: 41578036

Abstract

Magnetic resonance imaging (MRI) is central to noninvasive brain tumor assessment, yet clinical uptake of artificial intelligence depends on both accuracy and transparency. This study presents a lightweight and interpretable hybrid framework that fuses features from two efficient convolutional backbones, MobileNetV2 and EfficientNetV2B0, using late fusion with global average pooling and vector concatenation. Classification is performed with a K‑Nearest Neighbors (KNN) head configured with k = 5, Euclidean distance, and distance‑based weighting. The dataset contains 7,023 MRI images drawn from Figshare, SARTAJ, and Br35H. Data were split with a 20% held‑out test set and a validation set equal to 20% of the remaining training pool, yielding 64%/16%/20% train/val/test. Four diagnostic categories were evaluated: Glioma, Meningioma, Pituitary, and Notumor. The confusion matrix shows a compact diagonal, and class‑wise precision, recall, and F1 are consistently high on the test set. A 5‑fold cross‑validation with normality assessment and paired significance testing supports robustness across folds. On the held‑out test set, class‑wise ROC–AUC was 1.00 for all four classes, and overall accuracy was 99.69%. Results should be interpreted in light of the unified dataset; external validation is warranted. Clinical interpretability is supported by class‑wise Grad‑CAM overlays and SHAP analyses, including waterfall plots that quantify individual feature contributions. These findings indicate that a dual‑backbone late‑fusion design coupled with a simple nonparametric classifier delivers strong, balanced performance while providing anatomically plausible case‑level insight into model decisions.

Keywords: CNN, MobilenetV2, EfficientNetV2B0, MR images, Brain tumor, Classification, Explainable AI

Introduction

Brain tumors remain a major cause of neurological morbidity and mortality, and timely, accurate categorization is critical for treatment planning. MRI is the modality of choice for brain tumor assessment because of its superior soft‑tissue contrast and multi‑sequence capability, enabling noninvasive visualization of tumor extent and peritumoral changes (Illimoottil and Ginat 2023). Clinically, tumors are graded and characterized along multiple axes (location, texture, shape, size), with three tumor categories common in routine practice, Glioma, Meningioma, and Pituitary lesions, plus Notumor cases (Saddique 2014; Varuna 2018; ul Haq et al. 2022; Ullah et al. 2023; Tanaka et al. 2024). Despite advances, manual interpretation is time‑consuming and can be challenged by heterogeneous appearance, overlapping boundaries, and variable acquisition protocols.

Automated MRI classification has therefore become a central problem in computer‑aided diagnosis. The task is difficult because tumor phenotypes vary widely in size, shape, intensity, and effects on surrounding tissue, and because imaging artifacts and site differences add distributional shifts. Integrating advanced computational methods has improved the study of complex neuropsychiatric and neurological conditions and similarly motivates robust pipelines for neuro‑oncology (Tanaka et al. 2024; Battaglia et al. 2024; Markkandeyan 2023; Suresh 2023). CNNs are well suited to MRI because they learn hierarchical features that capture both local textures and higher‑level semantic structure. In brain tumors, pathological processes such as angiogenesis, mass effect, and infiltrative growth produce spatial–textural signatures that deep networks can exploit (Singh 2023).

Among efficient CNN families, MobileNetV2 and EfficientNetV2B0 offer complementary strengths: depthwise separable convolutions and inverted residuals for lightweight feature extraction in MobileNetV2, and compound scaling with modern MBConv blocks and Swish activation in EfficientNetV2B0. Leveraging both can improve representational diversity without incurring the cost of very deep models. For the final decision layer, non‑parametric K‑Nearest Neighbors (KNN) is attractive because it makes minimal distributional assumptions and performs well on fused feature spaces with nonlinear class boundaries.

In this work, we propose a lightweight, interpretable hybrid framework that:

  • Extracts complementary features with MobileNetV2 and EfficientNetV2B0,

  • Performs late fusion via global average pooling and vector concatenation,

  • Classifies the fused representation using KNN, and

  • Explains decisions with class‑wise Grad‑CAM overlays and SHAP, including SHAP waterfall plots for case‑level attribution.

Four classes were evaluated (Glioma, Meningioma, Pituitary, Notumor) using a unified dataset assembled from public sources. Beyond accuracy, emphasis was on interpretability: class‑wise metrics and confusion analysis, 5‑fold cross‑validation with normality and paired significance testing, and visual explanations that align with neuroradiological expectations. The complete workflow is summarized in Fig. 1; the model architecture and fusion detail are shown in Figs. 2 and 3, respectively.

Fig. 1  Standalone end-to-end workflow

Fig. 2  Neural Network Architecture of Proposed Approach

Fig. 3  Feature Fusion Detail

Literature Review

Deep learning for MRI brain tumor classification has progressed along three main lines: (i) classical feature–engineering with machine‑learning classifiers, (ii) end‑to‑end CNN models including transformers, and (iii) fusion strategies that combine complementary information.

Early pipelines extracted hand‑crafted texture or frequency features and used classical classifiers. Examples include fuzzy optimization for network design (Narmatha 2020), hypercolumn or multi‑layer feature aggregation in BrainMRNet (Toğaçar et al. 2020), deep feature selection and texture fusion (Sharif 2020), and multi‑sequence fusion with wavelet representations (Amin et al. 2020). Hybrid classifiers and ensembles, including CNN‑SVM thresholding (Khairandish 2022), majority‑voting ensembles of KNN, RF, and DT (Garg 2021), and multi‑modal DNNs with multiclass SVM (Maqsood et al. 2022), reported solid accuracy but required careful feature curation and tended to be less scalable across datasets.

With the rise of transfer learning, single‑backbone CNNs achieved strong performance on tumor classification, segmentation, or detection tasks. Prior work spans differential CNN variants (Kader et al. 2021; Singh et al. 2025), U‑NET with fuzzy logic (Maqsood 2021; Yadav et al. 2025; Yadav 2025), transfer learning from VGG and ResNet family models (Singh 2023; Yadav et al. 2025; Yadav 2025; Mavaddati 2024; Ghosal 2019; Abbood et al. 2021; Siar 2019), and hybrid deep learning tailored to three‑class tumor problems (Raza et al. 2022). Deep feature fusion methods further improved robustness by combining multiple learned representations (Mavaddati 2024), while GAN pre‑training has been explored to regularize feature learning on MR images (Ghassemi et al. 2020). These CNN approaches generally deliver high accuracy but may involve heavier fully connected heads or extensive fine‑tuning, which can increase complexity and overfitting risk on modest datasets.

Vision Transformer (ViT) variants have shown competitive or state‑of‑the‑art accuracy on multiclass MRI tumor classification (Chandraprabha et al. 2025; Tummala et al. 2022; Reddy et al. 2024). Such models exploit global context but often need larger, better‑curated data and careful regularization. Ensemble strategies for grade classification also illustrate gains from combining diverse learners (Bv et al. 2024). Efficient family backbones such as EfficientNet provide a strong accuracy trade‑off for both classification and segmentation (Amin 2023; Tan 2019).

Information fusion appears in multiple forms: fusing MRI sequences in the frequency domain (Amin et al. 2020), learned deep feature fusion across backbones (Chen et al. 2024), and classical model‑level ensembles (Garg 2021; Bv et al. 2024). These studies support the premise that complementary representations can boost discriminability, particularly for heterogeneous tumors. A related hybrid study explored MobileNetV2 and EfficientNetV2B0 style designs for tumor classification (Rasool et al. 2022), underscoring the promise of pairing a lightweight backbone with a more expressive one.

Although some studies have begun integrating explainable AI (XAI) into tumor pipelines, explicit, systematic class‑wise explanations remain less common. For example, Lakshmi et al. (2025) and Banerjee (2025) incorporated XAI to evaluate segmentation and classification behavior. However, many high‑accuracy reports still provide limited qualitative evidence and lack subject‑level attributions such as SHAP waterfall plots, which help quantify how features push individual predictions.

A substantial portion of the literature evaluates on public benchmarks, including BraTS and unified Kaggle‑style datasets (Figshare, SARTAJ, and Br35H) (Toğaçar et al. 2020; Amin et al. 2020; Khairandish 2022; Maqsood 2021; Nickparvar 2023). While useful for comparability, unified datasets are recognized as relatively easy, with several works reporting very high accuracy. This highlights the need to calibrate claims, emphasize interpretability, and pursue multi‑institutional validation.

Relative to the above, our work focuses on a lightweight, dual‑backbone late‑fusion design: MobileNetV2 + EfficientNetV2B0 with global average pooling and vector concatenation, paired with a simple, non‑parametric KNN head. This combination aims to retain complementary features without heavy fully connected layers, thereby reducing overfitting risk on modest data. Our work complements accuracy with systematic interpretability: class‑wise Grad‑CAM maps, SHAP summaries, and SHAP waterfall plots for case‑level attributions, addressing the transparency gap observed in prior reports (Singh 2023; Ghosal 2019; Siar 2019; Vankdothu et al. 2022; Mohanty et al. 2024).

Methods

Dataset and Splits

The dataset comprises 7,023 brain MRI images from three public sources (Figshare, SARTAJ, and Br35H; Kaggle mirrors). Images are categorized into four classes: Glioma, Meningioma, Pituitary, and Notumor.

20% of images form a stratified test set; from the remaining 80%, 20% serve as validation, producing 64/16/20 train/val/test proportions. Splits use class‑level stratification at the image level due to missing subject IDs. Class counts per split are shown in Table 1.

Table 1.

Distribution of MR images across different classes

Class        Training Data   Validation Data   Testing Data
Glioma       1056            265               300
Meningioma   1071            268               306
Notumor      1276            319               405
Pituitary    1165            292               300
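A minimal sketch of this splitting protocol is shown below, using scikit-learn's train_test_split with stratification; image_paths, labels, and the random seed are hypothetical placeholders, not the published implementation.

```python
from sklearn.model_selection import train_test_split

# Stratified 20% held-out test set, then 20% of the remaining 80% for
# validation: 64% / 16% / 20% overall (random_state is an assumption).
X_trainval, X_test, y_trainval, y_test = train_test_split(
    image_paths, labels, test_size=0.20, stratify=labels, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.20, stratify=y_trainval, random_state=42)
```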

Preprocessing and Data Augmentation

Preprocessing adjusts the dataset so that the model trains more effectively and generalizes better. This study applied several data augmentation techniques to introduce variability and improve the model’s robustness. All images were resized to 224 × 224 pixels to match the input dimensions of the deep learning models, MobileNetV2 and EfficientNetV2B0. The following augmentation techniques were applied:

  • (i)

    Random Rotation: Images were randomly rotated within a range of ± 10 degrees to simulate positional variations and improve the model’s robustness to orientation changes.

  • (ii)

    Random Scaling: Images were randomly scaled within 90% to 110% of their original size to ensure robustness to scale changes.

  • (iii)

    Horizontal Flipping: Images were horizontally flipped with a probability of 50% to introduce spatial diversity and account for symmetry in brain structures.

  • (iv)

    Random Brightness and Contrast Adjustments: Brightness was randomly adjusted by ± 20% and contrast by ± 15% to account for variations in imaging conditions.

  • (v)

    Random Translation: Images were randomly translated by up to 10% of their width and height to simulate minor shifts in the field of view.

  • (vi)

    Random Zoom: Images were randomly zoomed in or out by up to 10% to simulate variations in field of view and apparent structure size.

These augmentation steps enriched the dataset, enabling the model to learn more generalized features while reducing the risk of overfitting. The specific parameters for each augmentation technique were chosen based on their effectiveness in improving model performance during preliminary experiments.
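A minimal sketch of such a pipeline using tf.keras preprocessing layers is shown below. The paper does not specify its exact augmentation implementation, and RandomBrightness requires a TensorFlow release newer than the 2.6 used for the experiments, so treat this as an approximation rather than the authors' code.

```python
import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.Resizing(224, 224),              # match backbone input size
    layers.RandomRotation(10.0 / 360.0),    # +/- 10 degrees (fraction of a full turn)
    layers.RandomZoom(0.10),                # +/- 10% scaling / zoom (items ii and vi)
    layers.RandomFlip("horizontal"),        # 50% probability horizontal flip
    layers.RandomTranslation(0.10, 0.10),   # +/- 10% height/width shift
    layers.RandomBrightness(0.20),          # +/- 20% brightness (TF >= 2.9)
    layers.RandomContrast(0.15),            # +/- 15% contrast
])
```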

Model Architecture

Our model extracts complementary features using two ImageNet pretrained backbones, MobileNetV2 and EfficientNetV2B0, followed by late fusion and non‑parametric classification. Figure 2 shows the complete architecture. For each input image, both backbones process the image in parallel to produce convolutional feature maps. Global average pooling (GAP) is applied to each backbone’s final convolutional output to obtain compact feature vectors. These vectors are concatenated into a fused representation and regularized with dropout (rate 0.5). The fused feature vector is classified using K‑Nearest Neighbors (KNN; k = 5, Euclidean distance, distance weighting). Because KNN is non‑differentiable, a small auxiliary softmax head is trained on the fused features solely to compute Grad‑CAM and SHAP explanations; all reported performance metrics are derived from the KNN classifier.
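The dual-backbone extractor can be sketched as follows; the input-scaling conventions and the frozen-backbone setup are assumptions about the original implementation, and EfficientNetV2B0 requires a TensorFlow release with tf.keras.applications.EfficientNetV2B0 (2.8 or later).

```python
import tensorflow as tf
from tensorflow.keras import applications, layers

inputs = tf.keras.Input(shape=(224, 224, 3))

# Both ImageNet-pretrained backbones serve as frozen feature extractors.
mnet = applications.MobileNetV2(include_top=False, weights="imagenet")
enet = applications.EfficientNetV2B0(include_top=False, weights="imagenet")
mnet.trainable = False
enet.trainable = False

# GAP each backbone's final convolutional output to a compact vector.
f1 = layers.GlobalAveragePooling2D()(
    mnet(applications.mobilenet_v2.preprocess_input(inputs)))  # scales to [-1, 1]
f2 = layers.GlobalAveragePooling2D()(enet(inputs))  # EfficientNetV2 rescales internally

fused = layers.Concatenate()([f1, f2])  # 1280 + 1280 = 2560-d fused representation
feature_extractor = tf.keras.Model(inputs, fused)
```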

Feature Fusion and Classifier

Feature fusion is performed after global average pooling (late fusion). Specifically, the two pooled vectors, one from MobileNetV2 and one from EfficientNetV2B0, are concatenated without additional weighting to preserve contributions from both backbones. A dropout layer (rate 0.5) is applied during auxiliary‑head training for regularization. The resulting fused feature vector is passed to a KNN classifier configured with k = 5, Euclidean distance, and distance‑based weighting. The fusion schematic is shown in Fig. 3.
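A corresponding KNN head in scikit-learn, continuing the sketch above (train_images, test_images, and the label arrays are hypothetical names carried over from the earlier sketches):

```python
from sklearn.neighbors import KNeighborsClassifier

# KNN head on the fused features: k = 5, Euclidean distance,
# distance-based neighbor weighting, as described in the text.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean", weights="distance")

Z_train = feature_extractor.predict(train_images, batch_size=32, verbose=0)
Z_test = feature_extractor.predict(test_images, batch_size=32, verbose=0)
knn.fit(Z_train, y_train)
y_pred = knn.predict(Z_test)
```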

Training Details

The backbone weights are initialized from ImageNet and used as fixed feature extractors. The auxiliary softmax head, used only for explainability, is trained with the Adam optimizer (learning rate 0.001), batch size 32, for 50 epochs with early stopping based on validation performance. The key hyperparameters are summarized in Table 2; a training sketch follows the table.

Table 2.

Final hyperparameters

Parameter       Value
Optimizer       Adam
Metric          Accuracy
Batch size      32
Epochs          50
Learning rate   0.001
Dropout rate    0.5
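A minimal training sketch for the auxiliary head consistent with Table 2; the early-stopping patience and monitored metric are assumptions, and Z_train/Z_val follow the earlier feature-extraction sketches.

```python
import tensorflow as tf

# Auxiliary softmax head on the frozen fused features; used only for
# Grad-CAM/SHAP, never for the reported performance metrics.
head = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(2560,)),  # fused vector (1280 + 1280)
    tf.keras.layers.Dropout(0.5),                     # regularization per the text
    tf.keras.layers.Dense(4, activation="softmax"),   # four diagnostic classes
])
head.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
             loss="sparse_categorical_crossentropy", metrics=["accuracy"])
head.fit(Z_train, y_train,
         validation_data=(Z_val, y_val),
         batch_size=32, epochs=50,
         callbacks=[tf.keras.callbacks.EarlyStopping(
             monitor="val_accuracy", patience=5, restore_best_weights=True)])
```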

Explainability

Complementary post hoc explanation methods are employed to analyze model behavior at both class and case levels.

  • i.

    Grad‑CAM: Grad‑CAM heatmaps were computed from the last convolutional block for each target class using the auxiliary softmax head to obtain class scores. Heatmaps are ReLU‑rectified, normalized to [0,1], and overlaid using identical display ranges across classes to enable visual comparison. Class‑wise exemplars are presented in Fig. 8 and misclassification overlays in Fig. 11; a minimal computational sketch follows this list.

  • ii.

    SHAP: DeepSHAP was used on the auxiliary softmax head with a balanced background of 25 images per class drawn from the training set. SHAP waterfall plots for representative test samples (Fig. 9) visualize individual feature contributions driving the prediction, and aggregated SHAP summaries with exemplar attribution maps (Fig. 10) show class‑consistent evidence across the test set. To reduce visualization noise, attributions are lightly smoothed and clipped at the 99th percentile.
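A minimal Grad-CAM sketch consistent with item (i), assuming a single Keras model whose graph exposes the backbone's final convolutional layer by name; the function and argument names here are hypothetical.

```python
import tensorflow as tf

def grad_cam(model, image, last_conv_name, class_index):
    """Grad-CAM sketch: gradient of the class score w.r.t. the last
    convolutional feature map, channel-averaged and ReLU-rectified."""
    conv_layer = model.get_layer(last_conv_name)
    grad_model = tf.keras.Model(model.inputs, [conv_layer.output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        score = preds[:, class_index]
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(1, 2))          # GAP over spatial dims
    cam = tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1)
    cam = tf.nn.relu(cam)[0]                              # keep positive evidence
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()    # normalize to [0, 1]
```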

Fig. 8  Grad‑CAM Class‑wise Exemplars

Fig. 11  Misclassification Grad‑CAM Overlays

Fig. 9  SHAP Waterfall Plot

Fig. 10  SHAP Summary and Exemplar Attribution Maps

The auxiliary softmax head enables gradient‑based and SHAP analyses but is not used for primary performance reporting. All accuracy and class‑wise metrics come from the KNN classifier operating on the fused feature vector.
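A DeepSHAP sketch over the auxiliary head matching the balanced-background protocol above (Z_train, Z_test, and head follow the earlier sketches); SHAP's API varies across versions, so this is illustrative rather than definitive.

```python
import numpy as np
import shap

# Balanced background: 25 training feature vectors per class, as described.
background = np.concatenate([Z_train[y_train == c][:25] for c in range(4)])

explainer = shap.DeepExplainer(head, background)
shap_values = explainer.shap_values(Z_test[:10])   # list: one array per class

# Case-level waterfall plot (cf. Fig. 9) for the predicted class of sample 0.
pred = int(np.argmax(head.predict(Z_test[:1]), axis=1)[0])
shap.plots.waterfall(shap.Explanation(
    values=shap_values[pred][0],
    base_values=float(explainer.expected_value[pred]),
    data=Z_test[0]))
```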

Hyperparameter Tuning Results

The proposed model’s hyperparameters were carefully tuned to optimize performance. Table 3 summarizes the tested ranges and the final chosen values. The learning rate was set to 0.001 to ensure stable convergence, while the batch size was set to 32 to balance computational efficiency and model performance. As confirmed by the ablation study, a dropout rate of 0.5 effectively reduced overfitting. These hyperparameters contributed to the model’s high accuracy and robustness, as demonstrated in the cross-validation results.

Table 3.

Hyperparameter tuning results

Hyperparameter   Testing Range    Final Value   Justification
Learning rate    [0.0001, 0.01]   0.001         Ensures stable convergence and avoids overshooting the optimal solution.
Batch size       [16, 64]         32            Balances computational efficiency and model performance.
Dropout rate     [0.3, 0.7]       0.5           Reduces overfitting while maintaining model accuracy.
No. of epochs    [30, 100]        50            Provides sufficient training without overfitting.

Statistical Analysis

Five‑fold stratified cross‑validation was performed to assess robustness. For fold‑wise metrics (accuracy, precision, recall, F1, AUC), normality was assessed using the Shapiro–Wilk test. When normality held (p ≥ 0.05), paired t‑tests were applied to compare variants (e.g., single‑backbone vs. fused features); otherwise, Wilcoxon signed‑rank tests were used. Two‑sided p‑values and effect sizes (Cohen’s d for t‑tests; r for Wilcoxon) are reported. Summary statistics are provided in Table 5.

Table 5.

Cross-Validation results

Metric (%)   Glioma   Meningioma   Pituitary   Notumor   Overall
Accuracy     98.5     97.8         99.2        99.5      98.7
Precision    98.2     97.5         99.0        99.3      98.5
Recall       98.3     97.7         99.1        99.4      98.6
F1-score     98.2     97.6         99.0        99.4      98.5
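The fold-wise testing logic described above can be sketched as follows; note that here the Shapiro–Wilk test is applied to the paired differences (one common convention), and the Wilcoxon effect-size conversion is a standard approximation.

```python
import numpy as np
from scipy import stats

def compare_folds(fused, baseline, alpha=0.05):
    """Paired comparison of fold-wise metrics: Shapiro-Wilk normality check,
    then paired t-test (if normal) or Wilcoxon signed-rank otherwise."""
    diffs = np.asarray(fused) - np.asarray(baseline)
    _, p_norm = stats.shapiro(diffs)
    if p_norm >= alpha:
        _, p = stats.ttest_rel(fused, baseline)
        effect = diffs.mean() / diffs.std(ddof=1)               # Cohen's d (paired)
        return {"test": "paired t", "p": p, "effect": effect}
    _, p = stats.wilcoxon(fused, baseline)
    effect = abs(stats.norm.ppf(p / 2)) / np.sqrt(len(diffs))   # approximate r
    return {"test": "Wilcoxon", "p": p, "effect": effect}
```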

Evaluation Metrics

Primary evaluation uses class‑wise precision, recall, and F1‑score, along with the confusion matrix to characterize error patterns across Glioma, Meningioma, Pituitary, and Notumor classes. ROC curves and AUC are reported in the Results section.

The ROC (Receiver Operating Characteristic) curve plots the true-positive rate against the false-positive rate as the decision threshold varies, running from (0, 0) to (1, 1). The AUC (Area Under the Curve) summarizes this curve as a single measure of classification performance: a value of 1 indicates perfect separation, 0.5 indicates chance-level discrimination, and values approaching 0 indicate systematically inverted predictions. This study uses ROC curves and AUC to assess the model’s capacity for accurate classification and improved prediction.

The performance metrics are calculated using the following formulas, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1)$$

$$\text{Precision} = \frac{TP}{TP + FP} \quad (2)$$

$$\text{Recall} = \frac{TP}{TP + FN} \quad (3)$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (4)$$
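In practice these metrics can be computed directly from the KNN predictions; a sketch using scikit-learn, continuing the earlier variable names:

```python
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

# Class-wise precision/recall/F1 (Eqs. 1-4) and the confusion matrix;
# the class-name order assumes sorted labels, matching knn.classes_.
class_names = ["Glioma", "Meningioma", "Notumor", "Pituitary"]
print(classification_report(y_test, y_pred, target_names=class_names))
print(confusion_matrix(y_test, y_pred))

# One-vs-rest ROC-AUC from KNN's distance-weighted class probabilities.
auc = roc_auc_score(y_test, knn.predict_proba(Z_test), multi_class="ovr")
print(f"Macro OvR ROC-AUC: {auc:.4f}")
```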

Model Efficiency Analysis

To evaluate the practicality of the proposed model for real-world clinical deployment, we measured its computational efficiency in terms of inference time, memory usage, and model size. The inference time was calculated as the average time to classify a single MR image. Memory usage was measured as the peak RAM consumption during inference, and the total number of parameters determined the model size. All experiments were conducted on a system with an NVIDIA RTX 3090 GPU, 32 GB RAM, and Python 3.8 with TensorFlow 2.6. The results were compared with other state-of-the-art models to assess the efficiency of the proposed approach.
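A sketch of the latency measurement described above; the warm-up and run counts are assumptions, and feature_extractor and knn follow the earlier sketches.

```python
import time
import numpy as np

def mean_inference_latency(images, n_warmup=5, n_runs=50):
    """Average wall-clock time to classify one MR image: fused feature
    extraction followed by the KNN vote."""
    for img in images[:n_warmup]:                      # warm-up (GPU init, caches)
        knn.predict(feature_extractor.predict(img[None, ...], verbose=0))
    times = []
    for img in images[:n_runs]:
        start = time.perf_counter()
        knn.predict(feature_extractor.predict(img[None, ...], verbose=0))
        times.append(time.perf_counter() - start)
    return float(np.mean(times))
```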

Results

Model Performance

We first evaluated the fused MobileNetV2–EfficientNetV2B0 features classified with KNN on the held-out test set. The confusion matrix in Fig. 4 shows a compact diagonal across all four classes (Glioma, Meningioma, Pituitary, Notumor), indicating few off-diagonal errors. Class-wise precision, recall, and F1 scores (Fig. 5; numerical values in Table 4) demonstrate balanced performance, with especially strong results for Pituitary and Notumor and high precision/recall for Glioma and Meningioma. Figure 6 shows the proposed model’s AUC (Area Under the Curve) result.

Fig. 4  Confusion Matrix

Fig. 5  Proposed Model Performance

Table 4.

Performance metrics of the proposed model

Class        Precision   Recall   F1-score   Number of Images
Glioma       0.98        0.99     0.98       300
Meningioma   0.97        0.98     0.98       306
Notumor      1.00        0.99     0.99       405
Pituitary    1.00        0.99     0.99       300

Fig. 6  Proposed Model ROC Representation

Key observations from Fig. 4 and Table 4:

  • i.

    The majority of test samples fall on the confusion matrix diagonal, reflecting reliable class separation.

  • ii.

    Precision, recall, and F1 are consistently high across classes (Table 4), underscoring robustness beyond a single summary metric.

Cross-Validation and Statistical Analysis

We performed 5-fold stratified cross-validation to assess robustness and variability across splits. Table 5 summarizes fold-wise metrics and statistical tests. Normality of fold-wise accuracy and F1 was assessed using the Shapiro–Wilk test; paired comparisons of fused versus single-backbone variants used paired t-tests when normality held and Wilcoxon signed-rank tests otherwise (effect sizes reported as Cohen’s d or r). Cross-validation corroborated the single-split results, with narrow fold-wise variability.

Comparative Evaluation with Prior Work

Table 6 and Fig. 7 summarize reported test accuracies from representative MRI brain‑tumor classification studies alongside the proposed dual‑backbone late‑fusion model. Because datasets, class composition, preprocessing, and evaluation protocols vary across reports, these values should be interpreted as contextual indicators rather than strictly comparable benchmarks. Within this context, the proposed MobileNetV2–EfficientNetV2B0 late‑fusion approach with a KNN head attains 99.69% accuracy, competitive with high‑performing transformer‑based models and higher than several CNN baselines included in the comparison. Beyond aggregate accuracy, the present framework emphasizes efficiency through lightweight backbones and a non‑parametric classifier, and transparency via class‑wise Grad‑CAM and SHAP analyses, which together help interpret decision patterns across Glioma, Meningioma, Pituitary, and Notumor.

Table 6.

Baseline comparison table

Baseline Papers               Model                            Accuracy (%)
Ghosal et al. (2019)          ResNet                           93.83
Vankdothu et al. (2022)       LSTM-CNN                         95.00
Ghassemi et al. (2020)        GAN-CNN                          88.05
Abbood et al. (2021)          ResNet                           95.80
Siar et al. (2019)            Customised CNN                   97.34
Singh et al. (2023)           Novel CNN architecture           92.50
Mohanty et al. (2024)         Soft attention CNN               95.10 (per class > 97.1)
Chandraprabha et al. (2025)   Vision Transformer               99.64
Tummala et al. (2022)         Vision Transformer               98.70
Proposed Model                MobileNetV2 + EfficientNetV2B0   99.69

Fig. 7  Baseline Comparison with Proposed Model

Model Interpretability

We used two complementary post hoc methods, Grad-CAM and SHAP, to analyze model decisions at both class and case levels.

i- Grad-CAM (Fig. 8): Class-wise overlays on representative test images show anatomically plausible saliency. Typical patterns included intra-axial localization for Glioma, dural-based saliency for Meningioma, sellar and suprasellar localization for Pituitary, and diffuse/low-intensity maps for Notumor. These patterns were consistent across exemplars and robust to minor visualization parameter changes.

ii- SHAP waterfall plots (Fig. 9): For representative test cases, waterfall plots illustrate how specific features increase or decrease the probability for the predicted class, providing subject-level interpretability complementary to Grad-CAM’s spatial view.

iii- SHAP summaries and exemplars (Fig. 10): Aggregated SHAP results reveal class-consistent feature importance distributions across the test set, while exemplar maps illustrate localized evidence supporting individual predictions. The agreement between SHAP and Grad-CAM increases trust in the model’s decision patterns.

Error Analysis

Representative misclassifications from the test set are shown in Fig. 11 using Grad-CAM overlays. These examples illustrate the kinds of errors observed. Detailed analysis of error types is provided in the Discussion section.

Discussion

This work presents a lightweight, interpretable hybrid framework that fuses features from MobileNetV2 and EfficientNetV2B0 via late fusion and classifies with a non-parametric KNN classifier. The workflow (Fig. 1) and architecture (Figs. 2 and 3) were designed to balance accuracy, efficiency, and transparency. On the unified four-class dataset (Glioma, Meningioma, Pituitary, Notumor), the model achieved strong performance with a compact confusion matrix diagonal and consistently high class-wise precision, recall, and F1 (Figs. 4, 5, and 6; Table 4). Five-fold cross-validation with normality and paired testing (Table 5) substantiated robustness across splits. The ROC curves (Fig. 6) show excellent separability with AUC = 1.00 for all classes; to provide a more diagnostic view of errors and class balance, we also examine the confusion structure and class-wise precision, recall, and F1.

Why Fusion and KNN Worked Well

Late fusion after global average pooling retained complementary information learned by two diverse but efficient backbones, producing a compact fused representation. Classifying this representation with KNN (k = 5, Euclidean, distance weighting) offered three practical benefits: (i) a simple decision rule with minimal trainable parameters, reducing overfitting risk; (ii) competitive multi-class accuracy without the complexity of large fully connected heads; and (iii) ease of calibration and ablation when probing what the features capture. In practice, this design yielded balanced class-wise performance and a favorable error profile (Figs. 4, 5, and 6).

Positioning Relative To Prior Work

Recent reports highlight high accuracies for this unified dataset family, including transformer-based and deeper CNN approaches. As summarized in our comparative overview (Table 6; Fig. 7), the proposed lighter-weight, dual-backbone late-fusion model with a transparent KNN decision rule attains competitive accuracy relative to these prior works. Because datasets, splits, and preprocessing differ across studies, these values serve as contextual indicators rather than strictly comparable benchmarks. Rather than claiming absolute superiority, the current findings are calibrated to this dataset regime and emphasize methodological strengths: efficient fused features, robust cross-validation with statistical testing, and comprehensive interpretability with class-wise Grad-CAM and SHAP, including SHAP waterfall plots and summary attributions.

Interpretability and Error Patterns

Grad-CAM maps (Fig. 8) localized evidence in ways that align with radiological expectations: intra-axial saliency for Glioma, dural-based for Meningioma, and sellar and suprasellar for Pituitary, with diffuse or low-intensity patterns in Notumor. SHAP waterfall plots (Fig. 9) complemented these spatial maps by revealing which features increased or decreased the class score for specific cases, providing subject-level interpretability. Aggregated SHAP summaries and exemplar maps (Fig. 10) demonstrated class-consistent attribution distributions across the test set. The convergence between Grad-CAM and SHAP increases confidence that decisions are not driven solely by spurious cues, and the combined views can be surfaced alongside predictions to support radiologist review.

Error Analysis and Insights

To examine how the model arrives at its decisions, we combined spatial and feature‑level explanations derived from the auxiliary softmax head (used only for interpretability) with predictions from the KNN classifier operating on the fused features. Grad‑CAM overlays (Fig. 8) consistently highlighted class‑relevant anatomy: intra‑axial regions for Glioma, dural‑based attachments for Meningioma, and sellar/suprasellar regions for Pituitary. Notumor cases typically showed diffuse or low‑intensity saliency. These patterns were stable to minor visualization parameters and align with common neuroradiological expectations, providing face validity for the learned representations.

SHAP analyses complemented this spatial view with both case‑specific and aggregate attributions. Waterfall plots (Fig. 9) identified which features increased or decreased the predicted class score for individual images, clarifying why specific samples crossed the decision boundary. SHAP summary plots and exemplar maps (Fig. 10) indicated class‑consistent attribution distributions across the test set, suggesting the model relies on similar evidence within each diagnostic category. In general, the agreement between Grad‑CAM and SHAP reduces concern that predictions are driven by spurious cues and makes the outputs more suitable for expert review.

We further examined misclassifications using Grad‑CAM overlays (Fig. 11). Three recurring failure modes emerged: (i) attenuated or off‑target saliency for the ground‑truth class, (ii) spurious activations aligned with the ultimately predicted class, and (iii) intrinsic ambiguity in appearance. The latter often reflects familiar clinical overlaps: dural attachment versus intra‑axial margins when distinguishing glioma from meningioma, and subtle, low‑contrast sellar findings for small pituitary lesions. Representative patterns included gliomas with well‑defined borders resembling meningiomas, microadenoma‑like pituitary lesions labeled as Notumor, meningiomas with heterogeneous texture misread as gliomas, and normal intensity variations mistaken for pituitary lesions.

These observations point to practical avenues for improvement. Targeted augmentation near decision boundaries, such as controlled variations in edge sharpness, local texture, and contrast, may help disambiguate look‑alike presentations. Incorporating additional MRI sequences when available (T1, post‑contrast T1, T2, FLAIR) should provide complementary cues for challenging sellar and dural cases. Radiologist‑guided hard‑negative mining, curating confounders that commonly trigger spurious saliency, can steer the model away from misleading patterns. Additional safeguards include harmonized intensity normalization to reduce site effects, region‑of‑interest cropping to focus attention, and uncertainty estimation to flag borderline cases for human adjudication.

The interpretability artifacts are directly actionable in a clinical viewer: alongside the predicted class, a Grad‑CAM overlay and a SHAP‑based, case‑level rationale can support rapid verification of anatomical plausibility and focused review of difficult cases. Although the auxiliary head used for explanations is not identical to the KNN decision rule, the observed concordance between Grad‑CAM and SHAP across correct and incorrect cases suggests that the explanations reflect the underlying fused features. Together, the interpretability results and error patterns provide a coherent, clinically meaningful picture of model behavior and clear guidance for the next iteration of improvements.

Clinical Implications

The pipeline’s outputs can be integrated into a viewer to display: (i) the predicted class with confidence, (ii) a Grad-CAM overlay indicating suspected regions, and (iii) a SHAP explanation showing feature contributions for the specific case. The proposed hybrid model, with Grad-CAM and SHAP visualizations, has potential clinical utility in triage, second-opinion support, and resident training contingent upon local institutional validation and robust clinical governance. The model’s simplicity and small footprint also make it suitable for constrained compute environments, pending external validation.

Limitations and Future Directions

Although our study demonstrates excellent performance, the combined Figshare, SARTAJ, and Br35H dataset is known to be relatively easy and has produced near-ceiling metrics in prior studies. For reliable clinical translation, prospective multi-center validation on diverse cohorts such as BraTS, TCIA, and multi-site hospital data is essential. Future work will include: (i) external validation on independent datasets from different institutions and scanner manufacturers to assess generalizability and explanation stability across acquisition settings; (ii) integration of clinical metadata (age, sex, presenting symptoms) to enable multi-modal prediction; (iii) extension to 3D volumetric analysis to capture spatial relationships more comprehensively; (iv) investigation of the relationship between SHAP-derived feature importance and known molecular markers such as IDH mutation status and 1p/19q co-deletion to bridge radiological and molecular characterization. Additionally, clinician ratings of explanation plausibility will be collected to ensure clinical usefulness.

Conclusion

Detecting a brain tumor can be extremely difficult because of the brain’s structural complexity, and the brain ultimately controls all physiological activities in the body. Machine and deep learning technologies can automatically categorize brain tumors at early stages. The prevalence of brain tumors is growing worldwide, shortening lifespans, and inaccurate identification often leads to superfluous medical interventions that reduce opportunities for those affected. Accurate medical assessment is therefore crucial for improving the prognosis of individuals with brain tumors. The effective use of computer-aided diagnostic tools has led to notable progress in machine and deep learning, directly aiding medical practitioners in establishing accurate diagnoses. This study presented a fused hybrid framework of state-of-the-art convolutional neural network (CNN) models designed to strengthen feature extraction and improve classification performance.

The proposed technique uses a lightweight, interpretable hybrid framework for MRI-based brain tumor classification that fuses features from MobileNetV2 and EfficientNetV2B0 via late fusion (global average pooling and concatenation) and performs final classification with a KNN head. Using a stratified split (64% train, 16% validation, 20% test) across four classes (Glioma, Meningioma, Pituitary, Notumor), the approach achieved strong and well-balanced performance, as reflected by the confusion matrix and class-wise precision/recall/F1.

A key contribution of this work is transparency. Class‑wise Grad‑CAM overlays, SHAP waterfall plots for case‑level attribution, and aggregated SHAP summaries with exemplar maps are provided. These complementary explanations reveal anatomically plausible evidence supporting predictions and clarify error patterns, offering clinically meaningful insights into model behavior.

Author Contributions

Conceptualization, M.J.A. and C.O.N.; methodology, M.J.A. and C.O.N.; software, M.J.A. and M.F.; validation, M.J.A., C.O.N., A.Y. and H.B.K.; formal analysis, M.J.A., A.Y. and R.S.Z.; investigation, M.J.A., H.B.K. and A.H.J.; resources, L.Q. and A.H.J.; data curation, M.J.A., A.Y., M.F. and R.S.Z; writing—original draft preparation, M.J.A., C.O.N. and H.B.K.; writing—review and editing, L.Q., M.F., A.Y., A.H.J. and R.S.Z.; visualization, M.J.A., M.F., A.H.J and R.S.Z.; supervision, L.Q.; project administration, L.Q.; funding acquisition, L.Q.

Funding

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61471263, 61872267), the Natural Science Foundation of Tianjin (Grant No. 16JCZDJC31100), the Tianjin University Innovation Foundation (Grant No. 2021XZC-0024), and the Foundation of the State Key Laboratory of Ultrasound in Medicine and Engineering (Grant No. 2022KFKT004).

Data Availability

All data used are publicly available from Figshare, SARTAJ, and Br35H (Kaggle). We do not redistribute them.

Code Availability

Code and scripts are available at: https://github.com/mainajajere/brain-tumor-hybrid-fusion-knn (release v0.1.0).

Declarations

Competing Interests

The authors declare no competing interests.

Ethical Approval

This study used de-identified, publicly available datasets and did not require institutional review board approval.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Abbood AA, Shallal QM, Fadhel MA, Shallal QJ (2021) Automated brain tumor classification using various deep learning models: a comparative study. Indones J Electr Eng Comput Sci 22(1):252–259
  2. Abd El Kader I, Xu G, Shuai Z, Saminu S, Javaid I, Salim Ahmad I (2021) Differential deep convolutional neural network model for brain tumor classification. Brain Sci 11(3):352
  3. Amin B, Samir RS, Tarek Y, Ahmed M, Ibrahim R, Ahmed M et al (2023) Brain tumor multi classification and segmentation in MRI images using deep learning
  4. Amin J, Sharif M, Gul N, Yasmin M, Shad SA (2020) Brain tumor classification based on DWT fusion of MRI sequences using convolutional neural network. Pattern Recognit Lett 129:115–122
  5. Banerjee T (2025) Towards automated and reliable lung cancer detection in histopathological images using DY-FSPAN: a feature-summarized pyramidal attention network for explainable AI. Comput Biol Chem 118:108500. https://doi.org/10.1016/j.compbiolchem.2025.108500
  6. Battaglia S, Di Fazio C, Mazzà M, Tamietto M, Avenanti A (2024) Targeting human glucocorticoid receptors in fear learning: a multiscale integrated approach to study functional connectivity. Int J Mol Sci 25(2):864
  7. Bv B, Mathivanan SK, Shah MA (2024) Efficient brain tumor grade classification using ensemble deep learning models. BMC Med Imaging 24(1):297
  8. Chandraprabha K, Ganesan L, Baskaran K (2025) A novel approach for the detection of brain tumor and its classification via end-to-end vision transformer-CNN architecture. Front Oncol 15:1508451
  9. Chen W, Tan X, Zhang J, Du G, Fu Q, Jiang H (2024) A robust approach for multi-type classification of brain tumor using deep feature fusion. Front Neurosci 18:1288274
  10. Garg G, Garg R (2021) Brain tumor detection and classification based on hybrid ensemble classifier
  11. Ghassemi N, Shoeibi A, Rouhani M (2020) Deep neural network with generative adversarial networks pre-training for brain tumor classification based on MR images. Biomed Signal Process Control 57:101678
  12. Ghosal P, Nandanwar L, Kanchan S, Bhadra A, Chakraborty J, Nandi D (2019) Brain tumor classification using ResNet-101 based squeeze and excitation deep neural network. In: Second International Conference on Advanced Computational and Communication Paradigms (ICACCP). IEEE, pp 1–6
  13. Illimoottil M, Ginat D (2023) Recent advances in deep learning and medical imaging for head and neck cancer treatment: MRI, CT, and PET scans. Cancers 15(13):3267
  14. Khairandish MO, Sharma M, Jain V, Chatterjee JM, Jhanjhi N (2022) A hybrid CNN-SVM threshold segmentation approach for tumor detection and classification of MRI brain images. IRBM 43(4):290–299
  15. Lakshmi K, Amaran S, Subbulakshmi G, Padmini S, Joshi GP, Cho W (2025) Explainable artificial intelligence with UNet based segmentation and Bayesian machine learning for classification of brain tumors using MRI images. Sci Rep 15(1):690. https://doi.org/10.1038/s41598-024-84692-7
  16. Maqsood S, Damaševičius R, Maskeliūnas R (2022) Multi-modal brain tumor detection using deep neural network and multiclass SVM. Medicina 58(8):1090
  17. Maqsood S, Damaševičius R, Shah FM (2021) An efficient approach for the detection of brain tumor using fuzzy logic and U-NET CNN classification. In: Computational Science and Its Applications – ICCSA 2021, Cagliari, Italy, September 13–16, 2021, Proceedings, Part V. Springer, pp 105–118
  18. Markkandeyan S, Gupta S, Narayanan GV, Reddy MJ, Al-Khasawneh MA, Ishrat M et al (2023) Deep learning based semantic segmentation approach for automatic detection of brain tumor. 18(4)
  19. Mavaddati S (2024) Brain tumors classification using deep models and transfer learning. Multimed Tools Appl:1–32
  20. Mohanty BC, Subudhi P, Dash R, Mohanty B (2024) Feature-enhanced deep learning technique with soft attention for MRI-based brain tumor classification. Int J Inf Technol 16(3):1617–1626
  21. Narmatha C, Eljack SM, Tuka AARM, Manimurugan S, Mustafa M (2020) A hybrid fuzzy brain-storm optimization algorithm for the classification of brain tumor MRI images. J Ambient Intell Humaniz Comput:1–9
  22. Nickparvar M (2023) Brain tumor MRI dataset. Kaggle
  23. Rasool M, Ismail NA, Boulila W, Ammar A, Samma H, Yafooz WM et al (2022) A hybrid deep learning model for brain tumour classification. Entropy 24(6):799
  24. Raza A, Ayub H, Khan JA, Ahmad I, Salama AS, Daradkeh YI et al (2022) A hybrid deep learning-based approach for brain tumor classification. Electronics 11(7):1146
  25. Reddy CKK, Reddy PA, Janapati H, Assiri B, Shuaib M, Alam S et al (2024) A fine-tuned vision transformer based enhanced multi-class brain tumor classification using MRI scan imagery. Front Oncol 14:1400341
  26. Saddique M, Kazmi JH, Qureshi K (2014) A hybrid approach of using symmetry technique for brain tumor segmentation. Comput Math Methods Med 2014
  27. Sharif MI, Li JP, Khan MA, Saleem MA (2020) Active deep neural network features selection for segmentation and recognition of brain tumors using MRI images. Pattern Recognit Lett 129:181–189
  28. Siar M, Teshnehlab M (2019) Brain tumor detection using deep neural network and machine learning algorithm. In: 9th International Conference on Computer and Knowledge Engineering (ICCKE). IEEE, pp 363–368
  29. Singh DP, Kour P, Banerjee T, Swain D (2025) A comprehensive review of various machine learning and deep learning models for anti-cancer drug response prediction: comparative analysis with existing state of the art methods. Arch Comput Methods Eng 32(6):3733–3757. https://doi.org/10.1007/s11831-025-10255-2
  30. Singh R, Agarwal BB (2023) An automated brain tumor classification in MR images using an enhanced convolutional neural network. Int J Inf Technol 15(2):665–674
  31. Suresh M, Saranya S, Punitha A, Kowsalya R (2023) Identification of brain tumor stages and brain tumor diagnosis using deep learning model based on Inception V4 and DenseNet 201. In: 2023 International Conference on System, Computation, Automation and Networking (ICSCAN). IEEE, pp 1–6
  32. Tan M, Le QV (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In: Proceedings of the 36th International Conference on Machine Learning (ICML), pp 6105–6114
  33. Tanaka M, Giménez-Llort L, Battaglia S, Chen C, Hepsomali P (2024) Emerging translational research in neurological and psychiatric diseases: from in vitro to in vivo models, from animals to humans, from qualitative to quantitative methods 2.0. MDPI, Multidisciplinary Digital Publishing Institute
  34. Toğaçar M, Ergen B, Cömert Z (2020) BrainMRNet: brain tumor detection using magnetic resonance images with a novel convolutional neural network model. Med Hypotheses 134:109531
  35. Tummala S, Kadry S, Bukhari SAC, Rauf HT (2022) Classification of brain tumor from magnetic resonance imaging using vision transformers ensembling. Curr Oncol 29(10):7498–7511
  36. ul Haq I, Anwar S, Hasnain G (2022) A combined approach for multiclass brain tumor detection and classification. Pak J Eng Technol 5(1):83–88
  37. Ullah S, Ahmad M, Anwar S, Khattak MI (2023) An intelligent hybrid approach for brain tumor detection. Pak J Eng Technol 6(1):42–50
  38. Vankdothu R, Hameed MA, Fatima H (2022) A brain tumor identification and classification using deep learning based on CNN-LSTM method. Comput Electr Eng 101:107960
  39. Varuna Shree N, Kumar TNR (2018) Identification and classification of brain tumor MRI images with feature extraction using DWT and probabilistic neural network. Brain Inform 5(1):23–30
  40. Yadav AC, Kolekar MH, Zope MK (2025) Modified recurrent residual attention U-Net model for MRI-based brain tumor segmentation. Biomed Signal Process Control. https://doi.org/10.1016/j.bspc.2024.107220
  41. Yadav AC, Shah K, Purohit A, Kolekar MH (2025b) Computer-aided diagnosis for multi-class classification of brain tumors using CNN features via transfer-learning. Multimed Tools Appl:1–24
