Scientific Reports. 2025 Aug 23;15:31066. doi: 10.1038/s41598-025-16564-7

A novel residual network based on multidimensional attention and pinwheel convolution for brain tumor classification

Jincan Zhang 1, Rongfu Lv 1, Wenna Chen 2, Ganqin Du 2, Qizhi Fu 2, Hongwei Jiang 2
PMCID: PMC12374982  PMID: 40849507

Abstract

Early and accurate brain tumor classification is vital for clinical diagnosis and treatment. Although Convolutional Neural Networks (CNNs) are widely used in medical image analysis, they often struggle to focus adequately on critical information and have limited feature extraction capabilities. To address these challenges, this study proposes a novel Residual Network based on Multi-dimensional Attention and Pinwheel Convolution (Res-MAPNet) for Magnetic Resonance Imaging (MRI) based brain tumor classification. Res-MAPNet is built on two key modules: the Coordinated Local Importance Enhancement Attention (CLIA) module and the Pinwheel-Shaped Attention Convolution (PSAConv) module. CLIA combines channel attention, spatial attention, and direction-aware positional encoding to focus on lesion areas. PSAConv enhances spatial feature perception through asymmetric padding and grouped convolution, expanding the receptive field for better feature extraction. The proposed model is evaluated on two publicly available brain tumor datasets covering glioma, meningioma, pituitary tumor, and no-tumor classes. The experimental results show that the proposed model achieves 99.51% accuracy in the three-classification task and 98.01% accuracy in the four-classification task, better than existing mainstream models. Ablation studies validate the effectiveness of CLIA and PSAConv, whose combination improves accuracy over the ConvNeXt baseline by 4.41% and 4.45% on the two tasks, respectively. This study provides an efficient and robust solution for brain tumor computer-aided diagnosis systems with potential for clinical applications.

Keywords: Brain tumor classification, Deep learning, Convolutional neural networks, Asymmetric convolution, Attention mechanism

Subject terms: Oncology, Translational research, Computer science

Introduction

The human anatomy is composed of various organs, among which the most pivotal and valuable is the brain1. Brain tumors are abnormal masses that form in the brain and are likely to expand; because the skull leaves no room for growth, such expansion can produce potentially fatal brain malformations2. Untreated tumors can lead to permanent brain damage or even death. In 2019, about 86,000 new cases of brain tumors were diagnosed, and about 17,000 deaths from the disease have been reported since then3. Brain tumors are classified and differentiated based on size, shape, and location in the brain. They can be primary or secondary: primary tumors originate within brain cells, whereas secondary tumors spread to the brain from another part of the body4. Early identification and treatment are crucial, as a tumor usually spreads to other tissues, reducing the chances of effective treatment and survival. Medical imaging plays a crucial role in detecting brain tumors. Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) scans are valuable diagnostic tools: they reveal abnormal brain tissue, measure its extent, guide timely follow-up, and clarify tumor status. MRI, in particular, stands out for its ability to provide high-resolution imaging of brain tumors5.

Classical machine learning algorithms have been widely used for classification problems in Computer-Aided Diagnosis (CAD) systems6. These algorithms rely on hand-crafted features. They use classifiers such as Support Vector Machines (SVM), Decision Trees (DT), or k-nearest neighbors (KNN) to perform classification. Effective feature engineering requires domain knowledge and is more time-consuming and error-prone if the size of the dataset is very large7. Owing to the critical nature of medical data, it is very important to achieve high accuracy in less time.

Nowadays, deep learning, and especially Convolutional Neural Networks (CNNs), is widely used in computer-aided diagnostic methods for medical imaging, as CNNs can match and even surpass human performance in generic object recognition. Unlike traditional machine learning algorithms such as support vector machines and back-propagation neural networks, which usually require manual feature engineering, CNNs automatically extract meaningful features from images, making the process more efficient and less dependent on expert knowledge. A CNN is composed of convolutional layers and therefore has a strong inductive bias: each convolutional kernel has a fixed size and scans the image or intermediate feature maps incrementally, receiving only a small localized region at a time, and the computations of different kernels can run in parallel. As a result, the performance of computer-aided diagnosis systems has improved significantly with the development of advanced CNN models8. Several classification networks have been widely used for brain tumor classification tasks, such as RegNet9, ResNet10, DeepTumorNet11, Dense Efficient-Net12, DQKNet13, Vision Transformer14, and Swin Transformer15.

However, traditional CNNs suffer from fixed receptive fields and information loss in pooling layers, limiting their ability to capture complex lesion features. In addition, they require deep architectures to extract features effectively, and the resulting large parameter counts are a challenge for limited computational resources16. The Vision Transformer had a significant impact on computer vision in 2020, and Transformer architectures have since been increasingly adopted. However, ConvNeXt17, a pure convolutional neural network developed by Facebook AI Research and UC Berkeley, outperforms the Swin Transformer. Despite these advances, ConvNeXt still inherits the limitations of convolutional networks. Firstly, as a convolution-based architecture, it lacks explicit mechanisms to dynamically prioritize salient regions, a critical shortcoming in tasks like medical imaging, where lesion areas vary in size and location. Secondly, its feature extraction remains constrained by small receptive fields and a limited ability to capture directional and positional information. Transformers, in turn, rely on self-attention for global context but incur high computational costs, making them less suitable for resource-constrained clinical settings. To overcome these problems, an attention mechanism and a novel convolution operation are integrated into the ConvNeXt network, balancing accuracy and efficiency. The experimental results demonstrate an improvement in classification accuracy, achieved by integrating the Coordinated Local Importance Enhancement Attention (CLIA) module and the Pinwheel-Shaped Attention Convolution (PSAConv) module introduced in this paper into the baseline model. The main contributions of this study can be summarized as follows:

  • In this paper, a new deep learning-based brain tumor classification model, Residual Network based on Multi-dimensional Attention and Pinwheel Convolution (Res-MAPNet), is proposed.

  • Based on the baseline model, the introduction of residual blocks and the addition of the CLIA attention module proposed in this study make the network model more focused on useful information.

  • Replacing the traditional convolution operation with the proposed PSAConv attention convolution module enhances feature extraction and significantly increases the receptive field.

  • In extensive experimental evaluations, the proposed method has been shown to outperform current techniques.

The rest of the paper is divided into five sections: Related work, Methods, Results, Discussion, and Conclusion. Related work discusses the literature review of several state-of-the-art methods related to this study. The proposed methodology is discussed in the Methods section. The Results section discusses the system and model parameter settings, evaluation metrics, experimental results, and ablation experiments. Comparative experiments are performed in the Discussion section. Finally, the full paper is summarized in the Conclusion section.

Related work

Machine learning and deep learning techniques have made significant progress in the field of brain tumor classification. Among machine learning methods, Jena et al.18 achieved the following classification accuracies on FLAIR-, T1C-, and T2-weighted brain tumor MRI: 94.25% with support vector machines (SVM), 87.88% with K-nearest neighbors (KNN), 89.57% with binary decision trees (BDT), 96.99% with random forests (RF), and 97% with ensemble methods. Alzubaidi et al.19 pointed out that deep learning has come to be regarded as the gold standard in machine learning and has outperformed well-known machine learning techniques in many domains. The VGG model, known for its structural simplicity and depth, has been widely used in medical image classification tasks due to its ability to extract complex information from images20. ResNet facilitates the training of very deep networks by mitigating the gradient-vanishing problem through its residual learning architecture21. Several studies have achieved breakthroughs through innovative architectural improvements. The survey by Sadad et al.22 demonstrated the superiority of the CNN-based NASNet model, which achieved an impressive accuracy of 99.6% in multi-class tumor classification, surpassing models such as ResNet50 and MobileNetV2. To balance model performance and computational efficiency, researchers have combined CNNs with other techniques to construct hybrid frameworks. Saeedi et al.23 proposed a model using a 2D CNN and an autoencoder; their CNN reached 96.47% accuracy while keeping complexity far lower than the autoencoder. Hussain et al.24 proposed EFFResNet-ViT, a fusion-based convolutional and vision transformer model for medical image classification, which achieved state-of-the-art performance in the brain tumor classification task. These studies collectively underscore the transition from traditional machine learning to deep learning dominance, driven by architectural innovations and hybrid strategies.

In recent years, the attention mechanism has become a key enhancement technique for sequence-to-sequence (seq2seq) models and computer vision models. Different regions of an image contribute unequally to recognition accuracy, and selective attention to the most important regions can significantly improve recognition performance. Attention is a very general mechanism that can be applied to many problem domains. In image classification with a basic CNN, for example, the network generates a feature map in which a feature vector represents each region of the image; since the attention mechanism does not inherently depend on the organization of the feature vectors, it can be easily implemented in a variety of models across domains25. Later studies introduced architectures such as ECA-Net26, trained on the CIFAR-100 dataset, and EMA27, trained on the large-scale ImageNet dataset. The attention mechanism has a significant effect on the performance of image recognition tasks28 and has been widely used in deep learning architecture optimization. For example, Hussain et al.29 proposed DCSSGA-UNet, which integrates a channel-spatial and semantic guided attention mechanism into the DenseNet121 model to enhance medical image segmentation. Hussain and Shouno30 proposed a segmentation network called Multi-Attention Gated Residual U-Net (MAGRes-UNet), which incorporates four multi-attention gate (MAG) modules and residual blocks into a standard U-Net structure, providing competitive performance against representative medical image segmentation methods. EfficientNetV2 integrates a channel attention block (SE block) to strengthen its feature representation and achieves 94.3% Top-5 classification accuracy on the ImageNet dataset31. These innovations underscore attention's transformative role in model optimization, balancing computational efficiency with precision by emphasizing task-relevant spatial or channel-wise information.

While CNNs excel in computer vision tasks, traditional convolution operations face limitations in feature extraction efficiency. Therefore, various efficient convolution operations, such as Group-Wise Convolution (GWC), Depth-Wise Convolution (DWC), and Point-Wise Convolution (PWC), have been proposed to replace costly standard convolutions32. MobileNetV233 uses DWC and PWC to filter features in its inverted residual blocks, reducing the number of parameters while speeding up training. ShuffleNet34 uses PWC and channel shuffle operations to improve information flow between different channel groups. Wang et al.35 proposed TiedBlockConv, which shares the same channel block across equal convolutional filters to produce multiple responses within a single filter. Zhang et al.36 proposed LDConv, which achieves efficient feature extraction through irregular convolution operations, providing more options for convolutional sampling shapes. GhostNet37 exploits redundancy between feature maps and uses inexpensive operations such as DWC to generate the redundant features. SlimConv reduces feature redundancy through channel-reduction and weight-flipping operations38. Collectively, these methods improve CNN efficiency by rethinking convolution design, balancing computational savings with robust feature representation.

Inspired by the above research results, our network integrates the CLIA attention module and replaces the traditional convolution operation with PSAConv convolution. The experimental results show that our network achieves the desired accuracy on the dataset of brain tumor multiclassification tasks with a low number of parameters.

Methods

The proposed method for brain tumor image classification is detailed in this section. In this method, two brain tumor datasets are classified using the proposed Res-MAPNet network.

Datasets and preprocessing

This experiment is based on two publicly available brain tumor MRI datasets. The MRI scans are a combination of T1-, T2-, and FLAIR-weighted sequences. The three-classification task distinguishes between gliomas, meningiomas, and pituitary tumors, and the four-classification task adds the identification of no tumor. The datasets were collected from the publicly available Figshare brain tumor dataset and the Kaggle website. Sample images for each category are shown in Fig. 1. Both datasets are divided into training and validation sets in an 8:2 ratio. Table 1 details the sample distributions of the three-classification and four-classification tasks.

Fig. 1. Brain tumor samples for each category.

Table 1.

Details of dataset partitioning.

Type | Three-classification (Original / Train / Val) | Four-classification (Original / Train / Val)
Glioma | 1426 / 1141 / 285 | 926 / 740 / 186
Meningioma | 708 / 567 / 141 | 940 / 752 / 188
Pituitary | 930 / 744 / 186 | 901 / 720 / 181
No tumor | - / - / - | 497 / 397 / 100
Total | 3064 / 2452 / 612 | 3264 / 2609 / 655

MRI images were resized to 224 × 224 × 3. Furthermore, min-max normalization was adopted to scale the intensity values of each image to the range [0, 1]. To enhance the generalization ability of the model and suppress overfitting, this study introduces a data augmentation strategy during training, as shown in Fig. 2. Data augmentation techniques increase the diversity of the dataset39. Random scale cropping enhances the scale invariance of the model by dynamically changing the scale and spatial distribution of the input samples, and random horizontal flipping improves the robustness of the model to spatially symmetric features. These augmentation methods expand the diversity of the training samples and enable the network to learn more discriminative feature representations. To alleviate dataset imbalance, random oversampling of the minority categories was adopted during training. Furthermore, the focal loss function was employed to handle class imbalance.
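The snippet below is a minimal PyTorch/torchvision sketch of this preprocessing pipeline; the focal-loss gamma value and the use of a weighted sampler for oversampling are illustrative assumptions rather than settings reported in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import WeightedRandomSampler
from torchvision import transforms

# Resizing plus the two augmentations described above; ToTensor scales to [0, 1].
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),       # random scale cropping
    transforms.RandomHorizontalFlip(p=0.5),  # random horizontal flipping
    transforms.ToTensor(),                   # min-max scaling to [0, 1]
])

def make_oversampler(labels: torch.Tensor) -> WeightedRandomSampler:
    """Random oversampling: minority-class samples are drawn more often."""
    counts = torch.bincount(labels).float()
    per_sample_weight = (1.0 / counts)[labels]
    return WeightedRandomSampler(per_sample_weight,
                                 num_samples=len(labels), replacement=True)

class FocalLoss(nn.Module):
    """Multi-class focal loss; gamma = 2.0 is a common default (assumption)."""
    def __init__(self, gamma: float = 2.0):
        super().__init__()
        self.gamma = gamma

    def forward(self, logits, targets):
        ce = F.cross_entropy(logits, targets, reduction="none")
        pt = torch.exp(-ce)  # model probability assigned to the true class
        return ((1.0 - pt) ** self.gamma * ce).mean()
```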

Fig. 2. Images after applying data augmentation: original image (a), horizontal flip (b), and random resized crop (c).

Overall network architecture

A novel network architecture, Res-MAPNet, for MRI brain tumor classification is proposed in this paper, as shown in Fig. 3. It uses a residual learning backbone with an input size of 224 × 224 × 3. Res-MAPNet integrates two novel modules into the ConvNeXt backbone; its core components are the CLIA and PSAConv modules. To address the shortcomings of existing methods in multi-scale feature extraction and spatial perception, this study draws inspiration from Local Importance-based Attention (LIA)40 and Pinwheel-shaped Convolution (PConv)41, from which the CLIA and PSAConv modules are developed. Deeper networks face degradation issues, which can be mitigated by skip connections: they allow models to reach greater depth by transferring information from initial to deeper layers without adding parameters42. In the network design, the gradient dispersion problem is effectively mitigated by introducing skip connections, and the residual connections are designed using a custom approach. However, attending to the focal region of an image remains difficult, as does feature extraction. Therefore, the CLIA attention module is added to the residual structure to strengthen the network's attention to features in the focal region; this module strengthens the perception of critical regions by establishing a local importance weight map. Additionally, traditional convolution operations ignore the spatial distribution of features and have an insufficient receptive field. To address these limitations, this study uses the PSAConv module instead of the traditional convolutional layer. Unlike traditional convolution, PSAConv adopts asymmetric padding and grouped convolutions, which expand the receptive field and enhance low-level feature extraction while controlling the growth in parameter count. To keep the model lightweight, this study improves efficiency by optimizing the network depth: the number of repeated ResBlock modules is reduced, lowering computational complexity while maintaining the feature extraction capability.

Fig. 3. The proposed network model architecture.

Figure 4 shows the design of the ResBlock in the backbone network. During forward propagation, the block uses depthwise separable convolution. This reduces the number of computational parameters while keeping the input and output dimensions consistent. The Permute operation adjusts the tensor format from [N, C, H, W] to [N, H, W, C]. This change allows layer normalization to act directly on the channel dimension. It also ensures compatibility with the input format required by subsequent linear layers. Layer Scale dynamically adjusts feature magnitudes using learnable parameters. This improves training stability and model performance. After layer normalization, the linear layer, Layer Scale, and Drop Path operations are applied. Another Permute operation then restores the tensor to its original dimension [N, C, H, W]. This ensures compatibility with the channel order of subsequent layers. ResBlock uses residual connections to optimize deep networks. It also helps efficiently extract features and stabilize training while keeping the input and output dimensions consistent.
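A minimal PyTorch sketch of such a ConvNeXt-style ResBlock follows; the 7 × 7 depthwise kernel, the 4× expansion ratio, the Layer Scale initial value, and the DropPath implementation follow the ConvNeXt baseline and are assumptions here.

```python
import torch
import torch.nn as nn

class DropPath(nn.Module):
    """Stochastic depth: randomly drops the residual branch per sample."""
    def __init__(self, p: float = 0.0):
        super().__init__()
        self.p = p

    def forward(self, x):
        if self.p == 0.0 or not self.training:
            return x
        keep = 1.0 - self.p
        mask = x.new_empty(x.shape[0], *([1] * (x.dim() - 1))).bernoulli_(keep)
        return x * mask / keep

class ResBlock(nn.Module):
    """ConvNeXt-style residual block matching the description above (sketch)."""
    def __init__(self, dim: int, drop_path: float = 0.0, ls_init: float = 1e-6):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise
        self.norm = nn.LayerNorm(dim)            # acts on channels after Permute
        self.fc1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(4 * dim, dim)
        self.gamma = nn.Parameter(ls_init * torch.ones(dim))  # Layer Scale
        self.drop_path = DropPath(drop_path)

    def forward(self, x):                        # x: [N, C, H, W]
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                # Permute to [N, H, W, C]
        x = self.fc2(self.act(self.fc1(self.norm(x))))
        x = self.gamma * x                       # Layer Scale
        x = x.permute(0, 3, 1, 2)                # back to [N, C, H, W]
        return shortcut + self.drop_path(x)      # residual connection
```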

Fig. 4. The structure of ResBlock.

Coordinated local importance enhancement attention

As shown in Fig. 5, the CLIA module proposed in this study effectively addresses the limitations of traditional attention models in medical image analysis. It enhances multi-scale feature capture through a multi-dimensional feature fusion mechanism. CLIA is a hybrid attention mechanism, combining channel attention, spatial attention and directional positional encoding. By dynamically recalibrating feature maps along these dimensions, CLIA enables the network to focus on lesion regions more precisely, thereby improving the classification performance of brain tumors.

Fig. 5. The structure of CLIA.

The input feature first passes through the Coordinate Attention (CA) module, as shown in Fig. 6. CA decomposes channel attention into two one-dimensional feature encoding processes that aggregate features along the two spatial directions. This design captures long-range dependencies along one spatial direction while preserving precise positional information along the other. The resulting feature maps are encoded into a pair of direction-aware and position-sensitive attention maps that act complementarily on the input feature map to enhance the representation of the target object43. CA thus considers not only channel information but also orientation-related positional information.
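A compact PyTorch sketch of CA following this description is given below; the reduction ratio and the Hardswish activation follow the original CA design43 and are assumptions here.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate Attention (sketch): pool along H and W separately, share a
    1x1 transform, then emit direction-aware attention maps."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):                                  # x: [N, C, H, W]
        n, c, h, w = x.shape
        x_h = x.mean(dim=3, keepdim=True)                  # [N, C, H, 1]: pool over W
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # [N, C, W, 1]
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # [N, C, H, 1]
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # [N, C, 1, W]
        return x * a_h * a_w                               # direction-aware recalibration
```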

Fig. 6. The structure of CA.

As shown in Fig. 7, SoftPool is used for downsampling. This method retains information-rich features more effectively, thus improving the performance of CNNs in classification and detection tasks. SoftPool applies a softmax weighting within each pooling region, which allows every activation value to have a scaled impact on the output. Because it improves classification performance while maintaining low computational requirements, it is an ideal alternative to existing max pooling and average pooling operations44. Finally, local statistical information is extracted by a 3 × 3 convolution to generate a preliminary importance map G(X), as shown in the following equation:

$$G(X) = \mathrm{Conv}_{3\times 3}\big(\mathrm{SoftPool}(\mathrm{CA}(X))\big) \tag{1}$$

Fig. 7. The structure of SoftPool.

where X is the input feature map and CA(X) denotes the coordinate attention computation applied to X.

The local importance map is activated by a Sigmoid function and upsampled to the input resolution using a bilinear interpolation algorithm. To avoid artifacts introduced by convolution operations and bilinear interpolation, the first channel X0 of the input features is directly used. This channel serves as a gating control to recalibrate the local importance map. Notably, this approach requires no additional parameters. The gating signal is subjected to Sigmoid activation to generate dynamic weights. Finally, the attention map A(X) is generated by an element-by-element product operation of the gating mask and the up-sampled importance map. This design effectively avoids the parameter redundancy problem while ensuring the feature calibration accuracy. As shown in the following equation:

$$A(X) = \sigma(X_0) \odot \mathrm{Bilinear}\big(\sigma(G(X))\big) \tag{2}$$

where σ is the Sigmoid activation function, and Bilinear is the bilinear interpolation.

The attention map is multiplied element-wise with the original feature map to enhance the important areas, yielding the output of the CLIA module, X_out, as shown in the following equation:

$$X_{\mathrm{out}} = A(X) \odot X \tag{3}$$

Finally, CLIA is added to the residual structure of the network to enhance the ability of the network model to extract features.
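The following sketch assembles Eqs. (1)-(3) into a PyTorch module, reusing the CoordinateAttention sketch above; the single-channel importance map and the window-wise SoftPool approximation are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def soft_pool2d(x, kernel_size=2, stride=2):
    """SoftPool sketch: softmax-weighted pooling inside each window."""
    e = torch.exp(x)
    num = F.avg_pool2d(e * x, kernel_size, stride)
    den = F.avg_pool2d(e, kernel_size, stride).clamp_min(1e-6)
    return num / den

class CLIA(nn.Module):
    """Sketch of Eqs. (1)-(3): CA -> SoftPool -> 3x3 conv gives G(X); the
    sigmoid-activated map is upsampled and gated by the first channel X0."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = CoordinateAttention(channels)  # from the earlier CA sketch
        self.conv = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, x):                          # x: [N, C, H, W]
        g = self.conv(soft_pool2d(self.ca(x)))     # Eq. (1): importance map G(X)
        g = F.interpolate(torch.sigmoid(g), size=x.shape[2:],
                          mode="bilinear", align_corners=False)
        a = torch.sigmoid(x[:, :1]) * g            # Eq. (2): gate with X0
        return a * x                               # Eq. (3): X_out
```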

Pinwheel-shaped attention convolution

Pinwheel-shaped convolution

PConv is a convolution module designed for the Gaussian spatial distribution characteristics of infrared small targets (IRST). PConv is intended to replace the standard convolution layer, especially for the underlying layer of backbone networks. The core idea is to expand the receptive field and enhance the extraction of low-level features. This is achieved through asymmetric padding and grouped convolution operations. The growth of the parameter size is effectively controlled. The original PConv module achieves efficient feature extraction through the following steps:

  1. Parallel multi-directional convolution: 1 × 3 and 3 × 1 horizontal and vertical convolution kernels are used, combined with an asymmetric padding strategy to expand the receptive field.

  2. Feature concatenation and normalization: the four sets of convolution results are concatenated along the channel dimension, and channel compression is performed by a 2 × 2 convolution.

  3. Parameter efficiency: the receptive field is significantly expanded with only a small increase in parameters through the grouped convolutional design.

In this study, applying PConv to medical image classification effectively improved the feature extraction ability of the network model.

PConv integration strategy with CA

Most existing network models use standard convolution operations. Although these achieve excellent performance, they struggle to capture features in brain tumor lesion regions. The original PConv module enhances feature extraction with a pinwheel-shaped convolutional design that significantly increases the receptive field. However, it focuses primarily on efficiently extracting localized features and does not account for differences in the importance of channels and spatial locations. This may lead to insufficient attention to critical channels or spatial regions, especially in complex scenes where redundant features are difficult to distinguish. For this reason, this study introduces the CA attention mechanism at the end of the PConv module to form the PSAConv module, as shown in Fig. 8. CA dynamically adjusts the weights of different channels and focuses on key regions, compensating for the lack of global context awareness in PConv. This attentional convolution enhances feature extraction and increases the receptive field while making the network more focused on brain tumor lesion regions. Unlike traditional convolution, PSAConv creates horizontal and vertical convolution kernels for each region of the image by asymmetrically padding the input tensor.

Fig. 8. Architecture of the PSAConv.

The feature extraction layer is similar to the original PConv: PSAConv also employs a dynamic multi-scale convolutional kernel, and its output consists of four parallel convolution branches, as shown in the following equations:

$$X_1 = \mathrm{SiLU}\big(\mathrm{BN}(\mathrm{Conv}_{1\times 3}(P_1(X)))\big) \tag{4}$$
$$X_2 = \mathrm{SiLU}\big(\mathrm{BN}(\mathrm{Conv}_{1\times 3}(P_2(X)))\big) \tag{5}$$
$$X_3 = \mathrm{SiLU}\big(\mathrm{BN}(\mathrm{Conv}_{3\times 1}(P_3(X)))\big) \tag{6}$$
$$X_4 = \mathrm{SiLU}\big(\mathrm{BN}(\mathrm{Conv}_{3\times 1}(P_4(X)))\big) \tag{7}$$

where P_i denotes the asymmetric padding applied to the input feature X; for example, P_1 = (1, 0, 0, 3) pads the input feature map with 1, 0, 0, and 3 pixels on the left, right, top, and bottom, respectively. BN stands for Batch Normalization, and SiLU for the Sigmoid Linear Unit activation function.

The branch outputs are concatenated to obtain multi-scale features, after which convolution, normalization, and the SiLU activation function are applied to fuse the multi-scale features efficiently, as shown in the following equation:

$$X_{\mathrm{cat}} = \mathrm{SiLU}\big(\mathrm{BN}(\mathrm{Conv}_{2\times 2}(\mathrm{Cat}(X_1, X_2, X_3, X_4)))\big) \tag{8}$$

where Cat denotes channel-wise concatenation.

Finally, adding CA attention at the end of the module makes the network pay more attention to channel and direction-related positional information. The final output of the module is given by the following equation:

$$X_{\mathrm{out}} = \mathrm{CA}(X_{\mathrm{cat}}) \tag{9}$$
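A PyTorch sketch of PSAConv following Eqs. (4)-(9) is shown below, reusing the CoordinateAttention sketch from earlier; the exact padding tuples and the stride-1 setting follow the original PConv design and are assumptions here.

```python
import torch
import torch.nn as nn

class ConvBNSiLU(nn.Module):
    """Conv2d + BatchNorm + SiLU, the per-branch operation in Eqs. (4)-(8)."""
    def __init__(self, c1, c2, k):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, stride=1, padding=0, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class PSAConv(nn.Module):
    """Four asymmetrically padded 1x3 / 3x1 branches, 2x2 fusion convolution,
    then CA (Eq. (9)). Sketch only; c2 must be divisible by 4."""
    def __init__(self, c1, c2):
        super().__init__()
        # (left, right, top, bottom) zero padding for the four pinwheel directions
        self.pads = nn.ModuleList(
            nn.ZeroPad2d(p) for p in
            [(3, 0, 1, 0), (0, 3, 0, 1), (0, 1, 3, 0), (1, 0, 0, 3)])
        self.cw = ConvBNSiLU(c1, c2 // 4, (1, 3))  # horizontal kernels, Eqs. (4)-(5)
        self.ch = ConvBNSiLU(c1, c2 // 4, (3, 1))  # vertical kernels, Eqs. (6)-(7)
        self.fuse = ConvBNSiLU(c2, c2, 2)          # 2x2 channel fusion, Eq. (8)
        self.ca = CoordinateAttention(c2)          # from the earlier CA sketch

    def forward(self, x):
        y = torch.cat([self.cw(self.pads[0](x)), self.cw(self.pads[1](x)),
                       self.ch(self.pads[2](x)), self.ch(self.pads[3](x))], dim=1)
        return self.ca(self.fuse(y))               # output keeps the input H x W
```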

PSAConv utilizes grouped convolution, increasing the receptive field while minimizing the number of parameters. The number of parameters of a standard convolution is calculated as:

$$\mathrm{Params}_{\mathrm{Conv}} = k^2 \times c_1 \times c_2 \tag{10}$$

where k is the convolution kernel size, c_1 the number of input channels, and c_2 the number of output channels. For a 3 × 3 kernel, a standard convolution therefore has 9c_1c_2 parameters, while the parameters of PSAConv are calculated as follows:

$$\mathrm{Params}_{\mathrm{PSAConv}} = 4 \times \Big(3 \times c_1 \times \frac{c_2}{4}\Big) + 2^2 \times c_2 \times c_2 = 3c_1c_2 + 4c_2^2 \tag{11}$$

Hence, compared with traditional convolutions, the PSAConv used in this study has a lower number of parameters.
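As a concrete illustration of Eqs. (10) and (11), with c_1 = c_2 = 96 (a channel width chosen arbitrarily for the comparison), a standard 3 × 3 convolution requires 9c_1c_2 = 82,944 parameters, whereas the pinwheel branches and fusion convolution require 3c_1c_2 + 4c_2^2 = 27,648 + 36,864 = 64,512.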

Results

This section presents the classification results of the brain tumor MRI datasets using the Res-MAPNet architecture. The system configuration and model parameters are first described, followed by the evaluation metrics, the experimental results, and finally the ablation experiments.

Model parameters and hardware environment

The experiments were conducted on the Windows 10 operating system with PyTorch 2.1.0. Details of the hardware configuration and model parameters are given in Table 2.

Table 2.

Hardware configuration and model parameters.

Type | Configuration | Type | Value
GPU | RTX 4070 | Init-lr | 5e-4
CPU | i5-13400F | Weight_decay | 5e-2
CUDA | 12.0 | Epoch | 100
PyTorch | 2.1.0 | Optimizer | AdamW
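As a sketch, the settings in Table 2 map onto PyTorch as follows; the small placeholder network merely stands in for Res-MAPNet.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(16, 4))      # placeholder network
optimizer = torch.optim.AdamW(model.parameters(),
                              lr=5e-4,                      # Init-lr
                              weight_decay=5e-2)            # Weight_decay
num_epochs = 100                                            # Epoch
```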

Evaluation metrics

To fully evaluate the performance of the proposed model, the accuracy, precision, recall, specificity, and F1-score are used to assess the model’s performance. In addition, the correspondence between the categories predicted by the model and the actual categories is also demonstrated through the confusion matrix.

Accuracy indicates the proportion of all samples that the model predicts correctly. Precision is the proportion of samples correctly identified as positive among all instances predicted to be positive; it measures the model's ability to avoid false positive predictions. Recall is the proportion of truly positive samples that are correctly predicted by the model. In some scenarios, such as disease screening, missing true cases is particularly costly (i.e., false negatives must be reduced), so recall deserves particular attention. The F1-score jointly considers precision and recall by computing their harmonic mean. Specificity is the proportion of truly negative samples that are correctly predicted as negative. All the evaluation metrics are given in the following equations:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{13}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{14}$$
$$\text{F1-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{15}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{16}$$

where TP (true positives) is the number of positive samples correctly predicted as positive; TN (true negatives) is the number of negative samples correctly predicted as negative; FP (false positives) is the number of negative samples incorrectly predicted as positive; and FN (false negatives) is the number of positive samples incorrectly predicted as negative.

The confusion matrix, also known as the error matrix, is a structured table used to visualize the correspondence between the predictions of a network model and the true labels. It reflects the overall classification effectiveness of the model and reveals its ability to identify and generalize across different categories by systematically analyzing the discriminative results for each category. The confusion matrix adopts a specific layout: true labels are distributed along the vertical axis, while predicted categories align with the horizontal axis. This configuration creates a two-dimensional mapping that visualizes classification outcomes. By analyzing the row-column correspondence, the misclassification pattern of the model in specific categories can be precisely located, providing a quantitative basis for optimizing the classifier.
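Given such a matrix (true labels on the rows, predictions on the columns), the per-class counts and Eqs. (12)-(16) can be computed in a one-vs-rest fashion, as in the following sketch (NumPy assumed):

```python
import numpy as np

def per_class_metrics(cm: np.ndarray):
    """Derive per-class TP/FP/FN/TN from a confusion matrix and apply Eqs. (12)-(16)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as this class, actually another
    fn = cm.sum(axis=1) - tp          # this class predicted as another
    tn = cm.sum() - tp - fp - fn
    accuracy = tp.sum() / cm.sum()                        # Eq. (12), overall
    precision = tp / (tp + fp)                            # Eq. (13)
    recall = tp / (tp + fn)                               # Eq. (14)
    f1 = 2 * precision * recall / (precision + recall)    # Eq. (15)
    specificity = tn / (tn + fp)                          # Eq. (16)
    return accuracy, precision, recall, f1, specificity
```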

Experimental results

To visually reflect the overall classification effect of the model and to ease the calculation of the other evaluation metrics, the confusion matrices of the proposed model on the two datasets are plotted in Fig. 9. In the three-classification task, the model demonstrated excellent performance, with an overall accuracy of 99.51%. The Pituitary category achieved zero misclassifications, and only a very small number of cross-misclassifications occurred between Glioma and Meningioma, with one case in each category. In the four-classification task, the overall accuracy decreased slightly to 98.01%. The main challenge was concentrated in the Meningioma category, whose misclassifications were distributed as Glioma (4 cases), No tumor (1 case), and Pituitary (2 cases). The No tumor category performed robustly, although 2 cases were misclassified as Meningioma. Pituitary maintained high robustness in both tasks, indicating that the model extracts pituitary tumor features effectively. In contrast, Meningioma showed a slight decline in the four-classification task. Overall, the diagonal regions representing correct classification show clear dark aggregation, and the numerical distribution is highly concentrated on the diagonal. The efficiency and reliability of the proposed model in the multi-classification of brain tumors are thus fully verified.

Fig. 9. Classification results of the brain tumor dataset: (a) three-category confusion matrix and (b) four-category confusion matrix.

With the above confusion matrices, the evaluation metrics for each brain tumor category can be computed. The detailed values of Precision (Pre), Recall (Re), F1-score (F1), Specificity (Spe), and overall Accuracy (Acc) are shown in Table 3. The proposed model shows excellent classification performance in both the three- and four-classification tasks, with overall accuracies of 99.51% and 98.01%, respectively. For the three-classification task, the evaluation metrics of all categories exceed 99%. Pituitary tumors perform particularly well, achieving 100% recall and 99.8% specificity, indicating classification results completely free of omissions and false detections. After expanding to the four-classification task, the overall performance of the model remains robust. However, the precision for meningiomas decreases to 96.8%, mainly reflecting confusion with gliomas (four cases) and pituitary tumors (two cases). Notably, the specificity of the new "no tumor" category was as high as 99.8%, with only 2 false positives, validating the model's ability to accurately identify healthy tissue. Except for meningioma in the four-classification task, the specificity of all categories exceeded 99%. The F1-scores deviate very little from precision and recall, confirming that the model achieves a high balance between recall and specificity. The model, therefore, has potential for clinical deployment.

Table 3.

The results of each evaluation metric of the three-classification and four-classification tasks.

Type | Three-classification (Pre / Re / F1 / Spe, %) | Four-classification (Pre / Re / F1 / Spe, %)
Glioma | 99.6 / 99.3 / 99.5 / 99.7 | 97.9 / 98.9 / 98.4 / 99.1
Meningioma | 99.3 / 99.3 / 99.3 / 99.8 | 96.8 / 96.3 / 96.5 / 98.7
No tumor | - / - / - / - | 99.0 / 98.0 / 98.5 / 99.8
Pituitary | 99.5 / 100 / 99.7 / 99.8 | 98.9 / 98.9 / 98.9 / 99.6
Overall Acc (%) | 99.51 | 98.01

To better understand the decision-making mechanism of the model, enhance its interpretability, and provide auxiliary support for medical diagnosis, this study presents heatmaps, as shown in Fig. 10. The highlighted regions indicate the areas the model attends to most during classification. The MRI brain tumor multi-classification heatmaps, constructed with the gradient-weighted class activation mapping (Grad-CAM) method, show that the proposed deep learning model can effectively identify and accurately locate tumor lesions. The highlighted regions in the model output closely match the true tumor locations, indicating that the model focuses on pathological feature regions. This result further demonstrates its potential application value in medical image analysis.

Fig. 10. Grad-CAM visualization of the proposed Res-MAPNet predictions when classifying brain tumor types from MRI sequences. The top and bottom rows show the original images and the Grad-CAM maps, respectively: (a) glioma, (b) meningioma, and (c) pituitary tumor.
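A minimal Grad-CAM sketch in PyTorch is shown below; the choice of target layer (e.g., the last convolutional stage of Res-MAPNet) and the min-max normalization step are assumptions.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx=None):
    """Grad-CAM: weight the target layer's activations by the spatially
    averaged gradients of the class score, then ReLU, upsample, normalize."""
    acts, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    try:
        model.eval()
        logits = model(image)                    # image: [1, 3, 224, 224]
        if class_idx is None:
            class_idx = int(logits.argmax(dim=1))
        model.zero_grad()
        logits[0, class_idx].backward()
        weights = grads[0].mean(dim=(2, 3), keepdim=True)   # GAP of gradients
        cam = F.relu((weights * acts[0]).sum(dim=1, keepdim=True)).detach()
        cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                            align_corners=False)
        return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # [0, 1] map
    finally:
        h1.remove()
        h2.remove()
```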

Ablation experiments

To validate the impact of each improvement module on model performance, ablation experiments were performed over four module configurations on the three- and four-classification brain tumor MRI tasks. Each configuration was compared in terms of accuracy, average precision, average recall, and average F1-score on the validation dataset. The results are summarized in Table 4.

Table 4.

Ablation experiments on three- and four-classification tasks for brain tumors.

Number of classes | ConvNeXt | CLIA | PSAConv | Acc (%) | Pre (%) | Re (%) | F1 (%)
Three | ✓ | × | × | 95.10 | 94.65 | 95.08 | 94.73
Three | ✓ | ✓ | × | 97.22 | 96.85 | 96.69 | 96.77
Three | ✓ | × | ✓ | 98.37 | 98.15 | 98.40 | 98.27
Three | ✓ | ✓ | ✓ | 99.51 | 99.47 | 99.53 | 99.50
Four | ✓ | × | × | 93.56 | 93.33 | 93.29 | 93.30
Four | ✓ | ✓ | × | 96.32 | 96.55 | 96.31 | 96.41
Four | ✓ | × | ✓ | 96.17 | 96.15 | 96.29 | 96.22
Four | ✓ | ✓ | ✓ | 98.01 | 98.13 | 98.02 | 98.07

The ablation experiments show that ConvNeXt, combined with CLIA and the PSAConv module, significantly improves performance on the brain tumor classification task. Taking the three-classification task as an example, the model accuracy was 95.10% when ConvNeXt was used alone and improved by 2.12% to 97.22% after adding CLIA. The effect is even more pronounced when PSAConv replaces the traditional convolution operation, improving the accuracy by 3.27% to 98.37%. When both modules are integrated, performance peaks with a 4.41% improvement in accuracy to 99.51% and a 4.77% improvement in F1-score to 99.50%, indicating that the two modules work synergistically. For the more complex four-classification task, the baseline accuracy of 93.56% improved by 2.76% to 96.32% with the addition of CLIA, and by 2.61% to 96.17% with PSAConv alone. The combination of the two works best, with accuracy increasing by 4.45% to 98.01%. The highest accuracy for the four-classification task was 1.50% lower than for the three-classification task, indicating that classification becomes more difficult with more categories. The combination of CLIA and PSAConv not only improves the ability to differentiate between multiple tumor types but also handles complex tasks stably, providing an efficient and reliable solution for medical image analysis.

To verify the performance advantage of the improved baseline model, the convergence behavior under the different classification tasks was analyzed. Figures 11(a) and 11(b) present the validation accuracy curves and the corresponding cross-entropy loss trends for the three-classification task, and Figs. 11(c) and 11(d) show the corresponding curves for the four-classification scenario. The experimental results show that, with the addition of the improved modules, the model exhibits faster convergence and a more stable training process in both classification tasks. The accuracy and loss curves improve on the validation set for both the three- and four-classification tasks. The proposed model performs best, with faster convergence, higher validation accuracy, and lower loss values throughout. This demonstrates the effectiveness of CLIA and PSAConv in feature extraction and optimization, enhancing the learning ability of the model and improving classification accuracy. An accuracy of 99.51% was achieved in the three-classification task, outperforming the baseline model, and the loss curve likewise validates this result. Although the four-classification task is more difficult, the proposed model still demonstrates a clear advantage with an accuracy of 98.01%.

Fig. 11. Validation accuracy and loss curves of the proposed Res-MAPNet model: (a) accuracy and (b) loss for the three-classification task; (c) accuracy and (d) loss for the four-classification task.

Through the above ablation experiments, the proposed network model achieves high scores on all evaluation metrics on the three- and four-classification brain tumor datasets. These findings indicate that Res-MAPNet has good robustness and generalizability.

Discussion

To further investigate the performance of Res-MAPNet, this study compares it with classical network models and with state-of-the-art models.

Table 5 summarizes the comparison of the three- and four-classification brain tumor tasks against classical network models; the highest evaluation metrics are shown in bold. The experimental design covers two typical classification scenarios: the three-classification task evaluates the model's feature-differentiation ability under a limited number of categories, while the four-classification task further verifies its generalization performance when more categories are introduced. The comparison models include VGG16, AlexNet, ResNet50, DenseNet, EfficientNetV2, and Inception_V3, as well as Swin Transformer and Vision Transformer, both of which are based on the self-attention mechanism. These selections ensure that the comparison experiments have adequate type coverage and technology representation.

Table 5.

The evaluation metrics obtained by using the classical network model for the three- and four-classification tasks.

Model | Three-classification (Acc / Pre / Re / F1, %; Param, M) | Four-classification (Acc / Pre / Re / F1, %; Param, M)
VGG16 | 93.80 / 94.05 / 93.79 / 93.79; 134.27 | 93.15 / 92.97 / 93.21 / 93.08; 134.28
AlexNet | 95.59 / 95.02 / 95.61 / 95.18; 57.02 | 93.64 / 93.34 / 93.73 / 93.50; 57.02
ResNet50 | 97.10 / 96.68 / 96.92 / 96.79; 23.51 | 95.23 / 95.15 / 95.37 / 95.22; 23.52
DenseNet | 97.87 / 97.54 / 97.93 / 97.73; 6.96 | 95.63 / 95.50 / 95.87 / 95.67; 6.96
EfficientNetV2 | 96.90 / 96.23 / 97.23 / 96.68; 20.18 | 96.47 / 96.34 / 96.66 / 96.49; 20.18
Inception_V3 | 92.32 / 91.86 / 91.90 / 91.80; 21.79 | 90.64 / 90.29 / 91.45 / 90.77; 21.79
Vision Transformer | 72.20 / 71.15 / 67.36 / 68.20; 87.46 | 49.79 / 51.87 / 48.17 / 49.12; 87.46
Swin Transformer | 88.40 / 89.22 / 85.41 / 86.88; 48.84 | 78.07 / 77.78 / 78.21 / 77.97; 48.84
Proposed | 99.51 / 99.47 / 99.53 / 99.50; 16.41 | 98.01 / 98.13 / 98.02 / 98.07; 16.41

The experimental results show that the proposed model achieves the desired performance in both scenarios, attaining the highest accuracy, precision, recall, and F1-score with the second-lowest number of parameters. In the three-classification task, the proposed method achieves 99.51% accuracy, 99.47% precision, 99.53% recall, and a 99.50% F1-score with 16.41 M parameters. Although its parameter count is higher than that of the lightweight DenseNet, the method outperforms the other comparison models on all evaluation metrics. All models show some performance degradation in the more challenging four-classification task, but the accuracy of the proposed method degrades less (1.5%), illustrating its stronger generalization for complex diagnostic tasks. Its accuracy of 98.01%, precision of 98.13%, recall of 98.02%, and F1-score of 98.07% in the four-classification task outperform the other compared methods while maintaining a small parameter size (16.41 M). The proposed method thus delivers better classification performance than existing mainstream models at a lower parameter count, demonstrating its effectiveness and practicality in brain tumor image recognition.

Table 6 compares the comprehensive performance on the three- and four-classification tasks, benchmarking our results against state-of-the-art models proposed in recent years. The comparison models were reproduced and validated on the basis of the methods described in the corresponding papers. The table shows that the proposed method improves on recent network models and is robust in both tasks, indicating strong generalization ability. In the three-classification task, the proposed method is optimal in accuracy, precision, recall, and F1-score; in the four-classification task, all metrics are likewise superior to the other models. The results show that the proposed model maintains high evaluation metrics under different classification difficulties, especially in positive-case recognition and overall performance on medical images.

Table 6.

Comparison of three- and four-classification tasks with some of the state-of-the-art models in each evaluation metric.

Number of classes | Model | Acc (%) | Pre (%) | Re (%) | F1 (%)
Three | GhostNet37 (2020) | 97.88 | 97.49 | 97.81 | 97.65
Three | Kakarla et al.45 (2021) | 97.42 | 97.41 | 97.42 | -
Three | Nawaz et al.46 (2021) | 98.80 | 97.40 | 96.90 | -
Three | Shaik et al.28 (2022) | 96.51 | 96.14 | 95.99 | 96.03
Three | Hossein et al.47 (2023) | 99.02 | 98.79 | 98.70 | 98.74
Three | Cinar et al.48 (2023) | 96.67 | 96.97 | 96.97 | 96.66
Three | Zulfiqar et al.49 (2023) | 98.86 | 98.65 | 98.77 | 98.71
Three | Nassar et al.50 (2024) | 99.31 | 99.34 | 98.30 | 98.90
Three | Wang et al.8 (2024) | 98.86 | 98.87 | 98.46 | -
Three | Li et al.51 (2025) | 98.04 | 97.80 | 98.20 | 98.00
Three | Preetha et al.52 (2025) | 98.72 | 98.70 | 98.68 | 98.69
Three | Proposed | 99.51 | 99.47 | 99.53 | 99.50
Four | GhostNet37 (2020) | 96.63 | 96.38 | 96.68 | 96.52
Four | Yunenda et al.53 (2022) | 93.00 | 92.75 | 92.75 | 92.75
Four | Saeedi et al.23 (2023) | 93.45 | 94.75 | 95.75 | 95.00
Four | Preetha et al.52 (2025) | 97.80 | 97.79 | 97.80 | 97.79
Four | Proposed | 98.01 | 98.13 | 98.02 | 98.07

Table 7 compares the Floating Point Operations (FLOPs), average inference time, and memory usage of various computer vision models. The proposed model demonstrates better efficiency, with 2.10 G FLOPs, the shortest average inference time, and the lowest memory usage. Among the evaluated architectures, all of its indicators are optimal except FLOPs, which is higher than that of AlexNet, indicating a good balance of computational cost, inference speed, and memory usage.

Table 7.

Comparative analysis of FLOPs, average inference time, and memory usage between diverse vision models and the proposed method.

Model | FLOPs (G) | Average inference time (ms/image) | Memory usage (MB)
VGG16 | 15.47 | 3.46 | 570.57
AlexNet | 1.50 | 1.22 | 270.26
ResNet50 | 4.41 | 3.10 | 258.17
DenseNet | 2.91 | 8.12 | 197.11
EfficientNetV2 | 2.91 | 7.49 | 243.86
Inception_V3 | 4.12 | 5.24 | 255.64
Vision Transformer | 16.87 | 4.62 | 367.39
Swin Transformer | 15.47 | 13.83 | 389.85
ConvNeXt | 4.47 | 3.06 | 152.78
Proposed | 2.10 | 1.14 | 98.46

The proposed model achieves excellent classification performance on both the three- and four-classification tasks and surpasses existing mainstream approaches across several evaluation metrics. However, there is still room to optimize its parameter count. Specifically, our model contains 16.41 million parameters, far fewer than VGG16's 134.27 million and Inception_V3's 21.79 million, but still more than DenseNet's 6.96 million. Such a size may be problematic in resource-constrained scenarios, such as embedded devices or mobile terminals, so model compression and acceleration remain of practical significance. The results in Table 5 show that DenseNet, despite having the fewest parameters, achieves strong performance on several evaluation metrics. This suggests that small model complexity can coexist with good performance, so there is still room to optimize the balance between performance and lightweight design in the proposed method. Future work can focus on combining lightweight design with efficient modeling to further promote the deployment of intelligent brain tumor diagnosis systems in real-world environments.

Conclusion

This study proposes Res-MAPNet, a novel deep learning framework that addresses the limited feature-extraction capacity of traditional convolutional neural networks and the constrained generalization ability of attention mechanisms in MRI-based brain tumor classification. By integrating CLIA and PSAConv into the ConvNeXt baseline, these problems are effectively addressed. The CLIA module enhances attention along the spatial, channel, and positional dimensions through a multi-dimensional attention synergy mechanism, enabling the network to dynamically prioritize lesion regions. The PSAConv module replaces traditional convolution operations with asymmetric padding and grouped operations, expanding the receptive field while preserving the spatial characteristics of feature distributions. Extensive experiments on two publicly available datasets demonstrate the superiority of Res-MAPNet. For the three-classification task, the model achieves 99.51% accuracy and 100% recall for pituitary tumors; for the more challenging four-classification task, accuracy is maintained at 98.01%. The ablation study validates the incremental contributions of CLIA and PSAConv, whose combination improves accuracy by 4.41% (three-classification) and 4.45% (four-classification) over the baseline. The model also generalizes well: moving from the three-classification to the four-classification task degrades accuracy by just 1.5%. Comparative experiments verify that the proposed network attains optimal evaluation metrics with a small number of parameters against both classical and state-of-the-art network models. However, although its parameter count is significantly lower than that of models such as EfficientNetV2 and Inception_V3, it is still large compared with some lightweight models. For deployment on resource-constrained devices, future research could explore model compression techniques such as knowledge distillation and channel pruning to reduce computational overhead while maintaining diagnostic accuracy.

Acknowledgements

We would like to thank Figshare and Kaggle for providing publicly available datasets.

Author contributions

Jincan Zhang: Conceptualization, Methodology, Software, Writing – original draft; Rongfu Lv: Conceptualization, Methodology, Software, Writing – original draft, Writing – review & editing; Wenna Chen: Methodology, Writing – original draft, Validation, Visualization; Ganqin Du: Supervision, Writing – review & editing, Validation, Investigation; Qizhi Fu: Visualization, Investigation, Supervision, Validation; Hongwei Jiang: Visualization, Investigation, Validation.

Funding

This research is funded by the Henan Province Young Backbone Teachers Training Program (No. 2023GGJS045), the Major Science and Technology Projects of Henan Province (No. 221100210500), the Foundation of He’nan Educational Committee (No. 24A320004), the Medical and Health Research Project in Luoyang (No. 2001027 A), and the Construction Project of Improving Medical Service Capacity of Provincial Medical Institutions in Henan Province (No. 2017-51).

Data availability

This study exclusively utilized two publicly available datasets: the Figshare and the Kaggle brain tumor dataset. The Figshare brain tumor dataset is accessible at https://doi.org/10.6084/m9.figshare.1512427.v8, and the Kaggle brain tumor dataset is available at https://www.kaggle.com/dsv/1183165.

Declarations

Competing interests

The authors declare no competing interests.

Ethical approval

As this study is retrospective and exclusively based on publicly available datasets, no direct involvement of human participants or additional data collection was conducted. The research strictly adhered to ethical standards outlined by the institution and the National Research Council, ensuring compliance with all relevant ethical guidelines for studies involving anonymized human data.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Yadav, A. C., Kolekar, M. H. & Zope, M. K. Modified recurrent residual attention U-Net model for MRI-based brain tumor segmentation. Biomed. Signal Process. Control. 102, 107220 (2025). [Google Scholar]
  • 2.Soomro, T. A. et al. Image segmentation for MR brain tumor detection using machine learning: A review. IEEE Rev. Biomed. Eng.16, 70–90 (2023). [DOI] [PubMed] [Google Scholar]
  • 3.Rehman, A. et al. Microscopic brain tumor detection and classification using 3D CNN and feature selection architecture. Microsc Res. Tech.84, 133–149 (2021). [DOI] [PubMed] [Google Scholar]
  • 4.Bent, M. J. et al. Primary brain tumours in adults. Lancet402, 1564–1579 (2023). [DOI] [PubMed] [Google Scholar]
  • 5.Yadav, A. C. et al. EffUNet++: A novel architecture for brain tumor segmentation using FLAIR MRI images. IEEE Access.12, 152430–152443 (2024). [Google Scholar]
  • 6.Rafi, T. H., Shubair, R. M., Farhan, F., Hoque, Md, Z. & Quayyum, F. M. Recent advances in Computer-Aided medical diagnosis using machine learning algorithms with optimization techniques. IEEE Access.9, 137847–137868 (2021). [Google Scholar]
  • 7.Zhu, G., Jiang, S., Guo, X., Yuan, C. & Huang, Y. Evolutionary automated feature engineering. In PRICAI 2022: Trends in Artificial Intelligence (eds Khanna, S. et al.) 574–586 (Springer Nature Switzerland, 2022). [Google Scholar]
  • 8.Wang, J., Lu, S. Y., Wang, S. H. & Zhang, Y. D. RanMerFormer: randomized vision transformer with token merging for brain tumor classification. Neurocomputing573, 127216 (2024). [Google Scholar]
  • 9.Xu, J. et al. RegNet: Self-Regulated network for image classification. IEEE Trans. Neural Networks Learn. Syst.34, 9562–9567 (2023). [DOI] [PubMed] [Google Scholar]
  • 10.He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image Recognition. in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (2016). 10.1109/CVPR.2016.90
  • 11.Raza, A. et al. A hybrid deep Learning-Based approach for brain tumor classification. Electronics11, 1146 (2022). [Google Scholar]
  • 12.Nayak, D. R., Padhy, N., Mallick, P. K., Zymbler, M. & Kumar, S. Brain tumor classification using dense Efficient-Net. Axioms11, 34 (2022). [Google Scholar]
  • 13.Zhai, J., Zhang, Z., Ye, F., Wang, Z. & Guo, D. DQKNet: deep quasiconformal kernel network learning for image classification. Electronics13, 4168 (2024). [Google Scholar]
  • 14.Dosovitskiy, A. et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. in International Conference on Learning Representations (2021).
  • 15.Liu, Z. et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. in 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 9992–10002 (2021). 10.1109/ICCV48922.2021.00986
  • 16.Jun, W. & Liyuan, Z. Brain tumor classification based on attention guided deep learning model. Int. J. Comput. Intell. Syst.15, 35 (2022). [Google Scholar]
  • 17.Liu, Z. et al. A ConvNet for the 2020s. in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 11966–11976 (2022). 10.1109/CVPR52688.2022.01167
  • 18.Jena, B., Nayak, G. K. & Saxena, S. An empirical study of different machine learning techniques for brain tumor classification and subsequent segmentation using hybrid texture feature. Mach. Vis. Appl.33, 6 (2021). [Google Scholar]
  • 19.Alzubaidi, L. et al. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J. Big Data. 8, 53 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Younis, A., Qiang, L., Nyatega, C. O., Adamu, M. J. & Kawuwa, H. B. Brain tumor analysis using deep learning and VGG-16 ensembling learning approaches. Appl. Sci.12, 7282 (2022). [Google Scholar]
  • 21.Xu, W., Fu, Y. L. & Zhu, D. ResNet and its application to medical image processing: research progress and challenges. Comput. Methods Programs Biomed.240, 107660 (2023). [DOI] [PubMed] [Google Scholar]
  • 22.Sadad, T. et al. Brain tumor detection and multi-classification using advanced deep learning techniques. Microsc. Res. Tech.84, 1296–1308 (2021). [DOI] [PubMed] [Google Scholar]
  • 23.Saeedi, S., Rezayi, S., Keshavarz, H. & Niakan Kalhori, R. MRI-based brain tumor detection using convolutional deep learning methods and chosen machine learning techniques. BMC Med. Inf. Decis. Mak.23, 16 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Hussain, T. et al. EFFResNet-ViT: A Fusion-Based convolutional and vision transformer model for explainable medical image classification. IEEE Access.13, 54040–54068 (2025). [Google Scholar]
  • 25.Brauwers, G. & Frasincar, F. A. General survey on attention mechanisms in deep learning. IEEE Trans. Knowl. Data Eng.35, 3279–3298 (2023). [Google Scholar]
  • 26. Wang, Q. et al. ECA-Net: Efficient channel attention for deep convolutional neural networks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 11531–11539 (2020). 10.1109/CVPR42600.2020.01155
  • 27. Ouyang, D. et al. Efficient multi-scale attention module with cross-spatial learning. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 1–5 (2023). 10.1109/ICASSP49357.2023.10096516
  • 28. Shaik, N. S. & Cherukuri, T. K. Multi-level attention network: Application to brain tumor classification. Signal Image Video Process. 16, 817–824 (2022).
  • 29. Hussain, T., Shouno, H., Mohammed, M. A., Marhoon, H. A. & Alam, T. DCSSGA-UNet: Biomedical image segmentation with DenseNet channel spatial and semantic guidance attention. Knowl. Based Syst. 314, 113233 (2025).
  • 30. Hussain, T. & Shouno, H. MAGRes-UNet: Improved medical image segmentation through a deep learning paradigm of multi-attention gated residual U-Net. IEEE Access 12, 40290–40310 (2024).
  • 31. Tan, M. & Le, Q. V. EfficientNetV2: Smaller models and faster training. In International Conference on Machine Learning (ICML) 10096–10106 (2021).
  • 32. Li, J., Wen, Y. & He, L. SCConv: Spatial and channel reconstruction convolution for feature redundancy. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 6153–6162 (2023). 10.1109/CVPR52729.2023.00596
  • 33. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L. C. MobileNetV2: Inverted residuals and linear bottlenecks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018).
  • 34. Zhang, X., Zhou, X., Lin, M. & Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 6848–6856 (2018). 10.1109/CVPR.2018.00716
  • 35. Wang, X. & Yu, S. X. Tied block convolution: Leaner and better CNNs with shared thinner filters. Proc. AAAI Conf. Artif. Intell. 35, 10227–10235 (2021).
  • 36. Zhang, X. et al. LDConv: Linear deformable convolution for improving convolutional neural networks. Image Vis. Comput. 149, 105190 (2024).
  • 37. Han, K. et al. GhostNet: More features from cheap operations. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 1577–1586 (2020). 10.1109/CVPR42600.2020.00165
  • 38. Qiu, J., Chen, C., Liu, S. & Zeng, B. SlimConv: Reducing channel redundancy in convolutional neural networks by weights flipping. IEEE Trans. Image Process. 30, 6434–6445 (2021).
  • 39. Alam, T. et al. An integrated approach using YOLOv8 and ResNet, SeResNet & Vision Transformer (ViT) algorithms based on ROI fracture prediction in X-ray images of the elbow. CMIR 20, e15734056309890 (2024).
  • 40. Wang, Y., Li, Y., Wang, G. & Liu, X. PlainUSR: Chasing faster ConvNet for efficient super-resolution. In Computer Vision – ACCV 2024 Vol. 15475 (eds Cho, M. et al.) 246–264 (Springer Nature Singapore, 2025).
  • 41. Yang, J. et al. Pinwheel-shaped convolution and scale-based dynamic loss for infrared small target detection. Proc. AAAI Conf. Artif. Intell. 39, 9202–9210 (2025).
  • 42. Sonawane, Y. et al. DCRUNet++: A depthwise convolutional residual UNet++ model for brain tumor segmentation. In Pattern Recognition Vol. 15327 (eds Antonacopoulos, A. et al.) 266–280 (Springer Nature Switzerland, 2025).
  • 43. Hou, Q., Zhou, D. & Feng, J. Coordinate attention for efficient mobile network design. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 13708–13717 (2021). 10.1109/CVPR46437.2021.01350
  • 44. Stergiou, A., Poppe, R. & Kalliatakis, G. Refining activation downsampling with SoftPool. In IEEE/CVF International Conference on Computer Vision (ICCV) 10337–10346 (2021). 10.1109/ICCV48922.2021.01019
  • 45. Kakarla, J., Isunuri, B. V., Doppalapudi, K. S. & Bylapudi, K. S. R. Three-class classification of brain magnetic resonance images using average-pooling convolutional neural network. Int. J. Imaging Syst. Technol. 31, 1731–1740 (2021).
  • 46. Nawaz, M. et al. Analysis of brain MRI images using improved CornerNet approach. Diagnostics 11, 1856 (2021).
  • 47. Mehnatkesh, H., Jalali, S. M. J., Khosravi, A. & Nahavandi, S. An intelligent driven deep residual learning framework for brain tumor classification using MRI images. Expert Syst. Appl. 213, 119087 (2023).
  • 48. Cinar, N., Kaya, M. & Kaya, B. A novel convolutional neural network-based approach for brain tumor classification using magnetic resonance images. Int. J. Imaging Syst. Technol. 33, 895–908 (2023).
  • 49. Zulfiqar, F., Bajwa, U. I. & Mehmood, Y. Multi-class classification of brain tumor types from MR images using EfficientNets. Biomed. Signal Process. Control 84, 104777 (2023).
  • 50. Nassar, S. E., Yasser, I., Amer, H. M. & Mohamed, M. A. A robust MRI-based brain tumor classification via a hybrid deep learning technique. J. Supercomput. 80, 2403–2427 (2024).
  • 51. Li, Z. & Zhou, X. A global-local parallel dual-branch deep learning model with attention-enhanced feature fusion for brain tumor MRI classification. CMC 83, 739–760 (2025).
  • 52. Preetha, R., Priyadarsini, M. J. P. & Nisha, J. S. Hybrid 3B net and EfficientNetB2 model for multi-class brain tumor classification. IEEE Access 13, 63465–63485 (2025).
  • 53. Nidaan Khofiya, S., Fu’adah, N., Caecar Pratiwi, Y. K., Naufal, N. I. R. & Deta Pratama, A. Brain tumor classification based on MRI image processing with AlexNet architecture. In IEEE Asia Pacific Conference on Wireless and Mobile (APWiMob) 1–6 (2022). 10.1109/APWiMob56856.2022.10014115

Data Availability Statement

This study exclusively utilized two publicly available datasets: the Figshare brain tumor dataset and the Kaggle brain tumor dataset. The Figshare dataset is accessible at https://doi.org/10.6084/m9.figshare.1512427.v8, and the Kaggle dataset is available at https://www.kaggle.com/dsv/1183165.
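
The sketch below shows one possible way to retrieve the Figshare dataset programmatically, assuming the public figshare v2 REST API; the article ID 1512427 is inferred from the DOI above, and the Kaggle dataset can be fetched analogously with Kaggle's own download tooling.

    # Minimal sketch: download all files of the Figshare brain tumor dataset.
    # Assumes the public figshare v2 API; the article ID is inferred from the
    # DOI 10.6084/m9.figshare.1512427.v8 cited above.
    import pathlib
    import requests

    ARTICLE_ID = 1512427
    out_dir = pathlib.Path("figshare_brain_tumor")
    out_dir.mkdir(exist_ok=True)

    # The article endpoint returns metadata, including a "files" list with a
    # direct download URL for each archive in the dataset.
    meta = requests.get(f"https://api.figshare.com/v2/articles/{ARTICLE_ID}", timeout=30)
    meta.raise_for_status()

    for f in meta.json()["files"]:
        target = out_dir / f["name"]
        if target.exists():
            continue  # skip files that were already downloaded
        with requests.get(f["download_url"], stream=True, timeout=60) as r:
            r.raise_for_status()
            with open(target, "wb") as fh:
                for chunk in r.iter_content(chunk_size=1 << 20):
                    fh.write(chunk)
        print(f"downloaded {target}")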

