Scientific Reports. 2025 Oct 9;15:35345. doi: 10.1038/s41598-025-19352-5

Advanced transformer with attention-based neural network framework for precise renal cell carcinoma detection using histological kidney images

M Eliazer 1, Guntupalli Manoj Kumar 1, Sibi Amaran 1, Y Shasikala 2, Monalisa Sahu 3, Bibhuti Bhusan Dash 4, Kanchan Bala 5
PMCID: PMC12511621  PMID: 41068281

Abstract

Renal cell carcinoma (RCC) is the most common category of kidney cancer and comprises a varied group of malignancies arising from the epithelial cells of the kidney parenchyma. RCC has more than ten subtypes, and their classification rests mainly on morphologic features seen on histopathological hematoxylin and eosin (H&E)-stained slides. The histologic classification of RCCs is of great significance, given the important therapeutic and prognostic implications of its subtypes. Imaging models play a prominent role in the diagnosis, follow-up, and staging of RCC, and histopathological images contain morphological markers of disease development that carry both predictive and diagnostic value. Recently, deep learning (DL) has achieved advanced performance in various computer vision tasks, including segmentation, image classification, and object detection. Given sufficient data, the precision of a DL-enabled diagnosis model frequently matches or even exceeds that of qualified doctors. This paper presents an Advanced Transformer and Attention-Based Neural Network Framework for the Intelligent Detection of Renal Cell Carcinoma (ATANNF-IDRCC) model. The aim is to develop an accurate and automated model for detecting and grading RCC using kidney histopathology images. Initially, the image pre-processing stage utilizes a contrast enhancement method to improve image quality. Furthermore, the ATANNF-IDRCC model utilizes the Twins Spatially Separable Vision Transformer (Twins-SVT) method for feature extraction. For the RCC classification process, a hybrid model of a bidirectional temporal convolutional network and bidirectional long short-term memory with an attention mechanism (BiTCN-BiLSTM-AM) is employed. The performance of the ATANNF-IDRCC technique is examined on the RCCGNet dataset, where a comparison study demonstrated a superior accuracy of 98.26% over existing models.

Keywords: Renal cell carcinoma, Histopathology images, Computer vision, Hybrid deep learning, Attention-based neural network, Biomedical image analysis

Subject terms: Computer science, Information technology

Introduction

RCC is the most common kind of kidney cancer, making up nearly 85% of cases, and kidney cancer itself is among the most frequent human cancers, accounting for around 2.4% of all cancers. RCC consists of a diverse group of cancers with distinct molecular characteristics, tissue structures, treatment responses, and clinical outcomes1. The common RCC subtypes are clear cell, chromophobe, papillary, and clear cell papillary RCC. Subtype classification of RCC mainly depends on the structural characteristics seen in tissue samples stained with H&E2. Among these four major subtypes, clear cell and clear cell papillary RCCs exhibit significant structural similarities, particularly the common occurrence of clear cells. Distinguishing clear cell from clear cell papillary RCC is vital for deciding suitable patient treatment: clear cell RCC usually carries a worse outcome with a greater chance of spreading3, whereas clear cell papillary RCC is a slower-developing cancer with a low risk of recurrence or spread. The accuracy of conventional clinical methods remains limited for individual cases, particularly in the early stages, and depends on the expertise of pathologists; hence, better indicators are required to assess RCC outcomes4. RCC subtypes are identified through imaging, relying on the extent of cancer growth on multi-detector computed tomography. Additionally, microscopic inspection of H&E-stained biopsy slides remains a valuable tool for doctors and pathologists5.

Histological images contain signs of disease development and phenotypic detail that carry predictive and diagnostic value. Most pathologists follow simple techniques for cancer grading, and key limitations of H&E image analysis by diagnosticians include inter-observer variation and the time required for identification6. Computer-aided methods help overcome these restrictions and can classify subtle morphological variations among clinical groups. Histopathological images provide significant data for cancer identification, staging, and prognosis, and are used broadly by pathologists in medical practice7. With the current accessibility of digital whole-slide imaging, automatic computational histopathology image analysis systems have displayed a remarkable ability to diagnose cancers and detect novel biomarkers. In contrast to human examination, computerized image analysis has a higher capability to enhance consistency, efficacy, and precision8. Moreover, histopathologic images and molecular features, including gene expression signatures and genetic alterations, are widely used to predict medical outcomes for cancers. Currently, DL models have achieved outstanding results in numerous histopathology tasks, including cancer identification, grading, and classification9, and DL has been applied to examine RCC histopathologic images across several such tasks. Several recent studies have utilized convolutional neural networks (CNNs) for kidney cancer recognition and classification10.

This paper presents an Advanced Transformer and Attention-Based Neural Network Framework for the Intelligent Detection of Renal Cell Carcinoma (ATANNF-IDRCC) model. The aim is to develop an accurate and automated model for detecting and grading RCC using kidney histopathology images. Initially, the image pre-processing stage utilizes a contrast enhancement method to improve image quality. Furthermore, the ATANNF-IDRCC model utilizes the Twins Spatially Separable Vision Transformer (Twins-SVT) method for feature extraction. For the RCC classification process, a hybrid model of a bidirectional temporal convolutional network and bidirectional long short-term memory with an attention mechanism (BiTCN-BiLSTM-AM) is employed. The performance analysis of the ATANNF-IDRCC technique is examined under the RCCGNet dataset. The key contributions of the ATANNF-IDRCC technique are listed below.

  • The ATANNF-IDRCC model employs contrast enhancement using CLAHE during image pre-processing, which effectually enhances the visibility of subtle histological features in RCC images, thereby improving the quality of input data. This improvement facilitates more accurate and reliable feature extraction, ultimately contributing to better classification performance.

  • The ATANNF-IDRCC methodology utilizes the Twins-SVT model for feature extraction, which efficiently captures long-range spatial dependencies within RCC images while maintaining computational efficiency. This approach enhances the model’s capability to represent intrinsic image patterns, leading to improved detection accuracy and robustness.

  • The ATANNF-IDRCC approach implements a hybrid classification model that incorporates BiTCN, BiLSTM, and AM to capture temporal and sequential patterns in RCC data effectively. This integration enhances classification accuracy and robustness by concentrating on the most relevant features for diagnosis.

  • The novelty of the ATANNF-IDRCC technique is in incorporating the Twins-SVT with the BiTCN-BiLSTM-AM hybrid model into a single unified framework for RCC detection. This approach uniquely integrates advanced spatial feature extraction through Twins-SVT with temporal sequence modelling and AMs. It also effectually captures both spatial and temporal patterns, resulting in superior classification performance. This integration distinguishes the model from existing methods and improves its diagnostic accuracy.

Existing studies on the RCC detection process

In11, a new LSTM + CNN-enabled method is designed for kidney cancer detection, combining the sequential learning ability of LSTM networks with the strong feature extraction capabilities of CNNs. This method aims to enhance precision and efficacy in kidney cancer analysis, leveraging medical image data to exploit spatial and temporal features; the designed method demonstrates superior accuracy, faster processing time, and improved overall classification efficiency compared to sophisticated methods. Mehta and Bhalla12 introduced a hybrid architecture that merges support vector machines (SVMs) and CNNs for kidney cancer classification: the CNN automatically performed feature extraction, whereas the strength of the SVM lay in classification, utilizing its notable decision-making capabilities. Hossain et al.13 presented the design and validation of an application that relies on DL approaches. This application comprises synthetic image generation and an image pre-processing workflow, which is essential because of the scarcity of training data in pathology scenes; the pipeline then segments the nuclei region and splits overlapping nuclei. The enhanced methodology utilizes a cycle-consistent GAN to generate artificial images, which are then fed into an adapted U-Net network. Akram et al.14 proposed a model that employs a CNN for spatial feature extraction from contrast-enhanced images, integrated with an ensemble of machine learning (ML) classifiers comprising logistic regression (LR), random forest (RF), and Gaussian naïve Bayes (GNB); the hybrid model thus utilizes DL for feature representation and probabilistic models for robust classification. Liu et al.15 presented MSMTSeg, a generative self-supervised meta-learning model for multi-stained multi-tissue segmentation in kidney biopsy whole-slide images (WSIs). MSMTSeg integrates multiple stain-transformer techniques for inter-stain domain style translation, a self-supervision model to obtain domain-specific feature representations, and a meta-learning approach that utilizes the pre-trained models and generated virtual data to learn domain-invariant feature representations across numerous stains, thereby enhancing segmentation effectiveness. Sahu et al.16 proposed DL methods such as a CNN for recognizing kidney images on CT scans; this research utilizes a CNN with extra convolutional layers to distinguish between normal and malignant kidneys from images, where CT scans produce cross-sectional X-ray images that provide detailed information about internal organs and structures. Badawy et al.17 presented an AI-driven transfer learning (TL) method for detecting renal disease (RD) at an initial stage. This method, based on CT images from microscopic histopathologic tests, helps precisely and automatically detect individuals with RD through a CNN, pre-trained techniques, and an optimizer applied to images; furthermore, the sparrow search algorithm (SpaSA) is employed to enhance the performance of the pre-trained approach by finding the optimal configuration. Rajkumar et al.18 suggested kidney image recognition using DL paradigms such as CNNs, alongside blood sample database values modelled with artificial neural networks (ANNs). That study primarily leveraged a CNN with additional convolutional layers to classify normal and cancerous kidneys from images, and an ANN is deployed to predict the presence of kidney cancer.

Moldovanu et al.19 evaluated the performance of well-known pre-trained CNNs, including Visual Geometry Group 16 (VGG16), Dense Convolutional Network 169 (DenseNet169), and EfficientNet Version 2B3 (EfficientNetV2B3), against two newly proposed custom-built CNN models with four and five layers, for classifying invasive ductal carcinoma (IDC) histopathological images. Feng et al.20 developed an artificial intelligence-based pediatric kidney diagnosis (APKD) approach that utilizes AI-driven segmentation and classification techniques to assist nephropathologists in detecting critical kidney structures, such as glomeruli. Rehman, Mahmood, and Saba21 proposed a model utilizing the Swin-ViT model along with modified DL-based pre-trained models, including ConvNext, Swin-Unet, MedT, modified ResNet152V2 (mResNet152V2), EfficientNet-B7, and modified VGG19 (mVGG19); a TL strategy is employed to enhance performance and reduce reliance on large-scale datasets. Ali et al.22 presented a model using a novel two-stage super-resolution architecture. This approach utilizes the Vision Transformer (ViT) to capture global and contextual details, followed by a diffusion model (DM) that iteratively improves image quality and generates fine details, both trained with the Charbonnier loss function for optimal performance. Pimpalkar et al.23 presented a technique that employs fine-tuned TL models, such as VGG16, ResNet50, AlexNet, and InceptionV3, integrated with advanced image processing techniques that include data augmentation, watershed segmentation, Otsu's thresholding, and wide neural networks (WNN). He et al.24 proposed a model using a graph neural network (GNN) integrated with global attention pooling (GAP) and Bayesian collaborative learning (BCL); moreover, a soft classification head is incorporated to reduce semantic ambiguity. Maqsood and Khan25 proposed a DL model comprising clustering-constrained attention multiple-instance learning (CLAM), a pre-trained 3D-ResNet18 (MeD-3D), and a multi-layer perceptron (MLP) to improve clear cell renal cell carcinoma (ccRCC) recurrence prediction; the model also incorporates early and late fusion strategies to capture complementary information across modalities, thereby enhancing predictive accuracy. Ye et al.26 developed an explainable DL method by incorporating a CNN-based DL model with the gradient-weighted class activation mapping (Grad-CAM) technique. Sharon and Anbarasi27 developed a dilated bottleneck attention-based renal network (DBAR-Net) methodology utilizing dilated convolution, dual bottleneck attention modules, and layer normalization, and incorporating multi-feature fusion and adaptive contextual feature extraction. Uhm et al.28 proposed the Lesion-Aware Cross-Phase Attention Network (LACPANet) technique, incorporating 3D inter-phase lesion-aware attention and a multi-scale AM to capture temporal dependencies and spatial lesion features more effectively. A comparative analysis of existing renal cell carcinoma diagnosis methods is presented in Table 1.

Table 1.

A comparative analysis of several methods of RCC diagnosis.

Author Year Objective Method Dataset Performance analysis
Bino et al.11 2025 To precisely detect renal cancer LSTM + CNN
Mehta and Bhalla12 2025 To provide clinical decision support tools CNN and SVM Average accuracy of 93.75% and 91%
Hossain et al.13 2024 To recommend an innovative DL-based computerized application GAN and modified U-net H&E-stained HI dataset SSIM and PSNR values of 0.204 and 10.610
Akram et al.14 2024 To enhance the accuracy and reliability of renal carcinoma subtype classification CNN, probabilistic feature mapping, LR, RF, GNB Accuracy score, K-fold validation, contrast-enhanced images, diverse RCC subtypes The model achieved 99.72% accuracy, demonstrating strong performance
Liu et al.15 2024 To segment numerous tissues across different stains, a task that is labour-intensive and time-consuming when performed manually MSMTSeg mDSC of 0.836 and mIoU of 0.718
Sahu et al.16 2024 To accurately detect kidney cancer using CT images CNN
Badawy et al.17 2023 To propose an AI-based TL structure for identifying RD at an initial phase CNN, SpaSA CT kidney dataset and kidney cancer Accuracy of 99.98% and 100%
Rajkumar et al.18 2023 To detect and classify kidney cancer and to improve treatment outcomes CNN, ANN
Moldovanu et al.19 2025 To evaluate CNN models for IDC classification VGG16, DenseNet169, EfficientNetV2B3, Custom CNN1 and CNN2 (4 and 5 layers) Accuracy, AUC, precision, IDC histopathological images Pre-trained CNNs outperform custom models
Feng et al.20 2024 To develop an APKD model for automated segmentation and classification of kidney disease AI segmentation, structure classification, manual annotation Accuracy, spearman and intraclass correlation, pediatric kidney images APKD achieved 94% accuracy and a 5.5× speedup compared to manual work
Rehman, Mahmood, and Saba21 2025 To improve multi-label CKD image classification using advanced DL models Swin-ViT, ConvNext, Swin-Unet, MedT, mResNet152V2, EfficientNet-B7, mVGG19, TL Multi-label accuracy, precision, GradCAM interpretability Hybrid CNN-transformer models outperform singles
Ali et al.22 2023 To improve the resolution and quality of remote sensing images using a novel two-stage super-resolution architecture ViT, DM, Charbonnier loss Peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), UCMerced benchmark TESR significantly enhances image quality compared to existing methods
Pimpalkar et al.23 2025 To enhance kidney CT image classification accuracy using fine-tuned TL and advanced image processing TL, Watershed segmentation, Otsu’s thresholding, relief feature selection, WNN Accuracy, precision, recall, kidney CT images Achieved high classification accuracy (~ 99.96%) with advanced techniques
He et al.24 2024 To improve glomerular lesion recognition in pathology images GNN, GAP, BCL, Soft classification head F1 Score, Comparative accuracy, Four private WSIs and BRACS public datasets The model showed superior lesion recognition
Maqsood and Khan25 2025 To improve ccRCC recurrence prediction by integrating multimodal data using a DL method CLAM (ResNet50), 3D-ResNet18, MLP, early and late fusion Accuracy, AUC, precision-recall, TCGA, TCIA, CPTAC The model improves prediction accuracy and handles missing data
Ye et al.26 2024 To classify and localize lesions in ultrasound renal cancer images DL, Grad-CAM, VGG16, ResNet34 Area under curve (AUC), classification accuracy, ultrasound renal and hamartoma and RCC samples The model improves lesion localization and classification accuracy
Sharon and Anbarasi27 2025 To develop the DBAR-Net method for accurate classification of kidney diseases from CT images Dilated convolution, bottleneck attention, layer normalization, multi-feature fusion F1 score, classification accuracy, CT kidney dataset, four-class images The model achieved a high F1 score of 0.98 and 98.86% accuracy with low computational complexity
Uhm et al.28 2024 To improve renal lesion subtype classification using DL models LACPANet, cross-phase, 3D lesion-aware, and multi-scale attention Diagnostic accuracy, multi-phase CT scans LACPANet outperforms existing methods in diagnostic accuracy

Proposed methods

This paper presents the ATANNF-IDRCC model, which aims to develop an accurate and automated model for detecting and grading RCC using kidney histopathology images. To achieve this, the ATANNF-IDRCC model comprises three distinct stages: image pre-processing, feature extraction, and classification. Figure 1 depicts the overall workflow of the ATANNF-IDRCC technique.

Fig. 1. Complete workflow of the ATANNF-IDRCC technique.

Dataset description: RCCGNet

The performance of the ATANNF-IDRCC model is evaluated using the RCCGNet dataset2,29, which is designed explicitly for RCC grading. This dataset comprises 3000 histopathological images, evenly distributed across five grades (Grade 0 to Grade 4), with 500 images per class. Each grade represents a distinct level of nuclear abnormality and disease severity, ranging from normal tissue in Grade 0 to severe cellular irregularities in Grade 4. The images were collected and labelled using standard H&E staining techniques. The balanced distribution of samples across all grades is natural, not synthetically generated, ensuring the integrity of class representation. The RCCGNet dataset offers a robust and reliable benchmark for evaluating the effectiveness of feature extraction and classification methods in RCC detection and grading tasks. This realistic and well-curated dataset supports the development of models that can perform accurately across diverse patient groups and imaging conditions. Table 2 portrays the dataset description.

Table 2.

Details of the dataset.

Classes Class description No. of images
Grade 0 or normal “Cells are well arranged and are normal in number.” 500
Grade 1 “Nucleoli are not visible at 400× magnification. Nucleoli are seen as eosinophilic at 400× magnification but not very prominent at 100× magnification” 500
Grade 2 “Slightly irregular contour compared to normal nuclei” 500
Grade 3 “Clearly visible tumours were graded as grade-3” 500
Grade 4 “Rhabdoid or sarcomatoid differentiation” 500
Total number of images 3000
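For illustration, a grade-balanced image set of this shape could be consumed as follows. This is a minimal sketch assuming a hypothetical one-folder-per-grade directory layout, a 224 × 224 input size, and an 80:20 split; none of these are properties of the released RCCGNet repository itself.

```python
# Minimal loading sketch for a five-grade histopathology dataset.
# Assumptions (not part of RCCGNet itself): one sub-folder per grade
# (rccgnet/Grade0 ... rccgnet/Grade4), 224x224 inputs, an 80:20 split.
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

dataset = datasets.ImageFolder(root="rccgnet", transform=transform)

# 80:20 train/test split, mirroring the paper's first evaluation setting.
n_train = int(0.8 * len(dataset))
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])

train_loader = DataLoader(train_set, batch_size=5, shuffle=True)
test_loader = DataLoader(test_set, batch_size=5)
print(f"{len(train_set)} training / {len(test_set)} test images")
```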

Contrast enhancement stage

Initially, the image pre-processing step applies the CLAHE method to enhance the contrast of images30. This method is chosen for its efficiency in improving local contrast while preventing noise amplification, which is especially crucial for histological images of varying quality. Unlike global histogram equalization methods, this technique adapts to local regions, making it robust against staining inconsistencies and the variable image quality common in medical datasets. This localized improvement enhances the visibility of subtle tissue structures critical for accurate analysis. Furthermore, CLAHE is chosen over other contrast enhancement approaches for its capability to restrict contrast amplification. These benefits ensure more consistent and reliable pre-processing across diverse histological image conditions.

Using CLAHE, image pre-processing can enhance image quality and reduce noise in kidney cancer images, and the method is extensively applied to improve the contrast of histopathological images. Pixel-by-pixel grey-level measurements of the kidney cancer images are analyzed: the grey levels of the input and output images are treated as random variables, and their histograms are approximated over local regions of the original and processed kidney cancer images. To enhance image quality, the CLAHE model analyzes the pixel distribution to improve local contrast.

The probability of each grey level is computed from the pixel counts of the histopathology image, as represented in Eq. (1); the histogram is then remapped by transforming the input image to the output image through an increasing function. Let $x_k$ denote the $k$-th grey level, $N$ the total number of pixels in the histopathology image, $X$ the grey-level random variable, and $n_k$ the number of pixels with grey level $x_k$:

$$p_X(x_k) = \frac{n_k}{N}, \qquad k = 0, 1, \ldots, L-1 \tag{1}$$

Let $p_X(x)$ and $p_Y(y)$ represent the probability density functions (PDFs) of the input and output images, where $x$ is the input grey level and $y$ is the output grey level. The relationship between these PDFs, obtained by equating the areas under the original and processed histograms, is expressed in Eq. (2).

$$C_Y(y) = C_X(x), \qquad \text{i.e.,} \quad \int_0^{y} p_Y(w)\,dw = \int_0^{x} p_X(w)\,dw \tag{2}$$

where $C_X(x)$ and $C_Y(y)$ depict the cumulative distribution functions (CDFs) of the input and output images up to grey levels $x$ and $y$, respectively.

The grayscale transfer function $f$, which maps input grey levels to output grey levels, is approximated by scaling the cumulative integral of the input PDF as:

$$y = f(x) = y_{\max} \int_0^{x} p_X(w)\,dw \tag{3}$$

The grey level transfer function is then specialized to a target output distribution, as defined in Eqs. (4) and (5).

$$f(x) = \left(y_{\max} - y_{\min}\right) C_X(x) + y_{\min} \tag{4}$$
$$f(x) = y_{\min} - \frac{1}{\alpha}\,\ln\!\left(1 - C_X(x)\right) \tag{5}$$

where $y_{\min}$ and $y_{\max}$ are constants bounding the output grey levels, $\alpha$ is a distribution parameter, and $C_X(x)$ is the cumulative term defined above.

In Eq. (6), adaptive histogram equalization (AHE) improves the quality of kidney cancer images relative to global histogram equalization by using a grayscale re-mapping function expressed discretely as:

$$f(k) = \frac{L-1}{N}\sum_{i=0}^{k} n_i \tag{6}$$

where $n_i$ is the number of pixels in grey-level bin $i$, $N$ is the overall pixel count in the region of interest, and $i$ indexes the $L$ grey levels.

By analyzing the grey-level distribution over all pixels of the histopathology image, image quality is improved through adjustment of the transfer function, which enables better contrast and detail enhancement.
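The clipped, per-tile version of the re-mapping in Eq. (6) is exactly what off-the-shelf CLAHE implementations provide. A minimal sketch using OpenCV's createCLAHE is shown below; the clip limit, tile grid, and LAB-space handling are illustrative assumptions rather than the authors' reported settings.

```python
# CLAHE pre-processing sketch with OpenCV. The clip limit, tile grid,
# and LAB conversion are illustrative assumptions, not reported settings.
import cv2
import numpy as np

def enhance_contrast(path: str) -> np.ndarray:
    """Apply CLAHE to the lightness channel of an H&E-stained image."""
    bgr = cv2.imread(path)
    # Equalize only lightness so the H&E stain hues are preserved.
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    l_eq = clahe.apply(l)  # per-tile clipped re-mapping, cf. Eq. (6)
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)
```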

Model selection for feature extraction

Additionally, the ATANNF-IDRCC model utilizes the Twins-SVT method for feature extraction31. This method is chosen for its superior capability in capturing long-range dependencies and complex spatial relationships within RCC images, which conventional CNNs may struggle to represent effectively. This technique utilizes spatially separable AMs, enabling more comprehensive context modelling while maintaining computational efficiency, unlike conventional CNNs that depend on localized receptive fields. This model also effectually balances complexity and performance, presenting an improved feature representation crucial for accurate RCC classification. Furthermore, the model mitigates redundancy in computations compared to standard transformers, making it a practical and powerful alternative to simpler models without losing accuracy. Figure 2 describes the structure of the Twins‐SVT method.

Fig. 2. Structure of the Twins-SVT method.

Nowadays, transformer-based methods are widely applied to image classification, yet conventional encoder-decoder and plain structural Transformer designs do not perform well in dense prediction settings. The Twins-SVT Transformer, a hierarchical network architecture analogous to a CNN, is well suited to this task. Patch Partition splits the original image into non-overlapping patch blocks, utilizing a patch dimension of $4 \times 4$; the feature dimension after Patch Partition is therefore $\frac{H}{4} \times \frac{W}{4} \times 48$. The Patch Embedding function then maps this feature to a dimension $C$. Building on a thorough examination of existing global attention, this framework refines the attention tactic by combining local and global AMs, comparable to depth-wise separable convolution in CNNs, hence the name Spatially Separable Self-Attention (SSSA). Unlike depth-separable convolution, the SSSA proposed by Twins-SVT groups the spatial dimensions of the features, computes self-attention within every group, and then combines the grouped attention outcomes from a global view.

SSSA utilizes a locally-grouped and global attention alternation (LSA-GSA) mechanism, which significantly reduces computing cost: the complexity decreases from quadratic in the input size to linear. By summarizing each group's attention computation and using it as the key for computing the global self-attention, local attention is propagated to the global level. The particular calculation model of SSSA is provided by Eq. (7).

$$\begin{aligned} \hat{z}_{ij}^{l} &= \mathrm{LSA}\left(\mathrm{LayerNorm}\left(z_{ij}^{l-1}\right)\right) + z_{ij}^{l-1},\\ z_{ij}^{l} &= \mathrm{FFN}\left(\mathrm{LayerNorm}\left(\hat{z}_{ij}^{l}\right)\right) + \hat{z}_{ij}^{l},\\ \hat{z}^{l+1} &= \mathrm{GSA}\left(\mathrm{LayerNorm}\left(z^{l}\right)\right) + z^{l},\\ z^{l+1} &= \mathrm{FFN}\left(\mathrm{LayerNorm}\left(\hat{z}^{l+1}\right)\right) + \hat{z}^{l+1} \end{aligned} \tag{7}$$

Here, $z_{ij}^{l}$ characterizes the $(i,j)$-th sub-window of block $l$, and $z^{l}$ characterizes the output features of the FFN (Feed Forward Network) module of block $l$.
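To make the LSA-GSA alternation concrete, the following PyTorch sketch implements one locally-grouped attention step over non-overlapping windows followed by one globally sub-sampled attention step. The window size, reduction ratio, and reuse of nn.MultiheadAttention are illustrative assumptions, and the residual connections, layer normalization, and FFN of Eq. (7) are omitted for brevity.

```python
# Sketch of Twins-SVT's spatially separable self-attention (SSSA):
# locally-grouped attention (LSA) inside non-overlapping windows,
# followed by global sub-sampled attention (GSA). Window size and
# reduction ratio below are illustrative, not the paper's settings.
import torch
import torch.nn as nn

class LSA(nn.Module):
    """Multi-head self-attention restricted to w x w windows."""
    def __init__(self, dim: int, heads: int = 8, window: int = 7):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                        # x: (B, H, W, C)
        B, H, W, C = x.shape
        w = self.window
        # Partition the map into (H/w * W/w) windows of w*w tokens each.
        x = x.view(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(-1, w * w, C)              # (B * nWin, w*w, C)
        x, _ = self.attn(x, x, x)                # attention within windows
        x = x.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5)
        return x.reshape(B, H, W, C)

class GSA(nn.Module):
    """Global attention against a sub-sampled set of key/value tokens."""
    def __init__(self, dim: int, heads: int = 8, reduction: int = 7):
        super().__init__()
        self.sr = nn.Conv2d(dim, dim, kernel_size=reduction, stride=reduction)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                        # x: (B, H, W, C)
        B, H, W, C = x.shape
        q = x.reshape(B, H * W, C)               # every position queries
        kv = self.sr(x.permute(0, 3, 1, 2))      # sub-sample the feature map
        kv = kv.flatten(2).transpose(1, 2)       # (B, H*W/r^2, C)
        out, _ = self.attn(q, kv, kv)
        return out.view(B, H, W, C)

x = torch.randn(2, 28, 28, 64)                   # toy feature map
y = GSA(64)(LSA(64)(x))                          # one LSA-GSA alternation
print(y.shape)                                   # torch.Size([2, 28, 28, 64])
```

The sub-sampling in GSA is what moves the overall cost from quadratic toward linear in the number of tokens, since the full query set attends to a much smaller key/value set.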

Finally, the feature mapping dimension is (512, 7, 7). The output layer of the Twins-SVT Transformer applies global average pooling (GAP) to this feature mapping to obtain a feature vector of length 512, passes it through a layer normalization (LN), and a fully connected layer produces the final classification prediction vector.

For the Twins-SVT Transformer, the network structure is multi-stage, and each stage outputs a collection of feature mappings, so it transfers readily to nearly every computer vision task. A key property of CNNs is that the receptive field of nodes expands as the network deepens, and this property is likewise fulfilled in the Twins-SVT Transformer. To help the method obtain more beneficial information from the target input and increase model precision, this study focuses on incorporating feature-fusion information between the dissimilar stages of Twins-SVT.

The initial phase involves modifying the original Twins-SVT Transformer. The classification head and the final adaptive pooling layer of the original Twins-SVT Transformer, which was designed for image classification, are removed. The output feature of size 49 is then reshaped into a feature mapping $F$ of size $7 \times 7$, so the output is a feature mapping of size $(512, 7, 7)$. GAP is applied to the feature maps to obtain a feature vector $f_1$ after dimension reduction, $\tilde{f}_1$ is obtained after a batch normalization (BN) layer, and the classification prediction vector $y_1$ is obtained after the fully connected layer. This leading branch is primarily used to extract global features.

Considering that the feature mapping dimensions are only $7 \times 7$, some fine-grained features in the image might be lost. The output feature mapping of size $(512, 7, 7)$ is therefore passed through a ConvBlock comprising a 2D convolution, BN, SE attention, and the Mish activation function. The 2D convolution reduces the feature map size while expanding the number of feature map channels so that the channel counts stay consistent during joint training, which also facilitates joint Centre-loss training. The BN layer standardizes the feature mapping; SE attention boosts the model's attention to the main information; and Mish, a smoother activation function than ReLU, raises the non-linearity of the module, assists gradient-descent optimization, and improves the generalizability of the method. The role of the ConvBlock is thus to acquire fine-grained features and increase the channel count while preserving feature information. After the ConvBlock, an enlarged feature map is obtained; horizontal segmentation splits it into two equally sized feature maps, from which dual feature vectors $f_2$ and $f_3$ are attained by GAP. $\tilde{f}_2$ and $\tilde{f}_3$ are produced by a BN layer, and classification prediction vectors $y_2$ and $y_3$ are obtained through the FC layer, correspondingly. During the training stage, the model is optimized with cross-entropy, Triplet, and Centre losses: $f_1$, $f_2$, and $f_3$ are optimized by Centre loss and Triplet loss, while $y_1$, $y_2$, and $y_3$ are optimized by cross-entropy loss; the features after the BN layer are additionally refined by the Triplet loss. In the inference stage, the feature vectors after the BN layer are concatenated along the channel dimension to serve as the input image's feature representation, on which model inference and performance assessment are carried out.
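A minimal PyTorch sketch of such a ConvBlock follows. The (512, 7, 7) input matches the stage output quoted above, while the doubled channel count, the 1 × 1 kernel, and the SE reduction ratio of 16 are assumptions, since the text does not specify them.

```python
# Sketch of the ConvBlock described above: 2D convolution to widen the
# channels, batch normalization, SE attention, and Mish. The 512 -> 1024
# expansion, 1x1 kernel, and SE reduction ratio of 16 are assumptions.
import torch
import torch.nn as nn

class SEAttention(nn.Module):
    """Squeeze-and-excitation: re-weight channels by global context."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.Mish(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1) weights
        return x * w

class ConvBlock(nn.Module):
    def __init__(self, c_in: int = 512, c_out: int = 1024):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=1)  # widen channels
        self.bn = nn.BatchNorm2d(c_out)                    # standardize
        self.se = SEAttention(c_out)                       # channel attention
        self.act = nn.Mish()                               # smooth activation

    def forward(self, x):
        return self.act(self.se(self.bn(self.conv(x))))

feat = torch.randn(2, 512, 7, 7)        # Twins-SVT stage output
print(ConvBlock()(feat).shape)          # torch.Size([2, 1024, 7, 7])
```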

Hybrid classification model

For the RCC classification process, the hybrid BiTCN-BiLSTM-AM model is employed32. This model is chosen for its ability to capture both temporal and sequential data dependencies effectively. The BiTCN method outperforms conventional recurrent networks in modelling long-range temporal patterns with lower computational complexity, while the BiLSTM captures bidirectional sequence information, thereby improving context understanding. The AM further enhances performance by selectively focusing on the most relevant features, thereby improving classification accuracy. Together, this hybrid approach outperforms simpler models, such as a standalone BiLSTM or CNNs, by providing a more comprehensive representation of the complex RCC data, resulting in enhanced robustness and reliability in classification tasks.

The hybrid BiTCN-BiLSTM-AM technique comprises three key modules: BiTCN, BiLSTM, and AM. The BiTCN module is utilized to acquire the bi-directional temporal dependencies in the input data, after which the BiLSTM network processes the resulting features. The AM is combined to adaptively weight salient aspects, thus amplifying the most vital data.

The TCN comprises dilated causal convolutional layers with equal input and output lengths, integrating the spatial feature extraction proficiency of CNNs and the temporal dependency modelling proficiency of RNNs. A unidirectional TCN concentrates on extracting forward feature data while ignoring reverse feature data; to overcome this constraint, BiTCN is utilized to acquire bi-directional features and temporal dependencies.

The causal convolutional network enforces strict temporal restrictions. The input series $x_0, x_1, \ldots, x_T$ is output as $y_0, y_1, \ldots, y_T$ after passing through the network, and output $y_t$ relies only on inputs up to time $t$, effectively avoiding leakage of future data. Nevertheless, as the input series grows longer, more hidden layers (HLs) are needed for causal convolution. To achieve a larger receptive field while preserving the dimensionality of the feature mapping, BiTCN employs a bi-directional dilated causal convolution network. For a 1D input series $x \in \mathbb{R}^{n}$ and a convolution kernel $f: \{0, \ldots, k-1\} \to \mathbb{R}$:

$$F(s) = \sum_{i=0}^{k-1} f(i)\, x_{s - d \cdot i} \tag{8}$$

Here $s - d \cdot i$ indexes positions in the past; $d$ represents the dilation factor, and $k$ signifies the kernel size. While dilated causal convolution enlarges the receptive field of BiTCN, it also causes problems such as slower convergence and vanishing gradients. To tackle this problem, BiTCN utilizes residual blocks to accomplish precise and effective feature extraction.
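One standard way to realize Eq. (8) is to left-pad the sequence by $(k-1) \cdot d$ so that each output sees only the past, and to obtain the backward direction of BiTCN by running the same stack over the reversed sequence. The sketch below follows that recipe; the kernel size, dilation, and channel counts are illustrative, and the residual blocks are omitted.

```python
# Dilated causal convolution per Eq. (8): left-padding by (k - 1) * d
# keeps the output causal and the same length as the input. A BiTCN
# applies the same stack to the reversed sequence for the backward pass.
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 3, d: int = 1):
        super().__init__()
        self.pad = (k - 1) * d                        # look-back only
        self.conv = nn.Conv1d(c_in, c_out, k, dilation=d)

    def forward(self, x):                             # x: (B, C, T)
        x = nn.functional.pad(x, (self.pad, 0))       # pad past, not future
        return self.conv(x)

x = torch.randn(1, 8, 50)                             # toy series
fwd = CausalConv1d(8, 16, k=3, d=2)(x)                # forward direction
bwd = CausalConv1d(8, 16, k=3, d=2)(x.flip(-1)).flip(-1)  # reverse direction
y = torch.cat([fwd, bwd], dim=1)                      # bi-directional features
print(y.shape)                                        # torch.Size([1, 32, 50])
```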

The LSTM network is developed from the RNN and integrates gated memory cells to maintain temporal dependencies, which effectively addresses the vanishing-gradient issue dominant in traditional RNNs by regulating the flow of information. The LSTM network comprises forget, input, and output gates; Eqs. (9)–(14) define the computations of these gates:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{9}$$
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{10}$$
$$\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{11}$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{12}$$
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{13}$$
$$h_t = o_t \odot \tanh\left(C_t\right) \tag{14}$$

Here $W_i$, $W_f$, $W_c$, and $W_o$ represent the weight matrices of each gate; $b_i$, $b_f$, $b_c$, and $b_o$ indicate the bias values of each gate; $i_t$, $f_t$, $C_t$, and $o_t$ refer to the input gate, forget gate, cell state, and output gate at moment $t$, correspondingly; $\sigma$ signifies the sigmoid activation function; $\tanh$ denotes the hyperbolic tangent activation function; and $x_t$ refers to the input state of the memory cell at moment $t$. The BiLSTM network comprises dual parallel LSTM layers that process the input data in both the forward and backward directions. In contrast to the unidirectional LSTM network, the BiLSTM network jointly trains LSTM layers on the original and reversed sequences, further improving completeness and global feature extraction.
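Written out against Eqs. (9)-(14), a single LSTM step reduces to one linear map of the concatenated pair $[h_{t-1}, x_t]$ followed by the gate non-linearities. The sketch below is a minimal PyTorch rendering with randomly initialized weights, not the authors' implementation; the gate ordering inside the fused weight matrix is a convention chosen here.

```python
# One LSTM step written out against the gate equations above. W fuses
# the four gate weight matrices; the i/f/g/o ordering is a convention
# chosen for this sketch. nn.LSTM fuses all of this in practice, and a
# BiLSTM runs the recurrence over the original and reversed sequence.
import torch

def lstm_step(x_t, h_prev, c_prev, W, b):
    """W maps concat(h_prev, x_t) to 4*hidden units; b is the bias."""
    z = torch.cat([h_prev, x_t], dim=-1) @ W.T + b
    i, f, g, o = z.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # gates
    g = torch.tanh(g)                        # candidate cell state
    c_t = f * c_prev + i * g                 # cell-state update
    h_t = o * torch.tanh(c_t)                # hidden state output
    return h_t, c_t

hidden, inp = 32, 16
W = torch.randn(4 * hidden, hidden + inp)
b = torch.zeros(4 * hidden)
h, c = torch.zeros(1, hidden), torch.zeros(1, hidden)
h, c = lstm_step(torch.randn(1, inp), h, c, W, b)
print(h.shape, c.shape)                      # (1, 32) each
```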

The AM is then combined to allocate diverse weights to the BiLSTM output, emphasizing the inputs that contribute most to the result:

$$e_t = u \tanh\left(w h_t + b\right) \tag{15}$$
$$\alpha_t = \frac{\exp\left(e_t\right)}{\sum_{j=1}^{T} \exp\left(e_j\right)} \tag{16}$$
$$s = \sum_{t=1}^{T} \alpha_t h_t \tag{17}$$

Here, $w$ and $u$ refer to the attention weight matrices; $\alpha_t$ indicates the attention probability distribution; $h_t$ depicts the hidden-layer state vector output by the BiLSTM network; $b$ represents the bias; $s$ represents the final state of the attention mechanism; and $e_t$ signifies the attention score.
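A compact PyTorch rendering of this additive attention over BiLSTM outputs might look as follows; the layer shapes and the toy input are illustrative assumptions.

```python
# Additive attention over BiLSTM outputs following Eqs. (15)-(17):
# score each hidden state, soften the scores into a distribution, and
# return the weighted sum as the attention context vector.
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, hidden: int):
        super().__init__()
        self.w = nn.Linear(hidden, hidden)    # attention weight matrix w
        self.u = nn.Linear(hidden, 1, bias=False)  # scoring vector u

    def forward(self, h):                     # h: (B, T, hidden) from BiLSTM
        e = self.u(torch.tanh(self.w(h)))     # attention scores e_t, Eq. (15)
        alpha = torch.softmax(e, dim=1)       # distribution alpha_t, Eq. (16)
        return (alpha * h).sum(dim=1)         # context vector s, Eq. (17)

h = torch.randn(2, 50, 64)                    # toy BiLSTM output sequence
print(AdditiveAttention(64)(h).shape)         # torch.Size([2, 64])
```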

Performance analysis

The simulation analysis of the ATANNF-IDRCC model is examined under the RCCGNet dataset. The method runs on Python 3.6.5 with an i5-8600K CPU, a 4 GB GPU, 16 GB RAM, a 250 GB SSD, and a 1 TB HDD, using a learning rate of 0.01, ReLU activation, 50 epochs, a dropout of 0.5, and a batch size of 5. Figure 3 shows sample images of Grades 0 to 4, and Fig. 4 illustrates further sample images.

Fig. 3. Samples of Grade 0 to 4.

Fig. 4. Sample images.
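A skeleton of the stated training configuration (learning rate 0.01, 50 epochs, dropout 0.5, batch size 5) is sketched below. The SGD optimizer, the stand-in classifier head, and the random toy batch are assumptions, since the paper reports only the hyperparameter values.

```python
# Training-loop skeleton using the stated hyperparameters. The SGD
# optimizer and the stand-in head are assumptions for illustration.
import torch
import torch.nn as nn

model = nn.Sequential(              # stand-in for the full ATANNF-IDRCC model
    nn.Linear(512, 128), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(128, 5),              # five RCC grades
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

for epoch in range(50):
    feats = torch.randn(5, 512)     # one toy batch of extracted features
    labels = torch.randint(0, 5, (5,))
    optimizer.zero_grad()
    loss = criterion(model(feats), labels)
    loss.backward()
    optimizer.step()
```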

Table 3 and Fig. 5 depict the RCC cancer detection results of the ATANNF-IDRCC method at an 80:20 split. Under 80% of the training phase (TRPHE), the ATANNF-IDRCC model attains an average accuracy of 98.26%, precision of 95.69%, recall of 95.70%, F1-score of 95.65%, MCC of 94.60%, and kappa of 94.65%. Moreover, during the 20% testing phase (TSPHE), the ATANNF-IDRCC model achieves an average accuracy of 97.84%, precision of 94.73%, recall of 94.56%, F1-score of 94.62%, MCC of 93.28%, and kappa of 93.34%.

Table 3.

RCC cancer detection of ATANNF-IDRCC model 80:20.

Class labels Accuracy Precision Recall F1-score MCC Kappa
TRPHE (80%)
 Grade 0 97.90 97.47 92.34 94.84 93.58 93.65
 Grade 1 97.35 90.98 95.89 93.37 91.76 91.82
 Grade 2 98.15 97.42 93.32 95.32 94.20 94.26
 Grade 3 99.00 96.34 98.75 97.53 96.92 96.97
 Grade 4 98.90 96.22 98.20 97.20 96.52 96.58
 Average 98.26 95.69 95.70 95.65 94.60 94.65
TSPHE (20%)
 Grade 0 98.60 96.30 95.12 95.71 94.87 94.92
 Grade 1 97.60 95.41 93.69 94.55 93.01 93.06
 Grade 2 97.80 96.70 91.67 94.12 92.82 92.88
 Grade 3 97.20 91.35 95.00 93.14 91.41 91.48
 Grade 4 98.00 93.91 97.30 95.58 94.31 94.36
 Average 97.84 94.73 94.56 94.62 93.28 93.34
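The per-class scores above correspond to standard classification metrics and can be reproduced with scikit-learn, as in the toy sketch below with stand-in labels; reading the fifth reported column as MCC is an assumption based on the table layout.

```python
# Sketch of the per-class metrics reported in Table 3, computed with
# scikit-learn; y_true and y_pred here are toy stand-ins, not real data.
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             matthews_corrcoef, precision_score, recall_score)

y_true = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
y_pred = [0, 1, 2, 3, 4, 0, 2, 2, 3, 4]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1-score :", f1_score(y_true, y_pred, average="macro"))
print("MCC      :", matthews_corrcoef(y_true, y_pred))
print("kappa    :", cohen_kappa_score(y_true, y_pred))
```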

Fig. 5. Average values of the ATANNF-IDRCC model under 80:20.

In Fig. 6, the training (TRAIN) accuracy and validation (VALID) accuracy outcomes of the ATANNF-IDRCC methodology on the 80:20 split are depicted. The figure indicates that the TRAIN and VALID accuracy values show upward trends, reflecting the capability of the ATANNF-IDRCC method to perform efficaciously across numerous iterations. Furthermore, the TRAIN and VALID accuracy remain close throughout the epochs, indicating marginal overfitting and highlighting the superior performance of the ATANNF-IDRCC method, which promises consistent prediction on unseen instances.

Fig. 6. Accuracy curve of the ATANNF-IDRCC model under 80:20.

In Fig. 7, the TRAIN and VALID loss graphs of the ATANNF-IDRCC approach at an 80:20 ratio are portrayed. The TRAIN and VALID values exhibit a downward trend, indicating the proficiency of the ATANNF-IDRCC approach in balancing the trade-off between generality and data fitting. The persistent decline further confirms the well-tuned solution of the ATANNF-IDRCC technique and steadily improving prediction outcomes.

Fig. 7. Loss curve of the ATANNF-IDRCC technique at 80:20.

In Fig. 8, the precision-recall (PR) inspection of the ATANNF-IDRCC methodology at 80:20 provides insight into its performance by plotting precision against recall for all class labels. The figure demonstrates that the ATANNF-IDRCC methodology persistently achieves elevated PR values across the classes, signifying its competence in maintaining a substantial portion of true positives among all positive predictions (precision) while also capturing a considerable proportion of actual positives (recall). The consistent improvement in PR outcomes across all classes demonstrates the effectiveness of the ATANNF-IDRCC technique in the classification process.

Fig. 8. PR curve of the ATANNF-IDRCC technique at 80:20.

In Fig. 9, the ROC inspection of the ATANNF-IDRCC methodology at an 80:20 ratio is analyzed. The outcomes indicate that the ATANNF-IDRCC approach achieves higher ROC values for all classes, demonstrating a notable proficiency in distinguishing between class labels. This steady trend of increased values of ROC on multiple classes indicates the capable performance of the ATANNF-IDRCC approach in class prediction, underscoring the strong nature of the classification process.

Fig. 9. ROC curve of the ATANNF-IDRCC technique at 80:20.

Table 4 and Fig. 10 present the RCC cancer detection results of the ATANNF-IDRCC method under a 70:30 split. Based on 70% TRPHE, the ATANNF-IDRCC method obtains an average accuracy of 98.10%, precision of 95.32%, recall of 95.29%, F1-score of 95.27%, MCC of 94.11%, and kappa of 94.18%. Besides, on 30% TSPHE, the ATANNF-IDRCC method obtains an average accuracy of 97.92%, precision of 94.87%, recall of 94.80%, F1-score of 94.78%, MCC of 93.52%, and kappa of 93.59%.

Table 4.

RCC cancer detection of ATANNF-IDRCC technique 70:30.

Class labels Accuracy Precision Recall F1-score MCC Kappa
TRPHE (70%)
 Grade 0 98.34 95.94 95.66 95.80 94.77 94.83
 Grade 1 99.03 97.35 97.63 97.49 96.89 96.96
 Grade 2 97.31 96.42 90.22 93.22 91.62 91.69
 Grade 3 97.94 94.17 95.76 94.96 93.67 93.75
 Grade 4 97.89 92.72 97.18 94.90 93.61 93.68
 Average 98.10 95.32 95.29 95.27 94.11 94.18
TSPHE (30%)
 Grade 0 98.67 97.37 96.10 96.73 95.90 95.96
 Grade 1 97.87 96.79 93.21 94.97 93.64 93.72
 Grade 2 97.60 96.27 90.85 93.48 92.07 92.15
 Grade 3 98.40 94.67 97.26 95.95 94.96 95.03
 Grade 4 97.07 89.24 96.58 92.76 91.04 91.12
 Average 97.92 94.87 94.80 94.78 93.52 93.59

Fig. 10. Average values of the ATANNF-IDRCC technique under 70:30.

In Fig. 11, the TRAIN accuracy and VALID accuracy outcomes of the ATANNF-IDRCC approach at a 70:30 ratio are illustrated. The figure shows that the TRAIN and VALID accuracy values exhibit rising trends, indicating the capability of the ATANNF-IDRCC approach and its developing performance. Furthermore, the TRAIN and VALID accuracy remain close throughout the epochs, indicating minimal overfitting and demonstrating the better performance of the ATANNF-IDRCC model, which confirms stable prediction on unobserved instances.

Fig. 11. Accuracy curve of the ATANNF-IDRCC technique under 70:30.

In Fig. 12, the TRAIN and VALID loss graphs of the ATANNF-IDRCC methodology under a 70:30 split are illustrated. The TRAIN and VALID values show declining tendencies, representing the ability of the ATANNF-IDRCC method to balance the trade-off between generality and data fitting. The persistent drop further confirms the outstanding performance of the ATANNF-IDRCC method and its steadily improving prediction outcomes.

Fig. 12. Loss curve of the ATANNF-IDRCC technique under 70:30.

In Fig. 13, the PR analysis of the ATANNF-IDRCC methodology at a 70:30 ratio presents an understanding of its performance through a mapping of precision against recall for each class label. The figure illustrates that the ATANNF-IDRCC methodology consistently achieves higher PR values across diverse class labels, demonstrating its ability to maintain a substantial portion of true positives among all positive predictions (precision) while also capturing a significant proportion of actual positives (recall). The continual enhancement in PR outcomes across each class reveals the efficacy of the ATANNF-IDRCC method during the classification process.

Fig. 13. PR curve of the ATANNF-IDRCC technique under 70:30.

In Fig. 14, the ROC analysis of the ATANNF-IDRCC methodology is examined under a 70:30 split. The results indicate that the ATANNF-IDRCC methodology achieved high ROC results across all classes, establishing its foremost capabilities to distinguish between classes. This consistent pattern of increasing ROC values across multiple classes indicates the skilful performance of the ATANNF-IDRCC model in class prediction, emphasizing the robust nature of the classification process.

Fig. 14. ROC curve of the ATANNF-IDRCC technique under 70:30.

Table 5 and Fig. 15 present a comparative analysis of the ATANNF-IDRCC methodology with existing methodologies across various metrics33,34. The results underscore that existing models, such as MobileNetv2, ViT, T2T, SwinV2, ConvNext, ResNet18, and ResNet34, show weaker performance, whereas the ATANNF-IDRCC model attains the highest accuracy, precision, recall, and F1-score of 98.26%, 95.69%, 95.70%, and 95.65%, respectively.

Table 5.

Comparative analysis of ATANNF-IDRCC model with existing techniques.

Model Accuracy Precision Recall F1-score
MobileNetv2 97.94 89.59 92.44 92.23
VIT 89.50 90.25 92.77 93.66
T2T 96.62 92.89 89.57 94.54
SwinV2 97.62 89.82 92.78 93.44
Convnext 93.14 92.99 91.92 90.50
ResNet18 94.20 95.01 92.02 89.56
ResNet34 93.70 95.15 95.01 93.10
ATANNF-IDRCC 98.26 95.69 95.70 95.65

Fig. 15. Comparison analysis of the ATANNF-IDRCC model: (a) accuracy, (b) precision, (c) recall, and (d) F1-score.

Table 6 and Fig. 16 illustrate the computational time (CT) analysis of the ATANNF-IDRCC methodology in comparison to existing models. The MobileNetV2 took 10.77 s, while ViT required 26.36 s, and the T2T transformer required 25.45 s. The SwinV2 model processed the data in 19.25 s, and ConvNeXt took 17.89 s. ResNet18 and ResNet34 models recorded CTs of 11.09 and 18.25 s, respectively. Notably, the ATANNF-IDRCC model illustrated superior efficiency, completing the computations in only 8.28 s, which is approximately 23% faster than the fastest existing model, MobileNetV2, indicating its potential for real-time applications and practical deployment.

Table 6.

CT assessment of ATANNF-IDRCC methodology with existing models.

Model CT (sec)
MobileNetv2 10.77
VIT 26.36
T2T 25.45
SwinV2 19.25
Convnext 17.89
ResNet18 11.09
ResNet34 18.25
ATANNF-IDRCC 8.28

Fig. 16. CT assessment of the ATANNF-IDRCC methodology with existing models.

Table 7 and Fig. 17 present the ablation study of the ATANNF-IDRCC approach. The Twins-SVT model alone attained an accuracy of 96.96%, precision of 94.17%, recall of 94.38%, and an F1-score of 94.46%. The BiTCN-BiLSTM-AM technique improved these results with an accuracy of 97.72%, precision of 94.93%, recall of 94.98%, and F1-score of 95.12%. The full ATANNF-IDRCC model outperformed both, with an accuracy of 98.26%, precision of 95.69%, recall of 95.70%, and an F1-score of 95.65%. The ablation results thus validate the contribution of each component, illustrating consistent accuracy gains when all elements of the model are integrated.

Table 7.

Result analysis of the ablation study of the ATANNF-IDRCC approach.

Model Accuracy Precision Recall F1-score
Twins-SVT 96.96 94.17 94.38 94.46
BiTCN-BiLSTM-AM 97.72 94.93 94.98 95.12
ATANNF-IDRCC 98.26 95.69 95.70 95.65

Fig. 17. Result analysis of the ablation study of the ATANNF-IDRCC approach.

Table 8 presents the computational efficiency of various methods, evaluated based on floating-point operations (FLOPs) in millions (M) and GPU memory usage in megabytes (MB). The conventional CNN model required 18.41 M FLOPs and 1712 MB of GPU memory, while the Transformer model used slightly more, with 19.21 M FLOPs and 1720 MB. The SVM and K-nearest neighbours (KNN) techniques incurred 19.32 M and 18.71 M FLOPs, respectively, with GPU usage of 1772 MB and 1672 MB. The HybridSN approach consumed 18.38 M FLOPs but had a notably higher GPU memory requirement at 2820 MB. The 2D-CAE model required 19.17 M FLOPs and 1110 MB of GPU memory. In contrast, the ATANNF-IDRCC approach illustrated superior computational efficiency, requiring only 11.00 M FLOPs and 924 MB of GPU memory, indicating significant savings in both processing power and memory usage compared to the other methods.

Table 8.

Computational efficiency comparison of various models, showing FLOPs in millions (M) and GPU memory usage in megabytes (MB).

Method FLOPs (M) GPU (MB)
CNN 18.41 1712
Transformer 19.21 1720
SVM 19.32 1772
KNN 18.71 1672
HybridSN 18.38 2820
2D-CAE 19.17 1110
ATANNF-IDRCC 11.00 924

Conclusion

This paper presents the ATANNF-IDRCC model, which aims to develop an accurate and automated model for detecting and grading RCC using kidney histopathology images. Initially, the image pre-processing stage utilizes the CLAHE method to enhance image contrast. Furthermore, the ATANNF-IDRCC model utilizes the Twins-SVT method for the feature extraction process. For the RCC classification process, the hybrid BiTCN-BiLSTM-AM model is employed. The performance analysis of the ATANNF-IDRCC technique is examined under the RCCGNet dataset, where the comparison study demonstrated a superior accuracy of 98.26% over existing models. The limitations of the ATANNF-IDRCC technique include the relatively limited size and diversity of the dataset, which may affect the model's capability to generalize across broader populations and varied imaging conditions. Furthermore, the model's performance in real-time clinical environments remains to be thoroughly evaluated, encompassing factors such as image acquisition variability and differing staining protocols. Future work may concentrate on expanding the dataset with more heterogeneous samples, integrating cross-institutional validation, and optimizing the model for deployment in clinical workflows. Efforts will also be made to enhance interpretability and minimize computational complexity, thereby facilitating practical application.

Author contributions

Conceptualization: Eliazer M. Data curation and formal analysis: Guntupalli Manoj Kumar, Sibi Amaran. Investigation and methodology: Eliazer M, Sibi Amaran. Funding support: Monalisa Sahu. Project administration and resources: Monalisa Sahu. Software and supervision: Y. Shasikala, Eliazer M, Kanchan Bala. Validation and visualization: Bibhuti Bhusan Dash, Kanchan Bala. Writing (original draft): Eliazer M. Writing (review and editing): Monalisa Sahu. All authors have read and agreed to the published version of the manuscript.

Data availability

The data supporting the findings of this study are openly available in the GitHub repository at https://github.com/shyamfec/RCCGNet, reference number 29.

Declarations

Competing interests

The authors declare no competing interests.

Ethics approval

This article does not contain any studies with human participants performed by any of the authors.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Tabibu, S., Vinod, P. K. & Jawahar, C. V. Pan-renal cell carcinoma classification and survival prediction from histopathology images using deep learning. Sci. Rep.9(1), 10509 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Chanchal, A. K., Lal, S., Kumar, R., Kwak, J. T. & Kini, J. A novel dataset and efficient deep learning framework for automated grading of renal cell carcinoma from kidney histopathology images. Sci. Rep.13(1), 5728 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Nancy, G. & Bhuvaneswari, E. A deep learning-driven multi-layer digital twin framework with miot for precision oncology in cancer diagnosis. J. Intell. Syst. Internet Things.17(1), 16–26 (2025).
  • 4.Jayapandian, C. P. et al. Development and evaluation of deep learning–based segmentation of histologic structures in the kidney cortex with multiple histologic stains. Kidney Int.99(1), 86–101 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Bahadoram, S. et al. Renal cell carcinoma: An overview of the epidemiology, diagnosis, and treatment. G. Ital. Nefrol.39(3), 2022 (2022). [PubMed] [Google Scholar]
  • 6.Bektas, C. T. et al. Clear cell renal cell carcinoma: Machine learning-based quantitative computed tomography texture analysis for prediction of Fuhrman nuclear grade. Eur. Radiol.29, 1153–1163 (2019). [DOI] [PubMed] [Google Scholar]
  • 7.Williamson, S. R., Taneja, K. & Cheng, L. Renal cell carcinoma staging: Pitfalls, challenges, and updates. Histopathology74(1), 18–30 (2019). [DOI] [PubMed] [Google Scholar]
  • 8.Su, H. et al. Renal histopathological analysis of 26 postmortem findings of patients with COVID-19 in China. Kidney Int.98(1), 219–227 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Altini, N. et al. Semantic segmentation framework for glomeruli detection and classification in kidney histological sections. Electronics9(3), 503 (2020). [Google Scholar]
  • 10.Hernández Bandera, N., Arizaga, J. M. M. & Rodríguez Reyes, E. Assessment and prediction of chronic kidney using an improved neutrosophic artificial intelligence model. Int. J. Neutrosophic Sci. (IJNS). 21(1), 174–183 (2023).
  • 11.Bino, J., Preethi, C., Renukadevi, M., Kanageswari, S. & Nelson, L. Development of a Comprehensive Deep Learning Framework for Enhanced Detection and Accurate Classification of Renal Cancer. In: 2025 8th International Conference on Trends in Electronics and Informatics ICOEI. IEEE. pp. 1056–1062 (2025).
  • 12.Mehta, S. & Bhalla, A. Towards Clinical Decision Support: CNN-SVM Models for Kidney Tumor Classification. In: 2025 International Conference on Automation and Computation (AUTOCOM). IEEE. pp. 1407–1411 (2025).
  • 13.Hossain, M. S., Armstrong, L. J., Cook, D. M. & Zaenker, P. Application of histopathology image analysis using deep learning networks. Human Centric Intell. Syst.4(3), 417–436 (2024). [Google Scholar]
  • 14.Akram, Z., Munir, K., Tanveer, M. U., Rehman, A. U. & Bermak, A. Kidney Ensemble-Net: Enhancing Renal Carcinoma Detection through Probabilistic Feature Selection and Ensemble Learning (IEEE Access, 2024). [Google Scholar]
  • 15.Liu, X. et al. MSMTSeg: Multi-stained multi-tissue segmentation of kidney histology images via generative self-supervised meta-learning framework. IEEE J. Biomed. Health Inform.29 (6), 3906–3917 (2025). [DOI] [PubMed]
  • 16.Sahu, P. et al. Kidney tumor classification using deep learning techniques from computed tomography images. In: International Conference on Machine Learning Algorithms. (Cham: Springer Nature Switzerland). pp. 372–379 (2024).
  • 17.Badawy, M. et al. A two-stage renal disease classification based on transfer learning with hyperparameters optimization. Front. Med.10, 1106717 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Rajkumar, K. et al. Kidney cancer detection using deep learning models. In: 2023 7th International Conference on Trends in Electronics and Informatics ICOEI. IEEE. pp. 1197–1203 (2023).
  • 19.Moldovanu, S., Raducan, E., Miron, M. & Sîrbu, C. The advantages of employing transfer learning in the classification of breast cancer histopathological images. Eurasia Proc. Sci. Technol. Eng. Math.33, 140–147 (2025). [Google Scholar]
  • 20.Feng, C. et al. Artificial intelligence-assisted quantification and assessment of whole slide images for pediatric kidney disease diagnosis. Bioinformatics40(1), btad740 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Rehman, A., Mahmood, T. & Saba, T. Robust kidney carcinoma prognosis and characterization using Swin-ViT and DeepLabV3+ with multi-model transfer learning. Appl. Soft Comput.170, 112518 (2025). [Google Scholar]
  • 22.Ali, A. M., Benjdira, B., Koubaa, A., Boulila, W. & El-Shafai, W. TESR: Two-stage approach for enhancement and super-resolution of remote sensing images. Remote Sensing15(9), 2346 (2023). [Google Scholar]
  • 23.Pimpalkar, A. et al. Fine-tuned deep learning models for early detection and classification of kidney conditions in CT imaging. Sci. Rep.15(1), 10741 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.He, Q. et al. Global attention based GNN with Bayesian collaborative learning for glomerular lesion recognition. Comput. Biol. Med.173, 108369 (2024). [DOI] [PubMed] [Google Scholar]
  • 25.Maqsood, H. and Khan, S.U.R. MeD-3D: A Multimodal Deep Learning Framework for Precise Recurrence Prediction in Clear Cell Renal Cell Carcinoma (ccRCC). arXiv preprint arXiv:2507.07839. (2025).
  • 26.Ye, Z., Ge, S., Yang, M., Du, C. & Ma, F. An explainable classification model of renal cancer subtype using deep learning. In: 2024 17th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics CISP-BMEI. IEEE. pp. 1–10 (2024).
  • 27.Sharon, J. J. & Anbarasi, L. J. An attention enhanced dilated bottleneck network for kidney disease classification. Sci. Rep.15(1), 9865 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Uhm, K. H., Jung, S. W., Hong, S. H. & Ko, S. J. Lesion-aware cross-phase attention network for renal tumor subtype classification on multi-phase CT scans. Comput. Biol. Med.178, 108746 (2024). [DOI] [PubMed] [Google Scholar]
  • 29.https://github.com/shyamfec/RCCGNet
  • 30.Rajkumar, R. et al. DARKNET-53 convolutional neural network-based image processing for breast cancer detection. Mesop. J. Artif. Intell. Healthc.2024, 59–68 (2024). [Google Scholar]
  • 31.Jin, K., Zhai, J. & Gao, Y. TwinsReID: Person re-identification based on twins transformer’s multi-level features. Math. Biosci. Eng.20(2), 2110–2130 (2023). [DOI] [PubMed] [Google Scholar]
  • 32.Zhang, X., Luo, H., Pei, Y. & Ma, K. Mechanism-Guided Short-Term Heat Load Prediction of District Heating System Based on A Hybrid Data-Driven Model. Available at SSRN 5260365.
  • 33.Zong, H. et al. A deep learning ICDNET architecture for efficient classification of histopathological cancer cells using Gaussian noise images. Alex. Eng. J.112, 37–48 (2025). [Google Scholar]
  • 34.Abdeltawab, H. et al. A pyramidal deep learning pipeline for kidney whole-slide histology images classification. Sci. Rep.11(1), 20189 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
