Abstract
The COVID-19 pandemic has spread rapidly across the globe, presenting significant public health challenges. Biomedical imaging techniques, particularly computed tomography (CT), are vital for detecting and monitoring diseases. Accurately segmenting pneumonia lesions in CT scans is essential for diagnosing COVID-19 and assessing the severity of the disease. However, low-contrast infected regions pose a major challenge for automated segmentation methods. In this paper, we present an accessible deep learning framework for the automatic segmentation of COVID-19-infected regions. This framework integrates Contrast-Limited Adaptive Histogram Equalization (CLAHE) preprocessing with an Attention U-Net model trained using a hybrid Dice-Tversky loss. It is supported by extensive data augmentation techniques to improve generalization. We evaluated our approach on a publicly available COVID-19 CT dataset using 5-fold cross-validation. Our results achieved a Dice score of 0.83, an Intersection over Union (IoU) of 0.71, and an accuracy of 99.74%. To enhance the interpretability of our deep learning model, we applied Explainable Artificial Intelligence (XAI) techniques, such as Gradient-weighted Class Activation Mapping (Grad-CAM). These results demonstrate the effectiveness of our proposed framework and highlight its potential as a practical tool for medical imaging applications.
Subject terms: Computational biology and bioinformatics, Diseases, Health care, Mathematics and computing, Medical research
Introduction
The term “novel” is frequently associated with coronavirus to indicate it is a new strain within a group of dangerous viruses1. As stated by the World Health Organization (WHO)2, coronaviruses are part of a vast family of viruses that encompasses a range of illnesses, from the common cold to severe diseases. These diseases have the potential to affect both people and animals3. The outbreak of COVID-19 in Wuhan, China, was first reported in late 2019 and quickly spread to various countries globally. In March 2020, the World Health Organization officially classified the disease as a pandemic4. By May 2021, COVID-19 had infected over 169 million individuals and resulted in the deaths of more than 3.5 million people5.
In this urgent scenario, faster identification of COVID-19 and improved patient screening will aid in minimizing infections and death rates, resulting in more effective treatment for those affected. A common method for identifying COVID-19 is the reverse transcription-polymerase chain reaction (RT-PCR) test. However, this test can be time-consuming and often leads to a significant number of false negatives6. An alternative method for diagnosing COVID-19 involves the use of medical imaging techniques. This approach utilizes images obtained from computed tomography (CT), radiography (X-ray), and magnetic resonance imaging (MRI) to detect COVID-19. The first two methods are more commonly used due to the wider availability of the necessary equipment. By examining these medical images, it is possible to identify whether COVID-19 is present or absent, as well as to assess the progression and severity of the illness. Given that medical imaging techniques are non-invasive, they are particularly appropriate for diagnosing COVID-19, as they eliminate the need for direct and close interaction with patients, thereby reducing the risk of exposure for healthcare workers7.
CT and chest X-ray imaging have recently proven to be valuable diagnostic tools for COVID-19. Radiography images offer advantages such as simplicity, availability, and quicker diagnosis. While chest X-rays are more affordable, they are less effective for COVID-19 screening than CT scans, as they contain less detailed information. Nonetheless, CT has been crucial for diagnosis throughout this pandemic, revealing the characteristic radiographic signs in many patients infected with COVID-198. Healthcare professionals often must review multiple CT scans, making the process lengthy and prone to mistakes. To address this, automated deep learning methods have been developed to segment regions of interest (ROIs) of different shapes and sizes, such as lungs and lesions, within high-resolution CT images. These methods can assist healthcare professionals in the diagnostic procedure9.
In recent years, the use of machine learning (ML) techniques, such as deep neural networks (DNNs), has gained popularity in the healthcare sector due to the increasing complexity of healthcare data. Machine learning algorithms offer efficient and effective models for data analysis, allowing for the discovery of hidden patterns and valuable insights from large health datasets that traditional analytics cannot uncover in a timely manner. Specifically, Deep Learning (DL) methods have proven to be effective approaches for pattern recognition within healthcare systems10.
Because of the robust ability of deep learning techniques to learn and map features, they can effectively identify high-level features and represent data accurately when trained with a sufficiently large dataset. This efficiency allows deep-learning-based segmentation methods to outperform traditional approaches, leading to considerable interest in various deep-learning techniques for segmenting CT images. Deep neural networks are a type of neural network architecture designed as a multilayer perceptron (MLP), encompassing various other structures like convolutional neural networks (CNN) and recurrent neural networks (RNN). They are systematically trained to automatically learn representations from datasets, eliminating the need for hand-crafted feature extraction11. The CNN is a deep learning model that operates under a supervised learning paradigm and can process images as input. It uses filters to transform image pixels into feature representations, which help in differentiating between various types of data. Typically, it consists of three main components: the convolutional layer, the pooling layer, and the fully connected layer. The convolutional layer serves as the first component of a convolutional network. Subsequent layers can include additional convolutional or pooling layers, with the fully connected layer positioned at the end12.
Segmentation involves partitioning an image into multiple sections. In image segmentation, it is more effective to focus on semantic objects associated with particular areas instead of analyzing all the information in the image simultaneously. Consequently, the main objective of segmentation is to identify regions within the image that represent significant components of the object for the sake of analysis. The process of medical image segmentation using deep learning involves several essential stages. Initially, a dataset of medical images is gathered and divided into training, validation, and testing subsets; the training subset is utilized to learn the model parameters, the validation subset assists in fine-tuning the hyperparameters, and the testing subset assesses the model’s ability to generalize. Subsequently, preprocessing and scaling of the images are performed, which include techniques like normalization and data augmentation methods, such as random rotations and scaling, to improve the diversity of the dataset. A deep learning-based network for image segmentation is then utilized to extract features and produce segmented images, emphasizing significant areas for examination. Finally, the performance of the segmentation is evaluated using metrics that measure the accuracy and effectiveness of the model11.
In this study, we propose a systematic integration of established techniques to address the challenge of COVID-19 lesion segmentation in lung CT scans. Specifically, contrast-limited adaptive histogram equalization (CLAHE) is applied as a preprocessing step to enhance lesion visibility, followed by an Attention U-Net architecture to improve feature representation during segmentation. To strengthen model generalization, extensive data augmentation strategies are employed. Finally, a hybrid Dice-Tversky loss is used to mitigate class imbalance and enhance robustness. Additionally, Gradient-weighted Class Activation Mapping (Grad-CAM) is utilized to provide explainable Artificial Intelligence (XAI) visualizations, highlighting the regions of the lung that the model focuses on when predicting lesions. While these components have been individually explored in prior studies, our contribution lies in their cohesive integration into a lightweight, reproducible pipeline that delivers strong performance for COVID-19 CT lesion segmentation.
Related work
Deep learning has gained significant attention in the field of medical image analysis, especially in segmentation tasks that are crucial for diagnosis and treatment planning. Zhao et al.13 introduced the D2A U-Net, a segmentation model designed to improve COVID-19 lesion detection in CT scans. Their network used a dual attention mechanism together with hybrid dilated convolutions in the decoder, which helped refine feature maps and expand the receptive field. On a public COVID-19 CT dataset, the model achieved a Dice score of 0.73 and a recall of 0.71, showing the benefit of attention in capturing contextual information. However, the method focused mainly on decoder-level feature enhancement and did not include preprocessing or data-level techniques that could make lesions easier to distinguish. In our work, we address this by combining CLAHE preprocessing with systematic data augmentation and a hybrid Dice-Tversky loss, which together strengthen lesion visibility and improve segmentation robustness.
In a similar vein, Chen et al.14 proposed a method for COVID-19 lesion segmentation that combines several advanced components. The framework first extracts regions of interest using a patch-based strategy, then applies a 3D attention network to better capture spatial dependencies. Training was guided by a combination loss, and both data augmentation and conditional random fields (CRF) were used to refine the outputs. Their results showed strong accuracy and highlighted the potential of this multi-stage design. On the other hand, the reliance on a complex 3D structure and additional CRF post-processing makes the approach computationally demanding and less straightforward to reproduce. By contrast, our method uses a simpler 2D Attention U-Net enhanced with CLAHE and a hybrid Dice-Tversky loss, achieving better handling of subtle or low-contrast lesions while keeping the pipeline lightweight and reproducible.
Similarly, Ahmed et al.9 developed an improved U-Net framework with an attention mechanism for segmenting COVID-19 lung lesions. Their model combined boundary loss with a weighted binary cross-entropy Dice loss to address small or imbalanced lesions. On several public datasets, the method achieved Dice scores of 0.93 for lungs and 0.76 for infected areas, showing good performance. Nevertheless, the framework did not employ preprocessing techniques such as CLAHE to deal with low-contrast images, nor did it use a hybrid Dice-Tversky loss to explicitly balance class distributions. In our approach, these gaps are filled by applying CLAHE preprocessing and a hybrid loss, which makes small or irregular lesions more distinguishable and supports robust segmentation across diverse cases.
Enshaei et al.15 introduced a deep learning approach for segmenting COVID-19 lung lesions in CT images. Alongside their model, they released a public dataset of 433 slices from 82 patients, each with expert annotations, which provided a valuable resource for further studies. Their method was tested both on this dataset and on external scans, showing good generalization across different CT machines. To address differences between training and testing data, they also applied an unsupervised image enhancement technique. The system achieved a Dice score of about 0.81, with high sensitivity (0.83) and very high specificity (0.99), confirming its effectiveness in detecting lesions. Despite these strong results, the framework still depended heavily on domain adaptation and was limited by the relatively small dataset, which may reduce its ability to capture fine or low-contrast lesion details. In our work, we approach this limitation differently by applying CLAHE preprocessing and a hybrid Dice-Tversky loss, which directly improves lesion visibility and balances the segmentation of small or subtle regions, without the need for extra adaptation steps.
Furthermore, Zhang et al.16 proposed MSDC-Net, a multiscale dilated convolutional network designed to overcome two main obstacles in COVID-19 lesion segmentation: the variability in lesion sizes and the weak contrast between infected and healthy lung tissues. The network integrates a Multiscale Feature Capture Block (MSFCB) to extract lesion features across multiple resolutions and a Multilevel Feature Aggregate (MLFA) module to minimize information loss during down-sampling. When evaluated on the publicly available COVID-19 CT Segmentation dataset, the model achieved promising results, reporting a Dice score of 82.4%, sensitivity of 81.1%, and mean IoU of 78.2%. These outcomes highlight the effectiveness of combining multiscale and multilevel feature extraction for more precise lesion boundary detection. However, their framework mainly emphasizes architectural design while placing less focus on preprocessing strategies or loss functions that directly tackle issues such as low-contrast regions or imbalanced lesion distributions. In contrast, our study complements this gap by incorporating CLAHE preprocessing to enhance lesion visibility and a hybrid Dice-Tversky loss, which jointly improve segmentation consistency across both small and large lesion regions.
Saha et al.17 developed ADU-Net, an attention-based dense U-Net enhanced with deep supervision. In their framework, dense blocks were employed to improve gradient flow, attention gates were used to emphasize informative regions, and deep supervision aggregated outputs from different resolution levels to refine segmentation performance. This architecture demonstrated the potential of combining dense connectivity with attention mechanisms for lesion detection. Nevertheless, ADU-Net does not incorporate contrast enhancement methods such as CLAHE, relies on standard loss functions rather than a hybrid approach like Dice-Tversky, and applies only limited data augmentation. These limitations restrict its ability to address low-contrast lesions and class imbalance systematically. By contrast, our study introduces a more streamlined and reproducible pipeline that integrates CLAHE preprocessing for local contrast enhancement, a hybrid Dice-Tversky loss to manage small or imbalanced lesion regions better, and an extensive set of augmentation techniques. This combination ensures robustness without relying on the added complexity of dense blocks and deep supervision.
Additionally, Ilhan et al.18 explored the impact of preprocessing on segmentation performance by proposing a framework that combined histogram-based non-parametric region localization with enhancement (LE) techniques and the classical U-Net architecture. The LE method enhanced the visibility of infected regions before segmentation, allowing the network to better distinguish between healthy and diseased tissues. Their system achieved strong results, reporting an accuracy of 97.75%, a Dice score of 0.85, and a Jaccard index of 0.74, which marked a 0.21 improvement in Dice score compared to the standard U-Net. These outcomes highlight the essential contribution of preprocessing to boosting segmentation accuracy in COVID-19 CT scans. However, while their approach demonstrated the importance of enhancement, it relied on a relatively narrow preprocessing strategy (histogram-based LE) and did not incorporate advanced loss functions or comprehensive augmentation schemes to address class imbalance and lesion variability. In our work, we extend this direction by using CLAHE preprocessing, which provides more localized contrast enhancement, together with a hybrid Dice-Tversky loss to handle imbalanced and small lesion regions, supported by an extensive augmentation pipeline. This broader integration results in a more systematic and generalizable framework.
Alshomrani et al.19 proposed SAA-UNet. This hybrid framework integrates the strengths of Spatial Attention U-Net (SA-UNet) and Attention U-Net (Att-UNet) to boost the accuracy of infection segmentation in CT scans. By leveraging spatial attention modules together with attention gates, the model enhances the focus on lesion-relevant regions while suppressing irrelevant background. The framework was evaluated on multiple datasets, including MedSeg, Zenodo 20P, and Radiopaedia 9P, where it achieved a Dice similarity coefficient as high as 0.94 and a classification accuracy exceeding 97% for binary segmentation. Furthermore, the model demonstrated competitive performance in multi-class segmentation, emphasizing the utility of attention-based mechanisms in handling diverse COVID-19 lesion patterns. Despite these promising outcomes, the architecture relies on a relatively complex combination of dual attention modules, which may increase computational cost and complicate reproducibility in practical scenarios. In contrast, our study adopts a lighter 2D Attention U-Net, strengthened with CLAHE preprocessing to improve local contrast and a hybrid Dice-Tversky loss to better address small or imbalanced lesions. This design achieves robust performance while maintaining simplicity and reproducibility, making it more adaptable to real-world clinical settings.
Geng et al.20 addressed two key limitations in medical image segmentation: the restricted ability of CNN-based models to capture long-range dependencies and the difficulty of transformer-based models in preserving fine lesion boundaries. To overcome these issues, they proposed STCNet, a hybrid network that alternates between Swin Transformer blocks and CNN layers within an encoder-decoder structure. A ReSwin transformer block was integrated to enhance global context modeling, while a skip connection cross-attention module ensured that boundary information was retained during feature propagation. Additionally, a scale-aware pyramid fusion module was employed to improve multi-scale feature integration. When evaluated on two benchmark COVID-19 CT segmentation datasets, STCNet achieved Dice scores of 79.92% and 82.78%, establishing state-of-the-art performance by effectively balancing global context extraction and fine detail preservation. Despite these advances, the model relies on a complex transformer-CNN hybrid architecture, which demands higher computational resources and may limit practical deployment in clinical settings. In contrast, our work emphasizes a lighter 2D Attention U-Net, enhanced with CLAHE preprocessing to improve contrast and a hybrid Dice-Tversky loss to handle class imbalance. This design allows us to maintain competitive segmentation performance while offering better reproducibility and reduced computational overhead.
Proposed methodology
In this section, we present the suggested approach for the automatic segmentation of COVID-19-impacted areas in lung CT scan images. The comprehensive block diagram illustrating the main stages of the proposed method is displayed in Fig. 1. The proposed methodology is structured as follows. First, to enhance the visibility of infected areas, Contrast Limited Adaptive Histogram Equalization (CLAHE) is applied to each lung region suspected of having COVID-19. The CT scans are then normalized to standardize the intensity levels and resized to a consistent resolution. To improve model generalization, extensive data augmentation is performed, which includes (i) spatial augmentations such as scaling, rotation, and elastic deformations, and (ii) color augmentations such as adjustments to brightness, saturation, and contrast.
Fig. 1.
Flowchart diagram of the proposed attention U-Net segmentation model.
For segmentation, we employ a U-Net architecture enhanced with an attention mechanism, which helps the model focus on clinically relevant infected regions. To address class imbalance and increase robustness, we adopt a hybrid loss function that combines Tversky and Dice losses. Finally, the methodology is validated using 5-fold cross-validation, providing a more reliable and accurate assessment of segmentation performance. Explainable AI (XAI) techniques, such as Grad-CAM, are used to offer visual explanations of the model’s decision-making process, thereby improving interpretability and clinical reliability. While each of these components has been explored individually in prior research, the novelty of our work lies in their systematic integration into a lightweight and reproducible pipeline tailored for COVID-19 CT lesion segmentation, which has demonstrated strong performance. The detailed steps of the methodology are explained in the following subsections.
Dataset
In this study, we utilized the publicly available COVID-19 CT scan lesion segmentation dataset21. The dataset was constructed by combining data from seven public COVID-19 collections. Out of these, three datasets provided lesion annotations, which were merged with the corresponding CT frames to yield 2,729 image-mask pairs. To ensure uniformity, all lesion types were mapped to white. Each CT slice is provided at a resolution of 512 × 512 pixels.
The dataset was split randomly into training and testing sets at the slice level, using an 80:20 ratio. This resulted in 2,183 images allocated for training and validation, and 546 images designated for testing. To improve reproducibility and provide a more reliable assessment of performance, we implemented 5-fold cross-validation on the training set. We recognize that slice-level splitting might lead to the inclusion of slices from the same patient in both the training and testing sets, potentially inflating our estimates of generalization performance. In our future work, we plan to use patient-level splitting and validate our findings on independent multi-center datasets to ensure a more realistic evaluation of the model’s robustness.
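The slice-level split and cross-validation protocol described above can be sketched in a few lines of NumPy. The random seed below is an illustrative assumption (the paper does not report one), but the 80:20 ratio recovers the reported 2,183/546 partition:

```python
import numpy as np

def split_and_fold(n_slices=2729, test_ratio=0.2, n_folds=5, seed=0):
    """Random 80:20 slice-level split, then k-fold CV on the training part.

    The seed is an illustrative assumption; the paper does not state one.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_slices)
    n_test = int(round(n_slices * test_ratio))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    # Partition the training indices into k disjoint validation folds
    folds = np.array_split(train_idx, n_folds)
    cv_splits = []
    for k in range(n_folds):
        val = folds[k]
        trn = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        cv_splits.append((trn, val))
    return train_idx, test_idx, cv_splits
```

A patient-level variant would permute patient identifiers instead of slice indices before assigning their slices to splits.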
Image preprocessing
A critical step in the analysis of medical images is image preprocessing, which involves applying a variety of methods to raw image data to improve its quality and prepare it for use as input in deep learning or machine learning models. By addressing issues like noise, inadequate contrast, and irregularities in size or intensity, preprocessing aims to increase the visual clarity, consistency, and informational value of images. When it comes to CT scans, preprocessing usually includes tasks like grayscale normalization, image denoising, scaling, and contrast enhancement techniques like CLAHE. By standardizing the data, highlighting relevant anatomical characteristics, and reducing distracting elements, these approaches help the model focus on clinically important patterns such as abnormalities or lesions. Image preprocessing is essential for increasing the resilience, accuracy, and effectiveness of automated diagnostic systems by reducing intra-class variability and improving significant features.
Histogram equalization is a process designed to distribute the gray levels throughout an image so that they are uniformly spread across their full range. This method effectively adjusts the brightness value of each pixel according to the image’s histogram, aiming to broaden the pixel value distribution to enhance perceptual information. This study utilizes CLAHE. Unlike traditional histogram equalization, which processes the entire image simultaneously, CLAHE functions locally by dividing the image into smaller segments known as tiles. CLAHE enhances the contrast in these small tiles and uses bilinear interpolation to blend the neighboring tiles, thereby smoothing out the artificial boundaries. Furthermore, a ‘clip limit’ factor is implemented to prevent excessive saturation in the image, particularly in regions that exhibit high peaks in the histogram of certain tiles because many pixels are clustered within the same gray level range22.
All CT images and their corresponding segmentation masks are resized to a uniform spatial resolution of 256 × 256 pixels, which ensures consistent input dimensions, compatibility with convolutional neural network architectures, and efficient batch processing during training. Furthermore, CT image pixel values are scaled to the [0,1] range by dividing by 255 to accomplish intensity normalization. By ensuring uniform data distribution across the input space, this normalization step enhances numerical stability during training and encourages quicker convergence. To distinguish between lesion and background areas, binary masks are transformed to an integer format and thresholded at 0.5. The original image and the CLAHE-enhanced image of a sample are shown in Fig. 2. Figure 3 displays the enhanced image and the corresponding segmentation mask of that sample.
Fig. 2.
Sample of original CT image and CLAHE enhanced image.
Fig. 3.
Sample of CLAHE-enhanced CT image and its segmentation mask.
Data augmentation
The effectiveness of deep learning neural networks is typically directly related to the volume of available data. Data augmentation is a method that involves generating new data from existing training data through various image processing techniques, including rotation, zooming, and flipping. The purpose is to enhance the model’s generalization performance while broadening the training dataset with additional examples. As a result, the model consistently encounters new and varied versions of the input data, allowing it to learn more robust features23.
In this research, several data augmentation techniques are utilized, such as horizontal flipping, shifting, scaling, rotation, random brightness, elastic transformation, grid distortion, and optical distortion. Horizontal flipping mirrors the image by reflecting it along the vertical axis. The shift and scale transformations alter the image’s position and size. “Shifting” adjusts the image content either up/down or left/right, while “scaling” modifies its dimensions. Rotation refers to turning the image by a specified angle. Random Brightness introduces variations in the image brightness to replicate different imaging conditions. Elastic Transformation implements smooth, random deformations to the image, resembling the natural elasticity found in body tissues. Grid distortion deforms the image by relocating points on a grid applied over it, leading to a non-linear alteration. Optical distortion assists the model in accommodating imperfections and subtle spatial variations that may arise due to the characteristics of CT scanner lenses. These augmentations aim to enhance the model’s ability to generalize and reduce the likelihood of overfitting throughout the training phase24. The total number of training images increased to 4366 after applying data augmentation techniques. Figure 4 illustrates some examples of data augmentation.
Fig. 4.
Some examples of data augmentation.
Segmentation using U-net with attention mechanism
Image segmentation is a key task in the analysis of medical images that entails dividing an image into separate regions based on certain features like intensity, texture, or anatomical structure. In this research, segmentation involves identifying COVID-19-related lesions against the background of lung tissue in CT images. This is accomplished through a pixel-wise classification approach, where each pixel in the original image is labelled to indicate whether it is part of the lesion (foreground) or the non-lesion (background) class. Precise segmentation is vital for enabling quantitative evaluation of disease extent, tracking progression, and aiding clinical decision-making, especially in situations where infection regions are small and irregularly shaped, as observed in COVID-19 cases.
The U-Net deep learning architecture gained prominence due to its efficiency and versatility in image segmentation. The original U-Net paper25 was developed for segmenting biomedical image datasets using a fully convolutional neural network (FCNN) architecture that can be scaled for tasks such as tumor detection, anomalous skin tissue detection, or object detection in RGB images. The U-Net architecture provided an ingenious solution for scaling up multi-label image segmentation. The letter “U” in U-Net comes from the fact that the encoder and decoder parts of the network follow somewhat similar pathways while contracting and expanding, forming a U-shaped symmetrical architecture across each side of the bottleneck.
Figure 5 depicts the network design. It features a contracting pathway on the left and an expansive pathway on the right. The contracting pathway adheres to the standard structure of a convolutional network. It includes the repeated use of two 3 × 3 convolutions (without padding), each followed by a ReLU (Rectified Linear Unit) activation, in addition to a 2 × 2 max pooling operation with a stride of 2 for down-sampling. With each down-sampling step, the number of feature channels is doubled. In the expanding path of the U-Net architecture, each phase starts by up-sampling the feature map, which is then processed through a 2 × 2 convolution that reduces the number of feature channels by half. This output is then merged with the appropriately cropped feature map from the corresponding stage in the contracting path, ensuring spatial information is retained. The combined features are processed through two 3 × 3 convolutional layers, each activated by a ReLU function. Cropping is essential to address the pixel loss that occurs at the borders during convolution. The final step involves a 1 × 1 convolution that transforms each 64-dimensional feature vector into the required number of classes. Altogether, the network includes 23 convolutional layers25.
Fig. 5.
The U-Net architecture25.
This study applies several architectural enhancements to the baseline Attention U-Net model to improve its performance in segmenting COVID-19 lesions from chest CT images. The model adopts a symmetrical encoder-decoder structure, where the encoder progressively extracts hierarchical features through a series of convolutional and down-sampling blocks. At the same time, the decoder reconstructs the spatial resolution using transposed convolution layers. Each convolutional block is implemented using separable convolutional layers to reduce computational complexity, followed by batch normalization and ReLU activation to promote stable and efficient training. To mitigate overfitting, particularly due to the limited size of medical imaging datasets, dropout layers with a dropout rate of 0.2 are incorporated within both the encoder and decoder paths.
Transpose convolutional layers are employed in the decoder to perform learnable up-sampling, allowing the model to more effectively restore the spatial resolution of the feature maps. The model incorporates attention gates at each skip connection to selectively highlight important features from the encoder prior to merging them with the upsampled features from the decoder. Each attention gate applies 1 × 1 convolutions to both the skip connection and the gating signal, aligns spatial dimensions using bilinear up-sampling, and generates an attention coefficient through element-wise addition, ReLU activation, and a final sigmoid operation. This coefficient is used to modulate the skip connection via element-wise multiplication. The bottleneck layer uses a deeper convolutional block to capture high-level semantic representations. The final output is generated through a 1 × 1 convolution with a sigmoid activation to produce a binary segmentation mask. The model is compiled with the Adam optimizer (learning rate = 1e-4) and trained using a custom loss function combining Dice loss and Tversky loss, with performance evaluated using the Dice coefficient metric. The proposed Attention U-Net architecture is shown in Fig. 6. In the following, we will describe the main components of our model: encoder, decoder, and attention mechanism.
Fig. 6.
The proposed Attention U-Net architecture.
Encoder
The encoder section of the proposed Attention U-Net architecture is designed to gradually extract both spatial and semantic features from the input CT images. It comprises four consecutive down-sampling stages, with each stage consisting of a convolutional block followed by a down-sampling layer. Each convolutional block utilizes a 3×3 Separable 2D Convolution, followed by batch normalization and a ReLU activation to enhance feature representation. To mitigate overfitting, a dropout layer with a rate of 0.2 is incorporated. Down-sampling is achieved through a 2D convolutional layer configured with a stride of 2 and “same” padding, which reduces the spatial resolution while simultaneously expanding the number of feature channels. The number of filters increases progressively at each encoding level–specifically 64, 128, 256, and 512–to enable the extraction of increasingly abstract features from the input image. The final output of the encoder is then forwarded to the bottleneck block for further transformation.
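To illustrate why separable convolutions reduce computational complexity, the parameter counts of a standard 3×3 convolution and its depthwise-separable counterpart can be compared at the encoder’s channel transitions. The helper functions below are illustrative (biases ignored), not taken from the authors’ implementation:

```python
# Parameter-count comparison: standard vs. depthwise-separable 3x3 convolution.
# Channel sizes follow the encoder's filter progression (64, 128, 256, 512).

def standard_conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """One depthwise k x k filter per input channel, then a 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out

for c_in, c_out in [(64, 128), (128, 256), (256, 512)]:
    std = standard_conv_params(3, c_in, c_out)
    sep = separable_conv_params(3, c_in, c_out)
    print(f"{c_in:>3} -> {c_out:<3}: standard {std:>9,}  "
          f"separable {sep:>7,}  ratio {std / sep:.1f}x")
```

For the 64-to-128 transition, for example, the standard convolution needs 73,728 weights while the separable version needs 8,768, roughly an 8x reduction.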
Decoder
The decoder module is designed to restore the spatial resolution of the feature maps while retaining the contextual information acquired during encoding. It comprises four up-sampling blocks arranged symmetrically to the encoder. In each block, an attention gate is first applied to the corresponding skip connection from the encoder, allowing only the most significant features to be transmitted forward. The decoder feature maps are upsampled using Transposed 2D Convolutional layers with a stride of 2 and “same” padding, enabling the model to learn the up-sampling process. These upsampled features are concatenated with the attention-filtered skip connections and then passed through a convolutional block consisting of a Separable 2D Convolution, Batch Normalization, a ReLU activation function, and Dropout. As the decoder reconstructs the spatial resolution, it systematically decreases the number of filters ([512, 256, 128, 64]), ultimately generating the final segmentation map.
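The spatial-resolution bookkeeping implied by the encoder and decoder descriptions can be sketched as follows: with “same” padding, each stride-2 convolution halves the spatial size and each stride-2 transposed convolution doubles it, so four symmetric stages restore the original resolution. The 256×256 input size is taken from the training configuration; the helper names are illustrative.

```python
# Resolution bookkeeping for four stride-2 down-sampling stages followed
# by four stride-2 transposed-convolution up-sampling stages.

def down(size, stride=2):
    # "same"-padded strided conv: out = ceil(in / stride)
    return -(-size // stride)

def up(size, stride=2):
    # "same"-padded transposed conv: out = in * stride
    return size * stride

size = 256
encoder_sizes = [size]
for _ in range(4):              # four encoder down-sampling stages
    size = down(size)
    encoder_sizes.append(size)

decoder_sizes = [size]
for _ in range(4):              # four symmetric decoder up-sampling stages
    size = up(size)
    decoder_sizes.append(size)

print(encoder_sizes)            # 256 -> 128 -> 64 -> 32 -> 16
print(decoder_sizes)            # 16 -> 32 -> 64 -> 128 -> 256
```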
Attention mechanism
The Attention Gate (AG) mechanism is integrated to improve skip connections by filtering out irrelevant feature responses and emphasizing important areas for segmentation. In U-Net, the standard architecture involves feature concatenation from the encoder to the decoder, where the fusion of high-resolution features from the encoder and upsampled features from the decoder enhances the localization of segmentation targets. Not all visual information in the encoder feature maps is relevant for accurate segmentation. Moreover, the semantic gap between the encoder and decoder can hinder the model’s performance. To address this, we use a gate attention module before concatenation to enhance the encoder features and minimize the semantic gap13.
As illustrated in Fig. 7, the AG takes two inputs: the encoder feature map x and the gating signal g from the corresponding decoder layer. Both inputs are first linearly transformed using 1×1 convolutions (W_x and W_g) to match dimensions and reduce channel depth. These transformed features are element-wise added and passed through a ReLU activation (σ₁), followed by another 1×1 convolution (ψ) and a sigmoid activation (σ₂) to produce the attention coefficient α. Since the attention coefficient is computed at the resolution of the gating signal, it is then upsampled using bilinear interpolation to match the spatial resolution of the encoder feature map before being applied. This coefficient acts as a soft mask and is applied to the encoder features through element-wise multiplication. The resulting attended feature map x̂ retains only the most relevant spatial information, which is then forwarded to the decoder via concatenation. This approach enables the network to concentrate on critical areas, enhancing segmentation accuracy, particularly in challenging medical imaging tasks like identifying COVID-19 lesions.
Fig. 7.
Attention gate block diagram26.
Given an encoder feature map x and gating signal g, the attention coefficient α is calculated as,

α = σ₂(ψ(σ₁(W_x x + W_g g + b)))    (1)

where W_x and W_g are 1×1 convolution kernels applied to the encoder features and gating signal, respectively, b is the bias term, σ₁ is the ReLU activation function, ψ is a 1×1 convolution applied after the ReLU to reduce the intermediate features to a scalar attention map, σ₂ is the sigmoid function used to squash the attention values between 0 and 1, and α is the attention coefficient map. The final output is obtained by multiplying the input feature map x by α,

x̂ = α ⊙ x    (2)
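The gating computation in Equations 1 and 2 can be sketched in NumPy by treating each 1×1 convolution as a per-pixel linear map over the channel dimension. Shapes, channel depths, and the random weights below are illustrative stand-ins, not the trained parameters:

```python
import numpy as np

# Attention-gate sketch: 1x1 convolutions reduce to matrix products over
# the channel axis, applied independently at every spatial location.

rng = np.random.default_rng(0)
H, W, Cx, Cg, Ci = 8, 8, 64, 64, 32     # spatial size and channel depths

x = rng.standard_normal((H, W, Cx))     # encoder feature map
g = rng.standard_normal((H, W, Cg))     # gating signal (already resampled to HxW)

W_x = rng.standard_normal((Cx, Ci))     # 1x1 conv on the encoder features
W_g = rng.standard_normal((Cg, Ci))     # 1x1 conv on the gating signal
b   = np.zeros(Ci)                      # shared bias term
psi = rng.standard_normal((Ci, 1))      # 1x1 conv down to a scalar map

relu    = lambda t: np.maximum(t, 0.0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

q     = relu(x @ W_x + g @ W_g + b)     # sigma_1(W_x x + W_g g + b)
alpha = sigmoid(q @ psi)[..., 0]        # attention coefficient map, shape (H, W)
x_att = x * alpha[..., None]            # element-wise gating of the encoder features

assert alpha.shape == (H, W)
```

Because alpha lies in (0, 1) after the sigmoid, multiplying it into x suppresses irrelevant encoder responses while preserving salient ones.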
Loss function
The loss function quantifies the difference between predicted outputs and actual values, representing the model’s average error during training and showing how closely its predictions match the actual outcomes. It also guides the adjustment of the model’s parameters to enhance accuracy and overall performance11.
In this work, we employed a hybrid loss function that combines Dice Loss and Tversky Loss to simultaneously optimize segmentation accuracy and sensitivity to minority lesion regions. The Dice Similarity Coefficient (DSC), as defined in Equation 3, measures the overlap between the predicted probability map p and the ground truth label g for class c. It is widely used in medical image segmentation tasks27,

DSC = (2 Σ_i p_ic g_ic + ε) / (Σ_i p_ic + Σ_i g_ic + ε)    (3)

where N represents the overall number of pixels (the sums run over i = 1, …, N) and c denotes the class. The term p_ic indicates the predicted probability for the i-th pixel belonging to class c, while g_ic represents the corresponding ground truth label. A small constant ε is included to avoid division by zero. The Dice Loss (DL) is defined as follows,

DL = 1 − DSC    (4)
While Dice Loss is effective for overlap optimization, it penalizes false positives (FP) and false negatives (FN) equally. This can be problematic in medical imaging, where minimizing FN is often more important for capturing all lesion areas. To address this limitation, we incorporate the Tversky Index (TI), a generalization of DSC that introduces tunable weights for FP and FN, as follows,

TI = (TP + ε) / (TP + α·FN + β·FP + ε)    (5)

where TP = Σ_i p_i g_i, FN = Σ_i (1 − p_i) g_i, FP = Σ_i p_i (1 − g_i), and α and β control the relative contribution of FN and FP, respectively. Assigning a higher weight to FN (α > β) improves recall and supports better lesion detection.

The final combined loss function used during training is a balanced sum of Dice Loss and Tversky Loss:

L = DL + (1 − TI)    (6)
This approach enables the model to learn precise segmentation boundaries while enhancing its sensitivity to less-represented lesion areas, resulting in strong and generalizable segmentation performance.
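The hybrid loss described above can be written as a short NumPy reference implementation of Equations 3 to 6. The equal 0.5/0.5 weighting of the two terms and the α = 0.7, β = 0.3 values are illustrative assumptions; the paper specifies a balanced sum without fixing exact coefficients:

```python
import numpy as np

# NumPy sketch of the hybrid Dice-Tversky loss (Eqs. 3-6). Weighting
# choices here are assumptions for illustration, not the paper's exact values.

def dice_loss(p, g, eps=1e-6):
    inter = np.sum(p * g)
    return 1.0 - (2.0 * inter + eps) / (np.sum(p) + np.sum(g) + eps)

def tversky_loss(p, g, alpha=0.7, beta=0.3, eps=1e-6):
    tp = np.sum(p * g)                 # soft true positives
    fn = np.sum((1.0 - p) * g)         # soft false negatives (weighted by alpha)
    fp = np.sum(p * (1.0 - g))         # soft false positives (weighted by beta)
    return 1.0 - (tp + eps) / (tp + alpha * fn + beta * fp + eps)

def hybrid_loss(p, g):
    # Balanced sum of the two terms (0.5/0.5 is an illustrative choice).
    return 0.5 * dice_loss(p, g) + 0.5 * tversky_loss(p, g)

g = np.array([[0, 1, 1], [0, 1, 0]], dtype=float)   # toy ground truth mask
p = np.array([[0, 0.9, 0.8], [0.1, 0.7, 0.1]])      # toy predicted probabilities
print(round(float(hybrid_loss(p, g)), 4))
```

A perfect prediction drives both terms to zero, while setting α above β makes missed lesion pixels costlier than spurious ones, which is the recall-oriented behavior motivated above.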
Results
This section presents and discusses experimental results evaluating the performance of the proposed Attention U-Net segmentation framework. The specifics of the COVID-19 CT scan lesion segmentation dataset are outlined above. The dataset was initially split into 80% for training and validation and 20% for final testing. Data augmentation techniques were applied to the entire 80% training-validation set before performing 5-fold cross-validation, and the same augmentation process was used identically across all folds to ensure consistency. In each fold, the model was trained on 64% and validated on 16% of the data. The model that achieved the highest validation Dice score was then evaluated on the independent 20% test set.
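The evaluation protocol above (an 80/20 hold-out split followed by 5-fold cross-validation inside the 80% portion, so each fold trains on 64% and validates on 16% of the data) can be sketched with index bookkeeping only. The slice count of 2729 comes from the dataset description; the random seed and helper names are illustrative:

```python
import numpy as np

# Index-level sketch of the 80/20 split plus 5-fold cross-validation.

rng = np.random.default_rng(42)
n = 2729                                  # total CT slices in the dataset
idx = rng.permutation(n)

n_test = int(round(0.2 * n))              # 20% held out for final testing
test_idx, trainval_idx = idx[:n_test], idx[n_test:]

folds = np.array_split(trainval_idx, 5)   # 5 validation folds within the 80%
for k in range(5):
    val_idx = folds[k]
    train_idx = np.concatenate([folds[j] for j in range(5) if j != k])
    print(f"fold {k + 1}: train {len(train_idx)}, "
          f"val {len(val_idx)}, test {len(test_idx)}")
```

Note that, as discussed in the Limitations section, this slice-level shuffling does not enforce patient-level separation between the splits.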
The model was set up using the Adam optimizer, featuring a learning rate of 1e-4. For enhanced segmentation performance, especially in dealing with imbalanced lesion areas, we adopted a custom combined loss function that incorporates both Dice Loss and Tversky Loss. The training spanned up to 50 epochs with a batch size of 16. These hyperparameters were chosen based on initial tests and align with standard practices observed in similar deep learning research. A summary of the hyperparameters used during training can be found in Table 1. The framework was trained from scratch and implemented using Python 3.11.12, TensorFlow 2.18.0, Keras 3.8.0, and Google Colaboratory (Colab), utilizing an NVIDIA Tesla T4 GPU (16 GB).
Table 1.
Parameter settings for the training proposed model.
| Parameter name | Parameter value |
|---|---|
| Number of parameters | 11,883,022 |
| Optimizer | Adam |
| Learning rate | 1e-4 |
| Batch size | 16 |
| Epoch | 50 |
| Image size | 256 × 256 |
Evaluation metrics
Four commonly used performance metrics in medical image segmentation are the Dice coefficient score, the Intersection over Union (IoU) score, sensitivity, and specificity. We also computed overall accuracy and precision to further assess the efficacy of the proposed model. These evaluation metrics were calculated from the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts19.
Dice coefficient and IoU are the most widely used metrics for assessing segmentation effectiveness. The Dice coefficient quantifies the overlap between the predicted segmentation and the ground truth by dividing twice the area of their intersection by the total pixel count in both masks. It is calculated as:

Dice = 2|G ∩ P| / (|G| + |P|)    (7)

where G is the ground truth mask and P is the predicted mask.
Intersection over Union score
Also called the Jaccard Index, the IoU is the area of overlap between the predicted segmentation and the ground truth divided by the area of their union:

IoU = |G ∩ P| / |G ∪ P|    (8)
The Dice and IoU scores both measure the overlap between the ground truth and the model’s predicted class, and the two are always positively correlated (they are related by IoU = Dice / (2 − Dice)). The Dice score is closer to the average performance of the segmentation model; in contrast, the IoU score reflects the worst-case performance of the segmentation model by penalizing misclassifications more heavily.
Accuracy
It is a straightforward and commonly applied metric in performance evaluation. It represents the likelihood that a randomly selected instance, whether positive or negative, is classified correctly. In diagnostic testing, it reflects the probability of making a correct assessment.
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (9)
Precision
It refers to the ability to accurately identify positive instances among all instances predicted as positive. It is calculated as the ratio of true positive predictions to the total number of predicted positive cases:
Precision = TP / (TP + FP)    (10)
Sensitivity
Also known as Recall or the true positive rate, it represents a model’s capability to correctly identify all actual positive cases. It can be mathematically defined as,

Sensitivity = TP / (TP + FN)    (11)
Specificity
Also known as the true negative rate, it refers to the model’s ability to correctly identify negative cases:

Specificity = TN / (TN + FP)    (12)
Discussion
The performance results obtained from the 5-fold cross-validation provide compelling evidence of the proposed Attention U-Net model’s effectiveness and consistency. By evaluating the model across different data splits, we ensured that the observed performance metrics reflect its true generalization ability rather than overfitting to specific samples. Table 2 presents the detailed results for each fold, along with the average metrics across all folds. The model consistently delivered high performance, with an average Dice Score of 0.8297 and IoU of 0.7089, demonstrating strong segmentation accuracy even in the presence of complex and irregular COVID-19 lesions. The high average accuracy of 99.50%, along with a specificity of 99.75% and precision of 84.49%, further underscores the model’s reliability in distinguishing lesion areas from healthy lung tissue. These results suggest that combining attention mechanisms with a hybrid loss function significantly enhances the model’s precision and robustness in clinical segmentation tasks.
Table 2.
Performance metrics for each fold and the mean across 5-fold cross-validation.
| Fold | Dice Score | IoU | Accuracy | Precision | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| 1 | 0.8317 | 0.7119 | 0.9949 | 0.8299 | 0.8574 | 0.9971 |
| 2 | 0.8314 | 0.7115 | 0.9950 | 0.8507 | 0.8380 | 0.9976 |
| 3 | 0.8262 | 0.7038 | 0.9949 | 0.8456 | 0.8404 | 0.9975 |
| 4 | 0.8295 | 0.7087 | 0.9950 | 0.8551 | 0.8339 | 0.9977 |
| 5 | 0.8297 | 0.7089 | 0.9949 | 0.8432 | 0.8445 | 0.9974 |
| Mean | 0.8297 | 0.7089 | 0.9950 | 0.8449 | 0.8428 | 0.9975 |
Figure 8 presents the training and validation loss curves and Dice score curves for the proposed model over the 50 training epochs. The left plot shows the training and validation loss, both of which decrease rapidly during the initial epochs, indicating effective learning. The training loss continues to decline gradually, while the validation loss plateaus after about 10 epochs and then stabilizes, suggesting that the model does not overfit and maintains good generalization to unseen data. The right plot presents the Dice coefficient for both training and validation sets. Here, the Dice scores increase sharply in the early epochs, with both curves reaching a high value and then stabilizing. The training Dice score continues to improve slightly, while the validation Dice score remains consistently high, demonstrating strong segmentation performance and stable learning throughout the training process.
Fig. 8.
Training and validation loss and dice score curves.
Sample segmentation results are visualized in Fig. 9, which shows the CT scan, the ground truth mask, the predicted mask, and the overlay of the predicted mask on the CT scan. The overlay highlights the predicted lesion in red, offering a clear comparison between the model’s predictions and the actual lesions. The predicted masks closely align with the ground truth masks, indicating accurate lesion segmentation by the model. This visual assessment confirms the model’s capacity to capture intricate lesion patterns effectively and is crucial for evaluating the accuracy and quality of the segmentation model. To demonstrate the performance of the proposed Attention U-Net segmentation framework relative to existing state-of-the-art methods across various assessment metrics, a comprehensive comparison with similar works in the literature has been performed. The results of the quantitative comparison are presented in Table 3.
Fig. 9.
Comparison of CT scan images, ground truth mask, predicted mask, and overlay with predicted lesions.
Table 3.
Quantitative performance comparison of the proposed framework with those presented in previous works.
Model explainability and visualization
Explainable Artificial Intelligence (XAI) is an emerging field that encompasses various processes and methods to help human users understand and trust the outcomes of machine learning algorithms. Recent research in medical imaging analysis is increasingly aimed at generating clear, understandable explanations for how a model arrives at specific individual predictions. These explanations often take the form of visualizations that highlight the image regions most influential in the model’s predictions28. While XAI methods have achieved a certain level of success, evaluating and quantifying their effectiveness is still a challenge. Among these methods, Grad-CAM29 is a visualization technique that is useful for understanding how a CNN was driven to make a classification decision. Grad-CAM uses the gradients of a target with respect to the last convolutional layer to provide explanations by highlighting important regions of an image that impact the prediction. Its output is a set of heatmaps representing class activations on the input images, with each activation map associated with a specific output class. The class activation map (CAM) is generated by calculating a linear combination of activations, weighted by the weights of the corresponding output for the observed class, with pixel-level importance values reflecting their contribution to the model’s final decision28.
In this study, we applied Grad-CAM to visualize the regions within CT images that significantly influenced the Attention U-Net model during COVID-19 lesion segmentation. These visualizations enhance the interpretability of the model and allow for a qualitative assessment of its focus on clinically relevant areas of infection. Figure 10 illustrates an example of the model’s interpretability using the Grad-CAM visualization. The first image shows the original CT slice, the second presents the corresponding ground truth mask annotated by experts, and the third displays the model’s predicted segmentation. The fourth panel presents the Grad-CAM heatmap, where red regions indicate areas of high model activation, suggesting strong confidence in lesion detection, while blue areas correspond to lower activation. This visualization confirms that the model primarily focuses on the infected lung regions when identifying COVID-19 lesions. It provides an interpretable explanation of the prediction process, which helps enhance clinical trust in the model’s decision-making.
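The Grad-CAM computation described above reduces to a few array operations: the channel weights are the spatial average of the gradients at the last convolutional layer, and the heatmap is the ReLU of the weighted sum of the activation maps. In the NumPy sketch below, the activations and gradients are random stand-ins for values a deep learning framework would return:

```python
import numpy as np

# Generic Grad-CAM sketch: weights = global-average-pooled gradients,
# heatmap = ReLU of the weighted sum of activation channels.

rng = np.random.default_rng(1)
H, W, C = 16, 16, 64

activations = rng.standard_normal((H, W, C))   # last conv-layer output A^k
gradients   = rng.standard_normal((H, W, C))   # d(target score)/dA^k

weights = gradients.mean(axis=(0, 1))          # one importance weight per channel
cam = np.maximum(activations @ weights, 0.0)   # ReLU(sum_k w_k * A^k)
cam = cam / (cam.max() + 1e-8)                 # normalize heatmap to [0, 1]
```

In practice the resulting low-resolution map is upsampled to the input size and rendered on a red-blue color scale over the CT slice, as in Fig. 10.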
Fig. 10.
Visualization of CT image, ground truth, predicted segmentation, and Grad-CAM output. The Grad-CAM heatmap (red-blue scale) highlights regions with high model activation corresponding to COVID-19 lesions.
Limitations
This study has some notable limitations that should be acknowledged. First, the dataset comprises 2729 CT slices sourced from seven publicly available collections, with only three of these featuring lesion annotations and exclusively representing COVID-19 pneumonia. This specificity may restrict the model’s applicability to other pneumonia types or lung conditions. Moreover, while attention gates and the hybrid Dice-Tversky loss enhance segmentation accuracy, they may also make the model overly sensitive to minor or ambiguous abnormalities, potentially increasing false positives for other conditions. Also, the approach of splitting slices at the level of individual scans allowed slices from the same patient to be included in both training and testing sets, potentially leading to an overestimation of performance metrics. Additionally, the absence of foundational models such as the Segment Anything Model (SAM) limited our capacity for benchmarking, and the hyperparameters were not fine-tuned using a systematic methodology. Employing automated hyperparameter optimization tools, like Optuna, could aid in determining the most effective configurations.
Another limitation is that this research relied on a single public dataset, with preprocessing and augmentation techniques designed specifically for its characteristics. Future studies could benefit greatly from validation using diverse, multi-center datasets that include various scanner types and imaging protocols. We also did not perform an ablation study to assess the individual effects of components such as attention gates, CLAHE preprocessing, and the Dice-Tversky loss. Future research could incorporate ablation experiments to better understand each module’s impact on segmentation performance. In summary, these limitations highlight the need for breaking down patient-level data, validating findings across various centers, and conducting comprehensive ablation studies.
Conclusion
This study introduced a deep learning framework designed for the automated segmentation of COVID-19-infected lesions in lung CT scans. The proposed pipeline combined CLAHE for preprocessing, extensive data augmentation techniques, and an Attention U-Net architecture optimized with a hybrid Dice-Tversky loss function. Evaluated using a publicly available dataset and 5-fold cross-validation, the model achieved an average accuracy of 99.74%, a Dice Score of 0.83, and an IoU of 0.71. These results demonstrate the model’s promising performance in accurately delineating infected lung regions. Additionally, XAI analysis was conducted using Grad-CAM to visualize the model’s attention and highlight the most influential regions within the CT images. The innovation of this work lies in the systematic integration of established techniques into a coherent, lightweight, and reproducible pipeline. This approach enhances segmentation accuracy while remaining practical for both clinical and research applications.
Future work will expand the proposed framework to incorporate multi-class segmentation and will carry out external validation on independent cohorts that include non-COVID-19 pneumonia and other pulmonary abnormalities. We will evaluate foundation models such as the Segment Anything Model (SAM) for benchmarking, as well as perform systematic hyperparameter optimization and validation on independent, multi-center datasets to enhance accuracy, robustness, and clinical applicability.
Author contributions
All authors contributed to the study conception and design.
Funding
The authors extend their appreciation to Prince Sattam bin Abdulaziz University for funding this research work through the project number (PSAU/2025/01/36773).
Data availability
All data generated or analyzed during this study are included in this published article.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Samy Bakheet, Email: s.bakheet@psau.edu.sa.
Rehab Youssef, Email: rehab.youssef@fci.sohag.edu.eg.
References
- 1.Paules, C. I., Marston, H. D. & Fauci, A. S. Coronavirus infections-more than just the common cold. JAMA323, 707–708. 10.1001/jama.2020.0757 (2020). [DOI] [PubMed] [Google Scholar]
- 2.World Health Organization. Coronavirus. https://www.who.int/health-topics/coronavirus. Accessed September 30th 2025.
- 3.Bhatele, K. R. et al. Covid-19 detection: A systematic review of machine and deep learning-based approaches utilizing chest X-rays and CT scans. Cogn. Comput.16, 1889–1926. 10.1007/s12559-022-10076-6 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.World Health Organization. WHO Director-General’s opening remarks at the media briefing on COVID-19 - 11 March 2020. https://www.who.int/director-general/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid-19---11-march-2020 (2020). Accessed September 30th 2025.
- 5.World Health Organization. WHO Coronavirus (COVID-19) Dashboard. https://covid19.who.int/ (2021). Accessed September 30th 2025.
- 6.Xie, X. et al. Chest ct for typical coronavirus disease 2019 (covid-19) pneumonia: Relationship to negative rt-pcr testing. Radiology296, E41–E45. 10.1148/radiol.2020200343 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Mozaffari, J., Amirkhani, A. & Shokouhi, S. B. A survey on deep learning models for detection of covid-19. Neural Comput. Appl.35, 16945–16973. 10.1007/s00521-023-08683-x (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Baghdadi, N. A. et al. An automated diagnosis and classification of covid-19 from chest ct images using a transfer learning-based convolutional neural network. Comput. Biol. Med.144, 105383. 10.1016/j.compbiomed.2022.105383 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Ahmed, I., Chehri, A. & Jeon, G. A sustainable deep learning-based framework for automated segmentation of covid-19 infected regions: Using u-net with an attention mechanism and boundary loss function. Electronics11, 2296. 10.3390/electronics11152296 (2022). [Google Scholar]
- 10.Shamshirband, S., Fathi, M., Dehzangi, A., Chronopoulos, A. T. & Alinejad-Rokny, H. A review on deep learning approaches in healthcare systems: Taxonomies, challenges, and open issues. J. Biomed. Inform.113, 103627. 10.1016/j.jbi.2020.103627 (2021). [DOI] [PubMed] [Google Scholar]
- 11.Zhang, J. et al. Recent developments in segmentation of covid-19 ct images using deep-learning: An overview of models, techniques and challenges. Biomed. Signal Process. Control91, 105970. 10.1016/j.bspc.2024.105970 (2024). [Google Scholar]
- 12.Sistaninejhad, B., Rasi, H. & Nayeri, P. A review paper about deep learning for medical image analysis. Comput. Math. Methods Med.2023, 7091301. 10.1155/2023/7091301 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Zhao, X. et al. D2a u-net: Automatic segmentation of covid-19 ct slices based on dual attention and hybrid dilated convolution. Comput. Biol. Med.135, 104526. 10.1016/j.compbiomed.2021.104526 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Chen, C. et al. An effective deep neural network for lung lesions segmentation from covid-19 ct images. IEEE Trans. Industr. Inf.17, 6528–6538. 10.1109/tii.2021.3059023 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Enshaei, N. et al. Covid-rate: An automated framework for segmentation of covid-19 lesions from chest ct images. Sci. Rep.12, 3212. 10.1038/s41598-022-06854-9 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Zhang, J., Ding, X., Hu, D. & Jiang, Y. Semantic segmentation of covid-19 lesions with a multiscale dilated convolutional network. Sci. Rep.12, 1847. 10.1038/s41598-022-05527-x (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Saha, S., Dutta, S., Goswami, B. & Nandi, D. Adu-net: An attention dense u-net based deep supervised dnn for automated lesion segmentation of covid-19 from chest ct images. Biomed. Signal Process. Control85, 104974. 10.1016/j.bspc.2023.104974 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Ilhan, A., Alpan, K., Sekeroglu, B. & Abiyev, R. Covid-19 lung ct image segmentation using localization and enhancement methods with u-net. Procedia Comput. Sci.218, 1660–1667. 10.1016/j.procs.2023.01.144 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Alshomrani, S., Arif, M. & Al Ghamdi, M. A. Saa-unet: Spatial attention and attention gate unet for covid-19 pneumonia segmentation from computed tomography. Diagnostics13, 1658. 10.3390/diagnostics13091658 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Geng, P. et al. Stcnet: Alternating cnn and improved transformer network for covid-19 ct image segmentation. Biomed. Signal Process. Control93, 106205. 10.1016/j.bspc.2024.106205 (2024). [Google Scholar]
- 21.Maedemaftouni, A. COVID-19 CT scan lesion segmentation dataset. https://www.kaggle.com/datasets/maedemaftouni/covid19-ct-scan-lesion-segmentation-dataset (2020). Accessed: April 7th 2025.
- 22.Bakheet, S., Al-Hamadi, A. & Youssef, R. A fingerprint-based verification framework using Harris and Surf feature detection algorithms. Appl. Sci.12, 2028. 10.3390/app12042028 (2022). [Google Scholar]
- 23.Acar, E., Şahin, E. & Yılmaz, İ. Improving effectiveness of different deep learning-based models for detecting covid-19 from computed tomography (ct) images. Neural Comput. Appl.33, 17589–17609. 10.1007/s00521-021-06344-5 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Buslaev, A. et al. Albumentations: Fast and flexible image augmentations. Information11, 125. 10.3390/info11020125 (2020). [Google Scholar]
- 25.Ronneberger, O., Fischer, P. & Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18, 234–241. 10.48550/arXiv.1505.04597 (Springer, 2015).
- 26.Oktay, O. et al. Attention u-net: Learning where to look for the pancreas. 10.48550/arXiv.1804.03999 (2018).
- 27.Zhou, T., Canu, S. & Ruan, S. Automatic covid-19 ct segmentation using u-net integrated spatial and channel attention mechanism. Int. J. Imaging Syst. Technol.31, 16–27. 10.1002/ima.22527 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Elbouknify, I., Bouhoute, A., Fardousse, K., Berrada, I. & Badri, A. Ct-xcov: a ct-scan based explainable framework for covid-19 diagnosis. In 2023 10th International Conference on Wireless Networks and Mobile Communications (WINCOM), 1–8, 10.48550/arXiv.2311.14462 (IEEE, 2023).
- 29.Selvaraju, R. R. et al. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, 618–626. 10.1109/ICCV.2017.74 (2017).
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Data Availability Statement
All data generated or analyzed during this study are included in this published article.