Scientific Reports. 2025 Sep 25;15:32777. doi: 10.1038/s41598-025-15470-2

Adjacent-differential network with shallow attention for polyp segmentation in colonoscopy images

Keli Hu 1,2,3,4, Chen Wang 2, Hancan Zhu 2, Liping Zhao 2,4, Chao Fu 5, Weijun Yang 3, Wensheng Pan 1
PMCID: PMC12464208  PMID: 40998913

Abstract

Colonoscopy is the gold standard for the examination and detection of polyps, with over 90% of polyps potentially progressing into colorectal cancer. Accurate polyp segmentation plays a pivotal role in the early diagnosis and treatment of colorectal cancer. Most existing methods ignore the inconsistent colors of colonoscopy images and fuse features of different layers directly using element-wise addition or concatenation. The former operation may lead to overfitting, while the latter can weaken the complementarity between different levels. To address these challenges, we propose a deep adjacent-differential network with shallow attention for polyp segmentation (ADSANet). First, we develop a color exchange strategy based on uncorrelated and specular region suppression to decouple image content from color. This strategy allows the model to prioritize the target appearance, reducing the risk of overfitting to color features. To maximize the synergy between layers, we propose an adjacent-differential feature fusion module (ADFM) and then employ a shallow attention module (SAM) for further feature fusion. Specifically, the ADFM generates differential features between adjacent layers and combines them with the features at the corresponding level in the encoder, as well as the adjacent decoder features. We sequentially apply the ADFM at different scale levels for feature decoding, and the final prediction is computed by fusing the outputs of the sequentially connected ADFMs and the shallow attention module. Extensive experiments on five datasets show that ADSANet outperforms most state-of-the-art convolutional neural network (CNN)-based methods. In mDice, ADSANet yields significant gains of 18.5%, 3.5%, 3.8%, 4.0%, and 1.7% over the classical method PraNet on ETIS, ClinicDB, Endoscene, ColonDB, and Kvasir-SEG, respectively, which demonstrates the effectiveness of the proposed color exchange and adjacent-differential feature fusion scheme for more accurate polyp segmentation.

Keywords: Polyp segmentation, Color exchange, Adjacent-differential feature fusion, Shallow attention

Subject terms: Colonoscopy, Classification and taxonomy, Image processing, Machine learning

Introduction

Colorectal cancer (CRC) has emerged as a significant public health concern in recent years, currently ranking as the third leading cause of cancer-related deaths worldwide1. Colorectal polyps are abnormal growths of tissue in the colon, and if left untreated, they can progress into cancer. Colonoscopy is widely regarded as the gold standard for gastrointestinal examinations2. Early detection of polyps during colonoscopy screening, followed by appropriate treatment such as removal, is an effective means of preventing CRC. Studies have indicated that a 1% increase in adenoma (a type of polyp) detection rate can reduce the risk of CRC by 3%3. However, current colonoscopy screening heavily relies on medical professionals, and its effectiveness is significantly influenced by their expertise, as well as factors such as patients’ intestinal conditions and the quality of colonoscopy images. Despite significant advances in colonoscopy technology, there is still a 22–28% chance of polyps being missed, and a 20–24% chance of adenomas being missed during a colonoscopy screening due to human errors, posing a substantial risk to patient health4. Moreover, detecting polyps but failing to completely remove the polyp tissue can lead to post-colonoscopy cancer5. Introducing automatic detection of polyp areas in colonoscopy images can significantly improve the quality of colonoscopy screening and reduce the risk of missed polyps or incomplete judgments due to factors such as fatigue or lack of experience6,7. This can greatly enhance the effectiveness of CRC prevention and ultimately improve public health outcomes.

Polyp segmentation, however, still faces three major challenges. Firstly, despite being abnormal tissue, polyps are very similar in appearance to normal tissue of the intestinal wall. For instance, the edges of flat polyps are smooth, and the morphologies of the intestinal wall folds and mucosal crypts are similar to those of polyps. The similarity between polyps and normal tissue makes it difficult to automatically extract the polyp boundaries. Secondly, there is a significant variation in the imaging of different polyps, primarily due to differences in size, texture, orientation, and color. Furthermore, significant differences may exist within a single polyp, such as abrupt changes in color. Finally, the presence of bubbles and fecal particles in the intestine, as well as specular reflections caused by the endoscopy light source, can all affect the accuracy of polyp segmentation.

Before the widespread adoption of deep learning in visual analysis tasks, traditional machine learning techniques were employed for polyp region extraction8. These methods typically involve a two-stage process. The first stage entails creating handcrafted features, primarily based on the color differences, texture, shape, and edges that distinguish polyps from normal tissue. The second stage involves training a traditional machine learning model for polyp region extraction. Although traditional machine learning methods have contributed to some degree to automating colonoscopy screening, their processing efficiency falls significantly short of actual clinical requirements. Furthermore, the design of handcrafted features heavily relies on expert experience. Given the substantial influence of the intestinal environment on colonoscopy images and the high intra-class variability and low inter-class variability in polyp imaging, it is challenging for handcrafted features and traditional machine learning models to effectively capture the characteristics of colonoscopy images.

With the development of deep learning, particularly the significant breakthroughs of convolutional neural networks (CNNs) in image recognition, object detection, and segmentation, researchers have attempted to apply these high-performing algorithms to colonoscopy image analysis. Classic CNNs for image segmentation such as FCN9, UNet10, and SegNet11 have been introduced into the polyp segmentation domain. Compared to traditional machine learning methods, these CNN methods have significantly improved the accuracy of automatic polyp region extraction, making automated polyp screening possible. The greatest advantage of CNNs over traditional machine learning methods is that their network size and structure are easily scalable and modifiable. They can autonomously learn the key features of a given task from the data, effectively avoiding the potential risks of manually designed features while better fitting large-scale data and avoiding underfitting. However, researchers have found that existing classic segmentation networks still cannot meet the high-reliability and high-accuracy demands of polyp segmentation, and segmentation networks tailored to polyp region extraction have therefore been proposed. These networks are based on the encoder-decoder framework, where features are gradually extracted in the encoder stage and segmentation masks are generated progressively in the decoder stage. The success of the UNet structure in medical image segmentation tasks such as CT and MR has demonstrated its superior performance in solving image segmentation problems. In particular, for polyp segmentation tasks, most networks adopt the U-shaped structure and incorporate various mechanisms, such as edge feature constraints12,13, attention mechanisms14, global feature guidance15, and hierarchical feature strengthening, which enhance polyp segmentation performance. Despite the improved accuracy of computer-aided automatic polyp segmentation, there is still a significant gap between current performance and clinical requirements.

Deep learning networks have a strong capability to fit data features, but if the training data lacks sufficient diversity, the network may become constrained by the existing training data and generalize poorly to data with significantly different feature distributions. As illustrated in Fig. 1, colonoscopy images acquired under different scenarios (e.g., from different patients or devices) can vary considerably in color distribution. For colonoscopy polyp segmentation, if the training data does not cover diverse color distributions or lacks certain color distributions, the performance of the polyp segmentation network will be influenced by the distribution of the training data, leading to poor adaptation to images with different feature distributions. In terms of representing complementary feature information across different layers, MSNet16 introduces multi-scale subtraction to address the issue of diminishing complementary information between layers, achieving good polyp segmentation results. Subsequently, multi-scale subtraction was applied to the differential fusion of deep network features in the final stages17, further demonstrating its effectiveness in polyp segmentation. The subtraction operation was also introduced into the automatic segmentation of hemorrhages18, leading to improved performance. However, existing schemes that utilize multi-scale subtraction fail to effectively integrate the raw information from each layer with intermediate semantic features, which weakens the network's feature representation ability and limits its performance in polyp segmentation tasks. To enhance the model's adaptability to variations in color distribution, SANet19 proposed a color exchange strategy. However, this strategy did not fully account for the specific characteristics of colonoscopy images; instead, it directly incorporated the color information of the entire image into the statistics, introducing influencing factors such as specular reflections and peripheral black areas. This approach may limit the network's ability to effectively adapt to diverse color distributions.

Fig. 1. Colonoscopy images with different color distributions.

To further address the challenge of effectively integrating differential features to enhance the model’s ability to learn complex features, thereby further improving the accuracy of computer-aided colonoscopy polyp segmentation, we propose a novel deep adjacent-differential network with shallow attention for polyp segmentation (ADSANet). In addition, to mitigate overfitting to specific data distributions, we introduce a color exchange strategy based on uncorrelated and specular region suppression, aiming to better decouple image content from color information. This strategy is applied during model training and involves transforming the color distribution of the source image to match that of the target image. To avoid image distortion caused by color exchange and unnecessary noise that may interfere with model performance, we suppress the non-colonic wall areas and specular regions, and then perform color distribution transformation. In ADSANet, we introduce the differential features between adjacent layers to enable the network to fully capture inter-layer feature differences. Furthermore, we retain the original features of the corresponding layer to ensure that the network can capture features that are truly suitable for colonoscopy polyp segmentation. A shallow attention mechanism is introduced to improve the network’s adaptability to complex data distributions. Our contributions in this study are as follows.

  • To enhance the authenticity of colonoscopy images after color exchange, we propose a color exchange strategy that suppresses uncorrelated areas and specular regions, allowing the model to better avoid interference from color distribution.

  • We introduce the ADFM, which fully utilizes the complementary information between layers. ADFM integrates adjacent-differential features, differential feature variances, and current-layer features to comprehensively enhance network features, forming the basis for accurate polyp segmentation.

  • SAM is introduced to efficiently fuse features across different levels. By integrating semantic features obtained through the ADFM cascade, it further strengthens the network’s ability to handle challenges like significant polyp scale variations. Additionally, it enhances the network’s feature representation capability, resulting in improved polyp segmentation accuracy.

The structure of this paper is as follows: Section “Related works” offers a brief review of prior work in polyp segmentation. Section “Method” provides a comprehensive introduction to ADSANet, including the color exchange strategy. Section “Experiments” presents extensive experiments that showcase the superior performance of the ADSANet algorithm. Finally, section “Conclusion” summarizes this study and suggests potential avenues for future research.

Related works

Early methods for polyp extraction

In addressing the challenge of polyp region extraction, traditional methods primarily rely on hand-engineered approaches. These methods entail the manual design of features and the use of standard classifiers to distinguish polyps from the surrounding intestinal tissue. Bernal et al.20 proposed a segmentation-description-recognition mechanism, employing the watershed method for region segmentation. They developed the Sector Accumulation-Depth of Valleys Accumulation descriptor, which enabled efficient classification and initial polyp extraction, albeit with some rough edges. Ganz et al.21 introduced prior information about polyp shapes into edge detection and segmentation, successfully detecting polyp regions in the narrow-band imaging (NBI) scenario. Bae et al.22 utilized partial least squares analysis to learn compact and discriminative features, improving the algorithm's capability to handle challenges posed by similar appearances. Bernal et al.23 introduced the window median depth of valleys accumulation energy map for polyp detection, with experimental results demonstrating a high degree of overlap between the energy map and regions of interest to medical professionals. Tajbakhsh et al.24 proposed a polyp region detection method based on a hybrid context-shape approach, primarily analyzing edge information to locate polyps. However, the primary objective of the algorithm was to determine the lesion's position, and its polyp segmentation accuracy was limited. Yuan et al.25 were the first to incorporate saliency methods into polyp detection. Through a combination of bottom-up and top-down algorithm design, they treated polyps as salient objects and classified them using an SVM, resulting in improved accuracy in polyp region detection; nevertheless, their method still fails to cope with the challenges posed by the diverse shapes and large size variations of polyps.

While hand-engineered polyp detection algorithms have made some progress in automating endoscopic polyp screening, their performance is still hindered by various factors, including prior knowledge and detection scenarios. These algorithms struggle to effectively address challenges related to polyp shape, color, texture, and blurred boundaries.

Basic CNN-based methods for polyp segmentation

Deep learning, characterized by its powerful data learning and fitting capabilities, eliminates the need for manual feature design. CNN-based algorithms have made remarkable strides in various domains, including image dehazing26,27, image segmentation28, and semantic analysis29. Several fundamental CNN-based image segmentation methods have been developed, outperforming traditional techniques in conventional image segmentation tasks. Given these advancements, these algorithms have been adapted for polyp segmentation. Akbari et al.30 introduced FCN into the field, leading to improved performance on CVC-ColonDB; however, they observed sensitivity to interference such as glare. Brandao et al.31 integrated typical network structures like VGG and ResNet within the FCN framework and validated their algorithms on four datasets, highlighting the advantages of CNN-based methods. Nevertheless, they still encountered challenges related to missed and incomplete detections of polyp regions. Bardhi et al. [9] conducted an adaptability analysis of SegNet for segmentation tasks and found that it performed better in tasks such as pressure ulcer and iris segmentation, underscoring the difficulties of polyp region segmentation. Selvaraj et al.32 employed UNet for polyp segmentation in wireless capsule endoscopy images and achieved good results by incorporating test-time augmentation. Wang et al.33 introduced SegNet and conducted algorithm training and validation through dataset integration. Their experiments showed a polyp detection sensitivity of over 90% at the image level, with low instance-level missed detection rates. However, the precision of polyp region extraction remained vulnerable to interference, resulting in significant region-based missed detections. Jha et al.34 proposed an enhanced version of UNet, ResUNet++, and used it for polyp area segmentation. In addition to polyp region segmentation, Yu et al.35 introduced the 3D-FCN method for polyp detection.

When it comes to polyp extraction and detection tasks, CNN-based algorithms exhibit their superiority over traditional methods, confirming their effectiveness. Nonetheless, relying solely on transferring basic segmentation models remains inadequate for accurately segmenting polyp regions, emphasizing the pressing need for more effective deep learning frameworks dedicated to polyp segmentation.

Specific CNN-based methods for polyp segmentation

Specific CNN-based methods are tailored to address the challenges of polyp segmentation by introducing networks that align with the unique characteristics of polyps. Banik et al.36 developed a Dual-Tree wavelet pooled network for polyp segmentation and introduced a post-processing method based on level sets to optimize network predictions. Fan et al.37 integrated edge information, enhancing the network's ability to recognize fuzzy boundary information without direct edge supervision. Guo et al.38 addressed the issue of fixed-threshold binarization in network segmentation predictions by proposing ThresholdNet, offering adaptive threshold optimization to enhance the robustness of polyp region extraction, although this increases the complexity of the segmentation task to some extent. To enhance feature guidance and fusion, mechanisms for the transmission and integration of global and local feature information have been introduced. For example, Wang et al.15 improved detection accuracy by combining global and local feature networks. Zhang et al.39 focused on both global and local semantic information, using a channel-wise attention mechanism to adaptively select and aggregate semantic features. Patel et al.40 effectively combined semantic feature enhancement with adaptive global semantic guidance, and Wickstrøm et al.41 explored the issue of uncertainty in polyp region segmentation networks. Unlike these fusion mechanisms for global and local information, Cheng et al.42 emphasized polyp edge information and introduced a Learnable Oriented-Derivative Network, optimizing angle extraction from edge information to enhance polyp region detection accuracy. Huang et al.43 introduced HarDNet68, a low-memory CNN, together with a decoder structure tailored for efficient saliency-based object detection, resulting in HarDNet-MSEG. Tomar et al.44 proposed a dual decoder attention mechanism, Wei et al.19 introduced a shallow attention mechanism, and Dong et al.45 incorporated the shallow attention mechanism into the vision Transformer framework, enhancing detection performance. In addition to proposing a shallow attention mechanism, Wei et al.19 also introduced color transformation into their framework. However, their method did not fully take into account the specific characteristics of colonoscopic images and failed to effectively suppress the interference from irrelevant regions during the color transformation process, which adversely affected the overall network performance. Given the blurred boundaries of reflective regions in colonoscopic images, Hu et al.46 introduced neutrosophic theory47–51 and combined it with uncertainty information representation to propose a two-stage short-range cross-layer feature fusion strategy, which achieved favorable reflection recovery results and demonstrated the effectiveness of reflection recovery in improving model performance. Considering that polyps are abnormal tissue whose image distribution differs from that of normal tissue, Hu et al.46 and Zhou et al.52 employed saliency detection schemes and validated their effectiveness for polyp extraction. Zhao et al.16 redirected the network's focus towards the differences in intermediate features, enhancing the network's ability to represent differential features in polyp extraction. Zhu et al.53 treated lesion recognition and its region extraction as parallel network objectives, improving detection performance through information exchange and fusion between the different branches.
Li et al.54 proposed a parallel reverse structure based on PraNet37, demonstrating the effectiveness of layer-wise attention to features in non-region-of-interest areas. Ahamed et al.55,56 incorporated an interpretable mechanism into polyp detection, enhancing both detection performance and model interpretability. Additionally, they used a variety of image transformations for data augmentation during training and applied image rotation mean fusion during inference57, resulting in improved performance. Selvaraj et al.58 incorporated 92 augmentation techniques and combined them with Padded U-Net for polyp segmentation, demonstrating the role of data augmentation in optimizing model performance. In addition, enhancements based on attention mechanisms, loss functions, semantic feature associations, and feature descriptors have also been introduced into polyp segmentation research. For example, Yue et al.59 enhanced polyp segmentation accuracy using an attention-guided pyramid feature fusion scheme. Gökkan et al.60 proposed an imbalance-aware loss function that jointly considers region, shape, and pixel-level distribution. A duplex contextual relation framework61 was introduced to capture both within-image and cross-image contextual relations for more accurate segmentation predictions. Liu et al.62 proposed a coarse-to-fine segmentation framework by combining deep and classification features. Selvaraj et al.63 proposed a U-shaped network for polyp segmentation and integrated traditional feature descriptors to perform classification tasks on a multi-house database. Specifically, the MSNet proposed by Zhao et al.16 introduces a multi-scale subtraction mechanism, which effectively enhances the representation learning of complementary features between adjacent layers. This strategy has also been applied to automatic hemorrhage segmentation18. Although the structural design of multi-scale subtraction has demonstrated its effectiveness, existing schemes that utilize multi-scale subtraction often fail to effectively integrate raw low-level information with intermediate semantic features. This limitation weakens the overall feature representation capability of the network and consequently hinders its performance in polyp segmentation tasks. In general, most existing methods tend to focus more on optimizing the interaction of information across network layers and feature extraction at the current layer, while giving less consideration to enhancing complementary features between layers and addressing color feature distribution shifts. This has somewhat weakened their performance when facing the challenges of polyp region extraction.

Overall, specific CNN-based approaches are tailored to the unique characteristics of polyp imaging, leading to substantial improvements in polyp region extraction performance. Nonetheless, existing methods still fall short of meeting the clinical demands for polyp detection. To enhance the model's ability to address challenges such as significant color distribution variations, substantial intra-class polyp differences, and minor inter-class distinctions, this study incorporates color domain transfer, adjacent-differential feature fusion, and shallow attention mechanisms.

Method

This section begins with an overview of ADSANet. Figure 2 illustrates the proposed polyp segmentation framework, comprising two primary components: color exchange and ADSANet. The color exchange strategy is exclusively employed during the model training phase, while the original colonoscopy images are directly utilized during inference, ensuring efficiency in the inference process. ADFM serves as the core module of ADSANet. In the subsequent sections, we provide a detailed description of the overall architecture, the color exchange strategy, ADFM, SAM, and the loss function.

Fig. 2. Architecture of the proposed ADSANet for polyp segmentation.

Overall architecture

Figure 2 illustrates the overall architecture of ADSANet. The encoding convolution modules E_i, i ∈ {1, 2, …, 5}, are concatenated to form the feature extraction backbone. These convolution modules are primarily used to extract features at different levels. ADSANet adopts a U-shaped structure. We refer to the direction of feature propagation from shallow to deep layers as downward, and the process of gradually recovering features layer by layer as upward. Therefore, in the encoding process of ADSANet, feature propagation mainly occurs in the downward part of the U-shaped structure. The upward part of ADSANet consists of two main components: the upward structure formed by the cascaded ADFMs, and the SAM. Compared with the ADFM-based upward structure, SAM primarily focuses on deep-level features to enhance semantic information. The network obtains the final prediction of the polyp region by fusing the inference results from both upward structures.

We adopt ResNet as the backbone network for feature extraction. After passing through the encoding convolution modules E_1–E_5, an input image of height H and width W produces five encoded features at progressively lower resolutions. To further extract feature information and reduce the computational complexity of the upward structure, the features along the encoding path are all transformed into 64-channel features P_i (i = 1, …, 5) by convolution. To fully leverage the features at each level, the encoded features P_i of each level are propagated to their respective ADFMs. The differential features and fusion features generated by the deep-level ADFM propagate layer by layer in the upward direction, constructing refined features. The other upward pathway takes the last three encoded features P_3, P_4, and P_5 as inputs. After shallow attention processing, deep semantic-guided features are generated and fused, via concatenation, with the features extracted from the ADFM branch to obtain the inference results.
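The backbone and channel-reduction step described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions: the paper only specifies a ResNet backbone and 64-channel reduced features, so the choice of ResNet-50, the exact stage grouping, the 1 × 1 reduction kernel, and the names Encoder, stages, and reduce are assumptions of this sketch rather than details confirmed by the source.

```python
import torch
import torch.nn as nn
import torchvision

class Encoder(nn.Module):
    """Sketch of the ADSANet encoder: five ResNet stages (E1-E5), each output
    reduced to 64 channels (P1-P5) before entering the ADFM and SAM branches."""
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)  # backbone variant assumed
        self.stages = nn.ModuleList([
            nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu),  # E1
            nn.Sequential(resnet.maxpool, resnet.layer1),          # E2
            resnet.layer2,                                         # E3
            resnet.layer3,                                         # E4
            resnet.layer4,                                         # E5
        ])
        # Channel reduction to 64 for every level (kernel size assumed to be 1x1).
        self.reduce = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c, 64, kernel_size=1), nn.ReLU(inplace=True))
            for c in (64, 256, 512, 1024, 2048)
        ])

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        # P1..P5: 64-channel features used by the two upward structures.
        return [r(f) for r, f in zip(self.reduce, feats)]

# Example: a 352x352 input yields five 64-channel feature maps.
# p1, p2, p3, p4, p5 = Encoder()(torch.randn(1, 3, 352, 352))
```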

Color exchange with region suppression

As depicted in Fig. 1, colonoscopy images exhibit both diverse morphological variations of polyps and a wide range of color distributions. To prevent the model from overfitting to the limited training data, it is necessary to perform color transfer on the images to enrich the training dataset. However, random color transfer is not suitable since excessive color transformation can result in distorted images, hindering the model’s effectiveness in handling real colonoscopy image data.

For colonoscopy image color transformation, a more appropriate approach is to use real colonoscopy images as the target domain and align the color distribution of the source images with that domain. It is important to note that colonoscopy images contain two kinds of special regions: non-detection areas and specular reflection areas. Non-detection areas are the black regions surrounding the images, which have relatively regular shapes. Specular reflection areas, on the other hand, appear nearly white due to the reflection of the light source from the smooth intestinal wall. To ensure that the transferred images closely resemble real imaging results, it is essential to account for these two types of special regions during the transformation process. By excluding them from the color statistics, we can achieve more realistic and accurate representations of colonoscopy images.

We perform the transformation in the LAB color space, which provides more perceptually meaningful channels than the RGB color space and avoids the problems of excessive transitional colors between blue and green and the lack of transitional colors between green and red. Prior to the color domain transformation, we therefore convert the image from the RGB color space to the LAB color space. Let l_I(x), a_I(x), and b_I(x) denote the intensity values of the L, A, and B components of the original image I at coordinate x, and let l_T(x), a_T(x), and b_T(x) denote the corresponding components of the target image T at coordinate x. After region suppression and color exchange, the intensity of each component can be expressed as follows:

$\hat{l}(x) = \frac{\sigma_T^L}{\sigma_I^L}\left(l_I(x) - \mu_I^L\right) + \mu_T^L, \quad \hat{a}(x) = \frac{\sigma_T^A}{\sigma_I^A}\left(a_I(x) - \mu_I^A\right) + \mu_T^A, \quad \hat{b}(x) = \frac{\sigma_T^B}{\sigma_I^B}\left(b_I(x) - \mu_I^B\right) + \mu_T^B \qquad (1)$

where $\sigma_I^L$, $\sigma_I^A$, and $\sigma_I^B$ are the standard deviations of the L, A, and B components over all coordinates of the original image after excluding the suppressed regions, and $\sigma_T^L$, $\sigma_T^A$, and $\sigma_T^B$ are the corresponding standard deviations for the target image. Similarly, $\mu_I^L$, $\mu_I^A$, and $\mu_I^B$ denote the means of the L, A, and B components over all coordinates of the original image after excluding the suppressed regions, and $\mu_T^L$, $\mu_T^A$, and $\mu_T^B$ denote the corresponding means for the target image. Taking the L component of the original image as an example, $\sigma_I^L$ and $\mu_I^L$ are computed as follows:

$\sigma_I^L = \mathrm{std}\left(\left\{\, l_I(x) \mid x \in \Omega_I \setminus (\Omega_S \cup \Omega_B) \,\right\}\right) \qquad (2)$
$\mu_I^L = \mathrm{mean}\left(\left\{\, l_I(x) \mid x \in \Omega_I \setminus (\Omega_S \cup \Omega_B) \,\right\}\right) \qquad (3)$

where the functions $\mathrm{std}(\cdot)$ and $\mathrm{mean}(\cdot)$ compute the standard deviation and mean of all elements in a set, with each element treated with equal probability. $\Omega_I$ denotes the set of coordinates of the original image, $\Omega_S$ the set of coordinates in the specular reflection area, and $\Omega_B$ the black region surrounding the colonoscopy image. Pixels with grayscale values greater than a threshold $T_s$ are regarded as specular reflection pixels, while pixels with grayscale values lower than a threshold $T_b$ are regarded as black region pixels. The remaining statistics in Eq. (1), namely the standard deviations and means of the A and B components of the original image and of all three components of the target image, are computed in the same manner as Eqs. (2) and (3), by substituting the corresponding component and using the coordinate, specular reflection, and black region sets of the respective image.
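As a concrete illustration, the region-suppressed color exchange of Eqs. (1)-(3) can be sketched as below. This is a minimal sketch under assumptions: it uses OpenCV's 8-bit LAB conversion, takes BGR inputs, applies the grayscale thresholds T_s and T_b described above (the values 220 and 16 reported later in the experimental settings are used as defaults), and adds a small epsilon to avoid division by zero; none of these implementation details are prescribed by the paper.

```python
import cv2
import numpy as np

def color_exchange(source_bgr, target_bgr, t_s=220, t_b=16):
    """Sketch of color exchange with region suppression (Eqs. 1-3).
    LAB statistics are computed only over pixels that are neither specular
    highlights (gray > t_s) nor the surrounding black border (gray < t_b)."""
    def lab_stats(img_bgr):
        gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
        valid = (gray < t_s) & (gray > t_b)              # suppress Omega_S and Omega_B
        lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
        pixels = lab[valid]                              # N x 3 array of L, A, B values
        return pixels.mean(axis=0), pixels.std(axis=0)

    mu_i, sigma_i = lab_stats(source_bgr)
    mu_t, sigma_t = lab_stats(target_bgr)

    lab = cv2.cvtColor(source_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    # Eq. (1): shift and scale the source statistics to match the target, per channel.
    lab = (lab - mu_i) * (sigma_t / (sigma_i + 1e-6)) + mu_t
    lab = np.clip(lab, 0, 255).astype(np.uint8)
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
```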

Figure 3 shows several colonoscopy images after color transformation. The first column contains the original images, the second column the target images, the third column the corresponding images after color exchange with region suppression, and the fourth column the color-transformed images obtained without region suppression, i.e., with the color statistics computed over all pixels of the image. As is evident from Fig. 3, the color transformation with region suppression better preserves image details. In the first and second rows, the lower half of the images is significantly influenced by the highlight areas when region suppression is not applied, resulting in some degree of overexposure. In the third and fourth rows, our method effectively handles color transformation in the dark regions of the colonoscopy images, whereas the method without region suppression exhibits significant color distortion after the transformation. In summary, the proposed method effectively achieves color domain transfer and successfully addresses issues such as highlight overexposure and color distortion in dark regions.

Fig. 3. Illustration of color exchange. (a) Original images, (b) target images, (c) images after color exchange with region suppression, (d) images after color exchange without region suppression.

ADFM module

As shown in Fig. 2, the ADFM (adjacent-differential feature fusion module) serves as a key module within the upward structure of ADSANet. It receives encoded features from each layer, which are transferred to the ADFMs at the corresponding levels. ADFM acts as an enhancement module in the progressive refinement process, refining features from bottom to top. In Fig. 2, the ADFMs in ADSANet are shown in two different colors because the deepest ADFM differs from the others in its inputs: it receives the deepest encoded features extracted by the encoding backbone, which serve as both its input differential features and its input fusion features. In general, an ADFM has three external inputs and two outputs. The inputs are the current layer's encoded features P_i, and the differential features D_{i+1} and fusion features B_{i+1} from the deeper layer. The outputs are the differential features D_i and fusion features B_i, which integrate information from the current layer and the deeper layer. The primary purpose of ADFM is to strengthen the network's attention to first-order and second-order differential features across layers, while also integrating the encoded features of the current layer. This enhances the overall feature representation capability of the network.

Figure 4 presents the structure of ADFM, which consists of two subtraction operations and two concatenation operations. Viewed from left to right, the first subtraction operation extracts the difference between the current-layer features and the deeper-layer fusion features, while the second subtraction operation captures the disparity between this difference and the deep-layer differential features D_{i+1}. Technically, we first apply a subtraction operation to obtain a first-order difference, and the output differential features D_i of the current layer are then computed as follows:

[Equation (4)]

Fig. 4. Illustration of the ADFM module.

where $\mathrm{Conv}(\cdot)$ denotes a convolution with a 3 × 3 kernel followed by a ReLU activation, $|\cdot|$ denotes the absolute value, and $\ominus$ denotes element-wise subtraction.

The fusion features play a crucial role in the up-sampling structure of ADSANet. We jointly consider the two levels of differential features, the fusion features from the deeper layer, and the current layer's encoded features to extract more robust fusion features. As shown in Fig. 4, a convolution operation is applied for feature processing after each subtraction and concatenation operation in ADFM. The fusion features B_i can be computed by:

[Equation (5)]

where $\mathrm{Cat}(\cdot)$ denotes the concatenation operation. It should be noted that the convolution operations in Eqs. (4) and (5) are independent of each other but share the same kernel size of 3 × 3. Additionally, the up-sampling computations are not written explicitly in Eqs. (4) and (5). As depicted in Fig. 4, when the differential features D_{i+1} and fusion features B_{i+1} are used as inputs for the i-th layer, the corresponding up-sampling operations are performed. We adopt this convention throughout the paper and do not repeat it in subsequent discussions.
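To make the data flow of ADFM concrete, the following PyTorch sketch implements one plausible reading of Fig. 4 and Eqs. (4)-(5). The exact operand order of the two subtractions and the exact set of tensors entering the final concatenation are not recoverable from the extracted text, so they are assumptions of this sketch (the names ADFM, p_i, d_next, and b_next are illustrative only); only the use of 64-channel features, 3 × 3 convolutions with ReLU, absolute differences, and concatenation is taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv3x3_relu(in_ch, out_ch=64):
    # 3x3 convolution followed by ReLU, as used throughout ADFM (Eqs. 4-5).
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class ADFM(nn.Module):
    """Sketch of one adjacent-differential feature fusion module.
    Inputs: current-layer features p_i and the differential/fusion features
    (d_next, b_next) from the deeper ADFM; outputs the pair (d_i, b_i)."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv_cur = conv3x3_relu(ch, ch)
        self.conv_diff1 = conv3x3_relu(ch, ch)
        self.conv_diff2 = conv3x3_relu(ch, ch)
        self.conv_fuse = conv3x3_relu(4 * ch, ch)

    def forward(self, p_i, d_next, b_next):
        # Up-sample the deeper-layer inputs to the current resolution (see the text below Eq. 5).
        d_next = F.interpolate(d_next, size=p_i.shape[2:], mode='bilinear', align_corners=False)
        b_next = F.interpolate(b_next, size=p_i.shape[2:], mode='bilinear', align_corners=False)
        # First-order difference between the current layer and the deeper fusion features.
        d1 = self.conv_diff1(torch.abs(self.conv_cur(p_i) - b_next))
        # Second-order difference against the deeper differential features.
        d_i = self.conv_diff2(torch.abs(d1 - d_next))
        # Fuse both differences, the deeper fusion features, and the raw current-layer features.
        b_i = self.conv_fuse(torch.cat([d1, d_i, b_next, p_i], dim=1))
        return d_i, b_i
```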

SAM module

The structure of SAM is depicted in Fig. 5. Element-wise matrix multiplication is used twice in the SAM module, mainly because element-wise multiplication is commonly employed for intermediate feature interaction and enhancement in networks, offering computational efficiency, flexibility, and the ability to capture nonlinear relationships between features. As seen in Fig. 5, SAM has three inputs, namely the deep-level encoded features P_3, P_4, and P_5 of ADSANet, and produces one output. SAM uses element-wise multiplication to directly guide the shallow-level features with the deep-level features. To preserve the semantic features extracted by the encoding network, the deepest feature and its element-wise products with the shallower deep-level features are all included in the fusion process. After applying convolutional operations, the output feature SA of SAM can be calculated as follows:

[Equation (6)]

Fig. 5. Illustration of the SAM module.

where $\mathrm{Conv}(\cdot)$ represents a convolutional operation with ReLU activation, $\mathrm{Cat}(\cdot)$ denotes the concatenation operation, and $\odot$ denotes element-wise multiplication.

The final inference result is determined by the joint contribution of SA and the fusion feature produced by the ADFM branch:

[Equation (7)]

where $\sigma(\cdot)$ represents the sigmoid function, $\mathrm{Conv}(\cdot)$ denotes the convolutional operation with ReLU activation, and $\mathrm{Cat}(\cdot)$ denotes the concatenation operation; the output is the final polyp probability map.
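The shallow attention fusion and the final prediction head (Eqs. 6-7) can likewise be sketched as follows. Because the exact combination of products in Eq. (6) and the specific ADFM feature used in Eq. (7) are not recoverable from the extracted text, the gating pattern below (the deepest feature gating the two shallower ones) and the names SAM and PredictionHead are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAM(nn.Module):
    """Sketch of the shallow attention module: deep-level features guide
    shallower ones through element-wise multiplication before fusion (Eq. 6)."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv_out = nn.Sequential(nn.Conv2d(3 * ch, ch, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, p3, p4, p5):
        size = p3.shape[2:]
        p4 = F.interpolate(p4, size=size, mode='bilinear', align_corners=False)
        p5 = F.interpolate(p5, size=size, mode='bilinear', align_corners=False)
        g4 = p5 * p4            # first element-wise multiplication
        g3 = g4 * p3            # second element-wise multiplication
        # Keep the deepest semantic feature together with both guided features.
        return self.conv_out(torch.cat([p5, g4, g3], dim=1))

class PredictionHead(nn.Module):
    """Sketch of Eq. (7): concatenate SA with the ADFM-branch fusion feature,
    convolve, and apply a sigmoid to obtain the polyp probability map."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 1),
        )

    def forward(self, sa, b_adfm):
        b_adfm = F.interpolate(b_adfm, size=sa.shape[2:], mode='bilinear', align_corners=False)
        return torch.sigmoid(self.conv(torch.cat([sa, b_adfm], dim=1)))
```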

Loss function

ADSANet adopts the commonly used weighted binary cross-entropy (wBCE) loss and weighted intersection over union (wIoU) loss to jointly measure the network's error on the polyp segmentation task. The total loss L can be computed as follows:

$L = L_{wBCE}(P, G) + L_{wIoU}(P, G) \qquad (8)$

where P denotes the predicted mask, G represents the corresponding ground truth, and $L_{wBCE}$ and $L_{wIoU}$ denote the wBCE and wIoU losses, respectively.
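The paper does not spell out the pixel-weighting scheme, so the sketch below uses the boundary-aware weighting popularized by PraNet-style implementations as a plausible stand-in for the wBCE + wIoU combination of Eq. (8); the 31 × 31 averaging window, the weight scale of 5, and the assumption that `pred` contains logits are choices of this sketch, not details confirmed by the source.

```python
import torch
import torch.nn.functional as F

def weighted_bce_iou_loss(pred, mask):
    """Sketch of Eq. (8): weighted BCE + weighted IoU.
    pred: raw logits of shape (B, 1, H, W); mask: binary ground truth, same shape."""
    # Larger weights near object boundaries, where a local average differs from the mask.
    weight = 1 + 5 * torch.abs(F.avg_pool2d(mask, kernel_size=31, stride=1, padding=15) - mask)

    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction='none')
    wbce = (weight * wbce).sum(dim=(2, 3)) / weight.sum(dim=(2, 3))

    prob = torch.sigmoid(pred)
    inter = (prob * mask * weight).sum(dim=(2, 3))
    union = ((prob + mask) * weight).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)

    return (wbce + wiou).mean()
```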

Experiments

In this section, we conduct a comprehensive series of experiments to validate the effectiveness of ADSANet. We first provide details on the datasets, experimental settings, and evaluation metrics. We then perform ablation studies and compare ADSANet with several state-of-the-art methods based on CNN and Transformer frameworks. Finally, we discuss the algorithm's performance and analyze failure cases.

Dataset and settings

Dataset

We conduct experiments using five publicly available datasets: ETIS64, ClinicDB23, Endoscene65, ColonDB24, and Kvasir-SEG66. All five datasets include colonoscopy images along with corresponding pixel-level annotations of polyp regions. The dataset configuration is shown in Table 1. The ETIS dataset comprises 196 images, each with a resolution of 1225 × 996. This dataset displays significant variations in polyp appearance, with certain images featuring challenging polyps that are deeply concealed and closely resemble the surrounding regions, which presents a substantial challenge for algorithms. The ClinicDB dataset consists of 612 images extracted from 29 endoscopy video sequences with polyps; the images have a resolution of 384 × 288. The ColonDB dataset is composed of 380 still frames with polyps extracted from 15 different polyp video sequences; the images have a resolution of 574 × 500, and the polyps exhibit diverse morphological characteristics. The initial Endoscene dataset is a combination of images from the ClinicDB and ColonDB datasets. The Endoscene dataset used in our experiments is identical to the one defined in the MSNet study16, comprising 60 images with a resolution of 574 × 500. The Kvasir-SEG dataset consists of 1000 polyp images extracted from colonoscopy videos. The images have varying resolutions ranging from 332 × 487 to 1920 × 1072, and the dataset showcases polyps with significant variations in their morphological appearance.

Table 1.

Dataset configuration.

Dataset Sample Num Resolution Training Testing
ETIS 196 1225 × 996 n/a 196
ClinicDB 612 384 × 288 550 62
Endoscene 60 574 × 500 n/a 60
ColonDB 380 574 × 500 n/a 380
Kvasir-SEG 1000 332 × 487 to 1920 × 1072 900 100

Experimental settings

To ensure fairness, we follow the same training and testing set configuration as described in reference37. As shown in Table 1, the training set consists of 900 samples from the Kvasir-SEG dataset and 550 samples from the ClinicDB dataset, while the remaining samples from these datasets, as well as the samples from the other datasets not included in the training set, are used for testing.

Our algorithm is implemented using the PyTorch framework and trained on an NVIDIA RTX 4090 GPU for accelerated performance. To ensure fairness in algorithm comparison, we did not perform any special fine-tuning of training parameters; the training epochs, batch size, and input image resolution were kept consistent with most of the compared algorithms in this paper. To highlight the comparative advantages of our method, for CFANet we directly adopted its original settings: a batch size of 10 and 100 training epochs. Specifically, in this study, all models except CFANet are trained for 50 epochs with a mini-batch size of 16, and the inputs are resized to 352 × 352 for each dataset. A general multi-scale training strategy is employed. To mitigate overfitting, we apply several data augmentation techniques, including random cropping, random horizontal flipping, and random rotation. We utilize stochastic gradient descent (SGD) as the optimizer with a momentum of 0.9 and a weight decay of 0.0005. The maximum learning rate is set to 0.05 and adjusted using a warm-up and linear decay strategy.
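The optimizer and schedule described above can be reproduced roughly as in the sketch below. The paper fixes the optimizer (SGD, momentum 0.9, weight decay 0.0005) and the peak learning rate (0.05) but not the warm-up length, so the 10% warm-up fraction and the per-iteration update used here are assumptions of this sketch.

```python
import torch

def build_optimizer(model):
    # SGD with the hyper-parameters reported in the paper.
    return torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)

def lr_at(step, total_steps, max_lr=0.05, warmup_frac=0.1):
    """Warm-up followed by linear decay; the warm-up fraction is assumed."""
    warmup = max(1, int(warmup_frac * total_steps))
    if step < warmup:
        return max_lr * (step + 1) / warmup                              # linear warm-up
    return max_lr * (1 - (step - warmup) / max(1, total_steps - warmup))  # linear decay to 0

# Usage: before each iteration, set the learning rate on every parameter group.
# for group in optimizer.param_groups:
#     group['lr'] = lr_at(step, total_steps)
```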

Color transformation is applied during the model training phase. When a training sample is read, a random image from the training dataset is selected as the color transfer target. The threshold for pixels representing reflective areas ($T_s$) is set to 220, and the threshold for pixels representing the surrounding black regions ($T_b$) is set to 16.

Evaluation metrics

We employ a set of six evaluation metrics to assess the performance of the models: mean Dice coefficient (mDice), mean intersection over union (mIoU), weighted F-measure ($F_\beta^w$), S-measure ($S_\alpha$), E-measure ($E_\phi$), and mean absolute error (MAE). mDice and mIoU measure the degree of overlap between the segmentation results and the ground truth. They can be calculated by

$\mathrm{mDice} = \frac{2\,TP}{2\,TP + FP + FN} \qquad (9)$
$\mathrm{mIoU} = \frac{TP}{TP + FP + FN} \qquad (10)$

where TP, FP, and FN denote true positives, false positives, and false negatives, respectively. The weighted F-measure ($F_\beta^w$) provides a comprehensive evaluation by considering both recall and precision while accounting for pixel differences through a weighting scheme. $F_\beta^w$ can be calculated by

$F_\beta^w = \frac{(1+\beta^2)\,\mathrm{Precision}^w \cdot \mathrm{Recall}^w}{\beta^2 \cdot \mathrm{Precision}^w + \mathrm{Recall}^w} \qquad (11)$

where $\beta$ is set to 1. The S-measure ($S_\alpha$) focuses on the structural similarity between the segmentation results and the ground truth; it can be computed by

$S_\alpha = \alpha \cdot S_o + (1-\alpha) \cdot S_r \qquad (12)$

where $S_r$ and $S_o$ denote the region-based and object-level similarity measurements, whose calculation can be found in67. The parameter $\alpha$ balances $S_o$ and $S_r$ and is set to 0.5. The E-measure ($E_\phi$) considers the segmentation results at both the pixel and image levels; it can be computed by

$E_\phi = \frac{1}{w \times h}\sum_{x=1}^{w}\sum_{y=1}^{h}\phi(x, y) \qquad (13)$

where w and h are the height and width of the input, and $\phi$ is the enhanced alignment matrix, which captures pixel-level matching and image-level statistical characteristics68. The mean absolute error (MAE) quantifies the absolute difference between the inference results and the ground truth; it is defined as

$\mathrm{MAE} = \frac{1}{M}\sum_{i=1}^{M}\left|P(i) - G(i)\right| \qquad (14)$

where G denotes the ground-truth pixel values, P denotes the model's predicted pixel values, and M is the total number of pixels.
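For reference, a minimal NumPy implementation of the overlap metrics (Eqs. 9-10) and MAE (Eq. 14) is given below; the weighted F-measure, S-measure, and E-measure involve additional machinery defined in67,68 and are omitted here. The 0.5 binarization threshold and the small epsilon are assumptions of this sketch.

```python
import numpy as np

def dice_iou_mae(pred, gt, threshold=0.5):
    """pred: predicted probabilities in [0, 1]; gt: binary ground-truth mask.
    Returns (Dice, IoU, MAE) for a single image, per Eqs. (9), (10) and (14)."""
    mae = float(np.mean(np.abs(pred - gt)))          # Eq. (14)
    p = pred >= threshold
    g = gt >= 0.5
    tp = np.logical_and(p, g).sum()
    fp = np.logical_and(p, ~g).sum()
    fn = np.logical_and(~p, g).sum()
    dice = 2.0 * tp / (2.0 * tp + fp + fn + 1e-8)    # Eq. (9)
    iou = tp / (tp + fp + fn + 1e-8)                 # Eq. (10)
    return dice, iou, mae
```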

Ablation study

We conducted ablation experiments to analyze the efficacy of each module in our model; the results are presented in Table 2. ADSANet, built upon MSNet, incorporates differential features. In ADSANet, the transmission of differential features is simplified: unlike MSNet, which progressively increases the number of differential feature extractions from deep to shallow layers, ADSANet performs two differentials at each layer. Moreover, after encoding the differential features of each layer, ADSANet reconsiders the source features of that layer to more comprehensively preserve fine-grained details. ADSANet also integrates a shallow attention mechanism by jointly optimizing features through ADFM and SAM in the up-sampling structure. Additionally, to mitigate the network's reliance on image color distribution, ADSANet employs color transfer as a data augmentation technique during training. To investigate the impact of these characteristics, we designed four variants for the ablation experiments: ADSANet-na, ADSANet-ns, ADSANet-nc, and ADSANet-nc-ns. ADSANet-na removes the ADFM module from ADSANet. To preserve the integrity of information transmission and the network architecture in ADSANet-na, we simplify the structure of ADFM in Fig. 4 by concatenating P_i and B_{i+1}, followed by a 3 × 3 convolution operation; the output feature B_i is then passed to the upper layer. ADSANet-ns removes the SAM module from ADSANet. ADSANet-nc eliminates the color transfer augmentation during training. ADSANet-nc-ns simultaneously removes both the SAM module and the color transfer augmentation.

Table 2.

Ablation results of the testing datasets.

Dataset Model Metrics
mDice mIoU F_β^w S_α E_φ MAE
ETIS MSNet 0.719 0.664 0.678 0.840 0.830 0.020
ADSANet 0.813 0.743 0.796 0.876 0.910 0.010
ADSANet-na 0.705 0.667 0.657 0.832 0.801 0.020
ADSANet-ns 0.781 0.705 0.760 0.853 0.907 0.012
ADSANet-nc 0.755 0.683 0.718 0.846 0.864 0.017
ADSANet-nc-ns 0.723 0.661 0.705 0.833 0.869 0.017
ClinicDB MSNet 0.921 0.879 0.914 0.941 0.972 0.008
ADSANet 0.934 0.888 0.929 0.947 0.980 0.007
ADSANet-na 0.897 0.858 0.881 0.932 0.953 0.015
ADSANet-ns 0.941 0.895 0.934 0.948 0.986 0.007
ADSANet-nc 0.931 0.888 0.921 0.944 0.969 0.008
ADSANet-nc-ns 0.932 0.885 0.927 0.945 0.981 0.007
Endoscene MSNet 0.869 0.807 0.849 0.925 0.943 0.010
ADSANet 0.909 0.844 0.898 0.939 0.977 0.006
ADSANet-na 0.866 0.810 0.843 0.927 0.927 0.010
ADSANet-ns 0.902 0.838 0.888 0.936 0.974 0.006
ADSANet-nc 0.901 0.842 0.884 0.939 0.958 0.007
ADSANet-nc-ns 0.905 0.840 0.892 0.935 0.977 0.006
ColonDB MSNet 0.755 0.678 0.737 0.836 0.883 0.041
ADSANet 0.752 0.677 0.745 0.832 0.880 0.039
ADSANet-na 0.728 0.660 0.706 0.832 0.857 0.043
ADSANet-ns 0.756 0.678 0.743 0.831 0.876 0.042
ADSANet-nc 0.763 0.684 0.745 0.841 0.882 0.039
ADSANet-nc-ns 0.765 0.685 0.753 0.837 0.883 0.037
Kvasir-SEG MSNet 0.907 0.862 0.893 0.922 0.944 0.028
ADSANet 0.914 0.865 0.910 0.924 0.949 0.023
ADSANet-na 0.876 0.830 0.864 0.904 0.929 0.041
ADSANet-ns 0.899 0.846 0.894 0.912 0.949 0.029
ADSANet-nc 0.902 0.852 0.892 0.915 0.936 0.029
ADSANet-nc-ns 0.913 0.861 0.907 0.918 0.955 0.026

Based on the analysis in Table 2, ADSANet shows superior overall performance compared to the other models when evaluated on the five datasets, indicating its adaptability to different feature distributions. Across all five datasets, ADSANet and its derived variants generally outperform the baseline network MSNet on most metrics; only on the ColonDB and Kvasir-SEG datasets do some metrics show performance similar to MSNet, which confirms the effectiveness of the ADFM module. Among the datasets, ADSANet performs exceptionally well on ETIS, achieving the highest scores in all metrics. Compared to MSNet, ADSANet demonstrates a significant improvement of 9.4% and 7.9% in the mDice and mIoU metrics, respectively. On ETIS, where the color distribution of images differs greatly from the training set, the removal of color transfer augmentation leads to a considerable drop in network performance. ADSANet-nc exhibits a 5.8% decrease in mDice compared to ADSANet, while ADSANet-nc-ns shows a similarly significant drop of 5.8% in mDice compared to ADSANet-ns. This observation highlights the crucial role of color transfer augmentation in enhancing the network's adaptability to different color distributions. On ETIS, a comparison between ADSANet-ns and ADSANet reveals that the removal of SAM results in a performance decline of 3.2% and 3.8% in the mDice and mIoU metrics, respectively. Similarly, ADSANet-nc-ns shows decreases of 3.2% and 2.2% in mDice and mIoU when compared to ADSANet-nc. This also demonstrates the advantage of introducing element-wise multiplication in the SAM module to enhance the network's feature representation capability, and further emphasizes the effectiveness of the combined dual up-sampling structure of SAM and ADFM.

Looking at the results on the other four datasets, although ADSANet and its derived networks achieve good or comparable performance relative to MSNet, certain metrics deviate from the patterns observed on ETIS. Comparing the training and testing sets, ClinicDB, Endoscene, ColonDB, and Kvasir-SEG exhibit color distributions and polyp morphologies similar to the training set, indicating that color transfer augmentation does not provide significant advantages on these datasets. On the ColonDB dataset, the removal of color transfer augmentation leads to a nearly 1% improvement in mDice for both ADSANet-nc and ADSANet-nc-ns compared to ADSANet and ADSANet-ns, suggesting that color transfer augmentation imposes fitting pressure on the network for this type of data, while also confirming the effectiveness of ADFM. On the Endoscene and Kvasir-SEG datasets, ADSANet continues to outperform the other models across all metrics. On the ClinicDB dataset, ADSANet-ns performs the best, demonstrating the combined advantages of color transfer and ADFM, while ADSANet shows only a slight performance difference compared to ADSANet-ns. Observing the performance of ADSANet-na across the different datasets, it is evident that the removal of the ADFM module significantly impacts the network's performance, making ADSANet-na the worst-performing model on all datasets. This validates the crucial role of ADFM in extracting the key features of polyp regions. Overall, color transfer augmentation, ADFM, and SAM positively influence the network's performance, particularly in scenarios where there are significant disparities in color and polyp morphology between the training and testing sets.

Comparisons with state-of-the-art

In this section, we conducted a comparison between ADSANet and 13 other polyp segmentation networks. Two of them are conventional models, namely UNet10 and UNet++69, while the remaining eleven are network models designed for polyp region extraction: the selective feature aggregation network SFA70, the enhanced U-Net EUNet40, the adaptive context selection network ACSNet39, the duplex contextual relation network DCRNet61, HarDNet-MSEG43, the parallel reverse attention network PraNet37, the shallow attention network SANet19, the multi-scale subtraction network MSNet16, the UNet-like pure Transformer based medical image segmentation network Swin-UNet71, the cross-level feature aggregation network CFANet2, and the cross-level guidance and multi-scale aggregation network CGMA-Net72. All network models adopt a U-shaped framework. CGMA-Net72 and Swin-UNet71 are based on the Transformer framework, while the other models follow the CNN framework. The specialized models aim to improve the accuracy of polyp region extraction by incorporating different feature extraction modules in the encoding and decoding stages to enhance the network's ability to represent the complex features specific to polyp extraction tasks. It is worth noting that SANet introduces color transfer augmentation and a shallow attention mechanism, while MSNet introduces differential features for polyp segmentation. However, SANet's color transfer does not consider the suppression of unrelated regions, and MSNet only applies differential encoding to each layer's features, gradually increasing the number of differential operations from deep to shallow layers. ADSANet demonstrates significant structural improvements compared to SANet and MSNet. A qualitative comparison is presented in Fig. 6, showcasing segmentation results of selected samples from different datasets. The quantitative results are provided in Tables 3, 4, 5, 6 and 7.

Fig. 6. Qualitative comparison of polyp segmentation results by different models. From left to right: the test images, ground truth, and the results of UNet, UNet++, SFA, EUNet, ACSNet, DCRNet, MSEG, PraNet, SANet, MSNet, Swin-UNet, CFANet, and ADSANet. Examples (a, b) are from the Endoscene dataset, (c, d) from the ClinicDB dataset, (e, f) from the ColonDB dataset, (g–i) from the ETIS dataset, and (j) from the Kvasir-SEG dataset. Green indicates correctly segmented polyp regions, yellow indicates missed polyp regions, and red indicates wrong predictions.

Table 3.

Quantitative results of the test dataset ETIS.

Model Metrics
mDice mIoU F_β^w S_α E_φ MAE
UNet 10 0.398 0.335 0.366 0.684 0.643 0.036
UNet++69 0.401 0.344 0.390 0.683 0.629 0.035
SFA 70 0.297 0.217 0.231 0.557 0.531 0.109
EUNet 40 0.687 0.609 0.636 0.793 0.807 0.067
ACSNet 39 0.578 0.509 0.530 0.754 0.737 0.059
DCRNet 61 0.556 0.496 0.506 0.736 0.742 0.096
MSEG 43 0.700 0.630 0.671 0.828 0.854 0.015
PraNet 37 0.628 0.567 0.600 0.794 0.808 0.031
SANet 19 0.750 0.654 0.685 0.849 0.881 0.015
MSNet 16 0.719 0.664 0.678 0.840 0.830 0.020
Swin-UNet 71 0.771 0.647 0.716 0.820 0.838 0.016
CFANet 2 0.732 0.655 0.693 0.845 0.892 0.014
CGMA-Net 72 0.718 0.649 0.682 0.854
ADSANet 0.813 0.743 0.796 0.876 0.910 0.010

Table 4.

Quantitative results of the test dataset ClinicDB.

Model Metrics
mDice mIoU Fβw Sα Eφ MAE
UNet 10 0.823 0.755 0.811 0.889 0.913 0.019
UNet++69 0.794 0.729 0.785 0.873 0.891 0.022
SFA 70 0.700 0.607 0.647 0.793 0.840 0.042
EUNet 40 0.902 0.846 0.891 0.936 0.959 0.011
ACSNet 39 0.882 0.826 0.873 0.927 0.947 0.011
DCRNet 61 0.896 0.844 0.890 0.933 0.964 0.010
MSEG 43 0.909 0.864 0.907 0.938 0.961 0.007
PraNet 37 0.899 0.849 0.896 0.936 0.963 0.009
SANet 19 0.916 0.859 0.909 0.939 0.971 0.012
MSNet 16 0.921 0.879 0.914 0.941 0.972 0.008
Swin-UNet 71 0.909 0.825 0.905 0.933 0.958 0.013
CFANet 2 0.933 0.823 0.924 0.950 0.989 0.007
CGMA-Net 72 0.927 0.880 0.922 0.976
ADSANet 0.934 0.888 0.929 0.947 0.980 0.006

Table 5.

Quantitative results of the test dataset Endoscene.

Model Metrics
mDice mIoU Fβw Sα Eφ MAE
UNet 10 0.710 0.627 0.684 0.843 0.847 0.022
UNet++69 0.707 0.624 0.687 0.839 0.834 0.018
SFA 70 0.468 0.329 0.341 0.640 0.644 0.065
EUNet 40 0.837 0.765 0.805 0.904 0.919 0.015
ACSNet 39 0.863 0.787 0.825 0.923 0.939 0.013
DCRNet 61 0.856 0.788 0.830 0.921 0.943 0.010
MSEG 43 0.874 0.804 0.852 0.924 0.948 0.009
PraNet 37 0.871 0.797 0.843 0.925 0.950 0.010
SANet 19 0.888 0.815 0.859 0.928 0.962 0.008
MSNet 16 0.869 0.807 0.849 0.925 0.943 0.010
Swin-UNet 71 0.877 0.796 0.847 0.914 0.944 0.010
CFANet 2 0.893 0.827 0.875 0.938 0.978 0.008
CGMA-Net 72 0.865 0.794 0.833 0.934
ADSANet 0.909 0.844 0.898 0.939 0.977 0.006

Table 6.

Quantitative results of the test dataset ColonDB.

Model Metrics
mDice mIoU Fβw Sα Eφ MAE
UNet 10 0.512 0.444 0.498 0.712 0.696 0.061
UNet++69 0.483 0.410 0.467 0.691 0.680 0.064
SFA 70 0.469 0.347 0.379 0.634 0.675 0.094
EUNet 40 0.756 0.681 0.730 0.831 0.863 0.045
ACSNet 39 0.716 0.649 0.697 0.829 0.839 0.039
DCRNet 61 0.704 0.631 0.684 0.821 0.840 0.052
MSEG 43 0.735 0.666 0.724 0.834 0.859 0.038
PraNet 37 0.712 0.640 0.699 0.820 0.847 0.043
SANet 19 0.753 0.670 0.726 0.837 0.869 0.043
MSNet 16 0.755 0.678 0.737 0.836 0.883 0.041
Swin-UNet 71 0.749 0.637 0.743 0.821 0.862 0.033
CFANet 2 0.743 0.665 0.728 0.835 0.898 0.039
CGMA-Net 72 0.780 0.698 0.757 0.893
ADSANet 0.752 0.677 0.745 0.832 0.880 0.039

Table 7.

Quantitative results of the test dataset Kvasir-SEG.

Model Metrics
mDice mIoU Fβw Sα Eφ MAE
UNet 10 0.818 0.746 0.794 0.858 0.881 0.055
UNet++69 0.821 0.743 0.808 0.862 0.886 0.048
SFA 70 0.723 0.611 0.670 0.782 0.834 0.075
EUNet 40 0.908 0.854 0.893 0.917 0.951 0.028
ACSNet 39 0.898 0.838 0.882 0.920 0.941 0.032
DCRNet 61 0.886 0.825 0.868 0.911 0.933 0.035
MSEG 43 0.897 0.839 0.885 0.912 0.942 0.028
PraNet 37 0.898 0.840 0.885 0.915 0.944 0.030
SANet 19 0.904 0.847 0.892 0.915 0.949 0.028
MSNet 16 0.907 0.862 0.893 0.922 0.944 0.028
Swin-UNet 71 0.913 0.851 0.910 0.917 0.954 0.022
CFANet 2 0.915 0.861 0.903 0.924 0.962 0.023
CGMA-Net 72 0.907 0.854 0.895 0.948
ADSANet 0.915 0.865 0.910 0.924 0.949 0.023
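For reference, the mDice, mIoU, and MAE values in Tables 3, 4, 5, 6 and 7 are per-image scores averaged over each test set; Sα and Eφ follow the structure measure67 and the enhanced-alignment measure68 and are omitted from the sketch. The snippet below is a minimal NumPy sketch of the common Dice/IoU/MAE definitions, assuming a probability map and a binary ground-truth mask; the 0.5 threshold and the smoothing constant are assumptions, not the authors' exact evaluation code.

```python
import numpy as np

def dice_iou_mae(pred_prob: np.ndarray, gt: np.ndarray, thr: float = 0.5, eps: float = 1e-8):
    """Dice and IoU on a thresholded mask, MAE on the raw probabilities,
    for a single image; dataset-level scores are means over all test images."""
    pred_bin = (pred_prob >= thr).astype(np.float64)
    gt = gt.astype(np.float64)
    inter = (pred_bin * gt).sum()
    dice = (2.0 * inter + eps) / (pred_bin.sum() + gt.sum() + eps)
    iou = (inter + eps) / (pred_bin.sum() + gt.sum() - inter + eps)
    mae = np.abs(pred_prob - gt).mean()
    return dice, iou, mae

# Toy example with random inputs
pred = np.random.rand(352, 352)                        # probability map
mask = (np.random.rand(352, 352) > 0.7).astype(float)  # binary ground truth
print(dice_iou_mae(pred, mask))
```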

Tables 3, 4, 5, 6 and 7 show that, apart from SFA, the specialized polyp segmentation networks all improve upon the traditional U-shaped networks UNet and UNet++, confirming the benefit of strengthening the interaction of feature information flow within the polyp segmentation model. Among these networks, ADSANet outperforms the others on all metrics for every dataset except ColonDB, demonstrating its superiority. Table 6 shows that ADSANet, EUNet, SANet, and MSNet perform very similarly on ColonDB in terms of mDice and mIoU: EUNet achieves mDice improvements of 0.3%, 0.1%, and 0.4% over SANet, MSNet, and ADSANet, respectively, and mIoU improvements of 1.1%, 0.3%, and 0.4%. ADSANet nevertheless achieves the best MAE on this dataset, indicating more accurate probability inference.

On the ETIS dataset (Table 3), ADSANet achieves the highest performance, followed by SANet and CFANet. Compared with SANet, ADSANet improves mDice and mIoU by 6.3% and 8.9%, respectively, and it surpasses MSNet by 9.4% in mDice and 7.9% in mIoU. Swin-UNet, a representative Transformer-based model, performs moderately on ETIS but still trails ADSANet by 4.2% in mDice and 9.6% in mIoU. As Tables 3, 4, 5, 6 and 7 show, SANet improves more markedly over MSNet on ETIS, whereas on ClinicDB, ColonDB, and Kvasir-SEG the two perform similarly; on Endoscene, SANet outperforms MSNet by 1.9% in mDice. This confirms the effectiveness of introducing color transfer to handle significant variations in sample color distributions. Across the five datasets, SFA is the weakest in terms of mDice and mIoU; on ETIS, the gaps between SFA and ADSANet reach 51.6% and 52.6%, respectively, while the gaps on the other four datasets are considerably smaller, indicating that ETIS poses the greatest challenge under the current training and testing configuration. Overall, ETIS exhibits larger variations in color distribution and polyp morphology relative to the training images, making it difficult for most networks to adapt; the consistently weaker performance of most methods on ETIS underscores the challenge posed by polyp morphology variations and the color shifts caused by different imaging devices and working environments. ADSANet's performance on ETIS validates its ability to handle significant variations in sample feature distributions by incorporating color transfer and combining the ADFM and SAM dual up-sampling structures.

According to Table 4, MSEG achieves acceptable performance on ClinicDB. In comparison, ADSANet outperforms SANet by 1.8% and 2.9% in mDice and mIoU, respectively, and improves upon MSNet by 1.3% and 0.9%. ADSANet also achieves favorable MAE results on this dataset and performs slightly better than CFANet.
On ClinicDB, ADSANet also outperforms Swin-UNet by 2.5% and 6.3% in mDice and mIoU, respectively, suggesting that although Swin-UNet benefits from its global receptive field, it is still surpassed by ADSANet on this dataset. On Endoscene, CFANet, SANet, and ADSANet are the top-performing networks: ADSANet surpasses SANet and MSNet by 2.1% and 4.0% in mDice and by 2.9% and 3.7% in mIoU, respectively, and exceeds Swin-UNet by 3.2% in mDice and 4.8% in mIoU. These results point to stronger boundary modeling and fine-detail recovery by ADSANet in complex scenes, and its advantage on Endoscene is larger than on ClinicDB. On Kvasir-SEG, ADSANet performs best, followed by CFANet, Swin-UNet, and EUNet with slightly lower results. On ColonDB, EUNet holds a narrow advantage, and Swin-UNet's mDice is very close to that of ADSANet; the key-feature focusing mechanism of EUNet and the Transformer-based U-shaped architecture prove effective on these two datasets. Compared with MSNet on Kvasir-SEG, ADSANet improves on all metrics, although the gains are modest. A broader look at CGMA-Net shows that, although it outperforms ADSANet by 2.8% in mDice on ColonDB, it is generally weaker than ADSANet on the other datasets; notably, ADSANet surpasses CGMA-Net in mDice by 9.5% on ETIS and 4.4% on Endoscene, a significant advantage. Overall, the performance of the different algorithms across the datasets, together with ADSANet's demonstrated advantages, validates that a network built around ADFM and SAM as core modules can adapt effectively to polyp segmentation; its superior feature extraction and analysis capabilities become especially evident on datasets whose feature distributions deviate more strongly from the training set.

Table 8 reports the mean Dice (mDice) and its standard deviation (SD) for each model, computed over six independent training trials. All models exhibit good stability, with CGMA-Net and ADSANet demonstrating relatively better stability.

Table 8.

Mean Dice with standard deviation (mDice ± SD) of the compared models.

Model Dataset
ETIS ClinicDB Endoscene ColonDB Kvasir-SEG
UNet 10 0.398 ± 0.033 0.823 ± 0.047 0.710 ± 0.049 0.512 ± 0.034 0.818 ± 0.039
UNet++69 0.401 ± 0.057 0.794 ± 0.044 0.707 ± 0.053 0.483 ± 0.037 0.821 ± 0.040
SFA 70 0.297 ± 0.025 0.700 ± 0.054 0.468 ± 0.050 0.469 ± 0.037 0.723 ± 0.052
EUNet 40 0.687 ± 0.039 0.902 ± 0.048 0.837 ± 0.049 0.756 ± 0.040 0.908 ± 0.042
ACSNet 39 0.578 ± 0.035 0.882 ± 0.048 0.863 ± 0.055 0.716 ± 0.040 0.898 ± 0.045
DCRNet 61 0.556 ± 0.039 0.896 ± 0.049 0.856 ± 0.052 0.704 ± 0.039 0.886 ± 0.043
MSEG 43 0.700 ± 0.039 0.909 ± 0.048 0.874 ± 0.051 0.735 ± 0.039 0.897 ± 0.041
PraNet 37 0.628 ± 0.036 0.899 ± 0.048 0.871 ± 0.051 0.712 ± 0.038 0.898 ± 0.041
SANet 19 0.750 ± 0.047 0.916 ± 0.049 0.888 ± 0.054 0.753 ± 0.040 0.904 ± 0.042
Swin-UNet 71 0.771 ± 0.024 0.909 ± 0.004 0.877 ± 0.014 0.749 ± 0.021 0.913 ± 0.002
CGMA-Net 72 0.718 ± 0.021 0.927 ± 0.011 0.865 ± 0.010 0.780 ± 0.001 0.907 ± 0.004
ADSANet 0.813 ± 0.020 0.934 ± 0.004 0.909 ± 0.005 0.752 ± 0.009 0.915 ± 0.004
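Each entry of Table 8 has the form mean ± SD over the six training trials. The snippet below sketches this aggregation with purely hypothetical per-trial mDice values; whether the sample or population standard deviation was used is not stated, so the sample version is assumed here.

```python
import numpy as np

# Hypothetical per-trial mDice scores for one model on one dataset
# (six independent training runs, as in Table 8).
trial_mdice = np.array([0.810, 0.815, 0.798, 0.835, 0.805, 0.823])
print(f"{trial_mdice.mean():.3f} ± {trial_mdice.std(ddof=1):.3f}")  # -> 0.814 ± 0.013
```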

Figure 6 presents inference results on various testing images. Polyps display diverse morphologies and substantial variations in size, and some polyp boundaries are indistinct, posing substantial challenges for segmentation algorithms. In the samples of Fig. 6a and f, the boundaries between the polyp and the surrounding intestinal wall are unclear, which interferes with most algorithms: UNet, SFA, EUNet, ACSNet, PraNet, SANet, MSNet, Swin-UNet, and CFANet all misclassify normal intestinal wall regions as polyp tissue in these samples, and although ADSANet performs relatively well, it still exhibits minor errors in certain areas. In Fig. 6b, c, e, and h, the polyp regions are generally small, particularly in samples (e) and (h), where the polyps are both small and inconspicuous. Only SANet, MSNet, and ADSANet successfully detect the polyp in sample (e), with ADSANet showing a smaller false-detection area than the other algorithms. Owing to the intestinal wall morphology and blood vessel distribution, some normal intestinal wall regions are erroneously identified as polyps; this misclassification is particularly prominent for SFA, EUNet, and DCRNet in certain samples. In sample (h), the polyp region is extremely small, the intestinal walls are densely folded, and their morphology resembles a raised polyp; consequently, all algorithms except ADSANet fail to detect the polyp, whereas ADSANet accurately identifies the small polyp with no false positives. In sample (b), the polyp is relatively discernible, but the boundary between the polyp and the intestinal wall is unclear. All algorithms detect the polyp region, but UNet and UNet++ identify only a small fraction of it; SFA produces numerous false detections, and EUNet, DCRNet, and MSNet show significant false-detection areas in the closely connected intestinal wall region, while PraNet, SANet, and ADSANet perform better, with ADSANet and CFANet showing superior results. Sample (c) contains a flat adenoma that is challenging to identify with the naked eye; EUNet, PraNet, SANet, Swin-UNet, CFANet, and ADSANet successfully detect the polyp region, although PraNet has relatively larger areas of missed detection. In samples (f) and (i), raised tissues closely resemble polyps: all algorithms detect the polyp regions, but most also mistake the raised tissue for polyps. In samples (d) and (j), the polyp regions are extensive. In sample (d), the main challenge is a depression in the lower edge of the polyp combined with unclear boundaries; only SFA and ADSANet detect the polyp region without excessive extension at the lower edge, although SFA misses a significant part of the polyp's lower edge. In sample (j), the polyp contains rich internal texture and edge information, which easily misleads the algorithms and results in incomplete extraction; apart from ADSANet, CFANet, and Swin-UNet, all other algorithms are affected by this internal edge information, leading to varying degrees of missed detection, and UNet and UNet++ are additionally influenced by interference from the intestinal wall region. Overall, ADSANet demonstrates superior performance in polyp region extraction in the depicted scenarios.
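The color coding used in Fig. 6 (green for correctly segmented polyp pixels, yellow for missed polyp pixels, red for wrong predictions) can be reproduced with a simple mask comparison. The function below is an illustrative sketch that assumes binary prediction and ground-truth arrays; it is not the authors' visualization code.

```python
import numpy as np

def error_overlay(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Color-code a binary prediction against the ground truth, following the
    convention of Fig. 6: green = correct polyp region (TP),
    yellow = missed polyp region (FN), red = wrong prediction (FP)."""
    h, w = gt.shape
    vis = np.zeros((h, w, 3), dtype=np.uint8)
    vis[(pred == 1) & (gt == 1)] = (0, 255, 0)     # green: true positive
    vis[(pred == 0) & (gt == 1)] = (255, 255, 0)   # yellow: false negative
    vis[(pred == 1) & (gt == 0)] = (255, 0, 0)     # red: false positive
    return vis

# Example with random binary masks
gt = (np.random.rand(352, 352) > 0.8).astype(np.uint8)
pred = (np.random.rand(352, 352) > 0.8).astype(np.uint8)
overlay = error_overlay(pred, gt)   # (352, 352, 3) RGB image
```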

To analyze the computational complexity and inference efficiency of ADSANet, Table 9 provides a quantitative comparison of inference speed, floating point operations, and parameter size against several representative algorithms. All experiments in Table 9 were conducted on an NVIDIA RTX 4090 GPU with 20 GB memory using PyTorch 2.3.1, with a test image resolution of 352 × 352. Inference speed is measured in frames per second (FPS), floating point operations (FLOPs) in Giga (G), and parameters (Params) in millions (M); FLOPs indicate the computational complexity of the model, while Params directly reflect its size. Because ADSANet's differential feature extraction is derived from MSNet with additional optimizations, MSNet is a natural reference point when analyzing complexity and efficiency. The comparison shows that ADSANet not only outperforms MSNet in segmentation accuracy (Tables 3, 4, 5, 6 and 7) but also maintains a similar model size (Table 9). Although ADSANet has nearly twice the FLOPs of MSNet, its FPS is significantly higher, demonstrating superior efficiency during inference. Compared with EUNet and PraNet, ADSANet also holds an advantage in both parameter size and inference speed. Relative to the well-performing CFANet, the model size increases slightly, but the computational complexity is significantly reduced. Swin-UNet has the largest model size among the compared models, and its inference speed is also lower than that of ADSANet. As shown in Table 9, our method achieves an average inference speed of 61.10 FPS, well above the real-time processing threshold; although CFANet performs comparably to ADSANet on certain datasets, ADSANet delivers nearly three times the execution efficiency, highlighting its practicality. Overall, benefiting from its architectural design, ADSANet achieves a good balance between accuracy, computational complexity, and inference efficiency. In a typical colonoscopy workflow, the time-consuming component of the imaging pipeline is the image analysis module, since image acquisition and display introduce minimal latency; the inference efficiency of the segmentation model therefore plays a pivotal role in enabling real-time feedback. With an inference speed of 61.10 FPS, ADSANet comfortably exceeds the real-time processing threshold, confirming its clinical applicability from the perspective of real-time deployment.

Table 9.

Model size and inference time.

Model EUNet 40 PraNet 37 MSNet 16 Swin-UNet 71 CFANet 2 ADSANet
Speed (FPS) 15.34 19.35 25.08 21.58 21.58 61.10
FLOPs (G) 23.15 13.15 17.00 18.74 55.36 36.70
Params (M) 31.36 30.50 27.69 36.43 25.24 28.97
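The FPS and Params columns of Table 9 can be reproduced with a simple timing and parameter-counting routine such as the sketch below. The warm-up count, number of timed runs, and batch size of 1 are assumptions; FLOPs would additionally require an external profiler, and the authors' own scripts may be available in the released repository.

```python
import time
import torch

def measure_speed_and_size(model: torch.nn.Module, input_size=(1, 3, 352, 352),
                           n_warmup: int = 20, n_runs: int = 100):
    """Rough protocol for FPS and parameter count (sketch only)."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.eval().to(device)
    x = torch.randn(*input_size, device=device)
    with torch.no_grad():
        for _ in range(n_warmup):          # warm-up to stabilize clocks/caches
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    fps = n_runs / elapsed
    params_m = sum(p.numel() for p in model.parameters()) / 1e6
    return fps, params_m

# Usage (my_model is any hypothetical segmentation network):
# fps, params_m = measure_speed_and_size(my_model)
```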

The performance improvements of ADSANet over baseline models can be attributed to two key innovations. First, the proposed color exchange strategy suppresses irrelevant regions and transforms color distributions to enhance generalization under color-inconsistent scenarios. Second, the ADFM module explicitly models differential features between adjacent layers, enabling more effective inter-layer feature integration. Combined with the shallow attention module, ADSANet achieves superior segmentation accuracy and robustness across diverse datasets.
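As an illustration of what exchanging color distributions between two colonoscopy images can look like in code, the sketch below performs classical Reinhard-style statistics matching in LAB space with OpenCV. It conveys only the general idea of transforming color distributions; the paper's color exchange strategy additionally suppresses irrelevant regions before the transfer, which this sketch does not reproduce.

```python
import numpy as np
import cv2  # OpenCV, used here only for color-space conversion

def reinhard_color_transfer(content_bgr: np.ndarray, style_bgr: np.ndarray) -> np.ndarray:
    """Move the content image's per-channel LAB statistics towards those of the
    style image (classical Reinhard color transfer, illustration only)."""
    src = cv2.cvtColor(content_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    ref = cv2.cvtColor(style_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    src_mean, src_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1)) + 1e-6
    ref_mean, ref_std = ref.mean(axis=(0, 1)), ref.std(axis=(0, 1))
    out = (src - src_mean) / src_std * ref_std + ref_mean
    out = np.clip(out, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_LAB2BGR)

# Usage: both inputs are uint8 BGR images, e.g. loaded with cv2.imread(...)
```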

Failure cases

Although our approach achieves advanced performance, it still has some limitations. Figure 7 displays some challenging examples of failed polyp region detection.

Fig. 7.

Failure cases. Heatmaps of samples for the different models. Examples (a) and (e) are from the ETIS dataset, (b) and (d) from the ColonDB dataset, and (c) from the ClinicDB dataset.

The samples presented in the figure are all highly challenging, and every algorithm exhibits some degree of misdetection in these scenarios. In sample (a), the transition between the polyp and the intestinal wall is smooth, and the morphology of the connecting region is similar to that of the polyp; none of the algorithms accurately extracts the polyp region, resulting in significant differences between the extracted regions and the ground truth. In sample (b), the polyp region is small and its edges are not well defined. Among the algorithms, Swin-UNet performs the best, and ADSANet also produces detection results close to the ground truth. Although MSNet detects a portion of the polyp region, it mistakenly identifies food residue in the intestine as polyp regions, and EUNet, MSEG, and SANet exhibit similar misidentifications to varying degrees. Sample (c) contains two polyps: the polyp in the middle region has more distinct boundaries, whereas the polyp in the lower right area lies close to the intestinal wall and is difficult to identify. SANet, MSNet, and ADSANet successfully detect the polyp in the middle region; apart from SFA, which detects the polyp in the lower right corner, the remaining algorithms fail to detect it, and Fig. 7 also shows that SFA generates a large number of false detections. Sample (d) includes ten polyps, representing a case with multiple polyps. UNet fails to accurately detect the polyp regions in this sample. ADSANet accurately detects five polyps but mistakenly labels an area that closely resembles a polyp as a polyp; Swin-UNet also detects five polyps but, compared with ADSANet, misses a smaller one. UNet, SFA, MSEG, and SANet all produce false detections in this area. In sample (e), there are intestinal wall folds at the left edge of the image and raised tissues that resemble polyp regions; most algorithms detect the polyp, but with varying degrees of false detection.

Limitations and future work

The experimental results indicate that ADSANet performs well when dealing with significant color feature distribution variations between the training and testing sets. However, when the color distribution differences are small, the color transfer module may introduce some interference, affecting ADSANet’s advantages. Overall, ADSANet effectively addresses the challenges by leveraging the fusion of differential and convolutional features and incorporating feature attention mechanisms. Nevertheless, it may still encounter certain degrees of false detections when confronted with challenges such as unclear polyp edges, multiple polyps in the same view, and the presence of polyp-like protruding intestinal wall tissues. In future research, we will focus on enhancing the network’s capability to extract subtle edge features and improving its ability to suppress polyp-like regions and detect multiple polyps simultaneously. Furthermore, the processing of high-resolution images presents a challenge as they need to be resized to smaller dimensions. High-resolution images contain valuable texture information that can contribute to more accurate edge extraction. Hence, investigating how to tackle polyp extraction in high-resolution images will be a crucial aspect of our future work.

Conclusion

We propose a deep adjacent-differential network with shallow attention for polyp segmentation to address two limitations of existing polyp extraction models: susceptibility to variations in image color distribution, and weakened interlayer complementarity caused by neglecting the fusion of differential and convolutional features. A color transfer strategy is introduced as an important module of our approach; region suppression is incorporated during color exchange so that the transferred images remain close to the originals and image distortion is avoided. ADFM integrates interlayer differential features with the current layer's encoding features, progressively refining regional information through layer-by-layer propagation, while SAM provides feature guidance for the final extraction of polyp regions. Overall, ADSANet outperforms most state-of-the-art CNN-based methods in various respects, including handling significant color distribution changes, small polyp regions, multiple polyps, and blurred edges.

Acknowledgements

This research was funded by the Zhejiang Provincial Natural Science Foundation of China under Grant LZ24F020006, the China Postdoctoral Science Foundation under Grant 2023M733161, the Zhejiang Provincial Postdoctoral Science Foundation under Grant ZJ2022066, and the National Natural Science Foundation of China under Grants 61603258, 62271321, 62002226, and 61802347.

Author contributions

Keli Hu: methodology, formal analysis, writing, validation, data curation. Chen Wang: validation, data curation. Hancan Zhu: data curation. Liping Zhao: validation, data curation. Chao Fu: data curation. Weijun Yang: supervision. Wensheng Pan: supervision, writing-review.

Data availability

The data and code are available at https://github.com/jasonhenrry/ADSANet.

Declarations

Competing interests

The authors declare no conflict of interest related to this article.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Liping Zhao, Email: zhaoliping_jian@126.com.

Wensheng Pan, Email: wspan223@163.com.

References

  • 1.Yao, L. et al. Effect of an artificial intelligence-based quality improvement system on efficacy of a computer-aided detection system in colonoscopy: a four-group parallel study. Endoscopy54, 757–768 (2022). [DOI] [PubMed] [Google Scholar]
  • 2.Zhou, T. et al. Cross-level feature aggregation network for polyp segmentation. Pattern Recogn.140, 109555 (2023). [Google Scholar]
  • 3.Corley, D. A. et al. Adenoma detection rate and risk of colorectal cancer and death. New Engl. J. Med.370, 1298–1306 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Ahmad, O. F. et al. Artificial intelligence and computer-aided diagnosis in colonoscopy: Current evidence and future directions. Lancet Gastroenterol. Hepatol.4, 71–80 (2019). [DOI] [PubMed] [Google Scholar]
  • 5.le Clercq, C. M. et al. Postcolonoscopy colorectal cancers are preventable: a population-based study. Gut63, 957–963 (2014). [DOI] [PubMed] [Google Scholar]
  • 6.Hassan, C. et al. Performance of artificial intelligence in colonoscopy for adenoma and polyp detection: a systematic review and meta-analysis. Gastrointest Endosc.93, 77–85 (2021). [DOI] [PubMed] [Google Scholar]
  • 7.Mei, J. et al. A survey on deep learning for polyp segmentation: techniques, challenges and future trends. Vis. Intell.3, 1–20 (2025). [Google Scholar]
  • 8.Mori, Y., Kudo, S., Berzin, T. M., Misawa, M. & Takeda, K. Computer-aided diagnosis for colonoscopy. Endoscopy49, 813–819 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Long, J., Shelhamer, E. & Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3431–3440 (2015). [DOI] [PubMed]
  • 10.Ronneberger, O., Fischer, P. & Brox, T. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention. 234–241 (Springer, 2015).
  • 11.Badrinarayanan, V., Kendall, A. & Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell.39, 2481–2495 (2017). [DOI] [PubMed] [Google Scholar]
  • 12.Yue, G. et al. Boundary constraint network with cross layer feature integration for polyp segmentation. IEEE J. Biomed. Health Inform.26, 4090–4099 (2022). [DOI] [PubMed] [Google Scholar]
  • 13.Fang, Y., Zhu, D., Yao, J., Yuan, Y. & Tong, K.-Y. ABC-Net: Area-boundary constraint network with dynamical feature selection for colorectal polyp segmentation. IEEE Sens. J.21, 11799–11809 (2020). [Google Scholar]
  • 14.Yeung, M., Sala, E., Schönlieb, C.-B. & Rundo, L. Focus U-Net: A novel dual attention-gated CNN for polyp segmentation during colonoscopy. Comput. Biol. Med.137, 104815 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Wang, S. et al. Multi-scale context-guided deep network for automated lesion segmentation with endoscopy images of gastrointestinal tract. IEEE J. Biomed. Health Inform.25, 514–525 (2020). [DOI] [PubMed] [Google Scholar]
  • 16.Zhao, X., Zhang, L. & Lu, H. Automatic polyp segmentation via multi-scale subtraction network. In International Conference on Medical Image Computing and Computer Assisted Intervention. 120–130 (Springer, 2021).
  • 17.Zhao, G., Chen, S., Wang, X., Yan, F. & Guo, M. FTMSNet: Towards boundary-aware polyp segmentation framework based on hybrid Fourier Transform and Multi-scale Subtraction. In 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2936–2943 (IEEE, 2024).
  • 18.Wu, R., Liang, P., Huang, Y., Chang, Q. & Yao, H. Automatic segmentation of hemorrhages in the ultra-wide field retina: multi-scale attention subtraction networks and an ultra-wide field retinal hemorrhage dataset. IEEE J. Biomed. Health Inf. 28, 7369–7381 (2024). [DOI] [PubMed] [Google Scholar]
  • 19.Wei, J. et al. Shallow attention network for polyp segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention. 699–708 (Springer, 2021).
  • 20.Bernal, J., Sánchez, J. & Vilarino, F. Towards automatic polyp detection with a polyp appearance model. Pattern Recogn.45, 3166–3182 (2012). [Google Scholar]
  • 21.Ganz, M., Yang, X. & Slabaugh, G. Automatic segmentation of polyps in colonoscopic narrow-band imaging data. IEEE Trans. Bio-med. Eng.59, 2144–2151 (2012). [DOI] [PubMed] [Google Scholar]
  • 22.Bae, S. H. & Yoon, K. J. Polyp detection via imbalanced learning and discriminative feature learning. IEEE Trans. Med. Imaging34, 2379–2393 (2015). [DOI] [PubMed] [Google Scholar]
  • 23.Bernal, J. et al. WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation versus saliency maps from physicians. Comput. Med. Imaging Gr.43, 99–111 (2015). [DOI] [PubMed] [Google Scholar]
  • 24.Tajbakhsh, N., Gurudu, S. R. & Liang, J. Automated polyp detection in colonoscopy videos using shape and context information. IEEE Trans. Med. Imaging35, 630–644 (2016). [DOI] [PubMed] [Google Scholar]
  • 25.Yuan, Y., Li, D. & Meng, M. Q. H. Automatic polyp detection via a novel unified bottom-up and top-down saliency approach. IEEE J. Biomed. Health Inform.22, 1250–1260 (2018). [DOI] [PubMed] [Google Scholar]
  • 26.Zhang, S., Zhang, J., He, F. & Hou, N. DRDDN: Dense residual and dilated dehazing network. Vis. Comput.39, 953–969 (2023). [Google Scholar]
  • 27.Zhang, S. et al. Semantic-aware dehazing network with adaptive feature fusion. IEEE Trans. Cybern. 1–14 (2021). [DOI] [PubMed]
  • 28.Hu, K. et al. PPNet: Pyramid pooling based network for polyp segmentation. Comput. Biol. Med.160, 107028 (2023). [DOI] [PubMed] [Google Scholar]
  • 29.Li, Q., Li, L., Wang, W., Li, Q. & Zhong, J. A comprehensive exploration of semantic relation extraction via pre-trained CNNs. Knowl.-Based Syst.194, 105488 (2020). [Google Scholar]
  • 30.Akbari, M. et al. Polyp segmentation in colonoscopy images using fully convolutional network. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 69–72 (2018). [DOI] [PubMed]
  • 31.Brandao, P. et al. Towards a computed-aided diagnosis system in colonoscopy: Automatic polyp segmentation using convolution neural networks. J. Med. Robot. Res.3, 1840002 (2018). [Google Scholar]
  • 32.Jothiraj, S. & Kandaswami, J. A. Localization and semantic segmentation of polyp in an effort of early diagnosis of colorectal cancer from wireless capsule endoscopy images. In 2022 Seventh International Conference on Parallel, Distributed and Grid Computing (PDGC). 749–754 (IEEE, 2022).
  • 33.Wang, P. et al. Development and validation of a deep-learning algorithm for the detection of polyps during colonoscopy. Nat. Biomed. Eng.2, 741–748 (2018). [DOI] [PubMed] [Google Scholar]
  • 34.Jha, D. et al. A comprehensive study on colorectal polyp segmentation with ResUNet++, conditional random field and test-time augmentation. IEEE J. Biomed. Health Inform.25, 2029–2040 (2021). [DOI] [PubMed] [Google Scholar]
  • 35.Yu, L., Chen, H., Dou, Q., Qin, J. & Heng, P. A. integrating online and offline three-dimensional deep learning for automated polyp detection in colonoscopy videos. IEEE J. Biomed. Health Inform.21, 65–75 (2017). [DOI] [PubMed] [Google Scholar]
  • 36.Banik, D., Roy, K., Bhattacharjee, D., Nasipuri, M. & Krejcar, O. Polyp-Net: A multimodel fusion network for polyp segmentation. IEEE Trans. Instrum. Meas.70, 1–12 (2020). [Google Scholar]
  • 37.Fan, D.-P. et al. Pranet: Parallel reverse attention network for polyp segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention. 263–273 (Springer, 2020).
  • 38.Guo, X., Yang, C., Liu, Y. & Yuan, Y. Learn to threshold: ThresholdNet with confidence-guided manifold mixup for polyp segmentation. IEEE Trans. Med. Imaging40, 1134–1146 (2021). [DOI] [PubMed] [Google Scholar]
  • 39.Zhang, R. et al. Adaptive context selection for polyp segmentation. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2020 (eds Martel, A. L. et al.) 253–262 (Springer International Publishing, Cham, 2020). [Google Scholar]
  • 40.Patel, K., Bur, A. M. & Wang, G. Enhanced u-net: A feature enhancement network for polyp segmentation. In 2021 18th Conference on Robots and Vision (CRV). 181–188 (IEEE, 2021). [DOI] [PMC free article] [PubMed]
  • 41.Wickstrøm, K., Kampffmeyer, M. & Jenssen, R. Uncertainty and interpretability in convolutional neural networks for semantic segmentation of colorectal polyps. Med. Image Anal.60, 101619 (2020). [DOI] [PubMed] [Google Scholar]
  • 42.Cheng, M. et al. Learnable Oriented-Derivative Network for Polyp Segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention. 720–730 (Springer, 2021).
  • 43.Huang, C.-H., Wu, H.-Y. & Lin, Y.-L. Hardnet-mseg: A simple encoder-decoder polyp segmentation neural network that achieves over 0.9 mean dice and 86 fps. arXiv preprint arXiv:2101.07172, (2021).
  • 44.Tomar, N. K. et al. DDANet: Dual decoder attention network for automatic polyp segmentation. In International Conference on Pattern Recognition. 307–314 (Springer, 2021).
  • 45.Dong, B. et al. Polyp-PVT: Polyp segmentation with pyramid vision transformers. arXiv preprint arXiv:2108.06932, (2021).
  • 46.Hu, K. et al. Colorectal polyp region extraction using saliency detection network with neutrosophic enhancement. Comput. Biol. Med.147, 105760 (2022). [DOI] [PubMed] [Google Scholar]
  • 47.Hu, K. et al. Online visual tracking of weighted multiple instance learning via neutrosophic similarity-based objectness estimation. Symmetry11, 832 (2019). [Google Scholar]
  • 48.Hu, K. et al. A novel object tracking algorithm by fusing color and depth information based on single valued neutrosophic cross-entropy. J. Intell. Fuzzy Syst.32, 1775–1786 (2017). [Google Scholar]
  • 49.Fan, C., Ye, J., Feng, S., Fan, E. & Hu, K. Multi-criteria decision-making method using heronian mean operators under a bipolar neutrosophic environment. Mathematics7, 97 (2019). [Google Scholar]
  • 50.Liu, P., Han, Q., Wu, T. & Tao, W. Anomaly detection in industrial multivariate time series data with neutrosophic theory. IEEE Internet Things J.10, 13458–13473 (2023). [Google Scholar]
  • 51.Fan, C., Chen, J., Hu, K., Fan, E. & Wang, X. Research on normal pythagorean neutrosophic set choquet integral operator and its application. Comput. Model. Eng. Sci.131, 477–491 (2022). [Google Scholar]
  • 52.Zhou, Q. et al. RMFDNet: Redundant and missing feature decoupling network for salient object detection. Eng. Appl. Artif. Intell.139, 109459 (2025). [Google Scholar]
  • 53.Zhu, M., Chen, Z. & Yuan, Y. DSI-Net: Deep synergistic interaction network for joint classification and segmentation with endoscope images. IEEE Trans. Med. Imaging40, 3315–3325 (2021). [DOI] [PubMed] [Google Scholar]
  • 54.Li, J. et al. PRCNet: A parallel reverse convolutional attention network for colorectal polyp segmentation. Biomed. Signal Process. Control95, 106336 (2024). [Google Scholar]
  • 55.Ahamed, M. F. et al. Detection of various gastrointestinal tract diseases through a deep learning method with ensemble ELM and explainable AI. Expert Syst. Appl.256, 124908 (2024). [Google Scholar]
  • 56.Ahamed, M. F. et al. Automated detection of colorectal polyp utilizing deep learning methods with explainable AI. IEEE Access12, 78074–78100 (2024). [Google Scholar]
  • 57.Ahamed, M. F. et al. Irv2-net: A deep learning framework for enhanced polyp segmentation performance integrating inceptionresnetv2 and unet architecture with test time augmentation techniques. Sensors23, 7724 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Selvaraj, J. & Jayanthy, A. Design and development of artificial intelligence-based application programming interface for early detection and diagnosis of colorectal cancer from wireless capsule endoscopy images. Int. J. Imaging Syst. Technol.34, e23034 (2024). [Google Scholar]
  • 59.Yue, G. et al. Attention-guided pyramid context network for polyp segmentation in colonoscopy images. IEEE T Instrum. Meas.72, 1–13 (2023). [Google Scholar]
  • 60.Gökkan, O. & Kuntalp, M. A new imbalance-aware loss function to be used in a deep neural network for colorectal polyp segmentation. Comput. Biol. Med.151, 106205 (2022). [DOI] [PubMed] [Google Scholar]
  • 61.Yin, Z., Liang, K., Ma, Z. & Guo, J. Duplex contextual relation network for polyp segmentation. In 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI). 1–5 (IEEE, 2022).
  • 62.Liu, G. et al. A coarse-to-fine segmentation frame for polyp segmentation via deep and classification features. Expert Syst. Appl.214, 118975 (2023). [Google Scholar]
  • 63.Selvaraj, J., Umapathy, S. & Rajesh, N. A. Artificial intelligence based real time colorectal cancer screening study: Polyp segmentation and classification using multi-house database. Biomed. Signal Process. Control99, 106928 (2025). [Google Scholar]
  • 64.Silva, J., Histace, A., Romain, O., Dray, X. & Granado, B. Toward embedded detection of polyps in wce images for early diagnosis of colorectal cancer. Int. J. Comput. Assist. Radiol. Surg.9, 283–293 (2014). [DOI] [PubMed] [Google Scholar]
  • 65.Vázquez, D. et al. A benchmark for endoluminal scene segmentation of colonoscopy images. J. Healthc. Eng.2017, 9 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Jha, D. et al. Kvasir-seg: A segmented polyp dataset. In International Conference on Multimedia Modeling. 451–462 (Springer, 2020).
  • 67.Fan, D.-P., Cheng, M.-M., Liu, Y., Li, T. & Borji, A. Structure-measure: A new way to evaluate foreground maps. In Proceedings of the IEEE International Conference on Computer Vision. 4548–4557 (2017).
  • 68.Fan, D.-P. et al. Enhanced-alignment measure for binary foreground map evaluation. arXiv preprint arXiv:1805.10421, (2018).
  • 69.Zhou, Z., Rahman Siddiquee, M. M., Tajbakhsh, N. & Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support 3–11 (Springer, 2018). [DOI] [PMC free article] [PubMed]
  • 70.Fang, Y., Chen, C., Yuan, Y. & Tong, K.-y. Selective feature aggregation network with area-boundary constraints for polyp segmentation. In MICCAI 2019: Medical Image Computing and Computer Assisted Intervention. 302–310 (Springer International Publishing, 2019).
  • 71.Cao, H. et al. Swin-unet: Unet-like pure transformer for medical image segmentation. In European conference on computer vision. 205–218 (Springer, 2022).
  • 72.Zheng, J., Yan, Y., Zhao, L. & Pan, X. CGMA-Net: Cross-level guidance and multi-scale aggregation network for polyp segmentation. IEEE J. Biomed. Health Inform.28, 1424–1435 (2024). [DOI] [PubMed] [Google Scholar]
