Review of application of U-Net and Transformer in colon polyp image segmentation

Yankun SHI; Shilei SUN; Jing LIU; Jingang MA; Ming LI

doi:10.7507/1001-5515.202405039

2025 Dec 25;42(6):1289–1295. [Article in Chinese] doi: 10.7507/1001-5515.202405039

PMCID: PMC12744979 PMID: 41448773

Abstract

Colorectal cancer typically originates from the malignant transformation of colonic polyps, making the automatic and accurate segmentation of colonic polyps crucial for clinical diagnosis. Deep learning techniques such as U-Net and Transformer can effectively extract implicit features from medical images, and thus have significant potential in colonic polyp image segmentation. This paper first introduced commonly used evaluation metrics and datasets for colonic polyp segmentation. It then reviewed the application of segmentation models based on U-Net, Transformer, and their hybrid approaches in this domain. Finally, it summarized the improvement methods, advantages, and limitations of polyp segmentation algorithms, discussed the challenges faced by U-Net- and Transformer-based models, and provided an outlook on future research directions in this field.

Keywords: Colorectal polyp, Medical image segmentation, U-Net, Transformer

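0. Introduction

Colorectal cancer (CRC) is a highly prevalent malignancy worldwide, ranking third in incidence and second in mortality^[1]. As many as 70%–80% of CRC cases arise from the malignant transformation of colonic polyps, and early detection and removal of adenomatous polyps can markedly reduce the risk of developing CRC^[2]. Colonoscopy is currently the main screening method^[3], but the procedure can be affected by several factors, such as the quality of bowel preparation and the similarity between polyps and the surrounding mucosa in color and texture. In addition, unlike lung nodules or liver lesions with clear boundaries, colonic polyps vary widely in morphology and may be either protruding or flat. To improve model robustness and clinical applicability, efficient and accurate segmentation algorithms are urgently needed.

In recent years, deep learning-based models for colonic polyp image segmentation have markedly improved the accuracy of image analysis, as they can automatically learn and extract diverse polyp features from large-scale data^[4]. Several studies have systematically reviewed the relevant algorithms. For example, Kao et al.^[5] described the application of convolutional neural networks (CNN) to colonic polyp segmentation and demonstrated their advantages in accurate segmentation. Sun et al.^[6] discussed in detail the application of U-Net and its variants to colonic polyp segmentation. Xiao et al.^[7] proposed CTNet, a model with a Transformer backbone, and reported that it can effectively detect and segment camouflaged colonic polyps.

Building on these studies, this paper first reviews the evaluation metrics and datasets commonly used for colonic polyp image segmentation. It then summarizes algorithms based on U-Net, Transformer, and their hybrid models, together with their improvement methods, and analyzes their strengths and weaknesses. Finally, it reviews the challenges facing existing research and looks ahead to future directions.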

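1. Evaluation metrics and datasets

1.1. Evaluation metrics

To comprehensively evaluate the performance of improved algorithms in colonic polyp image segmentation, this paper uses three key evaluation metrics: the Dice similarity coefficient (DSC), the mean intersection over union (MIoU), and the mean Dice coefficient (mDice).

(1) DSC measures the overlap between the predicted result and the ground-truth label in colonic polyp segmentation. It ranges from 0 to 1, and a larger value indicates better segmentation. It is computed as:

$$\mathrm{DSC} = \frac{2\,TP}{2\,TP + FP + FN}$$

(2) MIoU measures the overlap between the predicted result and the ground-truth annotation. It ranges from 0 to 1, and a larger value indicates better segmentation. It is computed as:

$$\mathrm{MIoU} = \frac{1}{N}\sum_{i=1}^{N}\frac{TP_i}{TP_i + FP_i + FN_i}$$

(3) mDice is the mean of the DSC over multiple samples and is used to evaluate the overall performance of a model on the colonic polyp segmentation task. A higher value indicates better performance. It is computed as:

$$\mathrm{mDice} = \frac{1}{N}\sum_{i=1}^{N}\frac{2\,TP_i}{2\,TP_i + FP_i + FN_i}$$

In these formulas, TP (true positive) is the number of polyp pixels correctly identified, FP (false positive) is the number of non-polyp pixels incorrectly labeled as polyp, FN (false negative) is the number of polyp pixels that were missed, and TN (true negative) is the number of non-polyp pixels correctly identified. N is the total number of samples, and the subscript i indexes the individual terms of the average.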
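
The three metrics can be checked on small binary masks. The following is a minimal sketch in plain Python, assuming masks are nested lists in which 1 marks a polyp pixel and 0 marks background. The function names (`confusion_counts`, `dice`, `iou`, `mean_metric`) and the small smoothing constant `eps`, which makes two empty masks score 1, are illustrative assumptions and are not taken from the reviewed papers.

```python
from typing import Callable, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask: 1 = polyp pixel, 0 = background


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int, int]:
    """Count TP, FP, FN and TN pixel by pixel for one pair of masks."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def dice(pred: Mask, truth: Mask, eps: float = 1e-8) -> float:
    """DSC = 2TP / (2TP + FP + FN); eps is an assumed smoothing term."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    return (2 * tp + eps) / (2 * tp + fp + fn + eps)


def iou(pred: Mask, truth: Mask, eps: float = 1e-8) -> float:
    """IoU = TP / (TP + FP + FN); eps is an assumed smoothing term."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    return (tp + eps) / (tp + fp + fn + eps)


def mean_metric(metric: Callable[[Mask, Mask], float],
                preds: Sequence[Mask], truths: Sequence[Mask]) -> float:
    """Average a per-sample metric over N samples (mDice or MIoU)."""
    scores = [metric(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    truth = [[1, 0, 0], [0, 1, 1]]
    print(confusion_counts(pred, truth))  # (2, 1, 1, 2)
    print(round(dice(pred, truth), 3), round(iou(pred, truth), 3))
```

Calling `mean_metric(dice, preds, truths)` gives mDice and `mean_metric(iou, preds, truths)` gives MIoU over a list of samples. TN is counted for completeness but does not enter either formula.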

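1.2. Datasets

Several public datasets are available for research on colonic polyp image segmentation. Table 1^[8-11] summarizes basic information on some commonly used datasets, including name, number of samples, resolution, and data type.

Table 1. Summary of the datasets.

Dataset	Number of images	Resolution	Type
CVC-ClinicDB^[8]	612	384 px × 288 px	Still images
CVC-ColonDB^[9]	300	500 px × 574 px	Still images
Kvasir-SEG^[10]	1,000	332 px × 487 px to 1,920 px × 1,072 px	Still images
CVC-EndoSceneStill^[11]	912	384 px × 288 px to 574 px × 500 px	Still images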

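2. Segmentation models based on the U-Net architecture

U-Net, first proposed in 2015, is a CNN architecture for medical image segmentation. It consists of an encoder and a decoder, which perform feature extraction and decoding through a contracting path and an expanding path, respectively (see Supplementary File 1 for the U-Net architecture diagram). What distinguishes U-Net is that it effectively fuses low-resolution and high-resolution feature maps, combining low-level and high-level image features. This design has driven the development of deep learning in medical image segmentation^[12]. Table 2 summarizes studies on improvements based on the U-Net architecture, along with the advantages and limitations of each method.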

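Table 2. Improvement strategies based on U-Net.

Improvement strategy	Model	Improvement	Advantages	Limitations
Attention mechanism	MADoubleU-Net^[13]	Adds spatial and channel attention and a multi-scale selective kernel channel attention module	The multi-scale kernel attention mechanism enlarges the receptive field	Deep and complex network structure; poor generalization
Attention mechanism	Focus U-Net^[14]	Introduces a dual attention-gated mechanism, deep supervision, and a hybrid focal loss function	Enables selective learning of polyp features	The dual attention mechanism and focus gate module lead to high model complexity
Attention mechanism	TGANet^[15]	Introduces a text-guided attention mechanism	Effectively improves segmentation of flat and sessile polyps	Performance is constrained by the quality of the text information
Skip connection	U-Net++^[17]	Uses nested dense skip pathways	Strengthens polyp feature learning and reduces information loss	Prone to feature sparsification and risk of overfitting
Skip connection	DenseNet^[18]	Uses dense skip connections	Achieves efficient feature reuse	Introduces noise interference; poor interpretability
Encoder-decoder	DDANet^[19]	Uses a dual-decoder mechanism and introduces residual learning	Improves the feature learning ability of the main decoder	Segmentation accuracy drops for flat or sessile polyps
Encoder-decoder	ResUNet++^[20]	Adds squeeze-and-excitation modules and propagates the channel attention weights accordingly	The squeeze-and-excitation module reduces redundant information in the feature maps	Information is lost during training
Encoder-decoder	Graft-U-Net^[21]	Combines an encoder-decoder structure with local preprocessing enhancement	Preprocessing improves polyp recognition	Inference time increases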
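
To make the contracting path, expanding path, and skip connections of U-Net concrete, the sketch below traces feature-map shapes through a U-Net-style network without building any layers. It assumes "same" padding, 2 × 2 pooling, channel doubling at each level, and up-convolutions that halve the channel count. These are common implementation choices rather than details given in this review, and the function name `unet_shapes` is hypothetical.

```python
from typing import List, Tuple

Stage = Tuple[str, int, int, int]  # (name, channels, height, width)


def unet_shapes(height: int, width: int,
                base_channels: int = 64, depth: int = 4) -> List[Stage]:
    """Trace feature-map shapes through a U-Net-style encoder-decoder."""
    if height % (2 ** depth) or width % (2 ** depth):
        raise ValueError("height and width must be divisible by 2**depth")

    trace: List[Stage] = []
    skips: List[Tuple[int, int, int]] = []
    c, h, w = base_channels, height, width

    # Contracting path: store each level for the skip connection, then
    # halve the spatial size (pooling) and double the channels.
    for level in range(depth):
        trace.append((f"enc{level}", c, h, w))
        skips.append((c, h, w))
        c, h, w = c * 2, h // 2, w // 2
    trace.append(("bottleneck", c, h, w))

    # Expanding path: upsample, concatenate the stored encoder features
    # (high-resolution, low-level) with the decoder features
    # (low-resolution, high-level), then reduce the channels again.
    for level in reversed(range(depth)):
        skip_c, skip_h, skip_w = skips[level]
        c, h, w = c // 2, h * 2, w * 2
        assert (h, w) == (skip_h, skip_w)
        trace.append((f"cat{level}", c + skip_c, h, w))
        trace.append((f"dec{level}", c, h, w))
    return trace


if __name__ == "__main__":
    # 288 x 384 matches the CVC-ClinicDB resolution listed in Table 1.
    for stage in unet_shapes(288, 384):
        print(stage)
```

The `cat` stages show where low-level and high-level features are combined: each one carries the channels of both the encoder feature map and the upsampled decoder feature map at the same resolution.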

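2.1. Improvement strategies based on attention mechanisms

Although U-Net is a fully convolutional architecture, it struggles to focus on complex or critical regions, and segmentation is particularly difficult when polyps and the background have similar colors. Introducing attention mechanisms strengthens the capture of key features and contextual information, which improves segmentation accuracy. Liu et al.^[13] proposed the multi-attention DoubleU-Net (MADoubleU-Net) model, which adopts the DoubleU-Net architecture, introduces spatial and channel attention in the first U-Net, and inserts a multi-scale channel attention module between the two U-Nets to adaptively extract key features, thereby improving recognition of diverse polyp morphologies. However, the model generalizes poorly. Meanwhile, Yeung et al.^[14] proposed Focus U-Net, a dual attention-gated deep neural network that combines efficient spatial attention and channel-based attention in a single focus gate module, enabling selective learning of polyp features and improving polyp segmentation. However, the model lacks interpretability. Tomar et al.^[15] proposed TGANet, which incorporates a text-guided attention mechanism. By integrating semantic information such as polyp size and number into feature learning, it adapts to cases with multiple polyps and achieves better segmentation of flat and sessile polyps in clinical settings. However, its performance is limited by the quality of the text information.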
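
The general idea of spatial attention, reweighting each pixel so that informative regions are emphasized, can be sketched as follows. This is a generic illustration and not the gate of MADoubleU-Net, Focus U-Net, or TGANet: the per-channel `weights` and `bias` stand in for parameters that a real network would learn, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def spatial_attention(features: FeatureMap, weights: Sequence[float],
                      bias: float = 0.0) -> Tuple[FeatureMap, List[List[float]]]:
    """Gate every pixel with a sigmoid score computed across channels.

    The score at each pixel is a weighted sum over channels (equivalent to
    a 1x1 convolution with one output channel), squashed into (0, 1).
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    gate = [
        [sigmoid(sum(weights[c] * features[c][y][x] for c in range(channels)) + bias)
         for x in range(width)]
        for y in range(height)
    ]
    gated = [
        [[features[c][y][x] * gate[y][x] for x in range(width)]
         for y in range(height)]
        for c in range(channels)
    ]
    return gated, gate


if __name__ == "__main__":
    feats = [[[0.0, 2.0]], [[0.0, 2.0]]]  # 2 channels, 1 x 2 pixels
    gated, gate = spatial_attention(feats, weights=[1.0, 1.0])
    print(gate)   # the second pixel receives a gate value close to 1
    print(gated)
```

Pixels with a high score pass through almost unchanged, while low-scoring pixels are suppressed, which is the behavior attention modules rely on when a polyp resembles the surrounding mucosa.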

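2.2. Improvement strategies based on skip connections

To improve information transfer and gradient flow while strengthening feature fusion, researchers have used skip connections to modify the architecture^[16]. For example, Zhou et al.^[17] proposed U-Net++, which replaces some of the direct skip connections with dense skip connections and adds levels and feature channels, alleviating vanishing gradients and information loss. However, in U-Net++ the number of feature channels in the encoder features gradually decreases as the network deepens, which tends to cause sparsification. Huang et al.^[18] introduced the dense convolutional network (DenseNet), a closely related densely connected design. Unlike the method of Zhou et al., this network introduces multiple convolutional layers in the skip connections and adopts a dense connection strategy that links each layer directly to all preceding layers, so that features are fully propagated and efficiently reused.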
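
The nested dense skip pathways of U-Net++ can be described by listing which feature maps feed each node. In the sketch below, node X(i, j) sits at resolution level i (0 is full resolution) and position j along the skip pathway (0 is the encoder). Each node with j > 0 receives all earlier nodes on the same level plus the upsampled node from the level below. The function name and the string labels are illustrative assumptions.

```python
from typing import List


def unetpp_node_inputs(i: int, j: int, depth: int) -> List[str]:
    """List the inputs of node X(i, j) in a U-Net++-style grid.

    Valid nodes satisfy i >= 0, j >= 0 and i + j <= depth.
    """
    if i < 0 or j < 0 or i + j > depth:
        raise ValueError("node lies outside the U-Net++ grid")
    if j == 0:
        # Encoder column: the first node reads the image, deeper nodes
        # read the downsampled output of the encoder node above.
        return ["input image"] if i == 0 else [f"down(X({i - 1},0))"]
    # Dense skip pathway: every earlier node on the same level ...
    same_level = [f"X({i},{k})" for k in range(j)]
    # ... concatenated with the upsampled node from the level below.
    from_below = f"up(X({i + 1},{j - 1}))"
    return same_level + [from_below]


if __name__ == "__main__":
    for j in range(5):
        print(f"X(0,{j}) <-", unetpp_node_inputs(0, j, depth=4))
```

A node X(0, j) on the top level therefore has j + 1 inputs, which shows how feature reuse grows along the pathway and why the number of concatenated channels, and with it memory use, increases with depth.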

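2.3. Improvement strategies based on the encoder-decoder

Improving the encoder-decoder structure enables efficient extraction and fusion of multi-level features, reduces the noise from redundant connections, and improves overall modeling. Tomar et al.^[19] proposed the dual decoder attention network (DDANet) for automatic polyp segmentation. The network consists of a single encoder shared by two parallel decoders and can segment polyps of different sizes efficiently and accurately. However, it performs less well on flat or sessile polyps. Jha et al.^[20] proposed ResUNet++, which adds a squeeze-and-excitation module after each residual block of the ResUNet encoder and passes on the channel attention weights (see Supplementary File 2 for the architecture diagram). The network uses these weights to filter redundant information out of the decoder feature maps. Fusing the features of the two sides of the network in this way is more effective than fusing them by a single concatenation, but it comes at the cost of some information loss. To address this kind of problem, Ramzan et al.^[21] proposed Graft-U-Net, which uses an encoder-decoder structure to extract and synthesize image features. In the preprocessing stage, a local enhancement technique raises image contrast. The encoder extracts image features and the decoder restores them to the original spatial resolution, achieving effective fusion of information.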
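
A squeeze-and-excitation module of the kind mentioned for ResUNet++ can be sketched in a few lines. The version below is a simplified illustration, not the ResUNet++ implementation: it omits bias terms, and the matrices `w_reduce` and `w_expand` are hypothetical stand-ins for learned fully connected layers.

```python
import math
from typing import List, Sequence, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def squeeze_excite(features: FeatureMap,
                   w_reduce: Sequence[Sequence[float]],
                   w_expand: Sequence[Sequence[float]]
                   ) -> Tuple[FeatureMap, List[float]]:
    """Rescale each channel by a weight derived from global context.

    w_reduce has shape [C_reduced][C]; w_expand has shape [C][C_reduced].
    """
    # Squeeze: global average pooling gives one number per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w_reduce]
    scales = [sigmoid(sum(w * h for w, h in zip(row, hidden)))
              for row in w_expand]
    # Scale: multiply every value in a channel by that channel's weight.
    scaled = [[[v * s for v in row] for row in channel]
              for channel, s in zip(features, scales)]
    return scaled, scales


if __name__ == "__main__":
    feats = [[[1.0, 3.0]], [[0.0, 0.0]]]  # 2 channels, 1 x 2 pixels
    scaled, scales = squeeze_excite(feats, [[1.0, 1.0]], [[1.0], [-1.0]])
    print(scales)
    print(scaled)
```

Channels that receive a weight near 0 are effectively filtered out, which is how such a module can reduce redundant information in the feature maps.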

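3. Segmentation models based on the Transformer architecture

At present, medical image segmentation models commonly use U-Net as the backbone. However, U-Net is limited in its ability to capture long-range dependencies, which constrains further performance gains^[22]. Researchers have therefore introduced the Transformer into medical image segmentation to strengthen the modeling of long-range dependencies and global context. The Transformer consists of stacked layers of self-attention and feed-forward neural networks and has excellent global feature extraction ability (see Supplementary File 3 for the architecture diagram); it performs better than U-Net on complex medical images. Table 3 summarizes studies on improvements based on the Transformer architecture, along with the advantages and limitations of each method.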

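Table 3. Improvement strategies based on Transformer.

Improvement strategy	Model	Improvement	Advantages	Limitations
Encoder-decoder	TransNetR^[23]	Introduces a residual Transformer module in the decoding stage	Improves polyp segmentation accuracy and efficiency	High demand for hardware computing resources
Encoder-decoder	UnX^[24]	Integrates the UnX and FeE modules into a Transformer encoder	Removes background interference and mines boundary regions	The use of dilated convolution increases computational complexity
Encoder-decoder	TCPA-Net^[25]	Combines a Transformer encoder with the PAFM module	Improves feature extraction, noise filtering, and edge recognition	Strong dependence on small-sample data
Improved attention mechanism	Att-PVT^[26]	Combines the hierarchical self-attention of PVT with a multi-level feature fusion module	Balances local detail and global semantic features	Room for improvement on images with blurred edges
Improved attention mechanism	SwinE-Net^[27]	EfficientNet extracts local information and Swin Transformer integrates global information	Improved ability to capture global and local information	Missed detections where the boundary between background mucosa and polyp is blurred
Improved attention mechanism	GRFormer^[28]	Introduces the multi-head self-attention of Swin Transformer	Balances global and local features and accurately captures image edge features	Generalization needs improvement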
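
The self-attention operation at the core of the Transformer can be written compactly as softmax(QKᵀ/√d)V. The sketch below implements a single attention head in plain Python. It is a simplification: the learned projections that produce queries, keys, and values from the input tokens are omitted, and the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def softmax(row: Sequence[float]) -> List[float]:
    """Softmax with the maximum subtracted for numerical stability."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries: Matrix, keys: Matrix,
                   values: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product attention for one head.

    Every query attends to every key, so each output token can draw on
    all positions of the input (the long-range dependency).
    """
    d = len(keys[0])
    outputs: Matrix = []
    weights: Matrix = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wi * v[j] for wi, v in zip(w, values))
                        for j in range(len(values[0]))])
    return outputs, weights


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [1.0, 0.0]]   # identical keys give uniform weights
    v = [[0.0, 2.0], [4.0, 6.0]]
    out, w = self_attention(q, k, v)
    print(w)    # [[0.5, 0.5]]
    print(out)  # [[2.0, 4.0]]
```

Because the score matrix has one entry per pair of tokens, the cost grows quadratically with the number of image patches, which motivates the efficiency-oriented variants discussed in Section 3.2.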

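3.1. Improvement strategies based on the encoder-decoder

To strengthen detail recovery and boundary segmentation while improving model efficiency and generalization, Jha et al.^[23] proposed TransNetR, which uses ResNet-50 as the encoder to strengthen local feature extraction and introduces residual Transformer modules in the decoding stage to improve the recovery of boundary detail. However, the model requires substantial computing resources. Guo et al.^[24] proposed a polyp segmentation framework based on a Transformer encoder, combined with an uncertainty exploration (UnX) module and a multi-scale feature enhancement (FeE) module. The framework extracts global features with the Transformer encoder; UnX expands the ambiguous boundary regions to strengthen boundary recognition, and FeE supplements local and multi-scale features to accommodate polyps of different shapes. It achieved an mDice of 0.933 on the CVC-ClinicDB dataset. Liang et al.^[25] combined a Transformer with a cross-level phase-aware fusion module (PAFM) to propose TCPA-Net. The model uses a Transformer encoder to extract the semantics and spatial detail of the lesion region hierarchically, overcoming the loss of local information, and uses PAFM to aggregate multi-scale contextual information, strengthening global semantic understanding and accurately capturing the detail and edge features of the polyp region. However, the model depends strongly on small-sample data.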

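3.2. Improvement strategies based on attention mechanisms

Although the self-attention mechanism of the Transformer captures global dependencies, it is computationally expensive on high-resolution images, which reduces processing efficiency. Researchers have therefore proposed a range of improvements to raise performance and applicability in segmentation tasks. For example, Liu et al.^[26] proposed Att-PVT, a model based on the pyramid vision transformer (PVT) architecture. Its encoder extracts features with PVT and captures global dependencies through self-attention, and a multi-level feature fusion module integrates shallow boundary information with deep semantics, effectively improving edge segmentation accuracy. In addition, Park et al.^[27] combined Swin Transformer with EfficientNet to propose SwinE-Net (see Supplementary File 4 for the SwinE-Net architecture diagram). Swin Transformer uses self-attention within local windows together with a window-shifting operation, which effectively compensates for the limited receptive field of convolution, while EfficientNet effectively extracts low-level detail. The model achieved an mDice of 0.920 and an MIoU of 0.870 on the Kvasir-SEG dataset. Liang et al.^[28] proposed GRFormer, an adaptive network based on Swin Transformer and graph reasoning. The network combines the multi-head self-attention of Swin Transformer with the strengths of local feature modeling, offers both dynamic weight adjustment and global context modeling, and can markedly improve accuracy and suppress boundary artifacts in the segmentation of colonic polyp lesions. However, the overall generalization ability of the model still needs improvement.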
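
The cost difference between global self-attention and window-based self-attention can be estimated with the operation counts commonly quoted for Swin Transformer: 4hwC² + 2(hw)²C for global multi-head self-attention and 4hwC² + 2M²hwC for attention restricted to M × M windows, where h × w is the number of patches and C the channel dimension. The sketch below evaluates both. The function names are illustrative, the softmax cost is ignored, and the numbers are rough estimates rather than measurements of any model in this review.

```python
def global_msa_flops(h: int, w: int, c: int) -> int:
    """Approximate cost of global multi-head self-attention on h x w patches."""
    tokens = h * w
    return 4 * tokens * c ** 2 + 2 * tokens ** 2 * c


def window_msa_flops(h: int, w: int, c: int, window: int) -> int:
    """Approximate cost when attention is limited to window x window patches."""
    if h % window or w % window:
        raise ValueError("h and w must be divisible by the window size")
    tokens = h * w
    return 4 * tokens * c ** 2 + 2 * window ** 2 * tokens * c


if __name__ == "__main__":
    h = w = 56      # patch grid, an assumed example size
    c = 96          # channel dimension, an assumed example size
    g = global_msa_flops(h, w, c)
    m = window_msa_flops(h, w, c, window=7)
    print(f"global: {g:,}  window: {m:,}  ratio: {g / m:.1f}x")
```

The global term grows with the square of the number of patches, whereas the windowed term grows linearly for a fixed window size, which is why window-based designs scale better to high-resolution colonoscopy frames.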

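4. Segmentation models combining U-Net and Transformer

In recent years, U-Net and Transformer have made notable progress in image segmentation, but the conventional architectures still have shortcomings. Researchers have therefore combined the global perception of the Transformer with the pixel-level spatial feature extraction of U-Net to produce more effective feature representations. Table 4 summarizes studies on improvements that combine the U-Net and Transformer architectures, along with the advantages and limitations of each method.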

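Table 4. Improvement strategies for combining U-Net with Transformer.

Improvement strategy	Model	Improvement	Advantages	Limitations
Multi-scale feature fusion	UViT-Seg^[29]	ViT captures high-level semantic information; U-Net captures low-level shape features	Multiple skip connections fuse multi-layer encoder and decoder features, achieving multi-scale feature integration	Weaker at localizing deep and larger polyps
Multi-scale feature fusion	EG-TransUNet^[30]	Introduces progressive enhancement, channel spatial attention, and semantic guidance attention modules	Improves the feature representation of the polyp region to be segmented	Requires substantial hardware resources for computation
Multi-scale feature fusion	ColonFormer^[31]	Uses a lightweight Transformer as the encoder	Overcomes the local receptive field limitation of U-Net and reduces the data dependence of Transformer	Long-range attention disrupts the structure of image patches
Self-attention mechanism	TransFuse^[32]	Combines a bilinear fusion module with self-attention	Reduces redundancy and vanishing gradients	Artifacts appear at segmentation boundaries
Self-attention mechanism	CSwinDoubleU-Net^[33]	Adds an extra U-Net combined with self-attention	Accurately captures polyp location	Large number of parameters limits real-time segmentation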

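4.1. Improvement strategies based on multi-scale feature fusion

Oukdach et al.^[29] proposed UViT-Seg by combining the global perception of the vision transformer (ViT) with the multi-scale feature fusion of U-Net. The model captures the complex interactions between polyps and surrounding tissue, effectively recovers lost polyp detail, and performs well in edge segmentation. However, it still falls short in localizing large polyps. Meanwhile, Pan et al.^[30] proposed the EG-TransUNet architecture (see Supplementary File 5 for the architecture diagram), which comprises three modules: progressive enhancement, channel spatial attention, and semantic guidance attention. The progressive enhancement module refines edge features by fusing multiple receptive fields and effectively integrates multi-scale semantic and spatial texture information. The architecture not only preserves the key discriminative features of U-Net but also reduces redundant information through efficient fusion of spatial and semantic information. Duc et al.^[31] proposed ColonFormer, a hybrid model that combines the U-Net structure with the Transformer architecture. Its encoder uses a lightweight Transformer as the backbone to model multi-scale global semantic relationships, and its decoder refines local features and boundary segmentation through hierarchical feature fusion, achieving excellent segmentation performance. However, while the long-range attention mechanism captures global dependencies, it tends to lose local structural information.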
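
Multi-scale feature fusion usually means bringing a coarse, semantically rich feature map to the resolution of a fine, detail-rich one and then combining them. The sketch below uses nearest-neighbor upsampling followed by element-wise addition on single-channel maps. This is one simple fusion scheme chosen for illustration and is not the fusion module of UViT-Seg, EG-TransUNet, or ColonFormer; the function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def upsample_nearest(feature: Grid, factor: int = 2) -> Grid:
    """Repeat every value `factor` times along both axes."""
    out: Grid = []
    for row in feature:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse_scales(fine: Grid, coarse: Grid, factor: int = 2) -> Grid:
    """Add an upsampled coarse map to a fine map of matching size."""
    up = upsample_nearest(coarse, factor)
    if len(up) != len(fine) or len(up[0]) != len(fine[0]):
        raise ValueError("upsampled coarse map must match the fine map size")
    return [[f + u for f, u in zip(fine_row, up_row)]
            for fine_row, up_row in zip(fine, up)]


if __name__ == "__main__":
    fine = [[0.1, 0.2, 0.3, 0.4],
            [0.5, 0.6, 0.7, 0.8]]
    coarse = [[1.0, 2.0]]
    for row in fuse_scales(fine, coarse):
        print(row)
```

Concatenation along the channel axis, as in the U-Net skip connection, is the other common choice. Addition keeps the channel count unchanged, while concatenation leaves the weighting of the two sources to the following convolution.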

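4.2. Improvement strategies based on self-attention

Although the Transformer excels at modeling global information, it is computationally expensive and weak on local detail. Suitably designed self-attention mechanisms can strengthen local feature perception while reducing complexity and resource consumption, thereby improving segmentation efficiency and accuracy.

Zhang et al.^[32] proposed TransFuse, which combines a bilinear fusion module with a two-branch design of U-Net and Transformer. Through self-attention, the model effectively fuses global semantics with local detail, improving feature representation and computational efficiency while avoiding the redundant computation and loss of detail associated with very deep networks. However, boundary artifacts and discontinuities inside the target still tend to occur in segmentation. Lin et al.^[33] proposed CSwinDoubleU-Net, a double U-shaped network that fuses local convolutional features with global self-attention features. The network includes a module for fusing convolutional and self-attention features, which preserves spatial information and accurately captures positional information, markedly improving segmentation of polyps of different sizes, shapes, and colors. However, the large number of parameters limits its use in real-time segmentation.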

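5. Visual analysis

To show more directly how different models perform on real images, this section presents visualization results for some of the models and compares them with the manual annotations (ground truth, GT). The segmentation results of typical models are shown in Figure 1.

The comparison shows that the models differ in how they handle complex polyp images. For example, U-Net tends to produce blurred segmentation at polyp boundaries and has limited ability to suppress interfering information in the image, so it easily mistakes background for the target. DenseNet segments boundaries imprecisely and captures small targets poorly. ResUNet++ is not very precise at polyp boundaries. SwinE-Net also misses polyps whose color is close to that of the background. By contrast, ColonFormer, which fuses the two types of model, localizes polyps more accurately and has a clear advantage in boundary sharpness and small-target detection.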
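
A qualitative comparison against GT can be supported by a per-pixel error map that shows where a prediction leaks into the background (false positives) or misses polyp tissue (false negatives). The sketch below is a generic aid for such inspection, not the procedure used to produce Figure 1. The function names and the text symbols used for rendering are arbitrary choices.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # binary mask: 1 = polyp pixel, 0 = background

LABELS = {(1, 1): "TP", (1, 0): "FP", (0, 1): "FN", (0, 0): "TN"}
SYMBOLS = {"TP": "#", "FP": "+", "FN": "-", "TN": "."}


def error_map(pred: Mask, truth: Mask) -> List[List[str]]:
    """Label every pixel as TP, FP, FN or TN."""
    return [
        [LABELS[(int(bool(p)), int(bool(t)))] for p, t in zip(pred_row, truth_row)]
        for pred_row, truth_row in zip(pred, truth)
    ]


def render(labels: List[List[str]]) -> str:
    """Draw the error map as text: # correct polyp, + leak, - miss, . background."""
    return "\n".join("".join(SYMBOLS[label] for label in row) for row in labels)


if __name__ == "__main__":
    pred = [[0, 1, 1, 1],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
    truth = [[0, 0, 1, 1],
             [0, 1, 1, 1],
             [0, 0, 0, 0]]
    print(render(error_map(pred, truth)))
```

Clusters of `+` along the rim of a polyp correspond to the blurred or leaking boundaries described above, while isolated `-` regions indicate missed detections.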

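6. Summary and outlook

This paper has summarized the main improvement strategies for colonic polyp image segmentation models based on U-Net and Transformer, analyzed the advantages and limitations of each method, and identified several challenges that remain in the field: ① Demand for real-time segmentation. High-accuracy U-Net and Transformer models are computationally heavy, which leads to long training times and high hardware requirements and makes real-time processing difficult^[34]. ② Insufficient model interpretability. Transformer-based models capture long-range dependencies and global features in images, but the decision process from input to output is invisible to clinicians, so the prediction process lacks transparency. ③ Challenges in multimodal segmentation. In multimodal segmentation, the information provided by different imaging devices differs markedly: endoscopy shows mucosal detail, and its different modes (white light, narrow-band imaging, fluorescence imaging) provide additional contrast, whereas computed tomography and magnetic resonance imaging mainly reflect deep tissue structure. Differences between modalities in resolution, viewing angle, and contrast make feature fusion harder. The conventional U-Net struggles to capture global relationships, while the Transformer, although able to model long-range dependencies, lacks an adaptive mechanism and is prone to information redundancy or loss. How to balance local detail with global structure and fuse multimodal information efficiently therefore remains the core challenge of multimodal segmentation.

To address these problems, future work could proceed along the following lines: ① Model compression and adaptive computation. Pruning and quantization can compress U-Net and Transformer models and reduce their demand for computing resources. In addition, adaptive computation methods would allow a model to adjust its computational load dynamically according to real-time requirements, improving its applicability in resource-constrained environments. ② Generative adversarial networks to assist decision-making^[35]. Generative adversarial networks could be incorporated into the model so that feedback from the discriminator on structural features guides the model toward segmentation results with greater clinical credibility, which would indirectly mitigate the lack of interpretability and improve reliability and acceptance in actual diagnosis. ③ Multimodal adaptation. U-Net could use modality attention mechanisms and feature alignment methods to adapt better to differences between modalities and thereby improve segmentation accuracy. The Transformer could introduce cross-modal attention mechanisms and modality embedding vectors to strengthen the modeling of relationships between modalities, reducing information redundancy or loss and further improving multimodal segmentation performance.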
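
The two compression techniques named above can be illustrated on a flat list of weights. The sketch below shows unstructured magnitude pruning and symmetric uniform 8-bit quantization with a single scale. Both are textbook variants chosen for brevity, and practical deployments typically prune and quantize per layer or per channel and fine-tune afterwards. All function names are illustrative.

```python
from typing import List, Sequence, Tuple


def magnitude_prune(weights: Sequence[float], sparsity: float) -> List[float]:
    """Set the smallest-magnitude fraction of the weights to zero."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    k = int(len(weights) * sparsity)  # number of weights to remove
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:k])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] with one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -0.1, 0.02, -1.27]
    pruned = magnitude_prune(weights, sparsity=0.5)
    codes, scale = quantize_int8(pruned)
    print(pruned)   # [0.5, 0.0, 0.0, -1.27]
    print(codes)    # [50, 0, 0, -127]
    print(dequantize(codes, scale))
```

Pruning reduces the number of non-zero parameters, and quantization reduces the storage per parameter from 32 bits to 8 at the price of a rounding error of at most half the scale per weight.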

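Declarations

Conflict of interest: All authors declare that they have no conflict of interest.

Author contributions: Yankun SHI was responsible for collecting materials, writing the manuscript, and subsequent revision; Shilei SUN, Jing LIU, and Jingang MA provided guidance on the structure of the article; Ming LI supervised the work and reviewed the manuscript.

The supplementary files for this article are available in the electronic version on the journal's website (biomedeng.cn).

Funding Statement

General Program of the National Natural Science Foundation of China (82174528); Shandong Province Graduate Education Quality Course and Teaching Resource Library Construction Project (SDYKC20047, SDYAL2022041); Ministry of Education Industry-University Cooperative Education Project (220606121142949)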

References

1. Maida M, Dahiya D S, Shah Y R, et al. Screening and surveillance of colorectal cancer: A review of the literature. Cancers. 2024;16(15):2746. doi: 10.3390/cancers16152746.
2. Li G, Liu J, Cao H, et al. Review of deep learning in colon polyp image segmentation [in Chinese]. Journal of Frontiers of Computer Science and Technology. 2025;19(5):1198–1216. doi: 10.3778/j.issn.1673-9418.2408012.
3. Grion B A R, Fonseca P L C, Kato R B, et al. Identification of taxonomic changes in the fecal bacteriome associated with colorectal polyps and cancer: potential biomarkers for early diagnosis. Front Microbiol. 2024;14:1292490. doi: 10.3389/fmicb.2023.1292490.
4. Yin Y, Ma J, Zhang W, et al. From U-Net to Transformer: Progress in the application of hybrid models in medical image segmentation [in Chinese]. Laser & Optoelectronics Progress. 2025;62(2):11–33. doi: 10.3788/LOP240875.
5. Kao W, Li M, Ma J. Review of the application of convolutional neural networks in the auxiliary diagnosis of colorectal polyps [in Chinese]. Journal of Frontiers of Computer Science and Technology. 2024;18(3):627–645. doi: 10.3778/j.issn.1673-9418.2310062.
6. Sun F, Wang Q, Lü Z, et al. Review of the application of deep learning in colon polyp segmentation [in Chinese]. Computer Engineering and Applications. 2023;59(23):15–27.
7. Xiao B, Hu J, Li W, et al. CTNet: Contrastive transformer network for polyp segmentation. IEEE T Cybern. 2024;54(9):5040–5053. doi: 10.1109/TCYB.2024.3368154.
8. Bernal J, Sánchez F J, Fernández-Esparrach G, et al. WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians. Comput Med Imag Grap. 2015;43:99–111. doi: 10.1016/j.compmedimag.2015.02.007.
9. Tajbakhsh N, Gurudu S R, Liang J. Automated polyp detection in colonoscopy videos using shape and context information. IEEE T Med Imaging. 2015;35(2):630–644. doi: 10.1109/TMI.2015.2487997.
10. Jha D, Smedsrud P H, Riegler M A, et al. Kvasir-SEG: A segmented polyp dataset// International Conference on Multimedia Modeling. Cham: Springer International Publishing, 2019: 451-462.
11. Vázquez D, Bernal J, Sánchez F J, et al. A benchmark for endoluminal scene segmentation of colonoscopy images. J Healthc Eng. 2017;2017(1):4037190. doi: 10.1155/2017/4037190.
12. Xu Y, Quan R, Xu W, et al. Advances in medical image segmentation: A comprehensive review of traditional, deep learning and hybrid approaches. Bioengineering. 2024;11(10):1034. doi: 10.3390/bioengineering11101034.
13. Liu J, Liu Q, Li X, et al. An improved double U-shaped network method for colon polyp segmentation [in Chinese]. Acta Optica Sinica. 2021;41(18):72–80.
14. Yeung M, Sala E, Schönlieb C B, et al. Focus U-Net: A novel dual attention-gated CNN for polyp segmentation during colonoscopy. Comput Biol Med. 2021;137:104815. doi: 10.1016/j.compbiomed.2021.104815.
15. Tomar N K, Jha D, Bagci U, et al. TGANet: Text-guided attention for improved polyp segmentation// International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2022: 151-160.
16. Wang H, Cao P, Wang J, et al. UCTransNet: rethinking the skip connections in U-Net from a channel-wise perspective with transformer// Proceedings of the AAAI Conference on Artificial Intelligence. Virtual Conference: AAAI, 2022: 2441-2449.
17. Zhou Z, Rahman Siddiquee M M, Tajbakhsh N, et al. UNet++: A nested U-Net architecture for medical image segmentation// International Workshop on Deep Learning in Medical Image Analysis. Cham: Springer International Publishing, 2018: 3-11.
18. Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 4700-4708.
19. Tomar N K, Jha D, Ali S, et al. DDANet: Dual decoder attention network for automatic polyp segmentation// International Conference on Pattern Recognition. Cham: Springer International Publishing, 2021: 307-314.
20. Jha D, Smedsrud P H, Johansen D, et al. A comprehensive study on colorectal polyp segmentation with ResUNet++, conditional random field and test-time augmentation. IEEE J Biomed Health Inform. 2021;25(6):2029–2040. doi: 10.1109/JBHI.2021.3049304.
21. Ramzan M, Raza M, Sharif M I, et al. Gastrointestinal tract polyp anomaly segmentation on colonoscopy images using graft-U-Net. J Pers Med. 2022;12(9):1459. doi: 10.3390/jpm12091459.
22. Punn N S, Agarwal S. Modality specific U-Net variants for biomedical image segmentation: a survey. Artif Intell Rev. 2022;55(7):5845–5889. doi: 10.1007/s10462-022-10152-1.
23. Jha D, Tomar N K, Sharma V, et al. TransNetR: transformer-based residual network for polyp segmentation with multi-center out-of-distribution testing// Medical Imaging with Deep Learning. Virtual Conference: PMLR, 2024: 1372-1384.
24. Guo Q, Fang X, Wang L, et al. Polyp segmentation of colonoscopy images by exploring the uncertain areas. IEEE Access. 2022;10:52971–52981. doi: 10.1109/ACCESS.2022.3175858.
25. Liang L, He A, Zhu C, et al. Colon polyp segmentation method combining Transformer and cross-level phase awareness [in Chinese]. Journal of Biomedical Engineering. 2023;40(2):234–243. doi: 10.7507/1001-5515.202211067.
26. Liu X, Song S. Attention combined pyramid vision transformer for polyp segmentation. Biomed Signal Proces. 2024;89:105792. doi: 10.1016/j.bspc.2023.105792.
27. Park K B, Lee J Y. SwinE-Net: Hybrid deep learning approach to novel polyp segmentation using convolutional neural network and Swin Transformer. J Comput Des Eng. 2022;9(2):616–632.
28. Liang L, He A, Yang Y, et al. Colorectal polyp segmentation method based on Swin Transformer and graph reasoning [in Chinese]. Chinese Journal of Engineering. 2024;46(5):897–907. doi: 10.13374/j.issn2095-9389.2023.04.21.004.
29. Oukdach Y, Garbaz A, Kerkaou Z, et al. UViT-SEG: an efficient ViT and U-net-based framework for accurate colorectal polyp segmentation in colonoscopy and WCE images. J Imaging Inform Med. 2024;37(5):2354–2374. doi: 10.1007/s10278-024-01124-8.
30. Pan S, Liu X, Xie N, et al. EG-TransUNet: a transformer-based U-Net with enhanced and guided models for biomedical image segmentation. BMC Bioinform. 2023;24(1):85. doi: 10.1186/s12859-023-05196-1.
31. Duc N T, Oanh N T, Thuy N T, et al. Colonformer: An efficient transformer based method for colon polyp segmentation. IEEE Access. 2022;10:80575–80586. doi: 10.1109/ACCESS.2022.3195241.
32. Zhang Y, Liu H, Hu Q. Transfuse: Fusing transformers and cnns for medical image segmentation// International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer International Publishing, 2021: 14-24.
33. Lin Y, Han X, Chen K, et al. CSwinDoubleU-Net: A double U-shaped network combined with convolution and Swin Transformer for colorectal polyp segmentation. Biomed Signal Proces. 2024;89:105749. doi: 10.1016/j.bspc.2023.105749.
34. Mahmood T, Rehman A, Saba T, et al. Recent advancements and future prospects in active deep learning for medical image segmentation and classification. IEEE Access. 2023;11:113623–113652. doi: 10.1109/ACCESS.2023.3313977.
35. Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial networks. Commun ACM. 2020;63(11):139–144. doi: 10.1145/3422622.
