CGP-Net: Cross-modal guided prior network for precise gastric cancer segmentation

Chaoyang GE; Yifan GAO; Cheng LIU; Xin GAO

doi:10.7507/1001-5515.202507011

2026 Feb 25;43(1):146–153. [Article in Chinese] doi: 10.7507/1001-5515.202507011

Chaoyang GE ^1,^2, Yifan GAO ^1,^2, Cheng LIU ^1,^2, Xin GAO ^2,^3,^*

PMCID: PMC12948535 PMID: 41760214

Abstract

Precise segmentation of gastric cancer computed tomography (CT) images is a critical step for clinical precision diagnosis and treatment. However, it currently faces two major challenges: the low contrast between tumors and surrounding normal tissues makes boundary delineation difficult, and the high variability in tumor shape, size, and location leads to inaccurate localization. To address these issues, a cross-modal prior knowledge-guided gastric cancer CT image automatic segmentation method (CGP-Net) was proposed. In this method, visual priors were extracted from diagnostic reports using a large language model (LLM), and lesion localization was assisted by a semantic anchoring and parsing module. A mixed context-aware Mamba module was constructed to synergistically optimize feature modeling for adapting to tumor morphological variations. Furthermore, a boundary-aware gated convolution module was designed to improve the delineation accuracy of fuzzy boundaries. Experiments on a large-scale dataset of 349 gastric cancer patients demonstrated that the Dice coefficient and 95th percentile of Hausdorff distance (HD95) of the proposed method reached 78.10% and 16.44 mm, respectively. It outperformed state-of-the-art methods such as U-Mamba and nnUNet in terms of segmentation accuracy and boundary prediction. This method effectively integrates textual priors to significantly enhance segmentation accuracy, offering significant value for clinical applications.

Keywords: Gastric cancer segmentation, Text guidance, State-space model, Cross-modal fusion

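0. Introduction

Gastric cancer is a common malignant tumor of the digestive system worldwide, and precise knowledge of the tumor's spatial location, extent of invasion, and morphological features is essential for clinical diagnosis, staging, and treatment decisions^[1-3]. Computed tomography (CT) is the imaging modality of choice for obtaining this information^[4-5]. However, relying entirely on radiologists to delineate tumor regions by hand is time-consuming and labor-intensive, depends heavily on the reader's experience, and is subject to inter-observer variability. Developing efficient and accurate automatic segmentation algorithms for gastric cancer CT images is therefore of practical importance for supporting precise clinical diagnosis and treatment.

In recent years, advances in deep learning have profoundly changed the field of medical image segmentation^[6-7]. Convolutional neural networks (CNNs), represented by U-Net^[8] and its variants^[9-12], have greatly advanced the field. However, CNNs are limited by their local receptive fields and fall short in capturing global context^[13]. Transformer-based methods^[14-15] subsequently achieved breakthroughs in long-sequence modeling, but their computational complexity is high and their efficiency still needs improvement. More recently, state space models have achieved state-of-the-art performance in analyzing long continuous sequences and have become efficient building blocks for deep networks^[16-19]. Mamba further improves on state space models with a selection mechanism; its ordered scanning of image sequences captures global information at linear complexity^[20-21]. Despite this progress, these models still face serious challenges on gastric cancer CT images: first, the contrast between gastric cancer lesions and surrounding soft tissue such as the gastric wall and intestines is extremely low, which blurs boundaries; second, tumor shape, size, and location are highly variable, so visual features alone are insufficient for accurate localization^[22-25]. Existing vision-only segmentation models make no use of the rich semantic prior knowledge in diagnostic reports, whereas referring image segmentation identifies and segments a specific object in an image from a text description, providing key localization cues for blurred boundaries that are hard to discern visually^[26-28] and coping effectively with complex visual tasks^[29-32].

To address these challenges, this paper explores a new paradigm that fuses textual prior knowledge with visual features and proposes an automatic segmentation method for gastric cancer CT images that is guided by case-report text and built on a state space model: the cross-modal guided prior network (CGP-Net). The core idea is as follows. First, a large language model (LLM) converts unstructured diagnostic reports into structured visual priors, and semantic anchoring is used to address tumor localization. Second, a hybrid context-aware Mamba module is built to optimize feature modeling and adapt to variable tumor morphology. Finally, a boundary-aware mechanism sharpens blurred contours. This cross-modal complementary strategy is intended to overcome the limitations of a single visual modality in low-contrast scenes and achieve high-precision automatic segmentation of gastric cancer lesions.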

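1. Methods

1.1. Overview

The architecture of CGP-Net is shown in Figure 1. The backbone follows a standard encoder-decoder U-shaped design and consists mainly of the semantic anchoring and resolution module (SARM), the text-guide block (TGB), the boundary-aware gated convolution (BAGC) module, and the hybrid context-aware Mamba (HCAM) module.

As shown in Figure 1, the input case report is first formatted and then embedded under the guidance of an LLM, yielding a standard text vector. The text vector and the CT image together form the input to the TGB, which guides hierarchical feature extraction from the CT image. The features fused by the TGB are then encoded and downsampled by the encoder to produce feature maps at different scales. The two intermediate levels pass through BAGC modules, which delineate tumor boundary features; the lowest level passes through the HCAM module, which jointly optimizes fine local features and global feature representations. The HCAM output is then fed to the decoder for decoding and upsampling and is fused at multiple scales with the outputs of the two BAGC modules and the top-level features, to mitigate the loss of spatial information caused by hierarchical feature extraction. A final convolutional layer outputs the segmentation result.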

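1.2. Semantic anchoring and resolution module (SARM)

In CT-based diagnosis of gastric cancer, low contrast makes tumors hard to localize. The SARM is therefore designed to use an LLM to convert the unstructured text of diagnostic reports into precise visual guidance signals (left side of Figure 1). The module uses a two-stage hierarchical processing strategy: first, task-specific prompt engineering builds a knowledge-conversion pipeline that guides the LLM to filter out non-visual distractors automatically and focus only on the lesion's location and morphology; second, the extracted specialist terminology is converted into a standardized visual description language. Text encoding then produces a high-quality feature vector that serves as input to the TGB.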
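The two-stage flow described for the SARM can be sketched as a small pipeline. This is a minimal illustration, not the authors' implementation: the prompt wording, the function name `sarm_text_prior`, and the `llm` and `embed` callables are all hypothetical stand-ins (the paper names DeepSeek R1 and a Gemini text embedding model for these roles but does not give prompts or API calls).

```python
from typing import Callable, List

# Hypothetical prompts; the paper does not publish its prompt text.
FILTER_PROMPT = (
    "From the diagnostic report below, keep only statements about the lesion's "
    "location and morphology and drop all non-visual information.\n\n"
    "Report:\n{report}"
)
REWRITE_PROMPT = (
    "Rewrite the following findings as a standardized visual description "
    "of the lesion.\n\nFindings:\n{findings}"
)


def sarm_text_prior(
    report: str,
    llm: Callable[[str], str],
    embed: Callable[[str], List[float]],
) -> List[float]:
    """Turn a free-text report into a text feature vector in two LLM stages."""
    # Stage 1: filter the report down to location and morphology findings.
    findings = llm(FILTER_PROMPT.format(report=report))
    # Stage 2: convert specialist terms into a standardized visual description.
    description = llm(REWRITE_PROMPT.format(findings=findings))
    # Text encoding: the resulting vector is what a TGB-like block would consume.
    return embed(description)
```

Passing the model and the encoder in as callables keeps the sketch independent of any particular LLM or embedding service.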

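1.3. Text-guide block (TGB)

The TGB fuses the radiological features of the CT image with the high-level semantic descriptions of the diagnostic report to achieve text-guided, high-precision segmentation of the tumor region (see Figure 2). Its workflow is as follows: first, the standard text vector is expanded in scale and aligned with the image features to obtain text features; next, the text features and image features are concatenated along the feature dimension to integrate the information from the two modalities in a structured way; finally, the integrated features pass through a dual-axis cross-modal interaction scan, which fuses and enhances the features of the two modalities.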

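1.3.1. Feature alignment and hybrid tensor construction

To enable effective cross-modal interaction, the TGB first aligns and assembles the heterogeneous input features into a unified hybrid feature tensor. First, the encoded text features of the case report are broadcast to the spatial size of the CT feature map, yielding a global context feature that matches the feature map exactly. Second, to establish a fine-grained association between local descriptions and image regions, a local association map is generated as the matrix product of the image features and the text features, as in Eq. (1). This map quantifies how strongly the text description correlates with the CT image at each position, thereby "anchoring" abstract text semantics to specific spatial locations in the image. Finally, the original imaging features, the global diagnostic context features, and the local association features are concatenated along the channel dimension to form the unified hybrid feature tensor, as in Eq. (2).

In Eqs. (1) and (2), two learnable linear projection matrices act on the image features and the text features, respectively, and the concatenation is performed along the channel dimension.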
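The three steps above (broadcast, local association, channel concatenation) can be illustrated in plain Python. This is a sketch under stated assumptions rather than the published Eqs. (1)-(2): the names `build_hybrid_tensor`, `w_img`, and `w_txt` are hypothetical, the association map is assumed to have a single channel, and the feature map is stored as a flat list of per-position vectors.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def build_hybrid_tensor(image_feats, text_vec, w_img, w_txt):
    """Return one hybrid feature vector per spatial position.

    image_feats: list of N position vectors, each with C channels.
    text_vec:    text embedding with D entries.
    w_img, w_txt: hypothetical projection matrices that map C and D
                  entries into a shared K-dimensional space.
    """
    text_proj = matvec(w_txt, text_vec)
    hybrid = []
    for feat in image_feats:
        # Local association: projected image feature dotted with projected text.
        assoc = sum(a * b for a, b in zip(matvec(w_img, feat), text_proj))
        # Broadcast: the same text vector is repeated at every position.
        # Channel concatenation: C image + D text + 1 association channel.
        hybrid.append(list(feat) + list(text_vec) + [assoc])
    return hybrid


identity = [[1.0, 0.0], [0.0, 1.0]]
image_feats = [[1.0, 0.0], [0.0, 2.0]]  # two positions, C = 2
text_vec = [1.0, 1.0]                   # D = 2
hybrid = build_hybrid_tensor(image_feats, text_vec, identity, identity)
```

With identity projections, the last channel of each hybrid vector is simply the dot product of that position's image feature with the text vector, so positions that match the text more strongly receive a larger association value.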

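1.3.2. Dual-axis cross-modal interaction scan

To strengthen the guidance, the module applies a channel scan followed by a spatial scan to the hybrid feature tensor, as in Eq. (3). The channel scan promotes the exchange of semantic information between modalities, whereas the spatial scan uses morphological descriptions in the text (for example, infiltrative growth) to refine the tumor contour; the fused features are output through a residual connection.

In Eq. (3), the activation function is the rectified linear unit (ReLU), and the two scan operators denote the channel scan and the spatial scan, respectively.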

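1.4. Boundary-aware gated convolution (BAGC) module

Gastric tumors appear on CT with blurred, irregular, low-contrast boundaries. The BAGC module is designed to delineate these boundaries precisely by dynamically enhancing edge features.

As shown in Figure 3, the BAGC module consists of a main branch and a gating branch. The input features first pass through the main branch, which contains two consecutive depthwise separable convolution blocks that efficiently extract general morphological features of the lesion region, as in Eq. (4). In parallel, the input features pass through the gating branch, built around a 1 × 1 convolution, to generate a gating attention map, as in Eq. (5); this map responds strongly at tumor boundaries. The main-branch features are then weighted by the gating attention map through a Hadamard product, which suppresses background noise and sharpens boundary features. Finally, the features pass through one more depthwise separable convolution block and are added to the module's original input through a residual connection to give the final output features, as in Eqs. (6)-(7).

The operators in Eqs. (4)-(7) are depthwise separable convolution, group normalization, and the Hadamard product.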
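Equations (4)-(7) did not survive in this copy. One form that is consistent with the description of the two branches is sketched below. The symbols ($X$, $F_m$, $G$, $F_g$, $Y$, $\mathrm{DSC}_1$ to $\mathrm{DSC}_3$), the sigmoid $\sigma$ on the gate, and the placement of group normalization (GN) inside the gating branch are assumptions made for illustration, not the authors' published equations.

```latex
\begin{aligned}
F_m &= \mathrm{DSC}_2\big(\mathrm{DSC}_1(X)\big) && (4) \\
G   &= \sigma\big(\mathrm{GN}(\mathrm{Conv}_{1\times 1}(X))\big) && (5) \\
F_g &= F_m \odot G && (6) \\
Y   &= X + \mathrm{DSC}_3(F_g) && (7)
\end{aligned}
```

Read this way, the gate $G$ takes values in $(0, 1)$ and rescales the main-branch features element by element, and the residual term $X$ keeps the original signal available wherever the gate suppresses it.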

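1.5. Hybrid context-aware Mamba (HCAM) module

To address the segmentation difficulties caused by variable tumor morphology and by surrounding tissue with similar features in gastric cancer CT images, the HCAM module is constructed. It consists of a hierarchical context fusion module (HCFM) and a state space model (Mamba), and improves the network's ability to discriminate tumor-region features through the joint optimization of explicit and implicit feature modeling.

As shown in Figure 4, the input features are first fed into the HCFM, where feature decoupling produces the query and key matrices, separating the entangled features into mutually independent subspaces and reducing interference from redundant information. The query and key matrices each pass first through spatial attention, which suppresses background noise from non-tumor regions, and then through channel attention, which strengthens the semantic representation of the tumor. The processed decoupled features are combined by weighted summation; the fused features pass through a depthwise convolution and are then multiplied element-wise (Hadamard product) with the feature-aligned value matrix to explicitly reinforce the original features, as in Eq. (8). The reinforced features are flattened, layer-normalized, and fed into the Mamba state space block, which implicitly models the long sequence and increases the model's sensitivity to key information. Finally, a linear layer and a spatial reshape restore the two-dimensional feature-map structure and give the module's final output, as in Eq. (9).

In Eq. (8), the three decoupled tensors are the query, key, and value tensors; the context mapping function includes spatial and channel interactions; the remaining operators are a linear transformation layer and the Hadamard product.

In Eq. (9), a dimension-conversion operator maps the two-dimensional feature map to a one-dimensional sequence, the inverse operator restores the sequence, which carries global dependency information, to the two-dimensional feature-map structure, and LN denotes layer normalization.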
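The data flow of Eq. (9), flatten, layer-normalize, sequence model, linear layer, reshape, can be sketched without any deep learning library. This is a minimal illustration under assumptions: `flatten_map`, `restore_map`, and `hcam_tail` are hypothetical names, the feature map is a nested list of shape (H, W, C), and the Mamba block and the linear layer are passed in as callables rather than implemented.

```python
import math


def flatten_map(feature_map):
    """Map an (H, W, C) nested list to a sequence of H*W tokens."""
    h, w = len(feature_map), len(feature_map[0])
    seq = [feature_map[i][j] for i in range(h) for j in range(w)]
    return seq, (h, w)


def restore_map(seq, shape):
    """Inverse of flatten_map: rebuild the (H, W, C) layout."""
    h, w = shape
    return [[seq[i * w + j] for j in range(w)] for i in range(h)]


def layer_norm(token, eps=1e-5):
    """Normalize one token over its channels (no learnable scale or shift)."""
    mean = sum(token) / len(token)
    var = sum((x - mean) ** 2 for x in token) / len(token)
    return [(x - mean) / math.sqrt(var + eps) for x in token]


def hcam_tail(feature_map, sequence_model, linear):
    """Flatten -> LN -> sequence model -> linear -> reshape.

    sequence_model stands in for the Mamba block and linear for the
    output projection; both are placeholders supplied by the caller.
    """
    seq, shape = flatten_map(feature_map)
    seq = [layer_norm(token) for token in seq]
    seq = sequence_model(seq)
    seq = [linear(token) for token in seq]
    return restore_map(seq, shape)
```

Row-major flattening is only one possible scan order; the paper does not state which order its Mamba block uses.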

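2. Experiments and results

2.1. Dataset and experimental details

The dataset used in this study comes from Shanxi Provincial Cancer Hospital. It contains the imaging data and case reports of 349 patients treated at the hospital between January 1, 2017 and November 30, 2020. All enrolled patients had histologically confirmed locally advanced gastric cancer. The study was approved by the ethics review committee, and the requirement for informed consent was waived because of its retrospective nature.

The dataset contains no CT scanner information; only the image data are retained for analysis. The images used for segmentation are portal venous phase CT images acquired before neoadjuvant chemotherapy. Tumor segmentation labels were drawn manually in two dimensions, on the CT slice showing the largest tumor extent, by a radiologist with 8 years of experience in abdominal CT diagnosis. The labels were reviewed by a second radiologist with 15 years of experience to ensure that the annotations are reliable and reproducible.

For training and evaluation, the 349 cases were randomly split into a training set and a test set at a ratio of 8:2. The training set contains 279 cases and is used to optimize the model parameters; the test set contains 70 cases and is used only for the final performance evaluation.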
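A patient-level random split that reproduces the 279/70 counts can be sketched as follows. The function name, the seed, and the use of integer patient indices are assumptions for illustration; the paper does not report how its split was generated.

```python
import random


def split_patients(patient_ids, train_fraction=0.8, seed=0):
    """Shuffle patient IDs reproducibly and split them into train and test."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # 349 * 0.8 = 279.2, truncated to 279
    return ids[:n_train], ids[n_train:]


train_ids, test_ids = split_patients(range(349))
```

Splitting by patient rather than by image keeps every case entirely on one side of the split, which matters whenever a patient contributes more than one image.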

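All experiments were implemented in Python with PyTorch (version 2.1.2) and run on a standard deep learning workstation with an Intel(R) Xeon(R) Silver 4210 CPU @ 2.20 GHz and an NVIDIA GeForce RTX 3090 GPU. To reduce computational cost without losing key image detail, all input images were resized to 512 × 512. The training hyperparameters were: batch size 8; initial learning rate 0.01; SGD optimizer with Nesterov momentum and weight decay, with momentum 0.99 and weight decay 0.00005; 200 training epochs. DeepSeek R1 was used to format the reports, and the Gemini Text Embedding Model was used to embed them.

To evaluate segmentation performance quantitatively, the Dice coefficient and the 95th percentile of the Hausdorff distance (HD95) were used as metrics.

The Dice coefficient is computed as:

Dice = 2|P ∩ G| / (|P| + |G|)

where P is the set of pixels in the segmentation predicted by the model and G is the set of pixels in the ground-truth label.

HD95 evaluates how well the shape of the segmentation boundary agrees with the ground truth. It is a variant of the standard Hausdorff distance (HD), defined as the 95th percentile of the set of distances between the predicted boundary and the ground-truth boundary. Compared with the standard HD, HD95 is more robust to outliers.

In this definition, the two point sets are the boundary pixels of the prediction and of the ground-truth label, the distance between two points is the Euclidean distance, and the 95th percentile is taken over the resulting set of distances.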
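Both metrics can be computed directly from pixel coordinates, as in the sketch below. Several choices here are assumptions, because the paper's formula is not available in this copy: HD95 is taken as the larger of the two directed 95th-percentile distances (some toolkits instead pool both directions before taking the percentile), the percentile uses linear interpolation, and distances are in pixel units, so a result in millimetres would require multiplying by the pixel spacing.

```python
import math


def dice(pred, truth):
    """Dice overlap of two sets of (row, col) foreground pixels."""
    if not pred and not truth:
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def percentile(values, q):
    """q-th percentile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def directed_distances(a, b):
    """For each point in a, the Euclidean distance to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def hd95(boundary_pred, boundary_truth):
    """Symmetric HD95 as the max of the two directed 95th percentiles."""
    forward = percentile(directed_distances(boundary_pred, boundary_truth), 95)
    backward = percentile(directed_distances(boundary_truth, boundary_pred), 95)
    return max(forward, backward)


pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth = {(0, 1), (1, 1), (0, 2), (1, 2)}
```

In this toy case the two 2 × 2 squares overlap in one column, so half of each mask is shared, and every pixel of either mask lies on its boundary.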

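2.2. Comparative experiments

To verify the effectiveness of the proposed method for gastric cancer CT image segmentation, it was compared on the gastric cancer CT dataset described above with the following methods. CNN-based: UNet, UNet++, and nnUNet^[33]. Transformer-based: TransUNet, Swin-UNETR^[34], and TransFuse^[35]. Mamba-based: MambaUNet^[36], VMUNet^[37], SwinUMamba^[38], HVMUNet^[39], MLLAUNet^[40], and U-Mamba. The results are shown in Table 1.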

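Table 1. Quantitative results of different methods on the gastric cancer segmentation dataset

Category	Model	Dice (%) ↑	HD95 (mm) ↓
CNN-based	UNet	74.68	18.48
CNN-based	UNet++	74.81	23.26
CNN-based	nnUNet	76.17	19.11
Transformer-based	TransUNet	71.74	21.25
Transformer-based	Swin-UNETR	70.79	26.77
Transformer-based	TransFuse	67.37	25.43
Mamba-based	MLLAUNet	69.25	24.03
Mamba-based	MambaUNet	68.55	23.01
Mamba-based	U-Mamba	76.35	19.57
Mamba-based	SwinUMamba	71.79	29.30
Mamba-based	VMUNet	68.25	25.36
Mamba-based	HVMUNet	65.16	27.06
Proposed method	CGP-Net	78.10	16.44

Note: ↑ means higher is better; ↓ means lower is better. The proposed method has the best value on both metrics; the second-best values are those of U-Mamba (Dice) and UNet (HD95).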

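As Table 1 shows, existing methods are generally limited on this task. ① Among the classic CNN-based methods, nnUNet performs best, with a Dice coefficient of 76.17%, but this family appears to have reached a plateau and is hard to improve further. ② Transformer-based methods perform poorly overall, with Dice values between 67.37% and 71.74%. ③ The Mamba-based variants are uneven: U-Mamba stands out with a Dice coefficient of 76.35%, which shows the potential of state space models for this task, but most of the other Mamba variants leave considerable room for improvement. By contrast, the proposed method performs best among all compared models, with a Dice of 78.10% and an HD95 of 16.44 mm. Compared with the two next-best models by Dice, U-Mamba and nnUNet, it improves Dice by 1.75 and 1.93 percentage points, respectively. Its HD95, which measures the accuracy of boundary prediction, is also the lowest of all methods, indicating that the method has an advantage not only in region overlap but also in delineating tumor boundaries. These results show that the proposed method combines the strengths of different architectures and outperforms all compared methods on both metrics.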
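The margins quoted above can be checked directly against the values in Table 1. The short script below ranks the methods on each metric and recomputes the gaps; the variable names are illustrative only.

```python
# (Dice %, HD95 mm) copied from Table 1.
results = {
    "UNet": (74.68, 18.48), "UNet++": (74.81, 23.26), "nnUNet": (76.17, 19.11),
    "TransUNet": (71.74, 21.25), "Swin-UNETR": (70.79, 26.77),
    "TransFuse": (67.37, 25.43), "MLLAUNet": (69.25, 24.03),
    "MambaUNet": (68.55, 23.01), "U-Mamba": (76.35, 19.57),
    "SwinUMamba": (71.79, 29.30), "VMUNet": (68.25, 25.36),
    "HVMUNet": (65.16, 27.06), "CGP-Net": (78.10, 16.44),
}

# Higher Dice is better; lower HD95 is better.
by_dice = sorted(results, key=lambda m: results[m][0], reverse=True)
by_hd95 = sorted(results, key=lambda m: results[m][1])

# Dice gain of CGP-Net over the two runners-up, in percentage points.
dice_gain = {m: round(results["CGP-Net"][0] - results[m][0], 2) for m in by_dice[1:3]}
# HD95 reduction relative to the runner-up on that metric, in mm.
hd95_gain = round(results[by_hd95[1]][1] - results["CGP-Net"][1], 2)
```

The runner-up differs by metric: U-Mamba is second on Dice, whereas UNet is second on HD95, 2.04 mm behind the proposed method.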

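2.3. Ablation experiments

To verify the effectiveness of each component of CGP-Net, a baseline network was built from the encoder and decoder modules of U-Mamba, and four networks were constructed for ablation analysis by integrating the modules step by step. Because the SARM and the TGB depend on each other, the SARM is used by default in the ablation setting that adds the TGB. The results on the dataset are shown in Table 2.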

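Table 2. Validation of the HCAM, BAGC, and TGB modules

Model	HCAM	BAGC	TGB	Dice (%) ↑	HD95 (mm) ↓
Baseline	×	×	×	76.28	21.08
Baseline + HCAM	√	×	×	77.16	20.21
Baseline + HCAM + BAGC	√	√	×	77.43	18.77
Baseline + HCAM + BAGC + TGB	√	√	√	78.10	16.44

Note: √ means the module is included; × means it is not. ↑ means higher is better; ↓ means lower is better. The full model (last row) has the best value on both metrics, and the HCAM + BAGC model has the second-best.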

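As Table 2 shows, the HCAM, BAGC, and TGB modules each improve segmentation performance, and their effects are complementary. Adding the HCAM module to the baseline raises Dice by 0.88 percentage points and lowers HD95 by 0.87 mm, which provides initial evidence of the module's effectiveness in feature extraction. Adding the BAGC module on top of HCAM improves performance further: relative to the HCAM-only model, Dice rises by 0.27 percentage points and HD95 falls by 1.44 mm, which demonstrates the BAGC module's benefit for boundary information. With all three modules working together, the network reaches its best performance: relative to the baseline, the final model improves Dice by 1.82 percentage points and reduces HD95 by 4.64 mm. This result confirms that the modules are functionally complementary and that together they improve the model's overall performance on gastric cancer CT image segmentation.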
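The step-by-step gains can be recomputed from Table 2, which also yields the increment of the final step (adding the TGB with the SARM) that the text does not quote. The row labels in the script are shorthand chosen here.

```python
# (label, Dice %, HD95 mm) copied from Table 2, in the order modules are added.
ablation = [
    ("baseline", 76.28, 21.08),
    ("+HCAM", 77.16, 20.21),
    ("+HCAM+BAGC", 77.43, 18.77),
    ("+HCAM+BAGC+TGB", 78.10, 16.44),
]

# Gain of each configuration over the previous one:
# Dice increase in percentage points, HD95 decrease in mm.
steps = []
for (_, dice_prev, hd_prev), (name, dice_cur, hd_cur) in zip(ablation, ablation[1:]):
    steps.append((name, round(dice_cur - dice_prev, 2), round(hd_prev - hd_cur, 2)))

# Overall gain of the full model over the baseline.
total = (
    round(ablation[-1][1] - ablation[0][1], 2),
    round(ablation[0][2] - ablation[-1][2], 2),
)
```

By these numbers the last step contributes 0.67 percentage points of Dice and 2.33 mm of HD95, about half of the total HD95 reduction of 4.64 mm.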

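2.4. Visualization of segmentation results

To show the segmentation quality of CGP-Net more intuitively, Figure 5 compares the proposed method with the 12 other mainstream segmentation models on four cases. Cases ① to ③ show the model's performance on different tumor morphologies, while case ④ is a challenging case in which segmentation is less accurate. The red regions are the tumor extent predicted by each model, and the last column (GT) is the ground-truth label drawn manually by a radiologist, which serves as the reference standard. Most of the compared models show clear limitations on complex gastric cancer CT images. For example, TransFuse, Swin-UNETR, and MambaUNet under-segment or over-segment, whereas the better-performing nnUNet and U-Mamba localize the tumor roughly but still fall short on boundary detail, with contours that deviate from the ground truth. By contrast, CGP-Net gives the best segmentation in all cases. Its predicted tumor regions agree closely with the ground truth, and its tumor boundaries in particular are more accurate and smoother, which is consistent with its quantitative advantage in Dice and HD95.

To give an objective picture of the model's performance, case ④ in Figure 5 shows a difficult case with visible error. Here the CT values of the lesion are extremely close to those of the surrounding normal gastric wall and the image is heavily affected by noise, so the visual features are very indistinct. CGP-Net localizes part of the lesion but deviates from the GT in the lesion's fine structure and does not fully cover the true boundary. Nevertheless, compared with other mainstream models on this extreme sample (for example, complete misses or large false-positive regions), the method still shows some robustness to interference. These visualizations support the conclusion that, by fusing textual prior knowledge and optimizing boundary feature perception, the proposed method mitigates inaccurate tumor localization and blurred boundaries and produces the segmentations closest to the expert annotation.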

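3. Summary and outlook

To address blurred tumor boundaries and difficult localization in gastric cancer CT images, this paper proposes CGP-Net, an automatic segmentation method that fuses case-report text priors with a state space model. CGP-Net is intended for the assisted-analysis stage after image acquisition. For newly diagnosed patients, the model is applied after the CT scan is completed and uses the report prior to overcome the segmentation difficulty caused by low contrast, helping physicians stage the disease accurately. For follow-up patients (for example, after neoadjuvant chemotherapy), new images must be acquired to reflect changes in tumor morphology. By segmenting the new images accurately, the model can then quantify changes in lesion volume for an objective assessment of treatment response and delineate the residual tumor boundary and extent of invasion, providing precise guidance for subsequent surgical planning. Because it depends on image input, the method is not applicable to the screening stage, when no CT data are available.

The method includes the SARM, which uses an LLM to extract visual guidance information from unstructured diagnostic reports and convert it; the TGB, which couples text priors tightly with visual features to guide and localize the tumor region precisely; and the HCAM and BAGC modules, which respectively strengthen feature discrimination for tumors of variable morphology and sharpen blurred tumor boundaries. The experiments show that the method outperforms current mainstream methods on both Dice and HD95, improving segmentation accuracy while preserving boundary sharpness, and providing effective technical support for the precise diagnosis and treatment of gastric cancer.

The study nevertheless has limitations. First, model performance depends heavily on the quality of the paired radiology reports, and there is currently no public dataset with large-scale paired text for cross-method benchmarking. Second, the study is based on a single-center dataset, and its generalization to multi-center data remains to be verified. Future work will focus on two directions. First, developing an adaptive guidance mechanism for cases where text is missing, so that the model can segment from images alone by retrieving from an internal knowledge base or self-generating pseudo-labels, and can therefore be tested on public datasets. Second, actively collecting clinical samples and carrying out multi-center collaborative studies to verify the model's stability on multi-source data while protecting privacy.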

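Declarations

Conflict of interest: All authors declare that they have no conflicts of interest.

Author contributions: Chaoyang GE designed the algorithm and programs, recorded and analyzed the results, and wrote the paper; Yifan GAO provided guidance on the experiments, data analysis, and algorithms; Cheng LIU provided guidance on writing the paper; Xin GAO reviewed and revised the paper and provided overall supervision.

Ethics statement: This study was approved by the Ethics Committee of Shanxi Provincial Cancer Hospital (approval No. 202223).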

References

1. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74(3):229–263. doi: 10.3322/caac.21834.
2. Morgan E, Arnold M, Camargo M C, et al. The current and future incidence and mortality of gastric cancer in 185 countries, 2020-40: A population-based modelling study. EClinicalMedicine. 2022;47:101404. doi: 10.1016/j.eclinm.2022.101404.
3. Yang W J, Zhao H P, Yu Y, et al. Updates on global epidemiology, risk and prognostic factors of gastric cancer. World J Gastroenterol. 2023;29(16):2452–2468. doi: 10.3748/wjg.v29.i16.2452.
4. Lordick F, Carneiro F, Cascinu S, et al. Gastric cancer: ESMO clinical practice guideline for diagnosis, treatment and follow-up. Ann Oncol. 2022;33(10):1005–1020. doi: 10.1016/j.annonc.2022.07.004.
5. Hallinan J T P D, Venkatesh S K. Gastric carcinoma: Imaging diagnosis, staging and assessment of treatment response. Cancer Imaging. 2013;13(2):212–227. doi: 10.1102/1470-7330.2013.0023.
6. Zhang Y, Yuan N, Zhang Z, et al. Unsupervised domain selective graph convolutional network for preoperative prediction of lymph node metastasis in gastric cancer. Med Image Anal. 2022;79:102467. doi: 10.1016/j.media.2022.102467.
7. Hu L, Xia W, Li Q, et al. Prediction of recurrence-free survival in lung adenocarcinoma based on self-supervised pre-training and multi-task learning. Journal of Biomedical Engineering. 2024;41(2):205–212. doi: 10.7507/1001-5515.202309060.
8. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation// Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015. Cham: Springer, 2015: 234-241.
9. Oktay O, Schlemper J, Folgoc L L, et al. Attention U-Net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999, 2018.
10. Zhou Z, Rahman Siddiquee M M, Tajbakhsh N, et al. UNet++: A nested U-Net architecture for medical image segmentation// Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Cham: Springer, 2018: 3-11.
11. Shang H, Feng T, Han D, et al. Deep learning and radiomics for gastric cancer serosal invasion: Automated segmentation and multi-machine learning from two centers. J Cancer Res Clin Oncol. 2025;151(2):60. doi: 10.1007/s00432-025-06117-w.
12. Wang J, Zhang B, Wang Y, et al. CrossU-Net: Dual-modality cross-attention U-Net for segmentation of precancerous lesions in gastric cancer. Comput Med Imaging Graph. 2024;112:102339. doi: 10.1016/j.compmedimag.2024.102339.
13. Chen J, Lu Y, Yu Q, et al. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306, 2021.
14. Hatamizadeh A, Tang Y, Nath V, et al. UNETR: Transformers for 3D medical image segmentation// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Waikoloa: IEEE, 2022: 1748-1758.
15. Cao H, Wang Y, Chen J, et al. Swin-Unet: Unet-like pure transformer for medical image segmentation// European Conference on Computer Vision (ECCV). Cham: Springer, 2022: 205-218.
16. Liu Y, Tian Y, Zhao Y, et al. VMamba: Visual state space model. Adv Neural Inf Process Syst. 2024;37:103031–103063. doi: 10.52202/079017-0808.
17. Xing Z, Ye T, Yang Y, et al. SegMamba: Long-range sequential modeling Mamba for 3D medical image segmentation// Medical Image Computing and Computer-Assisted Intervention–MICCAI 2024. Cham: Springer, 2024: 578-588.
18. Wang J, Zheng J, Ma L, et al. LKM-UNet: Large kernel vision Mamba U-Net for medical image segmentation// Medical Image Computing and Computer-Assisted Intervention–MICCAI 2024. Cham: Springer, 2024: 360-370.
19. Bansal S, Madisetty S, Rehman M Z U, et al. A comprehensive survey of Mamba architectures for medical image analysis: Classification, segmentation, restoration and beyond. arXiv preprint arXiv:2410.02362, 2024.
20. Gu A, Dao T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023.
21. Ma J, Li F, Wang B. U-Mamba: Enhancing long-range dependency for biomedical image segmentation. arXiv preprint arXiv:2401.04722, 2024.
22. Ling T, Zuo Z, Huang M, et al. Stacking classifiers based on integrated machine learning model: fusion of CT radiomics and clinical biomarkers to predict lymph node metastasis in locally advanced gastric cancer patients after neoadjuvant chemotherapy. BMC Cancer. 2025;25(1):834. doi: 10.1186/s12885-025-14259-w.
23. Zheng G, Wang H, Chai X, et al. Interpretable deep learning for multicenter gastric cancer T staging from CT images. npj Digit Med. 2026;9:2. doi: 10.1038/s41746-025-02002-5.
24. Li L, Wang C, Geng Y, et al. Segment anything model for gastric cancer. Cancer Med. 2025;14(18):e71246. doi: 10.1002/cam4.71246.
25. Bhardwaj P, Kumar S, Kumar Y. Deep learning techniques in gastric cancer prediction and diagnosis// 2022 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COM-IT-CON). New York: IEEE, 2022: 843-850.
26. Kumar G M K, Chadha A, Mendola J, et al. MedVisionLlama: Leveraging pre-trained large language model layers to enhance medical image segmentation// Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops. New York: IEEE, 2025: 1114-1124.
27. Wu J, Fu R, Fang H, et al. Medical SAM adapter: Adapting segment anything model for medical image segmentation. Med Image Anal. 2025;102:103547. doi: 10.1016/j.media.2025.103547.
28. Li Y, Lai Z, Bao W, et al. Visual large language models for generalized and specialized applications. arXiv preprint arXiv:2501.02765, 2025.
29. Chen W, Liu J, Liu T, et al. Bi-VLGM: Bi-level class-severity-aware vision-language graph matching for text guided medical image segmentation. Int J Comput Vis. 2025;133(3):1375–1391. doi: 10.1007/s11263-024-02246-w.
30. Rao Y, Zhao W, Chen G, et al. Denseclip: Language-guided dense prediction with context-aware prompting// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2022: 18082-18091.
31. Yuan R, Chen M, Xu J, et al. Text-promptable propagation for referring medical image sequence segmentation. arXiv preprint arXiv:2502.11093, 2025.
32. He J, Wang G, Zhang Q, et al. ReMamber: Referring image segmentation with mamba twister. arXiv preprint arXiv:2404.11651, 2024.
33. Isensee F, Jaeger P F, Kohl S A A, et al. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods. 2021;18(2):203–211. doi: 10.1038/s41592-020-01008-z.
34. Hatamizadeh A, Nath V, Tang Y, et al. Swin UNETR: Swin transformers for semantic segmentation of brain tumors in MRI images// International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). Cham: Springer, 2022: 272-284.
35. Zhang Y, Liu H, Hu Q. TransFuse: Fusing transformers and CNNs for medical image segmentation// International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). Cham: Springer, 2021: 14-24.
36. Wang Z, Zheng J, Zhang Y, et al. Mamba-UNet: A mamba-based u-net for medical image segmentation. arXiv preprint arXiv:2402.08699, 2024.
37. Ruan J, Li J, Xiang S. VM-UNet: Vision mamba u-net for medical image segmentation. arXiv preprint arXiv:2402.02491, 2024.
38. Liu J, Yang H, Zhou H, et al. Swin-UMamba: Adapting mamba-based vision foundation models for medical image segmentation. arXiv preprint arXiv:2402.03302, 2024.
39. Wu R, Liu Y, Liang L, et al. H-vmunet: High-order vision mamba u-net for medical image segmentation. arXiv preprint arXiv:2403.13642, 2024.
40. Li Y, Li X, Liu Y, et al. MLLA-UNet: Mamba-like linear attention in an efficient u-shape model for medical image segmentation. arXiv preprint arXiv:2405.02875, 2024.
