Author manuscript; available in PMC: 2026 Apr 9.
Published in final edited form as: Knowl Based Syst. 2025 Nov 12;331:114810. doi: 10.1016/j.knosys.2025.114810

Fig. 2.

Overview of our proposed MVGFormer. (a) We cut the whole tomogram into patches and feed each patch to the MVGFormer as input. For each input, a multi-view transform and linear projection are applied to obtain sequence-level feature embeddings from three different observation perspectives. Each feature embedding is added to its unique position embedding and then sent to the transformer encoder. Each input is also sent to the context encoder, which generates a visual graph used as attention guidance. (b) The structure of the transformer layer. To obtain voxel-level segmentation, we design two different decoders: (c) a multi-level feature fusion segmentor and (d) a parallel 3D atrous convolution segmentor. Best viewed in color.
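The multi-view embedding step in (a) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name, the choice of the three orthogonal viewing axes, the projection dimension, and the random weights standing in for learned parameters are all assumptions made for illustration.

```python
import numpy as np

def multi_view_embeddings(patch, d_model=64, seed=0):
    """Hypothetical sketch of the multi-view transform: view a 3D patch
    along three orthogonal axes, flatten each view's slices into a token
    sequence, linearly project to d_model, and add a position embedding
    (random arrays here stand in for learned parameters)."""
    rng = np.random.default_rng(seed)
    views = [
        patch,                      # view 1: slices along axis 0
        patch.transpose(1, 0, 2),   # view 2: slices along axis 1
        patch.transpose(2, 0, 1),   # view 3: slices along axis 2
    ]
    embeddings = []
    for v in views:
        n, a, b = v.shape
        tokens = v.reshape(n, a * b)             # sequence of flattened slices
        w_proj = rng.standard_normal((a * b, d_model)) / np.sqrt(a * b)
        seq = tokens @ w_proj                    # linear projection
        pos = rng.standard_normal((n, d_model))  # unique position embedding
        embeddings.append(seq + pos)             # added element-wise
    return embeddings

views = multi_view_embeddings(np.zeros((8, 8, 8)))
print([e.shape for e in views])  # three (8, 64) token sequences
```

Each of the three resulting sequences would then be fed to the transformer encoder in (b); in the actual model the projection and position embeddings are learned, not random.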