Author manuscript; available in PMC: 2026 Apr 9.

Published in final edited form as: Knowl Based Syst. 2025 Nov 12;331:114810. doi: 10.1016/j.knosys.2025.114810

Table 1.

Summary of the proposed framework.

Dimension	Component	Function / Output
Input	3D input with XY, XZ and YZ projections	Orthogonal cryo-ET views providing complementary cues
Encoder	Transformer encoder with multi-view tokens	Captures cross-view semantic consistency
Fusion	Graph-based aggregation module	Models spatial and frequency-level relationships
Decoder	Multi-scale convolutional layers	Produces voxel-wise segmentation output
Learning objective	View-masked self-supervised learning (SSL) + cross-entropy (CE) loss	Jointly optimizes reconstruction and segmentation
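
The Input and Learning Objective rows can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the table does not say how the XY, XZ and YZ views are computed or how the two loss terms are combined, so the mean projection, the mean-squared-error reconstruction term, the `(Z, Y, X)` axis order, and the names `orthogonal_projections`, `cross_entropy`, `joint_loss` and `ssl_weight` are all assumptions made for illustration.

```python
import numpy as np


def orthogonal_projections(volume):
    """Collapse a 3D volume into XY, XZ and YZ views.

    Assumptions: the volume is indexed as (Z, Y, X) and each view is a
    mean projection along the remaining axis.
    """
    if volume.ndim != 3:
        raise ValueError("expected a 3D volume indexed as (Z, Y, X)")
    return {
        "XY": volume.mean(axis=0),  # collapse Z, result shape (Y, X)
        "XZ": volume.mean(axis=1),  # collapse Y, result shape (Z, X)
        "YZ": volume.mean(axis=2),  # collapse X, result shape (Z, Y)
    }


def cross_entropy(logits, labels):
    """Mean cross-entropy for logits of shape (N, C) and integer labels of shape (N,)."""
    # Subtract the row maximum for a numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(labels.shape[0]), labels].mean())


def joint_loss(logits, labels, reconstructed_view, masked_view, ssl_weight=1.0):
    """CE segmentation loss plus a weighted reconstruction loss on the masked view.

    Assumptions: reconstruction error is mean squared error, and ssl_weight
    is a hypothetical balancing factor.
    """
    reconstruction = float(np.mean((reconstructed_view - masked_view) ** 2))
    return cross_entropy(logits, labels) + ssl_weight * reconstruction


# Usage: project a toy volume, then score a perfect reconstruction of one view.
volume = np.arange(24, dtype=float).reshape(2, 3, 4)
views = orthogonal_projections(volume)
loss = joint_loss(np.zeros((5, 2)), np.zeros(5, dtype=int), views["XY"], views["XY"])
```

With a perfect reconstruction the second term vanishes, so `loss` reduces to the cross-entropy term alone. In a real pipeline the reconstruction would come from the decoder applied to a held-out view.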