Author manuscript; available in PMC: 2023 Nov 1.
Published in final edited form as: Med Image Anal. 2022 Sep 14;82:102615. doi: 10.1016/j.media.2022.102615

Fig. 5:

(a) The Swin Transformer builds hierarchical feature maps by merging image patches. Self-attention is computed within each local 3D window (the red box). The feature maps generated at each resolution are passed to a ConvNet decoder to produce the output. (b) The 3D cyclic shift of local windows used for shifted-window-based self-attention computation.
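
The windowing and cyclic shift described in the caption can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names `cyclic_shift_3d` and `window_partition_3d`, the channels-last layout `(D, H, W, C)`, the toy 4x4x4 volume, and the choice of a shift equal to half the window size are all assumptions made for illustration. The attention computation itself, and the mask that shifted-window attention applies so that voxels wrapped around by the shift do not attend to each other, are omitted.

```python
import numpy as np

def cyclic_shift_3d(x, shift):
    """Cyclically roll a (D, H, W, C) feature map along its three spatial axes.

    A negative roll moves voxels toward the origin; voxels that fall off one
    side wrap around to the opposite side.
    """
    sd, sh, sw = shift
    return np.roll(x, shift=(-sd, -sh, -sw), axis=(0, 1, 2))

def window_partition_3d(x, window):
    """Split a (D, H, W, C) volume into non-overlapping local 3D windows.

    Returns an array of shape (num_windows, wd * wh * ww, C), i.e. one token
    sequence per window, over which self-attention would be computed.
    """
    D, H, W, C = x.shape
    wd, wh, ww = window
    assert D % wd == 0 and H % wh == 0 and W % ww == 0, "volume must tile evenly"
    x = x.reshape(D // wd, wd, H // wh, wh, W // ww, ww, C)
    # Bring the three window-index axes to the front, window contents after.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, wd * wh * ww, C)

# Toy feature map (hypothetical): 4x4x4 voxels, 1 channel, values = flat index.
feat = np.arange(4 * 4 * 4, dtype=np.float32).reshape(4, 4, 4, 1)
window = (2, 2, 2)
shift = tuple(w // 2 for w in window)  # assumed: shift by half a window

# Regular windows, then windows taken after the cyclic shift.
regular = window_partition_3d(feat, window)
shifted = window_partition_3d(cyclic_shift_3d(feat, shift), window)

print(regular.shape, shifted.shape)
```

Because the shift is cyclic, the number of windows stays the same in both partitions; only the voxels grouped into each window change, which is what lets information flow across the boundaries of the regular windows.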