Author manuscript; available in PMC: 2025 Mar 4.
Published in final edited form as: Proc IEEE Int Conf Big Data. 2024 Dec;2024:4941–4945. doi: 10.1109/bigdata62323.2024.10825319

Fig. 2.

Illustration of the six stages of the pre-trained ViT: an input layer, four successive transformer blocks, and the classifier head. The re-trained layers are enclosed in black dashed lines. The patch merging layer, enclosed by the red dashed line, is re-trained only in the first transformer stage.
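The selective re-training in the caption can be sketched as a per-parameter "trainable" mask. This is a minimal illustration under stated assumptions. The parameter naming scheme (`"<stage>.<layer>.<tensor>"`), the stage labels, and the function `trainable_mask` are all hypothetical. The caption does not list which layers sit inside the black dashed boxes, so they are passed in by the caller. The only rule taken from the caption is that the patch merging layer is trainable in the first transformer stage and frozen elsewhere.

```python
# Hypothetical labels for the six stages described in the caption.
STAGES = ["input", "stage1", "stage2", "stage3", "stage4", "head"]


def trainable_mask(param_names, retrained_stages):
    """Map each parameter name to True (re-trained) or False (frozen).

    param_names: names in a hypothetical "<stage>.<layer>.<tensor>" scheme.
    retrained_stages: stages whose layers are re-trained (the black dashed
        boxes); the caption does not specify them, so this is an assumption.
    """
    unknown = set(retrained_stages) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")

    mask = {}
    for name in param_names:
        stage, layer, _tensor = name.split(".", 2)
        if stage not in STAGES:
            raise ValueError(f"unknown stage in parameter name: {name}")
        if layer == "patch_merging":
            # Per the caption: patch merging is re-trained only in stage 1.
            mask[name] = stage == "stage1"
        else:
            # Other layers follow the caller-supplied re-training choice.
            mask[name] = stage in retrained_stages
    return mask


if __name__ == "__main__":
    names = [
        "input.embed.weight",
        "stage1.patch_merging.weight",
        "stage2.patch_merging.weight",
        "stage3.block0.weight",
        "head.fc.weight",
    ]
    # Assumption for illustration: only the classifier head is in a black box.
    for name, trainable in trainable_mask(names, {"head"}).items():
        print(f"{name}: {'re-trained' if trainable else 'frozen'}")
```

In a PyTorch model, a mask like this would typically be applied by iterating over `model.named_parameters()` and setting each parameter's `requires_grad` flag. The real parameter names would then have to be mapped onto the stages shown in the figure.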