2022 Jul 8;3(7):100520. doi: 10.1016/j.patter.2022.100520

Figure 8.


Comparison of different hierarchical architectures for classification models

After patch embedding, the feature map has size h×w×c, where h, w, and c are its height, width, and number of channels. A patch merging operation is applied between every two consecutive stages; usually 2×2 patches are merged and the number of channels doubles. The feature-map resolution differs across the architectures: usually h=H/16 in single stage, h=H/7 in two stage, and h=H/4 in pyramid, where H and W are the height and width of the input image and w is related to W in the same way.
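The shape bookkeeping described in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stage_shapes`, the 224×224 input, the base channel count of 96, and the choice of four stages are hypothetical values picked for the example, not figures taken from the paper. The sketch assumes each patch merging step halves h and w (2×2 merging) and doubles c.

```python
def stage_shapes(H, W, patch, c, stages):
    """Return the (h, w, c) feature-map shape at each stage.

    Assumptions (illustrative, not from the paper): patch embedding
    reduces the input by a factor of `patch`, and every patch merging
    step merges 2x2 patches, halving h and w and doubling c.
    """
    h, w = H // patch, W // patch  # shape right after patch embedding
    shapes = []
    for _ in range(stages):
        shapes.append((h, w, c))
        # Patch merging between consecutive stages: 2x2 merge, channels double.
        h, w, c = h // 2, w // 2, c * 2
    return shapes


# Pyramid-style layout: h = H/4 after patch embedding, then three merges.
pyramid = stage_shapes(224, 224, patch=4, c=96, stages=4)

# Single-stage layout: h = H/16 and no patch merging.
single = stage_shapes(224, 224, patch=16, c=768, stages=1)
```

With these hypothetical settings, the pyramid layout starts at 56×56 and ends at 7×7 with eight times the initial channel count, while the single-stage layout keeps one 14×14 feature map throughout.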