2022 Nov 4;13:1024104. doi: 10.3389/fmicb.2022.1024104

Figure 2.

Structure of embedding layer in Vision Transformer. The darker green wider rectangles represent the flattened feature vector of each block of an image, while the pink wider rectangles represent the feature vectors corresponding to classes, and the brown narrower rectangle represents the spatiotemporal information of the image.
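
The embedding step described in this caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `vit_embed`, the 224×224 input, the patch size of 16, and the embedding width of 768 are assumptions chosen for illustration, and the learned parameters are replaced by random stand-ins. It also assumes the standard Vision Transformer formulation, in which the class vector is prepended to the sequence of block vectors and the position information is added element-wise.

```python
import numpy as np

def vit_embed(image, patch_size, dim, rng):
    """Hypothetical helper: map an (H, W, C) image to (num_patches + 1, dim) tokens."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    gh, gw = h // patch_size, w // patch_size

    # Cut the image into non-overlapping blocks and flatten each block
    # into one feature vector (the "green" rectangles in the figure).
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(gh * gw, patch_size * patch_size * c)

    # Random stand-ins for parameters that would be learned in training.
    proj = rng.normal(scale=0.02, size=(patches.shape[1], dim))  # linear projection
    cls_token = rng.normal(scale=0.02, size=(1, dim))            # class vector ("pink")
    pos_embed = rng.normal(scale=0.02, size=(gh * gw + 1, dim))  # position info ("brown")

    # Prepend the class vector, then add one position vector per token.
    tokens = np.concatenate([cls_token, patches @ proj], axis=0)
    return tokens + pos_embed

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
tokens = vit_embed(image, patch_size=16, dim=768, rng=rng)
print(tokens.shape)  # (197, 768): 14 * 14 block vectors plus one class vector
```

With these assumed sizes, the image yields 196 block vectors, and the extra class vector brings the sequence length passed to the encoder to 197.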