2022 Jul 8;3(7):100520. doi: 10.1016/j.patter.2022.100520

Figure 2.

MLP-Mixer structure

We omit the layer normalization, the non-linear activation function, and the residual path to improve readability. The token-mixing and channel-mixing MLPs are each reduced to a single fully connected layer to ease understanding. The expression in the dashed box is consistent with Figure 1.
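
The simplification described in this caption can be illustrated with a minimal sketch. It assumes the input is a matrix of S tokens (rows) by C channels (columns), that token mixing is one fully connected layer acting across rows, and that channel mixing is one fully connected layer acting across columns. The names `matmul`, `simplified_mixer_layer`, `w_token`, and `w_channel` are hypothetical and do not come from the article, and the sketch is not claimed to reproduce the expression in the dashed box.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def simplified_mixer_layer(x, w_token, w_channel):
    """One simplified Mixer layer (assumed shapes, for illustration only).

    x:         S x C input (S tokens, C channels)
    w_token:   S x S weights, mixes information across tokens
    w_channel: C x C weights, mixes information across channels
    Layer normalization, activation, and residual path are left out,
    matching the simplification stated in the caption.
    """
    u = matmul(w_token, x)    # token mixing: same weights applied to every channel
    y = matmul(u, w_channel)  # channel mixing: same weights applied to every token
    return y


# Toy example: 3 tokens, 2 channels.
x = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
w_token = [[0, 1, 0],   # swaps the first two tokens
           [1, 0, 0],
           [0, 0, 1]]
w_channel = [[1, 1],    # second output channel is the sum of both inputs
             [0, 1]]

y = simplified_mixer_layer(x, w_token, w_channel)
print(y)  # [[3.0, 7.0], [1.0, 3.0], [5.0, 11.0]]
```

Under these assumptions, the simplified layer is just the product of the token-mixing weights, the input, and the channel-mixing weights, which makes it easy to see that the two mixing steps act on different axes of the same matrix.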