Diagnostics. 2022 Jun 25;12(7):1549. doi: 10.3390/diagnostics12071549

Figure 2.

(a) The transformer block consists of a multi-head self-attention (MSA) module, a multilayer perceptron (MLP), skip connections, and layer normalization. (b) Self-attention (scaled dot-product attention). MatMul: matrix multiplication. (c) Multi-head attention consists of multiple parallel self-attention heads whose outputs are concatenated. Concat: concatenation of feature representations. h: the number of self-attention heads.
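
The following is a minimal PyTorch sketch of the three panels in Figure 2, assuming a standard pre-norm arrangement (LayerNorm before each sub-layer) and an MLP expansion ratio of 4; the class and parameter names are illustrative, not taken from the paper's implementation.

```python
import math
import torch
import torch.nn as nn


def scaled_dot_product_attention(q, k, v):
    """Panel (b): Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)  # MatMul + scale
    weights = scores.softmax(dim=-1)                                # SoftMax
    return torch.matmul(weights, v)                                 # MatMul with V


class MultiHeadSelfAttention(nn.Module):
    """Panel (c): h parallel self-attention heads, concatenated and projected."""

    def __init__(self, dim, num_heads):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)   # joint Q, K, V projection
        self.proj = nn.Linear(dim, dim)      # linear layer after Concat

    def forward(self, x):
        b, n, dim = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each: (b, heads, n, head_dim)
        out = scaled_dot_product_attention(q, k, v)      # run all heads in parallel
        out = out.transpose(1, 2).reshape(b, n, dim)     # Concat head outputs
        return self.proj(out)


class TransformerBlock(nn.Module):
    """Panel (a): MSA and MLP sub-layers, each with LayerNorm and a skip connection."""

    def __init__(self, dim, num_heads, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.msa = MultiHeadSelfAttention(dim, num_heads)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, x):
        x = x + self.msa(self.norm1(x))   # skip connection around MSA
        x = x + self.mlp(self.norm2(x))   # skip connection around MLP
        return x
```

For example, `TransformerBlock(dim=768, num_heads=12)` applied to a tensor of shape `(1, 197, 768)` returns a tensor of the same shape, as each sub-layer preserves the embedding dimension.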