Schematic illustration of the 1D and 2D attention mechanisms. (a) The scheme for the 1D attention mechanism. The input is first reshaped into a tensor of size (N_heads, L, d_att) for an efficient multi-headed attention implementation. For each head, the (L, d_att) tensor is multiplied by three different trainable matrices of size (d_att, d_att) to generate the Query (Q), Key (K), and Value (V); each head has its own transformation matrices for Q, K, and V. Q and K first go through a batched dot-product operation, producing a new tensor QK of size (N_heads, L, L). QK is then scaled and normalized with a Softmax function on the last axis, yielding the attention scores W_att. The product W_att × V for each head becomes the 1D attention output. (b) The scheme for the 2D attention mechanism. The 2D input is first transformed with a 3D convolution and stretched into a tensor of size (L, L, 32, n_2). The same attention operation as in the 1D scheme is then applied along the last axis.
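Below is a minimal sketch of the per-head 1D attention described in panel (a), written in PyTorch. The class name OneDAttention and the 1/sqrt(d_att) scaling factor are assumptions for illustration; the caption only states that QK is scaled before the Softmax, and the actual implementation may differ.

```python
# Minimal sketch of the 1D multi-head attention in panel (a).
# OneDAttention and the 1/sqrt(d_att) scaling are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OneDAttention(nn.Module):
    def __init__(self, n_heads: int, d_att: int):
        super().__init__()
        self.d_att = d_att
        # One trainable (d_att, d_att) matrix per head for Q, K, and V.
        self.wq = nn.Parameter(torch.randn(n_heads, d_att, d_att) / d_att**0.5)
        self.wk = nn.Parameter(torch.randn(n_heads, d_att, d_att) / d_att**0.5)
        self.wv = nn.Parameter(torch.randn(n_heads, d_att, d_att) / d_att**0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N_heads, L, d_att) -- input already split across heads.
        q = torch.einsum("hld,hde->hle", x, self.wq)  # Query
        k = torch.einsum("hld,hde->hle", x, self.wk)  # Key
        v = torch.einsum("hld,hde->hle", x, self.wv)  # Value
        # Batched dot product: QK has size (N_heads, L, L).
        qk = torch.einsum("hld,hmd->hlm", q, k)
        # Scale and normalize with Softmax on the last axis -> attention scores W_att.
        w_att = F.softmax(qk / self.d_att**0.5, dim=-1)
        # W_att x V for each head gives the 1D attention output: (N_heads, L, d_att).
        return torch.einsum("hlm,hmd->hld", w_att, v)


# Example with N_heads = 4, L = 64, d_att = 32 (arbitrary illustrative sizes).
attn = OneDAttention(n_heads=4, d_att=32)
out = attn(torch.randn(4, 64, 32))  # -> shape (4, 64, 32)
```

The 2D attention in panel (b) would reuse the same operation, applied along the last axis of the (L, L, 32, n_2) tensor produced by the 3D convolution.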