2019 Dec 3;20:627. doi: 10.1186/s12859-019-3217-3

Fig. 7.
An example of how each of the two attention heads in multi-head attention computes a different context vector from the words on the SDP. The width of a line indicates the attention weight
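The mechanism the figure illustrates can be sketched in code: each head applies its own learned projections before scaled dot-product attention, so the two heads assign different weights to the same SDP words and produce different context vectors. This is a minimal sketch; the sizes (a four-word SDP, eight-dimensional word vectors) and the random projections are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a vector of scores.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_head(query, keys, values):
    # Scaled dot-product attention for one head:
    # one score per SDP word, softmax-normalized into weights
    # (the line widths in the figure), then a weighted sum of values.
    d_k = keys.shape[1]
    scores = keys @ query / np.sqrt(d_k)
    weights = softmax(scores)
    context = weights @ values
    return weights, context

rng = np.random.default_rng(0)
d_model, n_words, d_k = 8, 4, 4  # hypothetical sizes

# Representations of the words along the shortest dependency path (SDP).
X = rng.normal(size=(n_words, d_model))
query_state = rng.normal(size=d_model)

contexts = []
for head in range(2):  # two heads, as in the figure
    # Each head owns separate projection matrices, so it attends differently.
    W_q = rng.normal(size=(d_model, d_k))
    W_k = rng.normal(size=(d_model, d_k))
    W_v = rng.normal(size=(d_model, d_k))
    weights, context = attention_head(query_state @ W_q, X @ W_k, X @ W_v)
    contexts.append(context)
    print(f"head {head} attention weights:", np.round(weights, 3))

# The per-head context vectors are concatenated into the multi-head output.
multi_head_output = np.concatenate(contexts)
```

Because each head's projections differ, the two printed weight distributions over the four SDP words differ, which is exactly the behavior the figure depicts.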