Table 1. Comparison of attention mechanism modeling methods. Each entry gives the reference, the attention mechanism's name, the modeling method, and brief comments.
[69] Soft attention. Method: when computing the attention probability distribution, assigns every word in the input sentence a probability derived from the context vector. Comments: parameterized; differentiable; deterministic.

[69] Hard attention. Method: attends only to a single, randomly chosen location, using Monte Carlo sampling to estimate the gradient. Comments: stochastic; probability-based; simple.

[70] Multi-head attention. Method: linearly projects the queries, keys, and values multiple times and attends to each projection in parallel. Comments: linear projections; parallel computation; attends to information from different representation subspaces at different positions; multiple attention heads.

[70] Scaled dot-product attention. Method: executes a single attention function over key, value, and query matrices. Comments: fast; space-efficient.
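
A minimal NumPy sketch of the two [70] mechanisms, following Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V; the shapes, head count, and weight names (Wq, Wk, Wv, Wo) are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    return softmax(scores) @ V

def multi_head_attention(X, num_heads, Wq, Wk, Wv, Wo):
    # Project X into num_heads (Q, K, V) subspaces, attend in parallel,
    # then concatenate the heads and project back with Wo.
    heads = []
    for h in range(num_heads):
        Q, K, V = X @ Wq[h], X @ Wk[h], X @ Wv[h]
        heads.append(scaled_dot_product_attention(Q, K, V))
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 6, 16, 4
d_head = d_model // num_heads
X = rng.normal(size=(seq_len, d_model))
Wq = rng.normal(size=(num_heads, d_model, d_head))
Wk = rng.normal(size=(num_heads, d_model, d_head))
Wv = rng.normal(size=(num_heads, d_model, d_head))
Wo = rng.normal(size=(d_model, d_model))
out = multi_head_attention(X, num_heads, Wq, Wk, Wv, Wo)  # shape (6, 16)
```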

[71] Global attention. Method: considers the hidden states of all encoder positions; the attention weight distribution is obtained by comparing the current decoder hidden state with each encoder hidden state. Comments: comprehensive; time-consuming; computationally expensive.

[71] Local attention. Method: first predicts an aligned position, then computes attention weights within a window to the left and right of that position, and finally forms the weighted context vector. Comments: reduces computational cost.
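
To make the [71] contrast concrete, here is a sketch of global versus local attention with a simple dot-product score. The fixed window and the externally supplied center position are simplifying assumptions; Luong et al. additionally predict the aligned position from the decoder state and apply a Gaussian weighting around it.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def global_attention(decoder_state, encoder_states):
    # Compare the decoder state with every encoder state: comprehensive but costly.
    weights = softmax(encoder_states @ decoder_state)
    return weights @ encoder_states

def local_attention(decoder_state, encoder_states, center, window=2):
    # Attend only within a window around the aligned position `center`.
    lo = max(0, center - window)
    hi = min(len(encoder_states), center + window + 1)
    local = encoder_states[lo:hi]
    weights = softmax(local @ decoder_state)
    return weights @ local

rng = np.random.default_rng(0)
enc = rng.normal(size=(10, 8))   # 10 encoder hidden states of dimension 8
dec = rng.normal(size=8)         # current decoder hidden state
ctx_global = global_attention(dec, enc)
ctx_local = local_attention(dec, enc, center=4)
```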

[75] Adaptive attention. Method: defines a new adaptive context vector modeled as a mixture of the spatially attended image features and a visual sentinel vector, trading off how much new information the network takes from the image against what it already holds in the decoder memory. Comments: decides when and where to apply attention so as to extract meaningful information for the words of the sequence.
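
The [75] mixture can be written as ĉ_t = β_t·s_t + (1 − β_t)·c_t, where c_t is the attended image feature, s_t is the visual sentinel, and β_t ∈ [0, 1] is the sentinel gate. A minimal sketch, with β supplied directly rather than learned:

```python
import numpy as np

def adaptive_context(attended_image_feat, visual_sentinel, beta):
    # beta near 1: rely on decoder memory (sentinel);
    # beta near 0: rely on the attended image features.
    return beta * visual_sentinel + (1.0 - beta) * attended_image_feat

rng = np.random.default_rng(0)
c_t = rng.normal(size=8)     # spatially attended image features
s_t = rng.normal(size=8)     # visual sentinel from the LSTM memory
c_hat = adaptive_context(c_t, s_t, beta=0.3)
```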

[76] Semantic attention. Method: selects semantic concepts and incorporates them into the hidden state and output of the LSTM. Comments: selective; merges information; combines top-down and bottom-up approaches.

[77] Spatial and channel-wise attention. Method: selects semantic attributes according to the needs of the sentence context. Comments: multiple semantics; addresses the limitations encountered with general attention.
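
A rough sketch in the spirit of SCA-CNN [77], attending first over the channels and then over the spatial locations of a CNN feature map; the scoring functions used here (elementwise product with the decoder state, dot products) are simplified assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def channel_wise_attention(feat, h):
    # feat: (C, H, W) CNN feature map; h: decoder state of dimension C (assumption).
    C = feat.shape[0]
    pooled = feat.reshape(C, -1).mean(axis=1)   # mean-pool each channel
    beta = softmax(pooled * h)                  # per-channel attention weights
    return feat * beta[:, None, None]

def spatial_attention(feat, h):
    # Weight each spatial location by its match with the decoder state.
    C, H, W = feat.shape
    flat = feat.reshape(C, H * W)               # (C, H*W)
    alpha = softmax(h @ flat)                   # per-location attention weights
    return (flat * alpha).reshape(C, H, W)

rng = np.random.default_rng(0)
feat = rng.normal(size=(16, 7, 7))
h = rng.normal(size=16)
out = spatial_attention(channel_wise_attention(feat, h), h)
```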

[4] Areas of attention. Method: models the dependencies between image regions, caption words, and the state of the RNN language model. Comments: interaction; comprehensive.