Table 1.
Comparison of attention mechanism modeling methods.
| Ref. | Attention name | Method | Comment |
|---|---|---|---|
| [69] | Soft attention | Assigns a probability to every position of the input sentence when computing the attention distribution used to form the context vector | Parameterized; differentiable; deterministic |
| [69] | Hard attention | Attends to a single, stochastically chosen location; the gradient is estimated with Monte Carlo sampling | Stochastic; samples according to the attention probabilities; simple |
| [70] | Multi-head attention | Linearly projects the queries, keys, and values multiple times and computes attention over each projection in parallel | Linear projections; parallel computation; attends to information from different representation subspaces at different positions; multiple attention heads |
| [70] | Scaled dot-product attention | Computes a single attention function from query, key, and value matrices, scaling the dot products before the softmax | Fast; space-efficient (see the code sketch after the table) |
| [71] | Global attention | Considers the hidden states of all encoder steps; the attention weight distribution is obtained by comparing the current decoder hidden state with each encoder hidden state | Comprehensive; time-consuming; computationally expensive |
| [71] | Local attention | First predicts an aligned position, then computes attention weights within a window around that position, and finally forms the weighted context vector | Reduces computational cost |
| [75] | Adaptive attention | Defines a new adaptive context vector modeled as a mixture of the spatially attended image features and a visual sentinel vector, trading off how much new information the network takes from the image against what the decoder memory already knows | Decides when and where to attend so that meaningful information is extracted for each generated word |
| [76] | Semantic attention | Selects semantic concepts and incorporates them into the hidden state and output of the LSTM | Selective; merges top-down and bottom-up information |
| [77] | Spatial and channel-wise attention | Selects semantic attributes according to the needs of the sentence context | Multiple semantics; intended to overcome the limitations of general attention |
| [4] | Areas of attention | Models the dependencies between image regions, caption words, and the state of the RNN language model | Models interactions; comprehensive |
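
To make the scaled dot-product formulation of [70] concrete, a minimal NumPy sketch for a single head is given below. The function name, array shapes, and toy data are illustrative assumptions, not drawn from any particular implementation in the cited works.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single attention head.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Returns the attended values with shape (n_queries, d_v).
    """
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled so the softmax
    # stays in a well-conditioned range for large d_k.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax gives the attention weight distribution.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the values is the context for each query.
    return weights @ V

# Toy usage (hypothetical sizes): 4 queries over 6 key/value pairs of width 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

Multi-head attention, as compared in the table, repeats this computation over several independently projected query, key, and value matrices and concatenates the per-head outputs.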