Figure 4. .
The attention mechanism within the temporal block. Similarity between the features of each voxel at time t (query) and its own values on other time steps (keys) is calculated through the dot product process and is reweighted during the training process to create the attention weights for the input sequence.
