. 2021 Nov 11;21(22):7498. doi: 10.3390/s21227498

Table 4.

Network structure of the temporal attention module.

| Unit | No. | Layer | Output Size |
|---|---|---|---|
| Input | 0 | Facial Image Feature | 512 × 48 (frames) |
| | 1 | Facial Landmark Feature | 256 × 48 (frames) |
| Temporal Attention Module | 2 | Concatenate (0 + 1) | 768 × 48 (frames) |
| | 3 | Average (48 frames) | 768 |
| | 4 | Concatenate (2 + 3) | 1536 × 48 (frames) |
| | 5 | Fully Connected | 1536 × 48 (frames) |
| | | Fully Connected | 1536 × 48 (frames) |
| | | Fully Connected | 1 × 48 (frames) |
| | 6 | Multiplication (2 · 5) | 768 × 48 (frames) |
| | 7 | Average (48 frames) | 768 |
| Output | 8 | Fully Connected | 3 or 4 |

In Unit 4, the frame-averaged vector from Unit 3 is broadcast across the 48 frames and concatenated with the output of Unit 2 for each frame. In Unit 6, the per-frame features from Unit 2 are multiplied by the corresponding per-frame attention weights from Unit 5.
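The shape flow in the table can be traced with a minimal NumPy sketch. This is not the authors' implementation: the inputs and the fully connected weights are random placeholders, and the table does not specify activation functions or normalization of the attention weights, so none are applied here.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 48  # number of frames

# Hypothetical per-frame input features (Units 0 and 1)
img_feat = rng.standard_normal((T, 512))   # Unit 0: facial image feature
lmk_feat = rng.standard_normal((T, 256))   # Unit 1: facial landmark feature

def fc(a, out_dim):
    # Placeholder fully connected layer with random weights (no bias/activation)
    W = rng.standard_normal((a.shape[-1], out_dim)) * 0.01
    return a @ W

x = np.concatenate([img_feat, lmk_feat], axis=1)     # Unit 2: (48, 768)
g = x.mean(axis=0)                                   # Unit 3: (768,)
z = np.concatenate([x, np.tile(g, (T, 1))], axis=1)  # Unit 4: (48, 1536)
attn = fc(fc(fc(z, 1536), 1536), 1)                  # Unit 5: (48, 1) per-frame weights
y = (x * attn).mean(axis=0)                          # Units 6-7: weight and average, (768,)
logits = fc(y[None, :], 4)[0]                        # Unit 8: (4,) class scores
print(x.shape, z.shape, attn.shape, y.shape, logits.shape)
```

Note how Unit 4 relies on broadcasting: the single 768-dimensional average is tiled across all 48 frames before concatenation, which is what turns the per-frame size from 768 into 1536.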