2021 Nov 11;21(22):7498. doi: 10.3390/s21227498

Table 4.

Network structure of the temporal attention module.

	Unit	Layer	Output Size
Input	0	Facial Image Feature	512 × 48 (frames)
Input	1	Facial Landmark Feature	256 × 48 (frames)
Temporal Attention Module	2	Concatenate (0 + 1)	768 × 48 (frames)
	3	Average (48 frames)	768
	4	Concatenate (2 + 3)	1536 × 48 (frames)
	5	Fully Connected	1536 × 48 (frames)
		Fully Connected	1536 × 48 (frames)
		Fully Connected	1 × 48 (frames)
	6	Multiplication (2 · 5)	768 × 48 (frames)
	7	Average (48 frames)	768
Output	8	Fully Connected	3 or 4

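In Unit 4, the output of Unit 3 (the 768-dimensional average over all 48 frames) is concatenated with the output of Unit 2 for each frame, giving 1536 features per frame. In Unit 6, the output of Unit 2 is multiplied, for each frame, by the single attention weight that Unit 5 produces for that frame.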
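
The data flow in Table 4 can be traced with a minimal NumPy sketch. It is an illustration of the tensor shapes only, not the authors' implementation. The weights are random placeholders, the function and parameter names (`init_params`, `temporal_attention`, `fc5a`, and so on) are hypothetical, and the activations (ReLU after the first two fully connected layers of Unit 5, a sigmoid on the per-frame weight) are assumptions, since the table does not specify them. Arrays are stored frames-first, so the table's 768 × 48 appears as shape `(48, 768)`.

```python
import numpy as np

FRAMES = 48
IMG_DIM, LMK_DIM = 512, 256   # Units 0 and 1 in Table 4
NUM_CLASSES = 3               # the table lists "3 or 4"


def init_params(rng, num_classes=NUM_CLASSES):
    """Random placeholder weights; only the shapes follow Table 4."""
    d = IMG_DIM + LMK_DIM  # 768

    def dense(n_in, n_out):
        return rng.standard_normal((n_in, n_out)) * 0.01, np.zeros(n_out)

    return {
        "fc5a": dense(2 * d, 2 * d),   # 1536 -> 1536
        "fc5b": dense(2 * d, 2 * d),   # 1536 -> 1536
        "fc5c": dense(2 * d, 1),       # 1536 -> 1 (one weight per frame)
        "fc8": dense(d, num_classes),  # 768 -> 3 or 4
    }


def temporal_attention(img_feat, lmk_feat, params):
    """img_feat: (48, 512), lmk_feat: (48, 256). Returns (weights, logits)."""
    u2 = np.concatenate([img_feat, lmk_feat], axis=1)        # Unit 2: (48, 768)
    u3 = u2.mean(axis=0)                                     # Unit 3: (768,)
    # Unit 4: repeat the clip-level average for every frame, then concatenate.
    u4 = np.concatenate([u2, np.broadcast_to(u3, u2.shape)], axis=1)  # (48, 1536)

    # Unit 5: three fully connected layers applied to each frame independently.
    h = u4
    for name in ("fc5a", "fc5b"):
        w, b = params[name]
        h = np.maximum(h @ w + b, 0.0)                       # ReLU is an assumption
    w, b = params["fc5c"]
    u5 = 1.0 / (1.0 + np.exp(-(h @ w + b)))                  # (48, 1); sigmoid is an assumption

    u6 = u2 * u5                                             # Unit 6: (48, 768), per-frame scaling
    u7 = u6.mean(axis=0)                                     # Unit 7: (768,)
    w, b = params["fc8"]
    u8 = u7 @ w + b                                          # Unit 8: (num_classes,)
    return u5[:, 0], u8


rng = np.random.default_rng(0)
params = init_params(rng)
weights, logits = temporal_attention(
    rng.standard_normal((FRAMES, IMG_DIM)),
    rng.standard_normal((FRAMES, LMK_DIM)),
    params,
)
print(weights.shape, logits.shape)
```

Because the Unit 5 output has shape `(48, 1)`, NumPy broadcasting applies one scalar weight to all 768 features of the corresponding frame, which is what the multiplication in Unit 6 describes.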