2021 Dec 14;21(24):8356. doi: 10.3390/s21248356

Figure 4.


Our proposed Mixed AttendAffectNet. Feature vectors extracted from each movie part are first fed to fully connected layers for dimension reduction and then passed to N identical layers, each consisting of a multi-head attention layer followed by a feed-forward layer. We apply average pooling to the outputs of these layers to obtain one representation vector per movie part. We add positional encodings to these representation vectors, which are then fed to a second set of N identical layers. These layers are similar to the first set, except that each also includes a masked multi-head attention layer. The second set is followed by dropout and a fully connected layer. The previous outputs, together with their corresponding positional encodings, are used as an additional input to the model.
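
The three steps in the caption that do not involve learned weights (average pooling, adding positional encodings, and the masking used in masked attention) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the sinusoidal form of the positional encoding is an assumption (the caption does not say which encoding is used), and the mask assumes the usual convention that a position may attend only to itself and earlier positions.

```python
import math


def average_pool(vectors):
    """Average a list of equal-length vectors element-wise into one vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sinusoidal_encoding(position, d_model):
    """Positional encoding for one position (ASSUMED sinusoidal form).

    Even indices use sine and odd indices use cosine, with each
    sine/cosine pair sharing one frequency.
    """
    encoding = []
    for i in range(d_model):
        angle = position / (10000 ** (2 * (i // 2) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


def add_positions(part_vectors):
    """Add the encoding of each movie part's index to its representation."""
    d_model = len(part_vectors[0])
    return [
        [x + p for x, p in zip(vector, sinusoidal_encoding(pos, d_model))]
        for pos, vector in enumerate(part_vectors)
    ]


def causal_mask(length):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(length)] for i in range(length)]


# Hypothetical toy input: two movie parts, each with three 4-dimensional
# output vectors from the first stack of layers (values are made up).
parts = [
    [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [2.0, 2.0, 2.0, 2.0]],
    [[0.0, 1.0, 0.0, 1.0], [2.0, 1.0, 2.0, 1.0], [1.0, 1.0, 1.0, 1.0]],
]
pooled = [average_pool(part) for part in parts]  # one vector per movie part
encoded = add_positions(pooled)                  # input to the second stack
mask = causal_mask(len(parts))                   # used by masked attention
```

Here `pooled` holds one representation vector per movie part, `encoded` is what would be passed to the second set of layers, and `mask` keeps the masked attention from looking at later positions in the sequence of previous outputs.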