2023 Jun 28;23(13):6006. doi: 10.3390/s23136006

Table 1.

Comparison of existing multimodal fusion methods.

Model	Architecture	Features	Fusion Strategy
Luo et al. [8]	CNN + RNN	voice and text	Fuses audio and handcrafted low-level descriptors through simple vector concatenation.
Micucci et al. [9]	CNN	palmprint and hand-geometry	Score-level fusion: sums the weighted scores from each modality.
Sell et al. [10]	DNN + CNN	face and voice	Converts the output scores generated by the unimodal verification systems into log-likelihood ratios.
PINS [11]	VGG-M	face and voice	Establishes a joint embedding between faces and voices.
EmoRL-Net [12]	ResNet-18	face and voice	Projects the representations of two fully connected layers into a spherical space.
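
The two simplest strategies in Table 1, feature-level vector concatenation and weighted score-level fusion, can be illustrated with a minimal sketch. This is not the implementation used in any of the cited works: the function names `concat_fusion` and `weighted_score_fusion`, the feature values, and the weights are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def concat_fusion(feat_a: Sequence[float], feat_b: Sequence[float]) -> List[float]:
    """Feature-level fusion: join two unimodal feature vectors end to end."""
    return list(feat_a) + list(feat_b)


def weighted_score_fusion(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Score-level fusion: weighted sum of the per-modality match scores."""
    if len(scores) != len(weights):
        raise ValueError("one weight is required per modality score")
    return sum(w * s for w, s in zip(weights, scores))


# Illustrative values only (assumed, not taken from the cited papers).
fused_vec = concat_fusion([0.1, 0.2, 0.3], [0.7, 0.9])   # 3-dim + 2-dim -> 5-dim
fused_score = weighted_score_fusion([0.5, 1.0], [0.5, 0.5])
```

Concatenation leaves any interaction between modalities to be learned by later layers, whereas score-level fusion combines decisions that each unimodal system has already made; the weights would typically be tuned on validation data.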