. 2023 Jun 28;23(13):6006. doi: 10.3390/s23136006

Table 1.

The comparison of existing multimodal fusion methods.

| Model | Architecture | Features | Fusion Strategy |
| --- | --- | --- | --- |
| Luo et al. [8] | CNN + RNN | Voice and text | Fuse the audio and handcrafted low-level descriptors through simple vector concatenation. |
| Micucci et al. [9] | CNN | Palmprint and hand geometry | Score-level fusion: sum the weighted scores from each modality. |
| Sell et al. [10] | DNN + CNN | Face and voice | Convert the output scores generated by unimodal verification systems into log-likelihood ratios. |
| PINS [11] | VGG-M | Face and voice | Establish a joint embedding between faces and voices. |
| EmoRL-Net [12] | ResNet-18 | Face and voice | Project the representations of two fully connected layers into a spherical space. |
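
The two simplest strategies in Table 1, vector concatenation and weighted score-level fusion, can be illustrated with a minimal sketch. This is not the implementation used in [8] or [9]. The function names, feature values, and weights below are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def concat_fusion(feat_a: Sequence[float], feat_b: Sequence[float]) -> List[float]:
    """Feature-level fusion: join two modality feature vectors end to end."""
    return list(feat_a) + list(feat_b)


def weighted_score_fusion(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Score-level fusion: weighted sum of per-modality match scores."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical example values (not taken from the cited works).
audio_features = [0.12, 0.80, 0.33]
text_features = [0.55, 0.07]
fused_vector = concat_fusion(audio_features, text_features)  # length 3 + 2 = 5

modality_scores = [0.8, 0.6]   # e.g. one score per modality
modality_weights = [0.5, 0.5]  # assumed equal weighting
fused_score = weighted_score_fusion(modality_scores, modality_weights)
```

Concatenation keeps every feature from both modalities and leaves it to a downstream classifier to weigh them, whereas score-level fusion combines decisions that each unimodal system has already made.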