Table 9.
Performance of the proposed model in comparison to the baseline models
| Multimodal model | Fusion method | Validation accuracy | Test accuracy |
|---|---|---|---|
| InferSent+VGG16 (Baseline) | Maximum | 86.55% | 86.58% |
| InferSent+EfficientNet (Baseline) | Maximum | 83.28% | 83.39% |
| InferSent+ResNet50 (Baseline) | Maximum | 88.88% | 88.91% |
| BERT+VGG16 (Baseline) | Maximum | 86.94% | 86.99% |
| BERT+EfficientNet (Baseline) | Maximum | 83.34% | 83.18% |
| BERT+ResNet50 (Baseline) | Maximum | 89.29% | 89.09% |
| BERT+ResNet50 (Baseline) | Concatenate | 85.64% | 85.68% |
| BERT+Xception (Proposed) | Maximum | 91.61% | 91.87% |
| BERT+Xception (Proposed) | Concatenate | 91.67% | 91.88% |
| (BERT+Dense)+Xception (Proposed) | Maximum | 91.68% | 91.94% |
| (BERT+Dense)+Xception (Proposed) | Concatenate | 91.94% | 91.87% |
Bold indicates the models with better performance measures (here, validation and test accuracy).
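
The table does not define the two entries in the "Fusion method" column. The following is a minimal sketch, assuming that "Maximum" denotes an element-wise maximum over equal-length text and image feature vectors and that "Concatenate" joins the two vectors end to end. The function names and the toy feature values are hypothetical and are not taken from the models evaluated above.

```python
from typing import List


def fuse_maximum(text_vec: List[float], image_vec: List[float]) -> List[float]:
    """Element-wise maximum; assumes both vectors have the same length."""
    if len(text_vec) != len(image_vec):
        raise ValueError("maximum fusion requires vectors of equal length")
    return [max(t, v) for t, v in zip(text_vec, image_vec)]


def fuse_concatenate(text_vec: List[float], image_vec: List[float]) -> List[float]:
    """Concatenation; the fused vector length is the sum of both lengths."""
    return list(text_vec) + list(image_vec)


# Toy feature vectors (hypothetical values, for illustration only).
text_features = [0.2, 0.9, 0.1]
image_features = [0.7, 0.3, 0.4]

fused_max = fuse_maximum(text_features, image_features)
fused_cat = fuse_concatenate(text_features, image_features)
```

Under these assumptions, maximum fusion keeps the fused representation at the size of a single modality, whereas concatenation doubles it and leaves the weighting of the two modalities to the layers that follow.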