
Fig. 4.


The proposed multi-modal knowledge graph attention embedding model. Given a multimodal knowledge graph $G$, we propose a multimodal attention mechanism consisting of three parts: ➀ the single-level modality attention, whose results are denoted as $\{f_{knowledge}^{\Phi_i}, i = 0, 1, \ldots, C\}$; ➁ the multiple-level modality attention, whose embedding is denoted as $f_{knowledge}^{multiple}$; ➂ the cross-level modality attention mechanism, which fuses the information from the single-level and multiple-level modality attentions, and whose embedding matrix is denoted as $f_{knowledge}$. Meanwhile, we propose the Temporal Convolutional Self-Attention Network (TCSAN) to handle the input multimodal data and obtain the multimodal sentence vectors $f_{network}$. Then, we obtain the knowledge-based attention feature vector $f$. Finally, we use a classifier (in this paper, ResNet-34 [70]) to obtain the labels, i.e., $\hat{y}_p$.
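As a rough illustration of the pipeline described in this caption, the following PyTorch-style sketch wires together the cross-level fusion of the knowledge-attention embeddings, a stand-in for the TCSAN branch, and a final classifier. All module structures, layer choices, tensor shapes, and the simple attention scoring are assumptions made for illustration only; they are not taken from the paper's implementation (which, for example, uses ResNet-34 [70] as the classifier).

```python
# Minimal sketch of the Fig. 4 pipeline; all names and dimensions are assumptions.
import torch
import torch.nn as nn


class CrossLevelFusion(nn.Module):
    """Fuses the single-level embeddings {f_knowledge^{Phi_i}} with the
    multiple-level embedding f_knowledge^{multiple} via attention weights."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # hypothetical scoring layer

    def forward(self, single_level: torch.Tensor, multiple_level: torch.Tensor):
        # single_level: (batch, C+1, dim); multiple_level: (batch, dim)
        candidates = torch.cat([single_level, multiple_level.unsqueeze(1)], dim=1)
        weights = torch.softmax(self.score(candidates), dim=1)   # (batch, C+2, 1)
        f_knowledge = (weights * candidates).sum(dim=1)          # (batch, dim)
        return f_knowledge


class KnowledgeAttentionPipeline(nn.Module):
    def __init__(self, dim: int = 128, num_classes: int = 2):
        super().__init__()
        self.cross_level = CrossLevelFusion(dim)
        # Stand-in for the Temporal Convolutional Self-Attention Network (TCSAN).
        self.tcsan = nn.Sequential(nn.Conv1d(dim, dim, kernel_size=3, padding=1),
                                   nn.ReLU(),
                                   nn.AdaptiveAvgPool1d(1))
        # Stand-in classifier; the paper itself uses ResNet-34 [70].
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, single_level, multiple_level, multimodal_seq):
        # multimodal_seq: (batch, dim, seq_len) multimodal sentence features.
        f_knowledge = self.cross_level(single_level, multiple_level)   # (batch, dim)
        f_network = self.tcsan(multimodal_seq).squeeze(-1)             # (batch, dim)
        f = torch.cat([f_knowledge, f_network], dim=-1)                # knowledge-based attention feature
        return self.classifier(f)                                      # logits for y_hat_p


# Toy usage with random tensors in place of real embeddings.
model = KnowledgeAttentionPipeline()
single = torch.randn(4, 5, 128)      # C+1 = 5 single-level modality embeddings
multiple = torch.randn(4, 128)       # multiple-level modality embedding
sequence = torch.randn(4, 128, 16)   # multimodal sentence sequence
logits = model(single, multiple, sequence)
print(logits.shape)                  # torch.Size([4, 2])
```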