2022 Jun 2;13:3094. doi: 10.1038/s41467-022-30761-2

Fig. 6. Cross-modal retrieval and visual question answering (VQA) results.

a Cross-modal retrieval results (%) on the Chinese dataset AIC-ICC. b VQA results on Visual7W, with the dataset translated into Chinese. Overall accuracies (%) are reported along with the results for each question type. c VQA examples from our BriVL model with and without pre-training, which validate the strong imagination ability of the pre-trained BriVL. The highest results in (a) and (b) are highlighted in bold.