2021 Jun 4;15(6):e12417. doi: 10.1111/lnc3.12417

FIGURE 2.

Datasets of natural images: The task of answering a question about an image has been promoted by the release of datasets pairing each image with a question about it, such as VQA v1.0 (Antol et al., 2015). By balancing answers across pairs of similar images, later datasets have pushed models to build finer‐grained representations (see VQA v2.0; Goyal et al., 2017). The release of densely annotated datasets, such as Visual Genome (Krishna et al., 2017), made it possible to tackle the challenge of building multimodal representations of relations between objects. This paved the way for resources such as GQA (Hudson & Manning, 2019), which include compositional questions involving such relations.