2021 Jun 10;38(8):2939–2970. doi: 10.1007/s00371-021-02166-7

Table 4.

A selection of the frequently used multimodal datasets in the literature

| Reference | Year | Dataset | Modality | Main tasks | Size |
|---|---|---|---|---|---|
| [55] | 2011 | RGB-D Object | RGB + D | Object recognition | Contains 300 object instances under 51 categories from different angles for a total of 250,000 RGB-D images |
| [56] | 2014 | BigBIRD | RGB + D | Object recognition | Contains 125 objects, 600 RGB-D point clouds, and 600 12 megapixel images |
| [57] | 2016 | A large dataset of object scans | RGB + D | Object recognition | Contains more than 10,000 scanned and reconstructed objects in 9 categories |
| [58] | 2011 | RGB-D Semantic Segmentation | RGB + D | Semantic segmentation | Contains 3 3D models for 6 categories and 16 test object scenes |
| [55] | 2011 | RGB-D Scenes v.1 | RGB + D | Object recognition; Semantic segmentation | Contains 8 video scenes from several RGB-D images |
| [55] | 2014 | RGB-D Scenes v.2 | RGB + D | Object recognition; Semantic segmentation | Contains 14 scenes of video sequences |
| [59] | 2011 | NYU v1-v2 | RGB + D | Semantic segmentation | NYU-v1 contains 64 different indoor scenes and 108,617 unlabelled images. NYU-v2 contains 464 different indoor scenes and 407,024 unlabeled images |
| [60] | 2011 | RGB-D People | RGB + D | Object recognition | Contains more than 3,000 RGB-D images |
| [61] | 2016 | SceneNet RGB-D | RGB + D | Semantic segmentation; Instance segmentation; Object detection | Contains 5M RGB-D images |
| [62] | 2017 | Kinetics-400 | RGB + Opt. flow | Motion recognition | Contains more than 300,000 video sequences in 400 classes |
| [63] | 2016 | Scene Flow | RGB + Opt. flow | Object segmentation | Contains over 39,000 high resolution images |
| [64] | 2012 | MPI-Sintel | RGB + Opt. flow | Semantic segmentation; Object recognition | Contains 1,040 annotated optical flow and matching RGB images |