Table 4.
A selection of frequently used multimodal datasets in the literature
Reference | Year | Dataset | Modality | Main tasks | Size |
---|---|---|---|---|---|
[55] | 2011 | RGB-D Object | RGB + D | Object recognition | Contains 300 object instances under 51 categories from different angles for a total of 250,000 RGB-D images |
[56] | 2014 | BigBIRD | RGB + D | Object recognition | Contains 125 objects, 600 RGB-D point clouds, and 600 12-megapixel images |
[57] | 2016 | A large dataset of object scans | RGB + D | Object recognition | Contains more than 10,000 scanned and reconstructed objects in 9 categories |
[58] | 2011 | RGB-D Semantic Segmentation | RGB + D | Semantic segmentation | Contains three 3D models for 6 categories and 16 test object scenes |
[55] | 2011 | RGB-D Scenes v.1 | RGB + D | Object recognition, semantic segmentation | Contains 8 video scenes from several RGB-D images |
[55] | 2014 | RGB-D Scenes v.2 | RGB + D | Object recognition, semantic segmentation | Contains 14 scenes of video sequences |
[59] | 2011 | NYU v1-v2 | RGB + D | Semantic segmentation | NYU-v1 contains 64 different indoor scenes and 108,617 unlabeled images. NYU-v2 contains 464 different indoor scenes and 407,024 unlabeled images |
[60] | 2011 | RGB-D People | RGB + D | Object recognition | Contains more than 3,000 RGB-D images |
[61] | 2016 | SceneNet RGB-D | RGB + D | Semantic segmentation, instance segmentation, object detection | Contains 5M RGB-D images |
[62] | 2017 | Kinetics-400 | RGB + Opt. flow | Motion recognition | Contains more than 300,000 video sequences in 400 classes |
[63] | 2016 | Scene Flow | RGB + Opt. flow | Object segmentation | Contains over 39,000 high-resolution images |
[64] | 2012 | MPI-Sintel | RGB + Opt. flow | Semantic segmentation, object recognition | Contains 1,040 annotated optical flow and matching RGB images |