2023 May 11:1–41. Online ahead of print. doi: 10.1007/s11042-023-15443-5

Table 3.

RGB DATASETS

| Name of the Dataset | Year | Purpose | Quality / Format / Source of preparation / FPS / Remarks | Action Types / Activities covered |
|---|---|---|---|---|
| FineGym [70] | 2020 | Provides temporal annotations at both action and sub-action levels with a three-level semantic hierarchy | RGB videos of gymnasts; 303 competition records (~708 hours); high-quality videos at 720P or 1080P | Gymnasium videos |
| HACS [213] | 2019 | Source for spatiotemporal feature learning | HACS Clips: 1.55 M 2-second clips on 504 K videos. HACS Segments: 140 K complete segments on 50 K videos | RGB videos of 200 action categories. 504 K untrimmed videos, from which 1.5 M annotated clips were sampled. HACS Segments contains 139 K action segments densely annotated in 50 K untrimmed videos spanning 200 action categories |
| 20BN-Something-Something Dataset V2 [158] | 2017 | Large collection of labeled video clips that show humans performing pre-defined basic actions with everyday objects | Quality is 100 px; FPS = 12 | RGB-labeled videos of sub-activities. 220,847 videos: 168,913 in the training set, 24,777 in the validation set, and 27,157 in the test set. 174 labels |
| Kinetics [110] | 2017 | High-quality dataset for human action recognition in videos | RGB; high-quality video dataset of 650,000 video clips. Covers 400/600/700 activities, with each clip lasting 10 seconds. Dataset contains URLs of the videos | 500,000 video clips covering 600 human action classes, with at least 600 video clips for each action class |
| Watch-n-Patch [199] | 2015 | Focuses on modelling human activities, comprising multiple actions, in a completely unsupervised setting | RGB-D dataset of videos capturing daily activities; Kinect sensor used for skeleton data and ground-truth annotations | Seven subjects perform daily activities in eight offices and five kitchens with complex backgrounds |
| Penn Action [96] | 2013 | Human joint annotations for each sequence | RGB frames within 640 × 480 | 2326 video sequences of 15 different actions |
| UCF101 [173] | 2012 | One of the largest datasets of human actions | Clips in each of the 101 action categories are grouped into 25 groups of 4–7 videos each | RGB videos classified into 101 categories, consisting of 13,320 video clips |
| HMDB51 [169] | 2011 | Realistic videos from various sources, including movies and web videos | YouTube, Google | RGB videos; 51 action categories such as “jump”, “kiss” and “laugh”, each containing at least 101 clips; 6849 video clips in total |
| KTH [162] | 2004 | One of the most standard and earliest datasets for activity recognition; contains six actions | 25 individuals participated as actors | RGB videos: walk, jog, run, box, hand-wave, and hand-clap |
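
The split sizes quoted for 20BN-Something-Something V2 in Table 3 can be checked with simple arithmetic. The short Python sketch below is illustrative only: the names `SPLITS`, `REPORTED_TOTAL`, and `split_fractions` are hypothetical and not part of any dataset toolkit, and the counts are copied from the table row rather than read from the dataset itself.

```python
# Split sizes for 20BN-Something-Something V2, copied from Table 3.
# All names here are hypothetical, introduced only for illustration.
SPLITS = {"train": 168_913, "validation": 24_777, "test": 27_157}
REPORTED_TOTAL = 220_847  # total number of videos reported in the table


def split_fractions(splits):
    """Return each split's share of the total number of videos."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


if __name__ == "__main__":
    # The three splits should add up to the reported total.
    assert sum(SPLITS.values()) == REPORTED_TOTAL
    for name, fraction in split_fractions(SPLITS).items():
        print(f"{name}: {fraction:.1%}")
```

Under these figures the three splits add up exactly to the reported total, and the training set accounts for roughly three quarters of the videos.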