Table 3.
RGB DATASETS
| Name of the Dataset | Year | Purpose | Quality/ Format/ Source of preparation | FPS/Remarks | Action Types/Activities covered |
|---|---|---|---|---|---|
| FineGym [70] | 2020 | Provides temporal annotations at both action and sub-action levels with a three-level semantic hierarchy | RGB videos of gymnasts; 303 competition records (~708 hours) | High-quality videos (720p or 1080p) | Gymnastics actions and fine-grained sub-actions |
| HACS [213] | 2019 | Source for spatiotemporal feature learning | HACS Clips: 1.55 M 2-second clips from 504 K videos; HACS Segments: 140 K complete segments in 50 K videos | RGB videos of 200 action categories | 1.5 M annotated clips sampled from 504 K untrimmed videos; HACS Segments contains 139 K action segments densely annotated in 50 K untrimmed videos spanning 200 action categories |
| 20BN-Something-Something Dataset V2 [158] | 2017 | Large collection of labeled video clips showing humans performing pre-defined basic actions with everyday objects | Frame height of 100 px; 12 FPS | RGB-labeled videos of sub-activities | 220,847 videos, with 168,913 in the training set, 24,777 in the validation set and 27,157 in the test set; 174 labels |
| Kinetics [110] | 2017 | High-quality dataset for human action recognition in videos | RGB high-quality video dataset of 650,000 video clips; covers 400/600/700 activity classes | Clips last about 10 seconds; the dataset contains URLs of the videos | 500,000 video clips covering 600 human action classes, with at least 600 video clips for each action class |
| Watch-n-Patch [199] | 2015 | Focuses on modeling human activities comprising multiple actions in a completely unsupervised setting | RGB-D dataset | Videos capturing daily activities; Kinect sensor used for skeleton data ground-truth annotations | Seven subjects perform daily activities in eight offices and five kitchens with complex backgrounds |
| Penn Action [96] | 2013 | Provides human joint annotations for each video sequence | RGB frames with resolution within 640 × 480 | 15 different actions | 2326 video sequences of 15 different actions |
| UCF101 [173] | 2012 | One of the largest datasets of human actions | Realistic videos collected from YouTube; the 101 action categories are grouped into 25 groups, each containing 4–7 videos of an action | RGB videos | 13,320 video clips classified into 101 categories |
| HMDB51 [169] | 2011 | Realistic videos from various sources, including movies and web videos | Collected from YouTube and Google videos | RGB videos | 51 action categories such as “jump”, “kiss” and “laugh”, each containing at least 101 clips; 6849 video clips in total |
| KTH [162] | 2004 | One of the first and most standard datasets for activity recognition, containing six actions | 25 individuals participated as actors | RGB videos | Walking, jogging, running, boxing, hand waving, and hand clapping |
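
For readers who wish to experiment with the RGB datasets listed in Table 3, the following minimal sketch shows how a clip from such a dataset is typically decoded and sub-sampled into a fixed-length RGB frame sequence before being fed to an action-recognition model. It uses OpenCV rather than any dataset's official loader; the clip path, frame count, and directory layout are illustrative assumptions, not part of the datasets' specifications.

```python
# Minimal sketch: uniformly sample RGB frames from one video clip
# (e.g., a UCF101 or Kinetics clip). Paths and parameters are placeholders.
import cv2
import numpy as np


def sample_rgb_frames(video_path: str, num_frames: int = 16) -> np.ndarray:
    """Decode a video and return `num_frames` uniformly spaced RGB frames
    as an array of shape (num_frames, H, W, 3)."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if total <= 0:
        cap.release()
        raise ValueError(f"Could not read frame count from {video_path}")

    # Frame indices to keep, spread evenly over the clip.
    indices = set(np.linspace(0, total - 1, num_frames).astype(int).tolist())

    frames, idx = [], 0
    while True:
        ok, frame_bgr = cap.read()
        if not ok:
            break
        if idx in indices:
            # OpenCV decodes to BGR; convert to RGB for model input.
            frames.append(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
        idx += 1
    cap.release()

    if not frames:
        raise ValueError(f"No frames could be decoded from {video_path}")
    if len(frames) < num_frames:
        # Pad by repeating the last frame if the clip is shorter than expected.
        frames.extend([frames[-1]] * (num_frames - len(frames)))
    return np.stack(frames[:num_frames])


# Example usage with a hypothetical UCF101 clip path:
# clip = sample_rgb_frames("UCF101/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi")
# print(clip.shape)  # (16, H, W, 3)
```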