Table 4.
# | C1: Features | C2: Feature extraction | C3: ML/DL model | C4: Architecture | C5: Metrics | C6: Validation | C7: Hyper-parameters/optimizer/loss function | C8: CIT* |
---|---|---|---|---|---|---|---|---|
R1 | T domain | Hand-crafted | 3DCNN on Color-skl-MHI and RJI | I/p layer with skeletal joints; Color-skl-MHI followed by 3D-DCNN; RJI followed by 3D-DCNN; decision fusion; o/p | Accuracy | Cross validation | DO ratios for the three hidden layers (0.1%, 0.2%, 0.3%)/SGD | Phyo et al. (2019) |
R2 | Spatio-temporal | Automatic | VGG-16, VGG-19, Inception v3 | A 224 × 224 image is input; features extracted from the fc1 layer give a 4096-dimensional vector per image | Accuracy, precision, recall, F1-score | 10% of the data is used for validation | All 3 CNNs pre-trained on ImageNet, then trained on Weizmann reusing the same weights | Deep and Zheng (2019) |
R3 | Spatio-temporal | Automatic | ResNet-50 | C1, MP, C2-C5, AP, FC (2048), SM | Accuracy | Evaluated on UCF-101 and HMDB-51 | BS = 128, DO, LR: and /SGD | Feichtenhofer et al. (2017) |
R4 | Spatio-temporal | Automatic | ResNet-50 | Raw clip i/p, C1, P, C2, C3, C4, C5, GAP, FC, no. of classes. Pre-trained on Kinetics-400, Kinetics-600 and Kinetics-700 | mAPs, GFLOPS | Model performance evaluated on the AVA dataset | LR, WD: , batch normalization/SGD | Feichtenhofer and Ai (2019) |
R5 | Spatio-temporal | Automatic | MERS model with ResNeXt-101 | MERS: train using flow, freeze weights, then train with RGB using MSE loss. MARS: train using privileged flow n/w, freeze weights, use RGB frames during the test phase | Top-1 mean accuracy | Kinetics 400: 20 k, MiniKinetics: 5 k | WD = 0.0005, LR = 0.1, momentum = 0.9 and LR = 0.1 for 64f-clips/SGD/cross entropy | Crasto et al. (2019) |
R6 | Spatio-temporal | Automatic | HATNet based on ResNet-50 and STCNet | 2D ConvNets extract spatial structure; 3D Conv handles interactions across frames. Both 2D and 3D use ResNet-50 | Top-1 mAPs | Kinetics 400 and 600 | Fine-tuned on UCF-101 and HMDB-51/cross entropy | Diba et al. (2020) |
R7 | Spatio-temporal | Automatic | 2D ResNet-50 with STM blocks | Video frames i/p, C1, C2x, C3x, C4x, C5x, FC, o/p. All residual blocks are replaced with STM blocks (1 × 1 2D conv, followed by CMM and CSTM blocks, then 1 × 1 2D conv) | Top-1, top-5 accuracy | Kinetics 400: 19,095 | LR = 0.01, LR = 0.001 for 25 epochs, momentum = 0.9, WD = 2.5 /SGD | Jiang et al. (2019) |
T: time; F: frequency; CV: cross validation; LOSO: leave one subject out; C: convolution; P: pooling; AP: average pooling; MP: max pooling; FC: fully connected; SM: softmax; BN: batch normalization layer; LR: learning rate; DO: dropout; BS: batch size; SGD: stochastic gradient descent; mAPs: mean average precision; GFLOPS: giga floating point operations per second; Spec: specificity; Sens: sensitivity; AUC: area under curve; EER: equal error rate; TL: transfer learning
*CIT: citations
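
The Metrics column above lists accuracy, precision, recall, F1-score, and top-1/top-5 accuracy. As a minimal sketch of how these are commonly computed from per-class scores, the following uses hypothetical helper names (`top_k_accuracy`, `precision_recall_f1`) and made-up toy scores; none of it is taken from the cited papers, which may compute their metrics differently (for example, averaged over clips or classes).

```python
from typing import Dict, Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int = 1) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


def precision_recall_f1(predicted: Sequence[int], labels: Sequence[int],
                        positive: int) -> Dict[str, float]:
    """One-vs-rest precision, recall and F1 for the class `positive`."""
    pairs = list(zip(predicted, labels))
    tp = sum(p == positive and y == positive for p, y in pairs)
    fp = sum(p == positive and y != positive for p, y in pairs)
    fn = sum(p != positive and y == positive for p, y in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example (assumed data): 4 clips, 3 action classes.
scores = [
    [0.7, 0.2, 0.1],  # true class 0: top-1 hit
    [0.1, 0.3, 0.6],  # true class 1: top-1 miss, top-2 hit
    [0.5, 0.4, 0.1],  # true class 2: miss at both k=1 and k=2
    [0.2, 0.5, 0.3],  # true class 1: top-1 hit
]
labels = [0, 1, 2, 1]

top1 = top_k_accuracy(scores, labels, k=1)  # 0.5
top2 = top_k_accuracy(scores, labels, k=2)  # 0.75

# Hard predictions are the arg-max class of each score row.
predicted = [max(range(len(row)), key=lambda c: row[c]) for row in scores]
class1_metrics = precision_recall_f1(predicted, labels, positive=1)
```

With only three classes the sketch uses k = 2 in place of the top-5 setting reported for the larger datasets; the function itself is unchanged for any k.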