Ding et al. (2014)
|
Violence Detection using 3D CNN |
3D convolution is used to get spatial information |
Backpropagation method |
Crowded |
91% accuracy |
Arandjelovic et al. (2016)
|
Deep architecture for place recognition |
VGG VLAD method for image retrieval |
Backpropagation method for feature extraction |
Crowded |
87%–96% accuracy |
Fenil et al. (2019)
|
Framework for football stadium comprising big data analysis and deep learning through bidirectional LSTM |
Bidirectional LSTM |
HOG, SVM |
Crowded |
94.5% accuracy |
Mu, Cao & Jin (2016)
|
Violent scene detection using CNN and deep audio features |
MFB |
CNN |
Crowded |
Approximately 90% accuracy |
Mohtavipour, Saeidi & Arabsorkhi (2021)
|
A multi-stream CNN using handcrafted features |
A deep violence detection framework based on specific features (speed of movement and representative image) derived from handcrafted methods |
CNN |
Both crowded and uncrowded |
|
Sudhakaran & Lanz (2017)
|
Detect violent videos using ConvLSTM |
CNN combined with ConvLSTM |
CNN |
Crowded |
Approximately 97% accuracy |
Naik & Gopalakrishna (2021)
|
Deep violence detection framework based on the specific features derived from handcrafted methods |
Discriminative feature with a novel differential motion energy image |
CNN |
Both crowded and uncrowded |
|
Meng, Yuan & Li (2017)
|
Detecting Human Violent Behavior by integrating trajectory and Deep CNN |
Deep CNN |
Optical flow method |
Crowded |
98% accuracy |
Rendón-Segador et al. (2021)
|
ViolenceNet: Dense Multi-Head Self-Attention with Bidirectional Convolutional LSTM |
3D DenseNet |
Optical flow method |
Crowded |
95.6%–100% accuracy |
Xia et al. (2018)
|
Violence detection method based on a bi-channel CNN and an SVM |
Linear SVM |
Bi-channel CNN |
Both crowded and uncrowded scenes |
95.90 ± 3.53% accuracy on the Hockey Fight dataset, 93.25 ± 2.34% accuracy on the Crowd Violence dataset |
Meng et al. (2020)
|
Trajectory-Pooled Deep Convolutional Networks |
ConvNet model containing 17 convolution-pool-norm layers and two fully connected layers |
Deep ConvNet model |
Both crowded and uncrowded |
92.5% accuracy on the Crowd Violence dataset, 98.6% on the Hockey Fight dataset |
Ullah et al. (2019)
|
Violence Detection using Spatiotemporal Features |
Pre-trained MobileNet CNN model |
3D CNN |
Crowded |
Approximately 97% accuracy |