Table 1.
Author | Publication Year | Technique | Dataset Used | Objectives | Limitations |
---|---|---|---|---|---|
Marastoni et al. [34] | 2021 | CNN, LSTM | OBF dataset (custom dataset), MalImg, MsM2015 | Evaluate the effect of bicubic interpolation and custom-trained transfer learning (TL) on malware detection. | Average TL results, fixed image size, obfuscation techniques, data balancing. |
Casolare et al. [35] | 2022 | J48, LMT, RF, RT, REP Tree | Custom dataset of Android APKs from an Android malware repository. | Malware detection in Android applications. | No benchmark data, data balancing, does not detect the malware family, performance declines as the time difference between samples grows. |
Kim et al. [36] | 2017 | CNN, MLP | Microsoft Malware | Identify the groups to which malware belongs and detect emerging malware. | Unstable MLP, model needs enhancement, data balancing. |
Khan et al. [25] | 2018 | GoogLeNet, ResNet-18, -34, -50, -101, -152 | Microsoft Malware and benign software opcodes converted to images. | Use .EXE files as images for malware identification. | Heavyweight, requires extensive knowledge, longer execution time, large validation loss, data balancing. |
Dai et al. [27] | 2018 | MLP, KNN, RF | VirusTotal | Malware detection using the contents of storage dump files as images. | Cannot detect malware across the full system, hardware features overlooked, shallow ML models, data balancing, below-par accuracy. |
Singh et al. [37] | 2019 | CNN, ResNet-50 | Custom dataset (MalShare, VirusShare, VirusTotal), MalImg | Eliminate difficulties of static and dynamic analysis by using image representations of executables. | Low performance on packed or unseen samples, obfuscation evades visualization, heavyweight, data balancing, evasive malware goes undetected. |
Venkatraman et al. [38] | 2019 | CNN, CNN BiLSTM, CNN BiGRU | BIG 2015, MalImg | Identify malware using image-based methods and hybrid models. | Complex processing, requires extensive knowledge of kernels and fine-tuning. |
Vasan et al. [39] | 2020 | Ensemble of ResNet-50 and VGG16, SVM | MalImg, packed malware data from VirusShare | Identification of packed and unpacked malware. | Complex ensemble, requires extensive NN knowledge, heavyweight, data balancing. |
Sharma et al. [40] | 2020 | CNN, CNN-SVM | MalImg | Maximize the potential of CNNs and other ML models for malware classification. | Architecture needs improvement, multiple SVMs required for multiclass classification, increased model size, data balancing. |
Naeem et al. [41] | 2020 | Fine-tuned CNN trained on ImageNet, VGG-16, ResNet-50, Inception | MalImg, IoT-Android Mobile dataset | Develop a novel CNN-based classifier for multiclass malware classification. | Requires expert domain knowledge for fine-tuning through backpropagation. |
Bakour, K., and Unver, H. M. [42] | 2021 | RF, KNN, DT, Bagging, AdaBoost, Gradient Boost, Ensemble Voting Classifier, ResNet, Inception-V3 | Image datasets based on Manifest.xml, DEX, Manifest-ARSC-DEX, Manifest-Resources.arsc, and Manifest-ARSC-DEX-Native_jar files. | A generic image-based classification approach for any file type, using grayscale images of Android malware samples. | Static analysis only, affected by tampering, code obfuscation, and injection attacks; hybrid classifier takes longer. |
Kumar, S. [43] | 2021 | TL fine-tuned CNN trained on ImageNet, ResNet-50 | MalImg, BIG 2015 | Refine a CNN to identify previously unknown malware without extensive processing and despite evasion strategies. | Requires expert knowledge, data balancing, uniform image size, commonly used CNN-based models. |
Anandhi et al. [44] | 2021 | DenseNet201, VGG3 | MalImg, BIG 2015, some benign samples. | Preserve semantic information by converting malware into Markov images and applying a Gabor filter. | Uniform image size, heavyweight, data balancing. |
Pant et al. [45] | 2021 | Custom CNN, VGG16, ResNet-18, Inception-V3 | MalImg | Detect malware represented as grayscale images. | Insufficient data, non-uniform data, pre-trained models performed worse. |
Kumar et al. [46] | 2022 | CNN trained on ImageNet, VGG16, VGG19, ResNet50, Inception V3 | MalImg, Microsoft BIG | Detect malware by converting Windows PE files into grayscale images. | Heavyweight, data balancing, difficult fine-tuning, CNN pre-trained on general data. |
Kalash et al. [47] | 2018 | GIST-SVM, CNN | MalImg, Microsoft Malware | Develop a deep CNN model for malware identification using a self-learning approach. | Not comparable to existing solutions based on file type; GIST-SVM is ineffective and needs improvement. |
Unver, H. M., and Bakour, K. [48] | 2020 | RF, KNN, DT, Bagging, AdaBoost, Gradient Boost | Manifest file-based image dataset, DEX code-based image dataset, Manifest-DEX-ARSC image dataset, and Android Malware Dataset. | A generic malware detection method for any type of app converted into images. | Static analysis only, affected by tampering and code obfuscation, cannot detect injection attacks, no data balancing. |
Jin et al. [49] | 2020 | CNN-based autoencoders | Dataset obtained from the Andro-Dumpsys study conducted by Korea University. | Detect malware using CNN-based autoencoders with a uniform image size. | Small dataset, classifies uncollected malware as benign, separate encoder needed for malware, high complexity and redundancy, requires more resources and time. |
Bakour, K., and Unver, H. M. [50] | 2021 | DeepVisDroid (1D CNN) trained on local and global features, 2D CNN, CNN inspired by VGG16, ResNet, and Inception-V3. | Manifest.xml file-based dataset, DEX code file-based dataset, Manifest and Resources.arsc file-based dataset, and Manifest, Resources.arsc, and DEX file-based image dataset. | Detect malware by fusing deep learning techniques with image-based features. | High computation time; does not address obfuscation and camouflage in code; relies on commonly used pre-trained models. |
Lo et al. [51] | 2019 | Xception model pre-trained on ImageNet, ensemble of Xception models. | MalImg, Microsoft Malware | Convert .bytes and .asm files into images for malware detection using DL models. | Heavyweight models, requires extensive knowledge, no data balancing. |
Parihar et al. [52] | 2022 | S-DCNN (comprising ResNet50, Xception, EfficientNet-B4) and MLP | MalImg, VirusShare | Tackle malware detection using ensemble and transfer learning. | Heavyweight ensemble model, no data augmentation. |
Darem et al. [54] | 2021 | Ensemble of CNN and XGBoost | Small custom dataset with 9 malware types. | Detect malware using grayscale images of obfuscated opcodes stored in ASM files. | No benchmark dataset, requires extensive knowledge, time-consuming. |
Roseline et al. [55] | 2020 | Ensemble Deep Forest | MalImg, BIG 2015, MaleVis, Malicia (for validation) | Use an ensemble deep forest algorithm with a vision-based approach for high-dimensional malware data. | No data augmentation. |
Ding et al. [56] | 2020 | CNN | Dataset provided by the Drebin project. | Use bytecode images of malware APKs for Android malware detection. | No benchmark dataset, small data, no data augmentation, average results. |
Ngo et al. [57] | 2020 | CNN (grayscale images); KNN, DT, SVM, RF, etc. (other features) | IoT malware datasets from IoTPOT, IoT SOHO, and VirusShare | Evaluate existing static analysis methods for IoT malware detection. | Average results under obfuscation and encryption, no benchmark data. |
Huang et al. [58] | 2021 | VGG16 | Malware and benign samples from virussign.com | Detect Windows malware using a hybrid visualization technique. | No benchmark data, unable to identify unknown samples, average results. |
Naeem et al. [59] | 2020 | CNN | Leopard Mobile, MalImg | Use image visualization and DL models for malware detection in industrial IoT. | No data balancing, longer classification time. |
He et al. [60] | 2019 | CNN with SPP layers, ResNet | Data from the Andro-Dumpsys study | Assess the efficacy of CNN-based models against superfluous API injection in malware detection. | SPP led to memory limitations, dataset constraints, unoptimized models. |
Su et al. [61] | 2018 | Two-layer CNN | Data from Ubuntu system files and the IoTPOT dataset. | Use a CNN-based approach to mitigate the risk of DDoS attacks in IoT environments. | Susceptible to obfuscation, time-consuming data pre-processing. |
Asam et al. [62] | 2022 | CNN, AlexNet, VGG16, ResNet50, Xception, GoogLeNet | IoT malware dataset | Develop a CNN-based model to detect malware in IoT. | Complex CNN design, time-consuming process. |
Makandar, A., and Patrot, A. [63] | 2017 | SVM, KNN | Malheur, MalImg | Malware classification using an efficient texture feature vector. | No data balancing, complex feature vector construction. |
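Many of the works in Table 1 share a preprocessing step: the raw bytes of an executable (or an APK component) are read as unsigned 8-bit values, laid out row by row as a grayscale image, and then resized to the fixed input size a CNN expects, which is the source of the "uniform image size" limitation listed above. The following is a minimal sketch of that step under stated assumptions: a fixed row width of 256 pixels, zero-padding of the last row, and nearest-neighbour resizing (Marastoni et al. [34] study bicubic interpolation instead). The function names `bytes_to_grayscale` and `resize_nearest` are hypothetical and do not reproduce any specific author's pipeline.

```python
import numpy as np


def bytes_to_grayscale(data: bytes, width: int = 256) -> np.ndarray:
    """Map each byte (0-255) to one grayscale pixel, filling rows of `width` pixels.

    Assumption: a fixed width; published works choose it in different ways
    (for example, based on file size). The last row is zero-padded.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    arr = np.frombuffer(data, dtype=np.uint8)
    height = max(1, -(-arr.size // width))  # ceiling division, at least one row
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: arr.size] = arr
    return padded.reshape(height, width)


def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize to a fixed CNN input size (an assumed choice)."""
    rows = (np.arange(out_h) * img.shape[0]) // out_h  # source row per output row
    cols = (np.arange(out_w) * img.shape[1]) // out_w  # source column per output column
    return img[rows[:, None], cols[None, :]]


if __name__ == "__main__":
    sample = bytes(range(256)) * 3 + b"\x7f"  # 769 bytes of toy data, not real malware
    img = bytes_to_grayscale(sample, width=256)
    print(img.shape)  # (4, 256)
    fixed = resize_nearest(img, 64, 64)
    print(fixed.shape)  # (64, 64)
```

Nearest-neighbour resizing keeps the sketch dependency-free beyond NumPy; swapping in bilinear or bicubic interpolation from an image library changes how byte patterns are blended when the image is shrunk, which is exactly the effect some of the surveyed works measure.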