2023 Mar 20;23(6):3253. doi: 10.3390/s23063253

Table 1.

Literature Review Summary.

| Author | Publication Year | Technique | Dataset Used | Objectives | Limitations |
|---|---|---|---|---|---|
| Marastoni et al. [34] | 2021 | CNN, LSTM | OBF dataset (custom dataset), MalImg, MsM2015 | Effect of bicubic interpolation and custom-trained TL on malware detection. | TL average, fixed size, obfuscation techniques, data balancing. |
| Casolare et al. [35] | 2022 | J48, LMT, RF, RT, REP Tree | Custom dataset on Android APK from Android malware repo. | Malware detection in Android applications. | No benchmark data, data balancing, does not detect malware family, time diff. leads to decline. |
| Kim et al. [36] | 2017 | CNN, MLP | Microsoft Malware | Identify groups in which malware resides and detect emerging malware. | Unstable MLP, model enhancement, and data balancing. |
| Khan et al. [25] | 2018 | GoogleNet, ResNet18, 34, 50, 101, 152 | Microsoft Malware, benign software opcodes converted to images. | Use .EXE files as images for malware identification. | Heavy-weight, extensive knowledge, more execution time, large validation loss, data balancing. |
| Dai et al. [27] | 2018 | MLP, KNN, RF | VirusTotal | Malware detection using storage dump file’s content as images. | Unable to find malware in full system, hardware feature overlooked, shallow ML models, data balancing, below par accuracy. |
| Singh et al. [37] | 2019 | CNN, ResNet-50 | Custom dataset (Malshare, VirusShare, VirusTotal), MalImg | Eliminate difficulties during static and dynamic analysis by using image representation of executables. | Low for packed or unseen, obfuscation evades visualization, heavy-weight, data balancing, and undetected evasive malware. |
| Venkatraman et al. [38] | 2019 | CNN, CNN BiLSTM, CNN BiGRU | BIG 2015, MalImg | Identify malware using image-based methods and hybrid models. | Complex processing, extensive knowledge of kernels, and fine-tuning. |
| Vasan et al. [39] | 2020 | Ensemble of ResNet-50 and VGG16, SVM | MalImg, packed malware data from VirusShare | Identification of packed and unpacked malware. | Complex ensemble, requires extensive NN knowledge, heavy-weight, and data balancing. |
| Sharma et al. [40] | 2020 | CNN, CNN-SVM | MalImg | Maximize potential of CNN and other ML models for malware classification. | Architectural improvement, multiple SVMs for multiclass, increased model size, and data balancing. |
| Naeem et al. [41] | 2020 | Fine-tuned CNN trained on ImageNet, VGG-16, ResNet-50, Inception | MalImg, IoT-Android Mobile dataset | Develop a novel CNN-based classifier for multiclass malware classification. | Requires expert domain knowledge for fine-tuning through backpropagation. |
| Bakour, K., and Unver, H. M. [42] | 2021 | RF, KNN, DT, Bagging, AdaBoost, Gradient Boost, Ensemble Voting Classifier, ResNet, Inception-V3 | Manifest.xml, DEX, Manifest-ARSC-DEX, Manifest-Resources.arsc, Manifest-ARSC-DEX-Native_jar-based image dataset. | A generic image-based classification for any file type that uses grayscale images from Android malware samples. | Static analysis, impacted by tampering and code obfuscation, injection attacks, hybrid classifier higher time. |
| Kumar, S. [43] | 2021 | TL fine-tuned CNN trained on ImageNet, ResNet-50 | MalImg, BIG 2015 | Refined CNN to identify unidentified malware without extensive processing and evading strategies. | Requires expert knowledge, data balancing, uniform image size, and common CNN-based models. |
| Anandhi et al. [44] | 2021 | DenseNet201, VGG3 | MalImg, BIG 2015, some benign samples. | To preserve the semantic information by converting malware into Markov images using Gabor filter. | Uniform image size, heavy-weight, data balancing. |
| Pant et al. [45] | 2021 | Custom CNN, VGG16, ResNet-18, Inception-V3 | MalImg | To detect malware in grayscale image form. | Insufficient data, non-uniform data, pre-trained model inferior. |
| Kumar et al. [46] | 2022 | CNN trained on ImageNet, VGG16, VGG19, ResNet50, Inception V3 | MalImg, Microsoft BIG | Detect malware from files obtained by converting Windows PE files into grayscale images. | Heavy-weight, data balancing, difficult fine-tuning, CNN trained on general data. |
| Kalash et al. [47] | 2018 | GIST-SVM, CNN | MalImg, Microsoft Malware | Develop a deep CNN model for malware identification using a self-learning approach. | Approach not comparable to existing solutions based on the file types, GIST-SVM not effective and needs improvement. |
| Unver, H. M., and Bakour, K. [48] | 2020 | Random Forest, KNN, DT, Bagging, AdaBoost, Gradient Boost | Manifest file-based image dataset, DEX code-based image dataset, Manifest-DEX-ARSC image dataset and Android Malware Dataset. | A generic method for any type of app when converted into images for malware detection. | Static analysis methods, impacted by tampering and code obfuscation, does not detect injection attacks, no data balancing. |
| Jin et al. [49] | 2020 | CNN-based autoencoders | Dataset obtained from Andro-Dumpsys study conducted by Korea University. | Detect malware using CNN-based autoencoders using uniform image size. | Uses small dataset, identifies uncollected malware as benign, separate encoder for malware, high complexity and redundancy, more resources and time. |
| Bakour, K., and Unver, H. M. [50] | 2021 | DeepVisDroid (1D CNN) trained on local and global features, 2D CNN, CNN inspired by VGG16, ResNet and Inception-V3. | Manifest.xml file-based dataset, DEX code files-based dataset, Manifest and Resources.arsc files-based dataset, and Manifest, Resources.arsc and Dex files-based image dataset. | To detect malware by fusing deep learning techniques with image-based attributes. | High computation time, fails to acknowledge the obfuscation and camouflage used in code and commonly used pre-trained models. |
| Lo et al. [51] | 2019 | Xception model pre-trained on ImageNet, ensemble of Xception. | MalImg, Microsoft Malware | Convert ‘bytes’ and ‘asm’ into images for malware detection using DL models. | Heavy-weight models, requires extensive knowledge, no data balancing. |
| Parihar et al. [52] | 2022 | S-DCNN (comprising ResNet50, Xception, EfficientNet-B4) and MLP | MalImg, VirusShare | Tackle the malware detection problem using ensemble and transfer learning. | Heavy-weight ensemble model, no data augmentation. |
| Darem et al. [54] | 2021 | Ensemble: CNN and XGBoost | Small custom dataset with 9 malware types. | Detect malware using grayscale images of obfuscated opcodes present as ASM files. | No benchmark dataset, requires extensive knowledge, time-consuming. |
| Roseline et al. [55] | 2020 | Ensemble Deep Forest | MalImg, BIG2015, Malevis, Malicia (for validation) | Use ensemble deep forest algorithm along with a vision-based approach for high dimensional malware data. | No data augmentation. |
| Ding et al. [56] | 2020 | CNN | Dataset provided by DRE-BIN project. | Use bytecode images of malware APKs for Android malware detection. | No benchmark dataset, small data, no data augmentation, and average results. |
| Ngo et al. [57] | 2020 | Grayscale image: CNN; other features: KNN, DT, SVM, RF, etc. | IoT malware dataset by IoTPOT, IoT SOHO and VirusShare | Experimentation on existing methods of static analysis for IoT malware detection. | Average results with obfuscation and encryption, no benchmark data. |
| Huang et al. [58] | 2021 | VGG16 | Malware + benign samples from ‘virussign.com’ | To detect malware present in Windows OS using hybrid visualization technique. | No benchmark data, unable to identify unknown samples, average results. |
| Naeem et al. [59] | 2020 | CNN | Leopard Mobile, MalImg | Using image visualization and DL models for malware detection in industrial IoT. | No data balancing, more classification time. |
| He et al. [60] | 2019 | CNN with SPP layers, ResNet | Data from Andro-Dumpsys study | Assess efficacy of CNN-based model in combating superfluous API injections in malware detection domain. | SPP led to memory limitations, dataset constraints, and models not optimized. |
| Su et al. [61] | 2018 | 2-layered CNN | Data from Ubuntu System files and IoTPOT dataset. | Using CNN-based approach to mitigate risks of DDoS attacks in IoT environment. | Susceptible to obfuscation, time-consuming data pre-processing. |
| Asam et al. [62] | 2022 | CNN, AlexNet, VGG16, ResNet50, Xception, GoogleNet | IoT Malware Dataset | Develop a CNN-based model to detect malware in IoT. | Complex CNN design, time-consuming process. |
| Makandar, A., and Patrot, A. [63] | 2017 | SVM, KNN | Malheur, MalImg | Malware classification by using an efficient texture feature vector. | No data balancing, complex feature vector construction. |
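
Many of the studies summarized above start by turning an executable's raw bytes into a grayscale image. The following is a minimal sketch of that general idea, not the procedure of any listed paper: each byte value (0 to 255) becomes one pixel intensity, and the byte stream is wrapped into rows of a fixed width. The function name `bytes_to_grayscale_rows`, the default width of 256, and the zero padding of the last row are all assumptions made for illustration.

```python
def bytes_to_grayscale_rows(data: bytes, width: int = 256) -> list[list[int]]:
    """Wrap a byte stream into rows of `width` pixels (one byte = one pixel).

    Hypothetical helper: the fixed width and the zero padding are
    illustrative choices, not taken from the surveyed studies.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Pad with zero bytes so the final row is complete.
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    # Slice the padded stream into consecutive rows of pixel intensities.
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Synthetic stand-in for a binary file: 770 bytes in total.
sample = bytes(range(256)) * 3 + b"\x4d\x5a"
image = bytes_to_grayscale_rows(sample, width=256)
print(len(image), len(image[0]))
```

The resulting rows can be handed to an image library or converted to an array before being fed to a classifier. A real pipeline would also need to decide how to resize images of different heights, which several entries in the table list as a limitation (fixed or uniform image size).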