2023 Mar 20;23(6):3253. doi: 10.3390/s23063253

Table 1.

Literature Review Summary.

Author	Publication Year	Technique	Dataset Used	Objectives	Limitations
Marastoni et al. [34]	2021	CNN, LSTM	OBF dataset (Custom dataset), MalImg, MsM2015	Effect of bicubic interpolation and custom-trained TL on malware detection.	TL average, Fixed size, obfuscation techniques, data balancing.
Casolare et al. [35]	2022	J48, LMT, RF, RT, REP Tree	Custom dataset on Android APK from Android malware repo.	Malware detection in Android applications.	No benchmark data, data balancing, does not detect malware family, time difference leads to decline.
Kim et al. [36]	2017	CNN, MLP	Microsoft Malware	Identify groups in which malware resides and detect emerging malware.	Unstable MLP, model enhancement, and data balancing.
Khan et al. [25]	2018	GoogleNet, ResNet18, 34, 50, 101, 152	Microsoft Malware, Benign software opcodes converted to images.	Use .EXE files as images for malware identification.	Heavy-weight, extensive knowledge, more execution time, large validation loss, data balancing.
Dai et al. [27]	2018	MLP, KNN, RF	VirusTotal	Malware detection using storage dump file’s content as images.	Unable to find malware in full system, hardware feature overlooked, shallow ML models, data balancing, below par accuracy.
Singh et al. [37]	2019	CNN, ResNet-50	Custom dataset (Malshare, VirusShare, VirusTotal), MalImg	Eliminate difficulties during static and dynamic analysis by using image representation of executables.	Low for packed or unseen, obfuscation evades visualization, heavy-weight, data balancing, and undetected evasive malware.
Venkatraman et al. [38]	2019	CNN, CNN BiLSTM, CNN BiGRU	BIG 2015, MalImg	Identify malware using image-based methods and hybrid models.	Complex processing, extensive knowledge of kernels, and fine-tuning.
Vasan et al. [39]	2020	Ensemble of ResNet-50 and VGG16, SVM	MalImg, Packed malware data from VirusShare	Identification of packed and unpacked malware.	Complex ensemble, requires extensive NN knowledge, heavy-weight, and data balancing.
Sharma et al. [40]	2020	CNN, CNN-SVM	MalImg	Maximize potential of CNN and other ML models for malware classification.	Architectural improvement, multiple SVMs for multiclass, increased model size, and data balancing.
Naeem et al. [41]	2020	Fine-tuned CNN trained on ImageNet, VGG-16, ResNet-50, Inception	MalImg, IoT-Android Mobile dataset	Develop a novel CNN-based classifier for multiclass malware classification.	Requires expert domain knowledge for fine-tuning through backpropagation.
Bakour, K., and Unver, H. M. [42]	2021	RF, KNN, DT, Bagging, AdaBoost, Gradient Boost, Ensemble Voting Classifier, ResNet, Inception-V3	Manifest.xml, DEX, Manifest-ARSC-DEX, Manifest-Resources.arsc, Manifest-ARSC-DEX-Native_jar-based image dataset.	A generic image-based classification for any file type that uses grayscale images from Android malware samples.	Static analysis, impacted by tampering and code obfuscation, injection attacks, hybrid classifier higher time.
Kumar, S. [43]	2021	TL fine-tuned CNN trained on ImageNet, ResNet-50	MalImg, BIG 2015	Refined CNN to identify unidentified malware without extensive processing and evading strategies.	Requires expert knowledge, data balancing, Uniform image size, and Common CNN-based models.
Anandhi et al. [44]	2021	DenseNet201, VGG3	MalImg, BIG 2015, Some benign samples.	To preserve the semantic information by converting malware into Markov images using Gabor filter.	Uniform image size, heavy-weight, data balancing.
Pant et al. [45]	2021	Custom CNN, VGG16, ResNet-18, Inception-V3	MalImg	To detect malware in grayscale image form.	Insufficient data, non-uniform data, pre-trained model inferior.
Kumar et al. [46]	2022	CNN trained on ImageNet, VGG16, VGG19, ResNet50, Inception V3	MalImg, Microsoft BIG	Detect malware from files obtained by converting Windows PE files into grayscale images.	Heavy-weight, data balancing, difficult fine-tuning, CNN trained on general data.
Kalash et al. [47]	2018	GIST-SVM, CNN	MalImg, Microsoft Malware	Develop a deep CNN model for malware identification using a self-learning approach.	Approach not comparable to existing solutions based on the file types, GIST-SVM not effective and needs improvement.
Unver, H. M., and Bakour, K. [48]	2020	Random Forest, KNN, DT, Bagging, AdaBoost, Gradient Boost	Manifest file-based image dataset, DEX code-based image dataset, Manifest-DEX-ARSC image dataset and Android Malware Dataset.	A generic method for any type of app when converted into images for malware detection.	Static analysis methods, impacted by tampering and code obfuscation, does not detect injection attacks, no data balancing.
Jin et al. [49]	2020	CNN based Autoencoders	Dataset obtained from Andro-Dumpsys study conducted by Korea University.	Detect malware using CNN-based autoencoders using uniform image size.	Uses small dataset, identifies uncollected malware as benign, separate encoder for malware, high complexity and redundancy, more resources and time.
Bakour, K., and Unver, H. M. [50]	2021	DeepVisDroid (1D CNN) trained on local and global features, 2D CNN, CNN inspired by VGG16, ResNet and Inception-V3.	Manifest.xml file-based dataset, DEX code files-based dataset, Manifest and Resources.arsc files-based dataset, and Manifest, Resources.arsc and Dex files-based image dataset.	To detect malware by fusing deep learning techniques with image-based attributes.	High computation time, fails to acknowledge the obfuscation and camouflage used in code and commonly used pre-trained models.
Lo et al. [51]	2019	Xception model pre-trained on ImageNet, Ensemble of Xception.	MalImg, Microsoft Malware	Convert ‘bytes’ and ‘asm’ into images for malware detection using DL models.	Heavy-weight models, requires extensive knowledge, no data balancing.
Parihar et al. [52]	2022	S-DCNN (Comprising ResNet50, Xception, EfficientNet-B4) and MLP	MalImg, VirusShare	Tackle the malware detection problem using ensemble and transfer learning.	Heavy-weight ensemble model, no data augmentation.
Darem et al. [54]	2021	Ensemble of CNN and XGBoost	Small custom dataset with 9 malware types.	Detect malware using grayscale images of obfuscated opcodes present as ASM files.	No benchmark dataset, requires extensive knowledge, time-consuming.
Roseline et al. [55]	2020	Ensemble Deep Forest	MalImg, BIG2015, Malevis, Malicia (for validation)	Use ensemble deep forest algorithm along with a vision-based approach for high dimensional malware data.	No data augmentation.
Ding et al. [56]	2020	CNN	Dataset provided by DRE-BIN project.	Use bytecode images of malware APKs for Android malware detection.	No benchmark dataset, small data, no data augmentation, and average results.
Ngo et al. [57]	2020	Grayscale image: CNN; other features: KNN, DT, SVM, RF, etc.	IoT malware dataset by IoTPOT, IoT SOHO and VirusShare	Experimentation on existing methods of static analysis for IoT malware detection.	Average results with obfuscation and encryption, no benchmark data.
Huang et al. [58]	2021	VGG16	Malware + benign samples from ‘virussign.com’	To detect malware present in Windows OS using hybrid visualization technique.	No benchmark data, unable to identify unknown samples, average results.
Naeem et al. [59]	2020	CNN	Leopard Mobile, MalImg	Using image visualization and DL models for malware detection in industrial IoT.	No data balancing, more classification time.
He et al. [60]	2019	CNN with SPP layers, ResNet	Data from Andro-Dumpsys study	Assess efficacy of CNN-based model in combating superfluous API injections in malware detection domain.	SPP led to memory limitations, dataset constraints, and models not optimized.
Su et al. [61]	2018	2-layered CNN	Data from Ubuntu System files and IoTPOT dataset.	Using CNN-based approach to mitigate risks of DDoS attacks in IoT environment.	Susceptible to obfuscation, time-consuming data pre-processing.
Asam et al. [62]	2022	CNN, AlexNet, VGG16, ResNet50, Xception, GoogleNet	IoT Malware Dataset	Develop a CNN-based model to detect malware in IoT.	Complex CNN design, time-consuming process.
Makandar, A., and Patrot, A. [63]	2017	SVM, KNN	Malheur, MalImg	Malware classification by using an efficient texture feature vector.	No data balancing, complex feature vector construction.
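
Many of the studies in Table 1 begin by converting a binary file into a grayscale image and resizing it to a uniform size. The following is a minimal sketch of that general idea, not the procedure of any specific cited work. The function names `bytes_to_grayscale` and `resize_nearest`, the default row width of 256, and the use of nearest-neighbour sampling are all assumptions chosen for illustration; the reviewed papers use their own widths and interpolation methods (bicubic interpolation in [34], for example).

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Map each byte to one pixel (0-255) and lay the pixels out in rows.

    `width` is a hypothetical default; the last row is zero-padded.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not data:
        raise ValueError("data must not be empty")
    height = -(-len(data) // width)  # ceiling division
    padded = data + bytes(height * width - len(data))  # pad with zero bytes
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def resize_nearest(img: List[List[int]], size: int) -> List[List[int]]:
    """Resize to size x size by nearest-neighbour sampling (illustrative only)."""
    if size <= 0:
        raise ValueError("size must be positive")
    h, w = len(img), len(img[0])
    return [
        [img[r * h // size][c * w // size] for c in range(size)]
        for r in range(size)
    ]


if __name__ == "__main__":
    sample = bytes(range(10))          # stand-in for the bytes of an executable
    image = bytes_to_grayscale(sample, width=4)
    print(image)                       # 3 rows of 4 pixels, last row padded
    print(resize_nearest(image, 2))    # uniform 2 x 2 image
```

Resizing to a fixed size is what several rows list as a limitation ("Fixed size", "Uniform image size"): rows and columns are dropped or duplicated, so byte-level detail can be lost.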