Abstract
Mobile devices are vulnerable to malicious apps that jeopardize user privacy and device integrity. This includes single-app malware that operates independently and colluding Android apps that collaborate with each other to carry out a malicious attack. Existing detection methods primarily focus on single-app malware and hence will misclassify colluding Android apps. This paper introduces SigColDroid, a novel approach for detecting colluding Android apps and single-app malware by leveraging dangerous permissions. The research begins by extracting and identifying key features, such as permissions, smali file size, and permission rates, for model training. To facilitate comprehensive evaluation, a balanced dataset of 1455 apps is created, consisting of 485 benign apps, 485 randomly sampled single-app malware from the AndroZoo repository, and 485 colluding applications. Extensive experimentation is conducted using five ensemble classifiers: Random Forest, Extra Trees, AdaBoost, XGBoost, and LightGBM, alongside our proposed custom Artificial Neural Network (ANN) and Deep Neural Network (DNN) architectures. The classifiers are evaluated based on five metrics: precision, recall, F1-score, accuracy, and the area under the receiver operating characteristic curve (ROC_AUC). The experimental findings highlight the following key insights: (i) Identification of the five most significant permission features for detecting colluding applications; (ii) Positive impact of smali file size and permission rates on classification performance; (iii) Superior performance of Random Forest with a ROC_AUC of 99.48% and LightGBM with 96.91% accuracy, 96.96% precision, 96.90% recall and 96.90% F1-score compared to other classifiers; (iv) Comparative analysis with previous research demonstrates that SigColDroid, despite utilizing fewer features, outperforms state-of-the-art approaches.
The proposed approach presents an effective solution for detecting colluding Android apps using permissions and contributes to the advancement of improved detection and prevention mechanisms in mobile security.
Subject terms: Mathematics and computing, Computer science
Introduction
The majority of smartphones in the market run on Android, which has a 70.8% share according to Statista1. The official app store offers about 2.56 million applications that can be installed on Android devices, and there are also other sources for downloading apps2. Android dominates the smartphone application market with a growing number of apps every day. The Android platform appeals to a wide range of users by providing applications with rich features. This increasing popularity of mobile devices has led to a significant rise in malware targeting these platforms, particularly Android-based devices, which account for 97% of mobile malware. In the second quarter of 2021, about 1.45 million new malicious apps for Android were found3, which means that new malware is created every few seconds. This malware can harm devices in various ways, taking the form of worms, exploits, Trojans, viruses, and more. Some of these apps are deliberately released in many versions to affect more users and avoid detection4.
In the Android ecosystem, applications consist of various components that interact with each other using Inter-Component Communication (ICC). This communication model enables modular design and promotes functionality reuse across different apps and app components. ICC is accomplished through a message-passing system in which messages are encapsulated within Intent objects. By employing Intents, an app or app component can access and utilize the functionality exposed by other apps or app components. For instance, an app can use an Intent to request the web browser to load a specific webpage or to request a messaging app to send a text message. This efficient communication model has enabled developers to craft intricate application scenarios by leveraging pre-existing functionalities.
There are various types of Android malware, including single-app and colluding malware, which differ in their operational modes. Single-app malware operates independently, without relying on other apps or components. It can perform various malicious activities, such as stealing personal information, controlling the device remotely, or launching attacks on other devices or networks. Colluding Android apps, on the other hand, operate in coordination, leveraging their combined privileges and interactions to achieve malicious objectives. The concerning aspect of malware collusion is that each colluding app only needs to request a minimal set of permissions, which can make it appear harmless when analyzed individually5–8. To illustrate, consider two utility apps: a cab booking app and a browser app. The cab booking app requires access to the user’s location, while the browser app needs internet connectivity. Assume both apps are developed by the same adversary, who intentionally establishes a communication channel between them. Whenever a user interacts with the cab booking app, it serves its intended purpose but also sends the user’s geographic location to the browser app. Since the browser app has internet access, it can easily transmit the user’s location to a command and control (C&C) server. Malware authors have strong motivations to create colluding malware.
Despite the growing and distinct threats and challenges posed by these two categories of malware to mobile security, there is a lack of research on the specific differences between them. Most existing studies focus on detecting and analyzing single-app malware (i.e., classifying apps as benign or malicious)9–11 or on detecting colluding Android apps (classifying them as benign or colluding)12 separately, without considering the possibility of mixed scenarios or the distinctive features that can be used to discriminate between benign, colluding, and single-app malware. This implies that existing approaches will misclassify colluding applications as benign, or misclassify single-app malware as benign. This gap indicates a need for a novel approach that can effectively classify these categories of Android applications.
In this paper, we intend to address the above-mentioned challenges. To analyze potentially dangerous apps (i.e., colluding and single-app malware), we propose SigColDroid, an innovative approach for detecting colluding Android applications within a dataset that also includes benign and single-app malware instances. Our approach focuses on the identification and extraction of dangerous permissions from a carefully collected dataset comprising 1455 applications. Various machine learning models, namely artificial neural network (ANN)13, deep neural network (DNN)14, Random Forest (RF)15, Extremely Randomized Trees (Extra Trees)16, Adaptive Boosting (AdaBoost)17, Extreme Gradient Boosting (XGBoost)18, and Light Gradient Boosting (LightGBM)19, are employed to categorize the apps into their respective classes (benign, single-app malware, or colluding). Extensive experimentation is conducted to identify the top five most significant features contributing to the accurate prediction of malicious app categories. The evaluation of the Android malware classification method relies on several metrics, including accuracy (ACC), precision, recall, F1-score, and the area under the ROC curve (ROC_AUC).
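To make the classification stage concrete, the following is a minimal sketch in Python: a Random Forest (one of the ensemble classifiers listed above) is trained on binary permission-feature vectors and scored with the same metrics used in our evaluation. The data here is synthetic and purely illustrative; in SigColDroid the feature vectors are extracted from real APKs.

```python
# Minimal sketch of the classification stage on synthetic permission vectors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# 300 apps x 5 binary permission features; 3 classes:
# 0 = benign, 1 = single-app malware, 2 = colluding
X = rng.integers(0, 2, size=(300, 5))
y = (X[:, 0] + X[:, 1] * 2) % 3  # toy labels correlated with the features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

# Weighted averaging handles the three-class setting.
acc = accuracy_score(y_test, pred)
prec = precision_score(y_test, pred, average="weighted", zero_division=0)
rec = recall_score(y_test, pred, average="weighted", zero_division=0)
f1 = f1_score(y_test, pred, average="weighted", zero_division=0)
print(f"acc={acc:.3f} prec={prec:.3f} rec={rec:.3f} f1={f1:.3f}")
```

The weighted averages are used because the task is multi-class; with a balanced dataset such as ours, macro and weighted averages coincide.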
The following is the structure of the paper: Section 2 provides a review of related work, examining prior research that uses static analysis, dynamic analysis, and machine learning for app-collusion detection, and discusses the limitations of existing methods that motivate our proposed approach. Section 3 covers relevant background knowledge, explaining key concepts such as Android application structure, application collusion, and ensemble classifiers. Section 4 details the proposed methodology in three main steps: constructing the parameter feature set, filtering and extracting the core features, and classifying collusion. Section 5 describes the experimental design, dataset, and evaluation metrics used to assess model performance. The results are presented in Section 6. Finally, Section 7 concludes the paper and discusses potential directions for future work, including extending the approach to address current limitations and exploring additional optimization techniques.
Motivation and contribution
In the evolving landscape of mobile applications, the dual threats of malware and app collusion present significant challenges to both users and developers. Traditional security measures often fall short, focusing on singular threats without addressing the multifaceted nature of modern app vulnerabilities. While numerous systems have been developed to identify either malicious or colluding apps, there remains a gap in comprehensive solutions that can simultaneously classify applications into multiple categories: malicious, colluding, and benign.
To bridge this gap, we propose a novel machine learning-based system capable of categorizing apps into these three distinct groups. Our approach not only enhances the accuracy of threat detection but also provides a more holistic view of app behavior, thus improving overall security measures. Furthermore, we have constructed a unique dataset encompassing these three types of apps, which serves as a valuable resource for future research in this domain. The key contributions of this research can be summarized as follows:
An approach is proposed that effectively differentiates between single-app malware and colluding Android apps through permissions.
A comprehensive dataset of 1455 applications, encompassing benign apps, single-app malware, and colluding Android apps, is curated and labeled.
The performance of five ensemble classifiers and two neural network architectures for mobile malware classification is comparatively evaluated.
Related work
In order to emphasize the originality of our research, we investigate topics related to malware detection and collusion detection techniques. Specifically, in this section, we review related work in two areas: machine learning-based Android malware detection and machine learning-based Android app-collusion detection.
Machine learning-based Android malware detection
Many studies have explored the use of machine learning for detecting Android malware. Typically, these detection methods involve extracting features from the Android application package20. These features are obtained through static analysis, dynamic analysis, or a combination of both, referred to as hybrid analysis.
In static analysis, features are extracted from application components without executing the application21. This process typically involves decompressing the APK file to access various objects for analysis. Key objects include the AndroidManifest.xml file, which provides information on app permissions, API calls, package names, referenced libraries, and application components such as intents, activities, services, and broadcast receivers. Another crucial object is the classes.dex file, containing all the compiled Android classes20,22. Among the most commonly used features for detecting Android malware are app permissions. In 2014, Tchakounte et al.23 introduced a method for characterizing and detecting Android malware based on permissions. Nida et al.24 utilized machine learning algorithms to classify Android applications as either malware or benign based on permissions and API-based features. Chrysikos et al.25 developed a machine learning framework to analyze and classify malicious applications into families based on their permissions.
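As an illustration of permission-based static feature extraction, the sketch below parses a decoded AndroidManifest.xml with Python’s standard library and collects the requested permissions. It assumes the binary manifest has already been decoded (e.g., by apktool or androguard), a step omitted here.

```python
# Hedged sketch: pulling requested permissions out of a *decoded*
# AndroidManifest.xml. Binary manifests inside an APK must first be
# decoded; that step is not shown.
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

manifest_xml = """<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
    <uses-permission android:name="android.permission.READ_CONTACTS"/>
    <uses-permission android:name="android.permission.INTERNET"/>
</manifest>"""

def extract_permissions(xml_text: str) -> list[str]:
    """Return the android:name of every <uses-permission> element."""
    root = ET.fromstring(xml_text)
    return [elem.get(ANDROID_NS + "name")
            for elem in root.iter("uses-permission")]

perms = extract_permissions(manifest_xml)
print(perms)  # ['android.permission.READ_CONTACTS', 'android.permission.INTERNET']
```

Each extracted permission name can then be mapped to one position in a binary feature vector over the permission vocabulary.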
Dynamic analysis involves executing an application in either a real or virtual environment to gather behavioral features21. These features can include network traffic, battery consumption, CPU usage, IP addresses, and opcodes, among others. For instance, DATDroid26 utilized an emulator to collect runtime data such as system calls, CPU and memory usage, and network packets from Android applications1. This data was then analyzed using a Random Forest algorithm to differentiate between malicious and benign apps.
Deep learning models excel in adapting to the evolving landscape of cyber threats through feature representation learning. Millar et al.27 extracted input features from raw data, including low-level opcodes, app permissions, and proprietary Android API package usage. They employed deep learning models to select, rank, and refine these input features. The three sets of derived features were then combined and fed into a multilayer perceptron (MLP) to predict malicious software. Arvind et al.9 developed a technique for selecting features that are then used for detecting Android malware. Sana et al.10 present a fast, scalable, and accurate mechanism for obfuscated Android malware detection based on the Deep learning algorithm using real and emulator-based platforms.
Recent research has explored using Convolutional Neural Networks (CNNs) for Android malware detection by converting app binaries into images. In 2023, Tchakounte et al.22 extracted opcode sequences from DEX files, split them using n-grams, and encoded them into m-bit vectors with SimHash. These vectors were converted into gray-scale images and analyzed using Singular Value Decomposition (SVD) to create feature vectors for malware detection with CNNs. In 2024, Benedict et al.28 used Hilbert space-filling curves to transform Bytecode extracted from Dalvik Executable (DEX) into grayscale images, achieving high accuracy. However, using entire DEX files for image generation can introduce significant noise.
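The byte-to-image idea can be illustrated with a short sketch: raw bytes (stand-ins for DEX content) are zero-padded and reshaped into a square grayscale matrix that a CNN could consume. This simplified row-major layout is an assumption for illustration; the cited works use SimHash encoding or Hilbert space-filling curves instead.

```python
# Illustrative sketch of turning raw bytes into a square grayscale image.
import math
import numpy as np

def bytes_to_grayscale(data: bytes) -> np.ndarray:
    """Zero-pad a byte string and reshape it into a square uint8 matrix."""
    side = math.isqrt(len(data))
    if side * side < len(data):
        side += 1
    buf = np.zeros(side * side, dtype=np.uint8)
    buf[:len(data)] = np.frombuffer(data, dtype=np.uint8)
    return buf.reshape(side, side)

img = bytes_to_grayscale(b"\x00\x01" * 50)  # 100 bytes -> 10x10 image
print(img.shape)  # (10, 10)
```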
While machine learning approaches for malware detection have proven accurate and efficient in detecting Android malware, they do not take into account the existence of colluding Android applications and will hence classify them as benign applications.
Static analysis Android app-collusion detection techniques
Many approaches in the literature use static analysis for Android app-collusion detection. Static analysis involves the examination of code without its execution29. In the context of Android apps, this analysis is conducted by inspecting the source code without running the application21. Using static analysis techniques, it becomes possible to perform behavioral analysis on an application, thereby detecting whether it is benign or malicious.
Bugiel et al.30 proposed XManDroid, which is the first approach developed for detecting collusion attacks in Android platforms. XMandroid specifically focuses on detecting privilege escalation in scenarios involving pending intents and transmission channels between dynamically constructed components, such as broadcast receivers. FUSE31 is a tool that addresses the limitations of single-app static analysis over multi-app analysis. The approach described by the authors begins with the analysis of individual apps and the storage of relevant information. Subsequently, this information is combined to detect collusion based on a restricted policy engine. In their work6, Liu et al. introduce MR-Droid, a framework designed to identify inter-app communication threats, including intent hijacking, intent spoofing, and collusion. MR-Droid proposes a scalable approach based on the MapReduce paradigm to facilitate compositional app analysis on a larger scale. IccTA32 is a static taint analyzer designed to identify privacy leaks that occur between components within Android apps. Bhandar et al.33 introduce an automaton framework for detecting intent-based collusion among apps. The framework includes a static inter-app analysis tool that can analyze multiple apps simultaneously and detect potentially colluding apps. Casolare et al.34,35, introduce a method that relies on model checking. This method involves representing an Android application as an automaton and leveraging a set of logical properties to minimize the need for comparisons. Additionally, they use another set of properties, which are automatically generated, to effectively identify colluding applications.
Static analysis is faster as it does not require code execution, and it allows inspecting all app code paths and components. However, static analysis struggles with code segments that are only executed under certain conditions or inputs, with dynamic code loading20,36, and with various obfuscation methods20,37. In addition, static analysis relies on predefined rules and patterns, making it challenging to detect unknown or novel patterns that were not anticipated during rule creation. Machine learning algorithms have the potential to discover and adapt to new patterns based on the data they are trained on.
Machine learning-based Android app-collusion detection
A range of studies have explored machine learning techniques for permission-based Android app-collusion detection.
Asavoae et al.38 mentioned that collusion can cause information theft, money theft or service misuse. They defined collusion between apps as some set of actions executed by the apps that can lead to a threat. They proposed two approaches to identify candidates for collusion. One is a statistical approach using machine learning, which estimates the likelihood of collusion within a set of apps and other is a rule-based approach developed in Prolog.
Kalutarage et al.8 first described an ML-based technique to identify Android app-collusion. The procedure has two components. The first uses a simple informative naïve Bayes classifier with a Beta prior to estimate the collusion threat. The second ascertains whether two or more apps are communicating with one another.
A technique for identifying app collusion using audio signals is presented by Casolare et al.12. Using audio signal processing techniques, Casolare’s method entails turning an executable application into an audio file and extracting a set of numerical attributes from each sample. Using this data, they develop various machine learning models and assess how well they detect app collusion.
Some studies have explored a two-stage classifier for detecting app-collusion in Android smartphones. Faiz et al.39 introduced a detection approach for colluding app-pairs by combining the naïve Bayes algorithm with the likelihood method. Additionally, the authors proposed an alternative method that consists of a perceptron and logistic regression to detect Android app-collusion40. In40, Faiz et al. used 13 critical permissions frequently requested by both Android malware and colluding app-pairs. In the first stage, they used a dataset that consisted of 5000 benign and 3000 malicious applications to train the model and a test set of 2000 benign and 207 malicious applications. The obtained model was further tested on three sets of malicious applications of sizes 1260, 247, and 154 obtained from41. In the second stage, for testing of app-pairs, they used two sets of 120 colluding app-pairs obtained from42. They then detect application collusion using the parameter vector and a basic judgment algorithm.
Faiz et al.43 proposed a system for detecting Android malware using a hybrid classification approach involving K-means clustering and Support Vector Machine (SVM). Two datasets42,44 were used in the first stage, resulting in Data1 (13,176 training apps and 1860 test apps) and Data2 (12,028 training apps and 3008 test apps). In the second stage, a dataset of 120 colluding app-pairs was considered, as the researchers believed these pairs could pose similar risks as Android malware. Application collusion was identified using a parameter vector and a basic judgment algorithm.
Other approaches involve monitoring system parameters such as memory consumption and CPU clock speed to detect anomalies indicative of collusion attacks. Khokhlov et al.46 explored various machine learning techniques for this purpose, including feed-forward neural networks and long short-term memory models.
Table 1 serves as a concise summary of existing approaches that use machine learning for Android app-collusion detection. In previous research, many techniques relied on permission-based feature sets for detecting collusion. Permission-based methods are commonly employed due to their efficiency and high detection accuracy, and analyzing permissions before app installation can prevent harm to the device. Permissions play a crucial role in the swift identification of colluding applications. However, there is a need to focus on utilizing only the essential permissions to enhance detection accuracy. Additionally, reducing the inclusion of ineffective permission features can decrease computational complexity. Also, existing techniques do not take into account the existence of generic malware applications (single-app malware) and will hence classify them as benign applications.
Table 1.
Summary of existing machine learning-based Android App-Collusion detection approaches.
| Paper | Features | No. of features | No. of samples | Classifier/algorithm | Performance metric |
|---|---|---|---|---|---|
| Asavoae et al.38 | Permissions | – | 18,480 (9000 malware, 9000 benign software, 480 colluding apps) | Bayesian method | Precision=94%, f1-score=95% |
| Kalutarage et al.8 | Permissions | – | 29k+ size app set which includes “malicious”, “potentially malicious” and “clean” apps and 240 colluding app pairs | log-likelihood method and Bayesian method | Precision=94%, Recall=95%, specificity=94%, f1-score=95% using Bayesian method |
| Faiz et al.45 | Permissions | 10 | 676 app-pairs | logistic regression classifier | Accuracy=96.03%, Precision=99%, Recall=96.1%, f1-score=97.5% |
| Faiz et al.39 | Permissions | 26 | first stage = 10207(7000 benign, 3207 malicious), second stage = two sets of 120 colluding app-pairs | Naive Bayes classifier and a likelihood method | Recall of 90% and 87.5% on two sets of colluding app-pairs |
| Faiz et al.40 | Permissions | 13 | first stage = 10207(7000 benign, 3207 malicious), second stage = one set of 120 colluding app-pairs | logistic regression classifier and a perceptron learning | Recall = 52% |
| Khokhlov et al.46 | RAM consumption and CPU frequency | 13 | 402797 | a simple RNN, a Long short-term memory (LSTM) RNN, and a Gated Recurrent Unit (GRU) RNN. | Accuracy=95% in GRU |
| Faiz et al.43 | Permissions | – | First stage: Data1 (13,176 training apps, 1,860 test apps) and Data2 (12,028 training apps and 3008 test apps). Second stage: 120 colluding app-pairs | K-means clustering and Support Vector Machine (SVM) | |
| Casolare et al.12 | Audio signals | 25 | 359(199 Trusted applications, 160 colluding applications) | J48, BayesNet, RandomForest, LMT, JRip, DecisionTable | Accuracy=99.9%, Precision=97.5%, Recall=97.5%, AUC=94%, f1-score=95.5% using BayesNet |
| Our Study | Permissions | 5 | 1455 (485 benign applications, 485 single-app malware, 485 colluding applications) | ANN, DNN, Random Forest, Extra Trees, AdaBoost, XGBoost, LightGBM | Accuracy=96.91%, Precision=96.96%, Recall=96.90%, f1-score=96.90% using LightGBM; AUC=99.48% using Random Forest |
Among these studies, our research is distinctive as it employs a comprehensive set of classifiers, including ANN, DNN, and ensemble classifiers, and uniquely attempts to distinguish between generic malware (single-app malware) and colluding Android applications. Notably, it stands out by achieving high performance metrics, such as an accuracy of 96.91%, all while using fewer permission-based features. Furthermore, our study underscores the effectiveness of a multi-classifier approach, validated through rigorous performance metrics and comparison with other studies.
Summary of literature review
The literature review section encompasses three main areas: Machine Learning-based Android Malware Detection, Static Analysis Android App-collusion Detection Techniques, and Machine Learning-based Android App-collusion Detection.
Firstly, in the domain of machine learning-based Android malware detection, various methodologies have been explored, highlighting the advantages of machine learning in accurately identifying malware. However, existing techniques do not take into account the existence of colluding Android applications and hence will classify them as benign applications.
Secondly, the review of Static Analysis Android App-collusion Detection Techniques reveals that while static analysis provides a computationally efficient way to detect app collusion, it often fails against sophisticated obfuscation techniques and cannot dynamically analyze app behavior.
Lastly, Machine Learning-based Android App-collusion Detection shows promise with advanced algorithms, but it should focus on essential permissions to improve accuracy and reduce computational complexity. Also, Current techniques often overlook generic malware (single-app malware), and will hence classify them as benign applications.
To address these issues, we propose a scheme that leverages the power of permissions to enable a more computationally efficient and rapid detection approach.
Background
Android overview
The Android platform, together with its built-in mechanisms and protection measures, is summarized in this section.
Android platform The Android platform comprises a comprehensive software stack. At its foundation lies the Linux kernel, responsible for managing low-level tasks such as hardware interaction, driver handling, and power management. On top of this kernel sit essential C/C++ libraries such as libc and SQLite. Additionally, the Android runtime (ART) and the core Android libraries contribute to the overall functionality of the platform. Above these is the application framework, which contains classes and services used by applications. Finally, the application layer contains pre-installed programs and programs installed by the user. Most Android applications (apps) are developed in the Java programming language and leverage a comprehensive set of APIs from the Android Software Development Kit (SDK). When an app is compiled, its code, data, and resources are bundled into an archive file known as an Android application package (APK). Once installed on an Android device, the app executes within the Android runtime (ART) environment. Figure 1 shows the major components of the Android platform, including the Linux kernel, the Android runtime, the application framework, and the application layer.
Fig. 1.
The android software stack.
Application components The fundamental logical building blocks of Android apps are called components. Each component can operate independently, started either by its own application or by the system on behalf of other apps that have been granted permission. Android applications comprise four distinct types of components: (1) The Android user interface is built on top of Activity components; every program may have several Activities that show the user different screens of the application. (2) Service components have no user interface but can process data in the background. While a user is interacting with another program, a Service component can carry out certain operations in the background, for example downloading files or playing music. (3) Broadcast Receiver components react asynchronously to system-wide message broadcasts. These components serve as gateways, forwarding messages to other elements such as Activities or Services for further handling. (4) Content Provider components offer database functionality to other parts of the application. These databases can be used for both intra-app data storage and data sharing across different applications.
Application configuration Each Android application is accompanied by an XML configuration file called AndroidManifest.xml. This file describes, among other things, the principal components that make up an application, including their type, their intent filters, and their required and enforced permissions. Intent filters define the types of requests that a specific component can handle; they serve to specify the capabilities of components within an Android app. These filter declarations are set in the app’s manifest file at compilation time and remain fixed; they cannot be altered at runtime.
Android security mechanism Android’s security mechanism is based on sandbox, permissions, and application signing. Such a mechanism has improved app security effectively.
Application sandbox. When each Android app is installed, it is given a unique Linux user ID, and each app runs within its own instance of the Android runtime environment. As a result, each app is completely sandboxed: its files, processes, and other resources are inaccessible to other apps. This sandbox approach ensures that, by default, no app can access sensitive data or perform actions that could adversely affect other apps, the OS, or the user, such as accessing the internet, obtaining the user’s location, or reading or modifying the contacts database.
Application permissions. Android permissions play a crucial role in enforcing restrictions on an application’s actions related to sensitive resources, such as user data and sensor information (e.g., camera, GPS). For example, the CALL_PHONE permission is necessary for an app to initiate phone calls. Any Android application can declare additional permissions, but to obtain them, the app must explicitly request them in its manifest file. Permissions have associated protection levels: (1) Normal permissions allow an app to access data or resources outside its sandbox without posing significant risks to user privacy or other apps’ functionality; if an app requests a normal permission in its manifest, the system automatically grants it during installation. (2) Dangerous permissions are high-risk and may provide access to private data or potentially harmful functionalities; an app cannot receive a dangerous permission without explicit confirmation from the user. Additionally, apps can define custom permissions to restrict sensitive tasks performed by components within the application.
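A simple way to operationalize the dangerous-permission triage described above is a set intersection between an app’s requested permissions and the dangerous protection-level list. The set below is a small illustrative subset of Android’s dangerous permissions, not the full official list.

```python
# Sketch of dangerous-permission triage via set intersection.
# DANGEROUS is an illustrative subset, not the complete Android list.
DANGEROUS = {
    "android.permission.READ_CONTACTS",
    "android.permission.WRITE_CONTACTS",
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.CAMERA",
    "android.permission.RECORD_AUDIO",
    "android.permission.READ_SMS",
}

def dangerous_subset(requested: list[str]) -> set[str]:
    """Return the requested permissions that are dangerous-level."""
    return set(requested) & DANGEROUS

requested = ["android.permission.INTERNET",        # normal level
             "android.permission.READ_CONTACTS"]   # dangerous level
print(dangerous_subset(requested))  # {'android.permission.READ_CONTACTS'}
```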
Before Android 6.0, dangerous permissions were granted at installation time; since Android 6.0, they are requested at runtime.
Inter-process communication Inter-process Communication plays a crucial role in Android’s security architecture. To protect applications, Android isolates them from each other and restricts their access to system resources using a sandboxing mechanism. This isolation prevents malicious apps from directly accessing or modifying sensitive data belonging to other apps. To conduct IPC, intent messages are used. An Intent message represents an event that triggers a specific action along with the necessary data. These messages facilitate communication between different components within an app or across different apps. There are various types of component invocations, such as explicit or implicit, intra- or inter-app, etc. With Android’s IPC, late run-time binding between components in one or more applications is feasible. This is made possible by event messaging, a crucial aspect of event-driven systems, rather than explicit code calls.
Application collusion
App collusion refers to a scenario in which two or more apps communicate with each other and carry out a threat. The threat can be information theft, financial theft, service misuse, and elevation of privilege8. Information leakage is one of the most prominent threats. Information leakage can be carried out using the Internet feature or storage on the device.
Figure 2 illustrates a collusion attack. The scenario involves an Android device with two apps installed: Contacts Manager and News. The Contacts Manager app possesses the permissions READ CONTACTS and WRITE CONTACTS, while the News app has the permission INTERNET. It is presumed that both apps are created by the same malicious entity, which deliberately establishes a communication channel between them. Consequently, the Contacts Manager app sends an Intent containing contact information as a payload to the News app. Since the News app has internet access, it transmits the received data to external entities.
Fig. 2.
App-Collusion attack scenario.
App collusion refers to a scenario where two or more apps collaborate to achieve a malicious goal by sharing the tasks required to carry out a threat. Examined individually, each app appears harmless: each uses its own permissions to carry out only part of the attack. Taken as a whole, however, the colluding app set holds a risky combination of permissions that can be used to accomplish nefarious objectives.
We give a formal definition of app collusion as in7. Let A be the set of all Android apps and P the set of all dangerous permissions. Let a and b be two applications with permission sets P_a ⊆ P and P_b ⊆ P such that P_a ≠ ∅ and P_b ≠ ∅. Now, assume that a performs a sensitive operation to access a resource r by using a permission p_a ∈ P_a. If S is a set of sequences of sensitive operations on/using r, and b performs a sequence of operations s ∈ S using a permission p_b ∈ P_b, then we say that a and b are colluding apps.
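To make the roles of a (data access) and b (exfiltration) concrete, the condition can be sketched as a simple pairwise check. This is an illustrative heuristic of ours, not the paper's detector, and the permission lists below are assumptions:

```python
# Illustrative sketch (not SigColDroid itself): flag a candidate colluding
# pair when one app holds a permission that reads a sensitive resource (the
# role of app a) and the other holds one that can export data (app b).

SOURCE_PERMS = {"READ_CONTACTS", "READ_SMS", "READ_PHONE_STATE"}  # access resource r
SINK_PERMS = {"INTERNET", "SEND_SMS"}                             # exfiltration channel

def is_candidate_colluding_pair(perms_a, perms_b):
    """True when app a can read sensitive data and app b can send it out."""
    return bool(perms_a & SOURCE_PERMS) and bool(perms_b & SINK_PERMS)

# The Fig. 2 scenario: Contacts Manager (source) and News (sink).
contacts_manager = {"READ_CONTACTS", "WRITE_CONTACTS"}
news = {"INTERNET"}
```

Note that this check only covers the permission condition of the definition; establishing that an actual communication channel exists between the two apps requires analyzing their Intents.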
Artificial neural network (ANN)
An Artificial Neural Network is a mathematical model for learning inspired by biological neural networks13. Artificial neural networks model mathematical functions that map inputs to outputs based on the structure and parameters of the network. In artificial neural networks, the structure of the network is shaped through training on data. ANNs learn representations and patterns directly from input data through various layer types as shown in Fig. 3.
Fig. 3.
Structure of an artificial neural network.
Input layer: This layer receives the raw input data. For Android malware detection, this might include permissions, API calls, or other relevant features.
Hidden layers: In an Artificial Neural Network, there might be one or more hidden layers. Neural Networks with more than one hidden layer are known as Deep Neural Networks (DNN). Each layer comprises neurons that apply weights to the inputs and pass the results through activation functions.
Activation functions: Introduce non-linearities into the network. Common choices are the rectified linear unit (ReLU), hyperbolic tangent (tanh), and sigmoid functions.
Dropout layers: Used primarily in DNNs, these layers randomly set a fraction of input units to zero during training to prevent overfitting.
Output layer: This layer outputs the final classification. In multi-class problems, a softmax function is often used to convert the raw scores into probabilities.
Both ANNs and DNNs benefit from techniques such as batch normalization and early stopping to enhance training efficiency and performance. The architecture and depth of the network depend on the complexity of the data and the specific application, allowing these networks to learn intricate and hierarchical representations.
By sequentially applying these components, ANNs and DNNs can effectively model complex patterns in data, providing robust solutions for tasks such as Android application classification. This automatic learning capability makes them versatile and powerful tools in various domains.
Ensemble classifiers
Ensemble classifiers represent a category of machine learning models that enhance overall performance by aggregating predictions from multiple base models47. Ensemble classifiers find utility across diverse applications, spanning image classification, speech recognition, and natural language processing.
Ensemble classifiers encompass various techniques, including bagging, boosting, and stacking. Bagging trains multiple base models independently on different subsets of the training data. The predictions from these models are then combined using techniques like voting or averaging. Boosting trains a sequence of models iteratively. Each model focuses on correcting the errors made by its predecessor. Stacking involves training multiple base models and using their predictions as input to a meta-model.
One advantage of ensemble classifiers is that they help reduce over-fitting, which occurs when a model is too complex and performs well on the training data but poorly on the test data. Furthermore, ensemble classifiers can improve the model's generalization, i.e., its ability to perform well on new, unseen data.
Figure 4 shows how ensemble classifiers work. Multiple base models are trained on different subsets of the training data. The predictions of these base models are then combined to make a final prediction. This approach can help to improve the overall performance of the model.
Fig. 4.
Ensemble classifiers.
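The combine-by-voting step described above can be sketched in a few lines (an illustrative example, not part of SigColDroid):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine base-model predictions (one label per model) by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical base models vote on one app:
label = majority_vote(["colluding", "colluding", "benign"])  # -> "colluding"
```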
Methodology
In this section, we present our methodology for detecting colluding applications. The proposed system, Significant Dangerous Permissions to Detect Android Colluding Malware (SigColDroid), extracts permission usage, smali size, and permission rate from application packages. However, instead of analyzing all extracted features, SigColDroid targets a selected subset of features that is effective at distinguishing colluding apps and improving detection and classification rates. For classification, the proposed scheme employs the Random Forest, Extra Trees, AdaBoost, XGBoost, and LightGBM ensemble classifiers, alongside custom ANN and DNN models. The permissions were selected based on their notable influence on malware detection potency. The proposed research consists of the following major components:
- Step 1: Data Pre-Processing
  - Step 1.1: Constructing/Identifying the Features Set
  - Step 1.2: Filtering the Core Features
- Step 2: Employing the Supervised ML Algorithms to Classify the Android Applications
The proposed strategy is shown in Fig. 5.
Fig. 5.
Methodology.
The key components of the strategy include:
Data pre-processing
Constructing the parameter feature set
In this step, we start by processing a dataset that consists of benign, malicious, and colluding Android application packages (APKs). Each APK undergoes disassembly using Apktool1 to extract important components such as the AndroidManifest.xml and smali files. Specifically, we focus on extracting the permissions utilized by each sample.
We represent each sample as:

x_i = [p_{i1}, p_{i2}, ..., p_{im}, s_i, r_i]    (1)

Here, p_{ij} ∈ {0, 1} represents the presence (1) or absence (0) of the j-th permission in sample x_i, s_i denotes the size of the smali files within the sample, and r_i is the permission rate, calculated by dividing the total number of permissions requested by the sample (n_i) by its smali file size, i.e., r_i = n_i / s_i.

The class vector is denoted as:

Y = [y_1, y_2, ..., y_n]^T    (2)

Each entry y_i in the class vector corresponds to the class label of the i-th app sample, which can be classified as benign, malicious, or colluding.

The general representation of our dataset and labels can be described as follows:

X = [x_1, x_2, ..., x_n]^T    (3)

Y = [y_1, y_2, ..., y_n]^T    (4)

In these equations, X represents the matrix of data samples, where each row x_i is the feature vector of the i-th app sample, and Y represents the vector of class labels, where each entry y_i corresponds to the class label of the i-th sample.
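Assembling a feature vector of the form x_i = [p_{i1}, ..., p_{im}, s_i, r_i] can be sketched as follows (an illustrative example; the helper name is ours):

```python
def build_feature_vector(app_perms, all_perms, smali_size):
    """x_i = [p_i1 ... p_im, s_i, r_i]: binary permission flags, smali file
    size, and permission rate r_i = n_i / s_i."""
    bits = [1 if p in app_perms else 0 for p in all_perms]  # p_ij in {0, 1}
    rate = sum(bits) / smali_size                           # n_i / s_i
    return bits + [smali_size, rate]

all_perms = ["READ_SMS", "RECEIVE_SMS", "READ_PHONE_STATE"]
x = build_feature_vector({"READ_SMS"}, all_perms, smali_size=2048)
```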
Filtering the core features
To ensure that we obtain the best results from the ML models, we preprocess the original dataset, which we denote as D. Let the matrix of n samples and m characteristics (variables) from the dataset D be represented as X ∈ R^{n×m}, where n is the number of samples in the dataset and m is the number of features.
To ascertain feature importance and subsequently perform feature selection, the Recursive Feature Elimination (RFE) method is used. RFE is a wrapper-type feature selection algorithm48: a separate machine learning algorithm sits at the core of the method, is wrapped by RFE, and assists in selecting the features. When using RFE, it is critical to use a model that can calculate feature importance; hence, tree-based models such as Decision Trees (DT), Random Forest (RF), and Boosting Trees are typically used, since they can compute feature importance. A model is first fitted on all dataset variables, assigning an importance score to each predictor. The least important predictors are then removed and the model is rebuilt, producing a new set of importance scores, until all features have been explored49.
For this work, we use a random forest model for evaluating feature importance. Feature importance is taken as the decrease in node impurity weighted by the probability of reaching that node in the tree; the higher this value, the more important the feature. To ascertain feature importance, first the Gini impurity is computed from the expression:

G = Σ_{c=1}^{C} f_c (1 - f_c)    (5)

where f_c is the frequency of label c at a node and C is the number of unique labels or classes. Then the node importance ni_j, which corresponds to the feature importance, is computed from the Gini importance. Assuming only two child nodes per tree node, the node importance is given by:

ni_j = w_j G_j - w_{left(j)} G_{left(j)} - w_{right(j)} G_{right(j)}    (6)

where ni_j is the importance of node j, w_j is the weighted number of samples reaching node j, G_j is the impurity value of node j, and left(j) and right(j) are the child nodes from the left and right split on node j, respectively.

After RFE, we obtain a new dataset X' ∈ R^{n×k}, where k is the new number of features.
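Equations (5) and (6) can be checked with a few lines of code (an illustrative sketch; the function names are ours):

```python
def gini(frequencies):
    """Eq. (5): G = sum_c f_c * (1 - f_c) over the class frequencies at a node."""
    return sum(f * (1 - f) for f in frequencies)

def node_importance(w_j, g_j, w_left, g_left, w_right, g_right):
    """Eq. (6): weighted impurity decrease achieved by splitting node j."""
    return w_j * g_j - w_left * g_left - w_right * g_right

# A pure node has zero impurity; a 50/50 two-class node has G = 0.5.
# A split that turns a 50/50 node into two pure children captures the
# full impurity decrease of 0.5:
ni = node_importance(1.0, gini([0.5, 0.5]), 0.5, 0.0, 0.5, 0.0)
```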
Malware classification by employing machine learning classification models
For the task of classifying apps into benign, single-app malware, and colluding applications, we employed seven different machine learning algorithms: two from the family of neural networks (Artificial Neural Network (ANN)13 and Deep Neural Network (DNN)14) and five ensemble learning algorithms (Random Forest15, Extra Trees16, AdaBoost17, XGBoost18, and LightGBM19). The input variables in our study were binary, which is advantageous for decision trees and ensemble models as they can handle binary data directly without additional pre-processing. These models are also well-suited to learning from unbalanced data, and ensemble models in particular are recognized for their robust predictive performance50–52. As a result, we selected ensemble tree models for our analysis. The following models were evaluated:
Artificial neural networks (ANN)13 are machine learning algorithms that emulate the structure of human neural networks. ANNs consist of an input layer, which receives multiple input data, an output layer that produces the output data, and one or more hidden layers in between. The construction of the model involves determining the number of nodes in the hidden layer. An activation function is used to optimize the weights and biases. While adding more hidden layers can improve prediction accuracy, it also significantly increases computational complexity. The drawbacks of ANNs include the difficulty in finding optimal parameters during the learning stage, a high risk of overfitting, and relatively long training times. Our proposed Artificial Neural Network (ANN) model classifies Android applications into three categories: benign, single-app malware, or colluding app. The model configuration can be summarized in Fig. 6. The ANN is simple yet effective, consisting of an input layer directly connected to an output layer. The input layer processes the standardized features of the Android applications, while the output layer has three neurons, each corresponding to one of the classes. The softmax activation function is applied to the output layer to generate a probability distribution over the classes, ensuring the sum of the outputs equals one. This configuration allows the model to provide a confident classification for each application. The model is trained using the sparse categorical cross-entropy loss function, optimized with the Adam optimizer set at a learning rate of 0.001.
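The softmax output layer described above can be illustrated with a minimal stand-alone implementation (the raw scores below are made up for illustration; this is not the trained model):

```python
import math

def softmax(logits):
    """Output-layer softmax: converts raw class scores into a probability
    distribution over (benign, single-app malware, colluding)."""
    m = max(logits)                            # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the three classes; probabilities sum to 1
# and the largest score yields the largest probability:
probs = softmax([2.0, 0.5, 0.1])
```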
Deep neural networks (DNN)14 enhance prediction accuracy by increasing the number of hidden layers. A DNN typically refers to a neural network with two or more hidden layers and is frequently used for iterative learning with large datasets. The error backpropagation technique, a common method, is employed for training. Prominent algorithms in this category include Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory networks (LSTM), and Gated Recurrent Units (GRU). Our proposed Deep Neural Network (DNN) model includes three dense hidden layers with 128, 64, and 32 neurons respectively, each utilizing the ReLU activation function (f(x) = max(0, x)). Dropout layers with a rate of 0.5 are applied after each dense layer to prevent overfitting. The model configuration can be summarized in Fig. 7. Input features are fed into the first dense layer, which extracts initial high-dimensional patterns from the data. The output of the first dense layer is then passed through an activation function like ReLU, introducing nonlinearity and enabling the network to learn more complex representations. The activated outputs from the first layer serve as input to the second dense layer, where the lower-level patterns are combined to detect more intricate features. This process continues through the third dense layer, further refining the detected patterns. Each layer combines the outputs of the previous layer to build a hierarchical representation of the input features. The final dense layer outputs class probabilities through the softmax activation function, enabling multi-class classification. An iterative process was undertaken to determine the optimal ANN, and DNN architecture. This involved experimenting with various layer configurations and hyperparameters while tracking performance. The final architecture selected provided the best results, balancing complexity and classification effectiveness. 
Ensemble algorithms aim to create a stronger model by combining multiple weak learners. A weak learner is an algorithm that performs slightly better than random guessing. By integrating these weak learners, the overall performance of the model is enhanced.
Random forest15 is an ensemble algorithm that employs multiple decision trees for learning. It randomly samples input data, assigns it to various decision trees for training, collects their decision results for a target app, and determines the most voted family. The approach prevents over-fitting at each node by only taking into account a portion of the features during the tree-growing phase. Generally, it is known for its simplicity, speed, and tendency to outperform a single classifier.
Extremely randomized trees (Extra Trees)16 differ from random forests in how the splitting thresholds for each feature are determined. While random forests search for the optimal threshold, Extra Trees randomly select thresholds to split the data into subsets. This randomization significantly reduces the learning time compared to random forests, since finding the optimal thresholds for all features at all nodes is time-consuming. Individual decision trees in Extra Trees still exhibit some bias error, but the ensemble approach effectively reduces both bias and variance errors.
Adaptive boosting (AdaBoost)17 is a boosting method that initially assigns every instance the same weight. After each classifier is trained, the instance weights for the subsequent classifier are adjusted based on the previous classifier's errors: misclassified instances are made heavier and correctly classified instances lighter. This causes the next classifier to concentrate on the instances misclassified in the previous step. AdaBoost repeats this re-weighting until the required number of classifiers has been added, then combines the predictions of all classifiers by weighted voting.
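The re-weighting step can be sketched as follows (a minimal illustration with toy numbers; the helper name and the example weights are ours, not the paper's configuration):

```python
import math

def adaboost_reweight(weights, correct, error):
    """One AdaBoost round: raise the weights of misclassified samples and
    lower those of correctly classified ones, then renormalize."""
    alpha = 0.5 * math.log((1 - error) / error)      # classifier weight
    new = [w * math.exp(-alpha if ok else alpha)
           for w, ok in zip(weights, correct)]
    z = sum(new)                                     # renormalize to a distribution
    return [w / z for w in new]

# Four equally weighted samples; the last one was misclassified (error = 0.25).
# Its weight grows to 0.5 while each correct sample drops to 1/6:
w = adaboost_reweight([0.25] * 4, [True, True, True, False], error=0.25)
```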
EXtreme gradient boosting (XGBoost)18 represents an optimized implementation of the gradient boosting framework. It offers the advantage of parallel processing across multiple CPU cores. XGBoost is often referred to as the normalized version of GBM (Gradient Boosting Machine), and its normalization helps mitigate the risk of overfitting. Additionally, users have the flexibility to perform cross-validation at each iteration of the boosting process, enhancing model robustness and generalization.
Light gradient boosting (LightGBM)19 is a highly efficient gradient-boosting decision tree algorithm. While it shares similarities with XGBoost, LightGBM distinguishes itself in its tree-building approach. LightGBM can reduce the learning time and memory usage by replacing continuous values with discrete bins. Furthermore, this algorithm minimizes the computational cost associated with gain calculation for each partition. Notably, LightGBM supports GPU learning, parallel processing, and excels in handling large-scale datasets.
Fig. 6.
Our proposed ANN model.
Fig. 7.
Our proposed DNN model.
Experimental setup
Dataset
We compiled a dataset that consists of 1455 Android applications, encompassing three categories: benign apps, single-app malware, and colluding Android apps. The benign apps and single-app malware were obtained from separate sources, namely the Google Play Store and VirusShare53, respectively. To collect these apps, we used AndroZoo54, a reputable repository of Android applications that offers access to a diverse range of apps from various sources, including the Google Play Store. AndroZoo provides information on the number of antivirus products in VirusTotal55 that flag an app as malware, thus allowing us to construct our dataset of benign and single-app malware samples. For an app to be considered benign, it must be classified as non-malicious by all antivirus products. Conversely, we collected single-app malware samples when they were identified as malicious by at least one antivirus scanner.
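The labeling rule described above can be stated as a one-line function (an illustrative sketch; the function name is ours):

```python
def label_from_vt_detections(vt_count):
    """Labeling rule for benign vs. single-app malware: an app is benign only
    if no VirusTotal antivirus product flags it; it is treated as malware if
    at least one scanner does."""
    return "benign" if vt_count == 0 else "single_app_malware"
```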
In addition, we gathered a set of 485 colluding Android apps, with 482 obtained from the Application Collusion Engine (ACE)42 and 3 from the DroidBench dataset56. This inclusion of colluding apps from multiple sources added diversity to our dataset, resulting in a total of 1455 apps, equally distributed among the three categories. We ensured a balanced representation of each category to maintain fairness during the subsequent classification process. Table 2 summarizes our dataset.
Table 2.
Summary of dataset.
| Category | Number of apps |
|---|---|
| Benign Apps | 485 |
| Single-App Malware | 485 |
| Colluding Android Apps | 485 |
| Total | 1455 |
Parameter tuning
Table 3 presents the parameters of the machine learning algorithms. For ANN and DNN, the activation function used was ReLU, the output function was Softmax, and the optimization function was Adam.
Table 3.
Parameters for machine learning algorithms.
| Parameter | ANN | DNN | Parameter | Random Forest | Extra Trees | AdaBoost | XGBoost | LightGBM |
|---|---|---|---|---|---|---|---|---|
| Hidden layers | 0 | 3 | N estimators | 100 | 90 | 100 | 100 | 100 |
| Epoch | 100 | 100 | Max depth | 4 | 10 | 5 | – | – |
| Batch size | 32 | 32 | – | – | – | – | – | – |
Evaluation metrics
To assess the effectiveness of the proposed method in Android collusion detection, five different metrics are used: Precision, Recall, F1-Score, Accuracy, and ROC Area.
The precision represents the proportion of Android applications truly belonging to a certain category (i.e., Benign, single_app_malware, or colluding_app_malware) among those labeled as belonging to that category.

Precision = TP / (TP + FP)    (7)

The recall metric is the proportion of Android applications correctly assigned to a specific category (e.g., Benign, single_app_malware, or colluding_app_malware) out of all the Android applications that truly belong to that category.

Recall = TP / (TP + FN)    (8)

The F1-score represents the weighted harmonic mean of recall and precision, providing a balanced measure of a model's performance.

F1-score = 2 × (Precision × Recall) / (Precision + Recall)    (9)

Accuracy represents the ratio of correctly classified samples (both true positives and true negatives) to the total number of samples in the dataset. The formula for accuracy is:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (10)
True positive (TP) refers to the count of positive testing samples correctly predicted as positive. Conversely, false positive (FP) represents the count of negative testing samples mistakenly predicted as positive. Similarly, true negative (TN) corresponds to the count of negative testing samples that are accurately predicted as negative. In contrast, false negative (FN) denotes the count of positive testing samples that are incorrectly predicted as negative.
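As a sanity check, all four metrics follow directly from these four counts (an illustrative sketch; the function name and example counts are ours):

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, F1, and accuracy from the confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

# Toy counts for one class: 90 hits, 10 false alarms, 15 misses.
p, r, f1, acc = classification_metrics(tp=90, fp=10, tn=85, fn=15)
```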
The receiver operating characteristic (ROC) curve depicts the trade-off between the true positive rate and the false positive rate, and it remains informative even when the test data are class-imbalanced. The performance of the classifier is summarized by the Area Under the Curve (AUC) metric: a higher AUC value, closer to 1, indicates a stronger classifier.
Results and discussion
Feature set
Android manifest parsing
To extract the requested permissions from the Android applications, we developed a Python script leveraging the official Android documentation (https://developer.android.com/reference/android/Manifest.permission) and standard libraries. The script accessed the Android Developers website to retrieve the complete list of permissions that an Android app can request, ensuring an up-to-date and comprehensive set of permissions for our analysis.
Using the APKTool utility (https://ibotpeaches.github.io/Apktool/), we decompiled each APK file to extract the Android manifest file (AndroidManifest.xml), which contains essential information about the app, including its requested permissions. We parsed the manifest file using a Python script we developed to extract the declared permissions, along with each application's permission rate and smali size. This extraction facilitated static analysis and provided explicit insight into the behavior of each app.
Recording permissions in a CSV file
For each app, we created a corresponding entry in a CSV file to record the presence or absence of each permission listed in the retrieved Android permissions set. The CSV file acted as a tabular representation of the permissions, with each row representing an app and each column representing a specific permission.
During the parsing process, we marked the presence of a permission with a value of "1" in the respective cell of the CSV file. If a permission was not explicitly declared in the app's manifest, we marked it as absent with a value of "0". This binary representation allowed us to analyze and compare permission patterns across different apps effectively.
In addition to the permission indicators, we augmented the CSV file with extra columns to capture additional features. Specifically, we included columns for permission count, permission rate, and smali size for every single app. These additional metrics provided valuable insights into the number of permissions requested, the frequency of permission usage, and the size of the app’s smali code.
The resulting CSV file served as a valuable resource for subsequent feature extraction and machine learning modeling. It provided a comprehensive view of the permissions landscape within the dataset, enabling us to identify common and unique permission patterns associated with single-app malware and colluding Android apps.
Selection of effective permissions
After obtaining the feature set described above, we refined it to focus on Google’s dangerous permissions, permission rate, and smali size as depicted in Table 4. To identify the most impactful permissions, we utilized the feature importance score of the Random Forest classifier. This score indicates how frequently a feature is used to split the data while constructing trees. Feature importance is a well-established measure to generate streamlined and efficient prediction models by utilizing a reduced set of inputs57,58. In our implementation, we adopted a Random Forest-based feature importance technique. This technique allowed us to identify the permissions that played a significant role in the detection process.
Table 4.
Feature importance of Google dangerous permissions with additional metrics.
| SN | Permission | Importance |
|---|---|---|
| 1 | Smali File Size | 0.4078 |
| 2 | READ_PHONE_STATE | 0.1435 |
| 3 | Permission Rate | 0.0947 |
| 4 | READ_SMS | 0.0733 |
| 5 | RECEIVE_SMS | 0.0637 |
| 6 | SEND_SMS | 0.0348 |
| 7 | CALL_PHONE | 0.0328 |
| 8 | WRITE_EXTERNAL_STORAGE | 0.0309 |
| 9 | READ_EXTERNAL_STORAGE | 0.0246 |
| 10 | READ_CONTACTS | 0.0177 |
| 11 | READ_HISTORY_BOOKMARKS | 0.0138 |
| 12 | CAMERA | 0.0115 |
| 13 | ACCESS_COARSE_LOCATION | 0.0113 |
| 14 | RECORD_AUDIO | 0.0081 |
| 15 | GET_ACCOUNTS | 0.0080 |
| 16 | READ_CALL_LOG | 0.0065 |
| 17 | ACCESS_FINE_LOCATION | 0.0064 |
| 18 | WRITE_CONTACTS | 0.0030 |
| 19 | INSTALL_PACKAGES | 0.0029 |
| 20 | WRITE_HISTORY_BOOKMARKS | 0.0014 |
| 21 | ANSWER_PHONE_CALLS | 0.0012 |
| 22 | RECEIVE_WAP_PUSH | 0.0006 |
| 23 | READ_CALENDAR | 0.0004 |
| 24 | RECEIVE_MMS | 0.0003 |
| 25 | WRITE_CALENDAR | 0.0003 |
| 26 | ACCESS_NOTIFICATION_POLICY | 0.0002 |
| 27 | WRITE_CALL_LOG | 0.0001 |
| 28 | READ_PHONE_NUMBERS | 0.0001 |
| 29 | UPDATE_DEVICE_STATS | 0.0000 |
| 30 | BODY_SENSORS | 0.0000 |
To establish a significance threshold, we set a value of 0.1. Permissions with an importance value below this threshold were considered less impactful and were excluded from our analysis. By exploring various combinations of feature sets, we eventually determined the most crucial permission list, as illustrated in Table 5.
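The thresholding step can be sketched as follows (an illustrative sketch; the helper name is ours, and the scores are a subset copied from Table 4):

```python
def select_features(importances, threshold=0.1):
    """Keep only features whose Random Forest importance meets the threshold,
    sorted from most to least important."""
    ranked = sorted(importances.items(), key=lambda kv: -kv[1])
    return [name for name, score in ranked if score >= threshold]

# Top of Table 4; with threshold 0.1 only the first two survive:
scores = {"Smali File Size": 0.4078, "READ_PHONE_STATE": 0.1435,
          "Permission Rate": 0.0947, "READ_SMS": 0.0733}
kept = select_features(scores)  # ['Smali File Size', 'READ_PHONE_STATE']
```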
Table 5.
The proposed identified parameter features.
| Type | No | Name |
|---|---|---|
| Permissions | 1 | android.permission.READ_PHONE_STATE |
| | 2 | android.permission.READ_SMS |
| | 3 | android.permission.RECEIVE_SMS |
| Metrics | 1 | Smali File Size |
| | 2 | Permission Rate |
Empirical setup and results
In our empirical analysis, we employed various machine learning models, including five ensemble classifiers (Random Forest (RF), Extremely Randomized Trees (Extra Trees), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting (LightGBM)), alongside our proposed custom Artificial Neural Network (ANN) and Deep Neural Network (DNN) architectures. These models were evaluated on a dataset comprising 1455 applications, which included benign, single-app malware, and colluding applications. The input for these models consisted of the five most significant features extracted from the applications. Our custom ANN and DNN architectures were trained for 40 epochs, utilizing a batch size of 25.
The experiments were conducted on a system equipped with an Intel Core i5 7th generation processor. Our evaluation of the classification method was based on several performance metrics, including accuracy (ACC), F1-score, recall, and the area under the Receiver Operating Characteristic (ROC) curve. This comprehensive setup enabled us to effectively analyze the performance of our proposed approach in detecting generic malware and colluding Android applications.
Model training and testing
We used 10-fold cross-validation to evaluate model performance. Prior to experimentation, the dataset was shuffled and evenly divided into 10 partitions. Each run designated one partition for validation and testing and used the remaining nine partitions as the training set. This equates to allocating approximately 90% of the data for training, 5% for validation, and 5% for final testing. The model was trained on the training data, while hyperparameter tuning utilized the validation set. After training, evaluation on the test set provided unbiased performance metrics. We relied on five measures (accuracy, precision, recall, F1-score, and ROC-AUC) to gauge effectiveness. Experiments were conducted using the TensorFlow library. Early stopping was employed during training to prevent over-fitting. The results of the 10-fold classification can be observed in Tables 6, 7, 8, 9, 10, 11 and 12.
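The shuffle-and-partition step can be sketched as follows (an illustrative version; the seed and helper name are ours):

```python
import random

def ten_fold_indices(n_samples, seed=0):
    """Shuffle sample indices and split them into 10 near-equal folds; each
    fold in turn serves as the held-out partition of a cross-validation run."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)       # fixed seed for reproducibility
    return [idx[i::10] for i in range(10)]

folds = ten_fold_indices(1455)             # 10 folds of 145 or 146 apps each
```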
Table 6.
The proposed scheme results on ANN classifier with 10-fold cross-validation.
| Classifier | Test Set | Precision | Recall | F1_Score | Accuracy | ROC_AUC |
|---|---|---|---|---|---|---|
| 1 | 0.8829 | 0.8867 | 0.8831 | 0.8836 | 0.9417 | |
| 2 | 0.9364 | 0.9349 | 0.9338 | 0.9315 | 0.9721 | |
| 3 | 0.9014 | 0.8975 | 0.8974 | 0.8973 | 0.9597 | |
| 4 | 0.9210 | 0.8960 | 0.8983 | 0.9041 | 0.9578 | |
| ANN | 5 | 0.8583 | 0.8687 | 0.8597 | 0.8630 | 0.9206 |
| 6 | 0.9051 | 0.9051 | 0.9042 | 0.9034 | 0.9724 | |
| 7 | 0.8764 | 0.8802 | 0.8778 | 0.8828 | 0.9750 | |
| 8 | 0.8865 | 0.8820 | 0.8840 | 0.8828 | 0.9654 | |
| 9 | 0.8610 | 0.8440 | 0.8493 | 0.8552 | 0.9302 | |
| 10 | 0.8775 | 0.8764 | 0.8752 | 0.8759 | 0.9250 | |
| Average (%) | 0.8906 | 0.8872 | 0.8863 | 0.8879 | 0.9520 | |
Table 7.
The proposed scheme results on DNN classifier with 10-fold cross-validation.
| Classifier | Test Set | Precision | Recall | F1_Score | Accuracy | ROC_AUC |
|---|---|---|---|---|---|---|
| 1 | 0.9662 | 0.9679 | 0.9670 | 0.9658 | 0.9870 | |
| 2 | 0.9716 | 0.9688 | 0.9699 | 0.9726 | 0.9984 | |
| 3 | 0.9541 | 0.9529 | 0.9519 | 0.9521 | 0.9952 | |
| 4 | 0.9665 | 0.9634 | 0.9644 | 0.9658 | 0.9980 | |
| DNN | 5 | 0.9404 | 0.9420 | 0.9410 | 0.9384 | 0.9828 |
| 6 | 0.9938 | 0.9921 | 0.9929 | 0.9931 | 0.9999 | |
| 7 | 0.9530 | 0.9550 | 0.9526 | 0.9517 | 0.9890 | |
| 8 | 0.9591 | 0.9572 | 0.9581 | 0.9586 | 0.9890 | |
| 9 | 0.9485 | 0.9418 | 0.9445 | 0.9517 | 0.9896 | |
| 10 | 0.9385 | 0.9385 | 0.9384 | 0.9379 | 0.9836 | |
| Average (%) | 0.9592 | 0.9579 | 0.9581 | 0.9588 | 0.9913 | |
Table 8.
The proposed scheme results on Random Forest classifier with 10-fold cross-validation.
| Classifier | Test Set | Precision | Recall | F1_Score | Accuracy | ROC_AUC |
|---|---|---|---|---|---|---|
| 1 | 0.9532 | 0.9522 | 0.9514 | 0.9521 | 0.9943 | |
| 2 | 0.9729 | 0.9726 | 0.9725 | 0.9726 | 0.9993 | |
| 3 | 0.9864 | 0.9863 | 0.9863 | 0.9863 | 0.9997 | |
| 4 | 0.9726 | 0.9725 | 0.9725 | 0.9726 | 0.9967 | |
| Random Forest | 5 | 0.9686 | 0.9660 | 0.9656 | 0.9658 | 0.9970 |
| 6 | 0.9729 | 0.9726 | 0.9725 | 0.9724 | 0.9982 | |
| 7 | 0.9471 | 0.9447 | 0.9449 | 0.9448 | 0.9917 | |
| 8 | 0.9808 | 0.9792 | 0.9793 | 0.9793 | 0.9982 | |
| 9 | 0.9728 | 0.9725 | 0.9724 | 0.9724 | 0.9837 | |
| 10 | 0.9546 | 0.9514 | 0.9518 | 0.9517 | 0.9896 | |
| Average (%) | 0.9682 | 0.9670 | 0.9669 | 0.9670 | 0.9948 | |
Table 9.
The proposed scheme results on Extra Trees classifier with 10-fold cross-validation.
| Classifier | Test Set | Precision | Recall | F1_Score | Accuracy | ROC_AUC |
|---|---|---|---|---|---|---|
| 1 | 0.9532 | 0.9522 | 0.9514 | 0.9521 | 0.9979 | |
| 2 | 0.9595 | 0.9590 | 0.9585 | 0.9589 | 0.9936 | |
| 3 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | |
| 4 | 0.9726 | 0.9725 | 0.9725 | 0.9726 | 0.9984 | |
| Extra Tree | 5 | 0.9256 | 0.9247 | 0.9236 | 0.9247 | 0.9920 |
| 6 | 0.9522 | 0.9520 | 0.9516 | 0.9517 | 0.9947 | |
| 7 | 0.9522 | 0.9517 | 0.9515 | 0.9517 | 0.9888 | |
| 8 | 0.9808 | 0.9792 | 0.9793 | 0.9793 | 0.9921 | |
| 9 | 0.9600 | 0.9589 | 0.9586 | 0.9586 | 0.9763 | |
| 10 | 0.9536 | 0.9514 | 0.9516 | 0.9517 | 0.9907 | |
| Average (%) | 0.9610 | 0.9602 | 0.9599 | 0.9601 | 0.9924 | |
Table 10.
The proposed scheme results on AdaBoost classifier with 10-fold cross-validation.
| Classifier | Test Set | Precision | Recall | F1_Score | Accuracy | ROC_AUC |
|---|---|---|---|---|---|---|
| | 1 | 0.9538 | 0.9521 | 0.9525 | 0.9521 | 0.9868 |
| | 2 | 0.9863 | 0.9863 | 0.9863 | 0.9863 | 0.9997 |
| | 3 | 0.9863 | 0.9863 | 0.9863 | 0.9863 | 0.9996 |
| | 4 | 0.9726 | 0.9725 | 0.9725 | 0.9726 | 0.9967 |
| AdaBoost | 5 | 0.9386 | 0.9382 | 0.9383 | 0.9384 | 0.9921 |
| | 6 | 0.9804 | 0.9796 | 0.9794 | 0.9793 | 0.9930 |
| | 7 | 0.9670 | 0.9654 | 0.9656 | 0.9655 | 0.9967 |
| | 8 | 0.9670 | 0.9654 | 0.9656 | 0.9655 | 0.9919 |
| | 9 | 0.9728 | 0.9725 | 0.9724 | 0.9724 | 0.9829 |
| | 10 | 0.9351 | 0.9308 | 0.9315 | 0.9310 | 0.9745 |
| | Average (%) | 0.9660 | 0.9649 | 0.9650 | 0.9649 | 0.9914 |
Table 11.
The proposed scheme results on XGBoost classifier with 10-fold cross-validation.
| Classifier | Test Set | Precision | Recall | F1_Score | Accuracy | ROC_AUC |
|---|---|---|---|---|---|---|
| | 1 | 0.9296 | 0.9250 | 0.9246 | 0.9247 | 0.9695 |
| | 2 | 0.9116 | 0.9110 | 0.9108 | 0.9110 | 0.9602 |
| | 3 | 0.9354 | 0.9318 | 0.9317 | 0.9315 | 0.9814 |
| | 4 | 0.8661 | 0.8563 | 0.8558 | 0.8562 | 0.9256 |
| XGBoost | 5 | 0.8833 | 0.8700 | 0.8704 | 0.8699 | 0.9469 |
| | 6 | 0.8573 | 0.8482 | 0.8497 | 0.8483 | 0.9410 |
| | 7 | 0.8568 | 0.8478 | 0.8488 | 0.8483 | 0.9330 |
| | 8 | 0.9230 | 0.9170 | 0.9173 | 0.9172 | 0.9537 |
| | 9 | 0.8901 | 0.8895 | 0.8895 | 0.8897 | 0.9549 |
| | 10 | 0.9109 | 0.9099 | 0.9095 | 0.9103 | 0.9645 |
| | Average (%) | 0.8964 | 0.8906 | 0.8908 | 0.8907 | 0.9531 |
Table 12.
The proposed scheme results on LightGBM classifier with 10-fold cross-validation.
| Classifier | Test Set | Precision | Recall | F1_Score | Accuracy | ROC_AUC |
|---|---|---|---|---|---|---|
| | 1 | 0.9458 | 0.9453 | 0.9447 | 0.9452 | 0.9932 |
| | 2 | 0.9863 | 0.9863 | 0.9863 | 0.9863 | 0.9994 |
| | 3 | 0.9933 | 0.9931 | 0.9931 | 0.9932 | 1.0000 |
| | 4 | 0.9869 | 0.9861 | 0.9862 | 0.9863 | 0.9986 |
| LightGBM | 5 | 0.9521 | 0.9518 | 0.9519 | 0.9521 | 0.9972 |
| | 6 | 0.9729 | 0.9726 | 0.9725 | 0.9724 | 0.9981 |
| | 7 | 0.9521 | 0.9518 | 0.9519 | 0.9517 | 0.9907 |
| | 8 | 0.9808 | 0.9792 | 0.9793 | 0.9793 | 0.9972 |
| | 9 | 0.9655 | 0.9656 | 0.9653 | 0.9655 | 0.9824 |
| | 10 | 0.9600 | 0.9585 | 0.9587 | 0.9586 | 0.9819 |
| | Average (%) | 0.9696 | 0.9690 | 0.9690 | 0.9691 | 0.9939 |
Results analysis
To evaluate SigColDroid, we study its accuracy and efficiency in detecting application collusion. Furthermore, we compare SigColDroid to a state-of-the-art Android application-collusion identification approach. Specifically, we answer the following research questions:
RQ1: How accurate is SigColDroid in distinguishing between benign, malicious, and colluding Android apps?
RQ2: What is the impact of permission rate and smali file size as opposed to using only Android dangerous permissions?
RQ3: How does SigColDroid’s detection accuracy compare to other detection approaches?
RQ1: How accurate is SigColDroid in distinguishing between benign, malicious, and colluding Android apps?
We evaluate the effectiveness of our proposed machine learning models in distinguishing between benign, malicious, and colluding Android apps using five ensemble classifiers, namely Random Forest (RF), Extremely Randomized Trees (Extra Trees), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting (LightGBM), alongside our custom Artificial Neural Network (ANN) and Deep Neural Network (DNN) architectures. To ensure robustness and prevent overfitting, we employed 10-fold cross-validation with early stopping in our experiments.
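The evaluation protocol above can be sketched as follows. This is an illustrative reconstruction with scikit-learn on synthetic stand-in data, not the paper's actual pipeline: the three tree-based classifiers shown are among those named above (XGBoost and LightGBM are omitted to keep the sketch free of extra dependencies), and all hyperparameters are assumed defaults.

```python
# Sketch of the 10-fold cross-validation protocol over several classifiers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

# Synthetic 3-class data standing in for benign / malware / colluding apps.
X, y = make_classification(n_samples=300, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=42)

classifiers = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Extra Trees": ExtraTreesClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
}

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
averages = {}
for name, clf in classifiers.items():
    fold_scores = []
    for train_idx, test_idx in skf.split(X, y):
        clf.fit(X[train_idx], y[train_idx])
        pred = clf.predict(X[test_idx])
        # Macro-averaged metrics for one fold (one row of a per-fold table).
        fold_scores.append([
            precision_score(y[test_idx], pred, average="macro", zero_division=0),
            recall_score(y[test_idx], pred, average="macro"),
            f1_score(y[test_idx], pred, average="macro"),
            accuracy_score(y[test_idx], pred),
        ])
    averages[name] = np.mean(fold_scores, axis=0)  # fold-averaged metrics
```

Each inner iteration yields the per-fold metrics reported in Tables 8-12; averaging over the ten folds gives the final row of each table.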
Table 13 presents the average performance metrics across the 10-fold cross-validation for all models on our dataset. The LightGBM model achieved the highest precision of 0.9696, a recall of 0.9690, an F1-score of 0.9690, and an accuracy of 0.9691, while the Random Forest model attained the highest ROC AUC of 0.9948. Higher precision indicates that a greater proportion of the apps predicted as single-app malware or colluding applications truly belong to those classes. Similarly, a high recall means that most malware samples in the dataset are correctly predicted as either single-app malware or colluding applications. The average performance of our proposed approach is illustrated in Fig. 8, where LightGBM demonstrates the best classification effectiveness in terms of precision, recall, F1-score, and accuracy, while Random Forest achieved the highest ROC AUC. This establishes LightGBM as the top-performing model for our Android application classification task based on permission features.
Table 13.
Most effective classifiers and measured values. The highest value for each metric is emphasized as bold-face type.
| Classifier | Precision | Recall | F1_Score | Accuracy | ROC_AUC |
|---|---|---|---|---|---|
| ANN | 0.8906 | 0.8872 | 0.8863 | 0.8879 | 0.9520 |
| DNN | 0.9592 | 0.9579 | 0.9581 | 0.9588 | 0.9913 |
| Random Forest | 0.9682 | 0.9670 | 0.9669 | 0.9670 | 0.9948 |
| Extra Trees | 0.9610 | 0.9602 | 0.9599 | 0.9601 | 0.9924 |
| AdaBoost | 0.9660 | 0.9649 | 0.9650 | 0.9649 | 0.9914 |
| XGBoost | 0.8964 | 0.8906 | 0.8908 | 0.8907 | 0.9531 |
| LightGBM | 0.9696 | 0.9690 | 0.9690 | 0.9691 | 0.9939 |
Fig. 8.
Mean results of the proposed approach on various machine learning classifiers.
To evaluate LightGBM’s ability to distinguish between different classes, we computed the area under the ROC curve (AUC) for three specific tasks: Benign vs. Rest, Colluding vs. Rest, and Malware vs. Rest. The ROC curves obtained during the 10-fold cross-validation of the LightGBM classifier are shown in Fig. 9.
Benign vs. Rest: The classifier consistently achieved high AUC values ranging from 0.98 to 1.00, indicating its effectiveness in separating benign apps from other types of applications.
Colluding vs. Rest: Perfect discrimination was observed with an AUC of 1.00, suggesting optimal separation between colluding applications and the other types of applications.
Malware vs. Rest: Strong performance was consistently achieved with AUC values ranging from 0.99 to 1.00, effectively identifying malware apps.
Overall, the results demonstrate that the classifier is effective in distinguishing between benign, colluding, and malware apps. The AUC scores consistently demonstrate strong discriminatory power across all three classes. These findings indicate that the classifier has learned informative features for distinguishing between different types of apps, making it a promising model for application classification.
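The per-class AUC values above follow a one-vs-rest decomposition. A minimal sketch of that computation, using randomly generated probabilities in place of LightGBM's predicted class probabilities, is:

```python
# One-vs-rest AUC per class, as behind Fig. 9 (Benign/Colluding/Malware vs. Rest).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
classes = ["benign", "colluding", "malware"]
y_true = rng.integers(0, 3, size=200)                 # true class indices
# Simulated predicted class probabilities, peaked at the true class.
y_prob = rng.dirichlet(alpha=np.ones(3), size=200)
y_prob[np.arange(200), y_true] += 1.0
y_prob /= y_prob.sum(axis=1, keepdims=True)

aucs = {}
for k, name in enumerate(classes):
    # Binarize labels (class k vs. rest) and score with that class's column.
    aucs[name] = roc_auc_score((y_true == k).astype(int), y_prob[:, k])
```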
Fig. 9.
Receiver operating characteristic (ROC) achieved by the proposed method on LightGBM classifier.
RQ2: What is the impact of permission rate and smali file size as opposed to using only Android dangerous permissions?
To assess the impact of permission rate and smali file size on the classifiers’ ability to differentiate between benign, malicious, and colluding applications, we conducted an evaluation using only the Google-based dangerous permissions feature set. The classifiers we employed included Random Forest, Extra Trees, AdaBoost, XGBoost, and LightGBM, as outlined in Table 14. Notably, our proposed parameter set achieved better detection accuracy while using a significantly reduced feature set, as shown in Fig. 10. Specifically, we reduced the number of features from 28 to 5, approximately 82.1% fewer than the full dangerous-permission set used in Table 14.
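The feature-reduction step referenced here (28 dangerous permissions down to 5 via Random Forest feature importance) can be sketched as follows; the permission names and the dataset are illustrative placeholders, not the paper's actual features:

```python
# Rank features by Random Forest importance and keep the top 5 (28 -> 5).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the 28-permission feature matrix.
X, y = make_classification(n_samples=300, n_features=28, n_informative=6,
                           n_classes=3, random_state=7)
feature_names = [f"PERMISSION_{i}" for i in range(28)]   # hypothetical names

rf = RandomForestClassifier(n_estimators=200, random_state=7).fit(X, y)
order = np.argsort(rf.feature_importances_)[::-1]        # most important first
top5 = [feature_names[i] for i in order[:5]]             # reduced feature set
X_reduced = X[:, order[:5]]                              # kept columns only
```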
Table 14.
Detection results using Google dangerous permissions on the ANN, DNN, Random Forest, Extra Trees, AdaBoost, XGBoost, and LightGBM classifiers using 10-fold cross-validation.
| Classifier | Precision | Recall | F1_Score | Accuracy | ROC_AUC |
|---|---|---|---|---|---|
| ANN | 0.8223 | 0.8038 | 0.7993 | 0.8061 | 0.9260 |
| DNN | 0.8665 | 0.8388 | 0.8332 | 0.8405 | 0.9362 |
| Random Forest | 0.8436 | 0.8208 | 0.8166 | 0.8206 | 0.9175 |
| Extra Trees | 0.8578 | 0.8352 | 0.8314 | 0.8350 | 0.9380 |
| AdaBoost | 0.8577 | 0.8317 | 0.8273 | 0.8316 | 0.8989 |
| XGBoost | 0.8512 | 0.8256 | 0.8202 | 0.8254 | 0.9282 |
| LightGBM | 0.8620 | 0.8393 | 0.8348 | 0.8392 | 0.9333 |
Fig. 10.
Comparing precision, recall, F1-score, and accuracy using Google dangerous permissions on the ANN, DNN, Random Forest, Extra Trees, AdaBoost, XGBoost, and LightGBM classifiers with 10-fold cross-validation.
RQ3: How does SigColDroid’s detection accuracy compare to other detection approaches?
To evaluate the effectiveness of our approach, we compared it with a state-of-the-art Android-app collusion detector that uses audio features12. The authors of that work propose a method for detecting app collusion using audio signals: an application executable is converted into an audio file, and audio signal processing techniques extract a set of numerical characteristics from each sample. They build different machine learning models to evaluate the effectiveness of their approach for app-collusion detection. We evaluated Casolare’s approach using the same apps and experimental setup as our model, including the same dataset split for the 10-fold cross-validation technique. Table 15 shows the average accuracy, precision, recall, and F1-score of Casolare’s approach. Our model outperformed Casolare’s approach in all four evaluation metrics, as shown in Fig. 11.
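As a rough illustration of the byte-to-signal idea underlying Casolare's approach, one can treat the raw bytes of an executable as a one-dimensional waveform and derive numeric descriptors from it. The statistics below are generic stand-ins chosen for brevity, not the actual audio features used in ref. 12:

```python
# Treat executable bytes as a 1-D signal and compute simple signal statistics.
import numpy as np

def bytes_to_features(blob: bytes) -> np.ndarray:
    signal = np.frombuffer(blob, dtype=np.uint8).astype(np.float32)
    signal = (signal - 128.0) / 128.0                  # centre like a waveform
    # Fraction of adjacent samples whose sign changes (zero-crossing rate).
    zcr = float(np.mean(np.abs(np.diff(np.sign(signal))) > 0))
    # Mean, spread, zero-crossing rate, and average energy of the "signal".
    return np.array([float(signal.mean()), float(signal.std()),
                     zcr, float(np.mean(signal ** 2))])

features = bytes_to_features(b"\x00\x7fELF" * 64)      # toy stand-in for a binary
```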
Table 15.
Detection results using Casolare’s approach12 on the ANN, DNN, Random Forest, Extra Trees, AdaBoost, XGBoost, and LightGBM classifiers using 10-fold cross-validation.
| Classifier | Precision | Recall | F1_Score | Accuracy | ROC_AUC |
|---|---|---|---|---|---|
| ANN | 0.6331 | 0.6351 | 0.5905 | 0.6238 | 0.7266 |
| DNN | 0.1031 | 0.3333 | 0.1569 | 0.3092 | 0.5000 |
| Random Forest | 0.9388 | 0.9375 | 0.9373 | 0.9375 | 0.9918 |
| Extra Trees | 0.9609 | 0.9602 | 0.9602 | 0.9601 | 0.9958 |
| AdaBoost | 0.9642 | 0.9630 | 0.9629 | 0.9629 | 0.9955 |
| XGBoost | 0.9340 | 0.9334 | 0.9330 | 0.9333 | 0.9747 |
| LightGBM | 0.9619 | 0.9609 | 0.9608 | 0.9608 | 0.9969 |
Fig. 11.
Comparing results of Casolare’s approach12 with the SigColDroid approach.
Figure 12 compares the number of features and detection accuracy achieved by SigColDroid and the existing approaches. We categorize the features into two sub-categories: works based on permission features and works based on other features. By doing so, we aim to highlight the effectiveness of SigColDroid specifically in the sub-category of works based on permission features.
Fig. 12.
Comparisons of the number of features and accuracy of the proposed and existing approaches.
The results indicate that SigColDroid achieved a detection accuracy of approximately 96.91% using only 5 permissions. In contrast, Faiz et al.45 achieved a detection accuracy of 96.03% with 10 permission features, Khokhlov et al.46 achieved a detection accuracy of 95% with 13 dynamic features, and Casolare et al.12 achieved a detection accuracy of 97.5% with 25 audio signal features. These numbers demonstrate that SigColDroid outperforms Faiz et al.45 and Khokhlov et al.46, and achieves results comparable to Casolare et al.12, while using only 5 permission features.
Conclusion
This paper presented SigColDroid, a novel approach for detecting potentially dangerous Android applications, specifically targeting colluding and single-app malware. Our methodology first extracts permissions from manifest files and uses Random Forest feature importance to select a reduced set of permission features. In addition, a novel and balanced dataset containing permission requests, smali size, and permission rates for 1455 applications, encompassing benign apps, single-app malware, and colluding Android apps, was proposed. By employing five ensemble classifiers, namely Random Forest (RF), Extremely Randomized Trees (Extra Trees), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting (LightGBM), alongside our proposed custom Artificial Neural Network (ANN) and Deep Neural Network (DNN) architectures, we effectively categorized applications into their respective classes. The experimental results demonstrated that the LightGBM model achieved the highest performance metrics, with a precision of 0.9696 and an accuracy of 0.9691, highlighting its effectiveness in malware detection.
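The permission-extraction step summarized above can be sketched as follows, assuming the APK's binary AndroidManifest.xml has already been decoded to plain XML (e.g. with Apktool, see Footnotes); the sample manifest is illustrative:

```python
# Extract requested permissions from a decoded AndroidManifest.xml.
import xml.etree.ElementTree as ET

# The android: prefix maps to this namespace URI in Android manifests.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def extract_permissions(manifest_xml: str) -> list:
    root = ET.fromstring(manifest_xml)
    return [elem.get(ANDROID_NS + "name")
            for elem in root.iter("uses-permission")]

sample = """<manifest xmlns:android="http://schemas.android.com/apk/res/android">
  <uses-permission android:name="android.permission.READ_SMS"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
</manifest>"""

perms = extract_permissions(sample)
```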
To evaluate how permission rate and smali file size affect the classifiers’ ability to differentiate between benign, malicious, and colluding applications, we evaluated several classifiers using the Google-based dangerous permissions feature set. Our analysis demonstrated that our proposed parameter set achieved superior detection accuracy with a significantly reduced feature set, lowering the number of features from 28 to 5, which corresponds to an approximate reduction of 82.1%. This enhancement indicates that effective detection can be maintained while simplifying the feature set, thereby improving efficiency.
Furthermore, we compared SigColDroid’s detection accuracy with a state-of-the-art collusion detection approach that utilizes audio features. Our analysis, which included the same dataset and experimental setup, showed that SigColDroid outperformed this method across all evaluation metrics, achieving an accuracy of approximately 96.91% with only 5 permission features. In contrast, other approaches required more features to achieve slightly lower or comparable accuracy. This highlights SigColDroid’s efficiency and effectiveness in detecting colluding applications while using significantly fewer features.
Moving forward, our plans include augmenting the dataset and assessing the proposed scheme using additional supervised and unsupervised machine learning classifiers to enhance accuracy. Additionally, we aim to identify specific patterns associated with malicious behavior.
Acknowledgements
We wish to express our deepest gratitude to Jorge Blasco and his team, who made available the colluding dataset used in this study. We also thank the European Mathematical Society (EMS) Committee for Developing Countries, which granted us a research stay in South Africa that helped to define the detection model.
Author contributions
R.Y.M. conceived, conducted, and analyzed the experiment(s), J.B.A. reviewed the manuscript, F.T. refined the conception and experiments and enhanced the discussion of results, and C.F. and K. refined the architecture and the description of its components. All authors reviewed the manuscript.
Data availability
The data will be available as per reasonable request upon sending an email to an author: mawohry3030@gmail.com or f.tchakounte@cycomai.com.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
https://ibotpeaches.github.io/Apktool/
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
F. Tchakounte, C. Fachkha, and Kolyang have contributed equally to this work.
References
- 1.Statista. Global android malware volume 2020. https://www.statista.com/statistics/680705/global-android-malware-volume/ (2022).
- 2.BoA. App statistics - 2021. https://www.businessofapps.com/data/app-statistics/ (2021).
- 3.Securelist. It threat evolution q2 2021. mobile statistics. https://securelist.com/it-threat-evolution-q2-2021-mobile-statistics/103636/ (2021).
- 4.FidelisCybersecurity. Fidelis threat intelligence report-february/march 2021. https://fidelissecurity.com/resource/report/fidelis-threat-intelligence-report-february-march-2021/ (2021).
- 5.Elish, K. O., Yao, D. & Ryder, B. G. On the need of precise inter-app icc classification for detecting android malware collusions. In Proceedings of IEEE mobile security technologies (MoST), in conjunction with the IEEE symposium on security and privacy (Citeseer, 2015).
- 6.Liu, F. et al. Mr-droid: A scalable and prioritized analysis of inter-app communication risks. In 2017 IEEE security and privacy workshops (SPW), 189–198, 10.1109/SPW.2017.12 (2017).
- 7.Bhandari, S. et al. Android inter-app communication threats and detection techniques. Comput. Secur. 70, 392–421 (2017).
- 8.Kalutarage, H. K., Nguyen, H. N. & Shaikh, S. A. Towards a threat assessment framework for apps collusion. Telecommun. Syst. 66, 417–430 (2017).
- 9.Mahindru, A. et al. Permdroid a framework developed using proposed feature selection approach and machine learning techniques for android malware detection. Sci. Rep. 14, 10724 (2024).
- 10.Aurangzeb, S. & Aleem, M. Evaluation and classification of obfuscated android malware through deep learning using ensemble voting mechanism. Sci. Rep. 13, 3093 (2023).
- 11.Rafiq, H., Aslam, N., Aleem, M., Issac, B. & Randhawa, R. H. Andromalpack: Enhancing the ml-based malware classification by detection and removal of repacked apps for android systems. Sci. Rep. 12, 19534 (2022).
- 12.Casolare, R., Di Giacomo, U., Martinelli, F., Mercaldo, F. & Santone, A. Android collusion detection by means of audio signal analysis with machine learning techniques. Procedia Comput. Sci. 192, 2340–2346 (2021).
- 13.Russell, I. Neural networks module. https://scholar.google.co.jp/citations?view_op=view_citation&hl=zh-TW&user=Oy46FHsAAAAJ&sortby=pubdate&citation_for_view=Oy46FHsAAAAJ:_FxGoFyzp5QC (2012). Accessed on 9 Nov 2024.
- 14.Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).
- 15.Breiman, L. Random forests. Mach. Learn. 45, 5–32. 10.1023/A:1010933404324 (2001).
- 16.Geurts, P., Ernst, D. & Wehenkel, L. Extremely randomized trees. Mach. Learn. 63, 3–42. 10.1007/s10994-006-6226-1 (2006).
- 17.Freund, Y., Schapire, R. & Abe, N. A short introduction to boosting. Japan. Soc. Artif. Intell. 14, 771–780 (1999).
- 18.Chen, T. & Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794 (2016).
- 19.Ke, G. et al. Lightgbm: A highly efficient gradient boosting decision tree. In Guyon, I. et al. (eds.) Advances in Neural Information Processing Systems, vol. 30 (Curran Associates, Inc., 2017).
- 20.Liu, K. et al. A review of android malware detection approaches based on machine learning. IEEE Access 8, 124579–124607. 10.1109/ACCESS.2020.3006143 (2020).
- 21.Senanayake, J., Kalutarage, H. & Al-Kadri, M. O. Android mobile malware detection using machine learning: A systematic review. Electronics 10, 10.3390/electronics10131606 (2021).
- 22.Tchakounté, F., Manfouo, R. E. & Ebongue, J. L. Svdroid: Singular value decomposition with cnn for android malware classification. Int. J. Comput. Digital Syst. 14, 1–1 (2023).
- 23.Tchakounté, F. Permission-based malware detection mechanisms on android: Analysis and perspectives. J. Comput. Sci. Softw. Appl. (2014).
- 24.Aslam, N. et al. Explainable classification model for android malware analysis using api and permission-based features. Comput. Mater. Continua 76 (2023).
- 25.Chrysikos, N., Karampelas, P. & Xylogiannopoulos, K. Permission-based classification of android malware applications using random forest. In ECCWS 2023 22nd European Conference on Cyber Warfare and Security, 1 (Academic Conferences and Publishing Limited, 2023).
- 26.Thangavelooa, R., Jinga, W. W., Lenga, C. K. & Abdullaha, J. Datdroid: Dynamic analysis technique in android malware detection. Int. J. Adv. Sci. Eng. Inf. Technol. 10, 536–541 (2020).
- 27.Millar, S., McLaughlin, N., del Rincon, J. M. & Miller, P. Multi-view deep learning for zero-day android malware detection. J. Inf. Secur. Appl. 58, 102718 (2021).
- 28.Mbungang, B. N. et al. Detecting android malware with convolutional neural networks and Hilbert space-filling curves. SN Comput. Sci. 5, 1–25 (2024).
- 29.Sufatrio, Tan, D. J., Chua, T.-W. & Thing, V. L. Securing android: A survey, taxonomy, and challenges. ACM Comput. Surv. (CSUR) 47, 1–45 (2015).
- 30.Bugiel, S., Davi, L., Dmitrienko, A., Fischer, T. & Sadeghi, A.-R. Xmandroid: A new android evolution to mitigate privilege escalation attacks. Technische Universität Darmstadt, Technical Report TR-2011-04 (2011).
- 31.Ravitch, T., Creswick, E. R., Tomb, A., Foltzer, A., Elliott, T. & Casburn, L. Multi-app security analysis with fuse: Statically detecting android app collusion. In Proceedings of the 4th Program Protection and Reverse Engineering Workshop, 1–10 (2014).
- 32.Li, L. et al. Iccta: Detecting inter-component privacy leaks in android apps. In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering 1, 280–291. 10.1109/ICSE.2015.48 (2015).
- 33.Bhandari, S., Laxmi, V., Zemmari, A. & Gaur, M. S. Intersection automata based model for android application collusion. In 2016 IEEE 30th International Conference on Advanced Information Networking and Applications (AINA), 901–908 (IEEE, 2016).
- 34.Casolare, R., Martinelli, F., Mercaldo, F. & Santone, A. Detecting colluding inter-app communication in mobile environment. Appl. Sci. 10, 8351 (2020).
- 35.Casolare, R., Martinelli, F., Mercaldo, F. & Santone, A. Malicious collusion detection in mobile environment by means of model checking. In 2020 International Joint Conference on Neural Networks (IJCNN), 1–6 (IEEE, 2020).
- 36.Zhang, H., Luo, S., Zhang, Y. & Pan, L. An efficient android malware detection system based on method-level behavioral semantic analysis. IEEE Access 7, 69246–69256 (2019).
- 37.Cai, H., Meng, N., Ryder, B. & Yao, D. Droidcat: Effective android malware detection and categorization via app-level profiling. IEEE Trans. Inf. Forensics Secur. 14, 1455–1470. 10.1109/TIFS.2018.2879302 (2019).
- 38.Asavoae, I. M. et al. Towards automated android app collusion detection. arXiv preprint arXiv:1603.02308 (2016).
- 39.Faiz, M. F. I., Hussain, M. A. & Marchang, N. App-collusion detection using a two-stage classifier. In Web, Artificial Intelligence and Network Applications: Proceedings of the Workshops of the 33rd International Conference on Advanced Information Networking and Applications (WAINA-2019) 33, 702–710 (Springer, 2019).
- 40.Faiz, M. F. I., Hussain, M. A. & Marchang, N. Machine learning based app-collusion detection in smartphones. In 2019 IEEE 23rd International Symposium on Consumer Technologies (ISCT), 134–137. 10.1109/ISCE.2019.8901022 (2019).
- 41.Wang, W. et al. Exploring permission-induced risk in android applications for malicious application detection. IEEE Trans. Inf. Forensics Secur. 9, 1869–1882. 10.1109/TIFS.2014.2353996 (2014).
- 42.Blasco, J. & Chen, T. M. Automated generation of colluding apps for experimental research. J. Comput. Virol. Hack. Tech. 14, 127–138 (2018).
- 43.Faiz, M. F. I. & Hussain, M. A. Hybrid classification model to detect android application-collusion. In 2020 43rd International Conference on Telecommunications and Signal Processing (TSP), 492–495. 10.1109/TSP49548.2020.9163571 (2020).
- 44.Yerima, S. Y. & Sezer, S. Droidfusion: A novel multilevel classifier fusion approach for android malware detection. IEEE Trans. Cybern. 49, 453–466. 10.1109/TCYB.2017.2777960 (2019).
- 45.Iqbal Faiz, M. F., Hussain, M. A. & Marchang, N. Detection of collusive app-pairs using machine learning. In 2018 IEEE International Conference on Consumer Electronics - Asia (ICCE-Asia), 206–212. 10.1109/ICCE-ASIA.2018.8552106 (2018).
- 46.Khokhlov, I., Ligade, N. & Reznik, L. Recurrent neural networks for colluded applications attack detection in android os devices. In 2020 International Joint Conference on Neural Networks (IJCNN), 1–8 (IEEE, 2020).
- 47.Rokach, L. Ensemble methods for classifiers. In Data Mining and Knowledge Discovery Handbook, 957–980 (2005).
- 48.Xia, S. & Yang, Y. A model-free feature selection technique of feature screening and random forest-based recursive feature elimination. Int. J. Intell. Syst. 1, 2400194 (2023).
- 49.Kuhn, M. & Johnson, K. Feature Engineering and Selection: A Practical Approach for Predictive Models (Chapman and Hall/CRC, 2019).
- 50.Idrees, F., Rajarajan, M., Conti, M., Chen, T. M. & Rahulamathavan, Y. Pindroid: A novel android malware detection system using ensemble learning methods. Comput. Secur. 68, 36–46 (2017).
- 51.Feng, P., Ma, J., Sun, C., Xu, X. & Ma, Y. A novel dynamic android malware detection system with ensemble learning. IEEE Access 6, 30996–31011 (2018).
- 52.Zhang, Y., Huang, Q., Ma, X., Yang, Z. & Jiang, J. Using multi-features and ensemble learning method for imbalanced malware classification. In 2016 IEEE Trustcom/BigDataSE/ISPA (2016).
- 53.Virusshare. Virusshare. http://www.virusshare.com (2013). Accessed Aug 2023.
- 54.Allix, K., Bissyandé, T. F., Klein, J. & Le Traon, Y. Androzoo: Collecting millions of android apps for the research community. In Proceedings of the 13th International Conference on Mining Software Repositories, MSR ’16, 468–471. 10.1145/2901739.2903508 (ACM, New York, NY, USA, 2016).
- 55.Virustotal. VirusTotal. https://www.virustotal.com/. Accessed 19 Feb 2024.
- 56.Fritz, C. et al. Highly precise taint analysis for android applications (University of Luxembourg, Tech. Rep., 2013).
- 57.Sotiroudis, S. P., Goudos, S. K. & Siakavara, K. Feature importances: A tool to explain radio propagation and reduce model complexity. Telecom 1, 114–125. 10.3390/telecom1020009 (2020).
- 58.Nasir, M., Javed, A. R., Tariq, M. A., Asim, M. & Baker, T. Feature engineering and deep learning-based intrusion detection framework for securing edge iot. J. Supercomput. 78, 8852–8866. 10.1007/s11227-021-04250-0 (2022).