Abstract
Android devices play a central role in both personal and organizational operations, which has made them a primary target for Advanced Persistent Threats (APTs). Unlike traditional attacks, APT attacks are implemented through multiple covert stages, allowing attackers to remain active on a device while avoiding detection models. Existing studies depend on data that captures only a single stage of an attack or focuses mainly on static features. Consequently, detection models trained on such datasets may fail to detect multi-stage APT attacks in real-world environments. In order to address this gap, this paper introduces DEFEAT, a benchmarking dataset built specifically for detecting APT attacks on Android devices. DEFEAT follows the MITRE ATT&CK framework to more accurately reflect multi-stage APT attacks in real-world environments. The dataset generation process includes three main phases: gathering normal activity, simulating multi-stage APT attacks, and preparing the data. The datasets were collected from a real Android smartphone and are provided in two parts: a resource-usage dataset that tracks CPU, RAM, battery, and network activity; and an app-based dataset that logs permissions, sensors, and services used by apps. The dataset captures the active phase of APT attacks, focusing on observable malicious behavior rather than long-term dormant activity. The requirements of a well-structured dataset have been met in the proposed datasets to ensure they are suitable for use by other researchers. Feature contributions have also been examined using SHAP (SHapley Additive exPlanations) to better understand their role in detecting APTs. In addition, statistical t-test analysis is applied to the resource-usage datasets to verify that the collected behavioral features vary significantly across malware families and attack stages, supporting their suitability for behavior-based APT detection. 
By offering a realistic and publicly accessible representation of multi-stage APTs, DEFEAT addresses an important gap in current Android security research and supports the development of more effective behavioral detection models. The datasets are publicly available and can be reused by other researchers for the tuning, evaluation, and comparison of detection models for multi-stage APT activities on Android devices.
Keywords: Device behavior analysis, Resource usage features, App-based features, Dataset generation, Advanced persistent threat (APT), Android security, Mobile cyberattacks
Specifications Table
| Subject | Computer Sciences |
| Specific subject area | APT Android dataset that includes two parts: resource usage (CPU, RAM, battery, network traffic) and app-based features (permissions, sensors, services). |
| Type of data | Raw: CSV files. Data format: resource-usage features (8 features per frame); app-based features (108 features per frame). |
| Data collection | Datasets were collected using the DEFENSE Android data-collection application deployed on a real smartphone (Android 6). The DEFENSE collector gathers resource-usage features (CPU, RAM, battery activity, and network traffic) and app-based features (permissions, sensors, and services), sends them to the server every 3 seconds, and exports them to CSV. A total of 40 normal apps were used as a baseline, and 36 malware samples, validated via VirusTotal, were used to simulate the APT stages, with a device reset between runs. Each app was executed for 10 minutes, the records were stage-labelled, and the data were stored on the server. |
| Data source location | School of Computer Sciences, Universiti Sains Malaysia, 11800 USM Penang, Malaysia. |
| Data accessibility | Repository name: Mendeley Data [1]. Data identification number (DOI): 10.17632/bdtn9vj7d7.3. Direct URL to data: https://doi.org/10.17632/bdtn9vj7d7.3 |
| Related research article | None |
1. Value of the Data
- Comprehensive behavioral coverage: The DEFEAT datasets bring together both static and dynamic features, from how apps behave to how the device uses its resources and network. By combining these features, they offer a clear, detailed view of how APT attacks influence an Android device at each stage of the attack.
- Active-phase APT behavior: The DEFEAT datasets are designed to capture the active phase of APT attacks, during which malicious actions such as privilege misuse, credential access, and data exfiltration occur. Rather than modelling long-term dormancy or dwell time, the dataset emphasizes observable behavioral deviations that are most relevant for practical detection systems.
- Realistic data collection: The DEFEAT datasets were captured from a physical Android device rather than synthesized with emulator tools. The datasets are therefore more realistic and reliable, reflecting the behavior of an actual end user.
- Explicit multi-stage APT simulation: The DEFEAT datasets carefully simulate how APTs progress step by step, using the MITRE ATT&CK framework. Abnormal activities during the Initial Compromise, Credential Access, and Exfiltration stages are simulated in the resource-usage dataset, while the app-based dataset simulates the attack path of Initial Compromise, Privilege Escalation, and Exfiltration. This structural design gives a solid basis for analyzing multi-stage attack behaviors in detail.
- Broad applicability: These datasets are valuable resources for cybersecurity researchers, data scientists, and system developers seeking to evaluate and benchmark APT detection systems on Android devices.
2. Background
With the widespread adoption of smartphones, Android has become the most widely used operating system globally. However, its open-source nature and broad application ecosystem have made it a prime target for cyberattacks. Among these, Advanced Persistent Threats (APTs) are the most sophisticated and harmful, in which attackers deploy covert malware that operates in multiple stages to evade detection. While existing datasets provide a high-level view of multi-stage APT attacks, their dependency on static features overlooks the dynamic behavioral patterns exhibited by APT attacks. Consequently, detection systems trained on such datasets may fail to detect multi-stage APT attacks in real-world environments. There is therefore a clear need for reference-labelled datasets that capture multi-stage APT activities using representative features for evaluation and comparison purposes. This paper presents DEFEAT, the first labelled multi-stage APT dataset collected from physical Android devices that integrates both static and dynamic features. The DEFEAT datasets consist of two complementary parts: a resource-usage dataset with CPU, RAM, battery, and traffic features, and an app-based dataset with permissions, sensors, and services. DEFEAT focuses on capturing the active phase of APT attacks, during which malicious actions such as privilege misuse, credential access, and data exfiltration occur. While real-world APT campaigns may include extended dormant periods, detection systems typically rely on observable behavioral deviations that arise when malware becomes operational. Accordingly, DEFEAT is designed to benchmark detection performance during these active phases, where measurable device-level and application-level behaviors emerge. These datasets have been rigorously validated and evaluated to ensure their suitability for use by other researchers in tuning, testing, and comparing detection models.
3. Data Description
The DEFEAT datasets are available at [1]. They were generated from a physical Android device and consist of two complementary components: a resource-usage dataset and an app-based dataset. Each dataset comprises three CSV files, each corresponding to a specific attack stage. Both datasets are generated concurrently and follow common APT attack paths, starting with the Initial Compromise stage and ending with the Exfiltration stage [2]. The detailed characteristics of these datasets are as follows:

a. Resource-usage dataset: This dataset consists of 8 features divided into five categories: App-Info, Battery-Info, CPU-Info, RAM-Info, and Traffic. It comprises three CSV files:
- Resource usage - Initial Compromise.csv: This file represents APT activities during the Initial Compromise stage. It contains 12,741 frames, of which 6,209 correspond to abnormal data and 6,532 to normal data.
- Resource usage - Credential Access.csv: This file represents APT activities during the Credential Access stage. It contains 12,761 frames, of which 6,229 correspond to abnormal data and 6,532 to normal data.
- Resource usage - Exfiltration.csv: This file represents APT activities during the Exfiltration stage. It contains 12,833 frames, of which 6,301 correspond to abnormal data and 6,532 to normal data.
Table 1 provides a concise overview of the DEFEAT datasets, including the number of files per dataset, sampling unit, feature groups with their dimensionality, and associated data types for both resource-usage and app-based components.
Table 1.
Summary of DEFEAT dataset components, feature groups, dimensionality, and data types.
| Dataset | No. of files | Sampling unit | Feature Group | Description | No. of Features | Data Type |
|---|---|---|---|---|---|---|
| Resource-usage dataset | 3 CSV files | 1 frame / 3 seconds | App-Info | Application data size (MB) | 1 | Numeric (float) |
| | | | Battery-Info | Battery voltage (V) and temperature (°C) | 2 | Numeric (float) |
| | | | CPU-Info | Device-level CPU usage (%) | 1 | Numeric (float) |
| | | | RAM-Info | Device-level RAM usage (MB) | 1 | Numeric (float) |
| | | | Traffic | RX, TX, and total traffic (MB) | 3 | Numeric (float) |
| | | | Class label | 0 = normal, 1 = abnormal | - | Integer (0/1) |
| | | | Total | | 8 | |
| App-based dataset | 3 CSV files | 1 frame / 3 seconds | Sensors | Camera, GPS, Microphone, Wi-Fi, Bluetooth | 5 | Binary (0/1) |
| | | | Services | SMS, Phone, Contacts, Storage, Calendar | 5 | Binary (0/1) |
| | | | Permissions | Encoded as permission-usage indicators per frame; names include tier weights (e.g., 0.25/0.5/0.75/1.0) indicating permission category/level | 98 | Binary (0/1) |
| | | | Class label | 0 = normal, 1 = abnormal | - | Integer (0/1) |
| | | | Total | | 108 | |
Fig. 1 illustrates the DEFEAT data generation and labelling pipeline, starting from benign and malicious Android applications, followed by behavioral data collection using the DEFENSE collector. The collected data are then organized into two components: resource-usage datasets and app-based datasets. Finally, deterministic labelling is applied, including binary class labels (normal vs. abnormal) and stage labels corresponding to the simulated APT stage.
Fig. 1.
Data generation and labelling pipelines.
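The deterministic labelling step in this pipeline can be sketched in a few lines. This is an illustrative sketch, not the actual pipeline code: the function and field names (`label_frame`, `class`, `stage`) are assumptions, while the label semantics (0 = normal, 1 = abnormal, plus a stage tag for abnormal frames) follow the description above.

```python
# Sketch of deterministic labelling: each 3-second frame receives a binary
# class label and, for frames from a malicious run, the simulated APT stage.
# Names are illustrative, not taken from the DEFENSE collector.

STAGES = ("Initial Compromise", "Credential Access", "Exfiltration")

def label_frame(frame, is_malicious_run, stage=None):
    """Attach class and stage labels to one collected frame."""
    labelled = dict(frame)
    labelled["class"] = 1 if is_malicious_run else 0
    # Normal frames carry no attack stage; abnormal frames are tagged with
    # the stage being simulated when the frame was recorded.
    labelled["stage"] = stage if is_malicious_run else "none"
    return labelled

# Example: one frame recorded during a simulated Credential Access stage.
frame = {"cpu_usage": 0.13, "ram_usage": 0.68, "rx_mb": 2.43, "tx_mb": 17.55}
print(label_frame(frame, True, "Credential Access"))
```

Because labelling is deterministic (it depends only on which run and stage produced the frame), the ground truth is reproducible and does not rely on any detector's output.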
A detailed description of these features is provided in Table 2.

b. App-based dataset: This dataset consists of 108 features covering permissions, sensors, and services. It comprises three CSV files:
- Stg1.initial.csv: This file represents APT activities during the Initial Compromise stage. It contains 12,741 frames, of which 6,209 correspond to abnormal data and 6,532 to normal data.
- Stg2.privilege.csv: This file represents APT activities during the Privilege Escalation stage. It contains 12,761 frames, of which 6,229 correspond to abnormal data and 6,532 to normal data.
- Stg3.exfiltration.csv: This file represents APT activities during the Exfiltration stage. It contains 12,833 frames, of which 6,301 correspond to abnormal data and 6,532 to normal data.
Table 2.
Resource usage features with their description.
| Feature category | Feature name | Description |
|---|---|---|
| App-Info | App-data-size | The data size of normal and abnormal apps, measured in megabytes. |
| Battery-Info | B-temperature | The temperature of the device’s battery, often monitored to prevent overheating. |
| | B-voltage | The voltage level of the device's battery. |
| CPU-Info | CPU-Usage | The amount of processing power currently being used by the device’s central processing unit (CPU). |
| RAM-Info | RAM-Usage | The amount of Random Access Memory (RAM) currently being used by the device to run apps and processes. |
| Traffic | Received Data (RX) | The amount of data downloaded from the internet to the device, measured in megabytes. |
| | Transferred Data (TX) | The amount of data uploaded from the device to the internet, measured in megabytes. |
| | Traffic | The total amount of data traffic, both sent and received, measured in megabytes. |
For both datasets, normal instances (Class 0) were recorded by observing the behavior of 40 legitimate apps for one day, while attack instances (Class 1) were recorded by observing the behavior of 36 malicious apps for three days, one day per attack stage. Each normal or malicious app was executed on the device for 10 minutes, and the data were collected using a dedicated collector app called DEFENSE, which transmits the data every 3 seconds to a remote server.
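A first sanity check on a downloaded file is to confirm the schema and the class balance. The sketch below runs on a tiny in-memory sample that mimics the resource-usage layout (8 features plus a class label); the column names are assumptions for illustration, since the paper describes the feature groups but not the exact CSV headers. The same code applies unchanged to the real files, e.g. "Resource usage - Initial Compromise.csv".

```python
import csv
import io
from collections import Counter

# Toy in-memory CSV mimicking the resource-usage schema: 8 features per
# frame plus the class label. Column names are illustrative assumptions.
sample = io.StringIO(
    "app_data_size,b_temperature,b_voltage,cpu_usage,ram_usage,rx,tx,traffic,class\n"
    "0.176,31.4,3.747,0.11,0.59,67.17,121,188,0\n"
    "0.02,30.4,4.18,0.13,0.68,2.43,17.55,19.98,1\n"
    "0.164,30.1,3.622,0.09,0.63,70.51,141,211,0\n"
)

rows = list(csv.DictReader(sample))
balance = Counter(r["class"] for r in rows)
print(len(rows), dict(balance))  # → 3 {'0': 2, '1': 1}
```

On the real Initial Compromise file, the same count should report 12,741 frames with 6,209 abnormal and 6,532 normal instances.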
Table 3 explains a subset of these features (full feature descriptions are given in Table A1 and Table A2; see also Table 4, Table 5, Table 6).
Table 3.
App permissions, sensors, and services features with their description.
| Features | Description |
|---|---|
| Camera | Ambient sensor; access camera and capture images and video |
| GPS | Positioning sensor; location tracking and transmission of information |
| Microphone | Ambient sensor; access microphone and record audio |
| WIFI | Positioning sensor; location tracking and transmission of information |
| Bluetooth | Positioning sensor; location tracking and transmission of information |
| SMS | Telephony services; access and send messages |
| Phone | Telecommunication; access to telephony functions such as retrieving contact numbers, managing call states, and monitoring device telephony status |
| Contacts | Telecommunication; access contacts and profiles |
| Storage | Utilities; access to external storage |
| Calendar | Utilities; access and modify user calendar data |
| Normal Permissions | |
| access_location_extra_commands | Grants access to advanced location provider commands |
| access_network_state | Grants applications the ability to retrieve information regarding active network connections |
| access_wifi_state | Allows retrieval of Wi-Fi network details, such as SSID, signal strength, and connectivity status |
| bluetooth_admin | Grants apps the ability to discover nearby Bluetooth devices and initiate pairing |
| Dangerous Permissions | |
| access_background_location | Grants apps the ability to access location in the background |
| access_coarse_location | Grants apps the ability to access approximate location |
| access_fine_location | Grants apps the ability to access precise location |
| access_media_location | Grants applications access to stored geographic location data shared by the user or persisted across services |
| activity_recognition | Grants apps the ability to recognize physical activity |
| Signature Permissions | |
| bind_accessibility_service | Required by an Accessibility Service to ensure binding is restricted to the system, protecting against unauthorized access or misuse |
| broadcast_sms | Grants app the ability to broadcast a notification upon receipt of an SMS message |
| capture_audio_output | Grants an app the ability to capture or record audio being played by the device |
| change_component_enabled_state | Grants an app the ability to change whether an app component is enabled or not |
| delete_packages | Grants an app the ability to delete packages |
| Privileged Permissions | |
| battery_stats | Grants an app the ability to collect battery statistics |
| call_privileged | Grants an app the ability to initiate phone calls, including emergency numbers without user interaction or confirmation via the Dialer interface |
| change_configuration | Grants an app the ability to alter system configuration settings |
| get_accounts_privileged | Grants apps access to the list of user accounts registered on the device via the Accounts Service |
| package_usage_stats | Grants an app the ability to collect component usage statistics |
Table 4.
Resource usage features extraction.
| App name | Data Size | Battery Temperature | Battery Voltage | CPU Usage | RAM Usage | RX | TX | Traffic |
|---|---|---|---|---|---|---|---|---|
| Youla | 0.176 | 31.4 | 3.747 | 0.11 | 0.59 | 67.17 | 121 | 188 |
| Gumtree | 0.164 | 30.1 | 3.622 | 0.09 | 0.63 | 70.51 | 141 | 211 |
| memory booster | 0.112 | 33 | 4.029 | 0.12 | 0.58 | 12.49 | 67.27 | 79.76 |
| PhotoWonder | 0.1 | 31.5 | 3.635 | 0.11 | 0.58 | 176 | 137 | 313 |
| Dendroid | 0.02 | 30.4 | 4.18 | 0.13 | 0.68 | 2.43 | 17.55 | 19.98 |
| Setel | 18.86 | 31.9 | 3.677 | 0.34 | 0.75 | 1340 | 386 | 1720 |
| GoodFM | 60.95 | 33.9 | 3.996 | 0.31 | 0.84 | 668 | 58.14 | 727 |
| Messenger | 114 | 32.2 | 3.78 | 0.14 | 0.78 | 718 | 113 | 831 |
| Gumtree | 0.164 | 30.1 | 3.431 | 0.11 | 0.64 | 70.52 | 141 | 211 |
| WEBTOON | 31.35 | 31.4 | 3.575 | 0.38 | 0.81 | 703 | 102 | 805 |
| DramaBox | 29.24 | 32.7 | 3.634 | 0.15 | 0.86 | 815 | 149 | 940 |
| Nobetci eczane | 0.068 | 31.9 | 4.073 | 0.19 | 0.65 | 14.25 | 21.87 | 36.12 |
| xRecorder | 9.36 | 31 | 4.052 | 0.24 | 0.77 | 2280 | 472 | 2740 |
| nobetci eczane | 0.068 | 32 | 4.085 | 0.14 | 0.64 | 14.35 | 23.58 | 37.93 |
| Elmo loves ABCs | 0.156 | 32.7 | 3.744 | 0.27 | 0.78 | 990 | 213 | 1200 |
| MetaMask | 0.088 | 31.6 | 3.832 | 0.05 | 0.61 | 56.58 | 96.03 | 153 |
| KenanganCoffee | 4.45 | 31 | 4.088 | 0.14 | 0.82 | 858 | 175 | 1010 |
| Youla | 0.176 | 31.2 | 3.684 | 0.13 | 0.59 | 67.3 | 124 | 192 |
| Egypt 3D | 0.024 | 32.3 | 3.814 | 0.09 | 0.57 | 103 | 93.35 | 197 |
| Al Jazeera | 6.05 | 31.3 | 3.847 | 0.15 | 0.76 | 2340 | 506 | 2830 |
| GoogleUpdater | 0.092 | 29.8 | 3.97 | 0.15 | 0.68 | 18.78 | 58.59 | 77.37 |
| DramaBox | 29.77 | 32.6 | 3.615 | 0.22 | 0.88 | 821 | 151 | 950 |
| Chrome | 12.54 | 30.5 | 3.66 | 0.11 | 0.78 | 2800 | 540 | 3330 |
| dragon fighter 3d | 3.2 | 31.5 | 3.701 | 0.15 | 0.57 | 128 | 116 | 243 |
| Chrome | 12.55 | 30.6 | 3.675 | 0.09 | 0.8 | 2800 | 540 | 3330 |
| Chrome | 17.18 | 31.5 | 3.674 | 0.21 | 0.82 | 2810 | 548 | 3340 |
| AlfredCamera | 11.31 | 33.3 | 3.789 | 0.29 | 0.83 | 990 | 206 | 1190 |
| Egypt 3D | 0.028 | 32.3 | 3.772 | 0.19 | 0.58 | 104 | 96.52 | 200 |
| Moomoo | 89.23 | 31.2 | 3.711 | 0.21 | 0.75 | 2800 | 531 | 3320 |
Table 5.
Apps sensors and services features extraction.
| App name | camera | contacts | GPS | SMS | PHONE | CALENDER | WIFI | BLUETOOTH |
|---|---|---|---|---|---|---|---|---|
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Dendroid | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
Table 6.
Apps permissions features extraction.
| App name | flashlight 0.25 | access_network_state 0.25 | access_notification_policy 0.25 | access_wifi_state 0.25 | badge_count_read 0.25 | billing 0.25 | bluetooth 0.25 | bluetooth_admin 0.25 | broadcast_badge 0.25 | broadcast_sticky 0.25 |
|---|---|---|---|---|---|---|---|---|---|---|
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Dendroid | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
Fig. 4, Fig. 5, and Fig. 6 illustrate, respectively, the common stages of APT attacks, the workflow used to simulate APT stages in alignment with MITRE ATT&CK, and the SHAP analysis of the resource-usage features.
Fig. 4.
Common stages of APT attacks.
Fig. 5.
Workflow simulation of APT stages aligned with MITRE ATT&CK
Fig. 6.
SHAP analysis for resource usage - Initial Compromise dataset
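The SHAP analysis in Fig. 6 attributes each model prediction to individual resource-usage features. As a self-contained illustration of the underlying idea (not the actual analysis, which applied SHAP to models trained on the full dataset), exact Shapley values can be computed in closed form for a toy linear scorer with independent features: phi_i = w_i * (x_i - mean_i). The feature weights and background means below are invented for the example.

```python
# Toy Shapley attribution for a linear scorer with independent features.
# Weights and background means are hypothetical, chosen for illustration.
weights = {"cpu_usage": 2.0, "tx_mb": 0.5}        # hypothetical model weights
background = {"cpu_usage": 0.12, "tx_mb": 100.0}  # hypothetical feature means
frame = {"cpu_usage": 0.34, "tx_mb": 386.0}       # one abnormal-looking frame

def score(x):
    """Linear scorer: higher score suggests more abnormal behavior."""
    return sum(weights[f] * x[f] for f in weights)

# Exact Shapley values for a linear model: phi_i = w_i * (x_i - E[x_i]).
phi = {f: weights[f] * (frame[f] - background[f]) for f in weights}

# Additivity property: contributions sum to score(frame) - score(background).
print(phi, round(sum(phi.values()), 6) == round(score(frame) - score(background), 6))
```

For tree-based detection models the same additivity holds, but the contributions must be computed with the SHAP algorithm rather than this closed form.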
3.1. Datasets comparison
Table 7 provides a qualitative comparison between the proposed DEFEAT datasets and the most relevant APT-related datasets in the literature. The comparison focuses on five dimensions: dataset focus, feature types, MITRE alignment, multi-stage APT coverage, and online availability. A detailed discussion of these aspects is presented below.
Table 7.
Qualitative comparison between the proposed DEFEAT datasets and the existing APT datasets.
| Reference | Focus | Features | MITRE alignment | Multi-stage coverage | Online availability |
|---|---|---|---|---|---|
| [5,6] | Network-centric | URL | No | No | No |
| [3,4] | Network-centric | DNS logs | No | No | No |
| [9] | App-centric | Permissions, intents and API calls | No | No | No |
| [2,10] | Device + Network-centric | System logs, network traces | No | No | Yes |
| [11] | App + Device-centric | Permissions, activities, intents, services, receivers, system calls | No | No | Yes |
| [8] | App-centric | Binary vectors of TTP and IoC (MITRE based) | Yes | Yes | No |
| [7] | App-centric | Permissions, activities, services, receivers, intents | Yes | Yes | Yes |
| DEFEAT dataset | Device, Network, and App-centric | Resource usage (CPU, RAM, battery, traffic); app sensors, permissions, services | Yes | Yes | Yes |
3.1.1. Dataset focus
Most existing APT datasets generally provide a limited view of APT behavior. Some are network-centric, relying mainly on URL or DNS traffic logs [[3], [4], [5], [6]]. Others capture only application-level behavior, such as permissions and intents [[7], [8], [9]]. A few datasets provide broader coverage by combining device and network logs [2,10], or by integrating device-level and application-level activities [11].
In contrast, the DEFEAT datasets capture device-level resource consumption (CPU, RAM, battery), network-level traffic (RX, TX, and total traffic), and app-level behavior (permissions, services, and sensors). By capturing activity across different parts of the device, DEFEAT more accurately reflects how real APTs operate, enabling the modelling of complex behaviors that emerge during an attack.
3.1.2. Feature types
Most existing datasets rely on either static or dynamic features, but rarely both. Static features such as permissions, intents, and API calls, are common in app-centric datasets [[7], [8], [9]]. Dynamic features such as system logs, DNS logs, and traffic traces, are widely used in network- or device-centric datasets [[2], [3], [4], [5], [6],10]. However, static-only datasets often fail to capture runtime malicious behaviors, while dynamic-only datasets may overlook an app’s inherent risk. Only one study incorporates both static and dynamic features, including permissions, activities, intents, services, receivers, and system calls [11]. Nevertheless, this dataset still lacks comprehensive coverage of device behavior, network traffic, and app-level activities, which may limit its ability to reliably distinguish between benign and APT behaviors.
The DEFEAT datasets integrate both static and dynamic features, including Resource-usage features (CPU, RAM, battery voltage/temperature, traffic), and Application-behavior features (permissions, sensors, services). By combining these complementary feature groups, DEFEAT provides behavioral indicators capable of distinguishing benign processes from multi-stage APT activities.
3.1.3. MITRE alignment
Most existing datasets lack explicit grounding in a standardized threat model. Although some works reference the general APT lifecycle, they do not map their data collection to any formalized adversarial framework. Only two datasets utilize the MITRE ATT&CK framework: a dataset representing binary vectors of TTPs and IoCs to classify Android malware families [8], and a dataset mapping malware to MITRE tactics and techniques based solely on static features [7]. These datasets, while MITRE-aligned, do not incorporate runtime behavior and cannot simulate full APT progression.
The DEFEAT datasets are designed using both static and dynamic features mapped to MITRE tactics, enabling classification of APT stages, Device-level indicators, and App-level behavioral deviations. This design offers a more realistic simulation of APT progression and makes DEFEAT more directly applicable to operational threat-hunting and behavioral detection systems.
3.1.4. Multi-stage APT coverage
Many existing datasets capture only one stage of attack activity, for example initial compromise via phishing or malicious URLs [5,6], C&C operations via DNS logs [3,4], or unspecified or partial stages [2,[9], [10], [11]]. Only two datasets in the literature cover multiple MITRE ATT&CK stages [7,8], and even these do so in a limited and inconsistent way, selecting tactics based on availability rather than modelling an attack chain. In contrast, DEFEAT benchmarks APT stage-level behavior rather than specific threat actors, enabling detection models to learn common behavioral indicators that generalize across malware families and attack stages. The resource-usage dataset captures the path from Initial Compromise to Credential Access and then to Exfiltration, while the app-based dataset follows the path from Initial Compromise to Privilege Escalation and then to Exfiltration. These attack paths reflect the most common stages reported in real-world APT operations and provide a more accurate foundation for analysing multi-stage APT attacks.
3.1.5. Online availability
The analysis shows that only two existing datasets capture APT activities across multiple attack stages [7,8]. Both remain limited because they rely solely on static features and therefore do not describe how APT behavior changes over time on the device. In this study, the DEFEAT datasets combine static features, such as permissions, sensors, and services, with dynamic resource-usage and traffic features observed across several stages of the APT life cycle. Because the datasets are publicly available, other researchers can download them, repeat the experiments, and compare different detection methods under the same conditions. The consistent CSV format also makes it practical to feed the data into machine-learning experiments and security tools, which supports everyday research work and reproducible security studies.
The DEFEAT datasets are meant to be a more comprehensive and realistic choice for Android APT research. They follow the MITRE framework, combine static and dynamic views of the device, and span several stages of the APT attack. Taken together, this gives a clearer picture of attacker behavior and a common basis for evaluating detection methods.
4. Experimental Design, Materials and Methods
This section introduces the DEFEAT datasets. The datasets were created to fill gaps in existing Android APT research and to provide benchmark data for testing multi-stage APT detection models. The datasets include two types of features. The first type is dynamic resource usage features such as CPU usage, RAM usage, battery drain, RX, TX, and total traffic. The second type is app level features such as permissions, sensors, and services. The dataset generation follows three phases, as shown in Fig. 2. First, normal baseline data is collected. Second, a multi-stage APT attack is simulated on Android devices. Finally, the dataset is prepared for modelling and evaluation.
Fig. 2.
Methodology for collecting labelled DEFEAT datasets.
In order to evaluate detection performance, two testing strategies are used: cross-validation testing and supplied-set testing. In the supplied-set test, the datasets are divided into three subsets: 64% is used to train and build the detection model, 16% for internal testing (validation), and 20% for unseen testing, simulating the model’s capability to detect previously unseen multi-stage APT activities.
This testing approach is essential because detection models must detect new or stealthy APT activity that was not present in the training data. Accordingly, the DEFEAT datasets are split into training, internal testing, and unseen testing subsets to support robust and realistic evaluation.
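Under the stated proportions, the supplied-set split can be reproduced with a simple index shuffle. This is a minimal sketch under the assumption of a uniformly random, stratification-free split; the paper does not specify the exact splitting code or random seed.

```python
import random

def split_indices(n, seed=0):
    """Split n frame indices into 64% train, 16% validation, 20% unseen test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed for reproducibility
    n_train = int(0.64 * n)
    n_val = int(0.16 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Example with the Initial Compromise file size reported above (12,741 frames).
train, val, test = split_indices(12741)
print(len(train), len(val), len(test))  # → 8154 2038 2549
```

In practice, a stratified split (preserving the normal/abnormal ratio in each subset) may be preferable given the near-balanced classes.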
4.1. Network setup for data collection
As shown in Fig. 3, the network testbed used for data collection was built around four main components: an Android device running version 6 (Marshmallow) acting as the user or victim, a Kali Linux machine representing the attacker, a wireless Cisco router, and a CentOS 7 server responsible for storing the collected data. The Android device connected to the router over Wi‑Fi, while both the Kali machine and the server were linked through Ethernet.
Fig. 3.
Network setup for data collection.
In order to make sure the data collected was accurate and dependable, both the client and server were configured. On the client side, the Android device ran a custom collector app called DEFENSE, which continuously captured resource-usage and app-based behavioral features and sent the data to the server. Although Android 6 is an older release, the dataset is not intended to model Android-version-specific APIs or permission mechanisms. Instead, it focuses on fundamental device-level and app-based behavioral features that remain consistent across Android versions. These features are governed by the Linux kernel and application execution model, which have remained conceptually stable in modern Android releases. Consequently, the collected data reflect attacker-induced behavioral deviations rather than OS-version-specific indicators. Using Android 6 enables comprehensive, non-rooted access to these behavioral features, allowing accurate ground-truth collection that is increasingly difficult on newer versions due to restrictive privacy controls. This makes the dataset particularly valuable for developing and benchmarking behavior-based and risk-aware detection models that aim to generalize across Android versions rather than depend on OS-specific features.
On the server side, a CentOS 7 machine hosted a MySQL database used to store, organize, and manage all recorded data for further analysis.
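The client-to-server collection loop can be sketched as follows. This is an illustrative stub, not the DEFENSE implementation: the sampler returns fixed values, and `send` stands in for the upload to the MySQL database, whereas the real collector reads live device counters and transmits each frame every 3 seconds.

```python
import time

SAMPLE_INTERVAL_S = 3  # one frame every 3 seconds, as in the real collector

def read_resource_frame():
    """Stub sampler; the real DEFENSE app reads live device counters."""
    return {"app_data_size": 0.176, "b_temperature": 31.4, "b_voltage": 3.747,
            "cpu_usage": 0.11, "ram_usage": 0.59,
            "rx": 67.17, "tx": 121.0, "traffic": 188.0}

def collect(n_frames, send, interval=SAMPLE_INTERVAL_S):
    """Sample n_frames frames at a fixed interval and ship each to the server."""
    for i in range(n_frames):
        frame = read_resource_frame()
        frame["t"] = i * interval    # seconds since collection started
        send(frame)                  # in DEFENSE: insert into the MySQL database
        if i + 1 < n_frames:
            time.sleep(interval)

# A 10-minute app run at 3-second intervals yields 200 frames per app.
print(600 // SAMPLE_INTERVAL_S)  # → 200
```

The fixed sampling interval is what gives each CSV its regular "1 frame / 3 seconds" structure and makes per-app frame counts predictable.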
4.2. Normal data generation
Several tools can be used to generate typical behavior on Android devices. A common practice is to use commercial tools that emulate user activities on the device. Although such tools are effective, they usually do not capture how people actually use their phones in everyday life and lack the naturalistic uncertainty and variability present in real user activity. This can lead to data that appears artificial or biased, reducing its value for building reliable detection models. An alternative is to use data from real-world networks, which reflects real user behaviors and environmental conditions.
Based on these considerations, this research collected normal behavioral data in a real-world laboratory environment at Universiti Sains Malaysia (USM). The normal dataset includes behavioral data from 40 highly rated apps, as summarized in Table 8. Each app was executed for 10 minutes to simulate normal device behavior [12].
Table 8.
Applications and categories used in normal data collection.
| Category | Apps name | Collection time | Date and time |
|---|---|---|---|
| Watch apps | Todo list | 10 mins each app | Monday 12/02/2024 |
| Android Auto | Good FM Dramas | ||
| Art and design | Canva: Design, Photo and video | ||
| Auto and vehicle | Maxim: Bike Taxi, car and Auto | ||
| Beauty | Beauty camera plus: Sweet Cam | ||
| Books and references | Al Quran | ||
| Business | Flyers, poster maker, Design | ||
| Comics | WEBTOON | ||
| Communications | Messenger and Chrome | ||
| Dating | Sexy video call & sexy chat | ||
| Education | Duolingo: Language Lessons | ||
| Entertainment | DramaBox - Stream Drama shorts and YouTube | ||
| Events | Easy Quran Mp3 Audio offline | ||
| Food and drink | Kenangan Coffee | ||
| Games | Crossmath - Math puzzle Games | ||
| Google Cast | | ||
| Health and fitness | Home Workout no equipment | ||
| House and home | Alfred Camera: Home security | ||
| Kids | Elmo loves ABCs | ||
| Lifestyle | Lemon8 - komuniti lifestyle | ||
| Maps and navigation | Setel: Fuel, parking, e-wallet | ||
| Parenting | Asianparent: Kehamilan & bayi | ||
| Personalization | Fonts Keyboards themes & Emoji | ||
| Shopping | IKEA shopping | ||
| Social | Cherry talk - random video chat and TikTok | ||
| Sports | Live football TV HD | ||
| Tools | QR code scanner | ||
| Travel and local | My ride Malaysia's E-Hailing | ||
| Video players and editors | Screen recorder - XRecorder | ||
| Weather | Local weather forecast | ||
| Libraries and demo | Update apps for Android | ||
| Medical | Contour diabetes | ||
| Music and audio | Ringtone maker | ||
| Photography | Hypic | ||
| Productivity | CamScanner | ||
| News and magazines | Podcast player | ||
| Finance | Moomoo | ||
These apps were selected from 37 categories and obtained from trusted sources such as Google Play. Data collection was conducted over one day, on Monday, 12/02/2024 [13], while the DEFENSE app collected and transmitted data every 3 seconds to a remote server. The collected data was stored in CSV format for further analysis. In order to ensure data integrity, background activity on the device was carefully monitored throughout the collection period to confirm the absence of any abnormal behavior. For normal data collection, a total of 6,532 instances were recorded by observing the behavior of 40 legitimate apps.
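The 3-second collection cycle described above can be illustrated with a minimal sketch. This is not the DEFENSE implementation; the feature readers, file name, and values are hypothetical placeholders, and the interval is shortened for demonstration.

```python
import csv
import time

# Hypothetical readers for the monitored resource-usage features; on the
# real device these values would come from the DEFENSE collector app.
def read_features():
    return {"cpu": 12.5, "ram": 1024, "battery": 87, "tx": 2048, "rx": 4096}

def collect(path, interval_s=3, samples=3):
    """Append one feature snapshot to a CSV file every `interval_s` seconds."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["timestamp", *read_features()])
        writer.writeheader()
        for _ in range(samples):
            writer.writerow({"timestamp": time.time(), **read_features()})
            time.sleep(interval_s)

collect("defense_log.csv", interval_s=0.01, samples=3)  # short interval for demo
```

In the experiment the equivalent loop ran with a 3-second interval and transmitted the rows to the remote server rather than a local file.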
4.3. Multi-stage APT generation
Generating a multi-stage APT consists of four steps: Android malware app identification, common APT stage determination, research assumptions, and the simulation process.
4.3.1. Android malware identification
This section presents the identification of 36 Android malware samples listed in the MITRE ATT&CK framework to simulate multi-stage APTs. MITRE is a widely used cybersecurity framework adopted by researchers and analysts to study APT activities and to design countermeasures for both computer and mobile devices [14]. These 36 malware apps were selected for several reasons. First, the malware samples were implemented in multiple stages, as documented in the MITRE ATT&CK framework, which closely reflects the structure of real multi-stage APTs. Second, the behavior exhibited by these samples may resemble that of more recent malware variants. For example, attackers released four increasingly sophisticated versions of ZooPark [3], evolving from simple data-stealing tools into fully featured spyware, with each version built upon the previous one. Third, using well-documented and publicly available malware samples ensures reproducibility and transparency, allowing other researchers to replicate and validate the experimental results.
These identified malware apps were downloaded from the Android malware dataset (CIC-AndMal2017) [15] and GitHub repositories [16]. The apps are categorized into their known malware families, including Adware, SMS malware, Backdoors, and Spyware (see Table 9).
Table 9.
Abnormal malware applications used in the experiment.
| Category | Family | App name |
|---|---|---|
| Adware | Gooligan (6) | Best wallpapers, memory booster, crazy motor, Cargame, HTML5 games, and smarttouch. |
| Kemog (5) | Shareit, 2048kg, privacy guard, magic treasure, and sex academy. | |
| Shuanet (3) | Airdemon, Ninja turtles flapy, and Wild blackjack. | |
| SMS malware | FakeInst (5) | Egypt 3d, Dragon Fighter 3d, Indian game, Photo Wonder, and Zalo. |
| Backdoor | Dendroid | Dendroid |
| Others | Exodus | Smartphone |
| Anubis (3) | Borsa dovis Takip, nobetci eczane, and Doviz. | |
| Henbox | Backup | |
| Stealth Mango | google updater | |
| Zoopark (2) | (All in One) and Iranian app | |
| Bouncing Golf | Kik | |
| Moonkle | ||
| Riltok(3) | Youla, Aviasales, and Gumtree | |
| Xloader | Sex kr porn | |
| Joker/Bread | display camera | |
| Clipper | Metamask |
The malware samples used in this study are not intended to represent specific nation-state threat actors. Instead, they are employed as behavioral proxies to simulate common APT stage characteristics on Android devices. Although some samples belong to adware or SMS-based malware families, they exhibit behaviors that overlap with documented APT tactics, such as establishing background communication channels, maintaining persistent execution, accessing sensitive resources, and exfiltrating data. Accordingly, the objective of the dataset is to benchmark stage-level behavioral patterns, such as stealthy execution during Initial Compromise and increased outbound communication during Exfiltration, rather than to attribute activity to a particular threat group.
These malware apps were validated using VirusTotal [17], a widely recognized threat intelligence platform that provides comprehensive security reports on files, URLs, and IPs. VirusTotal aggregates data from multiple antivirus engines and other security tools, making it a powerful platform for threat analysis. It scans each APK file to determine whether it is malicious. To validate that the malware apps are malicious, each APK file was uploaded to VirusTotal and analyzed. As illustrated in Table 10, the analysis showed that several families are explicitly classified as SMS trojans, aggressive adware, spyware, and banking trojans; this means that both the threat category and the family labels clearly point to how they behave after installation. For example, FakeInst1/2/3/8 and Joker appear with SMS-related tags such as trojansms, smsreg, or smssend, which are used for malware that silently sends or reads SMS and registers the victim to premium services in the background. Gooligan, Kemoge, and Shuanet variants are tagged with combinations such as hiddenads, adware, dropper, ztorg, and ginmaster, and prior analyses show that these families often exploit root, hide as system apps, and continuously download and install new payloads without user interaction.
Table 10.
Verification of Android malware samples using VirusTotal.
| Group ID | Behavioral group | Families | Typical virus total labels |
|---|---|---|---|
| Group1 | SMS-abusing malware (SMS Trojans) | FakeInst1/2/3/8, joker.apk | trojan, pua, adware, smskey, smsreg, smspay, trojansms, smssend. |
| Group2 | Banking Trojans (credential & OTP theft) | Anubis1/3/4, Riltok1/2/3, Xloader1 | trojan, downloader, dropper, banker, bankbot, riltok, wroba. |
| Group3 | Spyware / cyber-espionage (APT-style mobile surveillance) | bouncinggolf.apk HenBox1.apk, Monokle1.apk, Zoopark1/2.apk, StealthMango1.apk | trojan, spyware, spyagent, domestickitten, henbox, monokle, zoopark, infostealer, apaspy. |
| Group4 | Aggressive adware & auto-root droppers (repackaged apps) | Gooligan1/2/3/4/6/7/9, Gplayed.apk, Kemoge3/6/7/9/10, Shuanet1/7/10 | trojan, adware, downloader, dropper, hiddenads, airpush, allad, ztorg, xinyinhe, ginmaster, kemoge, oveead, pluginloader. |
| Group5 | Specialised trojans / loaders & info-stealers | clipper.apk, shuanet1.apk | trojan, clipper, pluginloader. |
The spyware families (BouncingGolf, HenBox, Monokle, Zoopark, StealthMango) are documented mobile APT tools that, once installed, stay for long periods and quietly collect SMS, calls, microphone audio, location and other sensitive data. Finally, Anubis, Riltok and Xloader are well-known Android banking trojans that overlay banking apps and can intercept SMS one-time passwords or other financial data.
4.3.2. Common APT stages
Although the Android malware apps were implanted in different stages (tactics) as documented in the MITRE framework, generalizing the attack stages is critical for detecting multi-stage APTs. These stages represent the attack life cycle, which starts with the Initial Compromise stage and ends with the Exfiltration stage. Based on the analysis (Fig. 4), the three common stages observed in real-world operations are Initial Compromise, Presence Expansion, and Exfiltration [2].
In Initial Compromise, the attackers try to compromise the targeted devices using various attack vectors such as app repackaging. In Presence Expansion, once the attackers successfully compromise the targeted device, they typically work to broaden their access and collect sensitive information such as usernames and passwords. With these privileges, they can reach valuable business data and create persistent backdoors, allowing them to maintain long-term, covert access. In the final stage, attackers extract the stolen information from the compromised devices without detection.
Based on these findings, this research identifies four main stages: Initial Compromise, Privilege Escalation, Credential Access, and Exfiltration. These stages are critical because an attacker must first gain access to the targeted device, then escalate privileges and obtain credentials, which enable lateral movement across the network to compromise additional systems [18]. The final stage involves exfiltrating the sensitive information gathered during the attack.
4.3.3. Research assumption of the APT scenario
This section outlines the research assumptions used to simulate multi-stage APT on Android devices.
a) Attacker scenarios
The simulated attacks in this study focus on observable behavioral effects on the device and apps rather than simulating a specific threat actor. This level of abstraction enables consistent modelling of APT stages across heterogeneous malware families while preserving the essential behavioral indicators relevant for detection. Accordingly, the simulated behavior is organized into a stage-based sequence aligned with the established MITRE framework.
- Initial Compromise: In this stage, the attacker gains a first entry point by tricking the victim into installing a malicious app, often through spear phishing, watering-hole attacks, or other social engineering. This study does not focus on how the app is delivered. Instead, it focuses on what happens after installation by monitoring and recording device and app behavior once the malware is present.
- Presence Expansion: After gaining access, the attacker tries to strengthen control over the device and begin collecting valuable information. The study tracks device behavior by observing which permissions, sensors, and services the malicious apps try to use. Common actions include privilege escalation to obtain higher access and credential access to collect login information such as email accounts and passwords.
- Exfiltration: In the final stage, the attacker sends the collected data out of the device to external servers, often through command-and-control channels or other communication methods. This study records the behaviors linked to data exfiltration performed by malicious apps through a command-and-control channel.
b) User (Victim) scenarios
This research assumes that users are relying on older Android devices and may not be fully aware of the risks of installing apps from third-party sources. Because of this limited security knowledge, they might unintentionally download untrusted or malicious applications, which increases their vulnerability to these types of threats.
4.3.4. Simulation process
After the Android malware apps, common APT stages, and research assumptions were identified, this section presents the simulation process of multi-stage APTs on Android devices. Multi-stage attacks could not be conducted on the real USM network due to the potential risk of affecting network performance and user devices; Android malware can spread and compromise other devices within poorly protected environments. As such, the proposed multi-stage APT datasets were generated within an isolated and controlled environment.
In order to simulate a realistic multi-stage attack, a reverse TCP payload (e.g., android/meterpreter/reverse_tcp) was injected into each malware app using the msfvenom tool within the Metasploit framework in Kali Linux. The generated payload was hosted on an Apache server and linked to a listener created in msfconsole. This setup let the malware run safely inside a controlled environment, where its actions could be observed and recorded in detail. The experiment followed the same sequence of stages seen in real APT operations: Initial Compromise, Presence Expansion, and Data Exfiltration [2]. These stages are critical because an attacker typically starts by getting a foothold on the device, then attempts to increase privileges and gather credentials. With those credentials, the attacker can move laterally across the network and begin targeting other systems [18]. The final stage involves exfiltrating sensitive information from the targeted systems. Table 11 shows the commands performed in simulating each attack stage (Initial Compromise, Presence Expansion, and Exfiltration) to establish the connection between the client (Android device) and the C&C server.
Table 11.
Performed commands on the client (victim) and C&C server side.
| S. no. | Activity | Device | Description |
|---|---|---|---|
| 1. | Reverse TCP payload injection | Client (Android device) | sudo msfvenom -x app-name.apk -p android/meterpreter/reverse_tcp LHOST= server-IP-address LPORT=server-port-number -o output.apk |
| 2. | Establish the connection | C&C Server | |
| 3. | Initial Compromise | Client (Android device) | Collecting data without triggering any activity |
| 4. | Presence Expansion | Client (Android device) | Escalate privileges and gather credentials |
| 5. | Exfiltration | Client (Android device) | Exfiltrate sensitive data to C&C server |
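The injection command in Table 11 can be parameterized per sample. The helper below is an illustrative sketch, not part of the actual toolchain; the APK name, host address, and port are placeholders.

```python
def msfvenom_cmd(apk, lhost, lport, out):
    """Compose the reverse-TCP payload injection command from Table 11.
    All file names and addresses here are placeholders for illustration."""
    return (
        f"sudo msfvenom -x {apk} -p android/meterpreter/reverse_tcp "
        f"LHOST={lhost} LPORT={lport} -o {out}"
    )

# Example with placeholder values (192.0.2.10 is a documentation address).
cmd = msfvenom_cmd("app-name.apk", "192.0.2.10", 4444, "output.apk")
print(cmd)
```

In the experiment, the resulting APK was hosted on the Apache server and paired with an msfconsole listener on the C&C side.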
The resulting output comprised two datasets across two attack-path scenarios: the first dataset follows the attack path Initial Compromise → Credential Access → Exfiltration and is based on resource-usage features, while the second dataset follows the attack path Initial Compromise → Privilege Escalation → Exfiltration and uses app permission, sensor, and service features.
Fig. 5 illustrates how the generalized APT stages are mapped to the MITRE ATT&CK framework, how the Android malware samples were simulated, and the resulting datasets. Both scenarios generate both datasets concurrently. This workflow was implemented as part of the study and is based on an ATT&CK-aligned simulation of multi-stage Android threats.
Each abnormal app was executed for 10 minutes, and the device was reset between executions to ensure a clean testbed environment. This execution time is sufficient to capture behavioral changes across stages, as supported by a prior study [12]. While 10 minutes is sufficient to capture the active behavioral manifestations of these specific attack stages, it does not capture the dormancy periods typical of long-term APT campaigns. The DEFENSE app continuously monitored and transmitted behavioral data every 3 seconds. As illustrated in Table 12, this research collected the multi-stage APT datasets over three consecutive days, one day for each stage, following prior research [13] that simulated a multi-stage APT dataset on computer devices.
Table 12.
Collection details of each APT stage.
| APT stages | No. of abnormal apps | Collection time | Date and time |
|---|---|---|---|
| Initial Compromise | 36 | 10 mins every app | Tuesday 13/02/2024 |
| Presence Expansion | 36 | 10 mins every app | Wednesday 14/02/2024 |
| Exfiltration | 36 | 10 mins every app | Thursday 15/02/2024 |
This variety of malicious apps implemented during multiple attack stages was designed to offer broad and realistic coverage of potential multi-stage APT activities. Capturing this range of behavior is important for building an effective detection approach. During abnormal data collection, a total of 6,209, 6,229, and 6,301 instances were recorded by observing the behavior of 36 malicious apps across the Initial Compromise, Presence Expansion, and Exfiltration stages, respectively.
4.4. DEFEAT datasets preparation
This section presents the preparation of the DEFEAT datasets: (1) the resource-usage dataset and (2) the app permissions, sensors, and services dataset, which were generated in the previous sections. The resource-usage dataset is prepared to include the 8 selected features (see Table 1), excluding any non-qualified feature values. Similarly, the app permissions, sensors, and services dataset is prepared to include 108 features (see Table 2). Both datasets undergo preprocessing steps before they are ready to be used with classification models.
4.4.1. Labelling
The dataset is an important part of evaluating detection systems that depend on labelled data. The app collector (DEFENSE) tool was used to capture and send the data to the server, where it is saved as a comma-separated values (CSV) file. The collected datasets are labelled by adding a class label (normal, abnormal) to each record as follows: the CSV file records all data of both activities, and each activity row is then deterministically labelled based on the author's knowledge of the attacks' nature, the simulation time, and the nature of the triggered features.
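Because the label is derived deterministically from the simulation schedule, the labelling step reduces to a timestamp check. The sketch below assumes a hypothetical single attack window and column names; the real schedule spans the windows in Table 12.

```python
import pandas as pd

# Hypothetical attack window (seconds); in the experiment the windows come
# from the known simulation schedule of each stage (Table 12).
ATTACK_START, ATTACK_END = 100.0, 200.0

def label_rows(df):
    """Assign the class label per record: 1 (abnormal) if the snapshot was
    captured inside a simulated attack window, otherwise 0 (normal)."""
    df = df.copy()
    df["class"] = df["timestamp"].between(ATTACK_START, ATTACK_END).astype(int)
    return df

# Toy records: one before, one during, one after the attack window.
df = pd.DataFrame({"timestamp": [50.0, 150.0, 250.0], "cpu": [10, 35, 12]})
labelled = label_rows(df)
```

A spot check of a few labelled rows against the schedule, as described above, is enough to validate this mapping.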
4.4.2. Normalization
Normalization is a necessary process to complete the preparation of the datasets before applying any classification models. Non-numeric attributes are replaced with distinct numeric values (nominal values) to enable proper handling and improve classification performance. Moreover, several features exhibit large value ranges, which could skew the model’s results. Therefore, normalization is applied to ensure that all features contribute equally to the learning process.
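The two normalization steps described above, nominal encoding of non-numeric attributes followed by range scaling, can be sketched as follows. This is an illustrative min-max implementation with made-up column names, not the exact pipeline used in the study.

```python
import pandas as pd

def normalize(df):
    """Encode non-numeric columns as nominal codes, then min-max scale every
    feature column into [0, 1]; the class label is left untouched."""
    df = df.copy()
    for col in df.columns:
        if col == "class":
            continue
        if not pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].astype("category").cat.codes  # nominal encoding
        lo, hi = df[col].min(), df[col].max()
        df[col] = 0.0 if hi == lo else (df[col] - lo) / (hi - lo)
    return df

# Toy frame: one nominal feature, one wide-range numeric feature.
df = pd.DataFrame({"net_state": ["wifi", "mobile", "wifi"],
                   "tx": [0, 500, 1000],
                   "class": [0, 1, 0]})
norm = normalize(df)
```

Min-max scaling keeps every feature on an equal footing so that large-range features such as traffic counters do not dominate the learning process.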
4.4.3. Balancing
The final step in dataset preparation involves balancing the two class labels: normal and abnormal. The Synthetic Minority Oversampling Technique (SMOTE) is one of the most widely used methods for addressing class imbalance in the literature [19,20]. SMOTE balances the dataset by synthetically generating new instances based on the known distribution of existing data. This process improves the generalization capacity of any applied classifier.
In this study, normal instances were increased in the training dataset using SMOTE to simulate real-world scenarios, as multi-stage APT activities rarely occur compared to normal activities. As shown in Table 13, the SMOTE technique (implemented in Python) successfully balanced the dataset between the two class labels, resulting in 52% representation for the normal class. As a result of the dataset preparation process, two balanced, normalized, and labelled DEFEAT datasets were created. These datasets are designed for training and testing detection systems aimed at detecting multi-stage APT attacks on Android devices. After normalization, all features were converted into numeric data types, making them ready for use with machine learning classifiers. Since the targeted threats are multi-stage APTs, the class label is treated as a nominal attribute with two possible values (normal and abnormal). Thus, the datasets are considered binary classification datasets.
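The core SMOTE idea, interpolating between a minority point and a minority neighbour, can be shown in a few lines. This is a minimal pedagogical sketch in NumPy, not the library implementation (in practice a package such as imbalanced-learn would be used), and the points are toy data.

```python
import numpy as np

def smote_like(minority, n_new, rng=np.random.default_rng(0)):
    """Minimal SMOTE-style oversampling sketch: for each synthetic sample,
    pick a minority point and interpolate toward its nearest minority
    neighbour by a random fraction."""
    out = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        d = np.linalg.norm(minority - x, axis=1)
        d[i] = np.inf                     # exclude the point itself
        nn = minority[np.argmin(d)]       # nearest minority neighbour
        out.append(x + rng.random() * (nn - x))
    return np.vstack(out)

# Toy minority class inside the unit square.
minority = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
synthetic = smote_like(minority, n_new=4)
```

Because each synthetic point is a convex combination of two existing minority points, the new samples stay inside the region already covered by the class, which is why SMOTE avoids the information loss of undersampling.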
Table 13.
Specifications of the generated DEFEAT-based training and testing datasets.
| Dataset | Total data | Training dataset – before resampling (80%) | | Training dataset – after resampling | | Test dataset (20%) | |
|---|---|---|---|---|---|---|---|
| | | Normal | Attack | Normal | Attack | Normal | Attack |
| Initial Compromise | 12,741 | 5225 | 4967 | 5380 | 4967 | 1307 | 1242 |
| Presence Expansion | 12,761 | 5225 | 4983 | 5398 | 4983 | 1307 | 1246 |
| Exfiltration | 12,833 | 5225 | 5041 | 5460 | 5041 | 1307 | 1260 |
5. DEFEAT datasets validation and evaluation
In order to verify the worthiness of the proposed datasets, their characteristics are analysed against the five requirements of a good dataset. In addition to fulfilling these five quality criteria, the datasets were evaluated using six different classifiers to demonstrate their applicability with existing detection models.
5.1. DEFEAT datasets validation
This section analyses the validation of the proposed DEFEAT datasets against the five requirements of a good dataset [19]. The analysis below validates the extent to which the proposed datasets satisfy these criteria.
5.1.1. Realistic data
This requirement is essential for capturing the user interactions and responses on Android devices. In this study, it was satisfied by collecting real behavioral data from a physical Android device, including: (1) resource-usage, and (2) app permissions, sensors, and services. All data was gathered from a physical device, rather than using emulators or sandbox environments. The collection process spanned four days, allowing the datasets to reflect common multi-stage APT attack paths without applying sampling techniques that might remove important information. Normal behavior was continuously recorded using the DEFENSE app, while the multi-stage APT scenarios were carried out within an isolated network environment to ensure that malware did not propagate outside the controlled setup.
5.1.2. Scenarios diversity
Multi-stage APT attacks are triggered over multiple stages. In this study, the attack path was simulated using three main stages: Initial Compromise, Presence Expansion, and Exfiltration. The DEFEAT datasets were collected over four working days through controlled experiments on a real Android device, where both device and application behavior were monitored. The data collection followed a clear schedule. On the first day, normal baseline activity was recorded using 40 highly rated benign apps. Over the next three days, the multi-stage APT path was simulated using 36 malicious apps, with each stage carried out based on its description in the MITRE framework. This setup, which combines multiple stages and both benign and malicious apps, helps capture the wide range of behaviors needed to support effective detection.
5.1.3. Completed and correct labelling
Correct labelling is important when preparing a dataset for the detection process. In this study, every record, whether normal or abnormal, was labelled manually. The labels were assigned on the basis of the author's knowledge, the simulation time, and the monitored feature behavior. For both datasets, normal behavior was labelled as Class 0 and abnormal behavior as Class 1. This consistent labelling method supports model training by clearly differentiating between normal and attack activity.
Each label corresponds to a single recorded instance, where every row represents a snapshot of the device’s behavior at that moment. This ensured that normal entries truly came from everyday, harmless use, while all abnormal entries were taken strictly from periods when the simulated attacks were running. To make sure the labels were correct, a number of samples from both classes were checked manually. These spot checks helped verify that the recorded behavior matched what would reasonably be expected in either a normal or an attack scenario.
5.1.4. Sufficient size
Although having a large dataset is always beneficial, what matters most is making sure that each class is represented with a reasonable number of instances. When one class dominates the dataset, the classifier tends to learn it, which leads to biased learning and poor performance on the minority class [19]. A more balanced distribution gives the model a fair chance to learn the behavior of both normal and abnormal activities.
While both oversampling and undersampling are commonly used to address imbalanced datasets, this study avoided undersampling to prevent losing useful information. Instead, it used SMOTE, which generates new samples for the minority class by learning from the patterns already present. This method helped achieve a more even distribution without losing any of the original data. As a result, the training set became more representative and helped the classifier perform better when dealing with new, unseen behavior.
5.1.5. Representative features
This requirement ensures that the features included in the DEFEAT datasets are effective for validating security models. As discussed in Section 4, two kinds of datasets were created: a resource-usage dataset with 8 features, as shown in Table 1, and an app-based dataset with 108 features capturing sensors, permissions, and services, as shown in Table 2. Each feature was selected based on its effectiveness in detecting traditional and APT attacks. Any feature that lacked relevance or showed inconsistent behavior during preliminary checks was removed to avoid adding noise or misleading patterns. After finalizing the feature sets, several machine learning classifiers were applied to test whether these features could be effective in detecting APT activities.
Two different testing approaches were applied to evaluate the robustness of the features and to verify the applicability of the datasets for multi-stage APT detection: cross-validation, which measures performance across repeated data splits, and a supplied test set, which measures performance on unseen data. After the final feature sets were confirmed, multiple machine-learning classifiers were trained and tested under both settings to verify that these features can support APT detection in practice.
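The two evaluation settings can be sketched with scikit-learn. The feature matrix below is a synthetic stand-in with a made-up toy label, not the DEFEAT data, and the classifier choice is illustrative.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a DEFEAT feature matrix (8 features, as in the
# resource-usage dataset); the label rule here is purely illustrative.
rng = np.random.default_rng(0)
X = rng.random((200, 8))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

clf = DecisionTreeClassifier(random_state=0)

# 1) Cross-validation: performance across repeated data splits.
cv_scores = cross_val_score(clf, X, y, cv=5)

# 2) Supplied test set: hold out 20% as unseen data, mirroring Table 13.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
holdout = clf.fit(X_tr, y_tr).score(X_te, y_te)
```

Agreement between the cross-validation scores and the held-out score is one practical indicator that the feature set generalizes rather than overfits a particular split.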
5.2. SHAP analysis
In this section, SHAP analysis is applied to both datasets, the resource-usage and the application-based datasets, to show the contribution of each feature to detecting APT activities across the three stages using SHAP beeswarm plots.
5.2.1. SHAP analysis for resource usage features
As illustrated in Fig. 6, Fig. 7, and Fig. 8, SHAP beeswarm plots are used to identify the most contributing features for detecting APTs across three stages: Initial Compromise, Credential Access, and Exfiltration. Each dot represents an instance, positioned on the x-axis according to its SHAP value, with color indicating the feature's original value: red for high values and blue for low values.
Fig. 6.
SHAP analysis for resource usage - Initial Compromise dataset
Fig. 7.
SHAP analysis for resource usage - Credential Access dataset
Fig. 8.
SHAP analysis for resource usage - Exfiltration dataset
The SHAP analysis of the Initial Compromise dataset (see Fig. 6) shows that most features exhibit a negative SHAP impact when their values are high (i.e., red points appear predominantly on the left side of the SHAP axis). This means that high feature values generally push the model toward detecting the “normal” class. Conversely, low feature values (blue points) often appear on the right side of the plot, meaning they push the model toward detecting the “attack” class. This matches typical APT behavior during the Initial Compromise stage, as the attackers try to stay hidden and avoid noticeable activity on the device. The results therefore show that low resource-usage values act as an early warning sign, while higher values often look more like normal activity and reduce the chance of predicting an attack. The SHAP analysis for the Credential Access dataset (Fig. 7) shows a different pattern: here the attacker actively tries to obtain sensitive information such as account credentials. The model identifies TX (outbound traffic) as the strongest signal; when TX is high (red points on the positive side of the SHAP axis), it pushes predictions toward the “attack” class. This suggests that increased outbound communication is a key sign of credential-access activity at this stage.
In contrast, some other features, such as RX and total traffic, have SHAP values that are mixed or close to zero. This suggests that incoming traffic and overall network use stay fairly stable, so they do not clearly separate normal behavior from malicious behavior at this stage. The same pattern appears for RAM usage, app data size, CPU usage, and battery-related features: their SHAP effects are mostly negative or near zero. In other words, when these values increase, the model is more likely to predict “normal”, which suggests the attacker is still trying to keep activity low and less noticeable at this point. Overall, the SHAP patterns highlight that the primary behavioral shift in the Credential Access stage is increased outbound data transmission, while most other device-level features still resemble normal activity. This aligns with the expected nature of credential-access malware, which typically extracts and sends sensitive information without yet causing heavy local resource usage.
Finally, the SHAP analysis of the Exfiltration dataset (see Fig. 8) reveals the strongest behavioral shift among the three APT stages. Unlike the Initial Compromise and Credential Access stages, where the attacker remains relatively quiet, exfiltration involves data transfer as the attacker sends stolen information outside the targeted device. This behavioral change is clearly reflected in the SHAP patterns.
In this stage, RX, TX, and overall Traffic all show strong positive SHAP contributions. In the figure, high values of these features (indicated by red points) appear predominantly on the right side of the SHAP axis. This means that high traffic activity pushes the model toward detecting the “attack” class. This pattern fits what usually happens during the exfiltration stage, where the malware’s main effort goes into exfiltrating stolen data from the targeted device. As a result, network activity becomes the clearest indicator of malicious behavior. In contrast, features such as RAM usage, app data size, battery temperature or voltage, and CPU usage have only a minor impact. When these values increase, they tend to produce neutral or slightly negative SHAP effects, meaning they do not clearly separate an exfiltration attempt from normal device activity. This shows that even at the final stage of the attack, the malware tries to avoid drawing attention by keeping its local resource usage low and focusing instead on its network operations. Overall, the SHAP results show that increased network activity is the dominant indicator of exfiltration, with RX, TX, and total Traffic emerging as the most reliable predictors for detecting this final and most active stage of the APT lifecycle.
5.2.2. SHAP analysis for app features
For clarity and comparability across stages, only the top 20 features ranked by mean absolute SHAP value are displayed in each beeswarm plot. This focuses the interpretation on the most important permissions, sensors, and services that drive the model’s decisions, while less informative features remain in the long tail and are not shown. In all SHAP beeswarm plots, each point represents one instance from the dataset. The horizontal axis shows the SHAP value of a feature, which indicates how much that feature pushes the model output away from the baseline prediction. Negative values on the left side move the prediction toward normal, while positive values on the right side move it toward attack. Features are sorted from top to bottom by their mean absolute SHAP value, so the top rows correspond to the globally most influential features. The colour of each point indicates the original feature value, with red indicating high values and blue indicating low values.
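The top-20 selection described above is a simple ranking over the SHAP value matrix. The sketch below shows the ranking rule with a toy SHAP matrix and invented feature names; it is not the output of the actual models.

```python
import numpy as np

def top_k_features(shap_values, names, k=20):
    """Rank features by mean absolute SHAP value (the ordering used in the
    beeswarm plots) and return the names of the top-k features."""
    importance = np.abs(shap_values).mean(axis=0)
    order = np.argsort(importance)[::-1]   # descending importance
    return [names[i] for i in order[:k]]

# Toy SHAP matrix: 4 instances x 3 features (values are illustrative).
sv = np.array([[0.1, -0.9, 0.0],
               [0.2,  0.8, 0.0],
               [-0.1, -0.7, 0.1],
               [0.0,  0.9, -0.1]])
names = ["cpu", "send_sms", "ram"]
top = top_k_features(sv, names, k=2)  # → ["send_sms", "cpu"]
```

Because the sign of individual SHAP values cancels out across instances, the mean of absolute values is used so that features with strong effects in either direction rank highly.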
The SHAP analysis of the Initial Compromise dataset, as indicated in Fig. 9, shows that SMS and telephony permissions such as read_sms 0.5, send_sms 0.5, call_phone 0.5, and the SMS group have high values (red points) on the right side of the SHAP axis, while low values (blue points) lie mainly on the left. This means that activating these capabilities pushes the model toward the attack class. Additional SMS-related features such as broadcast_sms 0.75, write_sms 0.5, and receive_sms 0.5, together with read_call_log 0.5 and write_call_log 0.5, also contribute positively when enabled, but on a smaller SHAP scale, indicating that the malware is already preparing the SMS and call channel but only to a moderate degree in this stage.
Fig. 9.
SHAP analysis for application-based - Initial Compromise dataset
In contrast, several features exhibit an inverse relationship with the attack class. For foreground_service 0.25 and bind_get_install_referrer_service 0.75, high values (red points) are concentrated on the left side of the SHAP axis and low values (blue points) on the right. This means that enabling these capabilities tends to pull the prediction toward the normal class, whereas their absence slightly increases the probability of an APT.
In the Privilege Escalation dataset (Fig. 10), the SHAP ranking is still dominated by SMS and call features: permissions such as SMS, write_call_log 0.5, and send_sms 0.5 show high values mainly on the right of the axis and low values on the left, meaning that frequent SMS use and call-log access push the model toward the attack class, while their absence supports normal behavior. System-level control features such as request_ignore_battery_optimizations 0.25, write_settings 0.75, and delete_packages 0.75 also have red points concentrated on the right, indicating that bypassing power management, changing device settings, and silently removing packages all increase the likelihood of an APT decision.
Fig. 10.
SHAP analysis for application-based – Privilege Escalation dataset
In contrast, bind_get_install_referrer_service 0.75, billing 0.25, and foreground_service 0.25 also exhibit an inverse pattern in this stage: high values are mostly on the left and low values on the right. This shows that active use of the install-referrer API, in-app billing, and user-visible foreground services is more typical of benign apps, so their presence pulls predictions toward the normal class, whereas their absence, in combination with offensive SMS and call behavior, increases the probability of an APT. The microphone feature appears with moderate positive SHAP values (red points on the right), marking Stage 2 as the first point where a sensor becomes clearly involved. Overall, the Stage-2 pattern indicates that the attacker strengthens communication control, gains broader system privileges, and begins microphone-based sensing, which together characterize the Privilege Escalation stage.
The SHAP analysis of the Exfiltration dataset (Fig. 11) shows the strongest concentration of high-impact app features. The SMS group, send_sms 0.5, write_sms 0.5, and read_sms 0.5 have high values clearly clustered on the right of the SHAP axis and low values on the left, confirming that intensive SMS manipulation is the main driver of attack predictions in this stage.
Fig. 11.
SHAP analysis for application-based - Exfiltration dataset
Permissions related to contacts and calls also become dominant. High values of read_contacts 0.5, write_contacts 0.5, read_call_log 0.5, download_without_notification 0.5, use_credentials 0.5 and write_settings 0.75 lie mostly on the right, indicating that accessing or modifying contacts, inspecting call logs, performing silent downloads and using stored credentials are strongly associated with exfiltration behavior. Meanwhile, Persistence related features like receive_boot_completed 0.25 retains a positive SHAP impact, showing that the malware continues to auto start while data are being stolen.
An important observation is that write_external_storage 0.5 shows an inverse relationship with the attack class in the Exfiltration stage: high values appear predominantly on the left of the SHAP axis, while low values are more concentrated on the right. This indicates that intensive writing to external storage is more typical of benign apps, whereas the APT samples mainly exfiltrate data directly via SMS, contact access, and network channels without leaving large files on shared storage. This pattern is consistent with APT and spyware campaigns that aim to minimize their on-disk footprint and avoid user-visible traces by streaming data out in memory or over the network rather than staging it on external storage. Meanwhile, as shown in Fig. 9, Fig. 10, and Fig. 11, the top-ranked SHAP features are dominated by permissions with a weight of 0.5, corresponding to Android’s Dangerous level (for example, read_sms 0.5, send_sms 0.5, read_call_log 0.5, write_call_log 0.5, read_contacts 0.5, write_contacts 0.5, write_external_storage 0.5). This aligns with Android’s definition of Dangerous permissions as those that grant access to private user data or sensitive operations such as SMS, call logs, contacts, and external storage access. The relatively smaller contribution of Normal (0.25) and high-tier Signature/Privileged (0.75/1.0) permissions among the top features indicates that the APT attack mainly exploits realistic permissions that are powerful enough to expose sensitive content on the mobile device.
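The protection-level weighting used throughout these datasets (0.25 Normal, 0.5 Dangerous, 0.75 Signature, 1.0 Privileged) can be expressed as a simple encoding. The mapping below is a sketch consistent with the weights reported in the text; the helper name and dictionary are illustrative, not the authors' exact implementation:

```python
# Hypothetical encoding of Android permission protection levels into the
# numeric feature weights used in the app-based dataset.
PROTECTION_WEIGHT = {
    "normal": 0.25,      # e.g., internet, vibrate
    "dangerous": 0.50,   # e.g., read_sms, read_contacts
    "signature": 0.75,   # e.g., write_settings, install_packages
    "privileged": 1.00,  # e.g., reboot, write_secure_settings
}

def encode_permission(requested: bool, level: str) -> float:
    """Weight a requested permission by its protection level; 0.0 if absent."""
    return PROTECTION_WEIGHT[level] if requested else 0.0

print(encode_permission(True, "dangerous"))   # read_sms-style feature -> 0.5
```

Under this encoding, the SHAP plots' "high values (red points)" for a feature like read_sms 0.5 simply mean the permission is present in the app.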
5.3. t-test statistical analysis of malware behavior across APT stages
The purpose of this analysis is not only to demonstrate behavioral variability across malware families, but also to confirm that each APT stage exhibits statistically distinct behavioral patterns. This supports the validity of stage-based APT modelling independent of the specific attack tools or communication payloads used. Accordingly, this section examines whether the collected resource-usage features capture intrinsic malware behavior and stage progression, or whether they merely capture the behavior associated with a common command-and-control payload. Although the same Meterpreter reverse TCP payload was employed to establish command-and-control communication throughout all simulated APT stages (Initial Compromise, Credential Access, and Exfiltration), the analysis is designed to assess whether meaningful behavioral distinctions remain observable beyond this shared communication channel.
To investigate this, a statistical t-test analysis was conducted on key resource-usage features, including CPU usage, RAM usage, battery voltage and temperature, application data size, RX, TX, and total network traffic. The analysis focused on representative samples from two distinct malware families, namely Gooligan and SMS-based malware, across all three APT stages. As illustrated in Fig. 12, Fig. 13, most features exhibit statistically significant differences between malware families (p<0.0001), despite the use of an identical payload configuration.
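As an illustration of this procedure, an independent two-sample t-test on a single resource-usage feature can be run with scipy. The samples below are synthetic stand-ins for the two malware families, not the actual DEFEAT traces, and Welch's variant (`equal_var=False`) is used here as one reasonable choice when variances may differ:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic CPU-usage samples (%) standing in for two malware families.
gooligan_cpu = rng.normal(loc=22.0, scale=2.0, size=200)
sms_malware_cpu = rng.normal(loc=35.0, scale=3.0, size=200)

# Welch's independent two-sample t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(gooligan_cpu, sms_malware_cpu,
                                  equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```

A p-value below the chosen significance threshold (the paper reports p < 0.0001 for most features) indicates that the feature's mean differs significantly between the two families.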
Fig. 12.
Test analysis for CPU, RAM, and Battery usage features of two malware families
Fig. 13.
Test analysis for RX, TX, and traffic features of two malware families
These results provide strong evidence that the collected behavioral features are not payload-driven. The Meterpreter payload primarily serves as a communication mechanism to enable remote interaction with the compromised device. In contrast, resource-consumption patterns are governed by the internal functionality, background processes, and operational goals of each malware family. Consequently, even when a common payload is used, different malware families generate distinct and measurable behavioral footprints. In addition, the results confirm that behaviors associated with later APT stages, particularly Exfiltration, are statistically distinguishable from those observed during Initial Compromise. This stage-level separation validates the dataset’s alignment with multi-stage APT behavior rather than malware-family-specific characteristics alone.
Moreover, the observed variability in features such as RX, TX, total traffic, CPU usage, and application data size reflects both malware-specific behavior and stage-specific objectives. During an earlier stage, malware tends to operate stealthily with limited resource usage, while later stages (Credential Access and Exfiltration) exhibit increased communication and data-transfer activity. The presence of wider error bars in several features further indicates natural variability arising from the heterogeneous nature of the malware samples, which differ substantially in functionality, execution flow, and resource demands.
Overall, this analysis confirms that the DEFEAT datasets capture meaningful, malware-specific, and stage-dependent behavioral characteristics rather than indicators of a shared payload.
5.4. DEFEAT datasets evaluation
In this section, six AI classification techniques are applied to evaluate the proposed DEFEAT datasets using two different testing approaches. The first approach is cross-validation testing, where the classifiers are trained on a portion of the dataset and tested on the remaining portion. The second approach is the supplied test set, where classifiers are trained on a training dataset and tested on a previously unseen dataset. These classifiers are used in their standard forms without any enhancement or parameter tuning, as the main aim is to demonstrate the reliability and trustworthiness of the proposed datasets for evaluating multi-stage APT detection. Additionally, this evaluation aims to confirm that the proposed resource-usage features and app permissions, sensors, and services features are capable of distinguishing between multi-stage APT activities and normal activities.
Three evaluation metrics are used to evaluate the effectiveness of the datasets and their features: detection accuracy, FPR, and FNR. Detection accuracy measures the percentage of correctly classified normal and attack records, and is calculated as shown in Eq. (1):
Accuracy = (TP + TN) / (TP + TN + FP + FN)  (1)
False positive rate (FPR) measures the percentage of normal records incorrectly classified as attacks. It is calculated in Eq. (2):
FPR = FP / (FP + TN)  (2)
False negative rate (FNR) measures the percentage of attack records incorrectly classified as normal. It is calculated in Eq. (3):
FNR = FN / (FN + TP)  (3)
In the above equations, TP represents true positive, TN represents true negative, and FP refers to false positive, and FN refers to false negative.
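The three metrics follow directly from the confusion-matrix terms defined above; a minimal sketch:

```python
# Eq. (1): share of correctly classified records (both classes).
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# Eq. (2): share of normal records misclassified as attacks.
def fpr(fp, tn):
    return fp / (fp + tn)

# Eq. (3): share of attack records misclassified as normal.
def fnr(fn, tp):
    return fn / (fn + tp)

# Example: 95 attacks caught, 90 normals kept, 10 false alarms, 5 misses.
print(accuracy(95, 90, 10, 5), fpr(10, 90), fnr(5, 95))
```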
In this study, six common classifiers were used: Logistic Regression (LR), SVM, Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), and KNN. This set was chosen because the classifiers learn in different ways (linear boundaries, non-linear patterns, probabilities, and distance), so the results do not depend on a single type of model.
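A minimal scikit-learn sketch of the two evaluation protocols, using synthetic data in place of a DEFEAT stage dataset and a single default-configuration classifier for brevity:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for one stage dataset (normal vs attack).
X, y = make_classification(n_samples=600, n_features=8, random_state=0)

clf = DecisionTreeClassifier(random_state=0)  # standard form, no tuning

# Test 1: cross-validation over folds of the full dataset.
cv_scores = cross_val_score(clf, X, y, cv=10)

# Test 2: supplied test set -- train once, evaluate on held-out records.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)
unseen_acc = clf.fit(X_tr, y_tr).score(X_te, y_te)

print(f"CV mean = {cv_scores.mean():.3f}, unseen = {unseen_acc:.3f}")
```

The same loop would be repeated for each of the six classifiers and each stage dataset, with FPR and FNR computed from the resulting confusion matrices.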
Tables 14 and 16 present the detection accuracy results for the resource usage dataset and the app permissions, sensors, and services dataset, respectively, using both testing approaches. Tables 15 and 17 present the false positive and false negative rates for the same classifiers and datasets.
Table 14.
Detection accuracy of applying the classifiers to the resource usage datasets with the two different testing.
| Dataset | Classifier | Test 1: Cross-validation | Test 1: Final Test | Test 2: Internal testing | Test 2: Unseen Test |
|---|---|---|---|---|---|
| Initial Compromise | LR | 0.9852 | 0.9898 | 0.9868 | 0.9874 |
| SVM | 0.9934 | 0.9957 | 0.9951 | 0.9945 | |
| DT | 0.9919 | 0.9937 | 0.9931 | 0.9918 | |
| RF | 0.9928 | 0.9945 | 0.9961 | 0.9925 | |
| NB | 0.9930 | 0.9949 | 0.9956 | 0.9933 | |
| KNN | 0.9923 | 0.9969 | 0.9936 | 0.9941 | |
| Credential Access | LR | 0.9911 | 0.9945 | 0.9927 | 0.9922 |
| SVM | 0.9960 | 0.9969 | 0.9966 | 0.9957 | |
| DT | 0.9917 | 0.9937 | 0.9922 | 0.9933 | |
| RF | 0.9929 | 0.9941 | 0.9936 | 0.9945 | |
| NB | 0.9930 | 0.9941 | 0.9946 | 0.9922 | |
| KNN | 0.9957 | 0.9965 | 0.9966 | 0.9953 | |
| Exfiltration | LR | 0.9954 | 0.9965 | 0.9976 | 0.9961 |
| SVM | 0.9960 | 0.9969 | 0.9971 | 0.9965 | |
| DT | 0.9922 | 0.9934 | 0.9932 | 0.9907 | |
| RF | 0.9930 | 0.9945 | 0.9937 | 0.9930 | |
| NB | 0.9930 | 0.9945 | 0.9951 | 0.9938 | |
| KNN | 0.9960 | 0.9969 | 0.9966 | 0.9965 | |
Table 16.
Detection accuracy of applying the classifiers to the apps-based datasets with the two different testing.
| Dataset | Classifier | Test 1: Cross-validation | Test 1: Final Test | Test 2: Internal testing | Test 2: Unseen Test |
|---|---|---|---|---|---|
| Initial Compromise | LR | 0.9764 | 0.9780 | 0.9819 | 0.9788 |
| SVM | 0.9762 | 0.9737 | 0.9828 | 0.9737 | |
| DT | 0.9771 | 0.9780 | 0.9799 | 0.9796 | |
| RF | 0.9769 | 0.9761 | 0.9809 | 0.9757 | |
| NB | 0.9754 | 0.9757 | 0.9823 | 0.9757 | |
| KNN | 0.9736 | 0.9741 | 0.9789 | 0.9749 | |
| Privilege Escalation | LR | 0.9758 | 0.9738 | 0.9721 | 0.9710 |
| SVM | 0.9764 | 0.9734 | 0.9750 | 0.9726 | |
| DT | 0.9770 | 0.9734 | 0.9740 | 0.9749 | |
| RF | 0.9780 | 0.9765 | 0.9731 | 0.9730 | |
| NB | 0.9758 | 0.9702 | 0.9760 | 0.9702 | |
| KNN | 0.9766 | 0.9714 | 0.9755 | 0.9730 | |
| Exfiltration | LR | 0.9751 | 0.9758 | 0.9786 | 0.9774 |
| SVM | 0.9749 | 0.9762 | 0.9791 | 0.9782 | |
| DT | 0.9794 | 0.9770 | 0.9766 | 0.9747 | |
| RF | 0.9761 | 0.9739 | 0.9820 | 0.9758 | |
| NB | 0.9741 | 0.9762 | 0.9752 | 0.9758 | |
| KNN | 0.9760 | 0.9731 | 0.9766 | 0.9727 | |
Table 15.
False positive and false negative rates of applying different classifiers to the resource usage datasets with the two different testing.
| Dataset | Classifier | Test 1: Cross-validation FPR | Test 1: Cross-validation FNR | Test 1: Final Test FPR | Test 1: Final Test FNR | Test 2: Internal testing FPR | Test 2: Internal testing FNR | Test 2: Unseen test FPR | Test 2: Unseen test FNR |
|---|---|---|---|---|---|---|---|---|---|
| Initial Compromise | LR | 0.0221 | 0.0068 | 0.0153 | 0.0048 | 0.0227 | 0.0039 | 0.0173 | 0.0074 |
| SVM | 0.0050 | 0.0083 | 0.0031 | 0.0056 | 0.0059 | 0.0039 | 0.0030 | 0.0082 | |
| DT | 0.0078 | 0.0085 | 0.0069 | 0.0056 | 0.0099 | 0.0039 | 0.0075 | 0.0090 | |
| RF | 0.0058 | 0.0087 | 0.0054 | 0.0056 | 0.0039 | 0.0039 | 0.0060 | 0.0090 | |
| NB | 0.0052 | 0.0089 | 0.0046 | 0.0056 | 0.0020 | 0.0068 | 0.0030 | 0.0106 | |
| KNN | 0.0071 | 0.0085 | 0.0008 | 0.0056 | 0.0089 | 0.0039 | 0.0038 | 0.0082 | |
| Credential Access | LR | 0.0096 | 0.0080 | 0.0054 | 0.0056 | 0.0094 | 0.0051 | 0.0070 | 0.0087 |
| SVM | 0.0000 | 0.0084 | 0.0000 | 0.0064 | 0.0000 | 0.0071 | 0.0000 | 0.0087 | |
| DT | 0.0082 | 0.0084 | 0.0061 | 0.0064 | 0.0094 | 0.0061 | 0.0047 | 0.0087 | |
| RF | 0.0059 | 0.0084 | 0.0054 | 0.0064 | 0.0057 | 0.0071 | 0.0023 | 0.0087 | |
| NB | 0.0009 | 0.0136 | 0.0008 | 0.0112 | 0.0009 | 0.0102 | 0.0000 | 0.0157 | |
| KNN | 0.0006 | 0.0084 | 0.0008 | 0.0064 | 0.0000 | 0.0071 | 0.0008 | 0.0087 | |
| Exfiltration | LR | 0.0013 | 0.0081 | 0.0008 | 0.0063 | 0.0000 | 0.0052 | 0.0008 | 0.0070 |
| SVM | 0.0000 | 0.0083 | 0.0000 | 0.0063 | 0.0000 | 0.0062 | 0.0000 | 0.0070 | |
| DT | 0.0075 | 0.0081 | 0.0069 | 0.0063 | 0.0073 | 0.0062 | 0.0116 | 0.0070 | |
| RF | 0.0059 | 0.0083 | 0.0046 | 0.0063 | 0.0064 | 0.0062 | 0.0070 | 0.0070 | |
| NB | 0.0000 | 0.0145 | 0.0000 | 0.0111 | 0.0000 | 0.0104 | 0.0000 | 0.0125 | |
| KNN | 0.0000 | 0.0083 | 0.0000 | 0.0063 | 0.0009 | 0.0062 | 0.0000 | 0.0070 | |
Table 17.
False positive and false negative rates of applying different classifiers to the apps-based datasets with the two different testing.
| Dataset | Classifier | Test 1: Cross-validation FPR | Test 1: Cross-validation FNR | Test 1: Final Test FPR | Test 1: Final Test FNR | Test 2: Internal testing FPR | Test 2: Internal testing FNR | Test 2: Unseen test FPR | Test 2: Unseen test FNR |
|---|---|---|---|---|---|---|---|---|---|
| Initial Compromise | LR | 0.0208 | 0.0266 | 0.0260 | 0.0177 | 0.0191 | 0.0171 | 0.0252 | 0.0169 |
| SVM | 0.0249 | 0.0225 | 0.0337 | 0.0185 | 0.0201 | 0.0141 | 0.0344 | 0.0177 | |
| DT | 0.0229 | 0.0229 | 0.0245 | 0.0193 | 0.0201 | 0.0201 | 0.0230 | 0.0177 | |
| RF | 0.0236 | 0.0225 | 0.0298 | 0.0177 | 0.0211 | 0.0171 | 0.0321 | 0.0161 | |
| NB | 0.0009 | 0.0503 | 0.0046 | 0.0451 | 0.0000 | 0.0362 | 0.0046 | 0.0451 | |
| KNN | 0.0206 | 0.0326 | 0.0237 | 0.0282 | 0.0201 | 0.0221 | 0.0291 | 0.0209 | |
| Privilege Escalation | LR | 0.0230 | 0.0255 | 0.0237 | 0.0289 | 0.0325 | 0.0231 | 0.0283 | 0.0297 |
| SVM | 0.0239 | 0.0233 | 0.0245 | 0.0289 | 0.0335 | 0.0160 | 0.0306 | 0.0241 | |
| DT | 0.0235 | 0.0225 | 0.0230 | 0.0305 | 0.0316 | 0.0201 | 0.0283 | 0.0217 | |
| RF | 0.0215 | 0.0225 | 0.0214 | 0.0257 | 0.0354 | 0.0181 | 0.0298 | 0.0241 | |
| NB | 0.0013 | 0.0490 | 0.0008 | 0.0602 | 0.0019 | 0.0471 | 0.0008 | 0.0602 | |
| KNN | 0.0161 | 0.0313 | 0.0207 | 0.0369 | 0.0306 | 0.0181 | 0.0275 | 0.0265 | |
| Exfiltration | LR | 0.0231 | 0.0268 | 0.0283 | 0.0198 | 0.0163 | 0.0268 | 0.0252 | 0.0198 |
| SVM | 0.0277 | 0.0224 | 0.0329 | 0.0143 | 0.0211 | 0.0208 | 0.0260 | 0.0175 | |
| DT | 0.0205 | 0.0206 | 0.0237 | 0.0222 | 0.0191 | 0.0278 | 0.0252 | 0.0254 | |
| RF | 0.0244 | 0.0234 | 0.0314 | 0.0206 | 0.0172 | 0.0188 | 0.0237 | 0.0246 | |
| NB | 0.0057 | 0.0478 | 0.0054 | 0.0429 | 0.0000 | 0.0505 | 0.0031 | 0.0460 | |
| KNN | 0.0168 | 0.0317 | 0.0252 | 0.0286 | 0.0211 | 0.0258 | 0.0298 | 0.0246 | |
As shown in Table 14, Table 15, Table 16, and Table 17, the proposed DEFEAT datasets and features achieved high performance across all classifiers. For example, the resource usage dataset achieved detection accuracies of up to 99.60%, with FPR as low as 0.0227 and FNR as low as 0.0157. The app permissions, sensors, and services dataset achieved detection accuracies of up to 98.28%, with FPR as low as 0.0211 and FNR as low as 0.0198.
Limitations
Although the DEFEAT datasets show promising results, some limitations should be reported. First, the datasets were collected from a physical Android device running Android Marshmallow (version 6). This version was selected to enable comprehensive, non-rooted access to device-level and app-based behavioral indicators, which are increasingly restricted in newer Android releases. The collected features are governed by the Linux kernel and the application execution model, which have remained stable across subsequent Android releases. Nevertheless, the use of a single older Android version represents a limitation in terms of cross-version generalizability. Second, although resource-usage features are expected to remain consistent in capturing behavioral deviations, the app-based dataset does not capture security mechanisms introduced in newer Android versions. These security mechanisms include custom or fine-grained runtime permission enforcement, background execution limits, and energy management mechanisms such as Doze mode. These OS-level controls can influence how applications request permissions, schedule background activities, and interact with system resources. Third, the malware samples used in this study are not intended to represent specific threat actors or nation-state campaigns. Instead, they serve as behavioral proxies for modelling common APT stage characteristics on Android devices. Although some samples belong to generic malware families, they exhibit behaviors that overlap with documented APT tactics, such as persistent background communication, access to sensitive resources, and external data transfer. Fourth, each application was executed for approximately 10 minutes. This duration is sufficient to capture the active behavioral manifestations of the simulated attack stages, during which malicious actions produce observable deviations in device behavior. However, it does not model the dormancy or long-term dwell periods characteristic of real-world APT campaigns.
Finally, the simulation focuses on three stages of the APT lifecycle and is based on 36 malicious Android applications. While these stages are commonly observed in real-world attacks and the selected samples reflect multi-stage behavior aligned with the MITRE framework, they may not capture the full diversity of attack paths or malware variants encountered in operational environments.
Ethics Statement
The authors have read and follow the ethical requirements for publication in Data in Brief. This work does not involve human subjects, animal experiments, or data collected from social media platforms.
CRediT Author Statement
T.J. was responsible for the conceptualization; T.J. and A.A.A. were responsible for the design of the idea, methodology, analysis, validation, figure preparation, and writing of the original draft. M.M.S. contributed to validation, review, editing, resourcing, and funding acquisition. All authors reviewed and approved the final manuscript.
Data Availability
The DEFEAT datasets generated and analyzed during the current study are publicly available online at the following link: https://doi.org/10.17632/bdtn9vj7d7.3 (see Reference [1]).
Acknowledgements
This work is supported by the Ministry of Higher Education Malaysia under the Fundamental Research Grant Scheme with project Code: FRGS/1/2020/ICT07/USM/02/2. The first and second authors would also like to thank Al-Muthanna University and the University of Basrah for their scholarship support for studies in USM/ Malaysia.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this paper.
Appendix A
Table A1.
Full Permissions Features with their Description.
| Features | Description |
|---|---|
| Normal Permissions | |
| broadcast_sticky | Grants an app the ability to broadcast sticky intents |
| change_network_state | Grants an app the ability to alter network connectivity state |
| change_wifi_multicast_state | Grants an app the ability to enter Wi-Fi Multicast mode |
| change_wifi_state | Grants an app the ability to change Wi-Fi connectivity state |
| disable_keyguard | Grants an app the ability to disable the lock screen if no secure authentication method is set |
| expand_status_bar | Grants an app the ability to expand/collapse the status bar |
| foreground_service | Grants an app the ability to utilize Service.startForeground |
| get_package_size | Grants an app the ability to retrieve information about the storage space utilized by any installed package on the device |
| install_shortcut | Grants an app the ability to install a shortcut in Launcher |
| internet | Grants an app the ability to open network sockets |
| kill_background_processes | Grants an app the ability to call ActivityManager.killBackgroundProcesses |
| manage_own_calls | Enables an application to implement its own calling interface and handle connection setup, audio routing, and call state management through the self-managed ConnectionService APIs |
| modify_audio_settings | Grants an app the ability to amend global audio settings |
| nfc | Grants an app the ability to execute I/O operations over NFC |
| read_sync_settings | Grants an app the ability to read the sync settings |
| read_sync_stats | Grants an app the ability to read the sync stats |
| receive_boot_completed | Grants an app the ability to receive the BOOT_COMPLETED broadcast intent, allowing it to perform actions automatically once the system has finished booting |
| reorder_tasks | Grants an app the ability to alter the Z-order of tasks |
| request_ignore_battery_optimizations | Grants an application the ability to request exemption from the system's battery optimization policies |
| set_wallpaper | Grants an app the ability to set the wallpaper |
| set_wallpaper_hints | Grants an app the ability to set the wallpaper hints |
| use_biometric | Grants an app the ability to use device supported biometric modalities |
| use_fingerprint | Grants an app the ability to use the device’s fingerprint sensor for authentication purposes |
| use_full_screen_intent | Grants an app the ability to display notifications using full-screen intents, typically used for high-priority events such as incoming calls or alarms that require immediate user attention |
| vibrate | Allows access to the vibrator |
| wake_lock | Grants an app the ability to utilize WakeLocks via the PowerManager API, preventing the processor from sleeping or the screen from dimming |
| write_sync_settings | Grants an app the ability to write the sync settings |
| Dangerous Permissions | |
| add_voicemail | Grants an app the ability to insert voicemails into the system |
| answer_phone_calls | Grants an app the ability to answer an incoming phone call |
| call_phone | Grants an app the ability to place phone calls directly, bypassing the Dialer interface and user confirmation |
| camera | Grants an app the ability to access the camera of the device |
| get_accounts | Grants app permission to access the list of user accounts registered on the device through the Accounts Service |
| read_calendar | Grants an app the ability to read the user's calendar data |
| read_call_log | Grants an app the ability to read the user's call log |
| read_contacts | Grants an app the ability to read the user's contacts data |
| read_external_storage | Grants an app the ability to read from external storage |
| read_logs | Grants an app the ability to read the low-level system log files |
| read_phone_numbers | Grants app permission to access the device phone numbers associated with the SIM card |
| read_phone_state | Grants an app read-only access to the device’s telephony state, including cellular network information, current call status, and registered phone accounts |
| read_sms | Grants an app the ability to read SMS messages |
| receive_mms | Grants an app the ability to observe and process incoming MMS messages |
| receive_sms | Grants an app the ability to receive SMS messages |
| receive_wap_push | Grants an app the ability to receive WAP push messages |
| record_audio | Grants an app the ability to record audio. |
| send_respond_via_message | Grants app permission to delegate the 'respond via message' action for incoming calls to other applications |
| send_sms | Grants an app the ability to send SMS messages |
| write_calendar | Grants an app the ability to write the user's calendar data |
| write_call_log | Grants an app the ability to write and read the user's call log data |
| write_contacts | Allows the app to write data to the user's contact list, including adding new contacts or modifying existing entries |
| write_external_storage | Grants an app the ability to write to external storage |
| Signature Permissions | |
| install_packages | Grants an app the ability to install packages |
| manage_documents | Grants app permission to manage access to documents on the device |
| master_clear | Grants an app the ability to perform a factory reset of the device, erasing all user data, apps, and settings by invoking a master clear operation |
| modify_phone_state | Grants an app permission to modify telephony state, including actions such as powering on radio modules, issuing MMI (Man-Machine Interface) codes, and controlling other low-level telephony functions |
| mount_unmount_filesystems | Grants an app the ability to mount and unmount file systems for removable storage |
| request_install_packages | Grants an application the ability to request the installation of application packages, typically by invoking the system package installer |
| set_animation_scale | Grants app permission to adjust the global animation scale settings |
| status_bar | Grants an app the ability to open, close, or disable the status bar and its icons |
| system_alert_window | Grants app permission to create overlay windows using TYPE_APPLICATION_OVERLAY, which appear on top of all other app interfaces |
| write_settings | Grants an app the ability to read or write the system settings |
| Privileged Permissions | |
| reboot | Grants app permission to initiate a device reboot |
| write_apn_settings | Grants app permission to modify app settings and access sensitive configuration fields, such as stored usernames and passwords of other applications |
| write_secure_settings | Grants an app the ability to read or write secure system settings |
Table A2.
Dictionary of technical keywords used in this study.
| Keyword | Definition |
|---|---|
| Advanced Persistent Threat (APT) | A targeted cyberattack in which an adversary maintains long-term, covert access to a system or device to achieve objectives such as surveillance, credential theft, or data exfiltration. |
| Android runtime permissions | A permission model in which sensitive permissions are granted during app execution (runtime), based on user interaction, rather than only at install time. |
| App-based features | Features describing application-level behavior on Android, including permissions, sensors, and services accessed or requested during execution, as used in the DEFEAT app-based dataset. |
| Attack stage (multi-stage APT) | The attack path (e.g., Initial Compromise → Presence Expansion → Exfiltration) used to represent how attacker behavior evolves over time in a realistic sequence. |
| Command-and-Control (C&C) | Adversary communication used to control a compromised device and coordinate actions such as issuing commands or receiving stolen data. |
| Confusion matrix (TP, TN, FP, FN) | A table summarizing classification outcomes: true positives, true negatives, false positives, and false negatives, used to compute performance metrics such as accuracy and error rates. |
| Cross-validation | A model evaluation approach that repeatedly splits data into training and testing folds to estimate performance stability across different partitions. |
| Credential Access stage | The stage in which the adversary is trying to steal account names, passwords, or other secrets that enable access to resources. |
| Dangerous permission | An Android permission protection level for operations that can expose sensitive user data or perform sensitive actions; typically requires explicit user approval (often at runtime). |
| Dataset “frame/instance/record” | One observation row in the dataset representing a snapshot of device or application behavior at a specific time interval (collected every 3 seconds in DEFEAT). |
| DEFEAT dataset | The proposed benchmarking dataset in this study consists of resource-usage features (CPU, RAM, battery, traffic) and app-based features (permissions, sensors, services), labelled for multi-stage APT analysis. |
| DEFENSE collector | The Android monitoring application developed in this study to record device-level and application-level behavioral features and transmit them to a server for storage and analysis. |
| Detection accuracy | The proportion of correctly classified instances across both classes (normal and attack), computed from the confusion matrix. |
| Exfiltration stage | The APT stage in which stolen data is transferred from the compromised device to an external server, often observable through increased outbound communication. |
| False Negative (FN) / False Negative Rate (FNR) | FN: an attack instance predicted as normal. FNR is commonly computed as FN/(FN+TP) using confusion-matrix terms. |
| False Positive (FP) / False Positive Rate (FPR) | FP: a normal instance predicted as attack. FPR is commonly computed as FP/(FP+TN) using confusion-matrix terms. |
| Initial Compromise stage | The stage representing the attacker’s first foothold on the device (e.g., malicious app installed and activated), typically emphasizing stealth and minimal visible impact. |
| MITRE ATT&CK for Mobile | A structured knowledge base that organizes mobile adversary behavior into tactics/techniques, used as a threat-model reference to align simulated activities and labels. |
| Normalization | A preprocessing step that transforms features into comparable numeric ranges/scales so that no single feature dominates learning due to magnitude alone. |
| Permission protection level | Android’s categorization of permissions (e.g., normal/dangerous/signature/privileged) that reflects how restricted the capability is and how it can be granted. |
| Precision / Recall / F1-score | Standard metrics: precision is TP/(TP+FP), recall is TP/(TP+FN), and F1 is the harmonic mean of precision and recall; all are widely used to evaluate the detection models. |
| Presence Expansion stage | A stage where adversary behavior broadens control and collects sensitive information (the paper operationalizes this via Privilege Escalation and/or Credential Access, depending on dataset component). |
| Privilege Escalation stage | Actions intended to gain higher privileges or broader control on the device (e.g., changing settings, weakening restrictions), represented as a stage label in the app-based dataset. |
| Resource-usage features | Device-level behavior indicators such as CPU usage, RAM usage, battery temperature/voltage, and network traffic (RX/TX/total), used in the DEFEAT resource-usage dataset. |
| Reverse TCP payload | A payload that initiates an outbound TCP connection from a compromised device to a command-and-control server, used in this study to establish controlled C&C communication during all simulated APT stages. |
| SHAP (SHapley Additive exPlanations) | An explainability method that attributes each feature’s contribution to a model prediction using Shapley-value principles, enabling interpretation of which features drive “attack vs normal.” |
| SHAP beeswarm plot | A SHAP visualization that shows the distribution of SHAP values per feature across instances, highlighting both impact direction and value density for the most influential features. |
| SMOTE | A class-balancing technique that synthetically generates new minority-class examples by interpolating between existing minority samples, commonly used to address class imbalance. |
| Supplied test set (unseen test) | A testing strategy that evaluates a trained model on a separate “unseen” subset that was not used for training, to estimate generalization in realistic settings. |
| Threat intelligence platform (VirusTotal) | A service that analyzes submitted files/URLs and aggregates multiple security engine results to support malware/indicator checking and validation. |
| t-test statistical analysis | A statistical test used to determine whether the mean values of a feature differ significantly between two groups (e.g., malware families), applied in this study to verify behavioral variability beyond payload effects. |
| TTP (Tactics, Techniques, and Procedures) | A common cyber threat-intelligence concept describing adversary stages (tactics), the methods used (techniques), and how they are executed in practice (procedures), often operationalized via ATT&CK-aligned analysis. |
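The confusion-matrix metrics defined above (detection accuracy, precision, recall, F1, FPR, FNR) can be illustrated with a short sketch. The function name and counts below are hypothetical, chosen only for illustration:

```python
def confusion_matrix_metrics(tp, fp, tn, fn):
    """Compute the glossary's detection metrics from raw confusion-matrix counts.

    tp: attack instances correctly flagged; fn: attacks missed;
    tn: normal instances correctly passed; fp: normals wrongly flagged.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                       # true positive rate
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)                          # false positive rate
    fnr = fn / (fn + tp)                          # false negative rate
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "fpr": fpr, "fnr": fnr}

# Hypothetical evaluation: 90 attacks detected, 10 missed,
# 95 normal instances passed, 5 falsely flagged.
m = confusion_matrix_metrics(tp=90, fp=5, tn=95, fn=10)
```

Note that FNR and recall are complements (FNR = 1 − recall), which is why a model tuned only for accuracy can still miss a stealthy APT stage with few attack instances.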
Data Availability
Mendeley Data: Labeled Multi-Stage Android APT Datasets (Original data).
References
- 1. Jabar T., Al-Kadhimi A.A., Singh M.M. Labeled Multi-Stage Android APT Datasets. 2026. doi:10.17632/bdtn9vj7d7.3.
- 2. Benabderrahmane S., Hoang N., Valtchev P., Cheney J., Rahwan T. Hack me if you can: aggregating autoencoders for countering persistent access threats within highly imbalanced data. Future Gener. Comput. Syst. 2024;160:926–941.
- 3. Xiang Z., Guo D., Li Q. Detecting mobile advanced persistent threats based on large-scale DNS logs. Comput. Secur. 2020;96.
- 4. Niu W., Zhang X., Yang G., Zhu J., Ren Z., Li L. Identifying APT malware domain based on mobile DNS logging. Math. Probl. Eng. 2017;2017(1).
- 5. Chuan B.L.J., Singh M.M., Shariff A.R.M. APTGuard: advanced persistent threat (APT) detections and predictions using Android smartphone. Presented at Computational Science and Technology: 5th ICCST 2018, Kota Kinabalu, Malaysia; 2019.
- 6. Zulkefli Z., Singh M.M., Mohd Shariff A.R., Samsudin A. Typosquat cyber crime attack detection via smartphone. Presented at the 4th Information Systems International Conference (ISICO 2017), Bali, Indonesia; 2017.
- 7. Arikkat D.R., et al. DroidTTP: mapping Android applications with TTP for Cyber Threat Intelligence. J. Inf. Secur. Appl. 2025;93.
- 8. Kim K., Shin Y., Lee J., Lee K. Automatically attributing mobile threat actors by vectorized ATT&CK matrix and paired indicator. Sens. (Basel). 2021;21(19). doi:10.3390/s21196522.
- 9. Taheri R., Shojafar M., Alazab M., Tafazolli R. Fed-IIoT: a robust federated malware detection architecture in industrial IoT. IEEE Trans. Ind. Inform. 2021;17(12):8442–8452.
- 10. Benabderrahmane S., Valtchev P., Cheney J., Rahwan T. APT-LLM: embedding-based anomaly detection of cyber advanced persistent threats using large language models. Presented at the 2025 13th International Symposium on Digital Forensics and Security (ISDFS); 2025.
- 11. A K.A., V P., R R.K.A., Raveendran N., Conti M. Android malware defense through a hybrid multi-modal approach. J. Netw. Comput. Appl. 2025;233.
- 12. Shabtai A., Kanonov U., Elovici Y., Glezer C., Weiss Y. "Andromaly": a behavioral malware detection framework for Android devices. J. Intell. Inf. Syst. 2011;38(1):161–190.
- 13. Myneni S., Chowdhary A., Sabur A., Sengupta S., Agrawal G., Huang D., Kang M. DAPT 2020 - constructing a benchmark dataset for advanced persistent threats. Presented at the 1st International Workshop on Deployable Machine Learning for Security Defense (MLHat 2020), San Diego, California, USA (Virtual Event); 2020.
- 14. Kim Y., Lee I., Kwon H., Lee K., Yoon J. BAN: predicting APT attack based on Bayesian network with MITRE ATT&CK framework. IEEE Access. 2023;11:91949–91968.
- 15. Canadian Institute for Cybersecurity. Android malware dataset (CIC-AndMal2017). Available: https://www.unb.ca/cic/datasets/andmal2017.html (accessed 25-08-2025).
- 16. Ashishb. Android malware. 2024. Available: https://github.com/ashishb/android-malware (accessed 07-12-2024).
- 17. VirusTotal. Analyze suspicious files, domains, IPs and URLs to detect malware and other breaches; automatically share them with the security community. Available: https://www.virustotal.com/gui/home/search (accessed 23-11-2025).
- 18. Mohamed N., Belaton B. SBI model for the detection of advanced persistent threat based on strange behavior of using credential dumping technique. IEEE Access. 2021;9:42919–42932.
- 19. Elejla O.E., Anbar M., Belaton B., Hamouda S. Labeled flow-based dataset of ICMPv6-based DDoS attacks. Neural Comput. Appl. 2018;31(8):3629–3646.
- 20. Dib O., Nan Z., Liu J. Machine learning-based ransomware classification of Bitcoin transactions. J. King Saud Univ. - Comput. Inf. Sci. 2024;36(1).