Efficient detection of intrusions in TON-IoT dataset using hybrid feature selection approach

N Dharini; V S Janani; Jeevaa Katiravan

doi:10.1038/s41598-026-37834-y

2026 Feb 7;16:7763.



PMCID: PMC12949231 PMID: 41651911

Abstract

This research improves IoT attack classification by introducing a bias-aware dataset refinement strategy that eliminates IP- and port-based identifiers and applies a domain-guided hybrid feature selection framework to derive a lightweight and generalizable feature set. Motivated by the need for intrusion detection models that generalize beyond predefined network configurations, this study focuses on behavior-driven network features that enable more realistic attack categorization in IoT environments. Wrapper-based feature selection methods, including forward selection, backward elimination, and genetic algorithms, identify five optimal features. To assess the robustness of the selected feature subset, both simple classifiers (Decision Tree and KNN) and ensemble learning models, including Random Forest, Gradient Boosting, XGBoost, Bagging, and Voting Ensemble, are evaluated under binary and multi-class settings. Using the proposed reduced feature set, the Decision Tree classifier achieved an accuracy of 0.986 for binary classification and 0.972 for multi-class attack classification. The K-Nearest Neighbor classifier achieved an accuracy of 0.972 in both the binary and multi-class scenarios, while ensemble models yielded only marginal performance improvements. Evaluation using precision, recall, F1-score, confusion matrices, and Cohen’s Kappa confirms that the discriminative power arises primarily from the selected feature subset rather than from classifier complexity. These results demonstrate that effective feature selection enables lightweight models to achieve competitive intrusion detection performance suitable for real-world IoT deployments.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-026-37834-y.

Keywords: ToN-IoT dataset, Attacks, Hybrid feature selection, Decision tree, K nearest neighbours

Subject terms: Engineering, Mathematics and computing

Introduction

Given the growing reliance on Internet of Things (IoT)-based smart devices in daily life, it is crucial to ensure the security of the data transmitted among these devices. Because of their wireless connectivity, IoT environments are highly vulnerable and easily targeted by intruders. Preventive techniques such as cryptography and authentication act as the first line of defence, but because attacks cannot be completely prevented, intrusion detection systems (IDS) are essential as a second line of defence. An IDS monitors network and telemetry data, detects anomalies, and alerts administrators to possible threats in real time.

To counter attacks and threats effectively, this work uses two IoT intrusion detection datasets, ToN-IoT^1–8 and NF-ToN-IoT⁹. These datasets are particularly suited to Industrial Internet of Things (IIoT) research because they contain real-world traffic collected from heterogeneous IoT environments.

ToN-IoT contains 44 features and supports both binary classification (benign or attack) and multi-class classification.
NF-ToN-IoT is the NetFlow version of ToN-IoT, containing a reduced set of 12 features that capture network flow characteristics.

Both datasets were collected using a realistic IIoT testbed at the IoT Lab, University of New South Wales (UNSW), Canberra, developed using the NSX vCloud NFV platform integrating Software-Defined Networking (SDN), Network Function Virtualization (NFV), and Service Orchestration (SO). The datasets comprise heterogeneous data sources from IoT telemetry, Windows and Linux systems, and network traffic logs.

To evaluate intrusion detection in IoT environments, this work utilizes the widely adopted ToN-IoT and NF-ToN-IoT datasets, which contain heterogeneous network traffic and system telemetry collected from a realistic Industrial IoT testbed. These datasets support both binary and multi-class attack classification and are commonly used as benchmarks for evaluating intrusion detection systems in IIoT scenarios.

Limitations of existing ToN-IoT and NF-ToN-IoT datasets

Although the ToN-IoT and NF-ToN-IoT datasets are widely adopted benchmarks for IoT intrusion detection research, they exhibit certain limitations that affect their real-world applicability. First, attacks in these datasets were deliberately launched from predefined IP address ranges and port numbers, causing source and destination IPs and ports to act as strong attack identifiers rather than behavior-based indicators. This introduces dataset bias and limits generalization to unseen attackers operating from different network locations.

Second, the presence of a large number of heterogeneous features increases computational complexity and introduces redundancy, leading to longer training times without proportional gains in detection performance. Third, while the NF-ToN-IoT dataset reduces dimensionality, it suffers from degraded multiclass attack detection accuracy.

These limitations motivate the need for refining the dataset by eliminating biased identifiers and constructing a lightweight, behavior-driven feature space suitable for practical intrusion detection systems.

Research objectives

The primary objectives of this research are:

(i)
To identify and mitigate bias introduced by fixed IP- and port-based attack generation in the ToN-IoT dataset;
(ii)
To construct a lightweight and behavior-driven feature set using a hybrid feature selection framework; and
(iii)
To evaluate the effectiveness of the reduced dataset for both binary and multi-class IoT intrusion detection.

Related works

Several other public IoT-based intrusion detection datasets are available, such as the following.

BoT-IoT dataset (2019): The BoT-IoT dataset^10–15 was created by designing a realistic network environment in the Cyber Range Lab of UNSW Canberra. The network environment incorporated a combination of normal and botnet traffic. The dataset includes DDoS, DoS, OS and Service Scan, Keylogging, and Data exfiltration attacks, with the DDoS and DoS attacks further organized by protocol, and contains 72,000,000 records. It consists of 44 features. The NetFlow version of this dataset, which has 12 features, aimed to standardize network-security datasets to achieve interoperability and enable larger analyses. It contains 600,100 data flows, of which 586,241 (97.69%) are attack samples and 13,859 (2.31%) are benign.
UNSW-NB 15 dataset (2015): The raw network packets of the UNSW-NB 15 dataset^16–20 were created by the IXIA PerfectStorm tool in the Cyber Range Lab of UNSW Canberra to generate a hybrid of real modern normal activities and synthetic contemporary attack behaviours. The total number of records is two million. The training set contains 175,341 records and the testing set contains 82,332 records, covering both attack and normal traffic. The dataset has nine attack types, namely Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. It has 49 features with the class label.

These are some of the key datasets introduced in the literature for detecting malicious activity in IoT environments. Owing to the heterogeneous data and diverse features of ToN-IoT in particular, many intrusion detection mechanisms have been proposed for it, as described below.

The authors of ToN-IoT developed a comprehensive dataset comprising data from different IoT devices^2,3,6,7 together with the related operating system data. Windows⁴ and Linux⁵ datasets were gathered separately.

Mohanad Sarhan et al.²¹ studied feature extraction for machine learning-based intrusion detection in IoT networks. Feature reduction techniques, namely Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Auto-encoder (AE), were evaluated on three benchmark datasets: UNSW-NB15, ToN-IoT, and CSE-CIC-IDS2018. Six machine learning algorithms, namely Logistic Regression (LR), Naive Bayes (NB), Deep Feed Forward (DFF), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Decision Tree (DT), were trained on the datasets. The optimal number of dimensions was identified, the models were trained on those dimensions, and their accuracy was computed. Among the 18 combinations of feature extraction algorithm and ML classifier tried, no single combination performed best across all three NIDS datasets. For the ToN-IoT dataset, the maximum accuracy obtained was 98.23% with 20 features using auto-encoders.

The same authors also proposed a reduced dataset comprising only the NetFlow-based features of the ToN-IoT dataset⁹. Various IDS datasets, namely ToN-IoT, BoT-IoT, UQ-NIDS, UNSW-NB15, and CSE-CIC-IDS2018, were considered, and 12 main network features were extracted from each original dataset: source address, destination address, source port, destination port, IP protocol identifier, Layer 7 protocol, TCP flags, incoming number of bytes, outgoing number of bytes, incoming number of packets, outgoing number of packets, and flow duration. These 12 features were common to all the datasets considered, and various machine learning algorithms were used to classify attacks. Among the datasets considered, the NetFlow version of ToN-IoT obtained the highest binary classification accuracy, 99.66%. The authors also noted that multi-class classification on the NetFlow versions of the datasets was considerably less effective, and concluded that the NetFlow version could be improved by identifying the key features of the original dataset.

Arun Kumar Dey et al.²² proposed a hybrid feature selection approach in which the Pearson Correlation Coefficient (PCC) and Mutual Information (MI) are combined with a Non-Dominated Sorting Genetic Algorithm (NSGA-II)-based metaheuristic to optimize the feature set. In the proposed scheme, filter-based methods rank the features to guide population initialization in NSGA-II. The scheme reduced the 44 features of the ToN-IoT dataset to a minimum of 13 and attained an accuracy of 99.48%.

Tareq et al.²³ analyzed the ToN-IoT, UNSW-NB15, and Edge-IIoT datasets using DenseNet-based deep learning. Among the datasets considered, the Windows-with-network version of the ToN-IoT dataset achieved 100% accuracy in classifying attacks using the InceptionTime approach.

Gad et al.²⁴ proposed an intrusion detection system that uses the Chi-square (Chi2) technique for feature selection and the Synthetic Minority Oversampling Technique (SMOTE) for class balancing. Logistic regression (LR), naive Bayes (NB), decision tree (DT), support vector machine (SVM), k-nearest neighbor (kNN), random forest (RF), AdaBoost, and XGBoost algorithms were employed for classification. Among all the algorithms considered, XGBoost performed best, obtaining an accuracy of 0.98 with an optimal set of 20 features.

Rubayyi Alghamdi and Martine Bellaiche proposed a deep ensemble-based IDS²⁵ using the Lambda architecture. Long Short-Term Memory (LSTM) was used for binary classification to distinguish malicious from benign traffic, and an ensemble of LSTM, Convolutional Neural Network, and Artificial Neural Network classifiers was used to detect the type of attack. The proposed approach achieved an accuracy of over 99.93%, and the authors claimed that it saves processing time owing to its multi-pronged classification strategy and the Lambda architecture.

Guo et al.²⁶ proposed an IDS built on a Matthews Correlation Coefficient (MCC) score-based stacked ensemble model and reported accuracies of 0.9971 and 0.9909 for binary and multi-class classification, respectively.

Ji et al. have extensively explored ensemble learning–based intrusion detection mechanisms for cyber-physical systems. In their earlier work, they proposed a cascading ensemble framework integrating bagging and boosting techniques to improve detection robustness and generalization across CPS attack scenarios²⁷. This approach demonstrated high accuracy by leveraging the complementary strengths of multiple classifiers, but at the cost of increased computational complexity due to the layered ensemble structure. Subsequently, the authors presented a comprehensive review of IDS techniques for CPS networks, analyzing datasets, industrial protocols, attack taxonomies, and open research challenges²⁸. The review highlighted critical issues such as dataset bias, scalability limitations, and the growing computational burden of deep and ensemble-based IDS solutions, motivating the need for lightweight and realistic detection frameworks.

Building upon these insights, Ji et al. introduced deep ensemble-based IDS models incorporating CNNs with Grey Wolf Optimization (GWO) and voting mechanisms to enhance detection performance²⁹. While the proposed CNN-GWO-voting framework achieved improved accuracy, the reliance on deep learning and meta-heuristic optimization significantly increased training time and memory requirements. In a subsequent study, the authors proposed hybrid enhanced IDS frameworks that integrate optimal feature selection with traditional machine learning classifiers to reduce dimensionality and computational overhead³⁰. Although this approach improved efficiency, it retained a moderate feature set and did not explicitly address biases introduced by predefined IP addresses and ports. More recently, Ji et al. proposed a meta-learning-based stacked generalization framework for CPS intrusion detection, achieving high accuracy in both binary and multi-class classification tasks³⁰. However, the stacked ensemble architecture further increased system complexity, limiting its suitability for real-time and resource-constrained IoT environments.

Kunhare et al. focused on optimization-driven intrusion detection systems with an emphasis on feature selection and hybrid classification. In their initial work, particle swarm optimization (PSO) was employed to identify an optimal subset of features for IDS, demonstrating improved detection accuracy with reduced dimensionality compared to full-feature models³². The authors later extended this approach by integrating genetic algorithm–based feature selection with hybrid classifiers, achieving further improvements in detection performance³³. Despite their effectiveness, both PSO and GA-based methods involve iterative population-based optimization, resulting in high computational overhead and limited scalability for large-scale or real-time IoT deployments.

In another study, Kunhare et al. analyzed real-time network packet behavior and evaluated Snort IDS performance under different DoS attack variants³⁴. This work provided practical insights into packet-level intrusion detection and real-time traffic analysis; however, it primarily relied on signature-based detection and did not address feature reduction, dataset bias, or the scalability challenges associated with machine learning-based IDS solutions.

From the above analysis, it is evident that existing studies either prioritize high detection accuracy through deep learning and ensemble models at the expense of computational efficiency or rely on meta-heuristic optimization techniques that introduce significant overhead. Moreover, most approaches overlook dataset biases arising from predefined IP addresses and ports, which can adversely affect generalization to unseen attack scenarios. In contrast, the proposed work explicitly addresses dataset bias and derives a compact five-feature representation using a bias-aware hybrid feature selection framework, achieving linear-time complexity O(n·k) and minimal memory requirements. This makes the proposed approach more suitable for lightweight, real-time, and edge-based IoT intrusion detection systems.

This work uses the original ToN-IoT dataset with its full set of 44 features. Owing to its large data volume and the curse of dimensionality, it is necessary to reduce the data volume by removing unwanted features or by selecting the features most relevant for classifying attacks accurately. The state-of-the-art feature selection and reduction methods applied to the considered datasets include the Chi-square (Chi2) technique²⁴, correlation coefficients²⁶, Mutual Information²², the Non-dominated Sorting Genetic Algorithm²², meta-heuristic approaches²², Principal Component Analysis (PCA)²¹, Auto-encoder (AE)²¹, and Linear Discriminant Analysis (LDA)²¹.

Owing to the features present in the ToN-IoT dataset, many reduced-feature-space datasets have been proposed and analyzed in the literature. However, all attacks were deliberately launched into the network from a fixed range of IP addresses, so using the source and destination IP addresses as features is unsuitable for detecting unknown attackers. This work therefore proposes a hybrid feature selection approach incorporating both filter-based and wrapper-based techniques. An appropriate feature selection technique can considerably reduce the large number of features in the dataset, and removing redundant and uninformative features significantly reduces the complexity of training and testing the classification algorithm. With this in mind, the time and space complexity of the existing and proposed works were computed using the following notation:

n – number of samples/flows.
d – original number of features.
k – number of selected/reduced features.
e – training epochs (DL).
h – hidden units/layers.
p – population size (meta-heuristics).
G – number of generations (meta-heuristics).

From the comparative analysis as given in Table 1, it is evident that while existing studies achieve high detection accuracy using feature selection, ensemble learning, or deep learning approaches, they largely overlook dataset biases introduced by predefined IP addresses and ports. Moreover, many approaches retain a relatively large feature set or rely on computationally intensive models. In contrast, the proposed work explicitly addresses dataset bias and derives a robust five-feature representation through a bias-aware hybrid feature selection framework, making it suitable for lightweight and real-time IoT intrusion detection systems.

Table 1.

Comparison with Existing Works.

Study	Dataset Used	Feature Selection/Reduction	Bias Handling (IP/Port)	No. of Features	Model Type	Binary & Multiclass	Time Complexity	Space Complexity	Limitations
Sarhan et al.²¹	UNSW-NB15, ToN-IoT, CSE-CIC-IDS2018	PCA, LDA, Autoencoder	✗ Not addressed	20 (ToN-IoT)	ML + DL	✓	O(e * n * d * h)	O(n * d + h^2)	No single method works consistently; dataset bias not considered
Sarhan et al. (NetFlow)⁹	ToN-IoT (NetFlow)	Fixed NetFlow features	✗ Not addressed	12	ML	✓	O(n * d)	O(n * d)	Reduced multiclass performance; retains IP/port identifiers
Dey et al.²²	ToN-IoT	PCC + MI + NSGA-II	✗ Not addressed	13	ML	✓	O(G * p * n * d)	O(n * d + p * d)	Optimization-focused; dataset bias not mitigated
Gad et al.²⁴	ToN-IoT	Chi-square + SMOTE	✗ Not addressed	20	ML Ensembles	✓	O(n * d + c * n * k)	O(n * d + k)	Feature selection purely statistical; bias not addressed
Tareq et al.²³	ToN-IoT, UNSW-NB15, Edge-IIoT	None (Deep Learning)	✗ Not addressed	Full feature set	DL (DenseNet)	✓	O(e * n * d * h)	O(n * d + h^2)	High computational cost; unsuitable for real-time IDS
Alghamdi & Bellaiche²⁵	ToN-IoT	None	✗ Not addressed	Full feature set	DL Ensemble	✓	O(e * n * d * m)	O(n * d + m * h^2)	Complex architecture; feature redundancy retained
Guo et al.²⁶	ToN-IoT	Correlation-based	✗ Not addressed	Not specified	Ensemble	✓	O(n * d + C(n,d))	O(n * d + model_size)	No discussion on IP/port bias or generalization
Proposed Work	ToN-IoT	Domain-guided filtering + FFS + BFE + GA	✓ Explicitly mitigated	5	Lightweight ML	✓	O(n * k)	O(n * k)	Designed for generalization and real-time deployment


Deep learning–based approaches show poor scalability, with time complexity: O(e * n * d * h).

This results in high latency, large memory consumption, and poor suitability for edge or real-time IoT environments.

Meta-heuristic feature selection methods improve detection accuracy but introduce high computational overhead, expressed as: O(G * p * n * d).

This makes them unsuitable for large-scale or time-critical deployments.
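To illustrate where the G·p factor in this bound comes from, the following is a minimal Python sketch of genetic-algorithm feature selection over bitmasks (1 = feature kept). The function name `ga_feature_selection`, the truncation selection, one-point crossover, bit-flip mutation, and all default parameters are illustrative assumptions, not the configuration used by Dey et al.²² or by the proposed framework. The `fitness` callback stands in for training and scoring a classifier on the masked features, which is where the O(n·d) cost of each evaluation would arise.

```python
import random
from typing import Callable, List, Tuple

Mask = Tuple[int, ...]  # 1 = feature kept, 0 = feature dropped


def ga_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 10,
    generations: int = 5,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[Mask, int]:
    """Toy genetic algorithm over feature bitmasks (illustrative only).

    Returns the best mask seen and the number of fitness evaluations,
    which equals pop_size * generations (the p * G factor in Table 1).
    Requires n_features >= 2 and pop_size >= 2.
    """
    rng = random.Random(seed)
    population: List[Mask] = [
        tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)
    ]
    best: Mask = population[0]
    best_score = float("-inf")
    evaluations = 0
    for _ in range(generations):
        # Score every individual once: p evaluations per generation. With a
        # real classifier as the fitness, each one costs roughly O(n * d).
        scored: List[Tuple[float, Mask]] = []
        for mask in population:
            score = fitness(mask)
            evaluations += 1
            scored.append((score, mask))
            if score > best_score:
                best_score, best = score, mask
        # Truncation selection: the better half (at least two) become parents.
        scored.sort(key=lambda item: item[0], reverse=True)
        parents = [mask for _, mask in scored[: max(2, pop_size // 2)]]
        children: List[Mask] = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = list(a[:cut] + b[cut:])
            for i in range(n_features):
                if rng.random() < mutation_rate:
                    child[i] = 1 - child[i]  # bit-flip mutation
            children.append(tuple(child))
        population = children
    return best, evaluations
```

For example, `ga_feature_selection(8, lambda m: sum(m))` returns a mask together with the evaluation count 50. With a real classifier as the fitness, each of those 50 evaluations is a full training run, which is why the overall cost scales as G·p·n·d.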

The proposed work clearly outperforms prior studies in computational efficiency by achieving:

Lowest feature dimension (k = 5)
Linear-time scalability: O(n * k)
Minimal memory footprint: O(n * k)
Explicit bias mitigation (IP/Port handling)
Better suitability for real-time and edge-based IDS deployment

The reduced time complexity of O(n·k) and memory requirement of O(n·k) demonstrate that the proposed solution meets cost-effectiveness criteria better than O(e·n·d·h) deep learning models or O(G·p·n·d) meta-heuristic systems, making it suitable for real-time IoT environments.
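A back-of-the-envelope reading of these bounds follows. It is an illustration, not a measurement. Assume a single linear pass costs c·n·d with all features and c·n·k with the reduced set, where c is a hypothetical per-feature constant. Take d = 44 (the ToN-IoT network feature count) and k = 5, and ignore constant factors in the other two bounds:

```latex
\begin{aligned}
T_{\text{full}} &\approx c\,n\,d, \qquad T_{\text{reduced}} \approx c\,n\,k
  \;\Longrightarrow\; \frac{T_{\text{full}}}{T_{\text{reduced}}} = \frac{d}{k} = \frac{44}{5} = 8.8,\\
\frac{T_{\text{DL}}}{T_{\text{reduced}}} &\approx \frac{e\,n\,d\,h}{n\,k} = \frac{e\,d\,h}{k},
  \qquad
\frac{T_{\text{GA}}}{T_{\text{reduced}}} \approx \frac{G\,p\,n\,d}{n\,k} = \frac{G\,p\,d}{k}.
\end{aligned}
```

Only the first ratio reduces to a number. The other two grow with e·h and G·p, respectively, so the gap widens with model size and search budget.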

Although feature selection techniques have been widely applied in intrusion detection research, existing studies on the ToN-IoT and NF-ToN-IoT datasets primarily rely on standalone statistical filters or single wrapper-based optimization methods. Such approaches often overlook inherent dataset biases introduced by predefined IP addresses and ports and do not evaluate the robustness of selected features across multiple optimization strategies. Consequently, there remains a research gap in developing a bias-aware, lightweight, and robust feature selection framework tailored for realistic IoT intrusion detection.

Contributions

A bias-aware hybrid feature selection framework that integrates domain knowledge–based filtering with wrapper-based optimization techniques for refining the ToN-IoT dataset.
Explicit identification and mitigation of IP- and port-based bias inherent in the ToN-IoT dataset by eliminating artificial identifiers that limit real-world generalization.
Domain knowledge–driven filtering to remove irrelevant and weakly informative network features based on cyberattack behavior and network impact analysis.
Wrapper-based feature optimization using Forward Feature Selection, Backward Feature Elimination, and Genetic Algorithm to identify a minimal and robust common feature subset.
Construction of a lightweight reduced dataset that preserves multiclass intrusion detection performance while significantly reducing computational complexity.
Comprehensive performance evaluation of the proposed reduced dataset using accuracy, precision, recall, F1-score, confusion matrices, and Cohen’s Kappa.
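The wrapper stage listed above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The `score` argument is a hypothetical callback that would, in practice, train and validate a classifier (for example, a Decision Tree) on the given feature subset, and the stopping rules are assumptions. A genetic algorithm would supply a third subset in the same form, and `common_subset` keeps only the features chosen by every method.

```python
from typing import Callable, FrozenSet, Iterable, List, Sequence, Set

# Hypothetical scoring callback: higher is better (e.g. validation accuracy).
Score = Callable[[FrozenSet[str]], float]


def forward_selection(features: Sequence[str], score: Score, max_features: int) -> List[str]:
    """Greedy forward selection: repeatedly add the feature that raises the score most."""
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining and len(selected) < max_features:
        # Score every remaining feature added to the current subset.
        candidate, cand_score = max(
            ((f, score(frozenset(selected + [f]))) for f in remaining),
            key=lambda pair: pair[1],
        )
        if cand_score <= best_score:
            break  # no addition improves the score, so stop early
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = cand_score
    return selected


def backward_elimination(features: Sequence[str], score: Score, min_features: int) -> List[str]:
    """Greedy backward elimination: repeatedly drop the feature whose removal hurts least."""
    selected = list(features)
    best_score = score(frozenset(selected))
    while len(selected) > min_features:
        # Score the subset obtained by removing each feature in turn.
        candidate, cand_score = max(
            ((f, score(frozenset(x for x in selected if x != f))) for f in selected),
            key=lambda pair: pair[1],
        )
        if cand_score < best_score:
            break  # every removal lowers the score, so stop
        selected.remove(candidate)  # ties favour the smaller subset
        best_score = cand_score
    return selected


def common_subset(*subsets: Iterable[str]) -> Set[str]:
    """Features chosen by every selection method (empty if no subsets are given)."""
    sets = [set(s) for s in subsets]
    return set.intersection(*sets) if sets else set()
```

For instance, with a toy score that sums per-feature weights {a: 3, b: 2, c: 1, x: -1, y: -1}, both searches settle on a, b, and c. Requiring agreement across methods is what makes the final subset less sensitive to the quirks of any single search strategy.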

Proposed work

Dataset description and attack scenarios

Attacks in ToN-IoT

The dataset contains labels corresponding to multiple types of security attacks: backdoor, DDoS (Distributed Denial of Service), DoS (Denial of Service), injection, MITM (Man in the Middle), password, ransomware, scanning, XSS, and benign entries. Understanding these attack types is essential for designing efficient intrusion detection systems. Table 2 lists the offensive host IPs associated with different types of network attacks.

Table 2.

Offensive Host IPs.

Attack Category	Offensive Host IPs
Scanning, DoS, DDoS, Injection, Password	192.168.1.{30–39}
MITM	192.168.1.{31,34}
Ransomware	192.168.1.{33,37}
XSS	192.168.1.{32,35,36,39}

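The bias described above can be made concrete with a short Python sketch. It expands the host sets in Table 2 and builds a "detector" that looks only at the source IP address. The names `expand`, `ALL_OFFENSIVE`, and `naive_ip_rule` are hypothetical, and the rule is deliberately naive. It encodes the testbed's attacker addresses rather than any attack behaviour, so an attacker operating from any other address would not be flagged.

```python
import ipaddress
import re
from typing import Dict, Set

# Offensive host sets transcribed from Table 2 (last octet on 192.168.1.x).
OFFENSIVE_HOSTS: Dict[str, str] = {
    "Scanning, DoS, DDoS, Injection, Password": "192.168.1.{30–39}",
    "MITM": "192.168.1.{31,34}",
    "Ransomware": "192.168.1.{33,37}",
    "XSS": "192.168.1.{32,35,36,39}",
}


def expand(pattern: str) -> Set[ipaddress.IPv4Address]:
    """Expand 'a.b.c.{x–y}' or 'a.b.c.{x,y,z}' into concrete IPv4 addresses."""
    match = re.fullmatch(r"(\d+\.\d+\.\d+)\.\{([\d,\-–]+)\}", pattern)
    if match is None:
        raise ValueError(f"unrecognised host pattern: {pattern!r}")
    prefix, body = match.groups()
    last_octets: Set[int] = set()
    for part in body.split(","):
        bounds = re.split(r"[-–]", part)  # accept hyphen or en dash ranges
        lo, hi = int(bounds[0]), int(bounds[-1])
        last_octets.update(range(lo, hi + 1))
    return {ipaddress.IPv4Address(f"{prefix}.{o}") for o in last_octets}


ALL_OFFENSIVE: Set[ipaddress.IPv4Address] = set().union(
    *(expand(p) for p in OFFENSIVE_HOSTS.values())
)


def naive_ip_rule(src_ip: str) -> bool:
    """Hypothetical IP-only 'detector': flags a flow iff its source is a listed host.

    It memorises the testbed's address plan and ignores traffic behaviour entirely.
    """
    return ipaddress.ip_address(src_ip) in ALL_OFFENSIVE
```

All ten offensive hosts fall inside the 192.168.1.0/24 testbed subnet. A model that latches onto `src_ip` can therefore score well on ToN-IoT while learning nothing transferable to a network with a different address plan.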

i)
Scanning Attack

Also known as a probing attack, this attack identifies active systems and open ports on a network. Tools such as Nessus and Nmap were used to perform scanning on the subnet 192.168.1.0/24, targeting both local and public vulnerable systems such as MQTT brokers and PHP-based web servers.

ii)
Denial of Service (DoS) Attack

In this attack, the network is flooded with excessive packets to overwhelm victim nodes and prevent them from processing legitimate requests. Python scripts using the Scapy library were employed to launch these DoS attacks in the testbed environment.

iii)
Distributed Denial of Service (DDoS) Attack

Multiple DoS attacks were executed simultaneously, converting several systems into bots under the attacker’s control. These compromised systems formed a botnet that generated large-scale traffic using Scapy-based Python scripts.

iv)
Ransomware Attack

Ransomware is a type of malware that prevents users from accessing their systems or data until a ransom is paid. This attack was executed using the Metasploit framework, exploiting the EternalBlue SMB vulnerability from Kali Linux hosts (192.168.1.{33,37}) against Windows systems and their IoT monitoring web pages.

v)
Backdoor Attack

A backdoor allows attackers to maintain persistent unauthorized access to IIoT systems. Using the Metasploit framework and the command run persistence -h, attackers executed bash scripts from Kali hosts (192.168.1.{33,37}) to retain continuous control.

vi)
Injection Attack

Injection attacks involve inserting malicious data or code into vulnerable web applications, disrupting normal operations. Various injection scenarios were implemented from offensive systems (192.168.1.{30,31,33,35}) targeting IoT network web applications.

vii)
Password Cracking Attack

This attack uses brute-force and dictionary-based techniques to guess system passwords. Offensive systems (192.168.1.{30,31,32,35,38}) executed automated password cracking attempts against IoT devices.

viii)
Man-in-the-Middle (MITM) Attack

MITM attacks intercept communication between two legitimate parties to steal or alter transmitted data. Attackers used offensive systems (192.168.1.{31,34}) to perform various MITM scenarios, capturing sensitive packets.

ix)
Cross-Site Scripting (XSS) Attack

XSS attacks inject malicious scripts into trusted web applications, causing browsers of unsuspecting users to execute the attacker’s code. Offensive systems (192.168.1.{32,35,36,39}) were used to perform these web-based injection attacks.

The ToN-IoT and NF-ToN-IoT datasets provide comprehensive, real-time IoT network and system data that capture multiple categories of cyberattacks. They serve as valuable benchmarks for developing and testing robust intrusion detection systems capable of handling diverse, heterogeneous IIoT environments.

In this work, the term dataset refinement refers to the elimination of biased, redundant, and non-generalizable features from the original ToN-IoT network dataset without altering raw data values or attack labels. The refinement process focuses on retaining only those network parameters that directly capture abnormal traffic behavior caused by cyberattacks, thereby improving generalization, reducing dimensionality, and lowering computational overhead.

The aim of this work is to detect intrusions present in the ToN-IoT dataset. As a first step toward intrusion detection in the considered dataset, the data must be preprocessed. The four different categories of captured IoT traffic present in the ToN-IoT dataset are:

Device Dataset: consists of sensed data values such as temperature, latitude, and longitude, accounting for four features for each device, such as a refrigerator, GPS tracker, Modbus, etc.
Linux Dataset: consists of data related to disk activities, memory activities, and process scheduling activities, accounting for 34 features.
Windows Dataset: consists of processor data, process data, and network data from Windows 7 and Windows 10 operating systems, accounting for 133 features.
Network Dataset: consists of all network monitoring parameters, accounting for 44 features.

Analyzing the characteristics of the different attacks present in the dataset, it is observed that they largely impact network performance rather than device-sensed data or Linux and Windows system performance. In addition, to maintain generality and avoid platform-specific intrusion detection, common underlying network monitoring parameters were considered in this work. Thus, the Train-Test Network dataset of ToN-IoT was used for intrusion detection. The features present in the ToN-IoT and NF-ToN-IoT datasets are tabulated in Tables 3 and 4.

Table 3.

44 Features present in the original dataset (ToN-IoT-Network Dataset).

ts	src_bytes	dns_query	ssl_version	http_uri	http_resp_mime_types
src_ip	dst_bytes	dns_qclass	ssl_cipher	http_referrer	weird_name
src_port	conn_state	dns_qtype	ssl_resumed	http_version	weird_addl
dst_ip	missed_bytes	dns_rcode	ssl_established	http_request_body_len	weird_notice
dst_port	src_pkts	dns_AA	ssl_subject	http_response_body_len	label
proto	src_ip_bytes	dns_RD	ssl_issuer	http_status_code	type
service	dst_pkts	dns_RA	http_trans_depth	http_user_agent
duration	dst_ip_bytes	dns_rejected	http_method	http_orig_mime_types

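Because Table 3 is printed as a flattened grid, the following Python sketch transcribes its 46 entries (44 input features plus the `label` and `type` targets) and groups them by name prefix. Mapping the `dns_`, `ssl_`, `http_`, and `weird_` prefixes to the DNS, SSL, HTTP, and Violation activity sections is an assumption based on the feature names. The unprefixed features are left in one combined bucket because this text does not spell out how they split between the Connection and Statistical activity sections. The names `TABLE3` and `group_features` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Feature names transcribed from Table 3; 'label' and 'type' are targets, not inputs.
TABLE3: List[str] = """
ts src_ip src_port dst_ip dst_port proto service duration
src_bytes dst_bytes conn_state missed_bytes src_pkts src_ip_bytes
dst_pkts dst_ip_bytes dns_query dns_qclass dns_qtype dns_rcode dns_AA
dns_RD dns_RA dns_rejected ssl_version ssl_cipher ssl_resumed
ssl_established ssl_subject ssl_issuer http_trans_depth http_method
http_uri http_referrer http_version http_request_body_len
http_response_body_len http_status_code http_user_agent
http_orig_mime_types http_resp_mime_types weird_name weird_addl
weird_notice label type
""".split()

# Assumed prefix-to-section mapping (see lead-in).
PREFIX_GROUPS: Dict[str, str] = {"dns_": "DNS", "ssl_": "SSL", "http_": "HTTP", "weird_": "Violation"}
TARGETS = {"label", "type"}


def group_features(names: List[str]) -> Dict[str, List[str]]:
    """Bucket feature names by prefix; unprefixed inputs share one combined bucket."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        if name in TARGETS:
            groups["Target"].append(name)
            continue
        for prefix, group in PREFIX_GROUPS.items():
            if name.startswith(prefix):
                groups[group].append(name)
                break
        else:  # no prefix matched
            groups["Connection/Statistical"].append(name)
    return dict(groups)
```

For Table 3 this yields 16 unprefixed features and 8 DNS, 6 SSL, 11 HTTP, and 3 Violation features, which together account for the 44 inputs.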

Table 4.

12 Features present in NF-ToN-IoT Dataset.

IPV4_SRC_ADDR	L7_PROTO
IPV4_DST_ADDR	IN_BYTES
L4_SRC_PORT	OUT_BYTES
L4_DST_PORT	IN_PKTS
PROTOCOL	OUT_PKTS
TCP_FLAGS	FLOW_DURATION_MILLISECONDS


The total number of samples in each category is shown in Fig. 1.

While the ToN-IoT and NF-ToN-IoT datasets provide comprehensive IoT and network traffic information, the focus of this work is not on dataset construction but on refining feature representation to improve intrusion detection efficiency and generalization.

The complete flow of the proposed work is depicted in Fig. 2 and is explained as follows.

The core contribution of this work lies in the proposed bias-aware hybrid feature selection framework, designed to address dataset bias and feature redundancy in IoT intrusion detection. This section details the methodological design, including domain knowledge–based filtering and wrapper-based optimization, which together enable the construction of a lightweight yet effective feature set.

Preprocessing

Based on the limitations discussed in Section "Limitations of existing ToN-IoT and NF-ToN-IoT datasets", IP address, port number, and timestamp features were deliberately removed to avoid dataset bias and improve real-world applicability. The considered dataset contains 461,043 rows, with 42 features and 3 meta features. The 42 features are those given in the dataset description above, and the three meta features are source IP, destination IP, and DNS query. The features are grouped into six sections, namely Connection Activity, Statistical Activity, DNS Activity, SSL Activity, HTTP Activity, and Violation Activity. Table 5 gives the description of all features in the dataset, as provided by the dataset authors¹.
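The identifier removal described above amounts to dropping five columns while passing every other value, including `label` and `type`, through unchanged. Below is a minimal Python sketch using only the standard `csv` module. The column names come from Table 3, while the names `BIASED_COLUMNS`, `refine_rows`, and `refine_csv` are hypothetical. `dns_query` is kept here because the text lists only IP address, port, and timestamp features as removed.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, TextIO

# Identifier columns dropped to avoid IP/port/timestamp bias (names from Table 3).
BIASED_COLUMNS = frozenset({"ts", "src_ip", "src_port", "dst_ip", "dst_port"})


def refine_rows(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield each row without the biased identifier columns; other values are unchanged."""
    for row in rows:
        yield {col: val for col, val in row.items() if col not in BIASED_COLUMNS}


def refine_csv(src: TextIO, dst: TextIO) -> List[str]:
    """Copy a ToN-IoT-style CSV from src to dst minus the biased columns.

    Returns the names of the columns that were kept, in their original order.
    """
    reader = csv.DictReader(src)
    kept = [col for col in (reader.fieldnames or []) if col not in BIASED_COLUMNS]
    writer = csv.DictWriter(dst, fieldnames=kept)
    writer.writeheader()
    writer.writerows(refine_rows(reader))
    return kept


if __name__ == "__main__":
    # Synthetic one-row input for illustration; only the header names follow Table 3.
    sample = io.StringIO(
        "ts,src_ip,src_port,dst_ip,dst_port,proto,duration,label,type\n"
        "0,192.168.1.30,1000,192.168.1.1,80,tcp,0.5,1,scanning\n"
    )
    out = io.StringIO()
    print(refine_csv(sample, out))  # ['proto', 'duration', 'label', 'type']
```

Streaming rows through a generator keeps memory use independent of the number of flows, which matters for a file with several hundred thousand rows.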

Table 5.

Feature Description.

ID	Feature	Type	Description
1	ts	Time	Timestamp of connection between flow identifiers
2	src_ip	String	Source IP addresses which originate endpoints’ IP addresses
3	src_port	Number	Source ports which originate endpoint’s TCP/UDP ports
4	dst_ip	String	Destination IP addresses which respond to endpoint’s IP addresses
5	dst_port	Number	Destination ports which respond to endpoint’s TCP/UDP ports
6	proto	String	Transport layer protocols of flow connections
7	service	String	Dynamically detected protocols, such as DNS, HTTP and SSL
8	duration	Number	The time of the packet connections, estimated by subtracting the ‘time of first packet seen’ from the ‘time of last packet seen’
9	src_bytes	Number	Source bytes which are originated from payload bytes of TCP sequence numbers
10	dst_bytes	Number	Destination bytes which are responded payload bytes from TCP sequence numbers
11	conn_state	String	Various connection states, such as S0 (connection without reply), S1 (connection established), and REJ (connection attempt rejected)
12	missed_bytes	Number	Number of missing bytes in content gaps
13	src_pkts	Number	Number of original packets which is estimated from source systems
14	src_ip_bytes	Number	Number of original IP bytes which is the total length of IP header field of source systems
15	dst_pkts	Number	Number of destination packets which is estimated from destination systems
16	dst_ip_bytes	Number	Number of destination IP bytes which is the total length of IP header field of destination systems
17	dns_query	String	Domain name subjects of the DNS queries
18	dns_qclass	Number	Value that specifies the DNS query class
19	dns_qtype	Number	Value which specifies the DNS query types
20	dns_rcode	Number	Response code values in the DNS responses
21	dns_AA	Bool	Authoritative answers of DNS, where T denotes server is authoritative for query
22	dns_RD	Bool	Recursion desired of DNS, where T denotes request recursive lookup of query
23	dns_RA	Bool	Recursion available of DNS, where T denotes server supports recursive queries
24	dns_rejected	Bool	DNS rejection, where the DNS queries are rejected by the server
25	ssl_version	String	SSL version which is offered by the server
26	ssl_cipher	String	SSL cipher suite which the server chose
27	ssl_resumed	Bool	SSL flag indicates the session that can be used to initiate new connections, where T refers to the SSL connection is initiated
28	ssl_established	Bool	SSL flag indicates establishing connections between two parties, where T refers to establishing the connection
29	ssl_subject	String	Subject of the X.509 cert offered by the server
30	ssl_issuer	String	Trusted owner/originator of SSL and digital certificate (certificate authority)
31	http_trans_depth	Number	Pipelined depth into the HTTP connection
32	http_method	String	HTTP request methods such as GET, POST and HEAD
33	http_uri	String	URIs used in the HTTP request
35	http_version	String	The HTTP versions utilised such as V1.1
36	http_request_body_len	Number	Actual uncompressed content sizes of the data transferred from the HTTP client
37	http_response_body_len	Number	Actual uncompressed content sizes of the data transferred from the HTTP server
38	http_status_code	Number	Status codes returned by the HTTP server
39	http_user_agent	String	Values of the User-Agent header in the HTTP protocol
40	http_orig_mime_types	String	Ordered vectors of mime types from source system in the HTTP protocol
41	http_resp_mime_types	String	Ordered vectors of mime types from destination system in the HTTP protocol
42	weird_name	String	Names of anomalies/violations related to protocols that happened
43	weird_addl	String	Additional information is associated to protocol anomalies/violations
44	weird_notice	Bool	It indicates if the violation/anomaly was turned into a notice
45	label	Number	Tag normal and attack records, where 0 indicates normal and 1 indicates attacks
46	type	String	Tag attack categories, such as DoS, DDoS and backdoor attacks, and normal records


Hybrid feature selection approach

Filter based feature elimination (based on domain knowledge)

Considering the attacks present in the dataset, such as Denial of Service (DoS) attacks and Distributed Denial of Service (DDoS) attacks, the intruder attempts to capture access points or neighboring nodes by flooding them with useless packets, thereby hindering services to legitimate users by overwhelming the network. This behavior impacts the number of packets or bytes originated and received by a particular node. DoS and DDoS nodes generate a large number of useless packets when compared to normal nodes, thereby exhausting network resources.

Other attacks, such as injection attacks, can cause data loss or theft, denial of service, loss of data integrity, and may also compromise the entire system. Attacks such as backdoor attacks can illegitimately gain access to a system and exploit its resources, which may, in turn, lead to a Denial of Service attack. Thus, backdoor-based illegitimate access allows a large amount of packet flow between multiple malicious systems, potentially resulting in DoS conditions.

Ransomware attacks are currently used to create DoS scenarios for victims. In ransomware and ransom DoS attacks, data is not stolen; instead, victims or organizations are sent emails demanding ransom payments. A large number of such request emails may be initiated from malicious IP addresses using privacy-based email providers. Consequently, malicious intruders may generate a high volume of such requests, increasing the network traffic originating from these IP addresses.

Man-in-the-Middle (MITM) attacks interrupt and relay communication between two entities. MITM attacks exist in various forms, including IP spoofing, Domain Name System spoofing, HTTP spoofing, Secure Sockets Layer hijacking, email hijacking, Wi-Fi eavesdropping, session hijacking, and cache poisoning. In all these attack types, considering their mechanisms, there is an increase in the number of control packets exchanged to gain access between systems. This results in deliberate changes in the network parameters of the intruder nodes.

Scanning attacks are typically performed to gather sensitive information about a victim or organization, such as IP addresses, open ports and services, operating system details, and user account information. Collecting such information involves the exchange of significant control traffic. Cross-site scripting (XSS) attacks involve sending malicious links to users to gain access to their resources. These attacks typically target multiple users, increasing the traffic flow generated by the intruder.

Thus, all the attacks present in the dataset affect network traffic and flow patterns, increasing both the data packets and the control packets exchanged within the network. Therefore, analyzing the connection activity and statistical activity parameters of the ToN-IoT dataset can significantly aid in detecting the presence of these attacks in the network. The network parameters present in the dataset are:

Timestamp
Source IP
Source Port
Destination IP
Destination Port
Proto
Service
Duration
Source bytes
Destination Bytes
Connection State
Missed Bytes
Source packets
Source IP Bytes
Destination Packets
Destination IP Bytes

After careful analysis of the attack characteristics obtained from modeling the different attacks, it was observed that all attacks were initiated from specific IP addresses and ports. Source IP, destination IP, source port, and destination port were therefore eliminated from the dataset to avoid bias and improve generalization in detection. The timestamp feature also conveys no meaningful information for attack classification and was omitted. According to the attack modeling and analysis, the proto feature only indicates whether the transport-layer protocol is TCP or UDP, so it was removed as well. The remaining features, namely service, connection state, and missed bytes, do not convey useful information regarding the transfer of data or control packets and were likewise excluded.

Thus, after omitting these unnecessary features from the Train-Test Network dataset, the remaining seven features were considered relevant. The seven features selected and carried forward to the next step of the feature selection process are:

Duration
Source bytes
Destination Bytes
Source packets
Source IP Bytes
Destination Packets
Destination IP Bytes
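The filter step above amounts to dropping nine of the sixteen connection and statistical activity columns. The sketch below, assuming pandas and the column names from Table 5, expresses that rule as a column filter; it illustrates the rule and is not the authors' preprocessing code.

```python
import pandas as pd

# Connection and statistical activity columns listed above (Table 5 names).
CONN_STATS_COLUMNS = [
    "ts", "src_ip", "src_port", "dst_ip", "dst_port", "proto", "service",
    "duration", "src_bytes", "dst_bytes", "conn_state", "missed_bytes",
    "src_pkts", "src_ip_bytes", "dst_pkts", "dst_ip_bytes",
]

# Identifiers removed to avoid address/port and time bias.
BIAS_COLUMNS = {"ts", "src_ip", "src_port", "dst_ip", "dst_port"}
# Columns judged not to reflect the volume of data or control packets.
LOW_RELEVANCE_COLUMNS = {"proto", "service", "conn_state", "missed_bytes"}


def filter_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return the seven filtered flow-volume features plus any label columns."""
    removed = BIAS_COLUMNS | LOW_RELEVANCE_COLUMNS
    keep = [c for c in CONN_STATS_COLUMNS if c not in removed]
    labels = [c for c in ("label", "type") if c in df.columns]
    return df[keep + labels]
```

For example, `filter_features(pd.read_csv(path))` would return the seven-feature frame, where `path` is a hypothetical location of the Train-Test Network CSV file.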

The entire process of analyzing the performance of each feature against the considered objective function is carried out using a base classifier. The classifier chosen in this work is the Decision Tree, owing to its low computational overhead and interpretability. Moreover, the Decision Tree has been shown to attain high accuracy on this particular dataset in previous studies¹⁹.

Wrapper methods

As part of the hybrid feature selection approach, as depicted in Fig. 3, techniques such as forward feature selection, backward feature elimination, and genetic algorithm–based feature selection were applied to the features obtained from the filter-based approach.

The selected wrapper methods were chosen for the following reasons:

Forward Feature Selection provides a computationally efficient greedy strategy for identifying minimal feature subsets.
Backward Feature Elimination validates feature importance by assessing performance degradation upon feature removal.
Genetic Algorithm performs a global search that helps avoid local optima, improving the robustness of the selected feature subset.

Using these three complementary wrapper methods allows feature relevance to be cross-checked and helps identify the common features that consistently contribute to high detection accuracy.

1. Forward Feature Selection

In the forward feature selection process, each feature in the entire feature space is added iteratively, one by one, and the impact of each added feature is analyzed by computing its performance against the selected objective function. The objective function may be accuracy, error rate, kappa value, etc. Thus, the efficacy of each added feature is evaluated with respect to the chosen objective function.

The overall workflow of the hybrid feature selection approach, which incorporates both filter-based and wrapper-based methods, is shown in Fig. 3. As discussed in the previous section, filtering reduces the full feature space of 43 features to 7 features. The next step applies forward feature selection to these seven relevant features using the chosen classifier. In Fig. 3, the red-outlined block represents the forward feature selection process. The selected subset is initially empty, and the seven features are added one at a time. In each iteration, accuracy is computed using the Decision Tree classifier. If adding a feature improves accuracy relative to the previous iteration, the feature is retained in the optimal feature subset; otherwise, it is discarded.
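A minimal sketch of the single-pass forward selection described above, assuming scikit-learn's `DecisionTreeClassifier` as a stand-in for the Decision Tree used on the KNIME platform, a NumPy matrix `X` holding the seven filtered features in the order listed, and a stratified 70/30 split with `random_state=42` as in the experimental setup. The helper name `holdout_accuracy` is hypothetical.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def holdout_accuracy(X, y, cols, seed=42):
    """Decision Tree accuracy on a stratified 70/30 split using only `cols`."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[:, cols], y, test_size=0.3, stratify=y, random_state=seed
    )
    clf = DecisionTreeClassifier(random_state=seed).fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))


def forward_selection(X, y):
    """Single pass: keep a feature only if it raises accuracy over the last step."""
    selected, best_acc = [], 0.0
    for j in range(X.shape[1]):  # features in their given order
        acc = holdout_accuracy(X, y, selected + [j])
        if acc > best_acc:  # improvement over the previous iteration
            selected, best_acc = selected + [j], acc
    return selected, best_acc
```

The textbook forward selection instead evaluates every remaining feature in each round and adds the best one; the loop here follows the one-feature-at-a-time procedure described in the text.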

In this manner, each feature is evaluated and the corresponding accuracy is recorded. The subset achieving the highest accuracy constitutes the optimal feature subset. The accuracies obtained for the seven features and the selected optimal features are tabulated in Fig. 4 (as obtained from the machine learning platform).

Fig. 4 — Obtained Accuracy of Classification by Addition of features one by one using Forward Feature Selection.

From the forward feature selection results in Fig. 4, it is evident that after features 1, 2, 3, 4, and 5 are added sequentially, the maximum accuracy attained is 0.973. Adding features 6 and 7 yields no further improvement in accuracy. Hence, the first five features were selected as the optimal feature set, maintaining an accuracy of 0.973, and were carried forward for further analysis.

As shown in Fig. 5, the filtered table summarizes the features retained after forward feature selection, indicating that the first five features contribute to the highest achieved accuracy.

2. Backward Feature Elimination technique

Backward feature elimination is the reverse of forward feature selection: the entire feature space is considered initially, and features are eliminated one at a time. After each elimination, the accuracy (i.e., the objective function) is computed to determine whether removing the feature has a negative impact on the attained accuracy. If eliminating a feature reduces accuracy, the feature is considered important and is retained in the optimal feature subset; otherwise, it is discarded from the feature space.
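The elimination rule above can be sketched as follows, assuming scikit-learn's `DecisionTreeClassifier` as a stand-in for the platform's Decision Tree, a NumPy feature matrix `X`, and a stratified 70/30 split with `random_state=42`. The helper names are hypothetical, and treating "no loss in accuracy" as grounds for discarding a feature is our reading of the procedure.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def holdout_accuracy(X, y, cols, seed=42):
    """Decision Tree accuracy on a stratified 70/30 split using only `cols`."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[:, cols], y, test_size=0.3, stratify=y, random_state=seed
    )
    clf = DecisionTreeClassifier(random_state=seed).fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))


def backward_elimination(X, y):
    """Start from all features; drop each one whose removal does not hurt accuracy."""
    current = list(range(X.shape[1]))
    base_acc = holdout_accuracy(X, y, current)
    for j in list(current):
        if len(current) == 1:
            break  # never remove the last remaining feature
        trial = [c for c in current if c != j]
        acc = holdout_accuracy(X, y, trial)
        if acc >= base_acc:
            current, base_acc = trial, acc  # no loss: discard feature j
        # otherwise accuracy dropped, so feature j is important and stays
    return current, base_acc
```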

As discussed in the previous section, filtering reduces the full feature space of 43 features to 7 features. The next step applies backward feature elimination to these seven relevant features using the chosen classifier. This process is part of the hybrid feature selection framework shown in Fig. 3, where the backward feature elimination block is highlighted in blue. Initially, all seven features are considered together, and the accuracy is computed using the selected classifier. Each feature is then removed in turn, and the resulting accuracy is computed and tabulated, as shown in Fig. 6 (as obtained from the machine learning platform).

Fig. 6 — Obtained Accuracy of Classification by elimination of features one by one using Backward Feature Elimination Technique.

From the results in Fig. 6, it is observed that eliminating features 5, 4, 3, 2, and 1 reduces accuracy, indicating that these features are crucial for maintaining performance. In addition, the first feature, src_ip_bytes, is also important for preserving the initially attained accuracy (0.973). Thus, out of the seven features considered, six are identified as essential for maintaining the maximum accuracy of 0.973. Figure 7 shows a snapshot of the filtered results from the backward feature elimination process, indicating that removing key features leads to a noticeable reduction in classification accuracy.

Fig. 7 — Snapshot of the filtered table from backward feature elimination.

3. Genetic algorithm based feature selection

Genetic algorithm–based feature selection searches for an optimal set of features using evolutionary operators. An initial population of candidate feature subsets is drawn from the entire feature space, and each subset is evaluated with the objective function. Subsets that improve accuracy are selected to form the next generation, crossover and mutation are applied to them, and the resulting subsets are again evaluated against the objective function. This process continues until a feature subset achieving maximum accuracy is obtained, and it terminates according to the stopping criterion given below.

Hyperparameters of Genetic Algorithm

$$\text{Stopping criterion} = \text{population size} \times (\text{maximum generations} + 1)$$

The population size was set to 20 and the maximum number of generations to 10, giving a stopping criterion of 20 × (10 + 1) = 220.
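The paper specifies only the population size (20) and the maximum number of generations (10). The sketch below reads the stopping criterion as a budget of 20 × (10 + 1) = 220 fitness evaluations: one evaluation of the initial population plus one per generation. The operators used here (tournament selection, uniform crossover, bit-flip mutation at rate 0.1, and single-elite replacement) and scikit-learn's `DecisionTreeClassifier` as the fitness model are all assumptions, not the configuration of the KNIME genetic algorithm.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

POP_SIZE = 20          # population size stated above
MAX_GENERATIONS = 10   # maximum number of generations stated above
BUDGET = POP_SIZE * (MAX_GENERATIONS + 1)  # 20 * 11 = 220 evaluations


def fitness(mask, X, y, seed=42):
    """Holdout Decision Tree accuracy of the subset encoded by a boolean mask."""
    if not mask.any():
        return 0.0  # an empty subset cannot classify anything
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[:, mask], y, test_size=0.3, stratify=y, random_state=seed
    )
    clf = DecisionTreeClassifier(random_state=seed).fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))


def tournament(pop, scores, rng):
    """Pick the fitter of two randomly drawn individuals (assumed operator)."""
    a, b = rng.integers(len(pop), size=2)
    return pop[a] if scores[a] >= scores[b] else pop[b]


def ga_select(X, y, mutation_rate=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = rng.random((POP_SIZE, n)) < 0.5  # random initial subsets
    scores = np.array([fitness(m, X, y) for m in pop])
    evaluations = POP_SIZE
    for _ in range(MAX_GENERATIONS):
        children = []
        for _ in range(POP_SIZE):
            p1, p2 = tournament(pop, scores, rng), tournament(pop, scores, rng)
            child = np.where(rng.random(n) < 0.5, p1, p2)  # uniform crossover
            child ^= rng.random(n) < mutation_rate         # bit-flip mutation
            children.append(child)
        new_pop = np.array(children)
        new_scores = np.array([fitness(m, X, y) for m in new_pop])
        evaluations += POP_SIZE
        # Elitism: the best parent replaces the worst child.
        worst, best = new_scores.argmin(), scores.argmax()
        new_pop[worst], new_scores[worst] = pop[best], scores[best]
        pop, scores = new_pop, new_scores
    best = scores.argmax()
    return np.flatnonzero(pop[best]).tolist(), float(scores[best]), evaluations
```

The call returns the indices of the best subset, its accuracy, and the number of evaluations spent, which equals the stopping criterion.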

As shown in Fig. 8, successive generations of the genetic algorithm explore different feature subsets, and the accuracy attained by each subset is evaluated in every generation to identify the optimal feature set.

Figure 9 presents a snapshot of the filtered feature subset obtained from the genetic algorithm–based feature selection approach.

Fig. 9 — Snapshot of the filtered table from Genetic algorithm based feature selection approach.

Proposed new reduced ToN_IoT dataset with 5 features using hybrid feature selection approach

Table 6 summarizes the best features selected by different wrapper-based feature selection methods, namely forward feature selection, backward feature elimination, and the genetic algorithm.

Table 6.

Best Features selected from different wrapper methods.

S.N	Forward Feature Selection	Backward Feature Elimination	Genetic Algorithm
1	duration	duration	duration
2	src_bytes	src_bytes	src_bytes
3	dst_bytes	dst_bytes	dst_bytes
4	src_ip_bytes	src_pkts	src_pkts
5	dst_ip_bytes	src_ip_bytes	src_ip_bytes
6	-	dst_ip_bytes	dst_ip_bytes


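The effectiveness of the proposed hybrid feature selection method can be explained mathematically and structurally. The hybrid method evaluates a candidate feature subset $S$ in two stages:

(A) Filter-Based Reduction

Domain-redundant and bias-inducing features are removed based on their contribution $R(f)$ to discriminatory behavior:

$$F_{\text{filter}} = \{\, f \in F : R(f) \ge \tau \,\}$$

where $F$ is the full feature set, $R(f)$ represents domain relevance (traffic influence, control packet impact, anomaly sensitivity), and $\tau$ is the minimum acceptable relevance threshold.

This reduces the search space from 43 to 7 features.

(B) Wrapper-Based Optimization

Each wrapper algorithm optimizes a feature subset based on classifier accuracy:

$$S_w^{*} = \arg\max_{S \subseteq F_{\text{filter}}} \operatorname{Acc}(C, S)$$

where $C$ is the classifier (Decision Tree/KNN) and $\operatorname{Acc}(C, S)$ is the accuracy obtained when $C$ is trained over subset $S$.

Each wrapper $w \in \{\text{FFS}, \text{BFE}, \text{GA}\}$ contributes its subset $S_w^{*}$, and the final feature set is their intersection:

$$S^{*} = S_{\text{FFS}}^{*} \cap S_{\text{BFE}}^{*} \cap S_{\text{GA}}^{*}$$

The sixth feature was excluded because it failed the stability criterion:

$$\operatorname{Acc}(C, S^{*} \cup \{f_6\}) - \operatorname{Acc}(C, S^{*}) < \varepsilon$$

where $\varepsilon$ is the minimum improvement threshold (the feature gave no accuracy gain while increasing time complexity).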

Thus, across the wrapper methods considered, six of the seven features obtained from the filter-based elimination step play a major role in attaining the expected maximum accuracy. Of these six, the aim is to minimize the number of features while maintaining maximum accuracy. After careful analysis, and considering the time taken to execute the feature selection algorithms, the five features common to all three algorithms were chosen as the final feature set for classifying the attacks considered in this work. A new reduced ToN_IoT dataset with the features duration, src_bytes, dst_bytes, src_IP_bytes, and dst_IP_bytes is therefore proposed.

Since the hybrid feature selection pipeline incorporates stochastic components (particularly the Genetic Algorithm), convergence behavior may vary across runs. To reduce the influence of randomness, the algorithm was executed multiple times during the development phase, and the same five features were repeatedly selected with only marginal variations in ordering. This indicates that the chosen feature subset is stable rather than dependent on a single-run outcome.

The results obtained from forward feature selection, backward feature elimination, and genetic algorithm optimization were analyzed independently and then combined using an intersection-based strategy. Features that consistently appeared in the optimal subsets across all three wrapper methods were considered robust and less sensitive to the choice of optimization technique. Although six features were identified as important by at least one wrapper method, only five features were commonly selected across all three methods. Inclusion of the sixth feature did not yield any measurable improvement in classification accuracy, while increasing computational overhead. Therefore, the final feature set was intentionally restricted to the five common features to achieve an optimal balance between detection performance and computational efficiency.
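The intersection-based strategy can be checked directly against Table 6. The sketch below simply transcribes the three subsets from that table into Python sets (lower-case names following Table 5) and shows which features survive the intersection and which one is dropped.

```python
# Best subsets per wrapper method, transcribed from Table 6.
forward = {"duration", "src_bytes", "dst_bytes", "src_ip_bytes", "dst_ip_bytes"}
backward = {"duration", "src_bytes", "dst_bytes", "src_pkts",
            "src_ip_bytes", "dst_ip_bytes"}
genetic = {"duration", "src_bytes", "dst_bytes", "src_pkts",
           "src_ip_bytes", "dst_ip_bytes"}

common = forward & backward & genetic      # selected by every wrapper
candidates = forward | backward | genetic  # selected by at least one wrapper
excluded = candidates - common

print(sorted(common))
print(sorted(excluded))  # ['src_pkts']
```

The single excluded feature, src_pkts, is the sixth feature that failed the stability criterion.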

The significance of the above-listed features, as depicted in Fig. 10, in effectively detecting multiple attacks lies in the fact that all these features contribute crucial data used to calculate the number of outgoing and incoming packets and bytes for any node in the network. Therefore, any abnormality in the volume of outgoing or incoming data and their associated statistics indicates the presence of an intruder, as such activities consume an abnormal amount of network traffic.

As tabulated in Table 7, forward feature selection required the least time among the feature selection approaches. The time required to apply these algorithms also depends on the hardware. Since all algorithms were executed on the same system, the reported timings should be read as a relative reference that may vary with processor speed and other hardware characteristics.

Table 7.

Time Complexity of various wrapper methods.

Technique used	Time consumed to train
Forward Feature Selection	576675 ms
Backward Feature Elimination	636742 ms
Genetic Algorithm	859059 ms


The effectiveness and efficiency of the proposed hybrid feature selection approach stem from its two-stage design. In the first stage, domain knowledge–based filtering removes biased identifiers and weakly relevant features, significantly reducing the search space and the computational overhead of subsequent optimization. In the second stage, multiple wrapper-based methods evaluate feature subsets in conjunction with the classifier, allowing the method to capture feature interactions that filter methods alone do not identify. By selecting features consistently identified across forward selection, backward elimination, and genetic algorithm optimization, the proposed framework improves robustness and reduces the risk of overfitting. This hybrid strategy balances detection accuracy and computational efficiency, making it suitable for real-time IoT intrusion detection.

Classification model

A Decision Tree classification model was trained on the selected features. The Decision Tree works as follows.

The Decision Tree, as depicted in Fig. 11, is one of the most widely used supervised machine learning techniques for classification. A tree-like structure is constructed from the given dataset using two types of nodes: decision nodes (which make decisions) and leaf nodes (which represent outcomes). The following steps are performed to construct the tree:

Step 1: The entire dataset, denoted as D, is taken as the root node.

Step 2: An Attribute Selection Measure (ASM), such as information gain or the Gini index, is used to identify the best feature. In each iteration, the unvisited feature with the best score (for example, the highest information gain) is selected.

Step 3: The dataset is divided into subsets according to the values of the selected feature.

Step 4: A decision tree node is constructed using the selected best attribute.

Step 5: New decision tree nodes are created iteratively using the subsets generated in Step 3. This process is repeated until no further classification is possible and the final node becomes a leaf node.
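Steps 2 and 3 can be illustrated for a single numeric feature using the Gini index, the measure configured in Table 10. The sketch below, written in plain Python for clarity, scans candidate thresholds (midpoints between consecutive distinct values) and keeps the binary split with the lowest weighted Gini impurity. The function names are hypothetical, and restricting splits to binary thresholds mirrors common implementations rather than a detail stated in this paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best split `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split yet: impurity of the parent node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Toy example: small flows are normal, large flows are DDoS.
print(best_split([1, 2, 3, 10, 11, 12], ["normal"] * 3 + ["ddos"] * 3))
```

On this toy input the best threshold is 6.5, which yields two pure child nodes (weighted impurity 0.0).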

Empty and insignificant branches are significant drawbacks of decision trees. They arise from unwanted features in the dataset, which create unnecessary branches and complicate the tree's structure. The proposed hybrid feature selection approach reduces this problem.

Results and discussion

Experimental setup

All experiments in this study were conducted using a combination of Python and the KNIME Analytics Platform. Python-based implementations were primarily used for data preprocessing, feature selection, and classification experiments, while KNIME was employed for workflow-based validation and cross-verification of results to ensure consistency.

The experimental evaluation was performed on a standard desktop computing environment equipped with an Intel Core i5 processor, 8 GB RAM, and running a 64-bit Windows operating system. No GPU acceleration was utilized during experimentation.

The evaluation was conducted using the original ToN-IoT dataset, considering both binary classification (benign vs. attack) and multi-class attack classification scenarios. The dataset was divided into training and testing sets using a consistent train–test split strategy, which was applied uniformly across all experiments.

All experiments were conducted using a stratified train–test split to preserve the original class distribution in both binary and multi-class settings. Unless otherwise stated, 70% of the data was used for training and 30% for testing, with a fixed random seed (random_state = 42) to ensure reproducibility. Performance was evaluated using accuracy, precision, recall, F1-score, confusion matrices, and Cohen’s Kappa.

To confirm the robustness of the proposed reduced feature set, stratified fivefold cross-validation was performed with the selected classifiers, ensuring balanced class representation across folds. The cross-validation results showed consistent performance with low variance and followed the same trends as the train–test split evaluation, closely matching its results. This indicates that the reported performance is stable and not sensitive to a specific data partition.
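The evaluation protocol described above (a stratified 70/30 split with `random_state = 42` plus stratified fivefold cross-validation) can be sketched with scikit-learn as follows. `KNeighborsClassifier` uses its default k = 5, which is an assumption since k is not stated here, and no feature scaling is applied; the function name `evaluate` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def evaluate(X, y, seed=42):
    """Holdout accuracy (stratified 70/30) and stratified 5-fold CV mean/std."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    models = {
        "DT": DecisionTreeClassifier(random_state=seed),
        "KNN": KNeighborsClassifier(),  # default k = 5 (assumed)
    }
    results = {}
    for name, model in models.items():
        holdout = model.fit(X_tr, y_tr).score(X_te, y_te)
        # cross_val_score clones the model, so the fit above does not leak in.
        folds = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        results[name] = (holdout, float(np.mean(folds)), float(np.std(folds)))
    return results
```

A low standard deviation across folds, with a mean close to the holdout accuracy, is what the stability claim above corresponds to.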

To prevent information leakage, feature selection was performed exclusively on the training data, and the selected feature subsets were subsequently applied to the test data. This ensured fair and unbiased evaluation of classification performance.
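One way to enforce this in scikit-learn is to put the selector inside a `Pipeline`, so that selection is refit on the training portion only and merely applied to the test rows. The sketch below uses scikit-learn's `SequentialFeatureSelector` (a greedy forward search with internal cross-validation) as a swapped-in stand-in for the KNIME wrappers; it is not the authors' workflow, and the function names are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier


def build_pipeline(n_features=5, seed=42):
    """Selector + classifier, so selection is refit on each training set."""
    selector = SequentialFeatureSelector(
        DecisionTreeClassifier(random_state=seed),
        n_features_to_select=n_features,
        direction="forward",
        scoring="accuracy",
        cv=5,
    )
    return Pipeline([
        ("select", selector),
        ("clf", DecisionTreeClassifier(random_state=seed)),
    ])


def leakage_safe_eval(X, y, n_features=5, seed=42):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    pipe = build_pipeline(n_features, seed).fit(X_tr, y_tr)  # sees X_tr only
    holdout = pipe.score(X_te, y_te)  # test rows are only transformed
    selected = np.flatnonzero(pipe.named_steps["select"].get_support()).tolist()
    # Outer CV: the selector is re-run inside every training fold.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    cv_scores = cross_val_score(build_pipeline(n_features, seed), X, y, cv=cv)
    return holdout, selected, cv_scores
```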

Decision Tree (DT) and K-Nearest Neighbour (KNN) classifiers were employed to assess the effectiveness of the proposed reduced feature set. These classifiers were chosen due to their low computational overhead, interpretability, and suitability for evaluating lightweight intrusion detection systems in resource-constrained IoT environments.

Performance evaluation was carried out using multiple metrics, including accuracy, precision, recall, F1-score, confusion matrices, and Cohen’s Kappa, to ensure a balanced assessment under class imbalance conditions. All experiments were conducted consistently across both the original and reduced feature sets to enable fair comparison.

Time complexity comparisons presented in this work reflect relative reductions in training and inference time achieved through feature dimensionality reduction under the same hardware and software environment, rather than absolute runtime benchmarking across different platforms.

Hybrid Feature selection approach

The machine learning models in this work were trained using the KNIME Analytics Platform, which provides ready-made nodes for a wide range of machine learning algorithms. An example workflow is shown in Fig. 12.

Fig. 12 — Forward Feature Selection Approach for Decision Tree Learner and Predictor created using KNIME Analytics Platform.

Various workflows were implemented on the proposed reduced dataset. The workflow in Fig. 12 illustrates the forward feature selection technique using a Decision Tree classifier, followed by analysis of the reduced dataset with the same classifier. All workflows used a 70/30 train–test split (70% of the data for training and 30% for testing), and accuracy was used as the objective function for feature selection.

Similar workflows were also implemented using other dimensionality reduction techniques. After detailed study and evaluation, the hybrid approach was found to be effective. As described in Section "Proposed work" on the hybrid feature selection approach, the best five features were selected. The resulting feature table and reduced dataset for each feature selection method are presented in detail in Section "Proposed work". Among the three wrapper methods tested, forward feature selection required the least time to identify the optimal features, as shown in Fig. 13.

Fig. 13 — Time Complexity of Different Wrapper methods used.

Statistical validation of selected features

a. Information gain–based feature relevance analysis

To confirm the efficacy of the selected five features, the Information Gain (IG) of each attribute was computed with respect to attack classification. Let $D$ denote the ToN-IoT dataset consisting of network flow samples, and let $Y$ represent the corresponding attack class labels. Information Gain measures the reduction in entropy of the class label achieved by observing a feature $X$, thereby quantifying its individual contribution to attack discrimination:

$$IG(Y, X) = H(Y) - H(Y \mid X)$$
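For continuous flow features, computing $IG(Y, X)$ requires discretizing $X$ first. The paper states neither the binning scheme nor the logarithm base, so the sketch below assumes equal-frequency bins and base-2 logarithms. Under base 2, values above 1 bit (as in Table 8) are possible when more than two classes are present. The function names are hypothetical.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy H(Y) in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(x, y, n_bins=10):
    """IG(Y, X) = H(Y) - H(Y | X), with X discretised into equal-frequency bins."""
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    bins = np.digitize(x, edges[1:-1], right=True)  # bin index per sample
    h_cond = 0.0
    for b in np.unique(bins):
        mask = bins == b
        h_cond += mask.mean() * entropy(y[mask])  # P(bin) * H(Y | bin)
    return entropy(y) - h_cond
```

A feature that perfectly separates two balanced classes gives 1 bit, while a feature independent of the label gives 0.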

The information gain values for all features are presented in Table 8.

Table 8.

Information Gain of the Chosen Features.

Feature	Information Gain
Src_IP_bytes	1.526
Dst_IP_bytes	1.109
Duration	0.72
Dst_bytes	0.616
Src_bytes	0.598
Dst_pkts	0.47
Src_pkts	0.447


The results indicate that IP- and byte-level traffic attributes exhibit higher discriminative relevance, with Src_IP_bytes and Dst_IP_bytes achieving the highest IG scores. Importantly, the five highest-ranked features by Information Gain are exactly the five features retained by the hybrid feature selection approach, while Dst_pkts and Src_pkts rank lowest, reinforcing the stability of the selected feature subset.

b. Statistical significance validation using ANOVA

To further validate the discriminative capability of the selected features, a one-way Analysis of Variance (ANOVA) test was conducted on the proposed five-feature reduced ToN-IoT dataset under both binary and multi-class attack classification settings. ANOVA evaluates whether the mean values of a feature differ significantly across multiple class groups, thereby assessing its statistical relevance for intrusion detection. The objective was to evaluate whether each selected feature provides statistically significant discriminative power.

The p-value is the probability, assuming the feature has no effect (i.e., all class means are equal), of observing differences between class groups at least as large as those measured.

The p-value therefore helps determine whether a feature actually contributes to distinguishing classes (attacks vs. normal) or behaves similarly across them.

p < 0.05 → The feature is statistically significant; its variation contributes meaningfully to distinguishing between classes (effective for IDS).
p ≥ 0.05 → The feature is not statistically significant; there is insufficient evidence that it discriminates between classes.

Mathematically, the p-value is

$$p = P\left(F \ge F_{\text{obs}} \mid H_0\right)$$

where $F_{\text{obs}}$ is the computed F-statistic and $H_0$ is the null hypothesis stating that the class means are equal (i.e., the feature has no discriminatory power).

A lower p-value indicates higher confidence in rejecting $H_0$, providing evidence that the feature is relevant.
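The per-feature test can be sketched with SciPy's `f_oneway`, grouping each feature's values by class label: two groups (normal vs. attack) in the binary setting, and one group per class in the multi-class setting. The helper names are hypothetical. One-way ANOVA assumes roughly normal groups with similar variances, which heavy-tailed byte counts may not satisfy, so the p-values are best read as a screening signal.

```python
import numpy as np
from scipy.stats import f_oneway


def anova_pvalues(X, y, feature_names):
    """One-way ANOVA p-value of each feature across the class groups in y."""
    classes = np.unique(y)
    pvalues = {}
    for j, name in enumerate(feature_names):
        groups = [X[y == c, j] for c in classes]  # one sample per class
        pvalues[name] = float(f_oneway(*groups).pvalue)
    return pvalues


def significant(pvalues, alpha=0.05):
    """Apply the p < 0.05 decision rule described above."""
    return {name: p < alpha for name, p in pvalues.items()}
```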

Table 9 shows the p-values for both binary and multi-class scenarios.

Table 9.

p values obtained on the proposed reduced dataset for binary and multi class attack classification.

Feature	Binary p-value	Multi-Class p-value	Interpretation
duration	0.680388	0.000000	Required for multi-class discrimination
src_bytes	0.000000	0.000000	Strong indicator of abnormal traffic
dst_bytes	0.000000	0.000000	Consistently significant across tasks
src_ip_bytes	0.000687	0.008897	Retained for robustness
dst_ip_bytes	0.363643	0.010958	Optional in binary, essential in multi-class


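The results suggest that:

Only three features (src_bytes, dst_bytes, and src_ip_bytes) are significant in the binary setting, so a minimal subset of 3 features may be sufficient for binary intrusion detection
All five features are significant in the multi-class setting, so the full reduced set of 5 features is required for reliable multi-class attack detection
The hybrid strategy reduces dimensionality and computation cost while retaining statistically significant features

Every feature in the 5-feature hybrid dataset is statistically significant in at least one of the two settings.

The reduced set balances efficiency (few features) with discriminative relevance (statistical significance), making it viable for real-time IDS.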

Lemma 1 (Robust Feature Relevance Across Statistical and Wrapper-Based Selection)

Features that consistently demonstrate high Information Gain and statistically significant ANOVA p-values, while also being selected across multiple wrapper-based strategies, exhibit robust discriminative relevance and reduced selection bias.

This lemma motivates the hybrid feature refinement strategy adopted in this work and does not assume optimality with respect to any specific classifier.

Classification performance

Since the feature selection procedures were conducted using a Decision Tree classifier, the Decision Tree's classification performance was compared on the full ToN-IoT dataset, the NF-ToN-IoT (NetFlow) dataset, and the dataset produced by the proposed hybrid feature selection approach.

The Decision Tree hyperparameters used in this work are listed in Table 10.

Table 10.

Configurable Hyperparameters of the Decision Tree.

Parameter	Configured value/method
Attribute Selection Measure	Gini Index
Pruning Method	Reduced Error Pruning
Minimum number of records per node	2
Do not split subsets smaller than	2
No. of decision trees created	1702


The Gini index is an attribute selection measure that quantifies how often a randomly chosen sample at a node would be misclassified if it were labeled at random according to the node's class distribution. The dataset contains nine different attack types along with benign samples. Classification performance was evaluated using accuracy, precision, recall, F-measure, the confusion matrix, and the ROC curve. Before presenting the results, the evaluation metrics are briefly defined.
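
To make the Gini criterion concrete, the sketch below computes node impurity as 1 − Σ p_i² and the size-weighted impurity of a candidate binary split, which a Gini-based tree minimizes when choosing where to split. It is a plain-Python illustration, not the machine learning platform's implementation, and the class labels in the example are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum(p_i^2) of the class labels reaching a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left: Sequence[Hashable], right: Sequence[Hashable]) -> float:
    """Size-weighted impurity of a binary split; a Gini-based tree picks the lowest."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


node = ["benign"] * 5 + ["ddos"] * 5       # hypothetical labels at one node
print(gini_impurity(node))                  # 0.5 (maximally mixed for two classes)
print(weighted_gini(node[:5], node[5:]))    # 0.0 (both children are pure)
```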

The four fundamental outcomes used to compute classification performance are true positive (TP), true negative (TN), false positive (FP), and false negative (FN). In this work, the benign class is treated as the positive class, so these outcomes relate predicted and actual classes as follows:

True Positive (TP): A benign sample correctly identified as benign.
True Negative (TN): An attack sample correctly identified as an attack.
False Positive (FP): An attack sample incorrectly classified as benign.
False Negative (FN): A benign sample incorrectly classified as an attack.

These outcomes are summarized in the confusion matrix shown in Table 11.

Table 11.

Confusion matrix.

Actual \ Predicted	Benign	Attack
Benign	True Positive (TP)	False Negative (FN)
Attack	False Positive (FP)	True Negative (TN)


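Accuracy, precision, recall, and F1 score are calculated from the four outcomes in the confusion matrix as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$

$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{9}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{10}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{11}$$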

Accuracy (Eq. 8) conveys the overall correctness of the classifier's predictions. Precision (Eq. 10) measures the fraction of samples predicted as a given class that actually belong to that class. Recall (Eq. 11) is the true positive rate. The F1 score (Eq. 9) is the harmonic mean of precision and recall. Snapshots of the performance metrics obtained from the machine learning platform are shown in Figs. 14 and 15.
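
The metric definitions can be checked with a small sketch that computes each one from the four confusion-matrix counts, keeping benign as the positive class as in Table 11. Cohen's Kappa, also reported in Tables 12 to 15, is included using its standard definition κ = (p_o − p_e)/(1 − p_e), where p_e is the chance agreement implied by the actual and predicted class totals. The class name `ConfusionCounts` and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts with benign as the positive class (Table 11)."""
    tp: int  # benign predicted as benign
    tn: int  # attack predicted as attack
    fp: int  # attack predicted as benign
    fn: int  # benign predicted as attack

    @property
    def total(self) -> int:
        return self.tp + self.tn + self.fp + self.fn

    def accuracy(self) -> float:      # Eq. 8
        return (self.tp + self.tn) / self.total

    def precision(self) -> float:     # Eq. 10
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:        # Eq. 11 (sensitivity, true positive rate)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:   # true negative rate
        return self.tn / (self.tn + self.fp)

    def f1(self) -> float:            # Eq. 9, harmonic mean of precision and recall
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)

    def cohen_kappa(self) -> float:
        """(p_o - p_e) / (1 - p_e), with p_e from actual x predicted class totals."""
        p_o = self.accuracy()
        benign = (self.tp + self.fn) * (self.tp + self.fp)  # actual benign x predicted benign
        attack = (self.tn + self.fp) * (self.tn + self.fn)  # actual attack x predicted attack
        p_e = (benign + attack) / self.total ** 2
        return (p_o - p_e) / (1 - p_e)


counts = ConfusionCounts(tp=90, tn=95, fp=5, fn=10)  # hypothetical counts
print(counts.accuracy(), counts.f1(), counts.cohen_kappa())
```

For these counts, accuracy is 185/200 = 0.925, F1 is 180/195 ≈ 0.923, and κ = 0.85, since the chance agreement p_e is exactly 0.5.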

Fig. 14 — Snapshot of the Obtained Recall, Precision, Sensitivity, Specificity, F-measure values of the proposed Hybrid Feature Selection based Classification.

Fig. 15 — Snapshot of the Confusion Matrix as Obtained from Machine Learning Platform.

The lower multi-class accuracy is mainly due to misclassification of Man-in-the-Middle (MITM) attacks, which show comparatively lower classification accuracy in the reduced feature space.

Unlike volumetric attacks such as DoS or DDoS, MITM attacks often operate passively, intercepting or relaying packets without generating distinctive traffic patterns. They therefore tend to mimic benign traffic and lack strong flow-level signatures. Once IP- and port-based identifiers are removed to improve generalization, MITM behavior becomes harder to distinguish using lightweight, flow-based features alone.

Additionally, the proposed feature reduction strategy prioritizes features that are broadly effective across multiple attack categories. While this improves overall generalization and computational efficiency, it may reduce sensitivity to subtle attacks such as MITM that rely on fine-grained protocol-level or payload-level characteristics. This highlights a trade-off between dataset generality and sensitivity to subtle interception-based attacks. Future work will explore incorporating protocol-specific or temporal features to improve MITM detection while maintaining the lightweight nature of the dataset.

Both classifiers, K-Nearest Neighbour (KNN) and Decision Tree (DT), were evaluated on the proposed hybrid feature selection–based dataset and on the original ToN-IoT and NF-ToN-IoT datasets. On the proposed dataset, both classifiers performed equally well, achieving an accuracy of 0.986 for binary classification (Tables 12 and 14) and 0.972 for multi-attack classification (Tables 13 and 15).

Table 12.

Performance Analysis of Decision Tree: Binary Classification.

	Model building time (ms)	Accuracy	Recall	Precision	Sensitivity	Specificity	F-measure	Cohen’s Kappa
ToN-IoT Dataset	34,562	1	0.999	0.999	0.999	0.999	0.999	0.99
NF-ToN-IoT Dataset	173,065	0.999	0.998	0.998	0.998	0.998297	0.998	0.997
Proposed Reduced Dataset	16,829	0.986	0.9865	0.9835	0.9865	0.9865	0.985	0.97


Table 13.

Performance Analysis of Decision Tree: Multi-Attack Classification.

	Model building time (ms)	Accuracy	Recall	Precision	Sensitivity	Specificity	F-measure	Cohen’s Kappa
ToN-IoT Dataset	22,879	1	0.997	0.999	0.997	1	0.998	0.999
NF-ToN-IoT Dataset	108,406	0.9962	0.997	0.995	0.994	0.992	0.997	0.997
Proposed Reduced Dataset	12,627	0.972	0.9183	0.9204	0.918	0.9961	0.9189	0.95


Table 14.

Performance Analysis of K-Nearest Neighbour: Binary Classification.

	KNN building and prediction time (ms)	Accuracy	Recall	Precision	Sensitivity	Specificity	F-measure	Cohen’s Kappa
ToN-IoT Dataset	319,597	0.999	0.998	0.998	0.998	0.998	0.998	0.997
NF-ToN-IoT Dataset	34,287	0.994	0.984	0.993	0.984	0.984	0.988	0.978
Proposed Reduced Dataset	369,578	0.986	0.9855	0.9835	0.9855	0.9855	0.9845	0.969


Table 15.

Performance Analysis of K-Nearest Neighbour: Multi-Attack Classification.

	KNN building and prediction time (ms)	Accuracy	Recall	Precision	Sensitivity	Specificity	F-measure	Cohen’s Kappa
ToN-IoT Dataset	43,419	0.999	0.997	0.997	0.997	0.999	0.997	0.998
NF-ToN-IoT Dataset	27,808	0.57	0.356	0.412	0.356	0.934	0.367	0.383
Proposed Reduced Dataset	272,629	0.972	0.914	0.917	0.914	0.996	0.915	0.95


Thus, instead of the larger ToN-IoT and NetFlow-based ToN-IoT datasets, which comprise 44 and 12 features respectively, the proposed hybrid feature selection dataset with only 5 features was able to detect attacks accurately. With KNN, the NetFlow-based ToN-IoT dataset did not achieve comparable multi-class accuracy (0.57, Table 15). In contrast, the proposed dataset achieved accuracies between 0.972 and 0.986 for both binary and multi-attack classification with both classifiers, demonstrating its efficiency.

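With respect to the other classification metrics (recall, precision, sensitivity, specificity, F-measure, and Cohen’s Kappa), the proposed reduced dataset achieved results close to those of the original, much larger datasets for binary classification. For multi-class classification, recall, precision, and F-measure were lower (approximately 0.91 to 0.92, versus 0.997 to 0.999 on the full ToN-IoT dataset), while specificity (0.996) and Cohen’s Kappa (0.95) remained high.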

The main objective of this research is to reduce the computational cost of handling large datasets. To achieve this, a reduced dataset with a minimal number of features was proposed, based on domain knowledge and the hybrid feature selection approaches. Figure 16 compares accuracy, and Figs. 17 and 18 compare model building time for binary and multi-attack classification. For the Decision Tree, the reduced number of features lowers model building time (16,829 ms versus 34,562 ms for ToN-IoT and 173,065 ms for NF-ToN-IoT in binary classification; 12,627 ms versus 22,879 ms and 108,406 ms in multi-attack classification) while maintaining performance comparable to the original dataset. For KNN, however, the combined building and prediction time on the reduced dataset was higher than on either original dataset (Tables 14 and 15).
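
The model building times in Tables 12 to 15 were measured by the machine learning platform. A comparable measurement could be made in Python with a sketch like the one below, which times a Gini Decision Tree on a synthetic 44-column matrix and on its first five columns. The data are synthetic rather than ToN-IoT, best-of-three timing is an illustrative choice, and absolute times depend on hardware.

```python
import time

import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_seconds(X, y, repeats: int = 3, seed: int = 0) -> float:
    """Best-of-`repeats` wall-clock time to fit a Gini Decision Tree on (X, y)."""
    best = float("inf")
    for _ in range(repeats):
        model = DecisionTreeClassifier(criterion="gini", random_state=seed)
        start = time.perf_counter()
        model.fit(X, y)
        best = min(best, time.perf_counter() - start)
    return best


# Synthetic stand-in: 44 columns (full feature width) versus the first 5 columns.
rng = np.random.default_rng(0)
X_full = rng.normal(size=(10_000, 44))
y = (X_full[:, :5].sum(axis=1) > 0).astype(int)
t_full = fit_seconds(X_full, y)
t_reduced = fit_seconds(X_full[:, :5], y)
print(f"44 features: {t_full:.3f}s, 5 features: {t_reduced:.3f}s")
```

Because a Decision Tree evaluates candidate splits on every feature at each node, its fit time tends to grow with the number of features. KNN work is dominated by distance computations over the stored samples, so its timing need not fall in the same way.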

Fig. 16 — Accuracy Metrics of the Proposed Hybrid Selection based Reduced Dataset vs the existing datasets.

Fig. 17 — Time taken to build the binary classification model for various datasets vs the proposed reduced dataset.

Fig. 18 — Time taken to build the multi attack classification model for various datasets vs the proposed reduced dataset.

Although the proposed reduced feature set achieves competitive detection performance, it is not intended to universally replace the full ToN-IoT or NF-ToN-IoT datasets. Instead, it provides a lightweight alternative that trades marginal class-wise accuracy variations for improved computational efficiency and practical deployability.

To address concerns regarding classifier dependency and overfitting, the proposed reduced ToN-IoT dataset was further evaluated using several ensemble learning techniques, including Random Forest, AdaBoost, Gradient Boosting, XGBoost, Bagging, and a Voting Ensemble. Table 16 presents the comparative performance metrics.

Table 16.

Comparison with Ensemble Learning Models (Binary & Multi Class Classification).

Model	Binary Accuracy	Binary Precision	Binary Recall	Binary F1-Score	Multi-Class Accuracy	Multi-Class Precision	Multi-Class Recall	Multi-Class F1-Score
Decision Tree	0.9860	0.98	0.98	0.98	0.972	0.9204	0.9183	0.9189
KNN	0.9720	0.97	0.97	0.97	0.972	0.917	0.914	0.915
Random Forest	0.9866	0.99	0.99	0.99	0.9734	0.9313	0.9279	0.9292
AdaBoost	0.9129	0.91	0.89	0.90	0.6961	0.2825	0.2064	0.1928
Gradient Boosting	0.9717	0.97	0.97	0.97	0.9677	0.9103	0.9045	0.9071
XGBoost	0.9798	0.98	0.98	0.98	0.9689	0.9185	0.9195	0.9187
Bagging	0.9863	0.99	0.99	0.99	0.9729	0.9240	0.9254	0.9245
Voting Ensemble	0.9834	0.98	0.98	0.98	0.9721	0.9334	0.9232	0.9279
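
To illustrate how a comparison like Table 16 can be run, the sketch below fits single and ensemble classifiers from scikit-learn on a synthetic three-class, five-feature dataset and reports hold-out accuracy, macro-averaged precision, recall and F1, and fit time. It is a stand-in under stated assumptions rather than the authors' pipeline: the data are synthetic, the hyperparameters are library defaults chosen for illustration, macro averaging is assumed for the multi-class metrics, and XGBoost is left out so the example depends only on scikit-learn.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def build_models(seed: int = 0) -> dict:
    """Single and ensemble classifiers with default-style settings (XGBoost omitted)."""
    return {
        "Decision Tree": DecisionTreeClassifier(criterion="gini", random_state=seed),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
        "Bagging": BaggingClassifier(random_state=seed),
        # Hard majority vote over three separate base learners.
        "Voting Ensemble": VotingClassifier(
            estimators=[
                ("dt", DecisionTreeClassifier(random_state=seed)),
                ("knn", KNeighborsClassifier(n_neighbors=5)),
                ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
            ],
            voting="hard",
        ),
    }


def compare_models(X, y, seed: int = 0) -> dict:
    """Hold-out accuracy, macro precision/recall/F1 and fit time for every model."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    results = {}
    for name, model in build_models(seed).items():
        start = time.perf_counter()
        model.fit(X_tr, y_tr)
        fit_time = time.perf_counter() - start
        pred = model.predict(X_te)
        results[name] = {
            "accuracy": accuracy_score(y_te, pred),
            "precision": precision_score(y_te, pred, average="macro", zero_division=0),
            "recall": recall_score(y_te, pred, average="macro", zero_division=0),
            "f1": f1_score(y_te, pred, average="macro", zero_division=0),
            "fit_seconds": fit_time,
        }
    return results


# Synthetic stand-in for the five-feature data: 3 classes, 5 features.
X, y = make_classification(
    n_samples=2000, n_features=5, n_informative=4, n_redundant=1,
    n_classes=3, random_state=0,
)
results = compare_models(X, y)
for name, row in results.items():
    print(f"{name:18s} acc={row['accuracy']:.4f} f1={row['f1']:.4f} fit={row['fit_seconds']:.3f}s")
```

Recording `fit_seconds` alongside the quality metrics makes the accuracy versus cost trade-off visible: ensembles fit many base learners, so their fit time typically exceeds that of a single Decision Tree.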


While ensemble models marginally outperform single classifiers in certain cases, the improvement over the Decision Tree trained on the proposed five-feature subset is small. For instance, Random Forest achieves a binary accuracy of 0.9866, whereas the Decision Tree achieves a comparable 0.9860 with lower computational overhead.

Moreover, ensemble methods require multiple base learners, increased memory consumption, and longer inference time, which limits their applicability in resource-constrained IoT and edge environments. In contrast, the Decision Tree model offers strong interpretability, faster inference, and ease of deployment, making it more suitable for real-time intrusion detection scenarios.

These results indicate that the proposed hybrid feature selection framework performs consistently across most classifier families, with AdaBoost in the multi-class setting (accuracy 0.6961) as the notable exception, and does not rely on the complexity of ensemble models to achieve high detection accuracy.

Although ensemble methods marginally improve accuracy, the proposed five-feature dataset enables simple classifiers such as the Decision Tree to achieve comparable performance with lower computational complexity, making it more suitable for real-time IoT intrusion detection, as summarized in Table 16.

Analytical discussion of results

The results show that the proposed hybrid feature selection approach achieves accuracy close to that of the full ToN-IoT feature set while reducing dimensionality from 44 to 5 features. This behavior can be explained through the structural characteristics of the algorithms and the nature of the selected features:

Impact of Hybrid Selection: The two-stage approach (domain knowledge filtering followed by wrapper optimization) removes biased identifiers (IPs, ports) and retains traffic-centric features that directly reflect attacker behavior. This is expected to reduce overfitting and improve generalization.
Forward vs. Backward vs. GA Interaction:
- Forward Selection improves performance gradually, showing how each feature adds incremental discriminative value.
- Backward Elimination shows that removing certain flow-based features causes accuracy drops, indicating their significance.
- The Genetic Algorithm identifies stable subsets across generations, reaffirming that the top 5 features remain critical regardless of search path.
- Together, their convergence supports the robustness of the final feature subset.
Why 5 Features Maintain Accuracy: The selected features describe core attack patterns: abnormal packet/byte flow, directional imbalance, and duration anomalies. These attributes are consistent indicators across DoS, DDoS, Backdoor, Injection, and Scanning attacks, enabling the reduced dataset to sustain performance.
Observed Limitation – MITM: The drop in MITM detection accuracy occurs because MITM alters communication structure rather than traffic volume; it influences protocol behavior more than byte/packet counts. This limitation is acknowledged and highlighted as a direction for future work involving protocol-aware and session-state features.
Alignment With Research Hypothesis: The findings support the hypothesis that:

“A bias-aware and traffic-centric reduction strategy can maintain IDS performance while enabling lightweight deployment.”

Despite minor variations in predictive performance, all evaluated classifiers exhibit closely clustered accuracy values, indicating that feature selection plays a more dominant role than classifier complexity.
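
The convergence argument under "Forward vs. Backward vs. GA Interaction" rests on intersecting the subsets returned by different wrapper searches. A minimal sketch of that consensus step is shown below, assuming scikit-learn's `SequentialFeatureSelector` with a Decision Tree estimator for forward selection and backward elimination. The genetic algorithm search is omitted for brevity; its selected subset would be intersected in the same way. The data are synthetic, and the feature names `f0` to `f7` and helper names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.tree import DecisionTreeClassifier


def wrapper_select(X, y, names, n_features, direction, seed=0):
    """Greedy forward or backward wrapper selection scored by Decision Tree CV accuracy."""
    selector = SequentialFeatureSelector(
        DecisionTreeClassifier(random_state=seed),
        n_features_to_select=n_features,
        direction=direction,  # "forward" adds features, "backward" removes them
        scoring="accuracy",
        cv=3,
    )
    selector.fit(X, y)
    return {name for name, keep in zip(names, selector.get_support()) if keep}


def consensus(X, y, names, n_features, seed=0):
    """Features chosen by both forward selection and backward elimination."""
    forward = wrapper_select(X, y, names, n_features, "forward", seed)
    backward = wrapper_select(X, y, names, n_features, "backward", seed)
    return forward, backward, forward & backward


# Synthetic example: with shuffle=False the 3 informative columns come first.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=3, n_redundant=0,
    n_repeated=0, shuffle=False, random_state=0,
)
names = [f"f{i}" for i in range(X.shape[1])]
forward, backward, common = consensus(X, y, names, n_features=3)
print(sorted(forward), sorted(backward), sorted(common))
```

Features that survive every search path form the consensus subset; a feature picked by only one strategy is more likely to reflect that search's path than a stable discriminative signal.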

Potential applications

The proposed hybrid feature selection approach and the resulting 5-feature reduced ToN-IoT dataset are intended for lightweight, real-time IoT intrusion detection scenarios. Owing to its low estimated computational complexity, O(n·k), the model could be integrated into resource-constrained devices such as gateways, routers, Raspberry Pi-based IoT hubs, and edge nodes for early threat detection. It could also serve as a fast pre-screening engine in fog- and edge-assisted IDS pipelines, where traffic must be triaged rapidly before selected flows are forwarded to heavier cloud-based analytics. In addition, the reduced feature set enables interoperability with SDN/NFV-based IoT architectures, making it suitable for automated policy enforcement and dynamic flow isolation. Security Operations Centers (SOCs) may also incorporate the model as a lightweight anomaly-scoring layer for alert prioritization and risk-based monitoring, especially in environments where IP/port-based identifiers cannot be trusted. These application domains illustrate the potential operational relevance of the proposed method for practical IoT security deployments.

Performance analysis of the proposed reduced feature space over other IDS datasets

To evaluate the cross-dataset generalization capability of the proposed five-feature representation, the same methodology was applied to the CICIoT2023 dataset. Because the dataset is distributed across 169 independent CSV files, incremental loading was adopted in Google Colab to avoid memory overhead. The dataset directory was mounted from Google Drive, the CSV files were automatically indexed, and the first 80% of the files were processed to incrementally fit a StandardScaler and train a Decision Tree classifier, while the remaining 20% were reserved for final evaluation.

Because the original ToN-IoT features (src_bytes, dst_bytes, src_ip_bytes, dst_ip_bytes, and duration) are not directly available in CICIoT2023, the most semantically equivalent features were selected, namely flow_duration, Tot size, Tot sum, Number, and IAT. These features were chosen because they collectively represent network throughput, packet volume, communication intensity, and flow behaviour in the manner functionally closest to the original five-feature set.

The experiments were executed in the Google Colab free-tier environment (Intel Xeon CPU, 12 GB RAM, Python 3.12, Ubuntu runtime) using Pandas, NumPy, Scikit-learn, and TQDM. The Decision Tree was trained with default lightweight hyperparameters (Gini criterion, no fixed maximum depth, a minimum split of two samples, and one sample per leaf), in line with the intended low-complexity deployment of the reduced-feature IDS. Fitting was completed on 135 CSV files, followed by testing on the remaining 34 files, yielding a final accuracy of 98.90% across the multi-attack labels. This suggests that the analogous five-feature representation retains its discriminative capability on a structurally different dataset, supporting its generalization and its practical applicability as a low-overhead representation for real-time IoT intrusion detection across domains.
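
A minimal sketch of the file-level split and chunked fitting described above is given below. It assumes a label column named `label` (an assumption about the CICIoT2023 CSVs) and uses hypothetical helper names. `StandardScaler.partial_fit` updates running statistics one chunk at a time, but scikit-learn's `DecisionTreeClassifier` has no `partial_fit`, so this sketch stacks the reduced five-column arrays before a single tree fit; that is an assumption about how the tree step could be done, not a description of the authors' notebook.

```python
import glob
import os
from typing import Iterable

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["flow_duration", "Tot size", "Tot sum", "Number", "IAT"]
LABEL = "label"  # assumed name of the CICIoT2023 label column


def split_files(directory: str, train_fraction: float = 0.8):
    """File-level split: the first train_fraction of the sorted CSVs train, the rest test."""
    files = sorted(glob.glob(os.path.join(directory, "*.csv")))
    cut = int(len(files) * train_fraction)
    return files[:cut], files[cut:]


def train_on_chunks(frames: Iterable[pd.DataFrame]):
    """Fit the scaler chunk by chunk, then fit one Decision Tree on the reduced columns."""
    scaler = StandardScaler()
    X_parts, y_parts = [], []
    for df in frames:
        X = df[FEATURES].to_numpy(dtype=float)
        scaler.partial_fit(X)  # updates the scaler's running mean/variance with this chunk
        X_parts.append(X)
        y_parts.append(df[LABEL].to_numpy())
    # DecisionTreeClassifier has no partial_fit, so the 5-column chunks are stacked first.
    X_train = scaler.transform(np.vstack(X_parts))
    model = DecisionTreeClassifier(
        criterion="gini", max_depth=None, min_samples_split=2, min_samples_leaf=1,
        random_state=0,
    )
    model.fit(X_train, np.concatenate(y_parts))
    return scaler, model


def evaluate_on_chunks(scaler, model, frames: Iterable[pd.DataFrame]) -> float:
    """Accuracy accumulated over test chunks, one file at a time."""
    correct = total = 0
    for df in frames:
        pred = model.predict(scaler.transform(df[FEATURES].to_numpy(dtype=float)))
        truth = df[LABEL].to_numpy()
        correct += int((pred == truth).sum())
        total += len(truth)
    return correct / total
```

With 169 files and `train_fraction=0.8`, `int(169 * 0.8)` gives 135 training files and 34 test files, matching the split described above. Frames can be streamed lazily, for example `(pd.read_csv(f, usecols=FEATURES + [LABEL]) for f in train_files)`, so only the five feature columns and the label are ever held in memory.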

The comparison of popular IoT and network intrusion datasets in Table 17 shows a trade-off between dataset richness and suitability for edge/IoT deployment. Large, feature-rich datasets such as IoT-23 and CICIoT2023 offer extensive attack diversity and realistic traffic, which makes them well suited to training robust ML-based IDS, but their size and complexity pose challenges for lightweight or real-time deployment on resource-constrained devices. NF-ToN-IoT reduces the feature count to 12 fixed NetFlow features, but it retains IP/port identifiers and shows lower multi-class accuracy. The proposed 5-feature ToN-IoT removes these identifier-based biases and remains highly suitable for edge-based IoT deployment without a substantial loss of multi-class classification accuracy. It therefore indicates that effective intrusion detection can be achieved with a small, optimized feature set that balances performance, efficiency, and real-world applicability in IoT environments.

Table 17.

Comparison with other IDS Datasets.

Dataset	# Features	# Samples (approx)	Attack Types	Feature Bias (IP/Port)	Feature Selection Applied	Lightweight (Reduced)	Suitable for Edge/IoT	Remarks
ToN-IoT (Full)¹	44	460 k +	Multiple	Yes, fixed IPs/ports	None	No	Moderate	Original dataset; includes bias
NF-ToN-IoT⁹	12	460 k +	Multiple	Yes, fixed IPs/ports	Fixed NetFlow features	Medium	Moderate	Reduced; multi-class accuracy lower
IoTID20³⁵	83	164 k	8	No	Some preprocessing	No	Low	Large number of features; mostly flow-based
IoT-23³⁶	115	~ 400 k	23	No	None	No	Low	Rich dataset; diverse attack types; high complexity
CICIoT2023³⁷	~ 47	~ 7.3 M (subset)/~ 46 M raw	7 main categories (e.g., DDoS, DoS, Recon, etc.)	No	Optional/researcher-applied (e.g., RF selection)	Medium	High	Extensive IoT topology with 105 devices; realistic real-device traffic; large scale data for ML/IDS research
Proposed 5-feature ToN-IoT	5	460 k +	Multiple	No	Domain-guided hybrid FFS + BFE + GA	Yes	High	Lightweight, minimal features, high multi-class accuracy


To compare computational cost on a common basis, the following parameters are used:

n – number of samples/flows.
d – original feature dimension.
k – reduced feature dimension.
e – training epochs (for deep learning).
h – number of hidden units/layers.
p, G – population size and number of generations for meta-heuristics.

Table 18 summarizes the asymptotic time and space complexity of each approach:

Table 18.

Cost-Effectiveness of the Proposed Work Compared with Peer Studies.

Study	Time Complexity	Space Complexity	Cost Effectiveness Summary
Sarhan et al.²¹	O(e·n·d·h)	O(n·d + h²)	High training cost; unsuitable for constrained IoT devices
Sarhan (NetFlow)⁹	O(n·d)	O(n·d)	Faster, but retains IP/port bias affecting generalization
Dey et al.²²	O(G·p·n·d)	O(n·d + p·d)	Optimization-heavy, high overhead; dataset bias remains
Gad et al.²⁴	O(n·d + c·n·k)	O(n·d + k)	Statistical improvement; high dependency on class balance
Tareq et al.²³	O(e·n·d·h)	O(n·d + h²)	Deep learning cost prohibits real-time deployment
Proposed Work	O(n·k)	O(n·k)	Lowest cost/fastest scaling, suitable for real-time IoT


Compared to existing solutions, the proposed method provides the lowest computational cost with linear scalability and minimal resource consumption. The five-feature representation (k = 5) reduces both runtime and memory, making the method more cost-effective for edge-level IDS deployment than deep learning or meta-heuristic-only approaches.
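
The O(n·k) versus O(n·d) entries can be made concrete with a simple arithmetic sketch. Assuming the features are held as a dense float64 matrix and ignoring labels and DataFrame overhead, the footprint for n ≈ 460,000 flows scales with the number of columns, so the five-feature matrix is 44/5 = 8.8 times smaller than the 44-feature one (about 18.4 MB versus 161.9 MB). These are illustrative storage figures, not measurements from the experiments, and `matrix_bytes` is a hypothetical helper.

```python
import numpy as np


def matrix_bytes(n_samples: int, n_features: int, dtype=np.float64) -> int:
    """Size in bytes of a dense n_samples x n_features matrix of the given dtype."""
    return n_samples * n_features * np.dtype(dtype).itemsize


n = 460_000  # approximate ToN-IoT sample count (Table 17)
for label, width in [("ToN-IoT (d = 44)", 44), ("NF-ToN-IoT (12)", 12), ("Proposed (k = 5)", 5)]:
    print(f"{label:18s} {matrix_bytes(n, width) / 1e6:8.2f} MB")

# Per-sample, per-feature work and storage shrink by the factor d / k.
print("d/k =", 44 / 5)
```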

Conclusion and future scope

This work presented a hybrid feature selection–based reduction of the ToN-IoT dataset aimed at improving the generalizability and practical applicability of IoT intrusion detection systems. The original ToN-IoT dataset contains 44 features and was constructed under controlled experimental conditions in which attacks were deliberately launched from predefined IP addresses within a fixed subnet (192.168.1.0/24). As a result, IP- and port-based attributes emerged as dominant indicators for attack classification, a characteristic that does not reflect real-world IoT deployments where attacks may originate from diverse and dynamic network locations.

To address this limitation, the proposed approach explicitly eliminates IP- and port-based identifiers and focuses on behavior-driven network flow characteristics. Unlike existing reduced variants such as NF-ToN-IoT, which retain NetFlow identifiers including source and destination IPs and ports, this work derives a lightweight dataset composed solely of intrinsic traffic behavior features. A hybrid wrapper-based feature selection framework incorporating forward feature selection, backward feature elimination, and genetic algorithm–based optimization was employed. The five features selected consistently by all three methods were identified as the final optimal feature subset for attack classification.

The effectiveness of the proposed reduced dataset was evaluated using multiple classifiers under both binary and multi-class settings. Using the proposed reduced feature set, the Decision Tree classifier achieved an accuracy of 0.986 for binary classification and 0.972 for multi-class attack classification, while the K-Nearest Neighbor classifier consistently achieved an accuracy of 0.972 for both binary and multi-class scenarios. In addition, ensemble learning models were evaluated to assess the robustness of the selected feature subset. Although ensemble classifiers yielded marginal performance improvements, the overall accuracy across models remained closely clustered, indicating that the discriminative capability primarily stems from the selected features rather than classifier complexity.

These findings demonstrate that effective feature engineering plays a more critical role than model sophistication in achieving high intrusion detection performance. Simple classifiers, when combined with the proposed reduced feature set, can achieve competitive results while offering advantages in interpretability and computational efficiency. Consequently, the proposed dataset serves as a practical benchmark for IoT intrusion detection research, enabling attack classification based purely on meaningful network behavior rather than environment-specific identifiers.

While this study focuses on traditional machine learning classifiers, the behavior of the selected features under encrypted traffic, latency-sensitive IoT systems, and highly heterogeneous environments has not yet been empirically evaluated. Future work will extend the evaluation to deep learning models as well as real-time streaming scenarios to assess end-to-end detection performance in realistic IoT deployments.

Supplementary Information

The electronic supplementary material is listed below.

Supplementary Material 1 (11.2 MB, CSV)

Supplementary Material 2 (13 MB, XLSX)

Author contributions

Ideation, Formulation and Manuscript preparation: Dharini N; Review of results and Manuscript preparation: Jeevaa Katiravan; Dataset preparation: Janani VS; Literature Review and Review of Results: Dharini N and Janani VS.

Funding

No funding was received.

Data availability

The generated dataset is available as supplementary material.

Declarations

Competing interests

The authors declare no conflicts of interest related to this research.

Ethical approval

This study did not involve human participants or animals.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Moustafa, N. A new distributed architecture for evaluating AI-based security systems at the edge: network TON_IoT datasets. Sustainable Cities Soc. 72, 102994 (2021).
2. Booij, T. M., Chiscop, I., Meeuwissen, E., Moustafa, N. & Den Hartog, F. T. H. ToN IoT—The role of heterogeneity and the need for standardization of features and attack types in IoT network intrusion datasets. IEEE Internet Things J. 9 (1), 485–496 (2022).
3. Alsaedi, A., Moustafa, N., Tari, Z., Mahmood, A. & Anwar, A. TON_IoT telemetry dataset: A new generation dataset of IoT and IIoT for data-driven intrusion detection systems. IEEE Access 8, 165130–165150 (2020).
4. Moustafa, N., Keshk, M., Debie, E. & Janicke, H. Federated TON_IoT Windows datasets for evaluating AI-based security applications, in Proc. IEEE 19th Int. Conf. Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 848–855 (2020). 10.1109/TrustCom50675.2020.00114
5. Moustafa, N., Ahmed, M. & Ahmed, S. Data analytics-enabled intrusion detection: Evaluations of ToN_IoT Linux datasets, in Proc. IEEE 19th Int. Conf. Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 727–735 (2020). 10.1109/TrustCom50675.2020.00100
6. Moustafa, N. New generations of Internet of Things datasets for cybersecurity applications based machine learning: TON_IoT datasets, in Proc. eResearch Australasia Conf., Brisbane, Australia (2019).
7. Moustafa, N. A systemic IoT–Fog–Cloud architecture for big-data analytics and cyber security systems: A review of fog computing. arXiv preprint arXiv:1906.01055 (2019).
8. Ashraf, J. et al. IoTBoT-IDS: A novel statistical learning-enabled botnet detection framework for protecting networks of smart cities. Sustainable Cities Soc., 103041 (2021).
9. Sarhan, M., Layeghy, S., Moustafa, N. & Portmann, M. NetFlow datasets for machine learning-based network intrusion detection systems, in Big Data Technologies and Applications, Springer, Berlin, Germany, pp. 117–135 (2020).
10. Koroniotis, N., Moustafa, N., Sitnikova, E. & Turnbull, B. Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: Bot-IoT dataset. Future Generation Comput. Syst. 100, 779–796 (2019).
11. Koroniotis, N., Moustafa, N., Sitnikova, E. & Slay, J. Towards developing network forensic mechanisms for botnet activities in the IoT based on machine learning techniques, in Proc. Int. Conf. Mobile Networks and Management, Springer, Cham, Switzerland, pp. 30–44 (2017).
12. Koroniotis, N., Moustafa, N. & Sitnikova, E. A new network forensic framework based on deep learning for internet of things networks: A particle deep framework. Future Generation Comput. Syst. 110, 91–106 (2020).
13. Koroniotis, N. & Moustafa, N. Enhancing network forensics with particle swarm and deep learning: The particle deep framework. arXiv preprint arXiv:2005.00722 (2020).
14. Koroniotis, N., Moustafa, N., Schiliro, F., Gauravaram, P. & Janicke, H. A holistic review of cybersecurity and reliability perspectives in smart airports. IEEE Access (2020).
15. Koroniotis, N. Designing an Effective Network Forensic Framework for the Investigation of Botnets in the Internet of Things. Ph.D. dissertation, Univ. of New South Wales, Australia (2020).
16. Moustafa, N. & Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems, in Proc. Military Communications and Information Systems Conf. (MilCIS), IEEE (2015).
17. Moustafa, N. & Slay, J. The evaluation of network anomaly detection systems: statistical analysis of the UNSW-NB15 dataset and the comparison with the KDD99 dataset. Information Secur. Journal: Global Perspective, pp. 1–14 (2016).
18. Moustafa, N. et al. Novel geometric area analysis technique for anomaly detection using trapezoidal area estimation on large-scale networks. IEEE Trans. Big Data (2017).
19. Moustafa, N. et al. Big data analytics for intrusion detection systems: statistical decision-making using finite Dirichlet mixture models, in Data Analytics and Decision Support for Cybersecurity, Springer, Cham, Switzerland, pp. 127–156 (2017).
20. Sarhan, M., Layeghy, S., Moustafa, N. & Portmann, M. NetFlow datasets for machine learning-based network intrusion detection systems, in Proc. 10th EAI Int. Conf. Big Data Technologies and Applications (BDTA) (2020).
21. Sarhan, M., Layeghy, S., Moustafa, N., Gallagher, M. & Portmann, M. Feature extraction for machine learning-based intrusion detection in IoT networks. Digit. Commun. Networks 10.1016/j.dcan.2022.08.012 (2022).
22. Dey, A. K., Gupta, G. P. & Sahu, S. P. Hybrid meta-heuristic based feature selection mechanism for cyber-attack detection in IoT-enabled networks. Procedia Comput. Sci. 218, 318–327 (2023).
23. Tareq, I., Elbagoury, B. M., El-Regaily, S. & El-Horbaty, E. S. M. Analysis of ToN-IoT, UNSW-NB15, and Edge-IIoT datasets using deep learning in cybersecurity for IoT. Appl. Sci. 12, 9572 (2022).
24. Gad, A. R., Nashat, A. A. & Barkat, T. M. Intrusion detection system using machine learning for vehicular ad hoc networks based on ToN-IoT dataset. IEEE Access 9, 142206–142217. 10.1109/ACCESS.2021.3120626 (2021).
25. Alghamdi, R. & Bellaiche, M. An ensemble deep learning-based IDS for IoT using lambda architecture. Cybersecurity 6, 5 (2023).
26. Guo, G. et al. An IoT intrusion detection system based on TON IoT network dataset, in Proc. IEEE 13th Annual Computing and Communication Workshop and Conf. (CCWC), Las Vegas, NV, USA, pp. 333–338 (2023). 10.1109/CCWC57344.2023.10099144
27. Ji, R., Selwal, A., Kumar, N. & Padha, D. Cascading bagging and boosting ensemble methods for intrusion detection in cyber-physical systems. Secur. Priv. 8 (1), e497. 10.1002/spy2.497 (2025).
28. Ji, R., Padha, D., Singh, Y. & Sharma, S. Review of intrusion detection systems in cyber-physical system-based networks. Trans. Emerg. Telecommunications Technol. 35 (9), e5029 (2024).
29. Ji, R., Kumar, N. & Padha, D. CNN-GWO-voting and hybrid ensemble learning inspired intrusion detection approaches for cyber-physical systems. Proc. Indian National Science Academy 91, 848–862 (2025). 10.1007/s43538-024-00372-0
30. Ji, R., Kumar, N. & Padha, D. Hybrid enhanced intrusion detection frameworks for cyber-physical systems via optimal feature selection. Indian Journal of Science and Technology 17 (30), 3069–3079 (2024). 10.17485/IJST/v17i30.1794
31. Ji, R., Kumar, N. & Padha, D. Optimized intrusion detection approach for cyber-physical systems using meta-learning with stacked generalization. Secur. Priv. 8 (3), e70031. 10.1002/spy2.70031 (2025).
32. Kunhare, N., Tiwari, R. & Dhar, J. Particle swarm optimization and feature selection for intrusion detection systems. Sādhanā – Academy Proceedings in Engineering Sciences 45 (1), 109 (2020). 10.1007/s12046-020-1308-5
33. Kunhare, N., Tiwari, R. & Dhar, J. Intrusion detection system using hybrid classifiers with meta-heuristic algorithms for optimization and feature selection. Comput. Electr. Eng. 103, 108383. 10.1016/j.compeleceng.2022.108383 (2022).
34. Kunhare, N., Tiwari, R. & Dhar, J. Network packet analysis in real-time traffic and study of Snort IDS during the variants of DoS attacks, in Proc. 19th Int. Conf. Hybrid Intelligent Systems (HIS), pp. 1–10 (2019).
35. Ullah, I. & Mahmoud, Q. H. A scheme for generating a dataset for anomalous activity detection in IoT networks, in Proc. Canadian Conf. Artificial Intelligence (CCAI), Ottawa, ON, Canada, pp. 508–520 (2020).
36. Garcia, S., Parmisano, A. & Erquiaga, M. J. IoT-23: A Labeled Dataset with Malicious and Benign IoT Network Traffic (Zenodo, 2020).
37. Neto, E. C. P. et al. CICIoT2023: A real-time dataset and benchmark for large-scale attacks in IoT environment. Sensors 23, 5941 (2023).

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Material 1^{(11.2MB, csv)}

Supplementary Material 2^{(13MB, xlsx)}

Data Availability Statement

Dataset generated is available and shared as supplementary documents.
