PLOS One. 2024 Dec 18;19(12):e0314695. doi: 10.1371/journal.pone.0314695

Effective DDoS attack detection in software-defined vehicular networks using statistical flow analysis and machine learning

Himanshi Babbar 1,#, Shalli Rani 1,*,#, Maha Driss 2,3,#
Editor: Faouzi Jaidi 4
PMCID: PMC11654938  PMID: 39693292

Abstract

Vehicular Networks (VN) utilizing Software Defined Networking (SDN) have garnered significant attention recently, paralleling advances in wireless networks. VN are deployed to optimize traffic flow, enhance the driving experience, and ensure road safety. However, VN are vulnerable to Distributed Denial of Service (DDoS) attacks, which pose severe threats in the contemporary Internet landscape. With the surge in Internet traffic, this study proposes novel methodologies for effectively detecting DDoS attacks within Software-Defined Vehicular Networks (SDVN), in which attackers commandeer compromised nodes to monopolize network resources, disrupting communication among vehicles and between vehicles and infrastructure. The proposed methodology aims to: (i) analyze flow statistics and compute entropy, and (ii) implement Machine Learning (ML) algorithms within SDN Intrusion Detection Systems for Internet of Things (IoT) environments. Additionally, the approach distinguishes between reconnaissance, Denial of Service (DoS), and DDoS traffic while addressing dataset imbalance and overfitting. One of the significant challenges in this integration is managing the computational load while ensuring real-time performance. ML models, especially complex ones like Random Forest, require substantial processing power, which calls for efficient data handling and possibly edge computing resources to reduce latency. Ensuring scalability and maintaining high detection accuracy as network traffic grows and evolves is another critical challenge. Using a minimal subset of features from a given dataset, a comparative study is conducted to determine the sample size that maximizes model accuracy. Further, the study evaluates the impact of various dataset attributes on performance thresholds. The K-Nearest Neighbor, Random Forest, and Logistic Regression supervised ML classifiers are assessed on the BoT-IoT dataset.
The results indicate that the Random Forest classifier achieves superior performance metrics, with Precision, F1-score, Accuracy, and Recall rates of 92%, 92%, 91%, and 90%, respectively, over five iterations.

Introduction and background

VN [1] are proposed to address the essential needs of improving transportation efficiency and safety, lowering accident rates, and minimizing the effects of severe traffic congestion. Surveillance, traffic control, and mobile vehicular cloud services on VN are no longer a distant possibility. Because there are no centralized control nodes in VN [2, 3], data is stored and forwarded between vehicles. Each vehicle node therefore plays a crucial role, comparable to the storage and forwarding duties performed by routers and switches in conventional networks, and vehicle nodes cooperate to exchange messages and forward data [4]. DDoS attacks significantly reduce data exchange among vehicular nodes, lower packet transmission speed and available bandwidth, and limit the effectiveness of the communication network. Moreover, DDoS attacks can have serious repercussions, since compromised nodes use network resources maliciously and degrade network performance [5].

Background

SDN holds considerable promise for the development of VN. Owing to the centralized intelligent control that SDN brings, SDVN offer many administrative advantages over conventional VN [6]. As seen in Fig 1, the SDN architecture adds application and control planes to VN. The application plane offers a selection of services and applications. The control plane hosts the SDN controller, which serves as the central decision-making hub for the entire SDVN [7] architecture [8]. The control plane also aggregates underlying resources and makes the network more programmable. The underlying network resources reside mostly in the forwarding plane of SDVN [9]. Its upper part, the data plane, contains the forwarding hardware: switches and routers with SDN capabilities. Its lower part consists mostly of vehicular infrastructure and networked vehicles communicating over Vehicle-to-Infrastructure (V2I) and Vehicle-to-Vehicle (V2V) links. Additionally, the Northbound Interface (NBI) exposes the Application Programming Interface (API) for communication between the application and control planes [10].

Fig 1. Software defined vehicular networks.


Network design

SDN has recently become a key technology for interconnected network architectures. It comprises three planes: application, control, and forwarding. SDN separates the control and forwarding planes, which adds flexibility and simplifies management. The SDN controller, located in the control plane, manages the entire network. By providing a comprehensive view of the network and centralized control functions, SDN makes it easier to collect network statistics and can offer greater network security than conventional approaches. The most important protocol in SDN architectures is the southbound protocol, which carries communication between network components and the controller.

We have organized our novel proposed model in the control plane as follows to increase its effectiveness:

  • The SDN control plane is completely programmable and customizable.

  • The forwarding plane managed by the control plane can support several networks.

  • The model is suitable for IoT networks since it can use IoT devices without burdening them computationally or otherwise.

  • OpenFlow (OF) switches are used to bridge the heterogeneity between IoT devices and SDN controllers.

OF is the primary southbound protocol between the control and forwarding planes of an SDN architecture. Controllers and switches interact through flow tables and actions, installed by the controller, that instruct each switch how to handle the packets of each flow. Combining IoT and SDN provides a suitable way to inspect network data for risks, suspicious activity, and attacks. In addition, numerous IoT devices, such as sensors, wireless devices, and smart devices, can be connected to the forwarding plane of SDN. The NBI provides the API for communication between the application and control planes, while the southbound interface connects the control plane to the forwarding plane. Controllers communicate with one another through eastbound and westbound interfaces.
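To make the match-action idea concrete, the following sketch models a single flow table in Python. It is a toy model under simplifying assumptions (match fields held in a dict, absent fields treated as wildcards, one table, no timeouts); it is not the OpenFlow wire protocol or the API of Ryu or any other controller, and the class names and action strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FlowEntry:
    """Simplified flow entry: match fields, priority, action, and a counter."""
    match: dict        # e.g. {"ip_dst": "10.0.0.2"}; absent fields act as wildcards
    priority: int
    action: str        # hypothetical action strings, e.g. "output:2" or "drop"
    packet_count: int = 0


@dataclass
class FlowTable:
    entries: list = field(default_factory=list)

    def install(self, entry: FlowEntry) -> None:
        """Install an entry (the controller's job); keep highest priority first."""
        self.entries.append(entry)
        self.entries.sort(key=lambda e: e.priority, reverse=True)

    def lookup(self, packet: dict) -> str:
        """Return the action of the highest-priority matching entry."""
        for entry in self.entries:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                entry.packet_count += 1   # per-flow counter, as reported in flow statistics
                return entry.action
        return "controller"               # table miss: hand the packet to the controller


table = FlowTable()
table.install(FlowEntry({"ip_dst": "10.0.0.2"}, priority=10, action="output:2"))
table.install(FlowEntry({"ip_src": "10.0.0.66"}, priority=100, action="drop"))
```

In real OpenFlow switches, table-miss behaviour is itself configured by a table-miss flow entry; always returning "controller" here is a simplification.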

Motivation

Owing to their cognitive design, Deep Learning (DL)-based Intrusion Detection Systems (IDS) excel at prediction and classification [11–13]. For SDN-IoT [14, 15] networks, DL and ML are expected to produce encouraging results. An SDN-IoT system generates an enormous amount of data, which learning techniques can exploit to make better-informed decisions [16]. Adding intelligence through learning-based methods can also improve security. Given the increasing use of ML/DL models to address such security and privacy issues, the main goal of this research is to develop a learning-based IDS for DDoS [17] attacks. Fig 2, based on Google Trends data (https://trends.google.com/trends/), shows the development of DDoS attacks, IoT applications, and ML techniques, and indicates that DDoS attacks still occur despite intensive research on protecting SDN-IoT infrastructure.

Fig 2. Development of DDoS attacks, IoT applications and ML techniques.


Taking into account the issues discussed so far, this research proposes a novel ML-based IDS, named SDN-IoT-based ML, to predict different DDoS attack categories [18]. The proposed method detects DDoS attacks with high accuracy. The dataset considered in this research is contemporary and widely used in the research community to develop intrusion detection algorithms for SDN-IoT networks. The study also surveys several ML-based models that classify traffic as benign or malicious [19]. The developed SDN-IoT ML-based IDS is evaluated against current baseline methods, specifically ML-based IDS models.

Problem statement

SDN is an emerging technology that is programmable, scalable, and flexible. In IDS design, there is also an inherent need for strategies that reduce the impact of dataset bias while maintaining accuracy, and entropy can be used to this end. This work addresses the detection of DDoS attacks in two steps. First, destination entropy and flow statistics are used to differentiate normal from malicious traffic. Second, ML algorithms classify packets as normal or malicious. In this paper, an attack is declared when the entropy value stays below the threshold for ten consecutive measurements. Because detection accuracy depends on the threshold, its optimal value must be selected experimentally: a number of tests were run to determine how attacks at different rates affect entropy on each of the proposed topologies.

Contributions

The main contributions are:

  1. Provide comprehensive insight into previous work, focusing on ML techniques that have used the BoT-IoT dataset for DDoS detection.

  2. Present the framework of the proposed work, covering attack detection, mitigation, and recognition. For these tasks, two algorithms are proposed, for attack detection and for traffic categorization.

  3. Determine DDoS attack detection effectiveness for a given SDN controller. To this end, two features (flow length and flow duration) and two approaches, flow statistics and entropy measurement, are presented, together with a model that uses the degree of attack to identify DDoS attacks.

  4. Describe the data pre-processing steps in detail, including feature extraction and selection techniques.

  5. Use ML-based models, namely K-Nearest Neighbour (KNN), Logistic Regression (LR), and Random Forest (RF), to detect DDoS attacks on SDN-based IoT.

  6. Classify benign, reconnaissance, DoS, and DDoS traffic with an accuracy of 91% and present the features extracted from the dataset.

Organization of paper

The paper is organized as follows: Section 2 reviews existing work on the detection of DDoS attacks; Section 3 describes the methodology, including the impact of an attack on traffic behavior and the identification of DDoS attacks; Section 4 describes the proposed approach for attack detection; Section 5 presents the experimental setup, the dataset, and the results; and Section 6 concludes the paper.

Related work

ML techniques are considered effective at identifying DDoS attacks in the control plane of the SDN framework [20]. This section describes previous research on DDoS detection in SDN.

VN evolved from the IoT, which in SDN serves as an integrated network for traffic management and intelligent traffic control. Yu et al. [21] develop a mechanism for detecting and quickly responding to attacks in SDN-based VN. The mechanism uses multi-dimensional information and relies on features extracted from flows rather than on the triggering of OpenFlow protocol messages. Their results verify that the scheme reduces both the detection start-up time and the false alarm rate.

Muthanna et al. [22] present a smart, SDN-enabled architecture for attack detection in IoT using Long Short-Term Memory (LSTM), evaluated on the baseline CICIDS2017 dataset. The model achieves 99.50% detection accuracy with a low false positive rate (FPR), and the results are compared with other models in terms of efficiency, precision, and other evaluation metrics. Wani et al. [23] use SDN features to mitigate DDoS attacks in IoT networks, applying ML to detect abnormal behavior and predict anomalies caused by DDoS attacks [24]; they report a precision of 98.74%. The authors also develop a fusion-entropy mechanism that identifies attacks by computing the randomness of events in the network. Their simulation results show that the entropy under attack was 99.25% lower than under normal traffic. The key advantage of this methodology is that it identifies attacks at their onset by combining two entropies, which significantly reduces the entropy value under attack [25]. Sahoo et al. [26] exploit centralized SDN control to detect attack traffic with ML, using a Support Vector Machine (SVM) model that reduces the noise caused by differences in features and achieves accurate classification with good generalization. Sultan et al. [27] propose a DDoS detection and mitigation model for SDN systems that helps identify attack traffic in multi-controller environments.

DDoS attacks in SDN for vehicular networks

The concept of VN originated from the IoT [28], an integrated network that enforces intelligent traffic management [29], adapts information services, and provides intelligent vehicle control [30] in conformity with established communication protocols and data communication standards. Sedjelmaci et al. [31] propose ELIDV, an effective and lightweight intrusion detection mechanism for vehicular networks that guards against three types of attacks: DoS, integrity target, and false alert generation. ELIDV is built on a set of principles that quickly and accurately identify malicious vehicles.

To enable safe communication, it is crucial to ensure security in SDN. Eliyan et al. [32] focus on DDoS attacks in the control plane. A DDoS attack prevents legitimate users from accessing a system or network resources by consuming all of a network's bandwidth or the resources (such as memory and CPU) of its nodes. The main categories of DDoS attacks [33] are:

  1. UDP Flood: an attack that attempts to bring down a server by flooding the targeted host with numerous UDP packets sent to random ports. Attackers exploit the connectionless nature of UDP to send a stream of UDP packets to target workstations. The target machine's queue fills up, and it becomes unable to respond to requests from legitimate users [34]. To conceal the attacking machines, the attacker typically spoofs the source IP addresses of the UDP packets.

  2. SYN Flood: the attacker initiates a large number of TCP connections by sending SYN packets to the victim but never returns the final ACK. Half-open connections accumulate until the victim's system runs out of resources and becomes inaccessible to other users.

  3. DNS Reflection: the attacker sends DNS requests whose source IP address is spoofed to be the victim's, so that responses significantly larger than the requests are sent directly to the victim. In reflection-based flooding attacks, attackers transmit forged request packets with a modified source address to a server [35]. Because the server cannot tell legitimate packets from forged ones, it sends massive response packets to the victim identified by the spoofed source address. Amplification-based flooding, a form of reflection attack, aims to trick the server into sending a large volume of response traffic to a victim in reply to only a few queries.

  4. HTTP Flood: the attacker sends an enormous volume of HTTP requests to a web server, overloading it so that it cannot handle valid requests.

  5. ICMP Flood: the attacker depletes a victim's resources by bombarding it with a huge number of ICMP echo requests (pings), which the victim must process and answer with echo replies.
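The SYN flood mechanism described in item 2 (SYNs that are never followed by an ACK) can be illustrated with a small counter of half-open handshakes. This is a simplified sketch, not part of the proposed method: packets are assumed to be hypothetical `(src, dst, flags)` tuples observed on the client-to-server direction, sequence numbers and timeouts are ignored, and the `limit` value is an arbitrary placeholder.

```python
from collections import Counter


def half_open_counts(packets):
    """Count SYNs per (src, dst) pair that are never completed by an ACK.

    `packets` is an iterable of (src, dst, flags) tuples, where flags is "S"
    for a SYN or "A" for the client's final ACK (hypothetical encoding).
    """
    pending = Counter()
    for src, dst, flags in packets:
        if flags == "S":
            pending[(src, dst)] += 1
        elif flags == "A" and pending[(src, dst)] > 0:
            pending[(src, dst)] -= 1          # handshake completed
    return {pair: n for pair, n in pending.items() if n > 0}


def suspected_syn_flood(packets, limit):
    """Flag destinations whose total half-open connections exceed `limit`.

    Aggregating per destination matters because flood sources are often spoofed.
    """
    per_dst = Counter()
    for (_src, dst), n in half_open_counts(packets).items():
        per_dst[dst] += n
    return {dst for dst, n in per_dst.items() if n > limit}


benign = [("10.0.0.2", "victim", "S"), ("10.0.0.2", "victim", "A")]
flood = [(f"172.16.0.{i}", "victim", "S") for i in range(50)]
print(suspected_syn_flood(benign, limit=10))          # set()
print(suspected_syn_flood(benign + flood, limit=10))  # {'victim'}
```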

Methodology

This section explains the dataset, attack detection architecture, algorithms, and data pre-processing steps used, along with details of the methodology applied.

Impact of an attack on traffic behavior

Entropy is computed over windows of 100 packets, and a threshold value is chosen according to the type of topology. The randomness of each window is measured over its destination IP addresses. The entropy technique introduced in [36] offers a flexible and fast way to characterize the traffic distribution and can serve as an anomaly detection tool. Here, we use entropy as a statistic for detecting DDoS attack traffic; it is calculated with Eq 1.

\mathrm{Entropy}(E) = \sum_{a=1}^{m} -R_a \log_2 R_a \qquad (1)

In Eq 1, E is the entropy and R_a is the probability of the a-th of the m distinct values observed (here, the destination IP addresses in the window). To distinguish between the classes to be trained, we need to know which characteristic in a set of training feature vectors is most informative. Information gain quantifies the significance of each feature vector characteristic, as given in Eq 2.

\mathrm{Gain}(E, F_1) = \mathrm{Entropy}(E) - \sum_{c \in \mathrm{Values}(F_1)} \frac{|E_c|}{|E|}\,\mathrm{Entropy}(E_c) \qquad (2)

Here, E_c denotes the subset of samples in E for which feature F_1 takes the value c. The Gain of every feature is then normalized using a standard technique; let MGain denote the Gain after this normalization. MGain is evaluated as in Eq 3.
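Eqs 1 and 2 can be computed directly for a discrete feature. The sketch below assumes feature values are already discrete (for example, binned flow lengths) and that labels are class names; the function names are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Eq 1: Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Eq 2: Entropy(E) minus the size-weighted entropy of each subset E_c."""
    n = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)   # E_c: samples where F_1 == c
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


labels = ["attack", "attack", "benign", "benign"]
print(information_gain(["x", "x", "y", "y"], labels))   # 1.0: feature separates classes
print(information_gain(["x", "y", "x", "y"], labels))   # 0.0: feature is uninformative
```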

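\mathrm{MGain}(E, F_1) = \frac{\mathrm{Gain}(E, F_1) - \mathrm{highest}(\mathrm{Gain}(E, F_1))}{\mathrm{highest}(\mathrm{Gain}(E, F_1)) - \mathrm{lowest}(\mathrm{Gain}(E, F_1))} \qquad (3)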

Lemma. The entropy sample space of flow length before an attack is identical to the post-attack entropy sample space.

Assume that the entropy of flow length is e(length) before the attack and e_l(length) after the attack, as shown in Eq 4.

e_l(\mathrm{length} = l) = -\sum_{a=b}^{\Phi} e(\mathrm{length} = a)\,\frac{s_{ab}}{r_b}\,\log_2 R_a \qquad (4)

In Eq 4, e_l is the entropy after the attack, e is the entropy before the attack, the terms s_ab/r_b are constants, and the left-hand side gives the post-attack entropy for a given flow length.

Consider two flow lengths b1 and b2 that have the same pre-attack entropy, e(length = b1) = e(length = b2). Because the sample-space terms (s_ab/r_b) log2 R_a are defined as constants, Eq 4 maps equal pre-attack entropies to equal post-attack entropies. Therefore, if the two entropy values do not differ before the attack, they also do not differ after it, i.e., e_l(length = b1) = e_l(length = b2).

To validate the lemma, the Mininet emulator (http://mininet.org/) is deployed, with Open vSwitch used for the switches in the network. Mininet runs on Linux, and we performed the validation on Ubuntu. Note that Mininet is primarily intended for stationary network nodes. Vehicle mobility patterns can be simulated with the Mininet-WiFi extension, which adds support for wireless networks and mobility models and is therefore suitable for simulating vehicular networks.

Entropy-based detection relies on the following observation. In normal traffic, the destinations within a window of 100 packets are highly random, so entropy is high. In attack traffic, most packets are directed to a single host, so randomness drops and entropy becomes very low. Accordingly, when the entropy falls below the threshold, the SDN Ryu controller (https://ryu-sdn.org/) raises a DDoS attack alarm.
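The windowed detection rule (100-packet windows, alarm after ten consecutive low-entropy windows, as stated in the problem statement) can be sketched as follows. The threshold of 2.0 bits and the synthetic traffic are placeholders chosen for illustration; the paper selects its threshold per topology by experiment, and this is not the Ryu application itself.

```python
import math
from collections import Counter

WINDOW = 100       # packets per entropy window (from the text)
CONSECUTIVE = 10   # consecutive low-entropy windows before an alarm (from the text)


def window_entropy(dst_ips):
    """Eq 1 over one window: Shannon entropy of the destination-IP distribution."""
    n = len(dst_ips)
    return -sum((c / n) * math.log2(c / n) for c in Counter(dst_ips).values())


def first_alarm(dst_stream, threshold):
    """Return the index of the window that triggers the alarm, or None.

    `threshold` is a placeholder; trailing packets that do not fill a whole
    window are ignored.
    """
    below = 0
    for start in range(0, len(dst_stream) - WINDOW + 1, WINDOW):
        h = window_entropy(dst_stream[start:start + WINDOW])
        below = below + 1 if h < threshold else 0   # count consecutive low windows
        if below >= CONSECUTIVE:
            return start // WINDOW
    return None


normal = [f"10.0.1.{i % 50}" for i in range(2000)]   # 50 destinations, 20 windows
attack = ["10.0.0.1"] * 1500                          # single victim, 15 windows
print(first_alarm(normal, threshold=2.0))            # None
print(first_alarm(normal + attack, threshold=2.0))   # 29
```

Requiring several consecutive low windows, rather than a single one, trades a short detection delay for fewer false alarms caused by brief bursts of traffic to one host.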

Degree of attack for DDoS attack identification

Once a DDoS attack occurs, DDoS identification in the SDN network begins, following the steps below. Step 1: The degree of a DDoS attack in SDN is characterized by four features u1, u2, u3, and u4, whose MGain values are denoted M(u1), M(u2), M(u3), and M(u4). The degree of attack (SD) in the SDN network is then defined in Eq 5.

SD = \frac{1}{m} \sum_{a=1}^{m} M(u_a), \quad m = 8 \qquad (5)

Step 2: Let U denote the flow, where U_b denotes the MGain of the flow during time b, given by Eq 3. Eqs 3 and 5 are used to identify an attack on the SDN network: if U_b = 0 and SD ≤ 1, no attack is identified, whereas if U_b = 1 and SD ≥ 1, an attack is identified. In particular, if SD > 1, the flow passing during time b is an attack and the SDN network is under attack; otherwise, the flow during that time is not an attack. Algorithm 1 gives the details of the proposed approach.

Algorithm 1 Algorithm for proposed approach

Input: Degree of Attack

Output: Recognition of DDoS

1: procedure Begin(:)

2:  if b ∈ B then

3:   Evaluate SD using Eq 5

4:  if SD ≥ 1 then

5:   DDoS attack identified on SDN network;

6:  else

7:   DDoS attack is not identified on SDN network;

8:  endif

9: endif
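The decision rule of Eq 5 and Algorithm 1 can be sketched in Python as follows, assuming the M(u_a) values for each interval b have already been computed with Eq 3. The function names and the sample values are hypothetical; the sketch averages whatever values it is given, which equals Eq 5 when m values are supplied.

```python
def degree_of_attack(mgain_values):
    """Eq 5: SD is the mean of the normalized gains M(u_a)."""
    if not mgain_values:
        raise ValueError("need at least one MGain value")
    return sum(mgain_values) / len(mgain_values)


def algorithm1(mgain_by_interval):
    """Algorithm 1: for each interval b, report a DDoS attack when SD >= 1.

    `mgain_by_interval` maps an interval label b to its list of M(u_a) values.
    Returns a dict mapping b to True (attack identified) or False.
    """
    return {b: degree_of_attack(vals) >= 1 for b, vals in mgain_by_interval.items()}


print(algorithm1({"b1": [1.5, 1.2, 0.9, 1.0], "b2": [0.1, 0.2, 0.3, 0.4]}))
# {'b1': True, 'b2': False}
```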

Several techniques can be combined to differentiate DDoS attack packet flows from normal ones. First, monitoring traffic volume and rates is crucial: DDoS attacks typically generate far more packets per second than normal traffic, so analyzing these metrics reveals sudden spikes or anomalies that may indicate an ongoing attack. Second, statistical analysis of packet inter-arrival times and entropy plays a pivotal role [37]: because attack packets are repetitive, DDoS traffic often exhibits irregular timing and lower entropy, in contrast to the more diverse and predictable patterns of normal traffic [38]. Third, ML models such as Random Forest, trained on historical data, classify traffic according to learned patterns of normal and attack behavior; they can discern subtle deviations from established norms and thus help detect and respond to emerging threats in real time. Finally, protocol and payload inspection examines network protocols and packet contents for anomalies indicative of attack signatures or malicious intent, giving deeper insight into the nature of traffic anomalies. Integrating these methods in a single detection framework allows DDoS attack flows to be separated effectively from normal traffic, strengthening network defenses and resilience against evolving cyber threats.

Fig 3 shows the MGain values of individual features for normal and attack traffic over eight time intervals; the values fluctuate across features and intervals. Fig 4 displays the changes at various time points and features. It shows relatively few changes, and after the DDoS attack the MGain value for the flow arrival rate is higher than that for flow length.

Fig 3. Flow length.


Fig 4. Flow duration.


Proposed approach

The standard phase is the normal operating state of the network. When the volume of communication exceeds a specific threshold, the network is assumed to be under attack and the system enters the detection phase, where it must determine whether a DDoS attack is in progress. If a DDoS attack is detected, the attack recognition phase begins, during which the system tries to identify the attack's source and path. After locating the attack source, the system enters the mitigation phase, in which a mitigation plan is put into action and all traffic from the attacker's source is halted. Fig 5 depicts the system's transitions through these stages; each arrow indicates an event that moves the system from one stage to another. The proposed IoT DDoS defense mechanism consists of several elements, each performing a different part of the overall task.

Fig 5. Framework of proposed SDN-IoT for attack detection.


In the standard stage, the framework monitors message frequency and traffic volume to detect any deviation from the network's typical behavior. If it detects a dramatic rise in the number and size of messages attempting to reach the IoT gateway, control is transferred to the detection stage. There, a monitoring element searches for DDoS attacks in the IoT network: it finds anomalies and verifies whether a DDoS attack is occurring. The system then enters the recognition stage, which analyzes data from the detection stage and uses the SDN's global view to trace the attack path and identify the attacker. If the attacker is not recognized, the system returns to an earlier stage. Once the attacker is recognized, the mitigation stage is initiated and a suitable defense tactic is employed to halt malicious traffic. After the attack has been overcome, the system returns to the standard stage. All details are presented in Fig 5.
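The stage transitions of Fig 5 can be written as a small state machine. This is a sketch under stated assumptions: the event flags are hypothetical names, and where the text leaves the target open, the sketch assumes an unconfirmed detection returns to the standard stage and an unrecognized attacker returns to the detection stage.

```python
from enum import Enum, auto


class Stage(Enum):
    STANDARD = auto()
    DETECTION = auto()
    RECOGNITION = auto()
    MITIGATION = auto()


def next_stage(stage, *, rate_exceeds_threshold=False, ddos_confirmed=False,
               attacker_found=False, attack_over=False):
    """Return the next stage given the events described for Fig 5."""
    if stage is Stage.STANDARD:
        return Stage.DETECTION if rate_exceeds_threshold else Stage.STANDARD
    if stage is Stage.DETECTION:
        # Assumption: an anomaly that is not confirmed as DDoS ends the episode.
        return Stage.RECOGNITION if ddos_confirmed else Stage.STANDARD
    if stage is Stage.RECOGNITION:
        # Assumption: "an earlier stage" is taken to mean the detection stage.
        return Stage.MITIGATION if attacker_found else Stage.DETECTION
    if stage is Stage.MITIGATION:
        return Stage.STANDARD if attack_over else Stage.MITIGATION
    raise ValueError(f"unknown stage: {stage}")


s = Stage.STANDARD
s = next_stage(s, rate_exceeds_threshold=True)   # Stage.DETECTION
s = next_stage(s, ddos_confirmed=True)           # Stage.RECOGNITION
s = next_stage(s, attacker_found=True)           # Stage.MITIGATION
s = next_stage(s, attack_over=True)              # Stage.STANDARD
```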

Detection of attacks

The detection module is the crucial subsystem in any DDoS defense plan, since it determines how proactive the system is. Because DDoS attacks are growing rapidly, their detection is the main goal of this research. Most DDoS detection methods draw on statistics, data mining, machine learning, computational intelligence, and/or knowledge-based approaches, and many recently proposed systems combine several of these. A DDoS detection mechanism includes a monitoring component that scans the network for deviations from usual activity and then determines whether a DDoS attack is the cause; if it finds a potential attack, it notifies the system or network administrator. The Detection Phase of the SDN-IoT framework consists of two sub-modules: (1) monitoring the system to detect unexpected message flows, and (2) evaluating the abnormal behavior to verify whether it is a DDoS attack. The first sub-module computes the rate at which messages arrive at the IoT gateway and applies an anomaly detection method for large-scale data to find outliers in the message flow.

Algorithm 2 shows the precise steps involved in DDoS attack detection. A counter of new messages, called the flag (f), is incremented by one each time a new message (m) arrives. The remainder of f divided by the threshold (t), a specified upper bound on the number of new messages, is then computed. If the remainder is not zero, the controller receives the new message and processes it. Otherwise, the elapsed time (e) is computed as the difference between the current time (e_curr) and the previous time (e_prev) at which the remainder was zero. The arrival rate (r) of the newest messages is then obtained by dividing the threshold by the elapsed time.

Algorithm 2 Attack Detection

Input: Arrived newest message = m

Output: Detection of malicious behavior

1: procedure Begin(:)

2:   Increment the flag: f ← f + 1

3:  if f mod t = 0 then:

4:   e = e_curr − e_prev; e_prev ← e_curr

5:   r = t / e

6:  else

7:   Transmit the newest message to the SDN controller

8:  endif

9:  if r is benign, then

10:   inform the SDN controller

11:  else

12:   Confirm whether the malicious behavior is due to DDoS

13:  endif
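A runnable version of Algorithm 2 is sketched below. The algorithm does not define when a rate "is benign", so the sketch assumes a hypothetical fixed bound `max_benign_rate` in messages per second; the class name and return labels are also illustrative. The clock is injectable so the logic can be exercised without real timing.

```python
import time


class RateMonitor:
    """Sketch of Algorithm 2: estimate the arrival rate once every t messages."""

    def __init__(self, t, max_benign_rate, clock=time.monotonic):
        self.t = t                              # threshold on the number of new messages
        self.max_benign_rate = max_benign_rate  # hypothetical benign bound (msgs/s)
        self.clock = clock                      # injectable clock, handy for testing
        self.f = 0                              # flag: count of new messages
        self.e_prev = clock()                   # time of the last zero remainder

    def on_message(self, m):
        """Process the newest message m; return 'forward', 'benign' or 'suspect'."""
        self.f += 1                             # step 2
        if self.f % self.t != 0:
            return "forward"                    # step 7: hand m to the SDN controller
        e_curr = self.clock()
        e = max(e_curr - self.e_prev, 1e-9)     # step 4, guarded against a zero interval
        self.e_prev = e_curr
        r = self.t / e                          # step 5: messages per second
        # Steps 9-12: benign rates are reported; others go on to DDoS confirmation.
        return "benign" if r <= self.max_benign_rate else "suspect"


ticks = iter([0.0, 10.0, 10.01])                 # fake clock readings, in seconds
mon = RateMonitor(t=5, max_benign_rate=100, clock=lambda: next(ticks))
first = [mon.on_message(i) for i in range(5)]    # 5 messages in 10 s
second = [mon.on_message(i) for i in range(5)]   # 5 messages in 0.01 s
print(first[-1], second[-1])                     # benign suspect
```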

To determine whether a discovered outlier is part of a DDoS attack, the irregularity must be analyzed carefully in the detection stage; Algorithm 2 describes the latter portion of this stage. The proposed framework uses ML algorithms to find potential DDoS attacks, since they can discover patterns and trends in incomplete or complex data by extracting the relevant features. After the monitoring sub-module has identified an abnormality, flow-entry data are retrieved from the controller and passed to the trained neural network or ML algorithms, which identify malicious and DDoS-based traffic. Before a neural network model can be deployed for real detection, it must first be trained on a dataset built in advance from features of malicious traffic. The dataset must contain distinct labeled examples of both malicious and benign traffic.

It is crucial to emphasize that the system operates autonomously to keep detection and mitigation efficient. No human intervention is necessary, although the system raises an alarm to alert the network administrator when a DDoS attack is detected. Fig 6 shows a schematic diagram of how the proposed system operates. The SDN controller exports IP flow measurements, or features, every second via the OpenFlow protocol. These measurements are heterogeneous and can be divided into qualitative features, such as source/destination ports and IP addresses, and quantitative features, such as packet rate and bits per second. Qualitative measurements must be converted into quantitative ones before the detection stage can use them.
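One simple way to turn a heterogeneous flow record into a numeric vector is sketched below, using Python's standard `ipaddress` module. The field names and the protocol vocabulary are hypothetical, and encoding an IP address as a single integer imposes an arbitrary ordering; one-hot encoding, hashing, or per-subnet features are common alternatives when that matters for the classifier.

```python
import ipaddress

PROTOCOLS = ["tcp", "udp", "icmp"]   # hypothetical protocol vocabulary


def encode_flow(flow):
    """Turn one flow record (a dict) into a purely numeric feature vector."""
    vec = [
        int(ipaddress.ip_address(flow["src_ip"])),   # address as an integer
        int(ipaddress.ip_address(flow["dst_ip"])),
        flow["src_port"],
        flow["dst_port"],
        flow["packet_rate"],                         # already quantitative
        flow["bits_per_second"],
    ]
    vec += [1 if flow["proto"] == p else 0 for p in PROTOCOLS]   # one-hot protocol
    return vec


record = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 1234,
          "dst_port": 80, "packet_rate": 12.5, "bits_per_second": 8000,
          "proto": "udp"}
print(encode_flow(record))
# [167772161, 167772162, 1234, 80, 12.5, 8000, 0, 1, 0]
```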

Fig 6. SDN security system.


As shown in Fig 7, training of the neural network begins as soon as the system is started. The ML algorithms take the characteristics of both malicious and benign traffic as input, and these characteristics are compared with the discovered anomaly to help identify a DDoS attack. The input attributes are the packet count matched by each flow entry, the flow entry's duration, and its rate. These features are derived from the controller's flow data and can change depending on the desired level of accuracy [39]. They form the feature vectors that the ML algorithms use to distinguish benign from malicious communications. When a malicious flow entry is found, its destination address is recorded in a list named the malicious IP list. If the number of malicious flow entries reaches a threshold, a DDoS alarm is triggered and the controller stops processing flow statistics messages. If the flow entry is benign and the maximum number of malicious entries has not been reached, the next flow entry is evaluated.
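The flow-entry loop described above can be sketched as follows. The trained model is replaced by any callable `is_malicious` (here a hypothetical rate rule used only for illustration), and the entry field names and `max_malicious` threshold are assumptions rather than values from the paper.

```python
def scan_flow_entries(entries, is_malicious, max_malicious):
    """Walk flow entries, record malicious destinations, stop at a threshold.

    entries: iterable of dicts with "dst_ip", "packet_count", "duration", "rate".
    is_malicious: callable taking the feature tuple and returning a bool
                  (a stand-in for the trained ML model).
    Returns (malicious_ip_list, alarm_raised).
    """
    malicious_ip_list = []
    for entry in entries:
        features = (entry["packet_count"], entry["duration"], entry["rate"])
        if is_malicious(features):
            malicious_ip_list.append(entry["dst_ip"])
            if len(malicious_ip_list) >= max_malicious:
                return malicious_ip_list, True   # DDoS alarm: stop processing
        # Otherwise evaluate the next flow entry.
    return malicious_ip_list, False


entries = [
    {"dst_ip": "10.0.0.5", "packet_count": 10, "duration": 5, "rate": 2},
    {"dst_ip": "10.0.0.1", "packet_count": 9000, "duration": 3, "rate": 3000},
    {"dst_ip": "10.0.0.1", "packet_count": 8000, "duration": 2, "rate": 4000},
    {"dst_ip": "10.0.0.7", "packet_count": 12, "duration": 4, "rate": 3},
]
rule = lambda feats: feats[2] > 1000   # hypothetical stand-in for a trained classifier
print(scan_flow_entries(entries, rule, max_malicious=2))
# (['10.0.0.1', '10.0.0.1'], True)
```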

Fig 7. Traffic categorization.


The proposed approach differentiates accurately between reconnaissance, DoS, and DDoS traffic by addressing unbalanced datasets and overfitting. Deploying ML algorithms specifically designed for intrusion detection in SDN environments enhances the model's ability to learn and recognize complex attack patterns [40]. These algorithms learn from labeled examples to differentiate between reconnaissance, DoS, and DDoS attacks. The approach explicitly tackles dataset imbalance, which is common in cybersecurity datasets where benign traffic vastly outnumbers malicious traffic. Techniques such as oversampling the minority class, undersampling the majority class, or synthetic data generation (e.g., SMOTE) can be used to balance the dataset, ensuring that the ML model is exposed to sufficient examples of every traffic type.
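The synthetic-generation idea behind SMOTE can be sketched in a few lines of NumPy: each new minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. This is a simplified illustration, not the imbalanced-learn implementation; the function names, the default `k=5`, and the toy data are assumptions.

```python
import numpy as np


def smote_like(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise Euclidean distances; a point is never its own neighbour.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))   # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def balance(X, y, seed=0):
    """Oversample every class up to the size of the largest class."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, cnt in zip(classes, counts):
        if cnt < target:
            synth = smote_like(X[y == cls], target - cnt, seed=seed)
            X_parts.append(synth)
            y_parts.append(np.full(len(synth), cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


X = [[float(i), 0.0] for i in range(10)] + [[100.0, 0.0], [101.0, 0.0], [102.0, 0.0]]
y = ["benign"] * 10 + ["ddos"] * 3
Xb, yb = balance(X, y)
print((yb == "benign").sum(), (yb == "ddos").sum())   # 10 10
```

Balancing should be applied to the training split only; oversampling before the train/test split leaks near-copies of test samples into training and inflates the measured accuracy.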

Limitations of feature selection for model evaluation

Evaluating a model for detecting DDoS attacks in SDVN on only a limited subset of the collected features has several potential limitations that may affect how well the results generalize. (i) Selecting only a subset of features may omit information that is crucial for distinguishing between traffic types (benign, reconnaissance, DoS, DDoS), reducing the model's ability to identify all types of malicious activity accurately. (ii) With few features, the model may not capture the complexity of the data and may underfit, that is, be too simple to capture the underlying patterns, which results in poor performance across datasets. (iii) A model trained on a limited feature set may lack robustness to variations and anomalies in real-world traffic, making it less effective at adapting to new attack types or to traffic patterns that were not represented in the training data.

Results and discussions

Experimental setup

Mininet (http://mininet.org/) is a network emulator that runs large virtual networks on the limited resources of a single computer or virtual machine. It is widely used in SDN research because it lets users run unmodified code on virtual hardware.

The entire experiment was carried out in Jupyter Notebook [41] (https://jupyter.org/) on an HP laptop running 64-bit Windows 10, with a quad-core 8th-generation Intel Core i7 processor and 16.0 GB of RAM. Pandas and NumPy were used to load and pre-process the dataset; scikit-learn was used to split the data into training and test sets, to evaluate the models, and to compute the evaluation metrics; and Matplotlib was used for data visualization. Fig 8 shows the layout of the proposed model for categorizing attacks in SDN-IoT using ML.

Fig 8. Proposed Model for categorization of attacks in SDN-IoT using ML.

Dataset

The choice of dataset substantially affects the effectiveness of a threat detection methodology. Although various authors have used several datasets for intrusion detection in IoT contexts, this study uses BoT-IoT [42] for threat detection in IoT. The BoT-IoT dataset was created by the University of New South Wales (UNSW), Canberra, Australia. The complete dataset has 72 million records and includes both benign traffic and several attack types, such as DDoS, DoS, data exfiltration, OS scan, and service scan. For the SDN version, the CSV release of BoT-IoT [34] consists of 74 files. To keep the data manageable, this experimental analysis considered traffic from two attack types, DDoS and reconnaissance, together with benign traffic.
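Selecting these traffic types from the 74 CSV files can be sketched with pandas. The sketch assumes that the files sit in one directory, that the label column is named `category` as in Table 1, and that benign rows are labelled `Normal`; check these against your copy of the dataset. `load_bot_iot` is a hypothetical helper, not part of any released tooling, and the `column_names` option covers copies whose CSV files have no header row.

```python
import glob
import os

import pandas as pd


def load_bot_iot(csv_dir, keep_categories=("DDoS", "Reconnaissance", "Normal"),
                 column_names=None):
    """Concatenate every CSV file in csv_dir and keep only the chosen categories."""
    read_kwargs = {"low_memory": False}
    if column_names is not None:
        # Header-less files: supply the feature names explicitly.
        read_kwargs.update(header=None, names=column_names)
    paths = sorted(glob.glob(os.path.join(csv_dir, "*.csv")))
    if not paths:
        raise FileNotFoundError(f"no CSV files found in {csv_dir}")
    data = pd.concat([pd.read_csv(p, **read_kwargs) for p in paths],
                     ignore_index=True)
    # Keep only the traffic types under study (assumed 'category' label values).
    subset = data[data["category"].isin(keep_categories)]
    return subset.reset_index(drop=True)
```

For example, adding `"DoS"` to `keep_categories` would also retain DoS rows; the directory passed as `csv_dir` is whatever location the files were downloaded to.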

Characteristics of the BoT-IoT dataset.

  • Origin and Purpose: The BoT-IoT dataset (also referred to as "Bot-IoT 2018") is a publicly available IoT traffic dataset designed for evaluating intrusion detection systems (IDS) in IoT environments. It contains traffic captured from various IoT devices and networks, including simulated attack scenarios.

  • Dataset Size and Scope: The full dataset contains about 72 million records; the amount actually used depends on the subset chosen for experimentation. In either case it comprises a substantial volume of network traffic, including both benign and malicious samples.

  • Types of Attacks: The dataset covers a diverse range of attacks commonly encountered in IoT environments. These include various forms of DoS (Denial of Service), DDoS, reconnaissance attacks, and other malicious activities targeting IoT devices and networks.

  • Traffic Diversity: It includes traffic generated by different IoT devices and applications, each characterized by unique communication patterns and protocols. This diversity reflects real-world IoT environments, making it suitable for testing the robustness of intrusion detection systems under various conditions.

  • Anomaly and Attack Scenarios: The BoT-IoT dataset incorporates both normal traffic patterns and anomalous behaviors that simulate attacks. This allows researchers and practitioners to assess the effectiveness of detection algorithms in accurately distinguishing between benign activities and potential security threats.

  • Dataset Usage: Researchers often use subsets or specific versions of the BoT-IoT dataset for experimental evaluations in intrusion detection, machine learning-based classification, anomaly detection, and other related studies aimed at enhancing IoT security.

Feature extraction

Capturing network data while guaranteeing correct labels is difficult, because packets must be transmitted and captured in sync and then automatically labeled as benign or malicious. To do this, we scheduled scripts with the Linux cron utility on the Ubuntu tap VM, so that each script performed an explicit benign or malicious scenario at a specific time. For instance, to generate DDoS traffic, we scheduled customized bash scripts that used hping3 and GoldenEye to launch the attacks, while background traffic was generated normally. Tshark [43, 44] ran concurrently to collect raw packets and save them in 1 GB PCAP files, making network information easier to retrieve. Because the IP addresses of the attacker and victim workstations were defined in advance, and we tried to ensure that only attack traffic was exchanged between the two groups, benign and malicious traffic could be distinguished by IP address.
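The IP-based labeling rule can be expressed as a small function. This sketch assumes that any traffic exchanged between the attacker group and the victim group, in either direction (including replies), counts as malicious; `label_flow` and the address sets are hypothetical and do not reproduce the actual testbed addresses.

```python
def label_flow(saddr, daddr, attacker_ips, victim_ips):
    """Return 1 (malicious) for traffic exchanged between the attacker and victim
    groups, in either direction, and 0 (benign) otherwise."""
    attack_direction = saddr in attacker_ips and daddr in victim_ips
    reply_direction = saddr in victim_ips and daddr in attacker_ips
    return int(attack_direction or reply_direction)


# Hypothetical address groups, for illustration only.
ATTACKERS = {"10.0.1.10", "10.0.1.11"}
VICTIMS = {"10.0.2.20"}

print(label_flow("10.0.1.10", "10.0.2.20", ATTACKERS, VICTIMS))  # 1
print(label_flow("10.0.2.20", "10.0.1.11", ATTACKERS, VICTIMS))  # 1 (reply traffic)
print(label_flow("10.0.3.30", "10.0.2.20", ATTACKERS, VICTIMS))  # 0
```

The rule is only as reliable as the isolation of the two groups: any benign traffic that happens to flow between them would be mislabeled as malicious.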

The BoT-IoT dataset has 46 features in total (the last three are class labels), of which 22 are extracted for this research, as shown in Table 1.

Table 1. Extracted features from BoT-IoT dataset.

| Sno. | Feature name | Description |
|---|---|---|
| 1 | pkseqID | Row (tuple) identifier |
| 2 | proto | Textual representation of the transaction protocol in the network flow |
| 3 | saddr | Source IP address |
| 4 | sport | Source port number |
| 5 | daddr | Destination IP address |
| 6 | dport | Destination port number |
| 7 | seq | Sequence number assigned by the capture tool |
| 8 | stddev | Standard deviation of the aggregated records |
| 9 | N_IN_Conn_P_SrcIP | Number of connections from each source IP |
| 10 | min | Minimum duration of the aggregated records |
| 11 | state_number | Numerical representation of the feature state |
| 12 | mean | Average duration of all aggregated records |
| 13 | N_IN_Conn_P_DstIP | Number of connections coming into each destination IP |
| 14 | drate | Packets per second from destination to source |
| 15 | srate | Packets per second from source to destination |
| 16 | max | Maximum duration of the aggregated records |
| 17 | dur | Total duration of the record |
| 18 | spkts | Count of packets from source to destination |
| 19 | dpkts | Count of packets from destination to source |
| 20 | attack | Class label: benign = 0, malicious = 1 |
| 21 | category | Traffic category |
| 22 | subcategory | Traffic subcategory |

The features saddr, sport, daddr, dport, and proto are regarded as network flow identifiers: together they uniquely identify a flow at a given time and aid in labeling. For training and validating the machine learning models with binary classification, malicious instances are labeled 1 and benign instances 0.
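Grouping records by these five identifiers is how individual records are tied to a flow. The sketch below uses plain dictionaries with hypothetical records and addresses; `group_by_flow` is an illustrative helper, and reading a flow's label from its first record assumes that all records of a flow carry the same `attack` value.

```python
from collections import defaultdict


def group_by_flow(records):
    """Group records by the (saddr, sport, daddr, dport, proto) five-tuple."""
    flows = defaultdict(list)
    for rec in records:
        key = (rec["saddr"], rec["sport"], rec["daddr"], rec["dport"], rec["proto"])
        flows[key].append(rec)
    return dict(flows)


records = [
    {"saddr": "10.0.1.10", "sport": 4444, "daddr": "10.0.2.20", "dport": 80, "proto": "tcp", "attack": 1},
    {"saddr": "10.0.1.10", "sport": 4444, "daddr": "10.0.2.20", "dport": 80, "proto": "tcp", "attack": 1},
    {"saddr": "10.0.3.30", "sport": 5353, "daddr": "10.0.2.20", "dport": 53, "proto": "udp", "attack": 0},
]
flows = group_by_flow(records)
for key, recs in flows.items():
    # Flow label taken from its first record (1 = malicious, 0 = benign).
    print(key, len(recs), recs[0]["attack"])
```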

Data pre-processing

Data pre-processing is an essential step in the evaluation. Some attributes used in the proposed framework, including saddr, daddr, and proto (see Table 1), are categorical and must be transformed into a numeric form that the algorithms can process. saddr and daddr hold the source and destination IP addresses, respectively, and proto holds the protocol type used during the flow. These features are converted to integers: every source and destination IP address is assigned a number. The BoT-IoT dataset contains 301 IP addresses in total.
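A minimal sketch of this integer encoding follows. It assumes one shared mapping for saddr and daddr, so that a host receives the same integer whether it appears as a source or a destination; the text does not say whether the authors used one mapping or two. `build_index` and the sample values are hypothetical. On the full dataset, the size of the shared IP mapping would equal the number of distinct addresses (301 according to the text).

```python
def build_index(*columns):
    """Map every distinct value across the given columns to a consecutive integer."""
    index = {}
    for column in columns:
        for value in column:
            if value not in index:
                index[value] = len(index)  # next unused integer
    return index


saddr = ["10.0.1.10", "10.0.3.30", "10.0.1.10"]
daddr = ["10.0.2.20", "10.0.2.20", "10.0.3.30"]
proto = ["tcp", "udp", "tcp"]

ip_index = build_index(saddr, daddr)  # one shared mapping for both IP columns
proto_index = build_index(proto)

saddr_enc = [ip_index[ip] for ip in saddr]
daddr_enc = [ip_index[ip] for ip in daddr]
proto_enc = [proto_index[p] for p in proto]
print(saddr_enc, daddr_enc, proto_enc)  # [0, 1, 0] [2, 2, 1] [0, 1, 0]
```

Such integer codes impose an arbitrary order on categories; distance-based models like KNN treat nearby codes as similar, which is one reason one-hot encoding is sometimes preferred for nominal features.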

Imbalanced data is a well-known issue in ML: it arises when the class distribution is skewed, whether slightly or extremely. A learning model trained on an extremely imbalanced dataset performs poorly when predicting the minority classes. The BoT-IoT dataset contains approximately 0.15% benign data, versus roughly 55% DDoS and roughly 50% DoS data, so it cannot be used as-is to train a model that predicts benign, DDoS, or DoS packets. The BoT-IoT collection contains 34 million DoS packets and 40 million DDoS packets. In the proposed framework, the data are divided into 14 equal chunks containing distinct DDoS and DoS packets, so that each chunk holds 3.2 million benign packets, 3.2 million DDoS packets, and 3.2 million DoS packets. Consequently, the 14 chunks include 7.5 million packets combined. This approach reduces over-fitting and distributes the data for the three classes equally among all chunks.
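One possible reading of the chunking scheme is sketched below: the DDoS and DoS samples are split into disjoint, equally sized slices, and each slice is paired with the same number of benign samples. The assumption that the scarce benign pool is reused in every chunk is ours, not a detail confirmed by the text, and `make_balanced_chunks` is a hypothetical helper.

```python
def make_balanced_chunks(benign, ddos, dos, n_chunks):
    """Split DDoS and DoS samples into n_chunks disjoint slices of equal size and
    pair each slice with the same number of benign samples."""
    # Chunk size is limited by the smallest of: a DDoS slice, a DoS slice, the benign pool.
    size = min(len(ddos) // n_chunks, len(dos) // n_chunks, len(benign))
    if size == 0:
        raise ValueError("not enough samples for the requested number of chunks")
    chunks = []
    for i in range(n_chunks):
        lo, hi = i * size, (i + 1) * size
        chunks.append({
            "benign": benign[:size],  # assumption: same benign samples in every chunk
            "ddos": ddos[lo:hi],      # disjoint DDoS slice
            "dos": dos[lo:hi],        # disjoint DoS slice
        })
    return chunks


chunks = make_balanced_chunks(list(range(10)), list(range(100)), list(range(90)), n_chunks=3)
print([{k: len(v) for k, v in c.items()} for c in chunks])
# each chunk: 10 benign, 10 DDoS, 10 DoS
```

When the benign pool is the limiting factor, as in this toy call, part of the attack data is left unused; a model can then be trained per chunk or on the chunks in turn.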

Defining machine learning model classes

ML is a subfield of Artificial Intelligence (AI) in which a system learns and improves from experience without being explicitly programmed [45]. ML uses algorithms to build learning models from data; its main goal is to let computers learn automatically, without human intervention, so that they can make accurate decisions in the future. The K-Nearest Neighbor, Random Forest, and Logistic Regression classifiers were chosen for SDN intrusion detection in IoT environments because their distinct characteristics and strengths suit different aspects of intrusion detection. The reasons for selecting each classifier, along with its advantages and disadvantages, are as follows:

  1. KNN: KNN is a fundamental and simple machine learning technique that classifies data according to a distance metric, here either Manhattan or Euclidean distance. After the distances between a query point and the stored data points are computed, the k closest points are selected, and the query is assigned the majority class among these k nearest neighbors. k can be any positive integer; the model's accuracy depends on this choice, and a value of k = 5 is used. Because KNN makes no assumptions about the underlying data distribution, it is non-parametric, which is useful when dealing with real-world data.

    Advantages:

    • Performs well on small datasets where the computational cost is manageable.

    • Can handle multi-class classification without any modification.

    • KNN is a lazy learner, meaning it requires no explicit training phase, making it fast to deploy.

    Disadvantages:

    • High computational cost during prediction as it involves calculating the distance between the query point and all points in the dataset.

    • Requires storing the entire dataset, which can be impractical for large datasets.

    • Performance can be degraded by irrelevant or redundant features, necessitating careful feature selection and normalization.

  2. RF: Random Forest is based on ensemble learning, which combines multiple classifiers to handle complicated data problems and improve the model's overall effectiveness. An RF consists of many distinct decision trees, each trained on a different portion of the dataset. Every tree casts a vote, and the majority vote determines the predicted outcome. Increasing the number of trees generally improves accuracy and helps avoid overfitting.

    Advantages:

    • Generally provides high accuracy and robustness to overfitting due to averaging multiple trees.

    • Can handle both classification and regression tasks and works well with high-dimensional data.

    • The ensemble nature reduces the risk of overfitting compared to individual decision trees.

    Disadvantages:

    • More complex and computationally intensive than single decision trees, particularly during the training phase.

    • While individual trees are interpretable, the overall Random Forest model is less transparent.

    • Can be slower to train, especially with large datasets and a high number of trees.

  3. LR: Logistic Regression estimates the value of a categorical dependent variable from a set of independent variables by predicting a probability between 0 and 1. It is related to linear regression but solves a different kind of problem: classification rather than the prediction of continuous values. Compared with other ML algorithms, LR is valuable because it can handle a variety of data types and its coefficients help identify the most relevant variables.

    Advantages:

    • Computationally efficient and fast to train, making it suitable for real-time applications.

    • Provides coefficients that are easily interpretable, indicating the strength and direction of the relationship between features and the outcome.

    • Effective for binary classification tasks, which are common in intrusion detection.

    Disadvantages:

    • Assumes a linear relationship between the features and the log odds of the outcome, which may not capture complex, non-linear patterns in IoT traffic.

    • May perform poorly if the underlying data relationships are non-linear and complex.

    • Can be sensitive to outliers, which can skew the model’s performance.
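Items 1 and 3 describe two mechanisms that are easy to state in code: KNN's majority vote over the k nearest points under Euclidean or Manhattan distance, and LR's sigmoid applied to a linear score (the log odds). The sketch below is a minimal pure-Python illustration with toy data, hypothetical LR coefficients, and k = 5 as in item 1; it is not the implementation used in the experiments, which relied on scikit-learn.

```python
import math
from collections import Counter


# --- KNN (item 1): distance-based majority vote, no training step ---
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=5, distance=euclidean):
    """Return the majority label among the k stored points closest to query."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: distance(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # Ties go to the label seen first in the distance-sorted neighbour list.
    return votes.most_common(1)[0][0]


# --- LR (item 3): linear log odds squashed by the sigmoid ---
def sigmoid(z):
    """Numerically stable logistic function, mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lr_predict_proba(weights, bias, x):
    z = bias + sum(w * xi for w, xi in zip(weights, x))  # log odds, linear in x
    return sigmoid(z)


# Toy data: two well-separated clusters (0 = benign, 1 = malicious).
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (9, 9), (9, 10), (10, 9), (10, 10), (9.5, 9.5)]
train_y = [0] * 5 + [1] * 5
print(knn_predict(train_X, train_y, (1.2, 0.8)))                      # 0
print(knn_predict(train_X, train_y, (8.5, 9.0), distance=manhattan))  # 1

w, b = [0.8, -0.5], -1.0  # hypothetical fitted LR coefficients
p = lr_predict_proba(w, b, [2.0, 1.0])
print(round(p, 4), int(p >= 0.5))  # 0.525 1
```

The contrast matches the trade-offs listed above: `knn_predict` does all its work at query time by scanning every stored point, while the LR prediction costs one dot product once the coefficients are known.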

Validation metrics

Performance analysis is a crucial step after data preparation and ML model training. Various performance indicators based on the confusion matrix are used to assess the effectiveness of the machine learning models, and it is important to assess a model's overall effectiveness before applying it to unseen data.

Confusion matrix

The confusion matrix is a method for evaluating the performance of ML classifiers in SDN-IoT. It compares the classifier's predictions with the actual labels of the test data, and it supports both the calculation of performance metrics and the detection of errors (FP and FN).

  • True Positives (TP): positive instances that the classifier correctly labels as positive.

  • True Negatives (TN): negative instances that the classifier correctly labels as negative.

  • False Positives (FP): negative instances that the classifier mistakenly labels as positive.

  • False Negatives (FN): positive instances that the classifier mistakenly labels as negative.

In cybersecurity research, a positive event is typically a malicious occurrence, and correctly classifying it is a true positive. A negative event is a benign occurrence, and correctly classifying it is a true negative. Misclassifying a benign event as malicious is a false positive; similarly, labeling a malicious event as benign is a false negative.
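Counting the four confusion-matrix cells follows directly from these definitions, with malicious traffic (label 1) as the positive class. The sketch below is a minimal illustration; `confusion_counts` is a hypothetical helper, and scikit-learn's `confusion_matrix` provides the same counts in matrix form.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN, treating `positive` (malicious) as the positive class."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # attack correctly detected
            else:
                fp += 1  # benign traffic flagged as an attack
        else:
            if t == positive:
                fn += 1  # attack missed
            else:
                tn += 1  # benign traffic correctly passed
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


print(confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
# {'TP': 2, 'TN': 1, 'FP': 1, 'FN': 1}
```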

The metrics used for the performance evaluation in our study are:

  1. Accuracy: the proportion of correct predictions among all predictions made by the model, given by Eq 6:
    $$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$
  2. Precision: the proportion of instances predicted as positive that are actually positive, given by Eq 7:
    $$\text{Precision} = \frac{TP}{TP + FP} \tag{7}$$
  3. Recall: the proportion of the actual instances of a class that the model correctly predicts, given by Eq 8:
    $$\text{Recall} = \frac{TP}{TP + FN} \tag{8}$$
  4. F1-Score: the harmonic mean of Precision and Recall, which balances the two to evaluate the model more thoroughly, given by Eq 9:
    $$\text{F1-Score} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN} \tag{9}$$
  5. False Positive and False Negative: False Positives are instances where legitimate traffic is incorrectly classified as a DDoS attack. False Negatives are instances where actual DDoS attack traffic is incorrectly classified as legitimate.
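Eqs 6 to 9 can be computed directly from the four counts. The worked example below uses illustrative counts (not results from this study) and also confirms that Eq 9 equals the harmonic mean of Precision and Recall. Returning 0.0 when a denominator is zero is a convention chosen here; it matches scikit-learn's default value in that case.

```python
def classification_metrics(tp, tn, fp, fn):
    """Eqs 6-9 computed from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1}


m = classification_metrics(tp=90, tn=80, fp=10, fn=20)  # illustrative counts
harmonic = 2 * m["precision"] * m["recall"] / (m["precision"] + m["recall"])
print({k: round(v, 4) for k, v in m.items()}, round(harmonic, 4))
# {'accuracy': 0.85, 'precision': 0.9, 'recall': 0.8182, 'f1': 0.8571} 0.8571
```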

Performance analysis

In the first experiment of this study, the models are trained on the BoT-IoT dataset using the train/test split function (train_test_split) from the scikit-learn library. Because the full dataset has around 72 million records, a subset was chosen to make handling easier; Tables 2 and 3 summarize it: 1,541,315 DDoS samples, 1,320,148 DoS samples, 72,919 reconnaissance samples, 370 benign samples, and 65 theft samples. KNN, RF, and LR are the ML classifiers used in this research.
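A comparable train/test workflow with the three classifiers can be sketched in scikit-learn. The data here are synthetic and only mimic an imbalanced binary task; the hyperparameters (k = 5 from the KNN description, 101 trees, `max_iter=1000`) and the feature scaling are assumptions, not settings reported in the text. Because scikit-learn's `RandomForestClassifier.predict` averages the trees' probability estimates (soft voting), the sketch also computes an explicit majority vote from `rf.estimators_` to mirror the "majority votes" description of RF.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for flow features (not BoT-IoT data).
X, y = make_classification(n_samples=3000, n_features=12, n_informative=6,
                           weights=[0.9, 0.1], random_state=42)
# stratify=y keeps the class ratio identical in the training and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

models = {
    # Distance- and gradient-based models benefit from feature scaling.
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    # Odd tree count so a two-class hard vote can never tie.
    "RF": RandomForestClassifier(n_estimators=101, random_state=42),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = (accuracy_score(y_test, pred), f1_score(y_test, pred))
    print(f"{name}: accuracy={results[name][0]:.3f}  F1={results[name][1]:.3f}")

# Explicit majority vote: each fitted tree predicts an index into rf.classes_.
rf = models["RF"]
votes = np.stack([tree.predict(X_test) for tree in rf.estimators_]).astype(int)
majority_idx = (votes.sum(axis=0) * 2 > len(rf.estimators_)).astype(int)  # binary case only
hard_vote = rf.classes_[majority_idx]
print("RF hard-vote accuracy:", accuracy_score(y_test, hard_vote))
```

On heavily imbalanced data, accuracy alone can look high even when the minority class is poorly detected, which is why the F1-score is reported alongside it.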

Table 2. Categories with their number of samples.

| Category | Samples |
|---|---|
| DDoS | 1,541,315 |
| DoS | 1,320,148 |
| Reconnaissance | 72,919 |
| Benign | 370 |
| Theft | 65 |

Table 3. Labels with their number of samples.

| Label | Samples |
|---|---|
| Malicious | 2,934,447 |
| Benign | 370 |
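The two tables are consistent: Table 3's malicious count is the sum of the non-benign rows of Table 2. The same counts quantify how skewed this subset is, as the short calculation below shows (values copied from Table 2).

```python
# Counts copied from Table 2.
table2 = {"DDoS": 1_541_315, "DoS": 1_320_148, "Reconnaissance": 72_919,
          "Benign": 370, "Theft": 65}

malicious = sum(n for cat, n in table2.items() if cat != "Benign")
total = sum(table2.values())
benign_share = table2["Benign"] / total
imbalance_ratio = malicious / table2["Benign"]

print(malicious, total)                     # 2934447 2934817
print(f"benign share: {benign_share:.4%}")  # benign share: 0.0126%
print(f"about {imbalance_ratio:,.0f} malicious samples per benign sample")  # about 7,931 ...
```

With roughly 7,900 malicious samples for every benign one, a classifier that labels everything as malicious would already exceed 99.98% accuracy on this subset, which is why balancing and per-class metrics matter here.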

Statistical flow measurements involve collecting and analyzing various metrics that describe the behavior of network traffic. Common metrics include:

  • Packet Count: number of packets transmitted over a period.

  • Byte Count: amount of data (in bytes) transmitted over a period.

  • Flow Duration: time from the first to the last packet of a flow.

  • Packet Inter-arrival Time: time intervals between consecutive packets.

  • Flow Rate: rate at which packets or bytes are transmitted.

These metrics help in understanding the patterns and anomalies in network traffic. For instance, a sudden spike in packet count or flow rate can indicate a potential attack.
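The five metrics above can be computed for a single flow from packet timestamps and sizes. This is a minimal sketch assuming each packet is available as a (timestamp, size) record; `Packet` and `flow_statistics` are hypothetical names, and the rates are reported as `None` for a zero-length flow, where they are undefined.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Packet:
    timestamp: float  # arrival time in seconds
    size: int         # bytes


def flow_statistics(packets: List[Packet]) -> Dict[str, Optional[float]]:
    """Compute the per-flow metrics listed above for one flow's packets."""
    if not packets:
        raise ValueError("a flow needs at least one packet")
    pkts = sorted(packets, key=lambda p: p.timestamp)
    duration = pkts[-1].timestamp - pkts[0].timestamp
    gaps = [b.timestamp - a.timestamp for a, b in zip(pkts, pkts[1:])]
    byte_count = sum(p.size for p in pkts)
    return {
        "packet_count": len(pkts),
        "byte_count": byte_count,
        "duration_s": duration,
        "mean_inter_arrival_s": sum(gaps) / len(gaps) if gaps else None,
        # Rates are undefined for a zero-length flow, so report None.
        "packets_per_s": len(pkts) / duration if duration > 0 else None,
        "bytes_per_s": byte_count / duration if duration > 0 else None,
    }


flow = [Packet(0.0, 100), Packet(0.5, 200), Packet(1.0, 100), Packet(2.0, 600)]
print(flow_statistics(flow))
```

Tracking these values per time window makes the sudden spike mentioned above visible as a jump in `packets_per_s` or `packet_count` relative to earlier windows.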

Criteria for determining the best sample size

Determining the best sample size for the dataset and evaluating the balance between sample size and model performance involves several critical steps and criteria:

  1. Evaluate key performance metrics such as accuracy, precision, recall, and F1-score. These metrics provide a comprehensive view of the model's ability to correctly identify different types of traffic, including benign and malicious traffic (reconnaissance, DoS, DDoS).

  2. Consider the training time and computational resources required. Larger sample sizes typically improve model performance but also increase the time and resources needed for training.

  3. Ensure that the sample size is large enough to achieve statistically significant results. Small sample sizes may lead to high variance in performance metrics and unreliable conclusions.

In the previous experiment, the model performed well using a large number of records with a highly unbalanced class distribution. This experiment uses a small subset of samples from the full dataset to determine the threshold sample limit at which the model performs best, because the BoT-IoT dataset contains an unusually high proportion of malicious samples relative to benign samples, which strains resources such as memory and processing time. As in the previous experiment, three attacks were considered: DDoS, DoS, and reconnaissance. The 74 BoT-IoT CSV files are first concatenated into a single file, from which the malicious traffic described above and the benign traffic are extracted. Feature-selection pre-processing then selects a limited subset for deployment: 10,000 malicious and 1,000 benign samples. This subset then undergoes further pre-processing. A sequence of ten iterations is carried out, with the number of training samples entered manually for every iteration; the iteration sample sizes were chosen by trial and error. The threshold (breakpoint) value is taken as the first occurrence of the greatest value across 5 repetitions.

This research has two parts. In the first, the number of benign samples is determined, and the malicious samples are then varied in a series over 5 iterations. The model is trained for DDoS, DoS, and reconnaissance threats using three classifiers: KNN, RF, and LR. The threshold comparison for the initial phase is described below. Fig 9 displays the KNN classifier's threshold evaluation for reconnaissance, DoS, and DDoS attacks. Following the trial-and-error procedure, 500 fixed benign samples are used for KNN. At 6000 samples, the model achieves 85% accuracy on reconnaissance, a point just before the threshold is reached. For reconnaissance, a threshold of 99% is reached after 8000 malicious samples, whereas for DDoS a threshold of 100% is reached at 1000 samples. Unlike DDoS, where a perfect score is attained with 1000 attack samples, the classifier accuracy for reconnaissance rises progressively toward the threshold value. DoS reaches 78% with 5000 samples. A comparison of the three attacks shows that DDoS needs the fewest attack samples to reach the threshold value.
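The breakpoint rule, taking the first occurrence of the greatest score across the iterations, can be written as a small function. The sample sizes and scores in the usage line are illustrative values, not the paper's measurements, and `find_breakpoint` is a hypothetical helper.

```python
def find_breakpoint(sample_sizes, scores):
    """Return (sample_size, score) at the first occurrence of the maximum score."""
    scores = list(scores)
    if not scores or len(scores) != len(sample_sizes):
        raise ValueError("need one score per sample size")
    best = max(scores)
    i = scores.index(best)  # list.index returns the first occurrence
    return sample_sizes[i], best


# Illustrative curve only (not the paper's measurements).
sizes = [2000, 4000, 6000, 8000, 10000]
acc = [0.70, 0.78, 0.85, 0.99, 0.99]
print(find_breakpoint(sizes, acc))  # (8000, 0.99)
```

Taking the first maximum favours the smallest sample size that achieves the best score. With noisy scores, exact ties are rare, so rounding the scores (for example to two decimals) before calling the function makes the rule behave as intended.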

Fig 9. RF model.

Fig 9 shows the Random Forest classifier's threshold analysis. Using the same fixed number of benign samples as KNN (500), the maximum classifier score was only about 50%, so a second round of trial and error was needed to choose a new fixed benign sample size. For RF, 58 benign samples were used as the fixed value, and, as with KNN, the attack samples were varied in a series over 5 iterations. With the improved benign sample size, the iterations reached the 97% threshold for reconnaissance using 6000 attack samples. DDoS reached a threshold of 100% at 1000 samples, and DoS reached 98% with 6000 samples, similar to reconnaissance. Immediately after the threshold iteration, reconnaissance shows a very slight decline; this was not the case for the DDoS and DoS attacks. As with KNN, DDoS and DoS require only a few attack samples to produce an accurate classifier.

Fig 10 shows the threshold analysis for Logistic Regression, which uses the same 500 fixed benign samples as KNN. According to the graph, reconnaissance reaches a threshold of 99% with 5000 attack samples, whereas DDoS does so within the first iteration. Unlike DDoS, reconnaissance shows a slight drop in classifier score after the threshold value. Comparing the two attack types under LR shows that DDoS needs fewer samples to reach the threshold.

Fig 10. LR model.

Fig 11 is a composite threshold analysis of the attacks across all three classifiers. To reach the criterion for reconnaissance, LR needed 5000 samples, while RF and KNN needed 6000 and 8000 samples, respectively. LR crossed the 100% threshold for DDoS and DoS attacks in the first iteration, while KNN and RF behaved similarly and met the criterion at 1000 attack samples. This experiment shows that DDoS requires fewer resources overall: for KNN, RF, and LR alike, reconnaissance needs more samples than DDoS to meet the criterion.

Fig 11. Comparative analysis of all attacks.

The same trend is illustrated in Figs 12 to 14, where the efficiency of all classification models improves over the iterations. KNN and LR both reach 100% on all metrics in iterations 4 and 5, whereas RF achieves its maximum Precision and F1-score of 92%, Accuracy of 91%, and Recall of 90% at iteration 5. These findings show that the performance metric values agree with the threshold value calculated from the classifier score.

Fig 12. RF_Reconnaissance attack.

Fig 13. LR_Reconnaissance attack.

Fig 14. KNN_Reconnaissance attack.

In Fig 15, the feature sport drops 6% below the threshold value, while the remaining features change very little relative to the threshold. The features sport and dport drop 24% below the value allocated as the threshold. For the other features, the packet count equals the threshold, which indicates a lower attack rate in those extracted features.

Fig 15. Dropping extracted features for all types of attacks.

Fig 16 shows that the RF model performs well on both the original imbalanced data and the class-balanced dataset. On the imbalanced BoT-IoT data, RF achieves an Accuracy of 99.6% and a ROC AUC of 99.2%; on the balanced data, it achieves 92.1% and 92.2%, respectively. These results indicate that the RF classifier is an effective algorithm for a botnet detection system.
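ROC AUC has a simple interpretation that a short sketch makes concrete: it is the probability that a randomly chosen malicious sample receives a higher score than a randomly chosen benign one, with ties counted as one half. The pairwise loop below is an illustration with made-up scores; it is quadratic in the number of samples, so for large test sets a library routine such as scikit-learn's `roc_auc_score` is the practical choice.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly (ties = 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up classifier scores (probability of "malicious").
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Unlike accuracy, this quantity does not depend on the class ratio, which helps when comparing results on imbalanced and balanced versions of the data.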

Fig 16. ROC curve.

The effectiveness of the proposed approach in detecting different types of DDoS attacks beyond those included in the BoT-IoT dataset hinges on its adaptability and generalization capabilities. The use of ML algorithms, such as KNN, Random Forest, and Logistic Regression, enables the system to learn patterns and characteristics of DDoS attacks from the training data and apply this knowledge to identify new and varied attack patterns. The system can be designed to continuously update its models with new data, including previously unseen attack types. This allows the ML algorithms to adapt to evolving threat landscapes. Beyond predefined attack types, the system can employ anomaly detection techniques to flag any traffic patterns that deviate significantly from the learned normal behavior, allowing for the detection of novel or sophisticated attacks.
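As one simple illustration of flagging traffic that "deviates significantly from the learned normal behavior", a baseline can be learned from benign measurements and new values compared against it by z-score. This z-score rule is a stand-in chosen for clarity, not the detection method evaluated in this study; the function names, the 3.0 threshold, and the packet rates are hypothetical.

```python
import statistics


def fit_baseline(benign_values):
    """Learn 'normal' behaviour as the mean and sample standard deviation."""
    return statistics.mean(benign_values), statistics.stdev(benign_values)


def is_anomalous(value, baseline, z_threshold=3.0):
    """Flag a value whose z-score against the baseline exceeds z_threshold."""
    mean, std = baseline
    if std == 0:
        return value != mean
    return abs(value - mean) / std > z_threshold


baseline = fit_baseline([100, 110, 95, 105, 90])  # hypothetical benign packet rates
print(is_anomalous(104, baseline), is_anomalous(5000, baseline))  # False True
```

Because the rule needs only benign data to fit, it can flag attack types that never appeared in the labeled training set, at the cost of more false positives when normal traffic itself shifts.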

Conclusion

This study proposed an SDN-based IoT approach for vehicular networks that identifies DDoS attacks using attributes from a recent benchmark dataset, BoT-IoT, which was generated by researchers from the University of New South Wales. By choosing an equal number of packets from each category, it addresses issues with the public dataset such as class imbalance and over-fitting. Using ML techniques, we classified benign, reconnaissance, DoS, and DDoS traffic with an Accuracy of 91% and represented the extracted features in the dataset. This work is distinctive in that it uses a recent benchmark dataset while reducing the number of features used to recognize malicious traffic by nearly half. In the future, we plan to improve the feature comparison and selection technique further by using additional well-known benchmark datasets. Another direction is to investigate the effectiveness of the ML model with various combinations of feature subsets.

Future scope

The future scope could explore enhancing detection accuracy and response speed in real-time scenarios as vehicular networks become increasingly complex and data-intensive. Advanced machine learning models, including deep learning techniques, could be further developed to detect even subtle and adaptive DDoS attack patterns. Additionally, integrating blockchain technology may offer secure and decentralized data sharing, strengthening the reliability of detection methods. Expanding the research to address attacks on autonomous vehicles, and exploring edge computing for distributed attack mitigation, could also pave the way for more resilient and responsive vehicular network security systems.

Data Availability

The data are publicly available at https://www.kaggle.com/datasets/vigneshvenkateswaran/bot-iot.

Funding Statement

The authors acknowledge the support of Prince Sultan University in paying the Article Processing Charges (APC) of this publication. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript, but they provided support with supervision.

References

  • 1. Singh PK, Jha SK, Nandi SK, Nandi S. ML-based approach to detect DDoS attack in V2I communication under SDN architecture. In: TENCON 2018-2018 IEEE Region 10 Conference. IEEE; 2018. p. 0144–0149.
  • 2. Mekki T, Jabri I, Rachedi A, Chaari L. Software-defined networking in vehicular networks: A survey. Transactions on Emerging Telecommunications Technologies. 2022;33(10):e4265. doi: 10.1002/ett.4265
  • 3. Khan MZ, Sarkar A, Ghandorh H, Driss M, Boulila W. Information fusion in autonomous vehicle using artificial neural group key synchronization. Sensors. 2022;22(4):1652. doi: 10.3390/s22041652
  • 4. Meyer P, Hackel T, Langer F, Stahlbock L, Decker J, Eckhardt SA, et al. A security infrastructure for vehicular information using sdn, intrusion detection, and a defense center in the cloud. In: 2020 IEEE Vehicular Networking Conference (VNC). IEEE; 2020. p. 1–2.
  • 5. Babbar H, Rani S, Sah DK, AlQahtani SA, Kashif Bashir A. Detection of android malware in the Internet of Things through the K-nearest neighbor algorithm. Sensors. 2023;23(16):7256. doi: 10.3390/s23167256
  • 6. Badotra S, Panda SN. Evaluation and comparison of OpenDayLight and open networking operating system in software-defined networking. Cluster Computing. 2020;23(2):1281–1291. doi: 10.1007/s10586-019-02996-0
  • 7. Badotra S, Panda SN. Software-defined networking: A novel approach to networks. In: Handbook of Computer Networks and Cyber Security. Springer; 2020. p. 313–339.
  • 8. Singh V, Rajarajeswari S, Kanavalli A, Sanjeetha R. Mitigation of DDoS Attack in SDN using Table Miss-entry. In: 2022 4th International Conference on Circuits, Control, Communication and Computing (I4C). IEEE; 2022. p. 6–11.
  • 9. Amari H, Louati W, Khoukhi L, Belguith LH. Securing software-defined vehicular network architecture against ddos attack. In: 2021 IEEE 46th Conference on Local Computer Networks (LCN). IEEE; 2021. p. 653–656.
  • 10. Verma A, Saha R. Analysis of BayesNet Classifier for DDoS Detection in Vehicular Networks. In: 2022 International Conference on Augmented Intelligence and Sustainable Systems (ICAISS). IEEE; 2022. p. 980–987.
  • 11. Ghaleb FA, Saeed F, Al-Sarem M, Ali Saleh Al-rimy B, Boulila W, Eljialy A, et al. Misbehavior-aware on-demand collaborative intrusion detection system using distributed ensemble learning for VANET. Electronics. 2020;9(9):1411. doi: 10.3390/electronics9091411
  • 12. Shafique A, Ahmed J, Boulila W, Ghandorh H, Ahmad J, Rehman MU. Detecting the security level of various cryptosystems using machine learning models. IEEE Access. 2020;9:9383–9393. doi: 10.1109/ACCESS.2020.3046528
  • 13. Ghaleb FA, Maarof MA, Zainal A, Al-rimy BAS, Alsaeedi A, Boulila W. Ensemble-based hybrid context-aware misbehavior detection model for vehicular ad hoc network. Remote Sensing. 2019;11(23):2852. doi: 10.3390/rs11232852
  • 14. Xia W, Wen Y, Foh CH, Niyato D, Xie H. A survey on software-defined networking. IEEE Communications Surveys & Tutorials. 2014;17(1):27–51. doi: 10.1109/COMST.2014.2330903
  • 15. Driss M, Almomani I, e Huma Z, Ahmad J. A federated learning framework for cyberattack detection in vehicular sensor networks. Complex & Intelligent Systems. 2022;8(5):4221–4235. doi: 10.1007/s40747-022-00705-w
  • 16. Peterson JM, Leevy JL, Khoshgoftaar TM. A review and analysis of the bot-iot dataset. In: 2021 IEEE International Conference on Service-Oriented System Engineering (SOSE). IEEE; 2021. p. 20–27.
  • 17. Xu Y, Liu Y. DDoS attack detection under SDN context. In: IEEE INFOCOM 2016-the 35th annual IEEE international conference on computer communications. IEEE; 2016. p. 1–9.
  • 18. Shafiq M, Tian Z, Bashir AK, Du X, Guizani M. CorrAUC: a malicious bot-IoT traffic detection method in IoT network using machine-learning techniques. IEEE Internet of Things Journal. 2020;8(5):3242–3254. doi: 10.1109/JIOT.2020.3002255
  • 19. Babbar H, Rani S. Frhids: Federated learning recommender hydrid intrusion detection system model in software defined networking for consumer devices. IEEE Transactions on Consumer Electronics. 2023.
  • 20. Ben Atitallah S, Driss M, Boulila W, Almomani I. An effective detection and classification approach for dos attacks in wireless sensor networks using deep transfer learning models and majority voting. In: International Conference on Computational Collective Intelligence. Springer; 2022. p. 180–192.
  • 21. Yu Y, Guo L, Liu Y, Zheng J, Zong Y. An efficient SDN-based DDoS attack detection and rapid response platform in vehicular networks. IEEE Access. 2018;6:44570–44579. doi: 10.1109/ACCESS.2018.2854567
  • 22. Muthanna MSA, Alkanhel R, Muthanna A, Rafiq A, Abdullah WAM. Towards SDN-Enabled, Intelligent Intrusion Detection System for Internet of Things (IoT). IEEE Access. 2022;10:22756–22768. doi: 10.1109/ACCESS.2022.3153716
  • 23. Wani A, Revathi S. DDoS detection and alleviation in IoT using SDN (SDIoT-DDoS-DA). Journal of The Institution of Engineers (India): Series B. 2020;101(2):117–128.
  • 24. Salau AO, Beyene MM. Software defined networking based network traffic classification using machine learning techniques. Scientific Reports. 2024;14(1):20060. doi: 10.1038/s41598-024-70983-6
  • 25. Serag RH, Abdalzaher MS, Elsayed HAEA, Sobh M, Krichen M, Salim MM. Machine-Learning-Based Traffic Classification in Software-Defined Networks. Electronics. 2024;13(6):1108. doi: 10.3390/electronics13061108
  • 26. Sahoo KS, Tripathy BK, Naik K, Ramasubbareddy S, Balusamy B, Khari M, et al. An evolutionary SVM model for DDOS attack detection in software defined networks. IEEE Access. 2020;8:132502–132513. doi: 10.1109/ACCESS.2020.3009733 [DOI] [Google Scholar]
  • 27.Sultan A. Intrusion Detection System Using Machine Learning Algorithms in SDN Based Vehicular Networks; 2022.
  • 28. Ning Z, Liu L, Xia F, Jedari B, Lee I, Zhang W. CAIS: A copy adjustable incentive scheme in community-based socially aware networking. IEEE Transactions on Vehicular Technology. 2016;66(4):3406–3419. doi: 10.1109/TVT.2016.2593051 [DOI] [Google Scholar]
  • 29. Wang S, Zhou A, Yang M, Sun L, Hsu CH, Yang F. Service composition in cyber-physical-social systems. IEEE Transactions on Emerging Topics in Computing. 2017;8(1):82–91. doi: 10.1109/TETC.2017.2675479 [DOI] [Google Scholar]
  • 30. Ning Z, Hu X, Chen Z, Zhou M, Hu B, Cheng J, et al. A cooperative quality-aware service access system for social Internet of vehicles. IEEE Internet of Things Journal. 2017;5(4):2506–2517. doi: 10.1109/JIOT.2017.2764259 [DOI] [Google Scholar]
  • 31. Sedjelmaci H, Senouci SM, Abu-Rgheff MA. An efficient and lightweight intrusion detection mechanism for service-oriented vehicular networks. IEEE Internet of things journal. 2014;1(6):570–577. doi: 10.1109/JIOT.2014.2366120 [DOI] [Google Scholar]
  • 32. Eliyan LF, Di Pietro R. DoS and DDoS attacks in Software Defined Networks: A survey of existing solutions and research challenges. Future Generation Computer Systems. 2021;122:149–171. doi: 10.1016/j.future.2021.03.011 [DOI] [Google Scholar]
  • 33. Fan C, Kaliyamurthy NM, Chen S, Jiang H, Zhou Y, Campbell C. Detection of DDoS attacks in software defined networking using entropy. Applied Sciences. 2021;12(1):370. doi: 10.3390/app12010370 [DOI] [Google Scholar]
  • 34. Cui Y, Qian Q, Guo C, Shen G, Tian Y, Xing H, et al. Towards DDoS detection mechanisms in software-defined networking. Journal of Network and Computer Applications. 2021;190:103156. doi: 10.1016/j.jnca.2021.103156 [DOI] [Google Scholar]
  • 35.Sharafaldin I, Lashkari AH, Hakak S, Ghorbani AA. Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy. In: 2019 International Carnahan Conference on Security Technology (ICCST). IEEE; 2019. p. 1–8.
  • 36. Shadman Roodposhti M, Aryal J, Shahabi H, Safarrad T. Fuzzy shannon entropy: A hybrid gis-based landslide susceptibility mapping method. Entropy. 2016;18(10):343. doi: 10.3390/e18100343 [DOI] [Google Scholar]
  • 37. Wang Y, Wang X, Ariffin MM, Abolfathi M, Alqhatani A, Almutairi L. Attack detection analysis in software-defined networks using various machine learning method. Computers and Electrical Engineering. 2023;108:108655. doi: 10.1016/j.compeleceng.2023.108655 [DOI] [Google Scholar]
  • 38. Liu Z, Wang Y, Feng F, Liu Y, Li Z, Shan Y. A DDoS detection method based on feature engineering and machine learning in software-defined networks. Sensors. 2023;23(13):6176. doi: 10.3390/s23136176 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.de Biasi G, Vieira LF, Loureiro AA. Sentinel: defense mechanism against DDoS flooding attack in software defined vehicular network. In: 2018 IEEE International Conference on Communications (ICC). IEEE; 2018. p. 1–6.
  • 40. Gadallah WG, Ibrahim HM, Omar NM. A deep learning technique to detect distributed denial of service attacks in software-defined networks. Computers & Security. 2024;137:103588. doi: 10.1016/j.cose.2023.103588 [DOI] [Google Scholar]
  • 41.Wang J, Li L, Zeller A. Restoring execution environments of Jupyter notebooks. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE; 2021. p. 1622–1633.
  • 42.Leevy JL, Hancock J, Khoshgoftaar TM, Peterson JM. An easy-to-classify approach for the bot-iot dataset. In: 2021 IEEE Third International Conference on Cognitive Machine Intelligence (CogMI). IEEE; 2021. p. 172–179.
  • 43.Kuang C, Hou D, Zhang Q, Zhao K, Li W. A Network Traffic Collection System for Space Information Networks Emulation Platform. In: International Conference on Wireless and Satellite Systems. Springer; 2021. p. 217–225.
  • 44.Ryšavỳ O, Matoušek P. A Network Traffic Processing Library for ICS Anomaly Detection. In: 7th Conference on the Engineering of Computer Based Systems; 2021. p. 1–7.
  • 45. Anyanwu GO, Nwakanma CI, Lee JM, Kim DS. RBF-SVM kernel-based model for detecting DDoS attacks in SDN integrated vehicular network. Ad Hoc Networks. 2023;140:103026. doi: 10.1016/j.adhoc.2022.103026 [DOI] [Google Scholar]

Decision Letter 0

Faouzi Jaidi

10 Jun 2024

PONE-D-24-16635: Effective DDoS Attack Detection in Software-Defined Vehicular Networks Using Statistical Flow Analysis and Machine Learning

PLOS ONE

Dear Dr. Rani,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please note that the outcome of the peer-review process for your manuscript is critical, as the reviewers have raised significant concerns that should be carefully addressed. In particular, the related work section must be updated to discuss more recent and relevant work in the field, and a comparison of the obtained results with state-of-the-art results would help readers understand the relevance and novelty of your work.

Please submit your revised manuscript by Jul 25 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Faouzi Jaidi

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Thank you for stating the following financial disclosure: 

   "The authors would like to acknowledge the support of Prince Sultan University for paying the Article Processing Charges (APC) of this publication." 

Please state what role the funders took in the study.  If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." 

If this statement is not correct you must amend it as needed. 

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

4. We note that your Data Availability Statement is currently as follows: All relevant data are within the manuscript and its Supporting Information files.

Please confirm at this time whether or not your submission contains all raw data required to replicate the results of your study. Authors must share the “minimal data set” for their submission. PLOS defines the minimal data set to consist of the data required to replicate all study findings reported in the article, as well as related metadata and methods (https://journals.plos.org/plosone/s/data-availability#loc-minimal-data-set-definition).

For example, authors should submit the following data:

- The values behind the means, standard deviations and other measures reported;

- The values used to build graphs;

- The points extracted from images for analysis.

Authors do not need to submit their entire data set if only a portion of the data was used in the reported study.

If your submission does not contain these data, please either upload them as Supporting Information files or deposit them to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of recommended repositories, please see https://journals.plos.org/plosone/s/recommended-repositories.

If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent. If data are owned by a third party, please indicate how others may request data access.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this paper, the authors propose SDN intrusion detection for Internet of Things (IoT) environments that applies machine learning to statistical information. However, there are many issues, listed below:

What are the potential limitations of using a small fraction of collected features for model evaluation, and how might this impact the generalizability of the results?

How does the proposed approach ensure that the differentiation between reconnaissance, DoS, and DDoS traffic is accurate and not influenced by the issues of unbalanced datasets?

Why were K-nearest Neighbor, Random Forest, and Logistic Regression specifically chosen as the classifiers for this study, and what are the advantages and disadvantages of each in the context of SDN Intrusion Detection for IoT environments?

How does the proposed approach measure statistical flow and compute entropy, and how are these measurements integrated into the Machine Learning algorithms for intrusion detection?

What criteria were used to determine the best sample size for the dataset, and how was the balance between sample size and model performance evaluated?

What are the implications of the Random Forest classifier achieving the highest performance metrics, and how might this influence the choice of ML algorithms for future studies in similar contexts?

How does the proposed approach handle the dynamic and evolving nature of IoT environments, particularly in terms of updating the intrusion detection models and dealing with new types of attacks?

Reviewer #2: Please follow my comments to improve the manuscript.

1. Please discuss how you differentiate DDoS attack packet flows from legitimate packet flows.

2. Add the major limitations of the model to the conclusion.

3. Include citations of recent papers from 2022–2024.

Reviewer #3: • The abstract mentions measuring statistical flow and computing entropy but does not provide details on the specific methods or metrics used. Can you elaborate on the techniques used for these measurements?

• The abstract references the BoT-IoT dataset but does not provide details on its characteristics, such as the size, diversity, or types of attacks included. Can you provide more information about the dataset used for evaluation?

• The abstract mentions addressing unbalanced and overfitting dataset traces but does not specify the methods used to handle these issues. Can you provide more details on the techniques employed to balance the dataset and prevent overfitting?

• The abstract mentions a comparative study to identify the best sample size but does not provide details on the criteria or methodology used for this comparison. Can you elaborate on how the comparative study was conducted?

• The abstract states that the work aims to ascertain the impact of dataset attributes on threshold performance but does not provide specific findings or examples. Can you provide more details on which attributes were found to be significant and their impact?

• The abstract does not address whether the proposed approach can be applied in real-time for detecting DDoS attacks. Can you discuss the feasibility and performance of the approach in real-time scenarios?

• The abstract does not mention how the proposed approach can be integrated with existing vehicular network systems and infrastructures. Can you discuss the integration process and potential challenges?

• The abstract focuses on the BoT-IoT dataset but does not discuss the generalizability of the approach to other datasets or real-world scenarios. Can you provide insights on the applicability of the approach to different datasets?

• The abstract does not provide information on the computational complexity or resource requirements of the proposed approach. Can you provide details on the computational resources needed and the efficiency of the approach?

• How does the proposed approach handle real-time traffic management in vehicular networks while detecting DDoS attacks? Are there specific mechanisms to ensure timely detection and response?

• How effective is the proposed approach in detecting different types of DDoS attacks beyond those included in the BoT-IoT dataset? Can you provide examples of how the approach adapts to various attack patterns?

• How does the proposed approach scale with increasing network traffic and the number of connected vehicles? Are there specific optimizations to handle high traffic volumes?

• How does the approach ensure effective communication and collaboration between vehicles and infrastructural facilities during a DDoS attack? Are there protocols in place to manage this?

• What are the rates of false positives and false negatives observed in the experimental evaluation, and how does the approach minimize these rates?

• How does the approach perform under varying environmental conditions, such as different traffic densities, weather conditions, and geographical locations?

• How does the proposed approach integrate with Vehicle-to-Everything (V2X) communication technologies, and what are the potential challenges and solutions?

• How does the approach address ethical and privacy concerns related to monitoring and analyzing vehicular network traffic? Are there measures in place to ensure data privacy?

• How does the approach handle deployment in heterogeneous networks with different vehicle types, communication protocols, and network architectures?

• How does the approach ensure continuous learning and adaptation to new attack patterns and evolving threats in vehicular networks?

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Dr. Anupama Mishra

Reviewer #2: Yes: Muhammad Reazul Haque

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Dec 18;19(12):e0314695. doi: 10.1371/journal.pone.0314695.r002

Author response to Decision Letter 0


10 Jul 2024

Point-wise Detailed Response to Editor and Reviewers’ comments

Title: Effective DDoS Attack Detection in Software-Defined Vehicular Networks Using Statistical Flow Analysis and Machine Learning

Authors: Himanshi Babbar, Shalli Rani, Maha Driss

Dear Editors and Reviewers:

We thank you for taking the time to review our manuscript and for your constructive comments. These comments are valuable and very helpful for revising and improving our paper. We have studied them carefully and made corrections, which are marked in the revised manuscript. We have tried our best to address every comment, in the hope that the revisions meet your requirements. The following changes have been made to the manuscript in response to the comments received.

*Changes made to the manuscript in response to the comments are highlighted in blue.

Reviewer #1:

1. What are the potential limitations of using a small fraction of collected features for model evaluation, and how might this impact the generalizability of the results?

Response: Using a small fraction of collected features for model evaluation in detecting DDoS attacks in Software-Defined Vehicular Networks (SDVN) presents several potential limitations that can impact the generalizability of the results:

a. Loss of Information: By selecting only a subset of features, important information that could be crucial for distinguishing between different types of traffic (e.g., benign, reconnaissance, DoS, DDoS) might be omitted. This can lead to a decrease in the model's ability to accurately identify all types of malicious activities.

b. Bias in Feature Selection: The chosen subset of features might not represent the entire feature space adequately, leading to a biased model. This bias can result in a model that performs well on the training and validation datasets but poorly on unseen data, thereby reducing its generalizability.

c. Underfitting Risk: When a limited number of features are used, the model might not capture the complexity of the data, leading to underfitting. Underfitting occurs when the model is too simple to capture the underlying patterns in the data, resulting in poor performance across different datasets.

d. Over-reliance on Selected Features: The model may become overly reliant on the selected features, which could be specific to the dataset used for training and evaluation. This over-reliance can cause the model to perform inadequately when exposed to different datasets with varying distributions of features.

e. Reduced Robustness: A model trained on a limited set of features may lack robustness to variations and anomalies in real-world traffic. This can make the model less effective in adapting to new types of attacks or changes in traffic patterns that were not represented in the training data.

2. By addressing these considerations, the model can be made more resilient and effective in real-world applications, thereby enhancing its capability to generalize beyond the training data.

Response: By addressing these considerations, the model can be made more resilient and effective in real-world applications, thereby enhancing its capability to generalize beyond the training data:

a. Comprehensive Feature Selection: Ensure thorough feature selection by evaluating the importance of each feature using methods like recursive feature elimination (RFE) or feature importance from tree-based algorithms. This helps in retaining the most relevant features and reducing the risk of excluding critical information.

b. Cross-Validation Techniques: Implement robust cross-validation methods, such as k-fold cross-validation, to assess model performance across multiple subsets of the data. This approach ensures that the model's performance is consistent and not dependent on a specific subset, enhancing its generalizability.

c. Dimensionality Reduction: Apply dimensionality reduction techniques such as Principal Component Analysis (PCA) to reduce the feature space while retaining essential information. This helps in capturing complex relationships within the data, improving the model's ability to detect sophisticated attack patterns.

d. Diverse Dataset Evaluation: Validate the model using multiple datasets with varying characteristics and distributions. This ensures that the model does not overfit to a single dataset and performs well in different real-world scenarios, improving its robustness and adaptability.

e. Data Augmentation: Use data augmentation techniques to create synthetic data that represents potential real-world scenarios. This helps in training the model to recognize a wider variety of attack patterns and normal behaviors, enhancing its generalization capability.

f. Regularization Methods: Incorporate regularization techniques such as L1 (Lasso) or L2 (Ridge) regularization to prevent overfitting. Regularization helps in keeping the model simpler and more generalizable by adding a penalty for larger coefficients.

By integrating these strategies, the proposed Machine Learning (ML) model for detecting DDoS attacks in Software-Defined Vehicular Networks (SDVN) can achieve greater resilience, adaptability, and effectiveness. This enhances the model's capability to generalize beyond the training data, making it more reliable and practical for real-world applications in vehicular networks.
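As a minimal sketch of the k-fold cross-validation strategy in point (b), the Python code below builds stratified folds, so that every test fold keeps roughly the class ratio of the whole dataset; this matters when benign traffic outnumbers attack traffic. The function name `stratified_k_fold`, the labels, and the fold count are illustrative assumptions, not the authors' code (libraries such as scikit-learn provide `StratifiedKFold` for the same purpose). In a full evaluation, a model would be trained on `train` and scored on `test` in each round and the k scores averaged; that step is omitted here.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for k folds that preserve class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's shuffled indices round-robin, so every fold gets a share.
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Hypothetical imbalanced labels: 10 benign flows, 5 DDoS flows.
labels = ["benign"] * 10 + ["ddos"] * 5
for train, test in stratified_k_fold(labels, k=5):
    # Columns: training size, test size, number of DDoS flows in the test fold.
    print(len(train), len(test), sum(labels[i] == "ddos" for i in test))
```

With these labels every line reads `12 3 1`: each test fold holds two benign flows and one DDoS flow, so no fold is evaluated without attack examples.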

3. How does the proposed approach ensure that the differentiation between reconnaissance, DoS, and DDoS traffic is accurate and not influenced by the issues of unbalanced datasets?

Response: The proposed approach ensures accurate differentiation between reconnaissance, Denial of Service (DoS), and Distributed Denial of Service (DDoS) traffic by addressing the issues of unbalanced datasets and overfitting through several key strategies:

a. By measuring statistical flow characteristics and computing entropy, the approach captures distinct patterns and anomalies in network traffic. This helps in distinguishing between different types of malicious activities based on their unique statistical signatures.

b. The deployment of machine learning (ML) algorithms specifically designed for intrusion detection in Software-Defined Networking (SDN) environments enhances the model's ability to learn and recognize complex attack patterns. These algorithms are trained to differentiate between reconnaissance, DoS, and DDoS attacks by learning from labeled examples.

c. The approach explicitly tackles the issue of unbalanced datasets, which is common in cybersecurity datasets where benign traffic vastly outnumbers malicious traffic. Techniques such as oversampling the minority class, undersampling the majority class, or using synthetic data generation (e.g., SMOTE) can be employed to balance the dataset, ensuring that the ML model is exposed to sufficient examples of all types of traffic.

d. By using techniques such as cross-validation and regularization, the approach mitigates the risk of overfitting, where the model would perform well on training data but poorly on unseen data. This ensures that the model generalizes well and accurately classifies new, unseen traffic.

e. By using a small fraction of collected features and conducting a comparative study to identify the optimal sample size, the approach ensures that the most informative features are selected. This reduces noise and irrelevant information, improving the model's performance and accuracy in distinguishing different types of attacks.

f. The comparative study to ascertain the impact of different attributes on threshold performance helps in fine-tuning the model. This involves experimenting with various features and their combinations to find the optimal set that maximizes accuracy and detection capabilities.

g. The experimental evaluation using the BoT-IoT dataset, which contains diverse and realistic examples of IoT-related attacks, provides a robust testing ground for the model. The results showing high Precision, F1-score, Accuracy, and Recall for the Random Forest classifier indicate the effectiveness of the approach in accurately differentiating between different types of malicious traffic.
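The entropy measurement in point (a) and the rebalancing in point (c) can be sketched in plain Python as follows. This is an illustration under stated assumptions: entropy is computed over a window of destination addresses (a common choice in SDN DDoS detection, not necessarily the attribute or window size used in the paper), and the balancing step is simple random oversampling, a simpler stand-in for SMOTE rather than SMOTE itself. The function names, addresses, and labels are hypothetical.

```python
import math
import random
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    # H = sum_i p_i * log2(1 / p_i), with p_i = count_i / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes match the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, label in zip(samples, labels):
        by_class.setdefault(label, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for label, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([label] * target)
    return out_x, out_y


# Hypothetical windows of destination IPs observed by the controller.
normal_window = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
flood_window = ["10.0.0.9"] * 4  # every flow targets one victim
print(shannon_entropy(normal_window))  # 2.0
print(shannon_entropy(flood_window))   # 0.0

# Hypothetical imbalanced training set: 4 benign records, 2 attack records.
xs, ys = random_oversample(list(range(6)), ["benign"] * 4 + ["ddos"] * 2)
print(ys.count("benign"), ys.count("ddos"))  # 4 4
```

A detector along these lines would flag a window whose destination entropy falls below a threshold calibrated on benign traffic; no threshold value is given in this response, so it would have to be tuned. Oversampling should be applied only to training folds, never to test data, to avoid leaking duplicated samples into the evaluation.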

4. Why were K-nearest Neighbor, Random Forest, and Logistic Regression specifically chosen as the classifiers for this study, and what are the advantages and disadvantages of each in the context of SDN Intrusion Detection for IoT environments?

Response: The choice of K-nearest Neighbor (KNN), Random Forest, and Logistic Regression classifiers for this study in the context of SDN Intrusion Detection for IoT environments is based on their distinct characteristics and strengths that make them suitable for different aspects of intrusion detection. Here are the reasons for selecting these classifiers, along with their respective advantages and disadvantages:

K-nearest Neighbor (KNN)

Reasons for Selection:

- KNN is easy to understand and implement. It is a straightforward algorithm that classifies a data point based on the majority class of its K nearest neighbors. It makes no explicit assumptions about the distribution of data, which is beneficial when dealing with complex and non-linear relationships in IoT traffic.

Advantages:

- Performs well on small datasets where the computational cost is manageable.

- Can handle multi-class classification without any modification.

- KNN is a lazy learner, meaning it requires no explicit training phase, making it fast to deploy.

Disadvantages:

- High computational cost during prediction as it involves calculating the distance between the query point and all points in the dataset.

- Requires storing the entire dataset, which can be impractical for large datasets.

- Performance can be degraded by irrelevant or redundant features, necessitating careful feature selection and normalization.
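The lazy-learning trade-off described above can be made concrete with a minimal pure-Python sketch of KNN classification. This is an illustration, not the study's implementation. The function name `knn_predict`, the two toy features (packets per second and destination-IP entropy), and the labels are hypothetical. Every call scans the full stored training set, which is the prediction-time cost listed under Disadvantages.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    There is no training step: every call computes the distance to all
    stored points, which is the prediction-time cost noted above.
    """
    # Euclidean distance from the query to every stored training point
    scored = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    scored.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in scored[:k]]
    # Majority vote among the k nearest labels
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: (packets per second, destination-IP entropy)
X_train = [(10, 2.0), (12, 1.9), (8, 2.1), (500, 0.2), (480, 0.3), (520, 0.1)]
y_train = ["benign", "benign", "benign", "ddos", "ddos", "ddos"]

print(knn_predict(X_train, y_train, (490, 0.25)))
print(knn_predict(X_train, y_train, (11, 2.0)))
```

In this example the raw packet rate is much larger in scale than the entropy values, so one feature dominates the distance. In practice the features would be normalized first, as noted above.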

Random Forest

Reasons for Selection:

- Random Forest is an ensemble method that combines multiple decision trees to improve accuracy and control overfitting. It provides insights into feature importance, which is valuable for understanding which features are most indicative of an attack.

Advantages:

- Generally provides high accuracy and robustness to overfitting due to averaging multiple trees.

- Can handle both classification and regression tasks and works well with high-dimensional data.

- The ensemble nature reduces the risk of overfitting compared to individual decision trees.

Disadvantages:

- More complex and computationally intensive than single decision trees, particularly during the training phase.

- While individual trees are interpretable, the overall Random Forest model is less transparent.

- Can be slower to train, especially with large datasets and a high number of trees.
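The ensemble idea behind Random Forest can be sketched in a few lines of pure Python. This is a deliberately simplified, hypothetical stand-in. Each "tree" is a one-split decision stump trained on a bootstrap sample using one randomly chosen feature, and the predictions are combined by majority vote. A real Random Forest grows full trees and samples a subset of features at every split, so the sketch only illustrates bagging and voting. The function names and the toy data are assumptions.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """Return (feature, threshold, left_label, right_label) with fewest training errors."""
    best = None
    for threshold in sorted({row[feature] for row in X}):
        left = [label for row, label in zip(X, y) if row[feature] <= threshold]
        right = [label for row, label in zip(X, y) if row[feature] > threshold]
        left_vote = Counter(left).most_common(1)[0][0]
        # An empty right branch simply inherits the left branch's label
        right_vote = Counter(right).most_common(1)[0][0] if right else left_vote
        errors = sum(lab != left_vote for lab in left) + sum(lab != right_vote for lab in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_vote, right_vote)
    return best[1:]


def fit_forest(X, y, n_trees=15, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n_features = len(X[0])
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw len(X) row indices with replacement
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        feature = rng.randrange(n_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps in the ensemble."""
    votes = [lv if row[f] <= t else rv for f, t, lv, rv in forest]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy data: (packets per second, normalised destination-IP entropy)
X_train = [(10, 0.9), (12, 0.85), (8, 0.95), (15, 0.8),
           (500, 0.2), (480, 0.25), (520, 0.15), (450, 0.3)]
y_train = ["benign"] * 4 + ["ddos"] * 4

forest = fit_forest(X_train, y_train)
print(predict_forest(forest, (505, 0.1)), predict_forest(forest, (5, 0.99)))
```

Voting across many models fitted to different bootstrap samples is what makes the ensemble robust. The cost is that training and prediction time grow with the number of trees, which matches the disadvantages listed above.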

Logistic Regression

Reasons for Selection:

- Logistic Regression is a simple yet effective linear model that provides clear probabilistic interpretations of the classification results.

- Often used as a baseline classifier due to its simplicity and ease of implementation.

Advantages:

- Computationally efficient and fast to train, making it suitable for real-time applications.

- Provides coefficients that are easily interpretable, indicating the strength and direction of the relationship between features and the outcome.

- Effective for binary classification tasks, which are common in intrusion detection.

Disadvantages:

- Assumes a linear relationship between the features and the log odds of the outcome, which may not capture complex, non-linear patterns in IoT traffic.

- May perform poorly if the underlying data relationships are non-linear and complex.

- Can be sensitive to outliers, which can skew the model's performance.
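A minimal logistic-regression sketch, trained by batch gradient descent on the mean log-loss, shows the linear log-odds assumption directly. The learning rate, epoch count, toy features (scaled packet rate and scaled destination-IP entropy), and 0/1 labels are illustrative assumptions, not settings from the study. The learned coefficients carry the interpretation described above: the sign of each coefficient gives the direction of that feature's effect on the log-odds of an attack.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; labels are 0 or 1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            # Linear model of the log-odds, mapped to a probability
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            error = p - target
            for j in range(d):
                grad_w[j] += error * x[j]
            grad_b += error
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that x is attack traffic (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy data: (scaled packet rate, scaled destination-IP entropy)
X_train = [(0.02, 0.9), (0.03, 0.85), (0.01, 0.95),
           (0.95, 0.2), (0.9, 0.25), (1.0, 0.15)]
y_train = [0, 0, 0, 1, 1, 1]

w, b = train_logreg(X_train, y_train)
print("coefficients:", w, "intercept:", b)
```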

5. How does the proposed approach measure statistical flow and compute entropy, and how are these measurements integrated into the Machine Learning algorithms for intrusion detection?

Response: Statistical flow measurements involve collecting and analyzing various metrics that describe the behavior of network traffic. Common metrics include:

Packet Count: Number of packets transmitted over a period.

Byte Count: Amount of data (in bytes) transmitted over a period.

Flow Duration: The time duration of a flow from the first to the last packet.

Packet Inter-arrival Time: Time intervals between consecutive packets.

Flow Rate: The rate at which packets/bytes are transmitted.

These metrics help in understanding the patterns and anomalies in network traffic. For instance, a sudden spike in packet count or flow rate can indicate a potential attack.
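The metrics above can be computed from a list of packet records for a single flow. The sketch below assumes each packet is given as a hypothetical `(timestamp_seconds, size_bytes)` tuple. In an SDN deployment, packet counts, byte counts, and flow duration could instead be read from the switches' OpenFlow flow statistics.

```python
from statistics import mean


def flow_statistics(packets):
    """Compute per-flow metrics from (timestamp_seconds, size_bytes) tuples.

    Assumes a non-empty flow.
    """
    packets = sorted(packets)  # order by timestamp
    timestamps = [t for t, _ in packets]
    packet_count = len(packets)
    byte_count = sum(size for _, size in packets)
    duration = timestamps[-1] - timestamps[0]
    inter_arrival = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return {
        "packet_count": packet_count,
        "byte_count": byte_count,
        "flow_duration": duration,
        "mean_inter_arrival": mean(inter_arrival) if inter_arrival else 0.0,
        # Rates are undefined for zero-duration flows; report 0.0 instead
        "packet_rate": packet_count / duration if duration > 0 else 0.0,
        "byte_rate": byte_count / duration if duration > 0 else 0.0,
    }


# Hypothetical flow of four packets
stats = flow_statistics([(0.0, 60), (0.5, 60), (1.0, 1500), (2.0, 1500)])
print(stats)
```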

Computing Entropy

Entropy is a measure of randomness or unpredictability in a dataset. In the context of network traffic, entropy can help in identifying irregular patterns that deviate from normal behavior. The steps to compute entropy are as follows:

Categorize Traffic: Divide the network traffic into different categories based on features like source/destination IP addresses, port numbers, and protocols.

Probability Distribution: Calculate the probability distribution of these categories. For example, the probability P_i of each unique source IP address in the traffic.

Entropy Calculation: Use the probability distribution to compute the entropy using the formula:

$$H = -\sum_{i} P_i \log_2 P_i$$

Here, H is the entropy, and P_i is the probability of the i-th category.

High entropy indicates a high level of randomness (potentially normal traffic), while low entropy can indicate more predictable and potentially malicious traffic patterns.
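The following is a direct translation of the entropy formula, applied here to destination IP addresses in one observation window. Using destination IPs and these particular window contents are illustrative assumptions. When many packets converge on a single victim, as in a flooding attack, the destination-IP entropy falls toward zero.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (bits) of the categories observed in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    # The sum of p * log2(1/p) equals -sum of p * log2(p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical destination IPs seen in two observation windows
normal_window = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
flood_window = ["10.0.0.9"] * 4  # every packet aimed at one victim

print(shannon_entropy(normal_window), shannon_entropy(flood_window))
```

The normal window yields 2 bits because there are four equally likely destinations. The flood window yields 0 bits because only one destination appears.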

Integration into Machine Learning Algorithms

The statistical flow measurements and entropy values are integrated into ML algorithms as features. Here’s how this integration typically works:

Feature Extraction: Extract statistical flow metrics and entropy values for each traffic flow. These become part of the feature set used to represent the data.

Feature Vector Construction: Construct feature vectors for each traffic instance, combining traditional network features (like IP addresses and port numbers) with statistical metrics and entropy values.

Model Training: Use the constructed feature vectors to train ML models. The ML algorithms learn patterns in the feature space that differentiate normal traffic from various types of malicious traffic (reconnaissance, DoS, and DDoS).

Model Evaluation: Evaluate the models using metrics like accuracy, precision, recall, and F1-score to ensure they can accurately detect and classify different types of attacks.
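The evaluation metrics listed above can be computed from true and predicted labels as follows. One class is treated as positive, which gives a one-vs-rest view for multi-class labels. The label strings are hypothetical. Precision and recall matter especially on imbalanced data. With 370 benign and 2,934,447 malicious samples, a model that labels everything as malicious would already score about 99.99% accuracy while never recognizing benign traffic.

```python
def classification_metrics(y_true, y_pred, positive="attack"):
    """Accuracy plus precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for ten flows
y_true = ["attack"] * 8 + ["benign"] * 2
y_pred = ["attack"] * 7 + ["benign", "attack", "benign"]
print(classification_metrics(y_true, y_pred))
```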

6. What criteria were used to determine the best sample size for the dataset, and how was the balance between sample size and model performance evaluated?

Response: Determining the best sample size for the dataset and evaluating the balance between sample size and model performance involves several critical steps and criteria. Here's a detailed explanation of the process:

Criteria for Determining the Best Sample Size

a. Evaluate key performance metrics such as accuracy, precision, recall, and F1-score. These metrics provide a comprehensive view of the model's ability to correctly identify different types of traffic, including benign and malicious traffic (reconnaissance, DoS, DDoS).

b. Consider the training time and computational resources required. Larger sample sizes typically improve model performance but also increase the time and resources needed for training.

c. Monitor for signs of overfitting (where the model performs well on training data but poorly on validation/testing data) and underfitting (where the model performs poorly on both training and validation/testing data). The goal is to find a sample size that minimizes both.

d. Ensure that the sample size is large enough to achieve statistically significant results. Small sample sizes may lead to high variance in performance metrics and unreliable conclusions.

e. Ensure that the sample size captures the diversity and represents the full spectrum of network traffic behaviors, including different types of attacks and normal traffic patterns.
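Criterion (d) can be quantified with a simple normal-approximation (Wald) confidence interval for an accuracy estimate. The 95% half-width is $1.96\sqrt{p(1-p)/n}$ for an accuracy $p$ measured on $n$ test samples. The sketch below uses a hypothetical accuracy of 0.9 to show how the uncertainty shrinks as the test set grows. It is not a procedure reported in the study. The Wald interval is also only a rough guide for small $n$ or for accuracies close to 0 or 1, where a Wilson interval is usually preferred.

```python
import math


def accuracy_ci_halfwidth(accuracy, n_test, z=1.96):
    """Half-width of a normal-approximation (Wald) confidence interval.

    z=1.96 corresponds to roughly 95% coverage.
    """
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n_test)


# Hypothetical accuracy of 0.9 evaluated on test sets of increasing size
for n in (100, 1_000, 10_000):
    print(n, round(accuracy_ci_halfwidth(0.9, n), 4))
```

With 100 test samples, the half-width is about 5.9 percentage points. With 10,000 test samples, it falls to about 0.6 percentage points.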

7. What are the implications of the Random Forest classifier achieving the highest performance metrics, and how might this influence the choice of ML algorithms for future studies in similar contexts?

Response: The Random Forest classifier achieving the highest performance metrics in this study has several imp

Attachment

Submitted filename: revision 1 (3).docx

pone.0314695.s001.docx (35.5KB, docx)

Decision Letter 1

Faouzi Jaidi

2 Sep 2024

PONE-D-24-16635R1

Effective DDoS Attack Detection in Software-Defined Vehicular Networks Using Statistical Flow Analysis and Machine Learning

PLOS ONE

Dear Dr. Rani,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please ensure that your revised paper carefully addresses all the reviewers' comments. In particular, it should discuss more recent and relevant related works and include a comprehensive comparison of your key findings with state-of-the-art results.

Please submit your revised manuscript by Oct 17 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Faouzi Jaidi

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #4: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #4: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: Thank you very much for following my comments to improve the manuscript. It's acceptable now. Please revise the English of the manuscript.

Reviewer #4: The research topic is original and offers a good perspective on DDoS detection in Software-Defined Vehicular Networks (SDVN). This work is particularly valuable given the growing importance of SDVNs and the unique challenges they face regarding security. The manuscript is well-organized, with clear sections delineating the introduction, methodology, results, and discussion. The flow of information is logical and easy to follow.

However, I have some questions:

Question 1: How does this approach ensure the model's generalizability and robustness in SDVN?

Question 2: The dataset used in the study is highly imbalanced, with only 370 samples labeled as benign compared to 2,934,447 samples labeled as malicious. Could the authors clarify why they chose to include only 370 benign samples, especially considering the risk of overfitting and the potential impact on the model's ability to generalize?

Comments:

1) In Figure 9, which presents a comparative analysis of all attacks, the representation of the DDoS-RF (shown in pink) is not clear. The line style and color blend with other elements in the figure, making it difficult to distinguish and interpret accurately. I recommend adjusting the color or line style to enhance visibility and ensure that the DDoS-RF results are easily identifiable.

2) The references included in the manuscript are comprehensive, but they lack citations from the most recent years, particularly from 2023 and 2024. Including more up-to-date references would strengthen the relevance and impact of the study by incorporating the latest developments and research in the field.

3) Line 119: The word "precisioj" is misspelled; it should be "precision."

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: Dr. Muhammad Reazul Haque

Reviewer #4: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Dec 18;19(12):e0314695. doi: 10.1371/journal.pone.0314695.r004

Author response to Decision Letter 1


10 Oct 2024

1. Thank you for updating your data availability statement. You note that your data are available within the Supporting Information files, but no such files have been included with your submission. At this time we ask that you please upload your minimal data set as a Supporting Information file, or to a public repository such as Figshare or Dryad.

Please also ensure that when you upload your file you include separate captions for your supplementary files at the end of your manuscript.

As soon as you confirm the location of the data underlying your findings, we will be able to proceed with the review of your submission.

Data Availability

The data are publicly available at:

https://www.kaggle.com/datasets/vigneshvenkateswaran/bot-iot

2. Thank you for stating the following Funding Information in your manuscript:

"The authors would like to acknowledge the support of Prince Sultan University for paying the Article Processing Charges (APC) of this publication. "

We note that you have provided funding information that is currently declared in your Funding Statement. However, funding information should not appear in any areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

Response: It has been removed.

"The authors would like to acknowledge the support of Prince Sultan University for paying the Article Processing Charges (APC) of this publication. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript but they supported in supervision."

Response: We removed the funding statement from the manuscript.

1. We notice that your manuscript file was uploaded on Jul 10, 2024. Please upload the latest version of your revised manuscript as the main article file, ensuring that it does not contain any tracked changes or highlighting. This will be used in the production process if your manuscript is accepted. Please follow this link for more information: http://blogs.PLOS.org/everyone/2011/05/10/how-to-submit-your-revised-manuscript/

Response: It is done.

2. Thank you for stating the following financial disclosure:

"The authors would like to acknowledge the support of Prince Sultan University for paying the Article Processing Charges (APC) of this publication."

Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

If this statement is not correct you must amend it as needed.

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

Response: It is done.

3. Thank you for updating your data availability statement. You note that your data are available within the Supporting Information files, but no such files have been included with your submission. At this time we ask that you please upload your minimal data set as a Supporting Information file, or to a public repository such as Figshare or Dryad.

Please also ensure that when you upload your file you include separate captions for your supplementary files at the end of your manuscript.

As soon as you confirm the location of the data underlying your findings, we will be able to proceed with the review of your submission.

Response: It is done

4. If possible, please upload a file showing your changes either highlighted or using track changes. This should be uploaded as a Revised Manuscript w/tracked changes, file type. Please follow this link for more information: http://blogs.PLOS.org/everyone/2011/05/10/how-to-submit-your-revised-manuscript/

Response: It is done

5. We note that your manuscript is not formatted using one of PLOS ONE’s accepted file types. Please reattach your manuscript as one of the following file types: .doc, .docx, .rtf, or .tex (accompanied by a .pdf).

If your submission was prepared in LaTex, please submit your manuscript file in PDF format and attach your .tex file as “other.”

Response: It is done.

1. Thank you for stating the following financial disclosure:

"The authors would like to acknowledge the support of Prince Sultan University for paying the Article Processing Charges (APC) of this publication."

Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

If this statement is not correct you must amend it as needed.

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

Response: It has been amended. However, Maha Driss was involved in supervision.

2. Thank you for updating your data availability statement. You note that your data are available within the Supporting Information files, but no such files have been included with your submission. At this time we ask that you please upload your minimal data set as a Supporting Information file, or to a public repository such as Figshare or Dryad.

Please also ensure that when you upload your file you include separate captions for your supplementary files at the end of your manuscript.

As soon as you confirm the location of the data underlying your findings, we will be able to proceed with the review of your submission.

Response: All information is in the manuscript itself, so no other file needs to be added. The data are taken from Kaggle, and the source is given in a footnote.

3. Thank you for stating the following in your Competing Interests section:

"NO authors have competing interests."

Response: The conflict-of-interest statement has been added to the manuscript and the cover letter.

Point-wise Detailed Response to Editor and Reviewers’ comments

Title: Effective DDoS Attack Detection in Software-Defined Vehicular Networks Using Statistical Flow Analysis and Machine Learning

Authors: Himanshi Babbar, Shalli Rani, Maha Driss

Dear Editors and Reviewers:

Thank you for the time you spent reviewing our manuscript and for your constructive comments. These comments are valuable and very helpful for revising and improving our paper. We have studied the comments carefully and made corrections, which are marked in the revised manuscript. We have tried our best to address all of the comments, and we hope that these revisions meet your requirements. The following changes have been made to the manuscript in response to the comments received.

*Changes made to the manuscript in response to the comments are highlighted in blue.

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Response: Thank you for the feedback.

Reviewer #4: All comments have been addressed

Response: Thank you for the feedback.

Reviewer #2: Thank you very much for following my comments to improve the manuscript. It's acceptable now. Please revise the English of the manuscript.

Response: Thank you very much for the feedback.

Reviewer #4:

The research topic is original and offers a good perspective on DDoS detection in Software-Defined Vehicular Networks (SDVN). This work is particularly valuable given the growing importance of SDVNs and the unique challenges they face regarding security. The manuscript is well-organized, with clear sections delineating the introduction, methodology, results, and discussion. The flow of information is logical and easy to follow.

However, I have some questions:

Question 1: How does this approach ensure the model's generalizability and robustness in SDVN?

Response: Several aspects of the proposed framework support the generalizability and robustness of the approach in SDN-based Vehicular Networks (SDVN). They are described below:

• By utilizing detailed feature extraction and selection techniques, the model can focus on the most relevant and informative features for detecting DDoS attacks. Because the model is not trained on irrelevant or redundant data, the risk of overfitting is reduced. This careful pre-processing helps the model capture meaningful patterns, making it more generalizable across different datasets and network topologies.

• The use of entropy as a key feature in detecting DDoS attacks leverages the randomness or disorder in traffic flow, which varies significantly between normal and malicious traffic. This allows the model to adaptively detect changes in traffic patterns across different network environments. The method of tracking entropy over consecutive intervals adds a dynamic aspect, ensuring that the detection mechanism remains sensitive to sustained malicious activity, even when network conditions change.

• Using a combination of flow statistics and entropy measurements provides a dual-layer detection mechanism. Flow statistics such as length and duration help to detect patterns that are common in DDoS attacks, while entropy measures capture the randomness in packet distribution. This layered approach allows the model to detect a wide range of DDoS behaviors, thereby improving its robustness in various SDVN scenarios, where network traffic can be highly dynamic.

• The concept of using a "degree of attack" metric helps the model fine-tune its response based on the severity and intensity of the attack. This feature allows for better calibration and adaptability, ensuring that the model can handle both low-rate and high-rate DDoS attacks. In SDVN, where vehicular communication systems can experience fluctuating traffic, this adaptability is crucial for maintaining robustness under different attack intensities.
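The interval-tracking idea in the second point can be sketched as a rule that raises an alarm only after entropy has stayed below a threshold for several consecutive windows. Under this rule, a single noisy window does not trigger detection. The threshold and the number of consecutive windows are hypothetical parameters. The sketch does not reproduce the paper's exact detection rule or its "degree of attack" metric.

```python
def detect_sustained_drop(entropies, threshold, consecutive):
    """Return the index of the window at which an alarm fires, or None.

    The alarm fires once `consecutive` windows in a row have entropy
    below `threshold`.
    """
    run = 0
    for i, h in enumerate(entropies):
        # Extend the run of low-entropy windows, or reset it on a normal window
        run = run + 1 if h < threshold else 0
        if run >= consecutive:
            return i
    return None


# Hypothetical per-window destination-IP entropies (bits)
window_entropies = [3.1, 3.0, 1.2, 3.2, 1.1, 0.9, 0.8, 0.7]
print(detect_sustained_drop(window_entropies, threshold=1.5, consecutive=3))
```

In this example, the isolated dip at window 2 is ignored. The alarm fires at window 6, which is the third low-entropy window in a row.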

Question 2: The dataset used in the study is highly imbalanced, with only 370 samples labeled as benign compared to 2,934,447 samples labeled as malicious. Could the authors clarify why they chose to include only 370 benign samples, especially considering the risk of overfitting and the potential impact on the model's ability to generalize?

Response: The authors' decision to include only 370 benign samples, compared with the much larger number of malicious samples (2,934,447), reflects a strategic approach to managing the dataset's imbalance and the complexity of working with large datasets such as BoT-IoT. The effects of this imbalance could be reduced with data augmentation, regularization, or cross-validation. These techniques can mitigate the risk of overfitting. However, they would increase response time and complexity through increased computational load, a more complex data pipeline, additional hyperparameter tuning, altered learning dynamics, longer training time, and more complex evaluation.
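One way to apply the cross-validation mentioned above without losing the scarce benign class is stratified k-fold splitting. In this approach, each class is distributed across the folds separately, so every fold keeps roughly the original class ratio. The sketch below is illustrative. The function name, the seed, and the scaled-down malicious count (3,700 rather than 2,934,447, to keep the example fast) are assumptions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign every sample index to one of k folds, class by class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class's indices round-robin so every fold gets a share
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Hypothetical, scaled-down label list: 370 benign and 3,700 malicious samples
labels = ["benign"] * 370 + ["malicious"] * 3700
folds = stratified_folds(labels, k=5)
print([sum(labels[i] == "benign" for i in fold) for fold in folds])
```

With 370 benign samples and k = 5, each fold receives exactly 74 benign samples, so no validation fold is left without benign traffic.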

However, there are important considerations and justifications for this choice:

• The dataset contains over 72 million records, making it computationally expensive to handle. By selecting a representative subset of the benign data, the authors aimed to reduce the computational complexity while still maintaining enough data to analyze different attack scenarios. However, the relatively small number of benign samples does create a potential imbalance.

• The inclusion of only 370 benign samples introduces a significant imbalance, which poses a risk of overfitting to the dominant malicious classes. To mitigate this risk, techniques such as class weighting, oversampling, or undersampling could have been used to ensure the model doesn't become biased towards the majority class (malicious traffic). While the paper doesn't explicitly mention using these techniques, they are standard practices in handling imbalanced datasets.

• Overfitting is indeed a risk when training on imbalanced data. The authors likely chose to keep a small portion of benign samples to prevent the model from becoming overly sensitive to benign traffic, which may lead to poor generalization in real-world scenarios. While the small benign sample size could lead to underrepresentation, cross-validation and tuning (e.g., with KNN, RF, and LR) can help reduce overfitting and enhance the model’s generalizability.

• The decision to focus heavily on malicious samples could affect the model's ability to generalize across diverse traffic types, especially under benign conditions. The authors may have prioritized attack detection over general traffic classification. However, this approach would benefit from further testing on datasets with more balanced traffic to validate the model’s robustness and adaptability in environments with different traffic patterns.
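The class weighting mentioned in the second point can be illustrated with the "balanced" heuristic, which weights each class c by n_samples / (n_classes × n_c). The `class_weight="balanced"` option in scikit-learn uses the same formula. Applied to the counts quoted above (370 benign, 2,934,447 malicious), a benign sample receives a weight of about 3,966 and a malicious sample a weight of about 0.5. As a result, both classes contribute equally to the weighted loss. As noted in the response, the paper does not state that this weighting was used. The sketch only shows what it would look like.

```python
def balanced_class_weights(class_counts):
    """Weight each class c by n_samples / (n_classes * n_c)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: n_samples / (n_classes * count) for label, count in class_counts.items()}


# Class counts quoted in the reviewer's question
weights = balanced_class_weights({"benign": 370, "malicious": 2_934_447})
print(weights)
```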

Comments:

1) In Figure 9, which presents a comparative analysis of all attacks, the representation of the DDoS-RF (shown in pink) is not clear. The line style and color blend with other elements in the figure, making it difficult to distinguish and interpret accurately. I recommend adjusting the color or line style to enhance visibility and ensure that the DDoS-RF results are easily identifiable.

Response: The figure has been updated.

2) The references included in the manuscript are comprehensive, but they lack citations from the most recent years, particularly from 2023 and 2024. Including more up-to-date references would strengthen the relevance and impact of the study by incorporating the latest developments and research in the field.

Response: Recent references have been added to the paper:

Reference nos. 19, 20, 32, 33, and 35.

3) Line 119: The word "precisioj" is misspelled; it should be "precision."

Response: We searched the entire PDF for the misspelled word but could not find it.

Thanks and Regards

Authors

Decision Letter 2

Faouzi Jaidi

10 Nov 2024

PONE-D-24-16635R2

Effective DDoS Attack Detection in Software-Defined Vehicular Networks Using Statistical Flow Analysis and Machine Learning

PLOS ONE

Dear Dr. Rani,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Based on the advice received, your manuscript is accepted for publication, subject to minor adjustments. These are minor revisions, which I believe you can make very quickly:

  • Please check carefully for any inconsistencies between the metrics reported in the abstract, the introduction, and the conclusion. Your model's accuracy is given as 91% in the abstract but as 96.3% in the introduction (line 112) and in the conclusion (line 713).

  • Please insert a paragraph at the end of the introduction to outline the structure of the manuscript.

  • Please carefully review the manuscript and ensure that each acronym is defined only at its first use.

Please submit your revised manuscript by Dec 25 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Faouzi Jaidi

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #4: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

Reviewer #4: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #4: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: Thank you very much for addressing my comments point by point. You could include some directions for future research in the manuscript.

Reviewer #4: This paper presents an intriguing research direction and makes a valuable contribution to the field.

- Comprehensive Evaluation: The paper offers a thorough analysis of the proposed method, demonstrating a solid understanding of the topic.

- High Accuracy: The results indicate a high level of accuracy.

- Methodical Process: The research follows a clear and methodical approach.

- Addressing Real-World Challenges: The study is relevant to practical applications.

- The references are up-to-date and relevant.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: Dr. Muhammad Reazul Haque

Reviewer #4: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Dec 18;19(12):e0314695. doi: 10.1371/journal.pone.0314695.r006

Author response to Decision Letter 2


12 Nov 2024

Response:

1. We note that your manuscript is not formatted using one of PLOS ONE’s accepted file types. Please reattach your manuscript as one of the following file types: .doc, .docx, .rtf, or .tex (accompanied by a .pdf).

If your submission was prepared in LaTeX, please submit your manuscript file in PDF format and attach your .tex file as “other.”

Response: It has now been uploaded.

2. Please note that your Data Availability Statement is currently missing [the repository name]. If your manuscript is accepted for publication, you will be asked to provide these details on a very short timeline. We therefore suggest that you provide this information now, though we will not hold up the peer review process if you are unable.

Response: Please read the response below. The data availability information is already present in the manuscript files. We are unsure why this comment has been raised again without the file being checked; kindly check the file.

Point-wise Detailed Response to Editor and Reviewers’ comments

Title: Effective DDoS Attack Detection in Software-Defined Vehicular Networks Using Statistical Flow Analysis and Machine Learning

Authors: Himanshi Babbar, Shalli Rani, Maha Driss

Dear Editors and Reviewers:

We are grateful for the time you spent reviewing our manuscript and for your constructive comments. These comments are valuable and very helpful for revising and improving our paper. We have studied the comments carefully and made corrections, as marked in the revised manuscript. We have tried our best to address the comments in the hope that these revisions will meet your requirements. The following changes have been made in the manuscript in response to the comments received.

*The changes made in the manuscript in response to the comments received have been highlighted in blue.

Editor’s Comments

1. Please check carefully for any inconsistency between the metrics reported in the abstract, introduction, and conclusion (your model's accuracy is given as 91% in the abstract but 96.3% in the introduction, line 112, and in the conclusion, line 713).

Response: We apologize for the inconsistency. The manuscript has been read thoroughly and corrected as suggested. The changes are highlighted in blue on page 5 (point 6 of the main contributions) and in the Conclusion section.

2. Please insert a paragraph at the end of the introduction to highlight the structure of the manuscript.

Response: A paragraph describing the organization of the paper has been inserted on page 5 and is highlighted in blue.

3. Please carefully review the manuscript and ensure that acronyms are explained only during their first use.

Response: The entire manuscript has been read thoroughly to ensure that each acronym is defined only once, at its first use.

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Response: Thank you for the feedback.

Reviewer #4: All comments have been addressed

Response: Thank you for the feedback.

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

Response: Thank you for the feedback.

Reviewer #4: Yes

Response: Thank you for the feedback.

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Response: Thank you for the feedback.

Reviewer #4: Yes

Response: Thank you for the feedback.

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

Response: Thank you for the feedback.

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #4: Yes

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: Thank you very much for addressing my comments point by point. You could include some directions for future research in the manuscript.

Response: Directions for future research have been added to the Conclusion section on page 24 and are highlighted in blue.

Reviewer #4: This paper presents an intriguing research direction and makes a valuable contribution to the field.

- Comprehensive Evaluation: The paper offers a thorough analysis of the proposed method, demonstrating a solid understanding of the topic.

- High Accuracy: The results indicate a high level of accuracy.

- Methodical Process: The research follows a clear and methodical approach.

- Addressing Real-World Challenges: The study is relevant to practical applications.

- The references are up-to-date and relevant.

Response: Thank you for the feedback.

Thanks and Regards

Authors

Attachment

Submitted filename: minor revision PLOS ONE.docx

pone.0314695.s002.docx (17.5KB, docx)

Decision Letter 3

Faouzi Jaidi

15 Nov 2024

Effective DDoS Attack Detection in Software-Defined Vehicular Networks Using Statistical Flow Analysis and Machine Learning

PONE-D-24-16635R3

Dear Dr. Rani,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager at Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Faouzi Jaidi

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Faouzi Jaidi

3 Dec 2024

PONE-D-24-16635R3

PLOS ONE

Dear Dr. Rani,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Faouzi Jaidi

Academic Editor

PLOS ONE


Articles from PLOS ONE are provided here courtesy of PLOS
